Arithmetic Randonnée

An introduction to probabilistic number theory

E. Kowalski

Version of May 10, 2021


“Les probabilités et la théorie analytique des nombres, c’est la même chose”,
paraphrase of Y. Guivarc’h, Rennes, July 2017.
Contents

Preface
Prerequisites and notation
Chapter 1. Introduction
1.1. Presentation
1.2. How does probability link with number theory really?
1.3. A prototype: integers in arithmetic progressions
1.4. Another prototype: the distribution of the Euler function
1.5. Generalizations
1.6. Outline of the book
Chapter 2. Classical probabilistic number theory
2.1. Introduction
2.2. Distribution of arithmetic functions
2.3. The Erdős–Kac Theorem
2.4. Convergence without renormalization
2.5. Final remarks
Chapter 3. The distribution of values of the Riemann zeta function, I
3.1. Introduction
3.2. The theorems of Bohr-Jessen and of Bagchi
3.3. The support of Bagchi’s measure
3.4. Generalizations
Chapter 4. The distribution of values of the Riemann zeta function, II
4.1. Introduction
4.2. Strategy of the proof of Selberg’s theorem
4.3. Dirichlet polynomial approximation
4.4. Euler product approximation
4.5. Further topics
Chapter 5. The Chebychev bias
5.1. Introduction
5.2. The Rubinstein–Sarnak distribution
5.3. Existence of the Rubinstein–Sarnak distribution
5.4. The Generalized Simplicity Hypothesis
5.5. Further results
Chapter 6. The shape of exponential sums
6.1. Introduction
6.2. Proof of the distribution theorem
6.3. Applications
6.4. Generalizations
Chapter 7. Further topics
7.1. Equidistribution modulo 1
7.2. Roots of polynomial congruences and the Chinese Remainder Theorem
7.3. Gaps between primes
7.4. Cohen-Lenstra heuristics
7.5. Ratner theory
7.6. And even more...
Appendix A. Analysis
A.1. Summation by parts
A.2. The logarithm
A.3. Mellin transform
A.4. Dirichlet series
A.5. Density of certain sets of holomorphic functions
Appendix B. Probability
B.1. The Riesz representation theorem
B.2. Support of a measure
B.3. Convergence in law
B.4. Perturbation and convergence in law
B.5. Convergence in law in a finite-dimensional vector space
B.6. The Weyl criterion
B.7. Gaussian random variables
B.8. Subgaussian random variables
B.9. Poisson random variables
B.10. Random series
B.11. Some probability in Banach spaces
Appendix C. Number theory
C.1. Multiplicative functions and Euler products
C.2. Additive functions
C.3. Primes and their distribution
C.4. The Riemann zeta function
C.5. Dirichlet L-functions
C.6. Exponential sums
Bibliography
Index

Preface

The style of this book is a bit idiosyncratic. The results that interest us belong to
number theory, but the emphasis in the proofs will be on the probabilistic aspects, and
on the interaction between number theory and probability theory. In fact, we attempt to
write the proofs so that they use as little arithmetic as possible, in order to clearly isolate
the crucial number-theoretic ingredients which are involved.
This book is quite short. We attempt to foster an interest in the topic by focusing
on a few key results that are accessible and at the same time particularly appealing, in
the author’s opinion, without targeting an encyclopedic treatment of any. We also try to
emphasize connections to other areas of mathematics – first, to a wide array of arithmetic
topics, but also to some aspects of ergodic theory, expander graphs, etc.
In some sense, the ideal reader of this book is a student who has attended at least
one introductory advanced undergraduate or beginning graduate-level probability course,
including especially the Central Limit Theorem, and maybe some aspects of Brownian
motion, and who is interested in seeing how probability interacts with number theory.
For this reason, there are almost no number-theoretic prerequisites, although it is helpful
to have some knowledge of the distribution of primes.
Probabilistic number theory is currently evolving very rapidly, and uses more and
more refined probabilistic tools and results. For many number theorists, we hope that
the detailed and motivated discussion of basic probabilistic facts and tools in this book
will be useful as a basic “toolbox”.

Zürich, May 10, 2021

Acknowledgments. The first draft of this book was prepared for a course “Intro-
duction to probabilistic number theory” that I taught at ETH Zürich during the Fall
Semester 2015. Thanks to the students of the course for their interest, in particular to
M. Gerspach, A. Steiger, P. Zenz for sending corrections, and to B. Löffel for organizing
and preparing the exercise sessions.
Thanks to M. Burger for showing me Cauchy’s proof of the Euler formula
\[ \sum_{n=1}^{+\infty} \frac{1}{n^2} = \frac{\pi^2}{6} \]
in Exercise 1.3.4. Thanks to V. Tassion for help with the proof of Proposition B.11.11,
and to G. Ricotta and E. Royer for pointing out a small mistake in [79].
Thanks to M. Radziwill and K. Soundararajan for sharing their proof [95] of Selberg’s
Central Limit Theorem for $\log|\zeta(\tfrac{1}{2} + it)|$, which was then unpublished.
This work was partially supported by the DFG-SNF lead agency program grant
200020L 175755.

Prerequisites and notation

The basic requirements for most of this text are standard introductory graduate
courses in algebra, analysis (including Lebesgue integration and complex analysis) and
probability. Of course, knowledge and familiarity with basic number theory (for instance,
the distribution of primes up to the Bombieri-Vinogradov Theorem) are helpful, but we
review in Appendix C all the results that we use. Similarly, Appendix B summarizes the
notation and facts from probability theory which are the most important for us.

We will use the following notation:


(1) For subsets $Y_1$ and $Y_2$ of an arbitrary set $X$, we denote by $Y_1 \setminus Y_2$ the difference
set, i.e., the set of elements $x \in Y_1$ such that $x \notin Y_2$.
(2) A locally compact topological space is always assumed to be separated (i.e.,
Hausdorff), as in Bourbaki [15].
(3) For a set X, |X| ∈ [0, +∞] denotes its cardinality, with |X| = +∞ if X is infinite.
There is no distinction in this text between the various infinite cardinals.
(4) If $X$ is a set and $f$, $g$ are two complex-valued functions on $X$, then we write
synonymously $f = O(g)$ or $f \ll g$ to say that there exists a constant $C > 0$ (sometimes
called an “implied constant”) such that $|f(x)| \leq C g(x)$ for all $x \in X$. Note that
this implies that in fact $g \geq 0$. We also write $f \asymp g$ to indicate that $f \ll g$ and
$g \ll f$.
(5) If $X$ is a topological space, $x_0 \in X$ and $f$ and $g$ are functions defined on a
neighborhood of $x_0$, with $g(x) \neq 0$ for $x$ in a neighborhood of $x_0$, then we say
that $f(x) = o(g(x))$ as $x \to x_0$ if $f(x)/g(x) \to 0$ as $x \to x_0$, and that $f(x) \sim g(x)$
as $x \to x_0$ if $f(x)/g(x) \to 1$.
(6) We write a | b for the divisibility relation “a divides b”; we denote by (a, b) the
gcd of two integers a and b, and by [a, b] their lcm.
(7) Unless stated otherwise, the variable $p$ refers to prime numbers. In particular, a series
$\sum_p (\cdots)$ refers to a series over primes (summed in increasing order, in case it is
not known to be absolutely convergent), and similarly for a product over primes.
(8) We denote by $\mathbf{F}_p$ the finite field $\mathbf{Z}/p\mathbf{Z}$, for $p$ prime, and more generally by $\mathbf{F}_q$ a
finite field with $q$ elements, where $q = p^n$, $n \geq 1$, is a power of $p$. We will recall
the properties of finite fields when we require them.
(9) For a complex number $z$, we write $e(z) = e^{2i\pi z}$. If $q \geq 1$ and $x \in \mathbf{Z}/q\mathbf{Z}$, then
$e(x/q)$ is well-defined by taking any representative of $x$ in $\mathbf{Z}$ to compute
the exponential.
(10) If $q \geq 1$ and $x \in \mathbf{Z}$ (or $x \in \mathbf{Z}/q\mathbf{Z}$) is an integer which is coprime to $q$ (or a residue
class invertible modulo $q$), we sometimes denote by $\bar{x}$ the inverse class such that
$x\bar{x} = 1$ in $\mathbf{Z}/q\mathbf{Z}$. This will always be done in such a way that the modulus $q$ is
clear from context, in the case where $x$ is an integer.
(11) Given a probability space $(\Omega, \Sigma, \mathbf{P})$, we denote by $\mathbf{E}(\cdot)$ (resp. $\mathbf{V}(\cdot)$) the
expectation (resp. the variance) computed with respect to $\mathbf{P}$. It will often happen that
we have a sequence $(\Omega_N, \Sigma_N, \mathbf{P}_N)$ of probability spaces; we will then denote by
$\mathbf{E}_N$ or $\mathbf{V}_N$ the respective expectation and variance with respect to $\mathbf{P}_N$.
(12) Given a measure space $(\Omega, \Sigma, \mu)$ (not necessarily a probability space), a set $Y$
with a σ-algebra $\Sigma'$ and a measurable map $f : \Omega \to Y$, we denote by $f_*(\mu)$
(or sometimes $f(\mu)$) the image measure on $Y$; in the case of a probability space,
so that $f$ is seen as a random variable on $\Omega$, this is the probability law of $f$
seen as a “random $Y$-valued element”. If the set $Y$ is given without specifying a
σ-algebra, we will view it usually as given with the σ-algebra generated by sets
$Z \subset Y$ such that $f^{-1}(Z)$ belongs to $\Sigma$.
(13) As a typographical convention, we will use sans-serif fonts like $\mathsf{X}$ to denote an
arithmetic random variable, and more standard fonts (like $X$) for “abstract”
random variables. When using the same letter, this will usually mean that
somehow the “purely random” $X$ is the “model” of the arithmetic quantity $\mathsf{X}$.
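Items (9) and (10) of the notation can be made concrete with a small computation. The following is a minimal Python sketch, not part of the original text; the function names `e` and `inverse_mod` are chosen here for illustration, and the modular inverse relies on the three-argument built-in `pow` with exponent -1 (available from Python 3.8).

```python
import cmath


def e(z):
    """Return e(z) = exp(2*i*pi*z), as in item (9) of the notation."""
    return cmath.exp(2j * cmath.pi * z)


def inverse_mod(x, q):
    """Return the representative in {0, ..., q-1} of the inverse of x modulo q.

    Raises ValueError if x is not invertible modulo q.
    """
    return pow(x, -1, q)


if __name__ == "__main__":
    q = 7
    # e(x/q) does not depend on the representative of x modulo q.
    print(abs(e(3 / q) - e((3 + 5 * q) / q)))
    # The inverse of 3 modulo 7: the product with 3 is congruent to 1.
    print(inverse_mod(3, q), (3 * inverse_mod(3, q)) % q)
```

The first printed value is only zero up to floating-point rounding, since `e` is computed numerically.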

CHAPTER 1

Introduction

1.1. Presentation
Different authors might define “probabilistic number theory” in different ways. Our
point of view will be to see it as the study of the asymptotic behavior of arithmetically-
defined sequences of probability measures, or random variables. Thus the content of this
book is based on examples of situations where we can say interesting things concerning
such sequences. However, in Chapter 7, we will quickly survey some topics that might
quite legitimately be seen as part of probabilistic number theory in a broader sense.
To illustrate what we have in mind, the most natural starting point is a famous result
of Erdős and Kac.
Theorem 1.1.1 (the Erdős-Kac Theorem). For any positive integer $n \geq 1$, let $\omega(n)$
denote the number of prime divisors of $n$, counted without multiplicity. Then for any real
numbers $a < b$, we have
\[ \lim_{N \to +\infty} \frac{1}{N} \Bigl| \Bigl\{ 1 \leq n \leq N \;\Bigm|\; a \leq \frac{\omega(n) - \log\log N}{\sqrt{\log\log N}} \leq b \Bigr\} \Bigr| = \frac{1}{\sqrt{2\pi}} \int_a^b e^{-x^2/2}\, dx. \]
To spell out the connection between this statement and our slogan, one sequence
of probability measures involved here is the sequence $(\mu_N)_{N \geq 1}$, where $\mu_N$ is the uniform
probability measure supported on the finite set $\Omega_N = \{1, \ldots, N\}$. This sequence is defined
arithmetically, because the study of integers is part of arithmetic. The asymptotic
behavior is revealed by the statement. Namely, consider the sequence of random variables
\[ X_N(n) = \frac{\omega(n) - \log\log N}{\sqrt{\log\log N}} \]
defined on $\Omega_N$ for $N \geq 3$ (simply so that $\log\log N > 0$), and the sequence $(\nu_N)$ of their
probability distributions, which are (Borel) probability measures on $\mathbf{R}$ defined by
\[ \nu_N(A) = \mu_N(X_N \in A) = \frac{1}{N} \Bigl| \Bigl\{ 1 \leq n \leq N \;\Bigm|\; \frac{\omega(n) - \log\log N}{\sqrt{\log\log N}} \in A \Bigr\} \Bigr| \]
for any measurable set $A \subset \mathbf{R}$. These form another arithmetically-defined sequence of
probability measures, since primes are definitely arithmetic objects. Theorem 1.1.1 is, by
basic probability theory, equivalent to the fact that the sequence $(\nu_N)$ converges in law
to a standard gaussian random variable as $N \to +\infty$. (We recall here that a sequence of
real-valued random variables $(X_N)$ converges in law to a random variable $X$ if
\[ \mathbf{E}(f(X_N)) \to \mathbf{E}(f(X)) \]
for all bounded continuous functions $f : \mathbf{R} \to \mathbf{C}$, and that one can show that it is
equivalent to
\[ \mathbf{P}(a < X_N < b) \to \mathbf{P}(a < X < b) \]
for all $a < b$ such that $\mathbf{P}(X = a) = \mathbf{P}(X = b) = 0$; for the standard gaussian, this means
for all $a$ and $b$; see Section B.3 for reminders about this).
The Erdős-Kac Theorem is probably the simplest case where a natural deterministic
arithmetic quantity (the number of prime factors of an integer), which is individually
very hard to grasp, nevertheless exhibits a statistical or probabilistic behavior which fits
a very common probability distribution. This is the prototype of the kinds of statements
we will discuss (although sometimes the limiting measures will be far from standard!).
We will prove Theorem 1.1.1 in the next chapter. Before we do this, we will begin
with a few results that are much more elementary but which may, with hindsight, be
considered as the simplest cases of the type of results we want to describe.
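Theorem 1.1.1 can be explored numerically. The following Python sketch is an illustration added here, not a part of the proof; the function names are hypothetical. It computes ω(n) for all n ≤ N with a sieve and compares the proportion of n ≤ N whose normalized value lies in [a, b] with the gaussian probability. Since the normalization involves log log N, which grows extremely slowly, only rough agreement should be expected for any N that a computer can handle.

```python
import math


def omega_sieve(N):
    """Return a list with omega[n] = number of distinct prime divisors of n, 0 <= n <= N."""
    omega = [0] * (N + 1)
    for p in range(2, N + 1):
        if omega[p] == 0:  # no smaller prime divides p, hence p is prime
            for k in range(p, N + 1, p):
                omega[k] += 1
    return omega


def erdos_kac_proportion(N, a, b):
    """Proportion of 1 <= n <= N with a <= (omega(n) - loglog N)/sqrt(loglog N) <= b (N >= 3)."""
    omega = omega_sieve(N)
    mean = math.log(math.log(N))
    std = math.sqrt(mean)
    hits = sum(1 for n in range(1, N + 1) if a <= (omega[n] - mean) / std <= b)
    return hits / N


def gaussian_probability(a, b):
    """P(a <= X <= b) for a standard gaussian X, computed with the error function."""
    return 0.5 * (math.erf(b / math.sqrt(2)) - math.erf(a / math.sqrt(2)))


if __name__ == "__main__":
    N = 10**5
    print(erdos_kac_proportion(N, -1.0, 1.0), gaussian_probability(-1.0, 1.0))
```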

1.2. How does probability link with number theory really?


Before embarking on this, however, it might be useful to give a rough idea of the way
probability theory and arithmetic will combine to give interesting limit theorems like the
Erdős-Kac Theorem. The strategy that we outline here will be, in different guises, at the
core of the strategy of the proofs of many theorems in this book.
We typically will be working with a sequence (Xn ) of arithmetically interesting random
variables, and we wish to prove that it converges in law. In many cases, we do this with
a two-step process.
(1) We begin by approximating (Xn ) by another sequence (Yn ), in such a way that
convergence in law of these approximations implies that of (Xn ), with the same
limit. In other words, we see Yn as a kind of perturbation of Xn , which is small
enough to preserve convergence in law. Notably, the approximation might be
of different sorts: the difference Xn − Yn might, for instance, converge to 0 in
probability, or in some $L^p$-space; in fact, we will sometimes encounter a process
of successive approximations, where the successive perturbations are small in
different senses, before reaching a convenient approximation Yn (this is the case
in the proof of Theorem 4.1.2).
(2) Having found a good approximation Yn , we prove that it converges in law using a
probabilistic criterion that is sufficiently robust to apply; typical examples are the
method of moments, and the convergence theorem of P. Lévy based on character-
istic functions (i.e., Fourier transforms), because analytic number theory often
gives tools to compute approximately such invariants of arithmetically-defined
random variables.
Both steps are sometimes quite easy to motivate using some heuristic arguments (for
instance, when Xn or Yn are represented as a sum of various terms, we might guess that
these are “approximately independent”, to lead to a limit similar to that of sums of
independent random variables), but they may also involve quite subtle ideas.
We will not dwell further on this overarching strategy, but the reader will be able to
recognize how it fits into this skeleton when we discuss the steps of the proof of some of
the main theorems.
In many papers written by (or for) analytic number theorists, the approximations
of Step 1, as well as (say) the moment computations of Step 2, are performed using
notation, terminology and normalizations coming from the customs and standards of
analytic number theory. In this book, we will try to express them instead, as much as
possible, in good probabilistic style (e.g., we attempt to mention as little as possible
the “elementary events” of the underlying probability space). This is usually simply a
matter of cosmetic transformations, but sometimes it leads to slightly different emphasis,
in particular concerning the nature of the approximations in Step 1. We suggest that the
reader compare our presentation with that of some of the original source papers, in order
to assess whether this style is enlightening (as we often find it to be), or not.

1.3. A prototype: integers in arithmetic progressions


As mentioned above, we begin with a result that is so elementary that it is usually
not presented as a separate statement (let alone as a theorem!). Nevertheless, as we
will see, it is the basic ingredient (and explanation) for the Erdős-Kac Theorem, and
generalizations of it become quite quickly very deep.
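Theorem 1.3.1. For $N \geq 1$, let $\Omega_N = \{1, \ldots, N\}$ with the uniform probability measure
$\mathbf{P}_N$. Fix an integer $q \geq 1$, and denote by $\pi_q : \mathbf{Z} \to \mathbf{Z}/q\mathbf{Z}$ the reduction modulo $q$ map.
Let $X_N$ be the random variables given by $X_N(n) = \pi_q(n)$ for $n \in \Omega_N$.
As $N \to +\infty$, the random variables $X_N$ converge in law to the uniform probability
measure $\mu_q$ on $\mathbf{Z}/q\mathbf{Z}$. In fact, for any function
\[ f : \mathbf{Z}/q\mathbf{Z} \to \mathbf{C}, \]
we have
\[ \bigl| \mathbf{E}(f(X_N)) - \mathbf{E}(f) \bigr| \leq \frac{2}{N} \|f\|_1, \tag{1.1} \]
where
\[ \|f\|_1 = \sum_{a \in \mathbf{Z}/q\mathbf{Z}} |f(a)|. \]
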
Proof. It is enough to prove (1.1), which gives the convergence in law by letting
$N \to +\infty$. This is quite simple. By definition, we have
\[ \mathbf{E}(f(X_N)) = \frac{1}{N} \sum_{1 \leq n \leq N} f(\pi_q(n)), \]
and
\[ \mathbf{E}(f) = \frac{1}{q} \sum_{a \in \mathbf{Z}/q\mathbf{Z}} f(a). \]

The idea is then clear: among the integers $1 \leq n \leq N$, roughly $N/q$ are in any given
residue class $a \pmod q$, and if we use this approximation in the first formula, we obtain
precisely the second.
To do this in detail, we gather the integers in the sum according to their residue class
$a$ modulo $q$. This gives
\[ \frac{1}{N} \sum_{1 \leq n \leq N} f(\pi_q(n)) = \sum_{a \in \mathbf{Z}/q\mathbf{Z}} f(a) \times \frac{1}{N} \sum_{\substack{1 \leq n \leq N \\ n \equiv a \ (\mathrm{mod}\ q)}} 1. \]

The inner sum, for each $a$, counts the number of integers $n$ in the interval $1 \leq n \leq N$ such
that the remainder under division by $q$ is $a$. These integers $n$ can be written $n = mq + a$
for some $m \in \mathbf{Z}$, if we view $a$ as an actual integer, and therefore it is enough to count
those integers $m \in \mathbf{Z}$ for which $1 \leq mq + a \leq N$. The condition translates to
\[ \frac{1 - a}{q} \leq m \leq \frac{N - a}{q}, \]
and therefore we are reduced to counting integers in an interval. This is not difficult,
although we have to be careful with boundary terms, since the bounds of the interval are
not necessarily integers. The length of the interval is $(N - a)/q - (1 - a)/q = (N - 1)/q$.
In general, in an interval $[\alpha, \beta]$ with $\alpha \leq \beta$, the number $N_{\alpha,\beta}$ of integers satisfies
\[ \beta - \alpha - 1 \leq N_{\alpha,\beta} \leq \beta - \alpha + 1 \]
(and the boundary contributions should not be forgotten, although they are typically
negligible when the interval is long enough).
Hence the number $N_a$ of values of $m$ satisfies
\[ \frac{N - 1}{q} - 1 \leq N_a \leq \frac{N - 1}{q} + 1, \tag{1.2} \]
and therefore
\[ \Bigl| N_a - \frac{N}{q} \Bigr| \leq 1 + \frac{1}{q}. \]
By summing over $a$ in $\mathbf{Z}/q\mathbf{Z}$, we deduce now that
\[ \Bigl| \frac{1}{N} \sum_{1 \leq n \leq N} f(\pi_q(n)) - \frac{1}{q} \sum_{a \in \mathbf{Z}/q\mathbf{Z}} f(a) \Bigr| = \Bigl| \frac{1}{N} \sum_{a \in \mathbf{Z}/q\mathbf{Z}} f(a) \Bigl( N_a - \frac{N}{q} \Bigr) \Bigr| \leq \frac{1 + q^{-1}}{N} \sum_{a \in \mathbf{Z}/q\mathbf{Z}} |f(a)| \leq \frac{2}{N} \|f\|_1. \]


Remark 1.3.2. As a matter of notation, we will sometimes remove the variable N
from the notation of random variables, since the value of N is usually made clear by the
context, frequently because of its appearance in an expression involving PN (·) or EN (·),
which refers to the probability and expectation on ΩN .
Despite its simplicity, this result already brings up a number of important features
that will occur extensively in later chapters.
A first remark is that we actually proved something much stronger than the statement
of convergence in law: the bound (1.1) gives a rather precise estimate of the speed of
convergence of expectations (or probabilities) computed using the law of XN to those
computed using the limit uniform distribution µq . Most importantly, as we will see
shortly, these estimates are uniform in terms of q, and give us information on convergence,
or more properly speaking on the “distance” between the law of XN and µq , even if q
depends on N in some way.
To be more precise, take $f$ to be the characteristic function of a residue class $a \in \mathbf{Z}/q\mathbf{Z}$.
Then since $\mathbf{E}(f) = 1/q$ and $\|f\|_1 = 1$, we get
\[ \Bigl| \mathbf{P}(\pi_q(n) = a) - \frac{1}{q} \Bigr| \leq \frac{2}{N}. \]
This is non-trivial information as long as $q$ is a bit smaller than $N$. Thus, this states
that the probability that $n \leq N$ is congruent to $a$ modulo $q$ is close to the intuitive
probability $1/q$, uniformly for all $q$ just a bit smaller than $N$, and also uniformly for all
residue classes. We will see, both below and in many similar situations, that uniformity
aspects are essential in applications.
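The bound (1.1), applied to the characteristic function of a residue class, can be checked exactly on small cases. The following Python sketch is an added illustration with hypothetical function names; it uses exact rational arithmetic so that no rounding is involved, and it includes a modulus q close to N to illustrate the uniformity in q.

```python
from fractions import Fraction


def class_probability(N, q, a):
    """P_N(pi_q(n) = a): proportion of 1 <= n <= N with n congruent to a modulo q."""
    count = sum(1 for n in range(1, N + 1) if n % q == a % q)
    return Fraction(count, N)


def max_deviation(N, q):
    """Largest value of |P_N(pi_q(n) = a) - 1/q| over all residue classes a modulo q."""
    return max(abs(class_probability(N, q, a) - Fraction(1, q)) for a in range(q))


if __name__ == "__main__":
    N = 1000
    for q in (2, 7, 30, 999):
        deviation = max_deviation(N, q)
        # The bound (1.1) for an indicator function, whose norm ||f||_1 is 1.
        assert deviation <= Fraction(2, N)
        print(q, deviation)
```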
The second remark concerns the interpretation of the result. Theorem 1.3.1 can
explain what is meant by such intuitive statements as “the probability that an integer is
divisible by 2 is 1/2”. Namely, this is the probability, according to the uniform measure
on $\mathbf{Z}/2\mathbf{Z}$, of the set $\{0\}$, and this is simply the limit given by the convergence in law of
the variables $\pi_2(n)$ defined on $\Omega_N$ to the uniform measure $\mu_2$.
This idea applies to many other similar-sounding problems. The most elementary
among these can often be solved using Theorem 1.3.1. We present one famous example:
what is the “probability” that an integer $n \geq 1$ is squarefree, which means that $n$ is not
divisible by a square $m^2$ for some integer $m \geq 2$ (or, equivalently, by the square of some
prime number)? Here the interpretation is that this probability should be
\[ \lim_{N \to +\infty} \frac{1}{N} |\{1 \leq n \leq N \mid n \text{ is squarefree}\}|. \]

If we prefer (as we do) to speak of sequences of random variables, we can take the sequence
of Bernoulli random variables $B_N$, indicators of the event that $n \in \Omega_N$ is squarefree, so
that
\[ \mathbf{P}(B_N = 1) = \frac{1}{N} |\{1 \leq n \leq N \mid n \text{ is squarefree}\}|. \]
We then ask about the limit in law of (BN ). The answer is as follows:
Proposition 1.3.3. The sequence $(B_N)$ converges in law to a Bernoulli random variable
$B$ with $\mathbf{P}(B = 1) = 6/\pi^2$. In other words, the “probability” that an integer $n$ is
squarefree, in the interpretation discussed above, is $6/\pi^2$.
Proof. The idea is to use inclusion-exclusion: to say that $n$ is squarefree means that
it is not divisible by the square $p^2$ of any prime number. Thus, if we denote by $\mathbf{P}_N$ the
probability measure on $\Omega_N$, we have
\[ \mathbf{P}_N(n \text{ is squarefree}) = \mathbf{P}_N\Bigl( \bigcap_{p \text{ prime}} \{p^2 \text{ does not divide } n\} \Bigr). \]

There is one key step now that is both obvious and crucial: because of the nature of
$\Omega_N$, the infinite intersection may be replaced by the intersection over primes $p \leq \sqrt{N}$,
since all integers in $\Omega_N$ are $\leq N$. Applying the inclusion-exclusion formula, we obtain
\[ \mathbf{P}_N\Bigl( \bigcap_{p \leq N^{1/2}} \{p^2 \text{ does not divide } n\} \Bigr) = \sum_{I} (-1)^{|I|}\, \mathbf{P}_N\Bigl( \bigcap_{p \in I} \{p^2 \text{ divides } n\} \Bigr) \tag{1.3} \]
where $I$ runs over the set of subsets of the set $\{p \leq N^{1/2}\}$ of primes $\leq N^{1/2}$, and $|I|$ is the
cardinality of $I$. But, by the Chinese Remainder Theorem, we have
\[ \bigcap_{p \in I} \{p^2 \text{ divides } n\} = \{d_I^2 \text{ divides } n\} \]
where $d_I$ is the product of the primes in $I$. Once more, note that this set is empty if
$d_I^2 > N$. Moreover, the fundamental theorem of arithmetic shows that $I \mapsto d_I$ is injective,
and we can recover $|I|$ also from $d_I$ as the number of prime factors of $d_I$. Therefore, we
get
\[ \mathbf{P}_N(n \text{ is squarefree}) = \sum_{d \leq N^{1/2}} \mu(d)\, \mathbf{P}_N(d^2 \text{ divides } n) \]
where $\mu(d)$ is the Möbius function, defined for integers $d \geq 1$ by
\[ \mu(d) = \begin{cases} 0 & \text{if } d \text{ is not squarefree,} \\ (-1)^k & \text{if } d = p_1 \cdots p_k \text{ with } p_i \text{ distinct primes} \end{cases} \]
(see Definition C.1.3).
But $d^2$ divides $n$ if and only if the image of $n$ by reduction modulo $d^2$ is 0. By
Theorem 1.3.1 applied with $q = d^2$ for all $d \leq N^{1/2}$, with $f$ the indicator function of the
residue class of 0, we get
\[ \mathbf{P}_N(d^2 \text{ divides } n) = \frac{1}{d^2} + O(N^{-1}) \]
for all $d$, where the implied constant in the $O(\cdot)$ symbol is independent of $d$ (in fact, it is
at most 2). Note in passing how we use crucially here the fact that Theorem 1.3.1 was
uniform and explicit with respect to the parameter $q$.
Summing the last formula over $d \leq N^{1/2}$ (there are at most $N^{1/2}$ terms, each with an
error of size $O(N^{-1})$), we deduce
\[ \mathbf{P}_N(n \text{ is squarefree}) = \sum_{d \leq N^{1/2}} \frac{\mu(d)}{d^2} + O\Bigl( \frac{1}{\sqrt{N}} \Bigr). \]

Since the series with terms $1/d^2$ converges, this shows the existence of the limit, and
that $(B_N)$ converges in law as $N \to +\infty$ to a Bernoulli random variable $B$ with success
probability
\[ \mathbf{P}(B = 1) = \sum_{d \geq 1} \frac{\mu(d)}{d^2}, \qquad \mathbf{P}(B = 0) = 1 - \sum_{d \geq 1} \frac{\mu(d)}{d^2}. \]
It is a well-known fact (the “Basel problem”, first solved by Euler; see Exercise 1.3.4 for
a proof) that
\[ \sum_{d \geq 1} \frac{1}{d^2} = \frac{\pi^2}{6}. \]
Moreover, a basic property of the Möbius function states that
\[ \sum_{d \geq 1} \frac{\mu(d)}{d^s} = \frac{1}{\zeta(s)} \]
for any complex number $s$ with $\mathrm{Re}(s) > 1$, where
\[ \zeta(s) = \sum_{d \geq 1} \frac{1}{d^s} \]
is the Riemann zeta function (Corollary C.1.5), and hence we get
\[ \sum_{d \geq 1} \frac{\mu(d)}{d^2} = \frac{1}{\zeta(2)} = \frac{6}{\pi^2}. \]
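The main identity of the proof can be tested on a computer. Since the probability that d² divides n is exactly ⌊N/d²⌋/N on Ω_N, the formula expressing the probability of being squarefree as a sum over d ≤ N^{1/2} weighted by μ(d) becomes an exact identity between integers after multiplying by N. The following Python sketch (an added illustration with hypothetical function names, not the author's code) compares this with a direct count and with the limit 6/π².

```python
import math


def mobius_sieve(D):
    """Return a list with mu[d] = Möbius function of d for 1 <= d <= D (mu[0] is unused)."""
    mu = [1] * (D + 1)
    is_prime = [True] * (D + 1)
    for p in range(2, D + 1):
        if is_prime[p]:
            for k in range(p, D + 1, p):
                if k > p:
                    is_prime[k] = False
                mu[k] = -mu[k]  # one more prime factor: flip the sign
            for k in range(p * p, D + 1, p * p):
                mu[k] = 0  # divisible by the square p^2
    return mu


def squarefree_count_direct(N):
    """Number of squarefree integers 1 <= n <= N, by striking out multiples of squares."""
    flags = [True] * (N + 1)
    flags[0] = False
    for d in range(2, math.isqrt(N) + 1):
        for k in range(d * d, N + 1, d * d):
            flags[k] = False
    return sum(flags)


def squarefree_count_mobius(N):
    """Same count, as the sum over d <= sqrt(N) of mu(d) * floor(N / d^2)."""
    D = math.isqrt(N)
    mu = mobius_sieve(D)
    return sum(mu[d] * (N // (d * d)) for d in range(1, D + 1))


if __name__ == "__main__":
    for N in (10**2, 10**4, 10**6):
        direct = squarefree_count_direct(N)
        assert direct == squarefree_count_mobius(N)
        print(N, direct / N, 6 / math.pi**2)
```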


Exercise 1.3.4. In this exercise, we explain a proof of Euler’s formula $\zeta(2) = \pi^2/6$.
(1) Assuming that
\[ \frac{\sin(\pi x)}{\pi x} = \prod_{n \geq 1} \Bigl( 1 - \frac{x^2}{n^2} \Bigr) \]
(another formula of Euler), find a heuristic proof of $\zeta(2) = \pi^2/6$. [Hint: First, express
the sum of the inverses of the roots of a polynomial (with non-zero constant term) in
terms of its coefficients.]
The following argument, due to Cauchy, can be seen as a way to make rigorous the
previous idea.
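(2) Show that for $n \geq 1$ and $x \in \mathbf{R} \setminus \pi\mathbf{Z}$, we have
\[ \frac{\sin(nx)}{(\sin x)^n} = \sum_{0 \leq m \leq n/2} (-1)^m \binom{n}{2m+1} \mathrm{cotan}(x)^{n - (2m+1)}. \]

(3) Let $m \geq 1$ be an integer and let $n = 2m + 1$. Show that
\[ \sum_{r=1}^{m} \mathrm{cotan}\Bigl( \frac{r\pi}{n} \Bigr)^2 = \frac{2m(2m - 1)}{6} \]
and
\[ \sum_{r=1}^{m} \frac{1}{\sin\bigl( \frac{r\pi}{n} \bigr)^2} = \frac{2m(2m + 2)}{6}. \]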
[Hint: Using (2), view the numbers $\mathrm{cotan}(r\pi/n)^2$ as the roots of a polynomial of degree $m$,
and use the formula for the sum of the roots of a polynomial.]
(4) Deduce that
\[ \frac{2m(2m - 1)}{6} < \sum_{k=1}^{m} \Bigl( \frac{2m + 1}{k\pi} \Bigr)^2 < \frac{2m(2m + 2)}{6}, \]
and conclude.
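The identities and inequalities of this exercise can be sanity-checked numerically before attempting a proof. The following Python sketch is an added illustration (the function names are hypothetical, and floating-point arithmetic only gives approximate equality): it evaluates the first sum of part (3), and rescales the inequalities of part (4) by (π/(2m+1))² to obtain lower and upper bounds for the partial sums of the series of 1/k².

```python
import math


def cot_squared_sum(m):
    """Sum of cotan(r*pi/n)^2 for 1 <= r <= m, where n = 2m + 1."""
    n = 2 * m + 1
    return sum(1 / math.tan(r * math.pi / n) ** 2 for r in range(1, m + 1))


def zeta2_bounds(m):
    """Bounds for the sum of 1/k^2 over 1 <= k <= m, from the inequalities of part (4)."""
    n = 2 * m + 1
    scale = (math.pi / n) ** 2
    lower = scale * 2 * m * (2 * m - 1) / 6
    upper = scale * 2 * m * (2 * m + 2) / 6
    return lower, upper


if __name__ == "__main__":
    for m in (1, 5, 50, 5000):
        partial = sum(1 / k**2 for k in range(1, m + 1))
        lower, upper = zeta2_bounds(m)
        print(m, cot_squared_sum(m), 2 * m * (2 * m - 1) / 6)
        print(m, lower, partial, upper, math.pi**2 / 6)
```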
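The proof of Proposition 1.3.3 above was written in probabilistic style, emphasizing
the connection with Theorem 1.3.1. It can be expressed more straightforwardly as a
sequence of manipulations with finite sums, using the formula
\[ \sum_{d^2 \mid n} \mu(d) = \begin{cases} 1 & \text{if } n \text{ is squarefree} \\ 0 & \text{otherwise,} \end{cases} \tag{1.4} \]
for $n \geq 1$ (which is implicit in our discussion) and the approximation
\[ \sum_{\substack{1 \leq n \leq N \\ d \mid n}} 1 = \frac{N}{d} + O(1) \]
for the number of integers in an interval which are divisible by some $d \geq 1$. This goes as
follows:
\[ \sum_{\substack{n \leq N \\ n \text{ squarefree}}} 1 = \sum_{n \leq N} \sum_{d^2 \mid n} \mu(d) = \sum_{d \leq \sqrt{N}} \mu(d) \sum_{\substack{n \leq N \\ d^2 \mid n}} 1 = \sum_{d \leq \sqrt{N}} \mu(d) \Bigl( \frac{N}{d^2} + O(1) \Bigr) = N \sum_{d \geq 1} \frac{\mu(d)}{d^2} + O(\sqrt{N}). \]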

Obviously, this is much shorter, although one needs to know the formula (1.4), which
was implicitly derived in the previous proof. (Readers who are already well-versed in
analytic number theory might find it useful to translate back and forth various estimates
written in probabilistic style in this book.) But there is something quite important to
be gained from the probabilistic viewpoint, which might be missed by reading the second
proof too quickly. Indeed, in formulas like (1.3) (or many others), the precise nature of the
underlying probability space $\Omega_N$ is quite hidden, as is customary in probability where
this is often not really relevant. In our situation, this naturally suggests studying similar
problems for different sequences of integer-valued random variables rather than taking
integers uniformly between 1 and $N$.
This has indeed been done, and in many different ways. But even before looking at
any example, we can predict that some new – interesting – phenomena will arise when
doing so. Indeed, even if our first proof of Proposition 1.3.3 was written in a very general
probabilistic language, it did use one special feature of $\Omega_N$: it only contains integers
$n \leq N$, and even more particularly, it does not contain any element divisible by $d^2$ for $d$
larger than $\sqrt{N}$. (More probabilistically, the probability $\mathbf{P}_N(d^2 \text{ divides } n)$ is then zero).
Now consider the following extension of the problem, which is certainly one of the
first that may come to mind beyond our initial setting: we fix an irreducible polynomial
$P \in \mathbf{Z}[X]$ of degree $m \geq 1$, and consider new Bernoulli random variables $B_{P,N}$ which are
indicators of the event that $P(n)$ is squarefree on $\Omega_N$ (instead of $n$ itself). Asking about the
limit of these random variables means asking for the “probability” that $P(n)$ is squarefree,
when $1 \leq n \leq N$. But although there is an elementary analogue of Theorem 1.3.1, it is
easy to see that this does not give enough control of
\[ \mathbf{P}_N(d^2 \text{ divides } P(n)) \]
when $d$ is “too large” compared with $N$. And this explains partly why, in fact, as of 2020
at least, there is not even a single irreducible polynomial $P \in \mathbf{Z}[X]$ of degree 4 or higher
for which it is known that $P(n)$ is squarefree infinitely often.
Exercise 1.3.5. (1) Let $k \geq 2$ be an integer. Compute the “probability”, in the
same sense as in Proposition 1.3.3, that an integer $n$ is $k$-free, i.e., that there is no integer
$m \geq 2$ such that $m^k$ divides $n$.
(2) Compute the “probability” that two integers $n_1$ and $n_2$ are coprime, in the sense
of taking the corresponding Bernoulli random variables on $\Omega_N \times \Omega_N$ and their limit as
$N \to +\infty$.
Exercise 1.3.6. Let $P \in \mathbf{Z}[X]$ be an irreducible polynomial of degree $m \geq 1$. For $q \geq 1$,
let $\pi_q$ be the projection from $\mathbf{Z}$ to $\mathbf{Z}/q\mathbf{Z}$ as before.
(1) Show that for any $q \geq 1$, the random variables $X_N(n) = \pi_q(P(n))$ converge in law
to a probability measure $\mu_{P,q}$ on $\mathbf{Z}/q\mathbf{Z}$. Is $\mu_{P,q}$ uniform?
(2) Find values of $T$, depending on $N$ and as large as possible, such that
\[ \mathbf{P}_N(P(n) \text{ is not divisible by } p^2 \text{ for } p \leq T) > 0. \]
How large should $T$ be so that this implies straightforwardly that
\[ \{n \geq 1 \mid P(n) \text{ is squarefree}\} \]
is infinite?
(3) Prove that the set
\[ \{n \geq 1 \mid P(n) \text{ is } (m + 1)\text{-free}\} \]
is infinite.
We conclude this section with another very important feature of Theorem 1.3.1 from
the probabilistic point of view, namely its link with independence. If $q_1$ and $q_2$ are positive
integers which are coprime, then the Chinese Remainder Theorem implies that the map
\[ \begin{cases} \mathbf{Z}/q_1 q_2 \mathbf{Z} \to \mathbf{Z}/q_1\mathbf{Z} \times \mathbf{Z}/q_2\mathbf{Z} \\ x \mapsto (x \ (\mathrm{mod}\ q_1),\ x \ (\mathrm{mod}\ q_2)) \end{cases} \]
is a bijection (in fact, a ring isomorphism). Under this bijection, the uniform probability
measure $\mu_{q_1 q_2}$ on $\mathbf{Z}/q_1 q_2 \mathbf{Z}$ corresponds to the product measure $\mu_{q_1} \otimes \mu_{q_2}$. In particular,
the random variables $x \mapsto x \ (\mathrm{mod}\ q_1)$ and $x \mapsto x \ (\mathrm{mod}\ q_2)$ on $\mathbf{Z}/q_1 q_2 \mathbf{Z}$ are independent.
The interpretation of this is that the random variables $\pi_{q_1}$ and $\pi_{q_2}$ on $\Omega_N$ are
asymptotically independent as $N \to +\infty$, in the sense that
\[ \lim_{N \to +\infty} \mathbf{P}_N(\pi_{q_1}(n) = a \text{ and } \pi_{q_2}(n) = b) = \frac{1}{q_1 q_2} = \Bigl( \lim_{N \to +\infty} \mathbf{P}_N(\pi_{q_1}(n) = a) \Bigr) \times \Bigl( \lim_{N \to +\infty} \mathbf{P}_N(\pi_{q_2}(n) = b) \Bigr) \]
for all $(a, b) \in \mathbf{Z}^2$. Intuitively, one would say that divisibility by $q_1$ and $q_2$ are independent,
and especially that divisibility by distinct primes are independent events. We summarize
this in the following extremely useful proposition:
Proposition 1.3.7. For $N \geq 1$, let $\Omega_N = \{1, \ldots, N\}$ with the uniform probability
measure $\mathbf{P}_N$. Fix a finite set $S$ of pairwise coprime integers.
As $N \to +\infty$, the vector $(\pi_q)_{q \in S}$ seen as random vector on $\Omega_N$ with values in
\[ X_S = \prod_{q \in S} \mathbf{Z}/q\mathbf{Z} \]
converges in law to a vector of independent and uniform random variables. In fact, for
any function
\[ f : X_S \to \mathbf{C} \]
we have
\[ \bigl| \mathbf{E}(f((\pi_q)_{q \in S})) - \mathbf{E}(f) \bigr| \leq \frac{2}{N} \|f\|_1. \tag{1.5} \]
Proof. This is just an elaboration of the previous discussion. Let r be the product
of the elements of S. Then the Chinese Remainder Theorem gives a ring-isomorphism
XS −→ Z/rZ such that the uniform measure µr on the right-hand side corresponds to
the product of the uniform measures on XS . Thus f can be identified with a function
g : Z/rZ −→ C, and its expectation to the expectation of g according to µr . By
Theorem 1.3.1, we get
\[
\bigl|\mathbf{E}(f((\pi_q)_{q\in S})) - \mathbf{E}(f)\bigr| = \bigl|\mathbf{E}(g(\pi_r)) - \mathbf{E}(g)\bigr| \le \frac{2\|g\|_1}{N},
\]
which is the desired result since f and g also have the same ℓ^1 norm. □
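The bound (1.5) is easy to test numerically. The following minimal Python sketch (the function names `joint_law` and `max_deviation` are ours, chosen only for illustration) takes S = {3, 5} and f the indicator function of a single point (a, b), so that ‖f‖1 = 1 with the convention ‖f‖1 = Σ|f(x)|, and compares the deviation from 1/(q1q2) with 2/N.

```python
from itertools import product


def joint_law(N, q1, q2):
    """Law of (n mod q1, n mod q2) when n is uniform on {1, ..., N}."""
    counts = {ab: 0 for ab in product(range(q1), range(q2))}
    for n in range(1, N + 1):
        counts[(n % q1, n % q2)] += 1
    return {ab: c / N for ab, c in counts.items()}


def max_deviation(N, q1, q2):
    """Largest |P_N(pi_q1 = a and pi_q2 = b) - 1/(q1 q2)| over all pairs (a, b)."""
    law = joint_law(N, q1, q2)
    return max(abs(p - 1 / (q1 * q2)) for p in law.values())


# Compare the observed deviation with the bound 2/N of (1.5) for S = {3, 5}.
for N in (10, 100, 1000, 10000):
    print(N, max_deviation(N, 3, 5), 2 / N)
```

In this experiment the deviation stays below 2/N for every N tried, without ever being exactly zero unless 15 divides N; this is the asymptotic (but not exact) independence discussed above.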
Remark 1.3.8. (1) Note that the random variables obtained by reduction modulo
two coprime integers are not exactly independent: it is not true in general that
PN (πq1 (n) = a and πq2 (n) = b) = PN (πq1 (n) = a) PN (πq2 (n) = b).
This is the source of many interesting aspects of probabilistic number theory where clas-
sical ideas and concepts of probability for sequences of independent random variables are
generalized or “tested” in a context where independence only holds in an asymptotic or
approximate sense.
(2) There is one subtle point that appears in quantitative applications of Theorem 1.3.1
and Proposition 1.3.7 that is worth mentioning. Given an integer q ≥ 1, certain
functions f on Z/qZ might have a large norm ‖f‖1, and yet they may have expressions
as linear combinations of functions f̃ on certain spaces Z/dZ, where d is a divisor of q,
which have much smaller norms ‖f̃‖1. Taking such possibilities into account and arguing
modulo d instead of modulo q may lead to stronger estimates for the error
\[
\mathbf{E}_N(f(\pi_q(n))) - \mathbf{E}(f)
\]
than those we have written down in terms of ‖f‖1. This is, for instance, especially clear
if we take f to be a non-zero constant, in which case the difference is actually 0, but ‖f‖1
is of size q.
One can incorporate formally these improvements by using a different norm than kf k1 ,
as we now explain.
Let q ≥ 1 be an integer. Let Φq be the set of functions ϕd,a : Z/qZ → C which are
characteristic functions of classes x ≡ a (mod d) for some positive divisor d | q and some
a ∈ Z/dZ (these are well-defined functions modulo q). In particular, the function ϕq,a is
just the delta function at a in Z/qZ, and ϕ1,0 is the constant function 1.
For an arbitrary function f : Z/qZ → C, let
\[
\|f\|_{c,1} = \inf\Bigl\{ \sum_{d\mid q}\ \sum_{a\ (\mathrm{mod}\ d)} |\lambda_{d,a}| \ :\ f = \sum_{d\mid q}\ \sum_{a\ (\mathrm{mod}\ d)} \lambda_{d,a}\,\varphi_{d,a} \Bigr\}.
\]
This defines a norm on the space of functions on Z/qZ (the subscript c refers to
congruences); the norm ‖f‖c,1 measures how simply the function f may be expressed as a linear
combination of indicator functions of congruence classes modulo divisors of q.³ Note that
‖f‖c,1 ≤ ‖f‖1, because one always has the representation
\[
f = \sum_{a\in \mathbf{Z}/q\mathbf{Z}} f(a)\,\varphi_{q,a}.
\]

Now the estimates (1.1) and (1.5) can be improved to
\[
\bigl|\mathbf{E}(f(X_N)) - \mathbf{E}(f)\bigr| \le \frac{2}{N}\,\|f\|_{c,1}, \tag{1.6}
\]
\[
\bigl|\mathbf{E}(f((\pi_q)_{q\in S})) - \mathbf{E}(f)\bigr| \le \frac{2}{N}\,\|f\|_{c,1}, \tag{1.7}
\]
respectively. Indeed, it suffices (using linearity and the triangle inequality) to check this
for f = ϕd,a for some divisor d | q and some a ∈ Z/dZ (with ‖ϕd,a‖c,1 replaced by 1 in
the right-hand side), in which case the difference (in the first case) is
\[
\frac{1}{N}\sum_{\substack{n\le N\\ n\equiv a\ (\mathrm{mod}\ d)}} 1 \;-\; \frac{1}{q}\sum_{\substack{x\ (\mathrm{mod}\ q)\\ x\equiv a\ (\mathrm{mod}\ d)}} 1
\;=\; \frac{1}{N}\sum_{\substack{n\le N\\ n\equiv a\ (\mathrm{mod}\ d)}} 1 \;-\; \frac{1}{d},
\]
which reduces to the case of a single element modulo d, for which we now apply
Theorem 1.3.1.
Another corollary of these elementary statements identifies the limiting distribution of
the valuations of integers. To state it, we denote by SN the identity random variable on the
probability space ΩN = {1, . . . , N} with uniform probability measure of Theorem 1.3.1.
Corollary 1.3.9. For p prime, let vp denote the p-adic valuation on Z. The random
vectors (vp(SN))p converge in law, in the sense of finite distributions, to a sequence (Vp)p
of independent geometric random variables with
\[
\mathbf{P}(V_p = k) = \Bigl(1 - \frac{1}{p}\Bigr)\frac{1}{p^k}
\]
for k ≥ 0. In other words, for any finite set of primes S and any non-negative
integers (kp)p∈S, we have
\[
\lim_{N\to+\infty} \mathbf{P}_N(v_p(S_N) = k_p \text{ for } p \in S) = \prod_{p\in S} \mathbf{P}(V_p = k_p).
\]
³ In terms of functional analysis, this means that this is a quotient norm of the ℓ^1 norm on the space
with basis Φq.
Proof. For a given prime p and integer k ≥ 0, the condition that vp(n) = k means
that n (mod p^(k+1)) belongs to the subset in Z/p^(k+1)Z of residue classes of the form b p^k
where 1 ≤ b ≤ p − 1; by Theorem 1.3.1, we therefore have
\[
\lim_{N\to+\infty} \mathbf{P}_N(v_p(S_N) = k) = \frac{p-1}{p^{k+1}} = \mathbf{P}(V_p = k).
\]
Proposition 1.3.7 then shows that this extends to any finite set of primes. □
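As an illustration of Corollary 1.3.9 for a single prime, the following short Python sketch compares the empirical law of vp(SN) with the geometric law of Vp. The helper names are ours and serve only this illustration.

```python
def valuation(n, p):
    """p-adic valuation of a positive integer n."""
    k = 0
    while n % p == 0:
        n //= p
        k += 1
    return k


def empirical_valuation_law(N, p, kmax):
    """List of P_N(v_p(S_N) = k) for k = 0, ..., kmax."""
    counts = [0] * (kmax + 1)
    for n in range(1, N + 1):
        k = valuation(n, p)
        if k <= kmax:
            counts[k] += 1
    return [c / N for c in counts]


def geometric_law(p, kmax):
    """List of P(V_p = k) = (1 - 1/p) / p^k for k = 0, ..., kmax."""
    return [(1 - 1 / p) / p**k for k in range(kmax + 1)]


N, p, kmax = 10**5, 3, 4
for k, (emp, geo) in enumerate(zip(empirical_valuation_law(N, p, kmax),
                                   geometric_law(p, kmax))):
    print(k, emp, geo)
```

For each fixed k, the two columns differ by at most 1/N, since the number of n ≤ N with vp(n) = k is the difference of two integer parts ⌊N/p^k⌋ − ⌊N/p^(k+1)⌋.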
Example 1.3.10. Getting quantitative estimates in this corollary is a good example
of Remark 1.3.8 (2). We illustrate this point in the simplest case, which will be used in
Section 2.2.
Consider two primes p ≠ q and the probability
\[
\mathbf{P}_N(v_p(S_N) = v_q(S_N) = 1).
\]
The indicator function ϕ of this event is naturally defined modulo p^2 q^2, and its norm ‖ϕ‖1
is the number of integers modulo p^2 q^2 that are multiples of pq, but not of p^2 or q^2. By
inclusion-exclusion, this means that ‖ϕ‖1 = (p − 1)(q − 1). On the other hand, we have
ϕ = ϕ1 − ϕ2 − ϕ3 + ϕ4 where
• The function ϕ1 is defined modulo pq as the indicator of the class 0;
• The function ϕ2 is defined modulo p^2 q as the indicator of the class 0;
• The function ϕ3 is defined modulo p q^2 as the indicator of the class 0;
• The function ϕ4 is defined modulo p^2 q^2 as the indicator of the class 0.
Hence, in the notation of Remark 1.3.8 (2), we have ‖ϕ‖c,1 ≤ 4; using this remark, or by
applying Theorem 1.3.1 four times, we get
\[
\mathbf{P}_N(v_p(S_N) = v_q(S_N) = 1) = \frac{1}{pq}\Bigl(1 - \frac{1}{p}\Bigr)\Bigl(1 - \frac{1}{q}\Bigr) + O\Bigl(\frac{1}{N}\Bigr),
\]
instead of having an error term of size pq/N, as suggested by a direct application of (1.1).
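The gain in Example 1.3.10 can be observed directly. In the Python sketch below (function names are ours), we compute the probability exactly for p = 5 and q = 7 and compare the error with 8/N, which is what (1.6) gives when ‖ϕ‖c,1 ≤ 4, and with the much larger quantity 2(p − 1)(q − 1)/N coming from ‖ϕ‖1.

```python
def prob_both_valuations_one(N, p, q):
    """P_N(v_p(S_N) = v_q(S_N) = 1) for distinct primes p and q."""
    hits = 0
    for n in range(p * q, N + 1, p * q):  # only multiples of pq can qualify
        if n % (p * p) != 0 and n % (q * q) != 0:
            hits += 1
    return hits / N


def limit_value(p, q):
    """The main term (1/pq)(1 - 1/p)(1 - 1/q)."""
    return (1 / (p * q)) * (1 - 1 / p) * (1 - 1 / q)


p, q = 5, 7
for N in (10**3, 10**4, 10**5):
    error = abs(prob_both_valuations_one(N, p, q) - limit_value(p, q))
    print(N, error, 8 / N, 2 * (p - 1) * (q - 1) / N)
```

The second column should never exceed the third, which is independent of p and q, whereas the last column grows with pq.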

1.4. Another prototype: the distribution of the Euler function


Although Proposition 1.3.7 is extremely simple, it is the only necessary arithmetic
ingredient in the proof of a result that is another prototype of probabilistic number the-
ory in our sense. This is a theorem proved by Schoenberg [108] in 1928, which therefore
predates the Erdős–Kac Theorem by about ten years (although Schoenberg phrased the
result quite differently, since this date is also before Kolmogorov’s formalization of prob-
ability theory).
The Euler “totient” function is defined for integers n ≥ 1 by ϕ(n) = |(Z/nZ)^×| (the
number of invertible residue classes modulo n). By the Chinese Remainder Theorem (see
Example C.1.8), this function is multiplicative, in the sense that ϕ(n1 n2) = ϕ(n1)ϕ(n2)
for n1 coprime to n2. Computing ϕ(p^k) = p^k − p^(k−1) = p^k (1 − 1/p) for p prime and k ≥ 1,
one deduces that
\[
\frac{\varphi(n)}{n} = \prod_{p\mid n}\Bigl(1 - \frac{1}{p}\Bigr)
\]
for all integers n ≥ 1 (where the product is over primes p dividing n).
Now define random variables FN on ΩN = {1, . . . , N} (with the uniform probability
measure as before) by
\[
F_N(n) = \frac{\varphi(n)}{n}.
\]
We will prove that the sequence (FN)N≥1 converges in law, and identify its limiting
distribution. For this purpose, let (Bp)p be a sequence of independent Bernoulli random
variables, indexed by primes, with
\[
\mathbf{P}(B_p = 1) = \frac{1}{p}, \qquad \mathbf{P}(B_p = 0) = 1 - \frac{1}{p}
\]
(such random variables will also occur prominently in the next chapter).
Proposition 1.4.1. The random variables FN converge in law to the random variable
given by
\[
F = \prod_{p}\Bigl(1 - \frac{B_p}{p}\Bigr),
\]
where the infinite product ranges over all primes and converges almost surely.
This proposition is not only a good illustration of limiting behavior of arithmetic ran-
dom variables, but the proof that we give, which emphasizes probabilistic methods, is an
excellent introduction to a number of techniques that will occur later in more complicated
contexts. Before we begin, note how the limiting random variable is highly non-generic,
and in fact retains some arithmetic information, since it is a product over primes. In
particular, although the arithmetic content does not go beyond Proposition 1.3.7, this
theorem is certainly not an obvious fact.
Proof. For M ≥ 1, we denote by FN,M the random variable on ΩN defined by
\[
F_{N,M}(n) = \prod_{\substack{p\mid n\\ p\le M}}\Bigl(1 - \frac{1}{p}\Bigr).
\]
It is natural to think of these as approximations to FN. On the other hand, for a fixed M,
these are finite products and hence easier to handle. We will use a fairly simple
“perturbation lemma” to prove the convergence in law of the sequence (FN)N≥1 from the
understanding of the behavior of FN,M. The lemma is precisely Proposition B.4.4, which
the reader should read now.⁴
First, we fix M ≥ 1. Since only primes p ≤ M occur in the definition of FN,M, it follows
from Proposition 1.3.7 that the random variables FN,M converge in law as N → +∞ to
the random variable
\[
F_M = \prod_{p\le M}\Bigl(1 - \frac{B_p}{p}\Bigr).
\]
Thus Assumption (1) in Proposition B.4.4 is satisfied. We proceed to check Assumption
(2), which concerns the approximation of FN by FN,M on average.
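We write EN,M = FN − FN,M. The expectation of |EN,M| is given by
\[
\mathbf{E}_N(|E_{N,M}|) = \frac{1}{N}\sum_{n\le N}\Bigl|\prod_{p\mid n}\Bigl(1 - \frac{1}{p}\Bigr) - \prod_{\substack{p\mid n\\ p\le M}}\Bigl(1 - \frac{1}{p}\Bigr)\Bigr|
\le \frac{1}{N}\sum_{n\le N}\Bigl|\prod_{\substack{p\mid n\\ p> M}}\Bigl(1 - \frac{1}{p}\Bigr) - 1\Bigr|.
\]
⁴ Note that a similar argument reappears in a much more sophisticated context in Chapter 5 (see
the proof of Theorem 5.2.2, page 86).
For a given n, expanding the product, we see that the quantity
\[
\Bigl|\prod_{\substack{p\mid n\\ p> M}}\Bigl(1 - \frac{1}{p}\Bigr) - 1\Bigr|
\]
is bounded by the sum of 1/d over integers d ≥ 2 which are squarefree, divide n, and
have all prime factors > M; let Dn be the set of such integers. In particular, we always
have M < d ≤ N if d ∈ Dn.
Thus
\[
\mathbf{E}_N(|E_{N,M}|) \le \frac{1}{N}\sum_{n\le N}\sum_{d\in D_n}\frac{1}{d}
\le \sum_{M<d\le N}\frac{1}{d}\times\frac{1}{N}\sum_{\substack{n\le N\\ n\equiv 0\ (\mathrm{mod}\ d)}} 1
\le \sum_{M<d\le N}\frac{1}{d^2} \le \frac{1}{M}
\]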
for all N > M. Assumption (2) of Proposition B.4.4 follows immediately, and we conclude
that (FN)N≥1 converges in law, and that its limit is the limit in law F of the random
variables FM as M → +∞. The last thing to check in order to finish the proof is that the
random product
\[
\prod_{p}\Bigl(1 - \frac{B_p}{p}\Bigr) \tag{1.8}
\]
over primes converges almost surely, and has the same law as F. The almost sure
convergence follows from Kolmogorov’s Three Series Theorem, applied to the logarithm of
this product, which is a sum
\[
\sum_{p} Y_p, \qquad Y_p = \log\Bigl(1 - \frac{B_p}{p}\Bigr)
\]
of independent random variables. Note that Yp ≤ 0 and that it only takes the values 0
(with probability 1 − 1/p) and log(1 − 1/p) (with probability 1/p), so that
\[
\mathbf{E}(Y_p) = \frac{1}{p}\log\Bigl(1 - \frac{1}{p}\Bigr) \sim -\frac{1}{p^2},
\]
\[
\mathbf{V}(Y_p) = \mathbf{E}(Y_p^2) - \mathbf{E}(Y_p)^2 = \frac{1}{p}\Bigl(\log\Bigl(1 - \frac{1}{p}\Bigr)\Bigr)^2 - \frac{1}{p^2}\Bigl(\log\Bigl(1 - \frac{1}{p}\Bigr)\Bigr)^2 \ll \frac{1}{p^3},
\]
which implies by Theorem B.10.1 that the random series Σ Yp converges almost surely,
and hence so does its exponential, which is the product (1.8). Now, from this convergence
almost surely, it is immediate that the law of the random product is also the law of F. □
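The key estimate of the proof, namely that the mean of |FN − FN,M| is at most 1/M for N > M, can be checked numerically. The Python sketch below is only an illustration under our own choice of names and parameters (N = 20000 and a few values of M); it computes the mean error exactly from a sieve of prime divisors.

```python
def distinct_prime_factors(N):
    """pf[n] is the list of distinct primes dividing n, for 0 <= n <= N."""
    pf = [[] for _ in range(N + 1)]
    for p in range(2, N + 1):
        if not pf[p]:  # no smaller prime divides p, hence p is prime
            for m in range(p, N + 1, p):
                pf[m].append(p)
    return pf


def mean_truncation_error(N, M, pf):
    """E_N(|F_N - F_{N,M}|), where F_{N,M} only uses the primes p <= M."""
    total = 0.0
    for n in range(1, N + 1):
        full = 1.0
        truncated = 1.0
        for p in pf[n]:
            full *= 1 - 1 / p
            if p <= M:
                truncated *= 1 - 1 / p
        total += abs(full - truncated)
    return total / N


N = 20000
pf = distinct_prime_factors(N)
for M in (2, 5, 10, 50):
    print(M, mean_truncation_error(N, M, pf), 1 / M)
```

The observed error should be well below 1/M, which is consistent with the fact that the proof uses fairly crude inequalities at each step.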
In Section 2.2 of the next chapter, we will state and prove a theorem due to Erdős
and Wintner that implies the existence of limiting distributions for much more general
multiplicative functions.
Remark 1.4.2. The distribution function of the arithmetic function n ↦ ϕ(n)/n is
the function defined for x ∈ R by
\[
f(x) = \mathbf{P}(F \le x).
\]
This function has been extensively studied, and is still the object of current research. It
is a concrete example of a function exhibiting unusual properties in real analysis: it was
proved by Schoenberg [108, 109] that f is continuous and strictly increasing, and by
Erdős [34] that it is purely singular, i.e., that there exists a set N of Lebesgue measure 0
in R such that P(F ∈ N) = 1; this means that the function f is differentiable for all x ∉ N,
with derivative equal to 0 (Exercise 1.4.4 explains the proof).
In Figure 1.1, we plot the “empirical” values of f coming from integers n ≤ 10^6.
Figure 1.1. Empirical plot of the distribution function of ϕ(n)/n for n ≤ 10^6.
In the next two exercises, we use the notation of Proposition 1.4.1.
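Exercise 1.4.3. Prove probabilistically that
\[
\lim_{N\to+\infty} \mathbf{E}_N(F_N) = \frac{1}{\zeta(2)}, \qquad
\lim_{N\to+\infty} \mathbf{E}_N(F_N^{-1}) = \prod_{p}\Bigl(1 + \frac{1}{p(p-1)}\Bigr) = \frac{\zeta(2)\zeta(3)}{\zeta(6)},
\]
where
\[
\zeta(s) = \prod_{p}(1 - p^{-s})^{-1}
\]
is the Riemann zeta function (see Corollary C.1.5 for the product expression). In other
words, we have
\[
\lim_{N\to+\infty} \frac{1}{N}\sum_{n\le N}\frac{\varphi(n)}{n} = \frac{1}{\zeta(2)}, \qquad
\lim_{N\to+\infty} \frac{1}{N}\sum_{n\le N}\frac{n}{\varphi(n)} = \frac{\zeta(2)\zeta(3)}{\zeta(6)}.
\]
Recover these formulas using Möbius inversion (as in the “direct” proof of
Proposition 1.3.3).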
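Before attempting the exercise, one may wish to see the two limits numerically. The following Python sketch is a sanity check only, not a proof: it averages ϕ(n)/n and n/ϕ(n) over n ≤ 10^5, using the classical values ζ(2) = π²/6 and ζ(6) = π⁶/945 together with a hard-coded numerical value of ζ(3). The names `phi_over_n_table` and `ZETA3` are ours.

```python
import math

ZETA3 = 1.2020569031595943  # numerical value of zeta(3) (Apery's constant)


def phi_over_n_table(N):
    """ratio[n] = phi(n)/n for 1 <= n <= N, as a product over primes p | n."""
    ratio = [1.0] * (N + 1)
    is_prime = [True] * (N + 1)
    for p in range(2, N + 1):
        if is_prime[p]:
            for m in range(p, N + 1, p):
                if m > p:
                    is_prime[m] = False
                ratio[m] *= 1 - 1 / p
    return ratio


N = 10**5
ratio = phi_over_n_table(N)
mean_ratio = sum(ratio[1:]) / N
mean_inverse = sum(1 / r for r in ratio[1:]) / N

zeta2 = math.pi**2 / 6
zeta6 = math.pi**6 / 945
print(mean_ratio, 1 / zeta2)
print(mean_inverse, zeta2 * ZETA3 / zeta6)
```

The two printed pairs should agree to several decimal places already for this moderate value of N.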
Exercise 1.4.4. (1) Prove that the support of the law of F is [0, 1]. [Hint: Use
Proposition B.10.8.]
By the Jessen–Wintner purity theorem (see, e.g. [20, Th. 3.26]), this fact implies that
the function f is purely singular (in the sense of Remark 1.4.2), provided there exists a
set N of Lebesgue measure 0 such that P(F ∈ N) > 0. In turn, by elementary properties
of absolutely continuous probability measures, this follows if there exists α > 0 and, for
any ε > 0, a Borel set Iε ⊂ [0, 1] such that
(1) We have P(F ∈ Iε) > α for all ε small enough,
(2) The Lebesgue measure of Iε tends to 0 as ε → 0.
The next questions will establish the existence of such sets. We define G = log(F), and
for M ≥ 2, we let GM denote the partial sum
\[
G_M = \sum_{p\le M}\log\Bigl(1 - \frac{B_p}{p}\Bigr).
\]
(2) Prove that for any δ > 0, we have
\[
\mathbf{P}(|G - G_M| > \delta) \ll \frac{1}{\delta M}
\]
for any M > 0.
(3) For any finite set T of primes p ≤ M, with characteristic function χT, prove that
\[
\mathbf{P}(B_p = \chi_T(p) \text{ for } p \le M) \gg \frac{1}{\log M}\times\prod_{p\in T}\frac{1}{p}.
\]
[Hint: Use the Mertens Formula (Proposition C.3.1).]
(4) Let TM be a set of subsets T of the set of primes p ≤ M, and let XM be the event
{ there exists T ∈ TM such that Bp = χT(p) for p ≤ M }.
Show that
\[
\mathbf{P}(X_M) \gg \frac{1}{\log M}\sum_{T\in \mathcal{T}_M}\ \prod_{p\in T}\frac{1}{p}.
\]
(5) Let δ > 0 be some auxiliary parameter and
\[
I_M = \bigcup_{T\in \mathcal{T}_M}\Bigl[\sum_{p\in T}\log(1 - 1/p) - \delta,\ \sum_{p\in T}\log(1 - 1/p) + \delta\Bigr].
\]
Show that the Lebesgue measure of IM is ≤ 2δ|TM| and that
\[
\mathbf{P}(G \in I_M) \gg \frac{1}{\log M}\sum_{T\in \mathcal{T}_M}\ \prod_{p\in T}\frac{1}{p} - \frac{1}{\delta M}.
\]
(6) Conclude by finding a choice of δ > 0 and TM such that the Lebesgue measure
of IM tends to 0 as M → +∞ whereas P(G ∈ IM) ≫ 1 for M large enough.

1.5. Generalizations
Theorem 1.3.1 and Proposition 1.3.7 are obviously very simple statements. However,
Proposition 1.4.1 has already shown that they should not be disregarded as trivial (and
our careful presentation should – maybe – not be considered as overly pedantic). A further
and even stronger sign in this direction is the fact that if one considers other natural
sequences of probability measures on the integers, instead of the uniform measures on
{1, . . . , N}, one quickly encounters very delicate questions, and indeed fundamental open
problems.
We have already mentioned the generalization related to polynomial values P(n) for
some fixed polynomial P ∈ Z[X]. Here are two other natural sequences of measures that
have been studied.
18
1.5.1. Primes. Maybe the most important variant consists in replacing the space
ΩN of positive integers n ≤ N by the subset ΠN of prime numbers p ≤ N (with the uniform
probability measure on these finite sets). According to the Prime Number Theorem
(Theorem C.3.3), there are about N/(log N) primes in ΠN. In this case, the qualitative
analogue of Theorem 1.3.1 is given by the theorem of Dirichlet, Hadamard and de la
Vallée-Poussin on primes in arithmetic progression (Theorem C.3.7), which implies that,
for any fixed q ≥ 1, the random variables πq on ΠN converge in law to the probability
measure on Z/qZ which is the uniform measure on the subset (Z/qZ)^× of invertible
residue classes (this change of the measure compared with the case of integers is simply
due to the obvious fact that at most one prime may be divisible by the integer q).
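The convergence of πq on ΠN to the uniform measure on (Z/qZ)^× is visible in small experiments. The Python sketch below (with our own helper names, and q = 10 as an arbitrary example) computes the proportion of primes p ≤ N in each residue class modulo q.

```python
from math import gcd, isqrt


def primes_up_to(N):
    """Sieve of Eratosthenes: the list of primes p <= N (for N >= 2)."""
    is_prime = [True] * (N + 1)
    is_prime[0] = is_prime[1] = False
    for p in range(2, isqrt(N) + 1):
        if is_prime[p]:
            for m in range(p * p, N + 1, p):
                is_prime[m] = False
    return [p for p in range(2, N + 1) if is_prime[p]]


def prime_residue_law(N, q):
    """law[a] = proportion of primes p <= N with p congruent to a modulo q."""
    primes = primes_up_to(N)
    counts = [0] * q
    for p in primes:
        counts[p % q] += 1
    return [c / len(primes) for c in counts]


q = 10
law = prime_residue_law(10**5, q)
for a in range(q):
    tag = "invertible" if gcd(a, q) == 1 else "not invertible"
    print(a, round(law[a], 5), tag)
```

The classes 1, 3, 7 and 9 each receive a proportion close to 1/ϕ(10) = 1/4, while the non-invertible classes only contain the primes 2 and 5.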
It is expected that a bound similar to (1.1) should be true. More precisely, there
should exist a constant C ≥ 0 such that
\[
\bigl|\mathbf{E}_{\Pi_N}(f(\pi_q)) - \mathbf{E}(f)\bigr| \le \frac{C(\log qN)^2}{\sqrt{N}}\,\|f\|_1, \tag{1.9}
\]
but that statement, once it is translated to more standard notation, is very close to the
Generalized Riemann Hypothesis for Dirichlet L-functions (which is Conjecture C.5.8).⁵
Even a similar bound with √N replaced by N^θ for any fixed θ > 0 is not known, and would
be a sensational breakthrough. Note that here the function f is defined on (Z/qZ)^× and
we have
\[
\mathbf{E}(f) = \frac{1}{\varphi(q)}\sum_{a\in(\mathbf{Z}/q\mathbf{Z})^\times} f(a),
\]
with ϕ(q) = |(Z/qZ)^×| denoting the Euler function (see Example C.1.8).
However, weaker versions of (1.9), amounting roughly to a version valid on average
over q ≤ √N, are known: the Bombieri–Vinogradov Theorem states that, for any constant
A > 0, there exists B > 0 such that we have
\[
\sum_{q\le \sqrt{N}/(\log N)^B}\ \max_{a\in(\mathbf{Z}/q\mathbf{Z})^\times}\Bigl|\mathbf{P}_{\Pi_N}(\pi_q = a) - \frac{1}{\varphi(q)}\Bigr| \ll \frac{1}{(\log N)^A}, \tag{1.10}
\]
where the implied constant depends only on A (see, e.g., [59, ch. 17]). In many
applications, this is essentially as useful as (1.9).
Exercise 1.5.1. Compute the “probability” that p − 1 be squarefree, for p prime.
(This can be done using the Bombieri–Vinogradov theorem, for instance.)

[Further references: Friedlander and Iwaniec [43]; Iwaniec and Kowalski [59].]

1.5.2. Random walks. A more recent (and extremely interesting) type of problem
arises from taking measures on Z derived from random walks on certain discrete groups.
For simplicity, we only consider a special case. Let m ≥ 2 be an integer, and let G =
SLm(Z) be the group of m × m matrices with integral coefficients and determinant 1.
This is a complicated infinite (countable) group, but it is known to have finite generating
sets. We fix one such set S, and assume that 1 ∈ S and S = S^(−1) for convenience. (A
well-known example is the set S consisting of 1 and the elementary matrices 1 + Ei,j for
1 ≤ i ≠ j ≤ m, where Ei,j is the matrix where only the (i, j)-th coefficient is non-zero,
and equal to 1, and their inverses 1 − Ei,j; the fact that these generate SLm(Z) can be
seen from the row-and-column operation reduction algorithm for such matrices).
⁵ It implies it for non-trivial Dirichlet characters.
The generating set S then defines a random walk (γn)n≥0 on G: let (ξn)n≥1 be a
sequence of independent S-valued random variables (defined on some probability space
Ω) such that P(ξn = s) = 1/|S| for all n and all s ∈ S. Then we let
\[
\gamma_0 = 1, \qquad \gamma_{n+1} = \gamma_n\,\xi_{n+1}.
\]
Fix some (non-constant) polynomial function F of the coefficients of an element g ∈ G,
so F ∈ Z[(gi,j)] (for instance F(g) = g1,1, or F(g) = Tr(g) for g = (gi,j) in G). We can then
study the analogue of Theorem 1.3.1 when applied to the random variables πq(F(γn)) as
n → +∞, or in other words, the distribution of F(g) modulo q, as g varies in G according
to the distribution of the random walk.
Let Gq = SLm(Z/qZ) be the finite special linear group. It is an elementary exercise,
using finite Markov chains and the surjectivity of the projection map G → Gq, to check
that the sequence of random variables (πq(F(γn)))n≥0 converges in law as n → +∞.
Indeed, its limit is a random variable Fq on Z/qZ defined by
\[
\mathbf{P}(F_q = x) = \frac{1}{|G_q|}\,\bigl|\{g \in G_q \mid F(g) = x\}\bigr|,
\]
for all x ∈ Z/qZ, where we view F as also defining a function F : Gq → Z/qZ. (In
other words, Fq is distributed like the direct image under F of the uniform measure on
Gq.)
In fact, elementary Markov chain theory (or direct computations) shows that there
exists a constant cq > 1 such that for any function f : Gq → C, we have
\[
\bigl|\mathbf{E}(f(\pi_q(\gamma_n))) - \mathbf{E}(f)\bigr| \le \frac{\|f\|_1}{c_q^{\,n}}, \tag{1.11}
\]
in analogy with (1.1), with
\[
\|f\|_1 = \sum_{g\in G_q}|f(g)|.
\]
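For small q, the convergence described by (1.11) can be computed exactly rather than simulated. The Python sketch below is an illustration under our own assumptions: we take m = 2, the generating set S made of 1 and the four elementary matrices 1 ± E1,2, 1 ± E2,1, and we propagate the exact law of πq(γn) on SL2(Z/qZ) step by step. All function names are ours.

```python
from itertools import product


def sl2_elements(q):
    """All matrices (a, b, c, d) over Z/qZ with ad - bc = 1."""
    return [(a, b, c, d) for a, b, c, d in product(range(q), repeat=4)
            if (a * d - b * c) % q == 1 % q]


def mat_mul(x, y, q):
    """Product of two 2x2 matrices modulo q, stored as tuples (a, b, c, d)."""
    a, b, c, d = x
    e, f, g, h = y
    return ((a * e + b * g) % q, (a * f + b * h) % q,
            (c * e + d * g) % q, (c * f + d * h) % q)


def generators(q):
    """Images modulo q of 1, 1 + E12, 1 - E12, 1 + E21, 1 - E21."""
    gens = [(1, 0, 0, 1), (1, 1, 0, 1), (1, -1, 0, 1), (1, 0, 1, 1), (1, 0, -1, 1)]
    return [tuple(x % q for x in g) for g in gens]


def walk_law(q, steps):
    """Exact law of pi_q(gamma_n) for n = steps, starting from gamma_0 = 1."""
    gens = generators(q)
    law = {g: 0.0 for g in sl2_elements(q)}
    law[(1 % q, 0, 0, 1 % q)] = 1.0
    for _ in range(steps):
        new = dict.fromkeys(law, 0.0)
        for g, mass in law.items():
            if mass > 0.0:
                for s in gens:  # gamma_{n+1} = gamma_n * xi_{n+1}
                    new[mat_mul(g, s, q)] += mass / len(gens)
        law = new
    return law


def l1_distance_to_uniform(law):
    """Sum over g of |P(pi_q(gamma_n) = g) - 1/|G_q||."""
    u = 1 / len(law)
    return sum(abs(p - u) for p in law.values())


for n in (0, 5, 10, 20, 40):
    print(n, l1_distance_to_uniform(walk_law(3, n)))
```

The printed distances should decay roughly geometrically in n, in the spirit of (1.11); the rate observed for q = 3 says nothing by itself about uniformity in q, which is the deep part of the discussion that follows.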

This is a very good result for a fixed q (note that the number of elements reached by
the random walk after n steps also grows exponentially with n). For applications, our
previous discussion already shows that it will be important to exploit (1.11) for q varying
with n, and uniformly over a wide range of q. This requires an understanding of the
variation of the constant cq with q. It is a rather deep fact (Property (τ) of Lubotzky
for SL2(Z), and Property (T) of Kazhdan for SLm(Z) if m ≥ 3) that there exists c > 1,
depending only on m, such that cq ≥ c for all q ≥ 1. Thus we do get a uniform bound
\[
\bigl|\mathbf{E}(f(\pi_q(\gamma_n))) - \mathbf{E}(f)\bigr| \le \frac{\|f\|_1}{c^{\,n}}
\]
valid for all n ≥ 1 and all q ≥ 1. This is related to the theory (and applications) of
expander graphs.
[Further references: Breuillard and Oh [21], Kowalski [65], [67].]

1.6. Outline of the book


Here is now a quick outline of the main results that we will prove in the text. For
detailed statements, we refer to the introductory sections of the corresponding chapters.
Chapter 2 presents first the Erdős–Wintner Theorem on the limiting distribution of
additive functions, before discussing the Erdős-Kac Theorem. These are good examples
to begin with, because they are the most natural starting point for probabilistic number
theory, and remain quite lively topics of contemporary research. This will lead to natural
appearances of the gaussian distribution as well as Poisson distributions.
Chapters 3 and 4 are concerned with the distribution of values of the Riemann zeta
function. We discuss results outside of the critical line (due to Bohr-Jessen, Bagchi and
Voronin) in the first of these chapters, and consider deeper results on the critical line (due
to Selberg, but following a recent presentation of Radziwill and Soundararajan) in the
second. The limit theorems one obtains can have rather unorthodox limiting distributions
(random Euler products, sometimes viewed as random functions, and – conjecturally –
also eigenvalues of random unitary matrices of large size).
Chapter 5 takes up a fascinating topic in the distribution of prime numbers: the
Chebychev bias, which attempts to compare the number of primes ≤ x in various residue
classes modulo a fixed integer q ≥ 1, and to see if some classes are “more equal” than
others. Our treatment follows the basic paper of Rubinstein and Sarnak.
In Chapter 6, we consider the distribution, in the complex plane, of polygonal paths
joining partial sums of Kloosterman sums, following work of the author and W. Sawin [79,
12]. Here we will use convergence in law in Banach spaces and some elementary prob-
ability in Banach spaces, and the limit object that arises will be a very special random
Fourier series.
In all of these chapters, we usually only discuss in detail one specific example of fairly
general results and theories: just the additive function ω(n) instead of more general
additive functions, just the Riemann zeta function instead of more general L-functions,
and specific families of exponential sums. However, we will briefly mention some of the
natural generalizations of the results presented.
Similarly, since our objective in this book is explicitly to write an introduction to the
topic of probabilistic number theory, we did not attempt to cover the most refined results
or the cutting-edge of research, or to discuss all possible topics. For the same reason, we
do not discuss in depth the applications of our main results, although we usually mention
at least some of them. Besides the discussion in Chapter 7 of other areas of interaction
between probability theory and number theory, the reader is invited to read the short
survey by Perret-Gentil [92].
At the end of the book are appendices that discuss the results of complex analysis,
probability theory and number theory that we use in the main chapters of the book. In
general, these are presented with some examples and detailed references, but without
complete proofs, at least when they can be considered to be standard parts of their
respective fields. We do not expect every reader to already be familiar with all of these
facts, and in order to make it possible to read the text relatively linearly, each chapter
begins with a list of the main results from these appendices that it will require, with
the corresponding reference (when no reference is given, this means that the result in
question will be presented within the chapter itself). We also note that the number-
theoretic results in Appendix C are stated in the “classical” style of analytic number
theory, without attempting to fit them to a probabilistic interpretation.

CHAPTER 2

Classical probabilistic number theory

Probability tools                                              Arithmetic tools

Definition of convergence in law (§ B.3)                       Integers in arithmetic progressions (§ 1.3)
Convergence in law using auxiliary parameters (prop. B.4.4)    Mertens and Chebychev estimates (prop. C.3.1)
Central Limit Theorem (th. B.7.2)                              Additive and multiplicative functions (§ C.1, C.2)
Gaussian random variables (§ B.7)
The method of moments (th. B.5.5)
Poisson random variables (§ B.9)

2.1. Introduction
This chapter contains some of the earliest theorems of probabilistic number theory.
We will prove the Erdős–Kac Theorem, but first we consider an even more classical
topic: the distribution of multiplicative and additive arithmetic functions. The essential
statements predate the Erdős–Kac Theorem, and can be taken to be the beginning of true
probabilistic number theory. As we will see, the limiting distributions that are obtained
are far from generic.

2.2. Distribution of arithmetic functions


The classical problem of the distribution of the values of arithmetic functions concerns
the limiting behavior of (arithmetic) random variables of the form g(SN ), where g is
an additive or multiplicative function, and SN is the identity random variable on the
probability space ΩN = {1, . . . , N} with uniform probability measure. We saw an example
in Proposition 1.4.1, but we will now prove a much more general statement.
In fact, in the additive case (see Section C.2 for the definition of additive functions),
there is a remarkable characterization of those additive functions g for which the se-
quence (g(SN ))N converges in law as N → +∞. Arithmetically, it may be surprising
that it depends on no more than Theorem 1.3.1 (or Corollary 1.3.9), and the simplest
upper-bound of the right order of magnitude for the numbers of primes less than a given
quantity (Chebychev’s estimate); this was not even needed for Proposition 1.4.1.
Theorem 2.2.1. Let g be a complex-valued additive function such that the series
\[
\sum_{|g(p)|\le 1}\frac{g(p)}{p}, \qquad \sum_{|g(p)|\le 1}\frac{|g(p)|^2}{p}, \qquad \sum_{|g(p)|> 1}\frac{1}{p}
\]
converge. Then the sequence of random variables (g(SN))N converges in law to the series
over primes
\[
\sum_{p} g(p^{V_p}), \tag{2.1}
\]
where (Vp)p is a sequence of independent geometric random variables with
\[
\mathbf{P}(V_p = k) = \Bigl(1 - \frac{1}{p}\Bigr)\frac{1}{p^k}
\]
for k ≥ 0.
Recall that, in terms of p-adic valuations of integers, we can write
\[
g(n) = \sum_{p} g(p^{v_p(n)})
\]
for any integer n ≥ 1. Since the sequence of p-adic valuations converges in law to the
sequence (Vp) (Corollary 1.3.9), the formula (2.1) for the limiting distribution appears
as a completely natural expression.
Proof. We write g = g♭ + g♯ where both summands are additive functions, and
\[
g^\flat(p^k) = \begin{cases} g(p) & \text{if } k = 1 \text{ and } |g(p)| \le 1,\\ 0 & \text{otherwise.}\end{cases}
\]
Thus g♯(p) = 0 for a prime p unless |g(p)| > 1. We denote by (Bp) the Bernoulli random
variable indicator function of the event {Vp = 1}; we have
\[
\mathbf{P}(B_p = 1) = \frac{1}{p}\Bigl(1 - \frac{1}{p}\Bigr).
\]
We will prove that the vectors (g♭(SN), g♯(SN)) converge in law to
\[
\Bigl(\sum_{p} g^\flat(p^{V_p}),\ \sum_{p} g^\sharp(p^{V_p})\Bigr),
\]
and the desired conclusion then follows by composing with the continuous addition map
C^2 → C (i.e., applying Proposition B.3.2).
We will apply Proposition B.4.4 to the random vectors GN = (g♭(SN), g♯(SN)) (with
values in C^2), with the approximations GN = GN,M + EN,M where
\[
G_{N,M} = \Bigl(\sum_{p\le M} g^\flat(p^{v_p(S_N)}),\ \sum_{p\le M} g^\sharp(p^{v_p(S_N)})\Bigr).
\]
Let M ≥ 1 be fixed. The random vectors GN,M are finite sums, and are expressed as
obviously continuous functions of the valuations vp of the elements of ΩN, for p ≤ M. Since
the vector of these valuations converges in law to (Vp)p≤M by Corollary 1.3.9, applying
composition with a continuous map (Proposition B.3.2 again), it follows that (GN,M)N
converges in law as N → +∞ to the vector
\[
\Bigl(\sum_{p\le M} g^\flat(p^{V_p}),\ \sum_{p\le M} g^\sharp(p^{V_p})\Bigr).
\]
It is therefore enough to verify that Assumption (2) of Proposition B.4.4 holds, and
we may do this separately for each of the two coordinates of the vector (by taking the
norm on C^2 in the proposition to be the maximum of the moduli of the two coordinates).
We begin with the second coordinate involving g♯. For any δ > 0, and 2 ≤ M < N,
we have
\begin{align}
\mathbf{P}_N\Bigl(\Bigl|\sum_{M<p\le N} g^\sharp(p^{v_p(S_N)})\Bigr| > \delta\Bigr)
&\le \sum_{M<p\le N}\mathbf{P}_N(v_p(S_N) \ge 2) + \sum_{\substack{M<p\le N\\ |g(p)|>1}}\mathbf{P}_N(v_p(S_N) = 1) \notag\\
&\le \sum_{p>M}\frac{1}{p^2} + \sum_{\substack{p>M\\ |g(p)|>1}}\frac{1}{p} \tag{2.2}
\end{align}
(simply because, if the sum is non-zero, at least one term must be non-zero, and the
probability of a union of countably many sets is bounded by the sum of the probabilities
of the individual sets).
Since the right-hand side converges to 0 as M → +∞ (by assumption), this verifies
that the variant discussed in Remark B.4.5 of the assumption of Proposition B.4.4 holds
(note that the series
\[
\sum_{p\le M} g^\sharp(p^{V_p})
\]
converges in law by a straightforward application of Kolmogorov’s Three Series Theorem,
which is stated in Remark B.10.2 – indeed, since |g♯| > 1, it suffices to observe that
\[
\sum_{p\le M}\mathbf{P}(|g^\sharp(p^{V_p})| > 2) < +\infty,
\]
which follows by arguing as in (2.2).)

We next handle g♭. We denote by BN,p the Bernoulli random variable indicator of the
event {vp(SN) = 1}, and define
\[
\varpi_N(p) = \mathbf{P}_N(B_{N,p} = 1) = \mathbf{P}_N(v_p(S_N) = 1).
\]
We also write ϖ(p) = P(Bp = 1). Note that
\[
\varpi_N(p) \le \frac{1}{p}, \qquad \varpi_N(p) = \frac{1}{p}\Bigl(1 - \frac{1}{p}\Bigr) + O\Bigl(\frac{1}{N}\Bigr) = \varpi(p) + O\Bigl(\frac{1}{N}\Bigr).
\]
The first coordinate of EN,M is
\[
H_{N,M} = \sum_{p>M} g^\flat(p^{v_p(S_N)}) = \sum_{p>M} g^\flat(p)\,B_{N,p}
\]
(which is a finite sum for each n ∈ ΩN, so convergence issues do not arise). We will prove
that
\[
\lim_{M\to+\infty}\ \limsup_{N\to+\infty}\ \mathbf{E}_N(|H_{N,M}|^2) = 0,
\]
which will allow us to conclude.


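By expanding the square, we have
\[
\mathbf{E}_N(|H_{N,M}|^2) = \mathbf{E}_N\Bigl(\Bigl|\sum_{p>M} g^\flat(p)\,B_{N,p}\Bigr|^2\Bigr)
= \sum_{p_1,p_2>M}\mathbf{E}_N\Bigl(g^\flat(p_1)\,\overline{g^\flat(p_2)}\,B_{N,p_1}B_{N,p_2}\Bigr). \tag{2.3}
\]
The contribution of the diagonal terms p1 = p2 to (2.3) is
\[
\sum_{p>M}|g^\flat(p)|^2\,\varpi_N(p) \le \sum_{p>M}\frac{|g^\flat(p)|^2}{p}.
\]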
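We have
\[
\mathbf{E}_N(B_{N,p_1}B_{N,p_2}) = \mathbf{P}_N(v_{p_1}(S_N) = v_{p_2}(S_N) = 1) = \varpi(p_1)\varpi(p_2) + O\Bigl(\frac{1}{N}\Bigr)
\]
(by Example 1.3.10), so that the non-diagonal terms become
\[
\sum_{\substack{p_1,p_2>M\\ p_1\ne p_2}} g^\flat(p_1)\,\overline{g^\flat(p_2)}\,\varpi(p_1)\varpi(p_2)
+ O\Bigl(\frac{1}{N}\sum_{\substack{p_1,p_2>M\\ p_1p_2\le N}}|g^\flat(p_1)|\,|g^\flat(p_2)|\Bigr). \tag{2.4}
\]
The first term S1 in this sum is
\[
S_1 = \Bigl|\sum_{p>M} g^\flat(p)\varpi(p)\Bigr|^2 - \sum_{p>M}|g^\flat(p)|^2\varpi(p)^2
\le \Bigl|\sum_{p>M} g^\flat(p)\varpi(p)\Bigr|^2 = \Bigl|\sum_{p>M}\frac{g^\flat(p)}{p}\Bigl(1 - \frac{1}{p}\Bigr)\Bigr|^2,
\]
where the right-hand side of the last equality is convergent because of the assumptions
of the theorem, so that the left-hand side is also finite.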
Next, since |g♭(p)| ≤ 1 for all primes, the second term S2 in (2.4) satisfies
\[
S_2 \ll \frac{1}{N}\sum_{\substack{p_1,p_2>M\\ p_1p_2\le N}} 1 \ll \frac{\log\log N}{\log N}
\]
for all M ≥ 1 by Chebychev’s estimate of Proposition C.3.1 (extended to products of two
primes as in Exercise C.3.2 (2)). Finally, from the convergence assumptions, this means
that
\[
\limsup_{N\to+\infty}\mathbf{E}_N(|H_{N,M}|^2) \ll \Bigl|\sum_{p>M}\frac{g^\flat(p)}{p}\Bigr|^2 + \sum_{p>M}\frac{|g^\flat(p)|^2}{p} \to 0
\]
as M → +∞, and this concludes the proof. □
Remark 2.2.2. The result above is due to Erdős [33]; the fact that the converse
assertion also holds (namely, that if the sequence (g(SN))N converges in law, then the
three series
\[
\sum_{|g(p)|\le 1}\frac{g(p)}{p}, \qquad \sum_{|g(p)|\le 1}\frac{|g(p)|^2}{p}, \qquad \sum_{|g(p)|> 1}\frac{1}{p}
\]
are convergent) is known as the Erdős–Wintner Theorem [36]. The reader may be
interested in thinking about proving this; see, e.g., [115, p. 327–328] for the details.
Although it is of course customary and often efficient to pass from additive
functions to multiplicative functions by taking the logarithm, this is not always possible. For
instance, the (multiplicative) Möbius function µ(n) does have the property that the
sequence (µ(SN))N converges in law to a random variable taking values 0, 1 and −1 with
probabilities which are equal, respectively, to
\[
1 - \frac{6}{\pi^2}, \qquad \frac{3}{\pi^2}, \qquad \frac{3}{\pi^2}.
\]
The limiting probability that µ(n) = 0 comes from the elementary Proposition 1.3.3, but
the fact that, among the values 1 and −1, the asymptotic probability is equal, is quite a
bit deeper: it turns out to be “elementarily” equivalent to the Prime Number Theorem
in the form
\[
\pi(x) \sim \frac{x}{\log x}
\]
as x → +∞ (see, e.g., [59, §2.1] for the proof). However, there is no additive
function log µ(n), so we cannot even begin to speak of its potential limiting distribution!
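The limiting law of µ(SN) described in this remark can be observed numerically. The following Python sketch (a simple sieve, with names of our choosing) computes the frequencies of the values 0, 1 and −1 of µ(n) for n ≤ 10^5 and compares them with 1 − 6/π², 3/π² and 3/π².

```python
import math


def mobius_table(N):
    """mu[n] for 0 <= n <= N (with mu[0] set to 0), computed by a sieve."""
    mu = [1] * (N + 1)
    mu[0] = 0
    is_prime = [True] * (N + 1)
    for p in range(2, N + 1):
        if is_prime[p]:
            for m in range(p, N + 1, p):
                if m > p:
                    is_prime[m] = False
                mu[m] = -mu[m]       # one sign change per prime divisor
            for m in range(p * p, N + 1, p * p):
                mu[m] = 0            # n divisible by p^2 is not squarefree
    return mu


def mobius_frequencies(N):
    """Frequencies of the values 0, 1, -1 of mu(n) for 1 <= n <= N."""
    mu = mobius_table(N)
    values = mu[1:]
    return {v: values.count(v) / N for v in (0, 1, -1)}


freq = mobius_frequencies(10**5)
print(freq[0], 1 - 6 / math.pi**2)
print(freq[1], 3 / math.pi**2)
print(freq[-1], 3 / math.pi**2)
```

Such an experiment is of course no substitute for the Prime Number Theorem, but it shows how closely the two non-zero values are already balanced for moderate N.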
Figure 2.1. The normalized number of prime divisors for n ≤ 10^10.

2.3. The Erdős–Kac Theorem


We begin by recalling the statement (see Theorem 1.1.1), in its probabilistic phrasing:
Theorem 2.3.1 (Erdős–Kac Theorem). For N ≥ 1, let ΩN = {1, . . . , N} with the
uniform probability measure PN. Let XN be the random variable
\[
n \mapsto \frac{\omega(n) - \log\log N}{\sqrt{\log\log N}}
\]
on ΩN for N ≥ 3. Then (XN)N≥3 converges in law to a standard gaussian random variable,
i.e., to a gaussian random variable with expectation 0 and variance 1.
Figure 2.1 shows a plot of the empirical density of XN for N = 10^10: one can see
something that could be the shape of the gaussian density appearing, but the fit is very
far from perfect (we will comment later why this could be expected).
The original proof of Theorem 2.3.1 is due to Erdős and Kac in 1939 [35]. We will
explain a proof following the work of Granville and Soundararajan [51] and of Billings-
ley [9, p. 394]. As usual, the presentation emphasizes the probabilistic nature of the
argument.
As before, we begin by explaining why the statement can be considered to be unsur-
prising. This is an elaboration of the type of heuristic argument that we used to justify
the limit in Theorem 2.2.1.
The arithmetic function ω is additive. Write
\[ \omega(n) = \sum_{p} B_p(n) \]

for n ∈ ΩN , where Bp is as usual the Bernoulli random variable on ΩN that is the char-
acteristic function of the event p | n. Using Proposition 1.3.7, the natural probabilistic
guess for a limit (if there was one) would be the series
\[ \sum_{p} B_p \]

where (Bp ) are independent Bernoulli random variables, as in Proposition 1.4.1. But this
series diverges almost surely: indeed, the series
\[ \sum_{p} \mathbf{E}(B_p) = \sum_{p} \frac{1}{p} \]
diverges by the basic Mertens estimate from prime number theory, namely
\[ \sum_{p \leq N} \frac{1}{p} = \log\log N + O(1) \]
for $N \geq 3$ (see Proposition C.3.1 in Appendix C), so that the divergence follows from
Kolmogorov’s Theorem B.10.1 (or indeed an application of the Borel–Cantelli Lemma,
see Exercise B.10.4).
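The Mertens estimate is easy to observe numerically: the difference between the partial sum and log log N stabilizes quickly. The sketch below is only an illustration; the helper names `primes_up_to` and `mertens_gap` are hypothetical, and the comparison value 0.2615 is the usual approximation of the Meissel–Mertens constant, which the estimate above hides in the O(1) term.

```python
import math

def primes_up_to(limit):
    """Sieve of Eratosthenes returning the list of primes p <= limit (limit >= 1)."""
    sieve = bytearray([1]) * (limit + 1)
    sieve[0:2] = b"\x00\x00"
    for p in range(2, int(limit ** 0.5) + 1):
        if sieve[p]:
            # cross out the multiples p*p, p*p + p, ... of p
            sieve[p * p :: p] = bytearray(len(range(p * p, limit + 1, p)))
    return [n for n in range(2, limit + 1) if sieve[n]]

def mertens_gap(N):
    """Return sum_{p <= N} 1/p - log log N, which should remain bounded as N grows."""
    return sum(1.0 / p for p in primes_up_to(N)) - math.log(math.log(N))

if __name__ == "__main__":
    for N in (10**3, 10**4, 10**5, 10**6):
        print(N, round(mertens_gap(N), 5))
```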
One can however refine the formula for ω by observing that $n \in \Omega_N$ has no prime divisor larger than N, so that we also have
\[ \omega(n) = \sum_{p \leq N} B_p(n) \tag{2.5} \]

for $n \in \Omega_N$. Correspondingly, we may expect that the probabilistic distribution of ω on $\Omega_N$ will be similar to that of the sum
\[ \sum_{p \leq N} B_p. \tag{2.6} \]

But the latter is a sum of independent (though not identically distributed) random
variables, and its asymptotic behavior is therefore well-understood. In fact, a simple case
of the Central Limit Theorem (see Theorem B.7.2) implies that the renormalized random
variables
\[ \frac{\displaystyle\sum_{p \leq N} B_p - \sum_{p \leq N} p^{-1}}{\sqrt{\displaystyle\sum_{p \leq N} p^{-1}(1 - p^{-1})}} \]

converge in law to a standard gaussian random variable. It is then to be expected that the arithmetic sums (2.5) are sufficiently close to (2.6) so that a similar renormalization of ω on $\Omega_N$ will lead to the same limit, and this is exactly the statement of Theorem 2.3.1 (by the Mertens Formula again).
We now begin the rigorous proof. We will prove convergence in law using the method
of moments, as explained in Section B.3 of Appendix B, specifically in Theorem B.5.5
and Remark B.5.9. This is definitely not the only way to confirm the heuristic above,
but it may be the simplest.
More precisely, we will proceed as follows:
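(1) We show, using Theorem 1.3.1, that for any fixed integer $k \geq 0$, we have
\[ \mathbf{E}_N(X_N^k) = \mathbf{E}(X_N^k) + o(1), \]
where, on the right-hand side, $(X_N)$ is the same renormalized random variable described above, built from the independent Bernoulli variables, namely
\[ X_N = \frac{Z_N - \mathbf{E}(Z_N)}{\sqrt{\mathbf{V}(Z_N)}} \]
with
\[ Z_N = \sum_{p \leq N} B_p. \tag{2.7} \]
(2) As we already mentioned, the Central Limit Theorem applies to the sequence $(X_N)$ built from the independent variables, and shows that it converges in law to a standard gaussian random variable $\mathcal{N}$.
(3) It follows that
\[ \lim_{N \to +\infty} \mathbf{E}_N(X_N^k) = \mathbf{E}(\mathcal{N}^k), \]
and hence, by the method of moments (Theorem B.5.5), we conclude that the arithmetic random variables $X_N$ converge in law to $\mathcal{N}$. (Interestingly, we do not need to know the value of the moments $\mathbf{E}(\mathcal{N}^k)$ for this argument to apply.)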
This sketch indicates that the Erdős-Kac Theorem is really a result of very general
nature that should be valid for many random integers, and not merely for a uniformly
chosen integer in ΩN . Note that only Step 1 has real arithmetic content. As we will see,
that arithmetic content is concentrated on two results: Theorem 1.3.1, which makes the
link with probability theory, and the Mertens estimate, which is only required in the form
of the divergence of the series
\[ \sum_{p} \frac{1}{p}, \]
(at least if one is ready to use its partial sums
\[ \sum_{p \leq N} \frac{1}{p} \]
for renormalization, instead of the asymptotic value log log N).


We now implement this strategy. As will be seen, some tweaks will be required. (The
reader is invited to check that omitting those tweaks leads, at the very least, to a much
more complicated-looking problem!).
Step 1 (Truncation). This is a classical technique that applies here, and is used to
shorten and simplify the sum in (2.7), in order to control the error terms in the next step.
We consider the random variables Bp on ΩN as above, i.e., Bp (n) = 1 if p divides n and
Bp (n) = 0 otherwise. Let
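\[ \sigma_N = \sum_{p \leq N} \frac{1}{p}. \]
We only need recall at this point that $\sigma_N \to +\infty$ as $N \to +\infty$. We then define
\[ Q = N^{1/(\log\log N)^{1/3}} \tag{2.8} \]
and
\[ \tilde\omega(n) = \sum_{\substack{p \mid n \\ p \leq Q}} 1 = \sum_{p \leq Q} B_p(n), \qquad \tilde\omega_0(n) = \sum_{p \leq Q} \Bigl( B_p(n) - \frac{1}{p} \Bigr), \]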
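viewed as random variables on $\Omega_N$. The point of this truncation is the following: first, for $n \in \Omega_N$, we have
\[ \tilde\omega(n) \leq \omega(n) \leq \tilde\omega(n) + (\log\log N)^{1/3}, \]
simply because if $\alpha > 0$ and if $p_1, \ldots, p_m$ are primes $> N^{\alpha}$ dividing $n \leq N$, then we get
\[ N^{m\alpha} \leq p_1 \cdots p_m \leq N, \]
and hence $m \leq \alpha^{-1}$ (we apply this with $\alpha = (\log\log N)^{-1/3}$, so that $Q = N^{\alpha}$). Second, for any $N \geq 1$ and any $n \in \Omega_N$, we get by definition of $\sigma_N$ the identity
\[ \tilde\omega_0(n) = \tilde\omega(n) - \sum_{p \leq Q} \frac{1}{p} = \omega(n) - \sigma_N + O((\log\log N)^{1/3}) \tag{2.9} \]
because the Mertens formula
\[ \sum_{p \leq x} \frac{1}{p} = \log\log x + O(1), \]
and the definition of $\sigma_N$ show that
\[ \sum_{p \leq Q} \frac{1}{p} = \sum_{p \leq N} \frac{1}{p} + O(\log\log\log N) = \sigma_N + O(\log\log\log N). \]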

Now define
\[ \tilde X_N(n) = \frac{\tilde\omega_0(n)}{\sqrt{\sigma_N}} \]
as random variables on $\Omega_N$. We will prove that $\tilde X_N$ converges in law to the standard gaussian $\mathcal{N}$. The elementary Lemma B.5.3 of Appendix B (applied using (2.9)) then shows that the random variables
\[ n \mapsto \frac{\omega(n) - \sigma_N}{\sqrt{\sigma_N}} \]
converge in law to $\mathcal{N}$. Finally, applying the same lemma one more time using the Mertens formula we obtain the Erdős-Kac Theorem.
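The comparison between ω and its truncated version can be verified by brute force for a moderate N. The sketch below assumes N ≥ 3 (so that log log N > 0) and uses the hypothetical name `truncated_omega_check`; it returns the truncation level Q, the quantity (log log N)^{1/3}, and the largest observed value of ω(n) minus its truncation.

```python
import math

def truncated_omega_check(N):
    """Compare omega(n) with its truncation to primes p <= Q, for all 1 <= n <= N."""
    bound = math.log(math.log(N)) ** (1.0 / 3.0)   # (log log N)^(1/3)
    Q = N ** (1.0 / bound)                          # truncation level, as in (2.8)
    omega = [0] * (N + 1)
    omega_trunc = [0] * (N + 1)
    for p in range(2, N + 1):
        if omega[p] == 0:                           # p is prime
            for m in range(p, N + 1, p):
                omega[m] += 1
                if p <= Q:
                    omega_trunc[m] += 1
    worst = max(omega[n] - omega_trunc[n] for n in range(1, N + 1))
    return Q, bound, worst

if __name__ == "__main__":
    Q, bound, worst = truncated_omega_check(10**5)
    print(f"Q = {Q:.1f}, (log log N)^(1/3) = {bound:.4f}, max difference = {worst}")
```

The maximal difference counts prime divisors larger than Q, so it should never exceed the printed bound.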
It remains to prove the convergence of $\tilde X_N$. We fix a non-negative integer k, and our target is to prove the limit
\[ \mathbf{E}_N(\tilde X_N^k) \to \mathbf{E}(\mathcal{N}^k) \tag{2.10} \]
as $N \to +\infty$. Once this is proved for all k, then the method of moments shows that $(\tilde X_N)$ converges in law to the standard normal random variable $\mathcal{N}$.
Remark 2.3.2. We might also have chosen to perform a truncation at $p \leq N^{\alpha}$ for some fixed $\alpha \in ]0, 1[$. However, in that case, we would need to adjust the value of α depending on k in order to obtain (2.10), and then passing from the truncated variables to the original ones would require some minor additional argument. Note that the function $(\log\log N)^{1/3}$ which is used to define the truncation could be replaced by any function going to infinity slower than $(\log\log N)^{1/2}$.

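Step 2 (Moment computation). We now begin the proof of (2.10). We use the definition of $\tilde\omega_0(n)$ and expand the k-th power in $\mathbf{E}_N(\tilde X_N^k)$ to derive
\[ \mathbf{E}_N(\tilde X_N^k) = \frac{1}{\sigma_N^{k/2}} \sum_{p_1 \leq Q} \cdots \sum_{p_k \leq Q} \mathbf{E}_N\Bigl( \Bigl(B_{p_1} - \frac{1}{p_1}\Bigr) \cdots \Bigl(B_{p_k} - \frac{1}{p_k}\Bigr) \Bigr) \]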

(where we omit for simplicity the subscripts N for the arithmetic random variables $B_{p_i}$). The crucial point is that the random variable
\[ \Bigl(B_{p_1} - \frac{1}{p_1}\Bigr) \cdots \Bigl(B_{p_k} - \frac{1}{p_k}\Bigr) \tag{2.11} \]
can be expressed as $f(\pi_q)$ for some modulus $q \geq 1$ and some function $f : \mathbf{Z}/q\mathbf{Z} \to \mathbf{C}$, so that the basic result of Theorem 1.3.1 may be applied to each summand.
To be precise, the value at $n \in \Omega_N$ of the random variable (2.11) only depends on the residue class x of n in $\mathbf{Z}/q\mathbf{Z}$, where q is the least common multiple of $p_1, \ldots, p_k$. In fact, this value is equal to f(x) where
\[ f(x) = \Bigl(\delta_{p_1}(x) - \frac{1}{p_1}\Bigr) \cdots \Bigl(\delta_{p_k}(x) - \frac{1}{p_k}\Bigr) \]
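with $\delta_{p_i}$ denoting the characteristic function of the residue classes modulo q which are 0 modulo $p_i$. It is clear that $|f(x)| \leq 1$, as a product of terms which are all $\leq 1$ in absolute value, and hence we have the bound
\[ \|f\|_1 \leq q \]
(this is extremely imprecise, but suffices for now). From this we get
\[ \Bigl| \mathbf{E}_N\Bigl( \Bigl(B_{p_1} - \frac{1}{p_1}\Bigr) \cdots \Bigl(B_{p_k} - \frac{1}{p_k}\Bigr) \Bigr) - \mathbf{E}(f) \Bigr| \leq \frac{2q}{N} \leq \frac{2Q^k}{N} \]
by Theorem 1.3.1.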
But by the definition of f, we also see that
\[ \mathbf{E}(f) = \mathbf{E}\Bigl( \Bigl(B_{p_1} - \frac{1}{p_1}\Bigr) \cdots \Bigl(B_{p_k} - \frac{1}{p_k}\Bigr) \Bigr) \]
where the random variables (Bp ) form a sequence of independent Bernoulli random vari-
ables with P(Bp = 1) = 1/p (the (Bp ) for p dividing q are realized concretely as the
characteristic functions δp on Z/qZ with uniform probability measure).
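Therefore we derive
\begin{align*}
\mathbf{E}_N(\tilde X_N^k) &= \frac{1}{\sigma_N^{k/2}} \sum_{p_1 \leq Q} \cdots \sum_{p_k \leq Q} \Bigl\{ \mathbf{E}\Bigl( \Bigl(B_{p_1} - \frac{1}{p_1}\Bigr) \cdots \Bigl(B_{p_k} - \frac{1}{p_k}\Bigr) \Bigr) + O(Q^k N^{-1}) \Bigr\} \\
&= \Bigl(\frac{\tau_N}{\sigma_N}\Bigr)^{k/2} \mathbf{E}(X_N^k) + O(Q^{2k} N^{-1}) \\
&= \Bigl(\frac{\tau_N}{\sigma_N}\Bigr)^{k/2} \mathbf{E}(X_N^k) + o(1)
\end{align*}
by our choice (2.8) of Q, where
\[ X_N = \frac{1}{\sqrt{\tau_N}} \sum_{p \leq Q} \Bigl( B_p - \frac{1}{p} \Bigr) \]
and
\[ \tau_N = \sum_{p \leq Q} \frac{1}{p}\Bigl(1 - \frac{1}{p}\Bigr) = \sum_{p \leq Q} \mathbf{V}(B_p). \]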

Step 3 (Conclusion). We now note that the version of the Central Limit Theorem which is recalled in Theorem B.7.2 applies to the random variables $(B_p)$, and implies precisely that $X_N$ converges in law to the standard gaussian $\mathcal{N}$. But moreover, the sequence $(X_N)$ satisfies the uniform integrability assumption in the converse of the method of moments (see Example B.5.7, applied to the variables $B_p - 1/p$, which are independent and bounded by 1), and hence we have in particular
\[ \mathbf{E}(X_N^k) \to \mathbf{E}(\mathcal{N}^k). \]
Since $\tau_N \sim \sigma_N$ by the Mertens formula, we deduce that $\mathbf{E}_N(\tilde X_N^k)$ converges also to $\mathbf{E}(\mathcal{N}^k)$, which was our desired goal (2.10).
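The quantities entering Step 3 can be computed explicitly for a given N. The following sketch (with the hypothetical names `primes_up_to` and `step3_quantities`, and N ≥ 3 assumed) returns the partial sum of 1/p over p ≤ N, the truncation level Q, and the variance sum over p ≤ Q; it is meant only to make the definitions concrete.

```python
import math

def primes_up_to(limit):
    """Sieve of Eratosthenes returning the list of primes p <= limit (limit >= 1)."""
    sieve = bytearray([1]) * (limit + 1)
    sieve[0:2] = b"\x00\x00"
    for p in range(2, int(limit ** 0.5) + 1):
        if sieve[p]:
            sieve[p * p :: p] = bytearray(len(range(p * p, limit + 1, p)))
    return [n for n in range(2, limit + 1) if sieve[n]]

def step3_quantities(N):
    """Return (sigma_N, Q, tau_N) with the definitions used in the proof."""
    primes = primes_up_to(N)
    sigma = sum(1.0 / p for p in primes)                       # sum of 1/p for p <= N
    Q = N ** (1.0 / math.log(math.log(N)) ** (1.0 / 3.0))      # truncation level (2.8)
    tau = sum((1.0 / p) * (1.0 - 1.0 / p) for p in primes if p <= Q)
    return sigma, Q, tau

if __name__ == "__main__":
    for N in (10**4, 10**5, 10**6):
        sigma, Q, tau = step3_quantities(N)
        print(N, round(sigma, 4), round(Q, 1), round(tau, 4), round(tau / sigma, 4))
```

The relation between the two sums is asymptotic, so the printed ratio should be expected to approach 1 only very slowly.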
Exercise 2.3.3. One can avoid appealing to the converse of the method of moments
by directly using the combinatorics involved in proofs of the Central Limit Theorem based
on moments, which directly imply the convergence of moments for (XN ). Find such a
proof in this special case. (See for instance [9, p. 391]; note that one must then know what the moments of gaussian random variables are; these are recalled in Proposition B.7.3.)
Exercise 2.3.4. Consider the probability spaces $\Omega_N^{\flat}$ consisting of integers $1 \leq n \leq N$ that are squarefree, with the uniform probability measure. Prove a version of the Erdős–Kac Theorem for the number of prime factors of an element of $\Omega_N^{\flat}$.
Exercise 2.3.5. For an integer $N \geq 1$, let m(N) denote the set of integers that occur in the multiplication table for integers $1 \leq n \leq N$:
\[ m(N) = \{ k = ab \mid 1 \leq a \leq N,\ 1 \leq b \leq N \} \subset \Omega_{N^2}. \]
Prove that $\mathbf{P}_{N^2}(m(N)) \to 0$, i.e., that
\[ \lim_{N \to +\infty} \frac{|m(N)|}{N^2} = 0. \]
This result is the basic statement concerning the “multiplication table” problem of Erdős; the precise asymptotic behavior of |m(N)| has been determined by K. Ford [41] (improving results of Tenenbaum): we have
\[ \frac{|m(N)|}{N^2} \asymp (\log N)^{-\alpha} (\log\log N)^{-3/2} \]
where
\[ \alpha = 1 - \frac{1 + \log\log 2}{\log 2}. \]
See also the work of Koukoulopoulos [64] for generalizations.
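For small N the set m(N) can simply be enumerated, which gives a first impression of the decay of its density. This is a brute-force sketch under the obvious memory limitations; the function name `multiplication_table_density` is an assumption made for the illustration.

```python
def multiplication_table_density(N):
    """Return (|m(N)|, |m(N)| / N^2) by listing all products ab with 1 <= a <= b <= N."""
    products = {a * b for a in range(1, N + 1) for b in range(a, N + 1)}
    return len(products), len(products) / N**2

if __name__ == "__main__":
    for N in (10, 100, 1000):
        size, density = multiplication_table_density(N)
        print(N, size, round(density, 4))
```

The decay is very slow, in line with the logarithmic factors in the asymptotic formula quoted above.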
Exercise 2.3.6. Let Ω(n) be the number of prime divisors of an integer $n \geq 1$, counted with multiplicity (so Ω(12) = 3).¹ Prove that
\[ \mathbf{P}_N\Bigl( \Omega(n) - \omega(n) \geq (\log\log N)^{1/4} \Bigr) \leq (\log\log N)^{-1/4}, \]
and deduce that the random variables
\[ n \mapsto \frac{\Omega(n) - \log\log N}{\sqrt{\log\log N}} \]
also converge in law to the standard gaussian $\mathcal{N}$.
Exercise 2.3.7. Try to prove the Erdős–Kac Theorem using the same “approximation” approach used in the proof of the Erdős–Wintner Theorem; what seems to go wrong? (This suggests, if it does not prove, that one really should use different tools.)
2.4. Convergence without renormalization
One important point that is made clear by the proof of the Erdős-Kac Theorem is that,
although one might think that a statement about the behavior of the number of prime
factors of integers tells us something about the distribution of primes (which are those
integers n with ω(n) = 1), the Erdős-Kac Theorem provides no such information. This
can be seen mechanically from the proof, where the truncation step means in particular
that primes are simply discarded unless they are smaller than the truncation level Q,
or intuitively from the fact that the statement itself implies that “most” integers of size
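about N have log log N prime factors. For instance, as $N \to +\infty$, we have
\[ \mathbf{P}_N\Bigl( |\omega(n) - \log\log N| > a\sqrt{\log\log N} \Bigr) \longrightarrow \mathbf{P}(|\mathcal{N}| > a) \leq \sqrt{\frac{2}{\pi}} \int_a^{+\infty} e^{-x^2/2}\, dx \leq e^{-a^2/4}, \]
where $\mathcal{N}$ is a standard gaussian random variable.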
π a
¹ We only use this function in this section and hope that confusion with $\Omega_N$ will be avoided.

Figure 2.2. The number of prime divisors for $n \leq 10^{10}$ (blue) compared with a Poisson distribution.

The problem lies in the normalization used to obtain a definite theorem of convergence
in law: this “crushes” to some extent the more subtle aspects of the distribution of
values of ω(n), especially with respect to extreme values. One can however still study
this function probabilistically, but one must use less generic methods, to go beyond the
“universal” behavior given by the Central Limit Theorem. There are at least two possible
approaches in this direction, and we now briefly survey some of the results.
Both methods have in common a switch in probabilistic focus: instead of looking
for a gaussian approximation of a normalized version of ω(n), one looks for a Poisson
approximation of the un-normalized function.
Recall (see also Section B.9 in the Appendix) that a random variable X with Poisson distribution with real parameter λ > 0 satisfies
\[ \mathbf{P}(X = k) = e^{-\lambda} \frac{\lambda^k}{k!} \]
for any integer $k \geq 0$. It turns out that an inductive computation using the Prime Number Theorem leads to the asymptotic formula
\[ \frac{1}{N} |\{ n \leq N \mid \omega(n) = k \}| \sim \frac{1}{(k-1)!} \frac{(\log\log N)^{k-1}}{\log N} = e^{-\log\log N} \frac{(\log\log N)^{k-1}}{(k-1)!}, \]
for any fixed integer $k \geq 1$. This suggests that a better probabilistic approximation to the
arithmetic function ω(n) on ΩN is a Poisson distribution with parameter log log N. The
Erdős-Kac Theorem would then be, in essence, a consequence of the simple fact that a sequence $(X_n)$ of Poisson random variables with parameters $\lambda_n \to +\infty$ has the property that
\[ \frac{X_n - \lambda_n}{\sqrt{\lambda_n}} \to \mathcal{N}, \tag{2.12} \]
where $\mathcal{N}$ is a standard gaussian random variable, as explained in Proposition B.9.1. Figure 2.2 shows the density of the values of ω(n) for $n \leq 10^{10}$ and the corresponding Poisson density. (The values of the probabilities for consecutive integers are joined by line segments for readability).
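A comparison in the spirit of Figure 2.2 can be reproduced at a smaller scale. The sketch below tabulates the empirical frequencies of ω(n) = k for n ≤ N next to the right-hand side of the asymptotic formula above (a Poisson weight with parameter log log N evaluated at k − 1). The names `omega_sieve` and `compare_with_poisson`, the range of k and the value of N are illustrative assumptions.

```python
import math
from collections import Counter

def omega_sieve(limit):
    """Return a list omega with omega[n] = number of distinct prime divisors of n."""
    omega = [0] * (limit + 1)
    for p in range(2, limit + 1):
        if omega[p] == 0:  # p is prime
            for m in range(p, limit + 1, p):
                omega[m] += 1
    return omega

def compare_with_poisson(N, kmax=8):
    """Rows (k, empirical frequency of omega = k, e^{-lam} lam^{k-1}/(k-1)!) with lam = log log N."""
    lam = math.log(math.log(N))
    freq = Counter(omega_sieve(N)[1:])
    rows = []
    for k in range(1, kmax + 1):
        empirical = freq.get(k, 0) / N
        predicted = math.exp(-lam) * lam ** (k - 1) / math.factorial(k - 1)
        rows.append((k, empirical, predicted))
    return rows

if __name__ == "__main__":
    for k, empirical, predicted in compare_with_poisson(10**5):
        print(k, round(empirical, 4), round(predicted, 4))
```

Since the asymptotic formula is stated for fixed k as N grows, only a rough agreement should be expected at this range.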

Remark 2.4.1. The fact that the approximation error in such a statement is typically of size comparable to $\lambda_n^{-1/2}$ explains why one can expect that the convergence to a gaussian in the Erdős–Kac Theorem should be extremely slow, since in that case the normalizing factor is of size log log N, and goes to infinity very slowly.
To give a rigorous meaning to these ideas of Poisson approximation of ω(n), one must
first give a precise definition, which can not be a straightforward convergence property,
because the parameter of the Poisson approximation is not fixed.
Harper [53] (to the author’s knowledge) was the first to implement explicitly such
an idea. He derived an explicit upper-bound for the total variation distance between a
truncated version of ω(n) on ΩN and a suitable Poisson random variable, namely between
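\[ \sum_{\substack{p \mid n \\ p \leq Q}} 1, \qquad \text{where } Q = N^{1/(3\log\log N)^2}, \]
and a Poisson random variable $\mathrm{Po}_N$ with parameter
\[ \lambda_N = \sum_{p \leq Q} \frac{1}{N} \Bigl\lfloor \frac{N}{p} \Bigr\rfloor \]
(so that the Mertens formula implies that $\lambda_N \sim \log\log N$).
Precisely, Harper proves that for any subset A of the non-negative integers, we have
\[ \Bigl| \mathbf{P}_N\Bigl( \sum_{\substack{p \mid n \\ p \leq Q}} 1 \in A \Bigr) - \mathbf{P}(\mathrm{Po}_N \in A) \Bigr| \ll \frac{1}{\log\log N}, \]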

and moreover that the decay rate $(\log\log N)^{-1}$ is best possible. This requires more arithmetic information than the proof of Theorem 2.3.1 (essentially some form
of sieve), but the arithmetic ingredients remain to a large extent elementary. On the
other hand, new ingredients from probability theory are involved, especially cases of
Stein’s Method for Poisson approximation.
A second approach starts from a proof of the Erdős–Kac Theorem due to Rényi and
Turán [100], which is the implementation of the Lévy Criterion for convergence in law.
Precisely, they prove that
\[ \mathbf{E}_N(e^{it\omega(n)}) = (\log N)^{e^{it}-1} (\Phi(t) + o(1)) \tag{2.13} \]
for any $t \in \mathbf{R}$ as $N \to +\infty$ (in fact, uniformly for $t \in \mathbf{R}$; note that the function here is 2π-periodic), with a factor Φ(t) given by
\[ \Phi(t) = \frac{1}{\Gamma(e^{it})} \prod_{p} \Bigl(1 - \frac{1}{p}\Bigr)^{e^{it}} \Bigl(1 + \frac{e^{it}}{p-1}\Bigr), \tag{2.14} \]
where the product over all primes is absolutely convergent. Recognizing that the term $(\log N)^{e^{it}-1}$ is the characteristic function of a Poisson random variable $\mathrm{Po}_N$ with parameter log log N, one can then obtain the Erdős-Kac Theorem by the same computation that leads to (2.12), combined with the continuity of Φ that shows that
\[ \Phi\Bigl( \frac{t}{\sqrt{\log\log N}} \Bigr) \longrightarrow \Phi(0) = 1 \]
as $N \to +\infty$.
The computation that leads to (2.13) is now interpreted as an instance of the Selberg–
Delange method (see [115, II.5, Th. 3] for the general statement, and [115, II.6, Th. 1]
for the special case of interest here).
It should be noted that the proof of (2.13) is quite a bit deeper than the proof of Theorem 2.3.1, and this is as it should be, because this formula contains precise information about the extreme values of ω(n), which we saw are not relevant to the Erdős-Kac Theorem. Indeed, taking t = π and observing that Φ(π) = 0 (because of the pole of the Gamma function), we obtain
\[ \frac{1}{N} \sum_{n \leq N} (-1)^{\omega(n)} = \mathbf{E}_N(e^{-i\pi\omega(n)}) = o\Bigl( \frac{1}{(\log N)^2} \Bigr). \]
It is well-known (as for the partial sums of the Möbius function, mentioned in Remark 2.2.2) that this implies elementarily the Prime Number Theorem
\[ \sum_{p \leq N} 1 \sim \frac{N}{\log N} \]
(see again [59, §2.1]).


The link between the formula (2.13) and Poisson distribution was noticed in joint
work with Nikeghbali [77]. Among other things, we remarked that it implies easily a
bound for the Kolmogorov–Smirnov distance between n 7→ ω(n) on ΩN and a Poisson
random variable PoN . Additional work with A. Barbour [5] leads to bounds in total vari-
ation distance, and to even better (but non-Poisson) approximations. Another suggestive
remark is that if we consider the independent random variables that appear in the proof
of the Erdős-Kac theorem, namely
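\[ X_N = \sum_{p \leq N} \Bigl( B_p - \frac{1}{p} \Bigr), \]
where $(B_p)$ is a sequence of independent Bernoulli random variables with $\mathbf{P}(B_p = 1) = 1/p$, then we have (by a direct computation) the following analogue of (2.13):
\[ \mathbf{E}(e^{itX_N}) = (\log N)^{e^{it}-1} \Bigl( \prod_{p} \Bigl(1 - \frac{1}{p}\Bigr)^{e^{it}} \Bigl(1 + \frac{e^{it}}{p-1}\Bigr) + o(1) \Bigr). \]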

It is natural to ask then if there is a similar meaning to the factor $1/\Gamma(e^{it})$ that also appears in (2.14). And there is: for $N \geq 1$, define $\ell_N$ as the random variable on the symmetric group $S_N$ that maps a permutation σ to the number of cycles in its canonical cyclic representation (where we count fixed points as cycles of length 1, so for instance we have $\ell_N(1) = N$). Then, giving $S_N$ the uniform probability measure, we have
\[ \mathbf{E}(e^{it\ell_N}) = N^{e^{it}-1} \Bigl( \frac{1}{\Gamma(e^{it})} + o(1) \Bigr), \tag{2.15} \]
corresponding to a Poisson distribution with parameter log N this time. This is not an isolated property: see the survey paper of Granville [48] for many significant analogies between (multiplicative) properties of integers and random permutations.²
Remark 2.4.2. Observe that (2.13) would be true if we had a decomposition
\[ \omega(n) = \mathrm{Po}_N(n) + Y_N(n) \]
as random variables on $\Omega_N$, where $Y_N$ is independent of $\mathrm{Po}_N$ and converges in law to a random variable with characteristic function Φ. However, this is not in fact the case, because Φ is not a characteristic function of a probability measure! (It is unbounded on R).
² Some readers might also enjoy the comic-book version [49].
Exercise 2.4.3. The goal of this exercise is to give a proof of the formula (2.15). We
assume basic familiarity with the notion of tensor product of vector spaces and symmetric
powers of vector spaces, and elementary representation theory of finite groups.
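For $N \geq 1$, we define $\ell_N$ as a random variable on $S_N$ as above.
(1) Show that the formula (2.15) follows from the exact expression
\[ \mathbf{E}(e^{it\ell_N}) = \prod_{j=1}^{N} \Bigl( 1 - \frac{1}{j} + \frac{e^{it}}{j} \Bigr) \]
valid for all $N \geq 1$ and all $t \in \mathbf{R}$. [Hint: Use the formula
\[ \frac{1}{\Gamma(z+1)} = \prod_{k \geq 1} \Bigl(1 + \frac{z}{k}\Bigr) \Bigl(1 + \frac{1}{k}\Bigr)^{-z} \]
which is valid for all $z \in \mathbf{C}$ (this is due to Euler).]
(2) Show that (1) is also equivalent with the formula
\[ \mathbf{E}(m^{\ell_N}) = \prod_{j=1}^{N} \Bigl( 1 - \frac{1}{j} + \frac{m}{j} \Bigr) \tag{2.16} \]
for all $N \geq 1$ and all integers $m \geq 0$.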


(3) Let $m \geq 0$ be a fixed integer. Let V be an m-dimensional complex vector space. For any $N \geq 1$, there is a homomorphism
\[ \varrho_N : S_N \to \mathrm{GL}(V \otimes \cdots \otimes V) = \mathrm{GL}(V^{\otimes N}) \]
(with N tensor factors) such that $\sigma \in S_N$ is sent to the unique automorphism of the tensor power $V^{\otimes N}$ which satisfies
\[ x_1 \otimes \cdots \otimes x_N \mapsto x_{\sigma(1)} \otimes \cdots \otimes x_{\sigma(N)} \]
for all $(x_1, \ldots, x_N) \in V^N$. (This is a representation of $S_N$ on the vector space $V^{\otimes N}$; note that this space has dimension $m^N$.)
(4) Show that for any $\sigma \in S_N$, the trace of the automorphism $\varrho_N(\sigma)$ of $V^{\otimes N}$ is equal to $m^{\ell_N(\sigma)}$.
(5) Deduce that the formula (2.16) holds. [Hint: Use the fact that for any representation $\varrho : G \to \mathrm{GL}(E)$ of a finite group on a finite-dimensional C-vector space, the average of the trace of $\varrho(g)$ over $g \in G$ is equal to the dimension of the space of vectors $x \in E$ that are invariant, i.e., that satisfy $\varrho(g)(x) = x$ for all $g \in G$ (see, e.g. [70, Prop. 4.3.1] for this); then identify this space to compute its dimension.]
(6) Deduce also from (2.16) that there exists a sequence $(B_j)_{j \geq 1}$ of independent Bernoulli random variables such that we have an equality in law
\[ \ell_N = B_1 + \cdots + B_N \]
for all $N \geq 1$, and $\mathbf{P}(B_j = 1) = 1/j$ for all $j \geq 1$. (This decomposition is often obtained by what is called the “Chinese Restaurant Process” in the probabilistic literature; see for instance [2, Example 2.4].)
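Before attempting the exercise, the formula (2.16) can be confirmed by exhaustive enumeration for small N, using exact rational arithmetic. This is a sanity check and not a proof; the helper names `cycle_count`, `mean_m_power` and `product_formula` are hypothetical, and permutations are represented as tuples of images of 0, ..., N − 1.

```python
import math
from fractions import Fraction
from itertools import permutations

def cycle_count(perm):
    """Number of cycles (fixed points included) of a permutation given as a tuple of images."""
    seen = [False] * len(perm)
    cycles = 0
    for start in range(len(perm)):
        if not seen[start]:
            cycles += 1
            j = start
            while not seen[j]:      # walk along the cycle containing start
                seen[j] = True
                j = perm[j]
    return cycles

def mean_m_power(N, m):
    """Exact average of m^(number of cycles) over the symmetric group S_N."""
    total = sum(Fraction(m) ** cycle_count(p) for p in permutations(range(N)))
    return total / math.factorial(N)

def product_formula(N, m):
    """Right-hand side of (2.16): product over 1 <= j <= N of (1 - 1/j + m/j)."""
    result = Fraction(1)
    for j in range(1, N + 1):
        result *= 1 - Fraction(1, j) + Fraction(m, j)
    return result

if __name__ == "__main__":
    for N in range(1, 7):
        print(N, [mean_m_power(N, m) == product_formula(N, m) for m in range(4)])
```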

2.5. Final remarks


Classically, the Erdős–Wintner and the Erdős–Kac Theorem (and related topics) are
presented in a different manner, which is well illustrated in the book of Tenenbaum [115, III.1, III.2]. This emphasizes the notion of density of sets of integers, namely quantities like
\[ \limsup_{N \to +\infty} \frac{1}{N} |\{ 1 \leq n \leq N \mid n \in A \}| \]
for a given set A, or the associated liminf, or the limit when it exists. Convergence in law is then often encapsulated in the existence of these limits for sets of the form
\[ A = \{ n \geq 1 \mid f(n) \leq x \}, \]
the limit F(x) (which is only assumed to exist for continuity points of F) being a “distribution function”, i.e., $F(x) = \mathbf{P}(X \leq x)$ for some real-valued random variable X.
Our emphasis on a more systematic probabilistic presentation has the advantage of
leading more naturally to the use of purely probabilistic techniques and insights. This will
be especially relevant when we consider random variables with values in more complicated
sets than R (as we will do in the next chapters), in which case the analogue of distribution
functions becomes awkward or simply doesn’t exist. Our point of view is also more natural
when we come to consider arithmetic random variables YN on ΩN that genuinely depend
on N, in the sense that there doesn’t exist an arithmetic function f such that YN is the
restriction of f to ΩN for all N > 1.
Among the many generalizations of the Erdős–Kac Theorem (and related results for
more general arithmetic functions), we wish to mention Billingsley’s work [8, Th. 4.1,
Example 1, p. 764] that obtains a functional version where the convergence in law is
towards Brownian motion (we refer to Billingsley’s very accessible text [7] for a first
presentation of Brownian motion, and to the book of Revuz and Yor [104] for a complete
modern treatment): for $0 \leq t \leq 1$, define a random variable $\tilde X_N$ on $\Omega_N$ with values in the Banach space C([0, 1]) of continuous functions on [0, 1] by putting $\tilde X_N(n)(0) = 0$ and
\[ \tilde X_N(n)\Bigl( \frac{\log\log k}{\log\log N} \Bigr) = \frac{1}{(\log\log N)^{1/2}} \Bigl( \sum_{\substack{p \mid n \\ p \leq k}} 1 - \log\log k \Bigr) \]
for $2 \leq k \leq N$, and by linear interpolation between such points. Then Billingsley proves that $\tilde X_N$ converges in law to Brownian motion as $N \to +\infty$.
Another very interesting limit theorem of Billingsley (see [6] and also [10, Th. I.4.5])
deals with the distribution of all the prime divisors of an integer n ∈ ΩN , and establishes
convergence in law of a suitable normalization of these. Precisely, let X be the compact topological space
\[ X = \prod_{k \geq 1} [0, 1]. \]
For all integers $n \geq 1$, denote by
\[ p_1 \geq p_2 \geq \cdots \geq p_{\Omega(n)} \]
the prime divisors of n, counted with multiplicity and in non-increasing order. Moreover, define $p_k = 1$ if $k > \Omega(n)$. Define then an X-valued random variable $D_N = (D_{N,k})_{k \geq 1}$ where
\[ D_{N,k}(n) = \frac{\log p_k}{\log n} \]
for $n \in \Omega_N$ (in other words, we have $p_k = n^{D_{N,k}(n)}$). Then Billingsley proved that the random variables $D_N$ converge, as $N \to +\infty$, to a measure on X, which is called the Poisson–Dirichlet distribution (with parameter 1). This measure is quite an interesting one, and occurs also (among other places) in a similar limit theorem for random variables
encoding the length of the cycles occurring in a random permutation, again ordered to
be non-increasing (another example of the connections between prime factorizations and
permutations which were mentioned in the previous section 2.4).
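To make the definition of the normalized prime divisors concrete, here is a small sketch computing the first coordinates of the sequence attached to a single integer n ≥ 2. The function names `prime_factors_desc` and `normalized_log_primes` are hypothetical, and trial division is used only because it is the simplest choice for small n.

```python
import math

def prime_factors_desc(n):
    """Prime divisors of n >= 2, with multiplicity, in non-increasing order."""
    factors = []
    p = 2
    while p * p <= n:
        while n % p == 0:
            factors.append(p)
            n //= p
        p += 1
    if n > 1:
        factors.append(n)       # the remaining cofactor is prime
    return sorted(factors, reverse=True)

def normalized_log_primes(n, length):
    """First `length` values log(p_k) / log(n), with p_k = 1 (value 0) beyond the last prime."""
    log_n = math.log(n)
    primes = prime_factors_desc(n)
    return [math.log(primes[k]) / log_n if k < len(primes) else 0.0 for k in range(length)]

if __name__ == "__main__":
    print(prime_factors_desc(360))
    print(normalized_log_primes(360, 8))
```

Since n is the product of the primes listed, the coordinates are non-increasing and sum to 1.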
A shorter proof of this limit theorem was given by Donnelly and Grimmett [27]. It
is based on the remark that the Poisson–Dirichlet measure is the image under a certain
continuous map of the natural measure on X under which the components of elements
of X form a sequence of independent uniformly distributed random variables on [0, 1];
arithmetically, it turns out to depend only on the estimate
\[ \sum_{p \leq N} \frac{\log p}{p} = \log N + O(1), \]
which is at the same level of depth as the Mertens formula (see Proposition C.3.1 (3)).
[Further references: Tenenbaum [115].]
CHAPTER 3

The distribution of values of the Riemann zeta function, I

Probability tools | Arithmetic tools
Definition of convergence in law (§ B.3) | Dirichlet series (§ A.4)
Kolmogorov’s Theorem for random series (th. B.10.1) | Riemann zeta function (§ C.4)
Weyl Criterion and Kronecker’s Theorem (§ B.6, th. B.6.5) | Fundamental theorem of arithmetic
Menshov–Rademacher Theorem (th. B.10.5) | Mean square of ζ(s) outside the critical line (prop. C.4.1)
Lipschitz test functions (prop. B.4.1) | Euler product (lemma C.1.4)
Support of a random series (prop. B.10.8) | Strong Mertens estimate and Prime Number Theorem (cor. C.3.4)

3.1. Introduction
The Riemann zeta function is defined first for complex numbers s such that Re(s) > 1, by means of the series
\[ \zeta(s) = \sum_{n \geq 1} \frac{1}{n^s}. \]
It plays an important role in prime number theory, arising because of the famous Euler product formula, which expresses ζ(s) as a product over primes, in this region: we have
\[ \zeta(s) = \prod_{p} (1 - p^{-s})^{-1} \tag{3.1} \]
if Re(s) > 1 (see Corollary C.1.5). By standard properties of series of holomorphic functions (note that $s \mapsto n^s = e^{s \log n}$ is entire for any $n \geq 1$), the Riemann zeta function is holomorphic for Re(s) > 1. It is of crucial importance however that it admits an analytic continuation to C − {1}, with furthermore a simple pole at s = 1 with residue 1.
This analytic continuation can be performed simultaneously with the proof of the functional equation: the function defined by
\[ \xi(s) = \pi^{-s/2} \Gamma(s/2) \zeta(s) \]
satisfies
\[ \xi(1 - s) = \xi(s), \]
and has simple poles at s = 0 and s = 1 (with residue 1 at s = 1 and, by the functional equation, residue −1 at s = 0). Since the inverse of the Gamma function is an entire function (Proposition A.3.2), the analytic continuation of the Riemann zeta function follows immediately.
However, for many purposes (including the results of this chapter), it is enough to
know that ζ(s) has analytic continuation for Re(s) > 0, and this can be checked quickly
using the following computation, based on summation by parts (Lemma A.1.1): using the
notation $\langle x \rangle$ for the fractional part of a real number x, namely the unique real number in [0, 1[ such that $x - \langle x \rangle \in \mathbf{Z}$,¹ we have for Re(s) > 1
\begin{align*}
\sum_{n \geq 1} \frac{1}{n^s} &= s \int_1^{+\infty} \Bigl( \sum_{1 \leq n \leq t} 1 \Bigr) t^{-s-1}\, dt \\
&= s \int_1^{+\infty} (t - \langle t \rangle) t^{-s-1}\, dt \\
&= s \int_1^{+\infty} t^{-s}\, dt - s \int_1^{+\infty} \langle t \rangle t^{-s-1}\, dt = \frac{s}{s-1} - s \int_1^{+\infty} \langle t \rangle t^{-s-1}\, dt.
\end{align*}
The rational function $s \mapsto s/(s-1)$ has a simple pole at s = 1 with residue 1. Also, since $0 \leq \langle t \rangle \leq 1$, the integral defining the function
\[ s \mapsto s \int_1^{+\infty} \langle t \rangle t^{-s-1}\, dt \]
is absolutely convergent, and therefore this function is holomorphic, for Re(s) > 0. The expression above then shows that the Riemann zeta function is meromorphic, with a simple pole at s = 1 with residue 1, for Re(s) > 0.
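The formula obtained by summation by parts can be tested numerically at a real point s > 1, where the value of ζ(s) is known independently. The sketch below integrates the fractional part exactly on each interval [n, n + 1] and truncates the integral at a finite height; the function names and the truncation parameter `terms` are assumptions made for this illustration, and the code is restricted to real s > 1 for simplicity.

```python
import math

def fractional_part_integral(s, terms):
    """Integral of <t> t^(-s-1) over [1, terms + 1] for real s > 1, summed interval by interval."""
    total = 0.0
    for n in range(1, terms + 1):
        a, b = float(n), float(n + 1)
        # On [n, n+1] we have <t> = t - n, with antiderivative t^(1-s)/(1-s) + n t^(-s)/s.
        total += (b ** (1 - s) - a ** (1 - s)) / (1 - s) + n * (b ** (-s) - a ** (-s)) / s
    return total

def zeta_via_summation_by_parts(s, terms=100000):
    """Approximate zeta(s) by s/(s-1) - s * (integral of <t> t^(-s-1) over [1, +infinity[)."""
    return s / (s - 1) - s * fractional_part_integral(s, terms)

if __name__ == "__main__":
    print(zeta_via_summation_by_parts(2.0), math.pi ** 2 / 6)
```

For s = 2 the neglected tail of the integral is at most of size 1/(2·terms²), so the two printed values should agree to many digits.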
Since ζ(s) is quite well-behaved for Re(s) > 1, and since the Gamma function is a
very well-known function, the functional equation ξ(1 − s) = ξ(s) shows that one can
understand the behavior of ζ(s) for s outside of the critical strip
\[ S = \{ s \in \mathbf{C} \mid 0 \leq \mathrm{Re}(s) \leq 1 \}. \]
The Riemann Hypothesis is a fundamental (still conjectural) statement about the Rie-
mann zeta function in the critical strip: it states that if s ∈ S satisfies ζ(s) = 0, then the
real part of s must be 1/2. Because holomorphic functions (with relatively slow growth, a
property true for ζ, although this requires some argument to prove) are essentially char-
acterized by their zeros (just like polynomials are!), the proof of this conjecture would
enormously expand our understanding of the properties of the Riemann zeta function.
Although it remains open, this should motivate our interest in the distribution of values
of the zeta function. Another motivation is that it contains crucial information about
primes, which will be very visible in Chapter 5.
We first focus our attention on a vertical line Re(s) = τ, where τ is a fixed real number such that τ > 1/2 (the case $\tau \leq 1$ will be the most interesting, but some statements do not require this assumption). We consider real numbers $T \geq 2$ and define the probability space $\Omega_T = [-T, T]$ with the uniform probability measure dt/(2T). We then view
\[ t \mapsto \zeta(\tau + it) \]
as a random variable $Z_{\tau,T}$ on $\Omega_T = [-T, T]$. These are arithmetically defined random variables. Do they have some specific, interesting, asymptotic behavior?
¹ A more standard notation would be {x}, but this clashes with the notation used for set constructions.

Figure 3.1. The modulus of ζ(s + it) for s in the square [3/4 − 1/8, 3/4 + 1/8] × [−1/8, 1/8], for t = 0, 21000, 58000 and 75000.

The answer to this question turns out to depend on τ , as the following first result of
Bohr and Jessen reveals:
Theorem 3.1.1 (Bohr-Jessen). Let τ > 1/2 be a fixed real number. Define $Z_{\tau,T}$ as the random variable $t \mapsto \zeta(\tau + it)$ on $\Omega_T$. There exists a probability measure $\mu_\tau$ on C such that $Z_{\tau,T}$ converges in law to $\mu_\tau$ as $T \to +\infty$. Moreover, the support of $\mu_\tau$ is compact if τ > 1, and is equal to C if $1/2 < \tau \leq 1$.
We will describe precisely the measure µτ in Section 3.2: it is a highly non-generic
probability distribution, whose definition (and hence properties) retains a significant
amount of arithmetic, in contrast with the Erdős–Kac Theorem, where the limit is a
very generic distribution.
Theorem 3.1.1 is in fact a direct consequence of a result due to Voronin [119] and
Bagchi [4], which extends it in a very surprising direction. Instead of fixing τ ∈]1/2, 1[
and looking at the distribution of the single values ζ(τ + it) as t varies, we consider for
such τ some radius r such that the disc
\[ D = \{ s \in \mathbf{C} \mid |s - \tau| \leq r \} \]
is contained in the interior of the critical strip, and we look for $t \in \mathbf{R}$ at the functions
\[ \zeta_{D,t} : D \to \mathbf{C}, \qquad s \mapsto \zeta(s + it) \]
which are “vertical translates” of the Riemann zeta function restricted to D. For each
T > 0, we view t 7→ ζD,t as a random variable (say ZD,T ) on ([−T, T], dt/(2T)) with values
in the space H(D) of functions which are holomorphic in the interior of D and continuous
on its boundary. Bagchi’s remarkable result is a convergence in law in this space, i.e.,
a functional limit theorem: there exists a probability measure ν on H(D) such that the
random variables ZD,T converge in law to ν as T → +∞. Computing the support of ν
(which is a non-trivial task) leads to a proof of Voronin’s Universality Theorem: for any
function $f \in H(D)$ which does not vanish on D, and for any ε > 0, there exists $t \in \mathbf{R}$ such that
\[ \|\zeta(\cdot + it) - f\|_\infty < \varepsilon, \]
where the norm is the supremum norm on D. In other words, up to arbitrarily small error, all holomorphic functions f (that do not vanish) can be seen by looking at some vertical translate of the Riemann zeta function!
We illustrate this fact in Figure 3.1, which presents contour plots of |ζ(s + it)| for
various values of t ∈ R, as functions of s in the square [3/4 − 1/8, 3/4 + 1/8] × [−1/8, 1/8].
Voronin’s Theorem implies that, for suitable t, such a picture will be indistinguishable
from that associated to any holomorphic function on this square that never vanishes
there.
We will prove the Bohr-Jessen-Bagchi theorems in the next section, and use in par-
ticular the computation of the support of Bagchi’s limiting distribution for translates of
the Riemann zeta function to prove Voronin’s universality theorem in Section 3.3.
3.2. The theorems of Bohr-Jessen and of Bagchi
We begin by stating a precise version of Bagchi’s Theorem. In the remainder of this
chapter, we denote by ΩT the probability space ([−T, T], dt/(2T)) for T > 1. We will
often write ET (·) and PT (·) for the corresponding expectation and probability.
Theorem 3.2.1 (Bagchi [4]). Let τ be such that 1/2 < τ . If 1/2 < τ < 1, let r > 0
be such that
D = {s ∈ C | |s − τ | 6 r} ⊂ {s ∈ C | 1/2 < Re(s) < 1},
and if τ > 1, let D be any compact subset of {s ∈ C | Re(s) > 1} such that τ ∈ D.
Consider the H(D)-valued random variables ZD,T defined by
t 7→ (s 7→ ζ(s + it))
on ΩT . Let (Xp )p be a sequence of independent random variables, indexed by the primes,
which are identically distributed, with distribution uniform on the unit circle S1 ⊂ C× .
Then we have convergence in law ZD,T −→ ZD as T → +∞, where ZD is the random
Euler product defined by Y
ZD (s) = (1 − p−s Xp )−1 .
p

In this theorem, the space H(D) is viewed as a Banach space (hence a metric space,
so that convergence in law makes sense) with the norm
\[ \|f\|_\infty = \sup_{z \in D} |f(z)|. \]

We can already see that Theorem 3.2.1 is (much) stronger than the convergence in
law component of Theorem 3.1.1, which we now prove assuming this result:
Corollary 3.2.2. Fix τ such that 1/2 < τ . As T → +∞, the random variables Zτ,T
of Theorem 3.1.1 converge in law to the random variable ZD (τ ), where D is either a disc
\[ D = \{s \in \mathbb{C} \mid |s - \tau| \le r\} \]
contained in the interior of the critical strip, if τ < 1, or any compact subset of {s ∈ C |
Re(s) > 1} such that τ ∈ D.
Proof. Fix D as in the statement. Tautologically, we have
\[ Z_{\tau,T} = Z_{D,T}(\tau), \]
or $Z_{\tau,T} = e_\tau \circ Z_{D,T}$, where
\[ e_\tau \colon H(D) \to \mathbb{C}, \qquad f \mapsto f(\tau), \]
is the evaluation map. This map is continuous on H(D), so it follows by composition
(Proposition B.3.2 in Appendix B) that the convergence in law ZD,T −→ ZD of Bagchi’s
Theorem implies the convergence in law of Zτ,T to the random variable eτ ◦ ZD , which is
simply ZD (τ ). 
In order to prove the final part of Theorem 3.1.1, and to derive Voronin’s universality
theorem, we need to understand the support of the limit ZD in Bagchi’s Theorem. We
will prove in Section 3.3:
Theorem 3.2.3 (Bagchi, Voronin). Let τ be such that 1/2 < τ < 1, and r such that
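\[ D = \{s \in \mathbb{C} \mid |s - \tau| \le r\} \subset \{s \in \mathbb{C} \mid 1/2 < \mathrm{Re}(s) < 1\}. \]
The support of $Z_D$ contains
\[ H(D)^\times = \{f \in H(D) \mid f(z) \ne 0 \text{ for all } z \in D\} \]
and is equal to $H(D)^\times \cup \{0\}$.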
In particular, for any function $f \in H(D)^\times$, and for any ε > 0, there exists t ∈ R such
that
\[ \sup_{s \in D} |\zeta(s + it) - f(s)| < \varepsilon. \tag{3.2} \]

It is then obvious that if 1/2 < τ < 1, the support of the Bohr-Jessen random variable
ZD (τ ) is equal to C.
Exercise 3.2.4. Prove that the support of the Bohr-Jessen random variable ZD (1) is
also equal to C.
We now begin the proof of Theorem 3.2.1 by giving some intuition for the result
and in particular for the shape of the limiting distribution. Indeed, this very elementary
argument will suffice to prove Bagchi’s Theorem in the case τ > 1. This turns out to be
similar to the intuition behind the Erdős-Kac Theorem. We begin with the Euler product
\[ \zeta(s + it) = \prod_p (1 - p^{-s-it})^{-1}, \]

which is valid for Re(s) > 1. We can express this also (formally, we “compute the
logarithm”, see Proposition A.2.2 (2)) in the form
\[ \zeta(s + it) = \exp\Bigl(-\sum_p \log(1 - p^{-s-it})\Bigr). \tag{3.3} \]
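As a numerical sanity check of the Euler product and of (3.3) in the region of absolute convergence, the following Python sketch compares a truncated Euler product with the exponential of the truncated sum of logarithms. The function names, the truncation to primes up to 10^4 and the sample points are illustrative assumptions and do not come from the text.

```python
import cmath
import math


def primes_up_to(n):
    """Return the list of primes <= n (for n >= 2) with a sieve of Eratosthenes."""
    sieve = bytearray([1]) * (n + 1)
    sieve[0:2] = b"\x00\x00"
    for i in range(2, math.isqrt(n) + 1):
        if sieve[i]:
            sieve[i * i :: i] = bytearray(len(range(i * i, n + 1, i)))
    return [i for i, flag in enumerate(sieve) if flag]


def euler_product(s, t, primes):
    """Truncated Euler product: product over the given primes of (1 - p^{-s-it})^{-1}."""
    z = complex(s) + 1j * t
    value = 1 + 0j
    for p in primes:
        value /= 1 - p ** (-z)
    return value


def exp_of_log_sum(s, t, primes):
    """Right-hand side of (3.3), truncated to the given primes (principal logarithm)."""
    z = complex(s) + 1j * t
    return cmath.exp(-sum(cmath.log(1 - p ** (-z)) for p in primes))


if __name__ == "__main__":
    primes = primes_up_to(10_000)
    # Both sides of (3.3), truncated to the same primes, agree up to rounding.
    print(abs(euler_product(1.5, 7.0, primes) - exp_of_log_sum(1.5, 7.0, primes)))
    # At s = 2, t = 0 the truncated product is slightly below zeta(2) = pi^2 / 6.
    print(math.pi ** 2 / 6 - euler_product(2, 0, primes).real)
```

The comparison only makes sense for Re(s) > 1; in the critical strip the product does not converge absolutely, which is exactly the difficulty discussed in the text.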

This displays the Riemann zeta function on ΩT as the exponential of a sum involving
the sequence (indexed by primes) of random variables (Xp,T )p such that
\[ X_{p,T}(t) = p^{-it}, \]
each taking value in the unit circle S1 . To understand how the zeta function will behave
statistically on ΩT , the first step is to understand the limiting behavior of this sequence.
This has a very simple answer:
Proposition 3.2.5. For T > 0, let XT = (Xp,T )p be the sequence of random variables
on ΩT given by
\[ t \mapsto (p^{-it})_p. \]
Then XT converges in law as T → +∞ to a sequence X = (Xp )p of independent random
variables, each of which is uniformly distributed on S1 .
Bagchi’s Theorem is therefore to be understood as saying that we can “pass to the
limit” in the formula (3.3) to obtain a convergence in law of ζ(s + it), for s ∈ D, to
\[ \exp\Bigl(-\sum_p \log(1 - p^{-s} X_p)\Bigr), \]
viewed as a random function.


This sketch is of course incomplete in general, the foremost objection being that we
are especially interested in the zeta function outside of the region of absolute convergence,
so the meaning of (3.3) is unclear. But we will see that nevertheless enough connections
remain to carry the argument through.
We isolate the crucial part of the proof of Proposition 3.2.5 as a lemma, since we will
also use it in the proof of Selberg’s Theorem in the next chapter (see Section 4.2).
Lemma 3.2.6. Let r > 0 be a real number. We have
\[ |E_T(r^{-it})| \le \min\Bigl(1, \frac{1}{T|\log r|}\Bigr). \tag{3.4} \]
In particular, if $r = n_1/n_2$ for some positive integers $n_1 \ne n_2$, then we have
\[ E_T(r^{-it}) \ll \min\Bigl(1, \frac{\sqrt{n_1 n_2}}{T}\Bigr) \tag{3.5} \]
where the implied constant is absolute.
Proof of Lemma 3.2.6. Since $|r^{-it}| = 1$, we see that the expectation always has modulus ≤ 1.
If r ≠ 1, then we get
\[ E_T(r^{-it}) = \frac{1}{2T}\Bigl[\frac{i}{\log r}\, r^{-it}\Bigr]_{-T}^{T} = \frac{i(r^{-iT} - r^{iT})}{2T \log r}, \]
which has modulus at most $|\log r|^{-1} T^{-1}$, hence the first bound holds.
Assume now that $r = n_1/n_2$ with $n_1 \ne n_2$ positive integers. Assume that $n_2 > n_1 \ge 1$.
Then $n_2 \ge n_1 + 1$, and hence
\[ \Bigl|\log\frac{n_1}{n_2}\Bigr| \ge \log\Bigl(1 + \frac{1}{n_1}\Bigr) \gg \frac{1}{n_1} \ge \frac{1}{\sqrt{n_1 n_2}}. \]
If $n_2 < n_1$, we exchange the roles of $n_1$ and $n_2$, and since both sides of the bound (3.5)
are symmetric in terms of $n_1$ and $n_2$, the result follows. □
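The bounds of Lemma 3.2.6 can be checked numerically. The sketch below uses the closed form $E_T(r^{-it}) = \sin(T \log r)/(T \log r)$, which follows from the computation in the proof, and compares it with a midpoint-rule quadrature. The function names and the number of quadrature steps are illustrative assumptions; the explicit constant 2 used for (3.5) in the usage note is one admissible value obtained from $\log(1 + x) \ge x/2$ for $0 \le x \le 1$, not a constant stated in the text.

```python
import math


def expectation_exact(r, T):
    """Closed form E_T(r^{-it}) = sin(T log r) / (T log r), equal to 1 when r = 1."""
    x = T * math.log(r)
    return 1.0 if x == 0 else math.sin(x) / x


def expectation_numeric(r, T, steps=20_000):
    """Midpoint-rule approximation of (1/2T) * integral of r^{-it} over [-T, T]."""
    log_r = math.log(r)
    h = 2 * T / steps
    total = 0j
    for k in range(steps):
        t = -T + (k + 0.5) * h
        # r^{-it} = exp(-i t log r)
        total += complex(math.cos(t * log_r), -math.sin(t * log_r))
    return total * h / (2 * T)


def bound_3_4(r, T):
    """Right-hand side of (3.4)."""
    return 1.0 if r == 1 else min(1.0, 1.0 / (T * abs(math.log(r))))


if __name__ == "__main__":
    for n1, n2, T in [(2, 3, 10.0), (5, 4, 50.0), (7, 8, 200.0)]:
        r = n1 / n2
        print(n1, n2, T, expectation_exact(r, T), bound_3_4(r, T))
```

For instance, with $n_1 = 7$, $n_2 = 8$ and T = 200, the printed expectation is bounded by $1/(T|\log r|)$ and hence also by $2\sqrt{n_1 n_2}/T$.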
Proof of Proposition 3.2.5. It is convenient here to view the sequences $(X_{p,T})_p$
and $(X_p)_p$ as two random variables on ΩT, taking values in the infinite product
\[ \widehat{S}^1 = \prod_p S^1 \]
of copies of the unit circle indexed by primes. Note that $\widehat{S}^1$ is a compact abelian group
(with componentwise product).
In this interpretation, the limit (or more precisely the law of $(X_p)$) is simply the
probability Haar measure on the group $\widehat{S}^1$ (see Section B.6). This allows us to prove
convergence in law using the well-known Weyl Criterion: the statement of the proposition
is equivalent to the property that
\[ \lim_{T \to +\infty} E_T(\chi(X_{p,T})) = 0 \tag{3.6} \]
for any non-trivial continuous unitary character $\chi \colon \widehat{S}^1 \to S^1$. An elementary property
of compact groups shows that for any such character there exists a finite non-empty
subset S of primes, and for each p ∈ S some integer $m_p \in \mathbb{Z} - \{0\}$, such that
\[ \chi(z) = \prod_{p \in S} z_p^{m_p} \]
for any $z = (z_p)_p \in \widehat{S}^1$ (see Example B.6.2(2)). We then have by definition
\[ E_T(\chi(X_{p,T})) = \frac{1}{2T} \int_{-T}^{T} \prod_{p \in S} p^{-itm_p}\,dt = \frac{1}{2T} \int_{-T}^{T} r^{-it}\,dt \]
where r > 0 is the rational number given by
\[ r = \prod_{p \in S} p^{m_p}. \]
Since we have r ≠ 1 (because S is not empty, $m_p \ne 0$, and integers factor uniquely into primes), we obtain $E_T(\chi(X_{p,T})) \to 0$
as T → +∞ from (3.4). □
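The Weyl Criterion argument can be illustrated in a few lines: a character is encoded by its exponents $m_p$, the rational number r is computed exactly, and the average is evaluated with the closed form from the proof of Lemma 3.2.6. This is only a sketch; the dictionary encoding of characters and the function names are assumptions made for illustration.

```python
from fractions import Fraction
import math


def character_ratio(exponents):
    """Return r = prod_p p^{m_p} exactly, for a dict {prime p: nonzero integer m_p}."""
    r = Fraction(1)
    for p, m in exponents.items():
        r *= Fraction(p) ** m
    return r


def character_average(exponents, T):
    """E_T(chi(X_{p,T})) = (1/2T) * integral of r^{-it} over [-T, T], in closed form."""
    r = character_ratio(exponents)
    x = T * (math.log(r.numerator) - math.log(r.denominator))
    return 1.0 if x == 0 else math.sin(x) / x


if __name__ == "__main__":
    chi = {2: 3, 3: -2, 7: 1}  # the character z -> z_2^3 * z_3^{-2} * z_7
    print(character_ratio(chi))  # differs from 1 by unique factorization
    for T in (10.0, 1e3, 1e5):
        print(T, character_average(chi, T))
```

The averages decay like 1/T, while the trivial character (empty dictionary) has average 1 for every T, in line with (3.6).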
As a corollary, Bagchi’s Theorem follows formally for τ > 1 and D contained in the
set of complex numbers with real part > 1. This is once more a very simple fact which
is often not specifically discussed, but which gives an indication and a motivation for the
more difficult study in the critical strip.
Special case of Theorem 3.2.1 for τ > 1. Assume that τ > 1 and that D is a
compact subset containing τ contained in {s ∈ C | Re(s) > 1}. We view $X_T = (X_{p,T})$
as random variables with values in the topological space $\widehat{S}^1$, as before. This is also (as a
countable product of metric spaces) a metric space. We claim that the map
\[ \varphi \colon \widehat{S}^1 \to H(D), \qquad (x_p) \mapsto \Bigl(s \mapsto -\sum_p \log(1 - x_p p^{-s})\Bigr) \]
is continuous (see Definition A.2.1 again for the definition of the logarithm here). If this
is so, then the composition principle (see Proposition B.3.2) and Proposition 3.2.5 imply
that $\varphi(X_T)$ converges in law to the H(D)-valued random variable $\varphi(X)$, where $X = (X_p)$
with the $X_p$ uniform and independent on $S^1$. After composing with the exponential, which is continuous on H(D), and using (3.3), this is exactly the statement of Bagchi’s
Theorem for D.
Now we check the claim. Fix ε > 0. Let T > 0 be some parameter to be chosen later
in terms of ε. For any $x = (x_p)$ and $y = (y_p)$ in $\widehat{S}^1$, we have
\[ \|\varphi(x) - \varphi(y)\|_\infty \le \sum_{p \le T} \|\log(1 - x_p p^{-s}) - \log(1 - y_p p^{-s})\|_\infty + \sum_{p > T} \|\log(1 - x_p p^{-s})\|_\infty + \sum_{p > T} \|\log(1 - y_p p^{-s})\|_\infty. \]
Because D is compact in the half-plane Re(s) > 1, the minimum of the real part of
s ∈ D is some real number σ0 > 1. Since $|x_p| = |y_p| = 1$ for all primes, and since
\[ |\log(1 - z)| \le 2|z| \]
for $|z| \le 1/2$ (Proposition A.2.2 (3)), it follows that
\[ \sum_{p > T} \|\log(1 - x_p p^{-s})\|_\infty + \sum_{p > T} \|\log(1 - y_p p^{-s})\|_\infty \le 4 \sum_{p > T} p^{-\sigma_0} \ll T^{1 - \sigma_0}. \]
We fix T so that $T^{1 - \sigma_0} < \varepsilon/2$. Now the map
\[ (x_p)_{p \le T} \mapsto \sum_{p \le T} \|\log(1 - x_p p^{-s}) - \log(1 - y_p p^{-s})\|_\infty \]
is obviously continuous, and therefore uniformly continuous since the domain is a compact
set. This function has value 0 when $x_p = y_p$ for p ≤ T, so there exists δ > 0 such that
\[ \sum_{p \le T} |\log(1 - x_p p^{-s}) - \log(1 - y_p p^{-s})| < \frac{\varepsilon}{2} \]
if $|x_p - y_p| \le \delta$ for p ≤ T. Therefore, provided that
\[ \max_{p \le T} |x_p - y_p| \le \delta, \]
we have
\[ \|\varphi(x) - \varphi(y)\|_\infty \le \varepsilon. \]
This proves the (uniform) continuity of ϕ. □
We now begin the proof of Bagchi’s Theorem in the critical strip. The argument fol-
lows partly his original proof [4], which is quite different from the Bohr–Jessen approach,
with some simplifications. Here are the main steps of the proof:
• We prove convergence almost surely of the random Euler product, and of its
formal Dirichlet series expansion; this also shows that they define random holo-
morphic functions;
• We prove that both the Riemann zeta function and the limiting Dirichlet series
are, in suitable mean sense, limits of smoothed partial sums of their respective
Dirichlet series;
• We then use an elementary argument to conclude using Proposition 3.2.5.
We fix from now on a sequence $(X_p)_p$ of independent random variables all uniformly
distributed on $S^1$. We often view the sequence $(X_p)$ as an $\widehat{S}^1$-valued random variable, as
in the proof of Proposition 3.2.5. Furthermore, for any positive integer n ≥ 1, we define
\[ X_n = \prod_{p \mid n} X_p^{v_p(n)} \tag{3.7} \]

where vp (n) is the p-adic valuation of n. Thus (Xn ) is a sequence of S1 -valued random
variables.
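The definition (3.7) is easy to experiment with. The following sketch samples the $X_p$ uniformly on the unit circle, builds $X_n$ by total multiplicativity, and estimates $E(X_n \overline{X_m})$ by Monte Carlo. The helper names, the sample size and the seed are illustrative assumptions.

```python
import cmath
import math
import random


def factorize(n):
    """Prime factorization of n >= 1 as a dict {p: v_p(n)}, by trial division."""
    factors = {}
    d = 2
    while d * d <= n:
        while n % d == 0:
            factors[d] = factors.get(d, 0) + 1
            n //= d
        d += 1
    if n > 1:
        factors[n] = factors.get(n, 0) + 1
    return factors


def sample_xp(primes, rng):
    """One sample of independent X_p, each uniform on the unit circle."""
    return {p: cmath.exp(2j * math.pi * rng.random()) for p in primes}


def x_n(n, xp):
    """X_n as in (3.7): the product of X_p^{v_p(n)} over the primes p dividing n."""
    value = 1 + 0j
    for p, v in factorize(n).items():
        value *= xp[p] ** v
    return value


def empirical_inner_product(n, m, primes, samples=20_000, seed=0):
    """Monte Carlo estimate of E(X_n * conjugate(X_m))."""
    rng = random.Random(seed)
    total = 0j
    for _ in range(samples):
        xp = sample_xp(primes, rng)
        total += x_n(n, xp) * x_n(m, xp).conjugate()
    return total / samples


if __name__ == "__main__":
    print(abs(empirical_inner_product(6, 10, [2, 3, 5])))  # small: distinct indices
    print(abs(empirical_inner_product(6, 6, [2, 3, 5])))   # close to 1
```

The estimate for distinct indices is only small up to Monte Carlo error of order $1/\sqrt{\text{samples}}$; the exact value is 0.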
Exercise 3.2.7. Prove that the sequence $(X_n)_{n \ge 1}$ is neither independent nor symmetric.
Exercise 3.2.8. The following exercise provides the starting point of recent prob-
abilistic approaches to the problem of estimating the so-called pseudomoments of the
Riemann zeta function (see the thesis of M. Gerspach [46]), although it is often proved
using different approaches, such as the ergodic theorem for flows.
For any real numbers q > 0 and x ≥ 1 and any sequence of complex numbers $(a_n)$,
prove that the limit
\[ \lim_{T \to +\infty} \frac{1}{T} \int_T^{2T} \Bigl|\sum_{n \le x} a_n n^{-it}\Bigr|^q\,dt \]
exists, and that it is equal to
\[ E\Bigl(\Bigl|\sum_{n \le x} a_n X_n\Bigr|^q\Bigr). \]

We first show that the limiting random functions are indeed well-defined as H(D)-
valued random variables.
Proposition 3.2.9. Let τ ∈ ]1/2, 1[ and let $U_\tau = \{s \in \mathbb{C} \mid \mathrm{Re}(s) > \tau\}$.
(1) The random Euler product defined by
\[ Z(s) = \prod_p (1 - X_p p^{-s})^{-1} \]

converges almost surely for any $s \in U_\tau$. For any compact subset $K \subset U_\tau$, the random
function
\[ Z_K \colon K \to \mathbb{C}, \qquad s \mapsto Z(s) \]
is an H(K)-valued random variable.
(2) The random Dirichlet series defined by
\[ \tilde{Z} = \sum_{n \ge 1} X_n n^{-s} \]
converges almost surely for any $s \in U_\tau$. For any compact subset $K \subset U_\tau$, the random
function $\tilde{Z}_K \colon s \mapsto \tilde{Z}(s)$ on K is an H(K)-valued random variable.
(3) We have $\tilde{Z}_K = Z_K$ almost surely.
Proof. (1) For N ≥ 1 and s ∈ K we have by definition
\[ \sum_{p \le N} \log(1 - X_p p^{-s})^{-1} = \sum_{p \le N} \frac{X_p}{p^s} + \sum_{k \ge 2} \sum_{p \le N} \frac{X_p^k}{k p^{ks}}. \]
Since Re(s) > 1/2 for s ∈ K, the series
\[ \sum_{k \ge 2} \sum_p \frac{X_p^k}{k p^{ks}} \]
converges absolutely for $s \in U_\tau$. By Lemma A.4.1, its sum is therefore an H(K)-valued
random variable for any compact subset K of $U_\tau$.
Fix now $\tau_1 < \tau$ such that $\tau_1 > 1/2$. We can apply Kolmogorov’s Theorem B.10.1 to the
independent random variables $(X_p p^{-\tau_1})$, since
\[ \sum_p V(p^{-\tau_1} X_p) = \sum_p \frac{1}{p^{2\tau_1}} < +\infty. \]
Thus the series
\[ \sum_p \frac{X_p}{p^{\tau_1}} \]
converges almost surely. By Lemma A.4.1 again, it follows that
\[ P(s) = \sum_p \frac{X_p}{p^s} \]
converges almost surely for all $s \in U_\tau$, and is holomorphic on $U_\tau$. By restriction, its sum
is an H(K)-valued random variable for any K compact in $U_\tau$.
These facts show that the sequence of partial sums
\[ \sum_{p \le N} \log(1 - X_p p^{-s})^{-1} \]
converges almost surely as N → +∞ to a random holomorphic function on K. Taking
the exponential, we obtain the almost sure convergence of the random Euler product to
a random holomorphic function $Z_K$ on K.
(2) The argument is similar, except that the sequence $(X_n)_{n \ge 1}$ is not independent.
However, it is orthonormal: if n ≠ m, we have
\[ E(X_n \overline{X_m}) = 0, \qquad E(|X_n|^2) = 1 \]
(indeed $X_n$ and $X_m$ may be viewed as characters of $\widehat{S}^1$, and they are distinct if n ≠ m,
so that this is the orthogonality property of characters of compact groups). We can then
apply the Menshov–Rademacher Theorem B.10.5 to $(X_n)$ and $a_n = n^{-\tau_1}$: since
\[ \sum_{n \ge 1} |a_n|^2 (\log n)^2 = \sum_{n \ge 1} \frac{(\log n)^2}{n^{2\tau_1}} < +\infty, \]
the series $\sum X_n n^{-\tau_1}$ converges almost surely, and Lemma A.4.1 shows that $\tilde{Z}$ converges
almost surely on $U_\tau$, and defines a holomorphic function there. Restricting to K leads to
$\tilde{Z}_K$ as H(K)-valued random variable.
Finally, to prove that $Z_K = \tilde{Z}_K$ almost surely, we may replace K by the compact
subset
\[ K_1 = \{s \in \mathbb{C} \mid \tau_1 \le \sigma \le A, \ |t| \le B\}, \]
with A ≥ 2 and B chosen large enough to ensure that $K \subset K_1$. The previous argument
shows that the random Euler product and Dirichlet series converge almost surely on $K_1$.
But $K_1$ contains the open set
\[ V = \{s \in \mathbb{C} \mid 1 < \mathrm{Re}(s) < 2, \ |t| < B\} \]
where the Euler product and Dirichlet series converge absolutely, so that Lemma C.1.4
proves that the random holomorphic functions $Z_{K_1}$ and $\tilde{Z}_{K_1}$ are equal when restricted
to V. By analytic continuation (and continuity), they are equal also on $K_1$, hence a
fortiori on K. □
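In the region of absolute convergence, the equality of the random Euler product and of the random Dirichlet series can be observed on a single sample. The sketch below restricts to the primes up to 13, so that the product equals the sum of $X_n n^{-s}$ over integers n with all prime factors at most 13, and truncates that sum at n ≤ 20000. The choice of primes, the truncation and the function names are assumptions made for illustration.

```python
import cmath
import math
import random


def sample_xp(primes, rng):
    """One sample of independent X_p, each uniform on the unit circle."""
    return {p: cmath.exp(2j * math.pi * rng.random()) for p in primes}


def truncated_euler_product(s, xp):
    """Product over the sampled primes of (1 - X_p p^{-s})^{-1}."""
    value = 1 + 0j
    for p, x in xp.items():
        value /= 1 - x * p ** (-s)
    return value


def smooth_terms(xp, limit):
    """Pairs (n, X_n) for all n <= limit whose prime factors are keys of xp."""
    terms = [(1, 1 + 0j)]
    for p, x in xp.items():
        extended = []
        for n, value in terms:
            # Append n, n*p, n*p^2, ... while staying below the limit.
            while n <= limit:
                extended.append((n, value))
                n, value = n * p, value * x
        terms = extended
    return terms


def truncated_dirichlet_series(s, xp, limit):
    """Sum of X_n n^{-s} over the integers produced by smooth_terms."""
    return sum(value * n ** (-s) for n, value in smooth_terms(xp, limit))


if __name__ == "__main__":
    xp = sample_xp([2, 3, 5, 7, 11, 13], random.Random(0))
    product = truncated_euler_product(2, xp)
    series = truncated_dirichlet_series(2, xp, 20_000)
    print(abs(product - series))  # below 1/20000, the tail of sum 1/n^2
```

At s = 2 the difference is bounded by the tail $\sum_{n > 20000} n^{-2} < 1/20000$, whatever the sample.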
We will prove Bagchi’s Theorem using the random Dirichlet series, which is easier to
handle than the Euler product. However, we will still denote it Z(s), which is justified
by the last part of the proposition.
Some additional properties of this random Dirichlet series are now needed. Most
importantly, we need to find a finite approximation that also applies to the Riemann zeta
function. This will be done using smooth partial sums.
First we need to check that Z(s) is of polynomial growth on average on vertical strips.
Lemma 3.2.10. Let Z(s) be the random Dirichlet series $\sum X_n n^{-s}$, defined and holomorphic
almost surely for Re(s) > 1/2. For any $\sigma_1 > 1/2$, we have
\[ E(|Z(s)|) \ll 1 + |s| \]
uniformly for all s such that $\mathrm{Re}(s) > \sigma_1$.
Proof. The series
\[ \sum_{n \ge 1} \frac{X_n}{n^{\sigma_1}} \]
converges almost surely. Therefore the partial sums
\[ S_u = \sum_{n \le u} \frac{X_n}{n^{\sigma_1}} \]
are bounded almost surely.
By summation by parts (Lemma A.1.1), it follows that for any s with real part $\sigma > \sigma_1$,
we have
\[ Z(s) = (s - \sigma_1) \int_1^{+\infty} \frac{S_u}{u^{s - \sigma_1 + 1}}\,du, \]

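where the integral converges almost surely. Hence
\[ |Z(s)| \le (1 + |s|) \int_1^{+\infty} \frac{|S_u|}{u^{\sigma - \sigma_1 + 1}}\,du. \]
Fubini’s Theorem (for non-negative functions) and the Cauchy–Schwarz inequality
then imply
\begin{align*}
E(|Z(s)|) &\le (1 + |s|) \int_1^{+\infty} E(|S_u|) \frac{du}{u^{\sigma - \sigma_1 + 1}} \\
&\le (1 + |s|) \int_1^{+\infty} E(|S_u|^2)^{1/2} \frac{du}{u^{\sigma - \sigma_1 + 1}} \\
&= (1 + |s|) \int_1^{+\infty} \Bigl(\sum_{n \le u} \frac{1}{n^{2\sigma_1}}\Bigr)^{1/2} \frac{du}{u^{\sigma - \sigma_1 + 1}}
\end{align*}
using the orthonormality of the variables $X_n$. The integrand is $\ll u^{-\frac{1}{2} - \sigma}$, hence the
integral converges uniformly for $\sigma > \sigma_1$. □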
We can then deduce a good result on average approximation by partial sums. We
refer to Section A.3 for the definition and properties of the Mellin transform.
Proposition 3.2.11. Let $\varphi \colon [0, +\infty[ \to [0, 1]$ be a smooth function with compact
support such that ϕ(0) = 1. Let $\widehat{\varphi}$ denote its Mellin transform. For N ≥ 1, define the
H(D)-valued random variable
\[ Z_{D,N} = \sum_{n \ge 1} X_n \varphi\Bigl(\frac{n}{N}\Bigr) n^{-s}. \]
There exists δ > 0 such that
\[ E(\|Z_D - Z_{D,N}\|_\infty) \ll N^{-\delta} \]
for N ≥ 1.
We recall that the norm $\|\cdot\|_\infty$ refers to the sup norm on the compact set D.
Proof. The first step is to apply the smoothing process of Proposition A.4.3 in
Appendix A. The random Dirichlet series
\[ Z(s) = \sum_{n \ge 1} X_n n^{-s} \]
converges almost surely for Re(s) > 1/2. For σ > 1/2 and any δ > 0 such that
\[ -\delta + \sigma > 1/2, \]
we have therefore almost surely the representation
\[ Z_D(s) - Z_{D,N}(s) = -\frac{1}{2i\pi} \int_{(-\delta)} Z(s + w)\, \widehat{\varphi}(w)\, N^w\,dw \tag{3.8} \]
for s ∈ D. (Figure 3.2 may help understand the location of the regions involved in the
proof.)
Note that here and below, it is important that the “almost surely” property holds for
all s; this is simply because we work with random variables taking values in H(D), and
not with particular evaluations of these random functions at a specific s ∈ D.
We need to control the supremum norm on D, since this is the norm on the space
H(D). For this purpose, we use Cauchy’s integral formula.
Figure 3.2. Regions and contours in the proof of Proposition 3.2.11.

Let S be a compact segment in ]1/2, 1[ such that the fixed rectangle R = S × [−1/2, 1/2] ⊂ C
contains D in its interior. Then, almost surely, for any v in D, Cauchy’s
theorem gives
\[ Z_D(v) - Z_{D,N}(v) = \frac{1}{2i\pi} \int_{\partial R} (Z_D(s) - Z_{D,N}(s)) \frac{ds}{s - v}, \]
where the boundary of R is oriented counterclockwise. The definition of R ensures that
$|s - v|^{-1} \ll 1$ for v ∈ D and s ∈ ∂R, so that the random variable $\|Z_D - Z_{D,N}\|_\infty$ satisfies
\[ \|Z_D - Z_{D,N}\|_\infty \ll \int_{\partial R} |Z_D(s) - Z_{D,N}(s)|\,|ds|. \]
Using (3.8) and writing w = −δ + iu with u ∈ R, we obtain
\[ \|Z_D - Z_{D,N}\|_\infty \ll N^{-\delta} \int_{\partial R} \int_{\mathbb{R}} |Z(-\delta + \sigma + i(t + u))|\, |\widehat{\varphi}(-\delta + iu)|\,|ds|\,du. \]
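Therefore, taking the expectation, and using Fubini’s Theorem (for non-negative functions), we get
\begin{align*}
E(\|Z_D - Z_{D,N}\|_\infty) &\ll N^{-\delta} \int_{\partial R} \int_{\mathbb{R}} E\bigl(|Z(-\delta + \sigma + i(t + u))|\bigr)\, |\widehat{\varphi}(-\delta + iu)|\,|ds|\,du \\
&\ll N^{-\delta} \sup_{s = \sigma + it \in R} \int_{\mathbb{R}} E\bigl(|Z(-\delta + \sigma + i(t + u))|\bigr)\, |\widehat{\varphi}(-\delta + iu)|\,du.
\end{align*}
We therefore need to bound
\[ \int_{\mathbb{R}} E\bigl(|Z(-\delta + \sigma + i(t + u))|\bigr)\, |\widehat{\varphi}(-\delta + iu)|\,du \]
for some fixed σ + it in the compact rectangle R. We take
\[ \delta = \frac{1}{2}(\min S - 1/2) \]
which is > 0 since S is compact in ]1/2, 1[, so that
\[ -\delta + \sigma > 1/2, \qquad 0 < \delta < 1. \]
Since $\widehat{\varphi}$ decays faster than any polynomial at infinity in vertical strips, and
\[ E(|Z(s)|) \ll 1 + |s| \]
uniformly for s ∈ R by Lemma 3.2.10, we have
\[ \int_{\mathbb{R}} E\bigl(|Z(-\delta + \sigma + i(t + u))|\bigr)\, |\widehat{\varphi}(-\delta + iu)|\,du \ll 1 \]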
uniformly for s = σ + it ∈ R, and the result follows. 
The last preliminary result is a similar approximation result for the translates of the
Riemann zeta function by smooth partial sums of its Dirichlet series.
Proposition 3.2.12. Let $\varphi \colon [0, +\infty[ \to [0, 1]$ be a smooth function with compact
support such that ϕ(0) = 1. Let $\widehat{\varphi}$ denote its Mellin transform. For N ≥ 1, define
\[ \zeta_N(s) = \sum_{n \ge 1} \varphi\Bigl(\frac{n}{N}\Bigr) n^{-s}, \]
and define² $Z_{N,T}$ to be the H(D)-valued random variable $t \mapsto (s \mapsto \zeta_N(s + it))$.
There exists δ > 0 such that
\[ E_T(\|Z_{D,T} - Z_{N,T}\|_\infty) \ll N^{-\delta} + N T^{-1} \]
for N ≥ 1 and T > 1.
Note that ζN is an entire function, since ϕ has compact support, so that the range
of the sum is in fact finite. The meaning of the statement is that the smoothed partial
sums ζN give very uniform and strong approximations to the vertical translates of the
Riemann zeta function.
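To make the smoothed partial sums $\zeta_N$ concrete, here is a minimal sketch with one possible choice of cutoff: ϕ equals 1 on [0, 1], vanishes on [2, +∞[, and is glued smoothly in between using $u \mapsto e^{-1/u}$. This particular ϕ and the function names are assumptions, since the text allows any smooth compactly supported ϕ with ϕ(0) = 1. The check at s = 2 lies in the region of absolute convergence and therefore only illustrates the definition, not the estimate of the proposition in the critical strip.

```python
import math


def _bump(u):
    """The function exp(-1/u) for u > 0, extended by 0 for u <= 0 (smooth on R)."""
    return math.exp(-1.0 / u) if u > 0 else 0.0


def phi(x):
    """A smooth cutoff [0, +inf) -> [0, 1], equal to 1 on [0, 1] and to 0 on [2, +inf)."""
    if x <= 1:
        return 1.0
    if x >= 2:
        return 0.0
    a, b = _bump(2 - x), _bump(x - 1)
    return a / (a + b)


def zeta_N(s, N):
    """Smoothed partial sum: sum over n >= 1 of phi(n/N) n^{-s} (finite: n < 2N)."""
    total = 0j
    for n in range(1, 2 * N):
        total += phi(n / N) * n ** (-s)
    return total


if __name__ == "__main__":
    N = 1000
    print(math.pi ** 2 / 6 - zeta_N(2, N).real)  # between 0 and 1/N
    print(zeta_N(0.75 + 10j, N))                 # defined for every complex s
```

Since the sum is finite, `zeta_N` can be evaluated at any complex s, reflecting the fact that $\zeta_N$ is entire.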
Proof. We will write $Z_T$ for $Z_{D,T}$ for simplicity. We begin by applying the smoothing
process of Proposition A.4.3 in Appendix A in the case $a_n = 1$. For σ > 1/2 and any
δ > 0 such that −δ + σ > 1/2, we have (as in the previous proof) the representation
\[ \zeta(s) - \zeta_N(s) = -\frac{1}{2i\pi} \int_{(-\delta)} \zeta(s + w)\, \widehat{\varphi}(w)\, N^w\,dw - N^{1-s} \widehat{\varphi}(1 - s) \tag{3.9} \]
where the second term on the right-hand side comes from the fact that the Riemann zeta
function has a pole at s = 1 with residue 1.
As before, let S be a compact segment in ]1/2, 1[ such that the fixed rectangle R =
S × [−1/2, 1/2] ⊂ C contains D in its interior. Then for any v with Re(v) > 1/2 and
t ∈ R, Cauchy’s theorem gives
\[ \zeta(v + it) - \zeta_N(v + it) = \frac{1}{2i\pi} \int_{\partial R} (\zeta(s + it) - \zeta_N(s + it)) \frac{ds}{s - v}, \]
where the boundary of R is oriented counterclockwise; using $|s - v|^{-1} \ll 1$ for v ∈ D and
s ∈ ∂R, we deduce that the random variable $\|Z_T - Z_{N,T}\|_\infty$, which takes the value
\[ \sup_{s \in D} |\zeta(s + it) - \zeta_N(s + it)| \]
at t ∈ ΩT, satisfies
\[ \|Z_T - Z_{N,T}\|_\infty \ll \int_{\partial R} |\zeta(s + it) - \zeta_N(s + it)|\,|ds| \]
for t ∈ ΩT. Taking the expectation with respect to t (i.e., integrating over t ∈ [−T, T])
and applying Fubini’s Theorem for non-negative functions leads to
\begin{align*}
E_T\bigl(\|Z_T - Z_{N,T}\|_\infty\bigr) &\ll \int_{\partial R} E_T\bigl(|\zeta(s + it) - \zeta_N(s + it)|\bigr)\,|ds| \\
&\ll \sup_{s \in \partial R} E_T\bigl(|\zeta(s + it) - \zeta_N(s + it)|\bigr). \tag{3.10}
\end{align*}

² There should be no confusion with $Z_{D,T}$.


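We take again $\delta = \frac{1}{2}(\min S - 1/2) > 0$, so that 0 < δ < 1. For any fixed s = σ + it ∈ ∂R,
we have
\[ -\delta + \sigma \ge \frac{1}{2} + \delta > \frac{1}{2}. \]
Applying (3.9) and using again Fubini’s Theorem, we obtain
\begin{align*}
E_T\bigl(|\zeta(s + it) - \zeta_N(s + it)|\bigr) \ll{}& N^{-\delta} \int_{\mathbb{R}} |\widehat{\varphi}(-\delta + iu)|\, E_T\bigl(|\zeta(\sigma - \delta + i(t + u))|\bigr)\,du \\
&+ N^{1-\sigma} E_T(|\widehat{\varphi}(1 - s - it)|).
\end{align*}
The rapid decay of $\widehat{\varphi}$ on vertical strips shows that the second term (arising from the
pole) is $\ll N T^{-1}$. In the first term, since $\sigma - \delta \ge \min(S) - \delta \ge \frac{1}{2} + \delta$, we have
\[ E_T\bigl(|\zeta(\sigma - \delta + i(t + u))|\bigr) = \frac{1}{2T} \int_{-T}^{T} |\zeta(\sigma - \delta + i(t + u))|\,dt \ll 1 + \frac{|u|}{T} \ll 1 + |u| \tag{3.11} \]
by Proposition C.4.1 in Appendix C. Hence
\[ E_T\bigl(|\zeta(s + it) - \zeta_N(s + it)|\bigr) \ll N^{-\delta} \int_{\mathbb{R}} |\widehat{\varphi}(-\delta + iu)|(1 + |u|)\,du + N T^{-1}. \tag{3.12} \]
Now the fast decay of $\widehat{\varphi}(s)$ on the vertical line Re(s) = −δ shows that the last integral is
bounded, and we conclude from (3.10) that
\[ E_T\bigl(\|Z_T - Z_{N,T}\|_\infty\bigr) \ll N^{-\delta} + N T^{-1}, \]
as claimed.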

Finally we can prove Theorem 3.2.1:
Proof of Bagchi’s Theorem. By Proposition B.4.1, it is enough to prove that
for any bounded and Lipschitz function f : H(D) −→ C, we have
ET (f (ZD,T )) −→ E(f (ZD ))
as T → +∞. We may use the Dirichlet series expansion of ZD according to Proposi-
tion 3.2.9, (2).
Since D is fixed, we omit it from the notation for simplicity, denoting $Z_T = Z_{D,T}$ and
$Z = Z_D$. Fix some integer N ≥ 1 to be chosen later. We denote
\[ Z_{T,N} = \sum_{n \ge 1} n^{-s-it} \varphi\Bigl(\frac{n}{N}\Bigr) \]
(viewed as random variable defined for t ∈ [−T, T]) and
\[ Z_N = \sum_{n \ge 1} X_n n^{-s} \varphi\Bigl(\frac{n}{N}\Bigr) \]
the smoothed partial sums of the Dirichlet series as in Propositions 3.2.12 and 3.2.11.
We then write
\begin{align*}
|E_T(f(Z_T)) - E(f(Z))| \le{}& |E_T(f(Z_T) - f(Z_{T,N}))| \\
&+ |E_T(f(Z_{T,N})) - E(f(Z_N))| + |E(f(Z_N) - f(Z))|.
\end{align*}
Since f is a Lipschitz function on H(D), there exists a constant C > 0 such that
\[ |f(x) - f(y)| \le C \|x - y\|_\infty \]
for all x, y ∈ H(D). Hence we have
\begin{align*}
|E_T(f(Z_T)) - E(f(Z))| \le{}& C\, E_T(\|Z_T - Z_{T,N}\|_\infty) \\
&+ |E_T(f(Z_{T,N})) - E(f(Z_N))| + C\, E(\|Z_N - Z\|_\infty).
\end{align*}
Fix ε > 0. Propositions 3.2.12 and 3.2.11 together show that there exists some N ≥ 1
and some constant $C_1 > 0$ such that
\[ E_T(\|Z_T - Z_{T,N}\|_\infty) < \varepsilon + \frac{C_1 N}{T} \]
for all T > 1 and
\[ E(\|Z_N - Z\|_\infty) < \varepsilon. \]
We fix such a value of N. By Proposition 3.2.5 and composition, the random variables
$Z_{T,N}$ (which are Dirichlet polynomials) converge in law to $Z_N$ as T → +∞. Since N/T → 0
also for T → +∞, we deduce that for all T large enough, we have
\[ |E_T(f(Z_T)) - E(f(Z))| < 4\varepsilon. \]
This finishes the proof. 
Exercise 3.2.13. Prove that if σ > 1/2 is fixed, then we have almost surely
\[ \lim_{T \to +\infty} \frac{1}{2T} \int_{-T}^{T} |Z(\sigma + it)|^2\,dt = \zeta(2\sigma). \]

[Hint: Use the Birkhoff–Khintchine pointwise ergodic theorem for flows, see e.g. [30,
§8.6.1].]
Before we continue towards the computation of the support of Bagchi’s measure,
and hence the proof of Voronin’s Theorem, we can use the current available information
to obtain bounds on the probability that the Riemann zeta function is “large” on the
subset D. More precisely, it is natural to discuss the probability that the logarithm of the
modulus of translates of the zeta function is large, since this will also detect how close it
might approach zero.
Proposition 3.2.14. Let σ0 be the infimum of the real part of s for s ∈ D. There
exists a positive constant c > 0, depending on D, such that for any A > 0, we have
\[ \limsup_{T \to +\infty} P_T(\|\log |Z_{D,T}|\|_\infty > A) \le c \exp\bigl(-c^{-1} A^{1/(1-\sigma_0)} (\log A)^{1/(2(1-\sigma_0))}\bigr). \]

Proof. Convergence in law implies that
\[ \limsup_{T \to +\infty} P_T(\|\log |Z_{D,T}|\|_\infty > A) \le P_T(\|\log |Z_D|\|_\infty > A) \]
and
\[ \log |Z_D| = \mathrm{Re}\Bigl(\sum_p \frac{X_p}{p^s}\Bigr) + O(1) \]
where the implied constant depends on D. In addition, we have
\[ P_T\Bigl(\Bigl\|\mathrm{Re} \sum_p \frac{X_p}{p^s}\Bigr\|_\infty > A\Bigr) \le P_T\Bigl(\Bigl\|\sum_p \frac{X_p}{p^s}\Bigr\|_\infty > A\Bigr). \]
Since σ0 > 1/2 and the random variables $(X_p)$ are independent and bounded by 1, we
can therefore estimate the right-hand side of this last inequality using the variant of
Proposition B.11.13 discussed in Remark B.11.14 (2) for the Banach space H(D), and
hence conclude the proof. □
Remark 3.2.15. It is also possible to obtain lower bounds for these probabilities, by
evaluating at a fixed element of D (see Theorem 6.3.1 for a similar argument, although
the shape of the lower bound is different).

3.3. The support of Bagchi’s measure


Our goal in this section is to explain the proof of Theorem 3.2.3, which is due to
Bagchi [4, Ch. 5]. Since it involves results of complex analysis that are quite far from the
main interest of this book, we will only treat in detail the part of the proof that involves
arithmetic, giving references for the other results that are used.
The support is easiest to compute using the random Euler product interpretation of the random
Dirichlet series, because it is essentially a sum of independent random variables. To be
precise, define
\[ P(s) = \sum_p \frac{X_p}{p^s}, \qquad \tilde{P}(s) = \sum_p \sum_{k \ge 1} \frac{X_p^k}{k p^{ks}} \]
(see the proof of Proposition 3.2.9). The series converge almost surely for Re(s) > 1/2.
We claim that the support of the distribution of P̃, when viewed as an H(D)-valued
random variable, is equal to H(D). Let us first assume this.
Since $Z = \exp(\tilde{P})$, we deduce by composition (see Lemma B.2.1) that the support of
Z is the closure of the set of functions of the form $e^g$, where g ∈ H(D). But this last set
is precisely $H(D)^\times$, and Lemma A.5.5 in Appendix A shows that its closure in H(D) is
$H(D)^\times \cup \{0\}$.
Finally, to prove the approximation property (3.2), which is the original version of
Voronin’s Universality Theorem, we simply apply Lemma B.3.3 to the family of random
variables $Z_T$, which gives the much stronger statement that for any ε > 0, we have
\[ \liminf_{T \to +\infty} \lambda\bigl(\{t \in [-T, T] \mid \sup_{s \in D} |\zeta(s + it) - f(s)| < \varepsilon\}\bigr) > 0, \]
where λ denotes Lebesgue measure.


From Proposition B.10.8 in Appendix B, the following proposition will imply that the
support of the random Dirichlet series P is H(D). The statement is slightly more general
to help with the last step afterwards.
Proposition 3.3.1. Let τ be such that 1/2 < τ < 1. Let r > 0 be such that
\[ D = \{s \in \mathbb{C} \mid |s - \tau| \le r\} \subset \{s \in \mathbb{C} \mid 1/2 < \mathrm{Re}(s) < 1\}. \]
Let N be an arbitrary positive real number. The set of all series
\[ \sum_{p > N} \frac{x_p}{p^s}, \qquad \text{with } (x_p) \in \widehat{S}^1, \]
which converge in H(D) is dense in H(D).


We will deduce the proposition from the density criterion of Theorem A.5.1 in Appendix A,
applied to the space H(D) and the sequence $(f_p)$ with $f_p(s) = p^{-s}$ for p prime.
Since $\|f_p\|_\infty = p^{-\sigma_1}$, where $\sigma_1 = \tau - r > 1/2$, the condition
\[ \sum_p \|f_p\|_\infty^2 < +\infty \]
holds. Furthermore, Proposition 3.2.9 certainly shows that there exist some $(x_p) \in \widehat{S}^1$
such that the series $\sum_p x_p f_p$ converges in H(D). Hence the conclusion of Theorem A.5.1
is what we seek, and we only need to check the following lemma to establish the last
hypothesis required to apply it:
Lemma 3.3.2. Let $\mu \in C(\bar{D})'$ be a continuous linear functional. Let
\[ g(z) = \mu(s \mapsto e^{sz}) \]
be its Laplace transform. If
\[ \sum_p |g(\log p)| < +\infty, \tag{3.13} \]
then we have g = 0.
Indeed, the point is that $\mu(f_p) = \mu(s \mapsto p^{-s}) = g(\log p)$, so that the assumption (3.13)
concerning g is precisely (A.3).
This is a statement that has some arithmetic content, as we will see, and indeed the
proof involves the Prime Number Theorem.
Proof. Let
\[ \varrho = \limsup_{r \to +\infty} \frac{\log |g(r)|}{r}, \]
which is finite by Lemma A.5.2 (1). By Lemma A.5.2 (3), it suffices to prove that ϱ ≤ 1/2
to conclude that g = 0. To do this, we will use Theorem A.5.3, that provides access to
the value of ϱ by “sampling” g along certain sequences of real numbers tending to infinity.
The idea is that (3.13) implies that |g(log p)| cannot often be of size at least $1/p = e^{-\log p}$,
since the series $\sum p^{-1}$ diverges. Since the sequence log p increases slowly, this makes
it possible to find real numbers $r_k \to +\infty$ growing linearly and such that $|g(r_k)| \le e^{-r_k}$,
and from this and Theorem A.5.3 we will get a contradiction.
To be precise, we first note that for y ∈ R, we have
\[ |g(iy)| \le \|\mu\|\, \|s \mapsto e^{iys}\|_\infty \le \|\mu\| e^{r|y|} \]
(since the maximum of the absolute value of the imaginary part of $s \in \bar{D}$ is r), and
therefore
\[ \limsup_{y \in \mathbb{R},\ |y| \to +\infty} \frac{\log |g(iy)|}{|y|} \le r. \]
We put α = r ≤ 1/4. Then the first condition of Theorem A.5.3 holds for the function
g. We also take β = 1 so that αβ < π.
For any k ≥ 0, let $I_k$ be the set of primes p such that $e^k \le p < e^{k+1}$. By the Mertens
Formula (C.4), or the Prime Number Theorem, we have
\[ \sum_{p \in I_k} \frac{1}{p} \asymp \frac{1}{k} \]
as k → +∞. Let further A be the set of those k ≥ 0 for which the inequality
\[ |g(\log p)| \ge \frac{1}{p} \]
holds for all primes $p \in I_k$, and let B be its complement among the non-negative integers.
We then note that
\[ \sum_{k \in A} \frac{1}{k} \ll \sum_{k \in A} \sum_{p \in I_k} \frac{1}{p} \ll \sum_{k \in A} \sum_{p \in I_k} |g(\log p)| < +\infty. \]

This shows that B is infinite. For k ∈ B, let $p_k$ be a prime in $I_k$ such that $|g(\log p_k)| < p_k^{-1}$.
Let $r_k = \log p_k$. We then have
\[ \limsup_{k \to +\infty} \frac{\log |g(r_k)|}{r_k} \le -1. \]
Since $p_k \in I_k$, we have
\[ r_k = \log p_k \sim k. \]
Furthermore, if we order B in increasing order, the fact that
\[ \sum_{k \notin B} \frac{1}{k} < +\infty \]
implies that the k-th element $n_k$ of B satisfies $n_k \sim k$.


Now we consider the sequence formed from the $r_{2k}$, arranged in increasing order. We
have $r_{2k}/k \to 2$ from the above. Moreover, since $p_k \in I_k$, we have
\[ r_{2k+2} - r_{2k} \ge 1, \]
by construction, hence $|r_{2k} - r_{2l}| \gg |k - l|$. Since $|g(r_{2k})| \le e^{-r_{2k}}$ for all k ∈ B, we can
apply Theorem A.5.3 to this increasing sequence and we get
\[ \varrho = \limsup_{k \to +\infty} \frac{\log |g(r_{2k})|}{r_{2k}} \le -1 < 1/2, \]
as desired. □
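The arithmetic input of this proof, namely that the sum of 1/p over the primes in $I_k = [e^k, e^{k+1}[$ has size about 1/k, can be observed numerically. The sketch below sieves the primes up to $e^{12}$ and prints k times the block sum for a few values of k. The function names and the range of k are illustrative assumptions.

```python
import math


def primes_up_to(n):
    """Return the list of primes <= n (for n >= 2) with a sieve of Eratosthenes."""
    sieve = bytearray([1]) * (n + 1)
    sieve[0:2] = b"\x00\x00"
    for i in range(2, math.isqrt(n) + 1):
        if sieve[i]:
            sieve[i * i :: i] = bytearray(len(range(i * i, n + 1, i)))
    return [i for i, flag in enumerate(sieve) if flag]


def reciprocal_sum_over_block(k, primes):
    """Sum of 1/p over the primes p with e^k <= p < e^(k+1)."""
    lo, hi = math.exp(k), math.exp(k + 1)
    return sum(1.0 / p for p in primes if lo <= p < hi)


if __name__ == "__main__":
    primes = primes_up_to(int(math.exp(12)) + 1)
    for k in range(6, 12):
        # By Mertens' formula the block sum is close to log((k + 1) / k), about 1/k.
        print(k, k * reciprocal_sum_over_block(k, primes))
```

The printed ratios stay close to 1, consistent with the estimate used to show that the set B is infinite.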
There remains a last lemma to prove, that allows us to go from the support of the
series P(s) of independent random variables to that of the full series P̃(s).
Lemma 3.3.3. The support of $\tilde{P}(s)$ is H(D).
Proof. We can write
\[ \tilde{P} = -\sum_p \log(1 - X_p p^{-s}) \]
where the random variables $(\log(1 - X_p p^{-s}))_p$ are independent, and the series converges
almost surely in H(D). Therefore it is enough by Proposition B.10.8 to prove that the
set of convergent series
\[ -\sum_p \log(1 - x_p p^{-s}), \qquad (x_p) \in \widehat{S}^1, \]
is dense in H(D).
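Let f ∈ H(D) and ε > 0 be fixed. For N ≥ 1 and any $(x_p) \in \widehat{S}^1$, let
\[ h_N(s) = \sum_{p > N} \sum_{k \ge 2} \frac{x_p^k}{k p^{ks}}. \]
This series converges absolutely for any s such that Re(s) > 1/2 and $(x_p) \in \widehat{S}^1$, and we
have
\[ \|h_N\|_\infty \le \sum_{p > N} \sum_{k \ge 2} \frac{1}{k p^{k\sigma_1}} \to 0 \]
as N → +∞, uniformly with respect to $(x_p) \in \widehat{S}^1$, where $\sigma_1 = \tau - r > 1/2$ is the minimum of Re(s) on D. Fix N such that $\|h_N\|_\infty < \varepsilon/2$
for any $(x_p) \in \widehat{S}^1$.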
Now let $x_p = 1$ for p ≤ N and define $f_0 \in H(D)$ by
\[ f_0(s) = f(s) + \sum_{p \le N} \log(1 - x_p p^{-s}). \]
For any choice of $(x_p)_{p > N}$ such that the series
\[ \sum_p \frac{x_p}{p^s} \]
defines an element of H(D), we can then write
\[ f(s) + \sum_p \log(1 - x_p p^{-s}) = f_0(s) - g_N(s) - h_N(s), \]
for s ∈ D, where
\[ g_N(s) = \sum_{p > N} \frac{x_p}{p^s}. \]
By Proposition 3.3.1, there exists $(x_p)_{p > N}$ such that the series $g_N$ converges in H(D)
and $\|g_N - f_0\|_\infty < \varepsilon/2$. We then have
\[ \Bigl\|f + \sum_p \log(1 - x_p p^{-s})\Bigr\|_\infty < \varepsilon. \]
□


Exercise 3.3.4. This exercise uses Voronin’s Theorem to deduce that the Riemann
zeta function is not the solution to any algebraic differential equation.
(1) For $(a_0, \ldots, a_m) \in \mathbb{C}^{m+1}$ such that $a_0 \ne 0$, prove that there exists
$(b_0, \ldots, b_m) \in \mathbb{C}^{m+1}$ such that we have
\[ \exp\Bigl(\sum_{k=0}^{m} b_k s^k\Bigr) = \sum_{k=0}^{m} \frac{a_k}{k!} s^k + O(s^{m+1}) \]
for s ∈ C.
Now fix a real number σ with 1/2 < σ < 1 and let g be a holomorphic function on C
which does not vanish.
(2) For any ε > 0, prove that there exists a real number t and r > 0 such that
\[ \sup_{|s| \le r} |\zeta(s + \sigma + it) - g(s)| < \varepsilon \frac{r^k}{k!}. \]
(3) Let n ≥ 1 be an integer. Prove that there exists t ∈ R such that for all integers
k with 0 ≤ k ≤ n − 1, we have
\[ |\zeta^{(k)}(\sigma + it) - g^{(k)}(0)| < \varepsilon. \]
(4) Let n ≥ 1 be an integer. Prove that the image in $\mathbb{C}^n$ of the map
\[ \mathbb{R} \to \mathbb{C}^n, \qquad t \mapsto (\zeta(\sigma + it), \ldots, \zeta^{(n-1)}(\sigma + it)) \]
is dense in $\mathbb{C}^n$.
(5) Using (4), prove that if n ≥ 1 and N ≥ 1 are integers, and $F_0, \ldots, F_N$ are
continuous functions $\mathbb{C}^n \to \mathbb{C}$, not all identically zero, then the function
\[ \sum_{k=0}^{N} s^k F_k(\zeta(s), \zeta'(s), \ldots, \zeta^{(n-1)}(s)) \]
is not identically zero. In particular, the Riemann zeta function satisfies no algebraic
differential equation.

3.4. Generalizations
If we look back at the proof of Bagchi’s Theorem, and at the proof of Voronin’s
Theorem, we can see precisely which arithmetic ingredients appeared. They are the
following:
• The crucial link between the arithmetic objects and the probabilistic model is
provided by Proposition 3.2.5, which depends on the unique factorization of
integers into primes; this is an illustration of the asymptotic independence of
prime numbers, similarly to Proposition 1.3.7;
• The proof of Bagchi’s theorem then relies on the mean-value property (3.11) of
the Riemann zeta function; this estimate has arithmetic meaning;
• The Prime Number Theorem, which appears in the proof of Voronin’s Theorem,
in order to control the distribution of primes in (roughly) dyadic intervals.
Note that some arithmetic features remain in the Random Dirichlet Series that arises
as the limit in Bagchi’s Theorem, in contrast with the Erdős-Kac Theorem, where the
limit is the universal gaussian distribution. This means, in particular, that going beyond
Bagchi’s Theorem to applications (as in Voronin’s Theorem) still naturally involves arith-
metic problems, many of which are very interesting in their interaction with probability
theory (see below for a few references).
From this analysis, it shouldn’t be very surprising that Bagchi’s Theorem can be
generalized to many other situations. The most interesting concerns perhaps the limiting
behavior, in H(D), of families of L-functions of the type
\[ L(f, s) = \sum_{n \ge 1} \lambda_f(n) n^{-s} \]

where f runs over some sequence of arithmetic objects with associated L-functions, or-
dered in a sequence of probability spaces (which need not be continuous like ΩT ). We
refer to [59, Ch. 5] for a survey and discussion of L-functions, and to [69] for a discus-
sion of families of L-functions. There are some rather elementary special cases, such as
the vertical translates L(χ, s + it) of a fixed Dirichlet L-function L(χ, s), since almost
all properties of the Riemann zeta function extend quite easily to this case. Another
interesting case is the finite set Ωq of non-trivial Dirichlet characters modulo a prime
number q, with the uniform probability measure. Then one can look at the distribution
of the restrictions to D of the Dirichlet L-functions L(χ, s) for χ ∈ Ωq , and indeed one
can check that Bagchi’s Theorem extends to this situation.
A second example, which is treated in [72] is, still for a prime q > 2, the set Ωq
of holomorphic cuspidal modular forms of weight 2 and level q, either with the uniform
probability measure, or with that provided by the Petersson formula ([69, 31, ex. 8]).
An analogue of Bagchi’s Theorem holds, but the limiting random Dirichlet series is not
the same as in Theorem 3.2.1: with the Petersson average, it is
\[ \prod_{p} \bigl(1 - X_p p^{-s} + p^{-2s}\bigr)^{-1}, \tag{3.14} \]

where (Xp ) is a sequence of independent random variables, which are all distributed
according to the Sato–Tate measure (the same that appears in Example B.6.1 (3)). This
different limit is simply due to the form that “local spectral equidistribution” (in the sense
of [69]) takes for this family (see [69, 38]). Indeed, the local spectral equidistribution
property plays the role of Proposition 3.2.5. The analogue of (3.11) follows from a stronger
mean-square formula, using the Cauchy–Schwarz inequality: there exists a constant A > 0
such that, for any σ0 > 1/2 and all s ∈ C with Re(s) > σ0 , we have
\[ \sum_{f \in \Omega_q} \omega_f |L(f,s)|^2 \ll (1+|s|)^A \tag{3.15} \]

for q > 2, where ωf is the Petersson-averaging weight (see [76, Prop. 5], which proves an
even more difficult result where Re(s) can be as small as $1/2 + c(\log q)^{-1}$).
However, extending Bagchi’s Theorem to many other families of L-functions (e.g.,
vertical translates of an L-function of higher rank) requires restrictions, in the current
state of knowledge. The reason is that the analogue of the mean-value estimates (3.11)
or (3.15) is usually only known when Re(s) > σ0 > 1/2, for some σ0 such that σ0 < 1.
Then the only domains D for which one can prove a version of Bagchi’s Theorem are
those contained in Re(s) > σ0 .
[Further references: Titchmarsh [117], especially Chapter 11, discusses the older
work of Bohr and Jessen, which has some interesting geometric aspects that are not
apparent in modern treatments. Bagchi’s Thesis [4] contains some generalizations as well
as more information concerning the limit theorem and Voronin’s Theorem.]

CHAPTER 4

The distribution of values of the Riemann zeta function, II

Probability tools                          Arithmetic tools
-----------------------------------------  ------------------------------------------------------
Definition of convergence in law (§ B.3)   Riemann zeta function (§ C.4)
Kronecker’s Theorem (th. B.6.5)            Möbius function (def. C.1.3)
Central Limit Theorem (th. B.7.2)          Mean-square of ζ(s) on the critical line (prop. C.4.1)
Gaussian random variable (§ B.7)           Multiplicative functions (§ C.1)
Lipschitz test functions (prop. B.4.1)     Euler products (lemma C.1.4)
Method of moments (th. B.5.5)

4.1. Introduction
In this chapter, as indicated previously, we will continue working with the values of
the Riemann zeta function, but on the critical line s = 1/2 + it, where the issues are much
deeper.
Indeed, the analogue of Theorem 3.1.1 fails for τ = 1/2, which shows that the Riemann
zeta function is significantly more complicated on the critical line. However, there is a
limit theorem after normalization, due to Selberg, for the logarithm of the Riemann zeta
function. To state it, we specify carefully the meaning of log ζ(1/2 + it). We define a
random variable LT on ΩT by putting LT (t) = 0 if ζ(1/2 + it) = 0, and otherwise
\[ L_T(t) = \log \zeta(1/2 + it), \]
where the logarithm of zeta is the unique branch that is holomorphic on a narrow strip
\[ \{ s = \sigma + iy \in \mathbf{C} \mid \sigma > 1/2 - \delta,\ |y - t| \leq \delta \} \]
for some δ > 0, and satisfies log ζ(σ + it) → 0 as σ → +∞.
Theorem 4.1.1 (Selberg). With notation as above, the random variables
\[ \frac{L_T}{\sqrt{\frac{1}{2} \log\log T}} \]
on ΩT converge in law as T → +∞ to a standard complex gaussian random variable.
We will in fact only prove “half” of this theorem: we consider only the real part
of log ζ(1/2 + it), or in other words, we consider log |ζ(1/2 + it)|. So we (re)define the
arithmetic random variables LT on ΩT by LT (t) = 0 if ζ(1/2 + it) = 0, and otherwise
LT (t) = log |ζ(1/2 + it)|. Note that dealing with the modulus means in particular that we
need not worry about the choice of the branch of the logarithm of complex numbers. We
will prove:
Theorem 4.1.2 (Selberg). The random variables
\[ \frac{L_T}{\sqrt{\frac{1}{2} \log\log T}} \]
converge in law as T → +∞ to a standard real gaussian random variable.

4.2. Strategy of the proof of Selberg’s theorem


We present the recent proof of Theorem 4.1.2 due to Radziwiłł and Soundararajan [95].
In comparison with Bagchi’s Theorem, the strategy has the common feature of the use of
suitable approximations to ζ, and the probabilistic limiting behavior will ultimately derive
from the independence and distribution of the vector $t \mapsto (p^{-it})_p$ (as in Proposition 3.2.5).
However, one has to be much more careful than in the previous section.
Precisely, the approximation used by Radziwiłł and Soundararajan involves three
steps:
• Step 1: An approximation of $L_T$ by the random variable $\tilde{L}_T$ given by
$t \mapsto \log|\zeta(\sigma_0 + it)|$ for $\sigma_0$ sufficiently close to 1/2 (where $\sigma_0$ depends on T);
• Step 2: For the random variable $Z_T$ given by $t \mapsto \zeta(\sigma_0 + it)$, so that
$\log|Z_T| = \tilde{L}_T$, an approximation of the inverse $1/Z_T$ by a short Dirichlet
polynomial $D_T$ of the type
\[ D_T(s) = \sum_{n \geq 1} a_T(n) \mu(n) n^{-s} \]
where $a_T(n)$ is zero for n large enough (again, depending on T); here µ(n)
denotes the Möbius function (see Definition C.1.3), and we recall once more that
it satisfies
\[ \sum_{n \geq 1} \mu(n) n^{-s} = \frac{1}{\zeta(s)} \]
if Re(s) > 1 (see Corollary C.1.5). At this point, we get an approximation of $L_T$
by $-\log|D_T|$;
• Step 3: An approximation of $|D_T|$ by what is essentially a short Euler product,
namely by $\exp(-\operatorname{Re}(P_T))$, where
\[ P_T(t) = \sum_{p^k \leq X} \frac{1}{k} \frac{1}{p^{k(\sigma_0 + it)}} \tag{4.1} \]
for suitable X (again depending on T). In this definition, and in all formulas
involving such sums below, the condition $p^k \leq X$ is implicitly restricted to integers
$k \geq 1$. At this point, $L_T$ is approximated by $\operatorname{Re}(P_T)$.
Finally, the last probabilistic step is to prove that the random variables
\[ \frac{\operatorname{Re}(P_T)}{\sqrt{\frac{1}{2} \log\log T}} \]
converge in law to a standard gaussian random variable as T → +∞.
None of these steps (except the last) is easy, in comparison with the results discussed
up to now, and the specific approximations that are used (namely, the choices of the
coefficients aT (n) as well as of the length parameter X) are quite subtle and by no means
obvious (they can be seen to be related to sieve methods). Even the nature of the
approximation will not be the same in the three steps!
In order to simplify the reading of the proof, we first specify the relevant parameters.
We assume from now on that $T \geq e^{e^2}$. We denote by
\[ \varrho_T = \sqrt{\tfrac{1}{2} \log\log T} \geq 1 \]
the normalizing factor in the theorem. We then define
\[ W = (\log\log\log T)^4 \asymp (\log \varrho_T)^4, \qquad \sigma_0 = \frac{1}{2} + \frac{W}{\log T} = \frac{1}{2} + O\Bigl(\frac{(\log \varrho_T)^4}{\log T}\Bigr), \tag{4.2} \]
\[ X = T^{1/(\log\log\log T)^2} = T^{1/\sqrt{W}}. \tag{4.3} \]
Note that we omit the dependency on T in most of this notation. We will also require
a further parameter
\[ Y = T^{1/(\log\log T)^2} = T^{1/(4\varrho_T^4)} \leq X. \tag{4.4} \]
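To get a feeling for the sizes involved, the parameters (4.2) to (4.4) can be evaluated numerically. The following is a minimal sketch, not part of the original argument: the function names and the sample value log T = 10^6 are illustrative assumptions, and everything is expressed through log T because T itself is far too large to store.

```python
import math

def selberg_parameters(log_T):
    """Evaluate the parameters (4.2)-(4.4) from log T (hypothetical helper)."""
    loglog = math.log(log_T)
    if loglog < 2:
        raise ValueError("the text assumes T >= exp(exp(2))")
    logloglog = math.log(loglog)
    return {
        "rho": math.sqrt(0.5 * loglog),       # normalizing factor rho_T
        "W": logloglog ** 4,                  # (4.2)
        "sigma0": 0.5 + logloglog ** 4 / log_T,
        "log_X": log_T / logloglog ** 2,      # (4.3): X = T^{1/(log log log T)^2}
        "log_Y": log_T / loglog ** 2,         # (4.4): Y = T^{1/(log log T)^2}
    }

def support_exponent(logloglog_T):
    """Exponent c = 100/log log T + 100/log log log T, as a function of log log log T."""
    return 100.0 / math.exp(logloglog_T) + 100.0 / logloglog_T

params = selberg_parameters(1e6)
c_small = support_exponent(math.log(math.log(1e6)))
c_huge = support_exponent(200.0)
```

With log T = 10^6 one finds that W is already much larger than ϱT, and that the exponent c computed by `support_exponent` is far above 1; it only drops below 1 once log log log T exceeds 100. The asymptotic statements of this section should therefore be read as genuinely asymptotic.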
We begin by stating the precise approximation statements. All parameters are now
fixed as above for the remainder of this chapter. After stating the precise form of each
step of the proof, we will show how they combine to imply Theorem 4.1.2, and finally
we will establish these intermediate results.
Proposition 4.2.1 (Moving outside of the critical line). We have
\[ \mathbf{E}_T\bigl(|L_T - \tilde{L}_T|\bigr) = o(\varrho_T) \]
as T → +∞.
We now define properly the Dirichlet polynomials that appear in the second step of
the approximation. It is here that the arithmetic subtlety lies, since the definition is quite
delicate. We define first
\[ m_1 = 100 \log\log T \asymp \varrho_T^2, \qquad m_2 = 100 \log\log\log T \asymp \log \varrho_T. \tag{4.5} \]
We denote by $b_T(n)$ the characteristic function of the set of squarefree integers $n \geq 1$
such that all prime factors of n are ≤ Y, and n has at most $m_1$ prime factors. We denote
by $c_T(n)$ the characteristic function of the set of squarefree integers $n \geq 1$ such that all
prime factors p of n satisfy Y < p ≤ X, and n has at most $m_2$ prime factors. We associate
to these the Dirichlet polynomials
\[ B(s) = \sum_{n \geq 1} \mu(n) b_T(n) n^{-s}, \qquad C(s) = \sum_{n \geq 1} \mu(n) c_T(n) n^{-s} \]

for s ∈ C. Finally, define D(s) = B(s)C(s). The coefficient of $n^{-s}$ in the expansion of
D(s) is the Dirichlet convolution
\[ \sum_{de=n} b_T(d)c_T(e)\mu(d)\mu(e) = \sum_{\substack{de=n \\ (d,e)=1}} b_T(d)c_T(e)\mu(d)\mu(e) = \mu(n) \sum_{\substack{de=n \\ (d,e)=1}} b_T(d)c_T(e) = \mu(n) a_T(n), \]

say, by Proposition A.4.4, where we used the fact that d and e are coprime if bT (d)cT (e)
is non-zero since the set of primes dividing an integer in the support of bT is disjoint
from the set of primes dividing an integer in the support of $c_T$. It follows then from this
formula that $a_T(n)$ is the characteristic function of the set of squarefree integers $n \geq 1$
such that
(1) All prime factors of n are ≤ X;
(2) There are at most $m_1$ prime factors p of n such that p ≤ Y;
(3) There are at most $m_2$ prime factors p of n such that Y < p ≤ X.
It is immediate, but very important, that $a_T(n) = 0$ unless n is quite small, namely
\[ n \leq Y^{100 \log\log T} X^{100 \log\log\log T} = T^c \]
where
\[ c = \frac{100}{\log\log T} + \frac{100}{\log\log\log T} \to 0 \quad \text{as } T \to +\infty. \]
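The description of $a_T$ as a convolution can be checked by brute force on a toy example. The sketch below is only an illustration: the values X = 20, Y = 5, m1 = 2, m2 = 1 are hypothetical small parameters (not those of (4.3) to (4.5)), and all function names are ours.

```python
def prime_factors(n):
    """Prime factors of n with multiplicity, by trial division."""
    factors, p = [], 2
    while p * p <= n:
        while n % p == 0:
            factors.append(p)
            n //= p
        p += 1
    if n > 1:
        factors.append(n)
    return factors

def mobius(n):
    f = prime_factors(n)
    return 0 if len(set(f)) < len(f) else (-1) ** len(f)

def make_coefficients(X, Y, m1, m2):
    """Characteristic functions a, b, c as defined in the text, for toy parameters."""
    def b(n):
        f = prime_factors(n)
        return int(len(set(f)) == len(f) and all(p <= Y for p in f) and len(f) <= m1)
    def c(n):
        f = prime_factors(n)
        return int(len(set(f)) == len(f) and all(Y < p <= X for p in f) and len(f) <= m2)
    def a(n):
        f = prime_factors(n)
        small = [p for p in f if p <= Y]
        large = [p for p in f if Y < p <= X]
        return int(len(set(f)) == len(f) and len(small) + len(large) == len(f)
                   and len(small) <= m1 and len(large) <= m2)
    return a, b, c

def convolution_coefficient(n, b, c):
    """Coefficient of n^{-s} in B(s)C(s)."""
    return sum(b(d) * c(n // d) * mobius(d) * mobius(n // d)
               for d in range(1, n + 1) if n % d == 0)

a, b, c = make_coefficients(X=20, Y=5, m1=2, m2=1)
mismatches = [n for n in range(1, 2000)
              if convolution_coefficient(n, b, c) != mobius(n) * a(n)]
```

In this toy case the list `mismatches` is empty, in agreement with the identity between the coefficient of $n^{-s}$ in D(s) and $\mu(n) a_T(n)$.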
Finally, we define the arithmetic random variable
(4.6) DT = D(σ0 + it).
Remark 4.2.2. Although the definition of D(s) may seem complicated, we will see
its different components coming together in the proofs of this proposition and the next.
If we consider the support of $a_T(n)$, we note that (by the Erdős–Kac Theorem,
restricted to squarefree integers as in Exercise 2.3.4) the typical number of prime factors
of an integer $n \leq Y^{m_1}$ is about $\log\log Y^{m_1} \sim \log\log T$. Therefore the integers satisfying
$b_T(n) = 1$ are quite typical, and only extreme outliers (in terms of the number of prime
factors) are excluded. On the other hand, the integers satisfying $c_T(n) = 1$ have far
fewer prime factors than is typical, and are therefore quite rare (they are, in a weak
sense, “almost prime”). This indicates that $a_T$ is a subtle arithmetic truncation of the
characteristic function of integers $n \leq T^c$, and hence that
\[ \sum_{n \geq 1} a_T(n) \mu(n) n^{-s} \]

is an arithmetic truncation of the Dirichlet series that formally gives the inverse of ζ(s).
This should be contrasted with the more traditional analytic truncations of ζ(s) that
were used in Lemma 3.2.10 and Proposition 3.2.11. For comparison, it is useful to note
that Selberg himself used in many applications certain truncations that are roughly of
the shape
\[ \sum_{n \leq X} \frac{\mu(n)}{n^s} \Bigl(1 - \frac{\log n}{\log X}\Bigr). \]

Proposition 4.2.3 (Dirichlet polynomial approximation). The product $Z_T D_T$ converges
to 1 in $L^2$, i.e., we have
\[ \lim_{T \to +\infty} \mathbf{E}_T\bigl(|1 - Z_T D_T|^2\bigr) = 0. \]

Proposition 4.2.4 (Euler product approximation). The random variables $D_T \exp(P_T)$
converge to 1 in probability, i.e., for any ε > 0, we have
\[ \lim_{T \to +\infty} \mathbf{P}_T\bigl(|D_T \exp(P_T) - 1| > \varepsilon\bigr) = 0. \]

In particular, PT (DT = 0) tends to 0 as T → +∞.


Despite our probabilistic presentation, the three previous statements are really theorems
of number theory, and would usually be stated without probabilistic notation. For
instance, Proposition 4.2.1 means that
\[ \frac{1}{T} \int_{-T}^{T} \bigl| \log|\zeta(1/2 + it)| - \log|\zeta(\sigma_0 + it)| \bigr| \, dt = o\bigl(\sqrt{\log\log T}\bigr). \]
The last result finally introduces the probabilistic behavior.
Proposition 4.2.5 (Gaussian Euler products). The random variables $\varrho_T^{-1} P_T$ converge
in law as T → +∞ to a standard complex gaussian random variable. In particular, the
random variables
\[ \frac{\operatorname{Re}(P_T)}{\sqrt{\frac{1}{2} \log\log T}} \]
converge in law to a standard gaussian random variable.
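The gaussian behavior can be observed experimentally on the probabilistic model in which $p^{-it}$ is replaced by independent random variables $X_p$ uniform on the unit circle. The following simulation is a rough sketch under simplifying assumptions that are ours, not the text's: the exponent is fixed at σ = 1/2, the primes are cut off at the hypothetical bound 1000, and only prime terms (no higher prime powers) are kept.

```python
import cmath
import math
import random

def primes_up_to(x):
    """Sieve of Eratosthenes: all primes p <= x (for x >= 2)."""
    sieve = [True] * (x + 1)
    sieve[0] = sieve[1] = False
    for i in range(2, int(x ** 0.5) + 1):
        if sieve[i]:
            for multiple in range(i * i, x + 1, i):
                sieve[multiple] = False
    return [i for i, is_prime in enumerate(sieve) if is_prime]

def sample_model(primes, sigma, rng):
    """One sample of Re(sum_p p^{-sigma} X_p), divided by its standard deviation."""
    total = sum(p ** (-sigma) * cmath.exp(2j * math.pi * rng.random()) for p in primes)
    # Re(X_p) has variance 1/2 when X_p is uniform on the unit circle.
    variance = 0.5 * sum(p ** (-2 * sigma) for p in primes)
    return total.real / math.sqrt(variance)

rng = random.Random(0)  # fixed seed, so that the experiment is reproducible
primes = primes_up_to(1000)
samples = [sample_model(primes, 0.5, rng) for _ in range(2000)]
mean = sum(samples) / len(samples)
variance = sum((x - mean) ** 2 for x in samples) / len(samples)
```

The empirical mean and variance should be close to 0 and 1 respectively. Since the sum of 1/p grows like log log X, convergence to the gaussian is very slow, and a histogram of `samples` is only roughly bell-shaped for such a small cutoff.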
We will now explain how to combine these ingredients for the final step of the proof.
Proof of Theorem 4.1.2. Until Proposition 4.2.5 is used, this is essentially a vari-
ant of the fact that convergence in probability implies convergence in law, and that con-
vergence in L1 or L2 implies convergence in probability.
For the details, fix some standard gaussian random variable N. Let f be a bounded
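Lipschitz function R −→ R, and let C > 0 be a real number such that
\[ |f(x) - f(y)| \leq C|x - y|, \qquad |f(x)| \leq C, \qquad \text{for } x, y \in \mathbf{R}. \]
We consider the difference
\[ \Bigl| \mathbf{E}_T\Bigl( f\Bigl(\frac{L_T}{\varrho_T}\Bigr) \Bigr) - \mathbf{E}(f(N)) \Bigr|, \]
and must show that this tends to 0 as T → +∞.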
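We estimate this quantity using the “chain” of approximations introduced above: we
have
\[
\begin{aligned}
\Bigl| \mathbf{E}_T\Bigl( f\Bigl(\frac{L_T}{\varrho_T}\Bigr) \Bigr) - \mathbf{E}(f(N)) \Bigr| \leq{}
& \mathbf{E}_T\Bigl( \Bigl| f\Bigl(\frac{L_T}{\varrho_T}\Bigr) - f\Bigl(\frac{\tilde{L}_T}{\varrho_T}\Bigr) \Bigr| \Bigr)
+ \mathbf{E}_T\Bigl( \Bigl| f\Bigl(\frac{\tilde{L}_T}{\varrho_T}\Bigr) - f\Bigl(\frac{\log|D_T|^{-1}}{\varrho_T}\Bigr) \Bigr| \Bigr) \\
& + \mathbf{E}_T\Bigl( \Bigl| f\Bigl(\frac{\log|D_T|^{-1}}{\varrho_T}\Bigr) - f\Bigl(\frac{\operatorname{Re} P_T}{\varrho_T}\Bigr) \Bigr| \Bigr)
+ \Bigl| \mathbf{E}_T\Bigl( f\Bigl(\frac{\operatorname{Re} P_T}{\varrho_T}\Bigr) \Bigr) - \mathbf{E}(f(N)) \Bigr|,
\end{aligned}
\tag{4.7}
\]
and we discuss each of the four terms on the right-hand side using the four previous
propositions (here and below, we define $|D_T|^{-1}$ to be 0 if $D_T = 0$).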
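The first term is handled straightforwardly using Proposition 4.2.1: we have
\[ \mathbf{E}_T\Bigl( \Bigl| f\Bigl(\frac{L_T}{\varrho_T}\Bigr) - f\Bigl(\frac{\tilde{L}_T}{\varrho_T}\Bigr) \Bigr| \Bigr) \leq \frac{C}{\varrho_T} \mathbf{E}_T\bigl(|L_T - \tilde{L}_T|\bigr) \longrightarrow 0 \]
as T → +∞.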
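For the second term, let $A_T \subset \Omega_T$ be the event
\[ \{D_T = 0\} \cup \bigl\{ \bigl| \tilde{L}_T - \log|D_T|^{-1} \bigr| > 1/2 \bigr\}, \]
and $A_T'$ its complement. Since $\log|Z_T| = \tilde{L}_T$, we then have
\[ \mathbf{E}_T\Bigl( \Bigl| f\Bigl(\frac{\tilde{L}_T}{\varrho_T}\Bigr) - f\Bigl(\frac{\log|D_T|^{-1}}{\varrho_T}\Bigr) \Bigr| \Bigr) \leq 2C\, \mathbf{P}_T(A_T) + \frac{C}{2\varrho_T}. \]
Proposition 4.2.3 implies that $\mathbf{P}_T(A_T) \to 0$ (convergence to 1 of $Z_T D_T$ in $L^2$ implies
convergence to 1 in probability, hence convergence to 0 in probability for the logarithm
of the modulus) and therefore
\[ \mathbf{E}_T\Bigl( \Bigl| f\Bigl(\frac{\tilde{L}_T}{\varrho_T}\Bigr) - f\Bigl(\frac{\log|D_T|^{-1}}{\varrho_T}\Bigr) \Bigr| \Bigr) \to 0 \]
as T → +∞.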
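We now come to the third term on the right-hand side of (4.7). Distinguishing according
to the event
\[ B_T = \bigl\{ \bigl| \log|D_T \exp(P_T)| \bigr| > 1/2 \bigr\} \]
and its complement, we get as before
\[ \mathbf{E}_T\Bigl( \Bigl| f\Bigl(\frac{\log|D_T|^{-1}}{\varrho_T}\Bigr) - f\Bigl(\frac{\operatorname{Re} P_T}{\varrho_T}\Bigr) \Bigr| \Bigr) \leq 2C\, \mathbf{P}_T(B_T) + \frac{C}{2\varrho_T}, \]
and this also tends to 0 as T → +∞ by Proposition 4.2.4.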
Finally, Proposition 4.2.5 implies that
\[ \Bigl| \mathbf{E}_T\Bigl( f\Bigl(\frac{\operatorname{Re} P_T}{\varrho_T}\Bigr) \Bigr) - \mathbf{E}(f(N)) \Bigr| \to 0 \]
as T → +∞, and hence we conclude the proof of the theorem, assuming the approximation
statements. □

We now explain the proofs of these four propositions. We begin with the easiest part,
which also happens to be where the transition to the pure probabilistic behavior hap-
pens. A key tool is the quantitative form of Proposition 3.2.5 contained in Lemma 3.2.6.
More precisely, as in Section 3.2, let X = (Xp )p be a sequence of independent random
variables uniformly distributed on S1 . We define Xn for n > 1 by multiplicativity as in
formula (3.7).
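Lemma 4.2.6. Let $(a(n))_{n \geq 1}$ be any sequence of complex numbers with bounded support.
For any T > 2 and σ > 0, we have
\[
\begin{aligned}
\mathbf{E}_T\Bigl( \Bigl| \sum_{n \geq 1} \frac{a(n)}{n^{\sigma+it}} \Bigr|^2 \Bigr)
&= \sum_{n \geq 1} \frac{|a(n)|^2}{n^{2\sigma}} + O\Bigl( \frac{1}{T} \sum_{\substack{m,n \geq 1 \\ m \neq n}} \frac{|a(m)a(n)|}{(mn)^{\sigma - \frac{1}{2}}} \Bigr) \\
&= \mathbf{E}\Bigl( \Bigl| \sum_{n \geq 1} \frac{a(n) X_n}{n^{\sigma}} \Bigr|^2 \Bigr) + O\Bigl( \frac{1}{T} \sum_{\substack{m,n \geq 1 \\ m \neq n}} \frac{|a(m)a(n)|}{(mn)^{\sigma - \frac{1}{2}}} \Bigr),
\end{aligned}
\]
where the implied constant is absolute.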


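Proof. We have
\[ \mathbf{E}_T\Bigl( \Bigl| \sum_{n \geq 1} \frac{a(n)}{n^{\sigma+it}} \Bigr|^2 \Bigr) = \sum_{m} \sum_{n} \frac{a(m)\overline{a(n)}}{(mn)^{\sigma}} \, \mathbf{E}_T\Bigl( \Bigl(\frac{n}{m}\Bigr)^{it} \Bigr). \]
We now apply Lemma 3.2.6 and separate the “diagonal” contribution where m = n from
the remainder. This leads to the first formula in the lemma, and the second then reflects
the orthonormality of the sequence $(X_n)_{n \geq 1}$. □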

When applying this lemma, we call the first term the “diagonal” contribution, and
the second the “off-diagonal” one.
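For a finite Dirichlet polynomial the left-hand side of Lemma 4.2.6 can be computed exactly, because for t uniform on [−T, T] one has $\mathbf{E}_T((n/m)^{it}) = \sin(T\theta)/(T\theta)$ with θ = log(n/m). The sketch below compares the exact mean square with the diagonal term. It assumes real coefficients for simplicity, and the function names and the sample values σ = 0.6, T = 10^4 are ours.

```python
import math

def mean_square(a, sigma, T):
    """Exact value of E_T |sum_n a(n) n^{-sigma-it}|^2 for t uniform on [-T, T].

    `a` is a dict n -> a(n) with real values (a simplifying assumption)."""
    total = 0.0
    for m, am in a.items():
        for n, an in a.items():
            theta = math.log(n / m)
            # E_T((n/m)^{it}) equals 1 on the diagonal and a sinc value otherwise.
            weight = 1.0 if m == n else math.sin(T * theta) / (T * theta)
            total += am * an * (m * n) ** (-sigma) * weight
    return total

def diagonal(a, sigma):
    """Diagonal contribution sum_n |a(n)|^2 n^{-2 sigma}."""
    return sum(an ** 2 * n ** (-2 * sigma) for n, an in a.items())

def off_diagonal_bound(a, sigma, T):
    """The quantity inside the O-term of the lemma."""
    return sum(abs(am * an) * (m * n) ** (0.5 - sigma)
               for m, am in a.items() for n, an in a.items() if m != n) / T

coefficients = {n: 1.0 for n in range(1, 51)}
sigma, T = 0.6, 1e4
gap = abs(mean_square(coefficients, sigma, T) - diagonal(coefficients, sigma))
bound = off_diagonal_bound(coefficients, sigma, T)
```

Since $|\log(n/m)| \geq 1/(2\min(m,n))$ for m ≠ n, the gap is at most twice `bound` in this setting, which gives one admissible value of the absolute implied constant for this direct computation.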

Proof of Proposition 4.2.5. We have PT = QT + RT , where QT is the contribu-


tion of the primes, and RT the contribution of squares and higher powers of primes. We
first claim that RT is uniformly bounded in L2 for all T. Indeed, using Lemma 4.2.6, we
get
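\[
\begin{aligned}
\mathbf{E}_T(|R_T|^2) &= \mathbf{E}_T\Bigl( \Bigl| \sum_{k \geq 2} \sum_{p \leq X^{1/k}} \frac{1}{k} p^{-k\sigma_0} p^{-kit} \Bigr|^2 \Bigr) \\
&= \sum_{\substack{p^k \leq X \\ k \geq 2}} \frac{1}{k^2} p^{-2k\sigma_0} + O\Bigl( \frac{1}{T} \sum_{k,l \geq 2} \frac{1}{kl} \sum_{\substack{p^k, q^l \leq X \\ p \neq q}} (pq)^{-2\sigma_0 + \frac{1}{2}} \Bigr) \ll 1 + \frac{X^2 \log X}{T} \ll 1
\end{aligned}
\]
since $X \ll T^{\varepsilon}$ for any ε > 0.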


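From this, it follows that it is enough to show that $Q_T/\varrho_T$ converges in law to a
standard complex gaussian random variable N. For this purpose, we use moments, i.e.,
we compute
\[ \mathbf{E}_T\bigl( Q_T^k \, \overline{Q_T}^{\ell} \bigr) \]
for integers k, ℓ ≥ 0, and we compare with the corresponding moment of the random
variable
\[ \mathcal{Q}_T = \sum_{p \leq X} p^{-\sigma_0} X_p \]
(written here with a calligraphic letter to distinguish this probabilistic model from the
arithmetic random variable $Q_T$).
After applying Lemma 3.2.6 again (as in the proof of Lemma 4.2.6), we find that
\[ \mathbf{E}_T\bigl( Q_T^k \, \overline{Q_T}^{\ell} \bigr) = \mathbf{E}\bigl( \mathcal{Q}_T^k \, \overline{\mathcal{Q}_T}^{\ell} \bigr) + O\Bigl( \frac{1}{T} \sum_{m \neq n} (mn)^{-\sigma_0 + 1/2} \Bigr) \]
where the sum in the error term runs over integers m (resp. n) with at most k prime
factors, counted with multiplicity, all of which are ≤ X (resp. at most ℓ prime factors,
counted with multiplicity, all of which are ≤ X). Hence this error term is
\[ \ll \frac{1}{T} \Bigl( \sum_{p \leq X} 1 \Bigr)^{k+\ell} \ll \frac{X^{k+\ell}}{T}. \]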
Next, we note that
X 1 X −2σ0
V(QT ) = p−2σ0 V(X2p ) = p .
p6X
2 p6X

We compute this sum by splitting in two ranges p ≤ Y and Y < p ≤ X (recall that σ0
depends on T). The second sum is
\[ \ll \sum_{Y < p \leq X} \frac{1}{p} = \log\Bigl( \frac{\log X}{\log Y} \Bigr) + O(1) \ll \log\log\log T \]
by Proposition C.3.1 and (4.2). On the other hand, for $p \leq Y = T^{1/(\log\log T)^2}$, we have
\[ p^{-2\sigma_0} = p^{-1} \exp\Bigl( -2 \frac{(\log p) W}{\log T} \Bigr) = p^{-1} \Bigl( 1 + O\Bigl( \frac{W}{(\log\log T)^2} \Bigr) \Bigr), \]
which, in view of (4.2), implies that $\mathbf{V}(Q_T) \sim \frac{1}{2} \log\log T = \varrho_T^2$ as T → +∞.
It is finally again a case of the Central Limit Theorem that $Q_T/\sqrt{\mathbf{V}(Q_T)}$, and hence
also $Q_T/\varrho_T$, converges in law to a standard complex gaussian random variable, with
convergence of the moments (Theorem B.7.2 and Theorem B.5.6 (2), Remark B.5.8), so
the conclusion follows from the method of moments since $X^{k+\ell}/T \to 0$ as T → +∞. □
The other propositions will now be proved in order. Some of the arithmetic results
that we will use are only stated in Appendix C (with suitable references).
Proof of Proposition 4.2.1. We appeal to Hadamard’s factorization of the Rie-
mann zeta function (Proposition C.4.3) in the form of its corollary, Proposition C.4.4.
Let t ∈ ΩT be such that there is no zero of zeta with ordinate t (this only excludes finitely
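many values of t for a given T). We have
\[ \log|\zeta(\sigma_0 + it)| - \log|\zeta(\tfrac{1}{2} + it)| = \operatorname{Re} \int_{1/2}^{\sigma_0} \frac{\zeta'}{\zeta}(\sigma + it) \, d\sigma = \int_{1/2}^{\sigma_0} \operatorname{Re}\Bigl( \frac{\zeta'}{\zeta}(\sigma + it) \Bigr) d\sigma. \]
For any σ with $1/2 \leq \sigma \leq \sigma_0$, we have
\[ -\frac{\zeta'}{\zeta}(\sigma + it) = \sum_{|t - \operatorname{Im}(\varrho)| < 1} \frac{1}{\sigma + it - \varrho} + O(\log(2 + |t|)), \]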

by Proposition C.4.4, where the sum is over zeros % of ζ(s), counted with multiplicity,
such that |σ + it − %| < 1.
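We fix $t_0 \in \Omega_T$ and integrate over t such that $|t - t_0| \leq 1$. This leads to
\[ \int_{t_0 - 1}^{t_0 + 1} \Bigl| \log|\zeta(\sigma_0 + it)| - \log|\zeta(\tfrac{1}{2} + it)| \Bigr| \, dt \leq \sum_{|\operatorname{Im}(\varrho) - t_0| \leq 1} \int_{1/2}^{\sigma_0} \int_{t_0 - 1}^{t_0 + 1} \Bigl| \operatorname{Re}\Bigl( \frac{1}{\sigma + it - \varrho} \Bigr) \Bigr| \, dt \, d\sigma. \]
An elementary integral (!) gives, writing $\varrho = \beta + i\gamma$,
\[ \int_{t_0 - 1}^{t_0 + 1} \Bigl| \operatorname{Re}\Bigl( \frac{1}{\sigma + it - \varrho} \Bigr) \Bigr| \, dt \leq \int_{\mathbf{R}} \Bigl| \operatorname{Re}\Bigl( \frac{1}{\sigma + it - \varrho} \Bigr) \Bigr| \, dt = \int_{\mathbf{R}} \frac{|\sigma - \beta|}{(\sigma - \beta)^2 + (t - \gamma)^2} \, dt = \pi \]
for all σ and $\varrho$. Hence we get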


\[ \frac{1}{T} \int_{|t - t_0| \leq 1} \Bigl| \log|\zeta(\tfrac{1}{2} + it)| - \log|\zeta(\sigma_0 + it)| \Bigr| \, dt \ll (\sigma_0 - \tfrac{1}{2}) \frac{m(t_0)}{T}, \]
where $m(t_0)$ is the number of zeros $\varrho$ such that $|t_0 - \operatorname{Im}(\varrho)| \leq 1$. This is $\ll \log(2 + |t_0|)$
by Proposition C.4.4 again. Finally, by summing the bound
\[ \frac{1}{T} \int_{|t - t_0| \leq 1} \Bigl| \log|\zeta(\tfrac{1}{2} + it)| - \log|\zeta(\sigma_0 + it)| \Bigr| \, dt \ll (\sigma_0 - \tfrac{1}{2}) \frac{\log(2 + |t_0|)}{T} \]
over a partition of $\Omega_T$ in $\ll T$ intervals of length 2, we deduce that
\[ \mathbf{E}_T\bigl(|L_T - \tilde{L}_T|\bigr) \ll (\sigma_0 - \tfrac{1}{2}) \log T = W. \]
We have $W = o(\varrho_T)$ (by a rather wide margin!), and the proposition follows. □
The last two propositions are more involved, and we present their proofs in separate
sections.

4.3. Dirichlet polynomial approximation


We will prove Proposition 4.2.3 in this section, i.e., we need to prove that
ET (|1 − ZT DT |2 ),
where ZT (t) = ζ(σ0 +it), tends to 0 as T → +∞. This is arithmetically the most involved
part of the proof.
First of all, we use the approximation formula
\[ \zeta(\sigma_0 + it) = \sum_{1 \leq n \leq T} n^{-\sigma_0 - it} + O\Bigl( \frac{T^{1 - \sigma_0}}{|t| + 1} + T^{-1/2} \Bigr) \]
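for t ∈ ΩT (see Proposition C.4.5). Multiplying by $D_T$, we obtain
\[ \mathbf{E}_T(Z_T D_T) = \sum_{\substack{m \geq 1 \\ n \leq T}} a_T(m) \mu(m) \, \mathbf{E}_T\bigl( (mn)^{-\sigma_0 - it} \bigr) + O\Bigl( T^{1/2} \sum_{m \geq 1} a_T(m) \, \mathbf{E}_T\bigl( (|t| + 1)^{-1} \bigr) + T^{-1/2} \sum_{m \geq 1} a_T(m) m^{-\sigma_0} \Bigr). \]
We recall that $|a_T(n)| \leq 1$ for all n, and $a_T(n) = 0$ unless $n \ll T^{\varepsilon}$, for any ε > 0. Hence,
by (3.4), this becomes
\[ \mathbf{E}_T(Z_T D_T) = 1 + O\Bigl( \frac{1}{T} \sum_{\substack{n \leq T \\ mn \neq 1}} a_T(m) (mn)^{-\sigma_0} (\log mn) \Bigr) + O(T^{-1/2 + \varepsilon}) = 1 + O(T^{-1/2 + \varepsilon}) \]
for any ε > 0 (in the diagonal terms, only m = n = 1 contributes, and in the off-diagonal
terms mn ≠ 1, we have $\mathbf{E}_T((mn)^{-it}) \ll T^{-1} \log(mn)$). It follows that it suffices to prove
that
\[ \lim_{T \to +\infty} \mathbf{E}_T\bigl( |Z_T D_T|^2 \bigr) = 1. \]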

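We expand the mean-square using the formula for $D_T$, and obtain
\[ \mathbf{E}_T\bigl( |Z_T D_T|^2 \bigr) = \sum_{m,n} \frac{\mu(m)\mu(n)}{(mn)^{\sigma_0}} a_T(m) a_T(n) \, \mathbf{E}_T\Bigl( \Bigl(\frac{m}{n}\Bigr)^{it} |Z_T|^2 \Bigr). \]
Now the asymptotic formula of Proposition C.4.6 translates to a formula for
$\mathbf{E}_T((m/n)^{it} |Z_T|^2)$, namely
\[ \mathbf{E}_T\Bigl( \Bigl(\frac{m}{n}\Bigr)^{it} |Z_T|^2 \Bigr) = \zeta(2\sigma_0) \Bigl( \frac{(m,n)^2}{mn} \Bigr)^{\sigma_0} + \zeta(2 - 2\sigma_0) \Bigl( \frac{(m,n)^2}{mn} \Bigr)^{1 - \sigma_0} \mathbf{E}_T\Bigl( \Bigl(\frac{|t|}{2\pi}\Bigr)^{1 - 2\sigma_0} \Bigr) + O\bigl( \min(m,n) T^{-\sigma_0 + \varepsilon} \bigr) \]
for any ε > 0, where the expectation is really the integral
\[ \frac{1}{2T} \int_{-T}^{T} \Bigl(\frac{|t|}{2\pi}\Bigr)^{1 - 2\sigma_0} dt, \]
and we recall that (m, n) denotes here the gcd of m and n.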
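Using the properties of $a_T(n)$, the error term is easily handled, since it is at most
\[ T^{-\sigma_0 + \varepsilon} \sum_{m,n} (mn)^{-\sigma_0} a_T(m) a_T(n) \min(m,n) \leq T^{-\sigma_0 + \varepsilon} \Bigl( \sum_{m} m^{1/2} a_T(m) \Bigr)^2 \ll T^{-\sigma_0 + 2\varepsilon} \]
for any ε > 0. Thus we only need to handle the main terms, which we write as
\[ \zeta(2\sigma_0) M_1 + \zeta(2 - 2\sigma_0) \, \mathbf{E}_T\Bigl( \Bigl(\frac{|t|}{2\pi}\Bigr)^{1 - 2\sigma_0} \Bigr) M_2, \tag{4.8} \]
where
\[ M_1 = \sum_{m,n} \frac{\mu(m)\mu(n)}{(mn)^{2\sigma_0}} a_T(m) a_T(n) (m,n)^{2\sigma_0} \]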
and $M_2$ is the other term. Using the multiplicative structure of $a_T$, the first term factors
in turn as $M_1 = M_1' M_1''$, where
\[ M_1' = \sum_{m,n} \frac{\mu(m)\mu(n)}{[m,n]^{2\sigma_0}} b_T(m) b_T(n), \qquad M_1'' = \sum_{m,n} \frac{\mu(m)\mu(n)}{[m,n]^{2\sigma_0}} c_T(m) c_T(n). \]
We compare $M_1'$ to the similar sum $\tilde{M}_1'$ where $b_T(n)$ and $b_T(m)$ are replaced by
characteristic functions of integers with all prime factors ≤ Y, forgetting only the requirement
to have ≤ $m_1$ prime factors. By Example C.1.7, we have
\[ \tilde{M}_1' = \prod_{p \leq Y} \Bigl( 1 - \frac{1}{p^{2\sigma_0}} \Bigr). \]
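The Euler product for the untruncated sum can be confirmed numerically, since for a finite set of primes the double sum over squarefree integers is finite. The following sketch uses a hypothetical small set of primes and the sample exponent s = 1.1 in place of $2\sigma_0$; the function names are ours.

```python
from itertools import combinations
from math import gcd, prod

def squarefree_smooth(primes):
    """Pairs (n, mu(n)) for all squarefree n whose prime factors lie in `primes`."""
    terms = []
    for r in range(len(primes) + 1):
        for subset in combinations(primes, r):
            terms.append((prod(subset), (-1) ** r))
    return terms

def double_sum(primes, s):
    """sum over m, n of mu(m) mu(n) / [m, n]^s, with [m, n] the least common multiple."""
    terms = squarefree_smooth(primes)
    return sum(mu_m * mu_n / (m * n // gcd(m, n)) ** s
               for m, mu_m in terms for n, mu_n in terms)

def euler_product(primes, s):
    return prod(1 - p ** (-s) for p in primes)

small_primes = [2, 3, 5, 7, 11]
lhs = double_sum(small_primes, 1.1)
rhs = euler_product(small_primes, 1.1)
```

The two values agree up to rounding. The reason is local: for each prime p the four pairs (1,1), (p,1), (1,p), (p,p) contribute $1 - p^{-s} - p^{-s} + p^{-s} = 1 - p^{-s}$.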

The difference $M_1' - \tilde{M}_1'$ can be bounded from above by
\[ 2e^{-m_1} \sum_{m,n} \frac{|\mu(m)\mu(n)|}{[m,n]^{2\sigma_0}} e^{\Omega(m)} \]
where the sum runs over integers with all prime factors ≤ Y (this step is a case of what
is called “Rankin’s trick”: the condition $\Omega(m) > m_1$ is handled by bounding its
characteristic function by the non-negative function $e^{\Omega(m) - m_1}$). Again from Example C.1.7, this
is at most
\[ 2(\log T)^{-100} \prod_{p \leq Y} \Bigl( 1 + \frac{1 + 2e}{p} \Bigr) \ll (\log T)^{-90} \]
(by Proposition C.3.6). Thus
\[ M_1' \sim \prod_{p \leq Y} (1 - p^{-2\sigma_0}) \]
as T → +∞. One deals similarly with the second term $M_1''$, which turns out to satisfy
\[ M_1'' \sim \prod_{Y < p \leq X} (1 - p^{-2\sigma_0}), \]
and hence
\[ \zeta(2\sigma_0) M_1 \sim \zeta(2\sigma_0) \prod_{p \leq X} (1 - p^{-2\sigma_0}) = \prod_{p > X} (1 - p^{-2\sigma_0})^{-1}. \]

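Now, by the choice of the parameters, we obtain from the Prime Number Theorem
(Theorem C.3.3) the upper-bound
\[ \sum_{p > X} p^{-2\sigma_0} \ll \int_{X}^{+\infty} \frac{dt}{t^{2\sigma_0} \log t} \ll \frac{X^{1 - 2\sigma_0}}{(2\sigma_0 - 1) \log X} = \frac{X^{1 - 2\sigma_0}}{2\sqrt{W}} \leq \frac{1}{2\sqrt{W}}. \]
Since this tends to 0 as T → +∞, it follows that
\[ \prod_{p > X} (1 - p^{-2\sigma_0})^{-1} = \exp\Bigl( \sum_{p > X} \Bigl( \frac{1}{p^{2\sigma_0}} + O\Bigl( \frac{1}{p^{4\sigma_0}} \Bigr) \Bigr) \Bigr) = \exp\Bigl( \sum_{p > X} p^{-2\sigma_0} (1 + o(1)) \Bigr) \]
converges to 1 as T → +∞.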
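There only remains to check that the second part $M_2$ of the main term (4.8) tends to
0 as T → +∞. We have
\[ M_2 = \sum_{m,n} \frac{\mu(m)\mu(n)}{mn} a_T(m) a_T(n) (m,n)^{2 - 2\sigma_0} = \sum_{m,n} \frac{\mu(m)\mu(n)}{[m,n]^{2 - 2\sigma_0}} a_T(m) a_T(n) (mn)^{1 - 2\sigma_0}. \]
The procedure is very similar: we factor $M_2 = M_2' M_2''$, where $M_2'$ has coefficients $b_T$ instead
of $a_T$, and $M_2''$ has $c_T$. Applying Example C.1.7 and Rankin’s Trick to both factors now
leads to
\[ M_2 \sim \prod_{p \leq X} \Bigl( 1 + \frac{1}{p^{2 - 2\sigma_0}} \Bigl( -\frac{1}{p^{2\sigma_0 - 1}} - \frac{1}{p^{2\sigma_0 - 1}} + \frac{1}{p^{4\sigma_0 - 2}} \Bigr) \Bigr) = \prod_{p \leq X} \Bigl( 1 - \frac{2}{p} + \frac{1}{p^{2\sigma_0}} \Bigr). \]
We deduce from this that the contribution of $M_2$ to (4.8) is
\[ \sim \zeta(2 - 2\sigma_0) \, \mathbf{E}_T\Bigl( \Bigl(\frac{|t|}{2\pi}\Bigr)^{1 - 2\sigma_0} \Bigr) \prod_{p \leq X} \Bigl( 1 - \frac{2}{p} + \frac{1}{p^{2\sigma_0}} \Bigr). \]
Since ζ(s) has a pole at s = 1 with residue 1, this last expression is
\[ \ll \frac{T^{1 - 2\sigma_0}}{2\sigma_0 - 1} \prod_{p \leq X} \Bigl( 1 - \frac{1}{p} \Bigr) \ll \frac{T^{1 - 2\sigma_0}}{(2\sigma_0 - 1) \log X}. \]
In terms of the parameter W, since $2\sigma_0 - 1 = 2W/\log T$ and $X = T^{1/\sqrt{W}}$, the right-hand
side is simply $\tfrac{1}{2} \exp(-2W) W^{-1/2}$, and hence tends to 0 as T → +∞. This concludes the
proof.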

4.4. Euler product approximation


This section is devoted to the proof of Proposition 4.2.4. We need to prove that
DT exp(PT ) converges to 1 in probability. This involves some extra decomposition of PT :
we write
PT = QT + RT
where $Q_T$ is the contribution to (4.1) of the prime powers $p^k \leq Y$, and $R_T$ is the
contribution of the prime powers with $Y < p^k \leq X$.
In addition, for any integer $m \geq 0$, we denote by $\exp_m$ the Taylor polynomial of
degree m of the exponential function at 0, i.e.
\[ \exp_m(z) = \sum_{j=0}^{m} \frac{z^j}{j!}. \]

We have an elementary lemma:


Lemma 4.4.1. Let z ∈ C and $m \geq 0$. If $m \geq 100|z|$, then
\[ \exp_m(z) = e^z + O(\exp(-m)) = e^z \bigl( 1 + O(\exp(-99|z|)) \bigr). \]
Proof. Indeed, since $j! \geq (j/e)^j$ for all $j \geq 0$ and $|z| \leq m/100$, the difference
$e^z - \exp_m(z)$ is at most
\[ \sum_{j > m} \frac{(m/100)^j}{j!} \leq \sum_{j > m} \Bigl( \frac{em}{100j} \Bigr)^j \ll \exp(-m). \]
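The first estimate of the lemma is easy to test numerically. The sketch below evaluates the truncated exponential on the boundary case |z| = m/100; the function names are ours, and only small values of m are used because for large m the bound exp(−m) falls below double-precision rounding error.

```python
import cmath
import math

def exp_m(z, m):
    """Taylor polynomial of degree m of the exponential function at 0."""
    term, total = 1.0 + 0j, 1.0 + 0j
    for j in range(1, m + 1):
        term *= z / j          # term is now z^j / j!
        total += term
    return total

def truncation_error(z, m):
    return abs(cmath.exp(z) - exp_m(z, m))

# Sample points on the circle |z| = m / 100, for a few small degrees m.
checks = []
for m in (5, 10, 20):
    for angle in (0.0, 1.0, 2.0, math.pi):
        z = (m / 100) * cmath.exp(1j * angle)
        checks.append((m, truncation_error(z, m)))
```

In each case the recorded error is smaller than exp(−m), so in this range the implied constant in the first O-term can be taken equal to 1.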


We define
\[ E_T = \exp_{m_1}(-Q_T), \qquad F_T = \exp_{m_2}(-R_T), \]
where we recall that $m_1$ and $m_2$ are the parameters defined in (4.5). We have by definition
$D_T = B_T C_T$, with
\[ B_T(t) = \sum_{n \geq 1} b_T(n) \mu(n) n^{-\sigma_0 - it}, \qquad C_T(t) = \sum_{n \geq 1} c_T(n) \mu(n) n^{-\sigma_0 - it}, \]
where $b_T$ and $c_T$ are defined after the statement of Proposition 4.2.1, e.g., $b_T(n)$ is the
characteristic function of squarefree integers n such that n has ≤ $m_1$ prime factors, all of
which are ≤ Y.
The idea of the proof is that, usually, QT (resp. RT ) is not too large, and then the
random variable ET is a good approximation to exp(−QT ). On the other hand, because
of the shape of ET (and the choice of the parameters), it will be possible to prove that
ET is close to BT in L2 -norm, and similarly for FT and CT . Combining these facts will
lead to the conclusion.
We first observe that, as in the beginning of the proof of Proposition 4.2.5, by the
usual appeal to Lemma 3.2.6, we have
\[ \mathbf{E}_T(|Q_T|^2) \ll \varrho_T^2, \qquad \mathbf{E}_T(|R_T|^2) \ll \log \varrho_T. \]
Markov’s inequality implies that $\mathbf{P}_T(|Q_T| > \log\log T)$ tends to 0 as T → +∞. Now by
Lemma 4.4.1, whenever $|Q_T| \leq \log\log T$ (so that $m_1 \geq 100|Q_T|$), we have
\[ E_T = \exp(-Q_T) \bigl( 1 + O((\log T)^{-99}) \bigr). \]
Similarly, the probability $\mathbf{P}_T(|R_T| > \log \varrho_T)$ tends to 0, and whenever $|R_T| \leq \log \varrho_T$, we
have
\[ F_T = \exp(-R_T) \bigl( 1 + O((\log\log T)^{-99}) \bigr). \]
For the next step, we claim that
\[ \mathbf{E}_T\bigl( |E_T - B_T|^2 \bigr) \ll (\log T)^{-60}, \tag{4.9} \]
\[ \mathbf{E}_T\bigl( |F_T - C_T|^2 \bigr) \ll (\log\log T)^{-60}. \tag{4.10} \]

We begin the proof of the first estimate with a lemma.


Lemma 4.4.2. For t ∈ ΩT , we have
\[ E_T(t) = \sum_{n \geq 1} \alpha(n) n^{-\sigma_0 - it}, \]
where the coefficients α(n) are zero unless $n \leq Y^{m_1}$ and n has only prime factors ≤ Y.
Moreover |α(n)| ≤ 1 for all n, and $\alpha(n) = \mu(n) b_T(n)$ if n has ≤ $m_1$ prime factors,
counted with multiplicity, and if there is no prime power $p^k$ dividing n such that $p^k > Y$.
Proof. Since
\[ E_T = \exp_{m_1}(-Q_T) = \sum_{j=0}^{m_1} \frac{(-1)^j}{j!} \Bigl( \sum_{p^k \leq Y} \frac{1}{k p^{k(\sigma_0 + it)}} \Bigr)^j, \]
we obtain by expanding the j-th power an expression of the desired kind, with coefficients
\[ \alpha(n) = \sum_{0 \leq j \leq m_1} \frac{(-1)^j}{j!} \sum_{\substack{p_1^{k_1} \cdots p_j^{k_j} = n \\ p_i^{k_i} \leq Y}} \frac{1}{k_1 \cdots k_j}. \]

We see from this expression that α(n) is 0 unless $n \leq Y^{m_1}$ and n has only prime factors
≤ Y. Suppose now that n has ≤ $m_1$ prime factors, counted with multiplicity, and that
no prime power $p^k > Y$ divides n. Then we may extend the sum defining α(n) to all
$j \geq 0$, and remove the redundant conditions $p_i^{k_i} \leq Y$, so that
\[ \alpha(n) = \sum_{j \geq 0} \frac{(-1)^j}{j!} \sum_{p_1^{k_1} \cdots p_j^{k_j} = n} \frac{1}{k_1 \cdots k_j}. \]
But we recognize that this is the coefficient of $n^{-s}$ in the expansion of
\[ \exp\Bigl( -\sum_{p} \sum_{k \geq 1} \frac{1}{k} p^{-ks} \Bigr) = \exp(-\log \zeta(s)) = \frac{1}{\zeta(s)} = \sum_{n \geq 1} \frac{\mu(n)}{n^s}, \]
(viewed as a formal Dirichlet series, or by restricting to Re(s) > 1). This means that, for
such integers n, we have $\alpha(n) = \mu(n) = \mu(n) b_T(n)$.
Finally, for any $n \geq 1$ now, we have
\[ |\alpha(n)| \leq \sum_{j \geq 0} \frac{1}{j!} \sum_{p_1^{k_1} \cdots p_j^{k_j} = n} \frac{1}{k_1 \cdots k_j} = 1, \]
since the right-hand side is the coefficient of $n^{-s}$ in $\exp(\log \zeta(s)) = \zeta(s)$. □
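The formal identity exp(−log ζ(s)) = Σ µ(n) n^{-s} used in this proof can be verified coefficient by coefficient with exact rational arithmetic. The sketch below works with Dirichlet convolution powers of the coefficients of log ζ(s); it checks the untruncated identity only (no conditions involving Y or m1), and all function names are ours.

```python
from fractions import Fraction

def log_zeta_coefficients(N):
    """g(n) = 1/k if n = p^k is a prime power (k >= 1), else 0, for 1 <= n <= N."""
    g = [Fraction(0)] * (N + 1)
    is_prime = [True] * (N + 1)
    for p in range(2, N + 1):
        if is_prime[p]:
            for multiple in range(2 * p, N + 1, p):
                is_prime[multiple] = False
            power, k = p, 1
            while power <= N:
                g[power] = Fraction(1, k)
                power, k = power * p, k + 1
    return g

def dirichlet_convolve(u, v, N):
    w = [Fraction(0)] * (N + 1)
    for d in range(1, N + 1):
        if u[d]:
            for e in range(1, N // d + 1):
                w[d * e] += u[d] * v[e]
    return w

def exp_minus_log_zeta(N):
    """Coefficients up to N of exp(-sum_p sum_k p^{-ks}/k), as exact fractions."""
    g = log_zeta_coefficients(N)
    power = [Fraction(0)] * (N + 1)
    power[1] = Fraction(1)          # the 0-th convolution power is the identity
    alpha = power[:]
    factorial = 1
    # The j-th convolution power of g is supported on n >= 2^j, so j <= log2(N) suffices.
    for j in range(1, N.bit_length() + 1):
        factorial *= j
        power = dirichlet_convolve(power, g, N)
        for n in range(1, N + 1):
            alpha[n] += Fraction((-1) ** j, factorial) * power[n]
    return alpha

def mobius(n):
    result, p = 1, 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0
            result = -result
        p += 1
    return -result if n > 1 else result

alpha = exp_minus_log_zeta(100)
```

For every n up to 100 the computed coefficient `alpha[n]` equals µ(n) exactly. For instance, for n = 4 the terms j = 1 and j = 2 contribute −1/2 and +1/2, which cancel.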
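Now define $\delta(n) = \alpha(n) - \mu(n) b_T(n)$ for all $n \geq 1$. We have
\[ \mathbf{E}_T\bigl( |E_T - B_T|^2 \bigr) = \mathbf{E}_T\Bigl( \Bigl| \sum_{n \geq 1} \frac{\delta(n)}{n^{\sigma_0 + it}} \Bigr|^2 \Bigr), \]
which we estimate using Lemma 4.2.6. The contribution of the off-diagonal term is
\[ \ll \frac{1}{T} \sum_{m,n \leq Y^{m_1}} |\delta(n)\delta(m)| (mn)^{\frac{1}{2} - \sigma_0} \leq \frac{4}{T} \Bigl( \sum_{m \leq Y^{m_1}} 1 \Bigr)^2 \ll T^{-1 + \varepsilon} \]
for any ε > 0, hence is negligible. The diagonal term is
\[ M = \sum_{n \geq 1} \frac{|\delta(n)|^2}{n^{2\sigma_0}} \leq \sum_{n \geq 1} \frac{|\delta(n)|^2}{n}. \]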

By Lemma 4.4.2, we have δ(n) = 0 unless either n has > $m_1$ prime factors, counted with
multiplicity, or is divisible by a power $p^k$ such that $p^k > Y$ (and necessarily p ≤ Y since δ
is supported on integers only divisible by such primes). The contribution of the integers
satisfying the first property is at most
\[ \sum_{\substack{\Omega(n) > m_1 \\ p \mid n \Rightarrow p \leq Y}} \frac{1}{n}. \]
We use Rankin’s trick once more to bound this from above: for any fixed real number
η > 1, we have
\[ \sum_{\substack{\Omega(n) > m_1 \\ p \mid n \Rightarrow p \leq Y}} \frac{1}{n} \leq \eta^{-m_1} \prod_{p \leq Y} \Bigl( 1 + \frac{\eta}{p} + \cdots \Bigr) \ll \eta^{-m_1} (\log Y)^{\eta} \leq (\log T)^{-100 \log \eta + \eta} \]
by Proposition C.3.6. Selecting $\eta = e^{2/3} \leq 2$, for instance, this shows that this contribution
is $\ll (\log T)^{-60}$.
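Rankin's trick, as used here, can be tried on a small example. The sketch below is an illustration with hypothetical values (primes up to 7, threshold m = 3, η = 1.5): it enumerates smooth integers up to a cutoff, and compares the sum of 1/n over those with Ω(n) > m with the bound $\eta^{-m} \prod_p (1 - \eta/p)^{-1}$, which is the closed form of the geometric series in the product above. The cutoff makes the computed sum a lower approximation of the full infinite sum.

```python
import math

def smooth_numbers_with_omega(primes, limit):
    """Pairs (n, Omega(n)) for all n <= limit whose prime factors all lie in `primes`."""
    found = [(1, 0)]
    for p in primes:
        extended = []
        for n, omega in found:
            # Multiply by successive powers of p while staying below the limit.
            while n <= limit:
                extended.append((n, omega))
                n, omega = n * p, omega + 1
        found = extended
    return found

def rankin_bound(primes, m, eta):
    """eta^{-m} * prod_p (1 - eta/p)^{-1}; requires 1 < eta < min(primes)."""
    return eta ** (-m) * math.prod(1 / (1 - eta / p) for p in primes)

small_primes, m, eta = [2, 3, 5, 7], 3, 1.5
partial_sum = sum(1 / n for n, omega in smooth_numbers_with_omega(small_primes, 10 ** 6)
                  if omega > m)
bound = rankin_bound(small_primes, m, eta)
```

The inequality `partial_sum <= bound` reflects the pointwise bound of the indicator of Ω(n) > m by $\eta^{\Omega(n) - m}$, followed by the Euler product for the completely multiplicative function $\eta^{\Omega(n)}/n$.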
The contribution of integers divisible by pk > Y is at most
 X 1  X 1  1 X Y 1
k
6 1 −1
 Y−1/2 (log Y),
p6Y
p n Y √
p6Y
1 − p
p|n⇒p6Y k
Y<p 6Y
pk >Y k>2

which is even smaller. This concludes the proof of (4.9).


The proof of the second estimate (4.10) is quite similar, with one extra consideration
to handle. Indeed, arguing as in Lemma 4.4.2, we obtain the expression
\[ F_T(t) = \sum_{n \geq 1} \beta(n) n^{-\sigma_0 - it}, \]
for t ∈ ΩT , where the coefficients β(n) are zero unless $n \leq X^{m_2}$ and n has only prime
factors ≤ X, satisfy |β(n)| ≤ 1 for all n, and finally satisfy β(n) = µ(n) if n has ≤ $m_2$
prime factors, counted with multiplicity, and if there is no prime power $p^k$ dividing n
with $Y < p^k \leq X$.
Using this, and defining now $\delta(n) = \beta(n) - \mu(n) c_T(n)$, we get from Lemma 4.2.6 the
bound
\[ \mathbf{E}_T\bigl( |F_T - C_T|^2 \bigr) \ll \sum_{\substack{n \geq 1 \\ \delta(n) \neq 0}} \frac{1}{n^{2\sigma_0}} \leq \sum_{\substack{n \geq 1 \\ \delta(n) \neq 0}} \frac{1}{n}. \]

But the integers that satisfy δ(n) ≠ 0 must be of one of the following types:
(1) Those with $c_T(n) = 1$, which (by the previous discussion) must either have Ω(n) >
$m_2$ (and be divisible by primes ≤ X only), or must be divisible by a prime power $p^k$ such
that $p^k > X$ (the possibility that $p^k \leq Y$ is here excluded, because $c_T(n) = 1$ implies that
n has no prime factor < Y). The contribution of these integers is handled as in the case
of the bound (4.9) and is $\ll (\log\log T)^{-60}$.
(2) Those with $c_T(n) = 0$ and β(n) ≠ 0; since
\[ \beta(n) = \sum_{0 \leq j \leq m_2} \frac{(-1)^j}{j!} \sum_{\substack{p_1^{k_1} \cdots p_j^{k_j} = n \\ Y < p_i^{k_i} \leq X}} \frac{1}{k_1 \cdots k_j}, \]
as in the beginning of the proof of Lemma 4.4.2, such an integer n has at least one
factorization $n = p_1^{k_1} \cdots p_j^{k_j}$ for some $j \leq m_2$, where each prime power $p_i^{k_i}$ is between Y
and X. Since $c_T(n) = 0$, either Ω(n) > $m_2$, or n has a prime factor p > X, or n has
a prime factor p ≤ Y. The first two possibilities are again handled exactly like in the
proof of (4.9), but the third is somewhat different. We proceed as follows to estimate its
contribution, say N. We have
\[ N = \sum_{0 \leq j < m_2} N_j, \]
where
\[ N_j = \sum_{p \leq Y} \sum_{\substack{n = p^k p_1^{k_1} \cdots p_j^{k_j} \\ Y < p_i^{k_i} \leq X}} \frac{1}{n} \]
is the contribution of integers with a factorization of length j + 1 as a product of prime
powers between Y and X. By multiplicativity, we get
\[ N_j \leq \Bigl( \sum_{p \leq Y} \sum_{Y < p^k \leq X} \frac{1}{p^k} \Bigr) \Bigl( \sum_{p \leq X} \sum_{Y < p^k \leq X} \frac{1}{p^k} \Bigr)^{j}. \]

Consider the first factor. For a given prime p ≤ Y, let l be the smallest integer such that
$p^l > Y$. The sum over k is then
\[ \sum_{Y < p^k \leq X} \frac{1}{p^k} \leq \frac{1}{p^l} + \frac{1}{p^{l+1}} + \cdots \ll \frac{1}{p^l} \leq \frac{1}{Y}, \]
so that the first factor is $\ll \pi(Y)/Y \ll (\log Y)^{-1}$. On the other hand, for the second
factor, we have
\[ \sum_{p \leq X} \sum_{Y < p^k \leq X} \frac{1}{p^k} = \sum_{p \leq Y} \sum_{Y < p^k \leq X} \frac{1}{p^k} + \sum_{Y < p \leq X} \sum_{Y < p^k \leq X} \frac{1}{p^k} \ll \frac{\pi(Y)}{Y} + \sum_{Y < p \leq X} \sum_{Y < p^k \leq X} \frac{1}{p^k}, \]
where we used the bound arising from the first factor. For a given prime p with Y < p ≤
X, the last sum over k is
\[ \leq \frac{1}{p} + \frac{1}{p^2} + \cdots \ll \frac{1}{p}, \]
and the sum over p is therefore
\[ \ll \sum_{Y < p \leq X} \frac{1}{p} = \log\Bigl( \frac{\log X}{\log Y} \Bigr) + O(1) \leq 2 \log\log\log T + O(1), \]
using the values of X and Y and Proposition C.3.1. Hence the final estimate is, for some
constant C ≥ 1,
\[ N \ll \frac{1}{\log Y} (C \log\log\log T)^{m_2} \ll (\log\log T)^2 (C \log\log\log T)^{m_2} (\log T)^{-1} \to 0 \]
as T → +∞, from which we finally deduce that (4.10) holds.
With the mean-square estimates (4.9) and (4.10) in hand, we can now finish the proof
of Proposition 4.2.4. Except on sets of measure tending to 0 as T → +∞, we have
\[ B_T = E_T + O((\log T)^{-25}), \qquad E_T = \exp(-Q_T) \bigl( 1 + O((\log T)^{-99}) \bigr), \qquad \frac{1}{\log T} \leq |\exp(-Q_T)| \leq \log T \]
(where the first property follows from (4.9)), and hence
\[ B_T = \exp(-Q_T) \bigl( 1 + O((\log T)^{-20}) \bigr), \]
again outside of a set of measure tending to 0. Similarly, using (4.10), we get
\[ C_T = \exp(-R_T) \bigl( 1 + O((\log\log T)^{-20}) \bigr) \]
outside of a set of measure tending to 0. Multiplying the two equalities shows that
\[ D_T = \exp(-P_T) \bigl( 1 + O((\log\log T)^{-20}) \bigr) \]
with probability tending to 1 as T → +∞. This concludes the proof.


Exercise 4.4.3. Try to see what happens if one uses a single range $p^k \leq X$, instead
of having the distinction between $p^k \leq Y$ and $Y < p^k \leq X$.
4.5. Further topics
Generalizations of Selberg’s Central Limit Theorem are much harder to come by than
those of Bagchi’s Theorem (which is another illustration of the fact that arithmetic L-
functions have much more delicate properties on the critical line). There are very few
cases other than that of the Riemann zeta function where such a statement is known
(see the remarks in [95, §7] for references). For instance, consider the family of modular
forms f that is described in Section 3.4. The natural question is now to consider the
distribution (possibly with weights ω_f) of L(f, 1/2). First, it is a known fact (due to
Waldspurger and Kohnen–Zagier) that L(f, 1/2) ≥ 0 in that case. This property reflects a
different type of expected distribution of the values L(f, 1/2), namely one expects that the
correct normalization is
\[
f \mapsto \frac{\log L(f, \tfrac{1}{2}) + \tfrac{1}{2}\log\log q}{\sqrt{\log\log q}},
\]
in the sense that this defines a sequence of random variables on Ω_q that should converge
in law to a standard (real) gaussian random variable. Now observe that such a statement,
if true, would immediately imply that the proportion of f ∈ Ω_q with L(f, 1/2) = 0 tends
to 0 as q → +∞, and this is not currently known (this would indeed be a major result
in the analytic theory of modular forms).
Nevertheless, there has been significant progress in this direction, for various families,
in recent and ongoing work of Radziwill and Soundararajan. In [96], they prove sub-
gaussian upper bounds for the distribution of L-values in certain families similar to Ωq
(specifically, quadratic twists of a fixed modular form). In [97], they announce gaussian
lower bounds, but for families conditioned to have L(f, 1/2) ≠ 0 (which, for a number of
cases, is known to be a subfamily with positive density as the size tends to infinity).
In addition to these developments, it should be emphasized that Selberg’s Theorem
serves as a general guiding principle when studying any probabilistic question for the
Riemann zeta function on the critical line, and the ideas in its proof are often the starting
points towards other results. Indeed, some of the deepest works in probabilistic number
theory in recent years have been devoted to studies of finer aspects of the distribution of
the Riemann Zeta function on the critical line. A particular focus has been a conjecture
of Fedorov, Hiary and Keating [39] that addresses the distribution of the maximum
of ζ(1/2+it) when t varies over an interval of length 1 (and t is taken uniformly at random
in [−T, T] or [T, 2T] with T → +∞). This leads to links with objects like log-correlated
fields, branching random walks, or gaussian multiplicative chaos. We refer to the Bourbaki
seminar survey of Harper [54] for a discussion of the work of Najnudel [90] and Arguin–
Belius–Bourgade–Radziwill–Soundararajan [1], and to Harper’s recent preprint [55] for
the latest developments in this direction.
One of the reasons that Central Limit Theorems are expected to hold is that they are
known to follow from the widely believed moment conjectures for families of L-functions,
which predict (with considerable evidence, theoretic, numerical and heuristic) the asymp-
totic behavior of the Laplace or Fourier transform of the logarithm of the special values
of the L-functions. In other words, taking the example of the Riemann Zeta function,
these conjectures (due to Keating and Snaith [63]) predict the asymptotic behavior of
\[
E_T\bigl(e^{s \log|\zeta(\frac{1}{2}+it)|}\bigr) = E_T\bigl(|\zeta(\tfrac{1}{2}+it)|^s\bigr)
= \frac{1}{2T}\int_{-T}^{T} |\zeta(\tfrac{1}{2}+it)|^s\,dt
\]
for suitable s ∈ C. It is of considerable interest that, besides natural arithmetic factors
(related to the independence of Proposition 3.2.5 or suitable analogues), these conjectures
involve certain terms which originate in Random Matrix Theory. In addition to implying
straightforwardly the Central Limit Theorem, note that the moment conjectures also
immediately yield the generalization of (3.11) or (3.15), hence can be used to deduce
general versions of Bagchi’s Theorem and universality. Moreover, these moment conjectures
(in suitably uniform versions) are also able to settle other interesting conjectures
concerning the distribution of values of ζ(1/2 + it). For instance, as shown by Kowalski
and Nikeghbali [78], they are known to imply that the image of t ↦ ζ(1/2 + it), for t ∈ R,
is dense in C (a conjecture of Ramachandra).
[Further references: Katz–Sarnak [62], Blomer, Fouvry, Kowalski, Michel, Milićević
and Sawin [11].]

CHAPTER 5

The Chebychev bias

Probability tools                                              Arithmetic tools

Definition of convergence in law (§ B.3)                       Primes in arithmetic progressions
Kronecker’s Theorem (th. B.6.5)                                Orthogonality of Dirichlet characters (prop. C.5.1)
Convergence in law using auxiliary parameters (prop. B.4.4)    Dirichlet L-functions (§ C.5)
Characteristic functions (§ B.5)                               Generalized Riemann Hypothesis (conj. C.5.8)
Kolmogorov’s Theorem for random series (th. B.10.1)            Explicit formula (th. C.5.6)
Method of moments (th. B.5.5)                                  Distribution of the zeros of L-functions (prop. C.5.3)
                                                               Generalized Simplicity Hypothesis

5.1. Introduction
One of the most remarkable limit theorems in probabilistic number theory is related
to a surprising feature of the distribution of prime numbers, which was first noticed by
Chebychev [24] in 1853: there seemed to be many more primes p such that p ≡ 3 (mod 4)
than primes with p ≡ 1 (mod 4) (any prime, except p = 2, must satisfy one of these two
conditions). More precisely, he states:
En cherchant l’expression limitative des fonctions qui déterminent la totalité
des nombres premiers de la forme 4n + 1 et de ceux de la forme
4n + 3, pris au-dessous d’une limite très grande, je suis parvenu à reconnaître
que ces deux fonctions diffèrent notablement entre elles par
leurs seconds termes, dont la valeur, pour les nombres 4n + 3, est plus
grande que celle pour les nombres 4n + 1; ainsi, si de la totalité des
nombres premiers de la forme 4n + 3, on retranche celle des nombres
premiers de la forme 4n + 1, et que l’on divise ensuite cette différence
par la quantité √x / log x, on trouvera plusieurs valeurs de x telles, que ce
quotient s’approchera de l’unité aussi près qu’on le voudra.¹
1 English translation: “While searching for the limiting expression of the functions that determine
the number of prime numbers of the form 4n + 1 and of those of the form 4n + 3, less than a very large
limit, I have succeeded in recognizing that the second terms of these two functions differ notably from
each other; its value [of this second term], for the numbers 4n + 3, is larger than that for the numbers
4n + 1; thus, if from the number of prime numbers of the form 4n + 3, we subtract that of the prime
It is unclear from Chebychev’s very short note what exactly he had proved, or simply
conjectured, and he did not publish anything more on this topic. It is definitely not the
case that we have
π(x; 4, 3) > π(x; 4, 1)
for all x ≥ 2, where (in general), for an integer q ≥ 1 and an integer a, we write π(x; q, a)
for the number of primes p ≤ x such that p ≡ a (mod q). Indeed, for x = 26861, we have
π(x; 4, 3) = 1472 < 1473 = π(x; 4, 1)
(as discovered by Leech in 1957), and one can prove that there are infinitely many sign
changes of the difference π(x; 4, 3) − π(x; 4, 1).
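The values quoted for x = 26861 can be recomputed directly. The sketch below uses a plain sieve of Eratosthenes; the function name is a hypothetical helper introduced only for this illustration.

```python
import math

def prime_counts_mod4(x):
    """Return (pi(x;4,1), pi(x;4,3)) using a sieve of Eratosthenes."""
    sieve = bytearray([1]) * (x + 1)
    sieve[0:2] = b"\x00\x00"
    for p in range(2, math.isqrt(x) + 1):
        if sieve[p]:
            sieve[p * p :: p] = bytearray(len(range(p * p, x + 1, p)))
    ones = sum(1 for n in range(1, x + 1, 4) if sieve[n])    # primes n ≡ 1 (mod 4)
    threes = sum(1 for n in range(3, x + 1, 4) if sieve[n])  # primes n ≡ 3 (mod 4)
    return ones, threes

print(prime_counts_mod4(26861))
```

The same function can be evaluated at other values of x to observe how rarely the class of 1 modulo 4 takes the lead.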
In any case, by communicating his observations, Chebychev created a fascinating area
of number theory. We will discuss some of the basic known results in this chapter, which
put the question on a rigorous footing, and in particular confirm the existence of the bias
towards the residue class of 3 modulo 4, in a precise sense (although this conclusion will
depend on currently unproved conjectures). Because of this feature, the subject is called
the study of the Chebychev bias.

5.2. The Rubinstein–Sarnak distribution


In order to study the problem suggested by Chebychev, we consider for X > 1 the
probability space ΩX = [1, X], with the probability measure
\[
P_X = \frac{1}{\log X}\,\frac{dx}{x}. \tag{5.1}
\]
Let q ≥ 1 be an integer. We define a random variable on Ω_X, with values in the
vector space C_R((Z/qZ)^×) of real-valued functions on the (fixed) finite group (Z/qZ)^×,
by defining N_{X,q}(x), for x ∈ Ω_X, to be the function such that
\[
N_{X,q}(x)(a) = \frac{\log x}{\sqrt{x}}\bigl(\varphi(q)\pi(x; q, a) - \pi(x)\bigr) \tag{5.2}
\]
for a ∈ (Z/qZ)^× (this could also, of course, be viewed as a random real vector with values
in R^{|(Z/qZ)^×|}, but the perspective of a function will be slightly more convenient).
We see that the knowledge of NX,q allows us to compare the number of primes up
to X in any family of invertible residue classes modulo q. It is therefore appropriate for
the study of the questions suggested by Chebychev.
We observe that in the remainder of this chapter, we will consider q to be fixed
(although there are interesting questions that one can ask about uniformity with respect
to q). For this reason, we will often simplify the notation (especially during proofs) to
write NX instead of NX,q , and similarly dropping q in some other cases.
Remark 5.2.1. (1) If q = 4, then (Z/4Z)^× = {1, 3}, and for x ∈ Ω_X, the random
function N_{X,4}(x) is given by
\[
1 \mapsto \frac{\log x}{\sqrt{x}}\bigl(2\pi(x; 4, 1) - \pi(x)\bigr), \qquad
3 \mapsto \frac{\log x}{\sqrt{x}}\bigl(2\pi(x; 4, 3) - \pi(x)\bigr).
\]

numbers of the form 4n + 1, and then divide this difference by the quantity √x / log x, we will find several
values of x such that this ratio will approach one as closely as we want.”
(2) Recall that the fundamental theorem of Dirichlet, Hadamard and de la Vallée
Poussin (Theorem C.3.7) shows that
\[
\pi(x; q, a) \sim \frac{1}{\varphi(q)}\,\pi(x)
\]
for all a coprime to q. Thus the random variables N_X measure the correction term
to this asymptotic behavior.
(3) The normalizing factor (log x)/√x, which is the “correct one”, is the same one
that is suggested by Chebychev’s quote.
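To make the definition (5.2) concrete, here is a small sketch that evaluates the function a ↦ (log x/√x)(ϕ(q)π(x; q, a) − π(x)) at a given x. The function names are hypothetical, and residues are represented by integers in the range 0, ..., q − 1 (an implementation choice of this sketch).

```python
import math

def primes_up_to(n):
    """Sieve of Eratosthenes: list of all primes <= n."""
    sieve = bytearray([1]) * (n + 1)
    sieve[0:2] = b"\x00\x00"
    for p in range(2, math.isqrt(n) + 1):
        if sieve[p]:
            sieve[p * p :: p] = bytearray(len(range(p * p, n + 1, p)))
    return [i for i in range(n + 1) if sieve[i]]

def normalized_counts(x, q):
    """Map each invertible residue a mod q to (log x / sqrt x) * (phi(q) * pi(x;q,a) - pi(x))."""
    primes = primes_up_to(x)
    units = [a for a in range(q) if math.gcd(a, q) == 1]
    counts = {a: 0 for a in units}
    for p in primes:
        r = p % q
        if r in counts:          # primes dividing q fall in no invertible class
            counts[r] += 1
    scale = math.log(x) / math.sqrt(x)
    return {a: scale * (len(units) * counts[a] - len(primes)) for a in units}

for x in (1000, 26861, 10**6):
    print(x, normalized_counts(x, 4))
```

For q = 4 the two values nearly cancel: their sum equals −2 (log x)/√x, because the prime 2 is the only prime outside both invertible classes.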
The basic probabilistic result concerning these arithmetic quantities is the following:
Theorem 5.2.2 (Rubinstein–Sarnak). Let q ≥ 1. Assume the Generalized Riemann
Hypothesis modulo q. Then the random functions N_{X,q} converge in law to a random
function N_q. The support of N_q is contained in the hyperplane
\[
H_q = \Bigl\{ f : (\mathbf{Z}/q\mathbf{Z})^\times \to \mathbf{R} \ \Big|\ \sum_{a \in (\mathbf{Z}/q\mathbf{Z})^\times} f(a) = 0 \Bigr\}. \tag{5.3}
\]

We call Nq the Rubinstein–Sarnak distribution modulo q.


Remark 5.2.3. One may wonder if the choice of the logarithmic weight in the proba-
bility measure PX is necessary for such a statement of convergence in law: this is indeed
the case, and we will say a few words to explain this in Remark 5.3.5.
The Generalized Riemann Hypothesis modulo q is originally a statement about the
zeros of certain analytic functions, the Dirichlet L-functions modulo q. It has, however,
a concrete formulation in terms of the distribution of prime numbers: it is equivalent to
the statement that, for all integers a coprime with q and all x ≥ 2, we have
\[
\pi(x; q, a) = \frac{1}{\varphi(q)}\int_2^x \frac{dt}{\log t} + O\bigl(x^{1/2}\log(qx)\bigr)
\]
where the implied constant is absolute (see, e.g., [59, 5.14, 5.15] for this equivalence).
The size of the (expected) error term, approximately √x, is related to the zeros of the
Dirichlet L-functions, as we will see later; it explains that the normalization factor in (5.2)
is the right one for the existence of a limit in law as in Theorem 5.2.2. Indeed, using the
case q = 1, which is the formula
\[
\pi(x) = \int_2^x \frac{dt}{\log t} + O\bigl(x^{1/2}\log x\bigr),
\]
we deduce that each value of the function N_X satisfies
\[
\frac{\log x}{\sqrt{x}}\bigl(\varphi(q)\pi(x; q, a) - \pi(x)\bigr) = O\bigl(\varphi(q)(\log qx)^2\bigr).
\]
To see how Theorem 5.2.2 helps answer questions related to the Chebychev bias, we
take q = 4. Then we expect that
\[
\lim_{X \to +\infty} P_X\bigl(\pi(x; 4, 3) > \pi(x; 4, 1)\bigr) = P(N_4 \in H_4 \cap C),
\]
where C = {(x_1, x_3) | x_3 > x_1} (although whether this limit exists or not does not
follow from Theorem 5.2.2, without further information concerning the properties of the
limit N4 ). Then Chebychev’s basic observation could be considered to be confirmed if
P(N4 ∈ H4 ∩ C) is close to 1. But in the absence of any other information, it seems very
hard to prove (or disprove) this last fact.
However, Rubinstein and Sarnak showed that one could go much further by making
one extra assumption on the distribution of the zeros of Dirichlet L-functions. Indeed, one
can then represent Nq explicitly as the sum of a series of independent random variables
(and in particular compute explicitly the characteristic function of the random func-
tion Nq ). We describe this random series in Section 5.4, since to do so at this point would
lead to a statement that would appear highly unmotivated. The proof of Theorem 5.2.2
will lead us naturally to this next step (see Theorem 5.4.4 for the details).
Below, we write
\[
\sum^{*}_{\chi \ (\mathrm{mod}\ q)} (\cdots), \qquad \prod^{*}_{\chi \ (\mathrm{mod}\ q)} (\cdots)
\]
for a sum or a product over non-trivial² Dirichlet characters modulo q; we recall that these
are (completely) multiplicative functions on Z such that χ(n) = 0 unless n is coprime to q,
in which case we have χ(n) = χ̃(n) for some group homomorphism χ̃ : (Z/qZ)^× → C^×.
We define a function m_q on (Z/qZ)^× by
\[
m_q(a) = -\sum^{*}_{\substack{\chi \ (\mathrm{mod}\ q) \\ \chi^2 = 1}} \chi(a) \tag{5.4}
\]
for a ∈ (Z/qZ)^×. This can also, using orthogonality of characters modulo q (see
Proposition C.5.1), be expressed in the form
\[
m_q(a) = 1 - \sum_{\substack{b \in (\mathbf{Z}/q\mathbf{Z})^\times \\ b^2 = a \ (\mathrm{mod}\ q)}} 1,
\]
from which we see that in fact we have simply two possible values, namely
\[
m_q(a) =
\begin{cases}
1 & \text{if $a$ is not a square modulo $q$,} \\
1 - \sigma_q & \text{otherwise,}
\end{cases} \tag{5.5}
\]
where
\[
\sigma_q = |\{b \in (\mathbf{Z}/q\mathbf{Z})^\times \mid b^2 = 1\}| = |\{\chi \ (\mathrm{mod}\ q) \mid \chi^2 = 1\}|
\]
is also the index of the subgroup of squares in (Z/qZ)^×.
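The second expression for m_q(a), as one minus the number of square roots of a modulo q, is easy to evaluate by brute force. The following sketch does this for small moduli; the function name is hypothetical.

```python
import math

def m_q(q):
    """Return {a: m_q(a)} using m_q(a) = 1 - #{b in (Z/qZ)^x : b^2 = a (mod q)}."""
    units = [a for a in range(q) if math.gcd(a, q) == 1]
    roots = {a: 0 for a in units}
    for b in units:
        roots[(b * b) % q] += 1   # b is a square root of b^2 mod q
    return {a: 1 - roots[a] for a in units}

print(m_q(4))
print(m_q(8))
```

In agreement with (5.5), non-squares receive the value 1 and squares the value 1 − σ_q, where σ_q is 2 for q = 4 and 4 for q = 8.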
In the remaining sections of this chapter, we will explain the proof of Theorem 5.2.2,
following Rubinstein and Sarnak. We will assume some familiarity with Dirichlet L-
functions (in Section C.5, we recall the relevant definitions and standard facts). Readers
who have not yet been exposed to these functions will probably find it easier to assume
in what follows that q = 4. In this case, there is only one non-trivial Dirichlet L-function
modulo 4, which is defined by
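\[
L(\chi_4, s) = \sum_{k \ge 0} \frac{(-1)^k}{(2k+1)^s} = \sum_{n \ge 1} \chi_4(n) n^{-s},
\]
corresponding to the character χ_4 such that
\[
\chi_4(n) =
\begin{cases}
0 & \text{if $n$ is even,} \\
(-1)^k & \text{if $n = 2k+1$ is odd}
\end{cases} \tag{5.6}
\]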

2 We emphasize, for readers already familiar with analytic number theory, that this does not mean
primitive characters.
for n ≥ 1. The arguments should then be reasonably transparent. In particular, any sum
of the type
\[
\sum^{*}_{\chi \ (\mathrm{mod}\ 4)} (\cdots)
\]

means that one only considers the expression on the right-hand side for the character χ4
defined in (5.6).

5.3. Existence of the Rubinstein–Sarnak distribution


The proof of Theorem 5.2.2 depends roughly on two ingredients:
• On the arithmetic side, we can represent the arithmetic random functions N_X as
combinations of x ↦ x^{iγ}, where the γ are ordinates of zeros of the L-functions
modulo q;
• Once this is done, we observe that Kronecker’s Equidistribution Theorem
(Theorem B.6.5) implies convergence in law for any function of this type.
There are some intermediate approximation steps involved, but the ideas are quite
intuitive.
In this section, we always assume the validity of the Generalized Riemann Hypothesis
modulo q, unless otherwise noted.
For a Dirichlet character χ modulo q, we define random variables ψ_χ on Ω_X by
\[
\psi_\chi(x) = \frac{1}{\sqrt{x}} \sum_{n \le x} \Lambda(n)\chi(n)
\]

for x ∈ ΩX , where Λ is the von Mangoldt function (see Section C.4, especially (C.6), for
the definition of this function).
The next lemma is a key step to express NX in terms of Dirichlet characters. It looks
first like standard harmonic analysis, but there is a subtle point in the proof that is crucial
for the rest of the argument, and for the very existence of the Chebychev bias.
Lemma 5.3.1. We have
\[
N_{X,q} = m_q + \sum^{*}_{\chi \ (\mathrm{mod}\ q)} \psi_\chi\,\overline{\chi} + E_{X,q}
\]
where E_{X,q} converges to 0 in probability as X → +∞.

Proof. By orthogonality of the Dirichlet characters modulo q (see Proposition C.5.1),
we have
\[
\varphi(q)\pi(x; q, a) = \sum_{\chi \ (\mathrm{mod}\ q)} \overline{\chi(a)} \sum_{p \le x} \chi(p),
\]
hence
\[
\frac{\log x}{\sqrt{x}}\bigl(\varphi(q)\pi(x; q, a) - \pi(x)\bigr)
= \sum^{*}_{\chi \ (\mathrm{mod}\ q)} \overline{\chi(a)}\,\frac{\log x}{\sqrt{x}} \sum_{p \le x} \chi(p)
+ O\Bigl(\frac{\log x}{\sqrt{x}}\Bigr)
\]
for x ≥ 2, where the error term accounts for primes p dividing q (for which the trivial
character takes the value 0 instead of 1); in particular, the implied constant depends on q.
We now need to connect the sum over primes, for a fixed character χ, to ψ_χ. Recall
that the von Mangoldt function differs little from the characteristic function of primes
multiplied by the logarithm function. The sum of this simpler function is the random
variable defined by
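\[
\theta_\chi(x) = \frac{1}{\sqrt{x}} \sum_{p \le x} \chi(p)\log p
\]
for x ∈ Ω_X. It is related to ψ_χ by
\[
\theta_\chi(x) - \psi_\chi(x) = -\frac{1}{\sqrt{x}} \sum_{k \ge 2} \sum_{p^k \le x} \chi(p^k)\log p
= -\frac{1}{\sqrt{x}} \sum_{k \ge 2} \sum_{p^k \le x} \chi(p)^k \log p.
\]
We can immediately see that the contribution of k ≥ 3 is very small: since the exponent k
is at most of size log x, and |χ(p)| ≤ 1 for all primes p, it is bounded by
\[
\frac{1}{\sqrt{x}} \Bigl|\sum_{k \ge 3} \sum_{p^k \le x} \chi(p)^k \log p\Bigr|
\le \frac{1}{\sqrt{x}} \sum_{3 \le k \le \log x} (\log x)\,x^{1/k} \ll \frac{(\log x)^2}{x^{1/6}},
\]
where the implied constant is absolute.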


For k = 2, there are two cases. If χ² is the trivial character then
\[
\frac{1}{\sqrt{x}} \sum_{p \le \sqrt{x}} \chi(p)^2 \log p
= \frac{1}{\sqrt{x}} \sum_{\substack{p \le \sqrt{x} \\ p \nmid q}} \log p
= 1 + O\Bigl(\frac{1}{\log x}\Bigr)
\]
by a simple form of the Prime Number Theorem in arithmetic progressions (the Generalized
Riemann Hypothesis would of course give a much better error term, but this is not
needed here). If χ² is non-trivial, then we have
\[
\frac{1}{\sqrt{x}} \sum_{p \le \sqrt{x}} \chi(p)^2 \log p \ll \frac{1}{\log x}
\]
for the same reason. Thus we have
\[
\theta_\chi(x) = \psi_\chi(x) - \delta_{\chi^2} + O\Bigl(\frac{1}{\log x}\Bigr) \tag{5.7}
\]
where δ_{χ²} is 1 if χ² is trivial, and is zero otherwise.
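The constant shift in (5.7) is visible numerically for the character χ_4 of (5.6), whose square is trivial modulo 4, so that θ_χ(x) − ψ_χ(x) should be close to −1. The sketch below computes both normalized sums directly; the helper names and the choice x = 10^6 are assumptions made for illustration.

```python
import math

def chi4(n):
    """The non-trivial character modulo 4 from (5.6)."""
    if n % 2 == 0:
        return 0
    return 1 if n % 4 == 1 else -1

def primes_up_to(n):
    """Sieve of Eratosthenes: list of all primes <= n."""
    sieve = bytearray([1]) * (n + 1)
    sieve[0:2] = b"\x00\x00"
    for p in range(2, math.isqrt(n) + 1):
        if sieve[p]:
            sieve[p * p :: p] = bytearray(len(range(p * p, n + 1, p)))
    return [i for i in range(n + 1) if sieve[i]]

def theta_and_psi(x):
    """Return (theta_chi4(x), psi_chi4(x)), both normalized by 1/sqrt(x)."""
    theta = 0.0
    psi = 0.0
    for p in primes_up_to(x):
        logp = math.log(p)
        theta += chi4(p) * logp
        pk = p
        while pk <= x:                 # Lambda(p^k) = log p for each prime power p^k <= x
            psi += chi4(pk) * logp
            pk *= p
    root = math.sqrt(x)
    return theta / root, psi / root

theta, psi = theta_and_psi(10**6)
print(theta, psi, theta - psi)
```

The difference is dominated by the squares of primes up to √x, which is the source of the term δ_{χ²}.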
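By summation by parts, we have
\[
\sum_{p \le x} \chi(p) = \frac{1}{\log x} \sum_{p \le x} \chi(p)\log p
+ \int_2^x \Bigl(\sum_{p \le t} \chi(p)\log p\Bigr) \frac{dt}{t(\log t)^2}
\]
for any Dirichlet character χ modulo q, so that
\[
\frac{\log x}{\sqrt{x}}\bigl(\varphi(q)\pi(x; q, a) - \pi(x)\bigr)
= \sum^{*}_{\chi \ (\mathrm{mod}\ q)} \overline{\chi(a)} \Bigl(\theta_\chi(x)
+ \frac{\log x}{\sqrt{x}} \int_2^x \frac{\theta_\chi(t)}{t^{1/2}(\log t)^2}\,dt\Bigr)
+ O\Bigl(\frac{\log x}{\sqrt{x}}\Bigr). \tag{5.8}
\]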
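The summation by parts identity is exact, and this can be confirmed numerically because the inner sum over p ≤ t is a step function of t, so the integral can be evaluated in closed form on each interval between consecutive primes. The sketch below does this for the character χ_4 of (5.6); all function names are hypothetical.

```python
import math

def chi4(n):
    """The non-trivial character modulo 4 from (5.6)."""
    if n % 2 == 0:
        return 0
    return 1 if n % 4 == 1 else -1

def primes_up_to(n):
    """Sieve of Eratosthenes: list of all primes <= n."""
    sieve = bytearray([1]) * (n + 1)
    sieve[0:2] = b"\x00\x00"
    for p in range(2, math.isqrt(n) + 1):
        if sieve[p]:
            sieve[p * p :: p] = bytearray(len(range(p * p, n + 1, p)))
    return [i for i in range(n + 1) if sieve[i]]

def partial_summation_sides(x):
    """Both sides of: sum chi(p) = S(x)/log x + int_2^x S(t) dt / (t (log t)^2),
    where S(t) = sum over primes p <= t of chi4(p) log p."""
    primes = primes_up_to(x)
    lhs = sum(chi4(p) for p in primes)
    S = 0.0
    integral = 0.0
    for i, p in enumerate(primes):
        S += chi4(p) * math.log(p)
        upper = primes[i + 1] if i + 1 < len(primes) else x
        # S(t) is constant on [p, upper); an antiderivative of 1/(t (log t)^2) is -1/log t
        integral += S * (1.0 / math.log(p) - 1.0 / math.log(upper))
    rhs = S / math.log(x) + integral
    return lhs, rhs

print(partial_summation_sides(1000))
```

Both sides agree up to floating-point rounding, for any x ≥ 2.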
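We begin by handling the integral for a non-trivial character χ. We have θ_χ(x) =
ψ_χ(x) + O(1/ log x) if χ² ≠ 1_q, which implies
\[
\int_2^x \frac{\theta_\chi(t)}{t^{1/2}(\log t)^2}\,dt
= \int_2^x \frac{\psi_\chi(t)}{t^{1/2}(\log t)^2}\,dt + O\Bigl(\frac{x^{1/2}}{(\log x)^3}\Bigr)
\]
since
\[
\int_2^x \frac{1}{t^{1/2}(\log t)^2}\,dt \ll \frac{x^{1/2}}{(\log x)^2}
\]
(and likewise with the exponent 2 replaced by 3 on both sides).
If χ² is trivial, we have an additional constant term θ_χ(x) − ψ_χ(x) = −1 + O(1/ log x), and
we get
\[
\begin{aligned}
\int_2^x \frac{\theta_\chi(t)}{t^{1/2}(\log t)^2}\,dt
&= \int_2^x \frac{\psi_\chi(t)}{t^{1/2}(\log t)^2}\,dt - \int_2^x \frac{1}{t^{1/2}(\log t)^2}\,dt
+ O\Bigl(\frac{x^{1/2}}{(\log x)^3}\Bigr) \\
&= \int_2^x \frac{\psi_\chi(t)}{t^{1/2}(\log t)^2}\,dt + O\Bigl(\frac{x^{1/2}}{(\log x)^2}\Bigr).
\end{aligned}
\]
Thus, in all cases, we get
\[
\frac{\log x}{\sqrt{x}} \int_2^x \frac{\theta_\chi(t)}{t^{1/2}(\log t)^2}\,dt
= \frac{\log x}{\sqrt{x}} \int_2^x \frac{\psi_\chi(t)}{t^{1/2}(\log t)^2}\,dt + O\Bigl(\frac{1}{\log x}\Bigr).
\]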
Now comes the subtle point we previously mentioned. If we were to use the pointwise
bound ψ_χ(t) ≪ (log t)² (which is essentially the content of the Generalized Riemann
Hypothesis) in the remaining integral, we would only get
\[
\frac{\log x}{\sqrt{x}} \int_2^x \frac{\psi_\chi(t)}{t^{1/2}(\log t)^2}\,dt \ll \log x,
\]
which is too big. So we need to use the integration process non-trivially. Precisely, by
Corollary C.5.11, we have
\[
\int_2^x \psi_\chi(t)\,dt \ll x
\]
for all x ≥ 2 (this reflects a “smoothing” effect due to the convergence of the series with
terms 1/|1/2 + iγ|², where γ are the ordinates of zeros of L(s, χ)). Using integration by
parts, we can then deduce that
\[
\frac{\log x}{\sqrt{x}} \int_2^x \frac{\psi_\chi(t)}{t^{1/2}(\log t)^2}\,dt
\ll \frac{\log x}{\sqrt{x}} \Bigl(\frac{x}{x^{1/2}(\log x)^2} + \int_2^x \frac{t^{-1/2}\,dt}{(\log t)^2}\Bigr)
\ll \frac{1}{\log x}.
\]
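Finally, we transform the first term of (5.8) to express it in terms of ψ_χ, again
using (5.7). For any element a ∈ (Z/qZ)^× and x ∈ Ω_X, we have
\[
\begin{aligned}
\frac{\log x}{\sqrt{x}}\bigl(\varphi(q)\pi(x; q, a) - \pi(x)\bigr)
&= -\sum_{\substack{\chi^2 = 1 \\ \chi \ne 1}} \overline{\chi(a)}
+ \sum^{*}_{\chi \ (\mathrm{mod}\ q)} \overline{\chi(a)}\,\psi_\chi(x) + O\Bigl(\frac{1}{\log x}\Bigr) \\
&= m_q(a) + \sum^{*}_{\chi \ (\mathrm{mod}\ q)} \overline{\chi(a)}\,\psi_\chi(x) + O\Bigl(\frac{1}{\log x}\Bigr)
\end{aligned}
\]
where the implied constant depends on q. Since the error term is ≪ (log x)^{-1} for x ∈ Ω_X,
it converges to zero in probability, and this concludes the proof. □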
From now on we keep the notation of the lemma, except that we also sometimes write E_X
for E_{X,q}. Since E_X tends to 0 in probability and m_q is a fixed function on (Z/qZ)^×,
Theorem 5.2.2 will follow (by Corollary B.4.2) from the convergence in law of the random
functions
\[
M_{X,q} = \sum^{*}_{\chi \ (\mathrm{mod}\ q)} \psi_\chi\,\overline{\chi}.
\]
Now we express these functions in terms of zeros of L-functions. Here and later, a sum
over zeros of a Dirichlet L-function always means implicitly that zeros are counted with
their multiplicity.
We will denote by I_X the identity variable x ↦ x on Ω_X; thus, for a complex number s,
the random variable I_X^s is the function x ↦ x^s on Ω_X.
Below, when we have a random function X on (Z/qZ)^×, and a non-negative random
variable Y, the meaning of a statement of the form X = O(Y) is that ‖X‖ = O(Y), where
the norm is the euclidean norm, i.e., we have
\[
\|X\|^2 = \sum_{a \in (\mathbf{Z}/q\mathbf{Z})^\times} |X(a)|^2.
\]

Lemma 5.3.2. We have
\[
M_{X,q} = -\sum^{*}_{\chi \ (\mathrm{mod}\ q)} \Bigl(\sum_{|\gamma| \le X} \frac{I_X^{i\gamma}}{\frac{1}{2} + i\gamma}\Bigr)\overline{\chi}
+ O\Bigl(\frac{(\log X)^2}{X^{1/2}}\Bigr)
\]
where γ ranges over ordinates of zeros of L(s, χ), counted with multiplicity, and the
implied constant depends on q.
Proof. The key ingredient is the (approximate) explicit formula of Prime Number
Theory, which can be stated in the form
\[
\psi_\chi = -\sum_{\substack{L(\beta + i\gamma, \chi) = 0 \\ |\gamma| \le X}} \frac{I_X^{\beta - \frac{1}{2} + i\gamma}}{\beta + i\gamma}
+ O\Bigl(\frac{I_X^{1/2}(\log X)^2}{X}\Bigr)
\]
where the sum is over zeros of the Dirichlet L-function with 0 ≤ β ≤ 1, counted with
multiplicity (see Theorem C.5.6). Under the assumption of the Generalized Riemann
Hypothesis modulo q, we always have β = 1/2, and this formula implies
\[
\psi_\chi = -\sum_{|\gamma| \le X} \frac{I_X^{i\gamma}}{\frac{1}{2} + i\gamma} + O\Bigl(\frac{(\log X)^2}{X^{1/2}}\Bigr).
\]
Summing over the characters (the number of which is ϕ(q) − 1 ≤ q), the formula follows. □

Probabilistically, we have now a finite linear combination (of length depending on X)
of the random variables I_X^{iγ}. The link with probability theory, and with the existence of
the Rubinstein–Sarnak distribution, is then provided by the following proposition (quite
similar to Proposition 3.2.5).
Proposition 5.3.3. Let k ≥ 1 be an integer. Let F be a finite set of real numbers
and let (α(t))_{t∈F} be a family of elements in C^k. The random vectors
\[
\sum_{t \in F} I_X^{it}\,\alpha(t)
\]
on Ω_X converge in law as X → +∞.
Proof. After a simple translation, this is a direct consequence of the Kronecker
Equidistribution Theorem B.6.5. Indeed, consider the vector
\[
z = \Bigl(\frac{t}{2\pi}\Bigr)_{t \in F} \in \mathbf{R}^F.
\]
By Kronecker’s Theorem, the probability measures µ_Y on (R/Z)^F defined for Y > 0 by
\[
\mu_Y(A) = \frac{1}{Y}\,|\{y \in [0, Y] \mid yz \in A\}|,
\]
for any measurable set A, converge in law to the probability Haar measure µ on the
closure T of the subgroup of (R/Z)^F generated by the classes modulo Z^F of the elements yz,
where y ranges over R.
We extend the isomorphism θ ↦ e(θ) from R/Z to S¹ componentwise to define an
isomorphism of (R/Z)^F to (S¹)^F. For any continuous function f on (S¹)^F, we observe
that
\[
\begin{aligned}
\int_{(\mathbf{R}/\mathbf{Z})^F} f(e(v))\,d\mu_Y(v)
&= \frac{1}{Y}\int_0^Y f(e(yz))\,dy
= \frac{1}{Y}\int_0^Y f\bigl((e^{ity})_{t \in F}\bigr)\,dy \\
&= \frac{1}{Y}\int_1^{e^Y} f\bigl((x^{it})_{t \in F}\bigr)\,\frac{dx}{x}
= E_X\bigl(f((I_X^{it})_{t \in F})\bigr)
\end{aligned}
\]
for X = e^Y, after the change of variable x = e^y. Hence the vector (I_X^{it})_{t∈F} converges in law
as X → +∞ to the image of µ by v ↦ e(v). Now we finish the proof of the proposition
by composition with the continuous map from (S¹)^F to C^k defined by
\[
(z_t)_{t \in F} \mapsto \sum_{t \in F} z_t\,\alpha(t),
\]
using Proposition B.3.2. □


From the proof, we see that we can make the result more precise:
Corollary 5.3.4. With notation and assumptions as in Proposition 5.3.3, the random
vectors
\[
\sum_{t \in F} I_X^{it}\,\alpha(t)
\]
on Ω_X converge in law as X → +∞ to
\[
\sum_{t \in F} I_t\,\alpha(t),
\]
where (I_t)_{t∈F} is a random variable with values in (S¹)^F with law given by the probability
Haar measure of the closure of the subgroup of (S¹)^F generated by all elements (x^{it})_{t∈F}
for x > 0.
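The role of the subgroup in this corollary can be illustrated on a toy example with two frequencies. The sketch below approximates the average over [0, Y] of the character (z_1, z_2) ↦ z_1² z_2^{-1} evaluated at (e^{i t_1 y}, e^{i t_2 y}). The frequencies, the monomial and the function name are illustrative assumptions, and the integral is approximated by a simple midpoint rule.

```python
import cmath
import math

def time_average(freqs, exponents, Y, steps=200_000):
    """Midpoint-rule approximation of (1/Y) * integral over [0, Y] of
    prod_t (exp(i*t*y))**m_t dy, for frequencies t and integer exponents m_t."""
    c = sum(m * t for m, t in zip(exponents, freqs))  # combined frequency
    h = Y / steps
    total = 0j
    for j in range(steps):
        y = (j + 0.5) * h
        total += cmath.exp(1j * c * y)
    return total / steps

# Frequencies 1 and sqrt(2) are linearly independent over Q: the average tends to 0,
# as it would for the Haar measure on the full torus.
print(abs(time_average((1.0, math.sqrt(2)), (2, -1), 1000.0)))

# Frequencies 1 and 2 satisfy 2*1 - 2 = 0: the average is 1, so the limiting
# measure is carried by the proper closed subgroup z_2 = z_1**2.
print(abs(time_average((1.0, 2.0), (2, -1), 1000.0)))
```

A non-trivial character of the torus that averages to 1 detects a linear relation between the frequencies, which is the phenomenon behind the description of the limiting law in the corollary.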
Remark 5.3.5. This proposition explains why the logarithmic weight in (5.1) is absolutely
natural. It also hints that it is necessary. Indeed, the statement of the proposition
becomes false if the probability measure P_X on Ω_X is replaced by the uniform measure.
This is already visible in the simplest case where F = {t} contains a single non-zero real
number t; for instance, taking the test function f to be the identity, observe that with
this other probability measure, the expectation of x ↦ x^{it} is
\[
\frac{1}{X - 1}\int_1^X x^{it}\,dx = \frac{1}{it + 1}\,\frac{X^{it+1} - 1}{X - 1} \sim \frac{X^{it}}{it + 1},
\]
which has no limit as X → +∞.
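The contrast between the two measures can be seen from the closed formulas for the expectation of x ↦ x^{it}. The following sketch evaluates both; the function names and the sample values of X are assumptions chosen for illustration.

```python
import cmath
import math

def log_weighted_mean(t, X):
    """E_X(x^{it}) for the measure dx / (x log X) on [1, X]:
    closed form (X^{it} - 1) / (i t log X), for t != 0."""
    L = math.log(X)
    return (cmath.exp(1j * t * L) - 1) / (1j * t * L)

def uniform_mean(t, X):
    """Mean of x^{it} for the uniform measure on [1, X]:
    closed form (X^{it+1} - 1) / ((it + 1)(X - 1))."""
    s = 1 + 1j * t
    return (cmath.exp(s * math.log(X)) - 1) / (s * (X - 1))

for X in (1e3, 1e6, 1e9, 1e12):
    print(X, abs(log_weighted_mean(1.0, X)), uniform_mean(1.0, X))
```

With the logarithmic weight the modulus decays like 1/log X, whereas for the uniform measure the modulus stays close to 1/|it + 1| while the argument keeps rotating with log X.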
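Let T ≥ 2 be a parameter. It follows from Lemma 5.3.2 and Proposition 5.3.3 that
for X ≥ T, we have
\[
M_{X,q} = N_{X,T,q} - \sum^{*}_{\chi \ (\mathrm{mod}\ q)} \Bigl(\sum_{T \le |\gamma| \le X} \frac{I_X^{i\gamma}}{\frac{1}{2} + i\gamma}\Bigr)\overline{\chi}
+ O\Bigl(\frac{(\log X)^2}{\sqrt{X}}\Bigr), \tag{5.9}
\]
where
\[
N_{X,T,q} = -\sum^{*}_{\chi \ (\mathrm{mod}\ q)} \Bigl(\sum_{|\gamma| \le T} \frac{I_X^{i\gamma}}{\frac{1}{2} + i\gamma}\Bigr)\overline{\chi}
\]
are random functions that converge in law as X → +∞ for any fixed T ≥ 2. The next
lemma will allow us to check that the remainder term in this approximation is small.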
Lemma 5.3.6. Let k ≥ 1 be an integer. Let F be a countable set of real numbers and
let (α(t))_{t∈F} be a family of elements in C^k. Assume that the following conditions hold for
all T ≥ 2 and all t_0 ∈ R:
\[
\sum_{t \in F} \|\alpha(t)\|^2\,|t|^{1/2}\log(1 + |t|) < +\infty, \tag{5.10}
\]
\[
\sum_{\substack{t \in F \\ |t| \ge T}} \frac{\|\alpha(t)\|}{|t|^{1/4}} \ll \frac{(\log T)^2}{T^{1/4}}, \tag{5.11}
\]
\[
|\{t \in F \mid |t - t_0| \le 1\}| \ll \log(1 + |t_0|). \tag{5.12}
\]
Then we have
\[
\lim_{\substack{T \le X \\ T \to +\infty}} \Bigl\|\sum_{\substack{t \in F \\ |t| \ge T}} I_X^{it}\,\alpha(t)\Bigr\|_{L^2} = 0,
\]
where the limit is over pairs (T, X) with T ≤ X and T tends to infinity.
In this statement, we use the Hilbert space L²(Ω_X; R^k) of R^k-valued L²-functions
on Ω_X, with norm defined by
\[
\|f\|_{L^2}^2 = E_X(\|f\|^2)
\]
for f ∈ L²(Ω_X; R^k).
Proof. Note first that an explicit computation of the integral gives
\[
E_X\bigl(I_X^{i(t_1 - t_2)}\bigr) = \frac{1}{\log X}\,\frac{X^{i(t_1 - t_2)} - 1}{i(t_1 - t_2)}
\]
for t_1 ≠ t_2, hence the general bound
\[
\bigl|E_X\bigl(I_X^{i(t_1 - t_2)}\bigr)\bigr| \le \min\Bigl(1, \frac{2}{\log X\,|t_1 - t_2|}\Bigr). \tag{5.13}
\]
We will use this bound slightly wastefully (using the first estimate even when it is not
the best of the two) to gain some flexibility.
All sums below involving t, t_1, t_2 are restricted to t ∈ F. Assume 25 ≤ T ≤ X. We
have
\[
\Bigl\|\sum_{\substack{t \in F \\ |t| \ge T}} I_X^{it}\,\alpha(t)\Bigr\|_{L^2}^2
= E_X\Bigl(\Bigl\|\sum_{T \le |t| \le X} I_X^{it}\,\alpha(t)\Bigr\|^2\Bigr)
= \sum_{T \le |t_1|, |t_2| \le X} \alpha(t_1) \cdot \overline{\alpha(t_2)}\; E_X\bigl(I_X^{i(t_1 - t_2)}\bigr).
\]
We write this double sum as S_1 + S_2, where S_1 is the contribution of the terms where
|t_1 − t_2| ≤ |t_1 t_2|^{1/4}, and S_2 is the remainder.
In the sum S_1, we first claim that if T ≥ √2, then the condition |t_1 − t_2| ≤ |t_1 t_2|^{1/4}
implies |t_2| ≤ 2|t_1|. Indeed, suppose that |t_2| > 2|t_1|. We have
\[
|t_2| \le |t_1 - t_2| + |t_1| \le |t_1 t_2|^{1/4} + \tfrac{1}{2}|t_2|,
\]
hence |t_2| ≤ 2|t_1 t_2|^{1/4}, which implies |t_2| ≤ 2^{4/3}|t_1|^{1/3}, and further
\[
2|t_1| < |t_2| \le 2^{4/3}|t_1|^{1/3},
\]
which implies that T ≤ |t_1| < √2, reaching a contradiction.
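Exchanging the roles of t_1 and t_2, we see also that |t_1| ≤ 2|t_2|. In particular, it now
follows that we also have
\[
|t_2 - t_1| \le |t_1 t_2|^{1/4} \le 2|t_1|^{1/2}, \qquad |t_2 - t_1| \le |t_1 t_2|^{1/4} \le 2|t_2|^{1/2}.
\]
Still for T ≥ √2, we get
\[
\begin{aligned}
|S_1| &\le \sum_{\substack{T \le |t_1|, |t_2| \le X \\ |t_2 - t_1| \le |t_1 t_2|^{1/4}}} |\alpha(t_1) \cdot \overline{\alpha(t_2)}|
\le \frac{1}{2} \sum_{\substack{T \le |t_1|, |t_2| \le X \\ |t_2 - t_1| \le |t_1 t_2|^{1/4}}} \bigl(\|\alpha(t_1)\|^2 + \|\alpha(t_2)\|^2\bigr) \\
&\le \sum_{T \le |t_1| \le X} \|\alpha(t_1)\|^2 \sum_{\substack{T \le |t_2| \le X \\ |t_2 - t_1| \le 2|t_1|^{1/2}}} 1
+ \sum_{T \le |t_2| \le X} \|\alpha(t_2)\|^2 \sum_{\substack{T \le |t_1| \le X \\ |t_2 - t_1| \le 2|t_2|^{1/2}}} 1 \\
&\ll \sum_{T \le |t| \le X} \|\alpha(t)\|^2\,|t|^{1/2}\log(1 + |t|)
\end{aligned}
\]
by (5.12). This quantity tends to 0 as T → +∞ since the series over all t converges by
assumption (5.10).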
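For the sum S_2, we have
\[
|S_2| \le \frac{2}{\log X} \sum_{\substack{T \le |t_1|, |t_2| \le X \\ |t_2 - t_1| > |t_1 t_2|^{1/4}}} \frac{\|\alpha(t_1)\|\,\|\alpha(t_2)\|}{|t_1 - t_2|}
\le \frac{2}{\log X} \sum_{\substack{T \le |t_1|, |t_2| \le X \\ |t_2 - t_1| > |t_1 t_2|^{1/4}}} \frac{\|\alpha(t_1)\|\,\|\alpha(t_2)\|}{|t_1 t_2|^{1/4}},
\]
and therefore
\[
|S_2| \le \frac{2}{\log X} \sum_{T \le |t_1| \le X} \frac{\|\alpha(t_1)\|}{|t_1|^{1/4}} \sum_{T \le |t_2| \le X} \frac{\|\alpha(t_2)\|}{|t_2|^{1/4}}
\ll \frac{1}{\log X}\,\frac{(\log T)^4}{T^{1/2}},
\]
by (5.11). The lemma now follows. □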


Remark 5.3.7. Although we have stated this lemma in some generality, it is far from
the best that can be achieved along such lines.
The assumptions might look complicated, but note that (5.12) means that the density
of F is roughly logarithmic; then (5.10) and (5.11) are certainly satisfied if the series with
terms ‖α(t)‖ is convergent, and more generally when ‖α(t)‖ is comparable with (1 + |t|)^{−α}
with α > 3/4.
We will now finish the proof of Theorem 5.2.2. We apply Lemma 5.3.6 to the set F
of ordinates γ of zeros of some L(s, χ), for χ a non-trivial character modulo q, and to
\[
\alpha(\gamma) = \sum^{*}_{\substack{\chi \ (\mathrm{mod}\ q) \\ L(\frac{1}{2} + i\gamma, \chi) = 0}} \frac{1}{\frac{1}{2} + i\gamma}\,\overline{\chi}
\]
for γ ∈ F, viewing α(γ) as a vector in C^{(Z/qZ)^×}, and taking into account the multiplicity
of the zero 1/2 + iγ for any character χ such that L(1/2 + iγ, χ) = 0. We need to check the
three assumptions of the lemma.
From the asymptotic von Mangoldt formula (C.10), we first know that (5.12) holds
for the zeros of a fixed L-function modulo q, with an implied constant depending on q,
and hence it holds also for F.
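We next have
\[
\|\alpha(\gamma)\| \le \sum^{*}_{\substack{\chi \ (\mathrm{mod}\ q) \\ L(\frac{1}{2} + i\gamma, \chi) = 0}} \frac{1}{|\frac{1}{2} + i\gamma|}\,\|\chi\|
= \varphi(q)^{1/2} \sum^{*}_{\substack{\chi \ (\mathrm{mod}\ q) \\ L(\frac{1}{2} + i\gamma, \chi) = 0}} \frac{1}{|\frac{1}{2} + i\gamma|}
\le \frac{\varphi(q)^{3/2}}{|\frac{1}{2} + i\gamma|} \tag{5.14}
\]
by a trivial estimate of the number of characters of which 1/2 + iγ can be a zero.

Condition (5.10) follows from (5.14), since we even have
\[
\sum_{L(\frac{1}{2} + i\gamma, \chi) = 0} \frac{1}{|\frac{1}{2} + i\gamma|^{1 + \varepsilon}} < +\infty \tag{5.15}
\]
for any fixed ε > 0 and any χ (mod q), and condition (5.11) is again an easy consequence
of (5.15) and (5.14).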
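From (5.9), we conclude that for X ≥ T ≥ 2, we have
\[
M_X = N_{X,T} + E'_{X,T}
\]
where
\[
E'_{X,T} = -\sum^{*}_{\chi \ (\mathrm{mod}\ q)} \Bigl(\sum_{T \le |\gamma| \le X} \frac{I_X^{i\gamma}}{\frac{1}{2} + i\gamma}\Bigr)\overline{\chi}
+ O\Bigl(\frac{(\log X)^2}{\sqrt{X}}\Bigr).
\]
These random functions converge to 0 in L², hence in L¹, by Lemma 5.3.6 as applied
before. By Proposition B.4.4 (and Remark B.4.6), we conclude that the random
functions M_X converge in law, and that their limit is the limit, as T → +∞, of the
limit in law (as X → +∞) of
\[
-\sum^{*}_{\chi \ (\mathrm{mod}\ q)} \Bigl(\sum_{|\gamma| \le T} \frac{I_X^{i\gamma}}{\frac{1}{2} + i\gamma}\Bigr)\overline{\chi}.
\]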

In the next section, we compute these limits, and hence the law of Nq , assuming
that the zeros of the Dirichlet L-functions are “as independent as possible”, so that
Proposition 5.3.3 becomes explicit in the special case of interest.
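To finish the proof of Theorem 5.2.2, we need to check the last assertion, namely that
the support of N_q is contained in the hyperplane (5.3). But note that
\[
\sum_{a \in (\mathbf{Z}/q\mathbf{Z})^\times} N_X(x)(a)
= \frac{\log x}{\sqrt{x}} \sum_{a \in (\mathbf{Z}/q\mathbf{Z})^\times} \bigl(\varphi(q)\pi(x; q, a) - \pi(x)\bigr)
= -\varphi(q)\,\frac{\log x}{\sqrt{x}} \sum_{\substack{p \le x \\ p \ (\mathrm{mod}\ q) \notin (\mathbf{Z}/q\mathbf{Z})^\times}} 1
\ll \frac{\log x}{\sqrt{x}}
\]
for all x ∈ Ω_X, since at most finitely many primes are not congruent to some a ∈ (Z/qZ)^×
(the implied constant depends on q).
Hence the random variables
\[
\sum_{a \in (\mathbf{Z}/q\mathbf{Z})^\times} N_X(a)
\]
converge in probability to 0 as X → +∞, and by Corollary B.3.4, it follows that the
support of N_q is contained in the zero set of the linear form
\[
f \mapsto \sum_{a \in (\mathbf{Z}/q\mathbf{Z})^\times} f(a),
\]
i.e., in H_q.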
5.4. The Generalized Simplicity Hypothesis
The proof of Theorem 5.2.2 now allows us to understand what is needed for the next
step, which we take to be the explicit determination of the random variable N_q. Indeed,
the proof tells us that N_q is the limit, as T → +∞, of the random variables that are
themselves the limits in law as X → +∞ of the random function given by the finite sum
\[
m_q - \sum^{*}_{\chi \ (\mathrm{mod}\ q)} \sum_{\substack{L(\frac{1}{2} + i\gamma, \chi) = 0 \\ |\gamma| \le T}} \frac{I_X^{i\gamma}}{\frac{1}{2} + i\gamma}\,\overline{\chi},
\]
which converge by Proposition 5.3.3. The proof of that proposition shows how this
limit N_{q,T} can be computed in principle. Precisely, let X_T be the set of pairs (χ, γ),
where χ runs over non-trivial Dirichlet characters modulo q and γ runs over the ordinates
of the non-trivial zeros of L(s, χ) with |γ| ≤ T. Then, by Corollary 5.3.4, we have
\[
N_{q,T} = m_q - \sum^{*}_{\chi \ (\mathrm{mod}\ q)} \sum_{\substack{L(\frac{1}{2} + i\gamma, \chi) = 0 \\ |\gamma| \le T}} \frac{I_{\chi,\gamma}}{\frac{1}{2} + i\gamma}\,\overline{\chi}, \tag{5.16}
\]
where (I_{χ,γ}) is distributed on (S¹)^{X_T} according to the probability Haar measure of the
closure S_T of the subgroup generated by the elements (x^{iγ})_{(χ,γ)∈X_T} for x > 0.
Thus, to compute N_q explicitly, we “simply” need to know what the subgroup S_T
is. If (hypothetically) this subgroup were equal to (S¹)^{X_T}, then the (I_{χ,γ}) would simply
be independent and uniformly distributed on S¹, and we would immediately obtain a
formula for N_q from (5.16) as a sum of a series of independent terms.
This hypothesis is however too optimistic. Indeed, there is an “obvious” type of
dependency among the ordinates γ, which amounts to restrictions on the subgroup S_T
in (S¹)^{X_T}. Beyond these relations, there are none that are immediately apparent. The
Generalized Simplicity Hypothesis modulo q is then the statement that, in fact, these
obvious relations should exhaust all possible constraints satisfied by S_T.³
These systematic relations between the elements of X_T are simply the following: a
complex number 1/2 + iγ is a zero of L(s, χ) if and only if the conjugate 1/2 − iγ is a zero of
L(s, χ̄), simply because
\[
\overline{L(\bar{s}, \chi)} = L(s, \overline{\chi})
\]
as holomorphic functions; hence (χ, γ) belongs to X_T if and only if (χ̄, −γ) does.
We are therefore led to the so-called Generalized Simplicity Hypothesis modulo q.
Definition 5.4.1. Let q ≥ 1 be an integer. The Generalized Simplicity Hypothesis
holds modulo q if the family of non-negative ordinates γ of the non-trivial zeros of all
non-trivial Dirichlet L-functions modulo q, with multiplicity taken into account, is linearly
independent over Q.
We emphasize that we are looking at the family of the ordinates, not just the set of
values. In particular, the Generalized Simplicity Hypothesis modulo q implies that
3 In other words, it is an application of Occam’s Razor.
• for a given γ ≥ 0, there is at most one primitive Dirichlet character χ modulo q
such that L(1/2 + iγ, χ) = 0,
• all non-trivial zeros are of multiplicity 1,
• we have L(1/2, χ) ≠ 0 for any non-trivial character χ.
All these statements are highly non-trivial conjectures!
Lemma 5.4.2. Under the assumption of the Generalized Simplicity Hypothesis modulo q,
the subgroup $S_T$ is given by
\[
S_T = \{(z_{\chi,\gamma}) \in (\mathbf{S}^1)^{X_T} \mid z_{\bar\chi,-\gamma} = \overline{z_{\chi,\gamma}} \text{ for all } (\chi, \gamma) \in X_T\} \tag{5.17}
\]
for all T > 2. In particular, denoting by $X_T^+$ the set of pairs (χ, γ) in $X_T$ with γ > 0, the
projection
\[
(z_{\chi,\gamma}) \mapsto (z_{\chi,\gamma})_{(\chi,\gamma)\in X_T^+} \tag{5.18}
\]
from $S_T$ to $(\mathbf{S}^1)^{X_T^+}$ is surjective.
Proof. Indeed, $S_T$ is contained in the subgroup $\tilde S_T$ in the right-hand side of (5.17),
because each vector $(x^{i\gamma})_{(\chi,\gamma)\in X_T}$ has this property for $x \in \mathbf{R}$, by the relation between
zeros of the L-functions of χ and $\bar\chi$.
To show that $S_T$ is not a proper subgroup of $\tilde S_T$, it is enough to prove the last
assertion, since an element of $\tilde S_T$ is uniquely determined by the value of the projection (5.18).
But if that projection is not surjective, then there exists a non-zero family
of integers $(m_{\chi,\gamma})_{(\chi,\gamma)\in X_T^+}$ such that
\[
\prod_{(\chi,\gamma)\in X_T^+} x^{i m_{\chi,\gamma}\gamma} = 1
\]
for all $x \in \mathbf{R}$, and this implies
\[
\sum^{*}_{\chi \pmod q}\ \sum_{\gamma>0} m_{\chi,\gamma}\,\gamma = 0,
\]
which contradicts the Generalized Simplicity Hypothesis modulo q. □



Remark 5.4.3. If we were also considering problems involving the comparison of the
number of primes in arithmetic progressions with different moduli, say modulo q1 and q2 ,
then there would be another systematic source of relations between the zeros of the L-
functions modulo q1 and q2 . Precisely, if d is a common divisor of q1 and q2 , and χ0 a
Dirichlet character modulo d, corresponding to a character χ0 of (Z/dZ)× , then there is
a Dirichlet character χi modulo qi , for i = 1, 2, corresponding to the composition
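\[
(\mathbf{Z}/q_i\mathbf{Z})^\times \to (\mathbf{Z}/d\mathbf{Z})^\times \xrightarrow{\ \chi_0\ } \mathbf{C}^\times,
\]
and we have
\[
L(s, \chi_i) = \prod_{p \mid q_i/d} (1 - \chi_0(p)p^{-s})\, L(s, \chi_0),
\]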

which shows that the ordinates of the non-trivial zeros of L(s, χ1 ) and L(s, χ2 ) are the
same.
Because of this, the correct formulation of the Generalized Simplicity Hypothesis,
without reference to a single modulus q, is that the non-negative ordinates of zeros of the
L-functions of all primitive Dirichlet characters are Q-linearly independent; this is the
statement as formulated in [105].
We can now state precisely the computation of the law of the random function Nq
under the assumption of the Generalized Simplicity Hypothesis modulo q.
To do this, let $X^+$ be the set of all pairs (χ, γ) where χ is a non-trivial Dirichlet
character modulo q and γ ≥ 0 is a non-negative ordinate of a non-trivial zero of L(s, χ),
i.e., we have $L(\frac12 + i\gamma, \chi) = 0$. Let $(I_{\chi,\gamma})_{(\chi,\gamma)\in X^+}$ be a family of independent random
variables all uniformly distributed over the circle $\mathbf{S}^1$.
Define further
\[
I_{\bar\chi,-\gamma} = \overline{I_{\chi,\gamma}} \tag{5.19}
\]
for all ordinates γ ≥ 0 of a zero of L(s, χ). We have then defined random variables $I_{\chi,\gamma}$
for all ordinates of a zero of L(s, χ).
Theorem 5.4.4 (Rubinstein–Sarnak). Let q > 1. In addition to the Generalized
Riemann Hypothesis, assume the Generalized Simplicity Hypothesis modulo q. Then the
law of $N_q$ is the law of the series
\[
m_q - \sum^{*}_{\chi \pmod q} \Bigl(\ \sum_{\substack{\gamma\\ L(\frac12+i\gamma,\chi)=0}} \frac{I_{\chi,\gamma}}{\frac12+i\gamma} \Bigr)\chi, \tag{5.20}
\]
where the series converges almost surely and in $L^2$ as the limit of partial sums
\[
\lim_{T\to+\infty}\ \sum^{*}_{\chi \pmod q}\ \sum_{\substack{L(\frac12+i\gamma,\chi)=0\\ |\gamma|\leq T}} \frac{I_{\chi,\gamma}}{\frac12+i\gamma}. \tag{5.21}
\]
In these formulas, for each Dirichlet character χ modulo q, the sum runs over the
ordinates of zeros of L(s, χ).
Remark 5.4.5. (1) Since the Generalized Simplicity Hypothesis modulo q implies
that each zero has multiplicity one (even as we vary χ modulo q), there is no need to
worry about this issue when defining the series over the zeros.
(2) This result shows that the random function $N_q$ is probabilistically quite subtle. It
is somewhat analogous to Bagchi’s measure, or to one of its Bohr–Jessen specializations
(see Theorem 3.2.1), with a sum (or a product) of rather simple individual independent
random variables, but it retains important arithmetic features because the sum and the
coefficients involve the zeros of Dirichlet L-functions (instead of the primes that occur in
Bagchi’s random Euler product).
One important contrasting feature, in comparison with either Theorem 3.2.1 (or Selberg’s
Theorem) is that the series defining $N_q$ is not far from being absolutely convergent,
which is not the case at all of the series
\[
\sum_{p} \frac{X_p}{p^s}
\]
that occurs in Bagchi’s Theorem when $\frac12 < \mathrm{Re}(s) < 1$.
Before giving the proof, we can draw some simple conclusions from Theorem 5.4.4, in
the direction of confirming the existence of a bias for certain residue classes.
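Under the assumptions of Theorem 5.4.4, we have $E(N_q) = m_q$, since the convergence
also holds in $L^2$, and $E(I_{\chi,\gamma}) = 0$ for all (χ, γ). Using either (5.4) or (5.5), we know that
\[
\frac{1}{\varphi(q)} \sum_{a\in(\mathbf{Z}/q\mathbf{Z})^\times} m_q(a) = 0, \qquad
\frac{1}{\varphi(q)} \sum_{a\in(\mathbf{Z}/q\mathbf{Z})^\times} m_q(a)^2 = \sigma_q = \sum^{*}_{\chi^2=1} 1.
\]
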
It is natural to say that “not all residue classes modulo q are equal”, as far as representing
primes is concerned, if the average function $m_q$ of $N_q$ is not constant (assuming
that Theorem 5.4.4 is applicable). This is equivalent (by (5.5)) to the existence of at least
one b ≠ 1 such that b² = 1, and therefore holds whenever q ≠ 2, since one can always
take b = −1.
This statement can be considered to be the simplest general confirmation of the
Chebychev bias; note that q = 2 is of course an exception, since all primes (with one
exception) are odd.

Remark 5.4.6. (1) The mean-square $\sigma_q$ of $m_q$ is also the size of the quotient group
\[
(\mathbf{Z}/q\mathbf{Z})^\times / ((\mathbf{Z}/q\mathbf{Z})^\times)^2
\]
of invertible residues modulo quadratic residues, minus 1. Using the Chinese Remainder
Theorem, this expression can be computed in terms of the factorization of q, namely if
we write
\[
q = \prod_{p} p^{n_p},
\]
then we obtain
\[
\sigma_q = 2^{\min(n_2-1,\,2)} \prod_{\substack{p \mid q\\ p \geq 3}} 2 \;-\; 1
\]
(because for p odd, the group of squares is of index 2 in $(\mathbf{Z}/p^{n_p}\mathbf{Z})^\times$ if $n_p \geq 1$, whereas
for p = 2, it is trivial if $n_p = 1$ or $n_p = 2$, and of index 4 if $n_2 \geq 3$).
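The formula for $\sigma_q$ can be checked by brute force for small moduli. The following is only a sketch: the function names `sigma_bruteforce` and `sigma_formula` are ours, and we assume that the exponent $\min(n_2 - 1, 2)$ is to be read as 0 when q is odd (so that $n_2 = 0$).

```python
from math import gcd


def sigma_bruteforce(q):
    """Index of the subgroup of squares in (Z/qZ)^x, minus 1."""
    units = [a for a in range(1, q + 1) if gcd(a, q) == 1]
    squares = {(a * a) % q for a in units}
    return len(units) // len(squares) - 1


def sigma_formula(q):
    """2^min(n_2 - 1, 2) * prod_{p | q, p >= 3} 2 - 1 (exponent clamped at 0: our assumption)."""
    n2 = 0
    while q % 2 == 0:
        q //= 2
        n2 += 1
    odd_primes = 0  # number of distinct odd primes dividing q
    p = 3
    while p * p <= q:
        if q % p == 0:
            odd_primes += 1
            while q % p == 0:
                q //= p
        p += 2
    if q > 1:
        odd_primes += 1
    return 2 ** max(min(n2 - 1, 2), 0) * 2 ** odd_primes - 1


for q in (4, 8, 12, 15, 16, 105):
    print(q, sigma_bruteforce(q), sigma_formula(q))
```

The two columns printed should agree for every modulus, which is what the Chinese Remainder Theorem argument predicts.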
(2) Consider once more the case q = 4. Then m4 (1) = −1 and m4 (3) = 1, and in
particular we certainly expect to have, in general, more primes congruent to 3 modulo 4
than there are congruent to 1 modulo 4.
In fact, using Theorem 5.4.4 and numerical tables of zeros of the Dirichlet L-functions
modulo 4 up to some bound T, one can get approximations to the distribution of N4 (e.g.,
through the characteristic function of N4 , and approximate Fourier inversion). Rubinstein
and Sarnak [105, §4] established in this manner that

P(N4 ∈ H4 ∩ C) = 0.9959 . . .

(under the assumptions of Theorem 5.4.4 modulo 4). This confirms a very strong bias for
primes to be ≡ 3 modulo 4, but also shows that one has sometimes π(x; 4, 1) > π(x; 4, 3)
(in fact, in the sense of the probability measure PX , this happens with probability about
1/250, and we have already mentioned that the first occurrence of this reverse inequality
is for X = 26861).

We now give the proof of Theorem 5.4.4. We first check that the series (5.20) converges
almost surely and in $L^2$ in the sense of the limit (5.21).⁴

⁴ This convergence could be proved without any condition, not even the Generalized Riemann
Hypothesis, but the series has no arithmetic meaning without such assumptions.
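It suffices to prove that each value $N_q(a)$ of the random function $N_q$ converges almost
surely and in $L^2$. To check this, we first observe that for any T > 2, we have
\begin{align}
\sum^{*}_{\chi \pmod q}\ \sum_{\substack{L(\frac12+i\gamma,\chi)=0\\ |\gamma|\leq T}} \frac{I_{\chi,\gamma}}{\frac12+i\gamma}\,\chi(a)
&= \sum^{*}_{\chi \pmod q}\ \sum_{\substack{L(\frac12+i\gamma,\chi)=0\\ 0<\gamma\leq T}} \Bigl( \frac{I_{\chi,\gamma}}{\frac12+i\gamma}\,\chi(a) + \frac{I_{\bar\chi,-\gamma}}{\frac12-i\gamma}\,\overline{\chi(a)} \Bigr) \notag\\
&= 2 \sum^{*}_{\chi \pmod q}\ \sum_{\substack{L(\frac12+i\gamma,\chi)=0\\ 0<\gamma\leq T}} \mathrm{Re}\Bigl( \frac{I_{\chi,\gamma}}{\frac12+i\gamma}\,\chi(a) \Bigr) \tag{5.22}
\end{align}
according to the definition (5.19) of $I_{\chi,\gamma}$ for negative γ (we use here the fact that, under
the Generalized Simplicity Hypothesis, no zero has ordinate γ = 0).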
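The right-hand side of (5.22) is the partial sum of a series of independent random
variables, and we can apply Kolmogorov’s Theorem B.10.1. Indeed, we have
\[
E\Bigl( \mathrm{Re}\Bigl( \frac{I_{\chi,\gamma}}{\frac12+i\gamma}\,\chi(a) \Bigr) \Bigr) = 0
\]
for any pair (χ, γ), and
\begin{align*}
\sum^{*}_{\chi \pmod q}\ \sum_{\gamma>0} V\Bigl( \mathrm{Re}\Bigl( \frac{I_{\chi,\gamma}}{\frac12+i\gamma}\,\chi(a) \Bigr) \Bigr)
&\leq \sum^{*}_{\chi \pmod q}\ \sum_{\gamma>0} E\Bigl( \Bigl| \frac{I_{\chi,\gamma}}{\frac12+i\gamma} \Bigr|^2 \Bigr) \\
&= \sum^{*}_{\chi \pmod q}\ \sum_{\gamma>0} \frac{1}{\frac14+\gamma^2} < +\infty
\end{align*}
by Proposition C.5.3 (2), so that the series converges almost surely and in $L^2$, by
Kolmogorov’s Theorem, as claimed.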
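Now we need only go through the steps described above when motivating Definition 5.4.1.
The random function $N_q$ is the limit as T → +∞ of
\[
N_{q,T} = m_q - \lim_{X\to+\infty} \Bigl(\ \sum^{*}_{\chi \pmod q}\ \sum_{\substack{L(\frac12+i\gamma,\chi)=0\\ |\gamma|\leq T}} \frac{I_X^{i\gamma}}{\frac12+i\gamma}\,\chi(a) \Bigr).
\]
We write once more
\[
m_q - \sum^{*}_{\chi \pmod q}\ \sum_{\substack{L(\frac12+i\gamma,\chi)=0\\ |\gamma|\leq T}} \frac{I_X^{i\gamma}}{\frac12+i\gamma}\,\chi(a)
= m_q - 2 \sum^{*}_{\chi \pmod q}\ \sum_{\substack{L(\frac12+i\gamma,\chi)=0\\ 0<\gamma\leq T}} \mathrm{Re}\Bigl( \frac{I_X^{i\gamma}}{\frac12+i\gamma}\,\chi(a) \Bigr).
\]
By Proposition 5.3.3, or Corollary 5.3.4, as explained above, and the Generalized Simplicity
Hypothesis modulo q (precisely through Lemma 5.4.2), the limit as X → +∞ of
these random functions is simply
\[
m_q - 2 \sum^{*}_{\chi \pmod q}\ \sum_{\substack{L(\frac12+i\gamma,\chi)=0\\ 0<\gamma\leq T}} \mathrm{Re}\Bigl( \frac{I_{\chi,\gamma}}{\frac12+i\gamma}\,\chi(a) \Bigr),
\]
which in turn converge to the random function $N_q$ as T → +∞ by definition. This
concludes the proof of Theorem 5.4.4. □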
Theorem 5.4.4 is equivalent to the computation of the characteristic function of $N_q$,
viewed as a random vector, i.e., of the function
\[
t \mapsto E(e^{it\cdot N_q})
\]
for $t \in \mathbf{R}^{(\mathbf{Z}/q\mathbf{Z})^\times}$, where
\[
t \cdot f = \sum_{a\in(\mathbf{Z}/q\mathbf{Z})^\times} t_a f(a)
\]
for $t = (t_a) \in \mathbf{R}^{(\mathbf{Z}/q\mathbf{Z})^\times}$ and $f : (\mathbf{Z}/q\mathbf{Z})^\times \to \mathbf{R}$. (Indeed, this is how the result is presented
in [105, §3.1].)
To state the formula for the characteristic function, define the Bessel function $J_0$ on $\mathbf{R}$
by
\[
J_0(x) = \frac{1}{2\pi} \int_0^{2\pi} e^{ix\cos(t)}\,dt.
\]
It is elementary that $J_0$ is a real-valued and even function of x.
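As a sanity check on this definition, one can evaluate the integral numerically and compare it with the classical power series $J_0(x) = \sum_{k\geq 0} (-1)^k (x/2)^{2k}/(k!)^2$. This is a minimal sketch; the function names `j0_integral` and `j0_series`, and the choice of a midpoint rule with 2000 nodes, are our own and not part of the text.

```python
import cmath
import math


def j0_integral(x, n=2000):
    """Midpoint-rule approximation of (1/2pi) * int_0^{2pi} exp(i x cos t) dt."""
    total = 0j
    for k in range(n):
        t = 2.0 * math.pi * (k + 0.5) / n
        total += cmath.exp(1j * x * math.cos(t))
    return total / n


def j0_series(x, terms=60):
    """Partial sum of the power series sum_k (-1)^k (x/2)^(2k) / (k!)^2."""
    total, term = 0.0, 1.0
    for k in range(terms):
        total += term
        term *= -((x / 2.0) ** 2) / ((k + 1) ** 2)
    return total


for x in (0.0, 0.5, 2.4, 7.0):
    value = j0_integral(x)
    # real part, imaginary part (numerically zero), and series value
    print(x, value.real, value.imag, j0_series(x))
```

The imaginary part vanishes up to rounding, and the values at x and −x coincide, in accordance with the statement that $J_0$ is real-valued and even.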
Corollary 5.4.7. Let q > 2 be an integer. Assume the Generalized Riemann Hypothesis
and the Generalized Simplicity Hypothesis modulo q. The characteristic function
of the law of the Rubinstein–Sarnak distribution $N_q$ modulo q is given by
\[
E(e^{it\cdot N_q}) = \exp(it \cdot m_q) \prod^{*}_{\chi \pmod q}\ \prod_{\substack{\gamma>0\\ L(\frac12+i\gamma,\chi)=0}} J_0\Bigl( \frac{2\,|t\cdot\chi|}{(\frac14+\gamma^2)^{1/2}} \Bigr)
\]
for $t \in \mathbf{R}^{(\mathbf{Z}/q\mathbf{Z})^\times}$, where, for each Dirichlet character χ modulo q, the product runs over
the positive ordinates of zeros of L(s, χ).
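Proof. Using the previous argument, we write the series defining $N_q$ in the form
\[
m_q - \sum^{*}_{\chi \pmod q}\ \sum_{\gamma>0} \Bigl( \frac{I_{\chi,\gamma}}{\frac12+i\gamma}\,\chi + \frac{I_{\bar\chi,-\gamma}}{\frac12-i\gamma}\,\bar\chi \Bigr)
= m_q - 2 \sum^{*}_{\chi \pmod q}\ \sum_{\gamma>0} \mathrm{Re}\Bigl( \frac{I_{\chi,\gamma}}{\frac12+i\gamma}\,\chi \Bigr).
\]
Since the characteristic function of a limit in law is the pointwise limit of the characteristic
functions of the sequence involved, we obtain using the independence of the
random variables $(I_{\chi,\gamma})$ the convergent product formula
\begin{align*}
E(e^{it\cdot N_q}) &= e^{it\cdot m_q} \prod^{*}_{\chi \pmod q}\ \prod_{\gamma>0} E\Bigl( \exp\Bigl( -2i\, t\cdot \mathrm{Re}\Bigl( \frac{I_{\chi,\gamma}\,\chi}{\frac12+i\gamma} \Bigr) \Bigr) \Bigr) \\
&= e^{it\cdot m_q} \prod^{*}_{\chi \pmod q}\ \prod_{\gamma>0} \varphi\Bigl( \frac{t\cdot\chi}{\frac12+i\gamma} \Bigr)
\end{align*}
where, for $z \in \mathbf{C}$, we defined
\[
\varphi(z) = E(e^{-2i\,\mathrm{Re}(zI)})
\]
for a random variable I uniformly distributed over the unit circle. By invariance of the
law of I under rotation (i.e., the law of $ze^{i\theta}I$ is the same as that of $zI$ for any $\theta \in \mathbf{R}$),
applied to the angle θ such that $ze^{i\theta} = |z|$, we have
\[
\varphi(z) = E(e^{-2i\,\mathrm{Re}(|z|I)}) = E(e^{-2i|z|\,\mathrm{Re}(I)}) = \frac{1}{2\pi} \int_0^{2\pi} e^{-2i|z|\cos(t)}\,dt = J_0(2|z|).
\]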
Hence we obtain
\[
E(e^{it\cdot N_q}) = e^{it\cdot m_q} \prod^{*}_{\chi \pmod q}\ \prod_{\gamma>0} J_0\Bigl( \frac{2\,|t\cdot\chi|}{|\frac12+i\gamma|} \Bigr),
\]
as claimed. □
Another consequence of Theorem 5.4.4 is an estimate for the probability that Nq takes
large values.
Corollary 5.4.8. There exists a constant $c_q > 0$ such that, for A > 0, we have
\begin{align*}
c_q^{-1} \exp(-\exp(c_q A^{1/2})) &\leq \liminf_{X\to+\infty} P_X(\|N_{X,q}\| > A) \\
&\leq \limsup_{X\to+\infty} P_X(\|N_{X,q}\| > A) \leq c_q \exp(-\exp(c_q^{-1} A^{1/2})).
\end{align*}

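Proof. We view $N_q$ as a random variable with values in the complex finite-dimensional
Banach space of complex-valued functions on $(\mathbf{Z}/q\mathbf{Z})^\times$. We have the series representation
\[
N_q = m_q - 2 \sum^{*}_{\chi \pmod q}\ \sum_{\substack{\gamma>0\\ L(\frac12+i\gamma,\chi)=0}} \mathrm{Re}\Bigl( \frac{I_{\chi,\gamma}}{\frac12+i\gamma}\,\chi \Bigr).
\]
This series converges almost surely, the terms are independent and the random variables
$I_{\chi,\gamma}$ are bounded by 1 in modulus. Moreover
\[
P(\|N_q\| > A) \leq P(\|\tilde N_q\| > A)
\]
where
\[
\tilde N_q = m_q - 2 \sum^{*}_{\chi \pmod q}\ \sum_{\substack{\gamma>0\\ L(\frac12+i\gamma,\chi)=0}} \frac{I_{\chi,\gamma}}{\frac12+i\gamma}\,\chi,
\]
since $\|N_q\| \leq \|\tilde N_q\|$. By Corollary C.5.5, the functions
\[
-\frac{2}{\frac12+i\gamma}\,\chi
\]
satisfy the bounds described in Remark B.11.14 (2), namely
\[
\sum^{*}_{\chi \pmod q}\ \sum_{\substack{0<\gamma<T\\ L(\frac12+i\gamma,\chi)=0}} \Bigl\| \frac{1}{\frac12+i\gamma}\,\chi \Bigr\| \ll (\log T)^2
\]
and
\[
\sum^{*}_{\chi \pmod q}\ \sum_{\substack{\gamma>T\\ L(\frac12+i\gamma,\chi)=0}} \Bigl\| \frac{1}{\frac12+i\gamma}\,\chi \Bigr\|^2 \ll \frac{\log T}{T}
\]
for T > 1. Thus by Remark B.11.14 (2), and the convergence in law of $N_{X,q}$ to $N_q$, we
deduce the upper bound
\[
\limsup_{X\to+\infty} P_X(\|N_{X,q}\| > A) \leq P(\|N_q\| > A) \leq c \exp(-\exp(c^{-1} A^{1/2}))
\]
for some real number c > 0.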


In the case of the lower bound, it suffices to prove it for $N_q(a)$, where a is any
fixed element of $(\mathbf{Z}/q\mathbf{Z})^\times$. Since the series expressing $N_q(a)$ is not exactly of the form
required for the lower-bound in Remark B.11.14 (2) (and in Proposition B.11.13), we first
transform it a bit. We have
\[
\mathrm{Re}\Bigl( \frac{I_{\chi,\gamma}}{\frac12+i\gamma}\,\chi(a) \Bigr) = \frac{1}{2(\frac14+\gamma^2)}\,\mathrm{Re}(I_{\chi,\gamma}\chi(a)) + \frac{\gamma}{\frac14+\gamma^2}\,\mathrm{Im}(I_{\chi,\gamma}\chi(a)),
\]
for any pair (χ, γ), which implies that
\[
N_q(a) = m_q(a) + e_q(a) - 2 \sum^{*}_{\chi \pmod q}\ \sum_{\substack{\gamma>0\\ L(\frac12+i\gamma,\chi)=0}} \frac{\gamma}{\frac14+\gamma^2}\,\mathrm{Im}(I_{\chi,\gamma}\chi(a))
\]
where the random variable $e_q(a)$ (arising from the sum of the first terms in the previous
expression) is uniformly bounded (by Proposition C.5.3 (2)). Now we can apply the
lower bound in Remark B.11.14 (2) to the last series: the random variables $\mathrm{Im}(I_{\chi,\gamma}\chi(a))$
are independent, symmetric and bounded by 1, and the assumptions on the size of the
coefficients are provided by Corollary C.5.5 again. □
5.5. Further results
In recent years, the Chebychev bias has been a popular topic in analytic number
theory; besides further studies of the original setting that we have discussed, it has also
been generalized in many ways. We only indicate a few examples here, without any
attempt at completeness.
In the first direction, there have been many studies of the properties of the Rubinstein–
Sarnak measures, and of the consequences concerning various “races” between primes (see,
for instance, the papers of Granville and Martin [50] and Harper and Lamzouri [56]).
In parallel, attempts have been made to weaken the assumptions used by Rubinstein
and Sarnak to establish properties of their measures (recall that the existence of the
measure does not require the Generalized Simplicity Hypothesis). Among these, we refer
in particular to the work of Devin [26], who found a much weaker condition that ensures
that the Rubinstein–Sarnak measure is absolutely continuous.
Among generalizations, it seems worth mentioning the discussion by Sarnak [107]
of a bias related to elliptic curves, as well as the recent extensive work of Fiorilli and
Jouve [38] concerning Artin L-functions. In another direction, Kowalski [68] and later
Cha–Fiorilli–Jouve [23] have considered analogous questions over finite fields, where the
main difference is that relations between zeros of the analogues of the Dirichlet L-functions
may well exist (although they are rare), leading to new phenomena.

CHAPTER 6

The shape of exponential sums

Probability tools:
• Definition of convergence in law (§ B.3)
• Kolmogorov’s Theorem for random series (th. B.10.1)
• Convergence of finite distributions (def. B.11.2)
• Kolmogorov’s Criterion for tightness (prop. B.11.10)
• Fourier coefficients criterion (prop. B.11.8)
• Subgaussian random variables (§ B.8)
• Talagrand’s inequality (th. B.11.12)
• Support of a random series (prop. B.10.8)

Arithmetic tools:
• Kloosterman sums (§ C.6)
• Riemann Hypothesis over finite fields (th. C.6.4)
• Average Sato–Tate theorem
• Weyl criterion (§ B.6)
• Deligne’s Equidistribution Theorem

6.1. Introduction
We consider in this chapter a rather different type of arithmetic objects: exponential
sums and their partial sums. Although the ideas that we will present apply to very
general situations, we consider as usual only an important special case: the partial sums
of Kloosterman sums modulo primes. In Section C.6, we give some motivation for the
type of sums (and questions) discussed in this chapter.
Thus let p be a prime number. For any pair (a, b) of invertible elements in the finite
field $\mathbf{F}_p = \mathbf{Z}/p\mathbf{Z}$, the (normalized) Kloosterman sum Kl(a, b; p) is defined by the formula
\[
\mathrm{Kl}(a, b; p) = \frac{1}{\sqrt{p}} \sum_{x\in\mathbf{F}_p^\times} e\Bigl( \frac{ax + b\bar{x}}{p} \Bigr),
\]
where we recall that we denote by e(z) the 1-periodic function defined by $e(z) = e^{2i\pi z}$,
and that $\bar{x}$ is the inverse of x modulo p.
These are finite sums, and they are of great importance in many areas of number
theory, especially in relation with automorphic and modular forms and with analytic
number theory (see [66] for a survey of the origin of these sums and of their applications,
due to Poincaré, Kloosterman, Linnik, Iwaniec, and others). Among their remarkable
properties is the following estimate for the modulus of Kl(a, b; p), due to A. Weil: for any
$(a, b) \in \mathbf{F}_p^\times \times \mathbf{F}_p^\times$, we have
\[
|\mathrm{Kl}(a, b; p)| \leq 2. \tag{6.1}
\]

[Figure 6.1. The partial sums of Kl(1, 1; 139).]

This is a very strong result if one considers that Kl(a, b; p) is, up to dividing by $\sqrt{p}$, the
sum of p − 1 roots of unity, so that the “trivial” estimate is that $|\mathrm{Kl}(a, b; p)| \leq (p-1)/\sqrt{p}$.
What this reveals is that the arguments of the summands e((ax + bx̄)/p) in C vary in
a very complicated manner that leads to this remarkable cancellation property. This is
due essentially to the very “random” behavior of the map x 7→ x̄ when seen at the level
of representatives of x and x̄ in the interval {0, . . . , p − 1}.
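The size of the cancellation is easy to observe numerically. The following minimal sketch evaluates Kl(a, 1; 139) directly from the definition and compares it with the trivial bound; the function name `kloosterman` is ours, and the inverse modulo p is computed through Fermat's little theorem, which assumes that p is prime.

```python
import cmath
import math


def kloosterman(a, b, p):
    """Normalized Kloosterman sum Kl(a, b; p), computed from the definition (p prime)."""
    total = 0j
    for x in range(1, p):
        x_inv = pow(x, p - 2, p)  # inverse of x modulo p, by Fermat's little theorem
        total += cmath.exp(2j * math.pi * ((a * x + b * x_inv) % p) / p)
    return total / math.sqrt(p)


p = 139
trivial_bound = (p - 1) / math.sqrt(p)
for a in range(1, 6):
    kl = kloosterman(a, 1, p)
    # value of Kl(a, 1; p), its (numerically zero) imaginary part, and the trivial bound
    print(a, kl.real, kl.imag, trivial_bound)
```

In each case the modulus stays below 2, as (6.1) predicts, while the trivial bound is close to 11.7 for p = 139.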

From a probabilistic point of view, the order of magnitude $\sqrt{p}$ of the sum (before
normalization) is not unexpected. If we simply heuristically model an exponential sum
as above by a random walk with independent summands uniformly distributed on the
unit circle, say
\[
S_N = X_1 + \cdots + X_N
\]
where the random variables $(X_n)$ are independent and uniform on the unit circle, then the
Central Limit Theorem implies a convergence in law of $S_N/\sqrt{N}$ to a standard complex
gaussian random variable, which shows that $\sqrt{N}$ is the “right” order of magnitude. Note
however that probabilistic analogies of this type would also suggest that $S_N$ is sometimes
(although rarely) larger than $\sqrt{N}$ (the law of the iterated logarithm suggests that it should
almost surely reach values as large as $\sqrt{N(\log\log N)}$; see, e.g., [9, Th. 9.5]). Hence Weil’s
bound (6.1) indicates that the summands defining the Kloosterman sum have very special
properties.
This probabilistic analogy and the study of random walks (or sheer curiosity) suggest
looking at the partial sums of Kloosterman sums, and the way they move in the complex
plane. This requires some ordering of the sum defining Kl(a, b; p), which we simply achieve
by summing over 1 ≤ x ≤ p − 1 in increasing order. Thus we will consider the p − 1
points
\[
z_j = \frac{1}{\sqrt{p}} \sum_{1\leq x\leq j} e\Bigl( \frac{ax + b\bar{x}}{p} \Bigr)
\]
for 1 ≤ j ≤ p − 1. We illustrate this for the sum Kl(1, 1; 139) in Figure 6.1.
Because this cloud of points is not particularly enlightening, we refine the construction
by joining the successive points with line segments. This gives the result in Figure 6.2
for Kl(1, 1; 139). If we change the values of a and b, we observe that the figures change
in an apparently random and unpredictable way, although some basic features remain (the
final point is on the real axis, which reflects the easily-proven fact that $\mathrm{Kl}(a, b; p) \in \mathbf{R}$,
and there is a reflection symmetry with respect to the line $x = \frac12 \mathrm{Kl}(a, b; p)$). For instance,
Figure 6.3 shows the curves corresponding to Kl(2, 1; 139), Kl(3, 1; 139) and Kl(4, 1; 139);
see [71] for many more pictures.
[Figure 6.2. The partial sums of Kl(1, 1; 139), joined by line segments.]

[Figure 6.3. The partial sums of Kl(a, 1; 139) for a = 2, 3, 4.]

We then ask whether there is a definite statistical behavior for these Kloosterman
paths as p → +∞, when we pick $(a, b) \in \mathbf{F}_p^\times \times \mathbf{F}_p^\times$ uniformly at random. As we will see,
this is indeed the case!
To state the precise result, we introduce some further notation. Thus, for p prime
and $(a, b) \in \mathbf{F}_p^\times \times \mathbf{F}_p^\times$, we denote by $K_p(a, b)$ the function
\[
[0, 1] \longrightarrow \mathbf{C}
\]
such that, for 0 ≤ j ≤ p − 2, the value at a real number t such that
\[
\frac{j}{p-1} \leq t < \frac{j+1}{p-1}
\]
is obtained by interpolating linearly between the consecutive partial sums
\[
z_j = \frac{1}{\sqrt{p}} \sum_{1\leq x\leq j} e\Bigl( \frac{ax + b\bar{x}}{p} \Bigr)
\quad\text{and}\quad
z_{j+1} = \frac{1}{\sqrt{p}} \sum_{1\leq x\leq j+1} e\Bigl( \frac{ax + b\bar{x}}{p} \Bigr).
\]
The path $t \mapsto K_p(a, b)(t)$ is the polygonal path described above; for t = 0, we have
$K_p(a, b)(0) = 0$, and for t = 1, we obtain $K_p(a, b)(1) = \mathrm{Kl}(a, b; p)$.
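The construction of the path can be sketched in a few lines. The names `partial_sums` and `kloosterman_path` below are ours and purely illustrative; the inverse modulo p is again computed by Fermat's little theorem, so p is assumed to be prime. The loop at the end checks numerically the reflection symmetry with respect to the line $x = \frac12\mathrm{Kl}(a, b; p)$, in the form $z_{p-1-j} = \mathrm{Kl}(a, b; p) - \overline{z_j}$.

```python
import cmath
import math


def partial_sums(a, b, p):
    """Return [z_0, ..., z_{p-1}], where z_0 = 0 and z_j is the j-th normalized partial sum."""
    z = [0j]
    for x in range(1, p):
        x_inv = pow(x, p - 2, p)  # inverse of x modulo p (p prime)
        step = cmath.exp(2j * math.pi * ((a * x + b * x_inv) % p) / p)
        z.append(z[-1] + step / math.sqrt(p))
    return z


def kloosterman_path(z, t):
    """Value at t in [0, 1] of the polygonal path through the points z_0, ..., z_n."""
    n = len(z) - 1                 # n = p - 1 segments of equal duration 1/n
    j = min(int(t * n), n - 1)     # index of the segment containing t
    frac = t * n - j
    return z[j] + frac * (z[j + 1] - z[j])


z = partial_sums(1, 1, 139)
kl = z[-1]
print(kloosterman_path(z, 0.0), kloosterman_path(z, 1.0), kl)

# largest defect in the reflection symmetry z_{p-1-j} = Kl - conj(z_j)
defect = max(abs(z[len(z) - 1 - j] - (kl - z[j].conjugate())) for j in range(len(z)))
print(defect)
```

Plotting the real and imaginary parts of `z` with any plotting tool reproduces pictures of the type of Figure 6.2.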
Let $\Omega_p = \mathbf{F}_p^\times \times \mathbf{F}_p^\times$. We view $K_p$ as a random variable
\[
\Omega_p \longrightarrow C([0, 1]),
\]
where C([0, 1]) is the Banach space of continuous functions $\varphi : [0, 1] \to \mathbf{C}$ with the
supremum norm $\|\varphi\|_\infty = \sup |\varphi(t)|$. Alternatively, we may think of the family of random
variables $(K_p(t))_{t\in[0,1]}$ such that
\[
(a, b) \mapsto K_p(a, b)(t),
\]
and view it as a “stochastic process” with t playing the role of “time”.
Here is the theorem that gives the limiting behavior of these arithmetically-defined
random variables, proved by Kowalski and Sawin in [79].
Theorem 6.1.1. Let $(\mathrm{ST}_h)_{h\in\mathbf{Z}}$ be a sequence of independent random variables, all
distributed according to the Sato–Tate measure
\[
\mu_{ST} = \frac{1}{\pi} \sqrt{1 - \frac{x^2}{4}}\;dx
\]
on [−2, 2].
(1) The random Fourier series
\[
K(t) = t\,\mathrm{ST}_0 + \sum_{\substack{h\in\mathbf{Z}\\ h\neq 0}} \frac{e(ht) - 1}{2i\pi h}\,\mathrm{ST}_h
\]
defined for t ∈ [0, 1] converges uniformly almost surely, in the sense of symmetric partial
sums
\[
K(t) = t\,\mathrm{ST}_0 + \lim_{H\to+\infty} \sum_{\substack{h\in\mathbf{Z}\\ 1\leq|h|<H}} \frac{e(ht) - 1}{2i\pi h}\,\mathrm{ST}_h.
\]
This random Fourier series defines a C([0, 1])-valued random variable K.
(2) As p → +∞, the random variables $K_p$ converge in law to K, in the sense of
C([0, 1])-valued variables.
The Sato–Tate measure is better known in probability as a semi-circle law, but its
appearance in Theorem 6.1.1 is really due to the group-theoretic interpretation that
often arises in number theory, and reflects the choice of name. Namely, we recall (see
Example B.6.1 (3)) that µST is the direct image under the trace map of the probability
Haar measure on the compact group SU2 (C).
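The first moments of $\mu_{ST}$ can be checked numerically from the density. The sketch below uses a plain midpoint rule; the function name `sato_tate_moment` and the number of nodes are our choices. The values to compare with are total mass 1, mean 0, variance 1, and fourth moment 2 (the even moments of the semi-circle law on [−2, 2] are the Catalan numbers 1, 1, 2, 5, ...).

```python
import math


def sato_tate_moment(k, n=100_000):
    """Midpoint-rule approximation of the k-th moment of mu_ST on [-2, 2]."""
    width = 4.0 / n
    total = 0.0
    for i in range(n):
        x = -2.0 + (i + 0.5) * width
        density = math.sqrt(1.0 - x * x / 4.0) / math.pi
        total += (x ** k) * density
    return total * width


for k in range(5):
    print(k, sato_tate_moment(k))
```

The vanishing mean and the unit variance are the two facts about $\mu_{ST}$ that are used when Kolmogorov's Theorem is applied to random series with Sato–Tate coefficients.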
Note in particular that the theorem implies, by taking t = 1, that the Kloosterman
sums $\mathrm{Kl}(a, b; p) = K_p(a, b)(1)$, viewed as random variables on $\Omega_p$, become asymptotically
distributed like $K(1) = \mathrm{ST}_0$, i.e., that Kloosterman sums are Sato–Tate distributed in
the sense that for any real numbers −2 ≤ α < β ≤ 2, we have
\[
\frac{1}{(p-1)^2}\, |\{(a, b) \in \mathbf{F}_p^\times \times \mathbf{F}_p^\times \mid \alpha < \mathrm{Kl}(a, b; p) < \beta\}| \longrightarrow \int_\alpha^\beta d\mu_{ST}(t).
\]

This result is a famous theorem of N. Katz [61]. In some sense, Theorem 6.1.1 is a “func-
tional” extension of this equidistribution theorem. In fact, the key arithmetic ingredient
in the proof is an extension of the results and methods developed by Katz to prove many
similar statements.
Remark 6.1.2. Although we do not require this, we mention a few regularity prop-
erties of the random series K(t): it is almost surely nowhere differentiable, but almost
surely Hölder-continuous of order α for any α < 1/2 (see the references in [79, Prop.
2.1]; these follow from general results of Kahane).

6.2. Proof of the distribution theorem


We will explain the proof of the theorem. We use a slightly different approach than the
original article, bypassing the method of moments, and exploiting some simplifications
that arise from the consideration of this single example.
The proof will be complete from a probabilistic point of view, but it relies on an
extremely deep arithmetic result that we will only be able to view as a black box in this
book. The crucial underlying result is the very general form of the Riemann Hypothesis
over finite fields, and the formalism that is attached to it. This is due to Deligne, and the
particular application we use relies extensively on the additional work of Katz. All of this
builds on the algebraic-geometric foundations of Grothendieck and his school (see [59,
Ch. 11] for an introduction).
In outline, the proof has three steps:
• Step 1: Show that the random Fourier series K exists, as a C([0, 1])-valued
random variable;
• Step 2: Prove that (a small variant of) the sequence of Fourier coefficients of
Kp converges in law to the sequence of Fourier coefficients of K;
• Step 3: Prove that the sequence (Kp )p is tight (Definition B.3.6), using Kol-
mogorov’s Tightness Criterion (Proposition B.11.10).
Once this is done, a simple probabilistic statement (Proposition B.11.8, which is a
variant of Prokhorov’s Theorem B.11.4) shows that the combination of (2) and (3) implies
that Kp converges to K. Both steps (2) and (3) involve non-trivial arithmetic information;
indeed, the main input in (2) is exceptionally deep, as we will explain soon.
We denote by $P_p$ and $E_p$ the probability and expectation with respect to the uniform
measure on $\Omega_p = \mathbf{F}_p^\times \times \mathbf{F}_p^\times$. Before we begin the proof in earnest, it is useful to see why
the limit arises, and why it is precisely this random Fourier series. The idea is to use
discrete Fourier analysis to represent the partial sums of Kloosterman sums.
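Lemma 6.2.1. Let p > 3 be a prime and $a, b \in \mathbf{F}_p^\times$. Let t ∈ [0, 1]. Then we have
\[
\frac{1}{\sqrt{p}} \sum_{1\leq n\leq (p-1)t} e\Bigl( \frac{an + b\bar{n}}{p} \Bigr) = \sum_{|h|<p/2} \alpha_p(h, t)\, \mathrm{Kl}(a - h, b; p), \tag{6.2}
\]
where
\[
\alpha_p(h, t) = \frac{1}{p} \sum_{1\leq n\leq (p-1)t} e\Bigl( \frac{nh}{p} \Bigr).
\]
Proof. This is a case of the discrete Plancherel formula, applied to the characteristic
(indicator) function of the discrete interval of summation; to check it quickly, insert the
definitions of $\alpha_p(h, t)$ and of Kl(a − h, b; p) in the right-hand side of (6.2). This shows
that it is equal to
\begin{align*}
\sum_{|h|<p/2} \alpha_p(h, t)\, \mathrm{Kl}(a - h, b; p)
&= \frac{1}{p^{3/2}} \sum_{|h|<p/2}\ \sum_{1\leq n\leq (p-1)t}\ \sum_{m\in\mathbf{F}_p^\times} e\Bigl( \frac{nh}{p} \Bigr) e\Bigl( \frac{(a - h)m + b\bar{m}}{p} \Bigr) \\
&= \frac{1}{\sqrt{p}} \sum_{1\leq n\leq (p-1)t}\ \sum_{m\in\mathbf{F}_p^\times} e\Bigl( \frac{am + b\bar{m}}{p} \Bigr) \frac{1}{p} \sum_{h\in\mathbf{F}_p} e\Bigl( \frac{h(n - m)}{p} \Bigr) \\
&= \frac{1}{\sqrt{p}} \sum_{1\leq n\leq (p-1)t} e\Bigl( \frac{an + b\bar{n}}{p} \Bigr),
\end{align*}
as claimed, since by the orthogonality of characters we have
\[
\frac{1}{p} \sum_{h\in\mathbf{F}_p} e\Bigl( \frac{h(n - m)}{p} \Bigr) = \delta(n, m)
\]
for any $n, m \in \mathbf{F}_p$, where δ(n, m) = 1 if n = m modulo p, and otherwise δ(n, m) = 0. □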
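The identity (6.2) is exact for each prime, so it can be tested numerically on small cases. The sketch below does this for p = 11 and p = 13; all function names are ours, and the Kloosterman sum is evaluated by its defining formula also when a − h is 0 modulo p, which is what the right-hand side of (6.2) requires.

```python
import cmath
import math


def e(z):
    """The 1-periodic function e(z) = exp(2 i pi z)."""
    return cmath.exp(2j * math.pi * z)


def kloosterman(a, b, p):
    """Normalized Kloosterman sum, from the defining formula (a may be 0 modulo p)."""
    return sum(e(((a * x + b * pow(x, p - 2, p)) % p) / p) for x in range(1, p)) / math.sqrt(p)


def partial_sum(a, b, p, t):
    """Left-hand side of (6.2)."""
    upper = math.floor((p - 1) * t)
    return sum(e(((a * n + b * pow(n, p - 2, p)) % p) / p) for n in range(1, upper + 1)) / math.sqrt(p)


def alpha(h, t, p):
    """The coefficient alpha_p(h, t)."""
    upper = math.floor((p - 1) * t)
    return sum(e(((n * h) % p) / p) for n in range(1, upper + 1)) / p


def fourier_side(a, b, p, t):
    """Right-hand side of (6.2): sum over |h| < p/2."""
    half = (p - 1) // 2
    return sum(alpha(h, t, p) * kloosterman((a - h) % p, b, p) for h in range(-half, half + 1))


for p in (11, 13):
    for t in (0.25, 0.6, 1.0):
        print(p, t, abs(partial_sum(2, 3, p, t) - fourier_side(2, 3, p, t)))
```

The printed differences are of the size of rounding errors, as the proof predicts.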


If we observe that $\alpha_p(h, t)$ is essentially a Riemann sum for the integral
\[
\int_0^t e(hu)\,du = \frac{e(ht) - 1}{2i\pi h}
\]
for all h ≠ 0, and that $\alpha_p(0, t) \to t$ as p → +∞, we see that the right-hand side of (6.2)
looks like a Fourier series of the same type as K(t), with coefficients given by shifted
Kloosterman sums Kl(a − h, b; p) instead of $\mathrm{ST}_h$. Now the crucial arithmetic information
is contained in the following very deep theorem:
Theorem 6.2.2 (Katz; Deligne). Fix an integer b ≠ 0. For p prime not dividing b,
consider the random variable
\[
S_p : a \mapsto (\mathrm{Kl}(a - h, b; p))_{h\in\mathbf{Z}}
\]
on $\mathbf{F}_p^\times$ with uniform probability measure, taking values in the compact topological space
\[
\hat{T} = \prod_{h\in\mathbf{Z}} [-2, 2].
\]
Then $S_p$ converges in law to the product probability measure
\[
\bigotimes_{h\in\mathbf{Z}} \mu_{ST}.
\]
In other words, the sequence of random variables $a \mapsto \mathrm{Kl}(a - h, b; p)$ converges in law to
a sequence $(\mathrm{ST}_h)_{h\in\mathbf{Z}}$ of independent Sato–Tate distributed random variables.
Because of this theorem, the formula (6.2) suggests that $K_p(t)$ converges in law to the
random series
\[
t\,\mathrm{ST}_0 + \sum_{\substack{h\in\mathbf{Z}\\ h\neq 0}} \frac{e(ht) - 1}{2i\pi h}\,\mathrm{ST}_h,
\]
which is exactly K(t). We now proceed to the implementation of the three steps above,
which will use this deep arithmetic ingredient.
Remark 6.2.3. There is a subtlety in the argument: although Theorem 6.2.2 holds
for any fixed b, when averaging only over a, we cannot at the current time prove the
analogue of Theorem 6.1.1 for fixed b, because the proof of tightness in the last step uses
crucially both averages.

Step 1. (Existence and properties of the random Fourier series)
We can write the series K(t) as
\[
K(t) = t\,\mathrm{ST}_0 + \sum_{h\geq 1} \Bigl( \frac{e(ht) - 1}{2i\pi h}\,\mathrm{ST}_h - \frac{e(-ht) - 1}{2i\pi h}\,\mathrm{ST}_{-h} \Bigr).
\]
The summands here, namely
\[
X_h = \frac{e(ht) - 1}{2i\pi h}\,\mathrm{ST}_h - \frac{e(-ht) - 1}{2i\pi h}\,\mathrm{ST}_{-h}
\]
for h ≥ 1, are independent and have expectation 0 since $E(\mathrm{ST}_h) = 0$ (see (B.8)). Furthermore,
since $\mathrm{ST}_h$ is independent of $\mathrm{ST}_{-h}$, and they have variance 1, we have
\[
\sum_{h\geq 1} V(X_h) = \sum_{h\geq 1} \Bigl( \Bigl| \frac{e(ht) - 1}{2i\pi h} \Bigr|^2 + \Bigl| \frac{e(-ht) - 1}{2i\pi h} \Bigr|^2 \Bigr) \leq \sum_{h\geq 1} \frac{1}{h^2} < +\infty
\]
for any t ∈ [0, 1]. From Kolmogorov’s criterion for almost sure convergence of random
series with finite variance (Theorem B.10.1), it follows that for any t ∈ [0, 1], the series
K(t) converges almost surely and in $L^2$ to a complex-valued random variable.
To prove convergence in C([0, 1]), we will use convergence of finite distributions combined
with Kolmogorov’s Tightness Criterion. Consider the partial sums
\[
K_H(t) = t\,\mathrm{ST}_0 + \sum_{1\leq|h|\leq H} \frac{e(ht) - 1}{2i\pi h}\,\mathrm{ST}_h
\]
for H > 1. These are C([0, 1])-valued random variables. The convergence of $K_H(t)$ to K(t)
in $L^1$, for any t ∈ [0, 1], implies (see Lemma B.11.3) that the sequence $(K_H)_{H>1}$ converges
to K in the sense of finite distributions. Therefore, by Proposition B.11.10, the sequence
converges in the sense of C([0, 1])-valued random variables if there exist constants C > 0,
α > 0 and δ > 0 such that for any H > 1, and real numbers 0 ≤ s < t ≤ 1, we have
\[
E(|K_H(t) - K_H(s)|^\alpha) \leq C|t - s|^{1+\delta}. \tag{6.3}
\]
We will take α = 4. We have
\[
K_H(t) - K_H(s) = (t - s)\,\mathrm{ST}_0 + \sum_{1\leq|h|\leq H} \frac{e(ht) - e(hs)}{2i\pi h}\,\mathrm{ST}_h.
\]
This is a sum of independent, centered and bounded random variables, so that by Proposition
B.8.2 (1) and (2), it is $\sigma_H^2$-subgaussian with
\[
\sigma_H^2 = |t - s|^2 + \sum_{1\leq|h|\leq H} \Bigl| \frac{e(ht) - e(hs)}{2i\pi h} \Bigr|^2 \leq |t - s|^2 + \sum_{h\neq 0} \Bigl| \frac{e(ht) - e(hs)}{2i\pi h} \Bigr|^2.
\]
By Parseval’s formula for ordinary Fourier series, we have
\[
|t - s|^2 + \sum_{h\neq 0} \Bigl| \frac{e(ht) - e(hs)}{2i\pi h} \Bigr|^2 = \int_0^1 |\varphi_{s,t}(x)|^2\,dx,
\]
where $\varphi_{s,t}$ is the characteristic function of the interval [s, t]. Therefore $\sigma_H^2 \leq |t - s|$. By
the properties of subgaussian random variables (see Proposition B.8.3 in Section B.8), we
deduce that there exists C > 0 such that
\[
E(|K_H(t) - K_H(s)|^4) \leq C\sigma_H^4 \leq C|t - s|^2,
\]
which establishes (6.3).
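The Parseval step can be illustrated numerically: the truncated quantity increases with H and approaches |t − s| from below. This is only a sketch, with a function name `variance_proxy` of our own choosing; it uses the elementary identity $|e(ht) - e(hs)| = 2|\sin(\pi h(t - s))|$ and the fact that the terms for h and −h coincide.

```python
import math


def variance_proxy(s, t, H):
    """|t - s|^2 + sum over 1 <= |h| <= H of |(e(ht) - e(hs)) / (2 i pi h)|^2."""
    delta = t - s
    total = delta * delta
    for h in range(1, H + 1):
        # the terms for h and -h are equal, hence the factor 2
        total += 2.0 * (math.sin(math.pi * h * delta) / (math.pi * h)) ** 2
    return total


s, t = 0.2, 0.7
for H in (1, 10, 100, 10_000):
    print(H, variance_proxy(s, t, H), t - s)
```

Each truncated value is at most t − s, which is the inequality $\sigma_H^2 \leq |t - s|$ used in the proof.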
Step 2. (Computation of Fourier coefficients)
As in Section B.11, we will denote by $C_0([0, 1])$ the subspace of functions f ∈ C([0, 1])
such that f(0) = 0. For $f \in C_0([0, 1])$, the sequence $\mathrm{FT}(f) = (\tilde f(h))_{h\in\mathbf{Z}}$ is defined by
$\tilde f(0) = f(1)$ and
\[
\tilde f(h) = \int_0^1 (f(t) - t f(1))\, e(-ht)\,dt
\]
for h ≠ 0. The map FT is a continuous linear map from $C_0([0, 1])$ to $C_0(\mathbf{Z})$, the Banach
space of functions $\mathbf{Z} \to \mathbf{C}$ that tend to zero at infinity.
Lemma 6.2.4. The “Fourier coefficients” $\mathrm{FT}(K_p)$ converge in law to FT(K), in the
sense of convergence of finite distributions.
We begin by computing the Fourier coefficients of a polygonal path. Let $z_0$ and $z_1$ be
complex numbers, and $t_0 < t_1$ real numbers. We define $\Delta = t_1 - t_0$ and f ∈ C([0, 1]) by
\[
f(t) = \begin{cases} \dfrac{1}{\Delta}\,(z_1(t - t_0) + z_0(t_1 - t)) & \text{if } t_0 \leq t \leq t_1, \\ 0 & \text{otherwise,} \end{cases}
\]
which parameterizes the segment from $z_0$ to $z_1$ over the interval $[t_0, t_1]$.
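Let h ≠ 0 be an integer. By direct computation, we find
\begin{align}
\int_0^1 f(t)\, e(-ht)\,dt &= -\frac{1}{2i\pi h}\,(z_1 e(-ht_1) - z_0 e(-ht_0)) + \frac{1}{2i\pi h}\,(z_1 - z_0)\, e(-ht_0)\, \Bigl( \frac{1}{\Delta} \int_0^\Delta e(-hu)\,du \Bigr) \tag{6.4}\\
&= -\frac{1}{2i\pi h}\,(z_1 e(-ht_1) - z_0 e(-ht_0)) + \frac{1}{2i\pi h}\, \frac{\sin(\pi h\Delta)}{\pi h\Delta}\,(z_1 - z_0)\, e\Bigl( -h\Bigl( t_0 + \frac{\Delta}{2} \Bigr) \Bigr). \notag
\end{align}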
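Formula (6.4) can be compared with a direct numerical integration over a single segment. The sketch below uses a midpoint rule on $[t_0, t_1]$ (where f is not zero); the function names and the sample values of $z_0$, $z_1$, $t_0$, $t_1$ are ours and purely illustrative.

```python
import cmath
import math


def e(x):
    """The 1-periodic function e(x) = exp(2 i pi x)."""
    return cmath.exp(2j * math.pi * x)


def segment_coefficient_numeric(z0, z1, t0, t1, h, n=20_000):
    """Midpoint-rule value of int f(t) e(-ht) dt for the segment from z0 to z1 over [t0, t1]."""
    delta = t1 - t0
    total = 0j
    for k in range(n):
        t = t0 + (k + 0.5) * delta / n
        f = (z1 * (t - t0) + z0 * (t1 - t)) / delta
        total += f * e(-h * t)
    return total * delta / n


def segment_coefficient_formula(z0, z1, t0, t1, h):
    """Closed form given by the second line of (6.4)."""
    delta = t1 - t0
    boundary = -(z1 * e(-h * t1) - z0 * e(-h * t0)) / (2j * math.pi * h)
    sinc = math.sin(math.pi * h * delta) / (math.pi * h * delta)
    return boundary + (z1 - z0) * sinc * e(-h * (t0 + delta / 2)) / (2j * math.pi * h)


z0, z1, t0, t1 = 0.3 + 0.1j, -0.2 + 0.5j, 0.25, 0.5
for h in (-2, 1, 3):
    numeric = segment_coefficient_numeric(z0, z1, t0, t1, h)
    print(h, abs(numeric - segment_coefficient_formula(z0, z1, t0, t1, h)))
```

The differences are of the order of the quadrature error, far below the size of the coefficients themselves.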
Consider now an integer n > 1 and a family $(z_0, \ldots, z_n)$ of complex numbers. For
0 ≤ j ≤ n − 1, let $f_j$ be the function as above relative to the points $(z_j, z_{j+1})$ and the
interval [j/n, (j + 1)/n], and define
\[
f = \sum_{j=0}^{n-1} f_j,
\]
so that f parameterizes the polygonal path joining $z_0$ to $z_1$ to . . . to $z_n$, each over time
intervals of equal length 1/n.
For h ≠ 0, we obtain by summing (6.4), using a telescoping sum and the relations
$z_0 = f(0)$, $z_n = f(1)$, the formula
\[
\int_0^1 f(t)\, e(-ht)\,dt = -\frac{1}{2i\pi h}\,(f(1) - f(0)) + \frac{1}{2i\pi h}\, \frac{\sin(\pi h/n)}{\pi h/n} \sum_{j=0}^{n-1} (z_{j+1} - z_j)\, e\Bigl( -\frac{h(j + \frac12)}{n} \Bigr). \tag{6.5}
\]
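We specialize this general formula to Kloosterman paths. Let p be a prime, $(a, b) \in \mathbf{F}_p^\times \times \mathbf{F}_p^\times$,
and apply the formula above to n = p − 1 and the points
\[
z_j = \frac{1}{\sqrt{p}} \sum_{1\leq x\leq j} e\Bigl( \frac{ax + b\bar{x}}{p} \Bigr), \qquad 0 \leq j \leq p - 1.
\]
For h ≠ 0, the h-th Fourier coefficient of $K_p - tK_p(1)$ is the random variable on $\Omega_p$
that maps (a, b) to
\[
\frac{1}{2i\pi h}\, \frac{\sin(\pi h/(p - 1))}{\pi h/(p - 1)}\, e\Bigl( -\frac{h}{2(p - 1)} \Bigr)\, \frac{1}{\sqrt{p}} \sum_{x=1}^{p-1} e\Bigl( \frac{ax + b\bar{x}}{p} \Bigr) e\Bigl( -\frac{hx}{p - 1} \Bigr).
\]
Note that for fixed h, we have
\[
e\Bigl( -\frac{hx}{p - 1} \Bigr) = e\Bigl( -\frac{hx}{p} \Bigr) e\Bigl( -\frac{hx}{p(p - 1)} \Bigr) = e\Bigl( -\frac{hx}{p} \Bigr) (1 + O(p^{-1}))
\]
for all p and all x such that 1 ≤ x ≤ p − 1, hence
\[
\frac{1}{\sqrt{p}} \sum_{x=1}^{p-1} e\Bigl( \frac{ax + b\bar{x}}{p} \Bigr) e\Bigl( -\frac{hx}{p - 1} \Bigr) = \mathrm{Kl}(a - h, b; p) + O\Bigl( \frac{1}{\sqrt{p}} \Bigr),
\]
where the implied constant depends on h. Let
\[
\beta_p(h) = \frac{\sin(\pi h/(p - 1))}{\pi h/(p - 1)}\, e\Bigl( -\frac{h}{2(p - 1)} \Bigr).
\]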
Note that $|\beta_p(h)| \leq 1$, so we can express the h-th Fourier coefficient as
\[
\frac{1}{2i\pi h}\, \mathrm{Kl}(a - h, b; p)\,\beta_p(h) + O(p^{-1/2}),
\]
where the implied constant depends on h.
Note further that the 0-th component of FT(Kp ) is Kl(a, b; p). Since βp (h) → 1 as p →
+∞ for each fixed h, we deduce from Katz’s equidistribution theorem (Theorem 6.2.2)
and from Lemma B.4.3 (applied to the vectors of Fourier coefficients at h1 , . . . , hm
for arbitrary m > 1) that FT(Kp ) converges in law to FT(K) in the sense of finite
distributions.
Step 3. (Tightness of the Kloosterman paths)
We now come to the second main step of the proof of Theorem 6.1.1: the fact that
the sequence $(K_p)_p$ is tight. According to Kolmogorov’s Criterion (Proposition B.11.10),
it is enough to find constants $C > 0$, $\alpha > 0$ and $\delta > 0$ such that, for all primes $p \geq 3$ and
all $t$ and $s$ with $0 \leq s < t \leq 1$, we have
\[
E_p(|K_p(t) - K_p(s)|^{\alpha}) \leq C|t - s|^{1+\delta}. \tag{6.6}
\]
We denote by $\gamma \geq 0$ the real number such that
\[
|t - s| = (p - 1)^{-\gamma}.
\]
So $\gamma$ is larger when $t$ and $s$ are closer. The proof of (6.6) involves two different ranges.
Assume first that $\gamma > 1$ (that is, that $|t - s| < 1/(p - 1)$). In that range, we use the
polygonal nature of the paths $x \mapsto K_p(x)$, which implies that
\[
|K_p(t) - K_p(s)| \leq \sqrt{p - 1}\,|t - s| \leq \sqrt{|t - s|}
\]
(since the “velocity” of the path is $(p - 1)/\sqrt{p} \leq \sqrt{p - 1}$). Consequently, for any $\alpha > 0$,
we have
\[
E_p(|K_p(t) - K_p(s)|^{\alpha}) \leq |t - s|^{\alpha/2}. \tag{6.7}
\]
In the remaining range $\gamma \leq 1$, we will use the discontinuous partial sums $\tilde{K}_p(t)$ instead
of $K_p(t)$. To check that this is legitimate, note that
\[
|\tilde{K}_p(t) - K_p(t)| \leq \frac{1}{\sqrt{p}}
\]
for all primes $p \geq 3$ and all $t$. Hence, using Hölder’s inequality, we derive for $\alpha \geq 1$ the
relation
\[
\begin{aligned}
E_p(|K_p(t) - K_p(s)|^{\alpha}) &= E_p(|\tilde{K}_p(t) - \tilde{K}_p(s)|^{\alpha}) + O(p^{-\alpha/2}) \\
&= E_p(|\tilde{K}_p(t) - \tilde{K}_p(s)|^{\alpha}) + O(|t - s|^{\alpha/2})
\end{aligned}
\tag{6.8}
\]
where the implied constant depends only on $\alpha$.
We take $\alpha = 4$. The following computation of the fourth moment is an idea that goes
back to Kloosterman’s very first non-trivial estimate for individual Kloosterman sums.
We have
\[
\tilde{K}_p(t) - \tilde{K}_p(s) = \frac{1}{\sqrt{p}} \sum_{n \in I} e\Bigl(\frac{an + b\bar{n}}{p}\Bigr),
\]
where $I$ is the discrete interval
\[
(p - 1)s < n \leq (p - 1)t
\]
of summation. The length of $I$ is
\[
\lfloor (p - 1)t \rfloor - \lceil (p - 1)s \rceil \leq 2(p - 1)|t - s|
\]
since $(p - 1)|t - s| \geq 1$.
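By expanding the fourth power, we get
\[
\begin{aligned}
E_p(|\tilde{K}_p(t) - \tilde{K}_p(s)|^4) &= \frac{1}{(p - 1)^2} \sum_{(a,b) \in \mathbf{F}_p^\times \times \mathbf{F}_p^\times} \Bigl| \frac{1}{\sqrt{p}} \sum_{n \in I} e\Bigl(\frac{an + b\bar{n}}{p}\Bigr) \Bigr|^4 \\
&= \frac{1}{p^2(p - 1)^2} \sum_{a,b}\ \sum_{n_1,\ldots,n_4 \in I} e\Bigl(\frac{a(n_1 + n_2 - n_3 - n_4)}{p}\Bigr) e\Bigl(\frac{b(\bar{n}_1 + \bar{n}_2 - \bar{n}_3 - \bar{n}_4)}{p}\Bigr).
\end{aligned}
\]
After exchanging the order of the sums, which “separates” the two variables $a$ and $b$, we
get
\[
\frac{1}{p^2(p - 1)^2} \sum_{n_1,\ldots,n_4 \in I} \Bigl( \sum_{a \in \mathbf{F}_p^\times} e\Bigl(\frac{a(n_1 + n_2 - n_3 - n_4)}{p}\Bigr) \Bigr) \Bigl( \sum_{b \in \mathbf{F}_p^\times} e\Bigl(\frac{b(\bar{n}_1 + \bar{n}_2 - \bar{n}_3 - \bar{n}_4)}{p}\Bigr) \Bigr).
\]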
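The orthogonality relations for additive characters (namely the relation
\[
\frac{1}{p} \sum_{a \in \mathbf{F}_p^\times} e\Bigl(\frac{ah}{p}\Bigr) = \delta(h, 0) - \frac{1}{p}
\]
for any $h \in \mathbf{F}_p$) imply that
\[
E_p(|\tilde{K}_p(t) - \tilde{K}_p(s)|^4) = \frac{1}{(p - 1)^2} \sum_{\substack{n_1,\ldots,n_4 \in I \\ n_1 + n_2 = n_3 + n_4 \\ \bar{n}_1 + \bar{n}_2 = \bar{n}_3 + \bar{n}_4}} 1 + O(|I|^3 (p - 1)^{-3}).
\tag{6.9}
\]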

Fix first $n_1$ and $n_2$ in $I$ with $n_1 + n_2 \neq 0$. Then if $(n_3, n_4)$ satisfy
\[
n_1 + n_2 = n_3 + n_4, \qquad \bar{n}_1 + \bar{n}_2 = \bar{n}_3 + \bar{n}_4,
\]
the value of $n_3 + n_4$ is fixed, and $\bar{n}_1 + \bar{n}_2$ is non-zero, so
\[
n_3 n_4 = \frac{n_3 + n_4}{\bar{n}_1 + \bar{n}_2}
\]
(in $\mathbf{F}_p^\times$) is also fixed. Hence there are at most two pairs $(n_3, n_4)$ that satisfy the equations
for these given $(n_1, n_2)$. This means that the contribution of these $n_1$, $n_2$ to (6.9) is
$\leq 2|I|^2 (p - 1)^{-2}$. Similarly, if $n_1 + n_2 = 0$, the equations imply that $n_3 + n_4 = 0$, and
hence the solutions are determined uniquely by $(n_1, n_3)$. Hence the contribution is then
$\leq |I|^2 (p - 1)^{-2}$, and we get
\[
E_p(|\tilde{K}_p(t) - \tilde{K}_p(s)|^4) \ll |I|^2 (p - 1)^{-2} + |I|^3 (p - 1)^{-3} \ll |t - s|^2,
\]
where the implied constants are absolute. Using (6.8), this gives
\[
E_p(|K_p(t) - K_p(s)|^4) \ll |t - s|^2 \tag{6.10}
\]
with an absolute implied constant. Combined with (6.7) with $\alpha = 4$ in the former range,
this completes the proof of tightness.
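The counting step behind (6.9) can be checked by brute force for small primes. The sketch below (our own helper, not taken from the text) counts the quadruples $(n_1, \ldots, n_4) \in I^4$ satisfying both equations in $\mathbf{F}_p$ by grouping the pairs $(n_1, n_2)$ according to the values of $n_1 + n_2$ and $\bar{n}_1 + \bar{n}_2$. The argument above predicts a count of at most $2|I|^2 + |I|^2 = 3|I|^2$.

```python
from collections import Counter


def count_quadruples(p, interval):
    """Count (n1, n2, n3, n4) in interval^4 such that, modulo the prime p,
    n1 + n2 = n3 + n4 and 1/n1 + 1/n2 = 1/n3 + 1/n4.

    `interval` must consist of integers that are non-zero modulo p.
    """
    inverse = {n: pow(n, -1, p) for n in interval}
    # Two pairs solve the system together exactly when they share this key.
    pair_keys = Counter(
        ((n1 + n2) % p, (inverse[n1] + inverse[n2]) % p)
        for n1 in interval
        for n2 in interval
    )
    return sum(c * c for c in pair_keys.values())


if __name__ == "__main__":
    p = 101
    for interval in (range(10, 41), range(1, p)):
        size = len(interval)
        print(size, count_quadruples(p, interval), 3 * size * size)
```

Grouping by key costs $O(|I|^2)$ operations instead of the $O(|I|^4)$ of a naive quadruple loop.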
Final Step. (Proof of Theorem 6.1.1) In view of Proposition B.11.8, the theorem
follows directly from the results of Steps 2 and 3.

Remark 6.2.5. The proof of tightness uses crucially that we average over both $a$
and $b$ to reduce the problem to counting the number of solutions of certain equations
over $\mathbf{F}_p$ (see (6.9)), which turn out to be accessible. Since $\mathrm{Kl}(a, b; p) = \mathrm{Kl}(ab, 1; p)$ for
all $a$ and $b$ in $\mathbf{F}_p^\times$, it seems natural to try to prove an analogue of Theorem 6.1.1 when
averaging only over $a$, with $b = 1$ fixed. The convergence of finite distributions extends
to that setting (since Theorem 6.2.2 holds for any fixed $b$), but a proof of tightness is not
currently known for fixed $b$. Using moment estimates (derived from Deligne’s Riemann
Hypothesis) and the trivial bound
\[
|\tilde{K}_p(t) - \tilde{K}_p(s)| \leq |I| p^{-1/2},
\]
one can check that it is enough to prove a suitable estimate for the average over $a$ in the
restricted range where
\[
\frac{1}{2} - \eta \leq \gamma \leq \frac{1}{2} + \eta
\]
for some fixed but arbitrarily small value of $\eta > 0$ (see [79, §3]). The next exercise
illustrates this point.
Exercise 6.2.6. Assume $p$ is odd. Let $\Omega'_p = \mathbf{F}_p^\times \times (\mathbf{F}_p^\times)^2$, where $(\mathbf{F}_p^\times)^2$ is the set of
non-zero squares in $\mathbf{F}_p$. We denote by $K'_p(t)$ the random variable $K_p(t)$ restricted to $\Omega'_p$,
with the uniform probability measure, for which $P'_p(\cdot)$ and $E'_p(\cdot)$ denote probability and
expectation.
(1) Prove that $\mathrm{FT}(K'_p)$ converges to $\mathrm{FT}(K)$ in the sense of finite distributions.
(2) For $n \in \mathbf{F}_p$, prove that
\[
\sum_{b \in (\mathbf{F}_p^\times)^2} e\Bigl(\frac{bn}{p}\Bigr) = \frac{p - 1}{2}\,\delta(n, 0) + O(\sqrt{p})
\]
where the implied constant is absolute. [Hint: Show that if $n \in \mathbf{F}_p^\times$, we have
\[
\Bigl| \sum_{b \in \mathbf{F}_p} e\Bigl(\frac{nb^2}{p}\Bigr) \Bigr| = \sqrt{p},
\]
where the left-hand sum is known as a quadratic Gauss sum, see Example C.6.2 (1) and
Exercise C.6.5.]
(3) Deduce that if $|t - s| \geq 1/p$, then
\[
E'_p(|K'_p(t) - K'_p(s)|^4) \ll p|t - s|^3 + |t - s|^2
\]
where the implied constant is absolute.
(4) Using notation as in the proof of tightness for $K_p$, prove that if $\eta > 0$, $\alpha \geq 1$ and
\[
\frac{1}{2} + \eta \leq \gamma \leq 1,
\]
then
\[
E'_p(|K'_p(t) - K'_p(s)|^{\alpha}) \ll |t - s|^{\alpha\eta} + |t - s|^{\alpha/2},
\]
where the implied constant depends only on $\alpha$.
(5) Prove that if $\eta > 0$ and
\[
0 \leq \gamma \leq \frac{1}{2} - \eta,
\]
then there exists $\delta > 0$ such that
\[
E'_p(|K'_p(t) - K'_p(s)|^4) \ll |t - s|^{1+\delta},
\]
where the implied constant depends only on $\eta$.
(6) Conclude that $(K'_p)$ converges in law to $K$ in $C([0, 1])$. [Hint: It may be convenient
to use the variant of Kolmogorov’s tightness Criterion in Proposition B.11.11.]
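For readers who want to experiment before attempting part (2), the following Python sketch evaluates the quadratic Gauss sum over all of $\mathbf{F}_p$ and the sum over the non-zero squares. It relies on the elementary observation (an assumption of this sketch, to be justified in the exercise) that each non-zero square has exactly two square roots, so that the Gauss sum equals $1$ plus twice the sum over non-zero squares. The function names are ours.

```python
import cmath
import math


def quadratic_gauss_sum(n, p):
    """Sum of e(n*b^2/p) over all b in F_p."""
    return sum(cmath.exp(2j * math.pi * ((n * b * b) % p) / p) for b in range(p))


def sum_over_nonzero_squares(n, p):
    """Sum of e(n*s/p) over the non-zero squares s in F_p."""
    squares = {(b * b) % p for b in range(1, p)}
    return sum(cmath.exp(2j * math.pi * ((n * s) % p) / p) for s in squares)


if __name__ == "__main__":
    p = 101
    for n in range(0, 4):
        print(n, abs(quadratic_gauss_sum(n, p)), abs(sum_over_nonzero_squares(n, p)))
```

For $n \neq 0$ the first column of moduli should be close to $\sqrt{p}$, and the second should stay of size about $\sqrt{p}/2$.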

6.3. Applications
We can use Theorem 6.1.1 to gain information on partial sums of Kloosterman sums.
We will give two examples, one concerning large values of the partial sums, and the other
dealing with the support of the Kloosterman paths, following [12].
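Theorem 6.3.1. For $p$ prime and $A > 0$, let $M_p(A)$ and $N_p(A)$ be the events
\[
\begin{aligned}
M_p(A) &= \Bigl\{ (a, b) \in \mathbf{F}_p^\times \times \mathbf{F}_p^\times \;\Big|\; \max_{1 \leq j \leq p-1} \frac{1}{\sqrt{p}} \Bigl| \sum_{1 \leq n \leq j} e\Bigl(\frac{an + b\bar{n}}{p}\Bigr) \Bigr| \geq A \Bigr\}, \\
N_p(A) &= \Bigl\{ (a, b) \in \mathbf{F}_p^\times \times \mathbf{F}_p^\times \;\Big|\; \max_{1 \leq j \leq p-1} \frac{1}{\sqrt{p}} \Bigl| \sum_{1 \leq n \leq j} e\Bigl(\frac{an + b\bar{n}}{p}\Bigr) \Bigr| > A \Bigr\}.
\end{aligned}
\]
There exists a positive constant $c > 0$ such that, for any $A > 0$, we have
\[
c^{-1} \exp(-\exp(cA)) \leq \liminf_{p \to +\infty} P_p(N_p(A)) \leq \limsup_{p \to +\infty} P_p(M_p(A)) \leq c \exp(-\exp(c^{-1}A)).
\]
In particular, partial sums of normalized Kloosterman sums are unbounded (whereas
the full normalized Kloosterman sums are always of modulus at most 2), but large values
of partial sums are extremely rare.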
Proof. The functions $t \mapsto K_p(a, b)(t)$ describe polygonal paths in the complex plane.
Since the maximum modulus of a point on such a path is achieved at one of the vertices,
it follows that
\[
\max_{1 \leq j \leq p-1} \frac{1}{\sqrt{p}} \Bigl| \sum_{1 \leq n \leq j} e\Bigl(\frac{an + b\bar{n}}{p}\Bigr) \Bigr| = \|K_p(a, b)\|_\infty,
\]
so that the event $M_p(A)$ is the same as $\{\|K_p\|_\infty \geq A\}$, and $N_p(A)$ is the same as $\{\|K_p\|_\infty > A\}$.
By Theorem 6.1.1 and composition with the norm map (Proposition B.3.2), the real-valued
random variables $\|K_p\|_\infty$ converge in law to the random variable $\|K\|_\infty$, the norm
of the random Fourier series $K$. By elementary properties of convergence in law, we have
therefore
\[
P(\|K\|_\infty > A) \leq \liminf_{p \to +\infty} P_p(N_p(A)) \leq \limsup_{p \to +\infty} P_p(M_p(A)) \leq P(\|K\|_\infty \geq A).
\]
So the problem is reduced to questions about the limiting random Fourier series.
We first consider the upper bound. Here it suffices to prove the existence of a constant
$c > 0$ such that
\[
P(\|\mathrm{Im}(K)\|_\infty > A) \leq c \exp(-\exp(c^{-1}A)), \qquad
P(\|\mathrm{Re}(K)\|_\infty > A) \leq c \exp(-\exp(c^{-1}A)).
\]
We will do this for the real part, since the imaginary part is very similar and can be left
as an exercise. The random variable $R = \mathrm{Re}(K)$ takes values in the separable real Banach
space $C_{\mathbf{R}}([0, 1])$ of real-valued continuous functions on $[0, 1]$. It is almost surely the sum
of the random Fourier series
\[
R = \sum_{h \geq 0} \varphi_h Y_h,
\]
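where $\varphi_h \in C_{\mathbf{R}}([0, 1])$ and the random variables $Y_h$ are defined by
\[
\varphi_0(t) = 2t, \quad Y_0 = \tfrac{1}{2}\mathrm{ST}_0, \qquad
\varphi_h(t) = \frac{2\sin(2\pi h t)}{\pi h}, \quad Y_h = \tfrac{1}{4}(\mathrm{ST}_h + \mathrm{ST}_{-h}) \ \text{ for } h \geq 1,
\]
so that $\varphi_h Y_h$ is the sum of the real parts of the terms of $K$ indexed by $h$ and $-h$.
We note that the random variables $(Y_h)$ are independent and that $|Y_h| \leq 1$ (almost
surely) for all $h$. We can then apply the bound of Proposition B.11.13 (1) to conclude.
We now prove the lower bound. It suffices to prove that there exists $c > 0$ such that
\[
P(|\mathrm{Im}(K(1/2))| > A) \geq c^{-1} \exp(-\exp(cA)), \tag{6.11}
\]
since this implies that
\[
P(\|K\|_\infty > A) \geq c^{-1} \exp(-\exp(cA)).
\]
We have
\[
\mathrm{Im}(K(1/2)) = -\frac{1}{2\pi} \sum_{h \neq 0} \frac{\cos(\pi h) - 1}{h}\,\mathrm{ST}_h = \frac{1}{\pi} \sum_{h \geq 1} \frac{1}{h}\,\mathrm{ST}_h,
\]
which is a series which converges almost surely in $\mathbf{R}$ with independent terms, and
where $\frac{1}{\pi}\mathrm{ST}_h$ is symmetric and $\leq 1$ in absolute value for all $h$. Thus the bound
\[
P(|\mathrm{Im}(K(1/2))| > A) \geq c^{-1} \exp(-\exp(cA))
\]
for some $c > 0$ follows immediately from Proposition B.11.13 (2). □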
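The limiting object can be explored by simulation. The sketch below assumes the form of $K$ given by Theorem 6.1.1 (not restated in this section), namely $K(t) = t\,\mathrm{ST}_0 + \sum_{h \neq 0} \mathrm{ST}_h \frac{e(ht) - 1}{2i\pi h}$ with independent Sato–Tate variables $\mathrm{ST}_h$, and truncates the series at a cutoff of our choosing. A Sato–Tate variable (density $\frac{1}{2\pi}\sqrt{4 - x^2}$ on $[-2, 2]$) is sampled as the first coordinate of a uniform point in the disc of radius 2; all names are illustrative.

```python
import cmath
import math
import random


def sample_sato_tate(rng):
    """Sample from the Sato-Tate law on [-2, 2] as the x-coordinate
    of a uniformly distributed point in the disc of radius 2."""
    radius = 2.0 * math.sqrt(rng.random())
    angle = 2.0 * math.pi * rng.random()
    return radius * math.cos(angle)


def truncated_series_path(rng, cutoff=200, grid=400):
    """Sample t -> t*ST_0 + sum_{0 < |h| <= cutoff} ST_h (e(ht) - 1)/(2*i*pi*h)
    at the points t = k/grid, 0 <= k <= grid. Returns (ST_0, list of values)."""
    st = {h: sample_sato_tate(rng) for h in range(-cutoff, cutoff + 1)}
    path = []
    for k in range(grid + 1):
        t = k / grid
        value = complex(t * st[0])
        for h in range(1, cutoff + 1):
            for s in (h, -h):  # symmetric partial sums
                value += st[s] * (cmath.exp(2j * math.pi * s * t) - 1) / (2j * math.pi * s)
        path.append(value)
    return st[0], path


rng = random.Random(2021)
st0, path = truncated_series_path(rng)
print("sup norm of the truncated sample:", max(abs(z) for z in path))
```

Repeating the experiment with many seeds gives an empirical picture of how rarely the sup norm exceeds a given level $A$; the truncation means this is only an approximation to $\|K\|_\infty$.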
Remark 6.3.2. In the lower bound, the point 1/2 could be replaced by any $t \in\, ]0, 1[$ for
the imaginary part, and one could also use the real part and any $t$ such that $t \notin \{0, 1/2, 1\}$;
the symmetry of the Kloosterman paths with respect to the line $x = \frac{1}{2}\mathrm{Kl}(a, b; p)$ shows
that the real part of $K_p(a, b)(1/2)$ is $\frac{1}{2}\mathrm{Kl}(a, b; p)$, and this is a real number in $[-1, 1]$.
For our second application, we compute the support of the random Fourier series $K$.
Theorem 6.3.3. The support of the law of $K$ is the set of all $f \in C_0([0, 1])$ such that
(1) We have $f(1) \in [-2, 2]$,
(2) For all $h \neq 0$, we have $\tilde{f}(h) \in i\mathbf{R}$ and
\[
|\tilde{f}(h)| \leq \frac{1}{\pi|h|}.
\]
Proof. Denote by $S$ the set described in the statement. Then $S$ is closed in $C([0, 1])$,
since it is the intersection of closed sets. By Theorem 6.1.1, a sample function $f \in C([0, 1])$
of the random process $K$ is almost surely given by a series
\[
f(t) = \alpha_0 t + \sum_{h \neq 0} \alpha_h \frac{e(ht) - 1}{2\pi i h}
\]
that is uniformly convergent in the sense of symmetric partial sums, for some real numbers
$\alpha_h$ such that $|\alpha_h| \leq 2$. We have $\tilde{f}(0) = f(1) \in [-2, 2]$, and the uniform convergence
implies that for $h \neq 0$, we have
\[
\tilde{f}(h) = \frac{\alpha_h}{2i\pi h},
\]
so that $f$ certainly belongs to $S$. Consequently, the support of $K$ is contained in $S$.
Figure 6.4. The partial sums of Kl(88, 1; 1021).

We now prove the converse inclusion. By Lemma B.3.3, the support of $K$ contains
the set of continuous functions with uniformly convergent (symmetric) expansions
\[
t\alpha_0 + \sum_{h \neq 0} \alpha_h \frac{e(ht) - 1}{2\pi i h}
\]
where $\alpha_h \in [-2, 2]$ for all $h \in \mathbf{Z}$. In particular, since 0 belongs to the support of the
Sato–Tate measure, $S$ contains all finite sums of this type.
Let $f \in S$ and put $g(t) = f(t) - tf(1)$. We have
\[
f(t) - tf(1) = \lim_{N \to +\infty} \sum_{|h| \leq N} \hat{g}(h) e(ht) \Bigl(1 - \frac{|h|}{N}\Bigr),
\]
in $C_0([0, 1])$, by the uniform convergence of Cesàro means of the Fourier series of a continuous
periodic function (see, e.g., [121, III, th. 3.4]). Evaluating at 0 and subtracting
yields
\[
\begin{aligned}
f(t) &= tf(1) + \lim_{N \to +\infty} \sum_{\substack{|h| \leq N \\ h \neq 0}} \tilde{f}(h)(e(ht) - 1)\Bigl(1 - \frac{|h|}{N}\Bigr) \\
&= tf(1) + \lim_{N \to +\infty} \sum_{\substack{|h| \leq N \\ h \neq 0}} \frac{\alpha_h}{2i\pi h}(e(ht) - 1)\Bigl(1 - \frac{|h|}{N}\Bigr)
\end{aligned}
\]
in $C([0, 1])$, where $\alpha_h = 2i\pi h \tilde{f}(h)$ for $h \neq 0$. Then $\alpha_h \in \mathbf{R}$ and $|\alpha_h| \leq 2$ by the assumption
that $f \in S$, so each function
\[
tf(1) + \sum_{\substack{|h| \leq N \\ h \neq 0}} \alpha_h \frac{e(ht) - 1}{2\pi i h}\Bigl(1 - \frac{|h|}{N}\Bigr),
\]
belongs to the support of $K$. Since the support is closed, we conclude that $f$ also belongs
to the support of $K$. □
The support of $K$ is an interesting set of functions. Testing whether a function $f \in C_0([0, 1])$
belongs to it, or not, is straightforward if the Fourier coefficients of $f$ are
known, and a positive or negative answer has interesting arithmetic consequences, by
Lemma B.3.3. In particular, since 0 clearly belongs to the support of $K$, we get:
Corollary 6.3.4. For any $\varepsilon > 0$, we have
\[
\liminf_{p \to +\infty} \frac{1}{(p - 1)^2} \Bigl| \Bigl\{ (a, b) \in \mathbf{F}_p^\times \times \mathbf{F}_p^\times \;\Big|\; \max_{0 \leq j \leq p-1} \frac{1}{\sqrt{p}} \Bigl| \sum_{1 \leq x \leq j} e\Bigl(\frac{ax + b\bar{x}}{p}\Bigr) \Bigr| < \varepsilon \Bigr\} \Bigr| > 0.
\]
We refer to [12] for further examples of functions belonging (or not) to the support
of $K$ and mention only a remarkable result of J. Bober: the support of $K$ contains space-filling
curves, i.e., functions $f$ such that the image of $f$ has non-empty interior.
6.4. Generalizations
The method of Kowalski and Sawin can be extended to study the “shape” of many
other exponential sums. On the other hand, natural generalizations require different tools,
when the Riemann Hypothesis is not applicable anymore. This was achieved by Ricotta
and Royer [101] for Kloosterman sums modulo $p^n$ when $n \geq 2$ is fixed and $p \to +\infty$,
and later, they succeeded with Shparlinski [102] in obtaining convergence in law in that
setting with a single variable a. If p is fixed and n → +∞, the corresponding study was
done by Milićević and Zhang [87], where tools related to p-adic analysis are crucial. In
the three cases, the limit random Fourier series are similar, but have coefficients that
have distributions different from the Sato–Tate distribution.
Related developments concern quantitative versions of Theorem 6.3.1: how large (and
how often) can one make a partial sum of Kloosterman sums? Results of this kind have
been proved by Lamzouri [82] and Bonolis [14], and in great generality by Autissier,
Bonolis and Lamzouri [3].
Finally, in another direction, Cellarosi and Marklof [22] have established beautiful
functional limit theorems for other types of exponential sums closer to the Weyl sums
that arise in the circle method, and especially for quadratic Weyl sums. The tools as well
as the limiting functions are completely different.
[Further references: Iwaniec and Kowalski [59, Ch. 11].]

CHAPTER 7

Further topics

We explained at the beginning of this book that we would restrict our focus on a
certain special type of results in probabilistic number theory: convergence in law of
arithmetically defined sequences of random variables. In this chapter, we will quickly
survey (with some references) some important and beautiful results that either do not
exactly fit our precise setting, or require rather deeper tools than we wished to assume,
or could develop from scratch.
7.1. Equidistribution modulo 1
We have begun this book with the motivating “founding” example of the Erdős–Kac
Theorem, which is usually interpreted as the first result in probabilistic number theory.
However, one could arguably say that at the time when this was first proved, there already
existed a substantial theory that is really part of probabilistic number theory in our sense,
namely the theory of equidistribution modulo 1, due especially to Weyl [120]. Indeed, this
concerns originally the study of the fractional parts of various sequences $(x_n)_{n \geq 1}$ of real
numbers, and the fact that in many cases, including many when $x_n$ has some arithmetic
meaning, the fractional parts become equidistributed in $[0, 1]$ with respect to the Lebesgue
measure.
We now make this more precise in probabilistic terms. For a real number $x$, we will
denote (as in Chapter 3) by $\langle x \rangle$ the fractional part of $x$, namely the unique real number in
$[0, 1[$ such that $x - \langle x \rangle \in \mathbf{Z}$. We can identify this value with the point $e(x) = e^{2i\pi x}$ on the
unit circle, or with its image in $\mathbf{R}/\mathbf{Z}$, either of which might be more convenient. Given
a sequence $(x_n)_{n \geq 1}$ of real numbers, we define random variables $S_N$ on $\Omega_N = \{1, \ldots, N\}$
(with uniform probability measure) by
\[
S_N(n) = \langle x_n \rangle.
\]
Then the sequence $(x_n)_{n \geq 1}$ is said to be equidistributed modulo 1 if the random variables
$S_N$ converge in law to the uniform probability measure $dx$ on $[0, 1]$, as $N \to +\infty$.
Among other things, Weyl proved the following results:
Theorem 7.1.1. (1) Let $P \in \mathbf{R}[X]$ be a polynomial of degree $d \geq 1$ with leading term
$\xi X^d$ where $\xi \notin \mathbf{Q}$. Then the sequence $(P(n))_{n \geq 1}$ is equidistributed modulo 1.
(2) Let $d \geq 1$ be an integer, and let $\xi = (\xi_1, \ldots, \xi_d) \in (\mathbf{R}/\mathbf{Z})^d$. The closure $T$ of the
set $\{n\xi \mid n \in \mathbf{Z}\} \subset (\mathbf{R}/\mathbf{Z})^d$ is a compact subgroup of $(\mathbf{R}/\mathbf{Z})^d$ and the $T$-valued random
variables on $\Omega_N$ defined by
\[
K_N(n) = n\xi
\]
converge in law as $N \to +\infty$ to the probability Haar measure on $T$.
The second part of this theorem is the same as Theorem B.6.5, (1). We sketch a partial
proof of the first property, which is surprisingly elementary, given the Weyl Criterion
(Theorem B.6.3).
We proceed by induction on the degree $d \geq 1$ of the polynomial $P \in \mathbf{R}[X]$, using a
rather clever trick for this purpose. We may assume that $P(0) = 0$ (as the reader should
check). If $d = 1$, then $P = \xi X$ for some real number $\xi$, and $P(n) = n\xi$; the assumption is
that $\xi$ is irrational, and the result then follows from the 1-dimensional case of the second
part, as explained in Example B.6.6.
Suppose that $d = \deg(P) \geq 2$ and that the statement is known for polynomials of
smaller degree. We use the following:
Lemma 7.1.2. Let $(x_n)_{n \geq 1}$ be a sequence of real numbers. Suppose that for any integer
$h \neq 0$, the sequence $(x_{n+h} - x_n)_n$ is equidistributed modulo 1. Then $(x_n)$ is equidistributed
modulo 1.
Sketch of the proof. We leave this as an exercise to the reader; the key step is
to use the following very useful inequality of van der Corput: for any integer $N \geq 1$, for
any family $(a_n)_{1 \leq n \leq N}$ of complex numbers, and for any integer $H \geq 1$, we have
\[
\Bigl| \sum_{n=1}^{N} a_n \Bigr|^2 \leq \Bigl(1 + \frac{N + 1}{H}\Bigr) \sum_{|h| < H} \Bigl(1 - \frac{|h|}{H}\Bigr) \sum_{\substack{1 \leq n \leq N \\ 1 \leq n + h \leq N}} a_{n+h} \bar{a}_n.
\]
We also leave the proof of this inequality as an exercise... □
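Before proving the van der Corput inequality, it can be reassuring to test it numerically. The following Python sketch evaluates both sides for a given finite family $(a_n)$ and a given $H$; the function name and the random test data are our own choices.

```python
import cmath
import math
import random


def van_der_corput_sides(a, H):
    """Return (lhs, rhs) of the van der Corput inequality for the family
    a = [a_1, ..., a_N] (stored with 0-based indices) and an integer H >= 1."""
    N = len(a)
    lhs = abs(sum(a)) ** 2
    weighted = 0j
    for h in range(-H + 1, H):
        correlation = 0j
        for n in range(N):
            m = n + h
            if 0 <= m < N:  # both n and n + h must lie in the index range
                correlation += a[m] * a[n].conjugate()
        weighted += (1 - abs(h) / H) * correlation
    # The terms for h and -h are complex conjugates, so the total is real.
    rhs = (1 + (N + 1) / H) * weighted.real
    return lhs, rhs


if __name__ == "__main__":
    rng = random.Random(0)
    a = [cmath.exp(2j * math.pi * rng.random()) for _ in range(200)]
    for H in (1, 5, 20, 200):
        print(H, van_der_corput_sides(a, H))
```

For $H = 1$ the right-hand side reduces to $(N + 2) \sum_n |a_n|^2$, so the inequality is then a weak form of Cauchy–Schwarz; the interest lies in larger $H$.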


In the special case of $K_N(n) = \langle P(n) \rangle$, this means that we have to consider auxiliary
sequences $K'_N(n) = \langle P(n + h) - P(n) \rangle$, which corresponds to the same problem for the
polynomials
\[
P(X + h) - P(X) = \xi(X + h)^d - \xi X^d + \cdots = hd\xi X^{d-1} + \cdots
\]
Since these polynomials have degree $d - 1$, and leading coefficient $hd\xi \notin \mathbf{Q}$ (recall that $h \neq 0$), the induction
hypothesis applies to prove that the random variables $K'_N$ converge to the Lebesgue
measure. By the lemma, so does $K_N$.
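The Weyl Criterion reduces equidistribution of $(P(n))$ to the decay of the averages $\frac{1}{N}\sum_{n \leq N} e(hP(n))$ for each integer $h \neq 0$. The sketch below computes these averages in floating point for the illustrative choice $P = \sqrt{2}X^2$ (our example, not from the text); it is only a numerical experiment, and rounding errors limit it to moderate values of $N$.

```python
import cmath
import math


def weyl_average(coefficients, h, N):
    """Return (1/N) * sum_{n=1}^{N} e(h * P(n)), where
    P(n) = sum_k coefficients[k] * n^k (constant term first)."""
    total = 0j
    for n in range(1, N + 1):
        value = 0.0
        for c in reversed(coefficients):  # Horner evaluation of P(n)
            value = value * n + c
        # Reduce modulo 1 before multiplying by the integer h to limit rounding.
        total += cmath.exp(2j * math.pi * h * math.fmod(value, 1.0))
    return total / N


if __name__ == "__main__":
    P = [0.0, 0.0, math.sqrt(2)]  # P = sqrt(2) * X^2
    for N in (100, 1000, 20000):
        print(N, [round(abs(weyl_average(P, h, N)), 4) for h in (1, 2)])
```

The printed moduli should shrink as $N$ grows, in line with part (1) of Theorem 7.1.1.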
Remark 7.1.3. The reader might ask what happens in Theorem B.6.3 if we replace
the integers $n \leq N$ by primes taken uniformly from those that are $\leq N$. The answer is
that the same properties hold – for both assertions, we have the same limit in law, under
the same conditions on the polynomial for the first one. The proofs are quite a bit more
involved however, and depend on Vinogradov’s fundamental insight on the “bilinear”
nature of the prime numbers. We refer to [59, 13.5, 21.2] for an introduction.
Exercise 7.1.4. Suppose that $0 < \alpha < 1$. Prove that the sequence $(\langle n^{\alpha} \rangle)_{n \geq 1}$ is
equidistributed modulo 1.
Even in situations where equidistribution modulo 1 holds, there remain many fascinating
and widely-open questions when one attempts to go “beyond” equidistribution to
understand fluctuations and variations that lie deeper. One of the best known problems
in this area is that of the distribution of the gaps in a sequence that is equidistributed
modulo 1.
Thus let $(x_n)_{n \geq 1}$ be a sequence in $\mathbf{R}/\mathbf{Z}$ that is equidistributed modulo 1. For $N \geq 1$,
consider the set of the $N$ first values
\[
\{x_1, \ldots, x_N\}
\]
of the sequence. The complement in $\mathbf{R}/\mathbf{Z}$ of these points is a disjoint union of “intervals”
(in $[0, 1[$, all but one of them are literally sub-intervals, and the last one “wraps around”).
The number of these intervals is $\leq N$ (there might indeed be fewer than $N$, since some of
the values $x_i$ might coincide). The question that arises is: what is the distribution of
the lengths of these gaps? Stated in a different way, the intervals in question are the
connected components of $\mathbf{R}/\mathbf{Z} \setminus \{x_1, \ldots, x_N\}$, and we are interested in the Lebesgue
measure of these connected components.
Let $\Omega_N$ be the set of the intervals in $\mathbf{R}/\mathbf{Z} \setminus \{x_1, \ldots, x_N\}$, with uniform probability
measure. We define random variables by
\[
G_N(I) = N\,\mathrm{length}(I)
\]
for $I \in \Omega_N$. Note that the average gap is going to be about $1/N$, so that the multiplication
by $N$ leads to a natural normalization where the average of $G_N$ is about 1.
In the case of purely random points located in $\mathbf{S}^1$ independently at random, a classical
probabilistic result is that the analogous random variables converge in law to an exponential
random variable $E$ on $[0, +\infty[$, i.e., a random variable such that
\[
P(a < E < b) = \int_a^b e^{-x}\,dx
\]
for any non-negative real numbers $a < b$. This is also called the “Poisson” behavior. For
any (deterministic, for instance, arithmetic) sequence $(x_n)$ that is equidistributed modulo
1, one can then ask whether a similar distribution will arise.
Already the special case of the sequence $(\langle n\xi \rangle)$, for a fixed irrational number $\xi$, leads
to a particularly nice and remarkable answer, the “Three Gaps Theorem” (conjectured by
Steinhaus and first proved by Sós [113]). This says that there are at most three distinct
gaps between the fractional parts $\langle n\xi \rangle$ for $1 \leq n \leq N$, independently of $N$ and $\xi \notin \mathbf{Q}$.
Although this is in some sense unrelated to our main interests (there is no probabilistic
limit theorem here!) we will indicate in Exercise 7.1.5 the steps that lead to a recent proof
due to Marklof and Strömbergsson [86]. It is rather modern in spirit, as it depends on
the use of lattices in $\mathbf{R}^2$, and especially on the space of lattices.
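The Three Gaps Theorem is easy to observe numerically. The following Python sketch lists the distinct gap lengths between the points $\langle n\xi \rangle$, $1 \leq n \leq N$, on the circle, including the wrap-around gap. Since floating-point arithmetic is used, two gaps are treated as equal when they differ by less than a tolerance; both the tolerance and the function name are assumptions of this sketch.

```python
import math


def distinct_gaps(xi, N, tol=1e-9):
    """Return the sorted list of distinct gap lengths between the points
    <n*xi>, 1 <= n <= N, on the circle R/Z (gaps closer than tol are merged)."""
    points = sorted(math.fmod(n * xi, 1.0) for n in range(1, N + 1))
    gaps = [points[i + 1] - points[i] for i in range(N - 1)]
    gaps.append(1.0 - points[-1] + points[0])  # the gap that wraps around
    gaps.sort()
    distinct = [gaps[0]]
    for g in gaps[1:]:
        if g - distinct[-1] > tol:
            distinct.append(g)
    return distinct


if __name__ == "__main__":
    for N in (10, 100, 1000):
        print(N, distinct_gaps(math.sqrt(2), N))
```

When three values appear, the largest is the sum of the other two (up to rounding), which is part of the classical statement.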
Very little is known in other cases, but numerical experiments are often easy to perform
and lead at least to various conjectural statements. For instance, let $0 < \alpha < 1$ be
fixed and put $x_n = \langle n^{\alpha} \rangle$. By Exercise 7.1.4, the sequence $(x_n)_{n \geq 1}$ is equidistributed modulo
1. In this case, it is expected that $G_N$ should have the exponential limiting behavior
for all $\alpha$ except for $\alpha = \frac{1}{2}$. Remarkably, this exceptional case is the only one where the
answer is known! This is a result of Elkies and McMullen that we will discuss below in
Section 7.5.
Exercise 7.1.5. Throughout this exercise, we fix an irrational number $\xi \notin \mathbf{Q}$.
(1) For $g \in \mathrm{SL}_2(\mathbf{R})$ and $0 \leq t < 1$, show that
\[
\varphi(g, t) = \inf\{ y > 0 \mid \text{there exists } x \text{ such that } -t < x \leq 1 - t \text{ and } (x, y) \in \mathbf{Z}^2 g \}
\]
exists. Show that the function $\varphi$ that it defines satisfies $\varphi(\gamma g, t) = \varphi(g, t)$ for all $\gamma \in \mathrm{SL}_2(\mathbf{Z})$.
(2) Let $N \geq 1$ and $1 \leq n \leq N$. Prove that the gap between $\langle n\xi \rangle$ and the “next”
element of the set
\[
\{ \langle \xi \rangle, \ldots, \langle N\xi \rangle \}
\]
(i.e., the next one in “clockwise order”) is equal to
\[
\frac{1}{N} \varphi\Bigl(g_N, \frac{n}{N}\Bigr),
\]
where
\[
g_N = \begin{pmatrix} 1 & \xi \\ 0 & 1 \end{pmatrix} \begin{pmatrix} N^{-1} & 0 \\ 0 & N \end{pmatrix} \in \mathrm{SL}_2(\mathbf{R}).
\]
(3) Let $g \in \mathrm{SL}_2(\mathbf{R})$ be fixed. Consider the set
\[
A_g = g\mathbf{Z}^2 \cap \bigl( ]-1, 1[ \times ]0, +\infty[ \bigr).
\]
Show that there exists $a = (x_1, y_1) \in A_g$ with $y_1 > 0$ minimal.
(4) Show that either there exists $b = (x_2, y_2) \in A_g$, not proportional to $a$, with $y_2$
minimal, or $\varphi(g, t) = y_1$ for all $t$.
(5) Assume that $y_2 > y_1$. Show that $(a, b)$ is a basis of the lattice $g\mathbf{Z}^2 \subset \mathbf{R}^2$, and that
$x_1$ and $x_2$ have opposite signs. Let
\[
I_1 = ]0, 1] \cap ]-x_1, 1 - x_1], \qquad I_2 = ]0, 1] \cap ]-x_2, 1 - x_2].
\]
Prove that
\[
\varphi(g, t) = \begin{cases} y_1 & \text{if } t \in I_1, \\ y_2 & \text{if } t \in I_2,\ t \notin I_1, \\ y_1 + y_2 & \text{otherwise.} \end{cases}
\]
(6) If $y_2 = y_1$, show that $t \mapsto \varphi(g, t)$ takes at most three values by considering similarly
$a' = (x'_1, y'_1) \in A_g$ with $x'_1 > 0$ minimal, and $b' = (x'_2, y'_2)$ with $x'_2 < 0$ maximal.
7.2. Roots of polynomial congruences and the Chinese Remainder Theorem


One case of equidistribution modulo 1 deserves mention since it involves some interesting
philosophical points, and has been the subject of a number of important works.
Let $f$ be a fixed integral monic polynomial of degree $d \geq 1$. For any integer $q \geq 1$, the
number $\varrho_f(q)$ of roots of $f$ modulo $q$ is finite, and the function $\varrho_f$ is multiplicative (by
the Chinese Remainder Theorem); moreover it is elementary that the set $M_f$ of integers
$q \geq 1$ such that $\varrho_f(q) \geq 1$ is infinite. On the other hand, we always have $\varrho_f(p) \leq d$ for $p$
prime, so $\varrho_f(q) \leq d^{\omega(q)}$ at least when $q$ is squarefree.
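The multiplicativity of the root-counting function is easy to confirm by brute force. The sketch below counts roots modulo $q$ for the illustrative polynomial $f = X^2 + 1$ (our choice) and compares the count for a product of coprime moduli with the product of the counts; the helper name is ours.

```python
from math import gcd


def root_count(coefficients, q):
    """Number of residues x modulo q with f(x) = 0 (mod q), where
    f = sum_k coefficients[k] * X^k (constant term first)."""
    count = 0
    for x in range(q):
        value = 0
        for c in reversed(coefficients):  # Horner evaluation modulo q
            value = (value * x + c) % q
        if value == 0:
            count += 1
    return count


if __name__ == "__main__":
    f = [1, 0, 1]  # f = X^2 + 1
    for m, n in ((5, 13), (2, 5), (3, 5), (13, 17)):
        assert gcd(m, n) == 1
        print(m, n, root_count(f, m * n), root_count(f, m) * root_count(f, n))
```

For instance, $X^2 + 1$ has two roots modulo each of 5 and 13, hence four roots modulo 65, which matches the bound $d^{\omega(q)} = 2^2$.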
Exercise 7.2.1. Prove that $M_f$ is infinite. [Hint: It suffices to check that the set
of primes $p$ such that $\varrho_f(p) \geq 1$ is infinite; assuming that it is not, show that the set of
values $f(n)$ for $n \geq 1$ would be “too small.”]
The question is then: is it true that the fractional parts $\langle a/q \rangle$ of the roots $a \in \mathbf{Z}/q\mathbf{Z}$
of $f$ modulo $q$, when $\varrho_f(q) \geq 1$, become equidistributed modulo 1?
This problem admits a number of variants, and the deepest is undoubtedly the case
of equidistribution of $\langle a/p \rangle$ when the modulus $p$ is restricted to be a prime number.
Indeed, it is only when $d = 2$ and $f$ is irreducible that the equidistribution of roots
modulo primes has been proven, first by Duke–Friedlander–Iwaniec [29] for quadratic
polynomials with negative discriminant, and by Toth [118] for quadratic polynomials
with positive discriminant, i.e., with two real roots.
When all moduli q are taken into account, on the other hand, one can prove equidis-
tribution for any irreducible polynomial, as was first done by Hooley [57]. However,
although one might think that this provides evidence for the stronger statement mod-
ulo primes, it turns out that this result has in fact almost nothing to do with roots of
polynomials!
More precisely, Kowalski and Soundararajan [80] show that equidistribution holds
for the fractional parts of elements of sets modulo $q$ obtained by the Chinese Remainder
Theorem, starting from subsets $A_{p^\nu}$ of $\mathbf{Z}/p^\nu\mathbf{Z}$, under the sole condition that $A_p$ should
have at least two elements for a positive proportion of the primes.
In other words, for $p$ prime and $\nu \geq 1$, let $A_{p^\nu} \subset \mathbf{Z}/p^\nu\mathbf{Z}$ be an arbitrary subset of
residue classes, and for $q \geq 1$, define $A_q \subset \mathbf{Z}/q\mathbf{Z}$ to be the set of $x \pmod{q}$ such that,
for all primes $p$ dividing $q$, with exact exponent $\nu$, we have $x \pmod{p^\nu} \in A_{p^\nu}$. Define
$\varrho(q) = |A_q|$, which is a multiplicative function, and let $\Omega$ be the set of all $q \geq 1$ such
that $A_q$ is not empty. For any $q \in \Omega$, let $\Delta_q$ be the probability measure on $\mathbf{R}/\mathbf{Z}$ given by
\[
\Delta_q = \frac{1}{\varrho(q)} \sum_{x \in A_q} \delta_{\langle x/q \rangle},
\]
where $\delta_x$ denotes a Dirac mass at $x$ (the measure $\Delta_q$ is the image of the uniform probability
measure on $A_q$ by the map $a \mapsto \langle a/q \rangle$). Then [80, Th. 1.1] implies that:

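Theorem 7.2.2. Suppose that there exists $\alpha > 0$ such that
\[
\sum_{\substack{p \leq Q \\ \varrho(p) \geq 2}} 1 \geq \alpha\pi(Q)
\]
for all $Q$ large enough. Let $N(Q)$ be the number of $q \leq Q$ such that $A_q$ is not empty.
Then the probability measures
\[
\frac{1}{N(Q)} \sum_{\substack{q \leq Q \\ q \in \Omega}} \Delta_q
\]
converge to the Lebesgue measure on $\mathbf{R}/\mathbf{Z}$.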


Example 7.2.3. Let $f \in \mathbf{Z}[X]$ be monic and without repeated roots. If $\deg(f) \geq 2$,
then this theorem applies to the case where $A_{p^\nu}$ is the set of roots of $f$ modulo $p^\nu$, because
a basic theorem of algebraic number theory (the Chebotarev Density Theorem, see for
instance [91, Th. 13.4]) implies that there is a positive proportion of primes $p$ for which $f$
has $\deg(f) \geq 2$ distinct roots in $\mathbf{Z}/p\mathbf{Z}$. However, the theorem shows that we can replace
$A_{p^\nu}$ by any other subset $A'_{p^\nu}$ of $\mathbf{Z}/p^\nu\mathbf{Z}$ with the same cardinality, without changing the
conclusion concerning the fractional parts modulo all $q$, whereas (of course) we could
select $A'_p$ in such a way that there is no equidistribution modulo primes, in the sense that
the measures
\[
\frac{1}{P(Q)} \sum_{\substack{p \leq Q \\ p \in \Omega}} \Delta_p
\]
where $P(Q)$ is the number of primes $p \leq Q$ in $\Omega$, do not converge to the Lebesgue measure.
Remark 7.2.4. Theorem 7.2.2 does not correspond exactly to the setting considered
in [57], which concerns (implicitly) the slightly different probability measures
\[
\frac{1}{M(Q)} \sum_{\substack{q \leq Q \\ q \in \Omega}} \varrho(q)\Delta_q \tag{7.1}
\]
where
\[
M(Q) = \sum_{q \leq Q} \varrho(q).
\]
Interestingly, these two ways of making precise the idea of equidistribution modulo $q$ are
not equivalent: it is shown in [80, Prop. 2.8] that there exist choices of subsets $(A_p)$ to
which Theorem 7.2.2 applies, but for which the measures (7.1) do not converge to the
uniform measure.
7.3. Gaps between primes
The Prime Number Theorem
\[
\pi(x) \sim \frac{x}{\log x}
\]
indicates that the average gap between successive prime numbers of size $x$ is about $\log x$.
A natural problem, especially in view of the many conjectures that exist concerning
the distribution of primes (such as the Twin Prime conjecture), is to understand the
distribution of these gaps.
One way to do this, which is consistent with our general framework, is the following.
For any integer $N \geq 1$, we define the probability space $\Omega_N$ to be the set of integers $n$ such
that $1 \leq n \leq N$ (as in Chapter 2), with the uniform probability measure. Fix $\lambda > 0$. We
then define the random variables
\[
G_{\lambda,N}(n) = \pi(n + \lambda \log n) - \pi(n)
\]
which measure how many primes exist in the interval starting at $n$ of length equal to $\lambda$
times the average gap.
A precise conjecture exists concerning the limiting behavior of $G_{\lambda,N}$ as $N \to +\infty$:
Conjecture 7.3.1. The sequence $(G_{\lambda,N})_N$ converges in law as $N \to +\infty$ to a Poisson
random variable with parameter $\lambda$, i.e., for any integer $r \geq 0$, we have
\[
P_N(G_{\lambda,N} = r) \to e^{-\lambda} \frac{\lambda^r}{r!}.
\]
To the author’s knowledge, this conjecture first appears in the work of Gallagher [45],
who in fact proved that it would follow from a suitably uniform version of the famous
Hardy-Littlewood k-tuple conjecture. (Interestingly, the same assumption would imply
also a generalization of Conjecture 7.3.1 where one considers suitably normalized gaps
between simultaneous prime values of a family of polynomials, e.g., between twin primes;
see [73], where Gallagher’s argument is presented in a probabilistic manner very much
in the style of this book).
Part of the interest of Conjecture 7.3.1 is that the distribution obtained for the gaps is
exactly what one expects from “purely random” sets (see the discussion by Feller in [37,
I.3, I.4]).
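Conjecture 7.3.1 lends itself to a simple experiment. The sketch below tabulates the empirical law of $G_{\lambda,N}$ using a sieve of Eratosthenes, so that it can be compared by eye with the Poisson probabilities $e^{-\lambda}\lambda^r/r!$. It interprets $\pi(n + \lambda\log n)$ as $\pi(\lfloor n + \lambda \log n \rfloor)$; the function names are ours, and nothing here is evidence for the conjecture beyond the finite range tested.

```python
import math


def prime_count_table(limit):
    """Return a list `table` with table[m] = pi(m) for 0 <= m <= limit."""
    is_prime = [True] * (limit + 1)
    is_prime[0] = False
    if limit >= 1:
        is_prime[1] = False
    for i in range(2, math.isqrt(limit) + 1):
        if is_prime[i]:
            for j in range(i * i, limit + 1, i):
                is_prime[j] = False
    table, running = [], 0
    for m in range(limit + 1):
        running += 1 if is_prime[m] else 0
        table.append(running)
    return table


def gap_count_distribution(lam, N):
    """Empirical law of G_{lam,N}(n) = pi(n + lam*log n) - pi(n), 1 <= n <= N."""
    limit = N + int(lam * math.log(N)) + 1
    pi = prime_count_table(limit)
    counts = {}
    for n in range(1, N + 1):
        upper = int(math.floor(n + lam * math.log(n)))
        r = pi[upper] - pi[n]
        counts[r] = counts.get(r, 0) + 1
    return {r: c / N for r, c in sorted(counts.items())}


if __name__ == "__main__":
    lam = 1.0
    dist = gap_count_distribution(lam, 100000)
    for r, prob in dist.items():
        poisson = math.exp(-lam) * lam ** r / math.factorial(r)
        print(r, round(prob, 4), round(poisson, 4))
```

Convergence is expected to be slow, since the relevant scale is $\log N$; discrepancies at moderate $N$ are therefore not surprising.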
7.4. Cohen-Lenstra heuristics
In this section, we will assume some basic knowledge concerning algebraic number
theory. We refer, for instance, to the book [58] of Ireland and Rosen for an elementary
introduction to this subject, in particular to [58, Ch. 12], and to the book [91] of Neukirch
for a complete account.
Beginning with a famous paper of Cohen and Lenstra [25], there is by now an im-
pressive body of work concerning the limiting behavior of certain arithmetic measures
of a rather different nature than all those we have described up to now. For these, the
underlying arithmetic objects are families of number fields of certain kinds, and the ran-
dom variables of interest are given by the ideal class groups of the number fields, or some
invariants of the ideal class groups, such as their p-primary subgroups (recall that, as a
finite abelian group, the ideal class group C of a number field K can be represented as a
direct product of groups of order a power of p, which are zero for all but finitely many
p).
The basic idea of Cohen and Lenstra is that the ideal class groups, in suitable families,
should behave (in general) in such a way that a given finite abelian group $C$ appears as
an ideal class group with “probability” proportional to the inverse $1/|\mathrm{Aut}(C)|$ of the order
of the automorphism group of $C$, so that for instance, obtaining a group of order $p^2$ of
the form $\mathbf{Z}/p\mathbf{Z} \times \mathbf{Z}/p\mathbf{Z}$, with automorphism group of size about $p^4$, is much more unlikely
than obtaining the cyclic group $\mathbf{Z}/p^2\mathbf{Z}$, which has automorphism group of size $p^2 - p$.
Imaginary quadratic fields provide a first basic (and still very open!) special case.
Using our way of presenting probabilistic number theory, one could define the finite
probability spaces $\Omega_D$ of negative “fundamental discriminants” $-d$ (that is, either $d$
is a squarefree integer congruent to 3 modulo 4, or $d = 4\delta$ where $\delta$ is squarefree and
congruent to 1 or 2 modulo 4) with $1 \leq d \leq D$ and the uniform probability measure, and
one would define for each $D$ and each prime $p$ a random variable $P_{p,D}$ taking values in
the set $\mathcal{A}_p$ of isomorphism classes of finite abelian groups of order a power of $p$, such that
$P_{p,D}(-d)$ is the $p$-part of the class group of $\mathbf{Q}(\sqrt{-d})$. One of the conjectures (“heuristics”)
of Cohen and Lenstra is that if $p \geq 3$, then $P_{p,D}$ should converge in law as $D \to +\infty$ to
the probability measure $\mu_p$ on $\mathcal{A}_p$ such that
\[
\mu_p(A) = \frac{1}{Z_p}\,\frac{1}{|\mathrm{Aut}(A)|}
\]
for any group $A \in \mathcal{A}_p$, where $Z_p$ is the constant required to make the measure thus defined
a probability measure (the existence of this measure – in other words, the convergence of
the series defining $Z_p$ – is something that of course requires a proof).
Very few unconditional results are known towards these conjectures, and progress
often requires significant ideas. There has however been striking progress by Ellenberg,
Venkatesh and Westerland [32] in some analogous problems for quadratic extensions of
polynomial rings over finite fields, where geometric methods make the problem more
accessible, and in fact allow the use of essentially topological ideas (see the Bourbaki
report [98] of O. Randal-Williams).

7.5. Ratner theory


Although all the results that we have described up to now are beautiful and important,
maybe the most remarkably versatile tool that can be considered to lie within our chosen
context is Ratner theory, named after the fundamental work of M. Ratner [99]. We lack
the expertise to present anything more than a few selected statements of applications of
this theory; we refer to the survey of É. Ghys [47] and to the book of Morris [89] for an
introduction (Section 1.4 of that book lists more applications of Ratner Theory), and to
that of Einsiedler and Ward [30] for background results on ergodic theory and dynamical
systems (some of which also have remarkable applications in number theory).
We illustrate the remarkable power of this theory with the beautiful result of Elkies
and McMullen [31] which was already mentioned in Section 7.1. We consider the sequence
of fractional parts of $\sqrt{n}$ for $n \ge 1$ (viewed as elements of $\mathbf{R}/\mathbf{Z}$). As in the
previous section, for any integer $N \ge 1$, we define the space $\Omega_N$ to be the set of connected
components of $\mathbf{R}/\mathbf{Z} \setminus \{\langle\sqrt{1}\rangle, \ldots, \langle\sqrt{N}\rangle\}$, with uniform probability measure, and we define
random variables on $\Omega_N$ by
\[
  G_N(I) = N \, \mathrm{length}(I).
\]
Elkies and McMullen found the limiting distribution of GN as N → +∞. It is a very
non-generic probability measure on R!
Theorem 7.5.1 (Elkies–McMullen). As $N \to +\infty$, the random variables $G_N$ converge
in law to a random variable on $[0, +\infty[$ with probability law $\mu_{EM} = \frac{6}{\pi^2} f(x)\,dx$, where $f$ is
continuous, analytic on the intervals $[0, 1/2]$, $[1/2, 2]$ and $[2, +\infty[$, is not of class $C^3$, and
satisfies $f(x) = 1$ if $0 \le x \le 1/2$.
This is [31, Th. 1.1]. The restriction of the density $f$ to the two intervals $[1/2, 2]$ and
$[2, +\infty[$ can be written down explicitly and it is an “elementary” function. For instance,
if $1/2 \le x \le 2$, then let $r = \frac{1}{2} x^{-1}$ and
\[
  \psi(r) = \arctan\Bigl(\frac{2r - 1}{\sqrt{4r - 1}}\Bigr) - \arctan\Bigl(\frac{1}{\sqrt{4r - 1}}\Bigr);
\]
we then have
\[
  f(x) = \frac{2}{3} (4r - 1)^{3/2} \psi(r) + (1 - 6r) \log r + 2r - 1
\]
(see [31, (3.53)]).
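As a numerical illustration (not part of the argument), the Python sketch below evaluates the density factor $f$ on $[0, 2]$ from the formula above and compares the empirical distribution of $G_N$ with the limit law of Theorem 7.5.1. The function names, the choice $N = 2000$ and the midpoint quadrature are our own assumptions; the agreement for moderate $N$ is only approximate.

```python
import math

def density_f(x):
    """f(x) as in the text: 1 on [0, 1/2], the explicit formula on [1/2, 2]."""
    if 0.0 <= x <= 0.5:
        return 1.0
    if not (0.5 < x <= 2.0):
        raise ValueError("this sketch only covers 0 <= x <= 2")
    r = 1.0 / (2.0 * x)
    q = math.sqrt(max(0.0, 4.0 * r - 1.0))
    # atan2(y, q) equals arctan(y / q) for q > 0 and stays finite when q = 0 (x = 2).
    psi = math.atan2(2.0 * r - 1.0, q) - math.atan2(1.0, q)
    return (2.0 / 3.0) * q ** 3 * psi + (1.0 - 6.0 * r) * math.log(r) + 2.0 * r - 1.0

def normalized_gaps(N):
    """Values G_N(I) = N * length(I) over the gaps I cut out by <sqrt(1)>, ..., <sqrt(N)>."""
    points = sorted({math.sqrt(n) % 1.0 for n in range(1, N + 1)})
    gaps = [b - a for a, b in zip(points, points[1:])]
    gaps.append(1.0 - points[-1] + points[0])  # wrap-around gap on the circle R/Z
    return [N * g for g in gaps]

def limit_mass(a, b, steps=2000):
    """Midpoint-rule approximation of (6/pi^2) * integral of f over [a, b], 0 <= a < b <= 2."""
    h = (b - a) / steps
    total = sum(density_f(a + (i + 0.5) * h) for i in range(steps))
    return 6.0 / math.pi ** 2 * total * h

N = 2000
G = normalized_gaps(N)
for t in (0.5, 1.0, 2.0):
    empirical = sum(1 for g in G if g <= t) / len(G)
    print(f"t = {t}: empirical P(G_N <= t) = {empirical:.4f}, limit = {limit_mass(0.0, t):.4f}")
```

All perfect squares give the same point $0$ of $\mathbf{R}/\mathbf{Z}$, so $\Omega_N$ has $N - \lfloor\sqrt{N}\rfloor + 1$ elements rather than $N$.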
We give the barest outline of the proof, in order to simply point out what kind of
results are meant by Ratner Theory. The paper of Elkies and McMullen also gives a
detailed and highly readable introduction to this area.
The proof studies the gap distribution by means of the function $L_N$ defined for
$x \in \mathbf{R}/\mathbf{Z}$ so that $L_N(x)$ is the measure of the gap interval containing $x$ (with $L_N(x) = 0$ if
$x$ is one of the boundary points of the gap intervals for $\langle\sqrt{1}\rangle, \ldots, \langle\sqrt{N}\rangle$). We can then
check that for $t \in \mathbf{R}$, the total measure in $\mathbf{R}/\mathbf{Z}$ of the points lying in a gap interval of
length $< t$, which is equal to the Lebesgue measure
\[
  \mu(\{x \in \mathbf{R}/\mathbf{Z} \mid L_N(x) < t\}),
\]
is given by
\[
  \int_0^t u \, d\bigl(P_N(G_N < u)\bigr) = t \, P_N(G_N < t) - \int_0^t P_N(G_N < u) \, du.
\]
Concretely, this means that it is enough to understand the limiting behavior of LN in
order to understand the limit gap distribution. Note that there is nothing special about
the specific sequence considered in that part of the argument.
Fix t > 0. The key insight that leads to questions involving Ratner theory is that if
N is a square of an integer, then the probability
µ({x ∈ R/Z | LN (x) < t})
can be shown (asymptotically as $N \to +\infty$) to be very close to the probability that a
certain affine lattice $\Lambda_{N,x}$ in $\mathbf{R}^2$ intersects the triangle $\Delta_t$ with vertices $(0, 0)$, $(1, 0)$ and
$(0, 2t)$ (with area $t$). The lattice has the form $\Lambda_{N,x} = g_{N,x} \cdot \mathbf{Z}^2$, for some (fairly explicit)
affine transformation $g_{N,x}$.
Let $\mathrm{ASL}_2(\mathbf{R})$ be the group of affine transformations
\[
  z \mapsto z_0 + g(z)
\]
of $\mathbf{R}^2$ whose linear part $g \in \mathrm{GL}_2(\mathbf{R})$ has determinant 1, and $\mathrm{ASL}_2(\mathbf{Z})$ the subgroup of
those affine transformations of determinant 1 where both the translation term $z_0$ and the
linear part have coefficients in $\mathbf{Z}$. Then the lattices $\Lambda_{N,x}$ can be interpreted as elements
of the quotient space
\[
  M = \mathrm{ASL}_2(\mathbf{Z}) \backslash \mathrm{ASL}_2(\mathbf{R})
\]
which parameterizes affine lattices $\Lambda \subset \mathbf{R}^2$ with $\mathbf{R}^2/\Lambda$ of area 1. This space admits a
unique probability measure $\tilde{\mu}$ that is invariant under the right action of $\mathrm{ASL}_2(\mathbf{R})$ by
multiplication.
Now we have, for each $N \ge 1$, a probability measure $\mu_N$ on $M$, namely the law of the
random variable $\mathbf{R}/\mathbf{Z} \to M$ defined by $x \mapsto \Lambda_{N,x}$. What Ratner Theory provides is a very
powerful set of tools to prove that certain probability measures on $M$ (or on similar spaces
constructed with groups more general than $\mathrm{ASL}_2(\mathbf{R})$ and suitable quotients) are equal to
the canonical measure $\tilde{\mu}$. This is applied, essentially, to all possible limits of subsequences
of $(\mu_N)$, to show that these must coincide with $\tilde{\mu}$, which leads to the conclusion that the
whole sequence converges in law to $\tilde{\mu}$. It then follows that
\[
  \mu(\{x \in \mathbf{R}/\mathbf{Z} \mid L_N(x) < t\}) \to \tilde{\mu}(\{\Lambda \in M \mid \Lambda \cap \Delta_t \ne \emptyset\}).
\]
This gives, in principle, an explicit form of the gap distribution. To compute it exactly is
an “exercise” in euclidean geometry, which is by no means easy!
7.6. And even more...
And there are even more interactions between probability theory and number theory
than what our point of view considers... Here are some examples, which we order, roughly
speaking, in terms of how close they are to the perspective of this book:
• Applications of limit theorems for arithmetic probability measures to other prob-
lems of analytic number theory: we have given a few examples in exercises (see
Exercise 2.3.5, or Exercise 3.3.4), but there are many more of course.
• Using probabilistic ideas to model arithmetic objects, and make conjectures or
prove theorems concerning those; in contrast with our point of view, it is not always
expected in such cases that there should exist actual limit theorems comparing
the model with the actual arithmetic phenomena. A typical example is the
so-called “Cramér model” for the distribution of primes, which is known to lead to
wrong conclusions in some cases, but is often close enough to the truth to be
used to suggest how certain problems might behave (see for instance the survey
of Pintz [94]).
• Using number theoretic ideas to derandomize certain constructions or algorithms.
There are indeed a number of very interesting results that use the “randomness”
of specific arithmetic objects to give deterministic constructions, or deterministic
proofs of existence, for mathematical objects that might have first been shown
to exist using probabilistic ideas. Examples include the construction of expander
graphs by Margulis (see, e.g., [74, §4.4]), or of Ramanujan graphs by Lubotzky,
Phillips and Sarnak [84], or, in a different vein, the construction of explicit
“ultraflat” trigonometric polynomials (in the sense of Kahane) by Bombieri and
Bourgain [13], or the construction of explicit functions modulo a prime with
smallest possible Gowers norms by Fouvry, Kowalski and Michel [42].

APPENDIX A

Analysis

In Chapters 3 and 4, we use a number of facts of analysis, and especially complex anal-
ysis, which are not necessarily included in most introductory graduate courses. We review
them here, and give some details of the proofs (when they are sufficiently elementary and
enlightening) or detailed references.

A.1. Summation by parts


Analytic number theory makes very frequent use of “summation by parts”, which is
a discrete form of integration by parts. We state the version that we use.
Lemma A.1.1 (Summation by parts). Let $(a_n)_{n \ge 1}$ be a sequence of complex numbers
and $f : [0, +\infty[ \to \mathbf{C}$ a function of class $C^1$. For all $x \ge 0$, define
\[
  M_a(x) = \sum_{1 \le n \le x} a_n.
\]
For $x \ge 0$, we then have
\[
  \sum_{1 \le n \le x} a_n f(n) = M_a(x) f(x) - \int_1^x M_a(t) f'(t) \, dt. \tag{A.1}
\]
If $M_a(x) f(x)$ tends to 0 as $x \to +\infty$, then we have
\[
  \sum_{n \ge 1} a_n f(n) = - \int_1^{+\infty} M_a(t) f'(t) \, dt,
\]
provided either the series or the integral converges absolutely, in which case both of them
do.
Using this formula, one can exploit known information (upper bounds or asymptotic
formulas) concerning the summation function Ma , typically when the sequence (an ) is
irregular, in order to understand the summation function for an f (n) for many sufficiently
regular functions f .
The reader should attempt to write a proof of this lemma, but we give the details for
completeness.

Proof. Let $N \ge 0$ be the integer such that $N \le x < N + 1$. We have
\[
  \sum_{1 \le n \le x} a_n f(n) = \sum_{1 \le n \le N} a_n f(n).
\]
By the usual integration by parts formula, we then note that
\[
  M_a(x) f(x) - \int_1^x M_a(t) f'(t) \, dt = M_a(N) f(N) - \int_1^N M_a(t) f'(t) \, dt
\]
(because $M_a$ is constant on the interval $N \le t \le x$). We therefore reduce to the case $x = N$.
We then have
\begin{align*}
  \sum_{n \le N} a_n f(n) &= \sum_{1 \le n \le N} (M_a(n) - M_a(n - 1)) f(n) \\
  &= \sum_{1 \le n \le N} M_a(n) f(n) - \sum_{0 \le n \le N - 1} M_a(n) f(n + 1) \\
  &= M_a(N) f(N) + \sum_{1 \le n \le N - 1} M_a(n) (f(n) - f(n + 1)) \\
  &= M_a(N) f(N) - \sum_{1 \le n \le N - 1} M_a(n) \int_n^{n+1} f'(t) \, dt \\
  &= M_a(N) f(N) - \int_1^N M_a(t) f'(t) \, dt,
\end{align*}
which concludes the first part of the lemma. The last assertion follows immediately by
letting $x \to +\infty$, in view of the assumption on the limit of $M_a(x) f(x)$. □
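The identity (A.1) is easy to test numerically. The following Python sketch computes both sides for a test sequence; it relies on the fact that $M_a$ is a step function, so that the integral reduces to a finite sum and no quadrature is needed. The helper name and the choice of $a_n$, $f$ and $x$ are ours, for illustration only.

```python
import math

def summation_by_parts_sides(a, f, x):
    """Return both sides of (A.1) for a = [a_1, ..., a_K] with K >= floor(x), and x >= 1.

    M_a is constant on [n, n + 1[, so the integral of M_a(t) f'(t) over [1, x]
    equals the sum over 1 <= n <= floor(x) of M_a(n) * (f(min(n + 1, x)) - f(n)).
    """
    N = math.floor(x)
    lhs = sum(a[n - 1] * f(n) for n in range(1, N + 1))
    partial, integral = 0.0, 0.0
    for n in range(1, N + 1):
        partial += a[n - 1]                          # now partial = M_a(n)
        integral += partial * (f(min(n + 1, x)) - f(n))
    rhs = partial * f(x) - integral                  # partial = M_a(N) = M_a(x)
    return lhs, rhs

a = [(-1) ** n for n in range(1, 51)]                # a_n = (-1)^n, an arbitrary test sequence
lhs, rhs = summation_by_parts_sides(a, lambda t: 1.0 / t, 37.5)
print(lhs, rhs)                                      # the two values agree up to rounding
```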

A.2. The logarithm


In Chapters 3 and 4, we sometimes use the logarithm for complex numbers. Since
this is not a globally defined function on C× , we clarify here what we mean.
Definition A.2.1. Let $z \in \mathbf{C}$ be a complex number with $|z| < 1$. We define
\[
  \log(1 - z) = - \sum_{k \ge 1} \frac{z^k}{k}.
\]

Proposition A.2.2. (1) For any complex number $z$ such that $|z| < 1$, we have
\[
  e^{\log(1 - z)} = 1 - z.
\]
(2) Let $(z_n)_{n \ge 1}$ be a sequence of complex numbers such that $|z_n| < 1$. If
\[
  \sum_n |z_n| < +\infty,
\]
then
\[
  \prod_{n \ge 1} (1 - z_n) = \exp\Bigl( \sum_{n \ge 1} \log(1 - z_n) \Bigr).
\]
(3) For $|z| \le \frac{1}{2}$, we have $|\log(1 - z)| \le 2|z|$.
Proof. Part (1) is standard since the series used in the definition is the Taylor
series of the logarithm around 1 (evaluated at −z), and this power series has radius of
convergence 1.
Part (2) is then simply a consequence of the continuity of the exponential, and the
fact that the product is convergent under the assumption on (zn )n>1 .
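For (3), we note that for $|z| < 1$, we have
\[
  \log(1 - z) = -z \Bigl( 1 + \frac{z}{2} + \frac{z^2}{3} + \cdots + \frac{z^{k-1}}{k} + \cdots \Bigr)
\]
so that if $|z| \le \frac{1}{2}$, we get
\[
  |\log(1 - z)| \le |z| \Bigl( 1 + \frac{1}{4} + \frac{1}{8} + \cdots + \frac{1}{2^{k-1}} + \cdots \Bigr) \le 2|z|.
\]
□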
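Parts (1) and (3) of Proposition A.2.2 can be checked numerically on random points of the disc $|z| \le 1/2$. The sketch below sums the series of Definition A.2.1 directly; the truncation at 200 terms and the sampling scheme are our own choices.

```python
import cmath
import random

def log_one_minus(z, terms=200):
    """Partial sum of -sum_{k>=1} z^k / k, the series of Definition A.2.1 (needs |z| < 1)."""
    total, power = 0j, 1 + 0j
    for k in range(1, terms + 1):
        power *= z
        total -= power / k
    return total

random.seed(0)
worst_ratio = 0.0
for _ in range(1000):
    radius = 0.5 * random.random()
    z = cmath.rect(radius, random.uniform(0.0, 2.0 * cmath.pi))
    if z == 0:
        continue
    w = log_one_minus(z)
    assert abs(cmath.exp(w) - (1 - z)) < 1e-12       # part (1): exp(log(1 - z)) = 1 - z
    worst_ratio = max(worst_ratio, abs(w) / abs(z))  # part (3) predicts a ratio <= 2
print("largest |log(1 - z)| / |z| observed:", worst_ratio)
```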


A.3. Mellin transform


The Mellin transform is a multiplicative analogue of the Fourier transform, to which
it can indeed in principle be reduced. We consider it only in simple cases. Let
ϕ : [0, +∞[−→ C
be a continuous function that decays faster than any polynomial at infinity (for instance,
a function with compact support). Then the Mellin transform ϕ̂ of ϕ is the holomorphic
function defined by the integral
\[
  \hat{\varphi}(s) = \int_0^{+\infty} \varphi(x) x^s \, \frac{dx}{x},
\]
for all those s ∈ C for which the integral makes sense, which under our assumption
includes all complex numbers with Re(s) > 0.
The basic properties of the Mellin transform that are relevant for us are summarized
in the next proposition:
Proposition A.3.1. Let ϕ : [0, +∞[−→ C be smooth and assume that ϕ and all its
derivatives decay faster than any polynomial at infinity.
(1) The Mellin transform $\hat{\varphi}$ extends to a meromorphic function on $\mathrm{Re}(s) > -1$, with
at most a simple pole at $s = 0$ with residue $\varphi(0)$.
(2) For any real numbers $-1 < A < B$, the Mellin transform has rapid decay in the
strip $A \le \mathrm{Re}(s) \le B$, in the sense that for any integer $k \ge 1$, there exists a constant
$C_k \ge 0$ such that
\[
  |\hat{\varphi}(s)| \le C_k (1 + |t|)^{-k}
\]
for all $s = \sigma + it$ with $A \le \sigma \le B$ and $|t| \ge 1$.
(3) For any $\sigma > 0$ and any $x > 0$, we have the Mellin inversion formula
\[
  \varphi(x) = \frac{1}{2i\pi} \int_{(\sigma)} \hat{\varphi}(s) x^{-s} \, ds.
\]
In the last formula, the notation $\int_{(\sigma)} (\cdots) \, ds$ refers to an integral over the vertical line
$\mathrm{Re}(s) = \sigma$, oriented upward.
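Proof. (1) We integrate by parts in the definition of $\hat{\varphi}(s)$ for $\mathrm{Re}(s) > 0$, and obtain
\[
  \hat{\varphi}(s) = \Bigl[ \varphi(x) \frac{x^s}{s} \Bigr]_0^{+\infty} - \frac{1}{s} \int_0^{+\infty} \varphi'(x) x^{s+1} \, \frac{dx}{x}
  = - \frac{1}{s} \int_0^{+\infty} \varphi'(x) x^{s+1} \, \frac{dx}{x}
\]
since $\varphi$ and $\varphi'$ decay faster than any polynomial at $\infty$. It follows that $\psi(s) = s \hat{\varphi}(s)$ is
holomorphic for $\mathrm{Re}(s) > -1$, and hence that $\hat{\varphi}(s)$ is meromorphic in this region. Since
\[
  \lim_{s \to 0} s \hat{\varphi}(s) = \psi(0) = - \int_0^{+\infty} \varphi'(x) \, dx = \varphi(0),
\]
it follows that there is at most a simple pole with residue $\varphi(0)$ at $s = 0$.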


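(2) Iterating the integration by parts $k \ge 2$ times, we obtain for $\mathrm{Re}(s) > -1$ the
relation
\[
  \hat{\varphi}(s) = \frac{(-1)^k}{s(s + 1) \cdots (s + k - 1)} \int_0^{+\infty} \varphi^{(k)}(x) x^{s+k} \, \frac{dx}{x}.
\]
Hence for $A \le \sigma \le B$ and $|t| \ge 1$ we obtain the bound
\[
  |\hat{\varphi}(s)| \ll \frac{1}{(1 + |t|)^k} \int_0^{+\infty} |\varphi^{(k)}(x)| x^{B+k} \, \frac{dx}{x} \ll \frac{1}{(1 + |t|)^k}.
\]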
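(3) We interpret $\hat{\varphi}(s)$, for $s = \sigma + it$ with $\sigma > 0$ fixed, as a Fourier transform, by
means of the change of variable $x = e^y$: we have
\[
  \hat{\varphi}(s) = \int_0^{+\infty} \varphi(x) x^{\sigma} x^{it} \, \frac{dx}{x} = \int_{\mathbf{R}} \varphi(e^y) e^{\sigma y} e^{iyt} \, dy,
\]
which shows that $t \mapsto \hat{\varphi}(\sigma + it)$ is the Fourier transform (with the above normalization)
of the function $g(y) = \varphi(e^y) e^{\sigma y}$. Note that $g$ is smooth, and tends to zero very rapidly
at infinity (for $y \to -\infty$, this is because $\varphi$ is bounded close to 0, but $e^{\sigma y}$ then tends
exponentially fast to 0). Therefore the Fourier inversion formula holds, and for any
$y \in \mathbf{R}$, we obtain
\[
  \varphi(e^y) e^{\sigma y} = \frac{1}{2\pi} \int_{\mathbf{R}} \hat{\varphi}(\sigma + it) e^{-ity} \, dt.
\]
Putting $x = e^y$, this translates to
\[
  \varphi(x) = \frac{1}{2\pi} \int_{\mathbf{R}} \hat{\varphi}(\sigma + it) x^{-\sigma - it} \, dt = \frac{1}{2i\pi} \int_{(\sigma)} \hat{\varphi}(s) x^{-s} \, ds.
\]
□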

One of the most important functions of analysis is classically defined as a Mellin
transform. This is the Gamma function of Euler, which is essentially the Mellin transform
of the exponential function, or more precisely of $\exp(-x)$. In other words, we have
\[
  \Gamma(s) = \int_0^{+\infty} e^{-x} x^s \, \frac{dx}{x}
\]
for all complex numbers s such that Re(s) > 0. Proposition A.3.1 shows that Γ extends
to a meromorphic function on Re(s) > −1, with a simple pole at s = 0 with residue 1.
In fact, much more is true:
Proposition A.3.2. The function $\Gamma(s)$ extends to a meromorphic function on $\mathbf{C}$ with
only simple poles at $s = -k$ for all integers $k \ge 0$, with residue $(-1)^k / k!$. It satisfies
\[
  \Gamma(s + 1) = s \Gamma(s)
\]
for all $s \in \mathbf{C}$, with the obvious meaning if $s$ or $s + 1$ is a pole, and in particular we have
\[
  \Gamma(n) = (n - 1)!
\]
for all integers $n \ge 1$.
Moreover, the function $1/\Gamma$ is entire.
Proof. It suffices to prove that $\Gamma(s + 1) = s \Gamma(s)$ for $\mathrm{Re}(s) > 0$. Indeed, this formula
proves, by induction on $k \ge 1$, that $\Gamma$ has an analytic continuation to $\mathrm{Re}(s) > -k$, with
a simple pole at $-k + 1$, where the residue $r_{-k+1}$ satisfies
\[
  r_{-k+1} = \frac{r_{-k+2}}{-k + 1}.
\]
This easily gives every statement in the proposition. And the formula we want is just a
simple integration by parts away:
\[
  \Gamma(s + 1) = \int_0^{+\infty} e^{-x} x^s \, dx = \Bigl[ -e^{-x} x^s \Bigr]_0^{+\infty} + s \int_0^{+\infty} e^{-x} x^{s-1} \, dx = s \Gamma(s).
\]
Since $\Gamma$ is meromorphic, its inverse $1/\Gamma$ is also meromorphic; for the proof that $1/\Gamma$
is in fact entire (i.e., that $\Gamma(s) \ne 0$ for $s \in \mathbf{C}$), we refer to [116, p. 149] (it follows, e.g.,
from the formula
\[
  \Gamma(s) \Gamma(1 - s) = \frac{\pi}{\sin(\pi s)},
\]
valid for all $s \in \mathbf{C}$, since the known poles of $\Gamma(1 - s)$ are compensated by those
of $1/\sin(\pi s)$). □
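These formulas are easy to test numerically for real arguments. The sketch below approximates the Mellin integral defining $\Gamma(s)$ after the change of variable $x = e^y$ used in the proof of Proposition A.3.1, and compares it with the standard library function `math.gamma`; the truncation bounds and step count are ad hoc choices of ours, adequate for moderate real $s > 0$.

```python
import math

def mellin_of_exp(s, lo=-40.0, hi=5.0, steps=9000):
    """Trapezoid approximation of int_0^inf e^{-x} x^s dx/x for real s > 0.

    After the substitution x = e^y the integrand is exp(-e^y + s*y), which
    decays very fast in both directions, so truncating to [lo, hi] is harmless.
    """
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        y = lo + i * h
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * math.exp(-math.exp(y) + s * y)
    return total * h

s = 2.5
print(mellin_of_exp(s), math.gamma(s))             # Mellin integral vs. library value
print(math.gamma(s + 1), s * math.gamma(s))        # Gamma(s + 1) = s * Gamma(s)
print(math.gamma(6), math.factorial(5))            # Gamma(n) = (n - 1)!
print(math.gamma(0.3) * math.gamma(0.7),
      math.pi / math.sin(0.3 * math.pi))           # reflection formula at s = 0.3
```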
An important feature of the Gamma function is that its asymptotic behavior in very
wide ranges of the complex plane is very clearly understood. This is the so-called
Stirling formula.
Proposition A.3.3. Let $\alpha > 0$ be a real number and let $X_\alpha$ be either the set of $s \in \mathbf{C}$
such that $\mathrm{Re}(s) > \alpha$, or the set of $s \in \mathbf{C}$ such that $|\mathrm{Im}(s)| > \alpha$. We have
\[
  \log \Gamma(s) = s \log s - s - \frac{1}{2} \log s + \frac{1}{2} \log 2\pi + O(|s|^{-1}),
\]
\[
  \frac{\Gamma'(s)}{\Gamma(s)} = \log s - \frac{1}{2s} + O(|s|^{-2})
\]
for any $s \in X_\alpha$.
For a proof, see for instance [16, Ch. VII, Prop. 4].
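For real arguments, the first formula can be compared with the standard library function `math.lgamma`. The sketch below prints the error of the main terms and the error multiplied by $s$, which should remain bounded according to the proposition; the sample points are arbitrary.

```python
import math

def stirling_log_gamma(s):
    """Main terms of Stirling's formula for log Gamma(s), for real s > 0."""
    return s * math.log(s) - s - 0.5 * math.log(s) + 0.5 * math.log(2.0 * math.pi)

for s in (5.0, 50.0, 500.0):
    error = math.lgamma(s) - stirling_log_gamma(s)
    # The proposition predicts error = O(1/s), so s * error should stay bounded.
    print(s, error, s * error)
```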

A.4. Dirichlet series


We present in this section some of the basic analytic properties of series of the type
\[
  \sum_{n \ge 1} a_n n^{-s},
\]
where $a_n \in \mathbf{C}$ for $n \ge 1$. These are called Dirichlet series, and we refer to Titchmarsh’s
book [116, Ch. 9] for basic information about these functions. If $a_n = 0$ for $n$ large
enough (so that there are only finitely many terms), the series converges of course for
all $s$, and the resulting function is called a Dirichlet polynomial.
Lemma A.4.1. Let $(a_n)_{n \ge 1}$ be a sequence of complex numbers. Let $s_0 \in \mathbf{C}$. If the
series
\[
  \sum_{n \ge 1} a_n n^{-s_0}
\]
converges, then the series
\[
  \sum_{n \ge 1} a_n n^{-s}
\]
converges uniformly on compact subsets of $U = \{s \in \mathbf{C} \mid \mathrm{Re}(s) > \mathrm{Re}(s_0)\}$. In particular
the function
\[
  f(s) = \sum_{n \ge 1} a_n n^{-s}
\]
is holomorphic on $U$.
Sketch of proof. We may assume (by considering $a_n n^{-s_0}$ instead of $a_n$) that $s_0 = 0$.
For any integers $N < M$, let
\[
  s_{N,M} = a_N + \cdots + a_M.
\]
By Cauchy’s criterion, we have $s_{N,M} \to 0$ as $N, M \to +\infty$. Suppose that $\sigma = \mathrm{Re}(s) > 0$.
Let $N < M$ be integers. By the elementary summation by parts formula (Lemma A.1.1),
we have
\[
  \sum_{N \le n \le M} a_n n^{-s} = s_{N,M} M^{-s} - \sum_{N \le n < M} ((n + 1)^{-s} - n^{-s}) s_{N,n}.
\]
It is however also elementary that
\[
  |(n + 1)^{-s} - n^{-s}| = \Bigl| s \int_n^{n+1} x^{-s-1} \, dx \Bigr| \le \frac{|s|}{\sigma} (n^{-\sigma} - (n + 1)^{-\sigma}). \tag{A.2}
\]
Hence
\[
  \Bigl| \sum_{N \le n \le M} a_n n^{-s} \Bigr| \le \frac{|s_{N,M}|}{M^{\sigma}} + \frac{|s|}{\sigma} \max_{N \le n \le M} |s_{N,n}| \Bigl( \frac{1}{N^{\sigma}} - \frac{1}{M^{\sigma}} \Bigr).
\]
It therefore follows by Cauchy’s criterion that the Dirichlet series $f(s)$ converges uniformly
in any region in $\mathbf{C}$ defined by the condition
\[
  \frac{|s|}{\sigma} \le A
\]
for some $A > 0$. This includes, for a suitable value of $A$, any compact subset of the
half-plane $\{s \in \mathbf{C} \mid \sigma > 0\}$. □
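The elementary inequality (A.2) can be tested on random inputs. The sketch below does this in Python with complex exponents; the sampling ranges are arbitrary, and a small relative tolerance absorbs floating-point rounding in the near-equality cases (for instance when $\mathrm{Im}(s)$ is close to 0).

```python
import random

def lhs_A2(n, s):
    """|(n + 1)^(-s) - n^(-s)| for an integer n >= 1 and complex s."""
    return abs((n + 1) ** (-s) - n ** (-s))

def rhs_A2(n, s):
    """(|s| / sigma) * (n^(-sigma) - (n + 1)^(-sigma)) with sigma = Re(s) > 0."""
    sigma = s.real
    return abs(s) / sigma * (n ** (-sigma) - (n + 1) ** (-sigma))

random.seed(1)
for _ in range(1000):
    n = random.randint(1, 1000)
    s = complex(random.uniform(0.05, 3.0), random.uniform(-50.0, 50.0))
    assert lhs_A2(n, s) <= rhs_A2(n, s) * (1 + 1e-9) + 1e-15
print("inequality (A.2) holds on all sampled (n, s)")
```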
In general, the convergence is not absolute. We can see in this lemma a first instance
of a fairly general principle concerning Dirichlet series: if some particular property holds
for some s0 ∈ C (or for all s0 with some fixed real part), then it holds – or even a stronger
property holds – for any s with Re(s) > Re(s0 ).
This principle also applies in many cases to the possible analytic continuation of
Dirichlet series beyond the region of convergence. The next proposition is another exam-
ple, concerning the size of the Dirichlet series.
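Proposition A.4.2. Let $\sigma \in \mathbf{R}$ be a real number and let $(a_n)_{n \ge 1}$ be a bounded sequence
of complex numbers such that the Dirichlet series
\[
  f(s) = \sum_{n \ge 1} a_n n^{-s}
\]
converges for $\mathrm{Re}(s) > \sigma$. Then for any $\sigma_1 > \sigma$, we have
\[
  |f(s)| \ll 1 + |t|
\]
uniformly for $s = \sigma' + it$ with $\sigma' = \mathrm{Re}(s) \ge \sigma_1$.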
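Proof. We may assume that $\sum a_n$ converges by replacing $(a_n)$ by $(a_n n^{-\tau})$ for some
$\tau > \sigma$. The partial sums
\[
  s_N = a_1 + \cdots + a_N
\]
are then bounded. Let $s \in \mathbf{C}$ be such that $\sigma = \mathrm{Re}(s) > 0$. Then we have by partial
summation
\[
  \sum_{n=1}^{N} \frac{a_n}{n^s} = \sum_{n=1}^{M} \frac{a_n}{n^s} + \sum_{n=M+1}^{N} \Bigl( \frac{1}{n^s} - \frac{1}{(n + 1)^s} \Bigr) s_n - \frac{s_M}{(M + 1)^s} + \frac{s_N}{(N + 1)^s}
\]
for any integers $M \le N$. Letting $N \to +\infty$, as we may, we get
\[
  f(s) = \sum_{n=1}^{M} \frac{a_n}{n^s} + \sum_{n > M} \Bigl( \frac{1}{n^s} - \frac{1}{(n + 1)^s} \Bigr) s_n - \frac{s_M}{(M + 1)^s}.
\]
Applying (A.2), this leads to
\[
  |f(s)| \ll \sum_{n=1}^{M} \frac{1}{n^{\sigma}} + \frac{|s|}{\sigma} \sum_{n > M} \Bigl( \frac{1}{n^{\sigma}} - \frac{1}{(n + 1)^{\sigma}} \Bigr) + \frac{1}{(M + 1)^{\sigma}}
  \ll M + |t| M^{-\sigma} + 1,
\]
where $t = \mathrm{Im}(s)$, and the desired bound follows by taking $M = \lceil |t| \rceil$ (see also [116, 9.33]). □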
In order to express in a practical manner a Dirichlet series outside of its region of
convergence, one can use smooth partial sums, which exploit harmonic analysis.
Proposition A.4.3. Let $\varphi : [0, +\infty[ \to [0, 1]$ be a smooth function with compact
support such that $\varphi(0) = 1$. Let $\hat{\varphi}$ denote its Mellin transform. Let $\sigma_0$ be given with
$0 < \sigma_0 < 1$, and let $(a_n)_{n \ge 1}$ be any sequence of complex numbers with $|a_n| \le 1$ such that
the Dirichlet series
\[
  \sum_{n \ge 1} a_n n^{-s}
\]
extends to a holomorphic function $f(s)$ in the region $\mathrm{Re}(s) > \sigma_0$ with at most a simple
pole at $s = 1$ with residue $c \in \mathbf{C}$, and such that furthermore the function $f$ has polynomial
growth in vertical strips in this region.
For $N \ge 1$, define
\[
  f_N(s) = \sum_{n \ge 1} a_n \varphi\Bigl( \frac{n}{N} \Bigr) n^{-s}.
\]
Let $\sigma$ be a real number such that $\sigma_0 < \sigma < 1$. Then we have
\[
  f(s) - f_N(s) = - \frac{1}{2i\pi} \int_{(-\delta)} f(s + w) N^w \hat{\varphi}(w) \, dw - c N^{1-s} \hat{\varphi}(1 - s)
\]
for any $s = \sigma + it$ and any $\delta > 0$ such that $-\delta + \sigma > \sigma_0$.
It is of course possible that c = 0, which corresponds to a Dirichlet series that is
holomorphic for Re(s) > σ0 .
This result gives a convergent approximation of $f(s)$, inside the strip $\sigma_0 < \mathrm{Re}(s) < 1$,
using the finite sums $f_N(s)$: the point is that $|N^w| = N^{-\delta}$, so that the polynomial growth
of $f$ on vertical lines combined with the fast decay of the Mellin transform $\hat{\varphi}$ shows that
the integral on the right tends to 0 as $N \to +\infty$. Moreover, the shape of the formula
makes it very accessible to further manipulations, as done in Chapter 3.
Proof. Fix $\alpha > 1$ such that the Dirichlet series $f(s)$ converges absolutely for $\mathrm{Re}(s) = \alpha$.
By the Mellin inversion formula, followed by exchanging the order of the sum and
integral, we have
\begin{align*}
  f_N(s) &= \sum_{n \ge 1} a_n n^{-s} \times \frac{1}{2i\pi} \int_{(\alpha)} N^w n^{-w} \hat{\varphi}(w) \, dw \\
  &= \frac{1}{2i\pi} \int_{(\alpha)} \Bigl( \sum_{n \ge 1} a_n n^{-s-w} \Bigr) N^w \hat{\varphi}(w) \, dw \\
  &= \frac{1}{2i\pi} \int_{(\alpha)} f(s + w) N^w \hat{\varphi}(w) \, dw,
\end{align*}
where the absolute convergence justifies the exchange of sum and integral.
Now consider some $T \ge 1$, and some $\delta$ such that $0 < \delta < 1$. Let $R_T$ be the rectangle
in $\mathbf{C}$ with sides $[\alpha - iT, \alpha + iT]$, $[\alpha + iT, -\delta + iT]$, $[-\delta + iT, -\delta - iT]$, $[-\delta - iT, \alpha - iT]$,
oriented counterclockwise. Inside this rectangle, the function
\[
  w \mapsto f(s + w) N^w \hat{\varphi}(w)
\]
is meromorphic. It has a simple pole at $w = 0$, by our choice of $\delta$ and the properties
of the Mellin transform of $\varphi$ given by Proposition A.3.1, and the residue at $w = 0$ is
$\varphi(0) f(s) = f(s)$, again by Proposition A.3.1. If $c \ne 0$, it may also have a simple pole at
$w = 1 - s$, with residue equal to $c N^{1-s} \hat{\varphi}(1 - s)$.
Cauchy’s theorem therefore implies that
\[
  \frac{1}{2i\pi} \int_{R_T} f(s + w) N^w \hat{\varphi}(w) \, dw = f(s) + c N^{1-s} \hat{\varphi}(1 - s).
\]
Now we let $T \to +\infty$. Our assumptions imply that $w \mapsto f(s + w)$ has polynomial
growth on the strip $-\delta \le \mathrm{Re}(w) \le \alpha$, and therefore the fast decay of $\hat{\varphi}$ (Proposition A.3.1
again) shows that the contribution of the two horizontal segments to the integral along
$R_T$ tends to 0 as $T \to +\infty$. The integral over the vertical segment with real part $\alpha$
converges to
\[
  \frac{1}{2i\pi} \int_{(\alpha)} f(s + w) N^w \hat{\varphi}(w) \, dw = f_N(s).
\]
Taking into account orientation, we deduce that
\[
  f(s) - f_N(s) = - \frac{1}{2i\pi} \int_{(-\delta)} f(s + w) N^w \hat{\varphi}(w) \, dw - c N^{1-s} \hat{\varphi}(1 - s),
\]
as claimed. □

We also recall the formula for the product of two Dirichlet series, which involves the
so-called Dirichlet convolution (see also Section C.1 for more properties and examples of
this operation).
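Proposition A.4.4. Let $(a(n))_{n \ge 1}$ and $(b(n))_{n \ge 1}$ be sequences of complex numbers.
For any $s \in \mathbf{C}$ such that the Dirichlet series
\[
  A(s) = \sum_{n \ge 1} a(n) n^{-s}, \qquad B(s) = \sum_{n \ge 1} b(n) n^{-s},
\]
converge absolutely, we have
\[
  A(s) B(s) = \sum_{n \ge 1} c(n) n^{-s},
\]
where
\[
  c(n) = \sum_{\substack{d \mid n \\ d \ge 1}} a(d) \, b\Bigl( \frac{n}{d} \Bigr).
\]
We will denote $c(n) = (a \star b)(n)$, and often abbreviate the definition by writing
\[
  (a \star b)(n) = \sum_{d \mid n} a(d) \, b\Bigl( \frac{n}{d} \Bigr), \quad \text{or} \quad (a \star b)(n) = \sum_{de = n} a(d) b(e).
\]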
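Proof. Formally, this is quite clear: denoting by $C(s)$ the Dirichlet series with coefficients $c(n)$, we have
\begin{align*}
  A(s) B(s) &= \Bigl( \sum_{n \ge 1} a(n) n^{-s} \Bigr) \Bigl( \sum_{m \ge 1} b(m) m^{-s} \Bigr) \\
  &= \sum_{m, n \ge 1} a(n) b(m) (nm)^{-s} = \sum_{k \ge 1} k^{-s} \Bigl( \sum_{mn = k} a(n) b(m) \Bigr) = C(s),
\end{align*}
and the assumptions are sufficient to allow us to rearrange the double series so that these
manipulations are valid. □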
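A short Python sketch makes the Dirichlet convolution concrete. It computes $a \star b$ up to a bound $N$ from the formula $\sum_{de = n} a(d) b(e)$ and checks the product formula on truncated series for $a = b = 1$ at $s = 3$; the truncation point and the helper names are our own choices, and the two truncated values only agree approximately.

```python
def dirichlet_convolution(a, b, N):
    """Return a list c with c[n] = sum_{de = n} a(d) b(e) for 1 <= n <= N (c[0] is unused)."""
    c = [0] * (N + 1)
    for d in range(1, N + 1):
        for e in range(1, N // d + 1):
            c[d * e] += a(d) * b(e)
    return c

def one(n):
    """The constant arithmetic function 1."""
    return 1

N, s = 2000, 3.0
tau = dirichlet_convolution(one, one, N)           # (1 * 1)(n) is the number of divisors of n
A = sum(n ** (-s) for n in range(1, N + 1))        # truncation of A(s) = B(s)
C = sum(tau[n] * n ** (-s) for n in range(1, N + 1))
print(tau[12], A * A, C)                           # tau[12] counts 1, 2, 3, 4, 6, 12
```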

A.5. Density of certain sets of holomorphic functions


Let $D$ be a non-empty open disc in $\mathbf{C}$ and $\bar{D}$ its closure. We denote by $H(D)$ the
Banach space of all continuous functions $f : \bar{D} \to \mathbf{C}$ which are holomorphic in $D$, with
the norm
\[
  \|f\|_{\infty} = \sup_{z \in \bar{D}} |f(z)|.
\]
We also denote by $C(K)$ the Banach space of continuous functions on a compact space
$K$, also with the norm
\[
  \|f\|_{\infty} = \sup_{x \in K} |f(x)|
\]
(so that there is no risk of confusion if $K = \bar{D}$ and we apply this to a function that also
belongs to $H(D)$). We denote by $C(K)'$ the dual of $C(K)$, namely the space of continuous
linear functionals $C(K) \to \mathbf{C}$. An element $\mu \in C(K)'$ can also be interpreted as a
complex measure on $K$ (by the Riesz–Markov Theorem, see e.g. [40, Th. 7.17]), and in
this interpretation one would write
\[
  \mu(f) = \int_K f(x) \, d\mu(x)
\]
for $f \in C(K)$.
Theorem A.5.1. Let $D$ be as above. Let $(f_n)_{n \ge 1}$ be a sequence of elements of $H(D)$
with
\[
  \sum_{n \ge 1} \|f_n\|_{\infty}^2 < +\infty.
\]
Let $X$ be the set of sequences $(\alpha_n)$ of complex numbers with $|\alpha_n| = 1$ such that the
series
\[
  \sum_{n \ge 1} \alpha_n f_n
\]
converges in $H(D)$.
Assume that $X$ is not empty and that, for any continuous linear functional $\mu \in C(\bar{D})'$
such that
\[
  \sum_{n \ge 1} |\mu(f_n)| < +\infty, \tag{A.3}
\]
the Laplace transform of $\mu$ is identically 0. Then for any $N \ge 1$, the set of series
\[
  \sum_{n \ge N} \alpha_n f_n
\]
for $(\alpha_n)$ in $X$ is dense in $H(D)$.
Here, the Laplace transform of $\mu$ is defined by
\[
  g(z) = \mu(w \mapsto e^{wz})
\]
for $z \in \mathbf{C}$. In the interpretation of $\mu$ as a complex measure, which can be viewed as a
complex measure on $\mathbf{C}$ that is supported on $\bar{D}$, one would write
\[
  g(z) = \int_{\mathbf{C}} e^{wz} \, d\mu(w).
\]

Proof. This result is proved, for instance, in [4, Lemma 5.2.9], except that only the
case $N = 1$ is considered. However, if the assumptions hold for $(f_n)_{n \ge 1}$, they hold equally
for $(f_n)_{n \ge N}$, hence the general case follows. □
We will use the last part of the following lemma as a criterion to establish that the
Laplace transform is zero in certain circumstances.
Lemma A.5.2. Let $K$ be a compact subset of $\mathbf{C}$ and $\mu \in C(K)'$ a continuous linear
functional. Let
\[
  g(z) = \int e^{wz} \, d\mu(w) = \mu(w \mapsto e^{wz})
\]
be its Laplace transform.
(1) The function $g$ is an entire function on $\mathbf{C}$, i.e., it is holomorphic on $\mathbf{C}$.
(2) We have
\[
  \limsup_{|z| \to +\infty} \frac{\log |g(z)|}{|z|} < +\infty.
\]
(3) If $g \ne 0$, then
\[
  \limsup_{r \to +\infty} \frac{\log |g(r)|}{r} \ge \inf_{z \in K} \mathrm{Re}(z).
\]

Proof. (1) Let $z \in \mathbf{C}$ be fixed. For $h \ne 0$, we have
\[
  \frac{g(z + h) - g(z)}{h} = \mu(f_h)
\]
where $f_h(w) = (e^{w(z+h)} - e^{wz})/h$. We have
\[
  f_h(w) \to w e^{wz}
\]
as $h \to 0$, and the convergence is uniform on $K$. Hence we get
\[
  \frac{g(z + h) - g(z)}{h} \longrightarrow \mu(w \mapsto w e^{wz}),
\]
which shows that $g$ is holomorphic at $z$ with derivative $\mu(w \mapsto w e^{wz})$. Since $z$ is arbitrary,
this means that $g$ is entire.
(2) We have
\[
  |g(z)| \le \|\mu\| \, \|w \mapsto e^{wz}\|_{\infty} \le \|\mu\| e^{|z| M}
\]
where $M = \sup_{w \in K} |w|$, and therefore
\[
  \limsup_{|z| \to +\infty} \frac{\log |g(z)|}{|z|} \le M < +\infty.
\]
(3) This is proved, for instance, in [4, Lemma 5.2.2], using relatively elementary
properties of entire functions satisfying growth conditions such as those in (2). □
Finally, we will use the following theorem of Bernstein, extending a result of Pólya.
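Theorem A.5.3. Let $g : \mathbf{C} \to \mathbf{C}$ be an entire function such that
\[
  \limsup_{|z| \to +\infty} \frac{\log |g(z)|}{|z|} < +\infty.
\]
Let $(r_k)$ be a sequence of positive real numbers, and let $\alpha$, $\beta$ be real numbers such that
(1) We have $\alpha \beta < \pi$;
(2) We have
\[
  \limsup_{\substack{y \in \mathbf{R} \\ |y| \to +\infty}} \frac{\log |g(iy)|}{|y|} \le \alpha.
\]
(3) We have $|r_k - r_l| \gg |k - l|$ for all $k, l \ge 1$, and $r_k / k \to \beta$.
Then it follows that
\[
  \limsup_{k \to +\infty} \frac{\log |g(r_k)|}{r_k} = \limsup_{r \to +\infty} \frac{\log |g(r)|}{r}.
\]
This is explained in [4, Lemma 5.2.3].
Example A.5.4. Taking $g(z) = \sin(z)$, with $\alpha = 1$, $r_n = n\pi$ so that $\beta = \pi$, we see
that the first condition is best possible: here $\alpha \beta = \pi$ and $g(r_n) = 0$ for all $n$, whereas
$\log |g(r)| / r$ has limsup equal to 0 as $r \to +\infty$.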
We also use a relatively elementary lemma due to Hurwitz on zeros of holomorphic
functions:
Lemma A.5.5. Let $D$ be a non-empty open disc in $\mathbf{C}$. Let $(f_n)$ be a sequence of
holomorphic functions in $H(D)$. Assume $f_n$ converges to $f$ in $H(D)$. If $f_n(z) \ne 0$ for all
$n \ge 1$ and $z \in D$, then either $f = 0$ or $f$ does not vanish on $D$.
Proof. We assume that $f$ is not zero, and show that it has no zero in $D$. Let $z_0 \in D$
be fixed, and let $C$ be a circle of radius $r > 0$ centered at $z_0$ such that $C \subset D$ and
such that $f$ has no zero, except possibly $z_0$, in the disc with boundary $C$. We have then
$\delta = \inf_{z \in C} |f(z)| > 0$. For $n$ large enough, we get
\[
  \sup_{z \in C} |f(z) - f_n(z)| < \delta,
\]
and then the relation $f = f - f_n + f_n$ combined with Rouché’s Theorem (see, e.g., [116,
3.42]) shows that $f$ has the same number of zeros as $f_n$ in the disc bounded by $C$. This
means that $f$ has no zeros there, and in particular that $f(z_0) \ne 0$. □
APPENDIX B

Probability

This Appendix summarizes the probabilistic notions that are most important in the
book. Although many readers will not need to be reminded of the basic definitions,
they might still refer to it to check some easy probabilistic statements whose proof we
have included here to avoid disrupting the arguments in the main part of the book. For
convergence in law, we will refer mostly to the book of Billingsley [10], and for random
series and similar topics, to that of Li and Queffélec [83].
B.1. The Riesz representation theorem
Let X be a locally compact topological space (such as $\mathbf{R}^d$ for $d \ge 1$). We recall
that Radon measures on X are certain measures for which compact subsets of X have
finite measure, and which satisfy some regularity property (the latter requirement being
unnecessary if any open set in X is a countable union of compact sets, as is the case of R
for instance).
The Riesz representation theorem interprets Radon measures in terms of the corre-
sponding integration functional. It can be taken as a definition (and it is indeed the
definition in Bourbaki’s theory of integration [17]); for a proof in the usual context where
measures are defined “set-theoretically”, see, e.g., [106, Th 2.14].
Theorem B.1.1. Let $X$ be a locally compact topological space and $C_c(X)$ the space of
compactly supported continuous functions on $X$. For any linear form $\lambda : C_c(X) \to \mathbf{C}$ such
that $\lambda(f) \ge 0$ if $f \ge 0$, there exists a unique Radon measure $\mu$ on $X$ such that
\[
  \lambda(f) = \int_X f \, d\mu
\]
for all $f \in C_c(X)$.
Let $k \ge 1$ be an integer. If $X = \mathbf{R}^k$ (or an open set in $\mathbf{R}^k$), then Radon measures can
be identified by the integration of much more regular functions. For instance, we have
the following (see, e.g., [28, Th. 3.18]):
Proposition B.1.2. Let $C_c^{\infty}(\mathbf{R}^k)$ be the space of smooth compactly-supported
functions on $\mathbf{R}^k$. For any linear form $\lambda : C_c^{\infty}(\mathbf{R}^k) \to \mathbf{C}$ such that $\lambda(f) \ge 0$ if $f \ge 0$, there
exists a unique Radon measure $\mu$ on $\mathbf{R}^k$ such that
\[
  \lambda(f) = \int_{\mathbf{R}^k} f \, d\mu
\]
for all $f \in C_c^{\infty}(\mathbf{R}^k)$.

Remark B.1.3. When applying either form of the Riesz representation theorem,
we may wish to identify whether the measure $\mu$ obtained from the linear form $\lambda$ is a
probability measure on $X$ or not. This is the case if and only if
\[
  \sup_{\substack{f \in C_c(X) \\ 0 \le f \le 1}} \lambda(f) = 1
\]
(see, e.g., [17, Ch. 4, § 1, no 8]) where, in the setting of Proposition B.1.2, we may also
restrict $f$ to be smooth.
Moreover, if a positive linear form $\lambda : C_c(X) \to \mathbf{C}$ admits an extension to a linear
form $\lambda : C_b(X) \to \mathbf{C}$, where $C_b(X)$ is the space of continuous and bounded functions on $X$,
which is still positive (so $\lambda(f) \ge 0$ if $f \in C_b(X)$ is non-negative), then the measure $\mu$
associated to $\lambda$ is a probability measure if and only if $\lambda(1) = 1$, where 1 on the left is the
constant function. (This is natural enough, but it is not entirely obvious; the underlying
reason is that the positivity implies that
\[
  |\lambda(f)| \le \|f\|_{\infty} \lambda(1)
\]
where $\|f\|_{\infty}$ is the supremum norm for a bounded continuous function, so that $\lambda$ is a
continuous linear form on the Banach space $C_b(X)$.)

B.2. Support of a measure


Let M be a topological space. If M is either second countable (i.e., there is a basis of
open sets that is countable) or compact, then any Borel measure µ on M has a well-defined
closed support, denoted supp(µ), which is characterized by either of the following prop-
erties: (1) it is the complement of the largest open set U, with respect to inclusion, such
that µ(U) = 0; or (2) it is the set of those x ∈ M such that, for any open neighborhood
U of x, we have µ(U) > 0.
If X is a random variable with values in M, we will say that the support of X is the
support of the law of X, which is a probability measure on M.
We need the following elementary property of the support of a measure:
Lemma B.2.1. Let M and N be topological spaces that are each either second countable
or compact. Let µ be a probability measure on M, and let f : M −→ N be a continuous
map. The support of f∗ (µ) is the closure of f (supp(µ)).
We recall that given a probability measure $\mu$ on $M$ and a continuous map $f : M \to N$,
the image measure $f_*(\mu)$ is defined by
\[
  f_*(\mu)(A) = \mu(f^{-1}(A))
\]
for a measurable set $A \subset N$, and it satisfies
\[
  \int_N \varphi(x) \, d(f_* \mu)(x) = \int_M \varphi(f(y)) \, d\mu(y)
\]
for $\varphi \ge 0$ and measurable, or $\varphi \circ f$ integrable with respect to $\mu$.

Proof. First, if $y = f(x)$ for some $x \in \mathrm{supp}(\mu)$, and if $U$ is an open neighborhood
of $y$, then we can find an open neighborhood $V \subset M$ of $x$ such that $f(V) \subset U$. Then
$(f_* \mu)(U) \ge \mu(V) > 0$. This shows that $y$ belongs to the support of $f_* \mu$. Since the support
is closed, we deduce that the closure of $f(\mathrm{supp}(\mu))$ is contained in $\mathrm{supp}(f_* \mu)$.
For the converse, let $y \in N$ be in the support of $f_* \mu$. For any open neighborhood $U$ of
$y$, we have $\mu(f^{-1}(U)) = (f_* \mu)(U) > 0$. This implies that $f^{-1}(U) \cap \mathrm{supp}(\mu)$ is not empty,
and since $U$ is arbitrary, that $y$ belongs to the closure of $f(\mathrm{supp}(\mu))$. □
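For a finitely supported measure, the image measure and both statements above can be checked by direct computation. In the sketch below, a measure is represented (as an assumption of ours) by a dictionary mapping points to masses; in this discrete setting every set is closed, so the closure in Lemma B.2.1 plays no role.

```python
from fractions import Fraction

def pushforward(mu, f):
    """Image measure f_*(mu) of a finitely supported measure mu = {point: mass}."""
    image = {}
    for x, mass in mu.items():
        image[f(x)] = image.get(f(x), Fraction(0)) + mass
    return image

def support(mu):
    """Points carrying positive mass."""
    return {x for x, mass in mu.items() if mass > 0}

def square(x):
    return x * x

def phi(y):
    return y + 1

mu = {-2: Fraction(1, 4), -1: Fraction(1, 4), 1: Fraction(1, 2), 3: Fraction(0)}
nu = pushforward(mu, square)

lhs = sum(phi(y) * m for y, m in nu.items())            # integral of phi against f_* mu
rhs = sum(phi(square(x)) * m for x, m in mu.items())    # integral of phi(f(y)) against mu
print(nu)
print(lhs == rhs, support(nu) == {square(x) for x in support(mu)})
```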

Recall that a family $(X_i)_{i \in I}$ of random variables, each taking possibly values in a
different metric space $M_i$, is independent if, for any finite subset $J \subset I$, the joint distribution
of $(X_j)_{j \in J}$ is the measure on $\prod_{j \in J} M_j$ which is the product measure of the laws of the $X_j$’s.
Lemma B.2.2. Let $X = (X_i)_{i \in I}$ be a finite family of independent random variables
with values in a topological space $M$ that is compact or second countable. Viewing $X$ as a
random variable taking values in $M^I$, we have
\[
  \mathrm{supp}(X) = \prod_{i \in I} \mathrm{supp}(X_i).
\]
Proof. If $x = (x_i) \in M^I$, then an open neighborhood $U$ of $x$ contains a product set
$\prod_i U_i$, where $U_i$ is an open neighborhood of $x_i$ in $M$. Then we have
\[
  P(X \in U) \ge P\Bigl( X \in \prod_i U_i \Bigr) = \prod_i P(X_i \in U_i)
\]
by independence. If $x_i \in \mathrm{supp}(X_i)$ for each $i$, then this is $> 0$, and hence $x \in \mathrm{supp}(X)$.
Conversely, if $x \in \mathrm{supp}(X)$, then for any $j \in I$, and any open neighborhood $U$ of $x_j$,
the set
\[
  V = \{y = (y_i)_{i \in I} \in M^I \mid y_j \in U\} \subset M^I
\]
is an open neighborhood of $x$. Hence we have $P(X \in V) > 0$, and since $P(X \in V) =
P(X_j \in U)$, it follows that $x_j$ is in the support of $X_j$. □

B.3. Convergence in law


Let M be a metric space. We view it as given with the Borel σ-algebra generated
by open sets, and we denote by Cb (M) the Banach space of bounded complex-valued
continuous functions on M, with the norm
\[ \|f\|_\infty = \sup_{x \in M} |f(x)|. \]

Given a sequence (µn) of probability measures on M, and a probability measure µ on
M, one says that µn converges weakly to µ if and only if, for any bounded and continuous
function f : M −→ R, we have
\[ \int_M f(x)\, d\mu_n(x) \longrightarrow \int_M f(x)\, d\mu(x). \tag{B.1} \]
If (Ω, Σ, P) is a probability space and (Xn)n≥1 is a sequence of M-valued random
variables, and if X is an M-valued random variable, then one says that (Xn ) converges in
law to X if and only if the measures Xn (P) converge weakly to X(P). If µ is a probability
measure on M, then we will also say that Xn converges to µ if the measures Xn (P)
converge weakly to µ.
The probabilistic version of (B.1) in those cases is that
\[ E(f(X_n)) \longrightarrow E(f(X)) = \int_M f\, d\mu \tag{B.2} \]
for all functions f ∈ Cb (M).
Remark B.3.1. If M = R, convergence in law is often introduced in terms of the
distribution function FX(x) = P(X ≤ x) of a real-valued random variable X. Precisely, it
is classical (see, e.g., [9, Th. 25.8]) that a sequence of real-valued random variables (Xn)
converges in law to a random variable X if and only if FXn(x) → FX(x) for all x ∈ R
such that FX is continuous at x (which is true for all but at most countably many x,
namely all x such that P(X = x) = 0).
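The restriction to continuity points in this remark can be seen on a small example. The following Python sketch is only an illustration under stated assumptions: the helper names `cdf_constant`, `cdf_discrete_uniform` and `cdf_uniform` are hypothetical and do not come from the text. It compares distribution functions in two cases: the constants Xn = 1/n, which converge in law to X = 0 although FXn(0) = 0 for all n while FX(0) = 1, and the uniform laws on {1/n, . . . , n/n}, whose distribution functions differ from that of the Lebesgue measure on [0, 1] by at most 1/n.

```python
import math


def cdf_constant(a):
    """Distribution function x -> P(X <= x) of the constant random variable X = a."""
    return lambda x: 1.0 if x >= a else 0.0


def cdf_discrete_uniform(n):
    """Distribution function of the uniform law on {1/n, 2/n, ..., n/n}."""
    def F(x):
        # floor(n * x), clipped to [0, n], counts the k in 1..n with k/n <= x
        count = max(0, min(math.floor(n * x), n))
        return count / n
    return F


def cdf_uniform(x):
    """Distribution function of the Lebesgue measure on [0, 1]."""
    return min(max(x, 0.0), 1.0)


if __name__ == "__main__":
    # X_n = 1/n converges in law to X = 0, yet the values at x = 0 never agree.
    for n in (10, 1000):
        print(n, cdf_constant(1 / n)(0.0), cdf_constant(0.0)(0.0))
    # Discrete uniform laws: the distribution functions converge everywhere.
    for n in (10, 1000):
        gap = max(abs(cdf_discrete_uniform(n)(x) - cdf_uniform(x)) for x in (0.1, 0.5, 0.9))
        print(n, gap)
```

In the first case the only discontinuity point of FX is x = 0, which is exactly where the pointwise convergence fails.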
The definition immediately implies the following very useful fact, which we state in
probabilistic language (we will refer to it as the composition principle).
Proposition B.3.2. Let M be a metric space. Let (Xn ) be a sequence of M-valued
random variables such that Xn converges in law to a random variable X. For any metric
space N and any continuous function ϕ : M → N, the N-valued random variables ϕ ◦ Xn
converge in law to ϕ ◦ X.
Proof. For any continuous and bounded function f : N −→ C, the composite f ◦ ϕ
is bounded and continuous on M, and therefore convergence in law implies that
E(f (ϕ(Xn ))) −→ E(f (ϕ(X))).
By definition, this formula, valid for all f , means that ϕ(Xn ) converges in law to ϕ(X). 
Checking the condition (B.2) for all f ∈ Cb (M) may be difficult. A number of con-
venient criteria, and properties, of convergence in law are related to weakening this re-
quirement to only certain “test functions” f , which may be more regular, or have special
properties. We will discuss some of these in the next sections.
An important consequence of convergence in law is a simple relation with the
support of the limit of a sequence of random variables.
Lemma B.3.3. Let M be a second countable or compact topological space. Let (Xn ) be
a sequence of M-valued random variables, defined on some probability spaces Ωn . Assume
that (Xn ) converges in law to some random variable X, and let N ⊂ M be the support of
the law of X.
(1) For any x ∈ N and for any open neighborhood U of x, we have
\[ \liminf_{n \to +\infty} P(X_n \in U) > 0, \]
and in particular there exists some n ≥ 1 and some ω ∈ Ωn such that Xn(ω) ∈ U.
(2) For any x ∈ M not belonging to N, there exists an open neighborhood U of x such
that
\[ \limsup_{n \to +\infty} P(X_n \in U) = 0. \]

Proof. For (1), a standard equivalent form of convergence in law is that, for any
open set U ⊂ M, we have
\[ \liminf_{n \to +\infty} P(X_n \in U) \geq P(X \in U) \]
(see [10, Th. 2.1, (i) and (iv)]). If x ∈ N and U is an open neighborhood of x, then by
definition we have P(X ∈ U) > 0, and therefore
\[ \liminf_{n \to +\infty} P(X_n \in U) > 0. \]
For (2), if x ∈ M is not in N, there exists an open neighborhood V of x such that
P(X ∈ V) = 0. For some δ > 0, this neighborhood contains the closed ball C of radius δ
around x, and by [10, Th. 2.1., (i) and (iii)], we have
\[ 0 \leq \limsup_{n \to +\infty} P(X_n \in C) \leq P(X \in C) = 0, \]
hence the second assertion with U the open ball of radius δ. □


Another useful relation between support and convergence in law is the following:
Corollary B.3.4. Let M be a second countable or compact topological space. Let
(Xn ) be a sequence of M-valued random variables, defined on some probability spaces Ωn
such that (Xn ) converges in law to a random variable X. Let g be a continuous function
on M such that g(Xn ) converges in probability to zero, i.e., we have
\[ \lim_{n \to +\infty} P_n(|g(X_n)| > \delta) = 0 \]

for all δ > 0. The support of X is then contained in the zero set of g.
Proof. Let N be the support of X. Suppose that there exists x ∈ N such that g(x) ≠ 0,
and pick δ with 0 < δ < |g(x)|. Since the set of all y ∈ M such that |g(y)| > δ is an open
neighborhood of x, we have
\[ \liminf_{n \to +\infty} P(|g(X_n)| > \delta) > 0 \]
by the previous lemma; this contradicts the assumption, which implies that
\[ \lim_{n \to +\infty} P(|g(X_n)| > \delta) = 0. \]
□

Remark B.3.5. Another proof is that it is well-known (and elementary) that conver-
gence in probability implies convergence in law, so in the situation of the corollary, the
sequence (g(Xn )) converges to 0 in law. Since it also converges to g(X) by composition,
we have P(g(X) ≠ 0) = 0, which precisely means that the support of X is contained in
the zero set of g.
We also recall an important definition that is a property of weak-compactness for a
family of probability measures (or random variables).
Definition B.3.6 (Tightness). Let M be a complete separable metric space. Let
(µi )i∈I be a family of probability measures on M. One says that (µi ) is tight if for any
ε > 0, there exists a compact subset K ⊂ M such that µi (K) > 1 − ε for all i ∈ I.
It is a non-obvious fact that a single probability measure on a complete separable
metric space is tight (see [10, Th. 1.3]).
B.4. Perturbation and convergence in law
As we suggested in Section 1.2, we will often prove convergence in law of the sequences
of random variables that interest us by showing that they are obtained by “perturbation”
of other sequences that are more accessible. In this section, we explain how to handle
some of these perturbations.
A very useful tool for this purpose is the following property, which is a first example of
reducing the proof of convergence in law to more regular test functions than all bounded
continuous functions.
Let M be a metric space, with distance d. Recall that a continuous function f : M → C
is said to be a Lipschitz function if there exists a real number C > 0 such that
|f(x) − f(y)| ≤ C d(x, y)
for all (x, y) ∈ M × M. We then say that C is a Lipschitz constant for f (it is, of course,
not unique).
Proposition B.4.1. Let M be a complete separable metric space. Let (Xn ) be a
sequence of M-valued random variables, and µ a probability measure on M. Then Xn
converges in law to µ if and only if we have
\[ E(f(X_n)) \to \int_M f(x)\, d\mu(x) \]
for all bounded Lipschitz functions f : M −→ C.
In other words, it is enough to prove the convergence property (B.2) for Lipschitz test
functions.
Proof. A classical argument shows that convergence in law of (Xn) to µ is equivalent
to
\[ \mu(F) \geq \limsup_{n \to +\infty} P(X_n \in F) \tag{B.3} \]
for all closed subsets F of M (see, e.g., [10, Th. 2.1, (iii)]).
However, the proof that convergence in law implies this property uses only Lipschitz
test functions f (see for instance [10, (ii)⇒(iii), p. 16, and (1.1), p. 8], where it is
only stated that the relevant functions f are uniformly continuous, but this is shown by
checking that they are Lipschitz). Hence the assumption that (B.2) holds for Lipschitz
functions implies (B.3) for all closed subsets F, and consequently it implies convergence
in law. 
We can now deduce various corollaries concerning perturbation of sequences that
converge in law.
The first result along these lines is quite standard, and the second is a bit more ad-hoc,
but will be convenient in Chapter 6.
Corollary B.4.2. Let M be a separable Banach space. Let (Xn ) and (Yn ) be se-
quences of M-valued random variables. Assume that the sequence (Xn ) converges in law
to a random variable X.
If the sequence (Yn) converges in probability to 0, or if (Yn) converges to 0 in L^p
for some fixed p ≥ 1, with the possibility that p = +∞, then the sequence (Xn + Yn)n
converges in law to X in M.
Proof. Let f : M −→ C be a bounded Lipschitz continuous function, and C a Lipschitz
constant of f. For any n, we have
\[ |E(f(X_n + Y_n)) - E(f(X))| \leq |E(f(X_n + Y_n)) - E(f(X_n))| + |E(f(X_n)) - E(f(X))|. \tag{B.4} \]
First assume that (Yn) converges to 0 in L^p, and that p < +∞. Then we obtain
\begin{align*}
|E(f(X_n + Y_n)) - E(f(X))| &\leq C\, E(|Y_n|) + |E(f(X_n)) - E(f(X))| \\
&\leq C\, E(|Y_n|^p)^{1/p} + |E(f(X_n)) - E(f(X))|
\end{align*}
which converges to 0, hence (Xn + Yn) converges in law to X. If p = +∞, a similar
argument, left to the reader, applies.
Suppose now that (Yn) converges to 0 in probability. Let ε > 0. For n large enough,
the second term in (B.4) is ≤ ε since Xn converges in law to X. For the first, we fix
another parameter δ > 0 and write
\[ |E(f(X_n + Y_n)) - E(f(X_n))| \leq C\delta + 2\|f\|_\infty P(|Y_n| > \delta) \]
by separating the integral depending on whether |Yn| ≤ δ or not. Take δ = C^{−1}ε, so the
first term here is ≤ ε. Then since (Yn) converges in probability to 0, we have
\[ 2\|f\|_\infty P(|Y_n| > \delta) \leq \varepsilon \]
for all n large enough, and therefore
\[ |E(f(X_n + Y_n)) - E(f(X))| \leq 3\varepsilon \]
for all n large enough. The result now follows from Proposition B.4.1. □
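The L^p case of this proof can be followed on a concrete finite probability space. The sketch below is a minimal illustration, not part of the original argument: it takes Xn uniform on {1/n, . . . , n/n} (which converges in law to the uniform law on [0, 1]), a hypothetical perturbation Yn = ±n^{−1/2} defined on the same space, and the test function f = sin, which is bounded and Lipschitz with constant C = 1. The names `expectation_perturbed` and `alternating_noise` are assumptions made for this example.

```python
import math


def expectation_perturbed(f, n, perturbation):
    """E(f(X_n + Y_n)), where X_n is uniform on {1/n, ..., n/n} and Y_n takes the value
    perturbation(k, n) on the event X_n = k/n (same finite probability space)."""
    return sum(f(k / n + perturbation(k, n)) for k in range(1, n + 1)) / n


def alternating_noise(k, n):
    """A hypothetical perturbation with |Y_n| = n**(-1/2), so that E(|Y_n|) -> 0."""
    return (-1) ** k / math.sqrt(n)


if __name__ == "__main__":
    f = math.sin                   # bounded, Lipschitz with constant C = 1
    limit = 1.0 - math.cos(1.0)    # integral of sin over [0, 1], i.e. E(f(X))
    for n in (100, 10_000):
        value = expectation_perturbed(f, n, alternating_noise)
        print(n, abs(value - limit))
```

The printed differences are bounded by C E(|Yn|) plus the error |E(f(Xn)) − E(f(X))|, as in the decomposition (B.4).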
Here is the second variant, where we do not attempt to optimize the assumptions.
Corollary B.4.3. Let m ≥ 1 be an integer. Let (Xn) and (Yn) be sequences of
R^m-valued random variables, let (αn) be a sequence in R^m and (βn) a sequence of real
numbers. Assume
(1) The sequence (Xn) converges in law to a random variable X, and ‖Xn‖ is bounded
by a constant N ≥ 0, independent of n.
(2) For all n, we have ‖Yn‖ ≤ βn.
(3) We have αn → (1, . . . , 1) and βn → 0 as n → +∞.
Then the sequence (αn · Xn + Yn)n converges in law to X in R^m, where here · denotes
the componentwise product of vectors.¹
Proof. We begin as in the previous corollary. Let f : R^m −→ C be a bounded
Lipschitz continuous function, and C a Lipschitz constant for f. For any n, we now have
\begin{align*}
|E(f(\alpha_n \cdot X_n + Y_n)) - E(f(X))| \leq{} & |E(f(\alpha_n \cdot X_n + Y_n)) - E(f(\alpha_n \cdot X_n))| \\
& + |E(f(\alpha_n \cdot X_n)) - E(f(X_n))| + |E(f(X_n)) - E(f(X))|.
\end{align*}
The last term tends to 0 since Xn converges in law to X. The second is at most
\[ C \|\alpha_n - (1, \ldots, 1)\|\, E(\|X_n\|) \leq CN \|\alpha_n - (1, \ldots, 1)\| \to 0, \]
and the first is at most
\[ C\, E(\|Y_n\|) \leq C\beta_n \to 0. \]
The result now follows from Proposition B.4.1. □
The last instance of perturbations is slightly different. It amounts, in practice, to using
some “auxiliary parameter m” to approximate a sequence of random variables; when the
error in such an approximation is suitably small, and the approximations converge in law
for each fixed m, we obtain convergence in law.
Proposition B.4.4. Let M be a finite-dimensional Banach space. Let (Xn)n≥1 and
(Xn,m)n≥m≥1 be M-valued random variables. Define En,m = Xn − Xn,m. Assume that
(1) For each m ≥ 1, the random variables (Xn,m)n≥m converge in law to a random
variable Ym.
(2) We have
\[ \lim_{m \to +\infty} \limsup_{n \to +\infty} E(\|E_{n,m}\|) = 0. \]
Then the sequences (Xn) and (Ym) converge in law (as n → +∞ and as m → +∞,
respectively), and have the same limit distribution.
The second assumption means in practice that
\[ E(\|E_{n,m}\|) \leq f(n, m) \]
where f(n, m) → 0 as m tends to +∞, uniformly for n ≥ m.
A statement of this kind can be found also, for instance, in [10, Th. 3.2], but the
latter assumes that it is already known that (Ym ) converges in law.
¹ I.e., we have (a1, . . . , am) · (b1, . . . , bm) = (a1 b1, . . . , am bm).

Proof. We begin by proving that (Xn) converges in law. Let f : M −→ R be a
bounded Lipschitz continuous function, and C a Lipschitz constant for f. For any n ≥ 1
and any m ≤ n, we have
\[ |E(f(X_n)) - E(f(X_{n,m}))| \leq C\, E(\|E_{n,m}\|), \]
hence
\[ E(f(X_{n,m})) - C\, E(\|E_{n,m}\|) \leq E(f(X_n)) \leq E(f(X_{n,m})) + C\, E(\|E_{n,m}\|). \]
Fix first m ≥ 1. By the first assumption, the expectations E(f(Xn,m)) converge
to E(f(Ym)) as n → +∞. Then these inequalities imply that we have
\[ \limsup_{n \to +\infty} E(f(X_n)) - \liminf_{n \to +\infty} E(f(X_n)) \leq 2C \limsup_{n \to +\infty} E(\|E_{n,m}\|) \]
(because any limit of a convergent subsequence of E(f(Xn)) will lie in an interval of length
at most the right-hand side). Letting m go to infinity, the second assumption allows us
to conclude that
\[ \limsup_{n \to +\infty} E(f(X_n)) - \liminf_{n \to +\infty} E(f(X_n)) = 0, \]
so that the sequence (E(f(Xn)))n≥1 converges.


Now consider the map µ defined on bounded Lipschitz functions on M by
\[ \mu(f) = \lim_{n \to +\infty} E(f(X_n)). \]
It is elementary that µ is linear, and that it is positive (in the sense that if f ≥ 0,
we have µ(f) ≥ 0) and satisfies µ(1) = 1. By the Riesz representation theorem (see
Proposition B.1.2 and Remark B.1.3, noting that a finite-dimensional Banach space is
locally compact), it follows that µ “is” a probability measure on M. It is then
tautological that (Xn) converges in law to a random vector X with probability law µ by
Proposition B.4.1.
It remains to prove that the sequence (Ym) also converges in law with limit X. We
again consider the Lipschitz function f, with Lipschitz constant C, and write
\[ |E(f(X_n)) - E(f(X_{n,m}))| \leq C\, E(\|E_{n,m}\|). \]
For a fixed m, we let n → +∞. Since we have proved that (Xn) converges to X, we
deduce by the first assumption that
\[ |E(f(X)) - E(f(Y_m))| \leq C \limsup_{n \to +\infty} E(\|E_{n,m}\|). \]
Since the right-hand side converges to 0 as m → +∞ by the second assumption, we
conclude that
\[ E(f(Y_m)) \to E(f(X)), \]
and finally that (Ym) converges to X. □
Remark B.4.5. If one knows that (Ym) converges in law, one also obtains convergence
in law of (Xn) (by a straightforward adaptation of the previous argument) if Assumption (2)
of the proposition is replaced by
\[ \lim_{m \to +\infty} \limsup_{n \to +\infty} P(\|E_{n,m}\| > \delta) = 0 \tag{B.5} \]
for any δ > 0; see again [10, Th. 3.2].


Remark B.4.6. Although we have stated all results in the case where the random
variables are defined on the same probability space, the proofs do not rely on this fact,
and the statements apply also if they are defined on spaces depending on n, with obvious
adaptations of the assumptions. For instance, in the last statement, we can take Xn and
Xn,m to be defined on a space Ωn (independent of m) and the second assumption means
that
\[ \lim_{m \to +\infty} \limsup_{n \to +\infty} E_n(|E_{n,m}|) = 0. \]

B.5. Convergence in law in a finite-dimensional vector space
We will use two important criteria for convergence in law for random variables with
values in a finite-dimensional real vector space V, which both amount to testing (B.1) for
a restricted set of functions. Another important criterion applies to variables with values
in a compact topological group, and is reviewed below in Section B.6.
The first result is valid in all cases, and is based on the Fourier transform. Given an
integer m ≥ 1 and a probability measure µ on R^m, recall that the characteristic function
(or Fourier transform) of µ is the function
\[ \varphi_\mu : \mathbf{R}^m \longrightarrow \mathbf{C} \]
defined by
\[ \varphi_\mu(t) = \int_{\mathbf{R}^m} e^{it \cdot x}\, d\mu(x), \]
where t · x = t1 x1 + · · · + tm xm is the standard inner-product. This is a continuous
bounded function on R^m. For a random vector X with values in R^m, we denote by ϕX
the characteristic function of X(P), namely
\[ \varphi_X(t) = E(e^{it \cdot X}). \]
We state two (obviously equivalent) versions of P. Lévy’s theorem for convenience:
Theorem B.5.1 (Lévy Criterion). Let m ≥ 1 be an integer.
(1) Let (µn) be a sequence of probability measures on R^m, and let µ be a probability
measure on R^m. Then (µn) converges weakly to µ if and only if, for any t ∈ R^m, we have
\[ \varphi_{\mu_n}(t) \longrightarrow \varphi_\mu(t) \]
as n → +∞.
(2) Let (Ω, Σ, P) be a probability space. Let (Xn)n≥1 be R^m-valued random vectors on
Ω, and let X be an R^m-valued random vector. Then (Xn) converges in law to X if and
only if, for all t ∈ R^m, we have
\[ E(e^{it \cdot X_n}) \longrightarrow E(e^{it \cdot X}). \]
For a proof, see e.g. [9, Th. 26.3] in the case m = 1.
Remark B.5.2. In fact, the precise version of Lévy’s Theorem does not require
knowing the limit of the sequence in advance: if a sequence (µn) of probability measures is
such that, for all t ∈ R^m, we have
\[ \varphi_{\mu_n}(t) \longrightarrow \varphi(t) \]
for some function ϕ, and if ϕ is continuous at 0, then one can show that ϕ is the
characteristic function of a probability measure µ (and hence that µn converges weakly
to µ); see for instance [9, p. 350, cor. 1]. So, for instance, it is not necessary to know
beforehand that ϕ(t) = e^{−t^2/2} is the characteristic function of a probability measure in
order to prove the Central Limit Theorem using Lévy’s Criterion.
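As a numerical illustration of Lévy’s Criterion in the case m = 1, one can take the simplest instance of the Central Limit Theorem: Xn = (B1 + · · · + Bn)/√n with independent signs Bi uniform on {−1, 1}. By independence the characteristic function is E(e^{itXn}) = cos(t/√n)^n, and the sketch below compares it with e^{−t^2/2}. The choice of example and the function names are assumptions made for illustration only.

```python
import math


def char_fn_sign_sum(t, n):
    """E(exp(i t X_n)) for X_n = (B_1 + ... + B_n) / sqrt(n), where the B_i are independent
    and uniform on {-1, 1}; by independence this equals cos(t / sqrt(n)) ** n."""
    return math.cos(t / math.sqrt(n)) ** n


def char_fn_gaussian(t):
    """Characteristic function exp(-t^2 / 2) of a standard gaussian random variable."""
    return math.exp(-t * t / 2.0)


if __name__ == "__main__":
    for t in (0.5, 1.0, 2.0):
        for n in (10, 1_000, 100_000):
            print(t, n, abs(char_fn_sign_sum(t, n) - char_fn_gaussian(t)))
```

For each fixed t the printed gap shrinks as n grows, which is the pointwise convergence required by the criterion.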
Lemma B.5.3. Let m ≥ 1 be an integer. Let (Xn)n≥1 be a sequence of random variables
with values in R^m on some probability space. Let (βn) be a sequence of positive real
numbers such that βn → 0 as n → +∞. If (Xn) converges in law to an R^m-valued
random variable X, then for any sequence (Yn) of R^m-valued random variables such that
‖Xn − Yn‖∞ ≤ βn for all n ≥ 1, the random variables Yn converge in law to X.
Proof. We use Lévy’s criterion. We fix t ∈ R^m and write
\[ E(e^{it \cdot Y_n}) - E(e^{it \cdot X}) = E(e^{it \cdot Y_n} - e^{it \cdot X_n}) + E(e^{it \cdot X_n} - e^{it \cdot X}). \]
By Lévy’s Theorem and our assumption on the convergence of the sequence (Xn), the
second term on the right converges to 0 as n → +∞. For the first, we can simply apply
the dominated convergence theorem to derive the same conclusion: we have
\[ \|X_n - Y_n\|_\infty \leq \beta_n \to 0 \]
hence
\[ e^{it \cdot Y_n} - e^{it \cdot X_n} = e^{it \cdot Y_n} \bigl(1 - e^{it \cdot (X_n - Y_n)}\bigr) \to 0 \]
(pointwise) as n → +∞. Moreover, we have
\[ |e^{it \cdot Y_n} - e^{it \cdot X_n}| \leq 2 \]
for all n ≥ 1. Hence the dominated convergence theorem implies that the expectation
E(e^{it·Yn} − e^{it·Xn}) converges to 0.
Lévy’s Theorem applied once more allows us to conclude that (Yn) converges in law
to X, as claimed. □
The second convergence criterion is known as the method of moments. It is more
restrictive than Lévy’s criterion, but is sometimes analytically more flexible, especially
because it is often more manageable when there are no independence assumptions.
Definition B.5.4 (Mild measure). Let µ be a probability measure on R^m. We will
say that µ is mild² if the absolute moments
\[ M^a_k(\mu) = \int_{\mathbf{R}^m} |x_1|^{k_1} \cdots |x_m|^{k_m}\, d\mu(x_1, \ldots, x_m) \]
exist for all tuples of non-negative integers k = (k1, . . . , km), and if there exists δ > 0
such that the power series
\[ \sum \cdots \sum_{k_i \geq 0} M^a_k(\mu)\, \frac{z_1^{k_1} \cdots z_m^{k_m}}{k_1! \cdots k_m!} \]
converges in the region
\[ \{(z_1, \ldots, z_m) \in \mathbf{C}^m \mid |z_i| \leq \delta\}. \]
If a measure µ is mild, then it follows in particular that the moments
\[ M_k(\mu) = \int_{\mathbf{R}^m} x_1^{k_1} \cdots x_m^{k_m}\, d\mu(x_1, \ldots, x_m) \]
exist for all k = (k1, . . . , km) with ki non-negative integers.

We will say as usual that a random vector X = (X1, . . . , Xm)
is mild if its law X(P) is mild. The moments and absolute moments are then
\[ M_k(X) = E(X_1^{k_1} \cdots X_m^{k_m}), \qquad M^a_k(X) = E(|X_1|^{k_1} \cdots |X_m|^{k_m}). \]
We again give two versions of the method of moments for weak convergence when the
limit is mild:
² There doesn’t seem to be an especially standard name for this notion.
Theorem B.5.5 (Method of moments). Let m ≥ 1 be an integer.
(1) Let (µn) be a sequence of probability measures on R^m such that all moments
Mk(µn) exist, and let µ be a probability measure on R^m. Assume that µ is mild. Then
(µn) converges weakly to µ if for any m-tuple k of non-negative integers, we have
\[ M_k(\mu_n) \longrightarrow M_k(\mu) \]
as n → +∞.
(2) Let (Ω, Σ, P) be a probability space. Let (Xn)n≥1 be R^m-valued random vectors
on Ω such that all moments Mk(Xn) exist, and let Y be an R^m-valued random vector.
Assume that Y is mild. Then (Xn) converges in law to Y if for any m-tuple k of
non-negative integers, we have
\[ E(X_{n,1}^{k_1} \cdots X_{n,m}^{k_m}) \longrightarrow E(Y_1^{k_1} \cdots Y_m^{k_m}). \]
For a proof (in the case m = 1), see for instance [9, Th. 30.2 and Th. 30.1].
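The hypothesis of the method of moments can be checked exactly in a simple case. The sketch below is an illustration under assumptions not taken from the text: it uses Xn = (B1 + · · · + Bn)/√n with independent uniform signs Bi, for which B1 + · · · + Bn takes the value 2j − n with probability C(n, j)/2^n, and compares the resulting moments with those of a standard gaussian variable (0 for odd k, and 1 · 3 · · · (k − 1) for even k). The function names are hypothetical.

```python
import math


def moment_sign_sum(k, n):
    """Exact k-th moment of X_n = (B_1 + ... + B_n) / sqrt(n), B_i independent uniform signs.
    The sum B_1 + ... + B_n takes the value 2j - n with probability C(n, j) / 2^n."""
    total = sum(math.comb(n, j) * (2 * j - n) ** k for j in range(n + 1))
    return total / 2 ** n / n ** (k / 2)


def moment_gaussian(k):
    """k-th moment of a standard gaussian: 0 for odd k, 1 * 3 * ... * (k - 1) for even k."""
    if k % 2 == 1:
        return 0.0
    return float(math.prod(range(1, k, 2)))


if __name__ == "__main__":
    for k in range(7):
        print(k, [moment_sign_sum(k, n) for n in (10, 100, 1000)], moment_gaussian(k))
```

For instance the fourth moment of Xn equals 3 − 2/n, which converges to the fourth gaussian moment 3.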
Theorem B.5.5 only gives one implication, in contrast with the Lévy Criterion. It is often
useful to have a converse, and here is one such statement:
Theorem B.5.6 (Converse of the method of moments). Let (Ω, Σ, P) be a probability
space.
Let m ≥ 1 be an integer and let (Xn)n≥1 be a sequence of R^m-valued random vectors
on Ω such that all moments Mk(Xn) exist, and such that there exist constants ck ≥ 0 with
\[ E(|X_{n,1}|^{k_1} \cdots |X_{n,m}|^{k_m}) \leq c_k \tag{B.6} \]
for all n ≥ 1. Assume that Xn converges in law to a random vector Y. Then Y is mild
and for any m-tuple k of non-negative integers, we have
\[ E(X_{n,1}^{k_1} \cdots X_{n,m}^{k_m}) \longrightarrow E(Y_1^{k_1} \cdots Y_m^{k_m}). \]
Proof. See [9, Th 25.12 and Cor.] for a proof (again for m = 1). □
Example B.5.7. This converse applies, in particular, if (Xn) is a sequence of
real-valued random variables given by
\[ X_n = \frac{B_1 + \cdots + B_n}{\sigma_n} \]
where the variables (Bi)i≥1 are independent and satisfy
\[ E(B_i) = 0, \quad |B_i| \leq 1, \quad \sigma_n^2 = \sum_{i=1}^n V(B_i) \to +\infty \text{ as } n \to +\infty. \]
Then the Central Limit Theorem (see below Theorem B.7.2) implies that the sequence
(Xn) converges in law to a standard gaussian random variable Y. Moreover, Xn is bounded
(by n/σn), so all its moments exist. We will check that this sequence satisfies the uniform
integrability condition (B.6), from which we deduce the convergence of moments
\[ \lim_{n \to +\infty} E(X_n^k) = E(Y^k) \]
for all integers k ≥ 0 (the moments of Y are described explicitly in Proposition B.7.3).
For any k ≥ 0, there exists a constant Ck ≥ 0 such that
\[ |x|^k \leq C_k (e^x + e^{-x}) \]
for all x ∈ R. It follows that if we can show that there exists D ≥ 0 such that
\[ E(e^{X_n}) \leq D, \quad E(e^{-X_n}) \leq D \tag{B.7} \]
for all n ≥ 1, then we obtain E(|Xn|^k) ≤ 2CkD for all n, which gives the desired
conclusion. Note that, since Xn is bounded, these expectations make sense, and moreover we
may assume that we only consider n large enough so that σn ≥ 1.
To prove (B.7), fix more generally t ∈ [−1, 1]. Since the (Bi) are independent random
variables, we have
\[ E(e^{tX_n}) = \prod_{i=1}^n E\Bigl(\exp\Bigl(\frac{tB_i}{\sigma_n}\Bigr)\Bigr). \]
Since we assumed that σn ≥ 1 and |Bi| ≤ 1, we have |tBi/σn| ≤ 1, hence
\[ \exp\Bigl(\frac{tB_i}{\sigma_n}\Bigr) \leq 1 + \frac{tB_i}{\sigma_n} + \frac{t^2 B_i^2}{\sigma_n^2} \]
(because e^x ≤ 1 + x + x^2 for |x| ≤ 1, as can be checked using basic calculus). We then
obtain further
\[ E(e^{tX_n}) \leq \prod_{i=1}^n \Bigl(1 + \frac{t^2}{\sigma_n^2} E(B_i^2)\Bigr) \]
since E(Bi) = 0. Using 1 + x ≤ e^x, this leads to
\[ E(e^{tX_n}) \leq \exp\Bigl(\frac{t^2}{\sigma_n^2} \sum_{i=1}^n E(B_i^2)\Bigr) = \exp(t^2). \]
Applying this with t = 1 and t = −1, we get (B.7) with D = e, hence also (B.6), for all n
large enough.
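The two inequalities used in this example can be sanity-checked numerically. The sketch below assumes the special case of independent uniform signs Bi = ±1 (so that σn^2 = n and E(e^{tXn}) = cosh(t/√n)^n), and checks e^x ≤ 1 + x + x^2 on a grid of [−1, 1]; it is a check on sample points, not a proof, and the function names are hypothetical.

```python
import math


def exp_moment_sign_sum(t, n):
    """E(exp(t X_n)) for X_n = (B_1 + ... + B_n) / sigma_n with B_i independent uniform signs,
    so that sigma_n^2 = n; by independence this equals cosh(t / sqrt(n)) ** n."""
    return math.cosh(t / math.sqrt(n)) ** n


def elementary_inequality_holds(steps=2001):
    """Check exp(x) <= 1 + x + x^2 on a grid of [-1, 1] (a sanity check, not a proof)."""
    grid = (-1.0 + 2.0 * i / (steps - 1) for i in range(steps))
    return all(math.exp(x) <= 1.0 + x + x * x + 1e-15 for x in grid)


if __name__ == "__main__":
    print(elementary_inequality_holds())
    for n in (1, 10, 1_000):
        # Both values should stay below D = e, as in (B.7).
        print(n, exp_moment_sign_sum(1.0, n), exp_moment_sign_sum(-1.0, n), math.e)
```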
Remark B.5.8. In the case m = 2, one often deals with random variables that are
naturally seen as complex-valued, instead of R^2-valued. In that case, it is sometimes
quite useful to use the complex moments
\[ \widetilde{M}_{k_1,k_2}(X) = E(X^{k_1} \bar{X}^{k_2}) \]
of a C-valued random variable instead of Mk1,k2(X). The corresponding statements are
that X is mild if and only if the power series
\[ \sum\sum_{k_1, k_2 \geq 0} \widetilde{M}_{k_1,k_2}(X)\, \frac{z_1^{k_1} z_2^{k_2}}{k_1!\, k_2!} \]
converges in a region
\[ \{(z_1, z_2) \in \mathbf{C}^2 \mid |z_1| \leq \delta,\ |z_2| \leq \delta\} \]
for some δ > 0, and that if X is mild, then (Xn) converges weakly to X if
\[ \widetilde{M}_{k_1,k_2}(X_n) \longrightarrow \widetilde{M}_{k_1,k_2}(X) \]
for all k1, k2 ≥ 0. Example B.5.7 extends to the complex-valued case.
Example B.5.9. (1) Any bounded random vector is mild. Indeed, if ‖X‖∞ ≤ B, say,
then we get
\[ M^a_k(X) \leq B^{k_1 + \cdots + k_m}, \]
and therefore
\[ \sum \cdots \sum_{k_i \geq 0} M^a_k(X)\, \frac{|z_1|^{k_1} \cdots |z_m|^{k_m}}{k_1! \cdots k_m!} \leq e^{B|z_1| + \cdots + B|z_m|}, \]
so that the power series converges, in that case, for all z ∈ C^m.
(2) Any gaussian random vector is mild (see Section B.7).
(3) If X is mild, and Y is another random vector with |Yi| ≤ |Xi| (almost surely) for
all i, then Y is also mild.
B.6. The Weyl criterion
One important special case of convergence in law is known as equidistribution in the
context of topological groups in particular. We only consider compact groups here for
simplicity. Let G be such a group. Then there exists on G a unique Borel probability
measure µG which is invariant under left (and right) translations: for any integrable
function f : G −→ C and for any fixed g ∈ G, we have
\[ \int_G f(gx)\, d\mu_G(x) = \int_G f(xg)\, d\mu_G(x) = \int_G f(x)\, d\mu_G(x). \]

This measure is called the (probability) Haar measure on G (see, e.g., [17, VII, §1, n. 2,
th. 1 and prop. 2]).
If a G-valued random variable X is distributed according to µG , one says that X is
uniformly distributed on G.
Example B.6.1. (1) Let G = S^1 be the multiplicative group of complex numbers
of modulus 1. This group is isomorphic to R/Z by the isomorphism θ ↦ e(θ). The
measure µG is then identified with the Lebesgue measure dθ on R/Z. In other words, for
any integrable function f : S^1 → C, we have
\[ \int_{\mathbf{S}^1} f(z)\, d\mu_G(z) = \int_{\mathbf{R}/\mathbf{Z}} f(e(\theta))\, d\theta = \int_0^1 f(e(\theta))\, d\theta. \]

(2) If (Gi)i∈I is any family of compact groups, each with a probability Haar measure
µi, then the (possibly infinite) tensor product
\[ \bigotimes_{i \in I} \mu_i \]
is the probability Haar measure µ on the product G of the groups Gi. Probabilistically,
one would interpret this as saying that µ is the law of a family (Xi) of independent random
variables, where each Xi is uniformly distributed on Gi.
(3) Let G be the non-abelian compact group SU2(C), i.e.
\[ G = \Bigl\{ \begin{pmatrix} \alpha & \bar{\beta} \\ -\beta & \bar{\alpha} \end{pmatrix} \,\Big|\, \alpha, \beta \in \mathbf{C},\ |\alpha|^2 + |\beta|^2 = 1 \Bigr\}. \]
Writing α = a + ib, β = c + id, we can identify G, as a topological space, with the unit
3-sphere
\[ \{(a, b, c, d) \in \mathbf{R}^4 \mid a^2 + b^2 + c^2 + d^2 = 1\} \]
in R^4. Then the left-multiplication by some element on G is the restriction of a rotation
of R^4. Hence the surface (Lebesgue) measure µ0 on the 3-sphere is a Borel invariant
measure on G. By uniqueness, we see that the probability Haar measure on G is
\[ \mu = \frac{1}{2\pi^2}\, \mu_0 \]
(since the surface area of the 3-sphere is 2π^2).
Consider now the trace Tr : G −→ R, which is given by (a, b, c, d) ↦ 2a in the
sphere coordinates. One can show that the direct image Tr∗(µ) is the so-called Sato–Tate
measure
\[ \mu_{ST} = \frac{1}{\pi} \sqrt{1 - \frac{x^2}{4}}\, dx, \]
supported on [−2, 2] (for probabilists, this is also a semi-circle law); equivalently if we
write the trace as
\[ \operatorname{Tr}(g) = 2\cos(\theta), \]
for a unique θ ∈ [0, π], then this measure is identified with the measure
\[ \frac{2}{\pi} \sin^2\theta\, d\theta \]
on [0, π] (for a proof, see e.g. [18, Ch. 9, p. 58, exemple]). One obtains from either
description of µST the expectation and variance
\[ \int_{\mathbf{R}} t\, d\mu_{ST} = 0, \qquad \int_{\mathbf{R}} t^2\, d\mu_{ST} = 1. \tag{B.8} \]

(4) If G is a finite group then the probability Haar measure is just the normalized
counting measure: for any function f on G, the integral of f is
\[ \frac{1}{|G|} \sum_{x \in G} f(x). \]
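The values in (B.8) for the Sato–Tate measure of item (3) can be confirmed numerically from the description t = 2 cos(θ), with θ distributed according to (2/π) sin^2 θ dθ on [0, π]. The sketch below uses a plain midpoint rule; the function name `sato_tate_moment` and the number of steps are arbitrary choices made for this illustration.

```python
import math


def sato_tate_moment(k, steps=20_000):
    """Midpoint-rule approximation of the k-th moment of the Sato-Tate measure, using
    t = 2 cos(theta) with theta distributed as (2/pi) sin^2(theta) d(theta) on [0, pi]."""
    h = math.pi / steps
    total = 0.0
    for j in range(steps):
        theta = (j + 0.5) * h
        total += (2.0 * math.cos(theta)) ** k * (2.0 / math.pi) * math.sin(theta) ** 2
    return total * h


if __name__ == "__main__":
    # Total mass, expectation and second moment: approximately 1, 0 and 1.
    print(sato_tate_moment(0), sato_tate_moment(1), sato_tate_moment(2))
```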

For a topological group G, a unitary character χ of G is a continuous homomorphism
\[ \chi : G \longrightarrow \mathbf{S}^1. \]
The trivial character is the character g ↦ 1 of G. The set of all characters of G is
denoted Ĝ. It has a structure of abelian group by multiplication of functions. If G
is locally compact, then Ĝ is a locally compact topological group with the topology of
uniform convergence on compact sets (for this theory, see, e.g., [19, Ch. 2]).
In general, Ĝ may be reduced to the trivial character (this is the case if G = SL2(R) for
instance). Assume now that G is locally compact and abelian. Then it is a fundamental
fact (known as Pontryagin duality, see e.g. [70, §7.3] for a survey, or [19, II, p. 222, th.
2] for the details) that there are “many” characters, in a suitable sense. If G is compact,
then a simple version of this assertion is that Ĝ is an orthonormal basis of the space
L^2(G, µ), where µ is the probability Haar measure on G.
For an integrable function f ∈ L^1(G, µ), its Fourier transform is the function f̂ :
Ĝ −→ C defined by
\[ \hat{f}(\chi) = \int_G f(x)\, \overline{\chi(x)}\, d\mu(x) \]
for all χ ∈ Ĝ. For a compact commutative group G, and f ∈ L^2(G, µ), we have
\[ f = \sum_{\chi \in \hat{G}} \hat{f}(\chi)\, \chi, \]
as a series converging in L^2(G, µ). It follows easily that a function f ∈ L^1(G, µ) is almost
everywhere constant if and only if f̂(χ) = 0 for all χ ≠ 1.
The following relation is immediate from the invariance of Haar measure: for f
integrable and any fixed y ∈ G, if we let g(x) = f(xy), then g is well-defined as an integrable
function, and
\[ \hat{g}(\chi) = \int_G f(xy)\, \overline{\chi(x)}\, d\mu(x) = \chi(y) \int_G f(x)\, \overline{\chi(x)}\, d\mu(x) = \chi(y)\, \hat{f}(\chi). \]
Example B.6.2. (1) The characters of S^1 are given by
\[ z \mapsto z^m \]
for m ∈ Z. Equivalently, the characters of R/Z are given by x ↦ e(hx) for h ∈ Z, where
e(z) = exp(2iπz). More generally, the characters of (R/Z)^n are of the form
\[ x = (x_1, \ldots, x_n) \mapsto e(h_1 x_1 + \cdots + h_n x_n) = e(h \cdot x) \]
for some (unique) h ∈ Z^n (see, e.g., [19, p. 236, cor. 3]).
(2) If (Gi)i∈I is any family of compact groups, each with the probability Haar measure
µi, then the characters of the product G of the Gi are given in a unique way as follows:
take a finite subset S of I, and for any i ∈ S, pick a non-trivial character χi of Gi, then
define
\[ \chi(x) = \prod_{i \in S} \chi_i(x_i) \]
for any x = (xi)i∈I in G. Here, the trivial character corresponds to S = ∅. See, e.g., [70,
Example 5.6.10] for a proof.
In particular, if I is a finite set, this computation shows that the group of characters
of G is isomorphic to the product of the groups of characters of the Gi, and that the
isomorphism is such that a family (χi) of characters of the groups Gi is mapped to the
character
\[ (x_i) \mapsto \prod_{i \in I} \chi_i(x_i). \]

(3) If G is a finite abelian group, then the group Ĝ of characters of G is also finite,
and it is isomorphic to G. This can be seen from the structure theorem for finite abelian
groups, which shows that any finite abelian group is a direct product of some finite cyclic
groups (see, e.g., [103, Th. B-3.13]) combined with the previous example and the explicit
computation of the dual group of a finite cyclic group Z/qZ for q ≥ 1: an isomorphism
from Z/qZ to its character group is given by sending a (mod q) to the character
\[ x \mapsto e\Bigl(\frac{ax}{q}\Bigr), \]
which is well-defined because replacing a and x by other integers congruent modulo q
does not change the value of e(ax/q).
In this case, one can also prove elementarily that the characters form an orthonormal
basis of the finite-dimensional vector space C(G) of complex-valued functions on G, which
in this case has the inner product
\[ \langle f, g \rangle = \frac{1}{|G|} \sum_{x \in G} f(x)\, \overline{g(x)}. \]
Indeed, one can also reduce to the case of cyclic groups by checking that there is a
unique isomorphism
\[ C(G_1) \otimes C(G_2) \to C(G_1 \times G_2) \]
such that a pure tensor f1 ⊗ f2 is mapped to the function (x1, x2) ↦ f1(x1)f2(x2). In
particular, the characters of G1 × G2 (which belong to C(G1 × G2)) correspond under
this isomorphism to the pure tensors χ1 ⊗ χ2.
In addition, under this isomorphism, we have
\[ \langle f_1 \otimes f_2, g_1 \otimes g_2 \rangle = \langle f_1, g_1 \rangle \langle f_2, g_2 \rangle \]
for any functions fi and gi on Gi. This implies that if the characters of G1 and those of G2
form orthonormal bases of their respective spaces of functions, then so do the characters
of G1 × G2.
And in the case of G = Z/qZ, we can simply compute using the explicit description
of the characters χa : x ↦ e(ax/q) for a ∈ Z/qZ that
\[ \langle \chi_a, \chi_b \rangle = \frac{1}{q} \sum_{0 \leq x \leq q-1} e\Bigl(\frac{ax}{q}\Bigr)\, e\Bigl(-\frac{bx}{q}\Bigr) \]
which is equal to 1 if a = b, and otherwise is
\[ \frac{1}{q} \cdot \frac{1 - e(q(a-b)/q)}{1 - e((a-b)/q)} \]
by summing a finite geometric sum, and is therefore zero, as we wanted.
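The orthonormality of the characters of Z/qZ computed above is easy to confirm in floating-point arithmetic. The following sketch evaluates the inner products ⟨χa, χb⟩ directly from the definition; the helper names `e`, `inner_product` and `character` are chosen for this illustration, and the results are only accurate up to rounding errors.

```python
import cmath
import math


def e(x):
    """The function e(x) = exp(2 i pi x)."""
    return cmath.exp(2j * math.pi * x)


def inner_product(f, g, q):
    """<f, g> = (1/q) * sum over x in Z/qZ of f(x) * conjugate(g(x))."""
    return sum(f(x) * g(x).conjugate() for x in range(q)) / q


def character(a, q):
    """The character x -> e(a x / q) of Z/qZ attached to a (mod q)."""
    return lambda x: e(a * x / q)


if __name__ == "__main__":
    q = 6
    for a in range(q):
        row = [abs(inner_product(character(a, q), character(b, q), q)) for b in range(q)]
        print([round(value, 12) for value in row])
```

Up to rounding, the printed matrix is the identity matrix of size q.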
The Weyl Criterion is a criterion for a sequence of G-valued random variables to
converge in law to a uniformly distributed random variable on G. We state it for compact
abelian groups only:
Theorem B.6.3 (Weyl’s Criterion). Let G be a compact abelian group. A sequence
(Xn) of G-valued random variables converges in law to a uniformly distributed random
variable on G if and only if, for any non-trivial character χ of G, we have
\[ \lim_{n \to +\infty} E(\chi(X_n)) = 0. \]

Remark B.6.4. (1) Note that the orthogonality of characters implies that
\[ \int_G \chi(x)\, d\mu_G(x) = \langle \chi, 1 \rangle = 0 \]
for any non-trivial character χ of G. Hence the Weyl Criterion has the same flavor as
Lévy’s Criterion (note that, for any t ∈ R^m, the function x ↦ e^{ix·t} is a character of R^m).
(2) If G is compact, but not necessarily abelian, there is a version of the Weyl Criterion
using as “test functions” the traces of irreducible finite-dimensional representations of G
(see [70, §5.5] for an account).
The best known example of application of the Weyl Criterion is to prove the following
equidistribution theorem of Kronecker:
Theorem B.6.5 (Kronecker). Let d ≥ 1 be an integer. Let z be an element of R^d
and let T (resp. T̃) be the closure of the subgroup of (R/Z)^d generated by the class of z
(resp. generated by the classes of the elements yz for y ∈ R).
(1) As N → +∞, the probability measures on (R/Z)^d defined by
\[ \frac{1}{N} \sum_{0 \leq n < N} \delta_{nz} \]
converge in law to the probability Haar measure on T.
(2) Let λ denote the Lebesgue measure on R. As X → +∞, the probability measures
µX on (R/Z)^d defined by
\[ \mu_X(A) = \frac{1}{X}\, \lambda(\{x \in [0, X] \mid xz \in A\}) \]
for a measurable subset A of (R/Z)^d, converge in law to the probability Haar measure
on T̃.
Proof. We only prove the “continuous” version in (2), since the first one is easier
(and better known). First note that each probability measure µX has support contained
in T̃ by definition, so it can be viewed as a measure on T̃.
From the theory of compact abelian groups, we know that any character χ of T̃ can
be extended to a character of (R/Z)^d (see, e.g., [19, p. 226, th. 4], applied to the exact
sequence 1 → T̃ → (R/Z)^d), which is therefore of the form
\[ v \mapsto e(n \cdot v) \]
for some n ∈ Z^d (by Example B.6.2, (1)). We then have
\[ \int_{(\mathbf{R}/\mathbf{Z})^d} \chi(v)\, d\mu_X(v) = \frac{1}{X} \int_0^X e((xn) \cdot z)\, dx. \]
Suppose that χ is a non-trivial character of T̃; since the classes of yz for y ∈ R
generate a dense subgroup of T̃, we have then n · z ≠ 0. Hence
\[ \int_{(\mathbf{R}/\mathbf{Z})^d} \chi(v)\, d\mu_X(v) = \frac{1}{X} \cdot \frac{e((Xn) \cdot z) - 1}{2i\pi\, n \cdot z} \to 0 \]
as X → +∞. We conclude by an application of the Weyl Criterion. □
Example B.6.6. In order to apply Theorem B.6.5 in practice, we need to identify
the subgroup $T$ (or $\widetilde{T}$). The following special cases are quite often sufficient (writing
$z = (z_1, \ldots, z_d) \in \mathbf{R}^d$):
(1) We have $T = (\mathbf{R}/\mathbf{Z})^d$ if and only if $(1, z_1, \ldots, z_d)$ are $\mathbf{Q}$-linearly independent;
(2) We have $\widetilde{T} = (\mathbf{R}/\mathbf{Z})^d$ if and only if $(z_1, \ldots, z_d)$ are $\mathbf{Q}$-linearly independent.
For instance, if $d = 1$, then the first condition means that $z = z_1$ is irrational, and
the second means that $z$ is non-zero.
We check (1), leaving (2) as an exercise. If $(1, z_1, \ldots, z_d)$ are not $\mathbf{Q}$-linearly independent,
then multiplying a non-trivial linear dependency relation by a suitable non-zero
integer, we obtain a relation
\[ m_0 + \sum_{i=1}^{d} m_i z_i = 0, \]
where $m_i \in \mathbf{Z}$, and not all $m_i$ are zero; in fact, not all $m_i$ with $i \geq 1$ are zero (since
otherwise the relation would also force $m_0 = 0$). Then the class of $nz$ modulo $\mathbf{Z}^d$ is, for all
$n \in \mathbf{Z}$, an element of the proper closed subgroup
\[ \{x = (x_1, \ldots, x_d) \in (\mathbf{R}/\mathbf{Z})^d \mid m_1 x_1 + \cdots + m_d x_d = 0\}, \]
which implies that $T$ is also contained in that subgroup, hence is not all of $(\mathbf{R}/\mathbf{Z})^d$.
Conversely, if $(1, z_1, \ldots, z_d)$ are $\mathbf{Q}$-linearly independent, then a direct application
of the Weyl Criterion proves that the probability measures
\[ \frac{1}{N} \sum_{0 \leq n < N} \delta_{nz} \]
converge in law to the probability Haar measure on $(\mathbf{R}/\mathbf{Z})^d$. Indeed, non-trivial characters
of this group correspond to non-zero $(m_i) \in \mathbf{Z}^d$, and the integral of such a character against
the measure above is
\[ \frac{1}{N} \sum_{0 \leq n < N} e((m_1 z_1 + \cdots + m_d z_d)\,n), \]
where the real number $m_1 z_1 + \cdots + m_d z_d$ is not an integer by linear independence,
so that the sum tends to 0 by summing a finite geometric series.
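As a numerical illustration of this last step (a sketch only, not part of the argument), take $d = 1$ and $z = \sqrt{2}$. Summing the geometric series gives the bound $2/(N\,|1 - e(z)|)$ for the modulus of the average, so the averages should decay like $1/N$. The helper names `e`, `weyl_average` and `geometric_bound` below are ours, and floating-point arithmetic only approximates the exact sums.

```python
import cmath
import math


def e(x: float) -> complex:
    """The additive character e(x) = exp(2*pi*i*x)."""
    return cmath.exp(2j * math.pi * x)


def weyl_average(theta: float, N: int) -> complex:
    """Return (1/N) * sum_{0 <= n < N} e(n * theta)."""
    return sum(e(n * theta) for n in range(N)) / N


def geometric_bound(theta: float, N: int) -> float:
    """Bound 2 / (N * |1 - e(theta)|), valid when theta is not an integer."""
    return 2.0 / (N * abs(1 - e(theta)))


z = math.sqrt(2)
for N in (10, 100, 1000, 10000):
    # The second column should stay below the third one.
    print(N, abs(weyl_average(z, N)), geometric_bound(z, N))
```

For an integer value of the parameter the character is trivial on the subgroup and the average stays equal to 1, which is the numerical counterpart of the failure of equidistribution in that case.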

B.7. Gaussian random variables


By definition, a random vector $X$ with values in $\mathbf{R}^m$ is called a (centered) gaussian
vector if there exists a non-negative quadratic form $Q$ on $\mathbf{R}^m$ such that the characteristic
function $\varphi_X$ of $X$ is of the form
\[ \varphi_X(t) = e^{-Q(t)/2} \]
for $t \in \mathbf{R}^m$. The quadratic form can be recovered from $X$ by the relation
\[ Q(t_1, \ldots, t_m) = \sum_{1 \leq i,j \leq m} a_{i,j}\, t_i t_j, \]
with $a_{i,j} = \mathbf{E}(X_i X_j)$, and the (symmetric) matrix $(a_{i,j})_{1 \leq i,j \leq m}$ is called the correlation
matrix of $X$. The components $X_i$ of $X$ are independent if and only if $a_{i,j} = 0$ for $i \neq j$, i.e.,
if and only if the components of $X$ are orthogonal.
If $X$ is a gaussian random vector, then $X$ is mild, and in fact
\[ \sum_{k} M_k(X)\, \frac{t_1^{k_1} \cdots t_m^{k_m}}{k_1! \cdots k_m!} = \mathbf{E}(e^{t \cdot X}) = e^{Q(t)/2} \]
for $t \in \mathbf{R}^m$ (the sum being over multi-indices $k = (k_1, \ldots, k_m)$), so that the power series
converges on all of $\mathbf{C}^m$. The Laplace transform
$\psi_X(z) = \mathbf{E}(e^{z \cdot X})$ is also defined for all $z \in \mathbf{C}^m$, and in fact
\[ \mathbf{E}(e^{z \cdot X}) = e^{Q(z)/2}. \tag{B.9} \]
For $m = 1$, this means that a random variable is a centered gaussian if and only if
there exists $\sigma \geq 0$ such that
\[ \varphi_X(t) = e^{-\sigma^2 t^2/2}, \tag{B.10} \]
and in fact we have
\[ \mathbf{E}(X^2) = \mathbf{V}(X) = \sigma^2. \]
If $\sigma = 1$, then we say that $X$ is a standard gaussian random variable (also sometimes
called a standard normal random variable). We then have
\[ \mathbf{P}(a < X < b) = \frac{1}{\sqrt{2\pi}} \int_a^b e^{-x^2/2}\,dx \]
for all real numbers $a < b$.
Exercise B.7.1. We recall a standard proof of the fact that the measure on $\mathbf{R}$ given
by
\[ \mu = \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}\,dx \]
is indeed a gaussian probability measure with variance 1.
(1) Define
\[ \varphi(t) = \varphi_\mu(t) = \frac{1}{\sqrt{2\pi}} \int_{\mathbf{R}} e^{itx - x^2/2}\,dx \]
for $t \in \mathbf{R}$. Prove that $\varphi$ is of class $C^1$ on $\mathbf{R}$ and satisfies $\varphi'(t) = -t\varphi(t)$ for all $t \in \mathbf{R}$
and $\varphi(0) = 1$.
(2) Deduce that $\varphi(t) = e^{-t^2/2}$ for all $t \in \mathbf{R}$. [Hint: This is an elementary argument
with ordinary differential equations, but because the order is 1, one can define
$g(t) = e^{t^2/2}\varphi(t)$ and check by differentiation that $g'(t) = 0$ for all $t \in \mathbf{R}$.]
We will use the following simple version of the Central Limit Theorem:
Theorem B.7.2. Let $B > 0$ be a fixed real number. Let $(X_n)$ be a sequence of independent
real-valued random variables with $|X_n| \leq B$ for all $n$. Let
\[ \alpha_n = \mathbf{E}(X_n), \qquad \beta_n = \mathbf{V}(X_n). \]
Let $\sigma_N \geq 0$ be defined by
\[ \sigma_N^2 = \beta_1 + \cdots + \beta_N \]
for $N \geq 1$. If $\sigma_N \to +\infty$ as $N \to +\infty$, then the random variables
\[ Y_N = \frac{(X_1 - \alpha_1) + \cdots + (X_N - \alpha_N)}{\sigma_N} \]
converge in law to a standard gaussian random variable.
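Proof. Although this is a very simple case of the general Central Limit Theorem for
sums of independent random variables (indeed, even of Lyapunov’s well-known version),
we give a proof using Lévy’s criterion for convenience. First of all, we may assume that
$\alpha_n = 0$ for all $n$ by replacing $X_n$ by $X_n - \alpha_n$ (up to replacing $B$ by $2B$, since $|\alpha_n| \leq B$).
By independence of the variables $(X_n)$, the characteristic function $\varphi_N$ of $Y_N$ is given
by
\[ \varphi_N(t) = \mathbf{E}(e^{itY_N}) = \prod_{1 \leq n \leq N} \mathbf{E}(e^{itX_n/\sigma_N}) \]
for $t \in \mathbf{R}$.
Fix $t \in \mathbf{R}$. Since $tX_n/\sigma_N$ is bounded (because $t$ is fixed), we have a Taylor expansion
around 0 of the form
\[ e^{itX_n/\sigma_N} = 1 + \frac{itX_n}{\sigma_N} - \frac{t^2 X_n^2}{2\sigma_N^2} + O\Bigl(\frac{|t|^3 |X_n|^3}{\sigma_N^3}\Bigr), \]
for $1 \leq n \leq N$. Consequently, since $\mathbf{E}(X_n) = 0$, we obtain
\[ \varphi_{X_n}\Bigl(\frac{t}{\sigma_N}\Bigr) = \mathbf{E}(e^{itX_n/\sigma_N}) = 1 - \frac{1}{2}\Bigl(\frac{t}{\sigma_N}\Bigr)^2 \mathbf{E}(X_n^2) + O\Bigl(\Bigl(\frac{|t|}{\sigma_N}\Bigr)^3 \mathbf{E}(|X_n|^3)\Bigr). \]
Observe that with our assumption, we have
\[ \mathbf{E}(|X_n|^3) \leq B\,\mathbf{E}(X_n^2) = B\beta_n. \]
Moreover, for $N$ large enough (depending on $t$, but $t$ is fixed), the modulus of
\[ -\frac{1}{2}\Bigl(\frac{t}{\sigma_N}\Bigr)^2 \mathbf{E}(X_n^2) + O\Bigl(\Bigl(\frac{|t|}{\sigma_N}\Bigr)^3 \mathbf{E}(|X_n|^3)\Bigr) \]
is less than 1, so that we can use Proposition A.2.2 and deduce that
\begin{align*}
\varphi_N(t) &= \exp\Bigl(\sum_{n=1}^{N} \log \mathbf{E}(e^{itX_n/\sigma_N})\Bigr) \\
&= \exp\Bigl(-\frac{t^2}{2\sigma_N^2} \sum_{n=1}^{N} \beta_n + O\Bigl(\frac{B|t|^3}{\sigma_N^3} \sum_{n=1}^{N} \beta_n\Bigr)\Bigr) \\
&= \exp\Bigl(-\frac{t^2}{2} + O\Bigl(\frac{B|t|^3}{\sigma_N}\Bigr)\Bigr) \longrightarrow \exp(-t^2/2)
\end{align*}
as $N \to +\infty$; we conclude then by Lévy’s Criterion and (B.10). □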
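To see the convergence $\varphi_N(t) \to \exp(-t^2/2)$ concretely, one can take the $X_n$ to be independent uniform signs $\pm 1$. This choice is an assumption made only for the illustration; it gives $\alpha_n = 0$, $\beta_n = 1$, $\sigma_N^2 = N$ and $\mathbf{E}(e^{isX_n}) = \cos(s)$, so that $\varphi_N(t) = \cos(t/\sqrt{N})^N$. The function names in this sketch are ours.

```python
import math


def char_fn_sign_sum(t: float, N: int) -> float:
    """Characteristic function of (X_1 + ... + X_N) / sqrt(N) for uniform signs X_n.

    Each factor E(exp(i*s*X_n)) equals cos(s); independence gives the N-th power.
    """
    return math.cos(t / math.sqrt(N)) ** N


def gaussian_char_fn(t: float) -> float:
    """Characteristic function exp(-t^2/2) of a standard gaussian."""
    return math.exp(-t * t / 2)


t = 1.5
for N in (10, 100, 1000, 10000):
    # The gap to the gaussian characteristic function shrinks as N grows.
    print(N, abs(char_fn_sign_sum(t, N) - gaussian_char_fn(t)))
```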
If one uses directly the method of moments to get convergence in law to a gaussian
random variable, it is useful to know the values of its moments. We only state the
one-dimensional and the simplest complex case:
Proposition B.7.3. (1) Let $X$ be a real-valued gaussian random variable with expectation
0 and variance $\sigma^2$. For $k \geq 0$, we have
\[ \mathbf{E}(X^k) = \begin{cases} 0 & \text{if $k$ is odd,} \\ \sigma^k\, \dfrac{k!}{2^{k/2}(k/2)!} = \sigma^k \cdot (1 \cdot 3 \cdots (k-1)) & \text{if $k$ is even.} \end{cases} \]
(2) Let $X$ be a complex-valued gaussian random variable with covariance matrix
\[ \begin{pmatrix} \sigma & 0 \\ 0 & \sigma \end{pmatrix} \]
for some $\sigma > 0$. For $k \geq 0$ and $l \geq 0$, we have
\[ \mathbf{E}(X^k \bar{X}^l) = \begin{cases} 0 & \text{if $k \neq l$,} \\ \sigma^k 2^k k! & \text{if $k = l$.} \end{cases} \]
Exercise B.7.4. Prove this proposition.
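The real-valued formula can be sanity-checked numerically before attempting the exercise. The sketch below compares the closed form $\sigma^k\, k!/(2^{k/2}(k/2)!)$ with the product $1 \cdot 3 \cdots (k-1)$ and with a midpoint-rule approximation of the integral of $x^k$ against the gaussian density; the cutoff at $12\sigma$ and the number of steps are arbitrary choices of ours, and this is of course not a proof.

```python
import math


def gaussian_moment(k: int, sigma: float = 1.0) -> float:
    """Closed form for E(X^k), X centered gaussian with variance sigma^2."""
    if k % 2 == 1:
        return 0.0
    return sigma ** k * math.factorial(k) / (2 ** (k // 2) * math.factorial(k // 2))


def odd_product(k: int) -> int:
    """The product 1 * 3 * ... * (k - 1) for even k (equal to 1 when k = 0)."""
    result = 1
    for j in range(1, k, 2):
        result *= j
    return result


def numeric_moment(k: int, sigma: float = 1.0, steps: int = 20000) -> float:
    """Midpoint-rule approximation of the k-th moment on [-12*sigma, 12*sigma]."""
    cutoff = 12.0 * sigma
    h = 2 * cutoff / steps
    total = 0.0
    for i in range(steps):
        x = -cutoff + (i + 0.5) * h
        density = math.exp(-x * x / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))
        total += x ** k * density * h
    return total


for k in range(0, 9):
    print(k, gaussian_moment(k), numeric_moment(k))
```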

B.8. Subgaussian random variables


Gaussian random variables have many remarkable properties. It is a striking fact
that a number of these, especially with respect to integrability properties, are shared by
a much more general class of random variables.
Definition B.8.1 (Subgaussian random variable). Let $\sigma > 0$ be a real number. A
real-valued random variable $X$ is $\sigma^2$-subgaussian if we have
\[ \mathbf{E}(e^{tX}) \leq e^{\sigma^2 t^2/2} \]
for all $t \in \mathbf{R}$. A complex-valued random variable $X$ is $\sigma^2$-subgaussian if $X = Y + iZ$ with
$Y$ and $Z$ real-valued $\sigma^2$-subgaussian random variables.
If $X$ is a real $\sigma^2$-subgaussian random variable, then we obtain immediately good
gaussian-type upper-bounds for the tail of the distribution: for any $b > 0$, using first a
general auxiliary parameter $t > 0$, we have
\[ \mathbf{P}(X > b) \leq \mathbf{P}(e^{tX} > e^{bt}) \leq \frac{\mathbf{E}(e^{tX})}{e^{bt}} \leq e^{\sigma^2 t^2/2 - bt}, \]
and selecting $t = b/\sigma^2$, which minimizes the exponent, we get
\[ \mathbf{P}(X > b) \leq e^{-\frac{1}{2} b^2/\sigma^2}. \]
The right-hand side is a standard upper-bound for the probability $\mathbf{P}(N > b)$ for a centered
gaussian random variable $N$ with variance $\sigma^2$, so this inequality justifies the name
“subgaussian”.
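The comparison with a genuine gaussian tail can be checked numerically. The sketch below relies on the standard identity $\mathbf{P}(N > b) = \frac{1}{2}\,\mathrm{erfc}(b/(\sigma\sqrt{2}))$ for a centered gaussian $N$ with variance $\sigma^2$ (an identity not stated in the text) and on `math.erfc` from the Python standard library; the function names are ours.

```python
import math


def gaussian_tail(b: float, sigma: float = 1.0) -> float:
    """Exact P(N > b) for a centered gaussian N with variance sigma^2."""
    return 0.5 * math.erfc(b / (sigma * math.sqrt(2)))


def subgaussian_bound(b: float, sigma: float = 1.0) -> float:
    """The upper bound exp(-b^2 / (2 sigma^2)) for the tail."""
    return math.exp(-0.5 * b * b / sigma ** 2)


for b in (0.5, 1.0, 2.0, 4.0):
    # The exact tail should never exceed the bound.
    print(b, gaussian_tail(b), subgaussian_bound(b))
```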
A gaussian random variable is subgaussian by (B.9). But there are many more exam-
ples, in particular the random variables described in the next proposition.
Proposition B.8.2. (1) Let $X$ be a complex-valued random variable and $m > 0$ a
real number such that $\mathbf{E}(X) = 0$ and $|X| \leq m$. Then $X$ is $m^2$-subgaussian.
(2) Let $X_1$ and $X_2$ be independent random variables such that $X_i$ is $\sigma_i^2$-subgaussian.
Then $X_1 + X_2$ is $(\sigma_1^2 + \sigma_2^2)$-subgaussian.
Proof. (1) We may assume that $X$ is real-valued, and by considering $m^{-1}X$ instead
of $X$, we may assume that $|X| \leq 1$, and of course that $X$ is not almost surely 0. In
particular, the Laplace transform $\varphi(t) = \mathbf{E}(e^{tX})$ is well-defined, and $\varphi(t) > 0$ for all
$t \in \mathbf{R}$. Moreover, it is easy to check that $\varphi$ is smooth on $\mathbf{R}$ with
\[ \varphi'(t) = \mathbf{E}(Xe^{tX}), \qquad \varphi''(t) = \mathbf{E}(X^2 e^{tX}) \]
(by differentiating under the integral sign) and in particular
\[ \varphi(0) = 1, \qquad \varphi'(0) = \mathbf{E}(X) = 0. \]
We now define $f(t) = \log(\varphi(t)) - \frac{1}{2}t^2$. The function $f$ is also smooth and satisfies
$f(0) = f'(0) = 0$. Moreover, we compute that
\[ f''(t) = \frac{\varphi''(t)\varphi(t) - \varphi'(t)^2 - \varphi(t)^2}{\varphi(t)^2}. \]
The formula for $\varphi''$ and the condition $|X| \leq 1$ imply that $0 \leq \varphi''(t) \leq \varphi(t)$ for all
$t \in \mathbf{R}$. Therefore
\[ \varphi''(t)\varphi(t) - \varphi'(t)^2 - \varphi(t)^2 \leq -\varphi'(t)^2 \leq 0, \]
and hence $f''(t) \leq 0$ for all $t \in \mathbf{R}$. This means that the derivative of $f$ is non-increasing, so
that $f'(t) \leq 0$ for $t \geq 0$, and $f'(t) \geq 0$ for $t \leq 0$. Thus $f$ is non-decreasing when $t \leq 0$
and non-increasing when $t \geq 0$. In particular, we have $f(t) \leq f(0) = 0$ for all $t \in \mathbf{R}$,
which translates exactly to the condition $\mathbf{E}(e^{tX}) \leq e^{t^2/2}$ defining a subgaussian random
variable.
(2) Since $X_1$ and $X_2$ are independent and subgaussian, we have
\[ \mathbf{E}(e^{t(X_1+X_2)}) = \mathbf{E}(e^{tX_1})\,\mathbf{E}(e^{tX_2}) \leq \exp\bigl(\tfrac{1}{2}(\sigma_1^2 + \sigma_2^2)t^2\bigr) \]
for any $t \in \mathbf{R}$. □
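Part (1) can be tested on small examples. The sketch below evaluates the Laplace transform of a finitely supported random variable on a grid of values of $t$ and compares it with $e^{\sigma^2 t^2/2}$. The two test laws (a uniform sign, and the law with $\mathbf{P}(X = 1) = 1/3$, $\mathbf{P}(X = -1/2) = 2/3$) are our own choices, both centered with $|X| \leq 1$; a grid check is only a heuristic confirmation, and a small relative tolerance absorbs rounding errors.

```python
import math


def laplace_transform(values, probs, t: float) -> float:
    """E(exp(t X)) for X taking values[i] with probability probs[i]."""
    return sum(p * math.exp(t * x) for x, p in zip(values, probs))


def is_subgaussian_on_grid(values, probs, sigma2: float,
                           t_max: float = 10.0, steps: int = 2001) -> bool:
    """Check E(exp(t X)) <= exp(sigma2 * t^2 / 2) for t on a grid of [-t_max, t_max]."""
    for i in range(steps):
        t = -t_max + 2 * t_max * i / (steps - 1)
        bound = math.exp(sigma2 * t * t / 2)
        if laplace_transform(values, probs, t) > bound * (1 + 1e-12):
            return False
    return True


print(is_subgaussian_on_grid((1.0, -1.0), (0.5, 0.5), 1.0))
print(is_subgaussian_on_grid((1.0, -0.5), (1 / 3, 2 / 3), 1.0))
# A uniform sign has variance 1, so it cannot be (1/2)-subgaussian.
print(is_subgaussian_on_grid((1.0, -1.0), (0.5, 0.5), 0.5))
```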
Proposition B.8.3. Let $\sigma > 0$ be a real number and let $X$ be a $\sigma^2$-subgaussian random
variable, either real or complex-valued. For any integer $k \geq 0$, there exists $c_k \geq 0$ such
that
\[ \mathbf{E}(|X|^k) \leq c_k \sigma^k. \]
Proof. The random variable $Y = \sigma^{-1}X$ is 1-subgaussian. As in the proof of Theorem
B.5.6 (2), we observe that there exists $c_k \geq 0$ such that
\[ |Y|^k \leq c_k(e^{Y} + e^{-Y}), \]
and therefore, applying the definition of a subgaussian random variable with $t = 1$ and $t = -1$,
\[ \sigma^{-k}\,\mathbf{E}(|X|^k) = \mathbf{E}(|Y|^k) \leq c_k(e^{1/2} + e^{1/2}) = 2c_k e^{1/2}, \]
which gives the result (with $c_k$ replaced by $2e^{1/2}c_k$). □
Remark B.8.4. A more precise argument leads to specific values of $c_k$. For instance,
if $X$ is real-valued, one can show that the inequality holds with $c_k = k2^{k/2}\Gamma(k/2)$.
B.9. Poisson random variables
Let $\lambda \geq 0$ be a real number. A random variable $X$ is said to have a Poisson distribution
with parameter $\lambda \in [0, +\infty[$ if and only if it is integral-valued, and if for any integer $k \geq 0$,
we have
\[ \mathbf{P}(X = k) = e^{-\lambda}\, \frac{\lambda^k}{k!}. \]
One checks immediately that
\[ \mathbf{E}(X) = \lambda, \qquad \mathbf{V}(X) = \lambda, \]
and that the characteristic function of $X$ is
\[ \varphi_X(t) = e^{-\lambda} \sum_{k \geq 0} \frac{\lambda^k}{k!}\, e^{ikt} = \exp(\lambda(e^{it} - 1)). \tag{B.11} \]
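Proposition B.9.1. Let $(\lambda_n)$ be a sequence of real numbers such that $\lambda_n \to +\infty$ as
$n \to +\infty$, and let $X_n$ be a Poisson random variable with parameter $\lambda_n$. Then
\[ \frac{X_n - \lambda_n}{\sqrt{\lambda_n}} \]
converges in law to a standard gaussian random variable.
Proof. Use the Lévy Criterion: the characteristic function $\varphi_n$ of $(X_n - \lambda_n)/\sqrt{\lambda_n}$ is given by
\[ \varphi_n(t) = \mathbf{E}(e^{it(X_n - \lambda_n)/\sqrt{\lambda_n}}) = \exp\Bigl(-it\sqrt{\lambda_n} + \lambda_n(e^{it/\sqrt{\lambda_n}} - 1)\Bigr) \]
for $t \in \mathbf{R}$, by (B.11). Since
\[ -it\sqrt{\lambda_n} + \lambda_n(e^{it/\sqrt{\lambda_n}} - 1) = -it\sqrt{\lambda_n} + \lambda_n\Bigl(\frac{it}{\sqrt{\lambda_n}} - \frac{t^2}{2\lambda_n} + O\Bigl(\frac{|t|^3}{\lambda_n^{3/2}}\Bigr)\Bigr) = -\frac{t^2}{2} + O\Bigl(\frac{|t|^3}{\lambda_n^{1/2}}\Bigr), \]
we obtain $\varphi_n(t) \to \exp(-t^2/2)$, which is the characteristic function of a standard gaussian
random variable. □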
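The rate $O(|t|^3/\lambda_n^{1/2})$ in this computation is easy to observe numerically. The sketch below evaluates the characteristic function of the normalized Poisson variable directly from (B.11) and compares it with $\exp(-t^2/2)$; the function name is ours, and the values of $\lambda$ are arbitrary.

```python
import cmath
import math


def normalized_poisson_char_fn(t: float, lam: float) -> complex:
    """Characteristic function of (X - lam) / sqrt(lam) for X Poisson of parameter lam.

    Uses E(exp(i*s*X)) = exp(lam * (exp(i*s) - 1)) with s = t / sqrt(lam).
    """
    root = math.sqrt(lam)
    return cmath.exp(-1j * t * root + lam * (cmath.exp(1j * t / root) - 1))


t = 1.0
for lam in (10.0, 1e3, 1e5):
    # The distance to exp(-t^2/2) decreases roughly like 1 / sqrt(lam).
    print(lam, abs(normalized_poisson_char_fn(t, lam) - math.exp(-t * t / 2)))
```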
B.10. Random series
We will need some fairly elementary results on certain random series, especially con-
cerning almost sure convergence. We first have a well-known sufficient criterion of Kol-
mogorov for convergence in the case of independent summands:
Theorem B.10.1 (Kolmogorov). Let $(X_n)_{n \geq 1}$ be a sequence of independent complex-valued
random variables such that both series
\[ \sum_{n \geq 1} \mathbf{E}(X_n) \tag{B.12} \]
\[ \sum_{n \geq 1} \mathbf{V}(X_n) \tag{B.13} \]
converge. Then the series
\[ \sum_{n \geq 1} X_n \]
converges almost surely, and hence also in law. Moreover, its sum $X$ is square integrable
and has expectation $\sum \mathbf{E}(X_n)$.
Proof. By replacing $X_n$ with $X_n - \mathbf{E}(X_n)$, we reduce to the case where $\mathbf{E}(X_n) = 0$
for all $n$. Assuming that this is the case, we will prove that the series converges almost
surely by checking that the sequence of partial sums
\[ S_N = \sum_{1 \leq n \leq N} X_n \]
is almost surely a Cauchy sequence. For this purpose, denote
\[ Y_{N,M} = \sup_{1 \leq k \leq M} |S_{N+k} - S_N| \]
for $N, M \geq 1$. For fixed $N$, $Y_{N,M}$ is an increasing sequence of random variables; we denote
by $Y_N = \sup_{k \geq 1} |S_{N+k} - S_N|$ its limit. Because of the estimate
\[ |S_{N+m} - S_{N+l}| \leq |S_{N+m} - S_N| + |S_{N+l} - S_N| \leq 2Y_N \]
for $N \geq 1$ and $m, l \geq 1$, we have
\begin{align*}
\{(S_N)_{N \geq 1} \text{ is not Cauchy}\} &= \bigcup_{k \geq 1} \bigcap_{N \geq 1} \bigcup_{m \geq 1} \bigcup_{l \geq 1} \{|S_{N+m} - S_{N+l}| > 2^{-k}\} \\
&\subset \bigcup_{k \geq 1} \bigcap_{N \geq 1} \{Y_N > 2^{-k-1}\}.
\end{align*}
It is therefore sufficient to prove that
\[ \mathbf{P}\Bigl(\bigcap_{N \geq 1} \{Y_N > 2^{-k-1}\}\Bigr) = 0 \]
for each $k \geq 1$, or what amounts to the same thing, to prove that for any $\varepsilon > 0$, we have
\[ \lim_{N \to +\infty} \mathbf{P}(Y_N > \varepsilon) = 0 \]
(which means that $Y_N$ converges to 0 in probability).
We begin by estimating $\mathbf{P}(Y_{N,M} > \varepsilon)$. If $Y_{N,M}$ were defined as $|S_{N+M} - S_N|$ (without the
sup over $k \leq M$) this would be easy using the Markov inequality. To handle the supremum, we use
Kolmogorov’s Maximal Inequality (see Lemma B.10.3 below): since the $(X_n)_{N+1 \leq n \leq N+M}$
are independent, this shows that for any $\varepsilon > 0$, we have
\[ \mathbf{P}(Y_{N,M} > \varepsilon) = \mathbf{P}\Bigl(\sup_{k \leq M} \Bigl|\sum_{1 \leq n \leq k} X_{N+n}\Bigr| > \varepsilon\Bigr) \leq \frac{1}{\varepsilon^2} \sum_{n=N+1}^{N+M} \mathbf{V}(X_n). \]
Letting $M \to +\infty$, we obtain
\[ \mathbf{P}(Y_N > \varepsilon) \leq \frac{1}{\varepsilon^2} \sum_{n \geq N+1} \mathbf{V}(X_n). \]
From the assumption on the convergence of the series of variances, this tends to 0 as
$N \to +\infty$, which shows that the partial sums converge almost surely as claimed.
Now let $X = \sum X_n$ be the sum of the series, defined almost surely. For $N \geq 1$
and $M \geq 1$, we have
\[ \|S_{M+N} - S_N\|_{L^2}^2 = \mathbf{E}\Bigl(\Bigl|\sum_{n=N+1}^{N+M} X_n\Bigr|^2\Bigr) = \sum_{n=N+1}^{N+M} \mathbf{E}(|X_n|^2) = \sum_{n=N+1}^{N+M} \mathbf{V}(X_n). \]
The assumption that (B.13) converges therefore implies that $(S_N)_{N \geq 1}$ is a Cauchy
sequence in $L^2$, hence converges. Its limit necessarily coincides (almost surely) with the
sum $X$, which shows that $X$ is square-integrable. It follows that it is integrable and that
its expectation can be computed as the sum of $\mathbf{E}(X_n)$. □
Remark B.10.2. (1) This result is a special case of Kolmogorov’s Three Series Theorem
which gives a necessary and sufficient condition for almost sure convergence of a series
of independent complex random variables $(X_n)$, namely it is enough that for some $c > 0$,
and necessary that for all $c > 0$, the series
\[ \sum_n \mathbf{P}(|X_n| > c), \qquad \sum_n \mathbf{E}(X_n^c), \qquad \sum_n \mathbf{V}(X_n^c) \]
converge, where $X_n^c = X_n$ if $|X_n| \leq c$ and $X_n^c = 0$ otherwise (see, e.g., [9, Th. 22.8] for
the full proof, or try to reduce it to the previous case).
(2) It is worth mentioning two further results for context: (a) the event “the series
converges” is an asymptotic event, in the sense that it doesn’t depend on any finite number
of the random variables; Kolmogorov’s Zero–One Law then shows that this event can
only have probability 0 or 1; (b) a theorem of P. Lévy shows that, again for independent
summands, the almost sure convergence is equivalent to convergence in law, or to
convergence in probability. For proofs and discussion of these facts, see for instance [83,
§0.III].
Here is Kolmogorov’s maximal inequality:
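Lemma B.10.3. Let $M \geq 1$ be an integer, $Y_1, \ldots, Y_M$ independent complex random
variables in $L^2$ with $\mathbf{E}(Y_n) = 0$ for all $n$. Then for any $\varepsilon > 0$, we have
\[ \mathbf{P}\Bigl(\sup_{1 \leq k \leq M} |Y_1 + \cdots + Y_k| > \varepsilon\Bigr) \leq \frac{1}{\varepsilon^2} \sum_{n=1}^{M} \mathbf{V}(Y_n). \]
Proof. Let
\[ S_n = Y_1 + \cdots + Y_n \]
for $1 \leq n \leq M$. We define a random variable $T$ with values in $[0, +\infty]$ by $T = \infty$ if
$|S_n| \leq \varepsilon$ for all $n \leq M$, and otherwise
\[ T = \inf\{n \leq M \mid |S_n| > \varepsilon\}. \]
We then have
\[ \Bigl\{\sup_{1 \leq k \leq M} |Y_1 + \cdots + Y_k| > \varepsilon\Bigr\} = \bigcup_{1 \leq n \leq M} \{T = n\}, \]
and the union is disjoint. In particular, we get
\[ \mathbf{P}\Bigl(\sup_{1 \leq k \leq M} |S_k| > \varepsilon\Bigr) = \sum_{n=1}^{M} \mathbf{P}(T = n). \]
We now note that $|S_n|^2 \geq \varepsilon^2$ on the event $\{T = n\}$, so that we can also write
\[ \mathbf{P}\Bigl(\sup_{1 \leq k \leq M} |S_k| > \varepsilon\Bigr) \leq \frac{1}{\varepsilon^2} \sum_{n=1}^{M} \mathbf{E}(|S_n|^2 \mathbf{1}_{\{T=n\}}). \tag{B.14} \]
We claim next that
\[ \mathbf{E}(|S_n|^2 \mathbf{1}_{\{T=n\}}) \leq \mathbf{E}(|S_M|^2 \mathbf{1}_{\{T=n\}}) \tag{B.15} \]
for all $n \leq M$.
Indeed, if we write $S_M = S_n + R_n$, the independence assumption shows that $R_n$ is
independent of $(Y_1, \ldots, Y_n)$, and in particular is independent of the indicator function
of the event $\{T = n\}$, which only depends on $Y_1, \ldots, Y_n$. Moreover, we have $\mathbf{E}(R_n) = 0$.
Now, taking the modulus square in the definition and multiplying by this indicator
function, we get
\[ |S_M|^2 \mathbf{1}_{\{T=n\}} = |S_n|^2 \mathbf{1}_{\{T=n\}} + S_n \overline{R_n}\, \mathbf{1}_{\{T=n\}} + \overline{S_n} R_n \mathbf{1}_{\{T=n\}} + |R_n|^2 \mathbf{1}_{\{T=n\}}. \]
Taking then the expectation, and using the positivity of the last term, this gives
\[ \mathbf{E}(|S_M|^2 \mathbf{1}_{\{T=n\}}) \geq \mathbf{E}(|S_n|^2 \mathbf{1}_{\{T=n\}}) + \mathbf{E}(S_n \overline{R_n}\, \mathbf{1}_{\{T=n\}}) + \mathbf{E}(\overline{S_n} R_n \mathbf{1}_{\{T=n\}}). \]
But, by independence, we have
\[ \mathbf{E}(S_n \overline{R_n}\, \mathbf{1}_{\{T=n\}}) = \mathbf{E}(S_n \mathbf{1}_{\{T=n\}})\, \mathbf{E}(\overline{R_n}) = 0, \]
and similarly $\mathbf{E}(\overline{S_n} R_n \mathbf{1}_{\{T=n\}}) = 0$. Thus we get the bound (B.15).
Using this in (B.14), this gives
\[ \mathbf{P}\Bigl(\sup_{1 \leq k \leq M} |S_k| > \varepsilon\Bigr) \leq \frac{1}{\varepsilon^2} \sum_{n=1}^{M} \mathbf{E}(|S_M|^2 \mathbf{1}_{\{T=n\}}) \leq \frac{1}{\varepsilon^2}\, \mathbf{E}(|S_M|^2) \]
by positivity once again. Since the $Y_n$ are independent with expectation 0, we have
$\mathbf{E}(|S_M|^2) = \mathbf{V}(Y_1) + \cdots + \mathbf{V}(Y_M)$, which concludes the proof. □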
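For a finite example the inequality can be verified exactly. The sketch below takes the $Y_n$ to be independent uniform signs $\pm 1$ (our choice for the illustration, so that each variance is 1 and the bound is $M/\varepsilon^2$) and enumerates all $2^M$ sign patterns, using exact rational arithmetic. The function names are ours.

```python
from fractions import Fraction
from itertools import product


def max_partial_sum_tail(M: int, eps: int) -> Fraction:
    """Exact P(max_{1<=k<=M} |Y_1 + ... + Y_k| > eps) for independent uniform signs."""
    exceed = 0
    for signs in product((-1, 1), repeat=M):
        partial = 0
        for y in signs:
            partial += y
            if abs(partial) > eps:
                exceed += 1
                break
    return Fraction(exceed, 2 ** M)


def kolmogorov_bound(M: int, eps: int) -> Fraction:
    """Right-hand side of the maximal inequality: (sum of variances) / eps^2."""
    return Fraction(M, eps ** 2)


M = 12
for eps in range(1, 7):
    print(eps, max_partial_sum_tail(M, eps), kolmogorov_bound(M, eps))
```

The enumeration costs $2^M$ steps, so this is only practical for small $M$; it is meant as a check of the statement, not as a way to estimate such probabilities.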
Exercise B.10.4. Deduce from Kolmogorov’s Theorem the non-trivial direction of
the Borel–Cantelli Lemma: if $(A_n)_{n \geq 1}$ is a sequence of independent events such that
\[ \sum_{n \geq 1} \mathbf{P}(A_n) = +\infty, \]
then an element of the underlying probability space belongs almost surely to infinitely
many of the sets $A_n$.
The second result we need is more subtle. It concerns similar series, but without the
independence assumption, which is replaced by an orthogonality condition.
Theorem B.10.5 (Menshov-Rademacher). Let $(X_n)$ be a sequence of complex-valued
random variables such that $\mathbf{E}(X_n) = 0$ and
\[ \mathbf{E}(X_n \overline{X_m}) = \begin{cases} 0 & \text{if $n \neq m$,} \\ 1 & \text{if $n = m$.} \end{cases} \]
Let $(a_n)$ be any sequence of complex numbers such that
\[ \sum_{n \geq 1} |a_n|^2 (\log n)^2 < +\infty. \]
Then the series
\[ \sum_{n \geq 1} a_n X_n \]
converges almost surely, and hence also in law.
Remark B.10.6. Consider the probability space $\Omega = \mathbf{R}/\mathbf{Z}$ with the Lebesgue measure,
and the random variables $X_n(t) = e(nt)$ for $n \in \mathbf{Z}$. One easily sees (adapting to
double-sided sequences and symmetric partial sums) that Theorem B.10.5 implies that
the series
\[ \sum_{n \in \mathbf{Z}} a_n e(nt) \]
converges almost everywhere (with respect to Lebesgue measure), provided
\[ \sum_{n \in \mathbf{Z}} |a_n|^2 (\log |n|)^2 < +\infty. \]
This may be proved more directly (see, e.g., [121, III, th. 4.4]), using properties of
Fourier series, but it is far from obvious. Note that, in this case, a very famous theorem
of Carleson shows that the condition may be replaced with $\sum |a_n|^2 < +\infty$. On the other
hand, Menshov proved that Theorem B.10.5 cannot in general be relaxed in this way:
for general orthonormal sequences, the term $(\log n)^2$ cannot be replaced by any positive
function $f(n)$ such that $f(n) = o((\log n)^2)$, even for $\mathbf{R}/\mathbf{Z}$.
We begin with a lemma which will play an auxiliary role similar to Kolmogorov’s
inequality.
Lemma B.10.7. Let $(X_1, \ldots, X_N)$ be orthonormal random variables, $(a_1, \ldots, a_N)$ be
complex numbers and $S_k = a_1 X_1 + \cdots + a_k X_k$ for $1 \leq k \leq N$. We have
\[ \mathbf{E}\Bigl(\max_{1 \leq k \leq N} |S_k|^2\Bigr) \ll (\log N)^2 \sum_{n=1}^{N} |a_n|^2, \]
where the implied constant is absolute.
Proof. The basic ingredient is a simple combinatorial property, which we present a
bit abstractly. We claim that there exists a family $J$ of discrete intervals
\[ I = \{n_I, \ldots, m_I - 1\}, \qquad m_I - n_I \geq 1, \]
for $I \in J$, with the following two properties:
(1) any interval $1 \leq n \leq M$ with $M \leq N$ is the disjoint union of $\ll \log N$ intervals
$I \in J$;
(2) an integer $n$ with $1 \leq n \leq N$ belongs to $\ll \log N$ intervals in $J$;
and in both cases the implied constant is independent of $N$.
To see this, let $r \geq 1$ be such that $2^{r-1} \leq N \leq 2^r$ (so that $r \ll \log N$), and consider
for instance the family $J$ of dyadic intervals
\[ I_{i,j} = \{n \mid 1 \leq n \leq N \text{ and } i2^j \leq n < (i+1)2^j\} \]
for $0 \leq j \leq r$ and $1 \leq i \leq 2^{r-j}$. (The proofs of both properties in this case are left to the
reader.)
Now, having fixed such a collection of intervals, we denote by $T$ the smallest integer
between 1 and $N$ such that
\[ \max_{1 \leq k \leq N} |S_k| = |S_T|. \]
By our first property of the intervals $J$, we can write
\[ S_T = \sum_{I} \widetilde{S}_I \]
where $I$ runs over a set of $\ll \log N$ disjoint intervals in $J$, and
\[ \widetilde{S}_I = \sum_{n \in I} a_n X_n \]
is the corresponding partial sum. By the Cauchy–Schwarz inequality, and the first property
again, we get
\[ |S_T|^2 \ll (\log N) \sum_{I} |\widetilde{S}_I|^2 \ll (\log N) \sum_{I \in J} |\widetilde{S}_I|^2. \]
Taking the expectation and using orthonormality, we derive
\begin{align*}
\mathbf{E}\Bigl(\max_{1 \leq k \leq N} |S_k|^2\Bigr) = \mathbf{E}(|S_T|^2) &\ll (\log N) \sum_{I \in J} \mathbf{E}(|\widetilde{S}_I|^2) \\
&= (\log N) \sum_{I \in J} \sum_{n \in I} |a_n|^2 \ll (\log N)^2 \sum_{1 \leq n \leq N} |a_n|^2
\end{align*}
by the second property of the intervals $J$. □
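The first property of the dyadic family, which the proof leaves to the reader, can be explored with a short program. The sketch below decomposes $\{1, \ldots, M\}$ greedily into dyadic blocks $\{n \mid i2^j \leq n < (i+1)2^j\}$ with $i \geq 1$, returned as pairs $(i, j)$. The greedy strategy and the function names are our own choices for illustration; the observed number of blocks is at most twice the number of binary digits of $M + 1$, consistent with the bound $\ll \log N$.

```python
def dyadic_decomposition(M: int):
    """Write {1, ..., M} as a disjoint union of dyadic blocks.

    Each block is returned as a pair (i, j) standing for
    {n : i * 2**j <= n < (i + 1) * 2**j}, with i >= 1.
    """
    pieces = []
    a = 1  # smallest integer not yet covered
    while a <= M:
        j = 0
        # Enlarge the block while it stays aligned at a and inside {1, ..., M}.
        while a % (2 ** (j + 1)) == 0 and a + 2 ** (j + 1) <= M + 1:
            j += 1
        pieces.append((a >> j, j))
        a += 2 ** j
    return pieces


def block_elements(i: int, j: int):
    """The integers in the dyadic block indexed by (i, j)."""
    return range(i * 2 ** j, (i + 1) * 2 ** j)


print(dyadic_decomposition(6))
print(max(len(dyadic_decomposition(M)) for M in range(1, 1025)))
```

For instance, `dyadic_decomposition(6)` returns the blocks $\{1\}$, $\{2, 3\}$, $\{4, 5\}$, $\{6\}$.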


Proof of Theorem B.10.5. If the factor $(\log N)^2$ in Lemma B.10.7 were replaced
by $(\log n)^2$ inside the sum, we would proceed just like the deduction of Theorem B.10.1
from Lemma B.10.3. Since this is not the case, a slightly different argument is needed.
We define
\[ S_n = a_1 X_1 + \cdots + a_n X_n \]
for $n \geq 1$. For $j \geq 0$, we also define the dyadic sum
\[ \widetilde{S}_j = \sum_{2^j < n \leq 2^{j+1}} a_n X_n = S_{2^{j+1}} - S_{2^j}. \]
We first note that the series
\[ T = \sum_{j \geq 0} (j+1)^2 |\widetilde{S}_j|^2 \]
converges almost surely. Indeed, since it is a series of non-negative terms, it suffices to
show that $\mathbf{E}(T) < +\infty$. But we have
\[ \mathbf{E}(T) = \sum_{j \geq 0} (j+1)^2\, \mathbf{E}(|\widetilde{S}_j|^2) = \sum_{j \geq 0} (j+1)^2 \sum_{2^j < n \leq 2^{j+1}} |a_n|^2 \ll \sum_{n \geq 1} |a_n|^2 (\log 2n)^2 < +\infty \]
by orthonormality and by the assumption of the theorem.
Next, we observe that for $j \geq 0$ and $k \geq 0$, we have
\[ |S_{2^{j+k}} - S_{2^j}| \leq \sum_{i=j}^{j+k-1} |\widetilde{S}_i| \leq T^{1/2} \Bigl(\sum_{j \leq i < j+k} \frac{1}{(i+1)^2}\Bigr)^{1/2} \ll \Bigl(\frac{T}{j+1}\Bigr)^{1/2} \]
by the Cauchy–Schwarz inequality. We conclude that the sequence $(S_{2^j})$ is almost surely
a Cauchy sequence, and hence converges almost surely to a random variable $S$.
Finally, to prove that $(S_n)$ converges almost surely to $S$, we observe that for any $n \geq 1$,
and $j \geq 0$ such that $2^j \leq n < 2^{j+1}$, we have
\[ |S_n - S_{2^j}| \leq M_j = \max_{2^j < k \leq 2^{j+1}} \Bigl|\sum_{m=2^j+1}^{k} a_m X_m\Bigr|. \tag{B.16} \]
Lemma B.10.7 implies that
\[ \mathbf{E}\Bigl(\sum_{j \geq 0} M_j^2\Bigr) = \sum_{j \geq 0} \mathbf{E}(M_j^2) \ll \sum_{n \geq 1} (\log 2n)^2 |a_n|^2 < +\infty, \]
which means in particular that $M_j$ tends to 0 as $j \to +\infty$ almost surely. From (B.16)
and the convergence of $(S_{2^j})_j$ to $S$, we deduce that $(S_n)$ converges almost surely to $S$.
This finishes the proof. □
We will also use information on the support of the distribution of a random series
with independent summands.
Proposition B.10.8. Let $B$ be a separable Banach space. Let $(X_n)_{n \geq 1}$ be a sequence
of independent $B$-valued random variables such that the series $X = \sum X_n$ converges
almost surely.³ The support of the law of $X$ is the closure of the set of all convergent
series of the form $\sum x_n$, where $x_n$ belongs to the support of the law of $X_n$ for all $n \geq 1$.
³ Recall that by the result of P. Lévy mentioned in Remark B.10.2, this is in fact equivalent to
convergence in law.
Proof. For $N \geq 1$, we write
\[ S_N = \sum_{n=1}^{N} X_n, \qquad R_N = X - S_N. \]
The variables $S_N$ and $R_N$ are independent.
First, we observe that Lemmas B.2.1 and B.2.2 imply that the support of $S_N$ is the
closure of the set of elements $x_1 + \cdots + x_N$ with $x_n \in \mathrm{supp}(X_n)$ for $1 \leq n \leq N$ (apply
Lemma B.2.1 to the law of $(X_1, \ldots, X_N)$ on $B^N$, which has support the product of the
$\mathrm{supp}(X_n)$ by Lemma B.2.2, and to the addition map $B^N \to B$).
We now prove that all convergent series $\sum x_n$ with $x_n \in \mathrm{supp}(X_n)$ belong to the
support of $X$, hence the closure of this set is contained in the support of $X$, as claimed.
Thus let $x = \sum x_n$ be of this type. Let $\varepsilon > 0$ be fixed.
For all $N$ large enough, we have
\[ \Bigl\|\sum_{n > N} x_n\Bigr\| < \varepsilon, \]
and it follows that $x_1 + \cdots + x_N$, which belongs to the support of $S_N$ as first remarked,
also belongs to the open ball $U_\varepsilon$ of radius $\varepsilon$ around $x$. Hence
\[ \mathbf{P}(S_N \in U_\varepsilon) > 0 \]
for all $N$ large enough ($U_\varepsilon$ is an open neighborhood of some element in the support of
$S_N$).
Now the almost sure convergence implies (by the dominated convergence theorem, for
instance) that $\mathbf{P}(\|R_N\| > \varepsilon) \to 0$ as $N \to +\infty$. Therefore, taking $N$ suitably large, we get
\begin{align*}
\mathbf{P}(\|X - x\| < 2\varepsilon) &\geq \mathbf{P}(\|S_N - x\| < \varepsilon \text{ and } \|R_N\| < \varepsilon) \\
&= \mathbf{P}(\|S_N - x\| < \varepsilon)\, \mathbf{P}(\|R_N\| < \varepsilon) > 0
\end{align*}
(by independence). Since $\varepsilon$ is arbitrary, this shows that $x \in \mathrm{supp}(X)$, as claimed.
Conversely, let $x \in \mathrm{supp}(X)$. For any $\varepsilon > 0$, we have
\[ \mathbf{P}\Bigl(\Bigl\|\sum_{n \geq 1} X_n - x\Bigr\| < \varepsilon\Bigr) > 0. \]
Since, for any $n_0 \geq 1$, we have
\[ \mathbf{P}\Bigl(\Bigl\|\sum_{n \geq 1} X_n - x\Bigr\| < \varepsilon \text{ and } X_{n_0} \notin \mathrm{supp}(X_{n_0})\Bigr) = 0, \]
this means in fact that
\[ \mathbf{P}\Bigl(\Bigl\|\sum_{n \geq 1} X_n - x\Bigr\| < \varepsilon \text{ and } X_n \in \mathrm{supp}(X_n) \text{ for all } n\Bigr) > 0. \]
In particular, we can find $x_n \in \mathrm{supp}(X_n)$ such that the series $\sum x_n$ converges and
\[ \Bigl\|\sum_{n \geq 1} x_n - x\Bigr\| < \varepsilon, \]
and hence $x$ belongs to the closure of the set of convergent series $\sum x_n$ with $x_n$ in the
support of $X_n$ for all $n$. □
B.11. Some probability in Banach spaces
We consider in this section some facts about probability in a (complex) Banach space
V. Most are relatively elementary. For simplicity, we will always assume that V is
separable (so that, in particular, Borel measures on V have a well-defined support).
The first result concerns series
\[ \sum_{n} X_n \]
where $(X_n)$ is a sequence of symmetric random variables, which means that for any $N \geq 1$,
and for any choice $(\varepsilon_1, \ldots, \varepsilon_N)$ of signs $\varepsilon_n \in \{-1, 1\}$ for $1 \leq n \leq N$, the random vectors
\[ (X_1, \ldots, X_N) \quad\text{and}\quad (\varepsilon_1 X_1, \ldots, \varepsilon_N X_N) \]
have the same distribution.
Symmetric random variables have remarkable properties. For instance, we have:
Proposition B.11.1 (Lévy). Let $V$ be a separable Banach space with norm $\|\cdot\|$,
and $(X_n)$ a sequence of $V$-valued random variables. Assume that the sequence $(X_n)$ is
symmetric. Let
\[ S_N = X_1 + \cdots + X_N \]
for $N \geq 1$. For $N \geq 1$ and $\varepsilon > 0$, we have
\[ \mathbf{P}\Bigl(\max_{1 \leq n \leq N} \|S_n\| > \varepsilon\Bigr) \leq 2\,\mathbf{P}(\|S_N\| > \varepsilon). \]

This result is known as Lévy’s reflection principle, and can be compared with Kol-
mogorov’s maximal inequality (Lemma B.10.3).
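Proof. Similarly to the proof of Lemma B.10.3, we define a random variable $T$
by $T = \infty$ if $\|S_n\| \leq \varepsilon$ for all $n \leq N$, and otherwise
\[ T = \inf\{n \leq N \mid \|S_n\| > \varepsilon\}. \]
Fix $k$ with $1 \leq k \leq N$ and consider the random variables
\[ X'_n = X_n \text{ for } 1 \leq n \leq k, \qquad X'_n = -X_n \text{ for } k+1 \leq n \leq N. \]
By assumption, the sequence $(X'_n)_{1 \leq n \leq N}$ has the same distribution as $(X_n)_{1 \leq n \leq N}$. Let
$S'_n$ denote the partial sums of the sequence $(X'_n)$, and $T'$ the analogue of $T$ for the sequence
$(X'_n)$. The event $\{T' = k\}$ is the same as $\{T = k\}$ since $X'_n = X_n$ for $n \leq k$. On the other
hand, we have
\[ S'_N = X_1 + \cdots + X_k - X_{k+1} - \cdots - X_N = 2S_k - S_N. \]
Therefore
\[ \mathbf{P}(\|S_N\| > \varepsilon \text{ and } T = k) = \mathbf{P}(\|S'_N\| > \varepsilon \text{ and } T' = k) = \mathbf{P}(\|2S_k - S_N\| > \varepsilon \text{ and } T = k). \]
By the triangle inequality (note that $2S_k = S_N + (2S_k - S_N)$ and that $\|S_k\| > \varepsilon$ when $T = k$), we have
\[ \{T = k\} \subset \{\|S_N\| > \varepsilon \text{ and } T = k\} \cup \{\|2S_k - S_N\| > \varepsilon \text{ and } T = k\}. \]
We deduce
\begin{align*}
\mathbf{P}\Bigl(\max_{1 \leq n \leq N} \|S_n\| > \varepsilon\Bigr) &= \sum_{k=1}^{N} \mathbf{P}(T = k) \\
&\leq \sum_{k=1}^{N} \mathbf{P}(\|S_N\| > \varepsilon \text{ and } T = k) + \sum_{k=1}^{N} \mathbf{P}(\|2S_k - S_N\| > \varepsilon \text{ and } T = k) \\
&= 2\,\mathbf{P}(\|S_N\| > \varepsilon).
\end{align*}
□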
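The reflection principle can be confirmed exactly in the simplest symmetric situation, namely independent uniform signs $X_n = \pm 1$ in $V = \mathbf{R}$ (a choice made here only for illustration). The sketch below enumerates all $2^N$ outcomes and returns both probabilities as exact fractions; the function name is ours.

```python
from fractions import Fraction
from itertools import product


def reflection_check(N: int, eps: int):
    """Return (P(max_n |S_n| > eps), P(|S_N| > eps)) for N independent uniform signs."""
    hits_max = 0
    hits_last = 0
    for signs in product((-1, 1), repeat=N):
        partial = 0
        running_max = 0
        for x in signs:
            partial += x
            running_max = max(running_max, abs(partial))
        hits_max += running_max > eps
        hits_last += abs(partial) > eps
    total = 2 ** N
    return Fraction(hits_max, total), Fraction(hits_last, total)


N = 10
for eps in range(0, 6):
    p_max, p_last = reflection_check(N, eps)
    # The first probability should be at most twice the second.
    print(eps, p_max, 2 * p_last)
```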

We now consider the special case where the Banach space $V$ is $C([0, 1])$, the space of
complex-valued continuous functions on $[0, 1]$ with the norm
\[ \|f\|_\infty = \sup_{t \in [0,1]} |f(t)|. \]
For a $C([0, 1])$-valued random variable $X$ and any fixed $t \in [0, 1]$, we will denote by $X(t)$
the complex-valued random variable that is the value of the random function $X$ at $t$, i.e.,
$X(t) = e_t \circ X$, where $e_t : C([0, 1]) \to \mathbf{C}$ is the evaluation at $t$.
Definition B.11.2 (Convergence of finite distributions). Let $(X_n)$ be a sequence of
$C([0, 1])$-valued random variables and let $X$ be a $C([0, 1])$-valued random variable. One
says that $(X_n)$ converges to $X$ in the sense of finite distributions if and only if, for all
integers $k \geq 1$, and for all
\[ 0 \leq t_1 < \cdots < t_k \leq 1, \]
the random vectors $(X_n(t_1), \ldots, X_n(t_k))$ converge in law to $(X(t_1), \ldots, X(t_k))$, in the sense
of convergence in law in $\mathbf{C}^k$.
One sufficient condition for convergence of finite distributions is the following:
Lemma B.11.3. Let $(X_n)$ be a sequence of $C([0, 1])$-valued random variables and let $X$
be a $C([0, 1])$-valued random variable, all defined on the same probability space. Assume
that, for any $t \in [0, 1]$, the random variables $(X_n(t))$ converge in $L^1$ to $X(t)$. Then $(X_n)$
converges to $X$ in the sense of finite distributions.
Proof. Fix $k \geq 1$ and
\[ 0 \leq t_1 < \cdots < t_k \leq 1. \]
Let $\varphi$ be a Lipschitz function on $\mathbf{C}^k$ (given the distance associated to the norm
\[ \|(z_1, \ldots, z_k)\| = \sum_{i} |z_i|, \]
for instance) with Lipschitz constant $C \geq 0$. Then we have
\[ \bigl|\mathbf{E}(\varphi(X_n(t_1), \ldots, X_n(t_k))) - \mathbf{E}(\varphi(X(t_1), \ldots, X(t_k)))\bigr| \leq C \sum_{i=1}^{k} \mathbf{E}(|X_n(t_i) - X(t_i)|) \]
which tends to 0 as $n \to +\infty$ by our assumption. Hence Proposition B.4.1 shows that
$(X_n(t_1), \ldots, X_n(t_k))$ converges in law to $(X(t_1), \ldots, X(t_k))$. This proves the lemma. □
Convergence in finite distributions is a necessary condition for convergence in law of
$(X_n)$ to $X$, but it is not sufficient: a simple example (see [10, Example 2.5]) consists in
taking the random variable $X_n$, for $n \geq 2$, to be the constant random variable equal to the function
$f_n$ that is piecewise linear on $[0, 1/n]$, $[1/n, 2/n]$ and $[2/n, 1]$, and such that $0 \mapsto 0$,
$1/n \mapsto 1$, $2/n \mapsto 0$ and $1 \mapsto 0$. Then it is elementary that $X_n$ converges to the constant
zero random variable in the sense of finite distributions, but that $X_n$ does not converge
in law to 0 (because $f_n$ does not converge uniformly to 0).
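This counterexample is easy to visualize numerically. The sketch below assumes the tent functions are supported on $[0, 2/n]$ with peak value 1 at $1/n$ (so that the breakpoints are increasing), and encodes them by the closed formula $\max(0, 1 - |nt - 1|)$, which is our own rewriting. At any fixed $t$ the values $f_n(t)$ are eventually 0, while the sup norm of $f_n$ stays equal to 1.

```python
def tent(n: int, t: float) -> float:
    """Tent function: 0 at t = 0, 1 at t = 1/n, 0 for t >= 2/n, linear in between."""
    return max(0.0, 1.0 - abs(n * t - 1.0))


t = 0.3
# Pointwise convergence: f_n(t) vanishes as soon as 2/n <= t.
print([tent(n, t) for n in (2, 3, 5, 7, 10, 100)])
# No uniform convergence: the peak value at 1/n is always 1 (up to rounding).
print([tent(n, 1.0 / n) for n in (2, 10, 100, 1000)])
```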
Nevertheless, under the additional condition of tightness of the sequence of random
variables (see Definition B.3.6), the convergence of finite distributions implies convergence
in law.
Theorem B.11.4. Let (Xn ) be a sequence of C([0, 1])-valued random variables and let
X be a C([0, 1])-valued random variable. Suppose that (Xn ) converges to X in the sense
of finite distributions. Then (Xn ) converges in law to X in the sense of C([0, 1])-valued
random variables if and only if (Xn ) is tight.
For a proof, see, e.g., [10, Th. 7.1]. The key ingredient is Prokhorov’s Theorem
(see [10, Th. 5.1]) that states that a tight family of random variables is relatively compact
in the space $P$ of probability measures on $C([0, 1])$, given the topology of convergence in
law. To see how this implies the result, we note that convergence in the sense of finite
distributions of a sequence implies at least that it has at most one limit in $P$ (because
probability measures on $C([0, 1])$ are uniquely determined by the family of their finite
distributions, see [10, Ex. 1.3]). Suppose now that there exists a continuous bounded
function $f$ on $C([0, 1])$ such that
\[ \mathbf{E}(f(X_n)) \]
does not converge to $\mathbf{E}(f(X))$. Then there exist $\delta > 0$ and some subsequence $(X_{n_k})$ that
satisfy $|\mathbf{E}(f(X_{n_k}) - f(X))| > \delta$ for all $k$. This subsequence also converges to $X$ in the
sense of finite distributions, and by relative compactness admits a further subsequence
that converges in law; but the limit of that further subsequence must then be $X$, which
contradicts the inequality above.
Remark B.11.5. For certain purposes, it is important to observe that this proof of
convergence in law is indirect and does not give quantitative estimates.
We will also use a variant of this result involving Fourier series. A minor issue is that
we wish to consider functions f on [0, 1] that are not necessarily periodic, in the sense
that f (0) might differ from f (1). However, we will have f (0) = 0. To account for this, we
use the identity function in addition to the periodic exponentials to represent continuous
functions with f (0) = 0.
We denote by $C_0([0, 1])$ the subspace of $C([0, 1])$ of functions vanishing at 0. We
denote by $e_0$ the function $e_0(t) = t$, and for $h \neq 0$, we put $e_h(t) = e(ht)$. We denote
further by $C_0(\mathbf{Z})$ the Banach space of complex-valued functions on $\mathbf{Z}$ converging to 0 at
infinity with the sup norm. We define a continuous linear map $\mathrm{FT} : C([0, 1]) \to C_0(\mathbf{Z})$ by
mapping $f$ to the sequence $(\widetilde{f}(h))_{h \in \mathbf{Z}}$ of its Fourier coefficients, where $\widetilde{f}(0) = f(1)$ and
for $h \neq 0$, we have
\[ \widetilde{f}(h) = \int_0^1 (f(t) - tf(1))\,e(-ht)\,dt = \int_0^1 (f - f(1)e_0)\,e_{-h}. \]
We want to relate convergence in law in $C_0([0, 1])$ with convergence, in law or in the
sense of finite distributions, of these “Fourier coefficients” in $C_0(\mathbf{Z})$. Here convergence
of finite distributions of a sequence $(X_n)$ of $C_0(\mathbf{Z})$-valued random variables to $X$ means
that for any $H \geq 1$, the vectors $(X_{n,h})_{|h| \leq H}$ converge in law to $(X_h)_{|h| \leq H}$, in the sense of
convergence in law in $\mathbf{C}^{2H+1}$.
First, since FT is continuous, Proposition B.3.2 gives immediately
Lemma B.11.6. If (Xn )n is a sequence of C0 ([0, 1])-valued random variables that con-
verges in law to a random variable X, then FT(Xn ) converges in law to FT(X).
Next, we check that the Fourier coefficients determine the law of a C0 ([0, 1])-valued
random variable (this is the analogue of [10, Ex. 1.3]).
Lemma B.11.7. If X and Y are C0 ([0, 1])-valued random variables and if FT(X) and
FT(Y) have the same finite distributions, then X and Y have the same law.
Proof. For $f \in C_0([0, 1])$, the function $g(t) = f(t) - tf(1)$ extends to a 1-periodic
continuous function on $\mathbf{R}$. By Fejér’s Theorem on the uniform convergence of Cesàro
means of Fourier series of continuous periodic functions (see, e.g., [121, III, Th. 3.4]), we
have
\[ f(t) - tf(1) = \lim_{H \to +\infty} \sum_{|h| \leq H} \Bigl(1 - \frac{|h|}{H}\Bigr) \widetilde{f}(h)\, e(ht) \]
uniformly for $t \in [0, 1]$. Evaluating at $t = 0$, where the left-hand side vanishes, we deduce
that
\[ f = \lim_{H \to +\infty} C_H(f) \]
where
\[ C_H(f) = f(1)e_0 + \sum_{\substack{|h| \leq H \\ h \neq 0}} \Bigl(1 - \frac{|h|}{H}\Bigr) \widetilde{f}(h)(e_h - 1). \]
Note that $C_H(f) \in C_0([0, 1])$.
We now claim that CH (X) converges to X as C0 ([0, 1])-valued random variables.
Indeed, let ϕ be a bounded continuous function on C0 ([0, 1]), say with |ϕ| ≤ M. By
the above, we have ϕ(CH (X)) → ϕ(X) as H → +∞ pointwise on C0 ([0, 1]). Since
|ϕ(CH (X))| ≤ M, which is integrable on the underlying probability space, Lebesgue’s
dominated convergence theorem implies that E(ϕ(CH (X))) → E(ϕ(X)). This proves the
claim.
In view of the definition of CH (f ), which only involves finitely many Fourier coeffi-
cients, the equality of finite distributions of FT(X) and FT(Y) implies by composition
that for any H > 1, the C0 ([0, 1])-valued random variables CH (X) and CH (Y) have the
same law. Since we have seen that CH (X) converges in law to X and that CH (Y) converges
in law to Y, it follows that X and Y have the same law. 
Now comes the convergence criterion:
Proposition B.11.8. Let (Xn ) be a sequence of C0 ([0, 1])-valued random variables
and let X be a C0 ([0, 1])-valued random variable. Suppose that FT(Xn ) converges to
FT(X) in the sense of finite distributions. Then (Xn ) converges in law to X in the sense
of C0 ([0, 1])-valued random variables if and only if (Xn ) is tight.
Proof. It is an elementary general fact that if (Xn ) converges in law to X, then
the family (Xn ) is tight. We prove the converse assertion. It suffices to prove that any
subsequence of (Xn ) has a further subsequence that converges in law to X (see [10, Th.
2.6]). Because (Xn ) is tight, so is any of its subsequences. By Prokhorov’s Theorem ([10,
Th. 5.1]), such a subsequence therefore contains a further subsequence, say $(X_{n_k})_{k \geq 1}$,
that converges in law to some probability measure Y. By Lemma B.11.6, the sequence of
Fourier coefficients FT(Xnk ) converges in law to FT(Y). On the other hand, this sequence
converges to FT(X) in the sense of finite distributions, by assumption. Hence FT(X) and
FT(Y) have the same finite distributions, which implies that X and Y have the same law
by Lemma B.11.7. 
Remark B.11.9. The example that was already mentioned before Theorem B.11.4
(namely [10, Ex. 2.5]) also shows that the convergence of FT(Xn ) to FT(X) in the
sense of finite distributions is not sufficient to conclude that (Xn ) converges in law to X.
Indeed, the sequence (Xn ) in that example does not converge in law in C([0, 1]), but for
n ≥ 1, the (constant) random variable Xn satisfies Xn (1) = 0, and by direct computation,
the Fourier coefficients (are constant and) satisfy also $|\widetilde{X}_n(h)| \leq n^{-1}$ for all h ≠ 0, which
implies that FT(Xn ) converges in law to the constant random variable equal to 0 ∈ C0 (Z).
In applications, we need some criteria to detect tightness. One such criterion is due
to Kolmogorov:
Proposition B.11.10 (Kolmogorov’s tightness criterion). Let (Xn ) be a sequence of
C([0, 1])-valued random variables. If there exist real numbers α > 0, δ > 0 and C > 0
such that, for any real numbers 0 ≤ s < t ≤ 1 and any n ≥ 1, we have
\[
\mathbf{E}(|X_n(t) - X_n(s)|^{\alpha}) \leq C |t - s|^{1+\delta}, \tag{B.17}
\]
then (Xn ) is tight.
See for instance [81, Th. I.7] for a proof. The statement does not hold if the exponent
1 + δ is replaced by 1.
In fact, for some applications (as in [79]), one needs a variant where the single
bound (B.17) is replaced by different ones depending on the size of |t − s| relative to n.
Such a result does not seem to follow formally from Proposition B.11.10, because the
left-hand side in the inequality is not monotonic in terms of α (in contrast with the right-
hand side, which is monotonic since |t − s| 6 1). We state a result of this type and sketch
the proof for completeness.
Proposition B.11.11 (Kolmogorov’s tightness criterion, 2). Let (Xn ) be a sequence
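of C([0, 1])-valued random variables. Suppose that there exist positive real numbers
\[
\alpha_1,\quad \alpha_2,\quad \alpha_3,\quad \beta_2 < \beta_1,\quad \delta,\quad C
\]
such that for any real numbers 0 ≤ s < t ≤ 1 and any n ≥ 1, we have
\begin{align}
\mathbf{E}(|X_n(t) - X_n(s)|^{\alpha_1}) &\leq C |t - s|^{1+\delta} && \text{if } 0 \leq |t - s| \leq n^{-\beta_1}, \tag{B.18} \\
\mathbf{E}(|X_n(t) - X_n(s)|^{\alpha_2}) &\leq C |t - s|^{1+\delta} && \text{if } n^{-\beta_1} \leq |t - s| \leq n^{-\beta_2}, \tag{B.19} \\
\mathbf{E}(|X_n(t) - X_n(s)|^{\alpha_3}) &\leq C |t - s|^{1+\delta} && \text{if } n^{-\beta_2} \leq |t - s| \leq 1. \tag{B.20}
\end{align}
Then (Xn ) is tight.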
Sketch of proof. For n ≥ 1, let $D_n \subset [0, 1]$ be the set of dyadic rational numbers
with denominator $2^n$. For δ > 0, let
\[
\omega(X_n, \delta) = \sup\{ |X_n(t) - X_n(s)| \,:\, s, t \in [0, 1],\ |t - s| \leq \delta \}
\]
denote the modulus of continuity of Xn , and for n ≥ 1 and k ≥ 0, let
\[
\xi_{n,k} = \sup\{ |X_n(t) - X_n(s)| \,:\, s, t \in D_k,\ |s - t| = 2^{-k} \}.
\]
We observe that for any α > 0, we have
\[
\mathbf{E}(\xi_{n,k}^{\alpha}) \leq \sum_{\substack{s, t \in D_k \\ |t - s| = 2^{-k}}} \mathbf{E}(|X_n(t) - X_n(s)|^{\alpha}).
\]
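As in [60, p. 269], the key step is to prove that
\[
\lim_{m \to +\infty} \limsup_{n \to +\infty} \mathbf{P}(\omega(X_n, 2^{-m}) > \eta) = 0
\]
for any η > 0 (the conclusion is then derived from this fact combined with the
Ascoli–Arzelà Theorem characterizing compact subsets of C([0, 1])). It is convenient here to use
the notation min(a, b) = a ∧ b. For fixed m and n, we then write
\[
\begin{aligned}
\mathbf{P}(\omega(X_n, 2^{-m}) > \eta) \leq{} & \mathbf{P}\Bigl( \sup_{|t - s| \leq 2^{-m} \wedge n^{-\beta_1}} |X_n(t) - X_n(s)| > \eta \Bigr) \\
& + \mathbf{P}\Bigl( \sup_{2^{-m} \wedge n^{-\beta_1} < |t - s| \leq 2^{-m} \wedge n^{-\beta_2}} |X_n(t) - X_n(s)| > \eta \Bigr) \\
& + \mathbf{P}\Bigl( \sup_{2^{-m} \wedge n^{-\beta_2} < |t - s| \leq 2^{-m}} |X_n(t) - X_n(s)| > \eta \Bigr)
\end{aligned}
\]
(because, for a continuous function f on [0, 1], if |f(t) − f(s)| > η for some (s, t) such
that $|t - s| \leq 2^{-m}$, then there exist some dyadic rational numbers (s′, t′), necessarily with
denominator $2^n$ with n ≥ m, such that |f(t′) − f(s′)| > η). Using (B.18), the first term
is
\[
\leq \sum_{\substack{k \geq m \\ 2^{-k} \leq n^{-\beta_1}}} \frac{\mathbf{E}(\xi_{n,k}^{\alpha_1})}{\eta^{\alpha_1}}
\leq \frac{C}{\eta^{\alpha_1}} \sum_{k \geq m} 2^{k}\, 2^{-k(1+\delta)}
\leq \frac{C}{\eta^{\alpha_1}} \frac{1}{1 - 2^{-\delta}},
\]
and similarly, using (B.19), the second (resp. using (B.20), the third) is
\[
\leq \sum_{\substack{k \geq m \\ \beta_2 \log_2(n) \leq k \leq \beta_1 \log_2(n)}} \frac{\mathbf{E}(\xi_{n,k}^{\alpha_2})}{\eta^{\alpha_2}}
\leq \frac{C}{\eta^{\alpha_2}} \sum_{k \geq m} 2^{k}\, 2^{-k(1+\delta)}
\leq \frac{C}{\eta^{\alpha_2}} \frac{1}{1 - 2^{-\delta}}
\]
(resp. $\leq C \eta^{-\alpha_3} (1 - 2^{-\delta})^{-1}$). The result follows. □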
We will also use the following inequality of Talagrand, which gives a type of subgaussian
behavior of sums of random variables in Banach spaces, extending standard properties
of real or complex-valued random variables.
Theorem B.11.12 (Talagrand). Let V be a separable real Banach space and V′ its
dual. Let $(X_n)_{n \geq 1}$ be a sequence of independent real-valued random variables with |Xn | ≤ 1
almost surely, and let $(v_n)_{n \geq 1}$ be a sequence of elements of V. Assume that the series
$\sum X_n v_n$ converges almost surely in V. Let m ≥ 0 be a median of
\[
\Bigl\| \sum_{n \geq 1} X_n v_n \Bigr\|.
\]
Let σ ≥ 0 be the real number such that
\[
\sigma^2 = \sup_{\substack{\lambda \in V' \\ \|\lambda\| \leq 1}} \sum_{n \geq 1} |\lambda(v_n)|^2.
\]
For any real number t > 0, we have
\[
\mathbf{P}\Bigl( \Bigl\| \sum_{n \geq 1} X_n v_n \Bigr\| > t\sigma + m \Bigr) \leq 4 \exp\Bigl( -\frac{t^2}{16} \Bigr).
\]
We recall that a median m of a real-valued random variable X is any real number
such that
\[
\mathbf{P}(X \geq m) \geq \frac{1}{2}, \qquad \mathbf{P}(X \leq m) \geq \frac{1}{2}.
\]
A median always exists. If X is integrable, then Chebychev’s inequality
\[
\mathbf{P}(X \geq t) \leq \frac{\mathbf{E}(|X|)}{t} \tag{B.21}
\]
shows that m ≤ 2 E(|X|).
Proof. This follows easily from [114, Th. 13.2], which concerns finite sums, by
passing to the limit. 
The application of this inequality will be the following, which is (in the case V = R)
partly a simple variant of a result of Montgomery-Smith [88].
Proposition B.11.13. Let V be a separable real or complex Banach space and V′ its
dual. Let $(X_n)_{n \geq 1}$ be a sequence of independent random variables with |Xn | ≤ 1 almost
surely, which are either real or complex-valued depending on the base field. Let $(v_n)_{n \geq 1}$
be a sequence of elements of V. Assume that the series $\sum X_n v_n$ converges almost surely
in V, and let X be its sum.
(1) Assume that
\[
\sum_{n \leq N} \|v_n\| \ll \log(N), \qquad \sum_{n > N} \|v_n\|^2 \ll \frac{1}{N} \tag{B.22}
\]
for all N ≥ 1. There exists a constant c > 0 such that for any A > 0, we have
\[
\mathbf{P}(\|X\| > A) \leq c \exp(-\exp(c^{-1} A)).
\]
(2) Assume that V is a real Banach space, that (Xn ) is symmetric, identically
distributed, and real-valued, and that there exists λ ∈ V′ of norm 1 such that
\[
\sum_{n \leq N} |\lambda(v_n)| \gg \log(N) \tag{B.23}
\]
for N ≥ 1. Then there exists a constant c > 0 such that for any A > 0, we have
\[
c^{-1} \exp(-\exp(cA)) \leq \mathbf{P}(|\lambda(X)| > A) \leq \mathbf{P}(\|X\| > A).
\]
Proof. We begin with (1), and we first check that we may assume that V is a real
Banach space and that the random variables Xn are real-valued. To see this, if V is
a complex Banach space, we view it as a real Banach space VR (by restricting scalar
multiplication), and we write Xn = Yn + iZn where Yn and Zn are real-valued random
variables. Then X = Y + iZ where
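\[
Y = \sum_{n \geq 1} Y_n v_n, \qquad Z = \sum_{n \geq 1} Z_n v_n
\]
are both almost surely convergent series in VR with independent real coefficients of
absolute value ≤ 1. We then have
\[
\mathbf{P}(\|X\| > A) \leq \mathbf{P}\bigl( \|Y\| > \tfrac{1}{2} A \text{ or } \|Z\| > \tfrac{1}{2} A \bigr) \leq \mathbf{P}\bigl( \|Y\| > \tfrac{1}{2} A \bigr) + \mathbf{P}\bigl( \|Z\| > \tfrac{1}{2} A \bigr)
\]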
for any A > 0, by the triangle inequality. Since the assumptions (B.22) hold indepen-
dently of whether V is viewed as a real or complex Banach space, we deduce that if (1)
holds in the real case, then it also does for complex coefficients.
We now assume that V is a real Banach space. The idea is that if V was simply equal
to R, then the series X would be a subgaussian random variable, and standard estimates
would give a subgaussian upper-bound for P(|X| > A), of the type exp(−cA2 ). Such
a bound would be essentially sharp for a gaussian series. But although this is already
quite strong, it is far from the truth here; intuitively, this is because, in the gaussian case,
the lower-bound for the probability arises from the small but non-zero probability that
a single summand (distributed like a gaussian) might be very large. This cannot happen
for the series X, because each Xn is absolutely bounded.
The actual proof “interpolates” between the subgaussian behavior (given by Tala-
grand’s inequality, when the Banach space is infinite-dimensional) and the boundedness
of the coefficients (Xn ) of the first few steps. This principle goes back (at least) to
Montgomery-Smith [88], and has relations with the theory of interpolation of Banach
spaces.
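Fix an auxiliary parameter s ≥ 1. We write $X = X^{\sharp} + X^{\flat}$, where
\[
X^{\sharp} = \sum_{1 \leq n \leq s^2} X_n v_n, \qquad X^{\flat} = \sum_{n > s^2} X_n v_n.
\]
Let m be a median of the real random variable $\|X^{\flat}\|$. Then for any α > 0 and β > 0, we
have
\[
\mathbf{P}(\|X\| > \alpha + \beta + m) \leq \mathbf{P}(\|X^{\sharp}\| > \alpha) + \mathbf{P}(\|X^{\flat}\| > m + \beta),
\]
by the triangle inequality. We pick
\[
\alpha = 8 \sum_{1 \leq n \leq s^2} \|v_n\|
\]
so that by the assumption |Xn | ≤ 1, we have
\[
\mathbf{P}(\|X^{\sharp}\| > \alpha) = 0.
\]
Then we take β = sσ, where σ ≥ 0 is such that
\[
\sigma^2 = \sup_{\|\lambda\| \leq 1} \sum_{n > s^2} |\lambda(v_n)|^2,
\]
where λ runs over the elements of norm ≤ 1 of the dual space V′. By Talagrand’s
Inequality (Theorem B.11.12), we have
\[
\mathbf{P}(\|X^{\flat}\| > m + \beta) \leq 4 \exp\Bigl( -\frac{s^2}{8} \Bigr).
\]
Hence, for all s ≥ 1, we have
\[
\mathbf{P}(\|X\| > \alpha + \beta + m) \leq 4 \exp\Bigl( -\frac{s^2}{8} \Bigr).
\]
We now select s as large as possible so that m + α + β ≤ A. By Chebychev’s
inequality (B.21), we have
\[
m \leq 2\, \mathbf{E}(\|X^{\flat}\|) \leq 2 \sum_{1 \leq n \leq s^2} \|v_n\|
\]
so that
\[
m + \alpha \ll \sum_{1 \leq n \leq s^2} \|v_n\| \ll \log(2s) \tag{B.24}
\]
for any s ≥ 1 by (B.22). Moreover, for any linear form λ with ‖λ‖ ≤ 1, we have
\[
\sum_{n > s^2} |\lambda(v_n)|^2 \ll \sum_{n > s^2} \|v_n\|^2 \ll \frac{1}{s^2}
\]
by (B.22), so that $\sigma \ll s^{-1}$ and $\beta = s\sigma \ll 1$. It follows that
\[
m + \alpha + \beta \leq c \log(cs) \tag{B.25}
\]
for some constant c ≥ 1 and all s ≥ 1. We finally select s so that c log(cs) = A, i.e.
\[
s = \frac{1}{c} \exp\Bigl( \frac{A}{c} \Bigr)
\]
(assuming, as we may, that A is large enough so that s ≥ 1) and deduce that
\[
\mathbf{P}(\|X\| > A) \leq 4 \exp\Bigl( -\frac{s^2}{8} \Bigr) = 4 \exp\Bigl( -\frac{1}{8c^2} \exp\Bigl( \frac{2A}{c} \Bigr) \Bigr).
\]
This gives the desired upper bound.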
We now prove (2). Replacing the vectors vn by the real numbers λ(vn ) (recall that (2)
is a statement for real Banach spaces and random variables), we may assume that V = R.
Let α > 0 be such that
\[
\sum_{n \leq N} |\lambda(v_n)| \geq \alpha \log(N)
\]
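for N ≥ 1 and let β be a median of |Xn |. We then derive
\[
\mathbf{P}(|X| > A) \geq \mathbf{P}\Bigl( X_n \geq \beta_n \text{ for } 1 \leq n \leq e^{(\alpha\beta)^{-1} A} \text{ and } \sum_{n > e^{A/(\alpha\beta)}} v_n X_n \geq 0 \Bigr).
\]
Since the random variables (Xn ) are independent, this leads to
\[
\mathbf{P}(|X| > A) \geq \Bigl( \frac{1}{4} \Bigr)^{\lfloor \exp(A/(\alpha\beta)) \rfloor}\, \mathbf{P}\Bigl( \sum_{n > e^{A/(\alpha\beta)}} v_n X_n \geq 0 \Bigr).
\]
Furthermore, since each Xn is symmetric, so is the sum
\[
\sum_{n > e^{A/(\alpha\beta)}} v_n X_n,
\]
which means that it has probability ≥ 1/2 to be ≥ 0. Therefore we have
\[
\mathbf{P}(|X| > A) \geq \frac{1}{8}\, e^{-(\log 4) \exp(A/(\alpha\beta))}.
\]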
This is of the right form asymptotically, and thus the proof is completed. 
Remark B.11.14. (1) The typical example where the proposition applies is when ‖vn ‖
is comparable to 1/n.
(2) Many variations along these lines are possible. For instance, in Chapter 3, we
encounter the situation where the vector vn is zero unless n = p is a prime, in which
case
\[
\|v_p\| = \frac{1}{p^{\sigma}}
\]
for some real number σ such that 1/2 < σ < 1. In that case, we have
\[
\sum_{n \leq N} \|v_n\| \ll \frac{N^{1-\sigma}}{\log N}, \qquad \sum_{n > N} \|v_n\|^2 \ll \frac{1}{N^{2\sigma - 1} \log N}
\]
for N ≥ 2 (by the Prime Number Theorem) instead of (B.22), and the adaptation of the
arguments in the proof of the proposition leads to
\[
\mathbf{P}(\|X\| > A) \leq c \exp\Bigl( -c A^{1/(1-\sigma)} (\log A)^{1/(2(1-\sigma))} \Bigr).
\]
(Indeed, check that (B.25) gives here
\[
m + \alpha + \beta \ll \frac{s^{2(1-\sigma)}}{\sqrt{\log s}}
\]
and we take
\[
s = A^{1/(2(1-\sigma))} (\log A)^{1/(4(1-\sigma))}
\]
in the final application of Talagrand’s inequality.)
On the other hand, in Chapter 5, we have a case where (up to re-indexing), the
assumptions (B.22) and (B.23) are replaced by
\[
\sum_{n \leq N} \|v_n\| \asymp (\log N)^2, \qquad \sum_{n > N} \|v_n\|^2 \ll \frac{\log N}{N}.
\]
Then we obtain by the same argument the estimates
\[
\mathbf{P}(\|X\| > A) \leq c \exp(-\exp(c^{-1} A^{1/2})),
\]
\[
c^{-1} \exp(-\exp(c A^{1/2})) \leq \mathbf{P}(|\lambda(X)| > A) \leq \mathbf{P}(\|X\| > A)
\]
for some real number c > 0.
APPENDIX C

Number theory

We review here the facts of number theory that we use, and give references for their
proofs.
C.1. Multiplicative functions and Euler products
Analytic number theory frequently deals with functions f defined for integers n ≥ 1
such that f(mn) = f(m)f(n) whenever m and n are coprime. Any such function that
is not identically zero is called a multiplicative function.¹ A multiplicative function is
uniquely determined by the values $f(p^k)$ for primes p and integers k ≥ 1, and satisfies
f(1) = 1.
We recall that if f and g are functions defined for positive integers, the Dirichlet
convolution $f \star g$ is defined by
\[
(f \star g)(n) = \sum_{d \mid n} f(d)\, g\Bigl( \frac{n}{d} \Bigr).
\]
Its key property is that the generating Dirichlet series
\[
\sum_{n \geq 1} (f \star g)(n)\, n^{-s}
\]
for $f \star g$ is the product of the generating Dirichlet series for f and g (see Proposition A.4.4).
In particular, one deduces that the convolution is associative and commutative, and that
the function δ such that δ(1) = 1 and δ(n) = 0 for all n ≥ 2 is a neutral element. In
other words, for any arithmetic functions f, g and h, we have
\[
f \star g = g \star f, \qquad f \star (g \star h) = (f \star g) \star h, \qquad f \star \delta = f.
\]
Lemma C.1.1. Let f and g be multiplicative functions. Then the Dirichlet convolution
$f \star g$ of f and g is multiplicative. Moreover, the function $f \diamondsuit g$ defined by
\[
(f \diamondsuit g)(n) = \sum_{[a, b] = n} f(a)\, g(b)
\]
is also multiplicative.
Proof. Both statements follow simply from the fact that if n and m are coprime
integers, then any divisor d of nm can be uniquely written d = d′d″ where d′ divides n
and d″ divides m. □
Example C.1.2. To get an idea of the behavior of a multiplicative function, it is
always useful to write down the values at powers of primes. In the situation of the
lemma, the Dirichlet convolution satisfies
\[
(f \star g)(p^k) = \sum_{j=0}^{k} f(p^j)\, g(p^{k-j}),
\]
¹ We emphasize that it is not required that f(mn) = f(m)f(n) for all pairs of positive integers.
whereas
\[
(f \diamondsuit g)(p^k) = \sum_{j=0}^{k-1} \bigl( f(p^j) g(p^k) + f(p^k) g(p^j) \bigr) + f(p^k) g(p^k).
\]
In particular, suppose that f and g are supported on squarefree integers, so that
$f(p^k) = g(p^k) = 0$ for any prime p if k ≥ 2. Then $f \diamondsuit g$ is also supported on squarefree integers
(this is not necessarily the case of $f \star g$), and satisfies
\[
(f \diamondsuit g)(p) = f(p) + g(p) + f(p) g(p)
\]
for all primes p.
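As a small numerical illustration of Lemma C.1.1 and Example C.1.2, the following Python sketch computes both convolutions by brute force and tests multiplicativity on a finite range. The helper names (`dirichlet_conv`, `lcm_conv`, `is_multiplicative`) and the choice of input functions are assumptions made for this illustration only; a finite check is of course not a proof.

```python
from math import gcd

def divisors(n):
    """Return the positive divisors of n (naive trial division)."""
    return [d for d in range(1, n + 1) if n % d == 0]

def dirichlet_conv(f, g):
    """Dirichlet convolution: n -> sum over d | n of f(d) * g(n / d)."""
    return lambda n: sum(f(d) * g(n // d) for d in divisors(n))

def lcm_conv(f, g):
    """Second convolution of the lemma: sum of f(a) * g(b) over lcm(a, b) = n."""
    def h(n):
        ds = divisors(n)  # a and b must both divide n
        return sum(f(a) * g(b) for a in ds for b in ds
                   if a * b // gcd(a, b) == n)
    return h

def is_multiplicative(f, limit=40):
    """Check f(mn) = f(m) f(n) for all coprime m, n with mn <= limit."""
    return all(f(m * n) == f(m) * f(n)
               for m in range(1, limit + 1)
               for n in range(1, limit // m + 1)
               if gcd(m, n) == 1)

# Two simple multiplicative inputs (illustrative choices).
one = lambda n: 1
ident = lambda n: n

tau = dirichlet_conv(one, one)   # number-of-divisors function
h = lcm_conv(one, ident)

print(is_multiplicative(tau), is_multiplicative(h))
```

For p = 2 and k = 2, the prime-power formula of the example predicts the value (1·4 + 1·1) + (1·4 + 1·2) + 1·4 = 15 for `h(4)`, which the brute-force sum reproduces.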
A very important multiplicative function is the Möbius function.
Definition C.1.3. The Möbius function µ(n) is the multiplicative function supported
on squarefree integers such that µ(p) = −1 for all primes p.
In other words, if we factor
\[
n = p_1 \cdots p_j
\]
where each $p_i$ is prime, then we have µ(n) = 0 if there exist indices i ≠ i′ such that $p_i = p_{i'}$, and
otherwise $\mu(n) = (-1)^j$.
A key property of multiplicative functions is their Euler product expansion, as a
product over primes.
Lemma C.1.4. Let f be a multiplicative function such that
\[
\sum_{n \geq 1} |f(n)| < +\infty.
\]
Then we have
\[
\sum_{n \geq 1} f(n) = \prod_{p} \Bigl( \sum_{j \geq 0} f(p^j) \Bigr)
\]
where the product on the right is absolutely convergent. In particular, for all s ∈ C such
that
\[
\sum_{n \geq 1} \frac{f(n)}{n^s}
\]
converges absolutely, we have
\[
\sum_{n \geq 1} \frac{f(n)}{n^s} = \prod_{p} \bigl( 1 + f(p) p^{-s} + \cdots + f(p^k) p^{-ks} + \cdots \bigr),
\]
where the right-hand side converges absolutely.


Proof. For any prime p, the series
\[
1 + f(p) + \cdots + f(p^k) + \cdots
\]
is a subseries of $\sum f(n)$, so that the absolute convergence of the latter (which holds by
assumption) implies that all of these partial series are also absolutely convergent.
We now first assume that f(n) ≥ 0 for all n. Then for any N ≥ 1, we have
\[
\prod_{p \leq N} \sum_{k \geq 0} f(p^k) = \sum_{\substack{n \geq 1 \\ p \mid n \Rightarrow p \leq N}} f(n)
\]
by expanding the product and using the absolute convergence and the uniqueness of
factorization of integers. It follows that
\[
\Bigl| \prod_{p \leq N} \sum_{k \geq 0} f(p^k) - \sum_{n \leq N} f(n) \Bigr| \leq \sum_{n > N} f(n)
\]
(since we assume f(n) ≥ 0). This converges to 0 as N → +∞, because the series $\sum f(n)$
is absolutely convergent. Thus this case is done.
In the general case, replacing f by |f|, the previous argument shows that the product
converges absolutely. Then we get in the same manner
\[
\Bigl| \prod_{p \leq N} \sum_{k \geq 0} f(p^k) - \sum_{n \leq N} f(n) \Bigr| \leq \sum_{n > N} |f(n)| \longrightarrow 0
\]
as N → +∞. □
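Corollary C.1.5. For any s ∈ C such that Re(s) > 1, we have
\begin{align}
\sum_{n \geq 1} n^{-s} &= \prod_{p} \frac{1}{1 - p^{-s}}, \tag{C.1} \\
\sum_{n \geq 1} \mu(n) n^{-s} &= \prod_{p} (1 - p^{-s}) = \frac{1}{\zeta(s)}. \tag{C.2}
\end{align}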
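The Euler product (C.1) can be observed numerically. The sketch below, written under the assumption that a truncated product over primes up to a bound and a truncated sum are adequate stand-ins for the infinite objects, compares both with the classical value ζ(2) = π²/6. The function names and the truncation bound are illustrative choices.

```python
from math import pi

def primes_up_to(n):
    """Sieve of Eratosthenes: list of primes p <= n."""
    sieve = bytearray([1]) * (n + 1)
    sieve[0:2] = b"\x00\x00"
    for p in range(2, int(n ** 0.5) + 1):
        if sieve[p]:
            # cross out the multiples of p starting at p^2
            sieve[p * p :: p] = bytearray(len(range(p * p, n + 1, p)))
    return [p for p in range(2, n + 1) if sieve[p]]

def euler_product(s, bound):
    """Product of 1 / (1 - p^(-s)) over primes p <= bound."""
    prod = 1.0
    for p in primes_up_to(bound):
        prod /= 1.0 - p ** (-s)
    return prod

def partial_sum(s, bound):
    """Sum of n^(-s) over 1 <= n <= bound."""
    return sum(n ** (-s) for n in range(1, bound + 1))

s = 2.0
print(euler_product(s, 10_000), partial_sum(s, 10_000), pi ** 2 / 6)
```

Both truncations approach π²/6 from below; the truncated product is the closer of the two at the same bound, since it already accounts for every integer whose prime factors are all at most the bound.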

Example C.1.6. The fact that the Dirichlet series for the Möbius function is the
inverse of the Riemann zeta function, combined with the link between multiplication and
Dirichlet convolution, leads to the so-called Möbius inversion formula: for arithmetic
functions f and g, the relations
\[
g(n) = \sum_{d \mid n} f(d), \qquad f(n) = \sum_{d \mid n} \mu(d)\, g\Bigl( \frac{n}{d} \Bigr)
\]
(for all n ≥ 1) are equivalent. (Indeed the first means that $g = f \star 1$, where 1 is the
constant function, and the second that $f = g \star \mu$; since $\mu \star 1 = \delta$, which is the multiplicative
function version of the identity $\zeta(s)^{-1} \cdot \zeta(s) = 1$, the equivalence of the two follows from
the associativity of the convolution.)
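The Möbius inversion formula is easy to test on a finite range. In the following sketch, the function `f` is an arbitrary (non-multiplicative) choice made for illustration, and the helper names are hypothetical; the code summates f over divisors and then recovers it with the Möbius function.

```python
def divisors(n):
    """Return the positive divisors of n (naive trial division)."""
    return [d for d in range(1, n + 1) if n % d == 0]

def mobius(n):
    """Möbius function, computed by trial-division factorization."""
    result, p = 1, 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:      # p^2 divides the original integer
                return 0
            result = -result
        p += 1
    if n > 1:                   # one remaining prime factor
        result = -result
    return result

def sum_over_divisors(f):
    """g(n) = sum over d | n of f(d)."""
    return lambda n: sum(f(d) for d in divisors(n))

def mobius_invert(g):
    """f(n) = sum over d | n of mu(d) * g(n / d)."""
    return lambda n: sum(mobius(d) * g(n // d) for d in divisors(n))

f = lambda n: n * n + 1          # arbitrary arithmetic function
g = sum_over_divisors(f)
recovered = mobius_invert(g)

print(all(recovered(n) == f(n) for n in range(1, 200)))
```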
Example C.1.7. Let f and g be multiplicative functions supported on squarefree
integers defining absolutely convergent series. Then for σ > 0, we have
\[
\sum_{m, n \geq 1} \frac{f(m) g(n)}{[m, n]^{\sigma}} = \sum_{d \geq 1} \frac{(f \diamondsuit g)(d)}{d^{\sigma}} = \prod_{p} \Bigl( 1 + \bigl( f(p) + g(p) + f(p) g(p) \bigr) p^{-\sigma} \Bigr).
\]
For instance, consider the case where f and g are both the Möbius function µ. Then
$\mu \diamondsuit \mu$ is supported on squarefree numbers and takes value −1 − 1 + 1 = −1 at primes, so
is in fact equal to µ. We obtain the nice formula
\[
\sum_{m, n \geq 1} \frac{\mu(m) \mu(n)}{[m, n]^{s}} = \sum_{d \geq 1} \frac{(\mu \diamondsuit \mu)(d)}{d^{s}} = \prod_{p} \Bigl( 1 - \frac{1}{p^{s}} \Bigr) = \sum_{n \geq 1} \mu(n) n^{-s} = \frac{1}{\zeta(s)},
\]
for Re(s) > 1.
Example C.1.8. Another very important multiplicative arithmetic function is the
Euler function ϕ defined by $\varphi(q) = |(\mathbf{Z}/q\mathbf{Z})^{\times}|$ for q ≥ 1. This function is multiplicative,
by the Chinese Remainder Theorem, which implies that there exists an isomorphism of
groups
\[
(\mathbf{Z}/q_1 q_2 \mathbf{Z})^{\times} \simeq (\mathbf{Z}/q_1 \mathbf{Z})^{\times} \times (\mathbf{Z}/q_2 \mathbf{Z})^{\times}
\]
when $q_1$ and $q_2$ are coprime integers. We have ϕ(p) = p − 1 if p is prime, and more
generally $\varphi(p^k) = p^k - p^{k-1}$ for p prime and k ≥ 1 (since an element x of $\mathbf{Z}/p^k\mathbf{Z}$ is
invertible if and only if its unique lift in $\{0, \ldots, p^k - 1\}$ is not divisible by p). Hence, by
factorization, we obtain the product expansion
\[
\varphi(n) = \prod_{p \mid n} \bigl( p^{v_p(n)} - p^{v_p(n) - 1} \bigr) = n \prod_{p \mid n} \Bigl( 1 - \frac{1}{p} \Bigr)
\]
where $v_p(n)$ is the p-adic valuation of n, i.e., the exponent of the power of p dividing
exactly n.
We deduce from Lemma C.1.4 the expression
\[
\sum_{n \geq 1} \varphi(n) n^{-s} = \prod_{p} \Bigl( 1 + (p - 1) p^{-s} + (p^2 - p) p^{-2s} + \cdots + (p^k - p^{k-1}) p^{-ks} + \cdots \Bigr) = \frac{\zeta(s - 1)}{\zeta(s)},
\]
valid for Re(s) > 2 (where the series on the left converges absolutely). This may also be deduced from the formula
\[
\varphi(n) = \sum_{d \mid n} \mu(d)\, \frac{n}{d},
\]
i.e., $\varphi = \mu \star \mathrm{Id}$, where Id is the identity arithmetic function.
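The two descriptions of the Euler function in this example (counting invertible classes, and the product over prime divisors) can be compared directly. The sketch below is a brute-force illustration with hypothetical helper names; it also checks the relation that the sum of ϕ(d) over the divisors d of n equals n, which is equivalent by Möbius inversion to the formula expressing ϕ as the convolution of µ with Id.

```python
from math import gcd
from fractions import Fraction

def prime_factors(n):
    """Distinct prime factors of n, by trial division."""
    ps, p = [], 2
    while p * p <= n:
        if n % p == 0:
            ps.append(p)
            while n % p == 0:
                n //= p
        p += 1
    if n > 1:
        ps.append(n)
    return ps

def phi_count(n):
    """phi(n) as the number of invertible residue classes modulo n."""
    return sum(1 for a in range(1, n + 1) if gcd(a, n) == 1)

def phi_product(n):
    """phi(n) = n * product over p | n of (1 - 1/p), in exact arithmetic."""
    value = Fraction(n)
    for p in prime_factors(n):
        value *= Fraction(p - 1, p)
    return int(value)

def divisor_sum_of_phi(n):
    """Sum of phi(d) over the divisors d of n."""
    return sum(phi_product(d) for d in range(1, n + 1) if n % d == 0)

print([phi_product(n) for n in range(1, 13)])
print(all(phi_count(n) == phi_product(n) for n in range(1, 300)))
print(all(divisor_sum_of_phi(n) == n for n in range(1, 300)))
```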

C.2. Additive functions


We also often encounter additive functions (although they are not so important in
this book), which are complex-valued functions g defined for integers n ≥ 1 such that
g(nm) = g(n) + g(m) for all pairs of coprime integers n and m. In particular, we have
then g(1) = 0.
If g is an additive function, then we can write
\[
g(n) = \sum_{p} g(p^{v_p(n)})
\]
for any n ≥ 1, where $v_p$ is the p-adic valuation (which is zero for all but finitely many p).
As for multiplicative functions, an additive function is therefore determined uniquely by
its values at prime powers.
Some standard examples are given by g(n) = log n, or more generally g(n) = log f(n),
where f is a multiplicative function that is always positive. The arithmetic function ω(n)
that counts the number of prime factors of an integer n ≥ 1 (without multiplicity) is also
additive; it is of course the subject of the Erdős–Kac Theorem.
Conversely, if g is an additive function, then for any complex number s ∈ C, the
function $n \mapsto e^{s g(n)}$ is a multiplicative function.
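The passage from additive to multiplicative functions can be illustrated with ω. The following sketch checks additivity of ω on coprime pairs in a finite range, and multiplicativity of the function obtained by exponentiating with s = log 2 (so that the values are the exact integers 2 to the power ω(n)). The specific choice of s and the helper names are assumptions made to keep the arithmetic exact.

```python
from math import gcd

def omega(n):
    """Number of distinct prime factors of n."""
    count, p = 0, 2
    while p * p <= n:
        if n % p == 0:
            count += 1
            while n % p == 0:
                n //= p
        p += 1
    return count + (1 if n > 1 else 0)

def two_to_omega(n):
    """exp(s * omega(n)) for s = log 2, kept as an exact integer."""
    return 2 ** omega(n)

pairs = [(m, n) for m in range(1, 60) for n in range(1, 60) if gcd(m, n) == 1]

additive = all(omega(m * n) == omega(m) + omega(n) for m, n in pairs)
multiplicative = all(two_to_omega(m * n) == two_to_omega(m) * two_to_omega(n)
                     for m, n in pairs)
print(additive, multiplicative)
```

Note that ω is not additive on all pairs: for instance ω(2 · 2) = 1 while ω(2) + ω(2) = 2, which is why the coprimality condition matters.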

C.3. Primes and their distribution


For any real number x ≥ 1, we denote by π(x) the prime counting function, i.e.,
the number of prime numbers p ≤ x. This is of course one of the key functions of
interest in multiplicative number theory. Except in the most elementary cases, interesting
statements require some information on the size of π(x).
The first non-trivial quantitative bounds are due to Chebychev, giving the correct
order of magnitude of π(x), and were elaborated by Mertens to obtain other very useful
estimates for quantities involving primes.
Proposition C.3.1 (Chebychev and Mertens estimates). (1) There exist positive
constants $c_1$ and $c_2$ such that
\[
c_1 \frac{x}{\log x} \leq \pi(x) \leq c_2 \frac{x}{\log x}
\]
for all x ≥ 2.
(2) For any x ≥ 3, we have
\[
\sum_{p \leq x} \frac{1}{p} = \log\log x + O(1).
\]
(3) For any x ≥ 3, we have
\[
\sum_{p \leq x} \frac{\log p}{p} = \log x + O(1).
\]
See, e.g., [59, §2.2] or [52, Th. 7, Th. 414] (resp. [59, (2.15)] or [52, Th. 427]; [59,
(2.14)] or [52, Th. 425]) for a proof of the first (resp. second, third) estimate.
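The three estimates of Proposition C.3.1 can be observed on small ranges. The sketch below tabulates, for a few values of x chosen for illustration, the ratio of π(x) to x/log x and the two differences that the Mertens estimates assert to be bounded. The function names are hypothetical and the table says nothing about the size of the implied constants beyond the sampled range.

```python
from math import log

def primes_up_to(n):
    """Sieve of Eratosthenes: list of primes p <= n."""
    sieve = bytearray([1]) * (n + 1)
    sieve[0:2] = b"\x00\x00"
    for p in range(2, int(n ** 0.5) + 1):
        if sieve[p]:
            sieve[p * p :: p] = bytearray(len(range(p * p, n + 1, p)))
    return [p for p in range(2, n + 1) if sieve[p]]

def mertens_table(xs):
    """For each x, return (x, pi(x) log x / x, sum 1/p - loglog x, sum log p / p - log x)."""
    rows = []
    for x in xs:
        ps = primes_up_to(x)
        ratio = len(ps) * log(x) / x
        recip = sum(1 / p for p in ps) - log(log(x))
        logs = sum(log(p) / p for p in ps) - log(x)
        rows.append((x, ratio, recip, logs))
    return rows

for row in mertens_table([10**3, 10**4, 10**5]):
    print("x=%d  ratio=%.4f  sum(1/p)-loglog(x)=%.4f  sum(log(p)/p)-log(x)=%.4f" % row)
```

On this range the ratio stays between 1 and 1.3, and the two differences stay close to constants, in line with the O(1) terms in (2) and (3).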
Exercise C.3.2. (1) Show that the first estimate implies that the n-th prime is of
size about n log n (up to multiplicative constants), and also implies the bounds
\[
\log\log x \ll \sum_{p \leq x} \frac{1}{p} \ll \log\log x
\]
for x ≥ 3.
(2) Let $\pi_2(x)$ be the number of integers n ≤ x such that n is the product of at most
two primes (possibly equal). Prove that there exist positive constants $c_3$ and $c_4$ such that
\[
c_3 \frac{x \log\log x}{\log x} \leq \pi_2(x) \leq c_4 \frac{x \log\log x}{\log x}
\]
for all x ≥ 3.
The real key result in the study of primes is the Prime Number Theorem with a strong
error term:
Theorem C.3.3. Let A > 0 be an arbitrary real number. For x ≥ 2, we have
\[
\pi(x) = \operatorname{li}(x) + O\Bigl( \frac{x}{(\log x)^A} \Bigr), \tag{C.3}
\]
where li(x) is the logarithmic integral
\[
\operatorname{li}(x) = \int_2^x \frac{dt}{\log t}
\]
and the implied constant depends only on A. More generally, for α > 0 fixed, we have
\[
\sum_{p \leq x} p^{\alpha} = \int_2^x t^{\alpha} \frac{dt}{\log t} + O\Bigl( \frac{x^{1+\alpha}}{(\log x)^A} \Bigr)
\]
where the implied constant depends only on A.
For a proof, see for instance [59, §2.4 or Cor. 5.29]. By an elementary integration by
parts, we have
\[
\operatorname{li}(x) = \int_2^x \frac{dt}{\log t} = \frac{x}{\log x} + O\Bigl( \frac{x}{(\log x)^2} \Bigr),
\]
for x ≥ 2, hence the “usual” simple asymptotic version of the Prime Number Theorem
\[
\pi(x) \sim \frac{x}{\log x}, \quad \text{as } x \to +\infty.
\]
However, note that if one expresses the main term in the “simple” form x/ log x, the error
term cannot be better than $x/(\log x)^2$.
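The difference in quality between the two main terms is already visible at moderate size. The following sketch compares π(x) with the logarithmic integral (taken from 2, as in the text, and approximated here by Simpson's rule, which is an implementation choice of this illustration) and with x/log x at x = 100000.

```python
from math import log

def prime_count(x):
    """pi(x), computed with a sieve of Eratosthenes."""
    sieve = bytearray([1]) * (x + 1)
    sieve[0:2] = b"\x00\x00"
    for p in range(2, int(x ** 0.5) + 1):
        if sieve[p]:
            sieve[p * p :: p] = bytearray(len(range(p * p, x + 1, p)))
    return sum(sieve)

def li(x, steps=200_000):
    """Approximate the integral of dt / log t over [2, x] by Simpson's rule."""
    if steps % 2:
        steps += 1                     # Simpson's rule needs an even number of steps
    h = (x - 2) / steps
    total = 1 / log(2) + 1 / log(x)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) / log(2 + i * h)
    return total * h / 3

x = 10**5
print(prime_count(x), li(x), x / log(x))
```

At this value of x the logarithmic integral misses π(x) by a few dozen, while x/log x misses it by several hundred, consistent with the remark on the size of the error term.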
The Prime Number Theorem easily implies a stronger form of the Mertens formula:
Corollary C.3.4. There exists a constant C ∈ R such that
\[
\sum_{p \leq x} \frac{1}{p} = \log\log x + C + O((\log x)^{-1}). \tag{C.4}
\]
Exercise C.3.5. Show that (C.4) is in fact equivalent to the Prime Number Theorem
in the form
\[
\pi(x) \sim \frac{x}{\log x}
\]
as x → +∞.
Another estimate that will be useful in Chapter 4 is the following:
Proposition C.3.6. Let A > 0 be a fixed real number. For all x ≥ 2, we have
\[
\prod_{p \leq x} \Bigl( 1 + \frac{A}{p} \Bigr) \ll (\log x)^A,
\]
\[
\prod_{p \leq x} \Bigl( 1 - \frac{A}{p} \Bigr)^{-1} \ll (\log x)^A,
\]
where the implied constant depends only on A.
Proof. In both cases, if we compute the logarithm, we obtain
\[
\sum_{p \leq x} \Bigl( \frac{A}{p} + O\Bigl( \frac{1}{p^2} \Bigr) \Bigr)
\]
where the implied constant depends on A, and the result follows from the Mertens
formula. □
In Chapter 5, we will also need the generalization of these basic statements to primes
in arithmetic progressions. We recall that for x ≥ 1, and any modulus q ≥ 1 and
integer a ∈ Z, we define
\[
\pi(x; q, a) = \sum_{\substack{p \leq x \\ p \equiv a \ (\mathrm{mod}\ q)}} 1,
\]
the number of primes p ≤ x that are congruent to a modulo q. If a is not coprime
to q, then π(x; q, a) is bounded as x varies; it was one of the first major achievements
of analytic number theory when Dirichlet proved that, conversely, there are infinitely
many primes p ≡ a (mod q) if (a, q) = 1. This was done using the theory of Dirichlet
characters and L-functions, which we will survey later (see Section C.5). Here we state
the analogue of the Prime Number Theorem which shows that, asymptotically, all residue
classes modulo q are roughly equivalent.
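Theorem C.3.7. For any fixed q ≥ 1 and A ≥ 1, and for any x ≥ 2, we have
\[
\pi(x; q, a) = \frac{1}{\varphi(q)} \frac{x}{\log x} + O\Bigl( \frac{x}{(\log x)^A} \Bigr) \sim \frac{1}{\varphi(q)} \pi(x) \sim \frac{1}{\varphi(q)} \operatorname{li}(x).
\]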
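The equidistribution of primes among the invertible residue classes can be seen in a direct count. The sketch below uses the modulus q = 7 and the bound x = 100000, both arbitrary choices for illustration, and compares each count with the total number of primes divided by ϕ(q); the function names are hypothetical.

```python
from math import gcd

def primes_up_to(n):
    """Sieve of Eratosthenes: list of primes p <= n."""
    sieve = bytearray([1]) * (n + 1)
    sieve[0:2] = b"\x00\x00"
    for p in range(2, int(n ** 0.5) + 1):
        if sieve[p]:
            sieve[p * p :: p] = bytearray(len(range(p * p, n + 1, p)))
    return [p for p in range(2, n + 1) if sieve[p]]

def counts_by_residue(x, q):
    """Return a dict a -> number of primes p <= x with p congruent to a mod q."""
    counts = {a: 0 for a in range(q)}
    for p in primes_up_to(x):
        counts[p % q] += 1
    return counts

x, q = 10**5, 7
counts = counts_by_residue(x, q)
phi_q = sum(1 for a in range(1, q + 1) if gcd(a, q) == 1)
expected = sum(counts.values()) / phi_q

for a in range(q):
    print(a, counts[a], round(counts[a] / expected, 4))
```

The class a = 0 contains only the prime 7 itself, illustrating the remark that π(x; q, a) stays bounded when a is not coprime to q, while the six invertible classes receive nearly equal shares.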
C.4. The Riemann zeta function
As recalled in Section 3.1, the Riemann zeta function is defined first for complex
numbers s such that Re(s) > 1 by means of the absolutely convergent series
\[
\zeta(s) = \sum_{n \geq 1} \frac{1}{n^s}.
\]
By Lemma C.1.4, it has also the Euler product expansion
\[
\zeta(s) = \prod_{p} (1 - p^{-s})^{-1}
\]
in this region. Using this expression, we can compute the logarithmic derivative of the
zeta function, always for Re(s) > 1. We obtain the Dirichlet series expansion
\[
-\frac{\zeta'}{\zeta}(s) = \sum_{p} \frac{(\log p) p^{-s}}{1 - p^{-s}} = \sum_{n \geq 1} \frac{\Lambda(n)}{n^s} \tag{C.5}
\]
(using a geometric series expansion), where the function Λ is called the von Mangoldt
function, defined by
\[
\Lambda(n) = \begin{cases} \log p & \text{if } n = p^k \text{ for some prime } p \text{ and some } k \geq 1, \\ 0 & \text{otherwise.} \end{cases} \tag{C.6}
\]
In other words, up to the “thin” set of powers of primes with exponent k ≥ 2, the
function Λ is the logarithm restricted to prime numbers.
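A direct implementation of the definition (C.6) makes the von Mangoldt function concrete. The sketch below also checks, on a finite range, the classical identity that the sum of Λ(d) over the divisors d of n equals log n; this identity is not stated in the text above and is used here only as a sanity check of the implementation. The function names are illustrative.

```python
from math import log, isclose

def von_mangoldt(n):
    """Lambda(n) = log p if n = p^k with k >= 1, and 0 otherwise."""
    if n < 2:
        return 0.0
    p = 2
    while p * p <= n:
        if n % p == 0:
            while n % p == 0:
                n //= p
            # n was a power of p exactly when nothing is left
            return log(p) if n == 1 else 0.0
        p += 1
    return log(n)   # no divisor up to sqrt(n): n itself is prime

def divisor_sum_of_lambda(n):
    """Sum of Lambda(d) over the divisors d of n."""
    return sum(von_mangoldt(d) for d in range(1, n + 1) if n % d == 0)

print([round(von_mangoldt(n), 3) for n in range(1, 11)])
print(all(isclose(divisor_sum_of_lambda(n), log(n), abs_tol=1e-9)
          for n in range(1, 500)))
```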
Beyond the region of absolute convergence, it is known that the zeta function extends
to a meromorphic function on all of C, with a unique pole located at s = 1, which is a
simple pole with residue 1 (see the argument in Section 3.1 for a simple proof of analytic
continuation to Re(s) > 0). More precisely, let
\[
\xi(s) = \pi^{-s/2}\, \Gamma\Bigl( \frac{s}{2} \Bigr) \zeta(s)
\]
for Re(s) > 1. Then ξ extends to a meromorphic function on C with simple poles at
s = 0 and s = 1, which satisfies the functional equation
\[
\xi(1 - s) = \xi(s).
\]
Because the Gamma function has poles at integers −k for k ≥ 0, it follows that
ζ(−2k) = 0 for k ≥ 1 (the case k = 0 is special because of the pole at s = 1). The
negative even integers are called the trivial zeros of ζ(s). Hadamard and de la Vallée
Poussin proved (independently) that ζ(s) ≠ 0 for Re(s) = 1, and it follows that the
non-trivial zeros of ζ(s) are located in the critical strip 0 < Re(s) < 1.
Proposition C.4.1. (1) For 1/2 < σ < 1, we have
\[
\frac{1}{2T} \int_{-T}^{T} |\zeta(\sigma + it)|^2\, dt \longrightarrow \zeta(2\sigma)
\]
as T → +∞.
(2) We have
\[
\frac{1}{2T} \int_{-T}^{T} |\zeta(\tfrac{1}{2} + it)|^2\, dt \sim \log T
\]
as T → +∞.
See [117, Th. 7.2] for the proof of the first formula and [117, Th. 7.3] for the second
(which is due to Hardy and Littlewood).
Exercise C.4.2. This exercise explains the proof of the first formula (which is easier
than the second one).
(1) Prove that for 1/2 ≤ σ ≤ σ′ < 1 and for T ≥ 2, we have
\[
\sum_{1 \leq m < n \leq T} \frac{1}{(mn)^{\sigma}} \frac{1}{\log(n/m)} \ll T^{2 - 2\sigma} (\log T).
\]
(Consider separately the sum where m < n/2 and the remainder.)
(2) Prove that
\[
\frac{1}{2T} \int_{-T}^{T} \Bigl| \sum_{n \leq |t|} n^{-\sigma - it} \Bigr|^2 dt \to \zeta(2\sigma)
\]
as T → +∞. (Expand the square and integrate using (1).)
(3) Conclude using Proposition C.4.5 below.

For much more information concerning the analytic properties of the Riemann zeta
function, see [117]. Note however that the deeper arithmetic aspects are best understood
in the larger framework of L-functions, from Dirichlet L-functions (which are discussed
below in Section C.5) to automorphic L-functions (see, e.g, [59, Ch. 5]).
We will also use the Hadamard factorization of the Riemann zeta function. This is
an analogue of the factorization of polynomials in terms of their zeros, which holds for
meromorphic functions on C with restricted growth.
Proposition C.4.3. The zeros ϱ of ξ(s) all satisfy 0 < Re(ϱ) < 1, and there exist
constants α and β ∈ C such that
\[
s(s - 1)\xi(s) = e^{\alpha + \beta s} \prod_{\varrho} \Bigl( 1 - \frac{s}{\varrho} \Bigr) e^{s/\varrho}
\]
for any s ∈ C, where the product runs over the zeros of ξ(s), counted with multiplicity,
and converges uniformly on compact subsets of C. In fact, we have
\[
\sum_{\varrho} \frac{1}{|\varrho|^2} < +\infty.
\]
Given that s 7→ s(s − 1)ξ(s) is an entire function of finite order, this follows from the
general theory of such functions (see, e.g, [116, Th. 8.24] for Hadamard’s factorization
theorem). What is most important for us is the following corollary, which is an analogue
of partial fraction expansion for the logarithmic derivative of a polynomial – except that
it is most convenient here to truncate the infinite sum.
Proposition C.4.4. Let s = σ + it ∈ C be such that 1/2 ≤ σ ≤ 1 and ζ(s) ≠ 0. Then
there are ≪ log(2 + |t|) zeros ϱ of ξ such that |s − ϱ| ≤ 1, and we have
\[
-\frac{\zeta'(s)}{\zeta(s)} = \frac{1}{s} + \frac{1}{s - 1} - \sum_{|s - \varrho| < 1} \frac{1}{s - \varrho} + O(\log(2 + |t|)),
\]
where the sum is over zeros ϱ of ζ(s) such that |s − ϱ| < 1, counted with multiplicity.
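Sketch of proof. We first claim that the constant β in Proposition C.4.3 satisfies
\[
\mathrm{Re}(\beta) = -\sum_{\varrho} \mathrm{Re}(\varrho^{-1}), \tag{C.7}
\]
where ϱ runs over all the zeros of ξ(s) with multiplicity. Indeed, applying the Hadamard
product expansion to both sides of the functional equation ξ(1 − s) = ξ(s) and taking
logarithms, we obtain
\[
2\, \mathrm{Re}(\beta) = \beta + \bar{\beta} = -\sum_{\varrho} \Bigl( \frac{1}{s - \varrho} + \frac{1}{1 - s - \bar{\varrho}} + \frac{1}{\varrho} + \frac{1}{\bar{\varrho}} \Bigr).
\]
For any fixed s that is not a zero of ξ(s), we have $(s - \varrho)^{-1} + (1 - s - \bar{\varrho})^{-1} \ll |\varrho|^{-2}$, where
the implied constant depends on s. Similarly $\varrho^{-1} + \bar{\varrho}^{-1} \ll |\varrho|^{-2}$, so the series
\[
\sum_{\varrho} \Bigl( \frac{1}{s - \varrho} + \frac{1}{1 - s - \bar{\varrho}} \Bigr) \quad \text{and} \quad \sum_{\varrho} \Bigl( \frac{1}{\varrho} + \frac{1}{\bar{\varrho}} \Bigr)
\]
are absolutely convergent. So we can separate them; the first one vanishes, because the
terms cancel out (both ϱ and $1 - \bar{\varrho}$ are zeros of ζ(s)), and we obtain (C.7).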
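Now let T ≥ 2 and s = 3 + iT. Using the expansion
\[
-\frac{\zeta'(s)}{\zeta(s)} = \sum_{k \geq 1} \sum_{p} (\log p)\, p^{-ks},
\]
we get the trivial estimate
\[
\Bigl| \frac{\zeta'}{\zeta}(s) \Bigr| \leq |\zeta'(3)|.
\]
By Stirling’s formula (Proposition A.3.3), we have $\frac{\Gamma'}{\Gamma}(s/2) \ll \log(2 + T)$, and for any zero
ϱ = β + iγ of ξ(s), we have
\[
\frac{2}{9 + (T - \gamma)^2} < \mathrm{Re}\Bigl( \frac{1}{s - \varrho} \Bigr) < \frac{3}{4 + (T - \gamma)^2}.
\]
If we compute the real part of the formula
\[
-\frac{\zeta'}{\zeta}(s) = \frac{\Gamma'}{\Gamma}(s/2) - \beta + \frac{1}{s} + \frac{1}{s - 1} - \sum_{\varrho} \Bigl( \frac{1}{s - \varrho} + \frac{1}{\varrho} \Bigr),
\]
and rearrange the resulting absolutely convergent series (using (C.7)), we get
\[
\sum_{\varrho} \frac{1}{1 + (T - \gamma)^2} \ll \log(2 + T). \tag{C.8}
\]
This convenient estimate implies, as claimed, that there are ≪ log(2 + T) zeros ϱ such
that |Im(ϱ) − T| ≤ 1.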
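Now, finally, let s = σ + it be such that 1/2 ≤ σ ≤ 1 and ξ(s) ≠ 0. We have
\[
-\frac{\zeta'}{\zeta}(s) = -\frac{\zeta'}{\zeta}(s) + \frac{\zeta'}{\zeta}(3 + it) + O(\log(2 + |t|)),
\]
by the previous elementary estimate. Hence (by the Stirling formula again) we have
\[
-\frac{\zeta'}{\zeta}(s) = \frac{1}{s} + \frac{1}{s - 1} - \sum_{\varrho} \Bigl( \frac{1}{s - \varrho} - \frac{1}{3 + it - \varrho} \Bigr) + O(\log(2 + |t|)).
\]
In the series, we keep the zeros with |s − ϱ| < 1, and we estimate the contribution of the
others by
\[
\sum_{|s - \varrho| \geq 1} \Bigl| \frac{1}{s - \varrho} - \frac{1}{3 + it - \varrho} \Bigr| \leq \sum_{|s - \varrho| \geq 1} \frac{3}{1 + (t - \gamma)^2} \ll \log(2 + |t|)
\]
by (C.8), applied with T = t. □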
We will use an elementary approximation for ζ(s) in the strip 1/2 < Re(s) < 1.
Proposition C.4.5. Let T ≥ 1. For σ ≥ 1/2, and for any s = σ + it with 1/2 ≤ σ <
3/4 and |t| ≤ T, we have
\[
\zeta(s) = \sum_{1 \leq n \leq T} n^{-s} + O\Bigl( \frac{T^{1 - \sigma}}{|t| + 1} + T^{-1/2} \Bigr).
\]
Proof. This follows from [117, Th. 4.11] (a result first proved by Hardy and Littlewood)
which states that for any $\sigma_0 > 0$, we have
\[
\zeta(s) = \sum_{1 \leq n \leq T} n^{-s} - \frac{T^{1 - \sigma}}{1 - s} + O(T^{-1/2})
\]
for $\sigma \geq \sigma_0$, since 1/(1 − s) ≪ 1/(|t| + 1) if 1/2 ≤ σ < 3/4. □
The last (and most subtle) result concerning the zeta function that we need is an
important refinement of (2) in Proposition C.4.1.
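Proposition C.4.6. Let T ≥ 1 be a real number and let m, n be integers such that
1 ≤ m, n ≤ T. Let σ be a real number with 1/2 ≤ σ ≤ 1. We have
\[
\begin{aligned}
\frac{1}{2T} \int_{-T}^{T} \Bigl( \frac{m}{n} \Bigr)^{it} |\zeta(\sigma + it)|^2\, dt = {} & \zeta(2\sigma) \Bigl( \frac{(m, n)^2}{mn} \Bigr)^{\sigma} \\
& + \zeta(2 - 2\sigma) \Bigl( \frac{(m, n)^2}{mn} \Bigr)^{1 - \sigma} \frac{1}{2T} \int_{-T}^{T} \Bigl( \frac{|t|}{2\pi} \Bigr)^{1 - 2\sigma} dt + O(\min(m, n)\, T^{-\sigma + \varepsilon}).
\end{aligned}
\]
This is essentially due to Selberg [110, Lemma 6], and a proof is given by Radziwiłł
and Soundararajan [95, §6].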

C.5. Dirichlet L-functions


Let q ≥ 1 be an integer. The Dirichlet L-functions modulo q are Dirichlet series
attached to characters of the group of invertible residue classes modulo q. More precisely,
for any such character $\chi : (\mathbf{Z}/q\mathbf{Z})^{\times} \to \mathbf{C}^{\times}$, we extend it to Z/qZ by sending non-invertible
classes to 0, and then we view it as a q-periodic function on Z. The resulting function on Z
is called a Dirichlet character modulo q. (See Example B.6.2 (3) for the definition and
basic properties of characters of finite abelian groups; an excellent elementary account
can also be found in Serre’s book [112, §VI.1].)
We denote by $1_q$ the trivial character modulo $q$ (which is identically 1 on all invertible
residue classes modulo $q$ and 0 elsewhere). A character $\chi$ such that $\chi(n) \in \{\pm 1\}$ for
all $n$ coprime to $q$ is called a real character. This condition is equivalent to having $\chi$
real-valued, or to having $\chi^2 = 1_q$.
By the duality theorem for finite abelian groups (see Example B.6.2, (3)), the set
of Dirichlet characters modulo q is a group under pointwise multiplication with 1q as
the identity element, and it is isomorphic to (Z/qZ)× ; in particular, the number of
Dirichlet characters modulo q is ϕ(q). Moreover, the Dirichlet characters modulo q form
an orthonormal basis of the space of complex-valued functions on (Z/qZ)× .
Let χ be a Dirichlet character modulo q. By construction, the function χ is multi-
plicative on Z, in the strong sense that χ(nm) = χ(n)χ(m) for all integers n and m (even
if they are not coprime).
The orthonormality property of characters of a finite group implies the following
fundamental relation:
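Proposition C.5.1. Let $q \geq 1$ be an integer. For any $x$ and $y$ in $\mathbf{Z}$, we have
\[
\frac{1}{\varphi(q)} \sum_{\chi \,(\mathrm{mod}\, q)} \chi(x)\overline{\chi(y)} =
\begin{cases}
1 & \text{if } x \equiv y \ (\mathrm{mod}\ q) \text{ and } x, y \text{ are coprime with } q, \\
0 & \text{otherwise,}
\end{cases}
\]
where the sum is over all Dirichlet characters modulo $q$.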


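Proof. If $x$ or $y$ is not coprime with $q$, then the formula is valid because both sides
are zero. Otherwise, this is a special case of the general decomposition formula
\[
\text{(C.9)} \qquad \frac{1}{|G|} \sum_{\chi \in \widehat{G}} \chi(x)\overline{\chi(y)} =
\begin{cases}
1 & \text{if } x = y, \\
0 & \text{if } x \neq y,
\end{cases}
\]
for any finite abelian group $G$ and elements $x$ and $y$ of $G$. Indeed, if we view $y$ as fixed
and $x$ as a variable, this is simply the decomposition of the characteristic function $f_y$ of
the element $y \in G$ in the orthonormal basis of characters: this decomposition is
\[
f_y = \sum_{\chi \in \widehat{G}} \langle f_y, \chi \rangle \chi,
\]
which, since $\langle f_y, \chi \rangle = \overline{\chi(y)}/|G|$ for the normalized inner product, becomes
\[
f_y = \frac{1}{|G|} \sum_{\chi \in \widehat{G}} \overline{\chi(y)}\, \chi,
\]
from which in turn (C.9) follows by evaluating at $x$. □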
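The orthogonality relation of Proposition C.5.1 can be checked numerically for a small modulus. The following is only a minimal sketch, assuming that $q = p$ is prime so that all characters can be written down from a primitive root; the function names (`primitive_root`, `dirichlet_characters`, `orthogonality_sum`) are illustrative choices and do not come from the text.

```python
import cmath


def primitive_root(p):
    """Smallest generator of (Z/pZ)^x for a prime p (brute force search)."""
    for g in range(1, p):
        if len({pow(g, k, p) for k in range(p - 1)}) == p - 1:
            return g
    raise ValueError("no primitive root found; is p prime?")


def dirichlet_characters(p):
    """All Dirichlet characters modulo a prime p, each as a list chi[0..p-1]."""
    g = primitive_root(p)
    # Discrete logarithm table: log[g^j mod p] = j.
    log = {pow(g, j, p): j for j in range(p - 1)}
    chars = []
    for k in range(p - 1):
        chi = [0j] * p  # chi(0) = 0 on the non-invertible class
        for x, j in log.items():
            chi[x] = cmath.exp(2j * cmath.pi * j * k / (p - 1))
        chars.append(chi)
    return chars


def orthogonality_sum(p, x, y):
    """(1/phi(p)) * sum over chi mod p of chi(x) * conj(chi(y))."""
    chars = dirichlet_characters(p)
    total = sum(chi[x % p] * chi[y % p].conjugate() for chi in chars)
    return total / (p - 1)
```

For instance, `orthogonality_sum(7, 3, 10)` should be numerically close to 1, because 3 ≡ 10 (mod 7) and both are coprime to 7, while `orthogonality_sum(7, 3, 5)` should be close to 0.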


Let $q \geq 1$ be an integer and $\chi$ a Dirichlet character modulo $q$. One defines
\[
L(s, \chi) = \sum_{n \geq 1} \frac{\chi(n)}{n^s}
\]
for all $s \in \mathbf{C}$ such that $\mathrm{Re}(s) > 1$; since $|\chi(n)| \leq 1$ for all $n \in \mathbf{Z}$, this series is
absolutely convergent and defines a holomorphic function in this region, called the L-function
associated to $\chi$.
In the region where $\mathrm{Re}(s) > 1$, the multiplicativity of $\chi$ implies that we have the
absolutely convergent Euler product expansion
\[
L(s, \chi) = \prod_{p} (1 - \chi(p)p^{-s})^{-1}
\]
(by Lemma C.1.4 applied to $f(n) = \chi(n)n^{-s}$ for any $s \in \mathbf{C}$ such that $\mathrm{Re}(s) > 1$). In
particular, we deduce that $L(s, \chi) \neq 0$ if $\mathrm{Re}(s) > 1$. Moreover, computing the logarithmic
derivative, we obtain the formula
\[
-\frac{L'}{L}(s, \chi) = \sum_{n \geq 1} \Lambda(n)\chi(n)n^{-s}
\]
for $\mathrm{Re}(s) > 1$.
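To see the Dirichlet series and the Euler product agree in practice, one can compare truncations of both at $s = 2$ for the non-trivial character modulo 4. This is a rough numerical sketch only: the truncation bounds are arbitrary, the helper names are hypothetical, and the reference value used below (Catalan's constant, approximately 0.9159655942) is a standard fact about $L(2, \chi)$ for this character rather than something taken from the text.

```python
def chi4(n):
    """Non-trivial character modulo 4: 0 on even n, +1 if n = 1 and -1 if n = 3 (mod 4)."""
    return (0, 1, 0, -1)[n % 4]


def primes_up_to(n):
    """List of primes <= n by the sieve of Eratosthenes."""
    sieve = bytearray([1]) * (n + 1)
    sieve[0:2] = b"\x00\x00"
    for i in range(2, int(n ** 0.5) + 1):
        if sieve[i]:
            sieve[i * i :: i] = bytearray(len(range(i * i, n + 1, i)))
    return [i for i in range(n + 1) if sieve[i]]


def dirichlet_series(s, terms):
    """Partial sum of chi4(n) / n^s over 1 <= n <= terms."""
    return sum(chi4(n) / n ** s for n in range(1, terms + 1))


def euler_product(s, bound):
    """Partial product of (1 - chi4(p) p^{-s})^{-1} over primes p <= bound."""
    prod = 1.0
    for p in primes_up_to(bound):
        prod /= 1.0 - chi4(p) / p ** s
    return prod
```

With 10000 terms and primes up to 10000, both truncations should agree to about four decimal places, which is consistent with the absolute convergence asserted for $\mathrm{Re}(s) > 1$.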


For the trivial character $1_q$ modulo $q$, we have the formula
\[
L(s, 1_q) = \prod_{p \nmid q} (1 - p^{-s})^{-1} = \zeta(s) \prod_{p \mid q} (1 - p^{-s}).
\]
Since the second factor is a finite product of quite simple form, we see that, when $q$
is fixed, the analytic properties of this particular L-function are determined by those of
the Riemann zeta function. In particular, it has meromorphic continuation with a simple
pole at $s = 1$, where the residue is
\[
\prod_{p \mid q} (1 - p^{-1}) = \frac{\varphi(q)}{q}.
\]
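As a quick sanity check of the residue formula, here is the case $q = 12$ worked out by hand (the choice of modulus is only an illustration):

```latex
L(s, 1_{12}) = \zeta(s)\,(1 - 2^{-s})(1 - 3^{-s}),
\qquad
\operatorname*{Res}_{s=1} L(s, 1_{12})
  = \Bigl(1 - \frac{1}{2}\Bigr)\Bigl(1 - \frac{1}{3}\Bigr)
  = \frac{1}{3} = \frac{4}{12} = \frac{\varphi(12)}{12}.
```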

For χ non-trivial, we have the following result (see, e.g. [59, §5.9]):
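Theorem C.5.2. Let $\chi$ be a non-trivial Dirichlet character modulo $q$. Define $\varepsilon_\chi = 0$
if $\chi(-1) = 1$ and $\varepsilon_\chi = 1$ if $\chi(-1) = -1$. Let
\[
\xi(s, \chi) = \pi^{-(s+\varepsilon_\chi)/2}\, q^{s/2}\, \Gamma\Bigl( \frac{s+\varepsilon_\chi}{2} \Bigr) L(s, \chi)
\]
for $\mathrm{Re}(s) > 1$. Furthermore, let
\[
\tau(\chi) = \frac{1}{\sqrt{q}} \sum_{x \in (\mathbf{Z}/q\mathbf{Z})^\times} \chi(x)\, e\Bigl( \frac{x}{q} \Bigr).
\]
Then $\xi(s, \chi)$ extends to an entire function on $\mathbf{C}$ which satisfies the functional equation
\[
\xi(s, \chi) = \tau(\chi)\, \xi(1-s, \overline{\chi}).
\]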
In Chapter 5, we will require the basic information on the distribution of zeros of
Dirichlet L-functions. We summarize it in the following proposition (see, e.g., [59, Th.
5.24]).
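Proposition C.5.3. Let $\chi$ be a Dirichlet character modulo $q$.
(1) For $T \geq 1$, the number $N(T; \chi)$ of zeros $\rho$ of $L(s, \chi)$ such that
\[
\mathrm{Re}(\rho) > 0, \qquad |\mathrm{Im}(\rho)| \leq T
\]
satisfies
\[
\text{(C.10)} \qquad N(T; \chi) = \frac{T}{\pi} \log\Bigl( \frac{qT}{2\pi} \Bigr) - \frac{T}{\pi} + O(\log q(T+1)),
\]
where the implied constant is absolute.
(2) For any $\varepsilon > 0$, the series
\[
\sum_{\rho} |\rho|^{-1-\varepsilon}
\]
converges, where $\rho$ runs over zeros of $L(s, \chi)$ such that $\mathrm{Re}(\rho) > 0$.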
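Remark C.5.4. These two statements are not independent, and in fact the first one
implies the second by splitting the partial sum
\[
\sum_{|\rho| \leq T} \frac{1}{|\rho|^{1+\varepsilon}}
\]
for $T \geq 1$ in terms of zeros in intervals of length 1:
\[
\sum_{|\rho| \leq T} \frac{1}{|\rho|^{1+\varepsilon}}
\leq \sum_{1 \leq N \leq T} \frac{1}{N^{1+\varepsilon}} \sum_{N-1 \leq |\rho| \leq N} 1
\ll \sum_{1 \leq N \leq T} \frac{\log N}{N^{1+\varepsilon}}
\]
by (1). Since this is uniformly bounded for all $T$, we obtain (2).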
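Corollary C.5.5. Let $\chi$ be a Dirichlet character modulo $q$.
(1) We have
\[
\sum_{\substack{0 < \gamma < T \\ L(\frac{1}{2}+i\gamma, \chi) = 0}} \frac{1}{|\frac{1}{2} + i\gamma|} \ll (\log T)^2
\]
for $T$ large enough.
(2) We have
\[
\sum_{\substack{\gamma > T \\ L(\frac{1}{2}+i\gamma, \chi) = 0}} \frac{1}{|\frac{1}{2} + i\gamma|^2} \ll \frac{\log T}{T},
\]
for $T \geq 1$.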
Finally, we need a form of the explicit formula linking zeros of Dirichlet L-functions
with the distribution of prime numbers.
Theorem C.5.6. Let $q \geq 1$ be an integer and let $\chi$ be a non-trivial Dirichlet character
modulo $q$. For any $x \geq 2$ and any $X \geq 2$ such that $2 \leq x \leq X$, we have
\[
\sum_{n \leq x} \Lambda(n)\chi(n) = - \sum_{\substack{L(\beta+i\gamma) = 0 \\ |\gamma| \leq X}} \frac{x^{\beta+i\gamma}}{\beta+i\gamma} + O\Bigl( \frac{x (\log qx)^2}{X} \Bigr),
\]
where the sum is over non-trivial zeros of $L(s, \chi)$, counted with multiplicity, and the
implied constant is absolute.
Sketch of proof. We refer to, e.g., [59, Prop. 5.25] for this result. Here we wish to
justify intuitively the existence of such a relation between sums (essentially) over primes,
and sums over zeros of the associated L-function.
Pick a function $\varphi$ defined on $[0, +\infty[$ with compact support. Using the Mellin inversion
formula (see Proposition A.3.1, (3)), we can write
\[
\sum_{n \geq 1} \Lambda(n)\chi(n)\varphi\Bigl( \frac{n}{x} \Bigr) = \frac{1}{2i\pi} \int_{(2)} -\frac{L'}{L}(s, \chi)\, \widehat{\varphi}(s)\, x^s \, ds
\]
for all $x \geq 1$. Assume (formally) that we can shift the integration line to the left, say to
the line where the real part is 1/4, where the contribution would be $\ll x^{1/4}$. The contour
shift leads to poles located at all the zeros of $L(s, \chi)$, with residue equal to the opposite
of the multiplicity of the zero (since the L-function is entire, there is no contribution from
poles). We can therefore expect that
\[
\sum_{n \geq 1} \Lambda(n)\chi(n)\varphi\Bigl( \frac{n}{x} \Bigr) = - \sum_{\rho} \widehat{\varphi}(\rho)\, x^{\rho} + \text{(small error)},
\]
where $\rho$ runs over non-trivial zeros of $L(s, \chi)$, counted with multiplicity.
If such a formula holds for the characteristic function $\varphi$ of the interval $[0, 1]$, then
since
\[
\widehat{\varphi}(s) = \int_0^1 x^{s-1} \, dx = \frac{1}{s},
\]
we would obtain
\[
\sum_{n \geq 1} \Lambda(n)\chi(n)\varphi\Bigl( \frac{n}{x} \Bigr) = - \sum_{\rho} \frac{x^{\rho}}{\rho} + \text{(small error)}.
\]

Remark C.5.7. There is non-trivial analytic work to do in order to justify the com-
putations in this sketch, because of various convergence issues for instance (which also
explains why the formula is most useful in a truncated form involving only finitely many
zeros), but this formal outline certainly explains the existence of the explicit formula.
This explicit formula explains why the location of zeros of Dirichlet L-functions is so
important in the study of prime numbers in arithmetic progressions. This motivates the
Generalized Riemann Hypothesis modulo q:
Conjecture C.5.8 (Generalized Riemann Hypothesis). For any integer $q \geq 1$ and
for any Dirichlet character $\chi$ modulo $q$ and any zero $\rho = \beta + i\gamma$ of its L-function such
that $0 < \beta \leq 1$, we have $\beta = \frac{1}{2}$.
This is the most famous open problem of number theory. In practice, we will also speak
of Generalized Riemann Hypothesis modulo q when considering only the fixed modulus q
instead of all moduli. The case q = 1 corresponds to the original Riemann Hypothesis
for the Riemann zeta function only.
By just applying orthogonality (Proposition C.5.1) and estimating trivially in the
explicit formula with the help of Proposition C.5.3, we deduce:
Proposition C.5.9. Let $q \geq 1$ be an integer. Assume that the Generalized Riemann
Hypothesis modulo $q$ holds. Then we have
\[
\sum_{\substack{n \leq x \\ n \equiv a \ (\mathrm{mod}\ q)}} \Lambda(n) = \frac{x}{\varphi(q)} + O(x^{1/2} (\log qx)^2).
\]

Remark C.5.10. Compare the quality of the error term with the (essentially) best
known unconditional result of Theorem C.3.7.
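The shape of Proposition C.5.9 can be explored numerically by summing the von Mangoldt function over an arithmetic progression and comparing with $x/\varphi(q)$. The sketch below is illustrative only: it assumes that $a$ is coprime to $q$ (otherwise the main term is not the right one), it proves nothing about the error term, and the function names are hypothetical.

```python
from math import gcd, log


def von_mangoldt_table(n):
    """Return lam[0..n] with lam[m] = log p if m is a power of a prime p, else 0."""
    lam = [0.0] * (n + 1)
    is_prime = [True] * (n + 1)
    for p in range(2, n + 1):
        if is_prime[p]:
            for m in range(2 * p, n + 1, p):
                is_prime[m] = False
            pk = p
            while pk <= n:  # mark p, p^2, p^3, ...
                lam[pk] = log(p)
                pk *= p
    return lam


def psi(x, q, a, lam=None):
    """Sum of Lambda(n) over n <= x with n congruent to a modulo q."""
    lam = lam or von_mangoldt_table(x)
    return sum(lam[n] for n in range(1, x + 1) if n % q == a % q)


def euler_phi(q):
    """Euler's function, by direct counting (fine for small q)."""
    return sum(1 for k in range(1, q + 1) if gcd(k, q) == 1)
```

For example, with `x = 10**5` and `q = 4`, both `psi(x, 4, 1)` and `psi(x, 4, 3)` should be close to `x / euler_phi(4) = 50000`, with a discrepancy that is small compared to $x$.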
Another corollary of the explicit formula that will be helpful in Chapter 5 is the
following:
Corollary C.5.11. Let $q \geq 1$ be an integer and let $\chi$ be a non-trivial Dirichlet
character modulo $q$. Assume that the Generalized Riemann Hypothesis holds for $L(s, \chi)$,
i.e., that all non-trivial zeros of $L(s, \chi)$ have real part 1/2. For any $x \geq 2$, we have
\[
\int_2^x \Bigl( \sum_{n \leq t} \Lambda(n)\chi(n) \Bigr) dt \ll x^{3/2},
\]
where the implied constant depends on $q$.


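Proof. Pick $X = x$ in the explicit formula. Using the assumption on the zeros, we
obtain by integration the expression
\[
\int_2^x \Bigl( \sum_{n \leq t} \Lambda(n)\chi(n) \Bigr) dt
= - \sum_{\substack{L(\frac{1}{2}+i\gamma) = 0 \\ |\gamma| \leq x}} \int_2^x \frac{t^{\frac{1}{2}+i\gamma}}{\frac{1}{2}+i\gamma} \, dt + O(x (\log qx)^2)
\]
\[
= - \sum_{\substack{L(\frac{1}{2}+i\gamma) = 0 \\ |\gamma| \leq x}} \frac{x^{\frac{1}{2}+i\gamma+1} - 2^{\frac{1}{2}+i\gamma+1}}{(\frac{1}{2}+i\gamma)(\frac{1}{2}+i\gamma+1)} + O(x (\log qx)^2) \ll x^{3/2},
\]
where the implied constant depends on $q$, since the series
\[
\sum_{L(\frac{1}{2}+i\gamma) = 0} \frac{1}{(\frac{1}{2}+i\gamma)(\frac{1}{2}+i\gamma+1)}
\]
converges absolutely by Proposition C.5.3, (2). □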

C.6. Exponential sums


In Chapter 6, we studied some properties of exponential sums. Although we do not
have the space to present a detailed treatment of such sums, we will give a few examples
and try to explain some of the reasons why such sums are important and interesting.
This should motivate the “curiosity driven” study of the shape of the partial sums. We
refer to the notes [75] and to [59, Ch. 11] for more information, including proofs of the
Weil bound (6.1) for Kloosterman sums.
In principle, any finite sum
\[
S_N = \sum_{1 \leq n \leq N} e(\alpha_n)
\]
of complex numbers of modulus 1 counts as an exponential sum, and the goal is, given
the phases $\alpha_n \in \mathbf{R}$, to obtain a bound on $S_N$ that improves as much as possible on the
“trivial” bound $|S_N| \leq N$.
On probabilistic grounds one can expect that for highly oscillating phases, the sum $S_N$
is of size about $\sqrt{N}$. Indeed, if we consider $\alpha_n$ to be random variables that are independent
and uniformly distributed in $\mathbf{R}/\mathbf{Z}$, then the Central Limit Theorem shows that $S_N/\sqrt{N}$ is
distributed approximately like a standard complex gaussian random variable, so that the
“typical” size of $S_N$ is of order of magnitude $\sqrt{N}$. When this occurs also for deterministic
sums (up to factors of smaller order of magnitude), one says that the sums have square-root
cancellation; this usually only makes sense for an infinite sequence of sums where
$N \to +\infty$.
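The probabilistic heuristic is easy to simulate. The sketch below draws independent uniform phases and estimates the average of $|S_N|^2/N$, whose exact expectation is 1 by orthogonality of the summands; the function names, the sample sizes and the fixed seed are arbitrary choices made for illustration.

```python
import cmath
import random


def random_phase_sum(n, rng):
    """S_N = sum of e(alpha_k) for n independent phases alpha_k uniform in R/Z."""
    return sum(cmath.exp(2j * cmath.pi * rng.random()) for _ in range(n))


def mean_square_ratio(n, trials, seed=0):
    """Empirical average of |S_N|^2 / N over the given number of trials."""
    rng = random.Random(seed)
    total = sum(abs(random_phase_sum(n, rng)) ** 2 for _ in range(trials))
    return total / (trials * n)
```

Calling `mean_square_ratio(200, 2000)` should return a value close to 1, in line with a typical size of order $\sqrt{N}$, far below the trivial bound $N$.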
Example C.6.1. For instance, the partial sums
\[
M_N = \sum_{1 \leq n \leq N} \mu(n)
\]
of the Möbius function can be seen in this light. Estimating $M_N$ is vitally important
in analytic number theory, because it is not very hard to check that the Prime Number
Theorem, in the form (C.3), with error term $x/(\log x)^A$ for any $A > 0$, is equivalent with
the estimate
\[
M_N \ll \frac{N}{(\log N)^A}
\]
for any $A > 0$, where the implied constant depends on $A$. Moreover, the best possible
estimate is the square-root cancellation
\[
M_N \ll N^{1/2+\varepsilon},
\]
with an implied constant depending on $\varepsilon > 0$, and this is known to be equivalent to the
Riemann Hypothesis for the Riemann zeta function.
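For small $N$ the sums $M_N$ can be computed directly with a sieve, which makes the expected square-root size visible in examples. This is a minimal sketch with hypothetical function names; it is of course no evidence for the Riemann Hypothesis, only an illustration of the definition.

```python
def mobius_table(n):
    """Return mu[0..n] computed by a sieve (mu[0] is unused and set to 0)."""
    mu = [1] * (n + 1)
    mu[0] = 0
    is_prime = [True] * (n + 1)
    for p in range(2, n + 1):
        if is_prime[p]:
            for m in range(p, n + 1, p):
                if m > p:
                    is_prime[m] = False
                mu[m] = -mu[m]  # one sign change per prime factor
            for m in range(p * p, n + 1, p * p):
                mu[m] = 0  # not squarefree
    return mu


def mertens(n):
    """M_N = sum of mu(k) for 1 <= k <= n."""
    mu = mobius_table(n)
    return sum(mu[1 : n + 1])
```

For instance `mertens(10000)` is $-23$, which is small compared with $\sqrt{10000} = 100$.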
The sums that appear in Chapter 6 are however of a fairly different nature. They
are sums over finite fields (or subsets of finite fields), with summands $e(\alpha_n)$ of “algebraic
nature”. For a prime $p$ and the finite field $\mathbf{F}_p$ with $p$ elements,² the basic examples are
of the following types:
Example C.6.2. (1) [Additive character sums] Fix a rational function $f \in \mathbf{F}_p(T)$.
Then for $x \in \mathbf{F}_p$ that is not a pole of $f$, we can evaluate $f(x) \in \mathbf{F}_p$, and $e(f(x)/p)$ is a
well-defined complex number. Then consider the sum
\[
\sum_{\substack{x \in \mathbf{F}_p \\ f(x) \text{ defined}}} e(f(x)/p).
\]
For fixed $a$ and $b$ in $\mathbf{F}_p^\times$, the example $f(T) = aT + bT^{-1}$ gives rise to the Kloosterman
sum of Section 6.1. If $f(T) = T^2$, we obtain a quadratic Gauss sum, namely
\[
\sum_{x \in \mathbf{F}_p} e\Bigl( \frac{x^2}{p} \Bigr).
\]
(2) [Multiplicative character sums] Let $\chi$ be a non-trivial character of the finite
multiplicative group $\mathbf{F}_p^\times$; we define $\chi(0) = 0$ to extend it to $\mathbf{F}_p$. Let $f \in \mathbf{F}_p[T]$ be a polynomial
(or a rational function). The corresponding multiplicative character sum is
\[
\sum_{x \in \mathbf{F}_p} \chi(f(x)).
\]
One may also have finitely many polynomials and characters and sum their products. An
important example of these is
\[
\sum_{x \in \mathbf{F}_p} \chi_1(x)\chi_2(1-x),
\]
for multiplicative characters $\chi_1$ and $\chi_2$, which is called a Jacobi sum.


(3) [Mixed sums] In fact, one can mix the two types, obtaining a family of sums that
generalize both: fix rational functions $f_1$ and $f_2$ in $\mathbf{F}_p(T)$, and consider the sum
\[
\sum_{x \in \mathbf{F}_p} \chi(f_1(x))\, e(f_2(x)/p),
\]
where the summand is defined to be 0 if $f_2(x)$ is not defined, or if $f_1(x)$ is 0 or not defined.
Some of the key examples are obtained in this manner. Maybe the simplest interesting
ones are the Gauss sums attached to $\chi$, defined by
\[
\sum_{x \in \mathbf{F}_p} \chi(x)\, e(ax/p)
\]
where $a \in \mathbf{F}_p$ is a parameter. Others are the sums
\[
\sum_{x \in \mathbf{F}_p^\times} \chi(x)\, e\Bigl( \frac{ax + b\bar{x}}{p} \Bigr)
\]
for $a$, $b$ in $\mathbf{F}_p^\times$, which generalize the Kloosterman sums. When $\chi$ is a character of order 2
(i.e., $\chi(x)$ is either 1 or $-1$ for all $x \in \mathbf{F}_p^\times$), this is called a Salié sum.
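The first examples above are easy to evaluate by machine for small primes. The sketch below computes unnormalized Kloosterman sums and quadratic Gauss sums; the function names are hypothetical, the sums are not divided by $\sqrt{p}$ (the normalization used in Section 6.1 may differ), and the modular inverse `pow(x, -1, p)` is an assumption on the runtime since it requires Python 3.8 or later.

```python
import cmath
from math import sqrt


def e(z):
    """The additive character e(z) = exp(2 i pi z)."""
    return cmath.exp(2j * cmath.pi * z)


def kloosterman(a, b, p):
    """Sum over x in F_p^x of e((a x + b x^{-1}) / p), for a prime p."""
    return sum(e(((a * x + b * pow(x, -1, p)) % p) / p) for x in range(1, p))


def quadratic_gauss(p):
    """Sum over x in F_p of e(x^2 / p)."""
    return sum(e((x * x % p) / p) for x in range(p))
```

For an odd prime such as $p = 101$, one expects `abs(quadratic_gauss(101))` to equal $\sqrt{101}$ up to rounding, and `abs(kloosterman(1, b, 101))` to stay below $2\sqrt{101}$ for every `b` in `range(1, 101)`, in agreement with the Weil bound (6.1).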
2 For simplicity, we restrict to these particular finite fields, but the theory extends to all.
Remark C.6.3. We emphasize that the sums that we discuss range over the whole
finite field (except for values of x where the summand is not defined). Sums over smaller
subsets of $\mathbf{F}_p$ (e.g., over a segment $1 \leq x \leq N < p$ of integers) are very interesting and
important in applications (indeed, they are the topic of Chapter 6!), but behave very
differently.
Except for a few special cases (some of which are discussed below in exercises), a
simple “explicit” evaluation of exponential sums of the previous types is not feasible.
Even deriving non-trivial bounds is far from obvious, and the most significant progress
requires input from algebraic geometry. The key result, proved by A. Weil in the 1940’s,
takes the following form (in a simplified version that is actually rather weaker than the
actual statement). It is a special case of the Riemann Hypothesis over finite fields.
Theorem C.6.4 (Weil). Let $\chi$ be a non-trivial multiplicative character modulo $p$. Let
$f_1$ and $f_2$ be rational functions in $\mathbf{F}_p(T)$, and consider the sum
\[
\sum_{x \in \mathbf{F}_p} \chi(f_1(x))\, e(f_2(x)/p).
\]
Assume that either $f_1$ is not of the form $g_1^d$, where $d$ is the order of $\chi$ and $g_1 \in \mathbf{F}_p(T)$, or
$f_2$ has a pole of order not divisible by $p$, possibly at infinity.
Then, there exists an integer $\beta$, depending only on the degrees of the numerator and
denominator of $f_1$ and $f_2$, and for $1 \leq i \leq \beta$, there exist complex numbers $\alpha_i$ such that
$|\alpha_i| \leq \sqrt{p}$, with the property that
\[
\sum_{x \in \mathbf{F}_p} \chi(f_1(x))\, e(f_2(x)/p) = - \sum_{i=1}^{\beta} \alpha_i.
\]
In particular, we have
\[
\Bigl| \sum_{x \in \mathbf{F}_p} \chi(f_1(x))\, e(f_2(x)/p) \Bigr| \leq \beta \sqrt{p}.
\]

In fact, one can provide formulas for the integer $\beta$ that are quite explicit (in terms
of the zeros and poles of the rational functions $f_1$ and $f_2$), and often one knows that
$|\alpha_i| = \sqrt{p}$ for all $i$. For instance, if $f_1 = 1$ (so that the sum is an additive character sum)
and $f_2$ is a polynomial such that $1 \leq \deg(f_2) < p$, then $\beta = \deg(f_2) - 1$, and $|\alpha_i| = \sqrt{p}$
for all $i$.
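The polynomial case can be tested numerically: for a polynomial $f_2$ of degree $d$ with $1 \leq d < p$, the bound reads $|\sum_x e(f_2(x)/p)| \leq (d-1)\sqrt{p}$. The following sketch evaluates such sums by brute force; the function name and the coefficient convention (constant term first) are illustrative choices, not notation from the text.

```python
import cmath
from math import sqrt


def poly_sum(coeffs, p):
    """Sum over x in F_p of e(f(x)/p), with f given by coeffs = [c_0, c_1, ...]."""
    total = 0j
    for x in range(p):
        fx = 0
        for c in reversed(coeffs):  # Horner's rule modulo p
            fx = (fx * x + c) % p
        total += cmath.exp(2j * cmath.pi * fx / p)
    return total
```

For the cubic family $f_2(T) = T^3 + aT$ modulo $p = 103$, each value `abs(poly_sum([0, a, 0, 1], 103))` should be at most $2\sqrt{103}$, and for a polynomial of degree 1 the sum vanishes, matching $\beta = 0$.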
For more discussion and a proof in either the additive or multiplicative cases, we refer
to [75].
The following exercises illustrate this general result in three important cases. Note
however, that there is no completely elementary proof in the case of Kloosterman sums,
where β = 2, leading to (6.1).
Exercise C.6.5 (Gauss sums). Let $\chi$ be a non-trivial multiplicative character of $\mathbf{F}_p^\times$
and $a \in \mathbf{F}_p^\times$. Denote
\[
\tau(\chi, a) = \sum_{x \in \mathbf{F}_p} \chi(x)\, e(ax/p),
\]
and put $\tau(\chi) = \tau(\chi, 1)$ (up to normalization, this is the same sum as occurs in the
functional equation for the Dirichlet L-function $L(s, \chi)$, see Theorem C.5.2).
(1) Prove that
\[
|\tau(\chi, a)| = \sqrt{p}.
\]
(This proves the corresponding special case of Theorem C.6.4 with $\beta = 1$ and $|\alpha_1| = \sqrt{p}$.)
[Hint: Compute the modulus square, or apply the discrete Parseval identity.]
(2) Prove that for any automorphism $\sigma$ of the field $\mathbf{C}$, we also have
\[
|\sigma(\tau(\chi, a))| = \sqrt{p}.
\]
(This additional property is also true for all $\alpha_i$ in Theorem C.6.4 in general; it means
that each $\alpha_i$ is a so-called $p$-Weil number of weight 1.)
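Part (1) of the exercise can be checked experimentally before proving it. The sketch below parametrizes the characters of $\mathbf{F}_p^\times$ by an index $k$ through a primitive root $g$, with $\chi_k(g^j) = e(jk/(p-1))$; this parametrization and the function names are assumptions made for the illustration.

```python
import cmath


def primitive_root(p):
    """Smallest generator of F_p^x for a prime p (brute force search)."""
    for g in range(1, p):
        if len({pow(g, k, p) for k in range(p - 1)}) == p - 1:
            return g
    raise ValueError("no primitive root found; is p prime?")


def gauss_sum(p, k, a=1):
    """tau(chi_k, a), where chi_k(g^j) = e(j k / (p - 1)) for a primitive root g."""
    g = primitive_root(p)
    total = 0j
    x = 1  # x runs over the powers g^j, i.e. over all of F_p^x
    for j in range(p - 1):
        chi = cmath.exp(2j * cmath.pi * j * k / (p - 1))
        total += chi * cmath.exp(2j * cmath.pi * a * x / p)
        x = x * g % p
    return total
```

For $p = 13$, every non-trivial index `k` in `range(1, 12)` and every `a` in `range(1, 13)` should give `abs(gauss_sum(13, k, a)) ** 2` equal to 13 up to rounding, while the trivial character (`k = 0`) gives the value $-1$.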
Exercise C.6.6 (Jacobi sums). Let $\chi_1$ and $\chi_2$ be non-trivial multiplicative characters
of $\mathbf{F}_p^\times$. Denote
\[
J(\chi_1, \chi_2) = \sum_{x \in \mathbf{F}_p} \chi_1(x)\chi_2(1-x).
\]
(1) Prove that
\[
J(\chi_1, \chi_2) = \frac{\tau(\chi_1)\tau(\chi_2)}{\tau(\chi_1\chi_2)}
\]
and deduce that Theorem C.6.4 holds for the Jacobi sums $J(\chi_1, \chi_2)$ with $\beta = 1$ and
$|\alpha_1| = \sqrt{p}$. Moreover, show that $\alpha_1$ satisfies the property of the second part of the previous
exercise.
(3) Assume that $p \equiv 1 \pmod 4$. Prove that there exist integers $a$ and $b$ such that
$a^2 + b^2 = p$ (a result of Fermat). [Hint: Show that there are characters $\chi_1$ of order 2 and
$\chi_2$ of order 4 of $\mathbf{F}_p^\times$, and consider $J(\chi_1, \chi_2)$.]

Exercise C.6.7 (Salié sums). Assume that $p \geq 3$.
(1) Check that there is a unique non-trivial real character $\chi_2$ of $\mathbf{F}_p^\times$. Prove that for
any $x \in \mathbf{F}_p$, the number of $y \in \mathbf{F}_p$ such that $y^2 = x$ is $1 + \chi_2(x)$.
(2) Prove that
\[
\tau(\chi_2)\tau(\chi^2) = \chi(4)\tau(\chi)\tau(\chi\chi_2)
\]
(Hasse–Davenport relation). [Hint: Use the formula for Jacobi sums, and compute $J(\chi, \chi)$
in terms of the number of solutions of quadratic equations; express this number of
solutions in terms of $\chi_2$.]
For $a$, $b \in \mathbf{F}_p^\times$, define
\[
S(a, b) = \sum_{x \in \mathbf{F}_p^\times} \chi_2(x)\, e\Bigl( \frac{ax + b\bar{x}}{p} \Bigr).
\]
(3) Show that for $b \in \mathbf{F}_p^\times$, we have
\[
S(a, b) = \sum_{\chi} s(\chi)\chi(a)
\]
where
\[
s(\chi) = \frac{\chi(b)\chi_2(b)\tau(\bar{\chi})\tau(\bar{\chi}\chi_2)}{p-1}.
\]
[Hint: Use a discrete multiplicative Fourier expansion.]
(4) Show that
\[
s(\chi) = \frac{\chi_2(b)\tau(\chi_2)}{p-1}\, \chi(4b)\tau(\bar{\chi}^2).
\]
(5) Deduce that
\[
S(a, b) = \tau(\chi_2) \sum_{ay^2 = 4b} e\Bigl( \frac{y}{p} \Bigr).
\]
(6) Deduce that Theorem C.6.4 holds for $S(a, b)$ with either $\beta = 0$ or $\beta = 2$, in which
case $|\alpha_1| = |\alpha_2| = \sqrt{p}$.

Bibliography

[1] L.-P. Arguin, D. Belius, P. Bourgade, M. Radziwill, and K. Soundararajan: Maximum of the
Riemann zeta function on a short interval of the critical line, Communications in Pure and
Applied Mathematics 72 (2017), 500–535.
[2] R. Arratia, A.D. Barbour and S. Tavaré: Logarithmic combinatorial structures: a probabilistic
approach, E.M.S. Monographs, 2003.
[3] P. Autissier, D. Bonolis and Y. Lamzouri: The distribution of the maximum of partial sums of
Kloosterman sums and other trace functions, preprint arXiv:1909.03266.
[4] B. Bagchi: Statistical behaviour and universality properties of the Riemann zeta function and
other allied Dirichlet series, PhD thesis, Indian Statistical Institute, Kolkata, 1981; available at
library.isical.ac.in:8080/jspui/bitstream/10263/4256/1/TH47.CV01.pdf.
[5] A. Barbour, E. Kowalski and A. Nikeghbali: Mod-discrete expansions, Probability Theory and
Related Fields, 2013; doi:10.1007/s00440-013-0498-8.
[6] P. Billingsley: On the distribution of large prime factors, Period. Math. Hungar. 2 (1972),
283–289.
[7] P. Billingsley: Prime numbers and Brownian motion, American Math. Monthly 80 (1973),
1099–1115.
[8] P. Billingsley: The probability theory of additive arithmetic functions, The Annals of Probability
2 (1974), 749–791.
[9] P. Billingsley: Probability and measure, 3rd edition, Wiley, 1995.
[10] P. Billingsley: Convergence of probability measures, 2nd edition, Wiley, 1999.
[11] V. Blomer, É. Fouvry, E. Kowalski, Ph. Michel, D. Milićević and W. Sawin: The second moment
theory of families of L-functions: the case of twisted Hecke L-functions, Memoirs of the A.M.S,
to appear; arXiv:1804.01450
[12] J. Bober, E. Kowalski and W. Sawin: On the support of the Kloosterman paths, preprint (2019).
[13] E. Bombieri and J. Bourgain: On Kahane’s ultraflat polynomials, J. Europ. Math. Soc. 11
(2009), 627–703.
[14] D. Bonolis: On the size of the maximum of incomplete Kloosterman sums, preprint arXiv:
1811.10563.
[15] N. Bourbaki: Éléments de mathématique, Topologie générale, Springer.
[16] N. Bourbaki: Éléments de mathématique, Fonctions d’une variable réelle, Springer.
[17] N. Bourbaki: Éléments de mathématique, Intégration, Springer.
[18] N. Bourbaki: Éléments de mathématique, Algèbres et groupes de Lie, Springer.
[19] N. Bourbaki: Éléments de mathématique, Théories spectrales, Springer.
[20] L. Breiman: Probability, Classic in Applied Math., SIAM, 1992.
[21] E. Breuillard and H. Oh (eds): Thin groups and super-strong approximation, MSRI Publications
Vol. 61, Cambridge Univ. Press, 2014.
[22] F. Cellarosi and J. Marklof: Quadratic Weyl sums, automorphic functions and invariance prin-
ciples, Proc. Lond. Math. Soc. (3) 113 (2016), 775–828.
[23] B. Cha, D. Fiorilli and F. Jouve: Prime number races for elliptic curves over function fields,
Ann. Sci. Éc. Norm. Supér. (4) 49 (2016), 1239–1277.
[24] P.L. Chebyshev: Lettre de M. le Professeur Tchébychev à M. Fuss sur un nouveau théorème
relatif aux nombres premiers contenus dans les formes 4n+1 et 4n+3, in Oeuvres de P.L. Tchebychef,
vol. I, Chelsea 1962, pp. 697–698; also available at
https://archive.org/details/117744684_001/page/n709
[25] H. Cohen and H. W. Lenstra, Jr.: Heuristics on class groups of number fields, in “Number
theory (Noordwijkerhout, 1983)”, L.N.M 1068, Springer, Berlin, 1984, pp. 33–62.

[26] L. Devin: Chebyshev’s bias for analytic L-functions, Math. Proc. Cambridge Philos. Soc. 169
(2020), 103–140.
[27] P. Donnelly and G. Grimmett: On the asymptotic distribution of large prime factors, J. London
Math. Soc. 47 (1993), 395–404.
[28] J.J. Duistermaat and J.A.C. Kolk: Distributions: theory and applications, Birkhäuser, 2010.
[29] W. Duke, J. Friedlander and H. Iwaniec: Equidistribution of roots of a quadratic congruence to
prime moduli, Ann. of Math. 141 (1995), 423–441.
[30] M. Einsiedler and T. Ward: Ergodic theory: with a view towards number theory, Grad. Texts
in Math. 259, Springer 2011.
[31] N.D. Elkies and C.T. McMullen: Gaps in $\sqrt{n}$ mod 1 and ergodic theory, Duke Math. J. 123
(2004), 95–139.
[32] J.S. Ellenberg, A. Venkatesh and C. Westerland: Homological stability for Hurwitz spaces and
the Cohen-Lenstra conjecture over function fields, Ann. of Math. 183 (2016), 729–786.
[33] P. Erdős: On the density of some sequences of numbers, III, Journal of the LMS 13 (1938),
119–127.
[34] P. Erdős: On the smoothness of the asymptotic distribution of additive arithmetical functions,
Amer. J. Math. 61 (1939), 722–725.
[35] P. Erdős and M. Kac: The Gaussian law of errors in the theory of additive number theoretic
functions, Amer. J. Math. 62 (1940), 738–742.
[36] P. Erdős and A. Wintner: Additive arithmetical functions and statistical independence, Amer.
J. Math. 61 (1939), 713–721.
[37] W. Feller: An introduction to probability theory and its applications, Vol. 2, Wiley, 1966.
[38] D. Fiorilli and F. Jouve: Distribution of Frobenius elements in families of Galois extensions,
preprint, https://hal.inria.fr/hal-02464349.
[39] Y. V. Fyodorov, G. A. Hiary and J. P. Keating: Freezing transition, characteristic polynomials
of random matrices, and the Riemann Zeta-Function, Phys. Rev. Lett. 108 (2012), 170601 (5pp).
[40] G.B. Folland: Real analysis, Wiley, 1984.
[41] K. Ford: The distribution of integers with a divisor in a given interval, Annals of Math. 168
(2008), 367–433.
[42] É. Fouvry, E. Kowalski and Ph. Michel: An inverse theorem for Gowers norms of trace functions
over Fp , Math. Proc. Cambridge Phil. Soc. 155 (2013), 277–295.
[43] J. Friedlander and H. Iwaniec: Opera de cribro, Colloquium Publ. 57, A.M.S, 2010.
[44] P.X. Gallagher: The large sieve and probabilistic Galois theory, in Proc. Sympos. Pure Math.,
Vol. XXIV, Amer. Math. Soc. (1973), 91–101.
[45] P.X. Gallagher: On the distribution of primes in short intervals, Mathematika 23 (1976), 4–9.
[46] M. Gerspach: On the pseudomoments of the Riemann zeta function, PhD thesis, ETH Zürich,
2020.
[47] É. Ghys: Dynamique des flots unipotents sur les espaces homogènes, Séminaire N. Bourbaki
1991–92, exposé 747; http://numdam.org/item?id=SB_1991-1992__34__93_0.
[48] A. Granville: The anatomy of integers and permutations, preprint (2008),
http://www.dms.umontreal.ca/~andrew/PDF/Anatomy.pdf
[49] A. Granville and J. Granville: Prime suspects, Princeton Univ. Press, 2019; illustrated by R.J.
Lewis.
[50] A. Granville and G. Martin: Prime number races, Amer. Math. Monthly 113 (2006), 1–33.
[51] A. Granville and K. Soundararajan: Sieving and the Erdös-Kac theorem, in “Equidistribution
in number theory, an introduction”, 15–27, Springer 2007.
[52] G.H. Hardy and E.M. Wright: An introduction to the theory of numbers, 5th edition, Oxford
1979.
[53] A. Harper: Two new proofs of the Erdős–Kac Theorem, with bound on the rate of convergence,
by Stein’s method for distributional approximations, Math. Proc. Camb. Phil. Soc. 147 (2009),
95–114.
[54] A. Harper: The Riemann zeta function in short intervals, Séminaire N. Bourbaki, 71ème année,
exposé 1159, March 2019.
[55] A. Harper: On the partition function of the Riemann zeta function, and the Fyodorov–Hiary–
Keating conjecture, preprint (2019); arXiv:1906.05783.

[56] A. Harper and Y. Lamzouri: Orderings of weakly correlated random variables, and prime number
races with many contestants, Probab. Theory Related Fields 170 (2018), 961–1010.
[57] C. Hooley: On the distribution of the roots of polynomial congruences, Mathematika 11 (1964),
39–49.
[58] K. Ireland and M. Rosen: A Classical Introduction to Modern Number Theory, 2nd Edition,
GTM 84, Springer-Verlag (1990).
[59] H. Iwaniec and E. Kowalski: Analytic Number Theory, Colloquium Publ. 53, A.M.S, 2004.
[60] O. Kallenberg: Foundations of modern probability theory, Probability and its Applications,
Springer (1997).
[61] N.M. Katz: Gauss sums, Kloosterman sums and monodromy groups, Annals of Math. Studies
116, Princeton Univ. Press (1988).
[62] N.M. Katz and P. Sarnak: Zeroes of zeta functions and symmetry, Bull. Amer. Math. Soc.
(N.S.) 36 (1999), 1–26.
[63] J.P. Keating and N.C. Snaith: Random matrix theory and ζ(1/2 + it), Commun. Math. Phys.
214, (2000), 57–89.
[64] D. Koukoulopoulos: Localized factorizations of integers, Proc. London Math. Soc. 101 (2010),
392–426.
[65] E. Kowalski: The large sieve and its applications, Cambridge Tracts in Math., vol 175, C.U.P
(2008).
[66] E. Kowalski: Poincaré and analytic number theory, in “The scientific legacy of Poincaré”, edited
by É. Charpentier, É. Ghys and A. Lesne, A.M.S, 2010.
[67] E. Kowalski: Sieve in expansion, Séminaire Bourbaki, Exposé 1028 (November 2010); in
Astérisque 348, Soc. Math. France (2012), 17–64.
[68] E. Kowalski: The large sieve, monodromy, and zeta functions of algebraic curves, II Indepen-
dence of the zeros, IMRN (2008) Art. ID rnn 091, 57 pages.
[69] E. Kowalski: Families of cusp forms, Pub. Math. Besançon, 2013, 5–40.
[70] E. Kowalski: An introduction to the representation theory of groups, Grad. Studies in Math.
155, A.M.S, 2014.
[71] E. Kowalski: The Kloostermania page, blogs.ethz.ch/kowalski/the-kloostermania-page/
[72] E. Kowalski: Bagchi’s Theorem for families of automorphic forms, in “Exploring the Riemann
Zeta function”, Springer (2017), 180–199.
[73] E. Kowalski: Averages of Euler products, distribution of singular series and the ubiquity of
Poisson distribution, Acta Arithmetica 148.2 (2011), 153–187.
[74] E. Kowalski: Expander graphs and their applications, Cours Spécialisés 26, Soc. Math. France
(2019).
[75] E. Kowalski: Exponential sums over finite fields, I: elementary methods, www.math.ethz.ch/
~kowalski/exp-sums.pdf
[76] E. Kowalski and Ph. Michel: The analytic rank of J0 (q) and zeros of automorphic L-functions,
Duke Math. J. 100 (1999), 503–542.
[77] E. Kowalski and A. Nikeghbali: Mod-Poisson convergence in probability and number theory,
International Mathematics Research Notices 2010; doi:10.1093/imrn/rnq019
[78] E. Kowalski and A. Nikeghbali: Mod-Gaussian convergence and the value distribution of ζ(1/2+
it) and related quantities, Journal of the L.M.S. 86 (2012), 291–319.
[79] E. Kowalski and W. Sawin: Kloosterman paths and the shape of exponential sums, Compositio
Math. 152 (2016), 1489–1516.
[80] E. Kowalski and K. Soundararajan: Equidistribution from the Chinese Remainder Theorem,
preprint (2020).
[81] N. V. Krylov: Introduction to the theory of random processes, Graduate studies in mathematics
43, A.M.S, 2002.
[82] Y. Lamzouri: On the distribution of the maximum of cubic exponential sums, J. Inst. Math.
Jussieu 19 (2020), 1259–1286.
[83] D. Li and H. Queffélec: Introduction à l’étude des espaces de Banach; Analyse et probabilités,
Cours Spécialisés 12, S.M.F, 2004.
[84] A. Lubotzky, R. Phillips and P. Sarnak: Ramanujan graphs, Combinatorica 8 (1988), 261–277.
[85] W. Bosma, J. Cannon and C. Playoust: The Magma algebra system, I. The user language, J.
Symbolic Comput. 24 (1997), 235–265; also http://magma.maths.usyd.edu.au/magma/

[86] J. Marklof and A. Strömbergsson: The three gap theorem and the space of lattices, American
Math. Monthly 124 (2017), 741–745.
[87] D. Milićević and S. Zhang: Distribution of Kloosterman paths to high prime power moduli,
preprint, arXiv:2005.08865.
[88] S.J. Montgomery-Smith: The distribution of Rademacher sums, Proceedings of the AMS 109
(1990), 517–522.
[89] D.W. Morris: Ratner’s theorems on unipotent flows, Chicago Lectures in Mathematics, Univer-
sity of Chicago Press, 2005.
[90] J. Najnudel: On the extreme values of the Riemann zeta function on random intervals of the
critical line, Probab. Theory Related Fields 172 (2018), 387–452.
[91] J. Neukirch: Algebraic number theory, Springer, 1999.
[92] C. Perret-Gentil: Some recent interactions of probability and number theory, in Newsletter of
the European Mathematical Society, March 2019.
[93] PARI/GP, version 2.6.0, Bordeaux, 2011, http://pari.math.u-bordeaux.fr/.
[94] J. Pintz: Cramér vs. Cramér. On Cramér’s probabilistic model for primes, Functiones et
Approximatio 37 (2007), 361–376.
[95] M. Radziwill and K. Soundararajan: Selberg’s central limit theorem for log |ζ(1/2 + it)|,
L’enseignement Mathématique 63 (2017), 1–19.
[96] M. Radziwill and K. Soundararajan: Moments and distribution of central L-values of quadratic
twists of elliptic curves, Invent. math. 202 (2015), 1029–1068.
[97] M. Radziwill and K. Soundararajan: Value distribution of L-functions, Oberwolfach report
40/2017, to appear.
[98] O. Randal-Williams: Homology of Hurwitz spaces and the Cohen–Lenstra heuristic for function
fields (after Ellenberg, Venkatesh, and Westerland), Séminaire N. Bourbaki 2019, exposé 1162;
arXiv:1906.07447; in Astérisque, to appear.
[99] M. Ratner: On Raghunathan’s measure conjecture, Ann. of Math (2) 134 (1991), 545–607.
[100] A. Rényi and P. Turán: On a theorem of Erdős and Kac, Acta Arith. 4 (1958), 71–84.
[101] G. Ricotta and E. Royer: Kloosterman paths of prime powers moduli, Comment. Math. Helv.
93 (2018), 493–532.
[102] G. Ricotta, E. Royer and I. Shparlinski: Kloosterman paths of prime powers moduli, II, to
appear in Bulletin de la SMF.
[103] J. Rotman: Advanced modern algebra, part I, 3rd edition, Grad. Studies Math. 165 (AMS),
2015.
[104] D. Revuz and M. Yor: Continuous Martingales and Brownian Motion, 3rd ed., Springer-Verlag,
Berlin, 1999.
[105] M. Rubinstein and P. Sarnak: Chebyshev’s Bias, Experimental Math. 3 (1994), 173–197.
[106] W. Rudin: Real and complex analysis, McGraw Hill, 1970.
[107] P. Sarnak: letter to B. Mazur on the Chebychev bias for τ(p),
https://publications.ias.edu/sites/default/files/MazurLtrMay08.PDF
[108] I. Schoenberg: Über die asymptotische Verteilung reeller Zahlen mod 1, Math. Z. 28 (1928),
171–199.
[109] I. Schoenberg: On asymptotic distributions of arithmetical functions, Trans. A.M.S. 39 (1936),
315–330.
[110] A. Selberg: Contributions to the theory of the Riemann zeta function, Arch. Math. Naturvid.
48 (1946), 89–155; or in Collected works, I.
[111] J.-P. Serre: Linear representations of finite groups, Grad. Texts in Math. 42, Springer (1977).
[112] J.-P. Serre: A course in arithmetic, Grad. Texts in Math. 7, Springer (1973).
[113] V.T. Sós: On the theory of diophantine approximations. Acta Math. Acad.Sci. Hungar. 8 (1957),
461–472.
[114] M. Talagrand: Concentration of measure and isoperimetric inequalities in product spaces, Publ.
Math. I.H.É.S 81 (1995), 73–205.
[115] G. Tenenbaum: Introduction to analytic and probabilistic number theory, Cambridge studies
adv. math. 46, 1995.
[116] E.C. Titchmarsh: The theory of functions, 2nd edition, Oxford Univ. Press, 1939.
[117] E.C. Titchmarsh: The theory of the Riemann zeta function, 2nd edition, Oxford Univ. Press,
1986.

[118] A. Tóth: Roots of quadratic congruences, Internat. Math. Res. Notices 2000, 719–739.
[119] S.M. Voronin: Theorem on the Universality of the Riemann Zeta Function, Izv. Akad. Nauk
SSSR, Ser. Matem. 39 (1975), 475–486; English translation in Math. USSR Izv. 9 (1975), 443–
445.
[120] H. Weyl: Über die Gleichverteilung von Zahlen mod. Eins, Math. Ann. 77 (1914).
[121] A. Zygmund: Trigonometric series, 3rd Edition, Cambridge Math Library, Cambridge, 2002.

Index

L-function, 21, 57, 74, 175
k-free integer, 11
p-adic valuation, 13, 45, 171

absolute moments, 140
additive character sum, 183, 184
additive function, 20–22, 26, 171
affine transformation, 118
algebraic differential equation, 56
almost prime, 62
approximately independent random variables, 5, 12
approximation, 15
arithmetic function, 22, 36
arithmetic random function, 80
arithmetic random variable, 4, 22, 29, 39, 62, 98
arithmetic truncation, 62
asymptotic independence of prime numbers, 57
automorphic L-function, 175
automorphism group, 117

Bagchi’s Theorem, 41, 44, 51, 57, 74, 90
Banach space, 36, 41, 52, 94, 98, 102, 107, 128, 136, 137, 157–159, 164
Basel problem, 9
Bernoulli random variable, 8, 9, 11, 15, 26, 34
Bernstein’s Theorem, 129
Bessel function, 93
bilinear forms and primes, 112
Birkhoff–Khintchine pointwise ergodic theorem, 52
Bohr–Jessen Theorem, 40
Bombieri–Vinogradov Theorem, 19
Borel–Cantelli Lemma, 27, 155
Brownian motion, 36

Central Limit Theorem, 27, 30, 32, 65, 75, 97, 139, 141, 149, 182
character, 43, 177
characteristic function, 5, 33, 79, 91, 93, 139, 148, 149, 152
Chebychev bias, 21, 77
Chebychev estimate, 22, 25, 172
Chebychev’s inequality, 164
Chinese Remainder Theorem, 11, 12, 14, 114, 170
Chinese Restaurant Process, 35
Cohen–Lenstra heuristics, 116
compact group, 143
complex gaussian random variable, 150
complex logarithm, 121
complex moments, 142
composition principle, 133
convergence almost surely, 16, 46, 90, 101, 152, 157
convergence in law, 4, 7, 13, 40, 52, 65, 80, 82, 100, 133, 139, 146, 157, 160
convergence in probability, 5, 82, 88, 135, 136, 153
convergence in L^1, 160
convergence in L^2, 90
convergence in L^p, 5, 136
convergence of finite distributions, 13, 102, 106, 160, 162
converse of the method of moments, 30, 141
coprime integers, 11, 12, 168, 171
correlation matrix, 148
counting measure, 144
Cramér model, 119
critical line, 21, 59, 61, 74
critical strip, 39, 174

density of a set, 36
derandomization, 119
Dirac mass, 115
Dirichlet L-function, 57, 78, 79, 173, 175, 178, 184
Dirichlet character, 57, 79–81, 89, 90, 173, 177, 178, 181
Dirichlet convolution, 61, 127, 168
Dirichlet polynomial, 60–62, 66, 124
Dirichlet series, 45, 51, 71, 124, 126, 170, 177
discrete group, 19
discrete Parseval identity, 185
discrete Plancherel formula, 100
distribution function, 16, 36
distribution of prime numbers, 76, 180

elementary matrix, 19
equidistribution, 143
equidistribution modulo 1, 111, 112, 114
Erdős–Kac Theorem, 4, 6, 20, 26, 36, 40, 57, 62, 111, 171
Erdős–Wintner Theorem, 20, 25, 31
Euler function, 14, 19, 170
Euler product, 38, 42, 60, 62, 69, 169, 174, 178
exact sequence, 147
expander graphs, 20, 119
explicit formula, 83, 180, 181
exponential distribution, 113
exponential sum, 21, 96, 110, 182

Fermat’s Theorem, 185
finite cyclic group, 145
finite field, 96, 182
finite Markov chain, 20
Fourier coefficients, 100, 102, 104, 161
Fourier inversion formula, 123
Fourier series, 101, 161
Fourier transform, 5, 122, 123, 139, 144
fourth moment, 104
fractional part, 39, 111, 114, 117
functional equation, 38
functional limit theorem, 40, 99, 110
fundamental discriminant, 117
fundamental theorem of arithmetic, 8
Fejér’s Theorem, 161

Gamma function, 39, 123
gaps, 112, 116
Gauss sum, 183, 184
gaussian distribution, 21
gaussian random variable, 4, 30, 142, 148, 150
gaussian vector, 142, 148
Generalized Riemann Hypothesis, 19, 80–83, 93, 181
Generalized Simplicity Hypothesis, 88, 89, 92, 93, 95
geometric random variable, 13, 23
Gowers norms, 119

Hadamard factorization, 66, 175
Hardy–Littlewood k-tuple conjecture, 116
Hasse–Davenport relation, 185

ideal class group, 116
imaginary quadratic fields, 117
inclusion-exclusion, 8, 14
independent and uniformly distributed random variables, 88
independent events, 155
independent random variables, 5, 12, 15, 16, 20, 23, 26, 30, 34, 41, 46, 79, 90, 94, 97, 99, 101, 108, 132, 142, 143, 148, 149, 152, 154, 157, 164, 167, 182
indicator function, 14
indicator of an event, 8
interpolation theory, 165
invertible residue classes, 19, 177
irreducible polynomial, 11

Jacobi sum, 183, 185
Jessen–Wintner Theorem, 17

Kloosterman paths, 98, 103, 104, 107
Kloosterman sum, 21, 96, 99, 101, 104, 107, 110, 182, 183
Kolmogorov’s Maximal Inequality, 153, 154, 159
Kolmogorov’s Theorem, 46, 92, 101, 152
Kolmogorov’s Three Series Theorem, 16, 153
Kolmogorov’s Tightness Criterion, 100, 102, 104, 107, 162
Kolmogorov’s Zero–One Law, 153
Kolmogorov–Smirnov distance, 34
Kronecker Equidistribution Theorem, 80, 83, 146

Laplace transform, 129, 148, 151
lattice, 118
law of the iterated logarithm, 97
Lebesgue measure, 111
Lipschitz function, 51, 63, 135, 137, 160
local spectral equidistribution, 57
logarithmic integral, 172
logarithmic weight, 78
Lyapunov’s Theorem, 149
Lévy Criterion, 33, 149, 152

Markov chain theory, 20
Markov inequality, 70, 153
mean-value estimates, 58
median, 164
Mellin inversion formula, 122, 126, 180
Mellin transform, 48, 50, 122, 126
Menshov–Rademacher Theorem, 47, 155
Mertens formula, 27, 28, 33, 37, 54, 172, 173
method of moments, 5, 27, 65, 99, 140
mild measure, 140
mild random variable, 140, 142, 148
model of arithmetic objects, 119
modular forms, 57, 74, 96
moment conjectures, 74
multiplication table, 31
multiplicative character sum, 183
multiplicative Fourier expansion, 185
multiplicative function, 14, 115, 168, 169
Möbius function, 8, 25, 60, 169, 170, 182
Möbius inversion, 17, 170

non-trivial zero, 174
number field, 116
number of cycles, 34
number of prime factors, 4, 31

orthogonality of characters, 79
orthogonality relation, 80
orthonormal basis, 144
orthonormal random variables, 156
orthonormality of characters, 178

perturbation, 5, 15, 135
Petersson formula, 57
Poisson approximation, 32
Poisson distribution, 21, 151
Poisson random variable, 32, 33
Poisson–Dirichlet distribution, 36
Pontryagin duality, 144
prime counting function, 171
Prime Number Theorem, 19, 25, 32, 54, 57, 68, 116, 167, 172, 173, 182
Prime Number Theorem in arithmetic progressions, 81
prime numbers, 19, 21
primes in arithmetic progressions, 173
primitive Dirichlet character, 89
probability Haar measure, 83, 84, 88, 99, 111, 143–146
probability in Banach spaces, 21
product measure, 12, 132
Prokhorov’s Theorem, 100, 160
Property (τ), 20
Property (T), 20
pseudomoments, 45
purely singular measure, 16

quadratic Gauss sum, 106, 183

Radon measure, 131
Ramanujan graph, 119
random Dirichlet series, 47, 48, 53, 57
random Euler product, 21, 41, 45, 90
random Fourier series, 21, 99, 100, 107, 110
random function, 21, 46, 82, 92
random holomorphic function, 46
Random Matrix Theory, 21, 75
random permutation, 34, 37
random product, 16
random series, 16, 152
random vector, 13, 93
random walk, 19, 20
Rankin’s trick, 68, 71
real character, 177
reduction modulo an integer, 6
reflection principle, 159
representation theory, 35
residue class, 7
Riemann Hypothesis, 182
Riemann Hypothesis over finite fields, 99, 106, 110, 184
Riemann sum, 100
Riemann zeta function, 9, 17, 21, 38, 45, 50, 56, 59, 66, 74, 170, 174, 179, 181
Riesz Representation Theorem, 131, 138
Riesz–Markov Theorem, 128
roots of polynomial congruences, 114
Rubinstein–Sarnak distribution, 78, 83, 93

Salié sum, 183, 185
Sato–Tate measure, 57, 99, 109, 143
Selberg’s Theorem, 43, 60, 74, 90
Selberg–Delange method, 33
semi-circle law, 99
sieve, 33
smooth partial sum, 47, 126
space of lattices, 113
speed of convergence, 7
square-root cancellation, 182
squarefree integer, 8, 19, 31, 61, 169, 170
standard complex gaussian random variable, 59, 63, 97, 182
standard gaussian, 26
standard gaussian random variable, 60, 63, 74, 141, 148
Stein’s Method, 33
Stirling formula, 124, 176
subgaussian bounds, 74
subgaussian random variable, 102, 150, 164, 165
summation by parts, 39, 81, 120
support, 42, 52, 107, 132, 157, 158
symmetric group, 34
symmetric partial sums, 99
symmetric powers, 35
symmetric random variable, 158, 165, 167

Talagrand’s inequality, 164, 166, 167
Taylor polynomial, 69
tensor product, 35
Three Gaps Theorem, 113
tightness, 100, 101, 104, 105, 135, 160
total variation distance, 33, 34
trivial zero, 174
truncation, 28

ultraflat polynomials, 119
uniform distribution, 7
uniform integrability, 30, 141
uniform probability measure, 6, 12, 19, 26, 39, 57, 101, 111, 115
uniform random variable on the unit circle, 41
uniformity, 7
uniformly distributed random variable, 143, 146, 182
unique factorization, 57
unitary character, 144

valuations of integers, 13
Van der Corput inequality, 112
von Mangoldt formula, 87
von Mangoldt function, 80, 174
Voronin’s Universality Theorem, 40, 52, 53
weak convergence, 133
Weil bound, 96, 182
Weil number, 185
Weyl Criterion, 43, 111, 146
Weyl sums, 110

zeros of Dirichlet L-functions, 78, 80, 82, 87, 89, 91, 179

