
Prime Number Theory and the Riemann Zeta-Function

D.R. Heath-Brown

1 Primes
An integer p ∈ N is said to be prime if p ≠ 1 and there is no integer n
dividing p with 1 < n < p. (This is not the algebraist's definition, but in our
situation the two definitions are equivalent.)
The primes are multiplicative building blocks for N, as the following crucial
result describes.
Theorem 1 (The Fundamental Theorem of Arithmetic.) Every n ∈ N
can be written in exactly one way in the form
\[
n = p_1^{e_1} p_2^{e_2} \cdots p_k^{e_k},
\]
with k ≥ 0, e_1, ..., e_k ≥ 1 and primes p_1 < p_2 < ... < p_k.


For a proof, see Hardy and Wright [5, Theorem 2], for example. The
situation for N contrasts with that for arithmetic in the set
\[
\{m + n\sqrt{-5} : m, n \in \mathbb{Z}\},
\]
where one has, for example,
\[
6 = 2 \cdot 3 = (1 + \sqrt{-5})(1 - \sqrt{-5}),
\]
with 2, 3, 1 + √−5 and 1 − √−5 all being primes.
A second fundamental result appears in the works of Euclid.
Theorem 2 There are infinitely many primes.
This is proved by contradiction. Assume there are only finitely many
primes, p_1, p_2, ..., p_n, say. Consider the integer N = 1 + p_1 p_2 ... p_n. Then
N ≥ 2, so that N must have at least one prime factor p, say. But our list
of primes was supposedly complete, so that p must be one of the primes p_i,
say. Then p_i divides N − 1, by construction, while p = p_i divides N by
assumption. It follows that p divides N − (N − 1) = 1, which is impossible.
This contradiction shows that there can be no finite list containing all the
primes.
There have been many tables of primes produced over the years. They
show that the detailed distribution is quite erratic, but if we define
\[
\pi(x) = \#\{p \le x : p \text{ prime}\},
\]
then we find that π(x) grows fairly steadily. Gauss conjectured that
\[
\pi(x) \sim \operatorname{Li}(x), \qquad \text{where} \quad \operatorname{Li}(x) = \int_2^x \frac{dt}{\log t},
\]
that is to say that
\[
\lim_{x \to \infty} \frac{\pi(x)}{\operatorname{Li}(x)} = 1.
\]

The following figures bear this out.


\[
\begin{aligned}
\pi(10^{8}) &= 5{,}776{,}455, & \pi(10^{8})/\operatorname{Li}(10^{8}) &= 0.999869147\ldots,\\
\pi(10^{12}) &= 37{,}607{,}912{,}018, & \pi(10^{12})/\operatorname{Li}(10^{12}) &= 0.999989825\ldots,\\
\pi(10^{16}) &= 279{,}238{,}341{,}033{,}925, & \pi(10^{16})/\operatorname{Li}(10^{16}) &= 0.999999989\ldots.
\end{aligned}
\]
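These figures are easy to reproduce on a smaller scale. The following Python sketch (an illustration only; the bound 10^7 and the step count are arbitrary choices) computes π(x) with a simple sieve and approximates Li(x) by the trapezoidal rule.

```python
import math

def prime_count(n):
    # Sieve of Eratosthenes; returns pi(n).
    sieve = [True] * (n + 1)
    sieve[0] = sieve[1] = False
    for p in range(2, int(n ** 0.5) + 1):
        if sieve[p]:
            sieve[p * p :: p] = [False] * len(sieve[p * p :: p])
    return sum(sieve)

def Li(x, steps=100000):
    # Trapezoidal approximation to the integral of dt/log t over [2, x].
    h = (x - 2) / steps
    total = 0.5 * (1 / math.log(2) + 1 / math.log(x))
    total += sum(1 / math.log(2 + i * h) for i in range(1, steps))
    return total * h

x = 10 ** 7
pi_x = prime_count(x)
print(pi_x, Li(x), pi_x / Li(x))   # the ratio is already close to 1
```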
It is not hard to show that in fact
\[
\operatorname{Li}(x) \sim \frac{x}{\log x},
\]
but it turns out that Li(x) gives a better approximation to π(x) than x/log x
does. Gauss' conjecture was finally proved in 1896, by Hadamard and de la
Vallée Poussin, working independently.
Theorem 3 (The Prime Number Theorem.) We have
\[
\pi(x) \sim \frac{x}{\log x}
\]
as x → ∞.
One interesting interpretation of the Prime Number Theorem is that for
a number n in the vicinity of x the probability that n is prime is asymp-
totically 1/log x, or equivalently, that the probability that n is prime is
asymptotically 1/log n. Of course the event "n is prime" is deterministic,
that is to say, the probability is 1 if n is prime, and 0 otherwise. None the
less the probabilistic interpretation leads to a number of plausible heuris-
tic arguments. As an example of this, consider, for a given large integer
n, the probability that n + 1, n + 2, ..., n + k are all composite. If k is at
most n, say, then the probability that any one of these is composite is about
1 − 1/log n. Thus if the events were all independent, which they are not, the
overall probability would be about
\[
\left(1 - \frac{1}{\log n}\right)^{k}.
\]

Taking k = λ(log n)² and approximating
\[
\left(1 - \frac{1}{\log n}\right)^{\log n}
\]
by e^{−1}, we would have that the probability that n + 1, n + 2, ..., n + k are
all composite is around n^{−λ}.
If E_n is the event that n + 1, n + 2, ..., n + k are all composite, then the
events E_n and E_{n+1} are clearly not independent. However we may hope that
E_n and E_{n+k} are independent. If the events E_n were genuinely independent
for different values of n then an application of the Borel-Cantelli lemma
would tell us that E_n should happen infinitely often when λ < 1, and finitely
often for λ > 1. With more care one can make this plausible even though
E_n and E_{n'} are correlated for nearby values n and n'. We are thus led to the
following conjecture.
Conjecture 1 If p' denotes the next prime after p then
\[
\limsup_{p \to \infty} \frac{p' - p}{(\log p)^{2}} = 1.
\]
Numerical evidence for this is hard to produce, but what there is seems to
be consistent with the conjecture.
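One can at least see what the Cramér Model itself predicts by simulating it. In the sketch below (purely illustrative; the range 10^6 and the random seed are arbitrary) each integer n ≥ 3 is declared "prime" independently with probability 1/log n, and the largest resulting gap is compared with (log N)².

```python
import math, random

random.seed(1)
N = 10 ** 6
last, max_gap = 2, 0          # treat 2 as the first "prime" of the model
for n in range(3, N + 1):
    # Cramér's model: n is "prime" with probability 1/log n, independently.
    if random.random() < 1 / math.log(n):
        max_gap = max(max_gap, n - last)
        last = n

print("largest simulated gap:", max_gap)
print("(log N)^2:           ", math.log(N) ** 2)
```

Repeating the experiment with different seeds gives gaps whose size is indeed of order (log N)², in line with the conjecture (for the model, not for the primes themselves).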
In the reverse direction, our simple probabilistic interpretation of the
Prime Number Theorem might suggest that the probability of having both n
and n + 1 prime should be around (log n)^{−2}. This is clearly wrong, since one
of n and n + 1 is always even. However, a due allowance for such arithmetic
effects leads one to the following.
Conjecture 2 If
\[
c = 2\prod_{p>2}\left(1 - \frac{1}{(p-1)^{2}}\right) = 1.3202\ldots,
\]
the product being over primes, then
\[
\#\{n \le x : n, n+2 \text{ both prime}\} \sim c\int_2^x \frac{dt}{(\log t)^{2}}. \tag{1}
\]
The numerical evidence for this is extremely convincing.
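The comparison in (1) is easy to carry out for small x. The following sketch (illustrative only; the bound 10^6 is arbitrary, and the constant is the truncated value c = 1.3202 quoted above) counts prime pairs n, n + 2 and evaluates the integral by the trapezoidal rule.

```python
import math

x = 10 ** 6
sieve = [True] * (x + 3)
sieve[0] = sieve[1] = False
for p in range(2, int((x + 2) ** 0.5) + 1):
    if sieve[p]:
        sieve[p * p :: p] = [False] * len(sieve[p * p :: p])

twins = sum(1 for n in range(2, x + 1) if sieve[n] and sieve[n + 2])

# Right-hand side of (1): c times the integral of dt/(log t)^2 over [2, x].
c, steps = 1.3202, 200000
h = (x - 2) / steps
integral = 0.5 * (1 / math.log(2) ** 2 + 1 / math.log(x) ** 2)
integral += sum(1 / math.log(2 + i * h) ** 2 for i in range(1, steps))
integral *= h

print(twins, c * integral, twins / (c * integral))
```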
Thus the straightforward probabilistic interpretation of the Prime Num-
ber Theorem leads to a number of conjectures, which fit very well with the
available numerical evidence. This probabilistic model is known as Cramér's
Model and has been widely used for predicting the behaviour of primes.
One further example of this line of reasoning shows us however that the
primes are more subtle than one might think. Consider the size of
\[
\pi(N + H) - \pi(N) = \#\{p : N < p \le N + H\},
\]
when H is small compared with N. The Prime Number Theorem leads one
to expect that
\[
\pi(N + H) - \pi(N) \approx \int_N^{N+H}\frac{dt}{\log t} \approx \frac{H}{\log N}.
\]
However the Prime Number Theorem only says that
\[
\pi(x) = \int_2^x\frac{dt}{\log t} + o\!\left(\frac{x}{\log x}\right),
\]
or equivalently that
\[
\pi(x) = \int_2^x\frac{dt}{\log t} + f(x),
\]
where
\[
\frac{f(x)}{x/\log x} \to 0
\]
as x → ∞. Hence
\[
\pi(N + H) - \pi(N) = \int_N^{N+H}\frac{dt}{\log t} + f(N + H) - f(N).
\]
In order to assert that
\[
\frac{f(N + H) - f(N)}{H/\log N} \to 0
\]
as N → ∞ we need cN ≤ H ≤ N for some constant c > 0. None the less,
considerably more subtle arguments show that
\[
\pi(N + H) - \pi(N) \sim \frac{H}{\log N}
\]
even when H is distinctly smaller than N.
A careful application of the Cramér Model suggests the following conjec-
ture.

Conjecture 3 Let λ > 2 be any constant. Then if H = (log N)^λ we should
have
\[
\pi(N + H) - \pi(N) \sim \frac{H}{\log N}
\]
as N → ∞.

This is supported by the following result due to Selberg in 1943 [15].

Theorem 4 Let f(N) be any increasing function for which f(N) → ∞ as
N → ∞. Assume the Riemann Hypothesis. Then there is a subset E of the
integers N, with
\[
\#\{n \in E : n \le N\} = o(N)
\]
as N → ∞, such that
\[
\pi(n + f(n)\log^{2}n) - \pi(n) \sim f(n)\log n
\]
for all n ∉ E.

Conjecture 3 would say that one can take E = ∅ if f(N) is a positive power
of log N.
Since Cramér's Model leads inexorably to Conjecture 3, it came as quite
a shock to prime number theorists when the conjecture was disproved by
Maier [9] in 1985. Maier established the following result.

Theorem 5 For any λ > 1 there is a constant δ_λ > 0 such that
\[
\limsup_{N \to \infty}\frac{\pi(N + (\log N)^{\lambda}) - \pi(N)}{(\log N)^{\lambda - 1}} \ge 1 + \delta_{\lambda}
\]
and
\[
\liminf_{N \to \infty}\frac{\pi(N + (\log N)^{\lambda}) - \pi(N)}{(\log N)^{\lambda - 1}} \le 1 - \delta_{\lambda}.
\]

The values of N produced by Maier, where π(N + (log N)^λ) − π(N) is ab-
normally large (or abnormally small), are very rare. None the less their
existence shows that the Cramér Model breaks down. Broadly speaking one
could summarize the reason for this failure by saying that arithmetic effects
play a bigger role than previously supposed. As yet we have no good alter-
native to the Cramér Model.

2 Open Questions About Primes,
and Important Results
Here are a few of the well-known unsolved problems about the primes.

(1) Are there infinitely many prime twins n, n + 2 both of which are
prime? (Conjecture 2 gives a prediction for the rate at which the number
of such pairs grows.)

(2) Is every even integer n ≥ 4 the sum of two primes? (Goldbach's Con-
jecture.)

(3) Are there infinitely many primes of the form p = n² + 1?

(4) Are there infinitely many Mersenne primes of the form p = 2^n − 1?

(5) Are there arbitrarily long arithmetic progressions, all of whose terms
are prime?

(6) Is there always a prime between any two successive squares?

However there have been some significant results proved too. Here is a
selection.

(1) There are infinitely many primes of the form a² + b⁴. (Friedlander and
Iwaniec [4], 1998.)

(2) There are infinitely many primes p for which p + 2 is either prime or a
product of two primes. (Chen [2], 1966.)

(3) There is a number n₀ such that any even number n ≥ n₀ can be written
as n = p + p' with p prime and p' either prime or a product of two primes.
(Chen [2], 1966.)

(4) There are infinitely many integers n such that n² + 1 is either prime or
a product of two primes. (Iwaniec [8], 1978.)

(5) For any constant c < 243/205 = 1.185..., there are infinitely many integers
n such that [n^c] is prime. Here [x] denotes the integral part of x, that
is to say the largest integer N satisfying N ≤ x. (Rivat and Wu [14],
2001, after Piatetski-Shapiro, [11], 1953.)

(6) Apart from a finite number of exceptions, there is always a prime be-
tween any two consecutive cubes. (Ingham [6], 1937.)

(7) There is a number n₀ such that for every n ≥ n₀ there is at least one
prime in the interval [n, n + n^{0.525}]. (Baker, Harman and Pintz, [1],
2001.)

(8) There are infinitely many pairs of consecutive primes p, p' such that
p' − p ≤ (log p)/4. (Maier [10], 1988.)

(9) There is a positive constant c such that there are infinitely many pairs of
consecutive primes p, p' such that
\[
p' - p \ge c\,\log p\,\frac{(\log\log p)(\log\log\log\log p)}{(\log\log\log p)^{2}}.
\]
(Rankin [13], 1938.)

(10) For any positive integer q and any integer a in the range 0 ≤ a < q,
which is coprime to q, there are arbitrarily long strings of consecutive
primes, all of which leave remainder a on division by q. (Shiu [16],
2000.)

By way of explanation we should say the following. The result (1) demon-
strates that even though we cannot yet handle primes of the form n² + 1, we
can say something about the relatively sparse polynomial sequence a² + b⁴.
The result in (5) can be viewed in the same context. One can think of [n^c] as
being a "polynomial" of degree c with c > 1. Numbers (2), (3) and (4) are
approximations to, respectively, the prime twins problem, Goldbach's prob-
lem, and the problem of primes of the shape n² + 1. The theorems in (6)
and (7) are approximations to the conjecture that there should be a prime
between consecutive squares. Of these (7) is stronger, if less elegant. Maier's
result (8) shows that the difference between consecutive primes is sometimes
smaller than average by a factor 1/4, the average spacing being log p by the
Prime Number Theorem. (Of course the twin prime conjecture would be
a much stronger result, with differences between consecutive primes some-
times being as small as 2.) Similarly, Rankin's result (9) demonstrates that
the gaps between consecutive primes can sometimes be larger than average,
by a factor which is almost log log p. Again this is some way from what we
expect, since Conjecture 1 predicts gaps as large as (log p)². Finally, Shiu's
result (10) is best understood by taking q = 10⁷ and a = 7,777,777, say.
Thus a prime leaves remainder a when divided by q precisely when its dec-
imal expansion ends in 7 consecutive 7s. Then (10) tells us that a table of
primes will somewhere contain a million consecutive entries, each of which
ends in the digits 7,777,777.

3 The Riemann Zeta-Function
In the theory of the zeta-function it is customary to use the variable
s = σ + it ∈ C. One then defines the complex exponential
\[
n^{-s} := \exp(-s\log n), \quad \text{with } \log n \in \mathbb{R}.
\]
The Riemann Zeta-function is then
\[
\zeta(s) := \sum_{n=1}^{\infty} n^{-s}, \qquad \sigma > 1. \tag{2}
\]
The sum is absolutely convergent for σ > 1, and for fixed δ > 0 it is uniformly
convergent for σ ≥ 1 + δ. It follows that ζ(s) is holomorphic for σ > 1. The
function is connected to the primes as follows.
Theorem 6 (The Euler Product.) If σ > 1 then we have
\[
\zeta(s) = \prod_{p}\left(1 - p^{-s}\right)^{-1},
\]
where p runs over all primes, and the product is absolutely convergent.
This result is, philosophically, at the heart of the theory. It relates a sum
over all positive integers to a product over primes. Thus it relates the additive
structure, in which successive positive integers are generated by adding 1, to
the multiplicative structure. Moreover we shall see in the proof that the
fact that the sum and the product are equal is exactly an expression of the
Fundamental Theorem of Arithmetic.
To prove the result consider the finite product
\[
\prod_{p \le X}\left(1 - p^{-s}\right)^{-1}.
\]
Since σ > 1 we have |p^{-s}| < p^{-1} < 1, whence we can expand (1 − p^{-s})^{-1} as
an absolutely convergent series 1 + p^{-s} + p^{-2s} + p^{-3s} + .... We may multiply
together a finite number of such series, and rearrange them, since we have
absolute convergence. This yields
\[
\prod_{p \le X}\left(1 - p^{-s}\right)^{-1} = \sum_{n=1}^{\infty}\frac{a_X(n)}{n^{s}},
\]
where the coefficient a_X(n) is the number of ways of writing n in the form
\[
n = p_1^{e_1} p_2^{e_2}\cdots p_r^{e_r} \quad \text{with } p_1 < p_2 < \ldots < p_r \le X.
\]
By the Fundamental Theorem of Arithmetic we have a_X(n) = 0 or 1, and if
n ≤ X we will have a_X(n) = 1. It follows that

\[
\left|\zeta(s) - \sum_{n=1}^{\infty}\frac{a_X(n)}{n^{s}}\right| \le \sum_{n > X}\left|\frac{1}{n^{s}}\right| = \sum_{n > X}\frac{1}{n^{\sigma}}.
\]
As X → ∞ this final sum must tend to zero, since the infinite sum
Σ_{n=1}^∞ n^{-σ} converges. We therefore deduce that if σ > 1, then
\[
\lim_{X \to \infty}\prod_{p \le X}\left(1 - p^{-s}\right)^{-1} = \sum_{n=1}^{\infty}\frac{1}{n^{s}},
\]
as required. Of course the product is absolutely convergent, as one may see
by taking s = σ.
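The convergence just proved is easy to watch numerically. The sketch below (an illustration; the choice s = 2 and the truncation points are arbitrary) compares a long partial sum of the series with the finite product over p ≤ X.

```python
import math

def primes_up_to(n):
    sieve = [True] * (n + 1)
    sieve[0] = sieve[1] = False
    for p in range(2, int(n ** 0.5) + 1):
        if sieve[p]:
            sieve[p * p :: p] = [False] * len(sieve[p * p :: p])
    return [p for p in range(2, n + 1) if sieve[p]]

s = 2.0
partial_sum = sum(n ** (-s) for n in range(1, 10 ** 6 + 1))
finite_product = 1.0
for p in primes_up_to(10 ** 5):
    finite_product *= 1.0 / (1.0 - p ** (-s))

# Both numbers approximate zeta(2) = pi^2/6.
print(partial_sum, finite_product, math.pi ** 2 / 6)
```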
One important deduction from the Euler product identity comes from
taking logarithms and differentiating termwise. This can be justified by the
local uniform convergence of the resulting series.
Corollary 1 We have
\[
-\frac{\zeta'}{\zeta}(s) = \sum_{n=2}^{\infty}\frac{\Lambda(n)}{n^{s}}, \qquad (\sigma > 1), \tag{3}
\]
where
\[
\Lambda(n) = \begin{cases}\log p, & n = p^{e},\\ 0, & \text{otherwise}.\end{cases}
\]
The function Λ(n) is known as the von Mangoldt function.
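The identity (3) can be tested numerically as well. The sketch below (illustrative; the cut-off 10^5 is arbitrary) tabulates Λ(n), forms the truncated sum of Λ(n)n^{-s} at s = 2, and compares it with −ζ'(2)/ζ(2) computed by the mpmath library.

```python
import math
from mpmath import mp, zeta

mp.dps = 20
N = 10 ** 5

# von Mangoldt function: Lam[n] = log p if n = p^e for a prime p, else 0.
Lam = [0.0] * (N + 1)
for p in range(2, N + 1):
    if all(p % d for d in range(2, int(p ** 0.5) + 1)):   # p is prime
        q = p
        while q <= N:
            Lam[q] = math.log(p)
            q *= p

s = 2
truncated = sum(Lam[n] / n ** s for n in range(2, N + 1))
exact = -zeta(s, derivative=1) / zeta(s)
print(truncated, exact)
```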

4 The Analytic Continuation and Functional Equation of ζ(s)

Our definition only gives a meaning to ζ(s) when σ > 1. We now seek to
extend the definition to all s ∈ C. The key tool is the Poisson Summation
Formula.
Theorem 7 (The Poisson Summation Formula.) Suppose that f : R → R
is twice differentiable and that f, f' and f'' are all integrable over R. Define
the Fourier transform by
\[
\hat{f}(t) := \int_{-\infty}^{\infty} f(x)\,e^{-2\pi i t x}\,dx.
\]
Then
\[
\sum_{n=-\infty}^{\infty} f(n) = \sum_{n=-\infty}^{\infty} \hat{f}(n),
\]
both sides converging absolutely.

There are weaker conditions under which this holds, but the above more
than suffices for our application. The reader should note that there are a
number of conventions in use for defining the Fourier transform, but the one
used here is the most appropriate for number theoretic purposes.
The proof (see Rademacher [12, page 71], for example) uses harmonic
analysis on R+ . Thus it depends only on the additive structure and not on
the multiplicative structure.
If we apply the theorem to f(x) = exp(−πx²v), which certainly fulfils
the conditions, we have
\[
\begin{aligned}
\hat{f}(n) &= \int_{-\infty}^{\infty} e^{-\pi x^{2} v}\,e^{-2\pi i n x}\,dx\\
&= \int_{-\infty}^{\infty} e^{-\pi v(x + in/v)^{2}}\,e^{-\pi n^{2}/v}\,dx\\
&= e^{-\pi n^{2}/v}\int_{-\infty}^{\infty} e^{-\pi v y^{2}}\,dy\\
&= \frac{1}{\sqrt{v}}\,e^{-\pi n^{2}/v},
\end{aligned}
\]
providing that v is real and positive. Thus if we define
\[
\theta(v) := \sum_{n=-\infty}^{\infty}\exp(-\pi n^{2} v),
\]
then the Poisson Summation Formula leads to the transformation formula
\[
\theta(v) = \frac{1}{\sqrt{v}}\,\theta\!\left(\frac{1}{v}\right).
\]

The function θ(v) is a theta-function, and is an example of a modular form.
It is the fact that θ(v) not only satisfies the above transformation formula
when v goes to 1/v, but is also periodic, that makes θ(v) a modular form.
The Langlands Philosophy says that all reasonable generalizations of
the Riemann Zeta-function are related to modular forms, in a suitably gen-
eralized sense.
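Since the series for θ(v) converges extremely rapidly, the transformation formula is easy to check numerically; the following sketch (illustrative; the truncation |n| ≤ 50 and the sample values of v are arbitrary) compares θ(v) with v^{-1/2}θ(1/v).

```python
import math

def theta(v, N=50):
    # Truncation of the sum over all integers n of exp(-pi n^2 v).
    return sum(math.exp(-math.pi * n * n * v) for n in range(-N, N + 1))

for v in (0.3, 1.0, 2.7):
    print(v, theta(v), theta(1.0 / v) / math.sqrt(v))  # the last two columns agree
```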

We are now ready to consider ζ(s), but first we introduce the function
\[
\psi(v) = \sum_{n=1}^{\infty} e^{-\pi n^{2} v}, \tag{4}
\]
so that ψ(v) = (θ(v) − 1)/2 and
\[
2\psi(v) + 1 = \frac{1}{\sqrt{v}}\left\{2\psi\!\left(\frac{1}{v}\right) + 1\right\}. \tag{5}
\]

We proceed to compute that, if σ > 1, then
\[
\begin{aligned}
\int_0^{\infty} x^{s/2-1}\psi(x)\,dx &= \sum_{n=1}^{\infty}\int_0^{\infty} x^{s/2-1} e^{-\pi n^{2} x}\,dx\\
&= \sum_{n=1}^{\infty}\frac{1}{(\pi n^{2})^{s/2}}\int_0^{\infty} y^{s/2-1} e^{-y}\,dy\\
&= \sum_{n=1}^{\infty}\frac{1}{(\pi n^{2})^{s/2}}\,\Gamma\!\left(\frac{s}{2}\right)\\
&= \zeta(s)\,\pi^{-s/2}\,\Gamma\!\left(\frac{s}{2}\right),
\end{aligned}
\]
on substituting y = πn²x. The interchange of summation and integration is
justified by the absolute convergence of the resulting sum.
We now split the range of integration in the original integral, and apply
the transformation formula (5). For σ > 1 we obtain the expression
\[
\begin{aligned}
\zeta(s)\,\pi^{-s/2}\,\Gamma\!\left(\frac{s}{2}\right)
&= \int_1^{\infty} x^{s/2-1}\psi(x)\,dx + \int_0^1 x^{s/2-1}\psi(x)\,dx\\
&= \int_1^{\infty} x^{s/2-1}\psi(x)\,dx + \int_0^1 x^{s/2-1}\left\{\frac{1}{\sqrt{x}}\,\psi\!\left(\frac{1}{x}\right) + \frac{1}{2\sqrt{x}} - \frac{1}{2}\right\}dx\\
&= \int_1^{\infty} x^{s/2-1}\psi(x)\,dx + \int_0^1 x^{s/2-3/2}\,\psi\!\left(\frac{1}{x}\right)dx + \frac{1}{s-1} - \frac{1}{s}\\
&= \int_1^{\infty} x^{s/2-1}\psi(x)\,dx + \int_1^{\infty} y^{(1-s)/2-1}\psi(y)\,dy - \frac{1}{s(1-s)},
\end{aligned}
\]
where we have substituted y for 1/x in the final integral.


We therefore conclude that
\[
\zeta(s)\,\pi^{-s/2}\,\Gamma\!\left(\frac{s}{2}\right) = \int_1^{\infty}\{x^{s/2-1} + x^{(1-s)/2-1}\}\,\psi(x)\,dx - \frac{1}{s(1-s)}, \tag{6}
\]

whenever σ > 1. However the right-hand side is meaningful for all values
s ∈ C − {0, 1}, since the integral converges by virtue of the exponential
decay of ψ(x). We may therefore use the above expression to define ζ(s)
for all s ∈ C − {0, 1}, on noting that the factor π^{-s/2}Γ(s/2) never vanishes.
Indeed, since Γ(s/2)^{-1} has a zero at s = 0 we see that the resulting expression
for ζ(s) is regular at s = 0. Finally we observe that the right-hand side of
(6) is invariant on substituting s for 1 − s. We are therefore led to the
following conclusion.

Theorem 8 (Analytic Continuation and Functional Equation.) The
function ζ(s) has an analytic continuation to C, and is regular apart from a
simple pole at s = 1, with residue 1. Moreover
\[
\pi^{-s/2}\,\Gamma\!\left(\frac{s}{2}\right)\zeta(s) = \pi^{-(1-s)/2}\,\Gamma\!\left(\frac{1-s}{2}\right)\zeta(1-s).
\]
Furthermore, if a ≤ σ ≤ b and |t| ≥ 1, then π^{-s/2}Γ(s/2)ζ(s) is bounded in
terms of a and b.

To prove the last statement in the theorem we merely observe that
\[
\left|\pi^{-s/2}\,\Gamma\!\left(\frac{s}{2}\right)\zeta(s)\right| \le 1 + \int_1^{\infty}\left(x^{b/2-1} + x^{(1-a)/2-1}\right)\psi(x)\,dx.
\]
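Theorem 8 can also be verified numerically at individual points, for instance with the arbitrary-precision ζ and Γ of the mpmath library; the sketch below (illustrative, with arbitrary test points) evaluates π^{-s/2}Γ(s/2)ζ(s) at s and at 1 − s.

```python
from mpmath import mp, mpc, zeta, gamma, pi

mp.dps = 30

def completed_zeta(s):
    # The combination pi^(-s/2) * Gamma(s/2) * zeta(s) from Theorem 8.
    return pi ** (-s / 2) * gamma(s / 2) * zeta(s)

for s in (mpc(0.3, 7.2), mpc(2.5, -1.0), mpc(-0.7, 3.3)):
    print(completed_zeta(s))
    print(completed_zeta(1 - s))   # equals the previous line, by the functional equation
```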

5 Zeros of ζ(s)
It is convenient to define
\[
\xi(s) = \tfrac{1}{2}s(s-1)\,\pi^{-s/2}\,\Gamma\!\left(\frac{s}{2}\right)\zeta(s) = (s-1)\,\pi^{-s/2}\,\Gamma\!\left(\frac{s}{2}+1\right)\zeta(s), \tag{7}
\]
so that ξ(s) is entire. The functional equation then takes the form ξ(s) =
ξ(1 − s). It is clear from (3) that ζ(s) can have no zeros for σ > 1, since the
series converges. Since 1/Γ(z) is entire, the function Γ(s/2) is non-vanishing,
so that ξ(s) also has no zeros in σ > 1. Thus, by the functional equation,
the zeros of ξ(s) are confined to the "critical strip" 0 ≤ σ ≤ 1. Moreover
any zero of ζ(s) must either be a zero of ξ(s), or a pole of Γ(s/2). We then
see that the zeros of ζ(s) lie in the critical strip, with the exception of the
"trivial zeros" at s = −2, −4, −6, ... corresponding to poles of Γ(s/2).
We may also observe that if ρ is a zero of ξ(s) then, by the functional
equation, so is 1 − ρ. Moreover, since $\xi(\overline{s}) = \overline{\xi(s)}$, we deduce that $\overline{\rho}$ and $1 - \overline{\rho}$
are also zeros. Thus the zeros are symmetrically arranged about the real
axis, and also about the "critical line" given by σ = 1/2. With this picture
in mind we mention the following important conjectures.

Conjecture 4 (The Riemann Hypothesis.) We have σ = 1/2 for all
zeros of ξ(s).

Conjecture 5 All zeros of ξ(s) are simple.

In the absence of a proof of Conjecture 5 we adopt the convention that in any
sum or product over zeros, we shall count them according to multiplicity.

6 The Product Formula

There is a useful product formula for ξ(s), due to Hadamard. In general
we have the following result, for which see Davenport [3, Chapter 11] for
example.
Theorem 9 Let f(z) be an entire function with f(0) ≠ 0, and suppose that
there are constants A > 0 and θ < 2 such that f(z) = O(exp(A|z|^θ)) for all
complex z. Then there are constants α and β such that
\[
f(z) = e^{\alpha + \beta z}\prod_{n=1}^{\infty}\left\{\left(1 - \frac{z}{z_n}\right)e^{z/z_n}\right\},
\]
where z_n runs over the zeros of f(z) counted with multiplicity. The infinite
sum Σ_{n=1}^∞ |z_n|^{-2} converges, so that the product above is absolutely and
uniformly convergent in any compact set which includes none of the zeros.
We can apply this to ξ(s), since it is apparent from Theorem 8, together
with the definition (7), that
\[
\xi(0) = \xi(1) = \tfrac{1}{2}\,\pi^{-1/2}\,\Gamma\!\left(\tfrac{1}{2}\right)\operatorname{Res}\{\zeta(s); s = 1\} = \tfrac{1}{2}.
\]
For σ ≥ 2 one has ζ(s) = O(1) directly from the series (2), while Stirling's
approximation yields Γ(s/2) = O(exp(|s|log|s|)). It follows that ξ(s) =
O(exp(|s|log|s|)) whenever σ ≥ 2. Moreover, when ½ ≤ σ ≤ 2 one sees from
Theorem 8 that ξ(s) is bounded. Thus, using the functional equation, we
can deduce that ξ(s) = O(exp(|s|log|s|)) for all s with |s| ≥ 2.
We may therefore deduce from Theorem 9 that
\[
\xi(s) = e^{\alpha + \beta s}\prod_{\rho}\left\{\left(1 - \frac{s}{\rho}\right)e^{s/\rho}\right\},
\]
where ρ runs over the zeros of ξ(s). Thus, with appropriate branches of the
logarithms, we have
\[
\log\xi(s) = \alpha + \beta s + \sum_{\rho}\left\{\log\left(1 - \frac{s}{\rho}\right) + \frac{s}{\rho}\right\}.
\]
We can then differentiate termwise to deduce that
\[
\frac{\xi'}{\xi}(s) = \beta + \sum_{\rho}\left\{\frac{1}{s - \rho} + \frac{1}{\rho}\right\},
\]
the termwise differentiation being justified by the local uniform convergence
of the resulting sum. We therefore deduce that
\[
\frac{\zeta'}{\zeta}(s) = \beta - \frac{1}{s-1} + \frac{1}{2}\log\pi - \frac{1}{2}\,\frac{\Gamma'}{\Gamma}\!\left(\frac{s}{2} + 1\right) + \sum_{\rho}\left\{\frac{1}{s-\rho} + \frac{1}{\rho}\right\}, \tag{8}
\]
where, as ever, ρ runs over the zeros of ξ counted according to multiplicity.


In fact, on taking s 1, one can show that
1 1
= 1 log 4,
2 2
where is Eulers constant. However we shall make no use of this fact.

7 The Functions N(T) and S(T)

We shall now investigate the frequency of the zeros ρ. We define
\[
N(T) = \#\{\rho = \beta + i\gamma : 0 \le \beta \le 1,\; 0 < \gamma \le T\}.
\]
The notation β = ℜ(ρ), γ = ℑ(ρ) is standard. In fact one can easily show
that ψ(x) < (2√x)^{-1}, whence (6) suffices to prove that ζ(s) < 0 for real
s ∈ (0, 1). Thus we have γ > 0 for any zero counted by N(T).
The first result we shall prove is the following.

Theorem 10 If T is not the ordinate of a zero, then
\[
N(T) = \frac{T}{2\pi}\log\frac{T}{2\pi} - \frac{T}{2\pi} + \frac{7}{8} + S(T) + O(1/T),
\]
where
\[
S(T) = \frac{1}{\pi}\arg\zeta\!\left(\tfrac{1}{2} + iT\right)
\]
is defined by continuous variation along the line segments from 2 to 2 + iT
to ½ + iT.
We shall evaluate N(T) using the Principle of the Argument, which shows
that
\[
N(T) = \frac{1}{2\pi}\,\Delta_R\arg\xi(s),
\]
providing that T is not the ordinate of any zero. Here R is the rectangular
path joining 2, 2 + iT, −1 + iT, and −1. To calculate Δ_R arg ξ(s) one
starts with any branch of arg ξ(s) and allows it to vary continuously around
the path. Then Δ_R arg ξ(s) is the increase in arg ξ(s) along the path. Our
assumption about T ensures that ξ(s) does not vanish on R.
Now ξ(s) = ξ(1 − s) and $\xi(\overline{s}) = \overline{\xi(s)}$, whence ξ(½ + a + ib) is
conjugate to ξ(½ − a + ib). (In particular this shows that ξ(½ + it) is always
real.) It follows that
\[
\Delta_R\arg\xi(s) = 2\,\Delta_P\arg\xi(s),
\]
where P is the path ½ → 2 → 2 + iT → ½ + iT. On the first line segment
ξ(s) is real and strictly positive, so that the contribution to Δ_P arg ξ(s) is zero.
Let L be the remaining path 2 → 2 + iT → ½ + iT. Then
\[
\Delta_L\arg\xi(s) = \Delta_L\arg\left\{(s-1)\,\pi^{-s/2}\,\Gamma\!\left(\frac{s}{2}+1\right)\right\} + \Delta_L\arg\zeta(s).
\]
Now on L the function s − 1 goes from 1 to −½ + iT, whence
\[
\Delta_L\arg(s-1) = \arg\left(-\tfrac{1}{2} + iT\right) = \frac{\pi}{2} + O(T^{-1}).
\]
We also have
\[
\arg\pi^{-s/2} = \Im\log\pi^{-s/2} = \Im\left(-\frac{s}{2}\log\pi\right),
\]
so that arg π^{-s/2} goes from 0 to −(T log π)/2 and
\[
\Delta_L\arg\pi^{-s/2} = -\frac{T}{2}\log\pi.
\]
Finally, Stirling's formula yields
\[
\log\Gamma(z) = \left(z - \tfrac{1}{2}\right)\log z - z + \tfrac{1}{2}\log(2\pi) + O(|z|^{-1}), \qquad (|\arg z| \le \pi - \delta), \tag{9}
\]
whence
\[
\begin{aligned}
\Delta_L\arg\Gamma\!\left(\frac{s}{2}+1\right) &= \Im\log\Gamma\!\left(\frac{5}{4} + \frac{iT}{2}\right)\\
&= \Im\left\{\left(\frac{3}{4} + \frac{iT}{2}\right)\log\left(\frac{5}{4} + \frac{iT}{2}\right) - \left(\frac{5}{4} + \frac{iT}{2}\right) + \frac{1}{2}\log(2\pi)\right\} + O(1/T)\\
&= \frac{T}{2}\log\frac{T}{2} - \frac{T}{2} + \frac{3\pi}{8} + O(1/T),
\end{aligned}
\]
since
\[
\log\left(\frac{5}{4} + \frac{iT}{2}\right) = \log\frac{T}{2} + \frac{i\pi}{2} + O(1/T).
\]
These results suffice for Theorem 10.
We now need to know about S(T). Here we show the following.
We now need to know about S(T ). Here we show the following.

Theorem 11 We have S(T) = O(log T).

Corollary 2 (The Riemann–von Mangoldt Formula.) We have
\[
N(T) = \frac{T}{2\pi}\log\frac{T}{2\pi} - \frac{T}{2\pi} + O(\log T).
\]
We start the proof by taking s = 2 + iT in (3) and noting that
\[
\left|\frac{\zeta'}{\zeta}(s)\right| \le \sum_{n=2}^{\infty}\frac{\Lambda(n)}{n^{2}} = O(1).
\]

Thus the partial fraction decomposition (8) yields
\[
\sum_{\rho}\left\{\frac{1}{2 + iT - \rho} + \frac{1}{\rho}\right\} = \frac{1}{2}\,\frac{\Gamma'}{\Gamma}\!\left(2 + \frac{iT}{2}\right) + O(1).
\]
We may differentiate (9), using Cauchy's formula for the first derivative, to
produce
\[
\frac{\Gamma'}{\Gamma}(z) = \log z + O(1), \qquad (|\arg z| \le \pi - \delta), \tag{10}
\]
and then deduce that
\[
\sum_{\rho}\left\{\frac{1}{2 + iT - \rho} + \frac{1}{\rho}\right\} = O(\log(2 + T)). \tag{11}
\]

We have only assumed here that T ≥ 0, not that T ≥ 2. In order to
get the correct order estimate when 0 ≤ T ≤ 2 we have therefore written
O(log(2 + T)), which is O(1) for 0 ≤ T ≤ 2.
Setting ρ = β + iγ we now have
\[
\Re\frac{1}{2 + iT - \rho} = \frac{2 - \beta}{(2-\beta)^{2} + (T-\gamma)^{2}} \ge \frac{1}{4 + (T-\gamma)^{2}}
\]
and
\[
\Re\frac{1}{\rho} = \frac{\beta}{\beta^{2} + \gamma^{2}} \ge 0,
\]
since 0 ≤ β ≤ 1. We therefore produce the useful estimate
\[
\sum_{\rho}\frac{1}{4 + (T - \gamma)^{2}} = O(\log(2 + T)), \tag{12}
\]
which implies in particular that
\[
\#\{\rho : T - 1 \le \gamma \le T + 1\} = O(\log(2 + T)). \tag{13}
\]

We now apply (8) with s = σ + iT and 0 ≤ σ ≤ 2, and subtract (11) from
it to produce
\[
\frac{\zeta'}{\zeta}(\sigma + iT) = -\frac{1}{\sigma + iT - 1} + \sum_{\rho}\left\{\frac{1}{\sigma + iT - \rho} - \frac{1}{2 + iT - \rho}\right\} + O(\log(2+T)).
\]
Terms with |γ − T| > 1 have
\[
\left|\frac{1}{\sigma + iT - \rho} - \frac{1}{2 + iT - \rho}\right|
= \left|\frac{2 - \sigma}{(\sigma + iT - \rho)(2 + iT - \rho)}\right|
\le \frac{2}{|\gamma - T|^{2}}
\le \frac{10}{4 + (T - \gamma)^{2}}.
\]
Thus (12) implies that
\[
\sum_{\rho:\,|\gamma - T| > 1}\left\{\frac{1}{\sigma + iT - \rho} - \frac{1}{2 + iT - \rho}\right\} = O(\log(2 + T)),
\]
and hence that
\[
\frac{\zeta'}{\zeta}(\sigma + iT) = -\frac{1}{\sigma + iT - 1} + \sum_{\rho:\,|\gamma - T| \le 1}\left\{\frac{1}{\sigma + iT - \rho} - \frac{1}{2 + iT - \rho}\right\} + O(\log(2 + T)).
\]
However we also have
\[
\left|\frac{1}{2 + iT - \rho}\right| \le 1,
\]
whence (13) produces
\[
\sum_{\rho:\,|\gamma - T| \le 1}\frac{1}{2 + iT - \rho} = O(\log(2 + T)).
\]
We therefore deduce the following estimate.

Lemma 1 For 0 ≤ σ ≤ 2 and T ≥ 0 we have
\[
\frac{\zeta'}{\zeta}(\sigma + iT) = -\frac{1}{\sigma + iT - 1} + \sum_{\rho:\,|\gamma - T| \le 1}\frac{1}{\sigma + iT - \rho} + O(\log(2 + T)).
\]

We are now ready to complete our estimation of S(T). Taking T ≥ 2, we
have
\[
\arg\zeta\!\left(\tfrac{1}{2} + iT\right) = \Im\log\zeta\!\left(\tfrac{1}{2} + iT\right) = \Im\int_{2}^{1/2 + iT}\frac{\zeta'}{\zeta}(s)\,ds,
\]
the path of integration consisting of the line segments from 2 to 2 + iT and
from 2 + iT to ½ + iT. Along the first of these we use the formula (3),
which yields
\[
\Im\int_{2}^{2 + iT}\frac{\zeta'}{\zeta}(s)\,ds = \Im\left[-\sum_{n=2}^{\infty}\frac{\Lambda(n)}{n^{s}\log n}\right]_{2}^{2+iT} = O(1).
\]
For the remaining range we use Lemma 1, which produces
\[
\begin{aligned}
\Im\int_{2+iT}^{1/2+iT}\frac{\zeta'}{\zeta}(s)\,ds
&= \Im\sum_{\rho:\,|\gamma - T|\le 1}\int_{2+iT}^{1/2+iT}\frac{ds}{s - \rho} + O(\log T)\\
&= \sum_{\rho:\,|\gamma - T|\le 1}\Im\left\{\log\left(\tfrac{1}{2} + iT - \rho\right) - \log(2 + iT - \rho)\right\} + O(\log T)\\
&= \sum_{\rho:\,|\gamma - T|\le 1}\left\{\arg\left(\tfrac{1}{2} + iT - \rho\right) - \arg(2 + iT - \rho)\right\} + O(\log T)\\
&= \sum_{\rho:\,|\gamma - T|\le 1}O(1) + O(\log T)\\
&= O(\log T),
\end{aligned}
\]
by (13). This suffices for the proof of Theorem 11.
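Corollary 2 can be illustrated numerically. The mpmath library computes the non-trivial zeros to high precision, so one can count the zeros with 0 < γ ≤ T directly and compare with the main terms of Theorem 10; the sketch below is illustrative, with T = 60 chosen arbitrarily.

```python
from mpmath import mp, zetazero, log, pi

mp.dps = 15
T = 60

count, n = 0, 1
while True:
    gamma_n = zetazero(n).imag      # ordinate of the n-th zero on the critical line
    if gamma_n > T:
        break
    count, n = count + 1, n + 1

main_term = (T / (2 * pi)) * log(T / (2 * pi)) - T / (2 * pi) + mp.mpf(7) / 8
print("N(T) by direct count:", count)
print("main terms of Theorem 10:", main_term)
```

The difference between the two numbers is S(T) + O(1/T), which Theorem 11 bounds by O(log T).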

8 The Non-Vanishing of ζ(s) on σ = 1

So far we know only that the non-trivial zeros of ζ(s) lie in the critical strip
0 ≤ σ ≤ 1. Qualitatively the only further information we have is that there
are no zeros on the boundary of this strip.

Theorem 12 (Hadamard and de la Vallée Poussin, independently, 1896.)
We have ζ(1 + it) ≠ 0, for all real t.

This result was the key to the proof of the Prime Number Theorem. Quan-
titatively one can say a little more.

Theorem 13 (De la Vallée Poussin.) There is a positive absolute constant
c such that for any T ≥ 2 there are no zeros of ζ(s) in the region
\[
\sigma \ge 1 - \frac{c}{\log T}, \qquad |t| \le T.
\]
In fact, with much more work, one can replace the function c/log T by
one that tends to zero slightly more slowly, but that will not concern us here.
The proof of Theorem 13 uses the following simple fact.

Lemma 2 For any real θ we have
\[
3 + 4\cos\theta + \cos 2\theta \ge 0.
\]
This is obvious, since
\[
3 + 4\cos\theta + \cos 2\theta = 2\{1 + \cos\theta\}^{2}.
\]

We now use the identity (3) to show that
\[
-3\frac{\zeta'}{\zeta}(\sigma) - 4\Re\frac{\zeta'}{\zeta}(\sigma + it) - \Re\frac{\zeta'}{\zeta}(\sigma + 2it)
= \sum_{n=2}^{\infty}\frac{\Lambda(n)}{n^{\sigma}}\{3 + 4\cos(t\log n) + \cos(2t\log n)\} \ge 0,
\]
for σ > 1. When 1 < σ ≤ 2 we have
\[
-\frac{\zeta'}{\zeta}(\sigma) = \frac{1}{\sigma - 1} + O(1),
\]
from the Laurent expansion around the pole at s = 1. For the remaining two
terms we use Lemma 1, to deduce that
\[
\frac{3}{\sigma - 1} + O(1) - 4\Re\sum_{\rho:\,|\gamma - t| \le 1}\frac{1}{\sigma + it - \rho} - \Re\sum_{\rho:\,|\gamma - 2t| \le 1}\frac{1}{\sigma + 2it - \rho} + O(\log T) \ge 0
\]
for 1 < σ ≤ 2, T ≥ 2, and |t| ≤ T. Suppose we have a zero ρ₀ = β₀ + iγ₀,
say, with 0 < γ₀ ≤ T. Set t = γ₀. We then observe that for any zero ρ we have
\[
\Re\frac{1}{\sigma + it - \rho} = \frac{\sigma - \beta}{(\sigma - \beta)^{2} + (t - \gamma)^{2}} \ge 0,
\]
since σ > 1 ≥ β, and similarly
\[
\Re\frac{1}{\sigma + 2it - \rho} \ge 0.
\]
We can therefore drop all terms from the two sums above, with the exception
of the term corresponding to ρ = ρ₀, to deduce that
\[
\frac{4}{\sigma - \beta_0} \le \frac{3}{\sigma - 1} + O(\log T).
\]
Suppose that the constant implied by the O(...) notation is c₀. This is just
a numerical value that one could calculate with a little effort. Then
\[
\frac{4}{\sigma - \beta_0} \le \frac{3}{\sigma - 1} + c_0\log T
\]
whenever 1 < σ ≤ 2. If β₀ = 1 we get an immediate contradiction by
choosing σ = 1 + (2c₀ log T)^{-1}. If β₀ < 3/4 the result of Theorem 13 is
immediate. For the remaining range of β₀ we choose σ = 1 + 4(1 − β₀), which
will show that
\[
\frac{4}{5(1 - \beta_0)} \le \frac{3}{4(1 - \beta_0)} + c_0\log T.
\]
Thus
\[
\frac{1}{20(1 - \beta_0)} \le c_0\log T,
\]
and hence
\[
1 - \beta_0 \ge \frac{1}{20 c_0\log T}.
\]
This completes the proof of Theorem 13.
The reader should observe that the key feature of the inequality given in
Lemma 2 is that the coefficients are non-negative, and that the coefficient of
cos θ is strictly greater than the constant term. In particular, the inequality
\[
1 + \cos\theta \ge 0
\]
just fails to work.
Theorem 13 has a useful corollary.
Corollary 3 Let c be as in Theorem 13, and let T ≥ 2. Then if
\[
1 - \frac{c}{2\log T} \le \sigma \le 2
\]
and |t| ≤ T, we have
\[
\frac{\zeta'}{\zeta}(\sigma + it) = -\frac{1}{\sigma + it - 1} + O(\log^{2} T).
\]
For the proof we use Lemma 1. The sum over zeros has O(log T) terms,
by (13), and each term is O(log T), since
\[
|\sigma + it - \rho| \ge \sigma - \beta \ge \frac{c}{2\log T},
\]
by Theorem 13.

9 Proof of the Prime Number Theorem

Since our argument is based on the formula (3), it is natural to work with
Λ(n). We define
\[
\psi(x) = \sum_{n \le x}\Lambda(n) = \sum_{p^{k} \le x}\log p. \tag{14}
\]
This is not the same function as that defined in (4)! Our sum ψ(x) is related
to π(x) in the following lemma.

Lemma 3 For x ≥ 2 we have
\[
\pi(x) = \frac{\psi(x)}{\log x} + \int_2^x\frac{\psi(t)}{t\log^{2} t}\,dt + O(x^{1/2}).
\]
For the proof we begin by setting
\[
\theta(x) = \sum_{p \le x}\log p.
\]
Then
\[
\begin{aligned}
\int_2^x\frac{\theta(t)}{t\log^{2} t}\,dt &= \int_2^x\sum_{p \le t}\frac{\log p}{t\log^{2} t}\,dt
= \sum_{p \le x}\log p\int_p^x\frac{dt}{t\log^{2} t}\\
&= \sum_{p \le x}\left[\frac{-\log p}{\log t}\right]_p^x
= \pi(x) - \frac{\theta(x)}{\log x},
\end{aligned}
\]
so that
\[
\pi(x) = \frac{\theta(x)}{\log x} + \int_2^x\frac{\theta(t)}{t\log^{2} t}\,dt. \tag{15}
\]

However it is clear that terms in (14) with k ≥ 2 have p ≤ x^{1/2}, and there are
at most x^{1/2} such p. Moreover k ≤ log x/log p, whence the total contribution
from terms with k ≥ 2 is O(x^{1/2}log x). Thus
\[
\psi(x) = \theta(x) + O(x^{1/2}\log x).
\]
If we substitute this into (15) the required result follows.
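For small x all of the functions in Lemma 3 can be computed directly, which makes the relations above concrete. The sketch below (illustrative; the bound 10^6 is arbitrary) computes π(x), θ(x) and ψ(x) by sieving.

```python
import math

x = 10 ** 6
sieve = [True] * (x + 1)
sieve[0] = sieve[1] = False
for p in range(2, int(x ** 0.5) + 1):
    if sieve[p]:
        sieve[p * p :: p] = [False] * len(sieve[p * p :: p])

pi_x, theta_x, psi_x = 0, 0.0, 0.0
for p in range(2, x + 1):
    if sieve[p]:
        pi_x += 1
        theta_x += math.log(p)      # contribution to theta(x)
        q = p
        while q <= x:               # prime powers p^k <= x contribute to psi(x)
            psi_x += math.log(p)
            q *= p

print("pi(x)       =", pi_x)
print("theta(x)    =", theta_x)
print("psi(x)      =", psi_x, "  (compare with x =", x, ")")
print("psi - theta =", psi_x - theta_x, "  (within the O(x^{1/2} log x) bound above)")
```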


We will use contour integration to relate ψ(x) to ζ'(s)/ζ(s). This will be
done via the following result.

Lemma 4 Let y > 0, c > 1 and T ≥ 1. Define
\[
I(y, T) = \frac{1}{2\pi i}\int_{c-iT}^{c+iT}\frac{y^{s}}{s}\,ds.
\]
Then
\[
I(y, T) = \begin{cases}0, & 0 < y < 1,\\ 1, & y > 1,\end{cases}
\;+\; O\!\left(\frac{y^{c}}{T|\log y|}\right).
\]
When 0 < y < 1 we replace the path of integration by the line segments
c − iT → N − iT → N + iT → c + iT, and let N → ∞. Then
\[
\int_{N-iT}^{N+iT}\frac{y^{s}}{s}\,ds \to 0,
\]
while
\[
\left|\int_{c-iT}^{N-iT}\frac{y^{s}}{s}\,ds\right| = O\!\left(\int_{c}^{\infty}\frac{y^{\sigma}}{T}\,d\sigma\right) = O\!\left(\frac{y^{c}}{T|\log y|}\right),
\]
and similarly for the integral from N + iT to c + iT. It follows that
\[
I(y, T) = O\!\left(\frac{y^{c}}{T|\log y|}\right)
\]
for 0 < y < 1. The case y > 1 can be treated analogously, using the path
c − iT → −N − iT → −N + iT → c + iT. However in this case we pass
a pole at s = 0, with residue 1, and this produces the corresponding main
term for I(y, T).
We can now give our formula for ψ(x).

Theorem 14 For x with x − ½ ∈ N, α = 1 + 1/log x and T ≥ 1 we have
\[
\psi(x) = \frac{1}{2\pi i}\int_{\alpha - iT}^{\alpha + iT}\left\{-\frac{\zeta'}{\zeta}(s)\right\}\frac{x^{s}}{s}\,ds + O\!\left(\frac{x\log^{2} x}{T}\right).
\]

For the proof we integrate termwise to get
\[
\frac{1}{2\pi i}\int_{\alpha-iT}^{\alpha+iT}\left\{-\frac{\zeta'}{\zeta}(s)\right\}\frac{x^{s}}{s}\,ds
= \sum_{n=2}^{\infty}\Lambda(n)\,I\!\left(\frac{x}{n}, T\right)
= \sum_{n \le x}\Lambda(n) + O\!\left(\sum_{n=2}^{\infty}\Lambda(n)\left(\frac{x}{n}\right)^{\alpha}\frac{1}{T|\log x/n|}\right).
\]
Since we are taking x − ½ ∈ N the case x/n = 1 does not occur. In the error
sum those terms with n ≤ x/2 or n ≥ 2x have |log x/n| ≥ log 2. Such terms
therefore contribute
\[
O\!\left(\sum_{n=2}^{\infty}\Lambda(n)\frac{x^{\alpha}}{T n^{\alpha}}\right)
= O\!\left(\frac{x^{\alpha}}{T}\left|\frac{\zeta'}{\zeta}(\alpha)\right|\right)
= O\!\left(\frac{x^{1+1/\log x}}{T}\cdot\frac{1}{\alpha - 1}\right)
= O\!\left(\frac{x\log x}{T}\right).
\]
When x/2 < n < 2x we have
\[
|\log x/n| \ge \frac{1}{2}\,\frac{|x - n|}{x}
\]
and
\[
\Lambda(n)\left(\frac{x}{n}\right)^{\alpha} = O(\log x).
\]
These terms therefore contribute
\[
O\!\left(\sum_{x/2 < n < 2x}\frac{x\log x}{T|x - n|}\right) = O\!\left(\frac{x\log^{2} x}{T}\right),
\]
on bearing in mind that x − ½ ∈ N. The theorem now follows.


We are now ready to prove the following major result.
Theorem 15 There is a positive constant c0 such that
p
(x) = x + O(x exp{c0 log x}) (16)

for all x 2. Moreover we have


p
(x) = Li(x) + O(x exp{c0 log x})

for all x 2.

The error terms here can be improved slightly, but with considerably more
work.
It clearly suffices to consider the case in which x − ½ ∈ N. To prove the
result we set
\[
\mu = 1 - \frac{c}{2\log T}, \qquad T \ge 2,
\]
as in Corollary 3, and replace the line of integration in Theorem 14 by the path
α − iT → μ − iT → μ + iT → α + iT. The integrand has a pole at s = 1
with residue x, arising from the pole of ζ(s), but no other singularities, by
virtue of Theorem 13. On the new path of integration Corollary 3 shows that
\[
\frac{\zeta'}{\zeta}(s) = O(\log^{2} T).
\]
We therefore deduce that
\[
\psi(x) = x + O\!\left(\frac{x\log^{2} x}{T}\right) + O\!\left(\int_{\mu}^{\alpha}x^{\sigma}\,\frac{\log^{2} T}{T}\,d\sigma\right) + O\!\left(\int_{-T}^{T}x^{\mu}\,\frac{\log^{2} T}{|\mu + it|}\,dt\right),
\]
where the first error integral corresponds to the line segments α − iT → μ − iT
and μ + iT → α + iT, and the second error integral to the segment
μ − iT → μ + iT. These integrals are readily estimated to yield
\[
\psi(x) = x + O\!\left(\frac{x\log^{2} x}{T}\right) + O\!\left(\frac{x\log^{2} T}{T}\right) + O(x^{\mu}\log^{3} T).
\]
Of course x^μ = O(x) here. Thus if T ≤ x we merely get
\[
\psi(x) = x + O\!\left(x\log^{3} x\left\{\frac{1}{T} + x^{\mu - 1}\right\}\right).
\]
We now choose
\[
T = \exp\{\sqrt{\log x}\},
\]
whence
\[
\psi(x) = x + O\!\left(x(\log x)^{3}\exp\left\{-\min\left(1, \frac{c}{2}\right)\sqrt{\log x}\right\}\right).
\]
We may therefore choose any positive constant c₀ < min(1, c/2) in Theorem 15.
This establishes (16). To prove the remaining assertion, it suffices to insert
(16) into Lemma 3.
Finally we should stress that the success of this argument depends on
being able to take μ < 1, since there is an error term which is essentially of
order x^μ. Thus it is crucial that we should at least know that ζ(1 + it) ≠ 0.
If we assume the Riemann Hypothesis, then we may take any μ > ½ in
the above analysis. This leads to the following estimates.

Theorem 16 On the Riemann Hypothesis we have
\[
\psi(x) = x + O(x^{\theta})
\]
and
\[
\pi(x) = \operatorname{Li}(x) + O(x^{\theta})
\]
for any θ > ½ and all x ≥ 2.

One cannot reduce the exponent θ below 1/2, since there is a genuine contri-
bution to ψ(x) arising from the zeros of ζ(s).

10 Explicit Formulae
In this section we shall argue somewhat informally, and present results with-
out proof.
If f : (0, ∞) → C we define the Mellin transform of f to be the function
\[
F(s) := \int_0^{\infty} f(x)\,x^{s-1}\,dx.
\]
By a suitable change of variables one sees that this is essentially a form of
Fourier transform. Indeed all the properties of Mellin transforms can readily
be translated from standard results about Fourier transforms. In particular,
under suitable conditions one has an inversion formula
\[
f(x) = \frac{1}{2\pi i}\int_{\sigma - i\infty}^{\sigma + i\infty}F(s)\,x^{-s}\,ds.
\]
Arguing purely formally one then has
\[
\sum_{n=2}^{\infty}\Lambda(n)f(n) = \sum_{n=2}^{\infty}\Lambda(n)\,\frac{1}{2\pi i}\int_{2-i\infty}^{2+i\infty}F(s)\,n^{-s}\,ds
= \frac{1}{2\pi i}\int_{2-i\infty}^{2+i\infty}\left\{-\frac{\zeta'}{\zeta}(s)\right\}F(s)\,ds.
\]
If one now moves the line of integration to ℜ(s) = −N one passes poles at
s = 1 and at s = ρ for every non-trivial zero ρ, as well as at the trivial zeros
−2n. Under suitable conditions the integral along ℜ(s) = −N will tend to 0
as N → ∞. This argument leads to the following result.

Theorem 17 Suppose that f ∈ C²(0, ∞) and that supp(f) ⊆ [1, X] for
some X. Then
\[
\sum_{n=2}^{\infty}\Lambda(n)f(n) = F(1) - \sum_{\rho}F(\rho) - \sum_{n=1}^{\infty}F(-2n).
\]

One can prove such results subject to weaker conditions on f. If x is
given, and
\[
f(t) = \begin{cases}1, & t \le x,\\ 0, & t > x,\end{cases}
\]
then the conditions above are certainly not satisfied, but we have the follow-
ing related result.

Theorem 18 (The Explicit Formula.) Let x ≥ T ≥ 2. Then
\[
\psi(x) = x - \sum_{\rho:\,|\gamma| \le T}\frac{x^{\rho}}{\rho} + O\!\left(\frac{x\log^{2} x}{T}\right).
\]

For a proof of this see Davenport [3, Chapter 17], for example. There are
variants of this result containing a sum over all zeros, and with no error term,
but the above is usually more useful.
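The effect of truncating the sum over zeros can be seen in a small experiment. The sketch below (illustrative; x = 1000.5 and the use of 50 zeros are arbitrary choices, and the bounded terms of the full classical formula are simply ignored) compares ψ(x), computed from the definition (14), with x − Σ x^ρ/ρ taken over the zeros supplied by mpmath, conjugate pairs being combined.

```python
import math
from mpmath import mp, zetazero, mpf

mp.dps = 20
x = 1000.5      # half an odd integer, so no prime power sits at x itself
K = 50          # number of zeros in the upper half-plane to use

# psi(x) directly from (14).
psi = 0.0
for p in range(2, int(x) + 1):
    if all(p % d for d in range(2, int(p ** 0.5) + 1)):
        q = p
        while q <= x:
            psi += math.log(p)
            q *= p

# Truncated explicit formula: the zero rho = 1/2 + i*gamma and its conjugate
# together contribute 2*Re(x^rho / rho).
approx = mpf(x)
for k in range(1, K + 1):
    rho = zetazero(k)
    approx -= 2 * (mpf(x) ** rho / rho).real

print("psi(x) from the definition:", psi)
print("truncated explicit formula:", approx)
```

Increasing K sharpens the agreement, as the error term x log²x/T in Theorem 18 suggests.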
The explicit formula shows exactly how the zeros influence the behaviour
of ψ(x), and hence of π(x). The connection between zeros and primes is even
more clearly shown by the following result of Landau.

Theorem 19 For fixed positive real x define Λ(x) = 0 if x ∉ N and Λ(x) =
Λ(n) if x = n ∈ N. Then
\[
\Lambda(x) = -\frac{2\pi}{T}\sum_{\rho:\,0 < \gamma \le T}x^{\rho} + O_x\!\left(\frac{\log T}{T}\right),
\]
where O_x(...) indicates that the implied constant may depend on x.

This result shows that the zeros precisely determine the primes. Thus, for
example, one can reformulate the conjecture (1) as a statement about the
zeros of the zeta-function. All the unevenness of the primes, for example
the behaviour described by Theorem 5, is encoded in the zeros of the zeta-
function. It therefore seems reasonable to expect that the zeros themselves
should have corresponding unevenness.

11 Dirichlet Characters
We now turn to the simplest type of generalization of the Riemann Zeta-
function, namely the Dirichlet L-functions. In the remainder of these notes
we shall omit most of the proofs, being content merely to describe what can
be proved.

A straightforward example of a Dirichlet L-function is provided by the
infinite series
\[
1 - \frac{1}{3^{s}} + \frac{1}{5^{s}} - \frac{1}{7^{s}} + \frac{1}{9^{s}} - \frac{1}{11^{s}} + \cdots. \tag{17}
\]
We first need to describe the coefficients which arise.

Definition Let q ∈ N. A (Dirichlet) character χ to modulus q is a
function χ : Z → C such that

(i) χ(mn) = χ(m)χ(n) for all m, n ∈ Z;

(ii) χ(n) has period q;

(iii) χ(n) = 0 whenever (n, q) ≠ 1; and

(iv) χ(1) = 1.

Part (iv) of the definition is necessary merely to rule out the possibility that
χ is identically zero.
As an example we can take the function
\[
\chi(n) = \begin{cases}1, & n \equiv 1 \pmod 4,\\ -1, & n \equiv 3 \pmod 4,\\ 0, & n \equiv 0 \pmod 2.\end{cases} \tag{18}
\]
This is a character modulo 4, and is the one generating the series (17). A
second example is the function
\[
\chi_0(n) := \begin{cases}1, & (n, q) = 1,\\ 0, & (n, q) \ne 1.\end{cases}
\]
This produces a character for every modulus q, known as the principal char-
acter modulo q.
A number of key facts are gathered together in the following theorem.

Theorem 20 (i) We have |χ(n)| = 1 for every n coprime to q.

(ii) If χ₁ and χ₂ are two characters to modulus q, then so is χ₁χ₂, where
we define χ₁χ₂(n) = χ₁(n)χ₂(n).

(iii) There are exactly φ(q) different characters to modulus q.

(iv) If n ≢ 1 (mod q) then there is at least one character χ modulo q for
which χ(n) ≠ 1.
In part (iii) the function φ(q) is the number of positive integers n ≤ q for
which n and q are coprime.
To prove part (i) we note that the sequence n^k (mod q) must eventually
repeat when k runs through N. Thus there exist k < j with χ(n^k) = χ(n^j),
and hence χ(n)^k = χ(n)^j. Since n is coprime to q we have χ(n) ≠ 0, so that
χ(n)^{j-k} = 1.
Part (ii) of the theorem is obvious, but parts (iii) and (iv) are harder,
and we refer the reader to Davenport [3, Chapter 4] for the details. As
an example of (iii) we note that φ(4) = 2, and we have already found two
characters modulo 4. There are no others.
One further fact may elucidate the situation. Consider a general finite
abelian group G. In our case we will have G = (Z/qZ)^×. Define Ĝ to be
the group of homomorphisms θ : G → C^×, where the group operation is given by
(θ₁θ₂)(g) := θ₁(g)θ₂(g). In our case these homomorphisms are, in effect, the
characters. Then the groups G and Ĝ are isomorphic, and part (iii) above is
an immediate consequence. The details can be found in Ireland and Rosen
[7, pages 253 and 254], for example. Indeed there is a duality between G and
Ĝ. The isomorphism between them is not natural, but there is a natural
isomorphism
\[
G \simeq \widehat{\widehat{G}}.
\]
There are two orthogonality relations satisfied by the characters to a given
modulus q. The first of these is the following.

Theorem 21 If a and q are coprime then
\[
S := \sum_{\chi\,(\mathrm{mod}\;q)}\chi(n)\overline{\chi}(a) = \begin{cases}\varphi(q), & n \equiv a \pmod q,\\ 0, & n \not\equiv a \pmod q.\end{cases}
\]

When n ≡ a (mod q) this is immediate, since then χ(n)χ̄(a) = 1 for all χ.
In the remaining case, choose an element b with ab ≡ 1 (mod q). By (iv) of
Theorem 20 there exists a character χ₁ such that χ₁(nb) ≠ 1. Then
\[
\chi_1(nb)S = \sum_{\chi\,(\mathrm{mod}\;q)}\chi_1(n)\chi(n)\,\chi_1(b)\overline{\chi}(a).
\]
However
\[
\chi_1(b)\chi_1(a) = \chi_1(ab) = \chi_1(1) = 1,
\]
whence χ₁(b) = χ̄₁(a). We therefore deduce that
\[
\chi_1(nb)S = \sum_{\chi\,(\mathrm{mod}\;q)}\chi_1(n)\chi(n)\,\overline{\chi_1}(a)\overline{\chi}(a)
= \sum_{\chi\,(\mathrm{mod}\;q)}\chi_1\chi(n)\,\overline{\chi_1\chi}(a).
\]
As χ runs over the complete set of characters to modulus q the product χ₁χ
does as well, since χ₁χ = χ₁χ' implies χ = χ'. Thus
\[
\sum_{\chi\,(\mathrm{mod}\;q)}\chi_1\chi(n)\,\overline{\chi_1\chi}(a) = S,
\]
and hence χ₁(nb)S = S. Since χ₁(nb) ≠ 1 we deduce that S = 0, as required.


The second orthogonality relation is the following.

Theorem 22 If χ ≠ χ₀ then Σ_{n=1}^{q} χ(n) = 0.

The proof is analogous to the previous result, and is based on the obvious
fact that if χ ≠ χ₀ then there is some integer n coprime to q such that
χ(n) ≠ 1. The details are left as an exercise for the reader.
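For a prime modulus everything can be made completely explicit, since (Z/qZ)^× is then cyclic: if g is a generator, every character has the form χ_j(g^k) = e^{2πijk/(q−1)}. The sketch below (illustrative, with q = 7) constructs all q − 1 characters in this way and verifies the orthogonality relation of Theorem 21.

```python
import cmath

q = 7   # a prime modulus, so (Z/qZ)* is cyclic of order q - 1

def order(a):
    k, x = 1, a % q
    while x != 1:
        x, k = x * a % q, k + 1
    return k

g = next(a for a in range(2, q) if order(a) == q - 1)     # a generator of (Z/qZ)*
ind = {pow(g, k, q): k for k in range(q - 1)}              # discrete logarithm table

def chi(j, n):
    # The j-th character: chi_j(g^k) = exp(2*pi*i*j*k/(q-1)); zero off the units.
    if n % q == 0:
        return 0
    return cmath.exp(2j * cmath.pi * j * ind[n % q] / (q - 1))

a = 3
for n in range(1, q):
    S = sum(chi(j, n) * chi(j, a).conjugate() for j in range(q - 1))
    print(n, round(S.real, 10), round(S.imag, 10))   # phi(q) = q - 1 when n = a, else 0
```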
If q has a factor r and χ is a character modulo r we can define the
character χ' modulo q which is induced by χ. This is done by setting
\[
\chi'(n) = \begin{cases}\chi(n), & (n, q) = 1,\\ 0, & (n, q) \ne 1.\end{cases}
\]
For example, we may take χ to be the character modulo 4 given by (18).
Then if q = 12 we induce a character χ' modulo 12, as in the following table.

n       1  2  3  4  5  6  7  8  9 10 11 12
χ(n)    1  0 -1  0  1  0 -1  0  1  0 -1  0
χ'(n)   1  0  0  0  1  0 -1  0  0  0 -1  0

A character χ (mod q) which cannot be produced this way from some
divisor r < q is said to be primitive. The principal character is induced by
the character to modulus 1, that is to say by the character which is identically
1. If q is prime, then all the characters except for the principal character are
primitive. In general there will be both primitive and imprimitive characters
to each modulus. Imprimitive characters are a real nuisance!!

12 Dirichlet L-functions

For any character χ to modulus q we will define the corresponding Dirichlet
L-function by setting
\[
L(s, \chi) = \sum_{n=1}^{\infty}\frac{\chi(n)}{n^{s}}, \qquad (\sigma > 1).
\]

We content ourselves here with describing the key features of these func-
tions, and refer the reader to Davenport [3], for example, for details.
The sum is absolutely convergent for σ > 1 and is locally uniformly
convergent, so that L(s, χ) is holomorphic in this region. If χ is the principal
character modulo q then the series fails to converge when σ ≤ 1. However
for non-principal χ the series is conditionally convergent when σ > 0, and
the series defines a holomorphic function in this larger region.
There is an Euler product identity
\[
L(s, \chi) = \prod_{p}\left(1 - \chi(p)p^{-s}\right)^{-1}, \qquad (\sigma > 1).
\]
This can be proved in the same way as for ζ(s), using the multiplicativity of
the function χ.
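As with ζ(s), the identity is easy to observe numerically. The sketch below (illustrative; s = 2 and the truncation points are arbitrary) uses the character modulo 4 of (18) and compares a partial sum of the series with a finite Euler product.

```python
def chi4(n):
    # The non-principal character modulo 4, as in (18).
    return {0: 0, 1: 1, 2: 0, 3: -1}[n % 4]

def primes_up_to(n):
    sieve = [True] * (n + 1)
    sieve[0] = sieve[1] = False
    for p in range(2, int(n ** 0.5) + 1):
        if sieve[p]:
            sieve[p * p :: p] = [False] * len(sieve[p * p :: p])
    return [p for p in range(2, n + 1) if sieve[p]]

s = 2.0
partial_sum = sum(chi4(n) / n ** s for n in range(1, 10 ** 6 + 1))
finite_product = 1.0
for p in primes_up_to(10 ** 5):
    finite_product *= 1.0 / (1.0 - chi4(p) / p ** s)

print(partial_sum, finite_product)   # both approximate L(2, chi) = 0.91596...
```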
Suppose that χ is primitive, and that χ(−1) = 1. If we apply the Poisson
summation formula to
\[
f(x) = e^{-\pi(a + qx)^{2} v/q},
\]
multiply the result by χ(a), and sum for 1 ≤ a ≤ q, we find that
\[
\theta(v, \chi) = \frac{\tau(\chi)}{\sqrt{qv}}\,\theta\!\left(\frac{1}{v}, \overline{\chi}\right),
\]
where
\[
\theta(v, \chi) := \sum_{n=-\infty}^{\infty}\chi(n)e^{-\pi n^{2} v/q}
\]
is a generalisation of the theta-function, and
\[
\tau(\chi) := \sum_{a=1}^{q}\chi(a)e^{2\pi i a/q}
\]
is the Gauss sum.


When χ is primitive and χ(−1) = −1 the function θ(v, χ) vanishes iden-
tically. Instead we use
\[
\theta_1(v, \chi) := \sum_{n=-\infty}^{\infty}n\chi(n)e^{-\pi n^{2} v/q},
\]
for which one finds the analogous transformation formula
\[
\theta_1(v, \chi) = \frac{-i\,\tau(\chi)}{\sqrt{q}\,v^{3/2}}\,\theta_1\!\left(\frac{1}{v}, \overline{\chi}\right).
\]

These two transformation formulae then lead to the analytic continuation
and functional equation for L(s, χ). The conclusion is that, if χ is primitive,
then L(s, χ) has an analytic continuation to the whole complex plane, and
is regular everywhere, except when χ is identically 1 (in which case L(s, χ)
is just the Riemann Zeta-function ζ(s)). Moreover, still assuming that χ is
primitive, with modulus q, we set
\[
\xi(s, \chi) = \left(\frac{q}{\pi}\right)^{(s+a)/2}\Gamma\!\left(\frac{s+a}{2}\right)L(s, \chi),
\]
where
\[
a = a(\chi) := \begin{cases}0, & \chi(-1) = 1,\\ 1, & \chi(-1) = -1.\end{cases}
\]
Then
\[
\xi(1 - s, \overline{\chi}) = \frac{i^{a} q^{1/2}}{\tau(\chi)}\,\xi(s, \chi).
\]
Notice in particular that, unless the values taken by χ are all real, this
functional equation relates L(s, χ) not to the same function at 1 − s but to
the conjugate L-function, with character χ̄.
It follows from the Euler product and the functional equation that there
are no zeros of ξ(s, χ) outside the critical strip. The zeros will be symmet-
rically distributed about the critical line σ = 1/2, but unless χ is real they
will not necessarily be symmetric about the real line. Hence in general it is
appropriate to define
\[
N(T, \chi) := \#\{\rho : \xi(\rho, \chi) = 0,\; |\gamma| \le T\},
\]
counting zeros both above and below the real axis. We then have
\[
\frac{1}{2}N(T, \chi) = \frac{T}{2\pi}\log\frac{qT}{2\pi} - \frac{T}{2\pi} + O(\log qT)
\]
for T ≥ 2, which can be seen as an analogue of the Riemann–von Mangoldt
formula. This shows in particular that the interval [T, T + 1] contains around
\[
\frac{1}{2\pi}\log\frac{qT}{2\pi}
\]
zeros, on average.
The work on zero-free regions can be generalized, but there are
serious problems with possible zeros on the real axis. Thus one can show
that there is a constant c > 0, which is independent of q, such that if T ≥ 2
then L(s, χ) has no zeros in the region
\[
\sigma \ge 1 - \frac{c}{\log qT}, \qquad 0 < |t| \le T.
\]
If χ is not a real-valued character then we can extend this result to the case
t = 0, but it is a significant open problem to deal with the case in which χ
is real. However in many other important respects the techniques used for the
Riemann Zeta-function can be successfully generalized to handle Dirichlet
L-functions.

References
[1] R.C. Baker, G. Harman, and J. Pintz, The difference between consecu-
tive primes. II., Proc. London Math. Soc. (3), 83 (2001), 532-562.

[2] J.-R. Chen, On the representation of a large even integer as a sum of a
prime and a product of at most two primes, Kexue Tongbao, 17 (1966),
385-386.

[3] H. Davenport, Multiplicative number theory, Graduate Texts in Mathe-


matics, 74. (Springer-Verlag, New York-Berlin, 1980).

[4] J.B. Friedlander and H. Iwaniec, The polynomial X² + Y⁴ captures its
primes, Annals of Math. (2), 148 (1998), 945-1040.

[5] G.H. Hardy and E.M. Wright, An introduction to the theory of numbers,
(Oxford University Press, New York, 1979).

[6] A.E. Ingham, On the difference between consecutive primes, Quart. J.


Math. Oxford, 8 (1937), 255-266.

[7] K. Ireland and M. Rosen, A classical introduction to modern number


theory, Graduate Texts in Math., 84, (Springer, Heidelberg-New York,
1990).

[8] H. Iwaniec, Almost-primes represented by quadratic polynomials, In-


vent. Math., 47 (1978), 171-188.

[9] H. Maier, Primes in short intervals, Michigan Math. J., 32 (1985), 221-
225.

[10] H. Maier, Small differences between prime numbers, Michigan Math. J.,
35 (1988), 323-344.

[11] I.I. Piatetski-Shapiro, On the distribution of prime numbers in sequences


of the form [f (n)], Mat. Sbornik N.S., 33(75) (1953), 559-566.

[12] H. Rademacher, Topics in analytic number theory, Grundlehren math.
Wiss., 169, (Springer, New York-Heidelberg, 1973).

[13] R.A. Rankin, The difference between consecutive prime numbers, J.


London Math. Soc., 13 (1938), 242-247.

[14] J. Rivat and J. Wu, Prime numbers of the form [n^c], Glasg. Math. J.,
43 (2001), 237-254.

[15] A. Selberg, On the normal density of primes in small intervals, and


the difference between consecutive primes, Arch. Math. Naturvid., 47,
(1943), 87-105.

[16] D.K.L. Shiu, Strings of congruent primes, J. London Math. Soc. (2), 61
(2000), 359-373.

Mathematical Institute,
24-29, St. Giles,
Oxford OX1 3LB

[email protected]
