

THE PRIME NUMBER THEOREM AND THE RIEMANN HYPOTHESIS

PETE L. CLARK

1. Some history of the prime number theorem

Recall we have defined, for positive real x, π(x) = #{primes p ≤ x}. The following is probably the single most important result in number theory.

Theorem 1. (Prime Number Theorem) We have

    lim_{x→∞} π(x) log x / x = 1;

i.e., π(x) ∼ x / log x.

1.1. Gauss at 15. The prime number theorem (affectionately called PNT) was apparently first conjectured in the late 18th century, by Legendre and Gauss (independently). In particular, Gauss conjectured an equivalent but more appealing form of the PNT in 1792, at the age of 15 (!!!). Namely, he looked at the frequency of primes in intervals of length 1000:

    Δ(x) := (π(x) − π(x − 1000)) / 1000.

Computing by hand, Gauss observed that Δ(x) seemed to tend to 0, although very slowly. To see how slowly, he computed the reciprocal and found

    1/Δ(x) ≈ log x,  meaning that  Δ(x) ≈ 1/log x.

Evidently the 15-year-old Gauss knew both differential and integral calculus, because he realized that Δ(x) is the slope of a secant line to the graph of y = π(x). When x is large, this suggests that the slope of the tangent line to π(x) is close to 1/log x, and hence he guessed that the function

    Li(x) := ∫₂ˣ dt / log t

was a good approximation to π(x).

Proposition 2. We have

    lim_{x→∞} Li(x) log x / x = 1;

i.e., Li(x) ∼ x / log x.
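Gauss's comparison is easy to reproduce today. The following sketch (the helper names primes_up_to and Li are our own, not standard library routines) counts primes with a sieve and approximates Li(x) by simple numerical integration:

```python
# Reproducing Gauss's comparison numerically (a sketch; helper names are our own).
from math import log

def primes_up_to(n):
    """Sieve of Eratosthenes: a 0/1 table of primality for 0..n."""
    sieve = bytearray([1]) * (n + 1)
    sieve[0] = sieve[1] = 0
    for p in range(2, int(n ** 0.5) + 1):
        if sieve[p]:
            sieve[p * p :: p] = bytearray(len(range(p * p, n + 1, p)))
    return sieve

def Li(x, steps=1_000_000):
    """Trapezoidal approximation to Li(x) = integral from 2 to x of dt / log t."""
    h = (x - 2) / steps
    total = 0.5 * (1 / log(2) + 1 / log(x))
    for k in range(1, steps):
        total += 1 / log(2 + k * h)
    return total * h

x = 10 ** 6
pi_x = sum(primes_up_to(x))      # pi(10^6) = 78498
li_x = Li(x)
print(pi_x, x / log(x), li_x)    # Li(x) is visibly the better approximation
```

At x = 10⁶ one finds π(x) = 78498, while x/log x ≈ 72382 and Li(x) ≈ 78627: the logarithmic integral misses by roughly a hundred, the cruder estimate by thousands.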

Proof: A calculus exercise (L'Hôpital's rule!).

Thus PNT is equivalent to π(x) ∼ Li(x). The function Li(x), called the logarithmic integral, is not elementary, but it has a simple enough power series expansion (see for yourself). Nowadays we have lots of data, and one can see that the error |π(x) − Li(x)| is in general much smaller than |π(x) − x/log x|, so the logarithmic integral gives a better asymptotic approximation. (How good? Read on.)

1.2. A partial result. As far as I know, there was no real progress for more than fifty years, until the Russian mathematician Pafnuty Chebyshev proved the following two impressive results.

Theorem 3. (Chebyshev, 1848, 1850)
a) There exist explicitly computable positive constants C₁, C₂ such that for all x,

    C₁ x / log x < π(x) < C₂ x / log x.

b) If lim_{x→∞} π(x) / (x / log x) exists, it necessarily equals 1.

Remarks:
(i) For instance, one version of the proof gives C₁ = 0.92 and C₂ = 1.7. (But I don't know what values Chebyshev himself derived.)

(ii) The first part shows that π(x) is of order of magnitude x/log x, and the second shows that if π(x) is regular enough to have an asymptotic value at all, then it must be asymptotic to x/log x. Thus the additional trouble in proving PNT is establishing this regularity in the distribution of the primes, a quite subtle matter. (We have seen that other arithmetical functions, like φ and d, are far less regular than this: their upper and lower orders differ by more than a multiplicative constant. So the fact that this regularity should exist for π(x) is by no means assured.)
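The constants in remark (i) are easy to test numerically. A sketch (prime_counts is our own helper; we start the check at x = 11 since the inequalities need a moment to take hold at the very smallest x):

```python
# Checking 0.92 x/log x < pi(x) < 1.7 x/log x over a modest range (a sketch).
from math import log

def prime_counts(n):
    """Return a table c with c[x] = pi(x) for 0 <= x <= n."""
    sieve = bytearray([1]) * (n + 1)
    sieve[0] = sieve[1] = 0
    for p in range(2, int(n ** 0.5) + 1):
        if sieve[p]:
            sieve[p * p :: p] = bytearray(len(range(p * p, n + 1, p)))
    counts, running = [0] * (n + 1), 0
    for x in range(n + 1):
        running += sieve[x]
        counts[x] = running
    return counts

pi = prime_counts(10 ** 5)
ok = all(0.92 * x / log(x) < pi[x] < 1.7 * x / log(x)
         for x in range(11, 10 ** 5 + 1))
print(ok)   # True over this range
```

Of course a finite check proves nothing; Chebyshev's achievement was to establish such bounds for all x.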

(iii) Chebyshev's proof is quite elementary: it uses less machinery than some of the other topics in this course. However, we will not give the time to prove it here: blame it on your instructor's failure to understand the proof.

1.3. A complex approach. The next step was taken by Riemann in 1859. We have seen the Riemann zeta function

    ζ(s) = Σ_{n=1}^∞ 1/nˢ = Π_p (1 − p⁻ˢ)⁻¹

and its relation to the primes (e.g., the product factorization yields an instantaneous proof that π(x) → ∞). However, Riemann considered ζ(s) as a function of a complex variable: s = σ + it (indeed, he used these rather strange names for the real and imaginary parts in his 1859 paper, and we have kept them ever since), so

    n⁻ˢ = n^{−σ−it} = n^{−σ} · n^{−it}.

Here n^{−σ} is a positive real number and n^{−it} = e^{−i(log n)t} is a point on the unit circle, so in modulus we have |n⁻ˢ| = n^{−σ}. From this we get that ζ(s) is absolutely convergent for σ = ℜ(s) > 1. Using standard results from analysis, one sees that it indeed defines an analytic function in the half-plane σ > 1.

Riemann got the zeta function named after him by observing the following:

Fact: ζ(s) extends (meromorphically) to the entire complex plane and is analytic everywhere except for a simple pole at s = 1.

We recall in passing, for those with some familiarity with complex variable theory, that the extension of an analytic function defined in one (connected) domain in the complex plane to a larger (connected) domain is unique if it exists at all: this is the principle of analytic continuation. So the extended zeta function is well-defined. The continuation can be shown to exist via an integral representation valid for σ > 0 and a functional equation relating the values of ζ(s) to those of ζ(1 − s). (Note that the line σ = 1/2 is fixed under the map s ↦ 1 − s.)

Riemann conjectured, but could not prove, certain simple (to state!) analytic properties of ζ(s), which he saw had profound implications for the distribution of the primes.

1.4. A nonvanishing theorem. Nevertheless, it is a testament to the difficulty of the subject that even after this epochal paper the proof of PNT did not come for almost another 40 years: in 1896, Jacques Hadamard and Charles de la Vallée-Poussin proved PNT, independently, but by rather similar methods. The key point in both of their proofs (which Riemann could not establish) was that ζ(s) ≠ 0 for any s = 1 + it, i.e., along the line with σ = 1. Their proof does come with an explicit error estimate, albeit an ugly one: they showed in fact

Theorem 4. There exist positive constants C and a such that

    |π(x) − Li(x)| ≤ C x e^{−a√(log x)}.

It is not completely obvious that this is indeed an error bound, i.e., that

    lim_{x→∞} x e^{−a√(log x)} / Li(x) = 0;

this is left as another calculus exercise.

1.5. An elementary proof is prized. Much was made of the fact that the proof of PNT, a theorem of number theory, used nontrivial results from complex analysis (which by the end of the 19th century had been developed to a large degree of sophistication). Many people speculated on the existence of an "elementary proof," a yearning that to my knowledge was never formalized precisely. Roughly speaking, it means a proof that uses no extraneous concepts from higher analysis (such as complex analytic functions) but only the notion of a limit and the definition of a prime. It thus caused quite a stir when Atle Selberg and Paul Erdős (not independently, but not quite collaboratively either; the story is a controversial one!) gave what all agreed to be an elementary proof of PNT in 1949. In 1950 Selberg (but not Erdős) received the Fields Medal.

It seems fair to say that in recent times the excitement about the elementary proof has dimmed: most experts agree that it is less illuminating and less natural than the proof via Riemann's zeta function. Moreover, the elementary proof remains quite intricate: ironically, more so than the analytic proof, at least for those with some familiarity with functions of a complex variable. For those who do not, the time taken to learn some complex analysis and then the proof of Hadamard and de la Vallée-Poussin will be somewhat longer but ultimately more profitable than the time spent digesting the elementary proof!¹

1.6. Equivalents of PNT. It turns out that there are many statements which are "equivalent" to PNT: i.e., for which it is much easier to show that they imply and are implied by PNT than to prove them outright. A useful one is

Theorem 5. Let pₙ be the nth prime. Then pₙ ∼ n log n.

Note that this result implies (by the integral test) that Σ_{p ≤ n} 1/p ∼ log log n; strangely, this consequence is much easier to prove than PNT itself.
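One can watch the ratio pₙ / (n log n) numerically (a sketch; primes_up_to is our own helper):

```python
# How fast does p_n / (n log n) approach 1? (a sketch; convergence is slow)
from math import log

def primes_up_to(n):
    """Sieve of Eratosthenes; returns the list of primes <= n."""
    sieve = bytearray([1]) * (n + 1)
    sieve[0] = sieve[1] = 0
    for p in range(2, int(n ** 0.5) + 1):
        if sieve[p]:
            sieve[p * p :: p] = bytearray(len(range(p * p, n + 1, p)))
    return [i for i in range(n + 1) if sieve[i]]

primes = primes_up_to(2 * 10 ** 6)     # plenty for the 100000th prime
for n in (100, 10 ** 4, 10 ** 5):
    print(n, primes[n - 1], primes[n - 1] / (n * log(n)))
```

The ratios come out near 1.17, 1.14, 1.13: the convergence is real but glacial, reflecting lower-order terms that n log n alone does not capture.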

Far more intriguing is that PNT is equivalent to an asymptotic formula for the average value of the Möbius function:

Theorem 6.

    lim_{N→∞} (1/N) Σ_{n=1}^N μ(n) = 0.

Recall that the Möbius function μ(n) is 0 if n is not squarefree (which we know occurs with density 1 − 6/π²) and is (−1)ʳ if n is a product of r distinct primes. We also saw that the set of all positive integers divisible by only a bounded number, say k, of primes has density zero, so most integers 1 ≤ n ≤ N are divisible by lots of primes, and by adding up the values of μ we are recording +1 if this large number is even and −1 if this large number is odd. It is very tempting to view this parity as being essentially random, similar to what would happen if we flipped a coin for each (squarefree) n and gave ourselves +1 if we got heads and −1 if we got tails.

With this randomness idea planted in our mind, the above theorem seems to assert that if we flip a large number N of coins then (with large probability) the number of heads minus the number of tails is small compared to the total number of coin flips. But now it seems absolutely crazy that this result is equivalent to PNT, since under the (as yet completely unjustified) assumption of randomness it is far too weak: doesn't probability theory tell us that the running total of heads minus tails will likely be on the order of the square root of the number of coin flips? Almost, but not quite. And is this probabilistic model justified? Well, that is the $1 million question.

2. Coin-Flipping and the Riemann Hypothesis

Let us define the Mertens function

    M(N) = Σ_{n=1}^N μ(n).
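Theorem 6 can be watched in action. The sketch below (mobius_up_to is our own helper) computes μ by a sign-flipping sieve: flip the sign once for each prime factor, then zero out every n divisible by a square:

```python
# Watching the average of the Mobius function tend to 0 (a sketch).
def mobius_up_to(n):
    """mu(k) for 0 <= k <= n: one sign flip per prime factor, 0 if p^2 | k."""
    mu = [1] * (n + 1)
    is_prime = bytearray([1]) * (n + 1)
    for p in range(2, n + 1):
        if is_prime[p]:
            for j in range(2 * p, n + 1, p):
                is_prime[j] = 0
            for j in range(p, n + 1, p):
                mu[j] = -mu[j]
            for j in range(p * p, n + 1, p * p):
                mu[j] = 0          # a later sign flip leaves 0 unchanged
    return mu

N = 10 ** 5
mu = mobius_up_to(N)
M = sum(mu[1 : N + 1])
print(M, M / N)   # |M(N)| turns out to be tiny compared to N
```

Here |M(10⁵)| is in the dozens, so the average M(N)/N is already well below 10⁻³, consistent with (but of course not proving) the theorem.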

The goal of this lecture is to discuss the following seemingly innocuous question.

Question 1. What is the upper order of M(N)?
¹Nevertheless Selberg became one of the great analytic number theorists of the 20th century: it turned out that the elementary proof of PNT was among his more minor work.

Among other incentives for studying this question there is a large financial one: if the answer is close to what we think it is, then proving it will earn you $1 million!

Recall that μ(n) takes on only the values ±1 and 0, so the trivial bound is |M(N)| ≤ N. In fact we can do better, since we know that μ(n) = 0 iff n is not squarefree, and we know, asymptotically, how often this happens. This leads to an asymptotic expression for the absolute sum:

    Σ_{n=1}^N |μ(n)| = #{squarefree n ≤ N} ∼ (6/π²) N.
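The density 6/π² is easy to check empirically (a sketch; squarefree_count is our own helper, which simply crosses out multiples of q² for every q ≥ 2):

```python
# Counting squarefree n <= N against the predicted density 6/pi^2 (a sketch).
from math import pi

def squarefree_count(n):
    """Cross out multiples of q^2 for q = 2, 3, ...; count what survives."""
    free = bytearray([1]) * (n + 1)
    q = 2
    while q * q <= n:
        # crossing out for composite q is redundant (their prime factors
        # already did the work) but harmless
        free[q * q :: q * q] = bytearray(len(range(q * q, n + 1, q * q)))
        q += 1
    return sum(free[1:])

N = 10 ** 6
count = squarefree_count(N)
print(count, 6 / pi ** 2 * N)   # the two agree remarkably closely here
```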

However, in the last lecture we asserted that M(N)/N → 0, which we interpreted as saying that the average order of μ is asymptotically 0. Thus the problem is one of cancellation in a series whose terms are sometimes positive and sometimes negative. Stop for a second and recall how much more complicated the theory of conditional convergence of such series is than the theory of convergence of series with positive terms. It turns out that the problem of how much cancellation to expect in a series whose terms are sometimes positive and sometimes negative (or in a complex series in which the arguments of the terms are spread around the unit circle) is an absolutely fundamental one in analysis and number theory. Indeed, in such matters we can draw fundamental inspiration (if not proofs, directly) from probability theory, and to do so, i.e., to make heuristic probabilistic reasoning even in apparently deterministic situations, has been an important theme in modern mathematics ever since the work of Erdős and Kac in the mid-20th century.

But our story starts before the 20th century. In the 1890s Mertens² conjectured:

(MC1) |M(N)| ≤ √N for all sufficiently large N.

This is quite bold. As we have seen, in studying orders of magnitude it is safer to hedge one's bets by at least allowing a multiplicative constant, leading to the weaker

(MC2) |M(N)| ≤ C√N for all N.

The noted Dutch mathematician Stieltjes claimed a proof of (MC2) in 1885. But his proof was never published and was not found among his papers after his death. It would be very interesting to know exactly why Mertens believed in (MC1). He did check the inequality for all N up to N = 10⁴, but this is an amusingly small search by contemporary standards. (In your homework you are asked to fire up your computer to compute many more values than this.) The problem is not as computationally tractable as one might wish, because computing the Möbius function of n requires the factorization of n, which is famously rather hard. Nevertheless, we now know that (MC1) holds for all N ≤ 10¹⁴.
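Mertens's hand computation is now a few lines of code. The sketch below (mobius_up_to is our own sieve-based helper) verifies |M(N)| ≤ √N for every N up to 10⁴:

```python
# Re-doing Mertens's check: is |M(N)| <= sqrt(N) for every N up to 10^4? (a sketch)
def mobius_up_to(n):
    """mu(k) for 0 <= k <= n via a sign-flipping sieve."""
    mu = [1] * (n + 1)
    is_prime = bytearray([1]) * (n + 1)
    for p in range(2, n + 1):
        if is_prime[p]:
            for j in range(2 * p, n + 1, p):
                is_prime[j] = 0
            for j in range(p, n + 1, p):
                mu[j] = -mu[j]
            for j in range(p * p, n + 1, p * p):
                mu[j] = 0
    return mu

LIMIT = 10 ** 4
mu = mobius_up_to(LIMIT)
M, ok = 0, True
for N in range(1, LIMIT + 1):
    M += mu[N]
    if M * M > N:            # i.e. |M(N)| > sqrt(N), avoiding floating point
        ok = False
print(ok)                    # True: (MC1) survives Mertens's own range
```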
²Franz Mertens, 1840–1927

It says something about the difficulty of such questions that, while the mathematical community has viewed (MC1) and (MC2) with dubiousness for some time, (MC1) was disproved only in 1985:

Theorem 7. (te Riele–Odlyzko) There are explicit constants C₁ > 1 and C₂ < −1 such that

    lim sup_{N→∞} M(N)/√N ≥ C₁,    lim inf_{N→∞} M(N)/√N ≤ C₂.

That is to say, each of the inequalities −√N ≤ M(N) and M(N) ≤ √N fails for infinitely many N. Their proof does not supply a concrete value of N for which |M(N)| > √N, but we now know that there is such an N < 10¹⁵⁶. We still do not know whether (MC2) holds (so conceivably Stieltjes was right all along, the victim of some terrible mix-up), although I am about to spin a tale to try to persuade you that (MC2) should be almost, but not quite, true.

But first, what about the million dollars? In the last section we mentioned two interesting equivalents of PNT. The following theorem takes things to another level:

Theorem 8. The following statements (none of which are known!) are equivalent:
a) For all ε > 0, there exists a constant C_ε such that |M(N)| ≤ C_ε N^{1/2 + ε}.
b) |π(x) − Li(x)| ≤ (1/8π) √x log x for all x ≥ 2657.
c) If ζ(s₀) = 0 for some s₀ with real part 0 < ℜ(s₀) < 1, then ℜ(s₀) = 1/2.

We note that the rather esoteric-sounding part c), which refers to the behavior of the zeta function in a region where it is not obvious that it is even defined (but it was first shown by Riemann to be defined there), is the illustrious Riemann Hypothesis. One can see immediately why we care about this weird hypothesis: it is equivalent to a wonderful error bound in the prime number theorem (one which can be shown to be essentially best possible; it is known that the error cannot be bounded by C√x/log x for all x). In 2000 the Clay Math Institute set the Riemann Hypothesis as one of seven $1 million prize problems. If you don't know complex analysis, no problem: just prove part a), that the cancellation in the partial sums of the Möbius function is enough to make the sum less than or equal to a constant times any power of N greater than 1/2.

Note that (MC1) (which is false!) ⟹ (MC2) ⟹ condition a) of the theorem, so in announcing a proof of (MC2) Stieltjes was announcing a stronger result than the Riemann Hypothesis, which did not have a million dollar purse in his day but was no less a mathematical holy grail then than now. (So you can decide how likely it is that Stieltjes's paper got lost in the mail and never found.)

But why should we believe in the Riemann Hypothesis anyway? There is some experimental evidence for it: in any rectangle |t| ≤ T, 0 < σ < 1, the zeta function can have only finitely many zeros (this holds for any function meromorphic on ℂ), so one can find all the zeros up to a given imaginary part, and the fact that all of these zeros lie on the critical line, i.e., have real part 1/2, has been experimentally confirmed in a certain range of t. It is also known that there are infinitely many zeros lying on the critical line (Hardy), and even that a positive proportion of the zeros, as we go up, lie on the critical line (Selberg; as I said, a great mathematician). For various reasons this evidence is rather less than completely convincing.

So let us go back to randomness: suppose μ really were a random variable. What would it do, in all probability? We can consider instead the random walk on the integers, where we start at 0 and at time i step to the right with probability 1/2 and step to the left with probability 1/2. Formally speaking, our walk is given by an infinite sequence {εᵢ}_{i=1}^∞, each εᵢ = ±1. The set of all such sign sequences, {±1}^∞, forms in a natural way a probability space (meaning it has a natural measure; don't worry about the details, just hold on for the ride). Then we define a random variable

    S_N = ε₁ + … + ε_N,

meaning a function that we can evaluate on any sign sequence; it tells us where we end up on the integers after N steps. Now the miracle of modern probability theory is that it makes perfect sense to ask what the lim sup of S_N is. If you happened to catch an undergraduate course in probability theory (good for you...) you will probably remember that S_N should be no larger than √N, more or less. But this seems disappointing, because that is (MC1) (or maybe (MC2)), which feels quite dubious for the partial sums of the Möbius function. But between Mertens's day and ours probability theory grew up, and we now know that √N is not exactly the correct upper bound. Rather, it is given by the following spectacular theorem:

Theorem 9. (Kolmogorov) With probability 1, we have

    lim sup_{N→∞} S_N / √(2N log log N) = 1.

That is to say, if you randomly flip a fair coin N times, then in all probability there will be infinitely many moments in time when your running tally of heads minus tails is larger than any constant times the square root of the number of flips. (Similarly, and symmetrically, the limit infimum is −1.) So true randomness predicts that (MC2) is false. On the other hand, it predicts that the Riemann Hypothesis is true, since for all ε > 0 there exists a constant C_ε such that √(2N log log N) ≤ C_ε N^{1/2 + ε}. So if we believed in the true randomness of μ, we would believe the following

Conjecture 10.

    lim sup_{N→∞} M(N) / √(N log log N) < ∞,
    lim inf_{N→∞} M(N) / √(N log log N) > −∞.
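One can at least watch a sample path of this walk. The sketch below compares the running maximum of |S_n| with √N and with Kolmogorov's iterated-logarithm scale; a single finite run illustrates, but of course proves, nothing about almost-sure limiting statements:

```python
# One sample path of the +-1 random walk (a sketch).
import random
from math import log, sqrt

random.seed(12345)               # any seed; fixed here only for reproducibility
N = 10 ** 6
S, max_abs = 0, 0
for _ in range(N):
    S += random.choice((-1, 1))
    max_abs = max(max_abs, abs(S))

# Typically max |S_n| lands within a small multiple of sqrt(N),
# below the iterated-logarithm scale sqrt(2 N log log N).
print(max_abs, sqrt(N), sqrt(2 * N * log(log(N))))
```

Here √N = 1000 while √(2N log log N) ≈ 2291; the lim sup in Theorem 9 is about infinitely many N along one infinite path, not about any single run.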

Just to make sure: this conjecture is still significantly more precise than the bound |M(N)| ≤ C_ε N^{1/2 + ε} which is equivalent to the Riemann Hypothesis, making it unclear exactly how much we should pay the person who can prove it: $2 million? Or more??

Kolmogorov's law of the iterated logarithm, and hence Conjecture 10, does not seem to be very well known outside of probabilistic circles.³ In searching the literature I found a paper from the 1960s predicting such an iterated-logarithm law for M(N). More recently I have seen another paper suggesting that perhaps it should be log log log N instead of log log N. To be sure, the Möbius function is clearly not random, so one should certainly be provisional in one's beliefs about the precise form of the upper bounds on M(N). The game is really to decide whether the Möbius function is "random enough" to make the Riemann Hypothesis true.

Nevertheless the philosophy expressed here is a surprisingly broad and deep one: whenever one meets a sum S_N of N things, each of absolute value 1 and varying in sign (or in argument, in the case of complex numbers), one wants to know how much cancellation there is, i.e., how far one can improve upon the trivial bound |S_N| ≤ N. The mantra here is that if there is really no extra structure in the summands, i.e., "randomness," then one should expect S_N ≈ √N, more or less! More accurately, the philosophy has two parts, and the part that says |S_N| should be no smaller than √N unless there is hidden structure is an extremely reliable one. An example of hidden structure is a_n = e^{2πin/N}, when in fact

    Σ_{n=1}^N a_n = 0.
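The total cancellation among the Nth roots of unity can be checked directly (a sketch):

```python
# N unit-length summands, yet the sum vanishes: hidden structure (a sketch).
import cmath

N = 1000
total = sum(cmath.exp(2j * cmath.pi * n / N) for n in range(1, N + 1))
print(abs(total))   # essentially zero, up to floating-point rounding
```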

But here we have chosen to sum over all of the Nth roots of unity in the complex plane: a special situation. The second part of the philosophy allows us to hope that S_N is not too much larger than √N. In various contexts, any of C√N, √N log N, N^{1/2 + ε}, or even N^{1 − δ} for some δ > 0 may count as being "not too much larger." So in truth our philosophy of almost-square-root error is a little bit vague. But it can be, and has been, a shining light in a dark place,⁴ and we will see further instances of such illumination.

Acknowledgement: The two lectures on these topics were delivered without formal lecture notes, but only a small cheat sheet. I would have had difficulty recovering what was said were it not for the very clear class notes of Ms.⁵ Diana May.

³I learned about Kolmogorov's theorem from a talk at Harvard given by W. Russell Mann.
⁴When all other lights go out?
⁵Now Dr.
