152 Main
MOOR XU
NOTES FROM A COURSE BY KANNAN SOUNDARARAJAN
Abstract. These notes were taken from math 152 (Elementary Theory of Numbers) taught
by Kannan Soundararajan in Fall 2010 at Stanford University. These notes were live-TEXed
during the lecture in vim and compiled using latexmk. Each lecture gets its own section.
The notes were not edited afterward, so there may be typos; please email corrections to
[email protected].
1. 9/20
Information for the course is available on the website http://math.stanford.edu/~ksound.
There is no required book for the course, but some books are on reserve in the library.
30% from homework, 30% from one midterm, 40% from the final.
We have a CA, and he will also have office hours. Sound’s office is in 383W. His office
hours will be Thursdays, 1-3.
Homeworks will be given Wednesdays and due the following Wednesday.
1.1. Introduction. This course is about number theory, which is the study of properties of
N or Z or Q.
There is some kind of basic theory leading up to quadratic reciprocity. I like to have a
big theorem, and we’ll end up proving that given any arithmetic progression, it contains an
infinite number of primes.
Today we’ll start with primes.
1.2. Primes. One definition that you can make is the definition of an irreducible. We make
a distinction here just for fun.
Definition 1.2.1. A natural number n > 1 is called irreducible if n cannot be written as
n = ab with 1 < a, b < n.
This is what usually one calls a prime number.
That’s actually one way of writing what a prime is; here’s one that is more natural. We
need the concept of divisibility.
Definition 1.2.2. Given two integers a ≠ 0 and b, we say that a | b if b = ac for some integer c.
Definition 1.2.3. A natural number p > 1 is prime if p | ab ⟹ p | a or p | b.
It’s not clear that our two definitions are the same, so we need to prove a theorem.
Theorem 1.2.4. The primes and irreducibles are the same.
Proof. It is true that we can write each number as a product of irreducibles. We can prove
this by induction. That’s the same as factoring a number. We’d like to say that there is
only one way of factoring each number. Let’s prove this first.
Theorem 1.2.5 (Fundamental Theorem of Arithmetic). Every number n is a product of
primes in a unique way.
This needs a proof; it is not an obvious fact. Why? Consider an example.
Example 1.2.6. Consider the even numbers A = {2, 4, 6, 8, . . . }. The irreducibles are 2,
6, 10, 14, 18, 22, etc – the numbers that are not multiples of 4. Not every number can be
written uniquely as a product of irreducibles; for example, 60 = 6 · 10 = 2 · 30. Why is it
true that unique factorization doesn’t hold here? Why do proofs of FTA fail in this case?
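To make the example concrete, here is a minimal sketch that enumerates factorizations inside the even numbers. The function name `even_factorizations` is hypothetical (it is not from the lecture), and the code uses the observation above that the irreducibles of A are exactly the numbers ≡ 2 (mod 4).

```python
def even_factorizations(n, smallest=2):
    """List all ways to write the even number n as a nondecreasing product
    of irreducibles of A = {2, 4, 6, ...}, i.e. of numbers that are 2 mod 4."""
    results = []
    # n itself is irreducible in A exactly when it is not a multiple of 4.
    if n >= smallest and n % 4 == 2:
        results.append([n])
    d = smallest
    while d * d <= n:
        # d must be irreducible in A, and the cofactor must again lie in A.
        if d % 4 == 2 and n % d == 0 and (n // d) % 2 == 0:
            for rest in even_factorizations(n // d, d):
                results.append([d] + rest)
        d += 2
    return results


print(even_factorizations(60))  # → [[2, 30], [6, 10]]
```

The two lists printed for 60 are the two factorizations from the example, so uniqueness really does fail in A.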
There is the division algorithm:
Proposition 1.2.7 (Division algorithm). Given n, a ∈ N, we can find q, r ∈ Z with n =
aq + r, and 0 ≤ r < a.
Does this hold for our example? If we divide 30 by 2, we would get a remainder that is
too large: 30 = 2 · 14 + 2.
There are other contexts when the division algorithm holds, but it’s not always clear.
This suggests that we need to use the division algorithm in our proof. We need to prove
something about the greatest common divisor.
Definition 1.2.8. Given two numbers a, b ∈ Z (not both 0), we say that g ∈ N is the
greatest common divisor of a and b if g|a and g|b, and if it is the largest such number –
no number greater than g divides both a and b.
There is a really nice property of the GCD that will help us.
Theorem 1.2.9. Given a and b (not both 0) with greatest common divisor g, there exist integers x, y such that g = ax + by.
Let’s use this to finish proving our Theorem 1.2.4.
First, we need to show that every prime is irreducible. Suppose not; there is a prime p
that is not irreducible. Then express p = cd with 1 < c, d < p. Then p|cd, which implies
that p|c or p|d (since p is prime), which is a contradiction because c, d < p.
Conversely, we show that every irreducible is prime. Given an irreducible n, and that
n|ab, we want to have that n|a or n|b. Suppose n doesn’t divide a. The GCD of n and a is
1, so Theorem 1.2.9 tells us that 1 = nx + ay, so that b = nbx + aby. Therefore n|b, which
is what we wanted to show. This means that n is prime.
Proof of Theorem 1.2.5. We can now prove Theorem 1.2.5. We need to show uniqueness.
Suppose that there are two factorizations into primes:
\[ n = p_1 \cdots p_r = q_1 \cdots q_s. \]
Now $p_1 \mid p_1 \cdots p_r$, so $p_1 \mid q_1 \cdots q_s$. Since $p_1$ is prime, it divides one of $q_1, \dots, q_s$. But $q_1, \dots, q_s$ are irreducible, so $p_1$ equals one of them. We can then cancel $p_1$ on both sides and continue with $p_2$.
Now we turn to the proof of the theorem for the GCD.
Proof of Theorem 1.2.9. Let S = {ax + by : x, y ∈ Z}. Clearly, 0, a, b ∈ S. Let’s just look
at the positive numbers. By the well-ordering property, there is a smallest number; let s be
the smallest natural number in S.
We claim that every element of S is a multiple of s. This comes from the division algorithm:
for n ∈ S, n = qs + r where 0 ≤ r < s, then r ∈ S. But then r = 0 by the minimality of s.
In particular, s|a and s|b, so s = ax + by is a common divisor of a and b. We need
to show that there are no bigger divisors. Any common divisor of a and b divides s, so
s = gcd(a, b).
Of course, this could also have been proved using the Euclidean Algorithm. Here, we
computed (312, 968) = 8, and 8 = 10 · 968 − 31 · 312.
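As a sketch of the Euclidean Algorithm route, the following routine tracks the coefficients x, y alongside the remainders. The name `extended_gcd` is an illustrative choice, not notation from the lecture.

```python
def extended_gcd(a, b):
    """Return (g, x, y) with g = gcd(a, b) and g == a*x + b*y."""
    old_r, r = a, b
    old_x, x = 1, 0
    old_y, y = 0, 1
    while r != 0:
        q = old_r // r
        # Each remainder stays an integer combination of a and b.
        old_r, r = r, old_r - q * r
        old_x, x = x, old_x - q * x
        old_y, y = y, old_y - q * y
    return old_r, old_x, old_y


g, x, y = extended_gcd(312, 968)
print(g, x, y)  # → 8 -31 10
```

This reproduces the identity 8 = 10 · 968 − 31 · 312 quoted above.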
Now that we know what the primes are, let’s talk about properties of primes.
Theorem 1.2.10 (Euclid). There are infinitely many primes.
We consider several proofs:
Proof. Suppose not, so that $p_1, \dots, p_n$ are all the primes. Then $p_1 \cdots p_n + 1$ is divisible by none of $p_1, \dots, p_n$, so any prime factor of it is a new prime, which is a contradiction.
Proof. If there are few primes, there must also be few natural numbers. But we know the
number of natural numbers. That’s the idea; let’s make this precise.
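Usually, π(x) will denote the number of primes up to x. Assume that there are only k primes, and factorize each number n ≤ x as
\[ n = p_1^{\alpha_1} \cdots p_s^{\alpha_s}. \]
Every number can be written as $n = ab^2$ where a is square-free. This is of course unique. If there are only k primes, there are only $2^k$ square-free numbers. Now,
\[ \sum_{n \le x} 1 = \sum_{\substack{ab^2 \le x \\ a \text{ square-free}}} 1 \le \sqrt{x} \sum_{\substack{a \le x \\ a \text{ square-free}}} 1 \le 2^k \sqrt{x}. \]
The left side is greater than x − 1, so this inequality fails once x is large, which is a contradiction.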
Proof. We have
\[ \log n! = \sum_{1 \le m \le n} \log m. \]
We can write
\[ \log(n+1) = \log n + \log\left(1 + \frac{1}{n}\right) \]
and use a Taylor expansion. So
\[ (n+1)\log(n+1) = (n+1)\log n + \frac{n+1}{n} - \cdots. \]
2. 9/22
Instead of writing down inequalities all the time, we want a more convenient notation to
drop insignificant terms.
Definition 2.0.12. f (x) = O(g(x)) if there is a constant C such that |f (x)| ≤ Cg(x) for all
large x.
Example 2.0.13. For example, $x = O(e^x)$, $\sqrt{x} = O(x)$, $\sin(x) = O(1)$, $\log x = O(x^{0.01})$.
Our previous inequalities for log n! can now be written as
Theorem 2.1.1. As n → ∞,
\[ \sum_{p \le n} \frac{\log p}{p} = \log n + O(1). \]
This gives us
\[ \frac{2^{2n}}{2n+1} \le \binom{2n}{n} \le \prod_{p \le 2n} 2n = (2n)^{\pi(2n)}. \]
This actually tells us that
\[ \pi(2n) \ge \frac{\log \frac{2^{2n}}{2n+1}}{\log(2n)} = (\log 2)\frac{2n}{\log 2n} - \frac{\log(2n+1)}{\log 2n} \ge (\log 2)\frac{2n}{\log 2n} - 2, \]
which gives us a Chebyshev bound. Why did we consider $\binom{2n}{n}$? Ramanujan came up with the proof.
Example 2.2.1 (Chebyshev).
\[ \frac{(30n)!\, n!}{(15n)!\, (10n)!\, (6n)!} \in \mathbb{N}. \]
Corollary 2.2.2.
\[ \pi(x) \ge (\log 2)\frac{x}{\log x} + O(1). \]
We also have the following bound:
\[ 2^{2n} \ge \binom{2n}{n} \ge n^{\pi(2n) - \pi(n)}, \]
so that
\[ \pi(2n) - \pi(n) \le \frac{2n \log 2}{\log n}. \]
As before, if we want, we can replace n by a real number x to get
\[ \pi(2x) - \pi(x) \le (2 \log 2)\frac{x}{\log x} + O(1). \]
This also gives an estimate for π(x) by summing this formula and dividing by 2 at each step.
\[ \pi(x) - \pi(x/2) \le (2 \log 2)\frac{x/2}{\log(x/2)} + O(1). \]
This gives
\[ \pi(x) \le (2 \log 2)\frac{x}{\log x} + O(\log x), \]
yielding the other half of the Chebyshev bound. This is enough (with some tweaking) to prove Bertrand’s Postulate. This uses the fact that if $n \ge 3$ and $\frac{2n}{3} < p \le n$ then p does not divide $\binom{2n}{n}$.
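The two explicit inequalities derived above, π(2n) ≥ (log 2) · 2n/log 2n − 2 and π(2n) − π(n) ≤ 2n log 2/log n, can be checked numerically. This is only a sanity check under the assumption that a simple sieve is an acceptable way to compute π; the names `prime_counts` and `check_chebyshev` are hypothetical.

```python
from math import log


def prime_counts(limit):
    """Return a list pi with pi[m] = number of primes <= m for 0 <= m <= limit."""
    is_prime = [True] * (limit + 1)
    is_prime[0] = False
    if limit >= 1:
        is_prime[1] = False
    for i in range(2, int(limit ** 0.5) + 1):
        if is_prime[i]:
            for j in range(i * i, limit + 1, i):
                is_prime[j] = False
    pi, count = [], 0
    for m in range(limit + 1):
        count += is_prime[m]
        pi.append(count)
    return pi


def check_chebyshev(max_n):
    """Check both Chebyshev-type inequalities for every 2 <= n <= max_n."""
    pi = prime_counts(2 * max_n)
    for n in range(2, max_n + 1):
        lower = log(2) * (2 * n) / log(2 * n) - 2
        upper = 2 * n * log(2) / log(n)
        if not (pi[2 * n] >= lower and pi[2 * n] - pi[n] <= upper):
            return False
    return True


print(prime_counts(100)[100])  # → 25
print(check_chebyshev(500))    # → True
```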
3. 9/27
We will discuss the theory of congruences, leading up to quadratic reciprocity, for the next
few lectures.
3.1. Congruences.
Definition 3.1.1. n > 0, n ∈ N, a, b ∈ Z, a ≡ b (mod n) means that n | (a − b).
It is easy to see that this forms an equivalence relation. This means that it satisfies
(1) a ≡ a (mod n)
(2) a ≡ b (mod n) iff b ≡ a (mod n)
(3) a ≡ b (mod n) and b ≡ c (mod n) means that a ≡ c (mod n).
We have more properties:
• a ≡ b (mod n) ⟹ ax ≡ bx (mod n)
• a ≡ b (mod n) and c ≡ d (mod n) ⟹ a + c ≡ b + d (mod n)
We cannot always cancel, however. If ax ≡ bx (mod n) and (x, n) = 1, then a ≡ b (mod n).
3.1.1. Residue classes. It is natural to think about equivalence classes. Given any n, Z
splits into n equivalence classes. These are called the residues mod(n). The set of residue
classes (mod n) forms an additive group, satisfying the standard properties of associativity,
existence of identity, and inverses.
We can also multiply residue classes: a (mod n) × b (mod n) = ab (mod n). In general,
this does not form a group, as 0 does not have an inverse. There is a multiplicative identity
1 (mod n).
Theorem 3.1.2. The congruence ax ≡ b (mod n) has a unique solution (mod n) if (a, n) = 1.
If (a, n) = g, then g must divide b for there to be a solution.
Proof. From (a, n) = 1, we see that $1 = ax_0 + ny$ for some integers $x_0$, y. This means that $x_0$ solves ax ≡ 1 (mod n). Multiplying by b gives $b = a(bx_0) + n(by)$, so $x = bx_0$ solves ax ≡ b (mod n).
For uniqueness, suppose that ax ≡ b (mod n) and ay ≡ b (mod n). We can subtract these congruences to get a(x − y) ≡ 0 (mod n) and cancel a because (a, n) = 1.
So this tells us precisely which residue classes are invertible.
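The proof is constructive, and a short sketch follows. It assumes Python 3.8 or later, where the built-in `pow(a, -1, n)` returns the inverse of a modulo n; the wrapper name `solve_linear_congruence` is hypothetical.

```python
from math import gcd


def solve_linear_congruence(a, b, n):
    """Return the unique x in [0, n) with a*x ≡ b (mod n), assuming gcd(a, n) = 1."""
    if gcd(a, n) != 1:
        raise ValueError("a must be coprime to n for a unique solution")
    inverse = pow(a, -1, n)  # solves a * inverse ≡ 1 (mod n)
    return (inverse * b) % n


x = solve_linear_congruence(3, 4, 7)
print(x, (3 * x) % 7)  # → 6 4
```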
Definition 3.1.3. a (mod n) is a reduced residue class if (a, n) = 1.
Every reduced residue class is invertible. The set of reduced residue classes (mod(n)) with
the operation of multiplication again forms an abelian group. (Check this).
It was clear that the additive group had n elements. It’s not so clear for this multiplicative
group.
Definition 3.1.4. The Euler phi function φ(n) is the number of reduced residue classes
(mod n).
Note. If p is prime, φ(p) = p − 1, and $\varphi(p^2) = p^2 - p$.
If we look at residue classes (mod p), we only need to check distributivity to see that this
forms a field with (+, ×).
3.2. Useful theorems.
Theorem 3.2.1 (Wilson’s Theorem). If p is a prime then (p − 1)! ≡ −1 (mod p).
Exercise 3.2.2. If n > 4 is composite then (n − 1)! ≡ 0 (mod n).
Remark. Someone told Gauss that this would be hard to prove because there are no good
ways to write primes, and Gauss said that they needed new notions and not new notations.
Proof of Wilson’s Theorem. Consider
\[ 1 \times 2 \times 3 \times \cdots \times (p-1). \]
For each a ≢ 0 (mod p), there exists $a^{-1}$ (mod p), and we can cancel the two. This is good as long as $a \not\equiv a^{-1}$ (mod p), but $a \equiv a^{-1}$ (mod p) means that $a^2 \equiv 1$ (mod p), so we need to consider 1 and −1 as the two that do not cancel. Hence we get that the product is congruent to −1 (mod p).
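Wilson’s Theorem and Exercise 3.2.2 are easy to check by machine for small moduli. The helper `factorial_mod` below is a hypothetical name for a direct computation of (n − 1)! (mod n).

```python
def factorial_mod(n):
    """Return (n - 1)! reduced modulo n, for n >= 2."""
    result = 1
    for k in range(1, n):
        result = result * k % n
    return result


for n in [5, 7, 11, 4, 6, 9]:
    print(n, factorial_mod(n))
```

For a prime p the printed value is p − 1, which is −1 (mod p); for composite n > 4 it is 0, and n = 4 is the one exception with 3! ≡ 2 (mod 4).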
From a reduced residue class a (mod n), we can compute $a^2$ (mod n), $a^3$ (mod n), etc. Since there are a finite number of reduced residue classes, we must come back to something that we had earlier. So $a^k \equiv a^l$ (mod n) for some k < l, so that $a^{l-k} \equiv 1$ (mod n).
Definition 3.2.3. The order of the reduced residue class a (mod n) is the smallest g ∈ N such that $a^g \equiv 1$ (mod n).
Example 3.2.4. The order of 1 is 1, and the order of −1 is 2 (if n > 2). Anything else
would be hard to compute.
Theorem 3.2.5 (Euler’s Theorem). If (a, n) = 1, then the order of a (mod n) divides φ(n).
Corollary 3.2.6 (Fermat’s Little Theorem). If p is prime and p ∤ a, the order of a (mod p) divides p − 1. Equivalently, if p ∤ a then $a^{p-1} \equiv 1$ (mod p), or equivalently, $a^p \equiv a$ (mod p) for every a.
Proof of Euler’s Theorem. Consider the φ(n) reduced residue classes 1 (mod n), . . . , (n − 1) (mod n). What happens when we multiply these by a (mod n)? We get a (mod n), . . . , −a (mod n). We claim that these two sets are the same. Each of these is reduced, so it was contained in the original set. The reverse is also true because ax ≡ b (mod n) has a solution, and all of these are distinct. So our two sets of residue classes are permutations of each other.
So
\[ \prod_{\substack{b \ (\mathrm{mod}\ n) \\ (b,n)=1}} b \equiv \prod_{\substack{b \ (\mathrm{mod}\ n) \\ (b,n)=1}} (ab) \equiv a^{\varphi(n)} \prod_{\substack{b \ (\mathrm{mod}\ n) \\ (b,n)=1}} b \pmod{n}. \]
Cancelling the product of the b, which is coprime to n, gives $a^{\varphi(n)} \equiv 1$ (mod n), so the order of a divides φ(n).
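Orders are hard to predict but easy to compute by brute force for small n. The sketch below uses hypothetical helper names `phi` and `multiplicative_order`, and simply multiplies by a until it returns to 1, as in the discussion before Definition 3.2.3.

```python
from math import gcd


def phi(n):
    """Euler phi function: the number of reduced residue classes mod n."""
    return sum(1 for a in range(1, n + 1) if gcd(a, n) == 1)


def multiplicative_order(a, n):
    """Smallest g >= 1 with a**g ≡ 1 (mod n), for n >= 2 and gcd(a, n) = 1."""
    if n < 2 or gcd(a, n) != 1:
        raise ValueError("need n >= 2 and gcd(a, n) = 1")
    g, power = 1, a % n
    while power != 1:
        power = power * a % n
        g += 1
    return g


print(multiplicative_order(2, 7), phi(7))    # → 3 6
print(multiplicative_order(3, 10), phi(10))  # → 4 4
```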
If $q_j^{\beta_j}$ is the order of $a^{(p-1)/q_j^{\alpha_j}}$, but if the order is even smaller, you won’t have enough powers of
6. 10/6
Midterm in two weeks: Wednesday week after next: October 20.
We have a polynomial f(x) ∈ Z[x], $f(x) = a_n x^n + a_{n-1} x^{n-1} + \cdots + a_0$. We’re interested in solutions to f(x) ≡ 0 (mod p). We’re really interested in the coefficients (mod p).
We proved:
Theorem 6.0.6. If $(a_n, p) = 1$, then f(x) ≡ 0 (mod p) has at most n solutions.
Proof. Suppose not, so that $b_1, \dots, b_{n+1}$ are distinct solutions (mod p). Then $f(x) - a_n (x - b_1)(x - b_2) \cdots (x - b_n) = g(x)$ where g(x) has degree < n. Then $g(b_1), \dots, g(b_n) \equiv 0$ (mod p). By induction on the degree, this is a contradiction unless all coefficients of g are 0 (mod p). But then plugging in $x = b_{n+1}$ gives $f(b_{n+1}) \equiv a_n (b_{n+1} - b_1) \cdots (b_{n+1} - b_n) \not\equiv 0$ (mod p), a contradiction.
Last time, this was drowned in notation.
Lemma 6.0.7. If $q^\alpha \,\|\, p - 1$ then there is an element a (mod p) of order $q^\alpha$.
Proof. Suppose there is an element b (mod p) of order $q^\alpha r$. Then $b^r$ has order $q^\alpha$. Otherwise, there is no element whose order is a multiple of $q^\alpha$, so every element has order dividing $\frac{p-1}{q}$. So $x^{(p-1)/q} \equiv 1$ (mod p) has p − 1 solutions. This contradicts Theorem 6.0.6.
If $p - 1 = q_1^{\alpha_1} q_2^{\alpha_2} \cdots q_l^{\alpha_l}$, then we get $a_i$ (mod p) of order $q_i^{\alpha_i}$. We multiply these together to get that $a_1 \cdots a_l$ has order p − 1.
We have therefore proved that
Theorem 6.0.8. (Z/pZ)∗ is cyclic, and there is a primitive root (mod p).
6.1. Lifting.
6.1.1. Lift from p to $p^2$. We will lift primitive roots (mod p) to primitive roots (mod $p^2$).
We began this last time.
Consider a primitive root g (mod p). There are p lifts of it (mod $p^2$): g + kp (mod $p^2$) where 0 ≤ k ≤ p − 1. These are the only candidates for a primitive root (mod $p^2$).
Exercise 6.1.1. If g (mod $p^2$) is a primitive root, then g (mod p) is a primitive root.
Solution. Let r be the order of g (mod p), so $g^r \equiv 1$ (mod p) and $g^r = 1 + ap$. Then $g^{rp} = (1 + ap)^p \equiv 1$ (mod $p^2$). The order of g (mod $p^2$) is a multiple of r that divides rp.
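To see which of the p candidates g + kp actually are primitive roots (mod $p^2$), one can simply compute orders. This is a brute-force sketch with hypothetical function names; it assumes p is an odd prime and g is a primitive root (mod p).

```python
def multiplicative_order(a, n):
    """Smallest k >= 1 with a**k ≡ 1 (mod n); assumes gcd(a, n) = 1 and n >= 2."""
    k, power = 1, a % n
    while power != 1:
        power = power * a % n
        k += 1
    return k


def primitive_root_lifts(g, p):
    """Lifts g + k*p (0 <= k < p) that have full order p*(p-1) modulo p**2."""
    full_order = p * (p - 1)
    return [g + k * p for k in range(p)
            if multiplicative_order(g + k * p, p * p) == full_order]


print(primitive_root_lifts(2, 5))  # → [2, 12, 17, 22]
```

Here 7 = 2 + 5 is the single lift of 2 that fails, since $7^2 = 49 \equiv -1$ (mod 25) gives it order 4.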
6.1.2. Lift from $p^2$ to $p^3$. Suppose g (mod $p^2$) is a primitive root and consider $g + kp^2$ (mod $p^3$). The possible orders are $p^2(p-1)$ or $p(p-1)$. Now write $g^{p(p-1)} = 1 + bp^2$, and we want to understand $(g + kp^2)^{p(p-1)}$ (mod $p^3$). This comes from the binomial theorem:
\[ (g + kp^2)^{p(p-1)} = g^{p(p-1)} + \binom{p(p-1)}{1} kp^2 g^{p(p-1)-1} + \cdots \equiv g^{p(p-1)} \pmod{p^3}. \]
Could $g^{p(p-1)}$ have been 1 (mod $p^3$)? Could b have been a multiple of p? We can write $g^{p-1} = 1 + ap$ and p ∤ a. Then
\[ (g^{p-1})^p = 1 + \binom{p}{1}(ap) + \binom{p}{2}(ap)^2 + \cdots \equiv 1 + ap^2 \pmod{p^3}. \]
6.1.3. Structure of $(\mathbb{Z}/2p^\alpha\mathbb{Z})^\times$. This is the same as the structure of $(\mathbb{Z}/p^\alpha\mathbb{Z})^\times$ by the Chinese Remainder Theorem.
Why did we have to assume that p is an odd prime? (Z/4Z)× = {1, 3 (mod 4)} is
generated by 3. However, for (Z/8Z)× , every element has order 1 or 2. This means that it
is the Klein four group Z/2Z × Z/2Z. What is different in the proof?
If α ≥ 3, then $(\mathbb{Z}/2^\alpha\mathbb{Z})^\times$ has size $2^{\alpha-1}$. Then 5 has order $2^{\alpha-2}$. Also include −1, and we can prove that every odd a (mod $2^\alpha$) is $\pm 5^j$ for some $1 \le j \le 2^{\alpha-2}$.
6.2. Quadratic Congruences. Consider $ax^2 + bx + c \equiv 0$ (mod p), where p is odd. There’s not much to do for p = 2. We can also assume that (a, p) = 1. First, we complete the square:
\[ 4a(ax^2 + bx + c) \equiv 0 \pmod{p} \]
\[ (2ax)^2 + 2(2ax)b + 4ac \equiv 0 \pmod{p} \]
\[ (2ax + b)^2 \equiv b^2 - 4ac \pmod{p}. \]
Therefore, it is sufficient to solve $y^2 \equiv d$ (mod p), where d is the discriminant $b^2 - 4ac$.
Definition 6.2.1. A residue class a (mod p) is called a quadratic residue if there are 2 solutions to $x^2 \equiv a$ (mod p) and a nonresidue if there are no solutions to the congruence. If a ≡ 0 (mod p), then there is only one solution.
Definition 6.2.2. The Legendre symbol is
\[ \left(\frac{a}{p}\right) = \begin{cases} 1 & \text{if } a \text{ is a quadratic residue (mod } p) \\ -1 & \text{if } a \text{ is a quadratic nonresidue (mod } p) \\ 0 & \text{if } a \equiv 0 \pmod{p}. \end{cases} \]
7. 10/11
We want to understand quadratic congruences (mod n), and it is sufficient to understand
them (mod p); from that, simply use the Chinese Remainder Theorem.
We considered $ax^2 + bx + c \equiv 0$ (mod p), where p is odd and (a, p) = 1. This reduced to solving $y^2 \equiv d$ (mod p), which led to the definitions of quadratic residue and Legendre symbol.
7.1. Quadratic residues. There are primitive roots g (mod p), so for every n with (n, p) = 1, we have $n \equiv g^a$ (mod p) for some a. From this, we see that if a is even, then n is a quadratic residue because $n \equiv (g^{a/2})^2$ (mod p). If a is odd, then n is a quadratic nonresidue, since otherwise $g^a \equiv (g^b)^2$ (mod p) for some b, forcing a ≡ 2b (mod p − 1), which is impossible for odd a because p − 1 is even.
From this, we see that the Legendre symbol is (completely) multiplicative, which means that
\[ \left(\frac{m}{p}\right)\left(\frac{n}{p}\right) = \left(\frac{mn}{p}\right). \]
Proposition 7.1.1 (Euler’s Criterion). If (n, p) = 1 then
\[ \left(\frac{n}{p}\right) \equiv n^{\frac{p-1}{2}} \pmod{p}. \]
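Proof. Write $n \equiv g^k$ (mod p). If k is even, the left hand side is 1, which is congruent to the right hand side because $g^{k(p-1)/2} = (g^{p-1})^{k/2} \equiv 1$ (mod p).
If k = 2l + 1 is odd, the LHS is −1, so we have
\[ n^{\frac{p-1}{2}} \equiv g^{(2l+1)\frac{p-1}{2}} \equiv g^{(p-1)/2} \equiv -1 \pmod{p}. \]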
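Euler’s Criterion turns the Legendre symbol into a single modular exponentiation. A minimal sketch, assuming p is an odd prime (the function name `legendre` is an illustrative choice):

```python
def legendre(n, p):
    """Legendre symbol (n/p) for an odd prime p, via Euler's Criterion."""
    value = pow(n, (p - 1) // 2, p)  # equals 0, 1, or p - 1
    return -1 if value == p - 1 else value


# Compare with the squares mod 11, which are 1, 3, 4, 5, 9.
print([n for n in range(1, 11) if legendre(n, 11) == 1])  # → [1, 3, 4, 5, 9]
```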
Corollary 7.1.2.
\[ \left(\frac{-1}{p}\right) = (-1)^{\frac{p-1}{2}} = \begin{cases} +1 & \text{if } p \equiv 1 \pmod{4} \\ -1 & \text{if } p \equiv 3 \pmod{4} \end{cases} \]
This makes it easy to determine whether a number is a quadratic residue (mod p).
We can produce a primality test. We want to see if n is prime. Pick any a < n, and check if $a^{n-1} \equiv 1$ (mod n). If this is ≢ 1, then n is composite. If it is ≡ 1, then we look at whether $a^{\frac{n-1}{2}} \equiv \pm 1$ (mod n); if it is neither, then n is composite. If it is −1, we stop. If it is +1, see if $\frac{n-1}{2}$ is even and check if $a^{\frac{n-1}{4}} \equiv \pm 1$, and so on.
If a composite number n passes this test for a given value of a, it is called a strong pseudoprime to the base a. Try a different a with this procedure. This is a very rapid process.
If the Generalized Riemann Hypothesis is true, then this algorithm works efficiently.
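The test can be sketched as follows. This version runs the same chain of square roots from the bottom up (write n − 1 = $2^s d$ with d odd and square repeatedly), which is the usual way the strong test is organized; that reorganization and the name `is_strong_probable_prime` are assumptions of this sketch, not the lecture’s wording.

```python
def is_strong_probable_prime(n, a):
    """Return True if odd n >= 3 passes the strong test to base a."""
    if n < 3 or n % 2 == 0:
        raise ValueError("n must be an odd integer >= 3")
    d, s = n - 1, 0
    while d % 2 == 0:  # write n - 1 = 2**s * d with d odd
        d //= 2
        s += 1
    x = pow(a, d, n)
    if x == 1 or x == n - 1:
        return True
    for _ in range(s - 1):
        x = x * x % n
        if x == n - 1:  # reached -1 somewhere along the chain
            return True
    return False


print(is_strong_probable_prime(2047, 2))  # → True  (2047 = 23 * 89)
print(is_strong_probable_prime(2047, 3))  # → False
```

So 2047 is a strong pseudoprime to the base 2, but switching to a = 3 exposes it as composite.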
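Theorem 7.1.3 (Gauss’s Law of Quadratic Reciprocity). Given two primes p and q (different and odd), then
\[ \left(\frac{p}{q}\right)\left(\frac{q}{p}\right) = (-1)^{\frac{p-1}{2} \cdot \frac{q-1}{2}} = \begin{cases} 1 & \text{if either } p \text{ or } q \text{ is } 1 \pmod{4} \\ -1 & \text{if both } p \text{ and } q \text{ are } 3 \pmod{4}. \end{cases} \]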
This is a result that is theoretically interesting. It is not yet clear why it is an interesting
or important fact. People are still looking for similar reciprocity laws in other cases. You’ll
have to trust me that it is interesting. It is not a useful computational tool.
The Legendre symbol (mod p) has some properties:
(1) is periodic (mod p)
(2) is completely multiplicative
This is rather surprising, as it is not clear why such a function should exist. We’ll later
find all such functions that are periodic and multiplicative. This will be the crucial thing to
prove Dirichlet’s Theorem on producing primes in arithmetic progressions.
Example 7.1.4.
\[ n \mapsto \begin{cases} 1 & \text{if } n \equiv 1 \pmod{4} \\ -1 & \text{if } n \equiv 3 \pmod{4} \\ 0 & \text{if } n \text{ is even.} \end{cases} \]
Note that the number of times we get $-b_j$ is equal to the number of times that aj (mod p) lies in $[\frac{p+1}{2}, p - 1]$. We don’t really care about this number; just whether it’s even or odd.
Now,
\[ aj = p \left\lfloor \frac{aj}{p} \right\rfloor + r. \]
There are two cases: $r = b_j$ or $r = p - b_j$. If it is $+b_j$, then r has the same parity as $b_j$; if it is $-b_j$, then r has the opposite parity to $b_j$, because p is odd.
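We can now compute, using that the $b_j$ are a permutation of $1, \dots, \frac{p-1}{2}$,
\[ \sum_{j=1}^{\frac{p-1}{2}} aj = p \sum_{j=1}^{\frac{p-1}{2}} \left\lfloor \frac{aj}{p} \right\rfloor + \sum_{j=1}^{\frac{p-1}{2}} r_j \equiv \sum_{j=1}^{\frac{p-1}{2}} \left\lfloor \frac{aj}{p} \right\rfloor + \sum_{j=1}^{\frac{p-1}{2}} j + (\#\text{ of } -b_j \text{ terms}) \pmod{2}. \]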
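Now,
\[ (\#\text{ of } -b_j \text{ terms}) \equiv a \cdot \frac{\frac{p-1}{2} \cdot \frac{p+1}{2}}{2} - \frac{\frac{p-1}{2} \cdot \frac{p+1}{2}}{2} + \sum_{j=1}^{\frac{p-1}{2}} \left\lfloor \frac{aj}{p} \right\rfloor = (a - 1)\frac{p^2 - 1}{8} + \sum_{j=1}^{\frac{p-1}{2}} \left\lfloor \frac{aj}{p} \right\rfloor \pmod{2}. \]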
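Corollary 7.1.6.
\[ \left(\frac{2}{p}\right) = (-1)^{\frac{p^2-1}{8}} = \begin{cases} +1 & \text{if } p \equiv 1 \pmod{8} \text{ or } p \equiv -1 \pmod{8} \\ -1 & \text{if } p \equiv 3 \pmod{8} \text{ or } p \equiv 5 \pmod{8} \end{cases} \]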
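Now, since q − 1 is even,
\[ \left(\frac{q}{p}\right) = (-1)^{\frac{p^2-1}{8}(q-1) + \sum_{j=1}^{\frac{p-1}{2}} \lfloor \frac{qj}{p} \rfloor} = (-1)^{\sum_{j=1}^{\frac{p-1}{2}} \lfloor \frac{qj}{p} \rfloor}. \]
Similarly,
\[ \left(\frac{p}{q}\right) = (-1)^{\sum_{k=1}^{\frac{q-1}{2}} \lfloor \frac{pk}{q} \rfloor}, \]
so their product is
\[ \left(\frac{p}{q}\right)\left(\frac{q}{p}\right) = (-1)^{\sum_{j=1}^{\frac{p-1}{2}} \lfloor \frac{qj}{p} \rfloor + \sum_{k=1}^{\frac{q-1}{2}} \lfloor \frac{pk}{q} \rfloor}. \]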
We need one final trick. Consider all numbers of the form qj − pk where $1 \le j \le \frac{p-1}{2}$ and $1 \le k \le \frac{q-1}{2}$.
There are $\frac{p-1}{2} \cdot \frac{q-1}{2}$ such numbers, all nonzero integers. Some are positive and some are negative. How many positive numbers are there? For it to be positive, we need qj > pk. Given j, there are $\lfloor \frac{qj}{p} \rfloor$ such values of k. The total positive values are therefore
\[ \sum_{j=1}^{\frac{p-1}{2}} \left\lfloor \frac{qj}{p} \right\rfloor. \]
The negative values come from qj < pk. Given k, the number of j is $\lfloor \frac{pk}{q} \rfloor$, so the total negative values is
\[ \sum_{k=1}^{\frac{q-1}{2}} \left\lfloor \frac{pk}{q} \right\rfloor. \]
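So
\[ \left(\frac{p}{q}\right)\left(\frac{q}{p}\right) = (-1)^{\sum_{j=1}^{\frac{p-1}{2}} \lfloor \frac{qj}{p} \rfloor + \sum_{k=1}^{\frac{q-1}{2}} \lfloor \frac{pk}{q} \rfloor} = (-1)^{\left(\frac{p-1}{2}\right)\left(\frac{q-1}{2}\right)}. \]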
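The counting argument can be checked numerically: the two floor sums should add up to $\frac{p-1}{2} \cdot \frac{q-1}{2}$, and each floor sum should give the corresponding Legendre symbol. The sketch below uses hypothetical helper names and computes the Legendre symbol by Euler’s Criterion.

```python
def floor_sum(q, p):
    """Sum of floor(q*j / p) for 1 <= j <= (p-1)/2."""
    return sum((q * j) // p for j in range(1, (p - 1) // 2 + 1))


def legendre(n, p):
    """Legendre symbol (n/p) for an odd prime p, via Euler's Criterion."""
    value = pow(n, (p - 1) // 2, p)
    return -1 if value == p - 1 else value


p, q = 11, 7
print(floor_sum(q, p) + floor_sum(p, q), ((p - 1) // 2) * ((q - 1) // 2))  # → 15 15
print(legendre(p, q) * legendre(q, p))  # → -1
```

Both 11 and 7 are 3 (mod 4), so the product of the two symbols is −1, as the theorem predicts.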
Remark. This was not the most intuitive proof, but it doesn’t require much machinery to
set up. There are more intuitive proofs. For example, we want to work with congruences of
things other than integers. Here,
\[ (a + b)^p \equiv a^p + b^p \pmod{p} \]
might hold for algebraic integers a and b, i.e. where a and b are solutions to monic polynomial
equations with integer coefficients.
There are nice algebraic integers that you can construct; these are called Gauss sums:
\[ \sum_{n=1}^{p-1} \left(\frac{n}{p}\right) e^{\frac{2\pi i n}{p}}. \]
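These are also algebraic integers. Now we can do ingenious things: working (mod q),
\[ \left( \sum_{n=1}^{p-1} \left(\frac{n}{p}\right) e^{\frac{2\pi i n}{p}} \right)^q \equiv \sum_{n} \left(\frac{n}{p}\right) e^{\frac{2\pi i nq}{p}} = \left(\frac{q}{p}\right) \sum_{n} \left(\frac{nq}{p}\right) e^{\frac{2\pi i nq}{p}} = \left(\frac{q}{p}\right) \sum_{m} \left(\frac{m}{p}\right) e^{\frac{2\pi i m}{p}}. \]
Therefore,
\[ \left(\frac{q}{p}\right) \equiv (\text{Gauss sum})^{q-1} \pmod{q}. \]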
8. 10/13
We now know how to solve any quadratic congruence $ax^2 + bx + c \equiv 0$ (mod p). This leads to computing $\left(\frac{d}{p}\right)$, $d = b^2 - 4ac$.
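We know a variety of facts about the Legendre symbol. In particular, it gives statements of the form
\[ \left(\frac{d}{p}\right) = \begin{cases} 1 & \text{if } p \text{ lies in some residue classes (mod } 4|d|) \\ -1 & \text{if } p \text{ lies in some other residue classes (mod } 4|d|) \end{cases} \]
Example 8.0.7.
\[ \left(\frac{5}{p}\right) = \left(\frac{p}{5}\right) = \begin{cases} 1 & \text{if } p \equiv 1, 4 \pmod{5} \\ -1 & \text{if } p \equiv 2, 3 \pmod{5} \end{cases} \]
\[ \left(\frac{7}{p}\right) = \begin{cases} \left(\frac{p}{7}\right) & \text{if } p \equiv 1 \pmod{4} \\ -\left(\frac{p}{7}\right) & \text{if } p \equiv 3 \pmod{4}, \end{cases} \]
which leads to conditions on p (mod 28).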
Figure out what happens when d = 35.
8.1. Absolute Values in Q. We’ll talk about some pretty theorems. There’s a completely
different way where primes appear. This has to do in some way with analysis. If you think
about real analysis, it is based on the notion of distance between two numbers, which is
based on absolute value. This has some nice properties: |xy| = |x||y| and |x + y| ≤ |x| + |y|.
The questions that we want to think about involve the field Q of rational numbers.
Definition 8.1.1. An absolute value on Q is a function f : Q → R≥0 with the following
properties:
(1) f (x) = 0 iff x = 0
(2) f (xy) = f (x)f (y) for all x, y ∈ Q
(3) f (x + y) ≤ f (x) + f (y) (triangle inequality).
Example 8.1.2. The trivial absolute value: f(0) = 0, f(x) = 1 for all x ≠ 0.
Example 8.1.3. f(x) = |x| satisfies these properties. $f(x) = |x|^\alpha$ does too when 0 < α ≤ 1.
The conditions on α come from the triangle inequality. We want to check that
\[ x^\alpha + y^\alpha \ge (x + y)^\alpha. \]
If we take x = y then we want $2 \ge 2^\alpha$, so clearly α ≤ 1. To show our condition, divide both sides by $y^\alpha$ to reduce the inequality to $t^\alpha + 1 \ge (t + 1)^\alpha$, which is a problem in single variable calculus.
There is another class of examples that comes from primes.
Example 8.1.4. p-adic absolute value. Let p be a prime. Consider n ∈ N, and write $n = p^\alpha b$. Here, $p^\alpha \,\|\, n$, so p ∤ b, α ≥ 0.
Define the p-adic absolute value as
\[ |n|_p = p^{-\alpha}, \]
and additionally, define $|-1|_p = 1$ and $|0|_p = 0$.
If we have a rational number $\frac{m}{n}$, define
\[ \left| \frac{m}{n} \right|_p = \frac{|m|_p}{|n|_p}. \]
Multiplicativity is obvious. We need to check the triangle inequality. For simplicity, we do this for the integers. Suppose that $n_1 = p^{\alpha_1} b_1$, $n_2 = p^{\alpha_2} b_2$. We want to show that
\[ |n_1 + n_2|_p \le |n_1|_p + |n_2|_p. \]
Note that $|n_1|_p = p^{-\alpha_1}$, $|n_2|_p = p^{-\alpha_2}$, and $|n_1 + n_2|_p \le p^{-\min(\alpha_1, \alpha_2)} = \max(p^{-\alpha_1}, p^{-\alpha_2})$. So the triangle inequality is true, and indeed, we’ve shown something stronger:
\[ |n_1 + n_2|_p \le \max(|n_1|_p, |n_2|_p). \]
To check the triangle inequality for rational numbers, we can extend the previous argument
by clearing denominators.
We needed p to be prime, because otherwise the multiplicativity fails.
With the normal absolute value, the absolute values of the rational numbers form a dense set in $\mathbb{R}_{\ge 0}$. In this case, however, the image is $|\mathbb{Q}|_p = \{p^n : n \in \mathbb{Z}\} \cup \{0\}$.
Strangely, $p, p^2, p^3, \dots$ are small while $\frac{1}{p}, \frac{1}{p^2}, \dots$ are large.
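A small sketch of the p-adic absolute value on Q, using exact rational arithmetic from the standard library. The function names are hypothetical, and the convention $|0|_p = 0$ is the one required by the definition of an absolute value.

```python
from fractions import Fraction


def padic_valuation(n, p):
    """Exponent of the prime p in the nonzero integer n."""
    if n == 0:
        raise ValueError("valuation of 0 is not finite")
    exponent = 0
    while n % p == 0:
        n //= p
        exponent += 1
    return exponent


def padic_abs(x, p):
    """p-adic absolute value of a rational number x, returned as a Fraction."""
    x = Fraction(x)
    if x == 0:
        return Fraction(0)
    v = padic_valuation(x.numerator, p) - padic_valuation(x.denominator, p)
    return Fraction(p) ** (-v)


print(padic_abs(50, 5))               # → 1/25
print(padic_abs(Fraction(3, 25), 5))  # → 25
```

So 50 is 5-adically small, while 3/25 is 5-adically large.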
Example 8.1.5. As in the case of the usual absolute value, we can raise this to a power $|x|_p^\alpha$. With our new, stronger triangle inequality, we see that the triangle inequality is satisfied whenever α > 0.
Theorem 8.1.6 (Ostrowski). These are all of the absolute values on Q.
Proof. Let f be an absolute value on Q. Note that multiplicativity implies that f (1) = 1.
Then f (n) ≤ n by the triangle inequality.
If we consider values of f (n), n ∈ N, there are two cases: all are ≥ 1, or at least one of
them is < 1. We want to show that they come from the normal absolute value and the p-adic
absolute value respectively.
Case 1: Pick the smallest n ∈ N with f(n) < 1. Then by minimality, n = p is prime.
Consider r ∈ N, and take its base p expansion
\[ r = b_0 + b_1 p + b_2 p^2 + \cdots + b_s p^s, \]
$0 \le b_j \le p - 1$. Then
\[ f(r) \le f(b_0) + f(b_1) + \cdots + f(b_s) \le (p-1)(s+1) \le (p-1)\left(\frac{\log r}{\log p} + 1\right). \]
Now,
\[ f(r)^k = f(r^k) \le (p-1)\left(\frac{k \log r}{\log p} + 1\right), \]
so
\[ f(r) \le (p-1)^{1/k} \left(\frac{k \log r}{\log p} + 1\right)^{1/k}. \]
As k → ∞, we conclude that f(r) ≤ 1.
We want to show that if (r, p) = 1 then f(r) = 1. If (r, p) = 1, then $(r^k, p^k) = 1$. By the Euclidean algorithm, we can write $1 = r^k x + p^k y$. This means that $1 \le f(r^k x) + f(p^k y) \le f(r)^k + f(p)^k$. Let k → ∞. If f(r) < 1 then the right hand side goes to 0 as k → ∞, a contradiction.
We now know that p is a prime for which f(p) < 1, and f(n) = 1 if (n, p) = 1. But this tells us everything. Take any $n = p^a b$ with (b, p) = 1. Then
\[ f(n) = f(p)^a f(b) = f(p)^a. \]
Write $f(p) = p^{-\alpha}$, α > 0. Then $f(p) = |p|_p^\alpha$, and we get that $f(n) = |n|_p^\alpha$.
Case 2: Now, we consider the case when f(n) ≥ 1 for all n ∈ N. Pick two numbers m, n ∈ N, m, n > 1. Write m in base n:
\[ m = b_0 + b_1 n + b_2 n^2 + \cdots + b_s n^s. \]
Then $f(m) \le (f(b_0) + f(b_1) + \cdots + f(b_s)) f(n)^s \le (s+1)(n-1) f(n)^s$, where $s \le \frac{\log m}{\log n}$.
Now,
\[ f(m)^k = f(m^k) \le \left(1 + \frac{k \log m}{\log n}\right)(n-1) f(n)^{\frac{k \log m}{\log n}}. \]
Take k-th roots, and let k → ∞. Then
\[ f(m) \le f(n)^{\frac{\log m}{\log n}}, \]
so what we’ve shown is that
\[ f(m)^{\frac{1}{\log m}} \le f(n)^{\frac{1}{\log n}}. \]
If we swap m and n, we see that
\[ f(m)^{\frac{1}{\log m}} = f(n)^{\frac{1}{\log n}}. \]
If we write $f(2) = 2^\alpha$, then $f(2)^{\frac{1}{\log 2}} = f(n)^{\frac{1}{\log n}}$, and we get $f(n) = n^\alpha$. The triangle inequality forces α ≤ 1, and α = 0 gives the trivial absolute value.
Remark. From the rational numbers, we can take the completion: we can obtain $\sqrt{2}$ as the limit of rational numbers, but it is not a rational number itself. So we can extend this absolute value continuously from the rational numbers to the real numbers. We can now do the same thing with this p-adic absolute value. Think of taking sequences of rational numbers and consider convergence in the p-adic absolute value.
Example 8.1.7. $1,\ 1 + 7,\ 1 + 7 + 7^2,\ 1 + 7 + 7^2 + 7^3, \dots$ is a sequence of integers. Does this sequence converge? No for the usual absolute value, but it does converge for $|\cdot|_7$. It forms a Cauchy sequence, and it converges to $\frac{1}{1-7} = -\frac{1}{6} = 1 + 7 + 7^2 + 7^3 + \cdots$.
In fact, we can even consider $\sqrt{-1}$ in the 5-adics. We want to find a sequence $x_1, x_2, \dots$ of natural numbers with $|x_n^2 + 1|_5 \to 0$.
Since $2^2 + 1 \equiv 0$ (mod 5), we have $|2^2 + 1|_5 = \frac{1}{5}$. We can lift 2 to 2 + k · 5, and get congruences (mod $5^2$), (mod $5^3$), etc, yielding a converging sequence.
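The lifting can be carried out by trying the five candidates at each step. This is a brute-force sketch with a hypothetical function name; it starts from $x_1 = 2$ as in the text and produces $x_n$ with $x_n^2 + 1 \equiv 0$ (mod $5^n$).

```python
def sqrt_minus_one_mod_5_powers(levels):
    """Return [x_1, ..., x_levels] with x_n**2 + 1 ≡ 0 (mod 5**n), x_1 = 2."""
    xs = [2]
    modulus = 5
    for _ in range(levels - 1):
        x = xs[-1]
        next_modulus = modulus * 5
        # Try the lifts x + k * 5**n; exactly one of them works mod 5**(n+1).
        for k in range(5):
            candidate = x + k * modulus
            if (candidate * candidate + 1) % next_modulus == 0:
                xs.append(candidate)
                break
        modulus = next_modulus
    return xs


print(sqrt_minus_one_mod_5_powers(3))  # → [2, 7, 57]
```

Consecutive terms differ by a multiple of $5^n$, so the sequence is Cauchy for $|\cdot|_5$.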
9. 10/18
9.1. Sum of Two Squares. We’ll try to describe all numbers that can be written as the
sum of two squares, and we’ll give two or three proofs of this.
Given a number n, we want to write $n = x^2 + y^2$, x, y ∈ Z. We want a characterization of all such n. Here’s the main theorem:
Theorem 9.1.1. $n = x^2 + y^2$ if and only if $n = p_1^{\alpha_1} \cdots p_k^{\alpha_k}$ such that if $p_j \equiv 3$ (mod 4) then $\alpha_j$ is even.
Let’s try to see why this condition is necessary. It is more difficult to show that it is
sufficient.
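Before the proof, the statement of Theorem 9.1.1 can be tested by brute force for small n. The two function names below are hypothetical, and the sketch assumes Python 3.8 or later for `math.isqrt`.

```python
from math import isqrt


def is_sum_of_two_squares(n):
    """Brute force: is n = x**2 + y**2 for some integers x, y >= 0?"""
    for x in range(isqrt(n) + 1):
        rest = n - x * x
        if isqrt(rest) ** 2 == rest:
            return True
    return False


def satisfies_criterion(n):
    """Does every prime p ≡ 3 (mod 4) divide n to an even power?"""
    p = 2
    while p * p <= n:
        if n % p == 0:
            exponent = 0
            while n % p == 0:
                n //= p
                exponent += 1
            if p % 4 == 3 and exponent % 2 == 1:
                return False
        p += 1
    # Whatever is left is 1 or a single prime appearing to the first power.
    return not (n > 1 and n % 4 == 3)


print(all(is_sum_of_two_squares(n) == satisfies_criterion(n)
          for n in range(1, 501)))  # → True
```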
Proof. First, we show that the condition is necessary:
Suppose that n is not of this form and n is the sum of two squares. So $p^{2\beta+1} \,\|\, n$ for some p ≡ 3 (mod 4). Then $x^2 + y^2 \equiv 0$ (mod p) and hence $x^2 \equiv -y^2$ (mod p), and if (y, p) = 1, this means that $(x/y)^2 \equiv -1$ (mod p), but −1 is a quadratic nonresidue (mod p).
So y is a multiple of p and x is a multiple of p, so $x^2 + y^2$ is a multiple of $p^2$; cancel $p^2$ and repeat.
Now, we show that the condition is sufficient:
First, if $m = x_1^2 + y_1^2$ is the sum of two squares and $n = x_2^2 + y_2^2$ is the sum of two squares then mn is the sum of two squares. Here,
\[ m = (x_1 + iy_1)(x_1 - iy_1), \qquad n = (x_2 + iy_2)(x_2 - iy_2), \]
so that
\[ mn = (x_1 x_2 - y_1 y_2 + i(x_1 y_2 + x_2 y_1))(x_1 x_2 - y_1 y_2 - i(x_1 y_2 + x_2 y_1)) = (x_1 x_2 - y_1 y_2)^2 + (x_1 y_2 + x_2 y_1)^2. \]
If p ≡ 3 (mod 4), we showed that it isn’t the sum of two squares. But $p^2 = p^2 + 0^2$ is the sum of two squares, which means that every even power of p is the sum of two squares. So the main fact that we want to show is the following theorem:
Theorem 9.1.2 (Fermat). If p ≡ 1 (mod 4), then $p = x^2 + y^2$ for some integers x, y.
\[ \left| \frac{l}{p} - \frac{k}{x} \right| < \frac{1}{\sqrt{p}\, x}. \]
This is a problem that we can solve with the pigeonhole principle; it is guaranteed by
Dirichlet’s Theorem.
Theorem 9.1.3 (Dirichlet’s Theorem on Diophantine Approximation). Given a real number θ and an integer Q ≥ 1, there is a rational number $\frac{a}{q}$ which approximates θ, with 1 ≤ q ≤ Q and
\[ \left| \theta - \frac{a}{q} \right| < \frac{1}{qQ}. \]
Proof. Look at 0, θ, 2θ, . . . , Qθ (mod 1), i.e. subtract out the integer part and just keep the fractional part. There are Q + 1 numbers here. Look at the Q boxes
\[ \left[0, \frac{1}{Q}\right), \left[\frac{1}{Q}, \frac{2}{Q}\right), \cdots, \left[\frac{Q-1}{Q}, 1\right). \]
By the pigeonhole principle, there exist 0 ≤ j < k ≤ Q with the two numbers jθ and kθ lying in the same box.
This means that kθ − jθ differs from an integer by less than $\frac{1}{Q}$. Then
\[ \theta = \frac{\text{integer}}{k - j} + \text{error}, \qquad |\text{error}| < \frac{1}{Q(k - j)}. \]
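The theorem guarantees that a direct search over q ≤ Q must succeed. The sketch below does that search rather than following the pigeonhole argument literally; it uses exact rationals to avoid rounding issues, and the function name is hypothetical.

```python
from fractions import Fraction


def dirichlet_approximation(theta, Q):
    """Return (a, q) with 1 <= q <= Q and |theta - a/q| < 1/(q*Q).

    theta is converted to an exact Fraction, so pass a Fraction, an int,
    or a float whose exact binary value you are happy to approximate.
    """
    theta = Fraction(theta)
    for q in range(1, Q + 1):
        a = round(q * theta)  # nearest integer to q * theta
        if abs(theta - Fraction(a, q)) < Fraction(1, q * Q):
            return a, q
    raise AssertionError("unreachable by Dirichlet's theorem")


print(dirichlet_approximation(Fraction(314159, 100000), 10))  # → (22, 7)
```

With θ = 3.14159 and Q = 10 the first denominator that works is q = 7, giving the familiar 22/7.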
Remark. If you are given an irrational number θ, there are infinitely many q with
\[ \left| \theta - \frac{a}{q} \right| \le \frac{1}{q^2}. \]
9.2. Z[i]. We will do arithmetic in the Gaussian ring of integers Z[i]. Here,
Z[i] = {a + bi : a, b ∈ Z}.
This is nice, but we can’t divide. Allowing division, we obtain
Q(i) = {a + bi : a, b ∈ Q}.
Units in Z[i] are ±1 and ±i. One thing that clarifies a lot of stuff is the norm. Define the norm as
\[ N(a + bi) = a^2 + b^2 = (a + bi)(a - bi). \]
This has various nice properties. For example, N(a + bi) is a nonnegative rational number, and if a + bi ∈ Z[i] then the norm is an integer. Furthermore,
N((a + bi)(c + di)) = N(a + bi)N(c + di).
If u ∈ Z[i] and $\frac{1}{u}$ ∈ Z[i], then u is called a unit. This means that N(u) = 1, so u = ±1, ±i are the only units.
π is prime if π | αβ implies that π | α or π | β. Here, α | β if β = αγ with γ ∈ Z[i].
α is irreducible if α = βγ implies one of β or γ is a unit. This means that α is irreducible if it can’t be written as a product of two numbers with smaller norm. If N(α) = p is a rational prime, then α is irreducible.
Example 9.2.1. 2 + i is irreducible. 7 is irreducible because if 7 = αβ with neither factor a unit, then N(7) = 49 = N(α)N(β), which means that N(α) = 7, which is impossible because 7 is not the sum of two squares.
The question we want to ask is: Is there a division algorithm? If we want to divide, we want to write as a quotient plus some remainder. Here,
\[ \frac{a + bi}{c + di} = \rho + \sigma i, \]
with ρ, σ ∈ Q. Pick r and s to be the closest integers to ρ and σ. Then the quotient is r + si. We need to show that the remainder has smaller norm than the number we divide by. This isn’t too hard to do.
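Here is a sketch of that division, representing a + bi as the pair (a, b). Since |ρ − r| ≤ 1/2 and |σ − s| ≤ 1/2, the remainder has norm at most half the norm of the divisor. The function names and the tuple representation are assumptions of this sketch.

```python
def norm(z):
    """Norm N(a + bi) = a**2 + b**2 of a Gaussian integer z = (a, b)."""
    return z[0] ** 2 + z[1] ** 2


def gaussian_divmod(alpha, beta):
    """Return (quotient, remainder) in Z[i] with N(remainder) < N(beta)."""
    a, b = alpha
    c, d = beta
    n_beta = c * c + d * d
    if n_beta == 0:
        raise ZeroDivisionError("division by zero in Z[i]")
    # alpha / beta = alpha * conj(beta) / N(beta) = rho + sigma * i
    rho_num = a * c + b * d
    sigma_num = b * c - a * d
    # Nearest integers to rho and sigma, using only integer arithmetic.
    r = (2 * rho_num + n_beta) // (2 * n_beta)
    s = (2 * sigma_num + n_beta) // (2 * n_beta)
    # remainder = alpha - beta * (r + s*i)
    remainder = (a - (c * r - d * s), b - (c * s + d * r))
    return (r, s), remainder


print(gaussian_divmod((27, 23), (8, 1)))  # → ((4, 2), (-3, 3))
```

In the example, 27 + 23i = (8 + i)(4 + 2i) + (−3 + 3i), and the remainder has norm 18 < 65 = N(8 + i).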
10. 10/25
Recall the theorem of Fermat that p ≡ 1 (mod 4) means that p can be written as the sum of two squares. We already gave two proofs of this. The first was by looking at
minimal multiples of p as the sum of two squares, and the second was by Dirichlet’s theorem
on Diophantine approximation. We started looking at a third proof: The arithmetic of
Z[i] = {a + bi : a, b ∈ Z}, which sits naturally in the field Q(i) = {a + bi : a, b ∈ Q}.
10.1. Z[i]. Here, the norm of α = a + bi ∈ Q(i) is $N(\alpha) = a^2 + b^2 = \alpha\bar{\alpha}$. Note that N(αβ) = N(α)N(β).
α ∈ Z[i] is a unit if $\frac{1}{\alpha}$ ∈ Z[i]. If α ∈ Z[i], then N(α) is a nonnegative integer (zero only if α = 0), so if α is a unit, N(α) = 1, and the only units are α = ±1, ±i.
α | β if β = αγ for some γ ∈ Z[i]. α ∈ Z[i] is irreducible if α = βγ for β, γ ∈ Z[i] implies that β or γ is a unit. Equivalently, α ≠ βγ with 1 < N(β), N(γ) < N(α).
If N(α) = p (i.e. N(α) is a rational prime) then α is irreducible. For example, 1 + 2i ∈ Z[i] is irreducible, as are 1 − 2i, 1 + i, 3 + 2i. But there are other irreducibles too. If p ≡ 3 (mod 4), then p is irreducible in Z[i].
Proof. Suppose that p is not irreducible, and p = αβ with neither α nor β a unit, so $N(\alpha)N(\beta) = p^2$, and so N(α) = N(β) = p. But then $p = a^2 + b^2$, contradicting p ≡ 3 (mod 4).
Note that 5 = (1 + 2i)(1 − 2i) and $2 = (1 + i)(1 - i) = (1 + i)^2(-i)$, so these are not irreducible.
Our aim is to show that if p ≡ 1 (mod 4) then $p = \pi\bar{\pi}$ where N(π) = p and π is irreducible.
π is prime means that if π | αβ then π | α or π | β. What we would like to prove is
Theorem 10.1.1. In Z[i], every irreducible is prime and conversely.
Proof. In the case of the integers, we used the division algorithm. We want to do something
similar here.
Note that the converse is easy: If π is prime and π = αβ, then π | α or π | β. Then
N (α) ≥ N (π) or N (β) ≥ N (π). This implies that N (α) = N (π) and N (β) = 1, or the other
way around, so π is irreducible.
11. 10/27
We will move on to the big theorem that we will prove in the rest of the course.
Theorem 11.0.1 (Dirichlet’s Theorem on Primes in Arithmetic Progressions). If (a, q) = 1,
then the arithmetic progression a (mod q) contains infinitely many primes.
This is a very simple sounding statement, but the proof is not so simple. This will take
around three weeks for us to prove. We’ll build up a proof and do this case by case.
Before we consider the main idea of the proof, let’s look at a case you’ve already handled.
We can prove that there are infinitely many primes ≡ 1 (mod 4) and infinitely many primes
≡ 3 (mod 4). The case of 1 (mod 4) was on the homework, using our knowledge about sum
of two squares.
For the case of 3 (mod 4), suppose p1, p2, . . . , pn are primes ≡ 3 (mod 4). Then 4p1p2 · · · pn − 1 is ≡ 3 (mod 4)
and is not divisible by any pi, so it must be divisible by a new prime ≡ 3 (mod 4).
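As a small illustration of this Euclid-style argument, the following sketch takes a list of primes ≡ 3 (mod 4) and extracts a new one from 4p1p2 · · · pn − 1 by trial division. The function name is hypothetical, and trial division is used only because the numbers in the example are tiny.

```python
from math import prod

def new_prime_3_mod_4(primes):
    """Return a prime factor q = 3 (mod 4) of 4*prod(primes) - 1.

    Such a factor exists because the number is = 3 (mod 4), and a
    product of primes that are all = 1 (mod 4) would be = 1 (mod 4).
    """
    m = 4 * prod(primes) - 1
    p = 2
    while p * p <= m:
        while m % p == 0:
            if p % 4 == 3:
                return p
            m //= p          # strip factors that are = 1 (mod 4)
        p += 1
    return m                 # what remains is a prime = 3 (mod 4)

known = [3, 7, 11]
q = new_prime_3_mod_4(known)
print(q, q % 4, q in known)
```

The returned prime cannot be in the input list, since each listed prime divides 4p1p2 · · · pn and therefore leaves remainder −1.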
Similarly, in the spirit of Euclid, we can prove that there are infinitely many primes that
are ≡ 1 (mod 3) and ≡ −1 (mod 3). This will be on the next problem set.
This trick fails for −1 (mod 5), however, as 5p1 p2 . . . pn − 1 could be pq with p ≡ q ≡ 2
(mod 5).
11.1. Euler’s proof of the infinitude of primes. This is based on the fact that $\sum_p \frac{1}{p}$
diverges. This can be seen from $\sum_p \frac{1}{p^\sigma}$ for σ > 1. This converges, but we’d like to say that
this tends to infinity as σ → 1^+.
A nice way to think about this is to consider the Riemann zeta function
\[ \zeta(s) = \sum_{n=1}^{\infty} \frac{1}{n^s}. \]
For example, if s is real, s > 1, this series converges absolutely. If we think of s = σ + it as
a complex number, we have
\[ \frac{1}{n^s} = \frac{1}{n^\sigma} \cdot \frac{1}{n^{it}}. \]
The final term has absolute value 1, so
\[ \left| \frac{1}{n^s} \right| = \frac{1}{n^\sigma}. \]
So ζ(s), s = σ + it, converges absolutely for σ > 1.
The Riemann zeta function has a very natural connection with prime numbers. For every
natural number, we can factor it into primes in a unique way. Then
\[ \prod_p \left( 1 + \frac{1}{p^s} + \frac{1}{p^{2s}} + \frac{1}{p^{3s}} + \cdots \right) = \sum_{n=1}^{\infty} \frac{1}{n^s} = \zeta(s). \]
This product converges absolutely if s > 1, or more generally if ℜs > 1, and converges in this range to a
nonzero value.
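To see the Euler product at work numerically, here is a minimal sketch that compares a truncated product over primes with the known value ζ(2) = π^2/6. The helper names and the truncation bound are my own choices for illustration.

```python
import math

def primes_up_to(n):
    """Sieve of Eratosthenes: all primes <= n."""
    sieve = bytearray([1]) * (n + 1)
    sieve[0:2] = b"\x00\x00"
    for p in range(2, math.isqrt(n) + 1):
        if sieve[p]:
            sieve[p * p :: p] = bytearray(len(range(p * p, n + 1, p)))
    return [p for p in range(2, n + 1) if sieve[p]]

def euler_product(s, bound):
    """Product of (1 - p^(-s))^(-1) over primes p <= bound."""
    result = 1.0
    for p in primes_up_to(bound):
        result *= 1.0 / (1.0 - p ** -s)
    return result

print(euler_product(2, 10_000), math.pi ** 2 / 6)
```

Each factor (1 − p^(−s))^(−1) is the geometric series 1 + p^(−s) + p^(−2s) + · · · summed in closed form, so the truncated product omits exactly the integers that have a prime factor above the bound.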
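Why does this prove that there are infinitely many primes? Suppose that there are finitely
many primes. Then the product remains bounded as σ → 1^+. However, as σ → 1^+,
\[ \zeta(\sigma) = \sum_{n=1}^{\infty} \frac{1}{n^\sigma} \ge \int_1^2 \frac{dt}{t^\sigma} + \int_2^3 \frac{dt}{t^\sigma} + \cdots = \int_1^\infty \frac{dt}{t^\sigma} = \frac{1}{\sigma - 1}, \]
which tends to infinity.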
On the other hand,
\[ \zeta(\sigma) \le 1 + \int_1^2 \frac{dt}{t^\sigma} + \int_2^3 \frac{dt}{t^\sigma} + \cdots \le \frac{1}{\sigma - 1} + 1. \]
Proposition 11.1.2. For σ > 1,
\[ \zeta(\sigma) = \frac{1}{\sigma - 1} + O(1) = \frac{1}{\sigma - 1} + \gamma + c_1(\sigma - 1) + \cdots \]
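A quick numerical sanity check of this proposition: the sketch below evaluates ζ(σ) for σ slightly above 1 and prints ζ(σ) − 1/(σ − 1), which should stay bounded and approach γ ≈ 0.5772. The tail of the series is approximated by an integral plus a half-term correction; that approximation and the cutoff N = 1000 are my own assumptions, not part of the notes.

```python
def zeta(sigma, N=1000):
    """Approximate zeta(sigma) for real sigma > 1.

    Sums the first N terms exactly and estimates the tail by
    integral_N^inf t^(-sigma) dt - N^(-sigma)/2.
    """
    head = sum(n ** -sigma for n in range(1, N + 1))
    tail = N ** (1 - sigma) / (sigma - 1) - 0.5 * N ** -sigma
    return head + tail

for sigma in (1.5, 1.1, 1.01, 1.001):
    print(sigma, zeta(sigma) - 1 / (sigma - 1))
```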
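We can then write
\[ \log \zeta(\sigma) = -\sum_p \log\left(1 - \frac{1}{p^\sigma}\right). \]
Using
\[ -\log(1 - x) = \sum_{k=1}^{\infty} \frac{x^k}{k}, \]
we have
\[ \log \zeta(\sigma) = \sum_p \sum_{k=1}^{\infty} \frac{1}{k p^{k\sigma}} = \sum_p \left( \frac{1}{p^\sigma} + O\left(\frac{1}{p^{2\sigma}}\right) \right). \]
Now,
\[ \sum_{k=2}^{\infty} \frac{1}{k p^{k\sigma}} \le \sum_{k=2}^{\infty} \frac{1}{2 p^{k\sigma}} = \frac{1}{2p^{2\sigma}} \cdot \frac{1}{1 - \frac{1}{p^\sigma}} = O\left(\frac{1}{p^{2\sigma}}\right). \]
Therefore, we have
Proposition 11.1.3.
\[ \log \zeta(\sigma) = \sum_p \sum_{k=1}^{\infty} \frac{1}{k p^{k\sigma}} = \sum_p \frac{1}{p^\sigma} + O(1). \]
Corollary 11.1.4.
\[ \sum_p \frac{1}{p^\sigma} = \log \frac{1}{\sigma - 1} + O(1). \]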
This last equality again follows from unique factorization and the fact that χ−4 is completely
multiplicative. This is
\[ \prod_p \sum_{k=0}^{\infty} \frac{\chi_{-4}(p)^k}{p^{ks}} = \prod_p \left( 1 - \frac{\chi_{-4}(p)}{p^s} \right)^{-1}. \]
When ℜs > 1, the product converges absolutely, and it converges to something nonzero.
Therefore, L(s, χ−4) ≠ 0 if ℜs > 1.
Where does the series converge? This is an alternating series, and we can use the
alternating series test to see that this converges for s > 0. This doesn’t say anything about
the product, however.
Now, consider
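\[ \log L(\sigma, \chi_{-4}) = -\sum_p \log\left(1 - \frac{\chi_{-4}(p)}{p^\sigma}\right) = \sum_p \sum_{k=1}^{\infty} \frac{\chi_{-4}(p)^k}{k p^{k\sigma}} = \sum_p \frac{\chi_{-4}(p)}{p^\sigma} + O(1). \]
Let σ → 1^+. Then
\[ \sum_{p \equiv 1 \pmod 4} \frac{2}{p^\sigma} + O(1) = \log \zeta(\sigma) + \log L(\sigma, \chi_{-4}). \]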
Since log ζ(σ) → ∞, we are done if we can show log L(σ, χ−4) does not go to −∞. In fact,
this actually converges to log L(1, χ−4). So we want to show that L(1, χ−4) ≠ 0. In this
case,
\[ L(1, \chi_{-4}) = 1 - \frac{1}{3} + \frac{1}{5} - \frac{1}{7} + \cdots = \frac{\pi}{4}, \]
and hence there are infinitely many primes ≡ 1 (mod 4). The other case is similar, so
\[ \sum_{p \equiv 1 \pmod 4} \frac{1}{p^\sigma} \approx \sum_{p \equiv 3 \pmod 4} \frac{1}{p^\sigma} \approx \frac{1}{2} \log \frac{1}{\sigma - 1} + O(1). \]
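The statement that the two residue classes share the primes roughly equally can be observed directly. The sketch below counts primes up to a bound in each class mod 4 and sums their reciprocals (the case σ = 1 truncated at the bound, which is only a stand-in for the limit σ → 1^+). All names are illustrative.

```python
import math

def primes_up_to(n):
    """Sieve of Eratosthenes: all primes <= n."""
    sieve = bytearray([1]) * (n + 1)
    sieve[0:2] = b"\x00\x00"
    for p in range(2, math.isqrt(n) + 1):
        if sieve[p]:
            sieve[p * p :: p] = bytearray(len(range(p * p, n + 1, p)))
    return [p for p in range(2, n + 1) if sieve[p]]

def split_mod4(x):
    """Counts and reciprocal sums of odd primes <= x by residue mod 4."""
    count = {1: 0, 3: 0}
    recip = {1: 0.0, 3: 0.0}
    for p in primes_up_to(x):
        r = p % 4
        if r in count:          # skips p = 2
            count[r] += 1
            recip[r] += 1.0 / p
    return count, recip

count, recip = split_mod4(100_000)
print(count, recip)
```

The two reciprocal sums differ by a bounded amount while both grow without bound, which is the content of the O(1) term above.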
12. 11/1
We are building toward a proof that there are infinitely many p ≡ a (mod q) when (a, q) =
1. We are adding to Euler’s proof of the infinitude of primes.
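Here χ−4 is completely multiplicative and periodic. The series for L(s, χ−4) converges absolutely if
ℜs > 1 and converges conditionally if s > 0. The product converges absolutely if ℜs > 1.
Now,
\[ \log L(\sigma, \chi_{-4}) = \sum_p \sum_{k=1}^{\infty} \frac{\chi_{-4}(p)^k}{k p^{k\sigma}} = \sum_p \frac{\chi_{-4}(p)}{p^\sigma} + O(1). \]
Then
\[ \frac{1}{2}\left( \log \zeta(\sigma) + \log L(\sigma, \chi_{-4}) \right) = \sum_{p \equiv 1 \pmod 4} \frac{1}{p^\sigma} + O(1), \]
\[ \frac{1}{2}\left( \log \zeta(\sigma) - \log L(\sigma, \chi_{-4}) \right) = \sum_{p \equiv 3 \pmod 4} \frac{1}{p^\sigma} + O(1). \]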
Theorem 12.1.1. As σ → 1^+,
\[ \sum_{p \equiv 1 \pmod 4} \frac{1}{p^\sigma} = \frac{1}{2} \log \zeta(\sigma) + O(1) = \sum_{p \equiv 3 \pmod 4} \frac{1}{p^\sigma} + O(1). \]
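\[ \frac{1}{2}\left( \log \zeta(\sigma) - \log L(\sigma, \chi_{-3}) \right) = \sum_{p \equiv 2 \pmod 3} \frac{1}{p^\sigma} + O(1). \]
As σ → 1^+, L(σ, χ−3) → L(1, χ−3), which is neither ∞ nor 0. We know that this does not go to infinity
because of the conditional convergence from the alternating series test. It does not go to
zero because we can sum the series:
\begin{align*}
L(1, \chi_{-3}) &= \int_0^1 (1 - t + t^3 - t^4 + t^6 - t^7 + \cdots)\, dt = \int_0^1 \frac{1 - t}{1 - t^3}\, dt = \int_0^1 \frac{dt}{1 + t + t^2} \\
&= \int_0^1 \frac{dt}{(t + 1/2)^2 + 3/4} = \int_{1/2}^{3/2} \frac{dy}{y^2 + 3/4} = \int_{1/\sqrt{3}}^{\sqrt{3}} \frac{\sqrt{3}}{2} \cdot \frac{4}{3} \cdot \frac{dz}{z^2 + 1} \\
&= \frac{2}{\sqrt{3}} \left( \arctan(\sqrt{3}) - \arctan(1/\sqrt{3}) \right) = \frac{\pi}{3\sqrt{3}}.
\end{align*}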
We have therefore proved an analogous theorem to what we got before:
Theorem 12.2.1. As σ → 1^+,
\[ \sum_{p \equiv 1 \pmod 3} \frac{1}{p^\sigma} = \frac{1}{2} \log \zeta(\sigma) + O(1) = \sum_{p \equiv 2 \pmod 3} \frac{1}{p^\sigma} + O(1). \]
and there are three more of these. The alternating series test says that none of these is infinite.
The last step is to show that none of these is zero; this step is left as an exercise.
12.4. q = 5. Here, we want to consider χ : (Z/5Z)× → C, where χ(mn) = χ(m)χ(n), χ is
not identically zero.
As before χ(1) = 1, and the group (Z/5Z)× is cyclic. It is generated by 2. We just need
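to know χ(2). Since χ(2)^4 = χ(16) = χ(1) = 1, we get four possibilities: χ(2) = ±1, ±i.
We can again write down the character table (χ0 is the trivial or principal character):
\[
\begin{array}{c|rrrr}
 & 1 & 2 & 3 & 4 \\ \hline
\chi_0 & 1 & 1 & 1 & 1 \\
\chi_5 & 1 & -1 & -1 & 1 \\
\psi & 1 & i & -i & -1 \\
\bar{\psi} & 1 & -i & i & -1
\end{array}
\]
We can use these to identify each progression (mod 5). For example, for 2 (mod 5), we
have
\[ \frac{1}{4}\left( \chi_0 - \chi_5 - i\psi + i\bar{\psi} \right). \]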
This is kind of like taking the dot product.
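The "dot product" picture can be checked mechanically. The sketch below stores the four characters mod 5 as dictionaries (the names `chi0`, `chi5`, `psi`, `psibar` are just labels for the rows of the table) and forms the combination (1/4) Σ conj(χ(a)) χ(n), which should be 1 when n ≡ a (mod 5) and 0 on the other reduced classes.

```python
# Rows of the character table mod 5, indexed by residue class.
CHARACTERS = {
    "chi0":   {1: 1, 2: 1,   3: 1,   4: 1},
    "chi5":   {1: 1, 2: -1,  3: -1,  4: 1},
    "psi":    {1: 1, 2: 1j,  3: -1j, 4: -1},
    "psibar": {1: 1, 2: -1j, 3: 1j,  4: -1},
}

def indicator(a, n):
    """(1/4) * sum over characters of conj(chi(a)) * chi(n), for a, n in 1..4."""
    total = sum(chi[a].conjugate() * chi[n] for chi in CHARACTERS.values())
    return total / 4

# Picking out the progression 2 (mod 5): coefficients 1, -1, -i, +i.
print([indicator(2, n) for n in range(1, 5)])
```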
We again define
\[ L(s, \chi) = \prod_p \left( 1 - \frac{\chi(p)}{p^s} \right)^{-1} = \sum_n \frac{\chi(n)}{n^s}. \]
These converge absolutely when s > 1. When χ = χ0, we have
\[ L(s, \chi_0) = \zeta(s) \left( 1 - \frac{1}{5^s} \right). \]
Here,
log L(σ, χ0 ) = log ζ(σ) + O(1).
For χ ≠ χ0, we need an alternating series test that will tell us that they converge conditionally
for s > 0. We also need to know that L(1, χ) ≠ 0 for these characters. The complex
ones aren’t too hard; the real one was done at the Putnam seminar a few weeks ago.
13. 11/3
We were trying to define Dirichlet characters for every modulus to separate out every
reduced residue class.
In the case of q = 5, we saw that χ : (Z/5Z)× → C. 2 is a primitive root (mod 5), so
χ(2)^4 = 1, and we computed a character table.
We defined
\[ L(s, \chi) = \sum_{n=1}^{\infty} \frac{\chi(n)}{n^s} = \prod_p \left( 1 - \frac{\chi(p)}{p^s} \right)^{-1}, \]
which converges for s > 1.
If χ = χ0, we have L(s, χ0) = ζ(s)(1 − 1/5^s). For the other characters,
\[ L(s, \chi_5) = 1 - \frac{1}{2^s} - \frac{1}{3^s} + \frac{1}{4^s} + \cdots \]
converges conditionally for s > 0, and
\[ L(s, \psi) = 1 + \frac{i}{2^s} - \frac{i}{3^s} - \frac{1}{4^s} + \cdots \]
converges conditionally for s > 0.
Now,
\[ \log L(s, \chi) = -\sum_p \log\left( 1 - \frac{\chi(p)}{p^s} \right) = \sum_p \sum_{k=1}^{\infty} \frac{\chi(p)^k}{k p^{ks}}. \]
Recall that log of complex numbers is dangerous, because it is not single valued: it is only
determined up to adding multiples of 2πi. When s > 1, we have
\[ \log L(s, \chi) = \sum_p \frac{\chi(p)}{p^s} + O(1). \]
Then,
\[ \sum_{p \equiv 1 \pmod 5} \frac{1}{p^s} = \frac{1}{4}\left( \log L(s, \chi_0) + \log L(s, \chi_5) + \log L(s, \psi) + \log L(s, \bar{\psi}) \right) + O(1). \]
We want L(1, χ5), L(1, ψ), L(1, $\bar{\psi}$) to be not infinity or zero. For L(s, ψ) and L(s, $\bar{\psi}$), we
can look at the imaginary part, i.e.
\[ \Im L(1, \psi) = \frac{1}{2} - \frac{1}{3} + \frac{1}{7} - \frac{1}{8} + \cdots > 0. \]
We can do this trick of writing it as an integral:
\[ L(1, \chi_5) = \int_0^1 \left( 1 - t - t^2 + t^3 + t^5 - t^6 - t^7 + t^8 + \cdots \right) dt = \int_0^1 \frac{1 - t - t^2 + t^3}{1 - t^5}\, dt, \]
and this is left to you as an exercise. Recall from calculus that any rational function can be
integrated. This will have some nice answer in terms of logs of the golden ratio.
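Without doing the exercise by hand, one can at least check numerically that the series and the integral agree and that the value is not zero. The sketch below uses a midpoint rule for the integral. The closed form (2/√5) log((1 + √5)/2) used for comparison is not stated in the notes; it is the value I would expect from the standard evaluation, and here it is only tested numerically.

```python
import math

def chi5(n):
    """The real character mod 5 from the table: 1, -1, -1, 1 on 1, 2, 3, 4."""
    return (0, 1, -1, -1, 1)[n % 5]

def series(terms):
    """Partial sum of sum chi5(n)/n."""
    return sum(chi5(n) / n for n in range(1, terms + 1))

def integral(steps=20_000):
    """Midpoint rule for the integral of (1 - t - t^2 + t^3)/(1 - t^5) on [0, 1]."""
    h = 1.0 / steps
    total = 0.0
    for j in range(steps):
        t = (j + 0.5) * h
        total += (1 - t - t**2 + t**3) / (1 - t**5)
    return total * h

golden = (1 + math.sqrt(5)) / 2
closed_form = 2 * math.log(golden) / math.sqrt(5)   # assumed closed form
print(series(100_000), integral(), closed_form)
```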
We can do the same thing for every progression (mod 5) to see that there are infinitely
many primes in every progression, and in fact, a quarter in each progression.
We need to prove orthogonality relations for the character table to make sure that we
can produce every arithmetic progression, and we need to find a more general form of the
alternating series test. Then we will have a way to isolate the primes in every progression,
so when we let s → 1+ , we will need to show that the L-functions are not infinity or zero.
13.1. Dirichlet characters (mod q).
Definition 13.1.1. A Dirichlet character χ (mod q) is a function χ : Z → C with the properties
(1) χ(n) = 0 if and only if (n, q) > 1.
(2) χ(n + q) = χ(n)
(3) χ(mn) = χ(m)χ(n) for all m, n.
Therefore, we have χ : (Z/qZ)× → C× is a group homomorphism.
How do we figure out what these functions are? We know that χ(1) = 1 because χ(n) =
χ(n)χ(1) for all n. Now, if (a, q) = 1,
\[ \chi(a)^{\phi(q)} = \chi\left( a^{\phi(q)} \right) = \chi(1) = 1, \]
Proof. The second statement follows from the first statement applied to $\chi\bar{\psi}$ (note that $\chi\bar{\psi} = \chi_0$ ⇔ χ = ψ).
So we only have to prove the first statement.
Define
\[ S(\chi) = \sum_{n \pmod q} \chi(n). \]
Take a number c so that (c, q) = 1, and multiply both sides by χ(c). So
\[ \chi(c) S(\chi) = \sum_{n \pmod q} \chi(c)\chi(n) = \sum_{n \pmod q} \chi(cn) = \sum_{m \pmod q} \chi(m) = S(\chi). \]
This means that either S(χ) = 0 or χ(c) = 1. If S(χ) ≠ 0 then χ(c) = 1 for all (c, q) = 1, so
that χ = χ0, which is what we wanted to prove.
13.2.2. Characters for composite moduli. Next, we want to show the orthogonality of columns
and we want to know what all of the characters are. We want to show how to construct
characters for composite moduli.
Consider characters χ1 (mod q1) and χ2 (mod q2), (q1, q2) = 1. We can define χ1χ2(n) =
χ1(n)χ2(n). We claim that χ1χ2 is a character (mod q1q2). This is periodic by the Chinese
Remainder Theorem. This is also completely multiplicative, so it is a character.
So if $q = p_1^{\alpha_1} p_2^{\alpha_2} \cdots p_k^{\alpha_k}$ and we have characters χ1 (mod $p_1^{\alpha_1}$), χ2 (mod $p_2^{\alpha_2}$), · · · , χk (mod $p_k^{\alpha_k}$),
we can multiply these to get a character χ = χ1χ2 · · · χk (mod q).
Remark. If we take another choice ψ1 (mod $p_1^{\alpha_1}$), ψ2 (mod $p_2^{\alpha_2}$), · · · , ψk (mod $p_k^{\alpha_k}$), we can
construct ψ (mod q).
Say χ1 ≠ ψ1. Then we want to say that χ ≠ ψ.
Proof. Say χ1(n) ≠ ψ1(n). Choose m ≡ n (mod $p_1^{\alpha_1}$), m ≡ 1 (mod $p_j^{\alpha_j}$) for all j > 1. Then
χ(m) = χ1(n) and ψ(m) = ψ1(n), so χ(m) ≠ ψ(m).
This tells us that we have at least φ(q) distinct characters (mod q). Consider χψ (mod q).
This corresponds to χ1ψ1 (mod $p_1^{\alpha_1}$), χ2ψ2 (mod $p_2^{\alpha_2}$), · · · , χkψk (mod $p_k^{\alpha_k}$).
Let H denote the group of characters that we obtain by multiplying characters (mod $p_j^{\alpha_j}$).
We would like to show that H = G.
13.2.3. Orthogonality of columns.
Proposition 13.2.2. Let (n, q) = 1. Then
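\[ \sum_{\chi \in H} \chi(n) = \begin{cases} \phi(q) & n \equiv 1 \pmod q \\ 0 & n \not\equiv 1 \pmod q \end{cases} \]
Given this, we can generalize slightly to get
\[ \sum_{\chi \in H} \chi(n)\bar{\chi}(a) = \begin{cases} \phi(q) & n \equiv a \pmod q \\ 0 & n \not\equiv a \pmod q \end{cases} \]
Proof. Note that statement 2 follows from statement 1. Since aa^{−1} ≡ 1 (mod q), we have
χ(a)χ(a^{−1}) = 1, so $\bar{\chi}(a) = \chi(a^{-1})$, and then
\[ \sum_{\chi \in H} \chi(n)\bar{\chi}(a) = \sum_{\chi \in H} \chi(na^{-1}). \]
Define $S(n) = \sum_{\chi \in H} \chi(n)$. Take ψ ∈ H. Then
\[ \psi(n) S(n) = \psi(n) \sum_{\chi \in H} \chi(n) = \sum_{\chi \in H} (\psi\chi)(n) = \sum_{\rho \in H} \rho(n) = S(n). \]
Therefore, either S(n) = 0 or ψ(n) = 1 for all ψ ∈ H. If ψ(n) = 1 for all ψ ∈ H, then n ≡ 1
(mod q).
Check this: Prove it for q = p^α.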
14. 11/8
14.1. Review. We are interested in Dirichlet characters χ : (Z/qZ)× → C×, and in fact,
they always have values that lie on the unit circle. The values of χ are φ(q)-th roots of unity.
The characters χ form a group, where the principal character χ0 is the identity, and
$\chi^{-1} = \bar{\chi}$, the complex conjugate, is the inverse. If χ, ψ are characters, then χψ(n) = χ(n)ψ(n)
is a character.
Let the group of characters (mod q) be G. This is a finite abelian group.
We proved the first orthogonality relation:
\[ \sum_{n \pmod q} \chi(n) = \begin{cases} \phi(q) & \chi = \chi_0 \\ 0 & \chi \ne \chi_0 \end{cases}. \]
We discussed the case q = p^α, p odd. Here, (Z/qZ)× is cyclic, so it is easy to see what
the characters are. We can pick a generator g. Then χ(g) determines χ(g^k) for all k, and
so χ is determined for all reduced residue classes. Note that we can have $\chi(g) = e^{2\pi i l/\phi(q)}$ for
0 ≤ l ≤ φ(q) − 1. Therefore, for q = p^α, we explicitly described φ(q) characters (mod q).
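For a concrete modulus this description can be turned into a table and both orthogonality relations can be checked. The sketch below handles only an odd prime q (so φ(q) = q − 1), finds a generator by brute force, and sets χ_l(g^k) = e^{2πilk/(q−1)}. The function names are hypothetical, and the brute-force search is only sensible for small q.

```python
import cmath

def primitive_root(q):
    """Smallest generator of (Z/qZ)^x for an odd prime q, by brute force."""
    for g in range(2, q):
        if len({pow(g, k, q) for k in range(1, q)}) == q - 1:
            return g
    raise ValueError("no primitive root found; is q an odd prime?")

def characters(q):
    """All q-1 characters mod an odd prime q, each as a dict n -> chi(n)."""
    g = primitive_root(q)
    log = {pow(g, k, q): k for k in range(q - 1)}      # discrete logarithm table
    order = q - 1
    return [
        {n: cmath.exp(2j * cmath.pi * l * log[n] / order) for n in log}
        for l in range(order)
    ]

chars = characters(7)
# First relation: summing a character over n gives phi(q) or 0.
print([round(abs(sum(chi.values())), 9) for chi in chars])
# Second relation: summing over characters at fixed n gives phi(q) or 0.
print([round(abs(sum(chi[n] for chi in chars)), 9) for n in range(1, 7)])
```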
Now, we describe what happens for a composite modulus. Suppose $q = p_1^{\alpha_1} p_2^{\alpha_2} \cdots p_k^{\alpha_k}$.
Consider characters
χ1 (mod $p_1^{\alpha_1}$), χ2 (mod $p_2^{\alpha_2}$), · · · , χk (mod $p_k^{\alpha_k}$).
Then, define χ (mod q) via χ(n) = χ1(n)χ2(n) · · · χk(n). Different choices for (χ1, · · · , χk)
give different choices of χ (mod q). We have therefore constructed φ(q) characters χ (mod q)
in this fashion. The characters we have constructed in this way also form a group. Call this
group H. We want to show that G = H; all of the characters arise this way. To find
characters for a composite modulus, multiply characters for prime power moduli.
To do this, we proved another orthogonality relation. Given (n, q) = 1,
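\[ \sum_{\chi \in H} \chi(n) = \begin{cases} \phi(q) & n \equiv 1 \pmod q \\ 0 & n \not\equiv 1 \pmod q \end{cases}. \]
This is the same as saying that if (n, q) = 1, (a, q) = 1, then
\[ \sum_{\chi \in H} \chi(n)\bar{\chi}(a) = \begin{cases} \phi(q) & n \equiv a \pmod q \\ 0 & n \not\equiv a \pmod q \end{cases}. \]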
Proof. Let
\[ S(n) = \sum_{\chi \in H} \chi(n). \]
Take any character ψ ∈ H. Then
\[ \psi(n) S(n) = \sum_{\chi \in H} \chi\psi(n) = \sum_{\chi \in H} \chi(n) = S(n), \]
so either S(n) = 0 or ψ(n) = 1 for all characters ψ ∈ H. The only way the latter condition
can hold is when n ≡ 1 (mod q). The proof is to pick χ2, · · · , χk to all be the trivial
character, and only vary χ1. Writing n ≡ g^k (mod $p_1^{\alpha_1}$) for a generator g, the choice $\chi_1(g) = e^{2\pi i l/\phi(p_1^{\alpha_1})}$ gives
$\chi(n) = e^{2\pi i k l/\phi(p_1^{\alpha_1})}$, which equals 1 for every l only if k ≡ 0 (mod $\phi(p_1^{\alpha_1})$), and hence n ≡ 1 (mod $p_1^{\alpha_1}$). The same
argument holds for the other characters χ2, · · · , χk, and so we’re done.
We get for free that G = H and there are no more characters (mod q). Suppose that X is
some character (mod q) which is in G but not H. By the first orthogonality relation, if we
take any ψ ∈ H,
\[ \sum_{n \pmod q} X(n)\bar{\psi}(n) = 0. \]
Now take any (c, q) = 1 and multiply both sides by ψ(c). Then
\[ \sum_{n \pmod q} X(n)\bar{\psi}(n)\psi(c) = 0. \]
Summing over all ψ ∈ H and using the second orthogonality relation, only the terms with n ≡ c (mod q) survive, which gives
\[ \phi(q) \cdot X(c) = 0, \]
so therefore X(c) = 0 for all (c, q) = 1, so X(·) is identically zero and therefore G = H.
Here is another way to think about the previous discussion. We are interested in the space
of all functions (Z/qZ)× → C. This is a vector space over C of dimension φ(q). There are
some vectors in this space that we like. A nice basis for the space would be (for (a, q) = 1):
\[ f_a(n) = \begin{cases} 1 & n \equiv a \pmod q \\ 0 & n \not\equiv a \pmod q \end{cases}. \]
We’ve written down another basis for the space. This is not as simple as the previous basis,
but it has another very important property: Our new basis consists of group homomorphisms.
These are the characters χ : (Z/qZ)× → C, and they respect the group structure. There
are exactly φ(q) of these, and they also form an orthogonal basis by the first orthogonality
relation. The second orthogonality relation is simply a change of basis relation between our
two bases.
Remark. This is actually an important principle. Given some arbitrary group, we can write
down bases of the space of maps from the group to some set. If the group is abelian, life is
wonderful. If not, that’s the realm of representation theory.
Dirichlet did this before the idea of abstract groups, so he was the first person to deal
with these characters. These ideas predate the idea of groups.
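14.2. Plan of proof of Dirichlet’s Theorem. We now know that there are φ(q) characters
χ (mod q). For each, form
\[ L(s, \chi) = \sum_{n=1}^{\infty} \frac{\chi(n)}{n^s} = \prod_p \left( 1 + \frac{\chi(p)}{p^s} + \frac{\chi(p^2)}{p^{2s}} + \cdots \right) = \prod_p \left( 1 - \frac{\chi(p)}{p^s} \right)^{-1}. \]
If ℜ(s) > 1, both the series and the product converge absolutely, and L(s, χ) ≠ 0 if ℜ(s) > 1.
Recall that if χ = χ0, we have
\[ L(s, \chi_0) = \zeta(s) \prod_{p \mid q} \left( 1 - \frac{1}{p^s} \right) = \prod_{p \nmid q} \left( 1 - \frac{1}{p^s} \right)^{-1}. \]
As s → 1^+, $\prod_{p \mid q} \left( 1 - \frac{1}{p^s} \right) \to \frac{\phi(q)}{q}$ and ζ(s) → ∞. For s > 1,
\[ \log L(s, \chi) = \sum_p \sum_{k=1}^{\infty} \frac{\chi(p^k)}{k p^{ks}} = \sum_p \frac{\chi(p)}{p^s} + O(1). \]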
Now, given a residue class a (mod q), (a, q) = 1, we have
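\begin{align*}
\sum_{p \equiv a \pmod q} \frac{1}{p^s} &= \sum_p \frac{1}{p^s} \cdot \frac{1}{\phi(q)} \sum_{\chi \pmod q} \chi(p)\bar{\chi}(a) \\
&= \frac{1}{\phi(q)} \sum_{\chi \pmod q} \bar{\chi}(a) \left( \sum_p \frac{\chi(p)}{p^s} \right) \\
&= \frac{1}{\phi(q)} \sum_{\chi \pmod q} \bar{\chi}(a) \left( \log L(s, \chi) + O(1) \right) \\
&= \frac{1}{\phi(q)} \sum_{\chi \pmod q} \bar{\chi}(a) \log L(s, \chi) + O(1).
\end{align*}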
We want this to diverge because that would give infinitely many primes ≡ a (mod q). Now,
for χ = χ0 , we have log L(s, χ0 ) = log ζ(s) + O(1) → +∞ as s → 1+ .
The crux of the remainder of the proof will be to show that for χ ≠ χ0, as s → 1^+,
L(s, χ) does not tend to 0 or ∞, i.e. that log L(s, χ) is bounded as s → 1^+.
One part will be easy. To show that L(s, χ) does not go to infinity, we only need a
generalization of the alternating series test. To show that this does not go to zero is quite hard,
especially for real characters.
14.3. Generalization of alternating series test. We generalize the alternating series test
to show that L(s, χ) makes sense for s > 0.
14.3.1. Partial summation. Assume that there is some sequence of complex numbers an , and
assume that there is “a nice function” f (n). Then we want to consider
\[ \sum_{n=A+1}^{B} a_n f(n). \]
Define
\[ s_n = \sum_{k=1}^{n} a_k. \]
We can now write
\begin{align*}
\sum_{n=A+1}^{B} a_n f(n) &= \sum_{n=A+1}^{B} (s_n - s_{n-1}) f(n) = \sum_{n=A+1}^{B} s_n f(n) - \sum_{n=A+1}^{B} s_{n-1} f(n) \\
&= \sum_{n=A+1}^{B} s_n f(n) - \sum_{n=A}^{B-1} s_n f(n+1) \\
&= s_B f(B+1) - s_A f(A+1) - \sum_{n=A+1}^{B} s_n \left( f(n+1) - f(n) \right).
\end{align*}
This is precisely the alternating series test. Note that all that we needed was that the sn are
bounded.
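The partial summation identity is easy to get wrong by an index, so here is a small exact check. It compares the direct sum with the right-hand side s_B f(B + 1) − s_A f(A + 1) − Σ s_n (f(n + 1) − f(n)), using rational arithmetic. The sample sequence a_n = (−1)^(n+1) and f(n) = 1/n are my own choices for illustration.

```python
from fractions import Fraction

def direct_sum(a, f, A, B):
    """Sum of a(n) f(n) for A < n <= B."""
    return sum(a(n) * f(n) for n in range(A + 1, B + 1))

def by_parts(a, f, A, B):
    """Right-hand side of the partial summation formula."""
    s = [Fraction(0)]                       # s[n] = a(1) + ... + a(n)
    for n in range(1, B + 1):
        s.append(s[-1] + a(n))
    boundary = s[B] * f(B + 1) - s[A] * f(A + 1)
    correction = sum(s[n] * (f(n + 1) - f(n)) for n in range(A + 1, B + 1))
    return boundary - correction

a = lambda n: (-1) ** (n + 1)               # partial sums are 1, 0, 1, 0, ...
f = lambda n: Fraction(1, n)                # decreases to zero
print(direct_sum(a, f, 10, 50) == by_parts(a, f, 10, 50))
```

With |s_n| ≤ 1 here, the right-hand side is bounded by 2 f(A + 1), which is how the alternating series test falls out.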
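Proposition 14.4.1. Given an ∈ C, with |sn| ≤ S. Suppose that f(n) is monotone decreasing
to zero. Then
\[ \sum_n a_n f(n) \]
converges.
Proof. We just need to show that the partial sums form a Cauchy sequence, i.e.
\[ \left| \sum_{n=A+1}^{B} a_n f(n) \right| < \varepsilon \]
for all B > A once A is sufficiently large. By the partial summation formula and |sn| ≤ S, the left side is at most 2S f(A + 1).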
15. 11/10
15.1. Partial summation. We introduced the idea of partial summation: If there is a nice
sequence of complex numbers an and some nice function f(n), and if $s_n = \sum_{k \le n} a_k$, then
\[ \sum_{n=A+1}^{B} a_n f(n) = \sum_{n=A+1}^{B} f(n)(s_n - s_{n-1}) = s_B f(B+1) - s_A f(A+1) - \sum_{n=A+1}^{B} s_n \left( f(n+1) - f(n) \right). \]
Think of this as integration, i.e.
\begin{align*}
\sum_{n=A+1}^{B} f(n)(s(n) - s(n-1)) &= \int_{(A+1)^-}^{B^+} f(t)\, d(s_t) = f(t) s_t \Big|_{(A+1)^-}^{B^+} - \int_{(A+1)^-}^{B^+} f'(t) s_t\, dt \\
&= s_{B^+} f(B^+) - s_{(A+1)^-} f((A+1)^-) - \int_{(A+1)^-}^{B^+} f'(t) s_t\, dt.
\end{align*}
The point is that if f is nice and differentiable, we can rewrite our sums as integrals.
Last time, we considered the alternating series test as a nice application of this. Here are
more applications:
15.1.1. Applications.
Proposition 15.1.1.
\[ \sum_{n \le x} \frac{1}{n} = \log x + \gamma + O\left( \frac{1}{x} \right), \]
where γ ≈ 0.577 . . . is Euler’s constant.
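Proof. Here,
\[ s_t = \sum_{1 \le n \le t} 1 = [t], \]
an = 1 for n ∈ N, and f(t) = 1/t. We are interested in
\begin{align*}
\sum_{n \le x} \frac{1}{n} &= \int_{1^-}^{x^+} \frac{1}{t}\, d([t]) = \frac{1}{t}[t] \Big|_{1^-}^{x^+} + \int_{1^-}^{x^+} \frac{1}{t^2}[t]\, dt = \frac{[x^+]}{x^+} + \int_{1^-}^{x^+} \frac{[t]}{t^2}\, dt \\
&= 1 + O\left( \frac{1}{x} \right) + \int_{1^-}^{x^+} \frac{t - \{t\}}{t^2}\, dt = 1 + O\left( \frac{1}{x} \right) + \log x - \int_{1^-}^{x^+} \frac{\{t\}}{t^2}\, dt.
\end{align*}
Here, [t] denotes the integer part of t and {t} denotes the fractional part of t, and
\[ \int_{1^-}^{x^+} \frac{\{t\}}{t^2}\, dt = \int_1^\infty \frac{\{t\}}{t^2}\, dt - \int_x^\infty \frac{\{t\}}{t^2}\, dt = \text{constant} - O\left( \frac{1}{x} \right). \]
We have therefore proved that
\[ \sum_{n \le x} \frac{1}{n} = (\log x) + 1 - \int_1^\infty \frac{\{t\}}{t^2}\, dt + O\left( \frac{1}{x} \right). \]
Let $\gamma = 1 - \int_1^\infty \frac{\{t\}}{t^2}\, dt$. It is unknown if γ is irrational.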
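The O(1/x) error term can be observed numerically. The sketch below computes the harmonic sum minus log x for a few integer values of x and prints the error scaled by x, which should stay bounded. The numerical value of γ is hard-coded as a reference constant.

```python
import math

GAMMA = 0.5772156649015329     # Euler's constant, reference value

def harmonic(x):
    """Sum of 1/n for 1 <= n <= x (x a positive integer)."""
    return sum(1.0 / n for n in range(1, x + 1))

for x in (10, 100, 1000, 100_000):
    error = harmonic(x) - math.log(x) - GAMMA
    print(x, error, x * error)   # x * error stays bounded
```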
Remark.
\[ \prod_{p \le x} \left( 1 - \frac{1}{p} \right) \sim \frac{e^{-\gamma}}{\log x}. \]
A similar method can be used to prove other formulas, such as Stirling’s formula. Another
problem is to show that
\[ \zeta(2) = \frac{1}{1^2} + \frac{1}{2^2} + \cdots = \frac{\pi^2}{6}. \]
The point is that computing the sum of the first few terms can already lead to a small error, because
what is left isn’t just any random thing. See the homework for details.
15.2. L(s, χ). Let χ be a character (mod q), χ ≠ χ0. Then
\[ L(s, \chi) = \sum_{n=1}^{\infty} \frac{\chi(n)}{n^s}, \qquad s > 1. \]
We want an expression that makes sense even when s > 0. Let
\[ S_\chi(x) = \sum_{n \le x} \chi(n). \]
Then
\[ L(s, \chi) = \int_{1^-}^{\infty} \frac{1}{t^s}\, d(S_\chi(t)) = \frac{S_\chi(t)}{t^s} \Big|_{1^-}^{\infty} + s \int_{1^-}^{\infty} \frac{S_\chi(t)}{t^{s+1}}\, dt = s \int_1^\infty \frac{S_\chi(t)}{t^{s+1}}\, dt. \]
Note that
\[ |S_\chi(t)| \le \phi(q) \quad \text{for all } t \ge 0. \]
Therefore, the preceding integral converges provided that s > 0. If you thought of this as a
complex integral, s = σ + iy, this converges if σ = ℜs > 0. Note that we have omitted the
case where χ = χ0, or the case of the Riemann zeta function. This can also be done for ζ(s);
see the homework. We know that ζ(s) must blow up at s = 1, but we’ll get an analog of this
feature to get something that makes sense for s > 0. This is basically analytic continuation.
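Since S_χ(t) is a step function, the integral s∫ S_χ(t) t^(−s−1) dt can be evaluated piece by piece: on [n, n + 1) it contributes S_χ(n)(n^(−s) − (n + 1)^(−s)). The sketch below does this for the character χ−4 and compares the value at s = 1 with π/4. The function names and the truncation point are assumptions for illustration.

```python
import math

def chi_minus4(n):
    """The nontrivial character mod 4: 1, 0, -1, 0 on n = 1, 2, 3, 4."""
    return (0, 1, 0, -1)[n % 4]

def L_via_partial_sums(s, terms):
    """Truncation of s * integral_1^inf S(t) t^(-s-1) dt for chi_{-4}.

    Returns the value and the largest |S(n)| seen along the way.
    """
    total, S, max_abs_S = 0.0, 0, 0
    for n in range(1, terms + 1):
        S += chi_minus4(n)                         # S = S_chi(n)
        max_abs_S = max(max_abs_S, abs(S))
        total += S * (n ** -s - (n + 1) ** -s)     # piece over [n, n+1)
    return total, max_abs_S

value, bound = L_via_partial_sums(1, 100_000)
print(value, math.pi / 4, bound)
print(L_via_partial_sums(0.5, 100_000)[0])         # also converges for 0 < s < 1
```

Because the partial sums stay bounded, the pieces decay like n^(−s−1), so this form converges absolutely for every s > 0, unlike the original series.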
The point is that if we consider as an example
\[ 1 + z + z^2 + z^3 + \cdots = \frac{1}{1 - z}, \]
the sum makes sense only when |z| < 1, but the right hand side makes sense for z ≠ 1. They
agree when both are well-defined, but one of them is more general. Happily, there is only
one way to do this.
Claim. If χ ≠ χ0 and σ > 0, then L(σ, χ) is infinitely differentiable.
How do you even differentiate this once? Here,
\[ \frac{d}{ds} L(s, \chi) = \sum_{n=1}^{\infty} \chi(n) \frac{d}{ds} e^{-s \log n} = \sum_{n=1}^{\infty} \frac{-(\log n)\chi(n)}{n^s} = -\int_{1^-}^{\infty} \frac{\log t}{t^s}\, d(S_\chi(t)). \]
This will be absolutely convergent if s > 1, but it actually converges for s > 0. Part of the
point here is that χ(n) has positive and negative signs.
\[ \left( s \int_1^\infty \frac{S_\chi(t)}{t^{s+1}}\, dt \right)' = \int_1^\infty \frac{S_\chi(t)}{t^{s+1}}\, dt + s \int_1^\infty \frac{S_\chi(t)}{t^{s+1}} (-\log t)\, dt. \]
Now, we should be reasonably happy that L(σ, χ) is once differentiable for all σ > 0. To be
completely rigorous, we want to show
\[ \left| \frac{L(\sigma + \delta, \chi) - L(\sigma, \chi)}{\delta} - L'(\sigma, \chi) \right| < \varepsilon. \]
So what we’re claiming is that L(σ, χ) are very nice functions.
In particular, if σ is very close to one, we can use Taylor’s theorem:
\[ L(\sigma, \chi) = L(1, \chi) + (\sigma - 1) L'(1, \chi) + \frac{(\sigma - 1)^2}{2!} L''(1, \chi) + \cdots. \]
We go back to the proof of Dirichlet’s theorem. For σ > 1, we have
\[ \sum_{p \equiv a \pmod q} \frac{1}{p^\sigma} = \frac{1}{\phi(q)} \sum_{\chi \pmod q} \bar{\chi}(a) \log L(\sigma, \chi) + O(1). \]
If χ = χ0, we know that
\[ \log L(\sigma, \chi_0) = \log \zeta(\sigma) + O(1) = \log \frac{1}{\sigma - 1} + O(1). \]
Therefore,
\[ \sum_{p \equiv a \pmod q} \frac{1}{p^\sigma} = \frac{1}{\phi(q)} \log \frac{1}{\sigma - 1} + \frac{1}{\phi(q)} \sum_{\chi \ne \chi_0} \bar{\chi}(a) \log L(\sigma, \chi) + O(1). \]
Now, as σ → 1^+, L(σ, χ) → L(1, χ), which is finite for χ ≠ χ0.
We make a key assumption that L(1, χ) ≠ 0 for every χ ≠ χ0. If this is true, we are done
with Dirichlet’s Theorem.
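15.3. L(1, χ). If χ is a complex character ($\chi \ne \bar{\chi}$), then L(1, χ) ≠ 0. Moreover, if χ is a real
character, then either L(1, χ) or L'(1, χ) is not zero. (If there exists a zero at 1, it must be
a simple zero.)
Proof. Suppose that χ is complex. Then
\[ L(1, \chi) = 0 \iff L(1, \bar{\chi}) = 0 \]
because
\[ \sum_n \frac{\chi(n)}{n} = 0 \iff \sum_n \frac{\bar{\chi}(n)}{n} = \overline{\sum_n \frac{\chi(n)}{n}} = 0. \]
Take a = 1. Then
\[ \sum_{p \equiv 1 \bmod q} \frac{1}{p^\sigma} = \frac{1}{\phi(q)} \log \frac{1}{\sigma - 1} + \frac{1}{\phi(q)} \sum_{\chi \ne \chi_0} \log L(\sigma, \chi) + O(1). \]
Now, suppose that L(σ, χ) has a zero of order mχ at σ = 1. This means that
\[ L(1, \chi) = L'(1, \chi) = \cdots = L^{(m_\chi - 1)}(1, \chi) = 0. \]
We want to say that mχ = 0.
We have
\[ L(\sigma, \chi) \approx c_\chi (\sigma - 1)^{m_\chi}. \]
Therefore,
\begin{align*}
\sum_{p \equiv 1 \bmod q} \frac{1}{p^\sigma} &= \frac{1}{\phi(q)} \left( \log \frac{1}{\sigma - 1} + \sum_{\chi \ne \chi_0} m_\chi \log(\sigma - 1) \right) + O(1) \\
&= \frac{1}{\phi(q)} \log \frac{1}{\sigma - 1} \left( 1 - \sum_{\chi \ne \chi_0} m_\chi \right) + O(1).
\end{align*}
Since the left hand side is a sum of positive terms, the right hand side cannot tend to −∞ as σ → 1^+, so
\[ \sum_{\chi \ne \chi_0} m_\chi \le 1. \]
If χ is complex and L(1, χ) = 0, then both mχ ≥ 1 and $m_{\bar{\chi}} \ge 1$, contradicting this bound. If χ is real, the bound gives mχ ≤ 1.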
16. 11/15
Today we should finish the proof of Dirichlet’s Theorem.
Here’s a quick survey of what we’ve done so far.
16.1. Review. We’ve found Dirichlet characters χ (mod q) to isolate the arithmetic progressions.
We’ve also defined absolutely convergent functions
\[ L(s, \chi) = \sum_{n=1}^{\infty} \frac{\chi(n)}{n^s} = \prod_p \left( 1 - \frac{\chi(p)}{p^s} \right)^{-1}. \]
When χ ≠ χ0, we can extend this to something that makes sense for s > 0. We did this last
time by writing
\[ L(s, \chi) = \int_1^\infty \frac{1}{y^s}\, d(s_\chi(y)) = s \int_1^\infty \frac{s_\chi(y)}{y^{s+1}}\, dy. \]
This is infinitely differentiable. Also, as s → 1, we have that L(s, χ) does not tend to ∞. We just need to
show that it is nonzero.
Why do we want to do this?
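\begin{align*}
\sum_{\chi \pmod q} \bar{\chi}(a) \log L(s, \chi) &= \sum_{\chi \pmod q} \bar{\chi}(a) \sum_{k, p} \frac{\chi(p^k)}{k p^{ks}} \\
&= \phi(q) \sum_{\substack{k, p \\ p^k \equiv a \pmod q}} \frac{1}{k p^{ks}} = \phi(q) \sum_{p \equiv a \pmod q} \frac{1}{p^s} + O(1).
\end{align*}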
We already know what happens for complex-valued characters. Let’s recap that. If s is a
real number σ, then all of the terms on the right hand side are positive. Take a = 1. Then
\[ \sum_{\chi \pmod q} \log L(s, \chi) = \phi(q) \sum_{\substack{p, k \\ p^k \equiv 1 \pmod q}} \frac{1}{k p^{ks}} \ge 0 \quad \text{if } s > 1. \]
Then
\[ \prod_{\chi \pmod q} L(s, \chi) \ge 1. \]
Let s → 1^+. This product contains one term that goes to infinity. This means that at most
one term can go to zero. First, the product is real because the factors come in
conjugate pairs.
If L(1, χ) = 0, then by Taylor,
\[ |L(s, \chi)| \le C(s - 1) \quad \text{for } s \text{ close to } 1. \]
Now, if χ is a complex character, with L(1, χ) = 0, then L(1, $\bar{\chi}$) = 0 also, and
\[ \prod_{\chi \bmod q} L(s, \chi) \le \frac{C}{s - 1} \cdot C(s - 1) \cdot C(s - 1) \cdot C = C^4 (s - 1), \]
where the factors on the right represent χ0, χ, $\bar{\chi}$, and all other characters. This contradicts
that the product is at least 1.
If χ is a real character, then in the same way (Taylor approximation), we see that
L(1, χ) and L'(1, χ) can’t both be zero.
16.2. L(1, χ) ≠ 0 for real characters χ. Now, we just need to show that if χ is a real
character (mod q), L(1, χ) ≠ 0. We did several examples in the homework. This is the
hardest part of the proof. Dirichlet gave a beautiful proof of this: In half of cases, you get
something in terms of π, and in other cases, you get things like the golden ratio. We’ll discuss
that in the next several lectures. Here, we’ll give a slick proof that is harder to understand
but can be done more quickly.
Define
\[ r_\chi(n) = \sum_{d \mid n} \chi(d) = \sum_{n = ab} \chi(a). \]
Since this function is multiplicative, we only need to figure out what this does on prime
powers. So
\[ r_\chi(p^k) = 1 + \chi(p) + \chi(p^2) + \cdots + \chi(p^k) = \begin{cases} k + 1 & \chi(p) = 1 \\ 1 & \chi(p) = 0 \\ 0 & \chi(p) = -1,\ k \text{ odd} \\ 1 & \chi(p) = -1,\ k \text{ even.} \end{cases} \]
This looks like writing numbers as the sum of two squares. Here, for χ = χ−4,
\[ r_\chi(p) = \begin{cases} 2 & p \equiv 1 \pmod 4 \\ 0 & p \equiv 3 \pmod 4 \\ 1 & p = 2 \end{cases} \]
So we can interpret 4rχ(n) as the number of ways of writing n = x^2 + y^2. Another way to
think about this is prime factorization in the Gaussian integers. There are only eight ways
of writing p = $\pi\bar{\pi}$. The point is that rχ(n) is something that we should care about.
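The interpretation of 4rχ(n) as a count of representations is easy to test by brute force. The sketch below computes rχ(n) for χ = χ−4 as a divisor sum and compares 4rχ(n) with a direct count of integer pairs (x, y) with x^2 + y^2 = n. The function names are illustrative.

```python
import math

def chi_minus4(n):
    """The nontrivial character mod 4."""
    return (0, 1, 0, -1)[n % 4]

def r_chi(n):
    """Sum of chi_{-4}(d) over the divisors d of n."""
    return sum(chi_minus4(d) for d in range(1, n + 1) if n % d == 0)

def representations(n):
    """Number of integer pairs (x, y), signs and order counted, with x^2 + y^2 = n."""
    count = 0
    bound = math.isqrt(n)
    for x in range(-bound, bound + 1):
        y_squared = n - x * x
        y = math.isqrt(y_squared)
        if y * y == y_squared:
            count += 1 if y == 0 else 2     # y and -y
    return count

for n in (2, 3, 5, 9, 25, 65):
    print(n, r_chi(n), representations(n))
```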
The idea of the proof is to take x large and consider
\[ \sum_{n \le x} \frac{r_\chi(n)}{\sqrt{n}}. \]
This is
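\begin{align*}
\sum_{n \le x} d(n) &= \sum_{n \le x} \sum_{d \mid n} 1 = \sum_{d \le x} \sum_{\substack{n \le x \\ d \mid n}} 1 = \sum_{d \le x} \left\lfloor \frac{x}{d} \right\rfloor = \sum_{d \le x} \left( \frac{x}{d} + O(1) \right) \\
&= x \left( \log x + \gamma + O\left( \frac{1}{x} \right) \right) + O(x) = x \log x + O(x).
\end{align*}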
This procedure is very wasteful, since the error we get is not so good. Why is the error term
not so good? Approximation of the floor is not so good. This is bad when we know more
about the floor.
Dirichlet (in a different context) proved an asymptotic formula with error O(√x). This is
done using the hyperbola method. An example of a hyperbola is ab = x; we are interested in
counting lattice points lying below the hyperbola ab = x. Dirichlet’s idea is to pick a point
(A, B) on the hyperbola. We can count the points inside the hyperbola with a ≤ A, and we
have to add back the terms where A < a and b ≤ B. This gives us two cases.
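The two-case split can be tried out on the divisor sum itself. The sketch below counts lattice points (a, b) with ab ≤ x once naively and once by splitting at A = ⌊√x⌋: first the points with a ≤ A, then, for each b, the points with a > A. Choosing A as the integer square root and the exact range for b are my own bookkeeping choices, made so that the count is exact.

```python
import math

def divisor_sum_naive(x):
    """Sum of d(n) for n <= x, as sum over d of floor(x/d): O(x) terms."""
    return sum(x // d for d in range(1, x + 1))

def divisor_sum_hyperbola(x):
    """Same count via the split at A = floor(sqrt(x)): O(sqrt(x)) terms."""
    A = math.isqrt(x)
    # Case 1: points with a <= A, any b <= x/a.
    case1 = sum(x // a for a in range(1, A + 1))
    # Case 2: points with a > A; these force b <= x/(A+1).
    B = x // (A + 1)
    case2 = sum(x // b - A for b in range(1, B + 1))
    return case1 + case2

x = 10**6
print(divisor_sum_hyperbola(x), x * math.log(x))
```

Only about 2√x floor evaluations are needed in the second version, which is where the improved error term comes from.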
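Case 1:
\[ \sum_{\substack{a, b \\ ab \le x \\ a \le A}} 1 = \sum_{a \le A} \sum_{b \le x/a} 1 = \sum_{a \le A} \left( \frac{x}{a} + O(1) \right) = x \left( \log A + \gamma + O\left( \frac{1}{A} \right) \right) + O(A) \]
Case 2:
\[ \sum_{b \le B} \frac{1}{\sqrt{b}} \sum_{A < a < x/b} \frac{\chi(a)}{\sqrt{a}}. \]
\begin{align*}
L(s, \chi) &= \sum_{n \le z} \frac{\chi(n)}{n^s} + \frac{s_\chi(y)}{y^s} \Big|_z^\infty + s \int_z^\infty \frac{s_\chi(y)}{y^{s+1}}\, dy \\
&= \sum_{n \le z} \frac{\chi(n)}{n^s} + O\left( \frac{|s_\chi(z)|}{z^s} \right) + O\left( s \int_z^\infty \frac{\phi(q)}{y^{s+1}}\, dy \right),
\end{align*}
so
\[ L(1, \chi) = \sum_{n \le z} \frac{\chi(n)}{n} + O\left( \frac{\phi(q)}{z} \right) \]
and
\[ L(1/2, \chi) = \sum_{n \le z} \frac{\chi(n)}{\sqrt{n}} + O\left( \frac{\phi(q)}{\sqrt{z}} \right). \]
In case 1, we then have that
\begin{align*}
& 2\sqrt{x} \sum_{a \le A} \frac{\chi(a)}{a} + C \sum_{a \le A} \frac{\chi(a)}{\sqrt{a}} + O\left( \frac{A}{\sqrt{x}} \right) \\
&= 2\sqrt{x} \left( L(1, \chi) + O\left( \frac{\phi(q)}{A} \right) \right) + C \left( L(1/2, \chi) + O\left( \frac{1}{\sqrt{A}} \right) \right) + O\left( \frac{A}{\sqrt{x}} \right).
\end{align*}
When A = B = √x and we assume L(1, χ) = 0, this is O(1).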
17. 11/17
Today we are actually going to finish the proof of Dirichlet’s Theorem.
17.1. Finishing the proof. Let’s recall where we’re at. We want to show that L(1, χ) ≠ 0
for real characters χ. This is quite hard.
We were looking at
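\[ r_\chi(n) = \sum_{d \mid n} \chi(d) = \sum_{ab = n} \chi(a). \]
Note that rχ(n) ≥ 0, rχ(n) ≥ 1 if n = m^2, and
\[ \sum_{n \le x} \frac{r_\chi(n)}{\sqrt{n}} \ge \sum_{m \le \sqrt{x}} \frac{1}{m} = \frac{1}{2} \log x + O(1). \]
If you write
\[ L(s, \chi) = \sum_{n \le z} \frac{\chi(n)}{n^s} + \int_{z^+}^{\infty} \frac{d(s_\chi(y))}{y^s}, \]
we want to show that the tail is small. Here, s > 0. To do this, integrate by parts:
\[ \int_{z^+}^{\infty} \frac{d(s_\chi(y))}{y^s} = \frac{s_\chi(y)}{y^s} \Big|_{z^+}^{\infty} + s \int_{z^+}^{\infty} \frac{s_\chi(y)}{y^{s+1}}\, dy = -\frac{s_\chi(z^+)}{z^s} + s \int_{z^+}^{\infty} \frac{s_\chi(y)}{y^{s+1}}\, dy. \]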
17.2.1. Counting ways to write as sum of two squares. This is (almost) counting the number
of ways of writing p = x^2 + y^2. If p ≡ 3 (mod 4), we proved that this is not possible. If
p = 2, there are four ways to do this: (±1)^2 + (±1)^2. Note that 4 = $4r_{\chi_{-4}}(2)$. We claim that
when p ≡ 1 (mod 4), there are 8 ways.
Write
\[ p = x^2 + y^2 = (x + iy)(x - iy) = \pi_1 \bar{\pi}_1 \]
where x + iy and x − iy are primes in Z[i]. We have unique factorization, so this factorization
is unique up to units.
If π is a prime in Z[i] with norm N(π) = p, and p = x^2 + y^2 = (x + iy)(x − iy), then
either x + iy = (±1 or ±i)π or x − iy = (±1 or ±i)π. This gives our desired 8 solutions.
Therefore, $4r_{\chi_{-4}}(p)$ gives the number of ways of writing p as a sum of two squares. This
also works more generally.
If p ≡ 3 (mod 4), there are 4 ways to write p^2 as a sum of two squares: (±p)^2 + 0^2 and
0^2 + (±p)^2, and indeed, $r_{\chi_{-4}}(p^2) = 1$.
Now, by examining the prime factorization, we can see that the number of ways of writing
n = x^2 + y^2 is $4r_{\chi_{-4}}(n)$.
Now, consider
\[ \sum_{n \le x} 4 r_{\chi_{-4}}(n). \]
We will use the hyperbola method to connect this to L(1, χ−4). We can also write this as
\[ \sum_{n \le x} 4 r_{\chi_{-4}}(n) = \sum_{n \le x} \#\{(a, b) : a^2 + b^2 = n\} = \sum_{\substack{(a, b) \\ a^2 + b^2 \le x}} 1, \]
which is the number of lattice points inside of a circle of radius √x. How many integer points
should lie in a circle? This should roughly be
\[ \text{area} + O(\text{circumference}) = \pi x + O(\sqrt{x}). \]
Here, it seems that the error should actually be better: there’s a significant amount of
cancellation. Things should work out nicely, and it seems like the error should only be
O(x^{1/4+ε}). This is a conjecture called Gauss’s Circle Problem, and it is closely related to
Dirichlet’s Divisor Problem. It is already known that the error is O(x^{1/3}).
17.2.2. Hyperbola method again. Consider any character χ 6= χ0 (mod q). Then
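\[ \sum_{ab \le x} \chi(a) = \sum_{n \le x} r_\chi(n) = \sum_{a \le A} \chi(a) \sum_{b \le x/a} 1 + \sum_{b \le B} \sum_{A < a \le x/b} \chi(a). \]
In case 1,
\[ \sum_{a \le A} \chi(a) \left( \frac{x}{a} + O(1) \right) = x \left( L(1, \chi) + O\left( \frac{1}{A} \right) \right) + O(A) = x L(1, \chi) + O\left( A + \frac{x}{A} \right), \]
and we again choose A = √x.
For case 2,
\[ \sum_{b \le B} \sum_{A < a \le x/b} \chi(a) = O\left( \sum_{b \le B} \phi(q) \right) = O(B), \]
so therefore,
\[ \sum_{n \le x} r_\chi(n) = x L(1, \chi) + O(\sqrt{x}), \]
choosing A = B = √x. Then when χ = χ−4, we have
\[ 4 \sum_{n \le x} r_{\chi_{-4}}(n) = 4x L(1, \chi_{-4}) + O(\sqrt{x}) = \pi x + O(\sqrt{x}). \]
Therefore, L(1, χ−4) = π/4. Dirichlet found this proof, and he found how this generalizes.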
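Both sides of this comparison can be computed for a moderate x. The sketch below evaluates Σ rχ(n) for χ = χ−4 as Σ χ(a)⌊x/a⌋ and compares 4 times it with a direct count of lattice points in the disc of radius √x; the origin is counted by the disc but not by the sum over n ≥ 1, hence the offset by 1. The function names are assumptions.

```python
import math

def chi_minus4(n):
    """The nontrivial character mod 4."""
    return (0, 1, 0, -1)[n % 4]

def sum_r(x):
    """Sum of r_chi(n) for n <= x, i.e. sum over a of chi(a) * floor(x/a)."""
    return sum(chi_minus4(a) * (x // a) for a in range(1, x + 1))

def lattice_points(x):
    """Number of integer pairs (a, b) with a^2 + b^2 <= x, origin included."""
    R = math.isqrt(x)
    return sum(2 * math.isqrt(x - a * a) + 1 for a in range(-R, R + 1))

x = 10_000
print(4 * sum_r(x), lattice_points(x) - 1, math.pi * x)
print(sum_r(x) / x, math.pi / 4)
```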
17.2.3. Binary quadratic forms. This is something of the form
\[ f(x, y) = ax^2 + bxy + cy^2, \]
a, b, c ∈ Z, a > 0. For example, x^2 + y^2. We say that such a form is primitive if (a, b, c) = 1.
The discriminant is d = b^2 − 4ac, which is what we get when we try to complete the square. Here,
\[ 4af(x, y) = 4a^2x^2 + 4abxy + 4acy^2 = (2ax + by)^2 - dy^2. \]
If d < 0, then the binary quadratic form is positive definite. If d > 0, then the form is
indefinite, taking positive and negative values. In the case d = m^2, we get a degenerate
situation (2ax + by + my)(2ax + by − my). We only care about the case d < 0. For example,
in the case x^2 + y^2, we have d = −4.
In Q(√5), we have
\[ \left( \frac{1 + \sqrt{5}}{2} \right) \left( \frac{1 - \sqrt{5}}{2} \right) = -1, \]
so the golden ratio is invertible here. The structure is more complicated for d > 0 than for
d < 0.
The plan is to understand all quadratic forms of a given discriminant. There should be
lots of such quadratic forms. There are three variables a, b, c and one equation d = b^2 − 4ac, so
there should be lots of solutions. Just like x^5 + y^5 = z^5.
As an example, replacing x by x + y turns x^2 + y^2 into (x + y)^2 + y^2 = x^2 + 2xy + 2y^2. Of course, these are “the
same” because there is a nice change of variables. It will turn out that there is only one form of
discriminant −4 up to such changes of variables, and the number of quadratic forms is called the class number.
18. 11/29
The final will be Monday at 8:30am, and it will cover everything through last week.
18.1. Review of last lecture. We considered χ (mod 4). Then
\[ 4 r_\chi(n) = \#\{(a, b) \in \mathbb{Z}^2 : a^2 + b^2 = n\}. \]
Then
\[ 4 \sum_{n \le x} r_\chi(n) = \#\{(a, b) \in \mathbb{Z}^2 : a^2 + b^2 \le x\} + O(1) = \pi x + O(\sqrt{x}). \]
In addition, we know that
\[ r_\chi(n) = \sum_{ab = n} \chi(a). \]
Using the hyperbola method, we proved that
\[ \sum_{n \le x} r_\chi(n) = x L(1, \chi) + O(\sqrt{x}), \]
Now,
\[ \begin{pmatrix} 1 & n \\ 0 & 1 \end{pmatrix} z = z + n, \]
so the first step of the algorithm moves z until it lies between the vertical lines x = 1/2 and
x = −1/2. Now,
\[ \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix} z = -\frac{1}{z}. \]
We flip this point, causing it to lie above the circle (but possibly messing up the real coordinate),
and we repeat.
The argument on binary quadratic forms is exactly the same as the algorithm to put
\[ \frac{-b + \sqrt{D}}{2a} \]
inside this fundamental domain.
Given (a, b, c): if |b| = a, we may choose b > 0, since
(a, a, c) ∼ (a, −a, c) if a < c,
and if a = c, we may choose b ≥ 0, since
(a, b, a) ∼ (a, −b, a).
Definition 18.3.2. A binary quadratic form is called reduced if |b| ≤ a ≤ c and
(1) if |b| = a then choose b > 0
(2) if a = c then choose b ≥ 0.
Every positive definite binary quadratic form is equivalent to a (unique) reduced form.1
Two distinct reduced forms are inequivalent (to be justified later).
We want to compute all reduced forms of a given discriminant. We will give an upper bound for $a$, giving a finite number of choices for $a, b$. Then $c$ is fixed by the discriminant $D = b^2 - 4ac$, $D < 0$. For a reduced form $(a, b, c)$, we have $|D| = 4ac - b^2 \ge 4ac - a^2 \ge 4a^2 - a^2 = 3a^2$, and therefore
\[ a \le \sqrt{\frac{|D|}{3}}. \]
There are only finitely many choices for $a$. For each, there are a finite number of choices for $b$ and hence for $c$.
The number of reduced binary quadratic forms of a given discriminant $D$ is called the class number $h(D)$. We’ve shown that this is a finite number.
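The bound on $a$ turns the search for reduced forms into a finite loop. Below is a minimal Python sketch, assuming we only count primitive forms ($\gcd(a, b, c) = 1$), which is the convention the later examples use; the function name `reduced_forms` is ours.

```python
from math import gcd


def reduced_forms(D):
    """List the primitive reduced forms (a, b, c) of discriminant D < 0."""
    assert D < 0 and D % 4 in (0, 1)
    forms = []
    a = 1
    while 3*a*a <= -D:                       # the bound a <= sqrt(|D|/3)
        for b in range(-a + 1, a + 1):       # -a < b <= a, so |b| = a forces b > 0
            if (b*b - D) % (4*a) != 0:       # c must be an integer
                continue
            c = (b*b - D) // (4*a)
            if c < a:                        # need a <= c
                continue
            if a == c and b < 0:             # if a = c, take b >= 0
                continue
            if gcd(gcd(a, b), c) == 1:       # keep primitive forms only
                forms.append((a, b, c))
        a += 1
    return forms


for D in (-3, -4, -7, -8, -12, -20, -163):
    print(D, reduced_forms(D))
```

The class number is then `len(reduced_forms(D))`.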
Example 18.3.3. $D = -4$. We require
\[ a \le \sqrt{\frac{4}{3}} \implies a = 1. \]
This means that $|b| \le 1$ and $b^2 - 4ac = -4$. Note that $b$ has the same parity as the discriminant, so $b$ is even and hence $b = 0$ and $c = 1$. So there is only one reduced quadratic form of discriminant $-4$, and this is $x^2 + y^2$. The class number is 1.
Example 18.3.4. $D = -3$. We now want
\[ a \le \sqrt{\frac{3}{3}} \implies a = 1. \]
Then $b$ is odd and $|b| \le 1 \implies b = \pm 1$. Then $c = 1$. But we said that if $a = |b|$ then we choose $b > 0$, so $x^2 + xy + y^2$ represents the unique equivalence class of quadratic forms of discriminant $-3$.
Note that $D = -5$ is impossible; the only allowed discriminants are those congruent to 0 or 1 (mod 4).
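18.4. Sum of two squares revisited. We again prove that if $p \equiv 1 \pmod 4$ then $p = x^2 + y^2$.
Proof. We have
\[ \left(\frac{-1}{p}\right) = 1 \quad \text{and} \quad \left(\frac{-4}{p}\right) = 1, \]
so $-4 \equiv n^2 \pmod p$ for some $n$, that is, $p \mid n^2 + 4$. So we get a solution to $-4 \equiv n^2 \pmod{4p}$ by the Chinese Remainder Theorem. This gives $-4 = n^2 - 4pc$. Consider the quadratic form $px^2 + nxy + cy^2$ of discriminant $-4$. It represents $p$ at $(x, y) = (1, 0)$, and since $h(-4) = 1$ it is equivalent to $x^2 + y^2$, so $x^2 + y^2$ represents $p$ as well.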
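This proof is effective: reducing the form $(p, n, c)$ while keeping track of the change of variables produces $x$ and $y$. The following Python sketch is one way to carry this out, under some assumptions that are ours and not from the lecture: we find $s$ with $s^2 \equiv -1 \pmod p$ as $g^{(p-1)/4}$ for a quadratic non-residue $g$, we take $n = 2s$, and the function name `two_squares` is hypothetical.

```python
def two_squares(p):
    """Write a prime p = 1 (mod 4) as x^2 + y^2 by reducing the form (p, n, c)."""
    assert p % 4 == 1
    # Assumption: g is a non-residue mod p (Euler's criterion), so s^2 = -1 (mod p).
    s = next(pow(g, (p - 1) // 4, p) for g in range(2, p)
             if pow(g, (p - 1) // 2, p) == p - 1)
    a, b, c = p, 2 * s, (s * s + 1) // p     # n = 2s, discriminant b^2 - 4ac = -4
    # M = [[m11, m12], [m21, m22]] records (x, y) = M (X, Y); det M stays 1.
    m11, m12, m21, m22 = 1, 0, 0, 1
    while True:
        k = (a - b) // (2 * a)               # translation x -> x + k*y
        b, c = b + 2 * a * k, a * k * k + b * k + c
        m12, m22 = m12 + k * m11, m22 + k * m21
        if a > c:                            # flip (x, y) -> (-y, x)
            a, b, c = c, -b, a
            m11, m12, m21, m22 = m12, -m11, m22, -m21
        else:
            break
    assert (a, b, c) == (1, 0, 1)            # the only reduced form of discriminant -4
    # (1, 0) = M (X, Y) gives (X, Y) = (m22, -m21), so p = m22^2 + m21^2.
    return abs(m22), abs(m21)


print(two_squares(5))     # (2, 1)
print(two_squares(13))    # (3, 2)
```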
We will also consider forms such as $x^2 + 3y^2$.
19. 12/1
19.1. Reduced binary quadratic forms. Last time we were looking at binary quadratic forms $ax^2 + bxy + cy^2$ where $\gcd(a, b, c) = 1$, $a, c > 0$, and $b^2 - 4ac < 0$. We proved the following:
Theorem 19.1.1 (Reduction Theory). Each form of discriminant $D = b^2 - 4ac$ is equivalent to a form with $|b| \le a \le c$, with the caveat that if $|b| = a$ then we choose $b$ positive, and if $a = c$ then we choose $b$ nonnegative.
Keep in mind that $b$ has the same parity as the discriminant $D$ and $D \equiv 0, 1 \pmod 4$.
Last time, we gave an algorithm for producing reduced forms. We didn’t prove, however,
that two reduced forms are inequivalent, and we sketch the proof here. It’s a sketch because
it’s like a calculus exercise.
Proof. Suppose we have a reduced form $f(x, y) = ax^2 + bxy + cy^2$. We gave the bound last time of
\[ a \le \sqrt{\frac{|D|}{3}}. \]
What is the smallest value represented by f ? We want to show that this is a = f (±1, 0).
Other numbers that are represented are c = f (0, ±1) and a + b + c = f (1, 1). Choose ±1
and ±1 such that a − |b| + c = f (±1, ±1).
The smallest number represented is a and the second smallest number that is properly2
represented is c, and the third smallest properly represented is a − |b| + c. This is left as an
exercise to think through. The general idea is that
\[ f(x, y) \ge ax^2 - |b| \frac{x^2 + y^2}{2} + cy^2 \ge \left(a - \frac{|b|}{2}\right) x^2 + \left(c - \frac{|b|}{2}\right) y^2 \ge (a - |b| + c) \min(x^2, y^2). \]
Given this fact, two reduced forms must be inequivalent, because the smallest numbers
that they represent will give the coefficients a and c and then b. There are a few special
cases to think through – what happens for a = c?
2 Proper means $(x, y) = 1$, i.e. $x$ and $y$ are coprime.
Now we’ve given a complete theory of producing inequivalent reduced forms. The number
of reduced forms is the class number, denoted by h(D).
Example 19.1.2. When $D = -4$, we have $h(-4) = 1$, and the only reduced form is $x^2 + y^2$.
Example 19.1.3. When $D = -3$, we have $h(-3) = 1$, and the only reduced form is $x^2 + xy + y^2$.
Example 19.1.4. When $D = -7$, we want
\[ a \le \sqrt{7/3} \implies a = 1. \]
In addition, $|b| \le a$ and $b$ is odd so $b = 1$. Then by the discriminant, we get $c = 2$ and the only reduced form is $x^2 + xy + 2y^2$.
Example 19.1.5. For $D = -8$, as above, $a = 1$, and $b$ must be even so $b = 0$, and the only reduced form is $x^2 + 2y^2$.
Example 19.1.6. $D = -12$. Here, $a \le 2$. If $a = 1$ then $b = 0$, and we get $x^2 + 3y^2$ as a reduced form. If $a = 2$, then $b$ can be 0 or 2. If $b = 0$, we cannot get an integer value for $c$, while if $b = 2$, we get $c = 2$, and so we get $2x^2 + 2xy + 2y^2 = 2(x^2 + xy + y^2)$, which isn’t primitive. In this case, we also see that $h(-12) = 1$.
There are only finitely many discriminants with class number 1, so let’s do an example where the class number is more than one.
Example 19.1.7. $D = -20$. Here, $a \le 2$. In the case $a = 1$, we have $b = 0$, so $c = 5$. This is the form $x^2 + 5y^2$.
In the case $a = 2$, $b$ must be even, so $b = 0$ or $b = 2$. If $b = 0$ then $-20 = -4 \times 2 \times c$ is not possible, and if $b = 2$ then $c = 3$, and we get a second reduced form $2x^2 + 2xy + 3y^2$. Therefore, $h(-20) = 2$.
We see that as the discriminant gets larger, we have to consider more and more cases, and
there’s a good chance something works.
Remark. If $D \equiv 0, 1 \pmod 4$ then $h(D) \ge 1$. Why?
If $D \equiv 0 \pmod 4$ then use $x^2 - \frac{D}{4} y^2$, and if $D \equiv 1 \pmod 4$ then use $x^2 + xy + \frac{1 - D}{4} y^2$.
Why is the theory of binary quadratic forms very pretty?
Let $n$ be odd, and suppose that $(n, D) = 1$. We want to know: Can $n = f(x, y)$ for some binary quadratic form $f$ of discriminant $D$ and $(x, y) = 1$?
Suppose that $p$ and $q$ are coprime, and $f(p, q) = n$. We go back to the Euclidean Algorithm to claim that we can find $r$ and $s$ such that
\[ \begin{pmatrix} p & r \\ q & s \end{pmatrix} \in SL_2(\mathbb{Z}). \]
If we make a change of basis
\[ \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} p & r \\ q & s \end{pmatrix} \begin{pmatrix} X \\ Y \end{pmatrix}, \]
we get a transformation
\[ f(x, y) \to g(X, Y) = nX^2 + BXY + CY^2, \qquad f(p, q) \to g(1, 0). \]
Since $f$ and $g$ have the same discriminant, we must have $D = B^2 - 4Cn$, which means that $D$ is congruent to a square (mod $n$).
This is actually an equivalent condition. Conversely, if $D$ is a square (mod $n$), then $n$ is represented properly by a binary quadratic form of discriminant $D$. This is because we can write $D = b^2 - (\cdot)n$. We would love to have $(\cdot)$ divisible by 4. How can we rig it so that it is? We want $b$ to have the same parity as $D$. We can rewrite the previous expression as $D = (b + n)^2 - (\cdot)n$. Since $n$ is odd, $b$ or $b + n$ has the same parity as $D$. Then $b^2 \equiv D \pmod 4$ because $D \equiv 0, 1 \pmod 4$, so we do have a solution to $D = b^2 - 4nc$. Then $nx^2 + bxy + cy^2$ is a form of discriminant $D$ and it represents $n$. This is actually a beautiful theorem.
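The converse direction is constructive, and the parity trick is easy to see in code. Here is a minimal Python sketch, assuming $n$ is odd and coprime to $D$; the square root of $D$ (mod $n$) is found by brute force, which is our simplification, and the name `form_representing` is hypothetical.

```python
def form_representing(n, D):
    """For odd n with D a square mod n, return (n, b, c) of discriminant D, else None."""
    assert n % 2 == 1 and D % 4 in (0, 1)
    for b in range(n):
        if (b * b - D) % n == 0:             # b^2 = D (mod n)
            if (b - D) % 2 != 0:             # fix the parity: replace b by b + n
                b += n
            c, rem = divmod(b * b - D, 4 * n)
            assert rem == 0                  # now 4n divides b^2 - D
            return (n, b, c)
    return None                              # D is not a square mod n


print(form_representing(7, -20))     # (7, 8, 3)
print(form_representing(13, -4))     # (13, 16, 5)
print(form_representing(11, -20))    # None
```

The returned form takes the value $n$ at $(x, y) = (1, 0)$.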
This gives us a lot of consequences.
Example 19.1.8. $D = -4$. When is $-4$ congruent to a square (mod $n$)? If $n = p$ is an odd prime, this happens exactly when $p \equiv 1 \pmod 4$.
Example 19.1.9. $D = -8$, with form $x^2 + 2y^2$. When is $p = x^2 + 2y^2$? We want $-8$ to be a square (mod $p$), which requires
\[ \left(\frac{-8}{p}\right) = 1 \implies p \equiv \begin{cases} 3 \pmod 8 \\ 1 \pmod 8. \end{cases} \]
Example 19.1.11. $D = -20$. We have two reduced forms $x^2 + 5y^2$ and $2x^2 + 2xy + 3y^2$. If we choose a prime $p \ne 2, 5$, we want to ask if $p$ can be properly represented by a form of discriminant $-20$, which requires
\[ \left(\frac{-20}{p}\right) = 1. \]
Suppose that $p \equiv 1 \pmod 4$. Then $\left(\frac{-20}{p}\right) = \left(\frac{5}{p}\right)$, so we want
\[ \left(\frac{5}{p}\right) = 1 = \left(\frac{p}{5}\right) \implies p \equiv 1, 4 \pmod 5. \]
In the case that $p \equiv 3 \pmod 4$, we want
\[ \left(\frac{-20}{p}\right) = 1 \implies \left(\frac{5}{p}\right) = -1 = \left(\frac{p}{5}\right) \implies p \equiv 2, 3 \pmod 5. \]
Combining these conditions, we see that the primes that work are $p \equiv 1, 3, 7, 9 \pmod{20}$. For these $p$, either $p = x^2 + 5y^2$ or $p = 2x^2 + 2xy + 3y^2$.
Eventually, this kind of statement is all you can say. But here we have an extra piece of luck: $p = x^2 + 5y^2$ requires $p \equiv 1, 4 \pmod 5$, while $p = 2x^2 + 2xy + 3y^2$ gives $2p = (2x + y)^2 + 5y^2$, which requires $p \equiv 2, 3 \pmod 5$.
Now, if $p \equiv 1, 9 \pmod{20}$ then $p = x^2 + 5y^2$, and if $p \equiv 3, 7 \pmod{20}$ then $p = 2x^2 + 2xy + 3y^2$. Euler wrote down these types of results and Gauss did the general theory, which he called genus theory.
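The split by residue class (mod 20) can be checked numerically for small primes. The following Python sketch does this by brute force; the helper names `is_prime` and `represented` and the search box are our own choices, and the box is generous enough for the two reduced forms of discriminant $-20$.

```python
from math import isqrt


def is_prime(n):
    return n > 1 and all(n % d for d in range(2, isqrt(n) + 1))


def represented(a, b, c, n):
    """True if a*x^2 + b*x*y + c*y^2 = n has an integer solution (brute force)."""
    bound = isqrt(4 * n) + 1                 # any solution has |x|, |y| below this
    box = range(-bound, bound + 1)
    return any(a*x*x + b*x*y + c*y*y == n for x in box for y in box)


for p in range(3, 100):
    if is_prime(p) and p != 5:
        by_first = represented(1, 0, 5, p)   # x^2 + 5y^2
        by_second = represented(2, 2, 3, p)  # 2x^2 + 2xy + 3y^2
        print(p, p % 20, by_first, by_second)
```

In the printed table, the first form should appear exactly for $p \equiv 1, 9 \pmod{20}$ and the second exactly for $p \equiv 3, 7 \pmod{20}$.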
We want to connect this back to L-functions. This is related to why we get nice values like $\pi/4$.
19.2. Relation to L-functions. If $n$ is odd and $(n, D) = 1$, what does it mean to say that $D$ is a square (mod $n$)?
Suppose that $n = p_1 \cdots p_k$ (square-free). Then $\left(\frac{D}{p_i}\right) = 1$ for each $p_i$, $i = 1, 2, \dots, k$. Notice that
\[ \prod_{i=1}^{k} \left(1 + \left(\frac{D}{p_i}\right)\right) = 0 \]
unless $D$ is a square (mod $p_1 \cdots p_k$). Multiplying out this product, there are $2^k$ terms, sort of like the divisor function.
Extend the Legendre symbol to all numbers via
\[ \left(\frac{D}{n}\right) = \prod_{p^\alpha \| n} \left(\frac{D}{p}\right)^{\alpha}. \]
Note that this is completely multiplicative and periodic, so it is a character (mod $|D|$). There are a few things to be checked here. Applying this,
\[ \prod_{i=1}^{k} \left(1 + \left(\frac{D}{p_i}\right)\right) = \sum_{n = ab} \left(\frac{D}{b}\right). \]
Assume that $D < 0$ and that $D$ is a fundamental discriminant, which means that $D \ne a^2 b$ where $b$ is a discriminant and $a > 1$. For example, $-12 = -3 \cdot 4$ is not a fundamental discriminant, but $-20$ is a fundamental discriminant even though $-20 = -5 \cdot 4$ because $-5$ is not a discriminant.
Then $\chi(n) = \left(\frac{D}{n}\right)$ is a real character (mod $|D|$). Then
\[ r_\chi(n) = \sum_{ab = n} \chi(b), \]
and
\[ 2 r_\chi(n) = \#\{(f, x, y) : n = f(x, y),\ f \text{ a reduced form of discriminant } D,\ (x, y) \in \mathbb{Z}^2\}. \]
(We have two special cases. If $D = -3$, use $6 r_\chi(n)$, and if $D = -4$, use $4 r_\chi(n)$.)
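As a sanity check on this count, the following Python sketch compares $2 r_\chi(n)$ with a brute-force count of representations for $D = -20$ and a few odd $n$ coprime to 20. The Jacobi symbol routine is the standard one; the names `jacobi`, `r_chi`, and `count_representations` are ours, and the list of reduced forms is the one from Example 19.1.7.

```python
from math import isqrt


def jacobi(a, n):
    """Jacobi symbol (a/n) for odd n > 0; a may be negative."""
    assert n > 0 and n % 2 == 1
    a %= n
    result = 1
    while a:
        while a % 2 == 0:                    # pull out factors of 2 using (2/n)
            a //= 2
            if n % 8 in (3, 5):
                result = -result
        a, n = n, a                          # quadratic reciprocity
        if a % 4 == 3 and n % 4 == 3:
            result = -result
        a %= n
    return result if n == 1 else 0


def r_chi(n, D):
    """Sum of chi(b) = (D/b) over the divisors b of an odd number n."""
    return sum(jacobi(D, b) for b in range(1, n + 1) if n % b == 0)


def count_representations(forms, n):
    """Count triples (f, x, y) with f(x, y) = n, by brute force over a box."""
    bound = isqrt(4 * n) + 1
    box = range(-bound, bound + 1)
    return sum(1 for (a, b, c) in forms for x in box for y in box
               if a*x*x + b*x*y + c*y*y == n)


D = -20
forms = [(1, 0, 5), (2, 2, 3)]               # reduced forms of discriminant -20
for n in (1, 3, 7, 9, 11, 21, 29):
    print(n, 2 * r_chi(n, D), count_representations(forms, n))
```

The last two columns should agree on every row.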
Now, like we did with characters (mod 4), we use the hyperbola method to get that
\[ \sum_{n \le x} 2 r_\chi(n) \sim 2 L(1, \chi) x. \]
On the other hand, for each reduced form the condition $ax^2 + bxy + cy^2 \le X$ describes an ellipse. How many lattice points are inside the ellipse? The answer is roughly the area of the ellipse, and we can work this out. For each reduced binary quadratic form $f(x, y) = ax^2 + bxy + cy^2$, we get
\[ \sum_{\substack{(x, y) \\ f(x, y) \le X}} 1 \sim \frac{2 \pi X}{\sqrt{|D|}}. \]
It’s the same answer for every reduced binary quadratic form, so summing over the $h(D)$ reduced forms and taking $X = x$, we end up getting
\[ \sum_{n \le x} 2 r_\chi(n) \sim 2 L(1, \chi) x \sim \frac{2 \pi x}{\sqrt{|D|}} h(D), \]
so for $D < -4$, we get
\[ L(1, \chi) = \frac{\pi h(D)}{\sqrt{|D|}}, \]
which is an amazing theorem of Dirichlet. This tells us why some of the values $L(1, \chi)$ are nonzero. In the other case ($D > 0$), we count lattice points inside a hyperbola and do the same thing.
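Dirichlet's formula can be tested numerically by summing the series for $L(1, \chi)$ over many full periods. The Python sketch below does this for $D = -7$ and $D = -20$, using the class numbers $h(-7) = 1$ and $h(-20) = 2$ from the examples above. The helper names are ours, the Kronecker symbol at 2 is the usual extension of the Legendre symbol, and the truncation at 2000 periods is an arbitrary choice that gives roughly three correct decimals.

```python
from math import pi, sqrt


def jacobi(a, n):
    """Jacobi symbol (a/n) for odd n > 0; a may be negative."""
    a %= n
    result = 1
    while a:
        while a % 2 == 0:
            a //= 2
            if n % 8 in (3, 5):
                result = -result
        a, n = n, a
        if a % 4 == 3 and n % 4 == 3:
            result = -result
        a %= n
    return result if n == 1 else 0


def kronecker(D, n):
    """Kronecker symbol (D/n) for n >= 1 and D = 0 or 1 (mod 4)."""
    sign = 1
    while n % 2 == 0:                        # (D/2) is 0, +1 or -1 depending on D mod 8
        n //= 2
        if D % 2 == 0:
            return 0
        if D % 8 in (3, 5):
            sign = -sign
    return sign * jacobi(D, n)


def L1(D, periods=2000):
    """Partial sum of L(1, chi) for chi = (D/.) over full periods mod |D|."""
    q = -D
    chi = [kronecker(D, n) for n in range(1, q + 1)]
    return sum(chi[(n - 1) % q] / n for n in range(1, periods * q + 1))


for D, h in ((-7, 1), (-20, 2)):
    print(D, L1(D), pi * h / sqrt(-D))       # the two numbers should nearly agree
```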
19.3. Why is $n^2 + n + 41$ a prime? We can show that the class number of $D = -163$ is 1. This is rather surprising. We want
\[ a \le \sqrt{\frac{163}{3}} \implies a \le 7. \]
So we have to check $a = 1, 2, \dots, 7$, where $b$ is odd. It’s not too bad, we just have to check. We get that the only reduced quadratic form is $f(x, y) = x^2 + xy + 41y^2$.
So then $n^2 + n + 41 = f(n, 1)$. If $0 \le n \le 39$, then we can check that $f(n, 1) < 41^2$. Suppose that $f(n, 1)$ is composite. This means that there exists a prime $p < 41$ with $p \mid f(n, 1)$. Since $4 f(n, 1) = (2n + 1)^2 + 163$, this means that $-163$ is a square (mod $p$), which means that $p$ is represented by some form of discriminant $-163$, hence by $f$. But this form cannot represent primes smaller than 41, so we’re done.
This actually generalizes. If $h(1 - 4A) = 1$, then $n^2 + n + A$ is prime for $0 \le n \le A - 2$. Sadly, the largest value for which this works is $A = 41$, that is, $D = -163$. This is Gauss’s class number problem, and it was solved in the 1950s.
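The claim is easy to verify directly. A short Python check, with a naive trial-division primality test of our own (`is_prime` is not from the lecture):

```python
def is_prime(m):
    return m > 1 and all(m % d for d in range(2, int(m**0.5) + 1))


values = [n*n + n + 41 for n in range(40)]       # n = 0, 1, ..., 39
print(all(is_prime(v) for v in values))          # True
print(40*40 + 40 + 41 == 41**2)                  # True: n = 40 gives 41^2

# The same pattern for a smaller A with h(1 - 4A) = 1, here A = 17 (D = -67).
print(all(is_prime(n*n + n + 17) for n in range(16)))   # True
```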