
MATH 152 NOTES

MOOR XU
NOTES FROM A COURSE BY KANNAN SOUNDARARAJAN

Abstract. These notes were taken from math 152 (Elementary Theory of Numbers) taught
by Kannan Soundararajan in Fall 2010 at Stanford University. These notes were live-TEXed
during the lecture in vim and compiled using latexmk. Each lecture gets its own section.
The notes were not edited afterward, so there may be typos; please email corrections to
[email protected].

1. 9/20
Information for the course is at the website http://math.stanford.edu/~ksound.
There is no required book for the course, but some books are on reserve in the library.
Grading: 30% from homework, 30% from one midterm, 40% from the final.
We have a CA, who will also hold office hours. Sound's office is 383W, and his office
hours will be Thursdays, 1-3.
Homework will be assigned on Wednesdays and due the following Wednesday.

1.1. Introduction. This course is about number theory, which is the study of properties of
N or Z or Q.
There is some basic theory leading up to quadratic reciprocity. I like to have a
big theorem, and we'll end up proving that any arithmetic progression a (mod q) with
(a, q) = 1 contains infinitely many primes.
Today we’ll start with primes.

1.2. Primes. One definition that you can make is the definition of an irreducible. We make
a distinction here just for fun.
Definition 1.2.1. A natural number n > 1 is called irreducible if n cannot be written as
n = ab with 1 < a, b < n.
This is what one usually calls a prime number.
That’s actually one way of writing what a prime is; here’s one that is more natural. We
need the concept of divisibility.
Definition 1.2.2. Given two integers a ≠ 0 and b, we say that a|b if b = ac for some integer
c.
Definition 1.2.3. A natural number p > 1 is prime if p|ab =⇒ p|a or p|b.
It’s not clear that our two definitions are the same, so we need to prove a theorem.
Theorem 1.2.4. The primes and irreducibles are the same.
Proof. It is true that we can write each number as a product of irreducibles. We can prove
this by induction. That’s the same as factoring a number. We’d like to say that there is
only one way of factoring each number. Let’s prove this first.
Theorem 1.2.5 (Fundamental Theorem of Arithmetic). Every natural number n > 1 is a product of
primes in a unique way (up to the order of the factors).
This needs a proof; it is not an obvious fact. Why? Consider an example.
Example 1.2.6. Consider the even numbers A = {2, 4, 6, 8, . . . }. The irreducibles are 2,
6, 10, 14, 18, 22, etc – the numbers that are not multiples of 4. Not every number can be
written uniquely as a product of irreducibles; for example, 60 = 6 · 10 = 2 · 30. Why is it
true that unique factorization doesn’t hold here? Why do proofs of FTA fail in this case?
There is the division algorithm:
Proposition 1.2.7 (Division algorithm). Given n, a ∈ N, we can find q, r ∈ Z with n =
aq + r, and 0 ≤ r < a.
Does this hold for our example? If we divide 30 by 2, we would get a remainder that is
too large: 30 = 2 · 14 + 2.
There are other contexts when the division algorithm holds, but it’s not always clear.
This suggests that we need to use the division algorithm in our proof. We need to prove
something about the greatest common divisor.
Definition 1.2.8. Given two numbers a, b ∈ Z (not both 0), we say that g ∈ N is the
greatest common divisor of a and b if g|a and g|b, and if it is the largest such number –
no number greater than g divides both a and b.
There is a really nice property of the GCD that will help us.
Theorem 1.2.9. Given a and b (not both 0), there exist integers x, y such that gcd(a, b) = ax + by.
Let’s use this to finish proving our Theorem 1.2.4.
First, we need to show that every prime is irreducible. Suppose not; there is a prime p
that is not irreducible. Then express p = cd with 1 < c, d < p. Then p|cd, which implies
that p|c or p|d (since p is prime), which is a contradiction because c, d < p.
Conversely, we show that every irreducible is prime. Given an irreducible n, and that
n|ab, we want to have that n|a or n|b. Suppose n doesn’t divide a. The GCD of n and a is
1, so Theorem 1.2.9 tells us that 1 = nx + ay, so that b = nbx + aby. Therefore n|b, which
is what we wanted to show. This means that n is prime. 
Proof of Theorem 1.2.5. We can now prove Theorem 1.2.5. We need to show uniqueness.
Suppose that there are two factorizations:
n = p₁ · · · p_r = q₁ · · · q_s.
Now, p₁ | p₁ · · · p_r, so p₁ | q₁ · · · q_s. Therefore, p₁ divides one of q₁, . . . , q_s. But q₁, . . . , q_s are
irreducibles, so p₁ equals one of q₁, . . . , q_s. We can then cancel p₁ on both sides and continue
with p₂. □
Now we turn to the proof of the theorem for the GCD.
Proof of Theorem 1.2.9. Let S = {ax + by : x, y ∈ Z}. Clearly, 0, a, b ∈ S. Let’s just look
at the positive numbers. By the well-ordering property, there is a smallest number; let s be
the smallest natural number in S.
We claim that every element of S is a multiple of s. This comes from the division algorithm:
for n ∈ S, write n = qs + r with 0 ≤ r < s; then r = n − qs ∈ S, so r = 0 by the minimality of s.
In particular, s|a and s|b, so s = ax + by is a common divisor of a and b. We need
to show that there are no bigger divisors. Any common divisor of a and b divides s, so
s = gcd(a, b). 
Of course, this could also have been proved using the Euclidean Algorithm. Here, we
computed (312, 968) = 8, and 8 = 10 · 968 − 31 · 312.
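The extended Euclidean algorithm makes Theorem 1.2.9 effective. As a quick illustration (not part of the original notes; the function name is mine), here is a short Python sketch that recovers the coefficients for the example above:

```python
def extended_gcd(a, b):
    """Return (g, x, y) with g = gcd(a, b) and g = a*x + b*y."""
    if b == 0:
        return (a, 1, 0)
    g, x, y = extended_gcd(b, a % b)
    # g = b*x + (a % b)*y = b*x + (a - (a//b)*b)*y
    return (g, y, x - (a // b) * y)

g, x, y = extended_gcd(312, 968)
print(g, x, y)                    # 8 -31 10, i.e. 8 = 10*968 - 31*312
assert g == 312 * x + 968 * y
```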
Now that we know what the primes are, let’s talk about properties of primes.
Theorem 1.2.10 (Euclid). There are infinitely many primes.
We consider several proofs:
Proof. Suppose not. Then there are only finitely many primes, and p₁, . . . , pₙ are all of them.
But p₁ · · · pₙ + 1 is divisible by none of them, so it has a prime factor not in our list, which is a
contradiction. □
Proof. If there are few primes, there must also be few natural numbers. But we know the
number of natural numbers. That’s the idea; let’s make this precise.
Usually, π(x) will denote the number of primes up to x. Consider n ≤ x, and factorize it as
n = p₁^{α₁} · · · p_s^{α_s}.
Assume that there are only k primes.
Every number can be written as n = ab² where a is square-free. This is of course unique.
If there are only k primes, there are only 2^k square-free numbers. Now,
$$\sum_{n \le x} 1 = \sum_{\substack{ab^2 \le x \\ a \text{ square-free}}} 1 \le \sqrt{x} \sum_{\substack{a \le x \\ a \text{ square-free}}} 1 \le 2^k \sqrt{x}.$$
This is a contradiction for x > 4^k. □
This also gives a bad bound:
$$\pi(x) \ge \frac{\log x}{\log 4}.$$
We describe the factorization of n!. Given a prime p, what is the exact power of p dividing
n!? This is sometimes denoted p^α || n!.
The power of p dividing n! equals
$$s_p := \sum_{k=1}^{\infty} \left\lfloor \frac{n}{p^k} \right\rfloor.$$
Note that this is actually a finite sum. Now,
$$n! = \prod_{p \le n} p^{s_p}.$$
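As a quick illustration (not from the lecture; the helper names are mine), here is a small Python sketch of this formula and the resulting factorization of n!:

```python
from math import factorial

def vp_factorial(n, p):
    """Exponent of the prime p in n!, via s_p = sum of floor(n / p^k)."""
    s, pk = 0, p
    while pk <= n:
        s += n // pk
        pk *= p
    return s

def primes_up_to(n):
    return [p for p in range(2, n + 1) if all(p % d for d in range(2, int(p**0.5) + 1))]

n = 10
prod = 1
for p in primes_up_to(n):
    prod *= p ** vp_factorial(n, p)
assert prod == factorial(n)       # 10! = 2^8 * 3^4 * 5^2 * 7
```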
Proposition 1.2.11. We have Stirling's Formula:
$$n! \approx \sqrt{2\pi n}\, e^{-n} n^n.$$
In number theory, we are often interested in how quickly something can be computed –
with a computer program, for example. GCD was fast to compute, while n! is not fast to
compute; that’s why we care about Stirling’s Formula.
Another form of Stirling's formula is
$$\log n! \approx n \log n - n.$$
Proof. We have
$$\log n! = \sum_{1 \le m \le n} \log m.$$
Comparing the sum to an integral, we see that
$$n \log n - n + 1 = \int_1^n \log t \, dt \le \log n! \le \int_1^{n+1} \log t \, dt = (n+1)\log(n+1) - n.$$
We can write
$$\log(n+1) = \log n + \log\left(1 + \frac{1}{n}\right)$$
and use a Taylor expansion. So
$$(n+1)\log(n+1) = (n+1)\log n + \frac{n+1}{n} - \cdots. \qquad \square$$
2. 9/22
Instead of writing down inequalities all the time, we want a more convenient notation to
drop insignificant terms.
Definition 2.0.12. f (x) = O(g(x)) if there is a constant C such that |f (x)| ≤ Cg(x) for all
large x.

Example 2.0.13. For example, x = O(e^x), x = O(x), sin(x) = O(1), log x = O(x^{0.01}).
Our previous inequalities for log n! can now be written as

log n! = n log n − n + O(log n),

and Stirling's formula would be
$$\log n! = n \log n - n + \frac{1}{2}\log n + \frac{1}{2}\log 2\pi + O\!\left(\frac{1}{n}\right).$$
2.1. The number of primes. So why is this useful for studying primes? We had the formula
$$\log n! = \sum_{p \le n} s_p \log p,$$
where
$$s_p = \left\lfloor \frac{n}{p} \right\rfloor + \left\lfloor \frac{n}{p^2} \right\rfloor + \cdots$$
Using the fact that [x] = x + O(1), we have s_p = n/p + O(1) + O(n/p²).
Now, we have
$$\sum_{p \le n} \left( \frac{n}{p} + O(1) + O\!\left(\frac{n}{p^2}\right) \right) \log p = n \sum_{p \le n} \frac{\log p}{p} + O\!\left(\sum_{p \le n} \log p\right) + O\!\left(n \sum_{p \le n} \frac{\log p}{p^2}\right).$$
The final term has a sum that converges, so it reduces to O(n). We will prove that the
middle term is also O(n). Assuming this, we have
$$n \sum_{p \le n} \frac{\log p}{p} + O(n) = n \log n - n + O(\log n),$$
so that
$$n \sum_{p \le n} \frac{\log p}{p} = n \log n + O(n).$$
Theorem 2.1.1. As n → ∞,
$$\sum_{p \le n} \frac{\log p}{p} = \log n + O(1).$$
Note that here, it is no longer important that n is an integer, so we can replace it by a
real number x, and the formula would still make sense.
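As a sanity check (not part of the notes), one can compare this sum with log x numerically; the difference stays bounded as x grows:

```python
from math import log

def primes_up_to(n):
    sieve = bytearray([1]) * (n + 1)
    sieve[0:2] = b"\x00\x00"
    for p in range(2, int(n**0.5) + 1):
        if sieve[p]:
            sieve[p*p::p] = bytearray(len(sieve[p*p::p]))
    return [p for p in range(2, n + 1) if sieve[p]]

for x in [10**3, 10**4, 10**5]:
    s = sum(log(p) / p for p in primes_up_to(x))
    print(x, round(s - log(x), 3))    # difference stays bounded as x grows
```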
Why do we care? We want to study π(x) = ∑_{p≤x} 1. The first person to make real progress
toward this was Gauss, and he made a conjecture that became the prime number theorem.
Theorem 2.1.2 (Prime Number Theorem; conjectured by Gauss, proved in 1896).
$$\pi(x) \approx \int_2^x \frac{dt}{\log t} = \frac{x}{\log x} + O\!\left(\frac{x}{(\log x)^2}\right).$$
Our previous theorem was a weak version of the prime number theorem.
Based on what we know, we can still say something about primes. Here’s a weaker result:
Proposition 2.1.3 (Chebyshev).
$$\frac{cx}{\log x} \le \pi(x) \le \frac{Cx}{\log x}$$
for some constants 0 < c < C and all large x.
The prime number theorem states that
$$\lim_{x \to \infty} \frac{\pi(x)}{x/\log x} = 1,$$
so this is clearly a bit weaker.
From the Chebyshev bounds, we get the following nice result, which says that primes occur
with some regularity.
Theorem 2.1.4 (Bertrand’s Postulate). For every n ≥ 2, there is always a prime between
n and 2n.
2.2. Proof of a Chebyshev bound.
Proof of a Chebyshev bound. Here, we will prove one of the Chebyshev bounds.
We will consider the middle binomial coefficient $\binom{2n}{n}$. We want to understand its prime
factorization. We have some easy bounds because it is the biggest binomial coefficient:
$$\frac{2^{2n}}{2n+1} \le \binom{2n}{n} \le 2^{2n}.$$
This means that
$$2n \log 2 - \log(2n+1) \le \log \binom{2n}{n} \le 2n \log 2.$$
What is the power of p dividing $\binom{2n}{n}$? Using the factorial form of the binomial coefficient,
we see that this is
$$\sum_{j=1}^{\infty} \left\lfloor \frac{2n}{p^j} \right\rfloor - 2\sum_{j=1}^{\infty} \left\lfloor \frac{n}{p^j} \right\rfloor = \sum_{j=1}^{\infty} \left( \left\lfloor \frac{2n}{p^j} \right\rfloor - 2\left\lfloor \frac{n}{p^j} \right\rfloor \right).$$
Note that
$$[2x] - 2[x] = \begin{cases} 0 & \{x\} \in [0, \tfrac{1}{2}) \\ 1 & \{x\} \in [\tfrac{1}{2}, 1). \end{cases}$$
For the large primes, we only have to consider j = 1, which is easy. The smaller primes are
messier. We divide the primes into two groups.
For large primes p > √(2n), the power of p dividing $\binom{2n}{n}$ is either 0 or 1, depending on the
fractional part of n/p.
For example, if n < p ≤ 2n, then the power of p is 1. Of course, this should be obvious
from the factorial form.
If 2n/3 < p ≤ n, the power of p is 0.
If n/2 < p ≤ 2n/3, the power of p is 1.
We can keep extending this.
For the primes p < √(2n), the exponent of p dividing $\binom{2n}{n}$ is at least 0 and at most $\frac{\log 2n}{\log p}$.
So the binomial coefficient satisfies
$$\prod_{p \le 2n} p^{\frac{\log 2n}{\log p}} \ge \binom{2n}{n} \ge \prod_{n < p \le 2n} p.$$
This gives us
$$\frac{2^{2n}}{2n+1} \le \binom{2n}{n} \le \prod_{p \le 2n} 2n = (2n)^{\pi(2n)}.$$
This actually tells us that
$$\pi(2n) \ge \frac{\log\!\left(\frac{2^{2n}}{2n+1}\right)}{\log 2n} = \frac{2n \log 2}{\log 2n} - \frac{\log(2n+1)}{\log 2n} \ge \frac{2n \log 2}{\log 2n} - 2,$$
which gives us a Chebyshev bound. Why did we consider $\binom{2n}{n}$? Ramanujan came up with
the proof. □
Example 2.2.1 (Chebyshev).
$$\frac{(30n)!\, n!}{(15n)!\,(10n)!\,(6n)!} \in \mathbb{N}.$$
Corollary 2.2.2.
$$\pi(x) \ge (\log 2)\,\frac{x}{\log x} + O(1).$$
We also have the following bound:
$$2^{2n} \ge \binom{2n}{n} \ge n^{\pi(2n) - \pi(n)},$$
so that
$$\pi(2n) - \pi(n) \le \frac{2n \log 2}{\log n}.$$
As before, if we want, we can replace n by a real number x to get
$$\pi(2x) - \pi(x) \le (2 \log 2)\,\frac{x}{\log x} + O(1).$$
This also gives an estimate for π(x) by summing this formula and dividing by 2 at each step:
$$\pi(x) - \pi(x/2) \le (2 \log 2)\,\frac{x/2}{\log(x/2)} + O(1).$$
This gives
$$\pi(x) \le (2 \log 2)\,\frac{x}{\log x} + O(\log x),$$
yielding the other half of the Chebyshev bound. This is enough (with some tweaking) to
prove Bertrand's Postulate. This uses the fact that if 2n/3 < p ≤ n then p does not divide
$\binom{2n}{n}$.
3. 9/27
We will discuss the theory of congruences, leading up to quadratic reciprocity, for the next
few lectures.
3.1. Congruences.
Definition 3.1.1. n > 0, n ∈ N, a, b ∈ Z, a ≡ b (mod n) means that n | (a − b).
It is easy to see that this forms an equivalence relation. This means that it satisfies
(1) a ≡ a (mod n)
(2) a ≡ b (mod n) iff b ≡ a (mod n)
(3) a ≡ b (mod n) and b ≡ c (mod n) means that a ≡ c (mod n).
We have more properties:
• a ≡ b (mod n) =⇒ ax ≡ bx (mod n)
• a ≡ b (mod n) and c ≡ d (mod n) =⇒ a + c ≡ b + d (mod n)
We cannot always cancel, however. If ax ≡ bx (mod n) and (x, n) = 1, then a ≡ b (mod n).
3.1.1. Residue classes. It is natural to think about equivalence classes. Given any n, Z
splits into n equivalence classes. These are called the residues mod(n). The set of residue
classes (mod n) forms an additive group, satisfying the standard properties of associativity,
existence of identity, and inverses.
We can also multiply residue classes: a (mod n) × b (mod n) = ab (mod n). In general,
this does not form a group, as 0 does not have an inverse. There is a multiplicative identity
1 (mod n).
Theorem 3.1.2. The congruence ax ≡ b (mod n) has a unique solution (mod n) if (a, n) =
1.
If (a, n) = g, then g must divide b for there to be a solution.
Proof. From (a, n) = 1, we see that 1 = ax + ny. This means that we can solve ax ≡ 1
(mod n), so b = abx + nby, so we therefore can solve ax ≡ b (mod n).
For uniqueness, suppose that ax ≡ b (mod n) and ay ≡ b mod n. We can subtract these
equations and cancel because (a, n) = 1. 
So this tells us precisely which residue classes are invertible.
Definition 3.1.3. a (mod n) is a reduced residue class if (a, n) = 1.
Every reduced residue class is invertible. The set of reduced residue classes (mod(n)) with
the operation of multiplication again forms an abelian group. (Check this).
It was clear that the additive group had n elements. It’s not so clear for this multiplicative
group.
Definition 3.1.4. The Euler phi function φ(n) is the number of reduced residue classes
(mod n).
Note. If p is prime, φ(p) = p − 1, and φ(p²) = p² − p.
If we look at residue classes (mod p), we only need to check distributivity to see that this
forms a field with (+, ×).
3.2. Useful theorems.
Theorem 3.2.1 (Wilson’s Theorem). If p is a prime then (p − 1)! ≡ −1 (mod p).
Exercise 3.2.2. If n > 4 is composite then (n − 1)! ≡ 0 (mod n).
Remark. Someone told Gauss that this would be hard to prove because there are no good
ways to write primes, and Gauss said that they needed new notions and not new notations.
Proof of Wilson's Theorem. Consider
1 × 2 × 3 × · · · × (p − 1).
For each a (mod p), there exists a⁻¹ (mod p), and we can pair the two and cancel. This works as long as
a ≢ a⁻¹ (mod p); but a ≡ a⁻¹ (mod p) means that a² ≡ 1 (mod p), so the only classes that do not
cancel are 1 and −1. Hence we get that the product is congruent to −1
(mod p). □
From a (mod n) with (a, n) = 1, we can compute a² (mod n), a³ (mod n), etc. Since there are a finite
number of reduced residue classes, we must come back to something that we had earlier. So
a^k ≡ a^l (mod n) for some k < l, so that a^{l−k} ≡ 1 (mod n).
Definition 3.2.3. The order of a reduced residue class a (mod n) is the smallest g ∈ N such that
a^g ≡ 1 (mod n).
Example 3.2.4. The order of 1 is 1, and the order of −1 is 2 (if n > 2). Anything else
would be hard to compute.
Theorem 3.2.5 (Euler's Theorem). If (a, n) = 1, then the order of a (mod n) divides φ(n); equivalently, a^{φ(n)} ≡ 1 (mod n).
Corollary 3.2.6 (Fermat's Little Theorem). If p is prime, the order of a (mod p) divides
p − 1. Equivalently, if p ∤ a then a^{p−1} ≡ 1 (mod p), or equivalently, a^p ≡ a (mod p).
Proof of Euler's Theorem. Consider the φ(n) reduced residue classes b (mod n).
What happens when we multiply each of these by a (mod n)? We get the classes ab (mod n).
We claim that these two sets are the same. Each ab is reduced, so it is contained in
the original set. The reverse is also true because ax ≡ b (mod n) has a solution, and all of these
products are distinct. So our two sets of residue classes are permutations of each other.
So
$$\prod_{\substack{b \,(\mathrm{mod}\ n) \\ (b,n)=1}} b \equiv \prod_{\substack{b \,(\mathrm{mod}\ n) \\ (b,n)=1}} (ab) \equiv a^{\varphi(n)} \prod_{\substack{b \,(\mathrm{mod}\ n) \\ (b,n)=1}} b \pmod{n}.$$
Hence, a^{φ(n)} ≡ 1 (mod n). □
3.3. Primality testing.
Puzzle 3.3.1. Here is a puzzle. Two people meet on the internet, and they decide to get
married. They want to send a ring, but the mail isn’t secure. Everyone has a big supply of
padlocks. How can they send the ring so that, at every moment in transit, it has at least
one padlock on it?
Is it true that a^{n−1} ≡ 1 (mod n), with (a, n) = 1, implies that n is prime? The answer turns
out to be no. Consider the number 561 = 3 × 11 × 17. If (a, 561) = 1 then a^{560} ≡ 1
(mod 561).
To show this, observe that
a² ≡ 1 (mod 3) =⇒ a^{560} ≡ 1 (mod 3),
a^{10} ≡ 1 (mod 11) =⇒ a^{560} ≡ 1 (mod 11),
a^{16} ≡ 1 (mod 17) =⇒ a^{560} ≡ 1 (mod 17).
Hence, the converse to Fermat's Little Theorem is not true. 561 is a Carmichael number,
and there are infinitely many numbers like it.
What if we want to see if a number is prime? Check that a^{n−1} ≡ 1 (mod n), and (a, n) = 1.
If not, n is not prime. If so, we don't know; check with a different a. Eventually, we might
have a good chance that this is prime, as Carmichael numbers are rare.
Is this a good way to check primality? Is it fast to compute? We can compute a^{n−1}
(mod n) rapidly by repeated squaring.
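As a rough sketch (not from the lecture), repeated squaring computes a^{n−1} (mod n) in O(log n) multiplications; Python's built-in pow(a, n - 1, n) does the same thing:

```python
def power_mod(a, e, n):
    """Compute a^e (mod n) by repeated squaring."""
    result = 1
    a %= n
    while e > 0:
        if e & 1:                      # if the current bit of e is set,
            result = result * a % n    # multiply it into the answer
        a = a * a % n                  # square the base
        e >>= 1
    return result

assert power_mod(2, 560, 561) == 1                 # 561 is a Carmichael number
assert power_mod(2, 560, 561) == pow(2, 560, 561)
```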
Six or seven years ago, there was a jazzed up version of this from Agrawal, Kayal, and
Saxena. It was a rapid (polynomial time) algorithm to determine whether a number is prime,
answering a question of Gauss.
In contrast, we don’t know a good way to factor numbers into primes. If there were a way,
the remainder of this lecture would be pointless.
3.3.1. Diffie-Hellman. This is a precursor of RSA.
A and B want to agree on a common code word while communicating in a public channel.
Let p be prime and let g be a random complicated number. A thinks of a number x. She
posts g^x (mod p). B thinks of a number y and posts g^y (mod p).
At this point, the public information is g, p, g^x, and g^y, while x and y are private. Both A
and B can compute g^{xy}, which no one else knows.
Why can't anyone else find g^{xy}? This is the discrete logarithm problem: given g and g^x,
find x. This has no known good solution. People don't know how to nicely do this with
more than two people.
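A toy sketch of the exchange (not from the lecture; the small prime and the base 5 are illustrative choices of mine, far smaller than anything one would use in practice):

```python
import random

p, g = 2_147_483_647, 5            # example prime (2^31 - 1) and base; toy values only

x = random.randrange(2, p - 1)     # A's secret
y = random.randrange(2, p - 1)     # B's secret

A_public = pow(g, x, p)            # A posts g^x (mod p)
B_public = pow(g, y, p)            # B posts g^y (mod p)

# Each side combines its own secret with the other's public value.
shared_A = pow(B_public, x, p)     # (g^y)^x
shared_B = pow(A_public, y, p)     # (g^x)^y
assert shared_A == shared_B        # both equal g^(xy) (mod p)
```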
4. 9/29
Definition 4.0.2. Z/nZ is the additive group of residue classes (mod n). (Z/nZ)× is the
multiplicative group of reduced residue classes (mod n), which has size φ(n).
4.1. RSA Public Key Cryptography. This uses Euler’s Theorem. Say you’re a big
company like Amazon, and people want to buy stuff from you. You need to be able to send
messages and receive coded messages. Everyone should be able to encode, and only you
should be able to decode.
Pick two large primes p and q. These are secret. Compute pq = n and compute the Euler
phi function φ(n) = (p − 1)(q − 1) = pq − p − q + 1. Choose a number c as the coding key,
and find a number d such that cd ≡ 1 (mod φ(n)). Suppose that (c, φ(n)) = 1. Then we can use
the Euclidean algorithm to compute d rapidly.
The public information is c and n, while d is secret.
Anyone can send a message a. They compute a^c (mod n) and send that to you. You can
decode this by computing (a^c)^d ≡ a^{1+kφ(n)} ≡ a (mod n).
Nobody can prove that this is secure. It's easy to show that something is easy, but it is hard
to show why something should be hard. This is the P ≠ NP problem.
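A toy version of the scheme in Python (not from the lecture; the primes here are far too small to be secure and are only meant to make the arithmetic visible; pow(c, -1, phi) needs Python 3.8+):

```python
p, q = 61, 53                      # secret primes (toy-sized)
n = p * q                          # 3233, public
phi = (p - 1) * (q - 1)            # 3120, secret
c = 17                             # public coding exponent, gcd(c, phi) = 1
d = pow(c, -1, phi)                # secret decoding exponent: c*d ≡ 1 (mod phi)

message = 1234
cipher = pow(message, c, n)        # anyone can encode: a^c (mod n)
decoded = pow(cipher, d, n)        # only the holder of d decodes: (a^c)^d ≡ a (mod n)
assert decoded == message
```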
4.2. Structure of group of reduced residues (mod n). What is the structure of the
group of all residues (mod n)? This is a cyclic group of order n generated by 1 (mod n).
Actually, any a coprime to n also gives a generator a (mod n).
4.2.1. Chinese Remainder Theorem. Consider the structure of the multiplicative group.
Theorem 4.2.1 (Chinese Remainder Theorem). If (n₁, n₂) = 1 then there is a natural
bijection
$$\{a_1 \ (\mathrm{mod}\ n_1),\ a_2 \ (\mathrm{mod}\ n_2)\} \longleftrightarrow a \ (\mathrm{mod}\ n_1 n_2),$$
and under it (a₁, n₁) = (a₂, n₂) = 1 corresponds to (a, n₁n₂) = 1.
Another way of saying this is that we can find a unique simultaneous solution to
$$\begin{cases} x \equiv a_1 \pmod{n_1} \\ x \equiv a_2 \pmod{n_2}. \end{cases}$$
Proof. The main thing to use is that (n₁, n₂) = 1. This means that we can find k₁ and k₂
such that n₁k₁ + n₂k₂ = 1.
We want to find some number a (mod n₁n₂). Take a = a₁n₂k₂ + a₂n₁k₁. Since n₂k₂ ≡ 1 (mod n₁)
and n₁k₁ ≡ 1 (mod n₂), this is congruent to a₁ (mod n₁) and to a₂ (mod n₂). □
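A small sketch of this construction in Python (not from the notes; pow(·, -1, ·) needs Python 3.8+):

```python
def crt(a1, n1, a2, n2):
    """Solve x ≡ a1 (mod n1), x ≡ a2 (mod n2) for coprime n1, n2."""
    k2 = pow(n2, -1, n1)               # n2*k2 ≡ 1 (mod n1)
    k1 = pow(n1, -1, n2)               # n1*k1 ≡ 1 (mod n2)
    return (a1 * n2 * k2 + a2 * n1 * k1) % (n1 * n2)

x = crt(2, 3, 3, 5)                    # x ≡ 2 (mod 3), x ≡ 3 (mod 5)
assert x % 3 == 2 and x % 5 == 3       # x = 8
```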
4.2.2. Euler φ function is multiplicative. One consequence of this is that the Euler φ function
is multiplicative: if n = n₁n₂ and (n₁, n₂) = 1, then φ(n) = φ(n₁)φ(n₂).
Therefore, if n = ∏ᵢ pᵢ^{αᵢ}, then
$$\varphi(n) = \prod_i \varphi(p_i^{\alpha_i}) = \prod_i \left( p_i^{\alpha_i} - p_i^{\alpha_i - 1} \right).$$
Another way of writing this is φ(p^α) = p^α (1 − 1/p). Then
$$\varphi(n) = n \prod_{p \mid n} \left( 1 - \frac{1}{p} \right).$$
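A short sketch (not from the notes) computing φ(n) from this product formula by trial division:

```python
def euler_phi(n):
    """phi(n) = n * prod over primes p | n of (1 - 1/p), via trial division."""
    result, m, p = n, n, 2
    while p * p <= m:
        if m % p == 0:
            result -= result // p      # multiply by (1 - 1/p)
            while m % p == 0:
                m //= p
        p += 1
    if m > 1:                          # leftover prime factor
        result -= result // m
    return result

assert euler_phi(561) == 2 * 10 * 16   # 561 = 3 * 11 * 17
assert euler_phi(7**2) == 7**2 - 7
```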
For reduced residue classes, we have
a₁ (mod n₁), a₂ (mod n₂) ↔ a (mod n₁n₂)
b₁ (mod n₁), b₂ (mod n₂) ↔ b (mod n₁n₂).
Then we see that
a₁b₁ (mod n₁), a₂b₂ (mod n₂) ↔ ab (mod n₁n₂).
So we've proved that (Z/n₁n₂Z)× is isomorphic to the group (Z/n₁Z)× × (Z/n₂Z)×. There-
fore, (Z/nZ)× can be understood from the structure of (Z/p^αZ)× for prime powers p^α.
Consider (Z/pZ)× . Every element has order dividing p − 1. Does there exist an element
of order p − 1?
4.2.3. Primitive roots.
Definition 4.2.2. A primitive root (mod n) is an element of (Z/nZ)× of order φ(n). This
means that it generates (Z/nZ)× .
Theorem 4.2.3. There is a primitive root (mod n) if and only if n = p^α for an odd prime p,
or n = 2p^α, or n = 2 or n = 4.
Lemma 4.2.4. If (n₁, n₂) = 1, g₁ is the order of a (mod n₁), and g₂ is the order of a (mod n₂),
then the order of a (mod n₁n₂) is the lcm of g₁ and g₂.
Proof. Note that the lcm works, so the order g of a (mod n₁n₂) divides lcm(g₁, g₂).
Now, a^g ≡ 1 (mod n₁) and a^g ≡ 1 (mod n₂), so g₁ | g and g₂ | g, which means that
lcm(g₁, g₂) | g. □
Proof of Theorem 4.2.3. Notice that φ(n) is almost always composite, because it is even. If
n = n₁n₂ with n₁, n₂ > 2 and (n₁, n₂) = 1, then φ(n₁) and φ(n₂) are even, so
lcm(φ(n₁), φ(n₂)) ≤ ½ φ(n₁)φ(n₂) < φ(n). By the lemma, there are
then no primitive roots (mod n).
Let's take it for granted that if n is a power of 2 larger than 4, then the structure is a
little bit different. □
Example 4.2.5. Consider 561 = 3 × 11 × 17. Then the order of any reduced a (mod 561)
divides lcm(2, 10, 16) = 80, while 80 | 560.
Note that (Z/2p^αZ)× is isomorphic to (Z/2Z)× × (Z/p^αZ)×.
The plan will be the following:
(1) (Z/pZ)× is cyclic
(2) "lift" primitive roots (mod p) to (mod p²), etc.
5. 10/4
If (n1 , n2 ) = 1, we clearly have that (Z/n1 n2 Z)× is isomorphic to (Z/n1 Z)× × (Z/n2 Z)× .
5.1. Primitive roots.
Theorem 5.1.1 (Primitive roots). For any prime p, the group (Z/pZ)× is cyclic. So there
is an element g (mod p) with order p − 1.
This is an easy group to understand, as
(Z/pZ)× = {g (mod p), g² (mod p), . . . , g^{p−1} (mod p)}.
If p = 2, this is easy because (Z/2Z)× has only one element. So we can assume that p is
odd.
5.1.1. Polynomial Congruences. We want to consider polynomial congruences. In general,
we look at a polynomial with integer coefficients f (x) ∈ Z[x], and we want to consider f (x)
(mod p). We say that a is a solution to the congruence f (x) ≡ 0 (mod p) if f (a) ≡ 0
(mod p). In fact, we can also consider congruences mod n where n is composite.
So far, we know how to solve one congruence: the linear case f (x) = ax + b. Considering
this expression modulo p, we see that it has a unique solution if (a, p) = 1, and if p | a, then
no solutions if b ≢ 0 (mod p) and p solutions if b ≡ 0 (mod p).
5.1.2. Quadratic congruences. The natural next step is to consider quadratic congruences.
This is already hard.
The equation x² ≡ 1 (mod p) has two solutions when p is an odd prime, while the equation x² ≡ 1
(mod 15) has four solutions by the Chinese remainder theorem.
In addition, x² + 1 ≡ 0 (mod 3) has no solutions, x² + 1 ≡ 0 (mod 5) has two solutions, and hence
x² + 1 ≡ 0 (mod 15) has no solutions. This demonstrates that the solutions are not so easy to
see.
We’ll discuss this in more detail in the next few lectures. Most of the time, looking at a
congruence mod p, you never get more solutions than the degree of the polynomial.
5.1.3. Monic polynomials. We consider polynomials with leading coefficient 1, by dividing out by the
leading coefficient as long as it is coprime to the modulus. Look at polynomials of the
form
f(x) = x^n + a_{n−1}x^{n−1} + · · · + a₀.
Lemma 5.1.2. If f (x) is monic of degree n (leading coefficient is coprime to p), then
f (x) ≡ 0 (mod p) has at most n solutions.
Proof. If n = 1 then we’re done. We will prove this by induction on n. Assume the induction
hypothesis is that the lemma is true for degrees up to n − 1. We need to prove this in degree
n.
Suppose that f(x) has n + 1 solutions, so that f(b₁), . . . , f(b_{n+1}) ≡ 0 (mod p), where
b₁, . . . , b_{n+1} are distinct residue classes.
Consider the degree-n polynomial g(x) = (x − b₁) · · · (x − bₙ). We can write g(x) =
f(x)q(x) + r(x), where the remainder r has degree less than n. Note that this means that r(x)
has roots b₁, . . . , bₙ, but r(b_{n+1}) ≢ 0 (mod p). This contradicts the induction hypothesis,
so f(x) must have at most n solutions and we're done. □
5.1.4. Back to primitive roots. We go back to looking for primitive roots. If p is a prime,
write p − 1 = q₁^{α₁} q₂^{α₂} · · · q_r^{α_r} where q₁, . . . , q_r are distinct primes.
Lemma 5.1.3. There is an element g (mod p) whose order is q_j^{α_j}.
Proof. Suppose that there is no element of order q_j^{α_j}. Take any a (mod p), and let its order be
g = q₁^{β₁} · · · q_r^{β_r}, a divisor of p − 1, so a^g ≡ 1 (mod p). If β_j = α_j, then a^{g/q_j^{α_j}}
would have order q_j^{α_j}, which we assumed does not exist. So β_j < α_j for every a, and hence
every a with (a, p) = 1 satisfies
$$a^{(p-1)/q_j} \equiv 1 \pmod{p}.$$
But f(x) = x^{(p−1)/q_j} − 1 has at most (p − 1)/q_j solutions (mod p), while we have just seen that it
vanishes at all p − 1 reduced residue classes. That's a contradiction.
The notation is messy, but the idea is simple: remove all of the j's and things will look
better. □
Lemma 5.1.4. If a has order m and b has order n and (m, n) = 1, then ab has order mn.
Proof. If a^m ≡ 1 (mod p) then a^{mn} ≡ 1 (mod p), and similarly b^{mn} ≡ 1 (mod p), so (ab)^{mn} ≡ 1
(mod p). So if g is the order of ab, then g divides mn. We want to show that m | g and n | g.
We know that (ab)^g ≡ 1 (mod p), so (ab)^{gn} ≡ 1 (mod p), which means that a^{gn} ≡ 1
(mod p), so m | gn and hence m | g. The same argument works for n. □
There is therefore a primitive root (mod p). We've proved that (Z/pZ)× has a generator
g (mod p).
What is the order of an element g^a? In general the order is (p − 1)/(a, p − 1); in particular, g^a is
also a primitive root exactly when (a, p − 1) = 1.
What we've proved is that (Z/pZ)× is isomorphic to Z/(p−1)Z. The number of primitive
roots (mod p) is φ(p − 1).
If d | p − 1, how many elements of order d are there? The elements of order d are of the
form g^{((p−1)/d)\, l} where (d, l) = 1. There are φ(d) elements with this order.
This also means that
$$\sum_{d \mid p-1} \varphi(d) = p - 1.$$
In fact,
$$\sum_{d \mid n} \varphi(d) = n.$$
To see this, write down the fractions 1/n, 2/n, . . . , n/n, and reduce each to lowest terms. The
possible denominators are the divisors d of n, and exactly φ(d) of the fractions have reduced
denominator d; this gives the previous sum.
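A quick numerical check of this identity (a sketch, not from the notes):

```python
from math import gcd

def phi(n):
    return sum(1 for a in range(1, n + 1) if gcd(a, n) == 1)

def divisors(n):
    return [d for d in range(1, n + 1) if n % d == 0]

for n in (12, 30, 97):
    assert sum(phi(d) for d in divisors(n)) == n
```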
Theorem 5.1.5. If p is odd, there is a primitive root (mod p^α).
Proof. First, we go from p to p². Take g (mod p) to be a primitive root. We want to
construct a primitive root (mod p²). Any element a (mod p²) can be written as a₀ + a₁p where
0 ≤ a₀, a₁ ≤ p − 1.
Starting with g + kp (mod p²), 0 ≤ k ≤ p − 1, we compute the order of g + kp (mod p²).
If the order is r, then (g + kp)^r ≡ 1 (mod p²), so r | p(p − 1); also g^r ≡ 1 (mod p), so
p − 1 | r. So r = p − 1 or r = p(p − 1). Can it actually be equal to p − 1?
We'll prove next time that for exactly one value of k, g + kp will have order p − 1 (mod p²),
and for the remaining p − 1 values of k, g + kp will be a primitive root (mod p²).
6. 10/6
Midterm in two weeks: Wednesday week after next: October 20.
We have a polynomial f(x) ∈ Z[x], f(x) = a_n x^n + a_{n−1} x^{n−1} + · · · + a₀. We're interested
in solutions to f(x) ≡ 0 (mod p). We're really only interested in the coefficients (mod p).
We proved:
Theorem 6.0.6. If (a_n, p) = 1, then f(x) ≡ 0 (mod p) has at most n solutions.
Proof. Suppose not; then there are distinct solutions b₁, . . . , b_{n+1} (mod p). Let g(x) = f(x) − a_n(x −
b₁)(x − b₂) · · · (x − bₙ), so g(x) has degree < n and g(b₁), . . . , g(bₙ) ≡ 0 (mod p). This is a
contradiction unless all coefficients of g are 0 (mod p). But then plug in x = b_{n+1}:
f(b_{n+1}) = a_n(b_{n+1} − b₁) · · · (b_{n+1} − bₙ) ≢ 0 (mod p), a contradiction. □
Last time, this was drowned in notation.
Lemma 6.0.7. If q^α || p − 1 then there is an element a (mod p) of order q^α.
Proof. If there is an element b (mod p) of order q^α r, then b^r has order q^α. So suppose there is
no element whose order is a multiple of q^α. Then every element has order dividing (p − 1)/q.
So x^{(p−1)/q} ≡ 1 (mod p) has p − 1 solutions, which is a contradiction. □
If p − 1 = q₁^{α₁} q₂^{α₂} · · · q_l^{α_l}, then we get a_i (mod p) of order q_i^{α_i}. We multiply these together
to get that a₁ · · · a_l has order p − 1.
Theorem 6.0.8. (Z/pZ)∗ is cyclic, and there is a primitive root (mod p).
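A direct search for a primitive root, as a sketch (not from the lecture): an element g has order p − 1 exactly when g^{(p−1)/q} ≢ 1 (mod p) for every prime q dividing p − 1.

```python
def prime_factors(n):
    fs, p = set(), 2
    while p * p <= n:
        while n % p == 0:
            fs.add(p)
            n //= p
        p += 1
    if n > 1:
        fs.add(n)
    return fs

def primitive_root(p):
    """Smallest primitive root mod an odd prime p: check g^((p-1)/q) for each q | p-1."""
    qs = prime_factors(p - 1)
    for g in range(2, p):
        if all(pow(g, (p - 1) // q, p) != 1 for q in qs):
            return g

print(primitive_root(23))   # 5 is the least primitive root mod 23
```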
6.1. Lifting.
6.1.1. Lift from p to p². We will lift primitive roots (mod p) to primitive roots (mod p²).
We began this last time.
Consider a primitive root g (mod p). There are p lifts of it (mod p²): g + kp (mod p²)
where 0 ≤ k ≤ p − 1. These are the only candidates for a primitive root (mod p²).
Exercise 6.1.1. If g (mod p²) is a primitive root, then g (mod p) is a primitive root.
Solution. If g^r ≡ 1 (mod p), then g^r = 1 + ap. Then g^{rp} = (1 + ap)^p ≡ 1 (mod p²). The
order of g (mod p²) is a multiple of r that divides rp. □
The order of g (mod p) is p − 1, and the order of g + kp (mod p) is p − 1, so the order
of g + kp (mod p²) is a multiple of p − 1 and a divisor of p(p − 1). So there are in fact only two
choices: it could be p − 1 or p(p − 1).
If g^{p−1} = 1 + ap, then consider (g + kp)^{p−1}. We want to check whether this is 1 (mod p²). We
can expand this using the binomial theorem:
$$(g + kp)^{p-1} = g^{p-1} + \binom{p-1}{1}(kp)\,g^{p-2} + \binom{p-1}{2}(kp)^2\, g^{p-3} + \cdots \equiv g^{p-1} + (p-1)kp\, g^{p-2} \equiv 1 + ap - kp\, g^{p-2} \pmod{p^2}.$$
So we want to see whether a ≡ k g^{p−2} (mod p), or equivalently, k ≡ ag (mod p).
Therefore, the order of g + kp (mod p²) is p − 1 for exactly one value of k and it is p(p − 1) for the
other p − 1 values of k. So every primitive root (mod p) gives p − 1 primitive roots (mod p²). So we get
at least (p − 1)φ(p − 1) = φ(φ(p²)) primitive roots.
6.1.2. Lift from p² to p³. Suppose g (mod p²) is a primitive root, and consider g + kp² (mod p³). The
possible orders are p²(p − 1) or p(p − 1). Now write g^{p(p−1)} = 1 + bp², and we want to
understand (g + kp²)^{p(p−1)} (mod p³). This comes from the binomial theorem:
$$(g + kp^2)^{p(p-1)} = g^{p(p-1)} + \binom{p(p-1)}{1} kp^2\, g^{p(p-1)-1} + \cdots \equiv g^{p(p-1)} \pmod{p^3}.$$
Could g^{p(p−1)} have been 1 (mod p³)? Could b have been a multiple of p? We can write
g^{p−1} = 1 + ap with p ∤ a. Then
$$(g^{p-1})^p = 1 + \binom{p}{1}(ap) + \binom{p}{2}(ap)^2 + \cdots \equiv 1 + ap^2 \pmod{p^3}.$$
That means that b ≡ a (mod p) is not a multiple of p.
So if g (mod p²) is a primitive root (mod p²), then g + kp² (mod p³) is a primitive root
(mod p³) for every 0 ≤ k ≤ p − 1.
There are therefore p(p − 1)φ(p − 1) = φ(φ(p³)) primitive roots.
Hence, if p is an odd prime, there are φ(φ(p^α)) primitive roots (mod p^α).
6.1.3. Structure of (Z/2p^αZ)×. This is the same as the structure of (Z/p^αZ)× by the Chinese
Remainder Theorem.
Why did we have to assume that p is an odd prime? (Z/4Z)× = {1, 3 (mod 4)} is
generated by 3. However, for (Z/8Z)×, every element has order 1 or 2. This means that it
is the Klein four group Z/2Z × Z/2Z. What is different in the proof?
If α ≥ 3, then (Z/2^αZ)× has size 2^{α−1}. Then 5 has order 2^{α−2}. Also include −1, and we
can prove that every odd a (mod 2^α) is ±5^j for some 1 ≤ j ≤ 2^{α−2}.
6.2. Quadratic Congruences. Consider ax² + bx + c ≡ 0 (mod p), where p is odd. There's
not much to do for p = 2. We can also assume that (a, p) = 1. First, we complete the square:
$$4a(ax^2 + bx + c) \equiv 0 \pmod{p},$$
$$(2ax)^2 + 2(2ax)b + 4ac \equiv 0 \pmod{p},$$
$$(2ax + b)^2 \equiv b^2 - 4ac \pmod{p}.$$
Therefore, it is sufficient to solve y² ≡ d (mod p), where d is the discriminant b² − 4ac.
Definition 6.2.1. A residue class a (mod p) is called a quadratic residue if there are 2
solutions to x² ≡ a (mod p) and a nonresidue if there are no solutions to the congruence. If
a ≡ 0 (mod p), then there is only one solution.
Definition 6.2.2. The Legendre symbol is
$$\left(\frac{a}{p}\right) = \begin{cases} 1 & \text{if } a \text{ is a quadratic residue} \pmod{p} \\ -1 & \text{if } a \text{ is a quadratic nonresidue} \pmod{p} \\ 0 & \text{if } a \equiv 0 \pmod{p}. \end{cases}$$
7. 10/11
We want to understand quadratic congruences (mod n), and it is sufficient to understand
them (mod p); from that, simply use the Chinese Remainder Theorem.
We considered ax² + bx + c ≡ 0 (mod p), with p odd and (a, p) = 1. This reduced to solving
y² ≡ d (mod p), which led to the definitions of quadratic residue and Legendre symbol.
7.1. Quadratic residues. There is a primitive root g (mod p), so every n with (n, p) = 1
satisfies n ≡ g^a (mod p) for some a. From this, we see that if a is even, then n is a quadratic
residue because n ≡ (g^{a/2})² (mod p). If a is odd, then n is a quadratic nonresidue: if n ≡ x²
with x ≡ g^b, then g^a ≡ g^{2b}, so a ≡ 2b (mod p − 1), which is impossible since p − 1 is even.
From this, we see that the Legendre symbol is (completely) multiplicative, which means
that (mn/p) = (m/p)(n/p).
Proposition 7.1.1 (Euler's Criterion). If (n, p) = 1 then
$$\left(\frac{n}{p}\right) \equiv n^{\frac{p-1}{2}} \pmod{p}.$$
Proof. Write n ≡ g^k (mod p). If k is even, the left hand side is 1, and n^{(p−1)/2} ≡ g^{k(p−1)/2} ≡ 1,
so the two sides agree.
If k = 2l + 1 is odd, the LHS is −1, and we have
$$n^{\frac{p-1}{2}} \equiv g^{(2l+1)\frac{p-1}{2}} \equiv g^{\frac{p-1}{2}} \equiv -1 \pmod{p},$$
since g has order p − 1, so g^{(p−1)/2} is a square root of 1 that is not 1. □
Corollary 7.1.2.
$$\left(\frac{-1}{p}\right) = (-1)^{\frac{p-1}{2}} = \begin{cases} +1 & \text{if } p \equiv 1 \pmod 4 \\ -1 & \text{if } p \equiv 3 \pmod 4. \end{cases}$$
This makes it easy to determine whether a number is a quadratic residue (mod p).
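A sketch (not from the notes) of the Legendre symbol computed via Euler's criterion:

```python
def legendre(n, p):
    """Legendre symbol (n/p) for an odd prime p, via n^((p-1)/2) (mod p)."""
    r = pow(n, (p - 1) // 2, p)
    return -1 if r == p - 1 else r     # r is 0, 1, or p-1 (i.e. -1 mod p)

assert legendre(-1, 13) == 1      # 13 ≡ 1 (mod 4)
assert legendre(-1, 7) == -1      # 7 ≡ 3 (mod 4)
assert legendre(10, 13) == 1      # 6^2 = 36 ≡ 10 (mod 13)
```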
We can produce a primality test. We want to see if n is prime. Pick any a < n, and check
if a^{n−1} ≡ 1 (mod n). If this is ≢ 1, then n is composite. If it is ≡ 1, then we look at
a^{(n−1)/2} ≡ ±1. If it is −1, we stop. If it is +1, see if (n − 1)/2 is even and check if
a^{(n−1)/4} ≡ ±1, and so on.
If a number n passes this test for a given value of a, it is called a strong pseudoprime to the base a. Try a
different a with this procedure. This is a very rapid process.
If the Generalized Riemann Hypothesis is true, then this algorithm works efficiently.
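A sketch of this strong pseudoprime test in Python (not from the lecture), organized in the usual Miller-Rabin style: write n − 1 = 2^s · d with d odd and look at a^d, a^{2d}, a^{4d}, . . . :

```python
def is_strong_probable_prime(n, a):
    """Strong pseudoprime test for odd n > 2 with base a."""
    d, s = n - 1, 0
    while d % 2 == 0:              # write n - 1 = 2^s * d with d odd
        d //= 2
        s += 1
    x = pow(a, d, n)
    if x == 1 or x == n - 1:
        return True
    for _ in range(s - 1):         # square repeatedly, looking for -1
        x = x * x % n
        if x == n - 1:
            return True
    return False                   # definitely composite

assert is_strong_probable_prime(10007, 2)        # 10007 is prime
assert not is_strong_probable_prime(561, 2)      # Carmichael number caught by base 2
```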
Theorem 7.1.3 (Gauss's Law of Quadratic Reciprocity). Given two primes p and q (dif-
ferent and odd), then
$$\left(\frac{p}{q}\right)\left(\frac{q}{p}\right) = (-1)^{\frac{p-1}{2}\cdot\frac{q-1}{2}} = \begin{cases} 1 & \text{if either } p \text{ or } q \text{ is } 1 \pmod 4 \\ -1 & \text{if both } p \text{ and } q \text{ are } 3 \pmod 4. \end{cases}$$
This is a result that is theoretically interesting. It is not yet clear why it is an interesting
or important fact. People are still looking for similar reciprocity laws in other cases. You’ll
have to trust me that it is interesting. It is not a useful computational tool.
The Legendre symbol (mod p) has some properties:
(1) is periodic (mod p)
(2) is completely multiplicative
This is rather surprising, as it is not clear why such a function should exist. We’ll later
find all such functions that are periodic and multiplicative. This will be the crucial thing to
prove Dirichlet’s Theorem on producing primes in arithmetic progressions.
Example 7.1.4. The function
$$n \mapsto \begin{cases} 1 & \text{if } n \equiv 1 \pmod 4 \\ -1 & \text{if } n \equiv 3 \pmod 4 \\ 0 & \text{if } n \text{ is even} \end{cases}$$
is such a function: it is periodic (mod 4) and completely multiplicative.
Proof of Quadratic Reciprocity. Consider the reduced residue classes 1, 2, . . . , (p−1)/2, and mul-
tiply each of them by a to get the classes a, 2a, . . . , a(p−1)/2. Suppose 1 ≤ j ≤ (p−1)/2. We can
make aj (mod p) lie in the interval [−(p−1)/2, (p−1)/2]. There are now two cases:
$$aj \equiv b_j \text{ or } -b_j \pmod{p}, \qquad 1 \le b_j \le \frac{p-1}{2}.$$
Now, consider 1 ≤ j ≠ k ≤ (p−1)/2. Can b_j = b_k? No, because aj ≢ ak (mod p) and aj ≢ −ak
(mod p). So the b_j form a permutation of {1, . . . , (p−1)/2}.
Lemma 7.1.5.
$$a^{\frac{p-1}{2}} \equiv (-1)^{\#\{j \,:\, aj \equiv -b_j\}} \pmod{p}.$$
Proof.
$$a^{(p-1)/2} \prod_{j=1}^{(p-1)/2} j = \prod_{j=1}^{(p-1)/2} (aj) \equiv \prod_{j=1}^{(p-1)/2} (\pm b_j) \pmod{p},$$
and since the b_j are a permutation of 1, . . . , (p−1)/2, the product of the b_j cancels, leaving the sign. □
Note that the number of times we get −b_j is equal to the number of times that aj (mod p)
lies in [(p+1)/2, p − 1]. We don't really care about this number; just whether it's even or odd.
Now,
$$aj = p\left\lfloor \frac{aj}{p} \right\rfloor + r_j.$$
There are two cases: r_j = b_j or r_j = p − b_j. If it is +b_j, then r_j has the same parity as b_j; if it
is −b_j, then it has the opposite parity as b_j.
We can now compute
$$\sum_{j=1}^{(p-1)/2} aj = \sum_{j=1}^{(p-1)/2} p\left\lfloor \frac{aj}{p} \right\rfloor + \sum_{j=1}^{(p-1)/2} r_j \equiv \sum_{j=1}^{(p-1)/2} \left\lfloor \frac{aj}{p} \right\rfloor + \sum_{j=1}^{(p-1)/2} j + (\#\text{ of } -b_j \text{ terms}) \pmod{2}.$$
Now, since the left-hand side is a(p²−1)/8 and the sum of the j is (p²−1)/8,
$$(\#\text{ of } -b_j \text{ terms}) \equiv (a-1)\,\frac{p^2-1}{8} + \sum_{j=1}^{(p-1)/2} \left\lfloor \frac{aj}{p} \right\rfloor \pmod{2}.$$
Corollary 7.1.6.
$$\left(\frac{2}{p}\right) = (-1)^{\frac{p^2-1}{8}} = \begin{cases} +1 & \text{if } p \equiv \pm 1 \pmod 8 \\ -1 & \text{if } p \equiv 3 \text{ or } 5 \pmod 8. \end{cases}$$
Proof. Take a = 2 above, and we get that
$$(\#\text{ of } -b_j \text{ terms}) \equiv \frac{p^2-1}{8} + \sum_{j=1}^{(p-1)/2} \left\lfloor \frac{2j}{p} \right\rfloor = \frac{p^2-1}{8} \pmod{2},$$
since ⌊2j/p⌋ = 0 for all 1 ≤ j ≤ (p−1)/2. □
Now,
$$\left(\frac{q}{p}\right) = (-1)^{\frac{p^2-1}{8}(q-1) + \sum_{j=1}^{(p-1)/2} \lfloor qj/p \rfloor} = (-1)^{\sum_{j=1}^{(p-1)/2} \lfloor qj/p \rfloor},$$
since q − 1 is even. Similarly,
$$\left(\frac{p}{q}\right) = (-1)^{\sum_{k=1}^{(q-1)/2} \lfloor pk/q \rfloor},$$
so their product is
$$\left(\frac{p}{q}\right)\left(\frac{q}{p}\right) = (-1)^{\sum_{j=1}^{(p-1)/2} \lfloor qj/p \rfloor + \sum_{k=1}^{(q-1)/2} \lfloor pk/q \rfloor}.$$
We need one final trick. Consider all numbers of the form qj − pk where 1 ≤ j ≤ (p−1)/2 and
1 ≤ k ≤ (q−1)/2.
There are ((p−1)/2)·((q−1)/2) such integers, all nonzero. Some are positive and some are negative. How
many positive numbers are there? For it to be positive, we need qj > pk. Given j, there are
⌊qj/p⌋ such values of k. The total number of positive values is therefore
$$\sum_{j=1}^{(p-1)/2} \left\lfloor \frac{qj}{p} \right\rfloor.$$
The negative values come from qj < pk. Given k, the number of such j is ⌊pk/q⌋, so the total
number of negative values is
$$\sum_{k=1}^{(q-1)/2} \left\lfloor \frac{pk}{q} \right\rfloor.$$
So
$$\left(\frac{p}{q}\right)\left(\frac{q}{p}\right) = (-1)^{\sum_{j=1}^{(p-1)/2} \lfloor qj/p \rfloor + \sum_{k=1}^{(q-1)/2} \lfloor pk/q \rfloor} = (-1)^{\left(\frac{p-1}{2}\right)\left(\frac{q-1}{2}\right)}. \qquad \square$$
Remark. This was not the most intuitive proof, but it doesn't require much machinery to
set up. There are more intuitive proofs. For example, we want to work with congruences of
things other than integers. Here,
(a + b)^p ≡ a^p + b^p (mod p)
might hold for algebraic integers a and b, i.e. where a and b are solutions to monic polynomial
equations with integer coefficients.
There are nice algebraic integers that you can construct; these are called Gauss sums:
$$g = \sum_{n=1}^{p-1} \left(\frac{n}{p}\right) e^{2\pi i n / p}.$$
These are also algebraic integers. Now we can do ingenious things:
$$g^q = \left( \sum_{n=1}^{p-1} \left(\frac{n}{p}\right) e^{2\pi i n/p} \right)^{\!q} \equiv \sum_{n=1}^{p-1} \left(\frac{n}{p}\right) e^{2\pi i nq/p} = \left(\frac{q}{p}\right) \sum_{n=1}^{p-1} \left(\frac{nq}{p}\right) e^{2\pi i nq/p} = \left(\frac{q}{p}\right) g \pmod{q}.$$
Therefore,
$$\left(\frac{q}{p}\right) \equiv (\text{Gauss sum})^{q-1} \pmod{q}.$$
8. 10/13
We now know how to solve any quadratic congruence ax² + bx + c ≡ 0 (mod p). This leads to
computing (d/p) with d = b² − 4ac.
We know a variety of facts about the Legendre symbol. In particular, it gives statements
of the form
$$\left(\frac{d}{p}\right) = \begin{cases} 1 & \text{if } p \text{ lies in some residue classes (mod } 4|d|) \\ -1 & \text{if } p \text{ lies in some other residue classes (mod } 4|d|). \end{cases}$$
Example 8.0.7.
$$\left(\frac{5}{p}\right) = \left(\frac{p}{5}\right) = \begin{cases} 1 & \text{if } p \equiv 1, 4 \pmod 5 \\ -1 & \text{if } p \equiv 2, 3 \pmod 5, \end{cases}$$
$$\left(\frac{7}{p}\right) = \begin{cases} \left(\frac{p}{7}\right) & \text{if } p \equiv 1 \pmod 4 \\ -\left(\frac{p}{7}\right) & \text{if } p \equiv 3 \pmod 4, \end{cases}$$
which leads to conditions on p (mod 28).
Figure out what happens when d = 35.
8.1. Absolute Values in Q. We’ll talk about some pretty theorems. There’s a completely
different way where primes appear. This has to do in some way with analysis. If you think
about real analysis, it is based on the notion of distance between two numbers, which is
based on absolute value. This has some nice properties: |xy| = |x||y| and |x + y| ≤ |x| + |y|.
The questions that we want to think about involve the field Q of rational numbers.
Definition 8.1.1. An absolute value on Q is a function f : Q → R≥0 with the following
properties:
(1) f (x) = 0 iff x = 0
(2) f (xy) = f (x)f (y) for all x, y ∈ Q
(3) f (x + y) ≤ f (x) + f (y) (triangle inequality).
Example 8.1.2. The trivial absolute value: f(0) = 0, f(x) = 1 for all x ≠ 0.
Example 8.1.3. f(x) = |x| satisfies these properties. f(x) = |x|^α does too when 0 < α ≤ 1.
The conditions on α come from the triangle inequality. We want to check that
x^α + y^α ≥ (x + y)^α for x, y ≥ 0.
If we take x = y then we need 2 ≥ 2^α, so we need α ≤ 1. To show the general inequality, divide both
sides by y^α to reduce it to t^α + 1 ≥ (t + 1)^α, which is a problem in single-variable calculus.
There is another class of examples that comes from primes.
Example 8.1.4. p-adic absolute value. Let p be a prime. Consider n ∈ N, and write
n = p^α b. Here, p^α || n, so p ∤ b and α ≥ 0.
Define the p-adic absolute value as
|n|_p = p^{−α},
and additionally define |−1|_p = 1 and |0|_p = 0.
If we have a rational number m/n, define
$$\left| \frac{m}{n} \right|_p = \frac{|m|_p}{|n|_p}.$$
Multiplicativity is obvious. We need to check the triangle inequality. For simplicity, we
do this for the integers. Suppose that n₁ = p^{α₁} b₁ and n₂ = p^{α₂} b₂. We want to show that
|n₁ + n₂|_p ≤ |n₁|_p + |n₂|_p.
Note that |n₁|_p = p^{−α₁}, |n₂|_p = p^{−α₂}, and |n₁ + n₂|_p ≤ p^{−min(α₁,α₂)} = max(p^{−α₁}, p^{−α₂}). So
the triangle inequality is true, and indeed, we've shown something stronger:
|n₁ + n₂|_p ≤ max(|n₁|_p, |n₂|_p).
To check the triangle inequality for rational numbers, we can extend the previous argument
by clearing denominators.
We needed p to be prime, because otherwise the multiplicativity fails.
With the normal absolute value, the absolute values of the rational numbers form a dense
set in R≥0. In this case, however, the nonzero values are just {p^n : n ∈ Z}.
Strangely, p, p², p³, . . . are small while 1/p, 1/p², . . . are large.
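A small sketch (not from the notes) of |·|_p on nonzero rationals:

```python
from fractions import Fraction

def vp(n, p):
    """Exponent of p in the nonzero integer n."""
    n, a = abs(n), 0
    while n % p == 0:
        n //= p
        a += 1
    return a

def padic_abs(x, p):
    """p-adic absolute value of a nonzero Fraction (or int) x."""
    x = Fraction(x)
    return Fraction(1, p) ** (vp(x.numerator, p) - vp(x.denominator, p))

assert padic_abs(50, 5) == Fraction(1, 25)       # 50 = 2 * 5^2
assert padic_abs(Fraction(3, 250), 5) == 125     # 250 = 2 * 5^3
```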
Example 8.1.5. As in the case of the usual absolute value, we can raise this to a power |x|_p^α.
With the stronger triangle inequality above, we see that the triangle inequality is satisfied whenever
α > 0.
Theorem 8.1.6 (Ostrowski). These are all of the absolute values on Q.
Proof. Let f be an absolute value on Q. Note that multiplicativity implies that f (1) = 1.
Then f (n) ≤ n by the triangle inequality.
If we consider values of f (n), n ∈ N, there are two cases: all are ≥ 1, or at least one of
them is < 1. We want to show that they come from the normal absolute value and the p-adic
absolute value respectively.
Case 1: Pick the smallest n ∈ N with f(n) < 1. Then by minimality, n = p is prime.
Consider r ∈ N, and take its base-p expansion
r = b₀ + b₁p + b₂p² + · · · + b_s p^s,
with 0 ≤ b_j ≤ p − 1. Then
$$f(r) \le f(b_0) + f(b_1) + \cdots + f(b_s) \le (p-1)(s+1) \le (p-1)\left(\frac{\log r}{\log p} + 1\right).$$
Now,
$$f(r)^k = f(r^k) \le (p-1)\left(\frac{k \log r}{\log p} + 1\right),$$
so
$$f(r) \le (p-1)^{1/k}\left(\frac{k \log r}{\log p} + 1\right)^{1/k}.$$
As k → ∞, we conclude that f(r) ≤ 1.
We want to show that if (r, p) = 1 then f(r) = 1. If (r, p) = 1, then (r^k, p^k) = 1. By the
Euclidean algorithm, we can write 1 = r^k x + p^k y. This means that 1 ≤ f(r^k x) + f(p^k y) ≤
f(r)^k + f(p)^k. Let k → ∞. If f(r) < 1 then the right hand side goes to 0 as k → ∞, a
contradiction.
We now know that p is a prime for which f(p) < 1, and f(n) = 1 if (n, p) = 1. But this
tells us everything. Take any n = p^a b with p ∤ b. Then
f(n) = f(p)^a f(b) = f(p)^a.
Write f(p) = p^{−α}, α > 0. Then f(p) = |p|_p^α, and we get that f(n) = |n|_p^α.
Case 2: Now, we consider the case when f(n) ≥ 1 for all n ∈ N. Pick two numbers m, n ∈ N,
m, n > 1. Write m in base n:
m = b₀ + b₁n + b₂n² + · · · + b_s n^s.
Then f(m) ≤ (f(b₀) + f(b₁) + · · · + f(b_s)) f(n)^s ≤ (s + 1)(n − 1) f(n)^s, where s ≈ log m / log n.
Now,
$$f(m)^k = f(m^k) \le \left(\frac{k \log m}{\log n} + 1\right)(n-1)\, f(n)^{\frac{k \log m}{\log n}}.$$
Take k-th roots, and let k → ∞. Then
$$f(m) \le f(n)^{\frac{\log m}{\log n}},$$
so what we've shown is that
$$f(m)^{\frac{1}{\log m}} \le f(n)^{\frac{1}{\log n}}.$$
If we swap m and n, we see that
$$f(m)^{\frac{1}{\log m}} = f(n)^{\frac{1}{\log n}}.$$
If we write f(2) = 2^α, then f(2)^{1/\log 2} = f(n)^{1/\log n}, and we get f(n) = n^α. The triangle
inequality forces 0 < α ≤ 1. □
Remark. From the rational numbers, we can take the completion: we can obtain √2 as
the limit of rational numbers, but it is not a rational number itself. So we can extend this
absolute value continuously from the rational numbers to the real numbers. We can now
do the same thing with this p-adic absolute value. Think of taking sequences of rational
numbers and consider convergence in the p-adic absolute value.
Example 8.1.7. 1, 1 + 7, 1 + 7 + 7², 1 + 7 + 7² + 7³, . . . is a sequence of integers. Does this
sequence converge? No for the usual absolute value, but it does converge for | · |₇. It forms
a Cauchy sequence, and it converges to 1/(1 − 7) = −1/6 = 1 + 7 + 7² + 7³ + · · ·.
In fact, we can even consider √−1 in the 5-adics. We want to find a sequence x₁, x₂, . . .
of natural numbers with |xₙ² + 1|₅ → 0.
Since 2² + 1 ≡ 0 (mod 5), we have |2² + 1|₅ = 1/5. We can lift 2 to 2 + k · 5, and get congruences
(mod 5²), (mod 5³), etc., yielding a converging sequence.
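A sketch (not from the notes) of this lifting: adjust the previous term by a multiple of the current modulus so that xₖ² ≡ −1 (mod 5^k), which makes |xₖ² + 1|₅ → 0.

```python
def lift_sqrt_minus_one(p, k, x0):
    """Given x0 with x0^2 ≡ -1 (mod p), lift it to x with x^2 ≡ -1 (mod p^k)."""
    x, mod = x0, p
    for _ in range(k - 1):
        new_mod = mod * p
        # adjust x by a multiple of `mod` so the congruence holds mod `new_mod`
        for t in range(p):
            if (x + t * mod) ** 2 % new_mod == new_mod - 1:
                x += t * mod
                break
        mod = new_mod
    return x

x = lift_sqrt_minus_one(5, 6, 2)
assert (x * x + 1) % 5**6 == 0       # so |x^2 + 1|_5 <= 5^(-6)
```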
9. 10/18
9.1. Sum of Two Squares. We’ll try to describe all numbers that can be written as the
sum of two squares, and we’ll give two or three proofs of this.
Given a number n, we want to write n = x² + y², x, y ∈ Z. We want a characterization of
all such n. Here's the main theorem:
Theorem 9.1.1. n = x² + y² if and only if n = p₁^{α₁} · · · p_k^{α_k} with the property that if
p_j ≡ 3 (mod 4) then α_j is even.
Let’s try to see why this condition is necessary. It is more difficult to show that it is
sufficient.
Proof. First, we show that the condition is necessary.
Suppose that n is not of this form and n is the sum of two squares, say n = x² + y². So p^{2β+1} || n
for some p ≡ 3 (mod 4). Then x² + y² ≡ 0 (mod p) and hence x² ≡ −y² (mod p), and if (y, p) = 1, this
means that (x/y)² ≡ −1 (mod p); but −1 is a quadratic nonresidue (mod p).
So y is a multiple of p and x is a multiple of p, so x² + y² is a multiple of p²; cancel p² and
repeat.
Now, we show that the condition is sufficient.
First, if m = x₁² + y₁² and n = x₂² + y₂² are both sums of two squares,
then mn is the sum of two squares. Here,
m = (x₁ + iy₁)(x₁ − iy₁),   n = (x₂ + iy₂)(x₂ − iy₂),
so that
mn = (x₁x₂ − y₁y₂ + i(x₁y₂ + x₂y₁))(x₁x₂ − y₁y₂ − i(x₁y₂ + x₂y₁)) = (x₁x₂ − y₁y₂)² + (x₁y₂ + x₂y₁)².
If p ≡ 3 (mod 4), we showed that it isn't the sum of two squares. But p² = p² + 0² is the
sum of two squares, which means that every even power of p is the sum of two squares. So the
main fact that we want to show is the following theorem:
Theorem 9.1.2 (Fermat). If p ≡ 1 (mod 4), then p = x² + y².
Proof 1. If p ≡ 1 (mod 4), then (−1/p) = 1. This means that l² + 1 ≡ 0 (mod p) for some l. So if we look
at the set of all sums of two squares, it contains multiples of p smaller than p². Indeed, if l² + 1 ≡ 0
(mod p), we can take |l| ≤ (p−1)/2, and hence l² + 1² ≤ ((p−1)/2)² + 1 < p².
Suppose that pm is the smallest multiple of p which is the sum of two squares; by the above we can
assume that m < p. Then pm = x² + y². Take x̄ ≡ x (mod m) and ȳ ≡ y (mod m),
where |x̄|, |ȳ| ≤ m/2. Then x̄² + ȳ² ≡ 0 (mod m), say x̄² + ȳ² = mn with n < m.
Now, mn = x̄² + ȳ² and pm = x² + y², so pm · mn = pm²n is the sum of two squares.
With the identity from the beginning of our proof, this means that
pm · mn = pm²n = (xx̄ + yȳ)² + (xȳ − yx̄)².
Both terms on the right are ≡ 0 (mod m). This
means that we can cancel m² everywhere, so pn is the sum of two squares with n < m, contradicting the
minimality assumption for m. □
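A sketch (not from the notes) of this descent in Python: starting from l² + 1 = pm, it repeatedly replaces the current multiple of p by a smaller one that is still a sum of two squares, until the multiple is p itself.

```python
def sum_of_two_squares(p):
    """Write a prime p ≡ 1 (mod 4) as x^2 + y^2 via Fermat's descent."""
    # find l with l^2 ≡ -1 (mod p): take a quadratic nonresidue a and set l = a^((p-1)/4)
    l = next(pow(a, (p - 1) // 4, p) for a in range(2, p)
             if pow(a, (p - 1) // 2, p) == p - 1)
    x, y = l, 1
    m = (x * x + y * y) // p
    while m > 1:
        # centered residues of x, y modulo m, so |xb|, |yb| <= m/2
        xb = (x % m) if (x % m) <= m // 2 else (x % m) - m
        yb = (y % m) if (y % m) <= m // 2 else (y % m) - m
        # both combinations below are divisible by m (the product identity)
        x, y = (x * xb + y * yb) // m, (x * yb - y * xb) // m
        m = (x * x + y * y) // p
    return abs(x), abs(y)

for p in (13, 65537):                    # both are primes ≡ 1 (mod 4)
    x, y = sum_of_two_squares(p)
    assert x * x + y * y == p
```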
Proof 2. As before, we get l² + 1 ≡ 0 (mod p), |l| ≤ (p−1)/2. Consider p = x² + y². If x² + y² ≡ 0
(mod p), we may take y ≡ lx (mod p), since then x² + l²x² ≡ 0 (mod p).
So search among numbers of the form x² + (lx)². We are interested in (x (mod p), lx (mod p)).
What we want is x (mod p) smaller than √p, and lx (mod p) smaller than √p (in absolute value). This is
enough: then x² + (lx)² < 2p and it is a multiple of p, so it is equal to p.
Now, for some 1 ≤ x ≤ √p, we want |lx − kp| < √p, i.e. we want
$$\left|\frac{l}{p} - \frac{k}{x}\right| < \frac{1}{\sqrt{p}\,x}.$$
This is a problem that we can solve with the pigeonhole principle; it is guaranteed by
Dirichlet's Theorem.
Theorem 9.1.3 (Dirichlet's Theorem on Diophantine Approximation). Given a real number
θ and Q ∈ N, we can find a rational number a/q which approximates θ, with q ≤ Q and
$$\left| \theta - \frac{a}{q} \right| < \frac{1}{qQ}.$$
Proof. Look at 0, θ, 2θ, . . . , Qθ (mod 1), i.e. subtract out the integer part and just keep the
fractional part. There are Q + 1 numbers here. Look at the Q boxes
$$\left[0, \frac{1}{Q}\right), \left[\frac{1}{Q}, \frac{2}{Q}\right), \cdots, \left[\frac{Q-1}{Q}, 1\right).$$
By the pigeonhole principle, there exist 0 ≤ j < k ≤ Q with the two numbers jθ and kθ
lying in the same box.
This means that (k − j)θ is within 1/Q of an integer. Then
$$\theta = \frac{\text{integer}}{k-j} + \text{error}, \qquad |\text{error}| < \frac{1}{Q(k-j)}. \qquad \square$$
Remark. If you are given an irrational number θ, there are infinitely many q with
$$\left| \theta - \frac{a}{q} \right| \le \frac{1}{q^2}.$$
9.2. Z[i]. We will do arithmetic in the Gaussian ring of integers Z[i]. Here,
Z[i] = {a + bi : a, b ∈ Z}.
This is nice, but we can’t divide. Allowing division, we obtain
Q(i) = {a + bi : a, b ∈ Q}.
Units in Z[i] are ±1 and ±i. One thing that clarifies a lot of stuff is the norm. Define the
norm as
N(a + bi) = a² + b² = (a + bi)(a − bi).
This has various nice properties. For example, N(a + bi) is a nonnegative rational number, and
if a + bi ∈ Z[i] then the norm is an integer. Furthermore,
N((a + bi)(c + di)) = N(a + bi) N(c + di).
If u ∈ Z[i] and 1/u ∈ Z[i], then u is called a unit. This forces N(u) = 1, so u = ±1, ±i
are the only units.
π is a prime if π | αβ implies that π | α or π | β. Here, α | β if β = αγ with γ ∈ Z[i].
α is irreducible if α = βγ implies one of β or γ is a unit. This means that α is irreducible
if it can't be written as a product of two numbers with smaller norm. If N(α) = p is a rational prime,
then α is irreducible.
Example 9.2.1. 2 + i is irreducible since N(2 + i) = 5. 7 is irreducible because if 7 = αβ with neither
factor a unit, then N(7) = 49 = N(α)N(β), which means that N(α) = 7, which is impossible because 7 is
not the sum of two squares.
The question we want to ask is: Is there a division algorithm? If we want to divide, we
want to write the result as a quotient plus some remainder. Here,
$$\frac{a + bi}{c + di} = \rho + \sigma i,$$
with ρ, σ ∈ Q. Pick r and s to be the closest integers to ρ and σ. Then the quotient is r + si.
We need to show that the remainder has smaller norm than the number we divide by. This
isn't too hard to do.
10. 10/25
Recall the theorem of Fermat that p ≡ 1 (mod 4) means that p can be written as the
sum of two squares. We already gave two proofs of this. The first was by looking at
minimal multiples of p as the sum of two squares, and the second was by Dirichlet’s theorem
on Diophantine approximation. We started looking at a third proof: The arithmetic of
Z[i] = {a + bi : a, b ∈ Z}, which sits naturally in the field Q(i) = {a + bi : a, b ∈ Q}.

10.1. Z[i]. Here, the norm of α = a + bi ∈ Q(i) is N(α) = a² + b² = α·ᾱ. Note that N(αβ) =
N(α)N(β).
α ∈ Z[i] is a unit if 1/α ∈ Z[i]. If α ∈ Z[i], then N(α) ∈ N ∪ {0} (zero only if α = 0), so if
α is a unit, N(α) = 1, and the only units are α = ±1, ±i.
α | β if β = αγ for some γ ∈ Z[i]. α ∈ Z[i] is irreducible if α = βγ for β, γ ∈ Z[i] implies
that β or γ is a unit. Equivalently, α ≠ βγ with 1 < N(β), N(γ) < N(α).
If N(α) = p for a rational prime p, then α is irreducible. For example, 1 + 2i ∈ Z[i]
is irreducible, as are 1 − 2i, 1 + i, 3 + 2i. But there are other irreducibles too. If p ≡ 3
(mod 4), then p is irreducible in Z[i].
Proof. Suppose that p is not irreducible, say p = αβ with neither factor a unit, so N(α)N(β) = p², and
so N(α) = N(β) = p. But then p = a² + b², contradicting p ≡ 3 (mod 4). □
Note that 5 = (1 + 2i)(1 − 2i) and 2 = (1 + i)(1 − i) = (1 + i)²(−i), so these are not
irreducible.
Our aim is to show that if p ≡ 1 (mod 4) then p = π·π̄ where N(π) = p and π is irreducible.
π is prime means that if π | αβ then π | α or π | β. What we would like to prove is
Theorem 10.1.1. In Z[i], every irreducible is prime and conversely.
Proof. In the case of the integers, we used the division algorithm. We want to do something
similar here.
Note that the converse is easy: If π is prime and π = αβ, then π | α or π | β. Then
N (α) ≥ N (π) or N (β) ≥ N (π). This implies that N (α) = N (π) and N (β) = 1, or the other
way around, so π is irreducible.

10.1.1. Division algorithm. Given α, β ∈ Z[i] with β ≠ 0, we can write α = βγ + δ, where γ is the
quotient and δ is the remainder. We want to be able to do this with 0 ≤ N(δ) < N(β).
Proof. Take α/β = ρ + σi, ρ, σ ∈ Q. Take r and s to be integers with |ρ − r| ≤ 1/2 and
|σ − s| ≤ 1/2.
Then take γ = r + si. Then δ = ((ρ − r) + (σ − s)i)β. Then
$$N(\delta) = N(\beta)\left((\rho - r)^2 + (\sigma - s)^2\right) \le \frac{1}{2} N(\beta) < N(\beta). \qquad \square$$
10.1.2. Euclidean algorithm. Our aim is to understand the common factors of α and β. We
write α = βγ + δ, then β = (· · ·)δ + (· · ·), and so on. This always terminates, because the
norms of the remainders strictly decrease by the division algorithm.
We can express the "greatest common divisor" of α and β as a linear combination αx + βy
where x, y ∈ Z[i].
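A sketch (not from the notes) of this division algorithm and the resulting Euclidean algorithm, representing a Gaussian integer x + yi as the pair (x, y):

```python
def gauss_divmod(a, b):
    """Divide Gaussian integers a by b, rounding the quotient to the nearest
    Gaussian integer so that N(remainder) <= N(b)/2 < N(b)."""
    ax, ay = a
    bx, by = b
    nb = bx * bx + by * by                       # N(b)
    # a/b = a * conj(b) / N(b); round real and imaginary parts to nearest integers
    qx = round((ax * bx + ay * by) / nb)
    qy = round((ay * bx - ax * by) / nb)
    rx, ry = ax - (qx * bx - qy * by), ay - (qx * by + qy * bx)
    return (qx, qy), (rx, ry)

def gauss_gcd(a, b):
    """Euclidean algorithm in Z[i]; the result is determined only up to a unit."""
    while b != (0, 0):
        _, r = gauss_divmod(a, b)
        a, b = b, r
    return a

# 5 = (1 + 2i)(1 - 2i): the gcd of 5 and 1 + 2i is 1 + 2i up to a unit
g = gauss_gcd((5, 0), (1, 2))
assert g[0] ** 2 + g[1] ** 2 == 5
```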
10.1.3. Finishing the proof. We can now show that if π is irreducible and π | αβ, then π | α
or π | β. Suppose that π ∤ α. Then the only common factors of π and α are the units ±1, ±i.
By the Euclidean algorithm, we can therefore write 1 = αx + πy. Then β = αβx + βπy.
Therefore, π | β. □
Theorem 10.1.2. If p ≡ 1 (mod 4) then p = x² + y² is the sum of two squares.
Proof. We want to show that p is reducible, as then we could write p = π·π̄, and then
N(π) = N(π̄) = x² + y² = p.
So suppose that p is irreducible, which is equivalent to saying that p is prime. We know
that there is l such that l² + 1 ≡ 0 (mod p). This is simply the fact that −1 is a quadratic
residue (mod p). This means that p | (l + i)(l − i), so p | l + i or p | l − i. But these are both
impossible: l + i = p(a + bi) would mean that l = pa and 1 = pb, which is impossible. So p = π·π̄ is
in fact reducible, and we're done. (Note that π and π̄ are primes in Z[i] because they have
norm p.) □
10.2. Arithmetic of Z[i]. There is unique factorization into primes (up to units and order-
ing of the primes). This is the same proof as in the case of the integers (cancel primes one at a time).
The primes in Z[i] are: the rational primes p ≡ 3 (mod 4); for p ≡ 1 (mod 4), the two primes π, π̄ with
p = π·π̄; and 1 + i, since 2 = (1 + i)²(−i). Here, 2 is special because 1 − i = −i(1 + i),
so 1 − i and 1 + i are actually the same prime.
Instead of Z[i], we could try to do the same thing in Q(√−5), where we consider Z[√−5] =
{a + b√−5 : a, b ∈ Z}. Here, N(a + b√−5) = a² + 5b², and we're hopeful that we can repeat
what we did earlier. Here, the units are ±1, and 2, 3, 1 + √−5, and 1 − √−5 are all
irreducible, but 2 · 3 = (1 + √−5)(1 − √−5), and there is no unique factorization.
Here, the division algorithm fails because the remainder can be too big. This means that the
Euclidean algorithm may not terminate.
In Q(√14), there is unique factorization, but there is no Euclidean algorithm.
There are imaginary quadratic fields Q(√−d), d > 0. There are exactly 9 values of d where
one gets unique factorization into irreducibles. The largest example is Q(√−163). This is
connected to the fact observed by Euler that n² + n + 41 is prime when n = 0, 1, 2, . . . , 39,
and the discriminant of this is −163. There is no better quadratic.
In the integers,
{ax + by : x, y ∈ Z} = {gx : x ∈ Z}
where g = gcd(a, b). Something similar is true in Z[i]:
{αx + βy : x, y ∈ Z[i]} = {γx : x ∈ Z[i]}.
This relates to the idea of an ideal, which corresponds to what it sounds like: ideal integers.
An ideal I in a ring R satisfies: if α, β ∈ I, then αx + βy ∈ I for all x, y ∈ R. For example, in Z,
(7) is an ideal. In some rings, every ideal is generated by one number; in other rings, this is not true.
For example, in Z[√−5], {2x + (1 + √−5)y} is an ideal that is not principal.
11. 10/27
We will move on to the big theorem that we will prove in the rest of the course.
Theorem 11.0.1 (Dirichlet's Theorem on Primes in Arithmetic Progressions). If (a, q) = 1,
then the arithmetic progression a (mod q) contains infinitely many primes.
This is a very simple sounding statement, but the proof is not so simple. This will take
around three weeks for us to prove. We’ll build up a proof and do this case by case.
Before we consider the main idea of the proof, let’s look at a case you’ve already handled.
We can prove that there are infinitely many primes ≡ 1 (mod 4) and infinitely many primes
≡ 3 (mod 4). The case of 1 (mod 4) was on the homework, using our knowledge about sum
of two squares.
For the case of 3 (mod 4), we have primes p1 , p2 , . . . , pn . Then 4p1 p2 . . . pn − 1 must be
divisible by a new prime ≡ 3 (mod 4).
Similarly, in the spirit of Euclid, we can prove that there are infinitely many primes that
are ≡ 1 (mod 3) and ≡ −1 (mod 3). This will be on the next problem set.
This trick fails for −1 (mod 5), however, as 5p1 p2 . . . pn − 1 could be pq with p ≡ q ≡ 2
(mod 5).
11.1. Euler's proof of the infinitude of primes. This is based on the fact that ∑ 1/p
diverges. This can be seen from ∑ 1/p^σ for σ > 1: this converges, but we'd like to say that
it tends to infinity as σ → 1⁺.
A nice way to think about this is to consider the Riemann zeta function
$$\zeta(s) = \sum_{n=1}^{\infty} \frac{1}{n^s}.$$
For example, if s is real, s > 1, this series converges absolutely. If we think of s = σ + it as
a complex number, we have
$$\frac{1}{n^s} = \frac{1}{n^{\sigma}} \cdot \frac{1}{n^{it}}.$$
The final term has absolute value 1, so
$$\left| \frac{1}{n^s} \right| = \frac{1}{n^{\sigma}}.$$
So ζ(s), s = σ + it, converges absolutely for σ > 1.
The Riemann zeta function has a very natural connection with prime numbers. For every
natural number, we can factor it into primes in an unique way. Then
Y  X ∞
1 1 1
1 + σ + 2σ + 3σ + · · · = ζ(s).
p
p p p n=1

This identity is due to Euler, and it is a form of unique factorization.


27
It isn’t obvious what it means for a product to converge. Suppose we have

Y
(1 + an ).
n=1
P
If an is small, then 1 + an ≈ ean . Then (1 + an ) ≈ e an , so for the product to converge,
Q
we might want the sum of an to converge. We make this more formal.
Definition 11.1.1.

Y
(1 + an )
n=1
converges absolutely if

X
|an |
n=1
converges.
We can certainly say that 1 + an = exp(an + O(an )2 ). If we assume that
P
an converges,
then the product makes sense.
Suppose that we take
∞  
Y 1
1− .
n=2
n
The partial products are 12 · 32 · 34 · · · NN−1 = N1 → 0 as n → ∞, but the product does not
converge absolutely. If a product converges absolutely, then it is not zero unless one of the
terms in the product is zero.
Let’s look
P 1back to the product in the zeta function. This product does converge absolutely
because pσ
converges. Then
∞ Y  Y Y −1
X 1 1 1 1 1
ζ(s) = = 1 + s + 2σ + · · · = = 1− s .
n=1
ns p
p p p
1 − p1s p
p

This product converges absolutely if s > 1, or <s > 1, and converges is this range to a
nonzero value.
Why does this prove that there are infinitely many primes? Suppose that there are finitely
many primes. Then the right hand side remains bounded as σ → 1+ . However, as σ → 1+ ,
∞ Z 2 Z 3 Z ∞
X 1 dt dt dt 1
ζ(σ) = σ
≥ σ
+ σ
= σ
= .
n=1
n 1 t 2 t 1 t σ−1
On the other hand,
Z 2 Z 3
dt dt 1
ζ(σ) ≤ 1 + + + ··· ≤ + 1.
1 tσ 2 tσ σ−1
Proposition 11.1.2. For σ > 1,
 
1 1
ζ(σ) = + O(1) = + γ + c1 (σ − 1) + · · ·
σ−1 σ−1
28
We can then write  
X 1
log ζ(σ) = − log 1 − σ .
p
p
Using

X xk
− log(1 − x) = ,
k=1
k
we have ∞
XX 1 X 1 
1

log ζ(σ) = kσ
= σ
+O 2σ
.
p k=1
kp p
p p
Now,
∞ ∞  
X 1 X 1 1 1 1

≤ kσ
= 2σ 1 = O 2σ
.
k=2
kp k=2
2p 2p 1 − p σ p
Therefore, we have
Proposition 11.1.3.

XX 1 1
log ζ(σ) = kσ
= σ + O(1).
p k=1
kp p

Corollary 11.1.4.
X 1 1
σ
= log + O(1).
p
p σ − 1

This proves that there are infinitely many primes.


11.2. Dirichlet’s theorem for 1 (mod 4) and 3 (mod 4). The proof will be a generaliza-
tion of Euler’s proof of the infinitude of primes, but we need to construct one more function
since we are splitting the primes into two groups. This is called the Dirichlet L-function,
which we will denote by L(s, χ−4 ).
Definition 11.2.1. 
1
 if n ≡ 1 (mod 4)
χ−4 (n) = −1 if n ≡ 3 (mod 4)

0 if n is even
This is periodic with period 4: χ−4 (n) = χ−4 (n + 4). It is also multiplicative: χ−4 (mn) =
χ−4 (m)χ−4 (n).
Definition 11.2.2.
∞ Y χ−4 (p) χ−4 (p2 )

X χ−4 (n) 1 1 1 1
L(s, χ−4 ) = = s − s + s − s + ··· = 1+ + .
n=1
ns 1 3 5 7 p
ps p2s

This last equality again follows from unique factorization and the fact that χ−4 is com-
pletely multiplicative. This is
∞ −1
χ−4 (p)k Y

YX χ−4 (p)
= ks
= 1− s
.
p k=0
p p
p
29
When <s > 1, the product converges absolutely, and it converges to something nonzero.
Therefore, L(s, χ−4 ) 6= 0 if <s > 1.
Where does the series converge? This is an alternating series, and we can using the
alternating series test to see that this converges for s > 0. This doesn’t say anything about
the product, however.
Now, consider

χ−4 (p)k X χ−4 (p)
  XX
X χ−4 (p)
log L(σ, χ−4 ) = − log 1 − = = + O(1).
p
pσ p k=1
pkσ p

We also know that


X 1
log ζ(σ) = + O(1).
p

Therefore,
X 2
log ζ(σ) + log L(σ, χ−4 ) = + O(1)

p≡1 (mod 4)
X 2
log ζ(σ) − log L(σ, χ−4 ) = + O(1)

p≡3 (mod 4)

Let σ → 1+ . Then
X 2
+ O(1) = log ζ(σ) + log L(σ, χ−4 ).

p≡1 (mod 4)

Since log ζ(σ) → ∞, we are done if we can show log L(σ, χ−4 ) does not go to −∞. In fact,
this actually converges to a log L(1, χ−4 ). So we want to show that L(1, χ−4 ) 6= 0. In this
case,
1 1 1 1 π
L(1, χ−4 ) = − + − + · · · = ,
1 3 5 7 4
and hence there are infinitely many primes ≡ 1 (mod 4). The other case is similar, so
X 1 X 1 1 1
σ
≈ σ
≈ log + O(1).
p p 2 σ−1
p≡1 (mod 4) p≡3 (mod 4)

12. 11/1
We are building toward a proof that there are infinitely many p ≡ a (mod q) when (a, q) =
1. We are adding to Euler’s proof of the infinitude of primes.

12.1. Review. Recall


∞ Y −1
X 1 1
ζ(s) = s
= 1− s .
n=1
n p
p
Then

XX 1
log ζ(s) = ks
,
p k=1
kp
30
and
X 1
log ζ(σ) = + O(1).

We also proved
1
ζ(σ) = + O(1).
σ−1
We introduced
Y −1
1 1 1 1 χ−4 (p)
L(s, χ−4 ) = s − s + s − s + · · · = 1−
1 3 5 7 p
ps

where χ−4 is completely multiplicative and periodic. This series converges absolutely if
<s > 1 and converges conditionally if s > 0. The product converges absolutely if s > 1.
Now,

XX χ−4 (p)k X χ−4 (p)
log L(σ, χ−4 ) = kσ
= σ
+ O(1).
p k=1
kp p
p

Then
1 X 1
(log ζ(σ) + log L(σ, χ−4 ) = + O(1)
2 pσ
p≡1 (mod 4)

1 X 1
(log ζ(σ) − log L(σ, χ−4 ) = + O(1).
2 pσ
p≡3 (mod 4)

6 0 because L(1, χ−4 ) = π4 .


Let σ → 1+ , L(σ, χ−4 ) → L(1, χ−4 ) 6= ∞ =

Theorem 12.1.1. As σ → 1+ ,
X 1 1 X 1
σ
= ζ(σ) + O(1) = .
p 2 pσ
p≡1 (mod 4) p≡3 (mod 4)

Here is a pretty fact:


1
= 1 − t2 + t4 − t6 + · · ·
1 + t2
Then
Z 1
dt 1 1 1 π
= 1 − + − + · · · = arctan(1) − arctan(0) = .
0 1 + t2 3 5 7 4
Now, the plan will be to generalize this whole proof for every modulus. We’ll need to con-
struct functions analogous to χ−4 and to write down functions like L(s, χ−4 ). The functions
that we write down will need to converge and be positive, which in general is very hard.
Then we’ll have analogs of the theorem. Evaluating at 1 will be interesting and is called the
class number formula.
31
12.2. q = 3. Now, let q = 3. There were two reduced residue classes (mod 4), so we needed
two functions. You’d think that the same would be true here. One of them is the zeta
function, so we need another function.
We want to find χ−3 that is periodic with period 3 and that is completely multiplicative
and not identically zero.
We want to set χ−3 (n) = 0 if 3 | n. Periodic with period 3 means that χ−3 : (Z/3Z)× → C.
Also, if m, n are reduced residue classes (mod 3), we want χ−3 (mn) = χ−3 (m)χ−3 (n). So we
have to define χ−3 (1 (mod 3)) = 1, and χ−3 (2 (mod 3)) = ±1. If we chose +1, we would
get the zeta function, so we’ll actually take χ−3 (2 (mod 3)) = −1. Then

1
 p ≡ 1 (mod 3)
χ−3 (p) = −1 p ≡ 2 (mod 3)

0 p = 3.
which extends to 
1
 n≡1 (mod 3)  
n
χ−3 (n) = −1 n ≡ 2 (mod 3) = .

0 3
n=3
We said before that the Legendre symbol is completely multiplicative, so we’re looking for
generalizations of Legendre symbols.
Now, define
∞  −1
X χ−3 (σ) Y x−3
L(σ, χ−3 ) = = 1− σ ,
n=1
nσ p
p
and X χ−3
log L(σ, χ−3 ) = + O(1)
p

if σ > 1. Then
1 X 1
(log ζ(σ) + log L(σ, χ−3 ) = + O(1)
2 pσ
p≡1 (mod 3)

1 X 1
(log ζ(σ) − log L(σ, χ−3 ) = + O(1).
2 pσ
p≡2 (mod 3)
+
Let σ → 1 , L(σ, χ−3 ) → L(1, χ−3 ) 6= ∞ 6= 0. We know that this does not go to infinity
because of the conditional convergence from the alternating series test. It does not go to
zero because we can sum the series.
Z 1 Z 1 Z 1
3 4 6 7 1−t dt
L(1, χ−3 ) = (1 − t + t − t + t − t + · · · ) dtT = 3
dt = 2
dt
0 0 1−t 0 1+t+t
Z 1 Z 3/2 Z √3 √
dt dy 3 dz 4
= 2
= 2
= √ 2
0 (t + 1/2) + 3/4 1/2 y + 3/4 1/ 3 2 z + 1 3
2  √ √  π
= √ arctan( 3) − arctan(1/ 3) = √ .
3 3 3
We have therefore proved an analogous theorem to what we got before:
32
Theorem 12.2.1. As σ → 1+ ,
X 1 1 X 1
σ
= ζ(σ) + O(1) = .
p 2 pσ
p≡1 (mod 3) p≡2 (mod 3)

12.3. q = 8. We want to consider 1, 3, 5, 7 (mod 8). To distinguish these four possibilities,


we should look for four functions. We have the zeta function as before, which we will give a
new name: (
1 (n, 8) = 1
χ0 (n) =
0 (n, 8) > 1.
Then
∞ Y −1
X χ0 (n) 1
L(σ, χ0 ) = = 1− σ .
n=1
nσ p
p
So χ0 : (Z/8Z)× → C, and it takes the value 1 on all reduced residue classes. We want three
more functions that are periodic mod 8. They shall be zero on the even numbers. We
want χ : (Z/8Z)× → C not identically zero. They also must satisfy χ(mn) = χ(m)χ(n) for
any two residue classes m, n (mod 8). Another way of saying this is that we want a group
homomorphism from (Z/8Z)× to C.
As before, we must have χ(1) = 1. Note that (Z/8Z)× ∼ = (Z/2Z) × (Z/2Z). There are
two possibilities: χ(3) = ±1.
If χ(3) = 1, then χ(5) = ±1. If χ(5) = 1 then χ(7) = 1. If χ(5) = −1 then χ(7) = −1.
If χ(3) = −1, then χ(5) = ±1. If χ(5) = 1 then χ(7) = −1. If χ(5) = −1 then χ(7) = 1.
These are all of them: there are precisely four of them.
1 3 5 7
χ0 1 1 1 1
χ−4 1 -1 1 -1
χ8 1 -1 -1 1
χ−8 1 1 -1 -1
Currently, this is just numerology. Call these functions characters. We’ll generalize this
in the next lecture or so.
In each of these four characters, we can look at
X χ(n) Y  −1
χ(p)
L(s, χ) = = 1− s .
ns p
p
The series and product converge absolutely when s > 1.
What are we hoping to do? In the case 1 (mod 8), we want
1
(χ0 (n) + χ−4 (n) + χ−8 (n) + χ−4 (n))
4
In the case 3 (mod 8), we want
1
(χ0 (n) − χ−4 (n) − χ−8 (n) + χ−4 (n))
4
In the case 5 (mod 8), we want
1
(χ0 (n) + χ−4 (n) − χ−8 (n) − χ−4 (n))
4
33
In the case 7 (mod 8), we want
1
(χ0 (n) − χ−4 (n) + χ−8 (n) − χ−4 (n))
4
When χ = χ0 , we want  
1
L(s, χ0 ) = ζ(s) 1 − s .
2
If σ > 1,
log L(σ, χ0 ) = log ζ(σ) + O(1).
We already know that L(s, χ−4 ) converges conditionally for s > 0. Additionally,
1 1 1 1
L(s, χ8 ) = s − s − s + s + · · ·
1 3 5 7
1 1 1 1
L(s, χ−8 ) = s + s − s − s + · · ·
1 3 5 7
Here, we can combine adjacent terms to get alternating series that converge conditionally
for s > 0 provided we work out some generalization of the alternating series test.
For all of these characters, σ > 1, and
X χ(p)
log L(σ, χ) = + O(1).

Now,
X 1 1
σ
= (log L(σ, χ0 ) + log L(σ, χ4 ) + log L(σ, χ−8 ) + log L(σ, χ8 )) + O(1),
p 4
1 (mod 8)

and there are three more of these. The alternating series test says that none of these is zero.
The last step is to show that none of these is zero; this step is left as an exercise.
12.4. q = 5. Here, we want to consider χ : (Z/5Z)× → C, where χ(mn) = χ(m)χ(n), χ is
not identically zero.
As before χ(1) = 1, and the group (Z/5Z)× is cyclic. It is generated by 2. We just need
to know χ(2). Since χ(2)4 = χ(16) = χ(1) = 1, we get four possibilities: χ(2) = ±1, ±i.
We can again write down the character table:
1 2 3 4
χ0 1 1 1 1 trivial or principal character
χ5 1 -1 -1 1
ψ 1 i -i -1
ψ 1 -i i -1
We can use these to identify each progression (mod 5). For example, for 2 (mod 5), we
have
1
(χ0 − x5 − iψ + iψ).
4
This is kind of like taking the dot product.
We again define
Y −1 X
χ(p) χ(n)
L(s, χ) = 1− s = s
.
p
p n
34
These converge absolutely when s > 1. When χ = χ0 , we have
 
1
L(s, χ0 ) = ζ(s) 1 − s .
5
Here,
log L(σ, χ0 ) = log ζ(σ) + O(1).
For χ 6= χ0 , we need an alternating series test that will tell us that they converge condi-
tionally for s > 0. We also need to know that L(1, χ) 6= 0 for these characters. The complex
ones aren’t too hard; the real one was done at the Putnam seminar a few weeks ago.

13. 11/3
We were trying to define Dirichlet characters for every modulus to separate out every
reduced residue class.
In the case of q = 5, we saw that χ : (Z/5Z)× → C. 2 is a primitive root (mod 5), so
χ(2)4 = 1, and we computed a character table.
We defined
∞  −1
X χ(n) Y χ(p)
L(s, χ) = = 1− s ,
n=1
ns p
p
which converges for s > 1.
1

If χ = χ0 , we have L(s, χ0 ) = ζ(s) 1 − 5s
. Then
1 1 1
L(s, χ0 ) = 1 − − + + ···
2s 3s 4s
converges conditionally for s > 0, and
1 1 1
L(s, ψ) = 1 + s
− s − s + ···
2 3 4
converges conditionally for s > 0.
Now,

χ(p)k
  XX
X χ(p)
log L(s, χ) = − log 1 − s = ks
.
p
p p k=1
kp
Recall that log of complex numbers is dangerous, because it is not single valued; adding
multiples of 2πi does not change log. When s > 1, we have
X χ(p)
= + O(1).
p
ps
Then,
X 1 1 
= log L(s, χ0 ) + log L(s, χ5 ) + log L(s, ψ) + log L(s, ψ) + O(1).
ps 4
p≡1 (mod 5)

We want L(1, χ5 ), L(1, ψ), L(1, ψ) to be not infinity or zero. For L(s, ψ) and L(s, ψ), we
can look at the imaginary part, i.e.
1 1 1 1
=L(1, ψ) = − + − + · · · > 0.
2 3 5 8
35
We can do this trick of writing it as an integral:
Z 1 1
1 − t − t2 + t3
Z
2 3 5 6 7 8

L(1, χ5 ) = 1 − t − t + t + t − t − t + t + · · · dt = dt,
0 0 1 − t5
and this is left to you as an exercise. Recall from calculus that any rational function can be
integrated. This will have some nice answer in terms of logs of the golden ratio.
We can do the same thing for every progression (mod 5) to see that there are infinitely
many primes in every progression, and in fact, a quarter in each progression.
We need to prove orthogonality relations for the character table to make sure that we
can produce every arithmetic progression, and we need to find a more general form of the
alternating series test. Then we will have a way to isolate the primes in every progression,
so when we let s → 1+ , we will need to show that the L-functions are not infinity or zero.
13.1. Dirichlet characters (mod q).
Definition 13.1.1. A Dirichlet character χ(q) is a function χ : Z → C with the properties
(1) χ(n) = 0 if and only if (n, q) > 1.
(2) χ(n + q) = χ(n)
(3) χ(mn) = χ(m)χ(n) for all m, n.
Therefore, we have χ : (Z/qZ)× → C× is a group homomorphism.
How do we figure out what these functions are? We know that χ(1) = 1 because χ(n) =
χ(n)χ(1) for all n. Now, if (a, q) = 1,
χ(a)φ(q) = χ aφ(q) = χ(1) = 1,


so each χ(a) is a φ(q)-th root of unity.


We always have the principal character
(
1 (n, q) = 1
χ0 (n) =
0 (n, q) > 1.
How many characters are there? There are only finitely many reduced residue classes and
finitely many choices for χ(a), so there are only finitely many characters. (We expect there
to be φ(q) characters based on the examples computed earlier.)
If q = pα where p is an odd prime, then there exists a primitive root g (mod q). Then we
only need to specify χ(g), so there are exactly φ(q) characters here. So we can take
2πil
χ(g) = e φ(q) , 0 ≤ l ≤ φ(q) − 1.
What if q = 2α ? The case of α = 1 is trivial, and we’ve done α = 2 and α = 3. To specify
χ, we need to specify χ(−1) and χ(5). The choices are χ(−1) = ±1 and χ(5) is a 2α−2 -th
root of unity. So there are 2α−1 characters here.
If p is an odd prime and g is a primitive root. Then χ(g) = −1 and x are Legendre
symbols. So these characters are generalizations of the Legendre symbols.
The characters form a group. Suppose that χ1 and χ2 are characters (mod q). Define
χ1 χ2 (n) = χ1 (n)χ2 (n). This is still a character because it is periodic and completely multi-
plicative, and it is not identically zero. The identity element is χ0 . The inverse of χ is the
complex conjugate χ. The Legendre symbol is its own inverse.
For example, in the case q = 5, recall the character table
36
1 2 3 4
χ0 1 1 1 1
χ5 1 -1 -1 1
ψ 1 i -i -1
ψ 1 -i i -1
The characters here form a cyclic group ψ, ψ 2 = χ5 , ψ 3 = ψ, and ψ 4 = χ0 .
13.2. Orthogonality relations. Now, we need to show the orthogonality relations. Let G
be the group of all characters x (mod q). At the moment, we only know that G is a finite
group.
13.2.1. Orthogonality of rows.
Proposition 13.2.1. If χ ∈ G, consider the row sums
(
X φ(q) χ = χ0
χ(n) =
n (mod q)
0 χ 6= χ0 .
In fact, (
X φ(q) χ=ψ
χ(n)ψ(n) =
n (mod q)
0 χ 6= ψ.

Proof. The second statement follows from the first statement for χψ. (χψ = χ0 ⇔ χ = ψ).
So we only have to prove the first statement.
Define X
S(χ) = χ(n)
n (mod q)
Take a number c so that (c, q) = 1, and multiply both sides by χ(c). So
X X X
χ(c)S(χ) = χ(c)χ(n) = χ(cn) = χ(m) = S(χ).
n (mod q) n (mod q) m (mod q)

This means that either S(χ) = 0 or χ(c) = 1. If S(χ) 6= 0 then χ(c) = 1 for all (c, q) = 1, so
that χ = χ0 , which is what we wanted to prove. 
13.2.2. Characters for composite moduli. Next, we want to show the orthogonality of columns
and we want to know what all of the characters are. We want to show how to construct
characters for composite moduli.
Consider characters χ1 (mod q1 ) and χ2 (mod q2 ), (q1 , q2 ) = 1. We can define x1 x2 (n) =
x1 (n)x2 (n). We claim that χ1 χ2 is a character (mod q1 q2 ). This is periodic by the Chinese
Remainder Theorem. This is also completely multiplicative, so it is a character.
So if q = pα1 1 pα2 2 · · · pαk k , we have characters χ1 (mod pα1 1 ), χ2 (mod pα1 2 ), · · · χk (mod pα1 k ),
we can multiply these to get a character χ (mod q) = χ1 χ2 · · · χk .
Remark. If we take another choice ψ1 (mod pα1 1 ), ψ2 (mod pα2 2 ), · · · , ψk (mod pαk k ), we can
construct ψ (mod q).
Say χ1 6= ψ1 . Then we want to say that χ 6= ψ.
α
Proof. Say χ1 (n) 6= ψ1 (n). Choose m ≡ n (mod pα1 1 ), m ≡ 1 (mod pj j ) for all j > 1. Then
χ(m) = χ1 (n) and ψ(m) = ψ1 (n) 
37
This tells us that we have at least φ(q) distinct characters (mod q). Consider χψ (mod q).
This corresponds to χ1 ψ1 (mod pα1 1 ), χ2 ψ2 (mod pα2 2 ), · · · , χk ψk (mod pαk k ).
α
Let H denote the group of characters that we obtain by multiplying characters (mod pj j ).
We would like to show that H = G.
13.2.3. Orthogonality of columns.
Proposition 13.2.2. Let (n, q) = 1. Then
(
X φ(q) n ≡ 1 (mod q)
χ(n) =
χ∈H
0 n 6≡ 1 (mod q)
Given this, we can generalize slightly to get
(
X φ(q) n ≡ a (mod q)
χ(n)χ(a) =
χ∈H
0 n 6≡ a (mod q)

Proof. Note that statement 2 follows from statement 1 Since aa−1 ≡ 1 (mod q), we have
χ(a)χ(a−1 ) = 1, so χ(a) = χ(a−1 ), and then
X X
χ(n)χ(a) = χ(na−1 ).
χ∈H χ∈H
P
Define S(n) = χ∈Hχ(n). Take ψ ∈ H. Then
X X X
ψ(n)S(n) − ψ(n)S(n) = (ψχ)(n) = ρ(n) = S(n).
χ∈H χ∈H ρ∈H

Therefore, either S(n) = 0 or ψ(n) = 1 for all ψ ∈ H. If ψ(n) ≡ 1 for all ψ ∈ H, then n ≡ 1
(mod q).
Check this: Prove it for q = pα . 

14. 11/8
14.1. Review. We are interested in Dirichlet characters χ : (Z/pZ)× → C× , and in fact,
they always have values that lie on the unit circle. The values of χ are φ(q)-th roots of unity.
The characters χ form a group, where χ0 is the principal character is the identity, and
−1
χ = x is the complex conjugate and inverse. If χ, ψ are characters, then χψ(n) = χ(n)ψ(n)
is a character.
Let the group of characters (mod q) be G. This is a finite abelian group.
We proved the first orthogonality relation:
(
X φ(q) χ = χ0
χ(n) = .
n (mod q)
0 χ 6
= χ0

The proof was reasonably easy. Similarly, we saw that


(
X φ(q) χ = ψ
χ(n)ψ(n) = .
n (mod q)
0 χ=6 ψ

We discussed the case q = pα , p is odd. Here, (Z/qZ)× is cyclic, so it is easy to see what
the characters are. We can pick a generator g. Then χ(g) determines χ(g k ) for all k, and
38
2πil
so χ is determined for all reduced residue classes. Note that we can have χ(g) = e φ(q) for
0 ≤ l ≤ φ(q) − 1. Therefore, for q = pα , we explicitly described φ(q) characters (mod q).
Now, we describe what happens for a composite modulus. Suppose q = pα1 1 pα2 2 · · · pαk k .
Consider characters
χ1 (mod p1 )α1 , χ2 (mod p2 )α2 , ··· , χk (mod pk )αk .
Then, define χ (mod q) via χ(n) = χ1 (n)χ2 (n) · · · χk (n). Different choices for (χ1 , · · · , χk )
give different choices of χ (mod q). We have therefore constructed φ(q) characters χ (mod q)
in this fashion. The characters we have constructed in this way also form a group. Call this
group H. We want to show that G = H; all of the characters arise this way. To find
characters for a composite modulus, multiply characters for prime power modulus.
To do this, we proved another orthogonality relation. Given (n, q) = 1,
(
X φ(q) n ≡ 1 (mod q)
χ(n) = .
χ∈H
0 n ≡
6 1 (mod q)
This is the same as saying that if (n, q) = 1, (a, q) = 1, then
(
X φ(q) n ≡ a (mod q)
χ(n)χ(a) = .
χ∈H
0 n 6≡ a (mod q)

Proof. Let X
S(n) = χ(n).
χ∈H
Take any character ψ ∈ H. Then
X X
ψ(n)S(n) = χψ(n) = χ(n) = S(n),
χ∈H χ∈H

so either S(n) = 0 or ψ(n) = 1 for all characters ψ ∈ H. The only way the latter condition
can hold is when n ≡ 1 (mod q). The proof is to pick χ2 , · · · , χk to all be the trivial
character, and only vary χ1 . Since χ1 comes from a root of unity, we have n = g k , and
2πikl
therefore χ(n) = e φ(q) , which is only possible for k = 0 and hence n = 1. The same
argument holds for the other characters χ2 , · · · , χk , and so we’re done. 
We get for free that G = H are there are no more characters (mod q). Suppose that X is
some character (mod q) which is in G but not H. By the first orthogonality relation, if we
take any ψ ∈ H, X
X(n)ψ(n) = 0.
n (mod q)

Now take any (c, q) = 1 and multiply both sides by ψ(c). Then
X
X(n)ψ(n)ψ(c) = 0.
n (mod q)

Since this is true for all ψ ∈ H, we can sum over them:


X X
X(n) ψ(n)ψ(c) = 0.
n (mod q) ψ∈H
39
By the second orthogonality relation, this gives

φ(q) · X(c) = 0,

so therefore X(c) = 0 for all (c, q) = 1, so X(·) is identically zero and therefore G = H.
Here is another way to think about the previous discussion. We are interested in the space
of all functions (Z/qZ)× → C. This is a vector space over C of dimension φ(q). There are
some vectors in this space that we like. A nice basis for the space would be (for (a, q) = 1):
(
1 n ≡ a (mod q)
fa (n) = .
0 n 6≡ a (mod q)

These form an orthonormal basis with the usual inner product:


X
< f, g >= f (n)g(n).
n (mod q)

We’ve written down another basis for the space. This is not as simple as the previous basis,
but it has another very important property: Our new basis consists of group homomorphisms.
These are the characters χ : (Z/qZ)× → C, and they respect the group structure. There
are exactly φ(q) of these, and they also form an orthogonal basis by the first orthogonality
relation. The second orthogonality relation is simply a change of basis relation between our
two bases.

Remark. This is actually an important principle. Given some arbitrary group, we can write
down bases of the space of maps from the group to some set. If the group is abelian, life is
wonderful. If not, that’s the realm of representation theory.
Dirichlet did this before the idea of abstract groups, so he was the first person to deal
with these characters. These ideas predate the idea of groups.

14.2. Plan of proof of Dirichlet’s Theorem. We now know that there are φ(q) characters
χ(q). For each, form
∞ −1
χ(p) χ(p2 )
  Y
X χ(n) Y χ(p)
L(s, χ) = = 1 + s + 2s + · · · = 1− s .
n=1
ns p
p p p
p

If <(s) > 0, both the series and the product converge absolutely, and L(s, χ) 6= 0 if <(s) > 1.
Recall that if χ = χ0 , we have
Y  Y −1
1 1
L(s, χ0 ) = ζ(s) 1− s = 1− s
p p
p|q p-q
 
1 φ(q)
As s → 1+ ,
Q
p|q 1− ps
→ q
and ζ(s) → ∞. For s > 1,


XX χ(pk ) X χ(p)
log L(s, χ) = = + O(1).
p k=1
kpks p
ps
40
Now, given a residue class a (mod q), (a, q) = 1, we have
 
X 1 X 1  1 X
s
= s
χ(p)χ(a)
p p
p φ(q)
p≡a (mod q) χ (mod q)
!
1 X X χ(p)
= χ(a)
φ(q) p
ps
χ (mod q)
1 X
= χ(a) (log L(s, χ) + O(1))
φ(q)
χ (mod q)
1 X
= χ(a) log L(s, χ) + O(1).
φ(q)
χ (mod q)

We want this to diverge because that would give infinitely many primes ≡ a (mod q). Now,
for χ = χ0 , we have log L(s, χ0 ) = log ζ(s) + O(1) → +∞ as s → 1+ .
The crux of the remainder of the proof will be to show that as s → 1+ , χ 6= χ0 , we want
that L(s, χ) does not tend to 0 or ∞, i.e. that log L(s, χ) is bounded as s → 1+ , χ 6= χ0 .
One part will be easy. To show that L(s, χ) does not go to infinity, we only need a
generalization of the alternating test. To show that this does not go to zero is quite hard,
especially for real characters.

14.3. Generalization of alternating series test. We generalize the alternating series test
to show that L(s, χ) makes sense for s > 0.

14.3.1. Partial summation. Assume that there is some sequence of complex numbers an , and
assume that there is “a nice function” f (n). Then we want to consider
B
X
an f (n).
n=A+1

Define
n
X
sn = ak .
k=1
We can now write
B
X B
X B
X B
X
an f (n) = (sn − sn−1 )f (n) = sn f (n) − sn−1 f (n)
n=A+1 n=A+1 n=A+1 n=A+1
B
X B−1
X
= sn f (n) − sn f (n + 1)
n=A+1 n=A
B
X
= sB f (B + 1) − sA f (A + 1) − sn (f (n + 1) − f (n))
n=A+1

This is an analog of integration of parts.


41
14.4. Alternating series test. Here, a1 = 1, a2 = −1, · · · , an = ±1, sn = 0 or 1, f (n) is
monotone decreasing. Then
B
X B
X
an f (n) ≤ f (B + 1) + f (A + 1) + (f (n) − f (n + 1)) = 2f (A + 1).
n=A+1 n=A+1

This is precisely the alternating series test. Note that all that we needed was that the sn are
bounded.
Proposition 14.4.1. Given an ∈ C, with |sn | ≤ S. Suppose that f (n) is monotone decreas-
ing to zero. Then
X
an f (n)
converges (converges).
Proof. We just need to show that the partial sums form a Cauchy sequence, i.e.
B
X
an f (n) < ε
n=A+1

if A is sufficiently big. By partial summation, it is


B
!
X
≤S f (B + 1) + f (A + 1) + (f (n) − f (n + 1)) = 2Sf (A + 1).
n=A+1

By choosing A sufficiently large, we can make this less than ε. 


We have therefore proved that

X χ(n)
n=1
ns
converges conditionally if s > 0. All we have to show is that
n
X
sn = χ(k)
k=1

is bounded. But after every q values, the sum is zero, so


n
X
|sn | = χ(k) ≤ φ(q).
k=1

15. 11/10
15.1. Partial summation. We introduced the idea of partial summation: P If there is a nice
sequence of complex numbers an and some nice function f (n), and if sn = k≤n ak , then
B
X B
X B
X
an f (n) = f (n)(sn −sn−1 ) = sB f (B +1)−sA f (A+1)− sn (f (n+1)−f (n)).
n=A+1 n=A+1 n=A+1
42
Think of this as integration, i.e.
B Z B+ Z B+
X +
f (n)(s(n) − s(n − 1)) = f (t) d(st ) = f (t)st |B
(A+1)− − f 0 (t)st dt
n=A+1 (A+1)− (A+1)−
Z B+

=s B+
+
f (B ) − s (A+1)− f ((A + 1) ) − f 0 (t)st dt.
(A+1)−

The point is that if f is nice and differentiable, we can rewrite our sums as integrals.
Last time, we considered the alternating series test as a nice application of this. Here are
more applications:
15.1.1. Applications.
Proposition 15.1.1.  
X1 1
= log x + γ + O ,
n≤x
n x
where γ ≈ 0.577 . . . is Euler’s constant.
Proof. Here, X
st = 1 = [t].
1≤n≤t
an = 1 if n ∈ N, and f (t) = 1t . We are interested in
X 1 Z x+ 1 x+ Z x+ Z x+
1 1 [x+ ] [t]
= d([t]) = [t] − 2
[t]dt = + + 2
dt
n≤x
n 1− t t 1− 1− t x 1− t
  Z x+   Z x+
1 t − {t} 1 {t}
=1+O + dt = 1 + O + log x − dt.
x 1− t2 x 1− t2
Here, [t] denotes the integer part of t and {x} denotes the fractional part of t, and
Z x+ Z ∞ Z ∞  
{t} {t} {t} 1
2
dt = 2
dt − 2
dt = constant − O .
1− t 1 t x t x
We have therefore proved that
 Z ∞   
X1 {t} 1
= (log x) + 1 − 2
dt + O .
n≤x
n 1 t x
R∞
Let γ = 1 − 1 {t} t2
dt. It is unknown if γ is irrational. 
Remark.
Y e−x

1
1− ∼ .
p≤x
p log x
A similar method can be used to prove other formulas, such as Stirling’s formula. Another
problem is to show that
1 1 π2
ζ(2) = 2 + 2 + · · · = .
1 2 6
The point is to compute the sum of the first few terms can lead to a small error because
what is left isn’t just any random thing. See the homework for details.
43
15.2. L(s, χ). Let χ be a character (mod q), χ 6= χ0 . Then

X χ(n)
L(s, χ) = s > 1.
n=1
ns
We want an expression that makes sense even when s > 0. Let
X
Sχ (x) = χ(n).
n≤x

Then ∞
Z ∞ Z ∞ Z ∞
1 Sχ (t) Sχ (t) Sχ (t)
L(s, χ) = d(S χ (t)) = +s dt = s dt.
1− ts ts 1− 1− ts+1 1 ts+1
Note that
|Sχ (t)| ≤ φ(q) for all t ≥ 0.
Therefore, the preceding integral converges provided that s ≥ 0. If you thought of this as a
complex integral, s = σ + iy, this converges if σ = <s > 0. Note that we have omitted the
case where χ = χ0 , or the case of the Riemann zeta function. This can also be done for ζ(s);
see the homework. We know that ζ(s) must blow up at s = 1, but we’ll get an analog of this
feature to get something that makes sense for s > 0. This is basically analytic continuation.
The point is that if we consider as an example
1
1 + z + z2 + z3 + · · · = ,
1−z
and the sum makes sense when |z| ≤ 1, but the right hand side makes sense for z 6= 1. They
agree when both are well-defined, but one of them is more general. Happily, there is only
one way to do this.
Claim. If χ 6= χ0 and σ > 0, then L(σ, χ) is infinitely differentiable.
How do you even differentiate this once? Here,
∞ ∞ Z ∞
d X d −s log n  X − log nχ(n) log n
L(s, χ) = χ(n) e = s
= − s d(Sχ (t)).
ds n=1
ds n=1
n 1− t
This will be absolutely convergent if s > 1, but it actually converges for s > 0. Part of the
point here is that χ(n) has positive and negative signs.
 Z ∞ 0 Z ∞ Z ∞
Sχ (t) Sχ (t) Sχ (t)
s s+1
dt = s+1
dt + s (− log t) dt.
1 t 1 t 1 ts+1
Now, we should be reasonably happy that L(σ, χ) is once differentiable for all s > 0. To be
completely rigorously, we want to show
L(σ + δ, χ) − L(σ, χ)
− L0 (σ, χ) < ε.
δ
So what we’re claiming is that L(σ, χ) are very nice functions.
In particular, if σ is very close to one, we can use Taylor’s theorem:
(σ − 1)2 00
L(σ, χ) = L(1, χ) + (σ − 1)L0 (1, χ) + L (1, χ) + · · · .
2!
44
We go back to the prove of Dirichlet’s theorem. For σ > 1, we have
X 1 1 X
= χ(x) (log L(σ, χ)) + O(1).
pσ φ(q)
p≡a (mod q) χ (mod q)

If χ = χ0 , we know that
1
log L(σ, χ) = log ζ(σ) + O(1) = log + O(1).
σ−1
Therefore,
X 1 1 1 1 X
= log + x(a) log L(σ, χ) + O(1).
pσ φ(q) σ − 1 φ(q) χ6=χ
p≡a (mod q) 0

+
Now, as σ → 1 , L(σ, χ) → L(1, χ) is finite for χ 6= χ0 .
We make a key assumption that L(1, χ) 6= 0 for every χ 6= χ0 . If this is true, we are done
with Dirichlet’s Theorem.
15.3. L(1, χ). If χ is a complex character (χ 6= χ), then L(1, χ) 6= 0. Moreover, if χ is a real
character, then either L(1, χ) or L0 (1, χ) is not zero. (If there exists a zero at 1, it must be
a simple zero.)
Proof. Suppose that χ is complex. Then
L(1, χ) = 0 ⇔ L(1, χ) = 0
because
X χ(n) X χ(n) X χ(n)
=0⇔ = .
n n n
Take a = 1. Then
X 1 X 1 1 X
= log + log L(σ, χ) + O(1).
p≡1 mod q
φ(q) σ − 1 φ(q)
Now, suppose that L(σ, χ) has a zero of order mχ at σ = 1. This means that
L(1, χ) = L0 (1, χ) = · · · = Lmχ −1 (1, χ) = 0.
We want to say that mχ = 0.
We have
L(σ, χ) ≈ cχ (σ − 1)mχ .
Therefore,
!
X 1 1 1 X
= log + mχ log(σ − 1) + O(1)
p≡1 mod q
pσ φ(q) σ − 1 χ6=χ
0
  !
1 1 X
= log 1− mχ + O(1).
φ(q) σ−1 χ6=χ 0

Since there are a positive number of primes in this residue class, the right hand side needs
to be positive, so X
mχ ≤ 1. 
χ6=χ0
45
16. 11/15
Today we should finish the proof of Dirichlet’s Theorem.
Here’s a quick survey of what we’ve done so far.
16.1. Review. We’ve found Dirichlet characters χ (mod q) to isolate the arithemtic pro-
gressions. We’ve also defined absolutely convergent functions
∞  −1
X χ(n) Y χ(p)
L(s, χ) = = 1− s .
n=1
ns p
p
When x 6= χ0 , we can extend this to something that makes sense for s > 0. We did this last
time by writing Z ∞ Z ∞
1 sχ (y)
L(s, χ) = s
d(sχ (y)) = s dy.
1 y 1 y s+1
This is infinitely differentiable. Also, as s → 1, we have that L(s, χ) 6→ ∞. We just need to
show that it is nonzero.
Why do we want to do this?
X X X χ(pk )
χ(a) log L(s, χ) = χ(a)
k,p
kpks
x (mod q) χ (mod q)
X 1 X 1
= φ(q) = φ(q) + O(1).
k,p
kpks ps
p≡a (mod q)
pk ≡a (mod q)

We already know what happens for complex-valued characters. Let’s recap that. If s is a
real number σ, then all of the terms on the right hand side are positive. Take a = 1. Then
X X 1
log L(s, χ) = ks
≥ 0 if s > 1.
p,k
kp
x (mod q)
pk ≡1 (mod q)

Then Y
L(s, χ) ≥ 1.
p
+
Let s → 1 . This product contain one term that goes to infinity. This means that there can
only be at most one term that goes to zero. First, the product is real because they come in
conjugate pairs.
If L(1, χ) → 0, then by Taylor,
|L(s, χ)| ≤ C(s − 1) for s close to 1.
Now, if χ is a complex character, with L(1, χ) = 0, then L(1, χ) = 0 also, and
Y C
L(s, χ) ≤ C(s − 1)C(s − 1)C ≤ s − 1,
χ mod q
s − 1
where the right hand terms represent χ0 , χ, χ, and all other characters. This contradicts
that the product is at least 1.
If χ is a real characters, then in the same way (Taylor approximation), Twe see that
L(1, χ) and L0 (1, χ) can’t both be zero.
46
16.2. L(1, χ) 6= 0 for real characters χ. Now, we just need to show that if χ is a real
character (mod q), L(1, χ) 6= 0. We did several examples in the homework. This is the
hardest part of the proof. Dirichlet gave a beautiful proof of this: In half of cases, you get
something in terms of π, and in other cases, you get things like the golden ratio. We’ll discuss
that in the next several lectures. Here, we’ll give a slick proof that is harder to understand
but can be done more quickly.
Define X X
rχ (n) = χ(d) = χ(a).
d|n n=ab

This function rχ (a) is multiplicative. If (m, n) = 1, then


X X X
rχ (m)rχ (n) = χ(d1 ) χ(d2 ) = χ(d) = rχ (mn).
d1 |m d2 |n d|mn

Since this function is multiplicative, we only need to figure out what this does on prime
powers. So


 k+1 χ(p) = 1

1 χ(p) = 0
rχ (pk ) = 1 + χ(p) + χ(p2 ) + · · · + χ(pk ) =

 0 χ(p) = −1, k odd

1 χ(p) = −1, k even.

This in particular means that rχ (n) ≥ 0. In addition, rχ (n) ≥ 1 if n = m2 is a perfect


square.
Why do we care about this function? This is a nice function to look at.
Example 16.2.1. If χ is a character (mod 1), so that χ(n) = 1 for all n. Then rχ (n) = d(n).
We proved in the homework that d(n) is bounded by nε , and
X d(n)
= ζ(s)2 .
ns
Example 16.2.2.
X rχ (n) X X χ(a) 1 ∞ ∞
X χ(a) X 1
= = = L(s, χ)ζ(s).
ns n n=ab
as b s a=1
as b=1 bs

From this perspective, rχ (s) seems like a reasonable thing to consider.


Example 16.2.3. Consider χ = χ−4 . In this case,


 k + 1 p ≡ 1 (mod 4)

1 p ≡ 2 (mod 4)
rχ (pk ) =

 0 p ≡ 3 (mod 4), p odd

1 p ≡ 3 (mod 4), p even

This looks like writing numbers as the sum of two squares. Here,

2
 p≡1 (mod 4)
rχ (p) = 0 p≡3 (mod 4)

1 p=2
47
So we can interpret 4rχ (n) as the number of ways of writing n = x2 + y 2 . Another way to
think about this is prime factorization in the Gaussian integers. There are only eight ways
of writing p = ππ. The point is that rχ (n) is something that we should care about.
The idea of the proof is that x is something large. Consider
X rχ (n)
√ .
n≤x
n

• Get a lower bound for this, ≥ 12 log n + O(1).


• If L(1, χ) = 0, it is O(1), which is a contradiction, so L(1, χ) 6= 0.
For the first part, we have
X rχ (n) X 1 X 1 √ 1
√ ≥ = = log x + γ + O(1) = log x + O(1).
n m √ m 2
n≤x n=m2 ≤x m≤ x

because rχ (n) = 0 and rχ (m2 ) ≥ 1.

16.2.1. Interlude on the divisor problem. We want to consider


X
d(n).
n≤x

This is
X XX XX X jxk X x 
d(n) = 1= 1= = + O(1)
n≤x n≤x d|n d≤x n≤x d≤x
d d≤x
d
d|n
  
1
= x log x + γ + O + O(x) = x log x + O(x).
x
This procedure is very wasteful, since the error we get is not so good. Why is the error term
not so good? Approximation of the floor is not so good. This is bad when we know more
about the floor. √
Dirichlet (in a different context) proved an asymptotic formula with error O( x). This is
done using the hyperbola method. An example of a hyperbola is ab = x, we are interested in
counting lattice points lying below the hyperbola ab = x. Dirichlet’s idea is to pick a point
(A, B) on the hyperbola. We can count the points inside the hyperbola with a ≤ A, and we
have to add back the terms where A < a and b ≤ B. This gives us two cases.
Case 1:
X x   
X X X  1
1= 1= + O(1) = x log A + γ + O + O(A)
a,b a≤A a≤A
a A
b≤x/a
ab≤x
a≤A

= x log A + γx + O(B) + O(A),



and AB = x, so choosing A, B ≈ x gives us a nice error.
48
Case 2:
X X X X x 
1= 1= − A + O(1)
a,b b≤B A≤a≤x/b b≤B
b
A<a
b≤B
ab≤x
  
1
= x log B + γ + O − A(B + O(1)) + O(B)
B
= x log B + γx − x + O(A) + O(B).
Putting the two cases together, we see that
X X
d(n) = 1 = case 1 + case 2
n≤x a,b
ab≤x

= x log x + (2γ − 1)x + O(A + B) = x log x + (2γ − 1)x + O( x).
√ 1
by choosing A = B = x. Dirichlet conjectured that the√error should be O(x 4 +ε ). One way
is to replace a hyperbola with a circle. The bound of O( x) has been improved to O(x1/3 ),
but Dirichlet’s conjecture is still unknown.

16.2.2. Back to rχ (n). We use the hyperbola method to bound


X rχ (n) X X χ(a) 1 X χ(a) 1
√ = √ √ = √ √ .
n≤x
n n≤x n=ab
a b a,b
a b
ab≤x

We again split up into two cases a ≤ A and a > A, b ≤ B.


Case 1:
X χ(a) X 1
√ √
a≤A
a x b
b≤ a

For now, let’s assume



X 1  
1
√ =2 t+C +O √ ;
n≤t
n t
the proof will come soon via partial summation. Then the above sums are
√ X χ(a)
r r 
X χ(a)  x a X χ(a) 
A

= √ 2 +C +O =2 x +C √ +O √
n≤A
a a x a≤A
a a≤A
a x

Case 2:
X 1 X χ(a)
√ √ .
b≤B
b a
A<a<x/b

We don’t know much here.


49
Recall
Z ∞ X χ(n) Z ∞ 1
sχ (y)
L(s, χ) = s dy = + d(sχ (y))
1 y s+1 n≤z
n s
z y s

X χ(n)  sχ (y) ∞ Z ∞
sχ (y)
= s
+ s
+s dy
n≤z
n y z z y s+1
   Z ∞ 
X χ(n) |sχ (z)| φ(q)
= s
+O s
+O s s+1
dy ,
n≤z
n z z y
so  
X χ(n) φ(q)
L(1, χ) = +O
n≤z
n z
and  
X χ(n) φ(q)
L(1/2, χ) = √ +O √ .
n≤z
n z
In case 1, we then have that
√ X χ(a) X χ(a)  
A
2 x +C √ +O √
a≤A
a a≤A
a x
√ √
  
φ(q) 1
= 2 x L(1, χ) + O + L(1/2, χ) + O( √ ) + O(A/ x).
A A

When A = B = x and we assume L(1, χ) = 0, this is O(1).

17. 11/17
Today we are actually going to finish the proof of Dirichlet’s Theorem.

17.1. Finishing the proof. Let’s recall where we’re at. We want to show that L(1, χ) 6= 0
for real characters χ. This is quite hard.
We were looking at
X X
rχ (n) = χ(d) = χ(a).
d|n ab=n
2
Note that rχ (n) ≥ 0, rχ (n) ≥ 1 if n = m , and
X rχ (n) X 1 1
√ ≥ = log x + O(1).
n √ m 2
n≤x n≤ x

We will use this fact that we still need to prove later:



X 1  
1
√ =2 t+C +O √ .
n≤t
n t
We want an upper bound, and we used the hyperbola method. The point is to count all
points under a hyperbola containing the point AB = x, which requires
√ splitting into two
regions a ≤ A and a > A, b ≤ B. We will end up choosing A = B = x.
50
In the first case,
√ X χ(a)
X χ(a) X 1 X χ(a)  
A
√ √ =2 x +C √ +O √ .
a≤A
a b a≤A
a a≤A
a x
b≤x/a

In the second case, we will consider


X 1 X χ(a)
√ √
b≤B
b A<a≤x/b
b

We will keep a q a fixed constant and let x → ∞. We could write


Z ∞
sχ (y)
L(s, χ) = s s+1 dy,
1 y
where X
sχ (y) = χ(a).
a≤y

If you write
X χ(n) Z ∞
d(sχ (y))
L(s, χ) = + ,
n≤z
ns z+ ys
we want to show that the tail is small. Here, s > 0. To do this, integrate by parts:
Z ∞ ∞ Z ∞ Z ∞
d(sχ (y)) sχ (y) sχ (y) sχ (z + ) sχ (y)
s
= s
+s s+1
dy = − s
+s s+1
dy.
z+ y y z+ z+ y z z+ y

Since |sχ (y)| ≤ φ(q) = O(1), so this integral


Z ∞ Z ∞
sχ (y) φ(q) s φ(q)
s+1
dy ≤ s + φ(q) s+1
dy = 2 s .
z+ y z z y z
Therefore,
X χ(n) Z ∞  
d(sχ (y)) X χ(n) 1
L(s, χ) = + = +O .
n≤z
ns z+ ys n≤z
n s zs
We can hence genuinely approximate this by taking the first few terms of the series. We can
now have bounds for each of the two cases.
In the first case, we now have

       
1 1 1 A
2 x L(1, χ) + O + C L( , χ) + O √ +O √
A 2 A x


 
1 A x 1
=2 xL(1, χ) + CL( , χ) + O √ + +√ = O(1),
2 x A A

Plugging in A = x, and assuming to the contrary that L(1, χ) = 0, we get a net bounded
contribution from case 1.
Now, let’s look at case 2. We want to bound
X 1 X χ(a)
√ √ .
b≤B
b A<a≤x/b
b
51
Here, r !
1 X χ(n) b
L( , χ) = √ +O ,
2 n x
n≤x/b
and r !
1 X χ(n) 1
L( , χ) = √ +O ,
2 n≤A
n A
so that subtracting gives
r !
X χ(a) b 1
√ =O +√ .
a x A
A<A≤x/b

Therefore, case 2 gives a contribution of


r !   √ !
X 1 b 1 B B
√ O +√ =O √ +O √ = O(1)
b≤B
b x A x A

when we set A = B = x.
We can now put the cases together. We have therefore proved that if L(1, χ) = 0,
X rχ (n)
√ = O(1),
n≤x
n
which is a contradiction! 
Why doesn’t this work for complex characters? The lower bound needed rχ (n), which we
explicithly computed to be real and taking nice values. You can probably make it work with
complex numbers, but it would take a bit of work.
There’s one thing that we forgot to check:
Z t P x
d( n≤y 1) 1 x [y]
X 1 Z
[y]
√ = √ = √ + 3/2
dy.
n≤t
n 1 − y y 1 − 2 1 y

17.2. Why is L(1, χ−4 ) = π4 ? We saw that



2 p ≡ 1 (mod 4)

rχ−4 (p) = 1 p = 2

0 p ≡ 3 (mod 4).

17.2.1. Counting ways to write as sum of two squares. This is (almost) counting the number
of ways of writing p = x2 + y 2 . If p ≡ 3 (mod 4), we proved that this is not possible. If
p = 2, there are four ways to do this: (±1)2 + (±1)2 . Note that 4 = rχ−4 (2). We claim that
when p ≡ 1 (mod 4), there are 8 ways.
Write
p = x2 + y 2 = (x + iy)(x − iy) = π1 π 1
where x + iy and x − iy are primes in Z[i]. We have unique factorization, so this factorization
is unique up to units.
If π is a prime in Z[i] with norm N (π) = p, then p = x2 + y 2 = (x + iy)(x − iy), then
either x + iy = (±1 or ± i)π or x − iy = (±1 or ± i)π. This gives our desired 8 solutions.
52
Therefore, 4rχ−4 (p) gives the number of ways of writing p as a sum of two squares. This
also works more generally.
If p ≡ 3 (mod 4), there are 4 ways to write p2 as a sum of two squares: (±p)2 + 02 and
0 + (±p)2 , and indeed, rχ−4 (p2 ) = 1.
2

If p ≡ 1 (mod 4), then


r2 π 2 = p2 = x2 + y 2 = (x + iy)(x − iy).
Either r2 | (x + iy) or π 2 | (x − iy) or x + iy = unit · p and x − iy = unit · p, which gives us
a total of 12 solutions.
If n = pq, p, q ≡ 1 (mod 4), we have p = p0 p0 , q = q 0 q 0 . If pq = (x + iy)(x − iy), then
p q | (x + iy) or p0 q 0 | (x + iy) or p0 q 0 | (x + iy) or p0 q 0 | (x + iy).
0 0

Now, by examining the prime factorization, we can see that the number of ways of writing
n = x2 + y 2 is 4rχ−4 (n).
Now, consider X
4rχ−4 (n).
n≤x

We will use the hyperbola method to connect this to L(1, χ−4 ). We can also write this as
X X X
4rχ−4 (n) = {(a, b) : a2 + b2 = n} = 1,
n≤x n≤x (a,b)
a2 +b2 ≤x

which is the number of lattice points inside of a circle of radius x. How many integer points
should lie in a circle? This should roughly be

= area + O(circumference) = πx + O( x).
Here, it seems that the error should actually be better: there’s a significant amount of
cancellation. Things should work out nicely, and it seems like the error should only be
O(x1/4+ε ). This is a conjecture called Gauss’s Circle Problem, and it is closely related to
Dirichlet’s Divisor Problem. We already know that O(x1/3 ).
17.2.2. Hyperbola method again. Consider any character χ 6= χ0 (mod q). Then
X X X X X X
χ(a) = rχ (n) = χ(a) 1+ χ(a).
ab≤x n≤x a≤A b≤x/a b≤B A<a≤x/b

In case 1,
X x  1 x
χ(a) + O(1) = x(L(1, χ) + O( )) + O(A) = xL(1, χ) + O(A + ),
a≤A
a A A

and we again choose A = x.
For case 2, !
X X X
χ(a) = O φ(q) = O(B),
b≤B A<a≤x/b b≤B

so therefore, X √
rχ (n) = xL(1, χ) + O( x),
n≤x
53

choosing A = B = x. Then when χ = χ−4 , we have
X √ √
4 rχ−4 (n) = 4xL(1, χ) + O( x) = πx + O( x).
n≤x

Therefore, L(1, χ−4 ) = π4 . Dirichlet found this proof, and he found how this generalizes.
17.2.3. Binary quadratic forms. This is something of the form
f (x, y) = ax2 + bxy + cy 2 ,
a, b, c ∈ Z, a > 0. For example, x2 + y 2 . We say that such a form is primitive if (a, b, c) = 1.
The discriminant is b2 −4ac, which is what we get when we try to complete the square. Here,
4af (x, y) = 4a2 x2 + 4abxy + 4acy 2 = (2ax + by)2 − dy 2 .
If d < 0, then the binary quadratic form is positive definite. If d > 0, then the form is
indefinite, taking positive and negative values. In the case d = m2 , we get a degenerate
situation (2ax + by + my)(2ax + by − my). We only care about the case d < 0. For example,
2 2
√ x + y , we have d = −4.
in the case
In Q( 5), we have
√ ! √ !
1+ 5 1− 5
= −1,
2 2
so the golden ratio is invertible here. The structure is more complicated for d > 0 than for
d < 0.
The plan is to understand all quadratic forms of a given discriminant. There should be
lots of such quadratic forms. There is are three variables d = b2 − 4ac and one equation, so
there should be lots of solutions. Just like x5 + y 5 = z 5 .
As an example, x2 + y 2 = (x + y)2 + y 2 = x2 + 2xy + y 2 . Of course, these are “the
same” because there is a nice change of variables. It will turn out that there is only one of
discriminant −4, and the number of quadratic forms is called the class number.

18. 11/29
The final will be Monday at 8:30am, and it will cover everything through last week.
18.1. Review of last lecture. We considered χ (mod 4). Then
4rχ (n) = #{n = x2 + y 2 }.
Then X √
4 rχ (n) = #{(x, y) ∈ Z2 : x2 + y 2 ≤ x} = πx + O( x).
n≤x
In addition, we know that X
rχ (a) = χ(a).
ab=n
Using the hyperbola method, we proved that
X √
rχ (n) = 4xL(1, χ) + O( x),
n≤x

and comparing our two formulas gives L(1, χ) = π4 .


54
18.2. Binary quadratic forms.
Definition 18.2.1. Binary quadratic forms are expressions of the form f (x, y) = ax2 +bxy +
cy 2 , which has discriminant D = b2 − 4ac.
Completing the square yields
4af (x, y) = (2ax + by)2 − Dy 2 .
If a > 0 and D < 0, we see f (x, y) ≥ 0 is positive definite. If D > 0, then f (x, y) is indefinite.
If D is a square, then the expression above factors into two linear terms, and we consider
this case to be degenerate.
We will mainly focus on the case of D < 0. The form is now positive definite. Given a
quadratic form, what numbers can be expressed by it?
Question. Describe the numbers represented by a positive definite binary quadratic form
f (x, y).
Definition 18.2.2. We say that f (x, y) is primitive if (a, b, c) = 1.
We will assume throughout that f is primitive.
Definition 18.2.3. n is primitively represented by f (x, y) if n = f (r, s) with (r, s) = 1.
We can of course think of a quadratic form in terms of a matrix.
  
2 2
 a b/2 x
f (x, y) = ax + bxy + cy = x y .
b/2 c y
We can make some simplifications on our quadratic form as follows. If we take a form
x2 +y 2 , we can do a change of variable x = X+Y and y = Y , which gives us the new quadratic
form X 2 + 2XY + Y 2 . Both of these quadratic forms have D = −4, and they represent the
same numbers – any number that can be represented by one form can also be represented
by the other. If x, y are integers, then so are X, Y , and conversely. Understanding one is the
same as understanding the other; they are equivalent.
This is really like doing linear algebra. We want to do a change of basis replacing (x, y)
by a matrix times (x, y). We need this matrix to be invertible over the integers.
Definition
 18.2.4.
 A quadratic form f (x, y) is equivalent to a form g(X, Y ) if there is a
α β
matrix ∈ SL2 (Z) with α, β, γ, δ ∈ Z and the determinant 1 is αδ − βγ = +1 such
γ δ
that with     
x α β X
=
y γ δ Y
we have f (x, y) = g(X, Y ).
We have  −1  
α β δ −β
=
γ δ −γ α
1In order for the matrix to be invertible over the integers, we need the determinant to be ±1. We further
restrict it to be +1 in this case.
55
so this is obviously invertible over the integers:
    
X δ −β x
= .
Y −γ α y
We claimed that this is a equivalence relation. Something is equivalent to itself via the
identity matrix, and transitivity follows because such matrices form a group via multiplying
the two change of basis matrices.
 α β T a b/2
        
 a b/2 x α β X
x y = X Y ,
b/2 c y γ δ b/2 c γ δ Y
where  T   
α β a b/2 α β
γ δ b/2 c γ δ
 
A B
is the matrix for g. If g ∼ h via the matrix then the matrix for h is
C D
 T  T    
A B α β a b/2 α β A B
C D γ δ b/2 c γ δ C D
Remark. So why do we want to make this definition? What is the point of this? If two
quadratic forms are equivalent then their discriminants are the same.
Example 18.2.5. What are the quadratic forms discriminant −4? There are a lot of
 of 
α β
these. Starting from x2 + y 2 , pick any matrix ∈ SL2 (Z), which gives a change of
γ δ
variable
x = αX + βY
y = γX + δY
and yields a new equivalent quadratic form
(αX + βY )2 + (γX + δY )2
of discriminant −4.
We will reduce all quadratic forms to a finite class of inequivalent quadratic forms.
Theorem 18.2.6. There are only finitely many inequivalent classes of positive definite binary
quadratic forms of a given discriminant.
18.3. Proof of the theorem. We want to give an algorithm to make the coefficients of the
quadratic form as small as possible; we like working with quadratic forms like x2 + y 2 . This
is called induction theory.
18.3.1. Reduction theory. Consider
ax2 + bxy + cy 2
with the assumption that a > 0, c > 0, and D = b2 − 4ac < 0. Every such form is equivalent
to one with |b| ≤ a ≤ c.
Denote the above quadratic form by (a, b, c) = ax2 + bxy + cy 2 . There are two operations
that we want to do.
56
Operation I:
We want to flip x and y to get x = Y and y = X. This matrix has determinant
  −1, so this
0 −1
doesn’t quite work. Instead, take x = −Y and y = X, yielding matrix ∈ SL2 (Z).
−1 0
This means that (a, b, c) ∼ (c, −b, a).
Operation II:
The other   is replacing x = X + nY and y = Y for some n ∈ Z. This corresponds
operation
1 n
to a matrix , which clearly has determinant 1. The inverse is given by Y = y and
0 1
 
1 −n
X = x − ny, with matrix . Under this operation, we have
0 1
ax2 + bxy + cy 2 = a(X + nY )2 + b(X + nY )y + cY 2 = aX 2 + (2an + b)XY + (an2 + bn + c)Y 2 .
This yields the equivalence
(a, b, c) ∼ (a, 2an + b, an2 + bn + c).
Algorithm 18.3.1. Start with (a, b, c).
(1) Choose n so that |2an + b| ≤ a and use operation II. We get (a1 , b1 , c1 ) with |b1 | ≤ a1 .
(2) If c1 ≤ a1 , use operation I to flip (c1 , −b1 , a1 ).
(3) Repeat as needed.
This clearly terminates because one of the variables always decreases. This can also be
seen from the following nice geometric process.
18.3.2. Geometric view. Let H be the upper half plane, so that
H = {x + iy : y > 0}.
 
a b
Consider any matrix ∈ SL2 (R), with a, b, c, d ∈ R and ad − bc = 1. This matrix can
c d
act on the upper half plane via the Mobius transformation
 
a b az + d
z= ,
c d cz + d
and
az + d y
= = .
cz + d |cz + d|2
Every point of H is equivalent under SL2 (Z) to a point
Draw the lines x = 1/2, x = −1/2, and the unit circle. Take the region between the two
vertical lines and above the circle. This region is called the fundamental domain.

Now,  
1 n
z = z + n,
0 1
57
so the first step of the algorithm moves z until it lies between the vertical lines x = 1/2 and
x = −1/2. Now,  
0 −1 1
z=− .
1 0 z
We flip this point, causing it to lie above the circle (but possibly messing up the real coor-
dinate), and we repeat.
The argument on binary quadratic forms is exact the same as the algorithm to put

−b + D
2a
inside this fundamental domain.
Given (a, b, c) and if |b| = a choose b > 0, then we have
(a, a, c) ∼ (a, −a, c) if a < c
and if a = c if
(a, b, a) ∼ (a, −b, a).
Definition 18.3.2. A binary quadratic form is called reduced if |b| ≤ a ≤ c and
(1) if |b| = a then choose b > 0
(2) if a = c then choose b ≥ 0.
Every positive definite binary quadratic form is equivalent to a (unique) reduced form.1
Two reduced forms are inequivalent (to be justified later).
We want to compute all reduced forms of a given discriminant. We will give an upper
bound for a, giving a finite number of choices for a, b. Then c is fixed by the discriminant
D = b2 − 4ac, D < 0. For a reduced form (a, b, c), we have |D| = 4ac − b2 ≥ 4ac − a2 ≥
4a2 − a2 = 3a2 , so therefore r
|a|
a≤ .
3
There are only finitely many choices for a. For each, there are a finite number of choices for
b and hence for c.
The number of real binary quadratic forms of a given discriminant D is called the class
number h(D). We’ve shown that this is a finite number.
Example 18.3.3. D = −4. We require
r
4
a≤ =⇒ a = 1.
3
This means that |b| ≤ 1 and b2 − 4ac = −4. Note that b has the same parity as the
discriminant, so b is even and hence b = 0 and c = 1. So there is only one quadratic form of
discriminant −4, and this is x2 + y 2 . The class number is 1.
Example 18.3.4. D = −3. We now want
r
3
a≤ =⇒ a = 1.
3
Then b is odd and |b| ≤ 1 =⇒ b = ±1. Then c = 1. But we said that if a = |b| then we
choose b ≥ 0, so x2 +xy +y 2 is the unique equivalence class of quadratic forms of discriminant
−3.
58
Note that D = −5 is impossible; the only allowed discriminants are those equivalent to 0
or 1 (mod 4).
18.4. Sum of two squares revisited. We again prove that if p ≡ 1 (mod 4) then p =
x2 + y 2 .
Proof. We have  
−1
=1
p
and  
−4
= 1,
p
so −4 ≡ n2 (mod p), so then −4 = n2 − pc. So we get a solution to −4 ≡ n2 (mod 4p) by
the Chinese Remainder Theorem. This gives −4 = n2 − 4pc. Consider the quadratic form
px2 + nxy + cy 2 of discriminant −4. 
We will also consider forms of the form x2 + 3y 2 .

19. 12/1
19.1. Reduced binary quadratic forms. Last time we were looking at binary quadratic
forms ax2 + bxy + cy 2 where (a, b, c) = 1, a, c > 0, and b2 − 4ac < 0. We proved the following:
Theorem 19.1.1 (Reduction Theory). Each form is equivalent to a form with |b| ≤ a ≤ c
and D = b2 − 4ac with the caveat that if |b| = a then choose b positive, and if a = c then
choose b positive.
Keep in mind that b has the same parity as the discriminant D and D ≡ 0, 1 (mod 4).
Last time, we gave an algorithm for producing reduced forms. We didn’t prove, however,
that two reduced forms are inequivalent, and we sketch the proof here. It’s a sketch because
it’s like a calculus exercise.
Proof. Suppose we have a reduced form f (x, y) = ax2 + bxy + cy 2 . We gave the bound last
time of r
|D|
a≤ .
3
What is the smallest value represented by f ? We want to show that this is a = f (±1, 0).
Other numbers that are represented are c = f (0, ±1) and a + b + c = f (1, 1). Choose ±1
and ±1 such that a − |b| + c = f (±1, ±1).
The smallest number represented is a and the second smallest number that is properly2
represented is c, and the third smallest properly represented is a − |b| + c. This is left as an
exercise to think through. The general idea is that
 2
x + y2
    
2 2 |b| 2 |b|
f (x, y) ≥ ax − |b| + cy ≥ a − x + c− y 2 ≥ (a − |b| + c) min(x2 , y 2 ).
2 2 2
Given this fact, two reduced forms must be inequivalent, because the smallest numbers
that they represent will give the coefficients a and c and then b. There are a few special
cases to think through – what happens for a = c? 
2Proper means (x, y) = 1.
59
Now we’ve given a complete theory of producing inequivalent reduced forms. The number
of reduced forms is the class number, denoted by h(D).
Example 19.1.2. When D = −4, we have h(−4) = 1, and the only reduced form is x2 + y 2 .
Example 19.1.3. When D = −3, we have h(−3) = 1, and the only reduced form is
x2 + xy + y 2 .
Example 19.1.4. When D = −7, we want
p
a ≤ 7/3 =⇒ a = 1.
In addition, |b| ≤ a and b is odd so b = 1. Then by the discriminant, we get c = 2 and the
only reduced form is x2 + xy + 2y 2 .
Example 19.1.5. For D = −8, then as above, a = 1, and b must be even so b = 0, and the
only reduced form is x2 + 2y 2 .
Example 19.1.6. D = −12. Here, a ≤ 2. If a = 1 then b = 0, and we get x2 + 3y 2 as a
reduced form. If a = 2, then b can be 0 or 2. If b = 0, we cannot get a value for c, while if
b = 2, we get c = 2, and so we get 2x2 + 2xy + 2y 2 = 2(x2 + xy + y 2 ), which isn’t primitive.
In this case, we also see that h(−12) = 1.
There are only finitely numbers with class number 1, so let’s do an example where the
class number is more than one.
Example 19.1.7. D = −20. Here, a ≤ 2. In the case a = 1, we have b = 0, so c = 5. This
is the form x2 + 5y 2 .
In the case a = 2, b must be even, so b = 0 or b = 2. If b = 0 then −20 = −4 × 2 × c is
not possible, and if b = −2 then c = 3, and we get a second reduced form 2x2 + 2xy + 3y 2 .
Therefore, h(−20) = 2.
We see that as the discriminant gets larger, we have to consider more and more cases, and
there’s a good chance something works.
Remark. If D ≡ 0, 1 (mod 4) then h(D) ≥ 1. Why?
If D ≡ 0 (mod 4) then use x2 − D4 y 2 , and if D ≡ 1 (mod 4) then use x2 + xy + 1−D 2
4
y .
Why is the theory of binary quadratic forms very pretty?
Let n be odd, and suppose that (n, D) = 1. We want to know: Can n = f (x, y) for some
binary quadratic form f of discriminant D and (x, y) = 1?
Suppose that p and q are coprime, and f (p, q) = n. We go back to the Euclidean Algorithm
to claim that we can find r and s such that
 
p r
∈ SL2 (Z).
q s
If we make a change of basis     
x p r X
= ,
y q s Y
we get a transformation
f (x, y) →g(X, Y ) = nx2 + (B)xy + (C)y 2
f (p, q) →g(1, 0).
60
Since f and g has the same discriminant, we must have D = B 2 − 4Cn, which means that
D is congruent to a square (mod n).
This is actually an equivalent condition. Conversely, if D is a square (mod n), then n is
represented properly by a binary quadratic form of discriminant D. This is because we can
write D = b2 − (·)n. We would love to have (·) divisible by 4. How can we rig it so that it is
even? We want b to have the same parity of D. We can rewrite the previous expression by
D = (b + n)2 − (·)n. Since n is odd, b or b + n has the same parity as D. So we do have a
solution to D = b2 − 4nc. Then nx2 + bxy + cy 2 is a form of discriminant D and it represents
n. This is actually a beautiful theorem.
This gives us a lot of consequences.
Example 19.1.8. D = −4. When is −4 congruent to a square (mod n)? If n = p is prime
then p ≡ 1 (mod 4).
Example 19.1.9. D = −8, with form x2 + 2y 2 . When is p = x2 + 2y 2 ? We want −8 to be
a square (mod p), which requires
  (
−8 3 (mod 8)
= 1 =⇒ p =
p 1 (mod 8).

Remark. Quadratic reciprocity told us that −8



p
depends on p (mod 8).
In general, if p is an odd prime, (p, D) = 1, then when is Dp = 1? This only depends (by


quadratic reciprocity) on p (mod |D|).


Example 19.1.10. D = −12. There is one reduced form x2 + 3y 2 . To be represented by
this form p = x2 + 3y 2 , we want −12 = 1 ⇔ −3
 
p p
= 1, which means that p ≡ 1 (mod 3) by
quadratic reciprocity. For example, if p = 1 (mod 4), we see that p3 = p3 = 1.
 

Example 19.1.11. D = −20. We have two reduced forms $x^2 + 5y^2$ and $2x^2 + 2xy + 3y^2$. If we choose a prime p ≠ 2, 5, we want to ask if p can be properly represented by a form of discriminant −20, which requires $\left(\frac{-20}{p}\right) = 1$.
Suppose that p ≡ 1 (mod 4). Then we want
$$\left(\frac{5}{p}\right) = 1 = \left(\frac{p}{5}\right) \implies p \equiv 1, 4 \pmod 5.$$
In the case p ≡ 3 (mod 4), we want
$$\left(\frac{-20}{p}\right) = 1 \implies \left(\frac{5}{p}\right) = -1 = \left(\frac{p}{5}\right) \implies p \equiv 2, 3 \pmod 5.$$
Combining these conditions, we see that the primes that work are p ≡ 1, 3, 7, 9 (mod 20). For these p, either $p = x^2 + 5y^2$ or $p = 2x^2 + 2xy + 3y^2$.
Eventually, this kind of statement is all you can say. But here we have an extra piece of luck: $p = x^2 + 5y^2$ requires p ≡ 1, 4 (mod 5), while $p = 2x^2 + 2xy + 3y^2$, i.e. $2p = (2x + y)^2 + 5y^2$, requires p ≡ 2, 3 (mod 5).
Now, if p ≡ 1, 9 (mod 20) then $p = x^2 + 5y^2$, and if p ≡ 3, 7 (mod 20) then $p = 2x^2 + 2xy + 3y^2$. Euler wrote down these types of results and Gauss did the general theory, which he called genus theory.
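This splitting is easy to verify numerically. The following brute-force Python sketch (my own code; the helper names are invented) checks it for every prime below 200.

def is_prime(m):
    return m > 1 and all(m % d for d in range(2, int(m**0.5) + 1))

def represented(p, f, bound=20):
    # brute force: is p = f(x, y) for some integers x, y?
    return any(f(x, y) == p
               for x in range(-bound, bound + 1)
               for y in range(-bound, bound + 1))

f1 = lambda x, y: x * x + 5 * y * y                   # x^2 + 5y^2
f2 = lambda x, y: 2 * x * x + 2 * x * y + 3 * y * y   # 2x^2 + 2xy + 3y^2

for p in (q for q in range(3, 200) if is_prime(q) and q != 5):
    r = p % 20
    if r in (1, 9):
        assert represented(p, f1) and not represented(p, f2)
    elif r in (3, 7):
        assert represented(p, f2) and not represented(p, f1)
    else:
        assert not represented(p, f1) and not represented(p, f2)
print("splitting by residue class (mod 20) checked for primes below 200")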
We want to connect this back to L-functions. This is related to why we get nice values like $\frac{\pi}{4}$.
19.2. Relation to L-functions. If n is odd and (n, D) = 1, what does it mean to say that D is a square (mod n)?
Suppose that $n = p_1 \cdots p_k$ (square-free). Then $\left(\frac{D}{p_i}\right) = 1$ for each $p_i$, $i = 1, 2, \dots, k$.
Notice that
$$\prod_{i=1}^{k} \left(1 + \left(\frac{D}{p_i}\right)\right) = 0$$
unless D is a square (mod $p_1 \cdots p_k$). Multiplying out this product, there are $2^k$ terms, sort of like the divisor function.
Extend the Legendre symbol to all numbers via
$$\left(\frac{D}{n}\right) = \prod_{p^{\alpha} \parallel n} \left(\frac{D}{p}\right)^{\alpha}.$$
Note that this is completely multiplicative and periodic, so it is a character (mod |D|). There
are a few things to be checked here. Applying this,
$$\prod_{i=1}^{k} \left(1 + \left(\frac{D}{p_i}\right)\right) = \sum_{n = ab} \left(\frac{D}{b}\right).$$
Assume that D < 0 and that D is a fundamental discriminant, which means that $D \neq a^2 b$ with b a discriminant and a > 1. For example, −12 = −3 · 4 is not a fundamental discriminant, but −20 is a fundamental discriminant: even though −20 = −5 · 4, the number −5 is not a discriminant.
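This condition is also easy to test mechanically; here is a small Python sketch following the definition just given (my own code, intended only for the negative discriminants considered here).

def is_discriminant(D):
    return D % 4 in (0, 1)

def is_fundamental(D):
    # D is fundamental if it is a discriminant and cannot be written
    # as a^2 * b with a > 1 and b itself a discriminant
    if not is_discriminant(D):
        return False
    a = 2
    while a * a <= abs(D):
        if D % (a * a) == 0 and is_discriminant(D // (a * a)):
            return False
        a += 1
    return True

print([D for D in range(-1, -30, -1) if is_fundamental(D)])
# [-3, -4, -7, -8, -11, -15, -19, -20, -23, -24]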
Then $\chi(n) = \left(\frac{D}{n}\right)$ is a real character (mod |D|). Define
$$r_\chi(n) = \sum_{ab = n} \chi(b).$$
Then
$$2 r_\chi(n) = \#\{(x, y) \in \mathbb{Z}^2 : n = f(x, y), \text{ with } f \text{ ranging over all reduced forms of discriminant } D\}.$$
(We have two special cases: if D = −3, use $6 r_\chi(n)$, and if D = −4, use $4 r_\chi(n)$.)
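For D = −20 this identity can be checked numerically. The sketch below (my own code; it evaluates χ through Euler's criterion at the odd primes dividing n, so it only treats odd n coprime to D) compares $2 r_\chi(n)$ with a direct count of representations by the two reduced forms $x^2 + 5y^2$ and $2x^2 + 2xy + 3y^2$.

from math import gcd

def legendre(a, p):
    # Legendre symbol (a/p) for an odd prime p, via Euler's criterion
    a %= p
    if a == 0:
        return 0
    return 1 if pow(a, (p - 1) // 2, p) == 1 else -1

def chi(D, n):
    # (D/n) for odd n coprime to D, extended multiplicatively
    result, m, p = 1, n, 3
    while p * p <= m:
        while m % p == 0:
            result *= legendre(D, p)
            m //= p
        p += 2
    if m > 1:
        result *= legendre(D, m)
    return result

def r_chi(D, n):
    return sum(chi(D, d) for d in range(1, n + 1) if n % d == 0)

def representations(n, forms):
    # count all (x, y) in Z^2 with f(x, y) = n, over the given forms
    B = n
    return sum(1 for (a, b, c) in forms
                 for x in range(-B, B + 1)
                 for y in range(-B, B + 1)
                 if a * x * x + b * x * y + c * y * y == n)

D, forms = -20, [(1, 0, 5), (2, 2, 3)]
for n in range(1, 60, 2):
    if gcd(n, abs(D)) == 1:
        assert representations(n, forms) == 2 * r_chi(D, n)
print("2 r_chi(n) matches the representation count for D = -20, odd n < 60")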
Now, like we did with characters (mod 4), we use the hyperbola method to get that
$$\sum_{n \le X} 2 r_\chi(n) \sim 2 L(1, \chi) X.$$
This is also equal to
$$\sum_{\substack{f = ax^2+bxy+cy^2 \\ \text{reduced}}} \; \sum_{\substack{(x, y) \\ f(x, y) \le X}} 1.$$
The region $ax^2 + bxy + cy^2 \le X$ is an ellipse. How many lattice points are inside the ellipse? The answer is roughly the area of the ellipse, and we can work this out. For each reduced form f we get
$$\sum_{\substack{(x, y) \\ f(x, y) \le X}} 1 \sim \frac{2\pi X}{\sqrt{|D|}}.$$
It's the same answer for every reduced binary quadratic form, so we end up getting
$$\sum_{n \le X} 2 r_\chi(n) \sim 2 L(1, \chi) X \sim \frac{2\pi X}{\sqrt{|D|}}\, h(D),$$
so for D < −4, we get
$$L(1, \chi) = \frac{\pi h(D)}{\sqrt{|D|}},$$
which is an amazing theorem of Dirichlet. This tells us why some of these L-functions are nonzero. In the other case, D > 0, we count lattice points inside a hyperbolic region and do the same thing.
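As a rough numerical sanity check (my own computation, not from the lecture): for D = −20 we have h(−20) = 2, and the character χ is +1 on 1, 3, 7, 9 and −1 on 11, 13, 17, 19 (mod 20). A truncated sum for L(1, χ) should then be close to $2\pi/\sqrt{20} \approx 1.4050$.

import math

def chi20(n):
    # the real character (mod 20) attached to D = -20
    r = n % 20
    if r in (1, 3, 7, 9):
        return 1
    if r in (11, 13, 17, 19):
        return -1
    return 0

N = 10**6
L_approx = sum(chi20(n) / n for n in range(1, N + 1))
predicted = math.pi * 2 / math.sqrt(20)   # pi * h(D) / sqrt(|D|)
print(L_approx, predicted)                # both approximately 1.4050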
19.3. Why is $n^2 + n + 41$ prime? We can show that the class number of D = −163 is 1. This is rather surprising. We want
$$a \le \sqrt{\frac{163}{3}} \implies a \le 7.$$
So we have to check a = 1, 2, 3, 4, 5, 6, 7, with b odd. It's not too bad, we just have to check, and we find that the only reduced quadratic form is $x^2 + xy + 41y^2$.
So then $n^2 + n + 41 = f(n, 1)$. If n ≤ 39, then $f(n, 1) \le f(39, 1) = 1601 < 41^2$. Suppose that f(n, 1) is composite. Then there exists a prime p < 41 with p | f(n, 1). Since f(n, 1) is odd and properly represented by a form of discriminant −163, the number −163 is a square (mod f(n, 1)), hence a square (mod p), which means that p is represented by some form of discriminant −163. But the only reduced form $x^2 + xy + 41y^2$ cannot represent a prime below 41 (its values with y ≠ 0 are at least 163/4 > 40, and its values with y = 0 are squares), so we have a contradiction and f(n, 1) is prime.
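A quick Python check of this classical fact (my own code):

def is_prime(m):
    return m > 1 and all(m % d for d in range(2, int(m**0.5) + 1))

# n^2 + n + 41 is prime for n = 0, 1, ..., 39, and composite at n = 40 (it is 41^2)
assert all(is_prime(n * n + n + 41) for n in range(40))
assert not is_prime(40 * 40 + 40 + 41)
print([n * n + n + 41 for n in range(5)])  # [41, 43, 47, 53, 61]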
This actually generalizes. If h(1 − 4A) = 1, then $n^2 + n + A$ is prime for 0 ≤ n ≤ A − 2. Sadly, the largest A for which this works is A = 41, corresponding to D = −163. Determining all such discriminants is Gauss's class number one problem, and it was solved in the 1950s.
E-mail address: [email protected]