
MATH3304

Introduction to Number Theory


Dr. Ben Kane
Mr. Jincheng Tang

Department of Mathematics
The University of Hong Kong

Second Semester 2023-2024

Date: January 13, 2024

Contents
1. Practical Information
2. Introduction
3. Divisibility, prime factorization, and the Euclidean algorithm
3.1. Optional: some facts from abstract algebra about the integers
3.2. Division algorithm
3.3. Optional: Euclidean domains
4. Congruences and residue classes
5. Primality tests and cryptography
6. Solving congruences and the Chinese remainder theorem
7. Primitive roots and cyclic groups
8. Quadratic reciprocity
9. Multiplicative arithmetic functions
10. Sums of two squares
11. Representations of real numbers via continued fractions
12. Approximation of real numbers with rational numbers
13. Periodic continued fractions
14. Primes and their distribution
15. Elliptic Curves
16. Partitions

1. Practical Information
• Instructor
– Dr. Ben Kane
– Email: bkane[at]hku.hk
– Office: Run Run Shaw Building A411
– Consultation hours: Tuesdays 10:00-13:00
• Tutor: Mr. Jincheng Tang
– Email: tangent[at]connect.hku.hk
– Office: Run Run Shaw Building A215
– Consultation hours: Wednesdays 12:00–14:00
• Moodle Website: _MATH3304_2A_2023
Grade assessment

(1) The final exam is worth 50% of the final grade.


(2) There will be two midterm exams, each worth 20%, for a total of 40% of the
final grade.
(3) Homework assignments
Homework constitutes 10% of your final grade. The tutor will only report letter
grades (A, B, C, D, and F) for individual homework assignments, but the exact
numerical scores are kept in our records. It is the cumulative score over all
assignments (not the letter grades) that contributes to your final grade.
(4) Historical data indicates that the course grades will be determined with a grading
scale roughly like the one below:
Grade   Score
A+      90%–100%
A       84%–89%
A−      80%–83%
B+      76%–79%
B       69%–75%
B−      64%–68%
C+      60%–63%
C       53%–59%
C−      49%–52%
D       40%–48%
F       < 40%
Note: The above is only a guideline. If you attain the scores on the right-hand
side of the above list, you should expect to obtain (at least) the grade listed on
the left, but the boundaries may be curved later in the semester depending on
the distribution of scores across the entire course.
Homework Assignments
(1) Please drop your work in the assignment box marked Math3304 on the 4th floor
of Run Run Shaw Building or submit it via the Moodle page.
(2) Homework is due (roughly) every two weeks on Wednesdays and the assignment
is to be turned in by 19:00 on the due date. No late work will be accepted.
(3) Please show your work! Answers without any of the steps shown will receive no
credit.
(4) You are permitted (and even encouraged!) to discuss the homework problems
with your classmates. However, you are responsible for your own work and
each student is expected to write down the solutions in their own words! Photocopies
of other students’ solutions, combined solutions for multiple students,
and plagiarized solutions will not be accepted.
Lectures
Lectures (with typed slides based on the lecture notes) were recorded during the
pandemic period and will be made available for your benefit. Nevertheless, the
course will be conducted in person and you are strongly encouraged to attend and
actively participate. Asking questions can help clarify any misunderstandings that
you may have. Moreover, experience from the past few semesters shows that
many students try to catch up with the recorded lectures directly before exams,
fall behind in the course, and ultimately do not perform well on the midterms and
final exam.
The lectures will be held on Mondays from 9:30–11:20 and Thursdays from 9:30–10:20,
except for periods of class suspension for public holidays, etc. Lectures will be
held in the Main Building room 151 (MB151).
Tutorials
The tutorial information is given below:
Day of week   Time          Room
Wednesdays    10:30–11:20   LE4

Abbrev.   Full building name
LE        Library Extension
The tutorials will begin January 31, 2024.
Exams
There will be two midterm exams. The first midterm exam is tentatively scheduled
for February 26, 2024 and the second is preliminarily scheduled to take place on
April 15, 2024. Calculators will be allowed during the midterm exams.
Course Description
This course aims to give a background in elementary number theory, which roughly
speaking is the study of integers and counting. We investigate the area through
formal proofs, using some techniques from real analysis and connecting to certain
concepts in abstract algebra (a course in abstract algebra is not assumed, so any
necessary topics from abstract algebra will be introduced and explained within the
context of the course material).
Outline of Topics covered
The following will be covered in this course:
(1) Divisibility and congruences
(2) Legendre symbols and quadratic congruences
(3) Quadratic reciprocity
(4) Elliptic curves
(5) Prime number theorem
(6) Prime numbers and cryptography
Course material
• Lecture notes: The lecture notes are the main reference material for this
course. They may be downloaded from Moodle. For another perspective, or if you
prefer a different style, you may also want to consult one of the suggested
readings below.
• Suggested reading: Kenneth H. Rosen, Elementary number theory, 6th edition,
Pearson, 2010.
• Suggested reading: J. H. Silverman, A friendly introduction to number theory,
Prentice Hall, 2001.
• Suggested reading: Lo-Keng Hua, Introduction to number theory, Springer-Verlag,
1982.
Student responsibility
(1) It is your responsibility to attend the lectures. New material usually depends on
earlier material, so be warned that missing one lecture or skipping
forward in the video lectures may result in difficulty with subsequent lectures.
(2) It is highly recommended to attend tutorials and be active in the group discussion.
Please come with any questions that you have about the lectures and/or
the homework. We are here to help you! Moreover, students who attend tutorials
perform noticeably better on exams.
(3) When in doubt, please come to our consultation hours (listed above). Many
times a one-on-one session can clear up that small detail you couldn’t
quite understand the first time. Since material builds on itself, this small detail
might have been holding you up on newer material, too.

2. Introduction
What is number theory? Here are some typical questions.
(1) Diophantine geometry:
Suppose that you are given a polynomial
$$F = F(x_1, \ldots, x_n)$$
with coefficients in Z.
What is the set of solutions to
$$F(x) = 0,$$
where the $x_\ell$ come from a certain set? In geometry, one might ask about $x \in \mathbb{C}^n$,
in analysis, one might ask about $x \in \mathbb{R}^n$, but in number theory, one asks for
solutions with $x \in \mathbb{Z}^n$, $x \in \mathbb{Q}^n$, or $x \in \mathbb{F}_q^n$ (for q a prime).
Examples 2.1.
• Which natural numbers are the sum of two (integer) squares? In other
words, for which n ∈ N do there exist x, y ∈ Z with
$$x^2 + y^2 = n?$$
The corresponding polynomial would be $F(x, y) := x^2 + y^2 - n$. For small
n, one can check directly:
3: no
$5 = 1^2 + 2^2$
7: no
⋮
In this example, for some small n we have found solutions directly by plugging
in, but it turns out that one can find a full characterization of the set
$$\{n \in \mathbb{N} : \exists x, y \in \mathbb{Z},\ x^2 + y^2 = n\}.$$
To give the flavour of the answer, if we restrict to primes, then we have
$$\{p \text{ prime} : \exists x, y \in \mathbb{Z},\ x^2 + y^2 = p\} = \{2\} \cup \{p \text{ prime} : p \equiv 1 \pmod{4}\}$$
(note that $2 = 1^2 + 1^2$). This is the kind of question looked at by a number
theorist, and we will show this later in the course.
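The characterization for primes can be tested numerically. The following Python sketch is only an illustration (the helper names `is_sum_of_two_squares` and `is_prime` are our own): it decides by brute force whether n is a sum of two squares and checks the statement for all primes below 1000, which is evidence rather than a proof.

```python
from math import isqrt


def is_sum_of_two_squares(n: int) -> bool:
    """Return True if n = x^2 + y^2 for some integers x, y (brute force, n >= 0)."""
    for x in range(isqrt(n) + 1):
        rest = n - x * x              # the value that y^2 would have to take
        y = isqrt(rest)
        if y * y == rest:
            return True
    return False


def is_prime(n: int) -> bool:
    """Trial-division primality test, adequate for small n."""
    return n >= 2 and all(n % d != 0 for d in range(2, isqrt(n) + 1))


print([n for n in range(1, 21) if is_sum_of_two_squares(n)])
# [1, 2, 4, 5, 8, 9, 10, 13, 16, 17, 18, 20]

# For every prime p < 1000: p is a sum of two squares exactly when p = 2 or p ≡ 1 (mod 4).
for p in filter(is_prime, range(1000)):
    assert is_sum_of_two_squares(p) == (p == 2 or p % 4 == 1)
```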
• Another famous question is Fermat’s Last Theorem. Fermat considered equations
of the following type:
$$x^n + y^n = z^n, \qquad x, y, z \in \mathbb{N} \ (\text{so in particular } x, y > 0).$$
Fermat claimed in 1637 that there are no solutions for n ≥ 3 (note that
there are many solutions for n = 2 coming from right triangles).
This problem went unsolved for more than three centuries and was only solved in the
1990s by Andrew Wiles using complicated mathematics including algebraic
geometry. This is a good example of a common phenomenon in number
theory: there are many problems which are very simple to state and understand
(this is not as common in many areas of mathematics), but whose solutions turn
out to be very difficult and require a lot of complicated mathematics.
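A short exhaustive search illustrates the contrast between n = 2 and n = 3. In this Python sketch, `fermat_solutions` and the search bounds are arbitrary choices of ours; finding no solutions for n = 3 in a finite box is consistent with Fermat's claim but of course proves nothing.

```python
def fermat_solutions(n: int, bound: int) -> list[tuple[int, int, int]]:
    """All (x, y, z) with 1 <= x <= y, z <= bound and x^n + y^n = z^n."""
    # Map each z^n (z <= bound) back to z, so that x^n + y^n can be looked up directly.
    nth_powers = {z ** n: z for z in range(1, bound + 1)}
    solutions = []
    for x in range(1, bound + 1):
        for y in range(x, bound + 1):
            z = nth_powers.get(x ** n + y ** n)
            if z is not None:
                solutions.append((x, y, z))
    return solutions


print(fermat_solutions(2, 20))   # right triangles, starting with (3, 4, 5)
print(fermat_solutions(3, 100))  # []
```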
(2) Problems on Prime numbers
We will later give a precise definition of prime numbers. Here are some questions
about primes that naturally arise.
• Are there infinitely many prime numbers? The answer turns out to be yes
and this is not very difficult to prove.
• Twin primes: A twin prime is a pair p and p + 2 such that both p and
p + 2 are prime.
Examples 2.2.
(3, 5), (5, 7), . . .
Question: Are there infinitely many twin primes?
Although this question is very easy to state, it is still unknown whether
there are infinitely many or finitely many twin primes, and this is a famous
open problem (i.e., nobody knows how to solve this, but some number
theorists are working hard on it).
• It is also common in number theory to ask how many primes/solutions
there are. This is known as the study of asymptotics. For example, the
number of primes p ≤ X, that is to say the size of the set
{p ≤ X : p is prime},
turns out to be (by a famous theorem known as the Prime Number Theorem)
approximately X/log(X). More precisely, we mean that
$$\lim_{X \to \infty} \frac{\#\{p \le X : p \text{ is prime}\}}{X/\log(X)} = 1.$$

Since we don’t know if there even exist infinitely many twin primes, we
can’t say how many there are up to X.
Similarly, one can also determine the asymptotic growth of the size of the set
$$\{n \le X : \exists x, y \in \mathbb{Z},\ x^2 + y^2 = n\}.$$
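The prime-counting limit and the twin prime question above can be explored numerically. The Python sketch below (the helper names are ours) sieves the integers up to X, compares #{p ≤ X : p prime} with X/log(X), and counts the twin prime pairs with p + 2 ≤ X. The ratio approaches 1 only slowly, and no amount of such data can decide whether there are infinitely many twin primes.

```python
from math import isqrt, log


def prime_flags(x: int) -> bytearray:
    """flags[n] == 1 exactly when n is prime, for 0 <= n <= x (x >= 1)."""
    flags = bytearray([1]) * (x + 1)
    flags[0] = flags[1] = 0
    for p in range(2, isqrt(x) + 1):
        if flags[p]:
            # cross out p^2, p^2 + p, ... (smaller multiples were crossed out earlier)
            flags[p * p :: p] = bytearray(len(range(p * p, x + 1, p)))
    return flags


def twin_prime_pairs(flags: bytearray) -> list[tuple[int, int]]:
    """All pairs (p, p + 2) of primes with p + 2 <= len(flags) - 1."""
    x = len(flags) - 1
    return [(p, p + 2) for p in range(2, x - 1) if flags[p] and flags[p + 2]]


for X in (10**3, 10**4, 10**5, 10**6):
    flags = prime_flags(X)
    pi_X = sum(flags)                       # #{p <= X : p prime}
    print(X, pi_X, round(pi_X / (X / log(X)), 3), len(twin_prime_pairs(flags)))
```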
3. Divisibility, prime factorization, and the Euclidean algorithm
3.1. Optional: some facts from abstract algebra about the integers.
• The set Z with the usual addition and multiplication is a commutative ring
with identity.
To recall:
A ring R is a non-empty set with two binary operators + and · which satisfy
the following:
(i) The pair (R, +) form an abelian group.
∗ The set R is closed with respect to addition.
∗ The operator + satisfies associativity.
∗ There is an identity element 0, i.e., r + 0 = r for every r ∈ R.
∗ For each r ∈ R, there exists −r ∈ R for which r + (−r) = 0 (i.e.,
every element has an additive inverse).
∗ The operator + is commutative (abelian), i.e., a + b = b + a ∀a, b ∈ R.
(ii) The operator · is associative.
(iii) For every a, b, c ∈ R, the following distributive properties hold:
a · (b + c) = a · b + a · c,   (a + b) · c = a · c + b · c.
A commutative ring has the additional property that
a·b=b·a ∀a, b ∈ R.
In a ring with identity, there is also an element 1 satisfying
1·a=a=a·1 ∀a ∈ R.
• The ring Z is an integral domain, which means that for a, b ∈ Z,
a · b = 0 ⇒ a = 0 or b = 0
(in rings where this does not hold, if a · b = 0 and neither is 0, then we call
a and b zero divisors).

3.2. Division algorithm. The properties from Section 3.1 lead to a number of
useful properties about the integers. An additional property satisfied by the integers
plays an important role in number theory. Specifically, the positive integers satisfy
the well-ordering principle, which we next describe.
Definition. Let N denote the set of positive integers (note that my N starts at
1, but some other authors include 0 in N, so that it denotes the non-negative integers;
I write N0 := N ∪ {0} for the set including 0). The well-ordering principle states that
for any A ⊆ N with A ≠ ∅, there is a smallest element of A. In other words, there
exists a ∈ A such that for every x ∈ A we have a ≤ x.
We next use the well-ordering principle to obtain the following result.
Theorem 3.1 (Division Algorithm). For a ∈ N and b ∈ Z, there exist unique
q, r ∈ Z with 0 ≤ r < a satisfying
b = qa + r
Proof. The set S := {b − qa : q ∈ Z, b − qa ≥ 0} ⊆ N0 is non-empty: for q = −|b| we have
b − qa = b + |b|a ≥ b + |b| ≥ 0, since a ≥ 1. Thus by the
well-ordering principle, it contains a unique smallest element r (we have noted the
well-ordering of N, and the well-ordering of N0 follows directly from 0 ≤ n for every
n ∈ N0). For this r, there exists q ∈ Z such that
r = b − qa,
or in other words b = qa + r.
We next show that 0 ≤ r < a. By the definition of S,
every element b − qa ∈ S satisfies b − qa ≥ 0, hence r ≥ 0 by construction. Now
suppose for contradiction that r ≥ a. In this case, we have
b − (q + 1)a = r − a ≥ 0.
But then r − a ∈ S and r − a < r (since a ≥ 1), which contradicts the minimality of r. We
conclude that r < a.
Finally, we must show that q and r are unique. Suppose that
b = q′a + r′
with q′ ∈ Z and 0 ≤ r′ < a. Then r′ ∈ S and by the minimality of r we have
r ≤ r′. Moreover, comparing the identities containing r and r′, we have
r′ − r = (q − q′)a.
Since r′ ≥ r and a > 0, we have q ≥ q′. However, if q > q′, then q − q′ ≥ 1 and hence
r′ − r = (q − q′)a ≥ a.
On the other hand, since r′ < a and r ≥ 0, we have
r′ − r < a − 0 = a,
leading to a contradiction. It follows that q = q′, and hence also r′ − r = 0 · a = 0,
i.e., r = r′. □
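In Python, the built-in `divmod(b, a)` with a > 0 returns exactly the pair (q, r) of Theorem 3.1, because integer division `//` rounds towards −∞, so the remainder `%` has the sign of the divisor. This is a language-specific convention: in C or Java, `%` truncates towards zero and can return a negative remainder when b < 0. The wrapper name `division_algorithm` below is our own.

```python
def division_algorithm(b: int, a: int) -> tuple[int, int]:
    """Return (q, r) with b = q*a + r and 0 <= r < a, for a positive integer a."""
    if a < 1:
        raise ValueError("a must be a positive integer")
    q, r = divmod(b, a)   # floor division, so 0 <= r < a whenever a > 0
    return q, r


print(division_algorithm(17, 5))   # (3, 2):   17 = 3*5 + 2
print(division_algorithm(-17, 5))  # (-4, 3): -17 = (-4)*5 + 3
```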
Definition. Suppose that R is a commutative ring (if you have not had abstract
algebra, just consider R = Z) and a, b ∈ R. The element a is called a divisor of
b (one also says that a divides b, b is divisible by a or b is a multiple of a) if there
exists q ∈ R such that b = qa.
If a divides b, then we write a | b, and if a does not divide b (a is not a divisor of
b), then we write a ∤ b.
Remark 3.2. If R is an integral domain (the integers are an integral domain) and
a ≠ 0 satisfies a | b, then there is a unique q ∈ R for which b = qa.
Example 3.3. In the case we are mostly interested in, we have R = Z. Since, for
example, 5 · 13 = 65, we see that 5 is a divisor of 65. The unique q such that 5q = 65
is q = 13.
Theorem 3.4. Suppose that R is a commutative ring with identity 1 (again, you
can think of R = Z) and let a, b, c, d, u, v ∈ R. Then the following hold:
(1) 1 | a,   a | a,   a | 0.
(2) If a | b and b | c, then a | c.
(3) If d | a and d | b, then d | (ua + vb).
(4) If a | b, then a | bc and ac | bc.
(5) If 0 | a, then a = 0.
Suppose further that R is an integral domain. Then it follows that
(6) If av | bv and v ≠ 0, then a | b.
In the case of R = Z, we have
(7) The only divisors of 1 are ±1.
(8) If a | b and b | a, then a = ±b.
(9) If a | b and b ̸= 0, then |a| ≤ |b|.
Proof. These follow by fairly straightforward calculations, so we only work out a
couple of representative cases. For example, to show (3), we note that d | a and d | b
implies that there exist q1 and q2 such that
a = q1 d,
b = q2 d.
Therefore
ua + vb = uq1 d + vq2 d = (uq1 + vq2 )d,
where in the last step we used distributivity (and commutativity) of R. We therefore
see that d | ua + vb.
To show (6), we note that av | bv means that there exists q ∈ R with
bv = qav ⇒ (b − qa)v = 0.
Since v ≠ 0 and there are no zero divisors (because R is an integral domain), it
follows that
b − qa = 0.
We therefore have
b = qa ⇒ a | b.
Definition. Note that, since n · 1 = n, every n ∈ N has the divisors 1 and n.
If the only positive divisors of some integer p > 1 are 1 and p, then we call p a
prime number (or simply prime). We exclude p = 1 from the definition of primes
because otherwise certain problems occur (for example, prime factorization, which we
look at next, would no longer be unique, since one could insert arbitrarily many factors of 1).
Theorem 3.5 (Prime factorization). Every natural number n > 1 can be represented
as a product of finitely many primes.
Remark 3.6. We will later show that the prime factorization is unique.
Proof. We show the result by induction. Firstly, for n = 2 we see directly that 2
is prime (by Theorem 3.4 (9), all divisors of 2 are at most 2 in absolute value, so its
only positive divisors are 1 and 2).
Now suppose that n > 2 and for every 2 ≤ m < n the claim holds. If n is prime,
then we are done. Otherwise n has a positive divisor m1 with 1 < m1 < n, and we have
n = m1 m2
with 2 ≤ m1 < n and 2 ≤ m2 < n. By induction, m1 and m2 have prime factorizations
$$m_1 = \prod_{j=1}^{\ell_1} p_{j,1}, \qquad m_2 = \prod_{j=1}^{\ell_2} p_{j,2},$$
so that
$$n = \prod_{j=1}^{\ell_1} p_{j,1} \prod_{j=1}^{\ell_2} p_{j,2}.$$
This is the claim. □
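The existence proof splits n into smaller factors until only primes remain. A common way to actually find the factorization for small n is trial division, sketched below in Python (the function name `prime_factors` is ours): the smallest divisor d ≥ 2 of the current n is automatically prime, so it can be divided out repeatedly.

```python
def prime_factors(n: int) -> list[int]:
    """Prime factors of n >= 2 in non-decreasing order, listed with multiplicity."""
    if n < 2:
        raise ValueError("n must be at least 2")
    factors = []
    d = 2
    while d * d <= n:
        while n % d == 0:   # all smaller primes are already divided out, so d is prime
            factors.append(d)
            n //= d
        d += 1
    if n > 1:               # the remaining n has no divisor d with d*d <= n, so it is prime
        factors.append(n)
    return factors


print(prime_factors(918))   # [2, 3, 3, 3, 17]
print(prime_factors(4340))  # [2, 2, 5, 7, 31]
```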
Theorem 3.7. There are infinitely many primes.
Proof (Euclid). Suppose that p1, . . . , pr are all of the primes. Then N = p1 · · · pr + 1 is not
divisible by any prime pj, since if N = pj q, then
$$p_j q = p_1 \cdots p_r + 1 \implies p_j \Big(q - \prod_{\ell \ne j} p_\ell\Big) = 1.$$
But then by Theorem 3.4 (7), we conclude that pj = ±1, which contradicts the fact
that it is prime.
However, since N > 1, Theorem 3.5 shows that N has a factorization into primes.
Therefore, in particular, there exists a prime p such that p | N. But then p ≠ pj for all j,
contradicting the fact that p1, . . . , pr is the set of all primes. □
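Note that N = p1 · · · pr + 1 itself need not be prime; the proof only uses that its prime factors are new. The Python sketch below (the helper `smallest_prime_factor` is our own) applies the construction to the first r primes and checks that the smallest prime factor of N is missing from the list.

```python
from math import prod


def smallest_prime_factor(n: int) -> int:
    """Smallest prime dividing n >= 2 (trial division)."""
    d = 2
    while d * d <= n:
        if n % d == 0:
            return d
        d += 1
    return n


first_primes = [2, 3, 5, 7, 11, 13]
for r in range(1, len(first_primes) + 1):
    N = prod(first_primes[:r]) + 1
    p = smallest_prime_factor(N)
    assert p not in first_primes[:r]   # p is a prime missing from p_1, ..., p_r
    print(r, N, p)
# The last line printed is "6 30031 59": here N = 30031 = 59 * 509 is not itself prime.
```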
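Sieve of Eratosthenes (3rd century BC)
One can compute a list of all primes p ≤ x for a given x by successively underlining
primes and then crossing out multiples of that prime. The next number neither
underlined nor crossed out is the next prime. For example, for x = 15, one finds all
primes ≤ 15 as follows, starting from the list 1, 2, . . . , 15 (1 is not prime and is skipped):
• 2 is prime; cross out 4, 6, 8, 10, 12, 14.
• 3 is the next remaining number, so it is prime; cross out 9 and 15 (6 and 12 are already crossed out).
• 5 is prime; its multiples 10 and 15 are already crossed out.
• 7, 11 and 13 are prime; their only multiple up to 15 (namely 14) is already crossed out.
The primes ≤ 15 are therefore 2, 3, 5, 7, 11, 13, and the crossed-out numbers are
4, 6, 8, 9, 10, 12, 14, 15.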
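The sieve translates directly into code. The Python sketch below (the name `sieve` is our own) keeps a list of crossed-out flags. As a standard refinement not used in the hand computation above, crossing out starts at n², since the smaller multiples of n have already been crossed out by smaller primes.

```python
def sieve(x: int) -> list[int]:
    """Return all primes p <= x, following the sieve of Eratosthenes."""
    crossed_out = [False] * (x + 1)
    primes = []
    for n in range(2, x + 1):
        if crossed_out[n]:
            continue
        primes.append(n)                           # n survived, so it is prime ("underline" it)
        for multiple in range(n * n, x + 1, n):    # smaller multiples are already crossed out
            crossed_out[multiple] = True
    return primes


print(sieve(15))  # [2, 3, 5, 7, 11, 13]
```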
Theorem 3.8. Order the primes p1 = 2, p2 = 3, p3 = 5, . . . by their size. Then one
has
$$p_n \le 2^{2^{n-1}}.$$
Remark 3.9. A stronger/more complicated theorem known as the Prime Number
Theorem implies that pn is about the size of n log(n).
Proof. First note that
$$p_1 = 2 = 2^{2^{1-1}}, \qquad p_2 = 3 < 4 = 2^{2^{2-1}}.$$
Now suppose that n ≥ 1 and $p_k \le 2^{2^{k-1}}$ for all k ≤ n. Recall from the proof of
Theorem 3.7 that $N = 1 + \prod_{j=1}^{n} p_j$ is not divisible by any of the primes p1, . . . , pn.
Since it has a prime factorization by Theorem 3.5, there must be a prime p ≠ pj
with p | N (it could be the case that p = N). Since p ≠ pℓ for 1 ≤ ℓ ≤ n, we have
$p \ge p_{n+1}$, and by Theorem 3.4 (9), we have p ≤ N. Therefore
$$p_{n+1} \le p \le p_1 \cdots p_n + 1 \le 2^{2^0 + 2^1 + \cdots + 2^{n-1}} + 1 = 2^{2^n - 1} + 1 < 2 \cdot 2^{2^n - 1} = 2^{2^n},$$
where the third inequality follows by the inductive hypothesis.
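The bound of Theorem 3.8 is very generous (already $p_6 = 13$ against $2^{32}$). The Python sketch below (the generator `primes` is our own, using plain trial division) checks the bound for the first few primes and then prints $p_n$ next to $n \log(n)$, the size suggested by Remark 3.9, for a few values of n.

```python
from itertools import count, islice
from math import isqrt, log


def primes():
    """Yield 2, 3, 5, 7, ... using trial division (fine for a few thousand primes)."""
    for n in count(2):
        if all(n % d for d in range(2, isqrt(n) + 1)):
            yield n


first = list(islice(primes(), 1000))    # first[n - 1] is p_n

for n in range(1, 7):
    assert first[n - 1] <= 2 ** (2 ** (n - 1))     # the bound of Theorem 3.8

for n in (10, 100, 1000):
    print(n, first[n - 1], round(n * log(n)))      # p_n next to n log(n)
```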

Definition. Let R be a commutative ring with 1, a1 , . . . , am ∈ R (again, take R = Z
if you have not had abstract algebra). An element d ∈ R is called a common divisor
of a1 , . . . , am if for every 1 ≤ k ≤ m, we have d | ak .
If it further holds that, for every common divisor δ of a1, . . . , am,
δ | d,
we call d a greatest common divisor of a1, . . . , am. One writes d = gcd(a1, . . . , am)
(or d = (a1, . . . , am), as it is often abbreviated).
If 1 is a greatest common divisor of a1, . . . , am, then we call a1, . . . , am relatively
prime (or co-prime).
One calls a1, . . . , am pairwise co-prime if for every pair j, k ∈ {1, . . . , m} with
j ≠ k we have
gcd(aj, ak) = 1.
An element t ∈ R is called a common multiple of a1, . . . , am if ak | t for every 1 ≤ k ≤ m.
If it further holds that every common multiple s of a1, . . . , am satisfies t | s,
then we call t a least common multiple of a1, . . . , am. One writes t = lcm(a1, . . . , am).
Theorem 3.10.
(1) Suppose that a1 , . . . , am ∈ Z are not all 0. Then the integers a1 , . . . , am have
a unique positive greatest common divisor d and the following statements are
equivalent:
(a) The (positive) integer d is a common divisor of a1 , . . . , am and for every
common divisor δ of a1 , . . . , am , one has δ | d.
(b) The integer d is the largest element of N for which d | ak for every k =
1, . . . , m.
(c) The integer d is the smallest positive integer in

{x1 a1 + . . . + xm am : x1 , . . . , xm ∈ Z} .

(2) (Bezout’s lemma) There are integers x1 , . . . , xm ∈ Z for which

x1 a1 + . . . + xm am = gcd(a1 , . . . , am ).

Proof.
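(1) Note first that 1 is a positive common divisor of a1, . . . , am. By Theorem 3.4 (9), every
common divisor δ of a1, . . . , am satisfies
$$\delta \le \min\{|a_k| : 1 \le k \le m,\ a_k \ne 0\}$$
(the minimum is taken over a non-empty set because not all ak are 0).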

Thus the set

S := {δ ∈ N : δ is a common divisor of a1 , . . . , am }

is non-empty and bounded from above. It follows that S has a greatest element,
which we denote by d (note that d satisfies condition (b) by construction).
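Now suppose that d2 ∈ N satisfies condition (a). Since d is a common divisor, (a)
gives d | d2, and thus by Theorem 3.4 (9) we have d ≤ d2. However, d2 ∈ S and so by
the maximality of d in S, we conclude that d2 = d. Thus any positive integer satisfying
(a) equals the integer d from (b); that d itself satisfies (a) will follow from the
comparison with (c) below.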
We now show that the integer satisfying (b) (which is the same as the integer
satisfying (a) from above) is the same as the integer satisfying (c). The set

A := {x1 a1 + . . . + xm am : x1 , . . . xm ∈ Z, a1 x1 + . . . + am xm > 0}

is non-empty since for ak ≠ 0 we have |ak| = ±ak ∈ A. Thus by the well-ordering
principle, there exists a smallest element g ∈ A.
We write
g = x1 a1 + . . . + xm am
with xj ∈ Z, 1 ≤ j ≤ m. If δ | aj for every 1 ≤ j ≤ m, Theorem 3.4 (3)
(applied repeatedly) implies that δ | g. In particular, since d is a common divisor,
we have d | g, and hence by Theorem 3.4 (9), d ≤ g.
Suppose for contradiction that there exists ak for which g ∤ ak . Then by
Theorem 3.1 there exist q, r ∈ Z with 0 < r < g (r = 0 is excluded because we
assumed g ∤ ak ) such that
ak = qg + r.
Thus (by writing g as a linear combination of a1 , . . . , am )
r = ak − qg ∈ A,
contradicting the minimality of g in A. It follows that g is a common divisor
of a1, . . . , am, and thus by the maximality in (b) we have g ≤ d. Since we have already
shown that d ≤ g, we conclude that g = d. Hence (b) and (c) are equivalent, and since
every common divisor divides g = d, the integer d also satisfies (a).
(2) This follows from the equivalence between (a) and (c) in part (1).
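Part (1)(c) can be illustrated by brute force. The Python sketch below (the function name and the coefficient bound are our own choices) searches only a finite box of coefficients |x_j| ≤ bound, so it returns the smallest positive combination within that box. Every combination is a multiple of the gcd, and for these small inputs a combination equal to the gcd already lies in the box (e.g. 27 − 2 · 12 = 3). Calling `math.gcd` with more than two arguments requires Python 3.9 or later.

```python
from itertools import product
from math import gcd


def smallest_positive_combination(a: list[int], bound: int) -> int:
    """Smallest positive x_1 a_1 + ... + x_m a_m with every |x_j| <= bound (brute force)."""
    values = (
        sum(x * ak for x, ak in zip(xs, a))
        for xs in product(range(-bound, bound + 1), repeat=len(a))
    )
    return min(v for v in values if v > 0)


a = [12, 18, 27]
print(gcd(*a), smallest_positive_combination(a, bound=5))  # 3 3
```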

Theorem 3.11 (Properties of the gcd).
If a, b ∈ Z are not both 0 and n ∈ N, then the following hold:
(1) gcd(na, nb) = n gcd(a, b).
(2) If n | a and n | b, then $\gcd\left(\frac{a}{n}, \frac{b}{n}\right) = \frac{1}{n}\gcd(a, b)$. In particular, if d = gcd(a, b),
then
$$\gcd\left(\frac{a}{d}, \frac{b}{d}\right) = 1.$$
(3) We have
gcd(a, b) = gcd(b, a) = gcd(a, −b) = gcd(a, b + na).
(4) If gcd(a, n) = gcd(b, n) = 1, then gcd(ab, n) = 1.
(5) If n | ab and gcd(a, n) = 1, then n | b.
Proof.
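(1) This follows directly from Theorem 3.10 (1)(c): every linear combination
x1(na) + x2(nb) equals n(x1 a + x2 b), so the smallest positive integer of this form
is n times the smallest positive integer of the form x1 a + x2 b.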
(2) Set a′ = a/n and b′ = b/n and then use (1). That is
gcd(a, b) = gcd(a′ n, b′ n) = n gcd(a′ , b′ ),
which is equivalent to the claim.
(3) These all follow by Theorem 3.10 (1)(c). In particular, in the last one, the
elements of the set
x1 a + x2 (b + na) = (x1 + nx2)a + x2 b
are in one-to-one correspondence with $x_1' = x_1 + n x_2$ and $x_2' = x_2$ (the corresponding
matrix $\begin{pmatrix} 1 & n \\ 0 & 1 \end{pmatrix}$ sending $\begin{pmatrix} x_1 \\ x_2 \end{pmatrix}$ to $\begin{pmatrix} x_1' \\ x_2' \end{pmatrix}$ is invertible, with inverse $\begin{pmatrix} 1 & -n \\ 0 & 1 \end{pmatrix}$).
(4) Theorem 3.10 (2) implies that there exist x, y, u, v ∈ Z such that
1 = xa + yn = ub + vn
Rearranging, we have
abxu = axbu = (1 − ny)(1 − nv) = 1 − n(y + v − nyv)
Thus if we set r := xu and s := y + v − nyv, then we have
abr + sn = 1.
Then Theorem 3.10 (1)(c) implies that gcd(ab, n) = 1.
(5) The claim follows directly if a = 0 (in which case n = gcd(0, n) = 1) or b = 0, so we
assume ab ≠ 0. Since gcd(a, n) = 1, (1) and (3) imply that
gcd(nb, ab) = gcd(n|b|, a|b|) = |b| gcd(n, a) = |b|.
Since n | ab and n | nb, Theorem 3.10 (1)(a) implies that n | gcd(ab, nb) = |b|, and
hence n | b.

Theorem 3.12. Suppose that a, b ∈ Z and p is prime. If p | ab, then p | a or p | b.
Proof. Suppose that p | ab and p ∤ a. Note next that since gcd(p, a) is a divisor of
p and p is prime, it must be the case that gcd(p, a) = p or gcd(p, a) = 1. However,
since p ∤ a, we have gcd(p, a) = 1. Thus by Theorem 3.11 (5), we conclude that
p | b. □
Theorem 3.13 (Fundamental Theorem of Arithmetic). Let n ∈ N, n ≥ 2 be given.
Then n has a unique representation as a product of primes, where uniqueness is
meant to be up to reordering the primes.
Proof. Existence of the representation as a product of primes was proven in Theorem
3.5, so it remains to prove that the representation is unique.
Suppose that there are primes p1 , . . . , pr and q1 , . . . , qs for which
n = p1 · · · pr = q1 · · · q s .
We prove the claim by induction on s. If s = 1, then n = q1 is prime, and hence since
p1 | q1 and p1 ≠ 1, we have p1 = q1; then p2 · · · pr = 1, so r = 1 and the
representation is unique.
Next, for s > 1, note that by Theorem 3.12 (used repeatedly),
pr | n = q1 · · · qs
implies that pr | qℓ for some ℓ, and without loss of generality, we may assume that
ℓ = s. Again noting that qs is prime, we have pr = qs as above. We then conclude
that
$$p_1 \cdots p_{r-1} = \frac{n}{q_s} = q_1 \cdots q_{s-1}.$$
By induction we conclude that the representation for n/qs is unique, which yields
the claim.

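Theorem 3.14 (Euclidean algorithm). Let a ∈ Z and b ∈ N be given. Then there
exist integers k ≥ 0 and $r_1, \ldots, r_k$, $q_1, \ldots, q_{k+1}$ for which
$$\begin{aligned}
a &= b q_1 + r_1, & 0 &< r_1 < b,\\
b &= r_1 q_2 + r_2, & 0 &< r_2 < r_1,\\
r_1 &= r_2 q_3 + r_3, & 0 &< r_3 < r_2,\\
&\ \ \vdots\\
r_{k-2} &= r_{k-1} q_k + r_k, & 0 &< r_k < r_{k-1},\\
r_{k-1} &= r_k q_{k+1},
\end{aligned}$$
with $r_k = \gcd(a, b)$.
One obtains a solution x, y ∈ Z to ax + by = gcd(a, b) by solving each equation for
its remainder,
$$r_j = r_{j-2} - r_{j-1} q_j \qquad (\text{with } r_0 := b \text{ and } r_{-1} := a),$$
and then, starting from $r_k = \gcd(a, b)$, successively substituting these expressions
for $r_{k-1}, r_{k-2}, \ldots, r_1$.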
Proof. By Theorem 3.11 (3), we have
gcd(a, b) = gcd(a − bq1 , b) = gcd(b, r1 ) = gcd(b − r1 q2 , r1 )
= gcd(r1 , r2 ) = . . . = gcd(rk−1 , rk ) = rk ,
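where in the last step we note that since $r_k \mid r_{k-1}$, the integer $r_k$ is a common divisor
of $r_{k-1}$ and $r_k$, and by Theorem 3.4 (9) no common divisor exceeds $r_k$; hence
Theorem 3.10 (1)(b) implies that $\gcd(r_{k-1}, r_k) = r_k$.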

Example 3.15. We consider the example
$$b = 918 = 2 \cdot 3^3 \cdot 17, \qquad a = 4340 = 2^2 \cdot 5 \cdot 7 \cdot 31.$$
From the prime factorization, we have gcd(a, b) = 2. We next show this via the
Euclidean algorithm. Namely,
4340 = 4 · 918 + 668
918 = 1 · 668 + 250
668 = 2 · 250 + 168
250 = 1 · 168 + 82
168 = 2 · 82 + 4
82 = 20 · 4 + 2
4=2·2
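$$\begin{aligned}
\Rightarrow\quad 2 &= 82 - 20 \cdot 4 = 82 - 20(168 - 2 \cdot 82) = 41 \cdot 82 - 20 \cdot 168\\
&= 41 \cdot (250 - 168) - 20 \cdot 168 = 41 \cdot 250 - 61 \cdot 168\\
&= 41 \cdot 250 - 61(668 - 2 \cdot 250) = 163 \cdot 250 - 61 \cdot 668\\
&= 163(918 - 668) - 61 \cdot 668 = 163 \cdot 918 - 224 \cdot 668\\
&= 163 \cdot 918 - 224(4340 - 4 \cdot 918) = 1059 \cdot 918 - 224 \cdot 4340.
\end{aligned}$$
Thus x = −224 and y = 1059 satisfy ax + by = gcd(a, b).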
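The back-substitution in Example 3.15 can be automated. The Python sketch below is the common iterative form of the extended Euclidean algorithm (the function name `extended_gcd` is ours). Instead of substituting backwards at the end, it carries along coefficients with $r_j = a x_j + b y_j$ for every remainder, updating them with the same recursion $r_j = r_{j-2} - r_{j-1} q_j$ used in Theorem 3.14. For a = 4340 and b = 918 it reproduces the coefficients found above.

```python
from math import gcd


def extended_gcd(a: int, b: int) -> tuple[int, int, int]:
    """Return (g, x, y) with g = gcd(a, b) >= 0 and a*x + b*y = g."""
    # Invariant: r_prev = a*x_prev + b*y_prev and r = a*x + b*y at every step.
    r_prev, r = a, b
    x_prev, x = 1, 0
    y_prev, y = 0, 1
    while r != 0:
        q = r_prev // r
        # r_j = r_{j-2} - q_j r_{j-1}; the coefficients obey the same recursion
        r_prev, r = r, r_prev - q * r
        x_prev, x = x, x_prev - q * x
        y_prev, y = y, y_prev - q * y
    if r_prev < 0:                 # only possible for negative inputs; make the gcd positive
        r_prev, x_prev, y_prev = -r_prev, -x_prev, -y_prev
    return r_prev, x_prev, y_prev


print(extended_gcd(4340, 918))  # (2, -224, 1059), matching Example 3.15

# Sanity check against math.gcd for small inputs, including negative ones.
for a in range(-20, 21):
    for b in range(-20, 21):
        g, x, y = extended_gcd(a, b)
        assert g == gcd(a, b) and a * x + b * y == g
```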
3.3. Optional: Euclidean domains.
Definition. An integral domain R is called a Euclidean domain (or Euclidean ring),
if there exists a function (known as a Euclidean function or degree function)
g : R \ {0} → N0 such that for every a, b ∈ R with b ≠ 0, there exist q, r ∈ R with
a = qb + r and either r = 0 or g(r) < g(b).
In any Euclidean domain, repeated division with remainder yields a system of
equations like the one in the Euclidean algorithm, where at each step one
has $g(r_{j+1}) < g(r_j)$; since g takes values in N0, the process terminates.
Examples 3.16.
(1) A Euclidean function for R = Z is g(x) = |x| (see Theorem 3.1).
(2) Suppose that K is a field (such as K = Q or K = R, for example) and let
R = K[x] be the ring of polynomials in one variable over K.
Then
g(F ) = deg(F ),
where deg(F ) is the degree of the polynomial. The resulting division algorithm
is polynomial long division.
(3) For the ring R = Z[i] := {a + bi : a, b ∈ Z} of Gaussian integers,
$$g(a + bi) = |a + bi|^2 = a^2 + b^2.$$
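For the Gaussian integers, one way to see that g works is to divide by rounding: round the real and imaginary parts of the exact quotient α/β to the nearest integers to get q, and set r = α − qβ. Each rounding error is at most 1/2, so g(r) ≤ (1/4 + 1/4) g(β) < g(β). The Python sketch below implements this standard argument with exact integer arithmetic, representing a + bi as the pair (a, b); all names are ours.

```python
def g(z: tuple[int, int]) -> int:
    """The Euclidean function g(a + bi) = a^2 + b^2."""
    a, b = z
    return a * a + b * b


def nearest(n: int, d: int) -> int:
    """An integer closest to n/d for d > 0 (halves rounded up), without floating point."""
    return (2 * n + d) // (2 * d)


def gaussian_divmod(alpha: tuple[int, int], beta: tuple[int, int]):
    """Return (q, r) with alpha = q*beta + r and g(r) <= g(beta)/2."""
    a1, a2 = alpha
    b1, b2 = beta
    n = g(beta)
    if n == 0:
        raise ZeroDivisionError("beta must be non-zero")
    # alpha / beta = alpha * conj(beta) / g(beta)
    re, im = a1 * b1 + a2 * b2, a2 * b1 - a1 * b2
    q1, q2 = nearest(re, n), nearest(im, n)
    # r = alpha - q * beta, where q * beta = (q1*b1 - q2*b2) + (q1*b2 + q2*b1) i
    r = (a1 - (q1 * b1 - q2 * b2), a2 - (q1 * b2 + q2 * b1))
    return (q1, q2), r


print(gaussian_divmod((27, 23), (8, 1)))  # ((4, 2), (-3, 3)): 27 + 23i = (4 + 2i)(8 + i) + (-3 + 3i)
```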
Definition. Let a commutative ring R with 1 be given. An element ε ∈ R is called
a unit (in R), if there exists η ∈ R with
εη = 1.
The units of R form a group with respect to multiplication. These are referred to
as the group of units and are usually denoted by R× .
Elements x, y ∈ R are called associate if x | y and y | x both hold. In other words,
there exist a, b ∈ R with
y = ax and x = by, so that y = ax = aby.
In an integral domain R, we see that
y(ab − 1) = 0,
in which case either y = 0 or ab = 1. Thus two elements x and y are associate if and
only if x = y = 0 or x = by and y = ax with ab = 1 (i.e., a, b ∈ R×). Being associate
is an equivalence relation.
An element x ∈ R is called irreducible if x ≠ 0 and x ∉ R× and every divisor of
x is either a unit or associate to x. An x ∈ R is called prime or a prime element of
R, if x ≠ 0 and x ∉ R× and if for any a, b ∈ R for which x | ab, it follows that x | a
or x | b.
Remarks.
(1) In an integral domain, every prime element is irreducible. If x is prime and y | x,
then there exists a ∈ R such that x = ay. In particular, x | ay, and hence x | a
or x | y. If x | y, then x and y are associate by definition. If x | a, then a = xb
for some b ∈ R, and it follows that
x = ay = xby =⇒ x(1 − by) = 0.
Since x ≠ 0, we see that by = 1, from which it follows that y is a unit. Hence every
divisor of x is either a unit or associate to x.
(2) By Theorem 3.12, every irreducible element in Z is prime, but this does not hold
for an arbitrary integral domain. It does hold for Euclidean domains, however;
the proof follows that of Theorem 3.12. Suppose that R is a Euclidean domain and
that a is irreducible and does not divide b. The Euclidean algorithm for R produces
a common divisor g of a and b of the form g = ax′ + by′; since a is irreducible, g is
either a unit or associate to a, and the latter is impossible because then a | g | b.
Hence g is a unit, and multiplying by its inverse gives x, y ∈ R such that ax + by = 1.
So if a | bc and a ∤ b, then there exists d ∈ R with ad = bc and
d = (ax + by)d = adx + byd = b(cx + dy).
From this we conclude that
ab(cx + dy) = bc.
Since b ≠ 0 (otherwise a | b automatically) and R is an integral domain, we have
a(cx + dy) = c,
and hence a | c in particular.
Example 3.17. Set
$$R = \mathbb{Z}[\sqrt{-6}] = \{a + b\sqrt{-6} : a, b \in \mathbb{Z}\}.$$
Then R is an integral domain.
For $z = a + b\sqrt{-6} \in R$, we define the norm of z to be
$$N(z) := |z|^2 = a^2 + 6b^2.$$
Straightforward calculations show that
$$N(zw) = N(z)N(w), \quad N(0) = 0, \quad N(1) = N(-1) = 1, \quad N(z) > 1 \ \ \forall z \notin \{0, \pm 1\}.$$
The units of R are given by
$$R^\times = \{z \in R : N(z) = 1\} = \{+1, -1\}.$$
The elements 2, 3 and $\sqrt{-6}$ are irreducible in R. To see this, suppose that w ∈ R
is a proper divisor of $\sqrt{-6}$ (not a unit and not associate to $\sqrt{-6}$). Then $\sqrt{-6} = wz$
for some z ∈ R with neither w nor z a unit, and hence (since $N(z)N(w) = N(zw) = N(\sqrt{-6}) = 6$)
$$N(w) \in \{2, 3\}.$$
However $a^2 + 6b^2 = 2$ and $a^2 + 6b^2 = 3$ have no solutions in Z. Similarly, since
N(2) = 4 and N(3) = 9, a proper divisor of 2 or of 3 would have norm 2 or 3,
respectively, which is impossible. Thus
$$6 = 2 \cdot 3 = -\sqrt{-6} \cdot \sqrt{-6}$$
are two different factorizations of 6 into irreducible factors.
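The decisive step, that $a^2 + 6b^2$ never equals 2 or 3, is a finite check: b ≠ 0 forces $a^2 + 6b^2 \ge 6$, and b = 0 would need $a^2 \in \{2, 3\}$. The Python sketch below (the function name is ours) simply lists all values of the norm up to 10, confirming that 2 and 3 are missing.

```python
from math import isqrt


def norm_values(limit: int) -> set[int]:
    """All values a^2 + 6 b^2 <= limit with a, b integers (signs do not matter)."""
    values = set()
    for b in range(isqrt(limit // 6) + 1):
        for a in range(isqrt(limit) + 1):
            v = a * a + 6 * b * b
            if v <= limit:
                values.add(v)
    return values


print(sorted(norm_values(10)))  # [0, 1, 4, 6, 7, 9, 10]; in particular 2 and 3 never occur
```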

4. Congruences and residue classes
For a, b ∈ Z and N ∈ N, one calls a congruent to b modulo N , if N | (a − b). If a is
congruent to b modulo N , then one writes
a ≡ b (mod N ),
while if they are not congruent, then one writes
a ≢ b (mod N).
Theorem 4.1. Let N ∈ N be given. Congruence modulo N is an equivalence relation
which is compatible with addition and multiplication. In other words, for every
a, b, c, d ∈ Z,
(1) a ≡ a (mod N ),
(2) a ≡ b (mod N ) ⇒ b ≡ a (mod N ),
(3) a ≡ b (mod N ), b ≡ c (mod N ) ⇒ a ≡ c (mod N ),
(4) a ≡ b (mod N ), c ≡ d (mod N ) ⇒ a + c ≡ b + d (mod N ),
(5) a ≡ b (mod N ), c ≡ d (mod N ) ⇒ ac ≡ bd (mod N ).
Proof.
(1) This follows directly from N · 0 = 0 = a − a.
(2) If N | (a − b), then N | (b − a) = −(a − b).
(3) If N | (a − b) and N | (b − c), then there exist x and y for which N x = a − b
and N y = b − c. But then
a − c = (a − b) + (b − c) = N x + N y = N (x + y),
from which we conclude that N | a − c.
(4) We have a − b = N x and c − d = N y for some x, y ∈ Z. Thus
a + c − (b + d) = (a − b) + (c − d) = N (x + y),
from which the claim follows.
(5) We again choose x, y ∈ Z such that a − b = N x and c − d = N y. Then
ac − bd = ac − bc + bc − bd = (a − b)c + b(c − d) = (xc + by)N.

Definition. The equivalence relation from congruences splits Z into disjoint classes,
which we call the residue classes (or congruence classes) modulo N. One often writes
$\overline{a}$ to denote the equivalence class a + NZ.
One calls a set {x1, . . . , xn} ⊂ Z a complete residue system modulo N if for every
a ∈ Z there exists a unique j ∈ {1, . . . , n} such that
a ≡ xj (mod N).
For example, {0, 1, . . . , N − 1} (known as the least residue system modulo N) and
{0, −1, . . . , −N + 1} are complete residue systems modulo N.
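The set of all residue classes modulo N is usually denoted Z/NZ, and Z/NZ forms
a ring with respect to
$$\overline{a} + \overline{b} := \overline{a + b}, \qquad \overline{a} \cdot \overline{b} := \overline{a \cdot b}$$
(these operations are well defined by Theorem 4.1 (4) and (5)).
The identity element with respect to addition is $\overline{0}$ and the identity element with
respect to multiplication is $\overline{1}$.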
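Remark 4.2. The ring Z/NZ may contain zero divisors. For example, in Z/4Z we
have
$$\overline{2} \cdot \overline{2} = \overline{4} = \overline{0},$$
and in Z/6Z we have
$$\overline{2} \cdot \overline{3} = \overline{6} = \overline{0}.$$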
Theorem 4.3. For N > 1, the ring Z/N Z is an integral domain if and only if N
is prime.
Proof. If N is not prime, then there exist a, b ∈ N with N = ab and 1 < a < N (and
1 < b < N). Since in Z/NZ we have
$$\overline{a} \cdot \overline{b} = \overline{ab} = \overline{N} = \overline{0},$$
it follows that either $\overline{a}$ and $\overline{b}$ are zero divisors, or $\overline{a} = \overline{0}$ or $\overline{b} = \overline{0}$. Since
1 < a < N, we have a ≢ 0 (mod N), and hence $\overline{a} \ne \overline{0}$. Similarly, $\overline{b} \ne \overline{0}$. We
conclude that Z/NZ is not an integral domain.
Conversely, assume that N is prime and that
$$\overline{a} \cdot \overline{b} = \overline{ab} = \overline{0},$$
or in other words
ab ≡ 0 (mod N),
which is equivalent to N | ab. Since N is prime, Theorem 3.12 implies that N | a
or N | b, in which case $\overline{a} = \overline{0}$ or $\overline{b} = \overline{0}$, respectively. Therefore Z/NZ is an
integral domain. □
Definition. A commutative ring R is called a field if every non-zero element is
invertible, or in other words R× = R \ {0}.
Theorem 4.4. Every finite integral domain R is a field. In particular, for a prime
p, Fp := Z/pZ is a field with p elements.
Specifically, every non-zero element of Z/pZ is invertible.
Proof. Suppose that x ∈ R \ {0}. Since R is finite, the powers $x, x^2, x^3, \ldots$ are not all distinct. Hence there exist r, s ∈ N with r < s and
$$x^r = x^s.$$
Since R is an integral domain and
$$x^r\left(x^{s-r} - 1\right) = 0,$$
we conclude from $x^r \neq 0$ (a product of non-zero elements of an integral domain) that $x^{s-r} = 1$. Rewriting this, we have
$$x \cdot x^{s-r-1} = 1,$$
or in other words $x^{s-r-1}$ is the inverse of x. □
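The proof above is constructive. As a minimal sketch, assuming R = Z/pZ with p prime (the function name `inverse_by_powers` is a hypothetical choice, not from the notes), the following Python walks through the powers $x, x^2, \ldots$ modulo p until two coincide and then returns $x^{s-r-1}$:

```python
# Illustrative sketch mirroring the proof of Theorem 4.4; the name is our own.
def inverse_by_powers(x: int, p: int) -> int:
    """Invert x modulo a prime p.

    List x, x^2, x^3, ... (mod p) until some power repeats, say
    x^r = x^s with r < s; then x^(s-r-1) is the inverse of x.
    """
    x %= p
    if x == 0:
        raise ValueError("0 is not invertible")
    first_seen = {}               # residue -> exponent at which it first appeared
    power, exponent = x, 1        # invariant: power == x^exponent (mod p)
    while power not in first_seen:
        first_seen[power] = exponent
        power = power * x % p
        exponent += 1
    r, s = first_seen[power], exponent
    return pow(x, s - r - 1, p)
```

For example, `inverse_by_powers(3, 7)` returns 5, since 3 · 5 = 15 ≡ 1 (mod 7). The loop may take up to p − 1 steps, so this only illustrates the proof; the Euclidean algorithm is far faster in practice.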
We next show how to “cancel” in congruences.

Theorem 4.5 (Cancellation in Z/NZ). For N ∈ N and a, x, y ∈ Z, we have
(1) The congruence
ax ≡ ay (mod N)
holds if and only if
$$x \equiv y \pmod{\frac{N}{\gcd(a, N)}}.$$
(2) If ax ≡ ay (mod N) and gcd(a, N) = 1, then x ≡ y (mod N).
(3) If a | N, then ax ≡ ay (mod N) if and only if x ≡ y (mod N/a).

Proof. We write d = gcd(a, N) and assume that ax ≡ ay (mod N). Then by definition there exists k ∈ Z such that
$$a(x - y) = kN.$$
This implies that
$$\frac{a}{d}(x - y) = k\frac{N}{d}.$$
We conclude that $\frac{N}{d} \mid \frac{a}{d}(x - y)$. By Theorem 3.11 (2), we have $\gcd\left(\frac{a}{d}, \frac{N}{d}\right) = 1$, and hence by Theorem 3.10 we conclude that
$$\frac{N}{d} \mid (x - y),$$
or in other words
$$x \equiv y \pmod{\frac{N}{d}},$$
as claimed.
For the reverse direction, suppose that
$$x \equiv y \pmod{\frac{N}{d}}.$$
Then there exists k ∈ Z for which
$$x - y = k\frac{N}{d}.$$
Multiplying by a and noting that d | a yields
$$ax - ay = ka\frac{N}{d} = k\frac{a}{d}N.$$
Hence N | ax − ay, or in other words
ax ≡ ay (mod N).
Part (2) is the special case d = 1 and part (3) is the special case d = a. □
Theorem 4.6.
(1) If a ≡ b (mod N) and d | N, then a ≡ b (mod d).
(2) If $f(x) = c_0 + c_1 x + \cdots + c_m x^m$ is a polynomial with coefficients $c_\nu \in \mathbb{Z}$ and a, b ∈ Z with a ≡ b (mod N), then
f(a) ≡ f(b) (mod N).
(3) For $N_1, \ldots, N_r \in \mathbb{N}$, the congruences
$$a \equiv b \pmod{N_1}, \quad \ldots, \quad a \equiv b \pmod{N_r}$$
hold simultaneously if and only if $a \equiv b \pmod{\operatorname{lcm}(N_1, \ldots, N_r)}$.
(4) If a ≡ b (mod N), then gcd(N, a) = gcd(N, b).
Proof.
(1) Writing N = dN ′ and a − b = kN , we have
a − b = kN = kN ′ d,
so that d | a − b or in other words a ≡ b (mod d) by definition.
(2) We prove the claim by induction on the degree of f. Clearly for degree 0 the claim holds because $f(a) = c_0 = f(b)$, independent of a and b. Now suppose that f has degree m > 0 and write
$$g(x) := f(x) - c_m x^m,$$
which is a polynomial of degree ≤ m − 1. By the inductive hypothesis, g(a) ≡ g(b) (mod N). Rearranging, we have
$$f(x) = g(x) + c_m x^m,$$
so that
$$f(a) = g(a) + c_m a^m.$$
By Theorem 4.1 (4), since g(a) ≡ g(b) (mod N), we have
$$f(a) \equiv g(b) + c_m a^m \pmod{N}.$$
Moreover, since a ≡ b (mod N), Theorem 4.1 (5) (used repeatedly, i.e., by induction) implies that
$$c_m a^m \equiv c_m b^m \pmod{N}.$$
Again using Theorem 4.1 (4), we conclude that
$$f(a) \equiv g(b) + c_m b^m = f(b) \pmod{N}.$$
(3) Suppose first that a ≡ b (mod Nj ) for every 1 ≤ j ≤ r. Then a − b is a common
multiple of N1 , . . . , Nr and thus is a multiple of lcm(N1 , . . . , Nr ) by definition.
We conclude that
a≡b (mod lcm(N1 , . . . , Nr )).
Conversely, assume that a ≡ b (mod lcm(N1 , . . . , Nr )). By part (1), since Nj
divides lcm(N1 , . . . , Nr ), we conclude that
a ≡ b (mod Nj )
for every j = 1, . . . , r.
(4) Suppose that a ≡ b (mod N ) and set d := gcd(a, N ). Then there exists k ∈ Z
for which a − b = kN , and hence b = a − kN . By Theorem 3.4 (3), it follows
that d | b and d is a common divisor of b and N . Thus Theorem 3.10 implies
that d | gcd(b, N ).
Reversing the roles of a and b yields
gcd(b, N ) | d,
and hence gcd(a, N ) = gcd(b, N ) by Theorem 3.4 (8).

Definition. We call the residue class $\overline{a}$ a primitive residue class modulo N if the integer a is relatively prime to N. This is well-defined because gcd(a + rN, N) = gcd(a, N) by Theorem 3.11 (3).
Let φ(N ) denote the number of primitive residue classes modulo N ; we call φ the
Euler phi-function (it is also sometimes denoted ϕ).

Example 4.7. For N = 6, the only primitive residue classes are $\overline{1}$ and $\overline{5}$, so we conclude that φ(6) = 2.
For N = 12, the integers 1, 5, 7, 11 give representatives of the primitive residue
classes, from which we conclude that φ(12) = 4.

A set $\{a_1, \ldots, a_r\} \subset \mathbb{Z}$ with $\gcd(a_j, N) = 1$ is called a reduced residue system modulo N if for every a ∈ Z with gcd(a, N) = 1 there exists exactly one 1 ≤ j ≤ r with $a \equiv a_j \pmod{N}$.

Example 4.8. The set {1, 5, 7, 11} is a reduced residue system modulo 12.
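Since φ(N) is defined by counting, it can be computed directly for small N. A minimal Python sketch (the helper names `reduced_residue_system` and `phi` are our own) lists the least positive representatives of the primitive residue classes and counts them:

```python
# Illustrative sketch; helper names are our own.
from math import gcd

def reduced_residue_system(N):
    """Least positive representatives of the primitive residue classes mod N."""
    return [a for a in range(1, N + 1) if gcd(a, N) == 1]

def phi(N):
    """Euler's phi-function, computed straight from the definition."""
    return len(reduced_residue_system(N))

print(reduced_residue_system(12))   # [1, 5, 7, 11]
print(phi(6), phi(12))              # 2 4
```

Counting costs about N gcd computations, so this is only practical for small N; a formula for φ(N) appears later in the notes.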
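Remark 4.9. A set $\{a_1, \ldots, a_r\}$ of integers with $\gcd(a_j, N) = 1$ and $a_j \not\equiv a_k \pmod{N}$ for every pair j ≠ k is a reduced residue system if and only if r = φ(N). Indeed, the classes $\overline{a_1}, \ldots, \overline{a_r}$ are r distinct primitive residue classes, and every a ∈ Z with gcd(a, N) = 1 lies in one of the φ(N) primitive residue classes; hence every such a is congruent to some $a_j$ if and only if r = φ(N).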
Theorem 4.10.
(1) Suppose that N ∈ N and c ∈ Z are relatively prime. Then {a1 , . . . , ar } is
a reduced residue system modulo N if and only if {ca1 , . . . , car } is a reduced
residue system modulo N .
(2) The primitive residue classes form a group with respect to multiplication (that
is to say, the primitive residue classes are closed under multiplication and every
element has a multiplicative inverse). These are in one-to-one correspondence
with the group of units (Z/NZ)× of Z/NZ.
Proof.
(1) Since gcd(c, N ) = 1 = gcd(aj , N ), Theorem 3.11 (4) implies that gcd(caj , N ) = 1
as well. The set {ca1 , . . . , car } must have size φ(N ), so it remains to show that
each of the caj is in a different congruence class. Now suppose that
caj ≡ cak (mod N ),
or in other words
N | c(aj − ak ).
By Theorem 4.5 (2) and gcd(c, N) = 1, we conclude that $N \mid (a_j - a_k)$, or in other words $a_j \equiv a_k \pmod{N}$. Since $\{a_1, \ldots, a_r\}$ is a reduced residue system, the index j′ with $a_j \equiv a_{j'} \pmod{N}$ is unique (namely j′ = j), which implies that k = j. Thus $ca_j \equiv ca_k \pmod{N}$ if and only if j = k. Since both sets have size φ(N), we conclude that $\{ca_j : j \in \{1, \ldots, r\}\}$ is a reduced residue system.
For the converse, suppose that $ca_1, \ldots, ca_r$ form a reduced residue system. Since $\gcd(ca_j, N) = 1$ for each j, we have $\gcd(a_j, N) = 1$, so $\{a_1, \ldots, a_r\}$ is a set of integers which are relatively prime to N and there are φ(N) of them. It remains to show that no two of them are in the same congruence class. However, if $a_j \equiv a_k \pmod{N}$, then $ca_j \equiv ca_k \pmod{N}$, which implies that j = k because $\{ca_1, \ldots, ca_r\}$ is a reduced residue system. This implies the claim.
(2) If gcd(b, N ) = gcd(c, N ) = 1, then by Theorem 3.11 (4), we have gcd(bc, N ) = 1.
Next note that by part (1), if a1 , a2 , . . . , ar are a reduced residue system, then
so are a1 c, a2 c, . . . , ar c. In particular, there exists j for which
aj c ≡ 1 (mod N ).
Thus $\overline{c}$ is a unit in Z/NZ. We conclude that the primitive residue classes form a subgroup of (Z/NZ)^×. On the other hand, every unit in Z/NZ is a primitive residue class modulo N: if bc ≡ 1 (mod N), then Theorem 4.6 (4) gives gcd(bc, N) = gcd(1, N) = 1, and since gcd(b, N) divides both bc and N, it divides gcd(bc, N) = 1, so gcd(b, N) = 1. This completes the proof.

Theorem 4.11.
(1) If G is a finite abelian (commutative) group of size m (also called the order of the group) with identity element e, then $a^m = e$ for all a ∈ G.
(2) Euler’s Totient Theorem
For all N ∈ N and a ∈ Z with gcd(a, N) = 1, we have
$$a^{\varphi(N)} \equiv 1 \pmod{N}.$$
(3) Fermat’s Little Theorem
If p is prime, then $a^{p-1} \equiv 1 \pmod{p}$ for all a ∈ Z with p ∤ a, and $a^p \equiv a \pmod{p}$ for all a ∈ Z.

Proof.
(1) Suppose that $a_1 = e, a_2, \ldots, a_m$ are all elements of G and let a ∈ G be arbitrary. Since G is a group, a is invertible, and hence x ↦ xa is a bijection from G to itself. In other words
$$\{a_1, \ldots, a_m\} = \{aa_1, \ldots, aa_m\}. \tag{4.1}$$
We next set $g := a_1 a_2 \cdots a_m$. From (4.1) and the fact that G is abelian, we see that
$$g = (aa_1) \cdot (aa_2) \cdots (aa_m) = a^m (a_1 \cdots a_m) = a^m g.$$
Therefore we conclude that $a^m g = g$. Since g is invertible, this implies that $a^m = e$.
(2) This follows from part (1) with G = (Z/NZ)^×, which is abelian and, by Theorem 4.10 (2), has φ(N) elements.
(3) Since φ(p) = p − 1, part (2) gives $a^{p-1} \equiv 1 \pmod{p}$ whenever p ∤ a; multiplying by a yields $a^p \equiv a \pmod{p}$, and if p | a then both sides are ≡ 0 (mod p).
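Parts (2) and (3) are easy to test numerically. The sketch below (hypothetical helper names; it assumes N ≥ 2 and uses Python's built-in three-argument `pow` for modular exponentiation) takes G = (Z/NZ)^× as in the proof of part (2):

```python
# Illustrative sketch; helper names are our own.
from math import gcd

def units(N):
    """Representatives 1 <= a < N of the primitive residue classes mod N."""
    return [a for a in range(1, N) if gcd(a, N) == 1]

def euler_holds(N):
    """Theorem 4.11 (2): a^phi(N) ≡ 1 (mod N) for every unit a."""
    G = units(N)
    order = len(G)                      # |G| = phi(N) by Theorem 4.10 (2)
    return all(pow(a, order, N) == 1 for a in G)

def fermat_holds(p):
    """Theorem 4.11 (3): a^p ≡ a (mod p) for every residue a mod p."""
    return all(pow(a, p, p) == a for a in range(p))
```

Here `euler_holds(N)` is True for every N ≥ 2 and `fermat_holds(p)` for every prime p. Note that `fermat_holds(4)` is False, while `fermat_holds(561)` is True even though 561 = 3 · 11 · 17 is composite, a phenomenon studied in Section 5.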

We next give an alternative proof of Fermat’s Little Theorem. We would like to prove that $a^p \equiv a \pmod{p}$. The claim is clear for p | a. Moreover, $a^p \equiv (a + p)^p \pmod{p}$ by Theorem 4.1, so to prove the claim for all a ∈ Z it suffices to show it for 0 ≤ a < p; we proceed by induction on $a \in \mathbb{N}_0$, the case a = 0 being trivial. We may then use the binomial formula to compute
$$(a + 1)^p = \sum_{j=0}^{p} \binom{p}{j} a^j.$$
We next claim that for 1 ≤ j ≤ p − 1, we have
$$p \mid \binom{p}{j}.$$
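Proof. We have that
$$\binom{p}{j} j! = p(p - 1) \cdots (p - j + 1),$$
which is divisible by p because j > 0. Thus
$$p \mid j! \binom{p}{j}.$$
Since p is prime, either p | j! or $p \mid \binom{p}{j}$. However, since j < p, we have p ∤ j! (all of the primes dividing j! are at most j, by writing $j! = \prod_{\ell=1}^{j} \ell$ and then using the Fundamental Theorem of Arithmetic (Theorem 3.13)). Hence $p \mid \binom{p}{j}$. □
Since every term with 1 ≤ j ≤ p − 1 in the binomial expansion therefore vanishes modulo p, we conclude that
$$(a + 1)^p \equiv 1 + a^p \pmod{p}. \tag{4.2}$$
By the inductive hypothesis, we have $a^p \equiv a \pmod{p}$, so that (4.2) implies that
$$(a + 1)^p \equiv 1 + a \pmod{p}.$$
This completes the induction. □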


Example 4.12. What are the last 3 digits in the decimal representation of $9^{9^9}$? In other words, we need to compute $9^{9^9}$ modulo 1000. Since
$$9^{\varphi(1000)} \equiv 1 \pmod{1000},$$
we first compute $9^9 \pmod{\varphi(1000)}$. A calculation shows that φ(1000) = 400 (we will later compute a formula for this). Thus
$$9^9 = (80 + 1)^4 \cdot 9 = \left(80^4 + 4 \cdot 80^3 + 6 \cdot 80^2 + 4 \cdot 80 + 1\right) \cdot 9 \equiv -79 \cdot 9 \equiv 89 \pmod{400}$$
$$\Rightarrow \quad 9^{9^9} \equiv 9^{89} = (10 - 1)^{89} = \sum_{j=0}^{89} \binom{89}{j} 10^j (-1)^{89-j}$$
$$\equiv -1 + 890 - 100\binom{89}{2} \equiv -1 - 110 - 100 \cdot 3916 \equiv -111 - 600 \equiv 289 \pmod{1000}.$$
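The computation can be double-checked in Python. The built-in three-argument `pow(base, exp, mod)` uses repeated squaring, so it handles the exponent $9^9 = 387420489$ directly; the sketch compares that with the reduction of the exponent modulo φ(1000) and with the binomial shortcut used above:

```python
# Illustrative check of Example 4.12; variable names are our own.
from math import comb, gcd

phi_1000 = sum(1 for a in range(1, 1001) if gcd(a, 1000) == 1)   # phi(1000) by counting
exponent_mod = pow(9, 9, phi_1000)                 # 9^9 mod 400
via_euler = pow(9, exponent_mod, 1000)             # 9^89 mod 1000
direct = pow(9, 9**9, 1000)                        # 9^(9^9) mod 1000
via_binomial = (-1 + 890 - 100 * comb(89, 2)) % 1000
print(phi_1000, exponent_mod, via_euler, direct, via_binomial)  # 400 89 289 289 289
```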

It turns out that Theorem 4.11 (1) holds in more generality when G is not neces-
sarily abelian.

Theorem 4.13 (Lagrange). If G is a finite group of size m and e is the identity element of the group, then for every element a ∈ G there exists a smallest k ∈ N with $a^k = e$, and this k satisfies k | m. In particular, $a^m = e$ for all a ∈ G.

Remark 4.14. We call k the order of the element a in G.


Proof. Let a ∈ G be given. Since G is finite, the elements $a, a^2, a^3, \ldots$ are not all distinct. Suppose that $a^r = a^s$ for some r, s ∈ N with r < s. Then
$$a^{s-r} a^r = a^s = a^r,$$
so (since $a^r$ is invertible) $a^{s-r} = e$. We have hence proven the existence of k ∈ N for which $a^k = e$, and we may choose k smallest.
Furthermore, the elements of the set
$$S := \{e, a, \ldots, a^{k-1}\} \subseteq G$$
must all be distinct, since otherwise $a^r = a^s$ for some 0 ≤ r < s < k, and from this we conclude that $a^{s-r} = e$ with 0 < s − r < k, contradicting the minimality of k.
If S = G, then k = m, and hence k | m. On the other hand, if S ≠ G, then there exists $b_2 \in G$ with $b_2 \notin S$ (note that $b_2 \neq a^j$ for any j ∈ Z, since if j′ ≡ j (mod k), then $a^{j'} = a^j$).
The set
$$S_2 := \{b_2, b_2 a, \ldots, b_2 a^{k-1}\}$$
is disjoint from S because $b_2 a^r = a^\ell$ implies that $b_2 = a^{\ell - r}$. Moreover, if $b_2 a^r = b_2 a^s$, then $a^{s-r} = e$, so all of the elements $b_2 a^\ell$ with 0 ≤ ℓ < k are distinct and the size of $S_2$ is also k.
We then again note that if $S \cup S_2 \neq G$, then there exists $b_3 \in G$ with $b_3 \notin S \cup S_2$, and we iteratively construct $b_1 = e, b_2, \ldots, b_t \in G$ (there are only finitely many $b_j$ because G is finite) and
$$S_j := \{b_j, b_j a, \ldots, b_j a^{k-1}\}$$
for which
$$G = \bigcup_{j=1}^{t} S_j$$
is a disjoint union. Since this is a disjoint union, the size of G is
$$\#G = \sum_{j=1}^{t} \#S_j = \sum_{j=1}^{t} k = tk.$$
Thus m = tk, and k | m. Finally, for all a ∈ G we have
$$a^m = a^{kt} = \left(a^k\right)^t = e^t = e.$$ □
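For the groups (Z/NZ)^× one can observe Lagrange's Theorem directly. In this sketch (hypothetical helper names, assuming N ≥ 2), `element_order` finds the smallest k with $a^k \equiv 1 \pmod{N}$ and `check_lagrange` confirms that every such k divides the group order φ(N):

```python
# Illustrative sketch; helper names are our own.
from math import gcd

def element_order(a, N):
    """Smallest k >= 1 with a^k ≡ 1 (mod N); assumes N >= 2 and gcd(a, N) = 1."""
    k, power = 1, a % N
    while power != 1:
        power = power * a % N
        k += 1
    return k

def check_lagrange(N):
    """True if the order of every unit mod N divides #(Z/NZ)^x = phi(N)."""
    units = [a for a in range(1, N) if gcd(a, N) == 1]
    m = len(units)
    return all(m % element_order(a, N) == 0 for a in units)

print([element_order(a, 7) for a in range(1, 7)])   # [1, 3, 6, 3, 6, 2]
```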

Theorem 4.15. Let N, a, b ∈ Z with N > 0 and gcd(a, N ) = 1. Then the congru-
ence
ax ≡ b (mod N )
has precisely one solution modulo N . In other words, there is a solution x0 ∈ Z to
this congruence and the set of all solutions in Z is
{x0 + tN : t ∈ Z} .
Proof. Set
$$x_0 = a^{\varphi(N) - 1} b \in \mathbb{Z}.$$
Then by Euler’s Theorem (Theorem 4.11 (2)) and Theorem 4.1 (5), we have
$$a x_0 = a^{\varphi(N)} b \equiv 1 \cdot b = b \pmod{N}.$$
Therefore $x_0$ is a solution to the congruence.
Now suppose that x is another solution. Then we have (using Theorem 4.1)
$$a(x - x_0) = ax - ax_0 \equiv b - b = 0 \pmod{N}.$$
Since gcd(N, a) = 1, Theorem 4.5 (2) implies that $x \equiv x_0 \pmod{N}$. □
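The proof of Theorem 4.15 gives an explicit formula, $x_0 = a^{\varphi(N)-1} b$. A minimal Python sketch of that formula follows (the names `phi` and `solve_linear_congruence` are our own; φ is counted directly, so this is meant for small N only):

```python
# Illustrative sketch of the formula in the proof of Theorem 4.15; names are our own.
from math import gcd

def phi(N):
    """Euler's phi-function by direct count."""
    return sum(1 for a in range(1, N + 1) if gcd(a, N) == 1)

def solve_linear_congruence(a, b, N):
    """The unique x in {0, ..., N-1} with a*x ≡ b (mod N), assuming gcd(a, N) = 1.

    Uses x0 = a^(phi(N) - 1) * b from the proof of Theorem 4.15.
    """
    if gcd(a, N) != 1:
        raise ValueError("Theorem 4.15 requires gcd(a, N) = 1")
    return pow(a, phi(N) - 1, N) * b % N

print(solve_linear_congruence(7, 3, 10))   # 9, since 7 * 9 = 63 ≡ 3 (mod 10)
```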
Theorem 4.16 (Wilson’s Theorem). An integer m > 1 is prime if and only if
(m − 1)! ≡ −1 (mod m).
Proof. Suppose first that m is not prime and write m = ab with 1 < a, b < m. Since a ≤ m − 1, we have a | (m − 1)!, and hence gcd((m − 1)!, m) ≥ a > 1. Since gcd(m, −1) = 1, Theorem 4.6 (4) implies that (m − 1)! ≢ −1 (mod m).
Now assume that m is prime. We check the claim directly for m = 2 and m = 3 by plugging in. For p = m ≥ 5 prime, Theorem 4.15 implies that for each a ∈ {1, ..., p − 1} there exists a unique $x = x_a \in \{1, 2, \ldots, p - 1\}$ with
$$a x_a \equiv 1 \pmod{p}.$$
Moreover, uniqueness and the fact that $x_{x_a} = a$ imply that for a ≠ b, we have $x_a \neq x_b$. We may pair together each a with $x_a$ unless $x_a = a$ (if $x_a = a$, then a gets paired with itself). Note that $x_a = a$ if and only if
$$a^2 \equiv 1 \pmod{p}.$$
Rewriting this as $a^2 - 1 \equiv 0 \pmod{p}$ and then factoring
$$(a - 1)(a + 1) = a^2 - 1 \equiv 0 \pmod{p},$$
we have p | (a − 1)(a + 1). Since p is prime, Theorem 3.12 implies that p | a − 1 or p | a + 1. For 2 ≤ a ≤ p − 2, we see that p ∤ a − 1 and p ∤ a + 1, and thus the only solutions are a = 1 and a = p − 1. We then write
$$(p - 1)! = 1 \cdot (p - 1) \cdot \prod_{a=2}^{p-2} a = 1 \cdot (p - 1) \cdot \prod_{\text{pairs } (a, x_a)} a x_a.$$
Since $a x_a \equiv 1 \pmod{p}$, Theorem 4.1 (5) implies that
$$(p - 1)! = 1 \cdot (p - 1) \cdot \prod_{\text{pairs } (a, x_a)} a x_a \equiv 1 \cdot (-1) \cdot 1 = -1 \pmod{p}.$$ □
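Wilson's Theorem yields a (very slow) primality test. The sketch below (hypothetical helper names) compares it against trial division for small m; since (m − 1)! grows rapidly, this is purely illustrative:

```python
# Illustrative sketch; helper names are our own.
from math import factorial

def is_prime_trial(m):
    """Trial division, used only as an independent cross-check."""
    if m < 2:
        return False
    d = 2
    while d * d <= m:
        if m % d == 0:
            return False
        d += 1
    return True

def wilson_test(m):
    """Theorem 4.16: for m > 1, m is prime iff (m - 1)! ≡ -1 (mod m)."""
    return factorial(m - 1) % m == m - 1

print([m for m in range(2, 30) if wilson_test(m)])  # [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```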

5. Primality tests and cryptography
The basic idea of (modern) cryptography is to find what is called a one-way
function. A one-way function is “easy to compute in one direction”, but hard “in
reverse”. The idea of one kind of cryptography (known as RSA, after the inventors Rivest, Shamir, and Adleman) is based on the fact that if $N = p_1 p_2$ with $p_1$ and $p_2$ “large” primes, then it is easy to compute N if you know $p_1$ and $p_2$ (you just multiply), but it is very hard to find $p_1$ and $p_2$ (these are unique by Theorem 3.13) if you only know N. It is of course possible to find them by going through all primes up to N (actually √N is enough) and checking whether they divide N. But if $p_1$ and $p_2$ are large, then this will take a very long time. This has led many people to try to develop ways to quickly determine whether a number is prime or not. This is called primality testing.
In order to search for primes, one needs to understand some properties satisfied by primes. Recall that by Fermat’s Little Theorem (Theorem 4.11), for p prime and b ∈ Z we have
$$b^p \equiv b \pmod{p}.$$
Is this a property unique to primes?
Definition. For integers n > 1 and b > 1, we call n a pseudoprime to the base b (or Fermat pseudoprime) if n is not prime but
$$b^n \equiv b \pmod{n}.$$
If b = 2, then this is sometimes abbreviated by simply saying that n is a pseudoprime.
Example 5.1. All pseudoprimes (to base 2) under 2000 are 341, 561, 645, 1105, 1387,
1729, and 1905. Lehmer (1950) found the first even pseudoprime (namely 161038)
and Beeger (1951) proved that there are infinitely many even pseudoprimes (we will
show this later).
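The list in Example 5.1 can be reproduced by brute force. The sketch below (hypothetical helper names; trial division is used to recognize primes) applies the definition with b = 2 literally:

```python
# Illustrative sketch; helper names are our own.
def is_prime(n):
    """Trial division; fine for the small range used here."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

def is_pseudoprime(n, b=2):
    """Definition above: n is not prime but b^n ≡ b (mod n)."""
    return n > 1 and not is_prime(n) and pow(b, n, n) == b % n

print([n for n in range(2, 2000) if is_pseudoprime(n)])
# [341, 561, 645, 1105, 1387, 1729, 1905]
```

It also confirms Lehmer's even example: `is_pseudoprime(161038)` returns True.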
Definition. If n is a pseudoprime to every base, then we call n a Carmichael number.
By choosing the base b = n − 1 ≡ −1 (mod n), we see that $(-1)^n \equiv -1 \pmod{n}$. If n were even, this would read 1 ≡ −1 (mod n), i.e., n | 2, which is impossible for composite n. Thus every Carmichael number is odd. The following gives a way to find some Carmichael numbers.
Theorem 5.2. Suppose that n = p1 · · · pr , with r ≥ 2 and distinct odd primes
p1 , . . . , pr . If for every j = 1, . . . , r we have φ(pj ) = pj − 1 dividing n − 1, then n is
a Carmichael number.
Proof. Suppose that n has a representation as given in the statement of the theorem.
Then for each j there exists by assumption $k_j \in \mathbb{N}$ with $n - 1 = (p_j - 1)k_j$. By Fermat’s Little Theorem, for every a ∈ Z with $p_j \nmid a$ we have
$$a^{n-1} = \left(a^{p_j - 1}\right)^{k_j} \equiv 1^{k_j} \equiv 1 \pmod{p_j}.$$
It follows that $a^n \equiv a \pmod{p_j}$ for all a ∈ Z and all j = 1, ..., r (if $p_j \mid a$, then both sides are ≡ 0 (mod $p_j$)). By Theorem 4.6 (3), we have
$$a^n \equiv a \pmod{\operatorname{lcm}(p_1, \ldots, p_r)}.$$
Since $p_1, \ldots, p_r$ are distinct primes, we have $\operatorname{lcm}(p_1, \ldots, p_r) = p_1 p_2 \cdots p_r = n$. It follows that n is a Carmichael number. □

Remark 5.3. There are three Carmichael numbers less than 2000. Namely, they are
561 = 3 · 11 · 17,
1105 = 5 · 13 · 17,
1729 = 7 · 13 · 19.
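Remark 5.3 can be checked in two independent ways. The sketch below (hypothetical helper names) tests the sufficient condition of Theorem 5.2 and, separately, the definition of a Carmichael number by brute force over all residues b (mod n), which suffices by Theorem 4.6 (2). The converse of Theorem 5.2 (Korselt's criterion) is not proved in these notes, so the agreement below is only an observation for n < 2000:

```python
# Illustrative sketch; helper names are our own.
def prime_factors(n):
    """Prime factorization of n as a list of primes with multiplicity (trial division)."""
    factors, d = [], 2
    while d * d <= n:
        while n % d == 0:
            factors.append(d)
            n //= d
        d += 1
    if n > 1:
        factors.append(n)
    return factors

def satisfies_theorem_5_2(n):
    """n = p_1...p_r, r >= 2 distinct odd primes, and (p_j - 1) | (n - 1) for all j."""
    ps = prime_factors(n)
    return (len(ps) >= 2 and len(set(ps)) == len(ps)
            and all(p % 2 == 1 and (n - 1) % (p - 1) == 0 for p in ps))

def is_carmichael_bruteforce(n):
    """Composite n with b^n ≡ b (mod n) for every residue b mod n."""
    return len(prime_factors(n)) >= 2 and all(pow(b, n, n) == b for b in range(n))

print([n for n in range(2, 2000) if satisfies_theorem_5_2(n)])      # [561, 1105, 1729]
print([n for n in range(2, 2000) if is_carmichael_bruteforce(n)])   # [561, 1105, 1729]
```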

Questions.
(1) In the 17th century, Mersenne investigated the question of for which numbers n the number $2^n - 1$ is prime.
It turns out that numbers of this type retain some sort of pseudoprime property. For a > 1 and b > 1, we have
$$x^{ab} - 1 = \left(x^a - 1\right)\left(x^{a(b-1)} + x^{a(b-2)} + \cdots + x^a + 1\right), \tag{5.1}$$
and hence $2^{ab} - 1$ is never prime. If p is prime, then we call $M_p = 2^p - 1$ a Mersenne number, and if $M_p$ is itself prime, then we call $M_p$ a Mersenne prime. The largest known prime (as of January 2016) is the Mersenne prime $2^{74207281} - 1$.

Example 5.4. The Mersenne numbers $M_2, M_3, M_5, M_7$ are prime, but $M_{11} = 2047 = 23 \cdot 89$ is not prime.

People naturally look for patterns, and hence it is somewhat natural to look for primes of a certain “type” (following a general pattern). Suppose that you are interested in finding all primes that are of the form $a^n - 1$. It turns out that if $a^n - 1$ is prime for a, n ∈ N with n > 1, then one can show that it must be the case that a = 2. We have seen above that n must then also be prime. This is one reason why the search for Mersenne primes is natural.
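For small exponents one can search for Mersenne primes directly and check the factorization (5.1) numerically. This sketch (hypothetical helper names; trial division, so only suitable for small p) confirms Example 5.4 and lists the prime exponents p < 32 for which $M_p$ is prime:

```python
# Illustrative sketch; helper names are our own.
def is_prime(n):
    """Trial division (slow, but quick enough for the numbers below)."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

def mersenne(p):
    return 2**p - 1

print(mersenne(11), mersenne(11) == 23 * 89)    # 2047 True
print([p for p in range(2, 32) if is_prime(p) and is_prime(mersenne(p))])
# [2, 3, 5, 7, 13, 17, 19, 31]

# (5.1) with x = 2: 2^a - 1 divides 2^(ab) - 1
assert all((2**(a * b) - 1) % (2**a - 1) == 0 for a in range(2, 10) for b in range(2, 10))
```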

Theorem 5.5.
(1) If n ∈ N with $2^n \equiv 2 \pmod{n}$, then $2^{M_n} \equiv 2 \pmod{M_n}$ also holds, where $M_n := 2^n - 1$.
(2) If p is prime, then $M_p$ is either prime or a pseudoprime.
(3) If n is a pseudoprime, then $M_n$ is also a pseudoprime.
(4) There are infinitely many pseudoprimes (to the base 2).

Proof.
(1) This holds for n = 1 because $M_1 = 1$.
Now suppose that n > 1 and $2^n \equiv 2 \pmod{n}$. Since $2^n > 2$, we may choose k ∈ N such that
$$2^n = 2 + kn.$$
Then $2^{M_n} = 2^{2^n - 1} = 2 \cdot 2^{kn}$. Therefore, using (5.1),
$$2^{M_n} - 2 = 2\left(2^{kn} - 1\right) = 2\left(2^n - 1\right)\left(2^{(k-1)n} + 2^{(k-2)n} + \cdots + 2^n + 1\right)$$
$$= 2M_n\left(2^{(k-1)n} + \cdots + 2^n + 1\right) \equiv 0 \pmod{M_n}.$$
(2) Since p is prime, $2^p \equiv 2 \pmod{p}$ by Fermat’s Little Theorem (Theorem 4.11). By part (1), $M_p$ is either prime or a pseudoprime.
(3) From part (1), $M_n$ is either prime or a pseudoprime. However, (5.1) implies that $M_n$ is not prime because n is not prime by assumption.
(4) Since 341 is a pseudoprime, part (3) yields an infinite sequence of pseudoprimes
$$M_{341}, \quad M_{M_{341}}, \quad M_{M_{M_{341}}}, \quad \ldots,$$
which are distinct because $M_n > n$ for n ≥ 2.

The study of Mersenne numbers has another interesting application.
Definition. A number n ∈ N is called a perfect number if n is equal to the sum of its positive divisors d | n with d < n. In other words, n is perfect if and only if σ(n) = 2n, where
$$\sigma(n) = \sum_{\substack{d \mid n \\ d > 0}} d.$$

Example 5.6. The numbers 6 and 28 are perfect because


6 = 1 + 2 + 3,
28 = 1 + 2 + 4 + 7 + 14.
Theorem 5.7.
(1) If p is a prime of the form $p = 2^N - 1$ with N ∈ N, then $n = 2^{N-1} p$ is a perfect number.
(2) Conversely, if n is an even perfect number, then $n = 2^{N-1}(2^N - 1)$ with N ∈ N, where $2^N - 1$ is prime.
Proof.
(1) Set $n = 2^{N-1}(2^N - 1)$ with $p = 2^N - 1$ prime. Then the positive divisors of n are $1, 2, 2^2, \ldots, 2^{N-1}, p, 2p, \ldots, 2^{N-1} p$. Therefore
$$\sigma(n) = (1 + p)\left(1 + 2 + 2^2 + \cdots + 2^{N-1}\right) = 2^N\left(2^N - 1\right) = 2n.$$
We conclude that n is a perfect number.
(2) Let an even perfect number n be given. Then σ(n) = 2n and $n = 2^s m$ for some s ≥ 1 and odd m ≥ 1. We compute
$$\sigma(2^s) = 1 + 2 + \cdots + 2^{s-1} + 2^s = 2^{s+1} - 1 < 2 \cdot 2^s.$$
Therefore $2^s$ is not perfect and we conclude that m ≥ 3.
Now suppose that $c = c_1 c_2$ with $c_1, c_2 \in \mathbb{N}$ relatively prime. Every divisor d of c can be uniquely written as a product $d = d_1 d_2$ with $d_1 \mid c_1$ and $d_2 \mid c_2$. We conclude that
$$\sigma(c) = \sum_{d \mid c} d = \sum_{\substack{d_1 \mid c_1 \\ d_2 \mid c_2}} d_1 d_2 = \Bigg(\sum_{d_1 \mid c_1} d_1\Bigg)\Bigg(\sum_{d_2 \mid c_2} d_2\Bigg) = \sigma(c_1)\sigma(c_2).$$
Therefore we conclude that if $c_1$ and $c_2$ are relatively prime, then
$$\sigma(c_1 c_2) = \sigma(c_1)\sigma(c_2). \tag{5.2}$$
Since $2^s$ and m are relatively prime and n is perfect, we have
$$2^{s+1} m = 2n = \sigma(n) = \sigma(2^s m) = \sigma(2^s)\sigma(m) = \left(2^{s+1} - 1\right)\sigma(m).$$
Thus
$$\left(2^{s+1} - 1\right)\sigma(m) = 2^{s+1} m.$$
Now write d = gcd(m, σ(m)). Since $\gcd(2^{s+1}, 2^{s+1} - 1) = 1$, it follows that
$$\sigma(m) = 2^{s+1} d \quad \text{and} \quad m = \left(2^{s+1} - 1\right) d.$$
Since s ≥ 1, we have d ≠ m. If $d = 2^{s+1} - 1$, then $1, d, d^2 = m$ are distinct divisors of m, so
$$\sigma(m) \geq 1 + d + d^2 > d(1 + d) = d \cdot 2^{s+1} = \sigma(m),$$
which is a contradiction. Thus $d \neq 2^{s+1} - 1$. If d > 1, then 1, d, m, and $2^{s+1} - 1$ are distinct divisors of m. It follows that
$$\sigma(m) \geq 1 + d + m + 2^{s+1} - 1 = d + \left(2^{s+1} - 1\right)d + 2^{s+1} = 2^{s+1}(d + 1) > 2^{s+1} d = \sigma(m),$$
which is again a contradiction. It follows that d = 1, so $\sigma(m) = 2^{s+1}$ and $m = 2^{s+1} - 1$. Therefore $n = 2^s\left(2^{s+1} - 1\right)$.
It remains to show that m is prime. If m were not prime, then m would have a divisor p with 1 < p < m, and it would follow that
$$\sigma(m) \geq 1 + p + m > m + 1 = 2^{s+1}.$$
Recalling that $\sigma(m) = 2^{s+1} d$ and that we have shown d = 1, the inequality $\sigma(m) > 2^{s+1}$ leads to a contradiction, and we conclude that $m = 2^{s+1} - 1$ is prime. □

Examples 5.8.
$$6 = \left(2^2 - 1\right) \cdot 2,$$
$$28 = \left(2^3 - 1\right) \cdot 2^2,$$
$$496 = 31 \cdot 16 = \left(2^5 - 1\right) \cdot 2^4.$$
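A direct search reproduces these examples. In this sketch (the helper `sigma` is our own and pairs each divisor d ≤ √n with n/d), the perfect numbers below 10 000 are listed and compared with the numbers $2^{N-1}(2^N - 1)$ from Theorem 5.7 (1):

```python
# Illustrative sketch; the helper name is our own.
def sigma(n):
    """Sum of the positive divisors of n, pairing each d <= sqrt(n) with n // d."""
    total, d = 0, 1
    while d * d <= n:
        if n % d == 0:
            total += d
            if d != n // d:
                total += n // d
        d += 1
    return total

perfect = [n for n in range(2, 10_000) if sigma(n) == 2 * n]
euclid = [2**(N - 1) * (2**N - 1) for N in (2, 3, 5, 7)]   # 2^N - 1 = 3, 7, 31, 127 are prime
print(perfect)            # [6, 28, 496, 8128]
print(perfect == euclid)  # True
```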

Question. Are there odd perfect numbers? This is unknown, but it is conjectured
that there are none.

Definition. Other integers whose primality has been thoroughly tested are the Fermat numbers $F_n := 2^{2^n} + 1$.

Remark 5.9. Similarly to looking for primes of the shape $a^n - 1$, the Fermat numbers are a natural testing ground for primes following a pattern. We first note that if m is not a power of 2, then $2^m + 1$ is not prime. Write m = vt with v ∈ N and t ≥ 3 odd. Similarly to (5.1), we have
$$2^m + 1 = 2^{vt} + 1 = \left(2^v + 1\right)\left(2^{v(t-1)} - 2^{v(t-2)} + \cdots - 2^v + 1\right).$$
Note that the above factorization requires t to be odd because for t even we have
$$\left(2^v + 1\right)\left(2^{v(t-1)} - 2^{v(t-2)} + \cdots + 2^v - 1\right) = 2^{vt} - 1.$$


Theorem 5.10. For every n ≥ 0, we have $2^{F_n} \equiv 2 \pmod{F_n}$, and hence $F_n$ is either prime or a pseudoprime.
Proof. From $F_n = 2^{2^n} + 1$, we have $2^{2^n} \equiv -1 \pmod{F_n}$. Thus for every a ≥ 0 we have
$$2^{a \cdot 2^n} \equiv (-1)^a \pmod{F_n}.$$
Writing $k = 2^{2^n - n} \in 2\mathbb{Z}$, we have
$$2^{F_n} = 2^{k \cdot 2^n + 1} = 2 \cdot 2^{k \cdot 2^n}.$$
Since k is even, $(-1)^k = 1$, and hence combining the two formulas above yields
$$2^{F_n} = 2 \cdot 2^{k \cdot 2^n} \equiv 2(-1)^k \equiv 2 \pmod{F_n}.$$ □
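Theorem 5.10 can be observed numerically. The sketch below computes $F_0, \ldots, F_5$ and checks $2^{F_n} \equiv 2 \pmod{F_n}$ with Python's three-argument `pow`; it also confirms Euler's classical factorization $F_5 = 641 \cdot 6700417$, so $F_5$ is a pseudoprime rather than a prime:

```python
# Illustrative check of Theorem 5.10; variable names are our own.
fermat = [2**(2**n) + 1 for n in range(6)]     # F_0, ..., F_5
print(fermat)            # [3, 5, 17, 257, 65537, 4294967297]
print(all(pow(2, F, F) == 2 for F in fermat))  # True (Theorem 5.10)
print(fermat[5] == 641 * 6700417)              # True, so F_5 is composite
```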

We next discuss an idea for checking whether a number is composite (not prime).
Similar to the idea of pseudoprimes, we check a condition which is satisfied by
primes.

Definition. Suppose that $n - 1 = 2^s t$ with t odd and s ≥ 1. Then for b ∈ N, we call n a strong pseudoprime to the base b if either $b^t \equiv 1 \pmod{n}$ or $b^{2^r t} \equiv -1 \pmod{n}$ for some 0 ≤ r < s.
Example 5.11. We check whether the Carmichael number n = 561 = 3 · 11 · 17 is a strong pseudoprime to the base 2. Firstly, we have
$$n - 1 = 2^4 \cdot \underbrace{35}_{=t}.$$
If a ≡ b (mod n) and p | n, then a ≡ b (mod p), so it suffices to show that the required congruences fail modulo some prime divisor of 561. Since 2 ≡ −1 (mod 3), we have (by Theorem 4.1 (5))
$$2^{2^r \cdot 35} \equiv (-1)^{2^r \cdot 35} \pmod{3}.$$
For r ≥ 1 the right-hand side is 1 ≢ −1 (mod 3), and for r = 0 it is −1 ≢ 1 (mod 3). Thus if n is a strong pseudoprime to the base 2, then it must be the case that $2^t \equiv -1 \pmod{n}$. However, since $2^4 \equiv -1 \pmod{17}$, Theorem 4.1 (5) implies that
$$2^{35} = 2^3 \cdot \left(2^4\right)^8 \equiv 8 \pmod{17}.$$
Thus $2^{35} \not\equiv \pm 1 \pmod{17}$, and we conclude that n is not a strong pseudoprime to the base 2.
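The definition of a strong pseudoprime translates directly into code. In this sketch (hypothetical function names; it assumes n ≥ 3 is odd, so that s ≥ 1), we compute $b^t$ and then square repeatedly to obtain $b^{2^r t}$ for r = 1, ..., s − 1:

```python
# Illustrative sketch of the definition; function names are our own.
def decompose(n):
    """Write n - 1 = 2^s * t with t odd; returns (s, t)."""
    s, t = 0, n - 1
    while t % 2 == 0:
        s += 1
        t //= 2
    return s, t

def is_strong_pseudoprime(n, b):
    """True if b^t ≡ 1 or b^(2^r t) ≡ -1 (mod n) for some 0 <= r < s (n odd, n >= 3)."""
    s, t = decompose(n)
    x = pow(b, t, n)
    if x == 1 or x == n - 1:          # b^t ≡ 1, or r = 0 with b^t ≡ -1
        return True
    for _ in range(1, s):             # r = 1, ..., s - 1: square repeatedly
        x = x * x % n
        if x == n - 1:
            return True
    return False
```

Here `is_strong_pseudoprime(561, 2)` returns False, in agreement with Example 5.11, while `is_strong_pseudoprime(2047, 2)` returns True even though 2047 = 23 · 89 (see Example 5.4).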
Note that if n is a strong pseudoprime to the base b and $b^{2^r t} \equiv \pm 1 \pmod{n}$ for some 0 ≤ r < s, then
$$b^{n-1} = b^{2^s t} = \left(b^{2^r t}\right)^{2^{s-r}} \equiv 1 \pmod{n}. \tag{5.3}$$
Hence n is either prime or a pseudoprime to the base b.
Theorem 5.12.
(1) If n ≠ 2 is prime, then n is a strong pseudoprime to every base b with gcd(b, n) = 1.
(2) If n ≥ 3 is odd, squarefree, and a strong pseudoprime to every base b with
gcd(b, n) = 1, then either n is prime or it is a Carmichael number.
Proof. (1) For a prime n ≠ 2 with $n - 1 = 2^s t$ and t odd, we recall that by Fermat’s Little Theorem (Theorem 4.11) we have
$$b^{2^s t} = b^{n-1} \equiv 1 \pmod{n}.$$
In the proof of Wilson’s Theorem (Theorem 4.16), we saw that (since n is prime) $a^2 \equiv 1 \pmod{n}$ holds if and only if a ≡ ±1 (mod n). Since $\left(b^{2^{s-1} t}\right)^2 = b^{2^s t} \equiv 1 \pmod{n}$, it follows that $b^{2^{s-1} t} \equiv \pm 1 \pmod{n}$. If $b^{2^{s-1} t} \equiv -1 \pmod{n}$, then the condition for being a strong pseudoprime to the base b is satisfied with r = s − 1. On the other hand, if $b^{2^{s-1} t} \equiv 1 \pmod{n}$, then (for s ≥ 2) the same argument gives $b^{2^{s-2} t} \equiv \pm 1 \pmod{n}$. Inductively, there exists 0 ≤ r < s for which $b^{2^r t} \equiv -1 \pmod{n}$, or else $b^t \equiv 1 \pmod{n}$.
Alternative proof:
Suppose that n ≠ 2 is prime and n ∤ b. By Fermat’s Little Theorem (Theorem 4.11), we have
$$b^{n-1} - 1 \equiv 0 \pmod{n}.$$
Therefore $n \mid b^{n-1} - 1$. Since n is an odd prime, we have s ≥ 1, and thus we next rewrite
$$b^{n-1} - 1 = b^{2^s t} - 1 = \left(b^{2^{s-1} t} - 1\right)\left(b^{2^{s-1} t} + 1\right).$$
Recursively using $b^{2m} - 1 = \left(b^m - 1\right)\left(b^m + 1\right)$, we obtain
$$b^{n-1} - 1 = \left(b^t - 1\right)\prod_{r=0}^{s-1}\left(b^{2^r t} + 1\right). \tag{5.4}$$
Since n is prime and n divides the left-hand side of the above equation, we conclude that n divides one of the factors on the right-hand side. In other words, either
$$b^t \equiv 1 \pmod{n}$$
or for some 0 ≤ r < s
$$b^{2^r t} \equiv -1 \pmod{n}.$$
This is precisely the condition defining strong pseudoprimes to the base b.
Note: This alternative proof motivates the definition of strong pseudoprimes. The definition stems from the factorization (5.4): a prime must divide one of the factors on the right-hand side, while a composite number need not (its prime factors may divide different factors on the right-hand side of (5.4)). This is why being a strong pseudoprime to the base b is a stronger condition than merely satisfying $b^{n-1} \equiv 1 \pmod{n}$.
(2) By (5.3), we have
$$b^{n-1} \equiv 1 \pmod{n}$$
for all b ∈ Z with gcd(b, n) = 1.
We claim that for every b ∈ Z
$$b^n \equiv b \pmod{n}.$$
Since n is squarefree, Theorem 4.6 (3) implies that this is equivalent to
$$b^n \equiv b \pmod{p}$$
for every prime p | n. Write the prime factorization of n as $n = \prod_{j=1}^{r} p_j$. Since n is squarefree, we have $p_j \neq p_\ell$ for j ≠ ℓ. If $b = b'c$ with gcd(b′, n) = 1 and $c = \prod_{j=1}^{r} p_j^{a_j}$, where $a_j \geq 0$, then
$$b^n = b'^n c^n.$$
Since $b'^{n-1} \equiv 1 \pmod{n}$, we have $b'^n \equiv b' \pmod{n}$. Thus by Theorem 4.6 (3) and Theorem 4.1 (5), we have
$$b^n \equiv b' c^n \pmod{p_j}.$$
It remains to show that $c^n \equiv c \pmod{p_j}$. Clearly if $p_j \mid c$, then this holds trivially (both sides are ≡ 0). Otherwise, using Theorem 4.6 (3), it suffices to show that for ℓ ≠ j,
$$p_\ell^n \equiv p_\ell \pmod{p_j}.$$
Consider the auxiliary integer $b_\ell := p_\ell + \prod_{\ell' \neq \ell} p_{\ell'}$. Then $\gcd(b_\ell, n) = 1$ because $p_{\ell'} \nmid b_\ell$ for all ℓ′. Thus by (5.3) we have
$$b_\ell^{n-1} \equiv 1 \pmod{p_j}.$$
Since $b_\ell = p_\ell + \prod_{\ell' \neq \ell} p_{\ell'} \equiv p_\ell \pmod{p_j}$ (noting that j ≠ ℓ), by Theorem 4.1 (5) we have
$$p_\ell^{n-1} \equiv b_\ell^{n-1} \equiv 1 \pmod{p_j}.$$
From this we conclude that
$$p_\ell^n \equiv p_\ell \pmod{p_j}.$$
Therefore $b^n \equiv b \pmod{n}$ for every b ∈ Z, and we conclude that n is either prime or a Carmichael number. □

Definition. If n is not a strong pseudoprime to the base b, then we call b a witness for the compositeness of n (i.e., that n is not prime). Theorem 5.12 essentially says that if no relatively prime witnesses exist, then n is either prime or a Carmichael number.

Theorem 5.13 (Rabin’s Theorem). If n > 9 is odd and composite, then at least
3/4 of all residue classes b (mod n) are witnesses that n is not prime.

Although we don’t prove Rabin’s Theorem in this class, we discuss its implications.
Specifically, the following primality test is based on Rabin’s Theorem. Roughly speaking, if one chooses a “random” b, there is at most a 1/4 chance that a composite n will be a strong pseudoprime to the base b. Checking enough bases, one can be “pretty sure” that the number is prime, because if none of the bases are witnesses, then the probability is very nearly 1 that n is prime.
Rabin–Miller primality test
Choose a “small” k and bases $b_1, \ldots, b_k$. For a “large” odd number n, one tests if n is a strong pseudoprime to the bases $b_1, \ldots, b_k$.
If not, then n is composite. If it is a strong pseudoprime to all of the bases, then n is “probably prime”. If the bases are chosen “independently” at random, then Rabin’s Theorem implies that the probability of falsely identifying a composite number as “probably prime” is at most $4^{-k}$.
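A minimal sketch of the Rabin–Miller test in Python, assuming the bases are drawn uniformly at random from {2, ..., n − 2} (this choice, the function name `miller_rabin`, and the default k = 20 are our own). By Rabin's Theorem, a composite n > 9 passes all k rounds with probability at most $4^{-k}$:

```python
# Illustrative sketch; the name, base range and default k are our own choices.
import random

def miller_rabin(n, k=20, rng=random):
    """Return False if n is certainly composite, True if n is 'probably prime'."""
    if n < 2:
        return False
    if n in (2, 3):
        return True
    if n % 2 == 0:
        return False
    s, t = 0, n - 1
    while t % 2 == 0:              # n - 1 = 2^s * t with t odd
        s += 1
        t //= 2
    for _ in range(k):
        b = rng.randrange(2, n - 1)
        x = pow(b, t, n)
        if x in (1, n - 1):        # n is a strong pseudoprime to the base b
            continue
        for _ in range(s - 1):     # check b^(2^r t) for r = 1, ..., s - 1
            x = x * x % n
            if x == n - 1:
                break
        else:
            return False           # b is a witness: n is composite
    return True
```

A "True" answer is only probabilistic; a "False" answer is a proof of compositeness, since the base b that triggered it is a witness. Passing a seeded `random.Random(...)` as `rng` makes runs reproducible.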

Idea of public/private key encryption


(1) Messages are exchanged among a large group of people.
(2) Each person has a private key and a public key.
(3) The public key is given to everyone (for instance, your computer saves Google’s
public key).
(4) The private key is kept secret.
(5) Whenever you want to send a message M , you first change it into a number N
(for example, a = 1, b = 2, . . . , z = 26, and so forth).
(6) If you want to send the message to one person P (and keep it secret from
others!), you use P ’s public key to “lock” it. More specifically, the public key
gives a function EP (E stands for “encryption”) which sends one number to
another number; you send EP (N ).
(7) The message is sent to person P , but you can’t guarantee that others in between
don’t see it (for example, messages sent through the internet go through other
people’s computers!).
(8) The person P uses the private key to construct a function DP (D stands for
“decryption”).
(9) The functions EP and DP satisfy
EP ◦ DP = DP ◦ EP = id,
where id means the identity map id(N ) = N .
(10) One-way functions are used so that it is “hard” to determine DP if you only
know EP (remember that everyone has EP ).
One of the one-way functions used in this construction is the factorization of
integers.

RSA encryption algorithm (Rivest, Shamir, Adleman, 1977)


(The computer from) A chooses two “large” primes $p_A \neq q_A$ and computes $n = n_A = p_A q_A$, and then chooses $e = e_A \in \mathbb{N}$ with $1 < e < (p_A - 1)(q_A - 1) = \varphi(n)$ and gcd(e, φ(n)) = 1. Using the Euclidean algorithm, one can find d ∈ Z and x ∈ Z such that
$$de + x\varphi(n) = 1,$$
or in other words
$$de \equiv 1 \pmod{\varphi(n)}.$$
One sets $d_A \equiv d \pmod{\varphi(n)}$ with $d_A \in \{1, \ldots, \varphi(n) - 1\}$.
The pair $e_A$ and n are the public key. The primes $p_A$, $q_A$, and the number $d_A$ are the private key.
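Now suppose that there is a message whose numerical code satisfies $0 \leq N < n_A - 1$ that needs to be encrypted. Then one computes
$$E_A(N) \equiv N^{e_A} \pmod{n_A}$$
and sends this as the encrypted message. To decrypt the message, one computes
$$D_A(N) \equiv N^{d_A} \pmod{n_A}.$$
Note that by Theorem 4.11 (2) (Euler’s Theorem), we have
$$D_A(E_A(N)) \equiv N^{e_A d_A} = N^{1 + k\varphi(n)} \equiv N \pmod{n_A}$$
for some $k \in \mathbb{N}_0$, at least when gcd(N, n_A) = 1; the case gcd(N, n_A) > 1 follows by checking the congruence modulo $p_A$ and $q_A$ separately and applying Theorem 4.6 (3).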
Therefore A can get the original message N back, while someone without dA would
have a difficult time figuring out how to get N back if they only know N eA .

Some remarks
• Encryption and decryption only take $O(\log n_A)$ multiplications modulo $n_A$ (by repeated squaring), so they are fast to compute.
• If one only knows n and not $p_A$ and $q_A$, then φ(n) is not easy to compute. Otherwise one could find $d_A$ using the Euclidean algorithm.
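A toy version of the RSA scheme above, in Python. The primes 61 and 53, the exponent e = 17 and the message 65 are arbitrary illustrative choices (real keys use primes with hundreds of digits), and `egcd` and `make_keys` are our own helpers, with `egcd` implementing the extended Euclidean algorithm:

```python
# Illustrative toy sketch; parameters and helper names are our own.
from math import gcd

def egcd(a, b):
    """Extended Euclidean algorithm: returns (g, x, y) with a*x + b*y = g = gcd(a, b)."""
    if b == 0:
        return a, 1, 0
    g, x, y = egcd(b, a % b)
    return g, y, x - (a // b) * y

def make_keys(p, q, e):
    """Return the public key (e, n) and the private exponent d in {1, ..., phi(n) - 1}."""
    n, phi_n = p * q, (p - 1) * (q - 1)
    if gcd(e, phi_n) != 1:
        raise ValueError("e must be relatively prime to phi(n)")
    _, d, _ = egcd(e, phi_n)      # d*e + x*phi(n) = 1
    return (e, n), d % phi_n

(e, n), d = make_keys(61, 53, 17)
message = 65                      # numerical code N with 0 <= N < n - 1
cipher = pow(message, e, n)       # E_A(N) = N^e mod n
recovered = pow(cipher, d, n)     # D_A(E_A(N)) = N^(ed) mod n
print(n, d, cipher, recovered)    # 3233 2753 2790 65
```

In Python 3.8 and later, `pow(e, -1, phi_n)` computes the same private exponent directly.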

6. Solving congruences and the Chinese remainder theorem
Theorem 6.1. Suppose that n, a, b ∈ Z with n > 0 and set d := gcd(a, n). Then the congruence ax ≡ b (mod n) has a solution if and only if d | b. In this case, there are precisely d different solutions modulo n.
If $x_0$ is a solution to $\frac{a}{d} x \equiv 1 \pmod{\frac{n}{d}}$, then the numbers
$$\frac{1}{d}\left(b x_0 + kn\right), \qquad k \in \{0, 1, \ldots, d - 1\},$$
are all of the distinct solutions modulo n to ax ≡ b (mod n).

Proof. First recall that by Theorem 4.6 (4), if ax ≡ b (mod n), then gcd(ax, n) =
gcd(b, n). Since d = gcd(a, n) | gcd(ax, n) = gcd(b, n), we have d | gcd(b, n). Thus
if a solution exists, we must have d | b.
Now suppose that d | b. We write $a = d a_0$, $b = d b_0$, and $n = d n_0$ with $a_0, b_0, n_0 \in \mathbb{Z}$. By Theorem 4.5 (1), we see that ax ≡ b (mod n) is equivalent to $a_0 x \equiv b_0 \pmod{n_0}$.
Since $\gcd(a_0, n_0) = 1$, Theorem 4.15 implies that
$$a_0 x \equiv 1 \pmod{n_0}$$
has precisely one solution $x_0 \pmod{n_0}$ and all solutions in Z are given by $x = x_0 + k n_0$ with k ∈ Z. Setting $y_0 = x_0 b_0$, we have
$$a_0 y_0 \equiv a_0 x_0 b_0 \equiv b_0 \pmod{n_0},$$
and again using Theorem 4.15 all solutions of this type are of the form $b_0 x_0 + k n_0$ with k ∈ Z.
Therefore
$$y_0 + k n_0 = b_0 x_0 + k n_0 = \frac{1}{d}\left(b x_0 + kn\right).$$
These give the distinct solutions modulo n for k ∈ {0, 1, ..., d − 1}. □
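Theorem 6.1 translates into a short routine. In this sketch (hypothetical name `solve_congruence`), the inverse of a/d modulo n/d is obtained with Python's `pow(x, -1, m)` (available since Python 3.8) rather than via Theorem 4.15; the output lists the d solutions $\frac{1}{d}(b x_0 + kn)$ reduced to {0, ..., n − 1}:

```python
# Illustrative sketch of Theorem 6.1; the function name is our own.
from math import gcd

def solve_congruence(a, b, n):
    """All solutions x in {0, ..., n-1} of a*x ≡ b (mod n); [] if gcd(a, n) does not divide b."""
    d = gcd(a, n)
    if b % d != 0:
        return []
    a0, b0, n0 = a // d, b // d, n // d
    x0 = pow(a0, -1, n0) if n0 > 1 else 0     # solution of a0 * x ≡ 1 (mod n0)
    return sorted((b0 * x0 + k * n0) % n for k in range(d))

print(solve_congruence(6, 9, 15))    # [4, 9, 14]
print(solve_congruence(6, 10, 15))   # [] since gcd(6, 15) = 3 does not divide 10
```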
Having solved linear congruences, we next consider linear systems of congruences.

Theorem 6.2 (Chinese remainder theorem). Let pairwise coprime natural numbers $n_1, \ldots, n_r$ and $a_1, \ldots, a_r, b_1, \ldots, b_r \in \mathbb{Z}$ be given such that $\gcd(n_j, a_j) = 1$ for $j = 1, \ldots, r$.
The system of congruences $a_jx \equiv b_j \pmod{n_j}$ for $j = 1, \ldots, r$ has precisely one solution modulo $n_1 \cdots n_r$.

Proof. By Theorem 4.15, $a_jx \equiv b_j \pmod{n_j}$ has precisely one solution $x \equiv c_j \pmod{n_j}$.
Hence the solutions to $a_jx \equiv b_j \pmod{n_j}$ are precisely $c_j + yn_j$ with $y \in \mathbb{Z}$.
Now set $t_j := \frac{n}{n_j}$ with $n := n_1 \cdots n_r$. Since the $n_\ell$ are pairwise coprime, we have $\gcd(n_j, t_j) = 1$ for $j = 1, \ldots, r$. Thus $t_jy_j \equiv 1 \pmod{n_j}$ has a solution $y_j$ for $j = 1, \ldots, r$, and for $j \neq k$ we have
$$t_j \equiv 0 \pmod{n_k}.$$
Now set
$$x = y_1t_1c_1 + \cdots + y_rt_rc_r.$$
Since $t_j \equiv 0 \pmod{n_k}$ for $j \neq k$, we have
$$x \equiv y_kt_kc_k \equiv c_k \pmod{n_k}.$$
Therefore
$$a_kx \equiv a_kc_k \equiv b_k \pmod{n_k}.$$
Thus $x$ is a solution to the system of congruences. Suppose that $y$ is another solution. Then for each $j = 1, \ldots, r$ we have
$$a_j(x - y) \equiv 0 \pmod{n_j},$$
or in other words $n_j \mid a_j(x - y)$. Since $\gcd(n_j, a_j) = 1$, Theorem 3.11 (5) implies that $x - y \equiv 0 \pmod{n_j}$, and hence $y \equiv x \pmod{n_j}$. We finally use Theorem 4.6 (3) to conclude that $y \equiv x \pmod n$. □
Example 6.3. Pick $r = 3$ with $n_1 = 3$, $n_2 = 5$, and $n_3 = 7$, so that $n = 3 \cdot 5 \cdot 7 = 105$. We set $a_1 = a_2 = a_3 = 1$, $b_1 = 2$, $b_2 = 3$, and $b_3 = 2$. Then $c_j = b_j$, $t_1 = 35$, $t_2 = 21$, $t_3 = 15$, $y_1 = -1$, $y_2 = 1$, and $y_3 = 1$.
One thus obtains a solution
$$x = -35 \cdot 2 + 21 \cdot 3 + 15 \cdot 2 = 23.$$
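The construction in the proof of Theorem 6.2 can be sketched in Python as follows (the function name `crt` and the input format, a list of triples $(a_j, b_j, n_j)$, are our own choices). It assumes the hypotheses of the theorem and uses `pow(t, -1, m)` (Python 3.8+) for the inverses $y_j$.

```python
def crt(system):
    """Solve a_j x ≡ b_j (mod n_j) for every (a_j, b_j, n_j) in system.

    Assumes the n_j are pairwise coprime and gcd(a_j, n_j) = 1, as in Theorem 6.2.
    Returns the unique solution x with 0 <= x < n_1 * ... * n_r.
    """
    n = 1
    for _, _, n_j in system:
        n *= n_j
    x = 0
    for a_j, b_j, n_j in system:
        c_j = (b_j * pow(a_j, -1, n_j)) % n_j  # solution of a_j x ≡ b_j (mod n_j)
        t_j = n // n_j                          # t_j ≡ 0 (mod n_k) for every k != j
        y_j = pow(t_j, -1, n_j)                 # t_j y_j ≡ 1 (mod n_j)
        x += y_j * t_j * c_j
    return x % n


print(crt([(1, 2, 3), (1, 3, 5), (1, 2, 7)]))  # 23, as in Example 6.3
```

Here `pow` returns $y_1 = 2$ rather than $y_1 = -1$; the two choices are congruent modulo 3, so the final answer modulo 105 is the same.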
The Chinese Remainder Theorem can also be written in an algebraic (ring-
theoretic) way. To describe the statement in this language, we first require some
definitions.
Definition. For rings $R$ and $S$, one defines the Cartesian product by
$$R \times S := \{(r, s) : r \in R, s \in S\}.$$
The set $R \times S$ is a ring, where the addition and multiplication are defined componentwise by
$$(r_1, s_1) + (r_2, s_2) = (r_1 + r_2, s_1 + s_2), \qquad (r_1, s_1) \cdot (r_2, s_2) = (r_1r_2, s_1s_2).$$
This ring is called the direct sum and is usually written $R \oplus S$. If $R$ and $S$ are commutative, then so is the direct sum.
If $R$ and $S$ have identity elements $1_R$ and $1_S$, then $(1_R, 1_S)$ is the identity element of $R \oplus S$.
Two rings $R$ and $S$ are called isomorphic (written $R \simeq S$ or $R \cong S$) if there is a bijective map $\psi: R \to S$ such that for every $a, b \in R$
$$\psi(1_R) = 1_S, \qquad \psi(a + b) = \psi(a) + \psi(b), \qquad \psi(ab) = \psi(a)\psi(b).$$
Every such map is called a ring isomorphism (a map satisfying the above rules that is not necessarily bijective is called a ring homomorphism). The maps $r \mapsto (r, 0)$ and $s \mapsto (0, s)$ yield ring isomorphisms
$$R \to \{(r, 0) \mid r \in R\} \subseteq R \oplus S, \qquad S \to \{(0, s) \mid s \in S\} \subseteq R \oplus S,$$
where the images have identity elements $(1_R, 0)$ and $(0, 1_S)$, respectively. Through these isomorphisms, one can consider $R$ and $S$ as subrings of $R \oplus S$.
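Theorem 6.4 (Chinese Remainder Theorem - Ring Theory). Suppose that $n \in \mathbb{N}$, $n > 1$, and $n = st$ with $s$ and $t$ relatively prime. Then
$$\mathbb{Z}/n\mathbb{Z} \simeq \mathbb{Z}/s\mathbb{Z} \oplus \mathbb{Z}/t\mathbb{Z}.$$
If $n$ has the prime factorization $n = p_1^{a_1} \cdots p_r^{a_r}$ with distinct primes $p_1, \ldots, p_r$, then
$$\mathbb{Z}/n\mathbb{Z} \simeq \mathbb{Z}/p_1^{a_1}\mathbb{Z} \oplus \cdots \oplus \mathbb{Z}/p_r^{a_r}\mathbb{Z}.$$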
Proof. It suffices to show the first claim because the second follows by induction on the number of distinct prime divisors.
We define a map $\psi$ from the residue classes $a$ modulo $n$ (i.e., from $\mathbb{Z}/n\mathbb{Z}$) to the direct sum $\mathbb{Z}/s\mathbb{Z} \oplus \mathbb{Z}/t\mathbb{Z}$ via
$$\psi(a) := (a + s\mathbb{Z}, a + t\mathbb{Z}).$$
The goal is to show that this is a ring isomorphism. We first prove that $\psi$ is well-defined: if one replaces $a$ with $a + nx$ for some $x \in \mathbb{Z}$, then by definition $\psi(a + nx) = (a + nx + s\mathbb{Z}, a + nx + t\mathbb{Z})$. However, since $s \mid n$ and $t \mid n$, we have $nx + s\mathbb{Z} = s\mathbb{Z}$ and $nx + t\mathbb{Z} = t\mathbb{Z}$, so $\psi(a + nx) = \psi(a)$. Hence $\psi$ is well-defined. One sees directly that
$$\psi(1) = (1 + s\mathbb{Z}, 1 + t\mathbb{Z}), \qquad \psi(a + b) = \psi(a) + \psi(b), \qquad \psi(ab) = \psi(a)\psi(b).$$
Therefore $\psi$ is a ring homomorphism, and it remains to show that it is a bijection.
Let $u, v \in \mathbb{Z}$ be given. Since $\gcd(s, t) = 1$, Theorem 6.2 implies that there exists $x \in \mathbb{Z}$ with $x \equiv u \pmod s$ and $x \equiv v \pmod t$, and furthermore that such $x$ is uniquely determined modulo $n$. We conclude that $\psi$ is surjective because $\psi(x) = (u + s\mathbb{Z}, v + t\mathbb{Z})$, and injective because $\psi(y) = (u + s\mathbb{Z}, v + t\mathbb{Z})$ implies that $y \equiv x \pmod n$. Therefore $\psi$ is a ring isomorphism. □
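For small $s$ and $t$ one can confirm the statement by brute force. The Python sketch below (names `psi` and `is_crt_isomorphism` are hypothetical) represents residues by integers in $[0, s)$ and $[0, t)$. The homomorphism checks always pass, since $\psi$ is just reduction; the real content is that $\psi$ is a bijection exactly when $\gcd(s, t) = 1$.

```python
def psi(a, s, t):
    """The map from the proof of Theorem 6.4: a + nZ -> (a + sZ, a + tZ), as residues."""
    return (a % s, a % t)


def is_crt_isomorphism(s, t):
    """Brute-force check that psi: Z/stZ -> Z/sZ ⊕ Z/tZ is a bijective ring homomorphism."""
    n = s * t
    images = [psi(a, s, t) for a in range(n)]
    if len(set(images)) != n:            # injective, hence bijective (both sides have n elements)
        return False
    for a in range(n):
        for b in range(n):
            u, v = images[a], images[b]
            if psi(a + b, s, t) != ((u[0] + v[0]) % s, (u[1] + v[1]) % t):
                return False             # psi(a + b) != psi(a) + psi(b)
            if psi(a * b, s, t) != ((u[0] * v[0]) % s, (u[1] * v[1]) % t):
                return False             # psi(ab) != psi(a) psi(b)
    return psi(1, s, t) == (1 % s, 1 % t)  # the identity maps to the identity


print(is_crt_isomorphism(3, 5))  # True
print(is_crt_isomorphism(2, 4))  # False: gcd(2, 4) != 1, so psi is not injective
```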
Definition. Analogously to $R \oplus S$, one defines the direct product $G \times H$ of two groups $G$ and $H$, with componentwise multiplication. Suppose that $U$ and $V$ are subgroups of a group $G$ for which $uv = vu$ for all $u \in U$ and $v \in V$. Then
$$UV := \{uv \mid u \in U, v \in V\}$$
is also a subgroup of $G$. If $UV = G$ and $U \cap V = \{e\}$, then we call $G$ the internal direct product of $U$ and $V$ (in this case $G \simeq U \times V$).
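Lemma 6.5. If $R$ and $S$ are commutative rings with identity, then
$$(R \oplus S)^\times \simeq R^\times \times S^\times.$$
Proof. A pair $(r, s) \in R \oplus S$ is a unit in $R \oplus S$ precisely when there exists $(u, v) \in R \oplus S$ for which
$$(ru, sv) = (r, s)(u, v) = 1_{R \oplus S} = (1_R, 1_S).$$
This is equivalent to the existence of $u \in R$ and $v \in S$ with $ru = 1_R$ and $sv = 1_S$, i.e., to $r \in R^\times$ and $s \in S^\times$. Hence $(R \oplus S)^\times = R^\times \times S^\times$ as sets, and the multiplication is componentwise on both sides. □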

Notation. For n ∈ N, let E(n) = (Z/nZ)× denote the group of primitive residue
classes modulo n.

Lemma 6.5 together with Theorem 6.4 hence gives the following direct corollary.

Corollary 6.6. If $m$ and $n$ are relatively prime natural numbers, then
$$E(mn) \simeq E(m) \times E(n) \quad\text{and}\quad \varphi(mn) = \varphi(m)\varphi(n).$$

Definition. An arithmetic function is any function $f: \mathbb{N} \to M$ from the natural numbers to some set $M$. An arithmetic function $f: \mathbb{N} \to \mathbb{C}$ is called multiplicative if $f(mn) = f(m)f(n)$ for all $m, n \in \mathbb{N}$ with $\gcd(m, n) = 1$ and $f$ is not the (identically) zero function.

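For multiplicative functions, one has $f(1) = 1$: since $f$ is not identically zero, there is some $m$ with $f(m) \neq 0$, and then
$$0 \neq f(m) = f(m \cdot 1) = f(m)f(1).$$
Dividing by $f(m)$ gives $f(1) = 1$.
If $f$ is multiplicative and $n = p_1^{a_1} \cdots p_r^{a_r}$ with distinct primes $p_1, \ldots, p_r$, then one sees (by induction) that
$$f(n) = f\left(p_1^{a_1}\right) \cdots f\left(p_r^{a_r}\right).$$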

Example 6.7. The Euler phi-function $\varphi$ is multiplicative by Corollary 6.6. For $p$ prime and $a \geq 1$, we have
$$\varphi(p^a) = p^a - p^{a-1} = p^{a-1}(p-1) = p^a\left(1 - \frac{1}{p}\right).$$
Therefore, by telescoping, we find that
$$\sum_{d \mid p^a} \varphi(d) = \varphi(1) + \varphi(p) + \varphi(p^2) + \cdots + \varphi(p^a) = 1 + (p-1) + (p^2 - p) + \cdots + (p^a - p^{a-1}) = p^a.$$
Theorem 6.8. If $f: \mathbb{N} \to \mathbb{C}$ is multiplicative and $F(n) := \sum_{d \mid n,\, d > 0} f(d)$, then $F$ is also multiplicative.
Proof. Suppose that $m, n \in \mathbb{N}$ are relatively prime. By Theorem 3.13, the divisors $d$ of $mn$ have a unique representation
$$d = d_1d_2$$
with $d_1 \mid m$, $d_2 \mid n$, and $d_1, d_2 > 0$. Since $m$ and $n$ are relatively prime, $d_1$ and $d_2$ are also relatively prime, and hence
$$F(mn) = \sum_{d \mid mn} f(d) = \sum_{\substack{d_1 \mid m\\ d_2 \mid n}} f(d_1d_2) = \sum_{\substack{d_1 \mid m\\ d_2 \mid n}} f(d_1)f(d_2) = \left(\sum_{d_1 \mid m} f(d_1)\right)\left(\sum_{d_2 \mid n} f(d_2)\right) = F(m)F(n).$$
Moreover, $F(1) = f(1) = 1$, so $F$ is not identically zero. □


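Theorem 6.9. The Euler phi-function $\varphi$ is multiplicative and for $n \in \mathbb{N}$ we have
$$\varphi(n) = n\prod_{\substack{p \text{ prime}\\ p \mid n}}\left(1 - \frac{1}{p}\right), \qquad \sum_{\substack{d \mid n\\ d > 0}} \varphi(d) = n.$$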

Proof. It was already shown in Corollary 6.6 that $\varphi$ is multiplicative. By multiplicativity, it suffices to compute $\varphi(p^a)$, which was done in Example 6.7 before Theorem 6.8. This yields
$$\varphi(n) = n\prod_{p \mid n}\left(1 - \frac{1}{p}\right).$$
Moreover, the formula
$$\sum_{d \mid p^a} \varphi(d) = p^a$$
was also proven in that example, and Theorem 6.8 implies that $F(n) = \sum_{d \mid n} \varphi(d)$ is multiplicative. Since $F(p^a) = p^a$ for every prime power, we have $F(n) = n$. □
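Theorem 6.9 is easy to check numerically. The Python sketch below (function names are ours) computes $\varphi(n)$ from the product formula using trial-division factorization, and compares it with the definition $\varphi(n) = \#E(n)$.

```python
from math import gcd


def prime_divisors(n):
    """Distinct primes dividing n, found by trial division."""
    primes, p = [], 2
    while p * p <= n:
        if n % p == 0:
            primes.append(p)
            while n % p == 0:
                n //= p
        p += 1
    if n > 1:
        primes.append(n)
    return primes


def phi(n):
    """Euler's phi-function via phi(n) = n * prod_{p | n} (1 - 1/p), in exact integer arithmetic."""
    result = n
    for p in prime_divisors(n):
        result = result // p * (p - 1)   # exact: p still divides result at this point
    return result


def phi_by_counting(n):
    """phi(n) straight from the definition: the size of E(n)."""
    return sum(1 for a in range(1, n + 1) if gcd(a, n) == 1)


n = 36
print(phi(n), phi_by_counting(n))                          # 12 12
print(sum(phi(d) for d in range(1, n + 1) if n % d == 0))  # 36
```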
Definition. A function f : N → C is called totally multiplicative if f (mn) =
f (m)f (n) for every m, n ∈ N (no restriction on the gcd).
Example 6.10. The function $n \mapsto n^k$ with $k \in \mathbb{C}$ is totally multiplicative, and hence Theorem 6.8 implies that
$$\sigma_k(n) = \sum_{d \mid n} d^k$$
is multiplicative. The functions $\sigma_k$ are called sum of divisors functions.
In the proof of Theorem 5.7, we saw $\sigma_1(n) = \sigma(n)$; the subscript 1 is often omitted in this case. The function $\sigma_0(n) = \tau(n)$ is the number of divisors of $n$.
For prime powers $p^a$ with $a \in \mathbb{N}_0$ and $k \neq 0$, one has
$$\sigma_k(p^a) = 1 + p^k + p^{2k} + \cdots + p^{ak} = \frac{p^{(a+1)k} - 1}{p^k - 1}.$$
In the special case $k = 0$, we have
$$\tau(p^a) = \sigma_0(p^a) = a + 1$$
and
$$\tau\left(\prod_j p_j^{a_j}\right) = \prod_j (a_j + 1).$$
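The following Python sketch (names are ours; it assumes an integer $k \geq 0$, although the notes allow any $k \in \mathbb{C}$) compares the definition of $\sigma_k$ with the geometric-series formula for prime powers and illustrates multiplicativity on $12 = 2^2 \cdot 3$.

```python
def sigma(k, n):
    """sigma_k(n): sum of d**k over the positive divisors d of n (from the definition)."""
    return sum(d**k for d in range(1, n + 1) if n % d == 0)


def sigma_prime_power(k, p, a):
    """Closed form for sigma_k(p**a) with an integer k >= 0 (a geometric series)."""
    if k == 0:
        return a + 1                                  # tau(p^a) = a + 1
    return (p**((a + 1) * k) - 1) // (p**k - 1)


print(sigma(1, 12), sigma(1, 4) * sigma(1, 3))   # 28 28
print(sigma(0, 12), (2 + 1) * (1 + 1))           # 6 6, since 12 = 2^2 * 3
```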

We next consider congruences of higher degree polynomials.


Definition. Let $f(x) = a_mx^m + \cdots + a_1x + a_0$ be a polynomial with coefficients $a_j \in \mathbb{Z}$. The degree of the congruence $f(x) \equiv 0 \pmod n$ is the largest integer $k$ with $n \nmid a_k$.
The number of solutions of $f(x) \equiv 0 \pmod n$ is the number of distinct residues $x_0$ modulo $n$ with $f(x_0) \equiv 0 \pmod n$.
Theorem 6.11. Let $f(x)$ be a polynomial with coefficients in $\mathbb{Z}$. For $n \in \mathbb{N}$, let $N_f(n)$ be the number of solutions of $f(x) \equiv 0 \pmod n$. Then $N_f$ is a multiplicative function.
In particular, the congruence $f(x) \equiv 0 \pmod n$ has a solution if and only if, for every prime $p$ with $p^a \mid n$ and $p^{a+1} \nmid n$, the congruence $f(x) \equiv 0 \pmod{p^a}$ is solvable.
Proof. Suppose that $n = st$ with $\gcd(s, t) = 1$. Every solution $x$ modulo $n$ is also a solution modulo $s$ and modulo $t$, because $f(x) \equiv 0 \pmod n$ implies that $f(x) \equiv 0 \pmod s$ and $f(x) \equiv 0 \pmod t$.
Now suppose that
$$y_1, \ldots, y_k \text{ with } k = N_f(s), \qquad z_1, \ldots, z_\ell \text{ with } \ell = N_f(t),$$
are the respective distinct solutions to
$$f(x) \equiv 0 \pmod s, \qquad f(x) \equiv 0 \pmod t.$$
By Theorem 4.6 (3), $x \in \mathbb{Z}$ is a solution to $f(x) \equiv 0 \pmod n$ if and only if $f(x) \equiv 0 \pmod s$ and $f(x) \equiv 0 \pmod t$, which in turn holds if and only if $x \equiv y_\mu \pmod s$ and $x \equiv z_\nu \pmod t$ for some $\mu \in \{1, \ldots, k\}$ and $\nu \in \{1, \ldots, \ell\}$.
Since $\gcd(s, t) = 1$, Theorem 6.2 implies that for every pair $(y_\mu, z_\nu)$ there exists an $x = x_{\mu,\nu} \in \mathbb{Z}$ for which $x \equiv y_\mu \pmod s$ and $x \equiv z_\nu \pmod t$, and $x$ is uniquely determined modulo $n$.
Different pairs $(y_\mu, z_\nu)$ yield distinct residues $x_{\mu,\nu}$ modulo $n$, since the pair is recovered by reducing $x_{\mu,\nu}$ modulo $s$ and modulo $t$. It thus holds that
$$N_f(n) = k \cdot \ell = N_f(s)N_f(t).$$
Finally, $N_f(1) = 1$, so $N_f$ is not identically zero. □
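A brute-force count makes Theorem 6.11 concrete. In the Python sketch below (the name `count_solutions` and the coefficient-list convention, lowest degree first, are our own), $f(x) = x^2 - 1$ is an illustrative choice: it has 2 roots modulo 3 and modulo 5, hence $2 \cdot 2 = 4$ roots modulo 15.

```python
from math import gcd


def count_solutions(coeffs, n):
    """N_f(n): the number of x mod n with f(x) ≡ 0 (mod n); coeffs[i] is the coefficient of x**i."""
    return sum(
        1 for x in range(n)
        if sum(c * pow(x, i, n) for i, c in enumerate(coeffs)) % n == 0
    )


f = [-1, 0, 1]                                 # f(x) = x^2 - 1, an illustrative choice
print(count_solutions(f, 15), count_solutions(f, 3) * count_solutions(f, 5))  # 4 4
print(count_solutions(f, 8))                   # 4: x = 1, 3, 5, 7
```

Multiplicativity only applies to coprime factors: $N_f(8) = 4$, while $N_f(2)^3 = 1$.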

Theorem 6.12 (Hensel's Lemma). Suppose that $f(x)$ is a polynomial with integer coefficients, $p$ is a prime, and $a \in \mathbb{N}$. From each solution of $f(x) \equiv 0 \pmod{p^a}$, the solutions to $f(x) \equiv 0 \pmod{p^{a+1}}$ may be determined by solving a certain linear congruence.
Proof. Write $f(x) = a_nx^n + \cdots + a_1x + a_0$ with $a_\nu \in \mathbb{Z}$. Every solution to $f(x) \equiv 0 \pmod{p^{a+1}}$ is clearly also a solution to $f(x) \equiv 0 \pmod{p^a}$. Moreover, for every $r \in \{0, \ldots, p-1\}$,
$$f(x + rp^a) \equiv f(x) \equiv 0 \pmod{p^a}.$$
Thus for every solution $x$ to $f(x) \equiv 0 \pmod{p^a}$, we would like to determine whether $x + rp^a$ is a solution modulo $p^{a+1}$ or not.
Writing $h := rp^a$ for each possible choice of $r$, under the assumption that $f(x) \equiv 0 \pmod{p^a}$, we would like to determine whether
$$f(x + h) \equiv 0 \pmod{p^{a+1}}$$
holds or not. Since we already know that $f(x) \equiv 0 \pmod{p^a}$ holds, it would be helpful if we could write $f(x + h)$ in terms of $f(x)$. Luckily, Taylor's Theorem does this for us (note that Taylor's Theorem is simpler for polynomials because there is no remainder term). Namely, we have
$$f(x+h) = f(x) + f'(x)h + \frac{1}{2}f''(x)h^2 + \cdots + \frac{1}{n!}f^{(n)}(x)h^n = f(x) + f'(x)rp^a + \frac{1}{2}f''(x)r^2p^{2a} + \cdots + \frac{1}{n!}f^{(n)}(x)r^np^{an}.$$
Since $a \geq 1$, for $k \geq 2$ we have $p^{ak} \equiv 0 \pmod{p^{a+1}}$, so we would like to conclude that
$$f(x + h) \equiv f(x) + f'(x)h \pmod{p^{a+1}}.$$
However, there is a problem; namely, the coefficient $\frac{1}{k!}f^{(k)}(x)$ in front of $r^kp^{ak}$ might have a large power of $p$ in its denominator coming from the $k!$, which might cancel the power $p^{ak}$ so that
$$\frac{1}{k!}f^{(k)}(x)r^kp^{ak} \not\equiv 0 \pmod{p^{a+1}}.$$
We would like to show that this cancellation with the denominator does not happen. We claim that this coefficient is actually an integer; in other words, we next show that, for every $k = 0, \ldots, n$, we have
$$\frac{1}{k!}f^{(k)}(x) \in \mathbb{Z}.$$
In order to show that this coefficient is indeed an integer, we rewrite $f(x + h)$ in another way. Specifically, we use the Binomial Theorem to obtain
$$f(x) + f'(x)h + \frac{1}{2}f''(x)h^2 + \cdots + \frac{1}{n!}f^{(n)}(x)h^n = f(x+h) = \sum_{\nu=0}^{n} a_\nu(x+h)^\nu = \sum_{\nu=0}^{n} a_\nu\sum_{k=0}^{\nu}\binom{\nu}{k}x^{\nu-k}h^k = \sum_{k=0}^{n}\left(\sum_{\nu=k}^{n}\binom{\nu}{k}a_\nu x^{\nu-k}\right)h^k.$$
Now think of the above formula as an identity between two polynomials in $h$. Since both sides agree for every value of $h$, the coefficients of $h^k$ on both sides must match. Comparing coefficients of $h^k$, we conclude that for every $k = 0, \ldots, n$, we have
$$\frac{1}{k!}f^{(k)}(x) = \sum_{\nu=k}^{n}\binom{\nu}{k}a_\nu x^{\nu-k}.$$
The left-hand side of the above equation is the coefficient that we wanted to prove is an integer. The numbers $\binom{\nu}{k}$ and $a_\nu$ are integers and, since $x \in \mathbb{Z}$, we also have $x^{\nu-k} \in \mathbb{Z}$. Thus the right-hand side of the above equation is an integer, and hence the left-hand side is as well.
In particular, if $y$ is a solution modulo $p^a$, then for $x = y + rp^a$ with $r \in \mathbb{Z}$, we have
$$f(x) = f(y + rp^a) = f(y) + f'(y)rp^a + \frac{1}{2}f''(y)r^2p^{2a} + \cdots \equiv f(y) + f'(y)rp^a \pmod{p^{a+1}}.$$
It follows that $f(x) \equiv 0 \pmod{p^{a+1}}$ if and only if
$$f(y) + f'(y)rp^a \equiv 0 \pmod{p^{a+1}}.$$
Since $p^a \mid f(y)$, this is equivalent by Theorem 4.5 to
$$f'(y)r \equiv -\frac{f(y)}{p^a} \pmod p.$$
The solutions $r \in \{0, \ldots, p-1\}$ to this linear congruence are in one-to-one correspondence with the solutions $x = y + rp^a$ to $f(x) \equiv 0 \pmod{p^{a+1}}$ with $x \equiv y \pmod{p^a}$.
If $p \nmid f'(y)$, then Theorem 4.15 implies that there is a unique solution $r$.
If $p \mid f'(y)$, then the number of solutions $r$ to the linear congruence is
$$\begin{cases} 0 & \text{if } p^{a+1} \nmid f(y),\\ p & \text{if } p^{a+1} \mid f(y).\end{cases}$$
□

Solving $f(x) \equiv 0 \pmod{p^a}$:
(1) Start: Solve $f(x) \equiv 0 \pmod p$.
(2) Recursion: For each solution $y_1$ of the congruence $f(x) \equiv 0 \pmod{p^a}$, one obtains the solutions $x$ to $f(x) \equiv 0 \pmod{p^{a+1}}$ with $x \equiv y_1 \pmod{p^a}$ in the form $x = y_1 + rp^a$, where $r$ is a solution to the linear congruence
$$f'(y_1)r \equiv -\frac{f(y_1)}{p^a} \pmod p.$$
There are three cases:
(i) If $p \nmid f'(y_1)$, then there is a unique solution $r$.
(ii) If $p \mid f'(y_1)$ and $p^{a+1} \nmid f(y_1)$, then there are no solutions.
(iii) If $p \mid f'(y_1)$ and $p^{a+1} \mid f(y_1)$, then there are precisely $p$ solutions (i.e., every $r \in \{0, \ldots, p-1\}$ is a solution).
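The lifting procedure can be sketched in Python as follows (function names are ours; a polynomial is given as a list of integer coefficients, lowest degree first, an assumed convention). Instead of calling a general linear-congruence solver, it handles the three cases for $f'(y)r \equiv -f(y)/p^k \pmod p$ directly.

```python
def poly_eval(coeffs, x):
    """Evaluate f(x) = sum coeffs[i] * x**i exactly over the integers."""
    return sum(c * x**i for i, c in enumerate(coeffs))


def poly_derivative(coeffs):
    """Coefficients of f'(x)."""
    return [i * c for i, c in enumerate(coeffs)][1:]


def hensel_solutions(coeffs, p, a):
    """All x in [0, p**a) with f(x) ≡ 0 (mod p**a), lifting one power of p at a time."""
    roots = [x for x in range(p) if poly_eval(coeffs, x) % p == 0]   # start: solve mod p
    dcoeffs = poly_derivative(coeffs)
    for k in range(1, a):                      # recursion: lift from p^k to p^(k+1)
        pk = p**k
        lifted = []
        for y in roots:
            rhs = (-(poly_eval(coeffs, y) // pk)) % p    # -f(y)/p^k mod p (exact division)
            dfy = poly_eval(dcoeffs, y) % p
            if dfy != 0:                                  # p does not divide f'(y): unique r
                r = (rhs * pow(dfy, -1, p)) % p
                lifted.append(y + r * pk)
            elif rhs == 0:                                # p | f'(y) and p^(k+1) | f(y): every r
                lifted.extend(y + r * pk for r in range(p))
            # otherwise p | f'(y) but p^(k+1) does not divide f(y): y does not lift
        roots = lifted
    return sorted(roots)


f = [7, 10, 0, 0, 0, 0, 0, 0, 1]      # f(x) = x^8 + 10x + 7, as in Example 6.13
print(hensel_solutions(f, 3, 2))      # [1, 4, 7]
print(hensel_solutions(f, 3, 3))      # [4, 7, 13, 16, 22, 25]
```

The residues $22$ and $25$ are $-5$ and $-2$ modulo 27.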

Example 6.13. We consider the polynomial $f(x) = x^8 + 10x + 7$ and we look for solutions modulo powers of $p = 3$.
We begin by solving the equation modulo 3, and then we will apply Hensel's Lemma to obtain solutions for higher powers of 3. To find the solutions modulo 3, we simply plug in $x = 0$, $x = 1$, and $x = 2$, and see by direct calculation that $y_1 = 1$ is the unique solution to $f(x) \equiv 0 \pmod 3$.
Now we are going to find solutions modulo 9. To do so, we use Hensel's Lemma (Theorem 6.12) with $a = 1$. From the proof of Theorem 6.12, we need to solve $f'(y_1)r \equiv -\frac{f(y_1)}{3} \pmod 3$ to find the solutions modulo 9. We therefore compute
$$f'(y_1) = 8y_1^7 + 10 = 18 \equiv 0 \pmod 3, \qquad f(y_1) = 18 \equiv 0 \pmod{3^2}.$$
Since $f'(y_1) \equiv 0 \pmod 3$, either all choices of $r$ satisfy the congruence or none of them do, but $f(y_1) \equiv 0 \pmod 9$ implies that they indeed all satisfy the congruence. Therefore we obtain three solutions to $f(x) \equiv 0 \pmod 9$; namely, we have $x_1 = -2$, $x_2 = 1$, $x_3 = 4$.
We next continue to find solutions modulo $27 = 3^3$. For this, we use Hensel's Lemma (Theorem 6.12) with $a = 2$. We begin with each of the solutions $x_1, x_2, x_3 \pmod 9$ and apply Hensel's Lemma in these cases. We again compute $f'(x_j) \equiv 0 \pmod 3$ for $j = 1, 2, 3$ and
$$\frac{f(x_j)}{3^2} = \begin{cases} 27 \equiv 0 \pmod 3 & \text{for } j = 1,\\ 2 \not\equiv 0 \pmod 3 & \text{for } j = 2,\\ 7287 \equiv 0 \pmod 3 & \text{for } j = 3.\end{cases}$$
Thus $x_2 = 1$ does not lift, while $x_1$ and $x_3$ each lift to three solutions, and the solutions to $f(x) \equiv 0 \pmod{3^3}$ are
$$-2,\ 7,\ 16 \quad (\text{lifting } x_1 = -2), \qquad -5,\ 4,\ 13 \quad (\text{lifting } x_3 = 4).$$

Remark 6.14. Hensel's Lemma is often stated in the following form:
Theorem 6.15. Suppose that $f$ is a polynomial with integer coefficients and $x \in \mathbb{Z}$ is given such that $f(x) \equiv 0 \pmod{p^a}$ and $f'(x) \not\equiv 0 \pmod p$. Then there exists a unique $y$ modulo $p^{a+1}$ such that $f(y) \equiv 0 \pmod{p^{a+1}}$ and $y \equiv x \pmod{p^a}$.
If $f(x) \equiv 0 \pmod{p^a}$ and $f'(x) \equiv 0 \pmod p$, then either there exist $p$ solutions $y$ modulo $p^{a+1}$ such that $f(y) \equiv 0 \pmod{p^{a+1}}$ and $y \equiv x \pmod{p^a}$, or there are no solutions $y$ modulo $p^{a+1}$ with $y \equiv x \pmod{p^a}$.
The form of Hensel’s Lemma given in Theorem 6.15 is useful for applying to a
given problem as long as p ∤ f ′ (x). The more general form given in Theorem 6.12
applies even when p | f ′ (x), but one may find it more difficult to use.
Theorem 6.16. Let p be prime and n denote the degree of the polynomial congru-
ence f (x) ≡ 0 (mod p). The congruence f (x) ≡ 0 (mod p) has at most n distinct
solutions modulo p.
Proof. This follows by polynomial long division, thinking of the coefficients as ele-
ments of the field Fp . □

7. Primitive roots and cyclic groups
Definition. If $G$ is a finite group of size $m$ and $x \in G$, then there exists a smallest $h \in \mathbb{N}$ (depending on $x$) with $x^h = e$, where $e$ is the identity element of $G$. One calls $h$ the order of $x$ in $G$. By Lagrange's Theorem (Theorem 4.13), we have $h \mid m$. If $h = m$, then we have $G = \{e, x, x^2, \ldots, x^{m-1}\}$ and one calls $G$ a cyclic group.
For $a$ relatively prime to $n$, the order of $a$ modulo $n$ is the order of the residue class $a$ in the group $E(n)$, i.e., the smallest $k \in \mathbb{N}$ with
$$a^k \equiv 1 \pmod n.$$
By Euler's Totient Theorem (Theorem 4.11 (2)), the order is at most $\varphi(n)$, and Theorem 4.13 furthermore implies that the order of $a$ is a divisor of $\varphi(n)$.
Problem. For which n does there exist an a with order precisely φ(n) modulo n? In
other words, for which n is E(n) cyclic?
Definition. If a has order φ(n), then we call a a primitive root modulo n.
Recall that by Corollary 6.6, if $n = p_1^{a_1} \cdots p_r^{a_r}$ with distinct primes $p_1, \ldots, p_r$, then
$$E(n) \simeq E\left(p_1^{a_1}\right) \times \cdots \times E\left(p_r^{a_r}\right).$$
So one can investigate $E(p^a)$ for $p$ prime.


Examples 7.1.
• For $n = 8$, the group $E(8)$ has the elements 1, 3, 5, and 7, and one sees from a direct calculation that $3^2 \equiv 5^2 \equiv 7^2 \equiv 1 \pmod 8$. Therefore every element other than 1 (the identity element) has order 2. From this one can conclude that
$$E(8) \simeq \mathbb{Z}_2 \times \mathbb{Z}_2$$
is a direct product of two cyclic groups of order 2 (and is not itself cyclic). Here $\mathbb{Z}_n$ denotes the cyclic group of order $n$ (it is isomorphic to the group $\mathbb{Z}/n\mathbb{Z}$ with addition).
• For $n = 5$, we note that $E(5)$ contains the elements 1, 2, 3, and 4. Since the order of every element divides $\varphi(5) = 4$, an element $a$ has order 4 if and only if $a^2 \not\equiv 1 \pmod 5$. We compute
$$2^2 = 4, \qquad 3^2 = 9 \equiv 4 \pmod 5, \qquad 4^2 = 16 \equiv 1 \pmod 5.$$
Therefore 2 and 3 are primitive roots and
$$E(5) \simeq \mathbb{Z}_4$$
is a cyclic group of order 4.
• For $n = 10$, since $E(2)$ is trivial, we also directly see that
$$E(10) \simeq E(5) \times E(2) \simeq E(5) \simeq \mathbb{Z}_4.$$
• For $n = 12$, we have
$$E(12) \simeq E(4) \times E(3) \simeq \mathbb{Z}_2 \times \mathbb{Z}_2.$$
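Theorem 7.2. Suppose that $n \in \mathbb{N}$ and $a \in \mathbb{Z}$ with $\gcd(a, n) = 1$, and let $h$ denote the order of $a$ modulo $n$. Then for $k \in \mathbb{N}$, the element $a^k$ has order $\frac{h}{\gcd(h,k)}$ modulo $n$.
Proof. By assumption, we have
$$a^s \equiv 1 \pmod n \iff h \mid s.$$
Therefore
$$\left(a^k\right)^s \equiv 1 \pmod n \iff h \mid ks \iff ks \equiv 0 \pmod h.$$
By Theorem 4.5 (1), this is equivalent to
$$s \equiv 0 \pmod{\frac{h}{\gcd(h,k)}} \iff \frac{h}{\gcd(h,k)} \mid s.$$
Hence the order of $a^k$ is $\frac{h}{\gcd(h,k)}$. □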
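Orders modulo $n$ can be computed by repeated multiplication, which is enough to recheck Examples 7.1 and Theorem 7.2 numerically. The function name `order` below is our own choice for this Python sketch.

```python
from math import gcd


def order(a, n):
    """Order of a modulo n: the least h >= 1 with a**h ≡ 1 (mod n). Requires gcd(a, n) = 1."""
    if gcd(a, n) != 1:
        raise ValueError("a must be coprime to n")
    h, power = 1, a % n
    while power != 1 % n:          # 1 % n handles n = 1, where every residue is 0
        power = (power * a) % n
        h += 1
    return h


print([order(a, 8) for a in (1, 3, 5, 7)])     # [1, 2, 2, 2]: E(8) is not cyclic
print([order(a, 5) for a in (1, 2, 3, 4)])     # [1, 4, 4, 2]: 2 and 3 are primitive roots
# Theorem 7.2: 2 has order 4 modulo 5, so 2^2 has order 4 / gcd(4, 2) = 2.
print(order(pow(2, 2, 5), 5))                   # 2
```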

Theorem 7.3. For a prime $p$, the group $E(p)$ is cyclic and there are precisely $\varphi(p-1)$ primitive roots modulo $p$.
Proof. The size of $E(p)$ is $\varphi(p) = p - 1$. Hence the order of every $a$ modulo $p$ is a divisor of $p - 1$.
For every $d \mid p - 1$, let $\psi(d)$ be the number of $a \in E(p)$ with order $d$. Since every $a$ has an order dividing $p - 1$, we have
$$\sum_{d \mid (p-1)} \psi(d) = \#E(p) = p - 1 = \varphi(p).$$
By Theorem 6.9, we also have
$$\sum_{d \mid (p-1)} \varphi(d) = p - 1 = \varphi(p).$$
The identity between these two formulas hints at the fact that $\psi(d) = \varphi(d)$, which we next prove.
Recall first that by Theorem 6.16, the congruence
$$x^d \equiv 1 \pmod p$$
has at most $d$ solutions. On the other hand, if $a$ has order $d$, then $a^j$ for $j \in \{0, \ldots, d-1\}$ all satisfy
$$\left(a^j\right)^d = \left(a^d\right)^j \equiv 1^j = 1 \pmod p,$$
and $a^j \not\equiv a^k \pmod p$ for $0 \leq j < k < d$ (since otherwise $a^{k-j} \equiv 1 \pmod p$, contradicting the fact that the order is $d$). Thus if $\psi(d) > 0$, then $a^j$ for $0 \leq j < d$ are precisely the solutions to $x^d \equiv 1 \pmod p$, and hence in particular any element of order $d$ must be of the form $a^j$ for some $j \in \{0, \ldots, d-1\}$.
Furthermore, by Theorem 7.2, the order of $a^j$ is $\frac{d}{\gcd(d,j)}$, and hence the element $a^j$ also has order $d$ if and only if $\gcd(d, j) = 1$. We conclude that if $\psi(d) > 0$, then
$$\psi(d) = \#\{j : 0 \leq j < d,\ \gcd(d, j) = 1\} = \varphi(d).$$
Therefore, for each $d \mid (p-1)$, we have $\psi(d) = 0$ or $\psi(d) = \varphi(d)$. If $\psi(d) = 0$ for some $d$, then
$$\sum_{d \mid (p-1)} \psi(d) < \sum_{d \mid (p-1)} \varphi(d) = \sum_{d \mid (p-1)} \psi(d),$$
which is a contradiction. Therefore $\psi(d) = \varphi(d)$ for every $d$, and we obtain in particular $\psi(p-1) = \varphi(p-1) > 0$. □

Example 7.4.
p φ(p − 1) Primitive roots modulo p
7 2 3, 5
17 8 3, 5, 6, 7, 10, 11, 12, 14
19 6 2, 3, 10, 13, 14, 15
41 16 6, 7, 11, 12, 13, 15, 17, 19, 22, 24, 26, 28, 29, 30, 34, 35.
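A table like the one above can be generated with a few lines of Python. This sketch (function names are ours) uses a standard shortcut that is not stated in these notes: since the order of $g$ divides $p - 1$, $g$ is a primitive root modulo $p$ if and only if $g^{(p-1)/q} \not\equiv 1 \pmod p$ for every prime $q \mid p - 1$.

```python
def prime_divisors(m):
    """Distinct primes dividing m, by trial division."""
    qs, q = [], 2
    while q * q <= m:
        if m % q == 0:
            qs.append(q)
            while m % q == 0:
                m //= q
        q += 1
    if m > 1:
        qs.append(m)
    return qs


def primitive_roots(p):
    """Primitive roots modulo the prime p, as residues in [1, p - 1].

    g has order p - 1 iff g^((p-1)/q) ≢ 1 (mod p) for every prime q dividing p - 1.
    """
    qs = prime_divisors(p - 1)
    return [g for g in range(1, p) if all(pow(g, (p - 1) // q, p) != 1 for q in qs)]


for p in (7, 17, 19, 41):
    print(p, len(primitive_roots(p)), primitive_roots(p))
```

For $p = 17$ this returns 3, 5, 6, 7, 10, 11, 12, 14; the residue $13 \equiv 3^4 \pmod{17}$ has order $16/\gcd(16, 4) = 4$ by Theorem 7.2, so it is not a primitive root.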
Theorem 7.5 (Criterion for cyclic groups). A finite group $G$ of size $m$ is cyclic if and only if for every divisor $d$ of $m$, there is at most one subgroup $H$ of $G$ with size $d$.
Proof. A slight generalization of Lagrange's Theorem (Theorem 4.13) shows that the size of a subgroup $H$ of $G$ is a divisor of $m$. Namely, one can show that every coset $xH$ has the same size as $H$, and that two cosets $xH$ and $yH$ are either equal or disjoint. Adding cosets one at a time, we get that $G$ is a disjoint union $\bigcup_{j=1}^{r} x_jH$, and hence $\#G/\#H = r$.

We first assume that $G$ is cyclic and generated by an element $x$, and will show that there is at most one subgroup of size $d$ for $d \mid m$.
For every $d \mid m$ with $d > 0$, $x^{m/d}$ generates a subgroup of size $d$. If $H$ is a subgroup of size $d$, then there exists a smallest $t \in \mathbb{N}$ with $x^t \in H$ (such $t$ exists because $x^m = e \in H$). If $x^{t'} \in H$, then for every $k, \ell \in \mathbb{Z}$, we have
$$x^{kt + \ell t'} = \left(x^t\right)^k\left(x^{t'}\right)^\ell \in H.$$
In particular, Bezout's Lemma (Theorem 3.10 (2)) implies that
$$x^{\gcd(t,t')} \in H.$$
Since $\gcd(t, t') \leq t$ and $t$ is the minimal positive exponent with $x^t \in H$, we conclude that $\gcd(t, t') = t$, or in other words $t \mid t'$. Since every element of $H$ is of the form $x^{t'}$, it follows that $H$ is generated by $x^t$. Since $x^m = e \in H$, we have $t \mid m$, and $x^t$ has order $\frac{m}{t}$, so $d = \frac{m}{t}$. Therefore $H$ is the subgroup generated by $x^{m/d}$. It follows that this is the unique subgroup of size $d$.
Now assume for the converse that for every divisor $d > 0$ of $m$, there exists at most one subgroup of size $d$, and let $\psi(d)$ denote the number of elements of $G$ with order $d$. If $\psi(d) > 0$ and $y \in G$ is an element of order $d$, then by the same argument as in the proof of Theorem 7.3, there are exactly $\varphi(d)$ elements of the form $y^j \in G$ (with $0 \leq j < d$) with order $d$. We conclude that if $\psi(d) > 0$, then $\psi(d) \geq \varphi(d)$. Moreover, letting
$$H = \langle y \rangle := \{y^j : j \in \mathbb{Z}\} = \{y^j : 0 \leq j < d\}$$
be the subgroup of $G$ generated by $y$, we see that $\#H = d$. If $\psi(d) > \varphi(d)$ for some $d \mid m$, then there must exist $z \in G$ with order $d$ and $z \notin \langle y \rangle$. However, the subgroup $\langle z \rangle$ also has size $d$ and $\langle z \rangle \neq \langle y \rangle$, contradicting the uniqueness of subgroups of size $d$. Therefore we conclude that either $\psi(d) = 0$ or $\psi(d) = \varphi(d)$. Since every element has some order dividing $m$, we have
$$\sum_{d \mid m} \psi(d) = \#G = m.$$
On the other hand, by Theorem 6.9,
$$\sum_{d \mid m} \varphi(d) = m,$$
so that
$$\sum_{d \mid m} \psi(d) = \sum_{d \mid m} \varphi(d).$$
We conclude that $\psi(d) = \varphi(d)$ for every $d \mid m$, and in particular $\psi(m) = \varphi(m) > 0$, from which we conclude that $G$ is cyclic. □

Theorem 7.6. Let $K$ be a field and $W$ be a finite subgroup of the group $K^\times$ of units. Then $W$ is cyclic.
Proof. Let $H \subseteq W$ be a subgroup of size $d$. The equation
$$x^d = 1$$
has at most $d$ solutions in the field $K$ (by polynomial long division). At the same time, by Lagrange's Theorem every element of $H$ satisfies the above equation, so $H$ is exactly the set of its solutions. The same holds for any subgroup of size $d$, and thus $H$ is the unique subgroup of size $d$. We conclude by Theorem 7.5 that $W$ is cyclic. □

Theorem 7.7 (Gauss). For n ∈ N, the following are equivalent.


(1) There exists a primitive root modulo n.
(2) The group E(n) of primitive residue classes is cyclic.
(3) We have $n = 1$, $n = 2$, $n = 4$, $n = p^a$, or $n = 2p^a$ with a prime $p \neq 2$ and $a \in \mathbb{N}$.
Proof.
The equivalence between (1) and (2) is clear because a primitive root generates $E(n)$.
We next prove (1) ⟹ (3).
We write $n = 2^a p_1^{a_1} \cdots p_r^{a_r}$ with $a \geq 0$ and distinct primes $p_j \neq 2$ with $a_j > 0$. By Theorem 6.9, if $\gcd(m, n) = 1$ then
$$\varphi(mn) = \varphi(m)\varphi(n),$$
and $\varphi(n)$ is even for all $n \geq 3$. Suppose for contradiction that (3) is not satisfied. Then one of the following holds:
(i) $r \geq 2$,
(ii) $r = 1$, $a \geq 2$,
(iii) $r = 0$, $a \geq 3$.
If either (i) or (ii) holds, then $\varphi(p_1^{a_1})$ and $\varphi(m)$ for $m = \frac{n}{p_1^{a_1}}$ are both even by Theorem 6.9.
Thus for every $c \in \mathbb{Z}$ with $\gcd(c, n) = 1$, Theorem 4.11 (2) implies that
$$c^{\frac{1}{2}\varphi(n)} = \left(c^{\varphi(p_1^{a_1})}\right)^{\frac{1}{2}\varphi(m)} \equiv 1 \pmod{p_1^{a_1}} \quad\text{and}\quad c^{\frac{1}{2}\varphi(n)} = \left(c^{\varphi(m)}\right)^{\frac{1}{2}\varphi(p_1^{a_1})} \equiv 1 \pmod m.$$
Hence it follows by Theorem 4.6 (3) that $c^{\frac{1}{2}\varphi(n)} \equiv 1 \pmod n$. Thus there is no $c$ with order $\varphi(n)$, contradicting the assumption of the existence of a primitive root in (1).
If (iii) holds, then $n = 2^a$ with $a \geq 3$, and the integers relatively prime to $n$ are $c = 2b + 1$, $b \in \mathbb{Z}$. Since $b(b+1)$ is even, one sees directly that
$$c^2 = 4b(b+1) + 1 \equiv 1 \pmod 8,$$
so that $c^2 = 8d + 1$ for some $d \in \mathbb{Z}$. Furthermore,
$$c^4 = 16d(4d+1) + 1 \equiv 1 \pmod{16},$$
and one concludes by induction that
$$c^{2^{a-2}} \equiv 1 \pmod{2^a} \quad\text{for all } a \geq 3.$$
Since $\varphi(2^a) = 2^{a-1}$, there is no primitive root $c$ modulo $2^a$. We have hence shown that (1) ⟹ (3).
Now suppose that (3) holds. Modulo 1, 2, and 4, we see that 3 is a primitive root.
Now suppose that $p \neq 2$ is prime. By Theorem 7.3, there exists a primitive root $c$ modulo $p$. For $b = c + pt$ with $t \in \mathbb{Z}$, the binomial theorem implies that
$$b^{p-1} = \sum_{j=0}^{p-1}\binom{p-1}{j}c^{p-1-j}t^jp^j \equiv c^{p-1} + (p-1)c^{p-2}pt \pmod{p^2}.$$
Since $p \nmid (p-1)c^{p-2}$, there exists $t \in \mathbb{Z}$ such that $b^{p-1} = 1 + n_1p$ with $n_1 \in \mathbb{Z}$ and $n_1 \not\equiv 0 \pmod p$. It follows that $b^{p-1} \not\equiv 1 \pmod{p^2}$. However,
$$b^{p(p-1)} = (1 + n_1p)^p = \sum_{j=0}^{p}\binom{p}{j}n_1^jp^j = 1 + n_2p^2$$
with $n_2 \equiv n_1 \not\equiv 0 \pmod p$. By induction, we obtain
$$b^{p^{k-1}(p-1)} = 1 + n_kp^k$$
with $n_k \equiv n_1 \not\equiv 0 \pmod p$ for all $k \geq 2$.
Let $h$ be the order of $b$ modulo $p^a$. Since $h \mid \varphi(p^a) = p^{a-1}(p-1)$, we have $h = p^s \cdot d$ with $s \leq a-1$ and $d \mid (p-1)$. We next show that $s = a-1$ and $d = p-1$. We first show that $s = a-1$. Using $b^h \equiv 1 \pmod{p^a}$, we conclude that
$$b^{p^s(p-1)} = \left(b^h\right)^{\frac{p-1}{d}} \equiv 1 \pmod{p^a}.$$
Using the construction above of the integers $n_k$ with $k = s+1$, we obtain
$$1 + n_{s+1}p^{s+1} = b^{p^s(p-1)} \equiv 1 \pmod{p^a}.$$
Since $p \nmid n_{s+1}$, it follows that $s + 1 \geq a$, or in other words $s \geq a - 1$. However, we have already seen that $s \leq a - 1$, and hence $s = a - 1$.
Repeatedly using Fermat's Little Theorem (Theorem 4.11 (3)), we have
$$b^d \equiv \left(b^d\right)^{p^s} = b^h \equiv 1 \pmod p.$$
Since $b \equiv c \pmod p$ is a primitive root modulo $p$, it follows that $(p-1) \mid d$. Hence, since $d \mid (p-1)$, we have shown that $d = p-1$, and overall that
$$h = p^{a-1}(p-1) = \varphi(p^a).$$
Thus $b$ is a primitive root modulo $p^a$ for all $a \geq 1$.
We finally consider the case $n = 2p^a$. Recall that by Corollary 6.6, we have
$$\varphi(2p^a) = \varphi(2)\varphi(p^a) = \varphi(p^a).$$
For the primitive root $b$ modulo $p^a$ constructed above, we note that both $b$ and $b + p^a$ are primitive roots modulo $p^a$, and precisely one of $b$ or $b + p^a$ is odd. The odd choice $b'$ is coprime to $2p^a$, and its order modulo $2p^a$ is a multiple of its order $\varphi(p^a)$ modulo $p^a$ and divides $\varphi(2p^a) = \varphi(p^a)$. Hence $b'$ is a primitive root modulo $2p^a$. Thus we have proven that (3) ⟹ (1). □
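Theorem 7.7 can be confirmed for small $n$ by brute force. In the Python sketch below (function names are ours), `has_primitive_root` searches $E(n)$ for an element of order $\varphi(n)$, and `gauss_condition` tests condition (3) directly from the factorization.

```python
from math import gcd


def has_primitive_root(n):
    """Brute force: is E(n) cyclic, i.e. does some unit modulo n have order phi(n)?"""
    units = [a for a in range(1, n + 1) if gcd(a, n) == 1]
    phi_n = len(units)
    for a in units:
        h, power = 1, a % n            # compute the order h of a modulo n
        while power != 1 % n:
            power = (power * a) % n
            h += 1
        if h == phi_n:
            return True
    return False


def gauss_condition(n):
    """Condition (3) of Theorem 7.7: n in {1, 2, 4}, or n = p^a or 2p^a for an odd prime p."""
    if n in (1, 2, 4):
        return True
    m = n // 2 if n % 2 == 0 else n    # remove one factor of 2
    if m % 2 == 0:
        return False                    # 4 divides n and n != 4
    p = next(d for d in range(3, m + 1) if m % d == 0)   # smallest prime factor of m
    while m % p == 0:
        m //= p
    return m == 1                       # m was a power of the single odd prime p


print([n for n in range(1, 40) if has_primitive_root(n)])
```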
Remarks.
(1) Let $p \neq 2$ be prime. Then one sees from the above proof that there exists $b \in \mathbb{Z}$ which is a primitive root modulo $2p^a$ (and modulo $p^a$) for every $a \geq 1$ (in particular, $b$ depends on $p$, but it is independent of $a$). In particular, if $b$ is a primitive root modulo $p^2$, then it is a primitive root modulo $p^a$. Furthermore, if $b$ is a primitive root modulo $p$, then either $b$ or $b + p$ (or both) is a primitive root modulo $p^a$.
(2) By induction, we have
$$5 \equiv 1 + 2^2 \pmod{2^3}, \quad 5^2 \equiv 1 + 2^3 \pmod{2^4}, \quad \ldots, \quad 5^{2^{a-3}} \equiv 1 + 2^{a-1} \pmod{2^a}$$
for all $a \geq 3$. Thus 5 has order $2^{a-2}$ modulo $2^a$, and 5 generates a cyclic subgroup $H$ of size $2^{a-2} = \frac{1}{2}\varphi(2^a)$ in $E(2^a)$. Since no primitive roots exist for $a \geq 3$, this is the largest cyclic subgroup.
Suppose for contradiction that $-1 \in H$. Then there exists $0 < r < 2^{a-2}$ for which
$$5^r \equiv -1 \pmod{2^a}.$$
We conclude that
$$5^{2r} \equiv (-1)^2 = 1 \pmod{2^a},$$
so that $2^{a-2} \mid 2r$; since $0 < 2r < 2^{a-1}$, this forces $2r = 2^{a-2}$, and hence $r = 2^{a-3}$. We conclude that
$$5^{2^{a-3}} \equiv -1 \pmod{2^a},$$
and from above
$$5^{2^{a-3}} \equiv 1 + 2^{a-1} \pmod{2^a},$$
so that $2^{a-1} \equiv -2 \pmod{2^a}$, which is a contradiction for $a \geq 3$.
Therefore, for $a \geq 3$ we have $-1 \notin H$, and $E(2^a)$ is the direct product of the cyclic subgroups $\langle 5 \rangle$ and $\langle -1 \rangle$ generated by 5 and $-1$. In particular, we have
$$E(2^a) \simeq \mathbb{Z}_{2^{a-2}} \times \mathbb{Z}_2.$$
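Remark (2) can be checked numerically for small $a$. The Python sketch below (the function name is ours) verifies that the $2 \cdot 2^{a-2}$ products $\pm 5^k$ with $0 \leq k < 2^{a-2}$ hit every odd residue modulo $2^a$ exactly once, and that $5^{2^{a-3}} \equiv 1 + 2^{a-1} \pmod{2^a}$.

```python
def check_two_power_units(a):
    """For a >= 3: every odd residue mod 2^a is ±5^k for exactly one sign and one
    k in [0, 2^(a-2)), and 5^(2^(a-3)) ≡ 1 + 2^(a-1) (mod 2^a)."""
    m = 2**a
    reps = [(sign * pow(5, k, m)) % m for sign in (1, -1) for k in range(2**(a - 2))]
    all_odd_once = sorted(reps) == list(range(1, m, 2))   # same length, so a bijection
    congruence = pow(5, 2**(a - 3), m) == 1 + 2**(a - 1)
    return all_odd_once and congruence


print(all(check_two_power_units(a) for a in range(3, 13)))  # True
```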

Definition. Suppose that there exists a primitive root $g$ modulo $n \in \mathbb{N}$. Then $g, g^2, g^3, \ldots, g^{\varphi(n)}$ form a reduced residue system modulo $n$. Thus for every $a \in \mathbb{Z}$ with $\gcd(a, n) = 1$, there exists a unique $k$ with $1 \leq k \leq \varphi(n)$ and $a \equiv g^k \pmod n$. One calls $k$ the index of $a$ with respect to the base $g$ modulo $n$ (or the discrete logarithm of $a$ to the base $g$). One writes
$$k = \mathrm{ind}_g(a) = \mathrm{ind}_g(a, n) = \mathrm{ind}(a, n; g).$$
The index satisfies the rules
$$\mathrm{ind}_g(xy) \equiv \mathrm{ind}_g(x) + \mathrm{ind}_g(y) \pmod{\varphi(n)}, \qquad \mathrm{ind}_g(x^m) \equiv m\,\mathrm{ind}_g(x) \pmod{\varphi(n)}.$$
One sees that the map $k \pmod{\varphi(n)} \mapsto g^k \pmod n$ defines an isomorphism
$$(\mathbb{Z}/\varphi(n)\mathbb{Z}, +) \to E(n).$$
Example 7.8. We would like to solve the congruence
$$x^{10} \equiv 13 \pmod{17}.$$
We use the primitive root $g = 3$ modulo 17. This yields the following table giving the correspondence between the index and the residue classes (to go from $k$ to $k + 1$, one simply multiplies by 3 and then reduces modulo 17):
k       1  2  3  4  5  6  7  8  9  10 11 12 13 14 15 16
3^k     3  9  10 13 5  15 11 16 14 8  7  4  12 2  6  1
Written in reverse, we get the following indices for the residue classes:
x         1  2  3  4  5  6  7  8  9  10 11 12 13 14 15 16
ind_3(x)  16 14 1  12 5  15 11 10 2  3  7  13 4  9  6  8
Since $13 \equiv 3^4 \pmod{17}$ by the above table, the congruence $x^{10} \equiv 13 \pmod{17}$ is equivalent to $x^{10} \equiv 3^4 \pmod{17}$, or in other words $3^{10\,\mathrm{ind}_3(x)} \equiv 3^4 \pmod{17}$. Since $3^a \equiv 3^b \pmod{17}$ if and only if $a \equiv b \pmod{16}$ (because 3 is a primitive root modulo 17 and $\varphi(17) = 16$), this may be written as the linear congruence
$$10\,\mathrm{ind}_3(x) \equiv \mathrm{ind}_3(13) = 4 \pmod{16},$$
which by Theorem 4.5 (1) is in turn equivalent to
$$5\,\mathrm{ind}_3(x) \equiv 2 \pmod 8.$$
Using Theorem 4.5 (1) with $a = 5$ and noting that $5^2 \equiv 1 \pmod 8$, we conclude that this is equivalent to
$$\mathrm{ind}_3(x) \equiv 2 \pmod 8.$$
Therefore $\mathrm{ind}_3(x) = 2$ or $\mathrm{ind}_3(x) = 10$ (since $1 \leq \mathrm{ind}_3(x) \leq 16$). From the above table, we see that $x = 9$ or $x = 8$ (corresponding to $\mathrm{ind}_3(x) = 2$ and $\mathrm{ind}_3(x) = 10$, respectively) give the solutions to the congruence.
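The index table and the reduction to a linear congruence can be reproduced in a few lines of Python. This is a sketch for this particular example only; the variable names are ours.

```python
p, g = 17, 3
# ind_3(x) for x = 1, ..., 16, with the convention 1 <= k <= 16 used in the notes
ind = {pow(g, k, p): k for k in range(1, p)}
print([ind[x] for x in range(1, p)])
# x^10 ≡ 13 (mod 17)  <=>  10 * ind_3(x) ≡ ind_3(13) (mod 16)
solutions = [x for x in range(1, p) if (10 * ind[x] - ind[13]) % (p - 1) == 0]
print(solutions)   # [8, 9]
```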
Question. From the above calculation, we see that it is useful to compute the indices $\mathrm{ind}_g(x, p)$ for a given $p$, $g$, and $x$ (with $p$ prime). How can one do this quickly?
The "giant steps - baby steps" algorithm of Shanks takes about $O(\sqrt{p}\log p)$ operations to do this.
Given $p$ and $g$, let $q \in \mathbb{N}$ be minimal with $q(q+1) \geq p$. One next computes and saves $g^0 = 1, g^q, g^{2q}, \ldots, g^{q \cdot q}$ (these are the "giant steps"). Suppose that $k = \mathrm{ind}_g(x)$ is written in the form
$$k = \ell q + r$$
with $0 \leq \ell \leq q$ and $0 \leq r \leq q-1$ ($k$ may always be written in this form because $q(q+1) \geq p$ by assumption). Then we have
$$g^{\ell q + r} = g^k \equiv x \pmod p,$$
and hence in particular
$$g^{\ell q} \equiv xg^{-r} \pmod p.$$
We thus compute $x, xg^{-1}, xg^{-2}, \ldots, xg^{1-q}$ modulo $p$ (in practice one can stop at the first $xg^{-r}$ for which the above congruence holds) and compare them with the numbers $g^{\ell q}$ that we have already computed (these are the "baby steps").
Doing this, one obtains a pair $r, \ell$ with
$$xg^{-r} \equiv g^{\ell q} \pmod p,$$
from which we conclude that $k = \mathrm{ind}_g(x) \equiv \ell q + r \pmod{p-1}$.
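The steps above can be sketched in Python as follows (the name `bsgs` is ours). The giant steps are stored in a dictionary so that each baby step is compared in constant expected time, a common implementation choice that is not prescribed by the notes. The result is normalized to $1 \leq k \leq p-1$ as in the definition of the index, and `pow(g, -1, p)` needs Python 3.8+.

```python
def bsgs(g, x, p):
    """Baby-step giant-step: return k with 1 <= k <= p - 1 and g^k ≡ x (mod p).

    Assumes p is prime, g is a primitive root mod p, and p does not divide x.
    """
    q = 1
    while q * (q + 1) < p:       # minimal q with q(q+1) >= p
        q += 1
    gq = pow(g, q, p)
    giant = {}                   # g^(l*q) mod p  ->  l, for l = 0, ..., q
    value = 1
    for l in range(q + 1):
        giant.setdefault(value, l)
        value = (value * gq) % p
    g_inv = pow(g, -1, p)
    baby = x % p                 # x * g^(-r) mod p, for r = 0, 1, ..., q - 1
    for r in range(q):
        if baby in giant:
            k = giant[baby] * q + r
            return (k - 1) % (p - 1) + 1   # normalize to 1 <= k <= p - 1
        baby = (baby * g_inv) % p
    raise ValueError("no index found; is g a primitive root mod p?")


print(bsgs(3, 13, 17))   # 4, matching ind_3(13) = 4 in Example 7.8
```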

Definition. If $x^n \equiv a \pmod p$ is solvable in $\mathbb{Z}$, then we call $a$ an $n$-th power modulo $p$. Here $p$ is prime and $p \nmid a$.
In particular, one calls $a$ a quadratic residue (resp. quadratic non-residue) modulo $p$ if $x^2 \equiv a \pmod p$ is solvable (resp. not solvable) in $\mathbb{Z}$.

Theorem 7.9. Let $p$ be a prime, $n \in \mathbb{N}$, and $a \in \mathbb{Z}$ with $p \nmid a$ be given. Then $x^n \equiv a \pmod p$ has either exactly $d = \gcd(n, p-1)$ solutions or no solutions, depending on whether $a^{\frac{p-1}{d}} \equiv 1 \pmod p$ or not. In particular, if $d = 1$, then $x^n \equiv a \pmod p$ has a unique solution.

Proof. Suppose that $x_0$ is a solution. Then by Fermat's Little Theorem (Theorem 4.11 (3)), we have
$$a^{\frac{p-1}{d}} \equiv \left(x_0^n\right)^{\frac{p-1}{d}} = \left(x_0^{p-1}\right)^{\frac{n}{d}} \equiv 1 \pmod p.$$
Thus, by contrapositive, there are no solutions if $a^{\frac{p-1}{d}} \not\equiv 1 \pmod p$.
We now conversely assume that $a^{\frac{p-1}{d}} \equiv 1 \pmod p$. Let $g$ be a primitive root modulo $p$ and set $k := \mathrm{ind}_g(a)$, so that $a \equiv g^k \pmod p$. Then we have
$$g^{\frac{k(p-1)}{d}} \equiv a^{\frac{p-1}{d}} \equiv 1 \pmod p.$$
Since $g$ is a primitive root, it follows that
$$\frac{k(p-1)}{d} \equiv 0 \pmod{p-1},$$
and hence $d \mid k$. Setting $y := \mathrm{ind}_g(x)$, the congruence
$$g^{ny} \equiv x^n \equiv a \equiv g^k \pmod p$$
is equivalent to
$$ny \equiv k \pmod{p-1}.$$
Since $d \mid k$, Theorem 6.1 gives precisely $d$ solutions $y$ modulo $p - 1$ to this linear congruence. Hence there are also $d$ solutions to the equivalent congruence $x^n \equiv a \pmod p$. □
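Theorem 7.9 is easy to test exhaustively for small primes. The Python sketch below (function names are ours) compares a brute-force count of solutions with the prediction of the theorem.

```python
from math import gcd


def count_nth_roots(n, a, p):
    """Number of x mod p with x^n ≡ a (mod p), by brute force."""
    return sum(1 for x in range(p) if pow(x, n, p) == a % p)


def predicted_count(n, a, p):
    """Theorem 7.9 (p prime, p ∤ a): d = gcd(n, p - 1) solutions if a^((p-1)/d) ≡ 1 (mod p), else none."""
    d = gcd(n, p - 1)
    return d if pow(a, (p - 1) // d, p) == 1 else 0


print(count_nth_roots(10, 13, 17), predicted_count(10, 13, 17))   # 2 2, as in Example 7.8
```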
Theorem 7.10 (Legendre). Suppose that $p \neq 2$ is prime and $a \in \mathbb{Z}$ with $p \nmid a$. Then $x^2 \equiv a \pmod p$ has precisely two solutions in the case $a^{\frac{1}{2}(p-1)} \equiv 1 \pmod p$ and no solutions in the case $a^{\frac{1}{2}(p-1)} \equiv -1 \pmod p$.
Proof. One uses Theorem 7.9 (with $n = 2$, so that $d = \gcd(2, p-1) = 2$) and notes that
$$\left(a^{\frac{1}{2}(p-1)}\right)^2 = a^{p-1} \equiv 1 \pmod p$$
implies that either $a^{\frac{1}{2}(p-1)} \equiv 1 \pmod p$ or $a^{\frac{1}{2}(p-1)} \equiv -1 \pmod p$ (note that $\pm 1$ are the two solutions in Theorem 7.9 for $a = 1$). □

Special case: $a = -1$
In this case
$$a^{\frac{1}{2}(p-1)} = (-1)^{\frac{1}{2}(p-1)} \equiv 1 \pmod p$$
is equivalent to $(-1)^{\frac{1}{2}(p-1)} = 1$, which is in turn equivalent to $p \equiv 1 \pmod 4$. So $x^2 \equiv -1 \pmod p$ is solvable if and only if $p \equiv 1 \pmod 4$. To state this another way, we see that $-1$ is a quadratic residue if and only if $p \equiv 1 \pmod 4$.
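Theorem 7.10 gives a quick test for quadratic residues, often called Euler's criterion. The Python sketch below (function name ours) applies it and recovers the special case $a = -1$.

```python
def euler_criterion(a, p):
    """For an odd prime p with p ∤ a: 1 if x^2 ≡ a (mod p) is solvable, else -1 (Theorem 7.10)."""
    r = pow(a, (p - 1) // 2, p)   # by the proof above, r is 1 or p - 1
    return 1 if r == 1 else -1


odd_primes = [3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41]
# Special case a = -1: solvable exactly when p ≡ 1 (mod 4)
print([p for p in odd_primes if euler_criterion(-1, p) == 1])   # [5, 13, 17, 29, 37, 41]
```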
We are next going to investigate rational numbers and a connection between repeating decimal expansions and the order of $10$ in the multiplicative group $(\mathbb{Z}/c\mathbb{Z})^\times$ for $c$ satisfying $\gcd(c, 10) = 1$. We begin by reviewing decimal expansions.

Definition. Let $\alpha \in \mathbb{Q}$ with $0 < \alpha < 1$ be given and write it as $\alpha = \frac{a}{b}$ with $a, b \in \mathbb{Z}$ satisfying $\gcd(a, b) = 1$. The decimal expansion
$$\alpha = 0.a_1 a_2 a_3 \ldots = \sum_{\nu=1}^{\infty} a_\nu 10^{-\nu}$$
with $a_\nu \in \{0, 1, \ldots, 9\}$ is unique as long as one does not allow an expansion where $a_\nu = 9$ for every $\nu > \nu_0$.
One calls the decimal expansion repeating (also known as recurring) if there exist integers $r \geq 0$ and $s > 0$ such that
$$a_{\nu+s} = a_\nu$$
for every $\nu > r$; in other words, the digits $a_\nu$ repeat with period $s$ after an initial $r$ digits. If $r$ and $s$ are chosen minimally with this property, then we call $s$ the length of the period, and $a_{r+1} \ldots a_{r+s}$ is called the repetend.
A common notation for a repeating decimal expansion is
$$\alpha = 0.a_1 \ldots a_r \overline{a_{r+1} \ldots a_{r+s}}.$$
Here the line is written over the repetend; other notations include putting dots above the digits in the repetend or separating the repetend with a space and writing $\ldots$ afterwards, such as writing $0.357\,232\ldots$ to mean $0.357\overline{232}$.
We furthermore say that a decimal expansion is terminating if it has finite length (or, equivalently, if it is a repeating decimal expansion whose repetend is $0$; note, however, that many authors exclude terminating decimal expansions when defining repeating decimals).

Examples 7.11.
$$\frac{1}{15} = 0.0\overline{6}, \qquad \frac{1}{7} = 0.\overline{142857}.$$
Theorem 7.12. A decimal expansion represents a rational number if and only if it
is either repeating or terminating.

Proof. The decimal representation of $\alpha = \frac{a}{b}$ for $0 < \alpha < 1$ is obtained by noting that since $a < b$ (from $\alpha < 1$), we have $10a < 10b$, and hence $10a = a_1 b + r$ with $0 \leq a_1 < 10$ and $0 \leq r < b$. This yields the first decimal $a_1$ in the expansion, and $\frac{r}{b} < 1$ allows us to continue recursively; applying this method, we obtain
$$\begin{aligned}
a &= r_1, & 0 &< r_1 < b,\\
10 r_1 &= a_1 b + r_2, & 0 &\leq r_2 < b,\\
10 r_2 &= a_2 b + r_3, & 0 &\leq r_3 < b,\\
&\;\;\vdots
\end{aligned}$$
Hence
$$\alpha = \frac{a}{b} = \frac{10 r_1}{10 b} = \frac{a_1}{10} + \frac{r_2}{10 b} = \frac{a_1}{10} + \frac{a_2}{10^2} + \frac{r_3}{10^2 b} = \cdots = 0.a_1 a_2 a_3 \ldots.$$
The decimal representation terminates if and only if $r_n = 0$ for some $n$. Otherwise, since $r_\nu \in \{0, 1, \ldots, b-1\}$ has only finitely many possible values, there exist $\mu$ and $\nu$ with $\mu < \nu$ and $r_\mu = r_\nu$. The decimal expansions of $r_\nu/b$ and $r_\mu/b$ are hence identical, and we see from the algorithm above that
$$a_k = a_{k+(\nu-\mu)}$$
for all $k \geq \mu$. Thus the decimal expansion is repeating (with period length at most $b$).
To show the converse, we assume that the decimal representation of $\alpha$ is periodic, writing $\alpha = 0.a_1 \ldots a_r \overline{a_{r+1} \ldots a_{r+s}}$. Below, a string of digits such as $a_1 \ldots a_r$ written without a decimal point denotes the integer with those decimal digits. If the expansion terminates (i.e., if $a_{r+j} = 0$ for all $j > 0$), then
$$\alpha = \frac{1}{10^r}\, a_1 \ldots a_r = \sum_{\nu=1}^{r} a_\nu 10^{-\nu}$$
is rational. Otherwise, we have
$$10^{r+s}\alpha - 10^r \alpha = a_1 \ldots a_r a_{r+1} \ldots a_{r+s} - a_1 \ldots a_r =: a \in \mathbb{N}.$$
Thus $\alpha = \frac{a}{10^r(10^s - 1)} \in \mathbb{Q}$. □
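The long-division procedure in the proof translates directly into code. The sketch below (the function name `decimal_expansion` is ours) records each remainder $r_\nu$ together with the position where it first occurred, and stops as soon as a remainder is $0$ (terminating case) or repeats (the first repeated remainder marks the start of the repetend).

```python
def decimal_expansion(a: int, b: int):
    """Long division of a/b for 0 < a < b, following the proof of Theorem 7.12.

    Returns (digits before the period, repetend digits); the repetend is []
    when the expansion terminates.
    """
    digits = []
    seen = {}                      # remainder r_nu -> index at which it first occurred
    r = a                          # r_1 = a
    while r != 0 and r not in seen:
        seen[r] = len(digits)
        q, r = divmod(10 * r, b)   # 10 r_nu = a_nu * b + r_{nu+1}
        digits.append(q)
    if r == 0:                     # some remainder vanished: terminating expansion
        return digits, []
    start = seen[r]                # first repeated remainder r_mu = r_nu
    return digits[:start], digits[start:]


print(decimal_expansion(1, 15))   # ([0], [6])
print(decimal_expansion(1, 7))    # ([], [1, 4, 2, 8, 5, 7])
```

Because each remainder determines all later digits, the first repeated remainder gives the minimal preperiod and period.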
Theorem 7.13. Suppose that $\alpha = \frac{a}{b} \in \mathbb{Q}$ with $0 < a < b$, $\gcd(a, b) = 1$, and $b = 2^u 5^v c$, where $u \geq 0$, $v \geq 0$, $c \geq 1$, and $\gcd(c, 10) = 1$.
The length $s$ of the period and the length $r$ of the decimals preceding the periodic part in the decimal expansion of $\alpha$ are given by $r = \max\{u, v\}$ and by $s$ being the order of $10$ in the group $E(c)$ of primitive residue classes modulo $c$.
In particular, $s \mid \varphi(c)$, and $s = \varphi(c)$ if and only if $10$ is a primitive root modulo $c$.
The decimal expansion terminates if and only if $c = 1$.
Proof. Suppose that $\alpha = 0.a_1 \ldots a_r \overline{a_{r+1} \ldots a_{r+s}}$. Then, as decimal representations,
$$\begin{aligned}
10^{r+s}\alpha &= a_1 \ldots a_r a_{r+1} \ldots a_{r+s}.\overline{a_{r+1} \ldots a_{r+s}},\\
10^{r}\alpha &= a_1 \ldots a_r.\overline{a_{r+1} \ldots a_{r+s}},
\end{aligned}$$
so that
$$\left(10^{r+s} - 10^r\right)\frac{a}{b} = 10^{r+s}\alpha - 10^r\alpha = a_1 \ldots a_r a_{r+1} \ldots a_{r+s} - a_1 \ldots a_r \in \mathbb{N}.$$
Therefore $b \mid 10^r(10^s - 1)a$, and since $\gcd(a, b) = 1$, it follows that $b \mid 10^r(10^s - 1)$. Since $10^s - 1$ is coprime to $10$ and $\gcd(c, 10) = 1$, this in turn implies that $2^u 5^v \mid 10^r$ and $c \mid (10^s - 1)$. It follows that $r \geq \max\{u, v\}$ and $10^s \equiv 1 \pmod{c}$.
We next assume that $2^u 5^v \mid 10^q$ and $c \mid (10^t - 1)$ for some $q \leq r$ and $t \leq s$. Then we have
$$10^{q+t}\alpha - 10^q\alpha = 10^q\left(10^t - 1\right)\frac{a}{b} = \frac{10^q}{2^u 5^v} \cdot \frac{10^t - 1}{c} \cdot a = a x$$
for some $x \in \mathbb{Z}$. It follows that $\alpha$ has period length at most $t$ and at most $q$ decimals before the periodic part of the decimal expansion. Since $r$ and $s$ were chosen minimally with this property, we conclude that $q \geq r$ and $t \geq s$. It follows that $q = r$ and $t = s$.
Therefore $r$ and $s$ are the minimal $q$ and $t$ for which $2^u 5^v \mid 10^q$ and $c \mid (10^t - 1)$. The minimal such $q$ is directly seen to be $\max\{u, v\}$, and the minimal $t \geq 1$ with $10^t \equiv 1 \pmod{c}$ is by definition the order of $10$ modulo $c$. This yields the claim. □
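Theorem 7.13 lets one read off $r$ and $s$ from the factorization of $b$ without performing any long division. A minimal sketch, assuming $0 < a < b$ and $\gcd(a, b) = 1$; the function names are illustrative, and reporting $s = 0$ for terminating expansions is a convention of this sketch only.

```python
from math import gcd


def multiplicative_order(g: int, c: int) -> int:
    """Smallest t >= 1 with g^t = 1 (mod c); assumes c > 1 and gcd(g, c) = 1."""
    t, x = 1, g % c
    while x != 1:
        x = (x * g) % c
        t += 1
    return t


def period_data(a: int, b: int) -> tuple:
    """(r, s) from Theorem 7.13 for a/b with 0 < a < b and gcd(a, b) = 1."""
    assert 0 < a < b and gcd(a, b) == 1
    u = v = 0
    c = b
    while c % 2 == 0:          # strip the factor 2^u
        c //= 2
        u += 1
    while c % 5 == 0:          # strip the factor 5^v
        c //= 5
        v += 1
    r = max(u, v)
    s = 0 if c == 1 else multiplicative_order(10, c)   # s = 0 marks termination
    return r, s


print(period_data(1, 15))   # (1, 1):  1/15 = 0.0(6)
print(period_data(1, 7))    # (0, 6):  1/7  = 0.(142857)
print(period_data(1, 12))   # (2, 1):  1/12 = 0.08(3)
```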

A problem of Abel:
For which primes $p$ does $\frac{1}{p^2}$ have the same period length as $\frac{1}{p}$?
For example, for $p = 3$, we have
$$\frac{1}{3} = 0.\overline{3}, \qquad \frac{1}{9} = 0.\overline{1}.$$
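For $p \neq 2, 5$, Theorem 7.13 turns Abel’s question into a comparison of the orders of $10$ modulo $p$ and modulo $p^2$. Since the order modulo $p$ divides the order modulo $p^2$, the two agree exactly when $10^{\operatorname{ord}_p(10)} \equiv 1 \pmod{p^2}$. The Python sketch below is only a small exploration over the primes below $1000$ (helper names ours), not a solution of the problem.

```python
def order_of_10(m: int) -> int:
    """Multiplicative order of 10 modulo m; assumes m > 1 and gcd(m, 10) = 1."""
    t, x = 1, 10 % m
    while x != 1:
        x = (x * 10) % m
        t += 1
    return t


def is_prime(n: int) -> bool:
    """Trial-division primality test, adequate for small n."""
    return n >= 2 and all(n % d for d in range(2, int(n ** 0.5) + 1))


# Primes p != 2, 5 below 1000 for which 1/p^2 and 1/p have the same period
# length, i.e. ord_{p^2}(10) == ord_p(10), i.e. 10^(ord_p(10)) = 1 (mod p^2).
same_period = [
    p for p in range(3, 1000)
    if is_prime(p) and p != 5 and pow(10, order_of_10(p), p * p) == 1
]
print(same_period)
```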

8. Quadratic reciprocity
Note that for $p \neq 2$ prime and $p \nmid a$, by completing the square, the quadratic congruence
$$ax^2 + bx + c \equiv 0 \pmod{p}$$
is equivalent (multiplying by $4a$, which is invertible modulo $p$) to
$$(2ax + b)^2 + (4ac - b^2) \equiv 0 \pmod{p},$$
and hence equivalent to the system
$$y^2 \equiv d \pmod{p}, \qquad 2ax + b \equiv y \pmod{p},$$
where $d := b^2 - 4ac$. Since we studied linear congruences already in Theorem 6.1, we next consider $x^2 \equiv a \pmod{p}$ and ask the following questions:
Questions.
(1) Given fixed $p$, for which $a$ is $x^2 \equiv a \pmod{p}$ solvable?
(2) For fixed $a$, which $p$ yield a solution to $x^2 \equiv a \pmod{p}$?
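Definition. For $a \in \mathbb{Z}$ and $p \neq 2$ prime, we define the Legendre symbol $\left(\frac{a}{p}\right)$ by
$$\left(\frac{a}{p}\right) := \begin{cases} 1 & \text{if } p \nmid a \text{ and } x^2 \equiv a \pmod{p} \text{ is solvable},\\ -1 & \text{if } p \nmid a \text{ and } x^2 \equiv a \pmod{p} \text{ is not solvable},\\ 0 & \text{if } p \mid a. \end{cases}$$
The number of solutions to $x^2 \equiv a \pmod{p}$ in every case is given by $1 + \left(\frac{a}{p}\right)$.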
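The completing-the-square reduction from the beginning of this section can be sketched in Python (3.8+ for `pow(x, -1, p)`). The function name and the brute-force search for square roots of $d$ are illustrative choices meant only for small $p$.

```python
def solve_quadratic_mod_p(a: int, b: int, c: int, p: int) -> list:
    """All x mod p with a x^2 + b x + c = 0 (mod p); p an odd prime, p not dividing a.

    Reduction: y^2 = d (mod p) with d = b^2 - 4ac, then 2a x + b = y (mod p).
    """
    assert p > 2 and a % p != 0
    d = (b * b - 4 * a * c) % p
    ys = [y for y in range(p) if (y * y - d) % p == 0]   # square roots of d (brute force)
    inv_2a = pow(2 * a, -1, p)     # 2a is invertible because p is odd and p does not divide a
    return sorted({(y - b) * inv_2a % p for y in ys})


print(solve_quadratic_mod_p(1, 1, 1, 7))   # x^2 + x + 1 = 0 (mod 7) -> [2, 4]
```

Each square root $y$ of $d$ gives exactly one $x$, so the number of solutions equals the number of square roots of $d$ modulo $p$.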

By Theorems 7.9 and 7.10, we have the following.


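Theorem 8.1 (Legendre). For $a \in \mathbb{Z}$ and $p \neq 2$ prime, the number of solutions to $x^2 \equiv a \pmod{p}$ equals $1 + \left(\frac{a}{p}\right)$, and furthermore, for $p \nmid a$, the Legendre symbol is evaluated by
$$\left(\frac{a}{p}\right) = \begin{cases} 1 & \text{if } a^{\frac{1}{2}(p-1)} \equiv 1 \pmod{p},\\ -1 & \text{if } a^{\frac{1}{2}(p-1)} \equiv -1 \pmod{p}. \end{cases}$$
Remark 8.2. The resulting congruence
$$\left(\frac{a}{p}\right) \equiv a^{\frac{p-1}{2}} \pmod{p}$$
is called Euler’s criterion, which first appeared in a paper of Euler in 1748.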
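Euler’s criterion evaluates the Legendre symbol with a single modular exponentiation. The sketch below (function names ours) compares it with the solution count $1 + \left(\frac{a}{p}\right)$ obtained by brute force for small primes, including values of $a$ divisible by $p$.

```python
def legendre(a: int, p: int) -> int:
    """Legendre symbol (a/p) for an odd prime p, computed by Euler's criterion."""
    t = pow(a, (p - 1) // 2, p)      # t is 0, 1 or p - 1
    return -1 if t == p - 1 else t


def count_square_roots(a: int, p: int) -> int:
    """Number of x mod p with x^2 = a (mod p), by brute force."""
    return sum(1 for x in range(p) if (x * x - a) % p == 0)


for p in (3, 5, 7, 11, 13, 17, 19, 23, 29, 31):
    for a in range(-40, 41):
        assert count_square_roots(a, p) == 1 + legendre(a, p)
```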
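Theorem 8.3. Suppose that $a, b, p \in \mathbb{Z}$, with $p > 2$ prime and $p \nmid ab$. Then the following hold.
(1) If $a \equiv b \pmod{p}$, then
$$\left(\frac{a}{p}\right) = \left(\frac{b}{p}\right).$$
(2) We have Euler’s criterion, namely $\left(\frac{a}{p}\right) \equiv a^{\frac{1}{2}(p-1)} \pmod{p}$.
(3) We have
$$\left(\frac{ab}{p}\right) = \left(\frac{a}{p}\right)\left(\frac{b}{p}\right), \qquad \left(\frac{a^2}{p}\right) = 1, \qquad \left(\frac{a^2 b}{p}\right) = \left(\frac{b}{p}\right), \qquad \left(\frac{1}{p}\right) = 1.$$
The quadratic residues (those $a$ with solutions to $x^2 \equiv a \pmod{p}$) form a subgroup of $E(p)$.
(4) We have
$$\left(\frac{-1}{p}\right) = (-1)^{\frac{1}{2}(p-1)}.$$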
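Proof.
(1) This follows directly from the definition.
(2) This follows by Theorem 8.1.
(3) By part (2), for all $c$ we have
$$\left(\frac{c}{p}\right) \equiv c^{\frac{1}{2}(p-1)} \pmod{p}.$$
Therefore
$$\left(\frac{ab}{p}\right) \equiv (ab)^{\frac{1}{2}(p-1)} = a^{\frac{1}{2}(p-1)} b^{\frac{1}{2}(p-1)} \equiv \left(\frac{a}{p}\right)\left(\frac{b}{p}\right) \pmod{p}.$$
Noting that the Legendre symbol is always $\pm 1$ for $p \nmid c$ and that $1 \not\equiv -1 \pmod{p}$ because $p > 2$, we conclude that the Legendre symbols are equal. The special case $b = a$ yields
$$\left(\frac{a^2}{p}\right) = \left(\frac{a}{p}\right)^2 = 1.$$
We then also have
$$\left(\frac{a^2 b}{p}\right) = \left(\frac{a^2}{p}\right)\left(\frac{b}{p}\right) = \left(\frac{b}{p}\right).$$
The identity $\left(\frac{1}{p}\right) = 1$ follows from the $a = 1$ case of $\left(\frac{a^2}{p}\right) = 1$.
Finally, we note that the quadratic residues form a subgroup because $1$ is a quadratic residue and, if $a$ and $b$ are quadratic residues, then so is $ab$ (since $\left(\frac{ab}{p}\right) = \left(\frac{a}{p}\right)\left(\frac{b}{p}\right) = 1$); a nonempty subset of the finite group $E(p)$ that is closed under multiplication is a subgroup.
(4) This is the $a = -1$ case of part (2). □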

We next give an alternative proof of Euler’s criterion (Theorem 8.3 (2)).

Proof of Euler’s criterion without the results from Section 7. If $\left(\frac{a}{p}\right) = 1$, then there exists $x_0 \in \mathbb{Z}$ for which $x_0^2 \equiv a \pmod{p}$. Thus, by Fermat’s Little Theorem (Theorem 4.11 (3)), we conclude that
$$a^{\frac{1}{2}(p-1)} \equiv x_0^{p-1} \equiv 1 = \left(\frac{a}{p}\right) \pmod{p}.$$
Now suppose that $\left(\frac{a}{p}\right) = -1$. By Theorem 6.1, for each $k \in \{1, 2, \ldots, p-1\}$ there is a unique $\ell \in \{1, 2, \ldots, p-1\}$ for which $k\ell \equiv a \pmod{p}$. Since $\left(\frac{a}{p}\right) = -1$, we have $\ell \neq k$ for every $k$. The residue classes $1, \ldots, p-1$ therefore split into $\frac{p-1}{2}$ pairs $(k, \ell)$ with $k\ell \equiv a \pmod{p}$. It follows from Wilson’s Theorem (Theorem 4.16) that
$$a^{\frac{1}{2}(p-1)} \equiv \prod_{(k,\ell)} k\ell = \prod_{k=1}^{p-1} k = (p-1)! \equiv -1 = \left(\frac{a}{p}\right) \pmod{p}.$$
□

Theorem 8.4 (Lemma of Gauss). Suppose that $p \neq 2$ is prime and $a \in \mathbb{Z}$ with $p \nmid a$. For each of the integers $\ell a$ with $1 \leq \ell \leq \frac{p-1}{2}$ (i.e., $a, 2a, 3a, \ldots, \frac{p-1}{2}a$), let $m_\ell$ be the integer of smallest absolute value that is congruent to $\ell a$ modulo $p$. Let $N$ be the number of $\ell$ for which $m_\ell < 0$. Then $\left(\frac{a}{p}\right) = (-1)^N$.

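Proof. Set $S = \left\{1, 2, \ldots, \frac{p-1}{2}\right\}$. Then either $m_\ell \in S$ or $-m_\ell \in S$, and $N$ is the number of $\ell$ with $-m_\ell \in S$. We claim that for $k \neq \ell$ we have $|m_\ell| \neq |m_k|$. If $m_\ell = m_k$, then $(\ell - k)a \equiv 0 \pmod{p}$, and hence $\ell - k \equiv 0 \pmod{p}$. Since $k, \ell \in S$, we have $k = \ell$. If $m_\ell = -m_k$, then $(\ell + k)a \equiv 0 \pmod{p}$, so that $\ell + k \equiv 0 \pmod{p}$. Since $k, \ell \in S$, we have $2 \leq \ell + k \leq p - 1$, so this is impossible.
We conclude that the map $\ell \mapsto |m_\ell|$ is a permutation of $S$ (that is to say, it just reorders the numbers). It follows that
$$a^{\frac{1}{2}(p-1)} \left(\frac{p-1}{2}\right)! = \prod_{\ell \in S} a\ell \equiv \prod_{\ell \in S} m_\ell = (-1)^N \prod_{\ell \in S} |m_\ell| = (-1)^N \prod_{\ell \in S} \ell = (-1)^N \left(\frac{p-1}{2}\right)! \pmod{p}.$$
Since $p \nmid \left(\frac{p-1}{2}\right)!$, it follows that
$$1 \equiv (-1)^N a^{\frac{1}{2}(p-1)} \pmod{p},$$
and hence
$$a^{\frac{1}{2}(p-1)} \equiv (-1)^N \pmod{p}.$$
By Euler’s criterion (Theorem 8.3 (2)), it follows that $\left(\frac{a}{p}\right) \equiv (-1)^N \pmod{p}$, and hence $\left(\frac{a}{p}\right) = (-1)^N$. □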
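Gauss’s lemma is easy to test numerically: the residue of least absolute value of $\ell a$ is negative exactly when the least non-negative residue exceeds $\frac{p-1}{2}$. The sketch below (function name ours) counts $N$ this way and compares $(-1)^N$ with Euler’s criterion.

```python
def gauss_N(a: int, p: int) -> int:
    """N from Gauss's lemma: number of ell in {1, ..., (p-1)/2} with m_ell < 0."""
    N = 0
    for ell in range(1, (p - 1) // 2 + 1):
        m = (ell * a) % p        # least non-negative residue of ell * a
        if m > p // 2:           # then m_ell = m - p is the residue of least absolute value, and it is negative
            N += 1
    return N


for p in (3, 5, 7, 11, 13, 17, 19, 23):
    for a in range(1, p):
        euler = 1 if pow(a, (p - 1) // 2, p) == 1 else -1
        assert (-1) ** gauss_N(a, p) == euler
```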
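Theorem 8.5. Let $p$, $a$ and $N$ be given as in Theorem 8.4. Then
$$N \equiv \sum_{1 \leq \ell \leq \frac{1}{2}(p-1)} \left\lfloor \frac{\ell a}{p} \right\rfloor + (a - 1)\frac{p^2 - 1}{8} \pmod{2}.$$
If $a$ is odd, then we also have
$$N \equiv \sum_{1 \leq \ell \leq \frac{1}{2}(p-1)} \left\lfloor \frac{\ell a}{p} \right\rfloor \pmod{2}.$$
Here for $x \in \mathbb{R}$ we use $\lfloor x \rfloor$ to denote the largest integer $\leq x$.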

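Proof. For $\ell \in S = \left\{1, 2, \ldots, \frac{p-1}{2}\right\}$, we let $m_\ell$ denote the element with smallest absolute value congruent to $\ell a$ as in Theorem 8.4. In other words, $\ell a \equiv m_\ell \pmod{p}$ and $|m_\ell| < \frac{p}{2}$.
The inequality $|m_\ell| < \frac{p}{2}$ and $\ell a \equiv m_\ell \pmod{p}$ imply that
$$\ell a - p\left\lfloor \frac{\ell a}{p} \right\rfloor = \begin{cases} m_\ell & \text{if } m_\ell > 0,\\ p + m_\ell & \text{if } m_\ell < 0. \end{cases}$$
It is thus natural to let $s_1, \ldots, s_M$ be the elements $m_\ell$ with $m_\ell > 0$ and $r_1, \ldots, r_N$ be the numbers $m_\ell < 0$ (recall that in Theorem 8.4, $N$ counts the number of $\ell$ such that $m_\ell < 0$). We see that
$$\sum_{\ell \in S} \ell a = \sum_{\ell \in S} p\left\lfloor \frac{\ell a}{p} \right\rfloor + s_1 + \cdots + s_M + (p + r_1) + \cdots + (p + r_N).$$
Furthermore, since (as shown in the proof of Theorem 8.4) $\ell \mapsto |m_\ell|$ is a permutation of the set $S$, we have
$$\sum_{\ell \in S} \ell = \sum_{\ell \in S} |m_\ell| = (s_1 + \cdots + s_M) - (r_1 + \cdots + r_N).$$
Moreover, by a famous observation of Gauss (substituting $\ell \mapsto \frac{p+1}{2} - \ell$ in the second sum),
$$\sum_{\ell \in S} \ell = \sum_{\ell=1}^{\frac{p-1}{2}} \ell = \frac{1}{2}\left(\sum_{\ell=1}^{\frac{p-1}{2}} \ell + \sum_{\ell=1}^{\frac{p-1}{2}} \left(\frac{p+1}{2} - \ell\right)\right) = \frac{1}{2}\sum_{\ell=1}^{\frac{p-1}{2}} \frac{p+1}{2} = \frac{p+1}{4} \cdot \frac{p-1}{2} = \frac{p^2 - 1}{8}.$$
Therefore, subtracting the two formulas above yields
$$(a - 1)\frac{p^2 - 1}{8} = \sum_{\ell \in S} \ell a - \sum_{\ell \in S} \ell = p\sum_{\ell \in S} \left\lfloor \frac{\ell a}{p} \right\rfloor + pN + 2(r_1 + \cdots + r_N).$$
Since $p$ is odd, it follows that
$$N \equiv pN \equiv \sum_{\ell \in S} \left\lfloor \frac{\ell a}{p} \right\rfloor + (a - 1)\frac{p^2 - 1}{8} \pmod{2}.$$
This yields the first claim. The second claim follows because for $a$ odd we have
$$(a - 1)\frac{p^2 - 1}{8} \equiv 0 \pmod{2}.$$
□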
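The parity formula of Theorem 8.5 can be checked directly against the count $N$ from Gauss’s lemma. A minimal sketch (helper names ours), using $(p^2 - 1)/8 \in \mathbb{Z}$ so that all arithmetic stays in integers:

```python
def gauss_N(a: int, p: int) -> int:
    """Number of ell in {1, ..., (p-1)/2} whose absolutely least residue of ell*a is negative."""
    return sum(1 for ell in range(1, (p - 1) // 2 + 1) if (ell * a) % p > p // 2)


def floor_sum(a: int, p: int) -> int:
    """Sum of floor(ell * a / p) over 1 <= ell <= (p-1)/2."""
    return sum(ell * a // p for ell in range(1, (p - 1) // 2 + 1))


for p in (3, 5, 7, 11, 13, 17, 19, 23, 29):
    for a in range(1, 4 * p):
        if a % p == 0:
            continue
        rhs = floor_sum(a, p) + (a - 1) * ((p * p - 1) // 8)
        assert gauss_N(a, p) % 2 == rhs % 2                         # first claim
        if a % 2 == 1:
            assert gauss_N(a, p) % 2 == floor_sum(a, p) % 2         # second claim (a odd)
```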

Theorem 8.6. For all primes $p \neq 2$, we have
$$\left(\frac{-1}{p}\right) = (-1)^{\frac{1}{2}(p-1)}, \qquad \left(\frac{2}{p}\right) = (-1)^{\frac{1}{8}(p^2-1)}.$$
In other words, the integer $2$ is a quadratic residue modulo $p$ for all primes $p \equiv 1, 7 \pmod{8}$ and $2$ is a quadratic non-residue modulo $p$ for all primes $p \equiv 3, 5 \pmod{8}$, while the integer $-1$ is a quadratic residue if $p \equiv 1 \pmod{4}$ and a quadratic non-residue if $p \equiv -1 \pmod{4}$.

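Proof. The first claim is Theorem 8.3 (4).
For $a = 2$, Gauss’s Lemma (Theorem 8.4) and Theorem 8.5 imply that $\left(\frac{2}{p}\right) = (-1)^N$ with
$$N \equiv \sum_{1 \leq \ell \leq \frac{p-1}{2}} \left\lfloor \frac{2\ell}{p} \right\rfloor + \frac{p^2 - 1}{8} \pmod{2}.$$
However, since $1 \leq \ell \leq \frac{p-1}{2}$, we have $2 \leq 2\ell \leq p - 1$, and hence for every such $\ell$ we have
$$\left\lfloor \frac{2\ell}{p} \right\rfloor = 0.$$
Therefore we conclude that
$$N \equiv \frac{p^2 - 1}{8} \pmod{2},$$
from which $\left(\frac{2}{p}\right) = (-1)^{\frac{1}{8}(p^2-1)}$ follows. □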

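By Theorem 8.3 (3), we conclude the following corollary.

Corollary 8.7. For every prime $p \neq 2$, we have
$$\left(\frac{-2}{p}\right) = \left(\frac{-1}{p}\right)\left(\frac{2}{p}\right) = \begin{cases} 1 & \text{for } p \equiv 1 \text{ or } 3 \pmod{8},\\ -1 & \text{for } p \equiv 5 \text{ or } 7 \pmod{8}. \end{cases}$$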
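Theorem 8.6 and Corollary 8.7 can be confirmed by brute force for all odd primes below $1000$. The sketch below (helper names ours) checks whether $x^2 \equiv a \pmod{p}$ has a solution with $p \nmid x$ and compares the answer with the stated congruence conditions.

```python
def is_prime(n: int) -> bool:
    return n >= 2 and all(n % d for d in range(2, int(n ** 0.5) + 1))


def is_square_mod(a: int, p: int) -> bool:
    """Whether x^2 = a (mod p) has a solution with p not dividing x."""
    return any((x * x - a) % p == 0 for x in range(1, p))


for p in range(3, 1000):
    if is_prime(p):
        assert is_square_mod(-1, p) == (p % 4 == 1)          # Theorem 8.6, a = -1
        assert is_square_mod(2, p) == (p % 8 in (1, 7))      # Theorem 8.6, a = 2
        assert is_square_mod(-2, p) == (p % 8 in (1, 3))     # Corollary 8.7
```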
The next theorem relates the Legendre symbols (p/q) and (q/p). In other words,
if one knows that p is a quadratic residue modulo q, can one determine whether q
is a quadratic residue modulo p?

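Theorem 8.8 (Quadratic reciprocity). Suppose that $p$ and $q$ are distinct odd primes. Then we have
$$\left(\frac{p}{q}\right)\left(\frac{q}{p}\right) = (-1)^{\frac{1}{4}(p-1)(q-1)}.$$
Hence
$$\left(\frac{p}{q}\right) = \begin{cases} -\left(\frac{q}{p}\right) & \text{for } p \equiv q \equiv 3 \pmod{4},\\ \left(\frac{q}{p}\right) & \text{otherwise}. \end{cases}$$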

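Proof. Set
$$\begin{aligned}
T &:= \left\{(x, y) \in \mathbb{N}^2 : 1 \leq x \leq \frac{p-1}{2},\ 1 \leq y \leq \frac{q-1}{2}\right\},\\
T_1 &:= \left\{(x, y) \in T : y < \frac{q}{p}x\right\},\\
T_2 &:= \left\{(x, y) \in T : x < \frac{p}{q}y\right\}.
\end{aligned}$$
For $(x, y) \in T_2$ we have $y > \frac{q}{p}x$, so it follows directly that $T_1 \cap T_2 = \emptyset$. Furthermore, for $(x, y) \in T$ we have $\gcd(qx, p) = 1$, so $y = \frac{q}{p}x$ is impossible. Therefore $T = T_1 \cup T_2$ with $T_1 \cap T_2 = \emptyset$. For fixed $x$, the $y$ with $(x, y) \in T_1$ are exactly $1 \leq y \leq \left\lfloor \frac{qx}{p} \right\rfloor$ (note that $\frac{qx}{p} < \frac{q}{2}$, so $\left\lfloor \frac{qx}{p} \right\rfloor \leq \frac{q-1}{2}$), and similarly for $T_2$. It follows that
$$\frac{p-1}{2} \cdot \frac{q-1}{2} = \#T = \#T_1 + \#T_2 = \sum_{1 \leq x \leq \frac{p-1}{2}} \left\lfloor \frac{qx}{p} \right\rfloor + \sum_{1 \leq y \leq \frac{q-1}{2}} \left\lfloor \frac{py}{q} \right\rfloor.$$
By Theorem 8.5 (with $a = q$ odd) and Theorem 8.4, we have
$$(-1)^{\sum_x \left\lfloor \frac{qx}{p} \right\rfloor} = (-1)^N = \left(\frac{q}{p}\right).$$
Reversing the roles of $p$ and $q$, we get a formula for $\left(\frac{p}{q}\right)$ as well, concluding that
$$(-1)^{\frac{p-1}{2} \cdot \frac{q-1}{2}} = \left(\frac{p}{q}\right)\left(\frac{q}{p}\right).$$
□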
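The lattice-point argument can be replayed numerically: count $T_1$ directly, compare it with the floor sum, and check both $\#T = \#T_1 + \#T_2$ and the reciprocity law. The function names below are ours, and the Legendre symbols are computed with Euler’s criterion.

```python
def floor_sum(q: int, p: int) -> int:
    """Sum of floor(q x / p) over 1 <= x <= (p-1)/2, i.e. #T_1 in the proof."""
    return sum(q * x // p for x in range(1, (p - 1) // 2 + 1))


def count_T1(p: int, q: int) -> int:
    """Direct count of the lattice points (x, y) in T with y < (q/p) x."""
    return sum(1 for x in range(1, (p - 1) // 2 + 1)
               for y in range(1, (q - 1) // 2 + 1) if p * y < q * x)


def legendre(a: int, p: int) -> int:
    t = pow(a, (p - 1) // 2, p)
    return -1 if t == p - 1 else t


odd_primes = [3, 5, 7, 11, 13, 17, 19, 23, 29, 31]
for p in odd_primes:
    for q in odd_primes:
        if p == q:
            continue
        half_p, half_q = (p - 1) // 2, (q - 1) // 2
        assert count_T1(p, q) == floor_sum(q, p)
        assert floor_sum(q, p) + floor_sum(p, q) == half_p * half_q     # #T = #T_1 + #T_2
        assert legendre(p, q) * legendre(q, p) == (-1) ** (half_p * half_q)
```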

There are many different proofs of quadratic reciprocity. We next give another
slightly different proof.
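Second proof of quadratic reciprocity. We set $\eta = e^{\frac{2\pi i}{p}}$, $\xi = e^{\frac{2\pi i}{q}}$, $S = \left\{1, 2, \ldots, \frac{p-1}{2}\right\}$, and $R = \left\{1, 2, \ldots, \frac{q-1}{2}\right\}$. Next note that since $\eta^p = e^{2\pi i} = 1$, $\eta^\ell$ only depends on the residue class of $\ell \pmod{p}$. Hence with $m_\ell$ as in Theorem 8.4 (with $a = q$), we see that $\eta^{q\ell} = \eta^{m_\ell}$. Set $\varepsilon_\ell = \frac{m_\ell}{|m_\ell|}$ (i.e., $\varepsilon_\ell = 1$ if $m_\ell > 0$ and $\varepsilon_\ell = -1$ if $m_\ell < 0$). We then obtain that
$$\prod_{\ell \in S} \frac{\eta^{q\ell} - \eta^{-q\ell}}{\eta^\ell - \eta^{-\ell}} = \prod_{\ell \in S} \frac{\eta^{m_\ell} - \eta^{-m_\ell}}{\eta^\ell - \eta^{-\ell}} = \prod_{\ell \in S} \frac{\eta^{\varepsilon_\ell |m_\ell|} - \eta^{-\varepsilon_\ell |m_\ell|}}{\eta^\ell - \eta^{-\ell}}.$$
(The denominators are nonzero because $\eta^{2\ell} \neq 1$ for $\ell \in S$.) Now note that
$$\eta^{\varepsilon_\ell |m_\ell|} - \eta^{-\varepsilon_\ell |m_\ell|} = \varepsilon_\ell\left(\eta^{|m_\ell|} - \eta^{-|m_\ell|}\right),$$
so that the above formula becomes
$$\prod_{\ell \in S} \varepsilon_\ell \left(\frac{\eta^{|m_\ell|} - \eta^{-|m_\ell|}}{\eta^\ell - \eta^{-\ell}}\right).$$
We recall that, from the proof of Theorem 8.4, the map $\ell \mapsto |m_\ell|$ is a permutation of the set $S$. Thus
$$\prod_{\ell \in S} \varepsilon_\ell \left(\frac{\eta^{|m_\ell|} - \eta^{-|m_\ell|}}{\eta^\ell - \eta^{-\ell}}\right) = \prod_{\ell \in S} \varepsilon_\ell \cdot \frac{\prod_{\ell \in S}\left(\eta^{|m_\ell|} - \eta^{-|m_\ell|}\right)}{\prod_{\ell \in S}\left(\eta^\ell - \eta^{-\ell}\right)} = \prod_{\ell \in S} \varepsilon_\ell \cdot \frac{\prod_{\ell \in S}\left(\eta^{\ell} - \eta^{-\ell}\right)}{\prod_{\ell \in S}\left(\eta^\ell - \eta^{-\ell}\right)} = \prod_{\ell \in S} \varepsilon_\ell.$$
Therefore, since $\varepsilon_\ell = -1$ if and only if $m_\ell < 0$, we obtain that (again denoting by $N$ the size of the set of $\ell \in S$ with $m_\ell < 0$)
$$\prod_{\ell \in S} \frac{\eta^{q\ell} - \eta^{-q\ell}}{\eta^\ell - \eta^{-\ell}} = \prod_{\ell \in S} \varepsilon_\ell = (-1)^N.$$
By Theorem 8.4, we conclude that
$$\prod_{\ell \in S} \frac{\eta^{q\ell} - \eta^{-q\ell}}{\eta^\ell - \eta^{-\ell}} = (-1)^N = \left(\frac{q}{p}\right).$$
We next claim that for any complex number $z \in \mathbb{C}$ with $z \neq 0$, one has (recalling that $\xi = e^{\frac{2\pi i}{q}}$)
$$z^q - z^{-q} = \prod_{b \pmod{q}} \left(\xi^b z - \xi^{-b} z^{-1}\right).$$
Multiplying by $z^q$, this identity is equivalent to
$$z^{2q} - 1 = \prod_{b \pmod{q}} \left(\xi^b z^2 - \xi^{-b}\right).$$
Both sides are polynomials of degree $2q$ with the same leading coefficient (since $\prod_{b \pmod{q}} \xi^b = \xi^{q(q-1)/2} = 1$ because $q$ is odd), and the identity follows by comparing the roots of both sides (there are precisely $2q$ roots of each side, namely $\pm\xi^b$ with $b \pmod{q}$). Thus it follows that, using $z = \eta^a$,
$$\frac{\eta^{qa} - \eta^{-qa}}{\eta^a - \eta^{-a}} = \prod_{\substack{b \pmod{q}\\ b \not\equiv 0 \pmod{q}}} \left(\xi^b \eta^a - \xi^{-b} \eta^{-a}\right).$$
Writing (by pairing the $b \in R$ and $q - b$ terms and noting that $\xi^q = 1$)
$$\prod_{\substack{b \pmod{q}\\ b \not\equiv 0 \pmod{q}}} \left(\xi^b \eta^a - \xi^{-b} \eta^{-a}\right) = \prod_{b \in R} \left(\xi^b \eta^a - \xi^{-b} \eta^{-a}\right)\left(\xi^{-b} \eta^a - \xi^b \eta^{-a}\right),$$
we therefore obtain
$$\begin{aligned}
\left(\frac{q}{p}\right) &= \prod_{a \in S} \prod_{\substack{b \pmod{q}\\ b \not\equiv 0 \pmod{q}}} \left(\xi^b \eta^a - \xi^{-b} \eta^{-a}\right) = \prod_{a \in S} \prod_{b \in R} \left(\xi^b \eta^a - \xi^{-b} \eta^{-a}\right)\left(\xi^{-b} \eta^a - \xi^b \eta^{-a}\right)\\
&= \prod_{a \in S} \prod_{b \in R} \left(\left(\eta^{2a} + \eta^{-2a}\right) - \left(\xi^{2b} + \xi^{-2b}\right)\right).
\end{aligned}$$
Reversing the roles of $p$ and $q$ reverses the order of the difference in the last product, giving a factor of $-1$ for each pair $(a, b) \in S \times R$. Thus
$$\left(\frac{p}{q}\right) = (-1)^{\#R \cdot \#S}\left(\frac{q}{p}\right) = (-1)^{\frac{p-1}{2} \cdot \frac{q-1}{2}}\left(\frac{q}{p}\right).$$
□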
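Both identities used in the second proof are exact statements about algebraic numbers, so they can be spot-checked in floating point. The Python sketch below (function names ours; the tolerance $10^{-6}$ is an arbitrary choice) evaluates the product over $S$ and the double product over $S \times R$ and compares each with $\left(\frac{q}{p}\right)$ from Euler’s criterion.

```python
import cmath


def sine_quotient_product(p: int, q: int) -> complex:
    """Product over ell in S of (eta^(q ell) - eta^(-q ell)) / (eta^ell - eta^(-ell)), eta = e^(2 pi i / p)."""
    eta = cmath.exp(2j * cmath.pi / p)
    prod = 1 + 0j
    for ell in range(1, (p - 1) // 2 + 1):
        prod *= (eta ** (q * ell) - eta ** (-q * ell)) / (eta ** ell - eta ** (-ell))
    return prod


def double_product(p: int, q: int) -> complex:
    """Product over a in S, b in R of (eta^(2a) + eta^(-2a)) - (xi^(2b) + xi^(-2b))."""
    eta = cmath.exp(2j * cmath.pi / p)
    xi = cmath.exp(2j * cmath.pi / q)
    prod = 1 + 0j
    for a in range(1, (p - 1) // 2 + 1):
        for b in range(1, (q - 1) // 2 + 1):
            prod *= (eta ** (2 * a) + eta ** (-2 * a)) - (xi ** (2 * b) + xi ** (-2 * b))
    return prod


for p, q in [(3, 5), (5, 3), (3, 7), (5, 7), (7, 11), (11, 13)]:
    symbol = 1 if pow(q, (p - 1) // 2, p) == 1 else -1     # (q/p) by Euler's criterion
    assert abs(sine_quotient_product(p, q) - symbol) < 1e-6
    assert abs(double_product(p, q) - symbol) < 1e-6
```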

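Example 8.9. Is $x^2 \equiv 280 \pmod{641}$ soluble?

First note that $p = 641$ is prime, so the answer is determined by computing $\left(\frac{280}{641}\right)$. Since $280 = 2^3 \cdot 5 \cdot 7$, $641 \equiv 1 \pmod{8}$, and $641 \equiv 1 \pmod{4}$, we obtain
$$\left(\frac{280}{641}\right) \overset{\text{Thm. 8.3}}{=} \left(\frac{2}{641}\right)\left(\frac{5}{641}\right)\left(\frac{7}{641}\right) \overset{\text{Thm. 8.6}}{=} \left(\frac{5}{641}\right)\left(\frac{7}{641}\right) \overset{\text{Thm. 8.8 (Quad. Rec.)}}{=} \left(\frac{641}{5}\right)\left(\frac{641}{7}\right) = \left(\frac{1}{5}\right)\left(\frac{4}{7}\right) = 1.$$
Thus $x^2 \equiv 280 \pmod{641}$ has precisely two solutions.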
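Because $641$ is small, the conclusion can be double-checked directly, both with Euler’s criterion and by listing the solutions. A short Python sketch (variable names ours):

```python
p, a = 641, 280

# Euler's criterion: a^((p-1)/2) mod p equals 1 exactly for quadratic residues.
euler = pow(a, (p - 1) // 2, p)
print(euler)   # 1

# Brute-force list of the solutions of x^2 = 280 (mod 641).
roots = [x for x in range(p) if (x * x - a) % p == 0]
print(roots)
```

The two roots come as a pair $x$ and $p - x$, matching the count $1 + \left(\frac{280}{641}\right) = 2$.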

This example shows how quadratic reciprocity can greatly simplify deciding whether a quadratic congruence modulo a large prime is solvable. It can also be used to determine the primes for which a given integer is a quadratic residue.

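Examples 8.10.
• For which primes $p$ is $5$ a quadratic residue modulo $p$?
Since $\left(\frac{5}{p}\right) = \left(\frac{p}{5}\right)$ for odd primes $p \neq 5$, we see that $5$ is a quadratic residue if and only if $p \equiv 1, 4 \pmod{5}$.
• For which primes $p$ is $7$ a quadratic residue modulo $p$?
For odd primes $p \neq 7$ we have
$$1 = \left(\frac{7}{p}\right) = (-1)^{\frac{p-1}{2}}\left(\frac{p}{7}\right)$$
if and only if $p \equiv 1, 3, 9, 19, 25$ or $27 \pmod{28}$.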
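The two answers can be confirmed by brute force over the primes below $2000$. In the sketch below (helper names ours), $p = 5$ and $p = 7$ are skipped for the respective checks, since there the symbol is $0$.

```python
def is_prime(n: int) -> bool:
    return n >= 2 and all(n % d for d in range(2, int(n ** 0.5) + 1))


def is_square_mod(a: int, p: int) -> bool:
    """Whether x^2 = a (mod p) has a solution with p not dividing x."""
    return any((x * x - a) % p == 0 for x in range(1, p))


for p in range(3, 2000):
    if not is_prime(p):
        continue
    if p != 5:
        assert is_square_mod(5, p) == (p % 5 in (1, 4))
    if p != 7:
        assert is_square_mod(7, p) == (p % 28 in (1, 3, 9, 19, 25, 27))
```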
Definition. Suppose that $P, Q \in \mathbb{Z}$ and furthermore that $Q > 0$ is odd and has prime factorization $Q = q_1 \cdots q_s$, where $q_1, \ldots, q_s$ are (not necessarily distinct) primes. The Jacobi symbol $\left(\frac{P}{Q}\right)$ is then defined by
$$\left(\frac{P}{Q}\right) = \left(\frac{P}{q_1}\right) \cdots \left(\frac{P}{q_s}\right). \tag{8.1}$$
Note the following properties of the Jacobi symbol, which all follow directly from the definition unless otherwise stated.
(1) If $Q$ is prime, then the Jacobi symbol coincides with the Legendre symbol $\left(\frac{P}{Q}\right)$.
(2) If $\gcd(P, Q) > 1$, then $\left(\frac{P}{Q}\right) = 0$.
(3) If $\gcd(P, Q) = 1$, then $\left(\frac{P}{Q}\right) \in \{-1, 1\}$.
(4) By the Chinese remainder theorem (Theorem 6.2), the solvability of the congruence
$$x^2 \equiv P \pmod{Q}$$
is equivalent to the solvability of the family of congruences
$$x^2 \equiv P \pmod{q^a}$$
for every prime $q$ and $a \in \mathbb{N}$ with $q^a \mid Q$. Hence if $\gcd(P, Q) = 1$ and $x^2 \equiv P \pmod{Q}$ is solvable, then it follows that
$$\left(\frac{P}{q_j}\right) = 1$$
for all $j = 1, \ldots, s$, and hence we also have $\left(\frac{P}{Q}\right) = 1$. Thus if $\left(\frac{P}{Q}\right) = -1$, then we can conclude that the congruence is not solvable. The reverse direction does not hold, however; that is to say, there exist $P$ and $Q$ for which $\left(\frac{P}{Q}\right) = 1$, but $x^2 \equiv P \pmod{Q}$ is not solvable. For example, take $Q = q^2$ with $q$ an odd prime and let $P$ be any quadratic non-residue modulo $q$.
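Theorem 8.11. For arbitrary $a, b \in \mathbb{Z}$ and odd $c, d \in \mathbb{N}$, the following hold.
(1) We have
$$\left(\frac{a}{c}\right)\left(\frac{b}{c}\right) = \left(\frac{ab}{c}\right).$$
(2) We have
$$\left(\frac{a}{c}\right)\left(\frac{a}{d}\right) = \left(\frac{a}{cd}\right).$$
(3) If $\gcd(a, c) = 1$, then
$$\left(\frac{a^2}{c}\right) = \left(\frac{a}{c^2}\right) = 1.$$
(4) If $a \equiv b \pmod{c}$, then it follows that
$$\left(\frac{a}{c}\right) = \left(\frac{b}{c}\right).$$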
Proof.
(1) By the definition (8.1), for part (1) it suffices to show the claim for $c$ prime. If $c \mid ab$, then both sides vanish; otherwise, this is precisely the statement of Theorem 8.3 (3).
(2) We obtain (2) directly from the definition (8.1).
(3) Part (3) follows directly from parts (1) and (2) and the fact that the Jacobi symbol has value $\pm 1$ whenever $\gcd(a, c) = 1$.
(4) If $a \equiv b \pmod{c}$, then $a \equiv b \pmod{q_j}$ for every prime $q_j$ dividing $c$. The claim then follows by Theorem 8.3 (1) (both symbols vanish if $q_j \mid a$). □

Theorem 8.12. For every odd $Q \in \mathbb{N}$, we have
$$\left(\frac{-1}{Q}\right) = (-1)^{\frac{1}{2}(Q-1)}, \qquad \left(\frac{2}{Q}\right) = (-1)^{\frac{1}{8}(Q^2-1)}.$$
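Proof. We write $Q = q_1 \cdots q_s$ with primes $q_1, \ldots, q_s$. By Theorem 8.6 and the definition of the Jacobi symbol, it follows that
$$\left(\frac{-1}{Q}\right) = \prod_{j=1}^{s} \left(\frac{-1}{q_j}\right) = \prod_{j=1}^{s} (-1)^{\frac{1}{2}(q_j - 1)} = (-1)^{\frac{1}{2}\sum_{j=1}^{s}(q_j - 1)}.$$
For odd $u$ and $v$, one has
$$\frac{uv - 1}{2} - \left(\frac{u - 1}{2} + \frac{v - 1}{2}\right) = \frac{(u - 1)(v - 1)}{2} \equiv 0 \pmod{2},$$
or in other words
$$\frac{uv - 1}{2} \equiv \frac{u - 1}{2} + \frac{v - 1}{2} \pmod{2}.$$
We therefore obtain by induction that
$$\sum_{j=1}^{s} \frac{1}{2}(q_j - 1) \equiv \frac{1}{2}(q_1 \cdots q_s - 1) = \frac{1}{2}(Q - 1) \pmod{2}, \tag{8.2}$$
and hence
$$\left(\frac{-1}{Q}\right) = (-1)^{\frac{1}{2}(Q-1)}.$$
Similarly, for odd $u$ and $v$ we have
$$\frac{u^2 v^2 - 1}{8} - \frac{1}{8}\left(u^2 - 1 + v^2 - 1\right) = \frac{1}{8}(u^2 - 1)(v^2 - 1) \equiv 0 \pmod{2},$$
so that one inductively concludes that
$$\sum_{j=1}^{s} \frac{1}{8}\left(q_j^2 - 1\right) \equiv \frac{1}{8}\left(q_1^2 \cdots q_s^2 - 1\right) = \frac{1}{8}\left(Q^2 - 1\right) \pmod{2}.$$
Thus Theorem 8.6 implies that
$$\left(\frac{2}{Q}\right) = \prod_{j=1}^{s} \left(\frac{2}{q_j}\right) = \prod_{j=1}^{s} (-1)^{\frac{1}{8}(q_j^2 - 1)} = (-1)^{\frac{1}{8}\sum_{j=1}^{s}(q_j^2 - 1)} = (-1)^{\frac{1}{8}(Q^2-1)}.$$
□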


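Theorem 8.13 (Quadratic reciprocity for the Jacobi symbol). For relatively prime odd $P, Q \in \mathbb{N}$, we have
$$\left(\frac{P}{Q}\right) = (-1)^{\frac{1}{4}(P-1)(Q-1)}\left(\frac{Q}{P}\right).$$
Proof. We write $P = p_1 \cdots p_r$ and $Q = q_1 \cdots q_s$ with primes $p_1, \ldots, p_r$ and $q_1, \ldots, q_s$. Since $P$ and $Q$ are relatively prime, $p_k \neq q_j$ for all $j, k$. Then by the definition (8.1) of the Jacobi symbol and quadratic reciprocity for primes, we conclude that
$$\begin{aligned}
\left(\frac{P}{Q}\right) &= \prod_{j=1}^{s} \left(\frac{P}{q_j}\right) = \prod_{j=1}^{s} \prod_{k=1}^{r} \left(\frac{p_k}{q_j}\right) \overset{\text{Thm. 8.8}}{=} \prod_{j=1}^{s} \prod_{k=1}^{r} \left(\frac{q_j}{p_k}\right)(-1)^{\frac{1}{4}(p_k - 1)(q_j - 1)}\\
&= \left(\frac{Q}{P}\right)(-1)^{\sum_{j=1}^{s}\sum_{k=1}^{r}\frac{1}{4}(p_k - 1)(q_j - 1)} = \left(\frac{Q}{P}\right)(-1)^{\left(\frac{1}{2}\sum_{j=1}^{s}(q_j - 1)\right)\left(\frac{1}{2}\sum_{k=1}^{r}(p_k - 1)\right)}.
\end{aligned}$$
By (8.2) (in the proof of Theorem 8.12), we furthermore have
$$\sum_{j=1}^{s} \frac{1}{2}(q_j - 1) \equiv \frac{1}{2}(Q - 1) \pmod{2} \quad\text{and}\quad \sum_{k=1}^{r} \frac{1}{2}(p_k - 1) \equiv \frac{1}{2}(P - 1) \pmod{2}.$$
It follows that
$$\left(\frac{P}{Q}\right) = (-1)^{\frac{1}{4}(P-1)(Q-1)}\left(\frac{Q}{P}\right).$$
□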

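Example 8.14. Consider the Mersenne prime $M_{19} = 2^{19} - 1 = 524287$. Is $x^2 \equiv 100003 \pmod{M_{19}}$ solvable?
Since $M_{19}$ is prime, we only need to compute the Legendre symbol $\left(\frac{100003}{524287}\right)$, which we may evaluate as a Jacobi symbol without factoring the intermediate numbers. Using Theorem 8.13 and Theorem 8.12, together with Theorem 8.11, we obtain
$$\begin{aligned}
\left(\frac{100003}{524287}\right) &= -\left(\frac{524287}{100003}\right) = -\left(\frac{24272}{100003}\right) = -\left(\frac{2^4 \cdot 1517}{100003}\right) = -\left(\frac{1517}{100003}\right)\\
&= -\left(\frac{100003}{1517}\right) = -\left(\frac{1398}{1517}\right) = -\left(\frac{2}{1517}\right)\left(\frac{699}{1517}\right) = \left(\frac{699}{1517}\right)\\
&= \left(\frac{1517}{699}\right) = \left(\frac{119}{699}\right) = -\left(\frac{699}{119}\right) = -\left(\frac{104}{119}\right)\\
&= -\left(\frac{2}{119}\right)^3\left(\frac{13}{119}\right) = -\left(\frac{13}{119}\right) = -\left(\frac{119}{13}\right) = -\left(\frac{2}{13}\right) = 1.
\end{aligned}$$
Here we used $100003 \equiv 524287 \equiv 3 \pmod{4}$, $1517 \equiv 1 \pmod{4}$, $1517 \equiv 5 \pmod{8}$, $119 \equiv 699 \equiv 3 \pmod{4}$, $119 \equiv 7 \pmod{8}$, $13 \equiv 1 \pmod{4}$, and $13 \equiv 5 \pmod{8}$.
Hence the above congruence is solvable.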
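The computation above is an instance of a general procedure: reduce the top entry modulo the bottom one (Theorem 8.11 (4)), pull out factors of $2$ (Theorem 8.12), and flip the symbol (Theorem 8.13). A Python sketch of this procedure follows; the function names are ours. When $\gcd(P, Q) > 1$, the loop ends with $Q$ equal to that gcd and the function returns $0$. The second function evaluates the definition (8.1) by trial division, for comparison.

```python
def jacobi(P: int, Q: int) -> int:
    """Jacobi symbol (P/Q) for odd Q > 0, using Theorems 8.11-8.13 (no factoring)."""
    assert Q > 0 and Q % 2 == 1
    P %= Q                          # Theorem 8.11 (4)
    sign = 1
    while P != 0:
        while P % 2 == 0:           # (2/Q) = -1 exactly when Q = 3, 5 (mod 8)
            P //= 2
            if Q % 8 in (3, 5):
                sign = -sign
        P, Q = Q, P                 # reciprocity, Theorem 8.13
        if P % 4 == 3 and Q % 4 == 3:
            sign = -sign
        P %= Q
    return sign if Q == 1 else 0    # Q ends as gcd(P, Q); the symbol is 0 if that exceeds 1


def jacobi_by_definition(P: int, Q: int) -> int:
    """(P/Q) as the product (8.1) of Legendre symbols, via trial division of Q."""
    result, n, q = 1, Q, 3
    while n > 1:
        while n % q == 0:
            t = pow(P, (q - 1) // 2, q)          # Euler's criterion for (P/q)
            result *= -1 if t == q - 1 else t
            n //= q
        q += 2
    return result


print(jacobi(100003, 524287))   # 1, matching Example 8.14
```

For instance, `jacobi(2, 15)` returns `1` although $x^2 \equiv 2 \pmod{15}$ has no solution, another illustration that $\left(\frac{P}{Q}\right) = 1$ does not guarantee solvability.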

9. Multiplicative arithmetic functions
To recall: A function $f : \mathbb{N} \to \mathbb{C}$ is called multiplicative if $f(nm) = f(n)f(m)$ whenever $\gcd(n, m) = 1$. By Theorem 6.8, if $f$ is multiplicative, then $F(n) = \sum_{d \mid n} f(d)$ also is.

Question. For every multiplicative function $F : \mathbb{N} \to \mathbb{C}$, does there exist a function $f : \mathbb{N} \to \mathbb{C}$ for which $F(n) = \sum_{d \mid n} f(d)$ for all $n$? If so, is the function $f$ necessarily multiplicative?
We first answer the above question for possibly the “simplest” multiplicative function, namely
$$I(n) := \begin{cases} 1 & \text{if } n = 1,\\ 0 & \text{if } n > 1. \end{cases}$$
Definition. To answer the question, we require an auxiliary function. First recall that an integer $n \in \mathbb{N}$ is called squarefree if there does not exist $m \in \mathbb{N}$ with $m > 1$ for which $m^2 \mid n$. We then define the Möbius function $\mu : \mathbb{N} \to \mathbb{Z}$ via
$$\mu(n) := \begin{cases} 1 & \text{if } n = 1,\\ (-1)^r & \text{if } n \text{ is a product of } r \text{ distinct primes},\\ 0 & \text{if } n \text{ is not squarefree}. \end{cases}$$

Theorem 9.1. The Möbius function is multiplicative and satisfies
$$\sum_{d \mid n} \mu(d) = I(n).$$

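Proof. The multiplicativity follows directly from the definition: if $\gcd(m, n) = 1$ and $p^2 \mid mn$ for a prime $p$, then $p^2 \mid m$ or $p^2 \mid n$, so $mn$ is squarefree if and only if both $m$ and $n$ are, and in that case the number of prime factors of $mn$ is the sum of those of $m$ and $n$. We conclude from Theorem 6.8 that
$$n \mapsto \sum_{d \mid n} \mu(d)$$
is also multiplicative. For prime powers $n = p^k$ with $k \geq 1$, we have
$$\sum_{d \mid p^k} \mu(d) = \sum_{\nu=0}^{k} \mu(p^\nu) = \underbrace{\mu(1)}_{=1} + \underbrace{\mu(p)}_{=-1} = 0.$$
Since the sum equals $\mu(1) = 1$ for $n = 1$, we conclude from multiplicativity that
$$\sum_{d \mid n} \mu(d) = \begin{cases} 1 & \text{if } n = 1,\\ 0 & \text{if } n > 1, \end{cases}$$
which is $I(n)$. □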
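Theorem 9.1 can be checked numerically with a direct implementation of the Möbius function. The sketch below (function names ours) computes $\mu(n)$ by trial division: it returns $0$ as soon as some prime divides $n$ twice, and otherwise flips the sign once per distinct prime factor.

```python
def mobius(n: int) -> int:
    """Moebius function by trial division."""
    result, p = 1, 2
    while n > 1:
        if n % p == 0:
            n //= p
            if n % p == 0:      # p^2 divided n, so n is not squarefree
                return 0
            result = -result    # one more distinct prime factor
        p += 1
    return result


def divisors(n: int) -> list:
    return [d for d in range(1, n + 1) if n % d == 0]


def I(n: int) -> int:
    """The identity function I from the notes."""
    return 1 if n == 1 else 0


for n in range(1, 500):
    assert sum(mobius(d) for d in divisors(n)) == I(n)
```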

Theorem 9.2 (Möbius inversion formula).
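(1) Let $f : \mathbb{N} \to \mathbb{C}$ be given and set $F(n) = \sum_{d \mid n} f(d)$. Then we have $f(n) = \sum_{d \mid n} \mu(d) F\left(\frac{n}{d}\right)$ for all $n \in \mathbb{N}$.
(2) If $F : \mathbb{N} \to \mathbb{C}$ is given and one sets $f(n) = \sum_{d \mid n} \mu(d) F\left(\frac{n}{d}\right)$, then it holds that $F(n) = \sum_{d \mid n} f(d)$ for all $n \in \mathbb{N}$.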
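Proof.
(1) Given the definition of $F$, Theorem 9.1 implies that
$$\sum_{d \mid n} \mu(d) F\left(\frac{n}{d}\right) = \sum_{d \mid n} \mu(d) \sum_{t \mid \frac{n}{d}} f(t) = \sum_{t \mid n} f(t) \sum_{d \mid \frac{n}{t}} \mu(d) \overset{\text{Thm. 9.1}}{=} \sum_{t \mid n} f(t)\, I\left(\frac{n}{t}\right) = f(n).$$
In the last step, we use the fact that $I(n/t) = 0$ unless $n = t$.
(2) Using Theorem 9.1 together with the definition of $f$, we obtain
$$\sum_{d \mid n} f(d) = \sum_{d \mid n} \sum_{t \mid d} \mu(t) F\left(\frac{d}{t}\right) = \sum_{d \mid n} \sum_{\delta \mid d} \mu\left(\frac{d}{\delta}\right) F(\delta) \overset{d = \delta k}{=} \sum_{\delta \mid n} F(\delta) \sum_{k \mid \frac{n}{\delta}} \mu(k) \overset{\text{Thm. 9.1}}{=} \sum_{\delta \mid n} F(\delta)\, I\left(\frac{n}{\delta}\right) = F(n).$$
□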

Corollary 9.3. For $F : \mathbb{N} \to \mathbb{C}$, there exists a unique function $f : \mathbb{N} \to \mathbb{C}$ satisfying $F(n) = \sum_{d \mid n} f(d)$. The function $f$ is moreover explicitly given by
$$f(n) = \sum_{d \mid n} \mu(d) F\left(\frac{n}{d}\right).$$
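Corollary 9.3 can be exercised in both directions: recover $\varphi$ from $F(n) = n$ (using Theorem 6.9), and recover an arbitrary test function from its divisor sums. The sketch below uses our own names; the test function `f` is a hypothetical choice.

```python
from math import gcd


def mobius(n: int) -> int:
    result, p = 1, 2
    while n > 1:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0
            result = -result
        p += 1
    return result


def divisors(n: int) -> list:
    return [d for d in range(1, n + 1) if n % d == 0]


def mobius_invert(F, n: int):
    """f(n) = sum_{d | n} mu(d) F(n/d), the unique f with F(n) = sum_{d | n} f(d)."""
    return sum(mobius(d) * F(n // d) for d in divisors(n))


def phi(n: int) -> int:
    """Euler's phi function by direct count."""
    return sum(1 for k in range(1, n + 1) if gcd(k, n) == 1)


# Recover phi from F(n) = n, since sum_{d | n} phi(d) = n.
assert all(mobius_invert(lambda m: m, n) == phi(n) for n in range(1, 200))


def f(n: int) -> int:           # arbitrary (hypothetical) test function
    return n * n - 3 * n + 7


def F(n: int) -> int:           # its divisor sums
    return sum(f(d) for d in divisors(n))


assert all(mobius_invert(F, n) == f(n) for n in range(1, 200))
```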

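Examples 9.4.
(1) By Theorem 6.9, we have $\sum_{d \mid n} \varphi(d) = n$. Combining this with Theorem 9.2 (1) (with $f(n) = \varphi(n)$ and $F(n) = n$), one obtains
$$\varphi(n) = \sum_{d \mid n} \mu(d) \frac{n}{d} = n \sum_{d \mid n} \frac{\mu(d)}{d} = n \prod_{\substack{p \text{ prime}\\ p \mid n}} \left(1 - \frac{1}{p}\right).$$
Conversely, from (letting $a_p$ be the largest exponent such that $p^{a_p} \mid n$)
$$\begin{aligned}
\varphi(n) &= n \prod_{p \mid n} \left(1 - \frac{1}{p}\right) = n \prod_{p \mid n} \left(\mu(1)p^{-0} + \mu(p)p^{-1}\right) = n \prod_{p \mid n} \left(\sum_{j=0}^{a_p} \mu\left(p^j\right) p^{-j}\right)\\
&= n \sum_{d \mid n} \mu(d) d^{-1} = \sum_{d \mid n} \mu(d) \frac{n}{d} = \sum_{d \mid n} \mu(d) F\left(\frac{n}{d}\right)
\end{aligned}$$
(in the last line we note that every divisor $d \mid n$ is a product $p_1^{j_1} p_2^{j_2} \cdots p_r^{j_r}$ with $p_\ell$ prime and $j_\ell \leq a_{p_\ell}$) and Theorem 9.2 (2), one obtains $\sum_{d \mid n} \varphi(d) = n$.
(2) Recall that the sum of divisors functions $\sigma_k : \mathbb{N} \to \mathbb{R}$ are defined by $\sigma_k(n) = \sum_{d \mid n} d^k$ for $k \in \mathbb{R}$ (see Section 6). Hence by Theorem 9.2 (1), it follows that
$$n^k = \sum_{d \mid n} \mu(d) \sigma_k\left(\frac{n}{d}\right).$$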

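Definition. For arithmetic functions $f : \mathbb{N} \to \mathbb{C}$ and $g : \mathbb{N} \to \mathbb{C}$ one defines the convolution of $f$ and $g$ (or Dirichlet product) to be the function $f * g : \mathbb{N} \to \mathbb{C}$ given by
$$(f * g)(n) := \sum_{d \mid n} f(d)\, g\left(\frac{n}{d}\right).$$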

Remark 9.5. By making the change of variables $d \mapsto \frac{n}{d}$ in the sum, we see that
$$f * g = g * f.$$
In a formal variable $s$, one can define the formal series
$$D_f(s) = \sum_{n=1}^{\infty} f(n) n^{-s},$$
called the Dirichlet series associated to $f$. When it converges, one can evaluate the series for $s \in \mathbb{C}$. If both $D_f(s)$ and $D_g(s)$ converge absolutely for some choice of $s$, then one has
$$D_f(s) D_g(s) = \sum_{k=1}^{\infty} f(k) k^{-s} \sum_{m=1}^{\infty} g(m) m^{-s} = \sum_{k=1}^{\infty}\sum_{m=1}^{\infty} f(k) g(m) (km)^{-s} = \sum_{n=1}^{\infty} \left(\sum_{\substack{k, m\\ km = n}} f(k) g(m)\right) n^{-s} = \sum_{n=1}^{\infty} (f * g)(n) n^{-s}.$$
Thus $D_{f*g}(s) = D_f(s) D_g(s)$, so convolution corresponds to multiplication for the associated Dirichlet series.
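The Riemann zeta function is defined for $\operatorname{Re}(s) > 1$ by
$$\zeta(s) = \sum_{n=1}^{\infty} n^{-s},$$
which is the Dirichlet series $D_U(s)$ for the constant function $U(n) := 1$ for all $n$. One sees by direct calculation (for $\operatorname{Re}(s) > \max\{1, k + 1\}$, so that both series converge absolutely) that
$$\zeta(s)\zeta(s - k) = \sum_{m=1}^{\infty} m^{-s} \sum_{d=1}^{\infty} d^k d^{-s} \overset{n = md}{=} \sum_{n=1}^{\infty} \sum_{d \mid n} d^k n^{-s} = \sum_{n=1}^{\infty} \sigma_k(n) n^{-s}.$$
Setting $E(n) := n$ for all $n \in \mathbb{N}$ (so $E^k(n) := E(n)^k = n^k$) and noting that $\zeta(s - k) = D_{E^k}(s)$, we see that $U * E^k = \sigma_k$ (by the comparison of their Dirichlet series above; this also follows directly from the definition of convolution) and in particular $U * E = \sigma_1$.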
We next show that the set of arithmetic functions forms a commutative ring, where multiplication is defined by convolution and addition is pointwise addition
$$(f + g)(n) := f(n) + g(n).$$

(1) Firstly, for arithmetic functions $f$, $g$, and $h$, one has
$$\big((f * g) * h\big)(n) = \sum_{d d_3 = n} (f * g)(d)\, h(d_3) = \sum_{d d_3 = n} \left(\sum_{d_1 d_2 = d} f(d_1) g(d_2)\right) h(d_3) = \sum_{d_1 d_2 d_3 = n} f(d_1) g(d_2) h(d_3) = \cdots = \big(f * (g * h)\big)(n),$$
so convolution satisfies associativity, or namely
$$(f * g) * h = f * (g * h).$$
(2) One sees directly from the definition of convolution that the function $I$ from Theorem 9.1 satisfies
$$I * f = f$$
for all $f$. Thus $I$ is an identity element under multiplication.
(3) Directly from the definition, one sees that the distributive property
$$(f + g) * h = f * h + g * h$$
holds for all $f$, $g$, and $h$.
(4) Finally, we have already seen that commutativity holds via the change of variables $d \mapsto n/d$ in the sum.
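The ring axioms (1)–(4) can be spot-checked on a few concrete arithmetic functions. In the sketch below, `dirichlet` returns the convolution as a Python function; the test functions `f`, `g`, `h` are arbitrary, hypothetical choices.

```python
def dirichlet(f, g):
    """The convolution f * g as a Python function: n -> sum_{d | n} f(d) g(n/d)."""
    def conv(n: int):
        return sum(f(d) * g(n // d) for d in range(1, n + 1) if n % d == 0)
    return conv


def I(n: int) -> int:
    return 1 if n == 1 else 0


# Arbitrary integer-valued test functions (hypothetical choices).
def f(n: int) -> int:
    return n % 7 - 3


def g(n: int) -> int:
    return n * n


def h(n: int) -> int:
    return (-1) ** n


def f_plus_g(n: int) -> int:
    return f(n) + g(n)


for n in range(1, 60):
    assert dirichlet(f, g)(n) == dirichlet(g, f)(n)                                # commutativity
    assert dirichlet(dirichlet(f, g), h)(n) == dirichlet(f, dirichlet(g, h))(n)    # associativity
    assert dirichlet(I, f)(n) == f(n)                                              # identity
    assert dirichlet(f_plus_g, h)(n) == dirichlet(f, h)(n) + dirichlet(g, h)(n)    # distributivity
```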

Theorem 9.6. The set $A$ of all arithmetic functions with pointwise addition and convolution as the multiplication forms an integral domain. The identity element is
$$I(n) = \begin{cases} 1 & \text{if } n = 1,\\ 0 & \text{if } n > 1. \end{cases}$$
The units of $A$ are precisely the functions $f : \mathbb{N} \to \mathbb{C}$ with $f(1) \neq 0$.

Proof. We have already seen above that $A$ is a ring. We next show that there are no zero divisors. Let $f, g \in A$ be given with $f * g = 0$ and suppose that $g \not\equiv 0$. Let $m \in \mathbb{N}$ be minimal with $g(m) \neq 0$.
Since
$$0 = (f * g)(m) = \sum_{d \mid m} f\left(\frac{m}{d}\right) g(d) = f(1) g(m),$$
it follows that $f(1) = 0$. Suppose next that $n \geq 2$ and $f(k) = 0$ for $1 \leq k \leq n - 1$. In the sum below, the terms with $k < n$ or $\ell < m$ vanish, and if $k \geq n$, $\ell \geq m$, and $k\ell = mn$, then $k = n$ and $\ell = m$. Hence
$$0 = (f * g)(mn) = \sum_{\substack{k, \ell\\ k\ell = mn}} f(k) g(\ell) = \sum_{\substack{k \geq n,\ \ell \geq m\\ k\ell = mn}} f(k) g(\ell) = f(n) g(m) \implies f(n) = 0.$$
By induction, we conclude that $f(n) = 0$ for all $n$, and hence there are no zero divisors.
If $f \in A$ is invertible, then there exists $g \in A$ such that $f * g = I$. In particular,
$$1 = I(1) = f(1) g(1),$$
and hence $f(1) \neq 0$.
Conversely, let $f \in A$ be given with $f(1) \neq 0$. We then recursively construct a $g \in A$ with $f * g = I$. Firstly, set
$$g(1) := \frac{1}{f(1)}$$
so that
$$(f * g)(1) = f(1) g(1) = 1 = I(1).$$
Now suppose that $n \geq 2$ and the values $g(k)$ are given for every $1 \leq k \leq n - 1$ such that $(f * g)(k) = I(k)$ for $1 \leq k \leq n - 1$. Since $n > 1$, we want to choose $g(n)$ so that
$$0 = I(n) = (f * g)(n) = f(1) g(n) + \sum_{\substack{d \mid n\\ d < n}} f\left(\frac{n}{d}\right) g(d)$$
holds. This is satisfied for
$$g(n) := -\frac{1}{f(1)} \sum_{\substack{d \mid n\\ d < n}} f\left(\frac{n}{d}\right) g(d).$$
□

Examples 9.7. We again set $U(n) := 1$ and $E^k(n) := n^k$ for all $n \in \mathbb{N}$. By Theorem 9.1, we have
$$\mu * U = I.$$
Theorem 9.2 can be written as follows: if $F = f * U$, then $f = F * \mu$. Note next that
$$(U * U)(n) = \sum_{d \mid n} 1 = \sigma_0(n) = \tau(n) \quad \text{(number of divisors of } n\text{)}.$$
Convolution with $\mu$ (using $\mu * U = I$ from above) yields
$$\tau * \mu = U * U * \mu = U * I = U.$$
Therefore
$$\sum_{d \mid n} \mu(d)\, \tau\left(\frac{n}{d}\right) = 1$$
for all $n \in \mathbb{N}$. It follows that the inverse (under convolution) $\tau^{-1}$ of $\tau$ is
$$\tau^{-1} = \mu * \mu.$$
By Theorem 6.9 and Theorem 9.2, we have
$$\varphi = \mu * E, \qquad E = \varphi * U.$$
The sum of divisors functions satisfy (for $k \geq 0$ an integer; note that $E^0 = U$)
$$\sigma_k = E^k * U,$$
from which one concludes that
$$\sigma_k * \varphi = E^k * E.$$
Since $\varphi = E * \mu$ and $\tau = U * U$, we moreover have
$$\varphi * \tau = E * \mu * U * U = E * U = \sigma_1,$$
from which one concludes that
$$\sum_{d \mid n} d = (\varphi * \tau)(n) = \sum_{d \mid n} \varphi(d)\, \tau\left(\frac{n}{d}\right).$$
Using $\mu * U = I$ and $\sigma_1 = E * U$, we similarly have
$$\varphi * \sigma_1 = E * \mu * U * E = E * E,$$
and hence
$$\sum_{d \mid n} \varphi(d)\, \sigma_1\left(\frac{n}{d}\right) = \sum_{d \mid n} d \cdot \frac{n}{d} = n\tau(n).$$
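The convolution identities in Examples 9.7 can be verified numerically for small $n$. The sketch below (all names ours) implements $\mu$, $\varphi$, $\tau$, $\sigma_1$ and the convolution directly from their definitions.

```python
from math import gcd


def divisors(n: int) -> list:
    return [d for d in range(1, n + 1) if n % d == 0]


def conv(f, g):
    """Dirichlet convolution f * g, returned as a function of n."""
    return lambda n: sum(f(d) * g(n // d) for d in divisors(n))


def mobius(n: int) -> int:
    result, p = 1, 2
    while n > 1:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0
            result = -result
        p += 1
    return result


def I(n: int) -> int:
    return 1 if n == 1 else 0


def phi(n: int) -> int:
    return sum(1 for k in range(1, n + 1) if gcd(k, n) == 1)


def tau(n: int) -> int:
    return len(divisors(n))


def sigma1(n: int) -> int:
    return sum(divisors(n))


mu_mu = conv(mobius, mobius)
for n in range(1, 120):
    assert conv(tau, mobius)(n) == 1              # tau * mu = U
    assert conv(tau, mu_mu)(n) == I(n)            # tau^{-1} = mu * mu
    assert conv(phi, tau)(n) == sigma1(n)         # phi * tau = sigma_1
    assert conv(phi, sigma1)(n) == n * tau(n)     # phi * sigma_1 = E * E
```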

Theorem 9.8. If $f$ and $g$ are multiplicative functions, then so are $f * g$ and $f^{-1}$.

The multiplicative functions form a subgroup of the invertible elements of $A$ with respect to convolution.

Proof. Suppose that $m, n \in \mathbb{N}$ are relatively prime. Every divisor $d$ of $mn$ has a unique representation of the form $d = d_1 d_2$ with $d_1 \mid m$ and $d_2 \mid n$. Therefore, we have
$$(f * g)(mn) = \sum_{d \mid mn} f(d)\, g\left(\frac{mn}{d}\right) = \sum_{d_1 \mid m} \sum_{d_2 \mid n} f(d_1 d_2)\, g\left(\frac{m}{d_1} \cdot \frac{n}{d_2}\right).$$
Since $m$ and $n$ are relatively prime, we have $\gcd(d_1, d_2) = 1$ and $\gcd\left(\frac{m}{d_1}, \frac{n}{d_2}\right) = 1$. Since $f$ and $g$ are multiplicative, it follows that
$$(f * g)(mn) = \sum_{d_1 \mid m} f(d_1)\, g\left(\frac{m}{d_1}\right) \sum_{d_2 \mid n} f(d_2)\, g\left(\frac{n}{d_2}\right) = (f * g)(m) \cdot (f * g)(n).$$
Next recall that $f(1) = 1$ because $f$ is multiplicative (and, as usual, not identically zero). Theorem 9.6 then implies the existence of the inverse $f^{-1}$ of $f$. We next define a multiplicative function $h : \mathbb{N} \to \mathbb{C}$. We set $h(1) := 1$ and for $n = \prod_{j=1}^{\ell} p_j^{a_j}$ with distinct primes $p_j$, we define
$$h(n) := \prod_{j=1}^{\ell} f^{-1}\left(p_j^{a_j}\right).$$
It is easy to see that $h$ is multiplicative. We next show that $h = f^{-1}$ in order to show that $f^{-1}$ is multiplicative. From the first part of this proof, since $f$ and $h$ are both multiplicative, we conclude that $f * h$ is multiplicative. If $n = p^r$ is a prime power, then $h(p^r) = f^{-1}(p^r)$ and
$$(f * h)(p^r) = \sum_{\nu=0}^{r} f\left(p^{r-\nu}\right) h\left(p^\nu\right) = \sum_{\nu=0}^{r} f\left(p^{r-\nu}\right) f^{-1}\left(p^\nu\right) = \left(f * f^{-1}\right)(p^r) = I(p^r).$$
Since $I$ and $f * h$ are both multiplicative, they are completely determined by their values at prime powers, and hence $f * h = I$. We conclude that $h = f^{-1}$, and hence $f^{-1}$ is multiplicative. □

10. Sums of two squares
A right triangle all of whose sides have integral length $x, y, z \in \mathbb{N}$ is called a Pythagorean triangle. The side lengths satisfy
$$x^2 + y^2 = z^2$$
and the ordered triple $(x, y, z)$ is called a Pythagorean triple.
Example 10.1. The ordered triples $(3, 4, 5)$ and $(5, 12, 13)$ are Pythagorean triples.
If $\gcd(x, y, z) = 1$, then the Pythagorean triple is called primitive. It is easy to see that if a Pythagorean triple is primitive, then we furthermore have
$$\gcd(x, y) = \gcd(y, z) = \gcd(x, z) = 1.$$
In particular, $x$ and $y$ cannot both be even. They are also not both odd, because then $z^2 \equiv 2 \pmod{4}$, which is impossible. Without loss of generality, one may hence assume that $x$ is odd and $y$ is even.
Theorem 10.2 (Diophantus). Every primitive Pythagorean triple $(x, y, z)$ with odd $x$ and even $y$ satisfies
$$x = r^2 - s^2, \qquad y = 2rs, \qquad z = r^2 + s^2$$
with integers $r, s \in \mathbb{Z}$ for which $r > s > 0$, $\gcd(r, s) = 1$, and $r \not\equiv s \pmod{2}$.
Conversely, every pair of such integers $r$ and $s$ yields a primitive Pythagorean triple of the form
$$\left(r^2 - s^2,\ 2rs,\ r^2 + s^2\right).$$


Proof. Suppose that $r$ and $s$ satisfy the conditions of the theorem and set $x = r^2 - s^2$, $y = 2rs$, and $z = r^2 + s^2$. Then one sees that
$$x^2 + y^2 = r^4 - 2r^2 s^2 + s^4 + 4r^2 s^2 = r^4 + 2r^2 s^2 + s^4 = \left(r^2 + s^2\right)^2 = z^2.$$
Moreover, $\gcd(x, y, z) = 1$: since $r \not\equiv s \pmod{2}$, $x$ is odd, so a common prime divisor $p$ of $x$, $y$, $z$ would satisfy $p \neq 2$ and hence $p \mid r$ or $p \mid s$; together with $p \mid x = r^2 - s^2$ this gives $p \mid \gcd(r, s) = 1$, which is impossible.
Next suppose that $x^2 + y^2 = z^2$ with $\gcd(x, y, z) = 1$ and $y$ even. Then $x$ and $z$ are odd and both $\frac{1}{2}(z + x) \in \mathbb{Z}$ and $\frac{1}{2}(z - x) \in \mathbb{Z}$. It follows that
$$\left(\frac{y}{2}\right)^2 = \frac{1}{4}\left(z^2 - x^2\right) = \frac{z + x}{2} \cdot \frac{z - x}{2}.$$
Writing $d = \gcd\left(\frac{z + x}{2}, \frac{z - x}{2}\right)$, we see that $d$ is a divisor of
$$\frac{1}{2}(z + x) + \frac{1}{2}(z - x) = z$$
and also of
$$\frac{1}{2}(z + x) - \frac{1}{2}(z - x) = x.$$
Since $\gcd(x, z) = 1$, it follows that $d = 1$.
Due to the uniqueness of prime factorization, since the product of the positive integers $\frac{1}{2}(z + x)$ and $\frac{1}{2}(z - x)$ is a square and they are relatively prime, they are each squares. Thus there exist $r, s \in \mathbb{N}$ with $r > s$ such that
$$\frac{1}{2}(z + x) = r^2, \qquad \frac{1}{2}(z - x) = s^2.$$
Taking the sum and difference and noting that their product is $\left(\frac{y}{2}\right)^2$, we conclude that
$$z = r^2 + s^2, \qquad x = r^2 - s^2, \qquad y = 2rs.$$
Moreover, since $\frac{1}{2}(z + x) = r^2$ and $\frac{1}{2}(z - x) = s^2$ are relatively prime, we conclude that $\gcd(r, s) = 1$, and $r \not\equiv s \pmod{2}$ because $x = r^2 - s^2$ is odd. □
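Theorem 10.2 can be checked by generating triples from admissible parameters $(r, s)$ and comparing with an exhaustive search for primitive triples with $x$ odd and $y$ even. A minimal sketch (Python 3.8+ for `math.isqrt`; function names ours):

```python
from math import gcd, isqrt


def triples_from_parameters(zmax: int) -> set:
    """(r^2 - s^2, 2rs, r^2 + s^2) for r > s > 0, gcd(r, s) = 1, r - s odd, z <= zmax."""
    out = set()
    for r in range(2, isqrt(zmax) + 1):
        for s in range(1, r):
            if gcd(r, s) == 1 and (r - s) % 2 == 1 and r * r + s * s <= zmax:
                out.add((r * r - s * s, 2 * r * s, r * r + s * s))
    return out


def triples_by_search(zmax: int) -> set:
    """Primitive solutions of x^2 + y^2 = z^2 with x odd, y even, z <= zmax."""
    out = set()
    for z in range(1, zmax + 1):
        for y in range(2, z, 2):                  # y even
            x2 = z * z - y * y
            x = isqrt(x2)
            if x > 0 and x * x == x2 and x % 2 == 1 and gcd(gcd(x, y), z) == 1:
                out.add((x, y, z))
    return out


assert triples_from_parameters(200) == triples_by_search(200)
print(sorted(triples_from_parameters(30), key=lambda t: t[2]))
```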

Fermat considered whether similar equations were solvable with powers higher than 2. He famously claimed that $x^n + y^n = z^n$ has no solutions in natural numbers for $n > 2$ (this claim later became known as “Fermat’s Last Theorem” and was only proven about 350 years later). Below is the $n = 4$ case, which he himself proved.

Theorem 10.3 (Fermat).

(1) The equation $x^4 + y^4 = z^2$ has no solutions with natural (positive) numbers $x, y, z \in \mathbb{N}$.
(2) The equation $x^4 + y^4 = z^4$ has no solutions with natural (positive) numbers $x, y, z \in \mathbb{N}$.

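Proof. The second claim follows from the first claim: if $x^4 + y^4 = z^4$, then $(x, y, z^2)$ solves $x^4 + y^4 = \left(z^2\right)^2$. We therefore only need to show (1). Assume for contradiction that a solution to $x^4 + y^4 = z^2$ in $\mathbb{N}^3$ exists, and let $(x, y, z)$ be a solution with minimal $z$. Due to the minimality, we have $\gcd(x, y, z) = 1$: the number $d = \gcd(x, y, z)$ satisfies $d^4 \mid z^2$, hence $d^2 \mid z$, and
$$\left(\frac{x}{d}\right)^4 + \left(\frac{y}{d}\right)^4 = \left(\frac{z}{d^2}\right)^2,$$
so $d > 1$ would produce a solution with smaller $z$. Since $\gcd(x, y, z) = 1$, we immediately see that $x$ and $y$ cannot both be even (otherwise $z$ would be even as well). Moreover, they cannot both be odd because if they were then
$$z^2 \equiv 1 + 1 \equiv 2 \pmod 4,$$
and this congruence is not solvable. Interchanging $x$ and $y$ if necessary, we may therefore assume that $x$ is odd, $y$ is even, and hence $z$ is odd.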
We next rewrite the equation $x^4 + y^4 = z^2$ as
$$x^4 = z^2 - y^4 = \left(z - y^2\right)\left(z + y^2\right).$$
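If it were the case that $d = \gcd\left(z - y^2, z + y^2\right) > 1$, then $d \mid 2z$, $d \mid 2y^2$, and $d \mid x^4$. Since $x$ is odd, we conclude that $d$ is odd and hence $d \mid z$ and $d \mid y^2$; any prime divisor of $d$ would then divide $x$, $y$, and $z$, yielding a contradiction to $\gcd(x, y, z) = 1$. Thus $\gcd\left(z - y^2, z + y^2\right) = 1$. Since both factors are positive and their product is a fourth power, it follows that
$$z - y^2 = u^4, \qquad z + y^2 = v^4$$
for some relatively prime $u, v \in \mathbb{N}$. Hence
$$2y^2 = v^4 - u^4 = \left(v^2 + u^2\right)\left(v^2 - u^2\right)$$
and
$$2z = v^4 + u^4.$$
Since $z$ is odd and $y$ is even, the numbers $u^4 = z - y^2$ and $v^4 = z + y^2$ are odd, so $u \equiv v \equiv 1 \pmod 2$, from which we conclude that $u^2 + v^2 \equiv 2 \pmod 8$.
Since $\gcd(u, v) = 1$, the integers $v^2 - u^2$ and $v^2 + u^2$ do not have any common odd divisors $> 1$. As $u^2 + v^2 \equiv 2 \pmod 8$, the integer $\frac{1}{2}\left(u^2 + v^2\right)$ is odd and hence relatively prime to $v^2 - u^2$; since their product is $y^2$, both are squares, and it follows that
$$u^2 + v^2 = 2b^2, \qquad v^2 - u^2 = a^2$$
with relatively prime $a, b \in \mathbb{N}$. Since $u^2 + a^2 = v^2$ with $\gcd(u, v) = 1$ and $a$ even, the integers $(u, a, v)$ form a primitive Pythagorean triple and Theorem 10.2 implies that there exist relatively prime $r$ and $s$ satisfying
$$u = r^2 - s^2, \qquad a = 2rs, \qquad v = r^2 + s^2.$$
Since
$$2b^2 = u^2 + v^2 = \left(r^2 - s^2\right)^2 + \left(r^2 + s^2\right)^2 = 2\left(r^4 + s^4\right),$$
it follows that
$$r^4 + s^4 = b^2.$$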
Hence $(r, s, b)$ is another solution to the original equation (note that $r, s \in \mathbb{N}$). We next show that $b < z$, contradicting the minimality of $z$.
If $u = v = 1$, then $y = 0$, which contradicts $y \in \mathbb{N}$. Thus $u^4 + v^4 > u^2 + v^2$ and
$$z = \frac{1}{2}\left(u^4 + v^4\right) > \frac{1}{2}\left(u^2 + v^2\right) = b^2 \geq b.$$
This contradicts the minimality of $z$. □
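A finite search cannot replace the descent argument above, but it is a cheap sanity check. The Python sketch below (the helper name `square_values` is hypothetical) looks for $x^e + y^e = z^2$ with small $x \leq y$; for $e = 4$ Theorem 10.3 (1) predicts no hits, while $e = 2$ gives Pythagorean triples for contrast.

```python
from math import isqrt


def square_values(bound, e):
    """All (x, y, z) with 1 <= x <= y <= bound and x^e + y^e = z^2."""
    hits = []
    for x in range(1, bound + 1):
        for y in range(x, bound + 1):
            n = x**e + y**e
            z = isqrt(n)               # exact integer square root
            if z * z == n:
                hits.append((x, y, z))
    return hits


print(square_values(200, 4))   # [] as predicted by Theorem 10.3 (1)
print(square_values(10, 2))    # [(3, 4, 5), (6, 8, 10)]
```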
Questions.
(1) Which $n$ are sums of two squares?
(2) How often is such an $n$ representable (i.e., how many solutions to $x^2 + y^2 = n$ are there)?
Consider the ring (this just means that one can add, subtract, and multiply as usual)

R = Z[i] = {a + bi : a, b ∈ Z} .

The ring R is called the Gaussian integers and we have seen already that it is a
Euclidean domain. The units of R are

R× = {1, −1, i, −i}.

The Euclidean function associated to $R$ is the absolute value squared (which we call the norm), and the norm of an element $a + bi \in R$ is $N(a + bi) := |a + bi|^2 = a^2 + b^2$, giving a connection with sums of squares. For arbitrary $x, y \in \mathbb{Z}$, we have
$$|x + iy|^2 = x^2 + y^2 \equiv 0, 1, \text{ or } 2 \pmod 4.$$
We immediately conclude that integers $n \equiv 3 \pmod 4$ are not sums of two squares.
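Suppose that $p$ is prime and $p \equiv 1 \pmod 4$. Then we have $\left(\frac{-1}{p}\right) = 1$. Thus $x^2 + 1 \equiv 0 \pmod p$ has a solution $x \in \mathbb{Z}$ with $0 < x < p$. For $\gamma = x + i \in R$, it follows that
$$N(\gamma) = \gamma\overline{\gamma} = x^2 + 1 = pm$$
for some $m \in \mathbb{Z}$ with $1 \leq m < p$. If $p$ were a prime element of $R$, then since $p \mid \gamma\overline{\gamma}$, it would follow by definition that $p \mid \gamma$ or $p \mid \overline{\gamma}$. Since $\frac{x + i}{p} \notin R$ and $\frac{x - i}{p} \notin R$, we see that $p$ cannot be a prime element of $R$. As noted following Theorem 3.12, every irreducible element is prime in a Euclidean domain, so $p$ is also not irreducible. Hence $p = \pi\beta$ with neither $\pi$ nor $\beta$ a unit. Since $N(\pi) = \pi\overline{\pi}$ divides $N(p) = p^2$ and $N(\pi) \in \mathbb{N}$, unique prime factorization over the integers gives $N(\pi) \in \{1, p, p^2\}$; as $N(\pi)N(\beta) = p^2$ and elements of norm $1$ are units, we get $N(\pi) = N(\beta) = p$. This shows that in $R = \mathbb{Z}[i]$ we have $p = \pi\overline{\pi}$ with $\pi = a + bi$ and $\overline{\pi} = a - bi$, which are prime elements of $R$ because their norms are prime. Thus $p = a^2 + b^2$ is a sum of two squares.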
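The argument above is constructive: once $x$ with $x^2 \equiv -1 \pmod p$ is known, a greatest common divisor of $p$ and $x + i$ in $\mathbb{Z}[i]$ has norm $p$. The following Python sketch assumes Gaussian integers are stored as integer pairs and uses division with rounding to the nearest Gaussian integer as the Euclidean step; all function names are illustrative. Finding $x$ as $c^{(p-1)/4}$ for a quadratic non-residue $c$ (Euler's criterion) is one standard choice, not a method taken from the notes.

```python
def gauss_mul(x, y):
    """Multiply Gaussian integers given as pairs (a, b), meaning a + bi."""
    a, b = x
    c, d = y
    return (a * c - b * d, a * d + b * c)


def gauss_divmod(x, y):
    """Division with remainder in Z[i]: round x / y to a nearest Gaussian integer q,
    so that r = x - q*y satisfies N(r) <= N(y) / 2 < N(y)."""
    a, b = x
    c, d = y
    n = c * c + d * d                          # N(y)
    re, im = a * c + b * d, b * c - a * d      # x * conj(y) = (x / y) * N(y)
    q = ((2 * re + n) // (2 * n), (2 * im + n) // (2 * n))   # nearest integers
    qy = gauss_mul(q, y)
    return q, (a - qy[0], b - qy[1])


def gauss_gcd(x, y):
    """Euclidean algorithm in Z[i]; the result is a gcd up to a unit."""
    while y != (0, 0):
        _, r = gauss_divmod(x, y)
        x, y = y, r
    return x


def sqrt_of_minus_one(p):
    """Some x with x^2 = -1 (mod p), for a prime p = 1 (mod 4)."""
    for c in range(2, p):
        if pow(c, (p - 1) // 2, p) == p - 1:   # c is a quadratic non-residue
            return pow(c, (p - 1) // 4, p)
    raise ValueError("p must be a prime with p = 1 (mod 4)")


def two_squares(p):
    """Return (a, b) with a <= b and a^2 + b^2 = p, via pi = gcd(p, x + i)."""
    x = sqrt_of_minus_one(p)
    a, b = gauss_gcd((p, 0), (x, 1))
    return tuple(sorted((abs(a), abs(b))))


print(two_squares(13), two_squares(97))   # (2, 3) (4, 9)
```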

Theorem 10.4. Let $R = \mathbb{Z}[i]$ be the ring of Gaussian integers with norm $N(z) = z\overline{z} = a^2 + b^2$ for $z = a + bi \in R$. For $n \in \mathbb{N}$, let
$$r_2(n) := \#\left\{(x, y) \in \mathbb{Z}^2 : x^2 + y^2 = n\right\}$$
and $\rho_2(n) = \frac{1}{4} r_2(n)$. Then the following hold:
(1) Unique factorization (up to multiplication by units) holds in $R$. The units in $R$ are $1, -1, i, -i$.
(2) Every prime (in the integers) $p \equiv 3 \pmod 4$ is a prime element in $R$ and $\rho_2(p) = r_2(p) = 0$; moreover, we have $\rho_2(p^2) = 1$.
(3) Every prime (in the integers) $p \equiv 1 \pmod 4$ is a product $p = \pi\overline{\pi}$ of prime elements $\pi$ and $\overline{\pi}$ in $R$. Furthermore, $\overline{\pi}/\pi$ is not a unit (that is, $\pi$ and $\overline{\pi}$ are not associates), and $\rho_2(p) = 2$.
(4) Up to units, the unique factorization of $2$ is $2 = -i(1 + i)^2$, which is (up to units) the square of the prime element $1 + i$ in $R$. It holds moreover that $\rho_2(2) = 1$.
(5) The function $\rho_2$ is multiplicative. Furthermore, for prime powers $p^r$, we have
$$\rho_2(p^r) = \begin{cases} 0 & \text{if } p \equiv 3 \pmod 4 \text{ and } r \text{ is odd},\\ 1 & \text{if } p \equiv 3 \pmod 4 \text{ and } r \text{ is even},\\ r + 1 & \text{if } p \equiv 1 \pmod 4,\\ 1 & \text{if } p = 2. \end{cases}$$
(6) A natural number $n$ is a sum of two squares if and only if the prime factorization of $n$ does not contain any primes $p \equiv 3 \pmod 4$ raised to an odd power.
(7) The relations
$$\rho_2\left(p^{r+1}\right) = \rho_2(p)\rho_2(p^r) - \left(\frac{-4}{p}\right)\rho_2\left(p^{r-1}\right)$$
and
$$\rho_2(n) = \sum_{d \mid n} \left(\frac{-4}{d}\right)$$
hold for all primes $p$, all $r \geq 1$, and all $n \in \mathbb{N}$. Here we mean $\left(\frac{-4}{d}\right) = 0$ whenever $d$ is even.
Proof. For part (1), we first show uniqueness. Suppose that $z \in R$ has the factorizations $z = \prod_j \pi_j^{a_j}$ and $z = \varepsilon \prod_j \pi_j^{b_j}$ with $a_j, b_j \in \mathbb{N}_0$, $\varepsilon$ a unit, and $\pi_j$ pairwise non-associate prime elements in $R$ (since we allow $a_j$ and/or $b_j$ to be zero, we can assume that the same primes are appearing). Since $R$ is an integral domain, we may cancel $\pi_j^{\min(a_j, b_j)}$ from both sides for every $j$. If afterwards some exponent on the left, say that of $\pi_j$, were still positive, then the prime element $\pi_j$ would divide the right-hand side $\varepsilon \prod_{k \neq j} \pi_k^{b_k - \min(a_k, b_k)}$. This is impossible if that product is empty, since $\pi_j$ is not a unit; otherwise $\pi_j$ divides some $\pi_k$ with $k \neq j$, and since $\pi_k$ is irreducible, $\pi_j$ and $\pi_k$ would be associates, a contradiction. The same argument applies to the exponents on the right, from which we conclude that $a_j = b_j$ for all $j$. This implies uniqueness of the factorization as a product of prime elements, assuming that such a factorization exists. We next prove the existence of such a factorization (for nonzero non-units $z$) by induction on $N(z)$. Suppose that $z \in R$ and that we know that for every $\tau \in R$ with $N(\tau) < N(z)$ we have such a factorization. If $z$ is irreducible, then $z$ is a prime element of $R$ (as $R$ is a Euclidean domain), so we may assume that $z$ is not irreducible. Thus there exist $x, y \in R$ with $xy = z$ and neither $x$ nor $y$ a unit. One easily sees that $N(z) = N(x)N(y)$. If $N(x) = 1$, then $x\overline{x} = 1$, and since $\overline{x} \in R$, we conclude that $x \in R^\times$ is a unit, which is a contradiction. It follows that $N(x) > 1$ and $N(y) < N(z)$, and by the same argument $N(y) > 1$ and $N(x) < N(z)$. Thus we may use the inductive hypothesis and write both $x$ and $y$ as a product of prime elements; it follows that $z = xy$ also has a factorization of this type (side note: the proof of uniqueness holds in more generality, but we had to use the fact that $R$ is a Euclidean domain to obtain the existence of a factorization of this type; this observation leads to a more general phenomenon known as unique factorization for an object known as the ideals of more general rings, but we will not discuss those further in this course).
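For part (2), let $p \equiv 3 \pmod 4$ be prime. If $p = \alpha\beta$ with $\alpha, \beta \in R$ not units, then $N(\alpha)N(\beta) = p^2$ with $N(\alpha), N(\beta) > 1$, so $N(\alpha) = p$ would be a sum of two squares, which is impossible. Hence $p$ is an irreducible element, and therefore prime.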
For part (3), we have already seen that for $p \equiv 1 \pmod 4$, we have $p = \pi\overline{\pi}$ and $N(\pi) = p = N(\overline{\pi})$. These are irreducible because their norms are prime, and hence they are prime elements because $R$ is a Euclidean domain. Suppose for contradiction that $\overline{\pi} = \varepsilon\pi$ for some unit $\varepsilon \in R^\times$. Writing $\pi = a + ib$, we would then have
$$p = \varepsilon\pi^2 = \varepsilon(a + ib)^2 = \varepsilon\left(a^2 - b^2 + 2abi\right),$$
and $\varepsilon \in \{1, -1, i, -i\}$. For $\varepsilon = \pm 1$ it follows that $ab = 0$, so that $p = \pm a^2$ or $p = \pm b^2$, which is a contradiction. If $\varepsilon = \pm i$, then $a^2 - b^2 = 0$, and hence $p = \mp 2ab$ is even, which contradicts the condition $p \equiv 1 \pmod 4$. We have hence proven (3).
Part (4) follows by direct computation and the fact that 1 + i is irreducible (and
hence prime) because its norm is prime.
We now move on to (5). By definition, $\rho_2(n)$ is the number of equivalence classes of elements of $R$ with norm $n$, where two elements are equivalent if they are associates (each class consists of exactly four elements, since multiplication by a unit preserves the norm). Due to the unique prime factorization in $R$, we conclude that $\rho_2$ is multiplicative (if $m > 1$ and $n > 1$ are relatively prime and $N(x) = mn$, then $x = x_1x_2$ with $N(x_1) = m$ and $N(x_2) = n$, as we can split $x$ into a product of primes with norm dividing $m$ and a product of primes with norm dividing $n$).
One obtains from (2) that for $p \equiv 3 \pmod 4$ prime
$$\rho_2(p^r) = \begin{cases} 1 & \text{for even } r,\\ 0 & \text{for odd } r. \end{cases}$$
If $p \equiv 1 \pmod 4$ is prime, then $p = \pi\overline{\pi}$ with prime elements $\pi, \overline{\pi} \in R$. Since the norm is multiplicative ($|xy|^2 = |x|^2|y|^2$), the elements $\pi^r, \pi^{r-1}\overline{\pi}, \pi^{r-2}\overline{\pi}^2, \ldots, \overline{\pi}^r$ all have norm $p^r$. As $\overline{\pi}/\pi$ is not a unit, these are pairwise non-associate by (1). Furthermore, up to units they are all possible elements of norm $p^r$, because any element $\alpha$ with $N(\alpha) = p^r$ with $r > 1$ cannot be prime and hence (by induction) has a divisor with norm $p$, which must be either $\pi$ or $\overline{\pi}$ up to units. We conclude that $\rho_2(p^r) = r + 1$.
The elements of norm $2^r$ are all associates of $(1 + i)^r$, from which we conclude that $\rho_2(2^r) = 1$.
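Part (6) follows directly from part (5).
We finally prove part (7). From part (5), we may directly see that
$$\rho_2\left(p^{r+1}\right) = \rho_2(p)\rho_2(p^r) - \left(\frac{-4}{p}\right)\rho_2\left(p^{r-1}\right)$$
for all primes $p$ and $r \geq 1$. The formula
$$\rho_2(p^r) = \sum_{\nu=0}^{r} \left(\frac{-4}{p^\nu}\right) = \sum_{d \mid p^r} \left(\frac{-4}{d}\right)$$
also follows from part (5). Since $n \mapsto \sum_{d \mid n} \left(\frac{-4}{d}\right)$ is multiplicative by Theorem 6.8 and $\rho_2(n)$ is also multiplicative, we conclude that for all $n$
$$\rho_2(n) = \sum_{d \mid n} \left(\frac{-4}{d}\right).$$
□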
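Theorem 10.4 (7) is easy to test numerically, since it says $r_2(n) = 4\rho_2(n) = 4\sum_{d \mid n}\left(\frac{-4}{d}\right)$. The Python sketch below compares this with a direct count of lattice points; the helper names are our own.

```python
from math import isqrt


def r2_brute(n):
    """Number of (x, y) in Z^2 with x^2 + y^2 = n, by direct search."""
    b = isqrt(n)
    return sum(1 for x in range(-b, b + 1) for y in range(-b, b + 1)
               if x * x + y * y == n)


def chi_minus_4(d):
    """The symbol (-4/d): 0 for even d, 1 for d = 1 (mod 4), -1 for d = 3 (mod 4)."""
    if d % 2 == 0:
        return 0
    return 1 if d % 4 == 1 else -1


def r2_formula(n):
    """r_2(n) = 4 * rho_2(n) = 4 * sum_{d | n} (-4/d), by Theorem 10.4 (7)."""
    return 4 * sum(chi_minus_4(d) for d in range(1, n + 1) if n % d == 0)


assert all(r2_brute(n) == r2_formula(n) for n in range(1, 500))
print(r2_formula(25), r2_formula(21), r2_formula(65))   # 12 0 16
```

For instance $21 = 3 \cdot 7$ contains primes $\equiv 3 \pmod 4$ to an odd power, so $r_2(21) = 0$, in line with part (6).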

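Remarks.
(1) For $z \in \mathbb{H} := \{z \in \mathbb{C} : \operatorname{Im}(z) > 0\}$, Jacobi considered the theta series
$$\vartheta(z) = \sum_{n=-\infty}^{\infty} e^{\pi i n^2 z}.$$
For $m \in \mathbb{N}$ and $n \in \mathbb{N}_0$, let $S_m(n)$ denote the set of solutions to the equation $x_1^2 + x_2^2 + \cdots + x_m^2 = n$ with $x_j \in \mathbb{Z}$, i.e.,
$$S_m(n) := \left\{x \in \mathbb{Z}^m : x_1^2 + \cdots + x_m^2 = n\right\}.$$
Then, letting $r_m(n) := \#S_m(n)$ be the number of solutions $x \in \mathbb{Z}^m$ of $x_1^2 + \cdots + x_m^2 = n$, one sees that
$$\vartheta(z)^m = \sum_{x_1=-\infty}^{\infty} \cdots \sum_{x_m=-\infty}^{\infty} e^{\pi i\left(x_1^2 + \cdots + x_m^2\right)z} = \sum_{n=0}^{\infty} \sum_{x \in S_m(n)} e^{\pi i n z} = \sum_{n=0}^{\infty} r_m(n) e^{\pi i n z}.$$
One calls $r_m(n)$ the number of representations of $n$ as a sum of $m$ squares. The above theorem finds the number of representations of $n$ as the sum of two squares.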
(2) The sum of two squares is a special case of a more general object known as an integral quadratic form (a homogeneous polynomial of degree 2 in several variables whose coefficients are in $\mathbb{Z}$) which is positive definite (a quadratic form $Q$ which does not represent negative numbers and satisfies $Q(x) = 0$ if and only if $x = 0$). In 1801, Gauss started the consideration of representations of $n$ by arbitrary positive-definite binary ($x \in \mathbb{Z}^2$) quadratic forms.
For such a quadratic form $Q(x, y) = ax^2 + bxy + cy^2$ with $a, b, c \in \mathbb{Z}$, one looks for solutions $(x, y) \in \mathbb{Z}^2$ to $n = Q(x, y)$ and tries to count the number of such solutions. The special case $a = 1 = c$ and $b = 0$ is Theorem 10.4.
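(3) Let $f: \mathbb{N} \to \mathbb{C}$ be multiplicative. The Dirichlet series
$$D_f(s) = \sum_{n=1}^{\infty} f(n) n^{-s}$$
then satisfies (wherever the series converges absolutely)
$$D_f(s) = \prod_{p \text{ prime}} D_f(p, s)$$
with
$$D_f(p, s) := \sum_{r=0}^{\infty} f(p^r) p^{-rs}.$$
For the example of the Riemann zeta function
$$\zeta(s) = \sum_{n=1}^{\infty} n^{-s} = \prod_{p \text{ prime}} \zeta(p, s),$$
we see that
$$\zeta(p, s) = \sum_{r=0}^{\infty} p^{-rs} = \frac{1}{1 - p^{-s}},$$
and hence
$$\zeta(s) = \prod_{p \text{ prime}} \frac{1}{1 - p^{-s}}.$$
From Theorem 10.4, we have the example
$$D_{\rho_2}(s) = \sum_{n=1}^{\infty} \rho_2(n) n^{-s} = \prod_{p \text{ prime}} D_{\rho_2}(p, s).$$
For $p \neq 2$, we set $F(x) = \sum_{r=0}^{\infty} \rho_2(p^r) x^r$. By Theorem 10.4 (7) (note that $\left(\frac{-1}{p}\right) = \left(\frac{-4}{p}\right)$ for odd $p$), we have
$$\left(1 - \rho_2(p)x + \left(\frac{-1}{p}\right)x^2\right)F(x) = 1 + \left(\rho_2(p) - \rho_2(p)\right)x + \sum_{r=2}^{\infty}\left(\rho_2(p^r) - \rho_2(p)\rho_2\left(p^{r-1}\right) + \left(\frac{-1}{p}\right)\rho_2\left(p^{r-2}\right)\right)x^r = 1.$$
The factor for $p = 2$ is
$$\sum_{r=0}^{\infty} 2^{-rs} = \frac{1}{1 - 2^{-s}}.$$
Hence we conclude that
$$D_{\rho_2}(s) = \left(1 - 2^{-s}\right)^{-1} \prod_{\substack{p \text{ prime}\\ p \neq 2}} \frac{1}{1 - \rho_2(p)p^{-s} + \left(\frac{-1}{p}\right)p^{-2s}}.$$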
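The factor for $p \equiv 3 \pmod 4$ can also be computed via Theorem 10.4 (5) as
$$D_{\rho_2}(p, s) = \frac{1}{1 - p^{-2s}} = \frac{1}{1 - p^{-s}} \cdot \frac{1}{1 + p^{-s}},$$
and for $p \equiv 1 \pmod 4$ Theorem 10.4 (5) yields
$$D_{\rho_2}(p, s) = \frac{1}{1 - 2p^{-s} + p^{-2s}} = \frac{1}{\left(1 - p^{-s}\right)^2}.$$
It follows that
$$\begin{aligned} D_{\rho_2}(s) &= \frac{1}{1 - 2^{-s}} \prod_{p \equiv 3 \pmod 4} \frac{1}{1 - p^{-s}} \cdot \frac{1}{1 + p^{-s}} \prod_{p \equiv 1 \pmod 4} \frac{1}{1 - p^{-s}} \cdot \frac{1}{1 - p^{-s}}\\ &= \zeta(s) \prod_{\substack{p \text{ prime}\\ p \neq 2}} \frac{1}{1 - \left(\frac{-1}{p}\right)p^{-s}} = \sum_{n=1}^{\infty} \rho_2(n) n^{-s} = \frac{1}{4} \sum_{\substack{x, y \in \mathbb{Z}\\ (x, y) \neq (0, 0)}} \left(x^2 + y^2\right)^{-s}\\ &= \frac{1}{4} \sum_{\substack{\mu \in \mathbb{Z}[i]\\ \mu \neq 0}} (\mu\overline{\mu})^{-s} = \sum_{\substack{(\mu) \subseteq \mathbb{Z}[i]\\ (\mu) \neq 0}} N(\mu)^{-s}. \end{aligned}$$
Here $(\mu)$ denotes the element $\mu$ modulo units; these classes correspond to what are known as principal ideals (actually all ideals in $\mathbb{Z}[i]$ are principal because it is what is known as a principal ideal domain, but further discussion about this is left for a course in Algebra).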
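The identity $\vartheta(z)^m = \sum_n r_m(n) e^{\pi i n z}$ in Remark (1) says that $r_m(n)$ is a coefficient of a power series in $q = e^{\pi i z}$. The Python sketch below treats $\vartheta$ as a power series in $q$ truncated after $q^N$ (a purely formal computation, so no convergence questions arise) and reads off $r_m(n)$ by repeated multiplication. The function names are illustrative.

```python
def theta_coefficients(N):
    """Coefficients of theta = sum_{n in Z} q^(n^2) up to q^N, where q = e^{pi i z}."""
    c = [0] * (N + 1)
    n = 0
    while n * n <= N:
        c[n * n] += 1 if n == 0 else 2     # n and -n both contribute q^(n^2)
        n += 1
    return c


def multiply_series(f, g, N):
    """Product of two power series in q, truncated after the q^N term."""
    out = [0] * (N + 1)
    for i, fi in enumerate(f):
        if fi:
            for j in range(N + 1 - i):
                out[i + j] += fi * g[j]
    return out


def r_m_list(m, N):
    """[r_m(0), ..., r_m(N)], read off as the coefficients of theta^m."""
    theta = theta_coefficients(N)
    power = [1] + [0] * N                  # the constant series 1
    for _ in range(m):
        power = multiply_series(power, theta, N)
    return power


print(r_m_list(2, 10))   # [1, 4, 4, 0, 4, 8, 0, 0, 4, 4, 8]
```

For $m = 2$ the coefficients agree with $r_2(n) = 4\rho_2(n)$ from Theorem 10.4.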

11. Representations of real numbers via continued fractions
A real number $x$ is called irrational if $x \notin \mathbb{Q}$.
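Approximation of real numbers by rational numbers. For every $x \in \mathbb{R}$ and $n \in \mathbb{N}$, there exists an $m \in \mathbb{Z}$ (for example, the integer closest to $nx$) for which
$$\left|x - \frac{m}{n}\right| \leq \frac{1}{2n}.$$
Questions. For every $x \in \mathbb{R}$, is there a choice of (reduced) rational number $\frac{m}{n}$ such that $\left|x - \frac{m}{n}\right|$ is “significantly smaller” than $\frac{1}{2n}$? How small can $\left|x - \frac{m}{n}\right|$ get? Finally, how does one find $\frac{m}{n}$ approximating $x$ very well?
Theorem 11.1 (Hurwitz). If $x \in \mathbb{R}$ but $x \notin \mathbb{Q}$, then there are infinitely many $\frac{m}{n}$ with $\gcd(m, n) = 1$ and
$$\left|x - \frac{m}{n}\right| < \frac{1}{\sqrt{5}\,n^2}.$$
We will prove this later.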
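Definition. Let $\frac{u_0}{u_1}$ be a reduced fraction with $u_1 > 0$. Using the Euclidean algorithm, one has
$$\begin{aligned} u_0 &= u_1 a_0 + u_2, && 0 < u_2 < u_1,\\ u_1 &= u_2 a_1 + u_3, && 0 < u_3 < u_2,\\ &\ \,\vdots\\ u_{k-1} &= u_k a_{k-1} + u_{k+1}, && 0 < u_{k+1} < u_k,\\ u_k &= u_{k+1} a_k. \end{aligned}$$
The numbers $a_j$ satisfy $a_0 \in \mathbb{Z}$ and $a_1, \ldots, a_k \in \mathbb{N}$.
The rational number $\zeta_j := \frac{u_j}{u_{j+1}}$ (which is $> 1$ for $j \geq 1$) satisfies
$$\zeta_0 = a_0 + \frac{1}{\zeta_1}, \quad \zeta_1 = a_1 + \frac{1}{\zeta_2}, \quad \ldots, \quad \zeta_{k-1} = a_{k-1} + \frac{1}{\zeta_k}, \quad \zeta_k = a_k.$$
It follows that
$$\frac{u_0}{u_1} = \zeta_0 = a_0 + \frac{1}{\zeta_1} = a_0 + \cfrac{1}{a_1 + \cfrac{1}{\zeta_2}} = \cdots = a_0 + \cfrac{1}{a_1 + \cfrac{1}{a_2 + \cfrac{1}{a_3 + \cfrac{1}{\ddots + \cfrac{1}{a_{k-1} + \cfrac{1}{a_k}}}}}}.$$
This formula is called a continued fraction expansion for $\frac{u_0}{u_1}$, and $a_0, a_1, \ldots, a_k$ are called the partial quotients of $\frac{u_0}{u_1}$.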
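The Euclidean algorithm above translates directly into code. In the minimal Python sketch below, `divmod` performs each division step (it rounds down, so $a_0$ may be negative while the remainders stay in $[0, u_j)$), and `evaluate` recomputes the value of a finite continued fraction exactly with `fractions.Fraction`. Both function names are our own.

```python
from fractions import Fraction


def partial_quotients(u0, u1):
    """Partial quotients [a_0, ..., a_k] of u0/u1 (with u1 > 0) from the Euclidean
    algorithm u_{j-1} = u_j a_{j-1} + u_{j+1}."""
    quotients = []
    while u1 != 0:
        a, r = divmod(u0, u1)          # floor division: 0 <= r < u1, also for u0 < 0
        quotients.append(a)
        u0, u1 = u1, r
    return quotients


def evaluate(cf):
    """Exact value of [x_0, ..., x_k], computed from the innermost term outwards."""
    value = Fraction(cf[-1])
    for x in reversed(cf[:-1]):
        value = x + 1 / value
    return value


print(partial_quotients(355, 113))                     # [3, 7, 16]
print(evaluate([3, 7, 16]), evaluate([3, 7, 15, 1]))   # 355/113 355/113
```

The second line also illustrates the identity $[a_0, \ldots, a_{k-1}, a_k] = [a_0, \ldots, a_{k-1}, a_k - 1, 1]$ discussed in Remark 11.2.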
Suppose that $x_0, x_1, \ldots, x_k \in \mathbb{R}$ (one usually assumes that $x_j \in \mathbb{Q}$) with $x_j > 0$ for $1 \leq j \leq k$. Then
$$[x_0, x_1, \ldots, x_k] := x_0 + \cfrac{1}{x_1 + \cfrac{1}{x_2 + \cfrac{1}{x_3 + \cfrac{1}{\ddots + \cfrac{1}{x_{k-1} + \cfrac{1}{x_k}}}}}}$$
is called a generalized (finite) continued fraction. If $x_0 \in \mathbb{Z}$ and $x_1, \ldots, x_k \in \mathbb{N}$, then we call the continued fraction simple or regular.

Remark 11.2. The continued fractions satisfy the relation
$$[x_0, x_1, \ldots, x_k] = x_0 + \frac{1}{[x_1, \ldots, x_k]} = \left[x_0, x_1, \ldots, x_{k-2}, x_{k-1} + \frac{1}{x_k}\right].$$
Hence in particular
$$[a_0, a_1, \ldots, a_k, 1] = [a_0, a_1, \ldots, a_{k-1}, a_k + 1].$$
In the Euclidean algorithm we have $a_k > 1$ (for $k \geq 1$, since $u_k > u_{k+1}$), so that
$$\frac{u_0}{u_1} = [a_0, a_1, \ldots, a_k] = [a_0, \ldots, a_{k-1}, a_k - 1, 1].$$
We see next that the above identity between regular continued fractions is essentially the only relation; in other words, up to the above identity, the expansion as a regular continued fraction is unique.

Theorem 11.3. If two regular continued fractions $a = [a_0, a_1, \ldots, a_k]$ and $b = [b_0, b_1, \ldots, b_n]$ satisfy $a = b$, with $a_k > 1$ if $k \geq 1$ and $b_n > 1$ if $n \geq 1$, then $k = n$ and $a_j = b_j$ for $0 \leq j \leq k$.
In particular, every rational number $q$ has a unique representation as a regular continued fraction $q = [a_0, a_1, \ldots, a_k]$ with $a_k > 1$ if $k \geq 1$.

Proof. Set $x_j := [a_j, a_{j+1}, \ldots, a_k]$ for $0 \leq j \leq k$ and $y_j := [b_j, b_{j+1}, \ldots, b_n]$ for $0 \leq j \leq n$. For $j \geq 1$ we have $x_j > 0$ and $y_j > 0$. Thus we see directly that for $j < k$
$$x_j = a_j + \frac{1}{[a_{j+1}, \ldots, a_k]} = a_j + \frac{1}{x_{j+1}}. \tag{11.1}$$
We plan to use this identity to apply an inductive argument. First note that from (11.1), it follows that for every $1 \leq j \leq k - 1$
$$x_j > a_j \geq 1.$$
Moreover, $x_k = a_k > 1$ if $k \geq 1$. Hence $x_{j+1} > 1$ for $0 \leq j \leq k - 1$, and (11.1) gives $a_j < x_j < a_j + 1$ for $0 \leq j \leq k - 1$. Hence $a_j = \lfloor x_j \rfloor$ for $0 \leq j \leq k$.
Similarly, we have
$$y_j = b_j + \frac{1}{y_{j+1}},$$
implying that $b_j < y_j < b_j + 1$ for $0 \leq j \leq n - 1$ and $b_j = \lfloor y_j \rfloor$ for $0 \leq j \leq n$. By assumption we have $x_0 = y_0$. Hence
$$a_0 = \lfloor x_0 \rfloor = \lfloor y_0 \rfloor = b_0.$$
Now assume that for some $0 \leq j < \min\{k, n\}$ we have $x_j = y_j$ and $a_j = b_j$. Then it follows that
$$x_{j+1} = \frac{1}{x_j - a_j} = \frac{1}{y_j - b_j} = y_{j+1},$$
and hence also
$$a_{j+1} = \lfloor x_{j+1} \rfloor = \lfloor y_{j+1} \rfloor = b_{j+1}.$$
Inductively, we obtain $x_j = y_j$ and $a_j = b_j$ for all $0 \leq j \leq \min\{k, n\}$. By symmetry, suppose for contradiction that $k < n$. Then it follows that $x_k = y_k$ and
$$a_k = b_k = y_k - \frac{1}{y_{k+1}} < y_k = x_k,$$
which contradicts $a_k = x_k$. It follows that $k = n$, and the proof is complete. □
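Theorem 11.4. Suppose that $a_0 \in \mathbb{Z}$ and let a sequence $(a_\nu)_{\nu \geq 1}$ of positive integers $a_\nu \in \mathbb{N}$ be given. Define the sequences $(h_n)_{n \geq -2}$ and $(k_n)_{n \geq -2}$ recursively by
$$\begin{aligned} h_{-2} &= 0, & h_{-1} &= 1, & h_n &= a_n h_{n-1} + h_{n-2},\\ k_{-2} &= 1, & k_{-1} &= 0, & k_n &= a_n k_{n-1} + k_{n-2}, \end{aligned} \qquad n \geq 0.$$
One then sets $r_n := [a_0, a_1, \ldots, a_n]$ for $n \geq 0$. Then the following hold.
(1) For all $n \geq 0$ and all $x \in \mathbb{R}$ with $x > 0$, one has
$$[a_0, a_1, \ldots, a_{n-1}, x] = \frac{x h_{n-1} + h_{n-2}}{x k_{n-1} + k_{n-2}}.$$
(2) For $n \geq 0$, one has $r_n = \frac{h_n}{k_n}$ with $\gcd(h_n, k_n) = 1$.
(3) For every $n \geq -1$, one has
$$h_n k_{n-1} - h_{n-1} k_n = (-1)^{n-1},$$
and, for $n \geq 1$,
$$r_n - r_{n-1} = \frac{(-1)^{n-1}}{k_n k_{n-1}}.$$
(4) For $n \geq 0$, we have
$$h_n k_{n-2} - h_{n-2} k_n = (-1)^n a_n,$$
and, for $n \geq 2$,
$$r_n - r_{n-2} = \frac{(-1)^n a_n}{k_n k_{n-2}}.$$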
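Proof.
(1) For $n = 0$, the right-hand side is $x$ and the left-hand side is $[x] = x$. For $n = 1$, the left-hand side is
$$[a_0, x] = a_0 + \frac{1}{x},$$
while the right-hand side is
$$\frac{x h_0 + h_{-1}}{x k_0 + k_{-1}} = \frac{x a_0 + 1}{x} = a_0 + \frac{1}{x}.$$
Now suppose that for some $n \geq 1$, we have that (1) holds for all $x > 0$. Then, since $a_n + \frac{1}{x} > 0$, it follows that
$$\begin{aligned} [a_0, a_1, \ldots, a_n, x] &= \left[a_0, a_1, \ldots, a_{n-1}, a_n + \frac{1}{x}\right] = \frac{\left(a_n + \frac{1}{x}\right) h_{n-1} + h_{n-2}}{\left(a_n + \frac{1}{x}\right) k_{n-1} + k_{n-2}}\\ &= \frac{x\left(a_n h_{n-1} + h_{n-2}\right) + h_{n-1}}{x\left(a_n k_{n-1} + k_{n-2}\right) + k_{n-1}} = \frac{x h_n + h_{n-1}}{x k_n + k_{n-1}}. \end{aligned}$$
The result hence follows by induction.
(2) For $n = 0$ we have $r_0 = a_0 = \frac{h_0}{k_0}$. For $n \geq 1$, setting $x = a_n$ in (1), we obtain
$$r_n = [a_0, a_1, \ldots, a_n] \overset{(1)}{=} \frac{a_n h_{n-1} + h_{n-2}}{a_n k_{n-1} + k_{n-2}} = \frac{h_n}{k_n}.$$
We defer the proof that $h_n$ and $k_n$ are relatively prime to part (3).
(3) We proceed by induction. By definition, we have
$$h_{-1} k_{-2} - h_{-2} k_{-1} = 1.$$
Now suppose that for some $n \geq 0$, we have
$$h_{n-1} k_{n-2} - h_{n-2} k_{n-1} = (-1)^{n-2}.$$
Then we obtain
$$\begin{aligned} h_n k_{n-1} - h_{n-1} k_n &= \left(a_n h_{n-1} + h_{n-2}\right) k_{n-1} - h_{n-1}\left(a_n k_{n-1} + k_{n-2}\right)\\ &= -\left(h_{n-1} k_{n-2} - h_{n-2} k_{n-1}\right) = (-1)^{n-1}. \end{aligned}$$
Note that this identity combined with Theorem 3.10 (1) implies that $\gcd(h_n, k_n) = 1$ for all $n \geq -1$, and we furthermore conclude that, for $n \geq 1$,
$$r_n - r_{n-1} = \frac{h_n}{k_n} - \frac{h_{n-1}}{k_{n-1}} = \frac{h_n k_{n-1} - h_{n-1} k_n}{k_n k_{n-1}} = \frac{(-1)^{n-1}}{k_n k_{n-1}}.$$
(4) We begin by noting that
$$h_0 k_{-2} - h_{-2} k_0 = a_0$$
and, for $n \geq 1$, using (3),
$$\begin{aligned} h_n k_{n-2} - h_{n-2} k_n &= \left(a_n h_{n-1} + h_{n-2}\right) k_{n-2} - h_{n-2}\left(a_n k_{n-1} + k_{n-2}\right)\\ &= a_n\left(h_{n-1} k_{n-2} - k_{n-1} h_{n-2}\right) = (-1)^n a_n. \end{aligned}$$
Therefore, for $n \geq 2$,
$$r_n - r_{n-2} = \frac{h_n}{k_n} - \frac{h_{n-2}}{k_{n-2}} = \frac{h_n k_{n-2} - h_{n-2} k_n}{k_n k_{n-2}} = \frac{(-1)^n a_n}{k_n k_{n-2}}.$$
□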

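Remark 11.5. The recursion defining $k_n$ and $h_n$ may be written in matrix form as
$$\begin{pmatrix} h_n & h_{n-1}\\ k_n & k_{n-1} \end{pmatrix} = \begin{pmatrix} h_{n-1} & h_{n-2}\\ k_{n-1} & k_{n-2} \end{pmatrix} \begin{pmatrix} a_n & 1\\ 1 & 0 \end{pmatrix}.$$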
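The recursion of Theorem 11.4 and its matrix form in Remark 11.5 can be checked with exact rational arithmetic. In the Python sketch below, the sequence `a` is an arbitrary test input of our choosing, the helper names are hypothetical, and the assertions restate parts (2), (3), and (4) of Theorem 11.4.

```python
from fractions import Fraction
from math import gcd


def convergent_sequences(a):
    """Return [h_0, ..., h_n] and [k_0, ..., k_n] from Theorem 11.4 for a = [a_0, ..., a_n]."""
    h_prev2, h_prev1 = 0, 1            # h_{-2}, h_{-1}
    k_prev2, k_prev1 = 1, 0            # k_{-2}, k_{-1}
    hs, ks = [], []
    for an in a:
        h, k = an * h_prev1 + h_prev2, an * k_prev1 + k_prev2
        hs.append(h)
        ks.append(k)
        h_prev2, h_prev1 = h_prev1, h
        k_prev2, k_prev1 = k_prev1, k
    return hs, ks


def matrix_form(a):
    """Product of the matrices [[a_n, 1], [1, 0]] as in Remark 11.5, starting from
    [[h_{-1}, h_{-2}], [k_{-1}, k_{-2}]], which is the identity matrix."""
    m = [[1, 0], [0, 1]]
    for an in a:
        m = [[m[0][0] * an + m[0][1], m[0][0]],
             [m[1][0] * an + m[1][1], m[1][0]]]
    return m


def value(a):
    """Exact value of the finite continued fraction [a_0, ..., a_n]."""
    v = Fraction(a[-1])
    for x in reversed(a[:-1]):
        v = x + 1 / v
    return v


a = [2, 1, 3, 1, 5, 2, 4]              # arbitrary test sequence
hs, ks = convergent_sequences(a)
for n in range(len(a)):
    assert Fraction(hs[n], ks[n]) == value(a[: n + 1])       # part (2)
    assert gcd(hs[n], ks[n]) == 1
    if n >= 1:                                                # part (3)
        assert hs[n] * ks[n - 1] - hs[n - 1] * ks[n] == (-1) ** (n - 1)
    if n >= 2:                                                # part (4)
        assert hs[n] * ks[n - 2] - hs[n - 2] * ks[n] == (-1) ** n * a[n]

print(matrix_form([3, 7, 15, 1]))      # [[355, 333], [113, 106]]
```

The matrix product reproduces the last two columns of the table for $\pi$ in Examples 11.11 (2).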
We next discuss how to construct infinite continued fractions via limits.
Theorem 11.6. Suppose that $a_0 \in \mathbb{Z}$ and let a sequence $a_1, a_2, a_3, \ldots$ of positive integers ($a_j \in \mathbb{N}$ for $j \geq 1$) be given. Letting $r_n := [a_0, a_1, \ldots, a_n]$ denote the value of the continued fraction formed by the sequence up to $n$ for $n \geq 0$, one has $r_0 < r_2 < r_4 < \cdots < r_5 < r_3 < r_1$ and the limit $\lim_{n \to \infty} r_n$ exists.
Proof. By Theorem 11.4 (4), we have $r_n - r_{n-2} > 0$ for even $n \geq 2$ and $r_n - r_{n-2} < 0$ for odd $n \geq 3$. By Theorem 11.4 (3), we also have $r_{2n} - r_{2n-1} < 0$. Thus we conclude that
$$r_{2n} < r_{2n+2\nu} < r_{2n+2\nu-1} < r_{2n-1} \qquad \forall n, \nu \in \mathbb{N}.$$
Moreover, Theorem 11.4 (3) also implies that
$$r_n - r_{n-1} = \frac{(-1)^{n-1}}{k_n k_{n-1}},$$
and $k_n \to \infty$ because, by its recursive definition (as $a_n > 0$), $k_n = a_n k_{n-1} + k_{n-2} > k_{n-1}$ for $n \geq 2$. Thus the limits of the subsequences with $n$ even and $n$ odd equal each other (they both exist because they are monotone and bounded) and the limit $\lim_{n \to \infty} r_n$ hence exists (and is equal to the limit of each of these subsequences). □
Definition. Let $a_0 \in \mathbb{Z}$ and a sequence $a_1, a_2, \ldots$ with $a_j \in \mathbb{N}$ for $j \geq 1$ be given. The value of the regular infinite continued fraction $[a_0, a_1, a_2, \ldots]$ is the limit
$$\lim_{n \to \infty} [a_0, a_1, \ldots, a_n].$$
One calls $r_n$ the $n$-th convergent of $[a_0, a_1, \ldots]$. The sequences $(h_n)_n$ and $(k_n)_n$ from Theorem 11.4 satisfy $r_n = \frac{h_n}{k_n}$ and $\gcd(h_n, k_n) = 1$.
Questions.
• Is every irrational number the limit of an infinite continued fraction?
• Is it possible for different (regular) infinite continued fractions to have the
same value?
• Can rational numbers be expressed as regular infinite continued fractions?
In other words, can an infinite continued fraction have a rational value?
Theorem 11.7. The value of every regular infinite continued fraction is irrational.
Proof. Set $\vartheta = [a_0, a_1, a_2, \ldots]$ and let $h_n$, $k_n$, and $r_n$ be as in Theorem 11.4. By Theorem 11.6, we have $r_{2n} < \vartheta < r_{2n+1}$, so $\vartheta$ lies strictly between $r_n$ and $r_{n+1}$, and hence
$$0 < |\vartheta - r_n| < |r_{n+1} - r_n|.$$
Multiplying by $k_n$, Theorem 11.4 (parts (2) and (3)) therefore implies that
$$0 < |k_n\vartheta - h_n| < k_n|r_{n+1} - r_n| = \frac{1}{k_{n+1}}.$$
Suppose for contradiction that $\vartheta \in \mathbb{Q}$. Then $\vartheta = \frac{a}{b}$ with $a \in \mathbb{Z}$ and $b \in \mathbb{N}$. Multiplying the previous inequality by $b$, it follows that
$$0 < |k_n a - h_n b| < \frac{b}{k_{n+1}}$$
for all $n$. Since $b$ is fixed independent of $n$ and $k_{n+1} \to \infty$ as $n \to \infty$, the ratio $\frac{b}{k_{n+1}}$ is less than $1$ for $n$ sufficiently large. However, since $k_n$, $a$, $h_n$, and $b$ are integers, so is $k_n a - h_n b$. It follows that for $n$ sufficiently large, $|k_n a - h_n b|$ is an integer strictly between $0$ and $1$, which is a contradiction. We therefore conclude that $\vartheta \notin \mathbb{Q}$. □

Infinite continued fractions also satisfy relations similar to those given in Remark
11.2.

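Lemma 11.8. Let $\vartheta = [a_0, a_1, a_2, \ldots]$ be a regular infinite continued fraction and furthermore set $\vartheta_1 = [a_1, a_2, a_3, \ldots]$. Then we have $a_0 = \lfloor \vartheta \rfloor$ and $\vartheta = a_0 + \frac{1}{\vartheta_1}$.
Proof. Since $r_0 < \vartheta < r_1$, we have
$$a_0 < \vartheta < a_0 + \frac{1}{a_1} \leq a_0 + 1.$$
Hence $a_0 = \lfloor \vartheta \rfloor$. We then conclude from Theorem 11.6 (using $\vartheta_1 > a_1 \geq 1 > 0$) that
$$\vartheta = \lim_{n \to \infty} [a_0, a_1, \ldots, a_n] = \lim_{n \to \infty}\left(a_0 + \frac{1}{[a_1, \ldots, a_n]}\right) = a_0 + \frac{1}{\lim_{n \to \infty} [a_1, \ldots, a_n]} = a_0 + \frac{1}{\vartheta_1}.$$
□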

We are now ready to show uniqueness of the representations of real numbers as continued fractions (whenever such a representation exists).

Theorem 11.9. Two distinct regular infinite continued fractions have different values.
Proof. Suppose that $a_0, b_0 \in \mathbb{Z}$ and $a_n, b_n \in \mathbb{N}$ for $n \geq 1$ are given such that
$$[a_0, a_1, \ldots] = \vartheta = [b_0, b_1, \ldots].$$
By Lemma 11.8, it follows that $a_0 = \lfloor \vartheta \rfloor = b_0$ and
$$[a_1, a_2, \ldots] = \frac{1}{\vartheta - a_0} = \vartheta_1 = \frac{1}{\vartheta - b_0} = [b_1, b_2, \ldots].$$
By induction, one concludes that $a_n = b_n$ for all $n \geq 0$. □
We next show that such a continued fraction representation exists.

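Theorem 11.10 (Continued fraction algorithm). Every irrational number $\vartheta$ has precisely one representation as a regular infinite continued fraction $\vartheta = [a_0, a_1, \ldots]$. One obtains $a_n$ recursively via
$$\vartheta_0 = \vartheta, \quad a_0 = \lfloor \vartheta_0 \rfloor, \qquad \vartheta_n = \frac{1}{\vartheta_{n-1} - a_{n-1}}, \quad a_n = \lfloor \vartheta_n \rfloor \quad \text{for } n \geq 1.$$
Proof. The uniqueness was already shown in Theorem 11.9. Since $\vartheta \notin \mathbb{Q}$, we have $\vartheta_n \notin \mathbb{Q}$, and in particular $\vartheta_n \notin \mathbb{Z}$, for all $n$; hence the recursive formula in the theorem gives an infinite sequence $(a_n)_{n \geq 0}$ of well-defined integers $a_n \in \mathbb{Z}$. Since $a_{n-1} < \vartheta_{n-1} < a_{n-1} + 1$, we have $\vartheta_n > 1$, and therefore $a_n \in \mathbb{N}$ for all $n \geq 1$. It follows that
$$\vartheta = \vartheta_0 = a_0 + \frac{1}{\vartheta_1} = [a_0, \vartheta_1] = \left[a_0, a_1 + \frac{1}{\vartheta_2}\right] = [a_0, a_1, \vartheta_2].$$
Continuing inductively, it follows that $\vartheta = [a_0, a_1, a_2, \ldots, a_{n-1}, \vartheta_n]$ for all $n \geq 1$. Theorem 11.4 (1) then implies that
$$\vartheta = [a_0, a_1, a_2, \ldots, a_{n-1}, \vartheta_n] = \frac{\vartheta_n h_{n-1} + h_{n-2}}{\vartheta_n k_{n-1} + k_{n-2}}.$$
Using parts (2) and (3) of Theorem 11.4, it now follows that
$$\vartheta - r_{n-1} = \frac{\vartheta_n h_{n-1} + h_{n-2}}{\vartheta_n k_{n-1} + k_{n-2}} - \frac{h_{n-1}}{k_{n-1}} = \frac{h_{n-2} k_{n-1} - h_{n-1} k_{n-2}}{k_{n-1}\left(\vartheta_n k_{n-1} + k_{n-2}\right)} = \frac{(-1)^{n+1}}{k_{n-1}\left(\vartheta_n k_{n-1} + k_{n-2}\right)}.$$
Since $\vartheta_n > 1$ and $k_n \to \infty$, we conclude that $\lim_{n \to \infty} r_n = \vartheta$, or in other words $\vartheta = [a_0, a_1, \ldots]$. □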
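In practice an irrational number is often known only through rational bounds, as in Examples 11.11 (2). The following Python sketch (a helper of our own, not part of the notes) runs the algorithm of Theorem 11.10 on both endpoints with exact `Fraction` arithmetic and stops as soon as the two floors disagree, so every returned quotient is guaranteed to be correct for any number strictly between the bounds.

```python
from fractions import Fraction
from math import floor


def partial_quotients_from_bounds(lo, hi, max_terms=50):
    """Partial quotients shared by every number strictly between lo and hi.

    If both endpoints have the same floor a, so does every number between them,
    and x -> 1/(x - a) maps the interval onto one containing the next theta_n.
    """
    lo, hi = Fraction(lo), Fraction(hi)
    quotients = []
    while len(quotients) < max_terms:
        a = floor(lo)
        if floor(hi) != a:
            break                      # the bounds no longer determine a_n
        quotients.append(a)
        if lo == a or hi == a:
            break                      # an endpoint is an integer; stop before dividing by 0
        lo, hi = 1 / (lo - a), 1 / (hi - a)   # the endpoints swap order; harmless
    return quotients


# The bounds 3.1415926 < pi < 3.1415927 used in Examples 11.11 (2):
print(partial_quotients_from_bounds(Fraction(31415926, 10**7), Fraction(31415927, 10**7)))
# [3, 7, 15, 1]
```

Seven correct digits of $\pi$ thus pin down exactly the partial quotients $3, 7, 15, 1$; the next one ($292$) would require tighter bounds.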

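Examples 11.11.
(1) We next explicitly compute $\vartheta = [1, 1, 1, \ldots]$. By Lemma 11.8 (with $\vartheta_1 = \vartheta$), we note that
$$\vartheta = [1, \vartheta] = 1 + \frac{1}{\vartheta},$$
and hence
$$\vartheta^2 - \vartheta - 1 = 0.$$
There are two roots of this polynomial, obtained by the quadratic formula, and combining this with the fact that $\vartheta > 1$, we obtain
$$\vartheta = \frac{1}{2}\left(1 + \sqrt{5}\right).$$
(2) Note that $3.1415926 < \pi < 3.1415927$. The continued fraction algorithm for $\vartheta = \pi$ yields $a_0 = 3$ and
$$\begin{aligned} 7.06251099 &< \vartheta_1 < 7.06251598, & a_1 &= 7,\\ 15.9959104 &< \vartheta_2 < 15.997187, & a_2 &= 15,\\ 1.0028211 &< \vartheta_3 < 1.0040251, & a_3 &= 1. \end{aligned}$$
These yield the following rational approximations to $\pi$ (see the $r_n$ row):
$$\begin{array}{c|cccccc} n & -2 & -1 & 0 & 1 & 2 & 3\\ \hline a_n & - & - & 3 & 7 & 15 & 1\\ h_n & 0 & 1 & 3 & 22 & 333 & 355\\ k_n & 1 & 0 & 1 & 7 & 106 & 113\\ r_n & - & - & 3 & \frac{22}{7} & \frac{333}{106} & \frac{355}{113} \end{array}$$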
12. Approximation of real numbers with rational numbers
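We continue to use the same notation as in Section 11, and in particular the notation from Theorem 11.4. For $\vartheta = [a_0, a_1, a_2, \ldots]$, we have
$$\begin{aligned} h_{-2} &= 0, & h_{-1} &= 1, & h_n &= a_n h_{n-1} + h_{n-2},\\ k_{-2} &= 1, & k_{-1} &= 0, & k_n &= a_n k_{n-1} + k_{n-2}. \end{aligned}$$
Theorem 12.1. Let $\vartheta \in \mathbb{R}$ be irrational (i.e., $\vartheta \notin \mathbb{Q}$). By Theorem 11.10, we may write $\vartheta$ uniquely as a regular infinite continued fraction. Then for every $n \geq 1$, the following approximations for the continued fraction yielding $\vartheta$ hold:
(1) $\left|\vartheta - \frac{h_n}{k_n}\right| < \frac{1}{k_n k_{n+1}}$.
(2) $|\vartheta k_n - h_n| < \frac{1}{k_{n+1}}$.
(3) $\left|\vartheta - \frac{h_{n+1}}{k_{n+1}}\right| < \left|\vartheta - \frac{h_n}{k_n}\right|$.
(4) $|\vartheta k_{n+1} - h_{n+1}| < |\vartheta k_n - h_n|$.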
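Proof. Writing $\vartheta_n$ and $a_n$ as in the continued fraction algorithm in Theorem 11.10 (whose proof, with $n + 1$ in place of $n$, gives the formula for $\vartheta - r_n$ below), we have
$$|\vartheta - r_n| = \frac{1}{k_n\left(\vartheta_{n+1} k_n + k_{n-1}\right)} < \frac{1}{k_n\left(a_{n+1} k_n + k_{n-1}\right)} = \frac{1}{k_n k_{n+1}}.$$
This is part (1).
Part (2) follows directly by multiplying part (1) by $k_n$. Furthermore,
$$\vartheta_{n+1} k_n + k_{n-1} < \left(a_{n+1} + 1\right) k_n + k_{n-1} = k_{n+1} + k_n \leq a_{n+2} k_{n+1} + k_n = k_{n+2},$$
and hence
$$\left|\vartheta - \frac{h_n}{k_n}\right| = \frac{1}{k_n\left(\vartheta_{n+1} k_n + k_{n-1}\right)} > \frac{1}{k_n k_{n+2}}.$$
Multiplying by $k_n$ and using part (2) (with $n + 1$ in place of $n$) then yields
$$|\vartheta k_n - h_n| > \frac{1}{k_{n+2}} > |\vartheta k_{n+1} - h_{n+1}|,$$
which is part (4).
Finally, we obtain (3) by noting that $k_n < k_{n+1}$ and hence from part (4)
$$\left|\vartheta - \frac{h_{n+1}}{k_{n+1}}\right| = \frac{1}{k_{n+1}}|\vartheta k_{n+1} - h_{n+1}| < \frac{1}{k_n}|\vartheta k_n - h_n| = \left|\vartheta - \frac{h_n}{k_n}\right|.$$
□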
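Theorem 12.1 can be observed numerically. The Python sketch below assumes $\vartheta = \sqrt{2}$, whose expansion is $[1, 2, 2, 2, \ldots]$, and uses the `decimal` module with 60 significant digits so that rounding errors stay far below the quantities being compared; the choice of $\vartheta$, the precision, and the helper name `convergents` are our own.

```python
from decimal import Decimal, getcontext

getcontext().prec = 60                 # ample for denominators up to about 10^12
THETA = Decimal(2).sqrt()              # sqrt(2) = [1; 2, 2, 2, ...]


def convergents(a):
    """List of (h_n, k_n), n = 0, 1, ..., from the recursion of Theorem 11.4."""
    h_prev, h = 0, 1                   # h_{-2}, h_{-1}
    k_prev, k = 1, 0                   # k_{-2}, k_{-1}
    out = []
    for an in a:
        h_prev, h = h, an * h + h_prev
        k_prev, k = k, an * k + k_prev
        out.append((h, k))
    return out


pairs = convergents([1] + [2] * 30)
for n in range(1, len(pairs) - 1):
    (h, k), (h1, k1) = pairs[n], pairs[n + 1]
    err = abs(THETA - Decimal(h) / k)
    err1 = abs(THETA - Decimal(h1) / k1)
    assert err < Decimal(1) / (k * k1)                  # part (1)
    assert abs(THETA * k - h) < Decimal(1) / k1         # part (2)
    assert err1 < err                                   # part (3)
    assert abs(THETA * k1 - h1) < abs(THETA * k - h)    # part (4)

print(pairs[:4])   # [(1, 1), (3, 2), (7, 5), (17, 12)]
```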

We next show that the approximations in Theorem 12.1 are in some sense best possible.
Theorem 12.2. Suppose that $\vartheta \in \mathbb{R}$ is irrational and $\frac{a}{b}$ is a reduced fraction with $a \in \mathbb{Z}$ and $b \in \mathbb{N}$. Then the following hold.
(1) If $\left|\vartheta - \frac{a}{b}\right| < \left|\vartheta - \frac{h_n}{k_n}\right|$ for some $n \geq 1$, then it must be the case that $b > k_n$.
(2) If $|\vartheta b - a| < |\vartheta k_n - h_n|$ for some $n \geq 1$, then one must have $b \geq k_{n+1}$.
Proof. Suppose for contradiction that (2) does not hold. Then for some $n \geq 1$, we have
$$|\vartheta b - a| < |\vartheta k_n - h_n|$$
and $b < k_{n+1}$. The system of equations
$$\begin{cases} x k_n + y k_{n+1} = b,\\ x h_n + y h_{n+1} = a \end{cases}$$
has a unique solution $x, y \in \mathbb{Z}$ by Theorem 11.4 (3) (the determinant of the corresponding matrix is $\pm 1$).
If it were the case that $x = 0$, then it would follow that $y k_{n+1} = b$ with $y \geq 1$, and hence $b \geq k_{n+1}$, which is a contradiction. Similarly, if $y = 0$, then $x k_n = b$ and $x h_n = a$, implying that
$$|\vartheta b - a| = |x|\,|\vartheta k_n - h_n| \geq |\vartheta k_n - h_n|,$$
which contradicts the inequality assumed above. Hence we have $xy \neq 0$.
If $y < 0$, then it follows that
$$x k_n = b - y k_{n+1} > 0,$$
and hence $x > 0$. If $y > 0$, then, since $b < k_{n+1}$,
$$x k_n = b - y k_{n+1} < 0,$$
and hence $x < 0$. In particular, we conclude that $xy < 0$.
By Theorem 11.6, the differences $\vartheta - r_n$ and $\vartheta - r_{n+1}$ have opposite signs (one is positive and the other is negative), and thus, multiplying by $k_n > 0$ and $k_{n+1} > 0$, the differences $\vartheta k_n - h_n$ and $\vartheta k_{n+1} - h_{n+1}$ also have opposite signs. Therefore $x(\vartheta k_n - h_n)$ and $y(\vartheta k_{n+1} - h_{n+1})$ have the same sign. We then compute
$$x(\vartheta k_n - h_n) + y(\vartheta k_{n+1} - h_{n+1}) = \vartheta b - a,$$
and hence
$$|\vartheta b - a| = |x(\vartheta k_n - h_n)| + |y(\vartheta k_{n+1} - h_{n+1})| > |x(\vartheta k_n - h_n)| \geq |\vartheta k_n - h_n|,$$
contradicting the assumption. We have hence concluded part (2).
If part (1) did not hold, then there would be an $n \geq 1$ for which
$$\left|\vartheta - \frac{a}{b}\right| < \left|\vartheta - \frac{h_n}{k_n}\right|$$
and $b \leq k_n$. Multiplication with $b$ then yields
$$|\vartheta b - a| = b\left|\vartheta - \frac{a}{b}\right| < k_n\left|\vartheta - \frac{h_n}{k_n}\right| = |\vartheta k_n - h_n|,$$
so part (2) gives $b \geq k_{n+1} > k_n$, which contradicts $b \leq k_n$. □
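Theorem 12.3. Let an irrational $\vartheta \in \mathbb{R} \setminus \mathbb{Q}$ and a rational $\frac{a}{b} \in \mathbb{Q}$ with $a \in \mathbb{Z}$, $b \in \mathbb{N}$, and $\gcd(a, b) = 1$ be given. If
$$\left|\vartheta - \frac{a}{b}\right| < \frac{1}{2b^2}$$
is satisfied, then $\frac{a}{b}$ is a convergent in the continued fraction expansion of $\vartheta$.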

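Proof. Suppose that
$$\left|\vartheta - \frac{a}{b}\right| < \frac{1}{2b^2}$$
and assume for contradiction that $\frac{a}{b} \neq r_m = \frac{h_m}{k_m}$ for all $m \geq 0$. We then choose $n \in \mathbb{N}_0$ such that $k_n \leq b < k_{n+1}$. By Theorem 12.2 (2), it follows that
$$|\vartheta b - a| \geq |\vartheta k_n - h_n|,$$
and hence
$$\left|\vartheta - \frac{h_n}{k_n}\right| \leq \frac{b}{k_n}\left|\vartheta - \frac{a}{b}\right| < \frac{1}{2bk_n}.$$
Since $\frac{a}{b} \neq r_n$, we have $bh_n - ak_n \in \mathbb{Z} \setminus \{0\}$, and thus $|bh_n - ak_n| \geq 1$. However,
$$\frac{1}{bk_n} \leq \frac{|bh_n - ak_n|}{bk_n} = \left|\frac{h_n}{k_n} - \frac{a}{b}\right| \leq \left|\vartheta - \frac{h_n}{k_n}\right| + \left|\vartheta - \frac{a}{b}\right| < \frac{1}{2bk_n} + \frac{1}{2b^2},$$
and hence $\frac{1}{2bk_n} < \frac{1}{2b^2}$, so that $\frac{1}{k_n} < \frac{1}{b}$. It follows that $b < k_n$, which contradicts the choice of $n$. Therefore $\frac{a}{b}$ is a convergent of $\vartheta$. □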
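Theorem 12.3 can likewise be tested by brute force. For each denominator $b$, only the integer nearest to $b\vartheta$ can satisfy $\left|\vartheta - \frac{a}{b}\right| < \frac{1}{2b^2}$ (the condition forces $|b\vartheta - a| < \frac{1}{2b} \leq \frac{1}{2}$), so the Python sketch below checks one candidate per $b$. It again assumes $\vartheta = \sqrt{2}$ and 50-digit `decimal` arithmetic; the helper names are hypothetical.

```python
from decimal import Decimal, getcontext
from math import gcd

getcontext().prec = 50
THETA = Decimal(2).sqrt()              # example: sqrt(2) = [1; 2, 2, 2, ...]


def convergents(a):
    """(h_n, k_n) for n = 0, 1, ... via the recursion of Theorem 11.4."""
    h_prev, h, k_prev, k = 0, 1, 1, 0
    out = []
    for an in a:
        h_prev, h = h, an * h + h_prev
        k_prev, k = k, an * k + k_prev
        out.append((h, k))
    return out


def good_approximations(max_b):
    """All reduced a/b with 1 <= b <= max_b and |theta - a/b| < 1/(2 b^2)."""
    found = []
    for b in range(1, max_b + 1):
        a = int((THETA * b).to_integral_value())     # nearest integer to b*theta
        if gcd(a, b) == 1 and abs(THETA - Decimal(a) / b) < Decimal(1) / (2 * b * b):
            found.append((a, b))
    return found


found = good_approximations(1000)
assert set(found) <= set(convergents([1] + [2] * 20))   # Theorem 12.3
print(found)
# [(1, 1), (3, 2), (7, 5), (17, 12), (41, 29), (99, 70), (239, 169), (577, 408), (1393, 985)]
```

For $\sqrt{2}$ every convergent happens to satisfy the inequality as well, so the search returns exactly the convergents with denominator at most 1000.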

Remark 12.4. Conversely, for a convergent we have, by Theorem 12.1 (1),
$$\left|\vartheta - \frac{h_n}{k_n}\right| < \frac{1}{k_n k_{n+1}} = \frac{1}{k_n\left(a_{n+1} k_n + k_{n-1}\right)}.$$
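Lemma 12.5. If $x \in \mathbb{R}$ with $x \geq 1$ and $x + \frac{1}{x} < \sqrt{5}$, then $x < \frac{1}{2}\left(1 + \sqrt{5}\right)$ and
$$\frac{1}{x} > \frac{1}{2}\left(\sqrt{5} - 1\right).$$
Proof. For $x \geq 1$, the map $x \mapsto x + \frac{1}{x}$ is strictly increasing. Using
$$\frac{1}{\frac{1}{2}\left(1 + \sqrt{5}\right)} = \frac{2\left(1 - \sqrt{5}\right)}{1 - 5} = \frac{1}{2}\left(\sqrt{5} - 1\right),$$
we see that at the point $x = \frac{1}{2}\left(1 + \sqrt{5}\right)$, this function has the value
$$\frac{1}{2}\left(1 + \sqrt{5}\right) + \frac{1}{\frac{1}{2}\left(1 + \sqrt{5}\right)} = \frac{1}{2}\left(1 + \sqrt{5}\right) + \frac{1}{2}\left(\sqrt{5} - 1\right) = \sqrt{5}.$$
Hence if the value for some $x \geq 1$ is smaller than $\sqrt{5}$, it must be the case that $x < \frac{1}{2}\left(1 + \sqrt{5}\right)$, and consequently $\frac{1}{x} > \frac{1}{2}\left(\sqrt{5} - 1\right)$. □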
Theorem 12.6 (Hurwitz). For every irrational number $\vartheta$, there are infinitely many rational numbers $\frac{h}{k}$ satisfying
$$\left|\vartheta - \frac{h}{k}\right| < \frac{1}{\sqrt{5}\,k^2}.$$
Moreover, for every sequence of three consecutive convergents from the continued fraction expansion of $\vartheta$, at least one of them satisfies the above inequality.
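Proof. We set $q_n := \frac{k_n}{k_{n-1}}$ and claim that if the inequality does not hold for $\frac{h}{k} = \frac{h_{n-1}}{k_{n-1}}$ and $\frac{h}{k} = \frac{h_n}{k_n}$, then $q_n + \frac{1}{q_n} < \sqrt{5}$.
To see this claim, note that by assumption
$$\frac{1}{\sqrt{5}\,k_{n-1}^2} + \frac{1}{\sqrt{5}\,k_n^2} \leq |\vartheta - r_{n-1}| + |\vartheta - r_n| = |r_n - r_{n-1}| = \frac{1}{k_{n-1}k_n},$$
where the first equality holds because $r_{n-1} < \vartheta < r_n$ or $r_n < \vartheta < r_{n-1}$ (Theorem 11.6), and the last equality is Theorem 11.4 (3). This implies that
$$q_n + \frac{1}{q_n} = \frac{k_n}{k_{n-1}} + \frac{k_{n-1}}{k_n} = k_n k_{n-1}\left(\frac{1}{k_{n-1}^2} + \frac{1}{k_n^2}\right) \leq \sqrt{5}.$$
Since $\sqrt{5} \notin \mathbb{Q}$ but $q_n + \frac{1}{q_n} \in \mathbb{Q}$, the inequality must be strict and we conclude that $q_n + \frac{1}{q_n} < \sqrt{5}$.
Now assume for contradiction that there exists an $n \geq 1$ such that for every convergent $\frac{h}{k} \in \{r_{n-1}, r_n, r_{n+1}\}$, we have
$$\left|\vartheta - \frac{h}{k}\right| \geq \frac{1}{\sqrt{5}\,k^2}.$$
Then by the above claim, we have
$$q_n + \frac{1}{q_n} < \sqrt{5} \quad \text{and} \quad q_{n+1} + \frac{1}{q_{n+1}} < \sqrt{5}.$$
We see directly that $q_n \geq 1$ and $q_{n+1} \geq 1$, so, by Lemma 12.5, we conclude that
$$\frac{1}{q_n} > \frac{1}{2}\left(\sqrt{5} - 1\right), \qquad q_{n+1} < \frac{1}{2}\left(\sqrt{5} + 1\right).$$
We relate the two values via the relations defining $k_n$; namely, we have
$$q_{n+1} = \frac{k_{n+1}}{k_n} = \frac{a_{n+1}k_n + k_{n-1}}{k_n} = a_{n+1} + \frac{1}{q_n}.$$
Since $a_{n+1} \geq 1$, it follows that
$$\frac{1}{2}\left(\sqrt{5} + 1\right) > q_{n+1} = a_{n+1} + \frac{1}{q_n} \geq 1 + \frac{1}{q_n} > 1 + \frac{1}{2}\left(\sqrt{5} - 1\right) = \frac{1}{2}\left(\sqrt{5} + 1\right),$$
which is a contradiction. Since the convergents can be split into infinitely many disjoint triples of consecutive convergents, this also yields infinitely many rational numbers $\frac{h}{k}$ satisfying the inequality. □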
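The second statement of Theorem 12.6 can be observed on examples. The Python sketch below marks, for each convergent, whether $\left|\vartheta - \frac{h}{k}\right| < \frac{1}{\sqrt{5}\,k^2}$ holds and checks that no three consecutive convergents all fail. The two sample numbers and the 80-digit precision are our own choices; the expansions $\sqrt{2} = [1, 2, 2, \ldots]$ and $\frac{1}{2}(1 + \sqrt{5}) = [1, 1, 1, \ldots]$ are supplied as inputs.

```python
from decimal import Decimal, getcontext

getcontext().prec = 80
SQRT5 = Decimal(5).sqrt()


def convergents(a):
    """(h_n, k_n) for n = 0, 1, ... via the recursion of Theorem 11.4."""
    h_prev, h, k_prev, k = 0, 1, 1, 0
    out = []
    for an in a:
        h_prev, h = h, an * h + h_prev
        k_prev, k = k, an * k + k_prev
        out.append((h, k))
    return out


def hurwitz_flags(theta, a):
    """For each convergent h/k of theta, whether |theta - h/k| < 1/(sqrt(5) k^2)."""
    return [abs(theta - Decimal(h) / k) < 1 / (SQRT5 * k * k) for h, k in convergents(a)]


examples = {
    "sqrt(2)": (Decimal(2).sqrt(), [1] + [2] * 30),
    "golden ratio": ((1 + SQRT5) / 2, [1] * 40),
}
for name, (theta, a) in examples.items():
    flags = hurwitz_flags(theta, a)
    # Theorem 12.6: every window of three consecutive convergents contains a hit.
    assert all(any(flags[n:n + 3]) for n in range(len(flags) - 2))
    print(name, "".join("x" if f else "." for f in flags[:10]))
# sqrt(2) xxxxxxxxxx
# golden ratio .x.x.x.x.x
```

For the golden ratio the inequality holds only for every other convergent, which is consistent with this number being the extremal example in Theorem 12.7.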

Theorem 12.7. For every $c > \sqrt{5}$, there exist irrational numbers $\vartheta$ for which
$$\left|\vartheta - \frac{h}{k}\right| < \frac{1}{ck^2}$$
holds for only finitely many rational numbers $\frac{h}{k}$.

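Proof. One chooses $\vartheta = \frac{1}{2}\left(\sqrt{5} + 1\right)$ independent of $c$. Note that we have the continued fraction expansion $\vartheta = [1, 1, 1, \ldots]$ (see the example at the end of Section 11). Let $\frac{h}{k} \in \mathbb{Q}$ satisfying $\left|\vartheta - \frac{h}{k}\right| < \frac{1}{ck^2}$ be given. Since $c > \sqrt{5} > 2$, Theorem 12.3 implies that there exists $n \geq 0$ such that
$$\frac{h}{k} = r_n = \frac{h_n}{k_n}.$$
Using the fact that $a_j = 1$ for all $j$ and arguing by induction, we see that $k_n = h_{n-1}$. Hence
$$\lim_{n \to \infty} \frac{k_{n-1}}{k_n} = \lim_{n \to \infty} \frac{k_{n-1}}{h_{n-1}} = \frac{1}{\vartheta} = \frac{1}{2}\left(\sqrt{5} - 1\right) = \vartheta - 1.$$
Recall the definitions $\vartheta_0 = \vartheta$ and
$$\vartheta_n = \frac{1}{\vartheta_{n-1} - a_{n-1}} = \frac{1}{\vartheta_{n-1} - 1}.$$
One then inductively obtains that $\vartheta_n = \vartheta$ for all $n$ (since $\frac{1}{\vartheta - 1} = \vartheta$). It follows that
$$\lim_{n \to \infty}\left(\vartheta_{n+1} + \frac{k_{n-1}}{k_n}\right) = 2\vartheta - 1 = \sqrt{5}.$$
Since $c > \sqrt{5}$, there are only finitely many $n$ for which $\vartheta_{n+1} + \frac{k_{n-1}}{k_n} > c$. Following the proof of Theorem 11.10, we conclude that
$$\left|\vartheta - \frac{h_n}{k_n}\right| = \frac{1}{k_n^2\left(\vartheta_{n+1} + \frac{k_{n-1}}{k_n}\right)} < \frac{1}{ck_n^2}$$
for at most finitely many $n$. □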


Definition. A complex number $\alpha \in \mathbb{C}$ is called algebraic if there exists a polynomial $P \neq 0$ in $\mathbb{Z}[x]$ (i.e., the coefficients of $P(x)$ are integers) satisfying $P(\alpha) = 0$. Otherwise, we call $\alpha$ transcendental. For an algebraic number $\alpha$, the smallest degree of a nonzero polynomial $P \in \mathbb{Z}[x]$ satisfying $P(\alpha) = 0$ is called the degree of $\alpha$.
Theorem 12.8 (Liouville, 1844). If $\alpha \in \mathbb{R}$ is algebraic of degree $n$, then there exists $\delta > 0$ such that there are only finitely many $\frac{h}{k} \in \mathbb{Q}$ satisfying
$$\left|\alpha - \frac{h}{k}\right| < \frac{\delta}{k^n}.$$
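Example 12.9. The number
$$\sum_{n=1}^{\infty} 10^{-n!} = 0.110001000\ldots$$
is transcendental because of the approximations from the partial sums $n \leq N$: the $N$-th partial sum is a fraction $\frac{h}{k}$ with $k = 10^{N!}$, and its distance to the full sum is less than $2 \cdot 10^{-(N+1)!} = \frac{2}{k^{N+1}}$, which for large $N$ is smaller than $\frac{\delta}{k^n}$ for any fixed $n$ and $\delta > 0$.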
Liouville’s Theorem has since been improved by Thue (1909), Siegel (1921), and
Roth (1955).
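Theorem 12.10 (Roth’s Theorem). If $\vartheta \notin \mathbb{Q}$ is real and algebraic and $\varepsilon > 0$, then there exists $\delta > 0$ (depending on $\varepsilon$) such that for all $\frac{h}{k} \in \mathbb{Q}$ we have
$$\left|\vartheta - \frac{h}{k}\right| > \frac{\delta}{k^{2+\varepsilon}}.$$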
Definition. The Farey sequence of the first order is
$$F_1 := \left(\frac{0}{1}, \frac{1}{1}\right),$$
and one recursively defines the Farey sequence $F_n$ of the $n$th order from the Farey sequence $F_{n-1}$ of the $(n-1)$st order in the following manner: if $\frac{a}{b}$ and $\frac{c}{d}$ are adjacent entries in the sequence $F_{n-1}$ and $b + d \leq n$, then one adds the fraction $\frac{a+c}{b+d}$ between these entries in the sequence $F_n$. This yields the following family of Farey sequences:
$$\begin{aligned} F_1&: \ \frac{0}{1}, \frac{1}{1},\\ F_2&: \ \frac{0}{1}, \frac{1}{2}, \frac{1}{1},\\ F_3&: \ \frac{0}{1}, \frac{1}{3}, \frac{1}{2}, \frac{2}{3}, \frac{1}{1},\\ F_4&: \ \frac{0}{1}, \frac{1}{4}, \frac{1}{3}, \frac{1}{2}, \frac{2}{3}, \frac{3}{4}, \frac{1}{1},\\ F_5&: \ \frac{0}{1}, \frac{1}{5}, \frac{1}{4}, \frac{1}{3}, \frac{2}{5}, \frac{1}{2}, \frac{3}{5}, \frac{2}{3}, \frac{3}{4}, \frac{4}{5}, \frac{1}{1}. \end{aligned}$$
One can use the Farey sequences to obtain an alternative proof of Hurwitz’s Theorem (Theorem 12.6); see Niven–Zuckerman pp. 182–189 for a proof.
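The recursive definition of $F_n$ is straightforward to implement. In the Python sketch below (the function name `farey` is our own), fractions are stored as (numerator, denominator) pairs so that the mediant $\frac{a+c}{b+d}$ is formed exactly as in the definition. The final assertion checks, for $n = 8$, the well-known fact (not proved in these notes) that $F_n$ lists all reduced fractions in $[0, 1]$ with denominator at most $n$ in increasing order.

```python
from fractions import Fraction


def farey(n):
    """F_n built as in the definition: start from F_1 = (0/1, 1/1) and, for
    order = 2, ..., n, insert the mediant (a+c)/(b+d) between neighbours
    a/b and c/d whenever b + d <= order."""
    seq = [(0, 1), (1, 1)]
    for order in range(2, n + 1):
        new_seq = [seq[0]]
        for (a, b), (c, d) in zip(seq, seq[1:]):
            if b + d <= order:
                new_seq.append((a + c, b + d))
            new_seq.append((c, d))
        seq = new_seq
    return seq


print(" ".join(f"{a}/{b}" for a, b in farey(5)))
# 0/1 1/5 1/4 1/3 2/5 1/2 3/5 2/3 3/4 4/5 1/1

# Compare with all reduced fractions in [0, 1] with denominator <= 8.
expected = sorted({Fraction(a, b) for b in range(1, 9) for a in range(b + 1)})
assert [Fraction(a, b) for a, b in farey(8)] == expected
```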

13. Periodic continued fractions
Definition. An infinite regular continued fraction $[a_0, a_1, \ldots]$ is called periodic if there exist integers $r \geq 0$ and $s > 0$ for which $a_{k+s} = a_k$ for all $k \geq r$. One writes
$$[a_0, a_1, a_2, \ldots] = \left[a_0, a_1, \ldots, a_{r-1}, \overline{a_r, \ldots, a_{r+s-1}}\right].$$
We have seen one such example; namely
$$\frac{1}{2}\left(\sqrt{5} + 1\right) = [1, 1, 1, \ldots] = \left[\overline{1}\right].$$
In the case that $r = 0$, we call the continued fraction purely periodic. We call $\alpha \in \mathbb{C}$ a quadratic irrationality if $\alpha \notin \mathbb{Q}$ and $\alpha$ is a root of a nonzero quadratic polynomial $P(x) = ax^2 + bx + c$ with $a, b, c \in \mathbb{Z}$. The second root of this polynomial is called its conjugate and is written as $\alpha'$. If $\alpha \notin \mathbb{R}$, then $\alpha' = \overline{\alpha}$ is the complex conjugate, while for $\alpha = \frac{-b \pm \sqrt{D}}{2a} \in \mathbb{R}$ (for $D = b^2 - 4ac$) we have $\alpha' = \frac{-b \mp \sqrt{D}}{2a}$.

Theorem 13.1 (Euler, Lagrange). Every periodic continued fraction represents a real quadratic irrationality. Conversely, every real quadratic irrationality is represented by a periodic continued fraction.

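Proof.
(1) Suppose that $\alpha = [b_0, \ldots, b_{r-1}, \overline{a_0, \ldots, a_{n-1}}]$, set $\vartheta = [\overline{a_0, a_1, \ldots, a_{n-1}}]$, and let $h_n$ and $k_n$ be defined for $\vartheta$ as in Theorem 11.4. Then by Theorem 11.4 (1), we have
$$\vartheta = [a_0, \ldots, a_{n-1}, \vartheta] = \frac{\vartheta h_{n-1} + h_{n-2}}{\vartheta k_{n-1} + k_{n-2}},$$
and hence
$$k_{n-1}\vartheta^2 + (k_{n-2} - h_{n-1})\vartheta - h_{n-2} = 0.$$
Thus $\vartheta$ satisfies a quadratic equation over $\mathbb{Z}$, and since $\vartheta \in \mathbb{R} \setminus \mathbb{Q}$ (its continued fraction expansion is infinite), we conclude that $\vartheta$ is a real quadratic irrationality. Furthermore,
$$\alpha = [b_0, \ldots, b_{r-1}, \vartheta] = \frac{\vartheta u + s}{\vartheta v + t},$$
where $\frac{s}{t}$ and $\frac{u}{v}$ are the last two convergents of $[b_0, \ldots, b_{r-1}]$. Thus $\alpha$ lies in $\mathbb{Q}(\vartheta)$ and is irrational, so $\alpha$ is also a real quadratic irrationality.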
(2) Let $\alpha = [a_0, a_1, a_2, \ldots]$ be a real quadratic irrationality. Then there exist $a, b \in \mathbb{Z}$ and $D > 0$ (from the quadratic equation) such that $\alpha = \frac{-b \pm \sqrt{D}}{2a}$ with $a \neq 0$ and $D$ not a square. We may rewrite this as
$$\alpha = \frac{-2ab + \sqrt{4a^2 D}}{4a^2} \quad \text{or} \quad \alpha = \frac{2ab + \sqrt{4a^2 D}}{-4a^2},$$
from which we obtain the representation $\alpha = \frac{m_0 + \sqrt{d}}{q_0}$ with $m_0, q_0, d \in \mathbb{Z}$, $q_0 \neq 0$, $q_0 \mid (d - m_0^2)$, and $d > 0$ not a square.
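We next claim that the integers $m_n, q_n \in \mathbb{Z}$ defined via the recursion
$$m_{n+1} = a_n q_n - m_n, \qquad q_{n+1} = \frac{d - m_{n+1}^2}{q_n} \qquad \text{for } n \ge 0,$$
satisfy $q_n \neq 0$ and $q_n \mid (d - m_n^2)$, and further claim that $a_n = \lfloor \alpha_n \rfloor$, where $\alpha_n = \frac{m_n + \sqrt{d}}{q_n}$ and the $a_n$ are from the continued fraction expansion of $\alpha$.
We prove this claim via induction on $n$. For $n = 0$ we have $q_0 \neq 0$, $q_0 \mid (d - m_0^2)$, and $\alpha_0 = \alpha = \frac{m_0 + \sqrt{d}}{q_0}$ with $a_0 = \lfloor \alpha \rfloor = \lfloor \alpha_0 \rfloor$.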
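Now suppose that $n \ge 0$ and assume the inductive hypothesis for this $n$. One defines $m_{n+1} \in \mathbb{Z}$ and $q_{n+1} \in \mathbb{Q}$ by the given recursion formula. Since $m_n + m_{n+1} = a_n q_n$, we then have
$$q_{n+1} = \frac{d - m_n^2}{q_n} + \frac{m_n^2 - m_{n+1}^2}{q_n} = \frac{d - m_n^2}{q_n} + a_n\left(m_n - m_{n+1}\right) \in \mathbb{Z}.$$
We see that $q_{n+1} \neq 0$, since otherwise $d = m_{n+1}^2$ would be a square. Since $q_n = \frac{d - m_{n+1}^2}{q_{n+1}}$ and $q_n \in \mathbb{Z}$, it follows that $q_{n+1} \mid (d - m_{n+1}^2)$.
By Theorem 11.10, we have $a_{n+1} = \lfloor \alpha_{n+1} \rfloor$ (we use $\vartheta = \alpha$ in Theorem 11.10) with
$$\alpha_{n+1} = \frac{1}{\alpha_n - a_n} = \frac{1}{\frac{m_n + \sqrt{d}}{q_n} - a_n} = \frac{q_n}{m_n + \sqrt{d} - a_n q_n} = \frac{q_n}{\sqrt{d} - m_{n+1}} = \frac{q_n\left(m_{n+1} + \sqrt{d}\right)}{d - m_{n+1}^2} = \frac{m_{n+1} + \sqrt{d}}{q_{n+1}}.$$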
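For $\alpha = [a_0, a_1, \ldots]$, let $h_n$ and $k_n$ be given as in Theorem 11.4. Then we have
$$\alpha = \alpha_0 = \frac{\alpha_n h_{n-1} + h_{n-2}}{\alpha_n k_{n-1} + k_{n-2}}$$
for all $n \ge 0$. The conjugate of $\alpha_n$ is $\alpha_n' = \frac{m_n - \sqrt{d}}{q_n}$, and
$$\alpha' = \left(\frac{\alpha_n h_{n-1} + h_{n-2}}{\alpha_n k_{n-1} + k_{n-2}}\right)' = \frac{\alpha_n' h_{n-1} + h_{n-2}}{\alpha_n' k_{n-1} + k_{n-2}}.$$
Solving for $\alpha_n'$ yields
$$\frac{m_n - \sqrt{d}}{q_n} = \alpha_n' = -\frac{\alpha' k_{n-2} - h_{n-2}}{\alpha' k_{n-1} - h_{n-1}} = -\frac{k_{n-2}}{k_{n-1}} \cdot \frac{\alpha' - r_{n-2}}{\alpha' - r_{n-1}},$$
where $r_j = \frac{h_j}{k_j}$ denotes the $j$th convergent of $\alpha$. Since the convergents of $\alpha$ converge to $\alpha$ and $\alpha' \neq \alpha$, we have
$$\lim_{n \to \infty} \frac{\alpha' - r_{n-2}}{\alpha' - r_{n-1}} = \frac{\alpha' - \alpha}{\alpha' - \alpha} = 1 > 0.$$
Since $k_n > 0$ for all $n \ge 0$, it follows that there exists $N \in \mathbb{N}$ such that $\alpha_n' < 0$ for all $n \ge N$.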
By Theorem 11.10, we have $\alpha_n > 0$ for all $n \ge 1$. It follows that
$$\frac{2\sqrt{d}}{q_n} = \alpha_n - \alpha_n' > 0$$
for all $n \ge N$, and hence $q_n > 0$ for all $n \ge N$. We now see that
$$0 < q_n \le q_n q_{n+1} = d - m_{n+1}^2 \le d$$
for all $n \ge N$, and thus
$$m_{n+1}^2 = d - q_n q_{n+1} < d,$$
so that $|m_{n+1}| < \sqrt{d}$ for all $n \ge N$.
Since $q_n$ and $m_{n+1}$ are bounded for $n \ge N$ and are integers, they may only take finitely many values for $n \ge N$. Therefore by the pigeonhole principle there exist $j$ and $\ell$ with $j < \ell$, $q_j = q_\ell$, and $m_j = m_\ell$. By induction it follows that $\alpha_j = \alpha_\ell$, $a_j = a_\ell$, and $a_{j+s} = a_{\ell+s}$ for all $s \ge 0$. Therefore $\alpha$ has a periodic continued fraction expansion $\alpha = [a_0, a_1, \ldots, a_{j-1}, \overline{a_j, \ldots, a_{\ell-1}}]$. □
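The recursion for $m_n$ and $q_n$ in part (2) of the proof is also a practical algorithm: since $\lfloor \sqrt{d} \rfloor$ can be computed exactly with integer arithmetic, every $a_n$ can be found without floating-point error, and a repeated pair $(m_n, q_n)$ marks the start of the period, as in the pigeonhole step. The following Python sketch assumes the input is already in the form $\alpha = \frac{m_0 + \sqrt{d}}{q_0}$ with $q_0 \mid (d - m_0^2)$; the helper names are hypothetical. The only extra fact it uses is that, for integers $m$, $q \neq 0$ and $s = \lfloor \sqrt{d} \rfloor$ with $d$ not a square, $\lfloor (m + \sqrt{d})/q \rfloor$ equals $\lfloor (m + s)/q \rfloor$ if $q > 0$ and $\lfloor (m + s + 1)/q \rfloor$ if $q < 0$.

```python
from math import isqrt


def floor_quadratic(m, q, d):
    """Exact floor((m + sqrt(d)) / q) for integers m, q != 0 and non-square d > 0."""
    s = isqrt(d)  # s < sqrt(d) < s + 1 because d is not a square
    if q > 0:
        return (m + s) // q
    return (m + s + 1) // q


def periodic_cf(m0, q0, d):
    """Continued fraction of alpha = (m0 + sqrt(d)) / q0, assuming q0 | (d - m0^2).

    Returns (preperiod, period), using m_{n+1} = a_n q_n - m_n and
    q_{n+1} = (d - m_{n+1}^2) / q_n from the proof of Theorem 13.1.
    """
    assert q0 != 0 and (d - m0 * m0) % q0 == 0
    seen = {}   # (m_n, q_n) -> n, to detect the first repeated state
    terms = []
    m, q = m0, q0
    while (m, q) not in seen:
        seen[(m, q)] = len(terms)
        a = floor_quadratic(m, q, d)
        terms.append(a)
        m = a * q - m
        q = (d - m * m) // q   # exact division, by the claim proved above
    start = seen[(m, q)]
    return terms[:start], terms[start:]
```

For example, `periodic_cf(0, 1, 19)` returns `([4], [2, 1, 3, 1, 2, 8])`, and `periodic_cf(1, 2, 5)` returns `([], [1])` for $\frac{1}{2}(\sqrt{5} + 1)$.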

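Example 13.2. Consider $\alpha = \sqrt{19}$. Using the notation and recursive definition of the $m_n$ and $q_n$ from Theorem 13.1, we have the following table:
$$\begin{array}{c|c|c|c|c}
n & m_n = a_{n-1} q_{n-1} - m_{n-1} & q_n = \frac{1}{q_{n-1}}\left(19 - m_n^2\right) & \alpha_n = \frac{1}{q_n}\left(m_n + \sqrt{19}\right) & a_n = \lfloor \alpha_n \rfloor \\
\hline
0 & 0 & 1 & \sqrt{19} \quad (4 < \sqrt{19} < 5) & 4 \\
1 & 4 \cdot 1 - 0 = 4 & \frac{1}{1}\left(19 - 4^2\right) = 3 & \frac{1}{3}\left(4 + \sqrt{19}\right) & 2 \\
2 & 2 \cdot 3 - 4 = 2 & \frac{1}{3}\left(19 - 2^2\right) = 5 & \frac{1}{5}\left(2 + \sqrt{19}\right) & 1 \\
3 & 3 & 2 & \frac{1}{2}\left(3 + \sqrt{19}\right) & 3 \\
4 & 3 & 5 & \frac{1}{5}\left(3 + \sqrt{19}\right) & 1 \\
5 & 2 & 3 & \frac{1}{3}\left(2 + \sqrt{19}\right) & 2 \\
6 & 4 & 1 & 4 + \sqrt{19} & 8 \\
7 & 4 & 3 & \frac{1}{3}\left(4 + \sqrt{19}\right) & 2
\end{array}$$
This implies that $\sqrt{19} = [4, \overline{2, 1, 3, 1, 2, 8}]$.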
Theorem 13.3 (Galois). Suppose that α is a real quadratic irrationality. The
continued fraction expansion of α is purely periodic if and only if α > 1 and
−1 < α′ < 0. In this case, one calls α reduced.
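Proof. We first suppose that $\alpha > 1$ and $-1 < \alpha' < 0$ and prove that the continued fraction expansion is purely periodic. We write $\alpha = [a_0, a_1, \ldots]$ and note that $a_0 = \lfloor \alpha \rfloor \ge 1$. By Theorem 11.10 (here we write $\alpha_n$ for the values $\vartheta_n$ from that theorem), for $n \ge 0$ we have
$$\alpha_{n+1} = \frac{1}{\alpha_n - a_n}, \qquad a_{n+1} = \lfloor \alpha_{n+1} \rfloor.$$
In particular, we have $\alpha_n \ge 1$ (and hence $a_n \ge 1$) for all $n \ge 1$, since
$$\alpha_n \ge 1 \iff 1 \ge \alpha_{n-1} - a_{n-1}.$$
This is satisfied because $a_{n-1} = \lfloor \alpha_{n-1} \rfloor$.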
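We furthermore have
$$\alpha_{n+1}' = \frac{1}{\alpha_n' - a_n}.$$
If $\alpha_n' < 0$, then it follows that $\alpha_n' - a_n < -1$ and $-1 < \alpha_{n+1}' < 0$. Since $-1 < \alpha' = \alpha_0' < 0$, it follows by induction that $-1 < \alpha_n' < 0$ for all $n \ge 0$. Using
$$\alpha_n' = a_n + \frac{1}{\alpha_{n+1}'},$$
it now follows that
$$0 < -a_n - \frac{1}{\alpha_{n+1}'} < 1$$
for all $n \ge 0$, and hence $a_n = \left\lfloor -\frac{1}{\alpha_{n+1}'} \right\rfloor$. By Theorem 13.1, there exist $j$ and $\ell$ with $j < \ell$ satisfying $\alpha_j = \alpha_\ell$ and $a_j = a_\ell$. We then conclude that
$$a_{j-1} = \left\lfloor -\frac{1}{\alpha_j'} \right\rfloor = \left\lfloor -\frac{1}{\alpha_\ell'} \right\rfloor = a_{\ell-1},$$
and hence also $\alpha_{j-1} = a_{j-1} + \frac{1}{\alpha_j} = \alpha_{\ell-1}$. After finitely many steps we conclude that $\alpha_0 = \alpha_{\ell-j}$, so that $a_0 = a_{\ell-j}$. Therefore
$$\alpha = [\overline{a_0, a_1, \ldots, a_{\ell-j-1}}]$$
is purely periodic.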
Now suppose that $\alpha = [\overline{a_0, a_1, \ldots, a_{n-1}}]$ is purely periodic. Then by periodicity and the fact that $a_j \ge 1$ for all $j \ge 1$, we have $\alpha > a_0 = a_n \ge 1$. By Theorem 11.4, we have
$$\alpha = [a_0, \ldots, a_{n-1}, \alpha] = \frac{\alpha h_{n-1} + h_{n-2}}{\alpha k_{n-1} + k_{n-2}}.$$
Therefore $\alpha$ and $\alpha'$ are roots of the polynomial
$$f(x) = k_{n-1}x^2 + (k_{n-2} - h_{n-1})x - h_{n-2}.$$
Since $f(0) = -h_{n-2} < 0$ and
$$f(-1) = (k_{n-1} - k_{n-2}) + (h_{n-1} - h_{n-2}) > 0,$$
the Intermediate Value Theorem tells us that the polynomial $f$ must have a root between $-1$ and $0$. Since $\alpha > 1$, it follows that $-1 < \alpha' < 0$. □

Theorem 13.4. Suppose that $d \in \mathbb{N}$ is not a square and $r$ is the length of the shortest period in the continued fraction expansion of $\alpha = \sqrt{d}$. Then
$$\sqrt{d} = [a_0, \overline{a_1, \ldots, a_{r-1}, 2a_0}]$$
with $a_0 = \lfloor \sqrt{d} \rfloor$. If one writes $\alpha_n = \frac{1}{q_n}\left(m_n + \sqrt{d}\right)$ as in the recursion in the proof of Theorem 13.1, then we have $q_n \neq -1$ for all $n$, and $q_n = 1$ precisely when $r \mid n$.

Proof. The real number
$$\gamma = a_0 + \sqrt{d} = \left\lfloor \sqrt{d} \right\rfloor + \sqrt{d}$$
satisfies $\gamma > 1$ and $\gamma' = a_0 - \sqrt{d}$, so that $-1 < \gamma' < 0$. Theorem 13.3 hence implies that $\gamma$ has a purely periodic continued fraction expansion $\gamma = [\overline{c_0, c_1, \ldots, c_{r-1}}]$, where $r$ is minimal. Furthermore, $c_0 = c_r$ and $c_0 = \lfloor \gamma \rfloor = 2a_0$. It follows that
$$\sqrt{d} = \gamma - a_0 = [a_0, \overline{c_1, \ldots, c_{r-1}, c_r}] = [a_0, \overline{c_1, \ldots, c_{r-1}, 2a_0}].$$

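It therefore suffices to study the expansion of $\gamma$. For $\gamma$, let $\gamma_0, \gamma_1, \gamma_2, \ldots$ be defined as in Theorem 11.10. We have $\gamma_n = [\overline{c_n, c_{n+1}, \ldots, c_{n+r-1}}]$. By assumption, $r$ is the smallest period, so $\gamma_0, \gamma_1, \ldots, \gamma_{r-1}$ are pairwise distinct and $\gamma_n = \gamma_0 = \gamma$ if and only if $r \mid n$.
Writing $\gamma_n = \frac{m_n + \sqrt{d}}{q_n}$ as in the proof of Theorem 13.1, we have $\gamma_0 = a_0 + \sqrt{d}$, $m_0 = a_0 = \lfloor \sqrt{d} \rfloor$, and $q_0 = 1$. Since
$$\frac{1}{q_{\ell r}}\left(m_{\ell r} + \sqrt{d}\right) = \gamma_{\ell r} = \gamma_0 = a_0 + \sqrt{d},$$
we have $q_{\ell r} = 1$ and $m_{\ell r} = a_0 = \lfloor \sqrt{d} \rfloor$ for all $\ell \in \mathbb{N}_0$.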
Suppose that for some $n \ge 0$ we have $q_n = 1$. Then we have
$$\gamma_n = m_n + \sqrt{d} = [\overline{c_n, c_{n+1}, \ldots, c_{n+r-1}}].$$
By Theorem 13.3, it follows that $\gamma_n > 1$ and $-1 < \gamma_n' < 0$, and hence
$$-1 < \gamma_n' = m_n - \sqrt{d} < 0.$$
Thus
$$-m_n - 1 < -\sqrt{d} < -m_n \iff m_n < \sqrt{d} < m_n + 1,$$
so that $m_n = \lfloor \sqrt{d} \rfloor$. It follows that
$$\gamma_n = \left\lfloor \sqrt{d} \right\rfloor + \sqrt{d} = \gamma_0,$$
and hence $r \mid n$. We have therefore proven that $q_n = 1$ implies that $r \mid n$.


Next suppose that for some $n$ we have $q_n = -1$. Then it follows that $\gamma_n = -m_n - \sqrt{d}$, and Theorem 13.3 implies that
$$-m_n - \sqrt{d} = \gamma_n > 1, \qquad -1 < -m_n + \sqrt{d} = \gamma_n' < 0.$$
It therefore follows that
$$\sqrt{d} < m_n < -\sqrt{d} - 1,$$
which is a contradiction. Thus $q_n \neq -1$ for all $n$. We conclude that $\sqrt{d} = [a_0, \overline{c_1, \ldots, c_{r-1}, 2a_0}]$ with $a_0 = \lfloor \sqrt{d} \rfloor$. □
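For $\alpha = \sqrt{d}$ the recursion simplifies: $m_0 = 0$, $q_0 = 1$, and for $n \ge 1$ all $q_n$ are positive (since then $\alpha_n = \gamma_n$ is reduced), so $a_n = \lfloor (m_n + \lfloor \sqrt{d} \rfloor)/q_n \rfloor$. The Python sketch below (function name hypothetical) uses Theorem 13.4 itself as its stopping rule, ending the period at the first $n \ge 1$ with $q_n = 1$; the test then checks numerically that the last partial quotient of each period is $2a_0$ and that no $q_n$ equals $-1$.

```python
from math import isqrt


def sqrt_cf(d):
    """Return (a0, period, qs) for sqrt(d), d > 0 not a perfect square.

    period = [a_1, ..., a_r] and qs = [q_0, ..., q_r] from the recursion
    m_{n+1} = a_n q_n - m_n, q_{n+1} = (d - m_{n+1}^2) / q_n with m_0 = 0, q_0 = 1.
    """
    a0 = isqrt(d)
    if a0 * a0 == d:
        raise ValueError("d must not be a perfect square")
    m, q, a = 0, 1, a0
    period, qs = [], [1]
    while True:
        m = a * q - m
        q = (d - m * m) // q
        a = (a0 + m) // q        # floor((m + sqrt d) / q), valid since q > 0 here
        period.append(a)
        qs.append(q)
        if q == 1:               # Theorem 13.4: q_n = 1 exactly when r | n
            return a0, period, qs
```

For instance, `sqrt_cf(19)` returns `(4, [2, 1, 3, 1, 2, 8], [1, 3, 5, 2, 5, 3, 1])`, reproducing the $a_n$ and $q_n$ columns of Example 13.2.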
Theorem 13.5. Suppose that $d$ is not a square. Define the integers $h_n$, $k_n$, and $q_n$ associated to $\alpha = \sqrt{d}$ as in Theorem 11.4 and in the proof of Theorem 13.1. Then, for all $n \ge -1$, these integers satisfy the relation
$$h_n^2 - d k_n^2 = (-1)^{n+1} q_{n+1}.$$
If $r$ is the length of the shortest period of the continued fraction expansion of $\alpha$, then for all $n \ge 0$ we have the relation
$$h_{nr-1}^2 - d k_{nr-1}^2 = (-1)^{nr}.$$
Proof. Recalling that $\alpha_n = \frac{1}{q_n}\left(m_n + \sqrt{d}\right)$, we have
$$\sqrt{d} = \alpha = \alpha_0 = [a_0, \ldots, a_n, \alpha_{n+1}] = \frac{\alpha_{n+1} h_n + h_{n-1}}{\alpha_{n+1} k_n + k_{n-1}} = \frac{\left(m_{n+1} + \sqrt{d}\right) h_n + q_{n+1} h_{n-1}}{\left(m_{n+1} + \sqrt{d}\right) k_n + q_{n+1} k_{n-1}}.$$
Multiplying the left-hand side by the denominator of the right-hand side, we consider both sides as elements of the vector space $\mathbb{Q} \cdot 1 + \mathbb{Q} \cdot \sqrt{d}$ and compare the coefficients in front of $1$ and $\sqrt{d}$; since $d$ is not a square, these coefficients must match (as $1$ and $\sqrt{d}$ are linearly independent over $\mathbb{Q}$). Comparing the rational and irrational parts in this way yields
$$m_{n+1} k_n + q_{n+1} k_{n-1} - h_n = 0,$$
$$m_{n+1} h_n + q_{n+1} h_{n-1} - d k_n = 0.$$
Eliminating $m_{n+1}$ and using Theorem 11.4 (3) yields
$$h_n^2 - d k_n^2 = q_{n+1}\left(h_n k_{n-1} - h_{n-1} k_n\right) = (-1)^{n+1} q_{n+1} \qquad \forall n \ge -1.$$
Making the change of variables $n \to nr - 1$ and recalling that $q_{nr} = 1$, we obtain
$$h_{nr-1}^2 - d k_{nr-1}^2 = (-1)^{nr} q_{nr} = (-1)^{nr}.$$
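The identity $h_n^2 - dk_n^2 = (-1)^{n+1} q_{n+1}$ is easy to test numerically. This Python sketch (names hypothetical) runs the convergent recursion $h_n = a_n h_{n-1} + h_{n-2}$, $k_n = a_n k_{n-1} + k_{n-2}$ with $h_{-2} = 0$, $h_{-1} = 1$, $k_{-2} = 1$, $k_{-1} = 0$ (the initial values in the table of Example 13.11) alongside the $(m_n, q_n)$ recursion, and records both sides of the identity.

```python
from math import isqrt


def pell_check(d, count):
    """Rows (n, h_n, k_n, h_n^2 - d k_n^2, (-1)^(n+1) q_{n+1}) for sqrt(d), n < count."""
    a0 = isqrt(d)
    m, q, a = 0, 1, a0            # m_0, q_0, a_0
    h_prev2, h_prev1 = 0, 1       # h_{-2}, h_{-1}
    k_prev2, k_prev1 = 1, 0       # k_{-2}, k_{-1}
    rows = []
    for n in range(count):
        h = a * h_prev1 + h_prev2
        k = a * k_prev1 + k_prev2
        # advance the (m, q) recursion to obtain q_{n+1} and a_{n+1}
        m = a * q - m
        q = (d - m * m) // q
        a = (a0 + m) // q
        rows.append((n, h, k, h * h - d * k * k, (-1) ** (n + 1) * q))
        h_prev2, h_prev1 = h_prev1, h
        k_prev2, k_prev1 = k_prev1, k
    return rows
```

For $d = 19$ the row with $n = 5$ is `(5, 170, 39, 1, 1)`, the solution that reappears in Example 13.11.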

Remark 13.6. We see from Theorem 13.5 that the continued fraction expansion of $\sqrt{d}$ leads to solutions of the equation
$$x^2 - dy^2 = N$$
for certain $N$. For $N \in \mathbb{Z}$ and $d \in \mathbb{N}$, such equations are called Pell equations. The choices $N \in \{1, -1, 4, -4\}$ are of particular interest. For $|N| < \sqrt{d}$, we next show in Theorem 13.7 below how to obtain all solutions of the Pell equation with $\gcd(x, y) = 1$.

Theorem 13.7. Suppose that $d \in \mathbb{N}$ is not a square and let $\frac{h_n}{k_n}$ be the convergents of $\sqrt{d}$ from its continued fraction expansion. If $N \in \mathbb{Z}$ satisfies $|N| < \sqrt{d}$ and $N \neq 0$, then for every solution $x, y \in \mathbb{N}$ to the Pell equation $x^2 - dy^2 = N$ with $\gcd(x, y) = 1$, there exists an $n$ such that $x = h_n$ and $y = k_n$.
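Proof. Let $\rho, \sigma \in \mathbb{R}$ and $X, Y \in \mathbb{N}$ be given with $0 < \sigma < \sqrt{\rho}$, $\sqrt{\rho} \notin \mathbb{Q}$, $\gcd(X, Y) = 1$, and $X^2 - \rho Y^2 = \sigma$. Then we have
$$\frac{X}{Y} - \sqrt{\rho} = \frac{\sigma}{Y\left(X + \sqrt{\rho}\, Y\right)} > 0,$$
and hence $\frac{X}{Y\sqrt{\rho}} > 1$. Since $\sigma < \sqrt{\rho}$, we conclude that
$$0 < \frac{X}{Y} - \sqrt{\rho} = \frac{\sigma}{Y\left(X + \sqrt{\rho}\, Y\right)} < \frac{\sqrt{\rho}}{Y\left(X + \sqrt{\rho}\, Y\right)} = \frac{1}{Y^2\left(\frac{X}{Y\sqrt{\rho}} + 1\right)} < \frac{1}{2Y^2}.$$
By Theorem 12.3, we conclude that $\frac{X}{Y}$ is a convergent in the continued fraction expansion of $\sqrt{\rho}$.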
If $N > 0$, then choose $\sigma = N$, $\rho = d$, $X = x$, and $Y = y$. It follows that $x = h_n$ and $y = k_n$ for some $n$.
If $N < 0$ and $x^2 - dy^2 = N$, then we rewrite the formula as $y^2 - \frac{1}{d}x^2 = -\frac{N}{d}$ and choose $\sigma = -\frac{N}{d}$, $\rho = \frac{1}{d}$, $X = y$, $Y = x$. Then $0 < \sigma < \sqrt{\rho}$ and $\frac{y}{x}$ is a convergent in the continued fraction expansion of $\frac{1}{\sqrt{d}}$. If $\sqrt{d} = [a_0, a_1, a_2, \ldots]$, then
$$\frac{1}{\sqrt{d}} = \left[0, \sqrt{d}\right] = [0, a_0, a_1, \ldots].$$
By Theorem 11.4, it follows that the $n$th convergent in the continued fraction expansion of $\frac{1}{\sqrt{d}}$ equals $\frac{k_{n-1}}{h_{n-1}}$. Hence for some $n$ we have $\frac{x}{y} = \frac{h_n}{k_n}$, and since both fractions are in lowest terms, $x = h_n$ and $y = k_n$. □
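Theorem 13.7 can be checked by brute force for small parameters. The Python sketch below (helper names hypothetical) lists convergents of $\sqrt{d}$ via the recursions used above and searches all coprime $x, y \le$ `bound` with $x^2 - dy^2 = N$; the test confirms that for $d = 19$ and every $N$ with $0 < |N| < \sqrt{19}$, each such solution is a convergent. This is a finite check under the chosen bound, not a proof.

```python
from math import gcd, isqrt


def sqrt_convergents(d, count):
    """First `count` convergents (h_n, k_n) of sqrt(d), d not a perfect square."""
    a0 = isqrt(d)
    m, q, a = 0, 1, a0
    h_prev, h = 1, a0          # h_{-1}, h_0
    k_prev, k = 0, 1           # k_{-1}, k_0
    out = [(h, k)]
    while len(out) < count:
        m = a * q - m
        q = (d - m * m) // q
        a = (a0 + m) // q
        h_prev, h = h, a * h + h_prev
        k_prev, k = k, a * k + k_prev
        out.append((h, k))
    return out


def primitive_solutions(d, N, bound):
    """All (x, y) with 1 <= x, y <= bound, gcd(x, y) = 1, x^2 - d y^2 = N (brute force)."""
    sols = []
    for y in range(1, bound + 1):
        t = N + d * y * y
        if t > 0:
            x = isqrt(t)
            if x * x == t and x <= bound and gcd(x, y) == 1:
                sols.append((x, y))
    return sols
```

For example, `primitive_solutions(19, -3, 100)` finds `[(4, 1), (61, 14)]`, which are $(h_0, k_0)$ and $(h_4, k_4)$.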
Combining Theorem 13.7 with Theorem 13.5 yields the following immediate corol-
lary.

Theorem 13.8. Suppose that $d \in \mathbb{N}$ is not a square, $\frac{h_n}{k_n}$ are the convergents of $\sqrt{d}$, and $r$ is the length of the shortest period in the continued fraction expansion of $\sqrt{d}$. Then the following hold:
(1) If $r$ is even, then the Pell equation
$$x^2 - dy^2 = -1$$
has no solutions, and all of the solutions $x, y \in \mathbb{N}$ to the Pell equation
$$x^2 - dy^2 = 1$$
are given by $x_n = h_{nr-1}$ and $y_n = k_{nr-1}$ with $n \in \mathbb{N}$.
(2) If $r$ is odd, then $x_n = h_{nr-1}$ and $y_n = k_{nr-1}$ with odd $n$ give all solutions to
$$x^2 - dy^2 = -1,$$
and for even $n$ they give all solutions to the Pell equation
$$x^2 - dy^2 = 1.$$
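Remark 13.9. For every solution to the Pell equations in Theorem 13.8, the number $\varepsilon = x + y\sqrt{d}$ is a unit in the ring
$$R = \mathbb{Z}\left[\sqrt{d}\right] = \left\{a + b\sqrt{d} : a, b \in \mathbb{Z}\right\}.$$
By plugging in $x^2 - dy^2 = \pm 1$ and simplifying, we obtain
$$\frac{1}{\varepsilon} = \frac{x - y\sqrt{d}}{x^2 - dy^2} = \pm\left(x - y\sqrt{d}\right) = \pm\varepsilon' \in R.$$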
Theorem 13.10. Suppose that $d$ and $r$ are given as in Theorem 13.8 and let $x_1 \in \mathbb{N}$ and $y_1 \in \mathbb{N}$ be the smallest integers satisfying
$$x^2 - dy^2 = (-1)^r.$$
Set $\varepsilon := x_1 + y_1\sqrt{d}$. Then the solutions $x, y \in \mathbb{N}$ of
$$x^2 - dy^2 = 1$$
and of
$$x^2 - dy^2 = -1$$
are the pairs $(x_n, y_n)$ defined for $n \in \mathbb{N}$ via
$$x_n + y_n\sqrt{d} = \varepsilon^n = \left(x_1 + y_1\sqrt{d}\right)^n.$$
Moreover,
$$x_n^2 - dy_n^2 = (-1)^{nr}.$$
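Proof. We have $\varepsilon' = x_1 - y_1\sqrt{d}$ and $(\varepsilon^n)' = x_n - y_n\sqrt{d}$. Since $(\varepsilon^n)' = (\varepsilon')^n$ and $\varepsilon\varepsilon' = (-1)^r$, we conclude that
$$\varepsilon^n \left(\varepsilon^n\right)' = \left(\varepsilon\varepsilon'\right)^n = (-1)^{nr},$$
and hence
$$x_n^2 - dy_n^2 = (-1)^{nr}.$$
Suppose that a pair $(s, t) \in \mathbb{N}^2$ gives a solution to $x^2 - dy^2 = \pm 1$. By the choice of the pair $(x_1, y_1)$, we have
$$s + t\sqrt{d} \ge x_1 + y_1\sqrt{d} = \varepsilon > 1.$$
Hence there exists a unique $n \in \mathbb{N}$ for which
$$\varepsilon^n = x_n + y_n\sqrt{d} \le s + t\sqrt{d} < x_{n+1} + y_{n+1}\sqrt{d} = \varepsilon^{n+1}.$$
Multiplication by $\varepsilon^{-n} = (-1)^{nr}(\varepsilon')^n$ yields
$$\left(s + t\sqrt{d}\right)\varepsilon^{-n} = x + y\sqrt{d} \tag{13.1}$$
for some $x, y \in \mathbb{Z}$ satisfying $x^2 - dy^2 = \pm 1$. Furthermore
$$1 \le x + y\sqrt{d} < \varepsilon = x_1 + y_1\sqrt{d}. \tag{13.2}$$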
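Since
$$0 < \frac{1}{x + y\sqrt{d}} \le 1 \quad \text{and} \quad \frac{1}{x + y\sqrt{d}} = \frac{x - y\sqrt{d}}{x^2 - dy^2},$$
it follows that
$$\begin{cases} 0 < x - y\sqrt{d} \le 1 & \text{if } x^2 - dy^2 = 1,\\ 0 < -x + y\sqrt{d} \le 1 & \text{if } x^2 - dy^2 = -1.\end{cases}$$
If $x^2 - dy^2 = 1$, then it follows that $x > y\sqrt{d}$ and $|x| > |y|\sqrt{d}$, and hence $x > 0$ and $y \ge 0$. Therefore $x \in \mathbb{N}$, $y \in \mathbb{N}_0$. If $y \neq 0$, then (13.2) implies that the pair $(x, y)$ contradicts the minimality of $(x_1, y_1)$, and thus we conclude that $y = 0$, so that $x + y\sqrt{d} = x = 1$ and therefore $s + t\sqrt{d} = \varepsilon^n$ by (13.1).
If $x^2 - dy^2 = -1$, then it follows that
$$y\sqrt{d} > x, \quad 2y\sqrt{d} > x + y\sqrt{d} \ge 1, \quad 2y\sqrt{d} > 1, \quad y > 0, \quad x \ge y\sqrt{d} - 1 > 0.$$
Therefore $x, y \in \mathbb{N}$. This case cannot occur: for $r$ even there are no solutions of $x^2 - dy^2 = -1$ by Theorem 13.8, and for $r$ odd, (13.2) contradicts the choice of the pair $(x_1, y_1)$. Hence $s + t\sqrt{d} = \varepsilon^n$ in all cases. □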
Example 13.11.
(1) The smallest positive solution $x, y \in \mathbb{N}$ to $x^2 - 2y^2 = -1$ is $x_1 = 1$ and $y_1 = 1$, and we have $\sqrt{2} = [1, \overline{2}]$.
(2) What is the smallest positive solution to $x^2 - 19y^2 = 1$? Since
$$\sqrt{19} = [4, \overline{2, 1, 3, 1, 2, 8}],$$
we obtain the following table for the convergents:

n −2 −1 0 1 2 3 4 5
an - - 4 2 1 3 1 2
hn 0 1 4 9 13 48 61 170
kn 1 0 1 2 3 11 14 39

By Theorem 13.8 (here $r = 6$ is even, so $x_1 = h_{r-1} = h_5$ and $y_1 = k_{r-1} = k_5$), the choice $x_1 = 170$ and $y_1 = 39$ gives the smallest positive solution.
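Theorems 13.8 and 13.10 together give an algorithm for the smallest solution and all others: run the recursions of Theorem 13.1 and Theorem 11.4 until the first $n \ge 1$ with $q_n = 1$ (this is $n = r$ by Theorem 13.4), read off $x_1 = h_{r-1}$ and $y_1 = k_{r-1}$, and then take powers of $\varepsilon = x_1 + y_1\sqrt{d}$. The following is a minimal Python sketch under those assumptions; the function names are hypothetical.

```python
from math import isqrt


def fundamental_solution(d):
    """(x1, y1, sign) with x1 = h_{r-1}, y1 = k_{r-1} and sign = x1^2 - d y1^2 = (-1)^r."""
    a0 = isqrt(d)
    if a0 * a0 == d:
        raise ValueError("d must not be a perfect square")
    m, q, a = 0, 1, a0
    h_prev, h = 1, a0          # h_{-1}, h_0
    k_prev, k = 0, 1           # k_{-1}, k_0
    while True:
        # advance (m_n, q_n) -> (m_{n+1}, q_{n+1}); q_{n+1} = 1 ends the first period
        m = a * q - m
        q = (d - m * m) // q
        if q == 1:
            return h, k, h * h - d * k * k
        a = (a0 + m) // q
        h_prev, h = h, a * h + h_prev
        k_prev, k = k, a * k + k_prev


def pell_powers(d, count):
    """First `count` pairs (x_n, y_n) with x_n + y_n sqrt(d) = eps^n, as in Theorem 13.10."""
    x1, y1, _ = fundamental_solution(d)
    x, y = x1, y1
    out = []
    for _ in range(count):
        out.append((x, y))
        # (x + y sqrt d)(x1 + y1 sqrt d) = (x x1 + d y y1) + (x y1 + y x1) sqrt d
        x, y = x * x1 + d * y * y1, x * y1 + y * x1
    return out
```

Here `fundamental_solution(19)` returns `(170, 39, 1)`, matching the table above, while `fundamental_solution(13)` returns `(18, 5, -1)`, so for $d = 13$ the period length $r$ is odd and the powers of $\varepsilon$ alternate between the equations with right-hand sides $-1$ and $1$.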

14. Primes and their distribution
Question. Which primes can be expressed via a given polynomial?
Theorem 14.1. Suppose that $f \in \mathbb{C}[x]$. If the values of $f$ at all sufficiently large integer inputs are primes, then $f$ is constant.
Proof. Suppose that there exists $n_0 \in \mathbb{Z}$ such that $f(n)$ is prime for all $n \ge n_0$. Then it follows that $f \in \mathbb{Q}[x]$ (one can plug in different choices of $n$, consider the identities as a linear system of equations for the coefficients of $f$, and multiply by the inverse matrix). Thus there exists an $m \in \mathbb{N}$ for which $mf \in \mathbb{Z}[x]$. Set $p_0 = f(n_0)$. Then by Taylor's formula, for every $t \in \mathbb{N}_0$ we have
$$f\left(n_0 + tmp_0\right) = p_0 + tp_0 \sum_{\ell \ge 1} \frac{m f^{(\ell)}(n_0)}{\ell!} \left(mtp_0\right)^{\ell-1}.$$
Since $mf \in \mathbb{Z}[x]$, we conclude that $\frac{mf^{(\ell)}(n_0)}{\ell!} \in \mathbb{Z}$, and therefore every term is divisible by $p_0$. Since $f(n_0 + tmp_0)$ is prime and also divisible by $p_0$, we conclude that
$$f(n_0 + tmp_0) = p_0.$$
However, Rolle's Theorem then tells us that there must be a root of $f'(x) = 0$ between $n_0 + (t-1)mp_0$ and $n_0 + tmp_0$ for every $t \in \mathbb{N}$. Thus $f'$ has infinitely many roots and is therefore identically zero. It follows that $f(x)$ is constant, and in particular $f(x) = p_0$ for all $x$. □
Remark 14.2. This was later generalized by R.-C. Buck (1946) to include rational
functions (ratios of polynomials).
Question. How large can the gaps between two successive primes be?
Theorem 14.3. The sequence of differences between successive primes is unbounded.
Proof. Let $N \in \mathbb{N}$ be arbitrary and, for $2 \le n \le N + 1$, set $x_n := (N+1)! + n$. Then $n \mid x_n$ and $x_n > n > 1$, so $x_n$ is not prime. Hence there exists a gap of length at least $N$ between the prime before $(N+1)! + 2$ and the prime after $(N+1)! + N + 1$. □
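The construction in the proof is explicit. The Python sketch below (function names hypothetical) builds the run $(N+1)! + 2, \ldots, (N+1)! + N + 1$ and checks each term against its divisor $n$, with a plain trial-division primality test for comparison. Runs of $N$ consecutive composites typically appear far below $(N+1)!$ (compare Example 14.4), so the factorial serves only as a convenient certificate.

```python
from math import factorial, isqrt


def composite_run(N):
    """The N consecutive integers (N+1)! + n for 2 <= n <= N+1 (n divides the n-th one)."""
    base = factorial(N + 1)
    return [base + n for n in range(2, N + 2)]


def is_prime(n):
    """Trial division, adequate for the small inputs used here."""
    if n < 2:
        return False
    return all(n % p for p in range(2, isqrt(n) + 1))


run = composite_run(6)                       # 5042, ..., 5047
print([(x, x % n == 0) for n, x in zip(range(2, 8), run)])
```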
Example 14.4 (Young and Pottler, 1989). After the prime 42842283995351, there
are precisely 777 composite numbers.

In order to see how close primes are on average, we define
$$P(y) = \prod_{\substack{p \le y\\ p \text{ prime}}} p,$$
$$\pi(x) := \#\{p \le x : p \text{ prime}\},$$
$$Q(x, y) := \#\{n \in \mathbb{N} : n \le x,\ \gcd(n, P(y)) = 1\}.$$
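Then we have
$$1 + \pi(x) - \pi\left(\sqrt{x}\right) = Q\left(x, \sqrt{x}\right),$$
since the integers counted by $Q(x, \sqrt{x})$ are exactly $1$ and the primes $p$ with $\sqrt{x} < p \le x$.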
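Recall the Möbius function $\mu : \mathbb{N} \to \mathbb{Z}$ was defined via
$$\mu(n) = \begin{cases} 1 & \text{if } n = 1,\\ (-1)^r & \text{if } n \text{ is the product of } r \text{ distinct primes},\\ 0 & \text{if } n \text{ is not squarefree}.\end{cases}$$
By Theorem 9.1, the Möbius function satisfies
$$\sum_{d \mid n} \mu(d) = \begin{cases} 1 & \text{for } n = 1,\\ 0 & \text{for } n > 1.\end{cases}$$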
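Hence
$$Q(x, y) = \sum_{\substack{n \le x\\ \gcd(n, P(y)) = 1}} 1 = \sum_{n \le x} \sum_{d \mid \gcd(n, P(y))} \mu(d) = \sum_{d \mid P(y)} \mu(d) \sum_{\substack{n \le x\\ d \mid n}} 1 = \sum_{d \mid P(y)} \mu(d) \left\lfloor \frac{x}{d} \right\rfloor.$$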

In the last sum, we have $\mu(d) = \pm 1$, since $P(y)$ is squarefree. Moreover, the number of summands is $2^{\pi(y)}$ (every prime may appear to the power $0$ or $1$). Therefore we have
$$\pi(x) = \pi\left(\sqrt{x}\right) - 1 + \sum_{d \mid P(\sqrt{x})} \mu(d) \left\lfloor \frac{x}{d} \right\rfloor.$$
One can use this to compute the number of primes up to $x$ if one knows the number of primes up to $\sqrt{x}$.
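The last formula (often called Legendre's formula) translates directly into code. In the Python sketch below (names hypothetical), the squarefree divisors $d \mid P(\sqrt{x})$ are enumerated as products of distinct primes $p \le \sqrt{x}$, with $\mu(d) = (-1)^{\#\text{primes}}$. As a small optimization not in the text, a branch is dropped once $d > x$, since then $\lfloor x/d \rfloor = 0$ for $d$ and for all of its multiples. The result is compared against a direct sieve count.

```python
from math import isqrt


def primes_up_to(n):
    """Sieve of Eratosthenes: list of primes p <= n."""
    if n < 2:
        return []
    is_p = [True] * (n + 1)
    is_p[0] = is_p[1] = False
    for i in range(2, isqrt(n) + 1):
        if is_p[i]:
            is_p[i * i :: i] = [False] * len(range(i * i, n + 1, i))
    return [i for i, flag in enumerate(is_p) if flag]


def legendre_pi(x):
    """pi(x) = pi(sqrt x) - 1 + sum over d | P(sqrt x) of mu(d) * floor(x / d)."""
    small = primes_up_to(isqrt(x))
    total = 0
    stack = [(0, 1, 1)]   # (index of next usable prime, d, mu(d))
    while stack:
        i, d, mu = stack.pop()
        total += mu * (x // d)
        for j in range(i, len(small)):
            nd = d * small[j]
            if nd > x:
                break      # floor(x / nd) = 0 here and for every larger product
            stack.append((j + 1, nd, -mu))
    return len(small) - 1 + total
```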
Problem. Since there are $2^{\pi(\sqrt{x})}$ summands to compute, the above method is not very efficient at determining $\pi(x)$.
Improvements were made over time:
• Meissel, 1870
• Lehmer, 1959
• Lagarias, Miller, Odlyzko, 1985
Euler: The number of primes is “a lot less” than the number of integers. By using the above formula, one can show that
$$\pi(x) = O\left(\frac{x}{\log\log x}\right) \quad \text{for } x \to \infty.$$
We will later prove that $\pi(x) = O\left(\frac{x}{\log x}\right)$. The fact that $\pi(x)$ is asymptotically the same as $\frac{x}{\log x}$ is a theorem known as the Prime Number Theorem.
We next consider twin primes; these are pairs of primes which have a gap of precisely $2$ between them. We define
$$\pi_2(x) = \#\{n \in \mathbb{N} : n,\ n + 2 \text{ prime},\ n + 2 \le x\},$$
$$Q_2(x, y) = \#\{n \in \mathbb{N} : n \le x,\ \gcd\left(n(n+2), P(y)\right) = 1\}.$$
Similarly to the case of $\pi(x)$, one can show that
$$\pi_2(x) - \pi_2\left(\sqrt{x} + 2\right) = Q_2\left(x - 2, \sqrt{x}\right)$$
for $x \ge 9$.

Problem. It is hard to compute $Q_2(x - 2, \sqrt{x})$. It is conjectured that there are infinitely many twin primes, but this has not yet been proven.
Remark 14.5. One can show that $\sum_{p \text{ prime}} p^{-1}$ diverges. On the other hand, one can also show that
$$\sum_{\substack{p \text{ prime}\\ p + 2 \text{ prime}}} p^{-1} < \infty.$$
In other words, even if there exist infinitely many twin primes, there are “many fewer” twin primes than primes. This is part of the reason that it is so hard to show that there are infinitely many twin primes.
Goldbach Problem (1742)
(a) Strong Goldbach conjecture: Every even integer greater than two can be repre-
sented as the sum of two primes.
(b) Weak Goldbach conjecture: Every odd integer greater than five can be repre-
sented as the sum of three primes.
The weak Goldbach conjecture has recently been solved by Harald Helfgott (2013)
using a modified version of what is known as the Circle Method, which was first
established by Hardy and Ramanujan. Helfgott’s proof would take too long and
would be too technical for this class, but we next explain why these are called the
“strong” and “weak” conjectures of Goldbach. Specifically, we show that (a) implies
(b) (i.e., the stronger conjecture implies the weaker conjecture).
If $n \ge 7$ is odd, then $n - 3 \ge 4$ is even, and hence by the strong Goldbach conjecture, we have $n = 3 + p + p'$ with primes $p$ and $p'$.
The strong Goldbach conjecture is unsolved. It has been numerically verified up to $n = 4 \times 10^{18}$. Nils Pipping checked up to $n = 10^5$ in 1938, and computers were employed by Deshouillers, te Riele, and Saouter (1998) to verify the claim up to $10^{14}$, while T. Oliveira e Silva more recently extended the bound to $4 \times 10^{18}$ in 2013.
The weak Goldbach conjecture was proven in parts leading up to Helfgott’s final
proof of the conjecture.
• Hardy and Littlewood (1923)
They used the Circle Method to show that, under the assumption of the generalized Riemann hypothesis, every sufficiently large odd integer is the sum of three primes.
• Vinogradov (1937)
Every odd integer $> 3^{3^{15}}$ is the sum of three primes. This bound was later lowered by Chen and Wang (1989) (to a number with $\sim 43000$ digits). This hence reduced the problem to a finite calculation, but the calculation is not feasible.
• Helfgott (2013)
Improved the Circle Method to obtain the full conjecture.

Theorem 14.6. For $x > 1$, we have
$$\pi(x) < 8\log 2 \cdot \frac{x}{\log x}.$$
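Proof. For every prime $p$ satisfying $n < p \le 2n$, we see that $p$ divides the binomial coefficient $\binom{2n}{n} = \frac{(2n)(2n-1)\cdots(n+1)}{n!}$ (it divides the numerator but not $n!$). Note that
$$2^{2n} = (1 + 1)^{2n} = \sum_{j=0}^{2n} \binom{2n}{j},$$
so in particular we have
$$\binom{2n}{n} < \sum_{j=0}^{2n} \binom{2n}{j} = 2^{2n}.$$
It hence follows that
$$n^{\pi(2n) - \pi(n)} \le \prod_{n < p \le 2n} p \le \binom{2n}{n} \le 2^{2n}.$$
Taking the logarithm of both sides then yields (for $n \ge 2$)
$$\pi(2n) \le \pi(n) + \frac{2n \cdot \log 2}{\log n}. \tag{14.1}$$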
We next show that for every $k \ge 1$, the following inequality is satisfied:
$$\pi\left(2^k\right) < \frac{2^{k+2}}{k}. \tag{14.2}$$
We prove (14.2) by induction. For $k = 1$ we have
$$\pi(2) = 1 < 2^3,$$
for $k = 2$ we have
$$\pi(4) = 2 < \frac{2^4}{2} = 8,$$
and for $k = 3$ we have
$$\pi(8) = 4 < \frac{2^5}{3} = \frac{32}{3}.$$
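Now suppose that (14.2) holds for some $k \ge 3$. Then (14.1) implies that
$$\pi\left(2^{k+1}\right) \le \pi\left(2^k\right) + \frac{2^{k+1}\log 2}{\log 2^k} < \frac{2^{k+2}}{k} + \frac{2^{k+1}}{k} = \frac{3 \cdot 2^{k+1}}{k}.$$
This is $\le \frac{2^{k+3}}{k+1}$ if and only if
$$\frac{3 \cdot 2^{k+1}}{k} \le \frac{2^{k+3}}{k+1} \iff 3(k + 1) \le 4k,$$
which is satisfied for $k \ge 3$. Hence we conclude (14.2).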
Suppose now that $k \in \mathbb{N}$ is chosen so that $2^{k-1} < x \le 2^k$. Then, since $\pi$ is monotone increasing, (14.2) implies that
$$\pi(x) \le \pi\left(2^k\right) < \frac{2^{k+2}}{k}.$$
Since $2^{k-1} < x$ and $k \ge \frac{\log(x)}{\log(2)}$, we have
$$\frac{2^{k+2}}{k} < 8\log(2)\frac{x}{\log(x)}.$$ □

Remark 14.7. Using the so-called sieve method, Brun showed that
$$\pi(x) = O\left(\frac{x}{\log x}\right).$$
For twin primes he showed that
$$\pi_2(x) = O\left(\frac{x}{\log^2 x}\right).$$
As noted above, the Prime Number Theorem states that $\pi(x)$ is asymptotically $\frac{x}{\log(x)}$. Although we won't prove that statement in this class, as a step in that direction one can show a lower bound for $\pi(x)$ in a manner similar to Theorem 14.6.

Theorem 14.8. We have
$$\pi(x) \ge \frac{1}{2}\log 2 \cdot \frac{x}{\log x}.$$
Proof. We omit the proof, but it is similar to the proof of Theorem 14.6. □
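Both bounds can be compared with actual prime counts. The Python sketch below (names hypothetical) computes $\pi(x)$ with a sieve and the ratio $\pi(x)\log x / x$, which by Theorems 14.6 and 14.8 should lie between $\frac{1}{2}\log 2 \approx 0.347$ and $8\log 2 \approx 5.545$. It only checks integers $3 \le x \le 10^5$ (starting at $3$ keeps away from $x = 2$, where the lower bound holds with equality), so it illustrates the theorems rather than proving them.

```python
from math import isqrt, log


def prime_counts(n):
    """counts[x] = pi(x) for 0 <= x <= n, via a sieve of Eratosthenes."""
    is_p = [True] * (n + 1)
    is_p[0] = False
    if n >= 1:
        is_p[1] = False
    for i in range(2, isqrt(n) + 1):
        if is_p[i]:
            is_p[i * i :: i] = [False] * len(range(i * i, n + 1, i))
    counts, running = [], 0
    for flag in is_p:
        running += flag
        counts.append(running)
    return counts


def chebyshev_ratios(n):
    """Ratios pi(x) * log(x) / x for integers 3 <= x <= n."""
    counts = prime_counts(n)
    return [counts[x] * log(x) / x for x in range(3, n + 1)]


ratios = chebyshev_ratios(10**5)
print(min(ratios), max(ratios))
```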

Lemma 14.9 (partial summation). Let $(a_n)_{n \in \mathbb{N}}$ be an arbitrary sequence of complex numbers and $(t_n)_{n \in \mathbb{N}}$ be a strictly increasing and unbounded sequence of real numbers. Set
$$A(t) := \sum_{\substack{n\\ t_n \le t}} a_n$$
and suppose that $g : [t_1, \infty) \to \mathbb{C}$ is continuously differentiable. Then for every $x \ge t_1$, we have
$$\sum_{\substack{n\\ t_n \le x}} a_n g(t_n) = A(x) g(x) - \int_{t_1}^{x} A(t) g'(t)\, dt.$$
Proof. The proof is similar to the proof of integration by parts. Choose $N \in \mathbb{N}$ such that $t_N \le x < t_{N+1}$. Then for $t_n \le t < t_{n+1}$ we have $A(t) = A(t_n)$. Furthermore, we have $A(t_1) = a_1$ and for $n \ge 2$
$$A(t_n) - A(t_{n-1}) = a_n.$$
Thus, since $A(t)$ is constant in the intervals between $t_n$ and $t_{n+1}$, the Fundamental Theorem of Calculus (FTC) implies that
$$\begin{aligned}
\int_{t_1}^{x} A(t) g'(t)\, dt &= \left(\sum_{n=1}^{N-1} \int_{t_n}^{t_{n+1}} + \int_{t_N}^{x}\right) A(t) g'(t)\, dt\\
&= \sum_{n=1}^{N-1} A(t_n) \int_{t_n}^{t_{n+1}} g'(t)\, dt + A(t_N) \int_{t_N}^{x} g'(t)\, dt\\
&\overset{\text{FTC}}{=} \sum_{n=1}^{N-1} A(t_n)\left(g(t_{n+1}) - g(t_n)\right) + A(t_N)\left(g(x) - g(t_N)\right).
\end{aligned}$$
We then split the first sum into two pieces and make the shift $n \to n - 1$ in the first sum to rewrite this as (recalling that $A(x) = A(t_N)$)
$$\sum_{n=2}^{N} A(t_{n-1}) g(t_n) - \sum_{n=1}^{N} A(t_n) g(t_n) + A(x) g(x) = -\sum_{n=1}^{N} a_n g(t_n) + A(x) g(x).$$ □
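Because $A$ is a step function, the proof evaluates $\int_{t_1}^{x} A(t) g'(t)\, dt$ exactly as a finite sum. The Python sketch below (names and the sample data are hypothetical) computes both sides of Lemma 14.9 in exact rational arithmetic for finitely many terms, using that FTC step for the integral and the illustrative choice $g(t) = 1/t$. It is a consistency check of the identity, not an independent numerical integration.

```python
from fractions import Fraction


def partial_summation_sides(a, t, g, x):
    """Return (lhs, rhs) of Lemma 14.9 for finitely many terms.

    a, t: lists holding a_1, a_2, ... and strictly increasing t_1, t_2, ...
    (0-based in Python), with t[0] <= x.  Since A is constant on [t_n, t_{n+1}),
    the integral of A(t) g'(t) over that piece is A(t_n) * (g(t_{n+1}) - g(t_n)).
    """
    last = max(i for i in range(len(t)) if t[i] <= x)   # index of t_N
    lhs = sum(a[i] * g(t[i]) for i in range(last + 1))
    A = [sum(a[: i + 1]) for i in range(last + 1)]        # A(t_n) for n <= N
    integral = sum(A[i] * (g(t[i + 1]) - g(t[i])) for i in range(last))
    integral += A[last] * (g(x) - g(t[last]))
    rhs = A[last] * g(x) - integral
    return lhs, rhs


def g(s):
    return Fraction(1) / s          # illustrative choice g(t) = 1/t


a = [Fraction(c) for c in (3, -1, 4, 1, -5, 9)]
t = [1, 2, 3, 5, 8, 13]
print(partial_summation_sides(a, t, g, Fraction(17, 2)))
```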


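Corollary 14.10. Suppose that $g : [1, \infty) \to \mathbb{C}$ is continuously differentiable and $N \in \mathbb{N}$. Then
$$\sum_{n=1}^{N} g(n) = \int_1^N g(t)\, dt + \frac{1}{2}\left(g(1) + g(N)\right) + \int_1^N \left(t - \lfloor t \rfloor - \frac{1}{2}\right) g'(t)\, dt.$$
In particular,
$$\log(N!) = N\log N - N + \frac{1}{2}\log N + O(1).$$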
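Proof. We use Lemma 14.9 with $a_n = 1$ and $t_n = n$. Combining partial summation with integration by parts, we then have
$$\begin{aligned}
\sum_{1 \le n \le x} g(n) &= \lfloor x \rfloor g(x) - \int_1^x \lfloor t \rfloor g'(t)\, dt\\
&= \lfloor x \rfloor g(x) - \int_1^x \left(t - \frac{1}{2}\right) g'(t)\, dt + \int_1^x \left(t - \lfloor t \rfloor - \frac{1}{2}\right) g'(t)\, dt\\
&= \int_1^x g(t)\, dt + \frac{1}{2} g(1) + \left(\lfloor x \rfloor + \frac{1}{2} - x\right) g(x) + \int_1^x \left(t - \lfloor t \rfloor - \frac{1}{2}\right) g'(t)\, dt,
\end{aligned}$$
where in the last step the middle integral was integrated by parts and the resulting endpoint terms were collected. Choosing $x = N \in \mathbb{N}$, the first claim follows.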
For the second claim, we choose $g(x) = \log(x)$. Since $x\log x - x$ is an antiderivative of $\log x$ (by integration by parts), we have
$$\log(N!) = \sum_{n=1}^{N} \log n = \int_1^N \log t\, dt + \frac{1}{2}\log N + \int_1^N \left(t - \lfloor t \rfloor - \frac{1}{2}\right) t^{-1}\, dt = N\log N - N + \frac{1}{2}\log N + O(1).$$

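We then use
$$\begin{aligned}
\int_1^N \left(t - \lfloor t \rfloor - \frac{1}{2}\right) t^{-1}\, dt &= \sum_{n=1}^{N-1} \int_n^{n+1} \left(t - \lfloor t \rfloor - \frac{1}{2}\right) t^{-1}\, dt\\
&= \sum_{n=1}^{N-1} \int_0^1 \left((t + n) - \lfloor t + n \rfloor - \frac{1}{2}\right)\frac{1}{t + n}\, dt = \sum_{n=1}^{N-1} \int_0^1 \frac{t - \frac{1}{2}}{t + n}\, dt\\
&= \sum_{n=1}^{N-1} \int_0^1 \frac{t + n - \left(n + \frac{1}{2}\right)}{t + n}\, dt = \sum_{n=1}^{N-1} \left(1 - \int_0^1 \frac{n + \frac{1}{2}}{t + n}\, dt\right)\\
&= \sum_{n=1}^{N-1} \left(1 - \left(n + \frac{1}{2}\right)\left(\log(1 + n) - \log n\right)\right)\\
&= \sum_{n=1}^{N-1} \left(1 - \left(n + \frac{1}{2}\right)\log\left(1 + \frac{1}{n}\right)\right).
\end{aligned}$$
Since
$$\log\left(1 + \frac{1}{n}\right) = \sum_{k=1}^{\infty} (-1)^{k+1}\frac{n^{-k}}{k} = \frac{1}{n} - \frac{1}{2n^2} + \frac{1}{3n^3} - \cdots,$$
we have
$$\left(n + \frac{1}{2}\right)\log\left(1 + \frac{1}{n}\right) = 1 - \frac{1}{2n} + \frac{1}{3n^2} + \frac{1}{2n} - \frac{1}{4n^2} + O\left(n^{-3}\right) = 1 + \frac{1}{12n^2} + O\left(n^{-3}\right),$$
and therefore
$$\sum_{n=1}^{N-1} \left(1 - \left(n + \frac{1}{2}\right)\log\left(1 + \frac{1}{n}\right)\right) = -\frac{1}{12}\sum_{n=1}^{N-1} \frac{1}{n^2} + O\left(\sum_{n=1}^{N-1} \frac{1}{n^3}\right) = O(1).$$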

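The last equality holds because
$$\sum_{n=1}^{N-1} \frac{1}{n^2} < \sum_{n=1}^{\infty} \frac{1}{n^2} = \zeta(2) = O(1), \qquad \sum_{n=1}^{N-1} \frac{1}{n^3} < \zeta(3) = O(1),$$
where we use the fact that $\zeta(s)$ converges absolutely for $\operatorname{Re}(s) > 1$. Here we obtained the expansion of $\left(n + \frac{1}{2}\right)\log\left(1 + \frac{1}{n}\right)$ by plugging in the Taylor expansion of the logarithm; namely, we use
$$\log(1 + x) = \sum_{k=1}^{\infty} (-1)^{k+1}\frac{x^k}{k}.$$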
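The $O(1)$ term can be observed numerically. The Python sketch below uses `math.lgamma(N + 1)`, which returns $\log \Gamma(N + 1) = \log(N!)$, to compute the difference $\log(N!) - \left(N\log N - N + \frac{1}{2}\log N\right)$. The corollary only asserts that this difference stays bounded; the limit $\frac{1}{2}\log(2\pi) \approx 0.9189$ used in the test comes from Stirling's formula, which is not proved in these notes.

```python
import math


def stirling_gap(N):
    """log(N!) - (N log N - N + (1/2) log N), computed in floating point."""
    return math.lgamma(N + 1) - (N * math.log(N) - N + 0.5 * math.log(N))


for N in (10, 100, 1000, 10**6):
    print(N, stirling_gap(N))
```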

We note the following lemma about the “average size” of $\frac{\log(p)}{p}$ for $p \le x$ without proof. See Theorem 4.10 of Apostol's “Introduction to Analytic Number Theory” for the details.
Lemma 14.11. For $x \to \infty$ we have
$$\sum_{\substack{p \le x\\ p \text{ prime}}} \frac{\log(p)}{p} = \log x + O(1).$$

The average size of $\frac{1}{p}$ for $p \le x$ is considered in the following lemma.
Lemma 14.12. There exists a real constant $B$ such that as $x \to \infty$ we have
$$\sum_{p \le x} \frac{1}{p} = \log\log(x) + B + O\left(\frac{1}{\log(x)}\right).$$

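Proof. We use Lemma 14.9 where $t_n = p_n$ is the $n$th prime, $a_n = \frac{\log p_n}{p_n}$, and $g(t) = \frac{1}{\log(t)}$. This yields
$$\sum_{p \le x} \frac{1}{p} = \sum_{p \le x} \frac{\log p}{p} \cdot \frac{1}{\log p} = \frac{A(x)}{\log x} + \int_2^x \frac{A(t)}{t\log^2 t}\, dt.$$
By Lemma 14.11, we have
$$A(t) = \sum_{p \le t} \frac{\log p}{p} = \log t + a(t),$$
where $a(t)$ is bounded. Hence it follows that
$$\sum_{p \le x} \frac{1}{p} = 1 + \frac{a(x)}{\log x} + \int_2^x \frac{1}{t\log t}\, dt + \int_2^x \frac{a(t)}{t\log^2 t}\, dt.$$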

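Since $a(t)$ is bounded, we have
$$\frac{a(x)}{\log x} = O\left(\frac{1}{\log x}\right),$$
and the integral
$$\int_2^\infty \frac{a(t)}{t\log^2 t}\, dt$$
also converges. Thus, noting that
$$\int_x^\infty \frac{a(t)}{t\log^2 t}\, dt \ll \int_x^\infty \frac{1}{t\log^2 t}\, dt = \left[-\frac{1}{\log t}\right]_x^\infty = \frac{1}{\log x},$$
we have
$$\int_2^x \frac{a(t)}{t\log^2 t}\, dt = \int_2^\infty \frac{a(t)}{t\log^2 t}\, dt - \int_x^\infty \frac{a(t)}{t\log^2 t}\, dt = \int_2^\infty \frac{a(t)}{t\log^2 t}\, dt + O\left(\frac{1}{\log x}\right).$$
Since $\int_2^x \frac{dt}{t\log t} = \log\log x - \log\log 2$, we therefore conclude that
$$\sum_{p \le x} \frac{1}{p} = \log\log(x) + B + O\left(\frac{1}{\log(x)}\right),$$
where
$$B = \int_2^\infty \frac{a(t)}{t\log^2 t}\, dt + 1 - \log\log(2).$$ □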
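The constant $B$ can be estimated numerically from the prime sum. The Python sketch below (names hypothetical) sieves the primes up to $x$ and prints $\sum_{p \le x} 1/p - \log\log x$ for a few values of $x \ge 2$. The value $B \approx 0.2615$ used in the test is the known numerical value of this constant (the Meissel–Mertens constant), quoted here rather than derived from the proof.

```python
from math import isqrt, log


def primes_up_to(n):
    """Sieve of Eratosthenes (n >= 2), using a bytearray for speed."""
    is_p = bytearray([1]) * (n + 1)
    is_p[0:2] = b"\x00\x00"
    for i in range(2, isqrt(n) + 1):
        if is_p[i]:
            is_p[i * i :: i] = bytes(len(range(i * i, n + 1, i)))
    return [i for i in range(n + 1) if is_p[i]]


def mertens_estimate(x):
    """sum_{p <= x} 1/p - log log x, which tends to B by Lemma 14.12."""
    return sum(1 / p for p in primes_up_to(x)) - log(log(x))


for x in (10**3, 10**4, 10**5, 10**6):
    print(x, mertens_estimate(x))
```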

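Theorem 14.13. Suppose that the limit
$$\lim_{x \to \infty} \frac{\pi(x)\log x}{x}$$
exists. Then the limit is $1$.
Proof. Suppose that
$$\lim_{x \to \infty} \frac{\pi(x)\log x}{x} = c,$$
or in other words
$$\pi(x) = \left(c + \varepsilon(x)\right)\frac{x}{\log x},$$
where $\lim_{x\to\infty} \varepsilon(x) = 0$. We use Lemma 14.9 with $t_n = p_n$, $a_n = 1$, and $g(t) = \frac{1}{t}$. This yields
$$\begin{aligned}
S(x) := \sum_{p \le x} \frac{1}{p} = \sum_{p \le x} 1 \cdot \frac{1}{p} &= \frac{\pi(x)}{x} + \int_2^x \frac{\pi(t)}{t^2}\, dt = \frac{c + \varepsilon(x)}{\log x} + \int_2^x \frac{c + \varepsilon(t)}{t\log t}\, dt\\
&= \frac{c + \varepsilon(x)}{\log x} + \left(\log\log x - \log\log 2\right)c + \int_2^x \frac{\varepsilon(t)}{t\log t}\, dt\\
&= \left(c + \delta(x)\right)\log\log x,
\end{aligned}$$
where $\delta$ is a function with $\lim_{x\to\infty} \delta(x) = 0$ (note that $\int_2^x \frac{\varepsilon(t)}{t\log t}\, dt = o(\log\log x)$ since $\varepsilon(t) \to 0$). By Lemma 14.12, we also have
$$S(x) = \log\log x + O(1),$$
and hence $c = 1$. □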
We have already shown the following.
Theorem 14.14. For $\operatorname{Re}(s) > 1$, we have
$$\zeta(s) = \prod_{p} \left(1 - p^{-s}\right)^{-1}.$$

We next recall some complex analysis for those who have seen it and introduce some of it for those who have not. Roughly speaking, complex analysis is the study of so-called holomorphic functions from a subset $U \subseteq \mathbb{C}$ of the complex numbers to $\mathbb{C}$. For an open set $U \subseteq \mathbb{C}$ and $z_0 \in U$, a function $f : U \to \mathbb{C}$ is called complex differentiable at $z_0$ if the limit
$$f'(z_0) := \lim_{\substack{h \to 0\\ h + z_0 \in U}} \frac{f(z_0 + h) - f(z_0)}{h}$$
exists; here $h$ is any element of $\mathbb{C}$ such that $h + z_0 \in U$. One calls the function $f$ holomorphic at $z_0$ if there exists an open neighborhood of $z_0$ such that $f$ is complex differentiable at every point in this open neighborhood.
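Writing $z = x + iy$, a theorem in complex analysis states that the function
$$f(x + iy) = u(x, y) + iv(x, y),$$
with $u : \mathbb{R}^2 \to \mathbb{R}$ and $v : \mathbb{R}^2 \to \mathbb{R}$, is holomorphic on an open set if and only if $u$ and $v$ both have continuous first-order partial derivatives in both variables there and they satisfy the Cauchy–Riemann differential equations
$$\frac{\partial u}{\partial x} = \frac{\partial v}{\partial y}, \qquad \frac{\partial u}{\partial y} = -\frac{\partial v}{\partial x}.$$
Writing $\frac{\partial}{\partial z} := \frac{1}{2}\left(\frac{\partial}{\partial x} - i\frac{\partial}{\partial y}\right)$ and $\frac{\partial}{\partial \overline{z}} := \frac{1}{2}\left(\frac{\partial}{\partial x} + i\frac{\partial}{\partial y}\right)$, the Cauchy–Riemann equations state that
$$\frac{\partial}{\partial \overline{z}} f(z) = 0.$$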
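Remark 14.15. Note that
$$\frac{\partial}{\partial z} z = 1, \quad \frac{\partial}{\partial \overline{z}} z = 0, \quad \frac{\partial}{\partial z} \overline{z} = 0, \quad \frac{\partial}{\partial \overline{z}} \overline{z} = 1,$$
which explains the notation. Roughly speaking, the Cauchy–Riemann equations are satisfied if the functions have no “$\overline{z}$ contribution”.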
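Examples 14.16. The following functions are holomorphic:
• Polynomials.
• The exponential function
$$\exp(z) = \sum_{n=0}^{\infty} \frac{z^n}{n!}.$$
The following functions are not holomorphic:
• $z \mapsto |z|$
• $z \mapsto \overline{z}$
• $z \mapsto \operatorname{Re}(z)$ or $z \mapsto \operatorname{Im}(z)$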
Properties
• The space of holomorphic functions is closed under addition and multiplication.
• If $g(z_0) \neq 0$ and $g$ is complex differentiable at $z_0$, then so is $\frac{1}{g}$.
• The sum, product, quotient, and chain rules all hold for complex differentiation.
A function $f : D \to \mathbb{C}$ ($D \subseteq \mathbb{C}$ is usually assumed, but one can more generally take a metric space) is called analytic at $x_0 \in D$ if there exists a Taylor-like series
$$f(x) = \sum_{n=0}^{\infty} a_n (x - x_0)^n$$
which converges to $f$ in some open neighborhood of $x_0$. If $f$ is analytic at every point in $D$, then one says that $f$ is analytic in $D$. If $D = \mathbb{C}$, then one simply says that $f$ is analytic.
The following hold:
• Every function which is holomorphic at $z_0$ is also analytic at $z_0$.
• Holomorphic functions are infinitely often differentiable and their derivatives are also holomorphic.
Identity Theorem: If two functions $f : U \to \mathbb{C}$ and $g : U \to \mathbb{C}$ are holomorphic on all of a connected open set $U \subseteq \mathbb{C}$ and agree on a subset $S \subset U$ which has a limit point in $U$, then $f = g$ on all of $U$.
Suppose that $U \subset \mathbb{C}$ is open and $P_f \subset U$ is a discrete subset of isolated points. A function $f$ is called meromorphic (on $U$) if it is well-defined and holomorphic on the set $U \setminus P_f$ and for every $\tau \in U$ there exists $n_\tau \in \mathbb{Z}$ such that the limit
$$\lim_{z \to \tau} (z - \tau)^{-n_\tau} f(z)$$
exists and is non-zero (if $\tau \notin P_f$, then one necessarily has $n_\tau \ge 0$). If $n_\tau < 0$, then one calls $\tau \in P_f$ a pole of $f$ of order $|n_\tau|$; a pole is furthermore called simple if its order is $1$. If $n_\tau \ge 0$ for $\tau \in P_f$, then we may remove $\tau$ from the set $P_f$ (it is a removable singularity). For a meromorphic function, the series
$$f(z) = \sum_{n = n_\tau}^{\infty} a_n (z - \tau)^n$$
is called the Laurent series at $\tau$. The coefficient $a_{-1}$ is called the residue of $f$ at $\tau$.

Example 14.17. Rational functions $\frac{P(z)}{Q(z)}$ with polynomials $P$ and $Q$ are meromorphic.
We next define another function $\Gamma(z)$ known as the Gamma function. For $n \in \mathbb{N}$ we define
$$\Gamma(n) = (n - 1)!.$$
This is generalized for $z \in \mathbb{C}$ with $\operatorname{Re}(z) > 0$ by
$$\Gamma(z) = \int_0^\infty t^{z-1} e^{-t}\, dt.$$
This can be shown to converge absolutely (for $\operatorname{Re}(z) > 0$) and is holomorphic in that region. Note furthermore that, using integration by parts, the Gamma function satisfies a functional equation (for $\operatorname{Re}(z) > 0$)
$$\Gamma(z + 1) = \int_0^\infty t^z e^{-t}\, dt = z\int_0^\infty t^{z-1} e^{-t}\, dt = z\Gamma(z).$$
Using this to define $\Gamma(z)$ for $z$ with real part between $-1$ and $0$ with $z \neq 0$ by $\Gamma(z) := \frac{\Gamma(z+1)}{z}$, we may extend the function to $\operatorname{Re}(z) > -1$. Continuing in this way and taking $P_\Gamma = -\mathbb{N}_0$ (because we cannot divide by zero, we must leave out these points), we obtain a meromorphic function on the entire set $\mathbb{C}$ (this is called analytic continuation and the continuation is unique by the identity theorem). Specifically, it is a meromorphic function with simple poles at the points $-n \in -\mathbb{N}_0$ having residue $\frac{(-1)^n}{n!}$.
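The step-by-step continuation can be mirrored in floating point. In the Python sketch below (names hypothetical), `math.gamma` stands in for the integral definition on $\operatorname{Re}(z) > 0$, and $\Gamma(z) = \Gamma(z + 1)/z$ is applied repeatedly to reach real $z$ left of $0$. The residue at $-n$ is estimated as $h\,\Gamma(-n + h)$ for a small $h > 0$, which should be close to $\frac{(-1)^n}{n!}$.

```python
import math


def gamma_by_recursion(z):
    """Gamma(z) for real z not in {0, -1, -2, ...}, via Gamma(z) = Gamma(z + 1) / z.

    Shifts z into z > 0 (where math.gamma stands in for the integral definition)
    and divides by z (z + 1) ... on the way back, as in the continuation above.
    """
    if z <= 0 and z == int(z):
        raise ValueError("Gamma has poles at the non-positive integers")
    denom = 1.0
    while z <= 0:
        denom *= z
        z += 1
    return math.gamma(z) / denom


def residue_estimate(n, h=1e-7):
    """Approximate lim_{z -> -n} (z + n) Gamma(z), expected to be (-1)^n / n!."""
    return h * gamma_by_recursion(-n + h)


print([round(residue_estimate(n), 6) for n in range(5)])
```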
Theorem 14.18. The Riemann zeta function has a meromorphic continuation to the entire complex plane and satisfies the functional equation
$$\zeta(1 - s)\,\Gamma\left(\frac{1 - s}{2}\right)\pi^{\frac{s-1}{2}} = \zeta(s)\,\Gamma\left(\frac{s}{2}\right)\pi^{-\frac{s}{2}}.$$
The following theorem was proven independently by Hadamard and de la Vallée Poussin in 1896, at about the same time.
Theorem 14.19 (Prime Number Theorem). As $x \to \infty$, we have
$$\pi(x) \sim \frac{x}{\log x}.$$
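To get a feel for the statement, the following Python sketch (names hypothetical) compares $\pi(x)$ with $\frac{x}{\log x}$ and with $\int_2^x \frac{dt}{\log t}$, an integral that appears in Step 1 below. The integral is approximated by Simpson's rule after the substitution $t = e^u$, that is, $\int_{\log 2}^{\log x} \frac{e^u}{u}\, du$; this numerical method is our choice, not part of the notes. Both ratios tend to $1$ by the theorem, and numerically the second one is much closer to $1$ in this range.

```python
from math import exp, isqrt, log


def prime_pi(x):
    """pi(x) by a sieve of Eratosthenes (x >= 2)."""
    is_p = bytearray([1]) * (x + 1)
    is_p[0:2] = b"\x00\x00"
    for i in range(2, isqrt(x) + 1):
        if is_p[i]:
            is_p[i * i :: i] = bytes(len(range(i * i, x + 1, i)))
    return sum(is_p)


def log_integral_from_2(x, steps=10_000):
    """Simpson's rule for int_2^x dt/log t, written as int_{log 2}^{log x} e^u / u du."""
    lo, hi = log(2), log(x)
    h = (hi - lo) / steps                      # steps must be even
    total = exp(lo) / lo + exp(hi) / hi
    for k in range(1, steps):
        u = lo + k * h
        total += (4 if k % 2 else 2) * exp(u) / u
    return total * h / 3


for x in (10**4, 10**5, 10**6):
    p = prime_pi(x)
    print(x, p, p / (x / log(x)), p / log_integral_from_2(x))
```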
We prove the prime number theorem in 6 steps.
Step 1: We first show that the convergence of the sequence
$$\left(\sum_{p \le n} \frac{\log p}{p} - \log n\right)_{n = 1, 2, \ldots}$$
implies the prime number theorem.


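Proof of first part. Set
$$A(x) := \sum_{p \le x} \frac{\log p}{p}, \qquad c := \lim_{n \to \infty} \left(A(n) - \log n\right).$$
Then since
$$\log\left(\frac{\lfloor x \rfloor}{x}\right) = \log\left(1 - \frac{x - \lfloor x \rfloor}{x}\right) = O\left(\frac{1}{x}\right),$$
and $A(x) = A(\lfloor x \rfloor)$, we obtain for $x \to \infty$
$$A(x) - \log x - c = \underbrace{A(\lfloor x \rfloor) - \log\lfloor x \rfloor}_{\to c} - c + O\left(x^{-1}\right) = o(1).$$
Thus for $x > 0$
$$A(x) = \log x + c + \varepsilon(x)$$
for some $\varepsilon : \mathbb{R}^+ \to \mathbb{R}$ satisfying $\lim_{x \to \infty} \varepsilon(x) = 0$.
We then use partial summation (Lemma 14.9 with $t_n = p_n$, $a_n = \frac{\log p_n}{p_n}$, and $g(x) = \frac{x}{\log x}$) to obtain
$$\begin{aligned}
\pi(x) = \sum_{p \le x} \frac{\log p}{p} \cdot \frac{p}{\log p} &= A(x)\frac{x}{\log x} + \int_2^x A(t)\frac{1 - \log t}{\log^2 t}\, dt\\
&= x + \frac{cx}{\log x} + \frac{\varepsilon(x)x}{\log x} + \int_2^x \frac{1 - \log t}{\log t}\, dt + c\int_2^x \frac{1 - \log t}{\log^2 t}\, dt + \int_2^x \varepsilon(t)\frac{1 - \log t}{\log^2 t}\, dt,
\end{aligned}$$
where
$$\int_2^x \frac{1 - \log t}{\log t}\, dt = \int_2^x \frac{dt}{\log t} - (x - 2), \qquad \int_2^x \frac{1 - \log t}{\log^2 t}\, dt = -\frac{x}{\log x} + \frac{2}{\log 2}.$$
Hence
$$\pi(x) = \int_2^x \frac{dt}{\log t} + 2 + \frac{2c}{\log 2} + \frac{\varepsilon(x)x}{\log x} + \int_2^x \varepsilon(t)\frac{1 - \log t}{\log^2 t}\, dt.$$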
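Set
$$\ell i(x) = \int_0^x \frac{dt}{\log t}$$
(interpreted as a principal value at $t = 1$). We claim that
$$\ell i(x) \sim \frac{x}{\log x},$$
which implies that
$$\int_2^x \frac{dt}{\log t} \sim \frac{x}{\log x}.$$
To see the claim, note that for $x \ge 2$, integration by parts gives
$$\ell i(x) - \ell i(2) = \int_2^x 1 \cdot \frac{dt}{\log t} = \frac{x}{\log x} - \frac{2}{\log 2} + \int_2^x \frac{dt}{\log^2 t},$$
and hence, splitting the last integral at $\sqrt{x}$ (on $[\sqrt{x}, x]$ we have $\log t \ge \frac{1}{2}\log x$),
$$\left|\ell i(x) - \frac{x}{\log x}\right| \le \int_{\sqrt{x}}^x \frac{dt}{\log^2 t} + O\left(\sqrt{x}\right) = O\left(\frac{x}{\log^2 x}\right).$$
This yields the asymptotic for $\ell i(x)$. We now use this to bound the integral in the formula for $\pi(x)$. Let $\varepsilon > 0$ be given. Then there exists $x_0$ such that $|\varepsilon(t)| \le \varepsilon$ for all $t \ge x_0$ (we can choose $x_0 > e$, so that $\log t - 1 > 0$ for $t \ge x_0$). Thus
$$\left|\int_2^x \varepsilon(t)\frac{1 - \log t}{\log^2 t}\, dt\right| \le \left|\int_2^{x_0} \varepsilon(t)\frac{1 - \log t}{\log^2 t}\, dt\right| + \varepsilon\int_{x_0}^x \frac{\log(t) - 1}{\log^2(t)}\, dt \le O(1) + \varepsilon\int_{x_0}^x \frac{dt}{\log(t)} = \varepsilon\frac{x}{\log(x)} + o\left(\frac{x}{\log(x)}\right).$$
Since $\varepsilon > 0$ was arbitrary, this integral is $o\left(\frac{x}{\log x}\right)$, and together with $\frac{\varepsilon(x)x}{\log x} = o\left(\frac{x}{\log x}\right)$ we conclude that $\pi(x) \sim \frac{x}{\log x}$. Hence we conclude the claim of the first part. □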

Step 2: Bound of the error from the “tail” of the zeta function.

Lemma 14.20. For $N\in\mathbb{N}$ and $\mathrm{Re}(s) = \sigma > 1$, we have
$$\sum_{n=N}^\infty n^{-s} = \frac{1}{s-1}N^{1-s} + s\int_N^\infty\bigl(1 - \{t\}\bigr)t^{-s-1}\,dt,$$
where $\{t\} = t - \lfloor t\rfloor$.

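Proof. Using partial summation (Lemma 14.9) with $g(t) = t^{-s}$, $t_n = n$, and $a_n = 1$ yields for $x\ge N$
$$\sum_{N\le n\le x}n^{-s} = \bigl(\lfloor x\rfloor - N + 1\bigr)x^{-s} + s\int_N^x\Bigl(\underbrace{\lfloor t\rfloor}_{=t-\{t\}} - N + 1\Bigr)t^{-s-1}\,dt.$$
For $\sigma > 1$ we then take $x\to\infty$ (the boundary term tends to $0$), yielding
$$\sum_{n=N}^\infty n^{-s} = s\int_N^\infty\bigl(t - N + 1 - \{t\}\bigr)t^{-s-1}\,dt.$$
The first two summands in the integral can be evaluated explicitly via
$$s\int_N^\infty\bigl(t^{-s} - Nt^{-s-1}\bigr)\,dt = s\left(\frac{N^{-s+1}}{s-1} - \frac{N^{-s+1}}{s}\right) = \frac{1}{s-1}N^{1-s}.\qquad\square$$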

Remark 14.21. One can show that the integral appearing in the lemma is a holomorphic function for $\sigma > 0$.

Step 3: Meromorphic continuation and non-vanishing of the Riemann zeta function.

Theorem 14.22. The Riemann zeta function has a meromorphic continuation to $s\in\mathbb{C}$ with $\sigma = \mathrm{Re}(s) > 0$. The meromorphic continuation is holomorphic up to a simple pole at $s = 1$ with residue $1$. For $\sigma\ge 1$ the zeta function does not vanish.
Proof. The claim for σ > 1 follows from Theorem 14.14.
By Lemma 14.20 with $N = 1$, one obtains for $\sigma > 1$ that
$$\zeta(s) = \frac{1}{s-1} + s\int_1^\infty\bigl(1 - \{t\}\bigr)t^{-s-1}\,dt.$$

As noted in Remark 14.21, one can show that the integral is holomorphic for σ > 0.
From the term 1/(s − 1), we see that ζ(s) has a simple pole at s = 1 with residue
1 and no other poles with σ > 0. Note that by the identity theorem in complex
analysis, we know that this continuation is unique.
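Lemma 14.20 also gives a practical way to evaluate this continuation numerically. The sketch below uses the lemma with a large $N$ and, as a simplifying assumption of our own, replaces the integral term $s\int_N^\infty(1-\{t\})t^{-s-1}\,dt$ by $\frac12N^{-s}$ (the average value of $1 - \{t\}$ is $\frac12$); for real $s$ the neglected part is of order $sN^{-s-1}$. The function name `zeta_via_lemma` is hypothetical, and only real $s > 0$ with $s\ne 1$ is handled.

```python
import math

def zeta_via_lemma(s, N=100_000):
    """Approximate zeta(s) for real s > 0, s != 1, via Lemma 14.20.

    For sigma > 1 the lemma gives
        zeta(s) = sum_{n<N} n^(-s) + N^(1-s)/(s-1) + s * int_N^oo (1 - {t}) t^(-s-1) dt,
    and the right-hand side is the continuation for sigma > 0 (Theorem 14.22).
    Simplifying assumption (ours): the integral term is replaced by N^(-s)/2,
    since 1 - {t} has average 1/2; for real s the neglected part is about s*N^(-s-1)/12.
    """
    if s <= 0 or s == 1:
        raise ValueError("this sketch needs real s > 0 with s != 1")
    partial_sum = math.fsum(n ** (-s) for n in range(1, N))
    return partial_sum + N ** (1 - s) / (s - 1) + 0.5 * N ** (-s)

print(zeta_via_lemma(2), math.pi ** 2 / 6)
print(zeta_via_lemma(0.5))
```

For instance, `zeta_via_lemma(0.5)` returns approximately $-1.46035$, a value of the continuation that the series $\sum n^{-s}$ cannot produce directly, while `zeta_via_lemma(2)` agrees with $\pi^2/6$ to many digits.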
It remains to show that $\zeta(1+it)\ne 0$ for real $t\ne 0$. Assume for contradiction that the zeta function vanishes at $1 + it_0$ with $t_0\in\mathbb{R}\setminus\{0\}$. We consider the one-variable function $\zeta(\sigma + it_0)$ and take the Taylor expansion around $\sigma = 1$. This yields
$$\zeta(\sigma + it_0) = (\sigma - 1)\,\zeta'(1 + it_0) + \dots.$$
Since $\zeta$ has a pole at $s = 1$ with residue $1$, the Laurent expansion of $\zeta(s)$ around $s = 1$ is given by
$$\zeta(s) = (s-1)^{-1} + \dots.$$
Now define the function
$$Z(s) = \zeta(s)^3\,\zeta(s + it_0)^4\,\zeta(s + 2it_0).$$
From above, the function $Z(s)$ is holomorphic for $\sigma > 1$ and meromorphic for $\sigma > 0$. Moreover, $Z(s)$ vanishes at the point $s = 1$: the factor $\zeta(s)^3$ has a pole of order $3$, the factor $\zeta(s + it_0)^4$ vanishes to order at least $4$, and $\zeta(s + 2it_0)$ is holomorphic at $s = 1$ because $t_0\ne 0$. Thus as $\sigma\to 1^+$, we have
$$\log|Z(\sigma)|\to-\infty.$$
We now use the product expansion (from Theorem 14.14) of ζ and expand the
logarithm for σ > 1.
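In complex analysis, the complex logarithm can be defined such that for $|z| < 1$ one has
$$\mathrm{Log}(1 - z) = -\sum_{j=1}^\infty\frac{z^j}{j}.$$
Moreover,
$$\mathrm{Re}\bigl(\mathrm{Log}(z)\bigr) = \log|z|.$$
Hence
$$\mathrm{Log}\bigl(\zeta(s)\bigr) = -\sum_p\mathrm{Log}\bigl(1 - p^{-s}\bigr) = \sum_p\sum_{j=1}^\infty\frac{1}{j}p^{-js} = \sum_{n=1}^\infty a_nn^{-s}$$
for certain $a_n\in\mathbb{Q}$ with $a_n\ge 0$ (namely $a_n = 1/j$ if $n = p^j$ is a prime power and $a_n = 0$ otherwise). Now write $s = \sigma + it$ and recall that
$$\mathrm{Re}\bigl(n^{-s}\bigr) = n^{-\sigma}\,\mathrm{Re}\bigl(e^{-it\log n}\bigr) = n^{-\sigma}\cos(t\log n).$$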
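Hence
$$\log|\zeta(s)| = \mathrm{Re}\bigl(\mathrm{Log}\,\zeta(s)\bigr) = \sum_{n=1}^\infty a_n\,\mathrm{Re}\bigl(n^{-s}\bigr) = \sum_{n=1}^\infty a_nn^{-\sigma}\cos(t\log n).$$
We next compute the logarithm of $|Z(\sigma)|$ for real $\sigma > 1$:
$$\begin{aligned}
\log|Z(\sigma)| &= 3\log|\zeta(\sigma)| + 4\log|\zeta(\sigma + it_0)| + \log|\zeta(\sigma + 2it_0)|\\
&= \sum_{n=1}^\infty a_nn^{-\sigma}\bigl(3 + 4\cos(t_0\log n) + \cos(2t_0\log n)\bigr)\ge 0.
\end{aligned}$$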

The final inequality holds because, using $\cos(2x) = 2\cos^2(x) - 1$,
$$\begin{aligned}
3 + 4\cos(t_0\log n) + \cos(2t_0\log n) &= 3 + 4\cos(t_0\log n) + 2\cos^2(t_0\log n) - 1\\
&= 2 + 4\cos(t_0\log n) + 2\cos^2(t_0\log n)\ge 0,
\end{aligned}$$
as the function
$$f(x) = 2 + 4x + 2x^2 = 2(x+1)^2\ge 0.$$
This inequality contradicts the claim that $\log|Z(\sigma)|\to-\infty$ as $\sigma\to 1^+$. □
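The key inequality is local: by the expansion of $\mathrm{Log}(1-z)$, each prime $p$ contributes a nonnegative amount to $\log|Z(\sigma)|$ for $\sigma > 1$. The following sketch (function names are our own) evaluates this contribution directly from the Euler factors and also checks the identity $3 + 4\cos\theta + \cos 2\theta = 2(1 + \cos\theta)^2$ numerically; it illustrates the argument and is not a substitute for it.

```python
import cmath
import math

def local_factor_341(p, sigma, t0):
    """Contribution of the prime p to log|Z(sigma)|, where
    Z(s) = zeta(s)^3 * zeta(s + i t0)^4 * zeta(s + 2 i t0) and sigma > 1:
        -3 log|1 - p^(-sigma)| - 4 log|1 - p^(-sigma - i t0)| - log|1 - p^(-sigma - 2 i t0)|.
    """
    def minus_log_abs_euler_factor(t):
        # p^(-(sigma + i t)) = exp(-(sigma + i t) log p)
        z = cmath.exp(-complex(sigma, t) * math.log(p))
        return -math.log(abs(1 - z))
    return (3 * minus_log_abs_euler_factor(0.0)
            + 4 * minus_log_abs_euler_factor(t0)
            + minus_log_abs_euler_factor(2 * t0))

def theta_identity_gap(theta):
    """3 + 4 cos(theta) + cos(2 theta) - 2 (1 + cos(theta))^2; zero up to rounding."""
    return 3 + 4 * math.cos(theta) + math.cos(2 * theta) - 2 * (1 + math.cos(theta)) ** 2

for p in (2, 3, 5, 7, 11):
    print(p, local_factor_341(p, 1.05, 14.134725))
```

Since $\log|Z(\sigma)|$ is the sum of these contributions over all primes when $\sigma > 1$, the nonnegativity of each term is the numerical counterpart of the inequality $\log|Z(\sigma)|\ge 0$.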

Step 4: A lemma about Dirichlet series.


We recall the majorant criterion of Weierstrass from complex analysis (without
proof).
Definition. A series $\sum_{r=0}^\infty f_r$ of functions $f_r: D\to\mathbb{C}$, $D\subset\mathbb{C}$, is called (absolutely and) locally uniformly convergent (in $D$) if for every point $a\in D$ there is a neighborhood $U$ of $a$ and a sequence $(M_r)_{r\ge 0}$ of nonnegative real numbers such that
$$|f_r(z)|\le M_r\qquad\text{for all } z\in U\cap D,\ r\in\mathbb{N}_0,$$
and $\sum_{r=0}^\infty M_r$ converges.
The Weierstrass majorant criterion states the following.
Lemma 14.23. Suppose that $\sum_{r=0}^\infty f_r$ is a locally uniformly convergent series of analytic functions on an open set $D\subset\mathbb{C}$. Then the infinite series is also analytic on $D$.
Lemma 14.24. If a Dirichlet series $D(s) = \sum_{n=1}^\infty\frac{a_n}{n^s}$ has coefficients $a_n\in\mathbb{C}$ satisfying $|a_n| = O(n^\varepsilon)$ as $n\to\infty$ for every $\varepsilon > 0$, then $D(s)$ converges locally uniformly for $\mathrm{Re}(s) = \sigma > 1$ and is a holomorphic function of $s$ in that region.
Proof. Let $\sigma_0 > 1$ be arbitrary and suppose that $0 < \varepsilon < \sigma_0 - 1$. Then $|a_n|\le cn^\varepsilon$ for some constant $c$. Thus it follows that for $s$ with $\sigma\ge\sigma_0$ we have
$$\sum_{n=1}^\infty\left|a_nn^{-s}\right|\le c\sum_{n=1}^\infty n^{-(\sigma-\varepsilon)}\le c\sum_{n=1}^\infty n^{-(\sigma_0-\varepsilon)}.$$
The series on the right-hand side converges because $\sigma_0 - \varepsilon > 1$. The claim thus follows by the Weierstrass majorant criterion. □
Step 5: The sequence $\left(\sum_{p\le n}\frac{\log p}{p} - \log n\right)_{n\in\mathbb{N}}$ converges.
For $n\in\mathbb{N}$, set
$$a_n := \sum_{p\le n}\frac{\log p}{p}.$$
Then by Lemma 14.11, we have
$$a_n = \log n + O(1),$$
and hence $a_n = O(n^\varepsilon)$ for all $\varepsilon > 0$ (since $\log n = O(n^\varepsilon)$, e.g. by L'Hospital's rule). Thus by Lemma 14.24, the Dirichlet series $D(s) = \sum_{n=1}^\infty\frac{a_n}{n^s}$ is a holomorphic function for $\sigma > 1$. Then, noting that we may interchange the order of summation because of absolute convergence, we have
$$D(s) = \sum_{n=1}^\infty\sum_{p\le n}\frac{\log p}{p}n^{-s} = \sum_p\frac{\log p}{p}\sum_{n=p}^\infty n^{-s}.$$

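We then use Lemma 14.20 for the inner sum to obtain
$$\begin{aligned}
\sum_{n=p}^\infty n^{-s} &= \frac{1}{s-1}p^{1-s} + s\int_p^\infty\bigl(1 - \{t\}\bigr)t^{-s-1}\,dt\\
&= \frac{p}{s-1}\left(\frac{1}{p^s - 1} - \frac{1}{p^s(p^s - 1)} + \frac{s(s-1)}{p}\int_p^\infty\bigl(1 - \{t\}\bigr)t^{-s-1}\,dt\right).
\end{aligned}$$
One can use the Weierstrass majorant criterion to show that the function
$$g_p(s) = \frac{1}{p^s(1 - p^s)} + \frac{s(s-1)}{p}\int_p^\infty\bigl(1 - \{t\}\bigr)t^{-s-1}\,dt$$
is holomorphic for $\sigma > 0$ and satisfies
$$|g_p(s)|\le\frac{1}{p^\sigma(p^\sigma - 1)} + \frac{|s|(|s|+1)}{p}\int_p^\infty t^{-\sigma-1}\,dt = \frac{1}{p^\sigma(p^\sigma - 1)} + \frac{|s|(|s|+1)}{\sigma p^{\sigma+1}}.$$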

Thus
$$D(s) = \sum_p\frac{\log p}{p}\cdot\frac{p}{s-1}\left(\frac{1}{p^s - 1} + g_p(s)\right) = \frac{1}{s-1}\left(\sum_p\frac{\log p}{p^s - 1} + \sum_pg_p(s)\log p\right),$$
and the last series is absolutely and locally uniformly convergent and thus holomorphic for $\mathrm{Re}(s) > \frac12$. Set
$$h(s) = \sum_pg_p(s)\log p,$$
so that
$$D(s) = \frac{1}{s-1}\left(\sum_p\frac{\log p}{p^s - 1} + h(s)\right).$$
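We previously showed that
$$\mathrm{Log}\,\zeta(s) = -\sum_p\mathrm{Log}\bigl(1 - p^{-s}\bigr),$$
so that by differentiation we have
$$\frac{\zeta'(s)}{\zeta(s)} = -\sum_p\frac{(\log p)\,p^{-s}}{1 - p^{-s}} = -\sum_p\frac{\log p}{p^s - 1},$$
and hence
$$D(s) = \frac{1}{s-1}\left(-\frac{\zeta'(s)}{\zeta(s)} + h(s)\right).$$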
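By Theorem 14.22, $\zeta$ has a simple pole at $s = 1$ with residue $1$ and no zeros with $\sigma\ge 1$, so $-\zeta'/\zeta$ is holomorphic for $\sigma\ge 1$ except for a simple pole at $s = 1$ with residue $1$. Hence the right-hand side is holomorphic for $\sigma\ge 1$ up to a double pole at $s = 1$, and the principal part (the part that grows) of the Laurent expansion of $D(s)$ at $s = 1$ is
$$D(s) = \frac{1}{(s-1)^2} + \frac{c}{s-1} + \dots$$
for some constant $c$. Now set
$$\widetilde{D}(s) := D(s) + \zeta'(s) - c\,\zeta(s).$$
Since $\zeta'(s) = -(s-1)^{-2} + \dots$ and $\zeta(s) = (s-1)^{-1} + \dots$, the principal parts cancel, so $\widetilde{D}$ is holomorphic for $\sigma\ge 1$, and for $\sigma > 1$ we have
$$\widetilde{D}(s) = \sum_{n=1}^\infty\bigl(a_n - \log n - c\bigr)n^{-s}.$$
In particular, setting
$$f_n := a_n - \log n - c,$$
the series (obtained by plugging in $s = 1$)
$$\sum_{n=1}^\infty\frac{f_n}{n}$$
converges. This step needs a Tauberian argument, since holomorphy of $\widetilde{D}$ on $\sigma\ge 1$ does not by itself imply convergence of its Dirichlet series at $s = 1$; one can use, for example, Newman's theorem, which states that a Dirichlet series with bounded coefficients that extends holomorphically to $\sigma\ge 1$ converges at every point of the line $\sigma = 1$. Here $f_n = O(1)$ by Lemma 14.11. In the next step, we show that $f_n\to 0$, which together with
$$f_n = \sum_{p\le n}\frac{\log p}{p} - \log n - c$$
implies that
$$\lim_{n\to\infty}\left(\sum_{p\le n}\frac{\log p}{p} - \log n\right) = c.$$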
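Step 6: $f_n\to 0$.
Since $\sum_{n=1}^\infty\frac{f_n}{n}$ converges, by Cauchy's criterion there exists for every $0 < \varepsilon\le\frac12$ an $N_0 > 0$ such that for every $N > N_0$ one has
$$\sum_{N\le n\le N(1+\varepsilon)}\frac{f_n}{n} < \varepsilon^2$$
and
$$\sum_{N(1-\varepsilon)\le n\le N}\frac{f_n}{n} > -\varepsilon^2.$$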

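Thus for $n\in\mathbb{N}$ with $N\le n\le N(1+\varepsilon)$ we have
$$f_n = \sum_{p\le n}\frac{\log p}{p} - \log n - c\ \ge\ \underbrace{\sum_{p\le N}\frac{\log p}{p} - \log N - c}_{=f_N} + \log\frac{N}{n}\ \ge\ f_N - \log(1+\varepsilon)\ >\ f_N - \varepsilon,$$
where the last step uses the series expansion of the logarithm. It therefore follows that
$$\bigl(f_N - \varepsilon\bigr)\sum_{N\le n\le N(1+\varepsilon)}\frac{1}{n} < \varepsilon^2.$$
If $f_N\le\varepsilon$ there is nothing to show, so suppose that $f_N - \varepsilon > 0$. For $N > N_0$ we bound the sum on the left-hand side from below by
$$\sum_{N\le n\le N(1+\varepsilon)}\frac{1}{n}\ \ge\ \frac{1 + \lfloor N(1+\varepsilon)\rfloor - N}{N(1+\varepsilon)}\ >\ \frac{1 + N(1+\varepsilon) - 1 - N}{N(1+\varepsilon)} = \frac{\varepsilon}{1+\varepsilon},$$
so that we may conclude that
$$f_N < \varepsilon(2+\varepsilon)\ \le\ \frac52\varepsilon\qquad\left(\text{since }\varepsilon\le\tfrac12\right).$$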
2
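Furthermore, for $n$ with $N(1-\varepsilon)\le n\le N$ we have
$$f_n = \sum_{p\le n}\frac{\log p}{p} - \log n - c\ \le\ \sum_{p\le N}\frac{\log p}{p} - \log N - c + \log\frac{N}{n}\ \le\ f_N - \log(1-\varepsilon)\ <\ f_N + 2\varepsilon$$
for any $0 < \varepsilon\le\frac12$ (again bounding the series expansion of the logarithm). Therefore
$$\bigl(f_N + 2\varepsilon\bigr)\sum_{N(1-\varepsilon)\le n\le N}\frac{1}{n} > -\varepsilon^2.$$
Since
$$\sum_{N(1-\varepsilon)\le n\le N}\frac{1}{n}\ \ge\ \frac{N - N(1-\varepsilon)}{N} = \varepsilon,$$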
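we conclude that $f_N + 2\varepsilon > -\varepsilon$ (this is clear if $f_N + 2\varepsilon\ge 0$; otherwise $(f_N + 2\varepsilon)\varepsilon\ge(f_N + 2\varepsilon)\sum_{N(1-\varepsilon)\le n\le N}\frac1n > -\varepsilon^2$), and thus $f_N > -3\varepsilon$. It follows that
$$|f_N| < 3\varepsilon\qquad\text{for all } N > N_0,$$
from which we conclude that $f_n\to 0$. This completes the proof of the prime number theorem.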
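Steps 5 and 6 show that $A(n) - \log n$ converges. The following sketch (our own helper `mertens_first_sum`, using a sieve of Eratosthenes) lets one watch this numerically; the values settle near $-1.33$, although the convergence is slow and the computation of course proves nothing about the limit.

```python
import math

def mertens_first_sum(n):
    """Return A(n) - log n with A(n) = sum_{p <= n} (log p) / p, as in Step 1 (n >= 2)."""
    is_prime = bytearray([1]) * (n + 1)
    is_prime[0:2] = b"\x00\x00"
    for i in range(2, math.isqrt(n) + 1):
        if is_prime[i]:
            is_prime[i * i :: i] = bytearray(len(range(i * i, n + 1, i)))
    A = math.fsum(math.log(p) / p for p in range(2, n + 1) if is_prime[p])
    return A - math.log(n)

for n in (10, 10**3, 10**5, 10**6):
    print(n, mertens_first_sum(n))
```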
We now combine the results about the Riemann zeta function.
Theorem 14.25.
(1) The only pole of the Riemann zeta function is at the point $s = 1$, and it is a simple pole with residue $1$.
(2) The only zeros of the Riemann zeta function outside of the strip $0 < \sigma = \mathrm{Re}(s) < 1$ (this is known as the critical strip) are at the points $s\in-2\mathbb{N}$. These zeros are simple.
(3) For $s$ in the critical strip, if one of $s$, $\bar{s}$, $1 - s$, or $1 - \bar{s}$ is a zero of the Riemann zeta function, then all of the others are as well. The orders of the zeros are all the same.
Proof.
(1)+(2): We have seen the claim for $\sigma > 0$. Hence it suffices to determine the poles and zeros for $\sigma\le 0$. Consider the function
$$\Lambda(s) = \zeta(s)\,\Gamma\!\left(\frac{s}{2}\right)\pi^{-\frac{s}{2}}.$$
By Theorem 14.18, we have
$$\Lambda(1 - s) = \Lambda(s).$$
For $\sigma\le 0$, we have $1 - \sigma\ge 1$. Hence
$$\zeta(s) = \frac{\zeta(1-s)\,\Gamma\!\left(\frac{1-s}{2}\right)\pi^{\frac{2s-1}{2}}}{\Gamma\!\left(\frac{s}{2}\right)}.$$
For $\sigma\le 0$ and $s\ne 0$, the function $\zeta(1-s)$ does not have any poles or zeros, while it has a simple pole when $s = 0$. However, this simple pole is cancelled by the pole of $\Gamma(s/2)$ (recall that $\Gamma(z)$ has simple poles at $z\in-\mathbb{N}_0$ and no zeros). Note further that for $\sigma\le 0$, the function $\Gamma\!\left(\frac{1-s}{2}\right)$ has no zeros or poles. Thus there are zeros precisely when $\frac{s}{2}\in-\mathbb{N}$, or in other words $s\in-2\mathbb{N}$. They are all simple because the poles of the Gamma function are simple.
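(3) Suppose that $\sigma > 1$. Then
$$\zeta(\bar{s}) = \sum_{n\ge 1}n^{-\bar{s}} = \overline{\sum_{n\ge 1}n^{-s}} = \overline{\zeta(s)}.$$
Therefore if there is a zero at $s$, then there is also a zero at $\bar{s}$. By analytic continuation, $\zeta(\bar{s}) = \overline{\zeta(s)}$ holds more generally for all $s\in\mathbb{C}\setminus\{1\}$, and the orders of vanishing are the same. The other zeros follow from the functional equation, since in the critical strip the factors $\Gamma(s/2)\pi^{-s/2}$ and $\Gamma((1-s)/2)\pi^{-(1-s)/2}$ have neither zeros nor poles. □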

Remarks.
1. The claim that $\zeta$ has infinitely many zeros in the critical strip $0 < \sigma < 1$ and that these all satisfy the properties in Theorem 14.25 (3) was conjectured by Riemann and proven in 1893 by Hadamard.
2. There are many questions about the distribution of the zeros. Suppose that $T\ge 0$ and let $N(T)$ be the number of zeros of $\zeta(s)$ in the critical strip with $0\le\mathrm{Im}(s)\le T$. Then Riemann conjectured and von Mangoldt proved that as $T\to\infty$
$$N(T) = \frac{T}{2\pi}\log\frac{T}{2\pi} - \frac{T}{2\pi} + O(\log T),$$
giving a “vertical” distribution of the zeros in the critical strip. Much less is known about the “horizontal” distribution (i.e., about the real parts). In particular, it has not been proven that $\zeta$ has no zeros for $\sigma > 1 - \varepsilon$, no matter how small one takes $\varepsilon\in\left(0, \frac12\right)$. Showing that the zero-free region includes a strip of the type $\mathrm{Re}(s) > 1 - \varepsilon$ would have a number of important applications.
De la Vallée Poussin proved that for $\eta(t) := \frac{c}{\log t}$ with a suitable constant $c > 0$, one has $\zeta(\sigma + it)\ne 0$ for $\sigma > 1 - \eta(|t|)$ and $|t|$ sufficiently large. Note that $\eta(|t|)\to 0$ as $|t|\to\infty$.
The most famous conjecture about the horizontal distribution of the zeros is the Riemann hypothesis, which conjectures that all of the zeros in the critical strip lie on the critical line $\sigma = \frac12$. Furthermore, it is conjectured that all zeros of the Riemann zeta function are simple (this would follow, for example, from the grand simplicity hypothesis). There is numerical and theoretical evidence in support of the Riemann hypothesis, but it has not been proven.

We finally consider the function 1/ζ(s), which would have poles wherever ζ has
zeros (and vice-versa).

Theorem 14.26. For $\sigma\ge 1$, we have
$$\frac{1}{\zeta(s)} = \sum_{n=1}^\infty\mu(n)n^{-s}.$$
In particular,
$$\sum_{n=1}^\infty\frac{\mu(n)}{n} = 0.$$
Furthermore, for $x\to\infty$, we have
$$\sum_{n\le x}\mu(n) = o(x).$$
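Proof. Since $\mu$ is multiplicative, the product formula for multiplicative functions implies that for $\sigma > 1$
$$\sum_{n=1}^\infty\frac{\mu(n)}{n^s} = \prod_p\sum_{\nu=0}^\infty\mu(p^\nu)p^{-\nu s} = \prod_p\bigl(1 - p^{-s}\bigr) = \frac{1}{\zeta(s)}.$$
Since $\zeta$ does not have any zeros for $\sigma\ge 1$, it follows that $\frac{1}{\zeta(s)}$ is holomorphic in that region. (Passing from $\sigma > 1$ to the line $\sigma = 1$ requires a Tauberian argument; for example, Newman's theorem, that a Dirichlet series with bounded coefficients which extends holomorphically to $\sigma\ge 1$ converges on $\sigma = 1$, applies since $|\mu(n)|\le 1$.) We furthermore see that $\zeta(s)^{-1}$ vanishes at $s = 1$ because $\zeta(s)$ has a pole there.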
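The last claim follows by partial summation (Lemma 14.9). Namely, with
$$a_n = \frac{\mu(n)}{n},\qquad t_n = n,\qquad g(t) = t,\qquad A(t) = \sum_{n\le t}\frac{\mu(n)}{n},$$
we have
$$\frac{1}{x}\sum_{n\le x}\mu(n) = \frac{1}{x}\sum_{n\le x}\frac{\mu(n)}{n}\cdot n = A(x) - \frac{1}{x}\int_1^xA(t)\,dt.$$
By the second claim in the theorem, for arbitrary $\varepsilon > 0$ there exists $t_0$ such that for $t > t_0$ we have $|A(t)|\le\varepsilon$. Therefore
$$\left|\frac{1}{x}\sum_{n\le x}\mu(n)\right|\le\varepsilon + \frac{1}{x}\int_1^{t_0}|A(t)|\,dt + \frac{\varepsilon(x - t_0)}{x}\le 2\varepsilon + o(1).$$
Since $\varepsilon > 0$ was arbitrary, this implies the claim. □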
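The two consequences of Theorem 14.26 can be observed numerically. The sketch below computes $\mu(n)$ with a simple sieve (flip the sign on the multiples of each prime $p$, then set the multiples of $p^2$ to $0$) and returns $M(x)$, $M(x)/x$ and $\sum_{n\le x}\mu(n)/n$; the helper names are hypothetical.

```python
import math

def mobius_sieve(n):
    """Return mu with mu[k] = Moebius function of k for 1 <= k <= n (mu[0] = 0)."""
    mu = [1] * (n + 1)
    is_composite = bytearray(n + 1)
    for p in range(2, n + 1):
        if is_composite[p]:
            continue
        # p is prime
        for m in range(2 * p, n + 1, p):
            is_composite[m] = 1
        for m in range(p, n + 1, p):
            mu[m] = -mu[m]          # one more prime factor p
        for m in range(p * p, n + 1, p * p):
            mu[m] = 0               # divisible by p^2, not squarefree
    mu[0] = 0
    return mu

def mobius_averages(x):
    """Return (M(x), M(x)/x, sum_{n <= x} mu(n)/n)."""
    mu = mobius_sieve(x)
    M = sum(mu[1:])
    S = math.fsum(mu[n] / n for n in range(1, x + 1))
    return M, M / x, S

for x in (10**3, 10**4, 10**5):
    print(x, mobius_averages(x))
```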


There is a relationship between the Riemann hypothesis and the function
$$M(x) := \sum_{n\le x}\mu(n).$$
Note that, for $x\ge 0$,
$$|M(x)|\le\sum_{n\le x}|\mu(n)|\le x.$$
An improvement of this bound for $M(x)$ would have strong implications for the Riemann zeta function.
Theorem 14.27.
(1) Suppose that for some $a\in\left[\frac12, 1\right)$ we have $M(x) = O(x^a)$ as $x\to\infty$. Then $\zeta$ has no zeros with $\sigma > a$.
(2) If $M(x) = O\!\left(x^{\frac12}\right)$, then the Riemann hypothesis follows and moreover all zeros are simple.
Proof. We use partial summation (Lemma 14.9 with $t_n = n$, $a_n = \mu(n)$, and $g(t) = t^{-s}$) to show that for $\sigma > 1$
$$\sum_{n\le x}\frac{\mu(n)}{n^s} = M(x)x^{-s} + s\int_1^xM(t)t^{-s-1}\,dt.$$
Taking $x\to\infty$ yields
$$\frac{1}{\zeta(s)} = s\int_1^\infty M(t)t^{-s-1}\,dt.$$
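(1) Plugging in $M(x) = O(x^a)$, we see that the integral converges absolutely for $\sigma > a$ and defines a holomorphic function there. By the identity theorem this function agrees with $1/\zeta(s)$, so $\zeta(s)$ cannot have any zeros in this region.
(2) By (1) with $a = \frac12$, $\zeta$ has no zeros with $\sigma > \frac12$, and by the symmetry in Theorem 14.25 (3) it has none with $0 < \sigma < \frac12$ either; hence the Riemann hypothesis follows. For the simplicity of the zeros, choose $c > 0$ with $|M(x)|\le cx^{\frac12}$ for all $x\ge 1$ (possible since $M(x) = O(x^{1/2})$ and $|M(x)|\le x$). Then for $\sigma > \frac12$ it follows that
$$\frac{1}{|\zeta(s)|}\le c|s|\int_1^\infty t^{-\sigma-\frac12}\,dt = \frac{c|s|}{\sigma - \frac12}.\tag{14.3}$$
Suppose now that $s_0$ is a zero of $\zeta$ of order $k$. Then in a small neighborhood of $s_0$ we have
$$\zeta(s) = a_k(s - s_0)^k + \dots,\qquad\text{with } a_k\ne 0.$$
If $s_0 = \frac12 + it_0$ with $t_0\in\mathbb{R}$, then for $s = \frac12 + \varepsilon + it_0$ with $\varepsilon > 0$ sufficiently small, (14.3) implies that
$$1\le\frac{c\left|\frac12 + \varepsilon + it_0\right|}{\varepsilon}\left|\zeta\!\left(\frac12 + \varepsilon + it_0\right)\right|\le O\!\left(\varepsilon^{k-1}\right).$$
Letting $\varepsilon\to 0^+$, this is impossible for $k\ge 2$. Hence it must be the case that $k = 1$. □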
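Theorem 14.27 (2) ties the Riemann hypothesis to a bound $M(x) = O(x^{1/2})$. The sketch below (hypothetical helper names) tabulates $\max_{2\le x\le n}|M(x)|/\sqrt{x}$; for $n = 10^5$ this maximum stays below $1$. A computation over a finite range is only numerical evidence and says nothing about the behavior as $x\to\infty$.

```python
import math

def mertens_values(n):
    """Return a list M with M[x] = sum_{k <= x} mu(k) for 0 <= x <= n."""
    mu = [1] * (n + 1)
    is_composite = bytearray(n + 1)
    for p in range(2, n + 1):
        if is_composite[p]:
            continue
        for m in range(2 * p, n + 1, p):
            is_composite[m] = 1
        for m in range(p, n + 1, p):
            mu[m] = -mu[m]                 # one more prime factor p
        for m in range(p * p, n + 1, p * p):
            mu[m] = 0                      # not squarefree
    M = [0] * (n + 1)
    for x in range(1, n + 1):
        M[x] = M[x - 1] + mu[x]
    return M

def max_normalized_mertens(n):
    """Return (max of |M(x)| / sqrt(x) over 2 <= x <= n, an x where it is attained)."""
    M = mertens_values(n)
    return max((abs(M[x]) / math.sqrt(x), x) for x in range(2, n + 1))

print(max_normalized_mertens(10**5))
```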

15. Elliptic Curves
Definition. An elliptic curve is given by an equation of the form
$$y^2 = x^3 + ax^2 + bx + c.$$
For fixed $a, b, c$, one looks for solutions $(x, y)$ to the above equation (for example, with $x, y\in\mathbb{Z}$ or $x, y\in\mathbb{Q}$).

Example 15.1. The elliptic curve
$$E: y^2 = x^3 + 17\tag{15.1}$$
has the solutions $(-2, 3)$, $(-1, 4)$, and $(2, 5)$ (these were found simply by plugging in small values of $x$ and solving for $y$). How does one systematically find solutions to the above equation?
Suppose that we have a solution such as $(-2, 3)$ and we'd like to find more solutions. Noting that the solution $(-2, 3)$ also satisfies the linear equation
$$y = x + 5,$$
we may ask which other solutions also satisfy this linear equation.
Plugging the linear equation into the equation (15.1) defining the elliptic curve yields the equation
$$x^3 - x^2 - 10x - 8 = 0.$$
Since we already know that $x = -2$ is a solution, we can use polynomial long division and factor
$$x^3 - x^2 - 10x - 8 = (x + 2)\left(x^2 - 3x - 4\right).$$
By the quadratic formula, the second factor has the solutions $x = -1$ and $x = 4$. Plugging in the relation $y = x + 5$ yields the points $(-1, 4)$ and $(4, 9)$ on the elliptic curve. Note also that $(-2, -3)$, $(-1, -4)$, and $(4, -9)$ are also solutions due to the symmetry $y\mapsto -y$.
The point $(-2, 3)$ also lies on the line
$$y = 3x + 9,$$
and we find that other points lying on the same line and also on the elliptic curve must satisfy
$$0 = x^3 - 9x^2 - 54x - 64.$$
Polynomial long division yields
$$0 = (x + 2)\left(x^2 - 11x - 32\right).$$
The quadratic formula yields the additional solutions
$$x = \frac{11\pm\sqrt{249}}{2}.$$
These are not rational solutions, though.
Problem. A cubic equation with rational coefficients which has at least one rational solution need not necessarily have two, but if it has two rational solutions, then its third solution is also necessarily rational (for a monic cubic, the three roots sum to minus the coefficient of $x^2$).
Consider for example the two rational solutions $P = (-2, 3)$ and $Q = (2, 5)$ to (15.1), which simultaneously lie on the line
$$y = \frac{x}{2} + 4.$$
Plugging this relation into (15.1), these two points yield two rational solutions to the equation
$$0 = x^3 - \frac{x^2}{4} - 4x + 1.$$
Factorization of the polynomial yields
$$0 = (x - 2)(x + 2)\left(x - \frac14\right).$$
We see that the third solution is also rational, and corresponds to the point $\left(\frac14, \frac{33}{8}\right)$.
Hence from two rational points on the elliptic curve, we have found a third such point.
Now that we have three rational points, we can continue to find more rational points on the elliptic curve using this idea (drawing a line between two of the points and finding a third point on that line). However, if we now choose the points $(-2, 3)$ and $\left(\frac14, \frac{33}{8}\right)$, then clearly the third point is $(2, 5)$, which we have already found. However, as noted above, there is an additional symmetry between the solutions $(x, y)$ and $(x, -y)$. Thus we may replace $\left(\frac14, \frac{33}{8}\right)$ with $\left(\frac14, -\frac{33}{8}\right)$ and draw lines between this point and the other known points on the elliptic curve.
The line passing through $\left(\frac14, -\frac{33}{8}\right)$ and $(-2, 3)$ is given by
$$y = -\frac{19}{6}x - \frac{10}{3}.$$
Plugging this linear relation into the elliptic curve (15.1) yields
$$0 = x^3 - \frac{361}{36}x^2 - \frac{190}{9}x + \frac{53}{9} = \left(x - \frac14\right)(x + 2)\left(x - \frac{106}{9}\right).$$
This yields the additional rational point $\left(\frac{106}{9}, -\frac{1097}{27}\right)$ on the elliptic curve.
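The chord construction can be automated with exact rational arithmetic. The sketch below uses Python's `fractions.Fraction`; the helper names `third_point` and `on_curve` are our own. Instead of polynomial long division it uses the fact that the three roots of the monic cubic obtained from the line $y = mx + k$ sum to $m^2 - a$. Tangent lines ($P = Q$) and vertical lines are not handled.

```python
from fractions import Fraction

def third_point(P, Q, a):
    """Third intersection point of y^2 = x^3 + a x^2 + b x + c with the line through
    the distinct points P, Q on the curve (x-coordinates assumed different).

    Substituting y = m x + k gives the monic cubic x^3 + (a - m^2) x^2 + ..., whose three
    roots therefore sum to m^2 - a; b and c are not needed for this step."""
    (x1, y1), (x2, y2) = [tuple(map(Fraction, pt)) for pt in (P, Q)]
    if x1 == x2:
        raise ValueError("vertical line: no third affine intersection point")
    m = (y2 - y1) / (x2 - x1)            # slope of the line
    x3 = m * m - Fraction(a) - x1 - x2   # third root of the cubic
    y3 = y1 + m * (x3 - x1)              # corresponding point on the line
    return (x3, y3)

def on_curve(pt, a, b, c):
    """Check y^2 = x^3 + a x^2 + b x + c exactly."""
    x, y = map(Fraction, pt)
    return y * y == x**3 + a * x * x + b * x + c

P, Q = (-2, 3), (2, 5)
R = third_point(P, Q, 0)                 # (1/4, 33/8), as in the text
S = third_point(P, (R[0], -R[1]), 0)     # reflect R in the x-axis first
print(S)                                 # (Fraction(106, 9), Fraction(-1097, 27))
```

Reflecting an output point in the $x$-axis and calling `third_point` again is exactly the procedure described in the text.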
Continuing in this way, one can find infinitely many rational points on the elliptic
curve. One can show that all of the solutions may be constructed from P and Q in
this way. This is a special case of a theorem of Mordell.
Theorem 15.2 (Mordell). If $E$ is an elliptic curve given by the equation
$$E: y^2 = x^3 + ax^2 + bx + c,\tag{15.2}$$
with $a, b, c\in\mathbb{Z}$, and whose discriminant $\Delta(E)$ satisfies
$$\Delta(E) := \Delta = -4a^3c + a^2b^2 - 4b^3 - 27c^2 + 18abc\ne 0,$$
then there is a finite set of solutions
$$P_1 = (x_1, y_1),\ P_2 = (x_2, y_2),\ \dots,\ P_r = (x_r, y_r)$$
of (15.2) with rational coordinates, such that every rational solution to $E$ may be obtained from these solutions by recursively drawing lines between any two points (using the tangent line when the two points coincide) and reflecting over the $x$-axis.
The proof of Theorem 15.2 is omitted here. For the interested reader, a proof
may be found in a more in-depth introduction to elliptic curves. The graduate texts
on the topic by Joseph Silverman are highly recommended.
We consider instead individual examples. Next consider the elliptic curve $y^2 = x^3 + x$.
Theorem 15.3. The only rational point on the elliptic curve
$$E: y^2 = x^3 + x$$
is the point $(x, y) = (0, 0)$.
Proof. Suppose that $\left(\frac{A}{B}, \frac{C}{D}\right)$ is a point on $E$ with rational coordinates (i.e., $A, B, C, D\in\mathbb{Z}$), where we may assume without loss of generality that $B > 0$, $D > 0$, and $\gcd(A, B) = 1 = \gcd(C, D)$. If $A = 0$ or $C = 0$, then we see immediately that the point is $(0, 0)$.
Plugging this point into the formula for the elliptic curve, we see that
$$C^2B^3 = A^3D^2 + AB^2D^2 = D^2A\left(A^2 + B^2\right).\tag{15.3}$$
Hence $D^2A\mid C^2B^3$. Since $\gcd(C, D) = 1$, it follows that $D^2\mid B^3$. Similarly,
$$A^3D^2 = C^2B^3 - AB^2D^2 = B^2\left(C^2B - AD^2\right),$$
and hence $B^2\mid A^3D^2$. Since $\gcd(A, B) = 1$, it follows that $B^2\mid D^2$, and hence $B\mid D$.
We may thus set $v := \frac{D}{B}\in\mathbb{Z}$. Plugging $D = Bv$ into $D^2\mid B^3$, we see that $B^2v^2\mid B^3$, and hence $v^2\mid B$. Therefore $B = v^2z$ with $z\in\mathbb{Z}$. So $D = Bv = v^3z$.
We now plug $B = v^2z$ and $D = v^3z$ into (15.3) to obtain
$$C^2\left(v^2z\right)^3 = A^3\left(v^3z\right)^2 + A\left(v^2z\right)^2\left(v^3z\right)^2\ \Rightarrow\ C^2z = A^3 + Av^4z^2.$$
Therefore
$$A^3 = C^2z - Av^4z^2 = z\left(C^2 - Av^4z\right).$$
It follows that $z\mid A^3$. Since we have already seen that $z\mid B$ and $\gcd(A, B) = 1$, we conclude that $z = \pm 1$. Moreover, since $B = v^2z > 0$, we conclude that $z = 1$. Hence $B = v^2$ and $D = v^3$, and we may simplify (15.3) to
$$C^2 = A^3 + Av^4 = A\left(A^2 + v^4\right).$$
We would next like to show that $\gcd(A, A^2 + v^4) = 1$. Assume for contradiction that there exists a prime $p$ such that $p\mid A$ and $p\mid A^2 + v^4$. Then it would follow that $v$ is also divisible by $p$, but this is impossible because $\gcd(A, B) = 1$ and $B = v^2$. Thus $\gcd(A, A^2 + v^4) = 1$. Moreover $A\ge 0$, since $A^2 + v^4 > 0$ and $A(A^2 + v^4) = C^2\ge 0$. Since the product of these coprime nonnegative integers is a square, we conclude that $A = u^2$ and $A^2 + v^4 = w^2$ for some $u, w\in\mathbb{Z}$. Plugging the first equation into the second, we obtain $u^4 + v^4 = w^2$. By Theorem 10.3 (1), this equation has no solutions in nonzero integers. Since $v\ne 0$ (as $v^2 = B > 0$), we get $u = 0$, hence $A = 0$ and then $C = 0$, so the point is $(0, 0)$. □
Now consider the elliptic curve
$$E: y^2 = x^3 - 4x^2 + 16.\tag{15.4}$$
The following points lie on $E$:
$$P_1 = (0, 4),\quad P_2 = (4, 4),\quad P_3 = (0, -4),\quad P_4 = (4, -4).$$
We again consider the line through $P_1$ and $P_2$. These points satisfy the equation $y = 4$. Plugging into (15.4) yields
$$0 = x^3 - 4x^2 = x^2(x - 4).$$
Since $x = 0$ is a double root (the line is tangent to $E$ at $P_1$), we do not obtain any further points on the elliptic curve. Similarly, choosing any of the other pairs of points among $P_1$, $P_2$, $P_3$, and $P_4$, we obtain no new points with rational coordinates. Moreover, one can show that the only rational points on this curve are precisely $P_1$, $P_2$, $P_3$, and $P_4$.

More generally, one calls a (finite) collection of points $P_1, P_2, \dots, P_t$ ($t\ge 3$) on the elliptic curve (with $a, b, c\in\mathbb{Z}$)
$$E: y^2 = x^3 + ax^2 + bx + c$$
a set of torsion points if no line through two of the points meets $E$ in a further point outside of the set.
We note the following theorem about torsion points without proof. The interested reader may check the original papers or a book centered on elliptic curves.
Theorem 15.4. Let an elliptic curve
$$E: y^2 = x^3 + ax^2 + bx + c$$
with integer coefficients $a, b, c$ be given and suppose that $P_1, \dots, P_t$ form a set of torsion points with rational coordinates. If $\Delta(E)\ne 0$, then the following hold:
(1) (Nagell–Lutz Theorem, 1935/37): Writing $P_i = (x_i, y_i)$, we have $x_i, y_i\in\mathbb{Z}$, and if $y_i\ne 0$, then $y_i^2\mid 16\Delta(E)$.
(2) (Mazur's Theorem, 1977): A set of torsion points has size at most $15$.
We also note without proof the following theorem of Siegel about integral points.
Theorem 15.5 (Siegel, 1926). Let $E$ be an elliptic curve
$$E: y^2 = x^3 + ax^2 + bx + c,$$
with $a, b, c\in\mathbb{Z}$ and $\Delta(E)\ne 0$. Then there are only finitely many integer solutions $x, y$.
In addition to taking solutions x, y ∈ Q or x, y ∈ Z, one can consider x, y ∈ Z/pZ
for a prime p.
Question. For x, y ∈ Z/pZ, there are only finitely many possible choices of x and y,
and hence necessarily only finitely many solutions. We may thus count the number
of solutions. How many such solutions are there?
Example 15.6. Consider
$$E: y^2 = x^3 + x.$$
From Theorem 15.3, we already know that the only rational point on this curve is $(0, 0)$. However, letting $N_p$ denote the number of points on $E$ modulo $p$ (that is, the number of pairs $(x, y)\in(\mathbb{Z}/p\mathbb{Z})^2$ satisfying the equation), we get the following table:

| $p$ | points modulo $p$ on $E$ | $N_p$ |
|---|---|---|
| 2 | (0, 0), (1, 0) | 2 |
| 3 | (0, 0), (2, 1), (2, 2) | 3 |
| 5 | (0, 0), (2, 0), (3, 0) | 3 |
| 7 | (0, 0), (1, 3), (1, 4), (3, 3), (3, 4), (5, 2), (5, 5) | 7 |
| 11 | (0, 0), (5, 3), (5, 8), (7, 3), (7, 8), (8, 5), (8, 6), (9, 1), (9, 10), (10, 3), (10, 8) | 11 |
| 13 | (0, 0), (2, 6), (2, 7), (3, 2), (3, 11), (4, 4), (4, 9), (5, 0), (6, 1), (6, 12), (7, 5), (7, 8), (8, 0), (9, 6), (9, 7), (10, 3), (10, 10), (11, 4), (11, 9) | 19 |

Note that it is often the case that $N_p = p$. One easily checks that this is the case for
$$p = 2, 3, 7, 11, 19, 23, 31, 43, 47, 59, 67, 71, \dots$$
Questions.
• Is it always the case that $N_p = p$ for $p\equiv 3\pmod 4$?
• What about the other primes, namely $p\equiv 1\pmod 4$?

Indeed, these questions have been answered, but the proof is left out of this lecture.

Theorem 15.7. Let $E: y^2 = x^3 + x$.
(1) If $p\equiv 3\pmod 4$, then $N_p = p$.
(2) If $p\equiv 1\pmod 4$, then write $p = A^2 + B^2$ with an odd $A > 0$. Then $N_p = p\pm 2A$, where $+$ is chosen in the case that $A\equiv 3\pmod 4$ and $-$ is chosen if $A\equiv 1\pmod 4$.
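The table in Example 15.6 and the statement of Theorem 15.7 can be checked by brute force for small primes. The sketch below (helper names are our own) counts the pairs $(x, y)$ modulo $p$ by first tabulating how many square roots each residue has, and compares the result with the formula of Theorem 15.7; it is a numerical check for small $p$, not a proof.

```python
import math

def count_affine_points(p, a, b, c):
    """N_p = #{(x, y) mod p : y^2 = x^3 + a x^2 + b x + c (mod p)}, by brute force."""
    roots_of = [0] * p                     # roots_of[r] = #{y mod p : y^2 = r (mod p)}
    for y in range(p):
        roots_of[y * y % p] += 1
    return sum(roots_of[(x**3 + a * x * x + b * x + c) % p] for x in range(p))

def predicted_Np_x3_plus_x(p):
    """N_p for E: y^2 = x^3 + x and an odd prime p, as stated in Theorem 15.7."""
    if p % 4 == 3:
        return p
    # p = 1 (mod 4): find the odd A > 0 with p = A^2 + B^2
    A = next(A for A in range(1, math.isqrt(p) + 1, 2)
             if math.isqrt(p - A * A) ** 2 == p - A * A)
    return p + 2 * A if A % 4 == 3 else p - 2 * A

print([count_affine_points(p, 0, 1, 0) for p in (2, 3, 5, 7, 11, 13)])   # [2, 3, 3, 7, 11, 19]

odd_primes = [p for p in range(3, 200) if all(p % d for d in range(2, math.isqrt(p) + 1))]
mismatches = [p for p in odd_primes
              if count_affine_points(p, 0, 1, 0) != predicted_Np_x3_plus_x(p)]
print(mismatches)
```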

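Consider next $E: y^2 = x^3 + 17$.

| $p$ | 2 | 3 | 5 | 7 | 11 | 13 | 17 | 19 | 23 | 29 |
|---|---|---|---|---|---|---|---|---|---|---|
| $N_p$ | 2 | 3 | 5 | 12 | 11 | 20 | 17 | 26 | 23 | 29 |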

Theorem 15.8. If $p\equiv 2\pmod 3$ and the elliptic curve $E$ is given by
$$E: y^2 = x^3 + 17,$$
then $N_p = p$.

Proof.
We first claim that $0^3 + 17, 1^3 + 17, \dots, (p-1)^3 + 17\pmod p$ is a permutation of $0, 1, 2, \dots, p-1\pmod p$. To show the claim, we must show that the values $j^3 + 17$ are all distinct modulo $p$, or in other words
$$b_1^3\equiv b_2^3\pmod p\ \Rightarrow\ b_1\equiv b_2\pmod p.$$
Since $\gcd(3, p-1) = 1$, there exists a solution $u, v\in\mathbb{Z}$ to the equation
$$3u - (p-1)v = 1$$
(for example, $u = \frac{2p-1}{3}$, $v = 2$). Then, using Fermat's little theorem (Theorem 4.11 (3)) in the form $b^{(p-1)v+1}\equiv b\pmod p$, we have
$$b_1^3\equiv b_2^3\pmod p\ \Rightarrow\ b_1^{3u}\equiv b_2^{3u}\pmod p\ \Rightarrow\ b_1^{(p-1)v+1}\equiv b_2^{(p-1)v+1}\pmod p\ \overset{\text{Thm. 4.11(3)}}{\Rightarrow}\ b_1\equiv b_2\pmod p.$$
This yields the claim.
Hence the number of solutions to $y^2\equiv x^3 + 17\pmod p$ (with $x$ running through all possible choices modulo $p$) is equal to the number of solutions to $y^2\equiv a\pmod p$ (with $a$ running through all possible choices modulo $p$). We thus next count the number of solutions for each such choice of $a$. For $p = 2$ one checks directly that $N_2 = 2$, so assume that $p$ is odd.
The congruence $y^2\equiv 0\pmod p$ has the unique solution $y\equiv 0\pmod p$, and furthermore the congruences
$$y^2\equiv a\pmod p,\qquad 1\le a\le p-1,$$
have either two or no solutions, with each occurring equally often. Therefore $y^2\equiv x^3 + 17\pmod p$ has precisely
$$N_p = 1 + 2\cdot\frac{p-1}{2} = p$$
solutions modulo $p$. □
Remark 15.9. The two elliptic curves that we have looked at are special; it is normally rather uncommon to have $N_p = p$. However, $p$ is the “expected value” of $N_p$ in some sense.
Since $p$ is a sort of “expected value”, for an arbitrary elliptic curve $E$ it is natural to consider the difference $a_p := p - N_p$, which is known as the $p$-defect. The following theorem (which we do not prove in this class) explains that $N_p$ cannot be too far away from the expected value $p$.
Theorem 15.10 (Hasse). Let $E$ be an elliptic curve with integral coefficients and denote by $N_p$ the number of points on $E$ modulo $p$. Then for $a_p := p - N_p$, we have
$$|a_p| < 2\sqrt{p}.$$
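The same brute-force count reproduces the table for $y^2 = x^3 + 17$ and lets one test Theorem 15.8 and Hasse's bound $|a_p| < 2\sqrt{p}$ over a range of primes. The sketch below is self-contained (it repeats the counting helper); the curves tested, including $y^2 = x^3 - 4x^2 + 16$ from (15.4), and the cutoff $p < 500$ are arbitrary choices of ours.

```python
import math

def count_affine_points(p, a, b, c):
    """N_p = #{(x, y) mod p : y^2 = x^3 + a x^2 + b x + c (mod p)}, by brute force."""
    roots_of = [0] * p                     # roots_of[r] = #{y mod p : y^2 = r (mod p)}
    for y in range(p):
        roots_of[y * y % p] += 1
    return sum(roots_of[(x**3 + a * x * x + b * x + c) % p] for x in range(p))

def primes_below(n):
    return [p for p in range(2, n) if all(p % d for d in range(2, math.isqrt(p) + 1))]

def hasse_holds(a, b, c, bound=500):
    """Check |a_p| < 2 sqrt(p) with a_p = p - N_p for all primes p < bound."""
    return all(abs(p - count_affine_points(p, a, b, c)) < 2 * math.sqrt(p)
               for p in primes_below(bound))

table = [count_affine_points(p, 0, 0, 17) for p in primes_below(30)]
print(table)                               # [2, 3, 5, 12, 11, 20, 17, 26, 23, 29]
print(hasse_holds(0, 0, 17), hasse_holds(0, 1, 0), hasse_holds(-4, 0, 16))
```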

16. Partitions
Definition. A partition of an integer $n\in\mathbb{N}$ is a non-increasing sequence $(a_1, \dots, a_r)$ of positive integers for which
$$n = a_1 + \dots + a_r.$$
Let $p(n)$ denote the number of partitions of $n$, and set $p(0) = 1$ for convenience.

Examples 16.1. The partitions of 3 are given by:


3, 2 + 1, 1 + 1 + 1, p(3) = 3.
The partitions of 5 are:
5, 4+1, 3+2, 3+1+1, 2+2+1, 2+1+1+1, 1+1+1+1+1, p(5) = 7.

There is a geometric representation of partitions given via the so-called Ferrers diagram.
Every summand of the partition is given as a row of the Ferrers diagram, with a number of dots equal to the size of the summand.

Example 16.2.
6 + 3 + 3 + 2 + 1 = 15.
• • • • • •
• • •
• • •
• •
•

There is a natural map that takes partitions of $n$ to other partitions of the same size $n$, given by the conjugate partition.
The conjugate partition is constructed by interchanging the rows and columns of the Ferrers diagram. For example, the conjugate of the partition $6 + 3 + 3 + 2 + 1$ from Example 16.2 is the partition
$$5 + 4 + 3 + 1 + 1 + 1,$$
with corresponding Ferrers diagram
• • • • •
• • • •
• • •
•
•
•

Since conjugation is a bijection on the partitions of size $n$ (it is an involution), and the number of parts of a partition equals the largest part of its conjugate, we obtain the following conclusion.
Theorem 16.3. The number of partitions of $n$ with precisely $m$ parts is equal to the number of partitions of $n$ for which the largest part is precisely $m$.
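Conjugation is easy to compute: the $j$-th part of the conjugate partition counts the parts that are at least $j$ (the $j$-th column of the Ferrers diagram). The following sketch (our own helper names) generates partitions as non-increasing tuples and checks Theorem 16.3 for small $n$.

```python
def partitions(n, max_part=None):
    """Yield the partitions of n as non-increasing tuples whose parts are <= max_part."""
    if max_part is None:
        max_part = n
    if n == 0:
        yield ()
        return
    for first in range(min(n, max_part), 0, -1):
        for rest in partitions(n - first, first):
            yield (first,) + rest

def conjugate(partition):
    """Conjugate partition: its j-th part is the number of parts >= j (j-th column of the Ferrers diagram)."""
    if not partition:
        return ()
    return tuple(sum(1 for part in partition if part >= j) for j in range(1, partition[0] + 1))

def check_theorem_16_3(n):
    """Compare #{partitions of n with exactly m parts} and #{partitions with largest part m} for all m."""
    parts = list(partitions(n))
    return all(sum(len(lam) == m for lam in parts) == sum(lam[0] == m for lam in parts)
               for m in range(1, n + 1))

print(conjugate((6, 3, 3, 2, 1)))   # (5, 4, 3, 1, 1, 1)
```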
For $q\in\mathbb{C}$ with $|q| < 1$, we let
$$P(q) = \sum_{n=0}^\infty p(n)q^n$$
denote the generating function of $p(n)$.

Theorem 16.4 (Euler). For $q\in\mathbb{C}$ with $|q| < 1$, we have
$$P(q) = \prod_{n=1}^\infty\left(1 - q^n\right)^{-1}.$$

Proof. We ignore questions of convergence and simply prove the identity formally (a full proof relies on the Weierstrass majorant criterion and the Weierstrass product formula from complex analysis).
Let $F(q)$ denote the right-hand side. We expand every factor in the infinite product as a geometric series to obtain
$$\begin{aligned}
F(q) &= \left(1 + q + q^2 + q^3 + \dots\right)\left(1 + q^2 + q^4 + \dots\right)\left(1 + q^3 + q^6 + \dots\right)\cdots\\
&= \left(1 + q + q^{1+1} + q^{1+1+1} + \dots\right)\left(1 + q^2 + q^{2+2} + \dots\right)\left(1 + q^3 + q^{3+3} + \dots\right)\cdots.
\end{aligned}$$
We now multiply out the product termwise, using the distributive property (on products of infinite sums). We thus choose one term from each of the factors. Say that we take the term $q^{k_1\cdot 1}$ from the first factor, $q^{k_2\cdot 2}$ from the second factor, $\dots$, $q^{k_m\cdot m}$ from the $m$-th factor, and the term $1$ from all later factors. (Only such choices contribute to a fixed power $q^n$, since a term $q^{k_j\cdot j}$ with $k_j\ge 1$ and $j > n$ already has degree larger than $n$.) The contribution from this choice is thus
$$q^{k_1\cdot 1}q^{k_2\cdot 2}\cdots q^{k_m\cdot m} = q^{k_1\cdot 1 + k_2\cdot 2 + \dots + k_m\cdot m}.$$
Writing
$$F(q) = \sum_{n=0}^\infty f(n)q^n,$$
we see that this term contributes to $f(n)$ if and only if
$$k_1\cdot 1 + k_2\cdot 2 + \dots + k_m\cdot m = n.$$
Such a choice corresponds exactly to the partition of $n$ with $k_j$ parts equal to $j$ for each $j$. We therefore see that $f(n)$ precisely counts the number of partitions of $n$, and hence $F(q) = P(q)$. □
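The proof also suggests how to compute $p(n)$: for the coefficients up to $q^N$ only the factors with $n\le N$ matter, and multiplying a truncated power series by $(1-q^m)^{-1} = 1 + q^m + q^{2m} + \dots$ is a single pass over its coefficients (if $d = c/(1-q^m)$, then $d_k = c_k + d_{k-m}$). A minimal sketch; the function name is hypothetical.

```python
def partition_numbers(N):
    """Coefficients p(0), ..., p(N) of prod_{n>=1} (1 - q^n)^(-1), truncated at q^N.

    Multiplying by 1/(1 - q^m) means d_k = c_k + d_{k-m}; looping k upwards
    in place uses the already-updated d_{k-m}."""
    coeffs = [1] + [0] * N
    for m in range(1, N + 1):          # factors with m > N do not affect degrees <= N
        for k in range(m, N + 1):
            coeffs[k] += coeffs[k - m]
    return coeffs

print(partition_numbers(10))   # [1, 1, 2, 3, 5, 7, 11, 15, 22, 30, 42]
```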
Remark 16.5. We often use the fact that if $G(q) = \sum_{n\ge 0}b_nq^n = 0$ for all $|q| < 1$, then $b_n = 0$ for all $n$. One way to see this is to take $q\to 0$: if some coefficient were nonzero, the first index $n$ with $b_n\ne 0$ would give the asymptotic behavior $G(q)\sim b_nq^n$, and the sum could not vanish. One has to be careful to bound the other terms to show that this is true. This bounding (using real analysis) can be avoided by rewriting $b_n$ as a certain integral (using the residue theorem from complex analysis, which we won't go into in detail here):
$$b_n = \frac{1}{2\pi i}\int_CG(q)q^{-n-1}\,dq,$$
where $C$ is any simple closed path around zero inside the unit disc, traversed counter-clockwise. Obviously, if $G(q) = 0$, then the integral is automatically zero as well, which implies that $b_n = 0$.

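The same argument gives generating functions for partitions whose parts are restricted:

| parts satisfy the condition | generating function of # partitions of $n$ |
|---|---|
| odd | $\prod_{m=1}^\infty\left(1 - q^{2m-1}\right)^{-1}$ |
| even | $\prod_{m=1}^\infty\left(1 - q^{2m}\right)^{-1}$ |
| squares | $\prod_{m=1}^\infty\left(1 - q^{m^2}\right)^{-1}$ |
| primes | $\prod_{p\text{ prime}}\left(1 - q^p\right)^{-1}$ |
| distinct | $\prod_{m=1}^\infty\left(1 + q^m\right)$ |
| odd and distinct | $\prod_{m=1}^\infty\left(1 + q^{2m-1}\right)$ |
| even and distinct | $\prod_{m=1}^\infty\left(1 + q^{2m}\right)$ |
| distinct primes | $\prod_{p\text{ prime}}\left(1 + q^p\right)$ |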
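Each row of the table can be expanded in the same way. The following sketch (hypothetical helper name) multiplies out $\prod_{m\in S}(1 - q^m)^{-1}$ or $\prod_{m\in S}(1 + q^m)$ for a finite set $S$ of allowed parts, truncated at $q^N$; for the factors $1 + q^m$ the coefficients are updated from the top down so that each part is used at most once.

```python
import math

def restricted_partition_counts(N, allowed, distinct=False):
    """Coefficients of q^0, ..., q^N in prod_{m in allowed} (1 - q^m)^(-1),
    or in prod_{m in allowed} (1 + q^m) if distinct=True (allowed has no repeats)."""
    coeffs = [1] + [0] * N
    for m in allowed:
        if m > N:
            continue                        # factor does not affect degrees <= N
        if distinct:
            for k in range(N, m - 1, -1):   # top-down: each part used at most once
                coeffs[k] += coeffs[k - m]
        else:
            for k in range(m, N + 1):       # bottom-up: part m may be repeated
                coeffs[k] += coeffs[k - m]
    return coeffs

N = 10
odd = restricted_partition_counts(N, range(1, N + 1, 2))
distinct = restricted_partition_counts(N, range(1, N + 1), distinct=True)
squares = restricted_partition_counts(N, [m * m for m in range(1, math.isqrt(N) + 1)])
primes = [p for p in range(2, N + 1) if all(p % d for d in range(2, math.isqrt(p) + 1))]
prime_parts = restricted_partition_counts(N, primes)
print(odd[5], distinct[5], squares[5], prime_parts[5])   # 3 3 2 2
```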
Now set
$$w(n) = \frac{3n^2 - n}{2}.$$
One calls $w(n)$ and $w(-n)$ pentagonal numbers. For $n > 0$ we have
$$w(n) = \sum_{k=0}^{n-1}(3k + 1).$$

Theorem 16.6 (Euler's Pentagonal Number Theorem). For $q\in\mathbb{C}$ with $|q| < 1$, we have
$$\prod_{m=1}^\infty\left(1 - q^m\right) = \sum_{n=-\infty}^\infty(-1)^nq^{w(n)}.$$

Remark 16.7. Set
$$\begin{aligned}
p_e(n) &= \#\text{ partitions of } n \text{ into distinct parts with an even number of parts},\\
p_o(n) &= \#\text{ partitions of } n \text{ into distinct parts with an odd number of parts}.
\end{aligned}$$
Note that $\prod_{m=1}^\infty\left(1 - q^m\right)$ counts the number of partitions into distinct parts, weighted with $+1$ if the partition has an even number of parts and $-1$ if the partition has an odd number of parts. In other words, we have
$$\prod_{m=1}^\infty\left(1 - q^m\right) = 1 + \sum_{n=1}^\infty\bigl(p_e(n) - p_o(n)\bigr)q^n.$$
Euler's Pentagonal Number Theorem therefore implies that
$$p_e(n) = p_o(n)$$
if $n$ is not a pentagonal number.
Proof. We prove the theorem for $0\le q < 1$; the claim then follows by analytic continuation (both sides are holomorphic for $|q| < 1$). Define $P_0 = S_0 = 1$ and for $n\ge 1$ set
$$P_n = \prod_{r=1}^n\left(1 - q^r\right)\qquad\text{and}\qquad S_n = 1 + \sum_{r=1}^n(-1)^r\left(q^{w(r)} + q^{w(-r)}\right).$$
We claim that
$$|S_n - P_n|\le nq^{n+1}.$$
Since $nq^{n+1}\to 0$ as $n\to\infty$, the theorem follows once we show this claim.
We now set $g(r) = \frac{r(r+1)}{2}$ and define
$$F_n = \sum_{r=0}^n(-1)^r\frac{P_n}{P_r}q^{rn+g(r)}.$$
Claim. For $n\in\mathbb{N}$ we have
$$F_n = S_n.$$
Proof. We first compute
$$F_1 = \sum_{r=0}^1(-1)^r\frac{P_1}{P_r}q^{r+g(r)} = \frac{P_1}{P_0}q^{g(0)} - \frac{P_1}{P_1}q^{1+g(1)} = (1 - q) - q^2,$$
$$S_1 = 1 + (-1)\left(q^{w(1)} + q^{w(-1)}\right) = 1 - \left(q + q^2\right),$$
which implies that $F_1 = S_1$. The claim then follows by induction if we can show that
$$F_n - F_{n-1} = S_n - S_{n-1}.$$
We thus write $P_n = \left(1 - q^n\right)P_{n-1}$. Then it follows that
$$\begin{aligned}
F_n - F_{n-1} &= (-1)^nq^{n^2+g(n)} + \sum_{r=0}^{n-1}(-1)^r\frac{P_{n-1}}{P_r}q^{rn+g(r)} - \sum_{r=0}^{n-1}(-1)^r\frac{P_{n-1}}{P_r}q^{r(n-1)+g(r)} - \sum_{r=0}^{n-1}(-1)^r\frac{P_{n-1}}{P_r}q^{(r+1)n+g(r)}\\
&= (-1)^nq^{n^2+g(n)} + \sum_{r=1}^{n-1}(-1)^r\frac{P_{n-1}}{P_r}q^{r(n-1)+g(r)}\left(q^r - 1\right) + \sum_{r=1}^n(-1)^r\frac{P_{n-1}}{P_{r-1}}q^{rn+g(r-1)},
\end{aligned}$$
where we substituted $r\mapsto r-1$ in the last sum. We then use
$$\frac{q^r - 1}{P_r} = -\frac{1}{P_{r-1}}$$
and
$$r(n-1) + g(r) = rn + g(r-1)$$
to conclude that the terms of the two sums on the right-hand side cancel for $1\le r\le n-1$, leaving only the term $r = n$ of the second sum. We therefore conclude that
$$F_n - F_{n-1} = (-1)^nq^{n^2+g(n)} + (-1)^nq^{n^2+g(n-1)}.$$
Now note that
$$n^2 + g(n) = n^2 + \frac{n(n+1)}{2} = w(-n),\qquad n^2 + g(n-1) = w(n),$$
so the right-hand side simplifies to give
$$F_n - F_{n-1} = (-1)^n\left(q^{w(n)} + q^{w(-n)}\right) = S_n - S_{n-1}.$$
Thus $F_n = S_n$ for $n\ge 1$, proving the claim. □
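Due to the inequality $P_n\le P_r$ for $0\le r\le n$ (valid since $0\le q < 1$), we conclude that
$$|S_n - P_n| = |F_n - P_n| = \left|\sum_{r=1}^n(-1)^r\frac{P_n}{P_r}q^{rn+g(r)}\right|\le\sum_{r=1}^nq^{rn+g(r)}\le nq^{n+1}\xrightarrow{\ n\to\infty\ }0,$$
and the theorem follows. □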
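Theorem 16.6 can also be checked coefficient by coefficient. The sketch below (our own helper names) expands $\prod_{m=1}^{N}(1 - q^m)$ up to $q^N$ (factors with $m > N$ do not affect these coefficients) and compares the result with $\sum_n(-1)^nq^{w(n)}$; by Remark 16.7 the coefficients are also $p_e(n) - p_o(n)$.

```python
def euler_product_coeffs(N):
    """Coefficients of q^0, ..., q^N in prod_{m>=1} (1 - q^m)."""
    coeffs = [1] + [0] * N
    for m in range(1, N + 1):
        for k in range(N, m - 1, -1):      # multiply by (1 - q^m), highest degree first
            coeffs[k] -= coeffs[k - m]
    return coeffs

def pentagonal_series_coeffs(N):
    """Coefficients of q^0, ..., q^N in sum_{n in Z} (-1)^n q^{w(n)}, w(n) = (3n^2 - n)/2."""
    coeffs = [0] * (N + 1)
    coeffs[0] = 1                          # n = 0
    n = 1
    while (3 * n * n - n) // 2 <= N:
        for k in ((3 * n * n - n) // 2, (3 * n * n + n) // 2):   # w(n) and w(-n)
            if k <= N:
                coeffs[k] += (-1) ** n
        n += 1
    return coeffs

print(euler_product_coeffs(12))            # nonzero only at 0, 1, 2, 5, 7, 12
print(euler_product_coeffs(200) == pentagonal_series_coeffs(200))
```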
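Theorem 16.8. Set $p(n) = 0$ for $n < 0$. Then for $n\ge 1$ we have the recursive formula
$$p(n) = \sum_{k=1}^\infty(-1)^{k+1}\left\{p\bigl(n - w(k)\bigr) + p\bigl(n - w(-k)\bigr)\right\}.$$
(The sum is finite, since $p(n - w(\pm k)) = 0$ once $w(k) > n$.)

Proof. By Theorem 16.4 and Theorem 16.6, we have
$$\left(1 + \sum_{k=1}^\infty(-1)^k\left(q^{w(k)} + q^{w(-k)}\right)\right)\left(\sum_{m=0}^\infty p(m)q^m\right) = 1.$$
Comparing the coefficients of $q^n$ for $n\ge 1$ on both sides (Remark 16.5) gives
$$p(n) + \sum_{k=1}^\infty(-1)^k\left\{p\bigl(n - w(k)\bigr) + p\bigl(n - w(-k)\bigr)\right\} = 0,$$
which is the claimed recursion. □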
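Theorem 16.8 gives a fast way to tabulate $p(n)$, since only $O(\sqrt{n})$ pentagonal numbers are at most $n$. A minimal sketch; the function name is hypothetical.

```python
def partition_numbers_pentagonal(N):
    """p(0), ..., p(N) via Theorem 16.8:
    p(n) = sum_{k>=1} (-1)^(k+1) [p(n - w(k)) + p(n - w(-k))], with p(m) = 0 for m < 0."""
    p = [1] + [0] * N
    for n in range(1, N + 1):
        total, k = 0, 1
        while (3 * k * k - k) // 2 <= n:          # w(k) <= n
            sign = 1 if k % 2 == 1 else -1        # (-1)^(k+1)
            total += sign * p[n - (3 * k * k - k) // 2]
            if (3 * k * k + k) // 2 <= n:         # w(-k) <= n
                total += sign * p[n - (3 * k * k + k) // 2]
            k += 1
        p[n] = total
    return p

print(partition_numbers_pentagonal(100)[100])   # 190569292
```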
