Number Theory 2024
Department of Mathematics
The University of Hong Kong
Contents
1. Practical Information
2. Introduction
3. Divisibility, prime factorization, and the Euclidean algorithm
3.1. Optional: some facts from abstract algebra about the integers
3.2. Division algorithm
3.3. Optional: Euclidean domains
4. Congruences and residue classes
5. Primality tests and cryptography
6. Solving congruences and the Chinese remainder theorem
7. Primitive roots and cyclic groups
8. Quadratic reciprocity
9. Multiplicative arithmetic functions
10. Sums of two squares
11. Representations of real numbers via continued fractions
12. Approximation of real numbers with rational numbers
13. Periodic continued fractions
14. Primes and their distribution
15. Elliptic Curves
16. Partitions
1. Practical Information
• Instructor
– Dr. Ben Kane
– Email: bkane[at]hku.hk
– Office: Run Run Shaw Building A411
– Consultation hours: Tuesdays 10:00-13:00
• Tutor: Mr. Jincheng Tang
– Email: tangent[at]connect.hku.hk
– Office: Run Run Shaw Building A215
– Consultation hours: Wednesdays 12:00–14:00
• Moodle Website: _MATH3304_2A_2023
Grade assessment
2. Introduction
What is number theory? Here are some typical questions
(1) Diophantine geometry:
Suppose that you are given a polynomial
F = F (x1 , . . . , xn )
with coefficients in Z.
What is the set of solutions to
F (x) = 0,
where the xℓ come from a certain set? In geometry, one might ask about x ∈ Cⁿ,
in analysis one might ask about x ∈ Rⁿ, but in number theory one asks for
solutions with x ∈ Zⁿ, x ∈ Qⁿ, or x ∈ Fqⁿ (for q a prime).
Examples 2.1.
• Which natural numbers are the sum of two (integer) squares? In other
words, for which n ∈ N do there exist x, y ∈ Z with
x² + y² = n?
The corresponding polynomial would be F(x, y) := x² + y² − n. For small
n, one can check directly:
3: no,   5 = 1² + 2²,   7: no,   . . .
In this example, for some small n we have found solutions directly by plug-
ging in, but it turns out that one can find a full characterization of the
set
{n ∈ N : ∃ x, y ∈ Z, x² + y² = n}.
To give the flavour of the answer, if we restrict to primes, then we have the
answer
{p prime : ∃ x, y ∈ Z, x² + y² = p} = {p prime : p ≡ 1 (mod 4)}.
Since we don’t know if there even exist infinitely many twin primes, we
can’t say how many there are up to X.
Similarly, one can also determine the asymptotic growth of the size of the
set
{n ≤ X : ∃ x, y ∈ Z, x² + y² = n}.
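As a quick sanity check (our own illustration in Python, not part of the notes), one can list the n up to a bound that are sums of two squares and compare the prime case with the p ≡ 1 (mod 4) criterion; the helper name is_sum_of_two_squares is ours.

```python
import math

def is_sum_of_two_squares(n: int) -> bool:
    """Return True if n = x^2 + y^2 for some integers x, y >= 0."""
    for x in range(math.isqrt(n) + 1):
        y2 = n - x * x
        r = math.isqrt(y2)
        if r * r == y2:
            return True
    return False

X = 30
print([n for n in range(1, X + 1) if is_sum_of_two_squares(n)])
# [1, 2, 4, 5, 8, 9, 10, 13, 16, 17, 18, 20, 25, 26, 29]

# For odd primes p, compare with the criterion p ≡ 1 (mod 4).
for p in (3, 5, 7, 11, 13, 17, 19, 23, 29):
    assert is_sum_of_two_squares(p) == (p % 4 == 1)
```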
3. Divisibility, prime factorization, and the Euclidean algorithm
3.1. Optional: some facts from abstract algebra about the integers.
• The set Z with the usual addition and multiplication is a commutative ring
with identity.
To recall:
A ring R is a non-empty set with two binary operators + and · which satisfy
the following:
(i) The pair (R, +) form an abelian group.
∗ The set R is closed with respect to addition.
∗ The operator + satisfies associativity.
∗ There is a unit 0.
∗ For each r ∈ R, there exists −r ∈ R for which r + (−r) = 0 (i.e.,
every element has an inverse).
∗ The operator + is abelian, i.e., a + b = b + a ∀a, b ∈ R.
(ii) The operator · is associative.
(iii) For every a, b, c ∈ R, the following distributive property holds:
(a + b) · c = a · c + b · c.
A commutative ring has the additional property that
a·b=b·a ∀a, b ∈ R.
In a ring with identity, there is also an element 1 satisfying
1·a=a=a·1 ∀a ∈ R.
• The ring Z is an integral domain, which means that
for all a, b ∈ Z: a · b = 0 ⇒ a = 0 or b = 0
(in rings where this does not hold, if a · b = 0 and neither is 0, then we call
a and b zero divisors).
3.2. Division algorithm. The properties from Section 3.1 lead to a number of
useful properties about the integers. An additional property satisfied by the integers
plays an important role in number theory. Specifically, the positive integers satisfy
the well-ordering principle, which we next describe.
Definition. Let N denote the set of positive integers (note that my N starts at
1, but for some authors N includes 0 and denotes the non-negative integers; I write
N0 := N ∪ {0} for the set including 0). Then for any A ⊆ N with A ̸= ∅, there is a
smallest element of A. In other words, there exists a ∈ A such that for every x ∈ A
we have a ≤ x.
We next use the well-ordering principle to obtain the following result.
Theorem 3.1 (Division Algorithm). For a ∈ N and b ∈ Z, there exist unique
q, r ∈ Z with 0 ≤ r < a satisfying
b = qa + r
Proof. The set S := {b − qa : q ∈ Z, b − qa ≥ 0} ⊆ N0 is non-empty (taking q = −|b| gives b − qa = b + |b|a ≥ 0 since a ≥ 1). Thus by the
well-ordering principle, it contains a unique smallest element r (we have noted the
well-ordering of N, and the well-ordering of N0 follows directly from 0 ≤ n for every
n ∈ N0 ). For this r, there exists q ∈ Z such that
r = b − qa,
or in other words b = qa + r.
We next show that 0 ≤ r < a. However, one sees from the definition of S that
every element b − qa ∈ S satisfies b − qa ≥ 0. Hence r ≥ 0 by construction. Now
suppose for contradiction that r ≥ a. In this case, we have
b − (q + 1)a = r − a ≥ 0
But then r − a ∈ S and r − a < r, which contradicts the minimality of r. We
conclude that r < a.
Finally, we must show that q and r are unique. Suppose that
b = q0 a + r0
with q0 ∈ Z and 0 ≤ r0 < a. Then r0 ∈ S and by the minimality of r we have
r ≤ r0 . Moreover, comparing the identities containing r and r0 , we have
r0 − r = (q − q0 )a.
Since r0 ≥ r, we have q ≥ q0 . However, if q > q0 , then q − q0 ≥ 1 and hence
r0 − r = (q − q0 )a ≥ a.
On the other hand, since r0 < a and r ≥ 0, we have
r0 − r < a − 0 = a,
leading to a contradiction. It follows that q = q0 , and hence also r = r0 (since there
are no zero divisors in Z). □
Definition. Suppose that R is a commutative ring (if you have not had abstract
algebra, just consider R = Z) and a, b ∈ R. The element a is called a divisor of
b (one also says that a divides b, b is divisible by a or b is a multiple of a) if there
exists q ∈ R such that b = qa.
If a divides b, then we write a | b, and if a does not divide b (a is not a divisor of
b), then we write a ∤ b.
Remark 3.2. If R is an integral domain (the integers are an integral domain) and
a ̸= 0 satisfies a | b, then there is a unique q ∈ R for which b = qa.
Example 3.3. In the case we are mostly interested in, we have R = Z. Since, for
example, 5 · 13 = 65, we see that 5 is a divisor of 65. The unique q such that 5q = 65
is q = 13.
Theorem 3.4. Suppose that R is a commutative ring with identity 1 (again, you
can think of R = Z) and let a, b, c, d, u, v ∈ R. Then the following hold:
(1) 1 | a, a | a, and a | 0.
(2) If a | b and b | c, then a | c.
(3) If d | a and d | b, then d | (ua + vb).
(4) If a | b, then a | bc and ac | bc.
(5) If 0 | a, then a = 0.
Suppose further that R is an integral domain. Then it follows that
(6) If av | bv and v ̸= 0, then a | b.
In the case of R = Z, we have
(7) The only divisors of 1 are ±1.
(8) If a | b and b | a, then a = ±b.
(9) If a | b and b ̸= 0, then |a| ≤ |b|.
Proof. These follow by fairly straightforward calculations, so we only work out a
couple of representative cases. For example, to show (3), we note that d | a and d | b
implies that there exist q1 and q2 such that
a = q1 d,
b = q2 d.
Therefore
ua + vb = uq1 d + vq2 d = (uq1 + vq2 )d,
where in the last step we used distributivity (and commutativity) of R. We therefore
see that d | ua + vb.
To show (6), we note that
bv = qav ⇒ (b − qa)v = 0.
Since v ̸= 0 and there are no zero divisors (because R is an integral domain), it
follows that
b − qa = 0.
We therefore have
b = qa ⇒ a | b.
□
Definition. Note that, since n · 1 = n, every n ∈ N has the divisors 1 and n.
If the only positive divisors of some integer p > 1 are 1 and p, then we call p a
prime number (or simply prime). We omit p = 1 in the definition of primes because
otherwise there are some problems which occur (for example, in prime factorization,
which we look at next).
Theorem 3.5 (Prime factorization). Every natural number n > 1 can be represented
as a product of finitely many primes.
Remark 3.6. We will later show that the prime factorization is unique.
Proof. We show the result by induction. Firstly, for n = 2 we see directly that 2
is prime (by Theorem 3.4 (9), all divisors of 2 must be smaller than 2 in absolute
value).
Now suppose that n > 2 and for every 2 ≤ m < n the claim holds. If n is prime,
then we are done. Otherwise we have
n = m1 m2
with 2 ≤ m1 < n and 2 ≤ m2 < n. By induction, m1 and m2 have prime factoriza-
tions
m1 = ∏_{j=1}^{ℓ1} p_{j,1} and m2 = ∏_{j=1}^{ℓ2} p_{j,2},
so that
n = ∏_{j=1}^{ℓ1} p_{j,1} · ∏_{j=1}^{ℓ2} p_{j,2}.
This is the claim. □
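The existence proof suggests a simple procedure: split off prime divisors one at a time. A minimal Python sketch of this trial-division factorization (names are ours, not from the notes):

```python
def prime_factorization(n: int) -> list[int]:
    """Return the prime factors of n > 1 (with multiplicity) by trial division."""
    factors = []
    d = 2
    while d * d <= n:
        while n % d == 0:
            factors.append(d)
            n //= d
        d += 1
    if n > 1:            # whatever remains is itself prime
        factors.append(n)
    return factors

print(prime_factorization(918))   # [2, 3, 3, 3, 17]
print(prime_factorization(4340))  # [2, 2, 5, 7, 31]
```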
Theorem 3.7. There are infinitely many primes.
Proof (Euclid). Suppose that p1 , . . . , pr are all primes. Then N = p1 · · · pr + 1 is not
divisible by any prime pj , since if N = pj q, then
pj q = p1 · · · pr + 1 ⇒ pj (q − ∏_{ℓ≠j} pℓ) = 1.
But then by Theorem 3.4 (7), we conclude that pj = ±1, which contradicts the fact
that it is prime.
However, by Theorem 3.5, N has a factorization into primes. Therefore, in partic-
ular, there exists a prime p such that p | N . But then p ̸= pj for all j, contradicting
the fact that p1 , . . . , pr is the set of all primes. □
Sieve of Eratosthenes (ca. 300BC)
One can compute a list of all primes p ≤ x for a given x by successively underlining
primes and then crossing out multiples of that prime. The next number not under-
lined and not crossed out is the next prime. For example, for x = 15, one finds all
primes ≤ 15 as follows (each prime found is underlined and its multiples are then crossed out):
1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15.
Crossing out the multiples of 2, 3, and 5 leaves exactly the primes 2, 3, 5, 7, 11, 13.
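A short Python sketch of the sieve, assuming we only want the list of primes up to x (a boolean array plays the role of the underlining and crossing out; the function name is ours):

```python
def sieve_of_eratosthenes(x: int) -> list[int]:
    """Return all primes p <= x by crossing out multiples of each prime found."""
    is_prime = [True] * (x + 1)
    is_prime[0:2] = [False, False]           # 0 and 1 are not prime
    for p in range(2, int(x ** 0.5) + 1):
        if is_prime[p]:
            for multiple in range(p * p, x + 1, p):
                is_prime[multiple] = False   # cross out multiples of p
    return [n for n in range(2, x + 1) if is_prime[n]]

print(sieve_of_eratosthenes(15))  # [2, 3, 5, 7, 11, 13]
```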
Theorem 3.10. Let a1 , . . . , am ∈ Z, not all zero.
(1) There is a unique d ∈ N, called the greatest common divisor gcd(a1 , . . . , am ), characterized by any of the following equivalent conditions:
(a) d is a common divisor of a1 , . . . , am which is divisible by every common divisor of a1 , . . . , am ;
(b) d is the largest positive common divisor of a1 , . . . , am ;
(c) d is the smallest positive element of the set
{x1 a1 + . . . + xm am : x1 , . . . , xm ∈ Z} .
(2) (Bezout's Lemma) There exist x1 , . . . , xm ∈ Z with
x1 a1 + . . . + xm am = gcd(a1 , . . . , am ).
Proof.
(1) Note first that 1 is a positive divisor of a1 , . . . , am . By Theorem 3.4 (9), every
common divisor δ of a1 , . . . , am satisfies
δ ≤ min {|ak | : 1 ≤ k ≤ m} .
Hence the set
S := {δ ∈ N : δ is a common divisor of a1 , . . . , am }
is non-empty and bounded from above. It follows that S has a greatest element,
which we denote by d (note that d satisfies condition (b) by construction).
Now suppose that d2 satisfies condition (a). Then d | d2 because d is a common
divisor. Thus by Theorem 3.4 (9) we have d ≤ d2 . However, d2 ∈ S and so by
the maximality of d in S, we conclude that d2 = d. Thus (a) is equivalent to
(b).
We now show that the integer satisfying (b) (which is the same as the integer
satisfying (a) from above) is the same as the integer satisfying (c). The set
A := {x1 a1 + . . . + xm am : x1 , . . . , xm ∈ Z, x1 a1 + . . . + xm am > 0}
is non-empty, and one shows that its smallest element is exactly d.

Theorem 3.11. Let a, b, n ∈ Z with n > 0 and d := gcd(a, b) > 0. Then the following hold:
(1) gcd(an, bn) = n gcd(a, b).
(2) We have
gcd(a/d, b/d) = 1.
(3) We have
gcd(a, b) = gcd(b, a) = gcd(a, −b) = gcd(a, b + na).
(4) If gcd(a, n) = gcd(b, n) = 1, then gcd(ab, n) = 1.
(5) If n | ab and gcd(a, n) = 1, then n | b.
Proof.
(1) This follows directly from Theorem 3.10 (1)(c), as for a1 = an and a2 = bn we
can pull n out of each aj .
(2) Set a′ = a/d and b′ = b/d and then use (1). That is,
gcd(a, b) = gcd(a′ d, b′ d) = d gcd(a′ , b′ ),
which is equivalent to the claim.
(3) These all follow by Theorem 3.10 (1)(c). In particular, in the last one, the
elements of the set
x1 a + x2 (b + na) = (x1 + nx2 )a + x2 b
are in one-to-one correspondence via x′1 = x1 + nx2 and x′2 = x2 (the corresponding matrix with rows (1, n) and (0, 1), sending (x1 , x2 ) to (x′1 , x′2 ), is invertible, with inverse having rows (1, −n) and (0, 1)).
(4) Theorem 3.10 (2) implies that there exist x, y, u, v ∈ Z such that
1 = xa + yn = ub + vn
Rearranging, we have
abxu = axbu = (1 − ny)(1 − nv) = 1 − n(y + v − nyv)
Thus if we set r := xu and s := y + v − nyv, then we have
abr + sn = 1.
Then Theorem 3.10 (1)(c) implies that gcd(ab, n) = 1.
(5) The claim follows directly if a = 0 or b = 0, so we assume ab ̸= 0. Since
gcd(a, n) = 1, (1) implies that
gcd(nb, ab) = b gcd(n, a) = b.
Since n | ab and n | nb, Theorem 3.10 (1)(a) implies that n | gcd(ab, nb) = b.
□
Theorem 3.12. Suppose that a, b ∈ Z and p is prime. If p | ab, then p | a or p | b.
Proof. Suppose that p | ab and p ∤ a. Note next that since gcd(p, a) is a divisor of
p and p is prime, it must be the case that gcd(p, a) = p or gcd(p, a) = 1. However,
since p ∤ a, we have gcd(p, a) = 1. Thus by Theorem 3.11 (5), we conclude that
p | b. □
Theorem 3.13 (Fundamental Theorem of Arithmetic). Let n ∈ N, n ≥ 2 be given.
Then n has a unique representation as a product of primes, where uniqueness is
meant to be up to reordering the primes.
Proof. Existence of the representation as a product of primes was proven in Theorem
3.5, so it remains to prove that the representation is unique.
Suppose that there are primes p1 , . . . , pr and q1 , . . . , qs for which
n = p1 · · · pr = q1 · · · q s .
We prove the claim by induction on s. If s = 1, then n = q1 is prime, and hence since
p1 | q1 , we have p1 = 1 or p1 = q1 , from which we conclude that the representation
is unique.
Next note that by Theorem 3.12 (used repeatedly),
pr | n = q1 · · · qs
implies that pr | qℓ for some ℓ and without loss of generality, we may assume that
ℓ = s. Again noting that qs is prime, we have pr = qs as above. We then conclude
that
p1 · · · pr−1 = n/qs = q1 · · · qs−1 .
By induction we conclude that the representation for n/qs is unique, which yields
the claim.
□
Theorem 3.14 (Euclidean algorithm). Let a ∈ Z and b ∈ N be given. Then there
exist integers k ≥ 0 and r1 , . . . , rk , q1 , . . . , qk+1 for which
a = bq1 + r1 0 < r1 < b,
b = r1 q2 + r2 0 < r2 < r1 ,
r1 = r2 q3 + r3 0 < r3 < r2 ,
..
.
rk−2 = rk−1 qk + rk 0 < rk < rk−1 ,
rk−1 = rk qk+1 ,
with rk = gcd(a, b).
One obtains a solution x, y ∈ Z to ax + by = gcd(a, b) by solving for rk = gcd(a, b)
by successively plugging in (with r0 = b and r−1 = a)
rj = rj−2 − rj−1 qj .
Proof. By Theorem 3.11 (3), we have
gcd(a, b) = gcd(a − bq1 , b) = gcd(b, r1 ) = gcd(b − r1 q2 , r1 )
= gcd(r1 , r2 ) = . . . = gcd(rk−1 , rk ) = rk ,
where in the last line we note that since rk | rk−1 , Theorem 3.10 (2) implies that
gcd(rk−1 , rk ) = rk .
□
Example 3.15. We consider the example
b = 918 = 2 · 3³ · 17,
a = 4340 = 2² · 5 · 7 · 31.
From the prime factorization, we have gcd(a, b) = 2. We next show this via the
Euclidean algorithm. Namely,
4340 = 4 · 918 + 668
918 = 1 · 668 + 250
668 = 2 · 250 + 168
250 = 1 · 168 + 82
168 = 2 · 82 + 4
82 = 20 · 4 + 2
4=2·2
⇒ 2 = 82 − 20 · 4 = 82 − 20(168 − 2 · 82) = 41 · 82 − 20 · 168
= 41 · (250 − 168) − 20 · 168 = 41 · 250 − 61 · 168
= 41 · 250 − 61(668 − 2 · 250) = 163(918 − 668) − 61 · 668
= 163 · 918 − 224(4340 − 4 · 918) = 1059 · 918 − 224 · 4340.
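The back-substitution can be carried out mechanically while running the Euclidean algorithm. The following Python sketch (our own illustration, not from the notes) returns gcd(a, b) together with x, y such that ax + by = gcd(a, b), and reproduces 1059 · 918 − 224 · 4340 = 2:

```python
def extended_gcd(a: int, b: int) -> tuple[int, int, int]:
    """Return (g, x, y) with g = gcd(a, b) and a*x + b*y = g."""
    old_r, r = a, b
    old_x, x = 1, 0
    old_y, y = 0, 1
    while r != 0:
        q = old_r // r                      # quotient of the current division step
        old_r, r = r, old_r - q * r
        old_x, x = x, old_x - q * x
        old_y, y = y, old_y - q * y
    return old_r, old_x, old_y

g, x, y = extended_gcd(4340, 918)
print(g, x, y)                 # 2 -224 1059
assert 4340 * x + 918 * y == g
```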
3.3. Optional: Euclidean domains.
Definition. An integral domain R is called a Euclidean domain (or Euclidean ring),
if there exists a function (known as a Euclidean function or degree function) g :
R \ {0} → N0 such that for every a, b ∈ R with b ̸= 0, there exist q, r ∈ R with
a = qb + r and either r = 0 or g(r) < g(b).
In any Euclidean domain, there exists a division algorithm yielding a system of
linear equations such as those in the Euclidean algorithm, where at each step one
has g(rj+1 ) < g(rj ).
Examples 3.16.
(1) The Euclidean function for R = Z is g(x) = |x| (see Theorem 3.1).
(2) Suppose that K is a field (such as K = Q or K = R, for example) and let
R = K[x] be the ring of polynomials in one variable over K.
Then
g(F ) = deg(F ),
where deg(F ) is the degree of the polynomial. The resulting division algorithm
is polynomial long division.
(3) For the ring R = Z[i] := {a + bi : a, b ∈ Z} of Gaussian integers,
g(a + bi) = |a + bi|2 = a2 + b2 .
Definition. Let a commutative ring R with 1 be given. An element ε ∈ R is called
a unit (in R), if there exists η ∈ R with
εη = 1.
The units of R form a group with respect to multiplication. These are referred to
as the group of units and are usually denoted by R× .
Elements x, y ∈ R are called associate if x | y and y | x both hold. In other words,
there exist a, b ∈ R with x = by and y = ax, so that
y = ax = aby.
In an integral domain R, we see that
y(ab − 1) = 0,
in which case either y = 0 or ab = 1. Thus two elements x and y are associate if and
only if x = y = 0 or x = by and y = ax with ab = 1 (i.e., a, b ∈ R× ). Being associate is an equivalence relation.
An element x ∈ R is called irreducible if x ̸= 0 and x ∈ / R× and every divisor of
x is either a unit or associate to x. An x ∈ R is called prime or a prime element of
R, if x ̸= 0 and x ∈/ R× and if for any a, b ∈ R for which x | ab, it follows that x | a
or x | b.
Remarks.
(1) In an integral domain, every prime element is irreducible. If x is prime and y | x,
then there exists a ∈ R such that x = ay. In particular, x | ay, and hence x | a
or x | y. If x | y, then x and y are associate by definition. If x | a, then a = xb
for some b ∈ R, and it follows that
x = ay = xby =⇒ x(1 − by) = 0.
Since x ̸= 0, we see that by = 1, from which it follows that x and y are associate.
(2) By Theorem 3.12, every irreducible element in Z is prime, but this does not hold
for an arbitrary integral domain. It does hold for Euclidean domains, however;
the proof generally follows Theorem 3.12. Suppose that R is a Euclidean domain.
Then by the Euclidean algorithm for R, if a is irreducible and does not divide
b, then there exist x, y ∈ R such that ax + by = 1, so if a | bc and a ∤ b, then
∃d ∈ R with ad = bc and
d = (ax + by)d = adx + byd = b(cx + dy).
From this we conclude that
ab(cx + dy) = bc.
Since b ̸= 0 (otherwise a | b automatically) and R is an integral domain, we have
a(cx + dy) = c,
and hence a | c in particular.
Example 3.17. Set
R = Z[√−6] = {a + b√−6 : a, b ∈ Z}.
Then R is an integral domain. For z = a + b√−6 ∈ R, we define the norm of z to be
N(z) := |z|² = a² + 6b².
Straightforward calculations show that
N (zw) = N (z)N (w), N (0) = 0, N (1) = N (−1) = 1,
N (z) > 1 for all z ∉ {0, ±1}.
The units of R are given by
R× = {z ∈ R : N (z) = 1} = {+1, −1} .
The elements 2, 3, and √−6 are irreducible in R. To see this, suppose that w ∈ R is a proper divisor of √−6 (not a unit and not an associate of √−6). Then √−6 = wz for some z ∈ R, and hence (since N(z)N(w) = N(zw) = N(√−6) = 6)
N(w) ∈ {2, 3}.
However a² + 6b² = 2 and a² + 6b² = 3 have no solutions in Z. Thus
6 = 2 · 3 = −√−6 · √−6
are two different factorizations of 6 into irreducible factors.
4. Congruences and residue classes
For a, b ∈ Z and N ∈ N, one calls a congruent to b modulo N , if N | (a − b). If a is
congruent to b modulo N , then one writes
a ≡ b (mod N ),
while if they are not congruent, then one writes
a ̸≡ b (mod N ).
Theorem 4.1. Let N ∈ N be given. Congruence modulo N is an equivalence relation
which is compatible with addition and multiplication. In other words, for every
a, b, c, d ∈ Z,
(1) a ≡ a (mod N ),
(2) a ≡ b (mod N ) ⇒ b ≡ a (mod N ),
(3) a ≡ b (mod N ), b ≡ c (mod N ) ⇒ a ≡ c (mod N ),
(4) a ≡ b (mod N ), c ≡ d (mod N ) ⇒ a + c ≡ b + d (mod N ),
(5) a ≡ b (mod N ), c ≡ d (mod N ) ⇒ ac ≡ bd (mod N ).
Proof.
(1) This follows directly from N · 0 = 0 = a − a.
(2) Clearly N | (b − a) implies that N | (a − b).
(3) If N | (a − b) and N | (b − c), then there exist x and y for which N x = a − b
and N y = b − c. But then
a − c = (a − b) + (b − c) = N x + N y = N (x + y),
from which we conclude that N | a − c.
(4) We have a − b = N x and c − d = N y for some x, y ∈ Z. Thus
a + c − (b + d) = (a − b) + (c − d) = N (x + y),
from which the claim follows.
(5) We again choose x, y ∈ Z such that a − b = N x and c − d = N y. Then
ac − bd = ac − bc + bc − bd = (a − b)c + b(c − d) = (xc + by)N.
□
Definition. The equivalence relation from congruences splits Z into disjoint classes,
which we call the residue classes (or congruence classes) modulo N . One often writes
a to denote the equivalence class a + N Z.
One calls a set {x1 , . . . , xn } ⊂ Z a complete residue system modulo N if for every
a ∈ Z there exists a unique j ∈ 1, . . . , n such that
a ≡ xj (mod N ).
For example {0, 1, . . . , N − 1} (known as the least residue system modulo N ) or
{0, −1, . . . , −N + 1} . . . are complete residue systems modulo N .
The set of all residue classes modulo N is usually denoted Z/N Z, and Z/N Z forms a ring with respect to
ā + b̄ := the class of a + b,    ā · b̄ := the class of a · b.
The identity element with respect to addition is the class of 0 and the identity element with respect to multiplication is the class of 1.
Remark 4.2. The ring Z/N Z may contain zero divisors. For example, in Z/4Z we
have
2·2=4=0
and in Z/6Z we have
2 · 3 = 6 = 0.
Theorem 4.3. For N > 1, the ring Z/N Z is an integral domain if and only if N
is prime.
Proof. If N is not prime, then there exist a, b ∈ N with N = ab and 1 < a < N (and
1 < b < N ). Since in Z/N Z we have
a · b = ab = N = 0,
it follows that a and b are zero divisors in Z/N Z unless a = 0 or b = 0 there. Since 1 < a < N ,
we have a ̸≡ 0 (mod N ), and hence a ̸= 0. Similarly, b ̸= 0. We conclude that
Z/N Z is not an integral domain.
Conversely, assume that N is prime and that
ab = a · b = 0,
or in other words
ab ≡ 0 (mod N ),
which is equivalent to N | ab. Since N is prime, Theorem 3.12 implies that N | a
or N | b, in which case a = 0 or b = 0, respectively. Therefore Z/N Z is an integral
domain. □
Definition. A commutative ring R is called a field if every non-zero element is
invertible, or in other words R× = R \ {0}.
Theorem 4.4. Every finite integral domain R is a field. In particular, for a prime
p, Fp := Z/pZ is a field with p elements.
Specifically, every non-zero element of Z/pZ is invertible.
Proof. Suppose that x ∈ R \ {0}. Since R is finite, x, x2 , x3 , . . . are not all distinct.
Hence there exist r, s ∈ N with r < s and
x^r = x^s.
Since R is an integral domain and
x^r (x^{s−r} − 1) = x^r − x^s = 0
with x^r ≠ 0, it follows that x^{s−r} = 1. Hence x · x^{s−r−1} = 1 (note s − r ≥ 1), so x is invertible. The statement about Fp = Z/pZ follows, since Z/pZ is a finite integral domain by Theorem 4.3. □
Definition. We call the residue class a a primitive residue class modulo N if the
integer a is relatively prime to N . This is well-defined because gcd(a + rN, N ) =
gcd(a, N ) by Theorem 3.11 (3).
Let φ(N ) denote the number of primitive residue classes modulo N ; we call φ the
Euler phi-function (it is also sometimes denoted ϕ).
Example 4.7. For N = 6, the only primitive residue classes are 1 and 5, so we
conclude that φ(6) = 2.
For N = 12, the integers 1, 5, 7, 11 give representatives of the primitive residue
classes, from which we conclude that φ(12) = 4.
Example 4.8. The set {1, 5, 7, 11} is a reduced residue system modulo 12.
Remark 4.9. A set {a1 , . . . , ar } of integers with gcd(aj , N ) = 1 and aj ̸≡ ak (mod N )
for every pair j ̸= k is a reduced residue system if and only if r = φ(N ). This follows
from the fact that if a ∈ Z with gcd(a, N ) = 1, then a is a primitive residue class.
Theorem 4.10.
(1) Suppose that N ∈ N and c ∈ Z are relatively prime. Then {a1 , . . . , ar } is
a reduced residue system modulo N if and only if {ca1 , . . . , car } is a reduced
residue system modulo N .
(2) The primitive residue classes form a group with respect to multiplication (that
is to say, the primitive residue classes are closed under multiplication and every
element has a multiplicative inverse). These are in one-to-one correspondence
with the group of units (Z/NZ)× of Z/NZ.
Proof.
(1) Since gcd(c, N ) = 1 = gcd(aj , N ), Theorem 3.11 (4) implies that gcd(caj , N ) = 1
as well. The set {ca1 , . . . , car } must have size φ(N ), so it remains to show that
each of the caj is in a different congruence class. Now suppose that
caj ≡ cak (mod N ),
or in other words
N | c(aj − ak ).
By Theorem 4.5 (2) and gcd(c, N ) = 1, we conclude that N | (aj − ak ), or in
other words aj ≡ ak (mod N ). Since a1 , . . . , ar is a reduced residue system,
uniqueness of the j′ for which aj′ ≡ aj (mod N ) (namely j′ = j) implies that
k = j. Thus caj ≡ cak if and only if j = k. Since both sets have size φ(N ), we
conclude that {caj : j ∈ 1, . . . , r} is a reduced residue system.
For the converse, suppose that ca1 , . . . , car form a reduced residue system.
Since gcd(ca1 , N ) = 1, we have gcd(a1 , N ) = 1, so {a1 , . . . , ar } is a set of inte-
gers which are relatively prime to N and there are φ(N ) of them. It remains
to show that no two of them are in the same congruence class. However, if
aj ≡ ak (mod N ), then caj ≡ cak (mod N ), which implies that j = k because
{ca1 , . . . , cak } is a reduced residue system. This implies the claim.
(2) If gcd(b, N ) = gcd(c, N ) = 1, then by Theorem 3.11 (4), we have gcd(bc, N ) = 1.
Next note that by part (1), if a1 , a2 , . . . , ar are a reduced residue system, then
so are a1 c, a2 c, . . . , ar c. In particular, there exists j for which
aj c ≡ 1 (mod N ).
Thus c is a unit in Z/N Z. We conclude that the primitive residue classes form
a subgroup of (Z/NZ)× . On the other hand, every unit in Z/NZ is a primitive
residue class modulo N because bc ≡ 1 (mod N ) implies that gcd(bc, N ) = 1
(since gcd(b, N ) | bc, Theorem 4.6 (4) implies that gcd(b, N ) | 1, so gcd(b, N ) =
1). This completes the proof.
□
Theorem 4.11.
(1) If G is a finite abelian (commutative) group of size m (also called the order of the group) with identity element e, then a^m = e for all a ∈ G.
(2) (Euler's Totient Theorem) For all N ∈ N and a ∈ Z with gcd(a, N ) = 1, we have
a^φ(N) ≡ 1 (mod N ).
(3) (Fermat's Little Theorem) For every prime p and every a ∈ Z with p ∤ a, we have
a^(p−1) ≡ 1 (mod p).
Proof.
(1) Suppose that a1 = e, a2 , . . . , am are all elements of G and let a ∈ G be arbitrary.
Since G is a group, a is invertible, and hence x 7→ xa is a bijection between
elements of G. In other words,
(4.1) {a·a1 , a·a2 , . . . , a·am } = {a1 , a2 , . . . , am }.
We next set g := a1 a2 · · · am . From (4.1) and the fact that G is abelian, we see
that
g = (aa1 ) · (aa2 ) · · · (aam ) = am (a1 · · · am ) = am g.
Therefore we conclude that am g = g. Since g is invertible, this implies that
am = e.
(2) This follows from part (1) with G = (Z/N Z)× .
(3) This follows directly from part (2) after noting that φ(p) = p − 1.
□
Example 4.12. What are the last 3 digits in the decimal representation of 9^(9^9)? In other words, we need to compute 9^(9^9) modulo 1000. Since gcd(9, 1000) = 1, Euler's Totient Theorem gives 9^φ(1000) ≡ 1 (mod 1000), so we first compute 9^9 (mod φ(1000)). A calculation shows that φ(1000) = 400 (we will later compute a formula for this). Thus 9^9 = 387420489 ≡ 89 (mod 400), and hence
9^(9^9) ≡ 9^89 ≡ 289 (mod 1000).
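This can be checked directly with Python's built-in three-argument pow, which performs modular exponentiation; the snippet below is only a numerical verification of the example.

```python
exponent = pow(9, 9)                    # 9^9 = 387420489
print(exponent % 400)                   # 89, the exponent reduced modulo φ(1000) = 400
print(pow(9, exponent % 400, 1000))     # 289
print(pow(9, exponent, 1000))           # 289 as well, without using Euler's theorem
```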
It turns out that Theorem 4.11 (1) holds in more generality when G is not neces-
sarily abelian.
Theorem 4.13 (Lagrange). If G is a finite group of size m and e is the unit of the
group, then for every element a ∈ G there exists a smallest k ∈ N with ak = e and
k | m. In particular, am = e for all a ∈ G.
Proof. Since G is finite, the powers a, a², a³, . . . cannot all be distinct, so there is a smallest k ∈ N with a^k = e. The elements of S := {e, a, a², . . . , a^{k−1}} must all be distinct, since otherwise a^r = a^s for some r < s < k, and from this we conclude that a^{s−r} = e, contradicting the minimality of k.
If S = G, then k = m, and hence k | m. On the other hand, if S ̸= G, then there
exists b2 ∈ G with b2 ∉ S (note that b2 ≠ a^j for any j ∈ Z, since if j′ ≡ j (mod k), then a^{j′} = a^j).
The set
S2 := {b2 , b2 a, . . . , b2 a^{k−1}}
again has k distinct elements and is disjoint from S. Continuing in this way, we obtain sets S1 := S, S2 , . . . , St for which
G = ⋃_{j=1}^{t} Sj
is a disjoint union. Since this is a disjoint union, the size of G is
#G = ∑_{j=1}^{t} #Sj = ∑_{j=1}^{t} k = tk.
Hence k | m = #G, and in particular a^m = (a^k)^{m/k} = e. □
5. Primality tests and cryptography
The basic idea of (modern) cryptography is to find what is called a one-way
function. A one-way function is “easy to compute in one direction”, but hard “in
reverse”. The idea of one kind of cryptography (known as RSA, after the inventors
Rivest, Shamir, and Adleman) is based on the fact that if N = p1 p2 with p1 and
p2 “large” primes, then it is easy to compute N if you know p1 and p2 (you just
multiply), but it is very hard to find p1 and p2 (these are unique by Theorem 3.13) if you only know N . It is of course possible by going through all primes up to N (actually √N is enough) and checking if they divide N . But if p1 and p2 are large,
then this will take a very long time. This has led to many people trying to develop
ways to quickly determine whether a number is prime or not. This is called primality
testing.
In order to search for primes, one needs to understand some properties satisfied
by primes. Recall that by Fermat’s Little Theorem (Theorem 4.11), for p prime and
b ∈ Z we have
bp ≡ b (mod p).
Is this a property unique to primes?
Definition. For integers n > 1 and b > 1, we call n a pseudoprime to the base b (or
Fermat pseudoprime), if n is not prime but
bn ≡ b (mod n).
If b = 2, then this is sometimes abbreviated by simply saying that n is a pseudoprime.
Example 5.1. All pseudoprimes (to base 2) under 2000 are 341, 561, 645, 1105, 1387,
1729, and 1905. Lehmer (1950) found the first even pseudoprime (namely 161038)
and Beeger (1951) proved that there are infinitely many even pseudoprimes (we will
show this later).
Definition. If n is a pseudoprime to every base, then we call n a Carmichael number.
By choosing the base b = −1, we see that (−1)n ≡ −1 (mod n) and thus every
Carmichael number is odd. The following gives a way to find some Carmichael
numbers.
Theorem 5.2. Suppose that n = p1 · · · pr , with r ≥ 2 and distinct odd primes
p1 , . . . , pr . If for every j = 1, . . . , r we have φ(pj ) = pj − 1 dividing n − 1, then n is
a Carmichael number.
Proof. Suppose that n has a representation as given in the statement of the theorem.
Then for each j there exists by assumption kj ∈ N with n − 1 = (pj − 1)kj . By
Fermat’s Little Theorem, for every a ∈ Z with pj ∤ a we have
a^{n−1} = (a^{pj−1})^{kj} ≡ 1^{kj} ≡ 1 (mod pj ).
It follows that an ≡ a (mod pj ) for all a ∈ Z and all j = 1, . . . , r. By Theorem 4.6
(3), we have
an ≡ a (mod lcm (p1 , . . . , pr )) .
Since p1 , . . . , pr are distinct primes, we have lcm(p1 , . . . , pr ) = p1 p2 · · · pr = n. It
follows that n is a Carmichael number. □
Remark 5.3. There are three Carmichael numbers less than 2000. Namely, they are
561 = 3 · 11 · 17,
1105 = 5 · 13 · 17,
1729 = 7 · 13 · 19.
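A small Python sketch (our own illustration; the helper names are ours) that searches for odd n < 2000 satisfying the hypotheses of Theorem 5.2 recovers exactly these three numbers:

```python
def prime_factors(n: int) -> list[int]:
    """Distinct prime factors of n by trial division."""
    ps, d = [], 2
    while d * d <= n:
        if n % d == 0:
            ps.append(d)
            while n % d == 0:
                n //= d
        d += 1
    if n > 1:
        ps.append(n)
    return ps

def satisfies_theorem_5_2(n: int) -> bool:
    """n is a product of r >= 2 distinct odd primes and (p - 1) | (n - 1) for each of them."""
    ps = prime_factors(n)
    if len(ps) < 2 or n % 2 == 0:
        return False
    if any(n % (p * p) == 0 for p in ps):
        return False                              # not squarefree
    return all((n - 1) % (p - 1) == 0 for p in ps)

candidates = [n for n in range(3, 2000, 2) if satisfies_theorem_5_2(n)]
print(candidates)                                 # [561, 1105, 1729]
# Sanity check: each candidate is a Fermat pseudoprime to the bases 2, 3, 5, 7.
for n in candidates:
    assert all(pow(b, n, n) == b % n for b in (2, 3, 5, 7))
```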
Questions.
(1) In the 17th century, Mersenne investigated the question of for which numbers n the number 2^n − 1 is prime.
It turns out that numbers of this type keep some sort of pseudoprime property.
For a > 1 and b > 1, we have
(5.1) x^{ab} − 1 = (x^a − 1)(x^{a(b−1)} + x^{a(b−2)} + . . . + x^a + 1),
and hence 2^{ab} − 1 is never prime (for x = 2 both factors exceed 1). If p is prime, then we call Mp = 2^p − 1
a Mersenne number, and if Mp is itself prime, then we call Mp a Mersenne
prime. The largest known prime (as of January, 2016) is the Mersenne prime
2^74207281 − 1.
People naturally look for patterns, and hence it is somewhat natural to look
for primes of a certain “type” (following a general pattern). Suppose that you
are interested in finding all primes that are of the form an − 1. It turns out that
if an − 1 is prime for a, n ∈ N with n > 1, then one can show that it must be
the case that a = 2. We have seen above that n must then also be prime. This
is one reason why the search for Mersenne primes is natural.
Theorem 5.5.
(1) If n ∈ N with 2^n ≡ 2 (mod n), then 2^{Mn} ≡ 2 (mod Mn ) also holds, where Mn := 2^n − 1.
(2) If p is prime, then Mp is prime or a pseudoprime.
(3) If n is a pseudoprime, then Mn is also a pseudoprime.
(4) There are infinitely many pseudoprimes (to the base 2).
Proof.
(1) This holds for n = 1 because M1 = 1.
Now suppose that n > 1 and 2^n ≡ 2 (mod n). Since 2^n > 2, we may choose k ∈ N such that
2^n = 2 + kn.
Then 2^{Mn} = 2^{2^n − 1} = 2 · 2^{kn}. Therefore, using (5.1),
2^{Mn} − 2 = 2(2^{kn} − 1) = 2(2^n − 1)(2^{(k−1)n} + 2^{(k−2)n} + . . . + 2^n + 1),
which is divisible by Mn = 2^n − 1, proving (1).
Question. Are there odd perfect numbers? This is unknown, but it is conjectured
that there are none.
Definition. Another type of integer for which primality has been thoroughly tested
are the Fermat numbers Fn := 2^{2^n} + 1.
Remark 5.9. Similarly to looking for primes of the shape a^n − 1, the Fermat numbers are a natural testing ground for primes following a pattern. We first note that if m is not a power of 2, then 2^m + 1 is not prime. Write m = vt with v ∈ N and t ≥ 3 odd. Similarly to (5.1), we have
2^m + 1 = 2^{vt} + 1 = (2^v + 1)(2^{v(t−1)} − 2^{v(t−2)} + . . . − 2^v + 1).
Note that the above factorization requires t to be odd, because for t even we have
(2^v + 1)(2^{v(t−1)} − 2^{v(t−2)} + . . . + 2^v − 1) = 2^{vt} − 1.
Theorem 5.10. For every n ≥ 0, we have 2^{Fn} ≡ 2 (mod Fn ), and hence Fn is either prime or a pseudoprime.

Proof. From Fn = 2^{2^n} + 1, we have 2^{2^n} ≡ −1 (mod Fn ). Thus for every a ≥ 0 we have
2^{a·2^n} ≡ (−1)^a (mod Fn ).
Writing k = 2^{2^n − n} ∈ 2Z, we have
2^{Fn} = 2^{k·2^n + 1} = 2 · 2^{k·2^n}.
Since k is even, (−1)^k = 1, and hence combining the two formulas above yields
2^{Fn} = 2 · 2^{k·2^n} ≡ 2(−1)^k ≡ 2 (mod Fn ).
□
We next discuss an idea for checking whether a number is composite (not prime).
Similar to the idea of pseudoprimes, we check a condition which is satisfied by
primes.
Since n is prime and n divides the left-hand side of the above equation, we conclude
that n divides one of the factors on the right-hand side. In other words, either
bt ≡ 1 (mod n)
or for some 0 ≤ r < s
b^{2^r · t} ≡ −1 (mod n).
This is precisely the condition defining strong pseudoprimes to the base b.
Note: This alternative proof motivates the definition of strong pseudoprimes: the factorization (5.4) shows that being a strong pseudoprime to the base b is indeed a stronger condition than being a pseudoprime to the base b, and the definition stems from this factorization, as a prime must divide one of the factors, while a composite number need not (it may have some prime factors in common with different factors on the right-hand side of (5.4)).
(2) By (5.3), we have
bn−1 ≡ 1 (mod n)
for all b ∈ Z with gcd(b, n) = 1.
We claim that for every b ∈ Z,
b^n ≡ b (mod n).
Since n is squarefree, Theorem 4.6 (3) implies that this is equivalent to
b^n ≡ b (mod p)
for every prime p | n. Write the prime factorization of n as n = ∏_{j=1}^{r} pj . Since n is squarefree, we have pj ≠ pℓ for j ≠ ℓ. If b = b′c with gcd(b′, n) = 1 and c = ∏_{j=1}^{r} pj^{aj}, where aj ≥ 0, then
b^n = b′^n c^n .
Since b′^{n−1} ≡ 1 (mod n), we have b′^n ≡ b′ (mod n). Thus by Theorem 4.6 (3) and Theorem 4.1 (5), we have
b^n ≡ b′ c^n (mod pj ).
It remains to show that c^n ≡ c (mod pj ). Clearly if pj | c, then this holds trivially. Otherwise, using Theorem 4.6 (3), it suffices to show that for ℓ ≠ j,
pℓ^n ≡ pℓ (mod pj ).
Consider b := pℓ + ∏_{ℓ′≠ℓ} pℓ′ . Then gcd(b, n) = 1 because pℓ ∤ b for every ℓ. Thus by (5.3) we have
b^{n−1} ≡ 1 (mod pj ).
Since b = pℓ + ∏_{ℓ′≠ℓ} pℓ′ ≡ pℓ (mod pj ) (noting that j ≠ ℓ), by Theorem 4.1 (5) we have
pℓ^{n−1} ≡ b^{n−1} ≡ 1 (mod pj ).
From this we conclude that
pℓ^n ≡ pℓ (mod pj ).
Therefore bn ≡ b (mod n) for every b ∈ Z, and we conclude that n is either prime
or a Carmichael number.
□
Theorem 5.13 (Rabin’s Theorem). If n > 9 is odd and composite, then at least
3/4 of all residue classes b (mod n) are witnesses that n is not prime.
Although we don’t prove Rabin’s Theorem in this class, we discuss its implications.
Specifically, the following primality test is based on Rabin’s Theorem. Roughly
speaking, if one chooses a “random” b, there is at most a 1/4 chance that n will
be strongly pseudoprime to the base b. Checking enough bases, one can be “pretty
sure” that the number is prime because if none of the bases are witnesses, then the
probability is very nearly 1 that n is prime.
Rabin–Miller primality test
Choose a “small” k and bases b1 , . . . , bk . For a “large” odd number n, one tests if n
is a strong pseudoprime to the bases b1 , . . . , bk .
If not, then n is composite. If it is a strong pseudoprime to all of the bases, then
n is “probably prime”. If the witnesses are “independent”, then Rabin’s Theorem
states that the probability of falsely identifying a composite number as “probably
prime” is only 4−k .
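A minimal Python sketch of the Rabin–Miller test as just described, with randomly chosen bases (function and parameter names are ours):

```python
import random

def is_probable_prime(n: int, k: int = 20) -> bool:
    """Rabin–Miller test: declare n composite if some random base is a witness."""
    if n < 4:
        return n in (2, 3)
    if n % 2 == 0:
        return False
    # write n - 1 = 2^s * t with t odd
    s, t = 0, n - 1
    while t % 2 == 0:
        s += 1
        t //= 2
    for _ in range(k):
        b = random.randrange(2, n - 1)
        x = pow(b, t, n)
        if x in (1, n - 1):
            continue                      # n is a strong pseudoprime to the base b
        for _ in range(s - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False                  # b is a witness: n is composite
    return True                           # "probably prime"

print(is_probable_prime(2**19 - 1))   # True (the Mersenne prime M19)
print(is_probable_prime(561))         # False with overwhelming probability (Carmichael number)
```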
Some remarks
• The algorithm only takes O(log nA ) operations (it should be fast for making
calculations)
• If one only knows n and not pA and qA , then φ(n) is not very easy to compute.
Otherwise one could find dA using the Euclidean algorithm.
6. Solving congruences and the Chinese remainder theorem
Theorem 6.1. Suppose that n, a, b ∈ Z with n > 0 and set d := gcd(a, n). Then the
congruence ax ≡ b (mod n) has a solution if and only if d | b. In this case, there
are precisely d different solutions modulo n.
If x0 is a solution to (a/d)x ≡ 1 (mod n/d), then the numbers (1/d)(bx0 + kn) with k ∈ {0, 1, . . . , d − 1} give the d distinct solutions modulo n.
Proof. First recall that by Theorem 4.6 (4), if ax ≡ b (mod n), then gcd(ax, n) =
gcd(b, n). Since d = gcd(a, n) | gcd(ax, n) = gcd(b, n), we have d | gcd(b, n). Thus
if a solution exists, we must have d | b.
Now suppose that d | b. We write a = da0 , b = db0 , and n = dn0 with a0 , b0 , n0 ∈
Z. By Theorem 4.5 (1), we see that ax ≡ b (mod n) is equivalent to a0 x ≡ b0
(mod n0 ).
Since gcd(a0 , n0 ) = 1, Theorem 4.15 implies that
a0 x ≡ 1 (mod n0 )
has precisely one solution x0 (mod n0 ) and all solutions in Z are given by x =
x0 + kn0 with k ∈ Z. Setting y0 = x0 b0 , we have
a0 y 0 ≡ a0 x 0 b 0 ≡ b 0 (mod n0 )
and again using Theorem 4.15 all solutions of this type are of the form b0 x0 + kn0
with k ∈ Z.
Therefore
y0 + kn0 = b0 x0 + kn0 = (1/d)(bx0 + kn).
These give the distinct solutions modulo n for k ∈ {0, 1, . . . , d − 1}. □
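A short Python sketch of Theorem 6.1, using Python's built-in modular inverse pow(a, -1, n); the helper name is ours:

```python
from math import gcd

def solve_linear_congruence(a: int, b: int, n: int) -> list[int]:
    """All solutions x (mod n) of a*x ≡ b (mod n), following Theorem 6.1."""
    d = gcd(a, n)
    if b % d != 0:
        return []                                   # no solution
    a0, b0, n0 = a // d, b // d, n // d
    x0 = pow(a0, -1, n0) * b0 % n0                  # solve (a/d) x ≡ b/d (mod n/d)
    return [(x0 + k * n0) % n for k in range(d)]    # d solutions modulo n

print(solve_linear_congruence(6, 4, 10))  # [4, 9]
```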
Having solved linear congruences, we next consider linear systems of congruences.
Theorem 6.2 (Chinese remainder theorem). Let pairwise co-prime natural numbers
n1 , . . . , nr and a1 , . . . , ar , b1 , . . . , br ∈ Z be given such that gcd(nj , aj ) = 1 for j =
1, . . . , r.
The system of congruences aj x ≡ bj (mod nj ) for j = 1, . . . , r has precisely one
solution modulo n1 · · · nr .
Lemma 6.5. For commutative rings R and S with identity, (R ⊕ S)^× = {(r, s) : r ∈ R^× , s ∈ S^× }.
Proof. A pair (r, s) ∈ R ⊕ S is a unit in R ⊕ S precisely when there exist (u, v) ∈ R ⊕ S
for which
(ru, sv) = (r, s)(u, v) = 1R⊕S = (1R , 1S ).
This is hence equivalent to the existence of u ∈ R and v ∈ S with ru = 1R , sv = 1S .
From this we conclude that r ∈ R× and s ∈ S × . □
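For completeness, here is a small Python sketch (our own illustration, not necessarily the construction used in the notes) that solves a system as in Theorem 6.2 by reducing each congruence and then combining with the standard Chinese-remainder formula:

```python
from math import prod

def crt(congruences: list[tuple[int, int, int]]) -> int:
    """Solve a_j x ≡ b_j (mod n_j) for pairwise coprime n_j with gcd(a_j, n_j) = 1.
    Returns the unique solution modulo n_1 * ... * n_r."""
    N = prod(n for _, _, n in congruences)
    x = 0
    for a, b, n in congruences:
        r = pow(a, -1, n) * b % n        # reduce to x ≡ r (mod n)
        M = N // n
        x += r * M * pow(M, -1, n)       # standard CRT combination
    return x % N

# x ≡ 2 (mod 3),  3x ≡ 1 (mod 5),  x ≡ 4 (mod 7)
sol = crt([(1, 2, 3), (3, 1, 5), (1, 4, 7)])
print(sol)   # 32
for a, b, n in [(1, 2, 3), (3, 1, 5), (1, 4, 7)]:
    assert (a * sol - b) % n == 0
```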
Notation. For n ∈ N, let E(n) = (Z/nZ)× denote the group of primitive residue
classes modulo n.
Lemma 6.5 together with Theorem 6.4 hence gives the following direct corollary.
Theorem 6.9. The Euler phi-function φ is multiplicative and for n ∈ N we have
φ(n) = n ∏_{p prime, p|n} (1 − 1/p),
∑_{d|n, d>0} φ(d) = n.
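A short Python sketch computing φ(n) from the product formula and checking the divisor-sum identity for one value of n (helper name ours):

```python
def phi(n: int) -> int:
    """Euler's phi-function via the product formula of Theorem 6.9."""
    result, m, p = n, n, 2
    while p * p <= m:
        if m % p == 0:
            result -= result // p        # multiply by (1 - 1/p)
            while m % p == 0:
                m //= p
        p += 1
    if m > 1:
        result -= result // m
    return result

print(phi(12), phi(1000))   # 4 400
# Divisor-sum identity: the sum of phi(d) over d | n equals n.
n = 360
assert sum(phi(d) for d in range(1, n + 1) if n % d == 0) == n
```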
Example 6.13. We consider the polynomial f (x) = x8 + 10x + 7 and we look for
solutions modulo powers of p = 3.
We begin by solving the equation modulo 3, and then we will apply Hensel’s
Lemma to obtain solutions for higher powers of 3. To find the solutions modulo 3,
we simply plug in x = 0, x = 1, and x = 2, and see by direct calculation that y1 = 1
is the unique solution to f (x) ≡ 0 (mod 3).
Now we are going to find solutions modulo 9. To do so, we use Hensel’s Lemma
(Theorem 6.12) with a = 1. From the proof of Theorem 6.12, we need to solve
f ′(y1) r ≡ −f(y1)/3 (mod 3) to find the solutions modulo 9. We therefore compute f(y1) = f(1) = 18 and f ′(y1) = f ′(1) = 18.
Since f ′ (y1 ) ≡ 0 (mod 3), either all choices of r satisfy the congruence or none of
them do, but f (y1 ) ≡ 0 (mod 9) implies that they indeed all satisfy the congruence.
Therefore we obtain three solutions to f (x) ≡ 0 (mod 9); namely, we have x1 = −2,
x2 = 1, x3 = 4.
We next continue to find solutions modulo 27 = 33 . For this, we use Hensel’s
Lemma (Theorem 6.12) with a = 2. We begin with each of the solutions x1 , x2 , x3
(mod 9) and apply Hensel’s Lemma in these cases. We thus again compute f ′ (xj ) ≡
0 (mod 3) for j = 1, 2, 3 and
f(xj)/3² = 27 ≡ 0 (mod 3) for j = 1,
f(xj)/3² = 2 ≢ 0 (mod 3) for j = 2,
f(xj)/3² = 7287 ≡ 0 (mod 3) for j = 3.
Hence every lift of x1 and of x3 gives a solution modulo 27, while no lift of x2 does, yielding six solutions modulo 27.
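The whole example can be verified by brute force. The following Python sketch (our own illustration) lists the roots of f modulo 3, 9, and 27:

```python
def roots_mod(f, m: int) -> list[int]:
    """All x in {0, ..., m-1} with f(x) ≡ 0 (mod m), by direct search."""
    return [x for x in range(m) if f(x) % m == 0]

f = lambda x: x**8 + 10 * x + 7

print(roots_mod(f, 3))    # [1]
print(roots_mod(f, 9))    # [1, 4, 7]               (7 ≡ -2 mod 9)
print(roots_mod(f, 27))   # [4, 7, 13, 16, 22, 25]
```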
7. Primitive roots and cyclic groups
Definition. If G is a finite group of size m and x ∈ G, then there exists h ∈ N
(this depends on x) smallest with xh = e, where e is the identity element of G.
One calls h the order of x in G. By Lagrange’s Theorem (Theorem 4.13), we have
h|m. If h = m, then we have G = {e, x, x2 , . . . , xm−1 } and one calls G a cyclic group.
For a relatively prime to n, the order of a modulo n is the order of the residue class
a in the group E(n), i.e., the smallest k ∈ N with
ak ≡ 1 (mod n).
By Euler’s Totient Theorem (Theorem 4.11 (2)), the order is at most φ(n), and
Theorem 4.13 furthermore implies that the order of a is a divisor of φ(n).
Problem. For which n does there exist an a with order precisely φ(n) modulo n? In
other words, for which n is E(n) cyclic?
Definition. If a has order φ(n), then we call a a primitive root modulo n.
Recall that by Corollary 6.6, if n = p1^{a1} · · · pr^{ar} with distinct primes p1 , . . . , pr , then
E(n) ≃ E(p1^{a1}) × . . . × E(pr^{ar}).
The identity between these two formulas hints at the fact that ψ(d) = φ(d), which
we next prove.
Recall first that by Theorem 6.16, the congruence
x^d ≡ 1 (mod p)
has at most d solutions. On the other hand, if a has order d, then the powers a^j for j ∈ {0, . . . , d − 1} all satisfy
(a^j)^d = (a^d)^j ≡ 1^j = 1 (mod p),
and a^j ≢ a^k (mod p) for 0 ≤ j < k < d (since otherwise a^{k−j} ≡ 1 (mod p), contradicting the fact that the order is d). Thus if ψ(d) > 0, then the a^j for 0 ≤ j < d are precisely the solutions to x^d ≡ 1 (mod p), and hence in particular any element of order d must be of the form a^j for some j ∈ {0, . . . , d − 1}.
Furthermore, by Theorem 7.2, the order of a^j is d/gcd(d, j), and hence the element a^j also has order d if and only if gcd(d, j) = 1. We conclude that if ψ(d) > 0, then
ψ(d) = #{j : 0 ≤ j < d, gcd(d, j) = 1} = φ(d).
Therefore, for each d | (p − 1), we have ψ(d) = 0 or ψ(d) = φ(d). If ψ(d) = 0 for
some d, then
∑_{d|(p−1)} ψ(d) < ∑_{d|(p−1)} φ(d) = p − 1 = ∑_{d|(p−1)} ψ(d),
a contradiction. Hence ψ(d) = φ(d) for every d | (p − 1); in particular ψ(p − 1) = φ(p − 1) > 0, so a primitive root modulo p exists. □
Example 7.4.
p    φ(p − 1)   Primitive roots modulo p
7    2          3, 5
17   8          3, 5, 6, 7, 10, 11, 12, 14
19   6          2, 3, 10, 13, 14, 15
41   16         6, 7, 11, 12, 13, 15, 17, 19, 22, 24, 26, 28, 29, 30, 34, 35.
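Tables like this are easy to reproduce by computing orders directly; the following Python sketch (helper names ours) does so for small primes:

```python
def multiplicative_order(a: int, n: int) -> int:
    """Smallest k >= 1 with a^k ≡ 1 (mod n); assumes gcd(a, n) = 1."""
    k, x = 1, a % n
    while x != 1:
        x = x * a % n
        k += 1
    return k

def primitive_roots(p: int) -> list[int]:
    """All primitive roots modulo a prime p (elements of order p - 1)."""
    return [a for a in range(2, p) if multiplicative_order(a, p) == p - 1]

for p in (7, 17, 19):
    print(p, primitive_roots(p))
# 7  [3, 5]
# 17 [3, 5, 6, 7, 10, 11, 12, 14]
# 19 [2, 3, 10, 13, 14, 15]
```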
Theorem 7.5 (Criterium for cyclic groups). A finite group G of size m is cyclic if
and only if for every divisor d of m, there is at most one subgroup H of G with size
d.
Proof. A slight generalization of Theorem 4.13 of Lagrange shows that the size of a
subgroup H of G is a divisor of m. Namely, one can show that the size of each coset xH is the same as the size of H and that distinct cosets are disjoint. We keep adding cosets to get that G is a disjoint union ⋃_{j=1}^{r} (xj H), and hence #G/#H = r.
We first assume that G is cyclic and generated by an element x, and will show
that there is at most one subgroup of size d for d | m.
For every d | m with d > 0, x^{m/d} generates a subgroup of size d. If H is a subgroup of size d, then there exists a smallest t ∈ N with x^t ∈ H. If x^{t′} ∈ H, then for every k, ℓ ∈ Z, we have
x^{kt+ℓt′} = (x^t)^k (x^{t′})^ℓ ∈ H.
In particular, Bezout’s Lemma (Theorem 3.10 (2)) implies that
x^{gcd(t, t′)} ∈ H.
Since gcd(t, t′) | t and t is the minimal power of x in H, we conclude that gcd(t, t′) = t, or in other words t | t′. It follows that H is generated by x^t. One concludes that t | m and d = m/t. Therefore H is the subgroup generated by x^{m/d}. It follows that
this is the unique subgroup of size d.
Now assume for the converse that for every divisor d > 0 of m, there exists at
most one subgroup of size d, and we let ψ(d) denote the size of the set of elements
of G with order d. If ψ(d) > 0 and y ∈ G is an element of order d, then by the
same argument as in the proof of Theorem 7.3, there are exactly φ(d) elements of
the form y j ∈ G (with 0 ≤ j < d) with order d. We conclude that if ψ(d) > 0, then
ψ(d) ≥ φ(d). Moreover, letting
H = ⟨y⟩ := {y j : j ∈ Z} = {y j : 0 ≤ j < d}
be the subgroup of G generated by y, we see that #H = d. If ψ(d) > φ(d) for
some d | m, then it must be the case that there exists z ∈ G with order d and
z ∉ ⟨y⟩. However, the subgroup ⟨z⟩ also has size d and ⟨z⟩ ≠ ⟨y⟩, contradicting the
uniqueness of subgroups of size d. Therefore we conclude that either ψ(d) = 0 or
ψ(d) = φ(d). Since every element has some order dividing m, we have
∑_{d|m} ψ(d) = #G = m,
so that (using Theorem 6.9)
∑_{d|m} ψ(d) = ∑_{d|m} φ(d).
We conclude that ψ(d) = φ(d) for every d | m, and in particular ψ(m) = φ(m) > 0,
from which we conclude that G is cyclic. □
for all a ≥ 3. Thus 5 has order 2^{a−2} modulo 2^a, and 5 generates a cyclic subgroup H of size 2^{a−2} = (1/2)·φ(2^a) in E(2^a). Since no primitive roots exist for a ≥ 3, this is the largest cyclic subgroup.
Suppose for contradiction that −1 ∈ H. Then there exists 0 < r < 2^{a−2} for which
5^r ≡ −1 (mod 2^a).
We conclude that
5^{2r} ≡ (−1)² = 1 (mod 2^a),
so that 2r = 2^{a−2}, and hence r = 2^{a−3}. We conclude that
5^{2^{a−3}} ≡ −1 (mod 2^a)
and from above
5^{2^{a−3}} ≡ 1 + 2^{a−1} (mod 2^a),
so that 2^{a−1} ≡ −2 (mod 2^a), which is a contradiction for a ≥ 3.
Therefore, for a ≥ 3 we have −1 ∉ H and E(2^a) is the direct product of the cyclic subgroups ⟨5⟩ and ⟨−1⟩ generated by 5 and −1. In particular, we have
E(2^a) ≃ Z_{2^{a−2}} × Z_2 .
One sees that the mapping k (mod φ(n)) ↦ g^k (mod n) defines an isomorphism
(Z/φ(n)Z, +) ≃ E(n).
Example 7.8. We would like to solve the congruence
x^{10} ≡ 13 (mod 17).
We use the primitive root g = 3 modulo 17. This yields the following table giving
the correspondence between the index and the residue classes (to go from k to k + 1,
one simply multiplies by 3 and then takes the answer modulo 17):
k 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
g^k   3 9 10 13 5 15 11 16 14 8 7 4 12 2 6 1
Written in reverse, we get the following indices for the residue classes:
x 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
ind3 (x) 16 14 1 12 5 15 11 10 2 3 7 13 4 9 6 8
Since 13 ≡ 3^4 (mod 17) by the above table, the congruence x^{10} ≡ 13 (mod 17) is equivalent to x^{10} ≡ 3^4 (mod 17), or in other words 3^{10·ind3(x)} ≡ 3^4 (mod 17). Since 3^a ≡ 3^b (mod 17) if and only if a ≡ b (mod 16) (since 3 is a primitive root modulo 17 and φ(17) = 16), this may be written as the linear congruence
10 ind3(x) ≡ ind3(13) = 4 (mod 16),
which by Theorem 4.5 (1) is in turn equivalent to
5 ind3 (x) ≡ 2 (mod 8).
Using Theorem 4.5 (1) with a = 5 and noting that 5² ≡ 1 (mod 8), we conclude
that this is equivalent to
ind3 (x) ≡ 2 (mod 8).
Therefore ind3(x) = 2 or ind3(x) = 10 (since 1 ≤ ind3(x) ≤ 16). From the above
table, we see that x = 9 or x = 8 (corresponding to ind3 (x) = 2 and ind3 (x) = 10,
respectively) give the solutions to the congruence.
Question. From the above calculation, we see that it is useful to compute the indices
indg (x, p) for a given p, g and x (with p prime). How can one do this quickly?
The “giant steps – baby steps” algorithm of Shanks takes about O(√p · log p) operations to do this.
Given p and g, let q ∈ N be minimal with q(q + 1) ≥ p. One next computes
and saves g^q, g^{2q}, . . . , g^{q·q} (these are the “giant steps”). Suppose that k = ind_g(x) is written in the form
k = ℓq + r
with 0 ≤ ℓ ≤ q and 0 ≤ r ≤ q − 1 (k may always be written in this form because q(q + 1) ≥ p by assumption). Then we have
g^{ℓq+r} = g^k ≡ x (mod p),
and hence in particular
g^{ℓq} ≡ x·g^{−r} (mod p).
We thus compute x, x·g^{−1}, x·g^{−2}, . . . , x·g^{1−q} modulo p (actually, in practice one can stop at x·g^{−r}, because we only need to compute until the above congruence holds) and compare them with the numbers g^{ℓq} that we have already computed (these are the “baby steps”).
Doing this, one obtains a pair r, ℓ with
x·g^{−r} ≡ g^{ℓq} (mod p),
from which we conclude that k = ind_g(x) = ℓq + r.
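A minimal Python sketch of Shanks' algorithm as described above; it assumes p is prime and g is a primitive root modulo p, and the function name index is ours:

```python
from math import isqrt

def index(g: int, x: int, p: int) -> int:
    """Baby-step giant-step: return k with g^k ≡ x (mod p)."""
    q = isqrt(p) + 1                                          # ensures q(q + 1) >= p
    giant = {pow(g, l * q, p): l for l in range(1, q + 1)}    # giant steps g^q, g^(2q), ...
    g_inv = pow(g, -1, p)
    y = x % p
    for r in range(q):                                        # baby steps x, x*g^(-1), ...
        if y in giant:
            return (giant[y] * q + r) % (p - 1)
        y = y * g_inv % p
    raise ValueError("no index found (is g a primitive root mod p?)")

print(index(3, 13, 17))   # 4, since 3^4 = 81 ≡ 13 (mod 17)
```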
Special case: a = −1
In this case
a^{(p−1)/2} = (−1)^{(p−1)/2} ≡ 1 (mod p)
is equivalent to (−1)^{(p−1)/2} = 1, which is in turn equivalent to p ≡ 1 (mod 4). So x² ≡ −1 (mod p) is solvable if and only if p ≡ 1 (mod 4). To state this another way, we see that −1 is a quadratic residue modulo p if and only if p ≡ 1 (mod 4).
We are next going to investigate rational numbers and a connection between
repeating decimal expansions and the order of 10 in the multiplicative group (Z/cZ)×
for c satisfying gcd(c, 10) = 1. We begin by reviewing decimal expansions.
Definition. Let α ∈ Q with 0 < α < 1 be given and write it as α = a/b with a, b ∈ Z satisfying gcd(a, b) = 1. The decimal expansion
α = 0.a1 a2 a3 . . . = ∑_{ν=1}^{∞} aν 10^{−ν}
with aν ∈ {0, 1, . . . , 9} is unique as long as one does not allow an expansion where
aν = 9 for every ν > ν0 .
One calls the decimal expansion repeating (also known as recurring), if there exist
integers r ≥ 0 and s > 0 such that
aν+s = aν
for every ν > r; in other words the numbers aν repeat with period s after an initial r
digits. If r and s are chosen minimally with this property, then we call s the length
of the period and ar+1 . . . ar+s is called the repetend.
A common notation for a repeating decimal expansion is
α = 0.a1 . . . ar ar+1 . . . ar+s .
Here the line is written over the repetend; other notations include putting dots above
the digits in the repetend or separating the repetend with a space and writing . . .
afterwards, such as denoting 0.357 232 . . . to mean 0.357232232232 . . . (repetend 232).
We furthermore say that a decimal expansion is terminating if it has finite length
(or, equivalently, if it is a repeating decimal expansion and the repetend is 0, but
many authors exclude terminating decimal expansions when defining repeating dec-
imals).
Examples 7.11.
1/15 = 0.0666 . . . (repetend 6),    1/7 = 0.142857142857 . . . (repetend 142857).
Theorem 7.12. A decimal expansion represents a rational number if and only if it
is either repeating or terminating.
This in turn implies that 2^u 5^v | 10^r and c | (10^s − 1). It follows that r ≥ max{u, v} and 10^s ≡ 1 (mod c).
We next assume that 2^u 5^v | 10^q and c | (10^t − 1) for some q ≤ r and t ≤ s. Then we have
10^{q+t} α − 10^q α = 10^q (10^t − 1) · (a/b) = (10^q / (2^u 5^v)) · ((10^t − 1)/c) · a = ax
for some x ∈ Z. It follows that α has period length at most t and at most q decimals before the periodic part of the decimal expansion. Since r and s were chosen minimally with this property, we conclude that q ≥ r and t ≥ s. It follows that q = r and t = s.
Therefore r and s are the minimal q and t for which 2^u 5^v | 10^q and c | (10^t − 1). We see directly that r = max{u, v}, and the minimality of s with 10^s ≡ 1 (mod c) means that s is the order of 10 modulo c. This yields the claim. □
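A short Python sketch (our own helper) that computes the period length of a/b as the order of 10 modulo c, exactly as in the theorem:

```python
def period_length(b: int) -> int:
    """Length of the repeating part of the decimal expansion of a/b with gcd(a, b) = 1:
    the multiplicative order of 10 modulo c, where b = 2^u * 5^v * c and gcd(c, 10) = 1."""
    c = b
    for p in (2, 5):
        while c % p == 0:
            c //= p
    if c == 1:
        return 0                      # terminating expansion
    k, x = 1, 10 % c
    while x != 1:
        x = x * 10 % c
        k += 1
    return k

print(period_length(15), period_length(7), period_length(81))   # 1 6 9
```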
A problem of Abel:
For which primes p does 1/p² have the same period length as 1/p?
For example, for p = 3, we have
1/3 = 0.333 . . . (period length 1),
1/9 = 0.111 . . . (period length 1).
8. Quadratic reciprocity
Note that for p ̸= 2 prime and p ∤ a, by completing the square, the quadratic
congruence
ax² + bx + c ≡ 0 (mod p),
is equivalent to
(2ax + b)² + (4ac − b²) ≡ 0 (mod p),
and hence equivalent to the system
y² ≡ d (mod p), 2ax + b ≡ y (mod p),
where d := b² − 4ac. Since we studied linear congruences already in Theorem 6.1, we next consider x² ≡ a (mod p) and ask the following questions:
Questions.
(1) Given fixed p, for which a is x² ≡ a (mod p) solvable?
(2) For fixed a, which p yield a solution to x² ≡ a (mod p)?
Definition. For a ∈ Z and p ≠ 2 prime, we define the Legendre symbol (a/p) by
(a/p) := 1 if p ∤ a and x² ≡ a (mod p) is solvable,
(a/p) := −1 if p ∤ a and x² ≡ a (mod p) is not solvable,
(a/p) := 0 if p | a.
In every case, the number of solutions to x² ≡ a (mod p) is given by 1 + (a/p).
Theorem 8.4 (Lemma of Gauss). Suppose that p ≠ 2 is prime and a ∈ Z with p ∤ a. For each of the integers ℓa with 1 ≤ ℓ ≤ (p−1)/2 (i.e., a, 2a, 3a, . . . , ((p−1)/2)·a), let mℓ be the integer smallest in absolute value which is congruent to ℓa modulo p. Let N be the number of ℓ for which mℓ < 0. Then (a/p) = (−1)^N.
Since p ∤ ((p−1)/2)!, it follows that
1 ≡ (−1)^N · a^{(p−1)/2} (mod p),
and hence
a^{(p−1)/2} ≡ (−1)^N (mod p).
By Euler's criterion (Theorem 8.3 (2)), it follows that (a/p) ≡ (−1)^N (mod p), and hence (a/p) = (−1)^N. □
Theorem 8.5. Let p, a and N be given as in Theorem 8.4. Then
N ≡ ∑_{1≤ℓ≤(p−1)/2} ⌊ℓa/p⌋ + (a − 1)·(p² − 1)/8 (mod 2).
Furthermore, since (as shown in the proof of Theorem 8.4) ℓ ↦ |mℓ| is a permutation of the set S, we have
∑_{ℓ∈S} ℓ = ∑_{ℓ∈S} |mℓ| = (s1 + . . . + sM ) − (r1 + . . . + rN ).
This yields the first claim. The second claim follows because for a odd we have
(a − 1)·(p² − 1)/8 ≡ 0 (mod 2).
□
However, since 1 ≤ ℓ ≤ (p−1)/2, we have 2 ≤ 2ℓ ≤ p − 1, and hence for every such ℓ we have
⌊2ℓ/p⌋ = 0.
Therefore we conclude that
N ≡ (p² − 1)/8 (mod 2),
from which (2/p) = (−1)^{(p²−1)/8} follows. □
Theorem 8.8 (Quadratic reciprocity). Suppose that p and q are distinct odd primes.
Then we have
(p/q)·(q/p) = (−1)^{(p−1)(q−1)/4}.
Hence
(p/q) = −(q/p) for p ≡ q ≡ 3 (mod 4), and (p/q) = (q/p) otherwise.
Proof. Set
T := {(x, y) ∈ N² : 1 ≤ x ≤ (p−1)/2, 1 ≤ y ≤ (q−1)/2},
T1 := {(x, y) ∈ T : y < qx/p},
T2 := {(x, y) ∈ T : x < py/q}.
For (x, y) ∈ T2 we have y > qx/p, so it follows directly that T1 ∩ T2 = ∅. Furthermore, for (x, y) ∈ T we have gcd(qx, p) = 1, so y = qx/p is impossible. Therefore T = T1 ∪ T2 and T1 ∩ T2 = ∅. It follows that
((p−1)/2)·((q−1)/2) = #T = #T1 + #T2 = ∑_{1≤x≤(p−1)/2} ⌊qx/p⌋ + ∑_{1≤y≤(q−1)/2} ⌊py/q⌋.
Both sides are polynomials of degree 2q, and the identity follows by comparing the roots of both sides (there are precisely 2q roots of each side, namely ±ξ^b with b (mod q)). Thus it follows that, using z = η^a,
(η^{qa} − η^{−qa})/(η^a − η^{−a}) = ∏_{b (mod q), b≢0 (mod q)} (ξ^b η^a − ξ^{−b} η^{−a}).
68
Writing (by pairing the b ∈ R and q − b terms and noting that ξ^q = 1)
∏_{b (mod q), b≢0 (mod q)} (ξ^b η^a − ξ^{−b} η^{−a}) = ∏_{b∈R} (ξ^b η^a − ξ^{−b} η^{−a})(ξ^{−b} η^a − ξ^b η^{−a}),
we therefore obtain
(q/p) = ∏_{a∈S} ∏_{b (mod q), b≢0 (mod q)} (ξ^b η^a − ξ^{−b} η^{−a}) = ∏_{a∈S} ∏_{b∈R} (ξ^b η^a − ξ^{−b} η^{−a})(ξ^{−b} η^a − ξ^b η^{−a})
= ∏_{a∈S} ∏_{b∈R} ((η^{2a} + η^{−2a}) − (ξ^{2b} + ξ^{−2b})).
Reversing the roles of p and q reverses the order of the difference in the last product,
giving a factor of −1 for each element of S and R. Thus
(p/q) = (−1)^{#R·#S} (q/p) = (−1)^{((p−1)/2)·((q−1)/2)} (q/p).
□
This example shows how solving the congruence modulo large primes can be
greatly simplified with quadratic reciprocity. It can also be used to figure out the
primes for which a given integer is a quadratic residue.
Examples 8.10.
• For which primes p is 5 a quadratic residue modulo p?
Since (5/p) = (p/5), we see that 5 is a quadratic residue if and only if p ≡ 1, 4 (mod 5).
• For which primes p is 7 a quadratic residue modulo p?
We have
1 = (7/p) = (−1)^{(p−1)/2} (p/7)
if and only if p ≡ 1, 9, 25, 3, 19 or 27 (mod 28).
Definition. Suppose that P, Q ∈ Z and furthermore that Q > 0 is odd and has
prime factorization Q = q1 · · · qs , where q1 , . . . , qs are (not necessarily distinct) primes. The Jacobi symbol (P/Q) is then defined by
(8.1) (P/Q) := (P/q1) · · · (P/qs).
Note the following properties of the Jacobi symbol, which all follow directly from
the definition unless otherwise stated.
(1) If Q is prime, then the Jacobi symbol (P/Q) coincides with the Legendre symbol.
(2) If gcd(P, Q) > 1, then (P/Q) = 0.
(3) If gcd(P, Q) = 1, then (P/Q) ∈ {−1, 1}.
(4) By the Chinese remainder theorem (Theorem 6.2), the solvability of the congruence
x² ≡ P (mod Q)
is equivalent to the solvability of the family of congruences
x² ≡ P (mod q^a)
for every prime q and a ∈ N with q^a | Q. Hence if x² ≡ P (mod Q) is solvable, then it follows that
(P/qj) = 1
for all j = 1, . . . , s, and hence we also have (P/Q) = 1. Thus if (P/Q) = −1, then we can conclude that the congruence is not solvable. The reverse direction does not hold, however; that is to say, there exist P and Q for which (P/Q) = 1, but x² ≡ P (mod Q) is not solvable. For example, take Q = q² with q prime and let P be any quadratic non-residue modulo q.
Theorem 8.11. For arbitrary a, b ∈ Z and odd c, d ∈ N, the following hold.
(1) We have (a/c)·(b/c) = (ab/c).
(2) We have (a/c)·(a/d) = (a/cd).
(3) If gcd(a, c) = 1, then (a²/c) = (a/c²) = 1.
(4) If a ≡ b (mod c), then it follows that (a/c) = (b/c).
Proof.
(1) By the definition (8.1), for part (1) it suffices to show the claim for c prime,
which is precisely the statement of Theorem 8.3 (3).
(2) We obtain (2) directly from the definition (8.1).
(3) Part (3) follows directly from parts (1) and (2) and the fact that the Jacobi
symbol has value ±1 whenever gcd(a, c) = 1.
(4) If a ≡ b (mod c), then a ≡ b (mod qj ) for every prime qj dividing c. The claim
then follows by Theorem 8.3 (1). □
and hence
(−1/Q) = (−1)^{(Q−1)/2}.
Similarly, for odd u and v we have
(u²v² − 1)/8 − ((u² − 1)/8 + (v² − 1)/8) = (u² − 1)(v² − 1)/8 ≡ 0 (mod 2),
so that one inductively concludes that
∑_{j=1}^{s} (qj² − 1)/8 ≡ (q1² · · · qs² − 1)/8 = (Q² − 1)/8 (mod 2).
Thus Theorem 8.6 implies that
(2/Q) = ∏_{j=1}^{s} (2/qj) = ∏_{j=1}^{s} (−1)^{(qj²−1)/8} = (−1)^{∑_j (qj²−1)/8} = (−1)^{(Q²−1)/8}.
□
Theorem 8.13 (Quadratic reciprocity for the Jacobi symbol). For relatively prime odd P, Q ∈ N, we have
(P/Q) = (−1)^{(P−1)(Q−1)/4} · (Q/P).
Proof. We write P = p1 · · · pr and Q = q1 · · · qs with primes p1 , . . . , pr and q1 , . . . , qs . Since P and Q are relatively prime, pj ≠ qℓ . Then by the definition (8.1) of the Jacobi symbol and quadratic reciprocity for primes (Theorem 8.8), we conclude that
(P/Q) = ∏_{j=1}^{s} (P/qj) = ∏_{j=1}^{s} ∏_{k=1}^{r} (pk/qj) = ∏_{j=1}^{s} ∏_{k=1}^{r} (−1)^{(pk−1)(qj−1)/4} (qj/pk)
= (−1)^{∑_{j=1}^{s} ∑_{k=1}^{r} (pk−1)(qj−1)/4} (Q/P) = (−1)^{(½∑_{j=1}^{s}(qj−1))·(½∑_{k=1}^{r}(pk−1))} (Q/P).
By (8.2) (in the proof of Theorem 8.12), we furthermore have
∑_{j=1}^{s} (qj − 1)/2 ≡ (Q − 1)/2 (mod 2) and ∑_{k=1}^{r} (pk − 1)/2 ≡ (P − 1)/2 (mod 2).
It follows that
(P/Q) = (−1)^{(P−1)(Q−1)/4} · (Q/P).
□
Example 8.14. Consider the Mersenne prime M19 = 2^19 − 1 = 524287. Is x² ≡ 100003 (mod M19) solvable?
Since M19 is prime, we only need to compute the Jacobi symbol (100003/524287). Using Theorem 8.13 and Theorem 8.12, we obtain
(100003/524287) = −(524287/100003) = −(24272/100003) = −(2^4 · 1517/100003) = −(1517/100003)
= −(100003/1517) [since 1517 ≡ 1 (mod 4)]
= −(1398/1517) = −(2/1517)·(699/1517) = (699/1517) [since 1517 ≡ 5 (mod 8), so (2/1517) = −1]
= (1517/699) = (119/699) = −(699/119) = −(104/119)
= −(2/119)³·(13/119) = −(13/119) [since 119 ≡ 7 (mod 8), so (2/119) = 1]
= −(119/13) = −(2/13) = 1 [since 13 ≡ 5 (mod 8), so (2/13) = −1].
Hence x² ≡ 100003 (mod M19) is solvable.
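Computations of this kind are easy to automate. The following Python sketch evaluates the Jacobi symbol using the supplement for 2 (Theorem 8.12) and reciprocity (Theorem 8.13); the function name is ours:

```python
def jacobi(P: int, Q: int) -> int:
    """Jacobi symbol (P/Q) for odd Q > 0."""
    assert Q > 0 and Q % 2 == 1
    P %= Q
    result = 1
    while P != 0:
        while P % 2 == 0:                 # pull out factors of 2 using (2/Q)
            P //= 2
            if Q % 8 in (3, 5):
                result = -result
        P, Q = Q, P                       # reciprocity for the Jacobi symbol
        if P % 4 == 3 and Q % 4 == 3:
            result = -result
        P %= Q
    return result if Q == 1 else 0

print(jacobi(100003, 2**19 - 1))   # 1, as computed above
```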
9. Multiplicative arithmetic functions
To recall: A function f : N → C is called multiplicative if f (nm) = f (n)f (m)
whenever gcd(n, m) = 1. By Theorem 6.8, if f is multiplicative, then F(n) = ∑_{d|n} f(d) is also multiplicative.
Proof. The multiplicativity follows directly from the definition since m2 | n with
m > 1 implies that there is a prime p | m for which p2 | n. We conclude from
Theorem 6.8 that X
I(n) := µ(d)
d|n
□
Corollary 9.3. For F : N → C, there exists a uniquely-defined function f : N → C
P
satisfying F (n) = d|n f (d). The function f is moreover explicitly given by
X n
f (n) = µ(d)F .
d
d|n
Examples 9.4.
P
(1) By Theorem 6.9, we have d|n φ(d) = n. Combining this with Theorem 9.2 (1)
(with f (n) = φ(n) and F (n) = n), one obtains
X n X µ(d) Y 1
φ(n) = µ(d) = n =n 1− .
d d p prime
p
d|n d|n
p|n
(in the last line we note that every divisor d | n is a product pj11 pj22 · · · pjrr with
P
pℓ prime and jℓ ≤ apℓ ) and Theorem 9.2 (2), one obtains d|n φ(d) = n.
(2) Recall that the sum of divisor functions σk : N → R are defined by σk (n) =
k
P
d|n d for k ∈ R (see Section 6). Hence by Theorem 9.2 (1), it follows that
X n
nk = µ(d)σk .
d
d|n
Remark 9.5. By making the change of variables d → nd in the sum, we see that
f ∗ g = g ∗ f.
In a formal variable s, one can define the formal series
∞
X
Df (s) = f (n)n−s ,
n=1
called the Dirichlet series associated to f . When it converges, one can evaluate the
series for s ∈ C. If the series converges absolutely for some choice of s, then one has
∞
X ∞
X ∞ X
X ∞
−s −s
Df (s)Dg (s) = f (k)k g(m)m = f (k)g(m)(km)−s
k=1 m=1 k=1 m=1
∞
X ∞
X −s X
= f (k)g(m) n = (f ∗ g) (n)n−s .
n=1 k,m n=1
km=n
which is the Dirichlet series DU (s) for the constant function U (n) := 1 for all n.
One sees by direct calculation that
∞ ∞ ∞ X ∞
k −s n=md
X X X X
−s k −s
ζ(s)ζ(s − k) = m d d = d n = σk (n)n−s .
m=1 d=1 n=1 d|n n=1
76
Setting E(n) := n for all n ∈ N (so E k (n) := E(n)k = nk ) and noting that ζ(s−k) =
DE k (s), we see that U ∗ E k = σk (by the comparison of their Dirichlet series above)
and in particular U ∗ E = σ1 .
We next show that the set of arithmetic functions form a commutative ring, where
multiplication is defined by convolution and addition is pointwise addition
(f + g)(n) := f (n) + g(n).
Theorem 9.6. The set A of all arithmetic functions with pointwise addition and
convolution as the multiplication form an integral domain. The identity element is
(
1 if n = 1,
I(n) =
0 if n > 1.
The units of A are precisely the functions f : N → C with f (1) ̸= 0.
Proof. We have already seen above that A is a ring. We next show that there are
no zero divisors. Let f, g ∈ A be given with f ∗ g = 0 and suppose that g ̸≡ 0. Let
m ∈ N be minimal with g(m) ̸= 0.
Since
X m
0 = (f ∗ g) (m) = f g(d) = f (1)g(m),
d
d|m
77
it follows that f (1) = 0. Suppose next that n ≥ 2 and f (k) = 0 for 1 ≤ k ≤ n − 1.
Then
X XX
0 = (f ∗ g) (mn) = f (k)g(ℓ) = f (k)g(ℓ) = f (n)g(m) ⇒ f (n) = 0.
k,ℓ k≥n ℓ≥m
kℓ=mn ℓk=mn
By induction, we conclude that f (n) = 0 for all n, and hence there are no zero
divisors.
If f ∈ A is invertible, then there exists g ∈ A such that f ∗ g = I. In particular,
1 = I(1) = f (1)g(1),
and hence f (1) ̸= 0.
Conversely, let f ∈ A be given with f (1) ≠ 0. We then recursively construct a
g ∈ A with f ∗ g = I. Firstly, set
1
g(1) :=
f (1)
so that
(f ∗ g)(1) = f (1)g(1) = 1 = I(1).
Now suppose that n ≥ 2 and the values g(k) are given for every 1 ≤ k ≤ n − 1 such
that (f ∗ g) (k) = I(k) for 1 ≤ k ≤ n − 1. Since n > 1, we want to choose g(n) so
that X n
0 = I(n) = (f ∗ g) (n) = f (1)g(n) + f g(d)
d
d|n
d<n
holds. This is satisfied for
1 X n
g(n) := − f g(d).
f (1) d
d|n
d<n
□
Examples 9.7. We again set U (n) := 1 and E k (n) := nk for all n ∈ N. By Theorem
9.1, we have
µ ∗ U = I.
Theorem 9.2 can be written as follows: if F = f ∗ U , then f = F ∗ µ. Note next
that X
(U ∗ U ) (n) = 1 = σ0 (n) = τ (n) (number of divisors of n)
d|n
ℓ
a
Y
h(n) := f −1 pj j .
j=1
80
10. Sums of two squares
A right triangle all of whose sides have integral length x, y, z ∈ N is called a
Pythagorean triangle. The side lengths satisfy
x2 + y 2 = z 2
and the ordered pair (x, y, z) is called a Pythagorean triple.
Example 10.1. The ordered pairs (3, 4, 5) and (5, 12, 13) are Pythagorean triples.
If gcd(x, y, z) = 1, then the Pythagorean triple is called primitive. It is easy to
see that if a Pythgorean triple is primitive, then we furthermore have
gcd(x, y) = gcd(y, z) = gcd(x, z) = 1.
In particular, x and y cannot both be even. They are also not both odd because
z 2 ̸≡ 2 (mod 4). Without loss of generality, one may hence assume that x is odd
and y is even.
Theorem 10.2 (Diophantus). Every primitive Pythagorean triple (x, y, z) with odd
x and even y satisfies
x = r 2 − s2 , y = 2rs, z = r 2 + s2
with integers r, s ∈ Z for which r > s > 0, gcd(r, s) = 1, and r ̸≡ s (mod 2).
Conversely, every pair of such integers r and s yields a primitive Pythagorean
triple of the form
r2 − s2 , 2rs, r2 + s2 .
Proof. Suppose that r and s satisfy the conditions of the theorem and set x = r2 −s2 ,
y = 2rs, and z = r2 + s2 . Then one sees that
2
x2 + y 2 = r4 − 2r2 s2 + s4 + 4r2 s2 = r4 + 2r2 s2 + s4 = r2 + s2 = z 2
and gcd(x, y, z) = 1 because for p | y we have p = 2, p | r, or p | s and thus if p | x
or p | z we have p | gcd(r, s) = 1.
Next suppose that x2 + y 2 = z 2 with gcd(x, y, z) = 1 and y even. Then x and z
are odd and both 21 (x + z) ∈ Z and 12 (x − z) ∈ Z. It follows that
y 2 1
2 2
z+x z−x
= z −x = .
2 4 2 2
Writing d = gcd x+z , x−z
2 2
, we see that d is a divisor of
1 1
(z + x) + (z − x) = z
2 2
and also of
1 1
(z + x) − (z − x) = x.
2 2
Since gcd(x, z) = 1, it follows that d = 1.
81
Due to the uniqueness of prime factorization, since the product of 12 (z + x) and
1
2
(z − x) is a square and they are relatively prime, they are each squares. Thus there
exist r, s ∈ N with r > s such that
1
(z + x) = r2 ,
2
1
(z − x) = s2 .
2
Taking the sum and difference and noting that their product is y2 , we conclude that
z = r 2 + s2 ,
x = r 2 − s2 ,
y = 2rs.
Moreover, since 21 (z + x) = r2 and 21 (z − x) = s2 are relatively prime, we conclude
that gcd(r, s) = 1 and r ̸≡ s (mod 2) due to gcd(x, y, z) = 1. □
Fermat considered whether similar equations were solvable with powers higher
than 2. He famously claimed that xn + y n = z n has no solutions for n > 2 (this
was later known as “Fermat’s Last Theorem” and was only proven 350 years later).
Below is the n = 4 case, which he himself proved.
Proof. The second claim follows from the first claim, since (z 2 )2 = z 4 , so we only
need to show (1). Assume for contradiction that a solution to x4 + y 4 = z 2 in N3
exists, and let (x, y, z) be a solution with minimal z. Due to the minimality, we have
gcd(x, y, z) = 1 because d = gcd(x, y, z) satisfies d2 |z and
x 4 y 4 z 2
+ = 2 .
d d d
Since gcd(x, y, z) = 1, we immediately see that x and y cannot both be even.
Moreover, they cannot both be odd because if they were then
z2 ≡ 1 + 1 ≡ 2 (mod 4),
and this congruence is not solvable. We may therefore assume that x is odd, y is
even, and z is odd.
We next rewrite the equation x4 + y 4 = z 2 as
x4 = z 2 − y 4 = z − y 2 z + y 2 .
82
If it were the case that d = gcd(z − y 2 , z + y 2 ) > 1, then d | 2z, d | 2y 2 , and d | x4 .
Since x is odd, we conclude that d is odd and hence d | z and d | y 2 , yielding a
contradiction to gcd(x, y, z) = 1. Thus gcd(z − y 2 , z + y 2 ) = 1. It follows that
z − y 2 = u4 ,
z + y2 = v4
for some relatively prime u and v. Hence
2y 2 = v 4 − u4 = v 2 + u2 v 2 − u2
and
2z = v 4 + u4 .
Since z is odd and y is even, we also have u ≡ v ≡ 1 (mod 2), from which we
conclude that u2 + v 2 ≡ 2 (mod 8).
Since gcd(u, v) = 1, the integers v 2 − u2 and v 2 + u2 do not have any common
odd divisors > 1. Their product is 2y 2 and it follows that
u2 + v 2 = 2b2 ,
v 2 − u 2 = a2
with relatively prime a, b ∈ N. Since u2 + a2 = v 2 , the integers (u, a, v) are a
Pythagorean triple and Theorem 10.2 implies that there exist relatively prime r and
s satisfying
u = r 2 − s2 ,
a = 2rs,
v = r 2 + s2 .
Since
2 2
2b2 = u2 + v 2 = r2 − s2 + r 2 + s2 = 2 r 4 + s4 ,
it follows that
r 4 + s 4 = b2 .
Hence (r, s, b) is another solution to the original equation. We next show that b < z,
contradicting the minimality of z.
If u = v = 1, then y = 0, which contradicts y ∈ N. Thus u4 + v 4 > u2 + v 2 and
1 4 1 2
u + v4 > u + v 2 = b2 ≥ b.
z=
2 2
This contradicts the minimality of z. □
Questions.
(1) Which n are sums of two squares?
(2) How often is such an n representable (i.e., how many solutions to x2 + y 2 = n
are there?)?
83
Consider the ring (this just means that one can add, subtract, and multiply as usual)
R = Z[i] = {a + bi : a, b ∈ Z} .
The ring R is called the Gaussian integers and we have seen already that it is a
Euclidean domain. The units of R are
The Euclidean function associated to R is the absolute value squared (which we call
the norm) and the norm of an element a + bi ∈ R is N (a + bi) := |a + bi|2 = a2 + b2 ,
giving a connection with sums of squares. For arbitrary x, y ∈ Z, we have
Theorem 10.4. Let R = Z[i] be the ring of Gaussian integers with norm N (z) =
zz = a2 + b2 for z = a + bi ∈ R. For n ∈ N, let
r2 (n) := # (x, y) ∈ Z2 : x2 + y 2 = n
(6) A natural number n is a sum of two squares if and only if the prime factorization
of n does not contain any primes p ≡ 3 (mod 4) raised to an odd power.
(7) The relations
r+1 r −4
ρ2 (p ) = ρ2 (p)ρ2 (p ) − ρ2 (pr−1 )
p
and
X −4
ρ2 (n) =
d
d|n
−4
hold for all primes p and n ∈ N. Here we mean d
= 0 whenever d is even.
Q a
Proof. For part (1), suppose that z ∈ R has the factorizations z = j πj j and
Q b
z = ϵ j πj j with aj , bj ∈ N0 , ε a unit, and πj prime elements in R (since we allow
aj and/or bj to be zero, we can assume that the same primes are appearing). Then
Y Y
|πj |2aj = N (z) = |πj |2bj .
j j
α
By unique factorization over the integers and the fact that N (πj ) = pj j ∈ {pj , p2j }
for some prime pj ∈ Z, we see that
Y 2α a Y 2α b
pj j j = pj j j ,
j j
and ε ∈ {1, −1, i, −i}. For ε = ±1 it follows that ab = 0, so that p = ±a2 or p = ±b2 ,
which is a contradiction. If ε = ±i, then a2 − b2 = 0, and hence p = ±2ab = ±2a2 ,
which contradicts the condition p ≡ 1 (mod 4). We have hence proven (3).
Part (4) follows by direct computation and the fact that 1 + i is irreducible (and
hence prime) because its norm is prime.
We now move on to (5). By definition, ρ2 (n) is the number of equivalence classes
of elements of R with norm n, under the equivalence of associativity. Due to the
unique prime factorization in R, we conclude that ρ2 is multiplicative (if m > 1 and
n > 1 are relatively prime and N (x) = mn, then x = x1 x2 with N (x1 ) = m and
N (x2 ) = n as we can split x into a product of primes with norm dividing m and a
product of primes with norm dividing n).
One obtains from (2) that for p ≡ 3 (mod 4) prime
(
1 for even r,
ρ2 (pr ) =
0 for odd r.
Remarks.
(1) For z ∈ H := {z ∈ C : Im(z) > 0}, Jacobi considered the theta series
∞
2
X
ϑ(z) = eπin z .
n=−∞
For m, n ∈ N, let Sm (n) denote the set of solutions to the equation x21 + x22 +
· · · + x2m = n with xj ∈ Z, i.e.,
then satisfies Y
Df (s) = Df (p, s)
p prime
with
∞
X
Df (p, s) := f (pr )p−rs .
r=0
For the example of the Riemann zeta function
∞
X Y
ζ(s) = n−s = ζ (p, s) ,
n=1 p prime
we see that
∞
X 1
ζ (p, s) = p−rs = ,
r=0
1 − p−s
and hence
Y 1
ζ(s) = .
p prime
1 − p−s
From Theorem 10.4, we have the example
∞
X Y
Dρ2 (s) = ρ2 (n)n−s = Dρ2 (p, s).
n=1 p prime
P∞
For p ̸= 2, we set F (x) = r=0 ρ2 (pr )xr . By Theorem 10.4 (7), we have
−1
1 − ρ2 (p)x + x2 F (x) = 1 + ρ2 (p) − ρ2 (p) x
p
∞
X
r r−1
−1 r−2
r
+ ρ2 (p ) − ρ2 (p)ρ2 p + ρ2 p x = 1.
r=2
p
The factor for p = 2 is
∞
X 1
2−rs = .
r=0
1 − 2−s
Hence we conclude that
−1 Y 1
Dρ2 (s) = 1 − 2−s .
−s + −1
p prime 1 − ρ2 (p)p p
p−2s
p̸=2
88
The factor for p ≡ 3 (mod 4) can also be computed via Theorem 10.4 (5) as
1 1 1
Dρ2 (p, s) = −2s
= ,
1−p 1 − p 1 + p−s
−s
Here (µ) denotes the elements modulo units, which are what is known as princi-
pal ideals (actually all ideals in Z[i] are principal because it is what is known as
a principal ideal domain, but further discussion about this is left for a course in
Algebra).
89
11. Representations of real numbers via continued fractions
A real number x is called irrational if x ∈
/ Q.
Approximation of real numbers by rational numbers For every x ∈ R and
n ∈ N, there exists an m ∈ Z for which
m 1
x− ≤ .
n 2n
Questions. For every x ∈ R, is there a choice of (reduced) rational number m
n
such
m 1 m
that x − n is “significantly smaller” than 2n ? How small can x − n get? Finally,
how does one find m
n
approximating x very well?
m
Theorem 11.1 (Hurwitz). If x ∈ R but x ∈
/ Q, then there are infinitely many n
with gcd(m, n) = 1 and
m 1
x− <√ .
n 5n2
We will prove this later.
u0
Definition. Let u1
be a reduced fraction with u1 > 0. Using the Euclidean algo-
rithm, one has
u0 = u1 a0 + u2 0 < u2 < u1 ,
u1 = u2 a1 + u3 0 < u3 < u2 ,
..
.
uk−1 = uk ak−1 + uk+1 0 < uk+1 < uk ,
uk = uk+1 ak .
The numbers aj satisfy a0 ∈ Z and a1 , . . . , ak ∈ N.
uj
The rational number ζj := uj+1 > 1 satisfies
1
ζ0 = a0 + ,
ζ1
1
ζ1 = a1 + ,
ζ2
..
.
1
ζk−1 = ak−1 + ,
ζk
ζk = ak .
It follows that
u0 1 1 1
= ζ0 = a0 + = a0 + 1 = . . . = a0 + 1 .
u1 ζ1 a1 + ζ2
a1 + a2 + a 1
3+
..
. 1
ak−1 + a1
k
90
This formula is called a continued fraction expansion for uu01 and a0 , a1 , . . . , ak are
called the partial quotients of uu10 .
Suppose that x0 , x1 , . . . , xk ∈ R (one usually assumes that xj ∈ Q) with xj > 0
for 1 ≤ j ≤ k. Then
1
[x0 , x1 , . . . , xk ] := x0 + 1
x1 + x2 + x 1
3+
...
1
xk−1 + x1
k
Proof. Set xj := [aj , aj+1 , . . . , ak ] and yj := [bj , bj+1 , . . . , bn ] for 0 ≤ j ≤ min{k, n}.
For j ≥ 1 we have xj > 0 and yj > 0. Thus we see directly that for j < k
1 1
(11.1) x j = aj + = aj + .
[aj+1 , . . . , ak ] xj+1
We plan to use this identity to apply an inductive argument. First note that from
(11.1), it follows that for every 1 ≤ j ≤ k − 1
xj > aj ≥ 1
and aj < xj < aj + 1 for 0 ≤ j ≤ k − 1. Moreover, xk = ak > 1. Hence aj = ⌊xj ⌋
for 0 ≤ j ≤ k.
91
Similarly, we have
1
y j = bj +
,
yj+1
implying that bj < yj < bj + 1 for 0 ≤ j ≤ n − 1 and bj = ⌊yj ⌋ for 0 ≤ j ≤ n. By
assumption we have x0 = y0 . Hence
a0 = ⌊x0 ⌋ = ⌊y0 ⌋ = b0 .
Now assume that for some 0 ≤ j ≤ min{k, n} we have xj = yj and aj = bj . Then it
follows that
1 1
xj+1 = = = yj+1 ,
x j − aj y j − bj
and hence also
aj+1 = ⌊xj+1 ⌋ = ⌊yj+1 ⌋ = bj+1 .
Inductively, we obtain xj = yj and aj = bj for all 0 ≤ j ≤ min{k, n}. Suppose for
contradiction that k < n. Then it follows that xk = yk and
1
ak = b k = y k − < yk = xk ,
yk+1
which contradicts ak = xk . It follows that k = n, and the proof is complete. □
Theorem 11.4. Suppose that a0 ∈ Z and let a sequence (aν )ν≥1 of integers aν ∈ Z
be given. Define the sequences (hn )n≥−2 and (kn )n≥−2 recursively by
h−2 = 0, h−1 = 1, hn = an hn−1 + hn−2 ,
n≥0
k−2 = 1, k−1 = 0, kn = an kn−1 + kn−2 .
One then sets rn := [a0 , a1 , . . . , an ] for n ≥ 0. Then for all n ≥ 1, the following hold.
(1) For all x ∈ R, x > 0, one has
xhn−1 + hn−2
[a0 , a1 , . . . , an−1 , x] = .
xkn−1 + kn−2
(2) One has rn = hknn with gcd (hn , kn ) = 1.
(3) For every n ≥ −1, one has
hn kn−1 − hn−1 kn = (−1)n−1
and
1
rn − rn−1 = (−1)n−1 .
kn kn−1
(4) For n ≥ 0, we have
hn kn−2 − hn−2 kn = (−1)n an
and
an
rn − rn−2 = (−1)n .
kn kn−2
Proof.
92
(1) For n = 0, the right-hand side is x and the left-hand side is [x] = x. For n = 1,
the left-hand side is
1
[a0 , x] = a0 + ,
x
while the right-hand side is
xh0 + h−1 xa0 + 1 1
= = a0 + .
xk0 + k−1 x x
Now suppose that for some n ≥ 1, we have that (1) holds for all x > 0. Then it
follows that
an + x1 hn−1 + hn−2
1
[a0 , a1 , . . . , an , x] = a0 , a1 , . . . , an−1 , an + =
an + x1 kn−1 + kn−2
x
x (an hn−1 + hn−2 ) + hn−1 xhn + hn−1
= = .
x (an kn−1 + kn−2 ) + kn−1 xkn + kn−1
The result hence follows by induction.
(2) Setting x = an in (1), we obtain
(1) an hn−1 + hn−2 hn
rn = [a0 , a1 , . . . , an ] = = .
an kn−1 + kn−2 kn
We defer the proof that hn and kn are relatively prime to part (3).
(3) We proceed by induction. By definition, we have
h−1 k−2 − h−2 k−1 = 1.
Now suppose that for some n ≥ −1, we have
hn−1 kn−2 − hn−2 kn−1 = (−1)n−2 .
Then we obtain
hn kn−1 − hn−1 kn = an hn−1 + hn−2 kn−1 − hn−1 an kn−1 + kn−2
= − hn−1 kn−2 − hn−2 kn−1 = (−1)n−1 .
Note that this identity combined with Theorem 3.10 (1) implies that gcd(hn , kn ) =
1 for all n ≥ −1, and we furthermore conclude that
hn hn−1 hn kn−1 − hn−1 kn (−1)n−1
rn − rn−1 = − = = .
kn kn−1 kn kn−1 kn kn−1
(4) We begin by noting that
h0 k−2 − h−2 k0 = a0
and
hn kn−2 − hn−2 kn = an hn−1 + hn−2 kn−2 − hn−2 an kn−1 + kn−2
= an hn−1 kn−2 − kn−1 hn−2 = (−1)n an .
93
Therefore
hn hn−2 hn kn−2 − hn−2 kn (−1)n an
rn − rn−2 = − = = .
kn kn−2 kn kn−2 kn kn−2
□
Remark 11.5. The recursion defining kn and hn may be written in matrix form as
hn hn−1 hn−1 hn−2 an 1
= .
kn kn−1 kn−1 kn−2 1 0
We next discuss how to construct infinite continued fractions via limits.
Theorem 11.6. Suppose that a0 ∈ Z and let a sequence a1 , a2 , a3 , . . . of positive
integers (aj ∈ N for j ≥ 1) be given. Letting rn := [a0 , a1 , . . . , an ] denote the
value of the continued fraction formed by the sequence up to n for n ≥ 0, one has
r0 < r2 < r4 < . . . < r5 < r3 < r1 and the limit limn→∞ rn exists.
Proof. By Theorem 11.4 (4), we have rn − rn−2 > 0 for even n and rn − rn−2 < 0 for
odd n. By Theorem 11.4 (3), we also have r2n − r2n−1 < 0. Thus we conclude that
r2n < r2n+2ν < r2n+2ν−1 < r2n−1 ∀n, ν ∈ N.
Moreover, Theorem 11.4 (3) also implies that
(−1)n−1
rn − rn−1 =
kn kn−1
and kn → ∞ because it is strictly increasing by its recursive definition (as an > 0).
Thus the limits of the subsequences from n even and n odd equal each other (they
both exist because they are monotone and bounded) and the limit limn→∞ rn hence
exists (and is equal to the limit of each of these subsequences). □
Definition. Let a0 ∈ Z and a sequence a1 , a2 , . . . with aj ∈ N for j ≥ 1 be given.
The value of the regular infinite continued fraction [a0 , a1 , a2 , . . .] is the limit
lim [a0 , a1 , . . . , an ].
n→∞
One calls rn the n-th convergent of [a0 , a1 , . . .]. The sequences (hn )n and (kn )n from
Theorem 11.4 satisfy rn = hknn and gcd (hn , kn ) = 1.
Questions.
• Is every irrational number the limit of an infinite continued fraction?
• Is it possible for different (regular) infinite continued fractions to have the
same value?
• Can rational numbers be expressed as regular infinite continued fractions?
In other words, can an infinite continued fraction have a rational value?
Theorem 11.7. The value of every regular infinite continued fraction is irrational.
94
Proof. Set ϑ = [a0 , a1 , a2 , . . .] and let hn , kn , and rn be as in Theorem 11.4. By
Theorem 11.6, we have r2n < ϑ < r2n+1 , and hence
0 < |ϑ − rn | < |rn+1 − rn | .
Multiplying by kn , Theorem 11.4 (parts (2) and (3)) therefore imply that
1
0 < |kn ϑ − hn | < kn |rn+1 − rn | = .
kn+1
Suppose for contradiction that ϑ ∈ Q. Then ϑ = ab with a ∈ Z and b ∈ N.
Multiplying the previous inequality by b, it follows that
b
0 < |kn a − hn b| <
kn+1
b
for all n. Since b is fixed independent of n and kn+1 → ∞ as n → ∞, the ratio kn+1
is less than 1 for n sufficienty large. However, since kn , a, hn , and b are integers, so
is kn a − hn b. It follows that for n sufficiently large, |kn a − hn b| is an integer between
0 and 1, which is a contradiction. We therefore conclude that ϑ ∈ / Q. □
Infinite continued fractions also satisfy relations similar to those given in Remark
11.2.
Theorem 11.9. Two distinct regular infinite continued fractions have different val-
ues.
Theorem 11.10 (Continued fraction algorithm). Every irrational number has pre-
cisely one continued fraction representation as a regular infinite continued fraction
ϑ = [a0 , a1 , . . .]. One obtains an recursively via
ϑ0 = ϑ, a0 = ⌊ϑ0 ⌋,
1
ϑn = ϑn−1 −an−1 , an = ⌊ϑn ⌋ for n ≥ 1.
Proof. The uniqueness was already shown in Theorem 11.9. Since ϑ ∈ / Q, the
recursive formula in the theorem gives an infinite sequence (an )n≥0 with well-defined
integers an ∈ Z, as ϑn ∈
/ Z for all n. Since an < ϑn for every n ≥ 0, we see from the
recursion that an ∈ N for all n ≥ 1. It follows that
1 1
ϑ = ϑ0 = a0 + = [a0 , ϑ1 ] = a0 , a1 + = [a0 , a1 , ϑ2 ].
ϑ1 ϑ2
Continuing inductively, it follows that ϑ = [a0 , a1 , a2 , . . . , an−1 , ϑn ] for all n ≥ 1.
Theorem 11.4 (1) then implies that
ϑn hn−1 + hn−2
ϑ = [a0 , a1 , a2 , . . . , an−1 , ϑn ] = .
ϑn kn−1 + kn−2
Using parts (2) and (3) of Theorem 11.4, it now follows that
ϑn hn−1 + hn−2 hn−1 hn−2 kn−1 − hn−1 kn−2 (−1)n+1
ϑ − rn−1 = − = = .
ϑn kn−1 + kn−2 kn−1 kn−1 (ϑn kn−1 + kn−2 ) kn−1 (ϑn kn−1 + kn−2 )
Since ϑn > 1 and kn → ∞, we conclude that limn→∞ rn = ϑ, or in other words
ϑ = [a0 , a1 , . . .]. □
Examples 11.11.
(1) We next explicitly compute ϑ = [1, 1, 1, . . .]. We note that
1
ϑ = [1, ϑ] = 1 + ,
ϑ
and hence
ϑ2 − ϑ − 1 = 0
There are two roots of this polynomial obtained by the quadratic formula and
combining with the fact that ϑ > 1, we obtain
1 √
ϑ= 1+ 5 .
2
96
(2) Note that 3.1415926 < π < 3.1415927. The continued fraction algorithm for
ϑ = π yields a0 = 3 and
7.06251099 < ϑ1 < 7.06251598, a1 = 7,
15.9959104 < ϑ2 < 15.997187, a2 = 15,
1.0028211 < ϑ3 < 1.0040251, a3 = 1.
These yield the following rational approximations to π (see the rn row):
n −2 −1 0 1 2 3
an - - 3 7 15 1
hn 0 1 3 22 333 355
kn 1 0 1 7 106 113
rn - - 3 22 7
333
106
355
113
97
12. Approximation of real numbers with rational numbers
We continue to use the same notation as in Section 11, and in particular the notation
from Theorem 11.4. For ϑ = [a0 , a1 , a2 , . . .], we have
h−2 = 0, h−1 = 1, hn = an hn−1 + hn−2 ,
k−2 = 1, k−1 = 0, kn = an kn−1 + kn−2 .
Theorem 12.1. Let ϑ ∈ R be irrational (i.e., ϑ ∈ / Q). By Theorem 11.10, we may
write ϑ uniquely as a regular infinite continued fraction. Then for every n ≥ 1, the
following approximations for the continued fraction yielding ϑ hold:
hn 1
(1) ϑ − kn
<kn kn+1
.
1
(2) |ϑkn − hn | < kn+1 .
(3) ϑ − hkn+1
n+1
< ϑ − hknn .
(4) |ϑkn+1 − hn+1 | < |ϑkn − hn | .
Proof. Writing ϑn and an as in the continued fraction algorithm in Theorem 11.10,
we have
1 1 1
ϑ − rn = < = .
kn (ϑn+1 kn + kn−1 ) kn (an+1 kn + kn−1 ) kn kn+1
This is part (1).
Part (2) follows directly by multiplying part (1) by kn . Furthermore,
ϑn+1 kn + kn−1 < (an+1 + 1) kn + kn−1 = kn+1 + kn ≤ an+1 kn+1 + kn = kn+2 ,
and hence
hn 1 1
ϑ−
= > .
kn kn (ϑn+1 kn + kn−1 ) kn kn+2
Multiplying by kn and using part (2) then yields
1
ϑkn − hn > > ϑkn+1 − hn+1 ,
kn+2
which is part (4).
Finally, we obtain (3) by noting that kn < kn+1 and hence from part (4)
hn+1 1 1 hn
ϑ− = ϑkn+1 − hn+1 < ϑkn − hn = ϑ − .
kn+1 kn+1 kn kn
□
We next show that the approximations in Theorem 12.1 are in some sense best
possible.
a
Theorem 12.2. Suppose that ϑ ∈ R is irrational and b
is a reduced fraction with
a ∈ Z and b ∈ N. Then the following hold
a hn
(1) If ϑ − b
< ϑ− kn
for some n ≥ 1, then it must be the case that b > kn .
98
(2) If |ϑb − a| < |ϑkn − hn | for some n ≥ 1, then one must have b ≥ kn+1 .
Proof. Suppose for contradiction that (2) does not hold. Then for some n ≥ 1, we
have
|ϑb − a| < |ϑkn − hn |
and b < kn+1 . The system of equations
(
xkn + ykn+1 = b
xhn + yhn+1 = a
have a unique solution x, y ∈ Z by Theorem 11.4 (3) (the determinant of the cor-
repsonding matrix is 1).
If it were the case that x = 0, then it follows that ykn+1 = b, and hence b ≥ kn+1 ,
which is a contradiction. Similarly, if y = 0, then xkn = b and xhn = a, implying
that
|ϑb − a| = |x| |ϑkn − hn | ≥ |ϑkn − hn |
which contradicts the inequality assumed above. Hence we have xy ̸= 0.
If y < 0, then it follows that
xkn = b − ykn+1 > 0,
and hence x > 0. If y > 0, then
xkn = b − ykn+1 < 0,
and hence x < 0. In particular, we conclude that xy < 0.
By Theorem 11.6, the differences ϑ − rn and ϑ − rn+1 have opposite signs (one
is positive and the other is negative), and thus by Theorem 11.10, the differences
ϑkn − hn and ϑkn+1 − hn+1 also have different signs. Therefore x (ϑkn − hn ) and
y (ϑkn+1 − hn+1 ) have the same signs. We then compute
x (ϑkn − hn ) + y (ϑkn+1 − hn+1 ) = ϑb − a,
and hence
|ϑb − a| = |x (ϑkn − hn )| + |y (ϑkn+1 − hn+1 )|
> |x (ϑkn − hn )| ≥ |ϑkn − hn | ,
contradicting the assumption. We have hence concluded part (2).
If part (1) did not hold, then there would be an n ≥ 1 for which
a hn
ϑ− < ϑ−
b kn
and b ≤ kn . Multiplication with b then yields
a hn
|ϑb − a| = b ϑ − < kn ϑ − = |ϑkn − hn |
b kn
and the inequality kn < kn+1 contradicts part (2). □
99
a
Theorem 12.3. Let an irrational ϑ ∈ R \ Q and a rational b
∈ Q with a ∈ Z,
b ∈ N, and gcd(a, b) = 1 be given. If
a 1
ϑ− < 2
b 2b
a
is satisfied, then b
is a convergent in the continued fraction expansion of ϑ.
ϑb − a ≥ ϑkn − hn ,
and hence
hn b a 1
ϑ− ≤ ϑ− < .
kn kn b 2bkn
a
Since b
̸ rn , we have bhn − akn ∈ Z \ {0}, and thus |bhn − akn | ≥ 1. However,
=
1 |bhn − akn | hn a hn a 1 1
≤ = − ≤ ϑ− + ϑ− < + 2,
bkn bkn kn b kn b 2bkn 2b
and hence 2bk1 n < 2b12 , so that k1n < 1b . It follows that b < kn , which contradicts the
choice of n. It therefore follows that ab = rn . □
x
> 2
5 − 1 .
1
Proof. For x ≥ 1, the map x 7→ x + is strictly monotone. Using
x
√
1 2 1− 5 1 √
√ = = 5 − 1 ,
1 1−5
2
1+ 5 2
√
we see that at the point x = 12 1 + 5 , this function has the value
1 √ 1 1 √ 1 √ √
1+ 5 + 1 √ = 1+ 5 + 5 − 1 = 5.
2 2
1+ 5 2 2
√
Hence if the value
√ for some x ≥ 1 is smaller than 5, it must be the case that
x < 12 1 + 5 .
□
100
Theorem 12.6 (Hurwitz). For every irrational number ϑ, there are infinitely many
rational numbers hk satisfying
h 1
ϑ− <√ .
k 5k 2
Moreover, for every sequence of three consecutive convergents from the continued
fraction expansion of ϑ, at least one them satisfies the above inequality.
kn h hn−1
Proof. We set qn := kn−1 and claim that if the inequality does not hold for k
= kn−1
h hn 1
√
and k = kn , then qn + qn < 5.
To see this claim, note that by assumption
Thm. 11.4 (3)
1 1 ↓ 1
√ 2
+ √ ≤ |ϑ − rn−1 | + |ϑ − rn | = |rn − rn−1 | = .
5kn−1 5kn2 ↑ kn−1 kn
rn −1<ϑ<rn
or rn <ϑ<rn−1
We relate the two values via the relations defining kn ; namely, we have
kn+1 an+1 kn + kn−1 1
qn+1 = = = an+1 + .
kn kn qn
Since an+1 ≥ 1, it follows that
1 √ 1 1 1 √ 1 √
5 + 1 > qn+1 = an+1 + ≥1+ >1+ 5−1 = 5+1 ,
2 qn qn 2 2
which is a contradiction. □
101
√
Theorem 12.7. For every c > 5, there exist irrational numbers ϑ for which
h 1
ϑ− < 2
k ck
holds for only finitely many rational numbers hk .
√
Proof. One chooses ϑ = 12
5 + 1 independent of c. Note that we have the contin-
ued fraction expansion ϑ = [1, 1, 1, . . .] (see the example at the end of Section 11).
Let hk ∈ Q satisfying ϑ − hk < ck12 be given. By Theorem 12.3, it follows that there
exists n ≥ 0 such that
h hn
= rn = .
k kn
Using the fact that aj = 1 for all j and arguing by induction, we see that kn = hn−1 .
Hence
kn−1 kn−1 1 1 √
lim = lim = = 5 − 1 = ϑ − 1.
n→∞ kn n→∞ hn−1 ϑ 2
Recall the definitions ϑ0 = ϑ and
1 1
ϑn = = .
ϑn−1 − an−1 ϑn−1 − 1
One then inductively obtains that ϑn = ϑ for all n. It follows that
√
kn−1
lim ϑn+1 + = 2ϑ − 1 = 5.
n→∞ kn
√
Since c > 5, there are only finitely many n for which ϑn+1 + kn−1 kn
> c. Following
the proof in Theorem 11.10, we conclude that
hn 1 1
ϑ− = <
kn kn2 ϑn+1 + kn−1 ckn2
kn
103
13. Periodic continued fractions
Definition. An infnite regular continued fraction [a0 , a1 , . . .] is called periodic if
there exist integers r ≥ 0 and s > 0 for which ak+s = ak for all k ≥ r. One writes
[a0 , a1 , a2 , . . .] = [a0 , a1 , . . . , ar−1 , ar , . . . , ar+s−1 ].
We have seen one such example; namely
1 √
5 + 1 = [1, 1, 1, . . .] = [1].
2
In the case that r = 0, we call the continued fraction purely periodic. We call α ∈ C
a quadratic irrationality if α ̸∈ Q and α is a root of a non-zero quadratic polynomial
P (x) = ax2 + bx + c with a, b, c ∈ Z. The second root of this polynomial is called
its conjugate and √is written as α′ . If α ∈
/ R, then α′ = α is the√ complex conjugate,
while for α = 2a ∈ R (for D = b2 − 4ac) we have α′ = −b∓2a D .
−b± D
Proof.
(1) Suppose that α = [b0 , . . . , br−1 , a0 , . . . , an−1 ] and set ϑ = [a0 , a1 , . . . , an−1 ] and
let let hn and kn be defined for ϑ as in Theorem 11.4. Then by Theorem 11.4
(1), we have
ϑhn−1 + hn−2
ϑ = [a0 , . . . , an−1 , ϑ] = ,
ϑkn−1 + kn−2
and hence
kn−1 ϑ2 + kn−2 − hn−1 ϑ − hn−2 = 0.
Theorem 13.4. Suppose that d ∈ N is not a square and √ r is the length of the
shortest period in the continued fraction expansion of α = d. Then
√
d = [a0 , a1 , . . . , ar−1 , 2a0 ]
√ √
with a0 = ⌊ d⌋. If one writes αn = q1n (mn + d) as in the recursion in the proof of
Theorem 13.1, then we have qn ̸= −1 for all n, and qn = 1 precisely when r | n.
1 √ √
mℓr + d = γℓr = γ0 = a0 + d
qℓr
j√ k
we have qℓr = 1 and mℓr = a0 = d for all ℓ ∈ N0 .
Suppose that for some n ≥ 0 we have qn = 1. then we have
√
γn = mn + d = [cn , cn+1 , . . . , cn+r−1 ].
By Theorem 13.3, it follows that γn > 1 and −1 < γn′ < 0, and hence
√
−1 < γn′ = mn − d < 0.
Thus
√
−mn − 1 < − d < −mn ,
√
⇐⇒ mn < d < mn + 1,
j√ k
so that mn = d . It follows that
j√ k √
γn = d + d = γ0
Multiplying the left-hand side by the denominator of√ the right-hand side, we consider
both sides as elements
√ of the vector space Q · 1 + Q · d and compare the coefficients
√ front of 1 and d; since d is not a square, these coefficients must match (as 1 and
in
d are linearly independent over Q). Comparing the rational and irrational parts
in this way yields
mn+1 kn + qn+1 kn−1 − hn = 0,
mn+1 hn + qn+1 hn−1 − dkn = 0.
Eliminating mn+1 and using Theorem 11.4 (3) yields
h2n − dkn2 = qn+1 (hn kn−1 − hn−1 kn ) = (−1)n+1 qn+1 ∀n ≥ −1.
Making the change of variables n → nr − 1 and recalling that qnr = 1, we obtain
h2nr−1 − dknr−1
2
= (−1)nr qnr = (−1)nr .
□
Remark
√ 13.6. We see from Theorem 13.5 that the continued fraction expansion of
d leads to sollutions of the equation
x2 − dy 2 = N
for certain N . For N ∈ Z and d ∈ N, such equations are called Pell
√ equations. The
choices N ∈ {1, −1, 4, −4} are of particular interest. For |N | < d, we next show
in Theorem 13.7 below how to obtain all solutions of the Pell equation.
Theorem 13.7. Suppose that d ∈ N is not a square and let hknn be the convergents of
√ √
d from its continued fraction expansion. If N ∈ Z satisfies |N | < d and N ̸= 0,
then for every solution x, y ∈ N to the Pell equation x2 −dy 2 = N with gcd(x, y) = 1,
there exists an n such that x = hn and y = kn .
109
√ √
Proof. Let ρ, σ ∈ R and X, Y ∈ N be given with 0 < σ < ρ, ρ ̸∈ Q, gcd(X, Y ) =
1, and X 2 − ρY 2 = σ. Then we have
X √ σ
− ρ= √ > 0,
Y Y (X + ρ Y )
√
and hence Y X√ρ > 1. Since σ < ρ, we conclude that
√
X √ σ ρ 1 1
0< − ρ= √ < √ = < .
Y Y (X + ρ Y ) Y (X + ρ Y ) Y 2 Y X√ρ + 1 2Y 2
and hence
x2n − dyn2 = (−1)nr .
Suppose that a pair (s, t) ∈ N2 give a solution to x2 − dy 2 = ±1. By the choice of
the pair (x1 , y1 ), we have
√ √
s + t d > x1 + y1 d = ε > 1.
Hence there exists a unique n ∈ N for which
√ √ √
εn = xn + yn d ≤ s + t d < xn+1 + yn+1 d = εn+1 .
Multiplication by ε−n = (−1)nr (ε′ )n yields
√ −n √
(13.1) s+t d ε =x+y d
n −2 −1 0 1 2 3 4 5
an - - 4 2 1 3 1 2
hn 0 1 4 9 13 48 61 170
kn 1 0 1 2 3 11 14 39
By Theorem 13.8, the choice x1 = 170 and y1 = 39 gives the smallest positive
solution.
112
14. Primes and their distribution
Question. Which primes can be expressed via a given polynomial?
Theorem 14.1. Suppose that f ∈ C[x]. If all values of f for sufficiently large
integer inputs are all primes, then f is constant.
Proof. Suppose that there exists n0 ∈ Z such that f (n) is prime for all n ≥ n0 .
Then it follows that f ∈ Q[x] (one can plug in different choices of n and consider
the identities as a linear system of equations and multiply by the inverse matrix).
Thus there exists an m ∈ N for which mf ∈ Z[x]. Set p0 = f (n0 ). Then by Taylor’s
formula, for every t ∈ N0 we have
X mf (ℓ) (n0 ) ℓ−1
f n0 + tmp0 = tp0 mtp0 .
ℓ≥0
ℓ!
(ℓ)
Since mf ∈ Z[x], we conclude that mf ℓ!(n0 ) ∈ Z, and therefore for every ℓ ≥ 1 every
term is divisible by p0 . Since f (n0 + tmp0 ) is prime and also divisible by p0 , we
conclude that
f (n0 + tmp0 ) = p0 .
However, Rolle’s Theorem then tells us that there must be a root of f ′ (x) = 0
between n0 + (t − 1)mp0 and n0 + tmp for every t. Thus f ′ has infinitely many roots
and is therefore identically zero. It follows that f (x) is constant, and in particular
f (x) = p0 for all x. □
Remark 14.2. This was later generalized by R.-C. Buck (1946) to include rational
functions (ratios of polynomials).
Question. How large can the gaps between two successive primes be?
Theorem 14.3. The sequence of differences between successive primes is unbounded.
Proof. Let N ∈ N be arbitrary, 2 ≤ n ≤ N + 1 and set xn := (N + 1)! + n. Then
n | xn for xn ̸= 1, n and thus xn is not prime. Hence there exists a gap of at least
length N between the prime before (N +1)!+2 and the prime after (N +1)!+N . □
Example 14.4 (Young and Pottler, 1989). After the prime 42842283995351, there
are precisely 777 composite numbers.
In the last sum, we have µ(d) = ±1, since P (y) is squarefree. Moreover, the number
of summands is 2π(y) (every prime may appear to the power 0 or 1). Therefore we
have
√ X jxk
π(x) = π x − 1 + µ(d) .
√ d
d|P ( x)
One can use this to compute the number of primes up to x if one knows the number
√
of primes up to x.
Problem. Since there are many summands to compute, the above method is not very
effective at determining π(x).
Improvements were made over time:
• Meissel, 1870
• Lehmer, 1959
• Lagarias, Miller, Odlyzko, 1985
Euler: The number of primes is “a lot less” than the number of integers. By using
the above formula, one can show that
x
π(x) = O for x → ∞.
log log x
x
We will later prove that π(x) = O log x . The fact that π(x) is asymptotically the
same as logx x is a theorem known as the Prime Number Theorem.
We next consider twin primes; these are primes which have a gap of precisely 2
between them. We define
π2 (x) = # {n ∈ N, n, n + 2 prime, n + 2 ≤ x} ,
114
n o
Q2 (x, y) = # n ∈ N, n ≤ x, gcd n(n + 2), P (y) = 1 .
Similarly to the case of π(x), one can show that
√ √
π2 (x) − π2 x + 2 = Q2 x − 2, x
for x ≥ 9.
√
Problem. It is hard to compute Q2 (x − 2, x). It is conjectured that there are
infinitely many twin primes, but this has not yet been proven.
Remark 14.5. One can show that p prime p−1 diverges. On the other hand, one can
P
k
2k+2
(14.2) π 2 < .
k
We prove (14.2) by induction. For k = 1 we have
π(2) = 1 < 23 ,
for k = 2 we have
24
π(4) = 2 < = 8,
2
and for k = 3 we have
25 32
π(8) = 4 <
= .
3 3
Now suppose that (14.2) holds for k. Then (14.1) implies that
k+1
k
2k+1 log 2 2k+2 2k+1 3 · 2k+1
π 2 ≤π 2 + < + = .
log 2k k k k
116
2k+3
This is ≤ k+1
if and only if
3 · 2k+1 2k+3
≤ ⇔ 3(k + 1) ≤ 4k,
k k+1
which is satisfied for k ≥ 3. Hence we conclude (14.2).
Suppose now that k ∈ N is chosen so that 2k−1 < x ≤ 2k . Then, since π is monotone
increasing, (14.2) implies that
k
2k+2
π (x) ≤ π 2 < .
k
log(x)
Since 2k−1 < x and k ≥ log(2)
, we have
2k+2 x
< 8 log(2) .
k log(x)
□
Remark 14.7. Using the so-called sieve method, Brun showed that
x
π(x) = O .
log x
For twin primes he showed that
x
π2 (x) = O .
log2 x
As noted above, the Prime Number Theorem states the pi(x) is asymptotically
x
log(x)
.Although we won’t prove that statement in this class, as a step in that
direction one can show a lower bound for π(x) in a manner similar to Theorem 14.6.
Lemma 14.9 (partial summation). Let (an )n∈N be an arbitrary sequence of complex
numbers and (tn )n∈N be a sequence of real numbers which are strictly increasing and
unbounded. Set X
A(t) := an
n
tn ≤t
We then split the first sum into two pieces and make the shift n → n − 1 in the first
sum to rewrite this as (recalling that A(x) = A(tN ))
N
X N
X N
X
A (tn−1 ) g (tn ) − A (tn ) g (tn ) + A (x) g (x) = − an g (tn ) + A (x) g (x) .
n=2 n=1 n=1
□
Corollary 14.10. Suppose that g : [1, ∞) → C is continuously differentiable and
N ∈ N. Then
N Z N Z N
X 1 1
g(n) = g(t)dt + g(1) + g(N ) + t − ⌊t⌋ − g ′ (t)dt.
n=1 1 2 1 2
In particular,
1
log (N !) = N log N − N +
log N + O(1).
2
Proof. We use Lemma 14.9 with an = 1 and tn = n. Combining partial summation
with integration by parts, we then have
X Z x
g(n) = ⌊x⌋ g(x) − ⌊t⌋ g ′ (t)dt
1≤n≤x 1
Z x Z x
1 ′ 1
= ⌊x⌋ g(x) − t− g (t)dt + t − ⌊t⌋ − g ′ (t)dt
1 2 1 2
↖
endpoint
int. by Z x
z }| { Z x
parts 1 1 1
= g(t)dt + g(1) + ⌊x⌋ + − x g(x) + t − ⌊t⌋ − g ′ (t)dt.
1 2 2 1 2
Choosing x = N ∈ N, the first claim follows.
118
For the second claim, we choose g(x) = log(x). Since x log x − x is an antiderivative
of log x (by integration by parts), we have
N Z N Z N
X 1 1 −1
log (N !) = log n = log tdt + log N + t − ⌊t⌋ − t dt
n=1 1 2 1 2
1
= N log N − N + log N + O(1).
2
We then use
Z N N −1 Z n+1
1 −1 X 1 −1
t − ⌊t⌋ − t dt = t − ⌊t⌋ − t dt
1 2 n=1 n
2
N −1 Z 1 N −1 Z 1 !
t − 12
X 1 1 X
= (t + n) − ⌊t + n⌋ − dt = dt
n=1 0 2 t+n n=1 0 t+n
N −1 Z 1 N Z 1 !
t + n − n + 12 n + 12
X X
= dt = 1− dt
n=1 0 t + n n=1 0 t + n
N −1
X 1
= 1− n+ log (1 + n) − log n
n=1
2
N −1 !
X 1 1
= 1− n+ log 1 +
n=1
2 n
| {z }
∞
X n−k 1 1 1
(−1)k+1 = − 2 + 3 − ...
k=1
k n 2n 3n
| {z }
1 −2
1
+ 3n1 2 + 2n1
− 4n1 2 + . . . = − 12 n + O (n−3 )
= 1 − 1 − 2n
N −1 −1
N
!
1 X 1 X 1
=− +O
12 n=1 n2 n=1
n3
= O (1) .
N −1 ∞
X 1 X 1
< = ζ(2) = O (1) ,
n=1
n2 n=1
n 2
N −1
X 1
< ζ(3) = O (1) ,
n=1
n3
119
where we use the fact that ζ (s) converges absolutely for Re (s) > 1. We then obtain
the claim by plugging in the Taylor expansion of the logarithm; namely, we use
∞
X xk
log (1 + x) = (−1)k+1 .
k=1
k
1
The average size of p
for p < x is considered in the following lemma.
log pn
Proof. We use Lemma 14.9 where tn = pn is the n-th prime, an = pn
, and g(t) =
1
log(t)
. This yields
X 1 X log p 1 Z x
A(x) A(t)
= = + 2 dt.
p≤x
p p≤x
p log p log x 2 t log t
where Z ∞
a (t)
B= dt + 1 − log log(2).
2 t log2 (t)
□
We next recall some complex analysis for those who have seen it and introduce some
of it for those who have not. Roughly speaking, complex analysis is the study of
so-called holomorphic functions from a subset U ⊆ C of the complex numbers to
C. For an open set U ⊆ C and z0 ∈ U , a function f : U → C is called complex
differentiable at z0 if the limit
f (z0 + h) − f (z0 )
f ′ (z0 ) := lim
h→0 h
h+z0 ∈U
exists; here h is any element of C such that h + z0 ∈ U . One calls the function f
holomorphic at z0 if there exists an open neighborhood of z0 such that f is complex
differentiable for every point in this open neighborhood.
Writing z = x + iy, a theorem in complex analysis states that the function
f (x + iy) = u (x, y) + iv (x, y) ,
with u : R2 → R and v : R2 → R is complex differentiable if and only if u and v both
have continuous first-order partial derivatives in both variables and they satisfy the
Cauchy–Riemann differential equations
∂u ∂v ∂u ∂v
= , =− .
∂x ∂y ∂y ∂x
∂ 1 ∂
Writing ∂z
:= 2 ∂x
− 2i ∂y
∂
and ∂
∂z
:= 1 ∂
2 ∂x
+ 2i ∂y
∂
, the Cauchy–Riemann equations state
that
∂
f (z) = 0.
∂z
Remark 14.15. Note that
∂ ∂
z=1 z=0
∂z ∂z
∂ ∂
z=0 z = 1,
∂z ∂z
which explains the notation. Roughly speaking, the Cauchy–Riemann equations are
satisfied if the functions have no “z contribution”.
Examples 14.16.
• Polynomials.
• The exponential function
∞
X zn
exp (z) = .
n=0
n!
The following functions are not holomorphic:
122
• z 7→ |z|
• z→ 7 z
• z 7→ Re(z) or z 7→ Im(z)
Properties
• The space of holomorphic functions is closed under addition and multiplica-
tion.
• If g (z0 ) ̸= 0 and g is complex differentiable at z0 , then so is g1 .
• The sum, product, quotient, and chain rules all hold for complex differenti-
ation.
A function f : D → C (D ⊆ C is usually assumed, but one can more generally take
a metric space) is called analytic at x0 ∈ D if there exists a Taylor-like series
∞
X
f (x) = an (x − x0 )n
n=0
is called the Laurent series at τ . The coefficient a−1 is called the Residue of f at τ .
P (z)
Example 14.17. Rational functions Q(z)
with polynomials P and Q are meromorphic.
123
We next define another holomorphic function Γ (z) known as the Gamma function.
For n ∈ N we define
Γ (n) = (n − 1)!.
This is generalized for z ∈ C with Re(z) > 0 by
Z ∞
Γ (z) = tz−1 e−t dt.
0
This can be shown to converge absolutely (for Re(z) > 0) and is holomorphic in
that region. Note furthermore that, using integration by parts, the Gamma function
satisfies a functional equation (for Re(z) > 0)
Z ∞ Z ∞
z −t
Γ (z + 1) = t e dt = z tz−1 e−t dt = zΓ (z) .
0 0
Using this to define Γ(z) for z with real part between −1 and 0 with z ̸= 0 by
Γ(z) := Γ(z+1)
z
, we may extend the function to Re(z) > −1. Continuing in this way
and taking PΓ = −N0 (because we cannot divide by zero, we must leave out these
points), we obtain a meromorphic function on the entire set C (this is called analytic
continuation and the continuation is unique by the identity theorem). Specifically,
n
it is a meromorphic function with simple poles in −N0 having residue (−1) n!
.
Theorem 14.18. The Riemann zeta function has a meromorphic continuation to
the entire complex plane and satisfies the functional equation
1−s s−1
s s
ζ (1 − s) Γ π 2 = ζ(s)Γ π− 2 .
2 2
The following theorem was proven by Hadamard and la Vallé Poussin in 1896 at
about the same time.
Theorem 14.19 (Prime Number Theorem). As x → ∞, we have
x
π(x) ∼ .
log x
We prove the prime number theorem in 6 steps.
Step 1: We first show that the convergence of the sequence
!
X log p
− log n
p≤n
p
n=1,2,...
Step 2: Bound of the error from the “tail” of the zeta function.
Proof. Using partial summation (Lemma 14.9) with g (t) = t−s , tn = n, and an = 1
yields for x ≥ N
X Z x
−s −s
n = ⌊x⌋ − N + 1 x + s ⌊t⌋ −N + 1 t−s−1 dt.
N ≤n≤x N |{z}
=t−{t}
The first two summands in the integral can be explicitly evaluated via
Z ∞ −s+1
N −s+1
−s −s−1
N 1
s t − Nt dt = s − = N 1−s .
N s − 1 s s − 1
□
Remark 14.21. One can show that the integral appearing in the lemma is a holo-
morphic function for σ > 0.
As noted in Remark 14.21, one can show that the integral is holomorphic for σ > 0.
From the term 1/(s − 1), we see that ζ(s) has a simple pole at s = 1 with residue
1 and no other poles with σ > 0. Note that by the identity theorem in complex
analysis, we know that this continuation is unique.
It remain to show that ζ(1 + it) ̸= 0 for real t ̸= 0. Assume for contradiction that
the zeta function vanishes at 1 + it0 with t0 ∈ R \ {0}. We consider the one-variable
function ζ(σ + it0 ) and take the Taylor expansion around σ = 1. This yields
ζ(σ + it0 ) = (σ − 1)ζ ′ (1 + it0 ) + . . . .
Since ζ has a pole at s = 1 with residue 1, the Laurent expansion of ζ(s) around
s = 1 is given by
ζ(s) = (s − 1)−1 + . . . .
Now define the function
Z(s) = ζ(s)3 ζ(s + it0 )4 ζ(s + 2it0 ).
From above, the function Z(s) is holomorphic for σ > 1 and meromorphic for σ > 0.
Moreover, the function Z(s) vanishes at the point s = 1 because ζ(s)3 has a pole of
order 3 but ζ(s + it0 )4 vanishes to order 4. Thus as σ → 1, we have
log |Z (σ)| → −∞.
We now use the product expansion (from Theorem 14.14) of ζ and expand the
logarithm for σ > 1.
In complex analysis, the complex logarithm can be defined such that for |z| < 1
one has
∞
X zj
Log (1 − z) = − .
j=1
j
Moreover,
Re (Log (z)) = log |z|.
Hence
∞ ∞
X
−s
XX 1 −js
X
Log (ζ(s)) = − log 1 − p = p = an n−s ,
p p j=1
j n=1
One can use the Weierstrass majorant criterion to show that the function
s(s − 1) ∞
Z
1
gp (s) = + 1 − {t} t−s−1 dt
ps 1 − ps p p
Thus
!
X log p p 1 1 X log p X
D(s) = + gp (s) = + gp (s) log p ,
p
p 1−s ps − 1 s−1 p
p s−1
p
and the last series is absolutely and locally uniformly convergent and thus holomor-
phic for Re(s) > 12 . Set
X
h(s) = gp (s) log p,
p
129
so that !
1 X log p
D(s) = + h(s) .
s−1 p
ps − 1
We previously showed that
X
log ζ(s) = − log 1 − p−s ,
p
and hence ′
1 ζ (s)
D(s) = − + h(s) .
s−1 ζ(s)
By Theorem 14.22, the right-hand side is holomorphic for σ ≥ 1 up to a double pole
at s = 1 (since ζ has a simple pole at s = 1). Since ζ has residue 1 at s = 1, the
principal part (the part that grows) of the Laurent expansion of D(s) at s = 1 is
1 c
D(s) = 2
+ + ...
(s − 1) (s − 1)
for some constant c. Now set
D(s)
e := D(s) + ζ ′ (s) − cζ(s).
e is holomorphic for σ ≥ 1 and we have
Then D
X∞
D(s) =
e an − log n − c n−s .
n=1
In particular, setting
fn := an − log n − c,
the series (from plugging s = 1 in)
∞
X fn
n=1
n
converges. In the next step, we show that fn → 0, which together with
X log p
fn = − log n − c
p≤n
p
and
X fn
> −ε2 .
n
N (1−ε)≤n≤N
≥ fN − log(1 + ε) > fN − ε.
↑
series exp.
of log
Since
X 1 N − N (1 − ε) N − N (1 − ε)
≥ ≥ =ε
n N N
N (1−ε)≤n≤N
131
we conclude that fN + 2ε > −ε, and thus fN > −3ε. It follows that
|fN | < 3ε,
from which we conclude that fn → 0.
We now combine the results about the Riemann zeta function.
Theorem 14.25.
(1) The only pole of the Riemann zeta function is at the point s = 1 and it is a
simple pole with residue 1.
(2) The only zeros of the Riemann zeta function outside of the strip 0 < σ = Re(s) <
1 (this is known as the critical strip) is at the points s ∈ −2N. These zeros are
simple.
(3) For s in the critical strip, if s, s, 1 − s, or 1 − s is a zero of the Riemann zeta
function, then all of the others are as well. The orders of the zeros are all the
same.
Proof.
(1)+(2): We have seen the claim for σ > 0. Hence it suffices to determine the poles
and zeros for σ ≤ 0. Consider the function
s s
Λ(s) = ζ(s)Γ π− 2 .
2
By Theorem 14.18, we have
Λ(1 − s) = Λ(s).
For σ ≤ 0, we have 1 − σ ≥ 1. Hence
2s−1
1−s
ζ(1 − s)Γ π 2
ζ(s) = s
2 .
Γ 2
For σ ≤ 0 and s ̸= 0, the function ζ(1 − s) does not have any poles or zeros,
while it has a simple pole when s = 0. However, this simple pole is cancelled
by the pole of Γ(s/2) (recall that there are simple poles of Γ(z) whenever
1−s
z ∈ −N0 ). Note further that for σ ≤ 0, the function Γ 2 has no zeros
or poles. Thus there are zeros precisely when 2s ∈ −N, or in other words
s ∈ −2N. They are all simple because the poles of the Gamma function are
simple.
(3) Suppose that σ > 1. Then
X X
ζ(s) = n−s = n−s = ζ(s).
n≥1 n≥1
Remarks.
1. The claim that ζ has infinitely many zeros in the critical strip 0 < σ < 1 and
that these all satisfy the properties in Theorem 14.25 (3) was conjectured by
Riemann and proven in 1893 by Hadamard.
2. There are many questions about the distribution of the zeros. Suppose that
T ≥ 0 and let N (T ) be the number of zeros of ζ(s) with 0 ≤ Im(s) ≤ T .
Then Riemann conjectured and von Mangoldt proved that as T → ∞
T T T
N (T ) = log − + O (log T ) ,
2π 2π 2π
giving a “vertical” distribution of the zeros in the critical strip. Much less
is known about the “horizontal” distribution (i.e., about the real parts. In
particular, it has not been proven that
1
ζ has no zeros for σ > 1 − ε, no
matter how small one takes ε ∈ 0, 2 . Showing that the zero-free region
includes a strip of the type Re(s) > 1 − ε would have a number of important
applications.
c
De la Vallee Poussin proved that for η(t) := log(t) with c > 0 and σ >
t − η (|t|) with |t| sufficiently large, ζ (σ + it) ̸= 0. Note that η(|t|) → 0 as
|t| → ∞.
The most famous conjecture about the horizontal distribution of the zeros
if the Riemann hypothesis, which conjectures that all of the zeros lie on the
critical line σ = 21 . Furthermore, the grand simplicity conjecture conjectures
that all zeros of the Riemann zeta function are simple. There is numerical
and theoretical evidence in support of the Riemann hypothesis, but it has
not been proven.
We finally consider the function 1/ζ(s), which would have poles wherever ζ has
zeros (and vice-versa).
In particular
∞
X µ(n)
= 0.
n=1
n
Furthermore, for x → ∞, we have
X
µ(x) = o(x).
n≤x
133
Proof. Since µ is multiplicative, the product formula for multiplicative functions
implies that for σ > 1
∞ ∞
X µ(n) Y X ν −νs
Y
−s
1
s
= µ (p ) p = 1 − p = .
n=1
n p ν=0 p
ζ(s)
1
Since ζ does not have any zeros for σ ≥ 1, it follows that ζ(s) is holomorphic in that
region. We furthermore see that ζ(s)−1 vanishes at s = 1 because ζ(s) has a pole
there.
The last claim follows by partial summation (Lemma 14.9). Namely, with
µ(n) X µ(n)
an = , tn = n, g (t) = t, A(t) = ,
n n≤t
n
Z x
1 X 1 X µ(n) 1
µ(n) = · n = A(x) − A (t) dt.
x n≤x x n≤x n x 1
By the second claim in the theorem, for arbitrary ε > 0 there exists t0 such that for
t > t0 we have |A (t)| ≤ ε. Therefore
1 t0
Z
1X 1
µ(n) ≤ ε + |A(t)| dt + ε(x − t0 ) = 2ε + o(1).
x n≤x x 1 x
An improvement for M (x) would have strong implications for the Riemann zeta
function.
Theorem 14.27.
(1) Suppose that for some a ∈ 21 , 1 we have M (x) = O xa as x → ∞. Then ζ
135
15. Elliptic Curves
Definition. An elliptic curve is given by an equation of the form
y 2 = x3 + ax2 + bx + c.
For fixed a, b, c, one looks for solutions (x, y) to the above equation (for example,
with x, y ∈ Z or x, y ∈ Q)
E : y 2 = x3 + 17
The elliptic curve
(15.1) E : y 2 = x3 + 17
has the solutions (−2, 3), (−1, 4), and (2, 5) (these were found simply by randomly
plugging in x and solving for y). How does one systematically find solutions to the
above equation?
Suppose that we have a solution such as (−2, 3) and we’d like to find more solutions.
Noting that the solution (−2, 3) also satisfies the linear equation
y = x + 5,
we may ask which other solutions also satisfy this linear equation.
Plugging in the linear equation to the equation (15.1) defining the elliptic curve
yields the equation
x3 − x2 − 10x − 8 = 0.
Since we already know that x = −2 is a solution, we can use polynomial long division
and factor
x3 − x2 − 10x − 8 = (x + 2) x2 − 3x − 4 .
By the quadratic equation, the second factor has the solutions x = −1, x = 4.
Plugging in the relation y = x + 5 yields the points (−1, 4) and (4, 9) on the elliptic
curve. Note also that (−2, −3), (−1, −4), and (4, −9) are also solutions due to the
symmetry y → −y.
136
The point (−2, 3) is also a point on the line
y = 3x + 9,
and we find that other points lying on the same line and also on the ellipltic curve
must satisfy
0 = x3 − 9x2 − 54x − 64.
Polynomial long division yields
0 = (x + 2) x2 − 11x − 32 .
It follows that z | A3 . Since we have already seen that z|B and gcd(A, B) = 1, we
conclude that z = ±1. Moreover, since B = v 2 z > 0, we conclude that z = 1. We
conclude that B = v 2 and D = v 3 . We may therefore simplify (15.3) as
C 2 = A3 + Av 4 = A A2 + v 4 .
Note that it is often the case that Np = p. One easily checks that this is the case
for
p = 2, 3, 7, 11, 19, 23, 31, 43, 47, 59, 67, 71, . . .
Questions.
• Is it always the case that Np = p for p ≡ 3 (mod 4)?
140
• What about the other primes, namely p ≡ 1 (mod 4)?
Indeed, these questions have been answered, but the proof is left out of this lecture.
Proof.
We first claim that 03 + 17, 13 + 17, . . ., (p − 1)3 + 17 (mod p) is a permutation
of 0, 1, 2, . . . , p − 1 (mod p). To show the claim, we must show that j 3 + 17 are all
distinct modulo p, or in other words
b31 ≡ b32 (mod p) ⇒ b1 ≡ b2 (mod p).
Since gcd(3, p − 1) = 1, there exists a solution u, v ∈ Z to the equation
3u − (p − 1)v = 1
2p−1
(for example, u = 3
,v = 2). Then, using Fermat’s little theorem (Theorem 4.11
(3)), we have
(p−1)v+1 (p−1)v+1
b31 ≡ b32 (mod p) ⇒ b3u 3u
1 ≡ b2 (mod p) ⇒ b1 ≡ b2 (mod p)
Thm. 4.11(3)
⇒ b1 ≡ b2 (mod p).
This yields the claim.
Hence the number of solutions to y 2 ≡ x3 + 17 (mod p) (with x running through all
possible choices modulo p) is equal to the number of solutions to y 2 ≡ a (mod p)
(with a running through all possible choices modulo p). We thus next count the
number of solutions for each such choice of a.
The congruence y 2 ≡ 0 (mod p) has the unique solution y ≡ 0 (mod p) and
furthermore the congruences
y 2 ≡ a (mod p) 1≤a≤p−1
141
have either two or no solutions, with each occurring equally often. Therefore y 2 ≡
x3 + 17 (mod p) has precisely
p−1
Np = 1 + 2 =p
2
solutions modulo p. □
Remark 15.9. The two elliptic curves that we have looked at are special; it is nor-
mally rather uncommon to have Np = p. However, p is the “expected value” of Np
in some sense.
Since p is a sort of “expected value”, for an arbitrary elliptic curve E, it is natural to
consider the difference ap := p − Np , which is known as the p-defect. The following
theorem (which we do not prove in this class) explains that Np cannot be too far
away from the expected value of p.
Theorem 15.10 (Hasse). Let E be an elliptic curve with integral coefficients and
denote by Np the number of points on E modulo p. Then for ap := p − Np , we have
√
|ap | < 2 p.
142
16. Partitions
Definition. A partition of an integer n ∈ N is a non-increasing sequence of integers
in N (a1 , . . . , ar ), for which
n = a1 + . . . + ar .
Let p(n) denote the number of partitions of n, writing p(0) = 1 for convenience.
There is a geometric representation of partitions given via the so-called Ferrers di-
agram.
Every summand of the partition is given as a row of the Ferrers diagram, with a
number of dots equal to the size of the summand.
Example 16.2.
6 + 3 + 3 + 2 + 1 = 15.
• • • • • •
• • •
• • •
• •
•
There is a natural map that takes partitions of n to other partitions of the same
size n given by the conjugate partition.
The conjugate partition is constructed by interchanging the rows and columns of
the partition. For example, the conjugate of the partition 6 + 3 + 3 + 2 + 1 seen
earlier in this example is the partition
5 + 4 + 3 + 1 + 1 + 1,
with corresponding Ferrers diagram
• • • • •
• • • •
• • •
•
•
•
Since conjugation forms a bijection on the partitions of size n (it is an involution),
we obtain the following conclusion.
143
Theorem 16.3. The number of partitions of n with precisely m parts is equal to
the the number of partitions of n for which the largest part is precisely m.
For q ∈ C with |q| < 1, we let
∞
X
P (q) = p(n)q n
n=0
Proof. We ignore questions of convergence and simply prove the identity formally (a
full proof relies on the Weierstrass majorant criterion and the Weierstrass product
formula from complex analysis).
Let F (q) denote the right-hand side. We expand every factor in the infinite
product as a geometric series to obtain
2 3 2 4 3 6
F (q) = 1 + q + q + q + . . . 1 + q + q + . . . 1 + q + q + . . . · · ·
= 1 + q + q 1+1 + q 1+1+1 + . . . 1 + q 2 + q 2+2 + . . . 1 + q 3 + q 3+3 + . . . · · · .
We now multiply out the product termwise, using the distributive property (on
products of infinite sums). We thus choose one term from each of the factors.
Say that we take the term q k1 ·1 from the first factor, q k2 ·2 from the second factor,
. . ., and q km ·m from the m-th factor (we can assume that there are only finitely-many
factors which are not 1, since q to an infinite power is zero, due to the fact that
|q| < 1). The contribution from this term of the product is thus
q k1 ·1 q k2 ·2 . . . q km ·m = q k1 ·1+k2 ·2+...+km ·m .
Writing
∞
X
F (q) = f (n)q n ,
n=0
we see that this term contributes to f (n) if and only if
k1 · 1 + k2 · 2 + . . . + km · m = n.
We therefore see that f (n) precisely counts the number of partitions of n, and hence
F (q) = P (q). □
Remark 16.5. We often use the fact that if G(q) = n≥0 bn q n = 0, then bn = 0 for all
P
n. One way to see this is to take q → 0. The first term bn ̸= 0 gives the asymptotic
growth, and the sum cannot be zero if there is a non-zero term. One has to be
careful to bound the other terms to show that this is true. This bounding (using
144
real analysis) can be avoided by rewriting bn as a certain integral (using something
called the residue theorem from complex analysis, but we won’t go into detail here)
Z
1
bn = G(q)q −n−1 dq,
2πi C
where C is any simple path around zero that goes counter-clockwise. Obviously, if
G(q) = 0, then the integral is automatically zero as well, which implies that bn = 0.
Theorem 16.6 (Euler’s Pentagonal Number Theorem). For q ∈ C with |q| < 1, we
have
Y∞ ∞
X
1 − qm = (−1)n q w(n) .
m=1 n=−∞
147