Cyclotomic Fields With Applications

CYCLOTOMIC FIELDS
WITH APPLICATIONS

Lecture Notes for Math 5590


Fall 2018

G. Eric Moorhouse
University of Wyoming
© 2018
Preface

These notes were written during the summer of 2018, while planning a graduate
course for the Fall 2018 semester. The theme was chosen to appeal to students of varying
backgrounds, some with interest primarily in number theory, and others more interested
in combinatorics, graph theory and finite geometry. As it happens, my research has led me
into areas of overlap between these two areas, with cyclotomic fields arising as a common
theme. And so beyond the immediate goal of appealing to students with multiple interests,
this course was conceived also as a way of crystallizing in my mind some of the finer points
of the theory of cyclotomic fields, many of which I had less familiarity with. Students were
referred primarily to Washington’s book [Wa] for further details on cyclotomic fields, and
various other sources as needed.
The realities of my teaching environment (perhaps yours too?) mean that I now
apportion less lecture time on theory and proofs, with more on examples and applications,
than when I first began teaching. In the grand tradition of mathematics, these applications
arise largely in . . . (wait for it!) . . . other areas of mathematics. (That may not be
strictly true; but as usual, our description of these applications has been rather simplified,
sometimes oversimplified, to their mathematical essence, for the sake of brevity.) These
applications include
• algorithms for fast arithmetic with polynomials and integers;
• constructions and nonexistence results for Hadamard matrices, difference sets, and
designs, particularly nets, and finite affine and projective planes;
• spectra of Cayley graphs and digraphs over abelian groups;
• counting solutions to equations over finite fields;
• the MacWilliams relations for error-correcting codes;
• Dirichlet’s theorem on primes in arithmetic progressions; and
• mutually unbiased bases (quantum information theory).
Given the demands of this pedagogical emphasis, there has been no single reference avail-
able where all of these developments can be found.
Another design constraint on these notes has been the varying backgrounds of our
students: some will have had advanced courses in field theory or number theory, and
others not. In order to keep these notes as self-contained as possible, I have included
appendices containing many of the results needed from field theory and number theory,
omitting the longer proofs; also omitting major results in the theory which do not bear
directly upon our particular development or featured applications. I expect that during
this fall semester, I will actually summarize much of the content in these appendices during
the lectures, rather than leaving students to read these solely on their own.
I am indebted to many sources from which I have borrowed extensively, particularly
[IR], [LN], [Sa] and [Wa]. Often this has meant rewriting content in my own way, and
adding details which other authors have left as exercises. I have also looked for ways
to avoid explicitly developing all the tools required in some of the standard proofs—
not that I feel these tools are unimportant for students to learn, but due to concern
that the proliferation of technical definitions and warmup lemmas would overly distract
students from the main points. One of my goals, in particular, is a presentation of Gluck’s
Theorem 14.2 (a beautiful and very accessible argument using cyclotomic integers in a
nontrivial and surprising way). Its proof, however, invokes a theorem of Segre usually
formulated in the language of projective plane geometry. Not wanting to spend the extra
time on such an extended detour for the majority of our students without this conceptual
background, I strove instead for an alternative presentation of Segre’s Theorem in the
language of affine plane geometry. I am very happy with the resulting Theorem 3.14,
which I feel is also better adapted to the proof of Gluck’s Theorem than the original.
I regret omitting several major topics which a more comprehensive textbook would
have included: Stickelberger’s Theorem, higher reciprocity laws, applications to algebraic
coding theory and cryptology, and Bernhard Schmidt’s definitive work on the circulant
Hadamard conjecture. However, in the spirit of a set of working lecture notes, my priority
has been to limit the scope to only what I believe can be accomplished in a single semester.
But perhaps in a future revision. . .
Throughout all my rewriting of standard material, I will certainly have added many
of my own errors, for which I take full responsibility. A list of errata will be posted at
http://ericmoorhouse.org/courses/5590/
With each mistake/misprint that you encounter in this manuscript, please first check the
website to see if it has already been listed; if not, please email me at [email protected]
with the necessary correction to add to this list. Thank you!

Eric Moorhouse
August, 2018

Notational Conventions
Throughout these notes, I compose functions right-to-left, as in (στ )(a) = σ(τ (a)).
Groups are multiplicative unless otherwise indicated. The symbol ζ denotes a complex root
of unity, except when it represents a zeta function (à la Riemann, Dedekind, Hasse, etc.).
Likewise, ‘i’ signifies either an integer index (sometimes a dummy index of summation),
or √−1, again depending on the context. So deal with it.

Contents

Preface

1. Finite Cyclic Groups
2. Cyclotomic Polynomials
3. Finite Fields
     Squares and Nonsquares
     Automorphisms of Finite Fields
     Polynomials versus Functions
     Counting Irreducible Polynomials
     Segre’s Theorem
4. Cyclotomic Fields and Integers
5. Fermat’s Last Theorem
6. Characters of Finite Abelian Groups
     Spectra of Cayley Graphs and Digraphs
     Error-Correcting Codes
     The Fast Fourier Transform
     Fast Polynomial Multiplication
     Fast Integer Multiplication
7. Group Rings R[G]
     The Rational Group Algebra of a Finite Cyclic Group
     Direct Products
8. Difference Sets
9. Hadamard Matrices
     Skew-Type Hadamard Matrices
     Williamson-Hadamard Matrices
     Regular Hadamard Matrices
     Circulant Hadamard Matrices
10. Quadratic Reciprocity
11. Gauss and Jacobi Sums
12. Zeta Functions and L-Functions
13. Exponential Sums
14. Affine Planes
15. Nets
16. Mutually Unbiased Bases
17. Weil’s Bound

Appendices
A1. Fields and Extensions
     Matrix Representations of Field Extensions
A2. Polynomials and Irreducibility
A3. Algebraic Integers
A4. Normal and Separable Extensions
A5. Field Automorphisms and Galois Theory
A6. Dedekind Zeta Functions and Dirichlet Series
A7. Symmetric Polynomials
A8. Computational Software
     PARI/GP
     Mathematica

Bibliography

Index

1. Finite Cyclic Groups

A cyclic group is a group generated by a single element. A cyclic group may be finite or
infinite. Every infinite cyclic group is isomorphic to the additive group of $\mathbb{Z}$; or equivalently,
the multiplicative subgroup $\langle\pi\rangle = \{\pi^k : k \in \mathbb{Z}\} \subset \mathbb{C}^\times$. (Here $\mathbb{C}^\times$ is the multiplicative group
of nonzero complex numbers; and one can replace $\pi$ by any nonzero complex number which
is not a root of unity.) Every finite cyclic group of order $n$ is isomorphic to the additive
group $\mathbb{Z}/n\mathbb{Z}$ of integers mod $n$; equivalently, the multiplicative subgroup of complex $n$-th
roots of unity. The latter group is

$$\{z \in \mathbb{C} : z^n = 1\} = \langle\zeta\rangle = \{1, \zeta, \zeta^2, \dots, \zeta^{n-1}\}$$

where $\zeta = \zeta_n$ is a primitive $n$-th root of unity, i.e. an element of order $n$ in $\mathbb{C}^\times$.
Recall that $\mathbb{C}$ contains exactly $\varphi(n)$ primitive $n$-th roots of unity $e^{2\pi k i/n}$ where $1 \le k \le n$
and $\gcd(k, n) = 1$; here Euler’s totient function $\varphi(n)$ denotes the number of values of $k$
satisfying these conditions. Evidently $\varphi(n)$ is the number of generators in an arbitrary
cyclic group of order $n$. In the preceding context, we have used the additive cyclic group
$\mathbb{Z}/n\mathbb{Z}$ in which the generators are the elements relatively prime to $n$.
But we will often prefer that our groups be written multiplicatively. Thus in the
generic case, an arbitrary cyclic group of order $n \ge 1$ may be expressed (up to isomorphism)
as a multiplicative group $G = \langle x : x^n = 1\rangle = \{1, x, x^2, \dots, x^{n-1}\}$ generated by an element
$x$ of order $n$.

Theorem 1.1. Let $G = \langle x\rangle$ be a (multiplicative) cyclic group of order $n \ge 1$. Then
$G$ has $\varphi(n)$ generators $x^k$, $1 \le k \le n$, $\gcd(k, n) = 1$. Every subgroup of $G$ is cyclic of
order $d$ dividing $n$. Conversely, for every positive $d \mid n$, $G$ has a unique subgroup of
order $d$ given by $\langle x^{n/d}\rangle$. Thus
$$n = \sum_{1 \le d \mid n} \varphi(d).$$

Proof. Most of Theorem 1.1 is proved by the Division Algorithm. For example if $H$ is a
subgroup of $G$, let $d \in \{1, 2, \dots, n\}$ be minimal such that $x^d \in H$. (Note that the set of
$d \in \{1, 2, \dots, n\}$ satisfying $x^d \in H$ is nonempty since $x^n = 1 \in H$; so the minimum such
$d$ is defined.) So $\langle x^d\rangle \subseteq H$. We have $n = qd + r$ for some integers $q, r$ with $0 \le r < d$. If
$0 < r < d$ then $x^r = x^n (x^d)^{-q} \in H$, contradicting the minimality of $d$; so we must have
$d \mid n$. Now $\langle x^d\rangle \subseteq H$; and to prove equality, let $h \in H$, so $h = x^j$ for some $j$. Again by the
Division Algorithm, $j = q'd + r'$ where $0 \le r' < d$; and again using the minimality of $d$,
we have $r' = 0$, so $x^j = (x^d)^{q'} \in \langle x^d\rangle$. This gives $H = \langle x^d\rangle$. The relation $n = \sum_{d \mid n} \varphi(d)$
follows by counting in two different ways the number of pairs $(g, H)$ where $g \in G$ and
$H = \langle g\rangle \le G$.
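The statements of Theorem 1.1 are easy to check by brute force for a small order. The following is a minimal sketch, not part of the original notes: it models the cyclic group additively as Z/nZ, and the helper names `phi` and `generated_subgroup` as well as the choice n = 12 are illustrative assumptions.

```python
from math import gcd

def phi(n):
    """Euler's totient, by directly counting 1 <= k <= n with gcd(k, n) = 1."""
    return sum(1 for k in range(1, n + 1) if gcd(k, n) == 1)

def generated_subgroup(g, n):
    """The subgroup of Z/nZ (written additively) generated by g."""
    return frozenset((g * j) % n for j in range(n))

n = 12  # illustrative choice of group order
divisors = [d for d in range(1, n + 1) if n % d == 0]

# Every subgroup of a cyclic group is cyclic, so this set lists all subgroups.
subgroups = {generated_subgroup(g, n) for g in range(n)}
orders = sorted(len(H) for H in subgroups)

# Generators are the k in {1, ..., n} whose multiples exhaust Z/nZ.
generators = [k for k in range(1, n + 1) if len(generated_subgroup(k, n)) == n]

print("subgroup orders:", orders)
print("generators:", generators)
print("sum of phi(d) over d | n:", sum(phi(d) for d in divisors))
```

The list of subgroup orders coincides with the list of divisors of n, each occurring once, which is the uniqueness claim of the theorem in this one case.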

Theorem 1.2. If $G$ is a cyclic group of order $n$, then its automorphism group $\operatorname{Aut} G$
is abelian of order $\varphi(n)$. In fact, $\operatorname{Aut} G \cong (\mathbb{Z}/n\mathbb{Z})^\times$, the multiplicative group of units
of the ring of integers mod $n$.

Proof. Let $G = \{1, g, g^2, \dots, g^{n-1}\}$. For each $k \in \{1, 2, \dots, n\}$ with $\gcd(k, n) = 1$, define
$\sigma_k : G \to G$ by $\sigma_k(x) = x^k$. One easily checks that $\sigma_k$ is well-defined, bijective, and
$\sigma_k(xy) = (xy)^k = x^k y^k = \sigma_k(x)\sigma_k(y)$. Thus $\sigma_k \in \operatorname{Aut} G$.
Conversely, let $\sigma \in \operatorname{Aut} G$. Since $g$ has order $n$, so does $\sigma(g)$; thus $\sigma(g) = g^k$ for some
$k \in \{1, 2, \dots, n\}$ with $\gcd(k, n) = 1$. It follows readily that $\sigma = \sigma_k$. (Every $x \in G$ has the
form $x = g^r$ for some $r$; and then $\sigma(x) = \sigma(g^r) = \sigma(g)^r = (g^k)^r = \sigma_k(g^r) = \sigma_k(x)$.) Thus
$\operatorname{Aut} G = \{\sigma_k : 1 \le k \le n,\ \gcd(k, n) = 1\}$. The map $(\mathbb{Z}/n\mathbb{Z})^\times \to \operatorname{Aut} G$, $k \mapsto \sigma_k$ is in fact
an isomorphism: it is bijective, and for all $x \in G$, $\sigma_k(\sigma_\ell(x)) = (x^\ell)^k = x^{k\ell} = \sigma_{k\ell}(x)$, so
$\sigma_k \sigma_\ell = \sigma_{k\ell}$. In particular, $\operatorname{Aut} G$ is abelian of order $\varphi(n)$.
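The rule $\sigma_k \sigma_\ell = \sigma_{k\ell}$ from the proof can be verified mechanically. In this sketch (an illustration under our own conventions, not the author's code) the element $g^r$ is represented by its exponent $r$ mod $n$, so that $\sigma_k$ becomes the map $r \mapsto kr$ mod $n$; the names `sigma` and `compose` and the order n = 10 are assumptions made for the example.

```python
from math import gcd

n = 10  # illustrative order of the cyclic group G = <g>
units = [k for k in range(1, n + 1) if gcd(k, n) == 1]

def sigma(k):
    """sigma_k as a tuple: position r holds the exponent of sigma_k(g^r) = g^(k*r)."""
    return tuple((k * r) % n for r in range(n))

def compose(s, t):
    """Right-to-left composition (s t)(x) = s(t(x)), as in the notes."""
    return tuple(s[t[r]] for r in range(n))

autos = {k: sigma(k) for k in units}

for k in units:
    for l in units:
        # sigma_k sigma_l should equal sigma_{kl}, with kl reduced mod n
        assert compose(autos[k], autos[l]) == autos[(k * l) % n]

print("units mod", n, ":", units)
print("distinct automorphisms found:", len(set(autos.values())))
```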

Caution: Do not confuse G with its automorphism group Aut G. Keep in mind that G and
Aut G do not have the same order. Theorem 1.2 does not say that Aut G is cyclic; nor
does it say that the automorphism group of an abelian group is abelian. See Exercise #4.
A function f defined on positive integers is multiplicative if f (mn) = f (m)f (n)
whenever m, n are relatively prime positive integers. (Note the condition that gcd(m, n) =
1. A function f is completely multiplicative if f (mn) = f (m)f (n) for all m, n. We shall
have reason to consider functions with this stronger property; but φ is not an example.)

Theorem 1.3. $\varphi$ is multiplicative, and $\varphi(n) = n \prod_{p \mid n} \left(1 - \frac{1}{p}\right)$ where the product
extends over all prime divisors $p \mid n$.

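Proof. Suppose $n = ab$ where $a$ and $b$ are relatively prime positive integers. The natural
homomorphism $f : \mathbb{Z} \to (\mathbb{Z}/a\mathbb{Z}) \oplus (\mathbb{Z}/b\mathbb{Z})$ mapping $k \mapsto (k + a\mathbb{Z},\, k + b\mathbb{Z})$ has kernel $n\mathbb{Z}$
(since this is the set of all integers divisible by both $a$ and $b$). Since $f$ is surjective (the
induced map on $\mathbb{Z}/n\mathbb{Z}$ is injective, and both rings have $n = ab$ elements), it
induces a ring isomorphism

$$\mathbb{Z}/n\mathbb{Z} \cong (\mathbb{Z}/a\mathbb{Z}) \oplus (\mathbb{Z}/b\mathbb{Z}).$$

Now it is easy to see that if $R_1$ and $R_2$ are rings with identity, then the units of $R_1 \oplus R_2$
are exactly the elements $(u_1, u_2)$ with $u_i$ a unit in $R_i$. Thus

$$(\mathbb{Z}/n\mathbb{Z})^\times \cong (\mathbb{Z}/a\mathbb{Z})^\times \times (\mathbb{Z}/b\mathbb{Z})^\times.$$

Taking the cardinality of both sides gives $\varphi(n) = \varphi(a)\varphi(b)$.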
If $p$ is prime and $r \ge 1$, then the integers $k \in \{1, 2, \dots, p^r\}$ not relatively prime to
$p^r$ are the integers $k = p\ell$, $\ell \in \{1, 2, \dots, p^{r-1}\}$; so $\varphi(p^r) = p^r - p^{r-1} = p^r\left(1 - \frac{1}{p}\right)$. So
the indicated formula for $\varphi(n)$ holds for prime powers $n = p^r$. More generally, let $n$ be a
positive integer and consider its prime factorization $n = \prod_{i=1}^{r} p_i^{e_i}$ where $p_1, p_2, \dots, p_r$ are
the distinct prime factors of $n$, and $e_i \ge 1$. Using multiplicativity of $\varphi$,

$$\varphi(n) = \prod_{i=1}^{r} p_i^{e_i}\left(1 - \frac{1}{p_i}\right) = n \prod_{i=1}^{r} \left(1 - \frac{1}{p_i}\right).$$

Some textbooks use the Chinese Remainder Theorem in place of the ring isomorphism
$\mathbb{Z}/n\mathbb{Z} \cong (\mathbb{Z}/a\mathbb{Z}) \oplus (\mathbb{Z}/b\mathbb{Z})$ in proving the multiplicativity of $\varphi$. In our view, however, it
is the ring isomorphism which most naturally gives rise to both the Chinese Remainder
Theorem and the group isomorphism $(\mathbb{Z}/n\mathbb{Z})^\times \cong (\mathbb{Z}/a\mathbb{Z})^\times \times (\mathbb{Z}/b\mathbb{Z})^\times$. This is a classic
instance where just a modicum of abstract algebra provides a clean and insightful proof,
where the alternative is a rather tedious and technical argument.
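As a numerical sanity check on Theorem 1.3, one can compare the product formula with a direct count, and confirm in one case that $k \mapsto (k \bmod a,\ k \bmod b)$ hits every pair. This is only a sketch; the function names and the sample values a = 4, b = 9 are our own choices, and trial division is used for factoring purely for simplicity.

```python
from math import gcd

def prime_factors(n):
    """Distinct prime divisors of n, found by trial division (small n only)."""
    primes, p = [], 2
    while p * p <= n:
        if n % p == 0:
            primes.append(p)
            while n % p == 0:
                n //= p
        p += 1
    if n > 1:
        primes.append(n)
    return primes

def phi_count(n):
    """phi(n) by counting the k in {1, ..., n} coprime to n."""
    return sum(1 for k in range(1, n + 1) if gcd(k, n) == 1)

def phi_formula(n):
    """phi(n) = n * prod(1 - 1/p), evaluated in exact integer arithmetic."""
    result = n
    for p in prime_factors(n):
        result = result // p * (p - 1)
    return result

a, b = 4, 9  # coprime sample moduli
images = {(k % a, k % b) for k in range(a * b)}

print(phi_count(36), phi_formula(36), phi_formula(a) * phi_formula(b))
print("pairs reached:", len(images), "of", a * b)
```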
A positive integer n is squarefree if n is not divisible by any square integer larger
than 1; equivalently, n is a product of distinct primes (possibly an empty product, so that
1 is squarefree). For every positive integer $n$, define

$$\mu(n) = \begin{cases} (-1)^k, & \text{if } n \text{ is a product of } k \text{ distinct primes;} \\ 0, & \text{if } n \text{ is not squarefree.} \end{cases}$$

Like $\varphi$, the function $\mu$ is multiplicative. Indeed, $\mu$ is the unique multiplicative function
satisfying

$$\mu(p^k) = \begin{cases} 1, & \text{if } k = 0; \\ -1, & \text{if } k = 1; \\ 0, & \text{if } k \ge 2 \end{cases}$$

where $p$ is prime. From Theorems 1.1 and 1.3 we obtain

Theorem 1.4. $\displaystyle \varphi(n) = \sum_{1 \le d \mid n} \mu\!\left(\frac{n}{d}\right) d = \sum_{1 \le d \mid n} \frac{\mu(d)}{d}\, n.$

Note that the two formulas given are equivalent via the substitution $d \leftrightarrow \frac{n}{d}$ for divisors
$d \mid n$. The formulas can be proved in several ways: (i) directly by mathematical induction
on $n$, using $n = \sum_{1 \le d \mid n} \varphi(d)$; or (ii) using the inclusion-exclusion principle for counting
the cardinalities of the subgroups $H \le G$ where $|G|/|H|$ is squarefree; or (iii) using Möbius
inversion; or (iv) by expanding the formula $\varphi(n) = n \prod_{p \mid n} \left(1 - \frac{1}{p}\right)$ from Theorem 1.3.
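Theorem 1.4 can likewise be tested numerically. The sketch below computes $\mu$ by trial division and evaluates the first sum in the theorem; the function names are hypothetical, and the range checked is arbitrary.

```python
from math import gcd

def mobius(n):
    """Moebius function: (-1)^k for a product of k distinct primes, else 0."""
    result, p = 1, 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:      # p^2 divided the original n: not squarefree
                return 0
            result = -result
        p += 1
    if n > 1:                   # one remaining prime factor
        result = -result
    return result

def phi_via_mobius(n):
    """phi(n) as the sum of mu(n/d) * d over the divisors d of n."""
    return sum(mobius(n // d) * d for d in range(1, n + 1) if n % d == 0)

def phi_count(n):
    """phi(n) by direct count, for comparison."""
    return sum(1 for k in range(1, n + 1) if gcd(k, n) == 1)

print([mobius(n) for n in range(1, 13)])
print([phi_via_mobius(n) for n in range(1, 13)])
```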

Exercises 1.
1. (a) Let $G$ be a cyclic group of order $n$. Note that if one chooses $g \in G$ uniformly at random, the
probability that $\langle g\rangle = G$ is $\frac{\varphi(n)}{n} \in [0, 1]$. Show that $\limsup_{n\to\infty} \frac{\varphi(n)}{n} = 1$ and $\liminf_{n\to\infty} \frac{\varphi(n)}{n} = 0$.
This says that in particular, the ratio $\frac{n}{\varphi(n)}$ has no upper bound.
(b) Determine the ‘limiting average value’ of $\frac{\varphi(n)}{n}$, i.e. evaluate
$\lim_{n\to\infty} \frac{1}{n}\left[\frac{\varphi(1)}{1} + \frac{\varphi(2)}{2} + \cdots + \frac{\varphi(n)}{n}\right]$.
Some numerical data might provide an initial insight here.

2. Factorization of most integers having more than a couple hundred decimal digits is prohibitively
difficult. (In fact, no polynomial-time algorithm for integer factorization is known.) Show that
computation of $\varphi(n)$ for large integers is also prohibitively difficult in general. Hint: Consider
numbers of the form $n = pq$ where $p \ne q$ are large primes. For such numbers we have $\varphi(n) =
(p - 1)(q - 1)$. Show that any algorithm to compute $\varphi(n)$ (given only the decimal representation
of $n$) can also be used to provide the prime factorization of $n$.

3. To compute gcd(m, n) for small integers, one typically relies on first determining the prime
factorizations of m and n. By remarks above, this approach fails for large integers. However,
computation of gcd(m, n) for large integers (having several hundred digits) is very efficient using
Euclid’s algorithm (which runs in polynomial time). But by #2, one cannot expect to be able
to compute φ(n) exactly for most large values of n.
Given a large positive integer n, one might try to estimate φ(n) by random sampling: Repeat-
edly choose k between 1 and n (uniformly distributed, using a pseudorandom number generator).
After $N$ trials, if $d$ is the number of values of $k$ found for which $\gcd(k, n) = 1$, we obtain an
estimate $\varphi(n) \approx \frac{dn}{N}$. How practical is this approach as a means to estimate $\varphi(n)$? In particular
can one realistically hope to approximate φ(n) to within, say, 10% of its true value? Can you
suggest obvious improvements to this approach? Consider in particular (by #1) that for large
values of n, there are values of n where randomly sampling k ∈ {1, 2, . . . , n} almost always finds
gcd(k, n) = 1; and other values of n where random sampling almost always finds gcd(k, n) > 1.

4. (a) Find the smallest n such that the automorphism group of a cyclic group G of order n is not
cyclic. Determine the isomorphism type of Aut G in this case.
(b) Find the smallest abelian group G for which Aut G is nonabelian. Indicate the isomorphism
types of G and Aut G in this case.

5. The sum of the positive divisors of $n$ is a multiplicative function $\sigma(n)$ similar to $\varphi(n)$, satisfying
the formula $\sigma(n) = n \prod_{p \mid n} \left(1 + \frac{1}{p}\right)$ whenever $n$ is squarefree.

(a) Derive formulas (analogues of Theorems 1.1 and 1.4) expressing n as a sum of values of the
σ function, and the reverse.
(b) A notoriously difficult problem is the determination of solutions of $\sigma(n) = 2n$. Such numbers
are called perfect. No odd perfect numbers are known. All even perfect numbers have the
form $2^{p-1}(2^p - 1)$ where both $p$ and $2^p - 1$ are prime; but only finitely many primes of the
form $2^p - 1$ are known (Mersenne primes) although it is conjectured that infinitely many
exist. So only finitely many (roughly 50) perfect numbers are known. Say what you can
about solutions of the analogous equation $n = 2\varphi(n)$.

2. Cyclotomic Polynomials

Let $n$ be a positive integer. The complex $n$-th roots of unity, lying in the multiplicative group
$\mathbb{C}^\times$ of units in the complex numbers, form a cyclic group

$$\langle\zeta\rangle = \{1, \zeta, \zeta^2, \dots, \zeta^{n-1}\}$$

of order $n$, where $\zeta = \zeta_n$ is a primitive complex $n$-th root of unity. Usually we take
$\zeta_n = e^{2\pi i/n}$; although for most purposes, any primitive $n$-th root of unity will serve just as well.
We consider $1, \zeta, \zeta^2, \dots, \zeta^{n-1}$ as unit vectors in the plane symmetrically arranged about the
origin, forming the vertices of a regular $n$-gon inscribed in the circle $|z| = 1$.

[Figure: the points $1, \zeta, \zeta^2, \zeta^3, \dots, \zeta^{-1}$ on the unit circle centered at $0$, with consecutive points separated by the angle $2\pi/n$.]

By symmetry, we directly infer the relation

$$1 + \zeta + \zeta^2 + \cdots + \zeta^{n-1} = 0 \quad \text{whenever } n \ge 2,$$

a fact that we can also derive algebraically by comparing coefficients of $t^{n-1}$ on both sides of

$$t^n - 1 = (t - 1)(t - \zeta)(t - \zeta^2) \cdots (t - \zeta^{n-1}). \tag{2.1}$$

So $\zeta$ is a root of $1 + t + t^2 + \cdots + t^{n-1} \in \mathbb{Z}[t]$; yet this is not in general the minimal polynomial
of $\zeta$. The $n$-th cyclotomic polynomial is the monic polynomial defined by

$$\Phi_n(t) = \prod_{\substack{1 \le k \le n \\ \gcd(k,n)=1}} (t - \zeta^k).$$

By construction, its roots are all the φ(n) primitive n-th roots of unity in C; so the coeffi-
cients in Φn (t), being the elementary symmetric polynomials in these roots, are algebraic
integers. The extension E = Q[ζ] ⊇ Q contains all these roots, since they are powers of ζ;
and so E is the splitting field of Φn (t) over Q. In particular, E ⊇ Q is a Galois exten-
sion (Appendix A5). Every automorphism σ ∈ Aut E permutes the roots of Φn (t), so the
coefficients in Φn (t) lie in Q (Theorem A5.12). So by Theorem A3.2(ii), these coefficients
must be rational integers. This shows that Φn (t) ∈ Z[t].
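Grouping together the factors $t - \zeta^r$ in (2.1) according to $\gcd(r, n)$, we have

$$t^n - 1 = \prod_{d \mid n} \ \prod_{\substack{1 \le r \le n \\ \gcd(r,n)=d}} (t - \zeta^r).$$

Now $\gcd(r, n) = d$ iff $r = dj$ where $1 \le j \le \frac{n}{d}$, $\gcd\left(j, \frac{n}{d}\right) = 1$. This gives

$$t^n - 1 = \prod_{d \mid n} \ \prod_{\substack{1 \le j \le n/d \\ \gcd(j,\, n/d)=1}} (t - \zeta^{dj}) = \prod_{d \mid n} \Phi_{n/d}(t)$$

since $\zeta^{dj}$ (for $1 \le j \le \frac{n}{d}$, $\gcd\left(j, \frac{n}{d}\right) = 1$) are the primitive $\frac{n}{d}$-th roots of 1; indeed $\zeta^d$ has
order $\frac{n}{d}$. Replacing $d$ by $\frac{n}{d}$ gives part (i) of the following: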

Theorem 2.2. (i) For every $n \ge 1$, $t^n - 1 = \prod_{d \mid n} \Phi_d(t)$.

(ii) Each of the polynomials $\Phi_n(t)$ has integer coefficients; it is irreducible in $\mathbb{Z}[t]$,
and so also in $\mathbb{Q}[t]$. Hence $\Phi_n(t)$ is the minimal polynomial of $\zeta_n$ over $\mathbb{Q}$.

The fact that Φn (t) ∈ Z[t] was shown above. We have not actually shown that Φn (t) is
irreducible in Z[t] (and so also in Q[t]). Here we give the standard proof of this in the
important special case $n = p$ is prime. A similar argument works for prime powers $n = p^e$
(Exercise #2). Lang [L2] proves the irreducibility of $\Phi_n(t)$ in the general case, using an
argument which reduces to the prime power case. Now for $p$ prime, use the fact that

$$\Phi_p(t + 1) = \frac{(t + 1)^p - 1}{t} = t^{p-1} + p\,t^{p-2} + \frac{p(p-1)}{2}\,t^{p-3} + \cdots + \frac{p(p-1)}{2}\,t + p \ \in \mathbb{Z}[t]$$

where all coefficients (except the leading coefficient) are divisible by $p$, and the constant
term is not divisible by $p^2$. By Eisenstein’s Criterion (Theorem A2.4), $\Phi_p(t + 1) \in \mathbb{Z}[t]$
is irreducible in $\mathbb{Z}[t]$. So $\Phi_p(t) \in \mathbb{Z}[t]$ is also irreducible in $\mathbb{Z}[t]$ (as follows from the
substitutions $u = t+1$, $t = u-1$ with all-integer coefficients) and so $\Phi_p(t)$ is also irreducible
in $\mathbb{Q}[t]$.
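The divisibility conditions used in this Eisenstein argument are simple to confirm for small primes. In the sketch below, the coefficient of $t^{k-1}$ in $\Phi_p(t+1)$ is taken to be the binomial coefficient $\binom{p}{k}$, which is what the displayed expansion gives; the function names are our own and the code checks the hypotheses of the criterion only, not irreducibility itself.

```python
from math import comb

def shifted_cyclotomic_coeffs(p):
    """Coefficients of Phi_p(t+1) = ((t+1)^p - 1)/t, constant term first.

    The coefficient of t^(k-1) is the binomial coefficient C(p, k), 1 <= k <= p.
    """
    return [comb(p, k) for k in range(1, p + 1)]

def eisenstein_applies(coeffs, p):
    """Check the hypotheses of Eisenstein's Criterion at the prime p."""
    *lower, leading = coeffs
    return (leading % p != 0
            and all(c % p == 0 for c in lower)
            and lower[0] % (p * p) != 0)

for p in [2, 3, 5, 7, 11, 13]:
    coeffs = shifted_cyclotomic_coeffs(p)
    print(p, coeffs, eisenstein_applies(coeffs, p))
```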
The factorization $t^n - 1 = \prod_{d \mid n} \Phi_d(t)$ can be reversed to compute the cyclotomic
polynomials from the polynomials $t^d - 1$ using ordinary division of polynomials. This may
be expressed as

Theorem 2.3. For every $n \ge 1$, $\Phi_n(t) = \prod_{d \mid n} (t^d - 1)^{\mu(n/d)}$.

For example,

$$\begin{aligned}
\Phi_1(t) &= t - 1 \\
\Phi_2(t) &= \frac{t^2-1}{t-1} = t + 1 \\
\Phi_3(t) &= \frac{t^3-1}{t-1} = t^2 + t + 1 \\
\Phi_4(t) &= \frac{t^4-1}{t^2-1} = t^2 + 1 \\
\Phi_5(t) &= \frac{t^5-1}{t-1} = t^4 + t^3 + t^2 + t + 1 \\
\Phi_6(t) &= \frac{(t^6-1)(t-1)}{(t^3-1)(t^2-1)} = t^2 - t + 1 \\
\Phi_7(t) &= \frac{t^7-1}{t-1} = t^6 + t^5 + t^4 + t^3 + t^2 + t + 1 \\
\Phi_8(t) &= \frac{t^8-1}{t^4-1} = t^4 + 1 \\
\Phi_9(t) &= \frac{t^9-1}{t^3-1} = t^6 + t^3 + 1 \\
\Phi_{10}(t) &= \frac{(t^{10}-1)(t-1)}{(t^5-1)(t^2-1)} = t^4 - t^3 + t^2 - t + 1
\end{aligned}$$

etc.
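The division procedure behind these examples can be sketched in a few lines. This version uses Theorem 2.2(i) recursively, dividing $t^n - 1$ by $\Phi_d(t)$ for each proper divisor $d$ of $n$, rather than the Möbius form of Theorem 2.3; polynomials are stored as integer coefficient lists with the constant term first. The function names and the module-level cache are illustrative assumptions.

```python
def poly_divide_exact(num, den):
    """Quotient num/den of integer polynomials (constant term first).

    Assumes den is monic and divides num exactly.
    """
    num = num[:]                                  # work on a copy
    quotient = [0] * (len(num) - len(den) + 1)
    for i in range(len(quotient) - 1, -1, -1):
        c = num[i + len(den) - 1]                 # current leading coefficient
        quotient[i] = c
        for j, dj in enumerate(den):
            num[i + j] -= c * dj
    assert not any(num), "division was not exact"
    return quotient

_cache = {}

def cyclotomic(n):
    """Phi_n(t), obtained from t^n - 1 = prod over d | n of Phi_d(t)."""
    if n not in _cache:
        poly = [-1] + [0] * (n - 1) + [1]         # t^n - 1
        for d in range(1, n):
            if n % d == 0:
                poly = poly_divide_exact(poly, cyclotomic(d))
        _cache[n] = poly
    return _cache[n]

for n in range(1, 11):
    print(n, cyclotomic(n))
```

The printed coefficient lists can be compared line by line with the table above.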
Theorem 2.3 also gives us another proof that $\Phi_n(t) \in \mathbb{Q}[t]$ (leading to another explanation
why $\Phi_n(t) \in \mathbb{Z}[t]$). Comparing degrees on both sides of Theorem 2.2(i) gives
$n = \sum_{d \mid n} \varphi(d)$; and comparing degrees on both sides of Theorem 2.3 gives $\varphi(n) = \sum_{d \mid n} \mu\!\left(\frac{n}{d}\right) d$.
So we recover the formulas of Theorems 1.1 and 1.4. Intuition recognizes a connection here;
formally, this is expressed by the observation that Theorems 2.2 and 2.3 are categorified
versions of their counterparts in Section 1. We do not explain the meaning of categorifi-
cation, but it is worth noting that efforts to categorify numerical formulas in a way much
like this are often very fruitful.

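Theorem 2.4. Let $\zeta = \zeta_n$, $n \ge 1$. Then the extension $E = \mathbb{Q}(\zeta) \supseteq \mathbb{Q}$ is Galois
of degree $\varphi(n)$. The automorphisms of $E$ form a group $\operatorname{Aut} E = \{\sigma_r : 1 \le r \le n,\ \gcd(r, n) = 1\}$
where $\sigma_r(\zeta) = \zeta^r$. This is an abelian group of order $\varphi(n)$ isomorphic
to $(\mathbb{Z}/n\mathbb{Z})^\times$, the group of units of the ring of integers mod $n$. We have $\sigma_r \sigma_s = \sigma_{rs}$
whenever $r, s \in (\mathbb{Z}/n\mathbb{Z})^\times$, so the map $r \mapsto \sigma_r$ is an explicit isomorphism $(\mathbb{Z}/n\mathbb{Z})^\times \to \operatorname{Aut} E$.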

Proof. We have already explained why the extension $E \supseteq \mathbb{Q}$ is the splitting field of $\Phi_n(t)$
over $\mathbb{Q}$ (relying on the fact that $\Phi_n(t)$ is irreducible over $\mathbb{Q}$, for which we have cited
Lang [L2] in the general case). So $E \supseteq \mathbb{Q}$ is Galois of degree $[E : \mathbb{Q}] = \deg \Phi_n(t) = \varphi(n)$;
and $G = \operatorname{Aut} E = G(E/\mathbb{Q})$ also has order $\varphi(n)$. By Theorems A5.4 and A5.5, $G$ is faithfully
represented as a transitive group of permutations of the $\varphi(n)$ roots of $\Phi_n(t)$. This gives a
representation of $G$ as a group of automorphisms of the cyclic group $\langle\zeta\rangle$ of order $n$; and
this group has been completely described in the proof of Theorem 1.2. Noting that $\langle\zeta\rangle$ has
only these $\varphi(n)$ automorphisms, we may identify $G$ with the automorphism group of the
cyclic group $\langle\zeta\rangle$ of order $n$.

The general theory of cyclotomic fields will be continued in Section 4. Before then
we will quickly survey the theory of finite fields, which will put us in better stead for the
sequel.

Exercises 2.
1. Contrary to what one might first guess based on the first few examples, coefficients appearing in
cyclotomic polynomials are not always 0 or ±1. Find the first example (i.e. with the smallest n)
in which Φn (t) has a coefficient ±2.
2. Let $n = p^e$, $p$ prime, $e \ge 1$. Prove that $\Phi_n(t)$ has integer coefficients and is irreducible in $\mathbb{Z}[t]$,
hence also in $\mathbb{Q}[t]$.

3. Let $n$ be a positive integer.
(a) Show that
$$\Phi_{2n}(t) = \begin{cases} t + 1, & \text{for } n = 1; \\ \Phi_n(-t), & \text{for odd } n \ge 3; \\ \Phi_n(t^2), & \text{for } n \text{ even.} \end{cases}$$
(b) State and prove an analogue of (a) expressing $\Phi_{3n}(t)$ in terms of $\Phi_n(t)$.
(c) By generalizing (a) and (b), can you give a formula for $\Phi_{pn}(t)$ in terms of $\Phi_n(t)$ whenever $p$ is
prime? If so, does this lead to a practical algorithm for computing cyclotomic polynomials?
By ‘practical’, one might ask how well it compares with the algorithm based on Theorem 2.3,
and illustrated by the examples which follow that result.
4. Find, with proof, a simple formula for $\Phi_n(1)$.

5. The general linear group of degree $n$ over a field $F$ is the multiplicative group $GL_n(F)$
consisting of all invertible $n \times n$ matrices over the field $F$. In the special case that $F$ is finite with
$|F| = q$ elements, one has $|GL_n(\mathbb{F}_q)| = q^{n(n-1)/2} \prod_{k=1}^{n} (q^k - 1)$. For each $n = 1, 2, \dots, 6$, express
$|GL_n(\mathbb{F}_q)|$ as an explicit polynomial in $q$; and in each case, factor the resulting polynomial into
irreducible factors in $\mathbb{Z}[q]$.

3. Finite Fields

For every prime p, the integers mod p form a field K = Fp . The essential observation
here is that every nonzero element a ∈ K has a multiplicative inverse. (Interpret a as an
integer not divisible by p; then since gcd(a, p) = 1, the extended Euclidean algorithm gives
1 = ra + sp for some r, s ∈ Z, and then r is an inverse for a mod p.) We next consider
arbitrary finite fields.
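The inverse computation described in the parenthetical remark can be sketched as follows. The function names `extended_gcd` and `inverse_mod_p` are our own, and the prime 13 is just a sample value.

```python
def extended_gcd(a, b):
    """Return (g, r, s) with g = gcd(a, b) and g = r*a + s*b."""
    r0, r1 = a, b
    x0, x1 = 1, 0          # invariant: r0 = x0*a + y0*b
    y0, y1 = 0, 1          # invariant: r1 = x1*a + y1*b
    while r1 != 0:
        q = r0 // r1
        r0, r1 = r1, r0 - q * r1
        x0, x1 = x1, x0 - q * x1
        y0, y1 = y1, y0 - q * y1
    return r0, x0, y0

def inverse_mod_p(a, p):
    """Multiplicative inverse of a in F_p, read off from 1 = r*a + s*p."""
    g, r, _s = extended_gcd(a % p, p)
    if g != 1:
        raise ValueError("a is divisible by p, so it has no inverse")
    return r % p

p = 13  # sample prime
print({a: inverse_mod_p(a, p) for a in range(1, p)})
```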
Let F be a finite field; and let q = |F | be its order. Since the additive group of F
is finite, every element in this group has finite order; so there is an element a ∈ F of
prime order p in the additive group. Now every nonzero element b ∈ F must also have
additive order $p$. To see this, note that the map $\theta : F \to F$, $x \mapsto \frac{a}{b}\,x$ is clearly bijective
and $\theta(x + y) = \theta(x) + \theta(y)$, so $\theta$ is an automorphism of the additive group of $F$; so $b$ has
the same order as $\theta(b) = a$. Thus all nonidentity elements of $F$ have additive order $p$. An
abelian group with this property is elementary abelian: it is a direct product of cyclic
groups of order $p$. Thus every finite field $F$ has prime power order $q = p^e$ for some $e \ge 1$
and prime $p$. Less obvious is the fact that for every prime power $q$, there is a field $F$ of
order $q$; and it is unique up to isomorphism, so we may unambiguously write $F = \mathbb{F}_q$. The
field $F$ is an extension of degree $[F : K] = e$ over the prime field $K = \mathbb{F}_p$. The prime $p$
is the characteristic of $F$ (and of $K$). See Appendix A1 for a more general discussion of
fields and extensions.
The most direct way to construct $F = \mathbb{F}_q$, $q = p^e$, is to first choose an irreducible
polynomial $f(t) \in K[t]$ of degree $e$. Without loss of generality, $f(t)$ is monic (its leading
coefficient is 1). Then $F = K[\theta]$ where $\theta$ is a formal symbol acting as a root of $f(t)$. In
other words, $F \cong K[t]/(f(t))$. Now $F$ has $\{1, \theta, \theta^2, \dots, \theta^{e-1}\}$ as a basis over $K$. This
information gives a completely explicit construction of $F$. (Note that there are $p^e$ monic
polynomials of degree $e$ in $K[t]$; and at least one of them is irreducible. Take this fact on faith for
now; also the fact that the resulting field $F$ doesn’t depend on which such irreducible
polynomial we choose.)

Example 3.1: The field of order 16. Let $K = \mathbb{F}_2 = \{0, 1\}$. The monic polynomials $t$ and
$t+1$ of degree 1 are of course irreducible. Of the four polynomials of degree 2, three (namely $t^2$,
$t^2+1 = (t+1)^2$ and $t^2+t = t(t+1)$) are reducible; so by process of elimination, the polynomial
$t^2+t+1$ is irreducible. Of the sixteen polynomials of degree 4, we need only consider those with
constant term 1 (so that 0 is not a root) and an odd number of terms (so that 1 is not a root). This
leaves just four polynomials including $t^4+t^2+1 = (t^2+t+1)^2$; so each of the remaining three choices
$$t^4+t+1, \qquad t^4+t^3+1, \qquad t^4+t^3+t^2+t+1$$
is irreducible (because otherwise, an irreducible factor of degree $\le 2$ would be involved, a
possibility which we have already ruled out). For simplicity, we take $f(t) = t^4+t+1$ and $F = K[\theta]$ where
$f(\theta) = 0$, i.e. $\theta^4 = \theta+1$.
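The process of elimination in Example 3.1 can be repeated by machine. The following sketch encodes a polynomial over $\mathbb{F}_2$ as an integer bitmask (bit $i$ is the coefficient of $t^i$, an encoding chosen here for convenience) and tests irreducibility by trial division; the function names are hypothetical.

```python
def gf2_mod(a, b):
    """Remainder of a modulo b, for polynomials over F_2 stored as bitmasks."""
    db = b.bit_length()
    while a.bit_length() >= db:
        a ^= b << (a.bit_length() - db)   # cancel the leading term of a
    return a

def is_irreducible_gf2(f):
    """Trial division by every polynomial of degree 1 up to deg(f) // 2."""
    deg = f.bit_length() - 1
    if deg < 1:
        return False
    return all(gf2_mod(f, g) != 0 for g in range(2, 1 << (deg // 2 + 1)))

def show(f):
    """Render a bitmask as a polynomial in t, highest degree first."""
    terms = ["1" if i == 0 else "t" if i == 1 else f"t^{i}"
             for i in range(f.bit_length() - 1, -1, -1) if (f >> i) & 1]
    return "+".join(terms)

quadratics = [f for f in range(4, 8) if is_irreducible_gf2(f)]
quartics = [f for f in range(16, 32) if is_irreducible_gf2(f)]
print([show(f) for f in quadratics])
print([show(f) for f in quartics])
```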

Recall that $F^\times$ is the multiplicative group of order $q - 1$ consisting of nonzero elements
of $F$.

Theorem 3.2. Let $F = \mathbb{F}_q$ where $q = p^e$ as above. Then the multiplicative group
$F^\times$ is cyclic of order $q - 1$.

Proof. By the Fundamental Theorem of Abelian Groups, $F^\times \cong C_{n_1} \times C_{n_2} \times \cdots \times C_{n_k}$
as a direct product of cyclic groups of orders $n_1, \dots, n_k$ with $n_1 n_2 \cdots n_k = q - 1$. In order
to prove that $F^\times$ is itself cyclic, it suffices to prove that $\gcd(n_i, n_j) = 1$ for all distinct $i, j$.
If not, then $\gcd(n_i, n_j) = d \ge 2$ for some $i \ne j$. But then $F^\times$ has at least $d^2$ elements
of order dividing $d$ (inside the subgroup $C_{n_i} \times C_{n_j}$), hence at least $d^2 > d$ roots of the
polynomial $x^d - 1$, a contradiction.

Example 3.3: The field of order 16. Take $F = \mathbb{F}_{16} = K[\theta]$ where $K = \mathbb{F}_2$ and $\theta^4 = \theta+1$ as in Example 3.1. Recursively computing $\theta^{j+1} = \theta\,\theta^j$ gives
$$\begin{array}{llll}
\theta^4 = \theta+1 & \theta^7 = \theta^3+\theta+1 & \theta^{10} = \theta^2+\theta+1 & \theta^{13} = \theta^3+\theta^2+1 \\
\theta^5 = \theta^2+\theta & \theta^8 = \theta^2+1 & \theta^{11} = \theta^3+\theta^2+\theta & \theta^{14} = \theta^3+1 \\
\theta^6 = \theta^3+\theta^2 & \theta^9 = \theta^3+\theta & \theta^{12} = \theta^3+\theta^2+\theta+1 & \theta^{15} = 1
\end{array}$$
Of course in the present context (characteristic two), all minus signs are the same as plus signs. Note that the cyclic group $F^\times$ of order 15 contains
• one element 1 of order 1 (the root of $\Phi_1(t) = t-1$);
• two elements of order 3. These are $\theta^5$ and $\theta^{10}$, the roots of $\Phi_3(t) := t^2+t+1$;
• four elements of order 5. These are $\theta^3, \theta^6, \theta^9, \theta^{12}$, the roots of $\Phi_5(t) := t^4+t^3+t^2+t+1$; and
• eight elements of order 15. These include $\theta, \theta^2, \theta^4, \theta^8$, the roots of $t^4+t+1$; and $\theta^7, \theta^{11}, \theta^{13}, \theta^{14}$, the roots of $t^4+t^3+1$. Together these make up the eight roots of $\Phi_{15}(t) = t^8-t^7+t^5-t^4+t^3-t+1$.
The cyclotomic polynomial $\Phi_n(t)$ is defined in Section 2; its roots are the primitive $n$-th roots of unity. The elements of $F$ are the sixteen roots of $t^{16}-t$; and the elements of $F^\times$ are the fifteen roots of $t^{15}-1 = \Phi_1(t)\Phi_3(t)\Phi_5(t)\Phi_{15}(t)$.
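The table of powers in Example 3.3 can be reproduced mechanically. The following is a minimal sketch, assuming we encode an element of $\mathbb{F}_{16}$ as a 4-bit integer whose bit $i$ is the coefficient of $\theta^i$; the function names are our own and purely illustrative.

```python
import math

# Elements of F_16 = F_2[θ]/(θ^4 + θ + 1) encoded as 4-bit integers:
# bit i holds the coefficient of θ^i (an encoding chosen for this sketch).
MODULUS = 0b10011  # θ^4 + θ + 1


def times_theta(a):
    """Multiply a by θ, reducing with θ^4 = θ + 1."""
    a <<= 1
    if a & 0b10000:
        a ^= MODULUS
    return a


def powers_of_theta():
    """Return [θ^0, θ^1, ..., θ^14] as 4-bit integers."""
    powers = [1]
    for _ in range(14):
        powers.append(times_theta(powers[-1]))
    return powers


def show(a):
    """Format a 4-bit integer as a polynomial in θ."""
    names = ["1", "θ", "θ^2", "θ^3"]
    terms = [names[i] for i in range(3, -1, -1) if (a >> i) & 1]
    return "+".join(terms) if terms else "0"


def order_counts():
    """Count the elements θ^j of the cyclic group of order 15 by their order 15/gcd(j, 15)."""
    counts = {}
    for j in range(15):
        order = 15 // math.gcd(j, 15)
        counts[order] = counts.get(order, 0) + 1
    return counts


if __name__ == "__main__":
    for j, a in enumerate(powers_of_theta()):
        print(f"θ^{j} = {show(a)}")
    print(order_counts())
```

Running the script lists the fifteen powers of $\theta$ and the number of elements of each order; the counts 1, 2, 4, 8 are the degrees of $\Phi_1, \Phi_3, \Phi_5, \Phi_{15}$.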

Squares and Nonsquares


Let $F = \mathbb{F}_q$. If $q$ is even then the map $F \to F$, $a \mapsto a^2$ is bijective: every element is a square (and every element has a unique square root). This follows from the fact that the group $F^\times$ has odd order $|F^\times| = q-1$.
Now (and for the remainder of our discussion of squares) assume that $q$ is odd, so that the map $a \mapsto a^2$ on $F^\times$ is two-to-one (since the group $F^\times$ is cyclic of even order $q-1$). In this case the subgroup $S \subset F^\times$ of index 2 consists of squares; and each element $a^2 \in S$ has exactly two square roots $\pm a \in F$. The cosets of $S$ give a partition $F^\times = S \sqcup N$ where $|S| = |N| = \frac12(q-1)$ and $N$ consists of nonsquares. The quadratic character of $F$ is the map
$$\chi : F \to \mathbb{C}, \qquad \chi(a) = \begin{cases} 1, & \text{if } a \in S; \\ -1, & \text{if } a \in N; \\ 0, & \text{if } a = 0. \end{cases}$$

Theorem 3.4. For any field $F$ of odd order $q$, the quadratic character $\chi$ satisfies $\chi(a) = a^{\frac{q-1}{2}}$ (interpreted as an element of $F$). In particular, $\chi(ab) = \chi(a)\chi(b)$ for all $a, b \in F$. Thus the product of two squares, or of two nonsquares, is a square; the product of a square and a nonsquare is a nonsquare.

Proof. The assertions are clear when $a = 0$ (or $b = 0$); so assume $ab \ne 0$. The $q-1$ elements of the cyclic group $F^\times$ are the roots of $t^{q-1}-1 = \bigl(t^{\frac{q-1}{2}}+1\bigr)\bigl(t^{\frac{q-1}{2}}-1\bigr)$ where every square $a^2 \in S$ satisfies $(a^2)^{\frac{q-1}{2}} = a^{q-1} = 1$, so the nonzero squares are the $\frac{q-1}{2}$ roots of $t^{\frac{q-1}{2}}-1$ in $F$; and by elimination, the nonsquares are the roots of $t^{\frac{q-1}{2}}+1$ in $F$. The multiplicative property $\chi(ab) = \chi(a)\chi(b)$ follows from the formula $\chi(a) = a^{\frac{q-1}{2}}$.

Corollary 3.5. In a field $F$ of odd order $q$, the element $-1$ is a square or a nonsquare according as $q \equiv 1$ or $3$ mod 4.

Proof. Use $\chi(-1) = (-1)^{\frac{q-1}{2}}$.
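For a prime field $\mathbb{F}_p$ the formula of Theorem 3.4 is easy to test numerically. The sketch below is only an illustration restricted to prime $p$ (so that arithmetic is ordinary arithmetic modulo $p$); the function names are hypothetical.

```python
def quadratic_character(a, p):
    """χ(a) in the prime field F_p (p an odd prime), computed as a^((p-1)/2) mod p."""
    r = pow(a, (p - 1) // 2, p)      # r is 0, 1 or p-1
    return -1 if r == p - 1 else r   # interpret p-1 as the integer -1


def nonzero_squares(p):
    """The set S of nonzero squares in F_p, found by squaring every nonzero element."""
    return {a * a % p for a in range(1, p)}


if __name__ == "__main__":
    p = 13
    S = nonzero_squares(p)
    for a in range(p):
        print(a, quadratic_character(a, p), a in S)
```

For $p = 13$ the output shows $\chi(a) = 1$ exactly on the six nonzero squares, and $\chi(12) = \chi(-1) = 1$ in agreement with Corollary 3.5, since $13 \equiv 1$ mod 4.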

We should clarify notation by a simple example: In $\mathbb{F}_{13}$ we might solve $3x + 7 = 2$ by writing $3x = 2 - 7 = -5 = 8$ and $x = \frac{8}{3} = 7$. These equations (not congruences!), although not valid in $\mathbb{Q}$ or in $\mathbb{R}$, are perfectly valid in the context of $\mathbb{F}_{13}$; in particular the expressions $2 - 7$ and $\frac{8}{3}$ are perfectly reasonable (albeit unsimplified) expressions in $\mathbb{F}_{13}$. Now in a formula such as $\chi(a) = a^{\frac{q-1}{2}}$ where the left side has been defined as $\chi(a) \in \{0, \pm 1\}$ in characteristic zero, while the right side $a^{\frac{q-1}{2}}$ lies in the field $F$ of characteristic $p$, no confusion should arise as to meaning. In this context, integer values may be interpreted modulo $p$; and later, in other contexts, we will treat values of $\chi(a)$ as ordinary integers in characteristic zero.
In fields of odd order, the partition $F = N \sqcup \{0\} \sqcup S$ should be compared with
the partition of real numbers as negative, zero, positive; also the remaining assertions of
Theorem 3.4 with the fact that a product of two real numbers ab is positive or negative,
according as a and b have the same or opposite signs. Since positive and negative real
numbers are nonzero squares and nonsquares, the corresponding properties have a common
explanation. Yet the analogy is not complete: the sum of two squares in a finite field is not
in general a square. This follows necessarily from the fact that there are no finite ordered
fields. In fact we have:

Theorem 3.6. Let $F = \mathbb{F}_q$ with $q$ odd, and let $\varepsilon = (-1)^{\frac{q-1}{2}}$. Then each $s \in S$ has
(i) $\frac14(q-4-\varepsilon)$ solutions of $s = s_1 + s_2$, $(s_1, s_2) \in S \times S$;
(ii) $\frac14(q-2+\varepsilon)$ solutions of $s = s_1 + n_1$, $(s_1, n_1) \in S \times N$; and
(iii) $\frac14(q-\varepsilon)$ solutions of $s = n_1 + n_2$, $(n_1, n_2) \in N \times N$.
Each $n \in N$ has
(iv) $\frac14(q-\varepsilon)$ solutions of $n = s_1 + s_2$, $(s_1, s_2) \in S \times S$;
(v) $\frac14(q-2+\varepsilon)$ solutions of $n = s_1 + n_1$, $(s_1, n_1) \in S \times N$; and
(vi) $\frac14(q-4-\varepsilon)$ solutions of $n = n_1 + n_2$, $(n_1, n_2) \in N \times N$.

Proof. Elementary counting arguments show that the set of triples $(a, b, c)$ with $a, b, c \in F^\times$ and $a + b + c = 0$ forms a set $T$ of size $|T| = (q-1)(q-2)$. Let $m_i$ be the number of such triples containing exactly $i$ squares, $i \in \{0, 1, 2, 3\}$. A fixed $\eta \in N$ acts on $T$ via $(a, b, c) \mapsto (\eta a, \eta b, \eta c)$, showing that $m_{3-i} = m_i$. So
$$2m_2 + 2m_3 = m_0 + m_1 + m_2 + m_3 = |T| = (q-1)(q-2).$$
Now the $\frac14(q-1)^2$ triples $(a, b, -a-b)$ with $a, b \in S$ come in three types:
(I) $m_3$ triples with $-a-b \in S$.
(II) $\frac13 m_2$ triples with $-a-b \in N$. By symmetry, triples $(a, b, c) \in T$ containing exactly two squares are equally distributed between $N \times S \times S$, $S \times N \times S$ and $S \times S \times N$.
(III) Triples with $-a-b = 0$, i.e. $-1 = \frac{a}{b} \in S$. Such triples can only occur if $q \equiv 1$ mod 4, in which case there are $\frac12(q-1)$ such triples $(a, -a, 0)$, $a \in S$. In all cases, the number of such triples can be written as $\frac14(1+\varepsilon)(q-1)$.
Thus
$$\tfrac14(q-1)^2 = m_3 + \tfrac13 m_2 + \tfrac14(1+\varepsilon)(q-1).$$
We now solve to obtain
$$m_0 = m_3 = \tfrac18(q-1)(q-2-3\varepsilon), \qquad m_1 = m_2 = \tfrac38(q-1)(q-2+\varepsilon).$$

Now it suffices to prove (i)–(iii), since these yield (iv)–(vi) by simply multiplying each of
the equations to be solved by η as above. And in each of (i)–(iii), the number of solutions
is independent of the choice of s ∈ S, as follows by multiplying the corresponding equation
by s.
For (i), solutions of $1 = s_1 + s_2$ correspond to triples $(1, -s_1, -s_2) \in T$. There are $\frac{m_3}{(q-1)/2} = \frac14(q-5)$ such triples if $\varepsilon = 1$; or $\frac{m_1/3}{(q-1)/2} = \frac14(q-3)$ such triples if $\varepsilon = -1$, where $\frac{m_1}{3}$ accounts for threefold symmetry as in (II). In either case we find $\frac14(q-4-\varepsilon)$ solutions as claimed in (i).
For (ii), solutions of $1 = s_1 + n_1$ correspond to triples $(1, -s_1, -n_1) \in T$. Regardless of the value of $\varepsilon$, there are $\frac{m_2/3}{(q-1)/2} = \frac14(q-2+\varepsilon)$ such triples.
For (iii), solutions of $1 = n_1 + n_2$ correspond to triples $(1, -n_1, -n_2) \in T$. There are $\frac{m_1/3}{(q-1)/2} = \frac14(q-1)$ such triples if $\varepsilon = 1$; or $\frac{m_3}{(q-1)/2} = \frac14(q+1)$ triples if $\varepsilon = -1$. In both cases, the number of solutions is $\frac14(q-\varepsilon)$ as claimed in (iii).
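The six counts of Theorem 3.6 can be confirmed by brute force in small cases. The following sketch assumes $q = p$ is an odd prime, so that $\mathbb{F}_q$ is simply the integers modulo $p$; the function names and the dictionary keys are illustrative choices of ours.

```python
def observed_counts(p):
    """Count solutions of each type in Theorem 3.6 over F_p, using s = 1 and n = min(N)."""
    S = {a * a % p for a in range(1, p)}
    N = set(range(1, p)) - S

    def count(target, X, Y):
        # ordered pairs (x, y) in X x Y with x + y = target
        return sum(1 for x in X if (target - x) % p in Y)

    s, n = 1, min(N)
    return {
        "i": count(s, S, S), "ii": count(s, S, N), "iii": count(s, N, N),
        "iv": count(n, S, S), "v": count(n, S, N), "vi": count(n, N, N),
    }


def predicted_counts(p):
    """The values asserted by Theorem 3.6, with ε = (-1)^((p-1)/2)."""
    eps = 1 if p % 4 == 1 else -1
    return {
        "i": (p - 4 - eps) // 4, "ii": (p - 2 + eps) // 4, "iii": (p - eps) // 4,
        "iv": (p - eps) // 4, "v": (p - 2 + eps) // 4, "vi": (p - 4 - eps) // 4,
    }


if __name__ == "__main__":
    for p in (3, 5, 7, 11, 13):
        print(p, observed_counts(p), predicted_counts(p))
```

Since the number of solutions does not depend on the choice of $s \in S$ or $n \in N$, it suffices here to test one representative of each class.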

Automorphisms of Finite Fields

Theorem 3.7. Let $F = \mathbb{F}_q$, $q = p^e$, $p$ prime, $e \ge 1$. Then $\mathrm{Aut}\,F$ is cyclic of order $e$, generated by the map $\sigma : F \to F$, $a \mapsto a^p$. In particular, the extension $F \supseteq K = \mathbb{F}_p$ is Galois, with Galois group $G(F/K) = \{\iota, \sigma, \sigma^2, \ldots, \sigma^{e-1}\}$.

Proof. Defining $\sigma : F \to F$ by $\sigma(a) = a^p$, we clearly have $\sigma(ab) = \sigma(a)\sigma(b)$ for all $a, b \in F$. Also
$$\sigma(a+b) = a^p + \binom{p}{1} a^{p-1} b + \binom{p}{2} a^{p-2} b^2 + \cdots + \binom{p}{p-1} a b^{p-1} + b^p = a^p + b^p = \sigma(a) + \sigma(b)$$
in characteristic $p$, since all binomial coefficients $\binom{p}{k}$ for $k = 1, 2, \ldots, p-1$ are divisible by the prime $p$. Thus $\sigma$ is a ring homomorphism. Since $F$ is a field, its only ideals are $\{0\}$ and $F$; and $\ker\sigma \ne F$ since $\sigma(1) = 1$. So $\ker\sigma = \{0\}$ and $\sigma$ is injective. Since $F$ is finite, $\sigma$ is also bijective, and it is an automorphism of $F$. The element $\sigma \in \mathrm{Aut}\,F$ has order at most $e = [F : K]$ by Theorem A5.4(iii). Its order must be exactly $e$ since if $1 \le k < e$, the nonconstant polynomial $x^{p^k} - x \in K[x]$ cannot have $p^e$ distinct roots. Since the upper bound $|G| = e = [F : K]$ is attained, the extension $F \supseteq K$ is Galois and $G = \langle\sigma\rangle$.

A general principle (not actually a theorem) is that almost anything that works for
prime order fields, works for finite fields. We now generalize Theorem 3.7 to arbitrary
finite fields.

Theorem 3.8. Let $E \supseteq F$ be an extension of finite fields, with $E = \mathbb{F}_{q^n}$, $F = \mathbb{F}_q$, where $q$ is a prime power. Then the extension $E \supseteq F$ is Galois of degree $[E : F] = n$. Its Galois group is cyclic: $G(E/F) = \langle\sigma\rangle = \{\iota, \sigma, \sigma^2, \ldots, \sigma^{n-1}\}$ where $\sigma(a) = a^q$. The norm and trace maps of the extension are
$$N_{E/F}(a) = \prod_{i=0}^{n-1} \sigma^i(a) = a^{1+q+q^2+\cdots+q^{n-1}};$$
$$\mathrm{Tr}_{E/F}(a) = \sum_{i=0}^{n-1} \sigma^i(a) = a + a^q + a^{q^2} + \cdots + a^{q^{n-1}}.$$

Proof. Consider the tower $E \supseteq F \supseteq K$ where $E = \mathbb{F}_{q^n}$, $F = \mathbb{F}_q$, $q = p^e$, $K = \mathbb{F}_p$. By Theorem 3.7, the extension $E \supseteq K$ is Galois of degree $ne$ with group $G = G(E/K) = \langle\tau\rangle$, $\tau(a) = a^p$. Note that $\sigma = \tau^e$ generates the subgroup of order $n$ in $G$. By Galois correspondence (Theorem A5.11) the fixed field $\mathrm{Fix}_E(\langle\sigma\rangle)$ is the unique subfield of $E$ of order $q$, so $\mathrm{Fix}_E(\langle\sigma\rangle) = F$. Since $\langle\iota\rangle \trianglelefteq \langle\sigma\rangle$, the extension $E \supseteq F$ is Galois of degree $n$ and its Galois group is $G(E/F) \cong G_F = \langle\sigma\rangle$. The formulas for norm and trace follow from Theorem A5.13.
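To see the norm and trace formulas of Theorem 3.8 in action, here is a small sketch for the extension $\mathbb{F}_{16} \supset \mathbb{F}_4$ (so $q = 4$, $n = 2$), assuming the model $\mathbb{F}_{16} = \mathbb{F}_2[\theta]/(\theta^4+\theta+1)$ of Example 3.3 with elements encoded as 4-bit integers. The helper names are our own.

```python
MODULUS = 0b10011  # θ^4 + θ + 1; bit i of an element is the coefficient of θ^i


def mul(a, b):
    """Multiply two elements of F_16 by shift-and-add, reducing with θ^4 = θ + 1."""
    result = 0
    while b:
        if b & 1:
            result ^= a          # addition in characteristic 2 is XOR
        b >>= 1
        a <<= 1
        if a & 0b10000:
            a ^= MODULUS
    return result


def power(a, k):
    """Compute a^k in F_16 by repeated multiplication (k >= 0)."""
    result = 1
    for _ in range(k):
        result = mul(result, a)
    return result


def norm(a):
    """N_{E/F}(a) = a^(1+q) for E = F_16, F = F_4, q = 4."""
    return power(a, 5)


def trace(a):
    """Tr_{E/F}(a) = a + a^q for E = F_16, F = F_4, q = 4."""
    return a ^ power(a, 4)


if __name__ == "__main__":
    subfield = sorted(a for a in range(16) if power(a, 4) == a)
    print("F_4 inside F_16:", subfield)
    print("norms: ", [norm(a) for a in range(16)])
    print("traces:", [trace(a) for a in range(16)])
```

The subfield $\mathbb{F}_4$ is recovered as the fixed field of $\sigma(a) = a^4$, and every norm and trace lands in it, as the theorem requires.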

Polynomials versus Functions


Students encountering finite fields for the first time must come to terms with the revelation that, for example in $\mathbb{F}_3[x]$,
$$(x - 1)^3 = x^3 - 1 \ne x - 1.$$
All three of these polynomials represent the same function $\mathbb{F}_3 \to \mathbb{F}_3$, namely $0 \mapsto 2 \mapsto 1 \mapsto 0$; and the first two polynomials coincide (since corresponding coefficients agree); but the last two polynomials are distinct (corresponding coefficients do not agree; indeed the two polynomials don't even have the same degree). In grappling with the source of one's own confusion, the student will come to better understand polynomials and functions, not only over finite fields, but over other fields including $\mathbb{R}$. The confusion here is not attributable to characteristic; it comes down to a distinction between finite and infinite ground fields. In a nutshell,
Over an infinite field, there are more functions than polynomials;
over a finite field, there are more polynomials than functions.
Let’s make sense of this synopsis.
If $F$ is any field, then the set of all functions $F \to F$ is an algebra over $F$ which we may denote by $F^F$. Sums, products and scalar multiples of functions are by pointwise evaluation; thus for $f, g \in F^F$, i.e. $f, g : F \to F$, and scalars $a, b \in F$, the functions $fg : F \to F$ and $af + bg : F \to F$ are defined by
$$(fg)(c) = f(c)g(c) \quad\text{and}\quad (af + bg)(c) = af(c) + bg(c) \quad\text{for all } c \in F.$$
Now any polynomial $f(x) \in F[x]$ yields a function $f : F \to F$ simply by evaluating the polynomial at elements of $F$. But we must learn in general to distinguish the polynomial $f(x) \in F[x]$ from the resulting function $F \to F$. When the field $F$ is infinite, the functions representable as polynomials (the so-called polynomial functions) form a proper subalgebra of the algebra $F^F$ of all functions $F \to F$. For example, the function $\mathbb{R} \to \mathbb{R}$, $x \mapsto \sin x$ is not a polynomial function. One proves that over an infinite field $F$, the polynomial functions form a proper subalgebra of $F^F$, isomorphic to $F[x]$. This is not an immediate consequence of the definitions; but we will outline a couple of ways to prove it (below).
When the field $F$ is finite, say $|F| = q$, then every function $F \to F$ is representable by a polynomial, but in more than one way. Indeed, there are infinitely many polynomials ($(q-1)q^n$ distinct polynomials of degree $n$, for each degree $n \ge 0$; plus the zero polynomial) but only finitely many functions ($|F^F| = q^q$ in fact).
To make sense of this, observe that for an arbitrary field $F$ (finite or infinite), every polynomial $f(x) \in F[x]$ can be used to represent a function $f \in F^F$, i.e. $f : F \to F$. This gives a map which we shall denote by
(3.9) $\theta : F[x] \to F^F$, $f(x) \mapsto f$.

This map is not only $F$-linear, it also preserves products; so it is a homomorphism of $F$-algebras (i.e. vector spaces over $F$ which are also rings). (A little more in fact: since $\theta(1) = 1$, $\theta$ is a homomorphism of algebras with identity.) The image of $\theta$ is, by definition, the subalgebra of all polynomial functions $F \to F$. By the First Isomorphism Theorem for Rings,
(3.10) $F[x]/\ker\theta \cong \{\text{polynomial functions } F \to F\} \subseteq F^F$.

Now consider $f(x) \in \ker\theta$, i.e. $f(a) = 0$ for all $a \in F$. If $a_1, a_2, \ldots, a_n \in F$ are distinct, then repeated application of the Division Algorithm shows that $f(x)$ is divisible by $(x - a_1)(x - a_2) \cdots (x - a_n)$ in $F[x]$; so either $f(x) = 0$ or $\deg f(x) \ge n$. It follows that for $F$ infinite, $\ker\theta = 0$, i.e. $\theta$ is injective and so each polynomial can be identified with the polynomial function that it represents. On the other hand if $|F| = q$, we see that $\ker\theta \subset F[x]$ is the principal ideal generated by $\prod_{a \in F}(x - a)$. This polynomial of degree $|F| = q$ must actually equal $x^q - x \in F[x]$ which is the unique monic polynomial of degree $q$ having all elements of $F$ as roots, by Theorems 3.7 and 3.8. In summary, we obtain

Theorem 3.11. Let $F$ be a field, and $\theta$ as in (3.9).
(a) If $F$ is infinite, then $\theta$ is injective but not surjective; every polynomial $f(x)$ can be safely identified with the resulting function $f = \theta(f(x))$.
(b) If $|F| = q$, then $\theta$ is surjective but not injective; $F^F \cong F[x]/(x^q - x)$; and every function $F \to F$ can be represented by infinitely many different polynomials, but by a unique polynomial of degree less than $q$.
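Part (b) can be illustrated concretely over a prime field. The sketch below assumes $F = \mathbb{F}_p$ with $p$ prime and stores a polynomial as a list of coefficients, lowest degree first (a convention chosen only for this example); it reduces a polynomial modulo $x^p - x$ and checks that the function it represents is unchanged.

```python
def evaluate(coeffs, c, p):
    """Evaluate the polynomial with coeffs[i] = coefficient of x^i at c in F_p (Horner's rule)."""
    value = 0
    for a in reversed(coeffs):
        value = (value * c + a) % p
    return value


def reduce_mod_xp_minus_x(coeffs, p):
    """Remainder of the polynomial modulo x^p - x, obtained by rewriting x^i as x^(i-(p-1)) while i >= p."""
    reduced = [0] * p
    for i, a in enumerate(coeffs):
        while i >= p:
            i -= p - 1           # uses x^p = x, i.e. x^i = x^(i-p+1) modulo x^p - x
        reduced[i] = (reduced[i] + a) % p
    return reduced


if __name__ == "__main__":
    p = 5
    f = [1, 1, 0, 0, 0, 0, 0, 1]            # x^7 + x + 1 over F_5
    g = reduce_mod_xp_minus_x(f, p)
    print("reduced coefficients:", g)
    print([evaluate(f, c, p) for c in range(p)])
    print([evaluate(g, c, p) for c in range(p)])
```

The reduced polynomial has degree less than $p$ and takes the same values as the original at every point of $\mathbb{F}_p$; it is the unique representative promised by the theorem.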

As an alternative to the argument above using the Division Algorithm, one can obtain the
isomorphism using Lagrange interpolation, which is described as follows.

Theorem 3.12. Let $F$ be a field. Given distinct scalars $a_1, a_2, \ldots, a_n \in F$, consider the polynomials
$$f_i(x) = \prod_{\substack{1 \le j \le n \\ j \ne i}} \frac{x - a_j}{a_i - a_j} \in F[x], \qquad i \in \{1, 2, \ldots, n\}$$
of degree $n - 1$. Then
(a) $f_i(a_j) = \delta_{i,j}$.
(b) $f_1(x), f_2(x), \ldots, f_n(x)$ form a basis for the $n$-dimensional subspace of $F[x]$ consisting of all polynomials of degree $< n$.
(c) Given $b_1, b_2, \ldots, b_n \in F$ (not necessarily distinct), the unique polynomial of degree $< n$ whose graph passes through the $n$ points $(a_1, b_1), (a_2, b_2), \ldots, (a_n, b_n) \in F^2$ is $b_1 f_1(x) + b_2 f_2(x) + \cdots + b_n f_n(x)$.

The proof of Theorem 3.12 is elementary; and the basis in 3.12(b) is the Lagrange inter-
polation basis. For |F | = q, it shows that every f : F → F is represented by a unique
polynomial of degree < q. This gives an alternative proof of Theorem 3.11.
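Lagrange interpolation over a prime field can be sketched in a few lines. The code below assumes $F = \mathbb{F}_p$ with $p$ prime, stores polynomials as coefficient lists (lowest degree first), and uses Python's three-argument `pow` with exponent $-1$ for the inverse modulo $p$ (available in Python 3.8 and later). All function names are illustrative.

```python
def poly_mul(f, g, p):
    """Product of two coefficient lists over F_p."""
    product = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            product[i + j] = (product[i + j] + a * b) % p
    return product


def lagrange_interpolate(points, p):
    """Coefficients of the unique polynomial of degree < n through n points (a_i, b_i) with distinct a_i in F_p."""
    n = len(points)
    result = [0] * n
    for i, (ai, bi) in enumerate(points):
        numerator = [1]          # builds the product of (x - a_j) over j != i
        denominator = 1          # builds the product of (a_i - a_j) over j != i
        for j, (aj, _) in enumerate(points):
            if j != i:
                numerator = poly_mul(numerator, [(-aj) % p, 1], p)
                denominator = denominator * (ai - aj) % p
        scale = bi * pow(denominator, -1, p) % p   # b_i / prod(a_i - a_j)
        for k, c in enumerate(numerator):
            result[k] = (result[k] + scale * c) % p
    return result


if __name__ == "__main__":
    # Three points on the graph of x^2 + 1 over F_7.
    print(lagrange_interpolate([(1, 2), (2, 5), (4, 3)], 7))
```

Interpolating through all $p$ points $(x, g(x))$ of an arbitrary function $g : \mathbb{F}_p \to \mathbb{F}_p$ yields the polynomial of degree less than $p$ representing $g$.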
A third proof of Theorem 3.11 is based on the observation that the problem of con-
structing a polynomial f (x) ∈ F [x] whose graph passes through n pairs (ai , bi ) as in
Theorem 3.12, is equivalent to solving a linear system of n equations in n unknowns. The
n × n coefficient matrix of this system is of Vandermonde form and is well known to be
invertible. This gives a unique interpolating polynomial.
A final word about obtaining functions from polynomials: Given a polynomial $f(x) \in F[x]$, in addition to the function $f : F \to F$ represented by $f(x)$, one also obtains a function $E \to E$ for every extension field $E \supseteq F$. Thus for example, every $f(x) \in \mathbb{Q}[x]$ naturally represents functions $\mathbb{Q} \to \mathbb{Q}$, $\mathbb{R} \to \mathbb{R}$, $\mathbb{C} \to \mathbb{C}$, $\mathbb{Q}[i] \to \mathbb{Q}[i]$, etc. Returning to the earlier example, although the distinct polynomials $x^3 - 1$ and $x - 1$ in $\mathbb{F}_3[x]$ represent the same function $\mathbb{F}_3 \to \mathbb{F}_3$, they represent distinct functions $\mathbb{F}_9 \to \mathbb{F}_9$. Given two polynomials $f(x), g(x) \in F[x]$ where $F$ is a finite field, we have that $f(x) = g(x)$ iff the two polynomials represent the same function $\overline{F} \to \overline{F}$, where $\overline{F} \supset F$ is the algebraic closure.

Counting Irreducible Polynomials

Let $F = \mathbb{F}_q$. Denote by $n_{q,d}$ (or simply $n_d$, since $q$ will usually remain unchanged throughout our discussion) the number of monic irreducible polynomials $f(x) \in F[x]$ of degree $d \ge 1$. In the construction of finite fields, we make essential use of the fact that $n_d \ge 1$ for all $d$. Here we give a formula for $n_d$, from which it follows that for a monic polynomial $f(x) \in F[x]$ of degree $d$ chosen uniformly at random, the probability that $f(x)$ is irreducible is asymptotically $\frac{1}{d}$ as $d \to \infty$. The total number of elements in $\mathbb{F}_{q^k}$ is
$$q^k = \sum_{d \mid k} d\,n_d$$
since each $\alpha \in \mathbb{F}_{q^k}$ is algebraic of some degree $d \mid k$; and the minimal polynomial of $\alpha$ over $F$ has $d$ distinct roots in this extension. It follows (by induction on $d$, or by Möbius inversion, or by inclusion-exclusion; cf. Section 1) that
$$k\,n_k = \sum_{d \mid k} \mu\bigl(\tfrac{k}{d}\bigr)\, q^d = \sum_{d \mid k} \mu(d)\, q^{k/d}.$$

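This may be rewritten as conclusion (ii) of:

Theorem 3.13. The number $n_k$ of monic irreducible polynomials $f(x) \in \mathbb{F}_q[x]$ of degree $k$ satisfies
(i) $q^d = \sum_{k \mid d} k\,n_k$; and
(ii) $n_k = \frac{1}{k} \sum_{d \mid k} \mu(d)\, q^{k/d} \sim \frac{q^k}{k}$ as $k \to \infty$. When $k$ is a power of a single prime $r$, the sum equals $\frac{q^k}{k}\bigl(1 - q^{-(1-\frac{1}{r})k}\bigr)$ exactly.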

Notice, by the way, that this formula asserts that $n_k > 0$ for all $k \ge 1$. (However, our argument cannot be construed as proof of the existence of irreducible polynomials of every degree, if one first assumes the existence of $\mathbb{F}_{q^k} \cong \mathbb{F}_q[x]/(m(x))$ obtained using an irreducible polynomial $m(x) \in \mathbb{F}_q[x]$ of degree $k$, as this would constitute circular reasoning. One could however obtain the same formula by counting polynomials of each degree instead of counting elements of each degree.)
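The Möbius formula for $n_k$ is easy to evaluate. The following sketch computes $\mu$ by trial division and then $n_k = \frac{1}{k}\sum_{d \mid k} \mu(d) q^{k/d}$; the function names are our own.

```python
def mobius(n):
    """Möbius function μ(n), computed by trial division."""
    result = 1
    r = 2
    while r * r <= n:
        if n % r == 0:
            n //= r
            if n % r == 0:
                return 0         # n is divisible by r^2
            result = -result
        r += 1
    return -result if n > 1 else result


def count_irreducible(q, k):
    """Number of monic irreducible polynomials of degree k over a field of order q."""
    total = sum(mobius(d) * q ** (k // d) for d in range(1, k + 1) if k % d == 0)
    return total // k


if __name__ == "__main__":
    for q in (2, 3):
        print(q, [count_irreducible(q, k) for k in range(1, 9)])
```

For $q = 2$ and $k = 4$ this returns 3, matching the three irreducible quartics over $\mathbb{F}_2$ used in constructing the field of order 16.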

The following result will be required in Sections 13 and 14. It is a fundamental result
in finite projective geometry (see [M4]); but in keeping with the scope of this course, here
we state and prove it in the affine setting.

Theorem 3.14 (Segre). Let $f : F \to F$ where $F = \mathbb{F}_q$ is a finite field of odd order $q$. Then the following conditions are equivalent.
(i) There exist $a, b, c \in F$ with $a \ne 0$ such that $f(x) = ax^2 + bx + c$ for all $x \in F$.
(ii) No three points of the graph of $f$ (the point set $\Gamma_f = \{(x, f(x)) : x \in F\} \subset F^2$) are collinear.

Proof. Clearly (i) implies (ii). Points of intersection of the graphs of y = f (x) and
y = mx + k (m, k ∈ F ) are found by first solving ax2 + bx + c = mx + k for x; this has at
most two solutions for x ∈ F and hence at most two points of intersection (x, y).
Conversely, suppose f satisfies (ii). The following observation will be useful later.

(3.15) We may freely apply an affine linear transformation $T : F^2 \to F^2$ of the form $T(x, y) = (\alpha x + \beta, \gamma y + \delta)$, $\alpha, \beta, \gamma, \delta \in F$ with $\alpha\gamma \ne 0$, to $\Gamma_f$ without altering either the hypothesis (ii) or the desired conclusion (i). In other words, $f$ satisfies (i) (respectively, (ii)) iff the variant $x \mapsto \gamma f\bigl(\frac{x-\beta}{\alpha}\bigr) + \delta$ satisfies (i) (respectively, (ii)).

Each point $P = (x_0, f(x_0))$ lies on $q-1$ secant lines passing through the other points $(x, f(x))$ of $\Gamma_f$, $x \ne x_0$; and by (ii), these $q-1$ secants are necessarily distinct. Since there are exactly $q$ non-vertical lines through $P$ (corresponding to the $q$ choices from $F$ for the slope), each point $P \in \Gamma_f$ lies on a unique tangent line, this being the unique non-vertical line intersecting $\Gamma_f$ in the unique point $P$. Now $\Gamma_f$ has $q$ tangent lines, and it is not hard to see that any two tangent lines differ in slope. (Consider the tangent line $\ell$ through a point $P \in \Gamma_f$, and let $Q \ne P$ be another point of $\ell$. Since $|\Gamma_f| = q$ is odd and every line through $Q$ meets $\Gamma_f$ in 0, 1 or 2 points, there must be an even number of tangent lines passing through $Q$ (remembering that the vertical line through $Q$ is not considered here as a tangent line). Since $Q$ already lies on $\ell$, each of the $q-1$ points $Q \ne P$ on $\ell$ must lie on at least one additional tangent line other than $\ell$. Since each of the $q-1$ tangent lines $\ell' \ne \ell$ meets $\ell$ in a single point (since $\ell'$ and $\ell$ have different slope), the Pigeonhole Principle shows that each point $Q \ne P$ on $\ell$ lies on a unique tangent line other than $\ell$. This verifies our claim that no point of the plane $F^2$ lies on more than two tangent lines.)
At this point we prove the following fact, which is usually known as the Lemma of Tangents.

(3.16) Let $A \ne B$ be distinct points in $\Gamma_f$, and let $M$ be the point of intersection of their corresponding tangent lines. Then the $x$-coordinate of $M$ is the average of the $x$-coordinates of $A$ and $B$. (Stated geometrically, if we complete the parallelogram $AMBM'$ as shown, then the diagonal $MM'$ is vertical.)
Note that if $A, M, B$ have coordinates $(a_1, a_2)$, $(m_1, m_2)$, $(b_1, b_2)$ respectively, then the assertion of (3.16) is that $m_1 = \frac{a_1 + b_1}{2}$. This formula makes sense only because char $F$ is odd (see Exercise #2 regarding the situation in even characteristic). Put another way, in fields of even characteristic, no arithmetic progression can have more than two distinct terms.
Proof of (3.16). By (3.15), there is no loss of generality in assuming that the points $A$ and $B$ have coordinates $(0, 0)$ and $(1, 0)$ respectively. (The transformations in (3.15) not only preserve the properties (i) and (ii); they also preserve the property described by the conclusion of (3.16).) Denote by $f'(x)$ the slope of the tangent line to $\Gamma_f$ at the point $(x, f(x))$. (Of course $f'$ is simply a convenient name to use here; it is not intended to connote differentiation. Note that $f'(0) \ne 0$ since the tangent line at $A$ cannot pass through $B$; and likewise, $f'(1) \ne 0$.) The points $P(x, f(x)) \in \Gamma_f$ other than $A, B$ are indexed by the values $x \in F$, $x \ne 0, 1$; and each such point determines two secant lines $PA$, $PB$ with nonzero slopes $\frac{f(x)}{x}$ and $\frac{f(x)}{x-1}$ respectively. Now consider the product
$$\Pi = \prod_{\substack{P \in \Gamma_f \\ P \ne A, B}} \frac{\text{slope of } PA}{\text{slope of } PB}$$
having $q-2$ factors in the numerator, including all of the nonzero elements of $F$ except $f'(0)$ (since only three lines through $A$ have been omitted: the horizontal and vertical line, and the tangent line). Similarly, the denominator of $\Pi$ includes all of the nonzero elements of $F$ except $f'(1)$ (the slope of the tangent line through $B$). After cancelling like factors, we obtain $\Pi = \frac{f'(1)}{f'(0)}$. However, the preceding expressions for the slopes of $PA$ and $PB$ give
$$\Pi = \prod_{\substack{P \in \Gamma_f \\ P \ne A, B}} \frac{\text{slope of } PA}{\text{slope of } PB} = \prod_{\substack{x \in F \\ x \ne 0, 1}} \frac{f(x)/x}{f(x)/(x-1)} = \prod_{\substack{x \in F \\ x \ne 0, 1}} \frac{x-1}{x} = -1$$
since in the latter product, the only element of $F^\times$ omitted in the numerator is $-1$, whereas the denominator omits only 1 as a factor. Equating these two expressions for $\Pi$ yields $f'(0) = -f'(1) = m$ for some $m \in F^\times$. This means that $M$ is the point $\bigl(\frac12, \frac{m}{2}\bigr)$, which completes the proof of (3.16).
Now to complete the proof of Theorem 3.14, still assuming (ii) holds, we fix two points $A, B$ in $\Gamma_f$. We will henceforth assume (without loss of generality, using (3.15)) that these are the points $(-1, 1)$ and $(1, 1)$ respectively; and that the point $M$ where their tangents meet is the point $(0, -1)$. (Recall that $-1 \ne 1$ since $q$ is odd. These coordinates differ from the choices in our proof of (3.16); but our new choices benefit from a different use of symmetry.) Consider an arbitrary point $P(x, f(x)) \in \Gamma_f$ distinct from $A$ and $B$. Denote by $Q$ and $R$ the points where the tangent at $P$ meets the tangent lines through $A$ and $B$, respectively. By (3.16), these points have coordinates $Q = \bigl(\frac{x-1}{2}, u\bigr)$ and $R = \bigl(\frac{x+1}{2}, v\bigr)$ for some $u, v \in F$. Since $A, M, Q$ are collinear, we must have $u = -x$; and collinearity of $B, M, R$ requires $v = x$. Finally, the collinearity of $P, Q, R$ requires $f(x) = x^2$.
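For very small fields, Segre's Theorem can be checked exhaustively. The sketch below assumes $q = p$ is an odd prime (so arithmetic is modulo $p$), enumerates all $p^p$ functions $\mathbb{F}_p \to \mathbb{F}_p$ as tuples of values, and compares those with no three collinear graph points against the quadratic functions. This is a brute-force illustration of the statement, not a substitute for the proof, and the names are our own.

```python
from itertools import combinations, product


def has_collinear_triple(values, p):
    """values[x] = f(x); return True if some three points of the graph are collinear in F_p^2."""
    for x1, x2, x3 in combinations(range(p), 3):
        y1, y2, y3 = values[x1], values[x2], values[x3]
        # Zero determinant means (x1,y1), (x2,y2), (x3,y3) lie on a common line.
        if ((y2 - y1) * (x3 - x1) - (y3 - y1) * (x2 - x1)) % p == 0:
            return True
    return False


def arc_functions(p):
    """All functions F_p -> F_p satisfying condition (ii) of Theorem 3.14."""
    return {f for f in product(range(p), repeat=p) if not has_collinear_triple(f, p)}


def quadratic_functions(p):
    """All functions x -> a x^2 + b x + c with a != 0, as in condition (i)."""
    return {
        tuple((a * x * x + b * x + c) % p for x in range(p))
        for a in range(1, p) for b in range(p) for c in range(p)
    }


if __name__ == "__main__":
    for p in (3, 5):
        print(p, len(arc_functions(p)), arc_functions(p) == quadratic_functions(p))
```

The search space grows as $p^p$, so this approach is practical only for the first few primes.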

We remark that in the projective setting, the three lines AR, BQ and P M all pass
through a common point O; and this assertion is the more general form of the Lemma
of Tangents. In affine coordinates as above, assuming x2 − x + 1 6= 0, one finds that
O = 2(xx(x+1) 1
 e
2 −x+1) , 2(x2 −x+1) . This is not a problem except when q = 3 and x = −1; or
when q ≡ 1 mod 6 and x is a primitive sixth root of unity in F (see Exercise #10.2). In
these cases O lies ‘at infinity’ and then the appropriate affine description is that the lines
AR, BQ and P M are mutually parallel. The traditional proof of Segre’s Theorem includes
the full version of the Lemma of Tangents, as is most natural in the projective setting.
But our proof above suffices because the complete statement of the Lemma of Tangents is . . . er . . . tangential to the immediate goal of proving Segre's Theorem. While (3.16) is itself a special case of the Lemma of Tangents, this however is easily stated in affine form.
Since we will make use of Segre's Theorem in Section 13 to prove something about cyclotomic
fields, it is appropriate to question whether the geometric terminology of Segre’s Theorem
should be required to prove an essentially algebraic fact. Our personal view is that the
geometric language and figures used here provide a conceptual aid which is surely helpful
in following the proof. While the pictures might not be strictly necessary, the finiteness
of F (enabling us to use the Pigeonhole Principle) was indispensable.

Exercises 3.
1. (a) The proof of (3.16) involves the product of all $q-1$ nonzero elements of the finite field $F = \mathbb{F}_q$. Prove that this product equals $-1$. (Our proof did not require the explicit value of this product, because almost all factors in the top and bottom of $\Pi$ were cancelled.) Hint: For each factor $x$ in $\prod_{x \ne 0} x$, the factor $x^{-1}$ also appears.
(b) As a corollary, obtain Wilson's Theorem: $(p-1)! \equiv -1$ mod $p$ for every prime $p$.

2. Let $F$ be a finite field of even order $q = 2^e$. (Keep in mind that for fields of even order, the map $a \mapsto a^2$ is an automorphism, by Theorem 3.7. Its inverse, $a \mapsto \sqrt{a}$, is also an automorphism. Thus every element has a unique square root; and $\sqrt{a+b} = \sqrt{a} + \sqrt{b}$, $\sqrt{ab} = \sqrt{a}\sqrt{b}$.) Consider the function $f : F \to F$ defined by $f(a) = \frac{1}{a}$, if $a \ne 0$; $f(0) = 0$.
(a) Show that no three points of the graph of $f$ are collinear. (Compare with Theorem 3.14(ii).)
(b) Show that the graph of $f$ is represented by a quadratic polynomial (as in Theorem 3.14(i)) for $q \in \{2, 4\}$, but not for $q \ge 8$. This shows the failure of Segre's Theorem in even characteristic.

3. Consider an extension $E \supseteq F$ of finite fields of odd order, and consider a nonzero element $a \in E^\times$. Show that $a$ is a square in $E$ iff its norm $N_{E/F}(a)$ is a square in $F$. ('Square in $E$' means an element of the form $b^2$, $b \in E$. 'Square in $F$' means an element of the form $b^2$, $b \in F$.)

4. Let $F$ be a finite field.
(a) Show that every subring of $F$ is a subfield.
(b) Show that the conclusion in (a) does not hold without the hypothesis that the field $F$ is finite.

5. Factor each of the following cyclotomic polynomials into irreducible factors in $\mathbb{F}_{13}[t]$:
(a) $\Phi_4(t)$ (b) $\Phi_5(t)$ (c) $\Phi_6(t)$ (d) $\Phi_8(t)$

6. Let $F = \mathbb{F}_q$ be a finite field, and let $n \ge 1$.
(a) Show that every root of $\Phi_n(t)$ in $F$ is a primitive $n$th root of unity in $F$.
(b) Show that $\Phi_n(t)$ has a root in $F$ iff $\Phi_n(t)$ splits into linear factors in $F[t]$, iff $q \equiv 1$ mod $n$.

7. Give explicit formulas for $n_{q,d}$, the number of irreducible polynomials in $\mathbb{F}_q[x]$ of degree $d \in \{1, 2, 3, 4, 5\}$.

4. Cyclotomic Fields and Integers

Let $n$ be a positive integer. As before, $\zeta = \zeta_n$ is a primitive $n$-th root of unity in $\mathbb{C}$, i.e. an element of order $n$ in $\mathbb{C}^\times$. Thus $\zeta^n = 1$ and $n$ is the smallest positive integer with this property. There are $\phi(n)$ such primitive $n$-th roots of unity in $\mathbb{C}$, these being the values $\zeta^k$ for $1 \le k \le n$, $\gcd(k, n) = 1$. For most purposes (we will encounter some exceptions), the primitive roots are interchangeable: it does not matter which of these values we take to be $\zeta$. The standard choice, however, is $\zeta_n = e^{2\pi i/n}$. The field $E = \mathbb{Q}[\zeta]$ is a cyclotomic field.
Recall that the automorphisms of $E$ form the Galois group $G = \{\sigma_k : \gcd(k, n) = 1\} \cong (\mathbb{Z}/n\mathbb{Z})^\times$, the group of units of the ring of integers mod $n$. Here $\sigma_k(\zeta) = \zeta^k$; and the map $(\mathbb{Z}/n\mathbb{Z})^\times \to G$, $k \mapsto \sigma_k$ is an isomorphism. In particular, $\sigma_k \sigma_\ell = \sigma_{k\ell} = \sigma_\ell \sigma_k$, and so $G$ is abelian.
An abelian extension is a Galois extension whose Galois group is abelian. So
cyclotomic extensions are abelian. More generally, every Galois extension F ⊇ Q contained
in a cyclotomic extension (i.e. E ⊇ F ⊇ Q) must also be abelian, by Galois theory (since
Aut F is a subgroup of an abelian group Aut E), and this simple fact has an important
converse: Every abelian extension of Q is contained in a cyclotomic field. This result,
known as the Kronecker-Weber Theorem, indicates clearly the very special nature of
cyclotomic extensions; but its proof is beyond the scope of our course. See [Wa], [L1] for
details.
An important consequence of the fact that cyclotomic extensions are abelian, is

Theorem 4.1. For every automorphism $\sigma \in \mathrm{Aut}\,E$ of a cyclotomic extension $E = \mathbb{Q}[\zeta]$, and every $z \in E$, we have $\sigma(\overline{z}) = \overline{\sigma(z)}$.

Be warned that this is a special property not valid in most Galois extensions! See Examples A5.8 and A5.10, for two instances where $\sigma\tau \ne \tau\sigma$; and $\tau$ is complex conjugation in each of those examples.
Proof of Theorem 4.1. Denote by $\tau \in \mathrm{Aut}\,E$ the complex conjugation map $\tau(z) = \overline{z}$. Since $\mathrm{Aut}\,E$ is abelian, we have $\sigma\tau(z) = \tau\sigma(z)$. (In fact, $\tau = \sigma_{-1}$. So if $\sigma = \sigma_k$, then $\sigma\tau = \sigma_{-k} = \tau\sigma$.)

We should next proceed by describing the other important features of cyclotomic fields: classify the ring of algebraic integers $\mathcal{O} \subset E$, the unit group $\mathcal{O}^\times$, some facts about irreducibles and factorization in $\mathcal{O}$, etc. All of this will require some work, and we will have to take many of the key results on faith (with some references provided). We will however be able to include proofs of several of the key results. Let's start, however, with some low-hanging fruit.
It is easy to see exactly how many roots of unity $\mathbb{Q}[\zeta]$ has:

Theorem 4.2. The roots of unity in $\mathbb{Q}[\zeta_n]$ form a cyclic group $\langle\zeta_n\rangle$ of order $n$, if $n$ is even; or $\langle\zeta_{2n}\rangle$ of order $2n$ if $n$ is odd.

Proof. Without loss of generality, $n$ is even; since for odd $n$, the field $\mathbb{Q}[\zeta_n]$ already contains a primitive $2n$-th root of unity $-\zeta_n$. Now given a primitive $m$-th root of unity $\zeta_m$ in $\mathbb{Q}[\zeta_n]$, we must show that $\zeta_m$ is a power of $\zeta_n$, i.e. that $m$ divides $n$. Suppose not; then there exists a prime power $p^d$, $p$ prime, $d \ge 1$, dividing $m$ but not dividing $n$. Without loss of generality $m = p^d$; otherwise replace $\zeta_m$ by $\zeta_m^{m/p^d}$, a primitive $p^d$-th root of unity in $\mathbb{Q}[\zeta_n]$. Let $n' = \mathrm{lcm}(m, n) = p^{d-k} n$ where $p^k$ is the largest power of $p$ dividing $n$, and $d - k \ge 1$. Clearly $\zeta_m \zeta_n$ is a primitive $n'$-th root of unity in $E = \mathbb{Q}[\zeta_n]$, so that $E \supseteq \mathbb{Q}[\zeta_{n'}] \supseteq \mathbb{Q}$ and $\phi(n') \mid \phi(n)$. However, $\phi(n') = p^{d-k}\phi(n) > \phi(n)$ by Section 1, a contradiction.

Since $\mathbb{Q}[\zeta_n] = \mathbb{Q}[\zeta_{2n}]$ whenever $n$ is odd (where we may take $\zeta_{2n} = -\zeta_n$), and in order to avoid the exceptional alternative of Theorem 4.2, we will often want to assume that $n$ is even.
Since $\zeta = \zeta_n$ is a root of unity, it is in particular an algebraic integer; and since the algebraic integers form a subring $\mathcal{O} \subset \mathbb{Q}[\zeta]$, we must have $\mathbb{Z}[\zeta] \subseteq \mathcal{O} \subset \mathbb{Q}[\zeta]$. Actually, equality holds in general: $\mathcal{O} = \mathbb{Z}[\zeta]$, although we give the details only for prime values of $n$ (see Theorem 4.5).
Determining the full group of units O× takes a little more work. Clearly the roots of
unity (as in Theorem 4.2) form a subgroup of O× , namely the torsion subgroup of O× .
Some additional units are found as follows:

Theorem 4.3. Let $\zeta = \zeta_n$, $n \ge 2$; and suppose $r, s \in \mathbb{Z}$ with $\gcd(rs, n) = 1$. Then $\frac{1-\zeta^r}{1-\zeta^s}$ is a unit.

Proof. Since $r$ and $s$ are relatively prime to $n$, $\zeta^r$ and $\zeta^s$ are primitive $n$-th roots of unity, so each of them is a power of the other. (We must remark that $\zeta^r \ne 1$ and $\zeta^s \ne 1$ since $n \ge 2$.) In particular, $\zeta^s = (\zeta^r)^k$ for some positive integer $k$. Thus
$$1 - \zeta^s = 1 - \zeta^{kr} = (1 - \zeta^r)(1 + \zeta^r + \zeta^{2r} + \cdots + \zeta^{(k-1)r}) = (1 - \zeta^r)u$$
where $u \in \mathcal{O}$. Similarly, $1 - \zeta^r = (1 - \zeta^s)v$ where $v \in \mathcal{O}$. Since $uv = 1$, $u$ and $v$ are units in $\mathcal{O}$.

Theorem 4.4. For $\zeta = \zeta_p$, $p$ prime, the element $\varepsilon = 1 - \zeta$ is irreducible of norm $p$. The rational prime $p$ ramifies as $(p) = (\varepsilon)^{p-1}$ in $\mathcal{O}$, the ring of integers in $E = \mathbb{Q}[\zeta]$.

Proof. Evaluating $\Phi_p(x) = (x-\zeta)(x-\zeta^2)(x-\zeta^3) \cdots (x-\zeta^{p-1})$ at 1 yields $p = (1-\zeta)(1-\zeta^2) \cdots (1-\zeta^{p-1}) = u\varepsilon^{p-1}$ for some unit $u \in \mathcal{O}^\times$, by Theorem 4.3. This yields the equality of ideals $(p) = (\varepsilon)^{p-1}$ in $\mathcal{O}$. Since the factors $1 - \zeta^i$ ($i = 1, 2, \ldots, p-1$) are the algebraic conjugates of $\varepsilon = 1 - \zeta$ by Theorem 2.4, the norm of $\varepsilon$ is $N_{E/\mathbb{Q}}(\varepsilon) = p$ by Theorem A5.13. If $\varepsilon = ab$ where $a, b \in \mathcal{O}$, then $p = N(\varepsilon) = N(a)\,N(b)$ where $N(a), N(b) \in \mathbb{Z}$, so without loss of generality $N(a) = \pm 1$ and $N(b) = \pm p$ (otherwise interchange $a$ and $b$). But then $\pm 1 = N(a) = aa'$ where $a' = \prod_{k=2}^{p-1} \sigma_k(a) \in \mathcal{O}$, which shows that $a \in \mathcal{O}^\times$. Thus $\varepsilon$ is irreducible in $\mathcal{O}$.
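The identity $p = (1-\zeta)(1-\zeta^2)\cdots(1-\zeta^{p-1})$ used at the start of this proof can be checked numerically. The sketch below uses floating-point complex arithmetic with the standard choice $\zeta = e^{2\pi i/p}$, so the result agrees with $p$ only up to rounding error; the function name is our own.

```python
import cmath


def product_one_minus_zeta(p):
    """Floating-point value of (1 - ζ)(1 - ζ^2)...(1 - ζ^(p-1)) with ζ = exp(2πi/p)."""
    zeta = cmath.exp(2j * cmath.pi / p)
    result = 1
    for k in range(1, p):
        result *= 1 - zeta ** k
    return result


if __name__ == "__main__":
    for p in (3, 5, 7, 11):
        print(p, product_one_minus_zeta(p))
```

The printed values are complex numbers whose real parts are very close to $p$ and whose imaginary parts are negligible, reflecting $\Phi_p(1) = p$.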

We have mentioned the following result, found in [Wa, Theorem 2.6] with proof relying
on [L2, p.68]. We give the proof only in the important case that n is prime.

Theorem 4.5. For ζ = ζn , the ring of integers O ⊂ Q[ζ] is given by O = Z[ζ].



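Proof of Theorem 4.5 for n = p prime. Let $\alpha = a_0 + a_1\zeta + a_2\zeta^2 + \cdots + a_{p-2}\zeta^{p-2} \in \mathcal{O}$ where $a_i \in \mathbb{Q}$; we must show that each $a_i \in \mathbb{Z}$. We first show that no primes occur in the denominators of the coefficients $a_i$, other than possibly $p$. For each $\sigma_k \in \mathrm{Aut}\,E$ (in the notation of Theorem 2.4, where $k = 1, 2, \ldots, p-1$), evidently $\alpha_k := \sigma_k(\alpha)$ is also an algebraic integer. This gives a linear system
$$\begin{pmatrix} \alpha_1 \\ \alpha_2 \\ \alpha_3 \\ \vdots \\ \alpha_{p-1} \end{pmatrix} = \begin{pmatrix} 1 & \zeta & \zeta^2 & \cdots & \zeta^{p-2} \\ 1 & \zeta^2 & \zeta^4 & \cdots & \zeta^{2(p-2)} \\ 1 & \zeta^3 & \zeta^6 & \cdots & \zeta^{3(p-2)} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & \zeta^{p-1} & \zeta^{2(p-1)} & \cdots & \zeta^{(p-2)(p-1)} \end{pmatrix} \begin{pmatrix} a_0 \\ a_1 \\ a_2 \\ \vdots \\ a_{p-2} \end{pmatrix} = M \begin{pmatrix} a_0 \\ a_1 \\ a_2 \\ \vdots \\ a_{p-2} \end{pmatrix}$$
in which the coefficient matrix $M$ is a Vandermonde matrix of order $p-1$. Its determinant, as found by a well-known formula, is
$$\det M = \prod_{1 \le i < j \le p-1} (\zeta^j - \zeta^i) = u(1 - \zeta)^k$$
for some $u \in \mathcal{O}^\times$ by Theorem 4.3 and $k = \binom{p-1}{2} > 0$. Now
$$\begin{pmatrix} a_0 \\ a_1 \\ \vdots \\ a_{p-2} \end{pmatrix} = M^{-1} \begin{pmatrix} \alpha_1 \\ \alpha_2 \\ \vdots \\ \alpha_{p-1} \end{pmatrix}$$
so each $a_i = a_i'/\varepsilon^k$ for some $a_i' \in \mathcal{O}$. Since $p = u\varepsilon^{p-1}$ for some unit $u \in \mathcal{O}^\times$ (Theorem 4.4), there exists a positive integer $m$ such that $p^m a_i \in \mathcal{O}$ for all $i \in \{0, 1, 2, \ldots, p-2\}$. However, $p^m a_i \in \mathbb{Q}$. By Theorem A3.2(ii), $p^m a_i \in \mathbb{Z}$ and so $p^m \alpha \in \mathbb{Z}[\zeta] = \mathbb{Z}[1-\zeta] = \mathbb{Z}[\varepsilon]$; we may write
$$u^m \varepsilon^{(p-1)m} \alpha = p^m \alpha = b_0 + b_1\varepsilon + b_2\varepsilon^2 + \cdots + b_{p-2}\varepsilon^{p-2}, \qquad b_i \in \mathbb{Z}.$$
We need to show that $\alpha \in \mathbb{Z}[\zeta] = \mathbb{Z}[\varepsilon]$, so without loss of generality $m \ge 1$ and suppose at least one of the $b_i$ is not divisible by $p$; we seek a contradiction. Let $k \in \{0, 1, 2, \ldots, p-2\}$ be minimal such that $b_k \not\equiv 0$ mod $p$. Note that $\sum_{i=0}^{p-2} b_i \varepsilon^i \notin (\varepsilon)^{k+1}$ since $b_k \varepsilon^k \notin (\varepsilon)^{k+1}$ but all other terms lie in $(\varepsilon)^{k+1}$. However, $p^m \alpha \in (p) \subseteq (\varepsilon)^{k+1}$, a contradiction as desired.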

Discriminants and their use are discussed in Appendix A3. For the prime p = 2, note
that $\mathbb{Z}[\zeta_2] = \mathbb{Z}[-1] = \mathbb{Z}$ which has discriminant 1. The discriminant of $\mathbb{Z}[\zeta_n]$ for general
n > 2 is

\[ (-1)^{\phi(n)/2}\, \frac{n^{\phi(n)}}{\prod_{p \mid n} p^{\phi(n)/(p-1)}}; \]

see [Wa, Chapter 2].



Theorem 4.6. Suppose $\zeta = \zeta_p$, p an odd prime. Then the discriminant of $\mathcal{O} = \mathbb{Z}[\zeta]$
is $(-1)^{(p-1)/2} p^{p-2}$; and the only rational prime that ramifies in $E = \mathbb{Q}[\zeta]$ is p.

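Proof. For $k \not\equiv 0$ mod p, $\zeta^k$ is a primitive p-th root of unity, and its algebraic conjugates
are $\zeta, \zeta^2, \ldots, \zeta^{p-1}$ so $\mathrm{Tr}_{E/\mathbb{Q}}(\zeta^k) = \zeta + \zeta^2 + \cdots + \zeta^{p-1} = -1$; whereas if $k \equiv 0$ mod p, then
$\zeta^k = 1$ and $\mathrm{Tr}_{E/\mathbb{Q}}(\zeta^k) = 1+1+\cdots+1 = p-1$. In view of the relation $1+\zeta+\zeta^2+\cdots+\zeta^{p-1} = 0$,
any p − 1 of the elements $1, \zeta, \zeta^2, \ldots, \zeta^{p-1}$ form a base for $\mathbb{Z}[\zeta]$ over $\mathbb{Z}$. It is convenient for
us to use $\zeta, \zeta^2, \zeta^3, \ldots, \zeta^{p-1}$ as our choice of base. As described in Appendix A3,

\[ \mathrm{disc}(\zeta, \zeta^2, \ldots, \zeta^{p-1}) = \det\big[\mathrm{Tr}_{E/\mathbb{Q}}(\zeta^i\zeta^j) : 1 \le i, j \le p-1\big]. \]

The entry in position (i, j) of this (p−1) × (p−1) matrix is p−1 if i + j = p, and −1 otherwise. Every
column sums to 1, so replacing the last row by the negative of the sum of all the rows (which multiplies
the determinant by −1) and then moving that row to the top (which multiplies the determinant by
$(-1)^{p-2} = -1$) leaves the determinant unchanged and gives the first matrix below. Subtracting its
first row from every other row gives the second matrix:

\[
\mathrm{disc}(\zeta, \zeta^2, \ldots, \zeta^{p-1})
= \det\begin{pmatrix}
-1 & -1 & -1 & \cdots & -1 & -1 \\
-1 & -1 & -1 & \cdots & -1 & p-1 \\
-1 & -1 & -1 & \cdots & p-1 & -1 \\
\vdots & \vdots & \vdots & & \vdots & \vdots \\
-1 & -1 & p-1 & \cdots & -1 & -1 \\
-1 & p-1 & -1 & \cdots & -1 & -1
\end{pmatrix}
= \det\begin{pmatrix}
-1 & -1 & -1 & \cdots & -1 & -1 \\
0 & 0 & 0 & \cdots & 0 & p \\
0 & 0 & 0 & \cdots & p & 0 \\
\vdots & \vdots & \vdots & & \vdots & \vdots \\
0 & 0 & p & \cdots & 0 & 0 \\
0 & p & 0 & \cdots & 0 & 0
\end{pmatrix}
= (-1)^{(p-1)/2}\, p^{p-2}.
\]

Since the only rational prime dividing the discriminant is p, the result follows from
Appendix A3 (the remarks about ramification following Theorem A3.10).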
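
The determinant in this proof can be confirmed by exact arithmetic for small primes. The sketch below builds the trace matrix directly from the rule Tr(ζ^k) = −1 or p−1 and evaluates its determinant by Gaussian elimination over the rationals; the helper names are ours, and the elimination routine is a generic one rather than anything specific to these notes.

```python
from fractions import Fraction

def trace_matrix(p):
    """Matrix of Tr(zeta^i * zeta^j), 1 <= i, j <= p-1, for zeta a primitive p-th root of unity."""
    return [[p - 1 if (i + j) % p == 0 else -1 for j in range(1, p)]
            for i in range(1, p)]

def determinant(rows):
    """Exact determinant by Gaussian elimination over the rationals."""
    m = [[Fraction(x) for x in row] for row in rows]
    n = len(m)
    det = Fraction(1)
    for c in range(n):
        # Find a row with a nonzero entry in column c to use as pivot.
        pivot = next((r for r in range(c, n) if m[r][c] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != c:
            m[c], m[pivot] = m[pivot], m[c]
            det = -det  # a row swap flips the sign
        det *= m[c][c]
        for r in range(c + 1, n):
            factor = m[r][c] / m[c][c]
            for k in range(c, n):
                m[r][k] -= factor * m[c][k]
    return det

for p in (3, 5, 7, 11, 13):
    expected = (-1) ** ((p - 1) // 2) * p ** (p - 2)
    print(p, determinant(trace_matrix(p)), expected)
```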

Example 4.7: The Cyclotomic Field Q[ζ3]. Let $E = \mathbb{Q}[\zeta_3]$. Our primitive cube root of unity is
$\zeta_3 = \frac12(-1 + \sqrt{-3})$. By Theorem 4.6, E has discriminant −3. Since E ⊃ Q is an imaginary quadratic
extension, this is covered by Example A3.5 where the same value −3 is found for the discriminant.
See also Example A3.8 regarding this extension.

The maximal real subfield of Q[ζ] is the subfield R ∩ Q[ζ]. It is often denoted by
$\mathbb{Q}[\zeta]^+$.

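Theorem 4.8. The maximal real subfield of $E = \mathbb{Q}[\zeta]$, $\zeta = \zeta_n$, $n \ge 3$, is
$F = \mathbb{Q}[\zeta] \cap \mathbb{R} = \mathbb{Q}[\alpha]$ where $\alpha = \zeta + \zeta^{-1}$, an algebraic integer of degree $\frac12\phi(n)$. The ring of
integers of F is $\mathbb{Z}[\zeta] \cap \mathbb{R} = \mathbb{Z}[\alpha]$.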

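Proof. Let $E = \mathbb{Q}[\zeta]$ and $F = \mathbb{Q}[\alpha]$, $\alpha = \zeta + \zeta^{-1}$. Since $\alpha = \zeta + \bar\zeta \in \mathbb{R}$ but $\zeta \notin \mathbb{R}$,
$[E : F] \ge 2$. Since $\alpha\zeta = \zeta^2 + 1$, ζ is a root of $f(x) = x^2 - \alpha x + 1 \in F[x]$, so $[E : F] \le 2$. This
forces [E : F] = 2. Also since $F \subseteq E \cap \mathbb{R} \subset E$, $[E \cap \mathbb{R} : F]$ is a proper divisor of [E : F] = 2,
which forces $[E \cap \mathbb{R} : F] = 1$, so $F = E \cap \mathbb{R}$. Also $\phi(n) = [E : \mathbb{Q}] = [E : F][F : \mathbb{Q}] = 2[F : \mathbb{Q}]$,
so $[\mathbb{Q}[\alpha] : \mathbb{Q}] = [F : \mathbb{Q}] = \frac12\phi(n)$. This shows, of course, that α is algebraic of degree $\frac12\phi(n)$.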
There is however a more direct interpretation of this value $\frac12\phi(n)$ using two explicit
bases for Q[α] over Q. The obvious basis is $\{\alpha^k : 0 \le k < \frac12\phi(n)\}$. Another basis consists
of the algebraic integers

\[
\alpha_k = \begin{cases} 1, & \text{for } k = 0; \\ \zeta^k + \zeta^{-k}, & \text{for } k = 1, 2, \ldots, \frac12\phi(n) - 1. \end{cases}
\]

Note that $\alpha_1 = \alpha$; and for each k in the indicated interval, the binomial expansion gives

\[
\alpha^k = \begin{cases}
\displaystyle\sum_{0 \le i \le \frac{k-1}{2}} \binom{k}{i}\, \alpha_{k-2i}, & \text{for } k \text{ odd}; \\[2ex]
\displaystyle\binom{k}{k/2} + \sum_{0 \le i \le \frac{k-2}{2}} \binom{k}{i}\, \alpha_{k-2i}, & \text{for } k \text{ even}.
\end{cases}
\]

Induction shows that $\{\alpha^0, \alpha^1, \ldots, \alpha^k\}$ and $\{\alpha_0, \alpha_1, \alpha_2, \ldots, \alpha_k\}$ span the same Q-subspace
of Q[α] for each k. In fact by the relations above, the change-of-basis matrix between the
two bases is upper triangular. Indeed, since the change-of-basis matrix has integer entries
with 1’s on the main diagonal, it follows that $\{\alpha^0, \alpha^1, \ldots, \alpha^k\}$ and $\{\alpha_0, \alpha_1, \alpha_2, \ldots, \alpha_k\}$
span the same Z-submodule of Z[α] for each k.
Now suppose β ∈ Q[α] is an algebraic integer; we must show that β ∈ Z[α]. Simply
express

\[ \beta = \sum_{k=0}^{N-1} b_k\alpha_k = b_0 + \sum_{k=1}^{N-1} b_k(\zeta^k + \zeta^{-k}) \]

where $b_k \in \mathbb{Q}$ for k = 0, 1, 2, . . . , N−1; $N = \frac12\phi(n)$. In order that β be an algebraic
integer, Theorem 4.5 requires each $b_k \in \mathbb{Z}$; but then the observations above indicate that
β is a Z-linear combination of $\alpha^0, \alpha^1, \alpha^2, \ldots, \alpha^{N-1}$, i.e. β ∈ Z[α].

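Example 4.9: The Maximal Real Subfield of Q[ζ7]. Let $\zeta = \zeta_7$, $\alpha = \zeta + \zeta^{-1}$, $\alpha^2 = \zeta^2 + \zeta^{-2} + 2$,
$\alpha^3 = \zeta^3 + \zeta^{-3} + 3(\zeta + \zeta^{-1})$. We have

\[
\begin{aligned}
\alpha^3 + \alpha^2 - 2\alpha - 1 &= \zeta^3 + \zeta^{-3} + 3(\zeta + \zeta^{-1}) + \zeta^2 + \zeta^{-2} + 2 - 2(\zeta + \zeta^{-1}) - 1 \\
&= \zeta^3 + \zeta^{-3} + \zeta^2 + \zeta^{-2} + \zeta + \zeta^{-1} + 1 = 0
\end{aligned}
\]

so α is a root of $f(x) = x^3 + x^2 - 2x - 1 \in \mathbb{Z}[x]$. Since the degree of f(x) is $\frac12\phi(7) = 3$, this is
the minimal polynomial of α. Allowing ζ to vary over the primitive seventh roots of unity in C, we
obtain the roots of f(x) as

\[ \alpha_k = e^{2k\pi i/7} + e^{-2k\pi i/7} = 2\cos\frac{2k\pi}{7}, \qquad k = 1, 2, 3. \]

The maximal real subfield of Q[ζ] is Q[α]. Its ring of integers is

\[ \mathbb{Z}[\alpha] = \mathbb{Z} + \mathbb{Z}\alpha + \mathbb{Z}\alpha^2 = \mathbb{Z}\alpha_1 + \mathbb{Z}\alpha_2 + \mathbb{Z}\alpha_3. \]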

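The values $\sin\frac{2k\pi}{7}$ are not algebraic integers for k = 1, 2, 3 (see Exercise #1); but their ratios are
algebraic integers. For example, taking $\zeta = e^{2\pi i/7}$,

\[
\frac{\sin\frac{4\pi}{7}}{\sin\frac{6\pi}{7}}
= \frac{(\zeta^2 - \zeta^{-2})/2i}{(\zeta^3 - \zeta^{-3})/2i}
= \frac{\zeta^{-2}(\zeta^4 - 1)}{\zeta^{-3}(\zeta^6 - 1)}
= \zeta\,\frac{1 - \zeta^4}{1 - \zeta^6}
\]

which is a unit, by Theorem 4.3.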
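
A quick floating-point check of this example may be helpful. The sketch below assumes the specific choice ζ = exp(2πi/7) and evaluates the cubic x^3 + x^2 − 2x − 1 at the three values 2cos(2kπ/7), then compares a ratio of sines with the corresponding expression in ζ. It is a numerical sanity check only, not a proof.

```python
import cmath
import math

def f(x):
    """The cubic from Example 4.9."""
    return x**3 + x**2 - 2*x - 1

zeta = cmath.exp(2j * math.pi / 7)  # assumed choice of primitive 7th root of unity

# The three real conjugates 2*cos(2*k*pi/7) should be roots of f.
roots = [2 * math.cos(2 * k * math.pi / 7) for k in (1, 2, 3)]
residuals = [abs(f(a)) for a in roots]

# sin(4*pi/7) / sin(6*pi/7) should equal zeta * (1 - zeta^4) / (1 - zeta^6).
sine_ratio = math.sin(4 * math.pi / 7) / math.sin(6 * math.pi / 7)
unit = zeta * (1 - zeta**4) / (1 - zeta**6)

print(residuals)
print(sine_ratio, unit)
```

The sum of the three roots is −1, in agreement with the coefficient of x^2 in f(x).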

We come now to a very important characterization of roots of unity. Let ζ ∈ C be a
root of unity, so that $\zeta^n = 1$ for some positive integer n. Then ζ is an algebraic integer
with |ζ| = 1. But more than this, every algebraic conjugate of ζ is also a root of unity and
so also has absolute value 1: that is, if σ is any field automorphism (of C, or of Q[ζ]) then
|σ(ζ)| = 1. The following result shows that roots of unity are the only numbers with this
property. Note that we strictly require algebraic integers here; for example $\frac35 + \frac45 i$ is an
algebraic number, both of whose conjugates have absolute value 1; yet it is not a root of
unity. Moreover, the algebraic integer $\alpha = \sqrt{-1+\sqrt2} + i\sqrt{2-\sqrt2}$ has |α| = 1, yet it is not
a root of unity. The minimal polynomial of α over Q is $f(x) = x^8 + 12x^6 + 6x^4 + 12x^2 + 1$, which
also has $\beta = \big(\sqrt{2+\sqrt2} + \sqrt{1+\sqrt2}\,\big)i$ as a root. There exists σ ∈ Aut C (or σ ∈ Aut E where
E is the splitting field of f(x)) such that σ(α) = β; and $|\beta| = \sqrt{2+\sqrt2} + \sqrt{1+\sqrt2} \ne 1$.
We denote by I the ring of all algebraic integers in C.
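
The second cautionary example can be checked numerically. The sketch below evaluates the quoted degree-8 polynomial at α and at β in floating point, and prints their absolute values; the variable names are ours. It illustrates that α lies on the unit circle while its conjugate β does not.

```python
import math

def f(x):
    """The degree-8 polynomial quoted in the text."""
    return x**8 + 12*x**6 + 6*x**4 + 12*x**2 + 1

s = math.sqrt(2)
alpha = complex(math.sqrt(-1 + s), math.sqrt(2 - s))      # sqrt(-1+sqrt2) + i*sqrt(2-sqrt2)
beta = complex(0.0, math.sqrt(2 + s) + math.sqrt(1 + s))  # (sqrt(2+sqrt2) + sqrt(1+sqrt2)) * i

# Both values of f should vanish up to rounding error.
print(abs(alpha), abs(f(alpha)))
print(abs(beta), abs(f(beta)))
```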

Theorem 4.10. Let α ∈ I. Then α is a root of unity iff every algebraic conjugate
of α has absolute value 1.

Proof. (This argument is a fleshed-out version of [Wa, Lemma 1.6].) If α is a complex


root of unity, say αn = 1, then all algebraic conjugates of α are also complex n-th roots of
unity and they all have absolute value 1.
Conversely, suppose α ∈ I has minimal polynomial f (x) = (x − α1 )(x − α2 ) · · · (x −
αn ) ∈ Z[x] such that |α1 | = · · · = |αn | = 1. We may assume that α1 = α. Let K = Q(α),
so that [K : Q] = n; and consider the splitting field of f (x), namely E = Q(α1 , α2 , . . . , αn ),
this being the Galois closure of K. The field E has [E : Q] = n[E : K] automorphisms,
transitively permuting the n roots of f (x). We now introduce a family of polynomials

\[ g_r(x) = (x - \alpha_1^r)(x - \alpha_2^r)\cdots(x - \alpha_n^r) = \sum_{k=0}^{n} b_{r,k}\, x^{n-k} \]

indexed by r = 1, 2, 3, . . .. Each $b_{r,k}$ is a symmetric polynomial in $\alpha_1, \ldots, \alpha_n$ (see Appendix
A7) given by

\[ b_{r,k} = (-1)^k \sum_{1 \le i_1 < i_2 < \cdots < i_k \le n} \alpha_{i_1}^r \alpha_{i_2}^r \cdots \alpha_{i_k}^r; \]

in particular $b_{r,0} = 1$, $b_{r,1} = -\alpha_1^r - \alpha_2^r - \cdots - \alpha_n^r$, and $b_{r,n} = (-1)^n \alpha_1^r\alpha_2^r\cdots\alpha_n^r$. Since each
$|\alpha_i| = 1$, we obtain the bounds

\[ b_{r,0} = 1, \qquad |b_{r,n}| = 1, \qquad \text{and} \quad |b_{r,k}| \le \binom{n}{k} \ \text{for each } k. \]

Each σ ∈ Aut E permutes $\alpha_1, \ldots, \alpha_n$ and so must satisfy $\sigma(b_{r,k}) = b_{r,k}$ for all r, k. By
Galois theory, the fixed field of Aut E is Q; but clearly each $b_{r,k}$ is an algebraic integer,
so $b_{r,k} \in I \cap \mathbb{Q} = \mathbb{Z}$ by Theorem A3.2(ii). Thus $g_r(x) \in \mathbb{Z}[x]$ for each $r \ge 1$. But
there are only $2\binom{n}{k} + 1$ choices of integer $b_{r,k}$ satisfying the bounds indicated above for
k = 1, 2, . . . , n−1 (also two choices of $b_{r,n} = \pm 1$, and the single choice $b_{r,0} = 1$), hence at
most $N := 2\prod_{k=1}^{n-1}\big(2\binom{n}{k} + 1\big)$ possibilities for each such polynomial $g_r(x)$. The set of all
distinct roots of all the polynomials in our family $\{g_1(x), g_2(x), g_3(x), \ldots\}$ is therefore a
set of cardinality at most nN; and all distinct powers $\alpha^r$ for $r \ge 1$ must lie in this finite
set. Evidently the powers $\alpha^r$ cannot all be distinct, so α is a root of unity.

We can finally say something more about the group of units $\mathcal{O}^\times$ of E. By Dirichlet’s
Theorem A3.6, $\mathcal{O}^\times \cong \{\text{roots of unity}\} \times \mathbb{Z}^{r+s-1}$ where E has r embeddings in R and s
pairs of complex conjugate embeddings in C. By Theorem 4.2, we have a firm grasp on the
first factor, consisting of the roots of unity (the torsion subgroup of $\mathcal{O}^\times$). For the second
factor, the maximal free abelian subgroup of $\mathcal{O}^\times$, note that $E := \mathbb{Q}[\zeta_n]$ has no embeddings
in R for n > 2; and it has $N := \frac12\phi(n)$ pairs (under complex conjugation) of embeddings
in C. So $\mathcal{O}_E^\times \cong \langle\zeta_n\rangle \times \mathbb{Z}^{N-1}$, assuming n > 2 is even. Now the maximal real subfield
$F := \mathbb{Q}[\alpha] \subset E$, $\alpha = \zeta + \zeta^{-1}$, has N real embeddings and no complex embeddings outside
of R. Its roots of unity are just ±1, so its group of units satisfies $\mathcal{O}_F^\times \cong \langle -1\rangle \times \mathbb{Z}^{N-1}$. This
means that the subgroup $\mathcal{O}_F^\times \subseteq \mathcal{O}_E^\times$ has finite index $[\mathcal{O}_E^\times : \mathcal{O}_F^\times] < \infty$. In other words, if we
can identify the units in the maximal real subfield F, we shall have found ‘almost all’ the
units in E.
A full enumeration of the unit group $\mathcal{O}_E^\times$ is beyond the scope of this course; details
can be found in [Wa, Chapter 8]. But for the important case of n = p prime we have:

Theorem 4.11. Let $E = \mathbb{Q}[\zeta]$, $\zeta = \zeta_p$, p an odd prime; and let $F = \mathbb{Q}[\alpha]$ be its
maximal real subfield, $\alpha = \zeta + \zeta^{-1}$. Then every unit $u \in \mathcal{O}_E^\times$ has the form $u = \zeta^r u_+$
for some r ∈ {0, 1, 2, . . . , p−1} and some real unit $u_+ \in \mathcal{O}_F^\times$.

Proof. Since $u \in \mathcal{O}_E^\times$, we have $\bar u \in \mathcal{O}_E^\times$. Let $\beta = u/\bar u$, and note that $\beta \in \mathcal{O}_E^\times$. By
Theorem 4.1,

\[ |\sigma(\beta)|^2 = \sigma(\beta)\overline{\sigma(\beta)} = \sigma(\beta)\sigma(\bar\beta) = \sigma(\beta\bar\beta) = \sigma\!\left(\frac{u\,\bar u}{\bar u\,u}\right) = \sigma(1) = 1 \]

for every σ ∈ Aut E, so |σ(β)| = 1. By Theorem 4.10, β ∈ E is a root of unity. By
Theorem 4.2, $\beta = \pm\zeta^k$ for some k ∈ Z.
Suppose first that $\beta = -\zeta^k$. Recall from Theorem 4.4 that the element ε = 1 − ζ is
irreducible of norm p. Clearly ζ ≡ 1 mod (ε). Expanding $u \in \mathcal{O}$ as $u = a_0 + a_1\zeta + a_2\zeta^2 +
\cdots + a_{p-2}\zeta^{p-2}$, it follows that $u \equiv a_0 + a_1 + a_2 + \cdots + a_{p-2}$ mod (ε) and $\bar u = a_0 + a_1\zeta^{p-1} +
a_2\zeta^{p-2} + \cdots + a_{p-2}\zeta^2 \equiv a_0 + a_1 + a_2 + \cdots + a_{p-2} \equiv u$ mod (ε); so

\[ 2u = u - \zeta^k\bar u \equiv u - \bar u \equiv 0 \mod (\varepsilon), \]

i.e. 2u = εv for some $v \in \mathcal{O}_E$. But then taking norms, $2^{p-1} N_{E/\mathbb{Q}}(u) = p\, N_{E/\mathbb{Q}}(v)$ where all
values of the norm are integers. But $N_{E/\mathbb{Q}}(u) = 1$ since u is a unit, so $2^{p-1}$ is divisible by
p, a contradiction.
Thus $\beta = \zeta^k$ for some k ∈ Z/pZ. Since p is odd, we have k = 2r for some r ∈ Z/pZ.
Let $u_+ = \zeta^{-r}u$, so that $u_+$ is a unit. Also $u_+^2 = \zeta^{-2r}u^2 = \zeta^{-2r}u\cdot\bar u\,\zeta^k = u\bar u = |u|^2$, which
is a positive real number; so $u_+ \in \mathbb{R}$. This means that $u_+ \in E \cap \mathbb{R} = F$, the maximal real
subfield of E. Moreover, $u = \zeta^r u_+$ as required.

Of course, we will not find these observations to be of much use in identifying the
group of units of E, unless we can first find the group of units of F. Fortunately, however,
the units of Theorem 4.3 are sufficient to generate almost the full group of units (meaning
a subgroup of finite index in the full group of units). Although the ratios $\frac{1-\zeta^r}{1-\zeta^s}$ are not
generally real, they do provide generators of $\mathcal{O}_F^\times$ after factoring out roots of unity; see
Example 4.9. More explicitly, let n > 2, and for convenience denote by $\zeta_{2n}$ a primitive
2n-th root of unity satisfying $\zeta_{2n}^2 = \zeta_n$. Also let r, s be integers relatively prime to n. Then
Theorem 4.3 gives us units in E of the form

\[
\frac{1 - \zeta_n^r}{1 - \zeta_n^s}
= \frac{1 - \zeta_{2n}^{2r}}{1 - \zeta_{2n}^{2s}}
= \frac{\zeta_{2n}^{r}\,(\zeta_{2n}^{-r} - \zeta_{2n}^{r})}{\zeta_{2n}^{s}\,(\zeta_{2n}^{-s} - \zeta_{2n}^{s})}
= \zeta_{2n}^{\,r-s}\,\frac{\sin\frac{2r\pi}{2n}}{\sin\frac{2s\pi}{2n}}
= \zeta_n^{\frac{r-s}{2}}\,\frac{\sin\frac{r\pi}{n}}{\sin\frac{s\pi}{n}} \in \mathcal{O}_E^\times,
\]

whence also real units $\frac{\sin(r\pi/n)}{\sin(s\pi/n)} \in \mathcal{O}_F^\times$. Note here that if n is even, then both r and s are
odd (as they are relatively prime to n), so the exponent $\frac{r-s}{2}$ is an integer; whereas if n is
odd, then 2 is invertible mod n so $\frac{r-s}{2}$ is again well-defined in Z/nZ. Now if we fix s (e.g.
s = 1) and vary r over the remaining N − 1 choices of integers < n/2 relatively prime to n,
we obtain generators for a large subgroup of $\mathcal{O}_F^\times$. (As usual, ‘large’ means having finite
index in the full group of real units $\mathcal{O}_F^\times$.)
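
The passage from the ratio of 1 − ζ^r and 1 − ζ^s to a ratio of sines can be sanity-checked numerically. The sketch below assumes the concrete choices ζ_n = exp(2πi/n) and ζ_2n = exp(πi/n), which satisfy ζ_2n^2 = ζ_n; the function names are ours. It compares the two sides of the identity for a few sample triples (n, r, s).

```python
import cmath
import math

def cyclotomic_unit(n, r, s):
    """(1 - zeta_n^r) / (1 - zeta_n^s) with zeta_n = exp(2*pi*i/n)."""
    zeta_n = cmath.exp(2j * math.pi / n)
    return (1 - zeta_n**r) / (1 - zeta_n**s)

def sine_form(n, r, s):
    """zeta_{2n}^(r-s) * sin(r*pi/n) / sin(s*pi/n) with zeta_{2n} = exp(pi*i/n)."""
    zeta_2n = cmath.exp(1j * math.pi / n)
    return zeta_2n**(r - s) * math.sin(r * math.pi / n) / math.sin(s * math.pi / n)

# Sample triples with r and s relatively prime to n; the difference should be ~0.
samples = [(7, 2, 1), (7, 3, 1), (9, 4, 1), (12, 5, 1)]
for n, r, s in samples:
    print(n, r, s, abs(cyclotomic_unit(n, r, s) - sine_form(n, r, s)))
```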
At this point, we must disclose two difficulties: one of a mathematical nature, and
the other strictly terminological. The units constructed above do not suffice to generate
the full group of units. The term ‘cyclotomic unit’, which arguably should refer to a
unit in the cyclotomic field (just as one speaks of a cyclotomic number, cyclotomic integer,
etc. for elements of Q[ζ], Z[ζ], and related concepts that we have not mentioned in these
notes) has instead come to mean one of the units specifically of the form $\frac{1-\zeta^r}{1-\zeta^s}$ in E, or of
the form $\frac{\sin(r\pi/n)}{\sin(s\pi/n)}$ in F. Determining the (finite) index of the subgroup of $\mathcal{O}^\times$ generated
by these ‘cyclotomic units’ in either case, is a difficult computational problem in general,
and is in fact related to the problem of computing class numbers. Computational software
gives us the answer for specific values of n of modest size; but general formulas tend to
rely on analytic methods for which computations can require great numerical finesse.
Before closing this Section, we briefly discuss factorization in $E = \mathbb{Q}[\zeta_n]$. The fields
$\mathbb{Q}[\zeta_n]$ have unique factorization for $n \le 22$ and for finitely many larger values of n; see
[Wa, Chapter 11]. The primes p for which $\mathbb{Q}[\zeta_p]$ has unique factorization are known to be
exactly the primes $p \le 19$. Uniqueness of factorization for n = 3, 4 is easily shown using
the Euclidean property; see Corollary A3.16.

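Example 4.12: Q[ζ23] is neither a PID nor a UFD. The field $E = \mathbb{Q}[\zeta_{23}]$ does not have unique
factorization. To see this, by Theorem A3.14 it suffices to find an ideal in $\mathcal{O}_E = \mathbb{Z}[\zeta]$, $\zeta = \zeta_{23}$,
which is not principal. First note that $\sqrt{-23} \in E$, which follows from Theorem 10.13. The subfield
$F = \mathbb{Q}[\sqrt{-23}] \subset E$ has as its ring of integers $\mathcal{O}_F = \mathbb{Z}[\theta]$, $\theta = \frac12(1+\sqrt{-23})$ by Example A3.5. Consider
the ideals $\mathfrak{p} = (2, \theta) \subset \mathcal{O}_F$ (note: $\mathfrak{p} = 2\mathcal{O}_F + \theta\mathcal{O}_F$) and $\bar{\mathfrak{p}} = (2, \bar\theta)$. We have $2 = (-2)\cdot 2 + \theta\bar\theta \in \mathfrak{p}\bar{\mathfrak{p}}$, and
the reverse containment follows from $(2a+b\theta)(2c+d\bar\theta) = 2(2ac + ad\bar\theta + bc\theta + 3bd) \in (2)$. Thus $(2) = \mathfrak{p}\bar{\mathfrak{p}}$.
Now the norm map on $\mathcal{O}_F$ is defined by $N(a+b\theta) = (a+b\theta)(a+b\bar\theta) = a^2 + ab + 6b^2$. In particular,
$N(\mathfrak{p})N(\bar{\mathfrak{p}}) = N(2) = 4$ and since $N(\mathfrak{p}) = N(\bar{\mathfrak{p}})$ by algebraic conjugation, we must have $N(\mathfrak{p}) = N(\bar{\mathfrak{p}}) = 2$.
Evidently the ideals $\mathfrak{p}$ and $\bar{\mathfrak{p}}$ are nonprincipal; for example if $\mathfrak{p} = (a+b\theta)$, a, b ∈ Z, then $a^2+ab+6b^2 =
N(\mathfrak{p}) = 2$; but this has no solution in integers. (If $2 = a^2+ab+6b^2 = (a + \frac{b}{2})^2 + \frac{23}{4}b^2 \ge \frac{23}{4}b^2$, then we
must have b = 0; but then $a^2 = 2$, a contradiction.) Now the extension E ⊃ F is Galois of degree
$\frac12\phi(23) = 11$, with Galois group $G = G(E/F) = \{\iota, \sigma, \sigma^2, \ldots, \sigma^{10}\}$. The ideal $\mathfrak{p}\mathcal{O}_E \subset \mathcal{O}_E$ has prime
factorization of the form either
(i) $\mathfrak{p}\mathcal{O}_E = \mathfrak{P}$ prime, $N_{E/F}(\mathfrak{P}) = 2^{11}$, $\sigma(\mathfrak{P}) = \mathfrak{P}$; or
(ii) $\mathfrak{p}\mathcal{O}_E = \mathfrak{P}\,\sigma(\mathfrak{P})\,\sigma^2(\mathfrak{P})\cdots\sigma^{10}(\mathfrak{P})$, $N_{E/F}(\sigma^i(\mathfrak{P})) = 2$. We do not require (or assume) that the eleven
prime factors are distinct.
Suppose that $\mathfrak{P} = \pi\mathcal{O}_E$ is principal. In case (i) this yields π ∈ F and $\mathfrak{p} = \mathfrak{P} \cap \mathcal{O}_F = \pi\mathcal{O}_F$, a principal
ideal in $\mathcal{O}_F$, a contradiction. In case (ii),

\[ \mathfrak{p}\mathcal{O}_E = \prod_{i=0}^{10} \sigma^i(\mathfrak{P}) = \Big(\prod_{i=0}^{10} \sigma^i(\pi)\Big)\mathcal{O}_E = N_{E/F}(\pi)\,\mathcal{O}_E \]

where $N_{E/F}(\pi) \in F$; but then $\mathfrak{p} = N_{E/F}(\pi)\mathcal{O}_E \cap \mathcal{O}_F = N_{E/F}(\pi)\mathcal{O}_F$ is principal, a contradiction.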
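
The key arithmetic fact in this example, that the norm form a^2 + ab + 6b^2 never takes the value 2, can be confirmed by a small search. The sketch below is illustrative only; the function name is ours, and the search window is justified by the completing-the-square bound in the text.

```python
def norm_form(a, b):
    """Norm of a + b*theta in Z[theta], theta = (1 + sqrt(-23))/2."""
    return a * a + a * b + 6 * b * b

# Since a^2 + ab + 6b^2 = (a + b/2)^2 + (23/4) b^2, a solution of norm 2 would need
# b = 0 and a^2 = 2, so a small search window is already exhaustive.
window = range(-5, 6)
solutions = [(a, b) for a in window for b in window if norm_form(a, b) == 2]
print(solutions)

# The smallest values actually taken by the norm form on this window.
small_norms = sorted({norm_form(a, b) for a in window for b in window})[:6]
print(small_norms)
```

The list of small norms shows the gap directly: no element of Z[θ] has norm 2, 3 or 5.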

Exercises 4.
1. Let $\zeta = \zeta_{2n}$, $n \ge 3$.
(a) Using the relation $2i\sin\frac{k\pi}{n} = \zeta^k - \zeta^{-k} = \zeta^k(1 - \zeta^{-2k})$, show that $\prod_{k=1}^{n-1}\sin\frac{k\pi}{n} = \frac{n}{2^{n-1}}$.
(b) Use (a) to show that $\sin\frac{k\pi}{n}$ cannot always be an algebraic integer.
(c) Show that none of the values $\sin\frac{k\pi}{7}$, k = 1, 2, . . . , 6, are algebraic integers.
2. In the notation of Example 4.12, consider the nonprincipal prime ideal $\mathfrak{p} = (2, \theta) \subset \mathcal{O}_F$. Show
that $\mathfrak{p}^3 \subset \mathcal{O}_F$ is principal. (Hint: Verify that $2 - \theta = 2^3 + 2^2\theta + \theta^3 \in \mathfrak{p}^3$. Argue that $(2-\theta) \subseteq \mathfrak{p}^3$;
then compare norms on both sides to obtain equality.)
3. Recall that $1 - \zeta_p \in \mathbb{Z}[\zeta_p]$ is irreducible for p prime (Theorem 4.4). The condition that p is prime
is strictly necessary here. Show, for example, that if $\zeta = \zeta_{15}$, then 1 − ζ is a unit in Z[ζ]. State
and prove a generalization of this fact.

5. Fermat’s Last Theorem

Fermat’s Last Theorem (FLT) is, of course, the statement that the equation $x^n + y^n = z^n$
has no positive integer solutions for exponent n > 2. While interest in FLT has had
a profound impact on modern mathematics, we shall have nothing to say here about the
recognized proof of this theorem due to Wiles and others, since this involves a great many
topics beyond the scope of our course. Our purpose is rather to say something about the
role of cyclotomic fields in the earliest work on FLT. One might justifiably say that the
early development of the theory of cyclotomic fields was largely motivated by FLT; but by
the time we came to accept that other tools would be required to resolve FLT, the theory
and applications of cyclotomic fields had grown far beyond their original confines.
The role of cyclotomic fields in studying FLT is evident already in the smallest case
n = 3. Fermat claimed to have settled this case in his correspondence, although there is no
surviving copy of his proof. We presume it was along the lines of Euler’s later proof of 1770,
which is essentially the proof we give below. There is a surviving copy of Fermat’s proof
for the exponent n = 4, also using his ‘method of infinite descent’. While the case n = 3
is a little more involved, the case of prime exponent is more typical in its use of cyclotomic
fields; so it is this case we have chosen to highlight here.

Theorem 5.1. The equation $x^3 + y^3 = z^3$ has no solution in positive integers.

Proof. After replacing the integer z by −z, the equation to be solved takes the more
symmetrical form $x^3 + y^3 + z^3 = 0$; here we may suppose there is a solution in nonzero
integers, and seek a contradiction. But it will be easier to prove a seemingly stronger
statement. Let $\mathcal{O} = \mathbb{Z}[\omega]$, the ring of Eisenstein integers, where $\omega = \zeta_3$. The ring $\mathcal{O}$ is
a UFD; see Corollary A3.16. Its group of units is the finite group of sixth roots of unity
$\mathcal{O}^\times = \{\pm 1, \pm\omega, \pm\omega^2\}$; see Example A3.8. Now we suppose that

(5.2) $x^3 + y^3 + uz^3 = 0$ for some nonzero $x, y, z \in \mathcal{O}$ and $u \in \mathcal{O}^\times$;

and we seek a contradiction. The contradiction to which we are led, is that any solution
of (5.2) leads to a smaller solution of (5.2) (this is Fermat’s ‘method of descent’). Here
‘smaller’ is in terms of the norms; and since (absolute values of) norms lie in N, this leads
to an infinite descending sequence in N, hence a contradiction. The advantage of (5.2) over
the original equation, is that at the descent step, we have a stronger inductive hypothesis
available (as well as a stronger conclusion to fulfill). Put another way, we may further
suppose that our solution of (5.2) is as small as possible, and obtain from this an outright
contradiction. In particular,

(5.3) no irreducible element π ∈ O divides any two of x, y, z;



otherwise π divides the third member of the triple and then dividing by the common
factor π, we would obtain a smaller nonzero solution of (5.2).
A typical element $z = a + b\omega \in \mathcal{O}$ (a, b ∈ Z) has norm $N(z) = z\bar z = a^2 - ab + b^2$. The
irreducible element ε = 1 − ω has norm N(ε) = 3; and the rational prime 3 ramifies in $\mathcal{O}$
as $(3) = (\varepsilon)^2$ (see Theorem 4.4). Indeed, $\varepsilon^2 = 1 - 2\omega + \omega^2 = -3\omega$ and $(\varepsilon^2) = (3)$ since −ω
is a unit.
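Now it is profitable to consider possibilities for each of the variables x, y, z mod ε. We
have $\mathcal{O}/(\varepsilon) \cong \mathbb{F}_3$. This follows directly from the formula N((ε)) = |N(ε)| = 3; but one can
also note that every $x \in \mathcal{O}$ can be expressed uniquely as x = a + bε with a, b ∈ Z; so x ≡ a ≡ 0, 1 or
2 mod ε (recall that 3 ≡ 0 mod ε). If x ≡ 1 mod ε, write x = 1 + bε with $b \in \mathcal{O}$; then

\[ x^3 = 1 + 3b\varepsilon + 3b^2\varepsilon^2 + b^3\varepsilon^3 = 1 - 3(b^3 - b)\varepsilon - 9b^2(b+1)\omega \equiv 1 \mod 9 \]

since $b^3 - b = (b-1)b(b+1) \equiv 0$ mod ε (one of the three factors is divisible by ε, because
$\mathcal{O}/(\varepsilon) \cong \mathbb{F}_3$), so that $3(b^3 - b)\varepsilon \in (3\varepsilon^2) = (9)$.
Replacing x by −x shows that $x^3 \equiv -1$ mod 9 also whenever x ≡ −1 mod ε. In the third
case x ≡ 0 mod ε we obtain only $x^3 \equiv 0$ mod $\varepsilon^3$ in general (for example $\varepsilon^3 = -3\omega\varepsilon$ is not
divisible by $9 = \omega\varepsilon^4$). Thus

(5.4) for $x \in \mathcal{O}$, we have $x^3 \equiv 1$ or $-1$ mod 9 according as $x \equiv 1$ or $-1$ mod ε; and
$x^3 \equiv 0$ mod $\varepsilon^3$ whenever $x \equiv 0$ mod ε.

Now consider (5.2) mod $\varepsilon^3$ and apply (5.4) to see that x, y, z must be congruent to 0, 1, −1
mod ε (in some order). Moreover if z ≡ ±1 mod ε then we must have u = ±1; but in this
case we may exchange z with either x or y to get x ≡ 1 mod ε, y ≡ −1 mod ε, z ≡ 0 mod ε.
Now (5.4) gives $x^3 + y^3 \equiv 0$ mod 9, so $z^3 \equiv 0$ mod 9, i.e. $z^3 \equiv 0$ mod $\varepsilon^4$. Thus

(5.5) x ≡ 1 mod ε; y ≡ −1 mod ε; z ≡ 0 mod $\varepsilon^2$. We may write $z = \varepsilon^k z'$ where
$k \ge 2$ and $z' \in \mathcal{O}$, $z' \not\equiv 0$ mod ε.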
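
The congruence for cubes of elements x ≡ ±1 mod ε can be verified exhaustively, since x^3 mod 9 depends only on x mod 9. The sketch below represents an Eisenstein integer a + bω as a pair (a, b) and uses ω^2 = −1 − ω; this representation and the helper names are our own illustrative choices. It checks only the two cases x ≡ 1 and x ≡ −1 mod ε.

```python
def mul(x, y):
    """Multiply Eisenstein integers x = (a, b) ~ a + b*w, using w^2 = -1 - w."""
    a, b = x
    c, d = y
    return (a * c - b * d, a * d + b * c - b * d)

def cube_mod_9(x):
    """Coordinates of x^3 reduced mod 9 with respect to the basis 1, w."""
    a, b = mul(mul(x, x), x)
    return (a % 9, b % 9)

def residue_mod_eps(x):
    """Since w = 1 - eps, a + b*w is congruent to a + b modulo eps, and O/(eps) = F_3."""
    return (x[0] + x[1]) % 3

checks = []
for a in range(9):
    for b in range(9):
        x = (a, b)
        c = residue_mod_eps(x)
        if c == 1:
            checks.append(cube_mod_9(x) == (1, 0))   # x^3 = 1 mod 9
        elif c == 2:
            checks.append(cube_mod_9(x) == (8, 0))   # 8 = -1 mod 9
print(all(checks), len(checks))
```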

Now compare irreducible factors on both sides of

(5.6) $(x + y)(x + \omega y)(x + \omega^2 y) = -u\varepsilon^{3k}(z')^3$.

If an irreducible $\pi \in \mathcal{O}$ divides both x + y and x + ωy, then it divides their difference
which is (1 − ω)y = εy. However, π cannot divide y; otherwise π also divides x, contrary
to (5.3). The same argument, applied to the other parenthetical factors on the left side
of (5.6), shows that the gcd of any two of these factors is either 1 or ε. But $x + \omega^j y \equiv
x + y \equiv 1 - 1 \equiv 0$ mod ε for all j ∈ {0, 1, 2}. It follows that

(5.7) the factors $x+y$, $x+\omega y$, $x+\omega^2 y$ are, in some order, equal to $u_1\varepsilon\alpha_1^3$, $u_2\varepsilon\alpha_2^3$
and $u_3\varepsilon^{3k-2}\alpha_3^3$ where $u_i \in \mathcal{O}^\times$ and $\alpha_i \in \mathcal{O}$ are not divisible by ε.

Without loss of generality, it is the third factor $x + \omega^2 y$ that is divisible by $\varepsilon^{3k-2}$; otherwise
replace y by $\omega^j y$ for some j, thereby cycling the three factors while preserving the
conditions (5.5). Now $u_i \in \{\pm 1, \pm\omega, \pm\omega^2\}$; but without loss of generality, $u_i \in \langle\omega\rangle$, since
any ‘−’ signs can be absorbed into $\alpha_i$. Therefore we may assume

(5.8) $x + y = \omega^{j_1}\varepsilon\alpha_1^3$, $\quad x + \omega y = \omega^{j_2}\varepsilon\alpha_2^3$, $\quad x + \omega^2 y = \omega^{j_3}\varepsilon^{3k-2}\alpha_3^3$.

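Adding the three expressions in (5.8) gives

\[ -\omega^2\varepsilon^2 x = 3x = \omega^{j_1}\varepsilon\alpha_1^3 + \omega^{j_2}\varepsilon\alpha_2^3 + \omega^{j_3}\varepsilon^{3k-2}\alpha_3^3; \]

\[ -\omega^2\varepsilon x = \omega^{j_1}\alpha_1^3 + \omega^{j_2}\alpha_2^3 + \omega^{j_3}\varepsilon^{3k-3}\alpha_3^3. \]

Since $k \ge 2$, x ≡ 1 mod ε and $\omega^j \equiv 1 - j\varepsilon$ mod 3, this gives

\[ -\varepsilon \equiv (1 - j_1\varepsilon)\alpha_1^3 + (1 - j_2\varepsilon)\alpha_2^3 \mod 3 \]

and since $\alpha_i \equiv \pm 1$ mod ε, (5.4) gives $\alpha_i^3 \equiv \pm 1$ mod 9. Evidently $\alpha_1^3 \equiv 1$ and $\alpha_2^3 \equiv
-1$ mod 9 (or we reverse the roles of $\alpha_1$ and $\alpha_2$ so that this is the case) and $j_2 \equiv j_1 - 1$ mod 3.
Now

\[ 0 = (x + y) + \omega(x + \omega y) + \omega^2(x + \omega^2 y) = \omega^{j_1}\varepsilon\alpha_1^3 + \omega^{j_2+1}\varepsilon\alpha_2^3 + \omega^{j_3+2}\varepsilon^{3k-2}\alpha_3^3 \]

and so

(5.9) $0 = \alpha_1^3 + \alpha_2^3 + \omega^{j_3-j_1+2}(\varepsilon^{k-1}\alpha_3)^3$.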

Thus $(\alpha_1, \alpha_2, \varepsilon^{k-1}\alpha_3)$ solves the same equation as (x, y, z) in (5.2). By the remarks
following (5.2), it suffices to show that the new solution is smaller in the sense of norm, than
the original solution; specifically, we show that

(5.10) $0 < |N(\alpha_1\alpha_2\varepsilon^{k-1}\alpha_3)| < |N(xyz)|$.

To prove (5.10), observe that

\[
\begin{aligned}
-uz^3 = x^3 + y^3 &= (x + y)(x + \omega y)(x + \omega^2 y) = (\omega^{j_1}\varepsilon\alpha_1^3)(\omega^{j_2}\varepsilon\alpha_2^3)(\omega^{j_3}\varepsilon^{3k-2}\alpha_3^3) \\
&= \omega^{j_1+j_2+j_3}\,\varepsilon^3\,(\alpha_1\alpha_2\varepsilon^{k-1}\alpha_3)^3.
\end{aligned}
\]

Taking norms of both sides in the extension Q[ω] ⊃ Q gives $N(z)^3 = \pm 27\, N(\alpha_1\alpha_2\varepsilon^{k-1}\alpha_3)^3$
and so

\[ 0 < 3\,|N(\alpha_1\alpha_2\varepsilon^{k-1}\alpha_3)| = |N(z)| \le |N(xyz)| \]

and (5.10) follows, completing the proof of Theorem 5.1.

Now consider the general case of FLT. Because Fermat himself proved his conjecture
for n = 4, clearly we can confine our attention to the case of prime exponent $p \ge 3$. So
in the following, we assume p is an odd prime; and we further suppose $x^p + y^p = z^p$ with
x, y, z positive integers, hoping to obtain thereby a contradiction.
We can assume any two of x, y, z are relatively prime; otherwise all three have a
common factor which can then be divided out to obtain a smaller solution in positive
integers. Next, one observes that in $\mathbb{Z}[\zeta_p]$ we have the factorization

(5.11) $\displaystyle x^p + y^p = \prod_{i=0}^{p-1} (x + \zeta^i y) = z^p$.

This relation clearly begs us to compare prime factors on both sides, expecting to find that
apart from units and powers of ε = 1 − ζ, the factors x + ζ i y are pairwise relatively prime
pth powers. This is the key idea in the proof of Theorem 1.1; and it figures prominently
also for the classification of primitive Pythagorean triples (the case of exponent n = 2).
This plan, despite its merits, runs into difficulty in Z[ζp ], where unique factorization does
not hold in general. Some famous early attempts to prove FLT (including some by the
best mathematicians of the 19th century) foundered precisely by assuming Q[ζ] to be a
UFD in general. It is natural to speculate that Fermat himself fell prey to this fallacy,
although presumably we will never know this. It was largely to repair this defect that the
concept of ‘ideal’ was introduced (so named by Ernst Kummer whose work on cyclotomic
fields, together with Sophie Germain’s contributions, led the progress toward FLT during
the 19th century).
Experience has also shown that it is profitable to approach (5.11) in two separate
cases: (i) none of x, y, z are divisible by p; or (ii) exactly one of x, y, z is divisible by p.
Tradition refers to these two cases as the first case and the second case of FLT. The
following result concerns the first case; for the counterpart of this result in the second
case, also using the theory of cyclotomic fields, see [Wa, Chapter 9]. Refer to Appendix A3
regarding the class number of an extension.

Theorem 5.12. Suppose that p > 2 is a prime for which the class number of Z[ζ],
$\zeta = \zeta_p$, is not divisible by p. Then the equation $x^p + y^p = z^p$ has no solution in integers
x, y, z relatively prime to p.

Proof. The cases p = 3, 5 are easily disposed of, even without Theorem 1.1. For p = 3,
we have $z^3 \equiv \pm 1$ mod 9 whenever gcd(z, 3) = 1; and by this same fact, $x^3 + y^3 \equiv 0$ or
±2 mod 9. So there are no solutions for p = 3 in the first case. Exactly the same reasoning
works for p = 5, by considering Fermat’s equation mod 25. Thus we may assume $p \ge 7$.
Let $\mathcal{O} = \mathbb{Z}[\zeta]$, $\zeta = \zeta_p$. Recall that the element $\varepsilon = 1 - \zeta \in \mathcal{O}$ is irreducible and $\mathbb{Z} \cap \varepsilon\mathcal{O} = p\mathbb{Z}$;
see Theorem 4.4.
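
The two small cases dismissed at the start of this proof amount to a finite computation modulo p^2, which the following sketch carries out. The function name is ours. It lists the pairs of p-th power residues of units modulo p^2 whose sum is again such a residue; an empty list means that the first case of Fermat's equation has no solution even modulo p^2.

```python
def first_case_pairs_mod(p):
    """Pairs (a, b) of p-th powers of units mod p^2 whose sum is again such a p-th power."""
    m = p * p
    units = [t for t in range(m) if t % p != 0]
    pth_powers = sorted({pow(t, p, m) for t in units})
    return [(a, b) for a in pth_powers for b in pth_powers if (a + b) % m in pth_powers]

print(first_case_pairs_mod(3))   # congruence obstruction mod 9
print(first_case_pairs_mod(5))   # congruence obstruction mod 25
print(first_case_pairs_mod(7))   # nonempty: the same shortcut is not available mod 49
```

For p = 7 the list is not empty (for instance 1 + 18 ≡ 19 mod 49, all three being seventh powers of units), which is one way to see why the congruence shortcut stops at p = 5 and the cyclotomic argument is needed from p = 7 onward.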
We claim that the ideals $(x + \zeta^i y) \subseteq \mathcal{O}$ are pairwise relatively prime for i = 0, 1, 2, . . . ,
p−1. Suppose that on the contrary, $(x + \zeta^i y)$ and $(x + \zeta^j y)$ have a common prime factor
$\mathfrak{P} \subset \mathcal{O} = \mathbb{Z}[\zeta]$, $0 \le i < j < p$. Then $(1 - \zeta^{j-i})y = \zeta^{-i}[(x + \zeta^i y) - (x + \zeta^j y)] \in \mathfrak{P}$; so by
Theorem 4.3, either $\mathfrak{P} = (1 - \zeta) = (\varepsilon)$ or $y \in \mathfrak{P}$. Similarly, $(1 - \zeta^{j-i})x = (x + \zeta^j y) - \zeta^{j-i}(x +
\zeta^i y) \in \mathfrak{P}$, so either $\mathfrak{P} = (\varepsilon)$ or $x \in \mathfrak{P}$. We cannot have both x and y in $\mathfrak{P}$, otherwise
x, y are both divisible by the prime p′ satisfying $p'\mathbb{Z} = \mathfrak{P} \cap \mathbb{Z}$. Thus $\mathfrak{P} = (\varepsilon)$; and since
$x + \zeta^i y \in \mathfrak{P}$ is a factor of $z^p$ by (5.11), we get $z^p \in (\varepsilon) \cap \mathbb{Z} = p\mathbb{Z}$, which means that p | z. This violates the
assumption that we are in the ‘first case’ of FLT. This proves our initial claim.
By considering prime ideal factors on both sides of (5.11), it follows easily that each
ideal $(x + \zeta^i y)$ is itself the p-th power of an ideal: $(x + \zeta^i y) = B_i^p$ for some ideal $B_i \subseteq \mathcal{O}$.
Let h be the class number of $\mathcal{O}$; so we have a principal ideal $B_i^h = (a_i)$ for some $a_i \in \mathcal{O}$.
By hypothesis gcd(p, h) = 1, so there exist positive integers k, m such that mp = kh + 1, so $(x + \zeta^i y)^m =
B_i^{mp} = B_i^{kh+1} = (a_i)^k B_i$. This implies that the ideal $B_i \subseteq \mathcal{O}$ is principal: $B_i = (\beta_i)$,
$\beta_i \in \mathcal{O}$.
Now abbreviate $B = B_1$, $\beta = \beta_1$. We have $(x + \zeta y) = B^p = (\beta)^p$; so by Theorem 4.11,
we have $x + \zeta y = \zeta^k u\beta^p$ for some k ∈ {0, 1, 2, . . . , p−1} and a real unit $u \in \mathbb{Z}[\zeta + \zeta^{-1}]$. Also
writing $\beta = \sum_{i=0}^{p-2} b_i\zeta^i$, where $b_i \in \mathbb{Z}$, we have $\beta^p \equiv \sum_{i=0}^{p-2} (b_i\zeta^i)^p \equiv \sum_{i=0}^{p-2} b_i^p \equiv b$ mod p
for some b ∈ Z. Thus $x + \zeta y \equiv \zeta^k ub$ mod p; and after complex conjugation, $x + \zeta^{-1}y \equiv
\zeta^{-k}ub$ mod p. Since u is a unit and b an integer, we get

(5.13) $x + \zeta y - \zeta^{2k}x - \zeta^{2k-1}y = (x + \zeta y) - \zeta^{2k}(x + \zeta^{-1}y) \in p\mathcal{O}$.

If the powers $1, \zeta, \zeta^{2k}, \zeta^{2k-1}$ are distinct, (5.13) gives a contradiction, as we now show.
Any p − 1 of the distinct powers $1, \zeta, \zeta^2, \ldots, \zeta^{p-1}$ form a basis for Q[ζ] over Q. Since
$p \ge 7$ (the case to which we reduced at the outset), we may choose such a basis containing
$1, \zeta, \zeta^{2k}, \zeta^{2k-1}$. Here we assume for the sake of argument that $1, \zeta, \zeta^{2k}, \zeta^{2k-1}$ are distinct
members of $\{1, \zeta, \zeta^2, \ldots, \zeta^{p-2}\}$; and if this is not the case, the choice of basis can be
adapted accordingly. Then by (5.13) we have

\[ x + \zeta y - \zeta^{2k}x - \zeta^{2k-1}y = p(a_0 + a_1\zeta + a_2\zeta^2 + \cdots + a_{p-2}\zeta^{p-2}) \]

for some $a_0, a_1, a_2, \ldots, a_{p-2} \in \mathbb{Z}$. This gives x, y ∈ pZ, contrary to hypothesis.


Now we deal with the few special cases not covered by the previous argument. Since
$\zeta \ne 1$ and $\zeta^{2k} \ne \zeta^{2k-1}$, these are the cases
(i) $\zeta^{2k} = 1$. In this case, (5.13) reduces to $\zeta^{-1}(\zeta^2 - 1)y \in p\mathcal{O}$. By Theorem 4.4 and the
fact that p > 2, this gives y ∈ pZ, contrary to hypothesis.
(ii) $\zeta = \zeta^{2k-1}$. Here, (5.13) becomes $(1 - \zeta^2)x \in p\mathcal{O}$, leading to a contradiction as in (i).
(iii) $\zeta^{2k-1} = 1$. Here, (5.13) becomes $(1 - \zeta)(x - y) \in p\mathcal{O}$, yielding x ≡ y mod p. We
may assume however that this congruence does not hold. If one replaces z by −z,
then Fermat’s equation takes the more symmetrical form $x^p + y^p + z^p = 0$; and we
are required to prove that this equation has no solution in integers relatively prime
to p. Under these hypotheses, however, x, y, z cannot all be congruent mod p, as this
would entail $3x^p \equiv 0$ mod p, which cannot hold for p > 3 and $p \nmid x$. By symmetry, we
may therefore assume $x \not\equiv y$ mod p and move the third variable z to the right side of
Fermat’s equation.
This completes the proof of Theorem 5.12.

A prime p is called regular if the class number of Z[ζp ] is not divisible by p; otherwise
p is irregular. The first few irregular primes are 37, 59, 67, 101, 103, etc.; these are the
primes for which the hypotheses of Theorem 5.12 fail. Reasonable heuristics, backed by
computational evidence, support the conjecture that a proportion $e^{-1/2} \approx 61\%$ of primes
are regular; so it might seem that the theory of cyclotomic fields solves FLT for ‘most’
prime exponents. However, it is not known that there are infinitely many regular primes.
(Curiously, the set of irregular primes is known to be infinite, despite being apparently
less dense than the sequence of regular primes.) Fortunately there is a test for regularity
of primes which is (at least conceptually) rather explicit.
The sequence of Bernoulli numbers $B_0, B_1, B_2, B_3, \ldots$ given by

\[ 1,\ -\tfrac12,\ \tfrac16,\ 0,\ -\tfrac1{30},\ 0,\ \tfrac1{42},\ \ldots \]

may be defined via the Taylor series expansion

\[
\begin{aligned}
\frac{x}{e^x - 1} = \sum_{n=0}^{\infty} \frac{B_n}{n!}\, x^n
&= 1 - \tfrac12 x + \tfrac{1}{12}x^2 - \tfrac{1}{720}x^4 + \tfrac{1}{30240}x^6 - \tfrac{1}{1209600}x^8 + \tfrac{1}{47900160}x^{10} \\
&\quad - \tfrac{691}{1307674368000}x^{12} + \tfrac{1}{74724249600}x^{14} \\
&\quad - \tfrac{3617}{10670622842880000}x^{16} + \tfrac{43867}{5109094217170944000}x^{18} - \cdots.
\end{aligned}
\]

It is easily shown that $B_n = 0$ for odd integers $n \ge 3$; and that the nonzero Bernoulli
numbers alternate in sign. Among the many uses of Bernoulli numbers, we mention
(i) an exact formula for certain special values of the Riemann zeta function:

\[ \zeta(2k) = (-1)^{k+1}\, \frac{2^{2k-1}\pi^{2k}}{(2k)!}\, B_{2k}, \qquad k = 1, 2, 3, \ldots; \]

(ii) a formula expressing the sum of the kth powers of the first n positive integers as a
polynomial in n of degree k + 1:

\[ 1^k + 2^k + 3^k + \cdots + n^k = \frac{1}{k+1} \sum_{i=0}^{k} (-1)^i \binom{k+1}{i} B_i\, n^{k+1-i}. \]

In the current context, the relevance of the Bernoulli numbers is that

(5.14) a prime p is regular iff none of the Bernoulli numbers $B_k$, for k = 2, 4,
6, . . . , p−3, have numerator divisible by p (when expressed as reduced
fractions).

For example, 691, 3617 and 43867 (which are primes) must be irregular; and the first
irregular prime, 37, divides the numerator of
\[ B_{32} = -\frac{7709321041217}{510} = -\frac{37\cdot 683\cdot 305065927}{2\cdot 3\cdot 5\cdot 17}. \]
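Criterion (5.14) translates directly into a short computation. The sketch below is an illustration only, with function names of our own choosing and a simple trial-division primality test that is adequate for this small range. It lists the irregular primes below 104.

```python
from fractions import Fraction
from math import comb

def bernoulli_numbers(m):
    """Return [B_0, ..., B_m] as exact (automatically reduced) fractions."""
    B = [Fraction(1)]
    for n in range(1, m + 1):
        # Recurrence: sum_{i=0}^{n} C(n+1, i) * B_i = 0 for n >= 1.
        B.append(-sum(comb(n + 1, i) * B[i] for i in range(n)) / (n + 1))
    return B

def is_regular(p, B):
    """Test (5.14) for an odd prime p, given B_0, ..., B_{p-3}."""
    return all(B[k].numerator % p != 0 for k in range(2, p - 2, 2))

def is_prime(n):
    return n >= 2 and all(n % d for d in range(2, int(n ** 0.5) + 1))

B = bernoulli_numbers(100)
odd_primes = [p for p in range(3, 104) if is_prime(p)]
irregular = [p for p in odd_primes if not is_regular(p, B)]
print(irregular)
print(B[32], B[32].numerator % 37)
```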

Exercises 5.
1. Find a positive integer n for which the ring of cyclotomic integers Z[ζ_n] contains nonzero solutions
of x^3 + y^3 = z^3.
We say that p sharply divides n, denoted p ∥ n, if p divides n, but p^2 does not divide n.

2. Let p be an odd prime, and let x and y be relatively prime nonzero integers with x + y ≠ 0.
(a) Show that gcd(x + y, (x^p + y^p)/(x + y)) = 1 or p.
(b) If p divides x + y, show that p also divides x^p + y^p and p sharply divides (x^p + y^p)/(x + y).
Hint: Let u = x + y. Simplify the expression ((u − y)^p + y^p)/u after first expanding (u − y)^p by
the Binomial Theorem. At some point you may also want to recall Fermat’s Little Theorem.
3. Let p be an odd prime; and let x, y, z be pairwise relatively prime nonzero integers satisfying
x^p + y^p + z^p = 0. Recall that p divides at most one of x, y, z.
(a) If p ∤ z, show that x + y = a^p and (x^p + y^p)/(x + y) = A^p for some integers a, A. (Hint: Use
#2.)
(b) In the first case of FLT, p ∤ xyz. Argue as in (a) to obtain x + z = b^p, y + z = c^p and
2x = a^p + b^p − c^p.
(c) If p | z then we are in the second case of FLT, and (a) fails. Show that in this case, we instead
obtain x + y = p^{p−1}a^p and (x^p + y^p)/(x + y) = pA^p for some integers a, A.
A pair of odd primes {p, q} such that q = 2p + 1 is a Sophie Germain pair of primes. It is
conjectured that there are infinitely many such pairs of primes.
4. Suppose we have a pair of Sophie Germain primes {p, q}, q = 2p + 1. As in the first case of FLT,
suppose that x, y, z are pairwise relatively prime integers satisfying x^p + y^p + z^p = 0 with p ∤ xyz.
(a) If q ∤ x, show that x^p ≡ ±1 mod q. Hint: Theorem 3.4.
(b) Show that q | xyz. Hence without loss of generality, we assume q | x.
(c) Show that q | abc where a, b, c are as in #3.
(d) By considering in cases q | a, q | b or q | c, obtain a contradiction.
This solved the first case of FLT for many primes, and conjecturally an infinite class of primes. This
breakthrough of Sophie Germain was later generalized by Legendre, enabling all primes p < 100 to
be dealt with. Subsequent work of others in the 20th century, all based on Sophie Germain’s idea,
was able to prove the first case of FLT for an infinite class of primes (yet without proving there are
infinitely many Sophie Germain primes).

6. Characters of Finite Abelian Groups


Let G be a finite abelian group of order n. Here we consider G to be multiplicative, as
we are free to do, up to isomorphism. (The difficulty with additive groups is a purely
notational issue that arises when describing group rings. In Section 7, where we focus
on group rings, we will discuss the natural accommodations for dealing with additive
groups; but for now, there is no need to be distracted by this.) A character of G is a
homomorphism χ : G → C× ; thus χ(xy) = χ(x)χ(y) for all x, y ∈ G. (So χ is what would
be called a linear character in the larger world of group theory.) Since x^n = 1 for all
x ∈ G (this being a special case of Lagrange’s Theorem), χ(x)^n = χ(x^n) = χ(1) = 1 and
so all values of χ are complex roots of unity. In fact, all values of χ are complex m-th
roots of unity, where m is the exponent of G (this being the smallest positive integer m
such that x^m = 1 for all x ∈ G). Note the exponent m of G divides n = |G|; and m = n
iff G is cyclic.
If χ, χ′ : G → C× are characters, then the product χχ′ : G → C× defined pointwise
by (χχ′)(x) = χ(x)χ′(x) is clearly a character. The trivial character (or principal
character) is the constant map G → {1} ⊂ C×. The multiplicative inverse of a character
χ is its complex conjugate $\bar\chi$, where $\bar\chi(x) = \overline{\chi(x)} = \chi(x)^{-1}$ (since the multiplicative inverse of every complex
root of unity is its complex conjugate). We see that the set of characters of G forms a
multiplicative group, which we call the dual group Ĝ.

Theorem 6.1. Let G, G_1, G_2 be finite abelian groups. Then
(a) $\widehat{G_1\times G_2} \cong \widehat{G_1}\times\widehat{G_2}$, and
(b) $\widehat{G} \cong G$.

Proof. (a) Every ordered pair $(\chi_1, \chi_2) \in \widehat{G_1}\times\widehat{G_2}$ defines a map

\[ \chi_1\times\chi_2 : G_1\times G_2 \to \mathbb{C}^\times, \qquad (g_1, g_2) \mapsto \chi_1(g_1)\chi_2(g_2). \]

It is easy to check that this map is a homomorphism, so $\chi_1\times\chi_2 \in \widehat{G_1\times G_2}$. Next, one
checks easily that the map

\[ \text{(6.2)}\qquad \widehat{G_1}\times\widehat{G_2} \to \widehat{G_1\times G_2}, \qquad (\chi_1, \chi_2) \mapsto \chi_1\times\chi_2 \]

is a homomorphism. The kernel of (6.2) is trivial (for if (χ_1×χ_2)(g_1, g_2) = χ_1(g_1)χ_2(g_2) = 1
for all (g_1, g_2) ∈ G_1×G_2, restricting to g_1 = 1 or g_2 = 1 yields both χ_1 = 1 and χ_2 = 1, the
trivial character on G_1 and on G_2 respectively). Finally, the map (6.2) is surjective. (For
if $\chi \in \widehat{G_1\times G_2}$, then restriction to the two factors gives homomorphisms χ_1(g_1) = χ(g_1, 1)
and χ_2(g_2) = χ(1, g_2) and clearly χ_1×χ_2 = χ.) So the map (6.2) is an isomorphism, and
(a) follows.
For (b), we use the Fundamental Theorem of Finite Abelian Groups: Every finite
abelian group is isomorphic to a direct product of finite cyclic groups. Thus it suffices to
consider the case G is finite cyclic; and then (b) will follow in the general case using (a)
for the inductive step.
Suppose thus that G is cyclic of order n, with generator x. Every homomorphism χ :
G → C× is uniquely determined by its value χ(x) at the generator x, since χ(x^j) = χ(x)^j.
One particular character is given by χ_1(x) = ζ where ζ = ζ_n = e^{2πi/n}; here χ_1(x^j) = ζ^j.
In this case we easily prove that ⟨χ_1⟩ = Ĝ. For given any homomorphism χ : G → C×, the
value χ(x) is a complex n-th root of unity, so χ(x) = ζ^r = χ_1(x)^r for some r ∈ {0, 1, 2, . . . , n−1},
whence χ = χ_1^r. Thus Ĝ = ⟨χ_1⟩ which is cyclic of order n. By induction (on the number
of direct factors in G), (b) holds also in the general case.


The isomorphism G ≅ Ĝ is not canonical. For example if G = ⟨x⟩ is cyclic of
order 4, the dual group Ĝ = ⟨χ_1⟩ is also cyclic of order 4 where χ_1(x) = i = √−1.
However there is no algebraic property distinguishing i from −i (the other primitive fourth
root of unity in C). There are two isomorphisms G → Ĝ, one mapping x ↦ χ_1 and the
other mapping x ↦ $\bar\chi_1$; and there is no way to distinguish one of these as the ‘preferred’
isomorphism. The situation is quite like what we find in linear algebra: If V is a finite-dimensional
vector space over a field F, then the dual space V∗ is a vector space over F
having the same dimension as V, so we must have V∗ ≅ V; however there is no canonical
choice of vector space isomorphism V → V∗. Any choice of isomorphism requires that
we fix particular choices of ordered bases for V and for V∗.

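Theorem 6.3. Given χ, χ′ ∈ Ĝ where |G| = n, we have

\[ \text{(a)}\qquad \sum_{g\in G} \chi'(g)\,\overline{\chi(g)} = \begin{cases} n, & \text{if } \chi = \chi'; \\ 0, & \text{otherwise.} \end{cases} \]

Dually, given g, g′ ∈ G, we have

\[ \text{(b)}\qquad \sum_{\chi\in\widehat{G}} \chi(g')\,\overline{\chi(g)} = \begin{cases} n, & \text{if } g = g'; \\ 0, & \text{otherwise.} \end{cases} \]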

Proof. Since |χ(g)| = 1, it is clear that $\sum_{g\in G} |\chi(g)|^2 = n$ for χ ∈ Ĝ. Now suppose that
χ′ ≠ χ in Ĝ, and let $S = \sum_{g\in G} \chi'(g)\overline{\chi(g)}$. Substituting g = uh where u ∈ G is fixed and
h varies over G,

\[ S = \sum_{h\in G} \chi'(uh)\,\overline{\chi(uh)} = \chi'(u)\,\overline{\chi(u)} \sum_{h\in G} \chi'(h)\,\overline{\chi(h)} = \frac{\chi'(u)}{\chi(u)}\, S. \]

Since χ′ ≠ χ, there exists u ∈ G such that χ′(u) ≠ χ(u), and this forces S = 0. This
proves the orthogonality relations (a), which may be interpreted as saying that AA∗ = nI
where A is the n × n matrix with rows and columns indexed by Ĝ and G respectively,
having (χ, g)-entry equal to χ(g); and A∗ is the conjugate transpose of A. Equivalently,
the matrix (1/√n)A is unitary, and so we also have A∗A = nI. This yields the second set of
orthogonality relations (b).
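The matrix form of the orthogonality relations can be checked on a small example. In the following sketch (an illustration only; the group Z/2 × Z/4, written additively, and all function names are our own choices) the characters are built from the cyclic case together with Theorem 6.1(a), and both AA∗ = nI and A∗A = nI are tested numerically.

```python
import cmath
from itertools import product

def character_table(orders):
    """Character table of G = Z/n_1 x ... x Z/n_r, written additively.

    Both rows (characters chi_a) and columns (elements g) are indexed by
    tuples, and the (a, g) entry is chi_a(g) = prod_i zeta_{n_i}^(a_i g_i).
    """
    elements = list(product(*[range(n) for n in orders]))

    def chi(a, g):
        angle = sum(ai * gi / n for ai, gi, n in zip(a, g, orders))
        return cmath.exp(2j * cmath.pi * angle)

    return elements, [[chi(a, g) for g in elements] for a in elements]

def gram(vectors):
    """Matrix of Hermitian inner products sum_k u_k * conj(v_k)."""
    return [[sum(x * y.conjugate() for x, y in zip(u, v)) for v in vectors]
            for u in vectors]

def max_deviation_from_scalar(M, c):
    """Largest entry of |M - c*I|."""
    return max(abs(M[i][j] - (c if i == j else 0))
               for i in range(len(M)) for j in range(len(M)))

elements, A = character_table([2, 4])
n = len(elements)
rows_gram = gram(A)                 # entries of A A*: relations (a)
cols_gram = gram(list(zip(*A)))     # column inner products: relations (b)
print(max_deviation_from_scalar(rows_gram, n))
print(max_deviation_from_scalar(cols_gram, n))
```

Both printed deviations should be at the level of floating-point rounding error.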

The group algebra C[G] is the set of all formal linear combinations of elements of
G with complex coefficients. This is a complex vector space of dimension n = |G| having
G as basis; but it is also a ring with multiplication defined as in G, as extended uniquely
to C[G] by the distributive law. To be explicit, let α, β ∈ C[G] be given by

\[ \alpha = \sum_{g\in G} a_g\, g, \qquad \beta = \sum_{g\in G} b_g\, g \]

where a_g, b_g ∈ C for all g ∈ G; then

\[ \alpha\beta = \sum_{x,y\in G} a_x b_y\, xy = \sum_{g\in G} \Bigl( \sum_{h\in G} a_{gh^{-1}} b_h \Bigr) g \in \mathbb{C}[G] \]

after substituting (x, y) = (gh^{−1}, h). Thus it is natural to consider the convolution of
two functions f_1, f_2 : G → C; this is the function f_1 ∗ f_2 : G → C defined by

\[ (f_1 * f_2)(g) = \sum_{h\in G} f_1(gh^{-1})\, f_2(h). \]

This works fine for a general finite group G, although here we only consider the case G is
abelian. Now if we define

\[ \widehat{f} = \sum_{g\in G} f(g)\, g \in \mathbb{C}[G] \]

for any function f : G → C, then the identity

\[ \widehat{f_1}\,\widehat{f_2} = \widehat{f_1 * f_2} \]

holds in C[G].
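To make the last identity concrete, here is a small sketch for the cyclic group G = {1, x, . . . , x^{n−1}}, where the element x^k is stored as the exponent k. It is an illustration only; the dictionary representation of C[G] and the function names are our own assumptions. It multiplies two elements of C[G] with the distributive law and compares the result with the convolution of their coefficient functions.

```python
def convolve(f1, f2):
    """(f1 * f2)(g) = sum_h f1(g h^-1) f2(h) on the cyclic group of order n."""
    n = len(f1)
    return [sum(f1[(g - h) % n] * f2[h] for h in range(n)) for g in range(n)]

def hat(f):
    """f-hat = sum_g f(g) g in C[G], stored as {exponent: coefficient}."""
    return {k: c for k, c in enumerate(f)}

def algebra_product(alpha, beta, n):
    """Product in C[G], expanded term by term with the distributive law."""
    result = {}
    for i, a in alpha.items():
        for j, b in beta.items():
            k = (i + j) % n          # x^i * x^j = x^(i+j), exponents mod n
            result[k] = result.get(k, 0) + a * b
    return result

f1 = [1, 2, 0, -1, 3]
f2 = [2, 0, 1, 1, -2]
lhs = algebra_product(hat(f1), hat(f2), len(f1))
rhs = hat(convolve(f1, f2))
print(lhs == rhs)
```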
We now consider three complex inner product spaces of dimension n = |G| = |Ĝ|, as
follows: The space L^2(G) consists of all functions G → C with inner product

\[ [f_1, f_2] = \sum_{g\in G} f_1(g)\,\overline{f_2(g)}. \]

The space L^2(Ĝ) consists of all functions Ĝ → C with inner product

\[ [F_1, F_2] = \frac{1}{n} \sum_{\chi\in\widehat{G}} F_1(\chi)\,\overline{F_2(\chi)}. \]

The group algebra C[G] has inner product

\[ [\alpha, \beta] = \sum_{g\in G} a_g\,\overline{b_g} \]

where $\alpha = \sum_{g\in G} a_g g$, $\beta = \sum_{g\in G} b_g g$. We have a commutative diagram of vector space
isomorphisms

\[
\begin{array}{ccc}
L^2(G) & \xrightarrow{\ \mathcal{F}\ } & L^2(\widehat{G}) \\
 & {\scriptstyle\iota}\searrow \qquad \nearrow{\scriptstyle\widehat{\iota}} & \\
 & \mathbb{C}[G] &
\end{array}
\]

where ι(f) = $\widehat{f}$ as defined above. Each g ∈ G yields a natural function g∗ : Ĝ → C, namely
the evaluation g∗(χ) = χ(g); and the map $\widehat{\iota}$ : G → L^2(Ĝ), g ↦ g∗ has a unique linear
extension to $\widehat{\iota}$ : C[G] → L^2(Ĝ). The Fourier transform is the map ℱ = $\widehat{\iota}\circ\iota$ : L^2(G) →
L^2(Ĝ) defined as follows: Given f : G → C, the map ℱf : Ĝ → C is given by

\[ (\mathcal{F}f)(\chi) = \sum_{g\in G} f(g)\,\chi(g). \]

All three of these vector space isomorphisms are in fact isometries. To see for example
that ℱ is an isometry, let f_1, f_2 : G → C, so that

\[
\begin{aligned}
[\mathcal{F}f_1, \mathcal{F}f_2]
 &= \frac{1}{n} \sum_{\chi\in\widehat{G}} (\mathcal{F}f_1)(\chi)\,\overline{(\mathcal{F}f_2)(\chi)}
  = \frac{1}{n} \sum_{\chi\in\widehat{G}} \sum_{g,h\in G} f_1(g)\chi(g)\,\overline{f_2(h)\chi(h)} \\
 &= \frac{1}{n} \sum_{g,h\in G} f_1(g)\,\overline{f_2(h)}\,\delta_{g,h}\, n
  = \sum_{g\in G} f_1(g)\,\overline{f_2(g)} = [f_1, f_2]
\end{aligned}
\]

using Theorem 6.3(b). The identity $\widehat{f_1}\,\widehat{f_2} = \widehat{f_1 * f_2}$ shows that ι is not only an isometry,
but also an algebra isomorphism from L^2(G) (under convolution) to C[G]. All three edges
of our commutative triangle are algebra isomorphisms if we endow L^2(Ĝ) with pointwise
multiplication: for F_1, F_2 : Ĝ → C, (F_1F_2)(χ) = F_1(χ)F_2(χ). Now given f_1, f_2 : G → C
and χ ∈ Ĝ, we have

\[
\begin{aligned}
\mathcal{F}(f_1 * f_2)(\chi)
 &= \sum_{g\in G} (f_1 * f_2)(g)\,\chi(g)
  = \sum_{g,h\in G} f_1(gh^{-1}) f_2(h)\,\chi(g)
  = \sum_{x,y\in G} f_1(x) f_2(y)\,\chi(xy) \\
 &= \sum_{x\in G} f_1(x)\chi(x) \sum_{y\in G} f_2(y)\chi(y)
  = (\mathcal{F}f_1)(\chi)\,(\mathcal{F}f_2)(\chi)
\end{aligned}
\]

so that ℱ(f_1 ∗ f_2) = (ℱf_1)(ℱf_2) as required. We have proved

Theorem 6.4. The three algebras L^2(G) (with convolution), L^2(Ĝ) (with pointwise
multiplication) and C[G] are isometrically isomorphic. In particular,

\[ \mathbb{C}[G] \cong \mathbb{C}^n = \underbrace{\mathbb{C} \oplus \mathbb{C} \oplus \cdots \oplus \mathbb{C}}_{n \text{ times}} \]

where C^n has coordinatewise ring operations and the standard complex inner product.
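For a cyclic group both halves of Theorem 6.4 can be tested numerically. The sketch below is an illustration with our own function names; G = Z/6 is written additively with χ_j(g) = ζ^{jg}. It checks that ℱ turns convolution into pointwise multiplication and that it preserves the inner products defined above, including the factor 1/n on the dual side.

```python
import cmath

def fourier(f):
    """(Ff)(chi_j) = sum_g f(g) chi_j(g) on Z/n, where chi_j(g) = zeta_n^(j g)."""
    n = len(f)
    zeta = cmath.exp(2j * cmath.pi / n)
    return [sum(f[g] * zeta ** (j * g) for g in range(n)) for j in range(n)]

def convolve(f1, f2):
    """Convolution on Z/n (additive form of f1(g h^-1) f2(h))."""
    n = len(f1)
    return [sum(f1[(g - h) % n] * f2[h] for h in range(n)) for g in range(n)]

def inner_G(f1, f2):
    """Inner product on L^2(G)."""
    return sum(complex(a) * complex(b).conjugate() for a, b in zip(f1, f2))

def inner_dual(F1, F2):
    """Inner product on L^2(G-hat), including the factor 1/n."""
    return sum(a * b.conjugate() for a, b in zip(F1, F2)) / len(F1)

f1 = [1, 2j, 0, -1, 3, 0.5]
f2 = [2, 0, 1 - 1j, 1, -2, 4]
F1, F2 = fourier(f1), fourier(f2)

conv_error = max(abs(a - b * c)
                 for a, b, c in zip(fourier(convolve(f1, f2)), F1, F2))
isometry_error = abs(inner_dual(F1, F2) - inner_G(f1, f2))
print(conv_error, isometry_error)
```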

Expositions of this topic vary widely between different sources, both in the extent of
generality as well as in presentation style, depending on the author’s tastes; and I have
chosen the approach which best suits our particular theme and goals. Two directions in
which the Fourier transform generalizes are to the case G is nonabelian, and the case G is
infinite. When G is nonabelian, of course C[G] ≇ C^n since the group algebra C[G] is no
longer commutative; and instead one finds that C[G] is isomorphic to a direct sum of full
matrix algebras. This is the realm of representation theory; see e.g. [Is], [Se]. Much of
this theory carries over in the infinite case, with more evident ease and success when G is a
compact topological group, particularly a Lie group; or an algebraic group. See e.g. [BD].
The group algebra formulation C[G] is not really suitable in the infinite case, and so this
member of the triangle is understandably absent, where one makes do with L^2(G) (as
an algebra under convolution) instead. See Exercise #5 for the standard example with
G = S^1.
As an algebra, C^n has 2^n idempotent elements (i.e. elements satisfying ε^2 = ε);
these are the vectors having all components either 0 or 1. It also has exactly n primitive
idempotents, these being the standard basis vectors of C^n. (A primitive idempotent
is a nonzero idempotent which is not expressible as ε = ε1 + ε2 where ε1 and ε2 are
also nonzero idempotents.) Every idempotent is uniquely expressible as a sum of distinct
primitive idempotents. In view of the isomorphisms above, C[G] must also have a basis
consisting of n primitive idempotents.

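Theorem 6.5. The primitive idempotents of C[G] are the elements
$\frac{1}{n}\widehat{\chi} = \frac{1}{n}\sum_{g\in G} \chi(g)\, g$ for χ ∈ Ĝ.

Proof. For χ, ψ ∈ Ĝ we have

\[
\begin{aligned}
\bigl(\tfrac{1}{n}\widehat{\chi}\bigr)\bigl(\tfrac{1}{n}\widehat{\psi}\bigr)
 &= \tfrac{1}{n^2} \sum_{x,y\in G} \chi(x)\psi(y)\, xy
  = \tfrac{1}{n^2} \sum_{g,h\in G} \chi(gh^{-1})\psi(h)\, g
  = \tfrac{1}{n^2} \sum_{g\in G} \chi(g) \Bigl[ \sum_{h\in G} \overline{\chi(h)}\,\psi(h) \Bigr] g \\
 &= \tfrac{1}{n^2} \sum_{g\in G} \chi(g)\,\delta_{\chi,\psi}\, n\, g
  = \delta_{\chi,\psi}\, \tfrac{1}{n}\widehat{\chi}
\end{aligned}
\]

using Theorem 6.3(a). These relations uniquely characterize the n primitive idempotents
of C[G] ≅ C^n.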
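The relations in this proof can be observed directly for a cyclic group. In the sketch below (illustrative only; the order n = 6, the list representation of C[G] and the function names are our own choices) the elements e_j = (1/n)χ̂_j are multiplied in C[G], and we check that they are mutually orthogonal idempotents summing to the identity element.

```python
import cmath

n = 6
zeta = cmath.exp(2j * cmath.pi / n)

def multiply(alpha, beta):
    """Product in C[G] for G cyclic of order n; entry k is the coefficient of x^k."""
    out = [0j] * n
    for i, a in enumerate(alpha):
        for j, b in enumerate(beta):
            out[(i + j) % n] += a * b
    return out

def idempotent(j):
    """e_j = (1/n) * sum_g chi_j(g) g, where chi_j(x^g) = zeta^(j g)."""
    return [zeta ** (j * g) / n for g in range(n)]

def distance(alpha, beta):
    return max(abs(a - b) for a, b in zip(alpha, beta))

zero = [0j] * n
worst = 0.0
for j in range(n):
    for k in range(n):
        expected = idempotent(j) if j == k else zero
        worst = max(worst, distance(multiply(idempotent(j), idempotent(k)), expected))

total = [sum(idempotent(j)[g] for j in range(n)) for g in range(n)]
print(worst, total)
```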

Spectra of Cayley Graphs and Digraphs


As an application, we determine the spectra of Cayley graphs and digraphs over finite
abelian groups. Although we do not consider loops or multiple edges, our presentation
easily adapts to this more general setting. As above, G is a finite multiplicative abelian
group. For an arbitrary subset S ⊆ G, consider the graph with vertex set G and edges
(x, xs) whenever x ∈ G and s ∈ S; thus (x, y) is an edge iff x^{−1}y ∈ S. This graph on
n = |G| vertices, with out-degree |S| for every vertex, is the Cayley digraph Γ = Γ(G, S).
We often require 1 ∉ S; otherwise Γ has a loop at every vertex. If we want Γ to be an
ordinary graph, we would also require s^{−1} ∈ S whenever s ∈ S. Note that Γ is connected
(and in the directed case, strongly connected) iff ⟨S⟩ = G. The adjacency operator
A : C[G] → C[G] is the map α ↦ σα where $\sigma = \sum_{s\in S} s \in \mathbb{C}[G]$. The spectrum (i.e.
multiset of eigenvalues) of Γ is really the spectrum of the operator A. (Note that the
matrix of A with respect to G, the standard basis of C[G], is the usual adjacency matrix
of the graph Γ.)

Theorem 6.6. The n eigenvalues of Γ are the values $\chi(\sigma) = \sum_{s\in S} \chi(s)$ for χ ∈ Ĝ.

We observe that these eigenvalues all lie in the ring Z[ζ_m] where m is the exponent of G
since, as we previously observed, all character values lie in this ring. Cayley graphs (and
digraphs) of nonabelian finite groups also have eigenvalues consisting of cyclotomic integers
since all character values lie in Z[ζ_m] (although characters are defined somewhat differently,
and the eigenvalues are computed by a rather more subtle process). In the case Γ is an
ordinary graph (i.e. s^{−1} ∈ S whenever s ∈ S), it is not hard to see that Theorem 6.6 gives
real eigenvalues as expected: the elements s ∈ S with s ≠ s^{−1} occur in inverse pairs {s, s^{−1}},
each pair contributing $\chi(s) + \chi(s^{-1}) = \chi(s) + \overline{\chi(s)}$, which is real, to the sum in Theorem 6.6;
while χ(s) = ±1 whenever s = s^{−1}. So
in the case of ordinary Cayley graphs, the eigenvalues of Γ lie in Z[ζ_m] ∩ R = Z[ζ_m + ζ_m^{−1}].
Proof of Theorem 6.6. We show that the primitive idempotents of C[G] form a basis
consisting of eigenvectors for A, with the indicated eigenvalues. In place of the primitive
idempotents $\frac{1}{n}\widehat{\chi}$, χ ∈ Ĝ, we may of course use the scalar multiples $\widehat{\chi}$. For each χ ∈ Ĝ,

\[ A\widehat{\chi} = \sum_{s\in S} s \sum_{g\in G} \chi(g)\, g = \sum_{s\in S} \sum_{x\in G} \chi(s^{-1}x)\, x = \overline{\chi(\sigma)} \sum_{x\in G} \chi(x)\, x = \overline{\chi(\sigma)}\,\widehat{\chi}. \]

So C[G] has a basis of eigenvectors for A, with corresponding eigenvalues $\overline{\chi(\sigma)}$, χ ∈ Ĝ.
Now complex conjugation permutes the characters of G; so after reparameterizing, we obtain
exactly the eigenvalues claimed.
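As a numerical illustration of Theorem 6.6 and its proof, the following sketch works in the additive cyclic group Z/n; the choice of group, the connection sets and the function names are our own. It applies the adjacency operator α ↦ σα to each vector χ̂_j and compares the result with the conjugate of χ_j(σ) times χ̂_j, exactly as in the displayed computation.

```python
import cmath
from math import cos, pi

def apply_adjacency(n, S, alpha):
    """alpha -> sigma * alpha in C[Z/n]; entry g is the coefficient of g."""
    out = [0j] * n
    for s in S:
        for g, a in enumerate(alpha):
            out[(s + g) % n] += a      # s * g, written additively as s + g
    return out

def eigenpairs(n, S):
    """Pairs (chi_j(sigma), chi_j-hat) for j = 0, ..., n-1."""
    zeta = cmath.exp(2j * cmath.pi / n)
    return [(sum(zeta ** (j * s) for s in S),
             [zeta ** (j * x) for x in range(n)]) for j in range(n)]

def eigen_error(n, S):
    """Largest entry of |A v - conj(chi(sigma)) v| over all eigenpairs."""
    worst = 0.0
    for value, vector in eigenpairs(n, S):
        image = apply_adjacency(n, S, vector)
        worst = max(worst, max(abs(a - value.conjugate() * v)
                               for a, v in zip(image, vector)))
    return worst

print(eigen_error(7, [1, 2]))          # a Cayley digraph on Z/7
cycle = eigenpairs(7, [1, 6])          # S = -S gives the 7-cycle
print([round(value.real, 6) for value, _ in cycle])
```

For the cycle the connection set is closed under negation, so the eigenvalues are real; they should agree with 2cos(2πj/7).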

Error-Correcting Codes
A second application of group characters is to the theory of error-correcting codes. Let
F = F_q, and fix n ≥ 1. The (Hamming) weight of a vector v = (v_1, v_2, . . . , v_n) ∈ F^n,
denoted by wt(v), is the number of nonzero coordinates in v; thus for example, the vector
v = (1, 0, 1, 1, 1, 0) ∈ F^6 has wt(v) = 4. A linear [n, k]-code is a k-dimensional subspace C ≤
F^n. Vectors in F^n are words; and vectors in C are codewords. The weight distribution
of the code C is the sequence A_0, A_1, A_2, . . . , A_n where A_i = |{v ∈ C : wt(v) = i}|. The
minimum weight of C is the smallest d ≥ 1 such that A_d ≠ 0 (i.e. the smallest weight
of any nonzero codeword). By linearity, the minimum distance of C (the minimum
number of coordinates in which two distinct codewords differ) coincides with the minimum
weight d.
Let us briefly summarize the key concepts of the theory of error-correcting codes
(although this does injustice to such an extensive subject!). Words of length n are vectors
v ∈ F^n whose coordinates are regarded as sequences of n letter symbols from some finite
alphabet (in this case F) used in information exchange. We may view C as the row space
of a k×n matrix B of full rank over F. The matrix B is the generator matrix of the code.
(It is usually denoted G; but we reserve ‘G’ for groups.) Now C = {uB : u ∈ F^k} where
we regard each vector u ∈ F^k as a plaintext message, and uB ∈ C as its corresponding
codeword. The isomorphism F^k → C is the encoding process. The purpose of such
encoding is to protect against loss of information due to a limited number of errors during
transmission between two parties (a ‘transmitter’ T and a ‘receiver’ R) in a noisy channel.
Before transmitting the message u ∈ F^k, T first encodes it as v = uB ∈ C and sends this
codeword. If R correctly receives v, all is well. If R instead receives v′ ∈ F^n, a slightly
corrupted version of v, then R can hope to correctly deduce the original transmitted word
v (and thereby u) if v ∈ C is the unique codeword satisfying wt(v − v′) ≤ e where e is
sufficiently small. Indeed if e = ⌊(d−1)/2⌋, then for every word w ∈ F^n, there is at most one
v ∈ C satisfying wt(w−v) ≤ e. In this case, C is an e-error correcting code of length n.
The goal is to construct codes of a given length n over an alphabet of given size q, with
the number of codewords q^k as large as possible (thereby ensuring a large information
rate), yet with minimum distance d as large as possible (thereby maximizing the error-
correcting capability e). These constraints, however, compete against each other; and a
great deal of mathematics is used to look for the optimal code for a given set of parameters
n, q, etc. There are other design considerations as well, which we have not mentioned.
In particular, why must the alphabet be a finite field F, and the code a subspace of F^n?
For some parameter sets there are in fact nonlinear codes (non-subspaces) which perform
slightly better than any linear code with the same parameters; but these are harder to
design and to work with. Moreover any code, in order to be of practical value, must admit
efficient algorithms for both encoding and decoding. Given these constraints, linear codes
are generally the best bet for information exchange.
In addition to the generator matrix B introduced above, every [n, k]-code over F can
also be defined as the null space of an (n − k) × n matrix H over F :

\[ C = \{ v \in F^n : H v^T = 0 \}. \]

Here H is the parity check matrix of C; it has full rank n − k. The parity check matrix
H is useful for decoding: R uses it to check whether a word v is a valid codeword. The
vector Hw^T is the error syndrome of the word w ∈ F^n. If Hw^T = 0, then w is a certified
codeword; if Hw^T ≠ 0, then the syndrome may yield useful information in locating the
codeword closest to w.
Now the two matrices B and H play roles dual to each other: the row space of B is C,
while the row space of H is the dual code

\[ C^\perp = \{ v \in F^n : w v^T = 0 \text{ for all } w \in C \}. \]

Note that wv^T is the usual ‘dot product’ of two row vectors v, w ∈ F^n. While C is an
[n, k]-code with generator matrix B and parity check matrix H, the dual code C⊥ is an
[n, n−k]-code with generator matrix H and parity check matrix B. There is also a relationship
between the weight distributions of these two codes (and in particular between their
minimum weights). Please note however that while C and C⊥ must have complementary
dimensions, they are not in general complementary subspaces; see Example 6.7 below.
The relationship between the weight distribution of a code and its dual (Theorem 6.8)
is expressed most naturally in terms of their weight enumerators. For a code C as above,
the weight enumerator of C is the polynomial

\[ A_C(x, y) = \sum_{v\in C} x^{n-\mathrm{wt}(v)}\, y^{\mathrm{wt}(v)} = \sum_{d=0}^{n} A_d\, x^{n-d} y^d \in \mathbb{Z}[x, y]. \]

Example 6.7: A Code and its Dual. Let q = 2, F = F_2, and let C < F^5 be the [5, 2]-code
spanned by 10111 and 01110. The dual code C⊥ is the [5, 3]-code spanned by 10001, 01011 and
00110. Since

C = {00000, 10111, 01110, 11001},
C⊥ = {00000, 10001, 01011, 00110, 11010, 10111, 01101, 11100}

we obtain A_C(x, y) = x^5 + 2x^2y^3 + xy^4 and A_{C⊥}(x, y) = x^5 + 2x^3y^2 + 4x^2y^3 + xy^4. Following Theorem 6.8,
we obtain either weight enumerator from the other via

\[ \tfrac{1}{4} A_C(x+y,\, x-y) = A_{C^\perp}(x, y); \qquad \tfrac{1}{8} A_{C^\perp}(x+y,\, x-y) = A_C(x, y) \]

which can be verified by direct computation. The code C has minimum distance 3, the best (largest)
possible for a [5, 2]-code over F_2. It is 1-error correcting. Note that the word 10111 lies in C ∩ C⊥; so
although C and C⊥ have complementary dimensions, they are not complementary subspaces.
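The sets and weight distributions in Example 6.7 are small enough to enumerate by brute force. The following sketch is an illustration; the function names are ours, and the arithmetic mod q is only valid as field arithmetic when q is prime. It recomputes C, C⊥ and their weight distributions.

```python
from itertools import product

def span(generators, q):
    """All F_q-linear combinations of the given words (q prime)."""
    n = len(generators[0])
    return {tuple(sum(c * row[i] for c, row in zip(coeffs, generators)) % q
                  for i in range(n))
            for coeffs in product(range(q), repeat=len(generators))}

def dual(code, n, q):
    """All words of F_q^n orthogonal to every codeword."""
    return {v for v in product(range(q), repeat=n)
            if all(sum(a * b for a, b in zip(w, v)) % q == 0 for w in code)}

def weight_distribution(code, n):
    """[A_0, A_1, ..., A_n], where A_i counts the codewords of weight i."""
    A = [0] * (n + 1)
    for v in code:
        A[sum(1 for c in v if c != 0)] += 1
    return A

C = span([(1, 0, 1, 1, 1), (0, 1, 1, 1, 0)], 2)
C_dual = dual(C, 5, 2)
print(weight_distribution(C, 5))        # coefficients of A_C
print(weight_distribution(C_dual, 5))   # coefficients of the dual enumerator
print(sorted(C & C_dual))               # words lying in both codes
```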

Theorem 6.8 (MacWilliams). Let C be a linear [n, k]-code over F = F_q. Then
the weight enumerators of C and its dual are related by

\[ A_{C^\perp}(x, y) = \frac{1}{q^k}\, A_C\bigl(x + (q-1)y,\ x - y\bigr). \]

Proof. Fix a nontrivial character χ of the additive group of F. Thus χ : F → ⟨ζ_p⟩
satisfies χ(a + b) = χ(a)χ(b). (Although we use an additive group F here rather than
multiplicative, this presents none of the notational difficulties alluded to earlier, since
the group algebra does not appear directly in this argument.) For all u ∈ F^n, define

\[ g(u) = \sum_{v\in F^n} \chi(uv^T)\, x^{n-\mathrm{wt}(v)}\, y^{\mathrm{wt}(v)}. \]

Then

\[
\begin{aligned}
\sum_{u\in C} g(u) &= \sum_{u\in C} \sum_{v\in F^n} \chi(uv^T)\, x^{n-\mathrm{wt}(v)}\, y^{\mathrm{wt}(v)} \\
 &= \sum_{v\in F^n} \Bigl[ \sum_{u\in C} \chi(uv^T) \Bigr] x^{n-\mathrm{wt}(v)}\, y^{\mathrm{wt}(v)} = q^k A_{C^\perp}(x, y)
\end{aligned}
\]

since the inner sum $\sum_{u\in C} \chi(uv^T) = 0$ whenever v ∉ C⊥ (for such vectors v, the dot
product uv^T yields each value of F the same number of times); whereas for v ∈ C⊥, we get
a constant value χ(uv^T) = 1 for all q^k choices of u ∈ C. Now for v = (v_1, v_2, . . . , v_n) ∈ F^n,
we have

\[ \mathrm{wt}(v) = \mathrm{wt}(v_1) + \mathrm{wt}(v_2) + \cdots + \mathrm{wt}(v_n) \quad\text{where}\quad \mathrm{wt}(v_i) = \begin{cases} 1, & \text{if } v_i \ne 0; \\ 0, & \text{if } v_i = 0 \end{cases} \]

and so

\[
\begin{aligned}
g(u) &= \sum_{v_1, v_2, \dots, v_n \in F} \chi(u_1v_1 + u_2v_2 + \cdots + u_nv_n)\, x^{n - \mathrm{wt}(v_1) - \cdots - \mathrm{wt}(v_n)}\, y^{\mathrm{wt}(v_1) + \cdots + \mathrm{wt}(v_n)} \\
 &= \sum_{v_1, v_2, \dots, v_n \in F} \chi(u_1v_1)\, x^{1-\mathrm{wt}(v_1)} y^{\mathrm{wt}(v_1)} \cdots \chi(u_nv_n)\, x^{1-\mathrm{wt}(v_n)} y^{\mathrm{wt}(v_n)} \\
 &= \prod_{i=1}^{n} \sum_{v_i\in F} \chi(u_iv_i)\, x^{1-\mathrm{wt}(v_i)}\, y^{\mathrm{wt}(v_i)}.
\end{aligned}
\]

The innermost sum equals x + (q−1)y if u_i = 0, or x − y if u_i ≠ 0. Thus

\[ g(u) = \bigl(x + (q-1)y\bigr)^{n-\mathrm{wt}(u)}\, (x - y)^{\mathrm{wt}(u)}. \]

Summing over u ∈ C gives A_C(x + (q−1)y, x − y); comparing with the first evaluation
of the same sum yields the theorem.
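Each step of this argument can be checked by brute force on a small code. The sketch below is an illustration only: the ternary [4, 2]-code, the evaluation point (x, y) = (5, 2) and the function names are our own choices, with χ(a) = ζ_3^a as the nontrivial additive character. It tests the product formula for g(u), the evaluation of the sum of g(u) over C, and the MacWilliams identity itself.

```python
import cmath
from itertools import product

q, n = 3, 4                                   # alphabet F_3, length 4
zeta = cmath.exp(2j * cmath.pi / q)

def chi(a):
    """A nontrivial additive character of F_q (q prime): a -> zeta_q^a."""
    return zeta ** (a % q)

def wt(v):
    return sum(1 for c in v if c != 0)

def dot(u, v):
    return sum(a * b for a, b in zip(u, v)) % q

space = list(product(range(q), repeat=n))
generators = [(1, 0, 1, 1), (0, 1, 2, 0)]     # a hypothetical [4, 2]-code
k = len(generators)
C = sorted({tuple(sum(c * row[i] for c, row in zip(coeffs, generators)) % q
                  for i in range(n))
            for coeffs in product(range(q), repeat=k)})
C_dual = [v for v in space if all(dot(u, v) == 0 for u in C)]

def g(u, x, y):
    """g(u) = sum over v in F^n of chi(u.v) x^(n - wt v) y^(wt v)."""
    return sum(chi(dot(u, v)) * x ** (n - wt(v)) * y ** wt(v) for v in space)

def enumerator(code, x, y):
    """The weight enumerator of the code, evaluated at (x, y)."""
    return sum(x ** (n - wt(v)) * y ** wt(v) for v in code)

def verify(x, y):
    """Return three booleans, one for each step of the proof."""
    product_ok = all(
        abs(g(u, x, y) - (x + (q - 1) * y) ** (n - wt(u)) * (x - y) ** wt(u)) < 1e-6
        for u in space)
    sum_ok = abs(sum(g(u, x, y) for u in C) - q ** k * enumerator(C_dual, x, y)) < 1e-6
    identity_ok = (q ** k * enumerator(C_dual, x, y)
                   == enumerator(C, x + (q - 1) * y, x - y))
    return product_ok, sum_ok, identity_ok

print(verify(5, 2))
```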

The Fast Fourier Transform


The Fast Fourier Transform (FFT) was known to Gauss at least as early as 1805
(predating Fourier, after whom the transform has been named). More recently, it was
rediscovered by many others, notably Cooley and Tukey (1965). The point is that the
Discrete Fourier Transform (DFT) over a large finite group, viewed as a square matrix,
may appear quite large, requiring extensive time (presumably by a computer) in its com-
putation. However due to the highly structured nature of this matrix, this computation
can be performed in fewer steps than one might at first suppose. It is this faster approach
to computing the DFT that accounts for the name FFT. The importance of this speedup is
due to the vast number of problems requiring DFT for their solution, and where computa-
tional time required would otherwise be expensive or prohibitive. We begin by describing
how the FFT works. We then give an application to fast multiplication for polynomials
and for integers.
Let G = Z/nZ (note: additive notation here). The dual group is Ĝ = {χ_j : j ∈ G}
where χ_j(k) = ζ^{jk} with ζ = ζ_n. The Fourier transform of an arbitrary function f : G → C is F = ℱf
where we abbreviate F(χ_j) by F(j) to get

\[ F(j) = \sum_{k\in G} f(k)\,\chi_j(k) = \sum_{k=0}^{n-1} \zeta_n^{jk} f(k). \]

The matrix of the Fourier transform is the matrix

\[ H_n := \bigl[\, \zeta_n^{jk} : 0 \le j, k < n \,\bigr] \]


which goes by many names: Fourier matrix, character table of the cyclic group of order n,
generalized Sylvester matrix, etc. Of course it is also a particular Vandermonde matrix.
And we will meet this matrix in Section 16 as the most classical construction of a complex
Hadamard matrix; hence our notation H_n for this matrix. A naive computer implementation
of the Fourier transform entails multiplying H_n by a column vector in C^n. This
requires n^2 product operations in C, plus n(n−1) addition operations in C. Scalar addition
is much faster than scalar multiplication, so for simplicity we neglect the additions in saying

(6.9) the naive implementation of the Fourier transform ℱ : L^2(G) → L^2(Ĝ)
requires n^2 scalar multiplications (that is, 4m^2 when n = 2m).

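Assume for the moment that G has even order, say n = 2m. To improve upon (6.9),
represent f and F = ℱf by column vectors as

\[
f \leftrightarrow \begin{bmatrix} f_{\mathrm{even}} \\ f_{\mathrm{odd}} \end{bmatrix}
\quad\text{where}\quad
f_{\mathrm{even}} = \begin{bmatrix} f(0) \\ f(2) \\ f(4) \\ \vdots \\ f(n-2) \end{bmatrix}, \qquad
f_{\mathrm{odd}} = \begin{bmatrix} f(1) \\ f(3) \\ f(5) \\ \vdots \\ f(n-1) \end{bmatrix};
\]

and dually,

\[
F \leftrightarrow \begin{bmatrix} F_{\mathrm{top}} \\ F_{\mathrm{bottom}} \end{bmatrix}
\quad\text{where}\quad
F_{\mathrm{top}} = \begin{bmatrix} F(0) \\ F(1) \\ F(2) \\ \vdots \\ F(m-1) \end{bmatrix}, \qquad
F_{\mathrm{bottom}} = \begin{bmatrix} F(m) \\ F(m+1) \\ F(m+2) \\ \vdots \\ F(2m-1) \end{bmatrix}.
\]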

Note that we have indexed entries of F in the usual way; but the coordinates of f have
been indexed differently, starting with the even coordinates (expressing the restriction of
f to the subgroup of G of index 2), followed by the odd coordinates (where f is restricted
to the other coset of that subgroup). The reason we say ‘dually’, and why this is the right
thing to do, is that the entries in the same position of F_top and F_bottom, namely F(j) and
F(j+m), are the values of F on a coset {χ_j, χ_{j+m}} of the subgroup ⟨χ_m⟩ of order 2 in Ĝ.
With respect to this indexing of the rows and columns, the Fourier
transform takes the form

\[
\text{(6.10)}\qquad
\begin{bmatrix} F_{\mathrm{top}} \\ F_{\mathrm{bottom}} \end{bmatrix}
= \begin{bmatrix} H_m & D_m H_m \\ H_m & -D_m H_m \end{bmatrix}
\begin{bmatrix} f_{\mathrm{even}} \\ f_{\mathrm{odd}} \end{bmatrix},
\quad\text{i.e.}\quad
\begin{aligned}
F_{\mathrm{top}} &= H_m f_{\mathrm{even}} + D_m H_m f_{\mathrm{odd}}; \\
F_{\mathrm{bottom}} &= H_m f_{\mathrm{even}} - D_m H_m f_{\mathrm{odd}}
\end{aligned}
\]

where H_m = [ζ_m^{jk} : j, k ∈ Z/mZ] and D_m = diag(1, ζ_n, ζ_n^2, . . . , ζ_n^{m−1}); here ζ_n = ζ_{2m}, so that
ζ_n^2 = ζ_m and ζ_n^m = −1. Computing
H_m f_even requires only m^2 scalar multiplications, as does H_m f_odd; and then left-multiplication
by D_m requires an additional m scalar multiplications. Once again, the faster operations of
scalar addition have been neglected here. Thus

(6.11) implementation of the Fourier transform ℱ : L^2(G) → L^2(Ĝ) using (6.10)
requires only 2m^2 + m scalar multiplications.

Note that the improvement from (6.9) to (6.11) is a reduction in execution time, almost by
a factor of two. Similar gains are found using an arbitrary small prime divisor p | n in place
of the prime 2. Now note that the main step in implementing (6.10) is the application of
H_m, which can be similarly reduced to H_{m′} where m = 2m′, or m = pm′ using another
small prime p dividing m, to obtain further speedup. Assuming the original n is a product
of small primes, iterating this reduction significantly improves execution time. Notably,

(6.12) when n = 2^k, using k iterations of the reduction described above, the
Fourier transform over G = Z/nZ requires only O(n log n) scalar multiplications
as compared with O(n^2) scalar multiplications using the direct
approach (6.9).

This is the idea of the FFT. Its applications are far too ubiquitous to be summarized here.
We content ourselves with describing two of the many applications of FFT.
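The reduction (6.10), iterated as in (6.12), is only a few lines of code. The following recursive sketch is a minimal illustration for n a power of 2, using the sign convention F(j) = Σ ζ_n^{jk} f(k) of these notes; it is not an optimized implementation, and the function names are ours. It is compared against the naive matrix multiplication (6.9).

```python
import cmath

def dft_naive(f):
    """Multiply by H_n directly: n^2 scalar multiplications, as in (6.9)."""
    n = len(f)
    zeta = cmath.exp(2j * cmath.pi / n)
    return [sum(zeta ** (j * k) * f[k] for k in range(n)) for j in range(n)]

def fft(f):
    """F(j) = sum_k zeta_n^(jk) f(k) for n a power of 2, by iterating (6.10)."""
    n = len(f)
    if n == 1:
        return list(f)
    m = n // 2
    even = fft(f[0::2])                     # H_m f_even
    odd = fft(f[1::2])                      # H_m f_odd
    zeta = cmath.exp(2j * cmath.pi / n)     # zeta_n, with zeta_n^2 = zeta_m
    twisted = [zeta ** j * odd[j] for j in range(m)]   # D_m H_m f_odd
    top = [even[j] + twisted[j] for j in range(m)]     # F(0), ..., F(m-1)
    bottom = [even[j] - twisted[j] for j in range(m)]  # F(m), ..., F(2m-1)
    return top + bottom

f = [complex(k * k % 7, k % 3) for k in range(16)]
error = max(abs(a - b) for a, b in zip(fft(f), dft_naive(f)))
print(error)
```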

Fast Polynomial Multiplication


Consider now the problem of multiplying two polynomials f(x), g(x) ∈ C[x]. Choose
n > 2 max{deg f(x), deg g(x)}, so that f(x)g(x) ∈ C[x] has degree less than n. For simplicity
we will take n to be the smallest power of 2 exceeding 2 max{deg f(x), deg g(x)}. (We
could do better by finding another round number n > 2 max{deg f(x), deg g(x)} divisible
mostly by small primes, but n = 2^k is almost optimal; and to do better could never improve
our execution time by a factor > 2. Such an improvement would be small compared
to the improvement available using FFT with n = 2^k.)
The naive implementation of polynomial multiplication
f(x)g(x) requires O(n^2) operations of scalar multiplication in C: we multiply each coefficient
in f(x) by each coefficient in g(x). Again, we don’t worry about scalar additions.
To do better, we work with a multiplicative cyclic group G = {1, x, x^2, . . . , x^{n−1}} of
order n. Noting that C[G] ≅ C[x]/(x^n − 1), we work with the images of f(x) and g(x) in
this quotient ring. Because n was chosen large enough, the product f(x)g(x) as computed
in C[G] (reduced mod (x^n − 1)) is the same as the answer in C[x]. The algorithm to
compute this product, improving upon the naive approach, is as follows.
(I) First compute $\widehat{f(x)}$ ∈ L^2(Ĝ). This is the Fourier transform of the sequence of
coefficients in f(x), requiring O(n log n) operations using FFT. Similarly compute
$\widehat{g(x)}$ ∈ L^2(Ĝ), which also requires O(n log n) operations.
(II) Multiply to obtain $\widehat{f(x)}\,\widehat{g(x)}$ in L^2(Ĝ) ≅ C^n. This requires O(n) scalar multiplications
in C. Note that $\widehat{f(x)}\,\widehat{g(x)} = \widehat{f(x)g(x)}$.
(III) Compute f(x)g(x) ∈ C[G] by applying $\widehat{\iota}^{\,-1}$ to $\widehat{f(x)g(x)}$. This takes another O(n log n)
steps using FFT.
The total execution time to compute f(x)g(x) ∈ C[x] this way is O(3n log n + n) =
O(n log n) operations, as compared with O(n^2) operations by the naive approach.
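Steps (I) to (III) can be sketched as follows. This is an illustration under stated assumptions: coefficients are integers (so that rounding recovers them exactly from floating-point output), the parameter `sign` selects ζ_n or its conjugate, and the inverse transform is obtained by conjugating and dividing by n. All names are our own.

```python
import cmath

def fft(f, sign=1):
    """Sum over k of zeta^(sign*j*k) f(k), for len(f) a power of 2."""
    n = len(f)
    if n == 1:
        return list(f)
    m = n // 2
    even, odd = fft(f[0::2], sign), fft(f[1::2], sign)
    w = cmath.exp(sign * 2j * cmath.pi / n)
    twisted = [w ** j * odd[j] for j in range(m)]
    return ([even[j] + twisted[j] for j in range(m)]
            + [even[j] - twisted[j] for j in range(m)])

def multiply_polynomials(a, b):
    """Product of two integer polynomials, given as coefficient lists
    with the constant term first."""
    bound = 2 * (max(len(a), len(b)) - 1)      # 2 max{deg f, deg g}
    n = 1
    while n <= bound:                          # smallest power of 2 exceeding it
        n *= 2
    fa = fft(a + [0] * (n - len(a)))           # step (I)
    fb = fft(b + [0] * (n - len(b)))
    fc = [u * v for u, v in zip(fa, fb)]       # step (II): pointwise product
    c = fft(fc, sign=-1)                       # step (III): conjugate transform,
    size = len(a) + len(b) - 1                 # then divide by n
    return [round((v / n).real) for v in c[:size]]

print(multiply_polynomials([1, 2, 3], [4, 5]))   # (1 + 2x + 3x^2)(4 + 5x)
```

For non-integer coefficients one would drop the rounding and keep the complex values.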
Some clarification: If $f(x) = \sum_{k=0}^{n-1} a_k x^k$ ∈ C[x] then the sequence of coefficients is
viewed as a function a : G → C, x^k ↦ a_k. Now a ∈ L^2(G) and f(x) = ι(a) ∈ C[G]. Our
careful use of notation may appear overly pedantic to the casual observer; but it serves
to distinguish the function a (whose values are the coefficients in the polynomial f(x))
from the polynomial function f itself. The Fourier transform gives ℱa = $\widehat{\iota}(f(x)) = \widehat{f(x)}$ ∈
L^2(Ĝ). This completes the triangle


\[
\begin{array}{ccc}
a & \overset{\mathcal{F}}{\longmapsto} & \widehat{f(x)} \\
 & {\scriptstyle\iota}\searrow \qquad \nearrow{\scriptstyle\widehat{\iota}} & \\
 & f(x) &
\end{array}
\qquad\text{in}\qquad
\begin{array}{ccc}
L^2(G) & \xrightarrow{\ \mathcal{F}\ } & L^2(\widehat{G}) \\
 & {\scriptstyle\iota}\searrow \qquad \nearrow{\scriptstyle\widehat{\iota}} & \\
 & \mathbb{C}[G] &
\end{array}
\]


We have described the final step (III) above as a Fourier transform. It is actually the
inverse of the Fourier transform ℱ : L^2(G) → L^2(Ĝ), given by ℱ^{−1} : L^2(Ĝ) → L^2(G); but
since the dual group of Ĝ is canonically isomorphic to G, this is (up to the scalar factor 1/n)
just the usual Fourier transform for the dual group Ĝ.
The matrix expressing ℱ^{−1} as a linear transformation is $H_n^{-1} = \frac{1}{n} H_n^* = \frac{1}{n}\overline{H_n}$ since H_n is
symmetric and (1/√n)H_n is unitary. Of course this is also a DFT; so it is efficiently computed using
FFT (but for the dual group, so its coefficients are the complex conjugates of those in the
FFT of step (I)).

Fast Integer Multiplication


We consider the computational complexity of multiplying two large integers exactly.
For sufficiently small integers, most programming languages handle this perfectly well using
the standard integer class of variables; for example 0, 1, 2, . . . , 4294967295 can be handled
quite well using 4-byte unsigned integers, for which multiplication is implemented directly
by processor hardware. Larger integers requiring exact multiplication (e.g. in cryptographic
applications) must be stored as arrays, and algorithms for fast multiplication are required.
Let M and N be two integers expressed in base b as M = a_0 + a_1b + a_2b^2 + · · · +
a_{n−1}b^{n−1} = f(b) where $f(x) = \sum_{i=0}^{n-1} a_i x^i$ ∈ Z[x] and a_i ∈ {0, 1, 2, . . . , b−1}; and similarly
N = g(b). We take n sufficiently large that each of the numbers M, N and MN is
expressible using at most n digits in base b notation. One first thinks b = 2 for binary or
b = 10 for decimal; but it is better to use b = 2^16 or some such value in order to make
best use of the hardware capabilities of the processor (see the comments below). Naive
implementation requires O(n^2) basic operations (not counting additions) to evaluate MN,
since we multiply each digit in M by each digit in N.
In order to do better, we first evaluate the product of the two polynomials f (x)g(x) in
O(n log n) steps using Fast Polynomial Multiplication as described above. After obtaining
f (x)g(x) ∈ Z[x], evaluate at x = b and perform the necessary ‘carries’ (required whenever
coefficients exceed the base b). The execution time required for this reduction is small
compared to the Fast Polynomial Multiplication. Overall, we are able to perform Fast
Integer Multiplication of n-digit integers in O(n log n) time steps.
Of course we have overlooked many details of the implementation. In particular one
notes that the base b should actually be chosen somewhat smaller than the size of the native
integer type of the processor, since coefficients of f (x)g(x) ∈ Z[x] will require slightly more
than twice as many digits as coefficients in the original polynomials.
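
The procedure just outlined can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a tuned implementation: the function names (fft, digits, multiply) are hypothetical, the transform is a plain recursive radix-2 FFT in floating point, and the base b = 2^8 is deliberately chosen small so that rounding the floating point results back to integers is safe for moderately sized inputs.

```python
import cmath

def fft(a, invert=False):
    """Recursive radix-2 DFT of a list whose length is a power of 2."""
    n = len(a)
    if n == 1:
        return list(a)
    even = fft(a[0::2], invert)
    odd = fft(a[1::2], invert)
    sign = 1 if invert else -1      # the inverse transform uses conjugate roots
    out = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(sign * 2j * cmath.pi * k / n)
        out[k] = even[k] + w * odd[k]
        out[k + n // 2] = even[k] - w * odd[k]
    return out

def digits(m, b):
    """Base-b digits of a non-negative integer m, least significant first."""
    ds = []
    while m:
        m, d = divmod(m, b)
        ds.append(d)
    return ds or [0]

def multiply(M, N, b=2**8):
    """Multiply non-negative integers via polynomial multiplication in Z[x]."""
    f, g = digits(M, b), digits(N, b)
    size = 1
    while size < len(f) + len(g):   # room for all coefficients of f(x)g(x)
        size *= 2
    fa = fft(f + [0] * (size - len(f)))
    ga = fft(g + [0] * (size - len(g)))
    prod = fft([x * y for x, y in zip(fa, ga)], invert=True)
    coeffs = [round(c.real / size) for c in prod]   # coefficients of f(x)g(x)
    # Evaluate at x = b, performing the carries digit by digit.
    result, carry = [], 0
    for c in coeffs:
        carry, d = divmod(c + carry, b)
        result.append(d)
    while carry:
        carry, d = divmod(carry, b)
        result.append(d)
    return sum(d * b**i for i, d in enumerate(result))
```

For very large inputs the floating point error eventually exceeds 1/2 and the rounding step fails; this is one reason the base must be kept well below the native word size, as remarked above.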

Exercises 6.
1. (a) Prove that for any finite group G, convolution of functions G → C is associative; that is,
(f1 ∗ f2 ) ∗ f3 = f1 ∗ (f2 ∗ f3 ) for all functions f1 , f2 , f3 : G → C. Can you obtain this result using
known properties of the group algebra C[G]? Explain.
(b) Is the set of functions G → C under convolution a group? Explain.
2. (a) Give an example of two finite abelian groups G1 and G2 of the same order, yet with G1 ≇ G2 .
(b) If G1 and G2 are as in (a), are the corresponding group algebras C[G1 ] and C[G2 ] isomorphic?
Explain.

3. Let F = F3 , and consider the linear [5, 2]-code C spanned by the vectors 10012 and 01211.
(a) Find a basis for the dual code C ⊥.
(b) Explicitly list all vectors in C and in C ⊥.
(c) From (b), write down the explicit weight enumerators for C and for C ⊥.
(d) By direct computation, verify that this example satisfies the MacWilliams relation (Theo-
rem 6.8).

4. Find an ordinary graph Γ (undirected, with no loops or multiple edges), as small as possible,
whose eigenvalues are not cyclotomic integers. Justify your answer.

Much of Section 6 generalizes to infinite groups; but this works best when G is a compact
topological group. For a group G to be a topological group, one requires that G also have
a topology compatible with the algebraic structure in the sense that the group multiplication
(g, h) 7→ gh and the inverse map g 7→ g −1 are continuous. Compactness and commutativity
together mean that we have a translation-invariant measure on G (Haar measure) with respect
to which we can integrate. Here we give only the group G = S 1 as an example; and we stop short
of providing a full account of the appropriate generalization of Theorem 6.4 to this situation.

5. Let G be the multiplicative group consisting of all z ∈ C such that |z| = 1. This is not a finite
group, but it is abelian. As a topological space, G is homeomorphic to a circle (and in particular,
G is compact). For f1 , f2 : G → C, define
$$[f_1, f_2] = \frac{1}{2\pi}\int_G f_1(z)\,\overline{f_2(z)}\,\frac{dz}{iz} = \frac{1}{2\pi}\int_0^{2\pi} f_1(e^{ti})\,\overline{f_2(e^{ti})}\,dt.$$
The complex vector space $L^2(G)$ consists of all integrable functions G → C having finite norm
$\|f\| = \sqrt{[f, f]}$, but with two functions identified whenever they differ only on a set of measure zero.
The convolution of two such functions is defined by
$$(f_1 * f_2)(w) = \frac{1}{2\pi}\int_{z\in G} f_1(wz^{-1})\,f_2(z)\,\frac{dz}{iz} = \frac{1}{2\pi}\int_0^{2\pi} f_1(we^{-ti})\,f_2(e^{ti})\,dt.$$
We have $\|f_1 * f_2\| < \infty$ whenever $\|f_i\| < \infty$; so the space $L^2(G)$ is closed under convolution (you
may assume this).
(a) Show that convolution is associative, i.e. (f1 ∗f2 )∗f3 = f1 ∗(f2 ∗f3 ) for all f1 , f2 , f3 ∈ L2 (G).
(b) Find an infinite cyclic group {χn : n ∈ Z} of homomorphisms χn : G → C× . (As homomor-
phisms of topological groups, the maps χn : G → C× are required to be continuous as well
as multiplicative.)

7. Group Rings R[G]

Section 6 includes a description of the group algebra C[G] of a finite abelian group G over
the complex numbers. Replacing the coefficient ring C by another choice of commutative
ring R with identity, one similarly obtains the group ring R[G] of G over R (or in the
case R is actually a field, the group algebra over R). As before, the group G is assumed
to be multiplicative. Despite the conceptual convenience of complex number coefficients, C
suffers from some difficulties not evident with other rings. In particular, computer imple-
mentation of arithmetic in C[G] is fraught with difficulty due to numerical errors inherent
in floating point approximation; whereas arithmetic in Z[G], or even in Q[G], can be im-
plemented exactly if arbitrary precision arithmetic with coefficients is available—a realistic
expectation in many programming languages. For many applications, this is a serious con-
sideration. Another advantage of varying the coefficient ring R will appear below (see the
comments following Corollary 7.3). For our purposes, taking R to be an integral domain
(i.e. a commutative ring with identity having no zero divisors) is a quite adequate level of
generality.
Every character $\chi \in \widehat{G}$ naturally extends to a homomorphism of algebras over Q given
by
$$\chi : \mathbb{Q}[G] \to \mathbb{Q}[\zeta_m], \qquad \sum_{g\in G} a_g g \mapsto \sum_{g\in G} a_g \chi(g)$$

where m is the exponent of G (the least common multiple of the orders of the elements
of G). It is easy to verify the required properties

χ(aα + bβ) = aχ(α) + bχ(β), χ(αβ) = χ(α)χ(β) for all α, β ∈ Q[G]



from the definitions. Now we must be wary when using the same letter χ to denote both
a group homomorphism G → C× and an algebra homomorphism Q[G] → C; in particular
these two maps have rather different kernels as given by

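$$\ker(\chi : G \to \mathbb{C}^\times) = \{g \in G : \chi(g) = 1\}, \qquad \ker(\chi : \mathbb{Q}[G] \to \mathbb{C}) = \{\alpha \in \mathbb{Q}[G] : \chi(\alpha) = 0\}.$$

But these two kernels are related; in particular for g ∈ G, we have $g \in \ker(\chi : G \to \mathbb{C}^\times)$
iff $g - 1 \in \ker(\chi : \mathbb{Q}[G] \to \mathbb{C})$. Rather than introduce a new letter for the algebra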
homomorphism, we shall try to clarify using context whenever the algebra homomorphism
is intended; and whenever we write simply ker χ, we mean the kernel of χ : G → C× .

The Rational Group Algebra of a Finite Cyclic Group


Suppose now that G is cyclic of order n. As indicated above, each $\chi \in \widehat{G}$ extends to
an algebra homomorphism $\chi : \mathbb{Q}[G] \to \mathbb{Q}[\zeta_n]$; and in view of the isomorphism

(7.1) $$\mathbb{Q}[G] \cong \mathbb{Q}[x]/(x^n-1) \cong \bigoplus_{d|n} \mathbb{Q}[x]/(\Phi_d(x)) \cong \bigoplus_{d|n} \mathbb{Q}[\zeta_d],$$

χ then lifts to an algebra homomorphism $\mathbb{Q}[x] \to \mathbb{Q}[\zeta_n]$ whose kernel contains the ideal
$(x^n - 1)$. Since the image of this map is evidently a subfield of $\mathbb{Q}[\zeta_n]$, this image is $\mathbb{Q}[\zeta_d]$ for
some d | n. Now the kernel of the homomorphism $\mathbb{Q}[x] \to \mathbb{Q}[\zeta_d]$ induced by χ is the principal
ideal $(\Phi_d(x)) \subset \mathbb{Q}[x]$. This means that the values of χ generate the cyclotomic extension
$\mathbb{Q}[\zeta_d]$. Under the isomorphism $\mathbb{Q}[G] \cong \mathbb{Q}[x]/(x^n-1)$ above, the monomial x corresponds to
a generator g of G; and then an arbitrary element $g^k \in G$ lies in $\ker\chi = \ker(\chi : G \to \mathbb{C}^\times)$
iff $g^k - 1$ lies in the kernel of $\chi : \mathbb{Q}[x] \to \mathbb{Q}[\zeta_n]$, iff $x^k - 1$ is divisible by $\Phi_d(x)$, iff d | k.
This shows that [G : ker χ] = d. We obtain

Theorem 7.2. Suppose G is cyclic of order n, and let $\chi \in \widehat{G}$ be a character of
order d, so that [G : ker χ] = d. Then χ extends to an algebra homomorphism
$\mathbb{Q}[G] \to \mathbb{Q}[\zeta_n]$ having image $\mathbb{Q}[\zeta_d]$ and kernel given by the principal ideal
$$\ker(\chi : \mathbb{Q}[G] \to \mathbb{Q}[\zeta_n]) = (\Phi_d(g)) \subseteq \mathbb{Q}[G]$$
where Φd (g) ∈ Q[G] is the evaluation of the cyclotomic polynomial Φd (x) at a gener-
ator g of G.

Note that the parameter d | n uniquely characterizes the kernel of the algebra homomorphism
$\chi : \mathbb{Q}[G] \to \mathbb{C}$, as well as the kernel (and the order) of $\chi : G \to \mathbb{C}^\times$; however it does
not uniquely characterize the image $\mathbb{Q}[\zeta_d]$ since for d odd, $\mathbb{Q}[\zeta_d] = \mathbb{Q}[\zeta_{2d}]$.

Corollary 7.3. Let G be a finite abelian group, and let α, β ∈ Q[G]. Then
(a) α = β iff χ(α) = χ(β) for all $\chi \in \widehat{G}$.
(b) Suppose G is cyclic of order n; and for each k | n, consider the character $\chi_k \in \widehat{G}$
of order $d = n/k$. Then α = β iff $\chi_k(\alpha) = \chi_k(\beta)$ for all k | n.

The set $X = \{\chi_k : k \mid n\}$ has cardinality $|X| = \sigma(n)$, where σ(n) is the number of positive
integer divisors of n; see Exercise #1.5. Note that σ(n) is generally quite small compared
with n. By comparison, given two elements α, β ∈ C[G] where G is cyclic of order n, we
have α = β iff χ(α) = χ(β) for all n characters $\chi \in \widehat{G}$. No fewer than all n characters
will suffice for this purpose. Recall the algebra isomorphism $\mathbb{C}[G] \cong \mathbb{C}^n$; and note that
each algebra homomorphism χ : C[G] → C, being C-linear, has an (n−1)-dimensional
subspace as its kernel. The intersection of all these kernels is {0}; but for any proper
subset $X \subset \widehat{G}$ of size |X| = k < n, the subspace
$\bigcap_{\chi\in X} \ker(\chi : \mathbb{C}[G] \to \mathbb{C}) \subseteq \mathbb{C}[G]$ has
dimension ≥ n − k ≥ 1; it therefore contains α ≠ 0 satisfying χ(α) = χ(0) = 0 for all
χ ∈ X. Another way to say this is that over C, the analogue of (7.1) is the algebra
isomorphism $\mathbb{C}[G] \xrightarrow{\cong} \mathbb{C}^n$, $\alpha \mapsto (\chi(\alpha) : \chi \in \widehat{G})$.
Each equation χ(α) = χ(β) says that the
two vectors in $\mathbb{C}^n$ (corresponding to α, β ∈ C[G]) agree in one coordinate; but to
guarantee equality of the two vectors, one must compare all n coordinates.
In practical implementations of Corollary 7.3 for the purpose of checking for equality
of two elements of Q[G], it should be remembered (as previously observed) that each of
the equalities $\chi_k(\alpha) = \chi_k(\beta)$ can be checked exactly using arbitrary precision arithmetic,
in the ring $\mathbb{Q}[G] \cong \mathbb{Q}[x]/(x^n-1)$. Nevertheless, since floating point arithmetic is typically much
easier to implement and requires less execution time, it may be that when testing a large
number of pairs (α, β) in Q[G] as candidates for equality, most of the cases can be ruled
out quickly using floating point arithmetic. Only in those cases where the numerical values
agree to within a well-chosen tolerance is closer inspection using exact arithmetic needed
as a final check for equality.
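
The two-stage test just described might look as follows for a cyclic group of order n. This is a sketch under assumptions that are not fixed by the text: elements of Q[G] are stored as coefficient lists of length n, the character χ_k is taken to be g^j ↦ ζ_n^{jk} (so that χ_k has order n/k when k | n), the helper names are hypothetical, and the tolerance 1e-9 is an arbitrary choice. The question tested is whether a product γ1γ2 equals a given β, so that the cheap floating point screen can avoid the exact quadratic-time product in most negative cases.

```python
import cmath
from fractions import Fraction

def divisors(n):
    return [k for k in range(1, n + 1) if n % k == 0]

def chi(k, coeffs):
    """Floating point value of chi_k on an element given by its coefficients,
    assuming chi_k(g^j) = exp(2*pi*i*j*k/n)."""
    n = len(coeffs)
    return sum(float(a) * cmath.exp(2j * cmath.pi * j * k / n)
               for j, a in enumerate(coeffs))

def cyclic_product(a, b):
    """Exact product in Q[x]/(x^n - 1)."""
    n = len(a)
    c = [Fraction(0)] * n
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            c[(i + j) % n] += ai * bj
    return c

def product_equals(gamma1, gamma2, beta, tol=1e-9):
    """Test whether gamma1 * gamma2 == beta in Q[G]."""
    n = len(beta)
    # Stage 1: floating point screen using only the characters chi_k, k | n.
    for k in divisors(n):
        if abs(chi(k, gamma1) * chi(k, gamma2) - chi(k, beta)) > tol:
            return False
    # Stage 2: exact confirmation with rational arithmetic.
    return cyclic_product(gamma1, gamma2) == list(beta)
```

For example, with n = 6 the product (1 + x)(1 + x^5) = 2 + x + x^5 passes both stages, while a candidate β with a different coefficient sum is already rejected by the trivial character (k = n).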

Proof of Corollary 7.3. By Theorem 7.2, the isomorphism
$\mathbb{Q}[G] \xrightarrow{\cong} \bigoplus_{d|n} \mathbb{Q}[\zeta_d]$ of (7.1)
is explicitly given by $\alpha \mapsto (\chi_d(\alpha) : d \mid n)$. The fact that this map is injective is the desired
conclusion.

Notational Accommodations for Additive Groups


We have described the construction of the group ring R[G] of a multiplicative group
G over a suitable ring R. A notational difficulty arises when this construction is applied
directly to an additive group G, since addition becomes ambiguous: we have two different
types of addition, ring addition in R[G] (where coefficients of like terms are added in R)
and addition in G. Fortunately this difficulty is easily resolved, as we now describe.
Let G be an additive group. (Presumably G is abelian, as by popular convention,
we always assume addition to be commutative; however this assumption is not strictly
necessary.) We assume G has order |G| = v and identity element 0 ∈ G. Introduce v new
symbols $x^g$, one for each group element g ∈ G, which we multiply according to the rule

$$x^g x^h = x^{g+h} \quad \text{for } g, h \in G.$$

This makes $X := \{x^g : g \in G\}$ a multiplicative group, isomorphic to G via the obvious
correspondence $g \leftrightarrow x^g$. We also abbreviate $1 := x^0$ for the multiplicative identity element
of X. Now the group algebra of X (or of G) over R has addition and multiplication defined
by
$$\Big(\sum_{g\in G} a_g x^g\Big) + \Big(\sum_{g\in G} b_g x^g\Big) = \sum_{g\in G} (a_g+b_g)\,x^g; \qquad
\Big(\sum_{g\in G} a_g x^g\Big)\Big(\sum_{g\in G} b_g x^g\Big) = \sum_{g\in G}\Big(\sum_{h\in G} a_{g-h}\, b_h\Big) x^g.$$

The group ring R[X] of X over R works just like before; and we will often refer to this
group ring as simply R[G], implicitly invoking the isomorphism X ≅ G, as this is merely
a notational device.
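
As a small illustration of the addition and multiplication rules above, here is one possible way to model the group ring of an additive group in Python. The representation is an assumption made only for this sketch: G is taken to be a product of cyclic groups Z/m1 × Z/m2 × ..., its elements are tuples of residues, and an element of R[G] is a dictionary mapping each group element g to the coefficient of x^g. The names group_add, ring_add and ring_mul are hypothetical.

```python
from fractions import Fraction
from itertools import product

def group_add(g, h, moduli):
    """Addition in G = Z/m1 x Z/m2 x ..., with elements stored as tuples."""
    return tuple((x + y) % m for x, y, m in zip(g, h, moduli))

def ring_add(a, b):
    """Ring addition: coefficients of like terms x^g are added."""
    out = dict(a)
    for g, c in b.items():
        out[g] = out.get(g, 0) + c
    return {g: c for g, c in out.items() if c != 0}

def ring_mul(a, b, moduli):
    """Ring multiplication, using the rule x^g x^h = x^(g+h)."""
    out = {}
    for (g, ag), (h, bh) in product(a.items(), b.items()):
        s = group_add(g, h, moduli)
        out[s] = out.get(s, 0) + ag * bh
    return {g: c for g, c in out.items() if c != 0}
```

Keeping the group operation (group_add) separate from the ring operations (ring_add, ring_mul) mirrors the notational device in the text: the two kinds of addition never act on the same objects.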

Example 7.4: The Group Algebra R[Z]. The infinite additive cyclic group G = Z is rewritten
multiplicatively as $X = \{x^k : k \in \mathbb{Z}\}$. The group ring (with real coefficients) takes the form
$\mathbb{R}[G] = \mathbb{R}[X] = \mathbb{R}[x, x^{-1}]$ which is the ring of Laurent polynomials with real coefficients. It
consists of all polynomials in x and $x^{-1}$ (note: only finitely many terms, but exponents may be
positive, negative or zero). This ring is of course an algebra over R of infinite dimension, with X as
basis.

Example 7.5: The Group Algebra Q[Z/nZ]. The finite additive cyclic group G = Z/nZ =
{0, 1, 2, . . . , n−1} of order n is isomorphic to the finite multiplicative cyclic group
$X = \{1, x, x^2, \ldots, x^{n-1}\}$ where the generator x has order n. The group ring (with rational coefficients) is
$\mathbb{Q}[G] = \{a_0 + a_1 x + \cdots + a_{n-1} x^{n-1} : a_i \in \mathbb{Q}\} \cong \mathbb{Q}[x]/(x^n - 1)$, an algebra of dimension n.

Direct Products
Let G1 and G2 be multiplicative groups of order n1 and n2 respectively. The direct product
G1 × G2 is the group of order n1 n2 whose elements are ordered pairs (g1 , g2 ), gi ∈ Gi .
Multiplication is componentwise, viz. $(g_1, g_2)(g_1', g_2') = (g_1 g_1', g_2 g_2')$. We will identify G1
and G2 with the corresponding subgroups G1 × {1} and {1} × G2 of G1 × G2 via the
embeddings g1 7→ (g1 , 1) and g2 7→ (1, g2 ). (There is no harm or ambiguity in these
identifications unless G1 and G2 contain nonidentity elements sharing the same symbols;
but then we simply replace G1 or G2 by an isomorphic copy on a new set of symbols to
avoid the ambiguity.) Now the embeddings G1 , G2 ⊆ G1 × G2 give rise to subalgebras
R[G1 ] and R[G2 ] embedded in the group algebra R[G1 ×G2 ]; and the set of products of the
form α1 α2 , αi ∈ R[Gi ], serve to generate the entire algebra R[G1 × G2 ] as an R-module.
Readers comfortable with tensor products will already recognize that this observation is
more fully expressed by the isomorphism R[G1 × G2 ] ∼ = R[G1 ] ⊗R R[G2 ]; and readers
unfamiliar with this terminology can safely shelve it for future reference.
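Recall from Theorem 6.1(a) that every character $\chi \in \widehat{G_1 \times G_2}$ has the form $\chi = \chi_1 \times \chi_2$
where $\chi_i \in \widehat{G_i}$, so that $\chi(g_1, g_2) = \chi_1(g_1)\chi_2(g_2)$ whenever $g_i \in G_i$. This extends by
Q-linearity to the rational group algebra of G1 × G2 , so that if
$$\alpha = \sum_{j=1}^{k} r_j\, \alpha_{1,j}\, \alpha_{2,j} \in \mathbb{Q}[G_1 \times G_2], \qquad r_j \in \mathbb{Q},\ \alpha_{i,j} \in \mathbb{Q}[G_i],$$
then
$$\chi(\alpha) = \sum_{j=1}^{k} r_j\, \chi_1(\alpha_{1,j})\, \chi_2(\alpha_{2,j}) \in \mathbb{C}.$$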

Example 7.6: Q[G1 × G2 ] where G1 and G2 are finite cyclic. Let $G_1 = \{1, x, x^2, \ldots, x^{m-1}\}$
be cyclic of order m, and $G_2 = \{1, y, y^2, \ldots, y^{n-1}\}$ be cyclic of order n. Then
$\mathbb{Q}[G_1 \times G_2] \cong \mathbb{Q}[x, y]/(x^m-1,\, y^n-1)$, an algebra of dimension mn over Q. Note that
$\mathbb{Q}[G_1] \cong \mathbb{Q}[x]/(x^m-1)$ and $\mathbb{Q}[G_2] \cong \mathbb{Q}[y]/(y^n-1)$; and we have simply tensored these two
algebras together over Q using $\mathbb{Q}[x, y] \cong \mathbb{Q}[x] \otimes_{\mathbb{Q}} \mathbb{Q}[y]$ and the remarks above. Here
$\{x^i y^j : 0 \le i < m,\ 0 \le j < n\}$ is a basis over Q. Similarly
if G1 = Z/mZ and G2 = Z/nZ, then replacing these additive cyclic groups by their multiplicative
proxies as in Example 7.5, we once again have $\mathbb{Q}[G_1 \times G_2] \cong \mathbb{Q}[x, y]/(x^m-1,\, y^n-1)$.

Having established the necessary notational preliminaries, we discuss the important


case of the additive group G of a finite field Fq . Here G is elementary abelian of order q,
and we are interested in the group algebra Q[G]; but to simplify statements, all reference
to X, the multiplicative copy of G, will be suppressed.

Theorem 7.7. Let G be the additive group of a finite field $F = \mathbb{F}_q$ of odd order.
Let S, N ⊂ G be the subsets of size $\frac{q-1}{2}$ corresponding to the nonzero squares and
the nonsquares in F. Let α, β, κ ∈ Q[G] denote the sums of S, N and G respectively,
in the group algebra. Then α generates a 3-dimensional subalgebra of Q[G] with basis
{1, α, β}, or {1, α, κ}. We have $\kappa = 1 + \alpha + \beta$; $\alpha\kappa = \beta\kappa = \frac{q-1}{2}\kappa$; $\kappa^2 = q\kappa$ and the
following relations are satisfied.
(a) If q ≡ 1 mod 4: $\alpha^2 = \frac{q-1}{2} + \frac{q-5}{4}\alpha + \frac{q-1}{4}\beta = \frac{q-1}{4}(1 + \kappa) - \alpha$; $\alpha^* = \alpha$; $\beta^* = \beta$;
every nontrivial $\chi \in \widehat{G}$ satisfies $\chi(\alpha) = \frac12(-1 \pm \sqrt{q})$ and $\chi(\beta) = \frac12(-1 \mp \sqrt{q})$.
(b) If q ≡ 3 mod 4: $\alpha^* = \beta$; $\alpha\alpha^* = \frac{q+1}{4} + \frac{q-3}{4}\kappa$; every nontrivial character $\chi \in \widehat{G}$
satisfies $\chi(\alpha) = \frac12(-1 \pm i\sqrt{q})$ and $\chi(\beta) = \frac12(-1 \mp i\sqrt{q})$.

Proof. The fact that {1, α, β} spans a subalgebra of Q[G], with structure constants as stated,
follows from Theorem 3.6. For example when q ≡ 1 mod 4 and $\varepsilon = (-1)^{(q-1)/2}$ in the notation
of Theorem 3.6, we may write $\alpha^2 = m_0 + m_+\alpha + m_-\beta$ where $m_0, m_+, m_-$ are nonnegative
integers expressing the number of ways to express 0 (respectively, each nonzero square, each nonsquare)
as an ordered sum of two nonzero squares in F. Here $m_0 = \frac{q-1}{2}$ is the number of solutions 0 = a + (−a)
with a ∈ S; also $m_+ = \frac{q-5}{4}$ and $m_- = \frac{q-1}{4}$ by parts (i) and (iv) of Theorem 3.6.
Now let $\chi \in \widehat{G}$ be nontrivial. By Theorem 6.3(a), we have χ(κ) = 0. If q ≡ 1 mod 4
then
$$\chi(\alpha)^2 = \chi\big(\tfrac{q-1}{4}(1 + \kappa) - \alpha\big) = \tfrac{q-1}{4} - \chi(\alpha)$$
so that
$$\big(\chi(\alpha) + \tfrac12\big)^2 = \tfrac{q}{4}$$
which yields $\chi(\alpha) = \frac12(-1 \pm \sqrt{q})$; also 0 = χ(κ) = χ(1 + α + β) gives $\chi(\beta) = \frac12(-1 \mp \sqrt{q})$.
If q ≡ 3 mod 4 then
$$|\chi(\alpha)|^2 = \chi(\alpha)\overline{\chi(\alpha)} = \chi(\alpha\alpha^*) = \chi\big(\tfrac{q+1}{4} + \tfrac{q-3}{4}\kappa\big) = \tfrac{q+1}{4}$$
and so $|\chi(\alpha)| = \frac12\sqrt{q+1}$. Also $0 = \chi(\kappa) = \chi(1 + \alpha + \beta) = 1 + \chi(\alpha) + \overline{\chi(\alpha)}$ so
$\chi(\alpha) = \frac12(-1 \pm i\sqrt{q})$ and $\chi(\beta) = \overline{\chi(\alpha)} = \frac12(-1 \mp i\sqrt{q})$.
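
The relations of Theorem 7.7 are easy to test numerically in small cases. The sketch below makes a simplifying assumption not present in the theorem: q is taken to be an odd prime, so that the field is simply Z/qZ and its additive characters are g ↦ e^{2πitg/q}. The function names are hypothetical.

```python
import cmath

def alpha_products(q):
    """For an odd prime q, return the set S of nonzero squares mod q together
    with the coefficient vectors of alpha^2 and of alpha * alpha^* in Z[G]."""
    S = {a * a % q for a in range(1, q)}
    square = [0] * q       # coefficients of alpha^2
    hermitian = [0] * q    # coefficients of alpha * alpha^*
    for a in S:
        for b in S:
            square[(a + b) % q] += 1
            hermitian[(a - b) % q] += 1
    return S, square, hermitian

def chi_alpha(q, S, t=1):
    """Value on alpha of the additive character g -> exp(2*pi*i*t*g/q)."""
    return sum(cmath.exp(2j * cmath.pi * t * s / q) for s in S)
```

For q = 13 the coefficients of α² should be (q−1)/2 = 6 at the identity, (q−5)/4 = 2 on squares and (q−1)/4 = 3 on nonsquares; for q = 7 the coefficients of αα* should be 3 at the identity and 1 elsewhere. In both cases (2χ(α) + 1)² should equal ±q, with the sign (−1)^{(q−1)/2}.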

Exercises 7.
1. Consider the group algebra R = F [G] of a finite (multiplicative) abelian group G over a field F ,
and let A ⊆ R be an ideal. Recall that R is an n-dimensional vector space with basis G, where
n = |G|; and A is a subspace. Assume that the characteristic of F does not divide n. (Thus
char F may equal zero; however char F cannot equal p for any of the finitely many primes p
dividing n.)
(a) Using linear algebra, explain why R has a subspace U complementary to A, i.e. R = A ⊕ U ;
this means that R = A + U and A ∩ U = 0. Here you may cite any known theorems from
linear algebra.
(b) Show by example that the subspace U ≤ R in (a) is not necessarily an ideal of R, or even a
subalgebra, in general.
(c) Prove that there exists an F-linear transformation P : R → R such that $P^2 = P$, having
image equal to A and null space equal to U. (Again cite any known facts from linear algebra,
using (a).)
(d) Define T : R → R by $T(v) = \frac{1}{n}\sum_{h\in G} h^{-1}P(hv)$. The hypothesis regarding char F
guarantees that n has an inverse in F (so we are not dividing by zero here). Prove that T is
F-linear.
(e) Prove that T(gv) = gT(v) for all g ∈ G and v ∈ R.
(f) Prove that $T^2 = T$.
(g) Prove that T(v) = v iff v ∈ A.
(h) Prove that the image of T is A, and the kernel of T is a subspace B ≤ R complementary to
A; so R = A ⊕ B.
(i) Prove that B ⊆ R is in fact an ideal. (While (a) gives a complementary subspace, this is
stronger: it gives a complementary ideal.)
Wherever well-known facts from linear algebra suffice, please indicate so. This is not the place
to re-prove basic facts from linear algebra, only to demonstrate a knowledge of which facts these
are. We remark that the assumption that G is abelian is not actually required here; however if
G is nonabelian, we must speak of left ideals throughout, instead of (two-sided) ideals.

2. Let R = F [G] where F = F2 and G = {1, g} is cyclic. Note that the hypotheses of #1 are not
satisfied. Here we show that the conclusion of #1 also fails. Find an ideal A ⊆ R for which there
does not exist a complementary ideal; so there is no ideal B ⊆ R satisfying R = A ⊕ B.

8. Difference Sets
Let G be a multiplicative group of order v. (While our groups will typically be abelian,
we do not yet require this.) A (v, k, r)-difference set in G is a subset D ⊂ G of size
|D| = k such that every nonidentity element g ∈ G is expressible in exactly r ways as
$g = d_1 d_2^{-1}$ with d1 , d2 ∈ D. A necessary condition for the existence of such a difference set
is the feasibility relation (v − 1)r = k(k − 1), which follows by counting in two different
ways the number of ordered pairs (d1 , d2 ) of distinct elements of D. In order to avoid
degenerate situations, we will always assume that v > k; that is, k > r (which is easily seen
to be equivalent, using the feasibility relation). We often call a difference set D cyclic,
abelian or nonabelian according as G is cyclic, abelian or nonabelian. We refer to the
triple of positive integers (v, k, r) as the parameters of the difference set; and the integer
n := k−r (soon to play a prominent role) is the order of D. Note that n > 0. Triples
of integers (v, k, r) satisfying the feasibility conditions are not necessarily the parameters
of any difference set, as we point out in the remarks preceding Theorem 8.2 below. The
smallest feasible parameter set for which existence of a difference set is currently unknown,
is apparently (160, 54, 18).
We will reformulate the notion of difference sets in the language of group rings, which
is ideally suited for this purpose. But first we need a little more terminology.
Given $\alpha = \sum_{g\in G} a_g g \in R[G]$ with $a_g \in R$, we denote $\alpha^* = \sum_{g\in G} a_g g^{-1} \in R[G]$. It is
easy to see that the map $\alpha \mapsto \alpha^*$ is an antiautomorphism of R[G], meaning that it is a
bijective map satisfying

$$(\alpha + \beta)^* = \alpha^* + \beta^* \quad\text{and}\quad (\alpha\beta)^* = \beta^*\alpha^*$$

for all α, β ∈ R[G]. (If G is abelian, then clearly this map is an automorphism of the algebra.)
In the following, we also denote $\kappa := \sum_{g\in G} g \in \mathbb{Z}[G]$.

Lemma 8.1. Let G be a multiplicative group of order v, and let D ⊂ G be a subset
of size |D| = k. Then D is a (v, k, r)-difference set iff
$$\delta\delta^* = n + r\kappa$$
where $\delta := \sum_{d\in D} d \in \mathbb{Z}[G]$ and n := k−r.

Proof. The relation $\delta\delta^* = n + r\kappa$ says that every nonidentity element g ∈ G is expressible
in exactly r ways in the form $g = d_1 d_2^{-1}$ where d1 , d2 ∈ D with d1 ≠ d2 ; and this is simply the
requirement that D be a (v, k, r)-difference set. The identity element 1 ∈ G is expressible
as $1 = dd^{-1}$ for each of the k elements d ∈ D; and this agrees with the constant term
n + r = k in the expression n + rκ.
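
Lemma 8.1 translates directly into a test that is easy to run by machine. The following sketch assumes, purely for illustration, that G is the cyclic group Z/vZ written additively, so that d1 d2^{-1} becomes d1 − d2 mod v; the function names are hypothetical.

```python
def delta_delta_star(D, v):
    """Coefficient vector of delta * delta^* in Z[Z/vZ]: entry g counts the
    ordered pairs (d1, d2) in D x D with d1 - d2 = g (mod v)."""
    coeffs = [0] * v
    for d1 in D:
        for d2 in D:
            coeffs[(d1 - d2) % v] += 1
    return coeffs

def difference_set_r(D, v):
    """Return r if D is a (v, k, r)-difference set in Z/vZ, else None."""
    D = sorted(set(d % v for d in D))
    coeffs = delta_delta_star(D, v)
    nonidentity = set(coeffs[1:])
    # delta * delta^* = n + r*kappa means: k at the identity, r everywhere else.
    return coeffs[1] if len(nonidentity) == 1 else None
```

For instance, D = {1, 2, 4} in Z/7Z gives the coefficient vector (3, 1, 1, 1, 1, 1, 1), that is n + rκ with n = 2 and r = 1.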

A symmetric (v, k, r)-design is an incidence system (P, B) consisting of a set P of
cardinality |P| = v (whose elements are called points) and a collection B consisting of v
subsets of P (called blocks) such that
(i) Each block contains exactly k points; and dually, each point lies in exactly k blocks.
(ii) Any two distinct points lie in exactly r common blocks. Dually, any two distinct
blocks intersect in exactly r points.
Again, a necessary condition for the existence of a symmetric (v, k, r)-design is the feasi-
bility relation (v − 1)r = k(k − 1), deduced by counting in two different ways the number
of pairs of distinct points in a fixed block. As before, we call (v, k, r) the parameters of
the design, and n := k − r its order; and to avoid trivial cases, we always assume v > k,
i.e. k > r, n > 0. From the feasibility relation, we see that any symmetric design with r = 1
has parameters $(n^2+n+1,\ n+1,\ 1)$; this is called a projective plane of order n. Note
that the feasibility relation is a necessary, but not sufficient, condition for the existence
of a symmetric design with a given set of parameters; for example, there is no symmet-
ric design with the feasible parameter set (43, 7, 1) (since there is no projective plane of
order 6). Given this assertion and Theorem 8.2 below, it follows that there is also no
(43, 7, 1)-difference set. The smallest currently open parameter set for which the existence
of a symmetric design has not yet been resolved, is (81, 16, 3) (although so much is known
about the automorphism group of a putative symmetric (81, 16, 3)-design, that it cannot
arise from any difference set).

Theorem 8.2. A (v, k, r)-difference set D ⊂ G gives rise to a symmetric (v, k, r)-
design whose point set is G and whose blocks are the right translates Dh = {dh : d ∈
D}, h ∈ G. This symmetric design admits G as a group of automorphisms permuting
the points regularly (by right-multiplication).

Example 8.3: The Symmetric (7,3,1)- and (7,4,2)-Designs. Consider the cyclic
group of order seven, $G = \{1, g, g^2, \ldots, g^6\}$. The subset $D = \{g, g^2, g^4\}$ is a (7, 3, 1)-
difference set. The corresponding symmetric (7, 3, 1)-design, whose blocks are the seven translates
of D (the lines shown in the figure), is a projective plane of order 2. The complement of
D is a (7, 4, 2)-difference set $\{1, g^3, g^5, g^6\}$ whose translates are the seven quadrangles
(sets of four points, no three collinear) in the same plane.

[Figure: the projective plane of order 2, with its seven points labelled $1, g, g^2, \ldots, g^6$.]

The (7, 3, 1)-difference set generalizes in more than one way; see Exercises #3,4,5.

Proof of Theorem 8.2. Obviously each block Dh contains exactly k points dh, d ∈ D.
Each point g ∈ G lies in exactly k blocks $Dd^{-1}g$ for d ∈ D; this is because g ∈ Dh iff
g = dh for some d ∈ D, iff $h = d^{-1}g$.
Given two distinct points g1 ≠ g2 in G, a block Dh contains both points iff (g1 , g2 ) =
(d1 h, d2 h) for some d1 , d2 ∈ D, iff $d_1 d_2^{-1} = g_1 g_2^{-1}$ and $h = d_1^{-1} g_1$. There are exactly r
pairs (d1 , d2 ) in D satisfying these conditions; and each such pair (d1 , d2 ) yields a unique
block $Dh = Dd_1^{-1}g_1$. Thus there are exactly r blocks containing both points g1 and g2 . It
remains only to show that any two distinct blocks intersect in exactly r points.
Let A be the v × v incidence matrix of our point-block structure: rows and columns of
A are indexed by points and blocks respectively; and the entry in row g ∈ G and column
Dh, h ∈ G, is 1 or 0 according as g is or is not in Dh. Let $J = J_v$ be the v × v matrix
of 1’s, and let $I = I_v$ be the v × v identity matrix. Using what we have already shown,
AJ = JA = kJ and $AA^T = nI + rJ$, where $A^T$ is the transpose of A; and of course also
$A^TJ = JA^T = kJ$. Since n > 0, clearly nI + rJ is nonsingular (it has positive eigenvalues
n + rv and n of multiplicity 1 and v − 1 respectively, with eigenspaces $\langle \mathbf{1}\rangle$ and $\mathbf{1}^\perp$
respectively, where $\mathbf{1}$ is the all-ones vector of length v) and so A is also nonsingular. Hence
$A^T = A^{-1}(nI + rJ)$; and since A, and therefore also $A^{-1}$, commutes with both I and J, we
obtain $A^TA = A^{-1}(nI + rJ)A = nI + rJ$.
This proves that any two distinct blocks intersect in exactly r points.
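
The matrix identities used in this proof can be checked directly for a small difference set. The sketch below again assumes the cyclic group Z/vZ in additive notation, so that the block Dh becomes D + h; the helper names are hypothetical and plain nested lists stand in for matrices.

```python
def incidence_matrix(D, v):
    """Rows are indexed by points g, columns by blocks D + h (h = 0..v-1);
    the entry is 1 when g lies in D + h, i.e. when g - h lies in D."""
    D = set(D)
    return [[1 if (g - h) % v in D else 0 for h in range(v)] for g in range(v)]

def transpose(X):
    return [list(col) for col in zip(*X)]

def matmul(X, Y):
    Yt = transpose(Y)
    return [[sum(a * b for a, b in zip(row, col)) for col in Yt] for row in X]

def n_I_plus_r_J(v, n, r):
    """The matrix nI + rJ of size v x v."""
    return [[n + r if i == j else r for j in range(v)] for i in range(v)]
```

With D = {1, 2, 4} and v = 7, both products A A^T and A^T A come out equal to 2I + J, in agreement with the two intersection properties of the symmetric (7, 3, 1)-design.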

An automorphism of a design (P, B) is a transformation σ permuting the points,


and also permuting the blocks, such that given any P ∈ P and B ∈ B, we have P ∈ B iff
σ(P ) ∈ σ(B). Typically we identify each block with the set of its points; and in this view,
an automorphism is the same thing as a permutation of the points such that for every
B ∈ B, its image σ(B) := {σ(P ) : P ∈ B} is also a block.

Theorem 8.4. Every automorphism of a symmetric (v, k, r)-design has the same
number of fixed points as fixed blocks.

Proof. Let (P, B) be a symmetric design with v × v incidence matrix
$A = (a_{P,B} : P \in \mathcal{P},\ B \in \mathcal{B})$; here $a_{P,B} = 0$ or 1 according as P ∉ B or P ∈ B. Given an automorphism
σ of (P, B), we construct v × v permutation matrices $\Pi_1$ and $\Pi_2$ representing the action
of σ on points and blocks respectively. Thus for points P and P′, $\Pi_1$ has (P, P′)-entry
$\delta_{\sigma(P),P'} = 1$ or 0 according as σ(P) is or is not equal to P′; similarly $\Pi_2$ has (B, B′)-entry
$\delta_{\sigma(B),B'}$. Now $\Pi_1 A \Pi_2^T$ has (P, B)-entry equal to
$$\sum_{P',B'} \delta_{\sigma(P),P'}\, a_{P',B'}\, \delta_{\sigma(B),B'} = a_{\sigma(P),\sigma(B)} = a_{P,B}$$
since σ is an automorphism. This shows that $\Pi_1 A \Pi_2^T = A$. Now $\Pi_2^T = \Pi_2^{-1}$ and A is
invertible by the proof of Theorem 8.2; so $\Pi_1 = A\Pi_2 A^{-1}$. In particular, tr $\Pi_1$ = tr $\Pi_2$,
which says that the number of points fixed by σ equals the number of blocks fixed by σ.

Not every symmetric design arises from a difference set; the designs constructed above
are special in that they admit G as a regular group of automorphisms. Indeed, right-
multiplication by an element a ∈ G permutes points via g 7→ ga and blocks via Dh 7→ Dha,
thereby preserving incidence. To say that this group of automorphisms is regular is to
say that for any two points g1 , g2 , there is a unique automorphism in our group mapping
g1 ↦ g2 (in this case, right-multiplication by $a = g_1^{-1}g_2$). It is clear that this group of
automorphisms of the design is isomorphic to the group G that we started with, and that
it regularly permutes the blocks (as well as regularly permuting the points). In fact, for
a symmetric design, any group which regularly permutes the points must also regularly
permute the blocks, although we do not prove this here.

Theorem 8.5. If D is a (v, k, r)-difference set in a group G, then the symmetric


(v, k, r)-design constructed above admits a group of automorphisms isomorphic to G,
regularly permuting both the points and the blocks.
Conversely, if (P, B) is a symmetric (v, k, r)-design with a group G of automor-
phisms regularly permuting both the points and the blocks, then there is a (v, k, r)-
difference set D ⊂ G which gives rise to the design (P, B).

Proof. We have only to prove the converse. Arbitrarily we choose a point and label it
‘1’. The other points are then labelled by the remaining group elements, using the regular
action of G which we may assume acts by right-multiplication: element g ∈ G maps point
‘1’ to point ‘g’. Choose a block B arbitrarily, and let D be the set of all points in B. Given
our labelling of points by group elements, D is viewed as a subset of G, with |D| = k. Given
g 6= 1 in G, by assumption there are exactly r blocks containing both the points g and 1.
Any such block may be denoted Bh, or simply Dh = {dh : d ∈ D}, for some h ∈ G, using
the regular action of G on the set of blocks; and we have g, 1 ∈ Dh iff (g, 1) = (d1 h, d2 h)
for some d1 , d2 ∈ D, iff $(g, h) = (d_1 d_2^{-1},\ d_2^{-1})$. Thus every non-identity element g ∈ G is
expressible as $g = d_1 d_2^{-1}$ in exactly r ways. So D ⊂ G is a (v, k, r)-difference set.

Theorem 8.6. Suppose that D is a (v, k, r)-difference set in a group G. Then so
are the subsets
$$aDb := \{adb : d \in D\}; \qquad D^* := \{d^{-1} : d \in D\}; \qquad\text{and}\qquad \sigma(D) := \{\sigma(d) : d \in D\}$$
for all a, b ∈ G, and every automorphism σ ∈ Aut G.

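Proof. Let $\delta = \sum_{d\in D} d \in \mathbb{Z}[G]$ and $\kappa = \sum_{g\in G} g \in \mathbb{Z}[G]$ as before. By Lemma 8.1,
$\delta\delta^* = n + r\kappa$. Given a, b ∈ G, the sum of the elements in aDb is aδb, which satisfies

$$(a\delta b)(a\delta b)^* = a\delta\delta^* a^{-1} = a(n + r\kappa)a^{-1} = n + r\kappa,$$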

so aDb is also a (v, k, r)-difference set. It is also clear that σ(D) is a (v, k, r)-difference set
in G whenever σ ∈ Aut G.
It is clear from the definition that the dual of a symmetric (v, k, r)-design (in which the
roles of points and blocks are reversed) is also a symmetric (v, k, r)-design (which may or
may not be isomorphic to the original design). Moreover those symmetric designs arising
from difference sets in the way we have shown, have a group G permuting both points and
blocks regularly. From a (v, k, r)-difference set D ⊂ G we construct a symmetric (v, k, r)-
design in which point P lies in block d(B) iff d ∈ D, iff $d^{-1}(P)$ lies in the block B; so in the
dual design, the ‘point’ B lies in the ‘block’ $d^*(P)$ iff $d^* \in D^*$, where $D^* = \{d^{-1} : d \in D\}$.
This dual design is also a symmetric (v, k, r)-design admitting G as a regular group of
automorphisms; and so by Theorem 8.5, D∗ is also a difference set.

Following Theorem 8.6, we usually consider two difference sets D, D′ ⊂ G to be equivalent
if D′ = aσ(D)b for some a, b ∈ G and σ ∈ Aut G, since in these cases the corresponding
designs are isomorphic.
The search for difference sets in nonabelian groups is an interesting but very difficult
problem. For the remainder of this section, we will confine our attention to abelian groups.
If G is (multiplicative) abelian of order v, then for every integer t relatively prime to v,
the map $g \mapsto g^t$ is an automorphism of G which extends to an automorphism of the group
algebra: given $\alpha = \sum_{g\in G} a_g g \in \mathbb{Q}[G]$, $a_g \in \mathbb{Q}$, we define

$$\alpha^{[t]} = \sum_{g\in G} a_g g^t \in \mathbb{Q}[G].$$

Note that $(\alpha^{[s]})^{[t]} = \alpha^{[st]}$ and $\alpha^{[-1]} = \alpha^*$. Now if D is a (v, k, r)-difference set in an abelian
group G, and t is relatively prime to v, it follows from Theorem 8.6 that the subset

$$D^{[t]} := \{d^t : d \in D\}$$

is also a (v, k, r)-difference set in G. However, $D^{[t]}$ may coincide with a translate gD for
some g ∈ G. Hall’s Multiplier Theorem 8.9 indicates some sufficient conditions for this to
occur. But first we prove

Theorem 8.7. Let G be a multiplicative abelian group of order v, and consider a
nonempty subset D ⊂ G of size |D| = k < v with sum $\delta = \sum_{d\in D} d \in \mathbb{Z}[G]$. Then
(a) D is a (v, k, r)-difference set in G iff $k^2 - rv = n := k - r$ and

\[ |\chi(\delta)| = \sqrt{n} \]

for every nontrivial $\chi \in \widehat{G}$.

(b) Assuming G is cyclic, in (a) it suffices to verify $|\chi_d(\delta)| = \sqrt{n}$ for a set of repre-
sentatives $\chi_d$ of the characters of order d, where d ranges over the divisors of v
with d > 1.

Proof. By Lemma 8.1, D is a (v, k, r)-difference set iff $\delta\delta^* = n + r\kappa$. By Corollary 7.3,
this is equivalent to

\[ |\chi_d(\delta)|^2 = \chi_d(\delta)\overline{\chi_d(\delta)} = \chi_d(\delta\delta^*) = \chi_d(n + r\kappa) = n + r\chi_d(\kappa) \tag{8.8} \]

for all d | v. For d = 1 this gives another proof of $k^2 = n + rv$, which is just the feasibility
relation. For d > 1, we have $\chi_d(\kappa) = \sum_{g\in G}\chi_d(g) = 0$ by Theorem 6.3(a), so (8.8) reduces
to $|\chi_d(\delta)|^2 = n$.
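
The criterion of Theorem 8.7 is easy to test numerically. The following is a minimal sketch, assuming the cyclic group of order v is written additively as Z/vZ, so that the characters are $x \mapsto e^{2\pi i a x/v}$; the function name and the tolerance are illustrative choices, not part of the notes.

```python
import cmath

def is_difference_set_by_characters(D, v, tol=1e-9):
    """Test the character criterion of Theorem 8.7 for a subset D of Z/vZ."""
    k = len(D)
    # Feasibility k^2 - r*v = k - r is the same as k(k-1) = r(v-1).
    r, rem = divmod(k * (k - 1), v - 1)
    if rem != 0:
        return False
    n = k - r
    for a in range(1, v):  # a indexes the nontrivial characters x -> exp(2*pi*i*a*x/v)
        chi_delta = sum(cmath.exp(2j * cmath.pi * a * d / v) for d in D)
        if abs(abs(chi_delta) ** 2 - n) > tol:
            return False
    return True

print(is_difference_set_by_characters({1, 2, 4}, 7))   # exponents of {g, g^2, g^4}
print(is_difference_set_by_characters({0, 1, 2}, 7))   # a subset that is not a difference set
```

The loop runs over all nontrivial characters rather than one representative per order, which is more than part (b) requires but keeps the sketch short.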

Theorem 8.9 (Hall’s Multiplier Theorem [Ha], [HR]). Let D be a (v, k, r)-
difference set in an abelian group G of order v. Suppose that the order n = k − r has
a prime divisor p > r which does not divide v. Then $D^{[p]} = gD$ for some g ∈ G.

Proof. Let $\delta = \sum_{d\in D} d$ and $\kappa = \sum_{g\in G} g$, so that $\delta\delta^{[-1]} = n + r\kappa$. We see from the
Multinomial Theorem that $\delta^p = \delta^{[p]} + p\alpha$ for some $\alpha \in \mathbb{Z}[G]$, so

\[ \delta^{[p]}\delta^{[-1]} = \delta^p\delta^{[-1]} - p\alpha\delta^{[-1]} = \delta^{p-1}\delta\delta^{[-1]} + p\alpha_1 = (n + r\kappa)\delta^{p-1} + p\alpha_1 = r\kappa + p\alpha_2 \]

for some $\alpha_1, \alpha_2 \in \mathbb{Z}[G]$. Since all coefficients on the left side are non-negative integers,
this must also be true on the right side; and since p > r this means that all coefficients in
$\alpha_2$ are non-negative integers. Multiplying both sides by κ yields $k^2\kappa = rv\kappa + p\alpha_2\kappa$; and
using the feasibility relation $k^2 = n + rv$ we obtain $p\alpha_2\kappa = n\kappa$. Applying [−1] yields also
$p\alpha_2^{[-1]}\kappa = n\kappa$. Now

\[ \bigl(\delta^{[p]}\delta^{[-1]}\bigr)\bigl(\delta^{[p]}\delta^{[-1]}\bigr)^{[-1]} = \delta^{[p]}\delta^{[-1]}\delta^{[-p]}\delta = \bigl(\delta\delta^{[-1]}\bigr)\bigl(\delta\delta^{[-1]}\bigr)^{[p]} = (n + r\kappa)(n + r\kappa)^{[p]} = n^2 + 2rn\kappa + r^2v\kappa. \]

On the other hand,

\[ \bigl(\delta^{[p]}\delta^{[-1]}\bigr)\bigl(\delta^{[p]}\delta^{[-1]}\bigr)^{[-1]} = (r\kappa + p\alpha_2)\bigl(r\kappa + p\alpha_2^{[-1]}\bigr) = r^2v\kappa + 2rn\kappa + p^2\alpha_2\alpha_2^{[-1]}. \]

Equating these two expressions yields

\[ p^2\alpha_2\alpha_2^{[-1]} = n^2. \]

The expansion $\alpha_2 = \sum_{g\in G} a_g g \in \mathbb{Z}[G]$ must have exactly one nonzero coefficient $a_g$ since
the coefficients are non-negative integers and the right side has a single term $n^2$. It follows
that $p\alpha_2 = ng$ for some g ∈ G. We arrive at

\[ \delta^{[p]}\delta^{[-1]} = ng + r\kappa = \delta\delta^{[-1]}g. \]

We would like to cancel $\delta^{[-1]}$ from both sides, but first we need to know that $\delta^{[-1]}$ is not
a zero divisor in $\mathbb{Q}[G]$. The latter relation takes the form $\rho\delta^{[-1]} = 0$ where $\rho = \delta^{[p]} - \delta g \in \mathbb{Q}[G]$.
Now each $\chi \in \widehat{G}$ satisfies

\[ \chi(\rho)\chi(\delta^{[-1]}) = \chi(\rho\delta^{[-1]}) = \chi(0) = 0. \]

For the trivial character, we have $\chi(\delta^{[-1]}) = k$; and for nontrivial χ, since $D^{[-1]}$ is a (v, k, r)-
difference set by Theorem 8.6, we have $|\chi(\delta^{[-1]})| = \sqrt{n}$ by Theorem 8.7. In either case, $\chi(\delta^{[-1]}) \ne 0$, so χ(ρ) = 0
for all $\chi \in \widehat{G}$. By Corollary 7.3(a), $\rho = \delta^{[p]} - \delta g = 0$ as required.

Given a difference set D in an abelian group G, a multiplier of D is an automorphism
[t] ∈ Aut G (where t is an integer relatively prime to v = |G|) such that $D^{[t]} = Dg$ for some
g ∈ G. It is not hard to see that the product of two multipliers of D is again a multiplier
of D; and so the set of multipliers forms the multiplier group of D. Hall’s Multiplier
Theorem shows that this group contains all primes p > r which divide n but do not
divide v. (It has long been conjectured that the hypothesis p > r should not be necessary
for the stated conclusion; but despite much work on this nagging problem, this Multiplier
Conjecture remains open.)

Theorem 8.10. Let D be a (v, k, r)-difference set in an abelian group G of order v.
Let [t] ∈ Aut G be a multiplier of D. Then there exists a translate Dh (h ∈ G) which
is fixed by [t], i.e. $(Dh)^{[t]} = Dh$.

The point is that Dh generates the same symmetric design as D, since these two difference
sets have the same translates in G; so for most purposes, we may assume [t] fixes D itself,
otherwise replace D by an appropriate translate.
Proof of Theorem 8.10. The automorphism [t] ∈ Aut G acts as an automorphism of the
associated symmetric design, since g ∈ Dh iff $g^t \in (Dh)^{[t]}$. Since $1^t = 1$, [t] fixes at least
one point; so by Theorem 8.4, [t] has at least one fixed block Dh, h ∈ G.

Example 8.11: Constructions using Multipliers. Consider the cyclic group $G = \{1, g, g^2, \ldots, g^6\}$
of order seven. Although it is not hard to find difference sets of order 2 in G (see Example 8.3),
Hall’s Multiplier Theorem makes the job even faster. The prime p = 2 divides n = 2 but not v = 7, and exceeds r = 1,
so [2] ∈ Aut G is a multiplier. By Theorem 8.10, any difference set is equivalent to one invariant
under the multiplier. Now [2] has three orbits on G: {1}, $\{g, g^2, g^4\}$, $\{g^3, g^5, g^6\}$. In order that
$D^{[2]} = D$, D must be a union of these orbits. We obtain $D = \{g, g^2, g^4\}$ and $D^* = \{g^3, g^5, g^6\}$ as
(7, 3, 1)-difference sets; every (7, 3, 1)-difference set is therefore a translate of one of these. Taking
their unions with the orbit {1} gives two (7, 4, 2)-difference sets; and every difference set with these
parameters is a translate of one of these.
Now consider the cyclic group $G = \{1, g, g^2, \ldots, g^{20}\}$ of order 21. A cyclic (21, 5, 1)-difference
set in G has order 4 and so must have [2] as a multiplier. The orbits of [2] on G are {1},
$\{g^7, g^{14}\}$, $\{g^3, g^6, g^{12}\}$, $\{g^9, g^{15}, g^{18}\}$, $\{g, g^2, g^4, g^8, g^{11}, g^{16}\}$ and $\{g^5, g^{10}, g^{13}, g^{17}, g^{19}, g^{20}\}$; and
any (21, 5, 1)-difference set D is (up to translation) a union of orbits. Since k = 5, we must have
$D = \{g^3, g^6, g^7, g^{12}, g^{14}\}$ or $\{g^7, g^9, g^{14}, g^{15}, g^{18}\}$. It is not hard to check that both of these are in
fact difference sets; and they have the form D and D∗, so the corresponding designs are dual to each
other.
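
The orbit search in Example 8.11 is mechanical enough to sketch in code. The sketch below assumes the cyclic group is written additively as Z/vZ (so $g^x$ is stored as the exponent x and [t] becomes $x \mapsto tx$); all function names are illustrative.

```python
from itertools import combinations

def multiplier_orbits(v, t):
    """Orbits of x -> t*x on Z/vZ, the additive form of g -> g^t (t coprime to v)."""
    seen, orbits = set(), []
    for x in range(v):
        if x not in seen:
            orbit, y = [], x
            while y not in seen:
                seen.add(y)
                orbit.append(y)
                y = (t * y) % v
            orbits.append(sorted(orbit))
    return orbits

def is_difference_set(D, v, r):
    """Check that every nonzero residue arises exactly r times as d1 - d2."""
    counts = [0] * v
    for d1 in D:
        for d2 in D:
            if d1 != d2:
                counts[(d1 - d2) % v] += 1
    return all(c == r for c in counts[1:])

def fixed_difference_sets(v, k, r, t):
    """All (v, k, r)-difference sets in Z/vZ that are unions of orbits of x -> t*x."""
    orbits = multiplier_orbits(v, t)
    found = []
    for m in range(1, len(orbits) + 1):
        for choice in combinations(orbits, m):
            D = sorted(x for orbit in choice for x in orbit)
            if len(D) == k and is_difference_set(D, v, r):
                found.append(D)
    return found

print(fixed_difference_sets(7, 3, 1, 2))
print(fixed_difference_sets(21, 5, 1, 2))
```

Trying every union of orbits is exponential in the number of orbits, which is harmless here because a multiplier of large order leaves only a handful of orbits.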
Example 8.12: Nonexistence via Multipliers. We show that there is no cyclic projective plane
of order 10. Such a design would necessarily arise from a (111, 11, 1)-difference set D in a cyclic
group G of order 111. The plane has order 10 and so [2] is a multiplier. Without loss of generality,
$D^{[2]} = D$. A short computer program helps to enumerate the orbits of [2] on G, which are {1},
$\{g^{37}, g^{74}\}$, $S$, $S^{[3]}$ and $S^{[11]}$, where S is the orbit of g, $S^{[t]} = \{s^t : s \in S\}$ and |S| = 36. Since there is no union of orbits having combined size
k = 11, there can be no such difference set. (There is in fact no projective plane of order 10, as we
now know by extensive computer search. It is remarkable how much easier it is to prove nonexistence
in the cyclic case.)
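
The "short computer program" mentioned in Example 8.12 might look as follows. This is only a sketch, again assuming additive notation Z/111Z for the cyclic group; the names are illustrative.

```python
from itertools import combinations

def orbit_sizes(v, t):
    """Sorted sizes of the orbits of x -> t*x on Z/vZ (t coprime to v)."""
    seen, sizes = set(), []
    for x in range(v):
        if x not in seen:
            size, y = 0, x
            while y not in seen:
                seen.add(y)
                size += 1
                y = (t * y) % v
            sizes.append(size)
    return sorted(sizes)

sizes = orbit_sizes(111, 2)
# Every possible cardinality of a union of orbits.
reachable = {sum(c) for m in range(len(sizes) + 1) for c in combinations(sizes, m)}
print(sizes)
print(11 in reachable)  # is there a union of orbits with exactly k = 11 elements?
```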

Example 8.13: Nonexistence via Characters. We prove that there is no (154, 18, 2)-difference
set in a cyclic group G. Here the order is n = 18 − 2 = 16. We expect [2] to be a multiplier,
but we cannot use Theorem 8.9 as stated since p = 2 does not exceed r = 2. But suppose D ⊂ G
is a (154, 18, 2)-difference set, and let $\chi \in \widehat{G}$ be a character of order 11. The values of χ are in
$\mathcal{O} := \mathbb{Z}[\zeta]$, $\zeta = \zeta_{11}$. By Theorem 8.7, $\delta := \sum_{d\in D} d$ satisfies $|\chi(\delta)|^2 = 16$. Denoting the principal ideal
$A = (\chi(\delta)) \subset \mathcal{O}$, this yields the factorization $A\overline{A} = (2)^4$ and so $A = \overline{A} = (2)^2$. Now the ideal
$(2) = 2\mathcal{O} \subset \mathcal{O}$ is prime since the quotient $\mathcal{O}/(2) \cong \mathbb{Z}[x]/(2, \Phi_{11}(x)) \cong \mathbb{F}_2[x]/(\Phi_{11}(x)) \cong \mathbb{F}_{2^{10}}$ is a
field. (Any root of $\Phi_{11}(x)$ in an extension of $\mathbb{F}_2$ is a primitive 11-th root of unity; and any extension
$\mathbb{F}_{2^k}$ having such a root must have $11 \mid 2^k - 1$, and this requires $10 \mid k$.) Since $A = (\chi(\delta)) = (4)$, we
must have χ(δ) = 4u for some unit $u \in \mathcal{O}^\times$. But every automorphism $\sigma \in \mathrm{Aut}\,\mathbb{Q}[\zeta]$ has the property
that $\sigma \circ \chi : G \to \mathbb{Q}[\zeta]$ is also a nontrivial character, so the same reasoning gives |σ(χ(δ))| = 4.
Thus $u \in \mathcal{O}$ satisfies |σ(u)| = 1 for every $\sigma \in \mathrm{Aut}\,\mathbb{Q}[\zeta]$. By Theorem 4.10, u is a root of unity
in $\mathbb{Q}[\zeta_{11}]$; so $u = \pm\zeta^k$ and $\chi(\delta) = \pm4\zeta^k$ for some k ∈ {0, 1, 2, . . . , 10}. However [G : H] = 11
where H = ker χ = {g ∈ G : χ(g) = 1} and so $G = H \cup Ht \cup Ht^2 \cup \cdots \cup Ht^{10}$ where t ∈ G
has order 11 satisfying χ(t) = ζ; thus $\chi(\delta) = a_0 + a_1\zeta + a_2\zeta^2 + \cdots + a_{10}\zeta^{10}$ where $a_i = |D \cap Ht^i|$.
Since $\Phi_{11}(x) = 1 + x + x^2 + \cdots + x^{10}$ is the minimal polynomial of ζ over $\mathbb{Q}$, these two expressions for
χ(δ) can only agree if $a_0 + a_1x + a_2x^2 + \cdots + a_{10}x^{10} = \pm4x^k + a\Phi_{11}(x)$ for some $a \in \mathbb{Q}$. Comparing
coefficients gives $a \in \mathbb{Z}$; and since the $a_i$ count the elements of D coset by coset, $18 = |D| = \sum_{i=0}^{10} a_i = \pm4 + 11a$.
This forces the minus sign and a = 2, whence $a_k = a - 4 = -2 < 0$, a contradiction. Thus
no (154, 18, 2)-difference set can exist over a cyclic group.
We remark on the necessity of observing that the ideal $(2) \subset \mathcal{O}$ is prime; if this were not the
case, then possibly $(2) = P\overline{P}$ for some ideal $P \subset \mathcal{O}$ such that $(\chi(\delta)) = P^4$ and $\overline{(\chi(\delta))} = \overline{P}^4$, which
would have invalidated our argument.
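
Two pieces of arithmetic in Example 8.13 can be checked in a few lines: that 2 has multiplicative order 10 modulo 11 (so that $\Phi_{11}$ stays irreducible modulo 2), and the final count, where the coset sizes $a_i = |D \cap Ht^i|$ must sum to |D| = 18. The sketch below uses illustrative names and assumes, as in the example, that all $a_i$ equal a except $a_k = a \pm 4$.

```python
def multiplicative_order(a, m):
    """Least k >= 1 with a^k = 1 mod m (a is assumed coprime to m)."""
    k, x = 1, a % m
    while x != 1:
        x = (x * a) % m
        k += 1
    return k

print(multiplicative_order(2, 11))

# Solve 11*a + 4*s = 18 with s = +1 or -1 and a a non-negative integer.
solutions = [(s, a) for s in (1, -1) for a in range(19) if 11 * a + 4 * s == 18]
print(solutions)
# The exceptional coset would contain a + 4*s elements of D.
print([a + 4 * s for (s, a) in solutions])
```

The only solution has a negative exceptional count, which is impossible for the size of an intersection.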

The literature on difference sets is vast and growing quickly; see for example the
surveys [Ju], [JS1], [JS2] on this subject. From the extensive list of known constructions
and general nonexistence results, we have highlighted only a few above, with a preference
for techniques using cyclotomic fields and group characters. It is with this preference in
mind that we have passed over many important results, in particular

Theorem 8.14 (Bruck, Ryser, Chowla [BR], [CR]). Suppose there exists a
symmetric (v, k, r)-design of order n := k − r. If v is even, then n is a square. If v
is odd, then the Diophantine equation $nx^2 + (-1)^{(v-1)/2}\, r\, y^2 = z^2$ has a nontrivial integer
solution (i.e. $(x, y, z) \ne (0, 0, 0)$).
While this condition does not rule out the designs of Examples 8.12 and 8.13, it does rule
out many others, including the projective plane of order 6 mentioned earlier. Of course
every parameter set (v, k, r) for which no symmetric design exists, means also that there
is no difference set with the given parameters.

Exercises 8.
1. Let G be a cyclic group of order seven. How many (7, 3, 1)-difference sets does G have? If D is
one such difference set, are all the others of the form aDb or (aDb)∗ using Theorem 8.6?

2. Generalizing Example 8.3, show that if $(\mathcal{P}, \mathcal{B})$ is a symmetric (v, k, r)-design, then by
complementing every block $B \in \mathcal{B}$ we get a family of subsets $\mathcal{B}' = \{\mathcal{P} \setminus B : B \in \mathcal{B}\}$ such that $(\mathcal{P}, \mathcal{B}')$
is also a symmetric design. Find the parameters of $(\mathcal{P}, \mathcal{B}')$; and show that the
complementary design $(\mathcal{P}, \mathcal{B}')$ has the same order as the original design $(\mathcal{P}, \mathcal{B})$. (Note that in view of
Theorem 8.5, every difference set D ⊂ G must also yield a complementary difference set
$D' := G \setminus D$ of the same order.)

3. Let G be an additive (rather than multiplicative) group of order v. Then a (v, k, r)-difference
set in G is a subset D ⊂ G of size |D| = k < v such that every nonzero element g ∈ G can be
expressed as $g = d_1 - d_2$ with $d_1, d_2 \in D$ in exactly r ways. Each integer t relatively prime to v determines an
automorphism [t] ∈ Aut G, $g \mapsto tg$; and such a map is a multiplier of D if tD = D + g for some
g ∈ G.
(a) Let G be the additive group of a finite field Fq , q ≡ 3 mod 4; and let D be the set of
nonzero squares in Fq . Show that D is a difference set in G, and determine its parameters.
This construction gives the Paley difference sets; it includes the (7, 3, 1)-difference set of
Example 8.3 as a special case.
(b) Find all multipliers of the difference set D in (a). In the strict sense that we have defined
multipliers [t], we have only considered t ∈ Z. Can you generalize the definition of multipliers
to include more general automorphisms?
4. Let V be a vector space of dimension e ≥ 3 over a finite field $F = \mathbb{F}_q$. Let $\mathcal{P}$ be the set of all
1-dimensional subspaces of V, and let $\mathcal{B}$ be the set of all (e−1)-dimensional subspaces of V.
(a) Show that $|\mathcal{P}| = |\mathcal{B}| = \frac{q^e-1}{q-1}$.
(b) For $P \in \mathcal{P}$ and $B \in \mathcal{B}$, we say that P lies in B if P is a subspace of B. Show that this
defines a symmetric (v, k, r)-design where $v = \frac{q^e-1}{q-1}$. Express k, r and n := k − r in terms of
q and e.
(c) Show that when e = 3, the design $(\mathcal{P}, \mathcal{B})$ is a projective plane of order n. These are in fact
the classical projective planes, again including Example 8.3 as a special case.

5. Let E ⊃ F be an extension of finite fields of degree e ≥ 3 where $F = \mathbb{F}_q$ and $E = \mathbb{F}_{q^e}$. Let ω ∈ E
be a generator of the cyclic group $E^\times$, i.e. a primitive $(q^e-1)$-st root of unity.
(a) Show that $\omega^v$ is a generator of the multiplicative group $F^\times$, where $v = \frac{q^e-1}{q-1}$; that is, $\omega^v$ is
a primitive (q−1)-st root of unity.
(b) Consider the quotient group $G = E^\times/F^\times$, a cyclic group of order $v = \frac{q^e-1}{q-1}$. Note that for
each $\alpha \in E^\times$ and $a \in F^\times$, we have $\mathrm{Tr}_{E/F}(a\alpha) = 0$ iff $\mathrm{Tr}_{E/F}(\alpha) = 0$ (using the F-linearity
of the trace map); so we have a well-defined subset D ⊂ G consisting of all cosets $\alpha F^\times$,
$\alpha \in E^\times$ such that $\mathrm{Tr}_{E/F}\,\alpha = 0$. Show that D is a cyclic difference set in G having the same
parameters as the design in #4. (These are the Singer designs.)
9. Hadamard Matrices

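A Hadamard matrix of order m ≥ 1 is an m × m matrix H with entries ±1 satisfying
$HH^T = mI$. Examples of the smallest Hadamard matrices, having orders 1, 2, 4, 8, are

\[
\begin{pmatrix} + \end{pmatrix},\quad
\begin{pmatrix} + & + \\ + & - \end{pmatrix},\quad
\begin{pmatrix} - & + & + & + \\ + & - & + & + \\ + & + & - & + \\ + & + & + & - \end{pmatrix},\quad
\begin{pmatrix}
+ & + & + & + & + & + & + & + \\
- & + & + & + & - & + & - & - \\
- & - & + & + & + & - & + & - \\
- & - & - & + & + & + & - & + \\
- & + & - & - & + & + & + & - \\
- & - & + & - & - & + & + & + \\
- & + & - & + & - & - & + & + \\
- & + & + & - & + & - & - & +
\end{pmatrix}
\tag{9.1}
\]

where we abbreviate ±1 by ±.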

Theorem 9.2. The order m of any Hadamard matrix satisfies m = 1, 2 or 4n for
some n ≥ 1.

Proof. Let H be a Hadamard matrix of order m > 2, and let $u, v, w \in \{\pm1\}^m$ be its first
three rows. Distinct rows are orthogonal, so u·v = u·w = v·w = 0. Since the vectors u+v and u+w have even entries, m = u·u = (u+v)·(u+w) ≡
0 mod 4.

It is not known whether the converse of Theorem 9.2 holds; but it is a popular
conjecture that for every positive m ≡ 0 mod 4, there is a Hadamard matrix of order m. The
smallest currently open case is m = 668. How should one go about trying to construct a
Hadamard matrix of such a size? Whatever is the best way, certainly the worst way is to
hope to go through all $2^{m^2}$ matrices of size m × m with entries ±1 until finding success.
In trying to convey to my non-mathematical friends a sense of the kind of problems
combinatorialists work on, I will often describe the search for a Hadamard matrix of size 668;
but the magnitude of such a search typically fails to impress my friends who are unaware
that if all the computers in the world were dedicated to a naive search of $2^{668^2}$ cases at the
optimistic rate of a million cases per second, it would still take much more than $10^{100000}$
times the current age of the universe to finish the task.
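
The size of this naive search is easy to estimate. The figures in the sketch below for the number of computers and the age of the universe are rough assumptions made only for illustration; the conclusion is insensitive to them.

```python
import math

m = 668
digits = m * m * math.log10(2)       # log10 of 2^(m^2), the number of candidate matrices
computers = 1e10                     # assumed: ten billion computers
rate = 1e6                           # cases per second per computer, as in the text
age_of_universe_s = 4.35e17          # assumed: roughly 13.8 billion years, in seconds

# log10 of (time needed) / (age of the universe)
log10_ages = digits - math.log10(computers * rate) - math.log10(age_of_universe_s)
print(round(digits), round(log10_ages))
```

Both printed values exceed 130000, so the bound of $10^{100000}$ ages of the universe is comfortably met.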
Rather than not looking at all, we look where the chances seem better. We are
reminded of the old story (often called The Streetlight Effect), one version [Ho] of
which goes:
A man got drunk in Memphis one night. Staggering down the street, he stumbled into an alley.
His watch fell out of his pocket. He heard it fall. He got up, and walked on down to the corner.
There he got down on his hands and knees and started crawling all around under the electric
light. Soon the traffic was blocked. A police officer came up to him and said : “What’re you
doing? Can’t you see you’re blocking traffic?” The drunk replied: “Well, I losht mer watch. It
was my daddy’s watch; sort of an heirloom in the family, and I’ve juss gotta find that watch.”
The cop said: “All right, boss, I’ll help you find it. Where did you lose it?” The drunk: “I losht
it up there in that damn dark alley.” The cop: “Well, why don’t you look for it down there
where you lost it?” The drunk: “Why, you big fat fool, can’t you see there’s a lot more light up
here?”
The story offers us some guiding principles, both in looking for examples, and in trying to
glean insight from the known examples directing us what theorems we should be trying to
prove (at least whether existence or nonexistence results). When hunting for examples, we
are wise to look where the light is best, provided we look where our chances are good. But
before ruling out the darker regions or investing too much time trying to prove nonexistence
there, consider that the lack of known examples in the darker regions is due to the difficulty
of finding them there. (It is usually this second lesson that is intended by ‘the streetlight
effect’.)
The literature on Hadamard matrices is far too vast to adequately summarize here,
so we will content ourselves with describing a few general types of construction and some
approaches to classifying or proving nonexistence in special cases. Cyclotomic integers
play a key role in both of these ventures, but particularly in proving nonexistence or
classification results; however much of the work constructing examples uses group rings
and representations where cyclotomic integers arise in a somewhat clerical way.
However, before describing general types of construction, we should note that if H
is a Hadamard matrix of order m, then

\[ \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix} \otimes H = \begin{pmatrix} H & H \\ H & -H \end{pmatrix} \]

is a Hadamard matrix of
order 2m. So for the existence problem, it suffices to consider only orders m ≡ 4 mod 8,
i.e. m = 4n where n is odd; and we will often confine our attention to this case. The
doubling trick we have just mentioned is a special case of

Theorem 9.3. If $H_i$ is a Hadamard matrix of order $m_i$ for i ∈ {1, 2}, then $H_1 \otimes H_2$
is a Hadamard matrix of order $m_1m_2$.

Proof. Each entry in $H_1 \otimes H_2$ is the product of an entry in $H_1$ times an entry in $H_2$,
hence equal to ±1. Also $(H_1 \otimes H_2)(H_1 \otimes H_2)^T = (H_1H_1^T) \otimes (H_2H_2^T) = m_1I_{m_1} \otimes m_2I_{m_2} = m_1m_2I_{m_1m_2}$.
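
Theorem 9.3 can be illustrated with a few lines of code. The sketch below stores matrices as lists of rows; the helper names `kron` and `is_hadamard` are illustrative, and $H_4 = J - 2I$ is used as a convenient Hadamard matrix of order 4.

```python
def kron(A, B):
    """Kronecker product of two matrices given as lists of rows."""
    return [[a * b for a in row_a for b in row_b] for row_a in A for row_b in B]

def is_hadamard(H):
    """Check that H is square with entries +1/-1 and H H^T = m I."""
    m = len(H)
    if any(len(row) != m or any(x not in (1, -1) for x in row) for row in H):
        return False
    return all(sum(x * y for x, y in zip(H[i], H[j])) == (m if i == j else 0)
               for i in range(m) for j in range(m))

H2 = [[1, 1], [1, -1]]
H4 = [[-1, 1, 1, 1], [1, -1, 1, 1], [1, 1, -1, 1], [1, 1, 1, -1]]
H8 = kron(H2, H4)      # the doubling trick
H16 = kron(H4, H4)
print(is_hadamard(H8), is_hadamard(H16))
```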

We also mention that if H is Hadamard, then multiplying some of the rows of H by −1,
and multiplying some of the columns by −1, then permuting rows and permuting columns,
results in another Hadamard matrix which is said to be monomially equivalent to H.
Since H T is also Hadamard, one might also consider the coarser equivalence relation which
allows for transposing of Hadamard matrices. In counting Hadamard matrices, we may be
interested in the raw number of Hadamard matrices of order m (a rapidly growing function
of m), or in the number of equivalence classes of Hadamard matrices of order m (either up to
monomial equivalence, or up to monomial equivalence and transpose). The exact number of
monomial equivalence classes of Hadamard matrices of order m is known only for m ≤ 32.
For m = 4, 8, 12, 16, 20, 24, 28, 32, the number of classes is 1, 1, 1, 5, 3, 60, 487, 13710027.
Skew-Type Hadamard Matrices


A Hadamard matrix of order m is of skew type if it has the form $I_m + S$ where S
is m × m skew-symmetric with entries ±1 off its main diagonal. The fourth example in
(9.1) is of this type. After scaling certain rows and columns by −1, one shows that every
skew-type Hadamard matrix takes the equivalent form

\[
H = \left(\begin{array}{c|ccc}
+ & + & \cdots & + \\ \hline
- & & & \\
\vdots & & I + A & \\
- & & &
\end{array}\right)
\]

where
(9.4) A is skew symmetric ($A^T = -A$) of size (m−1) × (m−1) with zeroes on
its main diagonal and entries ±1 elsewhere; and
(9.5) AJ = JA = 0 and $A^2 = J - (m-1)I$ where $I = I_{m-1}$ and $J = J_{m-1}$ are
the identity matrix and all-ones matrix of size (m−1)×(m−1) respectively.

These conditions are equivalent to the assertion that $A = A_1 - A_1^T$ where $A_1$ is the (0, 1)-
incidence matrix of a symmetric (4n−1, 2n−1, n−1)-design (and $A_1^T$ is the incidence matrix
of the dual design with the same parameters). Such a design has order n and is called
a Hadamard 2-design. Of course any (4n−1, 2n−1, n−1)-difference set in an abelian
group of order v = 4n−1 immediately gives such a design and Hadamard matrix of
order 4n. These difference sets are usually said to be of Paley type. But there is conflicting
terminology: some authors, a minority it seems, refer to (4n−1, 2n−1, n−1)-difference
sets as Hadamard difference sets. Others reserve this terminology for the parameters
$(4n^2, 2n^2 \pm n, n^2 \pm n)$ discussed below.
The classical construction, due to Paley, uses the additive group of $\mathbb{F}_q$, q ≡ 3 mod 4,
in which the nonzero squares form a (4n−1, 2n−1, n−1)-difference set where $n = \frac14(q + 1)$; see
Exercise #8.3. After rewriting the group as a multiplicative group G of order q, we let
$\alpha, \beta, \kappa \in \mathbb{Z}[G]$ be the elements

\[ \alpha = \sum_{g\in S} g, \qquad \beta = \sum_{g\in N} g, \qquad \kappa = \sum_{g\in G} g, \]

where, in the notation of Theorem 7.7, S and N are the sets of nonzero squares and nonsquares respectively.
Note that κ = 1 + α + β (where the ‘1’ represents $0 \in \mathbb{F}_q$, but rewritten multiplicatively).
Now Theorem 7.7(b) yields $\alpha\alpha^* = n + (n-1)\kappa$. This expresses the fact that the nonzero squares
form a difference set with the indicated parameters. When q = 7, this gives the Hadamard
matrix of order 8 listed in (9.1).
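
For prime q the Paley construction is short enough to write out. The sketch below handles only prime q ≡ 3 mod 4 (genuine prime powers would need finite field arithmetic, which is omitted), and it builds the matrix in the normalized skew form with first row all +1 and first column −1 below the corner; the function names are illustrative.

```python
def paley_skew_hadamard(q):
    """Skew-type Hadamard matrix of order q + 1 from the squares modulo a prime q = 3 mod 4."""
    squares = {(x * x) % q for x in range(1, q)}

    def a(i, j):
        # Entry of A: 0 on the diagonal, +1 if j - i is a nonzero square, -1 otherwise.
        d = (j - i) % q
        return 0 if d == 0 else (1 if d in squares else -1)

    top = [1] * (q + 1)
    rest = [[-1] + [(1 if i == j else 0) + a(i, j) for j in range(q)] for i in range(q)]
    return [top] + rest

def is_hadamard(H):
    """Check H H^T = m I for a square matrix H with entries +1/-1."""
    m = len(H)
    return all(sum(x * y for x, y in zip(H[i], H[j])) == (m if i == j else 0)
               for i in range(m) for j in range(m))

H8 = paley_skew_hadamard(7)
print(is_hadamard(H8))
print(H8[1])
```

Skew-symmetry of A relies on −1 being a nonsquare modulo q, which is exactly the condition q ≡ 3 mod 4.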
To show that there exist also nonclassical skew-type Hadamard matrices, we present
next the twin prime power construction [SS] using a Paley-type difference set of order
$n = \frac14(q + 1)^2$ whenever both q and q + 2 are odd prime powers. Here G is the additive
group of the ring $R := \mathbb{F}_q \oplus \mathbb{F}_{q+2}$. The difference set D consists of pairs (a, b) ∈ R such
that
• a and b are both nonzero squares in their respective fields; or
• a and b are both nonsquares in their respective fields; or
• $a \in \mathbb{F}_q$ is arbitrary and b = 0.
Note that |G| = q(q + 2) = 4n − 1 and $|D| = \frac14(q^2 - 1) + \frac14(q^2 - 1) + q = 2n - 1$. We write
$G = G_1 \times G_2$ and

\[ \kappa = \sum_{g\in G} g = \kappa_1\kappa_2 \in \mathbb{Q}[G] \quad\text{where}\quad \kappa_i = \sum_{g\in G_i} g \in \mathbb{Q}[G_i]; \]

see the description of group algebras for direct product groups in Section 7. We also let
$\alpha_i, \beta_i \in \mathbb{Q}[G_i]$ correspond to the nonzero squares and nonsquares in the respective fields, so
that $\kappa_i = 1 + \alpha_i + \beta_i$ in the notation of Theorem 7.7. Now $\delta = \sum_{d\in D} d = \alpha_1\alpha_2 + \beta_1\beta_2 + \kappa_1$.
To verify that D is a (4n−1, 2n−1, n−1)-difference set in G, i.e. $\delta\delta^* = n + (n - 1)\kappa$, it
suffices by Theorem 8.7 to check that $|\chi(\delta)| = \sqrt{n}$ for every nontrivial $\chi \in \widehat{G}$. Recall that
$\chi = \chi_1 \times \chi_2$ where $\chi_i \in \widehat{G_i}$, and at least one of $\chi_1, \chi_2$ is nontrivial. Again by the comments
of Section 7,

\[ \chi(\delta) = \chi_1(\alpha_1)\chi_2(\alpha_2) + \chi_1(\beta_1)\chi_2(\beta_2) + \chi_1(\kappa_1). \]

If $\chi_1$ is trivial, then $\chi_2$ is nontrivial and

\[ \chi(\delta) = \tfrac{q-1}{2}\chi_2(\alpha_2) + \tfrac{q-1}{2}\chi_2(\beta_2) + q = \tfrac{q-1}{2}(-1) + q = \tfrac{q+1}{2} = \sqrt{n}. \]

If $\chi_2$ is trivial then $\chi_1$ is nontrivial and

\[ \chi(\delta) = \chi_1(\alpha_1)\tfrac{q+1}{2} + \chi_1(\beta_1)\tfrac{q+1}{2} = (-1)\tfrac{q+1}{2} = -\sqrt{n}. \]

In the remaining case, both $\chi_1$ and $\chi_2$ are nontrivial and

\[ \chi_1(\alpha_1) = \tfrac12\bigl(-1 + \varepsilon_1\sqrt{q}\bigr), \qquad \chi_1(\beta_1) = \tfrac12\bigl(-1 - \varepsilon_1\sqrt{q}\bigr), \]
\[ \chi_2(\alpha_2) = \tfrac12\bigl(-1 + \varepsilon_2\sqrt{q+2}\bigr), \qquad \chi_2(\beta_2) = \tfrac12\bigl(-1 - \varepsilon_2\sqrt{q+2}\bigr) \]

where $\varepsilon_1 \in \{1, -1\}$ and $\varepsilon_2 \in \{i, -i\}$ if q ≡ 1 mod 4; or $\varepsilon_1 \in \{i, -i\}$ and $\varepsilon_2 \in \{1, -1\}$ if
q ≡ 3 mod 4. Now

\[ \chi(\delta) = \chi_1(\alpha_1)\chi_2(\alpha_2) + \chi_1(\beta_1)\chi_2(\beta_2) = \tfrac12\bigl(1 + \varepsilon_1\varepsilon_2\sqrt{q(q+2)}\bigr) = \tfrac12\bigl(1 \pm i\sqrt{q(q+2)}\bigr) \]

and $|\chi(\delta)|^2 = \frac14\bigl(1 + q(q+2)\bigr) = \frac14(q + 1)^2 = n$.
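
For small twin primes the definition of D can be checked directly by counting differences. The sketch below is restricted to the case where q and q + 2 are both prime, so that ordinary modular arithmetic suffices; the function names are illustrative.

```python
def twin_prime_difference_set(q):
    """The set D inside Z/q x Z/(q+2), assuming q and q + 2 are both odd primes."""
    p1, p2 = q, q + 2
    sq1 = {(x * x) % p1 for x in range(1, p1)}
    sq2 = {(x * x) % p2 for x in range(1, p2)}
    D = set()
    for a in range(p1):
        for b in range(p2):
            if b == 0:
                D.add((a, b))                      # a arbitrary, b = 0
            elif a != 0 and ((a in sq1) == (b in sq2)):
                D.add((a, b))                      # both squares or both nonsquares
    return D

def difference_counts(D, p1, p2):
    """Number of ways each nonzero group element arises as a difference d1 - d2."""
    counts = {}
    for (a1, b1) in D:
        for (a2, b2) in D:
            if (a1, b1) != (a2, b2):
                g = ((a1 - a2) % p1, (b1 - b2) % p2)
                counts[g] = counts.get(g, 0) + 1
    return counts

D = twin_prime_difference_set(3)
counts = difference_counts(D, 3, 5)
print(len(D), len(counts), set(counts.values()))
```

With q = 3 we have n = 4, so the expected parameters are (15, 7, 3).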




Williamson-Hadamard Matrices
A Hadamard matrix of order m = 4n is of Williamson type if it has the form

\[
H = \begin{pmatrix}
A & B & C & D \\
-B & A & -D & C \\
-C & D & A & -B \\
-D & -C & B & A
\end{pmatrix}
\]

where A, B, C, D are symmetric circulant matrices of order n which commute with each
other. We recall that an n × n matrix is circulant if each row is a cyclic shift of the
previous row, with ‘wraparound’. More precisely, the set of all n × n circulant matrices is
the algebra $\mathbb{Q}[T]$ generated by the n × n matrix

\[
T = \begin{pmatrix}
0 & 1 & 0 & \cdots & 0 & 0 \\
0 & 0 & 1 & \cdots & 0 & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & 0 & \cdots & 1 & 0 \\
0 & 0 & 0 & \cdots & 0 & 1 \\
1 & 0 & 0 & \cdots & 0 & 0
\end{pmatrix}.
\]

Note that $\{I, T, T^2, \ldots, T^{n-1}\}$ is a basis for $\mathbb{Q}[T]$. For reasons described already, we will
assume n = 2t+1 is odd. We have required A, B, C, D to be symmetric circulant matrices;
equivalently, they lie in the subalgebra of $\mathbb{Q}[T]$ having basis $\{I\} \cup \{T^i + T^{-i} : 1 \le i \le t\}$.
Now the condition $HH^T = mI_m$ reduces to the single relation

\[ A^2 + B^2 + C^2 + D^2 = 4nI_n. \]

In the following, the group $G = \{1, g, g^2, \ldots, g^{n-1}\}$ is cyclic of order n; and we
abbreviate the elements $\omega_i = g^i + g^{-i} \in \mathbb{Z}[G]$ for $i \in \mathbb{Z}/n\mathbb{Z}$ which are easily seen to satisfy
$\omega_i\omega_j = \omega_{i+j} + \omega_{i-j}$ (noting that all indices here are modulo n). As before, we write

\[ \kappa = \sum_{i\in\mathbb{Z}/n\mathbb{Z}} g^i = 1 + \sum_{i=1}^{t} \omega_i \in \mathbb{Z}[G] \quad\text{where } t = \tfrac{n-1}{2}. \]

Theorem 9.6 ([Wi]; see also [BH]). Let G be a cyclic group of order n = 2t + 1.
The following conditions are equivalent.
(i) There exists a Hadamard matrix of order m = 4n = 8t + 4 of Williamson type.
(ii) There exist elements $\alpha, \beta, \gamma, \delta \in \mathbb{Z}[G]$ of the form
\[ \alpha = 1 + \sum_{i=1}^{t} a_i\omega_i, \quad \beta = 1 + \sum_{i=1}^{t} b_i\omega_i, \quad \gamma = 1 + \sum_{i=1}^{t} c_i\omega_i, \quad \delta = 1 + \sum_{i=1}^{t} d_i\omega_i \]
where $a_i, b_i, c_i, d_i \in \{1, -1\}$, satisfying
\[ \alpha^2 + \beta^2 + \gamma^2 + \delta^2 = 4n. \]
(iii) There exists a partition $\{1, 2, \ldots, t\} = I_1 \sqcup I_2 \sqcup I_3 \sqcup I_4$ (with $I_j$ possibly empty)
and signs $\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_t \in \{1, -1\}$ such that the elements
\[ \tau_j = 1 + 2\sum_{i\in I_j} \varepsilon_i\omega_i \in \mathbb{Z}[G] \]
satisfy $\tau_1^2 + \tau_2^2 + \tau_3^2 + \tau_4^2 = 4n$.

Proof. It is easily verified that for an m × m matrix H of the form described above, one
has $HH^T = mI_m$ iff $A^2 + B^2 + C^2 + D^2 = 4nI_n$; here we use the fact that A, B, C, D are
commuting symmetric n × n matrices with entries ±1. This condition requires that

\[ A = a_0I + \sum_{i=1}^{t} a_i(T^i + T^{-i}), \quad \ldots, \quad D = d_0I + \sum_{i=1}^{t} d_i(T^i + T^{-i}) \]

where all coefficients $a_i, b_i, c_i, d_i \in \{1, -1\}$. Without loss of generality, $a_0 = 1$; otherwise
replace A by −A while preserving all necessary conditions. Similarly, $b_0 = c_0 = d_0 = 1$.
The algebra $\mathbb{Q}[T]$ is nothing other than the group algebra of a cyclic group G of order n; and
using the generic symbol g for the generator of G, the isomorphism takes the form $\mathbb{Q}[T] \to \mathbb{Q}[G]$,
$\sum_{i=0}^{n-1} a_iT^i \mapsto \sum_{i=0}^{n-1} a_ig^i$. Taken together, these facts establish the equivalence of
(i) and (ii).
Now given α, β, γ, δ as in (ii), let us write $\alpha = 2\alpha_0 - \kappa$ where $\alpha_0 = 1 + \sum_{i\in S_1} \omega_i$ and $S_1$
is the set of all i ∈ {1, 2, . . . , t} such that $a_i = +1$. Similarly write $\beta = 2\beta_0 - \kappa$, $\gamma = 2\gamma_0 - \kappa$,
$\delta = 2\delta_0 - \kappa$. Using $\alpha_0\kappa = (1 + 2|S_1|)\kappa$, $\kappa^2 = n\kappa$ and $\omega_i^2 = \omega_{2i} + 2$, we have

\[
\begin{aligned}
\alpha^2 &= 4\alpha_0^2 - 4\alpha_0\kappa + \kappa^2 = 4\alpha_0^2 + (n - 4 - 8|S_1|)\kappa \\
&= 4 + 8|S_1| + 8\sum_{i\in S_1}\omega_i + 4\sum_{i\in S_1}\omega_{2i} + 8\sum_{\substack{i,j\in S_1\\ i>j}}(\omega_{i+j} + \omega_{i-j}) + (n - 4 - 8|S_1|)\kappa
\end{aligned}
\]

and similarly for β, γ, δ using subsets $S_2, S_3, S_4 \subseteq \{1, 2, \ldots, t\}$. By (ii), we have

\[
\begin{aligned}
4n &= \alpha^2 + \beta^2 + \gamma^2 + \delta^2 \\
&= 16 + 4(n - 4)\kappa + 4\sum_{k=1}^{4}\Bigl[(2 - 2\kappa)|S_k| + 2\sum_{i\in S_k}\omega_i + \sum_{i\in S_k}\omega_{2i} + 2\sum_{\substack{i,j\in S_k\\ i>j}}(\omega_{i+j} + \omega_{i-j})\Bigr].
\end{aligned}
\]

Note that the constant term (i.e. the $g^0$ term) on the right side is 16 + 4(n − 4) = 4n, in
agreement with the left side. Comparing coefficients of $g^\ell$ on both sides for $1 \le \ell < n$ gives

\[ 0 = 4(n - 4) - 8\sum_{k=1}^{4}|S_k| + 8s_\ell + 4s_{2\ell} + 8\sum_{k=1}^{4}\bigl|\{(i, j) \in S_k^2 : i > j,\ |i - \ell| = j\}\bigr| \]

where $s_\ell = 2 + \frac12\bigl(a_\ell + b_\ell + c_\ell + d_\ell\bigr)$ is the number of k such that $\ell \in S_k$. Reading the last
equation modulo 8, and recalling that n is odd, we deduce that $s_\ell$ is odd; that is, exactly
three of the terms $a_\ell, b_\ell, c_\ell, d_\ell$ have the same sign, and the other has the opposite sign.
Equivalently (see Exercise #3), the vector

\[
\bigl(\tilde a_\ell, \tilde b_\ell, \tilde c_\ell, \tilde d_\ell\bigr) = \bigl(a_\ell, b_\ell, c_\ell, d_\ell\bigr)U, \qquad
U = \frac12\begin{pmatrix} -1 & 1 & 1 & 1 \\ 1 & -1 & 1 & 1 \\ 1 & 1 & -1 & 1 \\ 1 & 1 & 1 & -1 \end{pmatrix}
\]

has three zero coordinates and one coordinate ±2. Setting $(\tau_1, \tau_2, \tau_3, \tau_4) = (\alpha, \beta, \gamma, \delta)U$,
we obtain four elements

\[ \tau_1 = 1 + \sum_{\ell=1}^{t}\tilde a_\ell\omega_\ell, \quad \tau_2 = 1 + \sum_{\ell=1}^{t}\tilde b_\ell\omega_\ell, \quad \tau_3 = 1 + \sum_{\ell=1}^{t}\tilde c_\ell\omega_\ell, \quad \tau_4 = 1 + \sum_{\ell=1}^{t}\tilde d_\ell\omega_\ell \]

in $\mathbb{Z}[G]$ having exactly the form described by (iii). The converse (iii)⇒(ii) follows by
reversing the steps.
Theorem 9.6 provides the following algorithm, demonstrated in the next two examples,
to construct Hadamard matrices of order 4n ≡ 4 mod 8. Applying the trivial character
$\chi : \mathbb{Z}[G] \to \mathbb{Z}$, $\sum_g a_g g \mapsto \sum_g a_g$ to the relation $\tau_1^2 + \tau_2^2 + \tau_3^2 + \tau_4^2 = 4n$ gives a representation
of 4n as a sum of four odd squares $\sum_{k=1}^{4} n_k^2$ where $n_k = \chi(\tau_k) \equiv 1 \bmod 4$ is the trivial
character value of $\tau_k$ (the sum of the integer coefficients of $\tau_k$). We first therefore enumerate
all representations of 4n as a sum of four odd squares. Next, we find $\tau_k \in \mathbb{Z}[G]$ of the form
$1 \pm 2\omega_i \pm 2\omega_j \pm \cdots$ with distinct subscripts, whose sums of coefficients give the required
representations of 4n as a sum of four squares.
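
The first step of this algorithm, listing the ways to write 4n as a sum of four squares of integers $n_k \equiv 1 \bmod 4$, can be sketched as follows. The function name is illustrative, and representations are returned as sorted 4-tuples, one per multiset.

```python
from itertools import combinations_with_replacement

def four_odd_squares(N):
    """Sorted 4-tuples (n1, n2, n3, n4) with each n_k = 1 mod 4 and sum of squares equal to N."""
    bound = int(N ** 0.5) + 1
    # In Python, x % 4 == 1 also holds for negative x such as -3 and -7.
    candidates = [x for x in range(-bound, bound + 1) if x % 4 == 1]
    return [c for c in combinations_with_replacement(candidates, 4)
            if sum(x * x for x in c) == N]

print(four_odd_squares(20))   # the case n = 5
print(four_odd_squares(36))   # the case n = 9
```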

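Example 9.7: A Hadamard matrix of Williamson type of order 20. Let $G = \langle g\rangle$ be cyclic
of order n = 5. The unique representation of 20 as a sum of four odd squares (unique, that is, up
to permutation of the four terms) is $20 = 1^2 + 1^2 + (-3)^2 + (-3)^2$. This gives a unique solution
(again, up to permutation) of the relations (iii) in Theorem 9.6, namely $\tau_1 = \tau_2 = 1$, $\tau_3 = 1 - 2\omega_1$,
$\tau_4 = 1 - 2\omega_2$ where $\omega_1 = g + g^4$ and $\omega_2 = g^2 + g^3$. We check that this unique feasible choice does
indeed satisfy (iii):

\[
\begin{aligned}
\tau_1^2 + \tau_2^2 + \tau_3^2 + \tau_4^2 &= 1 + 1 + (1 - 2\omega_1)^2 + (1 - 2\omega_2)^2 = 4 - 4\omega_1 - 4\omega_2 + 4\omega_1^2 + 4\omega_2^2 \\
&= 4 - 4\omega_1 - 4\omega_2 + 4(\omega_2 + 2) + 4(\omega_1 + 2) = 20.
\end{aligned}
\]

We obtain

\[ (\alpha, \beta, \gamma, \delta) = (\tau_1, \tau_2, \tau_3, \tau_4)U = (1 - \omega_1 - \omega_2,\ 1 - \omega_1 - \omega_2,\ 1 + \omega_1 - \omega_2,\ 1 - \omega_1 + \omega_2) \]

which yields

\[
A = B = \begin{pmatrix} + & - & - & - & - \\ - & + & - & - & - \\ - & - & + & - & - \\ - & - & - & + & - \\ - & - & - & - & + \end{pmatrix},\quad
C = \begin{pmatrix} + & + & - & - & + \\ + & + & + & - & - \\ - & + & + & + & - \\ - & - & + & + & + \\ + & - & - & + & + \end{pmatrix},\quad
D = \begin{pmatrix} + & - & + & + & - \\ - & + & - & + & + \\ + & - & + & - & + \\ + & + & - & + & - \\ - & + & + & - & + \end{pmatrix}
\]

and thereby a Hadamard matrix of Williamson type of order 20.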
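
The matrices of Example 9.7 can be assembled and checked mechanically. In the sketch below, each circulant block is given by its first row (the coefficients of $1, g, \ldots, g^4$), and the helper names are illustrative.

```python
def circulant(first_row):
    """Circulant matrix whose row i is the first row shifted cyclically i places to the right."""
    n = len(first_row)
    return [[first_row[(j - i) % n] for j in range(n)] for i in range(n)]

def neg(M):
    return [[-x for x in row] for row in M]

def williamson(A, B, C, D):
    """Assemble the 4n x 4n Williamson array from four n x n blocks."""
    n = len(A)
    blocks = [[A, B, C, D],
              [neg(B), A, neg(D), C],
              [neg(C), D, A, neg(B)],
              [neg(D), neg(C), B, A]]
    return [sum((blk[i] for blk in block_row), []) for block_row in blocks for i in range(n)]

def is_hadamard(H):
    m = len(H)
    return all(sum(x * y for x, y in zip(H[i], H[j])) == (m if i == j else 0)
               for i in range(m) for j in range(m))

A = B = circulant([1, -1, -1, -1, -1])   # 1 - w1 - w2
C = circulant([1, 1, -1, -1, 1])         # 1 + w1 - w2
D = circulant([1, -1, 1, 1, -1])         # 1 - w1 + w2
H20 = williamson(A, B, C, D)
print(len(H20), is_hadamard(H20))
```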

The same general strategy is used in the next example; except that with n = 9 in
the following example, n is not prime (compare with n = 5 in the previous example) so a
little more work is required. The required condition $\sum_{k=1}^{4}\tau_k^2 = 4n$ in $\mathbb{Z}[G]$ is equivalent
to $\sum_{k=1}^{4}\chi(\tau_k)^2 = 4n$ as $\chi \in \widehat{G}$ ranges over a set of representatives of characters of orders
dividing n (see Corollary 7.3). Thus in Example 9.8 we consider characters of order 1,
3 and then 9, in that order, refining our choices of the $\tau_k$’s with each step. While these
examples are intended to provide a taste of how this strategy might work in general, the
reader should keep in mind that for all but the smallest values of n, this strategy is typically
implemented by computer.

Example 9.8: Hadamard Matrices of order 36 of Williamson Type. Let $G = \langle g\rangle$ be cyclic of
order n = 9. This group has an automorphism mapping $g \mapsto g^2$ which cycles $\omega_1 \mapsto \omega_2 \mapsto \omega_4 \mapsto \omega_1$
while fixing $\omega_3 \mapsto \omega_3$, thereby reducing the list of equivalence classes of the resulting Hadamard
matrices. Representations of 36 as a sum of four odd squares include $(-3)^2 + (-3)^2 + (-3)^2 + (-3)^2$
and $1^2 + 1^2 + (-3)^2 + 5^2$, both of which lead to solutions of (iii). Here we consider only solutions of the
second type such that $(\tau_1, \tau_2, \tau_3, \tau_4)$ has the form $(1,\ 1 - 2\omega_i + 2\omega_j,\ 1 - 2\omega_k,\ 1 + 2\omega_\ell)$ where $\{i, j, k, \ell\} = \{1, 2, 3, 4\}$;
and Exercise #4 covers the remaining cases not treated here. We require that
(*) $1^2 + (1 - 2\omega_i + 2\omega_j)^2 + (1 - 2\omega_k)^2 + (1 + 2\omega_\ell)^2 = 36$; or equivalently,
(**) $\chi(1)^2 + \chi(1 - 2\omega_i + 2\omega_j)^2 + \chi(1 - 2\omega_k)^2 + \chi(1 + 2\omega_\ell)^2 = 36$ for all $\chi \in \widehat{G}$.
The pattern chosen for the $\tau_k$’s ensures that the trivial character satisfies (**) automatically; and before
dealing with all 24 choices of indices in (*), we find that most of these cases can be eliminated readily
using (**). Take now the character χ of order 3 with $\chi(g) = \zeta = \zeta_3$, so that $\chi(\omega_i) = \zeta + \zeta^2 = -1$ for
i = 1, 2, 4; $\chi(\omega_3) = 2$. Thus
(a) $\chi(1 - 2\omega_k)^2 = 9$ for k = 1, 2, 3, 4;
(b) $\chi(1 + 2\omega_\ell)^2 = \begin{cases} 1, & \text{for } \ell = 1, 2, 4; \\ 25, & \text{for } \ell = 3; \end{cases}$
(c) $\chi(1 - 2\omega_i + 2\omega_j)^2 = \begin{cases} 49, & \text{for } i \ne j = 3; \\ 25, & \text{for } i = 3 \ne j; \\ 1, & \text{for } i, j, 3 \text{ distinct.} \end{cases}$
Since the four terms in (**) must sum to 36, we clearly require either i = 3 or ℓ = 3. If i = 3 then (i, j, k, ℓ) = (3, 1, 2, 4) or (3, 1, 4, 2) (or four
other possibilities equivalent to these using the 3-cycle mentioned at the outset); but none of these
cases satisfy (*). Indeed we may directly compute

\[
\begin{aligned}
1 &+ (1 - 2\omega_3 + 2\omega_1)^2 + (1 - 2\omega_2)^2 + (1 + 2\omega_4)^2 \\
&= 1 + (1 + 4\omega_3^2 + 4\omega_1^2 - 4\omega_3 + 4\omega_1 - 8\omega_3\omega_1) + (1 - 4\omega_2 + 4\omega_2^2) + (1 + 4\omega_4 + 4\omega_4^2) \\
&= 1 + 1 + 4(\omega_3 + 2) + 4(\omega_2 + 2) - 4\omega_3 + 4\omega_1 - 8(\omega_4 + \omega_2) \\
&\qquad + 1 - 4\omega_2 + 4(\omega_4 + 2) + 1 + 4\omega_4 + 4(\omega_1 + 2) \\
&= 36 + 8\omega_1 - 8\omega_2 \ne 36
\end{aligned}
\]

and similarly for (i, j, k, ℓ) = (3, 1, 4, 2). If ℓ = 3 then up to equivalence we have (i, j, k, ℓ) =
(1, 2, 4, 3) or (2, 1, 4, 3). More direct computation rules out the first of these cases; but the choice
(i, j, k, ℓ) = (2, 1, 4, 3) is found to satisfy (*). So $(\tau_1, \tau_2, \tau_3, \tau_4) = (1,\ 1 - 2\omega_2 + 2\omega_1,\ 1 - 2\omega_4,\ 1 + 2\omega_3)$
gives a Hadamard matrix of order 36 of Williamson type.
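
As the notes remark, this kind of case analysis is typically done by computer. The sketch below tests condition (*) for all 24 index choices by direct multiplication in $\mathbb{Z}[G]$, storing an element as its list of nine coefficients of $1, g, \ldots, g^8$; the function names are illustrative.

```python
from itertools import permutations

N = 9  # order of the cyclic group G

def mul(x, y):
    """Product in Z[G]: cyclic convolution of coefficient lists."""
    z = [0] * N
    for i, xi in enumerate(x):
        for j, yj in enumerate(y):
            z[(i + j) % N] += xi * yj
    return z

def omega(i):
    """Coefficient list of w_i = g^i + g^(-i)."""
    w = [0] * N
    w[i % N] += 1
    w[-i % N] += 1
    return w

def tau(terms):
    """Coefficient list of 1 + sum of c * w_i over the pairs (c, i) in terms."""
    t = [1] + [0] * (N - 1)
    for c, i in terms:
        t = [a + c * b for a, b in zip(t, omega(i))]
    return t

def satisfies_star(i, j, k, l):
    taus = [tau([]), tau([(-2, i), (2, j)]), tau([(-2, k)]), tau([(2, l)])]
    total = [sum(col) for col in zip(*(mul(t, t) for t in taus))]
    return total == [36] + [0] * (N - 1)

solutions = [p for p in permutations((1, 2, 3, 4)) if satisfies_star(*p)]
print(solutions)
```

The solutions found form a single orbit under the 3-cycle $\omega_1 \mapsto \omega_2 \mapsto \omega_4 \mapsto \omega_1$, in agreement with the example.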

Theorem 9.9 ([Wh]). For every prime power q ≡ 1 mod 4 there exists a Hadamard
matrix of order 4n = 2(q + 1) of Williamson type.

Proof. Take $n = \frac12(q + 1)$ throughout, and note that n is odd. Consider the quadratic
extension E ⊃ F of fields of order $q^2$ and q respectively. Let χ : F → {0, ±1} be the
quadratic character. Choose a generator ω for the multiplicative group $E^\times$, so that
$\omega_1 = \omega^{q+1}$ is a generator of $F^\times$. We abbreviate $\mathrm{Tr} = \mathrm{Tr}_{E/F} : x \mapsto x^q + x$ throughout. We
define the sequence $u_k = \chi(\mathrm{Tr}\,\omega^k)$ for $k \in \mathbb{Z}$. Recall that the quadratic character χ
satisfies $\chi(\omega_1^k) = (-1)^k$ and so $u_{k+2n} = \chi(\mathrm{Tr}\,\omega^{k+2n}) = \chi(\mathrm{Tr}(\omega_1\omega^k)) = \chi(\omega_1\mathrm{Tr}\,\omega^k) =
\chi(\omega_1)\chi(\mathrm{Tr}\,\omega^k) = -u_k$. Thus
(9.10) $u_{k+2n} = -u_k$ for all k. In particular, the value of $u_k$ depends only on
k mod 4n.

Since $\mathrm{Tr}\,\omega^k = \omega^{qk} + \omega^k = \omega^{(q+1)k}(\omega^{-k} + \omega^{-qk}) = \omega_1^k\,\mathrm{Tr}\,\omega^{-k}$, applying χ yields

(9.11) $u_{-k} = (-1)^k u_k$ for all k.
The element $\theta = \omega^n$ satisfies $\theta^2 = \omega^{2n} = \omega_1$ and $\theta^q = \omega^{nq} = \omega^{(q^2-1)/2}\omega^n = -\theta$ so Tr θ = 0.
Now {1, θ} is a basis for E over F; and Tr(a + bθ) = 2a for all a, b ∈ F. Note that for
z ∈ E, we have χ(Tr z) = 0 iff Tr z = 0 iff z ∈ Fθ = {bθ : b ∈ F}. Thus
(9.12) $u_k = 0$ for k ≡ n mod 2n; and otherwise, $u_k = \pm1$.
 9. Hadamard Matrices

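Next we want to evaluate

(9.13) $\displaystyle\sum_{z\in E} \chi(\mathrm{Tr}\,z)\,\chi(\mathrm{Tr}(\omega^r z))$ for $r \in \mathbb{Z}$.

Writing $z = a + b\theta$, $\omega^r = c + d\theta$ with $a, b, c, d \in F$, the sum (9.13) takes the form
\[
\sum_{a,b\in F} \chi(2a)\,\chi(2(ac + bd\omega_1))
= \sum_{\substack{a,b\in F \\ a\ne 0}} \chi\big(\tfrac{1}{2a}\big)\,\chi(2(ac + bd\omega_1))
= \sum_{\substack{a,b\in F \\ a\ne 0}} \chi\big(c + \tfrac{b}{a}\,d\omega_1\big)
= \begin{cases} q(q-1)\chi(c), & \text{if } d = 0; \\ 0, & \text{if } d \ne 0. \end{cases}
\]
Using the periodicity relation (9.10) and the fact that the summand in (9.13) vanishes at $z = 0$, the sum
(9.13) simplifies to
\[
2(q-1) \sum_{k=0}^{n-1} \chi(\mathrm{Tr}\,\omega^k)\,\chi(\mathrm{Tr}(\omega^{k+r}))
= \begin{cases} q(q-1)\chi(c), & \text{if } d = 0; \\ 0, & \text{if } d \ne 0. \end{cases}
\]
Also $\omega^r = c + d\theta \in F$ iff $r \equiv 0 \bmod 2n$, in which case $\chi(2c) = \chi(\mathrm{Tr}\,\omega^r) = (-1)^{r/2n}$; so

(9.14) $\displaystyle\sum_{k=0}^{n-1} u_k u_{k+r} = \begin{cases} (-1)^{r/2n}\chi(2)\,q, & \text{if } r \equiv 0 \bmod 2n; \\ 0, & \text{otherwise.} \end{cases}$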
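Now construct the four $n \times n$ matrices $A = \sum_{k=0}^{n-1} a_k T^k$, . . . , $D = \sum_{k=0}^{n-1} d_k T^k$ where
\[
a_k = \begin{cases} 1, & \text{if } k \equiv 0 \bmod n; \\ u_{4k+n}, & \text{otherwise;} \end{cases}
\qquad
b_k = \begin{cases} 1, & \text{if } k \equiv 0 \bmod n; \\ -u_{4k+n}, & \text{otherwise;} \end{cases}
\qquad
c_k = d_k = u_{4k} \text{ for all } k.
\]
Note by (9.12) that all the values $a_k, b_k, c_k, d_k \in \{\pm1\}$. Now
\[
A^2 + B^2 + C^2 + D^2 = \sum_{r=0}^{n-1} \Big( \sum_{k=0}^{n-1} (a_k a_{k+r} + b_k b_{k+r} + c_k c_{k+r} + d_k d_{k+r}) \Big) T^r .
\]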

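All the diagonal entries (for $r = 0$) are clearly $4n$; and for $r \ne 0$ we have
\[
\sum_{k=0}^{n-1} (c_k c_{k+r} + d_k d_{k+r}) = 2\sum_{k=0}^{n-1} u_{4k} u_{4k+4r} = 2\sum_{\ell=0}^{n-1} u_\ell u_{\ell+r} = 0
\]
using the permutation $k \mapsto \ell = 4k$ on $\mathbb{Z}/n\mathbb{Z}$. The remaining terms are
\[
\begin{aligned}
\sum_{k=0}^{n-1} (a_k a_{k+r} + b_k b_{k+r})
&= \sum_{\substack{1 \le k \le n-1 \\ k \not\equiv -r \bmod n}} (a_k a_{k+r} + b_k b_{k+r})
&& \text{since } a_r + b_r = u_{4r+n} - u_{4r+n} = 0 \\
& && \text{and } a_{-r} + b_{-r} = u_{-4r+n} - u_{-4r+n} = 0 \\
&= 2 \sum_{0 \le k \le n-1} u_{4k+n} u_{4k+4r+n}
&& \text{since } u_n = 0 \text{ by (9.12)} \\
&= 2 \sum_{\ell=0}^{n-1} u_\ell u_{\ell+r} = 0,
\end{aligned}
\]
again using a permutation $k \mapsto \ell = 4k + n$ of the index set $\mathbb{Z}/n\mathbb{Z}$. Now $A, B, C, D$ are
commuting circulant matrices which satisfy $A^2 + B^2 + C^2 + D^2 = 4nI_n$ as required.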

Example 9.15: A Hadamard Matrix of order 12. Here we demonstrate Theorem 9.9 for $q = 5$,
$n = \frac12(q+1) = 3$. Take $\mathbb{F}_{25} = \mathbb{F}_5[\omega]$ where the primitive element $\omega$ satisfies $\omega^2 = \omega - 2$. The sequence
$u_0, u_1, u_2, \ldots$ is −+−0+++−+0−− −+−0+++−+0−− −+− . . ., giving $A = I - T - T^2$, $B = I + T + T^2$,
$C = D = -I + T + T^2$. The construction of Theorem 9.9 gives
\[
H = \begin{pmatrix} A & B & C & D \\ -B & A & -D & C \\ -C & D & A & -B \\ -D & -C & B & A \end{pmatrix}
\]
whose twelve rows, written with + for 1 and − for −1 and grouped in blocks of three columns, are

    +−− +++ −++ −++
    −+− +++ +−+ +−+
    −−+ +++ ++− ++−
    −−− +−− +−− −++
    −−− −+− −+− +−+
    −−− −−+ −−+ ++−
    +−− −++ +−− −−−
    −+− +−+ −+− −−−
    −−+ ++− −−+ −−−
    +−− +−− +++ +−−
    −+− −+− +++ −+−
    −−+ −−+ +++ −−+
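
The construction in Example 9.15 is small enough to reproduce by machine. The following is a minimal sketch, not the author's code: it assumes the relation $\omega^2 = \omega - 2$ in $\mathbb{F}_{25}$ (so that $x^2 - x + 2$ is irreducible over $\mathbb{F}_5$ and $\mathrm{Tr}\,\omega = 1$), and all function names (`mul`, `trace`, `chi`, `u_sequence`, `circulant`, `williamson`, `is_hadamard`) are hypothetical helpers introduced for illustration. It computes the sequence $u_k$, forms the first rows of $A, B, C, D$ as in the proof of Theorem 9.9, and checks that the resulting $12 \times 12$ array is Hadamard.

```python
P = 5          # F_25 = F_5[w]; elements are pairs (a0, a1) meaning a0 + a1*w
W = (0, 1)     # the generator w, assumed to satisfy w^2 = w - 2

def mul(x, y):
    """Multiply two elements of F_25, reducing with w^2 = w - 2."""
    a0, a1 = x
    b0, b1 = y
    return ((a0 * b0 - 2 * a1 * b1) % P, (a0 * b1 + a1 * b0 + a1 * b1) % P)

def trace(x):
    """Tr(a0 + a1*w) = 2*a0 + a1*Tr(w), with Tr(w) = 1 for x^2 - x + 2."""
    a0, a1 = x
    return (2 * a0 + a1) % P

def chi(a):
    """Quadratic character of F_5, computed by Euler's criterion."""
    a %= P
    if a == 0:
        return 0
    return 1 if pow(a, (P - 1) // 2, P) == 1 else -1

def u_sequence(length):
    """Return [u_0, ..., u_{length-1}] with u_k = chi(Tr w^k)."""
    seq, x = [], (1, 0)
    for _ in range(length):
        seq.append(chi(trace(x)))
        x = mul(x, W)
    return seq

def circulant(row):
    """Circulant matrix sum_k row[k] T^k with the given first row."""
    n = len(row)
    return [[row[(j - i) % n] for j in range(n)] for i in range(n)]

def williamson(A, B, C, D):
    """Assemble the 4n x 4n Williamson array from four n x n blocks."""
    n = len(A)
    neg = lambda M: [[-v for v in r] for r in M]
    blocks = [[A, B, C, D],
              [neg(B), A, neg(D), C],
              [neg(C), D, A, neg(B)],
              [neg(D), neg(C), B, A]]
    return [sum((blk[i] for blk in brow), []) for brow in blocks for i in range(n)]

def is_hadamard(H):
    """Check that H has orthogonal rows of squared length len(H)."""
    m = len(H)
    return all(sum(H[i][k] * H[j][k] for k in range(m)) == (m if i == j else 0)
               for i in range(m) for j in range(m))

n = 3
u = u_sequence(4 * n)
a = [1] + [u[(4 * k + n) % (4 * n)] for k in range(1, n)]
b = [1] + [-u[(4 * k + n) % (4 * n)] for k in range(1, n)]
c = [u[(4 * k) % (4 * n)] for k in range(n)]      # c_k = d_k = u_{4k}
H = williamson(circulant(a), circulant(b), circulant(c), circulant(c))
print(u, is_hadamard(H))
```

Only integer arithmetic is used, so the orthogonality check is exact. Other values of $q$ would need a different representation of the field $E$.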

Regular Hadamard Matrices


A Hadamard matrix is regular if all its row sums and all its column sums are equal.
By Exercise #1, a regular Hadamard matrix is the same thing as a matrix of the form
$H = A_+ - A_-$ where $A_+$ and $A_-$ are incidence matrices of a complementary pair of
symmetric $(4n^2, 2n^2 \pm n, n^2 \pm n)$-designs. Each of these designs has order $n^2$.
A regular Hadamard matrix may or may not arise from a difference set. When it does,
it is given by a difference set with parameters $(4n^2, 2n^2 \pm n, n^2 \pm n)$ in a group of order $4n^2$.
A difference set with these (very prolific) parameters is a Menon difference set, and the
associated design a Menon design. (Warning: Many authors also refer to these as simply
Hadamard difference sets, inviting confusion with the parameters $(4n-1, 2n-1, n-1)$
which also go by this name.) Up to complementation, it suffices to consider only the
parameters $(4n^2, 2n^2 - n, n^2 - n)$. Note that interchanging $A_+ \leftrightarrow A_-$ switches $H \leftrightarrow -H$, but
every Hadamard matrix is equivalent to its negative. Note also that in general, a regular
Hadamard matrix may be equivalent to a Hadamard matrix which is not regular. Thus
for example, (9.1) includes Hadamard matrices H2 and H4 of order 2 and 4 respectively;
here H4 is regular while the equivalent matrix H2 ⊗ H2 is not.
The smallest regular Hadamard matrices of interest arise from symmetric (16, 6, 2)-designs,
which in turn arise from difference sets. Of the five equivalence classes of
Hadamard matrices of order 16, three classes contain regular Hadamard matrices. One
might argue that the one with the most symmetry is H4 ⊗ H4 with H4 = J4 − 2I4 as in
the previous paragraph. We describe the symmetric design and difference sets giving rise
to this example; and we comment only briefly on the other regular Hadamard matrices of
order 16.
There are exactly three symmetric (16, 6, 2)-designs, up to isomorphism. All of them
arise from difference sets in multiple ways; these details may be found in [AK], [AS], [Ki].
Indeed, of the fourteen groups of order 16, all but two (the cyclic and dihedral groups)
admit (16, 6, 2)-difference sets in multiple ways. However, different groups and distinct
difference sets in these groups can yield isomorphic designs. This is because a given design
may admit more than one regular (i.e. sharply transitive) group of automorphisms; recall
Theorem 8.5. What is true, however, is that the three nonisomorphic symmetric (16, 6, 2)-
designs yield three inequivalent regular Hadamard matrices. We describe just one of these,
beginning with the design.

We construct a symmetric (16, 6, 2)-design whose 16 points are the 16
cells of a 4 × 4 grid, as pictured on the right. The shaded cells in our picture
indicate one of the sixteen blocks, where we have taken the symmetric
difference of a row and a column. This can be done in 16 ways, generat-
ing thereby the 16 blocks: take the union of any row and any column, then
delete the single cell where they intersect. Verify mentally that every block contains 6
points; every point lies in 6 blocks; two distinct points have exactly 2 blocks in common;
and two distinct blocks have exactly 2 points in common. There are twelve nonisomorphic
groups that can act regularly on the points (and blocks) of this design, yielding twenty-
four inequivalent difference sets. Here we consider only the possibilities with G abelian.
Let $K_1 = \{1, a, b, c\}$ and $K_2 = \{1, a', b', c'\}$ be multiplicative groups of order 4 (any
combination of cyclic groups and Klein 4-groups is fine) and let $G = K_1 \times K_2$. By
choosing distinct symbols for the nonidentity elements in $K_1$ and $K_2$, these subgroups
can be identified with their images in $G = K_1 \times K_2$. Then $D = \{a, b, c, a', b', c'\}$ is a
difference set in $G$ with parameters (16, 6, 2). If we denote $\kappa_1 = 1 + a + b + c \in \mathbb{Z}[K_1]$ and
$\kappa_2 = 1 + a' + b' + c' \in \mathbb{Z}[K_2]$, then $\kappa := \kappa_1\kappa_2 = \sum_{g \in G} g$ and, writing $\delta = \sum_{d \in D} d = \kappa_1 + \kappa_2 - 2$,

\[
\delta\delta^* = \delta^2 = (\kappa_1 + \kappa_2 - 2)^2 = 4\kappa_1 + 4\kappa_2 + 4 - 4\kappa_1 - 4\kappa_2 + 2\kappa_1\kappa_2 = 4 + 2\kappa,
\]

verifying that D is a difference set in G with parameters (16, 6, 2). If we let K1 act regularly
on the rows of the grid, and K2 regularly on the columns, then G = K1 × K2 acts regularly
on the cells (i.e. points of the design) and D is the set of all g ∈ G mapping P into B,
for some choice of point P and block B. It is not hard to see that the resulting regular
Hadamard matrix is H4 ⊗ H4 in the notation of the previous paragraph.
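
The difference set property of $D = K_1^\# \cup K_2^\#$ can also be checked by direct counting. The sketch below is an illustration under stated assumptions: groups are written additively as products of cyclic groups, and the name `is_difference_set` and the two sample encodings (`D_cyclic` for $K_1 \cong K_2 \cong \mathbb{Z}_4$, `D_klein` for $K_1 \cong K_2 \cong \mathbb{Z}_2^2$) are hypothetical choices made here.

```python
from collections import Counter
from itertools import product

def is_difference_set(D, moduli, lam):
    """True if every nonzero element of Z_m1 x ... x Z_mk occurs exactly
    lam times as a difference d1 - d2 with d1 != d2 in D."""
    counts = Counter(
        tuple((x - y) % m for x, y, m in zip(d1, d2, moduli))
        for d1 in D for d2 in D if d1 != d2
    )
    zero = tuple(0 for _ in moduli)
    group = product(*(range(m) for m in moduli))
    return all(counts[g] == lam for g in group if g != zero)

# K1 = Z_4 x {0}, K2 = {0} x Z_4 inside G = Z_4 x Z_4
D_cyclic = [(i, 0) for i in (1, 2, 3)] + [(0, j) for j in (1, 2, 3)]

# K1, K2 Klein 4-groups inside G = Z_2^4
D_klein = [(1, 0, 0, 0), (0, 1, 0, 0), (1, 1, 0, 0),
           (0, 0, 1, 0), (0, 0, 0, 1), (0, 0, 1, 1)]

print(is_difference_set(D_cyclic, (4, 4), 2))
print(is_difference_set(D_klein, (2, 2, 2, 2), 2))
```

Both calls count the 30 ordered differences and confirm that each of the 15 nonidentity elements arises exactly twice, in agreement with $\delta\delta^* = 4 + 2\kappa$.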

Circulant Hadamard Matrices


A circulant Hadamard matrix is a circulant matrix that is also Hadamard. The
only known examples have order 1 and 4, as in (9.1). It is clear that every circulant
Hadamard matrix must be regular; and it must be constructed from a Menon difference
set in a cyclic group of order $4n^2$ for some positive integer $n$. The Circulant Hadamard
Conjecture asserts that these do not exist for $n > 1$. Much work has been devoted to
proving this conjecture; see [LeS], [S2]. (In related internet searches, beware of bogus
proofs claiming to have already proved the conjecture.) In the smallest nontrivial case
$4n^2 = 16$, we noted above that there is no (16, 6, 2)-difference set in a cyclic group of
order 16. This result, long known, is attributed to Turyn.

Theorem 9.16. There is no cyclic difference set with parameters (16, 6, 2), and
hence no circulant Hadamard matrix of order 16.

Proof. Let $G = \{1, g, g^2, \ldots, g^{15}\}$ be cyclic of order 16, and suppose there exists a difference
set $D \subset G$ with parameters (16, 6, 2). Then $\delta = \sum_{d \in D} d \in \mathbb{Z}[G]$ is a sum of six distinct
elements in $G$ satisfying $\delta\delta^* = 4 + 2\kappa$ where $\kappa = \sum_{i=0}^{15} g^i$. Let $\mathcal{O} = \mathbb{Z}[\zeta]$, $\zeta = \zeta_{16}$ and consider
the character $\chi \in \widehat{G}$ of order 16 satisfying $\chi(g) = \zeta$. Denote $\alpha = \chi(\delta) = \sum_{d \in D} \chi(d) \in \mathcal{O}$.
By Theorem 8.7, $|\alpha| = 2$. (Note here that $(v, k, r) = (16, 6, 2)$ and $n = 6 - 2 = 4$.) Much
more than this, for every $\sigma \in \mathrm{Aut}\, \mathbb{Q}[\zeta]$, we have $|\sigma(\alpha)| = 2$, since $\sigma \circ \chi : G \to \langle\zeta\rangle$ is also a
nontrivial character of $G$.
Evaluating both sides of
\[
\Phi_{16}(x) = x^8 + 1 = (x - \zeta)(x - \zeta^3)(x - \zeta^5) \cdots (x - \zeta^{15})
\]
at 1 yields
\[
2 = (1 - \zeta)(1 - \zeta^3)(1 - \zeta^5) \cdots (1 - \zeta^{15}).
\]
By Theorem 4.3, all eight factors in the latter product are associates of $\varepsilon = 1 - \zeta$ in $\mathcal{O}$,
yielding the factorization of principal ideals $(2) = (\varepsilon)^8$, so $N((\varepsilon)) = 2$ and the ideal
$(\varepsilon) \subset \mathcal{O}$ is prime, the only distinct prime factor of (2) in $\mathcal{O}$ (i.e. the prime 2 ramifies
in $\mathbb{Q}[\zeta]$). Also $(\alpha)(\overline{\alpha}) = (4) = (\varepsilon)^{16}$, so comparing prime factors on both sides gives
$(\alpha) = (\varepsilon)^r$ and $(\overline{\alpha}) = (\varepsilon)^s$ where $r + s = 16$. Now $2^r = N((\alpha)) = N((\overline{\alpha})) = 2^s$ so
$r = s$ and $(\alpha) = (\overline{\alpha}) = (\varepsilon)^8 = (2)$. Since $\alpha$ and 2 are associates, $\alpha = 2u$ for some unit
$u \in \mathcal{O}^\times$. Recalling that $|\sigma(\alpha)| = 2$ for all $\sigma \in \mathrm{Aut}\, \mathbb{Q}[\zeta]$, we must have $|\sigma(u)| = 1$ for all
$\sigma \in \mathrm{Aut}\, \mathbb{Q}[\zeta]$. By Theorem 4.10, $u$ is a root of unity in $\mathcal{O}$. By Theorem 4.2, $u = \zeta^k$ for
some $k \in \{0, 1, 2, \ldots, 15\}$. Without loss of generality, $\alpha = 2$; otherwise use Theorem 8.6
to replace $D$ by $g^{-k}D$, another (equivalent) difference set in $G$ with the same parameters
(16, 6, 2) satisfying $\chi(g^{-k}\delta) = \zeta^{-k}\alpha = 2$.
Now express $\delta$ in the form $\delta = \sum_{k=0}^{15} a_k g^k$ where $a_k \in \{0, 1\}$ with $\sum_{k=0}^{15} a_k = 6$, and
note that $\zeta^8 = -1$ to get

(9.17) $\displaystyle 2 = \alpha = \chi(\delta) = \sum_{k=0}^{15} a_k \zeta^k = \sum_{k=0}^{7} (a_k - a_{k+8})\zeta^k.$

Since the minimal polynomial of $\zeta$ over $\mathbb{Q}$ is $x^8 + 1$, (9.17) requires $a_0 = 2 + a_8$ and $a_k = a_{k+8}$
for $k = 1, 2, \ldots, 7$. Since $a_k \in \{0, 1\}$, this is impossible.
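
Theorem 9.16 concerns a search space small enough to examine exhaustively, which gives an independent check of the statement (though none of the insight of the proof). The sketch below writes the cyclic group additively as $\mathbb{Z}/v\mathbb{Z}$ and, since translates of a difference set are again difference sets, only considers subsets containing 0. The function name `cyclic_difference_sets` is a hypothetical helper.

```python
from collections import Counter
from itertools import combinations

def cyclic_difference_sets(v, k, lam):
    """List all (v, k, lam)-difference sets in Z/vZ that contain 0."""
    found = []
    for rest in combinations(range(1, v), k - 1):
        D = (0,) + rest
        counts = Counter((x - y) % v for x in D for y in D if x != y)
        if all(counts[g] == lam for g in range(1, v)):
            found.append(D)
    return found

print(cyclic_difference_sets(16, 6, 2))       # no cyclic (16, 6, 2)-difference set
print(cyclic_difference_sets(7, 3, 1)[:1])    # sanity check: the Fano plane parameters
```

The first call runs through the 3003 candidate subsets of $\mathbb{Z}/16\mathbb{Z}$ containing 0 and returns an empty list. The second call is included only as a sanity check that the search does find difference sets when they exist.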

Although Q[ζ16 ] is known to be a UFD, we did not require this in the latter proof; all our
factorizations were with regard to ideals.

Exercises 9.
1. Show that every Hadamard matrix of the form $H = I + S$ where $S^T = -S$ with first row
+++· · ·+ and first column +−−· · ·− satisfies conditions (9.4) and (9.5). (Not much more needs
to be said about (9.4), but (9.5) requires some explanation.)

2. Let $H$ be a $v \times v$ regular Hadamard matrix, $v = 4N$; so by definition,
there is an integer $k$ such that every row and column of $H$ has $k$ ones and $v-k$ minus ones.
Prove that the positions of the ones in H form the incidence matrix of a (4n2 , 2n2 ±n, n2 ±n)-
design; and the positions of the minus ones form the incidence matrix of the complementary
(4n2 , 2n2 ∓n, n2 ∓n)-design. In other words, show that H = A+−A− where A++A− = Jv and the
matrices A+ and A− are the incidence matrices of a pair of complementary (4n2 , 2n2 ±n, n2 ±n)-
designs.
Hint: Show that for any two distinct rows of H, the k minus ones in one row and the k minus ones
in the other row overlap in exactly r positions where r = k − N . Show that the positions of the
minus ones form the incidence matrix of a symmetric (v, k, r) design. Show that N = (2N − k)2
using the feasibility relation of Section 8. Let n = 2N − k.
3. The root lattice of type $D_4$ is the set $L$ consisting of all vectors $\frac12(a_1, a_2, a_3, a_4) \in \mathbb{R}^4$ such
that $a_1, a_2, a_3, a_4$ are integers of the same parity (i.e. all even or all odd).
(a) Show that $L$ is a 4-dimensional lattice; that is, there exists a basis $\{v_1, v_2, v_3, v_4\}$ of the real
vector space $\mathbb{R}^4$ such that $L = \mathbb{Z}v_1 + \mathbb{Z}v_2 + \mathbb{Z}v_3 + \mathbb{Z}v_4$.
(b) Show that for every $v \in L$, the squared Euclidean length $\|v\|^2$ is an integer.
(c) The roots of L are the vectors v ∈ L of length 1. Show that there are sixteen roots,
partitioned as ∆ = ∆0 t ∆1 where |∆0 | = |∆1 | = 8, ∆0 ⊂ Z4 and vectors in ∆1 have
half-integer coordinates.
(d) Consider the Hadamard matrix $H = J_4 - 2I_4$ of order 4 given at the start of this section. Show
that the matrix $U = \frac12 H$ represents an isometry (i.e. distance-preserving transformation) on
R4 , which preserves L; in particular, U permutes the roots of L. Show moreover that U
interchanges the two subsets ∆0 ↔ ∆1 of the roots.
4. Complete the enumeration of Hadamard matrices of order 36 of Williamson type begun in Ex-
ample 9.8.
(a) Show that the representation $36 = (-3)^2 + (-3)^2 + (-3)^2 + (-3)^2$ leads to a unique solution
$(\tau_1, \tau_2, \tau_3, \tau_4) = (1-2\omega_1, 1-2\omega_2, 1-2\omega_3, 1-2\omega_4)$ up to permutation of the four indices,
leading to a Hadamard matrix of order 36 of Williamson type.
(b) Show that the representation $36 = 1^2 + 1^2 + (-3)^2 + 5^2$ leads to just one more possibility
other than the one we considered in Example 9.8, namely $(1, 1, 1-2\omega_i, 1-2\omega_j+2\omega_k+2\omega_\ell)$.
Does this case actually yield any Hadamard matrices? Explain.

10. Quadratic Reciprocity


Let $F = \mathbb{F}_p$ be the field of odd prime order $p$, i.e. the integers mod $p$. (Some of the
results of this section will be extended to more general odd-order fields in Section 11.)
We recall some properties of $F$ from Section 3: The group of units $F^\times$ is cyclic of order
$p - 1$, partitioned as $F^\times = S \sqcup N$ where $S$ and $N$ consist of the nonzero squares and the
nonsquares respectively. We have $|S| = |N| = \frac12(p - 1)$. Here $S < F^\times$ is the subgroup of
index 2, with $N$ the nontrivial coset of $S$. We have the quadratic character $\chi : F \to \mathbb{C}$
where $\chi(a) = 0$, $1$ or $-1$ according as $a = 0$, $a \in S$ or $a \in N$.

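If we compose χ with the canonical homomorphism of additive groups $\mathbb{Z} \to \mathbb{Z}/p\mathbb{Z} = F$,
we get the Legendre symbol defined for every integer $a$ by
\[
\Big(\frac{a}{p}\Big) = \begin{cases} 0, & \text{if } a \equiv 0 \bmod p; \\ 1, & \text{if } a \text{ is a nonzero square mod } p; \\ -1, & \text{if } a \text{ is a nonsquare mod } p. \end{cases}
\]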
The fact that χ : Z → C× is multiplicative is a simple consequence of the fact that it is
a composite of two multiplicative maps. The lifting of χ (the quadratic character on the
finite field F ) to all of Z, is an example of a Dirichlet character (see Appendix 6). In this
context, squares and nonsquares (mod p) are traditionally called quadratic residues and
nonquadratic residues; but unfortunately the latter is often abbreviated to nonresidues,
which is strictly a misnomer. We will simply speak of nonzero squares and nonsquares
mod p. The strict definition of the Legendre symbol usually excludes the case $a \equiv 0$
mod $p$; but a natural generalization of $\big(\frac{a}{p}\big)$ called the Jacobi symbol allows this case, and
so do we. (But we will have nothing more to say about the Jacobi symbol in these notes.)
As with the quadratic character on F , several other properties of F underlie basic
properties of the ring Z. For example Fermat’s Little Theorem, in the form
(10.1) $a^{p-1} \equiv 1 \bmod p$ for every integer $a \not\equiv 0 \bmod p$, $p$ prime,

is a direct consequence of the fact that $F^\times$ is a group of order $p - 1$. And from (10.1) we
immediately obtain the equivalent form of Fermat’s Little Theorem,
(10.2) $a^p \equiv a \bmod p$ for every integer $a$, where $p$ is any prime.

Also the formula $\chi(a) = a^{(p-1)/2}$ in $F$, restated in the context of rational integers, gives
Euler’s Criterion:
(10.3) $\big(\frac{a}{p}\big) \equiv a^{(p-1)/2} \bmod p$ for every integer $a$.
We restate the multiplicativity of the Legendre symbol, which follows either from facts
about $F$ or from (10.3):
(10.4) $\big(\frac{ab}{p}\big) = \big(\frac{a}{p}\big)\big(\frac{b}{p}\big)$ for all integers $a, b$.

The main result of this Section is

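Theorem 10.5. Let $p$ and $q$ be distinct odd primes. Then

(a) $\Big(\dfrac{-1}{p}\Big) = (-1)^{(p-1)/2} = \begin{cases} 1, & \text{if } p \equiv 1 \bmod 4; \\ -1, & \text{if } p \equiv 3 \bmod 4; \end{cases}$

(b) $\Big(\dfrac{2}{p}\Big) = (-1)^{(p^2-1)/8} = \begin{cases} 1, & \text{if } p \equiv \pm1 \bmod 8; \\ -1, & \text{if } p \equiv \pm3 \bmod 8; \end{cases}$

(c) $\Big(\dfrac{p}{q}\Big)\Big(\dfrac{q}{p}\Big) = (-1)^{(p-1)(q-1)/4} = \begin{cases} 1, & \text{if at least one of } p, q \text{ is} \equiv 1 \bmod 4; \\ -1, & \text{if } p \equiv q \equiv 3 \bmod 4. \end{cases}$
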

Part (c) is properly known as the Law of Quadratic Reciprocity; and we include (a)
and (b) for completeness. There are currently hundreds of proofs of this result known—
probably more proofs than any result other than the Theorem of Pythagoras. A current
census lists 246 distinct proofs:
http://www.rzuser.uni-heidelberg.de/~hb3/fchrono.html
Gauss himself gave many different proofs; and the proof we give here is his sixth proof,
although it has been rediscovered by many others since then. Before proving this result,
we demonstrate its utility for computing specific values of the Legendre symbol:
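Example 10.6: Computing the Legendre symbol using the Law of Quadratic Reciprocity.
Evaluate each of the values $\big(\frac{a}{331}\big)$ for $a = 83$, $101$ and $146$, noting that 331 is prime.
Solutions:
$\big(\frac{83}{331}\big) = -\big(\frac{331}{83}\big) = -\big(\frac{82}{83}\big) = -\big(\frac{-1}{83}\big) = -(-1) = 1.$
$\big(\frac{101}{331}\big) = \big(\frac{331}{101}\big) = \big(\frac{28}{101}\big) = \big(\frac{2}{101}\big)^2\big(\frac{7}{101}\big) = \big(\frac{101}{7}\big) = \big(\frac{3}{7}\big) = -\big(\frac{7}{3}\big) = -\big(\frac{1}{3}\big) = -1.$
$\big(\frac{146}{331}\big) = \big(\frac{2}{331}\big)\big(\frac{73}{331}\big) = (-1)\big(\frac{331}{73}\big) = -\big(\frac{39}{73}\big) = -\big(\frac{3}{73}\big)\big(\frac{13}{73}\big) = -\big(\frac{73}{3}\big)\big(\frac{73}{13}\big) = -\big(\frac{1}{3}\big)\big(\frac{8}{13}\big) = -\big(\frac{2}{13}\big)^3 = -(-1)^3 = 1.$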

The utility of the Law of Quadratic Reciprocity for computing values of the Legendre
symbol is primarily for using hand computation with small examples such as these. For
larger integers, Euler’s Criterion (10.3) is much easier to implement by computer; and it
avoids the difficulty of having to perform integer factorization on larger numbers (which
is prohibitive for integers of hundreds of digits). But of course, the Law of Quadratic
Reciprocity has many uses beyond such numerical examples as these.
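
As an illustration of the remark about Euler’s Criterion, here is a minimal sketch of a Legendre symbol routine based on (10.3). The function name `legendre` is a hypothetical choice; the only library feature used is Python's built-in three-argument `pow`, which performs modular exponentiation by repeated squaring. The modulus `p` is assumed to be an odd prime, and the code does not check this.

```python
def legendre(a, p):
    """Legendre symbol (a/p) for an odd prime p, via Euler's Criterion (10.3).

    a^((p-1)/2) mod p is 0, 1 or p-1; the last value represents -1.
    """
    r = pow(a % p, (p - 1) // 2, p)
    return r - p if r == p - 1 else r

# The three values computed by hand in Example 10.6
for a in (83, 101, 146):
    print(a, legendre(a, 331))
```

No factorization is involved, so the same routine can be applied to primes with hundreds of digits.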
Example 10.7: Factoring quadratic polynomials mod p. Factor each of the polynomials
$x^2 + 13x + 17$ and $3x^2 + 13x + 16$ in $\mathbb{F}_{823}[x]$.
Solution: $x^2 + 13x + 17$ has discriminant $13^2 - 4\cdot17 = 101$ where
\[
\big(\tfrac{101}{823}\big) = \big(\tfrac{823}{101}\big) = \big(\tfrac{15}{101}\big) = \big(\tfrac{3}{101}\big)\big(\tfrac{5}{101}\big) = \big(\tfrac{101}{3}\big)\big(\tfrac{101}{5}\big) = \big(\tfrac{2}{3}\big)\big(\tfrac{1}{5}\big) = (-1)(1) = -1.
\]
This polynomial has no roots in $\mathbb{F}_{823}$ since its discriminant is a nonsquare; so it is irreducible in
$\mathbb{F}_{823}[x]$.
The polynomial $3x^2 + 13x + 16$ has discriminant $13^2 - 4\cdot3\cdot16 = -23$ where
\[
\big(\tfrac{-23}{823}\big) = \big(\tfrac{-1}{823}\big)\big(\tfrac{23}{823}\big) = (-1)\Big[-\big(\tfrac{823}{23}\big)\Big] = \big(\tfrac{18}{23}\big) = \big(\tfrac{2}{23}\big)\big(\tfrac{3}{23}\big)^2 = (1)(1)^2 = 1.
\]
Since the discriminant is a nonzero square in $\mathbb{F}_{823}$, there are two distinct roots mod 823. Other
computational methods (see Exercise #3) confirm that $\frac16(-13 \pm \sqrt{-23}) = 533, 560$ are the two roots
in $\mathbb{F}_{823}$, yielding the factorization $3x^2 + 13x + 16 = 3(x - 533)(x - 560) = 3(x + 290)(x + 263)$.
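
For a prime as small as 823 the conclusions of Example 10.7 can be confirmed by simply trying every residue. The sketch below does exactly that; it is a brute-force check, not the square root method that Exercise #3 asks for, and `roots_mod_p` is a hypothetical helper name. Coefficients are listed from the leading term down.

```python
def roots_mod_p(coeffs, p):
    """All x in {0, ..., p-1} with f(x) = 0 mod p, where coeffs lists the
    coefficients of f from the highest degree down to the constant term."""
    def f(x):
        value = 0
        for c in coeffs:            # Horner's rule, reducing mod p at each step
            value = (value * x + c) % p
        return value
    return [x for x in range(p) if f(x) == 0]

print(roots_mod_p([1, 13, 17], 823))    # x^2 + 13x + 17: irreducible, no roots
print(roots_mod_p([3, 13, 16], 823))    # 3x^2 + 13x + 16: two roots
```

The exhaustive search takes time proportional to $p$, so it is only sensible for small primes.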

Our proof of Theorem 10.5 will require some preparation. First note that

(10.8) $\displaystyle \sum_{k=0}^{p-1} \Big(\frac{k}{p}\Big) = \Big(\frac{0}{p}\Big) + \Big(\frac{1}{p}\Big) + \Big(\frac{2}{p}\Big) + \cdots + \Big(\frac{p-1}{p}\Big) = 0$

since the sum includes one 0 term, $\frac{p-1}{2}$ terms equal to $+1$, and $\frac{p-1}{2}$ terms equal to $-1$.
Next recall that if $a \not\equiv 0 \bmod p$, then $\zeta^a$ is a primitive $p$th root of unity, hence a root of
$\Phi_p(x)$; so
(10.9) $1 + \zeta^a + \zeta^{2a} + \cdots + \zeta^{(p-1)a} = 0.$
Next, following Gauss, we consider the sum

(10.10) $\displaystyle S = \sum_{k=0}^{p-1} \Big(\frac{k}{p}\Big)\zeta^k = \sum_{k=1}^{p-1} \Big(\frac{k}{p}\Big)\zeta^k;$

thus for example $S = \zeta - \zeta^2 - \zeta^3 + \zeta^4$ when $p = 5$. Sums of the form (10.10) are called
quadratic Gauss sums. Gauss himself proved

Lemma 10.11. $S^2 = \Big(\dfrac{-1}{p}\Big)p = \begin{cases} p, & \text{if } p \equiv 1 \bmod 4; \\ -p, & \text{if } p \equiv 3 \bmod 4. \end{cases}$

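Proof. We expand
\[
\begin{aligned}
S^2 &= \Big(\sum_{k=1}^{p-1} \Big(\frac{k}{p}\Big)\zeta^k\Big)\Big(\sum_{\ell=0}^{p-1} \Big(\frac{\ell}{p}\Big)\zeta^\ell\Big) \\
&= \sum_{k=1}^{p-1}\sum_{\ell=0}^{p-1} \Big(\frac{k}{p}\Big)\Big(\frac{\ell}{p}\Big)\zeta^{k+\ell} \\
&= \sum_{k=1}^{p-1}\sum_{\ell=0}^{p-1} \Big(\frac{k\ell}{p}\Big)\zeta^{k+\ell} && \text{(by (10.4))} \\
&= \sum_{k=1}^{p-1}\sum_{m=0}^{p-1} \Big(\frac{k^2m}{p}\Big)\zeta^{k+km} && \text{(substituting } \ell = km\text{)} \\
&= \sum_{k=1}^{p-1}\sum_{m=0}^{p-1} \Big(\frac{m}{p}\Big)\zeta^{(1+m)k} \\
&= \sum_{m=0}^{p-1} \Big(\frac{m}{p}\Big) \sum_{k=1}^{p-1} \zeta^{(1+m)k}.
\end{aligned}
\]
Now if $m \ne p-1$ then the inner sum $\sum_{k=1}^{p-1} \zeta^{(1+m)k} = -1$ by (10.9), whereas if $m = p-1$ we
have $\sum_{k=1}^{p-1} \zeta^{(1+m)k} = \sum_{k=1}^{p-1} 1 = p-1$. This leaves
\[
\begin{aligned}
S^2 &= -\sum_{m=0}^{p-2} \Big(\frac{m}{p}\Big) + \Big(\frac{p-1}{p}\Big)(p-1) \\
&= \Big(\frac{p-1}{p}\Big) + \Big(\frac{p-1}{p}\Big)(p-1) && \text{(by (10.8))} \\
&= \Big(\frac{p-1}{p}\Big)p.
\end{aligned}
\]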

By Lemma 10.11,
\[
S = \begin{cases} \pm\sqrt{p}, & \text{if } p \equiv 1 \bmod 4, \\ \pm i\sqrt{p}, & \text{if } p \equiv 3 \bmod 4. \end{cases}
\]

The ambiguous signs in this formula stare us in the face, begging to be resolved. This
is a natural and compelling problem, which perplexed Gauss for years before finally the
answer came to him. As Gauss wrote to a friend in 1805:

The determination of the sign of the root has vexed me for many years. This deficiency over-
shadowed everything that I found: over the last four years, there was rarely a week that I did not
make one or another attempt, unsuccessfully, to untie the knot. I succeeded—but not as a result
of my search but rather, I should say, through the mercy of God. As lightning strikes, the riddle
has solved itself.

The conclusive result, as Gauss showed, has ‘+’ in place of each of the signs ‘±’ above;
however for the purpose of proving the Law of Quadratic Reciprocity, the less definitive
version stated in Lemma 10.11 above suffices.
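
Lemma 10.11 is easy to test numerically for small primes. The following sketch evaluates the Gauss sum (10.10) in floating point with the particular choice $\zeta = e^{2\pi i/p}$ (an assumption made only so that the sum can be computed; the lemma itself holds for any primitive $p$th root of unity) and compares $S^2$ with $\big(\frac{-1}{p}\big)p$. The helper names `legendre`, `gauss_sum` and `check_lemma_10_11` are hypothetical.

```python
import cmath
import math

def legendre(a, p):
    """Legendre symbol (a/p) for an odd prime p, via Euler's Criterion."""
    r = pow(a % p, (p - 1) // 2, p)
    return r - p if r == p - 1 else r

def gauss_sum(p):
    """Quadratic Gauss sum S = sum_k (k/p) zeta^k with zeta = exp(2*pi*i/p)."""
    zeta = cmath.exp(2j * math.pi / p)
    return sum(legendre(k, p) * zeta**k for k in range(1, p))

def check_lemma_10_11(p, tol=1e-9):
    """Compare S^2 with (-1/p) * p up to floating-point tolerance."""
    S = gauss_sum(p)
    return abs(S * S - legendre(-1, p) * p) < tol

for p in (3, 5, 7, 11, 13, 17, 19, 23):
    print(p, check_lemma_10_11(p), gauss_sum(p))
```

Printing $S$ itself also shows the sign: with this choice of $\zeta$ the computed values are close to $\sqrt{p}$ or $i\sqrt{p}$, never their negatives.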
Before proceeding further, here is our last preparatory result:

(10.12) For every prime $q$, $(x + y)^q \equiv x^q + y^q \bmod q\mathbb{Z}[x, y]$.

This follows from the binomial expansion of $(x + y)^q$, using the fact that all binomial
coefficients $\binom{q}{k} = \frac{q!}{k!(q-k)!}$ are divisible by $q$ for $k \in \{1, 2, \ldots, q-1\}$. This argument has
appeared in the proof of Theorem 3.7, but here our setting is a little more general: our $x$
and $y$ are not in $\mathbb{F}_{p^e}$, nor are they in $\mathbb{Z}$; they are indeterminates. So (10.12) is sufficiently
general as to apply in an arbitrary commutative ring. We will in particular evaluate (10.12)
for $x, y \in \mathbb{Z}[\zeta_p]$ in the course of the following proof.
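
The divisibility fact behind (10.12) can be spot-checked with exact integer arithmetic. The sketch below uses `math.comb` from the Python standard library (available from Python 3.8); the function name `inner_binomials_divisible` is a hypothetical helper. It also shows that the statement genuinely needs $q$ to be prime.

```python
from math import comb

def inner_binomials_divisible(q):
    """True if q divides C(q, k) for every k with 1 <= k <= q - 1."""
    return all(comb(q, k) % q == 0 for k in range(1, q))

print([q for q in range(2, 30) if inner_binomials_divisible(q)])
```

For composite $q$ the check fails; for example $\binom{4}{2} = 6$ is not divisible by 4.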

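Proof of Theorem 10.5. Let $p, q$ be distinct odd primes, and consider the Gauss sum
$S = \sum_{k=0}^{p-1} \big(\frac{k}{p}\big)\zeta^k \in \mathcal{O}$ as in (10.10), where $\mathcal{O} = \mathbb{Z}[\zeta]$, $\zeta = \zeta_p$. Taking the $\frac{q-1}{2}$ power of the
relation in Lemma 10.11 gives
\[
S^{q-1} = (S^2)^{(q-1)/2} = \Big(\frac{-1}{p}\Big)^{(q-1)/2} p^{(q-1)/2} \equiv (-1)^{(p-1)(q-1)/4}\Big(\frac{p}{q}\Big) \bmod q\mathcal{O}.
\]
Multiplying both sides by $S$ gives
\[
\begin{aligned}
(-1)^{(p-1)(q-1)/4}\Big(\frac{p}{q}\Big) S &\equiv S^q \\
&\equiv \sum_{k=0}^{p-1} \Big(\frac{k}{p}\Big)^q \zeta^{qk} && \text{(by (10.12))} \\
&\equiv \sum_{k=0}^{p-1} \Big(\frac{k}{p}\Big) \zeta^{qk} && \text{(since } q \text{ is odd)} \\
&\equiv \sum_{k=0}^{p-1} \Big(\frac{kq^2}{p}\Big) \zeta^{qk} && \text{(since } q \not\equiv 0 \bmod p\text{)} \\
&\equiv \sum_{\ell=0}^{p-1} \Big(\frac{\ell q}{p}\Big) \zeta^{\ell} && \text{(substituting } \ell = kq\text{)} \\
&\equiv \sum_{\ell=0}^{p-1} \Big(\frac{\ell}{p}\Big)\Big(\frac{q}{p}\Big) \zeta^{\ell} && \text{(by (10.4))} \\
&\equiv \Big(\frac{q}{p}\Big) S \bmod q\mathcal{O}.
\end{aligned}
\]
Again we multiply both sides by $S$ to obtain
\[
(-1)^{(p-1)(q-1)/4}\Big(\frac{p}{q}\Big) S^2 \equiv \Big(\frac{q}{p}\Big) S^2 \bmod q\mathcal{O}
\]
and since $S^2 = \big(\frac{-1}{p}\big)p$, which is an integer relatively prime to $q$, this gives
\[
(-1)^{(p-1)(q-1)/4}\Big(\frac{p}{q}\Big) \equiv \Big(\frac{q}{p}\Big) \bmod q\mathcal{O}.
\]
All factors on both sides of this expression are ±1 so this gives part (c) of the Theorem.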
Part (a) of the Theorem follows immediately from Euler’s Criterion (10.3). For (b),
let $\mathcal{O} = \mathbb{Z}[\zeta]$, $\zeta = \zeta_8 = e^{\pi i/4} = \frac{1+i}{\sqrt2}$; and let $\tau = \zeta + \zeta^{-1} = \sqrt2$.

[Figure: the eighth roots of unity $\zeta^k$ ($k = 0, 1, \ldots, 7$) on the unit circle in the complex plane, with $\zeta$ at angle $\pi/4$, $\zeta^2 = i$, $\zeta^4 = -1$, $\zeta^5 = -\zeta$, $\zeta^6 = -i$ and $\zeta^7 = \zeta^{-1}$, together with the points 0, 1 and $\tau = \zeta + \zeta^{-1} = \sqrt2$ on the real axis.]

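By Euler’s Criterion (10.3),
\[
\Big(\frac{2}{p}\Big) \equiv 2^{(p-1)/2} \equiv \tau^{p-1} \bmod p\mathcal{O}
\]
so by (10.12),
\[
\Big(\frac{2}{p}\Big)\tau \equiv \tau^p \equiv (\zeta + \zeta^{-1})^p \equiv \zeta^p + \zeta^{-p} \bmod p\mathcal{O}.
\]
Since $\zeta$ is an eighth root of unity, we obtain
\[
\Big(\frac{2}{p}\Big)\tau \equiv (-1)^{(p^2-1)/8}\,\tau \bmod p\mathcal{O}
\]
where
\[
(-1)^{(p^2-1)/8} = \begin{cases} 1, & \text{if } p \equiv \pm1 \bmod 8; \\ -1, & \text{if } p \equiv \pm3 \bmod 8. \end{cases}
\]
Multiplying both sides by $\tau$ gives $2\big(\frac{2}{p}\big) \equiv (-1)^{(p^2-1)/8}\, 2 \bmod p\mathcal{O}$; and since 2 is relatively
prime to $p$, (b) follows.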

Throughout our proof, ‘mod pO’ and ‘mod qO’ can be read as ‘mod p’ and ‘mod q’
respectively. The careful reader will observe that for x, y ∈ Z, we have x ≡ y mod pO iff
x ≡ y mod p in the usual sense; this is because Z ∩ pO = pZ.
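
All three parts of Theorem 10.5 can be confirmed by machine over a range of small primes. The sketch below is one way to do so, with Legendre symbols computed from Euler’s Criterion (10.3); the helper names `legendre`, `odd_primes_below` and `check_theorem_10_5` are hypothetical, and the bound 200 is an arbitrary choice.

```python
def legendre(a, p):
    """Legendre symbol (a/p) for an odd prime p, via Euler's Criterion."""
    r = pow(a % p, (p - 1) // 2, p)
    return r - p if r == p - 1 else r

def odd_primes_below(n):
    """Odd primes less than n, by trial division (adequate for small n)."""
    return [m for m in range(3, n, 2)
            if all(m % d for d in range(3, int(m**0.5) + 1, 2))]

def check_theorem_10_5(bound):
    """Verify parts (a), (b), (c) for all odd primes p, q below the bound."""
    primes = odd_primes_below(bound)
    for p in primes:
        if legendre(-1, p) != (-1) ** ((p - 1) // 2):                  # (a)
            return False
        if legendre(2, p) != (-1) ** ((p * p - 1) // 8):               # (b)
            return False
        for q in primes:
            if q != p and (legendre(p, q) * legendre(q, p)
                           != (-1) ** ((p - 1) * (q - 1) // 4)):       # (c)
                return False
    return True

print(check_theorem_10_5(200))
```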
We now resolve the sign of $S$. Until now we have required only that $\zeta$ is a root of $\Phi_p(t)$
in some extension of $\mathbb{Q}$ (as this completely determines the algebraic structure of $\mathbb{Q}(\zeta) \supset \mathbb{Q}$).
In order to resolve the ambiguous sign of $S = \sum_k \big(\frac{k}{p}\big)\zeta^k$, we must fix a choice of $\zeta$. This
choice needs to be made using extraneous (i.e. non-algebraic) properties of elements of our
extension, such as ordering, as without such considerations, all choices of ζ (and choices
of the sign of S) are equivalent under field automorphisms. The elusiveness of the proof of
Theorem 10.13 may be attributed to the inaccessibility of this result from purely algebraic
arguments within Q(ζ).
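Theorem 10.13. Fix $\zeta = e^{2\pi i/p}$ where $p$ is an odd prime, and let $S = \sum_{k=1}^{p-1} \big(\frac{k}{p}\big)\zeta^k$.
Then
\[
S = \begin{cases} \sqrt{p}, & \text{if } p \equiv 1 \bmod 4; \\ i\sqrt{p}, & \text{if } p \equiv 3 \bmod 4. \end{cases}
\]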

We present the proof found in [IR, Section 6.4]. Other proofs are available; for example
Dirichlet gave a proof using Fourier analysis. The proof in [LN, Theorem 5.15] uses some
representation theory. The correct generalization of Theorem 10.13 to all fields of odd
order is proved in Section 12 using the Hasse-Davenport relations.
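Proof of Theorem 10.13. Evaluate $\Phi_p(x) = 1 + x + x^2 + \cdots + x^{p-1} = \prod_{r=1}^{p-1} (x - \zeta^r)$ at 1 to
get
\[
\begin{aligned}
p = \Phi_p(1) &= \prod_{r=1}^{p-1} (1 - \zeta^r) = \prod_{j=1}^{p-1} (1 - \zeta^{4j}) = \prod_{j=1}^{\frac{p-1}{2}} (1 - \zeta^{4j}) \prod_{k=\frac{p+1}{2}}^{p-1} (1 - \zeta^{4k}) \\
&= \prod_{\ell=1}^{\frac{p-1}{2}} (1 - \zeta^{2p+2-4\ell}) \prod_{m=1}^{\frac{p-1}{2}} (1 - \zeta^{2p-2+4m}) \qquad \text{(substitute } \ell = \tfrac{p+1}{2} - j \text{ and } k = \tfrac{p-1}{2} + m\text{)} \\
&= \prod_{k=1}^{\frac{p-1}{2}} (1 - \zeta^{2-4k})(1 - \zeta^{4k-2}) = \prod_{k=1}^{\frac{p-1}{2}} (\zeta^{2k-1} - \zeta^{1-2k})(\zeta^{1-2k} - \zeta^{2k-1}) \\
&= (-1)^{\frac{p-1}{2}} \prod_{k=1}^{\frac{p-1}{2}} (\zeta^{2k-1} - \zeta^{1-2k})^2 .
\end{aligned}
\]
This says that
\[
\prod_{k=1}^{\frac{p-1}{2}} (\zeta^{2k-1} - \zeta^{1-2k}) = (2i)^{\frac{p-1}{2}} \prod_{k=1}^{\frac{p-1}{2}} \sin\big(\tfrac{4k-2}{p}\pi\big) = \pm i^{\frac{p-1}{2}} \sqrt{p}
\]
where the factor $\sin\big(\frac{4k-2}{p}\pi\big)$ is negative iff $\frac{p+3}{4} \le k \le \frac{p-1}{2}$; and the number of integers $k$
in this range is $\big\lfloor \frac{p-1}{4} \big\rfloor$. Thus
\[
\prod_{k=1}^{\frac{p-1}{2}} (\zeta^{2k-1} - \zeta^{1-2k}) = (-1)^{\lfloor \frac{p-1}{4} \rfloor}\, i^{\frac{p-1}{2}} \sqrt{p} = \begin{cases} \sqrt{p}, & \text{if } p \equiv 1 \bmod 4; \\ i\sqrt{p}, & \text{if } p \equiv 3 \bmod 4. \end{cases}
\]
This is exactly the conjectured value of $S$; and combining this with Lemma 10.11 and the
remarks following it,
\[
S = \varepsilon \prod_{k=1}^{\frac{p-1}{2}} (\zeta^{2k-1} - \zeta^{1-2k})
\]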
where $\varepsilon = \pm1$. Our task is to prove that $\varepsilon = 1$. Following Kronecker, we introduce the
polynomial
\[
f(x) = \sum_{j=1}^{p-1} \Big(\frac{j}{p}\Big) x^j - \varepsilon \prod_{k=1}^{\frac{p-1}{2}} (x^{2k-1} - x^{p-2k+1}).
\]
Note that $f(1) = 0$ by (10.8), and $f(\zeta) = 0$ by definition of $\varepsilon$; so $f(x)$ is divisible by
$(x - 1)\Phi_p(x) = x^p - 1$. Write $f(x) = (x^p - 1)h(x)$ where $h(x) \in \mathbb{Q}[x]$. In fact $h(x) \in \mathbb{Z}[x]$
by Lemma A2.1(iii). Evaluating at $x = e^z$,
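(10.14) $\displaystyle \sum_{j=1}^{p-1} \Big(\frac{j}{p}\Big) e^{jz} - \varepsilon \prod_{k=1}^{\frac{p-1}{2}} \big(e^{(2k-1)z} - e^{(p-2k+1)z}\big) = (e^{pz} - 1)\,h(e^z).$

Using power series expansions, compare the coefficient of $z^{\frac{p-1}{2}}$ on both sides of (10.14).
On the right side, we first write $h(x) = \sum_{j=0}^{m} b_j x^j$ where $b_0, b_1, \ldots, b_m \in \mathbb{Z}$; then
\[
h(e^z) = \sum_{j=0}^{m} b_j e^{jz} = \sum_{j=0}^{m} b_j \sum_{k=0}^{\infty} \frac{(jz)^k}{k!} = \sum_{k=0}^{\infty} c_k \frac{z^k}{k!}, \qquad c_k = \sum_{j=0}^{m} b_j j^k \in \mathbb{Z}
\]
and so
\[
(e^{pz} - 1)\,h(e^z) = \sum_{j=1}^{\infty} \frac{p^j z^j}{j!} \sum_{k=0}^{\infty} c_k \frac{z^k}{k!} = \sum_{\ell=1}^{\infty} d_\ell \frac{z^\ell}{\ell!}, \qquad d_\ell = \sum_{j=1}^{\ell} \binom{\ell}{j} p^j c_{\ell-j} \in \mathbb{Z}.
\]
Clearly each $d_\ell \equiv 0 \bmod p$. Equating coefficients of $z^{\frac{p-1}{2}}$ on both sides of (10.14), and then
multiplying both sides by $\big(\frac{p-1}{2}\big)!$, we find
\[
\underbrace{\sum_{j=0}^{p-1} \Big(\frac{j}{p}\Big) j^{\frac{p-1}{2}}}_{(10.15)} \; - \; \underbrace{\big(\tfrac{p-1}{2}\big)!\;\varepsilon \prod_{k=1}^{\frac{p-1}{2}} (4k-p-2)}_{(10.16)} \; = \; d_{\frac{p-1}{2}} \equiv 0 \bmod p
\]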
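since $e^{(2k-1)z} - e^{(p-2k+1)z} = (4k - p - 2)z + O(z^2)$. Now (10.15) simplifies as
\[
\sum_{j=0}^{p-1} \Big(\frac{j}{p}\Big) j^{\frac{p-1}{2}} \equiv \sum_{j=0}^{p-1} \Big(\frac{j}{p}\Big)\Big(\frac{j}{p}\Big) \equiv \sum_{j=1}^{p-1} 1 \equiv -1 \bmod p
\]
by Euler’s Criterion, while (10.16) reduces as
\[
\begin{aligned}
\big(\tfrac{p-1}{2}\big)!\;\varepsilon \prod_{k=1}^{\frac{p-1}{2}} (4k-p-2)
&\equiv \big(\tfrac{p-1}{2}\big)!\;\varepsilon \cdot 2^{\frac{p-1}{2}} \prod_{k=1}^{\frac{p-1}{2}} (2k-1)
= \big(\tfrac{p-1}{2}\big)!\;\varepsilon \cdot 2^{\frac{p-1}{2}} \cdot \frac{(p-1)!}{2 \cdot 4 \cdot 6 \cdots (p-1)} \\
&= (p-1)!\,\varepsilon \equiv -\varepsilon \bmod p
\end{aligned}
\]
using Wilson’s Theorem (Exercise #3.1(b)). So our original congruence relating (10.15)
and (10.16) simplifies as $-1 + \varepsilon \equiv 0 \bmod p$. Since $\varepsilon = \pm1$ and $p$ is an odd prime, this
forces $\varepsilon = 1$, which completes our proof.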
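
The two expressions compared in this proof, the Gauss sum $S$ and the product $\prod_k (\zeta^{2k-1} - \zeta^{1-2k})$, can be evaluated numerically for $\zeta = e^{2\pi i/p}$ to see the conclusion $\varepsilon = 1$ in examples. The following is a floating-point sketch with hypothetical helper names (`legendre`, `gauss_sum`, `kronecker_product`, `expected_value`); it illustrates the statement and is no substitute for the argument above.

```python
import cmath
import math

def legendre(a, p):
    """Legendre symbol (a/p) for an odd prime p, via Euler's Criterion."""
    r = pow(a % p, (p - 1) // 2, p)
    return r - p if r == p - 1 else r

def gauss_sum(p):
    """S = sum_{k=1}^{p-1} (k/p) zeta^k with zeta = exp(2*pi*i/p)."""
    zeta = cmath.exp(2j * math.pi / p)
    return sum(legendre(k, p) * zeta**k for k in range(1, p))

def kronecker_product(p):
    """prod_{k=1}^{(p-1)/2} (zeta^(2k-1) - zeta^(1-2k)) for the same zeta."""
    zeta = cmath.exp(2j * math.pi / p)
    result = 1
    for k in range(1, (p - 1) // 2 + 1):
        result *= zeta**(2 * k - 1) - zeta**(1 - 2 * k)
    return result

def expected_value(p):
    """sqrt(p) if p = 1 mod 4, and i*sqrt(p) if p = 3 mod 4."""
    return math.sqrt(p) if p % 4 == 1 else 1j * math.sqrt(p)

for p in (3, 5, 7, 11, 13, 17, 19, 23, 29, 31):
    print(p, abs(gauss_sum(p) - kronecker_product(p)) < 1e-9,
          abs(gauss_sum(p) - expected_value(p)) < 1e-9)
```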

Exercises 10.
1. Evaluate each of the following Legendre symbols by hand as done in Example 10.6. Then use
appropriate computer software with arbitrary precision arithmetic capability to check your answer
by Euler’s Criterion (10.3).
(a) $\big(\frac{59}{89}\big)$ (b) $\big(\frac{-7}{233}\big)$ (c) $\big(\frac{111}{347}\big)$ (d) $\big(\frac{620}{503}\big)$ (e) $\big(\frac{709}{809}\big)$

2. (a) Using the Law of Quadratic Reciprocity, show that a prime p admits solutions of the con-
gruence x2 ± x + 1 ≡ 0 mod p iff p = 3 or p ≡ 1 mod 3.
(b) Now consider a finite field F = Fq where q = pe , and assume that p 6= 3. Show that solutions
of x2 + x + 1 = 0 in F are primitive cube roots of unity in F ; and solutions of x2 − x + 1 = 0
are primitive sixth roots of unity in F .
(c) Using the fact that the multiplicative group F × is cyclic, show that F has a primitive cube
root of unity iff q ≡ 1 mod 3; and F has a primitive sixth root of unity iff q ≡ 1 mod 6. (This
requires only elementary properties of cyclic groups.) Conclude that a finite field F of order
q has solutions of x2 ± x + 1 = 0 iff q ≡ 1 mod 3.

3. Complete the remaining steps of Example 10.7, using appropriate computer software with arbitrary
precision arithmetic capability.
(a) According to Example 10.7, $\big(\frac{-23}{823}\big) = 1$. Confirm this fact using Euler’s Criterion (10.3), by
computing $(-23)^{411} \bmod 823$. (Note: $411 = \frac12(823 - 1)$.)
(b) Using (a), evaluate $(-23)^{412} \bmod 823$, noting that $412 = 411 + 1$.
(c) Now evaluate $(-23)^{206} \bmod 823$, noting that $206 = 412/2$.
(d) Using the previous steps, find the two square roots of $-23$ mod 823.
(e) Find the two roots of $3x^2 + 13x + 16 \in \mathbb{F}_{823}[x]$.
(f) Generalizing your work, present an algorithm for computing square roots mod $p$ for an
arbitrary prime $p \equiv 3 \bmod 4$. Explain why this algorithm works.

11. Gauss and Jacobi Sums

Each finite field $E = \mathbb{F}_q$ gives rise to two finite groups, the additive group $E$ of order $q$ and
the multiplicative group $E^\times$ of order $q - 1$. Characters of these groups are called additive
characters and multiplicative characters respectively. Much of the interplay between
these two groups is encoded in the language of Gauss sums. Following the notation of
Section 6, the additive characters form a multiplicative group $\widehat{E}$, elementary abelian of
order $q$, whose elements $\psi \in \widehat{E}$ satisfy
\[
\psi(x + y) = \psi(x)\psi(y)
\]
for all $x, y \in E$. Multiplicative characters form a cyclic multiplicative group $\widehat{E^\times}$ of order
$q - 1$ whose elements $\chi \in \widehat{E^\times}$ satisfy
\[
\chi(xy) = \chi(x)\chi(y)
\]
for all $x, y \in E^\times$. The latter formula holds for all $x, y \in E$ if we naturally extend the
domain of multiplicative characters by the convention that
\[
\chi(0) = \begin{cases} 1, & \text{if } \chi = \chi_0, \text{ the identity element of } \widehat{E^\times}; \\ 0, & \text{otherwise.} \end{cases}
\]
Note that the trivial multiplicative character $\chi_0 \in \widehat{E^\times}$ satisfies $\chi_0(x) = 1$ for all $x \in E$.
In the following, we fix the prime field $F = \mathbb{F}_p$ where $q = p^e$; and abbreviate
$\mathrm{Tr} = \mathrm{Tr}_{E/F} : E \to F$ throughout, as in Theorem A1.7. By Theorem 3.8, the absolute trace
map satisfies $\mathrm{Tr}(a) = a + a^p + a^{p^2} + \cdots + a^{p^{e-1}}$. Also recall that by Theorem 3.2, the field
has a cyclic multiplicative group $E^\times = \langle\omega\rangle$.

Lemma 11.1. (i) The $q$ additive characters of $E$ have the form $\psi_a(x) = \zeta_p^{\mathrm{Tr}(ax)}$ for
$a \in E$.
(ii) Fix a generator $\omega$ for $E^\times$. Then the $q-1$ multiplicative characters of $E$ have the
form $\chi_k(\omega^r) = \zeta_{q-1}^{kr}$ for $k \in \mathbb{Z}/(q-1)\mathbb{Z}$; and $\chi_k(0) = 0$ for $1 \le k < q - 1$.
In particular, additive characters have values in $\mathbb{Z}[\zeta_p]$; and multiplicative characters
have values in $\mathbb{Z}[\zeta_{q-1}]$. More precisely, if $d = \gcd(k, q-1)$, then $\chi_k$ has values in
$\mathbb{Z}[\zeta_{(q-1)/d}]$.

We refer to $\chi_k$ ($k \ne 0$) as a character of order $\frac{q-1}{d} = \frac{\mathrm{lcm}(k, q-1)}{k}$ where $d = \gcd(k, q-1)$,
since $\frac{q-1}{d}$ is the order of $\chi_k$ in the group $\widehat{E^\times}$. In this notation, $\chi_{q-1}$ is another name for
the trivial multiplicative character $\chi_0$ described above. For $q$ odd, $\chi_{(q-1)/2}$ is the quadratic
character. The trivial additive character is $\psi_0(x) = 1$.
Proof of Lemma 11.1. Clearly each $\psi_a \in \widehat{E}$, and the trivial additive character is $\psi_0(x) = 1$
for all $x \in E$. If $a \ne b$ in $E$ then by Theorem A1.7(ii), there exists $x \in E$ satisfying
$\mathrm{Tr}[(a - b)x] \ne 0$, so $\frac{\psi_a(x)}{\psi_b(x)} = \psi_{a-b}(x) \ne 1$ and $\psi_a \ne \psi_b$. So the additive characters $\psi_a \in \widehat{E}$
($a \in E$) are distinct; and since $|\widehat{E}| = q$ by Theorem 6.1(b), all additive characters have
this form. This proves (i), and (ii) is similar.

For $\chi \in \widehat{E^\times}$ and $\psi \in \widehat{E}$, we define the Gauss sums
$$G(\chi,\psi) = \sum_{x\in E} \chi(x)\psi(x) \in \mathbb{Z}[\zeta_p, \zeta_{q-1}] = \mathbb{Z}[\zeta_{(q-1)p}]; \qquad G(\chi) = G(\chi,\psi_1) = \sum_{x\in E} \chi(x)\zeta^{\mathrm{Tr}\,x}$$
where $\zeta = \zeta_p$.

This generalizes the quadratic Gauss sum $S$ from Section 10. Questions about $G(\chi,\psi)$ can usually be reduced to questions about $G(\chi)$, due to (i) below.

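Theorem 11.2. (i) If $\chi \ne \chi_0$ or $a \ne 0$, then $G(\chi,\psi_a) = \overline{\chi(a)}\,G(\chi) = \overline{\chi}(a)\,G(\chi)$.
(ii) $\overline{G(\chi)} = \chi(-1)\,G(\overline{\chi})$.

Proof. (i) If $a \ne 0$ then
$$G(\chi,\psi_a) = \sum_{x\in E} \chi(x)\zeta^{\mathrm{Tr}(ax)} = \sum_{u\in E} \chi\bigl(\tfrac{u}{a}\bigr)\zeta^{\mathrm{Tr}\,u} = \frac{1}{\chi(a)}\,G(\chi) = \overline{\chi}(a)\,G(\chi).$$
Now suppose $a = 0$, so $G(\chi,\psi_0) = \sum_{x\in E}\chi(x)$. If $\chi \ne \chi_0$ then by the convention above, $\chi(0) = 0$ and the remaining terms cancel since $\chi$ and $\chi_0$ are orthogonal by Theorem 6.2(a), so again the conclusion holds.
(ii) $\overline{G(\chi)} = \overline{\sum_{x\in E}\chi(x)\zeta^{\mathrm{Tr}\,x}} = \sum_{x\in E}\overline{\chi}(x)\zeta^{-\mathrm{Tr}\,x} = G(\overline{\chi},\psi_{-1}) = \overline{\overline{\chi}(-1)}\,G(\overline{\chi}) = \chi(-1)\,G(\overline{\chi})$ using (i) and the fact that $\chi(-1) = \pm 1$.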

Theorem 11.3. Let $\chi \in \widehat{E^\times}$, $\psi \in \widehat{E}$, and assume $\chi \ne \chi_0$, $\psi \ne \psi_0$. Then

(i) $|G(\chi,\psi)| = \sqrt{q}$.
(ii) $G(\chi,\psi_0) = 0$.
(iii) $G(\chi_0,\psi) = 0$.
(iv) $G(\chi_0,\psi_0) = q$.

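Proof. (ii) $G(\chi,\psi_0) = \sum_{x\in E^\times}\chi(x) = 0$ since $\chi$ and $\chi_0$ are orthogonal (Theorem 6.2(a)).
(iii) $G(\chi_0,\psi) = \sum_{x\in E}\psi(x) = 0$ since $\psi$ and $\psi_0$ are orthogonal (Theorem 6.2(a)).
(iv) $G(\chi_0,\psi_0) = \sum_{x\in E} 1 = q$.
(i) Separating the $q-1$ diagonal terms $x = y \ne 0$, each of which equals 1, we have
$$\begin{aligned}
|G(\chi,\psi)|^2 &= \sum_{x,y\in E}\chi(x)\psi(x)\overline{\chi(y)\psi(y)} = q-1+\sum_{\substack{x,y\ne 0\\ x\ne y}}\chi(x)\psi(x)\overline{\chi(y)\psi(y)}\\
&= q-1+\sum_{\substack{x,y\ne 0\\ x\ne y}}\chi\bigl(\tfrac{x}{y}\bigr)\psi(x-y)\\
&= q-1+\sum_{\substack{u\ne 1\\ v\ne 0}}\chi(u)\psi(v) \qquad \bigl(\text{substituting } x=\tfrac{uv}{u-1},\ y=\tfrac{v}{u-1}\bigr)\\
&= q-1+\Bigl(\sum_{u\ne 1}\chi(u)\Bigr)\Bigl(\sum_{v\ne 0}\psi(v)\Bigr) = q-1+(-1)(-1) = q \qquad (\text{see (ii), (iii)}).
\end{aligned}$$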
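Theorems 11.2(i) and 11.3(i) are easy to test numerically over a prime field. The sketch below is only an illustration under the assumption $q = p$; the names `generator`, `zeta` and `gauss_sum` are hypothetical helpers, and the character $\chi_k$ is taken to be nontrivial so that the $x = 0$ term can be dropped.

```python
import cmath

def generator(p):
    """Smallest generator w of the cyclic group F_p^x (p an odd prime)."""
    return next(g for g in range(2, p)
                if len({pow(g, r, p) for r in range(p - 1)}) == p - 1)

def zeta(n, e):
    """zeta_n^e = exp(2 pi i e / n)."""
    return cmath.exp(2j * cmath.pi * (e % n) / n)

def gauss_sum(p, k, a=1):
    """G(chi_k, psi_a) over F_p for a nontrivial chi_k (k not divisible by p-1)."""
    w = generator(p)
    # x = w^r runs over F_p^x; the x = 0 term vanishes since chi_k(0) = 0
    return sum(zeta(p - 1, k * r) * zeta(p, a * pow(w, r, p))
               for r in range(p - 1))

p, k = 13, 5
G = gauss_sum(p, k)
abs_squared = abs(G) ** 2                 # Theorem 11.3(i): should be close to p
w, r = generator(p), 4
a = pow(w, r, p)                          # so chi_k(a) = zeta_{p-1}^(k r)
lhs = gauss_sum(p, k, a)                  # G(chi, psi_a)
rhs = zeta(p - 1, -k * r) * G             # conj(chi(a)) * G(chi), Theorem 11.2(i)
print(abs(abs_squared - p) < 1e-9, abs(lhs - rhs) < 1e-9)
```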

Now define the Jacobi sum of two multiplicative characters $\chi, \chi' \in \widehat{E^\times}$ by
$$J(\chi,\chi') = J(\chi',\chi) = \sum_{\substack{x,y\in E\\ x+y=1}} \chi(x)\chi'(y).$$

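Theorem 11.4. Let $\chi, \chi' \in \widehat{E^\times}$ be nontrivial, i.e. $\chi, \chi' \ne \chi_0$. Then
(i) $J(\chi,\chi') = \begin{cases} -\chi(-1), & \text{if } \chi\chi' = \chi_0;\\[2pt] \dfrac{G(\chi)G(\chi')}{G(\chi\chi')}, & \text{otherwise;}\end{cases}$
(ii) $J(\chi,\chi_0) = 0$;
(iii) $J(\chi_0,\chi_0) = q$.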

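Proof. (ii) and (iii) follow easily using Theorem 11.3(ii). If $\chi\chi' = \chi_0$ then
$$J(\chi,\chi') = \sum_{x\ne 1}\chi(x)\chi^{-1}(1-x) = \sum_{x\ne 1}\chi\bigl(\tfrac{x}{1-x}\bigr) = \sum_{u\ne -1}\chi(u) = -\chi(-1)$$
using Theorem 11.3(ii) and the substitution $x = \frac{u}{u+1}$. Finally if $\chi\chi' \ne \chi_0$ then
$$\begin{aligned}
G(\chi)G(\chi') &= \sum_{x,y\in E}\chi(x)\chi'(y)\zeta^{\mathrm{Tr}(x+y)} = \sum_{x}\chi(x)\chi'(-x) + \sum_{x+y\ne 0}\chi(x)\chi'(y)\zeta^{\mathrm{Tr}(x+y)}\\
&= \chi'(-1)\sum_{x\in E}(\chi\chi')(x) + \sum_{\substack{s,t\in E\\ s\ne 0}}\chi(ts)\chi'((1-t)s)\zeta^{\mathrm{Tr}\,s}
\end{aligned}$$
using the substitution $(x,y) = (ts, (1-t)s)$. Now $\sum_x (\chi\chi')(x) = 0$ since $\chi\chi' \ne \chi_0$, so
$$G(\chi)G(\chi') = \sum_{t\in E}\chi(t)\chi'(1-t)\sum_{s\in E}\chi(s)\chi'(s)\zeta^{\mathrm{Tr}\,s} = J(\chi,\chi')\,G(\chi\chi').$$
When $\chi$, $\chi'$ and $\chi\chi'$ are all nontrivial, their Gauss sums are nonzero (by Theorem 11.3(i)) and then we can solve for $J(\chi,\chi')$ to obtain the value claimed.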
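Both cases of Theorem 11.4(i) can be checked by direct summation over a small prime field. This is a hedged sketch for $q = p$ only; `chars`, `gauss` and `jacobi` are illustrative names, and the characters are indexed as $\chi_k$ in the notation of Lemma 11.1.

```python
import cmath

def chars(p):
    """Return chi(k, x): the value chi_k(x) on F_p, with chi_0(0) = 1."""
    w = next(g for g in range(2, p)
             if len({pow(g, r, p) for r in range(p - 1)}) == p - 1)
    log = {pow(w, r, p): r for r in range(p - 1)}
    def chi(k, x):
        x %= p
        if x == 0:
            return 1.0 if k % (p - 1) == 0 else 0.0
        return cmath.exp(2j * cmath.pi * ((k * log[x]) % (p - 1)) / (p - 1))
    return chi

def gauss(p, chi, k):
    """G(chi_k) = sum of chi_k(x) zeta_p^x over x in F_p."""
    return sum(chi(k, x) * cmath.exp(2j * cmath.pi * x / p) for x in range(p))

def jacobi(p, chi, k, l):
    """J(chi_k, chi_l) = sum of chi_k(x) chi_l(1 - x) over x in F_p."""
    return sum(chi(k, x) * chi(l, 1 - x) for x in range(p))

p = 11
chi = chars(p)
k, l = 2, 3                               # chi_k, chi_l and chi_k chi_l are nontrivial
J = jacobi(p, chi, k, l)
ratio = gauss(p, chi, k) * gauss(p, chi, l) / gauss(p, chi, k + l)
J_inverse_pair = jacobi(p, chi, k, -k)    # the case chi chi' = chi_0
print(abs(J - ratio) < 1e-9, abs(J_inverse_pair + chi(k, -1)) < 1e-9)
```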

An interesting consequence of Theorem 11.4 is that when all of $\chi, \chi', \chi\chi' \in \widehat{E^\times}$ are nonprincipal, $|J(\chi,\chi')| = \sqrt{q}$ using Theorem 11.3(i). An application is

Corollary 11.5. (i) Every prime $p \equiv 1 \bmod 4$ is expressible in the form $p = a^2 + b^2$ with $a, b \in \mathbb{Z}$.
(ii) Every prime $p \equiv 1 \bmod 3$ is expressible in the form $p = a^2 - ab + b^2$ with $a, b \in \mathbb{Z}$.

Proof. Here we take q = p, so E = Fp .


(i) Suppose $p \equiv 1 \bmod 4$. Since $|\widehat{E^\times}| = p-1$ is divisible by 4, there exists $\chi \in \widehat{E^\times}$ of order 4; and $\chi$ has values in $\mathbb{Z}[i]$ (see Lemma 11.1, where we denoted $\chi = \chi_{(p-1)/4}$). Thus $J(\chi,\chi) = a + bi$ for some $a, b \in \mathbb{Z}$. Since $\chi$ and $\chi^2$ are nontrivial, $|J(\chi,\chi)|^2 = a^2 + b^2 = p$.
(ii) Now suppose $p \equiv 1 \bmod 3$. A cubic character $\chi = \chi_{(p-1)/3} \in \widehat{E^\times}$ has values in $\mathbb{Z}[\omega]$ where $\omega = \zeta_3$. As in (i), we have $J(\chi,\chi) = a + b\omega$ for some $a, b \in \mathbb{Z}$, and $|J(\chi,\chi)|^2 = (a+b\omega)(a+b\overline{\omega}) = a^2 - ab + b^2 = p$.
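The proof of Corollary 11.5(i) is effectively constructive: the Jacobi sum of a quartic character can be computed in exact integer arithmetic by counting how often each power of $i$ occurs. The following sketch (the function name `two_squares` is ours, and the brute-force generator search is only suitable for small $p$) illustrates this.

```python
def two_squares(p):
    """For a prime p = 1 mod 4, return (a, b) with J(chi, chi) = a + b i,
    where chi is the quartic character chi(w^r) = i^r for a generator w."""
    w = next(g for g in range(2, p)
             if len({pow(g, r, p) for r in range(p - 1)}) == p - 1)
    log = {pow(w, r, p): r for r in range(p - 1)}
    c = [0, 0, 0, 0]                      # c[e] counts terms equal to i^e
    for x in range(2, p):                 # x = 0 and x = 1 contribute zero
        c[(log[x] + log[(1 - x) % p]) % 4] += 1
    # c0 + c1 i + c2 i^2 + c3 i^3 = (c0 - c2) + (c1 - c3) i
    return c[0] - c[2], c[1] - c[3]

for p in (5, 13, 17, 29, 37):
    a, b = two_squares(p)
    print(p, (a, b), a * a + b * b == p)
```

For $p = 5$ this yields $(a, b) = (-1, -2)$. As the notes remark below, descent is the efficient way to find such a representation; the sum above takes time linear in $p$.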

It should be noted that computing $a, b \in \mathbb{Z}$ as in Corollary 11.5 has an efficient algorithmic solution by Fermat's method of descent; however if one only requires an existence proof, then the argument above is hard to beat for conciseness. There is no point in stating Corollary 11.5 more generally for prime powers, since expressing numbers in the form $a^2 + b^2$, or $a^2 - ab + b^2$, reduces to the comparable question for $p$ using elementary arguments. In Corollary 11.5, note the necessity of the assumed congruences for $p$.
Both the rings $\mathbb{Z}[i]$ ($i = \zeta_4$) and $\mathbb{Z}[\omega]$ ($\omega = \zeta_3$) have unique factorization; see Corollary A3.16. In both these rings, the units are just the roots of unity: $\mathbb{Z}[i]^\times = \langle i\rangle = \{\pm 1, \pm i\}$ and $\mathbb{Z}[\omega]^\times = \langle\zeta_6\rangle = \{\pm 1, \pm\omega, \pm\omega^2\}$. For a rational prime $p \equiv 1 \bmod 4$, the expression $p = a^2 + b^2$ is essentially unique: there are in fact eight solutions $(\pm a, \pm b)$, $(\pm b, \pm a)$ corresponding to the factorizations of $p$ as a product of irreducible elements in $\mathbb{Z}[i]$:
$$p = (a+bi)(a-bi) = (-b+ai)(-b-ai) = (-a-bi)(-a+bi) = (b-ai)(b+ai)$$
where all factorizations are obtained from the first one by migration of units.
For a rational prime $p \equiv 1 \bmod 3$, we have twelve solutions of $p = a^2 - ab + b^2$, all arising from the same factorization of $p$ by migration of units in $\mathbb{Z}[\omega]$:
$$\begin{aligned}
p &= \bigl(a+b\omega\bigr)\bigl(a-b-b\omega\bigr) = \bigl(-b+(a-b)\omega\bigr)\bigl(-a+(b-a)\omega\bigr) = \bigl(b-a-a\omega\bigr)\bigl(b+a\omega\bigr)\\
&= \bigl(-a-b\omega\bigr)\bigl(b-a+b\omega\bigr) = \bigl(b+(b-a)\omega\bigr)\bigl(a+(a-b)\omega\bigr) = \bigl(a-b+a\omega\bigr)\bigl(-b-a\omega\bigr).
\end{aligned}$$
These are in fact all the factorizations of $p$ in the ring $\mathcal{O} = \mathbb{Z}[\omega]$ of Eisenstein integers, since the factors shown are irreducible (since they have norm equal to $p$, a prime) and $\mathcal{O}$ has unique factorization up to units, of which there are exactly six. The resulting twelve solutions of $p = a^2 - ab + b^2$ are classified according to the pair of residues $(a \bmod 3,\ b \bmod 3)$ which evidently cannot be $(0,0)$, $(1,2)$ or $(2,1)$ as these choices yield $a^2 - ab + b^2 \equiv 0 \bmod 3$. This means that the twelve solutions $(a,b)$, reduced mod 3, yield each of the six remaining pairs $\pm(1,1)$, $\pm(1,0)$, $\pm(0,1)$ twice. In particular there are two solutions $(a,b)$, $(a-b,-b)$ which reduce as $(2,0)$ modulo 3; and we now show that these two solutions are exactly the ones arising from Jacobi sums of cubic characters:

Lemma 11.6. Let $\chi \in \widehat{E^\times}$ be a cubic multiplicative character on a prime field $E = \mathbb{F}_p$ of order $p \equiv 1 \bmod 3$. Then $J(\chi,\chi) = a + b\omega$ where $a \equiv 2 \bmod 3$ and $b \equiv 0 \bmod 3$. Moreover, $(A,B) = (2a-b, \frac{b}{3})$ gives an integer solution of the Diophantine equation $4p = A^2 + 27B^2$ in which $A \equiv 1 \bmod 3$.
The conditions above essentially characterize $J(\chi,\chi)$ in the following sense: The equation $4p = A^2 + 27B^2$ has just two integer solutions $(A, \pm B)$ satisfying $A \equiv 1 \bmod 3$; and they arise from $J(\chi,\chi)$ and $J(\overline{\chi},\overline{\chi})$ as just described, where $\chi$ and $\overline{\chi} = \chi^2$ are the two cubic characters of $E^\times$.

Proof. Under the stated hypotheses,
$$\begin{aligned}
G(\chi)^2 &= G(\chi^2)\,J(\chi,\chi) && \text{(by Theorem 11.4(i))}\\
&= G(\overline{\chi})\,J(\chi,\chi) && (\text{the inverse of } \chi \text{ is } \overline{\chi} = \chi^2)\\
&= \chi(-1)\,\overline{G(\chi)}\,J(\chi,\chi) && \text{(by Theorem 11.2(ii))}\\
&= \overline{G(\chi)}\,J(\chi,\chi) && (\chi(-1) = \pm 1 \text{ but its cube must equal } 1).
\end{aligned}$$
Now multiply both sides by $G(\chi)$ and use Theorem 11.3(i) to get $pJ(\chi,\chi) = G(\chi)^3$. We will reduce both sides of the latter relation modulo 3, to obtain a relation in the quotient ring $\mathcal{O}/3\mathcal{O}$ (where $\mathcal{O} = \mathbb{Z}[\omega]$), a local ring (but not a field) of order 9:
$$pJ(\chi,\chi) = G(\chi)^3 = \Bigl(\sum_{x\in F}\chi(x)\zeta^x\Bigr)^3 \equiv \sum_{x\in F}\chi(x)^3\zeta^{3x} = -1 \bmod 3$$
since $\sum_{x\in F}\zeta^{3x} = \sum_{u\in F}\zeta^u = 0$ and $\chi(x)^3 = 0$ or $1$ according as $x = 0$ or $x \ne 0$. Since $p \equiv 1 \bmod 3$, this gives $J(\chi,\chi) = a + b\omega \equiv -1 \bmod 3$, i.e. $a \equiv 2$ and $b \equiv 0 \bmod 3$. From the defining formula it is clear that $J(\overline{\chi},\overline{\chi}) = \overline{J(\chi,\chi)} = a + b\overline{\omega} = a - b - b\omega$. Note that the resulting coefficients $(a,b)$ and $(a-b,-b)$ are in fact both of the pairs which reduce to $(2,0)$ modulo 3 as described.
It is straightforward to check that the substitution $(A,B) = (2a-b, \frac{b}{3})$ transforms an integer solution of $p = a^2 - ab + b^2$ with $a \equiv 2$ and $b \equiv 0 \bmod 3$ to a solution of $4p = A^2 + 27B^2$ with $A \equiv 1 \bmod 3$. Conversely, given an integer solution of $A^2 + 27B^2 = 4p$, inspection of this relation modulo 4 shows that $A$ and $B$ must have the same parity (both may be even, as in $4\cdot 31 = 4^2 + 27\cdot 2^2$), so $(a,b) = (\frac12(A+3B), 3B)$ gives a pair of integers congruent to $(2,0) \bmod 3$ and satisfying $a^2 - ab + b^2 = p$.
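The congruences of Lemma 11.6 can be observed directly by computing $J(\chi,\chi)$ exactly, counting how often each power of $\omega$ occurs in the defining sum. The sketch below is illustrative only: `cubic_jacobi` is a hypothetical helper, and the cubic character is taken to be $\chi(w^r) = \omega^r$ for the smallest generator $w$, so the other cubic character would give $(a-b, -b)$ instead.

```python
def cubic_jacobi(p):
    """For a prime p = 1 mod 3, return (a, b) with J(chi, chi) = a + b*omega,
    where chi(w^r) = omega^r is a cubic character and omega = zeta_3."""
    w = next(g for g in range(2, p)
             if len({pow(g, r, p) for r in range(p - 1)}) == p - 1)
    log = {pow(w, r, p): r for r in range(p - 1)}
    c = [0, 0, 0]                         # c[e] counts terms equal to omega^e
    for x in range(2, p):                 # x = 0 and x = 1 contribute zero
        c[(log[x] + log[(1 - x) % p]) % 3] += 1
    # omega^2 = -1 - omega, so c0 + c1 w + c2 w^2 = (c0 - c2) + (c1 - c2) w
    return c[0] - c[2], c[1] - c[2]

for p in (7, 13, 19, 31, 37, 43):
    a, b = cubic_jacobi(p)
    A, B = 2 * a - b, b // 3              # b is divisible by 3 by Lemma 11.6
    print(p, (a, b), (A, B), A * A + 27 * B * B == 4 * p)
```

For $p = 7$ this gives $(a,b) = (-1,-3)$ and $(A,B) = (1,-1)$, so that $4\cdot 7 = 1 + 27$.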

We give an application to counting solutions of an equation over a finite field. Theorem 3.6 yields the number of solutions of $x^2 + y^2 = 1$ in a field of odd order; see also Exercise #1. Here we answer the analogous problem for cubes in place of squares.

Theorem 11.7. Given a prime $p \equiv 1 \bmod 3$, the number of solutions of the equation $x^3 + y^3 = 1$ over $\mathbb{F}_p$ is $p - 2 + A$ where $4p = A^2 + 27B^2$, $A \equiv 1 \bmod 3$ as in Lemma 11.6.

We remark that for $p \not\equiv 1 \bmod 3$, the number of solutions of $x^3 + y^3 = 1$ over $E = \mathbb{F}_p$ is exactly $p$, for rather simple reasons. (If $p = 3$ then $x^3 = x$ for all $x \in E$. If $p \equiv 2 \bmod 3$ then the map $E \to E$, $x \mapsto x^3$ is also a permutation of $E$: it maps $0 \mapsto 0$, and it is an automorphism of the cyclic group $E^\times$ since 3 is relatively prime to $p-1$.)

Proof of Theorem 11.7. Denote by $\#(x^3 + y^3 = 1)$ the number of solutions of $x^3 + y^3 = 1$ in $E$. Clearly
$$\#(x^3+y^3=1) = \sum_{a+b=1}\#(x^3=a)\,\#(y^3=b)$$
where $\#(x^3 = a)$ is the number of solutions of $x^3 = a$ in $E$, and similarly for the other factor. Compare the values of $\#(x^3 = a)$ with the values of a cubic character $\chi = \chi_{(p-1)/3}$ on $E$:
$$\#(x^3=a) = \begin{cases} 1, & \text{if } a = 0;\\ 3, & \text{if } a \in E^\times \text{ is a cube};\\ 0, & \text{if } a \in E^\times \text{ is not a cube};\end{cases} \qquad \chi(a) = \begin{cases} 0, & \text{if } a = 0;\\ 1, & \text{if } a \in E^\times \text{ is a cube};\\ \omega \text{ or } \overline{\omega}, & \text{if } a \in E^\times \text{ is not a cube}.\end{cases}$$
Observe that $\#(x^3 = a) = \chi_0(a) + \chi(a) + \overline{\chi}(a)$ where $\chi_0$ is the trivial multiplicative character, and the two cubic characters are $\chi = \chi_{(p-1)/3}$ and $\overline{\chi} = \chi_{2(p-1)/3}$. Thus
$$\#(x^3+y^3=1) = \sum_{a+b=1}\bigl(\chi_0(a)+\chi(a)+\overline{\chi}(a)\bigr)\bigl(\chi_0(b)+\chi(b)+\overline{\chi}(b)\bigr) = \sum_{i=0}^{2}\sum_{j=0}^{2} J(\chi^i,\chi^j).$$
This sum has nine terms, most of which are given by Theorem 11.4:
$$J(\chi_0,\chi_0) = p; \qquad J(\chi_0,\chi) = J(\chi_0,\overline{\chi}) = J(\chi,\chi_0) = J(\overline{\chi},\chi_0) = 0; \qquad\text{and}\qquad J(\chi,\overline{\chi}) = J(\overline{\chi},\chi) = -\chi(-1) = -1$$
using again the fact that $\chi(-1) = \pm 1$ but $\chi(-1)^3 = 1$. The remaining two Jacobi sums are given, in the notation of Lemma 11.6, by
$$J(\chi,\chi) + J(\overline{\chi},\overline{\chi}) = a+b\omega + a+b\overline{\omega} = a+b\omega + (a-b)-b\omega = 2a-b = A.$$
Adding these nine Jacobi sums gives $\#(x^3 + y^3 = 1) = p - 2 + A$.
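Theorem 11.7 is easy to confirm by brute force for small primes. In the sketch below (function names are illustrative, not from the notes), `find_A` searches for the representation $4p = A^2 + 27B^2$ directly rather than via Jacobi sums, and chooses the sign of $A$ so that $A \equiv 1 \bmod 3$.

```python
from math import isqrt

def count_cubic_points(p):
    """Brute-force count of pairs (x, y) in F_p^2 with x^3 + y^3 = 1."""
    cubes = [pow(x, 3, p) for x in range(p)]
    return sum((cx + cy) % p == 1 for cx in cubes for cy in cubes)

def find_A(p):
    """The integer A = 1 mod 3 with 4p = A^2 + 27 B^2 (p prime, p = 1 mod 3)."""
    B = 1
    while 27 * B * B < 4 * p:
        A2 = 4 * p - 27 * B * B
        A = isqrt(A2)
        if A * A == A2:
            # A is not divisible by 3, so exactly one of A, -A is 1 mod 3
            return A if A % 3 == 1 else -A
        B += 1
    raise ValueError("no representation found; is p a prime = 1 mod 3?")

for p in (7, 13, 19, 31, 37, 43):
    print(p, count_cubic_points(p), p - 2 + find_A(p))
```

For $p = 7$ we have $A = 1$ and six solutions; for $p = 13$ we have $A = -5$ and again six solutions.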

Exercises 11.
1. According to Theorem 3.6(i), the number of solutions of $x^2 + y^2 = 1$ in $\mathbb{F}_q$ is $q - (-1)^{(q-1)/2}$ when $q$ is odd.
(a) Give another proof using Jacobi sums, similar to the proof of Theorem 11.7.
(b) Explain why the number of solutions of $x^2 + y^2 = 1$ in $\mathbb{F}_q$ is exactly $q$ when $q$ is even.

2. Make a table with five columns, labelled: $p$, $A$, $B$, $p-2+A$, 'solutions'. In the first column, list all primes $p \equiv 1 \bmod 3$ less than 50. For each $p$, find the integers $(A, \pm B)$ satisfying $4p = A^2 + 27B^2$ with $A \equiv 1 \bmod 3$, and list them in columns 2 and 3, entering also the value of $p-2+A$ in column 4. In the last column, list all pairs $(x,y)$ over $\mathbb{F}_p$ satisfying $x^3 + y^3 = 1$ (note: list all solutions, not just the number of solutions). Count solutions in column 5 in each case and verify that the number of solutions agrees with the expected number from column 4, as predicted by Theorem 11.7. You may use a computer to perform this exercise.

3. Let $E = \mathbb{F}_p$, $p$ prime. Choose a multiplicative character $\chi \in \widehat{E^\times}$ of order $n$; recall that $n$ divides $p-1$. By definition, $G(\chi) \in \mathbb{Z}[\zeta_n, \zeta_p] = \mathbb{Z}[\zeta_{np}]$. Prove that $G(\chi)^n \in \mathbb{Z}[\zeta_n]$.
Hint: Choose $r \in \{1, 2, 3, \ldots, p-1\}$ which is a generator for $E^\times$. By the Chinese Remainder Theorem, there exists $k \in \mathbb{Z}$ such that $k \equiv 1 \bmod n$ and $k \equiv r \bmod p$. Now $\mathbb{Q}[\zeta_{np}]$ has a unique automorphism $\sigma$ satisfying $\sigma(\zeta_{np}) = \zeta_{np}^k$. Find the fixed field of $\sigma$ (denoted $\mathrm{Fix}_{\mathbb{Q}[\zeta_{np}]}(\sigma)$ in Appendix A5). Show that $\sigma(G(\chi)) = \overline{\chi}(r)G(\chi)$ and take $n$th powers.

12. Zeta Functions and L-Functions

Fix a finite field F = Fq . We will see that characters on F lift naturally to characters on
finite extension fields K ⊇ F via the trace and norm maps of the extension. It is natural
to ask how the Gauss sums of the lifted characters on K, may be expressed in terms of the
Gauss sums of the original characters on F . This is possible using the Hasse-Davenport
relations, which we prove in this section. In particular, this generalizes the explicit formula
for quadratic Gauss sums over prime fields (Theorem 10.13) to an explicit formula for
quadratic Gauss sums over arbitrary finite fields (Corollary 12.11). The key tool in this
development is L-functions, which we must first introduce. Because we deal here with
L-functions over function fields, students may first want to glance through Appendix A6
where L-functions over number fields are described. If, as we expect, the number field case
is more familiar to students, then that Appendix may serve as a bridge to the results in
this Section. Yet in no way do we actually require the results of Appendix A6.
Recall that the polynomial ring $\mathcal{O} := F[x]$ is a principal ideal ring; indeed, every nonzero ideal $\mathfrak{A} \subseteq \mathcal{O}$ has a unique monic generator. The norm of an ideal $\mathfrak{A} \subseteq \mathcal{O}$ is the number of cosets: $N(\mathfrak{A}) = |\mathcal{O}/\mathfrak{A}| = q^n$ assuming $\mathfrak{A}$ has a generator of degree $n$. The norm is multiplicative: $N(\mathfrak{A}\mathfrak{B}) = N(\mathfrak{A})\,N(\mathfrak{B})$ for all ideals $\mathfrak{A}, \mathfrak{B} \subseteq \mathcal{O}$. Every nonzero prime ideal $\mathfrak{P} \subset \mathcal{O}$ is maximal, and has the form $\mathfrak{P} = (f(x))$ where $f(x) \in \mathcal{O}$ is irreducible; and then the residue field $\mathcal{O}/\mathfrak{P} = \mathcal{O}/(f) \cong \mathbb{F}_{q^n}$ where $n = \deg f$.
The zeta function of $\mathcal{O}$ is the complex-valued function defined by
$$\zeta_{\mathcal{O}}(s) = \sum_{\mathfrak{A}} \frac{1}{N(\mathfrak{A})^s}$$
where the sum is over all nonzero ideals $\mathfrak{A} \subseteq \mathcal{O}$. (Compare $\mathcal{O}$ with the ring $\mathbb{Z}$ whose nonzero ideals have the form $(n) = n\mathbb{Z}$ for $n \ge 1$, giving the Riemann zeta function $\zeta(s) = \sum_{n=1}^\infty |\mathbb{Z}/n\mathbb{Z}|^{-s} = \sum_{n=1}^\infty n^{-s}$.) Since nonzero ideals in $\mathcal{O}$ factor uniquely as products of prime ideals, we obtain the factorization
$$\zeta_{\mathcal{O}}(s) = \prod_{\mathfrak{P}} \Bigl(1 - \frac{1}{N(\mathfrak{P})^s}\Bigr)^{-1}$$
where $\mathfrak{P}$ ranges over all nonzero prime ideals of $\mathcal{O}$. This is the Euler factorization of $\zeta_{\mathcal{O}}(s)$, valid for exactly the same reasons as for the Riemann zeta function (or for Dedekind zeta functions of more general number fields; see Appendix A6): it is a restatement of the unique factorization property for nonzero ideals, using the fact that the norm is multiplicative.
Now every nonzero ideal $\mathfrak{A} \subseteq \mathcal{O}$ has a unique monic generator $f(x) \in \mathcal{O}$; and $N(\mathfrak{A}) = q^{\deg f}$. Moreover there are exactly $q^n$ monic polynomials of degree $n$, so
$$\zeta_{\mathcal{O}}(s) = \sum_{n=0}^\infty \frac{q^n}{q^{ns}} = \sum_{n=0}^\infty q^n z^n = \frac{1}{1-qz} \tag{12.1}$$
where we have substituted $z = q^{-s}$. The series converges in the right half-plane $\Re s > 1$, i.e. in the open disk $|z| < \frac1q$; but by analytic continuation, the function is meromorphic in $z$ with a simple pole at $z = \frac1q$. The Euler factorization yields
$$\zeta_{\mathcal{O}}(s) = \prod_{d=1}^\infty \Bigl(1 - \frac{1}{q^{ds}}\Bigr)^{-n_d} = \prod_{d=1}^\infty \bigl(1 - z^d\bigr)^{-n_d} \tag{12.2}$$
where $n_d$ is the number of prime ideals of norm $q^d$, i.e. the number of monic irreducible polynomials of degree $d$; see Theorem 3.13. Let's verify directly the equality of the two expressions (12.1) and (12.2). Since both series have constant term 1, it suffices to compare their derivatives with respect to $z$. It is more convenient to use logarithmic differentiation: we apply the operator $Df(z) = z\frac{d}{dz}\log f(z) = z\frac{f'(z)}{f(z)}$ to both (12.1) and (12.2), and compare the results. For (12.1) we get
$$z\frac{d}{dz}\log\Bigl(\frac{1}{1-qz}\Bigr) = \frac{qz}{1-qz} = \sum_{n=1}^\infty q^n z^n \tag{12.3}$$
whereas (12.2) yields
$$\begin{aligned}
z\frac{d}{dz}\log\prod_{d=1}^\infty\bigl(1-z^d\bigr)^{-n_d} &= -z\frac{d}{dz}\sum_{d=1}^\infty n_d\log\bigl(1-z^d\bigr) = \sum_{d=1}^\infty\frac{d\,n_d\,z^d}{1-z^d}\\
&= \sum_{d=1}^\infty\sum_{r=1}^\infty d\,n_d\,z^{rd} = \sum_{n=1}^\infty\Bigl(\sum_{1\le d\,\mid\,n} d\,n_d\Bigr)z^n = \sum_{n=1}^\infty q^n z^n.
\end{aligned} \tag{12.4}$$
Since (12.3) and (12.4) agree, and since (12.1) and (12.2) have the same constant term 1, it follows that (12.1) and (12.2) agree.
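The last step of (12.4) uses the identity $\sum_{d\mid n} d\,n_d = q^n$. As a sanity check, the sketch below evaluates $n_d$ by the standard Möbius inversion formula $n_d = \frac1d\sum_{e\mid d}\mu(d/e)\,q^e$ (we assume this agrees with the count in Theorem 3.13, which is not reproduced here) and compares both sides; the function names are illustrative.

```python
def mobius(n):
    """Moebius function mu(n), computed by trial division."""
    result, d = 1, 2
    while d * d <= n:
        if n % d == 0:
            n //= d
            if n % d == 0:
                return 0                  # n has a square factor
            result = -result
        d += 1
    return -result if n > 1 else result

def num_irreducible(q, d):
    """Number n_d of monic irreducible polynomials of degree d over F_q."""
    return sum(mobius(d // e) * q ** e
               for e in range(1, d + 1) if d % e == 0) // d

q = 3
for n in range(1, 9):
    lhs = sum(d * num_irreducible(q, d) for d in range(1, n + 1) if n % d == 0)
    print(n, lhs, q ** n)                 # the last two columns should agree
```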
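Now let $\lambda$ be a complex-valued multiplicative function defined on the monoid of nonzero ideals of $\mathcal{O}$. This means that for nonzero ideals $\mathfrak{A}, \mathfrak{B} \subseteq \mathcal{O}$, we have $\lambda(\mathfrak{A}\mathfrak{B}) = \lambda(\mathfrak{A})\lambda(\mathfrak{B})$. We define the L-function
$$L_\lambda(s) = \sum_{\mathfrak{A}} \frac{\lambda(\mathfrak{A})}{N(\mathfrak{A})^s}$$
where the sum is again over all nonzero ideals $\mathfrak{A} \subseteq \mathcal{O}$. Note that for the constant function $\lambda(\mathfrak{A}) = 1$, this is just the zeta function. We shall immediately substitute $z = q^{-s}$ as before. In all cases of interest we shall have $|\lambda(\mathfrak{A})| \le 1$; so that by comparison, $L_\lambda(s)$ converges in the open disk $|z| < \frac1q$. The multiplicative property of $\lambda$ (together with the multiplicative property of the norm, as before) means that $L_\lambda(s)$ admits an Euler factorization
$$L_\lambda(s) = \prod_{\mathfrak{P}} \Bigl(1 - \frac{\lambda(\mathfrak{P})}{N(\mathfrak{P})^s}\Bigr)^{-1}$$
where $\mathfrak{P}$ ranges again over all nonzero prime ideals of $\mathcal{O}$.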
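Now each nonzero ideal has a unique monic generator; so it makes sense to write $\lambda(f) = \lambda(\mathfrak{A})$ where $\mathfrak{A} = (f(x)) \subseteq \mathcal{O}$ and $f(x) \in M$; here we denote by $M$ the monoid of monic polynomials in $\mathcal{O}$. For $d \ge 0$, denote by $M_d \subset M$ the subset consisting of monic polynomials of degree $d$. Also let $P \subset M$ be the set of monic irreducible polynomials; and $P_d = P \cap M_d$ is the set of monic irreducible polynomials of degree $d$. Thus
$$\begin{aligned}
L_\lambda(s) &= \sum_{f\in M}\frac{\lambda(f)}{q^{s\deg f}} = \sum_{f\in M}\lambda(f)z^{\deg f} = \sum_{n=0}^\infty\sum_{f\in M_n}\lambda(f)z^n\\
&= \prod_{f\in P}\Bigl(1-\frac{\lambda(f)}{q^{s\deg f}}\Bigr)^{-1} = \prod_{f\in P}\bigl(1-\lambda(f)z^{\deg f}\bigr)^{-1} = \prod_{d=1}^\infty\prod_{f\in P_d}\bigl(1-\lambda(f)z^d\bigr)^{-1}.
\end{aligned} \tag{12.5}$$
Applying to (12.5) the differential operator $D$ as above, we obtain
$$\begin{aligned}
z\frac{d}{dz}\log L_\lambda(s) &= -z\frac{d}{dz}\sum_{d=1}^\infty\sum_{f\in P_d}\log\bigl(1-\lambda(f)z^d\bigr) = \sum_{d=1}^\infty\sum_{f\in P_d}\frac{d\,\lambda(f)z^d}{1-\lambda(f)z^d}\\
&= \sum_{d=1}^\infty\sum_{f\in P_d}\sum_{r=1}^\infty d\,\lambda(f)^r z^{rd} = \sum_{n=1}^\infty\Bigl(\sum_{d\,\mid\,n}\sum_{f\in P_d} d\,\lambda(f)^{n/d}\Bigr)z^n.
\end{aligned} \tag{12.6}$$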

Now how do we come up with suitable choices of $\lambda$, other than the constant $\lambda(\mathfrak{A}) = 1$? And which choices of multiplicative function are most useful? It is easy to concoct multiplicative functions: simply define $\lambda(f)$ for $f \in P$ arbitrarily, as this will uniquely extend to the entire monoid $M$ using the multiplicative property. And as long as we choose $|\lambda(f)| \le 1$ for $f \in P$, $\lambda$ will satisfy this bound for all $f \in M$.
Our interest is in a very special choice of $\lambda$, which will greatly simplify the coefficient of $z^n$ in (12.5). To this end, we first fix a multiplicative character $\chi \in \widehat{F^\times}$ and additive character $\psi \in \widehat{F}$ as in Section 11. For an arbitrary monic polynomial
$$f(x) = x^d - a_1x^{d-1} + a_2x^{d-2} - \cdots + (-1)^{d-1}a_{d-1}x + (-1)^d a_d \in M_d, \qquad d \ge 1,$$
define $\lambda(f) = \chi(a_d)\psi(a_1)$. In particular for $f(x) = x - a \in M_1$, we have $\lambda(f) = \chi(a)\psi(a)$. And of course for the unique monic constant polynomial, we require $\lambda(1) = 1$.

Lemma 12.7. (i) $\lambda$ is multiplicative.
(ii) Assuming $\chi$ and $\psi$ are not both trivial,
$$\sum_{f\in M_n}\lambda(f) = \begin{cases} 1, & \text{if } n = 0;\\ G(\chi,\psi), & \text{if } n = 1;\\ 0, & \text{if } n \ge 2.\end{cases}$$

Proof. (i) Let $f(x) \in M_d$ as above, and $g(x) = x^e - b_1x^{e-1} + \cdots + (-1)^{e-1}b_{e-1}x + (-1)^e b_e \in M_e$. Then
$$f(x)g(x) = x^{d+e} - (a_1+b_1)x^{d+e-1} + \cdots + (-1)^{d+e}a_db_e \in M_{d+e}.$$
We have
$$\lambda(f)\lambda(g) = \chi(a_d)\psi(a_1)\chi(b_e)\psi(b_1) = \chi(a_db_e)\psi(a_1+b_1) = \lambda(fg).$$
(ii) The case $n = 0$ is clear since $M_0 = \{1\}$. Since $M_1$ consists of polynomials of the form $x - a$ for $a \in F$, we have $\sum_{f\in M_1}\lambda(f) = \sum_{a\in F}\chi(a)\psi(a) = G(\chi,\psi)$.
Finally, write $f(x) = x^n - a_1x^{n-1} + a_2x^{n-2} - \cdots + (-1)^n a_n \in M_n$ where $n \ge 2$. Since $\lambda(f)$ does not depend on the coefficients $a_2, a_3, \ldots, a_{n-1}$,
$$\sum_{f\in M_n}\lambda(f) = q^{n-2}\sum_{a_1,a_n\in F}\chi(a_n)\psi(a_1) = q^{n-2}\Bigl(\sum_{a\in F}\chi(a)\Bigr)\Bigl(\sum_{b\in F}\psi(b)\Bigr) = 0$$
since either $\chi \ne \chi_0$ or $\psi \ne \psi_0$.

Lemma 12.7 gives the coefficient of $z^n$ in the series expansion of (12.5), which therefore (assuming $\chi$ and $\psi$ are not both trivial) reduces to a polynomial of degree 1:
$$L_\lambda(s) = 1 + G(\chi,\psi)z.$$
Substituting into (12.6) gives
$$\frac{G(\chi,\psi)z}{1+G(\chi,\psi)z} = \sum_{n=1}^\infty (-1)^{n-1}G(\chi,\psi)^n z^n = \sum_{n=1}^\infty\Bigl(\sum_{d\,\mid\,n}\sum_{f\in P_d} d\,\lambda(f)^{n/d}\Bigr)z^n. \tag{12.8}$$

There is a very straightforward interpretation of the coefficients of $z^n$ on the right. For each finite extension $K = \mathbb{F}_{q^n}$, $[K:F] = n$, the characters $\chi \in \widehat{F^\times}$ and $\psi \in \widehat{F}$ lift to characters $\chi^K \in \widehat{K^\times}$ and $\psi^K \in \widehat{K}$ defined by
$$\chi^K(a) = \chi(N_{K/F}\,a), \qquad \psi^K(a) = \psi(\mathrm{Tr}_{K/F}\,a) \qquad \text{for } a \in K.$$
Note that $\chi^K$ is multiplicative since it is a composite of two multiplicative functions (Theorem A1.7); similarly $\psi^K$ is an additive character. We will assume that $\chi \ne \chi_0$. In the Gauss sum
$$G(\chi^K,\psi^K) = \sum_{a\in K}\chi^K(a)\psi^K(a) = \sum_{0\ne a\in K}\chi^K(a)\psi^K(a) \tag{12.9}$$
we partition the terms according to the minimal polynomial $f(x)$ of $a \in K^\times$ over $F$, grouping together all terms arising from roots of $f(x)$. Each such polynomial has $d = \deg f = [F[a]:F]$ dividing $n = [K:F]$; and in the intermediate field $E := F[a]$ we obtain the splitting
$$f(x) = (x-r_1)(x-r_2)\cdots(x-r_d), \qquad a \in \{r_1, r_2, \ldots, r_d\} \subseteq E.$$
Evidently
$$f(x) = x^d - \Bigl(\sum_i r_i\Bigr)x^{d-1} + \cdots + (-1)^d\prod_i r_i$$
and so
$$\begin{aligned}
\lambda(f)^{n/d} &= \chi\Bigl(\prod_i r_i\Bigr)^{n/d}\psi\Bigl(\sum_i r_i\Bigr)^{n/d}\\
&= \chi\bigl((N_{E/F}\,a)^{n/d}\bigr)\,\psi\bigl(\tfrac{n}{d}\,\mathrm{Tr}_{E/F}\,a\bigr) && \text{(by Theorem A5.13)}\\
&= \chi\bigl(N_{K/F}\,a\bigr)\,\psi\bigl(\mathrm{Tr}_{K/F}\,a\bigr) && \text{(by Corollary A1.10)}\\
&= \chi^K(a)\psi^K(a).
\end{aligned}$$
Moreover the algebraic conjugates of $a$ (the $d$ roots of $f(x)$) all contribute this same term to the sum. Thus the coefficient of $z^n$ on the right side of (12.8) is
$$\sum_{d\,\mid\,n}\sum_{f\in P_d} d\,\lambda(f)^{n/d} = \sum_{a\in K}\chi^K(a)\psi^K(a) = G(\chi^K,\psi^K).$$

Now comparing coefficients in (12.8) gives $G(\chi^K,\psi^K) = (-1)^{n-1}G(\chi,\psi)^n$. This is known as the Hasse-Davenport lifting relation, which can also be rewritten slightly as

Theorem 12.10 (Hasse-Davenport). Let $\chi \in \widehat{F^\times}$ and $\psi \in \widehat{F}$ be nontrivial where $F = \mathbb{F}_q$. For each finite extension $K = \mathbb{F}_{q^n} \supseteq F$, denote by $\chi^K \in \widehat{K^\times}$ and $\psi^K \in \widehat{K}$ the characters obtained by lifting. Then
$$-G(\chi^K,\psi^K) = \bigl(-G(\chi,\psi)\bigr)^n.$$

Lang [L1] defines $G(\chi,\psi)$ to be $-\sum_a\chi(a)\psi(a)$ instead of $\sum_a\chi(a)\psi(a)$, thereby simplifying this formula and some others. This seems such a natural choice that I was tempted to follow it in these notes; but ultimately I settled on the choice of most authors for the sake of consistency.
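The lifting relation can be tested numerically in the smallest nontrivial case $n = 2$ with $F = \mathbb{F}_p$. The sketch below makes the following assumptions, none of which come from the notes: $K = \mathbb{F}_{p^2}$ is modelled as $\{a + b\sqrt{w}\}$ for a nonsquare $w \in F$ (here a generator of $F^\times$), so that the norm is $a^2 - wb^2$ and the trace is $2a$; $\psi = \psi_1$; and `hasse_davenport_check` is a hypothetical helper name.

```python
import cmath

def hasse_davenport_check(p, k):
    """Return (G(chi^K, psi^K), -G(chi, psi)^2) for K = F_{p^2} over F = F_p,
    where chi = chi_k is nontrivial and psi = psi_1 (p an odd prime)."""
    w = next(g for g in range(2, p)
             if len({pow(g, r, p) for r in range(p - 1)}) == p - 1)
    log = {pow(w, r, p): r for r in range(p - 1)}
    chi = lambda x: 0 if x % p == 0 else cmath.exp(
        2j * cmath.pi * ((k * log[x % p]) % (p - 1)) / (p - 1))
    psi = lambda x: cmath.exp(2j * cmath.pi * (x % p) / p)
    G = sum(chi(x) * psi(x) for x in range(p))
    # Elements of K are a + b*sqrt(w); the generator w is a nonsquare in F.
    # Lifted characters: chi^K = chi(norm) = chi(a^2 - w b^2), psi^K = psi(2a).
    GK = sum(chi(a * a - w * b * b) * psi(2 * a)
             for a in range(p) for b in range(p))
    return GK, -G ** 2                    # (-1)^(n-1) G^n with n = 2

GK, expected = hasse_davenport_check(7, 2)
print(abs(GK - expected) < 1e-9)
```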
Now in the case $p$ is odd, the quadratic character of $F = \mathbb{F}_p$ is $\chi(a) = \bigl(\frac{a}{p}\bigr)$, this being the unique character $\chi \in \widehat{F^\times}$ of order 2. Not surprisingly, the character $\chi^K \in \widehat{K^\times}$ obtained by lifting is nothing other than the quadratic character of $K$, this being its unique multiplicative character of order 2. We check that
$$\chi^K(a) = \chi(N_{K/F}(a)) \equiv \bigl(a^{\frac{q-1}{p-1}}\bigr)^{\frac{p-1}{2}} = a^{\frac{q-1}{2}} \bmod p$$
which is indeed the quadratic character on $K$; see also Exercise #3.3. As a special case of Theorem 12.10, we have

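Corollary 12.11. Let $\chi \in \widehat{K^\times}$ be the quadratic character on a finite field $K$ of odd order $q = p^n$. Then
$$G(\chi) = \begin{cases} (-1)^{n-1}\sqrt{q}, & \text{if } p \equiv 1 \bmod 4;\\ (-i)^{n+2}\sqrt{q}, & \text{if } p \equiv 3 \bmod 4.\end{cases}$$

Proof. On $F = \mathbb{F}_p$, the additive character $\psi_1(a) = \zeta^a$ lifts to $\psi^K(a) = \zeta^{\mathrm{Tr}_{K/F}\,a}$. By Theorem 12.10, $G(\chi^K) = G(\chi^K,\psi^K) = (-1)^{n-1}G(\chi,\psi)^n = (-1)^{n-1}G(\chi)^n$, where on the right $\chi$ denotes the quadratic character of $F$. Now the result follows from Theorem 10.13.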

Exercises 12.
1. According to the Kronecker-Weber Theorem (see Section 4), every abelian extension of $\mathbb{Q}$ is contained in a cyclotomic extension. In particular, every quadratic extension of $\mathbb{Q}$ should be contained in a cyclotomic extension. Here we verify this fact without using the Kronecker-Weber Theorem. Let $F \supset \mathbb{Q}$ be a quadratic field extension; that is, $[F:\mathbb{Q}] = 2$.
(a) Show that $F = \mathbb{Q}[\sqrt{d}]$ for some integer $d \not\equiv 0 \bmod 4$. (Hint: Choose $\theta \in F$, $\theta \notin \mathbb{Q}$, and let $f(x) \in \mathbb{Q}[x]$ be the minimal polynomial of $\theta$ over $\mathbb{Q}$. Consider the discriminant of $f$.)
(b) If $d$ is as in (a), use Corollary 12.11 to show that $\sqrt{d} \in \mathbb{Q}[\zeta_n]$ for some positive integer $n$. Verify this first in the case that $d$ is a prime power; then extend to the general case $d \not\equiv 0 \bmod 4$.

13. Exponential Sums

Consider a prime field $F = \mathbb{F}_p$. Generalizing slightly the definition of a Gauss sum, it would be natural to consider sums of the form
$$\sum_{a\in F}\chi(a)\psi(f(a)) = \sum_{a\in F}\chi(a)\zeta^{f(a)}$$
for an arbitrary function $f : F \to F$. As we shall soon see, for quadratic polynomials $f(x)$ the corresponding sums are already expressible in the language of Gauss sums. For more general functions $f : F \to F$ there is much more to be said; and the case where $\chi$ is trivial is already sufficiently interesting. This leads to the definition of exponential sums given below; and it is worth keeping in mind that both Gauss sums and exponential sums are special cases of sums having the form suggested above. While neither type of sum (Gauss or exponential) is a generalization of the other, we shall see that the two types of sum coincide in the quadratic case.
For an arbitrary function $f : F \to F$, $F = \mathbb{F}_p$, $\zeta = \zeta_p = e^{2\pi i/p}$, we define the exponential sum of $f$ as
$$S_f := \sum_{a\in F}\zeta^{f(a)} \in \mathbb{Z}[\zeta].$$
For more general finite fields $E = \mathbb{F}_q$, $q = p^e$, we must compose with the trace map $\mathrm{Tr} = \mathrm{Tr}_{E/F} : E \to F$. Recall that this is the $F$-linear map defined by
$$\mathrm{Tr}\,a = a + a^p + a^{p^2} + \cdots + a^{p^{e-1}}.$$
Since $\mathrm{Tr}\,a \in \mathbb{F}_p$, we are able to meaningfully define the exponential sum of $f : E \to E$ as
$$S_f = S_{f(x)} := \sum_{a\in E}\zeta^{\mathrm{Tr}\,f(a)} \in \mathbb{Z}[\zeta],$$
noting that values of $S_f$ are still in the same ring $\mathbb{Z}[\zeta]$, $\zeta = \zeta_p$ as before. Of course $S_f$ depends really only on the multiset of values of $f$ rather than on $f$ itself: in general there will be many functions $g : E \to E$ such that $f$ and $g$ take each value in $E$ the same number of times, in which case $S_g = S_f$. In particular, there are $q!$ permutations of $E$, all having the same exponential sum $S_f = 0$, this being a consequence of the relation $\sum_{a\in E}\zeta^{\mathrm{Tr}\,a} = 0$, which remains valid after an arbitrary permutation of terms in the sum. We note that part (ii) of the following lemma holds only in the case of prime fields.

Lemma 13.1. For exponential sums over a field $E$ of order $q$, we have
(i) $|S_f| \le q$; and equality holds iff $\mathrm{Tr}\circ f$ is a constant function (for $q = p$, iff $f$ is a constant function).
(ii) Suppose that $q = p$. Then any two functions $f, g : F \to F$ have the same exponential sum $S_f = S_g$ iff $f$ and $g$ have the same multiset of values (i.e. for every $b \in F$, the equation $f(x) = b$ has the same number of solutions as the equation $g(x) = b$). In particular, $S_f = 0$ iff $f : F \to F$ is a permutation of $F$.

Proof. Conclusion (i) is a simple application of the triangle inequality. In (ii), the argument above proves the '$\Leftarrow$' in both 'iff' statements; and the '$\Rightarrow$' direction in both statements follows using the fact that $\Phi_p(x) = 1 + x + x^2 + \cdots + x^{p-1}$ is the minimal polynomial of $\zeta$ over $\mathbb{Q}$.
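The '$\Leftarrow$' directions of Lemma 13.1(ii) are easy to observe numerically over a prime field. In this sketch, `exp_sum` is an illustrative helper taking the table of values of $f$; the cubic polynomial and the random seed are arbitrary choices made for the demonstration.

```python
import cmath
import random

def exp_sum(p, values):
    """S_f = sum of zeta_p^f(a) over a in F_p, where values[a] = f(a)."""
    return sum(cmath.exp(2j * cmath.pi * (v % p) / p) for v in values)

p = 11
random.seed(0)
perm = list(range(p))
random.shuffle(perm)                              # a random permutation of F_p
f = [(a ** 3 + 2 * a) % p for a in range(p)]      # some function on F_p
g = [f[perm[a]] for a in range(p)]                # same multiset of values as f
print(abs(exp_sum(p, perm)) < 1e-9, abs(exp_sum(p, f) - exp_sum(p, g)) < 1e-9)
```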

It is worth keeping in mind that a 'random' function $f : E \to E$ is expected to have $|S_f| \approx \sqrt{q}$, which is rather smaller than the upper bound of Lemma 13.1(i). This follows from
$$|S_f|^2 = S_f\overline{S_f} = \sum_{a\in E}\zeta^{\mathrm{Tr}\,f(a)}\sum_{b\in E}\zeta^{-\mathrm{Tr}\,f(b)} = \sum_{a,b\in E}\zeta^{\mathrm{Tr}[f(a)-f(b)]} = q + \sum_{a\ne b}\zeta^{\mathrm{Tr}[f(a)-f(b)]} \approx q,$$
assuming the values of $\mathrm{Tr}\,f(a)$ are uniformly distributed in $\mathbb{F}_p$, leading to widespread cancellation of terms in the latter sum. The argument in fact shows (using linearity of expectation) that for a random walk consisting of $n$ unit steps in the plane, the steps being taken randomly and independently from some distribution which is balanced (the expected step being the zero vector), the expected square of the total distance travelled is $n$; and thus the RMS (root mean square) distance travelled is $\sqrt{n}$. This argument does not say that the expected length of a random walk is $\sqrt{n}$ (since the squaring function is nonlinear); nevertheless the approximation $|S_f| \approx \sqrt{q}$ is a handy gauge against which to compare $|S_f|$ for those functions $f$ which we encounter. Similar heuristics may be applied to estimating more general sums of the form $\sum_{a\in F}\chi(g(a))\psi(f(a))$.
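For a very small prime field the heuristic can be made exact: averaging $|S_f|^2$ over all $p^p$ functions $f : \mathbb{F}_p \to \mathbb{F}_p$ gives precisely $p$, by the linearity-of-expectation argument above. The sketch below does this exhaustively (the function name is ours; the enumeration is only feasible for tiny $p$).

```python
import cmath
from itertools import product

def mean_square_exp_sum(p):
    """Average of |S_f|^2 over all p^p functions f : F_p -> F_p."""
    z = [cmath.exp(2j * cmath.pi * v / p) for v in range(p)]
    # each tuple 'values' is the table (f(0), ..., f(p-1)) of one function
    total = sum(abs(sum(z[v] for v in values)) ** 2
                for values in product(range(p), repeat=p))
    return total / p ** p

print(round(mean_square_exp_sum(3), 9), round(mean_square_exp_sum(5), 9))
```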
The single most important result on exponential sums is

Theorem 13.2. Suppose $f : E \to E$ is represented by a polynomial in $E[x]$ of degree $d \ge 1$, where $d$ is not divisible by $p$. Then $|S_f| \le (d-1)\sqrt{q}$.

This bound, which we refer to as Weil’s bound, also known as the Hasse-Davenport-
Weil bound, is actually the result of several 20th century mathematicians including André
Weil. The first complete proof of this bound relies on Pierre Deligne’s work on the Weil
conjectures, work completed in 1973 and for which he received the Fields Medal in 1978.
While the Weil conjectures are quite deep and far-reaching, today we have proofs by more
elementary methods; see [LN], [Sc]. We will not present the full proof of Weil’s bound; but
in Section 17 we present some of the key elements in an ‘elementary’ proof. We mention
that Weil’s bound extends also to Galois rings; and that we [MSW] have applied Weil’s
bound to eigenvalues of algebraically defined graphs, both for finite fields and for Galois
rings.
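The bound of Theorem 13.2 can be observed experimentally over a small prime field. The sketch below runs over all cubic polynomials with zero constant term over $\mathbb{F}_{13}$ (a constant term only multiplies $S_f$ by a root of unity) and records the largest $|S_f|$; the helper name `exp_sum_poly` and the choice $p = 13$, $d = 3$ are ours.

```python
import cmath
from itertools import product

def exp_sum_poly(p, coeffs):
    """S_f for f(x) = sum of coeffs[i] * x^i over the prime field F_p."""
    def f(x):
        return sum(c * pow(x, i, p) for i, c in enumerate(coeffs)) % p
    return sum(cmath.exp(2j * cmath.pi * f(x) / p) for x in range(p))

p, d = 13, 3                              # d is not divisible by p
worst = max(abs(exp_sum_poly(p, (0, c1, c2, c3)))
            for c1, c2 in product(range(p), repeat=2)
            for c3 in range(1, p))        # leading coefficient nonzero
print(worst, (d - 1) * p ** 0.5)          # the first value should not exceed the second
```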
Note the obvious necessity of $d \ge 1$ in Weil's bound; and the necessity of the hypothesis $p \nmid d$ is discussed in Exercise #2. Weil's bound is of course useless for larger values $d \ge 1+\sqrt{q}$, since in that case it is no stronger than the trivial bound of Lemma 13.1(i). In applications of Weil's bound, the reader is reminded that every function $f : E \to E$ is representable as a polynomial of degree $d \le q-1$, simply using interpolation. In general, the strength of Weil's bound depends on the particular choice of $p$-th root of unity $\zeta$ (recall that one has $\phi(p) = p-1$ choices for $\zeta$). Of course for $d = 1$, Weil's bound holds with equality (since polynomial maps of degree 1 are permutations; see Lemma 13.1(ii)). It is not hard to show that equality also holds for quadratic polynomials:

Theorem 13.3. Consider an arbitrary quadratic polynomial $f(x) = ax^2 + bx + c \in E[x]$, $a \ne 0$ where $E = \mathbb{F}_q$, $q = p^e$, $p$ an odd prime, $e \ge 1$. Writing $d = \frac{1}{4a}(b^2 - 4ac)$, we have
(i) $S_f = \chi(a)\zeta^{-\mathrm{Tr}\,d}\,G(\chi)$ where $\chi$ is the quadratic character on $E$ ($\chi(a) = 0, 1, -1$ for $a = 0$, nonzero square, nonsquare respectively) and $S_{x^2} = G(\chi)$ is the quadratic Gauss sum. In particular, $|S_f| = \sqrt{q}$.
(ii) If $q = p$ and $\zeta = e^{2\pi i/p}$ then
$$S_f = \begin{cases} \bigl(\frac{a}{p}\bigr)\zeta^{-d}\sqrt{p}, & \text{if } p \equiv 1 \bmod 4;\\[2pt] i\bigl(\frac{a}{p}\bigr)\zeta^{-d}\sqrt{p}, & \text{if } p \equiv 3 \bmod 4.\end{cases}$$

Recall that the generalization of (ii) for arbitrary odd $q$, giving the exact value of the quadratic Gauss sum $G(\chi)$, was found at the end of Section 12.
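Proof of Theorem 13.3. The quadratic Gauss sum is
$$G(\chi) = \sum_{x\in E}\chi(x)\zeta^{\mathrm{Tr}\,x} = \sum_{\chi(y)=1}\zeta^{\mathrm{Tr}\,y} - \sum_{\chi(y)=-1}\zeta^{\mathrm{Tr}\,y} = 1 + 2\sum_{\chi(y)=1}\zeta^{\mathrm{Tr}\,y} = -1 - 2\sum_{\chi(y)=-1}\zeta^{\mathrm{Tr}\,y}$$
since $0 = \sum_{y\in E}\zeta^{\mathrm{Tr}\,y} = 1 + \sum_{\chi(y)=1}\zeta^{\mathrm{Tr}\,y} + \sum_{\chi(y)=-1}\zeta^{\mathrm{Tr}\,y}$. Now if $f(x) = ax^2$, $\chi(a) = 1$ then
$$S_f = \sum_{x\in E}\zeta^{\mathrm{Tr}(ax^2)} = 1 + 2\sum_{\chi(y)=1}\zeta^{\mathrm{Tr}\,y} = G(\chi)$$
whereas if $f(x) = ax^2$, $\chi(a) = -1$ then
$$S_f = \sum_{x\in E}\zeta^{\mathrm{Tr}(ax^2)} = 1 + 2\sum_{\chi(y)=-1}\zeta^{\mathrm{Tr}\,y} = -G(\chi).$$
In either case, we have $S_{ax^2} = \chi(a)G(\chi)$.
In the general case, $f(x) = ax^2 + bx + c = a\bigl(x + \frac{b}{2a}\bigr)^2 - d = g\bigl(x + \frac{b}{2a}\bigr) - d$ where $g(x) = ax^2$ and $d = \frac{1}{4a}\bigl(b^2 - 4ac\bigr)$. Substituting $u = x + \frac{b}{2a}$, we get
$$S_f = \sum_{x\in E}\zeta^{\mathrm{Tr}\,f(x)} = \sum_{u\in E}\zeta^{\mathrm{Tr}(g(u)-d)} = \zeta^{-\mathrm{Tr}\,d}S_g = \chi(a)\zeta^{-\mathrm{Tr}\,d}\,G(\chi).$$
The remaining assertions follow from Theorems 11.3(i) and 10.13.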
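Theorem 13.3(i) can be checked numerically for prime fields, where Tr is the identity and the quadratic character is the Legendre symbol. The following sketch is illustrative only: `check_quadratic_exp_sum` is a hypothetical helper, Euler's criterion is used for the Legendre symbol, and the modular inverse via `pow(x, -1, p)` assumes Python 3.8 or later.

```python
import cmath

def check_quadratic_exp_sum(p, a, b, c):
    """Return (S_f, chi(a) zeta^(-d) G(chi)) for f = a x^2 + b x + c over F_p,
    with p an odd prime, a != 0 mod p, and d = (b^2 - 4ac)/(4a) in F_p."""
    zeta = lambda e: cmath.exp(2j * cmath.pi * (e % p) / p)
    # Legendre symbol via Euler's criterion
    legendre = lambda x: 0 if x % p == 0 else (
        1 if pow(x, (p - 1) // 2, p) == 1 else -1)
    S_f = sum(zeta(a * x * x + b * x + c) for x in range(p))
    G = sum(legendre(x) * zeta(x) for x in range(p))       # quadratic Gauss sum
    d = (b * b - 4 * a * c) * pow(4 * a, -1, p) % p        # needs Python 3.8+
    return S_f, legendre(a) * zeta(-d) * G

S_f, predicted = check_quadratic_exp_sum(7, 3, 5, 2)
print(abs(S_f - predicted) < 1e-9, abs(abs(S_f) ** 2 - 7) < 1e-9)
```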

In the case of prime fields $E = \mathbb{F}_p$, we have already observed that those functions $f : E \to E$ attaining $|S_f| = 0$ are just the permutations of $E$. Similarly, Theorem 13.3 admits a converse which characterizes those functions attaining Weil's bound $|S_f| = \sqrt{q}$ as exactly the quadratic polynomials, or permuted versions thereof (via Lemma 13.1(ii)):

Theorem 13.4 (Cavior [Ca]). Let $f : F \to F$ where $F = \mathbb{F}_p$, $p$ an odd prime. Then $|S_f| = \sqrt{p}$ iff $f$ has the same multiset of values as a quadratic polynomial.


Proof. If $f$ has the same multiset of values as a quadratic polynomial, then $|S_f| = \sqrt{p}$ by Lemma 13.1(ii) and Theorem 13.3.

Conversely, suppose $|S_f| = \sqrt{p}$, so that $S_f\overline{S_f} = p = S_g\overline{S_g}$ where $g(x) = x^2$. So the principal ideals in $\mathcal{O} = \mathbb{Z}[\zeta]$ generated by the algebraic integers $\alpha := S_f$ and $\beta := S_g$ satisfy $(\alpha)(\overline{\alpha}) = (\beta)(\overline{\beta}) = (p) = (\varepsilon)^{p-1}$ where the ideal $(\varepsilon) = (1-\zeta) \subset \mathcal{O}$ is the only prime factor of $(p)$; see Theorem 4.4. By unique factorization of ideals we therefore have $(\alpha) = (\varepsilon)^r$ and $(\overline{\alpha}) = (\varepsilon)^s$ for some nonnegative integers satisfying $r + s = p-1$. Since $N_{\mathbb{Q}[\zeta]/\mathbb{Q}}(\alpha) = N_{\mathbb{Q}[\zeta]/\mathbb{Q}}(\overline{\alpha})$, we have $r = s$; so $(\alpha) = (\overline{\alpha}) = (\varepsilon)^{(p-1)/2}$. The same argument applies to $\beta$, giving $(\beta) = (\varepsilon)^{(p-1)/2} = (\alpha)$. Thus $\alpha$ and $\beta$ are associates, and $\alpha = u\beta$ for some unit $u \in \mathcal{O}^\times$. Since $|\alpha| = |\beta| = \sqrt{p}$, we obtain $|u| = 1$. Moreover for every $\sigma \in \mathrm{Aut}\,\mathbb{Q}[\zeta]$, Theorem 4.1 gives
$$|\sigma(u)|^2 = \sigma(u)\overline{\sigma(u)} = \sigma(u)\sigma(\overline{u}) = \sigma(u\overline{u}) = \sigma(1) = 1$$
so $|\sigma(u)| = 1$. By Theorem 4.10, $u$ is a root of unity. Now the only roots of unity in $\mathbb{Q}[\zeta]$ are $\pm 1, \pm\zeta, \ldots, \pm\zeta^{p-1}$ by Theorem 4.2, so we have two cases. If $u = \zeta^k$, $k \in \{0, 1, 2, \ldots, p-1\}$, then $S_f = \zeta^k S_g = S_h$ where $h(x) = x^2 + k$. By Lemma 13.1(ii), $f$ has the same multiset of values as the quadratic polynomial $h(x) = x^2 + k$ and we are done.
Otherwise $u = -\zeta^k$, $k \in \{0, 1, 2, \ldots, p-1\}$ and by Theorem 13.3, the polynomial $h(x) = \eta x^2 + k$ has exponential sum $S_h = -\zeta^k S_g$ provided $\eta \in F$ is a nonsquare. Once again, $S_f = uS_g = S_h$ so by Lemma 13.1(ii), $f$ has the same multiset of values as $h$.

Cavior’s Theorem shows that exponential sums provide a characterization of quadratic


polynomials over F =Fp , p prime. Some further characterizations of polynomials of degree
0, 1 and 2, also using exponential sums, are given in Theorems 13.5, 13.11, 13.12 and 14.2
below. Applications of these results will be given in Sections 14, 15 and 16.

Theorem 13.5 [M3]. Let $f : F \to F$ where $F = \mathbb{F}_p$, $p$ an odd prime. Suppose there exists a real constant $\kappa > 0$ such that for all $c \in F$, the exponential sum of the function $x \mapsto f(x) + cx$ satisfies $|S_{f(x)+cx}| \in \{0, \kappa\}$. Then either

(a) $f$ is quadratic and $|S_{f(x)+cx}| = \sqrt{p}$ for all $c \in F$, or
(b) $f$ is constant or linear, i.e. $f(x) = a_1x + a_0$ for some $a_0, a_1 \in F$ and
$$|S_{f(x)+cx}| = \begin{cases} 0, & \text{if } c \ne -a_1;\\ p, & \text{if } c = -a_1.\end{cases}$$

Proof. Let $\zeta = \zeta_p$. For each $c \in F$, define $\alpha_c \in \mathbb{C}$ by

\[ \alpha_c = \begin{cases} \kappa^{-1}\,\overline{S_{f(x)+cx}}, & \text{if } S_{f(x)+cx} \ne 0; \\ 1, & \text{if } S_{f(x)+cx} = 0 \end{cases} \]

so that $|\alpha_c| = 1$ for all $c$. Consider the complex $p \times p$ matrix

\[ M = \bigl( \alpha_x \zeta^{xy + f(y)} : x, y \in F \bigr), \]

in which the row index $x$ plays the role of $c$. A straightforward computation yields $MM^* = pI_p$ where $M^*$ is the conjugate transpose of $M$ and $I_p$ is the $p \times p$ identity matrix. Equivalently, $\frac{1}{\sqrt{p}} M$ is unitary. Now $M$ is diagonalizable and each of its eigenvalues has absolute value $\sqrt{p}$.

Let $\mathbf{1} \in \mathbb{C}^p$ be the column vector of 1's. The hypothesis says that the vector $M\mathbf{1}$ has $k$ entries equal to $\kappa$, and $p - k$ entries equal to zero, where $k$ is the number of $c \in F$ such that $|S_{f(x)+cx}| = \kappa$. Now

\[ k\kappa^2 = \|M\mathbf{1}\|^2 = p\|\mathbf{1}\|^2 = p^2. \]

In particular, $k \ge 1$ and so $\kappa = |S_{f(x)+cx}|$ for some $c \in F$. Since $\kappa^2 = S_{f(x)+cx}\overline{S_{f(x)+cx}} \in \mathbb{Z}[\zeta]$, $\kappa^2$ is an algebraic integer. However, $\kappa^2 = p^2/k \in \mathbb{Q}$; so by Theorem A3.2(ii), $\kappa^2 \in \mathbb{Z}$, and since $1 \le k \le p$ this forces $k \in \{1, p\}$.

If $k = p$ then $|S_{f(x)+cx}| = \sqrt{p}$ for all $c \in F$ and so by Theorem 14.2, our conclusion (a) holds. Otherwise we have $k = 1$, and $|S_{f(x)-a_1 x}| = \kappa = p$ for some $a_1 \in F$. This means that we have a constant function $f(x) - a_1 x = a_0 \in F$, so (b) holds.
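
The two cases of Theorem 13.5 can be illustrated numerically. This is a minimal sketch, assuming $p = 7$ and two sample polynomials chosen by us; the helper name `moduli` is hypothetical and the values are floating-point approximations.

```python
import cmath

def moduli(f, p):
    """Return the list [ |S_{f(x)+cx}| for c = 0, 1, ..., p-1 ]."""
    zeta = cmath.exp(2j * cmath.pi / p)
    return [abs(sum(zeta ** ((f(x) + c * x) % p) for x in range(p)))
            for c in range(p)]

p = 7
# Case (a): a quadratic; every modulus should be sqrt(p).
quad = moduli(lambda x: 3 * x * x + 2 * x + 5, p)
# Case (b): a linear function with a1 = 4; the modulus is p at c = -4 = 3
# (mod 7) and zero for every other c.
lin = moduli(lambda x: 4 * x + 1, p)
```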

The next two technical lemmas will be required for our main results. For every function $f : F \to F$, we define

\[ A_f = \{a \in F : S_{f(x)+ax} \ne 0\}. \]

Lemma 13.6 ([M3]). Suppose $|A_f| \le \frac12(p+1)$. Then $|A_f| = 1$, and $f$ is either constant or linear; i.e. $f(x) = a_1 x + a_0$ for some $a_0, a_1 \in F$.

Proof. By definition, $a \in A_f$ iff there exist distinct $x, y \in F$ such that $f(x) + ax = f(y) + ay$. Thus the subset $-A_f = \{-a : a \in A_f\}$ coincides with the set of all slopes of secants to the graph of $f$ in $F^2$, i.e. the set of all values of the difference quotient $(f(y) - f(x))/(y - x)$ for $x \ne y$ in $F$. The result follows by a theorem of Rédei [Re]; see also [Bl], [LS].
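
For tiny primes, Lemma 13.6 can be confirmed by exhaustive search. The sketch below uses the description of $A_f$ from the proof (the values $a$ for which $x \mapsto f(x) + ax$ fails to be injective) rather than computing exponential sums; the function names are ours.

```python
from itertools import product

def A_f(values, p):
    """Set of a in F_p for which x -> f(x) + a*x is not a permutation,
    i.e. for which two distinct points of the graph give slope -a."""
    return {a for a in range(p)
            if len({(values[x] + a * x) % p for x in range(p)}) < p}

def lemma_13_6_holds(p):
    """Check over all p^p functions: |A_f| <= (p+1)/2 forces |A_f| = 1
    and f(x) = a1*x + a0."""
    for values in product(range(p), repeat=p):
        size = len(A_f(values, p))
        if 2 * size <= p + 1:
            slope = (values[1] - values[0]) % p
            affine = all(values[x] == (values[0] + slope * x) % p
                         for x in range(p))
            if size != 1 or not affine:
                return False
    return True
```

For example, the table of $x^2$ over $\mathbb{F}_5$ gives $A_f = F$, while the table of $2x + 1$ gives $A_f = \{3\} = \{-2\}$.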

Lemma 13.7 ([M3]). Let $F = \mathbb{F}_p$, $p$ prime, and suppose $g : F \to F$ is a non-constant function satisfying $|S_{x^2+cg(x)}| = \sqrt{p}$ for all $c \in F$. Then $g$ is a permutation of $F$. If moreover $g(0) = 0$ and $g(1) = 1$, then $g(x) = \pm x$ for all $x \in F$.

Proof. We use geometric terminology for the affine plane $F^2$, which has $p$ vertical lines of the form $x = a$ ($a \in F$) and $p^2$ non-vertical lines of the form $y = mx + b$ ($m, b \in F$). Consider the point set $O = \{(g(t), t^2) : t \in F\} \subset F^2$. We will soon show that $O$ consists of $p$ distinct points. First observe that if the equation $t^2 = mg(t) + b$ has more than two solutions for $t \in F$, then the function $h(t) = t^2 - mg(t)$ attains the value $b$ more than twice, contrary to Cavior's Theorem 13.4. This shows that

(13.8) for all $m, b \in F$, there are at most two values of $t \in F$ such that the point $(g(t), t^2)$ lies on the line $y = mx + b$.

Next we show that

(13.9) $O$ consists of $p$ distinct points.

If $(g(t_1), t_1^2) = (g(t_2), t_2^2)$, $t_1 \ne t_2$, then $t_2 = -t_1 \ne 0$. In this case, since $g$ is not constant, there exists $t_3 \in F$ such that $g(t_3) \ne g(t_1)$. Let $m = (t_3^2 - t_1^2)/(g(t_3) - g(t_1))$, $b = t_1^2 - mg(\pm t_1) = t_3^2 - mg(t_3)$; then the line $y = mx + b$ passes through $(g(t_i), t_i^2)$ for $i = 1, 2, 3$, contrary to (13.8). This proves (13.9).

Now let $\ell \subset F^2$ be any line, vertical or non-vertical. By (13.8), $|\ell \cap O| = 0$, 1 or 2; and we call $\ell$ a passant, tangent or secant accordingly. Each point $P \in O$ lies on exactly $p + 1$ lines, of which $p - 1$ are secants, and so $P$ lies on exactly two tangents, one of which (we claim) must be vertical. For each $m \in F$, the function $h(t) = t^2 - mg(t)$ attains some value $b \in F$ exactly once, and each other value in $F$ either 0 or two times, again by Cavior's Theorem; so among the $p$ lines of slope $m$, exactly one (the line $y = mx + b$) is a tangent line. Since there are $p$ choices of $m$, there are exactly $p$ non-vertical tangents; hence the remaining $p$ tangents must be vertical. Thus

(13.10) each vertical line is tangent to $O$. That is, $g$ is a permutation of $F$.

Thus $O$ is the graph $\Gamma_f$ of a function $f : F \to F$ satisfying the hypotheses of Segre's Theorem 3.14. We conclude that $O = \{(x, f(x)) : x \in F\}$ and $f(x) = ax^2 + bx + c$ for some $a, b, c \in F$ with $a \ne 0$.

Henceforth we assume that $g(0) = 0$ and $g(1) = 1$, so that $(0, 0), (1, 1) \in O$. Also the only point $(g(x), x^2) \in O$ with second coordinate zero is $(0, 0)$; so the $x$-axis is tangent to $O$, forcing $f(x) = x^2$. The result follows.

Theorem 13.11 ([M3]). Let $f_1, f_2 : F \to F$ be linearly independent functions satisfying $f_i(0) = 0$, and suppose that $|S_{af_1+bf_2}| \in \{0, \sqrt{p}, p\}$ for all $a, b \in F$. Then there exists a permutation $\sigma : F \to F$ and constants $a_i, b_i \in F$ such that $f_i(x) = a_i\sigma(x)^2 + b_i\sigma(x)$, $i = 1, 2$.

Proof. If $|S_{af_1+bf_2}| = p$ then $af_1 + bf_2$ is constant; and since $f_i(0) = 0$, this means $af_1 + bf_2 = 0$. Since $f_1$ and $f_2$ are linearly independent, this forces $a = b = 0$. Thus $|S_{af_1+bf_2}| \in \{0, \sqrt{p}\}$ whenever $(a, b) \ne (0, 0)$.

Consider the case that $f_2$ is a permutation. In this case we may assume $f_2(x) = x$; otherwise replace $f_i$ by $f_i \circ f_2^{-1}$ for $i = 1, 2$. Now $|S_{f_1(x)+bx}| \in \{0, \sqrt{p}\}$ for all $b \in F$; so by Theorem 13.5, $f_1(x) = a_1 x^2 + b_1 x$ for some $a_1, b_1 \in F$ with $a_1 \ne 0$. The result follows in this case.

Now if the two-dimensional space $\langle f_1, f_2 \rangle_F$ of functions $F \to F$ contains a permutation, then by change of basis we reduce to the previous case. We may therefore assume $\langle f_1, f_2 \rangle_F$ contains no permutation, i.e. $|S_{af_1+bf_2}| = \sqrt{p}$ for all $(a, b) \ne (0, 0)$. In particular $|S_{f_1}| = \sqrt{p}$, so by Cavior's Theorem 13.4, there exists a permutation $\sigma : F \to F$ such that $f_1(x) = a_1\sigma(x)^2 + b_1\sigma(x)$, $a_1 \ne 0$; moreover $\sigma(0) = 0$ (thus ensuring the value $f_1(0) = 0$). Furthermore, there is no loss of generality in assuming that $\sigma(x) = x$ and $a_1 = 1$; so $|S_{x^2+b_1x+bf_2(x)}| = \sqrt{p}$ for all $b \in F$. Define $h(x) = f_2(x - \frac{b_1}{2}) - f_2(-\frac{b_1}{2})$; then $|S_{x^2+bh(x)}| = |S_{x^2+b_1x+bf_2(x)}| = \sqrt{p}$ for all $b \in F$, so $h : F \to F$ is bijective by Lemma 13.7. This means that $f_2$ is a permutation after all.

In Section 16 we will also require one more such characterization of polynomials of small degree:

Theorem 13.12 ([M3]). Let $f : F \to F$ where $F = \mathbb{F}_p$, and let $a \in F$ be a nonzero constant. Suppose that $|S_{ax^2+bx+cf(x)}| = \sqrt{p}$ for all $b, c \in F$. Then $f(x) = mx + d$ for some $m, d \in F$.

Proof. By hypothesis,

\[ p = \Bigl|\sum_{x \in F} \zeta^{ax^2+bx+cf(x)}\Bigr|^2 = \sum_{x,y \in F} \zeta^{a(x^2-y^2)+b(x-y)+c(f(x)-f(y))} = \sum_{y,t \in F} \zeta^{2aty+at^2+bt+c(f(y+t)-f(y))} \]

for all $b, c \in F$, where we have substituted $x = y + t$. Multiply both sides by $\zeta^{-b}$ and sum over $b \in F$; only the terms with $t = 1$ survive on the right, and we obtain

(13.13) \[ \sum_{y \in F} \zeta^{2ay+a+c(f(y+1)-f(y))} = 0 \quad \text{for all } c \in F. \]

Now suppose the desired conclusion fails, i.e. $f$ is not representable as a polynomial of degree $\le 1$; we seek a contradiction. Evidently the first-order difference of $f$ is not constant, so there exists $x \in F$ such that

\[ f(x+1) - f(x) \ne m \]

where $m = f(1) - f(0)$. Clearly $x \ne 0$. Set

\[ c = \frac{2ax}{m - [f(x+1) - f(x)]} \]

and check that the exponent in (13.13) takes the same value for $y = 0$ and for $y = x$. However, the only way for the exponential sum (13.13) to vanish is for the exponent to have distinct values as $y$ varies over $F$, which is the desired contradiction.

Exercises 13.
1. Give a direct proof (i.e. without using Gauss sums) that the exponential sum $S_f$ for $f(x) = ax^2$ on $E = \mathbb{F}_q$, $q$ odd, $a \ne 0$, has modulus $|S_f| = \sqrt{q}$. (Hint: Expand $|S_f|^2 = S_f\overline{S_f}$ as a double sum, and use orthogonality of additive characters.)
2. Let $E = \mathbb{F}_q$ where $q = p^e$, $p$ prime, $e \ge 2$; and consider the function $f : E \to E$, $a \mapsto a^p - a$. Evaluate $S_f$ and find conditions under which Weil's bound of Theorem 13.2 fails. This points to the necessity of the hypothesis $\gcd(d, q) = 1$ in that result.
3. Show by example the necessity of the hypothesis that $p$ is prime in Cavior's Theorem 13.4. (Hint: Take $E = \mathbb{F}_9 = \mathbb{F}_3[i]$ where $i^2 = -1$. If $g(x) = x^2$ then $|S_g| = \sqrt{9} = 3$. Find three terms in the sum $S_g$ of the form $\zeta^0 + \zeta^1 + \zeta^2 = 0$ and modify just the three corresponding values of $g(x)$ to create a new function $f$ with the same three terms in its exponential sum, so that $S_f = S_g$. With some care, you can arrange that $f$ takes on some value more than twice, so $f$ has a different multiset of values from any quadratic polynomial.)

14. Affine Planes


An affine plane of order $n$ is an incidence system of $n^2$ points and $n(n + 1)$ lines such that
• Each line has exactly $n$ points;
• Any two distinct points lie on exactly one line;
• Given any line $\ell$ and any point $P$ not on $\ell$, there is exactly one line $m$ through $P$ having no point in common with $\ell$.
Now in an affine plane, we say two lines $\ell$ and $m$ are parallel (denoted $\ell \parallel m$) if they are either the same line, or they are disjoint (i.e. have no points in common). The axioms above say that parallelism of lines is an equivalence relation (and in particular if $\ell_1 \parallel \ell_2$ and $\ell_2 \parallel \ell_3$, then $\ell_1 \parallel \ell_3$). Each parallel class of lines consists of $n$ lines of size $n$, constituting a partition of the $n^2$ points. Each point lies on exactly $n + 1$ lines, one from each parallel class. If two lines are not parallel, then they meet in exactly one point.
The classical affine plane of order $q$ (a prime power) is the plane coordinatized by $F = \mathbb{F}_q$ in the usual way: Take points to be the $q^2$ ordered pairs $(x, y) \in F^2$. There are $q(q+1)$ lines $\ell_{m,h}$ where $m \in F \cup \{\infty\}$, $h \in F$, defined as follows:
• 'Vertical' lines are point sets of the form $\ell_{\infty,k} := \{k\} \times F = \{(k, y) : y \in F\}$ for $k \in F$. Each such line may also be specified by its equation $x = k$.
• 'Nonvertical' lines are point sets of the form $\ell_{m,h} := \{(x, mx+h) : x \in F\}$ where $m, h \in F$. Each such line is also specified by its equation $y = mx + h$.
One readily checks that this structure satisfies the properties required of an affine plane listed above. So for each prime power $n$, there is at least one plane of order $n$.
There exist many constructions of finite affine planes which are not isomorphic to the classical planes constructed above. However, all known finite planes have prime power order; and this has encouraged some to conjecture that every finite affine plane must have prime power order. Take heed, however, of The Streetlight Effect (Section 9): it may well be that the planes of prime power order are the only ones known, simply because they are the easiest ones to find.
The smallest non-classical planes have order 9; and in fact there are seven affine
planes of order 9 up to isomorphism, including the classical plane. (See [HP], [M4] for a
general introduction to affine and projective planes. Those familiar with the usual process
of projective completion will recognize that every affine plane of order n also yields a
projective plane of order n, but we will stick to the affine description here.)
One hopeful scheme for constructing finite affine planes is as follows. Consider a finite field $F = \mathbb{F}_q$, $q = p^e$. A planar function on $F$ is a function $f : F \to F$ such that for every nonzero $d \in F$, the difference function $\Delta_d f : F \to F$ defined by $(\Delta_d f)(x) := f(x+d) - f(x)$ is a permutation of $F$.

Proposition 14.1. If char F is odd, then every quadratic polynomial f (x) ∈ F [x]
represents a planar function on F .

Proof. Let $f(x) = ax^2 + bx + c$ where $a, b, c \in F$ with $a \ne 0$. Then for all nonzero $m \in F$, $\Delta_m f(x) = 2amx + am^2 + bm$ is a polynomial of degree 1 (since $2am \ne 0$ in odd characteristic) and hence a permutation of $F$.
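
The definition of a planar function translates directly into a finite check over a prime field. The following sketch assumes $F = \mathbb{F}_p$ represented by the integers $0, \dots, p-1$; the name `is_planar` is ours.

```python
def is_planar(f, p):
    """True iff x -> f(x+d) - f(x) is a permutation of F_p for every
    nonzero d in F_p."""
    return all(
        len({(f((x + d) % p) - f(x)) % p for x in range(p)}) == p
        for d in range(1, p)
    )
```

For instance, over $\mathbb{F}_7$ the quadratic $2x^2 + 3x + 1$ passes this test, while $x^3$ fails (its difference functions are quadratic, hence not injective) and so does $x$ (its difference functions are constant).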

The interest in planar functions (and the explanation for their name) derives from the fact that every planar function $f : F \to F$ gives rise to a finite affine plane $A_f$ of order $q$ (and hence also a projective plane of the same order). This plane has $q^2$ points $(x, y) \in F^2$ and $q(q+1)$ lines $\tilde\ell_{m,h}$ where $m \in F \cup \{\infty\}$, $h \in F$, defined as follows:
• 'Vertical' lines are point sets of the form $\tilde\ell_{\infty,k} := \{k\} \times F = \{(k, y) : y \in F\}$ for $k \in F$. Each such line is denoted also by its equation $x = k$.
• 'Nonvertical' lines are point sets of the form $\tilde\ell_{m,h} := \{(x, f(x+m)+h) : x \in F\}$ where $m, h \in F$. Each such line is denoted also by its equation $y = f(x+m) + h$.
One readily verifies that the resulting structure is an affine plane of order $q$. For example if $m \ne n$ in $F$, the fact that $\tilde\ell_{m,h} \cap \tilde\ell_{n,k}$ contains a unique point $(x, y)$ follows from the fact that $\Delta_{m-n} f(x_0) = f(x+m) - f(x+n) = k - h$ has a unique solution for $x_0 := x + n$ in $F$.
Unfortunately, however, if one uses a quadratic polynomial $f(x) = ax^2 + bx + c \in F[x]$ ($a \ne 0$) as our choice of planar function, the resulting plane is not new; it is just a disguised version of the classical plane. To see this, expand $f(x+m) + h$ and observe that $(x, y) \in \tilde\ell_{m,h}$ iff $y - ax^2 = (2am+b)x + am^2 + bm + c + h$ iff $(x, y - ax^2) \in \ell_{2am+b,\, am^2+bm+c+h}$. (The description of vertical lines does not change under this recoordinatization.) The map $(x, y) \mapsto (x, y - ax^2)$ gives an isomorphism from $A_f$ to the classical plane of order $q$.
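
As an illustration of this construction over a prime field, the sketch below builds the lines of $A_f$ and tests the counting conditions together with the "two points, one line" axiom. The function names are ours, and we rely on the standard fact (an assumption not proved here) that $n^2$ points and $n(n+1)$ lines of size $n$ with any two points on a unique line already form an affine plane.

```python
from itertools import combinations

def plane_from_function(f, p):
    """Lines of A_f over F_p: vertical lines x = k, and the graphs
    y = f(x + m) + h for m, h in F_p."""
    lines = [frozenset((k, y) for y in range(p)) for k in range(p)]
    lines += [frozenset((x, (f((x + m) % p) + h) % p) for x in range(p))
              for m in range(p) for h in range(p)]
    return lines

def is_affine_plane(lines, n):
    """Check: n^2 points, n(n+1) distinct lines, n points per line,
    and every pair of distinct points on exactly one line."""
    points = set().union(*lines)
    if len(points) != n * n or len(set(lines)) != n * (n + 1):
        return False
    if any(len(line) != n for line in lines):
        return False
    return all(sum(1 for line in lines if P in line and Q in line) == 1
               for P, Q in combinations(points, 2))
```

With $f(x) = x^2$ over $\mathbb{F}_5$ the check succeeds; with the non-planar choices $f(x) = x$ or $f(x) = x^3$ it fails.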
A great deal of effort has been expended on looking for new planar functions, in
the search for new nonclassical finite projective planes. Some non-quadratic planar func-
tions [DO] have been known since 1968 (including those constructed in Exercise #1);
however the planes constructed from them belong to a large recognized class of planes
known as translation planes. In 1997, Coulter and Matthews [CM] published a construc-
tion of planar functions, which give rise to planes which are not classical, nor are they
more general translation planes. Their construction has $q = 3^e$ with $e \ge 4$ (in particular,
the order is not prime). The main result of this Section is that for prime order fields,
every planar function is quadratic and so the associated plane is classical. This result was
obtained independently, and almost simultaneously, by three teams of researchers: Rónyai
and Szőnyi [RS], Hiramine [Hi], and Gluck [Gl]. We present here the proof by Gluck
because it is arguably the least technical, and because it beautifully demonstrates the
natural role played by cyclotomic fields in finite geometry; but also because his approach
lends itself naturally to certain generalizations [M3] which we will describe in Section 15.
In the following Theorem 14.2, the equivalence (a)↔(d) appears explicitly in [Gl], [RS]
and [Hi]. The equivalence of these statements with (b) and (c), which is easily inferred
from Gluck [Gl], will be useful in Section 15.

Theorem 14.2. Let $f : F \to F$ where $F = \mathbb{F}_p$. Then the following four conditions are equivalent.
(a) $f$ is a planar function.
(b) For all $m, m', b, b' \in F$ with $m \ne 0$, the function $\tilde f : F \to F$ defined by $\tilde f(x) = f(mx+b) + m'x + b'$ has exponential sum satisfying $|S_{\tilde f}| = \sqrt{p}$.
(c) For all $m \in F$, the function $\tilde f : F \to F$, $\tilde f(x) = f(x) + mx$ has exponential sum satisfying $|S_{\tilde f}| = \sqrt{p}$.
(d) $f$ is represented by a quadratic polynomial in $F[x]$; i.e. there exist $a, b, c \in F$ with $a \ne 0$ such that $f(x) = ax^2 + bx + c$ for all $x \in F$.

Proof. The implication (d)⇒(a) follows from Proposition 14.1; and the implication (b)⇒(c) is trivial. It remains to prove (a)⇒(b) and (c)⇒(d). We start by assuming (a).

Let $f : F \to F$ be a planar function over $F = \mathbb{F}_p$, $p$ an odd prime. Consider the $p \times p$ matrix $M = \bigl(\zeta^{f(x-y)} : x, y \in F\bigr)$. Denoting the conjugate-transpose of $M$ by $M^*$, the $(x, y)$-entry of $MM^*$ is

\[ \sum_{z \in F} \zeta^{f(x-z)}\,\overline{\zeta^{f(y-z)}} = \sum_{z \in F} \zeta^{f(x-z)-f(y-z)} = \sum_{z \in F} \zeta^{(\Delta_{x-y} f)(y-z)} = \begin{cases} 0, & \text{if } x \ne y; \\ p, & \text{if } x = y \end{cases} \]

by the definition of a planar function. This says that $MM^* = pI$ where $I$ is the $p \times p$ identity matrix; in other words, $\frac{1}{\sqrt{p}} M$ is unitary. In particular, every eigenvalue of $M$ has absolute value equal to $\sqrt{p}$. Consider the all-ones vector $\mathbf{1}$ of length $p$. It is easy to see that each entry of $M\mathbf{1}$ equals $S_f$, the exponential sum of $f$; that is, $\mathbf{1}$ is an eigenvector of $M$ with eigenvalue $S_f$. So $|S_f| = \sqrt{p}$.

Now given $m, m', b, b' \in F$ with $m \ne 0$, the function

\[ \tilde f : F \to F, \quad x \mapsto f(mx+b) + m'x + b' \]

is evidently also planar: for all nonzero $d \in F$, we have

\[ \Delta_d \tilde f : F \to F, \quad x \mapsto (\Delta_{md} f)(mx+b) + m'd \]

which is clearly a permutation of $F$ because $\Delta_{md} f$ is a permutation of $F$. Applying the preceding argument to $\tilde f$ in place of $f$, we obtain (b).

Finally, assume (c). By Cavior's Theorem 13.4, the multiset of values of $f$ coincides with the multiset of values of some quadratic polynomial $g : F \to F$. In particular, $f$ assumes no value more than twice. The same is true for every function $\tilde f : F \to F$ of the form $\tilde f(x) = f(x) + mx$ for $m \in F$. This means that in $F^2$, the graph of $f$ intersects any non-vertical line $y = -mx + b$ at most twice. By Segre's Theorem 3.14, $f$ is represented by a quadratic polynomial, so (d) holds; and we have seen that $A_f$ is isomorphic to the classical plane of order $p$ in this case.
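
For $p = 3$ and $p = 5$ the equivalence of conditions (a), (c) and (d) can be confirmed by enumerating all $p^p$ functions. This is a brute-force sketch, not part of the proof; the function names are ours, and condition (c) is tested in floating point with a tolerance.

```python
import cmath
from itertools import product

def is_planar(v, p):
    """Condition (a) for the value table v of f."""
    return all(len({(v[(x + d) % p] - v[x]) % p for x in range(p)}) == p
               for d in range(1, p))

def has_sqrt_p_sums(v, p):
    """Condition (c): |S_{f(x)+mx}|^2 = p for every m in F_p."""
    zp = [cmath.exp(2j * cmath.pi * k / p) for k in range(p)]
    return all(
        abs(abs(sum(zp[(v[x] + m * x) % p] for x in range(p))) ** 2 - p) < 1e-9
        for m in range(p)
    )

def quadratic_tables(p):
    """Value tables of all a*x^2 + b*x + c with a != 0, for condition (d)."""
    return {tuple((a * x * x + b * x + c) % p for x in range(p))
            for a in range(1, p) for b in range(p) for c in range(p)}

def theorem_14_2_holds(p):
    quads = quadratic_tables(p)
    return all(is_planar(v, p) == has_sqrt_p_sums(v, p) == (v in quads)
               for v in product(range(p), repeat=p))
```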

The only known planes of prime order are the classical planes constructed from $\mathbb{F}_p$; and it is tempting to conjecture that planes of prime order must be classical. (But once again, keep in mind The Streetlight Effect.) Theorem 14.2 is the strongest result known in this direction. The planes $A_f$ constructed from planar functions $f$ share a special feature in common with the classical planes: Each of these planes admits an elementary abelian group of automorphisms of order $p^2$ which transitively permutes the points: For each $(r, s) \in F^2$, the translation map $(x, y) \mapsto (x+r, y+s)$ takes $\tilde\ell_{\infty,a}$ to $\tilde\ell_{\infty,a+r}$, and takes $\tilde\ell_{m,b}$ to $\tilde\ell_{m-r,b+s}$ for $m \ne \infty$. Prior to Theorem 14.2, it was already known (using a combination of geometric and group-theoretic arguments, which we omit here) that any affine plane of prime order $p$, whose automorphism group has order divisible by $p^2$, must be of the form $A_f$ for some planar function $f$. Thus Theorem 14.2 yields the important consequence

Corollary 14.3. Any affine plane of prime order $p$ whose automorphism group has order divisible by $p^2$, must be classical.

Exercises 14.
1. Let $E = \mathbb{F}_q$, $q = p^e$, $p$ an odd prime. Fix an automorphism $\sigma \in \operatorname{Aut} E$; thus $\sigma(x) = x^{p^k}$ for some $k \in \{0, 1, 2, \dots, e-1\}$. Show that the function $f : E \to E$, $f(x) = x\sigma(x) = x^{p^k+1}$ is a planar function on $E$ iff $\frac{e}{\gcd(k,e)}$ is odd. (Note that $\sigma \in \operatorname{Aut} E$ has order $\frac{e}{\gcd(k,e)}$.)

2. Consider the quadratic extension $E \supset F$ of fields of order $q^2$ and $q$, where $q$ is odd. Recall that $\sigma : E \to E$, $\sigma(x) = x^q$ is the automorphism of order 2 with fixed field $F$. Define a new binary operation on $E$ by
\[ x * y = \begin{cases} xy, & \text{if } x \text{ is a square (zero or nonzero square);} \\ x\sigma(y), & \text{if } x \text{ is a nonsquare.} \end{cases} \]
(a) Prove that '$*$' is associative and left-distributive, i.e. $(x * y) * z = x * (y * z)$ and $x * (y + z) = x * y + x * z$ for all $x, y, z \in E$.
(b) Prove that the nonzero elements of $E$ form a nonabelian group under the operation '$*$'.
(c) Although '$*$' is not right-distributive, this is compensated for by a weaker property which you should show: if $a, b, c \in E$ with $a \ne b$, then the equation $a * x = b * x + c$ has a unique solution $x \in E$.
(d) Show that the following structure is an affine plane of order $q^2$, where we essentially replace ordinary multiplication by '$*$'. Take points to be ordered pairs $(x, y) \in E^2$. There are two types of lines:
• $q^2$ 'vertical' lines $x = k$, i.e. point sets $\{(k, y) : y \in E\}$ for $k \in E$; and
• $q^4$ 'nonvertical' lines $y = m * x + b$, i.e. point sets $\{(x, m * x + b) : x \in E\}$, where $m, b \in E$.
This construction gives one of the most standard classes of translation planes.

15. Nets

A $k$-net of order $n$ is an incidence system of $n^2$ points and $kn$ subsets of the points called lines, such that
• each point is on $k$ lines, and each line has $n$ points;
• parallelism is an equivalence relation on the set of lines, where two lines $\ell, \ell'$ are parallel (denoted $\ell \parallel \ell'$) if they are either equal or disjoint;
• each parallel class of lines (i.e. equivalence class under parallelism) consists of $n$ lines which partition the point set;
• any two distinct lines meet in either 0 or 1 points.
Clearly $k \le n + 1$ here; and an $(n + 1)$-net of order $n$ is the same thing as an affine plane of order $n$. Here are all the nets of order 3, up to isomorphism:

If one hopes to build an affine plane of order $n$, one might reasonably try to do so incrementally by starting with $n^2$ points and adding one parallel class at a time, hoping to see how far one might go. The first two parallel classes (which one may take to be 'horizontal' and 'vertical' lines) are trivially constructed. For every $n \ge 2$ it is possible to find a third parallel class of lines extending this to a 3-net of order $n$. Now the construction process becomes more delicate. While there exist 3-nets of order 6, none of them are extendible to 4-nets (a fact known already to Euler); and this implies the nonexistence of an affine plane of order 6. Although there exists a 5-net of order 4 (affine plane of order 4), there exist 3-nets of order 4 which are not extendible to any 4-net or 5-net of order 4. Here is an example of a maximal 3-net of order 4, i.e. one which cannot be extended to a 4-net:

And although 4-nets of order 10 have been constructed, it is not known whether or not
there exists a 5-net of order 10 (although no 11-net of order 10 exists, by Lam et al).
Here we describe an algebraic approach to studying finite nets, which (in our view) is
more promising than other approaches that have been tried. To simplify the exposition,
we assume here that n = p is prime. Our goal is to show that every plane of prime order p
is classical (a major open problem).

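To introduce this approach, we first consider the nets of order 3 shown above. Each successive parallel class may be described by a triple of matrices, starting with $A_0, A_1, A_2$ for the first parallel class; $B_0, B_1, B_2$ for the second parallel class, etc., where

\[
A_0 = \begin{bmatrix} 1&1&1\\ 0&0&0\\ 0&0&0 \end{bmatrix}, \quad
B_0 = \begin{bmatrix} 1&0&0\\ 1&0&0\\ 1&0&0 \end{bmatrix}, \quad
C_0 = \begin{bmatrix} 1&0&0\\ 0&0&1\\ 0&1&0 \end{bmatrix}, \quad
D_0 = \begin{bmatrix} 1&0&0\\ 0&1&0\\ 0&0&1 \end{bmatrix},
\]
\[
A_1 = \begin{bmatrix} 0&0&0\\ 1&1&1\\ 0&0&0 \end{bmatrix}, \quad
B_1 = \begin{bmatrix} 0&1&0\\ 0&1&0\\ 0&1&0 \end{bmatrix}, \quad
C_1 = \begin{bmatrix} 0&1&0\\ 1&0&0\\ 0&0&1 \end{bmatrix}, \quad
D_1 = \begin{bmatrix} 0&0&1\\ 1&0&0\\ 0&1&0 \end{bmatrix},
\]
\[
A_2 = \begin{bmatrix} 0&0&0\\ 0&0&0\\ 1&1&1 \end{bmatrix}, \quad
B_2 = \begin{bmatrix} 0&0&1\\ 0&0&1\\ 0&0&1 \end{bmatrix}, \quad
C_2 = \begin{bmatrix} 0&0&1\\ 0&1&0\\ 1&0&0 \end{bmatrix}, \quad
D_2 = \begin{bmatrix} 0&1&0\\ 0&0&1\\ 1&0&0 \end{bmatrix}.
\]

Here the $p^2$ points may be viewed as the pairs $(i, j)$ with $i, j \in F$ where $F = \mathbb{F}_p$; and each matrix has 1's in the positions of a line, with the other entries equal to zero. (The choice of $\mathbb{F}_3$ as an index set here is purely a matter of convenience. Any set of $p = 3$ symbols could be used in its place.) Note that

(15.1) $A_0 + A_1 + A_2 = B_0 + B_1 + B_2 = C_0 + C_1 + C_2 = D_0 + D_1 + D_2 = J$ where $J = J_3$, the $3 \times 3$ matrix of 1's.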

Denote by $\mathcal{C}_k$ the $F$-span of the matrices from the first $k$ parallel classes, which form a $k$-net of order $p$. We are interested in the dimensions of the sequence

(15.2) $0 = \mathcal{C}_0 \le \mathcal{C}_1 \le \mathcal{C}_2 \le \cdots \le \mathcal{C}_{p+1}$.

In the case $p = 3$ shown above, the subspaces in (15.2) have dimensions 0, 3, 5, 6, 6. Now it is the differences $\dim \mathcal{C}_k - \dim \mathcal{C}_{k-1}$ which are of interest. In our example, this is the sequence 3, 2, 1, 0. It is not by coincidence that these values form an arithmetic sequence: for every known plane of prime order $p$, these values always form a sequence $p, p-1, p-2, \dots, 2, 1, 0$, independently of the order in which we list the $p+1$ parallel classes. We pose

(15.3) Conjecture [M1]: For any $k$-net of prime order $p$ and $(k - 1)$-subnet thereof, with $k \ge 1$, the subspaces $\mathcal{C}_{k-1} \le \mathcal{C}_k$ constructed as above have dimensions satisfying
\[ \dim \mathcal{C}_k - \dim \mathcal{C}_{k-1} \ge p - k + 1. \]
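
The dimension sequence for classical planes can be recomputed by Gaussian elimination over $\mathbb{F}_p$. The sketch below is an illustration under our own conventions: each line is stored as a 0/1 vector of length $p^2$, and the parallel classes are listed in the order $x = c$, $y = c$, $x + my = c$ for $m = 1, \dots, p-1$ (for $p = 3$ this matches the matrices $A_i, B_i, C_i, D_i$ above). The function names are hypothetical.

```python
def rank_mod_p(rows, p):
    """Rank over F_p of a list of integer row vectors."""
    rows = [[v % p for v in r] for r in rows]
    rank = 0
    for col in range(len(rows[0])):
        pivot = next((i for i in range(rank, len(rows)) if rows[i][col]), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        inv = pow(rows[rank][col], -1, p)   # modular inverse (Python 3.8+)
        rows[rank] = [(v * inv) % p for v in rows[rank]]
        for i in range(len(rows)):
            if i != rank and rows[i][col]:
                c = rows[i][col]
                rows[i] = [(a - c * b) % p for a, b in zip(rows[i], rows[rank])]
        rank += 1
    return rank

def classical_classes(p):
    """Characteristic vectors of the lines of the classical plane of
    order p, grouped by parallel class."""
    forms = [lambda x, y: x, lambda x, y: y]
    forms += [lambda x, y, m=m: (x + m * y) % p for m in range(1, p)]
    return [[[1 if form(x, y) == c else 0
              for x in range(p) for y in range(p)]
             for c in range(p)]
            for form in forms]

def dimension_sequence(p):
    """dim C_1, dim C_2, ..., dim C_{p+1} for the classical plane."""
    rows, dims = [], []
    for cls in classical_classes(p):
        rows = rows + cls
        dims.append(rank_mod_p(rows, p))
    return dims
```

For $p = 3$ this reproduces the dimensions 3, 5, 6, 6 quoted above, and for $p = 5$ the successive differences are 5, 4, 3, 2, 1, 0.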

The importance of (15.3) is due to

Theorem 15.4 [M1]. (i) Subnets of classical planes of prime order satisfy Conjec-
ture 15.3 with equality.
(ii) If Conjecture 15.3 holds for a given prime p, then all planes of order p are classical.
It is known (see e.g. [M1]) that $\dim \mathcal{C}_{p+1} = \frac12 p(p+1)$ for every affine plane of prime order $p$. Since $p + (p-1) + \cdots + 2 + 1 + 0 = \frac12 p(p+1)$, the conjectured lower bound (15.3) would require equality in the case of subnets of planes of order $p$; so if one's sole interest is in classifying planes of prime order $p$, then (15.3) could be replaced by the conjecture that for subnets of planes of order $p$, $\dim \mathcal{C}_k - \dim \mathcal{C}_{k-1} = p - k + 1$. However, there are many nets of prime order where strict inequality holds in (15.3); and it is conceivable that proving the lower bound (15.3) in the more general setting of nets (not necessarily extendible to planes) may be more natural or easier than proving equality. In Theorem 15.7 below, we prove the first nontrivial case of Conjecture 15.3, the case $k = 3$. But first we rephrase the descriptions both of the nets, and of the dimensions of the spaces $\mathcal{C}_i$.

The $p^2$ points of a $k$-net can be taken to be $k$-tuples $x = (x_1, x_2, \dots, x_k) \in F^k$, $F = \mathbb{F}_p$, where $x_i \in F$ indexes which line of the $i$th parallel class passes through the point $x$. Thus in our example above, $x_1 = 0$, 1 or 2 according as $A_0$, $A_1$ or $A_2$ has entry 1 in the position corresponding to the point $x$ (and similarly for the other coordinates); and the 4-net is seen to be $\{(x, y, x+y, x-y) : x, y \in F\}$ where $F = \mathbb{F}_3$. In general,

Theorem 15.5. Let $F$ be an arbitrary set of $n$ symbols, and let $k \ge 2$.

(i) Suppose $N \subseteq F^k$ is a subset of the $k$-tuples over $F$ of size $|N| = n^2$ such that for all $i \ne j$ in $\{1, 2, \dots, k\}$, the projection $N \to F^2$, $(x_1, x_2, \dots, x_k) \mapsto (x_i, x_j)$ is surjective. (Thus every vector in $N$ is uniquely determined by its $i$th and $j$th coordinates.) Then $N$ is the point set of a $k$-net of order $n$ whose $i$th parallel class ($i \in \{1, 2, \dots, k\}$) consists of the subsets $\{x \in N : x_i = a\}$ for $a \in F$.
(ii) Every $k$-net of prime order $n$ is isomorphic to a net of the form described in (i).

We omit the proof of Theorem 15.5, which is straightforward. The notion of a $k$-net of order $n$ appears in many guises in the combinatorial literature (particularly as a set of $k-2$ mutually orthogonal Latin squares of order $n$, an orthogonal array $OA(k, n)$, a transversal design $TD(k, n)$). See [ACD] for details, noting that our set of $n^2$ vectors of length $k$ above, when transposed, form the columns of an $OA(k, n)$.
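
The criterion in Theorem 15.5(i) is easy to test mechanically. The following sketch (function name ours) takes a collection $N$ of $k$-tuples over an $n$-element symbol set and checks that $|N| = n^2$ and that each pair of coordinates takes $n^2$ distinct values.

```python
from itertools import combinations

def is_net(N, n):
    """Theorem 15.5(i): |N| = n^2 and, for every pair of coordinates
    i < j, the projection x -> (x_i, x_j) hits all n^2 pairs."""
    N = set(N)
    if len(N) != n * n:
        return False
    k = len(next(iter(N)))
    return all(len({(v[i], v[j]) for v in N}) == n * n
               for i, j in combinations(range(k), 2))

# The 4-net of order 3 from the example above.
net4 = [(x, y, (x + y) % 3, (x - y) % 3) for x in range(3) for y in range(3)]
# Over Z/4 the same formula fails: (x+y, x-y) does not determine x mod 4.
bad4 = [(x, y, (x + y) % 4, (x - y) % 4) for x in range(4) for y in range(4)]
# Dropping the last coordinate leaves a valid (cyclic) 3-net of order 4.
cyc3 = [(x, y, (x + y) % 4) for x in range(4) for y in range(4)]
```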
Now consider a $k$-net $N$ of order $n$, $k \ge 3$. We will often write $N = N_k$ to emphasize that it is a $k$-net. We may assume $N_k \subseteq F^k$ has the form described in Theorem 15.5(i). Deleting the $i$th coordinate from all vectors in $N_k$ gives a $(k-1)$-net of the same order $n$, which we call a $(k-1)$-subnet of the original net $N_k$. A $k$-net $N$ has exactly $k$ choices of $(k-1)$-subnet, each of which is formed by omitting one of the $k$ parallel classes of lines in $N$. Note that a 2-net is necessarily $N_2 = F^2$. The classical (or desarguesian) affine planes of prime order $p$, as described in the notation of Theorem 15.5, have the form

{(x, y, x+y, x+2y, . . . , x+(p−1)y) : x, y ∈ F }, F = Fp

up to isomorphism. Moreover, any $k$-net obtained from one of these classical planes by deleting ('puncturing') $p + 1 - k$ of its coordinates, gives a $k$-subnet which we also call classical or desarguesian. Classical 3-nets can always be recoordinatized to have the form $\{(x, y, x+y) : x, y \in F\}$; such 3-nets are also called cyclic.
Arbitrary k-nets of order p cannot be expected to admit an algebraic description as
in the classical case described above, and so Fp can then be replaced by an arbitrary index
set of size p. Yet we will continue to use F = Fp as our chosen index set in the general
case, as a matter of convenience.
Given a $k$-net $N_k$ of order $p$ as above, denote by $V = V_k$ the vector space over $F$ consisting of all $k$-tuples of functions $(f_1, f_2, \dots, f_k)$, $f_i : F \to F$, satisfying

\[ f_1(x_1) + f_2(x_2) + \cdots + f_k(x_k) = 0 \quad \text{for all } (x_1, x_2, \dots, x_k) \in N_k. \]

In other words, if $M$ is the $p^2 \times kp$ incidence matrix of $N_k$ (with rows indexed by points, and columns indexed by lines) then $V_k$ is essentially the (right) null space of $M$ over $F$. By the Fundamental Theorem of Linear Algebra,

\[ \dim \mathcal{C}_k + \dim V_k = kp \]

where $\mathcal{C}_k$ is the column space of $M$ over $F$. (Although $\mathcal{C}_k$ has the same dimension as the row space of $M$, the column space is interpreted more naturally, this being the subspace of $F^{p^2}$ spanned by the characteristic vectors of the lines of the net, as in our original example for $p = 3$.) In terms of the spaces $V_k$, we may reformulate the conjectured inequality (15.3) as $\dim V_k - \dim V_{k-1} \le k - 1$ for $k \ge 1$.
Now consider the constant function $\mathbf{1} : F \to F$, $\mathbf{1}(a) = 1$ for all $a \in F$; and observe that $(a_1\mathbf{1}, a_2\mathbf{1}, \dots, a_k\mathbf{1}) \in V_k$ for all choices of scalars $a_i \in F$ satisfying $a_1 + a_2 + \cdots + a_k = 0$. These particular $k$-tuples of functions form a $(k-1)$-dimensional subspace $V_k^{(0)} \le V_k$, and we obtain a splitting

\[ V_k = V_k^{(0)} \oplus U_k \]

where $U_k$ is the subspace consisting of all $(f_1, f_2, \dots, f_k) \in V_k$ such that $f_1(0) = f_2(0) = \cdots = f_k(0) = 0$. This splitting is obtained by noting that the map $V_k \to V_k^{(0)}$, $(f_1, f_2, \dots, f_k) \mapsto (f_1(0)\mathbf{1}, f_2(0)\mathbf{1}, \dots, f_k(0)\mathbf{1})$ is a projection with $U_k$ as its kernel. This simplifies Conjecture 15.3 further, leading to the equivalent form

(15.6) Conjecture: For any $k$-net of prime order $p$ and $(k-1)$-subnet thereof, $k \ge 2$, the subspaces $U_{k-1} \le U_k$ constructed as above have dimensions satisfying
\[ \dim U_k - \dim U_{k-1} \le k - 2. \]

As an example, for the 3-net of order 3 presented by the matrices $A_i, B_i, C_i$, $i = 0, 1, 2$ as above, the one nontrivial relation between the first three parallel classes is

\[ A_1 - A_2 + B_1 - B_2 - C_1 + C_2 = 0, \]

giving $(\iota, \iota, -\iota) \in U_3$ where $\iota(a) = a$ for all $a \in \mathbb{F}_3$; and in this case $U_3$ is one-dimensional, spanned by $(\iota, \iota, -\iota)$. The first nontrivial case of (15.6) says that for every 3-net of prime order, $\dim U_3 \le 1$. This is verified as follows.
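
For the cyclic 3-net of order 3, the space $U_3$ is small enough to enumerate directly. In this sketch (variable names ours) a function $F \to F$ with $f(0) = 0$ is stored as its value table, and we simply list every triple satisfying the defining relation of $V_3$ on all nine points.

```python
from itertools import product

p = 3
net = [(x, y, (x + y) % p) for x in range(p) for y in range(p)]

def in_U(fs, net, p):
    """fs is a tuple of value tables with fs[i][0] == 0; test whether
    f_1(x_1) + ... + f_k(x_k) = 0 at every point of the net."""
    return all(sum(f[xi] for f, xi in zip(fs, pt)) % p == 0 for pt in net)

# All functions F_3 -> F_3 vanishing at 0, as value tables.
tables = [(0,) + t for t in product(range(p), repeat=p - 1)]
U3 = [fs for fs in product(tables, repeat=3) if in_U(fs, net, p)]
```

The list `U3` has exactly three elements, namely the scalar multiples of $(\iota, \iota, -\iota)$, which corresponds to $\dim U_3 = 1$ over $\mathbb{F}_3$.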

Theorem 15.7 ([M1,M3]). Conjectures (15.3) and (15.6) hold for $k = 3$. In fact for any 3-net of prime order $p$, we have $\dim U_3 \le 1$; and equality holds iff the net is cyclic, i.e. isomorphic to $\{(x, y, x+y) : x, y \in \mathbb{F}_p\}$.

Proof. Let $N_3 \subset F^3$ be a 3-net of order $p$, $F = \mathbb{F}_p$; and let $\zeta = \zeta_p$. Without loss of generality, there exists a nonzero triple $(f_1, f_2, f_3) \in U_3$. The exponential sums $S_{f_i} = \sum_{a \in F} \zeta^{f_i(a)}$ satisfy

\[ S_{f_1} S_{f_2} = \Bigl(\sum_{a \in F} \zeta^{f_1(a)}\Bigr)\Bigl(\sum_{b \in F} \zeta^{f_2(b)}\Bigr) = \sum_{a,b \in F} \zeta^{f_1(a)+f_2(b)} = \sum_{(a,b,c) \in N_3} \zeta^{f_1(a)+f_2(b)} = p\sum_{c \in F} \zeta^{-f_3(c)} = p\,\overline{S_{f_3}} \]

since $f_1(a) + f_2(b) + f_3(c) = 0$ for all $(a, b, c) \in N_3$. Multiplying both sides by $S_{f_3}$, and then using the same argument for the other pairs of subscripts in $\{1, 2, 3\}$, gives

\[ S_{f_1} S_{f_2} S_{f_3} = p|S_{f_1}|^2 = p|S_{f_2}|^2 = p|S_{f_3}|^2. \]

Now if any of the exponential sums $S_{f_i}$ is nonzero, they must all be nonzero and we obtain $|S_{f_1}| = |S_{f_2}| = |S_{f_3}| = p$; but then by Lemma 13.1(i), each of the functions $f_i$ is constant. But since $(f_1, f_2, f_3) \in U_3$, we have $f_1(0) = f_2(0) = f_3(0) = 0$. This forces $f_1 = f_2 = f_3 = 0$, a contradiction.
Thus Sf1 = Sf2 = Sf3 = 0. By Lemma 13.1(ii), each of the functions fi : F → F is a
permutation. Without loss of generality, f1 (x) = f2 (x) = x and f3 (x) = −x for all x ∈ F ;
for if not, then we simply relabel the p lines in each of the three parallel classes such that
this is the case. Now every (a, b, c) ∈ N3 satisfies

c = −f3 (c) = f1 (a) + f2 (b) = a + b;

that is, N3 = {(a, b, a+b) : a, b ∈ F }. It remains only to verify that for this particular
3-net, every (g1 , g2 , g3 ) ∈ U3 is a scalar multiple of (f1 , f2 , f3 ). For this, we may assume
g1 (1) = 0; otherwise replace (g1 , g2 , g3 ) by (g1 , g2 , g3 ) − g1 (1)(f1 , f2 , f3 ). But we now have
g1 (0) = g1 (1) = 0, so g1 is no longer a permutation of F , and the argument above then
forces g1 = g2 = g3 = 0 as required.

At this time we do not have a proof of Conjecture (15.3) or (15.6) for k = 4; but we
have some partial results. Our analysis of 4-nets begins with the following extension of
Theorem 15.7.

Lemma 15.8 ([M3]). Let N4 be a 4-net of prime order p, and let (f1 , f2 , f3 , f4 ) ∈ U4 .
Then either
(i) three or more of f1 , f2 , f3 , f4 are permutations, or
(ii) |Sf1 | = |Sf2 | = |Sf3 | = |Sf4 | > 0.

Proof. As usual, let ζ = ζp. For all (x1, x2, x3, x4) ∈ N4 we have f1(x1) + f2(x2) + f3(x3) +
f4(x4) = 0, so $\zeta^{f_1(x_1)+f_2(x_2)} = \zeta^{-f_3(x_3)-f_4(x_4)}$. Summing over all
(x1, x2, x3, x4) ∈ N4 gives

\[ S_{f_1} S_{f_2} = \overline{S_{f_3}}\;\overline{S_{f_4}}. \]

Multiplying both sides by $\overline{S_{f_2}}$ gives

\[ S_{f_1} |S_{f_2}|^2 = \overline{S_{f_2} S_{f_3} S_{f_4}}. \]

By symmetry, we must have in fact

\[ S_{f_1} |S_{f_2}|^2 = S_{f_1} |S_{f_3}|^2 = S_{f_1} |S_{f_4}|^2 = \overline{S_{f_2} S_{f_3} S_{f_4}} \]

and

\[ S_{f_1} \bigl(|S_{f_2}|^2 - |S_{f_3}|^2\bigr) = S_{f_1} \bigl(|S_{f_2}|^2 - |S_{f_4}|^2\bigr) = 0. \]

Now we may suppose at least one of the exponential sums $S_{f_i}$ is nonzero, otherwise case (i)
holds. So without loss of generality, $S_{f_1} \neq 0$. Then we directly obtain
$|S_{f_2}| = |S_{f_3}| = |S_{f_4}|$. If the latter three exponential sums vanish, we obtain
case (i); otherwise by symmetry we obtain case (ii).
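The three values of |Sf| that drive these lemmas can be checked numerically. The following is a minimal sketch, assuming the prime-field definition Sf = Σ ζ^f(x) summed over x ∈ F = Z/pZ with ζ = exp(2πi/p); the function name `exp_sum` and the sample functions are illustrative choices, not taken from the notes.

```python
import cmath

def exp_sum(f, p):
    """Return S_f = sum of zeta^f(x) over x in Z/pZ, where zeta = exp(2*pi*i/p)."""
    zeta = cmath.exp(2j * cmath.pi / p)
    return sum(zeta ** (f(x) % p) for x in range(p))

p = 7
constant = exp_sum(lambda x: 3, p)             # a constant function
permutation = exp_sum(lambda x: 3 * x + 1, p)  # a permutation of Z/7Z
quadratic = exp_sum(lambda x: x * x + x, p)    # a quadratic polynomial

# Moduli: p for the constant, (numerically) 0 for the permutation,
# and sqrt(p) for the quadratic.
print(abs(constant), abs(permutation), abs(quadratic))
```

Only these three moduli (p, 0 and √p) appear in the case analysis of Lemmas 15.8 and 15.9.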

In the following, we write (0, x, x, x) ∈ U4 as an abbreviation for (0, ι, ι, ι) ∈ U4 where
ι(x) = x for x ∈ F.

Lemma 15.9 ([M3]). Suppose that N4 is a 4-net of prime order p for which there
exist linearly independent 4-tuples (f1 , f2 , f3 , f4 ), (0, x, x, x) ∈ U4 . Then either

(i) |Sf1 | = |Sf2 | = |Sf3 | = |Sf4 | = √p and f2 , f3 , f4 are quadratic polynomials; or
(ii) Sf1 = 0 and at least two of f2 , f3 , f4 are scalar multiples of ι, ι(x) = x.

Proof. Suppose first that $S_{f_1} \neq 0$. For all a ∈ F, (f1, f2, f3, f4) + a(0, x, x, x) ∈ U4; so
Lemma 15.8 gives either

\[ S_{f_2(x)+ax} = S_{f_3(x)+ax} = S_{f_4(x)+ax} = 0 \]

or

\[ |S_{f_2(x)+ax}| = |S_{f_3(x)+ax}| = |S_{f_4(x)+ax}| = |S_{f_1}| > 0. \]

By Theorem 13.5, and using the fact that f2(0) = f3(0) = f4(0) = 0, we obtain either
conclusion (i) or f2 = f3 = f4 = aι for some a ∈ F; but in the latter case, we get
(f1, 0, 0, 0) = (f1, f2, f3, f4) − a(0, ι, ι, ι) ∈ U4, forcing f1 = 0, a contradiction.
Hence we may assume that $S_{f_1} = 0$, so f1 is a permutation. Without loss of generality
f1 = ι (otherwise relabel lines in the first parallel class so that this is the case). By
Lemma 15.8, the three sets $A_{f_2}, A_{f_3}, A_{f_4}$ (see Lemma 13.6) are mutually disjoint. Without
loss of generality, $|A_{f_2}| \le |A_{f_3}| \le |A_{f_4}|$; otherwise permute the last three parallel classes
such that this inequality holds. Thus $|A_{f_2}| \le |A_{f_3}| \le \tfrac13 p \le \tfrac12 (p-1)$. By Lemma 13.6
and the condition f2(0) = f3(0) = 0, we have f2 = aι and f3 = bι for some a, b ∈ F, so
conclusion (ii) holds.

Recall that a 4-net N4 has four 3-subnets, each formed by deleting one of the four
parallel classes of lines from N4 (or equivalently, by puncturing one of the four coordinates).

Theorem 15.10 ([M3]). Let N4 be a 4-net of prime order p. Then the number of
its cyclic 3-subnets is always 0, 1, 3 or 4, but never exactly 2.

Proof. We must show that if N4 has at least two cyclic 3-subnets, then it has a third.
Without loss of generality, parallel classes 1,2,3 of N4 form a cyclic 3-subnet; and so do par-
allel classes 2,3,4. After relabelling lines in each parallel class, we have (f1 , f2 , f3 , 0), (0, x,
x, x) ∈ U4 where f1 , f2 , f3 are permutations of F . By Lemma 15.9, we may suppose that
f2 (x) = ax for some a ∈ F . Now

(f1 (x), 0, f3 (x)−ax, −ax) = (f1 , f2 , f3 , 0) − a(0, x, x, x) ∈ U4

so that N4 has a third cyclic 3-subnet on parallel classes 1,3,4.

Remark : Theorem 15.10 is best possible in the sense that there exist 4-nets of prime order
for which the number of cyclic 3-subnets is 0, 1, 3 or 4.
Recall that a classical 4-net of order p is one of the form {(x, y, x+y, x+cy) : x, y ∈ F }
for some c ∈ F with c ≠ 0, 1. For choices of p > 5, there are generally many nonisomorphic
4-nets of order p; different choices of c sometimes yield isomorphic 4-nets, but usually not.
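The role of the restriction on c can be seen by direct enumeration. The sketch below assumes the coordinate description of a k-net used in this section (a set of p² k-tuples in which any two coordinates determine the point); the helper names `classical_4net` and `is_net` are illustrative.

```python
from itertools import combinations

def classical_4net(p, c):
    """Points (x, y, x+y, x+c*y) with x, y in Z/pZ (p prime)."""
    return [(x, y, (x + y) % p, (x + c * y) % p)
            for x in range(p) for y in range(p)]

def is_net(points, p):
    """Check that every pair of coordinates takes each value of F x F exactly once."""
    k = len(points[0])
    for i, j in combinations(range(k), 2):
        if len({(pt[i], pt[j]) for pt in points}) != p * p:
            return False
    return len(points) == p * p

print(is_net(classical_4net(5, 2), 5))  # c = 2 is an admissible choice
print(is_net(classical_4net(5, 1), 5))  # c = 1: coordinates 3 and 4 coincide
print(is_net(classical_4net(5, 0), 5))  # c = 0: coordinates 1 and 4 coincide
```

Only the first call reports a net; the values c = 0 and c = 1 each make two parallel classes collapse into one.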

Theorem 15.11 ([M3]). Let N4 be a 4-net of prime order p. Then N4 is classical
(i.e. desarguesian) iff all four of its 3-subnets are cyclic.

Proof. As in the proof of Theorem 15.10, we may suppose that

(f1 , f2 , f3 , 0), (0, x, x, x), (f1 , 0, f3 −ax, −ax) ∈ U4



for some fixed (and evidently nonzero) a ∈ F ; and each of the nine nonzero coordinates
appearing in these 4-tuples is a permutation of F . Without loss of generality (again, by
permuting the labels on the lines of the first parallel class of lines) we have f1 (x) = x. By
hypothesis, there also exist permutations g1 , g2 , g4 of F such that

(g1 , g2 , 0, g4 ) ∈ U4 .

By Lemma 15.9, either g2 (x) = bx or g4 (x) = bx for some b ∈ F . We may assume that
g2 (x) = bx; otherwise interchange the second and fourth parallel classes (replacing also a
by −a, and f3 (x) by f3 (x)−ax). Now

(g1 (x), 0, −bx, g4 (x)−bx) = (g1 , g2 , 0, g4 ) − b(0, x, x, x) ∈ U4

so by Theorem 15.7, this is a scalar multiple of (x, 0, f3 (x)−ax, −ax). Without loss of
generality (after applying a suitable scalar multiple),

(g1 (x), 0, −bx, g4 (x)−bx) = (x, 0, f3 (x)−ax, −ax).

This forces
N4 = {(bx+ay, −x−y, x, y) : x, y ∈ F }.

Theorem 15.12 ([M3]). Let N4 be a 4-net of prime order p, and suppose that
N4 has a cyclic 3-subnet N3 . Then Conjectures 15.3 and 15.6 hold for N4 . Indeed,
dim U4 ≤ 3; and equality holds iff N4 is isomorphic to a 4-subnet of a classical plane
of order p.

Proof. We may suppose that dim U4 ≥ 3 and that

(f1 , f2 , f3 , f4 ), (g1 , g2 , g3 , g4 ), (0, x, x, x) ∈ U4

are linearly independent. By Theorem 15.7, the functions f1 and g1 are nonzero. More
than this, f1 and g1 are linearly independent functions F → F ; for if f1 = ag1 for some
a ∈ F , then
(f1 , f2 , f3 , f4 ) − a(g1 , g2 , g3 , g4 ) = b(0, x, x, x)
for some b ∈ F , a contradiction.

By Lemma 15.9 we have $|S_{f_1}| \in \{0, \sqrt{p}\}$. More generally, for all a, b ∈ F the function
f = af1 + bg1 satisfies $|S_f| \in \{0, \sqrt{p}, p\}$; so by Theorem 13.11,
$f_i(x) = a_i^2 \sigma(x)^2 + b_i \sigma(x)$
for some ai, bi ∈ F and some permutation σ : F → F. We may assume σ(x) = x, after
relabelling lines in the first parallel class; and f1(x) = x, g1(x) = x², after a change of
basis for U4. By Lemma 15.9, we may assume that

(x, a2 x, a3 x, f4(x)), (x², g2(x), g3(x), g4(x)), (0, x, x, x) ∈ U4

where a2, a3 ∈ F and g2, g3, g4 are quadratic. In particular, we have nonzero tuples

(x, 0, (a3 − a2)x, f4(x) − a2 x), (x, (a2 − a3)x, 0, f4(x) − a3 x) ∈ U4

and so the 3-subnet formed by parallel classes 1,3,4 is cyclic; likewise the 3-subnet formed
by parallel classes 1,2,4. Since

(x², g2(x), g3(x), g4(x)) + (x, a2 x, a3 x, f4(x)) ∈ U4,

f4 + g4 is quadratic by Lemma 15.9; and since g4 is itself quadratic, this forces f4 to be a
polynomial of degree ≤ 2. This means that f4(x) = a g4(x) + bx for some a, b ∈ F; and so

(ax² − x, a g2(x) + (b − a2)x, a g3(x) + (b − a3)x, 0) ∈ U4.

This means that the 3-subnet formed by parallel classes 1,2,3 is cyclic (and a = 0). The
result follows by Theorem 15.11.

Exercises 15.
1. We have illustrated a cyclic 3-net of order 4. Prove that it is maximal; i.e. it is not a subnet of
any 4-net of order 4.

2. Let N be a k-net of prime order p, 2 ≤ k ≤ p, in the standard form given by Theorem 15.5(i).
Consider a k-tuple of functions (f1, f2, . . . , fk), fi : F → F, such that f1(x1) + f2(x2) + · · · +
fk(xk) = 0 for all (x1, x2, . . . , xk) ∈ N; thus (f1, f2, . . . , fk) ∈ Vk as we have defined the space Vk.
Denote $\Sigma_i = \sum_{a \in F} f_i(a)$ for i = 1, 2, . . . , k.
(a) Fix a ∈ F . By considering all p points (x1 , x2 , . . . , xk ) ∈ N with last coordinate xk = a,
show that Σ1 + Σ2 + · · · + Σk−1 = 0.
(b) By varying the choice of coordinate, obtain relations similar to that in (a) showing that any
k − 1 of Σ1 , Σ2 , . . . , Σk have sum equal to zero.
(c) Show that Σ1 = Σ2 = · · · = Σk = 0.
(d) Let ε = 1 − ζ where ζ = ζp . Show that for all i = 1, 2, . . . , k, the exponential sum Sfi lies in
the ideal (ε) ⊆ Z[ζ].

16. Mutually Unbiased Bases


A complex Hadamard matrix of order n is an n×n matrix H with complex entries of modulus 1 such
that HH∗ = nIn where H∗ is the conjugate transpose of H. Every ordinary Hadamard
matrix is a complex Hadamard matrix, but not conversely. Unlike the situation for ordinary
Hadamard matrices, complex Hadamard matrices exist for every positive integer n:

Theorem 16.1. Let G be an abelian group of order n. Consider the n × n matrix
H with rows indexed by characters χ ∈ Ĝ and columns indexed by group elements
g ∈ G; and having (χ, g)-entry χ(g). Then H is a complex Hadamard matrix of
order n.

Proof. Orthogonality of the rows of H (with respect to the standard inner product on
Cn ) follows from Theorem 6.3(a).

The examples constructed in Theorem 16.1 are the character tables of the finite
abelian groups. For larger values of n, there are typically many other examples than these.

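Example 16.2: Some smaller complex Hadamard matrices.

Order 2:
\[ H_2 = \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix} \]

Order 3:
\[ H_3 = \begin{pmatrix} 1 & 1 & 1 \\ 1 & \omega & \omega^2 \\ 1 & \omega^2 & \omega \end{pmatrix} \]

Order 4:
\[ H_4 = \begin{pmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & \alpha & -\alpha \\ 1 & -1 & -\alpha & \alpha \end{pmatrix} \]

Order 6:
\[ H_6 = \begin{pmatrix}
1 & 1 & 1 & 1 & 1 & 1 \\
1 & \zeta & \zeta^2 & \zeta^3 & \zeta^4 & \zeta^5 \\
1 & \zeta^2 & \zeta^4 & 1 & \zeta^2 & \zeta^4 \\
1 & \zeta^3 & 1 & \zeta^3 & 1 & \zeta^3 \\
1 & \zeta^4 & \zeta^2 & 1 & \zeta^4 & \zeta^2 \\
1 & \zeta^5 & \zeta^4 & \zeta^3 & \zeta^2 & \zeta
\end{pmatrix}; \qquad
H_6' = \begin{pmatrix}
1 & 1 & 1 & 1 & 1 & 1 \\
1 & 1 & \omega & \omega^2 & \omega^2 & \omega \\
1 & \omega & 1 & \omega & \omega^2 & \omega^2 \\
1 & \omega^2 & \omega & 1 & \omega & \omega^2 \\
1 & \omega^2 & \omega^2 & \omega & 1 & \omega \\
1 & \omega & \omega^2 & \omega^2 & \omega & 1
\end{pmatrix} \]
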
Here we denote ζ = ζ6, ω = ζ3 = ζ²; and α is an arbitrary complex number satisfying |α| = 1.
For every positive integer n, we may take ζ = ζn and H = [ζ^{xy} : x, y ∈ Z/nZ]; this gives the
character table of the cyclic group of order n. The examples H2 , H3 , H6 above all arise in this
way. When n is not squarefree (i.e. not a product of distinct primes), then there exist noncyclic
groups of order n, hence additional character tables arising from Theorem 16.1. For n = 4 there
are two groups: the cyclic group of order 4 and the Klein 4-group. The choices α = 1 or i, in the
general form H4 above, give complex Hadamard matrices equivalent to the character tables of these
two groups. ‘Equivalence’ here is under row and column permutations, and scaling individual rows
and columns by complex numbers of modulus 1; these operations clearly take complex Hadamard
matrices to (equivalent) complex Hadamard matrices. Every complex Hadamard matrix of order 2
or 3 is equivalent to H2 or H3 respectively; but for order 4, there are uncountably many equivalence
classes of complex Hadamard matrices, due to the continuum of choices for α ∈ C, |α| = 1. (Note
that α is not required to be a root of unity, or even algebraic.) Classifying complex Hadamard
matrices of order n up to equivalence is a very difficult computational problem; for example it is only
rather recently that the case n = 5 was settled [Hp], with the result that all are equivalent to the
example arising from the cyclic group of order 5. For larger n, there are typically several isolated
equivalence classes of complex Hadamard matrices, and several non-isolated classes similar to H4 .
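The defining condition is easy to test numerically. The following sketch checks it for the cyclic-group character tables [ζ^{xy}] and for the one-parameter family H4, using a value of α that has modulus 1 but is not a root of unity. The helper names are illustrative, and the check uses a floating-point tolerance rather than exact arithmetic.

```python
import cmath

def is_complex_hadamard(H, tol=1e-9):
    """Check that all entries have modulus 1 and that H H* = n I."""
    n = len(H)
    if any(abs(abs(z) - 1) > tol for row in H for z in row):
        return False
    for i in range(n):
        for j in range(n):
            inner = sum(H[i][k] * H[j][k].conjugate() for k in range(n))
            if abs(inner - (n if i == j else 0)) > tol:
                return False
    return True

def fourier_matrix(n):
    """Character table of the cyclic group Z/nZ: entries zeta_n^(x*y)."""
    zeta = cmath.exp(2j * cmath.pi / n)
    return [[zeta ** ((x * y) % n) for y in range(n)] for x in range(n)]

def h4(alpha):
    """The family H4 of Example 16.2; Hadamard exactly when |alpha| = 1."""
    return [[1, 1, 1, 1],
            [1, 1, -1, -1],
            [1, -1, alpha, -alpha],
            [1, -1, -alpha, alpha]]

alpha = cmath.exp(1j)  # modulus 1, not a root of unity
print(all(is_complex_hadamard(fourier_matrix(n)) for n in range(1, 9)))
print(is_complex_hadamard(h4(alpha)))
print(is_complex_hadamard(h4(2.0)))  # |alpha| != 1, so the test fails
```

The last call fails only because its entries are not of modulus 1; the rows of h4(α) remain pairwise orthogonal whenever α is real or has modulus 1.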

We will consider C^n as the set of row vectors of length n over C; thus for u, v ∈ C^n,
the standard inner product of u and v may be written as uv∗ ∈ C. A vector u ∈ C^n is
flat if all its entries have modulus 1/√n. Similarly, an n × n matrix A is flat if all its entries
have modulus 1/√n. Note that for any n × n complex matrix A, the Gram matrix of the
rows of A is the matrix AA∗; its (i, j)-entry is the inner product of rows i and j of A.

Theorem 16.3. Let H be an n × n complex matrix. The following three conditions
are equivalent.
(i) H is complex Hadamard.
(ii) (1/√n)H is a flat unitary matrix.
(iii) The rows of (1/√n)H are flat vectors forming an orthonormal basis of C^n.

We omit the proof of Theorem 16.3, which is straightforward. Note that the orthonormal
condition of (iii) is with respect to the standard complex inner product: it says that the
rows u1, u2, . . . , un of (1/√n)H satisfy

\[ u_i u_j^* = \delta_{i,j} = \begin{cases} 1, & \text{if } i = j; \\ 0, & \text{if } i \neq j. \end{cases} \]

Let B = {u1, u2, . . . , un}, B′ = {v1, v2, . . . , vn} be two orthonormal bases of C^n. We
say B and B′ are unbiased if |ui vj∗| = 1/√n for all i, j ∈ {1, 2, . . . , n}; equivalently, the
matrix of inner products [ui vj∗ : i, j ∈ {1, 2, . . . , n}] is flat.

The reason the value 1/√n arises throughout is that it is the only feasible value
for |ui vj∗|, assuming this value is constant. To see this, suppose that c is a positive real
constant such that |ui vj∗| = c for all i, j. We may expand ui with respect to the second
basis as
\[ u_i = c_{i,1} v_1 + c_{i,2} v_2 + \cdots + c_{i,n} v_n \]
where $c_{i,j} = u_i v_j^*$ and $|c_{i,j}| = c$. Again using orthonormality, we have

\[ 1 = \|u_i\|^2 = |c_{i,1}|^2 + |c_{i,2}|^2 + \cdots + |c_{i,n}|^2 = nc^2, \]

so c = 1/√n. The unbiased property for two orthonormal bases is a symmetric (but neither
reflexive nor transitive) relation. For a given pair of orthonormal bases, it says that all
the vectors of one basis have a fixed ‘angle’ (actually, inner product) with respect to the
vectors in the other basis.
Every orthonormal basis is represented by a unitary matrix B having the vectors of
the basis as its rows. (While the columns of B are also orthonormal, it is only the rows that we
consider here.) So it is reasonable to say that two unitary matrices B1, B2 are unbiased
if their rows form an unbiased pair of bases; equivalently, the matrix B1 B2∗ is flat. Since
B1 B2∗ is also unitary, the latter condition is also equivalent to the condition that √n B1 B2∗
is complex Hadamard.
Turning this around, every complex Hadamard matrix H of order n gives rise to an
unbiased pair of unitary matrices In, (1/√n)H and an unbiased pair of orthonormal bases (the
standard basis, forming the rows of In; and the rows of (1/√n)H).
Now consider a set of k orthonormal bases of C^n, say B1, B2, . . . , Bk. These bases are
mutually unbiased (of order n) if Bi and Bj are unbiased for all i ≠ j in {1, 2, . . . , k}.
Equivalently, a list of unitary n × n matrices B1, B2, . . . , Bk is mutually unbiased if
√n Bi Bj∗ is complex Hadamard for all i ≠ j in {1, 2, . . . , k}.

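Example 16.4: Three mutually unbiased bases in C^2. The unitary matrices

\[ \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}, \qquad
\frac{1}{\sqrt2}\begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}, \qquad
\frac{1}{\sqrt2}\begin{pmatrix} 1 & i \\ 1 & -i \end{pmatrix} \]

are mutually unbiased where i = √−1. Three is the maximum possible number of mutually unbiased
bases of order 2.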
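A short numerical check of this example follows. It is a sketch under one stated assumption: the third matrix is read as having rows (1, i) and (1, −i), up to the factor 1/√2. The helper names `gram`, `is_unitary` and `are_unbiased` are illustrative.

```python
import math
from itertools import combinations

def gram(B1, B2):
    """Matrix of inner products u_i v_j* between the rows of B1 and the rows of B2."""
    n = len(B1)
    return [[sum(B1[i][k] * B2[j][k].conjugate() for k in range(n))
             for j in range(n)] for i in range(n)]

def is_unitary(B, tol=1e-9):
    n = len(B)
    G = gram(B, B)
    return all(abs(G[i][j] - (1 if i == j else 0)) < tol
               for i in range(n) for j in range(n))

def are_unbiased(B1, B2, tol=1e-9):
    """All inner products between rows of B1 and rows of B2 have modulus 1/sqrt(n)."""
    n = len(B1)
    return all(abs(abs(z) - 1 / math.sqrt(n)) < tol
               for row in gram(B1, B2) for z in row)

s = 1 / math.sqrt(2)
bases = [
    [[1, 0], [0, 1]],
    [[s, s], [s, -s]],
    [[s, s * 1j], [s, -s * 1j]],  # assumed reading of the third matrix
]

print(all(is_unitary(B) for B in bases))
print(all(are_unbiased(B1, B2) for B1, B2 in combinations(bases, 2)))
```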

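Example 16.5: Four mutually unbiased bases in C^3. The unitary matrices

\[ B_\infty = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}, \quad
B_0 = \frac{1}{\sqrt3}\begin{pmatrix} 1 & 1 & 1 \\ 1 & \omega & \omega^2 \\ 1 & \omega^2 & \omega \end{pmatrix}, \quad
B_1 = \frac{1}{\sqrt3}\begin{pmatrix} 1 & \omega & \omega \\ 1 & \omega^2 & 1 \\ 1 & 1 & \omega^2 \end{pmatrix}, \quad
B_2 = \frac{1}{\sqrt3}\begin{pmatrix} 1 & \omega^2 & \omega^2 \\ 1 & 1 & \omega \\ 1 & \omega & 1 \end{pmatrix} \]

are mutually unbiased where ω = ζ3. Four is the maximum possible number of mutually unbiased
bases of order 3.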

Expanding on comments (in Example 16.2) about equivalence of complex Hadamard
matrices, let us clarify what it means for two sets of MUBs (mutually unbiased bases) to
be equivalent. Two sets of MUBs are equivalent if one can be obtained from the other
by a combination of
• permuting the bases, or permuting the vectors within each basis;
• scaling the individual basis vectors by complex numbers of modulus 1, thus preserving
the orthonormal property of each basis; and
• applying a unitary transformation to Cn (which will simultaneously transform all of
the bases to new orthonormal bases which will still be unbiased).
Restated in terms of unitary matrices, this says that if {B1 , B2 , . . . , Bk } is a set of
k mutually unbiased unitary matrices of order n, then an equivalent set is {M1 B1 U,
M2 B2 U, . . . , Mk Bk U } where U is an arbitrary unitary n × n matrix; and M1 , M2 , . . . , Mk
are arbitrary unitary monomial n × n matrices. This means that each Mi has a single
nonzero entry in each row and each column; and these nonzero entries are complex numbers
of modulus 1. Note that since we refer to sets of matrices, their order is not important;
this takes care of equivalences due to permutations of the bases.
Quantum logical circuits make extensive use of complex Hadamard matrices as gates;
and sets of mutually unbiased bases have applications in quantum information theory, for
example in protocols for quantum cryptographic key exchange. While the details of these
applications are quite worthy of investigation, we must skip them for lack of available time.
We naturally ask: for each n, how large a set of mutually unbiased bases can be
found? How are they constructed in general? These are difficult questions, but some
partial answers are known:

Theorem 16.6. Suppose there exists a set of k mutually unbiased bases in C^n.
Then k ≤ n + 1.

Proof. Denote by Vn the real vector space of all n × n Hermitian matrices, i.e. Vn is the
set of all A ∈ C^{n×n} such that A∗ = A. Note that dim Vn = n²; and the standard inner
product on Vn is the real inner product defined by [A, B] = tr(AB) for A, B ∈ Vn. (Note
that since A, B ∈ Vn,
$\overline{\mathrm{tr}(AB)} = \mathrm{tr}(\overline{A}\,\overline{B}) = \mathrm{tr}(A^T B^T) = \mathrm{tr}((BA)^T) = \mathrm{tr}(BA) = \mathrm{tr}(AB)$;
so this form is real-valued. Also $\mathrm{tr}(AA) = \mathrm{tr}(AA^*) = \sum_{i,j} |a_{i,j}|^2$ where
$A = [a_{i,j}]$, so the form is positive definite.)

Suppose B1, B2, . . . , Bk are mutually unbiased bases of order n. Write B1 = {u1, u2,
. . . , un}, B2 = {u_{n+1}, u_{n+2}, . . . , u_{2n}}, . . . , Bk = {u_{(k−1)n+1}, u_{(k−1)n+2}, . . . , u_{kn}} and
consider the matrices $A_i = u_i^* u_i \in V_n$ for i = 1, 2, . . . , kn. We compute [Ai, Aj] in each of
three essential cases, noting that

\[ [A_i, A_j] = \mathrm{tr}(A_i A_j) = \mathrm{tr}(u_i^* u_i u_j^* u_j) = \mathrm{tr}(u_i u_j^* u_j u_i^*)
= (u_i u_j^*)\,\overline{(u_i u_j^*)} = |u_i u_j^*|^2. \]

For all i, $[A_i, A_i] = |u_i u_i^*|^2 = 1$. If i ≠ j but ui and uj belong to the same orthonormal
basis Br, $[A_i, A_j] = |u_i u_j^*|^2 = 0$. Finally, if ui ∈ Br but uj ∈ Bs with r ≠ s, we have
$[A_i, A_j] = |u_i u_j^*|^2 = \frac1n$. Thus the Gram matrix of A1, A2, . . . , Akn is

\[ M = \begin{pmatrix}
I_n & \frac1n J_n & \cdots & \frac1n J_n \\
\frac1n J_n & I_n & \cdots & \frac1n J_n \\
\vdots & \vdots & \ddots & \vdots \\
\frac1n J_n & \frac1n J_n & \cdots & I_n
\end{pmatrix}. \]

We exhibit the eigenspaces of M acting on C^{kn}, with each eigenvector partitioned as
[w1, w2, . . . , wk] ∈ C^{kn} where wi ∈ C^n:
• [1, 1, . . . , 1] is an eigenvector with eigenvalue k, where 1 ∈ C^n is the vector of 1's;
• there is a (k−1)-dimensional eigenspace for eigenvalue 0, consisting of all [a1 1, a2 1, . . . ,
ak 1] ∈ C^{kn} where a1, a2, . . . , ak ∈ C satisfy a1 + a2 + · · · + ak = 0; and
• there is a (kn−k)-dimensional eigenspace for eigenvalue 1 consisting of all [w1, w2, . . . ,
wk] ∈ C^{kn} where the vectors wi ∈ C^n satisfy 1wi∗ = 0.
Since this gives a decomposition of C^{kn} as a full set of eigenspaces for M, M has rank
equal to kn − k + 1. But this rank cannot exceed n², since M is the Gram matrix of a
set of vectors in the n²-dimensional vector space Vn. From kn − k + 1 ≤ n² we obtain
(n − 1)k ≤ n² − 1 and k ≤ n + 1.
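The rank computation at the heart of this proof can be confirmed in exact arithmetic. The sketch below builds the block matrix M for given n and k (as an abstract matrix, whether or not k mutually unbiased bases exist) and computes its rank over the rationals; the function names are illustrative.

```python
from fractions import Fraction

def gram_matrix(n, k):
    """Block matrix with I_n on the diagonal and (1/n) J_n in every off-diagonal block."""
    size = n * k
    M = [[Fraction(0)] * size for _ in range(size)]
    for i in range(size):
        for j in range(size):
            if i // n != j // n:
                M[i][j] = Fraction(1, n)   # different bases
            elif i == j:
                M[i][j] = Fraction(1)      # same vector
    return M

def rank(M):
    """Rank by exact Gaussian elimination over the rationals."""
    A = [row[:] for row in M]
    r = 0
    for c in range(len(A[0])):
        pivot = next((i for i in range(r, len(A)) if A[i][c] != 0), None)
        if pivot is None:
            continue
        A[r], A[pivot] = A[pivot], A[r]
        for i in range(r + 1, len(A)):
            factor = A[i][c] / A[r][c]
            if factor != 0:
                A[i] = [a - factor * b for a, b in zip(A[i], A[r])]
        r += 1
    return r

for n, k in [(2, 3), (3, 4), (4, 5), (2, 4)]:
    # columns: n, k, rank of M, the predicted value kn - k + 1, and the bound n^2
    print(n, k, rank(gram_matrix(n, k)), k * n - k + 1, n * n)
```

In the last case (n = 2, k = 4) the rank is 5, which exceeds n² = 4; this is exactly the obstruction that rules out n + 2 mutually unbiased bases.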

A complete set of MUBs (mutually unbiased bases) of order n is a set of
n + 1 MUBs of order n, thus attaining the upper bound of Theorem 16.6. Complete sets
of MUBs are known only for prime power values of n; existence for non-prime-power
values of n is an open question. Thus the situation is very much like that for
k-nets of order n, where again k ≤ n + 1 and the only known cases where equality holds
(giving rise to affine and projective planes of order n) are for prime power values of n.
The extent of the relationship between nets and MUBs remains rather mysterious at this
time—evidently there are connections; but there does not seem to be any theorem waiting
to be discovered, to the effect that if one or the other (affine plane of order n, or complete
set of MUBs of order n) exists, so does the other. (Some of the more naive researchers
have been drawn down that rabbit hole.) What can be said with assurance, however, is
that the most classical constructions of finite affine planes and complete sets of MUBs
have several common features. Both use finite fields (which necessarily have prime power
order). We prove the following only for q odd; the analogue for q even uses Galois rings

of characteristic 4, which we omit. The infinite family of classical examples to which we


refer here, generalizes Examples 16.4 and 16.5.

Theorem 16.7. Let q = p^e, p an odd prime. Then there exists a complete set of
MUBs of order q.

Proof. Let B∞ = Iq. For each r ∈ F = Fq, define the q × q matrix
$B_r = \frac{1}{\sqrt q}\bigl[\zeta^{\mathrm{Tr}(ry^2+xy)} : x, y \in F\bigr]$
where Tr = Tr_{F/K}, K = Fp, ζ = ζp. For all r, s in F, the (x, y)-entry of $B_r B_s^*$ is
$\frac1q S_f$ where f(z) = (r − s)z² + (x − y)z. When r ≠ s, f(z) is a quadratic polynomial; so by
Theorem 13.3, |Sf| = √q and the matrix $B_r B_s^*$ is flat. When r = s, f(z) = (x − y)z and
Sf = 0 for x ≠ y; Sf = q for x = y so $B_r B_r^* = I_q$; thus Br is unitary. Of course, B∞ = Iq
is unitary. Finally, for each r ∈ F, $B_r B_\infty^* = B_r$ is clearly flat.
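The construction can be tried out directly in the prime case q = p, where the trace map is the identity. The following sketch builds Ip together with the matrices Br and tests unitarity and mutual unbiasedness numerically. It covers only e = 1; handling q = p^e would need finite field arithmetic that is not attempted here. All function names are illustrative.

```python
import cmath
import math
from itertools import combinations

def standard_mubs(p):
    """I_p together with B_r = p^(-1/2) [zeta^(r*y^2 + x*y)] for r in Z/pZ."""
    zeta = cmath.exp(2j * cmath.pi / p)
    scale = 1 / math.sqrt(p)
    bases = [[[1.0 if x == y else 0.0 for y in range(p)] for x in range(p)]]
    for r in range(p):
        bases.append([[scale * zeta ** ((r * y * y + x * y) % p) for y in range(p)]
                      for x in range(p)])
    return bases

def gram(B1, B2):
    """Matrix of inner products between the rows of B1 and the rows of B2."""
    n = len(B1)
    return [[sum(B1[i][k] * B2[j][k].conjugate() for k in range(n))
             for j in range(n)] for i in range(n)]

def is_mub_set(bases, tol=1e-9):
    n = len(bases[0])
    for B in bases:  # each matrix must be unitary
        G = gram(B, B)
        if any(abs(G[i][j] - (1 if i == j else 0)) > tol
               for i in range(n) for j in range(n)):
            return False
    for B1, B2 in combinations(bases, 2):  # each pair must be unbiased
        if any(abs(abs(z) - 1 / math.sqrt(n)) > tol
               for row in gram(B1, B2) for z in row):
            return False
    return True

for p in (3, 5, 7):
    print(p, len(standard_mubs(p)), is_mub_set(standard_mubs(p)))
print(2, is_mub_set(standard_mubs(2)))  # the formula breaks down for p = 2
```

For p = 2 the same formula gives B0 and B1 with the same rows in a different order, which illustrates why the theorem is stated only for odd p.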

Haagerup [Hp] has shown that every complex Hadamard matrix of order 5 is equivalent to the
construction of Theorem 16.1. We [MM] have extended this result to show that every set
of mutually unbiased bases of order 5 is contained (up to equivalence) in the complete set
constructed in Theorem 16.7. This result uses our Theorems 13.5 and 13.12, thus lending
credence to the belief in a connection between nets and MUBs. However our result relies
on Haagerup’s uniqueness result [Hp] for the complex Hadamard matrix of order 5. The
basic argument works for 2, 3 and 5 where there is a single complex Hadamard matrix up
to equivalence, but not for other orders.

Theorem 16.8 ([MM]). Every set of k mutually unbiased bases of order 5 is con-
tained (up to equivalence) in the complete set constructed in Theorem 16.7.

Proof. Throughout our proof we denote
• F = F5 and ζ = ζ5;
• U5 = U5(C), the group of 5 × 5 unitary matrices over C;
• M5 is the group of 5 × 5 unitary monomial matrices. This is the set of matrices of the
form $[\lambda_x \delta_{x,\sigma(y)} : x, y \in F]$ where λx ∈ C, |λx| = 1, and σ is one of the 120
permutations of F. Every M ∈ M5 factors uniquely as M = PD where P is a 5 × 5
permutation matrix and D is a 5 × 5 diagonal matrix with complex entries having
modulus 1 on its main diagonal;
• $B_r = \frac{1}{\sqrt5}\bigl[\zeta^{ry^2+xy} : x, y \in F\bigr]$ for r ∈ F. These matrices, together with I5, form
the standard complete set of MUB's from Theorem 16.7. We also denote
$B = B_0 = \frac{1}{\sqrt5}\bigl[\zeta^{xy} : x, y \in F\bigr]$ so that
$B_r = B\bigl[\zeta^{ry^2}\delta_{x,y} : x, y \in F\bigr]$ for all r ∈ F.
Haagerup [Hp, Theorem 2.2] has classified the complex Hadamard matrices of order 5:

(16.9) Every complex Hadamard matrix of order 5 has the form MBM′ for some
M, M′ ∈ M5.

Let us call a complex Hadamard matrix normalized if its first row and column consist
of 1’s. Every complex Hadamard matrix is equivalent to one which is normalized; simply
scale each row and column by an appropriate complex number of modulus 1 to obtain such
a normalized representative of its equivalence class. Or, we may choose to first permute
rows and columns before scaling, thereby obtaining a possibly different normalized matrix
in the equivalence class; so the normalized form is not unique in its equivalence class. In
particular there is only one equivalence class of complex Hadamard matrices of order 5,
but many normalized representatives in this class; see Exercise #2. We prove the following
refinement of (16.9):

(16.10) Every complex Hadamard matrix of order 5 has the form MBM′ for some
M, M′ ∈ M5 such that M′ has entry 1 in its upper left corner.

(It is customary to refer to the upper left corner of a matrix as its (1, 1)-entry; although
when we index the entries using elements of F5, it would make more sense to call this the
(0, 0)-entry.) Given a complex Hadamard matrix H of order 5, first write it in the form
H = MBM′ for some M, M′ ∈ M5 by (16.9). Suppose the leftmost column of this M′
has its nonzero entry in the (i, 0) position. It is straightforward to check that the circulant
matrix $C = [\delta_{x,y+1} : x, y \in F]$ satisfies BC = DB where D ∈ M5 is the diagonal matrix
$D = [\zeta^x \delta_{x,y} : x, y \in F]$. The monomial matrix $M'' = C^{-i} M'$ has a nonzero entry (call
it λ) in its top left corner, and

\[ H = M B M' = M B C^i M'' = M D^i B M'' = M''' B M'' \]

where M″, M‴ ∈ M5. Without loss of generality, λ = 1; otherwise replace M″, M‴ by
$\overline{\lambda} M''$, $\lambda M'''$ respectively. This gives the form claimed in (16.10).

(16.11) Every normalized complex Hadamard matrix H of order 5 has fifth roots
of unity as entries. The product of the entries in each of its rows is 1; and
the same holds for columns.

To verify (16.11), let H = MBM′ be a normalized complex Hadamard matrix of order 5,
where M, M′ are as in (16.10). Comparing leftmost columns on both sides, we see that
M must also have entry 1 in its top left corner. Factor M = DP and M′ = P′D′
where P and P′ are permutation matrices (each having 1 in the upper left corner); also
D = diag(1, λ1, λ2, λ3, λ4) and D′ = diag(1, λ′1, λ′2, λ′3, λ′4) are diagonal matrices with |λi| =
|λ′i| = 1. Now H = D(PBP′)D′ where both H and PBP′ have 1's in their top row and
leftmost column. This forces D = D′ = I and H = PBP′. Since the product of the entries
in each row of B is 1, and similarly for each column, the same must be true for H. This
gives (16.11).
Hence we may suppose k ∈ {3, 4, 5, 6}; and our k MUB’s are represented by the unitary
matrices I5 , U0 , U1 , . . . , Uk−2 where

U0 = B, Ur = BMr, M1, M2, . . . , Mk−2 ∈ M5; and each Mr has upper left corner entry 1.

(Left-multiplication by arbitrary monomial matrices is not required here, since this will
take our set to an equivalent set of MUB's.) Now each
$M_r = [\lambda_{r,y} \delta_{x,\sigma_r(y)} : x, y \in F]$ where
$|\lambda_{r,x}| = 1$ and σr is a permutation of F, for all x ∈ F and r ∈ {0, 1, . . . , k−2}; moreover,
$\lambda_{r,0} = \lambda_{0,x} = 1$, σ0(y) = y (the identity permutation) and σr(0) = 0 for all r.
Now the matrix $5 U_r U_s^*$ has (x, y)-entry equal to

\[ \sum_{z,v,w \in F} \zeta^{xz} \lambda_{r,v} \delta_{z,\sigma_r(v)} \delta_{w,\sigma_s(v)} \overline{\lambda_{s,v}}\, \zeta^{-wy}
= \sum_{v \in F} \zeta^{x\sigma_r(v) - y\sigma_s(v)} \lambda_{r,v} \overline{\lambda_{s,v}}, \]

which is required to have modulus equal to √5 whenever r ≠ s in {0, 1, . . . , k−2}. This
means that

(16.12) for all x, y in F and r ≠ s,
\[ 5 = \Bigl| \sum_{v \in F} \zeta^{x\sigma_r(v) - y\sigma_s(v)} \lambda_{r,v} \overline{\lambda_{s,v}} \Bigr|^2
= \sum_{v,w \in F} \zeta^{x(\sigma_r(v)-\sigma_r(w)) - y(\sigma_s(v)-\sigma_s(w))} \lambda_{r,v} \lambda_{s,w} \overline{\lambda_{r,w} \lambda_{s,v}}. \]

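Now specialize (16.12) to the case r = y = 0 and recall that σ0 = id and $\lambda_{0,x} = 1$ to
obtain

(16.13)
\[ \Bigl| \sum_{w \in F} \zeta^{-xw} \lambda_{s,w} \Bigr|^2
= \sum_{v,w \in F} \zeta^{x(v-w)} \lambda_{s,w} \overline{\lambda_{s,v}} = 5 \quad \text{whenever } x \in F \text{ and } s \neq 0. \]

Multiply both sides of (16.13) by $\zeta^{rx}$, r ∈ F, and then sum over x ∈ F to obtain

\[ \sum_{v \in F} \lambda_{v+r} \overline{\lambda_v} = 5\delta_{r,0} \quad \text{for all } r \in F \]

where we abbreviate $\lambda_x := \lambda_{s,x}$ for fixed s ≠ 0. That is, the circulant matrix
$H = [\lambda_{x+y} : x, y \in F]$ is complex Hadamard! Normalizing H, we obtain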

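\[ \begin{pmatrix}
1 & 1 & 1 & 1 & 1 \\
1 & \overline{\lambda_1}^{\,2} \lambda_2 & \overline{\lambda_1 \lambda_2}\, \lambda_3 & \overline{\lambda_1 \lambda_3}\, \lambda_4 & \overline{\lambda_1 \lambda_4} \\
1 & \overline{\lambda_1 \lambda_2}\, \lambda_3 & \overline{\lambda_2}^{\,2} \lambda_4 & \overline{\lambda_2 \lambda_3} & \lambda_1 \overline{\lambda_2 \lambda_4} \\
1 & \overline{\lambda_1 \lambda_3}\, \lambda_4 & \overline{\lambda_2 \lambda_3} & \lambda_1 \overline{\lambda_3}^{\,2} & \lambda_2 \overline{\lambda_3 \lambda_4} \\
1 & \overline{\lambda_1 \lambda_4} & \lambda_1 \overline{\lambda_2 \lambda_4} & \lambda_2 \overline{\lambda_3 \lambda_4} & \lambda_3 \overline{\lambda_4}^{\,2}
\end{pmatrix}, \]

using the fact that $\lambda_0 = \lambda_{s,0} = 1$. By (16.11), the product of the entries in each row of this
matrix is 1. This says that each $\lambda_x = \lambda_{s,x}$ is a fifth root of unity. So there exist functions
fs : F → F satisfying fs(0) = 0 and $\lambda_{s,x} = \zeta^{f_s(x)}$. Returning to (16.13), we now have

\[ \Bigl| \sum_{w \in F} \zeta^{f_s(w) - xw} \Bigr| = \sqrt5 \quad \text{for all } x \in F \text{ and } s \neq 0. \]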

By Theorem 13.5, fs is a quadratic function. Since $\lambda_{s,0} = \zeta^{f_s(0)} = 1$, we have
$f_s(x) = a_s x^2 + b_s x$ for some as, bs ∈ F. Returning to (16.12), but this time specializing to r = 0,
we obtain

\[ \Bigl| \sum_{w \in F} \zeta^{a_s w^2 + (x + b_s) w + y\sigma_s(w)} \Bigr| = \sqrt5 \quad \text{for all } x, y \in F \]

where as ≠ 0 for all s ≠ 0. By Theorem 13.12, the permutation σs must be a first-degree
polynomial: σs(w) = ms w + ds for some ms, ds ∈ F with ms ≠ 0. Now a straightforward
computation shows that

\[ U_s = B M_s = M_s' B_{a_s} \quad \text{where } M_s' = \bigl[\zeta^{d_s x} \delta_{m_s x + b_s,\, y} : x, y \in F\bigr] \in M_5. \]

Again, we may dispense with the monomial matrices M′s, using equivalence of MUBs,
leaving $U_s = B_{a_s}$ for s = 1, 2, . . . , k−2. We also have U∞ = B∞ = I5 and U0 = B0 = B;
so our set of k MUB's is (up to equivalence) a subset of the standard set.

Exercises 16.
1. The two groups of order 4 have character tables giving rise to Hadamard matrices

\[ H_{4a} = \begin{pmatrix} 1 & 1 & 1 & 1 \\ 1 & i & -1 & -i \\ 1 & -1 & 1 & -1 \\ 1 & -i & -1 & i \end{pmatrix}; \qquad
H_{4b} = \begin{pmatrix} 1 & 1 & 1 & 1 \\ 1 & -1 & 1 & -1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & -1 & 1 \end{pmatrix}. \]

(a) Show that H4a and H4b are not equivalent Hadamard matrices. That is, show that there do
not exist unitary monomial matrices M, M′ satisfying H4b = M H4a M′.
(b) Show that H4a is equivalent to H4 (from Example 16.2) for some choice of α; and similarly
for H4b .
2. Exactly how many ‘normalized’ complex Hadamard matrices of order 5 are there? (See (16.9)
and the comments which follow it.)

17. Weil’s Bound


Fix a polynomial $g(x) = b_0 + b_1 x + b_2 x^2 + \cdots + b_d x^d \in F[x]$ of degree d ≥ 2, where
F = Fq ⊇ K = Fp, q = p^e and gcd(d, q) = 1, bd ≠ 0; and consider the exponential sum

\[ S_g = \sum_{a \in F} \zeta^{\mathrm{Tr}_{F/K}\, g(a)} \in \mathbb{Z}[\zeta] \]

where ζ = ζp. Recall that Tr_{F/K} is the absolute trace map

\[ F \to K, \qquad a \mapsto a + a^p + a^{p^2} + \cdots + a^{p^{e-1}}. \]

Weil's bound (Theorem 13.2) is the assertion that

\[ |S_g| \le (d - 1)\sqrt{q}. \]
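For prime fields (e = 1, so the trace is the identity) the bound can be confirmed by brute force over all polynomials of a given degree. This is a sketch only: the function names are illustrative, the constant term is fixed at 0 because it does not affect |Sg|, and only small p and d are practical.

```python
import cmath
import math
from itertools import product

def exp_sum(coeffs, p):
    """S_g for g(x) = sum(coeffs[j] * x**j) over the prime field Z/pZ."""
    zeta = cmath.exp(2j * cmath.pi / p)
    total = 0
    for a in range(p):
        value = sum(b * pow(a, j, p) for j, b in enumerate(coeffs)) % p
        total += zeta ** value
    return total

def max_exp_sum(p, d):
    """Largest |S_g| over all g of exact degree d with zero constant term."""
    best = 0.0
    for lower in product(range(p), repeat=d - 1):
        for lead in range(1, p):
            coeffs = (0,) + lower + (lead,)
            best = max(best, abs(exp_sum(coeffs, p)))
    return best

for p, d in [(7, 2), (7, 3), (11, 3)]:
    # columns: p, d, the largest |S_g| found, and Weil's bound (d - 1) * sqrt(p)
    print(p, d, max_exp_sum(p, d), (d - 1) * math.sqrt(p))
```

For d = 2 the maximum equals the bound, since every quadratic has |Sg| = √p.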

We now outline key elements of the proof, using the machinery of L-functions introduced in
Section 12. We replace the choice of multiplicative function λ : M → C used in Section 12
(designed for investigating Gauss sums and the Hasse-Davenport relations) by a new choice
λ = λg ; but everything in Section 12 up to (12.6) applies here as well. While we omit some
details in the proof of Weil’s bound, these can be found in other sources. The details, as
found in [LN], [Sc], are elementary if somewhat technical.
Our rationale for naming the fixed polynomial g(x) (above) is to reserve the name f ∈
M for an arbitrary monic polynomial as in our previous generalities regarding L-functions.
As before, M is the multiplicative monoid of monic polynomials in F [x]. Recall that all
nonzero ideals A ⊆ O = F [x] are principal, having the form A = (f ) for some f ∈ M .
We are ready to introduce our new choice of multiplicative function λ(A) = λ(f ) = λg (f )
which depends on the choice of given polynomial g above.
Recall that it suffices to define λ(f) for f ∈ P (i.e. f monic irreducible), then extend
to the monoid M using unique factorization in F[x]. So given f ∈ Pk, recall that $E = F_{q^k}$
is the splitting field of f over F, thus:

f(x) = (x − r1)(x − r2) · · · (x − rk), some distinct r1, r2, . . . , rk ∈ E.

We define
\[ \lambda(f) = \lambda_g(f) = \zeta^{\mathrm{Tr}_{F/K}[g(r_1) + g(r_2) + \cdots + g(r_k)]}. \]
We must of course show that this definition makes sense! Obviously $\sum_i g(r_i) \in E$; but since
every F-automorphism σ of E permutes the k roots of f by Theorem A5.3, σ permutes
the k terms in $\sum_i g(r_i)$. (Recall: the Galois group G(E/F) is cyclic and $\sigma(a) = a^{q^j}$ for
some j ∈ {0, 1, 2, . . . , k−1}.) Hence $\sigma\bigl(\sum_i g(r_i)\bigr) = \sum_i g(r_i)$. By Galois theory, this means
that $\sum_i g(r_i) \in F$, where F is the domain of our trace map Tr_{F/K}.
We will denote the associated L-function by $L_g(s) := L_{\lambda_g}(s)$. As before, the key step
is proving that the coefficient of z^n in the series expansion of Lg(s) vanishes for large n
(see (12.5)), thus forcing the L-function to be a polynomial in z.
Recall that Mn ⊂ M is the set of monic polynomials f(x) ∈ F[x] of degree n.

Lemma 17.1. If n ≥ d, then $\sum_{f \in M_n} \lambda(f) = 0$.

Proof. Each f ∈ Mn has the form

\[ f(x) = x^n - e_1 x^{n-1} + e_2 x^{n-2} - \cdots + (-1)^n e_n = (x - r_1)(x - r_2) \cdots (x - r_n) \]

where $e_i = e_i(r_1, r_2, \ldots, r_n) \in F$ are elementary symmetric polynomials in the roots; see
Appendix A7. Now

\[ g(r_1) + g(r_2) + \cdots + g(r_n) = \sum_{j=0}^{d} b_j \sum_{i=1}^{n} r_i^j = \sum_{j=0}^{d} b_j m_j \]

where

\[ m_j = m_j(r_1, r_2, \ldots, r_n) = r_1^j + r_2^j + \cdots + r_n^j \]

are the moment polynomials in the roots. (Thus $m_0 = n$, $m_1 = e_1$, $m_2 = e_1^2 - 2e_2$, etc. as in
Theorem A7.3.) By Corollary A7.4, we have

\[ m_d = (-1)^{d-1} d\, e_d + h(e_1, e_2, \ldots, e_{d-1}) \]

for some polynomial $h(t_1, t_2, \ldots, t_{d-1}) \in F[t_1, t_2, \ldots, t_{d-1}]$. Thus

\[ \begin{aligned}
\sum_{i=1}^{n} g(r_i) &= b_0 m_0 + b_1 m_1 + b_2 m_2 + \cdots + b_{d-1} m_{d-1} + b_d \bigl[(-1)^{d-1} d\, e_d + h(e_1, e_2, \ldots, e_{d-1})\bigr] \\
&= (-1)^{d-1} d\, b_d e_d + \tilde{h}(e_1, e_2, \ldots, e_{d-1})
\end{aligned} \]

for some polynomial $\tilde{h}(e_1, e_2, \ldots, e_{d-1}) \in F[e_1, e_2, \ldots, e_{d-1}]$ (whose coefficients depend on
the fixed polynomial g). Thus

\[ \begin{aligned}
\sum_{f \in M_n} \lambda(f) &= \sum_{f \in M_n} \zeta^{\mathrm{Tr}_{F/K}[g(r_1) + g(r_2) + \cdots + g(r_n)]} \\
&= \sum_{e_1, e_2, \ldots, e_n \in F} \zeta^{\mathrm{Tr}_{F/K}[(-1)^{d-1} d b_d e_d + \tilde{h}(e_1, e_2, \ldots, e_{d-1})]}
&& \text{(summing over } f \in M_n \text{ amounts to summing over choices for its coefficients)} \\
&= q^{n-d} \sum_{e_1, e_2, \ldots, e_d \in F} \zeta^{\mathrm{Tr}_{F/K}[(-1)^{d-1} d b_d e_d + \tilde{h}(e_1, e_2, \ldots, e_{d-1})]}
&& \text{(the summand is independent of } e_{d+1}, \ldots, e_n \in F\text{)} \\
&= q^{n-d} \sum_{e_d \in F} \zeta^{(-1)^{d-1} \mathrm{Tr}_{F/K}(d b_d e_d)} \sum_{e_1, e_2, \ldots, e_{d-1} \in F} \zeta^{\mathrm{Tr}_{F/K} \tilde{h}(e_1, e_2, \ldots, e_{d-1})} = 0
\end{aligned} \]

since $d b_d \neq 0$ and the map $x \mapsto \zeta^{\mathrm{Tr}_{F/K}(d b_d x)}$ is a nontrivial additive character of F by
Theorem A1.7(ii). This last step makes essential use of the hypothesis p ∤ d.
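The vanishing in Lemma 17.1 can be observed numerically over a prime field, where the trace is the identity. The sketch below sums λ(f) over all monic f of degree n by running over the coefficient tuples (e1, . . . , en), computing the moment polynomials from them via Newton's identities (which need no division). The function names and the sample polynomial g(x) = x + x³ over Z/5Z are illustrative choices.

```python
import cmath
from itertools import product

def power_sums(e, d, p):
    """Moment polynomials m_0..m_d (mod p) from e = (1, e_1, ..., e_n), by Newton's identities."""
    n = len(e) - 1
    m = [n % p]
    for k in range(1, d + 1):
        total = 0
        for i in range(1, k):
            if i <= n:
                total += (-1) ** (i - 1) * e[i] * m[k - i]
        if k <= n:
            total += (-1) ** (k - 1) * k * e[k]
        m.append(total % p)
    return m

def lambda_sum(b, n, p):
    """Sum of lambda_g(f) over all monic f of degree n over Z/pZ, where g(x) = sum b[j] x^j."""
    zeta = cmath.exp(2j * cmath.pi / p)
    d = len(b) - 1
    total = 0
    for coeffs in product(range(p), repeat=n):
        m = power_sums((1,) + coeffs, d, p)
        exponent = sum(bj * mj for bj, mj in zip(b, m)) % p
        total += zeta ** exponent
    return total

b = [0, 1, 0, 1]  # g(x) = x + x^3 over Z/5Z, so d = 3 and gcd(d, p) = 1
for n in (1, 2, 3, 4):
    print(n, abs(lambda_sum(b, n, 5)))
```

For n = 1 the sum is just Sg, and for n = 1, 2 the sums are the nontrivial coefficients of the polynomial Lg; for n ≥ d = 3 they vanish up to rounding error.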

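By (12.5), Lg(s) is a polynomial in z of degree at most d − 1, so

\[ L_g(s) = (1 - \omega_1 z)(1 - \omega_2 z) \cdots (1 - \omega_{d-1} z) \]

for some ω1, ω2, . . . , ωd−1 ∈ C. Now

\[ \sum_{i=1}^{d-1} \ln(1 - \omega_i z) = \ln L_g(s) = -\sum_{k=1}^{\infty} \sum_{f \in P_k} \ln\bigl(1 - \lambda(f) z^k\bigr) \]

and so

\[ \begin{aligned}
-\sum_{n=1}^{\infty} (\omega_1^n + \cdots + \omega_{d-1}^n) z^n
&= -\sum_{i=1}^{d-1} \frac{\omega_i z}{1 - \omega_i z}
= z\, \frac{L_g'(z)}{L_g(z)}
= \sum_{k=1}^{\infty} \sum_{f \in P_k} \frac{k \lambda(f) z^k}{1 - \lambda(f) z^k} \\
&= \sum_{k=1}^{\infty} \sum_{f \in P_k} \sum_{r=1}^{\infty} k \lambda(f)^r z^{rk}
= \sum_{n=1}^{\infty} \sum_{k \mid n} \sum_{f \in P_k} k \lambda(f)^{n/k} z^n.
\end{aligned} \]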

Equating coefficients gives

$$-(\omega_1^n + \cdots + \omega_{d-1}^n) = \sum_{k \mid n} \sum_{f \in P_k} k \lambda(f)^{n/k}$$

for all $n \geq 1$. We will show that the latter double sum is simply $\sum_{\alpha \in E} \zeta^{\mathrm{Tr}_{E/K} g(\alpha)}$ where
$E = \mathbb{F}_{q^n}$. Given $\alpha \in E$, let $f(x)$ be its minimal polynomial over $F$, so $\deg f(x) = k =
[F[\alpha] : F]$ which divides $[E : F] = n$; moreover in this case Theorem 3.8 gives

$$f(x) = (x - \alpha)(x - \alpha^q)(x - \alpha^{q^2}) \cdots (x - \alpha^{q^{k-1}})$$
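and we can group together the $k$ terms in our sum arising from the same minimal polynomial
$f(x)$ to get

$$\begin{aligned}
\sum_{\alpha \in E} \zeta^{\mathrm{Tr}_{E/K} g(\alpha)} &= \sum_{k \mid n} \sum_{f \in P_k} \Big[ \zeta^{\mathrm{Tr}_{E/K} g(\alpha)} + \zeta^{\mathrm{Tr}_{E/K} g(\alpha^q)} + \cdots + \zeta^{\mathrm{Tr}_{E/K} g(\alpha^{q^{k-1}})} \Big] \\
&= \sum_{k \mid n} \sum_{f \in P_k} k\, \zeta^{\mathrm{Tr}_{E/K} g(\alpha)} && \text{(Corollary A5.14)} \\
&= \sum_{k \mid n} \sum_{f \in P_k} k \big( \zeta^{\mathrm{Tr}_{F[\alpha]/K} g(\alpha)} \big)^{n/k} && \text{(Corollary A1.10)} \\
&= \sum_{k \mid n} \sum_{f \in P_k} k \big( \zeta^{\mathrm{Tr}_{F/K} \mathrm{Tr}_{F[\alpha]/F} g(\alpha)} \big)^{n/k} && \text{(Theorem A1.8)} \\
&= \sum_{k \mid n} \sum_{f \in P_k} k \big( \zeta^{\mathrm{Tr}_{F/K}[g(\alpha) + g(\alpha)^q + \cdots + g(\alpha)^{q^{k-1}}]} \big)^{n/k} && \text{(Theorem 3.8)} \\
&= \sum_{k \mid n} \sum_{f \in P_k} k \big( \zeta^{\mathrm{Tr}_{F/K}[g(\alpha) + g(\alpha^q) + \cdots + g(\alpha^{q^{k-1}})]} \big)^{n/k} && \text{(since } g(x) \in F[x]\text{; see Theorem A5.3)} \\
&= \sum_{k \mid n} \sum_{f \in P_k} k \lambda(f)^{n/k} .
\end{aligned}$$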
Thus

Corollary 17.2. Let $F = \mathbb{F}_q \supseteq K = \mathbb{F}_p$ be a field of order $q = p^e$, and let
$g(x) \in F[x]$ be a polynomial of degree $d \geq 2$ with $\gcd(d, q) = 1$. Then there exist
complex numbers $\omega_1, \ldots, \omega_{d-1}$ such that
$\sum_{\alpha \in E} \zeta^{\mathrm{Tr}_{E/K} g(\alpha)} = -(\omega_1^n + \cdots + \omega_{d-1}^n)$
for every $n \geq 1$, where $E = \mathbb{F}_{q^n}$.

Now Schmidt [Sc] gives an elementary (but rather technical) proof of

(17.3) Each of the complex numbers $\omega_1, \omega_2, \ldots, \omega_{d-1}$ in Corollary 17.2 satisfies
$|\omega_i| = \sqrt{q}$.

Using (17.3) together with Corollary 17.2, we have

$$|S_g| = \Big| \sum_{\alpha \in E} \zeta^{\mathrm{Tr}_{E/K} g(\alpha)} \Big| = |\omega_1^n + \omega_2^n + \cdots + \omega_{d-1}^n| \leq (d - 1) \sqrt{q}^{\,n} = (d - 1) \sqrt{q^n}$$

which is Weil’s bound. In place of Schmidt’s technical argument, Lidl and Niederreiter [LN]
substitute a slightly less technical (but also elementary) argument proving $|\omega_i| \leq \sqrt{q}$. This
is also sufficient to establish Weil’s bound, as is clear from the argument above; but it is
less satisfying in that it falls short of proving the equality in (17.3). We did not feel so
compelled to complete the proof of Weil’s bound as to devote many more technical pages
to the goal, having already described what we view as the nicest part of the proof.
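
Weil’s bound is easy to test numerically in the simplest case $F = K = \mathbb{F}_p$ and $n = 1$, where the trace is the identity map. The following Python sketch is only an illustration under that assumption; the function names and the sample cubic $g(x) = x^3 + x$ are our own choices and do not come from the text.

```python
import cmath
import math

def exp_sum(g, p):
    """Sum over a in F_p of zeta_p^{g(a)}, for a function g: int -> int."""
    zeta = cmath.exp(2j * math.pi / p)
    return sum(zeta ** (g(a) % p) for a in range(p))

def weil_bound_holds(g, d, p, tol=1e-9):
    """Check |S_g| <= (d-1)*sqrt(p) for a polynomial g of degree d, p not dividing d."""
    assert d % p != 0, "the hypothesis that p does not divide d is required"
    return abs(exp_sum(g, p)) <= (d - 1) * math.sqrt(p) + tol

def cubic(x):
    # Illustrative polynomial of degree d = 3 (an assumption, not from the text).
    return x**3 + x

# One boolean per prime; each prime here is different from d = 3.
results = {p: weil_bound_holds(cubic, 3, p) for p in (5, 7, 11, 13, 101)}
print(results)
```

Floating-point arithmetic is used throughout, hence the small tolerance in the comparison.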

Example 17.4: Quadratic Exponential Sum. Let $F = \mathbb{F}_p$ where $p$ is an odd prime, and take
$g(x) = x^2$. Here $d = 2$, which is not divisible by $p$. If $E = \mathbb{F}_q$, $q = p^n$, then by Theorem 13.3 and
Corollary 12.11,

$$\omega_1^n = -G(\chi_E) = \begin{cases} (-\sqrt{p})^n, & \text{if } p \equiv 1 \bmod 4; \\ (-i\sqrt{p})^n, & \text{if } p \equiv 3 \bmod 4. \end{cases}$$

Evidently $\omega_1 = -\sqrt{p}$ or $-i\sqrt{p}$ according as $p \equiv 1 \bmod 4$ or $p \equiv 3 \bmod 4$. Note that (17.3) is satisfied.
Replacing $g(x)$ with another quadratic polynomial gives similar results.
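
The value of $\omega_1$ in Example 17.4 can be checked numerically for $n = 1$. The Python sketch below is an illustration only (the function names are ours): it computes the quadratic exponential sum over $\mathbb{F}_p$ directly and negates it.

```python
import cmath
import math

def quadratic_sum(p):
    """Sum over a in F_p of zeta_p^{a^2}, for an odd prime p."""
    zeta = cmath.exp(2j * math.pi / p)
    return sum(zeta ** ((a * a) % p) for a in range(p))

def omega1(p):
    """The single reciprocal root omega_1 for g(x) = x^2 over F_p (case n = 1)."""
    return -quadratic_sum(p)

for p in (5, 7, 11, 13):
    w = omega1(p)
    # |omega_1| should agree with sqrt(p) up to rounding error.
    print(p, p % 4, w, abs(w), math.sqrt(p))
```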

Exercises 17.
1. Let $F = \mathbb{F}_3$ and $E = \mathbb{F}_q$ where $q = 3^n$, and let $g(x) = x^5 + x \in F[x]$. Note that $d = 5$ which is
relatively prime to $q$. Here $\zeta = \zeta_3 = \omega$ and all exponential sums have values in the Eisenstein
integers $\mathbb{Z}[\omega]$.
(a) Compute a table of values of $g(a)$ for $a \in \mathbb{F}_9$. It is convenient to take $\mathbb{F}_9 = \mathbb{F}_3[i]$ where
$i = \sqrt{-1}$.
(b) Using (a), compute $\sum_{a \in E} \omega^{\mathrm{Tr}_{E/F} g(a)} \in \mathbb{Z}[\omega]$ for $n = 1, 2$.
(c) Equating the sum in (b) to $-\omega_1^n - \omega_2^n$, obtain two equations in two unknowns $\omega_1, \omega_2 \in \mathbb{C}$.
Solve for $\omega_1$ and $\omega_2$.
(d) Does the equality of (17.3) hold? Explain.
(e) Use Corollary 17.2 to evaluate the exponential sum for n = 1, 2, 3, 4, 5, 6.
(f) Half of the values listed in (e) are zero. Give a very simple explanation for this fact. (Hint:
Comments immediately following the statement of Theorem 11.7 use similar reasoning.)
Appendix A1: Fields and Extensions

See e.g. [Sa], [Mc] for details on fields and extensions.


Recall that a field is a commutative ring $F$ with identity, in which every nonzero
element is a unit (i.e. has a multiplicative inverse). The key feature which distinguishes
fields from more general commutative rings is that a field is closed not only under addition,
subtraction and multiplication, but also under division (meaning that if $a, b \in F$ with $a \neq 0$,
then the equation $ax = b$ has a unique solution $x = \frac{b}{a} \in F$). Examples of commutative
rings with identity, which are not fields, include: $\mathbb{Z}$, $R[x]$ where $R$ is any commutative
ring with identity, and $F^n = F \oplus \cdots \oplus F$ ($n \geq 2$ copies, with componentwise addition and
multiplication). Here $R[x]$ is the ring of polynomials in an indeterminate $x$ with coefficients
in $R$. Examples of fields include: $\mathbb{C}$, $\mathbb{Q}$, $\mathbb{R}$, and the field $F(x)$ of rational functions in
an indeterminate $x$ with coefficients in $F$. Here
$F(x) = \big\{ \frac{f(x)}{g(x)} : f(x), g(x) \in F[x] \text{ with } g(x) \neq 0 \big\}$.
An extension of fields is a pair of fields $E \supseteq F$ where $F$ is a subring (and hence a
subfield) of $E$. In this case, it is clear from the axioms that $E$ is also a vector space over
$F$; and so we may speak of the dimension of this vector space. We call this dimension
the degree of the extension, denoted $[E : F]$. Here $[E : F] \geq 1$, where equality holds iff
$E = F$. For example, $\mathbb{C} \supset \mathbb{R}$ is an extension of degree $[\mathbb{C} : \mathbb{R}] = 2$ (a quadratic extension)
with basis $\{1, i\}$, and the extension $\mathbb{R} \supset \mathbb{Q}$ has infinite degree. A finite extension is an
extension of finite degree (regardless of whether the fields themselves are finite or infinite).
A tower of fields is a chain of extension fields $E_n \supseteq E_{n-1} \supseteq \cdots \supseteq E_1 \supseteq E_0$. Every such
tower has degree $[E_n : E_0] = \prod_{i=1}^{n} [E_i : E_{i-1}]$ as one proves by induction on $n$, using
the following result. We refer to this result as the transitivity of degrees for extension
fields.

Theorem A1.1. If K ⊇ E ⊇ F is a tower of fields, then [K : F ] = [K : E][E : F ].

Proof. Suppose $\{\alpha_1, \ldots, \alpha_m\}$ is a basis for $K$ over $E$, and $\{\beta_1, \ldots, \beta_n\}$ is a basis for $E$
over $F$. It is easy to see that $\{\alpha_i \beta_j : 1 \leq i \leq m,\ 1 \leq j \leq n\}$ is a basis for $K$ over $F$. Indeed, every
$\alpha \in K$ can be uniquely expressed as $\alpha = \sum_{i=1}^{m} a_i \alpha_i$ with $a_i \in E$; and we can uniquely
express $a_i = \sum_{j=1}^{n} b_{ij} \beta_j$ with $b_{ij} \in F$. This gives a unique expression $\alpha = \sum_{i=1}^{m} \sum_{j=1}^{n} b_{ij} \alpha_i \beta_j$
with $b_{ij} \in F$ as required.

Although we have proved Theorem A1.1 only in the case of finite degree, one similarly
proves the general case (with the obvious convention that $[K : F] = \infty$ iff at least one of
$[K : E]$ or $[E : F]$ is infinite). Theorem A1.1 should be seen as the field-theoretic analogue
of the statement $[G : K] = [G : H][H : K]$ for chains of subgroups $G \geq H \geq K$.

The characteristic of a field $F$, denoted char $F$, equals the minimum positive integer
$n$ such that $1 + 1 + \cdots + 1 = 0$ (with $n$ 1’s), if such an $n$ exists; otherwise we say $F$
has characteristic zero and we write char $F = 0$. For a field $F$ of positive characteristic,
char $F = p$ must be prime. This fact is proved by an obvious generalization of the following
explanation why char $F \neq 6$: otherwise

$$0 = 1 + 1 + 1 + 1 + 1 + 1 = (1 + 1)(1 + 1 + 1)$$

which yields either $1 + 1 = 0$ or $1 + 1 + 1 = 0$ in $F$, contradicting the minimality of
char $F$. For every extension $E \supseteq F$, it is clear that $E$ and $F$ have the same characteristic.
Moreover, every field $F$ has a unique smallest subfield $K$, called the prime subfield of $F$;
and this field is isomorphic to $\mathbb{F}_p$, the field of prime order $p$, if char $F = p$; or $\mathbb{Q}$, in the case
char $F = 0$.

Theorem A1.2. Let $E \supseteq F$ be an extension of fields, and let $\theta \in E$. Then the
following three conditions are equivalent:
(i) $\theta$ is a root of a nonzero polynomial $f(x) \in F[x]$;
(ii) the powers $1, \theta, \theta^2, \theta^3, \ldots$ are linearly dependent over $F$;
(iii) $F[\theta]$ is a field (hence equal to $F(\theta)$, its field of quotients).
Assuming these conditions hold, there is a unique smallest degree monic polynomial
m(x) ∈ F [x] satisfying m(θ) = 0; and this polynomial m(x) is irreducible in F [x]. In
fact m(x) ∈ F [x] is the unique monic irreducible polynomial having θ as a root; and
a polynomial f (x) ∈ F [x] satisfies f (θ) = 0 iff f (x) is divisible by m(x) in F [x].

Under the conditions of Theorem A1.2, we say $\theta$ is algebraic over $F$; and $\mathrm{Irr}_{\theta,F}(x) := m(x)$
is the minimal polynomial of $\theta$ over $F$. Moreover, $\theta$ is algebraic of degree $n$ where
$n = \deg m(x) = [F[\theta] : F]$.
Proof. Assuming (ii), there is a linear combination

$$a_0 + a_1 \theta + a_2 \theta^2 + \cdots + a_n \theta^n = 0$$

for some $a_0, \ldots, a_n \in F$, not all zero; and this proves (i).


Assuming (i), the subset $J \subseteq F[x]$ consisting of all $f(x) \in F[x]$ such that $f(\theta) = 0$ is a
nonzero ideal. Choose a nonzero element $m(x) \in J$ of smallest degree; and without loss of
generality, $m(x)$ is monic. Now $J$ contains the principal ideal $(m(x)) = \{g(x)m(x) : g(x) \in
F[x]\} \subseteq J$. By the Division Algorithm, it is easy to see that equality holds: $J = (m(x))$.
If $m(x) = m_1(x)m_2(x)$ where $m_1(x), m_2(x) \in F[x]$, then either $m_1(x)$ or $m_2(x)$ must have
$\theta$ as a root; and by minimality of $\deg m(x)$, one of the factors $m_i(x)$ must be a nonzero
constant; so $m(x)$ is irreducible in $F[x]$. Now the ideal $(m(x)) \subset F[x]$ is maximal; so the
quotient ring $F[x]/(m(x))$ is a field. Also the evaluation map $F[x] \to F[\theta]$, $f(x) \mapsto f(\theta)$ is
a surjective ring homomorphism; and its kernel is $J = (m(x))$. So the First Isomorphism
Theorem for Rings gives

$$F[\theta] \cong F[x]/(m(x)),$$

a field. This proves (iii), and later assertions regarding $m(x)$ follow as well.
Finally, suppose (iii) holds; and we must prove (ii). We may suppose $\theta \neq 0$, otherwise
there is nothing to prove. By (iii), $\theta$ has a multiplicative inverse in $F[\theta]$, so there exist
$m \geq 0$ and $b_0, b_1, \ldots, b_m \in F$ such that

$$(b_0 + b_1 \theta + b_2 \theta^2 + \cdots + b_m \theta^m)\theta = 1.$$

After expanding and moving all terms to one side, we obtain (ii) as required.

An extension E ⊇ F is algebraic if every element of E is algebraic over F .

Corollary A1.3. Every finite extension is algebraic.

Proof. Let $E \supseteq F$ be a finite extension of degree $n = [E : F]$, and let $\theta \in E$. Then the
$n + 1$ elements $1, \theta, \theta^2, \ldots, \theta^n \in E$ must be linearly dependent over $F$; so $\theta$ is algebraic
over $F$ (of degree at most $n$).

Corollary A1.4. Let $E \supseteq F$ be an extension of fields, and suppose the elements
$\alpha, \beta \in E$ are algebraic over $F$. Then $\alpha + \beta$, $\alpha - \beta$ and $\alpha\beta$ are algebraic over $F$. Also
if $\beta \neq 0$ then $\frac{\alpha}{\beta}$ is algebraic over $F$.

Proof. Since $\alpha$ is algebraic over $F$, we have a finite extension field $L := F[\alpha] \supseteq F$ and
so $[L : F] < \infty$. Since $\beta$ is algebraic over $F$, it is algebraic over $L$; so $L[\beta] \supseteq L$ is a finite
extension. Now $[L[\beta] : F] = [L[\beta] : L][L : F] < \infty$, so the extension $L[\beta] \supseteq F$ is algebraic.
So the elements $\alpha \pm \beta$, $\alpha\beta$, $\frac{\alpha}{\beta} \in F[\alpha, \beta] = L[\beta]$ are algebraic over $F$.

Thus the set of all elements of an extension $E \supseteq F$ which are algebraic over $F$ forms an
intermediate subfield. In particular, C has a subfield A consisting of all complex numbers
that are algebraic over Q. Note that A is algebraically closed (i.e. every nonconstant
polynomial f (t) ∈ A[t] has a root in A); and the extension A ⊃ Q is algebraic (i.e. every
element of A is algebraic over Q); so A is the algebraic closure of Q. Evidently A
contains all complex roots of unity; but A is not generated by the roots of unity. Since
[A : Q] = ∞, the converse of Corollary A1.3 evidently fails. Note also that A is a proper
subfield of C, since C is uncountable whereas A is countable.

Corollary A1.5. Let K ⊇ E ⊇ F be a tower of extensions, where K ⊇ E is


algebraic and E ⊇ F is algebraic. Then K ⊇ F is algebraic.

By induction, Corollary A1.5 obviously extends to towers of extensions of arbitrary finite


length.
Proof of Corollary A1.5. Let $\theta \in K$. By hypothesis, $\theta$ is algebraic over $E$, so there exist
$n \geq 1$ and $a_0, a_1, \ldots, a_{n-1} \in E$ such that

$$\theta^n + a_{n-1} \theta^{n-1} + \cdots + a_1 \theta + a_0 = 0.$$

Since $a_0 \in E$ is algebraic over $F$, $F[a_0] \supseteq F$ is a finite extension. Also $a_1 \in E$ is algebraic
over $F$ so it is algebraic over $F_0 := F[a_0]$, which means that $F_1 := F_0[a_1] = F[a_0, a_1]$
is a finite extension of $F$. Continuing in this way, we come to a finite extension field
$F_{n-1} := F[a_0, a_1, \ldots, a_{n-1}] \supseteq F$. The relation above shows that $\theta$ is algebraic over $F_{n-1}$,
so it generates a finite extension field $F_{n-1}[\theta] \supseteq F_{n-1}$. Again using transitivity of degrees,
$[F_{n-1}[\theta] : F] < \infty$ and so by Corollary A1.3, $\theta$ is algebraic over $F$.

Matrix Representations of Field Extensions


Let $E \supseteq F$ be a field extension of degree $n$. Then $E$ is isomorphic to a subring of the
ring of $n \times n$ matrices over $F$, which we denote by $F^{n \times n}$. To see this, note
that every $\alpha \in E$ defines an $F$-linear transformation $T_\alpha : E \to E$, $x \mapsto \alpha x$. Now fix
a basis of $E$ over $F$; and denote by $M(\alpha)$ the matrix of $T_\alpha$ with respect to the chosen
basis. Now $M : E \to F^{n \times n}$ is clearly injective and $F$-linear, and $M(\alpha\beta) = M(\alpha)M(\beta)$,
$M(\alpha + \beta) = M(\alpha) + M(\beta)$. So the image of $M : E \to F^{n \times n}$ is a subring isomorphic to $E$.
Define the norm and trace of α (with respect to the extension E ⊇ F ) as the elements

$$N_{E/F}\,\alpha = \det M(\alpha), \qquad \mathrm{Tr}_{E/F}\,\alpha = \mathrm{tr}\, M(\alpha)$$

respectively, where tr : F n×n → F is the usual matrix trace (the sum of the diagonal
entries). Note that both of these are maps E → F . They do not depend on the choice
of bases used, since they are the determinant and the trace of Tα , admitting a basis-free
description. In the case where F is the prime field of E (i.e. its minimal subfield), the
norm and trace are called the absolute norm and absolute trace.

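Example A1.6: The Complex Numbers. Take $\{1, i\}$ as a basis for $\mathbb{C}$ over $\mathbb{R}$. The matrix of
$\alpha = a + bi$ ($a, b \in \mathbb{R}$) with respect to this basis is $M(\alpha) = \begin{bmatrix} a & -b \\ b & a \end{bmatrix}$. The norm and trace are given by
$N_{\mathbb{C}/\mathbb{R}}\,\alpha = a^2 + b^2$ and $\mathrm{Tr}_{\mathbb{C}/\mathbb{R}}\,\alpha = 2a$.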
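
The matrix representation of Example A1.6 can be tried out in a few lines of Python. This is only a sketch with helper names of our own choosing; it checks that $M$ is multiplicative on one pair of sample elements and reads off the norm and trace as determinant and matrix trace.

```python
def M(a, b):
    """Matrix of multiplication by a + bi on C, with respect to the basis {1, i}."""
    return [[a, -b], [b, a]]

def matmul(X, Y):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def norm(a, b):
    """N_{C/R}(a + bi) computed as det M(a + bi)."""
    m = M(a, b)
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def trace(a, b):
    """Tr_{C/R}(a + bi) computed as the matrix trace of M(a + bi)."""
    m = M(a, b)
    return m[0][0] + m[1][1]

# (2 + 3i)(1 - 4i) = 14 - 5i, and the matrices multiply the same way.
product = complex(2, 3) * complex(1, -4)
print(product, matmul(M(2, 3), M(1, -4)))
print(norm(2, 3), trace(2, 3))
```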

Theorem A1.7. Let $E \supseteq F$ be a finite extension of fields. Write $N = N_{E/F}$,
$\mathrm{Tr} = \mathrm{Tr}_{E/F}$. For all $a, b \in E$ we have
(i) N(ab) = N(a) N(b). Moreover N(a) = 0 iff a = 0.
(ii) Tr(a + b) = Tr a + Tr b. The map Tr : E → F is an F -linear functional. A bilinear
form on E is defined by (a, b) 7→ Tr(ab). If E (and F ) have characteristic zero
or are finite fields, then this bilinear form is nondegenerate and symmetric, i.e.
Tr(ab) = Tr(ba); and b = 0 is the only element satisfying Tr(ab) = 0 for all a ∈ E.

Proof. The identities $N(ab) = N(a)N(b)$ and $\mathrm{Tr}(a + b) = \mathrm{Tr}\, a + \mathrm{Tr}\, b$ follow from $T_{ab} = T_a T_b$
and $T_{a+b} = T_a + T_b$ using basic properties of determinant and trace for linear transformations.
Also $N(1) = \det I = 1$; so if $a \in E^\times$ then $N(a)N(a^{-1}) = N(1) = 1$. Finally,
suppose $\mathrm{Tr}(ab) = 0$ for all $a \in E$; we must show that $b = 0$. It suffices to find $c \in E$
satisfying $\mathrm{Tr}\, c \neq 0$; for then we may take $a = \frac{c}{b}$ whenever $b \neq 0$. In characteristic zero,
$\mathrm{Tr}\, 1 = n = [E : F] \neq 0$ as required. In the finite case $F = \mathbb{F}_q$ and $E = \mathbb{F}_{q^n}$, an extension
of finite fields of degree $[E : F] = n$, $\mathrm{Tr}\, a = a + a^q + a^{q^2} + \cdots + a^{q^{n-1}}$ by Theorem 3.8. If
$\mathrm{Tr}\, c = 0$ for all $c \in E$, then the polynomial $x^{q^{n-1}} + \cdots + x^q + x \in F[x]$ has $q^n$ roots in $E$,
and this number exceeds the degree $q^{n-1}$ of the polynomial, a contradiction; so once again
there exists $c \in E$ with $\mathrm{Tr}\, c \neq 0$ as required.
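
For a small finite field, the nondegeneracy of the trace form can be confirmed by brute force. The sketch below is an illustration only: it models $\mathbb{F}_9$ as $\mathbb{F}_3[i]$ with $i^2 = -1$, storing $a + bi$ as the pair `(a, b)`; this encoding and the function names are our own assumptions.

```python
from itertools import product

P = 3  # characteristic; F_9 is modelled as F_3[i] with i^2 = -1

def add(x, y):
    return ((x[0] + y[0]) % P, (x[1] + y[1]) % P)

def mul(x, y):
    a, b = x
    c, d = y
    return ((a * c - b * d) % P, (a * d + b * c) % P)

def power(x, n):
    result = (1, 0)
    for _ in range(n):
        result = mul(result, x)
    return result

def tr(x):
    """Tr_{F_9/F_3}(x) = x + x^3, returned as an integer mod 3."""
    s = add(x, power(x, P))
    assert s[1] == 0  # the trace lies in the subfield F_3
    return s[0]

F9 = list(product(range(P), repeat=2))

def degenerate_elements():
    """All b in F_9 with Tr(a*b) = 0 for every a in F_9."""
    return [b for b in F9 if all(tr(mul(a, b)) == 0 for a in F9)]

print(sorted({tr(x) for x in F9}))  # values taken by the trace map
print(degenerate_elements())        # only the zero element is expected
```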

Regarding the necessity of the additional assumption in Theorem A1.7(ii), see Exam-
ple A4.3.
The matrix representation gives the impression of making the arithmetic of field extensions
more concrete or facilitating implementation. On the contrary, it is not practical
for implementation (since it requires storing $n^2$ matrix entries for each element of $E$, rather
than $n$ in the usual representation). For computer implementation, polynomial arithmetic
(modulo an irreducible polynomial) is still the best. But the matrix representation is surprisingly
useful as a theoretical device for explaining certain properties of field extensions.
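For example, consider the ring $R = F^{n \times n}$, the ring of $n \times n$ matrices over $F$;
and let $R^{m \times m}$ be the ring of $m \times m$ matrices over $R$. Then we have the isomorphism
$R^{m \times m} \cong F^{mn \times mn}$ which, although not difficult to prove, is still rather subtle and somewhat
surprising. And replacing $R$ by the subring $S = \{M(\alpha) : \alpha \in E\} \subseteq R$ using
the matrix representation of an extension $E \supseteq F$ of degree $n$ as above, the isomorphism
$E \cong S \subseteq R$ induces an isomorphism between the ring $E^{m \times m}$ of $m \times m$ matrices over $E$, and
the subring $M(E^{m \times m}) \subseteq F^{mn \times mn}$ defined by replacing each entry of a matrix $A \in E^{m \times m}$
by its matrix representation in $S$:

$$A = \underbrace{\begin{pmatrix} \alpha_{11} & \alpha_{12} & \cdots & \alpha_{1m} \\ \alpha_{21} & \alpha_{22} & \cdots & \alpha_{2m} \\ \vdots & \vdots & \ddots & \vdots \\ \alpha_{m1} & \alpha_{m2} & \cdots & \alpha_{mm} \end{pmatrix}}_{\text{an } m \times m \text{ matrix over } E} \longmapsto M(A) = \underbrace{\begin{pmatrix} M(\alpha_{11}) & M(\alpha_{12}) & \cdots & M(\alpha_{1m}) \\ M(\alpha_{21}) & M(\alpha_{22}) & \cdots & M(\alpha_{2m}) \\ \vdots & \vdots & \ddots & \vdots \\ M(\alpha_{m1}) & M(\alpha_{m2}) & \cdots & M(\alpha_{mm}) \end{pmatrix}}_{\text{an } mn \times mn \text{ matrix over } F}$$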

This map preserves more than the ring operations ‘+’ and ‘×’; it also respects traces and
determinants: for all $A \in E^{m \times m}$,

$$\mathrm{Tr}_{E/F}(\mathrm{tr}\, A) = \mathrm{tr}\, M(A) \quad \text{and} \quad N_{E/F}(\det A) = \det M(A).$$

(We continue to distinguish the usual matrix trace ‘tr’ and the trace ‘Tr’ for field extensions
using lower and upper case, respectively. Note the necessity of using the norm and trace
maps $E \to F$ since matrices on the left are over $E$; matrices on the right are over $F$.) The
first formula is easy to see by adding diagonal entries on both sides; the second formula
can be proved by first verifying it for elementary matrices $A \in E^{m \times m}$, then using the
multiplicative property to extend to the general case.
Now given a tower of finite extensions $K \supseteq E \supseteq F$, we have three trace maps and
three norm maps

$$K \xrightarrow{\ \mathrm{Tr}_{K/E}\ } E \xrightarrow{\ \mathrm{Tr}_{E/F}\ } F, \quad \mathrm{Tr}_{K/F} : K \to F; \qquad K \xrightarrow{\ N_{K/E}\ } E \xrightarrow{\ N_{E/F}\ } F, \quad N_{K/F} : K \to F.$$

The transitivity of norm and trace maps is the assertion that these diagrams commute:

Theorem A1.8. For a tower $K \supseteq E \supseteq F$ as above, $\mathrm{Tr}_{K/F} = \mathrm{Tr}_{E/F} \circ \mathrm{Tr}_{K/E}$ and
$N_{K/F} = N_{E/F} \circ N_{K/E}$.

Proof. Use the observations above, restricting the matrix A ∈ E m×m to lie in the matrix
representation of K, where m = [K : E].

When representing a finite field extension E ⊇ F as a ring of n × n matrices over F ,


sometimes we are led to consider the full characteristic polynomial of each matrix M (α),
α ∈ E (i.e. rather than just the trace and norm, which are obtained from just two coeffi-
cients in this polynomial). Here we may use

Theorem A1.9. Let $K \supseteq F$ be a finite extension of fields, and let $\alpha \in K$. Denote
$E = F[\alpha]$, $n = [E : F]$ and $m = [K : E]$ so that $[K : F] = mn$. Multiplication by $\alpha$
gives $F$-linear maps $E \to E$ and $K \to K$ with corresponding matrix representations
$M_E(\alpha) \in F^{n \times n}$ and $M_K(\alpha) \in F^{mn \times mn}$ respectively. The characteristic polynomial
of $M_K(\alpha)$ is $\det(xI - M_K(\alpha)) = h(x)^m$ where $h(x) = \det(xI - M_E(\alpha))$. Moreover,
$h(x)$ is the minimal polynomial of $\alpha$ over $F$.

Proof. We may use $\{1, \alpha, \alpha^2, \ldots, \alpha^{n-1}\}$ as a basis for $E$ over $F$, and let $\{\beta_1, \beta_2, \ldots, \beta_m\}$
be a basis for $K$ over $E$ (as in the proof of Theorem A1.1). The matrix of $T_\alpha$ with respect
to the basis $\{\alpha^j \beta_i : 1 \leq i \leq m,\ 0 \leq j \leq n-1\}$ is $M_K(\alpha) = I_m \otimes M_E(\alpha)$ where $M_E(\alpha)$ is the $n \times n$
companion matrix of $h(x)$. The result follows.

Corollary A1.10. Under the hypotheses of Theorem A1.9, $\mathrm{Tr}_{K/F}\,\alpha = m\, \mathrm{Tr}_{E/F}\,\alpha$
and $N_{K/F}\,\alpha = (N_{E/F}\,\alpha)^m$.
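
As a concrete check of Corollary A1.10, take $F = \mathbb{Q}$, $\alpha = 1 + \sqrt{2}$, $E = \mathbb{Q}[\sqrt{2}]$ and $K = \mathbb{Q}[\sqrt{2}, \sqrt{3}]$, so that $n = m = 2$. The Python sketch below is an illustration with our own encoding: elements of $K$ are coordinate lists with respect to the basis $(1, \sqrt{2}, \sqrt{3}, \sqrt{6})$, and the matrix $M_K(\alpha)$ is built directly from the multiplication rules rather than from the theorem.

```python
# TABLE[i][j] = (c, k) means e_i * e_j = c * e_k for the basis
# e = (1, sqrt2, sqrt3, sqrt6) of K = Q[sqrt2, sqrt3] (our own encoding).
TABLE = [
    [(1, 0), (1, 1), (1, 2), (1, 3)],
    [(1, 1), (2, 0), (1, 3), (2, 2)],
    [(1, 2), (1, 3), (3, 0), (3, 1)],
    [(1, 3), (2, 2), (3, 1), (6, 0)],
]

def mul(x, y):
    """Product of two elements of K given by coordinate lists of length 4."""
    out = [0] * 4
    for i in range(4):
        for j in range(4):
            c, k = TABLE[i][j]
            out[k] += c * x[i] * y[j]
    return out

def matrix_of(alpha):
    """Matrix of x -> alpha*x; column j holds the coordinates of alpha*e_j."""
    cols = [mul(alpha, [int(i == j) for i in range(4)]) for j in range(4)]
    return [[cols[j][i] for j in range(4)] for i in range(4)]

def det(m):
    """Determinant by cofactor expansion along the first row."""
    if len(m) == 1:
        return m[0][0]
    return sum((-1) ** j * m[0][j] * det([row[:j] + row[j + 1:] for row in m[1:]])
               for j in range(len(m)))

def tr(m):
    return sum(m[i][i] for i in range(len(m)))

MK = matrix_of([1, 1, 0, 0])   # alpha = 1 + sqrt2 acting on K
ME = [[1, 2], [1, 1]]          # alpha acting on E with basis (1, sqrt2)
print(tr(MK), 2 * tr(ME))      # Tr_{K/F} alpha versus m * Tr_{E/F} alpha
print(det(MK), det(ME) ** 2)   # N_{K/F} alpha versus (N_{E/F} alpha)^m
```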
Appendix A2: Polynomials and Irreducibility

Every finitely generated additive subgroup of Q is generated by a single element. This


says that if r1 , . . . , rn are rational numbers, then there exists r ∈ Q such that

Zr1 + Zr2 + · · · + Zrn = Zr.

We will assume $r_1, \ldots, r_n$ are not all zero (otherwise one clearly takes $r = 0$). Now to
find $r$, first find the smallest positive integer $b$ such that $b r_i \in \mathbb{Z}$ for all $i$ (so $b$ is the least
common denominator). Then take $a = \gcd(b r_1, \ldots, b r_n)$. Recall that there exist integers
$k_1, \ldots, k_n$ such that

$$k_1 \cdot b r_1 + k_2 \cdot b r_2 + \cdots + k_n \cdot b r_n = a$$

and this is the least positive element in $\mathbb{Z} b r_1 + \cdots + \mathbb{Z} b r_n$. Dividing both sides by $b$, we
get $r := \frac{a}{b}$ as the least positive element in the additive subgroup $\mathbb{Z} r_1 + \cdots + \mathbb{Z} r_n \subset \mathbb{Q}$.
Again assuming $r_1, \ldots, r_n$ are not all zero, the additive subgroup $\mathbb{Z} r_1 + \cdots + \mathbb{Z} r_n \subset \mathbb{Q}$
is infinite cyclic. Denote by $r = \mathrm{wt}(r_1, \ldots, r_n) \in \mathbb{Q}$ its unique positive generator (so that
$\pm r$ are the two choices of generator). Also for any nonzero polynomial $f(x) = a_0 + a_1 x +
\cdots + a_n x^n \in \mathbb{Q}[x]$, define the weight of $f(x)$ by

$$\mathrm{wt}\, f = \mathrm{wt}\, f(x) = \mathrm{wt}(a_0, a_1, \ldots, a_n).$$
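
The recipe above (least common denominator, then a gcd) translates directly into code. The following Python sketch is an illustration only, with function names of our own choosing; polynomials are stored as coefficient lists $[a_0, a_1, \ldots, a_n]$, and the last lines compare $\mathrm{wt}(fg)$ with $\mathrm{wt}\, f \cdot \mathrm{wt}\, g$ on one sample pair.

```python
from fractions import Fraction
from functools import reduce
from math import gcd

def wt(values):
    """Positive generator of Z*r_1 + ... + Z*r_n for rationals r_i, not all zero."""
    rs = [Fraction(v) for v in values]
    assert any(rs), "the r_i must not all be zero"
    # b = least common denominator of the r_i
    b = reduce(lambda x, y: x * y // gcd(x, y), (r.denominator for r in rs), 1)
    # a = gcd of the integers b*r_i
    a = reduce(gcd, (abs(int(r * b)) for r in rs))
    return Fraction(a, b)

def poly_mul(f, g):
    """Product of two polynomials given as coefficient lists [a_0, a_1, ...]."""
    out = [Fraction(0)] * (len(f) + len(g) - 1)
    for i, x in enumerate(f):
        for j, y in enumerate(g):
            out[i + j] += Fraction(x) * Fraction(y)
    return out

f = [Fraction(2, 3), Fraction(4, 3)]   # 2/3 + (4/3)x
g = [Fraction(3, 2), Fraction(3)]      # 3/2 + 3x
print(wt([Fraction(1, 2), Fraction(1, 3)]))
print(wt(f), wt(g), wt(poly_mul(f, g)))
```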

Lemma A2.1. Suppose f (x), g(x), h(x) ∈ Q[x] are nonzero polynomials.
(i) f (x) ∈ Z[x] iff wt f ∈ Z; and in this case, the weight of f is simply the greatest
common divisor of its coefficients.
(ii) wt(f g) = wt f · wt g.
(iii) Assume f (x) = g(x)h(x). If at least two of the polynomials f, g, h are monic
with integer coefficients, then so is the third.

Proof. (i) Let a0 , a1 , . . . , an be the coefficients in f (x); and let a = wt f . Since Za =


Za0 + Za1 + · · · + Zan , clearly a ∈ Z iff every ai ∈ Z. The remaining assertion uses only
well-known properties of ideals in Z, using the extended Euclidean algorithm (see also the
explanations above).
(ii) Let $r = \mathrm{wt}\, f$, $s = \mathrm{wt}\, g$, $t = \mathrm{wt}(fg)$ so that

$$f(x) = r u_f(x), \quad g(x) = s u_g(x), \quad f(x)g(x) = t u_{fg}(x)$$

where each of the three polynomials $u_f(x), u_g(x), u_{fg}(x) \in \mathbb{Z}[x]$ has weight 1. Reducing
the fraction $\frac{rs}{t} \in \mathbb{Q}$ to lowest terms as $\frac{rs}{t} = \frac{a}{b}$ where $a, b$ are relatively prime positive
integers, we obtain

(A2.2) $\quad b\, u_{fg}(x) = a\, u_f(x) u_g(x).$

If $b > 1$ then there exists a prime $p$ dividing $b$, with $p \nmid a$. Reducing both sides of
(A2.2) modulo $p$, we find a product of two nonzero polynomials in $\mathbb{F}_p[x]$ equal to the zero
polynomial. This is a contradiction, since $F[x]$ has no zero divisors (by comparing leading
terms on both sides) for any field $F$. This shows that $b = 1$. Now the right hand side of
(A2.2) clearly has weight divisible by $a$. Since the left hand side has weight 1, we obtain
$a = 1$. This gives $rs = t$ as required.
(iii) If g(x), h(x) are monic with integer coefficients, then clearly so is their product.
Now suppose f (x), g(x) ∈ Z[x] are monic with f (x) = g(x)h(x). Since wt f = wt g = 1,
we have wt h = 1 by (ii); and then h(x) ∈ Z[x] by (i). By comparing leading terms, h(x)
is also monic.

In the light of Theorem A1.2, it is useful to have tests for irreducibility of polynomials.
In the case of number fields, the most useful such test is the following.

Theorem A2.3. Let f (x) ∈ Z[x]. Then f (x) is irreducible in Q[x] iff it is irreducible
in Z[x].

Proof. Any nontrivial factorization $f(x) = f_1(x) f_2(x)$, with nonconstant factors $f_i(x) \in
\mathbb{Z}[x]$, gives a nontrivial factorization in $\mathbb{Q}[x]$. For the converse, let $f(x) \in \mathbb{Z}[x]$ and suppose
that $f(x)$ is reducible in $\mathbb{Q}[x]$. We may assume $\mathrm{wt}\, f(x) = 1$; otherwise divide $f(x)$
by its weight. By assumption, $f(x) = f_1(x) f_2(x)$ where each of the factors $f_i(x) \in \mathbb{Q}[x]$
has degree at least 1. Now $f_i(x) = r_i u_i(x)$ where $r_i = \mathrm{wt}\, f_i(x)$ and $u_i(x) \in \mathbb{Z}[x]$ has weight 1. By
Lemma A2.1, $r_1 r_2 = \mathrm{wt}\, f(x) = 1$ and $\mathrm{wt}\big(u_1(x) u_2(x)\big) = \mathrm{wt}\, u_1(x)\, \mathrm{wt}\, u_2(x) = 1$, so $f(x) = u_1(x) u_2(x)$ where
each of the factors $u_i(x) \in \mathbb{Z}[x]$ has degree at least 1.

The following criterion is more specialized but sometimes useful.

Theorem A2.4 (Eisenstein’s Criterion). Let $f(x) = a_0 + a_1 x + \cdots + a_n x^n \in
\mathbb{Z}[x]$ where $a_0, a_1, \ldots, a_{n-1}$ are all divisible by some prime $p$ which does not divide $a_n$.
Suppose further that $p^2$ does not divide $a_0$. Then $f(x)$ is irreducible in $\mathbb{Z}[x]$ (and
hence also in $\mathbb{Q}[x]$).

Proof. Supposing that $f(x)$ is reducible in $\mathbb{Z}[x]$, then $f(x) = g(x)h(x)$ where $g(x) \in \mathbb{Z}[x]$
has leading term $b x^k$, $h(x) \in \mathbb{Z}[x]$ has leading term $c x^{n-k}$ with $1 \leq k \leq n-1$; and $bc = a_n$
which is not divisible by $p$. Reducing mod $p$ gives a factorization of $a_n x^n \pmod p$ in $\mathbb{F}_p[x]$.
Since $\mathbb{F}_p$ is a field, $\mathbb{F}_p[x]$ has unique factorization; and so after reduction mod $p$, $g(x)$ and
$h(x)$ must reduce to $b x^k$ and $c x^{n-k} \pmod p$ respectively. This means that the original
polynomials $g(x), h(x) \in \mathbb{Z}[x]$ must both have constant term divisible by $p$. But this means
that $f(x) = g(x)h(x)$ must have constant term divisible by $p^2$, a contradiction.
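
The hypotheses of Eisenstein’s Criterion are simple divisibility conditions, so they are easy to test mechanically. The Python sketch below is an illustration with a function name of our own choosing. Note that a return value of False only means the criterion does not apply for that prime; it says nothing about reducibility.

```python
def eisenstein_applies(coeffs, p):
    """coeffs = [a_0, a_1, ..., a_n] with n >= 1, and p a prime.

    True iff p divides a_0, ..., a_{n-1}, p does not divide a_n,
    and p^2 does not divide a_0.
    """
    assert len(coeffs) >= 2, "need a polynomial of degree at least 1"
    *lower, lead = coeffs
    return (lead % p != 0
            and all(a % p == 0 for a in lower)
            and coeffs[0] % (p * p) != 0)

# ((x+1)^5 - 1)/x = x^4 + 5x^3 + 10x^2 + 10x + 5, a shift of 1 + x + x^2 + x^3 + x^4
shifted = [5, 10, 10, 5, 1]
print(eisenstein_applies(shifted, 5))     # hypotheses hold at p = 5
print(eisenstein_applies([4, 0, 1], 2))   # x^2 + 4: fails since 4 divides a_0
```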
Appendix A3: Algebraic Integers

We recommend [Sa] for further details on algebraic integers.


A complex number θ ∈ C is algebraic if it is algebraic over Q, i.e. if it is a root of
some nonzero polynomial f (x) ∈ Q[x]. Without loss of generality, f (x) ∈ Z[x]; otherwise
multiply f (x) by the least common denominator of its coefficients. But the resulting
polynomial f (x) ∈ Z[x] is not necessarily monic.

Theorem A3.1. Let θ ∈ C be algebraic, and let m(x) ∈ Q[x] be its minimal
polynomial. Then the following conditions are equivalent.
(i) m(x) ∈ Z[x].
(ii) θ is a root of some monic polynomial with integer coefficients.
(iii) $\mathbb{Z}[\theta]$ is finitely generated as an additive group (or $\mathbb{Z}$-submodule of $\mathbb{C}$).
(iv) There is a chain of subrings $\mathbb{Z}[\theta] \subseteq R \subset \mathbb{C}$ such that $R$ is finitely generated as
an additive group (or $\mathbb{Z}$-submodule of $\mathbb{C}$).

Proof. We will prove (i)⇔(ii)⇒(iii)⇒(iv)⇒(ii). Obviously (i) implies (ii). Conversely,
suppose $f(\theta) = 0$ where $f(x) \in \mathbb{Z}[x]$ is monic, and observe that $f(x) = m(x)h(x)$ for some
$h(x) \in \mathbb{Q}[x]$. As in Appendix A2, let $r = \mathrm{wt}\, m(x)$ and $s = \mathrm{wt}\, h(x)$ so that

$$m(x) = r u_m(x), \quad h(x) = s u_h(x)$$

where $u_m(x), u_h(x) \in \mathbb{Z}[x]$ have weight 1. Since $f(x) \in \mathbb{Z}[x]$ is monic, $\mathrm{wt}\, f(x) = 1$. By
Lemma A2.1, $rs = 1$ so

$$f(x) = r u_m(x)\, s u_h(x) = u_m(x) u_h(x).$$

Since $u_m(x) = \frac{1}{r} m(x)$ has positive leading term $\frac{1}{r}$, comparing leading coefficients on the
left and right (these being integers) gives $r = s = 1$. In particular, $m(x) = u_m(x) \in \mathbb{Z}[x]$.
This gives (i).
Now suppose (ii) holds. There exist $n \geq 1$ and integers $a_0, a_1, \ldots, a_{n-1} \in \mathbb{Z}$ such that

$$\theta^n + a_{n-1} \theta^{n-1} + \cdots + a_1 \theta + a_0 = 0.$$

In this case, the elements $1, \theta, \theta^2, \ldots, \theta^{n-1}$ generate $\mathbb{Z}[\theta]$ as an additive group. To see this,
note that the additive subgroup generated by $1, \theta, \ldots, \theta^{n-1}$ is

$$A := \mathbb{Z} + \mathbb{Z}\theta + \mathbb{Z}\theta^2 + \cdots + \mathbb{Z}\theta^{n-1} \subseteq \mathbb{Z}[\theta].$$

Our hypothesis shows that $\theta^n \in A$. Multiplying both sides by $\theta$ yields $\theta^{n+1} \in A$. Proceeding
inductively, $\theta^j \in A$ for all $j \geq 0$, and so $A = \mathbb{Z}[\theta]$, which gives (iii).
It is obvious that (iii) implies (iv). Finally suppose (iv) holds, and let $\alpha_1, \alpha_2, \ldots, \alpha_n \in
R$ such that

$$R = \mathbb{Z}\alpha_1 + \mathbb{Z}\alpha_2 + \cdots + \mathbb{Z}\alpha_n .$$

Denote by $T : R \to R$ the $\mathbb{Z}$-module homomorphism (i.e. homomorphism of additive
groups) defined by $\alpha \mapsto \theta\alpha$. There exist $a_{ij} \in \mathbb{Z}$ such that

$$T(\alpha_i) = \theta \alpha_i = \sum_{j=1}^{n} a_{ij} \alpha_j .$$

(In general the choice of coefficients $a_{ij} \in \mathbb{Z}$ is not unique; however, this point does not
affect our argument.) Then $f(T) = 0$ where $f(x) = \det(xI - A)$, $A = \big[ a_{ij} : 1 \leq i, j \leq n \big]$.
Clearly $f(T) : R \to R$ is the $\mathbb{Z}$-module homomorphism (i.e. homomorphism of additive
groups) $\alpha \mapsto f(\theta)\alpha = f(T)\alpha = 0$. Since $R$ has no zero divisors, this implies that $f(\theta) = 0$.
But $f(x) \in \mathbb{Z}[x]$ is monic by construction, so (ii) follows.

A number θ ∈ C is integral (and θ is an algebraic integer, or simply an integer)


if the equivalent conditions of Theorem A3.1 hold. We denote by I the set of all algebraic
integers, also called the ring of algebraic integers, as justified by part (i) of the following.
To distinguish the ‘ordinary’ integers, elements of Z are called rational integers, and this
terminology is justified by part (ii) of the following.

Theorem A3.2. (i) If α, β ∈ I then α ± β, αβ ∈ I. Thus I ⊂ A is a subring.


(ii) The algebraic integers in Q are exactly the elements of Z. That is, I ∩ Q = Z.
(iii) Every element θ ∈ A satisfies kθ ∈ I for some positive integer k. In particular,
A is the field of quotients of I; and the abelian group quotient A/I is an infinite
torsion group (i.e. every element has finite order).

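Proof. (i) Let $m$ and $n$ be the degrees of $\alpha$ and $\beta$ over $\mathbb{Q}$, respectively. Then
$\mathbb{Z}[\alpha, \beta] = \sum_{i=0}^{m-1} \sum_{j=0}^{n-1} \mathbb{Z} \alpha^i \beta^j$. Since $\mathbb{Z}[\alpha + \beta] \subseteq \mathbb{Z}[\alpha, \beta]$ where $\mathbb{Z}[\alpha, \beta]$ is finitely generated, $\alpha + \beta \in I$
by Theorem A3.1. The same argument holds for $\alpha - \beta$ and $\alpha\beta$.
(ii) The minimal polynomial of $r \in \mathbb{Q}$ over $\mathbb{Q}$ is $m(x) = x - r$. Use the characterization
of algebraic integers given in Theorem A3.1(i).
(iii) Let $\theta \in A$ be a root of $m(x) = x^n + a_{n-1} x^{n-1} + \cdots + a_1 x + a_0 \in \mathbb{Q}[x]$. Choose
a positive integer $k$ such that $k a_i \in \mathbb{Z}$ for all $i$. Then $\alpha = k\theta$ is a root of

$$k^n m\big(\tfrac{x}{k}\big) = x^n + k a_{n-1} x^{n-1} + k^2 a_{n-2} x^{n-2} + \cdots + k^{n-1} a_1 x + k^n a_0 \in \mathbb{Z}[x]$$

so that $\alpha \in I$.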

A number field is a finite extension field E ⊇ Q. The ring of integers of E is the


subring O = OE = I ∩ E. An argument similar to the above shows that E is the quotient
field of O. The following is extremely useful in determining the ring of integers OE in an
extension E ⊇ Q, as the subsequent examples show.

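Theorem A3.3. Let $E \supseteq \mathbb{Q}$ be a number field, and let $\alpha \in \mathcal{O}_E$. Denote by $T_\alpha :
E \to E$ the $\mathbb{Q}$-linear transformation $u \mapsto \alpha u$. Then the characteristic polynomial of $T_\alpha$
is the monic polynomial $\det(xI - T_\alpha) = h(x)^{[E : \mathbb{Q}[\alpha]]} \in \mathbb{Z}[x]$ where $h(x)$ is the minimal
polynomial of $\alpha$ over $\mathbb{Q}$. In particular, $\mathrm{Tr}_{E/\mathbb{Q}}\,\alpha = \mathrm{tr}\, T_\alpha \in \mathbb{Z}$ and $N_{E/\mathbb{Q}}\,\alpha = \det T_\alpha \in \mathbb{Z}$.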

Proof. We may use $\{1, \alpha, \alpha^2, \ldots, \alpha^{n-1}\}$ as a basis for $F = \mathbb{Q}[\alpha]$, and let $\{\beta_1, \beta_2, \ldots, \beta_m\}$
be a basis for $E$ over $F$; thus $n = [F : \mathbb{Q}]$, $m = [E : F]$ and $mn = [E : \mathbb{Q}]$ (see the proof of
Theorem A1.1). The matrix of $T = T_\alpha$ with respect to the basis $\{\alpha^j \beta_i : 1 \leq i \leq m,\ 0 \leq j \leq n-1\}$ is
$I_m \otimes M$ where $M$ is the $n \times n$ companion matrix of $h(x)$. The result follows.

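Theorem A3.4. Let $E \supseteq \mathbb{Q}$ be an extension of degree $n$. Then $E$ has a $\mathbb{Q}$-basis
$\{\theta_1, \ldots, \theta_n\}$ consisting of algebraic integers, such that $\{\theta_1, \ldots, \theta_n\}$ is also a base for
$\mathcal{O}_E$ over $\mathbb{Z}$, i.e.

$$\mathcal{O}_E = \mathbb{Z}\theta_1 + \mathbb{Z}\theta_2 + \cdots + \mathbb{Z}\theta_n .$$

In other words, $\mathcal{O}_E$ is a free $\mathbb{Z}$-module of rank $n = [E : \mathbb{Q}]$.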

Proof. Let {α1 , . . . , αn } be a basis for E over Q. Without loss of generality, each αi ∈ O;
otherwise, by Theorem A3.2(iii), replace αi by a positive integer multiple thereof. Consider
the free abelian group (i.e. Z-submodule of E) generated by our basis:

L = Zα1 + Zα2 + · · · + Zαn ⊆ O.

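Using the nondegenerate bilinear form in Theorem A1.7, there is another basis $\{\beta_1, \ldots, \beta_n\}$
of $E$ over $\mathbb{Q}$, dual to the first basis, such that $\mathrm{Tr}_{E/\mathbb{Q}}(\alpha_i \beta_j) = \delta_{ij}$. Now given $\theta \in \mathcal{O}$, we may
express $\theta$ as a linear combination of the second basis as $\theta = \sum_{j=1}^{n} b_j \beta_j$ for some $b_j \in \mathbb{Q}$.
Since $\alpha_i, \theta \in \mathcal{O}$ for each $i$, we have $\alpha_i \theta \in \mathcal{O}$ and so $b_i = \mathrm{Tr}_{E/\mathbb{Q}}(\alpha_i \theta) \in \mathbb{Z}$. This shows that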

O ⊆ Zβ1 + Zβ2 + · · · + Zβn ,

so O is a free abelian group of rank at most n. Recalling that O has a subgroup L which
is free abelian of rank exactly n, this forces O to be free abelian also of rank n.

Now let $E \supseteq \mathbb{Q}$ be an extension of degree $n$, with ring of integers $\mathcal{O}$. By Theorem A3.4,
there is a basis $\{\theta_1, \ldots, \theta_n\}$ for $E$ over $\mathbb{Q}$ which also generates $\mathcal{O}$ as a $\mathbb{Z}$-module. Define
the discriminant of the extension $E \supseteq \mathbb{Q}$ to be

$$\mathrm{disc}\big(\theta_1, \ldots, \theta_n\big) = \det\big[ \mathrm{Tr}_{E/\mathbb{Q}}(\theta_i \theta_j) : 1 \leq i, j \leq n \big].$$

This determinant is a nonzero integer since it is the determinant of the Gram matrix of our nondegenerate
bilinear form $[a, b] = \mathrm{Tr}_{E/\mathbb{Q}}(ab)$ with respect to our base. Now consider another base
$\{\theta_1', \ldots, \theta_n'\}$ for $\mathcal{O}$ over $\mathbb{Z}$, so that $\theta_i' = \sum_{j=1}^{n} a_{ij} \theta_j$ and the matrix $A = \big[ a_{ij} : 1 \leq i, j \leq n \big]$
has integer entries. The inverse matrix $A^{-1}$ expressing the original base of $\theta_i$’s in terms of
the $\theta_j'$’s must similarly have integer entries; and so $\det A = \pm 1$. Then

$$\mathrm{disc}\big(\theta_1', \ldots, \theta_n'\big) = \det\big[ \mathrm{Tr}_{E/\mathbb{Q}}(\theta_i' \theta_j') : 1 \leq i, j \leq n \big] = \det(A)^2\, \mathrm{disc}\big(\theta_1, \ldots, \theta_n\big) = \mathrm{disc}\big(\theta_1, \ldots, \theta_n\big).$$

So the discriminant of a finite extension over $\mathbb{Q}$ is well-defined, independent of the choice of
base. (For more general finite extensions $E \supseteq F$, however, the discriminant is well-defined
only up to multiplication by the square of a unit.)


Example A3.5: Quadratic Fields. Consider a quadratic extension $E = \mathbb{Q}[\sqrt{d}] \supset \mathbb{Q}$ where $d \ne 1$
is a squarefree integer (i.e. up to sign, a product of distinct primes). The extension is real quadratic if $d \ge 2$;
or imaginary quadratic if $d \le -1$. The matrix of $T_\alpha$, where $\alpha = a + b\sqrt{d} \in E$ with $a, b \in \mathbb{Q}$, with respect to the
basis $\{1, \sqrt{d}\}$ is $\begin{bmatrix} a & db \\ b & a \end{bmatrix}$. In order that $\alpha \in O_E$, Theorem A3.4 requires that both $\mathrm{Tr}_{E/\mathbb{Q}}\,\alpha = 2a$ and
$N_{E/\mathbb{Q}}\,\alpha = a^2 - db^2$ are integers. We have two cases. (Note that $d \not\equiv 0 \bmod 4$ since d is naturally
assumed to be squarefree.)
(i) When d ≡ 2 or 3 mod 4, this simplifies to a, b ∈ Z and we have $O \subseteq \mathbb{Z}[\sqrt{d}]$; and the reverse
containment is clear, so $O = \mathbb{Z}[\sqrt{d}]$ has base $\{1, \sqrt{d}\}$. The discriminant is $D := \det\begin{bmatrix} 2 & 0 \\ 0 & 2d \end{bmatrix} = 4d$.
(ii) When d ≡ 1 mod 4, we instead have $a = \tfrac{u}{2}$ and $b = \tfrac{v}{2}$ where u, v ∈ Z with u ≡ v mod 2, so
$O \subseteq \mathbb{Z}[\theta]$ where $\theta = \tfrac12(1 + \sqrt{d})$, and once again equality holds: $O = \mathbb{Z}[\theta]$ has base {1, θ}. In this case
the discriminant is $D = \det\begin{bmatrix} 2 & 1 \\ 1 & \frac{1+d}{2} \end{bmatrix} = d$.
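The two discriminant formulas of Example A3.5 can be checked mechanically. The following is a minimal sketch, not part of the original notes: it represents a + b√d as a pair (a, b) of rationals, and the helper names `mul`, `trace`, `integral_base` and `disc` are hypothetical names chosen for this illustration.

```python
from fractions import Fraction

def mul(x, y, d):
    """Multiply x = (a, b) and y = (c, e), each standing for a + b*sqrt(d)."""
    a, b = x
    c, e = y
    return (a * c + d * b * e, a * e + b * c)

def trace(x):
    """The trace of a + b*sqrt(d) over Q is 2a."""
    return 2 * x[0]

def integral_base(d):
    """Base {1, sqrt(d)} in case (i), or {1, (1 + sqrt(d))/2} in case (ii)."""
    one = (Fraction(1), Fraction(0))
    if d % 4 == 1:
        return [one, (Fraction(1, 2), Fraction(1, 2))]
    return [one, (Fraction(0), Fraction(1))]

def disc(base, d):
    """Determinant of the 2x2 Gram matrix [Tr(theta_i * theta_j)]."""
    g = [[trace(mul(x, y, d)) for y in base] for x in base]
    return g[0][0] * g[1][1] - g[0][1] * g[1][0]

for d in (-6, -3, -1, 2, 5, 7, 13):
    print(d, disc(integral_base(d), d))
```

For each squarefree d in the loop, the printed value should agree with 4d when d ≡ 2 or 3 mod 4, and with d when d ≡ 1 mod 4.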

As above, let E be a number field, and O its ring of integers. Denote by O× the
group of units (invertible elements) in O. By abuse of language, these are often called
the units of E. In the following, r denotes the number of embeddings of E in R (i.e. ring
monomorphisms E → R) and s is the number of pairs (under complex conjugation) of
embeddings E → C whose images do not lie in R; thus there are 2s such embeddings. The total number of embeddings of E in C is
r + 2s = [E : Q]. (This relation, and the following theorem, hold for any finite extension
E ⊇ Q, Galois or not.)

Theorem A3.6 (Dirichlet). A maximal free subgroup G < O× has rank r + s − 1.
Every unit α ∈ O× is uniquely factorizable as α = ζg where ζ is a root of unity, and
g ∈ G. Thus O× = U × G where U is finite cyclic and $G \cong \mathbb{Z}^{r+s-1}$.

Proof. See [Sa, p.60].

Example A3.7: The Rationals. Q has (r, s) = (1, 0) and all its units Z× = {±1} are roots of
unity, a cyclic group of order 2.

Example A3.8: An Imaginary Quadratic Field. The imaginary quadratic extension
$E = \mathbb{Q}[\sqrt{-3}] \supset \mathbb{Q}$ has (r, s) = (0, 1). Its ring of integers $O = \mathbb{Z}[\omega]$, $\omega = \zeta_3 = \tfrac12(-1 + \sqrt{-3})$, has a
group of units $O^\times = \{\pm 1, \pm\omega, \pm\omega^2\}$ which is cyclic of order 6. There are no units of infinite order,
as r + s − 1 = 0. By Example A3.5(ii), {1, ω} is a base for O (note that $\omega = \theta - 1$ where $\theta = \tfrac12(1+\sqrt{-3})$). For α = a + bω we have $T_\alpha = \begin{bmatrix} a & -b \\ b & a-b \end{bmatrix}$
with respect to our base; and $N_{E/\mathbb{Q}}\,\alpha = a^2 - ab + b^2$. To find units, we require integer solutions of
$N_{E/\mathbb{Q}}(a+b\omega) = a^2 - ab + b^2 = \tfrac34 a^2 + \tfrac14 (a-2b)^2 = 1$. The equation requires $|a| \le 1$, and a similar
argument gives $|b| \le 1$. After checking all nine pairs (a, b) satisfying these inequalities, we find only
six solutions of the Diophantine equation, viz. (a, b) ∈ {±(1, 0), ±(0, 1), ±(1, 1)} which gives the six
units listed above.
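The nine-pair search in Example A3.8 is easy to automate. The sketch below is an illustration only; it stores a + bω as the pair (a, b) and uses ω² = −1 − ω, and the function names `norm`, `mul` and `order` are hypothetical.

```python
def norm(a, b):
    """Norm of a + b*omega, where omega^2 + omega + 1 = 0."""
    return a * a - a * b + b * b

def mul(x, y):
    """Product in Z[omega], using omega^2 = -1 - omega."""
    (a, b), (c, e) = x, y
    return (a * c - b * e, a * e + b * c - b * e)

def order(u):
    """Multiplicative order of a unit u (only terminates for roots of unity)."""
    k, power = 1, u
    while power != (1, 0):
        power = mul(power, u)
        k += 1
    return k

# All pairs with |a|, |b| <= 1 and norm 1.
units = sorted((a, b) for a in (-1, 0, 1) for b in (-1, 0, 1) if norm(a, b) == 1)
print(units)
print({u: order(u) for u in units})
```

The second line of output lists the order of each unit; a unit of order 6, such as 1 + ω = −ω², generates the whole group.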


Example A3.9: A Real Quadratic Field. The real quadratic extension $E = \mathbb{Q}[\sqrt{7}] \supset \mathbb{Q}$ has
(r, s) = (2, 0). Its ring of integers $O = \mathbb{Z}[\sqrt{7}]$ has units $O^\times = \{\pm g^k : k \in \mathbb{Z}\}$ including two roots
of unity ±1 and the unit $g = 8 + 3\sqrt{7}$ which generates an infinite cyclic group (a free group on
r + s − 1 = 1 generator). The norm map $N = N_{E/\mathbb{Q}} : O \to \mathbb{Z}$, $a + b\sqrt{7} \mapsto a^2 - 7b^2$ is similarly useful
in verifying these claims; but we omit the details.
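Part of the omitted verification can be sketched by brute force: a unit a + b√d of Z[√d] has norm a² − db² = ±1, so one can search for the solution with the smallest b ≥ 1. This is only an illustration under the assumption that d > 0 is a nonsquare with d ≡ 2 or 3 mod 4; the function name `smallest_unit` and the search bound `max_b` are hypothetical.

```python
from math import isqrt

def smallest_unit(d, max_b=10**6):
    """Return (a, b, sign) with a, b >= 1, a^2 - d*b^2 = sign in {-1, 1},
    and b as small as possible; None if nothing is found below max_b."""
    for b in range(1, max_b):
        for sign in (-1, 1):
            a2 = d * b * b + sign
            a = isqrt(a2)
            if a * a == a2:
                return a, b, sign
    return None

print(smallest_unit(7))
```

For d = 7 the search stops at b = 3, which is consistent with g = 8 + 3√7 being the smallest unit exceeding 1.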

Two nonzero elements α, β ∈ O generate the same principal ideal, i.e. αO = βO, iff
β = uα for some u ∈ O× . In this case we say α and β are associates in O. Denote
by SE the set of all nonzero elements of OE which are not units. An element α ∈ SE is
reducible if α = βγ for some β, γ ∈ SE . If α is not reducible, it is irreducible (in O).
Assuming α, α′ ∈ SE are associates, then α is reducible iff α′ is. Every element in SE
is expressible as a finite product of irreducible elements; but this factorization is not in
general unique since any factorization α = π1 π2 · · · πk (with irreducible factors π1 , . . . , πk )
yields other such factorizations through permutations of the k factors, or through the
replacement of the irreducible factors by suitable associates (a process called migration
of units). We say OE (or, abusing language, E itself) has unique factorization, if every
element α ∈ SE factors into irreducible factors in an essentially unique way (i.e. up to
permutation of the factors, and migration of units). Not every ring of integers OE has
unique factorization (i.e. of elements). But OE always has unique factorization of ideals
(Theorem A3.10 below). When O is a principal ideal ring, this forces elements to also have
unique factorization; but since ideals in O are not necessarily principal, we do not always
obtain unique factorization of elements.
It might help here to keep in mind the hierarchy ED ⇒ PID ⇒ UFD ⇒ ID where
an integral domain (ID) is a commutative ring with identity having no zero divisors; a
unique factorization domain (UFD) is an integral domain with unique factorization
(of elements as product of irreducibles); a principal ideal domain (PID) is an integral
domain in which every ideal is principal; and a Euclidean domain (ED) is an integral
domain in which the ‘division algorithm’ holds. More about this appears at the end of
this Appendix. Since our interest focuses on the special case of the ring O of integers

in a number field, the hierarchy simplifies (see Theorem A3.14). In particular the rule
PID ⇒ UFD has a valid converse in the case of rings of integers, but not in the general
case; recall that Z[x] is a UFD with a nonprincipal ideal (2, x). We will postpone the
relevant theorem until after presenting some examples of rings of integers in a few specific
number fields. And before that, we need to review some terminology.
Recall that an ideal is an additive subgroup A ⊆ O such that OA ⊆ A, i.e. ra ∈ A
for all r ∈ O, a ∈ A. The sum and product of two ideals are the ideals defined by
A+B = {a+b : a ∈ A, b ∈ B};
AB = {finite sums of products ab with a ∈ A, b ∈ B}
= {a1 b1 +a2 b2 + · · · +ak bk : k ≥ 1, a1 , a2 , . . . , ak ∈ A, b1 , b2 , . . . , bk ∈ B}.
We often abbreviate (a) = aO ⊆ O for the principal ideal generated by an element
a ∈ O. More generally, the ideal generated by a list of elements a1 , . . . , ak ∈ O is
(a1 , a2 , . . . , ak ) := (a1 ) + (a2 ) + · · · + (ak ) = Oa1 + Oa2 + · · · + Oak ⊆ O.
Two elements a, b ∈ O generate the same ideal (a) = (b) iff a and b are associates. A
proper ideal P ⊂ O is prime if any of the following equivalent conditions is satisfied:
(i) Whenever ab ∈ P with a, b ∈ O, we must have a ∈ P or b ∈ P.
(ii) If P ⊇ AB where A, B ⊆ O are ideals, we have P ⊇ A or P ⊇ B.
(iii) The quotient ring O/P is an integral domain (i.e. it has no zero divisors).
If a nonzero principal ideal (π) ⊂ O is prime, then its generator π is irreducible; the converse
holds when O has unique factorization, but can fail otherwise (in Example A3.12 below, 2 is
irreducible although (2) = P² is not prime). A proper
ideal M ⊂ O is maximal if there is no proper ideal of O which strictly contains M;
equivalently, O/M is a field. So every maximal ideal is prime. The converse is not true
in general, but the ring of integers O of a number field is special in many ways including
this:

Theorem A3.10. Let E ⊇ Q be a finite extension with ring of integers $O = O_E$.
(i) The norm of an ideal A ⊆ O, defined as its index N(A) := |O/A|, is a positive
integer whenever A ≠ (0). The norm map is multiplicative: N(AB) = N(A) N(B)
for any two ideals A, B ⊆ O. The unique ideal of norm 1 is the ring O itself. The
norm of a principal ideal is the absolute value of the norm of its generator:
$N((a)) = |N_{E/\mathbb{Q}}(a)|$ for all nonzero a ∈ O.
(ii) Every nonzero prime ideal P of O is maximal. Such an ideal contains a unique rational prime
p ∈ P ∩ Z, and the quotient ring O/P is a finite field of order $q = p^f$ where
f ≥ 1.
(iii) Every nonzero ideal A ⊆ O factors as a product $A = P_1 P_2 \cdots P_k$ for some
(not necessarily distinct) prime ideals $P_i \subset O$. (An empty product of ideals is
simply O.) This factorization is unique up to permutation of the prime factors.
(iv) For each rational prime p, the ideal (p) = pO has prime factorization
$(p) = P_1^{e_1} P_2^{e_2} \cdots P_d^{e_d}$ where the prime factors $P_i \subset O$ are distinct; $e_i \ge 1$; $N(P_i) = p^{f_i}$;
and $e_1 f_1 + \cdots + e_d f_d = [E : \mathbb{Q}]$.

In (iv), each quotient field O/Pi is a residual field; the degree fi of its extension over Fp is
the residual degree; and the number of times ei that Pi divides (p) is the ramification
index of Pi . We say p ramifies in E if at least one of the indices satisfies ei > 1. We say
p remains prime if (p) = pO ⊂ O is prime; and p splits if there are d ≥ 2 distinct prime
factors. There are only finitely many primes which ramify (namely, those primes which
divide the discriminant). For Galois extensions, (iv) simplifies to (p) = (P1 P2 · · · Pd )e ,
i.e. all ramification indices coincide: ei = e.

Example A3.11: The Rational Integers. Z has unique factorization. Here the irreducible
elements have the form ±p where p is an ordinary prime (of course p and −p are associates) and the
corresponding prime ideals have the form (p) = pZ. All ideals are principal, and unique factorization
of elements is due to unique factorization of ideals; for example, (12) = (2)²(3) yields 12 = 2²·3.
Addition of ideals corresponds to taking greatest common divisors: (a1 ) + (a2 ) + · · · + (ak ) =
(a1 , a2 , . . . , ak ) = (d) where d = gcd(a1 , a2 , . . . , ak ).
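The identity (a1) + (a2) = (d) in Z rests on Bézout's identity: the extended Euclidean algorithm writes d = gcd(a1, a2) as an integer combination of a1 and a2, so d lies in (a1) + (a2), and the reverse inclusion is clear since d divides both generators. A minimal sketch follows (the function name `ext_gcd` is hypothetical):

```python
def ext_gcd(a, b):
    """Return (d, x, y) with d = gcd(a, b) >= 0 and a*x + b*y = d."""
    x0, y0, x1, y1 = 1, 0, 0, 1
    while b != 0:
        q = a // b
        a, b = b, a - q * b
        x0, x1 = x1, x0 - q * x1
        y0, y1 = y1, y0 - q * y1
    if a < 0:
        a, x0, y0 = -a, -x0, -y0
    return a, x0, y0

d, x, y = ext_gcd(12, 18)
print(d, x, y, 12 * x + 18 * y)
```

Ideals generated by more than two integers can be handled by applying the same step repeatedly.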

Example A3.12: An Imaginary Quadratic Extension. The imaginary quadratic extension
$E = \mathbb{Q}[\theta]$, $\theta = \sqrt{-6}$, does not have unique factorization. Its ring of integers is O = Z[θ] and its
units are O× = {±1}. The element 6 factors in two essentially different ways as 6 = 2·3 = (−θ)θ,
where the indicated factors are irreducible. These properties are easily verified using the norm map
$N = N_{E/\mathbb{Q}} : O \to \mathbb{Z}$, $a + b\theta \mapsto a^2 + 6b^2$, and the identity N(αβ) = N(α) N(β). If αβ = 1 in O, then
N(α) N(β) = 1. Since norms of elements in O are nonnegative integers, this forces N(α) = N(β) = 1;
and since the equation $a^2 + 6b^2 = 1$ yields (a, b) = (±1, 0) as its only integer solutions, we obtain
O× = {±1}. Supposing 2 = αβ where α, β ∈ O, then N(α) N(β) = N(2) = 4. Clearly the equation
$a^2 + 6b^2 = 2$ has no integer solutions; so one of the factors α, β has norm one and thus is a unit.
This proves that 2 is irreducible in O; and a similar argument shows that 3 and ±θ are irreducible
in O. Finally, since the only units are ±1, it is easy to see that our two factorizations of 6 in O are
essentially different.
The ideal (6) ⊂ O has norm 36, and its prime factorization is $(6) = P^2 Q^2$ where the prime ideals
P = (2, θ) and Q = (3, θ) have norm 2 and 3 respectively. Here $(2) = P^2$ has norm 4, $(3) = Q^2$ has
norm 9, and (θ) = PQ has norm 6. The extension has discriminant D = 4·(−6) = −24 (see Example A3.5(i));
and the primes 2 and 3 are the only primes that ramify (each with ramification index 2).
The rational prime 13 remains prime since $O/(13) \cong \mathbb{Z}[x]/(13, x^2+6) \cong \mathbb{F}_{13}[x]/(x^2+6) \cong \mathbb{F}_{169}$.
Here we use the fact that −6 is a nonsquare mod 13, i.e. $\left(\frac{-6}{13}\right) = -1$. By quadratic reciprocity, all
rational primes congruent to 13, 17, 19 or 23 mod 24 similarly remain prime in E.
The rational prime 11 splits as $(11) = P_{11}\overline{P}_{11}$ where $P_{11} = (11, 4+\theta)$ and $\overline{P}_{11} = (11, 7+\theta)$. This
follows from $O/11O \cong \mathbb{Z}[x]/(11, x^2+6) \cong \mathbb{F}_{11}[x]/(x^2+6) \cong \mathbb{F}_{11}[x]/(x-4) \oplus \mathbb{F}_{11}[x]/(x-7) \cong \mathbb{F}_{11} \oplus \mathbb{F}_{11}$.
By quadratic reciprocity, all primes congruent to 1, 5, 7 or 11 mod 24 split in this way.
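Two claims of Example A3.12 lend themselves to a quick numerical check: that no element of Z[√−6] has norm 2 or 3, and that the behaviour of an odd prime p is governed by the Legendre symbol (−6/p). The sketch below is illustrative only; it evaluates the Legendre symbol by Euler's criterion rather than by quadratic reciprocity, and the names `norm_values`, `legendre` and `behaviour` are hypothetical.

```python
def norm_values(limit):
    """All values a^2 + 6*b^2 <= limit taken by the norm on Z[sqrt(-6)]."""
    vals = set()
    a = 0
    while a * a <= limit:
        b = 0
        while a * a + 6 * b * b <= limit:
            vals.add(a * a + 6 * b * b)
            b += 1
        a += 1
    return vals

def legendre(n, p):
    """Legendre symbol (n/p) for an odd prime p, via Euler's criterion."""
    r = pow(n % p, (p - 1) // 2, p)
    return -1 if r == p - 1 else r

def behaviour(p):
    """Factorization type of an odd prime p in Z[sqrt(-6)], read off from (-6/p)."""
    return {1: "splits", -1: "remains prime", 0: "ramifies"}[legendre(-6, p)]

print(sorted(norm_values(10)))
for p in (3, 5, 7, 11, 13, 17, 19, 23):
    print(p, p % 24, behaviour(p))
```

The prime 2 is excluded because Euler's criterion requires an odd prime; its ramification is read off from the discriminant instead.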

Example A3.13: A Quartic Extension. Let E = Q[θ] where θ is a root of $f(x) = x^4 - x + 3$.
Since f(x) is irreducible over $\mathbb{F}_2$, it is irreducible over Z and hence over Q. It may be shown that
O = Z[θ] and that the quartic extension E ⊃ Q has discriminant $6885 = 3^4 \cdot 5 \cdot 17$; and the only
roots of unity in O are ±1. Since r+s−1 = 0+2−1 = 1 in Theorem A3.6, the unit group has
the form $O^\times = \{\pm g^k : k \in \mathbb{Z}\}$ for some g. Computation shows that we may take $g = \theta^2 + 2\theta + 2$,
$g^{-1} = -\theta^3 + \theta^2 - 1$.
The rational prime 2 remains prime in E since $O/2O \cong \mathbb{Z}[x]/(2, f(x)) \cong \mathbb{F}_2[x]/(x^4+x+1) \cong \mathbb{F}_{16}$
(Example 3.3). Its residual degree is 4. Similarly, 11, 13, 43, 53, 61, . . . remain prime.
The rational prime 3 ramifies as $(3) = P_{3a} P_{3b}^3$ where both distinct factors $P_{3a} = (3, \theta)$ and
$P_{3b} = (3, 2+\theta)$ have residual degree 1. This follows from $O/3O \cong \mathbb{Z}[x]/(3, f(x)) \cong \mathbb{F}_3[x]/(x(x+2)^3) \cong \mathbb{F}_3[x]/(x) \oplus \mathbb{F}_3[x]/((x+2)^3) \cong \mathbb{F}_3 \oplus S$
where $S \cong \mathbb{F}_3[x]/(x^3)$ is a local ring of order 27. Although the
residual degrees coincide ($f_1 = f_2 = 1$), the ramification indices $e_1 = 1$ and $e_2 = 3$ do not. This
points to the fact that the extension is not Galois.
The rational prime 17 ramifies as $(17) = P_{17}^2 P_{17}' P_{17}''$ where $P_{17} = (17, 13+\theta)$, $P_{17}' = (17, 10+\theta)$,
$P_{17}'' = (17, 15+\theta)$. Here $O/17O \cong \mathbb{Z}[x]/(17, f(x)) \cong \mathbb{F}_{17}[x]/((x+13)^2(x+10)(x+15)) \cong R \oplus \mathbb{F}_{17} \oplus \mathbb{F}_{17}$
where $R = \mathbb{F}_{17}[\varepsilon]/(\varepsilon^2)$ is the ring of dual numbers over $\mathbb{F}_{17}$, a local ring of order 289. Again the
ramification indices 2,1,1 do not all coincide.
The rational prime 5 ramifies as $(5) = P_5^2 P_5'$ where the distinct prime ideals $P_5 = (5, 1+\theta)$
and $P_5' = (5, 3+3\theta+\theta^2)$ have residual degrees 1,2 and ramification indices 2,1. Here
$O/5O \cong \mathbb{Z}[x]/(5, f(x)) \cong \mathbb{F}_5[x]/((x+1)^2(x^2+3x+3)) \cong R \oplus \mathbb{F}_{25}$ where R is the ring of dual numbers over $\mathbb{F}_5$.
The rational prime 7 splits as $(7) = P_7 P_7'$ where the residual degrees are 1,3 and both ramification
indices are 1. Here $O/7O \cong \mathbb{Z}[x]/(7, f(x)) \cong \mathbb{F}_7[x]/((x+2)(x^3+5x^2+4x+5)) \cong \mathbb{F}_7 \oplus \mathbb{F}_{343}$. We find a
similar behaviour at the primes 19, 23, 37, 59, . . . .
The rational prime 29 splits as $(29) = P_{29} P_{29}' P_{29}''$ with residual degrees 1,1,2 and ramification
indices 1,1,1. Here $O/29O \cong \mathbb{Z}[x]/(29, f(x)) \cong \mathbb{F}_{29}[x]/((x+3)(x+6)(x^2+20x+5)) \cong \mathbb{F}_{29} \oplus \mathbb{F}_{29} \oplus \mathbb{F}_{841}$.
We find a similar behaviour at the primes 31, 41, 47, . . . .
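The factorizations of f(x) = x⁴ − x + 3 modulo 3, 5, 7, 17 and 29 quoted in Example A3.13 can be confirmed by multiplying the factors out again. The sketch below is an illustration only: polynomials are coefficient lists with the constant term first, the names `poly_mul`, `product`, `has_root`, `check` and `CLAIMED` are hypothetical, and the root test only certifies irreducibility for factors of degree 2 or 3.

```python
def poly_mul(f, g, p):
    """Product of two polynomials over F_p (coefficients, lowest degree first)."""
    out = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            out[i + j] = (out[i + j] + a * b) % p
    return out

def product(factors, p):
    """Product of a list of polynomials over F_p."""
    result = [1]
    for f in factors:
        result = poly_mul(result, f, p)
    return result

def has_root(f, p):
    """True if f has a root in F_p."""
    return any(sum(c * pow(x, i, p) for i, c in enumerate(f)) % p == 0
               for x in range(p))

F = [3, -1, 0, 0, 1]  # f(x) = x^4 - x + 3

# Factorizations mod p as stated in Example A3.13.
CLAIMED = {
    3: [[0, 1]] + [[2, 1]] * 3,
    5: [[1, 1]] * 2 + [[3, 3, 1]],
    7: [[2, 1], [5, 4, 5, 1]],
    17: [[13, 1]] * 2 + [[10, 1], [15, 1]],
    29: [[3, 1], [6, 1], [5, 20, 1]],
}

def check(p):
    """Does the claimed factorization multiply out to f mod p?"""
    return product(CLAIMED[p], p) == [c % p for c in F]

for p in sorted(CLAIMED):
    print(p, check(p))
```

Calling `has_root` on the quadratic and cubic factors (for instance `has_root([3, 3, 1], 5)`) checks that they have no linear factor, which for these degrees means they are irreducible.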

The examples above illustrate, among other things, the existence of nonprincipal ideals
in exactly those cases where unique factorization (of elements) fails. This is no coincidence:

Theorem A3.14. Let O be the ring of integers in a number field. Then O is a
unique factorization domain if and only if O is a principal ideal domain.

As we have previously reminded the reader, for general integral domains, the UFD property
does not imply the PID property; an example is the ring Z[x] which is a UFD but the
ideal (2, x) is nonprincipal.
Proof of Theorem A3.14. Suppose O is a principal ideal domain, and suppose a, b ∈ O
such that ab is divisible by an irreducible element p. Since ab = pd for some d ∈ O,
(a)(b) = (p)(d). But ideals in O factor uniquely, and the nonzero ideal (p) ⊂ O is prime;
so (a) ⊆ (p) or (b) ⊆ (p), i.e. p divides a or p divides b.
For the converse, let A ⊆ O be an arbitrary ideal, and we must show that A is
principal. We may assume A is nonzero, so A = P1 P2 · · · Pk is a product of nonzero
prime ideals Pi . If each Pi is principal, so is A; thus we may assume A = P is itself a
nonzero prime ideal. Now P ∩ Z is a prime ideal in Z, so P ∩ Z = pZ for some rational
prime p. (Alternatively, O/P is a finite field, so $O/P \cong \mathbb{F}_q$ where $q = p^e$, e ≥ 1 and
p is prime.) Let p = π1 π2 · · · πr be the unique factorization of p as a product of irreducibles
in O. Since p = π1 π2 · · · πr ∈ P where the ideal P is prime, we have πi ∈ P for
some i. Since O is a UFD, the irreducible element πi is prime, so (πi ) is a nonzero prime ideal
with (πi ) ⊆ P ⊂ O; and since every nonzero prime ideal is maximal (Theorem A3.10(ii)), (πi ) = P.
Thus P is principal.

One way to verify that O is a UFD (and hence a PID) is to show that it satisfies the
division algorithm. We say that O is Euclidean if for every x, d ∈ O with d ≠ 0, there
exist q, r ∈ O such that x = qd + r with N(r) < N(d). More generally, any integral domain
satisfying such a division algorithm is called a Euclidean domain (ED) (whose ‘norm’
may go by another name, such as ‘degree’, depending on the context; but we do require
N(ab) = N(a) N(b), N(a) ∈ {0, 1, 2, . . .}, and N(a) = 0 iff a = 0).

Theorem A3.15. If O is a Euclidean domain, then O has unique factorization (O
is a PID and hence also a UFD).

Proof. Let A ⊆ O be an ideal. Without loss of generality, A is nonzero. Choose a nonzero element
d ∈ A for which N(d) is as small as possible. Of course, (d) ⊆ A. Conversely, let a ∈ A.
Then a = qd + r where q, r ∈ O and N(r) < N(d). However, r = a − qd ∈ A; so by choice
of d, we must have r = 0. This means that a ∈ (d), which gives the reverse inclusion; so
A = (d).

The ring of Gaussian integers is Z[i]. The ring of Eisenstein integers is Z[ω].
Here i = ζ4 and ω = ζ3 .

Corollary A3.16. The rings Z[i] and Z[ω] are Euclidean. Hence these rings are
UFDs, as well as PIDs.

Proof. Let d = a + bi ∈ O = Z[i] be nonzero. Then the principal ideal (d) = Zd + iZd ⊂ O
forms a square lattice (the vertices of a square grid with side length |d|).

Given z ∈ O, let qd ∈ (d) be a vertex of the square grid that is closest to z. Although the
choice of closest vertex may not be unique, it certainly has distance at most $|d|/\sqrt{2}$
from z, i.e. $N(r) = |r|^2 \le \tfrac12 |d|^2 = \tfrac12 N(d)$ where r = z − qd. This shows that Z[i] is Euclidean. A similar
argument, using a grid formed by equilateral triangles, shows that Z[ω] is Euclidean.
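The nearest-lattice-point argument translates directly into a division algorithm for Z[i]. The following sketch (an illustration with hypothetical names `divmod_gauss` and `gcd_gauss`, storing a + bi as the pair (a, b)) rounds the exact quotient z/d to the nearest Gaussian integer, which is the algebraic form of choosing the closest vertex of the grid (d).

```python
def norm(z):
    """N(a + bi) = a^2 + b^2."""
    a, b = z
    return a * a + b * b

def round_div(n, m):
    """Nearest integer to n/m for m > 0 (ties rounded up)."""
    return (2 * n + m) // (2 * m)

def divmod_gauss(z, d):
    """Return (q, r) in Z[i] with z = q*d + r and N(r) <= N(d)/2, for d != 0."""
    (a, b), (c, e) = z, d
    m = norm(d)
    # z/d = z * conj(d) / N(d); round real and imaginary parts separately.
    re, im = a * c + b * e, b * c - a * e
    q = (round_div(re, m), round_div(im, m))
    r = (a - (q[0] * c - q[1] * e), b - (q[0] * e + q[1] * c))
    return q, r

def gcd_gauss(z, d):
    """A greatest common divisor in Z[i], by the Euclidean algorithm."""
    while d != (0, 0):
        z, d = d, divmod_gauss(z, d)[1]
    return z

q, r = divmod_gauss((11, 3), (1, 8))
print(q, r, norm(r), norm((1, 8)))
print(gcd_gauss((5, 0), (2, 1)))
```

The result of `gcd_gauss` is determined only up to multiplication by one of the four units ±1, ±i.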

Let $O = O_E$ be the ring of integers in a number field E (i.e. E ⊇ Q is a finite extension
and O is its ring of algebraic integers). Consider two nonzero ideals A, B ⊆ O. We say A
and B are equivalent if mA = m′B for some nonzero elements m, m′ ∈ O. This gives an
equivalence relation on the set of nonzero ideals of O. The equivalence class of A, denoted
by [A], is called the ideal class of A. It is not hard to prove that the binary operation

[A][B] = [AB]

is well-defined for ideal classes; that is, it does not depend on the choice of representative
of each ideal class. Furthermore, this operation makes the set of ideal classes of O into an
abelian group, called the ideal class group of O (or of E). Since OA = A, the identity
element of this group is [O]. This class consists of all the principal ideals of O. Thus O is
a PID iff its ideal class group is trivial. Now for the nontrivial result:

Theorem A3.17. The ideal class group of every number field is finite.

The class number of a number field E, usually denoted by hE , is the order of its ideal
class group. Constructing elements of the ideal class group of a given order (if they exist)
is usually not too hard; but finding explicit upper bounds on hE is often hard. Fortunately
for many of the smaller number fields of interest, the computation of class numbers and class groups is within the
reach of appropriate computational software.
It follows immediately that if h = hE is the class number of E, then for every ideal
A ⊆ O, the ideal $A^h$ ⊆ O is principal. This is often an adequate substitute for having a
PID. Sometimes it is helpful to note that h can be replaced here by the exponent of the
ideal class group. (Recall that the exponent of a group is the least common multiple of
the orders of the elements of that group, this being a divisor of the group order.)
Appendix A4: Normal and Separable Extensions

An algebraic extension of fields E ⊇ F is normal if for every element α ∈ E, the minimal
polynomial of α over F splits into linear factors in E[x]. An extension E ⊇ F is called a
splitting field for a polynomial f (x) ∈ F [x] if
(i) f (x) splits into linear factors in E[x], i.e. f (x) = a(x − α1 )(x − α2 ) · · · (x − αn ) where
a, α1 , . . . , αn ∈ E; and
(ii) E = F (α1 , α2 , . . . , αn ); thus E is the smallest extension of F where f (x) splits into
linear factors.

Example A4.1: A Cubic Extension and its Normal Closure. Let $\theta = 2^{1/3}$ and $\omega = e^{2\pi i/3} = \tfrac12(-1 + \sqrt{-3})$.
The extension L := Q[θ] ⊇ Q is not normal since the minimal polynomial of θ over
Q is $m(x) = x^3 - 2 = (x - \theta)(x - \omega\theta)(x - \omega^2\theta)$, only one of whose roots lies in L. By adjoining to L
the remaining roots of m(x), we obtain the extension $E = \mathbb{Q}[\theta, \omega\theta, \omega^2\theta] = \mathbb{Q}[\omega, \theta] = L[\omega] \supset \mathbb{Q}$ which
is normal; it is the splitting field of m(x), and it is called the normal closure of L (the unique
smallest extension of L which is normal) over Q.

Theorem A4.2. (i) Let F be a field, and let f (x) ∈ F [x] be a nonconstant polynomial.
Then there exists a splitting field E ⊇ F for f (x) over F ; and the splitting
field is unique up to isomorphism.
(ii) A finite extension E ⊇ F is normal iff it is the splitting field of some polynomial
f (x) ∈ F [x] over F .

We therefore speak of the splitting field (rather than a splitting field) for f (x) ∈ F [x]
over F . We only sketch the construction of the splitting field of f (x) ∈ F [x] over F ,
as follows: First construct an extension $E_1 = F[\alpha_1] \cong F[x]/(f_1(x))$, where $f_1(x)$ is an
irreducible factor of f (x) in F [x], so that f (x) =
(x − α1 )g(x), g(x) ∈ E1 [x], and then recursively apply this process to g(x), repeating
until we have obtained an extension in which f (x) splits into linear factors. The resulting
extension is finite by Theorem A1.1. For (ii), given a finite normal extension E ⊇ F , we
can easily express E = F [α1 , . . . , αn ], then take f (x) to be the least common multiple (in
F [x]) of the minimal polynomials of the generators α1 , . . . , αn over F ; clearly E is the
splitting field of f (x) over F .
An algebraic extension of fields E ⊇ F is separable if every irreducible polynomial
f (x) ∈ F [x] has no repeated roots in E, i.e. f (x) is not divisible by (x − α)2 for any α ∈ E.

Example A4.3: An Inseparable Extension. Let $E = \mathbb{F}_p(t)$, the field of rational functions in an
indeterminate t, with coefficients in the prime order field $\mathbb{F}_p$ (thus E is the field of quotients of the
polynomial ring $\mathbb{F}_p[t]$). This has a subfield $F = \mathbb{F}_p(t^p)$, and the extension E ⊃ F has degree p with
basis $\{1, t, t^2, \dots, t^{p-1}\}$. It is not separable; the polynomial $f(x) = x^p - t^p \in F[x]$ is irreducible in
F [x]; yet it factors as $f(x) = (x-t)^p$ in E[x], where it has one distinct root t with multiplicity p. Now
E = F [t] ⊃ F is the splitting field of f (x), the minimal polynomial of t over F , so it is normal but
inseparable. Also $\mathrm{Tr}_{E/F}\, t = 0$ as seen from the coefficient of $x^{p-1}$ in f (x). More generally,
$\mathrm{Tr}_{E/F}(t^k) = 0$ for k = 0, 1, 2, . . . , p−1 and so the trace map of the extension vanishes identically:
$\mathrm{Tr}_{E/F} = 0$. The conclusion of Theorem A1.7(ii) fails dramatically; but so does the hypothesis since
E and F are infinite fields of positive characteristic p.

While examples like A4.3 do arise naturally in certain situations, throughout this course
we will treat them as pathological cases to be avoided. We focus instead on fields which
are either finite or have characteristic zero, which are always separable by Theorem A4.5
below.

Theorem A4.4. Let E ⊇ K ⊇ F be a tower of finite extensions of fields. If the extension E ⊇ F
is separable, then so are the extensions E ⊇ K and K ⊇ F .

Proof. Suppose E ⊇ F is separable. Clearly K ⊇ F is separable: since every α ∈ E is a
simple root of its minimal polynomial over F , this is in particular true for every α ∈ K.
Now let f (x) ∈ K[x] be irreducible in K[x], with f (α) = 0 for some α ∈ E. Let g(x) ∈ F [x]
be the minimal polynomial of α over F . Since f (x) is the minimal polynomial of α over K,
g(x) = f (x)h(x) for some h(x) ∈ K[x]. Since α is a simple root of g(x), it is a simple root
of f (x). So E ⊇ K is separable.

Theorem A4.5. Let E ⊇ F be an extension of finite fields, or fields of characteristic
zero. Then the extension E ⊇ F is separable.

Proof. Consider first the case char E = char F = 0. Suppose f (x) ∈ F [x] is monic
irreducible, and write $f(x) = a_0 + a_1x + \cdots + a_{n-1}x^{n-1} + x^n$ where n ≥ 1 and $a_i \in F$.
Suppose θ ∈ E is a root of f (x); so f (x) is the minimal polynomial of θ over F . Term-by-term
differentiation shows that the derivative f ′(x) ∈ F [x] has leading term $nx^{n-1}$ where
the coefficient is nonzero (it is here that we require the hypothesis that char F = 0) and
in particular deg f ′(x) = n−1. By minimality of the degree of the irreducible polynomial,
f ′(θ) ≠ 0. However if $f(x) = (x-\theta)^2 g(x)$ where g(x) ∈ E[x], then the derivative
$f'(x) = (x-\theta)\bigl[(x-\theta)g'(x) + 2g(x)\bigr]$ has θ as a root. This is a contradiction.
A similar argument works if $E = \mathbb{F}_{q^r}$, $F = \mathbb{F}_q$, $q = p^e$, p prime, r, e ≥ 1. Let
f (x) ∈ F [x] be monic irreducible of degree n; and suppose θ ∈ E is a repeated root
of f (x). The argument above shows that f ′(x) = 0 ∈ F [x]. In characteristic p this
simply means that every term in f (x) has exponent divisible by p, so n = mp for some m ≥ 1 and
$f(x) = b_0 + b_1x^p + b_2x^{2p} + \cdots + b_mx^{mp} = g(x)^p$ where $g(x) = c_0 + c_1x + c_2x^2 + \cdots + c_mx^m \in F[x]$
and $c_i = b_i^{p^{e-1}}$ using Theorem 3.7. But deg g(x) = m < n and evidently g(θ) = 0, again
contradicting the minimality of the degree of the minimal polynomial of θ over F .
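The derivative criterion used in this proof also gives a practical test over Q: a polynomial has a repeated root exactly when it shares a nonconstant factor with its derivative. The sketch below is an illustration only, using exact rational arithmetic; polynomials are coefficient lists with the constant term first, and the names `trim`, `poly_rem`, `poly_gcd` and `derivative` are hypothetical.

```python
from fractions import Fraction

def trim(f):
    """Drop trailing zero coefficients."""
    while f and f[-1] == 0:
        f = f[:-1]
    return f

def poly_rem(f, g):
    """Remainder of f on division by a nonzero polynomial g, over Q."""
    f = trim([Fraction(c) for c in f])
    g = trim([Fraction(c) for c in g])
    while len(f) >= len(g):
        k = len(f) - len(g)
        q = f[-1] / g[-1]
        f = trim([c - q * g[i - k] if i >= k else c for i, c in enumerate(f)])
    return f

def poly_gcd(f, g):
    """Monic greatest common divisor of f and g over Q (f nonzero)."""
    f, g = trim(list(f)), trim(list(g))
    while g:
        f, g = g, poly_rem(f, g)
    lead = Fraction(f[-1])
    return [Fraction(c) / lead for c in f]

def derivative(f):
    """Term-by-term derivative."""
    return [i * c for i, c in enumerate(f)][1:]

separable = [-2, 0, 0, 1]   # x^3 - 2, irreducible over Q
repeated = [2, -3, 0, 1]    # x^3 - 3x + 2 = (x - 1)^2 (x + 2)
print(poly_gcd(separable, derivative(separable)))
print(poly_gcd(repeated, derivative(repeated)))
```

A gcd equal to the constant polynomial 1 means there are no repeated roots; for the second polynomial the gcd picks out the repeated factor x − 1.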

Now let E ⊇ F be an extension of degree n, and let C be an algebraically closed field
containing F . (For example if F = Q, one can take C to be $\mathbb{C}$ or A; if F is a finite field of
characteristic p, one can take $C = \bigcup_{k=1}^{\infty} \mathbb{F}_{p^k}$.) An F -monomorphism or F -embedding
σ : E → C is an injective ring (or field) homomorphism satisfying σ(a) = a for all a ∈ F .

Theorem A4.6. Let E ⊇ F be a separable extension of
degree n, and let C be an algebraically closed field containing F .
Then there exist exactly n distinct F -monomorphisms
from E into C.

[Diagram: the fields E and C, each drawn as an extension of F.]

Proof. First consider the special case that E = F [α] for some α ∈ E. Let $f(x) = \mathrm{Irr}_{\alpha,F}(x)$.
Since C is algebraically closed, f (x) splits into linear factors in C[x], say f (x) =
(x − α1 )(x − α2 ) · · · (x − αn ) where each αi ∈ C. For each i, observe that $\mathrm{Irr}_{\alpha_i,F}(x) = f(x)$
since f (x) is monic irreducible in F [x] and has αi as a root.
For each i = 1, 2, . . . , n, define σi : F [α] → C by g(α) ↦ g(αi ) where g(x) ∈ F [x].
Then σi is well-defined, since if g(α) = h(α), then g(x) ≡ h(x) mod (f (x)), in which
case g(αi ) = h(αi ). Clearly σi : E → C is a ring homomorphism, fixing every element
of F . Also σi is one-to-one, for if σi (g(α)) = g(αi ) = 0, then f (x) divides g(x), so that
g(α) = 0. So each σi : E → C is an F -monomorphism. The image of σi is the subfield
σi (E) = F [αi ] ⊆ C.
Now F [αi ] ≅ F [α] = E is separable over F , so α1 , α2 , . . . , αn are distinct. Since
σi (α) = αi , the monomorphisms σ1 , σ2 , . . . , σn are distinct.
Finally, let σ be any F -monomorphism from E into C. Then f (σ(α)) = σ(f (α)) =
σ(0) = 0, so that σ(α) ∈ {α1 , α2 , . . . , αn }. Let us say that σ(α) = αi . Since the ring
homomorphisms σ and σi agree on F and on α, they must agree on F [α] = E, i.e. σ = σi .
Thus σ1 , σ2 , . . . , σn are the only F -monomorphisms from E into C.
Consider now the general case E ⊃ F , and let α ∈ E ∖ F . We may assume that
F [α] ⊂ E; otherwise we are done by the previous case. We have E ⊃ F [α] ⊃ F and
n = mt where m = [E : F [α]] and t = [F [α] : F ]. By induction on the degree of extension,
there exist t distinct F -monomorphisms σ1 , σ2 , . . . , σt : F [α] → C. Let αi = σi (α).
[Diagram: σi carries the tower F ⊆ F [α] ⊆ E onto the tower F ⊆ F [αi ] ⊆ σi (E), and θij carries σi (E) onto θij (σi (E)) ⊆ C.]

Since σi : E → σi (E) is an F -isomorphism, the extension σi (E) ⊇ F is separable; hence by
Theorem A4.4, the extension σi (E) ⊇ F [αi ] is separable. By induction on the degree of extension,
for each i there exist m distinct F [αi ]-monomorphisms θi1 , θi2 , . . . , θim : σi (E) → C.
The composite maps θij ◦ σi : E → C constitute mt = n distinct F -monomorphisms.
To see that these are the only F -monomorphisms E → C, suppose that σ : E → C is an
F -monomorphism. As before, σ must take α to some αi . Then $\sigma \circ \sigma_i^{-1} : \sigma_i(E) \to C$ is an
F [αi ]-monomorphism, so by induction, $\sigma \circ \sigma_i^{-1} = \theta_{ij}$ for some j, whence σ = θij ◦ σi as
required.

As preparation for the next theorem, we require

Lemma A4.7. Let V be a vector space over an infinite field F . Then V is not a
union of finitely many proper subspaces.

Proof. Suppose there exists a positive integer n for which some vector space V over F is
covered by n proper subspaces. We may further suppose n is minimal with
this property; and now we seek a contradiction. Clearly n > 1; and there exists a vector
space V = V1 ∪ V2 ∪ · · · ∪ Vn over F where each Vi < V is a proper subspace. For each
i ∈ {1, 2, . . . , n}, there exists $v_i \in V \setminus \bigcup_{j \ne i} V_j$ by minimality of n. It is easy to see that
the affine line L = {v1 + tv2 : t ∈ F } intersects each Vi in at most one point. However, L
has an infinite number of points in V = V1 ∪ V2 ∪ · · · ∪ Vn , a contradiction.
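The hypothesis that F is infinite in Lemma A4.7 cannot be dropped. Over the finite field F_p, the plane F_p² is the union of its p + 1 lines through the origin, each a proper subspace. The sketch below checks this for small primes; it is an illustration only, with hypothetical names `span` and `covered_by_lines`.

```python
from itertools import product

def span(v, p):
    """The line through the origin spanned by v in F_p^2."""
    return {tuple((t * c) % p for c in v) for t in range(p)}

def covered_by_lines(p):
    """Return (is F_p^2 the union of its lines through 0?, number of lines)."""
    plane = set(product(range(p), repeat=2))
    directions = [(1, m) for m in range(p)] + [(0, 1)]
    union = set().union(*(span(v, p) for v in directions))
    return union == plane, len(directions)

for p in (2, 3, 5):
    print(p, covered_by_lines(p))
```

For p = 2 this recovers the familiar fact that the Klein four-group is the union of its three subgroups of order 2.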

It is often useful to have a single generator for an extension field. The following result
guarantees that such a generator exists for all finite separable extensions. We present a
proof, however, only in the easiest cases which we care about most: finite fields and fields
of characteristic zero. For a proof in the general case, see e.g. Garling [Ga]. This is usually
called the Theorem of the Primitive Element, terminology that conflicts with usage
in the finite case, where a primitive element is a generator of the multiplicative group; so
we would prefer to call this the Theorem of Simple Extensions.

Theorem A4.8. Let E ⊇ F be a finite separable extension of fields. Then E =
F [α] for some α ∈ E.

Proof in the case char F = 0 or |F | < ∞. The finite field case is easy: just take α to
be a generator of E × , by Theorem 3.2. Hence we assume the characteristic is zero. Let
C be an algebraically closed field containing F . By Theorem A4.6, there exist distinct
F -monomorphisms σ1 , σ2 , . . . , σn : E → C where n = [E : F ].
We claim that there exists α ∈ E such that the images σ1 (α), σ2 (α), . . . , σn (α) ∈ C
are distinct. To see this, we apply Lemma A4.7 as follows. Whenever 1 ≤ i < j ≤ n, the
set Vij = {x ∈ E : σi (x) = σj (x)} is a proper subspace of the vector space E over F .
Also |F | = ∞ since char F = 0. Since E cannot be covered by finitely many proper
subspaces Vij , there exists $\alpha \in E \setminus \bigcup_{1 \le i < j \le n} V_{ij}$, and this α has the required property:
σi (α) ≠ σj (α) whenever i ≠ j.
Since [E : F ] < ∞, we have F (α) = F [α] by Theorem A1.2. So we have a tower
of extensions E ⊇ F [α] ⊇ F and n = [E : F [α]][F [α] : F ]. Since the restrictions
σ1 , . . . , σn : F [α] → C are distinct F -monomorphisms, we have n ≤ [F [α] : F ] by Theorem A4.6.
Therefore [F [α] : F ] = n and E = F [α].
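As a concrete instance of this argument (an example not taken from the notes), consider E = Q(√2, √3) over F = Q, whose four embeddings into C send √2 ↦ ±√2 and √3 ↦ ±√3 independently. The sketch below counts, in floating point, the distinct images of an element c2·√2 + c3·√3; the function name `images` and the rounding to 9 decimal places are assumptions made for this illustration.

```python
from itertools import product
from math import sqrt

def images(c2, c3):
    """Images of c2*sqrt(2) + c3*sqrt(3) under the four embeddings of
    Q(sqrt2, sqrt3), rounded so that equal values coincide as floats."""
    return {round(s * c2 * sqrt(2) + t * c3 * sqrt(3), 9)
            for s, t in product((1, -1), repeat=2)}

print(len(images(1, 1)))  # candidate generator sqrt(2) + sqrt(3)
print(len(images(1, 0)))  # sqrt(2) alone
```

An element with four distinct images, such as √2 + √3, lies outside every subspace Vij and therefore generates E; an element such as √2 has only two distinct images and generates a proper subfield.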

Remark: The use of an algebraically closed extension C in the proof of Theorem A4.8 was
merely a convenient crutch, and was not really necessary. All that is really required is a
finite normal extension of E, thereby avoiding reference to the Axiom of Choice.
Appendix A5: Field Automorphisms and Galois Theory

We give a very quick introduction to Galois theory, with a few key small examples. For
more details and proofs, see e.g. [Ga], [Sa]. The following is a restatement of Theorems A4.5
and A4.8.

Theorem A5.1. Let E ⊇ F be a finite extension. Assume either that E and F are
finite fields, or that they have characteristic zero. Then
(a) E = F [α] for some α ∈ E, i.e. the extension is simple.
(b) The extension E ⊇ F is separable. Recall: this means that for every polynomial
f (x) ∈ F [x] which is irreducible in F [x], the polynomial f (x) has no repeated
roots in E.

Throughout this section, all finite extensions considered are assumed to satisfy the hypothe-
ses (and therefore the conclusions) of Theorem A5.1.
Denote by Aut E the group of all automorphisms of a field E. Two elements α and
β in a field F are algebraic conjugates if there exists an automorphism σ ∈ Aut E of
some extension E ⊇ F such that σ(α) = β.

Theorem A5.2. Let σ1 , . . . , σk ∈ Aut E be distinct automorphisms of a field E.
Then σ1 , . . . , σk are linearly independent functions E → E.

Proof. The result is clear for k = 1 since each σ ∈ Aut E is nonzero. Suppose that there
exist distinct automorphisms σ1 , . . . , σk ∈ Aut E which are linearly dependent over E; we
seek a contradiction. We may suppose our counterexample is minimal; so k ≥ 2 and every
set of k − 1 distinct automorphisms of E is linearly independent. By assumption, there
exist c1 , c2 , . . . , ck ∈ E, not all zero, such that

c1 σ1 + c2 σ2 + · · · + ck σk = 0.

In fact every ci 6= 0 by minimality of k. Since σ1 6= σ2 , there exists a ∈ E such that


σ1 (a) 6= σ2 (a). For every x ∈ E we have

c1 σ1 (x) + c2 σ2 (x) + · · · + ck σk (x) = 0;


c1 σ1 (ax) + c2 σ2 (ax) + · · · + ck σk (ax) = 0.

Multiply the first equation by σ1 (a) and subtract the second equation to get

c2 (σ2 (a)−σ1 (a))σ2 (x) + c3 (σ3 (a)−σ1 (a))σ3 (x) + · · · + ck (σk (a)−σk (a))σk (x) = 0
 Appendix A5: FIELD AUTOMORPHISMS AND GALOIS THEORY

for all x ∈ E. However, the coefficient of σ2 (x) in this linear combination is nonzero,
contrary to our assumption of the minimality of k. This is a contradiction as desired.

Let E ⊇ F be a finite extension. An F-automorphism of E is an automorphism
σ ∈ Aut E such that σ(a) = a for all a ∈ F. (This builds upon the terminology of
Appendix A4: an F-automorphism is an F-monomorphism which is also surjective.) The
group of all F-automorphisms of E is denoted G(E/F). Clearly G(E/F) ≤ Aut E is a
subgroup in general; and equality holds if F is the prime subfield of E (the unique smallest
subfield of E).

Theorem A5.3. Let E ⊇ F be a finite extension of fields, satisfying the assumptions
of Theorem A5.1. Let f(x) ∈ F[x], and let α1, …, αk be the distinct roots of
f(x) in E. Then every σ ∈ G(E/F) permutes α1, …, αk.

Proof. Denote f(x) = a0 + a1x + a2x^2 + ⋯ + an x^n where a0, a1, …, an ∈ F. For each
i ∈ {1, 2, …, k}, the image σ(αi) ∈ E satisfies

f(σ(αi)) = a0 + a1σ(αi) + a2σ(αi)^2 + ⋯ + anσ(αi)^n
         = σ(a0 + a1αi + a2αi^2 + ⋯ + anαi^n) = σ(0) = 0

by our hypotheses (σ fixes every element of F), so that σ(αi) ∈ {α1, …, αk}. Since σ is
injective, it must therefore permute α1, …, αk.

Theorem A5.4. Let E = F[α] ⊇ F be an extension of degree n, and let f(x) be
the minimal polynomial of α over F. Let α1, α2, …, αk be all the roots of f(x) in E
(these are distinct by Theorem A5.1). Then
(i) G(E/F) transitively permutes α1, α2, …, αk.
(ii) The only σ ∈ G(E/F) fixing any of the roots αi is the identity. (In the language
of permutation groups, G(E/F) permutes the roots regularly, i.e. sharply
transitively.)
(iii) |G(E/F)| = k ≤ n.

Proof. Let C be an algebraic closure of E. Then f(x) has exactly n roots α1, α2, …, αn ∈ C;
and these are distinct by Theorem A5.1. By assumption, exactly the first k of these roots
lie in E. By Theorem A4.6, there are exactly n distinct F-monomorphisms σi : E → C
where σi(α) = αi, i = 1, 2, …, n. Of these, only σ1, …, σk have values in E; so these are
all the F-automorphisms of E.

Note that the heavy lifting in the last proof was accomplished by Theorem A4.6. This is
also true of our next proof.
A finite extension E ⊇ F for which equality holds with |G(E/F )| = [E : F ] is a
Galois extension. In this case, G = G(E/F ) is the Galois group of the extension.
Alternatively, one may characterize an extension as Galois iff it is finite, normal and
separable. This equivalence is due to the following.

Theorem A5.5. Let E ⊇ F be a finite extension satisfying the hypotheses of
Theorem A5.1. Then E ⊇ F is Galois iff it is normal, iff E is the splitting field
of some polynomial f(x) ∈ F[x]. Assuming E = F[α1, α2, …, αk] where
f(x) = ∏_{i=1}^{k} (x − αi) ∈ F[x] with distinct roots α1, α2, …, αk ∈ E, then G(E/F) is
faithfully represented as a group of permutations of these k roots. In particular,
|G(E/F)| ≤ k!.

Proof. First suppose E is the splitting field of f(x) ∈ F[x] over F; say
f(x) = ∏_{i=1}^{k} (x − αi) ∈ F[x] and E = F[α1, …, αk]. By Theorem A5.3, every
σ ∈ G(E/F) permutes the roots α1, …, αk; and since these roots generate E over F,
distinct elements of G(E/F) yield distinct permutations of the roots, and |G(E/F)| ≤ k!.
Let C be an algebraic closure of E, and let n = [E : F]. By Theorem A4.6, there are
exactly n distinct F-monomorphisms E → C; and all of these must map E → E since they
permute the roots of f(x), these being generators of E over F. So we obtain n distinct
elements of G(E/F), and the extension E ⊇ F is Galois.
Conversely, suppose E ⊇ F is normal. By Theorem A5.1, E = F[α] for some α ∈ E.
Let f(x) ∈ F[x] be the minimal polynomial of α over F, so that deg f(x) = n = [E : F].
Since E is normal and separable over F, f(x) = ∏_{i=1}^{n} (x − αi) with distinct roots αi ∈ E.
By Theorem A5.4, G(E/F) permutes α1, α2, …, αn transitively and |G(E/F)| = n; so
E ⊇ F is Galois, and E = F[α] is the splitting field of f(x).

Quadratic field extensions are normal (and hence Galois). This is the field-theoretic
analogue of the fact that in group theory, subgroups of index 2 are normal:

Example A5.6: Quadratic Extensions. Assuming the hypotheses of Theorem A5.1, every
quadratic extension is Galois. Let E ⊃ F be a quadratic extension, and let α ∈ E ∖ F. Since
E ⊇ F[α] ⊃ F where [E : F] = 2, we must have E = F[α]. Let f(x) ∈ F[x] be the minimal
polynomial of α over F. Then f(x) is quadratic with a root in E, so f(x) has two distinct roots in
E: f(x) = (x − α)(x − α′) where α, α′ ∈ E. By Theorem A5.5, E ⊃ F is Galois. This means that
G(E/F) is generated by an automorphism σ of order 2 interchanging α ↔ α′.

For n ≥ 3, there exist both Galois and non-Galois extensions of degree n. The next
two examples include both types for n = 3. A cyclic extension is a Galois extension
with a cyclic Galois group; this is the case in Example A5.7.

Example A5.7: A Cyclic Cubic Extension. The polynomial f(x) = x^3 + x^2 − 2x − 1 ∈ Q[x] is
irreducible over Q (since it is irreducible over F2 and hence over Z). Let α ∈ C be a root of f(x); so
we have a cubic extension E = Q[α] ⊃ Q. We compute
\[
\begin{aligned}
\alpha^3 &= 1 + 2\alpha - \alpha^2,\\
\alpha^4 &= \alpha + 2\alpha^2 - \alpha^3 = -1 - \alpha + 3\alpha^2,\\
\alpha^5 &= -\alpha - \alpha^2 + 3\alpha^3 = 3 + 5\alpha - 4\alpha^2,\\
\alpha^6 &= 3\alpha + 5\alpha^2 - 4\alpha^3 = -4 - 5\alpha + 9\alpha^2, \quad\text{etc.}
\end{aligned}
\]
By direct computation, we verify that β := α^2 − 2 is also a root of f:
\[
\begin{aligned}
f(\beta) &= \beta^3 + \beta^2 - 2\beta - 1\\
&= (\alpha^2-2)^3 + (\alpha^2-2)^2 - 2(\alpha^2-2) - 1\\
&= (\alpha^6 - 6\alpha^4 + 12\alpha^2 - 8) + (\alpha^4 - 4\alpha^2 + 4) - 2(\alpha^2-2) - 1\\
&= \alpha^6 - 5\alpha^4 + 6\alpha^2 - 1 = 0.
\end{aligned}
\]
Exactly the same reasoning shows that γ := β^2 − 2 = (α^2 − 2)^2 − 2 = 1 − α − α^2 must be a root
of f(x). Since α, β, γ are algebraic of degree 3, they must be distinct; for example if β = α then α
would satisfy a quadratic relation over Q. We compute γ^2 − 2 = α and so Q[α] ⊆ Q[γ] ⊆ Q[β] ⊆ Q[α];
therefore E = Q[α] = Q[β] = Q[γ] is the splitting field of f(x) = (x − α)(x − β)(x − γ). The Galois
group is G = G(E/Q) = Aut E = ⟨σ⟩ where the automorphism σ cyclically permutes the roots as
α ↦ β ↦ γ ↦ α.
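
The arithmetic in Example A5.7 can be checked mechanically. The following is a minimal Python sketch, not part of the original notes: it represents c0 + c1α + c2α^2 by the integer triple (c0, c1, c2) and reduces products using α^3 = 1 + 2α − α^2. The helper names (mul, add, scale, f_at) are illustrative choices.

```python
# Elements of Z[alpha] as triples (c0, c1, c2) meaning c0 + c1*alpha + c2*alpha^2,
# where alpha is a root of f(x) = x^3 + x^2 - 2x - 1, i.e. alpha^3 = 1 + 2*alpha - alpha^2.

def mul(u, v):
    """Multiply two elements, reducing powers alpha^4 and alpha^3."""
    prod = [0] * 5
    for i, ui in enumerate(u):
        for j, vj in enumerate(v):
            prod[i + j] += ui * vj
    # Eliminate alpha^4 first, then alpha^3, using
    # alpha^d = alpha^(d-3) * (1 + 2*alpha - alpha^2).
    for d in (4, 3):
        c = prod[d]
        prod[d] = 0
        prod[d - 3] += c
        prod[d - 2] += 2 * c
        prod[d - 1] -= c
    return tuple(prod[:3])

def add(u, v):
    return tuple(x + y for x, y in zip(u, v))

def scale(k, u):
    return tuple(k * x for x in u)

def f_at(u):
    """Evaluate f(u) = u^3 + u^2 - 2u - 1."""
    u2 = mul(u, u)
    u3 = mul(u2, u)
    return add(add(u3, u2), add(scale(-2, u), (-1, 0, 0)))

alpha = (0, 1, 0)
beta = add(mul(alpha, alpha), (-2, 0, 0))    # beta  = alpha^2 - 2
gamma = add(mul(beta, beta), (-2, 0, 0))     # gamma = beta^2 - 2
back = add(mul(gamma, gamma), (-2, 0, 0))    # gamma^2 - 2, expected to equal alpha

print(beta, gamma, back, f_at(beta), f_at(gamma))
```

The map u ↦ u^2 − 2 therefore cycles the three roots, which is the action of the generator σ described in the example.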

By the Kronecker-Weber Theorem, every cyclic extension (being abelian) must be con-
tained in a cyclotomic extension. The extension of Example A5.7 is the ‘simplest’ example
of a cyclic extension of degree 3; it is a subfield of Q[ζ7] where we take α = ζ7 + ζ7^{−1}; see
Example 4.9.

Example A5.8: Galois Closure. The real number α = 2^{1/3} generates a cubic extension K =
Q[α] ⊃ Q. The minimal polynomial of α over Q is f(x) = x^3 − 2 = (x − α)(x^2 + αx + α^2) where
the quadratic factor is irreducible over K. The cubic extension K ⊃ Q is not Galois; f(x) does
not split into linear factors in K[x], and the group G(K/Q) = Aut K is trivial, in accordance with
Theorem A5.4.
The splitting field E of f(x) is the Galois closure or normal closure of K, i.e. the smallest
Galois extension of Q containing K. Since f(x) = (x − α)(x − ωα)(x − ω^2α) where ω = ζ3, we have
E = Q[α, ω]. Note that [E : Q] = [E : K][K : Q] = 2 · 3 = 6. The Galois group of the extension
is G = G(E/Q) = Aut E = ⟨σ, τ⟩, a dihedral group of order 6 permuting the three roots in all 3! = 6
possible ways. Here τ denotes complex conjugation, interchanging ω ↔ ω^2 and fixing α; σ cycles the
three roots as α ↦ ωα ↦ ω^2α ↦ α while fixing ω. The three roots of f(x) form the vertices of an
equilateral triangle embedded in C, on which G induces the full group of symmetries:

[Figure: equilateral triangle in C with vertices α, ωα, ω^2α. τ reflects across the horizontal axis
of symmetry; σ rotates 120° counter-clockwise about the center.]

Of course σ does not rotate the entire complex plane; it fixes all points of Q. The only elements of
G acting continuously on E are ι and τ.

Example A5.9: An Abelian Quartic Extension. Let E = Q[√2, √3] ⊃ Q. We show that this is
a simple extension generated by α = √2 + √3, an algebraic integer of degree 4. Direct computation
shows that α is a root of f(x) = x^4 − 10x^2 + 1 ∈ Q[x]. Clearly f(x) has no linear factors in Z[x],
since it has no roots in Z (indeed, no roots in F3). It has six monic quadratic factors in C[x]:
\[
f(x) = (x^2 + 2\sqrt{2}\,x - 1)(x^2 - 2\sqrt{2}\,x - 1)
     = (x^2 + 2\sqrt{3}\,x + 1)(x^2 - 2\sqrt{3}\,x + 1)
     = (x^2 - 5 + 2\sqrt{6})(x^2 - 5 - 2\sqrt{6})
\]
but none of these factors are in Q[x] since √2, √3, √6 are all irrational. It follows that f(x) is
irreducible in Q[x]. From these factorizations it also follows that Q[√2, √3] ⊆ Q[α] ⊆ Q[√2, √3] so
E = Q[α] is a quartic extension of Q as claimed. The four roots of f(x) are ±√2 ± √3 ∈ E, so E is
the splitting field of f(x), hence a Galois extension of Q.
Let G = G(E/Q) = Aut E, so that |G| = [E : Q] = 4. Every automorphism of E is
determined by its action on the generators √2 and √3; but there are only four possible combinations
of sign changes √2 ↦ ±√2, √3 ↦ ±√3; so all four of these combinations must yield automorphisms
of E. So we must have a Klein four-group G = ⟨σ, τ⟩ = {ι, σ, τ, στ} where σ(√2) = −√2, σ(√3) = √3;
τ(√2) = √2, τ(√3) = −√3. Here ι = id and στ(√2) = −√2, στ(√3) = −√3, so στ(√6) = √6.
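
Example A5.9 also lends itself to an exact machine check. The sketch below is an illustration only, with hypothetical helper names: it stores a + b√2 + c√3 + d√6 as the integer 4-tuple (a, b, c, d), confirms that α = √2 + √3 is a root of x^4 − 10x^2 + 1, and applies the automorphisms σ and τ of the example. The sum and product of the four conjugates of α are the trace 0 and the norm 1, matching the coefficients of f(x).

```python
# Elements a + b*sqrt2 + c*sqrt3 + d*sqrt6 of Z[sqrt2, sqrt3] as tuples (a, b, c, d).

def mul(u, v):
    a1, b1, c1, d1 = u
    a2, b2, c2, d2 = v
    return (
        a1 * a2 + 2 * b1 * b2 + 3 * c1 * c2 + 6 * d1 * d2,   # rational part
        a1 * b2 + b1 * a2 + 3 * (c1 * d2 + d1 * c2),         # sqrt3*sqrt6 = 3*sqrt2
        a1 * c2 + c1 * a2 + 2 * (b1 * d2 + d1 * b2),         # sqrt2*sqrt6 = 2*sqrt3
        a1 * d2 + d1 * a2 + b1 * c2 + c1 * b2,               # sqrt2*sqrt3 = sqrt6
    )

def add(u, v):
    return tuple(x + y for x, y in zip(u, v))

def sigma(u):
    """sqrt2 -> -sqrt2, sqrt3 -> sqrt3 (hence sqrt6 -> -sqrt6)."""
    a, b, c, d = u
    return (a, -b, c, -d)

def tau(u):
    """sqrt2 -> sqrt2, sqrt3 -> -sqrt3 (hence sqrt6 -> -sqrt6)."""
    a, b, c, d = u
    return (a, b, -c, -d)

alpha = (0, 1, 1, 0)                       # sqrt2 + sqrt3
alpha2 = mul(alpha, alpha)
alpha4 = mul(alpha2, alpha2)
f_alpha = add(add(alpha4, tuple(-10 * t for t in alpha2)), (1, 0, 0, 0))

conjugates = [alpha, sigma(alpha), tau(alpha), sigma(tau(alpha))]
trace = (0, 0, 0, 0)
norm = (1, 0, 0, 0)
for c in conjugates:
    trace = add(trace, c)
    norm = mul(norm, c)

print(f_alpha, trace, norm)
```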

Our convention here is to compose automorphisms right-to-left, thus στ = σ ◦ τ. (In
other expositions where composition is left-to-right, this will generally be evident from the
superscript notation used for automorphisms, as in a^{στ} = (a^σ)^τ.)

Example A5.10: A Galois Extension Admitting the Dihedral Group of Order 8. The
polynomial f(x) = x^4 − 2 ∈ Z[x] is irreducible over Q by Eisenstein’s Criterion A2.4. Its roots are
±α, ±iα where i = √−1 and α = 2^{1/4}. The splitting field of f(x) over Q is therefore E = Q[α, i] =
Q[α, ζ] where ζ = ζ8 = (1+i)/√2. Note that E = K[i] where K = Q[α] so [E : Q] = [E : K][K : Q] =
2 · 4 = 8. The Galois group G = G(E/Q) = Aut E = ⟨σ, τ⟩ is dihedral of order 8 where τ is complex
conjugation; σ(i) = i and σ permutes the four roots of f(x) cyclically as α ↦ iα ↦ −α ↦ −iα ↦ α.
Thus G permutes the four vertices of a square embedded in the complex plane as shown:

[Figure: square in C with vertices α, iα, −α, −iα. τ reflects across the horizontal axis of
symmetry; σ rotates the four roots 90° counter-clockwise about the center.]

Let E ⊇ F be a Galois extension with Galois group G = G(E/F ). Galois theory gives
a beautiful description of all the intermediate fields K (i.e. E ⊇ K ⊇ F ), establishing a
one-to-one correspondence with the subgroups of G. A priori, it may not even be clear
why the number of intermediate fields K should even be finite, or whether there should
be any effective means of listing them all; but since G is a finite group, G has only
finitely many subgroups and these can be effectively enumerated, thereby giving the exact
number of subfields and their explicit description. This bijection, known as the Galois
correspondence, is naturally defined as follows:
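\[
\{\text{intermediate fields } K : E \supseteq K \supseteq F\} \;\longleftrightarrow\; \{\text{subgroups } H \le G\}
\]
\[
K \;\longmapsto\; G_K = \{\sigma \in G : \sigma(a) = a \text{ for all } a \in K\}
\]
\[
H \;\longmapsto\; \mathrm{Fix}_E(H) = \{a \in E : \sigma(a) = a \text{ for all } \sigma \in H\} = \text{fixed subfield of } H \text{ (in } E)
\]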

Theorem A5.11 (Fundamental Theorem of Galois Theory). Let E ⊇ F be
a Galois extension, with Galois group G. Then the correspondence defined above is
a bijection between the intermediate subfields K satisfying E ⊇ K ⊇ F, and the
subgroups H ≤ G. It satisfies
(i) The correspondence is order-reversing. Thus given intermediate subfields K, K′,
we have K ⊇ K′ iff G_K ≤ G_{K′}. Equivalently, given subgroups H, H′ ≤ G, we
have H ≤ H′ iff Fix_E(H) ⊇ Fix_E(H′).
(ii) Assuming containments as in (i), the subgroup index equals the degree of extension:
[K : K′] = [G_{K′} : G_K].
(iii) G_E = {ι}; G_F = G; Fix_E(G) = F; Fix_E({ι}) = E where ι = id ∈ G.
(iv) Assuming containments as in (i), normality for subgroup containment is equivalent
to normality for the corresponding field extension. That is, the extension
K ⊇ K′ is normal iff G_K ⊴ G_{K′}. In this case, K ⊇ K′ is Galois with group
G(K/K′) ≅ G_{K′}/G_K.

We illustrate the Galois correspondence in Examples A5.6–10 by presenting the Hasse
diagram of intermediate fields in each case, side by side with the Hasse diagram of
subgroups of the Galois group. Containment is depicted using vertical lines in each case.
Double lines indicate normality; and the vertical lines are labelled by the corresponding
index or degree. In each case, the Hasse diagram of subfields is obtained from the Hasse
diagram of subgroups by inverting top-to-bottom (but preserving left and right).

The following application of Galois theory is typical: we want to justify why certain
(given) elements of E lie in a desired subfield. Given a Galois extension E ⊇ F , Theo-
rem A5.11 says that an element a ∈ E is fixed by every element of G = G(E/F ) iff a ∈ F .
See Appendix A7 for symmetric multivariate polynomials.

Theorem A5.12. Let E ⊇ F be a Galois extension, and G = G(E/F) its Galois
group. For all α ∈ E, we have Σ_{σ∈G} σ(α) ∈ F and ∏_{σ∈G} σ(α) ∈ F. More generally,
all symmetric polynomials in the algebraic conjugates of α lie in F.

Proof. Let s(x1, x2, …, xn) ∈ F[x1, x2, …, xn] be a symmetric polynomial in n = |G|
indeterminates with coefficients in F. Denoting G = {σ1, σ2, …, σn}, the element
s(σ1(α), σ2(α), …, σn(α)) ∈ E is fixed by every σi ∈ G, since σi permutes the n arguments of s.
(These arguments are the algebraic conjugates of α, each listed the same number of times;
they are not assumed to be distinct.) By Theorem A5.11(iii),
s(σ1(α), σ2(α), …, σn(α)) ∈ Fix_E(G) = F.

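Theorem A5.13. Let K ⊇ F be a Galois extension with group G = G(K/F).
Then the norm and trace maps of the extension satisfy
\[
N_{K/F}\,\alpha = \prod_{\sigma\in G}\sigma(\alpha), \qquad
\mathrm{Tr}_{K/F}\,\alpha = \sum_{\sigma\in G}\sigma(\alpha) \qquad \text{for all } \alpha \in K.
\]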

Proof. Let f (x) ∈ F [x] be the minimal polynomial of α over F , and let n = deg f (x). Let
E = F [α], so that [E : F ] = n and [K : F ] = mn where m = [K : E]. Since the extension
K ⊇ F is Galois, f (x) splits into linear factors in K[x] and there exist τ1 , τ2 , . . . , τn ∈ G
such that τ1 (α), τ2 (α), . . . , τn (α) ∈ K are the roots of f (x). (Note that the roots do not
necessarily lie in E.) Now
\[
f(x) = \prod_{i=1}^{n} (x - \tau_i(\alpha)) = x^n - a_1 x^{n-1} + a_2 x^{n-2} - \cdots + (-1)^n a_n \in F[x].
\]

By Theorem A5.11, |G_E| = m and [G : G_E] = n where G_E is the set of all σ ∈ G fixing
every element of E. The n left cosets of G_E in G must be τ1G_E, τ2G_E, …, τnG_E since
τiG_E ∩ τjG_E = ∅ whenever i ≠ j (since the images τi(σ(α)) = τi(α) are distinct for
i = 1, 2, …, n where σ ∈ G_E). Thus
\[
\prod_{\sigma\in G}\sigma(\alpha) = \prod_{i=1}^{n}\prod_{\sigma\in G_E}\tau_i(\sigma(\alpha))
= \prod_{i=1}^{n}\tau_i(\alpha)^m = a_n^m = (N_{E/F}\,\alpha)^m = N_{K/F}\,\alpha
\]
and
\[
\sum_{\sigma\in G}\sigma(\alpha) = \sum_{i=1}^{n}\sum_{\sigma\in G_E}\tau_i(\sigma(\alpha))
= m\sum_{i=1}^{n}\tau_i(\alpha) = m a_1 = m\,\mathrm{Tr}_{E/F}\,\alpha = \mathrm{Tr}_{K/F}\,\alpha
\]
by Corollary A1.10.

Corollary A5.14. Let E ⊇ F be a Galois extension, and let α ∈ E. Then all
algebraic conjugates of α have the same trace; and they all have the same norm.

Proof. Let G = G(E/F). Every algebraic conjugate of α has the form τ(α) ∈ E for some
τ ∈ G. Then
\[
\mathrm{Tr}_{E/F}(\tau(\alpha)) = \sum_{\sigma\in G}\sigma(\tau(\alpha)) = \sum_{\rho\in G}\rho(\alpha) = \mathrm{Tr}_{E/F}(\alpha)
\]
by Theorem A5.13, after substituting ρ = στ. The argument for norms is similar.

Now consider a Galois extension E ⊇ F of degree n with group G = G(E/F). A
normal basis for E over F is a basis B = {β1, β2, …, βn} for E over F which is permuted
transitively by G, i.e. B = {σ(β) : σ ∈ G} where β = β1. The size |G| = n is just right
to make this seem possible. For instance, if we choose β ∈ Q[i] to be neither real nor pure
imaginary, then {β, β̄} is a normal basis for the extension Q[i] ⊃ Q, where β̄ denotes the
complex conjugate of β. This example shows that it is not sufficient to take β to satisfy
E = F[β] (consider β = i); but, a ‘random’ (or generic) choice of β ∈ E seems like it should
work. Nevertheless, it is tricky to prove the existence of a normal basis in general!

Theorem A5.15 (Normal Basis Theorem). Let E ⊇ F be a Galois extension
satisfying the hypotheses of Theorem A5.1. Then there exists a normal basis for E
over F.

Proof. First consider the case that E = F_{q^n}, F = F_q. By Theorem 3.8, G = G(E/F) =
{ι, σ, σ^2, …, σ^{n−1}} where σ(x) = x^q and σ^n = ι. Regarding σ as an F-linear transformation
E → E at the moment, its minimal polynomial m(x) ∈ F[x] must divide x^n − 1. But if
deg m(x) < n, this would give a nontrivial F-linear combination of ι, σ, σ^2, …, σ^{n−1} equal
to zero, contrary to Theorem A5.2. This cannot happen; so deg m(x) = n. This means
that m(x) coincides with the characteristic polynomial of σ on E. Thus E is a cyclic
F[σ]-module, i.e. there exists β ∈ E such that {β, σ(β), σ^2(β), …, σ^{n−1}(β)} spans E over F,
thereby forming a normal basis as required. See e.g. [HH, Chapter 11] for relevant results
from linear algebra.
It remains to consider the case E and F have characteristic zero; in particular they
are infinite fields. Here we paraphrase Artin’s proof [Ar]. By Theorem A5.1, E = F [α]
for some α ∈ E. Let f (x) ∈ F [x] be the minimal polynomial of α over F , so that
deg f(x) = n = [E : F] = |G| and f(x) = ∏_{σ∈G} (x − σ(α)). For each τ ∈ G consider the
polynomial
\[
g_\tau(x) = \prod_{\substack{\sigma\in G\\ \sigma\ne\tau}} \frac{x - \sigma(\alpha)}{\tau(\alpha) - \sigma(\alpha)} \in E[x]
\]

of degree n − 1 (noting that the n roots of f (x) are distinct so there is no division by zero
here). The reader may recognize these n polynomials as the Lagrange interpolation basis
for the polynomials of degree n − 1 at the roots of f (x): the polynomial gτ (x) vanishes
at all the roots of f (x) except at τ (α), where it has value 1 (see Theorem 3.12). Now it
follows that
\[
\sum_{\tau\in G} g_\tau(x) = 1 \tag{A5.16}
\]

since the polynomial on the left has degree at most n − 1, but by the preceding comments
it evaluates to 1 at each of the n distinct roots of f (x). Also in E[x] we have

\[
g_\tau(x)\,g_\rho(x) \equiv
\begin{cases}
0 \pmod{f(x)}, & \text{if } \tau \ne \rho \text{ in } G;\\
g_\tau(x) \pmod{f(x)}, & \text{if } \tau = \rho \text{ in } G.
\end{cases}
\tag{A5.17}
\]

The first congruence follows since g_τ(x)g_ρ(x) vanishes at all of the n roots of f(x) whenever
τ ≠ ρ. When τ = ρ, multiplying both sides of (A5.16) by g_τ(x) yields g_τ(x)^2 ≡
g_τ(x) mod f(x). Considering now the action of G on E[x] via its natural action on
coefficients, one easily finds that

(A5.18) G permutes the n polynomials gτ (x) for τ ∈ G, in the same way that G
permutes the n roots of f (x) (i.e. they are equivalent G-sets). In fact,
σ(gτ (x)) = gστ (x) for all σ, τ ∈ G.

Now consider the n × n matrix M(x) with rows and columns indexed by elements of G,
having (σ, τ)-entry equal to the polynomial g_{στ}(x) ∈ E[x]. Since M(x) is an n × n matrix
with entries in E[x], its determinant is also a polynomial in x (in fact, of degree at
most n(n − 1)). We will show that det M(x) ≠ 0, by showing that det(M(x)^T M(x)) =
(det M(x))^2 ≡ 1 mod f(x). The (σ, τ)-entry of M(x)^T M(x) is
\[
\sum_{\rho\in G} g_{\rho\sigma}(x)\,g_{\rho\tau}(x) = \sum_{\rho\in G} \rho\bigl(g_\sigma(x)\,g_\tau(x)\bigr) \equiv 0 \pmod{f(x)}
\]
if σ ≠ τ, by (A5.17); whereas for σ = τ,
\[
\sum_{\rho\in G} g_{\rho\sigma}(x)\,g_{\rho\sigma}(x) = \sum_{\rho\in G} g_\rho(x)^2 \equiv \sum_{\rho\in G} g_\rho(x) \equiv 1 \pmod{f(x)}.
\]

Thus in E[x] we have (det M(x))^2 ≡ 1 mod f(x) and, in particular, det M(x) is a nonzero
polynomial. Since F is an infinite field, there exists a ∈ F such that det M(a) ≠ 0. Take
B = {g_τ(a) : τ ∈ G}. For all σ ∈ G we have σ(a) = a and so σ(g_τ(a)) = g_{στ}(a); thus G acts
on B. It remains to be shown that B is a basis for E over F. Suppose that Σ_{τ∈G} c_τ g_τ(a) = 0
for some constants c_τ ∈ F. Applying an arbitrary σ ∈ G to this equation yields
\[
0 = \sum_{\tau\in G} \sigma\bigl(c_\tau\, g_\tau(a)\bigr) = \sum_{\tau\in G} g_{\sigma\tau}(a)\,c_\tau
\]
so the vector (c_τ : τ ∈ G) is in the null space of the nonsingular matrix M(a). This forces
c_τ = 0 for all τ, so B is a basis.
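
For a concrete instance of the finite field case, one can search a small field exhaustively. The following Python sketch is an illustration under stated assumptions, not a construction from the notes: it models F_8 as F_2[x]/(x^3 + x + 1), stores elements as 3-bit integers, and tests each β for whether {β, β^2, β^4} spans F_8 over F_2. The names gf_mul and is_normal are hypothetical.

```python
# F_8 = F_2[x]/(x^3 + x + 1); an element b0 + b1*x + b2*x^2 is the integer with bits b2 b1 b0.
MODULUS = 0b1011   # x^3 + x + 1 (assumed choice of irreducible cubic)
DEGREE = 3

def gf_mul(u, v):
    """Carry-less multiplication of u and v, reduced modulo MODULUS."""
    result = 0
    while v:
        if v & 1:
            result ^= u          # addition in characteristic 2 is XOR
        v >>= 1
        u <<= 1
        if (u >> DEGREE) & 1:
            u ^= MODULUS         # reduce as soon as degree 3 appears
    return result

def is_normal(beta):
    """True iff beta, sigma(beta), sigma^2(beta) are linearly independent over F_2,
    where sigma is the Frobenius map x -> x^2."""
    conjugates = [beta]
    for _ in range(DEGREE - 1):
        conjugates.append(gf_mul(conjugates[-1], conjugates[-1]))
    span = {0}
    for c in conjugates:
        span |= {s ^ c for s in span}    # all F_2-linear combinations so far
    return len(span) == 2 ** DEGREE

normal_elements = [beta for beta in range(2 ** DEGREE) if is_normal(beta)]
print(normal_elements)
```

With this modulus the class of x itself (the integer 2) is not normal, since x + x^2 + x^4 = 0 there; so a generator of the field need not generate a normal basis, in line with the remark preceding Theorem A5.15.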
Appendix A6: Dedekind Zeta Functions and Dirichlet Series

Let E be a number field with ring of integers O = O_E. The Dedekind zeta function
of E is the complex-valued function
\[
\zeta_E(s) = \sum_{0 \ne A \subseteq O} \frac{1}{N(A)^s}
\]
where the sum extends over all nonzero ideals A ⊆ O, and N(A) = |O/A| is the norm of A.
The series converges for complex numbers s with Re(s) > 1; but by analytic continuation,
the function has a meromorphic extension to C with a simple pole at s = 1. It has an
Euler factorization given by
\[
\zeta_E(s) = \prod_{P} \Bigl(1 - \frac{1}{N(P)^s}\Bigr)^{-1},
\]
also convergent for Re(s) > 1; the product extends over all nonzero prime ideals P ⊂ O.
The theorem equating the infinite series with the infinite product can readily be seen as
an algebraic reformulation of the fact that every nonzero ideal A ⊆ O factors uniquely as
a product of prime ideals (although the details relating to convergence require a little more
care than we provide here).

Example A6.1: The Riemann Zeta Function. For E = Q, the Dedekind zeta function coincides
with the Riemann Zeta Function: ζ_Q(s) = ζ(s) = Σ_{n=1}^{∞} 1/n^s. Its Euler factorization is
ζ(s) = ∏_p (1 − p^{−s})^{−1} where the product extends over all rational primes p.
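
As a numerical illustration of Example A6.1 (a sketch only, with an arbitrarily chosen cutoff of 10000), the following Python code compares a partial sum of the series and a partial Euler product at s = 2 against the known value ζ(2) = π^2/6.

```python
import math

def primes_up_to(limit):
    """Sieve of Eratosthenes returning all primes <= limit."""
    is_prime = [True] * (limit + 1)
    is_prime[0] = is_prime[1] = False
    for p in range(2, int(limit ** 0.5) + 1):
        if is_prime[p]:
            for multiple in range(p * p, limit + 1, p):
                is_prime[multiple] = False
    return [p for p, flag in enumerate(is_prime) if flag]

s = 2
cutoff = 10000   # arbitrary truncation point for both the sum and the product

# Truncated Dirichlet series: sum of 1/k^s for k <= cutoff.
partial_sum = sum(1.0 / k ** s for k in range(1, cutoff + 1))

# Truncated Euler product: product of (1 - p^(-s))^(-1) over primes p <= cutoff.
euler_product = 1.0
for p in primes_up_to(cutoff):
    euler_product /= 1.0 - p ** (-s)

exact = math.pi ** 2 / 6
print(partial_sum, euler_product, exact)
```

Both truncations approach the same limit; the product converges somewhat faster here because it already accounts for every integer whose prime factors are all at most the cutoff.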

The zeta functions of Dedekind are the most typical zeta functions of number theory;
and although not required in our Section 12, this appendix is intended to provide motiva-
tional context for our discussion there. Here the student may see something of the larger
role of zeta functions for studying the distribution of primes in Dedekind domains. An-
other reason for including this Appendix is to provide an additional application (Dirichlet’s
Theorem A6.2 below) of the character theory of finite abelian groups of Section 6.
A Dedekind domain is an integral domain in which every nonzero ideal factors
uniquely as a product of prime ideals. The two main examples are the ring OE of integers
in a number field E; and the ring of polynomials OE = F [x1 , x2 , . . . , xn ] in a function
field E = F (x1 , x2 , . . . , xn ). In both cases E is the field of fractions of OE . Questions
regarding the distribution of primes in OE are best studied by rephrasing them in terms
of the behavior of ζE (s) (particularly the zeroes and poles of this zeta function). Often
these questions are too difficult to solve in the number field case (witness the Riemann
hypothesis); and then one turns to the function field case where the questions are typically

more manageable, hoping for inspiration that might apply in the number field case. Thus
for example, the very precise formula of Theorem 3.13 counting irreducible polynomials
of each degree (and thereby prime ideals of a given norm) in Fq [x], has a clear analogue
for the prime-counting function π(x) = |{prime p ∈ N : p ≤ x}| which we can state as a
conjecture, but are currently unable to prove except in a weaker asymptotic sense.

Theorem A6.2 (Dirichlet). Let a and N be relatively prime positive integers.


Then there exist infinitely many (rational) primes p ≡ a mod N .

Let G = (Z/NZ)^×, the multiplicative group of units of the ring of integers mod N; so
|G| = n := φ(N). For each character χ ∈ Ĝ, compose χ with the canonical projection
k ↦ k + NZ in order to lift χ to a map χ : Z → Z/NZ → C. We write χ(k) = 0 whenever
gcd(k, N) ≠ 1; and χ(k) ∈ ⟨ζn⟩ as before, if gcd(k, N) = 1. This extension of χ ∈ Ĝ to a
function Z → C, while not exactly a linear character as defined in Section 6, is completely
multiplicative (i.e. χ(kℓ) = χ(k)χ(ℓ) for all k, ℓ ∈ Z). It is called a Dirichlet character
modulo N. Each character χ ∈ Ĝ (lifted to Z) yields a Dirichlet L-function
\[
L_\chi(s) = \sum_{k=1}^{\infty} \frac{\chi(k)}{k^s}, \qquad \text{where } s \in \mathbb{C}. \tag{A6.3}
\]

As with the Riemann zeta function of Example A6.1, the series (A6.3) converges for Re s > 1
but admits an analytic continuation to a meromorphic function on C ∖ {1}. And for exactly
the same reasons as in the zeta function case, we obtain an Euler factorization
\[
L_\chi(s) = \prod_{p} \Bigl(1 - \frac{\chi(p)}{p^s}\Bigr)^{-1},
\]
convergent at least for Re s > 1. (Here and throughout, the index p varies over all rational
primes.) While the function L_χ(s) has complex values in general, we can safely restrict
s to real values > 1 for the argument at hand. Here, all Euler factors have values in the
right half-plane where we can take the standard branch of natural logarithm; thus
\[
\ln L_\chi(x) = -\sum_{p} \ln\Bigl(1 - \frac{\chi(p)}{p^x}\Bigr), \qquad \text{for } x > 1.
\]

We require the Taylor expansion of each of these terms, found by integrating
1/(1 − u) = 1 + u + u^2 + u^3 + ⋯ (for |u| < 1) to obtain
\[
-\ln(1-u) = u + \frac{u^2}{2} + \frac{u^3}{3} + \frac{u^4}{4} + \cdots = \sum_{k=1}^{\infty} \frac{u^k}{k}, \qquad \text{for } |u| < 1.
\]

Using this in the previous formula gives the series expansion


\[
\ln L_\chi(x) = \sum_{p}\sum_{k=1}^{\infty} \frac{\chi(p)^k}{k\,p^{kx}}
= \sum_{p} \frac{\chi(p)}{p^x} + \sum_{p}\sum_{k=2}^{\infty} \frac{\chi(p)^k}{k\,p^{kx}}. \tag{A6.4}
\]

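The dominant terms in (A6.4) are those in the first sum Σ_p χ(p)/p^x. To see that the remaining
terms are small (their total contribution is uniformly bounded for all x > 1), we note that
\[
\Bigl|\sum_{p}\sum_{k=2}^{\infty} \frac{\chi(p)^k}{k\,p^{kx}}\Bigr|
\le \sum_{p}\sum_{k=2}^{\infty} \frac{1}{k\,p^{kx}}
\le \sum_{p} \frac{1}{(p^x)^2}
\le \sum_{r=2}^{\infty} \frac{1}{r^2} = \frac{\pi^2}{6} - 1 < 1 \qquad \text{for all } x > 1.
\]
Here the second inequality uses Σ_{k≥2} 1/(k p^{kx}) ≤ (1/2) p^{−2x}/(1 − p^{−x}) ≤ p^{−2x}, which
holds because p^x ≥ 2.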

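Using orthogonality of characters from Theorem 6.2(b), we have
\[
\sum_{\chi\in\widehat{G}} \overline{\chi(a)}\,\chi(k) =
\begin{cases}
n, & \text{if } k \equiv a \bmod N;\\
0, & \text{otherwise}
\end{cases}
\qquad \text{for all } k \in \mathbb{Z}. \tag{A6.5}
\]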

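When applied to (A6.4), this yields
\[
\sum_{\chi\in\widehat{G}} \overline{\chi(a)}\,\ln L_\chi(x)
= \sum_{p}\sum_{\chi\in\widehat{G}} \frac{\overline{\chi(a)}\,\chi(p)}{p^x} + O(1)
= n \sum_{p \equiv a \bmod N} \frac{1}{p^x} + O(1) \quad \text{as } x \to 1^+ \tag{A6.6}
\]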

using (A6.5) for all terms with gcd(k, N )=1; and we recall that the terms with gcd(k, N )>1
give χ(k) = 0. Here ‘O(1)’ stands for terms that are uniformly bounded (it has absolute
value at most n, whatever the value of x > 1; this follows from (A6.4) and the estimate
which follows it). We see that the Dirichlet characters succeed in filtering out individual
congruence classes within the sequence of primes, thereby bringing us closer to our goal.
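
The filtering property (A6.5) is easy to observe numerically. The sketch below makes simplifying assumptions that are not in the text: it takes N = 5, so that G = (Z/5Z)^× is cyclic with generator 2, and builds the n = 4 characters by sending that generator to the powers of i. The function names chi and class_filter are hypothetical.

```python
import cmath

N = 5        # modulus (assumed small prime, so the unit group is cyclic)
g = 2        # assumed generator of (Z/5Z)^x; verified by the table below
n = N - 1    # n = phi(N) = |G|

# Discrete logarithm table: dlog[g^m mod N] = m for m = 0, ..., n-1.
dlog = {pow(g, m, N): m for m in range(n)}

def chi(idx, k):
    """Dirichlet character number idx mod N: chi(g^m) = exp(2*pi*i*idx*m/n),
    extended by 0 to integers k not coprime to N."""
    k %= N
    if k not in dlog:
        return 0
    return cmath.exp(2j * cmath.pi * idx * dlog[k] / n)

def class_filter(a, k):
    """Left side of (A6.5): sum over all characters of conj(chi(a)) * chi(k)."""
    return sum(chi(idx, a).conjugate() * chi(idx, k) for idx in range(n))

for a in (1, 2, 3, 4):
    row = [round(abs(class_filter(a, k))) for k in range(1, 11)]
    print(a, row)
```

Each printed row is 4 exactly in the positions k ≡ a mod 5 and 0 elsewhere, which is how the sum over characters isolates a single congruence class.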
We now investigate the behaviour of each of the functions L_χ(x) as x → 1+. We first
show that

(A6.7)    for the trivial character χ ∈ Ĝ, we have L_χ(x) → ∞ as x → 1+.

For in this case
\[
L_\chi(x) = \sum_{\gcd(k,N)=1} \frac{1}{k^x} \ge \sum_{k \equiv 1 \bmod N} \frac{1}{k^x}
\ge \sum_{r=1}^{\infty} \frac{1}{(rN+1)^x} \to \infty \quad \text{as } x \to 1^+
\]
by comparison with
\[
\int_{1}^{\infty} \frac{dt}{(tN+1)^x} = \frac{1}{(x-1)\,N\,(N+1)^{x-1}} \to \infty \quad \text{as } x \to 1^+.
\]

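Now consider an arbitrary nontrivial character χ ∈ Ĝ; we must show that L_χ(x)
remains bounded as x → 1+. In this case we break up the positive integers into intervals
of size N, thus:
\[
\begin{aligned}
L_\chi(x) = \sum_{k=1}^{\infty} \frac{\chi(k)}{k^x}
&= \sum_{r=0}^{\infty}\sum_{k=1}^{N} \frac{\chi(k)}{(rN+k)^x}\\
&= \sum_{r=0}^{\infty}\Bigl\{\sum_{k=1}^{N} \frac{\chi(k)}{(rN+N)^x}
 + \sum_{k=1}^{N} \chi(k)\Bigl[\frac{1}{(rN+k)^x} - \frac{1}{(rN+N)^x}\Bigr]\Bigr\}\\
&= \sum_{r=0}^{\infty}\sum_{k=1}^{N} \chi(k)\Bigl[\frac{1}{(rN+k)^x} - \frac{1}{(rN+N)^x}\Bigr]
\end{aligned}
\tag{A6.8}
\]
since Σ_{k=1}^{N} χ(k) = 0 by orthogonality of characters from Theorem 6.2(a). Now bound the
inner sum by
\[
\Bigl|\sum_{k=1}^{N} \chi(k)\Bigl[\frac{1}{(rN+k)^x} - \frac{1}{(rN+N)^x}\Bigr]\Bigr|
\le \sum_{k=1}^{N} \Bigl[\frac{1}{(rN+k)^x} - \frac{1}{(rN+N)^x}\Bigr]
\le N\Bigl[\frac{1}{(rN+1)^x} - \frac{1}{(rN+N)^x}\Bigr].
\tag{A6.9}
\]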

Denoting f(u) = u^{−x} for u > 0, the Mean Value Theorem yields
\[
f(rN+1) - f(rN+N) = f'(\xi)(1 - N) = (N-1)\,x\,\xi^{-x-1} \le \frac{(N-1)x}{(rN+1)^{x+1}} \le \frac{(N-1)x}{(rN+1)^2}
\]
for some ξ between rN+1 and rN+N, where x > 1. Using this in (A6.9) and substituting
into (A6.8) gives
\[
|L_\chi(x)| \le N(N-1)x \sum_{r=0}^{\infty} \frac{1}{(rN+1)^2} \le N(N-1)x \sum_{\ell=1}^{\infty} \frac{1}{\ell^2} = \frac{N(N-1)x\pi^2}{6}.
\]

This finally gives

(A6.10)    for every nontrivial character χ ∈ Ĝ, |L_χ(x)| remains bounded as x → 1+.

We are now ready to prove Theorem A6.2, arguing by contradiction. Suppose there
are only finitely many primes p ≡ a mod N. Then the right side of (A6.6) is bounded as
x → 1+ (because the O(1) terms are bounded; and the other sum converges to the sum of 1/p
over the primes p ≡ a mod N, a finite sum by assumption). Therefore the left side of (A6.6)
must also remain bounded as x → 1+. The terms of (A6.6) for nontrivial χ remain bounded
as x → 1+, by (A6.10) together with the fact, not proved here, that L_χ(1) ≠ 0 for every
nontrivial χ (so that ln L_χ(x) stays bounded below as well as above). However for the
trivial character χ, the term |χ(a) ln L_χ(x)| → ∞ as x → 1+, by (A6.7). This is the desired
contradiction; so Theorem A6.2 follows.
Appendix A7: Symmetric Polynomials

Let F be a field. A multivariate polynomial s(x1, x2, …, xn) ∈ F[x1, x2, …, xn] is
symmetric if it is unchanged under all n! permutations of the coordinates. Examples include
the elementary symmetric polynomials
\[
e_k = e_k(x_1, x_2, \dots, x_n) = \sum_{1 \le i_1 < i_2 < \cdots < i_k \le n} x_{i_1} x_{i_2} \cdots x_{i_k}, \qquad k \in \{0, 1, 2, \dots, n\}.
\]
Note that e_k(x1, …, xn) has (n choose k) terms, these being the products of all k-subsets of the n
indeterminates. In particular,
\[
e_0(x_1,\dots,x_n) = 1, \qquad e_1(x_1,\dots,x_n) = x_1 + x_2 + \cdots + x_n, \qquad e_n(x_1,\dots,x_n) = x_1 x_2 \cdots x_n
\]
and e_k = 0 for k ∉ {0, 1, 2, …, n}. From the definition, one readily deduces the identity
\[
\prod_{i=1}^{n} (t - x_i) = t^n - e_1 t^{n-1} + e_2 t^{n-2} - \cdots + (-1)^n e_n
\]

in F [x1 , . . . , xn , t], and so this product serves as a generating function for the elementary
symmetric polynomials. It also shows that the coefficients in any univariate polynomial
are (up to signs) the elementary symmetric polynomials in its roots.
Another important set of symmetric polynomials is the set of moment polynomials
or power sum polynomials
\[
m_k = m_k(x_1, x_2, \dots, x_n) = x_1^k + x_2^k + \cdots + x_n^k, \qquad k \in \{0, 1, 2, \dots\}
\]
(and in particular m_0 = n). A famous set of relations allows us to recursively express
the moment polynomials in terms of the elementary symmetric polynomials (and often
conversely, but see the later comments):

Theorem A7.1 (Newton’s Identities). For all k ≥ 1,
\[
\begin{aligned}
e_1 &= m_1\\
2e_2 &= m_1 e_1 - m_2\\
3e_3 &= m_1 e_2 - m_2 e_1 + m_3\\
&\;\;\vdots\\
k e_k &= \sum_{i=1}^{k} (-1)^{i+1} m_i\, e_{k-i}
\end{aligned}
\]

Proof. In the field F((x1, x2, …, xn, t)) we have
\[
\begin{aligned}
\Bigl(\sum_{i=0}^{\infty} m_i t^i\Bigr)\Bigl(\sum_{j=0}^{n} (-1)^j e_j t^j\Bigr)
&= \Bigl(\sum_{i=1}^{n} \frac{1}{1 - x_i t}\Bigr) \prod_{j=1}^{n} (1 - x_j t)\\
&= \sum_{i=1}^{n} \prod_{\substack{1 \le j \le n\\ j \ne i}} (1 - x_j t)
 = \sum_{j=0}^{n-1} (-1)^j (n - j)\, e_j t^j.
\end{aligned}
\]
The last equality holds because in the expansion of Σ_i ∏_{j≠i} (1 − x_j t), every monomial of
the form (−1)^j x_{i1} x_{i2} ⋯ x_{ij} t^j appears n − j times (once for every index i ∉ {i1, i2, …, ij}).
Comparing coefficients of like powers of t on both sides gives the required identities.

Example A7.2: Computing Characteristic Polynomials. We compute the characteristic
polynomial f(t) = det(tI − A) ∈ F7[t] of the 4 × 4 matrix
\[
A = \begin{pmatrix}
3 & 1 & 4 & 2\\
2 & 5 & 5 & 3\\
0 & 6 & 3 & 4\\
6 & 2 & 1 & 5
\end{pmatrix}
\]
over F7 which was generated randomly. The coefficients in f(t) = t^4 − e1t^3 + e2t^2 − e3t + e4 are
elementary symmetric polynomials in the eigenvalues; and these in turn are expressible in terms of the
moments of the spectrum, these being just the traces m_k = tr(A^k) = 4, 2, 1, 3, 4 for k = 0, 1, 2, 3, 4.
By Newton’s identities we find e_k = 1, 2, 5, 6, 4 for k = 0, 1, 2, 3, 4, giving f(t) = t^4 + 5t^3 + 5t^2 + t + 4.
The utility of this approach for determining characteristic polynomials lies in the simplicity of
implementing matrix powers in a variety of computing languages. When working over Q (or R), this
method requires care due to the growth of matrix entries (or roundoff error); but over a fixed finite
field, this is never a concern.
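
The computation of Example A7.2 can be reproduced in a few lines. The following Python sketch is one possible implementation, not the author's code: it forms the traces of the powers of A mod 7 and then solves Newton's identities for e_1, …, e_4. It assumes Python 3.8 or later for the modular inverse pow(k, -1, p), and it relies on k being invertible mod p for every k ≤ n, which holds here since n = 4 < 7.

```python
p = 7
A = [
    [3, 1, 4, 2],
    [2, 5, 5, 3],
    [0, 6, 3, 4],
    [6, 2, 1, 5],
]
n = len(A)

def mat_mul(X, Y):
    """Product of two n x n matrices with entries reduced mod p."""
    return [[sum(X[i][l] * Y[l][j] for l in range(n)) % p for j in range(n)]
            for i in range(n)]

# Moments m_k = tr(A^k) mod p for k = 0, ..., n (with m_0 = n).
moments = [n % p]
power = [[int(i == j) for j in range(n)] for i in range(n)]   # identity matrix
for k in range(1, n + 1):
    power = mat_mul(power, A)
    moments.append(sum(power[i][i] for i in range(n)) % p)

# Newton's identities: k*e_k = sum_{i=1}^{k} (-1)^(i+1) * m_i * e_(k-i).
e = [1]
for k in range(1, n + 1):
    rhs = sum((-1) ** (i + 1) * moments[i] * e[k - i] for i in range(1, k + 1))
    e.append(rhs * pow(k, -1, p) % p)

# f(t) = t^n - e_1 t^(n-1) + e_2 t^(n-2) - ..., coefficients listed from t^n down to t^0.
coefficients = [(-1) ** k * e[k] % p for k in range(n + 1)]
print(moments, e, coefficients)
```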

Newton’s identities show that the moment polynomials can be recursively expressed
as polynomials in $e_1, e_2, \dots, e_n$ with integer coefficients, i.e. $m_k \in \mathbb{Z}[e_1, e_2, \dots, e_n]$ for all
$k \ge 0$. A foundational result in classical invariant theory shows that much more generally,
every symmetric polynomial in $x_1, x_2, \dots, x_n$ is expressible as a polynomial in the
elementary symmetric polynomials (with coefficients in F). This says that the subring
of $F[x_1, x_2, \dots, x_n]$ consisting of all polynomials invariant under the full symmetric group
$S_n$ is exactly the subring $F[e_1, e_2, \dots, e_n]$. The moment polynomials generate a subring
of the ring of all symmetric polynomials, i.e. $F[m_1, m_2, \dots, m_n] \subseteq F[e_1, e_2, \dots, e_n]$. In
characteristic zero, equality holds, as can be seen from Newton’s identities: in characteristic
zero we can solve for $e_k = \frac{1}{k} \sum_{i=1}^{k} (-1)^{i+1} m_i e_{k-i}$ and thereby recursively express
$e_1, e_2, \dots, e_n$ in terms of the moment polynomials. Similarly in positive characteristic p,
Theorem A7.1 allows us to express the elementary symmetric polynomials $e_k$ in terms of
the moments, as long as $k \not\equiv 0 \bmod p$.

Although Newton’s identities give a very fast and practical recursive method for gen-
erating the moment polynomials from the sequence of elementary symmetric polynomials,
sometimes it is preferable to have instead a more explicit formula. In such cases we use

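Theorem A7.3 (Waring’s Formula). For $k \ge 1$, $m_k \in \mathbb{Z}[e_1, e_2, \dots, e_n]$ is given by

\[
m_k = \sum_{\substack{i_1, i_2, \dots, i_n \ge 0 \\ i_1 + 2i_2 + 3i_3 + \cdots + n i_n = k}}
  \frac{k\,(i_1 + i_2 + \cdots + i_n - 1)!}{i_1!\, i_2! \cdots i_n!}\;
  e_1^{i_1} (-e_2)^{i_2} e_3^{i_3} (-e_4)^{i_4} \cdots \bigl((-1)^{n+1} e_n\bigr)^{i_n} .
\]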

Before proving this formula, some remarks bear mention. General results of invariant
theory tell us that this expansion is unique (there can be no more than one way to express
$m_k$ in terms of the elementary symmetric polynomials, since $e_1, e_2, \dots, e_n$ are algebraically
independent in $\overline{F}(x_1, x_2, \dots, x_n) \supset F$, where $\overline{F}$ is the algebraic closure of F). As indicated
already, $m_k \in \mathbb{Z}[e_1, e_2, \dots, e_n]$ as follows by induction using Newton’s identities; therefore
the coefficients in Waring’s Formula must also be integers. Note that Waring’s Formula
expresses $m_k$ in terms of $e_1, e_2, \dots, e_\nu$ only, where $\nu = \min\{k, n\}$; this is because $e_j = 0$
for $j > n$, and the constraints on the indices $i_1, i_2, \dots, i_n$ implicitly require that $i_j = 0$
whenever $j > k$; moreover $i_k \in \{0, 1\}$, and the only term with $i_k = 1$ is $(-1)^{k+1} k e_k$. This
yields the following, which we use in Section 16:

Corollary A7.4. For all $k \ge 1$, $m_k + (-1)^k k e_k \in \mathbb{Z}[e_1, e_2, \dots, e_{k-1}]$.

It is not too hard to infer this result directly from Newton’s identities. Of course when
k > n, Corollary A7.4 reduces to the statement mk ∈ Z[e1 , e2 , . . . , en ] which we have
already seen.
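As a quick illustration, the first few cases of Waring’s Formula can be written out by listing the index tuples with $i_1 + 2i_2 + \cdots + n i_n = k$. The display below assumes $n \ge 4$; for smaller n, set $e_j = 0$ for $j > n$.

```latex
\begin{align*}
m_1 &= e_1, \\
m_2 &= e_1^2 - 2e_2, \\
m_3 &= e_1^3 - 3e_1 e_2 + 3e_3, \\
m_4 &= e_1^4 - 4e_1^2 e_2 + 2e_2^2 + 4e_1 e_3 - 4e_4 .
\end{align*}
```

In each line, $m_k + (-1)^k k e_k$ involves only $e_1, \dots, e_{k-1}$, in agreement with Corollary A7.4; for example $m_4 + 4e_4 = e_1^4 - 4e_1^2 e_2 + 2e_2^2 + 4e_1 e_3$.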
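Proof of Theorem A7.3. Reversing the list of coefficients in f(t) gives the identity
\[
\prod_{i=1}^{n} (1 - x_i t) = 1 - e_1 t + e_2 t^2 - \cdots + (-1)^n e_n t^n
\]
in $\mathbb{Z}[x_1, \dots, x_n, t]$. Now in $\mathbb{Q}((x_1, x_2, \dots, x_n, t))$ we obtain the identity
\begin{align*}
\sum_{j=1}^{\infty} \frac{m_j}{j}\, t^j
  &= \sum_{i=1}^{n} \sum_{j=1}^{\infty} \frac{x_i^j}{j}\, t^j
   = -\sum_{i=1}^{n} \ln(1 - x_i t) \\
  &= -\ln\bigl(1 - e_1 t + e_2 t^2 - \cdots + (-1)^n e_n t^n\bigr) \\
  &= \sum_{k=1}^{\infty} \frac{\bigl(e_1 t - e_2 t^2 + e_3 t^3 - \cdots + (-1)^{n+1} e_n t^n\bigr)^k}{k} \\
  &= \sum_{\substack{i_1, i_2, \dots, i_n \ge 0 \\ i_1 + i_2 + \cdots + i_n \ge 1}}
     \binom{i_1 + i_2 + \cdots + i_n}{i_1, i_2, \dots, i_n}\,
     e_1^{i_1} (-e_2)^{i_2} e_3^{i_3} \cdots \bigl((-1)^{n+1} e_n\bigr)^{i_n}\,
     \frac{t^{\,i_1 + 2i_2 + 3i_3 + \cdots + n i_n}}{i_1 + i_2 + \cdots + i_n} ,
\end{align*}
where the last step expands each k-th power by the multinomial theorem.
Comparing coefficients of $t^k$ on both sides, and multiplying by k, gives Waring’s formula.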
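Waring’s Formula is easy to test numerically. The sketch below is a brute-force illustration in Python, with the hypothetical helper names `elementary` and `waring_moment`. It enumerates all index tuples, keeps those with $i_1 + 2i_2 + \cdots + n i_n = k$, and sums the corresponding terms; the result can then be compared with the power sum computed directly from a sample point.

```python
from itertools import combinations, product
from math import factorial, prod  # math.prod needs Python 3.8+

def elementary(xs):
    """Return [e_1, ..., e_n] evaluated at the point xs."""
    n = len(xs)
    return [sum(prod(c) for c in combinations(xs, j)) for j in range(1, n + 1)]

def waring_moment(k, e):
    """Evaluate m_k (k >= 1) from e = [e_1, ..., e_n] via Waring's Formula."""
    n = len(e)
    total = 0
    # i_j can be at most k // j, since j * i_j <= k
    for idx in product(*(range(k // j + 1) for j in range(1, n + 1))):
        if sum(j * i for j, i in enumerate(idx, start=1)) != k:
            continue
        s = sum(idx)  # s >= 1 because k >= 1
        denom = 1
        for i in idx:
            denom *= factorial(i)
        coeff, rem = divmod(k * factorial(s - 1), denom)
        assert rem == 0  # the coefficients are integers, as remarked in the text
        term = coeff
        for j, i in enumerate(idx, start=1):
            term *= ((-1) ** (j + 1) * e[j - 1]) ** i
        total += term
    return total

xs = (1, 2, 3)
es = elementary(xs)
for k in range(1, 6):
    print(k, waring_moment(k, es), sum(x ** k for x in xs))
```

The enumeration is exponential in n and is meant only as a check on small cases; Newton’s identities remain the practical method for computation.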


Appendix A8: Computational Software
Listed below are six reputable software packages of use in computational algebra. Of these,
the first two (PARI/GP and Mathematica) are probably your best options for this course.
We have attached sample worksheets for both of these, demonstrating worked examples
taken from these notes.

PARI/GP
PARI is open source software designed specifically for computational number theory.
Although it is not a general purpose package for symbolic computation, for computational
number theory its capabilities are on par with anything else you will have access to; and
it is easier to install than any of the other systems. It is freely available for download in
Windows, Mac and Linux versions, from
https://pari.math.u-bordeaux.fr/download.html
In addition to the documentation available through the official PARI/GP website, many
tutorials are available online in both video and readable document form.

Mathematica
Although Mathematica is proprietary software, it is accessible to current students
through our campus license. It is suitable for general symbolic computation, not only in
computational number theory, but for a wide range of mathematical tasks.

Maple
Another general purpose package for symbolic computation, including computational
number theory, is Maple. This is proprietary software which is also currently available to
our students; but we anticipate losing the license for this about a year from now.

Sage
Sage is open source software for performing general symbolic computation. It is freely
available for download from
http://www.sagemath.org/download.html
although trickier to install and use than other options. It is also not as full-featured as the
other software available; but it is steadily growing thanks to the programming contributions
of its devoted users and fans.

GAP (Groups, Algorithms and Programming)


GAP is open source software which excels at some kinds of algebraic symbolic computation.
It is intended primarily for group theory, but it offers some more general functionality
as well. It is not too hard to install, and it is freely available for download from
https://www.gap-system.org/

Magma
Magma is proprietary software for general algebraic computation. However if you are
interested, you might ask around our department for help getting this installed.

PARI/GP
The screenshot below (on the right) shows a short PARI/GP session verifying selected
details from our Example A3.13. Ending a command with a semicolon suppresses output.
This interactive session included 16 input commands. Our comments on the session, as
follows, are listed according to step numbers:

[1] Input the minimal polynomial $f(x) = x^4 - x + 3$.
[2] We compute the discriminant to be 6885.
[3] We factor the discriminant as $3^4 \cdot 5 \cdot 17$.
[4] The ideal (2) ⊂ Z remains prime in the extension (just one prime factor 2O).
[5] Compute the ramification index e = 1, residual degree f = 4, and generator 2 of 2O.
[6] Find that 3O has two distinct prime factors.
[7] The first prime factor of 3O is (3, −1+θ) with e = 3, f = 1.
[8] The second prime factor of 3O is (3, θ) with e = 1, f = 1.
[9] Find that 17O has three distinct prime factors.
[10] The first prime factor of 17O is (17, −7+θ) with e = 1, f = 1.
[11] The second prime factor of 17O is (17, −4+θ) with e = 2, f = 1.
[12] The third prime factor of 17O is (17, −2+θ) with e = 1, f = 1.
[14] The class number is 1 (so O is a PID).
[15] The group of roots of unity has order 2, generated by −1.
[16] A fundamental unit is $\theta^3 - \theta^2 + 1$.

In fact, the default command for computing the class number in PARI/GP is conditional on
GRH (the Generalized Riemann Hypothesis). Should you choose not to trust this result,
the PARI/GP documentation describes how to verify this computation unconditionally (i.e.
without relying on GRH).
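Several of the steps listed above can be cross-checked with integer arithmetic alone. The following Python sketch is not the PARI/GP session itself. It assumes the standard closed form $256q^3 - 27p^4$ for the discriminant of $x^4 + px + q$, and it only looks for roots of f modulo p, flagging a root as repeated when it is also a root of the derivative. A repeated root modulo p is consistent with the ramified primes reported in steps [7] and [11], although this test does not by itself determine the ramification index.

```python
def trinomial_quartic_disc(p, q):
    """Discriminant of x^4 + p*x + q (closed form 256 q^3 - 27 p^4)."""
    return 256 * q ** 3 - 27 * p ** 4

def factor_integer(n):
    """Trial-division factorization of n > 1, returned as {prime: exponent}."""
    factors, d = {}, 2
    while d * d <= n:
        while n % d == 0:
            factors[d] = factors.get(d, 0) + 1
            n //= d
        d += 1
    if n > 1:
        factors[n] = factors.get(n, 0) + 1
    return factors

def f(x):
    return x ** 4 - x + 3

def f_prime(x):
    return 4 * x ** 3 - 1

def roots_mod(p):
    """Roots of f in F_p; the value is True when the root is repeated."""
    return {r: f_prime(r) % p == 0 for r in range(p) if f(r) % p == 0}

disc = trinomial_quartic_disc(-1, 3)
print(disc, factor_integer(disc))
for p in (2, 3, 5, 17):
    print(p, roots_mod(p))
```

For p = 2 the root list is empty, as expected when f stays irreducible modulo 2; for p = 3 and p = 17 the roots found correspond to the generators θ − r appearing in steps [7] to [12].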

Mathematica
The following pages show a Mathematica session checking some of the steps in the same
Example A3.13. Although Mathematica does not currently have all features available, you
will have no trouble reproducing all the details of Example A3.13 using Mathematica to
do the laborious calculation, if you know what you are doing and follow the steps shown
in our worked Example A3.13.
Example A3.13: A Quartic Extension

In[3]:= f = x^4 - x + 3
Out[3]= 3 - x + x^4

In[4]:= theta = Root[f, 1]
Out[4]= Root[3 - #1 + #1^4 &, 1]

Compute the discriminant and its factorization

In[5]:= NumberFieldDiscriminant[theta]
Out[5]= 6885

In[6]:= FactorInteger[%]
Out[6]= {{3, 4}, {5, 1}, {17, 1}}

Verify irreducibility

In[ ]:= Factor[f]
Out[ ]= 3 - x + x^4

Factor f(x) over small primes

In[ ]:= Factor[f, Modulus -> 2]
Out[ ]= 1 + x + x^4

In[ ]:= Factor[f, Modulus -> 3]
Out[ ]= x (2 + x)^3

In[ ]:= Factor[f, Modulus -> 5]
Out[ ]= (1 + x)^2 (3 + 3 x + x^2)

In[ ]:= Factor[f, Modulus -> 7]
Out[ ]= (2 + x) (5 + 4 x + 5 x^2 + x^3)

In[ ]:= Factor[f, Modulus -> 11]
Out[ ]= 3 + 10 x + x^4

In[ ]:= Factor[f, Modulus -> 13]
Out[ ]= 3 + 12 x + x^4

In[ ]:= Factor[f, Modulus -> 17]
Out[ ]= (10 + x) (13 + x)^2 (15 + x)

In[ ]:= Factor[f, Modulus -> 19]
Out[ ]= (10 + x) (6 + 5 x + 9 x^2 + x^3)

In[ ]:= Factor[f, Modulus -> 23]
Out[ ]= (14 + x) (15 + 12 x + 9 x^2 + x^3)

In[ ]:= Factor[f, Modulus -> 29]
Out[ ]= (3 + x) (6 + x) (5 + 20 x + x^2)

Compute Roots of Unity

In[7]:= NumberFieldRootsOfUnity[theta]
Out[7]= {-1, 1}

Compute Fundamental Units

In[8]:= NumberFieldFundamentalUnits[theta]
Out[8]= {AlgebraicNumber[Root[3 - #1 + #1^4 &, 1], {-1, 0, 1, -1}]}

Compute Class Number

In[9]:= NumberFieldClassNumber[theta]

NumberFieldClassNumber: The class number of the number field generated by Root[3 - #1 + #1^4 &, 1, 0] is not yet available.

Out[9]= NumberFieldClassNumber[Root[3 - #1 + #1^4 &, 1]]
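The factorizations modulo small primes in the session above can be double-checked by multiplying the factors back together. The Python sketch below does this with plain coefficient lists (constant term first). The table `FACTORIZATIONS` records our reading of the session output, with each factor paired with its multiplicity; the helper names are hypothetical.

```python
def poly_mul_mod(a, b, p):
    """Multiply two polynomials (coefficient lists, constant term first) mod p."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] = (out[i + j] + ai * bj) % p
    return out

def product_mod(factors, p):
    """Expand a list of (polynomial, multiplicity) pairs modulo p."""
    result = [1]
    for poly, mult in factors:
        for _ in range(mult):
            result = poly_mul_mod(result, poly, p)
    return result

F = [3, -1, 0, 0, 1]  # f(x) = 3 - x + x^4

# (factor, multiplicity) pairs, as read from the session output
FACTORIZATIONS = {
    3:  [([0, 1], 1), ([2, 1], 3)],
    5:  [([1, 1], 2), ([3, 3, 1], 1)],
    7:  [([2, 1], 1), ([5, 4, 5, 1], 1)],
    17: [([10, 1], 1), ([13, 1], 2), ([15, 1], 1)],
    19: [([10, 1], 1), ([6, 5, 9, 1], 1)],
    23: [([14, 1], 1), ([15, 12, 9, 1], 1)],
    29: [([3, 1], 1), ([6, 1], 1), ([5, 20, 1], 1)],
}

def check(p):
    """True when the recorded factors multiply out to f modulo p."""
    return product_mod(FACTORIZATIONS[p], p) == [c % p for c in F]

for p in sorted(FACTORIZATIONS):
    print(p, check(p))
```

This confirms only that the listed factors multiply to f modulo p; it does not verify that each factor is irreducible.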




