Cyclotomic Fields With Applications
CYCLOTOMIC FIELDS
WITH APPLICATIONS
G. Eric Moorhouse
University of Wyoming
© 2018
Preface
These notes were written during the summer of 2018, while planning a graduate
course for the Fall 2018 semester. The theme was chosen to appeal to students of varying
backgrounds, some with interest primarily in number theory, and others more interested
in combinatorics, graph theory and finite geometry. As it happens, my research has led me
into areas of overlap between these two areas, with cyclotomic fields arising as a common
theme. And so beyond the immediate goal of appealing to students with multiple interests,
this course was conceived also as a way of crystallizing in my mind some of the finer points
of the theory of cyclotomic fields, many of which I had less familiarity with. Students were
referred primarily to Washington’s book [Wa] for further details on cyclotomic fields, and
various other sources as needed.
The realities of my teaching environment (perhaps yours too?) mean that I now
apportion less lecture time on theory and proofs, with more on examples and applications,
than when I first began teaching. In the grand tradition of mathematics, these applications
arise largely in . . . (wait for it!) . . . other areas of mathematics. (That may not be
strictly true; but as usual, our description of these applications has been rather simplified,
sometimes oversimplified, to their mathematical essence, for the sake of brevity.) These
applications include
• algorithms for fast arithmetic with polynomials and integers;
• constructions and nonexistence results for Hadamard matrices, difference sets, and
  designs, particularly nets and finite affine and projective planes;
• spectra of Cayley graphs and digraphs over abelian groups;
• counting solutions to equations over finite fields;
• the MacWilliams relations for error-correcting codes;
• Dirichlet’s theorem on primes in arithmetic progressions; and
• mutually unbiased bases (quantum information theory).
Given the demands of this pedagogical emphasis, there has been no single reference avail-
able where all of these developments can be found.
Another design constraint on these notes has been the varying backgrounds of our
students: some will have had advanced courses in field theory or number theory, and
others not. In order to keep these notes as self-contained as possible, I have included
appendices containing many of the results needed from field theory and number theory,
omitting the longer proofs; also omitting major results in the theory which do not bear
directly upon our particular development or featured applications. I expect that during
this fall semester, I will actually summarize much of the content in these appendices during
the lectures, rather than leaving students to read these solely on their own.
I am indebted to many sources from which I have borrowed extensively, particularly
[IR], [LN], [Sa] and [Wa]. Often this has meant rewriting content in my own way, and
adding details which other authors have left as exercises. I have also looked for ways
to avoid explicitly developing all the tools required in some of the standard proofs—
not that I feel these tools are unimportant for students to learn, but due to concern
that the proliferation of technical definitions and warmup lemmas would overly distract
students from the main points. One of my goals, in particular, is a presentation of Gluck’s
Theorem 14.2 (a beautiful and very accessible argument using cyclotomic integers in a
nontrivial and surprising way). Its proof, however, invokes a theorem of Segre usually
formulated in the language of projective plane geometry. Not wanting to spend the extra
time on such an extended detour for the majority of our students without this conceptual
background, I strove instead for an alternative presentation of Segre’s Theorem in the
language of affine plane geometry. I am very happy with the resulting Theorem 3.14,
which I feel is also better adapted to the proof of Gluck’s Theorem than the original.
I regret omitting several major topics which a more comprehensive textbook would
have included: Stickelberger’s Theorem, higher reciprocity laws, applications to algebraic
coding theory and cryptology, and Bernhard Schmidt’s definitive work on the circulant
Hadamard conjecture. However, in the spirit of a set of working lecture notes, my priority
has been to limit the scope to only what I believe can be accomplished in a single semester.
But perhaps in a future revision. . .
Throughout all my rewriting of standard material, I will certainly have added many
of my own errors, for which I take full responsibility. A list of errata will be posted at
http://ericmoorhouse.org/courses/5590/
With each mistake/misprint that you encounter in this manuscript, please first check the
website to see if it has already been listed; if not, please email me at [email protected]
with the necessary correction to add to this list. Thank you!
Eric Moorhouse
August, 2018
Notational Conventions
Throughout these notes, I compose functions right-to-left, as in (στ )(a) = σ(τ (a)).
Groups are multiplicative unless otherwise indicated. The symbol ζ denotes a complex root
of unity, except when it represents a zeta function (à la Riemann, Dedekind, Hasse, etc.).
Likewise, ‘i’ signifies either an integer index (sometimes a dummy index of summation),
or $\sqrt{-1}$, again depending on the context. So deal with it.
Contents
Preface
8. Difference Sets
9. Hadamard Matrices
   Skew-Type Hadamard Matrices
   Williamson-Hadamard Matrices
   Regular Hadamard Matrices
   Circulant Hadamard Matrices
Appendices
A1. Fields and Extensions
   Matrix Representations of Field Extensions
A4. Normal and Separable Extensions
A5. Field Automorphisms and Galois Theory
A6. Dedekind Zeta Functions and Dirichlet Series
A7. Symmetric Polynomials
A8. Computational Software
   PARI/GP
   Mathematica
Bibliography
Index
1. Finite Cyclic Groups
A cyclic group is a group generated by a single element. A cyclic group may be finite or
infinite. Every infinite cyclic group is isomorphic to the additive group of Z; or equivalently,
the multiplicative subgroup $\langle\pi\rangle = \{\pi^k : k \in \mathbb{Z}\} \subset \mathbb{C}^\times$. (Here $\mathbb{C}^\times$ is the multiplicative group
of nonzero complex numbers; and one can replace $\pi$ by any nonzero complex number which
is not a root of unity.) Every finite cyclic group of order $n$ is isomorphic to the additive
group $\mathbb{Z}/n\mathbb{Z}$ of integers mod $n$; equivalently, the multiplicative subgroup of complex $n$th
roots of unity. The latter group is
Theorem 1.1. Let $G = \langle x\rangle$ be a (multiplicative) cyclic group of order $n \ge 1$. Then
$G$ has $\varphi(n)$ generators $x^k$, $1 \le k \le n$, $\gcd(k, n) = 1$. Every subgroup of $G$ is cyclic of
order $d$ dividing $n$. Conversely, for every positive $d \mid n$, $G$ has a unique subgroup of
order $d$ given by $\langle x^{n/d}\rangle$. Thus
$$n = \sum_{1 \le d \mid n} \varphi(d).$$
Proof. Most of Theorem 1.1 is proved by the Division Algorithm. For example if $H$ is a
subgroup of $G$, let $d \in \{1, 2, \ldots, n\}$ be minimal such that $x^d \in H$. (Note that the set of
$d \in \{1, 2, \ldots, n\}$ satisfying $x^d \in H$ is nonempty since $x^n = 1 \in H$; so the minimum such
$d$ is defined.) So $\langle x^d\rangle \subseteq H$. We have $n = qd + r$ for some integers $q, r$ with $0 \le r < d$. If
$0 < r < d$ then $x^r = x^n (x^d)^{-q} \in H$, contradicting the minimality of $d$; so we must have
$d \mid n$. Now $\langle x^d\rangle \subseteq H$; and to prove equality, let $h \in H$, so $h = x^j$ for some $j$. Again by the
Division Algorithm, $j = q'd + r'$ where $0 \le r' < d$; and again using the minimality of $d$,
we have $r' = 0$, so $x^j = (x^d)^{q'} \in \langle x^d\rangle$. This gives $H = \langle x^d\rangle$. The relation $n = \sum_{d \mid n} \varphi(d)$
follows by counting in two different ways the number of pairs $(g, H)$ where $g \in G$ and
$H = \langle g\rangle \le G$.
Theorem 1.2. If $G$ is a cyclic group of order $n$, then its automorphism group $\operatorname{Aut} G$
is abelian of order $\varphi(n)$. In fact, $\operatorname{Aut} G \cong (\mathbb{Z}/n\mathbb{Z})^\times$, the multiplicative group of units
of the ring of integers mod $n$.
Proof. Let $G = \{1, g, g^2, \ldots, g^{n-1}\}$. For each $k \in \{1, 2, \ldots, n\}$ with $\gcd(k, n) = 1$, define
$\sigma_k : G \to G$ by $\sigma_k(x) = x^k$. One easily checks that $\sigma_k$ is well-defined, bijective, and
$\sigma_k(xy) = (xy)^k = x^k y^k = \sigma_k(x)\sigma_k(y)$. Thus $\sigma_k \in \operatorname{Aut} G$.
Conversely, let $\sigma \in \operatorname{Aut} G$. Since $g$ has order $n$, so does $\sigma(g)$; thus $\sigma(g) = g^k$ for some
$k \in \{1, 2, \ldots, n\}$ with $\gcd(k, n) = 1$. It follows readily that $\sigma = \sigma_k$. (Every $x \in G$ has the
form $x = g^r$ for some $r$; and then $\sigma(x) = \sigma(g^r) = \sigma(g)^r = (g^k)^r = \sigma_k(g^r) = \sigma_k(x)$.) Thus
$\operatorname{Aut} G = \{\sigma_k : 1 \le k \le n,\ \gcd(k, n) = 1\}$. The map $(\mathbb{Z}/n\mathbb{Z})^\times \to \operatorname{Aut} G$, $k \mapsto \sigma_k$ is in fact
an isomorphism: it is bijective, and for all $x \in G$, $\sigma_k(\sigma_\ell(x)) = (x^\ell)^k = x^{k\ell} = \sigma_{k\ell}(x)$, so
$\sigma_k \sigma_\ell = \sigma_{k\ell}$. In particular, $\operatorname{Aut} G$ is abelian of order $\varphi(n)$.
Caution: Do not confuse G with its automorphism group Aut G. Keep in mind that G and
Aut G do not have the same order. Theorem 1.2 does not say that Aut G is cyclic; nor
does it say that the automorphism group of an abelian group is abelian. See Exercise #4.
A function f defined on positive integers is multiplicative if f (mn) = f (m)f (n)
whenever m, n are relatively prime positive integers. (Note the condition that gcd(m, n) =
1. A function f is completely multiplicative if f (mn) = f (m)f (n) for all m, n. We shall
have reason to consider functions with this stronger property; but φ is not an example.)
Theorem 1.3. $\varphi$ is multiplicative, and $\varphi(n) = n \prod_{p \mid n} \bigl(1 - \frac{1}{p}\bigr)$ where the product
extends over all prime divisors $p \mid n$.
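Proof. Suppose $n = ab$ where $a$ and $b$ are relatively prime positive integers. The natural
homomorphism $f : \mathbb{Z} \to (\mathbb{Z}/a\mathbb{Z}) \oplus (\mathbb{Z}/b\mathbb{Z})$ mapping $k \mapsto (k + a\mathbb{Z},\ k + b\mathbb{Z})$ has kernel $n\mathbb{Z}$
(since this is the set of all integers divisible by both $a$ and $b$). The induced map from $\mathbb{Z}/n\mathbb{Z}$ is
therefore injective; and since both rings have $ab = n$ elements, $f$ is surjective. So $f$
induces a ring isomorphism
$$\mathbb{Z}/n\mathbb{Z} \cong (\mathbb{Z}/a\mathbb{Z}) \oplus (\mathbb{Z}/b\mathbb{Z}).$$
Now it is easy to see that if $R_1$ and $R_2$ are rings with identity, then the units of $R_1 \oplus R_2$
are exactly the elements $(u_1, u_2)$ with $u_i$ a unit in $R_i$. Thus
$$(\mathbb{Z}/n\mathbb{Z})^\times \cong (\mathbb{Z}/a\mathbb{Z})^\times \times (\mathbb{Z}/b\mathbb{Z})^\times.$$
Taking the cardinality of both sides gives $\varphi(n) = \varphi(a)\varphi(b)$.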
If $p$ is prime and $r \ge 1$, then the integers $k \in \{1, 2, \ldots, p^r\}$ not relatively prime
to $p^r$ are the integers $k = p\ell$, $\ell \in \{1, 2, \ldots, p^{r-1}\}$; so $\varphi(p^r) = p^r - p^{r-1} = p^r\bigl(1 - \frac{1}{p}\bigr)$. So
the indicated formula for $\varphi(n)$ holds for prime powers $n = p^r$. More generally, let $n$ be a
positive integer and consider its prime factorization $n = \prod_{i=1}^{r} p_i^{e_i}$ where $p_1, p_2, \ldots, p_r$ are
the distinct prime factors of $n$, and $e_i \ge 1$. Using multiplicativity of $\varphi$,
$$\varphi(n) = \prod_{i=1}^{r} p_i^{e_i}\Bigl(1 - \frac{1}{p_i}\Bigr) = n \prod_{i=1}^{r} \Bigl(1 - \frac{1}{p_i}\Bigr).$$
Some textbooks use the Chinese Remainder Theorem in place of the ring isomorphism
$\mathbb{Z}/n\mathbb{Z} \cong (\mathbb{Z}/a\mathbb{Z}) \oplus (\mathbb{Z}/b\mathbb{Z})$ in proving the multiplicativity of $\varphi$. In our view, however, it
is the ring isomorphism which most naturally gives rise to both the Chinese Remainder
Theorem and the group isomorphism $(\mathbb{Z}/n\mathbb{Z})^\times \cong (\mathbb{Z}/a\mathbb{Z})^\times \times (\mathbb{Z}/b\mathbb{Z})^\times$. This is a classic
instance where just a modicum of abstract algebra provides a clean and insightful proof,
where the alternative is a rather tedious and technical argument.
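As a numerical sanity check, the following Python sketch compares the definition of φ(n) as a count of integers coprime to n with the product formula of Theorem 1.3, and can also be used to confirm the identity n = Σ φ(d) of Theorem 1.1 for small n. The function names are illustrative only, and trial division is assumed to be adequate because n is small.

```python
from math import gcd


def phi_by_counting(n: int) -> int:
    """Count the k in {1, ..., n} with gcd(k, n) = 1."""
    return sum(1 for k in range(1, n + 1) if gcd(k, n) == 1)


def prime_divisors(n: int) -> list[int]:
    """Distinct prime divisors of n, by trial division (small n only)."""
    primes, p = [], 2
    while p * p <= n:
        if n % p == 0:
            primes.append(p)
            while n % p == 0:
                n //= p
        p += 1
    if n > 1:
        primes.append(n)
    return primes


def phi_by_formula(n: int) -> int:
    """Evaluate n * prod(1 - 1/p) over primes p | n in exact integer arithmetic."""
    result = n
    for p in prime_divisors(n):
        result = result // p * (p - 1)  # exact: p still divides result here
    return result


def divisor_sum_of_phi(n: int) -> int:
    """Sum of phi(d) over all positive divisors d of n."""
    return sum(phi_by_formula(d) for d in range(1, n + 1) if n % d == 0)
```

For instance, `phi_by_formula(12)` and `phi_by_counting(12)` both return 4, and `divisor_sum_of_phi(12)` returns 12.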
A positive integer n is squarefree if n is not divisible by any square integer larger
than 1; equivalently, n is a product of distinct primes (possibly an empty product, so that
1 is squarefree). For every positive integer n, define
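$$\mu(n) = \begin{cases} (-1)^k, & \text{if $n$ is a product of $k$ distinct primes;} \\ 0, & \text{if $n$ is not squarefree.} \end{cases}$$
Like $\varphi$, the function $\mu$ is multiplicative. Indeed, $\mu$ is the unique multiplicative function
satisfying
$$\mu(p^k) = \begin{cases} 1, & \text{if } k = 0; \\ -1, & \text{if } k = 1; \\ 0, & \text{if } k \ge 2 \end{cases}$$
where $p$ is prime. From Theorems 1.1 and 1.3 we obtain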
Theorem 1.4. $\displaystyle \varphi(n) = \sum_{1 \le d \mid n} \mu\Bigl(\frac{n}{d}\Bigr)\, d = \sum_{1 \le d \mid n} \frac{\mu(d)}{d}\, n.$
Note that the two formulas given are equivalent via the substitution $d \leftrightarrow \frac{n}{d}$ for divisors
$d \mid n$. The formulas can be proved in several ways: (i) directly by mathematical induction
on $n$, using $n = \sum_{1 \le d \mid n} \varphi(d)$; or (ii) using the inclusion-exclusion principle for counting
the cardinalities of the subgroups $H \le G$ where $|G|/|H|$ is squarefree; or (iii) using Möbius
inversion; or (iv) by expanding the formula $\varphi(n) = n \prod_{p \mid n} \bigl(1 - \frac{1}{p}\bigr)$ from Theorem 1.3.
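The first formula of Theorem 1.4 is easy to test numerically. The sketch below is a minimal illustration in Python, with hypothetical helper names; it computes μ by trial division and compares Σ μ(n/d)·d against a direct count of the integers coprime to n.

```python
from math import gcd


def mobius(n: int) -> int:
    """Moebius function: (-1)^k for a product of k distinct primes, else 0."""
    result, p = 1, 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:      # p^2 divided the original n
                return 0
            result = -result
        p += 1
    if n > 1:                   # one remaining prime factor
        result = -result
    return result


def phi_via_mobius(n: int) -> int:
    """Evaluate the sum of mu(n/d) * d over all divisors d of n."""
    return sum(mobius(n // d) * d for d in range(1, n + 1) if n % d == 0)


def phi_by_counting(n: int) -> int:
    """Count the k in {1, ..., n} with gcd(k, n) = 1."""
    return sum(1 for k in range(1, n + 1) if gcd(k, n) == 1)
```

Comparing `phi_via_mobius(n)` with `phi_by_counting(n)` over a range of small n gives agreement in every case, as the theorem predicts.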
Exercises 1.
1. (a) Let $G$ be a cyclic group of order $n$. Note that if one chooses $g \in G$ uniformly at random, the
probability that $\langle g\rangle = G$ is $\frac{\varphi(n)}{n} \in [0, 1]$. Show that $\limsup_{n\to\infty} \frac{\varphi(n)}{n} = 1$ and $\liminf_{n\to\infty} \frac{\varphi(n)}{n} = 0$.
This says that in particular, the ratio $\frac{n}{\varphi(n)}$ has no upper bound.
(b) Determine the ‘limiting average value’ of $\frac{\varphi(n)}{n}$, i.e. evaluate $\lim_{n\to\infty} \frac{1}{n}\Bigl[\frac{\varphi(1)}{1} + \frac{\varphi(2)}{2} + \cdots + \frac{\varphi(n)}{n}\Bigr]$.
Some numerical data might provide an initial insight here.
2. Factorization of most integers having more than a couple hundred decimal digits is prohibitively
difficult. (In fact, no polynomial-time algorithm for integer factorization is known.) Show that
computation of $\varphi(n)$ for large integers is also prohibitively difficult in general. Hint: Consider
numbers of the form $n = pq$ where $p \ne q$ are large primes. For such numbers we have $\varphi(n) =
(p - 1)(q - 1)$. Show that any algorithm to compute $\varphi(n)$ (given only the decimal representation
of $n$) can also be used to provide the prime factorization of $n$.
3. To compute gcd(m, n) for small integers, one typically relies on first determining the prime
factorizations of m and n. By remarks above, this approach fails for large integers. However,
computation of gcd(m, n) for large integers (having several hundred digits) is very efficient using
Euclid’s algorithm (which runs in polynomial time). But by #2, one cannot expect to be able
to compute φ(n) exactly for most large values of n.
Given a large positive integer n, one might try to estimate φ(n) by random sampling: Repeat-
edly choose k between 1 and n (uniformly distributed, using a pseudorandom number generator).
After $N$ trials, if $d$ is the number of values of $k$ found for which $\gcd(k, n) = 1$, we obtain an
estimate $\varphi(n) \approx \frac{dn}{N}$. How practical is this approach as a means to estimate $\varphi(n)$? In particular
can one realistically hope to approximate φ(n) to within, say, 10% of its true value? Can you
suggest obvious improvements to this approach? Consider in particular (by #1) that for large
values of n, there are values of n where randomly sampling k ∈ {1, 2, . . . , n} almost always finds
gcd(k, n) = 1; and other values of n where random sampling almost always finds gcd(k, n) > 1.
4. (a) Find the smallest n such that the automorphism group of a cyclic group G of order n is not
cyclic. Determine the isomorphism type of G in this case.
(b) Find the smallest abelian group G for which Aut G is nonabelian. Indicate the isomorphism
types of G and Aut G in this case.
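5. The sum of the positive divisors of $n$ is a multiplicative function $\sigma(n)$ similar to $\varphi(n)$, satisfying
the formula $\sigma(n) = \prod_{p^e \,\|\, n} \frac{p^{e+1} - 1}{p - 1}$, where the product extends over the prime powers $p^e$ exactly
dividing $n$. (For squarefree $n$ this reduces to $\sigma(n) = n \prod_{p \mid n} \bigl(1 + \frac{1}{p}\bigr)$.)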
(a) Derive formulas (analogues of Theorems 1.1 and 1.4) expressing n as a sum of values of the
σ function, and the reverse.
(b) A notoriously difficult problem is the determination of solutions of $\sigma(n) = 2n$. Such numbers
are called perfect. No odd perfect numbers are known. All even perfect numbers have the
form $2^{p-1}(2^p - 1)$ where both $p$ and $2^p - 1$ are prime; but only finitely many primes of the
form $2^p - 1$ are known (Mersenne primes) although it is conjectured that infinitely many
exist. So only finitely many (roughly 50) perfect numbers are known. Say what you can
about solutions of the analogous equation $n = 2\varphi(n)$.
2. Cyclotomic Polynomials
[Figure: the $n$-th roots of unity $1, \zeta, \zeta^2, \zeta^3, \ldots, \zeta^{-1}$ on the unit circle centered at $0$, with consecutive roots separated by the angle $2\pi/n$.]
Let $n$ be a positive integer. The complex $n$-th roots of unity, lying in the multiplicative group $\mathbb{C}^\times$ of
units in the complex numbers, form a cyclic group
$$\langle\zeta\rangle = \{1, \zeta, \zeta^2, \ldots, \zeta^{n-1}\}$$
of order $n$, where $\zeta = \zeta_n$ is a primitive complex $n$th root of
unity. Usually we take $\zeta_n = e^{2\pi i/n}$; although for most purposes,
any primitive $n$-th root of unity will serve just as well.
We consider $1, \zeta, \zeta^2, \ldots, \zeta^{n-1}$ as unit vectors in the plane symmetrically
arranged about the origin, forming the vertices of
a regular $n$-gon inscribed in the circle $|z| = 1$. By symmetry, we directly infer the relation
$$1 + \zeta + \zeta^2 + \cdots + \zeta^{n-1} = 0 \quad \text{whenever } n \ge 2,$$
a fact that we can also derive algebraically by comparing coefficients of $t^{n-1}$ on both sides of
$$t^n - 1 = \prod_{k=0}^{n-1} (t - \zeta^k). \tag{2.1}$$
So $\zeta$ is a root of $1 + t + t^2 + \cdots + t^{n-1} \in \mathbb{Z}[t]$; yet this is not in general the minimal polynomial
of $\zeta$. The $n$-th cyclotomic polynomial is the monic polynomial defined by
$$\Phi_n(t) = \prod_{\substack{1 \le k \le n \\ \gcd(k,n) = 1}} (t - \zeta^k).$$
By construction, its roots are all the φ(n) primitive n-th roots of unity in C; so the coeffi-
cients in Φn (t), being the elementary symmetric polynomials in these roots, are algebraic
integers. The extension E = Q[ζ] ⊇ Q contains all these roots, since they are powers of ζ;
and so E is the splitting field of Φn (t) over Q. In particular, E ⊇ Q is a Galois exten-
sion (Appendix A5). Every automorphism σ ∈ Aut E permutes the roots of Φn (t), so the
coefficients in Φn (t) lie in Q (Theorem A5.12). So by Theorem A3.2(ii), these coefficients
must be rational integers. This shows that Φn (t) ∈ Z[t].
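Grouping together the factors $t - \zeta^r$ in (2.1) according to $\gcd(r, n)$, we have
$$t^n - 1 = \prod_{d \mid n} \ \prod_{\substack{1 \le r \le n \\ \gcd(r,n) = d}} (t - \zeta^r).$$
Now $\gcd(r, n) = d$ iff $r = dj$ where $1 \le j \le \frac{n}{d}$, $\gcd\bigl(j, \frac{n}{d}\bigr) = 1$. Since $\zeta^d$ is a primitive
$\frac{n}{d}$-th root of unity, this gives
$$t^n - 1 = \prod_{d \mid n} \ \prod_{\substack{1 \le j \le n/d \\ \gcd(j, n/d) = 1}} (t - \zeta^{dj}) = \prod_{d \mid n} \Phi_{n/d}(t).$$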
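Theorem 2.2. (i) For every $n \ge 1$, $t^n - 1 = \prod_{d \mid n} \Phi_d(t)$.
(ii) Each of the polynomials $\Phi_n(t)$ has integer coefficients; it is irreducible in $\mathbb{Z}[t]$,
and so also in $\mathbb{Q}[t]$. Hence $\Phi_n(t)$ is the minimal polynomial of $\zeta_n$ over $\mathbb{Q}$.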
The fact that $\Phi_n(t) \in \mathbb{Z}[t]$ was shown above. We have not actually shown that $\Phi_n(t)$ is
irreducible in $\mathbb{Z}[t]$ (and so also in $\mathbb{Q}[t]$). Here we give the standard proof of this in the
important special case that $n = p$ is prime. A similar argument works for prime powers $n = p^e$
(Exercise #2). Lang [L2] proves the irreducibility of $\Phi_n(t)$ in the general case, using an
argument which reduces to the prime power case. Now for $p$ prime, use the fact that, by Eisenstein's Criterion,
$$\Phi_p(t+1) = \frac{(t+1)^p - 1}{t} = t^{p-1} + \binom{p}{1} t^{p-2} + \cdots + \binom{p}{p-2} t + \binom{p}{p-1}$$
is irreducible in $\mathbb{Z}[t]$. So $\Phi_p(t) \in \mathbb{Z}[t]$ is also irreducible in $\mathbb{Z}[t]$ (as follows from the
substitutions $u = t+1$, $t = u-1$ with all-integer coefficients) and so $\Phi_p(t)$ is also irreducible
in $\mathbb{Q}[t]$.
The factorization $t^n - 1 = \prod_{d \mid n} \Phi_d(t)$ can be reversed to compute the cyclotomic
polynomials from the polynomials $t^d - 1$ using ordinary division of polynomials. This may
be expressed as
Theorem 2.3. For every $n \ge 1$, $\Phi_n(t) = \prod_{d \mid n} (t^d - 1)^{\mu(n/d)}$.
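For example,
$$\begin{aligned}
\Phi_1(t) &= t - 1 \\
\Phi_2(t) &= \frac{t^2 - 1}{t - 1} = t + 1 \\
\Phi_3(t) &= \frac{t^3 - 1}{t - 1} = t^2 + t + 1 \\
\Phi_4(t) &= \frac{t^4 - 1}{t^2 - 1} = t^2 + 1 \\
\Phi_5(t) &= \frac{t^5 - 1}{t - 1} = t^4 + t^3 + t^2 + t + 1 \\
\Phi_6(t) &= \frac{(t^6 - 1)(t - 1)}{(t^3 - 1)(t^2 - 1)} = t^2 - t + 1 \\
\Phi_7(t) &= \frac{t^7 - 1}{t - 1} = t^6 + t^5 + t^4 + t^3 + t^2 + t + 1 \\
\Phi_8(t) &= \frac{t^8 - 1}{t^4 - 1} = t^4 + 1 \\
\Phi_9(t) &= \frac{t^9 - 1}{t^3 - 1} = t^6 + t^3 + 1 \\
\Phi_{10}(t) &= \frac{(t^{10} - 1)(t - 1)}{(t^5 - 1)(t^2 - 1)} = t^4 - t^3 + t^2 - t + 1 \\
&\ \ \vdots \quad \text{etc.}
\end{aligned}$$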
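These examples can be reproduced mechanically. The following Python sketch is one possible approach, not the only one: rather than evaluating the Möbius exponents of Theorem 2.3 directly, it uses the equivalent recursion obtained from the factorization of $t^n - 1$, dividing out $\Phi_d(t)$ for every proper divisor $d$ of $n$. Polynomials are stored as integer coefficient sequences with the constant term first; this representation and the function names are assumptions made for the illustration.

```python
from functools import lru_cache


def poly_divide_exact(num: list[int], den: list[int]) -> list[int]:
    """Exact division of integer polynomials (lowest degree first); den is monic."""
    num = num[:]                                  # work on a copy
    quotient = [0] * (len(num) - len(den) + 1)
    for i in range(len(quotient) - 1, -1, -1):
        c = num[i + len(den) - 1]                 # current leading coefficient
        quotient[i] = c
        for j, d in enumerate(den):
            num[i + j] -= c * d                   # subtract c * t^i * den
    assert not any(num), "division was not exact"
    return quotient


@lru_cache(maxsize=None)
def cyclotomic(n: int) -> tuple[int, ...]:
    """Coefficients of Phi_n(t), constant term first."""
    poly = [-1] + [0] * (n - 1) + [1]             # t^n - 1
    for d in range(1, n):
        if n % d == 0:                            # proper divisor of n
            poly = poly_divide_exact(poly, list(cyclotomic(d)))
    return tuple(poly)
```

For example, `cyclotomic(6)` returns `(1, -1, 1)`, the coefficients of $t^2 - t + 1$ read from the constant term upward.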
Theorem 2.3 also gives us another proof that $\Phi_n(t) \in \mathbb{Q}[t]$ (leading to another explanation
why $\Phi_n(t) \in \mathbb{Z}[t]$). Comparing degrees on both sides of Theorem 2.2(i) gives
$n = \sum_{d \mid n} \varphi(d)$; and comparing degrees on both sides of Theorem 2.3 gives $\varphi(n) = \sum_{d \mid n} \mu\bigl(\frac{n}{d}\bigr) d$.
So we recover the formulas of Theorems 1.1 and 1.4. Intuition recognizes a connection here;
formally, this is expressed by the observation that Theorems 2.2 and 2.3 are categorified
versions of their counterparts in Section 1. We do not explain the meaning of categorifi-
cation, but it is worth noting that efforts to categorify numerical formulas in a way much
like this are often very fruitful.
Proof. We have already explained why the extension E ⊇ Q is the splitting field of Φn (t)
over Q (relying on the fact that Φn (t) is irreducible over Q, for which we have cited
Lang [L2] in the general case). So E ⊇ Q is Galois of degree [E : Q] = deg Φn (t) = φ(n);
and Aut E = G(E/Q) also has order φ(n). By Theorems A5.4 and A5.5, G is faithfully
represented as a transitive group of permutations of the φ(n) roots of Φn (t). This gives a
representation of $G$ as a group of automorphisms of the cyclic group $\langle\zeta\rangle$ of order $n$; and
this group has been completely described in the proof of Theorem 1.2. Noting that $\langle\zeta\rangle$ has
only these $\varphi(n)$ automorphisms, we may identify $G$ with the automorphism group of the
cyclic group $\langle\zeta\rangle$ of order $n$.
The general theory of cyclotomic fields will be continued in Section 4. Before then
we will quickly survey the theory of finite fields, which will put us in better stead for the
sequel.
Exercises 2.
1. Contrary to what one might first guess based on the first few examples, coefficients appearing in
cyclotomic polynomials are not always 0 or ±1. Find the first example (i.e. with the smallest n)
in which Φn (t) has a coefficient ±2.
2. Let $n = p^e$, $p$ prime, $e \ge 1$. Prove that $\Phi_n(t)$ has integer coefficients and is irreducible in $\mathbb{Z}[t]$,
hence also in $\mathbb{Q}[t]$.
(b) State and prove an analogue of (a) expressing $\Phi_{3n}(t)$ in terms of $\Phi_n(t)$.
(c) By generalizing (a) and (b), can you give a formula for $\Phi_{pn}(t)$ in terms of $\Phi_n(t)$ whenever $p$ is
prime? If so, does this lead to a practical algorithm for computing cyclotomic polynomials?
By ‘practical’, one might ask how well it compares with the algorithm based on Theorem 2.3,
and illustrated by the examples which follow that result.
4. Find, with proof, a simple formula for Φn (1).
5. The general linear group of degree $n$ over a field $F$ is the multiplicative group $GL_n(F)$
consisting of all invertible $n \times n$ matrices over the field $F$. In the special case that $F$ is finite with
$|F| = q$ elements, one has $|GL_n(\mathbb{F}_q)| = q^{n(n-1)/2} \prod_{k=1}^{n} (q^k - 1)$. For each $n = 1, 2, \ldots, 6$, express
$|GL_n(\mathbb{F}_q)|$ as an explicit polynomial in $q$; and in each case, factor the resulting polynomial into
irreducible factors in $\mathbb{Z}[q]$.
3. Finite Fields
For every prime p, the integers mod p form a field K = Fp . The essential observation
here is that every nonzero element a ∈ K has a multiplicative inverse. (Interpret a as an
integer not divisible by p; then since gcd(a, p) = 1, the extended Euclidean algorithm gives
1 = ra + sp for some r, s ∈ Z, and then r is an inverse for a mod p.) We next consider
arbitrary finite fields.
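The parenthetical argument above is effectively an algorithm. As a minimal sketch in Python (the function names are illustrative), the extended Euclidean algorithm produces integers $r, s$ with $1 = ra + sp$, and reducing $r$ mod $p$ gives the inverse of $a$.

```python
def extended_gcd(a: int, b: int) -> tuple[int, int, int]:
    """Return (g, r, s) with g = gcd(a, b) and g = r*a + s*b."""
    old_r, r = a, b
    old_x, x = 1, 0
    old_y, y = 0, 1
    while r != 0:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_x, x = x, old_x - q * x
        old_y, y = y, old_y - q * y
    return old_r, old_x, old_y


def inverse_mod_p(a: int, p: int) -> int:
    """Multiplicative inverse of a modulo the prime p, for a not divisible by p."""
    g, r, _s = extended_gcd(a % p, p)
    if g != 1:
        raise ValueError("a is divisible by p, so it has no inverse mod p")
    return r % p
```

For instance, `inverse_mod_p(3, 7)` returns 5, since $3 \cdot 5 = 15 \equiv 1$ mod 7.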
Let $F$ be a finite field; and let $q = |F|$ be its order. Since the additive group of $F$
is finite, every element in this group has finite order; so there is an element $a \in F$ of
prime order $p$ in the additive group. Now every nonzero element $b \in F$ must also have
additive order $p$. To see this, note that the map $\theta : F \to F$, $x \mapsto \frac{a}{b}x$ is clearly bijective
and $\theta(x + y) = \theta(x) + \theta(y)$, so $\theta$ is an automorphism of the additive group of $F$; so $b$ has
the same order as $\theta(b) = a$. Thus all nonidentity elements of $F$ have additive order $p$. An
abelian group with this property is elementary abelian: it is a direct product of cyclic
groups of order $p$. Thus every finite field $F$ has prime power order $q = p^e$ for some $e \ge 1$
and prime $p$. Less obvious is the fact that for every prime power $q$, there is a field $F$ of
order $q$; and it is unique up to isomorphism, so we may unambiguously write $F = \mathbb{F}_q$. The
field $F$ is an extension of degree $[F : K] = e$ over the prime field $K = \mathbb{F}_p$. The prime $p$
is the characteristic of $F$ (and of $K$). See Appendix A1 for a more general discussion of
fields and extensions.
The most direct way to construct $F = \mathbb{F}_q$, $q = p^e$, is to first choose an irreducible
polynomial $f(t) \in K[t]$ of degree $e$. Without loss of generality, $f(t)$ is monic (its leading
coefficient is 1). Then $F = K[\theta]$ where $\theta$ is a formal symbol acting as a root of $f(t)$. In
other words, $F \cong K[t]/(f(t))$. Now $F$ has $\{1, \theta, \theta^2, \ldots, \theta^{e-1}\}$ as a basis over $K$. This
information gives a completely explicit construction of $F$. (Note that there are $p^e$ monic
polynomials of degree $e$ in $K[t]$; and at least one of them is irreducible. Take this fact on faith for
now; also the fact that the resulting field $F$ doesn’t depend on which such irreducible
polynomial we choose.)
Example 3.1: The field of order 16. Let $K = \mathbb{F}_2 = \{0, 1\}$. The monic polynomials $t$ and
$t+1$ of degree 1 are of course irreducible. Of the four polynomials of degree 2, three (namely $t^2$,
$t^2+1 = (t+1)^2$ and $t^2+t = t(t+1)$) are reducible; so by process of elimination, the polynomial
$t^2+t+1$ is irreducible. Of the sixteen polynomials of degree 4, we need only consider those with
constant term 1 (so that 0 is not a root) and an odd number of terms (so that 1 is not a root). This
leaves just four polynomials including $t^4+t^2+1 = (t^2+t+1)^2$; so the remaining three choices
$$t^4+t+1, \qquad t^4+t^3+1, \qquad t^4+t^3+t^2+t+1$$
are all irreducible (because otherwise, an irreducible factor of degree $\le 2$ would be involved, a
possibility which we have already ruled out). For simplicity, we take $f(t) = t^4+t+1$ and $F = K[\theta]$ where
$f(\theta) = 0$, i.e. $\theta^4 = \theta+1$.
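The elimination argument of Example 3.1 can be confirmed by brute force. In the Python sketch below, a polynomial over $\mathbb{F}_2$ is encoded as an integer bitmask (bit $i$ is the coefficient of $t^i$); this encoding and the function names are assumptions made for the illustration. A polynomial of degree $n$ is reported as irreducible when it is not a product of two factors of lower positive degree.

```python
def gf2_mul(a: int, b: int) -> int:
    """Multiply two polynomials over F_2 encoded as bitmasks."""
    result = 0
    while b:
        if b & 1:
            result ^= a     # addition of coefficients mod 2 is XOR
        a <<= 1
        b >>= 1
    return result


def irreducible_of_degree(n: int) -> list[int]:
    """Bitmasks of the degree-n polynomials over F_2 with no proper factorization."""
    candidates = set(range(1 << n, 1 << (n + 1)))   # all polynomials of degree n
    for a in range(2, 1 << n):                      # degree between 1 and n-1
        for b in range(2, 1 << n):
            candidates.discard(gf2_mul(a, b))
    return sorted(candidates)
```

Here `irreducible_of_degree(4)` returns `[19, 25, 31]`, that is `0b10011`, `0b11001` and `0b11111`, matching the three quartics listed in the example.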
to prove that $F^\times$ is itself cyclic, it suffices to prove that $\gcd(n_i, n_j) = 1$ for all distinct $i, j$.
If not, then $\gcd(n_i, n_j) = d \ge 2$ for some $i \ne j$. But then $F^\times$ has at least $d^2$ elements
of order dividing $d$ (inside the subgroup $C_{n_i} \times C_{n_j}$), hence at least $d^2 > d$ roots of the
polynomial $x^d - 1$, a contradiction.
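Example 3.3: The field of order 16. Take $F = \mathbb{F}_{16} = K[\theta]$ where $K = \mathbb{F}_2$ and $\theta^4 = \theta+1$ as in
Example 3.1. Recursively computing $\theta^{j+1} = \theta\,\theta^j$ gives
$$\begin{array}{llll}
\theta^4 = \theta+1 & \theta^7 = \theta^3+\theta+1 & \theta^{10} = \theta^2+\theta+1 & \theta^{13} = \theta^3+\theta^2+1 \\
\theta^5 = \theta^2+\theta & \theta^8 = \theta^2+1 & \theta^{11} = \theta^3+\theta^2+\theta & \theta^{14} = \theta^3+1 \\
\theta^6 = \theta^3+\theta^2 & \theta^9 = \theta^3+\theta & \theta^{12} = \theta^3+\theta^2+\theta+1 & \theta^{15} = 1
\end{array}$$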
Of course in the present context (characteristic two), all minus signs are the same as plus signs. Note
that the cyclic group $F^\times$ of order 15 contains
• one element 1 of order 1 (the root of $\Phi_1(t) = t-1$);
• two elements of order 3. These are $\theta^5$ and $\theta^{10}$, the roots of $\Phi_3(t) = t^2+t+1$;
• four elements of order 5. These are $\theta^3, \theta^6, \theta^9, \theta^{12}$, the roots of $\Phi_5(t) = t^4+t^3+t^2+t+1$; and
• eight elements of order 15. These include $\theta, \theta^2, \theta^4, \theta^8$, the roots of $t^4+t+1$; and $\theta^7, \theta^{11}, \theta^{13}, \theta^{14}$,
the roots of $t^4+t^3+1$. Together these make up the eight roots of $\Phi_{15}(t) = t^8-t^7+t^5-t^4+t^3-t+1$.
The cyclotomic polynomial $\Phi_n(t)$ is defined in Section 2; its roots are the primitive $n$-th roots of
unity. The elements of $F$ are the sixteen roots of $t^{16}-t$; and the elements of $F^\times$ are the fifteen roots
of $t^{15}-1 = \Phi_1(t)\Phi_3(t)\Phi_5(t)\Phi_{15}(t)$.
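The table of powers and the count of elements of each order can be regenerated with a few lines of Python. This is a sketch under an assumed encoding: an element $c_0 + c_1\theta + c_2\theta^2 + c_3\theta^3$ of $\mathbb{F}_{16}$ is stored as the 4-bit integer with bits $c_0, \ldots, c_3$, and the names below are illustrative.

```python
from collections import Counter

MODULUS = 0b10011  # bitmask of t^4 + t + 1 (bit i is the coefficient of t^i)


def times_theta(x: int) -> int:
    """Multiply an element of F_16 by theta, reducing with theta^4 = theta + 1."""
    x <<= 1
    if x & 0b10000:        # a theta^4 term appeared
        x ^= MODULUS       # replace theta^4 by theta + 1
    return x


# powers[j] is the bitmask of theta^j for j = 0, 1, ..., 15
powers = [1]
for _ in range(15):
    powers.append(times_theta(powers[-1]))


def order(j: int) -> int:
    """Multiplicative order of theta^j: the least m >= 1 with theta^(j*m) = 1."""
    return next(m for m in range(1, 16) if powers[(j * m) % 15] == 1)


# number of elements of F_16^x having each possible order
order_counts = Counter(order(j) for j in range(15))
```

Since the fifteen powers $\theta^0, \ldots, \theta^{14}$ are distinct, `order_counts` records one element of order 1, two of order 3, four of order 5 and eight of order 15, in agreement with the list above.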
the map
$$\chi : F \to \mathbb{C}, \qquad \chi(a) = \begin{cases} 1, & \text{if } a \in S; \\ -1, & \text{if } a \in N; \\ 0, & \text{if } a = 0. \end{cases}$$
Theorem 3.4. For any field $F$ of odd order $q$, the quadratic character $\chi$ satisfies
$\chi(a) = a^{\frac{q-1}{2}}$ (interpreted as an element of $F$). In particular, $\chi(ab) = \chi(a)\chi(b)$ for
all $a, b \in F$. Thus the product of two squares, or of two nonsquares, is a square; the
product of a square and a nonsquare is a nonsquare.
Proof. The assertions are clear when $a = 0$ (or $b = 0$); so assume $ab \ne 0$. The $q - 1$
elements of the cyclic group $F^\times$ are the roots of $t^{q-1} - 1 = \bigl(t^{\frac{q-1}{2}} + 1\bigr)\bigl(t^{\frac{q-1}{2}} - 1\bigr)$ where every
square $a^2 \in S$ satisfies $(a^2)^{\frac{q-1}{2}} = a^{q-1} = 1$, so the nonzero squares are the $\frac{q-1}{2}$ roots
of $t^{\frac{q-1}{2}} - 1$ in $F$; and by elimination, the nonsquares are the roots of $t^{\frac{q-1}{2}} + 1$ in $F$. The
multiplicative property $\chi(ab) = \chi(a)\chi(b)$ follows from the formula $\chi(a) = a^{\frac{q-1}{2}}$.
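For prime fields the formula of Theorem 3.4 is easy to test. The Python sketch below assumes $q = p$ is an odd prime, so that $F$ can be modeled by integers mod $p$ (fields of prime power order would need a different representation); the function names are illustrative. It compares $a^{(p-1)/2}$ with the definition of $\chi$ in terms of squares.

```python
def chi(a: int, p: int) -> int:
    """Quadratic character of a in F_p (p an odd prime), via a^((p-1)/2) mod p."""
    value = pow(a, (p - 1) // 2, p)              # one of 0, 1, p - 1
    return value - p if value == p - 1 else value  # report p - 1 as -1


def chi_by_definition(a: int, p: int) -> int:
    """Quadratic character computed from the set S of nonzero squares."""
    a %= p
    if a == 0:
        return 0
    squares = {x * x % p for x in range(1, p)}
    return 1 if a in squares else -1
```

For $p = 13$, for example, both functions return 1 exactly on the nonzero squares $1, 3, 4, 9, 10, 12$.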
Proof. Use $\chi(-1) = (-1)^{\frac{q-1}{2}}$.
Theorem 3.6. Let $a \in F^\times$, $F = \mathbb{F}_q$, $q$ odd, $\varepsilon = (-1)^{\frac{q-1}{2}}$. Then each $s \in S$ has
(i) $\frac14(q-4-\varepsilon)$ solutions of $s = s_1 + s_2$, $(s_1, s_2) \in S \times S$;
(ii) $\frac14(q-2+\varepsilon)$ solutions of $s = s_1 + n_1$, $(s_1, n_1) \in S \times N$; and
(iii) $\frac14(q-\varepsilon)$ solutions of $s = n_1 + n_2$, $(n_1, n_2) \in N \times N$.
Each $n \in N$ has
(iv) $\frac14(q-\varepsilon)$ solutions of $n = s_1 + s_2$, $(s_1, s_2) \in S \times S$;
(v) $\frac14(q-2+\varepsilon)$ solutions of $n = s_1 + n_1$, $(s_1, n_1) \in S \times N$; and
(vi) $\frac14(q-4-\varepsilon)$ solutions of $n = n_1 + n_2$, $(n_1, n_2) \in N \times N$.
Proof. Elementary counting arguments show that the triples $(a, b, c)$ with $a, b, c \in
F^\times$ and $a + b + c = 0$ form a set $T$ of size $|T| = (q - 1)(q - 2)$. Let $m_i$ be the number
of such triples containing exactly $i$ squares, $i \in \{0, 1, 2, 3\}$. A fixed $\eta \in N$ acts on $T$ via
$(a, b, c) \mapsto (\eta a, \eta b, \eta c)$, showing that $m_{3-i} = m_i$. So
$$(q-1)(q-2) = m_0 + m_1 + m_2 + m_3 = 2(m_2 + m_3).$$
Now the $\frac14(q - 1)^2$ triples $(a, b, -a-b)$ with $a, b \in S$ come in three types:
(I) $m_3$ triples with $-a-b \in S$.
(II) $\frac13 m_2$ triples with $-a-b \in N$. By symmetry, triples $(a, b, c) \in T$ containing exactly
two squares are equally distributed between $N \times S \times S$, $S \times N \times S$ and $S \times S \times N$.
(III) Triples with $-a-b = 0$, i.e. $-1 = \frac{a}{b} \in S$. Such triples can only occur if $q \equiv 1$ mod 4,
in which case there are $\frac12(q-1)$ such triples $(a, -a, 0)$, $a \in S$. In all cases, the number
of such triples can be written as $\frac14(1 + \varepsilon)(q - 1)$.
Thus
$$\tfrac14 (q-1)^2 = m_3 + \tfrac13 m_2 + \tfrac14 (1+\varepsilon)(q-1).$$
We now solve to obtain
$$m_0 = m_3 = \tfrac18 (q-1)(q-2-3\varepsilon), \qquad m_1 = m_2 = \tfrac38 (q-1)(q-2+\varepsilon).$$
Now it suffices to prove (i)–(iii), since these yield (iv)–(vi) by simply multiplying each of
the equations to be solved by $\eta$ as above. And in each of (i)–(iii), the number of solutions
is independent of the choice of $s \in S$, as follows by multiplying the corresponding equation
by $s$.
For (i), solutions of $1 = s_1 + s_2$ correspond to triples $(1, -s_1, -s_2) \in T$. There are
$\frac{m_3}{(q-1)/2} = \frac14(q - 5)$ such triples if $\varepsilon = 1$; or $\frac{m_1/3}{(q-1)/2} = \frac14(q - 3)$ such triples if $\varepsilon = -1$, where
$\frac{m_1}{3}$ accounts for threefold symmetry as in (II). In either case we find $\frac14(q - 4 - \varepsilon)$ solutions
as claimed in (i).
For (ii), solutions of $1 = s_1 + n_1$ correspond to triples $(1, -s_1, -n_1) \in T$. Regardless
of the value of $\varepsilon$, there are $\frac{m_2/3}{(q-1)/2} = \frac14(q - 2 + \varepsilon)$ such triples.
For (iii), solutions of $1 = n_1 + n_2$ correspond to triples $(1, -n_1, -n_2) \in T$. There are
$\frac{m_1/3}{(q-1)/2} = \frac14(q - 1)$ such triples if $\varepsilon = 1$; or $\frac{m_3}{(q-1)/2} = \frac14(q + 1)$ triples if $\varepsilon = -1$. In both
cases, the number of solutions is $\frac14(q - \varepsilon)$ as claimed in (iii).
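The six counts in Theorem 3.6 can be checked by exhaustive enumeration in small cases. The sketch below is restricted, as an assumption of the illustration, to prime fields $\mathbb{F}_p$ with $p$ an odd prime, modeled by integers mod $p$; the function names are hypothetical.

```python
def sum_counts(p: int, target: int) -> tuple[int, int, int]:
    """Numbers of ways to write target as S+S, S+N and N+N in F_p (ordered pairs)."""
    squares = {x * x % p for x in range(1, p)}
    nonsquares = set(range(1, p)) - squares
    ss = sum(1 for a in squares if (target - a) % p in squares)
    sn = sum(1 for a in squares if (target - a) % p in nonsquares)
    nn = sum(1 for a in nonsquares if (target - a) % p in nonsquares)
    return ss, sn, nn


def predicted_counts(p: int, target_is_square: bool) -> tuple[int, int, int]:
    """The counts (S+S, S+N, N+N) asserted by Theorem 3.6 with q = p."""
    eps = 1 if p % 4 == 1 else -1                 # eps = (-1)^((p-1)/2)
    for_square = ((p - 4 - eps) // 4, (p - 2 + eps) // 4, (p - eps) // 4)
    # parts (iv)-(vi) are parts (iii), (ii), (i) in reverse order
    return for_square if target_is_square else for_square[::-1]
```

For $p = 7$ and the square $s = 1$, both functions give `(1, 1, 2)`: the only representations are $1 = 4 + 4$, $1 = 2 + 6$, and $1 = 3 + 5 = 5 + 3$.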
$$\sigma(a + b) = a^p + \binom{p}{1} a^{p-1} b + \binom{p}{2} a^{p-2} b^2 + \cdots + \binom{p}{p-1} a b^{p-1} + b^p = a^p + b^p = \sigma(a) + \sigma(b)$$
the prime $p$. Thus $\sigma$ is a ring homomorphism. Since $F$ is a field, its only ideals are $\{0\}$
and $F$; and $\ker \sigma \ne F$ since $\sigma(1) = 1$. So $\ker \sigma = \{0\}$ and $\sigma$ is injective. Since $F$ is finite,
$\sigma$ is also bijective, and it is an automorphism of $F$. The element $\sigma \in \operatorname{Aut} F$ has order at
most $e = [F : K]$ by Theorem A5.4(iii). Its order must be exactly $e$ since if $1 \le k < e$,
the nonconstant polynomial $x^{p^k} - x \in K[x]$ cannot have $p^e$ distinct roots. Since the upper
bound $|G| = e = [F : K]$ is attained, the extension $F \supseteq K$ is Galois and $G = \langle\sigma\rangle$.
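The additivity of $\sigma$ rests on the fact that the binomial coefficients $\binom{p}{k}$ with $0 < k < p$ are divisible by the prime $p$. A short Python check of this divisibility, given here only as an illustration with a hypothetical function name, also shows that it can fail for composite exponents.

```python
from math import comb


def middle_binomials_vanish_mod(n: int) -> bool:
    """True if every binomial coefficient C(n, k) with 0 < k < n is divisible by n."""
    return all(comb(n, k) % n == 0 for k in range(1, n))
```

The function returns `True` for each of the primes 2, 3, 5, 7, 11 and 13, and `False` for $n = 4$, since $\binom{4}{2} = 6$ is not divisible by 4.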
A general principle (not actually a theorem) is that almost anything that works for
prime order fields, works for finite fields. We now generalize Theorem 3.7 to arbitrary
finite fields.
polynomials don’t even have the same degree). In grappling with the source of one’s own
confusion, the student will come to better understand polynomials and functions, not only
over finite fields, but over other fields including R. The confusion here is not attributable
to characteristic; it comes down to a distinction between finite and infinite ground fields.
In a nutshell,
Over an infinite field, there are more functions than polynomials;
over a finite field, there are more polynomials than functions.
Let’s make sense of this synopsis.
If F is any field, then the set of all functions F → F is an algebra over F which we
may denote by $F^F$. Sums, products and scalar multiples of functions are by pointwise
evaluation; thus for $f, g \in F^F$, i.e. f, g : F → F, and scalars a, b ∈ F, the functions
fg : F → F and af + bg : F → F are defined by
\[ (fg)(c) = f(c)g(c) \quad\text{and}\quad (af + bg)(c) = af(c) + bg(c) \quad\text{for all } c \in F. \]
Now any polynomial f (x) ∈ F [x] yields a function f : F → F simply by evaluating the
polynomial at elements of F . But we must learn in general to distinguish the polynomial
f(x) ∈ F[x] from the resulting function F → F. When the field F is infinite, the
functions representable as polynomials (the so-called polynomial functions) form a proper
subalgebra of the algebra $F^F$ of all functions F → F. For example, the function R → R,
$x \mapsto \sin x$ is not a polynomial function. One proves that over an infinite field F, the
polynomial functions form a proper subalgebra of $F^F$, isomorphic to F[x]. This is not an
immediate consequence of the definitions; but we will outline a couple of ways to prove it
(below).
When the field F is finite, say |F| = q, then every function F → F is representable
by a polynomial, but in more than one way. Indeed, there are infinitely many polynomials
($(q-1)q^n$ distinct polynomials of degree n, for each degree $n \ge 0$; plus the zero polynomial)
but only finitely many functions ($|F^F| = q^q$ in fact).
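To see this counting in a small case, the sketch below works over the prime field with q = 3 elements, modelled as integers mod 3 (an assumption made for simplicity; general finite fields would need more machinery). The helper names `poly_to_function` and `functions_of_degree_at_most` are ours.

```python
from itertools import product

def poly_to_function(coeffs, q):
    """Evaluate the polynomial with coefficients coeffs (coeffs[i] is the
    coefficient of x^i) at every element of Z/qZ, q prime, and return the
    resulting table of values."""
    return tuple(sum(c * pow(a, i, q) for i, c in enumerate(coeffs)) % q
                 for a in range(q))

def functions_of_degree_at_most(q, max_degree):
    """Set of distinct functions represented by polynomials of degree
    at most max_degree over Z/qZ."""
    return {poly_to_function(c, q)
            for c in product(range(q), repeat=max_degree + 1)}

q = 3
# x^3 and x are different polynomials but the same function on F_3.
same_function = poly_to_function((0, 0, 0, 1), q) == poly_to_function((0, 1), q)
# Polynomials of degree < q already realise all q^q functions F_3 -> F_3.
all_functions_hit = len(functions_of_degree_at_most(q, q - 1)) == q ** q
print(same_function, all_functions_hit)
```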
To make sense of this, observe that for an arbitrary field F (finite or infinite), every
polynomial f(x) ∈ F[x] can be used to represent a function $f \in F^F$, i.e. f : F → F. This
gives a map which we shall denote by
(3.9) $\theta : F[x] \to F^F$, $f(x) \mapsto f$.
Now consider f(x) ∈ ker θ, i.e. f(a) = 0 for all a ∈ F. If $a_1, a_2, \ldots, a_n \in F$ are distinct,
then repeated application of the Division Algorithm shows that f(x) is divisible by
$(x - a_1)(x - a_2)\cdots(x - a_n)$ in F[x]; so either f(x) = 0 or $\deg f(x) \ge n$. It follows that for
3. Finite Fields
F infinite, ker θ = 0, i.e. θ is injective and so each polynomial can be identified with
the polynomial function that it represents. On the other hand if |F| = q, we see that
ker θ ⊂ F[x] is the principal ideal generated by $\prod_{a \in F}(x - a)$. This polynomial of degree
|F| = q must actually equal $x^q - x \in F[x]$ which is the unique monic polynomial of degree q
having all elements of F as roots, by Theorems 3.7 and 3.8. In summary, we obtain
As an alternative to the argument above using the Division Algorithm, one can obtain the
isomorphism using Lagrange interpolation, which is described as follows.
of degree n − 1. Then
(a) $f_i(a_j) = \delta_{i,j}$.
(b) $f_1(x), f_2(x), \ldots, f_n(x)$ form a basis for the n-dimensional subspace of F[x]
consisting of all polynomials of degree < n.
(c) Given $b_1, b_2, \ldots, b_n \in F$ (not necessarily distinct), the unique polynomial of
degree < n whose graph passes through the n points $(a_1, b_1), (a_2, b_2), \ldots, (a_n, b_n) \in F^2$
is $b_1 f_1(x) + b_2 f_2(x) + \cdots + b_n f_n(x)$.
The proof of Theorem 3.12 is elementary; and the basis in 3.12(b) is the Lagrange inter-
polation basis. For |F | = q, it shows that every f : F → F is represented by a unique
polynomial of degree < q. This gives an alternative proof of Theorem 3.11.
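As an illustration of Lagrange interpolation over a finite field, here is a minimal Python sketch over a prime field, modelled as integers mod p (our simplifying assumption). The names `poly_mul` and `lagrange_interpolate` are ours, and `pow(x, -1, p)` for modular inverses assumes Python 3.8 or later. It recovers the unique polynomial of degree < 5 representing the function a ↦ a^7 on the field of order 5.

```python
def poly_mul(f, g, p):
    """Multiply two polynomials (coefficient lists, lowest degree first) mod p."""
    out = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            out[i + j] = (out[i + j] + a * b) % p
    return out

def lagrange_interpolate(points, p):
    """Coefficients (lowest degree first) of the unique polynomial of degree < n
    through the n given points (a_i, b_i), with distinct a_i, over Z/pZ."""
    n = len(points)
    result = [0] * n
    for i, (ai, bi) in enumerate(points):
        basis = [1]   # numerator of f_i: product of (x - a_j) over j != i
        denom = 1     # product of (a_i - a_j) over j != i
        for j, (aj, _) in enumerate(points):
            if j != i:
                basis = poly_mul(basis, [(-aj) % p, 1], p)
                denom = denom * (ai - aj) % p
        scale = bi * pow(denom, -1, p) % p
        for k, c in enumerate(basis):
            result[k] = (result[k] + scale * c) % p
    return result

p = 5
points = [(a, pow(a, 7, p)) for a in range(p)]
coeffs = lagrange_interpolate(points, p)
print(coeffs)  # x^7 and x^3 define the same function on F_5
```

Since x^5 and x agree as functions on the field of order 5, the interpolating polynomial here is x^3.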
A third proof of Theorem 3.11 is based on the observation that the problem of con-
structing a polynomial f (x) ∈ F [x] whose graph passes through n pairs (ai , bi ) as in
Theorem 3.12, is equivalent to solving a linear system of n equations in n unknowns. The
n × n coefficient matrix of this system is of Vandermonde form and is well known to be
invertible. This gives a unique interpolating polynomial.
A final word about obtaining functions from polynomials: Given a polynomial f (x) ∈
F [x], in addition to the function f : F → F represented by f (x), one also obtains a
function E → E for every extension field E ⊇ F . Thus for example, every f (x) ∈ Q[x]
Let $F = \mathbb{F}_q$. Denote by $n_{q,d}$ (or simply $n_d$, since q will usually remain unchanged
throughout our discussion) the number of monic irreducible polynomials f(x) ∈ F[x] of
degree $d \ge 1$. In the construction of finite fields, we make essential use of the fact that
$n_d \ge 1$ for all d. Here we give a formula for $n_d$, from which it follows that for a monic
polynomial f(x) ∈ F[x] of degree d chosen uniformly at random, the probability that f(x)
is irreducible is asymptotically $\frac{1}{d}$ as $d \to \infty$. The total number of elements in $\mathbb{F}_{q^k}$ is
\[ q^k = \sum_{d \mid k} d\,n_d \]
since each $\alpha \in \mathbb{F}_{q^k}$ is algebraic of some degree $d \mid k$; and the minimal polynomial of α
over F has d distinct roots in this extension. It follows (by induction on d, or by Möbius
inversion, or by inclusion-exclusion; cf. Section 1) that
\[ k\,n_k = \sum_{d \mid k} \mu\!\left(\tfrac{k}{d}\right) q^d = \sum_{d \mid k} \mu(d)\, q^{k/d}. \]
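(ii) $n_k = \frac{1}{k}\sum_{d \mid k} \mu(d)\, q^{k/d} = \frac{q^k}{k} \prod_{\text{prime } r \mid k} \Bigl(1 - \frac{1}{q^{(1-\frac{1}{r})k}}\Bigr) \sim \frac{q^k}{k}$ as $k \to \infty$.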
Notice, by the way, that this formula asserts that $n_k > 0$ for all $k \ge 1$. (However, our
argument cannot be construed as proof of the existence of irreducible polynomials of every
degree, if one first assumes the existence of $\mathbb{F}_{q^k} \cong \mathbb{F}_q[x]/(m(x))$ obtained using an irreducible
polynomial $m(x) \in \mathbb{F}_q[x]$ of degree k, as this would constitute circular reasoning. One
could however obtain the same formula by counting polynomials of each degree instead of
counting elements of each degree.)
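The Möbius-inversion formula for the number of monic irreducible polynomials is easy to evaluate. The following Python sketch (our own illustration; the names `mobius`, `divisors` and `num_irreducible` are not from the notes) computes $n_k$ from the sum over divisors and checks the counting identity $q^k = \sum_{d \mid k} d\,n_d$ for small cases.

```python
def mobius(n):
    """Moebius function by trial division."""
    result = 1
    p = 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0      # n has a squared prime factor
            result = -result
        p += 1
    if n > 1:
        result = -result      # one remaining prime factor
    return result

def divisors(n):
    return [d for d in range(1, n + 1) if n % d == 0]

def num_irreducible(q, k):
    """Number of monic irreducible polynomials of degree k over F_q,
    computed as (1/k) * sum over d | k of mu(d) * q^(k/d)."""
    return sum(mobius(d) * q ** (k // d) for d in divisors(k)) // k

counts_over_f2 = [num_irreducible(2, k) for k in range(1, 7)]
identity_holds = all(
    sum(d * num_irreducible(q, d) for d in divisors(k)) == q ** k
    for q in (2, 3, 4, 5) for k in range(1, 9)
)
print(counts_over_f2, identity_holds)
```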
The following result will be required in Sections 13 and 14. It is a fundamental result
in finite projective geometry (see [M4]); but in keeping with the scope of this course, here
we state and prove it in the affine setting.
Proof. Clearly (i) implies (ii). Points of intersection of the graphs of y = f (x) and
y = mx + k (m, k ∈ F) are found by first solving $ax^2 + bx + c = mx + k$ for x; this has at
most two solutions for x ∈ F and hence at most two points of intersection (x, y).
Conversely, suppose f satisfies (ii). The following observation will be useful later.
Each point $P = (x_0, f(x_0))$ lies on q−1 secant lines passing through the other points
(x, f(x)) of $\Gamma_f$, $x \ne x_0$; and by (ii), these q−1 secants are necessarily distinct. Since there
are exactly q non-vertical lines through P (corresponding to the q choices from F for the
slope), each point $P \in \Gamma_f$ lies on a unique tangent line, this being the unique non-vertical
line intersecting $\Gamma_f$ in the unique point P. Now $\Gamma_f$ has q tangent lines, and it is not hard
to see that any two tangent lines differ in slope. (Consider the tangent line $\ell$ through a
point $P \in \Gamma_f$, and let $Q \ne P$ be another point of $\ell$. Since $|\Gamma_f| = q$ is odd and every line
through Q meets $\Gamma_f$ in 0, 1 or 2 points, there must be an even number of tangent lines
passing through Q (remembering that the vertical line through Q is not considered here
as a tangent line). Since Q already lies on $\ell$, each of the q−1 points $Q \ne P$ on $\ell$ must
lie on at least one additional tangent line other than $\ell$. Since each of the q−1 tangent
lines $\ell' \ne \ell$ meets $\ell$ in a single point (since $\ell'$ and $\ell$ have different slope), the Pigeonhole
Principle shows that each point $Q \ne P$ on $\ell$ lies on a unique tangent line other than $\ell$.)
This verifies our claim that no point of the plane $F^2$ lies on more than two tangent lines.
At this point we prove the following fact, which is usually known as the Lemma of
Tangents.
odd (see Exercise #2 regarding the situation in even characteristic). Put another way, in
fields of even characteristic, no arithmetic progression can have more than two distinct
terms.
Proof of (3.16). By (3.15), there is no loss of generality in assuming that the points
A and B have coordinates (0, 0) and (1, 0) respectively. (The transformations in (3.15)
not only preserve the properties (i) and (ii); they also preserve the property described
by the conclusion of (3.16).) Denote by $f'(x)$ the slope of the tangent line to $\Gamma_f$ at the
point (x, f(x)). (Of course $f'$ is simply a convenient name to use here; it is not intended
to connote differentiation. Note that $f'(0) \ne 0$ since the tangent line at A cannot pass
through B; and likewise, $f'(1) \ne 0$.) The points $P(x, f(x)) \in \Gamma_f$ other than A, B are
indexed by the values x ∈ F, $x \ne 0, 1$; and each such point determines two secant lines
PA, PB with nonzero slopes $\frac{f(x)}{x}$ and $\frac{f(x)}{x-1}$ respectively. Now consider the product
\[ \Pi = \prod_{\substack{P \in \Gamma_f \\ P \ne A, B}} \frac{\text{slope of } PA}{\text{slope of } PB} \]
having q−2 factors in the numerator, including all of the nonzero elements of F except
$f'(0)$ (since only three lines through A have been omitted: the horizontal and vertical line,
and the tangent line). Similarly, the denominator of Π includes all of the nonzero elements
of F except $f'(1)$ (the slope of the tangent line through B). After cancelling like factors,
we obtain $\Pi = \frac{f'(1)}{f'(0)}$. However, the preceding expressions for the slopes of PA and PB
give
\[ \Pi = \prod_{\substack{P \in \Gamma_f \\ P \ne A, B}} \frac{\text{slope of } PA}{\text{slope of } PB} = \prod_{\substack{x \in F \\ x \ne 0, 1}} \frac{f(x)/x}{f(x)/(x-1)} = \prod_{\substack{x \in F \\ x \ne 0, 1}} \frac{x-1}{x} = -1 \]
since in the latter product, the only element of $F^\times$ omitted in the numerator is −1,
whereas the denominator omits only 1 as a factor. Equating these two expressions for Π
yields $f'(0) = -f'(1) = m$ for some $m \in F^\times$. This means that M is the point $\bigl(\frac{1}{2}, \frac{m}{2}\bigr)$,
which completes the proof of (3.16).
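The value −1 of the product of the ratios (x−1)/x over x ≠ 0, 1 can be checked directly in small cases. The sketch below does so for prime fields only, modelled as integers mod p (an assumption for simplicity; the argument in the text covers every finite field of odd order). The function name `secant_ratio_product` is ours, and `pow(x, -1, p)` assumes Python 3.8 or later.

```python
def secant_ratio_product(p):
    """Product of (x - 1) / x over all x in Z/pZ other than 0 and 1,
    for an odd prime p, returned as a residue in range(p)."""
    numerator = denominator = 1
    for x in range(2, p):
        numerator = numerator * (x - 1) % p
        denominator = denominator * x % p
    return numerator * pow(denominator, -1, p) % p

# The residue p - 1 represents -1 in Z/pZ.
results = {p: secant_ratio_product(p) for p in (3, 5, 7, 11, 13, 101)}
print(results)
```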
Now to complete the proof of Theorem 3.14, still assuming (ii) holds, we fix two points
A, B in Γf . We will henceforth assume (without loss of generality, using (3.15)) that these
are the points (−1, 1) and (1, 1) respectively; and that the point M where their tangents
meet is the point (0, −1). (Recall that −1 ≠ 1 since q is odd. These coordinates differ from
the choices in our proof of (3.16); but our new choices benefit from a different use of
symmetry.) Consider an arbitrary point $P(x, f(x)) \in \Gamma_f$ distinct from A and B. Denote
by Q and R the points where the tangent at P meets the tangent lines through A and B,
respectively. By (3.15), these points have coordinates $Q = \bigl(\frac{x-1}{2}, u\bigr)$ and $R = \bigl(\frac{x+1}{2}, v\bigr)$
for some u, v ∈ F. Since A, M, Q are collinear, we must have u = −x; and collinearity of
B, M, R requires v = x. Finally, the collinearity of P, Q, R requires $f(x) = x^2$.
We remark that in the projective setting, the three lines AR, BQ and P M all pass
through a common point O; and this assertion is the more general form of the Lemma
of Tangents. In affine coordinates as above, assuming $x^2 - x + 1 \ne 0$, one finds that
O = 2(xx(x+1) 1
e
2 −x+1) , 2(x2 −x+1) . This is not a problem except when q = 3 and x = −1; or
when q ≡ 1 mod 6 and x is a primitive sixth root of unity in F (see Exercise #10.2). In
these cases O lies ‘at infinity’ and then the appropriate affine description is that the lines
AR, BQ and P M are mutually parallel. The traditional proof of Segre’s Theorem includes
the full version of the Lemma of Tangents, as is most natural in the projective setting.
But our proof above suffices because the complete statement of the Lemma of Tangents is
. . . er . . . tangential to the immediate goal of proving Segre’s Theorem. While (3.15) is
itself a special case of the Lemma of Tangents, this however is easily stated in affine form.
Since we will make use of Segre’s Theorem in Section 13 to prove something about cyclotomic
fields, it is appropriate to question whether the geometric terminology of Segre’s Theorem
should be required to prove an essentially algebraic fact. Our personal view is that the
geometric language and figures used here provide a conceptual aid which is surely helpful
in following the proof. While the pictures might not be strictly necessary, the finiteness
of F (enabling us to use the Pigeonhole Principle) was indispensable.
Exercises 3.
1. (a) The proof of (3.16) involves the product of all q−1 nonzero elements of the finite field $F = \mathbb{F}_q$.
Prove that this product equals −1. (Our proof did not require the explicit value of this
product, because almost all factors in the top and bottom of Π were cancelled.) Hint: For
each factor x in $\prod_{x \ne 0} x$, the factor $x^{-1}$ also appears.
(b) As a corollary, obtain Wilson’s Theorem: (p − 1)! ≡ −1 mod p for every prime p.
4. Cyclotomic Fields and Integers
2. Let F be a finite field of even order $q = 2^e$. (Keep in mind that for fields of even order, the map
$a \mapsto a^2$ is an automorphism, by Theorem 3.7. Its inverse, $a \mapsto \sqrt{a}$, is also an automorphism.
Thus every element has a unique square root; and $\sqrt{a+b} = \sqrt{a} + \sqrt{b}$, $\sqrt{ab} = \sqrt{a}\,\sqrt{b}$.) Consider
the function f : F → F defined by $f(a) = \frac{1}{a}$, if $a \ne 0$; f(0) = 0.
(a) Show that no three points of the graph of f are collinear. (Compare with Theorem 3.14(ii).)
(b) Show that the graph of f is represented by a quadratic polynomial (as in Theorem 3.14(i)) for
q ∈ {2, 4}, but not for $q \ge 8$. This shows the failure of Segre’s Theorem in even characteristic.
3. Consider an extension E ⊇ F of finite fields of odd order, and consider a nonzero element $a \in E^\times$.
Show that a is a square in E iff its norm $N_{E/F}(a)$ is a square in F. (‘Square in E’ means an
element of the form $b^2$, b ∈ E. ‘Square in F’ means an element of the form $b^2$, b ∈ F.)
7. Give explicit formulas for $n_{q,d}$, the number of irreducible polynomials in $\mathbb{F}_q[x]$ of degree d ∈
{1, 2, 3, 4, 5}.
known as the Kronecker-Weber Theorem, indicates clearly the very special nature of
cyclotomic extensions; but its proof is beyond the scope of our course. See [Wa], [L1] for
details.
An important consequence of the fact that cyclotomic extensions are abelian, is
Be warned that this is a special property not valid in most Galois extensions! See Examples
A5.8 and A5.10, for two instances where στ ≠ τσ; and τ is complex conjugation in each
of those examples.
Proof of Theorem 4.1. Denote by τ ∈ Aut E the complex conjugation map $\tau(z) = \bar{z}$.
Since Aut E is abelian, we have στ(z) = τσ(z). (In fact, $\tau = \sigma_{-1}$. So if $\sigma = \sigma_k$, then
$\sigma\tau = \sigma_{-k} = \tau\sigma$.)
Theorem 4.2. The roots of unity in $\mathbb{Q}[\zeta_n]$ form a cyclic group $\langle\zeta_n\rangle$ of order n, if n
is even; or $\langle\zeta_{2n}\rangle$ of order 2n if n is odd.
Proof. Without loss of generality, n is even; since for odd n, the field $\mathbb{Q}[\zeta_n]$ already
contains a primitive 2n-th root of unity $-\zeta_n$. Now given a primitive m-th root of unity $\zeta_m$ in
$\mathbb{Q}[\zeta_n]$, we must show that $\zeta_m$ is a power of $\zeta_n$, i.e. that m divides n. Suppose not; then there
exists a prime power $p^d$, p prime, $d \ge 1$, dividing m but not dividing n. Without loss of
generality $m = p^d$; otherwise replace $\zeta_m$ by $\zeta_m^{m/p^d}$, a primitive $p^d$-th root of unity in $\mathbb{Q}[\zeta_n]$.
Let $n' = \operatorname{lcm}(m, n) = p^{d-k}n$ where $p^k$ is the largest power of p dividing n, and $d - k \ge 1$.
Clearly $\zeta_m\zeta_n$ is a primitive $n'$-th root of unity in $E = \mathbb{Q}[\zeta_n]$, so that $E \supseteq \mathbb{Q}[\zeta_{n'}] \supseteq \mathbb{Q}$ and
$\varphi(n') \mid \varphi(n)$. However, $\varphi(n') = p^{d-k}\varphi(n) > \varphi(n)$ by Section 1, a contradiction.
Since Q[ζn ] = Q[ζ2n ] whenever n is odd (where we may take ζ2n = −ζn ), and in order to
avoid the exceptional alternative of Theorem 4.2, we will often want to assume that n is
even.
Proof. Since r and s are relatively prime to n, $\zeta^r$ and $\zeta^s$ are primitive n-th roots of unity, so
each of them is a power of the other. (We must remark that $\zeta^r \ne 1$ and $\zeta^s \ne 1$ since
n > 2.) In particular, $\zeta^s = (\zeta^r)^k$ for some integer k. Thus
\[ 1 - \zeta^s = 1 - \zeta^{kr} = (1 - \zeta^r)(1 + \zeta^r + \zeta^{2r} + \cdots + \zeta^{(k-1)r}) = (1 - \zeta^r)u \]
We have mentioned the following result, found in [Wa, Theorem 2.6] with proof relying
on [L2, p.68]. We give the proof only in the important case that n is prime.
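\[
\begin{pmatrix} \alpha_1 \\ \alpha_2 \\ \alpha_3 \\ \vdots \\ \alpha_{p-1} \end{pmatrix}
=
\begin{pmatrix}
1 & \zeta & \zeta^2 & \cdots & \zeta^{p-2} \\
1 & \zeta^2 & \zeta^4 & \cdots & \zeta^{2(p-2)} \\
1 & \zeta^3 & \zeta^6 & \cdots & \zeta^{3(p-2)} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
1 & \zeta^{p-1} & \zeta^{2(p-1)} & \cdots & \zeta^{(p-2)(p-1)}
\end{pmatrix}
\begin{pmatrix} a_0 \\ a_1 \\ a_2 \\ \vdots \\ a_{p-2} \end{pmatrix}
= M \begin{pmatrix} a_0 \\ a_1 \\ a_2 \\ \vdots \\ a_{p-2} \end{pmatrix}
\]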
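in which the coefficient matrix M is a Vandermonde matrix of order p−1. Its determinant,
as found by a well-known formula, is
\[ \det M = \prod_{1 \le i < j \le p-1} (\zeta^j - \zeta^i) = u(1 - \zeta)^k \]
for some $u \in \mathcal{O}^\times$ by Theorem 4.3 and $k = \binom{p-1}{2} > 0$. Now
\[
\begin{pmatrix} a_0 \\ a_1 \\ \vdots \\ a_{p-2} \end{pmatrix}
= M^{-1} \begin{pmatrix} \alpha_1 \\ \alpha_2 \\ \vdots \\ \alpha_{p-1} \end{pmatrix}
\]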
so each $a_i = a_i'/\varepsilon^k$ for some $a_i' \in \mathcal{O}$. Since $(p) = (\varepsilon)^{p-1}$, there exists a positive integer m such
that $p^m a_i \in \mathcal{O}$ for all i ∈ {0, 1, 2, …, p−2}. However, $p^m a_i \in \mathbb{Q}$. By Theorem A3.2(ii),
$p^m a_i \in \mathbb{Z}$ and so $p^m\alpha \in \mathbb{Z}[\zeta] = \mathbb{Z}[1-\zeta] = \mathbb{Z}[\varepsilon]$; we may write
We need to show that $\alpha \in \mathbb{Z}[\zeta] = \mathbb{Z}[\varepsilon]$, so without loss of generality $m \ge 1$ and suppose at
least one of the $b_i$ is not divisible by p; we seek a contradiction. Let k ∈ {0, 1, 2, …, p−2}
be minimal such that $b_k \not\equiv 0 \bmod p$. Note that $\sum_{i=0}^{p-2} b_i\varepsilon^i \notin (\varepsilon)^{k+1}$ since $b_k\varepsilon^k \notin (\varepsilon)^{k+1}$ but
all other terms lie in $(\varepsilon)^{k+1}$. However, $p^m\alpha \in (p) \subseteq (\varepsilon)^{k+1}$, a contradiction as desired.
Discriminants and their use are discussed in Appendix A3. For the prime p = 2, note
that $\mathbb{Z}[\zeta_2] = \mathbb{Z}[-1] = \mathbb{Z}$ which has discriminant 1. The discriminant of $\mathbb{Z}[\zeta_n]$ for general
n > 2 is
\[ (-1)^{\varphi(n)/2}\, \frac{n^{\varphi(n)}}{\prod_{p \mid n} p^{\varphi(n)/(p-1)}}; \]
Proof. For $k \not\equiv 0 \bmod p$, $\zeta^k$ is a primitive p-th root of unity, and its algebraic conjugates
are $\zeta, \zeta^2, \ldots, \zeta^{p-1}$ so $\operatorname{Tr}_{E/\mathbb{Q}}(\zeta^k) = \zeta + \zeta^2 + \cdots + \zeta^{p-1} = -1$; whereas if $k \equiv 0 \bmod p$, then
$\zeta^k = 1$ and $\operatorname{Tr}_{E/\mathbb{Q}}(\zeta^k) = 1 + 1 + \cdots + 1 = p-1$. In view of the relation $1 + \zeta + \zeta^2 + \cdots + \zeta^{p-1} = 0$,
any p − 1 of the elements $1, \zeta, \zeta^2, \ldots, \zeta^{p-1}$ form a base for $\mathbb{Z}[\zeta]$ over $\mathbb{Z}$. It is convenient for
us to use $\zeta, \zeta^2, \zeta^3, \ldots, \zeta^{p-1}$ as our choice of base. As described in Appendix A3,
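\[
= \det \begin{pmatrix}
-1 & -1 & -1 & \cdots & -1 & -1 \\
-1 & -1 & -1 & \cdots & -1 & p-1 \\
-1 & -1 & -1 & \cdots & p-1 & -1 \\
\vdots & \vdots & \vdots & & \vdots & \vdots \\
-1 & -1 & p-1 & \cdots & -1 & -1 \\
-1 & p-1 & -1 & \cdots & -1 & -1
\end{pmatrix}
\qquad ((p-1) \times (p-1) \text{ matrix})
\]
\[
= \det \begin{pmatrix}
-1 & -1 & -1 & \cdots & -1 & -1 \\
0 & 0 & 0 & \cdots & 0 & p \\
0 & 0 & 0 & \cdots & p & 0 \\
\vdots & \vdots & \vdots & & \vdots & \vdots \\
0 & 0 & p & \cdots & 0 & 0 \\
0 & p & 0 & \cdots & 0 & 0
\end{pmatrix}
= (-1)^{\frac{p-1}{2}}\, p^{p-2}.
\]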
Since the only rational prime dividing the discriminant is p, the result follows from Ap-
pendix A3 (the remarks about ramification following Theorem A3.10).
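The discriminant value can be checked for small primes by computing the determinant of the trace matrix directly. The sketch below (our own illustration, not from the notes) builds the matrix whose (i, j) entry is the trace of $\zeta^{i+j}$ for 1 ≤ i, j ≤ p−1, using the trace values p−1 and −1 found in the proof, and evaluates the determinant exactly with rational arithmetic. The names `trace_matrix` and `det` are ours.

```python
from fractions import Fraction

def trace_matrix(p):
    """Matrix of traces Tr(zeta^(i+j)), 1 <= i, j <= p-1, for the base
    zeta, zeta^2, ..., zeta^(p-1): the trace is p-1 if p | (i+j), else -1."""
    return [[(p - 1) if (i + j) % p == 0 else -1 for j in range(1, p)]
            for i in range(1, p)]

def det(matrix):
    """Exact determinant by Gaussian elimination over the rationals."""
    m = [[Fraction(x) for x in row] for row in matrix]
    n = len(m)
    sign = 1
    result = Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if m[r][col] != 0), None)
        if pivot is None:
            return 0
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            sign = -sign
        result *= m[col][col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= factor * m[col][c]
    return int(sign * result)

discriminants = {p: det(trace_matrix(p)) for p in (3, 5, 7, 11)}
print(discriminants)
```

For p = 3 this gives −3, in agreement with the value quoted in Example 4.7.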
Example 4.7: The Cyclotomic Field Q[ζ3]. Let $E = \mathbb{Q}[\zeta_3]$. Our primitive cube root of unity is
$\zeta_3 = \frac{1}{2}(-1 + \sqrt{-3})$. By Theorem 4.6, E has discriminant −3. Since E ⊃ Q is an imaginary quadratic
extension, this is covered by Example A3.5 where the same value −3 is found for the discriminant.
See also Example A3.8 regarding this extension.
The maximal real subfield of $\mathbb{Q}[\zeta]$ is the subfield $\mathbb{R} \cap \mathbb{Q}[\zeta]$. It is often denoted by
$\mathbb{Q}[\zeta]^+$.
Note that $\alpha_1 = \alpha$; and for each k in the indicated interval, the binomial expansion gives
\[
\alpha^k =
\begin{cases}
\displaystyle\sum_{0 \le i \le \frac{k-1}{2}} \binom{k}{i}\, \alpha_{k-2i}, & \text{for } k \text{ odd;} \\[2ex]
\displaystyle\binom{k}{k/2} + \sum_{0 \le i \le \frac{k-2}{2}} \binom{k}{i}\, \alpha_{k-2i}, & \text{for } k \text{ even.}
\end{cases}
\]
Induction shows that $\{\alpha_0, \alpha_1, \ldots, \alpha_k\}$ and $\{\alpha^0, \alpha^1, \alpha^2, \ldots, \alpha^k\}$ span the same Q-subspace
of Q[α] for each k. In fact by the relations above, the change-of-basis matrix between the
two bases is upper triangular. Indeed, since the change-of-basis matrix has integer entries
with 1’s on the main diagonal, it follows that $\{\alpha_0, \alpha_1, \ldots, \alpha_k\}$ and $\{\alpha^0, \alpha^1, \alpha^2, \ldots, \alpha^k\}$
span the same Z-submodule of Z[α] for each k.
Now suppose β ∈ Q[α] is an algebraic integer; we must show that β ∈ Z[α]. Simply
express
\[ \beta = \sum_{k=0}^{N-1} b_k\alpha_k = b_0 + \sum_{k=1}^{N-1} b_k(\zeta^k + \zeta^{-k}) \]
where $b_k \in \mathbb{Q}$ for k = 0, 1, 2, …, N−1; $N = \frac{1}{2}\varphi(n) - 1$. In order that β be an algebraic
integer, Theorem 4.5 requires each $b_k \in \mathbb{Z}$; but then the observations above indicate that
β is a Z-linear combination of $\alpha^0, \alpha^1, \alpha^2, \ldots, \alpha^{N-1}$, i.e. β ∈ Z[α].
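Example 4.9: The Maximal Real Subfield of Q[ζ7]. Let $\zeta = \zeta_7$, $\alpha = \zeta + \zeta^{-1}$, $\alpha^2 = \zeta^2 + \zeta^{-2} + 2$,
$\alpha^3 = \zeta^3 + \zeta^{-3} + 3(\zeta + \zeta^{-1})$. We have
\[
\begin{aligned}
\alpha^3 + \alpha^2 - 2\alpha - 1 &= \zeta^3 + \zeta^{-3} + 3(\zeta + \zeta^{-1}) + \zeta^2 + \zeta^{-2} + 2 - 2(\zeta + \zeta^{-1}) - 1 \\
&= \zeta^3 + \zeta^{-3} + \zeta^2 + \zeta^{-2} + \zeta + \zeta^{-1} + 1 = 0
\end{aligned}
\]
so α is a root of $f(x) = x^3 + x^2 - 2x - 1 \in \mathbb{Z}[x]$. Since the degree of f(x) is $\frac{1}{2}\varphi(7) = 3$, this is
the minimal polynomial of α. Allowing ζ to vary over the primitive seventh roots of unity in C, we
obtain the roots of f(x) as
\[ \alpha_k = e^{2k\pi i/7} + e^{-2k\pi i/7} = 2\cos\frac{2k\pi}{7}, \qquad k = 1, 2, 3. \]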
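A quick floating-point check of Example 4.9 (a sketch of our own, subject to rounding error) confirms that the three values 2 cos(2kπ/7) are roots of x³ + x² − 2x − 1, and that their sum and product match the coefficients of this polynomial.

```python
import math

def f(x):
    """The cubic x^3 + x^2 - 2x - 1 from Example 4.9."""
    return x ** 3 + x ** 2 - 2 * x - 1

roots = [2 * math.cos(2 * k * math.pi / 7) for k in (1, 2, 3)]
residuals = [abs(f(r)) for r in roots]

# Vieta's relations for a monic cubic: sum of roots = -1, product of roots = 1.
root_sum = sum(roots)
root_product = roots[0] * roots[1] * roots[2]
print(roots, residuals, root_sum, root_product)
```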
Theorem 4.10. Let α ∈ I. Then α is a root of unity iff every algebraic conjugate
of α has absolute value 1.
in particular $b_{r,0} = 1$, $b_{r,1} = -\alpha_1^r - \alpha_2^r - \cdots - \alpha_n^r$, and $b_{r,n} = (-1)^n\alpha_1^r\alpha_2^r\cdots\alpha_n^r$. Since each
$|\alpha_i| = 1$, we obtain the bounds
\[ b_{r,0} = 1, \qquad |b_{r,n}| = 1, \qquad\text{and}\qquad |b_{r,k}| \le \binom{n}{k} \text{ for each } k. \]
Each σ ∈ Aut E permutes $\alpha_1, \ldots, \alpha_n$ and so must satisfy $\sigma(b_{r,k}) = b_{r,k}$ for all r, k. By
Galois theory, the fixed field of Aut E is Q; but clearly each $b_{r,k}$ is an algebraic integer,
so $b_{r,k} \in I \cap \mathbb{Q} = \mathbb{Z}$ by Theorem A3.2(ii). Thus $g_r(x) \in \mathbb{Z}[x]$ for each $r \ge 1$. But
there are only $2\binom{n}{k} + 1$ choices of integer $b_{r,k}$ satisfying the bounds indicated above for
k = 1, 2, …, n−1 (also two choices of $b_{r,n} = \pm 1$, and the single choice $b_{r,0} = 1$), hence at
most $N := 2\prod_{k=1}^{n-1}\bigl(2\binom{n}{k} + 1\bigr)$ possibilities for each such polynomial $g_r(x)$. The set of all
distinct roots of all the polynomials in our family $\{g_1(x), g_2(x), g_3(x), \ldots\}$ is therefore a
set of cardinality at most nN; and all distinct powers $\alpha^r$ for $r \ge 1$ must lie in this finite
set. Evidently the powers $\alpha^r$ cannot all be distinct, so α is a root of unity.
We can finally say something more about the group of units $\mathcal{O}^\times$ of E. By Dirichlet’s
Theorem A3.6, $\mathcal{O}^\times \cong \{\text{roots of unity}\} \times \mathbb{Z}^{r+s-1}$ where E has r embeddings in R and s
pairs of complex conjugate embeddings in C. By Theorem 4.2, we have a firm grasp on the
first factor, consisting of the roots of unity (the torsion subgroup of $\mathcal{O}^\times$). For the second
factor, the maximal free abelian subgroup of $\mathcal{O}^\times$, note that $E := \mathbb{Q}[\zeta_n]$ has no embeddings
in R for n > 2; and it has $N := \frac{1}{2}\varphi(n)$ pairs (under complex conjugation) of embeddings
in C. So $\mathcal{O}_E^\times \cong \langle\zeta_n\rangle \times \mathbb{Z}^{N-1}$, assuming n > 2 is even. Now the maximal real subfield
F := Q[α] ⊂ E, $\alpha = \zeta + \zeta^{-1}$, has N real embeddings and no complex embeddings outside
of R. Its roots of unity are just ±1, so its group of units satisfies $\mathcal{O}_F^\times \cong \langle -1\rangle \times \mathbb{Z}^{N-1}$. This
means that the subgroup $\mathcal{O}_F^\times \subseteq \mathcal{O}_E^\times$ has finite index $[\mathcal{O}_E^\times : \mathcal{O}_F^\times] < \infty$. In other words, if we
can identify the units in the maximal real subfield F, we shall have found ‘almost all’ the
units in E.
A full enumeration of the unit group $\mathcal{O}_E^\times$ is beyond the scope of this course; details
can be found in [Wa, Chapter 8]. But for the important case of n = p prime we have:
Theorem 4.11. Let $E = \mathbb{Q}[\zeta]$, $\zeta = \zeta_p$, p an odd prime; and let $F = \mathbb{Q}[\alpha]$ be its
maximal real subfield, $\alpha = \zeta + \zeta^{-1}$. Then every unit $u \in \mathcal{O}_E^\times$ has the form $u = \zeta^r u_+$
for some r ∈ {0, 1, 2, …, p−1} and some real unit $u_+ \in \mathcal{O}_F^\times$.
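Proof. Since $u \in \mathcal{O}_E^\times$, we have $\bar{u} \in \mathcal{O}_E^\times$. Let $\beta = u/\bar{u}$, and note that $\beta \in \mathcal{O}_E^\times$. By
Theorem 4.1,
\[ |\sigma(\beta)|^2 = \sigma(\beta)\overline{\sigma(\beta)} = \sigma(\beta)\sigma(\bar\beta) = \sigma(\beta\bar\beta) = \sigma\Bigl(\frac{u\,\bar{u}}{\bar{u}\,u}\Bigr) = \sigma(1) = 1 \]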
\[ 2u = u - \zeta^k\bar{u} \equiv u - \bar{u} \equiv 0 \bmod (\varepsilon), \]
i.e. 2u = εv for some $v \in \mathcal{O}_E$. But then taking norms, $2^{p-1}\,N_{E/\mathbb{Q}}(u) = p\,N_{E/\mathbb{Q}}(v)$ where all
values of the norm are integers. But $N_{E/\mathbb{Q}}(u) = 1$ since u is a unit, so $2^{p-1}$ is divisible by
p, a contradiction.
Thus $\beta = \zeta^k$ for some k ∈ Z/pZ. Since p is odd, we have k = 2r for some r ∈ Z/pZ.
Let $u_+ = \zeta^{-r}u$, so that $u_+$ is a unit. Also $u_+^2 = \zeta^{-2r}u^2 = \zeta^{-2r}u \cdot \bar{u}\zeta^k = u\bar{u} = |u|^2$, which
is a positive real number; so $u_+ \in \mathbb{R}$. This means that $u_+ \in E \cap \mathbb{R} = F$, the maximal real
subfield of E. Moreover, $u = \zeta^r u_+$ as required.
Of course, we will not find these observations to be of much use in identifying the
group of units of E, unless we can first find the group of units of F . Fortunately, however,
the units of Theorem 4.3 are sufficient to generate almost the full group of units (meaning
a subgroup of finite index in the full group of units). Although the ratios $\frac{1-\zeta^r}{1-\zeta^s}$ are not
generally real, they do provide generators of $\mathcal{O}_F^\times$ after factoring out roots of unity; see
Example 4.9. More explicitly, let n > 2, and for convenience denote by $\zeta_{2n}$ a primitive
2n-th root of unity satisfying $\zeta_{2n}^2 = \zeta_n$. Also let r, s be integers relatively prime to n. Then
Theorem 4.3 gives us units in E of the form
\[
\frac{1 - \zeta_n^r}{1 - \zeta_n^s}
= \frac{1 - \zeta_{2n}^{2r}}{1 - \zeta_{2n}^{2s}}
= \frac{\zeta_{2n}^{r}\,(\zeta_{2n}^{-r} - \zeta_{2n}^{r})}{\zeta_{2n}^{s}\,(\zeta_{2n}^{-s} - \zeta_{2n}^{s})}
= \zeta_{2n}^{r-s}\,\frac{\sin\frac{2r\pi}{2n}}{\sin\frac{2s\pi}{2n}}
= \zeta_{2n}^{r-s}\,\frac{\sin\frac{r\pi}{n}}{\sin\frac{s\pi}{n}} \in \mathcal{O}_E^\times,
\]
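Example 4.12: Q[ζ23] is neither a PID nor a UFD. The field $E = \mathbb{Q}[\zeta_{23}]$ does not have unique
factorization. To see this, by Theorem A3.14 it suffices to find an ideal in $\mathcal{O}_E = \mathbb{Z}[\zeta]$, $\zeta = \zeta_{23}$,
which is not principal. First note that $\sqrt{-23} \in E$, which follows from Theorem 10.13. The subfield
$F = \mathbb{Q}[\sqrt{-23}] \subset E$ has as its ring of integers $\mathcal{O}_F = \mathbb{Z}[\theta]$, $\theta = \frac{1}{2}(1 + \sqrt{-23})$ by Example A3.5. Consider
the ideals $\mathfrak{p} = (2, \theta) \subset \mathcal{O}_F$ (note: $\mathfrak{p} = 2\mathcal{O}_F + \theta\mathcal{O}_F$) and $\bar{\mathfrak{p}} = (2, \bar\theta)$. We have $2 = (-2)\cdot 2 + \theta\bar\theta \in \mathfrak{p}\bar{\mathfrak{p}}$, and
the reverse containment follows from $(2a + b\theta)(2c + d\bar\theta) = 2(2ac + ad\bar\theta + bc\theta + 3bd) \in (2)$. Thus $(2) = \mathfrak{p}\bar{\mathfrak{p}}$.
Now the norm map on $\mathcal{O}_F$ is defined by $N(a + b\theta) = (a + b\theta)(a + b\bar\theta) = a^2 + ab + 6b^2$. In particular,
$N(\mathfrak{p})\,N(\bar{\mathfrak{p}}) = N(2) = 4$ and since $N(\mathfrak{p}) = N(\bar{\mathfrak{p}})$ by algebraic conjugation, we must have $N(\mathfrak{p}) = N(\bar{\mathfrak{p}}) = 2$.
Evidently the ideals $\mathfrak{p}$ and $\bar{\mathfrak{p}}$ are nonprincipal; for example if $\mathfrak{p} = (a + b\theta)$, a, b ∈ Z, then $a^2 + ab + 6b^2 =
N(\mathfrak{p}) = 2$; but this has no solution in integers. (If $2 = a^2 + ab + 6b^2 = (a + \frac{b}{2})^2 + \frac{23}{4}b^2 \ge \frac{23}{4}b^2$, then we
must have b = 0; but then $a^2 = 2$, a contradiction.) Now the extension E ⊃ F is Galois of degree
$\frac{1}{2}\varphi(23) = 11$, with Galois group $G = G(E/F) = \{\iota, \sigma, \sigma^2, \ldots, \sigma^{10}\}$. The ideal $\mathfrak{p}\mathcal{O}_E \subset \mathcal{O}_E$ has prime
factorization of the form either
(i) $\mathfrak{p}\mathcal{O}_E = \mathfrak{P}$ prime, $N_{E/F}(\mathfrak{P}) = 2^{11}$, $\sigma(\mathfrak{P}) = \mathfrak{P}$; or
(ii) $\mathfrak{p}\mathcal{O}_E = \mathfrak{P}\,\sigma(\mathfrak{P})\,\sigma^2(\mathfrak{P}) \cdots \sigma^{10}(\mathfrak{P})$, $N_{E/F}(\sigma^i(\mathfrak{P})) = 2$. We do not require (or assume) that the eleven
prime factors are distinct.
Suppose that $\mathfrak{P} = \pi\mathcal{O}_E$ is principal. In case (i) this yields π ∈ F and $\mathfrak{p} = \mathfrak{P} \cap \mathcal{O}_F = \pi\mathcal{O}_F$, a principal
ideal in $\mathcal{O}_F$, a contradiction. In case (ii),
\[ \mathfrak{p}\mathcal{O}_E = \prod_{i=0}^{10} \sigma^i(\mathfrak{P}) = \prod_{i=0}^{10} \sigma^i(\pi)\,\mathcal{O}_E = N_{E/F}(\pi)\,\mathcal{O}_E \]
where $N_{E/F}(\pi) \in F$; but then $\mathfrak{p} = N_{E/F}(\pi)\mathcal{O}_E \cap \mathcal{O}_F = N_{E/F}(\pi)\mathcal{O}_F$ is principal, a contradiction.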
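The arithmetic facts about the norm form a² + ab + 6b² used in Example 4.12 can be confirmed with a short search. In this sketch (our own; the names `norm` and `solutions` are not from the notes) the search box is justified by the completed-square bound in the example: 4N = (2a + b)² + 23b², so N = 2 forces b = 0 and |a| ≤ 1, well inside the range searched.

```python
def norm(a, b):
    """Norm of a + b*theta, theta = (1 + sqrt(-23))/2, namely a^2 + ab + 6b^2."""
    return a * a + a * b + 6 * b * b

# Search a small box for elements of norm 2. By the bound above, any
# solution would have b = 0 and |a| <= 1, so this box is more than enough.
solutions = [(a, b) for a in range(-3, 4) for b in range(-3, 4) if norm(a, b) == 2]

theta_norm = norm(0, 1)            # theta * conjugate(theta)
identity = (-2) * 2 + theta_norm   # the relation 2 = (-2)*2 + theta*conj(theta)
print(solutions, theta_norm, identity)
```

The empty list of solutions reflects the fact that no element of the ring of integers of F has norm 2.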
Exercises 4.
1. Let $\zeta = \zeta_{2n}$, n > 3.
(a) Using the relation $2i\sin\frac{k\pi}{n} = \zeta^k - \zeta^{-k} = \zeta^k(1 - \zeta^{-2k})$, show that $\prod_{k=1}^{n-1} \sin\frac{k\pi}{n} = \frac{n}{2^{n-1}}$.
(b) Use (a) to show that $\sin\frac{k\pi}{n}$ cannot always be an algebraic integer.
(c) Show that none of the values $\sin\frac{k\pi}{7}$, k = 1, 2, …, 6, are algebraic integers.
2. In the notation of Example 4.12, consider the nonprincipal prime ideal $\mathfrak{p} = (2, \theta) \subset \mathcal{O}_F$. Show
that $\mathfrak{p}^3 \subset \mathcal{O}_F$ is principal. (Hint: Verify that $2 - \theta = 2^3 + 2^2\theta + \theta^3 \in \mathfrak{p}^3$. Argue that $(2 - \theta) \subseteq \mathfrak{p}^3$;
then compare norms on both sides to obtain equality.)
3. Recall that 1 − ζp ∈ Z[ζp ] is irreducible for p prime (Theorem 4.4). The condition that p is prime
is strictly necessary here. Show, for example, that if ζ = ζ15 , then 1 − ζ is a unit in Z[ζ]. State
and prove a generalization of this fact.
5. Fermat’s Last Theorem
Fermat’s Last Theorem (FLT) is, of course, the statement that the equation $x^n + y^n = z^n$
has no positive integer solutions for exponent n > 2. While interest in FLT has had
a profound impact on modern mathematics, we shall have nothing to say here about the
recognized proof of this theorem due to Wiles and others, since this involves a great many
topics beyond the scope of our course. Our purpose is rather to say something about the
role of cyclotomic fields in the earliest work on FLT. One might justifiably say that the
early development of the theory of cyclotomic fields was largely motivated by FLT; but by
the time we came to accept that other tools would be required to resolve FLT, the theory
and applications of cyclotomic fields had grown far beyond their original confines.
The role of cyclotomic fields in studying FLT is evident already in the smallest case
n = 3. Fermat claimed to have settled this case in his correspondence, although there is no
surviving copy of his proof. We presume it was along the lines of Euler’s later proof of 1770,
which is essentially the proof we give below. There is a surviving copy of Fermat’s proof
for the exponent n = 4, also using his ‘method of infinite descent’. While the case n = 3
is a little more involved, the case of prime exponent is more typical in its use of cyclotomic
fields; so it is this case we have chosen to highlight here.
Proof. After replacing the integer z by −z, the equation to be solved takes the more
symmetrical form $x^3 + y^3 + z^3 = 0$; here we may suppose there is a solution in nonzero
integers, and seek a contradiction. But it will be easier to prove a seemingly stronger
statement. Let $\mathcal{O} = \mathbb{Z}[\omega]$, the ring of Eisenstein integers, where $\omega = \zeta_3$. The ring $\mathcal{O}$ is
a UFD; see Corollary A3.16. Its group of units is the finite group of sixth roots of unity
$\mathcal{O}^\times = \{\pm 1, \pm\omega, \pm\omega^2\}$; see Example A3.8. Now we suppose that
and we seek a contradiction. The contradiction to which we are led, is that any solution
of (5.2) leads to a smaller solution of (5.2) (this is Fermat’s ‘method of descent’). Here
‘smaller’ is in terms of the norms; and since (absolute values of) norms lie in N, this leads
to an infinite descending sequence in N, hence a contradiction. The advantage of (5.2) over
the original equation, is that at the descent step, we have a stronger inductive hypothesis
available (as well as a stronger conclusion to fulfill). Put another way, we may further
suppose that our solution of (5.2) is as small as possible, and obtain from this an outright
contradiction. In particular,
otherwise π divides the third member of the triple and then dividing by the common
factor π, we would obtain a smaller nonzero solution of (5.2).
A typical element $z = a + b\omega \in \mathcal{O}$ (a, b ∈ Z) has norm $N(z) = z\bar{z} = a^2 - ab + b^2$. The
irreducible element $\varepsilon = 1 - \omega$ has norm N(ε) = 3; and the rational prime 3 ramifies in $\mathcal{O}$
as $(3) = (\varepsilon)^2$ (see Theorem 4.4). Indeed, $\varepsilon^2 = 1 - 2\omega + \omega^2 = -3\omega$ and $(\varepsilon^2) = (3)$ since −ω
is a unit.
Now it is profitable to consider possibilities for each of the variables x, y, z mod ε. We
have $\mathcal{O}/(\varepsilon) \cong \mathbb{F}_3$. This follows directly from the formula N((ε)) = |N(ε)| = 3; but one can
also note that every $x \in \mathcal{O}$ can be expressed uniquely as x = a + bε; so x ≡ a ≡ 0, 1 or
2 mod ε (recall that 3 ≡ 0 mod ε). If x = 1 + bε then
Now consider (5.2) mod 9 and apply (5.4) to see that x, y, z must be congruent to 0, 1, −1
mod ε (in some order). Moreover if z ≡ ±1 mod ε then we must have u = ±1; but in this
case we may exchange z with either x or y to get x ≡ 1 mod ε, y ≡ −1 mod ε, z ≡ 0 mod ε.
Now (5.4) gives $x^3 + y^3 \equiv 0 \bmod 9$, so $z^3 \equiv 0 \bmod 9$, i.e. $z^3 \equiv 0 \bmod \varepsilon^4$. Thus
(5.7) the factors $x + y$, $x + \omega y$, $x + \omega^2 y$ are, in some order, equal to $u_1\varepsilon\alpha_1^3$, $u_2\varepsilon\alpha_2^3$
and $u_3\varepsilon^{3k-2}\alpha_3^3$ where $u_i \in \mathcal{O}^\times$ and $\alpha_i \in \mathcal{O}$ are not divisible by ε.
Without loss of generality, it is the third factor $x + \omega^2 y$ that is divisible by $\varepsilon^{3k-2}$; otherwise
replace y by $\omega^j y$ for some j, thereby cycling the three factors while preserving the
conditions (5.5). Now $u_i \in \{\pm 1, \pm\omega, \pm\omega^2\}$; but without loss of generality, $u_i \in \langle\omega\rangle$, since
any ‘−’ signs can be absorbed into $\alpha_i$. Therefore we may assume
and since αi ≡ ±1 mod ε, (5.4) gives αi3 ≡ ±1 mod 9. Evidently α13 ≡ 1 and α23 ≡
−1 mod 9 (or we reverse the roles of α1 and α2 so that this is the case) and j2 ≡ j1−1 mod 3.
Now
and so
Thus $(\alpha_1, \alpha_2, \varepsilon^{k-1}\alpha_3)$ solves the same equation as (x, y, z) in (5.2). By the remarks
following (5.2), it suffices to show that the new solution is smaller in the sense of norm, than
the original solution; specifically, we show that
\[
\begin{aligned}
-uz^3 = x^3 + y^3 = (x + y)(x + \omega y)(x + \omega^2 y) &= (\omega^{j_1}\varepsilon\alpha_1^3)(\omega^{j_2}\varepsilon\alpha_2^3)(\omega^{j_3}\varepsilon^{3k-2}\alpha_3^3) \\
&= \omega^{j_1+j_2+j_3}\,\varepsilon^3\,(\alpha_1\alpha_2\varepsilon^{k-1}\alpha_3)^3.
\end{aligned}
\]
Taking norms of both sides in the extension Q[ω] ⊃ Q gives $N(z)^3 = \pm 27\,N(\alpha_1\alpha_2\varepsilon^{k-1}\alpha_3)^3$
and so
\[ 0 < 3\,|N(\alpha_1\alpha_2\varepsilon^{k-1}\alpha_3)| = |N(z)| \le |N(xyz)| \]
and (5.10) follows, completing the proof of Theorem 5.1.
Now consider the general case of FLT. Because Fermat himself proved his conjecture
for n = 4, clearly we can confine our attention to the case of prime exponent p > 3. So
(5.11) $x^p + y^p = \prod_{i=0}^{p-1} (x + \zeta^i y) = z^p$.
This relation clearly begs us to compare prime factors on both sides, expecting to find that
apart from units and powers of ε = 1 − ζ, the factors x + ζ i y are pairwise relatively prime
pth powers. This is the key idea in the proof of Theorem 1.1; and it figures prominently
also for the classification of primitive Pythagorean triples (the case of exponent n = 2).
This plan, despite its merits, runs into difficulty in Z[ζp ], where unique factorization does
not hold in general. Some famous early attempts to prove FLT (including some by the
best mathematicians of the 19th century) foundered precisely by assuming Q[ζ] to be a
UFD in general. It is natural to speculate that Fermat himself fell prey to this fallacy,
although presumably we will never know this. It was largely to repair this defect that the
concept of ‘ideal’ was introduced (so named by Ernst Kummer whose work on cyclotomic
fields, together with Sophie Germain’s contributions, led the progress toward FLT during
the 19th century).
Experience has also shown that it is profitable to approach (5.11) in two separate
cases: (i) none of x, y, z are divisible by p; or (ii) exactly one of x, y, z is divisible by p.
Tradition refers to these two cases as the first case and the second case of FLT. The
following result concerns the first case; for the counterpart of this result in the second
case, also using the theory of cyclotomic fields, see [Wa, Chapter 9]. Refer to Appendix A3
regarding the class number of an extension.
Theorem 5.12. Suppose that $p > 2$ is a prime for which the class number of $\mathbb{Z}[\zeta]$,
$\zeta = \zeta_p$, is not divisible by $p$. Then the equation $x^p + y^p = z^p$ has no solution in integers
$x, y, z$ relatively prime to $p$.
Proof. The cases $p = 3, 5$ are easily disposed of, even without Theorem 1.1. For $p = 3$,
we have $z^3 \equiv \pm 1 \bmod 9$ whenever $\gcd(z, 3) = 1$; and by this same fact, $x^3 + y^3 \equiv 0$ or
$\pm 2 \bmod 9$. So there are no solutions for $p = 3$ in the first case. Exactly the same reasoning
works for $p = 5$, by considering Fermat’s equation mod 25. Thus we may assume $p \ge 7$.
Let $O = \mathbb{Z}[\zeta]$, $\zeta = \zeta_p$. Recall that the element $\varepsilon = 1 - \zeta \in O$ is irreducible and $\mathbb{Z} \cap \varepsilon O = p\mathbb{Z}$;
see Theorem 4.4.
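The two small cases can also be checked by brute force. The following Python sketch (the function name is ours, not part of the notes) collects the $p$-th powers of units modulo $p^2$ and tests whether a sum of two of them can again be such a power.

```python
from math import gcd

def first_case_impossible_mod_p2(p):
    """True if x^p + y^p = z^p has no solution mod p^2 with x, y, z prime to p."""
    m = p * p
    # p-th powers of the units modulo p^2
    powers = {pow(z, p, m) for z in range(1, m) if gcd(z, p) == 1}
    # all sums of two such powers, reduced modulo p^2
    sums = {(a + b) % m for a in powers for b in powers}
    return powers.isdisjoint(sums)

print(sorted({pow(z, 3, 9) for z in range(1, 9) if gcd(z, 3) == 1}))   # cubes of units mod 9
print(first_case_impossible_mod_p2(3), first_case_impossible_mod_p2(5))
print(first_case_impossible_mod_p2(7))
```

The test succeeds for $p = 3$ and $p = 5$ but fails for $p = 7$ (for instance $1 + 18 \equiv 19 \bmod 49$, and $1, 18, 19$ are all seventh powers mod 49), which is consistent with the need for a genuinely different argument when $p \ge 7$.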
We claim that the ideals $(x + \zeta^i y) \subseteq O$ are pairwise relatively prime for $i = 0, 1, 2, \ldots, p-1$.
Suppose that on the contrary, $(x + \zeta^i y)$ and $(x + \zeta^j y)$ have a common prime factor
$P \subset O = \mathbb{Z}[\zeta]$, $0 \le i < j < p$. Then $(1 - \zeta^{j-i})y = \zeta^{-i}[(x + \zeta^i y) - (x + \zeta^j y)] \in P$; so by
Theorem 4.3, either $P = (1 - \zeta) = (\varepsilon)$ or $y \in P$. Similarly, $(1 - \zeta^{j-i})x = (x + \zeta^j y) - \zeta^{j-i}(x + \zeta^i y) \in P$,
so either $P = (\varepsilon)$ or $x \in P$. We cannot have both $x$ and $y$ in $P$, otherwise
$x, y$ are both divisible by the prime $p'$ satisfying $p'\mathbb{Z} = P \cap \mathbb{Z}$. Thus $P = (\varepsilon)$; and since $P$
contains $x + \zeta^i y$, which divides $z^p$ by (5.11), we get $z^p \in \varepsilon O \cap \mathbb{Z} = p\mathbb{Z}$, which means that $p \mid z$. This violates the
assumption that we are in the ‘first case’ of FLT. This proves our initial claim.
By considering prime ideal factors on both sides of (5.11), it follows easily that each
ideal $(x + \zeta^i y)$ is itself the $p$-th power of an ideal: $(x + \zeta^i y) = B_i^p$ for some ideal $B_i \subseteq O$.
Let $h$ be the class number of $O$; so we have a principal ideal $B_i^h = (a_i)$ for some $a_i \in O$.
By hypothesis $\gcd(p, h) = 1$, so there exist positive integers $k, m$ such that $mp = kh + 1$, and then
$(x + \zeta^i y)^m = B_i^{mp} = B_i^{kh+1} = (a_i)^k B_i$. This implies that the ideal $B_i \subseteq O$ is principal: $B_i = (\beta_i)$,
$\beta_i \in O$.
Now abbreviate $B = B_1$, $\beta = \beta_1$. We have $(x + \zeta y) = B^p = (\beta)^p$; so by Theorem 4.11,
we have $x + \zeta y = \zeta^k u \beta^p$ for some $k \in \{0, 1, 2, \ldots, p-1\}$ and a real unit $u \in \mathbb{Z}[\zeta + \zeta^{-1}]$. Also
writing $\beta = \sum_{i=0}^{p-2} b_i \zeta^i$, where $b_i \in \mathbb{Z}$, we have $\beta^p \equiv \sum_{i=0}^{p-2} (b_i \zeta^i)^p \equiv \sum_{i=0}^{p-2} b_i^p \equiv b \bmod p$
for some $b \in \mathbb{Z}$. Thus $x + \zeta y \equiv \zeta^k u b \bmod p$; and after complex conjugation, $x + \zeta^{-1} y \equiv \zeta^{-k} u b \bmod p$.
Since $u$ is a unit and $b$ an integer, we get
If the powers $1, \zeta, \zeta^{2k}, \zeta^{2k-1}$ are distinct, (5.13) gives a contradiction, as we now show.
Any $p - 1$ of the distinct powers $1, \zeta, \zeta^2, \ldots, \zeta^{p-1}$ form a basis for $\mathbb{Q}[\zeta]$ over $\mathbb{Q}$. Since
$p \ge 7$ (the case to which we reduced at the outset), we may choose such a basis containing
$1, \zeta, \zeta^{2k}, \zeta^{2k-1}$. Here we assume for the sake of argument that $1, \zeta, \zeta^{2k}, \zeta^{2k-1}$ are distinct
members of $\{1, \zeta, \zeta^2, \ldots, \zeta^{p-2}\}$; and if this is not the case, the choice of basis can be
adapted accordingly. Then by (5.13) we have
may therefore assume $x \not\equiv y \bmod p$ and move the third variable $z$ to the right side of
Fermat’s equation.
This completes the proof of Theorem 5.12.
A prime $p$ is called regular if the class number of $\mathbb{Z}[\zeta_p]$ is not divisible by $p$; otherwise
$p$ is irregular. The first few irregular primes are 37, 59, 67, 101, 103, etc.; these are the
primes for which the hypotheses of Theorem 5.12 fail. Reasonable heuristics, backed by
computational evidence, support the conjecture that a proportion $e^{-1/2} \approx 61\%$ of primes
are regular; so it might seem that the theory of cyclotomic fields solves FLT for ‘most’
prime exponents. However, it is not known that there are infinitely many regular primes.
(Curiously, the set of irregular primes is known to be infinite, despite being apparently
less dense than the sequence of regular primes.) Fortunately there is a test for regularity
of primes which is (at least conceptually) rather explicit.
The sequence of Bernoulli numbers B0 , B1 , B2 , B3 , . . . given by
$$1,\ -\tfrac{1}{2},\ \tfrac{1}{6},\ 0,\ -\tfrac{1}{30},\ 0,\ \tfrac{1}{42},\ \ldots$$
It is easily shown that $B_n = 0$ for odd integers $n \ge 3$; and that the nonzero Bernoulli
numbers alternate in sign. Among the many uses of Bernoulli numbers, we mention
(i) an exact formula for certain special values of the Riemann zeta function:
$$\zeta(2k) = (-1)^{k+1}\, \frac{2^{2k-1}\pi^{2k}}{(2k)!}\, B_{2k}, \qquad k = 1, 2, 3, \ldots;$$
(ii) a formula expressing the sum of the $k$th powers of the first $n$ positive integers as a
polynomial in $n$ of degree $k + 1$:
$$1^k + 2^k + 3^k + \cdots + n^k = \frac{1}{k+1} \sum_{i=0}^{k} (-1)^i \binom{k+1}{i} B_i\, n^{k+1-i}.$$
For example, 691, 3617 and 43867 (which are primes) must be irregular; and the first
irregular prime, 37, divides the numerator of
$$B_{32} = -\frac{7709321041217}{510} = -\frac{37 \cdot 683 \cdot 305065927}{2 \cdot 3 \cdot 5 \cdot 17}.$$
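The Bernoulli numbers and the claims above lend themselves to exact machine verification. The following Python sketch uses the standard recurrence $\sum_{i=0}^{n} \binom{n+1}{i} B_i = 0$ for $n \ge 1$ (with the convention $B_1 = -\frac12$), and it assumes Kummer's criterion in its usual form: $p$ is irregular iff $p$ divides the numerator of some $B_k$ with $k$ even, $2 \le k \le p-3$. The function names are ours.

```python
from fractions import Fraction
from math import comb

def bernoulli(n_max):
    """Return [B_0, ..., B_{n_max}] as exact fractions, with B_1 = -1/2."""
    B = [Fraction(1)]
    for n in range(1, n_max + 1):
        # recurrence: sum_{i=0}^{n} C(n+1, i) * B_i = 0
        B.append(-sum(comb(n + 1, i) * B[i] for i in range(n)) / (n + 1))
    return B

def power_sum(k, n, B):
    """1^k + 2^k + ... + n^k computed from formula (ii)."""
    total = sum((-1) ** i * comb(k + 1, i) * B[i] * n ** (k + 1 - i)
                for i in range(k + 1))
    return total / (k + 1)

def is_irregular(p, B):
    """Kummer's criterion (assumed): p divides a numerator of B_2, B_4, ..., B_{p-3}."""
    return any(B[k].numerator % p == 0 for k in range(2, p - 2, 2))

def irregular_primes(limit):
    """Irregular primes below `limit`, found by trial division plus Kummer's criterion."""
    B = bernoulli(limit)
    primes = [p for p in range(3, limit)
              if all(p % d for d in range(2, int(p ** 0.5) + 1))]
    return [p for p in primes if is_irregular(p, B)]

B = bernoulli(32)
print(B[32], B[32].numerator % 37 == 0)
print(power_sum(3, 10, B) == sum(j ** 3 for j in range(1, 11)))
print(irregular_primes(110))
```

Exact rational arithmetic matters here, since the numerators grow quickly; floating point would not detect divisibility by $p$.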
Exercises 5.
1. Find a positive integer $n$ for which the ring of cyclotomic integers $\mathbb{Z}[\zeta_n]$ contains nonzero solutions
of $x^3 + y^3 = z^3$.
We say that $p$ sharply divides $n$, denoted $p \,\|\, n$, if $p$ divides $n$, but $p^2$ does not divide $n$.
2. Let $p$ be an odd prime, and let $x$ and $y$ be relatively prime nonzero integers with $x + y \ne 0$.
(a) Show that $\gcd(x + y, (x^p + y^p)/(x + y)) = 1$ or $p$.
(b) If $p$ divides $x + y$, show that $p$ also divides $x^p + y^p$ and $p$ sharply divides $(x^p + y^p)/(x + y)$.
Hint: Let $u = x + y$. Simplify the expression $((u - y)^p + y^p)/u$ after first expanding $(u - y)^p$ by
the Binomial Theorem. At some point you may also want to recall Fermat’s Little Theorem.
3. Let $p$ be an odd prime; and let $x, y, z$ be pairwise relatively prime nonzero integers satisfying
$x^p + y^p + z^p = 0$. Recall that $p$ divides at most one of $x, y, z$.
(a) If $p \nmid z$, show that $x + y = a^p$ and $(x^p + y^p)/(x + y) = A^p$ for some integers $a, A$. (Hint: Use
#2.)
(b) In the first case of FLT, $p \nmid xyz$. Argue as in (a) to obtain $x + z = b^p$, $y + z = c^p$ and
$2x = a^p + b^p - c^p$.
(c) If $p \mid z$ then we are in the second case of FLT, and (a) fails. Show that in this case, we instead
obtain $x + y = p^{p-1} a^p$ and $(x^p + y^p)/(x + y) = pA^p$ for some integers $a, A$.
A pair of odd primes {p, q} such that q = 2p + 1 is a Sophie Germain pair of primes. It is
conjectured that there are infinitely many such pairs of primes.
4. Suppose we have a pair of Sophie Germain primes $\{p, q\}$, $q = 2p + 1$. As in the first case of FLT,
suppose that $x, y, z$ are pairwise relatively prime integers satisfying $x^p + y^p + z^p = 0$ with $p \nmid xyz$.
(a) If $q \nmid x$, show that $x^p \equiv \pm 1 \bmod q$. Hint: Theorem 3.4.
(b) Show that $q \mid xyz$. Hence without loss of generality, we assume $q \mid x$.
(c) Show that $q \mid abc$ where $a, b, c$ are as in #3.
(d) By considering the cases $q \mid a$, $q \mid b$ or $q \mid c$, obtain a contradiction.
This solved the first case of FLT for many primes, and conjecturally an infinite class of primes. This
breakthrough of Sophie Germain was later generalized by Legendre, enabling all primes p < 100 to
be dealt with. Subsequent work of others in the 20th century, all based on Sophie Germain’s idea,
was able to prove the first case of FLT for an infinite class of primes (yet without proving there are
infinitely many Sophie Germain primes).
notational issue that arises when describing group rings. In Section 7, where we focus
on group rings, we will discuss the natural accommodations for dealing with additive
groups; but for now, there is no need to be distracted by this.) A character of G is a
homomorphism $\chi : G \to \mathbb{C}^\times$; thus $\chi(xy) = \chi(x)\chi(y)$ for all $x, y \in G$. (So $\chi$ is what would
be called a linear character in the larger world of group theory.) Since $x^n = 1$ for all
$x \in G$ (this being a special case of Lagrange’s Theorem), $\chi(x)^n = \chi(x^n) = \chi(1) = 1$ and
so all values of $\chi$ are complex roots of unity. In fact, all values of $\chi$ are complex $m$-th
roots of unity, where $m$ is the exponent of $G$ (this being the smallest positive integer $m$
such that $x^m = 1$ for all $x \in G$). Note the exponent $m$ of $G$ divides $n = |G|$; and $m = n$
iff $G$ is cyclic.
If $\chi, \chi' : G \to \mathbb{C}^\times$ are characters, then the product $\chi\chi' : G \to \mathbb{C}^\times$ defined pointwise
by $(\chi\chi')(x) = \chi(x)\chi'(x)$ is clearly a character. The trivial character (or principal
character) is the constant map $G \to \{1\} \subset \mathbb{C}^\times$. The multiplicative inverse of a character
$\chi$ is its complex conjugate $\overline{\chi}(x) = \chi(x)^{-1}$ (since the multiplicative inverse of every complex
root of unity is its complex conjugate). We see that the set of characters of $G$ forms a
multiplicative group, which we call the dual group $\widehat{G}$.
$$\widehat{G_1} \times \widehat{G_2} \to \widehat{G_1 \times G_2}, \qquad (\chi_1, \chi_2) \mapsto \chi_1 \times \chi_2 \tag{6.2}$$
is a homomorphism. The kernel of (6.2) is trivial (for if $(\chi_1 \times \chi_2)(g_1, g_2) = \chi_1(g_1)\chi_2(g_2) = 1$
for all $(g_1, g_2) \in G_1 \times G_2$, restricting to $g_1 = 1$ or $g_2 = 1$ yields both $\chi_1 = 1$ and $\chi_2 = 1$, the
trivial character on $G_1$ and on $G_2$ respectively). Finally, the map (6.2) is surjective. (For
if $\chi \in \widehat{G_1 \times G_2}$, then restriction to the two factors gives homomorphisms $\chi_1(g_1) = \chi(g_1, 1)$
and $\chi_2(g_2) = \chi(1, g_2)$ and clearly $\chi_1 \times \chi_2 = \chi$.) So the map (6.2) is an isomorphism, and
(a) follows.
For (b), we use the Fundamental Theorem of Finite Abelian Groups: Every finite
abelian group is isomorphic to a direct product of finite cyclic groups. Thus it suffices to
consider the case G is finite cyclic; and then (b) will follow in the general case using (a)
for the inductive step.
6. Characters of Finite Abelian Groups
The isomorphism $G \xrightarrow{\ \cong\ } \widehat{G}$ is not canonical. For example if $G = \langle x \rangle$ is cyclic of
order 4, the dual group $\widehat{G} = \langle \chi_1 \rangle$ is also cyclic of order 4 where $\chi_1(x) = i = \sqrt{-1}$.
However there is no algebraic property distinguishing $i$ from $-i$ (the other primitive fourth
root of unity in $\mathbb{C}$). There are two isomorphisms $G \to \widehat{G}$, one mapping $x \mapsto \chi_1$ and the
other mapping $x \mapsto \overline{\chi_1}$; and there is no way to distinguish one of these as the ‘preferred’
isomorphism. The situation is quite like what we find in linear algebra: If $V$ is a finite-dimensional
vector space over a field $F$, then the dual space $V^*$ is a vector space over $F$
having the same dimension as $V$, so we must have $V^* \cong V$; however there is no canonical
choice of vector space isomorphism $V \xrightarrow{\ \cong\ } V^*$. Any choice of isomorphism requires that
we fix particular choices of ordered bases for $V$ and for $V^*$.
The group algebra C[G] is the set of all formal linear combinations of elements of
G with complex coefficients. This is a complex vector space of dimension n = |G| having
G as basis; but it is also a ring with multiplication defined as in G, as extended uniquely
to C[G] by the distributive law. To be explicit, let α, β ∈ C[G] be given by
$$\alpha = \sum_{g \in G} a_g\, g, \qquad \beta = \sum_{g \in G} b_g\, g$$
after substituting $(x, y) = (gh^{-1}, h)$. Thus it is natural to consider the convolution of
two functions $f_1, f_2 : G \to \mathbb{C}$; this is the function $f_1 * f_2 : G \to \mathbb{C}$ defined by
$$(f_1 * f_2)(g) = \sum_{h \in G} f_1(gh^{-1})\, f_2(h).$$
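This works fine for a general finite group $G$, although here we only consider the case $G$ is
abelian. Now if we define
$$\widehat{f} = \sum_{g \in G} f(g)\, g \in \mathbb{C}[G]$$
then the identity
$$\widehat{f_1}\, \widehat{f_2} = \widehat{f_1 * f_2}$$
holds in $\mathbb{C}[G]$.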
We now consider three complex inner product spaces of dimension $n = |G| = |\widehat{G}|$, as
follows: The space $L^2(G)$ consists of all functions $G \to \mathbb{C}$ with inner product
$$[f_1, f_2] = \sum_{g \in G} f_1(g)\, \overline{f_2(g)}.$$
$$[F_1, F_2] = \frac{1}{n} \sum_{\chi \in \widehat{G}} F_1(\chi)\, \overline{F_2(\chi)}.$$
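[Diagram: commutative triangle with vertices $L^2(G)$, $L^2(\widehat{G})$ and $\mathbb{C}[G]$, and edges $\mathcal{F} : L^2(G) \to L^2(\widehat{G})$, $\iota : L^2(G) \to \mathbb{C}[G]$ and $\widehat{\iota} : \mathbb{C}[G] \to L^2(\widehat{G})$]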
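where $\iota(f) = \widehat{f}$ as defined above. Each $g \in G$ yields a natural function $g^* : \widehat{G} \to \mathbb{C}$, namely
the evaluation $g^*(\chi) = \chi(g)$; and the map $\widehat{\iota} : G \to L^2(\widehat{G})$, $g \mapsto g^*$ has a unique linear
extension to $\widehat{\iota} : \mathbb{C}[G] \to L^2(\widehat{G})$. The Fourier transform is the map $\mathcal{F} = \widehat{\iota} \circ \iota : L^2(G) \to L^2(\widehat{G})$
defined as follows: Given $f : G \to \mathbb{C}$, the map $\mathcal{F}f : \widehat{G} \to \mathbb{C}$ is given by
$$(\mathcal{F}f)(\chi) = \sum_{g \in G} f(g)\chi(g).$$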
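All three of these vector space isomorphisms are in fact isometries. To see for example
that $\mathcal{F}$ is an isometry, let $f_1, f_2 : G \to \mathbb{C}$, so that
$$[\mathcal{F}f_1, \mathcal{F}f_2] = \frac{1}{n} \sum_{\chi \in \widehat{G}} (\mathcal{F}f_1)(\chi)\, \overline{(\mathcal{F}f_2)(\chi)} = \frac{1}{n} \sum_{\chi \in \widehat{G}} \sum_{g,h \in G} f_1(g)\chi(g)\, \overline{f_2(h)}\, \overline{\chi(h)}$$
$$= \frac{1}{n} \sum_{g,h \in G} f_1(g)\, \overline{f_2(h)}\, \delta_{g,h}\, n = \sum_{g \in G} f_1(g)\, \overline{f_2(g)} = [f_1, f_2]$$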
using Theorem 6.3(b). The identity $\widehat{f_1}\, \widehat{f_2} = \widehat{f_1 * f_2}$ shows that $\iota$ is not only an isometry,
but also an algebra isomorphism from $L^2(G)$ (under convolution) to $\mathbb{C}[G]$. All three edges
of our commutative triangle are algebra isomorphisms if we endow $L^2(\widehat{G})$ with pointwise
multiplication: for $F_1, F_2 : \widehat{G} \to \mathbb{C}$, $(F_1 F_2)(\chi) = F_1(\chi)F_2(\chi)$. Now given $f_1, f_2 : G \to \mathbb{C}$
and $\chi \in \widehat{G}$, we have
$$\mathcal{F}(f_1 * f_2)(\chi) = \sum_{g \in G} (f_1 * f_2)(g)\chi(g) = \sum_{g,h \in G} f_1(gh^{-1}) f_2(h)\chi(g) = \sum_{x,y \in G} f_1(x) f_2(y)\chi(xy)$$
$$= \sum_{x \in G} f_1(x)\chi(x) \sum_{y \in G} f_2(y)\chi(y) = (\mathcal{F}f_1)(\chi)\,(\mathcal{F}f_2)(\chi)$$
Theorem 6.4. The three algebras $L^2(G)$ (with convolution), $L^2(\widehat{G})$ (with pointwise
multiplication) and $\mathbb{C}[G]$ are isometrically isomorphic. In particular,
$$\mathbb{C}[G] \cong \mathbb{C}^n = \underbrace{\mathbb{C} \oplus \mathbb{C} \oplus \cdots \oplus \mathbb{C}}_{n \text{ times}}$$
where $\mathbb{C}^n$ has coordinatewise ring operations and the standard complex inner product.
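For a cyclic group both halves of this statement are easy to test numerically. The sketch below takes $G = \mathbb{Z}/n\mathbb{Z}$ in additive notation with characters $\chi_a(k) = \zeta_n^{ak}$, and checks that the Fourier transform turns convolution into pointwise multiplication and preserves inner products. We assume the inner products are conjugate-linear in the second argument, with the factor $\frac1n$ on $L^2(\widehat{G})$; all function names are ours.

```python
import cmath

def chi(a, n):
    """Character chi_a of Z/nZ: k -> zeta_n^(a*k), zeta_n = exp(2*pi*i/n)."""
    return lambda k: cmath.exp(2 * cmath.pi * 1j * a * k / n)

def fourier(f):
    """(Ff)(chi_a) = sum_k f(k) chi_a(k), returned as a list indexed by a."""
    n = len(f)
    return [sum(f[k] * chi(a, n)(k) for k in range(n)) for a in range(n)]

def convolve(f1, f2):
    """(f1 * f2)(g) = sum_h f1(g - h) f2(h) on Z/nZ."""
    n = len(f1)
    return [sum(f1[(g - h) % n] * f2[h] for h in range(n)) for g in range(n)]

def inner(f1, f2):
    """[f1, f2] = sum_g f1(g) conj(f2(g)) on L^2(G)."""
    return sum(a * complex(b).conjugate() for a, b in zip(f1, f2))

def inner_dual(F1, F2):
    """[F1, F2] = (1/n) sum_chi F1(chi) conj(F2(chi)) on L^2 of the dual group."""
    return sum(a * complex(b).conjugate() for a, b in zip(F1, F2)) / len(F1)

f1 = [1, 2, 0, -1, 3, 0]
f2 = [0, 1, 1, 0, 2, -2]
lhs = fourier(convolve(f1, f2))
rhs = [a * b for a, b in zip(fourier(f1), fourier(f2))]
print(max(abs(a - b) for a, b in zip(lhs, rhs)) < 1e-9)               # algebra homomorphism
print(abs(inner(f1, f2) - inner_dual(fourier(f1), fourier(f2))) < 1e-9)  # isometry
```

Both comparisons are made up to floating point tolerance, since the character values are only approximated in $\mathbb{C}$.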
Expositions of this topic vary widely between different sources, both in the extent of
generality as well as in presentation style, depending on the author’s tastes; and I have
chosen the approach which best suits our particular theme and goals. Two directions in
which the Fourier transform generalizes are to the case G is nonabelian, and the case G is
infinite. When $G$ is nonabelian, of course $\mathbb{C}[G] \not\cong \mathbb{C}^n$ since the group algebra $\mathbb{C}[G]$ is no
longer commutative; and instead one finds that $\mathbb{C}[G]$ is isomorphic to a direct sum of full
matrix algebras. This is the realm of representation theory; see e.g. [Is], [Se]. Much of
this theory carries over in the infinite case, with more evident ease and success when G is a
compact topological group, particularly a Lie group; or an algebraic group. See e.g. [BD].
The group algebra formulation C[G] is not really suitable in the infinite case, and so this
member of the triangle is understandably absent, where one makes do with L2 (G) (as
an algebra under convolution) instead. See Exercise #5 for the standard example with
$G = S^1$.
As an algebra, $\mathbb{C}^n$ has $2^n$ idempotent elements (i.e. elements satisfying $\varepsilon^2 = \varepsilon$);
these are the vectors having all components either 0 or 1. It also has exactly $n$ primitive
idempotents, these being the standard basis vectors of $\mathbb{C}^n$. (A primitive idempotent
is a nonzero idempotent which is not expressible as $\varepsilon = \varepsilon_1 + \varepsilon_2$ where $\varepsilon_1$ and $\varepsilon_2$ are
also nonzero idempotents.) Every idempotent is uniquely expressible as a sum of distinct
primitive idempotents. In view of the isomorphisms above, $\mathbb{C}[G]$ must also have a basis
consisting of $n$ primitive idempotents.
Theorem 6.5. The primitive idempotents of $\mathbb{C}[G]$ are the elements $\frac{1}{n}\widehat{\chi} = \frac{1}{n} \sum_{g \in G} \chi(g)\, g$
for $\chi \in \widehat{G}$.
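Proof. For $\chi, \psi \in \widehat{G}$ we have
$$\big(\tfrac{1}{n}\widehat{\chi}\big)\big(\tfrac{1}{n}\widehat{\psi}\big) = \frac{1}{n^2} \sum_{x,y \in G} \chi(x)\psi(y)\, xy = \frac{1}{n^2} \sum_{g,h \in G} \chi(gh^{-1})\psi(h)\, g = \frac{1}{n^2} \sum_{g \in G} \chi(g) \Big( \sum_{h \in G} \overline{\chi(h)}\,\psi(h) \Big) g$$
$$= \frac{1}{n^2} \sum_{g \in G} \chi(g)\, \delta_{\chi,\psi}\, n\, g = \delta_{\chi,\psi}\, \tfrac{1}{n}\widehat{\chi}$$
using Theorem 6.3(a). These relations uniquely characterize the $n$ primitive idempotents
of $\mathbb{C}[G] \cong \mathbb{C}^n$.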
Theorem 6.6. The $n$ eigenvalues of $\Gamma$ are the values $\chi(\sigma) = \sum_{s \in S} \chi(s)$ for $\chi \in \widehat{G}$.
We observe that these eigenvalues all lie in the ring $\mathbb{Z}[\zeta_m]$ where $m$ is the exponent of $G$
since, as we previously observed, all character values lie in this ring. Cayley graphs (and
digraphs) of nonabelian finite groups also have eigenvalues consisting of cyclotomic integers
since all character values lie in $\mathbb{Z}[\zeta_m]$ (although characters are defined somewhat differently,
and the eigenvalues are computed by a rather more subtle process). In the case $\Gamma$ is an
ordinary graph (i.e. $s^{-1} \in S$ whenever $s \in S$), it is not hard to see that Theorem 6.6 gives
real eigenvalues as expected: characters whose values extend outside $\{\pm 1\}$ occur in complex
conjugate pairs, so their net contribution to the sum in Theorem 6.6 yields real values. So
in the case of ordinary Cayley graphs, the eigenvalues of $\Gamma$ lie in $\mathbb{Z}[\zeta_m] \cap \mathbb{R} = \mathbb{Z}[\zeta_m + \zeta_m^{-1}]$.
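Proof of Theorem 6.6. We show that the primitive idempotents of $\mathbb{C}[G]$ form a basis
consisting of eigenvectors for $A$, with the indicated eigenvalues. In place of the primitive
idempotents $\frac{1}{n}\widehat{\chi}$, $\chi \in \widehat{G}$, we may of course use the scalar multiples $\widehat{\chi}$. For each $\chi \in \widehat{G}$,
$$A\widehat{\chi} = \sum_{s \in S} s \sum_{g \in G} \chi(g)\, g = \sum_{s \in S} \sum_{x \in G} \chi(s^{-1}x)\, x = \overline{\chi}(\sigma) \sum_{x \in G} \chi(x)\, x = \overline{\chi}(\sigma)\, \widehat{\chi}.$$
Since $\overline{\chi}$ ranges over $\widehat{G}$ along with $\chi$, these are exactly the values listed in Theorem 6.6.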
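The eigenvector computation can be replayed numerically for a cyclic group. In the sketch below (additive notation, $G = \mathbb{Z}/n\mathbb{Z}$; the function name and the sample connection sets are our own choices), multiplication by $\sigma = \sum_{s \in S} s$ is applied to the coefficient vector of each $\widehat{\chi_a}$, and the result is compared with the predicted scalar multiple.

```python
import cmath

def cayley_eigenvalues(n, S):
    """Eigenvalues of the Cayley digraph of Z/nZ with connection set S, one per character."""
    zeta = lambda e: cmath.exp(2 * cmath.pi * 1j * e / n)
    eigenvalues = []
    for a in range(n):
        chi_hat = [zeta(a * x) for x in range(n)]       # coefficients of chi_a-hat in C[G]
        lam = sum(zeta(-a * s) for s in S)              # conj(chi_a) evaluated at sigma
        # coefficient at x of sigma * chi_a-hat is sum over s of chi_a(x - s)
        prod = [sum(chi_hat[(x - s) % n] for s in S) for x in range(n)]
        assert all(abs(prod[x] - lam * chi_hat[x]) < 1e-9 for x in range(n))
        eigenvalues.append(lam)
    return eigenvalues

# the 5-cycle is the Cayley graph of Z/5Z with S = {1, -1}
print([round(ev.real, 4) for ev in cayley_eigenvalues(5, [1, 4])])
# a digraph that is not an ordinary graph: eigenvalues need not be real
print(cayley_eigenvalues(8, [1, 2, 5])[1])
```

For the 5-cycle the eigenvalues come out real, namely $2\cos(2\pi a/5)$, as expected for an ordinary graph.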
Error-Correcting Codes
A second application of group characters is to the theory of error-correcting codes. Let
$F = \mathbb{F}_q$, and fix $n \ge 1$. The (Hamming) weight of a vector $v = (v_1, v_2, \ldots, v_n) \in F^n$,
denoted by $\mathrm{wt}(v)$, is the number of nonzero coordinates in $v$; thus for example, the vector
$v = (1, 0, 1, 1, 1, 0) \in F^6$ has $\mathrm{wt}(v) = 4$. A linear $[n, k]$-code is a $k$-dimensional subspace $C \le F^n$.
Vectors in $F^n$ are words; and vectors in $C$ are codewords. The weight distribution
of the code $C$ is the sequence $A_0, A_1, A_2, \ldots, A_n$ where $A_i = |\{v \in C : \mathrm{wt}(v) = i\}|$. The
minimum weight of $C$ is the smallest $d \ge 1$ such that $A_d \ne 0$ (i.e. the smallest weight
of any nonzero codeword). By linearity, the minimum distance of $C$ (the minimum
number of coordinates in which two distinct codewords differ) coincides with the minimum
weight d.
Let us briefly summarize the key concepts of the theory of error-correcting codes
(although this does injustice to such an extensive subject!). Words of length $n$ are vectors
$v \in F^n$ whose coordinates are regarded as sequences of $n$ letter symbols from some finite
alphabet (in this case $F$) used in information exchange. We may view $C$ as the row space
of a $k \times n$ matrix $B$ of full rank over $F$. The matrix $B$ is the generator matrix of the code.
(It is usually denoted $G$; but we reserve ‘G’ for groups.) Now $C = \{uB : u \in F^k\}$ where
we regard each vector $u \in F^k$ as a plaintext message, and $uB \in C$ as its corresponding
codeword. The isomorphism $F^k \to C$ is the encoding process. The purpose of such
encoding is to protect against loss of information due to a limited number of errors during
transmission between two parties (a ‘transmitter’ T and a ‘receiver’ R) in a noisy channel.
Before transmitting the message $u \in F^k$, T first encodes it as $v = uB \in C$ and sends this
codeword. If R correctly receives $v$, all is well. If R instead receives $v' \in F^n$, a slightly
corrupted version of $v$, then R can hope to correctly deduce the original transmitted word
$v$ (and thereby $u$) if $v \in C$ is the unique codeword satisfying $\mathrm{wt}(v - v') \le e$ where $e$ is
sufficiently small. Indeed if $e = \lfloor \frac{d-1}{2} \rfloor$, then for every word $w \in F^n$, there is at most one
$v \in C$ satisfying $\mathrm{wt}(w - v) \le e$. In this case, $C$ is an $e$-error correcting code of length $n$.
The goal is to construct codes of a given length n over an alphabet of given size q, with
the number of codewords $q^k$ as large as possible (thereby ensuring a large information
rate), yet with minimum distance d as large as possible (thereby maximizing the error-
correcting capability e). These constraints, however, compete against each other; and a
great deal of mathematics is used to look for the optimal code for a given set of parameters
n, q, etc. There are other design considerations as well, which we have not mentioned.
In particular, why must the alphabet be a finite field $F$, and the code a subspace of $F^n$?
For some parameter sets there are in fact nonlinear codes (non-subspaces) which perform
slightly better than any linear code with the same parameters; but these are harder to
design and to work with. Moreover any code, in order to be of practical value, must admit
efficient algorithms for both encoding and decoding. Given these constraints, linear codes
are generally the best bet for information exchange.
In addition to the generator matrix $B$ introduced above, every $[n, k]$-code over $F$ can
also be defined as the null space of an $(n - k) \times n$ matrix $H$ over $F$:
$$C = \{v \in F^n : Hv^T = 0\}.$$
Here $H$ is the parity check matrix of $C$; it has full rank $n - k$. The parity check matrix
$H$ is useful for decoding: R uses it to check whether a word $v$ is a valid codeword. The
vector $Hw^T$ is the error syndrome of the word $w \in F^n$. If $Hw^T = 0$, then $w$ is a certified
codeword; if $Hw^T \ne 0$, then the syndrome may yield useful information in locating the
codeword closest to $w$.
Now the two matrices B and H play roles dual to each other: the row space of B is C,
while the row space of H is the dual code
Note that $wv^T$ is the usual ‘dot product’ of two row vectors $v, w \in F^n$. While $C$ is an
$[n, k]$-code with generator matrix $B$ and parity check matrix $H$, the dual code $C^\perp$ is an
$[n, n-k]$-code with generator matrix $H$ and parity check matrix $B$. There is also a relationship
between the weight distributions of these two codes (and in particular between their
minimum weights). Please note however that while $C$ and $C^\perp$ must have complementary
dimensions, they are not in general complementary subspaces; see Example 6.7 below.
The relationship between the weight distribution of a code and its dual (Theorem 6.8)
is expressed most naturally in terms of their weight enumerators. For a code C as above,
the weight enumerator of C is the polynomial
$$A_C(x, y) = \sum_{v \in C} x^{n - \mathrm{wt}(v)}\, y^{\mathrm{wt}(v)} = \sum_{d=0}^{n} A_d\, x^{n-d} y^d \in \mathbb{Z}[x, y].$$
Example 6.7: A Code and its Dual. Let $q = 2$, $F = \mathbb{F}_2$, and let $C < F^5$ be the $[5, 2]$-code
spanned by 10111 and 01110. The dual code $C^\perp$ is the $[5, 3]$-code spanned by 10001, 01011 and
00110. Since
$$C = \{00000, 10111, 01110, 11001\}, \qquad C^\perp = \{00000, 10001, 01011, 00110, 11010, 10111, 01101, 11100\}$$
we obtain $A_C(x, y) = x^5 + 2x^2y^3 + xy^4$, $A_{C^\perp}(x, y) = x^5 + 2x^3y^2 + 4x^2y^3 + xy^4$. Following Theorem 6.8,
we obtain either weight enumerator from the other via
$$\tfrac{1}{4} A_C(x+y, x-y) = A_{C^\perp}(x, y); \qquad \tfrac{1}{8} A_{C^\perp}(x+y, x-y) = A_C(x, y)$$
which can be verified by direct computation. The code $C$ has minimum distance 3, the best (largest)
possible for a $[5, 2]$-code over $\mathbb{F}_2$. It is 1-error correcting. Note that the word 10111 lies in $C \cap C^\perp$; so
although $C$ and $C^\perp$ have complementary dimensions, they are not complementary subspaces.
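The "direct computation" in Example 6.7 is small enough to automate. The following Python sketch (binary codes only; all names are ours) enumerates $C$ and $C^\perp$ by brute force, tabulates their weight distributions, and expands $\frac{1}{|C|} A_C(x+y, x-y)$ coefficient by coefficient, using the fact that the coefficient of $x^{n-j}y^j$ in $(x+y)^{n-d}(x-y)^d$ is $\sum_i (-1)^i \binom{d}{i}\binom{n-d}{j-i}$.

```python
from itertools import product
from math import comb

def span_f2(generators):
    """All F_2-linear combinations of the given binary words (tuples of 0/1)."""
    n = len(generators[0])
    words = set()
    for coeffs in product([0, 1], repeat=len(generators)):
        words.add(tuple(sum(c * g[i] for c, g in zip(coeffs, generators)) % 2
                        for i in range(n)))
    return words

def dual_f2(code, n):
    """All words of F_2^n orthogonal to every codeword."""
    return {w for w in product([0, 1], repeat=n)
            if all(sum(a * b for a, b in zip(w, v)) % 2 == 0 for v in code)}

def weight_distribution(code, n):
    """The list [A_0, A_1, ..., A_n]."""
    A = [0] * (n + 1)
    for v in code:
        A[sum(v)] += 1
    return A

def macwilliams_f2(A, n):
    """Coefficients of (1/|C|) * A_C(x+y, x-y), indexed by the degree in y."""
    size = sum(A)
    result = []
    for j in range(n + 1):
        total = sum(A[d] * sum((-1) ** i * comb(d, i) * comb(n - d, j - i)
                               for i in range(j + 1))
                    for d in range(n + 1))
        result.append(total // size)
    return result

word = lambda s: tuple(int(ch) for ch in s)
C = span_f2([word("10111"), word("01110")])
D = dual_f2(C, 5)
A_C, A_D = weight_distribution(C, 5), weight_distribution(D, 5)
print(A_C, A_D)
print(macwilliams_f2(A_C, 5) == A_D, macwilliams_f2(A_D, 5) == A_C)
print(word("10111") in C and word("10111") in D)
```

The two weight distributions agree with the enumerators stated in the example, and each is recovered from the other by the transform.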
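since the inner sum $\sum_{u \in C} \chi(uv^T) = 0$ whenever $v \notin C^\perp$ (for such vectors $v$, the dot
product $uv^T$ yields each value of $F$ the same number of times); whereas for $v \in C^\perp$, we get
a constant value $\chi(uv^T) = 1$ for all $q^k$ choices of $u \in C$. Now for $v = (v_1, v_2, \ldots, v_n) \in F^n$,
we have
$$\mathrm{wt}(v) = \mathrm{wt}(v_1) + \mathrm{wt}(v_2) + \cdots + \mathrm{wt}(v_n) \quad\text{where}\quad \mathrm{wt}(v_i) = \begin{cases} 1, & \text{if } v_i \ne 0; \\ 0, & \text{if } v_i = 0 \end{cases}$$
and so
$$g(u) = \sum_{v_1, v_2, \ldots, v_n \in F} \chi(u_1v_1 + u_2v_2 + \cdots + u_nv_n)\, x^{n - \mathrm{wt}(v_1) - \mathrm{wt}(v_2) - \cdots - \mathrm{wt}(v_n)}\, y^{\mathrm{wt}(v_1) + \mathrm{wt}(v_2) + \cdots + \mathrm{wt}(v_n)}$$
$$= \sum_{v_1, v_2, \ldots, v_n \in F} \chi(u_1v_1)\, x^{1 - \mathrm{wt}(v_1)} y^{\mathrm{wt}(v_1)}\; \chi(u_2v_2)\, x^{1 - \mathrm{wt}(v_2)} y^{\mathrm{wt}(v_2)} \cdots \chi(u_nv_n)\, x^{1 - \mathrm{wt}(v_n)} y^{\mathrm{wt}(v_n)}$$
$$= \prod_{i=1}^{n} \sum_{v_i \in F} \chi(u_iv_i)\, x^{1 - \mathrm{wt}(v_i)} y^{\mathrm{wt}(v_i)}.$$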
Let $G = \mathbb{Z}/n\mathbb{Z}$ (note: additive notation here). The dual group is $\widehat{G} = \{\chi_j : j \in G\}$
where $\chi_j(k) = \zeta_n^{jk}$. The Fourier transform of an arbitrary function $f : G \to \mathbb{C}$ is $F = \mathcal{F}f$
where we abbreviate $F(\chi_j)$ by $F(j)$ to get
$$F(j) = \sum_{k \in G} f(k)\chi_j(k) = \sum_{k=0}^{n-1} \zeta_n^{jk} f(k).$$
$$H_n := \big[\, \zeta_n^{jk} : 0 \le j, k < n \,\big]$$
which goes by many names: Fourier matrix, character table of the cyclic group of order n,
generalized Sylvester matrix, etc. Of course it is also a particular Vandermonde matrix.
And we will meet this matrix in Section 16 as the most classical construction of complex
Hadamard matrix; hence our notation $H_n$ for this matrix. A naive computer implementation
of the Fourier transform entails multiplying $H_n$ by a column vector in $\mathbb{C}^n$. This
requires $n^2$ product operations in $\mathbb{C}$, plus several addition operations in $\mathbb{C}$. Scalar addition
is much faster than scalar multiplication, so for simplicity we neglect additions in saying
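Assume for the moment that $G$ has even order, say $n = 2m$. To improve upon (6.9),
represent $f$ and $F = \mathcal{F}f$ by column vectors as
$$f \leftrightarrow \begin{bmatrix} f_{\mathrm{even}} \\ f_{\mathrm{odd}} \end{bmatrix} \quad\text{where}\quad f_{\mathrm{even}} = \begin{bmatrix} f(0) \\ f(2) \\ f(4) \\ \vdots \\ f(n-2) \end{bmatrix}, \quad f_{\mathrm{odd}} = \begin{bmatrix} f(1) \\ f(3) \\ f(5) \\ \vdots \\ f(n-1) \end{bmatrix};$$
and dually,
$$F \leftrightarrow \begin{bmatrix} F_{\mathrm{top}} \\ F_{\mathrm{bottom}} \end{bmatrix} \quad\text{where}\quad F_{\mathrm{top}} = \begin{bmatrix} F(0) \\ F(1) \\ F(2) \\ \vdots \\ F(m-1) \end{bmatrix}, \quad F_{\mathrm{bottom}} = \begin{bmatrix} F(m) \\ F(m+1) \\ F(m+2) \\ \vdots \\ F(2m-1) \end{bmatrix}.$$
Note that we have indexed entries of $F$ in the usual way; but the coordinates of $f$ have
been indexed differently, starting with the even coordinates (expressing the restriction of
$f$ to the subgroup of $G$ of index 2), followed by the odd coordinates (where $f$ is restricted
to the other coset of that subgroup). The reason we say ‘dually’, and why this is the right
thing to do, is that the vectors $F_{\mathrm{top}}$ and $F_{\mathrm{bottom}}$ list the values of $F$ on cosets of the subgroup
$\langle \chi_m \rangle$ of order 2 in $\widehat{G}$. With respect to this indexing of the rows and columns, the Fourier
transform takes the form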
where $H_m = \big[\, \zeta_m^{jk} : j, k \in \mathbb{Z}/m\mathbb{Z} \,\big]$ and $D_m = \mathrm{diag}(1, \zeta_n, \zeta_n^2, \ldots, \zeta_n^{m-1})$, with $\zeta_n^2 = \zeta_m$. Computing
$H_m f_{\mathrm{even}}$ requires only $m^2$ scalar multiplications, as does $H_m f_{\mathrm{odd}}$; and then left-multiplication
by $D_m$ requires an additional $m$ scalar multiplications. Once again, the faster operations of
scalar addition have been neglected here. Thus
Note that the improvement from (6.9) to (6.11) is a reduction in execution time, almost by
a factor of two. Similar gains are found using an arbitrary small prime divisor $p \mid n$ in place
of the prime 2. Now note that the main step in implementing (6.10) is the application of
$H_m$, which can be similarly reduced to $H_{m'}$ where $m = 2m'$, or $m = pm'$ using another
small prime p dividing m, to obtain further speedup. Assuming the original n is a product
of small primes, iterating this reduction significantly improves execution time. Notably,
This is the idea of the FFT. Its applications are far too ubiquitous to be summarized here.
We content ourselves with describing two of the many applications of FFT.
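To make the splitting step concrete, here is a Python sketch of a single level of the reduction for $n = 2m$. It is derived directly from the definition $F(j) = \sum_k \zeta_n^{jk} f(k)$: separating even and odd $k$ gives $F_{\mathrm{top}} = H_m f_{\mathrm{even}} + D H_m f_{\mathrm{odd}}$ and $F_{\mathrm{bottom}} = H_m f_{\mathrm{even}} - D H_m f_{\mathrm{odd}}$ with $D = \mathrm{diag}(\zeta_n^j : 0 \le j < m)$. We assume this matches the block form intended in the text; the function names are ours.

```python
import cmath

def dft_naive(f):
    """F(j) = sum_k zeta_n^(jk) f(k), computed with n^2 multiplications."""
    n = len(f)
    return [sum(cmath.exp(2 * cmath.pi * 1j * j * k / n) * f[k] for k in range(n))
            for j in range(n)]

def dft_split_once(f):
    """The same transform for even n = 2m, via two transforms of size m."""
    n = len(f)
    m = n // 2
    even = dft_naive(f[0::2])                 # H_m f_even
    odd = dft_naive(f[1::2])                  # H_m f_odd
    # multiply the odd part by the diagonal matrix diag(zeta_n^j), j = 0..m-1
    twiddled = [cmath.exp(2 * cmath.pi * 1j * j / n) * odd[j] for j in range(m)]
    top = [even[j] + twiddled[j] for j in range(m)]
    bottom = [even[j] - twiddled[j] for j in range(m)]
    return top + bottom

f = [1, 2, 0, -1, 3, 0, 2, 5]
print(max(abs(a - b) for a, b in zip(dft_naive(f), dft_split_once(f))) < 1e-9)
```

Replacing the two inner calls to `dft_naive` by recursive calls is what turns this single step into the full FFT when $n$ is a power of 2.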
in $\mathbb{C}[G]$ (reduced mod $(x^n - 1)$) is the same as the answer in $\mathbb{C}[x]$. The algorithm to
compute this product, improving upon the naive approach, is as follows.
(I) First compute $\widehat{f(x)} \in L^2(\widehat{G})$. This is the Fourier transform of the sequence of
coefficients in $f(x)$, requiring $O(n \log n)$ operations using FFT. Similarly compute
$\widehat{g(x)} \in L^2(\widehat{G})$, which also requires $O(n \log n)$ operations.
(II) Multiply to obtain $\widehat{f(x)}\,\widehat{g(x)}$ in $L^2(\widehat{G}) \cong \mathbb{C}^n$. This requires $O(n)$ scalar multiplications
in $\mathbb{C}$. Note that $\widehat{f(x)}\,\widehat{g(x)} = \widehat{f(x)g(x)}$.
[Diagram: the commutative triangle relating $L^2(G)$, $L^2(\widehat{G})$ and $\mathbb{C}[G]$ via the maps $\mathcal{F}$, $\iota$ and $\widehat{\iota}$, shown alongside the corresponding elements: the coefficient sequence $a$, its transform $\widehat{f(x)}$, and the polynomial $f(x)$.]
We have described the final step (III) above as a Fourier transform. It is actually the
inverse of the Fourier transform $\mathcal{F} : L^2(G) \to L^2(\widehat{G})$, given by $\mathcal{F}^{-1} : L^2(\widehat{G}) \to L^2(G)$; but
since $\widehat{\widehat{G}} \cong G$ (canonically), this is just the usual Fourier transform for the dual group $\widehat{G}$, up to the scalar factor $\frac{1}{n}$.
The matrix expressing $\mathcal{F}^{-1}$ as a linear transformation is $H_n^{-1} = \frac{1}{n} H_n^* = \frac{1}{n} \overline{H_n}$, since $H_n$ is
symmetric and $\frac{1}{\sqrt{n}} H_n$ is unitary. Of course this is also a DFT; so it is efficiently computed using
FFT (but for the dual group, so its coefficients are the complex conjugates of those in the
FFT of step (I)).
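The whole procedure fits in a few lines of Python. The sketch below is an illustration under our own assumptions: it uses a recursive radix-2 FFT (so $n$ is taken to be a power of 2, padded large enough that reduction mod $x^n - 1$ does not alter the product), floating point complex arithmetic, and a final rounding step that is only appropriate for integer coefficients of moderate size.

```python
import cmath

def fft(f, sign=1):
    """Radix-2 transform F(j) = sum_k exp(sign*2*pi*i*j*k/n) f(k); len(f) a power of 2."""
    n = len(f)
    if n == 1:
        return list(f)
    even, odd = fft(f[0::2], sign), fft(f[1::2], sign)
    out = [0] * n
    for j in range(n // 2):
        t = cmath.exp(sign * 2 * cmath.pi * 1j * j / n) * odd[j]
        out[j] = even[j] + t
        out[j + n // 2] = even[j] - t
    return out

def multiply(f, g):
    """Product of two integer polynomials given as coefficient lists (constant term first)."""
    size = len(f) + len(g) - 1
    n = 1
    while n < size:
        n *= 2                                   # pad: no wraparound mod x^n - 1
    F = fft(list(f) + [0] * (n - len(f)))        # step (I)
    G = fft(list(g) + [0] * (n - len(g)))
    H = [a * b for a, b in zip(F, G)]            # step (II): pointwise product
    h = fft(H, sign=-1)                          # step (III): conjugate transform,
    return [round((c / n).real) for c in h[:size]]   # then divide by n

print(multiply([1, 2, 3], [4, 5]))   # (1 + 2x + 3x^2)(4 + 5x)
```

For very large coefficients the rounding step becomes unreliable, which is one motivation for exact alternatives to arithmetic in $\mathbb{C}$.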
Exercises 6.
1. (a) Prove that for any finite group $G$, convolution of functions $G \to \mathbb{C}$ is associative; that is,
$(f_1 * f_2) * f_3 = f_1 * (f_2 * f_3)$ for all functions $f_1, f_2, f_3 : G \to \mathbb{C}$. Can you obtain this result using
known properties of the group algebra $\mathbb{C}[G]$? Explain.
(b) Is the set of functions G → C under convolution a group? Explain.
2. (a) Give an example of two finite abelian groups $G_1$ and $G_2$ of the same order, yet with $G_1 \not\cong G_2$.
(b) If G1 and G2 are as in (a), are the corresponding group algebras C[G1 ] and C[G2 ] isomorphic?
Explain.
3. Let $F = \mathbb{F}_3$, and consider the linear $[5, 2]$-code $C$ spanned by the vectors 10012 and 01211.
(a) Find a basis for the dual code $C^\perp$.
(b) Explicitly list all vectors in $C$ and in $C^\perp$.
(c) From (b), write down the explicit weight enumerators for $C$ and for $C^\perp$.
(d) By direct computation, verify that this example satisfies the MacWilliams relation (Theo-
rem 6.8).
4. Find an ordinary graph Γ (undirected, with no loops or multiple edges), as small as possible,
whose eigenvalues are not cyclotomic integers. Justify your answer.
Much of Section 6 generalizes to infinite groups; but this works best when G is a compact
topological group. For a group G to be a topological group, one requires that G also have
a topology compatible with the algebraic structure in the sense that the group multiplication
$(g, h) \mapsto gh$ and the inverse map $g \mapsto g^{-1}$ are continuous. Compactness and commutativity
together mean that we have a translation-invariant measure on $G$ (Haar measure) with respect
to which we can integrate. Here we give only the group $G = S^1$ as an example; and we stop short
of providing a full account of the appropriate generalization of Theorem 6.4 to this situation.
7. Group Rings R[G]
5. Let $G$ be the multiplicative group consisting of all $z \in \mathbb{C}$ such that $|z| = 1$. This is not a finite
group, but it is abelian. As a topological space, $G$ is homeomorphic to a circle (and in particular,
$G$ is compact). For $f_1, f_2 : G \to \mathbb{C}$, define
$$[f_1, f_2] = \int_G f_1(z)\, \overline{f_2(z)}\, \frac{dz}{2\pi i z} = \frac{1}{2\pi} \int_0^{2\pi} f_1(e^{ti})\, \overline{f_2(e^{ti})}\, dt.$$
The complex vector space $L^2(G)$ consists of all integrable functions $G \to \mathbb{C}$ having finite norm
$\|f\| = \sqrt{[f, f]}$, but with two functions identified whenever they disagree on a set of measure zero.
The convolution of two such functions is defined by
$$(f_1 * f_2)(w) = \int_{z \in G} f_1(wz^{-1})\, f_2(z)\, \frac{dz}{2\pi i z} = \frac{1}{2\pi} \int_0^{2\pi} f_1(we^{-ti})\, f_2(e^{ti})\, dt.$$
We have $\|f_1 * f_2\| < \infty$ whenever $\|f_i\| < \infty$; so the space $L^2(G)$ is closed under convolution (you
may assume this).
(a) Show that convolution is associative, i.e. $(f_1 * f_2) * f_3 = f_1 * (f_2 * f_3)$ for all $f_1, f_2, f_3 \in L^2(G)$.
(b) Find an infinite cyclic group $\{\chi_n : n \in \mathbb{Z}\}$ of homomorphisms $\chi_n : G \to \mathbb{C}^\times$. (As homomorphisms
of topological groups, the maps $\chi_n : G \to \mathbb{C}^\times$ are required to be continuous as well
as multiplicative.)
Section 6 includes a description of the group algebra C[G] of a finite abelian group G over
the complex numbers. Replacing the coefficient ring C by another choice of commutative
ring R with identity, one similarly obtains the group ring R[G] of G over R (or in the
case R is actually a field, the group algebra over R). As before, the group G is assumed
to be multiplicative. Despite conceptual convenience of complex number coefficients, C
suffers from some difficulties not evident with other rings. In particular, computer imple-
mentation of arithmetic in C[G] is fraught with difficulty due to numerical errors inherent
in floating point approximation; whereas arithmetic in Z[G], or even in Q[G], can be im-
plemented exactly if arbitrary precision arithmetic with coefficients is available—a realistic
expectation in many programming languages. For many applications, this is a serious con-
sideration. Another advantage of varying the coefficient ring R will appear below (see the
comments following Corollary 7.3). For our purposes, taking R to be an integral domain
(i.e. a commutative ring with identity having no zero divisors) is a quite adequate level of
generality.
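As a small illustration of exact arithmetic with rational coefficients, the following Python sketch models $\mathbb{Q}[G]$ for a cyclic group $G = \langle g \rangle$ of order $n$: an element is a list whose $k$-th entry is the coefficient of $g^k$, and coefficients are exact fractions. The representation and the function name are our own choices, not notation from the text.

```python
from fractions import Fraction

def multiply(alpha, beta):
    """Product in Q[G], G cyclic of order n; alpha[k] is the coefficient of g^k."""
    n = len(alpha)
    gamma = [Fraction(0)] * n
    for i, a in enumerate(alpha):
        for j, b in enumerate(beta):
            # g^i * g^j = g^((i+j) mod n)
            gamma[(i + j) % n] += Fraction(a) * Fraction(b)
    return gamma

n = 6
e = [Fraction(1, n)] * n            # (1/n) * (sum of all group elements)
print(multiply(e, e) == e)          # idempotent, verified exactly

# the same check in floating point is at the mercy of rounding
e_float = [1 / n] * n
e_float_sq = [sum(e_float[(k - j) % n] * e_float[j] for j in range(n)) for k in range(n)]
print(e_float_sq == e_float)
```

The exact comparison is a genuine equality test in $\mathbb{Q}[G]$; the floating point comparison may or may not succeed depending on rounding, which is the difficulty referred to above.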
Every character $\chi \in \widehat{G}$ naturally extends to a homomorphism of algebras over $\mathbb{Q}$ given
by
$$\chi : \mathbb{Q}[G] \to \mathbb{Q}[\zeta_m], \qquad \sum_{g \in G} a_g\, g \mapsto \sum_{g \in G} a_g\, \chi(g)$$
where m is the exponent of G (the least common multiple of the orders of the elements
of G). It is easy to verify the required properties
from the definitions. Now we must be wary when using the same letter $\chi$ to denote both
a group homomorphism $G \to \mathbb{C}^\times$ and an algebra homomorphism $\mathbb{Q}[G] \to \mathbb{C}$; in particular
these two maps have rather different kernels as given by
But these two kernels are related; in particular for $g \in G$, we have $g \in \ker(\chi : G \to \mathbb{C}^\times)$
iff $g - 1 \in \ker(\chi : \mathbb{Q}[G] \to \mathbb{C})$. Rather than introduce a new letter for the algebra
homomorphism, we shall try to clarify using context whenever the algebra homomorphism
is intended; and whenever we write simply $\ker \chi$, we mean the kernel of $\chi : G \to \mathbb{C}^\times$.
\[ \mathbb{Q}[G] \cong \mathbb{Q}[x]/(x^n - 1) \cong \bigoplus_{d \mid n} \mathbb{Q}[x]/(\Phi_d(x)) \cong \bigoplus_{d \mid n} \mathbb{Q}[\zeta_d], \tag{7.1} \]
χ then lifts to an algebra homomorphism Q[x] → Q[ζ_n] whose kernel contains the ideal
(x^n − 1). Since the image of this map is evidently a subfield of Q[ζ_n], this image is Q[ζ_d] for
some d | n. Now the kernel of the homomorphism Q[x] → Q[ζ_d] induced by χ is the principal
ideal (Φ_d(x)) ⊂ Q[x]. This means that the values of χ generate the cyclotomic extension
Q[ζ_d]. Under the isomorphism Q[G] ≅ Q[x]/(x^n − 1) above, the monomial x corresponds to
a generator g of G; and then an arbitrary element g^k ∈ G lies in ker χ = ker(χ : G → C×)
iff g^k − 1 lies in the kernel of χ : Q[x] → Q[ζ_n], iff x^k − 1 is divisible by Φ_d(x), iff d | k.
This shows that [G : ker χ] = d. We obtain
Note that the parameter d | n uniquely characterizes the kernel of the algebra homomorphism
χ : Q[G] → C, as well as the kernel (and the order) of χ : G → C×; however it does
not uniquely characterize the image Q[ζ_d] since for d odd, Q[ζ_d] = Q[ζ_{2d}].
Corollary 7.3. Let G be a finite abelian group, and let α, β ∈ Q[G]. Then
(a) α = β iff χ(α) = χ(β) for all χ ∈ Ĝ.
(b) Suppose G is cyclic of order n; and for each k | n, consider the character χ_k ∈ Ĝ
of order d = n/k. Then α = β iff χ_k(α) = χ_k(β) for all k | n.
7. Group Rings R[G]
The set X = {χ_k : k | n} has cardinality |X| = σ(n), where σ(n) is the number of positive
integer divisors of n; see Exercise #1.5. Note that σ(n) is generally quite small compared
with n. By comparison, given two elements α, β ∈ C[G] where G is cyclic of order n, we
have α = β iff χ(α) = χ(β) for all n characters χ ∈ Ĝ. No fewer than all n characters
will suffice for this purpose. Recall the algebra isomorphism C[G] ≅ C^n; and note that
each algebra homomorphism χ : C[G] → C, being C-linear, has an (n−1)-dimensional
subspace as its kernel. The intersection of all these kernels is {0}; but for any proper
subset X ⊂ Ĝ of size |X| = k < n, the subspace ∩_{χ∈X} ker(χ : C[G] → C) ⊆ C[G] has
dimension ≥ n − k ≥ 1; it therefore contains α ≠ 0 satisfying χ(α) = χ(0) = 0 for all
χ ∈ X. Another way to say this is that over C, the analogue of (7.1) is the algebra
isomorphism
\[ \mathbb{C}[G] \xrightarrow{\;\cong\;} \mathbb{C}^n, \qquad \alpha \mapsto \big( \chi(\alpha) : \chi \in \widehat{G} \big). \]
Each equation χ(α) = χ(β) says that the two vectors in C^n (corresponding to α, β ∈ C[G])
agree in one coordinate; but to guarantee equality of the two vectors, one must compare
all n coordinates.
In practical implementations of Corollary 7.3 for the purpose of checking for equality
of two elements of Q[G], it should be remembered (as previously observed) that each of
the equalities χ_k(α) = χ_k(β) can be checked exactly using arbitrary precision arithmetic,
in the ring Q[G] ≅ Q[x]/(x^n − 1). Nevertheless, floating point arithmetic is typically much
easier to implement and requires less execution time. So when testing a large number of
pairs (α, β) in Q[G] as candidates for equality, most of the cases can be ruled out quickly
using floating point arithmetic; only in those cases where the numerical values agree to
within a well-chosen tolerance need exact arithmetic be used for a final check of equality.
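The two-stage strategy just described can be sketched as follows. This is only an illustration under stated assumptions: G is cyclic of order n, an element of Q[G] is stored as a list of n rational coefficients, and the function names (`char_value`, `passes_screen`, `equal`) and the tolerance are hypothetical choices, not part of the notes.

```python
import cmath
from fractions import Fraction

def divisors(n):
    """Positive divisors of n."""
    return [d for d in range(1, n + 1) if n % d == 0]

def char_value(alpha, k):
    """Floating point value of chi_k(alpha), where chi_k(x) = exp(2*pi*i*k/n)."""
    n = len(alpha)
    return sum(float(a) * cmath.exp(2j * cmath.pi * k * e / n)
               for e, a in enumerate(alpha))

def passes_screen(alpha, beta, tol=1e-9):
    """Cheap filter: compare chi_k(alpha) and chi_k(beta) for each divisor k of n."""
    n = len(alpha)
    return all(abs(char_value(alpha, k) - char_value(beta, k)) <= tol
               for k in divisors(n))

def equal(alpha, beta, tol=1e-9):
    """Floating point screen first; exact rational comparison only for survivors."""
    if len(alpha) != len(beta) or not passes_screen(alpha, beta, tol):
        return False
    return all(Fraction(a) == Fraction(b) for a, b in zip(alpha, beta))

alpha = [Fraction(1, 2), 0, 3, 0, 0, 1]
print(equal(alpha, list(alpha)))
print(equal(alpha, [Fraction(1, 2), 0, 3, 0, 1, 0]))
```

The tolerance must be chosen with the size of the coefficients in mind; too small a value could wrongly reject equal elements because of rounding.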
Proof of Corollary 7.3. By Theorem 7.2, the isomorphism Q[G] → ⊕_{d|n} Q[ζ_d] of (7.1)
is explicitly given by α ↦ (χ_d(α) : d | n). The fact that this map is injective is the desired
conclusion.
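x^g x^h = x^{g+h} for g, h ∈ G.
of X. Now the group algebra of X (or of G) over R has addition and multiplication defined
by
\[ \sum_{g \in G} a_g x^g + \sum_{g \in G} b_g x^g = \sum_{g \in G} (a_g + b_g) x^g; \qquad \Big( \sum_{g \in G} a_g x^g \Big) \Big( \sum_{g \in G} b_g x^g \Big) = \sum_{g \in G} \Big( \sum_{h \in G} a_{g-h} b_h \Big) x^g. \]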
The group ring R[X] of X over R works just like before; and we will often refer to this
group ring as simply R[G], implicitly invoking the isomorphism X ≅ G, as this is merely
a notational device.
Example 7.4: The Group Algebra R[Z]. The infinite additive cyclic group G = Z is rewritten
multiplicatively as X = {x^k : k ∈ Z}. The group ring (with real coefficients) takes the form
R[G] = R[X] = R[x, x^{−1}] which is the ring of Laurent polynomials with real coefficients. It
consists of all polynomials in x and x^{−1} (note: only finitely many terms, but exponents may be
positive, negative or zero). This ring is of course an algebra over R of infinite dimension, with X as
basis.
Example 7.5: The Group Algebra Q[Z/nZ]. The finite additive cyclic group G = Z/nZ =
{0, 1, 2, …, n−1} of order n is isomorphic to the finite multiplicative cyclic group
X = {1, x, x^2, …, x^{n−1}} where the generator x has order n. The group ring (with rational
coefficients) is Q[G] = {a_0 + a_1 x + ··· + a_{n−1} x^{n−1} : a_i ∈ Q} ≅ Q[x]/(x^n − 1), an algebra of dimension n.
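As a small illustration of Example 7.5, multiplication in Q[x]/(x^n − 1) is a cyclic convolution of coefficient lists, and it can be carried out exactly with rational arithmetic. The sketch below assumes an element a_0 + a_1 x + ··· + a_{n−1} x^{n−1} is stored as the list [a_0, …, a_{n−1}]; the function name `multiply` is a hypothetical choice.

```python
from fractions import Fraction

def multiply(alpha, beta):
    """Product in Q[x]/(x^n - 1): exponents are added modulo n."""
    n = len(alpha)
    product = [Fraction(0)] * n
    for i, a in enumerate(alpha):
        for j, b in enumerate(beta):
            product[(i + j) % n] += Fraction(a) * Fraction(b)
    return product

# In Q[Z/3Z]: (1 + x)(1 + x + x^2) and x^2 * x^2
print(multiply([1, 1, 0], [1, 1, 1]))
print(multiply([0, 0, 1], [0, 0, 1]))
```

The quadratic-time double loop is chosen for clarity; fast multiplication algorithms would replace it in serious use.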
Direct Products
Let G1 and G2 be multiplicative groups of order n1 and n2 respectively. The direct product
G1 × G2 is the group of order n1 n2 whose elements are ordered pairs (g1 , g2 ), gi ∈ Gi .
Multiplication is componentwise, viz. (g1, g2)(g1′, g2′) = (g1 g1′, g2 g2′). We will identify G1
and G2 with the corresponding subgroups G1 × {1} and {1} × G2 of G1 × G2 via the
embeddings g1 7→ (g1 , 1) and g2 7→ (1, g2 ). (There is no harm or ambiguity in these
identifications unless G1 and G2 contain nonidentity elements sharing the same symbols;
but then we simply replace G1 or G2 by an isomorphic copy on a new set of symbols to
avoid the ambiguity.) Now the embeddings G1 , G2 ⊆ G1 × G2 give rise to subalgebras
R[G1] and R[G2] embedded in the group algebra R[G1 × G2]; and the set of products of the
form α1 α2, αi ∈ R[Gi], serves to generate the entire algebra R[G1 × G2] as an R-module.
Readers comfortable with tensor products will already recognize that this observation is
more fully expressed by the isomorphism R[G1 × G2] ≅ R[G1] ⊗_R R[G2]; and readers
unfamiliar with this terminology can safely shelve it for future reference.
Recall from Theorem 6.1(a) that every character χ of G1 × G2 has the form χ = χ1 × χ2
where χi ∈ Ĝi, so that χ(g1, g2) = χ1(g1)χ2(g2) whenever gi ∈ Gi. This extends by
Q-linearity to the rational group algebra of G1 × G2, so that if
\[ \alpha = \sum_{j=1}^{k} r_j\, \alpha_{1,j}\, \alpha_{2,j} \in \mathbb{Q}[G_1 \times G_2], \qquad r_j \in \mathbb{Q},\ \alpha_{i,j} \in \mathbb{Q}[G_i], \]
then
\[ \chi(\alpha) = \sum_{j=1}^{k} r_j\, \chi_1(\alpha_{1,j})\, \chi_2(\alpha_{2,j}) \in \mathbb{C}. \]
Example 7.6: Q[G1 × G2] where G1 and G2 are finite cyclic. Let G1 = {1, x, x^2, …, x^{m−1}}
be cyclic of order m, and G2 = {1, y, y^2, …, y^{n−1}} be cyclic of order n. Then Q[G1 × G2] ≅
Q[x, y]/(x^m − 1, y^n − 1), an algebra of dimension mn over Q. Note that Q[G1] ≅ Q[x]/(x^m − 1) and
Q[G2] ≅ Q[y]/(y^n − 1); and we have simply tensored these two algebras together over Q using
Q[x, y] ≅ Q[x] ⊗_Q Q[y] and the remarks above. Here {x^i y^j : 0 ≤ i < m, 0 ≤ j < n} is a basis
over Q. Similarly if G1 = Z/mZ and G2 = Z/nZ, then replacing these additive cyclic groups by
their multiplicative proxies as in Example 7.5, we once again have
Q[G1 × G2] ≅ Q[x, y]/(x^m − 1, y^n − 1).
Theorem 7.7. Let G be the additive group of a finite field F = F_q of odd order.
Let S, N ⊂ G be the subsets of size (q−1)/2 corresponding to the nonzero squares and
the nonsquares in F. Let α, β, κ ∈ Q[G] denote the sums of S, N and G respectively,
in the group algebra. Then α generates a 3-dimensional subalgebra of Q[G] with basis
{1, α, β}, or {1, α, κ}. We have κ = 1 + α + β; ακ = βκ = ((q−1)/2)κ; κ^2 = qκ and the
following relations are satisfied.
(a) If q ≡ 1 mod 4: α^2 = (q−1)/2 + ((q−5)/4)α + ((q−1)/4)β = ((q−1)/4)(1 + κ) − α; α* = α; β* = β;
every nontrivial χ ∈ Ĝ satisfies χ(α) = (1/2)(−1 ± √q) and χ(β) = (1/2)(−1 ∓ √q).
(b) If q ≡ 3 mod 4: α* = β; αα* = (q+1)/4 + ((q−3)/4)κ; every nontrivial character χ ∈ Ĝ
satisfies χ(α) = (1/2)(−1 ± i√q) and χ(β) = (1/2)(−1 ∓ i√q).
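The algebraic relations in Theorem 7.7 can be checked by exact integer arithmetic in small cases. The following sketch makes a simplifying assumption not required by the theorem: q is an odd prime, so that F_q is just Z/qZ and elements of Z[G] can be stored as lists of length q. The function names are hypothetical.

```python
def multiply(a, b):
    """Product in Z[G] for G = Z/qZ written additively: exponents add mod q."""
    q = len(a)
    c = [0] * q
    for g, ag in enumerate(a):
        for h, bh in enumerate(b):
            c[(g + h) % q] += ag * bh
    return c

def star(a):
    """The map alpha -> alpha*, induced by g -> -g."""
    q = len(a)
    return [a[(-g) % q] for g in range(q)]

def check_theorem_7_7(q):
    """Check the stated relations for an odd prime q (so that F_q = Z/qZ)."""
    squares = {(x * x) % q for x in range(1, q)}
    alpha = [1 if g in squares else 0 for g in range(q)]
    beta = [1 if g != 0 and g not in squares else 0 for g in range(q)]
    one = [1] + [0] * (q - 1)
    kappa = [1] * q
    if q % 4 == 1:
        rhs = [(q - 1) // 2 * one[g] + (q - 5) // 4 * alpha[g] + (q - 1) // 4 * beta[g]
               for g in range(q)]
        return (multiply(alpha, alpha) == rhs
                and star(alpha) == alpha and star(beta) == beta)
    rhs = [(q + 1) // 4 * one[g] + (q - 3) // 4 * kappa[g] for g in range(q)]
    return multiply(alpha, star(alpha)) == rhs and star(alpha) == beta

print([check_theorem_7_7(q) for q in (5, 7, 11, 13)])
```

Such a check is of course no substitute for the proof; it only guards against misreading the structure constants.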
Proof. The fact that {1, α, β} spans a subalgebra of Q[G], with structure constants as stated,
follows from Theorem 3.6. For example when q ≡ 1 mod 4 and ε = (−1)^{(q−1)/2} in the notation
of Theorem 3.6, we may write α^2 = m_0 + m_+ α + m_− β where m_0, m_+, m_− are nonnegative
integers expressing the number of ways to express 0 (respectively, each square, each nonsquare)
as a sum of two nonzero squares in F. Here m_0 = (q−1)/2 is the number of solutions 0 = a + (−a)
with a ∈ S; also m_+ = (q−5)/4 and m_− = (q−1)/4 by parts (i) and (iv) of Theorem 3.6.
Now let χ ∈ Ĝ be nontrivial. By Theorem 6.3(a), we have χ(κ) = 0. If q ≡ 1 mod 4
then
\[ \chi(\alpha)^2 = \chi\Big( \tfrac{q-1}{4}(1 + \kappa) - \alpha \Big) = \tfrac{q-1}{4} - \chi(\alpha) \]
so that
\[ \Big( \chi(\alpha) + \tfrac{1}{2} \Big)^2 = \tfrac{q}{4} \]
which yields χ(α) = (1/2)(−1 ± √q); also 0 = χ(κ) = χ(1 + α + β) gives χ(β) = (1/2)(−1 ∓ √q).
If q ≡ 3 mod 4 then
Exercises 7.
1. Consider the group algebra R = F [G] of a finite (multiplicative) abelian group G over a field F ,
and let A ⊆ R be an ideal. Recall that R is an n-dimensional vector space with basis G, where
n = |G|; and A is a subspace. Assume that the characteristic of F does not divide n. (Thus
char F may equal zero; however char F cannot equal p for any of the finitely many primes p
dividing n.)
(a) Using linear algebra, explain why R has a subspace U complementary to A, i.e. R = A ⊕ U ;
this means that R = A + U and A ∩ U = 0. Here you may cite any known theorems from
linear algebra.
(b) Show by example that the subspace U ≤ R in (a) is not necessarily an ideal of R, or even a
subalgebra, in general.
(c) Prove that there exists an F-linear transformation P : R → R such that P^2 = P, having
image equal to A and null space equal to U. (Again cite any known facts from linear algebra,
using (a).)
(d) Define T : R → R by T(v) = (1/n) Σ_{h∈G} h^{−1} P(hv). The hypothesis regarding char F
guarantees that n has an inverse in F (so we are not dividing by zero here). Prove that T is
F-linear.
(e) Prove that T(gv) = gT(v) for all g ∈ G and v ∈ R.
(f) Prove that T^2 = T.
(g) Prove that T(v) = v iff v ∈ A.
(h) Prove that the image of T is A, and the kernel of T is a subspace B ≤ R complementary to
A; so R = A ⊕ B.
(i) Prove that B ⊆ R is in fact an ideal. (While (a) gives a complementary subspace, this is
stronger: it gives a complementary ideal.)
Wherever well-known facts from linear algebra suffice, please indicate so. This is not the place
to re-prove basic facts from linear algebra, only to demonstrate a knowledge of which facts these
are. We remark that the assumption that G is abelian is not actually required here; however if
G is nonabelian, we must speak of left ideals throughout, instead of (two-sided) ideals.
2. Let R = F [G] where F = F2 and G = {1, g} is cyclic. Note that the hypotheses of #1 are not
satisfied. Here we show that the conclusion of #1 also fails. Find an ideal A ⊆ R for which there
does not exist a complementary ideal; so there is no ideal B ⊆ R satisfying R = A ⊕ B.
8. Difference Sets
Let G be a multiplicative group of order v. (While our groups will typically be abelian,
we do not yet require this.) A (v, k, r)-difference set in G is a subset D ⊂ G of size
for all α, β ∈ G. (If G is abelian, then clearly this map is an automorphism of the algebra.)
In the following, we also denote κ := Σ_{g∈G} g ∈ Z[G].
(ii) Any two distinct points lie in exactly r common blocks. Dually, any two distinct
blocks intersect in exactly r points.
Again, a necessary condition for the existence of a symmetric (v, k, r)-design is the feasi-
bility relation (v − 1)r = k(k − 1), deduced by counting in two different ways the number
of pairs of distinct points in a fixed block. As before, we call (v, k, r) the parameters of
the design, and n := k − r its order; and to avoid trivial cases, we always assume v > k,
i.e. k > r, n > 0. From the feasibility relation, we see that any symmetric design with r = 1
has parameters (n^2 + n + 1, n + 1, 1); this is called a projective plane of order n. Note
that the feasibility relation is a necessary, but not sufficient, condition for the existence
of a symmetric design with a given set of parameters; for example, there is no symmetric
design with the feasible parameter set (43, 7, 1) (since there is no projective plane of
order 6). Given this assertion and Theorem 8.2 below, it follows that there is also no
(43, 7, 1)-difference set. The smallest parameter set for which the existence of a
symmetric design is currently unresolved is (81, 16, 3) (although so much is known
about the automorphism group of a putative symmetric (81, 16, 3)-design that it cannot
arise from any difference set).
Theorem 8.2. A (v, k, r)-difference set D ⊂ G gives rise to a symmetric (v, k, r)-
design whose point set is G and whose blocks are the right translates Dh = {dh : d ∈
D}, h ∈ G. This symmetric design admits G as a group of automorphisms permuting
the points regularly (by right-multiplication).
Example 8.3: The Symmetric (7,3,1)- and (7,4,2)-Designs. Consider the cyclic
group of order seven, G = {1, g, g^2, …, g^6}. The subset D = {g, g^2, g^4} is a
(7, 3, 1)-difference set. The corresponding symmetric (7, 3, 1)-design, whose blocks are the
seven translates of D (the lines in the figure), is a projective plane of order 2. The
complement of D is a (7, 4, 2)-difference set {1, g^3, g^5, g^6} whose translates are the
seven quadrangles (sets of four points, no three collinear) in the same plane.
[Figure: the projective plane of order 2, with its seven points labelled 1, g, g^2, g^3, g^4, g^5, g^6.]
The (7, 3, 1)-difference set generalizes in more than one way; see Exercises #3,4,5.
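The difference set property in Example 8.3 is easy to confirm by machine. In the sketch below the cyclic group of order v is written additively, so that g^e is represented by the exponent e mod v and quotients d1 d2^{−1} become differences of exponents; the function name `is_difference_set` is a hypothetical choice.

```python
from collections import Counter

def is_difference_set(D, v, r):
    """True if every nonzero residue mod v arises exactly r times as d1 - d2, d1 != d2 in D."""
    counts = Counter((d1 - d2) % v for d1 in D for d2 in D if d1 != d2)
    return all(counts[g] == r for g in range(1, v))

print(is_difference_set({1, 2, 4}, 7, 1))      # exponents of D = {g, g^2, g^4}
print(is_difference_set({0, 3, 5, 6}, 7, 2))   # exponents of the complement of D
print(is_difference_set({1, 2, 3}, 7, 1))      # a subset that is not a difference set
```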
Proof of Theorem 8.2. Obviously each block Dh contains exactly k points dh, d ∈ D.
Each point g ∈ G lies in exactly k blocks D d^{−1} g for d ∈ D; this is because g ∈ Dh iff
g = dh for some d ∈ D, iff h = d^{−1} g.
Given two distinct points g1 ≠ g2 in G, a block Dh contains both points iff (g1, g2) =
(d1 h, d2 h) for some d1, d2 ∈ D, iff d1 d2^{−1} = g1 g2^{−1} and h = d1^{−1} g1. There are exactly r
pairs (d1, d2) in D satisfying these conditions; and each such pair (d1, d2) yields a unique
block Dh = D d1^{−1} g1. Thus there are exactly r blocks containing both points g1 and g2. It
remains only to show that any two distinct blocks intersect in exactly r points.
Let A be the v × v incidence matrix of our point-block structure: rows and columns of
A are indexed by points and blocks respectively; and the entry in row g ∈ G and column
Theorem 8.4. Every automorphism of a symmetric (v, k, r)-design has the same
number of fixed points as fixed blocks.
Not every symmetric design arises from a difference set; the designs constructed above
are special in that they admit G as a regular group of automorphisms. Indeed,
right-multiplication by an element a ∈ G permutes points via g ↦ ga and blocks via Dh ↦ Dha,
thereby preserving incidence. To say that this group of automorphisms is regular is to
say that for any two points g1, g2, there is a unique automorphism in our group mapping
g1 ↦ g2 (in this case, right-multiplication by a = g1^{−1} g2). It is clear that this group of
automorphisms of the design is isomorphic to the group G that we started with, and that
it regularly permutes the blocks (as well as regularly permuting the points). In fact, for
a symmetric design, any group which regularly permutes the points must also regularly
permute the blocks, although we do not prove this here.
Proof. We have only to prove the converse. Arbitrarily we choose a point and label it
‘1’. The other points are then labelled by the remaining group elements, using the regular
action of G which we may assume acts by right-multiplication: element g ∈ G maps point
‘1’ to point ‘g’. Choose a block B arbitrarily, and let D be the set of all points in B. Given
our labelling of points by group elements, D is viewed as a subset of G, with |D| = k. Given
g ≠ 1 in G, by assumption there are exactly r blocks containing both the points g and 1.
Any such block may be denoted Bh, or simply Dh = {dh : d ∈ D}, for some h ∈ G, using
the regular action of G on the set of blocks; and we have g, 1 ∈ Dh iff (g, 1) = (d1 h, d2 h)
for some d1, d2 ∈ D, iff (g, h) = (d1 d2^{−1}, d2^{−1}). Thus every non-identity element g ∈ G is
expressible as g = d1 d2^{−1} in exactly r ways. So D ⊂ G is a (v, k, r)-difference set.
Proof. Let δ = Σ_{d∈D} d ∈ Z[G] and κ = Σ_{g∈G} g ∈ Z[G] as before. By Lemma 8.1,
δδ* = n + rκ. Given a, b ∈ G, the sum of the elements in aDb is aδb, which satisfies
\[ (a \delta b)(a \delta b)^* = a \delta b\, b^{-1} \delta^* a^{-1} = a (n + r\kappa) a^{-1} = n + r\kappa, \]
so aDb is also a (v, k, r)-difference set. It is also clear that σ(D) is a (v, k, r)-difference set
in G whenever σ ∈ Aut G.
It is clear from the definition that the dual of a symmetric (v, k, r)-design (in which the
roles of points and blocks are reversed) is also a symmetric (v, k, r)-design (which may or
may not be isomorphic to the original design). Moreover those symmetric designs arising
from difference sets in the way we have shown, have a group G permuting both points and
blocks regularly. From a (v, k, r)-difference set D ⊂ G we construct a symmetric
(v, k, r)-design in which point P lies in block d(B) iff d ∈ D, iff d^{−1}(P) lies in the block B; so in the
dual design, the ‘point’ B lies in the ‘block’ d*(P) iff d* ∈ D*, where D* = {d^{−1} : d ∈ D}.
This dual design is also a symmetric (v, k, r)-design admitting G as a regular group of
automorphisms; and so by Theorem 8.5, D* is also a difference set.
\[ \alpha^{[t]} = \sum_{g \in G} a_g\, g^t \in \mathbb{Q}[G]. \]
Note that (α^{[s]})^{[t]} = α^{[st]} and α^{[−1]} = α*. Now if D is a (v, k, r)-difference set in an abelian
group G, and t is relatively prime to v, it follows from Theorem 8.6 that the subset
D^{[t]} := {d^t : d ∈ D}
is also a (v, k, r)-difference set in G. However, D^{[t]} may coincide with a translate gD for
some g ∈ G. Hall’s Multiplier Theorem 8.9 indicates some sufficient conditions for this to
occur. But first we prove
Proof. By Lemma 8.1, D is a (v, k, r)-difference set iff δδ* = n + rκ. By Corollary 7.3,
this is equivalent to
\[ |\chi_d(\delta)|^2 = n + r\, \chi_d(\kappa) \tag{8.8} \]
for all d | n. For d = 1 this gives another proof of k^2 = n + rv, which is just the feasibility
relation. For d > 1, we have χ_d(κ) = Σ_{g∈G} χ_d(g) = 0 by Theorem 6.3(a), so (8.8) reduces
to |χ_d(δ)|^2 = n.
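The character criterion at the end of this proof can be observed numerically on the (7, 3, 1)-difference set of Example 8.3, for which n = 2 and k^2 = 9. In this sketch the cyclic group is written additively via exponents, and the name `char_value` is a hypothetical choice; the computation is in floating point, so the values are only approximately 9 and 2.

```python
import cmath

def char_value(D, v, t):
    """chi_t(delta), where chi_t(g^e) = exp(2*pi*i*t*e/v) on a cyclic group of order v."""
    return sum(cmath.exp(2j * cmath.pi * t * e / v) for e in D)

D, v = [1, 2, 4], 7          # exponents of D = {g, g^2, g^4}
squared_moduli = [abs(char_value(D, v, t)) ** 2 for t in range(v)]
print([round(x, 9) for x in squared_moduli])
```

The entry for t = 0 is the trivial character, giving k^2; all other entries should agree with n up to rounding.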
Theorem 8.9 (Hall’s Multiplier Theorem [Ha], [HR]). Let D be a (v, k, r)-difference
set in an abelian group G of order v. Suppose that the order n = k − r has
a prime divisor p > r which does not divide v. Then D^{[p]} = gD for some g ∈ G.
Proof. Let δ = Σ_{d∈D} d and κ = Σ_{g∈G} g, so that δδ^{[−1]} = n + rκ. We see from the
Multinomial Theorem that δ^p = δ^{[p]} + pα for some α ∈ Z[G], so
\[ \delta^{[p]} \delta^{[-1]} = \delta^{p} \delta^{[-1]} - p\alpha \delta^{[-1]} = \delta^{p-1} \delta \delta^{[-1]} + p\alpha_1 = (n + r\kappa)\delta^{p-1} + p\alpha_1 = r\kappa + p\alpha_2 \]
for some α_1, α_2 ∈ Z[G]. Since all coefficients on the left side are non-negative integers,
this must also be true on the right side; and since p > r this means that all coefficients in
α_2 are non-negative integers. Multiplying both sides by κ yields k^2 κ = rvκ + pα_2 κ; and
using the feasibility relation k^2 = n + rv we obtain pα_2 κ = nκ. Applying [−1] yields also
pα_2^{[−1]} κ = nκ. Now
\[ \big( \delta^{[p]} \delta^{[-1]} \big) \big( \delta^{[p]} \delta^{[-1]} \big)^{[-1]} = \delta^{[p]} \delta^{[-1]} \delta^{[-p]} \delta = \big( \delta \delta^{[-1]} \big) \big( \delta \delta^{[-1]} \big)^{[p]} = (n + r\kappa)(n + r\kappa)^{[p]} = n^2 + 2rn\kappa + r^2 v \kappa. \]
We would like to cancel δ^{[−1]} from both sides, but first we need to know that δ^{[−1]} is not
a zero divisor in Q[G]. The latter relation takes the form ρδ^{[−1]} = 0 where ρ = δ^{[p]} − δg ∈
Q[G]. Now each χ ∈ Ĝ satisfies
\[ \chi(\rho)\, \chi(\delta^{[-1]}) = \chi(\rho\, \delta^{[-1]}) = 0. \]
For the trivial character, we have χ(δ^{[−1]}) = k; and for nontrivial χ, since D^{[−1]} is a
(v, k, r)-difference set by Theorem 8.6, we have |χ(δ^{[−1]})| = √n by Theorem 8.7. In either
case, χ(δ^{[−1]}) ≠ 0, so χ(ρ) = 0 for all χ ∈ Ĝ. By Corollary 7.3(a), ρ = δ^{[p]} − δg = 0 as required.
The point is that Dh generates the same symmetric design as D, since these two difference
sets have the same translates in G; so for most purposes, we may assume [t] fixes D itself,
otherwise replace D by an appropriate translate.
Proof of Theorem 8.10. The automorphism [t] ∈ Aut G acts as an automorphism of the
associated symmetric design, since g ∈ Dh iff g^t ∈ (Dh)^{[t]}. Since 1^t = 1, [t] fixes at least
one point; so by Theorem 8.4, [t] has at least one fixed block Dh, h ∈ G.
Example 8.11: Constructions using Multipliers. Consider the cyclic group G = {1, g, g^2, …, g^6}
of order seven. Although it is not hard to find difference sets of order 2 in G (see Example 8.3),
Hall’s Multiplier Theorem makes the job even faster. The prime p = 2 divides n = 2 but not v = 7,
so [2] ∈ Aut G is a multiplier. By Theorem 8.10, any difference set is equivalent to one invariant
under the multiplier. Now [2] has three orbits on G: {1}, {g, g^2, g^4}, {g^3, g^5, g^6}. In order that
D^{[2]} = D, D must be a union of these orbits. We obtain D = {g, g^2, g^4} and D* = {g^3, g^5, g^6} as
(7, 3, 1)-difference sets; every (7, 3, 1)-difference set is therefore a translate of one of these. Taking
their unions with the orbit {1} gives two (7, 4, 2)-difference sets; and every difference set with these
parameters is a translate of one of these.
Now consider the cyclic group G = {1, g, g^2, …, g^20} of order 21. A cyclic (21, 5, 1)-difference
set in G has order 4 and so must have [2] as a multiplier. The orbits of [2] on G are {1},
{g^7, g^14}, {g^3, g^6, g^12}, {g^9, g^15, g^18}, {g, g^2, g^4, g^8, g^11, g^16} and {g^5, g^10, g^13, g^17, g^19, g^20}; and
any (21, 5, 1)-difference set D is (up to translation) a union of orbits. Since k = 5, we must have
D = {g^3, g^6, g^7, g^12, g^14} or {g^7, g^9, g^14, g^15, g^18}. It is not hard to check that both of these are in
fact difference sets; and they have the form D and D*, so the corresponding designs are dual to each
other.
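The search described in Example 8.11 is mechanical enough to automate. The sketch below, which again writes the cyclic group additively via exponents, enumerates the orbits of the multiplier [t] and tests every union of orbits of the right size; the function names are hypothetical, and the exhaustive loop over unions is only sensible when the number of orbits is small.

```python
from collections import Counter
from itertools import combinations

def orbits(v, t):
    """Orbits of the map x -> t*x on Z/vZ, for t coprime to v."""
    seen, result = set(), []
    for x in range(v):
        if x not in seen:
            orbit, y = [], x
            while y not in orbit:
                orbit.append(y)
                y = (t * y) % v
            seen.update(orbit)
            result.append(sorted(orbit))
    return result

def is_difference_set(D, v, r):
    """True if every nonzero residue mod v arises exactly r times as d1 - d2, d1 != d2 in D."""
    counts = Counter((d1 - d2) % v for d1 in D for d2 in D if d1 != d2)
    return all(counts[g] == r for g in range(1, v))

def invariant_difference_sets(v, k, r, t):
    """All (v, k, r)-difference sets in Z/vZ that are unions of orbits of [t]."""
    orbs, found = orbits(v, t), []
    for m in range(1, len(orbs) + 1):
        for choice in combinations(orbs, m):
            D = sorted(x for orb in choice for x in orb)
            if len(D) == k and is_difference_set(D, v, r):
                found.append(D)
    return found

print(invariant_difference_sets(7, 3, 1, 2))
print(invariant_difference_sets(21, 5, 1, 2))
```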
Example 8.12: Nonexistence via Multipliers. We show that there is no cyclic projective plane
of order 10. Such a design would necessarily arise from a (111, 11, 1)-difference set D in a cyclic
group G of order 111. The plane has order 10 and so [2] is a multiplier. Without loss of generality,
D^{[2]} = D. A short computer program helps to enumerate the orbits of [2] on G, which are {1},
{g^37, g^74}, and the orbits of g, g^3 and g^11, each of size 36. Since there is no union of orbits
having combined size k = 11, there can be no such difference set. (There is in fact no projective
plane of order 10, as we now know by extensive computer search. It is remarkable how much easier
it is to prove nonexistence in the cyclic case.)
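A possible version of the short computer program mentioned in Example 8.12 is sketched here. It works with exponents mod 111, computes only the orbit sizes of x ↦ 2x, and then lists which totals can be reached by unions of orbits; the function names are hypothetical.

```python
def orbit_sizes(v, t):
    """Sizes of the orbits of x -> t*x on Z/vZ, for t coprime to v."""
    seen, sizes = set(), []
    for x in range(v):
        if x in seen:
            continue
        y, size = x, 0
        while y not in seen:
            seen.add(y)
            size += 1
            y = (t * y) % v
        sizes.append(size)
    return sorted(sizes)

def subset_sums(sizes):
    """All totals obtainable by adding up a subcollection of the given sizes."""
    sums = {0}
    for s in sizes:
        sums |= {total + s for total in sums}
    return sums

sizes = orbit_sizes(111, 2)
print(sizes)
print(11 in subset_sums(sizes))
```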
Example 8.13: Nonexistence via Characters. We prove that there is no (154, 18, 2)-difference
set in a cyclic group G. Here the order is n = 18 − 2 = 16. We expect [2] to be a multiplier,
but we cannot use Theorem 8.9 as stated since p = 2 does not exceed r = 2. But suppose D ⊂ G
is a (154, 18, 2)-difference set, and let χ ∈ Ĝ be a character of order 11. The values of χ are in
O := Z[ζ], ζ = ζ_11. By Theorem 8.7, δ := Σ_{d∈D} d satisfies |χ(δ)|^2 = 16. Denoting the principal
ideal A = (χ(δ)) ⊂ O and its complex conjugate by Ā, this yields the factorization AĀ = (2)^4 and
so A = Ā = (2)^2. Now the ideal (2) = 2O ⊂ O is prime since the quotient
O/(2) ≅ Z[x]/(2, Φ_11(x)) ≅ F_2[x]/(Φ_11(x)) ≅ F_{2^10} is a
field. (Any root of Φ_11(x) in an extension of F_2 is a primitive 11-th root of unity; and any extension
F_{2^k} having such a root must have 11 | 2^k − 1, and this requires 10 | k.) Since A = (χ(δ)) = (4), we
must have χ(δ) = 4u for some unit u ∈ O×. But every automorphism σ ∈ Aut Q[ζ] has the property
that σ ∘ χ : G → Q[ζ] is also a nontrivial character, so the same reasoning gives |σ(χ(δ))| = 4.
Thus u ∈ O satisfies |σ(u)| = 1 for every σ ∈ Aut Q[ζ]. By Theorem 4.10, u is a root of unity
in Q[ζ_11]; so u = ±ζ^k and χ(δ) = ±4ζ^k for some k ∈ {0, 1, 2, …, 10}. However [G : H] = 11
where H = ker χ = {g ∈ G : χ(g) = 1} and so G = H ∪ Ht ∪ Ht^2 ∪ ··· ∪ Ht^10 where t ∈ G
has order 11 satisfying χ(t) = ζ; thus χ(δ) = a_0 + a_1 ζ + a_2 ζ^2 + ··· + a_10 ζ^10 where a_i = |D ∩ Ht^i|.
Since Φ_11(x) = 1 + x + x^2 + ··· + x^10 is the minimal polynomial of ζ over Q, these two expressions for
χ(δ) can only agree if a_0 + a_1 x + a_2 x^2 + ··· + a_10 x^10 = ±4x^k + aΦ_11(x) for some a ∈ Q. Comparing
coefficients gives a ∈ Z; and since the coefficients a_i sum to |D| = 18, we get 18 = ±4 + 11a. This
forces the sign to be negative and a = 2; but then a_k = a − 4 = −2 < 0, which is impossible
since a_k = |D ∩ Ht^k| ≥ 0. Thus no (154, 18, 2)-difference set can exist over a cyclic group.
We remark on the necessity of observing that the ideal (2) ⊂ O is prime; if this were not the
case, then possibly (2) = PP̄ for some ideal P ⊂ O such that (χ(δ)) = P^4 and its complex
conjugate equals P̄^4, which would have invalidated our argument.
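The primality of (2) in Z[ζ_11] rests on the fact that 2 has multiplicative order 10 modulo 11, as in the parenthetical remark of Example 8.13. A minimal sketch of that computation follows; the function name `multiplicative_order` is a hypothetical choice, and the comparison with the modulus 7 is included only as a contrasting case in which the order is smaller than 6, so that Φ_7(x) is reducible mod 2.

```python
def multiplicative_order(a, m):
    """Smallest k >= 1 with a**k == 1 (mod m); assumes gcd(a, m) == 1 and m > 1."""
    k, power = 1, a % m
    while power != 1:
        power = (power * a) % m
        k += 1
    return k

print(multiplicative_order(2, 11))   # equals 10, the degree of Phi_11
print(multiplicative_order(2, 7))    # smaller than 6, the degree of Phi_7
```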
The literature on difference sets is vast and growing quickly; see for example the
surveys [Ju], [JS1], [JS2] on this subject. From the extensive list of known constructions
and general nonexistence results, we have highlighted only a few above, with a preference
for techniques using cyclotomic fields and group characters. It is with this preference in
mind that we have passed over many important results, in particular
Theorem 8.14 (Bruck, Ryser, Chowla [BR], [CR]). Suppose there exists a
symmetric (v, k, r)-design of order n := k − r. If v is even, then n is a square. If v
is odd, then the Diophantine equation n x^2 + (−1)^{(v−1)/2} r y^2 = z^2 has a nontrivial integer
solution (i.e. (x, y, z) ≠ (0, 0, 0)).
While this condition does not rule out the designs of Examples 8.12 and 8.13, it does rule
out many others, including the projective plane of order 6 mentioned earlier. Of course
whenever no symmetric design exists for a parameter set (v, k, r), there
is also no difference set with the given parameters.
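As a worked instance of how the Bruck-Ryser-Chowla condition excludes the projective plane of order 6, take (v, k, r) = (43, 7, 1), so that n = 6 and (v−1)/2 = 21 is odd. The following derivation is a standard descent argument written out under these parameters; it is offered as an illustration rather than as the argument of [BR].

```latex
% Bruck-Ryser-Chowla for (v, k, r) = (43, 7, 1): n = 6 and (v-1)/2 = 21 is odd.
\begin{align*}
6x^2 - y^2 = z^2 \quad &\Longleftrightarrow \quad y^2 + z^2 = 6x^2 . \\
\text{Assume } \gcd(x, y, z) = 1: \quad
  & y^2 + z^2 \equiv 0 \pmod 3
    \;\Longrightarrow\; y \equiv z \equiv 0 \pmod 3
    \quad (\text{squares mod } 3 \text{ are } 0 \text{ and } 1) \\
  & \Longrightarrow\; 9 \mid 6x^2 \;\Longrightarrow\; 3 \mid x ,
    \text{ contradicting } \gcd(x, y, z) = 1 .
\end{align*}
```

Since any nontrivial integer solution can be divided by the greatest common divisor of x, y, z, no nontrivial solution exists, and so there is no symmetric (43, 7, 1)-design.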
Exercises 8.
1. Let G be a cyclic group of order seven. How many (7, 3, 1)-difference sets does G have? If D is
one such difference set, are all the others of the form aDb or (aDb)∗ using Theorem 8.6?
2. Generalizing Example 8.3, show that if (𝒫, ℬ) is a symmetric (v, k, r)-design, then by
complementing every block B ∈ ℬ we get a family of subsets ℬ′ = {𝒫 ∖ B : B ∈ ℬ} such that (𝒫, ℬ′)
is also a symmetric design. Find the parameters of (𝒫, ℬ′); and show that the complementary
design (𝒫, ℬ′) has the same order as the original design (𝒫, ℬ). (Note that in view of
Theorem 8.5, every difference set D ⊂ G must also yield a complementary difference set
D′ := G ∖ D of the same order.)
3. Let G be an additive (rather than multiplicative) group of order v. Then a (v, k, r)-difference
set in G is a subset D ⊂ G of size |D| = k < v such that every nonzero element g ∈ G can be
expressed as g = d1 − d2 in exactly r ways. Each integer t relatively prime to v determines an
automorphism [t] ∈ Aut G, g ↦ tg; and such a map is a multiplier of D if tD = D + g for some
g ∈ G.
(a) Let G be the additive group of a finite field Fq , q ≡ 3 mod 4; and let D be the set of
nonzero squares in Fq . Show that D is a difference set in G, and determine its parameters.
This construction gives the Paley difference sets; it includes the (7, 3, 1)-difference set of
Example 8.3 as a special case.
(b) Find all multipliers of the difference set D in (a). In the strict sense that we have defined
multipliers [t], we have only considered t ∈ Z. Can you generalize the definition of multipliers
to include more general automorphisms?
4. Let V be a vector space of dimension e ≥ 3 over a finite field F = F_q. Let 𝒫 be the set of all
1-dimensional subspaces of V, and let ℬ be the set of all (e−1)-dimensional subspaces of V.
(a) Show that |𝒫| = |ℬ| = (q^e − 1)/(q − 1).
(b) For P ∈ 𝒫 and B ∈ ℬ, we say that P lies in B if P is a subspace of B. Show that this
defines a symmetric (v, k, r)-design where v = (q^e − 1)/(q − 1). Express k, r and n := k − r in
terms of q and e.
(c) Show that when e = 3, the design (𝒫, ℬ) is a projective plane of order n. These are in fact
the classical projective planes, again including Example 8.3 as a special case.
5. Let E ⊃ F be an extension of finite fields of degree e ≥ 3 where F = F_q and E = F_{q^e}. Let ω ∈ E
be a generator of the cyclic group E×, i.e. a primitive (q^e − 1)-st root of unity.
(a) Show that ω^v is a generator of the multiplicative group F×, where v = (q^e − 1)/(q − 1); that
is, ω^v is a primitive (q−1)-st root of unity.
(b) Consider the quotient group G = E×/F×, a cyclic group of order v = (q^e − 1)/(q − 1). Note that
for each α ∈ E× and a ∈ F×, we have Tr_{E/F}(aα) = 0 iff Tr_{E/F}(α) = 0 (using the F-linearity
of the trace map); so we have a well-defined subset D ⊂ G consisting of all cosets αF×,
α ∈ E×, such that Tr_{E/F}(α) = 0. Show that D is a cyclic difference set in G having the same
parameters as the design in #4. (These are the Singer designs.)
9. Hadamard Matrices
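\[
\begin{pmatrix} + \end{pmatrix}, \quad
\begin{pmatrix} + & + \\ + & - \end{pmatrix}, \quad
\begin{pmatrix} - & + & + & + \\ + & - & + & + \\ + & + & - & + \\ + & + & + & - \end{pmatrix}, \quad
\begin{pmatrix}
+ & + & + & + & + & + & + & + \\
- & + & + & + & - & + & - & - \\
- & - & + & + & + & - & + & - \\
- & - & - & + & + & + & - & + \\
- & + & - & - & + & + & + & - \\
- & - & + & - & - & + & + & + \\
- & + & - & + & - & - & + & + \\
- & + & + & - & + & - & - & +
\end{pmatrix}
\tag{9.1}
\]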
where we abbreviate ±1 by ±.
Proof. Let H be a Hadamard matrix of order m > 2, and let u, v, w ∈ {±1}^m be its first
three rows. Since the vectors u+v and u+w have even entries, m = u·u = (u+v)·(u+w) ≡
0 mod 4.
It is not known whether the converse of Theorem 9.2 holds; but it is a popular conjecture
that for every positive m ≡ 0 mod 4, there is a Hadamard matrix of order m. The
smallest currently open case is m = 668. How should one go about trying to construct a
Hadamard matrix of such a size? Whatever is the best way, certainly the worst way is to
hope to go through all 2^{m^2} matrices of size m × m with entries ±1 until finding success.
In trying to convey to my non-mathematical friends a sense of the kind of problems
combinatorialists work on, I will often describe the search for a Hadamard matrix of size 668;
but the magnitude of such a search typically fails to impress my friends who are unaware
that if all the computers in the world were dedicated to a naive search of 2^{668^2} cases at the
optimistic rate of a million cases per second, it would still take much more than 10^{100000}
times the current age of the universe to finish the task.
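The size of this naive search is easily estimated with logarithms. The figures used below are assumptions made only for the sake of the estimate: ten billion machines, each testing a million matrices per second, and an age of the universe of about 4.35 × 10^17 seconds (roughly 13.8 billion years). The function name is hypothetical.

```python
import math

def log10_universe_ages(m, machines, rate, age_seconds):
    """log10 of (time to enumerate 2**(m*m) sign matrices) / (age of the universe)."""
    log10_cases = m * m * math.log10(2)
    log10_seconds = log10_cases - math.log10(machines * rate)
    return log10_seconds - math.log10(age_seconds)

estimate = log10_universe_ages(668, machines=10**10, rate=10**6, age_seconds=4.35e17)
print(668 * 668, round(estimate))
```

Under these assumptions the exponent comes out well above 100000, in line with the claim in the text; changing the assumed number of machines by a few orders of magnitude makes no visible difference.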
Rather than not looking at all, we look where the chances seem better. We are
reminded of the old story (often called The Streetlight Effect), one version [Ho] of
which goes:
A man got drunk in Memphis one night. Staggering down the street, he stumbled into an alley.
His watch fell out of his pocket. He heard it fall. He got up, and walked on down to the corner.
There he got down on his hands and knees and started crawling all around under the electric
light. Soon the traffic was blocked. A police officer came up to him and said : “What’re you
doing? Can’t you see you’re blocking traffic?” The drunk replied: “Well, I losht mer watch. It
was my daddy’s watch; sort of an heirloom in the family, and I’ve juss gotta find that watch.”
The cop said: “All right, boss, I’ll help you find it. Where did you lose it?” The drunk: “I losht
it up there in that damn dark alley.” The cop: “Well, why don’t you look for it down there
where you lost it?” The drunk: “Why, you big fat fool, can’t you see there’s a lot more light up
here?”
The story offers us some guiding principles, both in looking for examples, and in trying to
glean insight from the known examples as to which theorems we should be trying to
prove (at the least, whether existence or nonexistence results). When hunting for examples, we
are wise to look where the light is best, provided we look where our chances are good. But
before ruling out the darker regions or investing too much time trying to prove nonexistence
there, consider that the lack of known examples in the darker regions is due to the difficulty
of finding them there. (It is usually this second lesson that is intended by ‘the streetlight
effect’.)
The literature on Hadamard matrices is far too vast to adequately summarize here,
so we will content ourselves with describing a few general types of construction and some
approaches to classifying or proving nonexistence in special cases. Cyclotomic integers
play a key role in both of these ventures, but particularly in proving nonexistence or
classification results; however much of the work constructing examples uses group rings
and representations where cyclotomic integers arise in a somewhat clerical way.
However, before describing general types of construction, we should note that if $H$
is a Hadamard matrix of order $m$, then
\[ \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix} \otimes H = \begin{pmatrix} H & H \\ H & -H \end{pmatrix} \]
is a Hadamard matrix of
order 2m. So for the existence problem, it suffices to consider only orders m ≡ 4 mod 8,
i.e. m = 4n where n is odd; and we will often confine our attention to this case. The
doubling trick we have just mentioned is a special case of
We also mention that if H is Hadamard, then multiplying some of the rows of H by −1,
and multiplying some of the columns by −1, then permuting rows and permuting columns,
results in another Hadamard matrix which is said to be monomially equivalent to H.
Since $H^T$ is also Hadamard, one might also consider the coarser equivalence relation which
allows for transposing of Hadamard matrices. In counting Hadamard matrices, we may be
interested in the raw number of Hadamard matrices of order m (a rapidly growing function
of m), or in the number of equivalence classes of Hadamard matrices of order m (either up to
monomial equivalence, or up to monomial equivalence and transpose). The exact number of
monomial equivalence classes of Hadamard matrices of order m is known only for m ≤ 32.
For m = 4, 8, 12, 16, 20, 24, 28, 32, the number of classes is 1, 1, 1, 5, 3, 60, 487, 13710027.
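The doubling trick and the notion of monomial equivalence can be checked by machine for small orders. The following is a minimal sketch in plain Python; the helper names `kron`, `is_hadamard` and `double` are our own choices for illustration and do not come from any library.

```python
def kron(A, B):
    """Kronecker product of two matrices given as lists of rows."""
    return [[a * b for a in row_a for b in row_b]
            for row_a in A for row_b in B]

def is_hadamard(H):
    """Check that H is a square matrix with entries ±1 and H H^T = m I."""
    m = len(H)
    if any(len(row) != m or any(x not in (1, -1) for x in row) for row in H):
        return False
    return all(sum(x * y for x, y in zip(H[i], H[j])) == (m if i == j else 0)
               for i in range(m) for j in range(m))

def double(H):
    """The doubling trick: [[1, 1], [1, -1]] ⊗ H = [[H, H], [H, -H]]."""
    return kron([[1, 1], [1, -1]], H)

H8 = double(double(double([[1]])))      # orders 1 -> 2 -> 4 -> 8

# A monomially equivalent matrix: negate row 2 and column 5, then swap rows 0 and 7.
K = [row[:] for row in H8]
K[2] = [-x for x in K[2]]
for row in K:
    row[5] = -row[5]
K[0], K[7] = K[7], K[0]

print(is_hadamard(H8), is_hadamard(K))
```

Row and column negations and permutations preserve the relation $HH^T = mI$, which is why the check succeeds for both matrices.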
\[ H = \left(\begin{array}{c|c} + & + \;\; + \;\; \cdots \;\; + \\ \hline - & \\ - & \\ \vdots & I + A \\ - & \end{array}\right) \]
where
(9.4) $A$ is skew symmetric ($A^T = -A$) of size $(m-1) \times (m-1)$ with zeroes on
its main diagonal and entries $\pm 1$ elsewhere; and
(9.5) $AJ = JA = 0$ and $A^2 = J - (m-1)I$ where $I = I_{m-1}$ and $J = J_{m-1}$ are
the identity matrix and all-ones matrix of size $(m-1) \times (m-1)$ respectively.
These conditions are equivalent to the assertion that $A = A_1 - A_1^T$ where $A_1$ is the (0, 1)-
incidence matrix of a symmetric $(4n-1, 2n-1, n-1)$-design (and $A_1^T$ is the incidence matrix
of the dual design with the same parameters). Such a design has order $n$ and is called
a Hadamard 2-design. Of course any $(4n-1, 2n-1, n-1)$-difference set in an abelian
group of order $v = 4n-1$ immediately gives such a design and a Hadamard matrix of
order $4n$. These difference sets are usually said to be of Paley type. But there is conflicting
terminology: some authors, a minority it seems, refer to $(4n-1, 2n-1, n-1)$-difference
sets as Hadamard difference sets. Others reserve this terminology for the parameters
$(4n^2, 2n^2 \pm n, n^2 \pm n)$ discussed below.
The classical construction, due to Paley, uses the additive group of $\mathbb{F}_q$, $q \equiv 3 \bmod 4$,
in which the squares form a $(4n-1, 2n-1, n-1)$-difference set where $n = \frac14(q+1)$; see
Exercise #8.3. After rewriting the group as a multiplicative group $G$ of order $q$, we let
$\alpha, \beta, \kappa \in \mathbb{Z}[G]$ be the elements denoted by
\[ \alpha = \sum_{g \in S} g, \qquad \beta = \sum_{g \in N} g, \qquad \kappa = \sum_{g \in G} g. \]
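As an illustration of Paley's construction, here is a minimal sketch restricted to the case where $q$ is a prime (so that $\mathbb{F}_q$ is just the integers mod $q$; prime powers would need genuine field arithmetic). It builds the bordered matrix with core $I + A$, where $A_{xy} = \chi(y-x)$, and checks conditions on small cases. The function names are hypothetical.

```python
def chi(a, q):
    """Quadratic character of F_q for an odd prime q, with values 0, 1, -1."""
    r = pow(a % q, (q - 1) // 2, q)
    return -1 if r == q - 1 else r

def paley(q):
    """Hadamard matrix of order q + 1, assuming q is a prime with q ≡ 3 mod 4."""
    assert q % 4 == 3
    # A is skew symmetric with zero diagonal because chi(-1) = -1.
    A = [[chi(y - x, q) for y in range(q)] for x in range(q)]
    H = [[1] * (q + 1)]                       # first row + + ... +
    for x in range(q):                        # first column + - - ... -
        H.append([-1] + [A[x][y] + (x == y) for y in range(q)])
    return H

def is_hadamard(H):
    m = len(H)
    return all(sum(a * b for a, b in zip(H[i], H[j])) == (m if i == j else 0)
               for i in range(m) for j in range(m))

def difference_counts(D, q):
    """How often each nonzero residue mod q arises as a difference x - y, x, y in D."""
    counts = [0] * q
    for x in D:
        for y in D:
            if x != y:
                counts[(x - y) % q] += 1
    return counts[1:]

squares = sorted({x * x % 11 for x in range(1, 11)})
print(squares, difference_counts(squares, 11))
print(is_hadamard(paley(11)))
```

For $q = 11$ we have $n = 3$, so every nonzero difference should arise $n - 1 = 2$ times.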
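group of the ring $R := \mathbb{F}_q \oplus \mathbb{F}_{q+2}$. The difference set $D$ consists of pairs $(a, b) \in R$ such
that
• $a$ and $b$ are both nonzero squares in their respective fields; or
• $a$ and $b$ are both nonsquares in their respective fields; or
• $a \in \mathbb{F}_q$ is arbitrary and $b = 0$.
Note that $|G| = q(q+2) = 4n-1$ and $|D| = \frac14(q^2-1) + \frac14(q^2-1) + q = 2n-1$. We write
$G = G_1 \times G_2$ and
\[ \kappa = \sum_{g \in G} g = \kappa_1\kappa_2 \in \mathbb{Q}[G] \quad\text{where}\quad \kappa_i = \sum_{g \in G_i} g \in \mathbb{Q}[G_i]; \]
see the description of group algebras for direct product groups in Section 7. We also let
$\alpha_i, \beta_i \in \mathbb{Q}[G_i]$ correspond to nonzero squares and nonsquares in the respective fields, so
that $\kappa_i = 1 + \alpha_i + \beta_i$ in the notation of Theorem 7.7. Now $\delta = \sum_{d \in D} d = \alpha_1\alpha_2 + \beta_1\beta_2 + \kappa_1$.
To verify that $D$ is a $(4n-1, 2n-1, n-1)$-difference set in $G$, i.e. $\delta\delta^* = n + (n-1)\kappa$, it
suffices by Theorem 8.7 to check that $|\chi(\delta)| = \sqrt{n}$ for every nontrivial $\chi \in \widehat{G}$. Recall that
$\chi = \chi_1 \times \chi_2$ where $\chi_i \in \widehat{G_i}$, and at least one of $\chi_1, \chi_2$ is nontrivial. Again by the comments
of Section 7,
\[ \chi(\delta) = \chi_1(\alpha_1)\chi_2(\alpha_2) + \chi_1(\beta_1)\chi_2(\beta_2) + \chi_1(\kappa_1). \]
If $\chi_1$ is trivial, then $\chi_2$ is nontrivial and
\[ \chi(\delta) = \tfrac{q-1}{2}\chi_2(\alpha_2) + \tfrac{q-1}{2}\chi_2(\beta_2) + q = \tfrac{q-1}{2}(-1) + q = \tfrac{q+1}{2} = \sqrt{n}. \]
If $\chi_2$ is trivial then $\chi_1$ is nontrivial and
\[ \chi(\delta) = \chi_1(\alpha_1)\tfrac{q+1}{2} + \chi_1(\beta_1)\tfrac{q+1}{2} = (-1)\tfrac{q+1}{2} = -\sqrt{n}. \]
In the remaining case, both $\chi_1$ and $\chi_2$ are nontrivial and
\[ \chi_1(\alpha_1) = \tfrac12\bigl(-1 + \varepsilon_1\sqrt{q}\bigr), \qquad \chi_1(\beta_1) = \tfrac12\bigl(-1 - \varepsilon_1\sqrt{q}\bigr), \]
\[ \chi_2(\alpha_2) = \tfrac12\bigl(-1 + \varepsilon_2\sqrt{q+2}\bigr), \qquad \chi_2(\beta_2) = \tfrac12\bigl(-1 - \varepsilon_2\sqrt{q+2}\bigr) \]
where $\varepsilon_1 \in \{1, -1\}$ and $\varepsilon_2 \in \{i, -i\}$ if $q \equiv 1 \bmod 4$; or $\varepsilon_1 \in \{i, -i\}$ and $\varepsilon_2 \in \{1, -1\}$ if
$q \equiv 3 \bmod 4$. Now
\[ \chi(\delta) = \chi_1(\alpha_1)\chi_2(\alpha_2) + \chi_1(\beta_1)\chi_2(\beta_2) = \tfrac12\bigl(1 + \varepsilon_1\varepsilon_2\sqrt{q(q+2)}\bigr) = \tfrac12\bigl(1 \pm i\sqrt{q(q+2)}\bigr) \]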
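The difference set property of $D$ can also be confirmed by direct counting in small cases. The sketch below assumes that $q$ and $q+2$ are both primes (twin primes), so that both fields are prime fields; it writes $G$ additively as pairs $(a, b)$. All function names are ours.

```python
from collections import Counter

def chi(a, p):
    """Quadratic character mod an odd prime p, with values 0, 1, -1."""
    r = pow(a % p, (p - 1) // 2, p)
    return -1 if r == p - 1 else r

def twin_prime_set(q):
    """The set D of pairs (a, b) in F_q ⊕ F_{q+2} described above (q, q+2 prime)."""
    return [(a, b) for a in range(q) for b in range(q + 2)
            if b == 0 or chi(a, q) * chi(b, q + 2) == 1]

def difference_multiplicities(D, q):
    """Count how often each nonzero element of G arises as a difference of elements of D."""
    return Counter(((a - c) % q, (b - d) % (q + 2))
                   for (a, b) in D for (c, d) in D if (a, b) != (c, d))

for q in (3, 5, 11):
    D = twin_prime_set(q)
    n = (q + 1) ** 2 // 4
    mult = difference_multiplicities(D, q)
    print(q, len(D) == 2 * n - 1, set(mult.values()) == {n - 1})
```

Note that the product `chi(a, q) * chi(b, q + 2)` equals 1 exactly when $a$ and $b$ are both nonzero squares or both nonsquares, matching the first two bullet points.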
Williamson-Hadamard Matrices
A Hadamard matrix of order m = 4n is of Williamson type if it has the form
\[ H = \begin{pmatrix} A & B & C & D \\ -B & A & -D & C \\ -C & D & A & -B \\ -D & -C & B & A \end{pmatrix} \]
where A, B, C, D are symmetric circulant matrices of order n which commute with each
other. We recall that an n × n matrix is circulant if each row is a cyclic shift of the
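previous row, with ‘wraparound’. More precisely, the set of all $n \times n$ circulant matrices is
the algebra $\mathbb{Q}[T]$ generated by the $n \times n$ matrix
\[ T = \begin{pmatrix} 0 & 1 & 0 & \cdots & 0 & 0 \\ 0 & 0 & 1 & \cdots & 0 & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & 0 & \cdots & 1 & 0 \\ 0 & 0 & 0 & \cdots & 0 & 1 \\ 1 & 0 & 0 & \cdots & 0 & 0 \end{pmatrix}. \]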
Note that $\{I, T, T^2, \ldots, T^{n-1}\}$ is a basis for $\mathbb{Q}[T]$. For reasons described already, we will
assume $n = 2t+1$ is odd. We have required $A, B, C, D$ to be symmetric circulant matrices;
equivalently, they lie in the subalgebra of $\mathbb{Q}[T]$ having basis $\{I\} \cup \{T^i + T^{-i} : 1 \le i \le t\}$.
Now the condition $HH^T = mI_m$ reduces to the single relation
\[ A^2 + B^2 + C^2 + D^2 = 4nI_n. \]
In the following, the group $G = \{1, g, g^2, \ldots, g^{n-1}\}$ is cyclic of order $n$; and we abbreviate
the elements $\omega_i = g^i + g^{-i} \in \mathbb{Z}[G]$ for $i \in \mathbb{Z}/n\mathbb{Z}$, which are easily seen to satisfy
$\omega_i\omega_j = \omega_{i+j} + \omega_{i-j}$ (noting that all indices here are modulo $n$). As before, we write
\[ \kappa = \sum_{i \in \mathbb{Z}/n\mathbb{Z}} g^i = 1 + \sum_{i=1}^{t} \omega_i \in \mathbb{Z}[G] \quad\text{where } t = \tfrac{n-1}{2}. \]
Theorem 9.6 ([Wi]; see also [BH]). Let $G$ be a cyclic group of order $n = 2t + 1$.
The following conditions are equivalent.
(i) There exists a Hadamard matrix of order $m = 4n = 8t + 4$ of Williamson type.
(ii) There exist elements $\alpha, \beta, \gamma, \delta \in \mathbb{Z}[G]$ of the form
\[ \alpha = 1 + \sum_{i=1}^{t} a_i\omega_i, \quad \beta = 1 + \sum_{i=1}^{t} b_i\omega_i, \quad \gamma = 1 + \sum_{i=1}^{t} c_i\omega_i, \quad \delta = 1 + \sum_{i=1}^{t} d_i\omega_i \]
where $a_i, b_i, c_i, d_i \in \{1, -1\}$, satisfying
\[ \alpha^2 + \beta^2 + \gamma^2 + \delta^2 = 4n. \]
(iii) There exists a partition $\{1, 2, \ldots, t\} = I_1 \sqcup I_2 \sqcup I_3 \sqcup I_4$ (with $I_j$ possibly empty)
and signs $\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_t \in \{1, -1\}$ such that the elements
\[ \tau_j = 1 + 2\sum_{i \in I_j} \varepsilon_i\omega_i \in \mathbb{Z}[G] \]
satisfy $\tau_1^2 + \tau_2^2 + \tau_3^2 + \tau_4^2 = 4n$.
Proof. It is easily verified that for an $m \times m$ matrix $H$ of the form described above, one
has $HH^T = mI_m$ iff $A^2 + B^2 + C^2 + D^2 = 4nI_n$; here we use the fact that $A, B, C, D$ are
commuting symmetric $n \times n$ matrices with entries $\pm 1$. This condition requires that
\[ A = a_0 I + \sum_{i=1}^{t} a_i(T^i + T^{-i}), \quad \ldots, \quad D = d_0 I + \sum_{i=1}^{t} d_i(T^i + T^{-i}) \]
where all coefficients $a_i, b_i, c_i, d_i \in \{1, -1\}$. Without loss of generality, $a_0 = 1$; otherwise
replace $A$ by $-A$ while preserving all necessary conditions. Similarly, $b_0 = c_0 = d_0 = 1$.
The algebra $\mathbb{Q}[T]$ is nothing other than the group algebra of a cyclic group $G$ of order $n$; and
using the generic symbol $g$ for the generator of $G$, the isomorphism takes the form
$\mathbb{Q}[T] \to \mathbb{Q}[G]$, $\sum_{i=0}^{n-1} a_i T^i \mapsto \sum_{i=0}^{n-1} a_i g^i$. Taken together, these facts establish the equivalence of
(i) and (ii).
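Now given $\alpha, \beta, \gamma, \delta$ as in (ii), let us write $\alpha = 2\alpha_0 - \kappa$ where $\alpha_0 = 1 + \sum_{i \in S_1} \omega_i$ and $S_1$
is the set of all $i \in \{1, 2, \ldots, t\}$ such that $a_i = +1$. Similarly write $\beta = 2\beta_0 - \kappa$, $\gamma = 2\gamma_0 - \kappa$,
$\delta = 2\delta_0 - \kappa$. We have
\begin{align*}
\alpha^2 &= 4\alpha_0^2 - 4\alpha_0\kappa + \kappa^2 = 4\alpha_0^2 + (n - 4 - 8|S_1|)\kappa \\
&= 4 + 8|S_1| + 8\sum_{i \in S_1} \omega_i + 4\sum_{i \in S_1} \omega_{2i} + 8\sum_{\substack{i,j \in S_1 \\ i > j}} (\omega_{i+j} + \omega_{i-j}) + (n - 4 - 8|S_1|)\kappa,
\end{align*}
using $\alpha_0\kappa = (1 + 2|S_1|)\kappa$, $\kappa^2 = n\kappa$ and $\omega_i^2 = \omega_{2i} + 2$.
Note that the constant term (i.e. the $g^0$ term) on the right side is $16 + 4(n - 4) = 4n$, in
agreement with the left side. Comparing coefficients of $g^\ell$ on both sides for $1 \le \ell < n$ gives
\[ 0 = 4(n-4) - 8\sum_{k=1}^{4} |S_k| + 8s_\ell + 4s_{2\ell} + 8\sum_{k=1}^{4} \bigl|\{(i, j) \in S_k^2 : i > j,\ |i - \ell| = j\}\bigr| \]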
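equation modulo 8, and recalling that $n$ is odd, we deduce that $s_\ell$ is odd; that is, exactly
three of the terms $a_\ell, b_\ell, c_\ell, d_\ell$ have the same sign, and the other has the opposite sign.
Equivalently (see Exercise #3), the vector
\[ (\tilde a_\ell, \tilde b_\ell, \tilde c_\ell, \tilde d_\ell) = (a_\ell, b_\ell, c_\ell, d_\ell)U, \qquad U = \frac12 \begin{pmatrix} -1 & 1 & 1 & 1 \\ 1 & -1 & 1 & 1 \\ 1 & 1 & -1 & 1 \\ 1 & 1 & 1 & -1 \end{pmatrix} \]
has three zero coordinates and one coordinate $\pm 2$. Setting $(\tau_1, \tau_2, \tau_3, \tau_4) = (\alpha, \beta, \gamma, \delta)U$,
we obtain four elements
\[ \tau_1 = 1 + \sum_{\ell=1}^{t} \tilde a_\ell\omega_\ell, \quad \tau_2 = 1 + \sum_{\ell=1}^{t} \tilde b_\ell\omega_\ell, \quad \tau_3 = 1 + \sum_{\ell=1}^{t} \tilde c_\ell\omega_\ell, \quad \tau_4 = 1 + \sum_{\ell=1}^{t} \tilde d_\ell\omega_\ell \]
in $\mathbb{Z}[G]$ having exactly the form described by (iii). The converse (iii)⇒(ii) follows by
reversing the steps.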
Theorem 9.6 provides the following algorithm, demonstrated in the next two examples,
to construct Hadamard matrices of order $4n \equiv 4 \bmod 8$. Applying the trivial character
$\chi : \mathbb{Z}[G] \to \mathbb{Z}$, $\sum_g a_g g \mapsto \sum_g a_g$ to the relation $\tau_1^2 + \tau_2^2 + \tau_3^2 + \tau_4^2 = 4n$ gives a representation
of $4n$ as a sum of four odd squares $\sum_{k=1}^{4} n_k^2$ where $n_k = \chi(\tau_k) \equiv 1 \bmod 4$ is the trivial
character value of $\tau_k$ (the sum of the integer coefficients of $\tau_k$). We first therefore enumerate
all representations of $4n$ as a sum of four odd squares. Next, we find $\tau_k \in \mathbb{Z}[G]$ of the form
$1 \pm 2\omega_i \pm 2\omega_j \pm \cdots$ with distinct subscripts, whose sums of coefficients give the required
representations of $4n$ as a sum of four squares.
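The first step of this algorithm is easy to automate. The sketch below lists, up to permutation, all ways of writing $4n = n_1^2 + n_2^2 + n_3^2 + n_4^2$ with every $n_k \equiv 1 \bmod 4$ (so the signs are already normalized as above). The function name is our own.

```python
from itertools import combinations_with_replacement

def four_odd_squares(n):
    """Multisets (n1, n2, n3, n4), each n_k ≡ 1 mod 4, with n1^2 + ... + n4^2 = 4n."""
    bound = int((4 * n) ** 0.5) + 1
    # In Python, x % 4 == 1 also holds for negative x such as -3 and -7.
    candidates = [x for x in range(-bound, bound + 1) if x % 4 == 1]
    return [c for c in combinations_with_replacement(candidates, 4)
            if sum(x * x for x in c) == 4 * n]

print(four_odd_squares(5))   # n = 5, order 20
print(four_odd_squares(9))   # n = 9, order 36
```

For $n = 5$ the only representation found is $20 = (-3)^2 + (-3)^2 + 1^2 + 1^2$, and for $n = 9$ there are two, $(-3)^2 + (-3)^2 + (-3)^2 + (-3)^2$ and $(-3)^2 + 1^2 + 1^2 + 5^2$.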
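Example 9.7: A Hadamard matrix of Williamson type of order 20. Let $G = \langle g \rangle$ be cyclic
of order $n = 5$. The unique representation of 20 as a sum of four odd squares (unique, that is, up
to permutation of the four terms) is $20 = 1^2 + 1^2 + (-3)^2 + (-3)^2$. This gives a unique solution
(again, up to permutation) of the relations (iii) in Theorem 9.6, namely $\tau_1 = \tau_2 = 1$, $\tau_3 = 1 - 2\omega_1$,
$\tau_4 = 1 - 2\omega_2$ where $\omega_1 = g + g^4$ and $\omega_2 = g^2 + g^3$. We check that this unique feasible choice does
indeed satisfy (iii):
\begin{align*}
\tau_1^2 + \tau_2^2 + \tau_3^2 + \tau_4^2 &= 1 + 1 + (1 - 2\omega_1)^2 + (1 - 2\omega_2)^2 = 4 - 4\omega_1 - 4\omega_2 + 4\omega_1^2 + 4\omega_2^2 \\
&= 4 - 4\omega_1 - 4\omega_2 + 4(\omega_2 + 2) + 4(\omega_1 + 2) = 20.
\end{align*}
We obtain
\[ (\alpha, \beta, \gamma, \delta) = (\tau_1, \tau_2, \tau_3, \tau_4)U = (1 - \omega_1 - \omega_2,\ 1 - \omega_1 - \omega_2,\ 1 + \omega_1 - \omega_2,\ 1 - \omega_1 + \omega_2) \]
which yields
\[ A = B = \begin{pmatrix} + & - & - & - & - \\ - & + & - & - & - \\ - & - & + & - & - \\ - & - & - & + & - \\ - & - & - & - & + \end{pmatrix}, \quad
C = \begin{pmatrix} + & + & - & - & + \\ + & + & + & - & - \\ - & + & + & + & - \\ - & - & + & + & + \\ + & - & - & + & + \end{pmatrix}, \quad
D = \begin{pmatrix} + & - & + & + & - \\ - & + & - & + & + \\ + & - & + & - & + \\ + & + & - & + & - \\ - & + & + & - & + \end{pmatrix}. \]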
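The matrices of Example 9.7 can be assembled and tested mechanically. The following sketch builds each circulant from its first row (the coefficients of $1, g, g^2, g^3, g^4$ in $\alpha$, $\gamma$, $\delta$), places them in the Williamson array, and checks $HH^T = 20I$. The helper names are ours.

```python
def circulant(first_row):
    """Circulant matrix sum_k a_k T^k, whose (i, j) entry is a_{(j - i) mod n}."""
    n = len(first_row)
    return [[first_row[(j - i) % n] for j in range(n)] for i in range(n)]

def williamson(A, B, C, D):
    """Assemble the 4n x 4n Williamson array from n x n blocks."""
    neg = lambda M: [[-x for x in row] for row in M]
    blocks = [[A, B, C, D],
              [neg(B), A, neg(D), C],
              [neg(C), D, A, neg(B)],
              [neg(D), neg(C), B, A]]
    n = len(A)
    return [sum((blk[i] for blk in block_row), [])
            for block_row in blocks for i in range(n)]

def is_hadamard(H):
    m = len(H)
    return all(sum(x * y for x, y in zip(H[i], H[j])) == (m if i == j else 0)
               for i in range(m) for j in range(m))

A = circulant([1, -1, -1, -1, -1])   # alpha = beta = 1 - w1 - w2
C = circulant([1, 1, -1, -1, 1])     # gamma = 1 + w1 - w2
D = circulant([1, -1, 1, 1, -1])     # delta = 1 - w1 + w2
H = williamson(A, A, C, D)
print(len(H), is_hadamard(H))
```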
The same general strategy is used in the next example; except that with $n = 9$ in
the following example, $n$ is not prime (compare with $n = 5$ in the previous example) so a
little more work is required. The required condition $\sum_{k=1}^{4} \tau_k^2 = 4n$ in $\mathbb{Z}[G]$ is equivalent
to $\sum_{k=1}^{4} \chi(\tau_k)^2 = 4n$ as $\chi \in \widehat{G}$ ranges over a set of representatives of characters of orders
dividing $n$ (see Corollary 7.3). Thus in Example 9.8 we consider characters of order 1,
3 and then 9, in that order, refining our choices of the τk ’s with each step. While these
examples are intended to provide a taste of how this strategy might work in general, the
reader should keep in mind that for all but the smallest values of n, this strategy is typically
implemented by computer.
Example 9.8: Hadamard Matrices of order 36 of Williamson Type. Let $G = \langle g \rangle$ be cyclic of
order $n = 9$. This group has an automorphism mapping $g \mapsto g^2$ which cycles $\omega_1 \mapsto \omega_2 \mapsto \omega_4 \mapsto \omega_1$
while fixing $\omega_3 \mapsto \omega_3$, thereby reducing the list of equivalence classes of the resulting Hadamard
matrices. Representations of 36 as a sum of four odd squares include $(-3)^2 + (-3)^2 + (-3)^2 + (-3)^2$
and $1^2 + 1^2 + (-3)^2 + 5^2$, both of which lead to solutions of (iii). Here we consider only solutions of the
second type such that $(\tau_1, \tau_2, \tau_3, \tau_4)$ has the form $(1,\ 1 - 2\omega_i + 2\omega_j,\ 1 - 2\omega_k,\ 1 + 2\omega_\ell)$ where $\{i, j, k, \ell\} =
\{1, 2, 3, 4\}$; and Exercise #4 covers the remaining cases not treated here. We require that
(*) $1^2 + (1 - 2\omega_i + 2\omega_j)^2 + (1 - 2\omega_k)^2 + (1 + 2\omega_\ell)^2 = 36$; or equivalently,
(**) $\chi(1)^2 + \chi(1 - 2\omega_i + 2\omega_j)^2 + \chi(1 - 2\omega_k)^2 + \chi(1 + 2\omega_\ell)^2 = 36$ for all $\chi \in \widehat{G}$.
The pattern chosen for the $\tau_k$’s ensures that the trivial character satisfies (**) a fortiori; and before
dealing with all 24 choices of indices in (*), we find that most of these cases can be eliminated readily
using (**). Take now the character $\chi$ of order 3 with $\chi(g) = \zeta = \zeta_3$, so that $\chi(\omega_i) = \zeta + \zeta^2 = -1$ for
$i = 1, 2, 4$; $\chi(\omega_3) = 2$. Thus
(a) $\chi(1 - 2\omega_k)^2 = 9$ for $k = 1, 2, 3, 4$;
(b) $\chi(1 + 2\omega_\ell)^2 = \begin{cases} 1, & \text{for } \ell = 1, 2, 4; \\ 25, & \text{for } \ell = 3; \end{cases}$
(c) $\chi(1 - 2\omega_i + 2\omega_j)^2 = \begin{cases} 49, & \text{for } i \ne j = 3; \\ 25, & \text{for } i = 3 \ne j; \\ 1, & \text{for } i, j, 3 \text{ distinct.} \end{cases}$
We clearly require either $i = 3$ or $\ell = 3$. If $i = 3$ then $(i, j, k, \ell) = (3, 1, 2, 4)$ or $(3, 1, 4, 2)$ (or four
other possibilities equivalent to these using the 3-cycle mentioned at the outset); but none of these
cases satisfy (*). Indeed we may directly compute
\begin{align*}
1 &+ (1 - 2\omega_3 + 2\omega_1)^2 + (1 - 2\omega_2)^2 + (1 + 2\omega_4)^2 \\
&= 1 + (1 + 4\omega_3^2 + 4\omega_1^2 - 4\omega_3 + 4\omega_1 - 8\omega_3\omega_1) + (1 - 4\omega_2 + 4\omega_2^2) + (1 + 4\omega_4 + 4\omega_4^2) \\
&= 1 + 1 + 4(\omega_3 + 2) + 4(\omega_2 + 2) - 4\omega_3 + 4\omega_1 - 8(\omega_4 + \omega_2) \\
&\qquad + 1 - 4\omega_2 + 4(\omega_4 + 2) + 1 + 4\omega_4 + 4(\omega_1 + 2) \\
&= 36 + 8\omega_1 - 8\omega_2 \ne 36
\end{align*}
and similarly for $(i, j, k, \ell) = (3, 1, 4, 2)$. If $\ell = 3$ then up to equivalence we have $(i, j, k, \ell) =
(1, 2, 4, 3)$ or $(2, 1, 4, 3)$. More direct computation rules out the first of these cases; but the choice
$(i, j, k, \ell) = (2, 1, 4, 3)$ is found to satisfy (*). So $(\tau_1, \tau_2, \tau_3, \tau_4) = (1,\ 1 - 2\omega_2 + 2\omega_1,\ 1 - 2\omega_4,\ 1 + 2\omega_3)$
gives a Hadamard matrix of order 36 of Williamson type.
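As remarked above, searches of this kind are typically run by computer. Here is a minimal sketch which tests condition (*) of Example 9.8 for all 24 index assignments, representing an element of $\mathbb{Z}[G]$ by its list of nine coefficients of $1, g, \ldots, g^8$. The names `mul`, `elem` and `satisfies_star` are hypothetical.

```python
from itertools import permutations

N = 9  # order of the cyclic group G

def mul(a, b):
    """Product in Z[G]: cyclic convolution of coefficient lists."""
    c = [0] * N
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            c[(i + j) % N] += x * y
    return c

def elem(const, terms):
    """const + sum of coeff * omega_i over (coeff, i) in terms, omega_i = g^i + g^-i."""
    v = [0] * N
    v[0] = const
    for coeff, i in terms:
        v[i % N] += coeff
        v[-i % N] += coeff
    return v

def satisfies_star(i, j, k, l):
    taus = [elem(1, []),
            elem(1, [(-2, i), (2, j)]),
            elem(1, [(-2, k)]),
            elem(1, [(2, l)])]
    total = [sum(col) for col in zip(*(mul(t, t) for t in taus))]
    return total == [36] + [0] * (N - 1)

solutions = [p for p in permutations((1, 2, 3, 4)) if satisfies_star(*p)]
print(solutions)
```

The search should return $(2, 1, 4, 3)$ together with its two images $(4, 2, 1, 3)$ and $(1, 4, 2, 3)$ under the automorphism $g \mapsto g^2$, in agreement with the hand computation.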
Theorem 9.9 ([Wh]). For every prime power $q \equiv 1 \bmod 4$ there exists a Hadamard
matrix of order $4n = 2(q + 1)$ of Williamson type.
Proof. Take $n = \frac12(q + 1)$ throughout, and note that $n$ is odd. Consider the quadratic
extension $E \supset F$ of fields of order $q^2$ and $q$ respectively. Let $\chi : F \to \{0, \pm 1\}$ be the
quadratic character. Choose a generator $\omega$ for the multiplicative group $E^\times$, so that $\omega_1 =
\omega^{q+1}$ is a generator of $F^\times$. We abbreviate $\mathrm{Tr} = \mathrm{Tr}_{E/F} : x \mapsto x^q + x$ throughout. We
define the sequence $u_k = \chi(\mathrm{Tr}\,\omega^k)$ for $k \in \mathbb{Z}$. Recall that the quadratic character $\chi$
satisfies $\chi(\omega_1^k) = (-1)^k$ and so $u_{k+2n} = \chi(\mathrm{Tr}\,\omega^{k+2n}) = \chi(\mathrm{Tr}(\omega_1\omega^k)) = \chi(\omega_1 \mathrm{Tr}\,\omega^k) =
\chi(\omega_1)\chi(\mathrm{Tr}\,\omega^k) = -u_k$. Thus
(9.10) $u_{k+2n} = -u_k$ for all $k$. In particular, the value of $u_k$ depends only on
$k \bmod 4n$.
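\[ {} = \begin{cases} q(q-1)\chi(c), & \text{if } d = 0; \\ 0, & \text{if } d \ne 0. \end{cases} \]
Using the periodicity relation (9.10) and the fact that (9.13) vanishes at $z = 0$, the sum
(9.13) simplifies to
\[ 2(q-1)\sum_{k=0}^{n-1} \chi(\mathrm{Tr}\,\omega^k)\,\chi(\mathrm{Tr}(\omega^{k+r})) = \begin{cases} q(q-1)\chi(c), & \text{if } d = 0; \\ 0, & \text{if } d \ne 0. \end{cases} \]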
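\[ a_k = \begin{cases} 1, & \text{if } k \equiv 0 \bmod n; \\ u_{4k+n}, & \text{otherwise}; \end{cases} \qquad
b_k = \begin{cases} 1, & \text{if } k \equiv 0 \bmod n; \\ -u_{4k+n}, & \text{otherwise}; \end{cases} \qquad
c_k = d_k = u_{4k} \text{ for all } k. \]
Note by (9.12) that all the values $a_k, b_k, c_k, d_k \in \{\pm 1\}$. Now
\[ A^2 + B^2 + C^2 + D^2 = \sum_{r=0}^{n-1} \sum_{k=0}^{n-1} (a_k a_{k+r} + b_k b_{k+r} + c_k c_{k+r} + d_k d_{k+r})\, T^r. \]
All the diagonal entries (for $r = 0$) are clearly $4n$; and for $r \ne 0$ we have
\[ \sum_{k=0}^{n-1} (c_k c_{k+r} + d_k d_{k+r}) = 2\sum_{k=0}^{n-1} u_{4k} u_{4k+4r} = 2\sum_{\ell=0}^{n-1} u_\ell u_{\ell+r} = 0 \]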
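Example 9.15: A Hadamard Matrix of order 12. Here we demonstrate Theorem 9.9 for $q = 5$,
$n = \frac12(q+1) = 3$. Take $\mathbb{F}_{25} = \mathbb{F}_5[\omega]$ where the primitive element $\omega$ satisfies $\omega^2 = \omega - 2$ (the polynomial
$x^2 - x + 2$ is irreducible over $\mathbb{F}_5$, whereas $x^2 - x - 2 = (x-2)(x+1)$ is not). The sequence
$u_0, u_1, u_2, \ldots$ is $-+-0+++-+0---+-0+++-+0--\ -+-\ldots$, giving $A = I - T - T^2$, $B = I + T + T^2$,
$C = D = -I + T + T^2$. The construction of Theorem 9.9 gives
\[ H = \begin{pmatrix} A & B & C & D \\ -B & A & -D & C \\ -C & D & A & -B \\ -D & -C & B & A \end{pmatrix} =
\left(\begin{array}{c|c|c|c}
\mathtt{+--} & \mathtt{+++} & \mathtt{-++} & \mathtt{-++} \\
\mathtt{-+-} & \mathtt{+++} & \mathtt{+-+} & \mathtt{+-+} \\
\mathtt{--+} & \mathtt{+++} & \mathtt{++-} & \mathtt{++-} \\ \hline
\mathtt{---} & \mathtt{+--} & \mathtt{+--} & \mathtt{-++} \\
\mathtt{---} & \mathtt{-+-} & \mathtt{-+-} & \mathtt{+-+} \\
\mathtt{---} & \mathtt{--+} & \mathtt{--+} & \mathtt{++-} \\ \hline
\mathtt{+--} & \mathtt{-++} & \mathtt{+--} & \mathtt{---} \\
\mathtt{-+-} & \mathtt{+-+} & \mathtt{-+-} & \mathtt{---} \\
\mathtt{--+} & \mathtt{++-} & \mathtt{--+} & \mathtt{---} \\ \hline
\mathtt{+--} & \mathtt{+--} & \mathtt{+++} & \mathtt{+--} \\
\mathtt{-+-} & \mathtt{-+-} & \mathtt{+++} & \mathtt{-+-} \\
\mathtt{--+} & \mathtt{--+} & \mathtt{+++} & \mathtt{--+}
\end{array}\right). \]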
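The sequence $u_k$ and the resulting matrix in Example 9.15 can be recomputed from scratch. The sketch below represents $a + b\omega \in \mathbb{F}_{25}$ as a pair `(a, b)` and assumes the defining relation $\omega^2 = \omega - 2$, which is the choice of primitive element that reproduces the sequence printed in the example. All function names are ours.

```python
P = 5

def mul(x, y):
    """(a + b w)(c + d w) in F_25 = F_5[w], assuming w^2 = w - 2."""
    (a, b), (c, d) = x, y
    return ((a * c - 2 * b * d) % P, (a * d + b * c + b * d) % P)

def power(x, e):
    r = (1, 0)
    for _ in range(e):
        r = mul(r, x)
    return r

def trace(x):
    """Tr(x) = x^5 + x, which lies in F_5."""
    y = power(x, P)
    assert (x[1] + y[1]) % P == 0
    return (x[0] + y[0]) % P

def chi(a):
    """Quadratic character of F_5: the nonzero squares are 1 and 4."""
    return 0 if a % P == 0 else (1 if a % P in (1, 4) else -1)

def circulant(first_row):
    n = len(first_row)
    return [[first_row[(j - i) % n] for j in range(n)] for i in range(n)]

def williamson(A, B, C, D):
    neg = lambda M: [[-x for x in row] for row in M]
    blocks = [[A, B, C, D], [neg(B), A, neg(D), C],
              [neg(C), D, A, neg(B)], [neg(D), neg(C), B, A]]
    return [sum((blk[i] for blk in br), []) for br in blocks for i in range(len(A))]

def is_hadamard(H):
    m = len(H)
    return all(sum(x * y for x, y in zip(H[i], H[j])) == (m if i == j else 0)
               for i in range(m) for j in range(m))

n = 3
w = (0, 1)
u = [chi(trace(power(w, k))) for k in range(4 * n)]
a = [1] + [u[4 * k + n] for k in range(1, n)]
b = [1] + [-u[4 * k + n] for k in range(1, n)]
c = [u[4 * k] for k in range(n)]           # c_k = d_k = u_{4k}
H = williamson(circulant(a), circulant(b), circulant(c), circulant(c))
print(u, is_hadamard(H))
```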
Indeed, of the fourteen groups of order 16, all but two (the cyclic and dihedral groups)
admit (16, 6, 2)-difference sets in multiple ways. However, different groups and distinct
difference sets in these groups can yield isomorphic designs. This is because a given design
may admit more than one regular (i.e. sharply transitive) group of automorphisms; recall
Theorem 8.5. What is true, however, is that the three nonisomorphic symmetric (16, 6, 2)-
designs yield three inequivalent regular Hadamard matrices. We describe just one of these,
beginning with the design.
verifying that D is a difference set in G with parameters (16, 6, 2). If we let K1 act regularly
on the rows of the grid, and K2 regularly on the columns, then G = K1 × K2 acts regularly
on the cells (i.e. points of the design) and D is the set of all g ∈ G mapping P into B,
for some choice of point P and block B. It is not hard to see that the resulting regular
Hadamard matrix is H4 ⊗ H4 in the notation of the previous paragraph.
$4n^2 = 16$, we noted above that there is no (16, 6, 2)-difference set in a cyclic group of
order 16. This result, long known, is attributed to Turyn.
Theorem 9.16. There is no cyclic difference set with parameters (16, 6, 2), and
hence no circulant Hadamard matrix of order 16.
Proof. Let $G = \{1, g, g^2, \ldots, g^{15}\}$ be cyclic of order 16, and suppose there exists a difference
set $D \subset G$ with parameters (16, 6, 2). Then $\delta = \sum_{d \in D} d \in \mathbb{Z}[G]$ is a sum of six distinct
elements in $G$ satisfying $\delta\delta^* = 4 + 2\kappa$ where $\kappa = \sum_{i=0}^{15} g^i$. Let $\mathcal{O} = \mathbb{Z}[\zeta]$, $\zeta = \zeta_{16}$ and consider
the character $\chi \in \widehat{G}$ of order 16 satisfying $\chi(g) = \zeta$. Denote $\alpha = \chi(\delta) = \sum_{d \in D} \chi(d) \in \mathcal{O}$.
By Theorem 8.7, $|\alpha| = 2$. (Note here that $(v, k, r) = (16, 6, 2)$ and $n = 6 - 2 = 4$.) Much
more than this, for every $\sigma \in \mathrm{Aut}\,\mathbb{Q}[\zeta]$, we have $|\sigma(\alpha)| = 2$, since $\sigma \circ \chi : G \to \langle\zeta\rangle$ is also a
nontrivial character of $G$.
Evaluating both sides of
\[ \Phi_{16}(x) = x^8 + 1 = (x - \zeta)(x - \zeta^3)(x - \zeta^5) \cdots (x - \zeta^{15}) \]
at 1 yields
\[ 2 = (1 - \zeta)(1 - \zeta^3)(1 - \zeta^5) \cdots (1 - \zeta^{15}). \]
By Theorem 4.3, all eight factors in the latter product are associates of $\varepsilon = 1 - \zeta$ in $\mathcal{O}$,
yielding the factorization of principal ideals $(2) = (\varepsilon)^8$, so $N((\varepsilon)) = 2$ and the ideal
$(\varepsilon) \subset \mathcal{O}$ is prime, the only distinct prime factor of (2) in $\mathcal{O}$ (i.e. the prime 2 ramifies
in $\mathbb{Q}[\zeta]$). Also $(\alpha)(\overline{\alpha}) = (4) = (\varepsilon)^{16}$, so comparing prime factors on both sides gives
$(\alpha) = (\varepsilon)^r$ and $(\overline{\alpha}) = (\varepsilon)^s$ where $r + s = 16$. Now $2^r = N((\alpha)) = N((\overline{\alpha})) = 2^s$ so
$r = s$ and $(\alpha) = (\overline{\alpha}) = (\varepsilon)^8 = (2)$. Since $\alpha$ and 2 are associates, $\alpha = 2u$ for some unit
$u \in \mathcal{O}^\times$. Recalling that $|\sigma(\alpha)| = 2$ for all $\sigma \in \mathrm{Aut}\,\mathbb{Q}[\zeta]$, we must have $|\sigma(u)| = 1$ for all
$\sigma \in \mathrm{Aut}\,\mathbb{Q}[\zeta]$. By Theorem 4.10, $u$ is a root of unity in $\mathcal{O}$. By Theorem 4.2, $u = \zeta^k$ for
some $k \in \{0, 1, 2, \ldots, 15\}$. Without loss of generality, $\alpha = 2$; otherwise use Theorem 8.6
to replace $D$ by $g^{-k}D$, another (equivalent) difference set in $G$ with the same parameters
(16, 6, 2) satisfying $\chi(g^{-k}\delta) = \zeta^{-k}\alpha = 2$.
Now express $\delta$ in the form $\delta = \sum_{k=0}^{15} a_k g^k$ where $a_k \in \{0, 1\}$ with $\sum_{k=0}^{15} a_k = 6$, and
note that $\zeta^8 = -1$ to get
\[ (9.17) \qquad 2 = \alpha = \chi(\delta) = \sum_{k=0}^{15} a_k\zeta^k = \sum_{k=0}^{7} (a_k - a_{k+8})\zeta^k. \]
Since the minimal polynomial of $\zeta$ over $\mathbb{Q}$ is $x^8 + 1$, (9.17) requires $a_0 = 2 + a_8$ and $a_k = a_{k+8}$
for $k = 1, 2, \ldots, 7$. Since $a_k \in \{0, 1\}$, this is impossible.
Although Q[ζ16 ] is known to be a UFD, we did not require this in the latter proof; all our
factorizations were with regard to ideals.
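For so small a case, the conclusion of Theorem 9.16 can also be confirmed by exhaustive search, since there are only $\binom{16}{6} = 8008$ candidate subsets. The sketch below writes the cyclic group additively as the integers mod $v$; the function names are our own.

```python
from itertools import combinations

def is_cyclic_difference_set(D, v, lam):
    """True if every nonzero residue mod v arises exactly lam times as x - y, x, y in D."""
    counts = [0] * v
    for x in D:
        for y in D:
            if x != y:
                counts[(x - y) % v] += 1
    return all(c == lam for c in counts[1:])

def cyclic_difference_sets(v, k, lam):
    """All k-subsets of Z/vZ that are (v, k, lam)-difference sets (brute force)."""
    return [D for D in combinations(range(v), k)
            if is_cyclic_difference_set(D, v, lam)]

print(len(cyclic_difference_sets(16, 6, 2)))   # no cyclic (16, 6, 2)-difference set
print((1, 2, 4) in cyclic_difference_sets(7, 3, 1))
```

The second line is only a sanity check that the search does find difference sets when they exist, here the squares mod 7. Of course the brute-force approach says nothing about larger orders, where the cyclotomic argument above is what generalizes.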
10. Quadratic Reciprocity
Exercises 9.
1. Show that every Hadamard matrix of the form $H = I + S$ where $S^T = -S$ with first row
+++· · ·+ and first column +−−· · ·− satisfies conditions (9.4) and (9.5). (Not much more needs
to be said about (9.4), but (9.5) requires some explanation.)
so do we. (But we will have nothing more to say about the Jacobi symbol in these notes.)
As with the quadratic character on F , several other properties of F underlie basic
properties of the ring Z. For example Fermat’s Little Theorem, in the form
(10.1) $a^{p-1} \equiv 1 \bmod p$ for every integer $a \not\equiv 0 \bmod p$, $p$ prime,
is a direct consequence of the fact that $F^\times$ is a group of order $p - 1$. And from (10.1) we
immediately obtain the equivalent form of Fermat’s Little Theorem,
(10.2) $a^p \equiv a \bmod p$ for every integer $a$, where $p$ is any prime.
Also the formula $\chi(a) = a^{(p-1)/2}$ in $F$, restated in the context of rational integers, gives
Euler’s Criterion:
(10.3) $\left(\frac{a}{p}\right) \equiv a^{(p-1)/2} \bmod p$ for every integer $a$.
We restate the multiplicativity of the Legendre symbol, which follows either from facts
about $F$ or from (10.3):
(10.4) $\left(\frac{ab}{p}\right) = \left(\frac{a}{p}\right)\left(\frac{b}{p}\right)$ for all integers $a, b$.
Part (c) is properly known as the Law of Quadratic Reciprocity; and we include (a)
and (b) for completeness. There are currently hundreds of proofs of this result known—
probably more proofs than any result other than the Theorem of Pythagoras. A current
census lists 246 distinct proofs:
http://www.rzuser.uni-heidelberg.de/~hb3/fchrono.html
Gauss himself gave many different proofs; and the proof we give here is his sixth proof,
although it has been rediscovered by many others since then. Before proving this result,
we demonstrate its utility for computing specific values of the Legendre symbol:
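Example 10.6: Computing the Legendre symbol using the Law of Quadratic Reciprocity.
Evaluate each of the values $\left(\frac{a}{331}\right)$ for $a = 83$, 101 and 146, noting that 331 is prime.
Solutions:
\[ \left(\tfrac{83}{331}\right) = -\left(\tfrac{331}{83}\right) = -\left(\tfrac{82}{83}\right) = -\left(\tfrac{-1}{83}\right) = -(-1) = 1. \]
\[ \left(\tfrac{101}{331}\right) = \left(\tfrac{331}{101}\right) = \left(\tfrac{28}{101}\right) = \left(\tfrac{2}{101}\right)^2\left(\tfrac{7}{101}\right) = \left(\tfrac{7}{101}\right) = \left(\tfrac{101}{7}\right) = \left(\tfrac{3}{7}\right) = -\left(\tfrac{7}{3}\right) = -\left(\tfrac{1}{3}\right) = -1. \]
\[ \left(\tfrac{146}{331}\right) = \left(\tfrac{2}{331}\right)\left(\tfrac{73}{331}\right) = (-1)\left(\tfrac{331}{73}\right) = -\left(\tfrac{39}{73}\right) = -\left(\tfrac{3}{73}\right)\left(\tfrac{13}{73}\right) = -\left(\tfrac{73}{3}\right)\left(\tfrac{73}{13}\right) = -\left(\tfrac{1}{3}\right)\left(\tfrac{8}{13}\right) = -\left(\tfrac{2}{13}\right)^3 = -(-1)^3 = 1. \]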
The utility of the Law of Quadratic Reciprocity for computing values of the Legendre
symbol is primarily for using hand computation with small examples such as these. For
larger integers, Euler’s Criterion (10.3) is much easier to implement by computer; and it
avoids the difficulty of having to perform integer factorization on larger numbers (which
is prohibitive for integers of hundreds of digits). But of course, the Law of Quadratic
Reciprocity has many uses beyond such numerical examples as these.
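To illustrate how little code Euler's Criterion (10.3) requires, here is a minimal sketch using Python's built-in three-argument `pow` for modular exponentiation. The function name `legendre` is ours, and the modulus is assumed to be an odd prime (this is not checked).

```python
def legendre(a, p):
    """Legendre symbol (a/p) via Euler's Criterion, assuming p is an odd prime."""
    r = pow(a % p, (p - 1) // 2, p)     # r is 0, 1 or p - 1
    return r - p if r == p - 1 else r

# The three values of Example 10.6.
print([legendre(a, 331) for a in (83, 101, 146)])

# The same function works unchanged for large primes, with no factoring required.
M127 = 2**127 - 1                       # a Mersenne prime, M127 ≡ 7 mod 8
print(legendre(2, M127), legendre(-1, M127))
```

Since $2^{127} - 1 \equiv 7 \bmod 8$, the last line should agree with parts (a) and (b) of the Law: 2 is a square and $-1$ is a nonsquare mod this prime.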
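Example 10.7: Factoring quadratic polynomials mod p. Factor each of the polynomials
$x^2 + 13x + 17$ and $3x^2 + 13x + 16$ in $\mathbb{F}_{823}[x]$.
Solution: $x^2 + 13x + 17$ has discriminant $13^2 - 4 \cdot 17 = 101$ where
\[ \left(\tfrac{101}{823}\right) = \left(\tfrac{823}{101}\right) = \left(\tfrac{15}{101}\right) = \left(\tfrac{3}{101}\right)\left(\tfrac{5}{101}\right) = \left(\tfrac{101}{3}\right)\left(\tfrac{101}{5}\right) = \left(\tfrac{2}{3}\right)\left(\tfrac{1}{5}\right) = (-1)(1) = -1. \]
This polynomial has no roots in $\mathbb{F}_{823}$ since its discriminant is a nonsquare; so it is irreducible in
$\mathbb{F}_{823}[x]$.
The polynomial $3x^2 + 13x + 16$ has discriminant $13^2 - 4 \cdot 3 \cdot 16 = -23$ where
\[ \left(\tfrac{-23}{823}\right) = \left(\tfrac{-1}{823}\right)\left(\tfrac{23}{823}\right) = (-1)\left[-\left(\tfrac{823}{23}\right)\right] = \left(\tfrac{18}{23}\right) = \left(\tfrac{2}{23}\right)\left(\tfrac{3}{23}\right)^2 = (1)(1)^2 = 1. \]
Since the discriminant is a nonzero square in $\mathbb{F}_{823}$, there are two distinct roots mod 823. Other
computational methods (see Exercise #3) confirm that $\frac16(-13 \pm \sqrt{-23}) = 533, 560$ are the two roots
in $\mathbb{F}_{823}$, yielding the factorization $3x^2 + 13x + 16 = 3(x - 533)(x - 560) = 3(x + 290)(x + 263)$.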
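The conclusions of Example 10.7 are easily double-checked. The sketch below tests the two discriminants by Euler's Criterion and then simply tries every element of $\mathbb{F}_{823}$ as a root; brute force is used here only because $p$ is small, and it deliberately avoids the square root method developed in Exercise #3. The helper names are ours.

```python
def legendre(a, p):
    """Legendre symbol (a/p) via Euler's Criterion, assuming p is an odd prime."""
    r = pow(a % p, (p - 1) // 2, p)
    return r - p if r == p - 1 else r

def roots_mod_p(coeffs, p):
    """Roots in F_p of the polynomial with coefficients [c_n, ..., c_1, c_0]."""
    def value(x):
        v = 0
        for c in coeffs:                 # Horner's rule mod p
            v = (v * x + c) % p
        return v
    return [x for x in range(p) if value(x) == 0]

p = 823
disc1 = 13 * 13 - 4 * 17                 # discriminant of x^2 + 13x + 17
disc2 = 13 * 13 - 4 * 3 * 16             # discriminant of 3x^2 + 13x + 16
print(legendre(disc1, p), roots_mod_p([1, 13, 17], p))
print(legendre(disc2, p), roots_mod_p([3, 13, 16], p))
```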
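Our proof of Theorem 10.5 will require some preparation. First note that
\[ (10.8) \qquad \sum_{k=0}^{p-1} \left(\frac{k}{p}\right) = \left(\frac{0}{p}\right) + \left(\frac{1}{p}\right) + \left(\frac{2}{p}\right) + \cdots + \left(\frac{p-1}{p}\right) = 0 \]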
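thus for example $S = \zeta - \zeta^2 - \zeta^3 + \zeta^4$ when $p = 5$. Sums of the form (10.10) are called
quadratic Gauss sums. Gauss himself proved
Lemma 10.11. $S^2 = \left(\dfrac{-1}{p}\right) p = \begin{cases} p, & \text{if } p \equiv 1 \bmod 4; \\ -p, & \text{if } p \equiv 3 \bmod 4. \end{cases}$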
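Proof. We expand
\begin{align*}
S^2 &= \sum_{k=1}^{p-1} \sum_{\ell=0}^{p-1} \left(\frac{k}{p}\right)\zeta^k \left(\frac{\ell}{p}\right)\zeta^\ell \\
&= \sum_{k=1}^{p-1} \sum_{\ell=0}^{p-1} \left(\frac{k}{p}\right)\left(\frac{\ell}{p}\right)\zeta^{k+\ell} \\
&= \sum_{k=1}^{p-1} \sum_{\ell=0}^{p-1} \left(\frac{k\ell}{p}\right)\zeta^{k+\ell} && \text{(by (10.4))} \\
&= \sum_{k=1}^{p-1} \sum_{m=0}^{p-1} \left(\frac{k^2 m}{p}\right)\zeta^{k+km} && \text{(substituting } \ell = km\text{)} \\
&= \sum_{k=1}^{p-1} \sum_{m=0}^{p-1} \left(\frac{m}{p}\right)\zeta^{(1+m)k} \\
&= \sum_{m=0}^{p-1} \left(\frac{m}{p}\right) \sum_{k=1}^{p-1} \zeta^{(1+m)k}.
\end{align*}
Now if $m \ne p-1$ then the inner sum $\sum_{k=1}^{p-1} \zeta^{(1+m)k} = -1$ by (10.9), whereas if $m = p-1$ we
have $\sum_{k=1}^{p-1} \zeta^{(1+m)k} = \sum_{k=1}^{p-1} 1 = p-1$. This leaves
\begin{align*}
S^2 &= -\sum_{m=0}^{p-2} \left(\frac{m}{p}\right) + \left(\frac{p-1}{p}\right)(p-1) \\
&= \left(\frac{p-1}{p}\right) + \left(\frac{p-1}{p}\right)(p-1) && \text{(by (10.8))} \\
&= \left(\frac{p-1}{p}\right) p.
\end{align*}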
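Lemma 10.11 can be confirmed for small primes with exact integer arithmetic, avoiding floating point altogether. In the sketch below an element of $\mathbb{Z}[\zeta_p]$ is stored as its list of coefficients of $1, \zeta, \ldots, \zeta^{p-1}$, and then reduced to the basis $1, \zeta, \ldots, \zeta^{p-2}$ using $1 + \zeta + \cdots + \zeta^{p-1} = 0$. The function names are ours.

```python
def legendre(a, p):
    """Legendre symbol (a/p) via Euler's Criterion, assuming p is an odd prime."""
    r = pow(a % p, (p - 1) // 2, p)
    return -1 if r == p - 1 else r

def gauss_sum_squared(p):
    """Coordinates of S^2 with respect to the basis 1, ζ, ..., ζ^(p-2) of Z[ζ_p]."""
    s = [legendre(k, p) for k in range(p)]          # S = sum_k (k/p) ζ^k
    square = [0] * p
    for i in range(p):
        for j in range(p):
            square[(i + j) % p] += s[i] * s[j]      # ζ^i ζ^j = ζ^(i+j mod p)
    top = square[p - 1]
    # Eliminate ζ^(p-1) = -(1 + ζ + ... + ζ^(p-2)).
    return [x - top for x in square[:p - 1]]

for p in (3, 5, 7, 11, 13):
    print(p, gauss_sum_squared(p))
```

In each case the output should be the constant $\left(\frac{-1}{p}\right)p$ followed by zeros.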
By Lemma 10.11,
\[ S = \begin{cases} \pm\sqrt{p}, & \text{if } p \equiv 1 \bmod 4, \\ \pm i\sqrt{p}, & \text{if } p \equiv 3 \bmod 4. \end{cases} \]
The ambiguous signs in this formula stare us in the face, begging to be resolved. This
is a natural and compelling problem, which perplexed Gauss for years before finally the
answer came to him. As Gauss wrote to a friend in 1805:
The determination of the sign of the root has vexed me for many years. This deficiency over-
shadowed everything that I found: over the last four years, there was rarely a week that I did not
make one or another attempt, unsuccessfully, to untie the knot. I succeeded—but not as a result
of my search but rather, I should say, through the mercy of God. As lightning strikes, the riddle
has solved itself.
The conclusive result, as Gauss showed, has ‘+’ in place of each of the signs ‘±’ above;
however for the purpose of proving the Law of Quadratic Reciprocity, the less definitive
version stated in Lemma 10.11 above suffices.
Before proceeding further, here is our last preparatory result:
This follows from the binomial expansion of $(x + y)^q$, using the fact that all binomial
coefficients $\binom{q}{k} = \frac{q!}{k!(q-k)!}$ are divisible by $q$ for $k \in \{1, 2, \ldots, q-1\}$. This argument has
appeared in the proof of Theorem 3.7, but here our setting is a little more general: our $x$
and $y$ are not in $\mathbb{F}_{p^e}$, nor are they in $\mathbb{Z}$; they are indeterminates. So (10.12) is sufficiently
general as to apply in an arbitrary commutative ring. We will in particular evaluate (10.12)
for $x, y \in \mathbb{Z}[\zeta_p]$ in the course of the following proof.
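Proof of Theorem 10.5. Let $p, q$ be distinct odd primes, and consider the Gauss sum
$S = \sum_{k=0}^{p-1} \left(\frac{k}{p}\right)\zeta^k \in \mathcal{O}$ as in (10.11), where $\mathcal{O} = \mathbb{Z}[\zeta]$, $\zeta = \zeta_p$. Taking the $\frac{q-1}{2}$ power of the
relation in Lemma 10.11 gives
\[ S^{q-1} = (S^2)^{(q-1)/2} = \left(\frac{-1}{p}\right)^{(q-1)/2} p^{(q-1)/2} \equiv (-1)^{(p-1)(q-1)/4}\left(\frac{p}{q}\right) \bmod q\mathcal{O}. \]
Multiplying both sides by $S$ gives
\begin{align*}
(-1)^{(p-1)(q-1)/4}\left(\frac{p}{q}\right) S &\equiv S^q \\
&\equiv \sum_{k=0}^{p-1} \left(\frac{k}{p}\right)^{q} \zeta^{qk} && \text{(by (10.12))} \\
&\equiv \sum_{k=0}^{p-1} \left(\frac{k}{p}\right) \zeta^{qk} && \text{(since } q \text{ is odd)} \\
&\equiv \sum_{k=0}^{p-1} \left(\frac{kq^2}{p}\right) \zeta^{qk} && \text{(since } q \not\equiv 0 \bmod p\text{)} \\
&\equiv \sum_{\ell=0}^{p-1} \left(\frac{\ell q}{p}\right) \zeta^{\ell} && \text{(substituting } \ell = kq\text{)} \\
&\equiv \sum_{\ell=0}^{p-1} \left(\frac{\ell}{p}\right)\left(\frac{q}{p}\right) \zeta^{\ell} && \text{(by (10.4))} \\
&\equiv \left(\frac{q}{p}\right) S \bmod q\mathcal{O}.
\end{align*}
Again we multiply both sides by $S$ to obtain
\[ (-1)^{(p-1)(q-1)/4}\left(\frac{p}{q}\right) S^2 \equiv \left(\frac{q}{p}\right) S^2 \bmod q\mathcal{O} \]
and since $S^2 = \left(\frac{-1}{p}\right)p$, which is an integer relatively prime to $q$, this gives
\[ (-1)^{(p-1)(q-1)/4}\left(\frac{p}{q}\right) \equiv \left(\frac{q}{p}\right) \bmod q\mathcal{O}. \]
All factors on both sides of this expression are $\pm 1$ so this gives part (c) of the Theorem.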
Part (a) of the Theorem follows immediately from Euler’s Criterion (10.3). For (b),
let $\mathcal{O} = \mathbb{Z}[\zeta]$, $\zeta = \zeta_8 = e^{\pi i/4} = \frac{1+i}{\sqrt2}$; and let $\tau = \zeta + \zeta^{-1} = \sqrt2$.
[Figure: the eighth roots of unity $1, \zeta, \zeta^2 = i, \zeta^3, \zeta^4 = -1, \zeta^5 = -\zeta, \zeta^6 = -i, \zeta^7 = \zeta^{-1}$ on the unit circle, with $\zeta$ at angle $\pi/4$, together with the point $\tau = \zeta + \zeta^{-1} = \sqrt2$ on the real axis.]
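By Euler’s Criterion (10.3),
\[ \left(\frac{2}{p}\right) \equiv 2^{(p-1)/2} \equiv \tau^{p-1} \bmod p\mathcal{O} \]
so by (10.12),
\[ \left(\frac{2}{p}\right)\tau \equiv \tau^p \equiv (\zeta + \zeta^{-1})^p \equiv \zeta^p + \zeta^{-p} \bmod p\mathcal{O}. \]
Since $\zeta$ is an eighth root of unity, we obtain
\[ \left(\frac{2}{p}\right)\tau \equiv (-1)^{(p^2-1)/8}\,\tau \bmod p\mathcal{O} \]
where
\[ (-1)^{(p^2-1)/8} = \begin{cases} 1, & \text{if } p \equiv \pm 1 \bmod 8; \\ -1, & \text{if } p \equiv \pm 3 \bmod 8. \end{cases} \]
Multiplying both sides by $\tau$ gives $2\left(\frac{2}{p}\right) \equiv (-1)^{(p^2-1)/8}\, 2 \bmod p\mathcal{O}$; and since 2 is relatively
prime to $p$, (b) follows.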
Throughout our proof, ‘mod pO’ and ‘mod qO’ can be read as ‘mod p’ and ‘mod q’
respectively. The careful reader will observe that for x, y ∈ Z, we have x ≡ y mod pO iff
x ≡ y mod p in the usual sense; this is because Z ∩ pO = pZ.
We now resolve the sign of S. Until now we have required only that ζ is a root of Φp (t)
in some extension of Q (as this completely determines the algebraic structure of Q(ζ) ⊃ Q).
In order to resolve the ambiguous sign of $S = \sum_k \left(\frac{k}{p}\right)\zeta^k$, we must fix a choice of $\zeta$. This
choice needs to be made using extraneous (i.e. non-algebraic) properties of elements of our
extension, such as ordering, as without such considerations, all choices of ζ (and choices
of the sign of S) are equivalent under field automorphisms. The elusiveness of the proof of
Theorem 10.13 may be attributed to the inaccessibility of this result from purely algebraic
arguments within Q(ζ).
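Theorem 10.13. Fix $\zeta = e^{2\pi i/p}$ where $p$ is an odd prime, and let $S = \sum_{k=1}^{p-1} \left(\frac{k}{p}\right)\zeta^k$.
Then
\[ S = \begin{cases} \sqrt{p}, & \text{if } p \equiv 1 \bmod 4; \\ i\sqrt{p}, & \text{if } p \equiv 3 \bmod 4. \end{cases} \]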
We present the proof found in [IR, Section 6.4]. Other proofs are available; for example
Dirichlet gave a proof using Fourier analysis. The proof in [LN, Theorem 5.15] uses some
representation theory. The correct generalization of Theorem 10.13 to all fields of odd
order is proved in Section 12 using the Hasse-Davenport relations.
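Proof of Theorem 10.13. Evaluate $\Phi_p(x) = 1 + x + x^2 + \cdots + x^{p-1} = \prod_{r=1}^{p-1} (x - \zeta^r)$ at 1 to
get
\begin{align*}
p = \Phi_p(1) &= \prod_{r=1}^{p-1} (1 - \zeta^r) = \prod_{j=1}^{p-1} (1 - \zeta^{4j}) = \prod_{j=1}^{(p-1)/2} (1 - \zeta^{4j}) \prod_{k=(p+1)/2}^{p-1} (1 - \zeta^{4k}) \\
&= \prod_{\ell=1}^{(p-1)/2} (1 - \zeta^{2p+2-4\ell}) \prod_{m=1}^{(p-1)/2} (1 - \zeta^{2p-2+4m}) && \text{(substitute } \ell = \tfrac{p+1}{2} - j \text{ and } k = \tfrac{p-1}{2} + m\text{)} \\
&= \prod_{k=1}^{(p-1)/2} (1 - \zeta^{2-4k})(1 - \zeta^{4k-2}) = \prod_{k=1}^{(p-1)/2} (\zeta^{2k-1} - \zeta^{1-2k})(\zeta^{1-2k} - \zeta^{2k-1}) \\
&= (-1)^{(p-1)/2} \prod_{k=1}^{(p-1)/2} (\zeta^{2k-1} - \zeta^{1-2k})^2.
\end{align*}
This says that
\[ \prod_{k=1}^{(p-1)/2} (\zeta^{2k-1} - \zeta^{1-2k}) = (2i)^{(p-1)/2} \prod_{k=1}^{(p-1)/2} \sin\frac{(4k-2)\pi}{p} = \pm i^{(p-1)/2}\sqrt{p} \]
where the factor $\sin\frac{(4k-2)\pi}{p}$ is negative iff $\frac{p+3}{4} \le k \le \frac{p-1}{2}$; and the number of integers $k$
in this range is $\lfloor\frac{p-1}{4}\rfloor$. Thus
\[ \prod_{k=1}^{(p-1)/2} (\zeta^{2k-1} - \zeta^{1-2k}) = (-1)^{\lfloor (p-1)/4 \rfloor}\, i^{(p-1)/2} \sqrt{p} = \begin{cases} \sqrt{p}, & \text{if } p \equiv 1 \bmod 4; \\ i\sqrt{p}, & \text{if } p \equiv 3 \bmod 4. \end{cases} \]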
This is exactly the conjectured value of S; and combining this with Lemma 10.11 and the
remarks following it,
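\begin{align*}
\left(\tfrac{p-1}{2}\right)!\,\varepsilon \prod_{k=1}^{(p-1)/2} (4k - p - 2) &\equiv \left(\tfrac{p-1}{2}\right)!\,\varepsilon \cdot 2^{(p-1)/2} \prod_{k=1}^{(p-1)/2} (2k - 1) = \left(\tfrac{p-1}{2}\right)!\,\varepsilon \cdot 2^{(p-1)/2}\, \frac{(p-1)!}{2 \cdot 4 \cdot 6 \cdots (p-1)} \\
&= (p-1)!\,\varepsilon \equiv -\varepsilon \bmod p
\end{align*}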
using Wilson’s Theorem (Exercise #3.1(b)). So our original congruence relating (10.15)
and (10.16) simplifies as −1 + ε ≡ 0 mod p. Since ε = ±1 and p is an odd prime, this
forces ε = 1, which completes our proof.
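Although the sign of $S$ is inaccessible to purely algebraic arguments, it is easily observed numerically once $\zeta = e^{2\pi i/p}$ is fixed as a complex number. The sketch below evaluates both the Gauss sum and the product $\prod_k (\zeta^{2k-1} - \zeta^{1-2k})$ from the proof in floating point, for a few small primes; the function names are ours, and the comparison is only up to rounding error.

```python
import cmath
import math

def legendre(a, p):
    """Legendre symbol (a/p) via Euler's Criterion, assuming p is an odd prime."""
    r = pow(a % p, (p - 1) // 2, p)
    return -1 if r == p - 1 else r

def gauss_sum(p):
    """S = sum over k = 1..p-1 of (k/p) ζ^k with ζ = exp(2πi/p)."""
    zeta = cmath.exp(2j * math.pi / p)
    return sum(legendre(k, p) * zeta ** k for k in range(1, p))

def sine_product(p):
    """The product of ζ^(2k-1) - ζ^(1-2k) over k = 1..(p-1)/2."""
    zeta = cmath.exp(2j * math.pi / p)
    prod = 1
    for k in range(1, (p - 1) // 2 + 1):
        prod *= zeta ** (2 * k - 1) - zeta ** (1 - 2 * k)
    return prod

def predicted(p):
    """The value sqrt(p) for p ≡ 1 mod 4, and i sqrt(p) for p ≡ 3 mod 4."""
    return math.sqrt(p) if p % 4 == 1 else 1j * math.sqrt(p)

for p in (3, 5, 7, 11, 13):
    print(p, abs(gauss_sum(p) - predicted(p)) < 1e-9,
          abs(sine_product(p) - predicted(p)) < 1e-9)
```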
Exercises 10.
1. Evaluate each of the following Legendre symbols by hand as done in Example 10.6. Then use
appropriate computer software with arbitrary precision arithmetic capability to check your answer
by Euler’s Criterion (10.3).
(a) $\left(\frac{59}{89}\right)$ (b) $\left(\frac{-7}{233}\right)$ (c) $\left(\frac{111}{347}\right)$ (d) $\left(\frac{620}{503}\right)$ (e) $\left(\frac{709}{809}\right)$
2. (a) Using the Law of Quadratic Reciprocity, show that a prime $p$ admits solutions of the
congruence $x^2 \pm x + 1 \equiv 0 \bmod p$ iff $p = 3$ or $p \equiv 1 \bmod 3$.
(b) Now consider a finite field $F = \mathbb{F}_q$ where $q = p^e$, and assume that $p \ne 3$. Show that solutions
of $x^2 + x + 1 = 0$ in $F$ are primitive cube roots of unity in $F$; and solutions of $x^2 - x + 1 = 0$
are primitive sixth roots of unity in $F$.
(c) Using the fact that the multiplicative group $F^\times$ is cyclic, show that $F$ has a primitive cube
root of unity iff $q \equiv 1 \bmod 3$; and $F$ has a primitive sixth root of unity iff $q \equiv 1 \bmod 6$. (This
requires only elementary properties of cyclic groups.) Conclude that a finite field $F$ of order
$q$ has solutions of $x^2 \pm x + 1 = 0$ iff $q \equiv 1 \bmod 3$.
3. Complete the remaining steps of Example 10.7, using appropriate computer software with arbitrary
precision arithmetic capability.
(a) According to Example 10.7, $\left(\frac{-23}{823}\right) = 1$. Confirm this fact using Euler’s Criterion (10.3), by
computing $(-23)^{411} \bmod 823$. (Note: $411 = \frac12(823 - 1)$.)
(b) Using (a), evaluate $(-23)^{412} \bmod 823$, noting that $412 = 411 + 1$.
(c) Now evaluate $(-23)^{206} \bmod 823$, noting that $206 = 412/2$.
(d) Using the previous steps, find the two square roots of $-23 \bmod 823$.
(e) Find the two roots of $3x^2 + 13x + 16 \in \mathbb{F}_{823}[x]$.
(f) Generalizing your work, present an algorithm for computing square roots mod $p$ for an
arbitrary prime $p \equiv 3 \bmod 4$. Explain why this algorithm works.
Each finite field E = Fq gives rise to two finite groups, the additive group E of order q and
the multiplicative group E × of order q − 1. Characters of these groups are called additive
characters and multiplicative characters respectively. Much of the interplay between
these two groups is encoded in the language of Gauss sums. Following the notation of
11. Gauss and Jacobi Sums
Lemma 11.1. (i) The q additive characters of E have the form $\psi_a(x) = \zeta_p^{\mathrm{Tr}(ax)}$ for
a ∈ E.
(ii) Fix a generator ω for $E^\times$. Then the q−1 multiplicative characters of E have the
form $\chi_k(\omega^r) = \zeta_{q-1}^{kr}$ for k ∈ Z/(q−1)Z; and $\chi_k(0) = 0$ for $1 \le k < q-1$.
In particular, additive characters have values in $\mathbb{Z}[\zeta_p]$; and multiplicative characters
have values in $\mathbb{Z}[\zeta_{q-1}]$. More precisely, if d = gcd(k, q−1), then $\chi_k$ has values in
$\mathbb{Z}[\zeta_{(q-1)/d}]$.
We refer to $\chi_k$ ($k \ne 0$) as a character of order $\frac{q-1}{d} = \frac{\mathrm{lcm}(k,q-1)}{k}$ where d = gcd(k, q−1),
since $\frac{q-1}{d}$ is the order of $\chi_k$ in the group $\widehat{E^\times}$. In this notation, $\chi_{q-1}$ is another name for
the trivial multiplicative character $\chi_0$ described above. For q odd, $\chi_{(q-1)/2}$ is the quadratic
character. The trivial additive character is $\psi_0(x) = 1$.
Proof of Lemma 11.1. Clearly each $\psi_a \in \widehat{E}$, and the trivial additive character is $\psi_0(x) = 1$
for all x ∈ E. If $a \ne b$ in E then by Theorem A1.7(ii), there exists x ∈ E satisfying
$\mathrm{Tr}[(a-b)x] \ne 0$, so $\frac{\psi_a(x)}{\psi_b(x)} = \psi_{a-b}(x) \ne 1$ and $\psi_a \ne \psi_b$. So the additive characters $\psi_a \in \widehat{E}$
(a ∈ E) are distinct; and since $|\widehat{E}| = q$ by Theorem 6.1(b), all additive characters have
this form. This proves (i), and (ii) is similar.
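For $\chi \in \widehat{E^\times}$ and $\psi \in \widehat{E}$, we define the Gauss sums
$$G(\chi,\psi) = \sum_{x\in E} \chi(x)\psi(x) \in \mathbb{Z}[\zeta_p,\zeta_{q-1}] = \mathbb{Z}[\zeta_{(q-1)p}]; \qquad G(\chi) = G(\chi,\psi_1) = \sum_{x\in E} \chi(x)\zeta^{\mathrm{Tr}\,x}.$$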
This generalizes the quadratic Gauss sum S from Section 10. Questions about G(χ, ψ) can
usually be reduced to questions about G(χ), due to (i) below.
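Proof. (i) If $a \ne 0$ then $G(\chi,\psi_a) = \sum_{x\in E} \chi(x)\zeta^{\mathrm{Tr}(ax)} = \sum_{u\in E} \chi\left(\frac{u}{a}\right)\zeta^{\mathrm{Tr}\,u} = \frac{1}{\chi(a)}G(\chi) = \overline{\chi}(a)G(\chi)$.
Now suppose a = 0, so $G(\chi,\psi_0) = \sum_{x\in E}\chi(x)$. If $\chi \ne \chi_0$ then by the
convention above, χ(0) = 0 and the remaining terms cancel since χ and $\chi_0$ are orthogonal
by Theorem 6.2(a), so again the conclusion holds.
(ii) $\overline{G(\chi)} = \overline{\sum_{x\in E}\chi(x)\zeta^{\mathrm{Tr}\,x}} = \sum_{x\in E}\overline{\chi}(x)\zeta^{-\mathrm{Tr}\,x} = G(\overline{\chi},\psi_{-1}) = \chi(-1)G(\overline{\chi})$.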
Proof. (ii) $G(\chi,\psi_0) = \sum_{x\in E^\times} \chi(x) = 0$ since χ and $\chi_0$ are orthogonal (Theorem 6.2(a)).
(iii) $G(\chi_0,\psi) = \sum_{x\in E} \psi(x) = 0$ since ψ and $\psi_0$ are orthogonal (Theorem 6.2(a)).
(iv) $G(\chi_0,\psi_0) = \sum_{x\in E} 1 = q$.
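(i) We compute
$$\begin{aligned}
|G(\chi,\psi)|^2 &= \sum_{x,y\in E} \chi(x)\psi(x)\overline{\chi(y)\psi(y)} = q-1 + \sum_{x\ne y;\ x,y\ne 0} \chi(x)\psi(x)\overline{\chi(y)\psi(y)} \\
&= q-1 + \sum_{x\ne y;\ x,y\ne 0} \chi\left(\tfrac{x}{y}\right)\psi(x-y) \\
&= q-1 + \sum_{u\ne 1,\ v\ne 0} \chi(u)\psi(v) \qquad \text{(substituting } x = \tfrac{uv}{u-1},\ y = \tfrac{v}{u-1}\text{)} \\
&= q-1 + \sum_{u\ne 1}\chi(u)\sum_{v\ne 0}\psi(v) = q-1+(-1)(-1) = q \qquad \text{(see (ii),(iii))}.
\end{aligned}$$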
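The identities just proved are easy to test numerically over a prime field, where the trace is the identity map. The following is a minimal sketch, assuming $E = \mathbb{F}_p$ with p an odd prime and using floating-point complex numbers in place of exact arithmetic in $\mathbb{Z}[\zeta_{(p-1)p}]$; the helper names `primitive_root`, `mult_char`, `add_char` and `gauss_sum` are hypothetical.

```python
import cmath

def primitive_root(p: int) -> int:
    """Smallest generator of the cyclic group F_p^x (brute force; fine for small odd primes)."""
    for g in range(2, p):
        if len({pow(g, r, p) for r in range(p - 1)}) == p - 1:
            return g
    raise ValueError("no generator found; is p an odd prime?")

def mult_char(p: int, k: int):
    """chi_k(g^r) = exp(2*pi*i*k*r/(p-1)); chi_k(0) = 0 for k != 0, and chi_0(0) = 1."""
    g = primitive_root(p)
    log = {pow(g, r, p): r for r in range(p - 1)}
    def chi(x: int) -> complex:
        x %= p
        if x == 0:
            return 1.0 if k % (p - 1) == 0 else 0.0
        return cmath.exp(2j * cmath.pi * k * log[x] / (p - 1))
    return chi

def add_char(p: int, a: int):
    """psi_a(x) = exp(2*pi*i*a*x/p)."""
    return lambda x: cmath.exp(2j * cmath.pi * a * x / p)

def gauss_sum(p: int, k: int, a: int = 1) -> complex:
    """G(chi_k, psi_a) = sum over x in F_p of chi_k(x) * psi_a(x)."""
    chi, psi = mult_char(p, k), add_char(p, a)
    return sum(chi(x) * psi(x) for x in range(p))

p = 13
print(abs(gauss_sum(p, 5)) ** 2)                    # close to p
print(abs(gauss_sum(p, 5, 0)), abs(gauss_sum(p, 0, 7)))  # both close to 0
```

The same functions can be used to check the reduction $G(\chi,\psi_a) = \overline{\chi}(a)G(\chi)$ for any nonzero a.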
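Proof. (ii) and (iii) follow easily using Theorem 11.3(ii). If $\chi\chi' = \chi_0$ then
$$J(\chi,\chi') = \sum_{x\ne 1} \chi(x)\chi^{-1}(1-x) = \sum_{x\ne 1} \chi\left(\frac{x}{1-x}\right) = \sum_{u\ne -1} \chi(u) = -\chi(-1)$$
using Theorem 11.3(ii) and the substitution $x = \frac{u}{u+1}$. Finally if $\chi\chi' \ne \chi_0$ then
$$\begin{aligned}
G(\chi)G(\chi') &= \sum_{x,y\in E} \chi(x)\chi'(y)\zeta^{\mathrm{Tr}(x+y)} = \sum_{x} \chi(x)\chi'(-x) + \sum_{x+y\ne 0} \chi(x)\chi'(y)\zeta^{\mathrm{Tr}(x+y)} \\
&= \chi'(-1)\sum_{x\in E} (\chi\chi')(x) + \sum_{s,t\in E;\ s\ne 0} \chi(ts)\chi'((1-t)s)\zeta^{\mathrm{Tr}\,s}
\end{aligned}$$
using the substitution (x, y) = (ts, (1−t)s). Now $\sum_x (\chi\chi')(x) = 0$ since $\chi\chi' \ne \chi_0$, so
$$G(\chi)G(\chi') = \sum_{t\in E} \chi(t)\chi'(1-t) \sum_{s\in E} \chi(s)\chi'(s)\zeta^{\mathrm{Tr}\,s} = J(\chi,\chi')G(\chi\chi').$$
When χ, χ′ and χχ′ are all nontrivial, their Gauss sums are nonzero (by Theorem 11.3(i))
and then we can solve for J(χ, χ′) to obtain the value claimed.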
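The two cases of this proof can be checked numerically over a small prime field. The sketch below assumes $E = \mathbb{F}_p$ and the definition $J(\chi,\chi') = \sum_x \chi(x)\chi'(1-x)$ used in the proof; characters are stored as lists of floating-point values, and the names `characters`, `gauss` and `jacobi` are hypothetical.

```python
import cmath

def characters(p: int):
    """Value tables of chi_0, ..., chi_{p-2} on F_p; table[k][x] = chi_k(x)."""
    g = next(g for g in range(2, p) if len({pow(g, r, p) for r in range(p - 1)}) == p - 1)
    log = {pow(g, r, p): r for r in range(p - 1)}
    table = []
    for k in range(p - 1):
        vals = [1.0 if k == 0 else 0.0]  # value at x = 0
        vals += [cmath.exp(2j * cmath.pi * k * log[x] / (p - 1)) for x in range(1, p)]
        table.append(vals)
    return table

def gauss(chi, p):
    """G(chi) = sum of chi(x) * zeta^x with zeta = exp(2*pi*i/p)."""
    return sum(chi[x] * cmath.exp(2j * cmath.pi * x / p) for x in range(p))

def jacobi(chi1, chi2, p):
    """J(chi, chi') = sum over x of chi(x) * chi'(1 - x)."""
    return sum(chi1[x] * chi2[(1 - x) % p] for x in range(p))

p = 13
chars = characters(p)
max_err = 0.0
for i in range(1, p - 1):
    for j in range(1, p - 1):
        J = jacobi(chars[i], chars[j], p)
        if (i + j) % (p - 1) == 0:
            expected = -chars[i][p - 1]  # -chi(-1), the case chi*chi' trivial
        else:
            prod = chars[(i + j) % (p - 1)]
            expected = gauss(chars[i], p) * gauss(chars[j], p) / gauss(prod, p)
        max_err = max(max_err, abs(J - expected))
print(max_err)
```

The printed discrepancy should be at the level of floating-point rounding.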
Corollary 11.5. (i) Every prime p ≡ 1 mod 4 is expressible in the form $p = a^2+b^2$
with a, b ∈ Z.
(ii) Every prime p ≡ 1 mod 3 is expressible in the form $p = a^2-ab+b^2$ with a, b ∈ Z.
where all factorizations are obtained from the first one by migration of units.
For a rational prime p ≡ 1 mod 3, we have twelve solutions of $p = a^2 - ab + b^2$, all
arising from the same factorization of p by migration of units in Z[ω]:
$$\begin{aligned}
p &= (a+b\omega)(a-b-b\omega) = (-b+(a-b)\omega)(-a+(b-a)\omega) = (b-a-a\omega)(b+a\omega) \\
&= (-a-b\omega)(b-a+b\omega) = (b+(b-a)\omega)(a+(a-b)\omega) = (a-b+a\omega)(-b-a\omega).
\end{aligned}$$
These are in fact all the factorizations of p in the ring O = Z[ω] of Eisenstein integers,
since the factors shown are irreducible (since they have norm equal to p, a prime) and O
has unique factorization up to units, of which there are exactly six. The resulting twelve
solutions of $p = a^2-ab+b^2$ are classified according to the pair of residues (a mod 3, b mod 3)
which evidently cannot be (0, 0), (1, 2) or (2, 1) as these choices yield $a^2-ab+b^2 \equiv 0 \bmod 3$.
This means that the twelve solutions (a, b), reduced mod 3, yield each of the six remaining
pairs ±(1, 1), ±(1, 0), ±(0, 1) twice. In particular there are two solutions (a, b), (a−b, −b)
which reduce as (2, 0) modulo 3; and we now show that these two solutions are exactly the
ones arising from Jacobi sums of cubic characters:
Now multiply both sides by G(χ) and use Theorem 11.3(i) to get $pJ(\chi,\chi) = G(\chi)^3$. We
will reduce both sides of the latter relation (in O = Z[ω]) modulo 3, to obtain a relation
in the quotient ring O/3O, a local ring (but not a field) of order 9:
$$pJ(\chi,\chi) = G(\chi)^3 = \Big(\sum_{x\in F} \chi(x)\zeta^x\Big)^3 \equiv \sum_{x\in F} \chi(x)^3\zeta^{3x} = -1 \mod 3$$
since $\sum_{x\in F}\zeta^{3x} = \sum_{u\in F}\zeta^u = 0$ and $\chi(x)^3 = 0$ or 1 according as x = 0 or $x \ne 0$. Since
p ≡ 1 mod 3, (i) gives $J(\chi,\chi) = a + b\omega \equiv -1 \bmod 3$, i.e. a ≡ −1 ≡ 2 and b ≡ 0 mod 3. From
the defining formula it is clear that $J(\overline{\chi},\overline{\chi}) = \overline{J(\chi,\chi)} = \overline{a+b\omega} = a-b-b\omega$. Note that the
resulting coefficients (a, b) and (a−b, −b) are in fact both of the pairs which reduce to (2, 0)
modulo 3 as described.
It is straightforward to check that the substitution $(A, B) = (2a-b, \frac{b}{3})$ transforms
an integer solution of $p = a^2 - ab + b^2$ with a ≡ 2 and b ≡ 0 mod 3 to a solution of
$4p = A^2+27B^2$ with A ≡ 1 mod 3. Conversely, given an integer solution of $A^2+27B^2 = 4p$ with A ≡ 1 mod 3,
an easy inspection of this relation modulo 4 shows that A and B must have the same parity
(both may be even, as in $4\cdot 31 = 4^2 + 27\cdot 2^2$), so
$(a, b) = (\frac12(A+3B), 3B)$ gives a pair of integers congruent to (2, 0) mod 3 and satisfying
$a^2 - ab + b^2 = p$.
Theorem 11.7. Given a prime p ≡ 1 mod 3, the number of solutions of the equation
$x^3 + y^3 = 1$ over $\mathbb{F}_p$ is p − 2 + A where $4p = A^2+27B^2$, A ≡ 1 mod 3 as in Lemma 11.6.
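where $\#(x^3 = a)$ is the number of solutions of $x^3 = a$ in E, and similarly for the other
factor. Compare the values of $\#(x^3 = a)$ with the values of a cubic character $\chi = \chi_{(p-1)/3}$
on E:
$$\#(x^3 = a) = \begin{cases} 1, & \text{if } a = 0; \\ 3, & \text{if } a \in E^\times \text{ is a cube}; \\ 0, & \text{if } a \in E^\times \text{ is not a cube}; \end{cases} \qquad \chi(a) = \begin{cases} 0, & \text{if } a = 0; \\ 1, & \text{if } a \in E^\times \text{ is a cube}; \\ \omega \text{ or } \overline{\omega}, & \text{if } a \in E^\times \text{ is not a cube}. \end{cases}$$
Observe that $\#(x^3 = a) = 1 + \chi(a) + \overline{\chi}(a) = \chi_0(a) + \chi(a) + \overline{\chi}(a)$ where $\chi_0$ is the trivial
multiplicative character, and the two cubic characters are $\chi = \chi_{(p-1)/3}$ and $\overline{\chi} = \chi_{2(p-1)/3}$. Thus
$$\#(x^3 + y^3 = 1) = \sum_{a+b=1} \big(\chi_0(a) + \chi(a) + \overline{\chi}(a)\big)\big(\chi_0(b) + \chi(b) + \overline{\chi}(b)\big) = \sum_{i=0}^{2}\sum_{j=0}^{2} J(\chi^i,\chi^j).$$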
This sum has nine terms, most of which are given by Theorem 11.4:
$J(\chi_0,\chi_0) = p$; $J(\chi_0,\chi) = J(\chi_0,\overline{\chi}) = J(\chi,\chi_0) = J(\overline{\chi},\chi_0) = 0$; and
$J(\chi,\overline{\chi}) = J(\overline{\chi},\chi) = -\chi(-1) = -1$
using again the fact that χ(−1) = ±1 but $\chi(-1)^3 = 1$. The remaining two Jacobi sums
are given, in the notation of Lemma 11.6, by
$$J(\chi,\chi) + J(\overline{\chi},\overline{\chi}) = a+b\omega + \overline{a+b\omega} = a+b\omega + (a-b)-b\omega = 2a-b = A.$$
Adding these nine Jacobi sums gives $\#(x^3 + y^3 = 1) = p - 2 + A$.
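The count in Theorem 11.7 can be compared with a brute-force count for small primes. The following sketch is illustrative only: `find_A_B` searches for the representation $4p = A^2 + 27B^2$ with A ≡ 1 mod 3 by trial over B, and `count_cubic_solutions` mirrors the first step of the proof by summing $\#(x^3 = a)\cdot\#(y^3 = b)$ over a + b = 1. Both names are hypothetical.

```python
import math

def find_A_B(p: int):
    """Return (A, |B|) with 4p = A^2 + 27 B^2 and A % 3 == 1, for a prime p with p % 3 == 1."""
    B = 1
    while 27 * B * B < 4 * p:
        A2 = 4 * p - 27 * B * B
        A = math.isqrt(A2)
        if A * A == A2:
            # A is congruent to +1 or -1 mod 3; pick the sign giving A = 1 mod 3.
            return (A if A % 3 == 1 else -A), B
        B += 1
    raise ValueError("no representation found; is p a prime congruent to 1 mod 3?")

def count_cubic_solutions(p: int) -> int:
    """Number of (x, y) in F_p^2 with x^3 + y^3 = 1."""
    freq = [0] * p  # freq[a] = number of x with x^3 = a
    for x in range(p):
        freq[pow(x, 3, p)] += 1
    return sum(freq[a] * freq[(1 - a) % p] for a in range(p))

for p in (7, 13, 19, 31, 37, 43):
    A, B = find_A_B(p)
    print(p, A, B, p - 2 + A, count_cubic_solutions(p))
```

For each prime listed, the last two columns printed should agree.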
Exercises 11.
1. According to Theorem 3.6(i), the number of solutions of $x^2 + y^2 = 1$ in $\mathbb{F}_q$ is $q - (-1)^{(q-1)/2}$ when
q is odd.
(a) Give another proof using Jacobi sums, similar to the proof of Theorem 11.7.
(b) Explain why the number of solutions of $x^2 + y^2 = 1$ in $\mathbb{F}_q$ is exactly q when q is even.
2. Make a table with five columns, labelled: p, A, B, p−2+A, ‘solutions’. In the first column, list all
primes p ≡ 1 mod 3 less than 50. For each p, find the integers (A, ±B) satisfying $4p = A^2+27B^2$
with A ≡ 1 mod 3, and list them in columns 2 and 3, entering also the value of p−2+A in
column 4. In the last column, list all pairs (x, y) over $\mathbb{F}_p$ satisfying $x^3 + y^3 = 1$
(note: list all solutions, not just the number of solutions). Count solutions in column 5 in each
case and verify that the number of solutions agrees with the expected number from column 4, as
predicted by Theorem 11.7. You may use a computer to perform this exercise.
3. Let $E = \mathbb{F}_p$, p prime. Choose a multiplicative character $\chi \in \widehat{E^\times}$ of order n; recall that n divides
p − 1. By definition, $G(\chi) \in \mathbb{Z}[\zeta_n,\zeta_p] = \mathbb{Z}[\zeta_{np}]$. Prove that $G(\chi)^n \in \mathbb{Z}[\zeta_n]$.
Hint: Choose r ∈ {1, 2, 3, . . . , p−1} which is a generator for $E^\times$. By the Chinese Remainder
Theorem, there exists k ∈ Z such that k ≡ 1 mod n and k ≡ r mod p. Now $\mathbb{Q}[\zeta_{np}]$ has a unique
automorphism σ satisfying $\sigma(\zeta_{np}) = \zeta_{np}^k$. Find the fixed field of σ (denoted $\mathrm{Fix}_{\mathbb{Q}[\zeta_{np}]}(\sigma)$ in
Appendix A5). Show that $\sigma(G(\chi)) = \overline{\chi(r)}G(\chi)$ and take nth powers.
12. Zeta Functions and L-Functions
Fix a finite field F = Fq . We will see that characters on F lift naturally to characters on
finite extension fields K ⊇ F via the trace and norm maps of the extension. It is natural
to ask how the Gauss sums of the lifted characters on K may be expressed in terms of the
Gauss sums of the original characters on F . This is possible using the Hasse-Davenport
relations, which we prove in this section. In particular, this generalizes the explicit formula
for quadratic Gauss sums over prime fields (Theorem 10.13) to an explicit formula for
quadratic Gauss sums over arbitrary finite fields (Corollary 12.11). The key tool in this
development is L-functions, which we must first introduce. Because we deal here with
L-functions over function fields, students may first want to glance through Appendix A6
where L-functions over number fields are described. If, as we expect, the number field case
is more familiar to students, then that Appendix may serve as a bridge to the results in
this Section. Yet in no way do we actually require the results of Appendix A6.
Recall that the polynomial ring O := F[x] is a principal ideal ring; indeed, every
nonzero ideal A ⊆ O has a unique monic generator. The norm of an ideal A ⊆ O is the
number of cosets: $N(A) = |O/A| = q^n$ assuming A has a generator of degree n. The norm
is multiplicative: N(AB) = N(A) N(B) for all ideals A, B ⊆ O. Every nonzero prime ideal
P ⊂ O is maximal, and has the form P = (f(x)) where f(x) ∈ O is irreducible; and then
the residue field $O/P = O/(f) \cong \mathbb{F}_{q^n}$ where n = deg f.
The zeta function of O is the complex-valued function defined by
$$\zeta_O(s) = \sum_{A} \frac{1}{N(A)^s}$$
where the sum is over all nonzero ideals A ⊆ O. (Compare O with the ring Z whose
nonzero ideals have the form (n) = nZ for n ≥ 1, giving the Riemann zeta function
$\zeta(s) = \sum_{n=1}^{\infty} |\mathbb{Z}/n\mathbb{Z}|^{-s} = \sum_{n=1}^{\infty} n^{-s}$.) Since nonzero ideals in O factor uniquely as products of
prime ideals, we obtain the factorization
$$\zeta_O(s) = \prod_{P} \Big(1 - \frac{1}{N(P)^s}\Big)^{-1}$$
where P ranges over all nonzero prime ideals of O. This is the Euler factorization of
ζO (s), valid for exactly the same reasons as for the Riemann zeta function (or for Dedekind
zeta functions of more general number fields; see Appendix A6): it is a restatement of the
unique factorization property for nonzero ideals, using the fact that the norm is multi-
plicative.
Now every nonzero ideal A ⊆ O has a unique monic generator f(x) ∈ O; and $N(A) = q^{\deg f}$.
Moreover there are exactly $q^n$ monic polynomials of degree n, so
$$\zeta_O(s) = \sum_{n=0}^{\infty} \frac{q^n}{q^{ns}} = \sum_{n=0}^{\infty} q^n z^n = \frac{1}{1-qz} \tag{12.1}$$
where we have substituted $z = q^{-s}$. The series converges in the right half-plane Re s > 1,
i.e. in the open disk $|z| < \frac{1}{q}$; but by analytic continuation, the function is meromorphic in
z with a simple pole at $z = \frac{1}{q}$. The Euler factorization yields
$$\zeta_O(s) = \prod_{d=1}^{\infty} \Big(1 - \frac{1}{q^{ds}}\Big)^{-n_d} = \prod_{d=1}^{\infty} \big(1 - z^d\big)^{-n_d} \tag{12.2}$$
where $n_d$ is the number of prime ideals of norm $q^d$, i.e. the number of monic irreducible
polynomials of degree d; see Theorem 3.13. Let’s verify directly the equality of the two
expressions (12.1) and (12.2). Since both series have constant term 1, it suffices to compare
their derivatives with respect to z. It is more convenient to use logarithmic differentiation:
we apply the operator $Df(z) = z\frac{d}{dz}\log f(z) = z\frac{f'(z)}{f(z)}$ to both (12.1) and (12.2), and compare
the results. For (12.1) we get
$$z\frac{d}{dz}\log\frac{1}{1-qz} = \frac{qz}{1-qz} = \sum_{n=1}^{\infty} q^n z^n \tag{12.3}$$
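For comparison, here is a sketch of the same operator applied to the Euler product (12.2), presumably the computation recorded in (12.4). It assumes the counting identity $\sum_{d\mid n} d\,n_d = q^n$ (each element of $\mathbb{F}_{q^n}$ is a root of exactly one monic irreducible polynomial over $\mathbb{F}_q$, whose degree divides n), which we take to be the content of the cited Theorem 3.13.

```latex
\begin{aligned}
z\frac{d}{dz}\log\prod_{d=1}^{\infty}\bigl(1-z^d\bigr)^{-n_d}
  &= \sum_{d=1}^{\infty} n_d\,\frac{d\,z^d}{1-z^d}
   = \sum_{d=1}^{\infty}\sum_{k=1}^{\infty} d\,n_d\,z^{dk} \\
  &= \sum_{n=1}^{\infty}\Bigl(\sum_{d\mid n} d\,n_d\Bigr) z^n
   = \sum_{n=1}^{\infty} q^n z^n .
\end{aligned}
```

The last expression is the right side of (12.3), as required.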
Since (12.3) and (12.4) agree, and since (12.1) and (12.2) have the same constant term 1,
it follows that (12.1) and (12.2) agree.
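The agreement of (12.1) and (12.2) can also be observed experimentally for a small prime field. The sketch below counts monic irreducible polynomials over $\mathbb{F}_p$ by sieving out all products, then expands the truncated Euler product and compares its coefficients with $q^n$. It assumes q = p is prime (so that arithmetic mod p models the field), and the function names are hypothetical.

```python
from itertools import product

def polymul(f, g, p):
    """Product of two polynomials over F_p, given as coefficient tuples (low degree first)."""
    h = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            h[i + j] = (h[i + j] + a * b) % p
    return tuple(h)

def monic_polys(d, p):
    """All p^d monic polynomials of degree d over F_p."""
    return [tuple(c) + (1,) for c in product(range(p), repeat=d)]

def count_irreducibles(p, N):
    """n_d for 1 <= d <= N, by marking every product of two monic polynomials of positive degree."""
    reducible = set()
    for d1 in range(1, N):
        for d2 in range(d1, N - d1 + 1):
            for f in monic_polys(d1, p):
                for g in monic_polys(d2, p):
                    reducible.add(polymul(f, g, p))
    return {d: sum(1 for f in monic_polys(d, p) if f not in reducible) for d in range(1, N + 1)}

def euler_product_coeffs(n_d, N):
    """Coefficients of the product over d of (1 - z^d)^(-n_d), truncated after z^N."""
    c = [1] + [0] * N
    for d, mult in n_d.items():
        for _ in range(mult):
            for k in range(d, N + 1):  # multiply the series by 1/(1 - z^d)
                c[k] += c[k - d]
    return c

q, N = 2, 6
n_d = count_irreducibles(q, N)
print(n_d)
print(euler_product_coeffs(n_d, N), [q ** n for n in range(N + 1)])
```

Factors with d > N do not affect the coefficients up to $z^N$, which is why the truncation is harmless.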
Now let λ be a complex-valued multiplicative function defined on the monoid of
nonzero ideals of O. This means that for nonzero ideals A, B ⊆ O, we have λ(AB) =
λ(A)λ(B). We define the L-function
$$L_\lambda(s) = \sum_{A} \frac{\lambda(A)}{N(A)^s}$$
where the sum is again over all nonzero ideals A ⊆ O. Note that for the constant function
λ(A) = 1, this is just the zeta function. We shall immediately substitute $z = q^{-s}$ as before.
In all cases of interest we shall have $|\lambda(A)| \le 1$; so that by comparison, $L_\lambda(s)$ converges in
the open disk $|z| < \frac{1}{q}$. The multiplicative property of λ (together with the multiplicative
property of the norm, as before) means that $L_\lambda(s)$ admits an Euler factorization
$$L_\lambda(s) = \prod_{P} \Big(1 - \frac{\lambda(P)}{N(P)^s}\Big)^{-1}$$
where P ranges again over all nonzero prime ideals of O.
Now each nonzero ideal has a unique monic generator; so it makes sense to write
λ(f) = λ(A) where A = (f(x)) ⊆ O and f(x) ∈ M; here we denote by M the monoid of
monic polynomials in O. For d ≥ 0, denote by $M_d \subset M$ the subset consisting of monic
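polynomials of degree d. Also let P ⊂ M be the set of monic irreducible polynomials; and
$P_d = P \cap M_d$ is the set of monic irreducible polynomials of degree d. Thus
$$\begin{aligned}
L_\lambda(s) &= \sum_{f\in M} \frac{\lambda(f)}{q^{s\deg f}} = \sum_{f\in M} \lambda(f) z^{\deg f} = \sum_{n=0}^{\infty} \sum_{f\in M_n} \lambda(f) z^n \\
&= \prod_{f\in P} \Big(1 - \frac{\lambda(f)}{q^{s\deg f}}\Big)^{-1} = \prod_{f\in P} \big(1 - \lambda(f) z^{\deg f}\big)^{-1} = \prod_{d=1}^{\infty} \prod_{f\in P_d} \big(1 - \lambda(f) z^d\big)^{-1}.
\end{aligned} \tag{12.5}$$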
Now how do we come up with suitable choices of λ, other than the constant λ(A) = 1?
And which choices of multiplicative function are most useful? It is easy to concoct
multiplicative functions: simply define λ(f) for f ∈ P arbitrarily, as this will uniquely
extend to the entire monoid M using the multiplicative property. And as long as we
choose $|\lambda(f)| \le 1$ for f ∈ P, λ will satisfy this bound for all f ∈ M.
Our interest is in a very special choice of λ, which will greatly simplify the coefficient
of $z^n$ in (12.5). To this end, we first fix a multiplicative character $\chi \in \widehat{F^\times}$ and additive
character $\psi \in \widehat{F}$ as in Section 11. For an arbitrary monic polynomial
$$f(x) = x^d - a_1x^{d-1} + \cdots + (-1)^{d-1}a_{d-1}x + (-1)^d a_d \in M_d$$
define $\lambda(f) = \chi(a_d)\psi(a_1)$. In particular for f(x) = x − a ∈ $M_1$, we have λ(f) = χ(a)ψ(a).
And of course for the unique monic constant polynomial, we require λ(1) = 1.
Proof. (i) Let f(x) ∈ $M_d$ as above, and $g(x) = x^e - b_1x^{e-1} + \cdots + (-1)^{e-1}b_{e-1}x + (-1)^e b_e \in M_e$.
Then
$$f(x)g(x) = x^{d+e} - (a_1+b_1)x^{d+e-1} + \cdots + (-1)^{d+e}a_db_e \in M_{d+e}.$$
We have
$$\lambda(f)\lambda(g) = \chi(a_d)\psi(a_1)\chi(b_e)\psi(b_1) = \chi(a_db_e)\psi(a_1+b_1) = \lambda(fg).$$
(ii) Since $M_1$ consists of polynomials of the form x − a for a ∈ F, we have
$\sum_{f\in M_1} \lambda(f) = \sum_{a\in F} \chi(a)\psi(a) = G(\chi,\psi)$.
since either χ 6= χ0 or ψ 6= ψ0 .
Lemma 12.7 gives the coefficient of $z^n$ in the series expansion of (12.5), which therefore
(assuming χ and ψ are not both trivial) reduces to a polynomial of degree 1:
Moreover the algebraic conjugates of a (the d roots of f(x)) all contribute this same term
to the sum. Thus the coefficient of $z^n$ on the right side of (12.8) is
$$\sum_{d\mid n} \sum_{f\in P_d} d\,\lambda(f)^{n/d} = \sum_{a\in K} \chi^K(a)\psi^K(a) = G(\chi^K,\psi^K).$$
Now comparing coefficients in (12.8) gives $G(\chi^K,\psi^K) = (-1)^{n-1}G(\chi,\psi)^n$. This is known
as the Hasse-Davenport lifting relation, which can also be rewritten slightly as
Lang [L1] defines G(χ, ψ) to be $-\sum_a \chi(a)\psi(a)$ instead of $\sum_a \chi(a)\psi(a)$, thereby simplifying
this formula and some others. This seems such a natural choice that I was tempted to
follow it in these notes; but ultimately I settled on the choice of most authors for the sake
of consistency.
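The lifting relation $G(\chi^K,\psi^K) = (-1)^{n-1}G(\chi,\psi)^n$ can be tested numerically in the smallest nontrivial case n = 2. The sketch below makes several assumptions for convenience: $F = \mathbb{F}_p$ with p ≡ 3 mod 4, so that $K = \mathbb{F}_{p^2}$ can be modelled as F[i] with $i^2 = -1$; the lifted characters are taken to be $\chi^K = \chi\circ N_{K/F}$ and $\psi^K = \psi\circ \mathrm{Tr}_{K/F}$; and floating-point complex numbers stand in for exact cyclotomic arithmetic. The function name is hypothetical.

```python
import cmath

def hasse_davenport_pair(p: int, k: int, a: int):
    """Return (G(chi^K, psi^K), -G(chi, psi)^2) for chi = chi_k, psi = psi_a on F_p, K = F_{p^2}."""
    if p % 4 != 3:
        raise ValueError("K is modelled as F_p[i], which needs -1 to be a nonsquare mod p")
    g = next(g for g in range(2, p) if len({pow(g, r, p) for r in range(p - 1)}) == p - 1)
    log = {pow(g, r, p): r for r in range(p - 1)}

    def chi(x):  # chi_k on F_p with chi(0) = 0; k is assumed nonzero mod p-1
        x %= p
        return 0 if x == 0 else cmath.exp(2j * cmath.pi * k * log[x] / (p - 1))

    def psi(x):  # psi_a on F_p
        return cmath.exp(2j * cmath.pi * a * (x % p) / p)

    G_F = sum(chi(x) * psi(x) for x in range(p))
    # An element of K is u + v*i; its norm is u^2 + v^2 and its trace is 2u.
    G_K = sum(chi(u * u + v * v) * psi(2 * u) for u in range(p) for v in range(p))
    return G_K, -G_F ** 2

lifted, predicted = hasse_davenport_pair(7, 3, 1)  # k = 3 is the quadratic character of F_7
print(lifted, predicted)
```

For the quadratic character with a = 1, both values should be close to p, in agreement with Corollary 12.11 for n = 2 and p ≡ 3 mod 4.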
Now in the case p is odd, the quadratic character of $F = \mathbb{F}_p$ is $\chi(a) = \left(\frac{a}{p}\right)$, this
being the unique character $\chi \in \widehat{F^\times}$ of order 2. Not surprisingly, the character $\chi^K \in \widehat{K^\times}$
obtained by lifting is nothing other than the quadratic character of K, this being its
unique multiplicative character of order 2. We check that
which is indeed the quadratic character on K; see also Exercise #3.3. As a special case of
Theorem 12.10, we have
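Corollary 12.11. Let $\chi \in \widehat{K^\times}$ be the quadratic character on a finite field of odd
order $q = p^n$. Then
$$G(\chi) = \begin{cases} (-1)^{n-1}\sqrt{q}, & \text{if } p \equiv 1 \bmod 4; \\ (-i)^{n+2}\sqrt{q}, & \text{if } p \equiv 3 \bmod 4. \end{cases}$$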
Exercises 12.
1. According to the Kronecker-Weber Theorem (see Section 4), every abelian extension of Q is
contained in a cyclotomic extension. In particular, every quadratic extension of Q should be
contained in a cyclotomic extension. Here we verify this fact without using the Kronecker-Weber
Theorem. Let F ⊃ Q be a quadratic field extension; that is, [F : Q] = 2.
(a) Show that $F = \mathbb{Q}[\sqrt{d}]$ for some integer $d \not\equiv 0 \bmod 4$. (Hint: Choose θ ∈ F, θ ∉ Q, and let
f(x) ∈ Q[x] be the minimal polynomial of θ over Q. Consider the discriminant of f.)
(b) If d is as in (a), use Corollary 12.11 to show that $\sqrt{d} \in \mathbb{Q}[\zeta_n]$ for some positive integer n. Verify
this first in the case that d is a prime power; then extend to the general case $d \not\equiv 0 \bmod 4$.
for an arbitrary function f : F → F . As we shall soon see, for quadratic polynomials f (x)
the corresponding sums are already expressible in the language of Gauss sums. For more
general functions f : F → F there is much more to be said; and the case where χ is trivial
is already sufficiently interesting. This leads to the definition of exponential sums given
below; and it is worth keeping in mind that both Gauss sums and exponential sums are
special cases of sums having the form suggested above. While neither type of sum (Gauss
or exponential) is a generalization of the other, we shall see that the two types of sum
coincide in the quadratic case.
For an arbitrary function f : F → F, $F = \mathbb{F}_p$, $\zeta = \zeta_p = e^{2\pi i/p}$, we define the
exponential sum of f as
$$S_f := \sum_{a\in F} \zeta^{f(a)} \in \mathbb{Z}[\zeta].$$
For more general finite fields $E = \mathbb{F}_q$, $q = p^e$, we must compose with the trace map
$\mathrm{Tr} = \mathrm{Tr}_{E/F} : E \to F$. Recall that this is the F-linear map defined by
$$\mathrm{Tr}\,a = a + a^p + a^{p^2} + \cdots + a^{p^{e-1}}.$$
noting that values of Sf are still in the same ring Z[ζ], ζ = ζp as before. Of course Sf
depends really only on the multiset of values of f rather than on f itself: in general there
13. Exponential Sums
will be many functions g : E → E such that f and g take each value in E the same number
of times, in which case $S_g = S_f$. In particular, there are q! permutations of E, all having
the same exponential sum $S_f = 0$, this being a consequence of the relation $\sum_{a\in E} \zeta^{\mathrm{Tr}\,a} = 0$,
which remains valid after an arbitrary permutation of terms in the sum. We note that (ii)
only holds in the case of prime fields.
Proof. Conclusion (i) is a simple application of the triangle inequality. In (ii), the argument
above proves the ‘⇐’ in both ‘iff’ statements; and the ‘⇒’ direction in both statements
follows using the fact that $\Phi_p(x) = 1 + x + x^2 + \cdots + x^{p-1}$ is the minimal polynomial of ζ
over Q.
This bound, which we refer to as Weil’s bound, also known as the Hasse-Davenport-
Weil bound, is actually the result of several 20th century mathematicians including André
Weil. The first complete proof of this bound relies on Pierre Deligne’s work on the Weil
conjectures, work completed in 1973 and for which he received the Fields Medal in 1978.
While the Weil conjectures are quite deep and far-reaching, today we have proofs by more
elementary methods; see [LN], [Sc]. We will not present the full proof of Weil’s bound; but
in Section 17 we present some of the key elements in an ‘elementary’ proof. We mention
that Weil’s bound extends also to Galois rings; and that we [MSW] have applied Weil’s
bound to eigenvalues of algebraically defined graphs, both for finite fields and for Galois
rings.
Note the obvious necessity of d ≥ 1 in Weil’s bound; and the necessity of the hypothesis
$p \nmid d$ is discussed in Exercise #2. Weil’s bound is of course useless for larger values
$d > 1+\sqrt{q}$, since in that case it is weaker than the trivial bound of Lemma 13.1(i). In
applications of Weil’s bound, the reader is reminded that every function f : E → E is
representable as a polynomial of degree d ≤ q−1, simply using interpolation. In general,
the strength of Weil’s bound depends on the particular choice of p-th root of unity ζ (recall
that one has φ(p) = p − 1 choices for ζ). Of course for d = 1, Weil’s bound holds with
equality (since polynomial maps of degree 1 are permutations; see Lemma 13.1(ii)). It is
not hard to show that equality also holds for quadratic polynomials:
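These remarks can be explored numerically over a prime field. The sketch below assumes that Weil’s bound has the form $|S_f| \le (d-1)\sqrt{q}$ for a polynomial f of degree d with $p \nmid d$, as the discussion above indicates; it uses $F = \mathbb{F}_p$, floating-point arithmetic, and a sample family $f(x) = x^d + cx$ chosen only for illustration. The function names are hypothetical.

```python
import cmath
import math

def exp_sum(f, p):
    """S_f = sum of zeta^f(a) over a in F_p, where zeta = exp(2*pi*i/p)."""
    return sum(cmath.exp(2j * cmath.pi * (f(a) % p) / p) for a in range(p))

def worst_case(d, p):
    """Largest |S_f| over the sample family f(x) = x^d + c*x, c in F_p (degree d >= 2)."""
    return max(abs(exp_sum(lambda a, c=c: pow(a, d, p) + c * a, p)) for c in range(p))

p = 31
print("degree 1, f(x) = 3x + 1:", abs(exp_sum(lambda a: 3 * a + 1, p)))
for d in range(2, 7):
    # columns: degree, largest |S_f| found, the assumed bound (d-1)*sqrt(p)
    print(d, worst_case(d, p), (d - 1) * math.sqrt(p))
```

For d = 2 the two columns should agree up to rounding, reflecting the equality case for quadratic polynomials; for d = 1 the sum vanishes because f is a permutation.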
Recall that the generalization of (ii) for arbitrary odd q, giving the exact value of the
quadratic Gauss sum G(χ), was found at the end of Section 12.
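Proof of Theorem 13.3. The quadratic Gauss sum is
$$G(\chi) = \sum_{x\in E} \chi(x)\zeta^{\mathrm{Tr}\,x} = \sum_{\chi(y)=1} \zeta^{\mathrm{Tr}\,y} - \sum_{\chi(y)=-1} \zeta^{\mathrm{Tr}\,y} = 1 + 2\sum_{\chi(y)=1} \zeta^{\mathrm{Tr}\,y} = -1 - 2\sum_{\chi(y)=-1} \zeta^{\mathrm{Tr}\,y}$$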
In the case of prime fields $E = \mathbb{F}_p$, we have already observed that those functions
f : E → E attaining $|S_f| = 0$ are just the permutations of E. Similarly, Theorem 13.3
admits a converse which characterizes those functions attaining Weil’s bound $|S_f| = \sqrt{q}$
as exactly the quadratic polynomials, or permuted versions thereof (via Lemma 13.1(ii)):
Proof. If f has the same multiset of values as a quadratic polynomial, then $|S_f| = \sqrt{p}$ by
Lemma 13.1(ii) and Theorem 13.3.
Conversely, suppose $|S_f| = \sqrt{p}$, so that $S_f\overline{S_f} = p = S_g\overline{S_g}$ where $g(x) = x^2$. So the
principal ideals in O = Z[ζ] generated by the algebraic integers $\alpha := S_f$ and $\beta := S_g$ satisfy
$(\alpha)(\overline{\alpha}) = (\beta)(\overline{\beta}) = (p) = (\varepsilon)^{p-1}$ where the ideal (ε) = (1 − ζ) ⊂ O is the only distinct
prime factor of (p); see Theorem 4.4. By unique factorization of ideals we therefore have
$(\alpha) = (\varepsilon)^r$ and $(\overline{\alpha}) = (\varepsilon)^s$ for some nonnegative integers satisfying r + s = p − 1. Since
$N_{\mathbb{Q}[\zeta]/\mathbb{Q}}(\alpha) = N_{\mathbb{Q}[\zeta]/\mathbb{Q}}(\overline{\alpha})$, we have r = s; so $(\alpha) = (\overline{\alpha}) = (\varepsilon)^{(p-1)/2}$. The same argument
applies to β, giving $(\beta) = (\varepsilon)^{(p-1)/2} = (\alpha)$. Thus α and β are associates, and α = uβ for
some unit $u \in O^\times$. Since $|\alpha| = |\beta| = \sqrt{p}$, we obtain |u| = 1. Moreover for every
σ ∈ Aut Q[ζ], Theorem 4.1 gives
so |σ(u)| = 1. By Theorem 4.10, u is a root of unity. Now the only roots of unity in Q[ζ] are
$\pm 1, \pm\zeta, \ldots, \pm\zeta^{p-1}$ by Theorem 4.2, so we have two cases. If $u = \zeta^k$, k ∈ {0, 1, 2, . . . , p−1},
then $S_f = \zeta^k S_g = S_h$ where $h(x) = x^2 + k$. By Lemma 13.1(ii), f has the same multiset
of values as the quadratic polynomial $h(x) = x^2 + k$ and we are done.
In particular, k ≥ 1 and so $\kappa = |S_{f(x)+cx}|$ for some c ∈ F. Since $\kappa^2 = S_{f(x)+cx}\overline{S_{f(x)+cx}} \in \mathbb{Z}[\zeta]$,
$\kappa^2$ itself must be an algebraic integer. However, $\kappa^2 = \frac{p^2}{k} \in \mathbb{Q}$; so by Theorem A3.2(ii),
$\kappa^2 \in \mathbb{Z}$ and k ∈ {1, p}.
If k = p then $|S_{f(x)+cx}| = \sqrt{p}$ for all c ∈ F and so by Theorem 14.2, our conclusion
(a) holds. Otherwise we have k = 1, and $|S_{f(x)-a_1x}| = \kappa = p$ for some $a_1 \in F$. This means
that we have a constant function $f(x) - a_1x = a_0 \in F$, so (b) holds.
The next two technical lemmas will be required for our main results. For every function
f : F → F, we define
$$A_f = \{a \in F : S_{f(x)+ax} \ne 0\}.$$
Lemma 13.6 ([M3]). Suppose $|A_f| \le \frac12(p+1)$. Then $|A_f| = 1$, and f is either
constant or linear; i.e. $f(x) = a_1x + a_0$ for some $a_0, a_1 \in F$.
Proof. By definition (and Lemma 13.1(ii)), $a \in A_f$ iff there exist distinct x, y ∈ F such that f(x)+ax = f(y)+ay.
Thus the subset $-A_f = \{-a : a \in A_f\}$ coincides with the set of all slopes of secants to the
graph of f in $F^2$, i.e. the set of all values of the difference quotient (f(y) − f(x))/(y − x)
for $x \ne y$ in F. The result follows by a theorem of Rédei [Re]; see also [Bl], [LS].
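The identification of $-A_f$ with the set of secant slopes is easy to see in a small example. The following sketch works over $F = \mathbb{F}_7$ and detects nonvanishing of $S_{f(x)+ax}$ numerically; the tolerance is safe for p = 7 because a nonzero element of $\mathbb{Z}[\zeta]$ all of whose conjugates have modulus at most p cannot be smaller than $p^{-(p-2)}$ in modulus. The function names are hypothetical.

```python
import cmath

def exp_sum(values, p):
    """Sum of zeta^v over the given list of values, zeta = exp(2*pi*i/p)."""
    return sum(cmath.exp(2j * cmath.pi * (v % p) / p) for v in values)

def A_set(f, p, tol=1e-9):
    """A_f = {a in F_p : S_{f(x)+ax} != 0}, detected numerically."""
    return {a for a in range(p)
            if abs(exp_sum([f(x) + a * x for x in range(p)], p)) > tol}

def secant_slopes(f, p):
    """All values of (f(y) - f(x)) / (y - x) for x != y in F_p."""
    return {(f(y) - f(x)) * pow((y - x) % p, -1, p) % p
            for x in range(p) for y in range(p) if x != y}

p = 7
for name, f in [("3x+1", lambda x: 3 * x + 1), ("x^2", lambda x: x * x), ("x^3", lambda x: x ** 3)]:
    A = A_set(f, p)
    print(name, sorted(A), sorted((-a) % p for a in A), sorted(secant_slopes(f, p)))
```

For the linear function, $A_f$ is a single element, consistent with Lemma 13.6.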
Proof. We use geometric terminology for the affine plane $F^2$, which has p vertical lines
of the form x = a (a ∈ F) and $p^2$ non-vertical lines of the form y = mx + b (m, b ∈ F).
Consider the point set $O = \{(g(t), t^2) : t \in F\} \subset F^2$. We will soon show that O consists
of p distinct points. First observe that if the equation $t^2 = mg(t) + b$ has more than two
solutions for t ∈ F, then the function $h(t) = t^2 - mg(t)$ attains the value b more than twice,
contrary to Cavior’s Theorem 13.4. This shows that
(13.8) for all m, b ∈ F, there are at most two values of t ∈ F such that the point
$(g(t), t^2)$ lies on the line y = mx + b.
If $(g(t_1), t_1^2) = (g(t_2), t_2^2)$, $t_1 \ne t_2$, then $t_2 = -t_1 \ne 0$. In this case, since g is not
constant, there exists $t_3 \in F$ such that $g(t_3) \ne g(t_1)$. Let $m = (t_3^2 - t_1^2)/(g(t_3) - g(t_1))$,
$b = t_1^2 - mg(\pm t_1) = t_3^2 - mg(t_3)$; then the line y = mx + b passes through $(g(t_i), t_i^2)$ for
i = 1, 2, 3, contrary to (13.8). This proves (13.9).
Now let $\ell \subset F^2$ be any line, vertical or non-vertical. By (13.8), $|\ell \cap O| = 0$, 1 or 2; and
we call ℓ a passant, tangent or secant accordingly. Each point P ∈ O lies on exactly
p + 1 lines, of which p − 1 are secants, and so P lies on exactly two tangents, one of which
(we claim) must be vertical. For each m ∈ F, the function $h(t) = t^2 - mg(t)$ attains some
value b ∈ F exactly once, and each other value in F either 0 or two times, again by Cavior’s
Theorem; so among the p lines of slope m, exactly one (the line y = mx + b) is a tangent
line. Since there are p choices of m, there are exactly p non-vertical tangents; hence the
remaining p tangents must be vertical. Thus
Proof. If $|S_{af_1+bf_2}| = p$ then $af_1 + bf_2$ is constant; and since $f_i(0) = 0$, this means
$af_1 + bf_2 = 0$. Since $f_1$ and $f_2$ are linearly independent, this forces a = b = 0. Thus
$|S_{af_1+bf_2}| \in \{0, \sqrt{p}\}$ whenever $(a, b) \ne (0, 0)$.
Consider the case that $f_2$ is a permutation. In this case we may assume $f_2(x) = x$;
otherwise replace $f_i$ by $f_i \circ f_2^{-1}$ for i = 1, 2. Now $|S_{f_1(x)+bx}| \in \{0, \sqrt{p}\}$ for all b ∈ F; so by
Theorem 13.5, $f_1(x) = a_1x^2 + b_1x$ for some $a_1, b_1 \in F$ with $a_1 \ne 0$. The result follows in
this case.
Now if the two-dimensional space $\langle f_1, f_2\rangle_F$ of functions F → F contains a permutation,
then by change of basis we reduce to the previous case. We may therefore assume
$\langle f_1, f_2\rangle_F$ contains no permutation, i.e. $|S_{af_1+bf_2}| = \sqrt{p}$ for all $(a, b) \ne (0, 0)$. In particular
$|S_{f_1}| = \sqrt{p}$ so by Cavior’s Theorem 13.4, there exists a permutation σ : F → F
such that $f_1(x) = a_1\sigma(x)^2 + b_1\sigma(x)$, $a_1 \ne 0$; moreover σ(0) = 0 (thus ensuring the value
$f_1(0) = 0$). Furthermore, there is no loss of generality in assuming that σ(x) = x and
$a_1 = 1$; so $|S_{x^2+b_1x+bf_2(x)}| = \sqrt{p}$ for all b ∈ F. Define $h(x) = f_2(x - \frac{b_1}{2}) - f_2(-\frac{b_1}{2})$; then
$|S_{x^2+bh(x)}| = |S_{x^2+b_1x+bf_2(x)}| = \sqrt{p}$ for all b ∈ F, so h : F → F is bijective by Lemma 13.7.
This means that $f_2$ is a permutation after all.
Proof. By hypothesis,
$$p = \Big|\sum_{x\in F} \zeta^{ax^2+bx+cf(x)}\Big|^2 = \sum_{x,y\in F} \zeta^{a(x^2-y^2)+b(x-y)+c(f(x)-f(y))} = \sum_{y,t\in F} \zeta^{2aty+at^2+bt+c(f(y+t)-f(y))}$$
Now suppose the desired conclusion fails, i.e. f is not representable as a polynomial of
degree ≤ 1; we seek a contradiction. Evidently the first-order difference of f is not constant,
so there exists x ∈ F such that
$$f(x+1) - f(x) \ne m$$
Exercises 13.
1. Give a direct proof (i.e. without using Gauss sums) that the exponential sum $S_f$ for $f(x) = ax^2$
on $E = \mathbb{F}_q$, q odd, $a \ne 0$, has modulus $|S_f| = \sqrt{q}$. (Hint: Expand $|S_f|^2 = S_f\overline{S_f}$ as a double
sum, and use orthogonality of additive characters.)
2. Let $E = \mathbb{F}_q$ where $q = p^e$, p prime, e ≥ 2; and consider the function f : E → E, $a \mapsto a^p - a$.
Evaluate $S_f$ and find conditions under which Weil’s bound of Theorem 13.2 fails. This points to
the necessity of the hypothesis gcd(d, q) = 1 in that result.
Proposition 14.1. If char F is odd, then every quadratic polynomial f (x) ∈ F [x]
represents a planar function on F .
14. Affine Planes
Proof. Let $f(x) = ax^2 + bx + c$ where a, b, c ∈ F with $a \ne 0$. Then for all nonzero
m ∈ F, $\Delta_m f(x) = f(x+m) - f(x) = 2amx + am^2 + bm$ is a polynomial of degree 1 (since $2am \ne 0$ in odd
characteristic) and hence a permutation of F.
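The planarity condition used in this proof (that $x \mapsto f(x+m) - f(x)$ is a permutation for every nonzero m) is simple to test by machine over a prime field. The sketch below assumes $F = \mathbb{F}_p$; the names `is_planar` and `count_planar` are hypothetical. The exhaustive count for small p can be compared with the number $(p-1)p^2$ of quadratic polynomials, in line with the main result of this Section that planar functions over prime fields are quadratic.

```python
from itertools import product

def is_planar(f, p: int) -> bool:
    """True if x -> f(x+m) - f(x) is a permutation of F_p for every nonzero m."""
    return all(
        len({(f((x + m) % p) - f(x)) % p for x in range(p)}) == p
        for m in range(1, p)
    )

def count_planar(p: int) -> int:
    """Number of planar functions F_p -> F_p, by exhaustive search over all p^p value tables."""
    return sum(is_planar(lambda x, v=v: v[x % p], p) for v in product(range(p), repeat=p))

p = 7
print(is_planar(lambda x: 3 * x * x + 2 * x + 5, p))  # quadratic
print(is_planar(lambda x: x ** 3, p))                  # cubic
print(count_planar(5), (5 - 1) * 5 ** 2)
```

The exhaustive search is only feasible for very small p, since it visits all $p^p$ functions.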
The interest in planar functions (and the explanation for their name) derives from the
fact that every planar function f : F → F gives rise to a finite affine plane $A_f$ of order q
(and hence also a projective plane of the same order). This plane has $q^2$ points $(x, y) \in F^2$
and q(q+1) lines $\tilde\ell_{m,h}$ where m ∈ F ∪{∞}, h ∈ F, defined as follows:
• ‘Vertical’ lines are point sets of the form $\tilde\ell_{\infty,k} := \{k\}\times F = \{(k, y) : y \in F\}$ for k ∈ F.
Each such line is denoted also by its equation x = k.
• ‘Nonvertical’ lines are point sets of the form $\tilde\ell_{m,h} := \{(x, f(x+m)+h) : x \in F\}$ where
m, h ∈ F. Each such line is denoted also by its equation y = f(x+m)+h.
One readily verifies that the resulting structure is an affine plane of order q. For example
if $m \ne n$ in F, the fact that $\tilde\ell_{m,h} \cap \tilde\ell_{n,k}$ contains a unique point (x, y) follows from the fact
that $\Delta_{m-n}f(x') = f(x+m) - f(x+n) = k - h$ has a unique solution for x′ := x+n in F.
Unfortunately, however, if one uses a quadratic polynomial $f(x) = ax^2+bx+c \in F[x]$
($a \ne 0$) as the choice of planar function, the resulting plane is not new; it is just a
disguised version of the classical plane. To see this, observe that $(x, y) \in \tilde\ell_{m,h}$ iff
$y - ax^2 = (2am+b)x + am^2 + bm + c + h$ iff $(x, y-ax^2) \in \ell_{2am+b,\,am^2+bm+c+h}$. (The description of vertical lines does not
change under this recoordinatization.) The map $(x, y) \mapsto (x, y-ax^2)$ gives an isomorphism
from $A_f$ to the classical plane of order q.
A great deal of effort has been expended on looking for new planar functions, in
the search for new nonclassical finite projective planes. Some non-quadratic planar func-
tions [DO] have been known since 1968 (including those constructed in Exercise #1);
however the planes constructed from them belong to a large recognized class of planes
known as translation planes. In 1997, Coulter and Matthews [CM] published a construc-
tion of planar functions, which give rise to planes which are not classical, nor are they
more general translation planes. Their construction has $q = 3^e$ with e ≥ 4 (in particular,
the order is not prime). The main result of this Section is that for prime order fields,
every planar function is quadratic and so the associated plane is classical. This result was
obtained independently, and almost simultaneously, by three teams of researchers: Rónyai
and Szőnyi [RS], Hiramine [Hi], and Gluck [Gl]. We present here the proof by Gluck
because it is arguably the least technical, and because it beautifully demonstrates the
natural role played by cyclotomic fields in finite geometry; but also because his approach
lends itself naturally to certain generalizations [M3] which we will describe in Section 15.
In the following Theorem 14.2, the equivalence (a)↔(d) appears explicitly in [Gl], [RS]
and [Hi]. The equivalence of these statements with (b) and (c), which is easily inferred
from Gluck [Gl], will be useful in Section 15.
Proof. The implication (d)⇒(a) follows from Proposition 14.1; and the implication (b)⇒(c)
is trivial. It remains to prove (a)⇒(b) and (c)⇒(d). We start by assuming (a).
Let f : F → F be a planar polynomial over $F = \mathbb{F}_p$, p an odd prime. Consider the
p × p matrix $M = \big(\zeta^{f(x-y)} : x, y \in F\big)$. Denoting the conjugate-transpose of M by $M^*$,
the (x, y)-entry of $MM^*$ is
$$\sum_{z\in F} \zeta^{f(x-z)}\,\overline{\zeta^{f(y-z)}} = \sum_{z\in F} \zeta^{f(x-z)-f(y-z)} = \sum_{z\in F} \zeta^{(\Delta_{x-y}f)(y-z)} = \begin{cases} 0, & \text{if } x \ne y; \\ p, & \text{if } x = y. \end{cases}$$
$$\tilde f : F \to F, \quad x \mapsto f(mx+b) + m'x + b'$$
The only known planes of prime order are the classical planes constructed from Fp ;
and it is tempting to conjecture that planes of prime order must be classical. (But once
again, keep in mind The Streetlight Effect.) Theorem 14.2 is the strongest result known in
this direction. The planes $A_f$ constructed from planar functions f share a special feature
in common with the classical planes: Each of these planes admits an elementary abelian
group of automorphisms of order $p^2$ which transitively permutes the points: For each
$(r, s) \in F^2$, the translation map $(x, y) \mapsto (x+r, y+s)$ takes $\ell_{\infty,a}$ to $\ell_{\infty,a+r}$, and takes $\ell_{m,b}$
to $\ell_{m-r,b+s}$ for $m \ne \infty$. Prior to Theorem 14.2, it was already known (using a combination
of geometric and group-theoretic arguments, which we omit here) that any affine plane of
prime order p, whose automorphism group has order divisible by $p^2$, must be of the form
$A_f$ for some planar polynomial f. Thus Theorem 14.2 yields the important consequence
Corollary 14.3. Any affine plane of prime order p whose automorphism group has
order divisible by $p^2$, must be classical.
Exercises 14.
1. Let $E = \mathbb{F}_q$, $q = p^e$, p an odd prime. Fix an automorphism σ ∈ Aut E; thus $\sigma(x) = x^{p^k}$ for some
k ∈ {0, 1, 2, . . . , e − 1}. Show that the function f : E → E, $f(x) = x\sigma(x) = x^{p^k+1}$ is a planar
function on E iff $\frac{e}{\gcd(k,e)}$ is odd. (Note that σ ∈ Aut E has order $\frac{e}{\gcd(k,e)}$.)
2. Consider the quadratic extension E ⊃ F of fields of order $q^2$ and q, where q is odd. Recall that
σ : E → E, $\sigma(x) = x^q$ is the automorphism of order 2 with fixed field F. Define a new binary
operation on E by
$$x * y = \begin{cases} xy, & \text{if } x \text{ is a square (zero or nonzero square)}; \\ x\sigma(y), & \text{if } x \text{ is a nonsquare}. \end{cases}$$
(a) Prove that ‘∗’ is associative and left-distributive, i.e. (x ∗ y) ∗ z = x ∗ (y ∗ z) and x ∗ (y + z) =
x ∗ y + x ∗ z for all x, y, z ∈ E.
(b) Prove that the nonzero elements of E form a nonabelian group under the operation ‘∗’.
(c) Although ‘∗’ is not right-distributive, this is compensated for by a weaker property which
you should show: if a, b, c ∈ E with $a \ne b$, then the equation a ∗ x = b ∗ x + c has a unique
solution x ∈ E.
(d) Show that the following structure is an affine plane of order $q^2$, where we essentially replace
ordinary multiplication by ‘∗’. Take points to be ordered pairs $(x, y) \in E^2$. There are two
types of lines:
• $q^2$ ‘vertical’ lines x = k, i.e. point sets {(k, y) : y ∈ E} for k ∈ E; and
• $q^4$ ‘nonvertical’ lines y = m ∗ x + b, i.e. point sets {(x, m∗x + b) : x ∈ E}, where m, b ∈ E.
This construction gives one of the most standard classes of translation planes.
15. Nets
A $k$-net of order $n$ is an incidence system of $n^2$ points and $kn$ subsets of the points called
lines, such that
If one hopes to build an affine plane of order $n$, one might reasonably try to do so incrementally
by starting with $n^2$ points and adding one parallel class at a time, hoping to see
how far one might go. The first two parallel classes (which one may take to be ‘horizontal’
and ‘vertical’ lines) are trivially constructed. For every n > 2 it is possible to find a third
parallel class of lines extending this to a 3-net of order n. Now the construction process
becomes more delicate. While there exist 3-nets of order 6, none of them are extendible
to 4-nets (a fact known already to Euler); and this implies the nonexistence of an affine
plane of order 6. Although there exists a 5-net of order 4 (affine plane of order 4), there
exist 3-nets of order 4 which are not extendible to any 4-net or 5-net of order 4. Here is
an example of a maximal 3-net of order 4, i.e. one which cannot be extended to a 4-net:
And although 4-nets of order 10 have been constructed, it is not known whether
there exists a 5-net of order 10 (no 11-net of order 10 exists, by Lam et al.).
Here we describe an algebraic approach to studying finite nets, which (in our view) is
more promising than other approaches that have been tried. To simplify the exposition,
we assume here that n = p is prime. Our goal is to show that every plane of prime order p
is classical (a major open problem).
To introduce this approach, we first consider the nets of order 3 shown above. Each
successive parallel class may be described by a triple of matrices, starting with A0 , A1 , A2
for the first parallel class; B0 , B1 , B2 for the second parallel class, etc., where
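$$A_0 = \begin{bmatrix} 1&1&1\\ 0&0&0\\ 0&0&0 \end{bmatrix}, \quad B_0 = \begin{bmatrix} 1&0&0\\ 1&0&0\\ 1&0&0 \end{bmatrix}, \quad C_0 = \begin{bmatrix} 1&0&0\\ 0&0&1\\ 0&1&0 \end{bmatrix}, \quad D_0 = \begin{bmatrix} 1&0&0\\ 0&1&0\\ 0&0&1 \end{bmatrix},$$
$$A_1 = \begin{bmatrix} 0&0&0\\ 1&1&1\\ 0&0&0 \end{bmatrix}, \quad B_1 = \begin{bmatrix} 0&1&0\\ 0&1&0\\ 0&1&0 \end{bmatrix}, \quad C_1 = \begin{bmatrix} 0&1&0\\ 1&0&0\\ 0&0&1 \end{bmatrix}, \quad D_1 = \begin{bmatrix} 0&0&1\\ 1&0&0\\ 0&1&0 \end{bmatrix},$$
$$A_2 = \begin{bmatrix} 0&0&0\\ 0&0&0\\ 1&1&1 \end{bmatrix}, \quad B_2 = \begin{bmatrix} 0&0&1\\ 0&0&1\\ 0&0&1 \end{bmatrix}, \quad C_2 = \begin{bmatrix} 0&0&1\\ 0&1&0\\ 1&0&0 \end{bmatrix}, \quad D_2 = \begin{bmatrix} 0&1&0\\ 0&0&1\\ 1&0&0 \end{bmatrix}.$$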
Here the $p^2$ points may be viewed as the pairs $(i, j)$ with $i, j \in F$ where $F = \mathbb{F}_p$; and each
matrix has 1’s in the positions of a line, with the other entries equal to zero. (The choice
of $\mathbb{F}_3$ as an index set here is purely a matter of convenience. Any set of $p = 3$ symbols
could be used in its place.) Note that
(15.1) $A_0 + A_1 + A_2 = B_0 + B_1 + B_2 = C_0 + C_1 + C_2 = D_0 + D_1 + D_2 = J$ where
$J = J_3$, the $3 \times 3$ matrix of 1’s.
Denote by $\mathcal{C}_k$ the $F$-span of the matrices from the first $k$ parallel classes, a $k$-net of order $p$.
We are interested in the dimensions of the sequence
(15.2) $0 = \mathcal{C}_0 \le \mathcal{C}_1 \le \mathcal{C}_2 \le \cdots \le \mathcal{C}_{p+1}$.
In the case $p = 3$ shown above, the subspaces in (15.2) have dimensions 0,3,5,6,6. Now it
is the differences $\dim \mathcal{C}_k - \dim \mathcal{C}_{k-1}$ which are of interest. In our example, this is the
sequence 3,2,1,0. It is not by coincidence that these values form an arithmetic sequence:
for every known plane of prime order $p$, these values always form the sequence
$p, p-1, p-2, \ldots, 2, 1, 0$, independently of the order in which we list the $p+1$ parallel classes. We
pose
(15.3) Conjecture [M1]: For any $k$-net of prime order $p$ and $(k-1)$-subnet
thereof, with $k \ge 1$, the subspaces $\mathcal{C}_{k-1} \le \mathcal{C}_k$ constructed as above have
dimensions satisfying
$$\dim \mathcal{C}_k - \dim \mathcal{C}_{k-1} \ge p - k + 1.$$
Theorem 15.4 [M1]. (i) Subnets of classical planes of prime order satisfy Conjec-
ture 15.3 with equality.
(ii) If Conjecture 15.3 holds for a given prime p, then all planes of order p are classical.
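The dimension sequence for a classical plane of small prime order can be computed directly. The sketch below is illustrative only: it labels the $p+1$ parallel classes of the classical plane by the functions $x$, $y$ and $x + my$ for $m = 1, \ldots, p-1$ (one convenient choice of coordinates, assumed here), builds the characteristic vectors of the lines, and computes ranks over $\mathbb{F}_p$ by Gaussian elimination. All function names are hypothetical.

```python
def rank_mod_p(rows, p):
    """Rank over F_p of a list of integer row vectors, by Gaussian elimination."""
    rows = [[v % p for v in r] for r in rows]
    rank = 0
    for col in range(len(rows[0])):
        pivot = next((i for i in range(rank, len(rows)) if rows[i][col]), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        inv = pow(rows[rank][col], -1, p)  # modular inverse (Python 3.8+)
        rows[rank] = [(v * inv) % p for v in rows[rank]]
        for i in range(len(rows)):
            if i != rank and rows[i][col]:
                c = rows[i][col]
                rows[i] = [(a - c * b) % p for a, b in zip(rows[i], rows[rank])]
        rank += 1
    return rank

def classical_parallel_classes(p):
    """Characteristic vectors of the lines of the classical plane, one list per parallel class."""
    points = [(x, y) for x in range(p) for y in range(p)]
    labels = [lambda x, y: x, lambda x, y: y]
    labels += [lambda x, y, m=m: (x + m * y) % p for m in range(1, p)]
    return [
        [[1 if lab(x, y) == c else 0 for (x, y) in points] for c in range(p)]
        for lab in labels
    ]

def dimension_sequence(p):
    """dim C_1, dim C_2, ..., dim C_{p+1} for the classical plane of prime order p."""
    rows, dims = [], []
    for parallel_class in classical_parallel_classes(p):
        rows.extend(parallel_class)
        dims.append(rank_mod_p(rows, p))
    return dims

if __name__ == "__main__":
    for p in (3, 5):
        dims = dimension_sequence(p)
        diffs = [b - a for a, b in zip([0] + dims, dims)]
        print(p, dims, diffs)
```

For $p = 3$ the four classes are those of the net $\{(x, y, x+y, x-y)\}$, since $x + 2y = x - y$ in $\mathbb{F}_3$.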
It is known (see e.g. [M1]) that $\dim \mathcal{C}_{p+1} = \frac12 p(p+1)$ for every affine plane of prime order $p$.
Since $p + (p-1) + \cdots + 2 + 1 + 0 = \frac12 p(p+1)$, the conjectured lower bound (15.3) would
require equality in the case of subnets of planes of order $p$; so if one’s sole interest is in
classifying planes of prime order $p$, then (15.3) could be replaced by the conjecture that for
subnets of planes of order $p$, $\dim \mathcal{C}_k - \dim \mathcal{C}_{k-1} = p-k+1$. However, there are many nets
of prime order where strict inequality holds in (15.3); and it is conceivable that proving
the lower bound (15.3) in the more general setting of nets (not necessarily extendible to
planes) may be more natural or easier than proving equality. In Theorem 15.7 below, we
prove the first nontrivial case of Conjecture 15.3, the case $k = 3$. But first we rephrase the
descriptions both of the nets, and of the dimensions of the spaces $\mathcal{C}_i$.
The $p^2$ points of a $k$-net can be taken to be $k$-tuples $x = (x_1, x_2, \ldots, x_k) \in F^k$, $F = \mathbb{F}_p$,
where $x_i \in F$ indexes which line of the $i$th parallel class passes through the point $x$. Thus in
our example above, $x_1 = 0$, 1 or 2 according as $A_0$, $A_1$ or $A_2$ has entry 1 in the position corresponding
to the point $x$ (and similarly for the other coordinates); and the 4-net is seen to be $\{(x, y, x+y, x-y) : x, y \in F\}$ where $F = \mathbb{F}_3$. In
general,
We omit the proof of Theorem 15.5, which is straightforward. The notion of a k-net of
order n appears in many guises in the combinatorial literature (particularly as a set of k−2
mutually orthogonal Latin squares of order $n$, an orthogonal array $OA(k, n)$, a transversal
design $TD(k, n)$. See [ACD] for details, noting that our set of $n^2$ vectors of length $k$ above,
when transposed, form the columns of an OA(k, n).)
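The orthogonal-array point of view is easy to test by machine. The sketch below assumes the usual condition that a set of $n^2$ vectors of length $k$ is a $k$-net precisely when every pair of coordinates takes each of the $n^2$ possible value pairs exactly once; this is our reading of the standard form, not a quotation of Theorem 15.5. The helper names and the coordinate functions $x, y, x+y, x+2y, \ldots$ for the classical net are illustrative choices.

```python
from itertools import combinations

def is_net(vectors, n):
    """Check the orthogonal-array condition for a list of n^2 tuples over n symbols."""
    vectors = list(vectors)
    if len(vectors) != n * n:
        return False
    k = len(vectors[0])
    return all(
        len({(v[i], v[j]) for v in vectors}) == n * n
        for i, j in combinations(range(k), 2)
    )

def classical_net(p, k):
    """A classical k-net of prime order p (2 <= k <= p+1): coordinates x, y, x+y, x+2y, ..."""
    return [
        (x, y) + tuple((x + c * y) % p for c in range(1, k - 1))
        for x in range(p)
        for y in range(p)
    ]

def puncture(vectors, i):
    """Delete coordinate i from every vector, giving a (k-1)-subnet."""
    return [v[:i] + v[i + 1:] for v in vectors]

if __name__ == "__main__":
    plane = classical_net(5, 6)
    print(is_net(plane, 5))
    print(all(is_net(puncture(plane, i), 5) for i in range(6)))
    # The same recipe over Z/4 fails, because 2 is not invertible mod 4.
    bad = [(x, y, (x + y) % 4, (x + 2 * y) % 4) for x in range(4) for y in range(4)]
    print(is_net(bad, 4))
```

A 3-net in this form is the same data as a Latin square: the third coordinate, read as a function of the first two, fills the square.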
Now consider a k-net N of order n, k > 3. We will often write N = Nk to emphasize
that it is a $k$-net. We may assume $N_k \subseteq F^k$ has the form described in Theorem 15.5(i).
Deleting the ith coordinate from all vectors in Nk gives a (k − 1)-net of the same order n,
which we call a (k − 1)-subnet of the original net Nk . A k-net N has exactly k choices
of (k − 1)-subnet, each of which is formed by omitting one of the k parallel classes of lines
in $N$. Note that a 2-net is necessarily $N_2 = F^2$. The classical (or desarguesian) affine
planes of prime order p, as described in the notation of Theorem 15.5, have the form
up to isomorphism. Moreover, any k-net obtained from one of these classical planes by
deleting (‘puncturing’) p + 1 − k of its coordinates, gives a k-subnet which we also call
$$\dim \mathcal{C}_k + \dim V_k = kp$$
where $\mathcal{C}_k$ is the column space of $M$ over $F$. (Although $\mathcal{C}_k$ has the same dimension as the
row space of $M$, the column space is interpreted more naturally, this being the subspace of
$F^{p^2}$ spanned by the characteristic vectors of the lines of the net, as in our original example
for $p = 3$.) In terms of the spaces $V_k$, we may reformulate the conjectured inequality (15.3)
as $\dim V_k - \dim V_{k-1} \le k - 1$ for $k \ge 1$.
Now consider the constant function $\mathbf{1} : F \to F$, $\mathbf{1}(a) = 1$ for all $a \in F$; and observe
that $(a_1\mathbf{1}, a_2\mathbf{1}, \ldots, a_k\mathbf{1}) \in V_k$ for all choices of scalars $a_i \in F$ satisfying $a_1 + a_2 + \cdots + a_k = 0$.
These particular $k$-tuples of functions form a $(k-1)$-dimensional subspace $V_k^{(0)} \le V_k$, and
we obtain a splitting
$$V_k = V_k^{(0)} \oplus U_k$$
where $U_k$ is the subspace consisting of all $(f_1, f_2, \ldots, f_k) \in V_k$ such that $f_1(0) = f_2(0) = \cdots = f_k(0) = 0$.
This splitting is obtained by noting that the map $V_k \to V_k^{(0)}$, $(f_1, f_2, \ldots, f_k) \mapsto (f_1(0)\mathbf{1}, f_2(0)\mathbf{1}, \ldots, f_k(0)\mathbf{1})$
is a projection with $U_k$ as its kernel. This simplifies Conjecture 15.3
further, leading to the equivalent form
(15.6) Conjecture: For any $k$-net of prime order $p$ and $(k-1)$-subnet thereof,
$k \ge 2$, the subspaces $U_{k-1} \le U_k$ constructed as above have dimensions
satisfying
$$\dim U_k - \dim U_{k-1} \le k - 2.$$
A1 − A2 + B1 − B2 − C1 + C2 = 0,
giving (ι, ι, −ι) ∈ U3 where ι(a) = a for all a ∈ F3 ; and in this case U3 is one-dimensional
spanned by (ι, ι, −ι). The first nontrivial case of (15.6) says that for every 3-net of prime
order, $\dim U_3 \le 1$. This is verified as follows.
Theorem 15.7 ([M1,M3]). Conjectures (15.3) and (15.6) hold for k = 3. In fact
for any 3-net of prime order $p$, we have $\dim U_3 \le 1$; and equality holds iff the net is
cyclic, i.e. isomorphic to {(x, y, x+y) : x, y ∈ Fp }.
since f1 (a) + f2 (b) + f3 (c) = 0 for all (a, b, c) ∈ N3 . Multiplying both sides by Sf3 , and
then using the same argument for the other pairs of subscripts in {1, 2, 3}, gives
Now if any of the exponential sums Sfi is nonzero, they must all be nonzero and we obtain
|Sf1| = |Sf2| = |Sf3| = p; but then by Lemma 13.1(i), each of the functions fi is constant.
But since (f1 , f2 , f3 ) ∈ U3 , we have f1 (0) = f2 (0) = f3 (0) = 0. This forces f1 = f2 = f3 = 0, a
contradiction.
Thus Sf1 = Sf2 = Sf3 = 0. By Lemma 13.1(ii), each of the functions fi : F → F is a
permutation. Without loss of generality, f1 (x) = f2 (x) = x and f3 (x) = −x for all x ∈ F ;
for if not, then we simply relabel the p lines in each of the three parallel classes such that
this is the case. Now every (a, b, c) ∈ N3 satisfies
that is, N3 = {(a, b, a+b) : a, b ∈ F }. It remains only to verify that for this particular
3-net, every (g1 , g2 , g3 ) ∈ U3 is a scalar multiple of (f1 , f2 , f3 ). For this, we may assume
g1 (1) = 0; otherwise replace (g1 , g2 , g3 ) by (g1 , g2 , g3 ) − g1 (1)(f1 , f2 , f3 ). But we now have
g1 (0) = g1 (1) = 0, so g1 is no longer a permutation of F , and the argument above then
forces g1 = g2 = g3 = 0 as required.
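The two facts about exponential sums used in this proof can be confirmed by brute force for a small prime. The sketch below assumes the definition $S_f = \sum_{x \in F} \zeta^{f(x)}$ with $\zeta = e^{2\pi i/p}$ (the definition from Section 13 is not restated here, so this is an assumption), and enumerates every function $f : \mathbb{F}_p \to \mathbb{F}_p$ as a tuple of values. The function names are illustrative.

```python
import cmath
from itertools import product

def exp_sum(values, p):
    """S_f for the function f given by its tuple of values (f(0), ..., f(p-1))."""
    zeta = cmath.exp(2j * cmath.pi / p)
    return sum(zeta ** (v % p) for v in values)

def classify(p, tol=1e-9):
    """Return (functions with S_f = 0, functions with |S_f| = p) among all f: F_p -> F_p."""
    zero_sum, full_modulus = [], []
    for values in product(range(p), repeat=p):
        s = exp_sum(values, p)
        if abs(s) < tol:
            zero_sum.append(values)
        if abs(abs(s) - p) < tol:
            full_modulus.append(values)
    return zero_sum, full_modulus

if __name__ == "__main__":
    zero_sum, full_modulus = classify(5)
    # S_f = 0 should single out the permutations, |S_f| = p the constant functions.
    print(all(len(set(v)) == 5 for v in zero_sum), len(zero_sum))
    print(all(len(set(v)) == 1 for v in full_modulus), len(full_modulus))
```

The enumeration visits $p^p$ functions, so it is only practical for very small $p$.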
At this time we do not have a proof of Conjecture (15.3) or (15.6) for k = 4; but we
have some partial results. Our analysis of 4-nets begins with the following extension of
Theorem 15.7.
Lemma 15.8 ([M3]). Let N4 be a 4-net of prime order p, and let (f1 , f2 , f3 , f4 ) ∈ U4 .
Then either
(i) three or more of f1 , f2 , f3 , f4 are permutations, or
(ii) |Sf1 | = |Sf2 | = |Sf3 | = |Sf4 | > 0.
Proof. As usual, let $\zeta = \zeta_p$. For all $(x_1, x_2, x_3, x_4) \in N_4$ we have $f_1(x_1)+f_2(x_2)+f_3(x_3)+f_4(x_4) = 0$,
so $\zeta^{f_1(x_1)+f_2(x_2)} = \zeta^{-f_3(x_3)-f_4(x_4)}$. Summing over all $(x_1, x_2, x_3, x_4) \in N_4$ gives
and
$$S_{f_1}\big(|S_{f_2}|^2 - |S_{f_3}|^2\big) = S_{f_1}\big(|S_{f_2}|^2 - |S_{f_4}|^2\big) = 0.$$
Now we may suppose at least one of the exponential sums $S_{f_i}$ is nonzero, otherwise case (i)
holds. So without loss of generality, $S_{f_1} \ne 0$. Then we directly obtain $|S_{f_2}| = |S_{f_3}| = |S_{f_4}|$.
If the latter three exponential sums vanish, we obtain case (i); otherwise by symmetry we
obtain case (ii).
Lemma 15.9 ([M3]). Suppose that N4 is a 4-net of prime order p for which there
exist linearly independent 4-tuples (f1 , f2 , f3 , f4 ), (0, x, x, x) ∈ U4 . Then either
(i) $|S_{f_1}| = |S_{f_2}| = |S_{f_3}| = |S_{f_4}| = \sqrt p$ and $f_2, f_3, f_4$ are quadratic polynomials; or
(ii) Sf1 = 0 and at least two of f2 , f3 , f4 are scalar multiples of ι, ι(x) = x.
or
|Sf2 (x)+ax | = |Sf3 (x)+ax | = |Sf4 (x)+ax | = |Sf | > 0.
By Theorem 13.5, and using the fact that f2 (0) = f3 (0) = f4 (0) = 0, we obtain either
conclusion (i) or f2 = f3 = f4 = aι for some a ∈ F ; but in the latter case, we get
(f1 , 0, 0, 0) = (f1 , f2 , f3 , f4 ) − a(0, ι, ι, ι) ∈ U4 , forcing f1 = 0, a contradiction.
Hence we may assume that Sf1 = 0, so f1 is a permutation. Without loss of gener-
ality f1 = ι (otherwise relabel lines in the first parallel class so that this is the case). By
Lemma 15.8, the three sets Af2 , Af3 , Af4 (see Lemma 13.6) are mutually disjoint. Without
loss of generality, $|A_{f_2}| \le |A_{f_3}| \le |A_{f_4}|$; otherwise permute the last three parallel classes
such that this inequality holds. Thus $|A_{f_2}| \le |A_{f_3}| \le \frac13 p \le \frac12(p-1)$. By Lemma 13.6
and the condition f2 (0) = f3 (0) = 0, we have f2 = aι and f3 = bι for some a, b ∈ F , so
conclusion (ii) holds.
Recall that a 4-net N4 has four 3-subnets, each formed by deleting one of the four
parallel classes of lines from N4 (or equivalently, by puncturing one of the four coordinates).
Theorem 15.10 ([M3]). Let N4 be a 4-net of prime order p. Then the number of
its cyclic 3-subnets is always 0, 1, 3 or 4, but never exactly 2.
Proof. We must show that if N4 has at least two cyclic 3-subnets, then it has a third.
Without loss of generality, parallel classes 1,2,3 of N4 form a cyclic 3-subnet; and so do par-
allel classes 2,3,4. After relabelling lines in each parallel class, we have (f1 , f2 , f3 , 0), (0, x,
x, x) ∈ U4 where f1 , f2 , f3 are permutations of F . By Lemma 15.9, we may suppose that
f2 (x) = ax for some a ∈ F . Now
Remark : Theorem 15.10 is best possible in the sense that there exist 4-nets of prime order
for which the number of cyclic 3-subnets is 0, 1, 3 or 4.
Recall that a classical 4-net of order p is one of the form {(x, y, x+y, x+cy) : x, y ∈ F }
for some $c \in F$ with $c \ne 0, 1$. For choices of $p > 5$, there are generally many nonisomorphic
4-nets of order p; different choices of c sometimes yield isomorphic 4-nets, but usually not.
for some fixed (and evidently nonzero) a ∈ F ; and each of the nine nonzero coordinates
appearing in these 4-tuples is a permutation of F . Without loss of generality (again, by
permuting the labels on the lines of the first parallel class of lines) we have f1 (x) = x. By
hypothesis, there also exist permutations g1 , g2 , g4 of F such that
(g1 , g2 , 0, g4 ) ∈ U4 .
By Lemma 15.9, either g2 (x) = bx or g4 (x) = bx for some b ∈ F . We may assume that
g2 (x) = bx; otherwise interchange the second and fourth parallel classes (replacing also a
by −a, and f3 (x) by f3 (x)−ax). Now
so by Theorem 15.7, this is a scalar multiple of (x, 0, f3 (x)−ax, −ax). Without loss of
generality (after applying a suitable scalar multiple),
This forces
N4 = {(bx+ay, −x−y, x, y) : x, y ∈ F }.
Theorem 15.12 ([M3]). Let N4 be a 4-net of prime order p, and suppose that
N4 has a cyclic 3-subnet N3 . Then Conjectures 15.3 and 15.6 hold for N4 . Indeed,
$\dim U_4 \le 3$; and equality holds iff $N_4$ is isomorphic to a 4-subnet of a classical plane
of order p.
are linearly independent. By Theorem 15.7, the functions f1 and g1 are nonzero. More
than this, f1 and g1 are linearly independent functions F → F ; for if f1 = ag1 for some
a ∈ F , then
(f1 , f2 , f3 , f4 ) − a(g1 , g2 , g3 , g4 ) = b(0, x, x, x)
for some b ∈ F , a contradiction.
By Lemma 15.9 we have $|S_{f_1}| \in \{0, \sqrt p\}$. More generally, for all $a, b \in F$ the function
$f = af_1 + bg_1$ satisfies $|S_f| \in \{0, \sqrt p, p\}$; so by Theorem 13.11, $f_i(x) = a_i^2\sigma(x)^2 + b_i\sigma(x)$
for some ai , bi ∈ F and some permutation σ : F → F . We may assume σ(x) = x, after
relabelling lines in the first parallel class; and f1 (x) = x, g1 (x) = x2 , after a change of
basis for U4 . By Lemma 15.9, we may assume that
(x, 0, (a3 −a2 )x, f4 (x)−a2 x), (x, (a2 −a3 )x, 0, f4 (x)−a3 x) ∈ U4
and so the 3-subnet formed by parallel classes 1,3,4 is cyclic; likewise the 3-subnet formed
by parallel classes 1,2,4. Since
$f_4 + g_4$ is quadratic by Lemma 15.9; and since $g_4$ is itself quadratic, this forces $f_4$ to be a
polynomial of degree $\le 2$. This means that $f_4(x) = ag_4(x) + bx$ for some $a, b \in F$; and so
This means that the 3-subnet formed by parallel classes 1,2,3 is cyclic (and a = 0). The
result follows by Theorem 15.11.
Exercises 15.
1. We have illustrated a cyclic 3-net of order 4. Prove that it is maximal; i.e. it is not a subnet of
any 4-net of order 4.
2. Let $N$ be a $k$-net of prime order $p$, $2 \le k \le p$, in the standard form given by Theorem 15.5(i).
Consider a $k$-tuple of functions $(f_1, f_2, \ldots, f_k)$, $f_i : F \to F$, such that $f_1(x_1) + f_2(x_2) + \cdots + f_k(x_k) = 0$
for all $(x_1, x_2, \ldots, x_k) \in N$; thus $(f_1, f_2, \ldots, f_k) \in V_k$ as we have defined the space $V_k$.
Denote $\Sigma_i = \sum_{a\in F} f_i(a)$ for $i = 1, 2, \ldots, k$.
(a) Fix a ∈ F . By considering all p points (x1 , x2 , . . . , xk ) ∈ N with last coordinate xk = a,
show that Σ1 + Σ2 + · · · + Σk−1 = 0.
(b) By varying the choice of coordinate, obtain relations similar to that in (a) showing that any
k − 1 of Σ1 , Σ2 , . . . , Σk have sum equal to zero.
(c) Show that Σ1 = Σ2 = · · · = Σk = 0.
(d) Let ε = 1 − ζ where ζ = ζp . Show that for all i = 1, 2, . . . , k, the exponential sum Sfi lies in
the ideal (ε) ⊆ Z[ζ].
Proof. Orthogonality of the rows of H (with respect to the standard inner product on
Cn ) follows from Theorem 6.3(a).
The examples constructed in Theorem 16.1 are the character tables of the finite
abelian groups. For larger values of n, there are typically many other examples than these.
We will consider Cn as the set of row vectors of length n over C; thus for u, v ∈ Cn ,
the standard inner product of u and v may be written as uv ∗ ∈ C. A vector u ∈ Cn is
flat if all its entries have modulus $\frac{1}{\sqrt n}$. Similarly, an $n \times n$ matrix $A$ is flat if all its entries
have modulus $\frac{1}{\sqrt n}$. Note that for any $n \times n$ complex matrix $A$, the Gram matrix of the
rows of A is the matrix AA∗ ; its (i, j)-entry is the inner product of rows i and j of A.
We omit the proof of Theorem 16.3, which is straightforward. Note that the orthonormal
condition of (iii) is with respect to the standard complex inner product: it says that the
rows $u_1, u_2, \ldots, u_n$ of $\frac{1}{\sqrt n}H$ satisfy
$$u_i u_j^* = \delta_{i,j} = \begin{cases} 1, & \text{if } i = j; \\ 0, & \text{if } i \ne j. \end{cases}$$
The reason why the value $\frac{1}{\sqrt n}$ arises throughout is that it is the only feasible value
for $|u_i v_j^*|$, assuming this value is constant. To see this, suppose that $c$ is a positive real
constant such that $|u_i v_j^*| = c$ for all $i, j$. We may expand $u_i$ with respect to the second
basis as
$$u_i = c_{i,1} v_1 + c_{i,2} v_2 + \cdots + c_{i,n} v_n$$
where $c_{i,j} = u_i v_j^*$ and $|c_{i,j}| = c$. Again using orthonormality, we have
$$1 = u_i u_i^* = \sum_{j=1}^n |c_{i,j}|^2 = nc^2,$$
so $c = \frac{1}{\sqrt n}$. The unbiased property for two orthonormal bases is a symmetric (but neither
reflexive nor transitive) relation. For a given pair of orthonormal bases, it says that all
the vectors of one basis have a fixed ‘angle’ (actually, inner product) with respect to the
vectors in the other basis.
Every orthonormal basis $\mathcal{B}$ is represented by a unitary matrix $B$ having the vectors of
$\mathcal{B}$ as its rows. (While the columns of $B$ are also orthonormal, it is only the rows that we
consider here.) So it is reasonable to say that two unitary matrices $B_1, B_2$ are unbiased
if their rows form an unbiased pair of bases; equivalently, the matrix $B_1B_2^*$ is flat. Since
$B_1B_2^*$ is also unitary, the latter condition is also equivalent to the condition that $\sqrt n\,B_1B_2^*$
is complex Hadamard.
Turning this around, every complex Hadamard matrix $H$ of order $n$ gives rise to an
unbiased pair of unitary matrices $I_n$, $\frac{1}{\sqrt n}H$ and an unbiased pair of orthonormal bases (the
standard basis, forming the rows of $I_n$; and the rows of $\frac{1}{\sqrt n}H$).
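As a concrete instance, the character table of the cyclic group of order $n$ (the matrix $(\zeta_n^{xy})$) can be checked to be complex Hadamard, and the resulting pair of bases checked to be unbiased, for any $n$. The sketch below is illustrative; the function names are ours, $\zeta_n = e^{2\pi i/n}$ is assumed, and comparisons are made up to a floating-point tolerance.

```python
import cmath
import math

def fourier_matrix(n):
    """Character table of the cyclic group of order n: entries zeta_n^(x*y)."""
    zeta = cmath.exp(2j * cmath.pi / n)
    return [[zeta ** ((x * y) % n) for y in range(n)] for x in range(n)]

def row_inner_products(A, B):
    """Matrix A B^*, whose (i, j)-entry is the inner product of row i of A with row j of B."""
    n = len(A)
    return [
        [sum(A[i][t] * B[j][t].conjugate() for t in range(n)) for j in range(n)]
        for i in range(n)
    ]

def is_complex_hadamard(H, tol=1e-9):
    """Entries of modulus 1 and H H^* = n I."""
    n = len(H)
    G = row_inner_products(H, H)
    unimodular = all(abs(abs(h) - 1) < tol for row in H for h in row)
    orthogonal = all(
        abs(G[i][j] - (n if i == j else 0)) < tol for i in range(n) for j in range(n)
    )
    return unimodular and orthogonal

def are_unbiased(B1, B2, tol=1e-9):
    """True if every entry of B1 B2^* has modulus 1/sqrt(n)."""
    n = len(B1)
    target = 1 / math.sqrt(n)
    return all(abs(abs(g) - target) < tol for row in row_inner_products(B1, B2) for g in row)

if __name__ == "__main__":
    n = 6
    H = fourier_matrix(n)
    identity = [[complex(i == j) for j in range(n)] for i in range(n)]
    scaled = [[h / math.sqrt(n) for h in row] for row in H]
    print(is_complex_hadamard(H), are_unbiased(identity, scaled))
    # The relation is not reflexive: a basis is not unbiased with itself when n > 1.
    print(are_unbiased(identity, identity))
```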
Now consider a set of $k$ orthonormal bases of $\mathbb{C}^n$, say $\mathcal{B}_1, \mathcal{B}_2, \ldots, \mathcal{B}_k$. These bases are
mutually unbiased (of order $n$) if $\mathcal{B}_i$ and $\mathcal{B}_j$ are unbiased for all $i \ne j$ in $\{1, 2, \ldots, k\}$.
Equivalently, a list of unitary $n \times n$ matrices $B_1, B_2, \ldots, B_k$ is mutually unbiased if
$\sqrt n\,B_iB_j^*$ is complex Hadamard for all $i \ne j$ in $\{1, 2, \ldots, k\}$.
Proof. Denote by Vn the real vector space of all n × n Hermitian matrices, i.e. Vn is the
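set of all $A \in \mathbb{C}^{n\times n}$ such that $A^* = A$. Note that $\dim V_n = n^2$; and the standard inner
product on $V_n$ is the real inner product defined by $[A, B] = \operatorname{tr}(AB)$ for $A, B \in V_n$. (Note
that since $A, B \in V_n$, $\overline{\operatorname{tr}(AB)} = \operatorname{tr}(\overline{A}\,\overline{B}) = \operatorname{tr}(A^TB^T) = \operatorname{tr}((BA)^T) = \operatorname{tr}(BA) = \operatorname{tr}(AB)$;
so this form is real-valued. Also $\operatorname{tr}(AA) = \operatorname{tr}(AA^*) = \sum_{i,j} |a_{i,j}|^2$ where $A = (a_{i,j})$, so the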
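$$M = \begin{bmatrix} I_n & \frac1n J_n & \cdots & \frac1n J_n \\ \frac1n J_n & I_n & \cdots & \frac1n J_n \\ \vdots & \vdots & \ddots & \vdots \\ \frac1n J_n & \frac1n J_n & \cdots & I_n \end{bmatrix}.$$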
Theorem 16.7. Let q = pe , p an odd prime. Then there exists a complete set of
MUBs of order q.
Proof. Let $B_\infty = I_q$. For each $r \in F = \mathbb{F}_q$, define the $q \times q$ matrix $B_r = \frac{1}{\sqrt q}\big(\zeta^{\operatorname{Tr}(ry^2+xy)} : x, y \in F\big)$
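For a prime field ($e = 1$, so that the trace is the identity map) the construction in this proof is easy to test numerically. The sketch below is a hedged illustration, not the proof: it assumes $\zeta = e^{2\pi i/p}$, indexes rows by $x$ and columns by $y$, and checks that the $p + 1$ matrices $B_\infty, B_0, \ldots, B_{p-1}$ are unitary and pairwise unbiased. The function names are ours.

```python
import cmath
import math
from itertools import combinations

def standard_mubs(p):
    """B_infinity = I and B_r = p^(-1/2) * (zeta^(r*y^2 + x*y)) for r in F_p (prime p)."""
    zeta = cmath.exp(2j * cmath.pi / p)
    scale = 1 / math.sqrt(p)
    bases = [[[complex(x == y) for y in range(p)] for x in range(p)]]
    for r in range(p):
        bases.append(
            [[scale * zeta ** ((r * y * y + x * y) % p) for y in range(p)] for x in range(p)]
        )
    return bases

def row_inner_products(A, B):
    """Matrix A B^*."""
    n = len(A)
    return [
        [sum(A[i][t] * B[j][t].conjugate() for t in range(n)) for j in range(n)]
        for i in range(n)
    ]

def is_unitary(B, tol=1e-9):
    n = len(B)
    G = row_inner_products(B, B)
    return all(abs(G[i][j] - (1 if i == j else 0)) < tol for i in range(n) for j in range(n))

def are_unbiased(A, B, tol=1e-9):
    target = 1 / math.sqrt(len(A))
    return all(abs(abs(g) - target) < tol for row in row_inner_products(A, B) for g in row)

def is_mub_set(bases):
    """All matrices unitary and every pair unbiased."""
    return all(is_unitary(B) for B in bases) and all(
        are_unbiased(A, B) for A, B in combinations(bases, 2)
    )

if __name__ == "__main__":
    for p in (3, 5, 7):
        bases = standard_mubs(p)
        print(p, len(bases), is_mub_set(bases))
    # The same formula with p = 2 does not give unbiased bases, consistent with
    # the hypothesis that p is odd.
    print(2, is_mub_set(standard_mubs(2)))
```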
Haagerup [Hp] has shown that every complex Hadamard matrix of order 5 is equivalent to the
construction of Theorem 16.1. We [MM] have extended this result to show that every set
of mutually unbiased bases of order 5 is contained (up to equivalence) in the complete set
constructed in Theorem 16.7. This result uses our Theorems 13.5 and 13.12, thus lending
credence to the belief in a connection between nets and MUBs. However our result relies
on Haagerup’s uniqueness result [Hp] for the complex Hadamard matrix of order 5. The
basic argument works for 2, 3 and 5 where there is a single complex Hadamard matrix up
to equivalence, but not for other orders.
Theorem 16.8 ([MM]). Every set of k mutually unbiased bases of order 5 is con-
tained (up to equivalence) in the complete set constructed in Theorem 16.7.
(16.9) Every complex Hadamard matrix of order 5 has the form $MBM'$ for some
$M, M' \in \mathcal{M}_5$.
16. Mutually Unbiased Bases
Let us call a complex Hadamard matrix normalized if its first row and column consist
of 1’s. Every complex Hadamard matrix is equivalent to one which is normalized; simply
scale each row and column by an appropriate complex number of modulus 1 to obtain such
a normalized representative of its equivalence class. Or, we may choose to first permute
rows and columns before scaling, thereby obtaining a possibly different normalized matrix
in the equivalence class; so the normalized form is not unique in its equivalence class. In
particular there is only one equivalence class of complex Hadamard matrices of order 5,
but many normalized representatives in this class; see Exercise #2. We prove the following
refinement of (16.9):
(16.10) Every complex Hadamard matrix of order 5 has the form $MBM'$ for some
$M, M' \in \mathcal{M}_5$ such that $M'$ has entry 1 in its upper left corner.
(It is customary to refer to the upper left corner of a matrix as its (1, 1)-entry; although
when we index the entries using elements of F5 , it would make more sense to call this the
(0, 0)-entry.) Given a complex Hadamard matrix $H$ of order 5, first write it in the form
$H = MBM'$ for some $M, M' \in \mathcal{M}_5$ by (16.9). However the leftmost column of this $M'$
has its nonzero entry in the $(i, 0)$ position. It is straightforward to check that the circulant
matrix $C = (\delta_{x,y+1} : x, y \in F)$ satisfies $BC = DB$ where $D \in \mathcal{M}_5$ is the diagonal matrix
$D = (\zeta^x\delta_{x,y} : x, y \in F)$. The monomial matrix $M'' = C^{-i}M'$ has a nonzero entry (call
(16.11) Every normalized complex Hadamard matrix H of order 5 has fifth roots
of unity as entries. The product of the entries in each of its rows is 1; and
the same holds for columns.
(Left-multiplication by arbitrary monomial matrices is not required here, since this will
take our set to an equivalent set of MUB’s.) Now each $M_r = (\lambda_{r,y}\delta_{x,\sigma_r(y)} : x, y \in F)$ where
$|\lambda_{r,x}| = 1$ and $\sigma_r$ is a permutation of $F$, for all $x \in F$ and $r \in \{0, 1, \ldots, k-2\}$; moreover,
$\lambda_{r,0} = \lambda_{0,x} = 1$, $\sigma_0(y) = y$ (the identity permutation) and $\sigma_r(0) = 0$ for all $r$.
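Now the matrix $5U_rU_s^*$ has $(x, y)$-entry equal to
$$\sum_{z,v,w\in F} \zeta^{xz}\lambda_{r,v}\,\delta_{z,\sigma_r(v)}\,\delta_{w,\sigma_s(v)}\,\overline{\lambda_{s,v}}\,\zeta^{-wy} = \sum_{v\in F} \zeta^{x\sigma_r(v)-y\sigma_s(v)}\lambda_{r,v}\overline{\lambda_{s,v}},$$
which is required to have modulus equal to $\sqrt5$ whenever $r \ne s$ in $\{0, 1, \ldots, k-2\}$. This
means that
(16.12) for all $x, y$ in $F$ and $r \ne s$,
$$5 = \Big|\sum_{v\in F} \zeta^{x\sigma_r(v)-y\sigma_s(v)}\lambda_{r,v}\overline{\lambda_{s,v}}\Big|^2 = \sum_{v,w\in F} \zeta^{x(\sigma_r(v)-\sigma_r(w))-y(\sigma_s(v)-\sigma_s(w))}\lambda_{r,v}\lambda_{s,w}\overline{\lambda_{r,w}\lambda_{s,v}}.$$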
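Now specialize (16.12) to the case $r = y = 0$ and recall that $\sigma_0 = \mathrm{id}$ and $\lambda_{0,x} = 1$ to
obtain
(16.13) $$\Big|\sum_{w\in F} \zeta^{-xw}\lambda_{s,w}\Big|^2 = \sum_{v,w\in F} \zeta^{x(v-w)}\lambda_{s,w}\overline{\lambda_{s,v}} = 5 \quad \text{whenever } x \in F \text{ and } s \ne 0.$$
where we abbreviate $\lambda_x := \lambda_{s,x}$ for fixed $s \ne 0$. That is, the circulant matrix $H = (\lambda_{x+y} : x, y \in F)$
is complex Hadamard! Normalizing $H$, we obtain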
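$$\begin{bmatrix} 1 & 1 & 1 & 1 & 1 \\ 1 & \overline{\lambda_1}^{\,2}\lambda_2 & \overline{\lambda_1\lambda_2}\,\lambda_3 & \overline{\lambda_1\lambda_3}\,\lambda_4 & \overline{\lambda_1\lambda_4} \\ 1 & \overline{\lambda_1\lambda_2}\,\lambda_3 & \overline{\lambda_2}^{\,2}\lambda_4 & \overline{\lambda_2\lambda_3} & \lambda_1\overline{\lambda_2\lambda_4} \\ 1 & \overline{\lambda_1\lambda_3}\,\lambda_4 & \overline{\lambda_2\lambda_3} & \lambda_1\overline{\lambda_3}^{\,2} & \lambda_2\overline{\lambda_3\lambda_4} \\ 1 & \overline{\lambda_1\lambda_4} & \lambda_1\overline{\lambda_2\lambda_4} & \lambda_2\overline{\lambda_3\lambda_4} & \lambda_3\overline{\lambda_4}^{\,2} \end{bmatrix},$$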
using the fact that $\lambda_0 = \lambda_{s,0} = 1$. By (16.11), the product of the entries in each row of this
matrix is 1. This says that each $\lambda_x = \lambda_{s,x}$ is a fifth root of unity. So there exist functions
$f_s : F \to F$ satisfying $f_s(0) = 0$ and $\lambda_{s,x} = \zeta^{f_s(x)}$. Returning to (16.13), we now have
$$\Big|\sum_{w\in F} \zeta^{f_s(w)-xw}\Big| = \sqrt5 \quad \text{for all } s \ne 0.$$
17. Weil’s Bound
Again, we may dispense with the monomial matrices $M_s'$, using equivalence of MUBs,
leaving $U_s = B_{a_s}$ for $s = 1, 2, \ldots, k-2$. We also have $U_\infty = B_\infty = I_5$ and $U_0 = B_0 = B$;
so our set of $k - 2$ MUB’s is (up to equivalence) a subset of the standard set.
Exercises 16.
1. The two groups of order 4 have character tables giving rise to Hadamard matrices
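$$H_{4a} = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & i & -1 & -i \\ 1 & -1 & 1 & -1 \\ 1 & -i & -1 & i \end{bmatrix}; \qquad H_{4b} = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & -1 & 1 & -1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & -1 & 1 \end{bmatrix}.$$
(a) Show that $H_{4a}$ and $H_{4b}$ are not equivalent Hadamard matrices. That is, show that there do
not exist unitary monomial matrices $M, M'$ satisfying $H_{4b} = MH_{4a}M'$.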
(b) Show that H4a is equivalent to H4 (from Example 16.2) for some choice of α; and similarly
for H4b .
2. Exactly how many ‘normalized’ complex Hadamard matrices of order 5 are there? (See (16.9)
and the comments which follow it.)
We now outline key elements of the proof, using the machinery of L-functions introduced in
Section 12. We replace the choice of multiplicative function $\lambda : M \to \mathbb{C}$ used in Section 12
(designed for investigating Gauss sums and the Hasse-Davenport relations) by a new choice
λ = λg ; but everything in Section 12 up to (12.6) applies here as well. While we omit some
details in the proof of Weil’s bound, these can be found in other sources. The details, as
found in [LN], [Sc], are elementary if somewhat technical.
Our rationale for naming the fixed polynomial g(x) (above) is to reserve the name f ∈
M for an arbitrary monic polynomial as in our previous generalities regarding L-functions.
As before, M is the multiplicative monoid of monic polynomials in F [x]. Recall that all
nonzero ideals A ⊆ O = F [x] are principal, having the form A = (f ) for some f ∈ M .
We are ready to introduce our new choice of multiplicative function λ(A) = λ(f ) = λg (f )
which depends on the choice of given polynomial g above.
Recall that it suffices to define λ(f ) for f ∈ P (i.e. f monic irreducible), then extend
to the monoid $M$ using unique factorization in $F[x]$. So given $f \in P_k$, recall that $E = \mathbb{F}_{q^k}$
is the splitting field of f over F , thus:
We define
$$\lambda(f) = \lambda_g(f) = \zeta^{\operatorname{Tr}_{F/K}[g(r_1)+g(r_2)+\cdots+g(r_k)]}.$$
We must of course show that this definition makes sense! Obviously $\sum_i g(r_i) \in E$; but since
every $F$-automorphism $\sigma$ of $E$ permutes the $k$ roots of $f$ by Theorem A5.3, $\sigma$ permutes
the $k$ terms in $\sum_i g(r_i)$. (Recall: the Galois group $G(E/F)$ is cyclic and $\sigma(a) = a^{q^j}$ for
some $j \in \{0, 1, 2, \ldots, k-1\}$.) Hence $\sigma\big(\sum_i g(r_i)\big) = \sum_i g(r_i)$. By Galois theory, this means
that $\sum_i g(r_i) \in F$, where $F$ is the domain of our trace map $\operatorname{Tr}_{F/K}$.
We will denote the associated L-function by $L_g(s) := L_{\lambda_g}(s)$. As before, the key step
is proving that the coefficient of $z^n$ in the series expansion of $L_g(s)$ vanishes for large $n$
(see (12.5)), thus forcing the L-function to be a polynomial in $z$.
Recall that $M_n \subset M$ is the set of monic polynomials $f(x) \in F[x]$ of degree $n$.
Lemma 17.1. If $n > d$, then $\sum_{f\in M_n} \lambda(f) = 0$.
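for some polynomial $\tilde h(e_1, e_2, \ldots, e_{d-1}) \in F[e_1, e_2, \ldots, e_{d-1}]$ (whose coefficients depend on
the fixed polynomial $g$). Thus
$$\begin{aligned} \sum_{f\in M_n} \lambda(f) &= \sum_{f\in M_n} \zeta^{\operatorname{Tr}_{F/K}[g(r_1)+g(r_2)+\cdots+g(r_n)]} \\ &= \sum_{e_1,e_2,\ldots,e_n\in F} \zeta^{\operatorname{Tr}_{F/K}[(-1)^{d-1}db_de_d+\tilde h(e_1,e_2,\ldots,e_{d-1})]} && \text{(summing over } f \in M_n \text{ amounts to summing over choices for its coefficients)} \\ &= q^{n-d} \sum_{e_1,e_2,\ldots,e_d\in F} \zeta^{\operatorname{Tr}_{F/K}[(-1)^{d-1}db_de_d+\tilde h(e_1,e_2,\ldots,e_{d-1})]} && \text{(the summand is independent of } e_{d+1}, \ldots, e_n \in F\text{)} \\ &= q^{n-d} \sum_{e_d\in F} \zeta^{(-1)^{d-1}\operatorname{Tr}_{F/K}(db_de_d)} \sum_{e_1,e_2,\ldots,e_{d-1}\in F} \zeta^{\operatorname{Tr}_{F/K}\tilde h(e_1,e_2,\ldots,e_{d-1})} = 0 \end{aligned}$$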
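$$-(\omega_1^n + \cdots + \omega_{d-1}^n) = \sum_{k\mid n}\sum_{f\in P_k} k\,\lambda(f)^{n/k}$$
for all $n \ge 1$. We will show that the latter double sum is simply $\sum_{\alpha\in E} \zeta^{\operatorname{Tr}_{E/K} g(\alpha)}$ where
$E = \mathbb{F}_{q^n}$. Given $\alpha \in E$, let $f(x)$ be its minimal polynomial over $F$, so $\deg f(x) = k = [F[\alpha] : F]$
which divides $[E : F] = n$; moreover in this case Theorem 3.8 gives
$$f(x) = (x - \alpha)(x - \alpha^q)(x - \alpha^{q^2}) \cdots (x - \alpha^{q^{k-1}})$$
and we can group together the $k$ terms in our sum arising from the same minimal polynomial
$f(x)$ to get
$$\sum_{\alpha\in E} \zeta^{\operatorname{Tr}_{E/K} g(\alpha)} = \sum_{k\mid n}\sum_{f\in P_k} \Big[\zeta^{\operatorname{Tr}_{E/K} g(\alpha)} + \zeta^{\operatorname{Tr}_{E/K} g(\alpha^q)} + \cdots + \zeta^{\operatorname{Tr}_{E/K} g(\alpha^{q^{k-1}})}\Big]$$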
which is Weil’s bound. In place of Schmidt’s technical argument, Lidl and Niederreiter [LN]
substitute a slightly less technical (but also elementary) argument proving $|\omega_i| \le \sqrt p$. This
is also sufficient to establish Weil’s bound, as is clear from the argument above; but it is
less satisfying in that it falls short of proving the equality in (17.3). We did not feel so
compelled to complete the proof of Weil’s bound as to devote many more technical pages
to the goal, having already described what we view as the nicest part of the proof.
Example 17.4: Quadratic Exponential Sum. Let $F = \mathbb{F}_p$ where $p$ is an odd prime, and take
$g(x) = x^2$. Here $d = 2$ which does not divide $p$. If $E = \mathbb{F}_q$, $q = p^n$ then by Theorem 13.3 and
Corollary 12.11,
$$\omega_1^n = -G(\chi_E) = \begin{cases} (-\sqrt p)^n, & \text{if } p \equiv 1 \bmod 4; \\ (-i\sqrt p)^n, & \text{if } p \equiv 3 \bmod 4. \end{cases}$$
Evidently $\omega_1 = -\sqrt p$ or $-i\sqrt p$ according as $p \equiv 1 \bmod 4$ or $p \equiv 3 \bmod 4$. Note that (17.3) is satisfied.
Replacing $g(x)$ with another quadratic polynomial gives similar results.
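The case $n = 1$ of this example can be checked numerically: the exponential sum $\sum_{a \in \mathbb{F}_p} \zeta^{a^2}$ should equal $-\omega_1$, that is $\sqrt p$ or $i\sqrt p$. The sketch below assumes the particular choice $\zeta = e^{2\pi i/p}$ (a different primitive $p$th root of unity can change the sign when $p \equiv 3 \bmod 4$), and the function names are illustrative.

```python
import cmath
import math

def quadratic_exponential_sum(p):
    """Sum of zeta^(a^2) over a in F_p, with zeta = exp(2*pi*i/p)."""
    zeta = cmath.exp(2j * cmath.pi / p)
    return sum(zeta ** (a * a % p) for a in range(p))

def minus_omega_1(p):
    """The value -omega_1 predicted by Example 17.4 for an odd prime p."""
    root = math.sqrt(p)
    return complex(root, 0) if p % 4 == 1 else complex(0, root)

if __name__ == "__main__":
    for p in (3, 5, 7, 11, 13):
        s = quadratic_exponential_sum(p)
        print(p, abs(s - minus_omega_1(p)) < 1e-9, abs(abs(s) - math.sqrt(p)) < 1e-9)
```

Checking $n \ge 2$ the same way would require arithmetic in $\mathbb{F}_{p^n}$ and the trace map, which this sketch does not attempt.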
Exercises 17.
1. Let $F = \mathbb{F}_3$ and $E = \mathbb{F}_q$ where $q = 3^n$, and let $g(x) = x^5 + x \in F[x]$. Note that $d = 5$ which is
relatively prime to $q$. Here $\zeta = \zeta_3 = \omega$ and all exponential sums have values in the Eisenstein
integers $\mathbb{Z}[\omega]$.
(a) Compute a table of values of $g(a)$ for $a \in \mathbb{F}_9$. It is convenient to take $\mathbb{F}_9 = \mathbb{F}_3[i]$ where
$i = \sqrt{-1}$.
(b) Using (a), compute $\sum_{a\in E} \omega^{\operatorname{Tr}_{E/F} g(a)} \in \mathbb{Z}[\omega]$ for $n = 1, 2$.
(c) Equating the sum in (b) to $-\omega_1^n - \omega_2^n$, obtain two equations in two unknowns $\omega_1, \omega_2 \in \mathbb{C}$.
Solve for $\omega_1$ and $\omega_2$.
(d) Does the equality of (17.3) hold? Explain.
(e) Use Corollary 17.2 to evaluate the exponential sum for n = 1, 2, 3, 4, 5, 6.
(f) Half of the values listed in (e) are zero. Give a very simple explanation for this fact. (Hint:
Comments immediately following the statement of Theorem 11.7 use similar reasoning.)
Appendix A1: Fields and Extensions
Proof. Suppose $\{\alpha_1, \ldots, \alpha_m\}$ is a basis for $K$ over $E$, and $\{\beta_1, \ldots, \beta_n\}$ is a basis for $E$
over $F$. It is easy to see that $\{\alpha_i\beta_j : 1 \le i \le m,\ 1 \le j \le n\}$ is a basis for $K$ over $F$. Indeed, every
$\alpha \in K$ can be uniquely expressed as $\alpha = \sum_{i=1}^m a_i\alpha_i$ with $a_i \in E$; and we can uniquely express
$a_i = \sum_{j=1}^n b_{ij}\beta_j$ with $b_{ij} \in F$. This gives a unique expression $\alpha = \sum_{i=1}^m \sum_{j=1}^n b_{ij}\alpha_i\beta_j$
with $b_{ij} \in F$ as required.
Although we have proved Theorem A1.1 only in the case of finite degree, one similarly
proves the general case (with the obvious convention that $[K : F] = \infty$ iff at least one of
$[K : E]$ or $[E : F]$ is infinite). Theorem A1.1 should be seen as the field-theoretic analogue
of the statement $[G : K] = [G : H][H : K]$ for chains of subgroups $G \ge H \ge K$.
The characteristic of a field F , denoted char F , equals the minimum positive inte-
ger n such that 1 + 1 + · · · + 1 = 0 (with n 1’s), if such an n exists; otherwise we say F
has characteristic zero and we write char F = 0. For a field F of positive characteristic,
char F = p must be prime. This fact is proved by an obvious generalization of the following
explanation why $\operatorname{char} F \ne 6$: otherwise
0 = 1 + 1 + 1 + 1 + 1 + 1 = (1 + 1)(1 + 1 + 1)
Under the conditions of Theorem A1.2, we say θ is algebraic over F ; and Irrθ,F (x) := m(x)
is the minimal polynomial of θ over F . Moreover, θ is algebraic of degree n where
n = deg m(x) = [E : F ].
Proof. Assuming (ii), there is a linear combination
$$a_0 + a_1\theta + a_2\theta^2 + \cdots + a_n\theta^n = 0$$
$$(b_0 + b_1\theta + b_2\theta^2 + \cdots + b_m\theta^m)\theta = 1.$$
After expanding and moving all terms to one side, we obtain (ii) as required.
Proof. Since α is algebraic over F , we have a finite extension field L := F [α] ⊇ F and
so [L : F ] < ∞. Since β is algebraic over F , it is algebraic over L; so L[β] ⊇ L is a finite
extension. Now [L[β] : F ] = [L[β] : L][L : F ] < ∞, so the extension L[β] ⊇ F is algebraic.
So the elements α ± β, αβ, α/β ∈ F [α, β] = L[β] are algebraic over F (taking β ≠ 0 in the case of α/β).
Thus the set of all elements of an extension E ⊇ F which are algebraic over F forms an
intermediate subfield. In particular, C has a subfield A consisting of all complex numbers
that are algebraic over Q. Note that A is algebraically closed (i.e. every nonconstant
polynomial f (t) ∈ A[t] has a root in A); and the extension A ⊃ Q is algebraic (i.e. every
element of A is algebraic over Q); so A is the algebraic closure of Q. Evidently A
contains all complex roots of unity; but A is not generated by the roots of unity. Since
[A : Q] = ∞, the converse of Corollary A1.3 evidently fails. Note also that A is a proper
subfield of C, since C is uncountable whereas A is countable.
$\theta^n + a_{n-1}\theta^{n-1} + \cdots + a_1\theta + a_0 = 0.$
respectively, where $\mathrm{tr} : F^{n\times n} \to F$ is the usual matrix trace (the sum of the diagonal
entries). Note that both of these are maps E → F . They do not depend on the choice
of bases used, since they are the determinant and the trace of Tα , admitting a basis-free
description. In the case where F is the prime field of E (i.e. its minimal subfield), the
norm and trace are called the absolute norm and absolute trace.
Example A1.6: The Complex Numbers. Take {1, i} as a basis for C over R. The matrix of
α = a + bi (a, b ∈ R) with respect to this basis is $M(\alpha) = \begin{pmatrix} a & -b \\ b & a \end{pmatrix}$.
The norm and trace are given by
$N_{\mathbb{C}/\mathbb{R}}\,\alpha = a^2 + b^2$ and $\mathrm{Tr}_{\mathbb{C}/\mathbb{R}}\,\alpha = 2a$.
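For readers who like to experiment, here is a minimal Python sketch of the computation in Example A1.6, generalized slightly to a quadratic extension with basis {1, √d} (so that d = −1 gives C over R). The function names are ours, chosen only for illustration; the norm and trace are read off as the determinant and the matrix trace of the multiplication matrix.

```python
from fractions import Fraction

def mult_matrix(a, b, d):
    """Matrix of x -> (a + b*sqrt(d)) * x on the basis {1, sqrt(d)}.

    Columns hold the images of the basis vectors 1 and sqrt(d).
    """
    a, b = Fraction(a), Fraction(b)
    return [[a, d * b],
            [b, a]]

def norm(a, b, d):
    m = mult_matrix(a, b, d)
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]   # determinant

def trace(a, b, d):
    m = mult_matrix(a, b, d)
    return m[0][0] + m[1][1]                        # matrix trace

def mul(x, y, d):
    """Product (a + b*sqrt(d)) * (c + e*sqrt(d)) as a pair of coefficients."""
    (a, b), (c, e) = x, y
    return (a * c + d * b * e, a * e + b * c)

# d = -1 is the case of C over R: N(3 + 4i) = 3^2 + 4^2 and Tr(3 + 4i) = 2*3.
print(norm(3, 4, -1), trace(3, 4, -1))
```

Exact rational arithmetic is used so that identities such as N(αβ) = N(α) N(β) can be checked without rounding error.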
Proof. The identities N(ab) = N(a) N(b) and Tr(a+b) = Tr a + Tr b follow from $T_{ab} = T_a T_b$
and $T_{a+b} = T_a + T_b$ using basic properties of determinant and trace for linear
transformations. Also N(1) = det I = 1; so if $a \in E^\times$ then $N(a)\,N(a^{-1}) = N(1) = 1$. Finally,
suppose Tr(ab) = 0 for all a ∈ E; we must show that b = 0. It suffices to find c ∈ E
satisfying Tr c ≠ 0; for then we may take a = c/b whenever b ≠ 0, giving Tr(ab) = Tr c ≠ 0. In characteristic zero,
Tr 1 = n = [E : F ] ≠ 0 as required. In the finite case $F = \mathbb{F}_q$ and $E = \mathbb{F}_{q^n}$, an extension
of finite fields of degree [E : F ] = n, we have $\mathrm{Tr}\,a = a + a^q + a^{q^2} + \cdots + a^{q^{n-1}}$ by Theorem 3.8. If
Tr c = 0 for all c ∈ E, then the polynomial $x^{q^{n-1}} + \cdots + x^q + x \in F[x]$ has $q^n$ roots in E,
and this number exceeds the degree $q^{n-1}$ of the polynomial, a contradiction; so once again
there exists c ∈ E with Tr c ≠ 0 as required.
Regarding the necessity of the additional assumption in Theorem A1.7(ii), see Exam-
ple A4.3.
The matrix representation gives the impression of making the arithmetic of field
extensions more concrete or facilitating implementation. In fact it is not practical
for implementation (since it requires storing $n^2$ matrix entries for each element of E, rather
than n coefficients in the usual representation). For computer implementation, polynomial arithmetic
(modulo an irreducible polynomial) is still the best. But the matrix representation is
surprisingly useful as a theoretical device for explaining certain properties of field extensions.
For example, consider the ring $R = F^{n\times n}$, the ring of n × n matrices over F ;
and let $R^{m\times m}$ be the ring of m × m matrices over R. Then we have the isomorphism
$R^{m\times m} \cong F^{mn\times mn}$ which, although not difficult to prove, is still rather subtle and
somewhat surprising. And replacing R by the subring S = {M (α) : α ∈ E} ⊆ R using
the matrix representation of an extension E ⊇ F of degree n as above, the isomorphism
E ≅ S ⊆ R induces an isomorphism between the ring $E^{m\times m}$ of m × m matrices over E, and
the subring $M(E^{m\times m}) \subseteq F^{mn\times mn}$ defined by replacing each entry of a matrix $A \in E^{m\times m}$
by its matrix representation in S:
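$$
A = \underbrace{\begin{pmatrix}
\alpha_{11} & \alpha_{12} & \cdots & \alpha_{1m} \\
\alpha_{21} & \alpha_{22} & \cdots & \alpha_{2m} \\
\vdots & \vdots & \ddots & \vdots \\
\alpha_{m1} & \alpha_{m2} & \cdots & \alpha_{mm}
\end{pmatrix}}_{\text{an } m\times m \text{ matrix over } E}
\;\longmapsto\;
M(A) = \underbrace{\begin{pmatrix}
M(\alpha_{11}) & M(\alpha_{12}) & \cdots & M(\alpha_{1m}) \\
M(\alpha_{21}) & M(\alpha_{22}) & \cdots & M(\alpha_{2m}) \\
\vdots & \vdots & \ddots & \vdots \\
M(\alpha_{m1}) & M(\alpha_{m2}) & \cdots & M(\alpha_{mm})
\end{pmatrix}}_{\text{an } mn\times mn \text{ matrix over } F}
$$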
This map preserves more than the ring operations ‘+’ and ‘×’; it also respects traces and
determinants: for all $A \in E^{m\times m}$,
$\mathrm{Tr}_{E/F}(\mathrm{tr}\,A) = \mathrm{tr}\,M(A)$ and $N_{E/F}(\det A) = \det M(A)$.
(We continue to distinguish the usual matrix trace ‘tr’ and the trace ‘Tr’ for field extensions
using lower and upper case, respectively. Note the necessity of using the norm and trace
maps E → F since matrices on the left are over E; matrices on the right are over F .) The
first formula is easy to see by adding diagonal entries on both sides; the second formula
can be proved by first verifying it for elementary matrices $A \in E^{m\times m}$, then using the
multiplicative property to extend to the general case.
Now given a tower of finite extensions K ⊇ E ⊇ F , we have three trace maps and
three norm maps
[Diagrams: two triangles on the tower K → E → F . In the first, the arrows are $\mathrm{Tr}_{K/E} : K \to E$ and $\mathrm{Tr}_{E/F} : E \to F$, with $\mathrm{Tr}_{K/F} : K \to F$ drawn across the top; in the second, the arrows are $N_{K/E}$, $N_{E/F}$ and $N_{K/F}$ in the same positions.]
The transitivity of norm and trace maps is the assertion that these diagrams commute:
Proof. Use the observations above, restricting the matrix $A \in E^{m\times m}$ to lie in the matrix
representation of K, where m = [K : E].
Proof. We may use $\{1, \alpha, \alpha^2, \ldots, \alpha^{n-1}\}$ as a basis for E over F , and let $\{\beta_1, \beta_2, \ldots, \beta_m\}$
be a basis for K over E (as in the proof of Theorem A1.1). The matrix of $T_\alpha$ with respect
to the basis $\{\alpha^j\beta_i : 1 \le i \le m,\ 0 \le j \le n-1\}$ is $M_K(\alpha) = I_m \otimes M_E(\alpha)$ where $M_E(\alpha)$ is the n × n
We will assume r1 , . . . , rn are not all zero (otherwise one clearly takes r = 0). Now to
find r, first find the smallest positive integer b such that bri ∈ Z for all i (so b is the least
common denominator). Then take a = gcd(br1 , . . . , brn ). Recall that there exist integers
k1 , . . . , kn such that
k1 ·br1 + k2 ·br2 + · · · + kn ·brn = a
and this is the least positive element in Zbr1 + · · · + Zbrn . Dividing both sides by b, we
get r := a/b as the least positive element in the additive subgroup Zr1 + · · · + Zrn ⊂ Q.
Again assuming r1 , . . . , rn are not all zero, the additive subgroup Zr1 + · · · + Zrn ⊂ Q
is infinite cyclic. Denote by r = wt(r1 , . . . , rn ) ∈ Q its unique positive generator (so that
±r are the two choices of generator). Also for any nonzero polynomial
$f(x) = a_0 + a_1x + \cdots + a_nx^n \in \mathbb{Q}[x]$, define the weight of f (x) by
wt f = wt f (x) = wt(a0 , a1 , . . . , an ).
Lemma A2.1. Suppose f (x), g(x), h(x) ∈ Q[x] are nonzero polynomials.
(i) f (x) ∈ Z[x] iff wt f ∈ Z; and in this case, the weight of f is simply the greatest
common divisor of its coefficients.
(ii) wt(f g) = wt f · wt g.
(iii) Assume f (x) = g(x)h(x). If at least two of the polynomials f, g, h are monic
with integer coefficients, then so is the third.
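The recipe for the weight (least common denominator b, then a = gcd of the integers b·r_i, then r = a/b) is easy to try out. The following Python sketch is ours and purely illustrative; the names wt and poly_mul are assumptions, and polynomials are stored as coefficient lists [a_0, a_1, . . . , a_n]. It also lets one check Lemma A2.1(ii) on examples.

```python
from fractions import Fraction
from functools import reduce
from math import gcd

def wt(coeffs):
    """Positive generator of Z*r_1 + ... + Z*r_n for rationals r_i, not all zero."""
    rs = [Fraction(r) for r in coeffs]
    # b = least common denominator of the r_i
    b = reduce(lambda x, y: x * y // gcd(x, y), (r.denominator for r in rs), 1)
    # a = gcd of the integers b*r_i
    a = reduce(gcd, (int(b * r) for r in rs), 0)
    return Fraction(a, b)

def poly_mul(f, g):
    """Product of two polynomials given by ascending coefficient lists."""
    h = [Fraction(0)] * (len(f) + len(g) - 1)
    for i, x in enumerate(f):
        for j, y in enumerate(g):
            h[i + j] += Fraction(x) * Fraction(y)
    return h

f = [6, 9]                               # 6 + 9x
g = [Fraction(1, 2), Fraction(1, 3)]     # 1/2 + x/3
print(wt(f), wt(g), wt(poly_mul(f, g)))
```

For an integer polynomial the function simply returns the gcd of the coefficients, in agreement with Lemma A2.1(i).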
where each of the three polynomials $u_f(x), u_g(x), u_{fg}(x) \in \mathbb{Z}[x]$ has weight 1. Reducing
the fraction $\frac{rs}{t} \in \mathbb{Q}$ to lowest terms as $\frac{rs}{t} = \frac{a}{b}$ where a, b are relatively prime positive
integers, we obtain
Appendix A2: POLYNOMIALS AND IRREDUCIBILITY
If b > 1 then there exists a prime p dividing b, with p ∤ a. Reducing both sides of
(A2.2) modulo p, we find a product of two nonzero polynomials in Fp [x] equal to the zero
polynomial. This is a contradiction, since F [x] has no zero divisors (by comparing leading
terms on both sides) for any field F . This shows that b = 1. Now the right hand side of
(A2.2) clearly has weight divisible by a. Since the left hand side has weight 1, we obtain
a = 1. This gives rs = t as required.
(iii) If g(x), h(x) are monic with integer coefficients, then clearly so is their product.
Now suppose f (x), g(x) ∈ Z[x] are monic with f (x) = g(x)h(x). Since wt f = wt g = 1,
we have wt h = 1 by (ii); and then h(x) ∈ Z[x] by (i). By comparing leading terms, h(x)
is also monic.
In the light of Theorem A1.2, it is useful to have tests for irreducibility of polynomials.
In the case of number fields, the most useful such test is the following.
Theorem A2.3. Let f (x) ∈ Z[x]. Then f (x) is irreducible in Q[x] iff it is irreducible
in Z[x].
Proof. Any nontrivial factorization f (x) = f1 (x)f2 (x), with nonconstant factors fi (x) ∈
Z[x], gives a nontrivial factorization in Q[x]. For the converse, let f (x) ∈ Z[x] and
suppose that f (x) is reducible in Q[x]. We may assume wt f (x) = 1; otherwise divide f (x)
by its weight. By assumption, f (x) = f1 (x)f2 (x) where each of the factors fi (x) ∈ Q[x]
has degree at least 1. Now fi (x) = ri ui (x) where ri = wt fi (x) and ui (x) ∈ Z[x] has weight 1. By
Lemma A2.1, wt(u1 (x)u2 (x)) = wt u1 (x) wt u2 (x) = 1; comparing weights in
f (x) = r1 r2 u1 (x)u2 (x) gives r1 r2 = wt f (x) = 1, so f (x) = u1 (x)u2 (x) where
each of the factors ui (x) ∈ Z[x] has degree at least 1.
Proof. Supposing that f (x) is reducible in Z[x], then f (x) = g(x)h(x) where g(x) ∈ Z[x]
has leading term $bx^k$, h(x) ∈ Z[x] has leading term $cx^{n-k}$ with 1 ≤ k ≤ n−1; and $bc = a_n$
which is not divisible by p. Reducing mod p gives a factorization of $a_nx^n$ (mod p) in $\mathbb{F}_p[x]$.
Since $\mathbb{F}_p$ is a field, $\mathbb{F}_p[x]$ has unique factorization; and so after reduction mod p, g(x) and
h(x) must reduce to $bx^k$ and $cx^{n-k}$ (mod p) respectively. This means that the original
polynomials g(x), h(x) ∈ Z[x] must both have constant term divisible by p. But this means
that f (x) = g(x)h(x) must have constant term divisible by $p^2$, a contradiction.
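The hypotheses used in this proof (as we read them off from the argument: p does not divide the leading coefficient, p divides every other coefficient, and p² does not divide the constant term) are those of Eisenstein's criterion, and they are simple to test mechanically. The following Python sketch is only an illustration; the function name and the ascending coefficient-list convention are our own.

```python
def eisenstein_applies(coeffs, p):
    """coeffs = [a_0, a_1, ..., a_n] with integer entries and n >= 1.

    Returns True if p does not divide a_n, p divides a_0, ..., a_{n-1},
    and p^2 does not divide a_0.  (p is assumed to be prime; not checked.)
    """
    *lower, lead = coeffs
    return (lead % p != 0
            and all(a % p == 0 for a in lower)
            and lower[0] % (p * p) != 0)

# x^3 - 2 at p = 2, and ((x+1)^5 - 1)/x = x^4 + 5x^3 + 10x^2 + 10x + 5 at p = 5
print(eisenstein_applies([-2, 0, 0, 1], 2))
print(eisenstein_applies([5, 10, 10, 5, 1], 5))
```

A return value of False says only that the test is inconclusive at that prime; it does not show that the polynomial is reducible.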
Appendix A3: Algebraic Integers
Theorem A3.1. Let θ ∈ C be algebraic, and let m(x) ∈ Q[x] be its minimal
polynomial. Then the following conditions are equivalent.
(i) m(x) ∈ Z[x].
(ii) θ is a root of some monic polynomial with integer coefficients.
(iii) Z[θ] is finitely generated as an additive group (or Z-submodule of C).
(iv) There is a chain of subrings Z[θ] ⊆ R ⊂ C such that R is finitely generated as
an additive group (or Z-submodule of C).
where $u_m(x), u_h(x) \in \mathbb{Z}[x]$ have weight 1. Since f (x) ∈ Z[x] is monic, wt f (x) = 1. By
Lemma A2.1, rs = 1 so
$f(x) = u_m(x)\,u_h(x).$
Since $u_m(x) = \frac{1}{r}\,m(x)$ has positive leading coefficient $\frac{1}{r}$, comparing leading coefficients on the
left and right (these being integers) gives r = s = 1. In particular, $m(x) = u_m(x) \in \mathbb{Z}[x]$.
This gives (i).
Now suppose (ii) holds. There exist n ≥ 1 and integers $a_0, a_1, \ldots, a_{n-1} \in \mathbb{Z}$ such that
$\theta^n + a_{n-1}\theta^{n-1} + \cdots + a_1\theta + a_0 = 0.$
In this case, the elements $1, \theta, \theta^2, \ldots, \theta^{n-1}$ generate Z[θ] as an additive group. To see this,
note that the additive subgroup generated by $1, \theta, \ldots, \theta^{n-1}$ is
$A := \mathbb{Z} + \mathbb{Z}\theta + \cdots + \mathbb{Z}\theta^{n-1}.$
Our hypothesis shows that $\theta^n \in A$. Multiplying both sides by θ yields $\theta^{n+1} \in A$.
Proceeding inductively, $\theta^j \in A$ for all j ≥ 0, and so A = Z[θ], which gives (iii).
It is obvious that (iii) implies (iv). Finally suppose (iv) holds, and let α1 , α2 , . . . , αn ∈
R such that
R = Zα1 + Zα2 + · · · + Zαn .
Denote by T : R → R the Z-module homomorphism (i.e. homomorphism of additive
groups) defined by α ↦ θα. There exist $a_{ij} \in \mathbb{Z}$ such that
$T(\alpha_i) = \theta\alpha_i = \sum_{j=1}^{n} a_{ij}\alpha_j.$
(In general the choice of coefficients $a_{ij} \in \mathbb{Z}$ is not unique; however, this point does not
affect our argument.) Then f (T ) = 0 where f (x) = det(xI − A), $A = [a_{ij} : 1 \le i, j \le n]$.
Clearly f (T ) : R → R is the Z-module homomorphism (i.e. homomorphism of additive
groups) α ↦ f (θ)α = f (T )α = 0. Since R has no zero divisors, this implies that f (θ) = 0.
But f (x) ∈ Z[x] is monic by construction, so (ii) follows.
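The last step of this proof can be tried out on small examples. The sketch below is our own illustration (the helper name char_poly and the choice of the Faddeev–LeVerrier recursion are assumptions, not part of the text): given an integer matrix A recording the action of θ on generators α_1, . . . , α_n, it returns the coefficients of f (x) = det(xI − A), which is monic with integer coefficients and has θ as a root.

```python
from fractions import Fraction

def char_poly(A):
    """Coefficients of det(xI - A), leading coefficient first.

    Uses the Faddeev-LeVerrier recursion, which is valid in characteristic zero.
    """
    n = len(A)
    M = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]  # start with I
    coeffs = [Fraction(1)]
    for k in range(1, n + 1):
        AM = [[sum(A[i][t] * M[t][j] for t in range(n)) for j in range(n)]
              for i in range(n)]
        c = -sum(AM[i][i] for i in range(n)) / k
        coeffs.append(c)
        M = [[AM[i][j] + (c if i == j else 0) for j in range(n)] for i in range(n)]
    return coeffs

# theta = (1 + sqrt 5)/2 acting on R = Z + Z*theta:
#   theta * 1 = theta,  theta * theta = 1 + theta
golden = [[0, 1],
          [1, 1]]
# theta = cube root of 2 acting on Z + Z*theta + Z*theta^2
cube_root_2 = [[0, 1, 0],
               [0, 0, 1],
               [2, 0, 0]]
print(char_poly(golden), char_poly(cube_root_2))
```

The two examples should recover the familiar polynomials x² − x − 1 and x³ − 2.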
Proof. (i) Let m and n be the degrees of α and β over Q, respectively. Then
$\mathbb{Z}[\alpha, \beta] = \sum_{i=0}^{m-1}\sum_{j=0}^{n-1} \mathbb{Z}\alpha^i\beta^j$. Since Z[α+β] ⊆ Z[α, β] where Z[α, β] is finitely generated, α+β ∈ I
by Theorem A3.1. The same argument holds for α−β and αβ.
(ii) The minimal polynomial of r ∈ Q over Q is m(x) = x−r. Use the characterization
of algebraic integers given in Theorem A3.1(i).
(iii) Let θ ∈ A be a root of $m(x) = x^n + a_{n-1}x^{n-1} + \cdots + a_1x + a_0 \in \mathbb{Q}[x]$. Choose a
positive integer k such that $ka_i \in \mathbb{Z}$ for all i. Then α = kθ is a root of
$k^n\, m\!\left(\frac{x}{k}\right) = x^n + ka_{n-1}x^{n-1} + k^2a_{n-2}x^{n-2} + \cdots + k^{n-1}a_1x + k^na_0 \in \mathbb{Z}[x]$
so that α ∈ I.
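The scaling in (iii) can be carried out mechanically. In the following Python sketch (ours; the function name and the ascending coefficient convention are assumptions made for illustration), k is taken to be the least common denominator of the coefficients, and the coefficient of x^i in k^n m(x/k) is computed as k^(n−i) a_i.

```python
from fractions import Fraction
from functools import reduce
from math import gcd

def clear_to_integral(m):
    """m = [a_0, ..., a_{n-1}, 1]: a monic polynomial over Q, ascending coefficients.

    Returns (k, coefficients of k^n * m(x/k)); the second entry is a monic
    integer polynomial having k*theta as a root whenever m(theta) = 0.
    """
    m = [Fraction(c) for c in m]
    n = len(m) - 1
    k = reduce(lambda x, y: x * y // gcd(x, y), (c.denominator for c in m), 1)
    return k, [int(c * k ** (n - i)) for i, c in enumerate(m)]

# theta = 1/sqrt(2) is a root of x^2 - 1/2; then 2*theta = sqrt(2) is a root of x^2 - 2.
print(clear_to_integral([Fraction(-1, 2), 0, 1]))
```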
Proof. We may use $\{1, \alpha, \alpha^2, \ldots, \alpha^{n-1}\}$ as a basis for F = Q[α], and let $\{\beta_1, \beta_2, \ldots, \beta_m\}$
be a basis for E over F ; thus n = [F : Q], m = [E : F ] and mn = [E : Q] (see the proof of
Theorem A1.1). The matrix of $T = T_\alpha$ with respect to the basis $\{\alpha^j\beta_i : 1 \le i \le m,\ 0 \le j \le n-1\}$ is
$I_m \otimes M$ where M is the n × n companion matrix of h(x). The result follows.
Proof. Let {α1 , . . . , αn } be a basis for E over Q. Without loss of generality, each αi ∈ O;
otherwise, by Theorem A3.2(iii), replace αi by a positive integer multiple thereof. Consider
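the free abelian group (i.e. Z-submodule of E) generated by our basis:
$L = \mathbb{Z}\alpha_1 + \mathbb{Z}\alpha_2 + \cdots + \mathbb{Z}\alpha_n \subseteq O.$
Using the nondegenerate bilinear form in Theorem A1.7, there is another basis $\{\beta_1, \ldots, \beta_n\}$
of E over Q, dual to the first basis, such that $\mathrm{Tr}_{E/\mathbb{Q}}(\alpha_i\beta_j) = \delta_{ij}$. Now given θ ∈ O, we may
express θ as a linear combination of the second basis as $\theta = \sum_{j=1}^{n} b_j\beta_j$ for some $b_j \in \mathbb{Q}$.
Since $\alpha_i, \theta \in O$ for each i, we have $\alpha_i\theta \in O$ and so $b_i = \mathrm{Tr}_{E/\mathbb{Q}}(\alpha_i\theta) \in \mathbb{Z}$. This shows that
$O \subseteq \mathbb{Z}\beta_1 + \mathbb{Z}\beta_2 + \cdots + \mathbb{Z}\beta_n,$
so O is a free abelian group of rank at most n. Recalling that O has a subgroup L which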
is free abelian of rank exactly n, this forces O to be free abelian also of rank n.
This determinant is a nonzero integer since it is the determinant of the Gram matrix of our nondegenerate
bilinear form $[a, b] = \mathrm{Tr}_{E/\mathbb{Q}}(ab)$ with respect to our base. Now consider another base
$\{\theta_1', \ldots, \theta_n'\}$ for O over Z, so that $\theta_i' = \sum_{j=1}^{n} a_{ij}\theta_j$ and the matrix $A = [a_{ij} : 1 \le i, j \le n]$
has integer entries. The inverse matrix $A^{-1}$ expressing the original base of $\theta_i$’s in terms of
the $\theta_j'$’s must similarly have integer entries; and so det A = ±1. then
Example A3.5: Quadratic Fields. Consider a quadratic extension $E = \mathbb{Q}[\sqrt{d}] \supset \mathbb{Q}$ where d ≠ 1
is a squarefree integer (i.e. up to sign, a product of distinct primes). The extension is real quadratic if d ≥ 2;
or imaginary quadratic if d ≤ −1. The matrix of $T_\alpha$, $\alpha = a + b\sqrt{d} \in E$ (a, b ∈ Q), with respect to the
basis $\{1, \sqrt{d}\}$ is $\begin{pmatrix} a & db \\ b & a \end{pmatrix}$. In order that $\alpha \in O_E$, Theorem A3.4 requires that both $\mathrm{Tr}_{E/\mathbb{Q}}\,\alpha = 2a$ and
$N_{E/\mathbb{Q}}\,\alpha = a^2 - db^2$ are integers. We have two cases. (Note that d ≢ 0 mod 4 since d is naturally
assumed to be squarefree.)
(i) When d ≡ 2 or 3 mod 4, this simplifies to a, b ∈ Z and we have $O \subseteq \mathbb{Z}[\sqrt{d}]$; and the reverse
containment is clear, so $O = \mathbb{Z}[\sqrt{d}]$ has base $\{1, \sqrt{d}\}$. The discriminant is $D := \det\begin{pmatrix} 2 & 0 \\ 0 & 2d \end{pmatrix} = 4d$.
(ii) When d ≡ 1 mod 4, we instead have $a = \frac{u}{2}$ and $b = \frac{v}{2}$ where u, v ∈ Z with u ≡ v mod 2 so
O ⊆ Z[θ] where $\theta = \frac{1}{2}(1 + \sqrt{d})$, and once again equality holds: O = Z[θ] has base {1, θ}. In this case
the discriminant is $D = \det\begin{pmatrix} 2 & 1 \\ 1 & \frac{1+d}{2} \end{pmatrix} = d$.
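The two cases of Example A3.5 translate directly into a short computation. The following Python sketch is ours (function names are illustrative assumptions, and squarefreeness of d is assumed rather than checked); it returns the discriminant and tests integrality of a + b√d via the trace and norm conditions used above.

```python
from fractions import Fraction

def discriminant(d):
    """Discriminant of Q(sqrt d) for a squarefree integer d != 1."""
    return d if d % 4 == 1 else 4 * d

def is_integer_of_field(a, b, d):
    """True if a + b*sqrt(d), with a, b rational, has integral trace and norm."""
    a, b = Fraction(a), Fraction(b)
    tr, nm = 2 * a, a * a - d * b * b
    return tr.denominator == 1 and nm.denominator == 1

half = Fraction(1, 2)
print(discriminant(-3), discriminant(7))
print(is_integer_of_field(half, half, 5), is_integer_of_field(half, half, 7))
```

Note that Python's % operator returns a nonnegative remainder, so the test d % 4 == 1 also treats negative d such as −3 correctly.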
2
As above, let E be a number field, and O its ring of integers. Denote by O× the
group of units (invertible elements) in O. By abuse of language, these are often called
the units of E. In the following, r denotes the number of embeddings of E in R (i.e. ring
monomorphisms E → R) and s is the number of pairs (under complex conjugation) of
embeddings E → C whose images do not lie in R. The total number of embeddings of E in C is
r + 2s = [E : Q]. (This relation, and the following theorem, hold for any finite extension
E ⊇ Q, Galois or not.)
Example A3.7: The Rationals. Q has (r, s) = (1, 0) and all its units Z× = {±1} are roots of
unity, a cyclic group of order 2.
Example A3.8: An Imaginary Quadratic Field. The imaginary quadratic extension
$E = \mathbb{Q}[\sqrt{-3}] \supset \mathbb{Q}$ has (r, s) = (0, 1). Its ring of integers O = Z[ω], $\omega = \zeta_3 = \frac{1}{2}(-1 + \sqrt{-3})$, has a
group of units $O^\times = \{\pm 1, \pm\omega, \pm\omega^2\}$ which is cyclic of order 6. There are no units of infinite order,
as r + s − 1 = 0. As explained above, {1, ω} is a base for O. For α = a + bω we have $T_\alpha = \begin{pmatrix} a & -b \\ b & a-b \end{pmatrix}$
with respect to our base; and $N_{E/\mathbb{Q}}\,\alpha = a^2 - ab + b^2$. To find units, we require integer solutions of
$N_{E/\mathbb{Q}}(a+b\omega) = a^2 - ab + b^2 = \frac{3}{4}a^2 + \frac{1}{4}(a-2b)^2 = 1$. The equation requires |a| ≤ 1, and a similar
argument gives |b| ≤ 1. After checking all nine pairs (a, b) satisfying these inequalities, we find only
six solutions of the Diophantine equation, viz. (a, b) ∈ {±(1, 0), ±(0, 1), ±(1, 1)} which gives the six
units listed above.
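The check of the nine pairs is a one-line search. A minimal Python sketch (ours, for illustration only; the function name is an assumption):

```python
def eisenstein_norm(a, b):
    """Norm of a + b*omega, where omega^2 + omega + 1 = 0."""
    return a * a - a * b + b * b

# The inequalities |a| <= 1 and |b| <= 1 leave nine candidate pairs.
units = [(a, b) for a in (-1, 0, 1) for b in (-1, 0, 1)
         if eisenstein_norm(a, b) == 1]
print(units)
```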
Example A3.9: A Real Quadratic Field. The real quadratic extension $E = \mathbb{Q}[\sqrt{7}] \supset \mathbb{Q}$ has
(r, s) = (2, 0). Its ring of integers $O = \mathbb{Z}[\sqrt{7}]$ has units $O^\times = \{\pm g^k : k \in \mathbb{Z}\}$ including two roots
of unity ±1 and the unit $g = 8 + 3\sqrt{7}$ which generates an infinite cyclic group (a free group on
r + s − 1 = 1 generator). The norm map $N = N_{E/\mathbb{Q}} : O \to \mathbb{Z}$, $a + b\sqrt{7} \mapsto a^2 - 7b^2$ is similarly useful
in verifying these claims; but we omit the details.
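Part of the omitted verification is easily mechanized. The Python sketch below is ours and only illustrative (the function names and the search bound are assumptions): it searches for the smallest b ≥ 1 for which a² − db² = 1 has an integer solution a. For d = 7 it should return the pair (8, 3) corresponding to g. It confirms only that g is the smallest unit of norm 1 with positive coefficients; that g generates the units modulo ±1 is the claim made in the text and is not proved by this search.

```python
from math import isqrt

def norm(a, b, d):
    """Norm of a + b*sqrt(d)."""
    return a * a - d * b * b

def smallest_norm_one_unit(d, bound=10**6):
    """Smallest b >= 1 (below bound) with a^2 - d*b^2 = 1; returns (a, b) or None."""
    for b in range(1, bound):
        a2 = d * b * b + 1
        a = isqrt(a2)
        if a * a == a2:
            return a, b
    return None

g = smallest_norm_one_unit(7)
print(g, norm(*g, 7))
```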
Two nonzero elements α, β ∈ O generate the same principal ideal, i.e. αO = βO, iff
β = uα for some u ∈ O× . In this case we say α and β are associates in O. Denote
by SE the set of all nonzero elements of OE which are not units. An element α ∈ SE is
reducible if α = βγ for some β, γ ∈ SE . If α is not reducible, it is irreducible (in O).
Assuming α, α′ ∈ SE are associates, then α is reducible iff α′ is. Every element in SE
is expressible as a finite product of irreducible elements; but this factorization is not in
general unique since any factorization α = π1 π2 · · · πk (with irreducible factors π1 , . . . , πk )
yields other such factorizations through permutations of the k factors, or through the
replacement of the irreducible factors by suitable associates (a process called migration
of units). We say OE (or, abusing language, E itself) has unique factorization, if every
element α ∈ SE factors into irreducible factors in an essentially unique way (i.e. up to
permutation of the factors, and migration of units). Not every ring of integers OE has
unique factorization (i.e. of elements). But OE always has unique factorization of ideals
(Theorem A3.10 below). When O is a principal ideal ring, this forces elements to also have
unique factorization; but since ideals in O are not necessarily principal, we do not always
obtain unique factorization of elements.
It might help here to keep in mind the hierarchy ED ⇒ PID ⇒ UFD ⇒ ID where
an integral domain (ID) is a commutative ring with identity having no zero divisors; a
unique factorization domain (UFD) is an integral domain with unique factorization
(of elements as product of irreducibles); a principal ideal domain (PID) is an integral
domain in which every ideal is principal; and a Euclidean domain (ED) is an integral
domain in which the ‘division algorithm’ holds. More about this appears at the end of
this Appendix. Since our interest focuses on the special case of the ring O of integers
in a number field, the hierarchy simplifies (see Theorem A3.14). In particular the rule
PID ⇒ UFD has a valid converse in the case of rings of integers, but not in the general
case; recall that Z[x] is a UFD with a nonprincipal ideal (2, x). We will postpone the
relevant theorem until after presenting some examples of rings of integers in a few specific
number fields. And before that, we need to review some terminology.
Recall that an ideal is an additive subgroup A ⊆ O such that OA ⊆ A, i.e. ra ∈ A
for all r ∈ O, a ∈ A. The sum and product of two ideals are the ideals defined by
A+B = {a+b : a ∈ A, b ∈ B};
AB = {finite sums of products ab with a ∈ A, b ∈ B}
= {a1 b1 +a2 b2 + · · · +ak bk : k > 1, a1 , a2 , . . . , ak ∈ A, b1 , b2 , . . . , bk ∈ B}.
We often abbreviate (a) = aO ⊆ O for the principal ideal generated by an element
a ∈ O. More generally, the ideal generated by a list of elements a1 , . . . , ak ∈ O is
(a1 , a2 , . . . , ak ) := (a1 ) + (a2 ) + · · · + (ak ) = Oa1 + Oa2 + · · · + Oak ⊆ O.
Two elements a, b ∈ O generate the same ideal (a) = (b) iff a and b are associates. A
proper ideal P ⊂ O is prime if any (and hence all) of the following equivalent conditions are satisfied:
(i) Whenever ab ∈ P with a, b ∈ O, we must have a ∈ P or b ∈ P.
(ii) If AB ⊆ P where A, B ⊆ O are ideals, we have A ⊆ P or B ⊆ P.
(iii) The quotient ring O/P is an integral domain (i.e. it has no zero divisors).
If a nonzero principal ideal (π) ⊂ O is prime, then its generator π is irreducible; the converse holds when O has unique factorization. A proper
ideal M ⊂ O is maximal if there is no proper ideal of O which strictly contains M;
equivalently, O/M is a field. So every maximal ideal is prime. The converse is not true
in general, but the ring of integers O of a number field is special in many ways including
this:
In (iv), each quotient field O/Pi is a residual field; the degree fi of its extension over Fp is
the residual degree; and the number of times ei that Pi divides (p) is the ramification
index of Pi . We say p ramifies in E if at least one of the indices satisfies ei > 1. We say
p remains prime if (p) = pO ⊂ O is prime; and p splits if there are d ≥ 2 distinct prime
factors. There are only finitely many primes which ramify (namely, those primes which
divide the discriminant). For Galois extensions, (iv) simplifies to $(p) = (P_1 P_2 \cdots P_d)^e$,
i.e. all ramification indices coincide: ei = e.
Example A3.11: The Rational Integers. Z has unique factorization. Here the irreducible
elements have the form ±p where p is an ordinary prime (of course p and −p are associates) and the
corresponding prime ideals have the form (p) = pZ. All ideals are principal, and unique factorization
of elements is due to unique factorization of ideals; for example, $(12) = (2)^2(3)$ yields $12 = 2^2 \cdot 3$.
Addition of ideals corresponds to taking greatest common divisors: (a1 ) + (a2 ) + · · · + (ak ) =
(a1 , a2 , . . . , ak ) = (d) where d = gcd(a1 , a2 , . . . , ak ).
Example A3.13: A Quartic Extension. Let E = Q[θ] where θ is a root of $f(x) = x^4 - x + 3$.
Since f (x) is irreducible over F2 , it is irreducible over Z and hence over Q. It may be shown that
O = Z[θ] and that the quartic extension E ⊃ Q has discriminant $6885 = 3^4 \cdot 5 \cdot 17$; and the only
roots of unity in O are ±1. Since r+s−1 = 0+2−1 = 1 in Theorem A3.6, the unit group has
the form $O^\times = \{\pm g^k : k \in \mathbb{Z}\}$ for some g. Computation shows that we may take $g = \theta^2 + 2\theta + 2$,
$g^{-1} = -\theta^3 + \theta^2 - 1$.
The rational prime 2 remains prime in E since $O/2O \cong \mathbb{Z}[x]/(2, f(x)) \cong \mathbb{F}_2[x]/(x^4+x+1) \cong \mathbb{F}_{16}$
(Example 3.3). Its residual degree is 4. Similarly, 11, 13, 43, 53, 61, . . . remain prime.
The rational prime 3 ramifies as $(3) = P_{3a}P_{3b}^3$ where both distinct factors $P_{3a} = (3, \theta)$ and
$P_{3b} = (3, 2+\theta)$ have residual degree 1. This follows from
$O/3O \cong \mathbb{Z}[x]/(3, f(x)) \cong \mathbb{F}_3[x]/(x(x+2)^3) \cong \mathbb{F}_3[x]/(x) \oplus \mathbb{F}_3[x]/((x+2)^3) \cong \mathbb{F}_3 \oplus S$
where $S \cong \mathbb{F}_3[x]/(x^3)$ is a local ring of order 27. Although the
residual degrees coincide (f1 = f2 = 1), the ramification indices e1 = 1 and e2 = 3 do not. This
points to the fact that the extension is not Galois.
The rational prime 17 ramifies as $(17) = P_{17}^2 P_{17}' P_{17}''$ where $P_{17} = (17, 13+\theta)$, $P_{17}' = (17, 10+\theta)$,
$P_{17}'' = (17, 15+\theta)$. Here
$O/17O \cong \mathbb{Z}[x]/(17, f(x)) \cong \mathbb{F}_{17}[x]/((x+13)^2(x+10)(x+15)) \cong R \oplus \mathbb{F}_{17} \oplus \mathbb{F}_{17}$
where $R = \mathbb{F}_{17}[\varepsilon]/(\varepsilon^2)$ is the ring of dual numbers over F17 , a local ring of order 289. Again the
ramification indices 2,1,1 do not all coincide.
The rational prime 5 ramifies as $(5) = P_5^2 P_5'$ where the distinct prime ideals $P_5 = (5, 1+\theta)$
and $P_5' = (5, 3+3\theta+\theta^2)$ have residual degrees 1,2 and ramification indices 2,1. Here
$O/5O \cong \mathbb{Z}[x]/(5, f(x)) \cong \mathbb{F}_5[x]/((x+1)^2(x^2+3x+3)) \cong R \oplus \mathbb{F}_{25}$ where R is the ring of dual numbers over F5 .
The rational prime 7 splits as $(7) = P_7 P_7'$ where the residual degrees are 1,3 and both ramification
indices are 1. Here $O/7O \cong \mathbb{Z}[x]/(7, f(x)) \cong \mathbb{F}_7[x]/((x+2)(x^3+5x^2+4x+5)) \cong \mathbb{F}_7 \oplus \mathbb{F}_{343}$. We find a
similar behaviour at the primes 19, 23, 37, 59, . . . .
The rational prime 29 splits as $(29) = P_{29} P_{29}' P_{29}''$ with residual degrees 1,1,2 and ramification
indices 1,1,1. Here $O/29O \cong \mathbb{Z}[x]/(29, f(x)) \cong \mathbb{F}_{29}[x]/((x+3)(x+6)(x^2+20x+5)) \cong \mathbb{F}_{29} \oplus \mathbb{F}_{29} \oplus \mathbb{F}_{841}$.
We find a similar behaviour at the primes 31, 41, 47, . . . .
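The factorizations of f (x) modulo p quoted in this example, and the stated inverse of the unit g, can be confirmed by direct multiplication. The following Python sketch is ours, written only as a check; the helper names are assumptions, and polynomials are stored as ascending coefficient lists. It does not find the factorizations, and it does not test irreducibility of the quadratic and cubic factors.

```python
def poly_mul_mod(f, g, p):
    """Product of two polynomials with coefficients reduced mod p."""
    h = [0] * (len(f) + len(g) - 1)
    for i, x in enumerate(f):
        for j, y in enumerate(g):
            h[i + j] = (h[i + j] + x * y) % p
    return h

def product_mod(factors, p):
    out = [1]
    for f in factors:
        out = poly_mul_mod(out, f, p)
    return out

F = [3, -1, 0, 0, 1]          # f(x) = x^4 - x + 3

# Factorizations mod p as quoted in Example A3.13.
claimed = {
    3:  [[0, 1]] + [[2, 1]] * 3,                 # x (x+2)^3
    5:  [[1, 1]] * 2 + [[3, 3, 1]],              # (x+1)^2 (x^2+3x+3)
    7:  [[2, 1], [5, 4, 5, 1]],                  # (x+2)(x^3+5x^2+4x+5)
    17: [[13, 1]] * 2 + [[10, 1], [15, 1]],      # (x+13)^2 (x+10)(x+15)
    29: [[3, 1], [6, 1], [5, 20, 1]],            # (x+3)(x+6)(x^2+20x+5)
}
checks = {p: product_mod(fs, p) == [c % p for c in F] for p, fs in claimed.items()}

def mul_in_O(u, v):
    """Multiply in Z[theta]: coefficients on 1, theta, theta^2, theta^3."""
    prod = [0] * 7
    for i, x in enumerate(u):
        for j, y in enumerate(v):
            prod[i + j] += x * y
    for k in range(6, 3, -1):      # theta^4 = theta - 3
        c, prod[k] = prod[k], 0
        prod[k - 3] += c
        prod[k - 4] -= 3 * c
    return prod[:4]

g, g_inv = [2, 2, 1, 0], [-1, 0, 1, -1]
print(checks, mul_in_O(g, g_inv))
```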
The examples above illustrate, among other things, the existence of nonprincipal ideals
in exactly those cases where unique factorization (of elements) fails. This is no coincidence:
As we have previously reminded the reader, for general integral domains, the UFD property
does not imply the PID property; an example is the ring Z[x], which is a UFD but whose
ideal (2, x) is nonprincipal.
Proof of Theorem A3.14. Suppose O is a principal ideal domain, and suppose a, b ∈ O
such that ab is divisible by an irreducible element p. Since ab = pd for some d ∈ O,
(a)(b) = (p)(d). But ideals in O factor uniquely, and the nonzero ideal (p) ⊂ O is prime;
so (a) ⊆ (p) or (b) ⊆ (p), i.e. p divides a or p divides b.
For the converse, let A ⊆ O be an arbitrary ideal, and we must show that A is
principal. We may assume A is nonzero, so A = P1 P2 · · · Pk is a product of nonzero
prime ideals Pi . If each Pi is principal, so is A; thus we may assume A = P is itself a
nonzero prime ideal. Now P ∩ Z is a prime ideal in Z, so P ∩ Z = pZ for some rational
prime p. (Alternatively, O/P is a finite field, so $O/P \cong \mathbb{F}_q$ where $q = p^e$, e ≥ 1 and
p is prime.) Let p = π1 π2 · · · πr be the unique factorization of p as a product of
irreducibles in O. Since p = π1 π2 · · · πr ∈ P where the ideal P is prime, we have πi ∈ P for
some i. Now (πi ) ⊆ P ⊂ O; and since the nonzero prime ideal (πi ) is maximal, (πi ) = P.
One way to verify that O is a UFD (and hence a PID) is to show that it satisfies the
division algorithm. We say that O is Euclidean if for every x, d ∈ O with d ≠ 0, there
exist q, r ∈ O such that x = qd + r with N(r) < N(d). More generally, any integral domain
satisfying such a division algorithm is called a Euclidean domain (ED) (whose ‘norm’
may go by another name, such as ‘degree’, depending on the context; but we do require
N(ab) = N(a) N(b), N(a) ∈ {0, 1, 2, . . .}, and N(a) = 0 iff a = 0).
The ring of Gaussian integers is Z[i]. The ring of Eisenstein integers is Z[ω].
Here i = ζ4 and ω = ζ3 .
Corollary A3.16. The rings Z[i] and Z[ω] are Euclidean. Hence these rings are
UFDs, as well as PIDs.
Proof. Let d = a + bi ∈ Z[i] be nonzero. Then the principal ideal (d) = Zd + iZd ⊂ O
forms a square lattice (the vertices of the square grid shown, below left).
Given z ∈ O, let qd ∈ (d) be a vertex of the square grid that is closest to z. Although the
choice of closest vertex may not be unique, it certainly has distance at most $|d|/\sqrt{2}$ from z, i.e.
$N(r) = |r|^2 \le \frac{1}{2}|d|^2 = \frac{1}{2}N(d)$ where r = z − qd. This shows that Z[i] is Euclidean. A
similar argument, using a grid formed by equilateral triangles, shows that Z[ω] is Euclidean.
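The closest-vertex argument is effectively an algorithm: round the exact quotient z/d to the nearest Gaussian integer. Here is a minimal Python sketch of it, given as an illustration under our own conventions (a Gaussian integer x + yi is stored as the pair (x, y), and the function names are assumptions).

```python
def nearest(num, den):
    """Nearest integer to num/den for den > 0 (ties rounded up)."""
    return (2 * num + den) // (2 * den)

def gauss_divmod(z, d):
    """Return (q, r) with z = q*d + r and N(r) <= N(d)/2, for d != 0 in Z[i]."""
    (x, y), (a, b) = z, d
    n = a * a + b * b                      # N(d)
    # z/d = z * conj(d) / N(d); round real and imaginary parts separately
    q = (nearest(x * a + y * b, n), nearest(y * a - x * b, n))
    r = (x - (q[0] * a - q[1] * b), y - (q[0] * b + q[1] * a))
    return q, r

def gauss_gcd(z, d):
    """Euclidean algorithm in Z[i]; the result is determined up to a unit."""
    while d != (0, 0):
        z, d = d, gauss_divmod(z, d)[1]
    return z

print(gauss_divmod((27, 23), (8, 1)), gauss_gcd((5, 0), (2, 1)))
```

Integer arithmetic is used throughout, so the remainder bound N(r) ≤ N(d)/2 holds exactly rather than up to floating-point error.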
[A][B] = [AB]
is well-defined for ideal classes; that is, it does not depend on the choice of representative
of each ideal class. Furthermore, this operation makes the set of ideal classes of O into an
abelian group, called the ideal class group of O (or of E). Since OA = A, the identity
element of this group is [O]. This class consists of all the principal ideals of O. Thus O is
a PID iff its ideal class group is trivial. Now for the nontrivial result:
Theorem A3.17. The ideal class group of every number field is finite.
The class number of a number field E, usually denoted by $h_E$, is the order of its ideal
class group. Constructing elements of the ideal class group of a given order (if they exist)
is usually not too hard; but finding explicit upper bounds on $h_E$ is often hard. Fortunately
for many of the smaller number fields of interest, the computation of class numbers and class
groups is within the reach of appropriate computational software.
It follows immediately that if $h = h_E$ is the class number of E, then for every ideal
A ⊆ O, the ideal $A^h \subseteq O$ is principal. This is often an adequate substitute for having a
PID. Sometimes it is helpful to note that h can be replaced here by the exponent of the
ideal class group. (Recall that the exponent of a group is the least common multiple of
the orders of the elements of that group, this being a divisor of the group order.)
Appendix A4: Normal and Separable Extensions
Theorem A4.2. (i) Let F be a field, and let f (x) ∈ F [x] be a nonconstant polyno-
mial. Then there exists a splitting field E ⊇ F for f (x) over F ; and the splitting
field is unique up to isomorphism.
(ii) A finite extension E ⊇ F is normal iff it is the splitting field of some polynomial
f (x) ∈ F [x] over F .
We therefore speak of the splitting field (rather than a splitting field) for f (x) ∈ F [x]
over F . We only sketch the construction of the splitting field of f (x) ∈ F [x] over F ,
as follows: First construct an extension $E_1 = F[\alpha_1] \cong F[x]/(f(x))$ (replacing f (x) in this quotient by one
of its irreducible factors if f (x) is reducible) such that f (x) =
(x − α1 )g(x), g(x) ∈ E1 [x], and then recursively apply this process to g(x), repeating
until we have obtained an extension in which f (x) splits into linear factors. The resulting
extension is finite by Theorem A1.1. For (ii), given a finite normal extension E ⊇ F , we
can easily express E = F [α1 , . . . , αn ], then take f (x) to be the least common multiple (in
F [x]) of the minimal polynomials of the generators α1 , . . . , αn over F ; clearly E is the
splitting field of f (x) over F .
An algebraic extension of fields E ⊇ F is separable if every irreducible polynomial
f (x) ∈ F [x] has no repeated roots in E, i.e. f (x) is not divisible by (x − α)2 for any α ∈ E.
Example A4.3: An Inseparable Extension. Let $E = \mathbb{F}_p(t)$, the field of rational functions in an
indeterminate t, with coefficients in the prime order field $\mathbb{F}_p$ (thus E is the field of quotients of the
polynomial ring $\mathbb{F}_p[t]$). This has a subfield $F = \mathbb{F}_p(t^p)$, and the extension E ⊃ F has degree p with
basis $\{1, t, t^2, \ldots, t^{p-1}\}$. It is not separable; the polynomial $f(x) = x^p - t^p \in F[x]$ is irreducible in
F [x]; yet it factors as $f(x) = (x-t)^p$ in E[x], where it has one distinct root t with multiplicity p. Now
E = F [t] ⊃ F is the splitting field of f (x), the minimal polynomial of t over F , so it is normal but
inseparable. Also $\mathrm{Tr}_{E/F}\,t = 0$ as seen from the coefficient of $x^{p-1}$ in f (x). More generally,
$\mathrm{Tr}_{E/F}(t^k) = 0$ for k = 0, 1, 2, . . . , p−1 and so the trace map of the extension vanishes identically:
$\mathrm{Tr}_{E/F} = 0$. The conclusion of Theorem A1.7(ii) fails dramatically; but so does the hypothesis since
E and F are infinite fields of positive characteristic p.
While examples like A4.3 do arise naturally in certain situations, throughout this course
we will treat them as pathological cases to be avoided. We focus instead on fields which
are either finite or have characteristic zero, which are always separable by Theorem A4.5
below.
Proof. Consider first the case char E = char F = 0. Suppose f (x) ∈ F [x] is monic
irreducible, and write $f(x) = a_0 + a_1x + \cdots + a_{n-1}x^{n-1} + x^n$ where n ≥ 1 and $a_i \in F$.
Suppose θ ∈ E is a root of f (x); so f (x) is the minimal polynomial of θ over F .
Term-by-term differentiation shows that the derivative $f'(x) \in F[x]$ has leading term $nx^{n-1}$ where
the coefficient is nonzero (it is here that we require the hypothesis that char F = 0) and
in particular $\deg f'(x) = n-1$. By minimality of the degree of the irreducible polynomial,
$f'(\theta) \ne 0$. However if $f(x) = (x-\theta)^2 g(x)$ where g(x) ∈ E[x], then the derivative
$f'(x) = (x-\theta)\bigl[(x-\theta)g'(x) + 2g(x)\bigr]$ has θ as a root. This is a contradiction.
A similar argument works if $E = \mathbb{F}_{q^r}$, $F = \mathbb{F}_q$, $q = p^e$, p prime, r, e ≥ 1. Let
f (x) ∈ F [x] be monic irreducible of degree n; and suppose θ ∈ E is a repeated root
of f (x). The argument above shows that $f'(x) = 0 \in F[x]$. In characteristic p this
simply means that every term in f (x) has exponent divisible by p, so n = rp and
$f(x) = b_0 + b_1x^p + b_2x^{2p} + \cdots + b_rx^{rp} = g(x)^p$ where $g(x) = c_0 + c_1x + c_2x^2 + \cdots + c_rx^r \in F[x]$
and $c_i = b_i^{p^{e-1}}$ using Theorem 3.7. But deg g(x) = r < n and evidently g(θ) = 0, again
contradicting the minimality of the degree of the minimal polynomial of θ over F .
Theorem A4.6. Let E ⊇ F be a separable extension of degree n, and let C be an
algebraically closed field containing F. Then there exist exactly n distinct F-monomorphisms
from E into C.
[Diagram: the fields F, E and C, with F contained in both E and C.]
Proof. First consider the special case that E = F [α] for some α ∈ E. Let f (x) =
Irrα,F (x). Since C is algebraically closed, f (x) splits into linear factors in C[x], say f (x) =
(x − α1 )(x − α2 ) · · · (x − αn ) where each αi ∈ C. For each i, observe that Irrαi ,F (x) = f (x)
since f (x) is monic irreducible in F [x] and has αi as a root.
For each i = 1, 2, . . . , n, define σi : F[α] → C by g(α) ↦ g(αi) where g(x) ∈ F[x].
Then σi is well-defined, since if g(α) = h(α), then g(x) ≡ h(x) mod (f (x)), in which
case g(αi ) = h(αi ). Clearly σi : E → C is a ring homomorphism, fixing every element
of F . Also σi is one-to-one, for if σi (g(α)) = g(αi ) = 0, then f (x) divides g(x), so that
g(α) = 0. So each σi : E → C is an F -monomorphism. The image of σi is the subfield
σi (E) = F [αi ] ⊆ C.
Now F[αi] ≅ F[α] = E is separable over F, so α1, α2, . . . , αn are distinct. Since
σi (α) = αi , the monomorphisms σ1 , σ2 , . . . , σn are distinct.
Finally, let σ be any F -monomorphism from E into C. Then f (σ(α)) = σ(f (α)) =
σ(0) = 0, so that σ(α) ∈ {α1 , α2 , . . . , αn }. Let us say that σ(α) = αi . Since the ring
homomorphisms σ and σi agree on F and on α, they must agree on F [α] = E, i.e. σ = σi .
Thus σ1 , σ2 , . . . , σn are the only F -monomorphisms from E into C.
Consider now the general case E ⊃ F, and let α ∈ E ∖ F. We may assume that
F [α] ⊂ E; otherwise we are done by the previous case. We have E ⊃ F [α] ⊃ F and
n = mt where m = [E : F [α]] and t = [F [α] : F ]. By induction on the degree of extension,
there exist t distinct F -monomorphisms σ1 , σ2 , . . . , σt : F [α] → C. Let αi = σi (α).
[Diagram: the tower F ⊆ F[α] ⊆ E is carried by σi onto F ⊆ F[αi] ⊆ σi(E) inside C, and θij maps σi(E) onto θij(σi(E)) ⊆ C.]
Since σi : E → σi (E) is an F -isomorphism, the extension σi (E) ⊇ F is separable; hence by
Theorem A4.4, the extension σi (E) ⊇ F [αi ] is separable. By induction on the degree of ex-
tension, for each i there exist m distinct F [αi ]-monomorphisms θi1 , θi2 , . . . , θim : σi (E) →
C. The composite maps θij ◦ σi : E → C constitute mt = n distinct F -monomorphisms.
To see that these are the only F -monomorphisms E → C, suppose that σ : E → C is an
F-monomorphism. As before, σ must take α to some αi. Then σ ◦ σi^{−1} : σi(E) → C is an
F[αi]-monomorphism, so by induction, σ ◦ σi^{−1} = θij for some j, whence σ = θij ◦ σi as
required.
Lemma A4.7. Let V be a vector space over an infinite field F . Then V is not a
union of finitely many proper subspaces.
Proof. Suppose there exists a positive integer n for which some vector space V over F is
covered by n proper subspaces. We may further suppose n is minimal with
this property; and now we seek a contradiction. Clearly n > 1; and there exists a vector
space V = V1 ∪ V2 ∪ · · · ∪ Vn over F where each Vi < V is a proper subspace. For each
i ∈ {1, 2, . . . , n}, there exists vi ∈ V ∖ ⋃_{j≠i} Vj by minimality of n. It is easy to see that
the affine line L = {v1 + tv2 : t ∈ F} intersects each Vi in at most one point (two distinct
points of L in Vi would force both v2 ∈ Vi and v1 ∈ Vi, which is impossible). However, L
has an infinite number of points in V = V1 ∪ V2 ∪ · · · ∪ Vn, a contradiction.
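The hypothesis that F is infinite cannot be dropped. As a minimal sketch (the prime fields F_p and the function names below are illustrative choices, not taken from the text), the following enumerates the p + 1 lines through the origin of the plane F_p^2 and checks that these proper subspaces do cover the whole plane.

```python
from itertools import product

def lines_through_origin(p):
    """Return the p + 1 one-dimensional subspaces of F_p^2 (p prime), each as a set of vectors."""
    directions = [(1, m) for m in range(p)] + [(0, 1)]
    return [{((t * a) % p, (t * b) % p) for t in range(p)} for a, b in directions]

def covers_plane(p):
    """Check whether the p + 1 lines through the origin cover all of F_p^2."""
    plane = set(product(range(p), repeat=2))
    union = set().union(*lines_through_origin(p))
    return union == plane

for p in (2, 3, 5, 7):
    # Each line is a proper subspace with p points, yet together they cover the plane.
    print(p, len(lines_through_origin(p)), covers_plane(p))
```

Over an infinite field the same family of lines is infinite, which is consistent with the lemma: no finite subfamily suffices.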
It is often useful to have a single generator for an extension field. The following result
guarantees that such a generator exists for all finite separable extensions. We present a
proof, however, only in the easiest cases which we care about most: finite fields and fields
of characteristic zero. For a proof in the general case, see e.g. Garling [Ga]. This is usually
called the Theorem of the Primitive Element, terminology that conflicts with usage
in the finite case, where a primitive element is a generator of the multiplicative group; so
we would prefer to call this the Theorem of Simple Extensions.
Proof in the case char F = 0 or |F | < ∞. The finite field case is easy: just take α to
be a generator of E × , by Theorem 3.2. Hence we assume the characteristic is zero. Let
C be an algebraically closed field containing F . By Theorem A4.6, there exist distinct
F -monomorphisms σ1 , σ2 , . . . , σn : E → C where n = [E : F ].
We claim that there exists α ∈ E such that the images σ1 (α), σ2 (α), . . . , σn (α) ∈ C
are distinct. To see this, we apply Lemma A4.7 as follows. Whenever 1 ≤ i < j ≤ n, the
set Vij = {x ∈ E : σi(x) = σj(x)} is a proper subspace of the vector space E over F.
Also |F| = ∞ since char F = 0. Since E cannot be covered by finitely many proper
subspaces Vij, there exists α ∈ E ∖ ⋃_{1≤i<j≤n} Vij, and this α has the required property:
σi(α) ≠ σj(α) whenever i ≠ j.
Since [E : F ] < ∞, we have F (α) = F [α] by Theorem A1.2. So we have a tower
of extensions E ⊇ F [α] ⊇ F and n = [E : F [α]][F [α] : F ]. Since the restrictions
σ1, . . . , σn : F[α] → C are distinct F-monomorphisms, we have n ≤ [F[α] : F] by
Theorem A4.6. Therefore [F[α] : F] = n and E = F[α].
Remark: The use of an algebraically closed extension C in the proof of Theorem A4.8 was
merely a convenient crutch, and was not really necessary. All that is really required is a
finite normal extension of E, thereby avoiding reference to the Axiom of Choice.
Appendix A5: Field Automorphisms and Galois Theory
We give a very quick introduction to Galois theory, with a few key small examples. For
more details and proofs, see e.g. [Ga], [Sa]. The following is a restatement of Theorems A4.5
and A4.8.
Theorem A5.1. Let E ⊇ F be a finite extension. Assume either that E and F are
finite fields, or that they have characteristic zero. Then
(a) E = F [α] for some α ∈ E, i.e. the extension is simple.
(b) The extension E ⊇ F is separable. Recall: this means that for every polynomial
f (x) ∈ F [x] which is irreducible in F [x], the polynomial f (x) has no repeated
roots in E.
Throughout this section, all finite extensions considered are assumed to satisfy the hypothe-
ses (and therefore the conclusions) of Theorem A5.1.
Denote by Aut E the group of all automorphisms of a field E. Two elements α and
β in a field F are algebraic conjugates if there exists an automorphism σ ∈ Aut E of
some extension E ⊇ F such that σ(α) = β.
Proof. The result is clear for k = 1 since each σ ∈ Aut E is nonzero. Suppose that there
exist distinct automorphisms σ1 , . . . , σk ∈ Aut E which are linearly dependent over E; we
seek a contradiction. We may suppose our counterexample is minimal; so k ≥ 2 and every
set of k − 1 distinct automorphisms of E is linearly independent. By assumption, there
exist c1, c2, . . . , ck ∈ E, not all zero, such that
c1σ1 + c2σ2 + · · · + ckσk = 0,
i.e. c1σ1(x) + c2σ2(x) + · · · + ckσk(x) = 0 for all x ∈ E. By minimality of k, every ci is
nonzero. Since σ1 ≠ σ2, there exists a ∈ E with σ1(a) ≠ σ2(a); replacing x by ax gives a
second equation
c1σ1(a)σ1(x) + c2σ2(a)σ2(x) + · · · + ckσk(a)σk(x) = 0.
Multiply the first equation by σ1(a) and subtract it from the second equation to get
c2(σ2(a)−σ1(a))σ2(x) + c3(σ3(a)−σ1(a))σ3(x) + · · · + ck(σk(a)−σ1(a))σk(x) = 0
for all x ∈ E. However, the coefficient of σ2(x) in this linear combination is nonzero,
contrary to our assumption of the minimality of k. This is a contradiction as desired.
Note that the heavy lifting in the last proof was accomplished by Theorem A4.6. This is
also true of our next proof.
A finite extension E ⊇ F for which equality holds with |G(E/F )| = [E : F ] is a
Galois extension. In this case, G = G(E/F ) is the Galois group of the extension.
Alternatively, one may characterize an extension as Galois iff it is finite, normal and
separable. This equivalence is due to the following.
Proof. First suppose E is the splitting field of f(x) ∈ F[x] over F; say f(x) = ∏_{i=1}^{k} (x − αi) ∈ F[x]
and E = F[α1, . . . , αk]. By Theorem A5.1, every σ ∈ G(E/F) permutes the
roots α1, . . . , αk; and since these roots generate E over F, distinct elements of G(E/F)
yield distinct permutations of the roots, and |G(E/F)| ≤ k!. Let C be an algebraic
closure of E, and let n = [E : F ]. By Theorem A4.6, there are exactly n distinct F -
monomorphisms E → C; and all of these must map E → E since they permute the roots
of f (x), these being generators of E over F . So we obtain n distinct elements of G(E/F ),
and the extension E ⊇ F is Galois.
Conversely, suppose E ⊇ F is normal. By Theorem A5.1, E = F[α] for some α ∈ E.
Let f(x) ∈ F[x] be the minimal polynomial of α over F, so that deg f(x) = n = [E : F].
Since E is normal and separable over F, f(x) = ∏_{i=1}^{n} (x − αi) with distinct roots αi ∈ E.
By Theorem A5.4, G(E/F) permutes α1, α2, . . . , αn transitively.
Quadratic field extensions are normal (and hence Galois). This is the field-theoretic
analogue of the fact that in group theory, subgroups of index 2 are normal:
Example A5.6: Quadratic Extensions. Assuming the hypotheses of Theorem A5.1, every
quadratic extension is Galois. Let E ⊃ F be a quadratic extension, and let α ∈ E ∖ F. Since
E ⊇ F[α] ⊃ F where [E : F] = 2, we must have E = F[α]. Let f(x) ∈ F[x] be the minimal
polynomial of α over F. Then f(x) is quadratic with a root in E, so f(x) has two distinct roots in
E: f(x) = (x − α)(x − α′) where α, α′ ∈ E. By Theorem A5.5, E ⊃ F is Galois. This means that
G(E/F) is generated by an automorphism σ of order 2 interchanging α ↔ α′.
For n ≥ 3, there exist both Galois and non-Galois extensions of degree n. The next
two examples include both types for n = 3. A cyclic extension is a Galois extension
with a cyclic Galois group; this is the case in Example A5.7.
By the Kronecker-Weber Theorem, every cyclic extension (being abelian) must be con-
tained in a cyclotomic extension. The extension of Example A5.7 is the ‘simplest’ example
of a cyclic extension of degree 3; it is a subfield of Q[ζ7] where we take α = ζ7 + ζ7^{−1}; see
Example 4.9.
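A quick numerical illustration of this cyclic cubic may be helpful. The sketch below assumes the standard minimal polynomial x^3 + x^2 − 2x − 1 of α = ζ7 + ζ7^{−1} = 2 cos(2π/7); that polynomial is supplied here as an assumption, since the presentation used in Example A5.7 may differ. The map x ↦ x^2 − 2, induced by ζ7 ↦ ζ7^2, is then checked to cycle the three real roots.

```python
import math

def f(x):
    """The cubic x^3 + x^2 - 2x - 1 (an assumed presentation, see the lead-in)."""
    return x**3 + x**2 - 2*x - 1

# alpha = zeta_7 + zeta_7^{-1} = 2 cos(2 pi / 7), followed by its two conjugates
roots = [2 * math.cos(2 * math.pi * k / 7) for k in (1, 2, 3)]

def sigma(x):
    """The map x -> x^2 - 2, induced on zeta + zeta^{-1} by zeta -> zeta^2."""
    return x * x - 2

for r in roots:
    # Each residual f(r) should vanish up to floating-point rounding.
    print(f"{r:+.12f}  f(r) = {f(r):+.2e}  sigma(r) = {sigma(r):+.12f}")
```

Since sigma permutes the roots in a single 3-cycle, it generates a cyclic group of order 3, matching the degree of the extension.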
Example A5.8: Galois Closure. The real number α = 2^{1/3} generates a cubic extension K =
Q[α] ⊃ Q. The minimal polynomial of α over Q is f(x) = x^3 − 2 = (x − α)(x^2 + αx + α^2) where
the quadratic factor is irreducible over K. The cubic extension K ⊃ Q is not Galois; f(x) does
not split into linear factors in K[x], and the group G(K/Q) = Aut K is trivial, in accordance with
Theorem A5.4.
The splitting field E of f(x) is the Galois closure or normal closure of K, i.e. the smallest
Galois extension of Q containing K. Since f(x) = (x − α)(x − ωα)(x − ω^2 α) where ω = ζ3, we have
E = Q[α, ω]. Note that [E : Q] = [E : K][K : Q] = 2 · 3 = 6. The Galois group of the extension
is G = G(E/Q) = Aut E = ⟨σ, τ⟩, a dihedral group of order 6 permuting the three roots in all 3! = 6
possible ways. Here τ denotes complex conjugation, interchanging ω ↔ ω^2 and fixing α; σ cycles the
three roots as α ↦ ωα ↦ ω^2 α ↦ α while fixing ω. The three roots of f(x) form the vertices of an
equilateral triangle embedded in C, on which G induces the full group of symmetries:
[Figure: the roots α, ωα, ω^2 α at the vertices of an equilateral triangle in C. τ reflects across the horizontal axis of symmetry; σ rotates 120° counter-clockwise about the center.]
Of course σ does not rotate the entire complex plane—it fixes all points of Q. The only elements of
G acting continuously on E are ι and τ .
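The action of G on the three roots in Example A5.8 can be checked mechanically. The following is a small sketch in which the roots are numbered 0, 1, 2 (a labelling chosen here for illustration), σ and τ are encoded as permutations of those indices, and the group they generate is obtained by closing under composition.

```python
import cmath

alpha = 2 ** (1 / 3)
omega = cmath.exp(2j * cmath.pi / 3)
roots = [alpha, omega * alpha, omega**2 * alpha]   # indices 0, 1, 2

sigma = (1, 2, 0)   # root i goes to root sigma[i]: alpha -> omega*alpha -> omega^2*alpha -> alpha
tau = (0, 2, 1)     # complex conjugation fixes alpha and swaps the other two roots

def compose(p, q):
    """Return the permutation 'p after q'."""
    return tuple(p[q[i]] for i in range(len(q)))

def generated_group(gens):
    """Close a set of permutations under composition (enough in a finite group)."""
    identity = tuple(range(len(gens[0])))
    group, frontier = {identity}, [identity]
    while frontier:
        g = frontier.pop()
        for s in gens:
            h = compose(s, g)
            if h not in group:
                group.add(h)
                frontier.append(h)
    return group

G = generated_group([sigma, tau])
print(len(G), compose(sigma, tau) == compose(tau, sigma))
```

The group has 3! elements and σ, τ do not commute, as expected for the dihedral group of order 6.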
Example A5.9: An Abelian Quartic Extension. Let E = Q[√2, √3] ⊃ Q. We show that this is
a simple extension generated by α = √2 + √3, an algebraic integer of degree 4. Direct computation
shows that α is a root of f(x) = x^4 − 10x^2 + 1 ∈ Q[x]. Clearly f(x) has no linear factors in Z[x],
since it has no roots in Z (indeed, no roots in F3). It has six monic quadratic factors in C[x]:
f(x) = (x^2 + 2√2 x − 1)(x^2 − 2√2 x − 1) = (x^2 + 2√3 x + 1)(x^2 − 2√3 x + 1) = (x^2 − 5 + 2√6)(x^2 − 5 − 2√6)
but none of these factors are in Q[x] since √2, √3, √6 are all irrational. It follows that f(x) is
irreducible in Q[x]. From these factorizations it also follows that Q[√2, √3] ⊆ Q[α] ⊆ Q[√2, √3] so
E = Q[α] is a quartic extension of Q as claimed. The four roots of f(x) are ±√2 ± √3 ∈ E, so E is
the splitting field of f(x), hence a Galois extension of Q.
Let G = G(E/Q) = Aut E, so that |G| = [E : Q] = 4. Every automorphism of E is
determined by its action on the generators √2 and √3; but there are only four possible combinations
of sign changes √2 ↦ ±√2, √3 ↦ ±√3; so all four of these combinations must yield automorphisms
of E. So we must have a Klein four-group G = ⟨σ, τ⟩ = {ι, σ, τ, στ} where σ(√2) = −√2, σ(√3) =
√3; τ(√2) = √2, τ(√3) = −√3. Here ι = id and στ(√2) = −√2, στ(√3) = −√3, so στ(√6) = √6.
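The "direct computation" in Example A5.9 can be reproduced with exact integer arithmetic. In this sketch an element a + b√2 + c√3 + d√6 of E is stored as the tuple (a, b, c, d); this encoding and the helper names are conventions chosen here for illustration.

```python
def mul(u, v):
    """Multiply two elements of Q[sqrt2, sqrt3] in the basis 1, sqrt2, sqrt3, sqrt6."""
    a1, b1, c1, d1 = u
    a2, b2, c2, d2 = v
    return (
        a1*a2 + 2*b1*b2 + 3*c1*c2 + 6*d1*d2,   # coefficient of 1
        a1*b2 + b1*a2 + 3*(c1*d2 + d1*c2),     # sqrt3 * sqrt6 = 3 sqrt2
        a1*c2 + c1*a2 + 2*(b1*d2 + d1*b2),     # sqrt2 * sqrt6 = 2 sqrt3
        a1*d2 + d1*a2 + b1*c2 + c1*b2,         # sqrt2 * sqrt3 = sqrt6
    )

def f(u):
    """Evaluate f(x) = x^4 - 10 x^2 + 1 at the element u."""
    u2 = mul(u, u)
    u4 = mul(u2, u2)
    return tuple(p - 10*q + r for p, q, r in zip(u4, u2, (1, 0, 0, 0)))

def automorphism(s2, s3):
    """The automorphism sending sqrt2 -> s2*sqrt2 and sqrt3 -> s3*sqrt3, with s2, s3 in {1, -1}."""
    return lambda u: (u[0], s2*u[1], s3*u[2], s2*s3*u[3])

alpha = (0, 1, 1, 0)                                   # sqrt2 + sqrt3
G = [automorphism(s2, s3) for s2 in (1, -1) for s3 in (1, -1)]
conjugates = [g(alpha) for g in G]

trace = tuple(sum(c[i] for c in conjugates) for i in range(4))
norm = (1, 0, 0, 0)
for c in conjugates:
    norm = mul(norm, c)
print(f(alpha), conjugates, trace, norm)
```

The four conjugates are distinct roots of f, and their sum and product agree (up to sign conventions) with the coefficients 0 and 1 of x^3 and x^0 in f(x).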
Example A5.10: A Galois Extension Admitting the Dihedral Group of Order 8. The
polynomial f(x) = x^4 − 2 ∈ Z[x] is irreducible over Q by Eisenstein’s Criterion A2.4. Its roots are
±α, ±iα where i = √−1 and α = 2^{1/4}. The splitting field of f(x) over Q is therefore E = Q[α, i] =
Q[α, ζ] where ζ = ζ8 = (1+i)/√2. Note that E = K[i] where K = Q[α] so [E : Q] = [E : K][K : Q] =
2 · 4 = 8. The Galois group G = G(E/Q) = Aut E = ⟨σ, τ⟩ is dihedral of order 8 where τ is complex
conjugation; σ(i) = i and σ permutes the four roots of f(x) cyclically as α ↦ iα ↦ −α ↦ −iα ↦ α.
Thus G permutes the four vertices of a square embedded in the complex plane as shown:
[Figure: the roots α, iα, −α, −iα at the vertices of a square in C. τ reflects across the horizontal axis of symmetry; σ rotates the four roots 90° counter-clockwise about the center.]
Let E ⊇ F be a Galois extension with Galois group G = G(E/F ). Galois theory gives
a beautiful description of all the intermediate fields K (i.e. E ⊇ K ⊇ F ), establishing a
one-to-one correspondence with the subgroups of G. A priori, it may not even be clear
why the number of intermediate fields K should even be finite, or whether there should
be any effective means of listing them all; but since G is a finite group, G has only
finitely many subgroups and these can be effectively enumerated, thereby giving the exact
number of subfields and their explicit description. This bijection, known as the Galois
correspondence, is naturally defined as follows:
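\[
\begin{array}{rcl}
\{\text{intermediate fields } K : E \supseteq K \supseteq F\} & \longleftrightarrow & \{\text{subgroups } H \le G\} \\[4pt]
K & \longmapsto & G_K = \{\sigma \in G : \sigma(a) = a \text{ for all } a \in K\} \\[4pt]
\mathrm{Fix}_E(H) = \{a \in E : \sigma(a) = a \text{ for all } \sigma \in H\} & \longleftarrow & H
\end{array}
\]
Here Fix_E(H) is the fixed subfield of H (in E).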
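Because the correspondence is a bijection, counting intermediate fields reduces to counting subgroups. As an illustrative sketch for the dihedral group of Example A5.10 (the numbering 0, 1, 2, 3 of the roots α, iα, −α, −iα is a labelling chosen here), the code below lists all subgroups by brute force.

```python
from itertools import combinations
from collections import Counter

def compose(p, q):
    """Return the permutation 'p after q'."""
    return tuple(p[q[i]] for i in range(len(q)))

def generated_group(gens):
    """Close a set of permutations under composition (enough in a finite group)."""
    identity = tuple(range(len(gens[0])))
    group, frontier = {identity}, [identity]
    while frontier:
        g = frontier.pop()
        for s in gens:
            h = compose(s, g)
            if h not in group:
                group.add(h)
                frontier.append(h)
    return group

def subgroups(G):
    """All subsets of G containing the identity and closed under composition."""
    identity = tuple(range(len(next(iter(G)))))
    others = sorted(G - {identity})
    found = []
    for r in range(len(others) + 1):
        for extra in combinations(others, r):
            H = {identity, *extra}
            if all(compose(a, b) in H for a in H for b in H):
                found.append(frozenset(H))
    return found

sigma = (1, 2, 3, 0)   # alpha -> i*alpha -> -alpha -> -i*alpha -> alpha
tau = (0, 3, 2, 1)     # complex conjugation swaps i*alpha and -i*alpha
G = generated_group([sigma, tau])
subs = subgroups(G)
orders = Counter(len(H) for H in subs)
print(len(G), len(subs), dict(orders))
```

Under the stated labelling the group has order 8 and ten subgroups, so the extension E ⊇ Q of Example A5.10 should have ten intermediate fields, counting E and Q themselves.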
The following application of Galois theory is typical: we want to justify why certain
(given) elements of E lie in a desired subfield. Given a Galois extension E ⊇ F , Theo-
rem A5.11 says that an element a ∈ E is fixed by every element of G = G(E/F ) iff a ∈ F .
See Appendix A7 for symmetric multivariate polynomials.
Proof. Let f (x) ∈ F [x] be the minimal polynomial of α over F , and let n = deg f (x). Let
E = F [α], so that [E : F ] = n and [K : F ] = mn where m = [K : E]. Since the extension
K ⊇ F is Galois, f (x) splits into linear factors in K[x] and there exist τ1 , τ2 , . . . , τn ∈ G
such that τ1 (α), τ2 (α), . . . , τn (α) ∈ K are the roots of f (x). (Note that the roots do not
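necessarily lie in E.) Now
\[
f(x) = \prod_{i=1}^{n} (x - \tau_i(\alpha)) = x^n - a_1 x^{n-1} + a_2 x^{n-2} - \cdots + (-1)^n a_n \in F[x]
\]
and
\[
\sum_{\sigma\in G} \sigma(\alpha) = \sum_{i=1}^{n} \sum_{\sigma\in G_E} \tau_i(\sigma(\alpha)) = m \sum_{i=1}^{n} \tau_i(\alpha) = m a_1 = m\,\mathrm{Tr}_{E/F}\,\alpha = \mathrm{Tr}_{K/F}\,\alpha
\]
by Corollary A1.10. (Here G_E ≤ G is the subgroup fixing E, of order m, and τ1, . . . , τn represent its cosets in G.)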
Proof. Let G = G(E/F ). Every algebraic conjugate of α has the form τ (α) ∈ E for some
τ ∈ G. Then
\[
\mathrm{Tr}_{E/F}(\tau(\alpha)) = \sum_{\sigma\in G} \sigma(\tau(\alpha)) = \sum_{\rho\in G} \rho(\alpha) = \mathrm{Tr}_{E/F}(\alpha)
\]
by Theorem A5.13, after substituting ρ = στ. The argument for norms is similar.
Proof. First consider the case that E = F_{q^n}, F = F_q. By Theorem 3.8, G = G(E/F) =
{ι, σ, σ^2, . . . , σ^{n−1}} where σ(x) = x^q and σ^n = ι. Regarding σ as an F-linear transformation
E → E for the moment, its minimal polynomial m(x) ∈ F[x] must divide x^n − 1. But if
deg m(x) < n, this would give a nontrivial F-linear combination of ι, σ, σ^2, . . . , σ^{n−1} equal
to zero, contrary to Theorem A5.2. This cannot happen; so deg m(x) = n. This means
that m(x) coincides with the characteristic polynomial of σ on E. Thus E is a cyclic F[σ]-
module, i.e. there exists β ∈ E such that {β, σ(β), σ^2(β), . . . , σ^{n−1}(β)} spans E over F,
thereby forming a normal basis as required. See e.g. [HH, Chapter 11] for relevant results
from linear algebra.
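For a concrete finite-field instance, one can search for normal basis generators directly. This is a minimal sketch for F_8 over F_2, assuming the model F_8 = F_2[x]/(x^3 + x + 1) with elements stored as 3-bit integers; the modulus and the encoding are choices made here, not taken from the text.

```python
MOD = 0b1011   # x^3 + x + 1, irreducible over F_2 (an assumed choice of modulus)
N = 3          # degree of the extension F_8 over F_2

def gf_mul(a, b):
    """Multiply in F_8 = F_2[x]/(x^3 + x + 1); elements are 3-bit integers."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & (1 << N):
            a ^= MOD
    return result

def frobenius_orbit(beta):
    """Return [beta, sigma(beta), sigma^2(beta)] where sigma(x) = x^2."""
    orbit = [beta]
    for _ in range(N - 1):
        orbit.append(gf_mul(orbit[-1], orbit[-1]))
    return orbit

def is_basis(vectors):
    """Vectors are independent over F_2 iff their span has 2^len(vectors) elements."""
    span = {0}
    for v in vectors:
        span |= {s ^ v for s in span}
    return len(span) == 1 << len(vectors)

normal = [b for b in range(1, 8) if is_basis(frobenius_orbit(b))]
print(normal, [frobenius_orbit(b) for b in normal])
```

In this model the class of x itself is not a normal element, since its orbit under σ sums to zero; the search returns the elements whose orbit does form a basis.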
It remains to consider the case E and F have characteristic zero; in particular they
are infinite fields. Here we paraphrase Artin’s proof [Ar]. By Theorem A5.1, E = F [α]
for some α ∈ E. Let f (x) ∈ F [x] be the minimal polynomial of α over F , so that
deg f(x) = n = [E : F] = |G| and f(x) = ∏_{σ∈G} (x − σ(α)). For each τ ∈ G consider the
polynomial
\[
g_\tau(x) = \prod_{\substack{\sigma\in G \\ \sigma\ne\tau}} \frac{x - \sigma(\alpha)}{\tau(\alpha) - \sigma(\alpha)} \in E[x]
\]
of degree n − 1 (noting that the n roots of f (x) are distinct so there is no division by zero
here). The reader may recognize these n polynomials as the Lagrange interpolation basis
for the polynomials of degree n − 1 at the roots of f (x): the polynomial gτ (x) vanishes
at all the roots of f (x) except at τ (α), where it has value 1 (see Theorem 3.12). Now it
follows that
\[
\sum_{\tau\in G} g_\tau(x) = 1 \tag{A5.16}
\]
since the polynomial on the left has degree at most n − 1, but by the preceding comments
it evaluates to 1 at each of the n distinct roots of f (x). Also in E[x] we have
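\[
g_\tau(x)\, g_\rho(x) \equiv
\begin{cases}
0 \bmod f(x), & \text{if } \tau \ne \rho \text{ in } G; \\
g_\tau(x) \bmod f(x), & \text{if } \tau = \rho \text{ in } G.
\end{cases}
\tag{A5.17}
\]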
The first congruence follows since gτ(x)gρ(x) vanishes at all of the n roots of f(x) whenever
τ ≠ ρ. When τ = ρ, multiplying both sides of (A5.16) by gτ(x) yields gτ(x)^2 ≡
gτ(x) mod f(x). Considering now the action of G on E[x] via its natural action on coefficients,
one easily finds that
(A5.18) G permutes the n polynomials gτ (x) for τ ∈ G, in the same way that G
permutes the n roots of f (x) (i.e. they are equivalent G-sets). In fact,
σ(gτ (x)) = gστ (x) for all σ, τ ∈ G.
Now consider the n × n matrix M (x) with rows and columns indexed by elements of G,
having (σ, τ )-entry equal to the polynomial gστ (x) ∈ E[x]. Since M (x) is an n × n ma-
trix with entries in E[x], its determinant is also a polynomial in x (in fact, of degree at
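most n(n − 1)). We will show that det M(x) ≠ 0, by showing that det(M(x)^T M(x)) =
(det M(x))^2 ≡ 1 mod f(x). For σ ≠ τ, the (σ, τ)-entry of M(x)^T M(x) is
\[
\sum_{\rho\in G} g_{\rho\sigma}(x)\, g_{\rho\tau}(x) = \sum_{\rho\in G} \rho\bigl(g_\sigma(x)\, g_\tau(x)\bigr) \equiv 0 \bmod f(x),
\]
while each diagonal (σ, σ)-entry is
\[
\sum_{\rho\in G} g_{\rho\sigma}(x)\, g_{\rho\sigma}(x) = \sum_{\rho\in G} g_\rho(x)^2 \equiv \sum_{\rho\in G} g_\rho(x) \equiv 1 \bmod f(x).
\]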
Thus in E[x] we have (det M(x))^2 ≡ 1 mod f(x) and, in particular, det M(x) is a nonzero
polynomial. Since F is an infinite field, there exists a ∈ F such that det M(a) ≠ 0. Take
B = {gτ(a) : τ ∈ G}. For all σ ∈ G we have σ(a) = a and so σ(gτ(a)) = gστ(a); thus G acts
on B. It remains to be shown that B is a basis for E over F. Suppose that ∑_{τ∈G} cτ gτ(a) = 0
for some constants cτ ∈ F. Applying an arbitrary σ ∈ G to this equation yields
\[
0 = \sum_{\tau\in G} \sigma\bigl(c_\tau\, g_\tau(a)\bigr) = \sum_{\tau\in G} g_{\sigma\tau}(a)\, c_\tau
\]
so the vector (cτ : τ ∈ G) is in the null space of the nonsingular matrix M(a). This forces
cτ = 0 for all τ, so B is a basis.
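It may help to carry out this construction by hand in the smallest case. The worked example below assumes F = Q, E = Q[√2], α = √2 and G = {ι, σ} with σ(√2) = −√2, and takes a = 1; these choices are illustrative and are not taken from the text.

```latex
% Worked instance of the construction for E = Q[\sqrt2] over F = Q (illustrative assumption)
\begin{align*}
f(x) &= x^2 - 2, \\
g_\iota(x) &= \frac{x - \sigma(\alpha)}{\iota(\alpha) - \sigma(\alpha)} = \frac{x + \sqrt2}{2\sqrt2}, \qquad
g_\sigma(x) = \frac{x - \iota(\alpha)}{\sigma(\alpha) - \iota(\alpha)} = \frac{\sqrt2 - x}{2\sqrt2}, \\
g_\iota(x) + g_\sigma(x) &= 1, \\
g_\iota(x)\, g_\sigma(x) &= \frac{2 - x^2}{8} \equiv 0 \pmod{f(x)}, \\
M(x) &= \begin{pmatrix} g_\iota(x) & g_\sigma(x) \\ g_\sigma(x) & g_\iota(x) \end{pmatrix}, \qquad
\det M(x) = g_\iota(x)^2 - g_\sigma(x)^2 = \frac{x}{\sqrt2}, \\
(\det M(x))^2 &= \frac{x^2}{2} \equiv 1 \pmod{f(x)}, \\
a = 1: \quad B &= \{ g_\iota(1),\ g_\sigma(1) \} = \Bigl\{ \tfrac{2 + \sqrt2}{4},\ \tfrac{2 - \sqrt2}{4} \Bigr\}.
\end{align*}
```

Here σ interchanges the two elements of B, and they are linearly independent over Q, so B is a normal basis in this small case.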
Appendix A6: Dedekind Zeta Functions and Dirichlet Series
Let E be a number field with ring of integers O = OE . The Dedekind zeta function
of E is the complex-valued function
\[
\zeta_E(s) = \sum_{0 \ne A \subseteq O} \frac{1}{N(A)^s}
\]
where the sum extends over all nonzero ideals A ⊆ O, and N(A) = |O/A| is the norm of A.
The series converges for complex numbers s with Re(s) > 1; but by analytic continuation,
the function has a meromorphic extension to C with a simple pole at s = 1. It has an
Euler factorization given by
\[
\zeta_E(s) = \prod_{P} \Bigl(1 - \frac{1}{N(P)^s}\Bigr)^{-1},
\]
also convergent for Re(s) > 1; the product extends over all nonzero prime ideals P ⊂ O.
The theorem equating the infinite series with the infinite product can readily be seen as
an algebraic reformulation of the fact that every nonzero ideal A ⊆ O factors uniquely as
a product of prime ideals (although the details regarding convergence require a little more
care than we provide here).
Example A6.1: The Riemann Zeta Function. For E = Q, the Dedekind zeta function coincides
with the Riemann Zeta Function: ζ_Q(s) = ζ(s) = ∑_{n=1}^{∞} 1/n^s. Its Euler factorization is ζ(s) =
∏_p (1 − p^{−s})^{−1} where the product extends over all rational primes p.
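The agreement between the series and the Euler product in Example A6.1 is easy to observe numerically. The sketch below compares truncations of both at s = 2 with the known value π^2/6; the truncation bounds and function names are arbitrary choices made for illustration.

```python
import math

def primes_up_to(n):
    """Sieve of Eratosthenes: list all primes p <= n."""
    sieve = bytearray([1]) * (n + 1)
    sieve[0:2] = b"\x00\x00"
    for i in range(2, int(n**0.5) + 1):
        if sieve[i]:
            sieve[i*i::i] = bytearray(len(range(i*i, n + 1, i)))
    return [i for i in range(n + 1) if sieve[i]]

def zeta_partial_sum(s, terms):
    """Truncated Dirichlet series: sum of 1/k^s for k = 1..terms."""
    return sum(1.0 / k**s for k in range(1, terms + 1))

def zeta_partial_product(s, bound):
    """Truncated Euler product over the primes p <= bound."""
    prod = 1.0
    for p in primes_up_to(bound):
        prod /= 1.0 - p ** (-s)
    return prod

target = math.pi**2 / 6
print(target, zeta_partial_sum(2, 10_000), zeta_partial_product(2, 1_000))
```

Both truncations approach the same limit from below, which is the numerical shadow of unique factorization in Z.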
The zeta functions of Dedekind are the most typical zeta functions of number theory;
and although not required in our Section 12, this appendix is intended to provide motiva-
tional context for our discussion there. Here the student may see something of the larger
role of zeta functions for studying the distribution of primes in Dedekind domains. An-
other reason for including this Appendix is to provide an additional application (Dirichlet’s
Theorem A6.2 below) of the character theory of finite abelian groups of Section 6.
A Dedekind domain is an integral domain in which every nonzero ideal factors
uniquely as a product of prime ideals. The two main examples are the ring OE of integers
in a number field E; and the ring of polynomials OE = F [x1 , x2 , . . . , xn ] in a function
field E = F (x1 , x2 , . . . , xn ). In both cases E is the field of fractions of OE . Questions
regarding the distribution of primes in OE are best studied by rephrasing them in terms
of the behavior of ζE (s) (particularly the zeroes and poles of this zeta function). Often
these questions are too difficult to solve in the number field case (witness the Riemann
hypothesis); and then one turns to the function field case where the questions are typically
more manageable, hoping for inspiration that might apply in the number field case. Thus
for example, the very precise formula of Theorem 3.13 counting irreducible polynomials
of each degree (and thereby prime ideals of a given norm) in Fq [x], has a clear analogue
for the prime-counting function π(x) = |{prime p ∈ N : p ≤ x}| which we can state as a
conjecture, but are currently unable to prove except in a weaker asymptotic sense.
Let G = (Z/NZ)^×, the multiplicative group of units of the ring of integers mod N; so
|G| = n := φ(N). For each character χ ∈ Ĝ, compose χ with the canonical projection
k ↦ k + NZ in order to lift χ to a map χ : Z → Z/NZ → C. We write χ(k) = 0 whenever
gcd(k, N) ≠ 1; and χ(k) ∈ ⟨ζn⟩ as before, if gcd(k, N) = 1. This extension of χ ∈ Ĝ to a
function Z → C, while not exactly a linear character as defined in Section 6, is completely
multiplicative (i.e. χ(kℓ) = χ(k)χ(ℓ) for all k, ℓ ∈ Z). It is called a Dirichlet character
modulo N. Each character χ ∈ Ĝ (lifted to Z) yields a Dirichlet L-function
\[
L_\chi(s) = \sum_{k=1}^{\infty} \frac{\chi(k)}{k^s}, \quad \text{where } s \in \mathbb{C}. \tag{A6.3}
\]
As with the Riemann zeta function of Example A6.1, the series (A6.3) converges for Re s > 1
but admits an analytic continuation to a meromorphic function on C ∖ {1}. And for exactly
the same reasons as in the zeta function case, we obtain an Euler factorization
\[
L_\chi(s) = \prod_{p} \Bigl(1 - \frac{\chi(p)}{p^s}\Bigr)^{-1},
\]
convergent at least for Re s > 1. (Here and throughout, the index p varies over all rational
primes.) While the function Lχ(s) has complex values in general, we can safely restrict
s to real values > 1 for the argument at hand. Here, all Euler factors have values in the
right half-plane where we can take the standard branch of natural logarithm; thus
\[
\ln L_\chi(x) = -\sum_{p} \ln\Bigl(1 - \frac{\chi(p)}{p^x}\Bigr), \quad \text{for } x > 1.
\]
We require the Taylor expansion of each of these terms, found by integrating 1/(1−u) =
1 + u + u^2 + u^3 + · · · (for |u| < 1) to obtain
\[
-\ln(1 - u) = u + \frac{u^2}{2} + \frac{u^3}{3} + \frac{u^4}{4} + \cdots = \sum_{k=1}^{\infty} \frac{u^k}{k}, \quad \text{for } |u| < 1.
\]
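Taking u = χ(p)/p^x and summing over all primes p gives
\[
\ln L_\chi(x) = \sum_{p} \frac{\chi(p)}{p^x} + \sum_{p} \sum_{k=2}^{\infty} \frac{\chi(p)^k}{k\, p^{kx}}. \tag{A6.4}
\]
The dominant terms in (A6.4) are those in the first sum ∑_p χ(p)/p^x. To see that the remaining
terms are small (their total contribution is uniformly bounded for all x > 1), we note that
\[
\Bigl| \sum_{p} \sum_{k=2}^{\infty} \frac{\chi(p)^k}{k\, p^{kx}} \Bigr|
\le \sum_{p} \sum_{k=2}^{\infty} \frac{1}{k\, p^{kx}}
\le \frac12 \sum_{p} \frac{1}{p^x (p^x - 1)}
\le \sum_{p} \frac{1}{(p^x)^2}
\le \sum_{r=2}^{\infty} \frac{1}{r^2} = \frac{\pi^2}{6} - 1 < 1 \quad \text{for all } x > 1,
\]
where the third inequality uses p^x ≥ 2.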
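\[
\sum_{\chi\in\widehat{G}} \overline{\chi(a)}\, \chi(k) =
\begin{cases}
n, & \text{if } k \equiv a \bmod N; \\
0, & \text{otherwise}
\end{cases}
\tag{A6.5}
\]
\[
\sum_{\chi\in\widehat{G}} \overline{\chi(a)}\, \ln L_\chi(x)
= \sum_{p} \sum_{\chi\in\widehat{G}} \frac{\overline{\chi(a)}\, \chi(p)}{p^x} + O(1)
= n \sum_{p \equiv a \bmod N} \frac{1}{p^x} + O(1) \quad \text{as } x \to 1^+
\tag{A6.6}
\]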
using (A6.5) for all terms with gcd(k, N )=1; and we recall that the terms with gcd(k, N )>1
give χ(k) = 0. Here ‘O(1)’ stands for terms that are uniformly bounded (it has absolute
value at most n, whatever the value of x > 1; this follows from (A6.4) and the estimate
which follows it). We see that the Dirichlet characters succeed in filtering out individual
congruence classes within the sequence of primes, thereby bringing us closer to our goal.
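The filtering effect of the characters can be seen in a small case. The sketch below assumes N = 7 with primitive root 3 (an example chosen here, so that G is cyclic and its characters are easy to write down); it builds the Dirichlet characters modulo N and evaluates the character sum of (A6.5), with the complex conjugate taken on χ(a).

```python
import cmath

N, g = 7, 3    # assumed example: 3 is a primitive root mod 7
n = N - 1      # n = phi(N) for prime N

# Discrete logarithm table: log[k] = j where g^j = k mod N
log = {pow(g, j, N): j for j in range(n)}

def chi(r, k):
    """The r-th Dirichlet character mod N (r = 0..n-1), extended by 0 to non-units."""
    if k % N not in log:
        return 0
    return cmath.exp(2j * cmath.pi * r * log[k % N] / n)

def filtered(a, k):
    """Sum over all characters of conj(chi(a)) * chi(k), as in (A6.5)."""
    return sum(chi(r, a).conjugate() * chi(r, k) for r in range(n))

a = 3
for k in range(1, 15):
    # The sum is n when k is congruent to a mod N, and 0 otherwise (up to rounding).
    print(k, round(abs(filtered(a, k)), 6))
```

Only the integers k ≡ 3 mod 7 survive the sum, which is the mechanism that isolates a single congruence class of primes in (A6.6).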
We now investigate the behaviour of each of the functions Lχ (x) as x → 1+ . We first
show that
by comparison with
\[
\int_{1}^{\infty} \frac{dt}{(tN+1)^x} = \frac{1}{(x-1)\,N\,(N+1)^{x-1}} \to \infty \quad \text{as } x \to 1^+.
\]
Denoting f(u) = u^{−x} for u > 0, the Mean Value Theorem yields
\[
f(rN+1) - f(rN+N) = f'(\xi)(1 - N) = (N-1)\,x\,\xi^{-x-1} \le \frac{(N-1)x}{(rN+1)^{x+1}} \le \frac{(N-1)x}{(rN+1)^2}
\]
for some ξ between rN+1 and rN+N, where x > 1. Using this in (A6.9) and substituting
into (A6.8) gives
\[
|L_\chi(x)| \le N(N-1)x \sum_{r=0}^{\infty} \frac{1}{(rN+1)^2} \le N(N-1)x \sum_{\ell=1}^{\infty} \frac{1}{\ell^2} = \frac{N(N-1)x\pi^2}{6}.
\]
We are now ready to prove Theorem A6.2, arguing by contradiction. Suppose there
are only finitely many primes p ≡ a mod N. Then the right side of (A6.6) is bounded as
x → 1+ (because the O(1) terms are bounded; and the other sum converges to
∑{1/p : primes p ≡ a mod N}, a finite sum by assumption). Therefore the left side of (A6.6)
must also remain bounded as x → 1+. The terms \overline{χ(a)} ln Lχ(x) for nontrivial χ certainly
remain bounded as x → 1+, by (A6.10). However for the trivial character χ, the term
|\overline{χ(a)} ln Lχ(x)| → ∞ as x → 1+, by (A6.7). This is the desired contradiction; so
Theorem A6.2 follows.
Appendix A7: Symmetric Polynomials
Note that e_k(x1, . . . , xn) has $\binom{n}{k}$ terms, these being the products of all k-subsets of the n
indeterminates. In particular,
and e_k = 0 for k ∉ {0, 1, 2, . . . , n}. From the definition, one readily deduces the identity
\[
\prod_{i=1}^{n} (t - x_i) = t^n - e_1 t^{n-1} + e_2 t^{n-2} - \cdots + (-1)^n e_n
\]
in F [x1 , . . . , xn , t], and so this product serves as a generating function for the elementary
symmetric polynomials. It also shows that the coefficients in any univariate polynomial
are (up to signs) the elementary symmetric polynomials in its roots.
Another important set of symmetric polynomials is the set of moment polynomials
or power sum polynomials
\[
\Bigl( \sum_{i=0}^{\infty} m_i t^i \Bigr) \Bigl( \sum_{j=0}^{\infty} (-1)^j e_j t^j \Bigr)
= \Bigl( \sum_{i=1}^{n} \frac{1}{1 - x_i t} \Bigr) \prod_{j=1}^{n} (1 - x_j t)
= \sum_{i=1}^{n} \prod_{\substack{1 \le j \le n \\ j \ne i}} (1 - x_j t)
= \sum_{j=0}^{n-1} (-1)^j (n - j)\, e_j t^j.
\]
The last equality holds because in the expansion of ∑_i ∏_{j≠i} (1 − x_j t), every monomial of
the form (−1)^j x_{i1} x_{i2} · · · x_{ij} t^j appears n − j times (once for every index i ∉ {i1, i2, . . . , ij}).
Comparing coefficients of like powers of t on both sides gives the required identities.
Newton’s identities show that the moment polynomials can be recursively expressed
as polynomials in e1 , e2 , . . . , en with integer coefficients, i.e. mk ∈ Z[e1 , e2 , . . . , en ] for all
k > 0. A foundational result in classical invariant theory shows that much more gener-
ally, every symmetric polynomial in x1 , x2 , . . . , xn is expressible as a polynomial in the
elementary symmetric polynomials (with coefficients in F ). This says that the subring
of F [x1 , x2 , . . . , xn ] consisting of all polynomials invariant under the full symmetric group
Sn , is exactly the subring F [e1 , e2 , . . . , en ]. The moment polynomials generate a subring
of the ring of all symmetric polynomials, i.e. F [m1 , m2 , . . . , mn ] ⊆ F [e1 , e2 , . . . , en ]. In
characteristic zero, equality holds as can be seen from Newton’s identities; since in
characteristic zero we can solve for $e_k = \frac{1}{k} \sum_{i=1}^{k} (-1)^{i+1} m_i e_{k-i}$ and thereby recursively express
e1, e2, . . . , en in terms of the moment polynomials. Similarly in positive characteristic p,
Theorem A7.1 allows us to express the elementary symmetric polynomials e_k in terms of
the moments, as long as k ≢ 0 mod p.
Although Newton’s identities give a very fast and practical recursive method for gen-
erating the moment polynomials from the sequence of elementary symmetric polynomials,
sometimes it is preferable to have instead a more explicit formula. In such cases we use
Before proving this formula, some remarks bear mention. General results of invariant
theory tell us that this expansion is unique (there can be no more than one way to express
m_k in terms of the elementary symmetric polynomials since e1, e2, . . . , en are algebraically
independent in F̄(x1, x2, . . . , xn) ⊃ F̄, where F̄ is the algebraic closure of F). As indicated
already, m_k ∈ Z[e1, e2, . . . , en] as follows by induction using Newton’s identities; therefore
the coefficients in Waring’s Formula must also be integers. Note that Waring’s Formula
expresses m_k in terms of e1, e2, . . . , eν only, where ν = min{k, n}; this is because e_j = 0
for j > n, and the constraints on the indices i1, i2, . . . , in implicitly require that i_j = 0
whenever j > k; moreover i_k ∈ {0, 1}, and the only term with i_k = 1 is (−1)^{k+1} k e_k. This
yields the following, which we use in Section 16:
It is not too hard to infer this result directly from Newton’s identities. Of course when
k > n, Corollary A7.4 reduces to the statement mk ∈ Z[e1 , e2 , . . . , en ] which we have
already seen.
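Proof of Theorem A7.3. Reversing the list of coefficients in f(t) gives the identity
\[
\prod_{i=1}^{n} (1 - x_i t) = 1 - e_1 t + e_2 t^2 - \cdots + (-1)^n e_n t^n
\]
in Z[x1, . . . , xn, t]. Now in Q((x1, x2, . . . , xn, t)) we obtain the identity
\[
\begin{aligned}
\sum_{j=1}^{\infty} \frac{m_j}{j}\, t^j
&= \sum_{i=1}^{n} \sum_{j=1}^{\infty} \frac{x_i^j}{j}\, t^j = -\sum_{i=1}^{n} \ln(1 - x_i t) \\
&= -\ln\bigl( 1 - e_1 t + e_2 t^2 - \cdots + (-1)^n e_n t^n \bigr) \\
&= \sum_{k=1}^{\infty} \frac{\bigl( e_1 t - e_2 t^2 + e_3 t^3 - \cdots + (-1)^{n+1} e_n t^n \bigr)^k}{k} \\
&= \sum_{\substack{i_1, i_2, \ldots, i_n \ge 0 \\ \text{not all zero}}} \binom{i_1 + i_2 + \cdots + i_n}{i_1, i_2, \ldots, i_n}\, e_1^{i_1} (-e_2)^{i_2} e_3^{i_3} \cdots \bigl( (-1)^{n+1} e_n \bigr)^{i_n}\, \frac{t^{\,i_1 + 2 i_2 + 3 i_3 + \cdots + n i_n}}{i_1 + i_2 + \cdots + i_n}.
\end{aligned}
\]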
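Reading off the coefficient of t^k in the last line gives an explicit expression for m_k, which can be tested numerically. The sketch below does this for a sample point; it implements only the coefficient extraction from the display above (the function names and the sample values are choices made here), so it should be read as a check of that display rather than as the text's statement of Waring's Formula.

```python
from fractions import Fraction
from math import factorial, prod

def exponent_vectors(k, n):
    """Yield all (i_1, ..., i_n) of non-negative integers with i_1 + 2 i_2 + ... + n i_n = k."""
    def rec(j, remaining):
        if j > n:
            if remaining == 0:
                yield ()
            return
        for i in range(remaining // j + 1):
            for rest in rec(j + 1, remaining - i * j):
                yield (i,) + rest
    yield from rec(1, k)

def moment_via_expansion(e, k):
    """m_k (k >= 1) from e = [e_0, e_1, ..., e_n], as k times the coefficient of t^k."""
    n = len(e) - 1
    total = Fraction(0)
    for iv in exponent_vectors(k, n):
        s = sum(iv)                                   # s >= 1 because k >= 1
        multinomial = factorial(s) // prod(factorial(i) for i in iv)
        term = Fraction(multinomial, s)
        for j, i in enumerate(iv, start=1):
            term *= ((-1) ** (j + 1) * e[j]) ** i     # factor ((-1)^(j+1) e_j)^(i_j)
        total += term
    return k * total

# Sample point (1, 2, 3): e_1 = 6, e_2 = 11, e_3 = 6 and m_k = 1 + 2^k + 3^k
e = [1, 6, 11, 6]
for k in range(1, 8):
    print(k, moment_via_expansion(e, k), 1 + 2**k + 3**k)
```

The two columns agree, and the computed values are integers even though the intermediate terms involve the denominators i_1 + · · · + i_n.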
PARI/GP
PARI is open source software designed specifically for computational number theory.
Although it is not a general purpose package for symbolic computation, for computational
number theory its capabilities are on par with anything else you will have access to; and
it is easier to install than any of the other systems. It is freely available for download in
Windows, Mac and Linux versions, from
https://pari.math.u-bordeaux.fr/download.html
In addition to the documentation available through the official PARI/GP website, many
tutorials are available online in both video and readable document form.
Mathematica
Although Mathematica is proprietary software, it is accessible to current students
through our campus license. It is suitable for general symbolic computation, not only in
computational number theory, but for a wide range of mathematical tasks.
Maple
Another general purpose package for symbolic computation, including computational
number theory, is Maple. This is proprietary software which is also currently available to
our students; but we anticipate losing the license for this about a year from now.
Sage
Sage is open source software for performing general symbolic computation. It is freely
available for download from
http://www.sagemath.org/download.html
although trickier to install and use than other options. It is also not as full-featured as the
other software available; but it is steadily growing thanks to the programming contributions
of its devoted users and fans.
Magma
Magma is proprietary software for general algebraic computation. However if you are
interested, you might ask around our department for help getting this installed.
Appendix A8: COMPUTATIONAL SOFTWARE
PARI/GP
The screenshot below (on the right) shows a short PARI/GP session verifying selected
details from our Example A3.13. Ending a command with a semicolon suppresses output.
This interactive session included 16 input commands. Our comments on the session, as
follows, are listed according to step numbers:
In fact, the default command for computing the class number in PARI/GP is conditional on
GRH (the Generalized Riemann Hypothesis). Should you choose not to trust this result,
the PARI/GP documentation describes how to verify this computation unconditionally (i.e.
without relying on GRH).
Mathematica
The following pages show a Mathematica session checking some of the steps in the same
Example A3.13. Although Mathematica does not currently have every feature built in (for
instance, the session ends with NumberFieldClassNumber reporting that the class number
is not yet available), you will have no trouble reproducing all the details of Example A3.13,
using Mathematica for the laborious calculations, provided you follow the steps shown in
that worked example.
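As one instance of such a laborious calculation, the value 6885 and its factorization $3^4 \cdot 5 \cdot 17$ reported in the session coincide with the discriminant of the polynomial $f = x^4 - x + 3$ used there. The sketch below cross-checks this in plain Python, independently of the packages discussed above. It assumes the standard formula $\operatorname{disc}(f) = (-1)^{m(m-1)/2}\operatorname{Res}(f, f')/a_m$ with the resultant taken as a Sylvester determinant; the helper names are ours, not from the text.

```python
from fractions import Fraction


def sylvester(f, g):
    """Sylvester matrix of f and g (coefficient lists, highest degree first)."""
    m, n = len(f) - 1, len(g) - 1
    rows = [[0] * i + f + [0] * (n - 1 - i) for i in range(n)]
    rows += [[0] * i + g + [0] * (m - 1 - i) for i in range(m)]
    return rows


def determinant(rows):
    """Exact determinant by Gaussian elimination over the rationals."""
    a = [[Fraction(v) for v in row] for row in rows]
    size = len(a)
    det = Fraction(1)
    for col in range(size):
        pivot = next((r for r in range(col, size) if a[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det  # a row swap flips the sign
        det *= a[col][col]
        for r in range(col + 1, size):
            factor = a[r][col] / a[col][col]
            for c in range(col, size):
                a[r][c] -= factor * a[col][c]
    return det


def discriminant(f):
    """Discriminant of f (degree m >= 2, integer coefficients, highest degree first)."""
    m = len(f) - 1
    deriv = [c * (m - i) for i, c in enumerate(f[:-1])]
    res = determinant(sylvester(f, deriv))
    return (-1) ** (m * (m - 1) // 2) * res / f[0]


def factor_integer(n):
    """Prime factorization of |n| by trial division, as {prime: exponent}."""
    n = abs(n)
    factors = {}
    p = 2
    while p * p <= n:
        while n % p == 0:
            factors[p] = factors.get(p, 0) + 1
            n //= p
        p += 1
    if n > 1:
        factors[n] = factors.get(n, 0) + 1
    return factors


if __name__ == "__main__":
    d = discriminant([1, 0, 0, -1, 3])  # f = x^4 - x + 3
    assert d == 6885
    assert factor_integer(int(d)) == {3: 4, 5: 1, 17: 1}
    print(d, factor_integer(int(d)))
```

Trial division is adequate only because the number being factored is tiny; a dedicated system such as PARI/GP is the right tool for anything larger.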
In[4]:= theta = Root[f, 1]
Out[4]= Root[3 - #1 + #1^4 &, 1]

Out[5]= 6885

In[6]:= FactorInteger[%]
Out[6]= {{3, 4}, {5, 1}, {17, 1}}

Verify irreducibility

In[ ]:= Factor[f]
Out[ ]= 3 - x + x^4

In[ ]:= Factor[f, Modulus → 3]
Out[ ]= x (2 + x)^3

In[ ]:= Factor[f, Modulus → 5]
Out[ ]= (1 + x)^2 (3 + 3 x + x^2)

In[ ]:= Factor[f, Modulus → 7]
Out[ ]= (2 + x) (5 + 4 x + 5 x^2 + x^3)

In[ ]:= Factor[f, Modulus → 17]
Out[ ]= (10 + x) (13 + x)^2 (15 + x)

NumberFieldClassNumber: The class number of the number field generated by Root[3 - #1 + #1^4 &, 1, 0] is not yet available.
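The modular factorizations in this session can be confirmed by multiplying the factors back together. The sketch below does so in plain Python, reading the four outputs as $x(2+x)^3$ modulo 3, $(1+x)^2(3+3x+x^2)$ modulo 5, $(2+x)(5+4x+5x^2+x^3)$ modulo 7 and $(10+x)(13+x)^2(15+x)$ modulo 17; this reading of the exponents is an assumption which the check itself supports. The names `poly_mul_mod`, `FACTORIZATIONS` and `check` are ours.

```python
def poly_mul_mod(a, b, p):
    """Product of two polynomials (coefficient lists, constant term first) mod p."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] = (out[i + j] + x * y) % p
    return out


F = [3, -1, 0, 0, 1]  # f = 3 - x + x^4, constant term first

# Factors listed with multiplicity, each as a coefficient list (constant term first).
FACTORIZATIONS = {
    3: [[0, 1], [2, 1], [2, 1], [2, 1]],        # x (2 + x)^3
    5: [[1, 1], [1, 1], [3, 3, 1]],             # (1 + x)^2 (3 + 3x + x^2)
    7: [[2, 1], [5, 4, 5, 1]],                  # (2 + x)(5 + 4x + 5x^2 + x^3)
    17: [[10, 1], [13, 1], [13, 1], [15, 1]],   # (10 + x)(13 + x)^2 (15 + x)
}


def check(p, factors):
    """True if the product of the given factors equals f modulo p."""
    acc = [1]
    for g in factors:
        acc = poly_mul_mod(acc, g, p)
    return acc == [c % p for c in F]


if __name__ == "__main__":
    for p, factors in FACTORIZATIONS.items():
        assert check(p, factors), p
    print("all four factorizations multiply back to f")
```

Repeated factors occur for 3, 5 and 17, which are exactly the primes dividing 6885, while no repeated factor appears modulo 7, which does not divide 6885.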