
Algebra

Hung-Hsun, Yu

September 20, 2024


Preface

This pdf contains algebra-related material that I have learned so far. Since the only
algebra classes that I have taken are 18.701 (Algebra I) and 18.702 (Algebra II) at MIT,
most of the material in this pdf comes from those classes and the textbook they use.
Besides that, there are some other miscellaneous resources that I have consulted (mostly
from the net), either prior to the classes or during them, to gain a better understanding
of algebra, and I will include those in this pdf too.
Since I cannot recall correctly where all the material comes from, and since this pdf
is not that formal, I will not worry much about citations in this pdf. However, if the
reader notices that a citation should be made somewhere, I would appreciate it if the
reader could inform me so that I can add it.
Regarding the format of this pdf, it will primarily consist of sketches of important
motivations and proofs, since my motivation for editing this pdf is to keep a note of what
I have learned. However, I believe that the blanks I have left can be easily filled in by
the readers, and I think that the process of completing the proofs can help the readers
learn the material better. If the readers encounter difficulty completing a proof, most
complete proofs can be found on the net. That said, the readers are still encouraged to
spend some time working on their own before consulting Wikipedia or Google. If the
readers find some sketch of proof particularly confusing, please tell me so that I can
improve it.
Finally, I hope that my notes can help you understand the beauty of math more and
give you more motivation to learn more by yourself :)

Contents

1 Matrix 9
1.1 Basic Operations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
1.2 Reduced Row Echelon Form . . . . . . . . . . . . . . . . . . . . . . . . . 11
1.3 Determinant . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
1.4 Cofactor Matrix and Invertibility . . . . . . . . . . . . . . . . . . . . . . 14
1.5 Random Problem Set . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15

2 Basic Group Theory 17


2.1 Definition and Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
2.2 Symmetric Group . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
2.3 Subgroup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
2.4 Homomorphism . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
2.5 Equivalence Relation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
2.6 Coset . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
2.7 Normal Subgroup and Quotient Group . . . . . . . . . . . . . . . . . . . 24
2.8 First Isomorphism Theorem . . . . . . . . . . . . . . . . . . . . . . . . . 25
2.9 Direct Product and Semidirect Product . . . . . . . . . . . . . . . . . . . 27
2.10 Correspondence Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . 29
2.11 Simple Group and Alternating Group . . . . . . . . . . . . . . . . . . . . 30
2.12 Random Problem Set . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31

3 Vector Space 33
3.1 Field . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
3.2 Definitions and Examples . . . . . . . . . . . . . . . . . . . . . . . . . . 34
3.3 Basis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
3.4 Linear Transformation and Matrix . . . . . . . . . . . . . . . . . . . . . 36
3.5 Multilinear Alternating Form . . . . . . . . . . . . . . . . . . . . . . . . 38
3.6 Change of Basis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
3.7 Rank and Nullity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
3.8 Another Dimension Formula . . . . . . . . . . . . . . . . . . . . . . . . . 42
3.9 Application: Lagrange Interpolation . . . . . . . . . . . . . . . . . . . . 42
3.10 Random Problem Set . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44

4 Linear Operator 45
4.1 Definition and Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
4.2 Determinant and Characteristic Polynomial . . . . . . . . . . . . . . . . 46
4.3 Invariant Subspace and Eigenspace . . . . . . . . . . . . . . . . . . . . . 47
4.4 Cayley-Hamilton Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . 49


4.5 Generalized Eigenspace . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50


4.6 Jordan Canonical Form . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
4.7 Application 1: Homogeneous Linear Dif Eq . . . . . . . . . . . . . . . . . 53
4.8 Application 2: Markov Chain . . . . . . . . . . . . . . . . . . . . . . . . 54
4.9 Perron-Frobenius Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . 56
4.10 Random Problem Set . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58

5 Symmetry and Group Action 61


5.1 Isometry . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
5.2 Isometry of Plane . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
5.3 Symmetry group . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
5.4 Lattice Group and Point Group . . . . . . . . . . . . . . . . . . . . . . . 65
5.5 Crystallographic Restriction . . . . . . . . . . . . . . . . . . . . . . . . . 66
5.6 Group Action . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
5.7 Finite Subgroup of SO3 . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
5.8 Random Problem Set . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70

6 Advanced Group Theory 73


6.1 Cayley’s Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
6.2 Class Equation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
6.3 Normalizer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
6.4 Cauchy’s Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
6.5 Sylow p-group . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
6.6 Free Group and Relation . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
6.7 Todd-Coxeter Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
6.8 Burnside’s Lemma . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
6.9 Random Problem Set . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83

7 Bilinear Form 85
7.1 Bilinear Form and Dual Space . . . . . . . . . . . . . . . . . . . . . . . . 85
7.2 Start From Standard Inner Product . . . . . . . . . . . . . . . . . . . . . 86
7.3 Symmetric Form . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
7.4 Hermitian Form . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
7.5 Orthogonality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
7.6 Inner Product Space . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
7.7 Spectral Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
7.8 Positive Definite and Semi-definite Matrices . . . . . . . . . . . . . . . . 95
7.9 Application 1: Quadratic Form and Quadric . . . . . . . . . . . . . . . . 98
7.10 Application 2: Legendre Polynomial . . . . . . . . . . . . . . . . . . . . 100
7.11 Random Problem Set . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102

8 Group representation 105


8.1 Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105
8.2 Unitary Representation and Maschke’s Theorem . . . . . . . . . . . . . . 106
8.3 Schur’s Lemma . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108
8.4 Interlude: Tensor Product . . . . . . . . . . . . . . . . . . . . . . . . . . 108
8.5 Character . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110
8.6 Orthonormality of Irreducible Characters . . . . . . . . . . . . . . . . . . 111
8.7 Restricted and Induced Representation . . . . . . . . . . . . . . . . . . . 114


8.8 Dual Representation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116


8.9 Random Problem Set . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 117

9 Ring 119
9.1 Definitions and Examples . . . . . . . . . . . . . . . . . . . . . . . . . . 119
9.2 Ring Homomorphism . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 120
9.3 Subring and Ideal . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121
9.4 Integral Domain and Divisibility . . . . . . . . . . . . . . . . . . . . . . . 123
9.5 Ideal and Divisibility . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125
9.6 Z is PID . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127
9.7 Noetherian and Existence of Factorization . . . . . . . . . . . . . . . . . 128
9.8 PID is UFD . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129
9.9 Polynomial Ring . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131
9.10 Adjoining an element . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133
9.11 Fraction Field and Localization . . . . . . . . . . . . . . . . . . . . . . . 134
9.12 Nullstellensatz and Algebraic Geometry . . . . . . . . . . . . . . . . . . 137
9.13 Random Problem Set . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 139

10 Module Theory 143


10.1 Definitions and Examples . . . . . . . . . . . . . . . . . . . . . . . . . . 143
10.2 Submodule and Ideal . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 145
10.3 Module Homomorphism . . . . . . . . . . . . . . . . . . . . . . . . . . . 147
10.4 Direct Sum and Direct Product . . . . . . . . . . . . . . . . . . . . . . . 148
10.5 Balanced Product and Tensor Product . . . . . . . . . . . . . . . . . . . 150
10.6 Group Representation Revisit . . . . . . . . . . . . . . . . . . . . . . . . 151
10.7 Structure Theorem for Module over PID . . . . . . . . . . . . . . . . . . 154
10.8 Two Applications of the Structure Theorem . . . . . . . . . . . . . . . . 157
10.9 Random Problem Set . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 159

11 Field and Galois Theory 161


11.1 Field Extension . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 161
11.2 Finite Field Part One . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 163
11.3 Splitting Field . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 165
11.4 Finite Field Part 2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 166
11.5 Algebraic Closure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 168
11.6 Automorphism . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 170
11.7 Fixed Field . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 173
11.8 Separability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 176
11.9 Fundamental Theorem of Galois Theory . . . . . . . . . . . . . . . . . . 178
11.10 More on Separability . . . . . . . . . . . . . . . . . . . . . . . . . . . 180
11.11 Random Problem Set . . . . . . . . . . . . . . . . . . . . . . . . . . . 183

12 Applications of Field Extension and Galois Theory 187


12.1 Ruler and Compass Construction . . . . . . . . . . . . . . . . . . . . . . 187
12.2 Solution in Radicals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 190
12.3 Elementary Symmetric Polynomial . . . . . . . . . . . . . . . . . . . . . 191
12.4 Kummer Extension . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 194
12.5 Characterization of Kummer Extension . . . . . . . . . . . . . . . . . . . 196
12.6 Natural Irrationality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 198


12.7 Radical Formula and Classifying Galois Group . . . . . . . . . . . . . . . 200


12.8 Random Problem Set . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 206

Chapter 1

Matrix

The matrix is a simple but important tool, so we'll investigate several properties of
matrices at the beginning of this note. Although all the properties can be discussed in a
more general setting, it never hurts to start with concrete examples that we are familiar
with, so the matrices in this chapter will all have complex coefficients.
If the readers feel comfortable working and computing with matrices, feel free to skip
this chapter. Things get more interesting when matrices are associated with linear algebra :D.

1.1 Basic Operations


Matrix can serve as a way to record a bunch of numbers in a rectangular form. To be
specific:
Definition 1.1.1. An m by n matrix A with coefficients in K (where K = R or C in
this chapter) is a collection of numbers a_{ij} ∈ K where i = 1, . . . , m and j = 1, . . . , n.
Usually, we will arrange the a_{ij} in the following way to visualize A better:

$$A = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{pmatrix}$$

Definition 1.1.2. The set of m by n matrices with coefficients in K is denoted by
M_{m×n}(K).

There is a convention of denoting matrices by uppercase letters and their entries by
the corresponding lowercase letters. If no confusion arises, I will use this convention
throughout the whole note.
Merely recording a two-dimensional array is quite boring, so let's define some inter-
esting operations on matrices.
Definition 1.1.3. (Matrix addition/subtraction) For two m by n matrices A, B, define
the m by n matrix C = A ± B to be the matrix satisfying c_{ij} = a_{ij} ± b_{ij} for i = 1, . . . , m
and j = 1, . . . , n.

Definition 1.1.4. (Matrix multiplication) For an m by n matrix A and an n by l matrix
B, define the m by l matrix C = AB to be the matrix satisfying

$$c_{ij} = \sum_{k=1}^{n} a_{ik} b_{kj}$$

for i = 1, . . . , m and j = 1, . . . , l.

The definition of addition is really natural, while one might not see right away the
intention behind the definition of multiplication. It will become clear after we introduce
the concept of linear operators.
I used to have a hard time doing matrix multiplication, and I learned a trick that
helped me a lot. To present this trick, let's work through a concrete example.
Example 1.1.1. Compute

$$\begin{pmatrix} 0 & 1 & 2 \\ 3 & 4 & 5 \end{pmatrix} \begin{pmatrix} 6 & 7 & 8 & 9 \\ 0 & 1 & 2 & 3 \\ 4 & 5 & 6 & 7 \end{pmatrix}.$$

Solution. Step 1. Lift the right matrix upward, and put an empty matrix on the right
of the left matrix, below the lifted one. Now you can see what size the resulting matrix
should be (here 2 by 4)!
Step 2. For every blank entry, draw a horizontal line through the left matrix and a
vertical line through the lifted matrix, and record the two vectors that the lines intersect.
For example, if we choose the top left entry to begin with, then the resulting vectors are
(0, 1, 2) and (6, 0, 4).
Step 3. Calculate the dot product/inner product of the two vectors and put it in the
entry. For example, the dot product of (0, 1, 2) and (6, 0, 4) is 0 × 6 + 1 × 0 + 2 × 4 = 8,
so we put 8 in the top left entry.
If we complete Step 2 and Step 3 for every entry, then we obtain

$$\begin{pmatrix} 0 & 1 & 2 \\ 3 & 4 & 5 \end{pmatrix} \begin{pmatrix} 6 & 7 & 8 & 9 \\ 0 & 1 & 2 & 3 \\ 4 & 5 & 6 & 7 \end{pmatrix} = \begin{pmatrix} 8 & 11 & 14 & 17 \\ 38 & 50 & 62 & 74 \end{pmatrix}.$$
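For the readers who like to double-check such computations by machine, here is a minimal
Python sketch that follows Definition 1.1.4 literally (the helper name mat_mul is my own,
not anything standard):

# c_ij is the dot product of the i-th row of A and the j-th column of B.
def mat_mul(A, B):
    m, n, l = len(A), len(B), len(B[0])
    assert all(len(row) == n for row in A)  # inner dimensions must agree
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(l)]
            for i in range(m)]

A = [[0, 1, 2], [3, 4, 5]]
B = [[6, 7, 8, 9], [0, 1, 2, 3], [4, 5, 6, 7]]
print(mat_mul(A, B))  # [[8, 11, 14, 17], [38, 50, 62, 74]]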


There is a reason that we call these operations addition and multiplication: they
satisfy MOST of the properties of addition and multiplication on the integers, reals and
complex numbers.
Property 1.1.1. (Associativity and commutativity of matrix addition) For any m by
n matrices A, B, C, the following always hold:
(1) A + (B + C) = (A + B) + C;
(2) A + B = B + A.

Property 1.1.2. (Associativity of matrix multiplication) If A, B, C are respectively m
by n, n by p and p by q matrices, then (AB)C = A(BC).

Property 1.1.3. (Distributivity of matrix addition and multiplication) For any m by
n matrices A, B, n by l matrix C and l by m matrix D, we have (A + B)C = AC + BC
and D(A + B) = DA + DB.

Those properties can be proved by direct computation.
Note that MATRIX MULTIPLICATION IS NOT COMMUTATIVE. This is dif-
ferent from the usual multiplications. The readers are encouraged to construct a coun-
terexample.
Definition 1.1.5. The identity matrix I_n is the n by n matrix whose diagonal entries
are all 1 and whose other entries are all zero.

Property 1.1.4. For any m by n matrix A and n by m matrix B, we have AI_n = A
and I_n B = B.

Besides matrix addition and multiplication, we can also do scalar multiplication. It
is relatively simple:
Definition 1.1.6. (Scalar multiplication) For any m by n matrix A and scalar c ∈ K,
define B = cA to be the matrix that satisfies b_{ij} = c a_{ij} for i = 1, . . . , m and j = 1, . . . , n.

Also, we can exchange the two dimensions:

Definition 1.1.7. For any m by n matrix A, define its transpose to be the n by m
matrix B such that b_{ij} = a_{ji} for any i = 1, . . . , n and j = 1, . . . , m. Denote B by A^t (or
A^T).

1.2 Reduced Row Echelon Form


Let's say we are solving a system of linear equations:

$$\begin{aligned} a_{11}x_1 + \cdots + a_{1n}x_n &= b_1 \qquad (1) \\ a_{21}x_1 + \cdots + a_{2n}x_n &= b_2 \qquad (2) \\ &\;\;\vdots \\ a_{m1}x_1 + \cdots + a_{mn}x_n &= b_m \qquad (m) \end{aligned}$$

The middle-school method tells us that we can apply the following two operations multiple
times to get the answer:


1. Subtract c times equation (i) from equation (j) to cancel a variable.

2. Multiply (or divide) equation (i) by c.

Note that we can write this system of linear equations as Ax = b. Therefore we can
“translate” the above two operations into the language of matrices.
Definition 1.2.1. Define a row operation of a matrix to be one of the three following
operations:
(1) Adding c times the i-th row to the j-th row, where i ≠ j.
(2) Multiplying the i-th row by c, where c ≠ 0.
(3) Exchanging the i-th row and the j-th row, where i ≠ j.

In fact, we can realize a row operation as a matrix multiplication: applying a row
operation to A is the same as multiplying A on the left by the matrix obtained by applying
that row operation to the identity matrix.

Definition 1.2.2. The matrices corresponding to the row operations in this way are
called the elementary matrices.

We can apply the middle-school method to reduce any matrix to a specific form by
row operations. This is called the reduced row echelon form.
Definition 1.2.3. We say that a matrix M is in reduced row echelon form if
(1) The first nonzero entry in every nonzero row is 1. This is called a pivot.
(2) The positions of the pivots are strictly increasing from top to bottom, and zero rows
(if any) are at the bottom.
(3) Each pivot is the only nonzero entry in its column.

Theorem 1.2.1. For any matrix, we can perform finitely many row operations to put
it in reduced row echelon form.

Sketch of Proof. Justify the middle-school method with induction!

Corollary 1.2.1. For any matrix A, there exist elementary matrices E1 , . . . , Ek and a
matrix in reduced row echelon form A′ such that E1 · · · Ek A = A′ .
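As an aside, here is one possible Python sketch of this reduction — a straightforward
implementation of the middle-school method, not an optimized one, and the name rref is
my own choice:

from fractions import Fraction

# Reduce a matrix to reduced row echelon form using only the three
# row operations of Definition 1.2.1; Fraction gives exact arithmetic.
def rref(A):
    M = [[Fraction(x) for x in row] for row in A]
    rows, cols = len(M), len(M[0])
    pivot_row = 0
    for j in range(cols):
        # find a row with a nonzero entry in column j and swap it up (op (3))
        pivot = next((i for i in range(pivot_row, rows) if M[i][j] != 0), None)
        if pivot is None:
            continue
        M[pivot_row], M[pivot] = M[pivot], M[pivot_row]
        # scale so that the pivot becomes 1 (op (2))
        M[pivot_row] = [x / M[pivot_row][j] for x in M[pivot_row]]
        # clear the rest of column j (op (1))
        for i in range(rows):
            if i != pivot_row and M[i][j] != 0:
                c = M[i][j]
                M[i] = [a - c * b for a, b in zip(M[i], M[pivot_row])]
        pivot_row += 1
    return M

print(rref([[2, 4, 6], [1, 2, 4]]))  # rows reduce to (1, 2, 0) and (0, 0, 1)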

1.3 Determinant
The determinant is an important attribute of square matrices. Although the definition of
the determinant is somewhat weird and ugly, the striking properties that it has will explain
why it appears so early in the note.
There are several possible definitions of the determinant, and I feel like the one involving
permutations is the easiest. So let's first define what a permutation is.
Definition 1.3.1. A permutation π on n elements s1 , . . . , sn is a bijection between S
and itself, where S = {s1 , . . . , sn }.

Since we can label the n elements, we can see permutations as permuting the indices.
Therefore we usually assume that the permutations act on the set {1, . . . , n}.
Definition 1.3.2. The set Sn is the collection of permutations on the index set [n] :=
{1, . . . , n}.


It is clear that we can produce any permutation by switching two elements multiple
times. The following property shows that the number of exchanges is not arbitrary:
Property 1.3.1. For any permutation π, if we can represent it both as a composite of k
transpositions and as a composite of k′ transpositions, then k and k′ have the same parity.

Sketch of Proof. Consider the parity of the number of inversion pairs (i.e. the pairs (i, j)
such that i < j and π(i) > π(j)). Every transposition changes the parity of the number of
inversion pairs.
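For the readers who want to experiment, here is a tiny Python sketch of this idea: it
decides the parity of a permutation, given in one-line notation as a tuple, by counting
inversion pairs (the function name parity is mine):

from itertools import combinations

# The parity of a permutation equals the parity of its number of inversions.
def parity(perm):
    inversions = sum(1 for i, j in combinations(range(len(perm)), 2)
                     if perm[i] > perm[j])
    return 1 if inversions % 2 == 0 else -1

print(parity((1, 2, 3)))  # +1: the identity is even
print(parity((2, 1, 3)))  # -1: a single transposition is odd
print(parity((2, 3, 1)))  # +1: a product of two transpositions is even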

Therefore it makes sense to define the parity of a permutation.


Definition 1.3.3. For any permutation π, if it is a composite of an odd number of trans-
positions, then we say that π is odd. If π is a composite of an even number of transpositions,
then we say that π is even.

Property 1.3.2. If π and σ are both permutations on n elements, then the parity of
π ◦ σ is the sum of the parities of π and σ.

Now we can define the determinant.


Definition 1.3.4. For any n by n matrix A, define its determinant to be

$$\sum_{\sigma \in S_n} (-1)^{\sigma} \prod_{i=1}^{n} a_{i\sigma(i)},$$

where by abuse of notation (−1)^σ is −1 if σ is odd, and 1 if σ is even. Denote the
determinant of A by det(A).

The reason to define the determinant in this way will become clear after we know
about linear algebra.
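As a sanity check, here is a direct (and exponentially slow, since there are n! terms)
Python sketch of this definition; the helpers sign and det are my own names:

from itertools import permutations

def sign(perm):  # parity via inversion pairs, as in Property 1.3.1
    inv = sum(1 for i in range(len(perm)) for j in range(i + 1, len(perm))
              if perm[i] > perm[j])
    return -1 if inv % 2 else 1

def det(A):  # sum over sigma of sign(sigma) * prod_i a_{i, sigma(i)}
    n = len(A)
    total = 0
    for sigma in permutations(range(n)):
        prod = 1
        for i in range(n):
            prod *= A[i][sigma[i]]
        total += sign(sigma) * prod
    return total

print(det([[1, 2], [3, 4]]))  # 1*4 - 2*3 = -2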
Property 1.3.3. For any n by n matrices A, B and scalar c, we have
(1) det(cA) = c^n det(A);
(2) det(I_n) = 1;
(3) det(A) = 0 if two rows of A are identical;
(4) det(A^t) = det(A);
(5) det(AB) = det(A) det(B).

The last property is really amazing, but it is annoying to prove at this point. I'll
prove it when we revisit the determinant in the world of linear algebra.
It is time-consuming to compute the determinant directly from the formula when the
matrix is big. However, the multiplicativity of the determinant makes it possible to compute
it faster.
First, observe that if the entries below the diagonal are all zero, then the only term
that contributes to the determinant is the product of all the diagonal entries. Therefore
we can calculate the determinant quickly if we manage to reduce the matrix to an
“upper triangular matrix.”


Definition 1.3.5. A square matrix is upper triangular if all the entries below the
diagonal are zero. A square matrix is lower triangular if all the entries above the
diagonal are zero.

Now for any square matrix A, we know from Corollary 1.2.1 that there exist ele-
mentary matrices E_1, . . . , E_k and a matrix A′ in reduced row echelon form such that
E_1 · · · E_k A = A′. Therefore det(E_1) · · · det(E_k) det(A) = det(A′). Since A′ must be up-
per triangular (why?), it remains to calculate the determinants of the elementary matrices.
This should not be hard and is left to the readers as an exercise.

1.4 Cofactor Matrix and Invertibility


Definition 1.4.1. For an n by n matrix A, an n by n matrix B is the inverse of A if
AB = BA = I_n. In this case, A is said to be invertible, and we will denote B by A^{-1}.

It is not hard to see that if det(A) = 0 then A is not invertible. We will show that
the converse is also true. To show this, we first define the cofactor matrix.
Definition 1.4.2. For any n by n matrix A, the cofactor matrix of A, denoted by
cof(A), is the matrix B such that b_{ij} = (−1)^{i+j} det(A_{ij}) for i, j = 1, . . . , n. Here A_{ij} is
the matrix obtained by deleting the i-th row and the j-th column of A.

Property 1.4.1. If A is an n by n matrix, then A cof(A)^T = cof(A)^T A = det(A) I_n.

Sketch of Proof.
Lemma 1.4.1. For any i = 1, . . . , n, we have

$$\det(A) = \sum_{j=1}^{n} a_{ij} (-1)^{i+j} \det(A_{ij}).$$

This is clear by definition. Writing B = cof(A) and using the lemma (together with its
column analogue, which follows since det(A^t) = det(A)), we can see that the (i, j) entry of
cof(A)^T A, namely

$$\sum_{k=1}^{n} b_{ki} a_{kj},$$

is det(A) when i = j, and is the determinant of a matrix with two identical columns
when i ≠ j, which is 0 by Property 1.3.3.

Therefore, if det(A) ≠ 0 then A is invertible.
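Here is a small Python sketch of this fact put to work: it computes A^{-1} as
cof(A)^T / det(A), with det computed via the expansion of Lemma 1.4.1. The names are
my own, and this is of course not an efficient way to invert a matrix:

from fractions import Fraction

def minor(A, i, j):  # delete the i-th row and the j-th column
    return [row[:j] + row[j + 1:] for k, row in enumerate(A) if k != i]

def det(A):  # Lemma 1.4.1, expanding along the first row
    if len(A) == 1:
        return A[0][0]
    return sum((-1) ** j * A[0][j] * det(minor(A, 0, j)) for j in range(len(A)))

def inverse(A):
    d = det(A)
    assert d != 0, "A is not invertible"
    n = len(A)
    # entry (i, j) of the inverse is the (j, i) cofactor of A, divided by det(A)
    return [[Fraction((-1) ** (i + j) * det(minor(A, j, i)), d)
             for j in range(n)] for i in range(n)]

print(inverse([[2, 1], [5, 3]]))  # [[3, -1], [-5, 2]] (as Fractions), det = 1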


Definition 1.4.3. Denote by GLn (K) the set of all invertible n by n matrices with
entries in K.

The “GL” here stands for “general linear group.” Leave the magical word “group”
alone for a while; it is called “general” because it is not degenerate (i.e. the determinant
is nonzero), and it is called “linear” because... well... it has something to do with linear
algebra.
I don't know if the above explanation makes the name “GL” less mysterious :p.


1.5 Random Problem Set


Here are some problems that popped up in my head while editing this part. They are
probably helpful, I guess :3?

1. (1.1) Try to find an injection f from C to M2×2 (R) such that f (x)f (y) = f (xy)
and f (x) ∈ GL2 (R) for any nonzero x. Then consider the function det ◦f : C → R.
Is this multiplicative? What function is this?

2. (1.2) Consider the set of solutions of Ax = 0 where A is an m by n matrix and x
is an n-dimensional vector. Can you describe the set of solutions by examining the
reduced row echelon form of A? To be more specific, prove that Ax = 0 ⇔ x = 0
if and only if there are n pivots in the reduced row echelon form.

3. (1.3) Suppose that A is a square matrix with integer coefficients. Show that A^{-1}
exists and has integer coefficients if and only if det(A) = ±1.

4. (1.4) Show that if A is an n by n matrix where A + A^T = 0 and n is odd, then
det(A) = 0.


Chapter 2

Basic Group Theory

Now the trip to the abstract kingdom begins. If you feel lost, try to grab some concrete
examples to carry with you as your compass. It might be difficult at first to deal with these
abstract objects, but everything will get better once you get used to it.
The group is the simplest object in algebra in the sense that it has the least structure.
That said, its structure is still somewhat complicated. The most famous example of
this is the classification of “simple groups”, which is by no means simple.
As an introduction, the properties introduced in this chapter will all be very basic.
The readers are encouraged to pay attention to what has been done, or what should be
considered, after constructing an abstract algebraic object. This will help a lot when
encountering other algebraic objects.

2.1 Definition and Examples


Definition 2.1.1. A law of composition ◦ on a set S is a function ◦ : S × S → S. The
image of (a, b) under the function ◦ is denoted by a ◦ b.

One can immediately come up with a lot of examples of laws of composition: addi-
tion/subtraction/multiplication on real numbers, division on nonzero real numbers, taking
powers on positive reals, etc. However, it is really hard to say anything clever under this
weak condition, so we usually assume something (much) stronger:
Definition 2.1.2. Suppose that · is a law of composition on a set G. (G, ·) is said to
be a group if there exists an element e ∈ G such that for any a, b, c ∈ G the following
hold:
(1) (associativity) (a · b) · c = a · (b · c);
(2) a · e = e · a = a;
(3) There exists an element a′ ∈ G such that a · a′ = a′ · a = e.
In this case, e is called the identity of (G, ·).

Example 2.1.1. (Z, +), (Q, +), (Q\{0}, ×), (R, +), (R\{0}, ×), (C, +), (C\{0}, ×),
(GLn (K), ×) are groups. (N, +), (R, ×), (Z, −), (Mn×n , ×) are not groups.

The first condition shows that the parentheses are in fact not necessary, so we will
drop them from now on. The second condition says that there must be a neutral element,
and the third says that inverses always exist. Note that neither says that such elements
are unique, but they are in fact already unique by the conditions.
Property 2.1.1. If e and e′ are both identities of G, then e = e′ .

Property 2.1.2. If a′ and a′′ are both inverses of a (i.e. a′a = aa′ = a′′a = aa′′ = e),
then a′ = a′′.

Definition 2.1.3. Denote the unique inverse of a by a^{-1}.

Note that generally (G, ·) is not necessarily commutative.


Definition 2.1.4. If (G, ·) is a group and furthermore, a · b = b · a for any a, b ∈ G,
then G is said to be abelian.

Most of the examples we have so far are abelian, but there will soon be a lot of non-
abelian examples. Since the abelian case is well studied, most people's interest is in
the non-abelian groups.
Usually we drop the · if it does not cause any confusion. Conventionally, when a
group is written additively (so 0 is the identity and −a is the inverse of a), it is
implicitly meant that the group is abelian. When a group is written multiplicatively
(so 1 is the identity and a^{-1} is the inverse of a), it does not necessarily mean that
the group is non-abelian, nor does it mean that it is abelian.
There is an important property of groups before we go on:
Property 2.1.3. (Law of cancellation) If a, b, x are elements in the group G, then
ax = bx implies a = b. Moreover, xa = xb also implies a = b.

2.2 Symmetric Group


Before we go on, let's expand our collection of groups to get more concrete examples.
Consider the permutations on the index set [n]. It is clear that they form a group under
composition, where the identity is the identity map and the inverse of a permutation is
simply its inverse function.
Definition 2.2.1. For any n ∈ N, denote by Sn the group of permutations on [n].

To work with this group, we hope to calculate products (i.e. compositions) efficiently.
The cycle notation is really helpful in this regard.
Definition 2.2.2. Suppose that i_1, . . . , i_k ∈ [n] are k different elements. Then the cycle
notation (i_1 i_2 . . . i_k) denotes the permutation π such that

$$\pi(m) = \begin{cases} i_{j+1} & \text{if } m = i_j \\ m & \text{otherwise,} \end{cases}$$

where i_{k+1} = i_1.

Example 2.2.1. If σ is a permutation in S5 such that σ = (1 4 3), then the map table
of σ is

i    | 1 2 3 4 5
σ(i) | 4 2 1 3 5

Property 2.2.1. Every permutation in Sn can be represented as a product of disjoint
cycles. This is called the disjoint cycle representation of permutations.

Example 2.2.2. Let's actually compute the disjoint cycle representation in a concrete
case. Take the permutation π = 451326 (written in one-line notation, so π(1) = 4,
π(2) = 5, and so on). We first consider 1. 1 is mapped to 4, 4 is mapped to 3 and 3 is
mapped to 1. Since we come back to 1, we can write
π = (1 4 3) · (??).
The next element that has not been considered is 2. 2 is mapped to 5, and 5 is mapped
to 2. Therefore,
π = (1 4 3)(2 5) · (??).
The only element left is 6. Since it is mapped to itself, we finally have

π = (1 4 3)(2 5)(6).

Note that cycles of length 1 are in fact the identity. Therefore we can drop (6) and write
π = (1 4 3)(2 5).

With this notation, we can easily compute the products of permutations.


Example 2.2.3. Let π = (1 4 7)(2 6 5) and σ = (1 2)(3 5)(6 8). We are going to
compute the disjoint cycle representation of πσ = (1 4 7)(2 6 5)(1 2)(3 5)(6 8).
We can go through the same procedure as in the previous example: 1 is mapped to 2 by
σ and then to 6 by π; 6 is mapped to 8 by σ and then fixed by π; 8 is mapped to 6 by σ
and then to 5 by π; 5 is mapped to 3 by σ and fixed by π; 3 is mapped to 5 by σ and then
to 2 by π; 2 is mapped to 1 by σ and then to 4 by π; 4 is fixed by σ and then mapped to
7 by π; and finally 7 is fixed by σ and then mapped to 1 by π. Therefore

πσ = (1 6 8 5 3 2 4 7) · (??).

Note that every element is contained in the cycle now, so πσ = (1 6 8 5 3 2 4 7).

Remark. As we evaluate compositions of functions from right to left, we also calculate
products of permutations from right to left. This is somewhat counter-intuitive and worth
some caution.
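For the readers who want to check such computations mechanically, here is a small
Python sketch of the whole procedure: build permutations from cycles, compose them
from right to left, and read off the disjoint cycle representation (all the names are my own):

def from_cycles(cycles, n):
    perm = {i: i for i in range(1, n + 1)}
    for cyc in cycles:
        for a, b in zip(cyc, cyc[1:] + cyc[:1]):
            perm[a] = b
    return perm

def compose(p, q):  # (p . q)(i) = p(q(i)), i.e. q acts first
    return {i: p[q[i]] for i in q}

def disjoint_cycles(perm):
    seen, cycles = set(), []
    for start in perm:
        if start in seen:
            continue
        cyc, i = [], start
        while i not in seen:
            seen.add(i)
            cyc.append(i)
            i = perm[i]
        if len(cyc) > 1:  # drop the 1-cycles
            cycles.append(tuple(cyc))
    return cycles

pi = from_cycles([(1, 4, 7), (2, 6, 5)], 8)
sigma = from_cycles([(1, 2), (3, 5), (6, 8)], 8)
print(disjoint_cycles(compose(pi, sigma)))  # [(1, 6, 8, 5, 3, 2, 4, 7)]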

Property 2.2.2. For any n ≥ 3, the group Sn is non-abelian.

Sketch of Proof. Consider (1 2) and (1 3).

Note that S3 has 3! = 6 elements. It can actually be shown that groups of order less
than 6 are always abelian, and therefore S3 is the smallest non-abelian group.
The group structure of Sn is fairly complicated, so whenever you need an example
of a finite non-abelian group to test out a property, the symmetric group is always your
best friend.


2.3 Subgroup
Now that we have defined what a group is, we can think about what the “subobjects”
and the “morphisms” (functions) are when it comes to groups. In this section we will first
define the concept of subobjects of groups.
Definition 2.3.1. Given a group (G, ·), let H be a nonempty subset of G. We say
that H is a subgroup of G if (H, ·) is also a group. This is denoted by (H, ·) ≤ (G, ·), or
H ≤ G when there is no ambiguity.

By the definition, it seems that we have to verify all the conditions to say that H is
a subgroup of G. However, since we are already given a well-behaved operation on G,
things are a bit easier for its subsets.
Property 2.3.1. A nonempty subset H of a group G is a subgroup of G if and only if
ab^{-1} ∈ H for any a, b ∈ H.

Sketch of Proof. It is clear that if H is a subgroup of G then ab^{-1} ∈ H for any a, b ∈ H.
It suffices to show that H is indeed a group when ab^{-1} ∈ H for any a, b ∈ H.
By the law of cancellation, the identity of H (if H is a group) must be the same as the
identity of G. Pick a ∈ H; then e = aa^{-1} ∈ H. Taking a = e shows that b^{-1} ∈ H for
any b ∈ H. Therefore ab = a(b^{-1})^{-1} ∈ H for any a, b ∈ H, and we are done.

Note that for every group G, itself and {e} are always subgroups of G. To say
something clever, we are usually interested in the other cases.
Definition 2.3.2. A subgroup H of G is proper if H ≠ G. A subgroup H of G is trivial
if H only contains the identity of G; otherwise, it is nontrivial.

Remark. In some other texts “proper” means “proper and nontrivial.” Be careful with
the definition when you are reading other books.

Given a group G and a subset S of G, if S is not a subgroup of G, we might be
interested in how far S is from being a subgroup. Therefore it is natural to ask how big
a subgroup H of G must be if S ⊆ H.
Definition 2.3.3. For any subset S of G, define the subgroup of G generated by S to be

$$\bigcap_{H \leq G,\ S \subseteq H} H,$$

and denote it by ⟨S⟩.

Property 2.3.2. For any S ⊆ G, the set ⟨S⟩ is indeed a subgroup of G that contains
S. Moreover, it is the smallest possible one in the sense that if H ≤ G and S ⊆ H then
⟨S⟩ ⊆ H.

Example 2.3.1. In (R, +), the subgroup generated by {1} is Z. In (C\{0}, ×), the
subgroup generated by R+ ∪ {i} is (R\{0}) ∪ (R\{0})i, i.e. all the nonzero real numbers
and nonzero purely imaginary numbers.
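As an aside: for a finite group, one can compute ⟨S⟩ by brute force, starting from S and
repeatedly closing under products and inverses. Here is a small Python sketch of this idea
for (Z/12Z, +); the name generated_subgroup is mine, and note that this brute-force closure
only terminates for finite groups:

def generated_subgroup(gens, op, inv, identity):
    H = {identity}
    frontier = set(gens)
    while frontier:
        H |= frontier
        # everything forced into H by closure under the operation and inverses
        frontier = {op(a, b) for a in H for b in H} | {inv(a) for a in H}
        frontier -= H
    return H

n = 12
H = generated_subgroup({8, 6}, lambda a, b: (a + b) % n, lambda a: (-a) % n, 0)
print(sorted(H))  # [0, 2, 4, 6, 8, 10], the subgroup generated by {8, 6}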


It is possible to write a constructive definition of ⟨S⟩. However, it is way too com-
plicated to work with unless S is very small. In particular, things are much easier when S
contains only one element.
Remark. From now on, by abuse of notation ⟨g1 , g2 , . . . , gn ⟩ means ⟨{g1 , . . . , gn }⟩ if
g1 , . . . , gn are elements in G.

Property 2.3.3. For any x ∈ G, the subgroup ⟨x⟩ is precisely

{x^n | n ∈ Z}.

Sketch of Proof. By definition the set is clearly contained in ⟨x⟩. Therefore it suffices to
show that the set is indeed a subgroup of G.
Note that for two distinct i, j ∈ Z, the elements x^i, x^j are not necessarily distinct. To
understand the structure of ⟨x⟩ better, let's investigate when x^i = x^j.
Definition 2.3.4. For any x ∈ G, the order of x in G is the smallest positive integer n
such that x^n = 1. Denote it by ord_G(x) (we'll omit G if it is clear from context). If no
such n exists, then define ord_G(x) to be infinity.

Property 2.3.4. If ord_G(x) < ∞, then x^i = x^j if and only if ord_G(x) | i − j, and so
⟨x⟩ = {1, x, . . . , x^{ord_G(x)−1}}. If ord_G(x) = ∞, then x^i = x^j if and only if i = j.

Sketch of Proof. This is clear by the law of cancellation and the minimality of ordG (x).

Definition 2.3.5. If G is a group such that there exists x ∈ G with G = ⟨x⟩, then
G is a cyclic group. If |G| is finite, denote G by C_n where n = |G|.

Therefore the property is actually saying that ⟨x⟩ is a cyclic group of order ord_G(x).
Corollary 2.3.1. If G is a finite group, then every element of G has finite order.

Indeed a stronger version of this corollary holds, and we will see it really soon.

2.4 Homomorphism
Next we define what type of functions we are going to consider in group theory. Ideally,
the functions should interact with the group structures. Hence the definition:
Definition 2.4.1. Given two groups G and G′ . A map ϕ : G → G′ is a homomorphism
if for any a, b ∈ G the following holds:
ϕ(a)ϕ(b) = ϕ(ab).
Furthermore, if ϕ is injective, then it is a monomorphism; if ϕ is surjective, then it is an
epimorphism; if ϕ is bijective, then it is an isomorphism.

In other words, a map ϕ : G → G′ is a homomorphism if the following diagram


commutes (i.e. all directed paths that start and end at the same points give the same
result).


           (ϕ, ϕ)
  G × G ──────────→ G′ × G′
    │                  │
    │ ·                │ ·
    ↓                  ↓
    G ───────ϕ───────→ G′

Property 2.4.1. If ϕ : G → G′ is a homomorphism, then ϕ(1_G) = 1_{G′} and ϕ(g^{-1}) =
ϕ(g)^{-1}.

Example 2.4.1. For any group G and subgroup H ≤ G, we can construct a homomor-
phism (or furthermore a monomorphism) f : H → G by sending h ∈ H to h ∈ G. This
is often called the inclusion homomorphism.

Example 2.4.2. For a cyclic group G generated by x, we can construct a homomor-
phism f (indeed an epimorphism) from Z to G sending n to x^n.

Example 2.4.3. Taking determinant on GLn (R) is a homomorphism from GLn (R) to
R\{0}.

Note that if ϕ : G → G′ is an isomorphism, then we can actually identify the elements
of G with the elements of G′ in a way that is compatible with the group structures. In
other words, G and G′ are indistinguishable, and so we will regard G and G′ as the same
in group theory.
Definition 2.4.2. For any two groups G, G′, if there exists an isomorphism between
G and G′, then we say that G is isomorphic to G′, or that G and G′ are isomorphic. This
is denoted by G ≅ G′.

Example 2.4.4. Let H ≤ GL2(R) be the subgroup

$$H = \left\{ \begin{pmatrix} a & b \\ -b & a \end{pmatrix} \,\middle|\, a, b \in \mathbb{R},\ a^2 + b^2 \neq 0 \right\}.$$

Then H is isomorphic to C^×.

Example 2.4.5. If G and G′ are both cyclic groups and |G| = |G′|, then G and G′ are
isomorphic. This justifies the notation C_n, because the structure of C_n only depends on n.

2.5 Equivalence Relation


Before we go on, let’s first discuss an important concept that is crucial in various fields
of math.
Definition 2.5.1. An equivalence relation ∼ on a set S is a subset of S × S satisfying
the following:
(a) (Reflexivity) a ∼ a ∀a ∈ S;
(b) (Symmetry) a ∼ b ⇒ b ∼ a ∀a, b ∈ S;
(c) (Transitivity) a ∼ b, b ∼ c ⇒ a ∼ c ∀a, b, c ∈ S.
Here a ∼ b means (a, b) ∈ ∼.


Example 2.5.1. = on R is an equivalence relation. ≤ on R is not an equivalence


relation.

Definition 2.5.2. For any set S with an equivalence relation ∼, and for any element
s ∈ S, define the equivalence class [s] of s by

{s′ ∈ S|s′ ∼ s}.

Property 2.5.1. For any set S with an equivalence relation and s, s′ ∈ S, the sets
[s], [s′ ] are either the same or disjoint.

Definition 2.5.3. For any set S with an equivalence relation ∼, the set S/∼ is the set of
equivalence classes. In other words,

S/∼ = {[s] | s ∈ S}.

Corollary 2.5.1. S/∼ is a partition of S such that for any s, s′ ∈ S, they fall in the
same part if and only if s ∼ s′.

Example 2.5.2. For any undirected graph G = (V, E), define an equivalence relation
∼ on V such that v ∼ v ′ if and only if v and v ′ are connected. Then V / ∼ is the set of
connected components of G.
Similarly, for any topological space X, we can define an equivalence relation ∼ on X
such that v ∼ v ′ if and only if they are path-connected, i.e. there exists a continuous
function f : [0, 1] → X such that f (0) = v and f (1) = v ′ . Then X/ ∼ is the set of
path-connected components.

2.6 Coset
In this section, we will further explore the interaction between groups and their
subgroups.
Definition 2.6.1. Let H be a subgroup of a group G. For any a ∈ G, define the left
coset of H in G with respect to a to be

aH := {ah|h ∈ H}.

The set of left cosets of H in G is denoted by G/H.

Property 2.6.1. Let H be a subgroup of G. Then one can verify that the binary relation
∼ on G such that a ∼ b ⇔ a^{-1}b ∈ H is indeed an equivalence relation. In this case, aH
is the equivalence class of a.

Corollary 2.6.1. G/H is a partition of G.

Note that by the law of cancellation, we can show that every coset of H has the same
size as H. Hence, we have |H||G/H| = |G| if |G| is finite.
Definition 2.6.2. For any subgroup H of G, the index of H in G is |G/H|. This is
usually denoted by [G : H].


Corollary 2.6.2. If G is a finite group, then [G : H]|H| = |G|.

Corollary 2.6.3. (Lagrange's Theorem) For every subgroup H of a finite group G, the
size of H divides the size of G.
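Here is a tiny Python sanity check of these statements in the concrete group Z/12Z with
H = {0, 4, 8} (my own choice of example): the cosets partition the group, and
|H| · [G : H] = |G|.

n, H = 12, {0, 4, 8}
cosets = {frozenset((a + h) % n for h in H) for a in range(n)}
print([sorted(c) for c in cosets])  # 4 pairwise disjoint cosets of size 3
assert len(H) * len(cosets) == n    # Lagrange: [G : H] * |H| = |G|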

To apply Lagrange's theorem, we can investigate the simplest kind of subgroup of
a group G: the cyclic subgroup generated by an element x ∈ G. We know that
⟨x⟩ is of order ord_G(x), and Lagrange's theorem tells us that |⟨x⟩| divides |G|. As a
corollary,
Corollary 2.6.4. If G is a finite group and x ∈ G, then x^{|G|} = 1.

At this point, we are ready to give a group-theoretic proof of Euler's theorem:
Theorem 2.6.1. (Euler's) For any positive integer n and any a coprime with n,

a^{φ(n)} ≡ 1 (mod n),

where φ(n) is the number of positive integers that are not greater than n and are coprime
with n.

Sketch of Proof. Let G be the set {i | 1 ≤ i ≤ n, gcd(i, n) = 1}, and define i · j = k if
n | ij − k. By some simple number-theoretic properties, (G, ·) is a group, and so the
theorem is a direct corollary of Corollary 2.6.4.
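For the readers who like numerical evidence, here is a small Python brute-force check of
Euler's theorem (a sanity check, not a proof, of course):

from math import gcd

for n in range(2, 50):
    units = [i for i in range(1, n + 1) if gcd(i, n) == 1]
    phi = len(units)  # phi(n) counts the integers in [1, n] coprime with n
    assert all(pow(a, phi, n) == 1 for a in units)
print("Euler's theorem verified for all n < 50")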

2.7 Normal Subgroup and Quotient Group


For any subgroup H of a group G, we can construct a natural projection map π : G →
G/H mapping a to aH. We wish to pass the group structure from G to G/H via π;
that is, we want to define a group structure on G/H that makes π a homomorphism.
This is, however, not always possible.
Property 2.7.1. For a subgroup H of a group G, the projection map π : G → G/H is
a homomorphism for some group structure on G/H if and only if gHg^{-1} = H for every
g ∈ G.

Sketch of Proof. Note that if π is a homomorphism, then abH = π(ab) = π(a)π(b) =
aH · bH. Therefore it remains to find the condition for this product to be well defined,
i.e. for aH = a′H, bH = b′H to imply abH = a′b′H.
Now assume that the product is well defined. Take b = b′ = a^{-1} and a′ = ah for some
h ∈ H; then aha^{-1}H = eH, which shows that aha^{-1} ∈ H. Therefore gHg^{-1} ⊆ H for
all g ∈ G. Replacing g with g^{-1}, we get g^{-1}Hg ⊆ H, and so H ⊆ gHg^{-1}. As a
consequence, gHg^{-1} = H for all g ∈ G. One can also show that if gHg^{-1} = H for all
g then the product is well defined, and this is left as an exercise.
Definition 2.7.1. H is a normal subgroup of G if H is a subgroup of G such that
gHg^{-1} = H for any g ∈ G. This is often denoted by H ⊴ G. In this case, endow G/H
with the group structure such that aH · bH = abH. This is called a quotient group.

When thinking of the quotient group G/H, we can view it as setting H to be the identity
(or making H vanish). This will be helpful in the next section.


Property 2.7.2. Every subgroup of an abelian group is normal.

Example 2.7.1. For any n ∈ N, consider the subgroup nZ := {nz | z ∈ Z} of (Z, +).
Since (Z, +) is abelian, the subgroup nZ is normal. Therefore we can consider the quotient
group Z/nZ. This is the additive group of integers modulo n. Note that |Z/nZ| = n
and Z/nZ = ⟨1 + nZ⟩. This shows that Z/nZ ≅ C_n.

2.8 First Isomorphism Theorem


The first isomorphism theorem is a really powerful tool for finding the relation between
two groups by investigating a homomorphism.
Definition 2.8.1. For any homomorphism ϕ : G → G′, define the kernel of ϕ to be
ker(ϕ) = ϕ^{-1}(1_{G′}) and the image of ϕ to be ϕ(G).

One can think of the kernel of ϕ as the set of elements that vanish after being
mapped by ϕ. Note that, as mentioned before, taking a quotient group is like making a
normal subgroup vanish. Therefore there is a chance that G/ker(ϕ) is isomorphic
to the image of ϕ. This is indeed the first isomorphism theorem.
Theorem 2.8.1. (First Isomorphism Theorem) Let ϕ : G → G′ be a group homo-
morphism. Then ker(ϕ) ⊴ G, the image of ϕ is a subgroup of G′, and the image is
isomorphic to G/ker(ϕ).

Sketch of Proof. It is easy to verify that ker(ϕ) ⊴ G and that the image of ϕ is a subgroup
of G′ by definition. Now consider the following diagram.

         ϕ
    G ────────→ G′
    │         ↗
  π │       ϕ̃
    ↓     
   G/ker(ϕ)

One can show that there exists a unique homomorphism ϕ̃ that makes this diagram com-
mute. By the definition of ker(ϕ), the homomorphism ϕ̃ is a monomorphism. Therefore
G/ ker(ϕ) is isomorphic to the image of ϕ (or equivalently ϕ̃) via ϕ̃.

Now whenever we construct an interesting homomorphism, we can always consider
applying the first isomorphism theorem and hope that it gives us something interesting!
Example 2.8.1. For any cyclic group G of order n generated by x, consider the homo-
morphism f from Z to G mapping m to x^m. This is an epimorphism by definition, and
the kernel of f is nZ. Therefore Z/nZ ≅ G. This again shows that there exists a unique
cyclic group of order n for any n ∈ N.

Example 2.8.2. Consider the map from GLn(K) to K^× given by taking the determinant.
This is an epimorphism, and so GLn(K)/ker(det) ≅ K^×.


Definition 2.8.2. Let SLn(K) be the kernel of the determinant map on GLn(K). In
other words, SLn(K) is the set of n by n matrices with entries in K of determinant 1.
This is called the special linear group.

Example 2.8.3. For any n ≥ 2, let sgn : Sn → {±1} be the map sending even permu-
tations to 1 and odd permutations to −1. This is an epimorphism, and so Sn/ker(sgn) ≅
{±1}.

Definition 2.8.3. Let An be the kernel of sgn. In other words, An is the set of even
permutations in Sn . This is called the alternating group.

Example 2.8.4. Now consider the symmetric group S4. There are three ways to
partition [4] into two parts of the same size, i.e. {1, 2} ∪ {3, 4}, {1, 3} ∪ {2, 4} and
{1, 4} ∪ {2, 3}. Label these three partitions P1, P2, P3.
For any π ∈ S4, when π permutes the index set [4], we can see that

π(P1) = {π(1), π(2)} ∪ {π(3), π(4)}

is still a partition. Similarly π(P2), π(P3) are also partitions of [4]. Therefore π permutes
the set {P1, P2, P3}, and so we can construct a map ϕ : S4 → S3 such that ϕ(π) is the
permutation of the partitions under π. It is clear that this is a homomorphism. Moreover,
ϕ((1 2)) = (2 3) and ϕ((1 3)) = (1 3). Since (1 3) and (2 3) generate S3, we have that
ϕ is surjective. Therefore S4/ker(ϕ) ≅ S3.
Now let's examine the kernel. Since |S4| = 24 and |S3| = 6, the kernel must be of size
4. Besides the identity, it is clear that (1 2)(3 4), (1 3)(2 4) and (1 4)(2 3) are in the
kernel. Therefore

ker(ϕ) = {id, (1 2)(3 4), (1 3)(2 4), (1 4)(2 3)}.

This is an abelian group of order 4 whose elements all have order 1 or 2.
It is usually denoted by K4 and called the Klein four group.
In short, K4 ⊴ S4 and S4/K4 ≅ S3.
There is really a reason that we chose S4 in this example. The readers are encouraged
to try the same trick on other symmetric groups, and it should fail in most cases.
There is a deep reason behind this, and it will soon be clear.

There are two other isomorphism theorems. However, they are just corollaries of the
first isomorphism theorem, and the arguments are routine, so I will omit the sketches of
proofs.
Theorem 2.8.2. (Second isomorphism theorem) Suppose that G is a group, N is a
normal subgroup of G and H is a subgroup of G. Then HN is a subgroup of G and
H ∩ N is a normal subgroup of H. Moreover, HN/N ≅ H/(H ∩ N).

Theorem 2.8.3. (Third isomorphism theorem) If N ⊆ K ⊆ G such that N, K are both
normal subgroups of G, then K/N is a normal subgroup of G/N and (G/N)/(K/N) ≅
G/K.


2.9 Direct Product and Semidirect Product


We have discovered a way to construct a smaller group from a group G and a normal
subgroup. Now we go in the reverse direction: given two groups, can we construct
a bigger group?
Definition 2.9.1. Suppose that G1, G2 are two groups. Define the direct product of
G1, G2 to be the group G defined on the set G1 × G2 such that (g1, g2) · (g1′, g2′) = (g1g1′, g2g2′)
for any (g1, g2), (g1′, g2′) ∈ G1 × G2. Usually G is also denoted by G1 × G2.

Example 2.9.1. K4 ≅ Z/2Z × Z/2Z.

Property 2.9.1. Suppose that G = G1 × G2. Then there are two natural homomor-
phisms f1 : G1 → G, f2 : G2 → G such that f1(g1) = (g1, 1_{G2}) and f2(g2) = (1_{G1}, g2).
These two homomorphisms are injective, and so fi(Gi) ≅ Gi.

Definition 2.9.2. By abuse of notation, we usually denote fi(Gi) by Gi, although the
groups Gi do not literally lie in G.

Now, instead of the inclusion homomorphisms, consider the projection homomorphism
π1 : G → G1 such that π1(g1, g2) = g1. The kernel of π1 is precisely G2, and π1 is
surjective. Therefore,
Property 2.9.2. G1, G2 are normal subgroups of G1 × G2. Moreover,
(G1 × G2)/G2 ≅ G1.

Theorem 2.9.1. (Chinese Remainder Theorem) If m, n ∈ N are relatively prime, then
Z/mZ × Z/nZ ≅ Z/mnZ.

Sketch of Proof. Consider the order of (1, 1) in Z/mZ × Z/nZ.


Example 2.9.2. Consider the cyclic groups C2, C3. By the Chinese remainder theorem,
C6 ≅ C2 × C3. Therefore C6/C3 ≅ C2.

It may seem that taking direct products and taking quotient groups are mutually inverse
operations. However, this is not the case for groups.
Example 2.9.3. Consider the cyclic group C4 generated by x. Since x^2 has order 2,
the subgroup ⟨x^2⟩ is isomorphic to C2. Consider the map f from C4 to itself such that
f(y) = y^2. This is a homomorphism because C4 is abelian. Both the kernel and the image
of f are ⟨x^2⟩, and so C4/⟨x^2⟩ ≅ ⟨x^2⟩. However, ⟨x^2⟩ × ⟨x^2⟩ ≅ C2 × C2 is not isomorphic
to C4, because every element of C2 × C2 has order 1 or 2.

In this case, we can see that it fails because we cannot “put” C2 back into C4 as a
subgroup. More precisely, there does not exist a monomorphism g : C2 → C4 such that
f ◦ g is the identity map on C2. However, this is not the only potential obstruction.
Example 2.9.4. Consider the symmetric group S3. We can see that ⟨(1 2 3)⟩ is a
normal subgroup of S3 and that

S3/⟨(1 2 3)⟩ ≅ ⟨(1 2)⟩.


However, S3 is not isomorphic to ⟨(1 2 3)⟩ × ⟨(1 2)⟩, since the former is non-abelian while
the latter is abelian.

Although this time we can embed the image back into the group perfectly, the direct
product still does not give the correct answer. This is because if S3 were isomorphic
to ⟨(1 2 3)⟩ × ⟨(1 2)⟩, then (1 2)(1 2 3)(1 2)^{-1} would be (1 2 3). However, the fact that
⟨(1 2 3)⟩ is a normal subgroup only gives that (1 2)(1 2 3)(1 2)^{-1} ∈ ⟨(1 2 3)⟩. In fact,
(1 2)(1 2 3)(1 2)^{-1} = (1 3 2) = (1 2 3)^2.
Let's first put this aside and describe the situation more precisely.
Property 2.9.3. Suppose that N ⊴ G and f : G → H is an epimorphism with kernel
N. If there exists a monomorphism g : H → G such that f ◦ g is the identity map on H,
then g(H) ∩ N = {1_G} and G = N g(H).

Sketch of Proof. If a ∈ g(H) ∩ N, then f(a) = 1_H. Writing a = g(h), we get h = f(a) = 1_H
since f ◦ g = id_H, and so a = 1_G.
That G = N g(H) then follows by the second isomorphism theorem.
Definition 2.9.3. Suppose that N ⊴ G and H ≤ G are such that the natural map H →
G/N is an isomorphism. Then G = NH by the property above, and we say that G is the
(inner) semidirect product of N and H. This is denoted by N ⋊ H. To avoid ambiguity,
one can also say that G is a semidirect product of H acting on N.

Example 2.9.5. S3 = ⟨(1 2 3)⟩ ⋊ ⟨(1 2)⟩.

This definition nonetheless relies on the structure that G provides. To actually be
able to construct a new group, we hope to define the semidirect product relying only on the
structures of N and H. We will soon define the way to go about it, but let's stick with
the inner semidirect product for a while to get some sense of how we should actually
construct the group.
If G is a semidirect product of N and H, then every element of G can be uniquely written
as nh where n ∈ N, h ∈ H. If there are two elements n1h1, n2h2 in G, we want to express
n1h1n2h2 in the form nh. Now we can use the fact that N is a normal subgroup:

$$n_1 h_1 n_2 h_2 = n_1 h_1 n_2 (h_1^{-1} h_1) h_2 = [n_1 (h_1 n_2 h_1^{-1})](h_1 h_2).$$

The only complicated term here is h_1 n_2 h_1^{-1}. Therefore we just need to know (or decide)
how elements of H interact with elements of N. The form gxg^{-1} is especially crucial, not
only here but throughout the whole of group theory, so let's give it a name.
Definition 2.9.4. For any a, b ∈ G, if there exists g ∈ G such that gag^{-1} = b, then we
say that b is a conjugate of a. For any g ∈ G, consider the map φg : G → G sending x to
gxg^{-1}. This is called the conjugation map.

Property 2.9.4. A conjugation map is an isomorphism.

Definition 2.9.5. For a group G, an isomorphism from G to itself is called an auto-


morphism of G. The set of all automorphisms of G forms a group under composition and
is denoted by Aut(G).

Now consider the map φ : H → Aut(N) sending h to the conjugation map φh
restricted to N (we have to check that this is well-defined, and this is left as an exercise).
It is clear that φ is a group homomorphism. Now for every n1h1, n2h2, we can write
n1h1n2h2 as

$$n_1 h_1 n_2 h_2 = [n_1 (h_1 n_2 h_1^{-1})](h_1 h_2) = (n_1 \varphi_{h_1}(n_2))(h_1 h_2).$$

With this discovery, we are ready to define the outer semidirect product.

Definition 2.9.6. Let N and H be two groups and φ : H → Aut(N) be a group
homomorphism. Define the (outer) semidirect product of N and H with respect to φ to be
the group with underlying set N × H such that

(n_1, h_1) · (n_2, h_2) = (n_1 φ_{h_1}(n_2), h_1 h_2).

This is denoted by N ⋊_φ H.

Example 2.9.6. Suppose G1 , G2 are two groups and φ : G2 → Aut(G1 ) is the trivial
map. Then G1 × G2 = G1 ⋊φ G2 .

Example 2.9.7. Consider the cyclic groups Z/3Z and Z/2Z. Let
φ : Z/2Z → Aut(Z/3Z) be the homomorphism sending x to the automorphism given by
multiplication by 2^x. Then Z/3Z ⋊_φ Z/2Z is isomorphic to S3.
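Here is a small Python sketch of this example: it implements the multiplication rule of
Definition 2.9.6 for N = Z/3Z and H = Z/2Z and checks that the resulting group of six
elements behaves like S3 (the names phi and mul are mine):

from itertools import product

def phi(h):  # the automorphism of Z/3Z given by multiplication by 2^h
    return lambda m: (m * 2 ** h) % 3

def mul(x, y):  # (n1, h1) * (n2, h2) = (n1 + phi(h1)(n2), h1 + h2)
    (n1, h1), (n2, h2) = x, y
    return ((n1 + phi(h1)(n2)) % 3, (h1 + h2) % 2)

G = list(product(range(3), range(2)))
assert len(G) == 6                  # same order as S3
a, b = (1, 0), (0, 1)
print(mul(a, b), mul(b, a))         # (1, 1) (2, 1): non-abelian!
assert mul(b, mul(a, b)) == (2, 0)  # b a b^{-1} = a^2, just like in S3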

Property 2.9.5. Suppose that N and H are two groups and φ : H → Aut(N) is a
group homomorphism. Then N is a normal subgroup of N ⋊_φ H and

(N ⋊_φ H)/N ≅ H.

Moreover N ∩ H is trivial in N ⋊_φ H and NH = N ⋊_φ H.

This leads us to try to determine the group Aut(G) for any group G. This is, however,
a hard problem to tackle. Therefore we will only consider the case where G is a cyclic group.
Property 2.9.6. Aut(C_n) is isomorphic to (Z/nZ)^×, i.e. the multiplicative group of
integers coprime with n modulo n. Aut(Z) ≅ C2.

2.10 Correspondence Theorem


Given a homomorphism ϕ : G → G′, one can verify that for every subgroup H ≤ G, its
image ϕ(H) is a subgroup of G′ (and in particular a subgroup of im ϕ). This defines a map
from the set of subgroups of G to the set of subgroups of G′. The correspondence
theorem states that if ϕ is surjective, then this map is bijective when restricted to the
set of subgroups of G containing the kernel.
Theorem 2.10.1. For any epimorphism ϕ : G → G′, the subgroups of G that contain
ker(ϕ) are in bijection with the subgroups of G′ via ϕ, and this bijection is index-preserving.
Moreover, the normal subgroups of G that contain ker(ϕ) are in bijection with the normal
subgroups of G′.

Sketch of Proof. This is a direct corollary of the first isomorphism theorem.


Example 2.10.1. In this example, we will determine every subgroup of S4 of index 3.
Suppose that H is a subgroup of S4 of index 3 (or, equivalently, of order 8). Consider the
epimorphism f : S4 → S3 with kernel K4. Then H/(H ∩ K4) ≅ HK4/K4 by the second
isomorphism theorem. By Lagrange's theorem |HK4/K4| divides |S3|, which shows that
|H/(H ∩ K4)| divides 6. Since |H| = 8, we can only have |H ∩ K4| = 4, which shows that
K4 ⊆ H. As a consequence, the subgroups of index 3 in S4 are in bijection with the
subgroups of index 3 in S3 via f. Since the only subgroups of S3 that have index 3 are
⟨(1 2)⟩, ⟨(1 3)⟩ and ⟨(2 3)⟩, the subgroups of index 3 in S4 are their preimages.

2.11 Simple Group and Alternating Group


If a (finite) group has a nontrivial proper normal subgroup, then we can take the
quotient group and try to do something like induction. Usually a nontrivial proper normal
subgroup reveals a lot about the group. However, such a subgroup does not necessarily exist.
Definition 2.11.1. A nontrivial group G is simple if it does not contain any nontrivial
proper normal subgroup.

Example 2.11.1. For any prime p, the cyclic group Cp is simple.

These groups are simple for obvious reasons. However, there are still tons of
other simple groups. In this section, we are going to prove that An is simple for any
n ≥ 5.
Theorem 2.11.1. For any n ≥ 5, the alternating group An is simple.

Sketch of Proof. For any nontrivial N ⊴ An , we are going to first prove that it contains
a 3-cycle. A lemma would be helpful:
Lemma 2.11.1. For a cycle (i1 i2 . . . ik ) and a permutation σ ∈ Sn , we have

σ(i1 i2 . . . ik )σ −1 = (σ(i1 ) σ(i2 ) . . . σ(ik )).

Now for any permutation π in N that is not identity, we can divide this into three
cases:
Case 1. It contains a cycle of length at least 4. Suppose that the cycle is
(i1 i2 i3 i4 . . . ik ). Then by the lemma,

(i1 i2 i3 )(i1 i2 i3 . . . ik )(i1 i2 i3 )−1 = (i2 i3 i1 i4 . . . ik ).

Note that
(i2 i3 i1 i4 . . . ik )−1 (i1 i2 i3 i4 . . . ik ) = (i1 ik i3 ).
As a consequence,
(i1 i2 i3 )π −1 (i1 i2 i3 )−1 π = (i1 ik i3 ).
Since N is normal, this is contained in N .
Case 2. All cycles have length at most 3 and there is a 2-cycle. Since π is an
even permutation, there must be at least two 2-cycles. Suppose that those are (a b) and (c d).
Then
(a b c)(a b)(c d)(a b c)−1 = (b c)(a d).
Note that
(b c)−1 (a d)−1 (a b)(c d) = (a c)(b d).

30
Hung-Hsun, Yu 2.12 Random Problem Set

As a consequence,
(a b c)π −1 (a b c)−1 π = (a c)(b d).
Since N is normal, this is contained in N . By the fact that n ≥ 5, there exists an element
e other than a, b, c, d. Therefore

(a c e)((a c)(b d))(a c e)−1 ((a c)(b d))−1 = (a e c)

is in N .
Case 3. It only contains cycles of length 3 (and fixed points). If there is only one 3-cycle, then we're
done. Otherwise, there exist two 3-cycles (a b c), (d e f ) in the disjoint representation of
π. By the lemma,

(a b d)(a b c)(d e f )(a b d)−1 = (b d c)(a e f ).

Note that
((b d c)(a e f ))−1 (a b c)(d e f ) = (a c f b d).
As a consequence,
(a b d)π −1 (a b d)−1 π = (a c f b d)
is in N . This reduces the case to Case 1.
In conclusion, there is always a 3-cycle in N . Now we are going to prove that every
3-cycle belongs to N . This will conclude the proof since 3-cycles generate An .
Suppose that (i j k) ∈ N and p, q are two other indices. For any 3-cycle (i′ j ′ k ′ ),
pick a permutation σ such that σ(i) = i′ , σ(j) = j ′ , σ(k) = k ′ . We can assume that
σ ∈ An , for if σ is odd, then we just need to pick σ ′ = σ(p q) instead. Now by the lemma,

σ(i j k)σ −1 = (i′ j ′ k ′ ) ∈ N.
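The conjugation lemma above is easy to check by machine. The following small Python sketch (all names are my own) verifies Lemma 2.11.1 on random cycles; permutations on {0, . . . , n−1} are stored as tuples with p[i] the image of i, and composition applies the right factor first.

    import random

    def compose(p, q):
        # (p ∘ q)(i) = p(q(i)): apply q first, then p
        return tuple(p[q[i]] for i in range(len(p)))

    def inverse(p):
        inv = [0] * len(p)
        for i, pi in enumerate(p):
            inv[pi] = i
        return tuple(inv)

    def cycle(n, c):
        # the cycle (c[0] c[1] ... c[-1]) as a permutation of {0, ..., n-1}
        p = list(range(n))
        for a, b in zip(c, c[1:] + c[:1]):
            p[a] = b
        return tuple(p)

    n = 7
    for _ in range(100):
        c = random.sample(range(n), random.randint(2, n))
        sigma = tuple(random.sample(range(n), n))     # a random permutation
        lhs = compose(compose(sigma, cycle(n, c)), inverse(sigma))
        rhs = cycle(n, [sigma[i] for i in c])         # (σ(i1) σ(i2) ... σ(ik))
        assert lhs == rhs
    print("Lemma 2.11.1 verified on random examples")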

2.12 Random Problem Set


1. (2.1) Show that if · is a law of composition on G and e ∈ G such that:
(1) The law of associativity holds;
(2) e · a = a for any a ∈ G;
(3) for any a ∈ G there exists a′ ∈ G such that a′ · a = e,
then (G, ·) is a group.

2. (2.2) (Much harder) For every permutation π ∈ Sn , define l(π) to be the number of
cycles in the disjoint cycle representation (including the 1-cycles). Prove that the
minimum number of 2-cycles whose product is π is n − l(π).

3. (2.3) Show that every group of finite and even order has an element of order 2.

4. (2.3) Show that for any a, b ∈ G, we have ordG (ab) = ordG (ba).

5. (2.4) Show that there exists a group G such that there is a proper subgroup H of
G that is isomorphic to G.

31
Hung-Hsun, Yu 2.12 Random Problem Set

6. (2.5) (Kind of tricky) Assume the axiom of choice in this problem. There are people
in a line labeled with 0, 1, 2, and so on. Each person wears a hat with 0 or 1 written
on it, but does not know which. A person with the label i can see the hats
of all the people having larger labels. Now they are asked to guess the numbers
on their hats simultaneously. Show that if they are allowed to have a discussion
beforehand, then it is possible to guarantee that all but finitely many people guess
correctly.

7. (2.6) (Much much harder) Suppose that G is a group generated by n elements such
that every g ∈ G is of order 3. Prove that |G| ≤ 3^{2^n −1} .

8. (2.6) Given an n × n grid of light bulbs, with a switch at each position. Whenever
a switch is toggled, the bulb at its position and the bulbs next to it all alternate
between on and off. Prove that starting from the state where all bulbs are off, the
number of possible states that one can reach is a power of 2.

9. (2.6) Prove that every group of prime order is cyclic.

10. (2.7) Suppose that G is a finite group and p is the smallest prime factor of |G|.
Show that every subgroup of G of index p is normal in G.

11. (2.7) Consider the following subset of GL2 (R):

        H := { [ 1 x ; 0 1 ] | x ∈ R }.

Show that H is a subgroup but not normal in GL2 (R).

12. (2.7) For any group G, define the commutator subgroup [G, G] as the subgroup
generated by
{ghg −1 h−1 |g, h ∈ G}.
Show that [G, G] is a normal subgroup of G and G/[G, G] is abelian.

13. (2.7) Let G be a subgroup of GLn (R). Prove that the matrices that are path-
connected to the identity in G form a normal subgroup of G.

14. (2.7) Suppose that M, N are both normal subgroups of G such that M ∩ N = {1}.
Show that mn = nm for any m ∈ M and n ∈ N .

15. (2.9) Suppose that G is a group containing two elements a, b such that ordG (a) = n
and aba^{-1} = b^k . Prove that ordG (b) | k^n − 1. Conversely, suppose that d is a positive
integer such that d | k^n − 1; then there exists a group G and a, b ∈ G such that
ordG (a) = n, ordG (b) = d and aba^{-1} = b^k .

16. (2.11) Show that if there exists an epimorphism from Sm to Sn , then m = n, n = 1,


n = 2 or (m, n) = (4, 3).

Chapter 3

Vector Space

We will leave the group kingdom for a while and jump into a more concrete area: the
vector space. There are a lot more restrictions on a vector space, which makes it easier
to handle. These restrictions are reasonable enough that the concept of vector space
appears in a wide variety of topics.
Matrices will appear a lot in this chapter, so it is better to get familiar with their
operations before reading this chapter. Also, which index is the row and which is the
column might confuse you from time to time. If that happens, make sure to think it
through and keep track of the definition/convention.

3.1 Field
Before jumping into the definition of vector space, we have to first define what a field is.
Definition 3.1.1. A field is a non-trivial abelian group (F, +) with an additional law
of composition × such that:
(1) There is an element 1 ∈ F such that 1 × x = x for any x ∈ F .
(2) For any 0 ̸= x ∈ F there exists x−1 ∈ F such that x × x−1 = 1.
(3) (commutativity) a × b = b × a for any a, b ∈ F .
(4) (associativity) (a × b) × c = a × (b × c) for any a, b, c ∈ F .
(5) (distributivity) (a + b) × c = a × c + b × c for any a, b, c ∈ F .

Informally, a field is the set that we can do addition, subtraction, multiplication and
division on.
Example 3.1.1. Q, R, C, Z/pZ are fields (where p is a prime). Z is not a field.

Note that in the definition, we don't require 0 to have a multiplicative inverse. This
is because it is impossible for 0 to have one.
Property 3.1.1. Suppose that F is a field. Then 0x = 0 for every x ∈ F .

Corollary 3.1.1. For any a, b ∈ F , ab = 0 implies that a = 0 or b = 0.

Corollary 3.1.2. (F \{0}, ×) is an abelian group. This is usually denoted by F × .

We will talk more about the properties of fields later, but for now it is enough to


know what a characteristic is.


Definition 3.1.2. The characteristic of a field is the order of the multiplicative identity
in the additive group. If the order is infinity, by convention define the characteristic to
be zero. This is denoted by char(F ).

Property 3.1.2. The characteristic of a field can only be 0 or prime.

Sketch of Proof. Suppose that the order of 1 is ab for some a, b ∈ N with a, b > 1. Then
a · 1 and b · 1 are nonzero in F while their product (ab) · 1 is zero, contradicting Corollary
3.1.1. Here a · 1 means 1 + 1 + · · · + 1 where 1 appears a times.

Property 3.1.3. For any field F , if char(F ) = p ̸= 0 then Fp ⊆ F (where Fp is the


field with p elements); else Q ⊆ F .

In this chapter, we will mostly discuss the properties of vector spaces over a general
field. However, it is always helpful to take R or C as the field to get a tangible
example.

3.2 Definitions and Examples


Definition 3.2.1. An abelian group (V, +) is a vector space over a field F if there is a
function (scalar multiplication) · : F × V → V such that:
(1) (associativity) (ab) · v = a · (b · v) for any a, b ∈ F and v ∈ V .
(2) (distributivity) (a + b) · v = a · v + b · v and a · (v + w) = a · v + a · w for any a, b ∈ F
and v, w ∈ V .
(3) 1 · v = v for any v ∈ V .

Example 3.2.1. For any positive integer n and field F , the additive group F n is a
vector space over F . The set of polynomials with coefficients in F (or the polynomial
ring over F , denoted by F [x]) is also a vector space over F .

Property 3.2.1. If V is a vector space over F , then 0v = 0 for any v ∈ V .

As in the case in group theory, we want to define what subobjects and maps we want
to consider when it comes to vector space.
Definition 3.2.2. Suppose that V is a vector space over F and W is a subset of V
such that W is also a vector space over F , then W is called a vector subspace of V .

Definition 3.2.3. Suppose that V1 , V2 are vector spaces over F . A map f : V1 → V2 is a
linear transformation if for any v, v ′ ∈ V1 and a ∈ F ,

f (av + v ′ ) = af (v) + f (v ′ ).

Property 3.2.2. A map f : V1 → V2 is a linear transformation if and only if for any
v, v ′ ∈ V1 and a ∈ F , the following two hold:

f (v + v ′ ) = f (v) + f (v ′ ),


f (av) = af (v).

Example 3.2.2. For any vector subspace W of a vector space V , the inclusion map is
a linear transformation.

Example 3.2.3. For any x ∈ F , the evaluation map evx : F [x] → F that evaluates the
polynomials at x is a linear transformation.

Definition 3.2.4. For two vector spaces V, V ′ over F , if there exists a bijective linear
transformation from V to V ′ then we say that V is isomorphic to V ′ , or V and V ′ are
isomorphic.

Example 3.2.4. The set F [x]≤d := {f ∈ F [x]| deg f ≤ d} is a vector subspace of F [x].
It is isomorphic to F d+1 .

3.3 Basis
In this section, we will introduce the concept of basis. It is so essential that most of the
theory in linear algebra depends on the existence (and appropriate choice) of bases. Before
that, let's first introduce spanning and linearly independent sets.
Definition 3.3.1. Suppose that V is a vector space over F and S ⊆ V is a subset. The
set S is spanning if for every v ∈ V there exist v1 , . . . , vn ∈ S and a1 , . . . , an ∈ F such
that a1 v1 + · · · + an vn = v. In other words, every element can be represented as a linear
combination of elements in S.

Definition 3.3.2. Suppose that V is a vector space over F and S ⊆ V is a subset.


The set S is linearly independent if for any v1 , . . . , vn ∈ S and a1 , . . . , an ∈ F such that
a1 v1 +· · ·+an vn = 0, the coefficients a1 , . . . , an are all zero. In other words, any nontrivial
linear combination of elements in S is not zero.

Note that although S may be infinite, we only consider finite sums in linear algebra,
in the absence of the concept of limit. For example, the set {1, 0.1, 0.01, . . .} is not
spanning in R over Q.
Definition 3.3.3. Suppose that V is a vector space and S ⊆ V is a subset. If S is both
spanning and linearly independent, then S is called a basis of V .

Intuitively generating sets are “larger” than linearly independent sets, so bases are
the sets that fall on the borderline between spanning sets and linearly independent sets.
Property 3.3.1. Suppose that {v1 , . . . , vn } is a basis of V , then for any v ∈ V there
exists a unique tuple (a1 , . . . , an ) ∈ F n such that a1 v1 + · · · + an vn = v.

Sketch of Proof. The existence follows by the definition. The uniqueness follows by taking
the difference of two tuples that result in the same vector in V .

Corollary 3.3.1. If V is a vector space that has an ordered basis β of size n, then V
is isomorphic to F n . Moreover, we can write the isomorphism ϕ : F n → V explicitly as


ϕ(v) = βv where β = (β1 , β2 , . . . , βn ).

This shows that it is easy to work in a vector space that has a basis. In fact, this is
true for finite dimensional vector spaces, whatever this means.
Theorem 3.3.1. Suppose that V is a vector space that has a finite generating set.
Then there is a finite basis of V .

Sketch of Proof. Consider a spanning set of the smallest size. Then it is a basis.
Definition 3.3.4. In this case, define the dimension of V to be the size of a basis. This
is independent of the choice of basis and is denoted by dim V .

Remark. In fact, if we assume the axiom of choice (which we will in this note), then
there always is a basis of a given vector space, regardless of whether it is finite dimensional.
This can be shown by Zorn's lemma and is left as an exercise for readers who are acquainted
with the axiom of choice. The dimension of an infinite dimensional vector space is then
defined as the cardinality of the basis. That said, most of the discussion of linear algebra
will be on finite dimensional vector spaces in this note.

Example 3.3.1. The vector space containing polynomials over F of degree at most
d is of dimension d + 1, for the set {1, x, . . . , xd } is a basis of it. Note that the set
{1, x + 1, x2 + x + 1, . . . , xd + · · · + 1} is also a (rather unusual) basis. This suggests
that bases are generally not unique.

Next, we will prove a powerful theorem to conclude this section.


Theorem 3.3.2. (The replacement lemma) Suppose that S ⊆ V is a finite spanning
set and T ⊆ V is an independent set. Then there is a subset U ⊆ S such that |U | = |T |
and (S\U ) ∪ T is still spanning.

Sketch of Proof. We can do a (careful) induction on |T |. Suppose that v1′ ∈ T and
a1 v1 + · · · + an vn = v1′ where a1 , . . . , an ∈ F and v1 , . . . , vn ∈ S. WLOG suppose that
a1 ̸= 0; then we can rewrite the equation as v1 = a1^{-1} (v1′ − a2 v2 − · · · − an vn ). Hence we
can replace v1 with v1′ . One must nonetheless notice that v1 might also come from T , but
this can be avoided since T is linearly independent.
Corollary 3.3.2. Suppose that V is a finite dimensional vector space and S ⊆ V is a
linearly independent set. Then there exists a basis of V that contains S.

3.4 Linear Transformation and Matrix


Now that we know that every finite dimensional vector space is isomorphic to F n for some
n, we can easily describe any linear transformation as a matrix.
Definition 3.4.1. The standard basis of F n is the basis

    (1, 0, . . . , 0)^T , (0, 1, . . . , 0)^T , . . . , (0, . . . , 0, 1)^T .

The vectors are usually denoted by e1 , e2 , . . . , en .

Property 3.4.1. Suppose that T : F n → F m is a linear transformation and A ∈
Mm×n (F ) is the unique matrix such that

    T (ej ) = ∑_{i=1}^{m} aij ei    for all j = 1, . . . , n,

then T (v) = Av for any v ∈ F n .

This property shows that determining a linear transformation is equivalent to
determining the values that it takes on a basis.
Since we showed that any finite dimensional vector space is isomorphic to F n for some
n, we can pull this back to the general setting.
Corollary 3.4.1. Suppose that T : V → V ′ is a linear transformation and β, β ′ are
ordered bases of V, V ′ of sizes n, m, respectively. Then there is a unique matrix
A ∈ Mm×n (F ) such that
    T (βv) = β ′ Av
for any v ∈ F n .

Definition 3.4.2. In the previous setting, A is usually denoted by [T ]^{β′}_β .

This tells us that linear transformations between finite dimensional vector spaces and
the matrices are essentially the same. From now on, we will frequently exchange these
two ideas at our convenience.
Note that under suitable circumstances we can define the composition of linear transfor-
mations. Ideally the matrix form of the composition should be the product of the matrices.
One can also think of this as the reason matrix multiplication is defined the way it is.
Property 3.4.2. Suppose that T1 : V1 → V2 and T2 : V2 → V3 are two linear transfor-
mations and α, β, γ are bases of V1 , V2 , V3 , respectively. Then [T2 ◦ T1 ]^γ_α = [T2 ]^γ_β [T1 ]^β_α .

Example 3.4.1. Consider R[x]≤3 . We can see that the map d/dx : R[x]≤3 → R[x]≤2 is a
linear transformation. Now take β = (1, x, x^2 , x^3 ) and β ′ = (1, x, x^2 ). We have that

    d/dx (1) = 0 + 0x + 0x^2 ,
    d/dx (x) = 1 + 0x + 0x^2 ,
    d/dx (x^2 ) = 0 + 2x + 0x^2 ,
    d/dx (x^3 ) = 0 + 0x + 3x^2 .

Therefore

    [d/dx]^{β′}_β = [ 0 1 0 0 ; 0 0 2 0 ; 0 0 0 3 ].


As a sanity check, 1 + 2x + 3x^2 + 4x^3 = β(1, 2, 3, 4)^T and

    β ′ [d/dx]^{β′}_β (1, 2, 3, 4)^T = [ 1 x x^2 ] [ 0 1 0 0 ; 0 0 2 0 ; 0 0 0 3 ] (1, 2, 3, 4)^T = 2 + 6x + 12x^2 ,

which is indeed d/dx (1 + 2x + 3x^2 + 4x^3 ).
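This computation is easy to replicate numerically. Below is a minimal numpy sketch (not part of the notes) of the differentiation matrix acting on coefficient vectors with respect to the monomial bases.

    import numpy as np

    # columns are the β'-coordinates of d/dx applied to 1, x, x^2, x^3
    D = np.array([[0., 1., 0., 0.],
                  [0., 0., 2., 0.],
                  [0., 0., 0., 3.]])

    p = np.array([1., 2., 3., 4.])   # coordinates of 1 + 2x + 3x^2 + 4x^3
    print(D @ p)                     # [2. 6. 12.], i.e. 2 + 6x + 12x^2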

3.5 Multilinear Alternating Form


In the previous section we finally saw why matrix multiplication is defined in such
a weird fashion. With that point of view, one can easily prove the associativity of matrix
multiplication. In this section, we will give an intuition of where the determinant comes
from. This will allow us to give a short proof of the multiplicativity of the determinant.
Definition 3.5.1. Suppose that V, W are vector spaces over F . A map f : V n → W is
a multilinear form if it is linear with respect to each entry. That is, for any i = 1, . . . , n
and v1 , v2 , . . . , vn ∈ V , the map

fv1 ,...,vi−1 ,vi+1 ,...,vn (v) = f (v1 , . . . , vi−1 , v, vi+1 , . . . , vn ) : V → W

is a linear transformation.

Example 3.5.1. The map from F n to F taking the product of n variables is multilinear.
The map f : (Rn )2 → R such that f (u, v) = uT v is multilinear (and bilinear in this case
since there are two variables).

Property 3.5.1. Given a multilinear form f : V n → W . If l1 , . . . , ln are positive
integers and aij ∈ F, vij ∈ V for i = 1, . . . , n and j = 1, . . . , li , then

    f ( ∑_{j=1}^{l1} a1j v1j , . . . , ∑_{j=1}^{ln} anj vnj ) = ∑_g ( ∏_{i=1}^{n} a_{i g(i)} ) f (v_{1 g(1)} , . . . , v_{n g(n)} )

where the summation runs over all g : [n] → N such that g(i) ≤ li .

Despite its complicated appearance, this property is actually a type of distribu-
tivity. When in doubt, try to think of multilinear forms as products.
Definition 3.5.2. A multilinear form f : V n → W is alternating if f (v1 , . . . , vn ) = 0
whenever vi = vj for some i ̸= j.

Property 3.5.2. If f is a multilinear alternating form, then exchanging two variables


changes the sign of f .

Sketch of Proof. It suffices to show the bilinear case. Suppose that v, w ∈ V ; then we
want to show that f (v, w) + f (w, v) = 0. Since f vanishes when two arguments are equal,

0 = f (v + w, v + w) = f (v, v) + f (v, w) + f (w, v) + f (w, w) = f (v, w) + f (w, v).


Remark. It seems that one can take the previous property as the definition, which is more
intuitive, and recover the current definition as follows:

f (v, v) = −f (v, v), ⇒ f (v, v) = 0.

However this fails when char(F ) = 2. This is why we take f (v, v) = 0 as definition.

Example 3.5.2. The map f : (R3 )3 → R taking the triple product f (u, v, w) = u ·
(v × w) is multilinear and alternating. More generally, the map f : (F n )n → F taking the
determinant of the matrix formed by the n column vectors is multilinear and alternating.

Property 3.5.3. Given a multilinear alternating form f : V n → W . For any v1 , . . . , vn ∈
V and matrix A ∈ Mn×n (F ), the following identity holds:

    f ( ∑_{j=1}^{n} a1j vj , . . . , ∑_{j=1}^{n} anj vj ) = det(A)f (v1 , . . . , vn ).

Sketch of Proof.

    f ( ∑_{j=1}^{n} a1j vj , . . . , ∑_{j=1}^{n} anj vj ) = ∑_g ( ∏_{i=1}^{n} a_{i g(i)} ) f (v_{g(1)} , . . . , v_{g(n)} )

where the summation runs through all g : [n] → [n]. Note that if g(i) = g(j) for i ̸= j,
then vg(i) = vg(j) and so the term vanishes. Therefore
    ∑_g ( ∏_{i=1}^{n} a_{i g(i)} ) f (v_{g(1)} , . . . , v_{g(n)} ) = ∑_{σ∈Sn} ( ∏_{i=1}^{n} a_{i σ(i)} ) f (v_{σ(1)} , . . . , v_{σ(n)} ).

Note that since σ is a permutation, we can rearrange f (vσ(1) , . . . , vσ(n) ) to f (v1 , . . . , vn )


by a series of exchanges of two variables. The number of exchanges has the same parity
as σ, and so
    ∑_{σ∈Sn} ( ∏_{i=1}^{n} a_{i σ(i)} ) f (v_{σ(1)} , . . . , v_{σ(n)} ) = ∑_{σ∈Sn} (−1)^σ ( ∏_{i=1}^{n} a_{i σ(i)} ) f (v1 , . . . , vn ),

which is just det(A)f (v1 , . . . , vn ).


As before, one can also see this as the reason the determinant is defined in a such
weird fashion.
Corollary 3.5.1. If A, B are two n by n matrices, then det(AB) = det(A) det(B).

Sketch of Proof. Suppose that LA : F n → F n and LB : F n → F n are “left multiplica-


tion”, then LAB = LA ◦ LB . Let f : V n → W be a suitable multilinear alternating form.
By the previous property,

det(AB)f (v) = f (LAB (v)) = f (LA (LB (v))) = det(A)f (LB (v)) = det(A) det(B)f (v)

for any v ∈ V . Now it remains to choose suitable f and v such that f (v) ̸= 0, and this
is left as an exercise.


3.6 Change of Basis


We have seen that in a vector space V over F of dimension n, if we choose an (ordered)
basis, then we can identify the elements of V with the elements of F n . In other words, we
can have a system of coordinates with respect to that basis. However, there are plenty
of bases in a given vector space, which gives rise to plenty of systems of coordinates.
In this section, we will see how one can do the so called coordinate transformation, or
equivalently, change of basis.
Property 3.6.1. (Change of basis formula for coordinates) Suppose that β and β ′ are
two bases of an n-dimensional vector space V over F . Let P be the n by n matrix
such that the i-th column is the coordinate of βi with respect to the basis β ′ . Then the
following hold:

(1) P = [id]^{β′}_β .
(2) P is invertible.
(3) β = β ′ P .
(4) If the coordinate of v ∈ V is x ∈ F n with respect to β, then its coordinate with
respect to β ′ is P x.

Corollary 3.6.1. (Change of basis formula for linear transformations) Suppose that
T : V1 → V2 is a linear transformation and α, α′ are two bases of V1 , β, β ′ are two bases
of V2 , then

    [T ]^{β′}_{α′} = [idV2 ]^{β′}_β [T ]^β_α [idV1 ]^α_{α′} .

This shows that changing bases is just left multiplying the coordinate system with
an invertible matrix and left (and right) multiplying the transformation matrix with
invertible matrices. The converse is also true, i.e. if we do such multiplication then we
can change the basis accordingly.
Property 3.6.2. Suppose that β is a basis of an n-dimensional vector space V over F
and P is an n by n matrix that is invertible. Then β ′ = βP −1 is also a basis.

Corollary 3.6.2. Suppose that T : V1 → V2 is a linear transformation that has a
matrix form A ∈ Mm×n (F ) with respect to some bases of V1 and V2 . Then for any
P ∈ GLn (F ), Q ∈ GLm (F ) we can change the bases appropriately so that the matrix
form of T with respect to the new bases is QAP −1 .

This tells us that A is intrinsically the same as QAP −1 when seen as linear transfor-
mations for any invertible matrices P, Q. We will use this to discover some useful facts
later.
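As a small numerical illustration (a numpy sketch under my own conventions, with β the standard basis of R^2 and β′ a random basis), one can check item (4) of Property 3.6.1: coordinates transform by left multiplication with P.

    import numpy as np

    rng = np.random.default_rng(0)
    Bp = rng.normal(size=(2, 2))           # columns of Bp are the basis β′
    assert abs(np.linalg.det(Bp)) > 1e-9   # β′ really is a basis

    P = np.linalg.inv(Bp)                  # i-th column = coords of e_i w.r.t. β′
    v = np.array([3., -1.])                # coordinates of v w.r.t. β (standard)
    w = P @ v                              # claimed coordinates of v w.r.t. β′
    assert np.allclose(Bp @ w, v)          # β′ times its coordinates recovers v
    print("change of basis formula for coordinates checked")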

3.7 Rank and Nullity


As in the case in group theory, the image and the kernel of a linear transformation are
objects of our interest.
Property 3.7.1. The image and the kernel of a linear transformation are always vector
spaces.


Definition 3.7.1. The rank of a linear transformation T , denoted by rk(T ), is the


dimension of its image. The nullity of a linear transformation T , denoted by nul(T ), is
the dimension of its kernel.

Property 3.7.2. Suppose that T is a linear transformation that is the composition of


two linear transformations T1 and T2 . Then rk(T ) ≤ rk(T1 ) and rk(T ) ≤ rk(T2 ).

Theorem 3.7.1. Suppose that T : V1 → V2 is a linear transformation of rank r. Then


there exist bases α of V1 and β of V2 such that T (αi ) = βi for i = 1, . . . , r and T (αi ) = 0
for i = r + 1, . . . , dim V1 .

Sketch of Proof. Choose a basis β ′ of im(T ). Extend it by the replacement lemma to a


basis β of V2 . For each βi′ choose αi′ such that T (αi′ ) = βi′ . It is easy to verify that α′
is linearly independent. Extend α′ by adding a basis of the kernel. It is easy to show
that the result is a basis of V1 .

This is actually a stronger version of the first isomorphism theorem for vector spaces.
Moreover, this shows that the only intrinsic property of a linear transformation (besides
the dimensions of its domain and codomain) is its rank. The readers are asked to think this through.
Corollary 3.7.1. (Rank-nullity theorem) Suppose that T : V1 → V2 is a linear trans-
formation, then rk(T ) + nul(T ) = dim(V1 ).

Corollary 3.7.2. Suppose that A is an m by n matrix. Then LA (i.e. left multiplying
the vectors with A) is of rank r if and only if there exist P ∈ GLn (F ) and Q ∈ GLm (F )
such that

    QAP^{-1} = [ Ir 0 ; 0 0 ].

Now that we know that linear transformation and matrix are essentially the same, we
can extend the definitions to the matrices.
Definition 3.7.2. The rank of a matrix A, denoted by rk(A), is the rank of the linear
transformation LA . The nullity of a matrix A, denoted by nul(A), is the nullity of the
linear transformation LA .

Property 3.7.3. Suppose that A and B are two matrices such that AB is defined, then
rk(AB) ≤ rk(A) and rk(AB) ≤ rk(B).

Note that the image of LA is simply the vector space spanned by the columns of A.
This is called the column rank of A in this context. A natural question here is: if we
consider the row rank of A (i.e. the rank of AT or the rank of RA ), will it be the same?
The answer is yes.
Property 3.7.4. The row rank and the column rank of a matrix are identical. Therefore
there is no need to distinguish the two terminologies.

Sketch of Proof. Suppose that the column rank of A is r, then we know that there are


invertible matrices P, Q such that


    QAP^{-1} = [ Ir 0 ; 0 0 ].

Since (P^{-1})^T and (Q^{-1})^T are also both invertible and

    (P^{-1})^T A^T Q^T = [ Ir 0 ; 0 0 ]^T = [ Ir 0 ; 0 0 ],

AT also has column rank r. This shows that the row rank of A is r.

Corollary 3.7.3. The rank of an m by n matrix is at most min(m, n). Likewise, the rank
of a linear transformation from an n-dimensional vector space to an m-dimensional vector
space is at most min(m, n).

Definition 3.7.3. A matrix or linear transformation is said to be of full rank if its rank
is the maximum possible among the matrices/linear transformations of the same
dimensions.
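A quick numerical check of Property 3.7.4 (a numpy sketch, not part of the notes): build a matrix of known rank as a product of thin factors and compare the ranks of A and A^T.

    import numpy as np

    rng = np.random.default_rng(1)
    # a random 5x7 matrix of rank 3, as a product of 5x3 and 3x7 factors
    A = rng.normal(size=(5, 3)) @ rng.normal(size=(3, 7))
    assert np.linalg.matrix_rank(A) == np.linalg.matrix_rank(A.T) == 3
    print("row rank = column rank = 3")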

3.8 Another Dimension Formula


In the previous section we prove the rank-nullity theorem which tells us something about
the dimensions. There is another important theorem about the dimensions:
Theorem 3.8.1. Suppose that V is a finite dimensional vector space and V1 , V2 are its
vector subspaces. Then V1 ∩ V2 , V1 + V2 := {v1 + v2 |v1 ∈ V1 , v2 ∈ V2 } are also vector
subspaces of V and

dim(V1 + V2 ) + dim(V1 ∩ V2 ) = dim V1 + dim V2 .

Sketch of Proof. Let γ be a basis of V1 ∩ V2 and extend it to a basis α ∪ γ of V1 .
Also extend γ to a basis β ∪ γ of V2 . Then α ∪ β ∪ γ is a basis of V1 + V2 .

This is actually a stronger version of the second isomorphism theorem for vector spaces. In
particular, when V1 ∩ V2 = {0}, the elements of V1 + V2 can be uniquely written in the
form v1 + v2 where v1 ∈ V1 , v2 ∈ V2 .
Definition 3.8.1. If V1 , V2 are vector subspaces of V such that V1 ∩ V2 = {0}, then we
often denote V1 + V2 by V1 ⊕ V2 . This is called the direct sum of V1 and V2 (or V1 direct
sum V2 ).

This definition will become useful in the next chapter where we start to focus on linear
operator restricted on a subspace. For now, just remember that such thing exists.

3.9 Application: Lagrange Interpolation


In this section we will take Lagrange interpolation as an example to see the power of
linear algebra. We first recall the problem that Lagrange interpolation is used to solve:


Problem 3.9.1. Given n distinct points x1 , . . . , xn ∈ F and a1 , . . . , an ∈ F , we want


to find a polynomial f ∈ F [x] such that f (xi ) = ai for any i = 1, . . . , n. Moreover, we
want the degree of f to be as small as possible. What is the smallest integer d such that
there is always a polynomial of degree at most d satisfying the condition?

We can phrase this in the language of linear algebra:


Problem 3.9.2. Given n distinct points x1 , . . . , xn ∈ F . For each d, consider the
linear transformation ϕ : F [x]≤d → F n such that ϕ(f ) = (f (x1 ), . . . , f (xn )). What is the
smallest possible d such that ϕ is surjective?

We can deduce a lower bound of d immediately by the rank-nullity theorem. If ϕ is


surjective, then n = rk(ϕ) ≤ dim(F [x]≤d ) = d + 1, and so d ≥ n − 1. Next we will show
that when d = n − 1, the linear transformation ϕ is surjective (and therefore bijective).
Let β = (1, x, . . . , x^{n−1} ) be a basis of F [x]≤(n−1) and β ′ = (e1 , . . . , en ) be a basis of
F n . Then

    [ϕ]^{β′}_β = [ 1  x1  · · ·  x1^{n−1} ]
                 [ 1  x2  · · ·  x2^{n−1} ]
                 [ ...                    ]
                 [ 1  xn  · · ·  xn^{n−1} ]

This is called the Vandermonde matrix. To see that ϕ is bijective, we only need to show that
the matrix is invertible, or equivalently the determinant of the Vandermonde matrix is
nonzero. Indeed, we can compute the determinant explicitly:
Property 3.9.1. The determinant of the Vandermonde matrix is

    ∏_{i<j} (xj − xi ).

Sketch of Proof. It is clear that the determinant is a homogeneous polynomial of degree
n(n − 1)/2. Moreover, if we set xi = xj for i < j, then since there are two identical rows,
the determinant is zero. This shows that xj − xi divides the determinant. As a consequence,
the determinant is c ∏_{i<j} (xj − xi ) for some constant c. It remains to determine c, which
is clearly 1 (compare the coefficients of the monomial x2 x3^2 · · · xn^{n−1} ).
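A numerical spot-check of this formula (a numpy sketch, not a proof):

    import numpy as np
    from itertools import combinations

    x = np.array([0.5, 1.0, 2.0, 3.5])
    V = np.vander(x, increasing=True)   # rows (1, x_i, x_i^2, x_i^3)
    prod = np.prod([x[j] - x[i] for i, j in combinations(range(len(x)), 2)])
    assert np.isclose(np.linalg.det(V), prod)
    print("det(V) =", np.linalg.det(V))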

As a consequence, when x1 , . . . , xn are distinct, the determinant is nonzero, and so ϕ


is bijective.
Corollary 3.9.1. For any distinct x1 , . . . , xn ∈ F and a1 , . . . , an ∈ F , there exists a
unique polynomial f of degree at most n − 1 such that f (xi ) = ai for i = 1, . . . , n.

Remark. In fact there is a way to show that ϕ is bijective without calculating the
determinant. We can show that ϕ is injective by the fact that if f (xi ) = 0 for all i then
(x − x1 ) · · · (x − xn ) | f .

Remark. It is possible to construct the polynomial f explicitly from the xi and ai :

    f (x) = ∑_{i=1}^{n} ai ∏_{j̸=i} (x − xj )/(xi − xj ).
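This formula translates directly into code. The following Python sketch (assuming F = R; the function name is my own) implements it and checks the defining property.

    def lagrange(xs, ys):
        # return f as a callable with deg f <= n-1 and f(xs[i]) = ys[i]
        def f(t):
            total = 0.0
            for i, (xi, yi) in enumerate(zip(xs, ys)):
                term = yi
                for j, xj in enumerate(xs):
                    if j != i:
                        term *= (t - xj) / (xi - xj)
                total += term
            return total
        return f

    f = lagrange([0.0, 1.0, 2.0], [1.0, 3.0, 11.0])   # fits 1 - x + 3x^2
    assert all(abs(f(x) - y) < 1e-9 for x, y in [(0, 1), (1, 3), (2, 11)])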


3.10 Random Problem Set


1. (3.1) For every field F , prove that the set of rational functions with coefficients
in F , i.e.

        F (x) := { f (x)/g(x) | f, g ∈ F [x], g ̸= 0 },

is a field.

2. (3.2) Consider the euclidean space R3 . This is a vector space over R. For any
v ∈ R3 we can consider the map fv : R3 → R3 such that fv (u) = u × v for any
u ∈ R3 . Show that fv is a linear transformation but is not bijective.

3. (3.3) For every finite dimensional vector space V and a vector subspace W of V ,
show that dim W ≤ dim V . Moreover, show that the equality holds if and only if
V = W.

4. (3.4) Choose your favorite prime p that is not too large. Let β = {1, x, . . . , xp−1 }
be a basis of Fp [x]≤p−1 and β ′ be the standard basis of Fpp . Let T : Fp [x]≤p−1 → Fpp
be the linear transformation such that T (f ) = (f (0), f (1), . . . , f (p − 1)). Compute

[T ]^{β′}_β and deduce that T is an isomorphism.

5. (3.5) Show that if f : (F n )n → F is a multilinear alternating form such that


f (e1 , . . . , en ) = 1, then f is the map taking the determinant of the matrix formed
by the n column vectors.

6. (3.6) Compute the order of the group GLn (Fp ) by considering the number of (ordered)
bases in Fnp .

7. (3.7) Prove that for any m by n matrix A, we can decompose it into the sum of
rk(A) rank-one matrices, and that this is the least possible. This is called the
rank-one decomposition of A.

8. (3.7) Show that the rank of a matrix is the number of pivots of its reduced row
echelon form. Now consider Problem 2 in Chapter 1 again.

9. (3.7) Show that a matrix A is of rank r if and only if every (r + 1) by (r + 1)
submatrix of A is non-invertible and there exists an r by r submatrix of A that is
invertible.

10. (3.9) Compute the inverse, if it exists, of the Vandermonde matrix.

11. (3.9) Show that

        det [ (ai + bj )^{-1} ]_{1≤i,j≤n} = ∏_{i<j} (aj − ai )(bj − bi ) / ∏_{i=1}^{n} ∏_{j=1}^{n} (ai + bj ).

Chapter 4

Linear Operator

In the previous chapter we have seen that the only intrinsic property of a linear transfor-
mation is its rank. However, if the domain and the range of a linear transformation are the
same vector space, then we usually hope that the bases that we choose for the domain
and the range are the same. As a consequence, the argument in the previous chapter does
not work any more, and we will actually see that in this case the linear transformation
possesses many more intrinsic properties. This kind of linear transformation is called a
linear operator.
In this chapter we will develop the theory of linear operators. The main result of this
chapter will be the Jordan canonical form. There is in fact a more general result, i.e. the
rational form. This is nonetheless hard to prove at this point, and I will put it off until
we learn about module theory.

4.1 Definition and Examples


Definition 4.1.1. If V is a vector space and T : V → V is a linear transformation,
then we say that T is a linear operator on V .

Since now the domain and the range are the same, we just need to choose a basis.
Definition 4.1.2. Suppose that T is a linear operator on a finite dimensional vector
space V with a basis β; then [T ]β is the matrix [T ]^β_β .

Property 4.1.1. (Change of basis formula for linear operators) Suppose that T is a
linear operator on an n-dimensional vector space V and β, β ′ are two bases of V . Then

    [T ]β′ = [id]^{β′}_β [T ]β [id]^β_{β′} ,

or equivalently,

    [T ]β′ = [id]^{β′}_β [T ]β ([id]^{β′}_β )^{-1} .

Conversely, if A is the matrix form of T with respect to some basis of V and Q is an n by
n invertible matrix, then there exists a basis of V such that the matrix form of T with
respect to that basis is QAQ^{-1} .

Definition 4.1.3. Suppose that A, B are two n by n matrices such that there exists
Q ∈ GLn (F ) with B = QAQ^{-1} ; then we say that A and B are similar.




Example 4.1.1. Consider the linear operator rθ on R2 rotating the whole plane by θ
about the origin. Algebraically, consider the left multiplication of

    [ cos θ  −sin θ ; sin θ  cos θ ].

For any θ we can see that rθ is of full rank. However, they are not intrinsically the same.
In particular, r0 is the identity, while rπ is the reflection about the origin.

From this example we see that the rank alone does not determine the properties of a
linear operator. From now on, we will be dedicated to finding other invariants of linear
operators.

4.2 Determinant and Characteristic Polynomial


The invariant hunt starts with observing that the change of basis formula is actually quite
nice.
Property 4.2.1. Suppose that T is a linear operator on a finite dimensional vector
space, then the determinant of the matrix form of T does not depend on the choice of
basis.

Sketch of Proof. Suppose that A, B are two matrix forms of T , then A and B are similar.
Therefore they have the same determinant.

Definition 4.2.1. For a finite dimensional linear operator T , the determinant of T ,


also denoted by det(T ), is the determinant of any matrix form of T .

Now we can take full advantage of the determinant to find another invariant:
Definition 4.2.2. If T is a linear operator on an n-dimensional vector space, then the
determinant of (x · id − T ), a polynomial in x of degree n, is called the characteristic
polynomial of T . This is denoted by charT (x). Similarly, if A is an n by n matrix, then
charA (x) is defined to be det(xIn − A).

Property 4.2.2. If A, B are similar square matrices, then charA (x) = charB (x).

Note that [x^{n−1} ] charA (x) is the negative of the sum of the diagonal entries. Since the
characteristic polynomial of a linear operator T does not depend on the basis, the sum of
the diagonal entries of its matrix form is also independent of the basis. This is another
important invariant that is worth a name.
Definition 4.2.3. If A is a square matrix, then the trace of A is the sum of its diagonal
entries. This is denoted by tr(A). If T is a finite dimensional linear operator, then tr(T )
is the trace of any matrix form of T .

We already know from the property of characteristic polynomial that two similar
matrices have the same trace. In fact, it holds in a slightly general situation.


Property 4.2.3. Suppose that A, B are two square matrices of the same size. Then
tr(AB) = tr(BA).

Now that we know that the characteristic polynomial is also an intrinsic property of a
linear operator, we naturally hope that the characteristic polynomial and rank determine
the linear operator uniquely. This is however not true.
Example 4.2.1. Consider the two matrices

    [ 1 0 ; 0 1 ],   [ 1 1 ; 0 1 ].

They both have characteristic polynomial (x − 1)^2 and rank 2. However, they are not
similar, since every vector is fixed by the first one but not by the second one.

4.3 Invariant Subspace and Eigenspace


Invariant subspaces are a very useful tool to analyze linear operators. In this chapter,
most of the results will be based on this concept.
Definition 4.3.1. Suppose that T is a linear operator on a vector space V . A vector
subspace W of V is T -invariant if T (W ) ⊆ W .

The intuition that this might be useful is that if we can break V into pieces of invariant
subspaces, then we can kind of “diagonalize” the linear operator.
Property 4.3.1. Suppose that T is a linear operator on a finite dimensional vector
space V . If V1 , V2 , . . . , Vk are T -invariant subspaces such that V = V1 ⊕ · · · ⊕ Vk then
there exists a basis of V such that the matrix form of T is block diagonal,

    [ A1             ]
    [     A2         ]
    [         ...    ]
    [             Ak ]

where Ai is a dim(Vi ) by dim(Vi ) matrix (and all other entries are zero).

Definition 4.3.2. A linear operator is diagonalizable if there is a basis such that the
matrix form of it is a diagonal matrix. Similarly, a square matrix A is diagonalizable if
there exists an invertible square matrix Q such that QAQ^{-1} is diagonal.

Corollary 4.3.1. Suppose that V is an n-dimensional vector space and T is a linear


operator on it. If there exist one dimensional T -invariant subspaces V1 , . . . , Vn such that
V = V1 ⊕ · · · ⊕ Vn , then T is diagonalizable.

This leads us to consider the one dimensional invariant subspaces. If {cv | c ∈ F } is a
one dimensional T -invariant subspace, then we can see that there exists λ ∈ F such that
T v = λv.
Definition 4.3.3. If λ ∈ F satisfies that there exists nonzero v ∈ V such that T v = λv,
then we say that λ is an eigenvalue of T . In this case, v is said to be an eigenvector of T .
We can define the eigenvalues/eigenvectors of a matrix similarly.


Property 4.3.2. λ ∈ F is an eigenvalue of T if and only if charT (λ) = 0.

Sketch of Proof. There exists v ̸= 0 such that T v = λv if and only if T − λ id is not


invertible. Therefore λ is an eigenvalue if and only if det(λ id −T ) = 0.
Property 4.3.3. A linear operator/matrix is diagonalizable if and only if there exists
a basis consisting of eigenvectors.

Definition 4.3.4. For any eigenvalue λ of T , the eigenspace of T corresponding to λ is


the vector subspace Eλ := {v|T v = λv}.

Property 4.3.4. Suppose that λ1 , . . . , λk are distinct eigenvalues of T ; then the sum
Eλ1 + · · · + Eλk is direct, i.e. it is Eλ1 ⊕ · · · ⊕ Eλk .

Sketch of Proof. Suppose that v1 + v2 + · · · + vk = 0 where vi ∈ Eλi ; then by applying T^j
to it, we get that

    λ1^j v1 + λ2^j v2 + · · · + λk^j vk = 0.

Choosing j = 0, 1, . . . , k − 1, by the Vandermonde determinant we know that v1 = v2 =
. . . = vk = 0.
Corollary 4.3.2. If Eλ1 ⊕ · · · ⊕ Eλk = V then T is diagonalizable.

Naturally our hope now is that Eλ1 ⊕· · ·⊕Eλk = V . This is nonetheless not necessarily
true.
Example 4.3.1. Let's consider the rotation on R2 again. The characteristic polynomial
of rθ is x^2 − 2(cos θ)x + 1. Since this polynomial has no real roots (unless θ = 0, π), rθ has
no eigenvalues. It does not even have any nontrivial proper invariant subspace in R2 .
This issue can be easily solved if we extend this linear operator to C2 over C. In this
case, there are two eigenvalues: e^{iθ} and e^{−iθ} . After some simple calculation, we can see
that rθ can be diagonalized:

    rθ = [ 1 1 ; i −i ] [ e^{−iθ} 0 ; 0 e^{iθ} ] [ 1 1 ; i −i ]^{-1} .
Therefore we would like to make the assumption that the characteristic polynomial splits
in F . If it does not, we can simply take its “algebraic closure” to try to diagonalize it.
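The diagonalization above is easy to verify numerically. A small numpy sketch (not part of the notes):

    import numpy as np

    theta = 0.7
    R = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])
    P = np.array([[1, 1], [1j, -1j]])     # columns are eigenvectors of R
    D = np.diag([np.exp(-1j * theta), np.exp(1j * theta)])
    assert np.allclose(P @ D @ np.linalg.inv(P), R)
    print("r_theta = P D P^{-1} verified")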

Example 4.3.2. This is however not the only issue that one might encounter. Consider
the matrix

    [ 1 1 ; 0 1 ].
Its characteristic polynomial is (x − 1)2 which splits completely. However E1 is of dimen-
sion 1 and there is just no way to fix it.

From this example we can see that we also care about the dimensions of the eigenspaces.
Property 4.3.5. If λ is an eigenvalue of T of multiplicity m, then
1 ≤ dim(Eλ ) ≤ m.


Sketch of Proof. It is clear that dim(Eλ ) ≥ 1. Now suppose that α is a basis of Eλ . We
can extend it to a basis β of V . Then the matrix form of T with respect to β will be of
the form

    [ λIn B ; 0 D ]
where n = dim(Eλ ). Therefore charT (x) = charλIn (x) charD (x) = (x − λ)n charD (x).
This shows that n ≤ m.
Property 4.3.6. A linear operator/matrix is diagonalizable if and only if its charac-
teristic polynomial splits and for every eigenvalue λ of multiplicity m, the dimension of
Eλ is also m.

From this, we see that a linear operator/matrix being diagonalizable means that every
eigenspace achieves its maximum possible dimension.
Corollary 4.3.3. If a linear operator/matrix has a splitting characteristic polynomial
whose roots are distinct, then it is diagonalizable.

4.4 Cayley-Hamilton Theorem


Now we are going to deal with the matrices that are not diagonalizable. To achieve this,
we begin with a powerful theorem about linear operator/matrix.
Theorem 4.4.1. (Cayley-Hamilton theorem) Suppose that A is a square matrix, then
charA (A) = 0.

Remark. There is a fake proof for this theorem:

charA (A) = det(AIn − A) = det(0) = 0.

It nonetheless does not work, and as far as the author is concerned, there is no easy way
to fix this proof. The readers are encouraged to point out the abuse of notation/flaw of
logic here.

Sketch of Proof. Suppose that the transpose of the cofactor matrix of tIn − A is B(t).
Then B(t)(tIn − A) = charA (t)In . It is clear that every entry of B(t) is a polynomial in
t with degree at most n − 1. Therefore, we can write

    B(t) = ∑_{i=0}^{n−1} Bi t^i .

Suppose that

    charA (t) = ∑_{i=0}^{n} pi t^i ;

then pi In = B_{i−1} − Bi A. Here B_{−1} and Bn are both zero matrices. Therefore,



    charA (A) = ∑_{i=0}^{n} pi A^i = ∑_{i=0}^{n} (B_{i−1} − Bi A)A^i = B_{−1} − Bn A^{n+1} = 0.


Corollary 4.4.1. Suppose that T is a linear operator on a finite dimensional vector


space, then charT (T ) = 0.
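A numerical sanity check of the theorem (a numpy sketch; np.poly returns the coefficients of the characteristic polynomial of a square array, highest degree first):

    import numpy as np

    rng = np.random.default_rng(2)
    A = rng.normal(size=(4, 4))
    coeffs = np.poly(A)                  # coefficients of det(xI - A)
    result = np.zeros_like(A)
    for c in coeffs:                     # Horner's rule with a matrix argument
        result = result @ A + c * np.eye(4)
    assert np.allclose(result, 0, atol=1e-8)
    print("char_A(A) is (numerically) the zero matrix")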

Definition 4.4.1. For a matrix A(or a linear operator on finite dimensional vector
space), the minimal polynomial of A is the polynomial f (x) ∈ F [x] with leading coefficient
1 of the smallest degree such that f (A) = 0.

Note how powerful the Cayley-Hamilton theorem is: if we directly use the fact that
an n by n matrix A lives in an n^2 -dimensional vector space, then we can only get that
the degree of the minimal polynomial is at most n^2 . However, the Cayley-Hamilton theorem
not only tells us that the degree of the minimal polynomial is at most n, but also tells us
that the minimal polynomial divides the characteristic polynomial.
Note that with the concept of the minimal polynomial, we can now differentiate the
matrices

    [ 1 0 ; 0 1 ],   [ 1 1 ; 0 1 ].

The first one has minimal polynomial x − 1, while the second one has minimal polynomial
(x − 1)^2 . Although it is generally not true that the minimal polynomial alone can determine
a linear operator up to change of basis, this somehow leads us to consider the polynomials
and “where they vanish.”
Property 4.4.1. Suppose that T is a linear operator and f ∈ F [x], then ker f (T ) is
T -invariant.

4.5 Generalized Eigenspace


To extend the idea at the end of the previous section, we are going to examine the
properties of ker f (T ) where T is a linear operator and f is a polynomial. In particular,
we hope to decompose ker f (T ) into a direct sum of kernels of smaller polynomials
in T in an attempt to decompose T .
Property 4.5.1. Suppose that T is a linear operator on V and f, g ∈ F [x] are two
polynomials whose gcd is d and lcm is l. Then:
(1) ker f (T ) + ker g(T ) = ker l(T ).
(2) ker f (T ) ∩ ker g(T ) = ker d(T ).

Sketch of Proof. It is easy to see that ker f (T ) + ker g(T ) ⊆ ker l(T ) and ker d(T ) ⊆
ker f (T ) ∩ ker g(T ). To show the opposite direction, we can apply Bezout’s theorem to
show that there exist a, b ∈ F [x] such that af + bg = d. Therefore for every v ∈ ker l(T ),

    v = (bg/d)(T )v + (af /d)(T )v ∈ ker f (T ) + ker g(T )

and for every v ∈ ker f (T ) ∩ ker g(T ),

d(T )v = (af + bg)(T )v = (af )(T )v + (bg)(T )v = 0

which implies that v ∈ ker d(T ).


Corollary 4.5.1. Suppose that f ∈ F [x] is a polynomial such that

f = g1 g2 · · · gk

where g1 , . . . , gk are mutually coprime, then

ker f (T ) = ker g1 (T ) ⊕ · · · ⊕ ker gk (T ).

Corollary 4.5.2. Suppose that T is a linear operator on V whose characteristic poly-


nomial splits into linear factors:

charT (x) = (x − λ1 )m1 · · · (x − λk )mk ,

then
V = ker(T − λ1 id)m1 ⊕ · · · ⊕ ker(T − λk id)mk

is a decomposition of V into a direct sum of T -invariant subspaces.

Definition 4.5.1. For any eigenvalue λ of multiplicity m, the generalized eigenspace


corresponding to λ is the subspace ker(T − λ id)m . This is denoted as Eλ′ .

This shows that when the characteristic polynomial splits, the whole space is the
direct sum of the generalized eigenspaces. Therefore we just need to see how T can act
on the generalized eigenspaces.

4.6 Jordan Canonical Form


From the previous section, we have reduced the whole theory of linear operator whose
characteristic polynomial splits into the case where (T − λ id)m vanishes on the whole
space. Now we can also do the trick replacing T with T − λ id, and so the problem is
reduced to the case where T m = 0 for some m ∈ N.
Definition 4.6.1. Suppose that T is a linear operator on a vector space such that there
exists m ∈ N satisfying T^m = 0; then we say that T is nilpotent. A similar definition can
be made for a square matrix.

If we can choose a good basis for any nilpotent linear operator, then we can express
any linear operator in a really good form.
Theorem 4.6.1. If T is a nilpotent linear operator on a finite dimensional vector space V ,
then V can be decomposed into a direct sum of T -invariant subspaces V1 , . . . , Vk such that for
every Vi we can choose a basis {βi1 , . . . , βit } of Vi such that T βi1 = 0, T βi(j+1) = βij .

Sketch of Proof. Suppose that m is the smallest positive integer such that T^m = 0 and that
β1m , . . . , β_{lm m} extend a basis of ker T^{m−1} to one of ker T^m . Take βi(m−1) = T βim for
i = 1, . . . , lm and suppose that β_{(lm +1)(m−1)} , . . . , β_{l_{m−1} (m−1)} extend a basis of ker T^{m−2}
together with {β1(m−1) , . . . , β_{lm (m−1)} } to one of ker T^{m−1} . Continue this operation and
we're done. To put it formally, one can show it by induction.


Corollary 4.6.1. Suppose that T is a nilpotent linear operator on a finite dimensional
vector space, then there exists a choice of basis such that the matrix form of T is

    [ B1             ]
    [     B2         ]
    [         ...    ]
    [             Bk ]

where each square block Bi is of the form

    [ 0 1           ]
    [   0 1         ]
    [     ..  ..    ]
    [         0 1   ]
    [           0   ].

Similarly, suppose that A is a nilpotent square matrix; then there exists an invertible
matrix Q such that QAQ^{-1} is of this form.

Now, combining this with the fact that the vector space can be decomposed into the
subspaces ker(T − λi id)^{mi} , we get:
Theorem 4.6.2. Suppose that T is a linear operator on a finite dimensional vector
space whose characteristic polynomial splits; then there exists a choice of basis such that
the matrix form of T is

    [ J1             ]
    [     J2         ]
    [         ...    ]
    [             Jk ]

where each square block Ji is of the form

    [ λi 1              ]
    [    λi 1           ]
    [       ..   ..     ]
    [           λi 1    ]
    [              λi   ].

Here λi is an eigenvalue of T . Similarly, suppose that A is a square matrix whose
characteristic polynomial splits; then there exists an invertible matrix Q such that
QAQ^{-1} is of this form.

Definition 4.6.2. The blocks are called Jordan blocks. The matrix form made up of
Jordan blocks is said to be in Jordan canonical form.

Remark. Actually it is easy to show that Jordan canonical form of linear opera-
tors/matrices is unique up to a permutation of Jordan blocks. This is fairly trivial and
is left as an exercise.

Now we basically know that in order to understand a linear operator, it suffices to


compute its Jordan canonical form (in the algebraic closure of the field if necessary).


However, this does not suffice when we want to investigate the relation between two
linear operators; moreover, this does not work if we don't want to extend the field to
its algebraic closure. For the former issue we basically have different tricks for different
situations. For the latter, we can in fact develop another canonical form, which is called
the rational form. We will get into this after we introduce the concept of module.
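In the meantime, the Jordan structure can be computed without any symbolic machinery: the sizes of the Jordan blocks for an eigenvalue λ are determined by the dimensions of ker(A − λI)^k. A small numpy sketch for a nilpotent example (so λ = 0; the block bookkeeping is my own):

    import numpy as np

    n = 6
    A = np.zeros((n, n))
    for i, j in [(0, 1), (1, 2), (3, 4)]:   # Jordan blocks of sizes 3, 2, 1
        A[i, j] = 1.0

    # dim ker A^k for k = 0, 1, ..., n
    kdims = [n - np.linalg.matrix_rank(np.linalg.matrix_power(A, k))
             for k in range(n + 1)]
    # the number of Jordan blocks of size >= k is dim ker A^k - dim ker A^{k-1}
    print([kdims[k] - kdims[k - 1] for k in range(1, n + 1)])  # [3, 2, 1, 0, 0, 0]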

4.7 Application 1: Homogeneous Linear Dif Eq


As the first example of the application of Jordan canonical form, we are going to show
a way to solve the system of homogeneous first order linear differential equations in one
variable:

    x1′ (t) = a11 x1 (t) + a12 x2 (t) + · · · + a1n xn (t)
    x2′ (t) = a21 x1 (t) + a22 x2 (t) + · · · + a2n xn (t)
    · · ·
    xn′ (t) = an1 x1 (t) + an2 x2 (t) + · · · + ann xn (t)

where aij ∈ C and xi (i = 1, . . . , n) are differentiable functions from C to C. We can also write
it as x′ (t) = Ax(t) where A ∈ Mn×n (C) and x is a differentiable function from C to Cn .
We can first investigate the easiest case where n = 1. In this case, we have x′ (t) = ax,
which shows that

    at = ∫_0^t a ds = ∫_0^t x′ (s)/x(s) ds = ∫_{x(0)}^{x(t)} dx/x = log x(t) − log x(0),

and so x(t) = ce^{at} where c ∈ C is a constant.


This immediately tells us how one should solve the linear equation when A is diago-
nalizable. Suppose D = QAQ^{-1} is diagonal; then

    x′ = Ax = Q^{-1} DQx  ⇒  (Qx)′ = Qx′ = D(Qx).

Therefore it suffices to compute the solution y to y ′ = Dy and take x = Q^{-1} y. Since
y ′ = Dy is simply n independent homogeneous linear differential equations, we conclude
that:
Theorem 4.7.1. Suppose that x satisfies x′ = Ax where A is diagonalizable; then each xi
is a linear combination of the functions e^{λt} where λ ranges over the eigenvalues of A.
Moreover, the dimension of the space of solutions to x′ = Ax is n.

When A is not diagonalizable, we can instead take Q such that QAQ^{-1} is in Jordan
canonical form. Therefore the problem is reduced to calculating the solution y to y ′ = Jy
where J is a Jordan block.
Lemma 4.7.1. Suppose that J is an n by n Jordan block corresponding to an eigenvalue
λ and y ′ = Jy, then for any i, yi is of the form Pi (t)eλt where Pi is a polynomial of degree
at most n − i. Moreover, the dimension of the solutions is n.

Sketch of Proof. We can do induction on n − i. The case i = n is already dealt with
above. For the general case, we know that yi′ = λyi + yi+1 = λyi + Pi+1 (t)e^{λt} . Let Pi be
an antiderivative of Pi+1 (so that Pi′ = Pi+1 ); then Pi (t)e^{λt} is a particular solution.
Therefore (Pi (t) + c)e^{λt} is the general solution.


Theorem 4.7.2. Suppose that x satisfies x′ = Ax; then each xi is a linear combination
of the functions t^j e^{λt} where λ is an eigenvalue of A and j is a non-negative integer less
than the multiplicity of λ. Moreover, the dimension of the space of solutions to x′ = Ax
is n.

Remark. Another way to approach this is to define the exponential of a matrix. We
will expect that e^{At} is a square matrix that satisfies (e^{At} )′ = Ae^{At} , and so the linear
combinations of the columns of e^{At} are the solutions of x′ = Ax. We will look into this
later, but the readers are encouraged to attempt to define e^{At} as

    e^{At} = In + At + (A^2 /2!) t^2 + (A^3 /3!) t^3 + · · ·

and see if e^{At} has the desired property.
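One can experiment with this definition numerically; scipy ships a matrix exponential. The following sketch checks that t ↦ e^{At} x0 satisfies x′ = Ax, using a crude finite-difference derivative:

    import numpy as np
    from scipy.linalg import expm

    A = np.array([[0., 1.], [-1., 0.]])   # generator of a rotation
    x0 = np.array([1., 0.])
    t, h = 0.3, 1e-6

    x = lambda s: expm(A * s) @ x0
    derivative = (x(t + h) - x(t - h)) / (2 * h)
    assert np.allclose(derivative, A @ x(t), atol=1e-5)
    print("x'(t) = A x(t) holds numerically")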

4.8 Application 2: Markov Chain


In this section, we are going to introduce a model that is broadly used in various situations
to simulate random events: the Markov chain. In this model, we assume that the probability
of one event occurring only depends on the previous event. We will call this the Markov
property.
Example 4.8.1. Suppose that there is an ant on a real line. At each second, the ant
goes left with probability 1/2 and goes right with probability 1/2, and the ant moves a
unit each second. Then this process has Markov property, since the distribution of the
position of the ant only depends on the position at the previous second.

If there are only finitely many states, then we can use a matrix to record the change
of distribution each step.
Definition 4.8.1. Suppose that there are n states X1 , . . . , Xn in a Markov process,
and the probability of the state Xj transitioning to Xi is mij , then the matrix M is the
transition matrix corresponding to this Markov process.

Example 4.8.2. Suppose that there is an ant on a simple graph G with n vertices
v1 , . . . , vn . At each step, the ant randomly picks a neighbor and moves to the chosen vertex.
Then this is a Markov process, and the transition matrix M satisfies

    mij = 1/dj if vi , vj are neighbors, and mij = 0 otherwise,

where dj is the degree of vj . This is called a random walk on G.

Property 4.8.1. If M is a transition matrix, then every column sums to 1, and every
entry is non-negative. Moreover, suppose that at the k-th step the distribution is p (i.e.
the probability of the process being at state Xi is pi ); then the distribution at step k + 1
is M p.

In this kind of problem, we are usually interested in the probability distribution after
a long time. That is, given an initial distribution π, we want to calculate limk→∞ M^k π.
Since this is just some convex combination of the columns of limk→∞ M^k , it suffices to
calculate limk→∞ M^k .


Lemma 4.8.1. Suppose that M is a matrix whose entries are non-negative and whose
columns all sum to 1. Then the eigenvalue of M with the largest absolute value is 1.
Moreover, if for any proper nonempty subset S of [n], there exists i ∈ S, j ̸∈ S such that
mij ̸= 0, then the eigenvector of M corresponding to 1 is unique (up to scaling).

Sketch of Proof. It suffices to prove the statement for M^T . It is clear that 1 is an eigen-
value since M^T 1 = 1 where 1 is the all-one vector. Suppose that λ is an eigenvalue of
M^T with eigenvector v, and suppose that vi has the largest absolute value among all the
entries of v; then

    |λ||vi | = |λvi | = | ∑_{j=1}^{n} mji vj | ≤ ∑_{j=1}^{n} mji |vj | ≤ ∑_{j=1}^{n} mji |vi | = |vi |.

Therefore |λ| ≤ 1.
Now if for any proper nonempty subset S of [n] there exists i ∈ S, j ̸∈ S such that
mij ̸= 0, suppose that M^T v = v. Let S be the set of indices

    {k : |vk | ̸= maxi |vi |}

and suppose that j ∈ S, i ̸∈ S are such that mji ̸= 0. Then

    |vi | = | ∑_{k=1}^{n} mki vk | ≤ ∑_{k=1}^{n} mki |vk | ≤ ∑_{k=1}^{n} mki |vi | = |vi |.

Since the equality holds, we have for each k either mki = 0 or |vk | = |vi |. Taking k = j
gives a contradiction, so S must be empty.
Lemma 4.8.2. Suppose that M is a matrix whose entries are non-negative and whose
columns all sum to 1. Then E1′ = E1 .

Sketch of Proof. Suppose not; then ||(M^T )^k ||∞ → ∞ as k → ∞, where ||(M^T )^k ||∞ is the
largest absolute value of the entries of (M^T )^k . However, we have (M^T )^k 1 = 1, which is
a contradiction since the entries of (M^T )^k are non-negative (hence bounded by 1).
Theorem 4.8.1. Suppose that M is a transition matrix corresponding to a Markov
process, and every eigenvalue of M with absolute value 1 is 1, then limk→∞ M k exists.
Moreover, the columns of limk→∞ M k are eigenvectors of M corresponding to 1.

Sketch of Proof. Consider the Jordan canonical form of M . Every block either has diag-
onal entries with absolute value less than 1, or is a single 1 (by Lemma 4.8.2 the blocks
for the eigenvalue 1 have size 1). In the first case, its powers tend to 0. In the second
case, its powers are always 1. Therefore limk→∞ M^k exists. It is then clear that every
column of limk→∞ M^k is an eigenvector corresponding to 1.
Corollary 4.8.1. Suppose that M is a transition matrix corresponding to a Markov
process, every eigenvalue of M with absolute value 1 is 1, and the corresponding eigen-
vector is unique up to scaling; then the distribution tends to that eigenvector (normalized
to be a distribution) no matter what the initial distribution is.

Corollary 4.8.2. (Random walk on undirected graph) If an undirected graph G is


non-bipartite and connected, then the probability distribution always tends to v where
1T v = 1 and vi : vj = di : dj . Here 1 is the all-one vector.


Sketch of Proof. One can prove that M has an eigenvalue λ ̸= 1 such that |λ| = 1 if and
only if G is bipartite (in fact λ = −1 in this case), and dim E1 > 1 if and only if G is not
connected.
Corollary 4.8.3. (Random walk on directed graph) If a directed graph G is aperiodic
(i.e. the gcd of lengths of cycles is 1) and strongly connected, then the probability
distribution tends to a unique distribution.

Sketch of Proof. One can prove that M has an eigenvalue λ ̸= 1 such that |λ| = 1 if and
only if G is periodic (in fact λ is a primitive k-th root of unity if and only if G has period
k), and dim E1 > 1 if and only if G is not strongly connected.

4.9 Perron-Frobenius Theorem


In the previous section there is something striking lurking behind: it seems that
matrices with non-negative entries have well-behaved eigenvalues and eigenvectors. In this
section, we will spell out the actual phenomenon.
Definition 4.9.1. If a matrix A/a vector v has non-negative entries, then we say that
it is non-negative and denote this by A ≥ 0/v ≥ 0. If the entries are furthermore positive,
then we say that it is positive and denote this by A > 0/v > 0.

Definition 4.9.2. For a matrix A/a vector v, |A|/|v| is the matrix/vector obtained by
taking absolute value entry-wise.

Definition 4.9.3. The spectral radius of a square matrix A is the largest absolute value
of an eigenvalue of A. This is denoted by ρ(A).

Theorem 4.9.1. (Perron’s Theorem) Suppose that A > 0 is a square matrix, then:
(1) ρ(A) is an eigenvalue of A of multiplicity 1;
(2) If λ is an eigenvalue of A such that |λ| = ρ(A), then λ = ρ(A);
(3) The eigenvector corresponding to ρ(A) is positive;
(4) If v is an eigenvector that is non-negative, then Av = ρ(A)v.

Sketch of Proof. Suppose that λ is an eigenvalue of A such that |λ| = ρ(A) and v is the
corresponding eigenvector, then

ρ(A)|v| = |λv| = |Av| ≤ |A||v| = A|v|.

Suppose for the sake of contradiction that the equality does not hold, then u := A|v| −
ρ(A)|v| ≥ 0 and u ≠ 0. Since Au > 0, there exists ε > 0 such that Au > ερ(A)A|v|. This
shows that

A²|v| − ρ(A)A|v| > ερ(A)A|v|,

or equivalently

(A / ((1 + ε)ρ(A))) A|v| > A|v|.
Let B = (1 + ε)^{−1} ρ(A)^{−1} A, then B(A|v|) > A|v|. As a consequence, B^k(A|v|) > A|v|
for any k ∈ N. However, ρ(B) = (1 + ε)^{−1} < 1 implies that lim_{k→∞} B^k = 0. Therefore


0 ≥ A|v|, which is a contradiction since A|v| > 0. Therefore 0 < A|v| = ρ(A)|v|, and so ρ(A) is an
eigenvalue and the corresponding eigenvector |v| is positive. Moreover, since the equality
must hold in the triangle inequality, arg(v_1), . . . , arg(v_n) are all the same. This implies that λ = ρ(A).
Now if v ≥ 0 is an eigenvector of A such that Av = λv for some λ, let w be the
eigenvector of A^T corresponding to ρ(A^T) = ρ(A); by the argument above w > 0, so
w^T v > 0. Then λ w^T v = w^T A v = ρ(A) w^T v, which implies that λ = ρ(A).
It remains to show that ρ(A) is of multiplicity 1. We begin by showing that
dim E_{ρ(A)} = 1. Suppose that u, v > 0 satisfy Au = ρ(A)u, Av = ρ(A)v (recall that any
eigenvector corresponding to ρ(A) can be scaled so that this holds). Then A(u − cv) =
ρ(A)(u − cv) for any c. Pick c such that one entry of u − cv is zero and the remaining
entries are non-negative. If u − cv ≠ 0, then ρ(A)(u − cv) = A(u − cv) > 0, contradicting
the zero entry. Therefore u = cv, and so dim E_{ρ(A)} = 1.

Finally, we want to show that dim E'_{ρ(A)} = 1. To show this, we first notice that
if dim E'_{ρ(A)} > 1, then ||C^k||_∞ → ∞ where C = ρ(A)^{−1} A. Here ||C^k||_∞ is the largest
absolute value of the entries of C^k. Since ρ(C) = 1, there is a positive vector x such
that Cx = x. Therefore C^k x = x for any k ∈ N, which bounds the entries of C^k (as
x > 0), a contradiction. Therefore dim E'_{ρ(A)} = 1.
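Here is a quick numerical sanity check of the theorem (an illustrative script of mine, not from the notes; the matrix size and seed are arbitrary assumptions):

import numpy as np

rng = np.random.default_rng(0)
A = rng.uniform(0.1, 1.0, size=(5, 5))   # a positive square matrix

w, V = np.linalg.eig(A)
k = int(np.argmax(np.abs(w)))            # index of the dominant eigenvalue
rho, v = w[k], V[:, k]

print(abs(rho.imag) < 1e-12)             # rho(A) is real, i.e. lambda = rho(A)
print(np.sum(np.isclose(np.abs(w), np.abs(rho))))   # only one eigenvalue attains rho(A)
v = v.real / v.real[0]                   # rescale the eigenvector
print(np.all(v > 0))                     # it can be chosen positive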

Using the same method, one can prove that


Theorem 4.9.2. (Perron’s Theorem for non-negative matrices) Suppose that A ≥ 0 is
a square matrix, then:
(1) ρ(A) is an eigenvalue of A;
(2) There exists an eigenvector corresponding to ρ(A) that is non-negative.

Note that because of the possible presence of zero entries, the uniqueness and the positivity
may both fail in this case. Frobenius found that most of the uniqueness and positivity still
hold in a certain class of non-negative matrices.
Definition 4.9.4. A non-negative square matrix A is said to be irreducible if there
does not exist a permutation matrix P such that P^{−1}AP is of the form

[ X  Y ]
[ 0  Z ]

where X, Z are square matrices.

Property 4.9.1. For a non-negative n by n matrix A, the following are equivalent:
(1) A is irreducible;
(2) There exists k ∈ N such that (I_n + A)^k > 0;
(3) (I_n + A)^{n−1} > 0.

Sketch of Proof. Let G be a directed graph with an edge from v_i to v_j if and only
if a_{ji} is positive. Then the three conditions are all equivalent to G being strongly
connected.
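A small script illustrating criterion (3) (my own sketch, not from the notes; the two test matrices are arbitrary choices):

import numpy as np

def irreducible(A):
    # Criterion (3): (I_n + A)^(n-1) > 0 entrywise.
    n = A.shape[0]
    B = np.linalg.matrix_power(np.eye(n) + A, n - 1)
    return bool(np.all(B > 0))

A1 = np.array([[0.0, 1.0], [1.0, 0.0]])  # a 2-cycle: strongly connected
A2 = np.array([[1.0, 1.0], [0.0, 1.0]])  # block upper triangular: reducible
print(irreducible(A1), irreducible(A2))  # True False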

Theorem 4.9.3. (Frobenius' Theorem) Suppose that A ≥ 0 is an irreducible square
matrix, then:
(1) ρ(A) > 0 is an eigenvalue of A of multiplicity 1;
(2) The eigenvector corresponding to ρ(A) is positive;
(3) If v is an eigenvector that is non-negative, then Av = ρ(A)v.


Sketch of Proof. Let B = (I_n + A)^{n−1}, then by the Jordan canonical form it is clear that
the eigenvalues of B are (1 + λ)^{n−1} where λ ranges over the eigenvalues of A. Therefore
ρ(B) = (1 + ρ(A))^{n−1}. By Perron's theorem the multiplicity of (1 + ρ(A))^{n−1} is 1 w.r.t. B,
and so the multiplicity of ρ(A) is at most (and therefore equal to) 1 w.r.t. A. Moreover,
suppose that v is an eigenvector corresponding to an eigenvalue λ of A, then

Bv = ∑_{i=0}^{n−1} \binom{n−1}{i} A^i v = ∑_{i=0}^{n−1} \binom{n−1}{i} λ^i v = (1 + λ)^{n−1} v.

Therefore (2), (3) are both direct corollaries of Perron's theorem. Now suppose that
ρ(A) = 0, then Av = ρ(A)v = 0 for some v > 0, which forces A = 0, a contradiction.
Therefore ρ(A) > 0.

4.10 Random Problem Set


1. (4.1) Categorize the possible linear operators on V over F2 such that dim V = 2 by
direct computation.
2. (4.2) Show that for any monic f ∈ F[x] that is not constant, there exists a matrix
whose characteristic polynomial is f.
3. (4.2) We’ve shown that tr(AB) = tr(BA) if A, B are of the same size. Actually,
the more general holds: AB, BA actually share the same characteristic polynomial.
This is trivial when at least one of A, B is invertible. For the general case, consider
the matrices ñ ô ñ ô
xIn A In 0
C= ,D =
B In −B xIn
and show that AB, BA share the characteristic polynomial.
4. (4.3) If M is a real square matrix such that tr(M²) < 0, show that M is not
diagonalizable.

5. (4.3) Suppose that M is a 2 by 2 real matrix such that tr(M)² > 4 det(M). Show
that M is diagonalizable.

6. (4.4) Prove that the minimal polynomial of an n by n matrix has degree at most
n² without using the Cayley-Hamilton theorem.
7. (4.6) Construct two matrices that have the same rank, same characteristic polyno-
mial and same minimal polynomial, but are not similar to each other. Hint: the
smallest example is 4 by 4.
8. (4.6) Suppose that A is an n by n matrix such that 0 is an eigenvalue of A with
multiplicity m. Show that for any v ∈ V,

A^k v = 0 for some k ∈ N ⇔ A^m v = 0.

9. (4.6) For any k ∈ N, compute A^k where

A = (1/2) [ 2  1  2  −2 ]
          [ 0  2  1  −1 ]
          [ 0  0  1   0 ]
          [ 0  1  1   0 ]


by either the Cayley-Hamilton theorem or the Jordan canonical form. Compute
lim_{k→∞} A^k accordingly.

10. (4.6) Suppose that A, B are two diagonalizable matrices that commute, i.e. AB =
BA. Show that A, B are simultaneously diagonalizable, i.e. there exists an invert-
ible matrix Q such that QAQ−1 , QBQ−1 are both diagonal.

11. (4.7) Using the result in Section 4.7, show that if x^n + a_{n−1}x^{n−1} + · · · + a_0 has roots
λ_1, . . . , λ_k with multiplicities m_1, . . . , m_k, then the general solution of

f^{(n)}(t) + a_{n−1} f^{(n−1)}(t) + · · · + a_1 f′(t) + a_0 f(t) = 0

is P_1(t)e^{λ_1 t} + · · · + P_k(t)e^{λ_k t} where P_i is a polynomial of degree at most m_i − 1.

12. (4.7) Find all finite dimensional vector spaces W of differentiable functions f in
one variable such that f ∈ W implies f′ ∈ W.

13. (4.9) Show that there exists an irreducible matrix A with an eigenvalue λ such
that λ ≠ ρ(A) but |λ| = ρ(A).


Chapter 5

Symmetry and Group Action

In this chapter we will introduce the concept of group action. This is an important tool
to learn about groups, and we can derive several important theorems from it. Before
jumping into it, we will first begin with a concrete example of group action, namely
symmetry. While developing the theory of symmetry, we will at the same time encounter
with more examples of finite non-abelian groups.

5.1 Isometry
Let P be the space Rn. Note that I write P instead of Rn here because I don't want
to endow P with the vector space structure. Instead, I want to think of P “geometrically”.
Specifically, for any point x, y in P , I only care about the vector from x to y, denoted by
x − y. This is a vector in Rn .
Definition 5.1.1. An isometry of P is a bijective function f : P → P such that
||f (x) − f (y)|| = ||x − y|| for any x, y ∈ P . In other words, an isometry is a distance-
preserving bijection. All isometries form a group, which is called the isometry group (of
P ). We will denote it by Mn .

Example 5.1.1. For any a ∈ Rn , translation ta sending x to x + a is an isometry. If


P is 2-dimensional, then for any x ∈ P and θ ∈ R, rotation ρx,θ rotating the whole plane
about x by angle θ is also an isometry. For any line l ⊆ P , reflection rl about l is also an
isometry.

To determine the isometry group of P , we first notice that f is somehow a linear


operator.
Lemma 5.1.1. Suppose that f is an isometry, then x − y = x′ − y′ implies f(x) − f(y) =
f(x′) − f(y′). Therefore the map f̄ : Rn → Rn sending v to f(x + v) − f(x) is well-defined
(independent of the choice of x). In this case, f̄ is a linear operator.

Sketch of Proof. Suppose that x − y = x′ − y′, then x, y, x′, y′ form a parallelogram.
Elementary geometry shows that f(x), f(y), f(x′), f(y′) must also form a parallelogram,
and so f(x) − f(y) = f(x′) − f(y′). Therefore f̄ is well-defined. For any u, v ∈ Rn,

f̄(u + v) = f(x + u + v) − f(x) = (f(x + u + v) − f(x + v)) + (f(x + v) − f(x)) = f̄(u) + f̄(v).

This shows in particular that f̄(−v) = −f̄(v). Now for any v ∈ Rn and c ∈ R+ we know that

||f̄(v) + f̄(cv)|| = ||f(x + (c + 1)v) − f(x)|| = (c + 1)||v|| = ||v|| + c||v|| = ||f̄(v)|| + ||f̄(cv)||.

Therefore, by the equality case of the triangle inequality, there exists d ≥ 0 such that
f̄(cv) = d f̄(v). It is then clear that d = c.
Now f̄ is a linear operator that preserves the norms of the vectors. This is a special
kind of linear operator that we will be interested in.
Property 5.1.1. Suppose that the matrix form of f̄ corresponding to the standard
basis is M. Then M^T = M^{−1}. Conversely, if M^T = M^{−1}, then M preserves the norms
of the vectors.

Sketch of Proof. Since ||v||² = v^T v, we have that

M^T = M^{−1} ⇔ a^T M^T M a = a^T a ∀a ∈ Rn ⇔ ||Ma|| = ||a|| ∀a ∈ Rn.

Definition 5.1.2. Suppose that M is a square matrix such that M T = M −1 , then we


say that M is orthogonal. All n by n orthogonal matrices form a group, which is denoted
by On . The subgroup containing orthogonal matrices with determinant one is denoted
by SOn .

Theorem 5.1.1. The map π : Mn → On sending f to the matrix of f̄ is an epimorphism
with kernel Tn, the group of translations.

Corollary 5.1.1. Let φ : On → Aut(Tn) be the homomorphism such that φ_M(t_a) =
t_{Ma}. Then Mn ≅ Tn ⋊_φ On.

Sketch of Proof. Choose a point x ∈ P. Then we can embed On into Mn by sending M
to ϕ_{x,M}, where

ϕ_{x,M}(y) = x + M(y − x).

Then

φ_M(t_a) = ϕ_{x,M} t_a ϕ_{x,M}^{−1} = t_{Ma}.

Note that there is no canonical isomorphism between Mn and Tn ⋊_φ On: we have
to make a choice of the origin, and there is no reason at all to favor one point over the
others. We can see choosing an origin as choosing a coordinate, and thus we have the
change of coordinate formula:
Property 5.1.2. (Change of coordinate formula) For any x, y ∈ P and M ∈ On , the
two isometries ϕx,M , ϕy,M are related by

ϕy,M = ta ϕx,M = ϕx,M tb

where a = y − ϕx,M (y), b = x − ϕy,M −1 (x).


5.2 Isometry of Plane


To get a better understanding of isometry, let’s look into the case where P is 2-dimensional.
From the previous section, we know that it suffices to understand the structure of O2 .
Moreover, since
det(M )2 = det(M M T ) = 1
for any M ∈ O2 , we know that det(M ) = ±1. Therefore, if det(M ) = 1 then M ∈ SO2 ;
if det(M) = −1, then

M [ 1   0 ]
  [ 0  −1 ]

is in SO2. Therefore it is enough to know what matrices there are in SO2.
Property 5.2.1. Every matrix in SO2 is a rotation matrix. That is, for any M ∈ SO2
there exists θ ∈ R such that

M = [ cos θ  −sin θ ]
    [ sin θ   cos θ ].

In this note, we will denote this by ρ_θ.

Property 5.2.2. Every matrix in O2 is either of the form ρ_θ or ρ_θ r, where r is the
reflection

[ 1   0 ]
[ 0  −1 ].

From now on, for simplicity, we will identify P with R2 , which induces an embedding
of O2 into M2 .
Property 5.2.3. Every isometry in M2 is either of the form t_a ρ_θ or t_a ρ_θ r.

Definition 5.2.1. In the first case, we say that the isometry is orientation-preserving.
In the second case, we say that the isometry is orientation-reversing.

Corollary 5.2.1. There are three kinds of isometries of the plane: translations, rotations,
and glide reflections. Here the glide reflection g_{v,l} is the composition of the reflection
about l and the translation by v, where v is parallel to l.

Sketch of Proof. We first deal with the case ta ρθ . For any x ∈ P we know that ρθ =
t−ρx,θ (0) ρx,θ . Therefore ta ρθ = ρx,θ if and only if a = ρx,θ (0) = (I2 − ρθ )x. Note that

det(I2 − ρθ ) = charρθ (1) = 2(1 − cos θ).

Therefore there is always such x unless θ ∈ 2πZ. Hence the isometries in this case can
only be translation or rotation.
Now the remaining is the second case t_a ρ_θ r. We will first show that ρ_θ r is a reflection
about a line passing through the origin. In other words, there exists ϕ such that ρ_θ r =
ρ_ϕ r ρ_{−ϕ}. A simple calculation shows that picking ϕ = θ/2 suffices. Now it remains to
show that we can somehow "reduce" the component of a that is perpendicular to the
line. For simplicity, let's WLOG assume that the reflection is r, the reflection about the
x-axis. Then

t_a r = g_{a_1, y = a_2/2},

the glide reflection about the line y = a_2/2 by the vector (a_1, 0).
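One can check the first case numerically; the following sketch (mine, with arbitrary sample values, not from the notes) verifies that t_a ρ_θ coincides with the rotation about x = (I₂ − ρ_θ)^{−1} a:

import numpy as np

theta, a = 0.7, np.array([1.0, 2.0])
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

x = np.linalg.solve(np.eye(2) - R, a)   # solves (I - R)x = a

y = np.array([3.0, -1.0])               # an arbitrary test point
lhs = R @ y + a                         # (t_a rho_theta)(y)
rhs = x + R @ (y - x)                   # rho_{x,theta}(y)
print(np.allclose(lhs, rhs))            # True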


From this, one can see that translation is somehow the degeneration of rotation.
Intuitively, a translation is a rotation about a "point at infinity". Therefore we sometimes
see translations as a part of generalized rotations.
Property 5.2.4. The composition of two generalized rotations is a generalized rotation.
The composition of two glide reflections is also a generalized rotation. The composition
of one generalized rotation and one glide reflection is a glide reflection.

5.3 Symmetry group


Definition 5.3.1. Given a figure F on a plane, the symmetry group of F is the
subgroup of isometries that map F to F.

Example 5.3.1. The symmetry group of a circle consists of the rotations about the center
and the reflections about the lines passing through the center. The symmetry group of the
x-axis contains the translations t_a where a is parallel to the x-axis (among other isometries,
such as the reflection about the x-axis itself).

We can also go in the reverse direction. That is, given a subgroup of the isome-
tries, can we find a figure that has the given subgroup as its symmetry group?
The simplest way, for a given subgroup G of the isometries, is to first put a figure F′ on
the plane and then draw the figure F = ∪_{g∈G} g(F′). It is clear that every element g in G
fixes F. The only thing that we need to do is to make F′ as asymmetric as possible, and
hope that the symmetry group of F is precisely G.
Example 5.3.2. Consider the subgroup G generated by g_{(1,0),y=0} and ρ_{(0,0),π}. The figure
that the funny stickman generates is the following diagram.

Figure 5.1: Running stickmen

This figure's symmetry group is precisely G. For instance, this figure is fixed by t_{(2,0)}.
On the other hand, t_{(2,0)} = g_{(1,0),y=0}².

Example 5.3.3. Now consider the subgroup SO2 w.r.t. some coordinate. Suppose that
F is a figure that is fixed by SO2 , then F is a union of some circles centered at the origin.


Therefore F is also fixed by elements in O2. This shows that SO2 is not realizable as the
symmetry group of a figure.

The reason that the copy-and-paste method fails here is that if we try to generate F with
a stickman, the copies will just overlap with each other too much, and one cannot tell that
the figure is generated by a stickman at all. This guides us to consider the subgroups of
isometries that are not "that dense."


Figure 5.2: Sad stickman losing its contour

Definition 5.3.2. A subgroup G of isometries is discrete if there exists ε > 0 such


that for any nonzero a ∈ R2 or θ ∈ R such that ta ∈ G or ρθ ∈ G, we have ||a|| > ε or
|θ| > ε. In other words, it is discrete if the length of a nontrivial translation/the angle of
a nontrivial rotation cannot be arbitrarily small.

Theorem 5.3.1. For every discrete subgroup G of the isometries of plane, there exists
a figure having G as its symmetry group.

Sketch of Proof. We won’t give a complete proof of it in this note. However, after we
deduce some useful properties of discrete symmetry groups, one can do a case-by-case
analysis and construct the corresponding figure of each type of discrete symmetry group.

5.4 Lattice Group and Point Group


We know that we can kind of split Mn into two parts: Tn and On. For any subgroup G
of Mn, we can do a similar thing.
Definition 5.4.1. Suppose that G is a subgroup of the isometry group. The lattice
group L of G is the additive subgroup of Rn such that v ∈ L if and only if t_v ∈ G. The
point group Ḡ of G is the image of G under π, where π is the epimorphism defined above
from Mn to On.

Property 5.4.1. G/L ≅ Ḡ (identifying L with the subgroup of translations in G).


Therefore, to understand the discrete symmetry groups, we only need to understand
the discrete lattice groups, the discrete point groups, and the interaction between the
lattice groups and the point groups. The first two will be studied in the next section.
Let's look at the last one first.
Property 5.4.2. For any v ∈ L and M ∈ Ḡ, we have Mv ∈ L.

Sketch of Proof. Suppose that ϕ_{x,M} ∈ G, then

t_{Mv} = ϕ_{x,M} t_v ϕ_{x,M}^{−1} ∈ G.

5.5 Crystallographic Restriction


In this section, we will prove the crystallographic restriction for the symmetry groups in
a plane. This is a powerful theorem that restricts the possible point groups once the lattice
group is nontrivial. But before that, let's first determine all the discrete lattice
groups and discrete point groups in a plane.
Property 5.5.1. Suppose that L is the lattice group of G ≤ M2 where G is discrete,
then one of the following holds:
(1) L = {0};
(2) L = {na|n ∈ Z} for some 0 ̸= a ∈ R2 ;
(3) L = {na + mb|m, n ∈ Z} for some linearly independent a, b ∈ R2 .

Sketch of Proof. Suppose that L ≠ {0}, then choose a nonzero a′ ∈ L. Since L is discrete, there is
a ∈ Ra′ ∩ L that is nonzero but closest to 0. It is then clear that Ra ∩ L = Za.
If Za ̸= L, choose b′ such that a, b′ are linearly independent. Consider the map
π : R2 → R mapping xa + yb′ to y. Then π(L) is nontrivial and discrete (this is kind
of not obvious but is left as an exercise), and so by the same argument we know that
π(L) = Zr for some r ∈ R. Suppose that π(b) = r, then by ker π ∩ L = Za we know that
L = Za + Zb.
Property 5.5.2. Suppose that Ḡ is the point group of G ≤ M2 where G is discrete,
then one of the following holds:
(1) Ḡ = {ρ_{2kπ/n} | k = 0, . . . , n − 1} for some n ∈ N;
(2) Ḡ = {ρ_{2kπ/n}, ρ_{2kπ/n} r | k = 0, . . . , n − 1} for some n ∈ N, where r is a reflection.

Sketch of Proof. It suffices to show that when det(Ḡ) = {1} we have

Ḡ = {ρ_{2kπ/n} | k = 0, . . . , n − 1}

for some n ∈ N. Let θ be the smallest positive angle such that ρ_θ ∈ Ḡ, then Ḡ = ⟨ρ_θ⟩.
Now choose the smallest positive integer n satisfying nθ ≥ 2π. It is clear that the equality
nθ = 2π must hold.
Note that in the first case, it is clear that Ḡ ≅ Cn. In the second case we see that
Ḡ ≅ ⟨x, r⟩ where x^n = 1, r² = 1 and r x r^{−1} = x^{−1}. This is called the dihedral group, and
is denoted by Dn. An easy way to visualize this is to think of it as the symmetry group
of a regular n-gon.


Example 5.5.1. D1 ≅ C2, D2 ≅ K4, D3 ≅ S3.

Theorem 5.5.1. (Crystallographic restriction) Suppose that G is a discrete symmetry
group of a plane figure whose lattice group is nontrivial, then every rotation in G has
order 1, 2, 3, 4 or 6.

Sketch of Proof. Since G is discrete, there is a shortest nonzero vector a in L. Let ρ_θ be
a rotation in Ḡ of order n, so WLOG θ = 2π/n. Suppose first that n > 6. Then ρ_θ a − a
is a shorter nonzero vector in L (recall Property 5.4.2), which is a contradiction. Now
suppose that n = 5, then ρ_θ² a + a is a shorter nonzero vector in L, which is again a
contradiction. Therefore any rotation in G can only have order 1, 2, 3, 4 or 6.

Remark. In fact, one can actually find all the discrete symmetry group on a plane.
There are 7 types of discrete symmetry groups, called the frieze groups, with one-
dimensional lattice groups and 17 types of discrete symmetry groups, called the wallpaper
groups, with two-dimensional lattice groups.

Remark. We will show that every orientation-preserving orthogonal matrix is a rota-


tion in R3 . A consequence of this is that the crystallographic restriction also holds in the
three dimensional case.

5.6 Group Action


In fact, isometry and symmetry are just special cases of a more abstract concept. For
example, we can think of the automorphisms of a group G as "isometries" of G. The automor-
phisms form a group, and for any subgroup H of G we can consider the automorphisms
that fix H, which is an analogue of symmetry.
There is a more general concept of which isometry and automorphism are both special
cases.
Definition 5.6.1. Let G be a group and S be a set. A function ∗ : G × S → S is a
group action if it satisfies the following conditions:
(1) 1 ∗ s = s ∀s ∈ S;
(2) (gg′) ∗ s = g ∗ (g′ ∗ s) ∀g, g′ ∈ G, s ∈ S.
If there is no ambiguity, we will drop the ∗ sign, and we will say that G acts on S.

Example 5.6.1. The group of isometries Mn acts on Rn . The group of automorphism


Aut(G) acts on G. The symmetric group Sn acts on the index set {1, . . . , n}.

Property 5.6.1. For any element g ∈ G, the map S → S defined by s 7→ gs is bijective.

Corollary 5.6.1. The data of a group action is precisely a group homomorphism from
G to the symmetric group of S.

Definition 5.6.2. A permutation representation of a group action G on a set S is


a homomorphism from G to the symmetric group of S such that g is mapped to the
permutation s 7→ gs.


When it comes to group action, the orbits and the stabilizers are usually considered.
Definition 5.6.3. Suppose that G acts on a set S. For any s ∈ S the orbit of s is the
set
orbG (s) := {gs|g ∈ G}.
The stabilizer of s is the subgroup

Gs := {g ∈ G|gs = s}.

Example 5.6.2. Consider the action of Dn on the set of vertices of a regular n-gon. The
orbit of a vertex is the whole set. The stabilizer of a vertex is the subgroup {1, r} where
r is the reflection about the line passing through the vertex and the center.

Note that the binary relation defined by s ∼ s′ ⇔ s ∈ orbG (s′ ) is an equivalence


relation. Therefore the equivalence classes, in this case the orbits, form a partition of S.
Example 5.6.3. Consider the action of D4 on the set of pairs of vertices of a square.
The pairs of adjacent vertices form an orbit. The pairs of non-adjacent vertices form
another orbit. These two orbits form a partition of the pairs of vertices.

Besides, orbits and stabilizers have some really nice relations. One can think of
stabilizers as subgroups and orbits as the corresponding cosets.
Property 5.6.2. Suppose that G acts on S, and s is an element of S. Then
(1) G_{gs} = g G_s g^{−1} for any g ∈ G;
(2) (Counting formula for group action) If G is finite, then |orb_G(s)| |G_s| = |G|.

Sketch of Proof. (1) is trivial. To show (2), it suffices to show that elements in orbG (s)
are in bijection with the cosets of Gs .
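The counting formula is easy to verify computationally. The following sketch (my own, with D4 acting on the vertices {0, 1, 2, 3} of a square; the permutation encoding is an assumption of the example) checks |orb_G(s)||G_s| = |G|:

rot = (1, 2, 3, 0)                     # rotation by pi/2 on the vertices
ref = (0, 3, 2, 1)                     # reflection through vertex 0

def compose(p, q):                     # (p o q)(i) = p(q(i))
    return tuple(p[q[i]] for i in range(4))

G = {(0, 1, 2, 3)}                     # generate D4 by closure
while True:
    new = {compose(g, s) for g in G for s in (rot, ref)} - G
    if not new:
        break
    G |= new

s = 0
orbit = {g[s] for g in G}
stab = {g for g in G if g[s] == s}
print(len(orbit) * len(stab) == len(G))   # 4 * 2 == 8: True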

Example 5.6.4. Suppose that H ≤ G is a subgroup, then G acts on the cosets of H


naturally. The stabilizer of H is H itself. Moreover, for any coset aH, its stabilizer is
aHa−1 . Therefore H is normal if and only if any h ∈ H fixes all the cosets.

Example 5.6.5. Consider the rotational group G that fixes a tetrahedron. This group
acts on the set of faces of tetrahedron. It is clear that the orbit of any face is the whole set.
Moreover, for any face f , there are three elements in G that fix f , namely the rotation
about the axis through the center of f by 0, 2π/3 and 4π/3. Therefore,

|G| = |Gf || orbG (f )| = 12.

Now consider the group action of G on the set of all edges. It is clear that the orbit is
still the whole set. Therefore by the counting formula, for any edge e there should be two
elements in G that fix e. We can construct these elements explicitly: suppose that e′ is
the unique edge that has no intersection with e, then the rotation by π about the line
passing through the midpoints of e, e′ fixes e and e′.
Therefore for any face we find two corresponding elements that have order 3, and for
any pair of disjoint edges we find one corresponding element that has order 2. Including
the identity, we find exactly 12 elements in G, 8 of them being of order 3 and 3 of them
being of order 2. In fact, G is isomorphic to A4 .


5.7 Finite Subgroup of SO3


As an application of what we’ve learned, we will classify all the finite subgroups of SO3
in this section.
Property 5.7.1. Every orientation-preserving matrix in O3 is a rotation about an axis.

Sketch of Proof. It suffices to show that if M is a 3 by 3 orthogonal matrix that has
determinant one, then M has 1 as an eigenvalue. Note that

char_M(λ) = det(λI₃ − M) = λ³ det(I₃ − λ^{−1}M) = −λ³ det(M) det(λ^{−1}I₃ − M^{−1})
          = −λ³ char_{M^T}(λ^{−1}) = −λ³ char_M(λ^{−1}).

Therefore λ and λ^{−1} have the same multiplicity. Since there are 3 eigenvalues (counted
with multiplicity) and their product is 1, at least one of them is 1.

Definition 5.7.1. A unit vector p is a pole of a rotation ρ if ρp = p. A spin is a pair


(ρ, p) where ρ is a rotation and p is one of its poles.

Note that every non-identity rotation has exactly two poles, and there is no preferred
choice of which should be included in the spin. This can be explained by the fact that
when two people see a rotation through the axis from opposite directions, one would say
that the rotation is clockwise while the other sees counterclockwise.
Now suppose that G is a finite subgroup of SO3 . Define the poles of G to be the poles
of non-identity rotations in G. If p is a pole of g, then hp is a pole of hgh−1 . Therefore
G acts on its poles.
Lemma 5.7.1. Let G be a finite subgroup of SO3, and let G act on its poles. Suppose
that O1, . . . , Ok are the orbits, and for any p_i ∈ O_i the order of the stabilizer of p_i is r_i. Then

∑_{i=1}^{k} (1 − 1/r_i) = 2 − 2/|G|.

Sketch of Proof. We will do a double counting on the spins of G. Note that every non-
identity rotation corresponds to two spins, and so there are 2|G| − 2 spins in total. On
the other hand, for any pole p, if there are r_p rotations that fix p, then there are (r_p − 1)
spins with pole p, since one of the r_p rotations is the identity. Hence,

∑_p (r_p − 1) = 2|G| − 2.

Since r_p only depends on the orbit it lies in, we can replace it by r_i. Suppose that
|O_i| = n_i, then we have

∑_{i=1}^{k} n_i (r_i − 1) = 2|G| − 2.

By the counting formula, n_i r_i = |G|. Therefore we can get the desired identity by dividing
both sides by |G|.

Theorem 5.7.1. The finite subgroups of SO3 are the following:
(1) Cn, the n rotations about an axis;
(2) Dn, the symmetry group of a regular n-gon;
(3) T ≅ A4, the symmetry group of a tetrahedron;
(4) O ≅ S4, the symmetry group of a cube or an octahedron;
(5) I ≅ A5, the symmetry group of a dodecahedron or an icosahedron.

Remark. In the two dimensional case, Dn is not an orientation-preserving symmetry
group. However, a reflection of a plane can be realized as a rotation by π in R3 and hence
is orientation-preserving.

Sketch of Proof. We begin by solving for the r_i. Note that on the left hand side of the identity
in Lemma 5.7.1, each term is at least one half (as each r_i ≥ 2), while the right hand side
is less than 2. Therefore k ≤ 3, and we can do a case-by-case analysis. We WLOG assume
r1 ≤ r2 ≤ r3.
If k = 1, then the left hand side is less than 1 while the right hand side is at least 1.
Therefore there is no solution in this case.
If k = 2, then

1/r1 + 1/r2 = 2/|G|.

Since r1, r2 both divide |G|, this can only occur when r1 = r2 = |G|. Therefore all the
rotations in G are about the same axis, and so G ≅ Cn.
If k = 3, then it is clear that r1 = 2 (otherwise the left hand side is at least 2). We first
suppose that r2 = 2. Then r3 = n and |G| = 2n for some n. Consider the poles p, p′ that
form the third orbit. There are n rotations that fix p, p′ and the other half interchange
p, p′. Therefore there are n rotations by 2tπ/n about the axis through p, p′ and n rotations
by π about axes perpendicular to it. This is the dihedral group Dn.
If r2 > 2, then note that the identity can be rewritten as 1/r1 + 1/r2 + 1/r3 = 1 + 2/|G| > 1.
Since 1/2 + 1/4 + 1/4 = 1, we must have r2 = 3, and since 1/2 + 1/3 + 1/6 = 1, we must
have r3 = 3, 4 or 5. Therefore there are three possibilities:
(i) r1 = 2, r2 = 3, r3 = 3, |G| = 12: G is the tetrahedral group T .
(ii) r1 = 2, r2 = 3, r3 = 4, |G| = 24: G is the octahedral group O.
(iii) r1 = 2, r2 = 3, r3 = 5, |G| = 60: G is the icosahedral group I.
The proof that G is the tetrahedral/octahedral/icosahedral group is of more geometry
than of algebra and is thus omitted. One way to do this is to identify the orbit whose
elements form a regular polyhedron.
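The arithmetic part of the case analysis can be reproduced by a brute-force search (an illustrative script of mine, not from the notes; the search bounds are ad hoc assumptions):

from fractions import Fraction as F

solutions = []
for n in range(2, 61):                            # candidate group order |G|
    for r1 in (2, 3):                             # 3/r1 > 1 forces r1 <= 3
        for r2 in range(r1, 2 * n + 1):
            rest = 1 + F(2, n) - F(1, r1) - F(1, r2)   # should equal 1/r3
            if rest <= 0:
                break
            if rest.numerator == 1 and rest.denominator >= r2:
                solutions.append((r1, r2, rest.denominator, n))

# Besides the dihedral family (2, 2, n/2, n), only three sporadic solutions occur:
print([s for s in solutions if s[:2] != (2, 2)])
# [(2, 3, 3, 12), (2, 3, 4, 24), (2, 3, 5, 60)]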

5.8 Random Problem Set


1. (5.1) Show that:
(1) {ϕ_{x,M} | x ∈ P, M ∈ On} generates Mn;
(2) The rotation matrices generate SOn. Here A is a rotation matrix if and only if
A is of the form

P [ ρ_θ  0        ] P^{−1}
  [ 0    I_{n−2}  ]

where ρ_θ is a 2 by 2 rotation matrix and P is a permutation matrix.
Conclude that the two dimensional rotations together with a reflection generate all
the isometries.


2. (5.2) Suppose that ABC is a triangle on a plane, and D, E, F are points in the
exterior of ∆ABC such that ∆DBC, ∆AEC, ∆ABF are isosceles and ∠BDC =
2α, ∠CEA = 2β, ∠AFB = 2γ where α + β + γ = π.
(1) Consider the isometry ρ_{F,2γ} ρ_{E,2β} ρ_{D,2α}. Show that this is the identity map.
(2) Consider the image of D under this composition of isometries. Conclude that
∆DEF is a triangle that has angles α, β, γ.

3. (5.3) Show that the symmetry group of an equilateral triangle is isomorphic to S3 .

4. (5.3) Suppose that F is a nonempty figure on a plane such that the symmetry group
of F contains all rotations about a given point x. Show that either F is the whole
plane or the symmetry group of F is the same as the one of a circle centered at x.

5. (5.4) (Not hard but takes time) Find the seven frieze groups. A frieze pattern is
a figure whose symmetry group is a frieze group. For each frieze group, find a
corresponding frieze pattern.

6. (5.6) Consider the hypercube Qn of dimension n that has vertices (a1, a2, . . . , an)
where a_i = 0, 1. Compute the order of the symmetry group of Qn, and furthermore
describe the symmetry group explicitly.

7. (5.7) An Archimedean polyhedron is a non-regular convex polyhedron whose faces
are regular polygons and whose symmetry group can send any vertex to any other
one. Show that if G is one of T, O, I and P is a G-orbit that is finite and forms a
convex polyhedron whose faces are regular polygons, then P forms an Archimedean
polyhedron. Try to name as many Archimedean polyhedra as possible in this way.
(Note: There are 13 Archimedean polyhedra.)


Chapter 6

Advanced Group Theory

Now that we have group action as a new tool, we can get back to groups and discover
more useful properties. In this chapter, one can see that different group actions reveal
different properties of a group. Although it may be puzzling why a specific group action
is considered, it might be better to simply memorize the result first, for the result itself
is usually easier to understand than its proof.

6.1 Cayley’s Theorem


Given a group G. One of the first group actions that come to our minds may be the
group action of G on itself defined by left multiplication. This gives us Cayley’s theorem.
Theorem 6.1.1. (Cayley’s) Every group G can be embedded in the symmetric group
of itself.

Sketch of Proof. Consider the group action of G on itself defined by left multiplication.
For any g ∈ G, if g fixes every element, then gx = x ∀x ∈ G, which implies g = 1.
Therefore the permutation representation of this group action is a monomorphism from
G to the symmetric group of G.

Corollary 6.1.1. If G is a finite group, then G can be embedded into S|G| .

Note that the order of S_{|G|} is |G|!, which is much larger than |G|. Therefore this
theorem is indeed not that useful practically. Theoretically, this theorem implies that
the symmetric groups are the most complicated ones. This theorem however does not reduce
the study of finite groups to that of symmetric groups: people still understand the
structures of finite groups mainly through simple groups.
This group action actually does not give us too much, so let's turn to another group
action.

6.2 Class Equation


Now consider the group action of G on itself defined by conjugation, namely g∗x = gxg −1 .
Then we can consider the orbits and the stabilizers in this case. Since this is a special
case that people usually consider, the orbits and the stabilizers in this case are given
other names.


Definition 6.2.1. The orbit of an element g is called the conjugacy class of g, denoted
by Cg . The stabilizer of an element g is called the centralizer, denoted by CG (g). The
number of orbits is called the class number of G.

Property 6.2.1. |Cg ||CG (g)| = |G| for any g ∈ G, and the conjugacy classes form a
partition of G.

Example 6.2.1. Consider the tetrahedral group T. There is the identity, eight rotations
about a vertex by ±2π/3, and three rotations by π about the axes through the midpoints
of pairs of opposite edges. The conjugacy class of the identity only contains the identity.
Now suppose that ρ is a rotation about a vertex v by θ, and ρ′ is another rotation about
a vertex v′ by θ.
ρ is a rotation about a vertex v by θ, and ρ′ is another rotation about a vertex v ′ by θ.
Then ρ′ = f ρf −1 for any f ∈ T that maps v to v ′ , and so ρ, ρ′ are conjugates. Since
there are four vertices, we find four members in Cρ . Note that ⟨ρ⟩ is of order three and
is contained in the centralizer of ρ. Since 4 × 3 = 12 = |T |, the conjugacy class of ρ is of
order exactly 4.
It is easy to show that the rest of the rotations form a conjugacy class. Therefore
there is a conjugacy class of order 1, a conjugacy class of order 3 and two conjugacy
classes (corresponding to θ = ±2π/3 respectively) of order 4. A simple sanity check:
1 + 3 + 4 + 4 = 12, and so the conjugacy classes form a partition of T .
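The class sizes above are easy to confirm by brute force (an illustrative script of mine, not from the notes, identifying T with A4 acting on the four vertices):

from itertools import permutations

def compose(p, q):
    return tuple(p[q[i]] for i in range(4))

def inverse(p):
    q = [0] * 4
    for i, j in enumerate(p):
        q[j] = i
    return tuple(q)

def sign(p):
    return (-1) ** sum(p[i] > p[j] for i in range(4) for j in range(i + 1, 4))

A4 = [p for p in permutations(range(4)) if sign(p) == 1]

seen, sizes = set(), []
for g in A4:
    if g not in seen:
        cls = {compose(compose(h, g), inverse(h)) for h in A4}
        seen |= cls
        sizes.append(len(cls))
print(sorted(sizes))        # [1, 3, 4, 4]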

The information of conjugacy classes of a group makes it easier to determine all of its
normal subgroups.
Property 6.2.2. A subset N of a group G is a normal subgroup if and only if it is a
subgroup that is a union of conjugacy classes.

Example 6.2.2. The conjugacy classes of T have orders 1, 3, 4, 4, respectively. If N is
a normal subgroup, then it must contain the identity and be a union of conjugacy
classes. Since |N| must divide |T|, the only possible combinations of conjugacy classes
that N can contain are 1, 1 + 3 and 1 + 3 + 4 + 4. It is easy to verify that these are
subgroups, so these are the normal subgroups of T.

Note that since the conjugacy classes form a partition, the sum of their orders is the
order of the group. Together with the counting formula, we get:
Theorem 6.2.1. (Class equation) Suppose that C1, . . . , Ck are the conjugacy classes of
a finite group G, and for each i = 1, . . . , k pick a representative g_i. Then

|G| = ∑_{i=1}^{k} |C_i| = ∑_{i=1}^{k} [G : C_G(g_i)].

Usually, we will isolate the conjugacy classes of order 1. They form a subgroup that
is called the center.
Definition 6.2.2. Suppose that G is a group. Its center Z is the subgroup containing
s such that gs = sg for all g ∈ G.

Corollary 6.2.1. Suppose that C1, . . . , Ck are the conjugacy classes of a finite group G
that have orders greater than 1, and for each i = 1, . . . , k pick a representative g_i. Then

|G| = |Z| + ∑_{i=1}^{k} [G : C_G(g_i)].

We can similarly write a class equation for group actions.

Theorem 6.2.2. (Class equation for group action) Suppose that G acts on a set S and
O1, . . . , Ok are the orbits of sizes greater than 1, and Z is the subset of S whose elements
are fixed by every element in G. Suppose that s_i is a representative of the orbit O_i, then

|S| = |Z| + ∑_{i=1}^{k} [G : G_{s_i}].

6.3 Normalizer
We can furthermore extend the group action to the subsets of G. The stabilizers are
something that one may consider.
Definition 6.3.1. Suppose that S is a subset of G. The stabilizer of S under conjuga-
tion is called the normalizer of S, which is denoted by NG (S).

Property 6.3.1. N_G(S) is a subgroup of G. Furthermore, ⟨S⟩ is normal in N_G(S). If
S is a subgroup, then N_G(S) is the largest subgroup of G in which S is normal.

Theorem 6.3.1. (N/C Theorem) Suppose that H ≤ G, then N_G(H)/C_G(H) is isomorphic
to a subgroup of Aut(H). Here C_G(H) = ∩_{h∈H} C_G(h).

Sketch of Proof. Let φ : N_G(H) → Aut(H) be the homomorphism such that φ(n) sends
h to nhn^{−1}. Then φ is well-defined, and ker φ = C_G(H).

6.4 Cauchy’s Theorem


Suppose that a group has order n, then the existence of an element with order p implies
that p|n. When p is a prime, the converse is also true.
Theorem 6.4.1. Suppose that p is a prime and G is a finite group. Then there exists
g ∈ G such that g has order p if and only if p||G|.

Sketch of Proof. It suffices to show the existence of such g when p | |G|. Consider the set

S := {(g1, . . . , gp) ∈ G^p | g1 · · · gp = 1}.

Since for any g1, . . . , g_{p−1} ∈ G there is a unique g_p such that g1 · · · gp = 1, the size of S
is |G|^{p−1}. Now consider the group action of C_p on S that permutes the tuples cyclically.
The class equation is

|G|^{p−1} = |S| = |Z| + ∑_{i=1}^{k} [C_p : (C_p)_{s_i}] = |Z| + pk,

where k is the number of orbits of size greater than 1.


Therefore p | |Z|. Note that (1_G, . . . , 1_G) ∈ Z, so there exists (g1, . . . , gp) ∈ Z that is not
(1_G, . . . , 1_G). Now since every cyclic permutation fixes (g1, . . . , gp), we have g1 = · · · = gp.
Therefore g1^p = 1_G and g1 ≠ 1_G.
Remark. When |G| is not divisible by p, there is only one element in Z, so |G|^{p−1} ≡ 1
(mod p). Therefore one can derive Fermat's little theorem from this.

6.5 Sylow p-group


In the previous section we saw that if p is a prime that divides the order of a group, then
there is a subgroup of order p. This holds more generally for prime powers.
Definition 6.5.1. A group whose order is a power of p is called a p-group. Suppose that
G is a group such that p^k | |G| but p^{k+1} ∤ |G| where k ≥ 1, then a subgroup of G that
has order p^k is called a Sylow p-subgroup.

Theorem 6.5.1. (Sylow First Theorem) Suppose that G is a group of order p^k m where
p is a prime and m is not divisible by p. Then for every s = 1, . . . , k there exists a
subgroup of G that has order p^s.

Sketch of Proof. Let S be the set of the subsets of G of size p^s, and consider the group
action of G on S defined by left multiplication. For every A ∈ S we can construct an
injection from G_A to A mapping g to ga where a is a chosen element in A. Therefore
|G_A| ≤ |A| = p^s.
Now suppose that A1, . . . , At are the representatives of the orbits, then

\binom{p^k m}{p^s} = |S| = ∑_{i=1}^{t} [G : G_{A_i}].

Note that

v_p(\binom{p^k m}{p^s}) = k − s,

so there exists i such that v_p([G : G_{A_i}]) ≤ k − s, which implies v_p(|G_{A_i}|) ≥ s. Since
|G_{A_i}| ≤ p^s, the equality |G_{A_i}| = p^s holds. Therefore G_{A_i} is the desired subgroup.
Example 6.5.1. Take G = S4. Then there are three Sylow 2-subgroups according to
Example 2.10.1. It is also easy to see that there are four Sylow 3-subgroups.

Theorem 6.5.2. (Sylow Second Theorem) Suppose that G is a finite group, P is a
p-subgroup and H is a Sylow p-subgroup. Then there exists g ∈ G such that P ≤ gHg^{−1}.
In particular, any two Sylow p-subgroups are conjugates.

Sketch of Proof. Let S be the set of the left cosets of H, and consider the group action
of P on S defined by left multiplication. Then the class equation is

m = [G : H] = |S| = |Z| + ∑_{i=1}^{t} [P : P_{s_i}].

Note that since P_{s_i} is a proper subgroup of P and P is a p-group, we have p | [P : P_{s_i}].
Combined with the fact that p ∤ m, we have that p ∤ |Z|, and in particular there is a coset
gH that is fixed by


every element in P. Therefore PgH = gH, and so P(gHg^{−1}) = gHg^{−1}. This shows that
P ≤ gHg^{−1}.

Theorem 6.5.3. (Sylow Third Theorem) Let n be the number of Sylow p-subgroups
of G. Then n divides the order of G, and n ≡ 1 mod p.

Sketch of Proof. Consider first the group action of G on the set of Sylow p-subgroups
defined by conjugation. By the second theorem we know that the orbit is the whole set,
so n, being the size of an orbit, divides the order of G.
Now consider the group action of G on the set S of the subsets of size p^k (which is
the special case s = k of the group action considered in the first theorem). Then

\binom{p^k m}{p^k} = |S| = ∑_{i=1}^{t} [G : G_{A_i}].

Note that the number of the Sylow p-subgroups is the number of i such that |G_{A_i}| = p^k;
those indices contribute [G : G_{A_i}] = m, while all the other terms are divisible by p, so

∑_{i=1}^{t} [G : G_{A_i}] ≡ mn (mod p).

Therefore, since \binom{p^k m}{p^k} ≡ m (mod p),

n ≡ m^{−1} \binom{p^k m}{p^k} ≡ m^{−1} m ≡ 1 (mod p).

Example 6.5.2. |S4 | = 24. There are three Sylow 2-subgroups, and 3 ≡ 1 mod 2,
3|24. There are four Sylow 3-subgroups, and 4 ≡ 1 mod 3, 4|24.
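These counts can be confirmed by a brute-force enumeration (an illustrative script of mine, not from the notes; it relies on the fact, true here, that the relevant subgroups are generated by at most two elements):

from itertools import permutations

def compose(p, q):
    return tuple(p[q[i]] for i in range(4))

def closure(gens):
    H = {(0, 1, 2, 3)} | set(gens)
    while True:
        new = {compose(a, b) for a in H for b in H} - H
        if not new:
            return frozenset(H)
        H |= new

S4 = list(permutations(range(4)))
subgroups = {closure((a, b)) for a in S4 for b in S4}
print(sum(1 for H in subgroups if len(H) == 8))   # 3 Sylow 2-subgroups
print(sum(1 for H in subgroups if len(H) == 3))   # 4 Sylow 3-subgroups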

Sylow theorems, especially the third one, are really helpful to determine the possible
structures of groups of a given order.
Example 6.5.3. Let G be a group of order 15. Then there is exactly one Sylow 3-subgroup
M and one Sylow 5-subgroup N according to the Sylow third theorem. Therefore by Exercise
14 in Chapter 2, G ≅ M × N ≅ Z/15Z, and so there is one group of order 15 up to
isomorphism.

Example 6.5.4. Now consider the group G of order 21. By Sylow third theorem, there
is only one Sylow 7-subgroup N , and so N is normal. Now when p = 3, we don’t know if
there is one Sylow 3-subgroup or 7 Sylow 3-subgroups. If there is only one, then by the
same argument used above we can see that G ∼ = Z/21Z, so let’s consider the second case.
Suppose that H is a Sylow 3-subgroup, then G = N ⋊ H. Let x be a generator of N
and y be a generator of H, then yxy^{−1} = x^i for some i = 1, . . . , 6. Since y is of order
3, we have i³ ≡ 1 (mod 7). This shows that i = 1, 2, or 4.
If i = 1, then it is easy to see that the semidirect product is indeed a direct product,
which is a contradiction. If i = 2, then we get another group G′ which is non-abelian.
It is easy to verify that there are 7 Sylow 3-subgroups in G′. If i = 4, then the resulting
group is also isomorphic to G′ via the homomorphism that sends x to x and y to y^{−1}.
Therefore there are two groups of order 21 up to isomorphism.


6.6 Free Group and Relation


We know that we can describe a dihedral group Dn as a group generated by x, y and
defined by the relations

x^n = y² = xyxy = 1.
What we omitted was what it actually means. For example, it is hard to determine the
underlying set by merely staring at these relations. In this section, we will go through the
rigorous defintion of the groups defined by the relations. A nice way to do this is to first
construct the largest group that has the least relations, and then “collapse” the group by
identifying some non-trivial elements with the identity. We will first do the former.
Definition 6.6.1. A word in a set S is an expression (or a formal product)

a1 a2 · · · an

where a_i ∈ S ∪ S^{−1}. A reduction is to replace any substring of the form s s^{−1} (or s^{−1} s)
by the empty string. A word is reduced if no reduction can be done on it.

Definition 6.6.2. A free group FS generated by a set S is defined on the set of all
reduced words in S. The product of two words is the reduced word of the concatenation.
A group is free if it is isomorphic to a free group generated by some subset of it. The
rank of a free group is the cardinality of its freely generating set.

Example 6.6.1. The free group generated by a single element {x} consists of the
reduced words x^n, x^{−n} or ∅ where n ∈ N. Therefore the free group generated by one
element is isomorphic to (Z, +).
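Reduced words are easy to model concretely; the following sketch (mine, not from the notes, with an ad hoc encoding where "x-" stands for x^{−1}) implements reduction and the product of two words:

def inv(a):
    return a[:-1] if a.endswith("-") else a + "-"

def reduce_word(w):
    out = []
    for a in w:
        if out and out[-1] == inv(a):
            out.pop()                 # cancel a pair s s^{-1} or s^{-1} s
        else:
            out.append(a)
    return out

def multiply(u, v):                   # product in the free group
    return reduce_word(u + v)

print(reduce_word(["x", "y", "y-"]))            # ['x']
print(multiply(["x", "y", "y-"], ["x-", "z"]))  # ['z']

A single left-to-right pass with a stack suffices because cancellations can only happen at the boundary of what has already been reduced.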

Property 6.6.1. (Universal property of free group) Suppose that G is a group generated
by S, then there is a unique homomorphism ϕ from F_S to G such that ϕ(s) = s for every
s ∈ S. In other words, there is a unique homomorphism ϕ : F_S → G extending the
inclusion of S into G.
Sketch of Proof. For every reduced word a1 a2 · · · an in F_S it is easy to see that ϕ(a1 · · · an)
has to be ϕ(a1) · · · ϕ(an). To verify that such ϕ is a homomorphism, we just need to show that
reduction does not affect the value that ϕ takes, which is true since ϕ(a)ϕ(a^{−1}) = a a^{−1} =
1.
Corollary 6.6.1. Suppose that G is generated by S. Then G is a quotient group of
FS .

Sketch of Proof. This follows by the fact that ϕ is surjective.


There are several useful facts regarding free groups that I won’t show the proofs here.


Theorem 6.6.1. (Nielsen-Schreier theorem) Every subgroup of a free group G is also
a free group. In particular, if G is of rank n, then a subgroup of finite index e is of rank
1 + e(n − 1).

Remark. There is an elementary but complicated proof for the case where the subgroup is
finitely generated. The main idea is to do something similar to row reduction to reduce
the length of the generating set.

Theorem 6.6.2. (Grushko's theorem) If a group G is freely generated by a set of n
elements, and B is a generating subset of G that also has n elements, then B generates
G freely.

We know that every group generated by S can be represented by a quotient group of
F_S. We can also do the converse: give the generating set and some relations on F_S, and
obtain a new group. Take the dihedral group for example. The description of Dn is that
Dn is a quotient group of F_{x,y}, and we want to find an appropriate quotient F_{x,y}/N of
F_{x,y} such that x^n = y² = xyxy = 1. Therefore we want that x^n, y², xyxy ∈ N. Also we
would like N to be as small as possible. This gives rise to the notion of normal closure.
Definition 6.6.3. Given a group G and a subset S. The normal closure of S is the
intersection of all normal subgroups N of G containing S.

Property 6.6.2. The normal closure of S is a normal subgroup of G. Furthermore, if
N is a normal subgroup of G that contains S, then N contains the normal closure of S.

Remark. Just like the definition of the group that a subset generates, there is an
explicit definition of the normal closure. This is somehow boring and is left as an exercise
for the readers.

Definition 6.6.4. The group G defined by the presentation ⟨S | R⟩, where S is a set
and R is a set of words in S, is the quotient F_S/N where N is the normal closure of R in
F_S. S is called the generating set of G, and the elements of R are called the relators.

Example 6.6.2. The presentation of a dihedral group is ⟨x, y | x^n, y², xyxy⟩. The
presentation of a cyclic group is ⟨x | x^n⟩. The presentation of Z × Z is ⟨x, y | xyx^{−1}y^{−1}⟩.

Property 6.6.3. (Universal property of presentation) Suppose that G is defined by the
presentation ⟨S | R⟩, and G′ is a group generated by S such that all the relators are the
identity in G′. Then there is a unique homomorphism ϕ : G → G′ such that ϕ(s) = s.

Sketch of Proof. By the universal property of free group, there is a homomorphism ϕ′


from F_S to G′ such that ϕ′(s) = s. Since all the relators vanish in G′, we have R ⊆ ker ϕ′.
Therefore the normal closure N of R in F_S is also contained in ker ϕ′, and so there is
a natural homomorphism from G = F_S/N to G′ ≅ F_S/ker ϕ′.
Although we can construct a lot of groups by writing down tons of presentations, it
is actually hard to work with the presentations because of the complexity of dealing with
free groups. It is even hard to determine whether two reduced words are the same under
the given relations.

6.7 Todd-Coxeter Algorithm


In this section, we will introduce a method to compute a certain type of group defined
by a given presentation. The main idea is that if we know explicitly G = ⟨S | R⟩ and a
subgroup H of G, then we can compute the group action of G on the cosets of H. For some
reason, we usually consider the right action and the right cosets in this scenario.
Property 6.7.1. (Rules used for Todd-Coxeter Algorithm)
(1) The action of the group G on the cosets of H is transitive. That is, for any cosets
Ha, Hb there is an element g such that (Ha)g = Hb.
(2) For every g ∈ G, the action of g on the cosets is a permutation on the cosets.
(3) Any relator fixes all the cosets.
(4) All the elements in H fix the coset H.

What we are going to do is to use these four easy facts to compute the action. To
demonstrate this, let’s take our favorite dihedral group D4 as an example.
Example 6.7.1. Let G = ⟨x, y | x⁴, y², xyxy⟩ and H = ⟨x⟩ ≤ G. Then we know that
there is definitely a coset of H, namely H. For convenience, let's label it 1. We know
that x fixes H, and all the relators fix H. We can write this as the table.
x x x x   | y y   | x y x y
1 1 1 1 1 | 1 _ 1 | 1 1 _ _ 1

Within each relator block, consecutive entries of a row are related by the generator
written above the gap between them, and underscores mark entries that are not yet
determined. Now we don't know where y sends 1 to. Let's label it with 2 whatsoever.
Then we can fill in some blanks:
x x x x   | y y   | x y x y
1 1 1 1 1 | 1 2 1 | 1 1 2 _ 1
Therefore we know that the preimage of 1 under y is 2. We can hence fill in one more
blank:
x x x x   | y y   | x y x y
1 1 1 1 1 | 1 2 1 | 1 1 2 2 1

This shows that x sends 2 to 2. At this point, we know for sure that all the relators act
on 1 trivially. We still need to check this for 2:
x x x x   | y y   | x y x y
1 1 1 1 1 | 1 2 1 | 1 1 2 2 1
2 2 2 2 2 | 2 1 2 | 2 2 1 1 2

80
Hung-Hsun, Yu 6.7 Todd-Coxeter Algorithm

Now everything is compatible with the rules. Since the group action is transitive, we
know that the cosets are precisely 1 and 2. Although at this point we don’t know if 1 and
2 are distinct, there are no rules that tell us that they are the same, and we will prove
(informally) later that this shows that they are different. Therefore the index of ⟨x⟩ is 2,
which is consistent with the fact.
We can also run the same algorithm for ⟨y⟩, and we get the following table.

x x x x   | y y   | x y x y
1 2 3 4 1 | 1 1 1 | 1 2 4 1 1
2 3 4 1 2 | 2 4 2 | 2 3 3 4 2
3 4 1 2 3 | 3 3 3 | 3 4 2 3 3
4 1 2 3 4 | 4 2 4 | 4 1 1 2 4

Now note that x² does not act trivially on the cosets of ⟨y⟩, which shows that x² is not
the identity. Therefore |⟨x⟩| = 4, and so G is of order 8, which is again consistent with
the fact.
Besides the order of G and the indices of the subgroups, we also get a lot of bonus information
from it. For example, since x fixes all the cosets of ⟨x⟩, we know that ⟨x⟩ is normal. We also
get permutation representations of G. The first table gives us the representation ϕ(x) =
id, ϕ(y) = (1 2) and the second one gives us the representation ϕ(x) = (1 2 3 4), ϕ(y) =
(2 4). The second one happens to be injective, and so we successfully embed D4 into
S4.
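One can verify the second representation mechanically (an illustrative check of mine, not from the notes; permutations act on {0, 1, 2, 3} standing for the indices 1 to 4):

def compose(p, q):
    return tuple(p[q[i]] for i in range(4))

x = (1, 2, 3, 0)                  # the 4-cycle (1 2 3 4)
y = (0, 3, 2, 1)                  # the transposition (2 4)
e = (0, 1, 2, 3)

def word(*letters):               # evaluate a word letter by letter
    p = e
    for w in letters:
        p = compose(w, p)
    return p

print(word(x, x, x, x) == e)      # relator x^4: True
print(word(y, y) == e)            # relator y^2: True
print(word(x, y, x, y) == e)      # relator xyxy: True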

We can see that although we will definitely know the index of the subgroup (as long
as it is finite), we might have to do something else to get other information such as the
order of the subgroup/the group, or the order of a generator. A way to avoid this is to
take H to be the trivial group, at a cost in computational complexity.
Sometimes things do not go that well. There might be some collision of indices which
forces the table to collapse.
Example 6.7.2. Let G = ⟨x, y | x², y², yxyxy⟩ and H = ⟨x⟩. We can work out the table
similarly:

x x   | y y   | y x y x y
1 1 1 | 1 2 1 | 1 2 3 4 2 1
2 3 2 | 2 1 2 | 2 1 1 2 3 2

Now we see some collisions here: x sends 3 to 2 and sends 4 to 2. This shows that 3 = 4.
Also y sends 3 to 2 and 1 to 2, which shows that 3 = 1. So we can replace 3, 4 with 1:

x x   | y y   | y x y x y
1 1 1 | 1 2 1 | 1 2 1 1 2 1
2 1 2 | 2 1 2 | 2 1 1 2 1 2

Now there are collisions everywhere. For example, y sends 1 to 2 and also sends 1 to 1,
which shows that 1 = 2. Therefore the index of ⟨x⟩ is 1. In fact, we can directly see
y = 1 from the relation:

1 = yxyxy = y(yxyxy)y = xyx = x(xyx)x = y.

81
Hung-Hsun, Yu 6.8 Burnside’s Lemma

To state the algorithm formally is quite annoying (which is true for a lot of algorithms),
so I will state the algorithm informally since the above discussion should already say
enough about the algorithm. As a consequence, it is impossible to prove the correctness
formally, but I will demonstrate why the number of indices is the same as the number of
cosets.
Theorem 6.7.1. (Todd-Coxeter Algorithm) Given a group G = ⟨S | R⟩ and a subgroup
H. At each step, we can do one of the two following:
(1) Identify two indices if the rules force so;
(2) Choose a generator x and an index i such that the image of i under x is not
determined, and assign a new index j to it.
If we can complete the table such that all the four rules are satisfied after finitely
many steps, then the number of cosets is equal to the number of distinct indices in the
table.

Sketch of Proof. We can construct the map from the indices to the cosets inductively. It
is easy for (2). For (1), we have to show that the two indices correspond to the same
cosets at the beginning, which is true since we only do step (1) when we are forced to. It
remains to show that the map is bijective. The surjectivity is clear since the group action
is transitive. The injectivity then follows from the fact that the stabilizer of 1 contains
H.

6.8 Burnside’s Lemma


A lot of counting problems can be formulated as counting the orbits of a group
action. Burnside's lemma comes in handy in this situation.
Theorem 6.8.1. (Burnside's lemma) Suppose that G is a finite group that acts on a
set S. Then the number of orbits is

(1/|G|) ∑_{g∈G} |S^g|

where S^g is the set of elements that are fixed by g.

Sketch of Proof. The key is to spread the total contribution of an orbit uniformly over every
element of the orbit. Then the number of orbits is

∑_{s∈S} |orb_G(s)|^{−1} = |G|^{−1} ∑_{s∈S} |G_s| = |G|^{−1} |{(g, s) ∈ G × S | gs = s}| = (1/|G|) ∑_{g∈G} |S^g|.

Example 6.8.1. There is a ring with n beads, colored with m colors. We are going
to count the number of different colorings of the beads. Here two colorings are seen to
be the same if we can get one from rotating the other one.
Consider the action of C_n on the set S of all the sequences of n m-colored beads.
Suppose that x is the rotation by 2π/n. For every 0 ≤ i < n, we have |S^{x^i}| = m^{gcd(n,i)}.

Therefore the number of orbits is

(1/n) ∑_{i=0}^{n−1} m^{gcd(n,i)} = (1/n) ∑_{d|n} φ(n/d) m^d.


This is much faster than using the inclusion-exclusion principle.
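A direct implementation of both sides of the identity (my own sketch, not from the notes):

from math import gcd

def necklaces_burnside(n, m):
    return sum(m ** gcd(n, i) for i in range(n)) // n

def phi(k):
    return sum(1 for j in range(1, k + 1) if gcd(j, k) == 1)

def necklaces_divisors(n, m):
    return sum(phi(n // d) * m ** d for d in range(1, n + 1) if n % d == 0) // n

for n, m in [(4, 2), (6, 3), (7, 5)]:
    print(necklaces_burnside(n, m), necklaces_divisors(n, m))  # equal pairs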

Remark. If we let n = p be a prime, then we have that the number of colorings is

(1/p)(m^p + (p − 1)m) = m + (m^p − m)/p.

Hence we get another proof of Fermat's little theorem.

6.9 Random Problem Set


1. (6.1) Show that for any n ∈ N, there exists a group G such that |G| = n, and there
exists an embedding from G to Sm if and only if m ≥ n.
2. (6.2) Determine the conjugacy classes of the icosahedral group I. Conclude that I
is simple and isomorphic to A5 .
3. (6.2) Determine the conjugacy classes of the group GL3 (F2 ) and show that it is
simple.
4. (6.2) Show that for every group G, if |G| = p² where p is a prime, then G is abelian.
5. (6.2) (Hard) In this problem we will show that if G is a group of odd order whose
class number is s, then s ≡ |G| (mod 8).
(1) Show that the set S of non-commuting pairs in G² is of order |G|(|G| − s).
(2) Show that α : (x, y) ↦ (y, x) and β : (x, y) ↦ (y^{−1}, x) are bijections on S.
(3) Show that α² = β⁴ = αβαβ = 1. Conclude that ⟨α, β⟩ is isomorphic to the
dihedral group D4.
(4) Show that if |G| is odd, then no non-identity element in ⟨α, β⟩ fixes any element
in S. Conclude that s ≡ |G| (mod 8).
6. (6.5) Show that if p is a prime factor of |G| where G is a group, then the number
of subgroups of G that are of order p is congruent to 1 modulo p.
7. (6.5) Show that the smallest non-abelian group of odd order has order 21.
8. (6.5) Classify all the groups of order 28.
9. (6.5) Let G be a group and H be a Sylow p-subgroup of G. Show that NG (NG (H)) =
NG (H).
10. (6.6) Suppose that the presentation of G is ⟨S | R⟩ and the one of G′ is ⟨S′ | R′⟩.
Find a presentation of G × G′.
11. (6.7) Compute the order of the group ⟨x, y, z | x², y³, z⁵, xyz⟩. Can you identify
which group it is?
12. (6.7) What is the order of the largest group that is generated by two elements and
whose elements all have order 1 or 3?
13. (6.8) Count the number of different colorings of the faces of a tetrahedron. Here two
colorings are the same if we can obtain one from the other via a rigid motion.

Chapter 7

Bilinear Form

In section 3.5 we briefly talked about multilinear forms. In this chapter, we will focus
on the easiest case: bilinear forms. As the last chapter focusing on the structure of linear
algebra, we will endow real vector spaces with an extra “measurement” by defining
the inner product. We will also talk about the analogue in the complex case.
Most of this chapter will be about real and complex vector spaces. This allows us to
discover more properties, some of which are really striking. This is probably why
most linear algebra classes end with bilinear forms. So get ready, and let's go
through the last chapter of the linear algebra part!

7.1 Bilinear Form and Dual Space


Recall the definition in section 3.5:
Definition 7.1.1. A bilinear form B on a vector space V over F is a function B :
V × V → F such that it is linear with respect to each of the two entries.

As we did with linear transformations, we would like to write bilinear forms in
their matrix forms when V is finite dimensional.
Property 7.1.1. Suppose that V is finite dimensional and β = (β1 , . . . , βn ) is a basis.
Then for every x, y ∈ F n we have

B(xβ, yβ) = xᵀAy

where Aij = B(ei β, ej β).

Property 7.1.2. (Change of coordinate formula for bilinear form) Suppose that β and
β ′ are two bases of a finite dimensional space V and the matrix forms of B with respect
to β and β ′ are A and A′ , respectively. Then
A′ = ([id]^β_{β′})ᵀ A [id]^β_{β′}.
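
As a quick numerical sanity check of the change of coordinate formula (my own sketch; the matrix P below plays the role of [id]^β_{β′}):

    import numpy as np

    rng = np.random.default_rng(0)
    n = 4
    A = rng.standard_normal((n, n))  # matrix form of B with respect to beta
    P = rng.standard_normal((n, n))  # base-change matrix [id]^beta_{beta'}
    A_prime = P.T @ A @ P            # matrix form of B with respect to beta'

    # x, y are coordinates with respect to beta'; P @ x, P @ y are the
    # coordinates of the same vectors with respect to beta.
    x, y = rng.standard_normal(n), rng.standard_normal(n)
    assert np.allclose((P @ x) @ A @ (P @ y), x @ A_prime @ y)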

We will refrain from determining the intrinsic properties of bilinear forms because the
fun begins only when we consider bilinear forms that have certain good properties. This
will be discussed in the next section.
Now given a bilinear form B, we can actually construct some objects out of it. For
example, if we fix v, then B(v, ·) is a linear transformation from V to F . Therefore B is


a map from V to “the linear transformation from V to F .” This leads us to consider the
dual space.
Definition 7.1.2. Suppose that V is a vector space over F . The dual space of V is the
space V ∗ consisting of all linear transformations from V to F . The dual space naturally
possesses a structure of vector space.

Property 7.1.3. If V is finite dimensional, then V and V ∗ are of the same dimension.
Moreover, if β₁, . . . , βₙ form a basis of V , then the linear transformations Tᵢ such that

Tᵢ(∑_{j=1}^{n} aⱼβⱼ) = aᵢ

also form a basis.

Property 7.1.4. Given a bilinear form B, we can naturally construct maps
B₁, B₂ : V → V ∗ such that

B₁(v)(w) = B₂(w)(v) = B(v, w).

7.2 Start From Standard Inner Product


We know that in Rn , there is a well-behaving bilinear form: the standard inner product

⟨v, w⟩ = ∑_{i=1}^{n} vᵢwᵢ.

It has a lot of good properties that we hope the “good abstract bilinear forms”
should also have.
Property 7.2.1. The standard inner product has the following properties:
(1) ⟨v, w⟩ = ⟨w, v⟩ for any v, w ∈ Rn ;
(2) ⟨v, v⟩ ≥ 0 for any v ∈ Rn , and the equality holds if and only if v = 0.
(3) If v ∈ Rn satisfies that ⟨v, w⟩ = 0 for all w ∈ Rn , then v = 0.

Definition 7.2.1. A bilinear form B is symmetric if for any v, w ∈ V we have B(v, w) =


B(w, v).

Definition 7.2.2. A bilinear form B is non-degenerate if B(v, w) = 0 ∀w ∈ V implies


v = 0.

Definition 7.2.3. A bilinear form B on a R-vector space V is positive semi-definite if


B(v, v) ≥ 0 for any v ∈ V . If furthermore the equality holds only if v = 0, then we say
that B is positive definite.

Definition 7.2.4. If ⟨·, ·⟩ is a bilinear form on a R-vector space V that is positive


definite and symmetric, then we say that it is an inner product on V , and V is an inner
product space.

Back to the standard inner product case. The usual standard basis of Rn is e1 , . . . , en .
This basis interacts with the standard inner product very well.


Property 7.2.2. Suppose that ⟨·, ·⟩ is the standard inner product on Rn and e1 , . . . , en
is the standard basis. Then for any i ≠ j we have ⟨ei , ej ⟩ = 0, and for any i we have
⟨ei , ei ⟩ = 1.

Definition 7.2.5. Suppose that B is a bilinear form on a vector space V . A subset


S ⊆ V is said to be orthogonal if for any distinct v, w ∈ S we have B(v, w) = 0. If
furthermore we have B(v, v) = 1 for all v ∈ S, then we say that S is orthonormal.

Besides, we can also consider the standard inner product on Cn :


⟨v, w⟩ = ∑_{i=1}^{n} \overline{vᵢ}wᵢ.

Note that the standard inner product on Cn IS NOT A BILINEAR FORM. We are
willing to give up the linearity with respect to the first entry because this keeps the
positive-definiteness.
Definition 7.2.6. A function ⟨·, ·⟩ : V × V → C where V is a C-vector space is said to
be hermitian if it satisfies the following:
(1) ⟨cv₁ + v₂, w⟩ = c̄⟨v₁, w⟩ + ⟨v₂, w⟩;
(2) ⟨v, cw₁ + w₂⟩ = c⟨v, w₁⟩ + ⟨v, w₂⟩;
(3) ⟨v, w⟩ = \overline{⟨w, v⟩}.
We can also define orthogonal, orthonormal, positive definite, positive semi-definite
and inner product in a similar way as we did for bilinear form.

Now that we know which properties might give us something interesting, we are ready
to explore them one by one in the following sections.

7.3 Symmetric Form


Property 7.3.1. Suppose that B is a bilinear form on a finite dimensional vector space
V , and A is a matrix form of B. Then B is symmetric if and only if A is symmetric, i.e.
Aᵀ = A.

As always, it is good to diagonalize the matrix form. If a bilinear form B is diagonalizable, then this means that there is a basis β₁, β₂, . . . , βₙ such that B(βᵢ, βⱼ) = 0 for
any i ≠ j. This shows that B is symmetric. What we are interested in is whether the converse is
also true.
Property 7.3.2. A symmetric form on a finite dimensional vector space is diagonaliz-
able if and only if there exists an orthogonal basis.

Theorem 7.3.1. If V is a finite dimensional vector space over F and char(F ) ̸= 2, then
any symmetric form on V is diagonalizable.

Sketch of Proof. This can be proved by induction. The case that dim V = 1 is trivial.
Suppose that the statement is true for dim V = n − 1, then for dim V = n, suppose
that β₁, . . . , βₙ is a basis of V . If there is i such that B(βᵢ, βᵢ) ≠ 0, WLOG assume that


B(βₙ, βₙ) ≠ 0. Let

β′ᵢ = βᵢ − (B(βᵢ, βₙ)/B(βₙ, βₙ)) βₙ

for i = 1, . . . , n − 1. Then we have

B(β′ᵢ, βₙ) = B(βᵢ, βₙ) − (B(βᵢ, βₙ)/B(βₙ, βₙ)) B(βₙ, βₙ) = 0.

Let W be the subspace spanned by β′₁, . . . , β′ₙ₋₁. It is easy to show that every vector
in W is orthogonal to βₙ. We can then apply the induction hypothesis to W to get an
orthogonal basis of W , and this basis together with βₙ forms an orthogonal basis of V .
Now if there is no i such that B(βᵢ, βᵢ) ≠ 0, then take i ≠ j such that
B(βᵢ, βⱼ) ≠ 0 (otherwise there is nothing to prove). Let β′ⱼ = βᵢ + βⱼ. Then

B(β′ⱼ, β′ⱼ) = B(βᵢ, βᵢ) + 2B(βᵢ, βⱼ) + B(βⱼ, βⱼ) = 2B(βᵢ, βⱼ) ≠ 0.

Therefore we can take β1 , . . . , βj−1 , βj′ , βj+1 , . . . , βn as a new basis and apply the above
argument.
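
The inductive proof is itself an algorithm. Here is a minimal sketch of it in code (my own implementation over the reals, ignoring numerical stability concerns):

    import numpy as np

    def diagonalize_symmetric_form(A, tol=1e-12):
        """Return P whose columns form a B-orthogonal basis, so that
        P.T @ A @ P is diagonal. A is a real symmetric matrix (a symmetric form)."""
        n = A.shape[0]
        B = lambda u, v: u @ A @ v
        basis = [np.eye(n)[:, i] for i in range(n)]
        result = []
        while basis:
            k = next((i for i, b in enumerate(basis) if abs(B(b, b)) > tol), None)
            if k is None:
                # no remaining vector has B(b, b) != 0; fix one up as in the proof
                pair = next(((i, j) for i in range(len(basis))
                             for j in range(len(basis))
                             if i != j and abs(B(basis[i], basis[j])) > tol), None)
                if pair is None:        # B vanishes on the remaining span
                    result.extend(basis)
                    break
                i, j = pair
                basis[j] = basis[i] + basis[j]  # now B(b_j, b_j) = 2 B(b_i, b_j) != 0
                k = j
            b = basis.pop(k)
            result.append(b)
            # make the remaining vectors B-orthogonal to b
            basis = [v - (B(v, b) / B(b, b)) * b for v in basis]
        return np.column_stack(result)

    A = np.array([[0., 1.], [1., 0.]])
    P = diagonalize_symmetric_form(A)
    D = P.T @ A @ P
    assert np.allclose(D, np.diag(np.diag(D)))  # the off-diagonal entries vanish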

Although we know that when char(F ) ̸= 2 we can diagonalize the symmetric form,
the result is usually not unique. For example, if we scale one element of the orthogonal
basis by c, then the corresponding diagonal entry will be scaled by c2 . This tells us that
when F = R we can adjust the orthogonal basis so that the diagonal entries are 0, 1 or
−1, and when F = C we can make the diagonal entries 0 or 1. In fact, the numbers of
the 0’s, 1’s and −1’s are invariants.
Definition 7.3.1. The kernel of a symmetric form B is the set of all v ∈ V such that
B(v, w) = 0 for all w ∈ V .

Property 7.3.3. The number of zero diagonal entries in a diagonalization of a
symmetric form B is the dimension of its kernel and thus is independent of the choice of
orthogonal basis.

Theorem 7.3.2. (Sylvester's law of inertia) Suppose that B is a symmetric form on
an R-vector space V . The numbers of positive diagonal entries, negative diagonal entries
and zero entries of a diagonalization of B are independent of the choice of orthogonal basis.

Sketch of Proof. Suppose that β, β′ are two orthogonal bases with respect to B, that there
are r positive diagonal entries and s non-positive diagonal entries with respect to β, and
r′ positive diagonal entries and s′ non-positive diagonal entries with respect to β′. It
suffices to show that r = r′. Since r + s = r′ + s′ = dim V , it suffices in turn to show that
r + s′ ≤ dim V and r′ + s ≤ dim V .
Suppose that v₁, . . . , vᵣ are elements in β corresponding to the positive entries and
w₁, . . . , w_{s′} are elements in β′ corresponding to the non-positive entries. If a₁, . . . , aᵣ and
b₁, . . . , b_{s′} are reals such that

∑_{i=1}^{r} aᵢvᵢ + ∑_{i=1}^{s′} bᵢwᵢ = 0,


let v = ∑_{i=1}^{r} aᵢvᵢ and w = ∑_{i=1}^{s′} bᵢwᵢ. Then

0 = B(v, v + w) = B(v, v) + B(v, w) = ∑_{i=1}^{r} aᵢ²B(vᵢ, vᵢ) + B(v, w) ≥ B(v, w)

and

0 = B(w, v + w) = B(w, w) + B(v, w) = ∑_{i=1}^{s′} bᵢ²B(wᵢ, wᵢ) + B(v, w) ≤ B(v, w).

Therefore all the equalities must hold, and in particular

∑_{i=1}^{r} aᵢ²B(vᵢ, vᵢ) = 0.

This shows that a₁ = · · · = aᵣ = 0. Note that w₁, . . . , w_{s′} are linearly independent, so
b₁ = · · · = b_{s′} = 0. Therefore v₁, . . . , vᵣ, w₁, . . . , w_{s′} are linearly independent, and so r + s′ ≤
dim V . Similarly r′ + s ≤ dim V .
Definition 7.3.2. The signature of a symmetric form on a finite dimensional R-vector
space is the triple (n₀, n₊, n₋) of the numbers of zero, positive and negative
diagonal entries, respectively. A similar definition can be made for real symmetric matrices.

7.4 Hermitian Form


When thinking of hermitian forms, we ought to think of them as the “complex version” of
inner products on R-vector spaces. Therefore we can do the same things we did before.
Definition 7.4.1. For any matrix A/vector v, define the conjugate transpose to be
\overline{A}ᵀ/\overline{v}ᵀ. This is usually denoted by A∗/v∗.

Property 7.4.1. Suppose that V is a finite dimensional C-vector space, and ⟨·, ·⟩ is a
hermitian form on V . Then for any given basis β and any x, y ∈ Cn we have

⟨xβ, yβ⟩ = x∗ Ay

where Aij = ⟨ei β, ej β⟩.

Property 7.4.2. (Change of coordinate formula for hermitian form) Suppose that β
and β ′ are two bases of a finite dimensional C-vector space V and the matrix forms of a
hermitian form ⟨·, ·⟩ with respect to β and β ′ are A and A′ , respectively. Then

A′ = ([id]^β_{β′})∗ A [id]^β_{β′}.

Property 7.4.3. Given a square complex matrix A. The form defined on Cn by

⟨x, y⟩ = x∗ Ay ∀x, y ∈ Cn

is hermitian if and only if A is hermitian, i.e. A∗ = A.

Theorem 7.4.1. If V is a finite dimensional complex vector space, then any hermitian
form on V can be diagonalized.


Sketch of Proof. The proof is similar to the proof for symmetric forms. The only thing
we actually have to prove is that if there are x, y such that ⟨x, y⟩ ≠ 0, then there exists v
such that ⟨v, v⟩ ≠ 0. If ⟨x, x⟩ ≠ 0 or ⟨y, y⟩ ≠ 0, then we're done, so let's suppose that
they are both zero. Let v = ⟨x, y⟩x + y; then

⟨v, v⟩ = ⟨⟨x, y⟩x, y⟩ + ⟨y, ⟨x, y⟩x⟩ = 2⟨x, y⟩\overline{⟨x, y⟩} = 2|⟨x, y⟩|² > 0.

Theorem 7.4.2. (Sylvester's law of inertia) Suppose that ⟨·, ·⟩ is a hermitian form on
a C-vector space V . The numbers of positive diagonal entries, negative diagonal entries
and zero entries of a diagonalization of ⟨·, ·⟩ are independent of the choice of orthogonal
basis.

Sketch of Proof. The proof is identical to the one for symmetric forms.
Basically a lot of properties of symmetric form also hold for hermitian form if one
replaces transpose with conjugate transpose.
The hermitian matrix itself has a lot of interesting properties. We will introduce one
of them first.
Property 7.4.4. Any eigenvalue of a hermitian matrix is real.

Sketch of Proof. Suppose that A is hermitian and λ is an eigenvalue of A. Let v be a non-zero
vector such that Av = λv. Then

\overline{λ}v∗v = (Av)∗v = v∗A∗v = v∗Av = λv∗v.

Since v is non-zero, we have \overline{λ} = λ, which shows that λ is real.


Corollary 7.4.1. All eigenvalues of real symmetric matrices are real.

Although this corollary is easy, it is somewhat striking. Without the aid of complex
numbers, it would be very hard to show that the characteristic polynomial actually splits
over the reals.

7.5 Orthogonality
In this section, we will suppose that V is a finite dimensional vector space equipped with
a symmetric bilinear/hermitian form B. For simplicity, we will use ∗ to indicate the transpose
if B is a symmetric form, and the conjugate transpose if B is hermitian. For every
subspace W of V , we can consider the set of vectors that are orthogonal to every element
of W .
Definition 7.5.1. The orthogonal complement of a subspace W is the set

{v ∈ V |B(w, v) = 0 ∀w ∈ W }.

This is usually denoted as W ⊥ .

Property 7.5.1. B is non-degenerate if and only if V ⊥ = {0}. Moreover, B is non-


degenerate on a subspace W if and only if W ∩ W ⊥ = {0}.


Property 7.5.2. For any subspace W , we have W ⊆ (W ⊥ )⊥ .

Instead of inspecting the orthogonal complement, we can also consider the matrix
form to see if the symmetric/hermitian form is non-degenerate.
Property 7.5.3. B is non-degenerate if and only if its matrix form is invertible.

Sketch of Proof. Suppose that A is the matrix form of B with respect to a basis β. Then
for any v ∈ F n we have vβ ∈ V ⊥ if and only if

w∗ Av = 0 ∀w ∈ F n ,

which is equivalent to Av = 0.
Now if B is non-degenerate on W , then W + W ⊥ is a direct sum. We naturally hope
that the direct sum is V . In this way, we will have a canonical way splitting V into the
direct sum of W and another subspace. Suppose that this is true, then we will get a
projection P from V to W such that P² = P and im P = W . Moreover, for any v ∈ V
we have that P v − v is orthogonal to W .
Definition 7.5.2. If P is a linear operator such that P² = P and for any w ∈ im P, w′ ∈
ker P we have B(w, w′ ) = 0, then we say that P is an orthogonal projection.

Theorem 7.5.1. If W is a subspace such that B is non-degenerate on W , then V =


W ⊕ W⊥. Moreover, if w₁, . . . , wₛ is an orthogonal basis of W , then we have

P v = ∑_{i=1}^{s} (B(wᵢ, v)/B(wᵢ, wᵢ)) wᵢ

where P is the orthogonal projection onto W .

Sketch of Proof. Arbitrarily choose a basis w1 , . . . , ws of W and extend it to a basis of


V . Suppose the matrix form of B is
[ P  Q ]
[ R  S ]

where P is the matrix form of B restricted to W and R = Q∗. We want to make a base
change to make both Q and R zero. In this way, we will get a basis w₁, . . . , wₛ, v₁, . . . , vᵣ
such that B(wi , vj ) = 0, which shows that W + W ⊥ = V .
Suppose that the base-change matrix we choose is
[ I  A ]
[ 0  I ],

then the new matrix form of B is

[ I   0 ] [ P  Q ] [ I  A ]   [ P          P A + Q ]
[ A∗  I ] [ R  S ] [ 0  I ] = [ A∗P + R       ·    ].

Since B is non-degenerate on W , we have P is invertible, and so we can choose

A = −P⁻¹Q


to make both Q and R zero.


Now if w₁, w₂, . . . , wₛ is an orthogonal basis of W , then we have to prove that

v − ∑_{i=1}^{s} (B(wᵢ, v)/B(wᵢ, wᵢ)) wᵢ

is orthogonal to any element in W . First notice that B is non-degenerate on W , which
shows that B(wᵢ, wᵢ) ≠ 0, and so the formula makes sense. Now it suffices to check this
for the basis. For any j = 1, . . . , s we have

B(wⱼ, v − ∑_{i=1}^{s} (B(wᵢ, v)/B(wᵢ, wᵢ)) wᵢ) = B(wⱼ, v) − ∑_{i=1}^{s} (B(wᵢ, v)/B(wᵢ, wᵢ)) B(wⱼ, wᵢ) = 0.
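
A tiny numerical illustration of the projection formula, with B taken to be the standard inner product on R³ (my own example):

    import numpy as np

    def project(v, orthogonal_basis, B=lambda u, w: u @ w):
        # P v = sum_i B(w_i, v) / B(w_i, w_i) * w_i
        return sum((B(w, v) / B(w, w)) * w for w in orthogonal_basis)

    w1 = np.array([1., 1., 0.])
    w2 = np.array([1., -1., 0.])  # B(w1, w2) = 0: an orthogonal basis of the plane W
    v = np.array([3., 5., 7.])
    Pv = project(v, [w1, w2])
    assert np.allclose(Pv, [3., 5., 0.])  # the projection onto W
    assert np.allclose((v - Pv) @ w1, 0)  # v - Pv lies in the orthogonal complement
    assert np.allclose((v - Pv) @ w2, 0)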

Corollary 7.5.1. If B is non-degenerate and W is a subspace such that B is non-


degenerate on W , then B is also non-degenerate on W ⊥ . Furthermore, (W ⊥ )⊥ = W .

Sketch of Proof. Choose a basis of W and a basis of W ⊥ . We know that the two bases
together form a basis of V . Consider the matrix form of B with respect to this basis.
Then it is of the form

[ P  0 ]
[ 0  S ].
Since B is non-degenerate, we have that both P and S are invertible. This shows that B
is also non-degenerate on W ⊥ . Consequently, V = (W ⊥ ) ⊕ (W ⊥ )⊥ , and so (W ⊥ )⊥ = W
because they have the same dimension.

From these facts, we know that things get easier when the symmetric/hermitian form
is non-degenerate on every subspace. Inner product happens to satisfy this property, and
so we will focus on the case when B is an inner product in the next section.

7.6 Inner Product Space


In this and the next sections, we will suppose that V is a finite dimensional real/complex
vector space equipped with an inner product ⟨·, ·⟩. If we have an orthonormal basis,
then the matrix form of the inner product is just the identity, and so we can identify V
with the standard inner product space Rn /Cn . The question here is if there exists an
orthonormal basis.
Theorem 7.6.1. (Gram-Schmidt Process) Suppose that v₁, . . . , vₙ is a basis of V . Then
v′₁, . . . , v′ₙ is an orthogonal basis, where

v′ᵢ = vᵢ − ∑_{j=1}^{i−1} (⟨v′ⱼ, vᵢ⟩/⟨v′ⱼ, v′ⱼ⟩) v′ⱼ.

Moreover,

⟨v′₁, v′₁⟩^{−1/2} v′₁, . . . , ⟨v′ₙ, v′ₙ⟩^{−1/2} v′ₙ

form an orthonormal basis of V .
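
The process translates directly into code; a short sketch of my own with the standard inner product:

    import numpy as np

    def gram_schmidt(vectors):
        """Orthonormalize a list of linearly independent vectors."""
        orthogonal = []
        for v in vectors:
            for u in orthogonal:  # subtract the projections onto earlier vectors
                v = v - (u @ v) / (u @ u) * u
            orthogonal.append(v)
        return [u / np.sqrt(u @ u) for u in orthogonal]

    basis = [np.array([1., 1., 0.]),
             np.array([1., 0., 1.]),
             np.array([0., 1., 1.])]
    Q = np.column_stack(gram_schmidt(basis))
    assert np.allclose(Q.T @ Q, np.eye(3))  # the columns are orthonormal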


When working with inner product spaces, we usually consider the orthonormal basis.
This limits the possibility of the base-change matrix.
Property 7.6.1. If F = R, then a real square matrix A is some base-change matrix
of V if and only if A is orthogonal. If F = C, then a complex square matrix A is some
base-change matrix of V if and only if A∗ = A−1 .

Definition 7.6.1. If A∗ = A−1 , then A is said to be unitary. The set of all n by n


unitary matrices forms a group, which is denoted by Un .

Definition 7.6.2. If B = QAQ−1 for some orthogonal matrix Q, then A, B are said
to be orthogonally similar. If B = QAQ−1 for some unitary Q, then A, B are said to be
unitarily similar.

Note that the operator ∗ on the matrices is compatible with conjugations of orthogo-
nal/unitary matrices: If Q is orthogonal/unitary, then

QA∗ Q−1 = (QAQ−1 )∗ .

Therefore the following definition makes sense:


Definition 7.6.3. Suppose that T is an operator. The adjoint operator T ∗ of T is a
linear operator such that
[T ∗ ]β = [T ]∗β
for some orthonormal basis β.

Property 7.6.2. For any x, y ∈ V , we have

⟨T x, y⟩ = ⟨x, T ∗ y⟩.

Moreover, T ∗ is the unique operator that satisfies this.

We can also carry the definitions of symmetric/hermitian and orthogonal/unitary over to
operators.
Definition 7.6.4. A linear operator T is symmetric/hermitian if T = T ∗. It is orthogonal/unitary if T ∗ = T ⁻¹.

Property 7.6.3. A linear operator T is symmetric/hermitian if and only if

⟨T x, y⟩ = ⟨x, T y⟩ ∀x, y ∈ V.

It is orthogonal/unitary if and only if

⟨T x, T y⟩ = ⟨x, y⟩ ∀x, y ∈ V.

7.7 Spectral Theorem


As always, we are interested in the matrices that are diagonalizable. We will begin with
the case F = C. The case F = R is very similar to this case.


Definition 7.7.1. A matrix A/operator T is normal if AA∗ = A∗ A/T T ∗ = T ∗ T .

This definition seems to come out of nowhere at this point, but as we will see, normal
operators are precisely those that are diagonalizable.
Property 7.7.1. A linear operator T is normal if and only if ⟨T x, T y⟩ = ⟨T ∗ x, T ∗ y⟩
for any x, y ∈ V .

Now we are going to prove that normal operators can be diagonalized. The concept
of invariant subspace here is still helpful.
Property 7.7.2. Suppose that W is a T -invariant subspace. Then W ⊥ is a T ∗ -invariant
subspace.

Sketch of Proof. Let w′ be any element in W ⊥ . We have to prove that for any w ∈ W
we have ⟨w, T ∗ w′ ⟩ = 0. This is justified by
⟨w, T ∗ w′ ⟩ = ⟨T w, w′ ⟩ = 0
due to the fact that T w ∈ W .
Lemma 7.7.1. Suppose that T is a normal operator and v is an eigenvector of T corresponding to the eigenvalue λ. Then v is an eigenvector of T ∗ corresponding to the eigenvalue
\overline{λ}.

Sketch of Proof. If λ = 0 then this is clear because

⟨T ∗v, T ∗v⟩ = ⟨T v, T v⟩ = 0.

For general λ, consider T − λ id. It is clear that (T − λ id)∗ = T ∗ − \overline{λ} id. Therefore

(T − λ id)(T − λ id)∗ = T T ∗ − \overline{λ}T − λT ∗ + |λ|² id = T ∗T − \overline{λ}T − λT ∗ + |λ|² id = (T − λ id)∗(T − λ id),

and so T − λ id is also normal. Thus we can apply the result for λ = 0.
Theorem 7.7.1. (Spectral theorem for normal operators) A linear operator on a complex
inner product space is diagonalizable (with respect to an orthonormal basis) if and only if it is normal.

Sketch of Proof. If T is diagonalizable, then there exists an orthonormal basis β such


that [T ]β is diagonalized. Consequently [T ]∗β is also diagonalized, and so [T ]β commutes
with [T ]∗β , which shows that T is normal.
Conversely, if T is normal, then choose an eigenvector v of T . From the lemma we
know that v is also an eigenvector of T ∗ . Let W be the subspace spanned by v, then W
is both T - and T ∗ -invariant. Therefore W ⊥ is also both T - and T ∗ -invariant. As a result,
T is also normal restricted on W ⊥ . Hence we can diagonalize T by induction.
Corollary 7.7.1. (Spectral theorem for normal matrix) A complex square matrix is
unitarily similar to a diagonalized matrix if and only if it is normal.

Corollary 7.7.2. (Spectral decomposition of normal operator) If T is normal, and Pλ


is the orthogonal projection to Eλ for every eigenvalue λ of T , then

T = ∑_λ λPλ.
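
Numerically, the decomposition can be extracted from a complex Schur factorization, which for a normal matrix is a unitary diagonalization (a sketch of my own, assuming scipy is available; the eigenvalue-grouping tolerance is an arbitrary choice):

    import numpy as np
    from scipy.linalg import schur

    def spectral_decomposition(A, tol=1e-8):
        """Return pairs (lambda, P_lambda) with A = sum of lambda * P_lambda."""
        assert np.allclose(A @ A.conj().T, A.conj().T @ A), "A must be normal"
        T, Q = schur(A, output='complex')  # A = Q T Q*; T is diagonal since A is normal
        eigs = np.diag(T)
        pairs, used = [], np.zeros(len(eigs), dtype=bool)
        for i, lam in enumerate(eigs):
            if used[i]:
                continue
            group = np.abs(eigs - lam) < tol     # columns spanning E_lambda
            used |= group
            V = Q[:, group]                      # orthonormal basis of E_lambda
            pairs.append((lam, V @ V.conj().T))  # orthogonal projection onto E_lambda
        return pairs

    # sanity check on a random unitary (hence normal) matrix
    rng = np.random.default_rng(1)
    U, _ = np.linalg.qr(rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4)))
    assert np.allclose(sum(lam * P for lam, P in spectral_decomposition(U)), U)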


Note that unitary matrices and hermitian matrices are all normal, and so these results
apply particularly on those matrices.
Corollary 7.7.3. Every conjugacy class of Uₙ contains at least one diagonalized
matrix.

The case F = R is not too different from the case F = C.


Theorem 7.7.2. (Spectral theorem for symmetric operators) A linear operator on a real
inner product space is diagonalizable (with respect to an orthonormal basis) if and only if it is symmetric.

Sketch of Proof. If T is diagonalizable, then there exists an orthonormal basis β such


that [T ]β is diagonalized. Consequently ([T ]β)ᵀ = [T ]β , and so T is symmetric.
Conversely, if T is symmetric, then all its eigenvalues are real. The rest of the proof
is identical to the one for normal operators.

Corollary 7.7.4. (Spectral theorem for symmetric matrix) A real square matrix is
orthogonally similar to a diagonalized matrix if and only if it is symmetric.

Corollary 7.7.5. (Spectral decomposition of symmetric operator) If T is symmetric, and
Pλ is the orthogonal projection to Eλ for every eigenvalue λ of T , then

T = ∑_λ λPλ.

Actually most of the inner product spaces that people care about are infinite dimen-
sional, and those places are where the fun begins. However, they are more of calculus
than of linear algebra, so I didn’t mention them. If you are interested, try to google it
for fun!

7.8 Positive Definite and Semi-definite Matrices


Positive definiteness and semi-definiteness are strong properties that we might hope
a matrix has. However, it is usually hard to verify directly from the definition that a matrix is
positive (semi-)definite. In this section, we will introduce several criteria
to help us verify whether a given matrix is positive (semi-)definite. Let's begin with an
easy one.
Property 7.8.1. A real symmetric/hermitian matrix A is positive definite if and only
if all of its eigenvalues are positive. Besides, A is positive semi-definite if and only if all
of its eigenvalues are non-negative.

Sketch of Proof. If A is positive definite, then for every eigenvector v corresponding to


the eigenvalue λ, we have
λv ∗ v = v ∗ Av > 0,
and so λ > 0. On the other hand, if all of its eigenvalues are positive, then by the spectral
theorem there exists an orthonormal basis x₁, . . . , xₙ with respect to the standard inner product
consisting of eigenvectors of A. Suppose that xᵢ corresponds to the eigenvalue λᵢ; then for



every x = ∑ᵢ cᵢxᵢ where the cᵢ are not all zero, we have

x∗Ax = (∑_{i=1}^{n} cᵢxᵢ)∗ A (∑_{i=1}^{n} cᵢxᵢ) = (∑_{i=1}^{n} cᵢxᵢ)∗ (∑_{i=1}^{n} λᵢcᵢxᵢ) = ∑_{i=1}^{n} |cᵢ|²λᵢ > 0.

The proof for the positive semi-definite case proceeds similarly.


In the previous proof we see real symmetric/hermitian matrices as a linear operator
on the standard inner product space. We can also consider it as a bilinear form defined
on the vector space and try to diagonalize it. This gives us the following result.
Property 7.8.2. A real symmetric/hermitian matrix A is positive definite if and only
if there exists an invertible matrix B such that A = B ∗ B. Besides, it is positive semi-
definite if and only if there exists a square matrix B of the same size such that A = B ∗ B.

Sketch of Proof. If A = B ∗ B for some invertible matrix B, then it is clear that A is


positive definite. Conversely, if A is positive definite, then we know that there exists
invertible matrix P and diagonalized matrix D with entries 0, 1, −1 such that A = P ∗ DP .
Since A is positive definite, it is clear that all diagonal entries of D must be 1, and so
A = P ∗ P . A similar argument works for positive semi-definite matrices.
Now both equivalent conditions that we proved are hard to verify too, so we will
introduce another criterion called Sylvester’s criterion. To do this, we have to introduce
some definitions.
Definition 7.8.1. Suppose that A is an n by n matrix. For any 0 ≤ k ≤ n, a k by k
minor of A is the determinant of a k by k matrix obtained by deleting n − k columns and
n − k rows of A. The minor is principal if the i-th column is deleted if and only if the
i-th row is deleted for all i. The principal minor is leading if the remaining columns/rows
are the first k columns/rows.

Lemma 7.8.1. (Cauchy-Binet formula) Suppose that A is an m by n matrix and B is


an n by m matrix, then

det(AB) = ∑_{S⊆[n], |S|=m} det(A_{[m],S}) det(B_{S,[m]})

where for every S ′ ⊆ [m] and S ′′ ⊆ [n], the symbol AS ′ ,S ′′ denotes the submatrix of A
containing rows indexed by elements in S ′ and columns indexed by elements in S ′′ , and
the symbol BS ′′ ,S ′ denotes the submatrix of B containing the rows indexed by elements
in S ′′ and columns indexed by elements in S ′ .

Sketch of Proof. We will use the generalization of Exercise 3 in Chapter 4:


char_{BA}(x) = x^{n−m} char_{AB}(x).
We will proceed by comparing the coefficients of x^{n−m} on both sides. Note that the
coefficient of x^{n−m} of the right hand side is simply (−1)^m det(AB). The coefficient of x^{n−m} of
the left hand side is the sum of all principal m by m minors of BA multiplied by (−1)^m.
Therefore,

det(AB) = ∑_{S⊆[n],|S|=m} det((BA)_{S,S}) = ∑_{S⊆[n],|S|=m} det(A_{[m],S}) det(B_{S,[m]}).


Theorem 7.8.1. (Sylvester's criterion for positive definiteness) A real symmetric/hermitian matrix A is positive definite if and only if all of the leading principal minors of A
are positive.

Sketch of Proof. Suppose that A is positive definite, then there exists an invertible matrix
B such that B ∗ B = A. We can actually show that any principal minor of A is positive.
For any S ⊆ [n], let P ∈ M_{n,|S|} satisfy

p_{ij} = 1 if i is the j-th element of S, and p_{ij} = 0 otherwise.
Then

det(A_{S,S}) = det(P∗AP) = det((BP)∗(BP)) = ∑_{S₀⊆[n],|S₀|=|S|} det(((BP)∗)_{[|S|],S₀}) det((BP)_{S₀,[|S|]}) = ∑_{S₀⊆[n],|S₀|=|S|} |det((BP)_{S₀,[|S|]})|² ≥ 0.

The equality holds if and only if every |S| by |S| minor of BP is zero. However, this is not
possible since B is invertible. Therefore every principal minor of A is positive.
Conversely, if every leading principal minor is positive, then we can induct on n. It
is trivial when n = 1. Suppose that the statement holds for n = k − 1, then for n = k we
can diagonalize the first k − 1 by k − 1 submatrix so that
Q∗AQ = [ I_{k−1}  v ]
       [ v∗       a ]
for some vector v and real a. Because

0 < |det(Q)|² det(A) = a − ∑_{i=1}^{k−1} |vᵢ|²,

we have that a > ∑_{i=1}^{k−1} |vᵢ|². Therefore for every x we have that

(Qx)∗A(Qx) = ∑_{i=1}^{k−1} (|xᵢ|² + 2ℜ(vᵢ\overline{xᵢ}xₖ)) + a|xₖ|² = ∑_{i=1}^{k−1} |xᵢ + vᵢxₖ|² + (a − ∑_{i=1}^{k−1} |vᵢ|²)|xₖ|² ≥ 0.

The equality holds only when xₖ = 0 and xᵢ + vᵢxₖ = 0 for all i, that is, only when x = 0.
Therefore A is positive definite, and the desired statement follows by induction.
Theorem 7.8.2. (Sylvester's criterion for positive semi-definiteness) A real symmetric/hermitian matrix A is positive semi-definite if and only if all of the principal minors
of A are non-negative.

Sketch of Proof. With the same argument we can prove that if A is positive semi-definite
then every principal minor is non-negative. Conversely, if every principal minor is non-negative, then since [xⁱ] char_A(x) is the sum of all (n − i) by (n − i) principal minors multiplied
by (−1)^{n−i}, we have that (−1)^{n−i}[xⁱ] char_A(x) ≥ 0. Therefore for any t > 0 we have that

(−1)ⁿ char_A(−t) = ∑_{i=0}^{n} (−1)^{n−i} tⁱ [xⁱ] char_A(x) > 0,

and so char_A(x) has no negative roots. This shows that A is positive semi-definite.
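
In code, the definiteness criterion is a one-line check over leading principal minors (a sketch of my own; fine for small matrices, though in floating point one would rather attempt a Cholesky factorization):

    import numpy as np

    def is_positive_definite(A):
        """Sylvester's criterion: all leading principal minors are positive."""
        n = A.shape[0]
        return all(np.linalg.det(A[:k, :k]) > 0 for k in range(1, n + 1))

    A = np.array([[2., 1., 0.],
                  [1., 2., 1.],
                  [0., 1., 2.]])
    assert is_positive_definite(A)            # leading minors: 2, 3, 4
    assert not is_positive_definite(-A)
    assert np.all(np.linalg.eigvalsh(A) > 0)  # agrees with the eigenvalue criterion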


7.9 Application 1: Quadratic Form and Quadric


Definition 7.9.1. A quadratic form is a homogeneous polynomial of degree 2.

Definition 7.9.2. A quadric is the zero locus of the function

q(x1 , x2 , . . . , xn ) + c1 x1 + · · · + cn xn + c

where q is a quadratic form. It is degenerate if it is empty or there is a “singular point”


(we won’t go into details about it, but a singular point is like a place where it is impossible
to do differentiation). It is non-degenerate if it is not degenerate.

Definition 7.9.3. A conic is a non-degenerate quadric in R2 .

In this section, we will focus on the case F = R and classify all the conics. To be
more specific, we will determine the orbits of conics under the action of isometries of the
plane. To achieve this, let’s first examine quadratic forms over real.
Property 7.9.1. Suppose that q(x₁, . . . , xₙ) = a₁₁x₁² + · · · + aₙₙxₙ² + 2a₁₂x₁x₂ + · · · +
2a_{(n−1)n}xₙ₋₁xₙ. Then we have

xᵀAx = q(x)

for all x ∈ Rⁿ. Here A is real symmetric. Conversely, for any real symmetric matrix A,
the form xᵀAx is a quadratic form over x₁, . . . , xₙ.
Corollary 7.9.1. There exists an orthogonal operator T such that q(T x) = ∑_{i=1}^{n} bᵢᵢ(T x)ᵢ²,
where the bᵢᵢ are some reals.

Sketch of Proof. Write q(x) = xᵀAx. By the spectral theorem there exists an orthogonal matrix Q such that QᵀAQ is diagonalized. Therefore q(Qx) = xᵀ(QᵀAQ)x =
∑_{i=1}^{n} bᵢᵢ(Qx)ᵢ², where the bᵢᵢ are some reals.
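
Concretely (my own numerical sketch), the orthogonal change of variables is given by an orthonormal eigenbasis of A:

    import numpy as np

    # q(x1, x2) = x1^2 + 4 x1 x2 + x2^2, i.e. q(x) = x^T A x with
    A = np.array([[1., 2.],
                  [2., 1.]])
    eigvals, Q = np.linalg.eigh(A)  # Q is orthogonal and Q^T A Q = diag(eigvals)

    x = np.array([0.3, -1.7])
    q_direct = x @ A @ x
    q_diagonal = sum(lam * xi ** 2 for lam, xi in zip(eigvals, Q.T @ x))
    assert np.isclose(q_direct, q_diagonal)  # q(x) = sum_i b_ii (Q^T x)_i^2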

Now let’s classify the conics. We know that it is the zero locus of the function

xᵀAx + bᵀx + c

for some real symmetric 2 by 2 matrix A, two dimensional vector b and constant c. By the
spectral theorem we can assume that A is diagonalized. Therefore the equation becomes

a₁₁x₁² + a₂₂x₂² + b₁x₁ + b₂x₂ + c = 0.

If both a₁₁, a₂₂ are nonzero, then we can rewrite it as

a₁₁(x₁ + b₁/(2a₁₁))² + a₂₂(x₂ + b₂/(2a₂₂))² + c′ = 0.
Therefore by transformation we can eliminate b1 and b2 . Now if c′ = 0 then it is degenerate
because (0, 0) becomes singular, so let’s WLOG assume that c′ = −1. If a11 , a22 are
both negative, then the zero locus is empty, which is degenerate. Hence there are two
possibilities:
ax₁² + bx₂² − 1 = 0 (ellipse)


ax₁² − bx₂² − 1 = 0 (hyperbola)


where a, b > 0.
If one of a₁₁, a₂₂ is zero, WLOG assume that a₂₂ = 0. If a₁₁ = 0 then the quadric is
degenerate, so a₁₁ ≠ 0, and then we can do the same trick to eliminate b₁. Now if b₂ = 0
then the quadric becomes degenerate, so WLOG assume that b₂ = −1. We can then
“absorb” c into x₂ by

b₂x₂ + c = b₂(x₂ + c/b₂).
Hence the conic must be of the form

ax₁² − x₂ = 0 (parabola).

Theorem 7.9.1. Every conic is congruent to an ellipse, a hyperbola or a parabola.

We can do the same thing in R3 and get the classification of nondegenerate quadrics
in R3 :
Theorem 7.9.2. Every nondegenerate quadric in R3 is congruent to one of the follow-
ing:
ax₁² + bx₂² + cx₃² − 1 = 0 (ellipsoid)
ax₁² + bx₂² − cx₃² − 1 = 0 (hyperboloid of one sheet)
ax₁² − bx₂² − cx₃² − 1 = 0 (hyperboloid of two sheets)
ax₁² + bx₂² − x₃ = 0 (elliptic paraboloid)
ax₁² − bx₂² − x₃ = 0 (hyperbolic paraboloid)
where a, b, c > 0.

Since we are less familiar with the nondegenerate quadrics in R3 , let’s give some
geometric descriptions of the five cases.

1. An ellipsoid is like a distorted sphere. The intersection of any plane with an ellipsoid
is either empty, a point or an ellipse. This is why it is called an ellipsoid.

2. A hyperboloid of one sheet is connected: the intersection of x₃ = x with the
hyperboloid of one sheet is always an ellipse. This is why we say that it is of one
sheet. It is like a distorted cylinder whose middle is thinner.

3. A hyperboloid of two sheets is not connected: the intersection of x₁ = 0 with the
hyperboloid of two sheets is empty, and so there is a connected component in x₁ > 0
and another in x₁ < 0. This is why we say it is of two sheets. Note that if we
slowly adjust −1 to 1, then the two sheets become one sheet, and at the point where
the two sheets almost become one sheet, the equation is

ax₁² = bx₂² + cx₃².

This is a cone, and it is degenerate because (0, 0, 0) is singular. The hyperboloid of


one sheet “covers” the cone, and the cone “contains” the hyperboloid of two sheets.


4. An elliptic paraboloid is like a family of ellipses that are stacked parabolically. It


looks like an oval cup. The intersection of x1 = x with the elliptic paraboloid is a
parabola. The intersection of x3 = x with the elliptic paraboloid is an ellipse.
5. A hyperbolic paraboloid is like a family of hyperbola that are stacked parabolically.
It looks like a saddle. The intersection of x1 = x with the hyperbolic paraboloid is a
parabola. The intersection of x3 = x with the hyperbolic paraboloid is a hyperbola.

7.10 Application 2: Legendre Polynomial


Let’s consider the real vector space C([−1, 1], R) consisting of real-valued continuous
functions on [−1, 1]. We can define an inner product on C([−1, 1], R) by
⟨f, g⟩ = ∫_{−1}^{1} f(x)g(x) dx.

This naturally induces a “norm”:

||f||₂ = √⟨f, f⟩,
which measures the distance between f and 0. Note that C([−1, 1], R) is infinite dimen-
sional, and we know nearly nothing about it. So let’s consider an easy finite dimensional
subspace, namely R[x]≤n . We would naturally like to find an orthogonal basis of R[x]≤n .
Of course we could just go through the Gram-Schmidt process with the basis {1, x, . . . , xⁿ},
but the computation is too complicated to carry out. Here, we will look for a polynomial
Pn of degree n that is orthogonal to every polynomial in R[x]≤n−1 . If this succeeds, then
P0 , . . . , Pn form an orthogonal basis of R[x]≤n .
Suppose that Q is a polynomial of degree at most n − 1; then we have

(dⁿ/dxⁿ) Q(x) = 0.

Also let P be some polynomial, to be determined, with

(dⁿ/dxⁿ) P(x) = Pₙ(x).
Then by integration by parts, we know that

⟨Pₙ, Q⟩ = ∫_{−1}^{1} P^{(n)}(x)Q(x) dx
= P^{(n−1)}(1)Q(1) − P^{(n−1)}(−1)Q(−1) − ∫_{−1}^{1} P^{(n−1)}(x)Q′(x) dx
= (P^{(n−1)}(1)Q(1) − P^{(n−1)}(−1)Q(−1)) − (P^{(n−2)}(1)Q′(1) − P^{(n−2)}(−1)Q′(−1)) + ∫_{−1}^{1} P^{(n−2)}(x)Q^{(2)}(x) dx
= · · ·
= ∑_{i=0}^{n−1} (−1)ⁱ (P^{(n−i−1)}(1)Q^{(i)}(1) − P^{(n−i−1)}(−1)Q^{(i)}(−1)) + (−1)ⁿ ∫_{−1}^{1} P(x)Q^{(n)}(x) dx
= ∑_{i=0}^{n−1} (−1)ⁱ (P^{(n−i−1)}(1)Q^{(i)}(1) − P^{(n−i−1)}(−1)Q^{(i)}(−1)).


Therefore to make ⟨Pₙ, Q⟩ = 0 for all Q, it suffices to choose P such that

P^{(i)}(1) = P^{(i)}(−1) = 0

for any i = 0, 1, . . . , n − 1. This forces P to be c(x² − 1)ⁿ. Conventionally we choose c to
be 2⁻ⁿ(n!)⁻¹ to make Pₙ(1) = 1. Thus,

Pₙ(x) = (1/(2ⁿn!)) (dⁿ/dxⁿ) (x² − 1)ⁿ.
Definition 7.10.1. Pn is called the Legendre polynomial.

Property 7.10.1. For any non-negative integers m, n, we have that ⟨Pₘ, Pₙ⟩ = 0 if
m ≠ n. Besides,

⟨Pₙ, Pₙ⟩ = 2/(2n + 1).

Sketch of Proof. If m ≠ n, we can assume that m < n, and so ⟨Pₘ, Pₙ⟩ = 0 by the choice
of Pₙ. Now by the discussion above, we know that

⟨Pₙ, Pₙ⟩ = (−1)ⁿ ∫_{−1}^{1} P(x)P^{(2n)}(x) dx = (−1)ⁿ (\binom{2n}{n}/2^{2n}) ∫_{−1}^{1} (x² − 1)ⁿ dx = (\binom{2n}{n}/2^{2n}) ∫_{−1}^{1} (1 − x²)ⁿ dx.

It is well-known that

∫_{−1}^{1} (1 − x²)ⁿ dx = ∫_{0}^{π} sin^{2n+1}(x) dx = 2^{2n+1}(n!)²/(2n + 1)!.

Therefore

⟨Pₙ, Pₙ⟩ = (\binom{2n}{n}/2^{2n}) · (2^{2n+1}(n!)²/(2n + 1)!) = 2/(2n + 1).
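
The Rodrigues formula and the orthogonality relations can be checked symbolically (a quick sketch of my own, assuming sympy is available):

    import sympy as sp

    x = sp.symbols('x')

    def legendre(n):
        # Rodrigues' formula: P_n(x) = 1/(2^n n!) * d^n/dx^n (x^2 - 1)^n
        return sp.expand(sp.diff((x**2 - 1)**n, x, n) / (2**n * sp.factorial(n)))

    P = [legendre(n) for n in range(5)]
    assert P[2] == sp.Rational(3, 2) * x**2 - sp.Rational(1, 2)

    for m in range(5):
        for n in range(5):
            ip = sp.integrate(P[m] * P[n], (x, -1, 1))
            assert ip == (sp.Rational(2, 2 * n + 1) if m == n else 0)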

Corollary 7.10.1. For every polynomial P , we have

P(x) = ∑_{n=0}^{∞} ((2n + 1)/2) ⟨P, Pₙ⟩Pₙ(x).
Note that the summands are eventually zero, and so this sum is well-defined.

Now for every f ∈ C([−1, 1], R) we can consider its orthogonal projection to R[x]≤n:

fₙ = ∑_{i=0}^{n} ((2i + 1)/2) ⟨f, Pᵢ⟩Pᵢ(x).

It is clear that ||fₙ|| is increasing, and that ||f − fₙ||² + ||fₙ||² = ||f||². One might guess
that ||f − fₙ|| tends to 0 as n tends to infinity, which shows that f₀, f₁, . . . is a sequence
of polynomials that “tends to” f . This is actually true because “Parseval's identity
holds” in this case. Since the proof involves some calculus details, it is omitted here.
Theorem 7.10.1. (A weak form of Stone-Weierstrass theorem) For every continuous
function f on an interval [a, b], there exists a sequence of polynomials f1 , f2 , . . . such that
lim_{n→∞} ∫_a^b (f(x) − fₙ(x))² dx = 0.

In particular, fₙ converges to f in the ||·||₂ sense on [a, b] (note that this by itself does not give pointwise convergence).


7.11 Random Problem Set


1. (7.1) Suppose that V is a finite dimensional vector space. Show that despite the
fact that there is no canonical isomorphism between V and V ∗ (that is, we have
to make a choice of the basis), there is a canonical isomorphism between V and its
double-dual, V ∗∗ .

2. (7.2) Suppose that ⟨·, ·⟩ is a hermitian form on V . Show that for any v ∈ V we
have ⟨v, v⟩ ∈ R.

3. (7.3) Give an example of non-diagonalizable symmetric form.

4. (7.3) Let A be an n by n matrix such that for any i, j, we have Aij = 1 if and only
if |i − j| = 1. Since A is real symmetric, we know that every eigenvalue is real.
Find all of its eigenvalues and the corresponding eigenvectors. Is A diagonalizable?

5. (7.5) Let R[x] be a vector space equipped with the inner product
⟨f, g⟩ = ∫_{0}^{1} f(x)g(x) dx.

Calculate the orthogonal projection of x² onto the subspace spanned by x.

6. (7.6) Suppose that B is a symmetric/hermitian form on a real/complex space V .


If B is non-degenerate on every subspace W of V , show that either B or −B is
positive definite.

7. (7.6) (Riesz representation theorem for finite dimensional real inner product space)
Suppose that ⟨·, ·⟩ is an inner product on a R-vector space V . Prove that the map
from V to V ∗ sending x to φx is an isomorphism. Here

φx (y) = ⟨x, y⟩ ∀y ∈ V.

8. (7.7) Suppose that T is a linear operator on a complex inner product space. Show
that T is normal if and only if there exists a polynomial g such that T ∗ = g(T ).

9. (7.8) We’ve shown that a real symmetric/hermitian matrix is positive definite if


and only if all of its eigenvalues are positive. Actually, a stronger statement holds:
the numbers of positive, zero and negative eigenvalues actually correspond to its
signature. Prove this statement.

10. (7.9) (Wolstenholme’s inequality) Show that if x1 , x2 , x3 , θ1 , θ2 , θ3 are reals such that
θ1 + θ2 + θ3 = π, then

x₁² + x₂² + x₃² ≥ 2x₁x₂ cos θ₃ + 2x₂x₃ cos θ₁ + 2x₃x₁ cos θ₂.

Hint: You can prove that cos²θ₁ + cos²θ₂ + cos²θ₃ + 2 cos θ₁ cos θ₂ cos θ₃ = 1. You
can also try to do row operations to calculate the determinant.


11. (7.10) Let F be the set of all real-valued continuous functions of period 2π defined
on R, and define an inner product on it:
⟨f, g⟩ = ∫_{−π}^{π} f(x)g(x) dx.

Show that {1, sin x, sin 2x, . . . , cos x, cos 2x, . . .} is an orthogonal set. Normalize this
orthogonal set. Express f in an infinite linear combination of 1, sin nx, cos nx where
f ∈ F satisfies

f(x) = x + π for −π ≤ x ≤ 0, and f(x) = π − x for 0 ≤ x ≤ π.

Chapter 8

Group representation

Group representation is a relatively new tool for finding properties of groups. We know
little about groups, but we know a lot about vector spaces, so we simply study groups by
examining their (linear) actions on vector spaces. This weird idea somehow takes us really
far and gives us some surprising results.
This note will focus on group representations of finite groups over the complex
numbers. Group representations behave totally differently if we consider fields of
positive characteristic or if the group is not finite, so we will not discuss those here.

8.1 Definition
Definition 8.1.1. A representation of a group G on a vector space V is a homomor-
phism from G to GL(V ) where GL(V ) denotes all the invertible operators on V . It is
faithful if it is injective. The dimension of the representation is the dimension of V .

In other words, a representation of G is a special kind of group action on V , with the
restriction that each g ∈ G acts on V as a linear operator.
The first question that arises here is whether there always exists a faithful representation of G.
If not, then it is impossible to fully recover the properties of G simply by examining the
representations. It turns out that faithful representations always exist when |G| is finite.
Property 8.1.1. If G is a finite group, then there exists a faithful representation over
C of G.

Sketch of Proof. By Cayley's theorem there exists a monomorphism from G to S|G| . Since
there is also a monomorphism from S|G| to GL|G| (C) sending a permutation to its corresponding permutation matrix, we're done.

Definition 8.1.2. The representation above is called the regular representation of G.
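
For a concrete picture of the construction, here is a small sketch (my own code) realizing the regular representation of the cyclic group Z/3 by permutation matrices:

    import numpy as np
    from itertools import product

    def regular_representation(elements, multiply):
        """Send each g to the permutation matrix of left multiplication by g."""
        index = {g: i for i, g in enumerate(elements)}
        rho = {}
        for g in elements:
            M = np.zeros((len(elements), len(elements)), dtype=int)
            for h in elements:
                M[index[multiply(g, h)], index[h]] = 1  # g sends e_h to e_{gh}
            rho[g] = M
        return rho

    rho = regular_representation([0, 1, 2], lambda a, b: (a + b) % 3)  # Z/3
    for a, b in product(range(3), repeat=2):
        assert np.array_equal(rho[a] @ rho[b], rho[(a + b) % 3])  # a homomorphism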

As always, we have to think about what the subobject is and what the map between
the objects is. Since we hope that the subrepresentation is still a representation of G,
the only possible way to define this is to consider the subspace of V . However, not all
subspaces give a subrepresentation–only those that are invariant under the action of G
do.


Definition 8.1.3. Given a representation ρ of G on V . A subrepresentation of ρ is


the restriction ρ|W on a G-invariant subspace W . Here G-invariant means that it is
ρ(g)-invariant for all g ∈ G.

Definition 8.1.4. Given two representations ρ, ρ′ of G on V, V ′ , respectively. An equiv-


ariant map T from (ρ, V ) to (ρ′ , V ′ ) is a linear map from V to V ′ such that for any g ∈ G,
we have T ρ(g) = ρ′ (g)T . That is, the following diagram commutes for all g ∈ G:
            T
      V --------> V′
      |           |
 ρ(g) |           | ρ′(g)
      ↓           ↓
      V --------> V′
            T

If there is an equivariant map T from (ρ, V ) to (ρ′, V ′) and an equivariant map T ′ from
(ρ′, V ′) to (ρ, V ) such that T, T ′ are mutually inverse, then we say that ρ and ρ′ are
isomorphic.

Definition 8.1.5. A representation is irreducible if it does not have any proper sub-
representation. If this is not the case, then the representation is said to be reducible.

Example 8.1.1. For every figure F on the plane R², if the subgroup of translations of the
symmetry group of F is trivial, then the symmetry group of F is (conjugate to) a subgroup of GL(R²), which
induces naturally a faithful representation of dimension 2 over R. Take F to be an equilateral
triangle; then we get a faithful representation of S₃. Since GL₂(R) is a subgroup of
GL₂(C), we get a faithful representation of S₃ over C of dimension 2. Call it ρ′₂.

Example 8.1.2. Consider another representation of S₃ on C³ obtained by permuting
the three entries. It is easy to verify that the only two proper subrepresentations are
(ρ₁, W₁) and (ρ₂, W₂), where W₁ is the subspace spanned by (1, 1, 1) and W₂ is the subspace
such that (a, b, c) ∈ W₂ if and only if a + b + c = 0. One can verify that ρ₂ and ρ′₂ are
isomorphic. Moreover, one can verify that ρ₁, ρ₂ are irreducible.

Note that C³ = W₁ ⊕ W₂ in the previous example. This somehow tells us that ρ is
the “direct sum” of ρ₁ and ρ₂. Let's make this precise.
Definition 8.1.6. Suppose that ρ, ρ′ are representations of G on V, V ′, respectively.
ρ ⊕ ρ′ is the representation of G on V ⊕ V ′ such that (ρ ⊕ ρ′)(g) = ρ(g) ⊕ ρ′(g) for any
g ∈ G. Here (T ⊕ T ′)(v + v′) = T v + T ′v′ for any T ∈ GL(V ), T ′ ∈ GL(V ′), v ∈ V, v′ ∈ V ′.

It seems that maybe every representation can be decomposed into a direct sum of
some irreducible representations. It is the case when G is finite and the representation is
finite dimensional and over C.

8.2 Unitary Representation and Maschke’s Theorem


In this section, we will prove that any finite dimensional representation over C of a finite
group can be decomposed as a direct sum of irreducible representations. To show this, we


only need to show that if ρ1 is a proper subrepresentation of ρ, then there exists another
proper subrepresentation ρ2 such that ρ = ρ1 ⊕ ρ2 . In other words, we need to show that
if V ′ is a proper nontrivial G-invariant subspace, then there exists a G-invariant
subspace V ′′ such that V = V ′ ⊕ V ′′.
The problem here is that given V ′, there is no canonical way to choose V ′′ such that
V = V ′ ⊕ V ′′. The key point here is that, looking back at Example 8.1.2, we find that the
two G-invariant subspaces are orthogonal complements of each other with respect to the
standard inner product. Therefore maybe we can define an appropriate inner product
⟨·, ·⟩ on V and take V ′′ to be the orthogonal complement. To show that V ′′ is G-invariant,
we hope that for every g ∈ G, we have

⟨v ′ , v ′′ ⟩ = 0 ∀v ′ ∈ V ′ ⇒ ⟨v ′ , ρ(g)v ′′ ⟩ = 0 ∀v ′ ∈ V ′ .

Note that since V ′ is G-invariant, the condition

⟨v ′ , ρ(g)v ′′ ⟩ = 0 ∀v ′ ∈ V ′

is the same as
⟨ρ(g)v ′ , ρ(g)v ′′ ⟩ = 0 ∀v ′ ∈ V ′ .
Therefore we will naturally hope that the inner product satisfies that

⟨u, v⟩ = ⟨ρ(g)u, ρ(g)v⟩

for all u, v ∈ V .
Definition 8.2.1. Suppose that V is an inner product space over C and ρ is a repre-
sentation of G on V . The representation ρ is said to be unitary if for every g ∈ G, the
operator ρ(g) is unitary.

Now our goal is clear: we have to construct an inner product such that the given
representation is unitary with respect to it. The trick that we will use is the averaging
trick. This trick will appear again soon, so keep it in mind.
Lemma 8.2.1. Suppose that ρ is a representation of a finite group G on V over C.
Then there exists an inner product on V such that ρ is unitary with respect to the inner
product.

Sketch of Proof. Let’s begin with an arbitrary inner product {·, ·} and try to construct
the desired inner product. For every v, w ∈ V , define
⟨v, w⟩ = (1/|G|) ∑_{g∈G} {ρ(g)v, ρ(g)w}.

It is easy to verify that ⟨·, ·⟩ is still hermitian and positive definite. Also, for every g ′ ∈ G
we have

⟨ρ(g′)v, ρ(g′)w⟩ = (1/|G|) ∑_{g∈G} {ρ(gg′)v, ρ(gg′)w} = (1/|G|) ∑_{g∈G} {ρ(g)v, ρ(g)w} = ⟨v, w⟩,

and so ρ is unitary.
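
In coordinates the averaging trick is one line: if H is the Gram matrix of {·, ·}, the new Gram matrix is the average of ρ(g)∗Hρ(g). A small sketch (my own example: a non-unitary copy of the rotation representation of C₄, starting from the standard inner product):

    import numpy as np

    # conjugate the rotation by 90 degrees by an invertible S, so that the
    # resulting representation of C4 is NOT unitary for the standard inner product
    S = np.array([[1., 1.], [0., 1.]])
    R = np.array([[0., -1.], [1., 0.]])
    g0 = S @ R @ np.linalg.inv(S)
    rho = [np.linalg.matrix_power(g0, k) for k in range(4)]

    # averaging trick, starting from the standard inner product (H0 = I):
    H = sum(g.conj().T @ g for g in rho) / len(rho)

    for g in rho:
        # <gv, gw> = (gv)* H (gw) = v* (g* H g) w = <v, w> for all v, w
        assert np.allclose(g.conj().T @ H @ g, H)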
Theorem 8.2.1. Suppose that ρ is a finite dimensional representation of a finite group
G on V over C. Then ρ can be decomposed as a direct sum of irreducible representations.


Sketch of Proof. We can induct on the dimension of ρ. If ρ is irreducible, then there


is nothing to do. If ρ is reducible, suppose that V ′ is a nontrivial proper G-invariant
subspace. All we have to show is that there exists another G-invariant subspace V ′′ such
that V = V ′ ⊕V ′′ because the rest can be handled by the induction hypothesis. Take ⟨·, ·⟩
as the inner product in the lemma, and take V ′′ = V ′⊥ . Then for every g ∈ G, v ∈ V ′
and w ∈ V ′′ , we have
⟨v, ρ(g)w⟩ = ⟨ρ(g −1 )v, w⟩ = 0
because ρ(g −1 )v ∈ V ′ . This shows that ρ(g)w ∈ V ′′ , and so V ′′ is also G-invariant.

8.3 Schur’s Lemma


As always, when we consider the maps between the objects, we would like to consider
the kernel and the image of the map. In the case of group representation, we can show
that the kernel and the image are G-invariant.
Lemma 8.3.1. Suppose that T is an equivariant map from the representation (ρV , V )
of G to the representation (ρW , W ) of G. Then ker T and im T are both G-invariant.

Sketch of Proof. For every g ∈ G and v ∈ ker T , we have

T ρV (g)v = ρW (g)T v = 0

and so ρV(g)v ∈ ker T . For every g ∈ G and w ∈ im T , suppose that T v = w; then

ρW(g)w = ρW(g)T v = T ρV(g)v ∈ im T .

Now if ρV and ρW are irreducible, then we immediately get that T must be trivial or
be an isomorphism. In fact, we can prove something even stronger.
Theorem 8.3.1. (Schur’s Lemma) Suppose that ρV , ρW are irreducible representations
of G on V, W over C respectively, and T is an equivariant map from V to W . If V, W are
not isomorphic, then T = 0. If V = W , then T is a scalar multiple of the identity map.

Sketch of Proof. Suppose that T ≠ 0. Since ker T, im T are G-invariant and T ≠ 0, we
have that ker T = 0 and im T = W , which shows that T is an isomorphism. Now
suppose that V = W . If there exists a nonzero v ∈ V such that T v = 0, then T = 0 by the above. For the general
case, we can use the trick used in Lemma 7.7.1: choose an eigenvector v of T corresponding
to an eigenvalue λ and consider the operator T − λ id. It is easy to verify that T − λ id is equivariant,
and (T − λ id)v = 0. Therefore T − λ id = 0, and so T = λ id.

8.4 Interlude: Tensor Product


To make the upcoming proof cleaner, I am going to introduce the tensor product of vector
spaces first. This is somewhat irrelevant to the other contents in this chapter, so feel free
to skip this section if you already know tensor products.
The whole motivation is about bilinear maps. Suppose that U, V, W are all vector
spaces over the same field F , and consider a bilinear map ϕ : U × V → W . It is clear that ϕ is


not a linear transformation (because we don’t even have a good way to define a structure
of vector space on U × V ), which is kind of sad. If there exists a vector space U ⊗ V
such that every bilinear map ϕ : U × V → W induces naturally and bijectively a linear
operator ϕ′ : U ⊗ V → W , then maybe we will be happier.
The first intuition to build U ⊗ V is to consider the vector space spanned by u ⊗ v for
all u ∈ U, v ∈ V . Then we can simply define ϕ′ (u ⊗ v) = ϕ(u, v). We have to nonetheless
record the bilinearity of ϕ into U ⊗ V . Since for every c ∈ F, u1 , u2 ∈ U, v ∈ V we have
ϕ(u1 + cu2 , v) = ϕ(u1 , v) + cϕ(u2 , v),
we automatically hope that (u1 + cu2 ) ⊗ v = u1 ⊗ v + cu2 ⊗ v. Similar property should
also hold when U, V are interchanged.
Definition 8.4.1. For any two vector spaces U, V over F , the tensor product U ⊗ V of
U and V is the vector space spanned by u ⊗ v for all u ∈ U, v ∈ V . Here u ⊗ v satisfies
the relation
(u₁ + cu₂) ⊗ v = u₁ ⊗ v + c u₂ ⊗ v, u ⊗ (v₁ + cv₂) = u ⊗ v₁ + c u ⊗ v₂.

Property 8.4.1. (Universal property of tensor product) For any vector spaces U, V, W
over F , if ϕ : U × V → W is a bilinear map, then there exists a unique linear map
ϕ′ : U ⊗ V → W such that ϕ′ (u ⊗ v) = ϕ(u, v). In other words, there exists a unique
linear map ϕ′ such that the following diagram commutes:
            ϕ
    U × V -----> W
       \        ↗
     φ  \      / ϕ′
         U ⊗ V

where φ : U × V → U ⊗ V is the map sending (u, v) to u ⊗ v.

Property 8.4.2. Suppose that U, V are finite dimensional. Then U ∗ ⊗V is (canonically)


isomorphic to L(U, V ), where L(U, V ) is the vector space consisting all linear maps from
U to V .

Sketch of Proof. From the universal property, it is natural to construct a bilinear map
ϕ : U ∗ × V → L(U, V ). For every f ∈ U ∗ and v ∈ V , we have to decide what ϕ(f, v)
is. Since ϕ(f, v) is a linear map from U to V , we have to decide what ϕ(f, v)u is for all
u ∈ U . It is then natural to let
ϕ(f, v)u = (f u)v.
It is easy to show that ϕ is well-defined and bilinear. Therefore there is an induced linear
map ϕ′ from U ∗ ⊗ V to L(U, V ). We can verify that ϕ′ is an isomorphism by directly
constructing the inverse map
ϕ′⁻¹(T ) = ∑ᵢ u∗ᵢ ⊗ T uᵢ

where u1 , . . . , un form a basis of U and u∗1 , . . . , u∗n form the corresponding dual basis of
U ∗.


Corollary 8.4.1. If dim U, dim V < ∞, then dim U ⊗ V = dim U dim V .

Corollary 8.4.2. If u1 , . . . , um form a basis of U , and v1 , . . . , vn form a basis of V , then


ui ⊗ vj (1 ≤ i ≤ m, 1 ≤ j ≤ n) form a basis of U ⊗ V .

Besides vector spaces, we can also define the tensor product of linear transformations.
Definition 8.4.2. Suppose that T1 : U1 → V1 and T2 : U2 → V2 are two linear maps,
then the tensor product T1 ⊗ T2 is the linear map from U1 ⊗ U2 to V1 ⊗ V2 such that

(T1 ⊗ T2 )(u1 ⊗ u2 ) = (T1 u1 ) ⊗ (T2 u2 )

for all u1 ∈ U1 , u2 ∈ U2 .

Property 8.4.3. If T1 is a linear operator on a finite dimensional vector space V1 , and


T2 is a linear operator on a finite dimensional vector space V2 , then T1 ⊗ T2 is a linear
operator on V1 ⊗ V2 and
tr T1 ⊗ T2 = tr T1 tr T2 .

Now that we know how to define the tensor product of linear transformations, we can
also define the tensor product of the representation.
Definition 8.4.3. Let ρ1 , ρ2 be the representations of G on V1 , V2 , respectively. Then
the tensor product ρ1 ⊗ ρ2 of ρ1 and ρ2 is the representation on V1 ⊗ V2 that satisfies

(ρ1 ⊗ ρ2 )(g) = ρ1 (g) ⊗ ρ2 (g).

8.5 Character
The data ρ : G → GL(V ) is often too complicated to deal with. If V is finite dimensional,
then we can consider some forgetful map GL(V ) → F that simplifies the data. So far
we’ve always considered determinant when we need to choose a forgetful map. However
determinant does not work quite well in this case (see the problem set for explanations).
In this specific situation, it turns out that taking the trace works better.
Definition 8.5.1. Suppose that ρ is a finite dimensional representation of G. The
character χρ of ρ is a function from G to F such that

χρ (g) = tr ρ(g) ∀g ∈ G.

If ρ is irreducible, then we say that χ is irreducible.

Property 8.5.1. Suppose that ρ1 , ρ2 are two finite dimensional representations of G


over the same field. Then χρ1 ⊕ρ2 = χρ1 + χρ2 and χρ1 ⊗ρ2 = χρ1 χρ2 .

If G is finite, then we know that for each g ∈ G there exists k ∈ N such that ρ(g)ᵏ = id.
This shows that if ρ is over C, then all the eigenvalues of ρ(g) must be k-th roots of
unity. This tells us that:
Property 8.5.2. If ρ is a representation over C of G of dimension n and G is finite,
then χρ(g) is a sum of n roots of unity. Moreover, we have χρ(g⁻¹) = \overline{χρ(g)}.


Although we choose to forget something, we still hope that χ has some interaction
with the structure of G. This can be achieved by realizing χρ (gh) = tr ρ(g)ρ(h) =
tr ρ(h)ρ(g) = χρ (hg). We can rewrite this in a slightly different way:
Property 8.5.3. Suppose that g, g ′ are conjugates in G, then χρ (g) = χρ (g ′ ).

Definition 8.5.2. A (complex-valued) class function f on G is a function from G to C


such that if g, g ′ belong to the same conjugacy class, then f (g) = f (g ′ ).

Corollary 8.5.1. Suppose that ρ is a finite dimensional representation of G, then the


character of ρ is a class function on G.

Property 8.5.4. If |G| is finite, then the class functions on G form a vector space over
C. Its dimension is the class number of G.

Example 8.5.1. Let’s get back to our favorite example S3 . We want to evaluate the
possible irreducible characters of S3 . Note that isomorphic representations give the same
characters, so we only need to consider the non-isomorphic irreducible representations.
There are two one-dimensional characters, namely the identity and the sign map. To-
gether with the two-dimensional irreducible representation that we have already dis-
cussed, we have three irreducible representations of S3 now.
Besides, there are three conjugacy classes of S3 . Since characters are class functions,
we only need to determine the values they take on the identity, (1 2) and (1 2 3). By
simple calculation, we know that the three characters are as follows:

e (1 2) (1 2 3)
χ1 1 1 1
χ2 1 −1 1
χ3 2 0 −1

It is easy to see that χ1 , χ2 , χ3 actually form a basis of the class functions. More strikingly,
if we consider the inner product
⟨f, g⟩ = (1/|G|) ∑_{x∈G} f(x)\overline{g(x)},

then χ1 , χ2 , χ3 actually form an orthonormal basis! This tells us that maybe considering
the character that we define can give us some really strong statements that hold, and we
will prove some of them in the next section.
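
One can check the orthonormality claim directly in code (my own enumeration; the characters of S₃ are real, so the conjugation in the inner product is omitted):

    from itertools import permutations

    G = list(permutations(range(3)))  # the six elements of S3

    def conjugacy_class(p):
        # in S3 the number of fixed points determines the conjugacy class
        fixed = sum(1 for i, x in enumerate(p) if i == x)
        return {3: 'e', 1: '(1 2)', 0: '(1 2 3)'}[fixed]

    chi = {  # values of (chi1, chi2, chi3) on each class, read off the table above
        'e': (1, 1, 2),
        '(1 2)': (1, -1, 0),
        '(1 2 3)': (1, 1, -1),
    }

    def inner(i, j):
        return sum(chi[conjugacy_class(g)][i] * chi[conjugacy_class(g)][j]
                   for g in G) / len(G)

    assert all(inner(i, j) == (1 if i == j else 0)
               for i in range(3) for j in range(3))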

8.6 Orthonormality of Irreducible Characters


In this section, we will prove the orthonormality that we discovered in the last section.
Theorem 8.6.1. (Row orthogonality) Suppose that G is a finite group. Then all irre-
ducible characters over C of G form an orthonormal basis of the space consisting of class
functions on G equipped with the inner product
⟨f, g⟩ = (1/|G|) ∑_{x∈G} f(x)\overline{g(x)}.


Sketch of Proof. We first prove that the irreducible characters form an orthonormal set.
Suppose that ρ, ρ′ are two irreducible representations of finite dimension on V and V ′,
respectively. Then

⟨χρ, χρ′⟩ = (1/|G|) ∑_{g∈G} χρ(g)\overline{χρ′(g)} = (1/|G|) ∑_{g∈G} χρ(g)χρ′(g⁻¹) = tr ((1/|G|) ∑_{g∈G} ρ(g) ⊗ ρ′(g⁻¹)).

Note that

(1/|G|) ∑_{g∈G} ρ(g) ⊗ ρ′(g⁻¹)

is a linear operator on V ⊗ V ′, which is isomorphic to L(V ′, V ), the vector space consisting
of all linear transformations from V ′ to V . Let T be any linear transformation from V ′ to
V , and let ϕ be any isomorphism from L(V ′ , V ) to V ⊗ V ′ . It is easy to verify that
(ρ(g) ⊗ ρ′ (g −1 ))(ϕ(T )) = ϕ(ρ(g)T ρ′ (g −1 )).
Let φ be the linear operator on L(V′, V) such that
$$\varphi(T) = \frac{1}{|G|} \sum_{g \in G} \rho(g) T \rho'(g^{-1}),$$

then we have ⟨χρ , χρ′ ⟩ = tr φ. The main observation here is that any linear transformation
in im φ is equivariant. Therefore it is a direct corollary of Schur’s lemma that if ρ and ρ′
are not isomorphic, then ⟨χρ, χρ′⟩ = 0. If ρ and ρ′ are isomorphic, then we can WLOG
assume that ρ = ρ′. Schur's lemma then shows that φ(T) = λ_T id for every T. Moreover,
it is easy to verify that φ(id) = id and φ ∘ φ = φ, so φ is a projection onto the span of
id. Therefore ⟨χρ, χρ′⟩ = tr φ = 1, as desired.
Now it remains to show that all the irreducible characters indeed form a basis. If f
is a class function such that
⟨f, χ⟩ = 0
for any irreducible character χ, we have to show that f = 0. Let ρ be an arbitrary
irreducible representation of G over C. Then it is easy to verify that the linear operator
$$T_\rho = \frac{1}{|G|} \sum_{g \in G} f(g)\rho(g^{-1})$$

is equivariant. Also
tr Tρ = ⟨f, χρ ⟩ = 0.
Hence by Schur’s lemma Tρ = 0. Now consider the regular representation ρreg . By
Maschke’s theorem we know that ρreg can be decomposed as a direct sum of irreducible
representations. Therefore the linear operator
1 ∑
Tρreg = f (g)ρreg (g −1 )
|G| g∈G

is also zero. Now since ρreg (g) is linearly independent, it is clear that f = 0.
Corollary 8.6.1. Suppose that G is a finite group. Then the character of a repre-
sentation determines the representation uniquely up to isomorphism. In particular, if
ρ has character χ and χ1, . . . , χn are all the irreducible characters, with ρi the irreducible
representation affording χi, then
$$\rho \cong \bigoplus_{i=1}^{n} \langle \chi, \chi_i \rangle \rho_i,$$
where ⟨χ, χi⟩ρi denotes the direct sum of ⟨χ, χi⟩ copies of ρi.


Corollary 8.6.2. Suppose that G is a finite group. For any irreducible character χ,
we have ⟨χ, χreg ⟩ ̸= 0. In other words, we can get all the irreducible representations by
decomposing the regular representation.

Corollary 8.6.3. Suppose that G is a finite group, χ1, . . . , χn are all the irreducible
characters, and di is the dimension of ρi. Then $\sum_{i=1}^{n} d_i^2 = |G|$.

Sketch of Proof. It is clear that ⟨χi, χreg⟩ = tr ρi(1) = di. Therefore
$$|G| = \dim \rho_{\mathrm{reg}} = \sum_{i=1}^{n} \langle \chi_i, \chi_{\mathrm{reg}} \rangle \dim \rho_i = \sum_{i=1}^{n} d_i^2.$$

Corollary 8.6.4. If G is finite and χ is a character, then χ is irreducible if and only
if ⟨χ, χ⟩ = 1.

Corollary 8.6.5. (Column orthogonality) For any g, g′ ∈ G, we have
$$\sum_{\chi} \chi(g)\overline{\chi(g')} = \begin{cases} 0, & C_g \neq C_{g'} \\ |C_G(g)|, & C_g = C_{g'} \end{cases}$$
where the sum runs over all the irreducible characters, Cg denotes the conjugacy class
of g, and CG(g) denotes the centralizer of g in G.

Sketch of Proof. Let A be the square matrix with entries
$$A_{ij} = \frac{\chi_i(g_j)}{\sqrt{|C_G(g_j)|}}$$
where χi is the i-th irreducible character and gj is a representative of the j-th conjugacy
class. Then from the (row) orthogonality we see that A is unitary, and so we can deduce
the column orthogonality from this.
There is another property that is very useful when we want to find the character table
of a given group. However, the proof is hard and so we will omit the proof here.
Theorem 8.6.2. Suppose that ρ is an irreducible representation of a finite group G over
C. Then dim ρ divides |G|.

Example 8.6.1. Let’s use the theorem that we proved to evaluate the character table
of A4 (or equivalently the tetrahedral group T ). We know that there are four conju-
gacy classes of A4 , where id, (1 2 3), (1 3 2), (1 2)(3 4) are the four representatives.
Therefore there are four irreducible characters. Suppose that di is the dimension of the
i-th irreducible character; then $d_1^2 + d_2^2 + d_3^2 + d_4^2 = 12$. The only solution to this is
d1 = d2 = d3 = 1, d4 = 3. We can then conclude that the character table is of the
following form.

     id   (1 2 3)   (1 3 2)   (1 2)(3 4)
χ1    1      1         1          1
χ2    1      a         b          c
χ3    1      a′        b′         c′
χ4    3      x         y          z


Note that A4/K4 ≅ C3, where K4 is the Klein four-group. This directly provides us with
three 1-dimensional characters. Therefore we can fill in some more information:

     id   (1 2 3)   (1 3 2)   (1 2)(3 4)
χ1    1      1         1          1
χ2    1      ω         ω²         1
χ3    1      ω²        ω          1
χ4    3      x         y          z

Now we can compute χ4 by the fact that χ1 , χ2 , χ3 , χ4 are orthonormal. After some easy
calculation, one can get that x = y = 0 and z = −1.
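The completed table can be checked numerically. Here is a minimal Python sketch (the class sizes 1, 4, 4, 3 of A4 are hard-coded assumptions, and ω is taken to be a primitive cube root of unity):

    import cmath

    w = cmath.exp(2j * cmath.pi / 3)     # omega, a primitive cube root of unity
    class_sizes = [1, 4, 4, 3]           # classes of id, (1 2 3), (1 3 2), (1 2)(3 4)
    chars = [
        [1, 1, 1, 1],
        [1, w, w**2, 1],
        [1, w**2, w, 1],
        [3, 0, 0, -1],
    ]
    order = sum(class_sizes)             # |A4| = 12

    def inner(f, g):
        return sum(s * a * b.conjugate() for s, a, b in zip(class_sizes, f, g)) / order

    for i in range(4):
        for j in range(4):
            assert abs(inner(chars[i], chars[j]) - (i == j)) < 1e-9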

Example 8.6.2. Besides computing character tables, we can also use this to get
some general results about the class number. Suppose that G is a finite group of odd
order and c is its class number. Moreover, suppose that d1, . . . , dc are the dimensions of
the irreducible characters of G. Each di divides |G| and is therefore odd, so di² ≡ 1 (mod 8).
Hence
$$|G| = \sum_{i=1}^{c} d_i^2 \equiv \sum_{i=1}^{c} 1 \equiv c \pmod 8.$$
Compared with the method used in Problem 5 in Chapter 6, this is much more straightforward.

8.7 Restricted and Induced Representation


In this and the next sections, we are going to introduce some ways to construct new
representations out of some given one. This allows us to get more representations of a
given group, and maybe it is easier to decompose the representations to irreducible ones
(than decomposing the regular representations).
Definition 8.7.1. Given a representation ρ of G on V and a subgroup H of G, the
restriction ρ|H of ρ to H is the representation of H on V such that
$$\rho|_H(h) = \rho(h) \quad \forall h \in H.$$

This is pretty straightforward. The thing here is, can we do something that is somehow
the inverse of taking the restriction? That is, given a representation of a subgroup H,
how can we construct a representation of G from that?
Definition 8.7.2. Suppose that H is a subgroup of G, and ρ is a representation of H
on a vector space V. Let V_H^G be the vector space spanned by the formal symbols gv
(g ∈ G, v ∈ V) subject to the relations
$$g(cv_1 + v_2) = c\,gv_1 + gv_2$$
and
$$(gh)v = g(\rho(h)v)$$
for all g ∈ G, h ∈ H. Then G acts on V_H^G naturally by g(g′v) = (gg′)v for any g, g′ ∈ G.
This is a representation of G on V_H^G, and is denoted by Ind_H^G ρ. It is called the induced
representation of ρ.


We can also state the definition of Ind_H^G ρ in another way. Let gs (s ∈ S) be
representatives of the left cosets of H. Then V_H^G is simply
$$\bigoplus_{s \in S} g_s V.$$
For every g ∈ G and any s ∈ S, we know that ggs falls into some left coset of H. Let gσ(s)
be its representative, and write ggs = gσ(s)hs with hs ∈ H. Then we can define
$$g(g_s v) = g_{\sigma(s)}(\rho(h_s)v).$$

Example 8.7.1. Suppose that the subgroup H of G is trivial and ρ is the trivial
representation of H. Then Ind_H^G ρ is the regular representation of G.

Property 8.7.1. If ρ is a representation of G on V with finite dimension and χ is the
character of ρ, then the character of ρ|H is χ|H.

Property 8.7.2. (Frobenius Formula) If H is a subgroup of a finite group G, ρ is a
representation of H on V with finite dimension, and χ is the character of ρ, then the
character Ind_H^G χ of Ind_H^G ρ is
$$\mathrm{Ind}_H^G \chi(x) = \frac{1}{|H|} \sum_{\substack{g \in G \\ gxg^{-1} \in H}} \chi(gxg^{-1}).$$

Sketch of Proof. Since we can decompose V_H^G into ⊕_{s∈S} gsV, it suffices to calculate the
traces block-wise and sum them up. For a fixed x, the trace of Ind_H^G ρ(x) on gsV is
zero if σ(s) ≠ s. Therefore we only need to consider the case σ(s) = s. In this case,
gs^{-1}xgs ∈ H and the trace is χ(gs^{-1}xgs). Note that all the elements in gs's coset give the
same contribution. Therefore
$$\mathrm{Ind}_H^G \chi(x) = \sum_{\substack{s \in S \\ g_s^{-1}xg_s \in H}} \chi(g_s^{-1}xg_s) = \frac{1}{|H|} \sum_{\substack{g \in G \\ gxg^{-1} \in H}} \chi(gxg^{-1}).$$
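To make the formula concrete, here is a minimal Python sketch (permutations of {0, 1, 2} encoded as tuples, and H taken to be the copy of S2 fixing 2, with its trivial character; the helper names are ours):

    from itertools import permutations

    G = list(permutations(range(3)))              # S3, permutations of {0, 1, 2}

    def compose(p, q):
        # (p o q)(i) = p(q(i))
        return tuple(p[q[i]] for i in range(3))

    def inverse(p):
        r = [0, 0, 0]
        for i, pi in enumerate(p):
            r[pi] = i
        return tuple(r)

    H = {(0, 1, 2), (1, 0, 2)}                    # S2 = {id, (0 1)} inside S3
    chi = {h: 1 for h in H}                       # trivial character of H

    def induced_char(x):
        # Frobenius formula: (1/|H|) * sum over g in G with g x g^{-1} in H
        conj = [compose(compose(g, x), inverse(g)) for g in G]
        return sum(chi[c] for c in conj if c in H) / len(H)

    # Values on id, a transposition, a 3-cycle:
    assert [induced_char(x) for x in [(0, 1, 2), (1, 0, 2), (1, 2, 0)]] == [3, 1, 0]

Comparing with the character table of S3 computed earlier, the induced character (3, 1, 0) decomposes as χ1 + χ3.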

We can furthermore extend this to arbitrary class functions.

Definition 8.7.3. Suppose that H is a subgroup of a finite group G, and f is a class
function on H. Then the induced class function Ind_H^G f is the function
$$\mathrm{Ind}_H^G f(x) = \frac{1}{|H|} \sum_{\substack{g \in G \\ gxg^{-1} \in H}} f(gxg^{-1}).$$

With this property/definition, we can relate the inner product of the characters on G
to the one of the characters on H.
Theorem 8.7.1. (Frobenius Reciprocity) Suppose that G is a finite group, H is a
subgroup of G, and χ, ϕ are class functions on G, H, respectively. Then
$$\langle \chi|_H, \phi \rangle_H = \langle \chi, \mathrm{Ind}_H^G \phi \rangle_G.$$


Sketch of Proof.
$$\begin{aligned}
\langle \chi, \mathrm{Ind}_H^G \phi \rangle_G &= \frac{1}{|G|} \sum_{x \in G} \chi(x)\overline{\mathrm{Ind}_H^G \phi(x)} \\
&= \frac{1}{|G||H|} \sum_{x \in G} \sum_{\substack{y \in G \\ yxy^{-1} \in H}} \chi(x)\overline{\phi(yxy^{-1})} \\
&= \frac{1}{|H|} \sum_{x \in H} \frac{1}{|G|} \sum_{y \in G} \chi(y^{-1}xy)\overline{\phi(x)} \\
&= \frac{1}{|H|} \sum_{x \in H} \chi(x)\overline{\phi(x)} \\
&= \langle \chi|_H, \phi \rangle_H.
\end{aligned}$$

Example 8.7.2. Let H = {e} and ϕ be the trivial character. Then Ind_H^G ϕ is the
regular character χreg. For every character χ of G, the restriction χ|H is the constant
function with value dim χ. Hence,
$$\langle \chi, \chi_{\mathrm{reg}} \rangle_G = \langle \chi|_H, \phi \rangle_H = \dim \chi.$$

8.8 Dual Representation


Suppose that ρ is a representation of G on V with finite dimension. We can try to
think about what a reasonable way to construct a representation on the dual space V ∗ is.
Choose a basis of V and the corresponding dual basis of V ∗ . Then for any v ∈ V, f ∈ V ∗
we have
f(v) = f^T v, where f and v are identified with their coordinate vectors.
The representation ρ tells us how G can act on the elements in V , and what we want is
another representation ρ∗ that tells us how G can act on the elements in V ∗ . Naturally
we hope that
f (v) = (ρ∗ (g)f )(ρ(g)v)
holds for all g ∈ G, v ∈ V, f ∈ V ∗ . This tells us that

$$\rho^*(g)^T \rho(g) = I_n.$$

Definition 8.8.1. Suppose that ρ is a representation of G on V with finite dimension.


Then the dual representation ρ∗ of ρ is a representation of G on V ∗ such that

$$\rho^*(g) = \rho(g^{-1})^T.$$

Property 8.8.1. Suppose that χ, χ∗ are the characters of ρ and ρ∗, respectively. Then
$\chi^* = \overline{\chi}$.

Property 8.8.2. ρ∗∗ is isomorphic to ρ.

Property 8.8.3. ρ is irreducible if and only if ρ∗ is irreducible.


Sketch of Proof. Suppose that W is a G-invariant non-trivial proper subspace of V.
Let W′ be the subspace of V∗ consisting of the f satisfying f(w) = 0 for all w ∈ W. Then
for any f ∈ W′ and w ∈ W,
$$(\rho^*(g)f)(w) = f(\rho(g^{-1})w) = 0$$
holds for all g ∈ G. Therefore W′ is also G-invariant, and it is non-trivial and proper
since dim W′ = dim V − dim W. This shows that if ρ∗ is irreducible then so is ρ. The
converse then follows from the fact that ρ∗∗ is isomorphic to ρ.
Definition 8.8.2. A representation ρ is self-dual if ρ∗ ≅ ρ.

Theorem 8.8.1. Suppose that |G| is odd. Then the only self-dual irreducible repre-
sentation of G is the trivial representation.

Sketch of Proof. Let ρ be a self-dual irreducible representation. If ρ is not trivial, then
we have
$$\sum_{g \in G} \chi_\rho(g) = 0$$
by the orthogonality with the trivial character. Now for any g ∈ G that has order d, we
know that the eigenvalues of ρ(g) are d-th roots of unity. Therefore
$$\sum_{\substack{1 \le d' \le d \\ \gcd(d', d) = 1}} \chi_\rho(g^{d'})$$
is an integer. Since ρ is self-dual, we know that ρ(g) and ρ(g^{-1}) share the same eigenvalues.
This shows that we can pair up the non-real eigenvalues of ρ(g) and get that
$$\sum_{\substack{1 \le d' \le d \\ \gcd(d', d) = 1}} \chi_\rho(g^{d'})$$
is even unless g is the identity. Therefore
$$\sum_{g \in G} \chi_\rho(g)$$
is congruent to χρ(1) modulo 2. This is absurd since χρ(1) = dim ρ divides |G| and is
therefore odd.

8.9 Random Problem Set


1. (8.2) Consider the group G = Z/pZ and the representation ρ : G → GL2(Fp) such
that
$$\rho(g) = \begin{pmatrix} 1 & g \\ 0 & 1 \end{pmatrix} \quad \forall g \in G.$$
Show that ρ is reducible but cannot be decomposed as a direct sum of two subrep-
resentations.
2. (8.2) Consider the group G = (C, +) and the representation ρ : G → GL2(C) such
that
$$\rho(g) = \begin{pmatrix} 1 & g \\ 0 & 1 \end{pmatrix} \quad \forall g \in G.$$
Show that ρ is reducible but cannot be decomposed as a direct sum of two subrep-
resentations.


3. (8.2) Let S3 act on C3 by permuting the coordinates, and let W be the subspace
whose coordinates sum to zero. Let ρ : S3 → GL(W ) be the corresponding repre-
sentation. Choose a basis of W and write the representation explicitly. Find an
inner product on W such that ρ is unitary. Determine if ρ is irreducible or not.

4. (8.2) Show that if A, B are two commuting diagonalizable linear operators on a
finite dimensional vector space, then they are simultaneously diagonalizable. That
is, there exists a basis that diagonalizes both A and B. Using this, prove that every
finite dimensional representation of a finite abelian group over C decomposes as a
direct sum of one dimensional representations.

5. (8.2) Suppose that G is a finite abelian group. Show that the number of irreducible
representations of G over C is |G|.
Hint: You might need to use the fundamental theorem of finite abelian groups.
Google it if you don't know what that is.

6. (8.3) Show that Schur’s lemma does not hold if we replace C with R.

7. (8.5) Suppose that ρ is a finite dimensional representation of G. Show that the


composition det ◦ρ is a one dimensional representation of G.

8. (8.6) Construct explicitly all the irreducible representations of A4 .

9. (8.6) Suppose that N is a normal subgroup of a finite group G, and ρ is an irreducible


representation of G/N over C. Show that the representation ρ◦π of G is irreducible,
where π is the canonical projection from G to G/N . Does this hold if we don’t
require G is finite and ρ is over C?

10. (8.6) Show that for any prime p, any group with order p2 is abelian. This time, use
the character theory to prove this.

11. (8.8) Let c be the number of conjugacy classes of a finite group G. Show that if G
is of odd order, then
|G| ≡ c mod 16.

Chapter 9

Ring

We have learned about groups in the preceding chapters. However, in many situations
we might want to know not only about a single law of composition but also about the
interaction between two laws of composition. For example, if we consider the integers,
then we can consider the addition and multiplication at the same time. The abstract
concept capturing this is called a ring.
One can view a field as a ring in which we can divide by any non-zero element.
However, fields and general rings are substantially distinct from each other. In fact, the
structure of rings is so complicated that usually one is only interested in some particular
classes of rings. In this chapter, we will introduce some basic properties of rings and
various extra constraints that we might like to add to the structure of a ring.

9.1 Definitions and Examples


Intuitively, a ring is an algebraic structure that is equipped with addition and multipli-
cation.
Definition 9.1.1. If R is a set equipped with two laws of composition + and ·, then
we say that (R, +, ·) is a ring if the following conditions are met:
(1) (R, +) is an abelian group;
(2) (associativity) (a · b) · c = a · (b · c);
(3) (distributivity) (a + b) · c = a · c + b · c and a · (b + c) = a · b + a · c.

Remark. Here we don’t require a ring to have a multiplicative identity. Some authors
tend to assume that ring has a multiplicaitive identity. To avoid ambiguity, we will from
now on call a ring that does not necessarily have a multiplicative identity “rng”, and call
a ring that has a multiplicative identity “ring with 1”.

Example 9.1.1. (Z, +, ×), (F, +, ×), (F [x], +, ×), (Z/nZ, +, ×), Mn×n (F ) are rings with
1. (xF [x], +, ×) is a rng. (N, +, ×), (Z, +, −) are not rngs.

Property 9.1.1. Suppose that R is a rng and 0 is the additive identity. Then 0 · x = 0
for all x ∈ R. Besides, if we denote the sum of n x’s by nx (where n can be negative),
then (nx) · y = n(x · y).

Sketch of Proof. 0 · x = (0 + 0) · x = 0 · x + 0 · x, and so 0 · x = 0.


If n is positive, then (nx)·y = x·y + · · · + x·y = n(x·y). If n is negative, then it suffices to
show that (−x)·y = −(x·y), which is true because (−x)·y + x·y = (−x + x)·y = 0.

From now on, we will drop · if it does not lead to any ambiguity.
Unlike the case of fields, there might be two non-zero elements whose product is zero.
For example, 2 · 3 = 0 in Z/6Z.
Definition 9.1.2. If R is a rng and a ∈ R is an element such that there exists b ∈ R\{0}
satisfying ab = 0, then a is called a left zero divisor of R. If there exists b ∈ R\{0} such
that ba = 0, then a is called a right zero divisor. If both cases occur simultaneously,
then a is called a two-sided zero divisor. If an element is not a zero divisor, then it is
said to be regular.

Definition 9.1.3. If R is a ring with 1 and a is an element such that the multiplicative
inverse exists, then we call a a unit.

Example 9.1.2. For every field, every nonzero element is a unit.

Property 9.1.2. For every ring with 1, the units form a multiplicative group.

In the case of groups, we were sometimes interested in the situation where the elements
commute with each other. We would sometimes like to restrict ourselves to this case for
rings too, as it simplifies the discussion a lot and is necessary for some extra results.
Definition 9.1.4. If xy = yx for any x, y ∈ R, then R is commutative. If R is
commutative and is a ring with 1, then we say that R is a commutative ring with 1.

Definition 9.1.5. If R is a ring with 1 whose non-zero elements are all regular, then R
is said to be a domain. If it is furthermore commutative, then R is an integral domain.

The name “integral domain” comes from the fact that integral domains share a lot of
properties with Z. This will be more clear in the following sections.
Property 9.1.3. If R is a finite integral domain, then R is a field.

Sketch of Proof. Let x be a non-zero element. Then the map a ↦ ax is injective since x
is not a zero divisor. Since R is finite, this shows that the map is bijective, and so there
exists a such that ax = 1. Since R is commutative, a is the multiplicative inverse of x,
and so R is a field.
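To illustrate the argument, here is a minimal Python sketch for R = Z/7Z: for each nonzero x, the map a ↦ ax is checked to be injective (hence bijective on a finite set), and an inverse is extracted from its image:

    # For each nonzero x in Z/7Z, a -> a*x is injective, hence bijective,
    # so 1 lies in its image.
    p = 7
    for x in range(1, p):
        image = {(a * x) % p for a in range(p)}
        assert len(image) == p                      # injective => bijective
        inv = next(a for a in range(p) if (a * x) % p == 1)
        assert (x * inv) % p == 1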

Remark. In fact, we just need the condition that R is a finite domain. In this case,
this statement is Wedderburn’s little theorem. This is not relevant to the content of this
note and so is not included. The readers are welcome to google for it if interested.

9.2 Ring Homomorphism


Now that we have a new algebraic object, it’s time to determine what a map is and what
a subobject is. Let’s first determine what sort of maps are considered.
Definition 9.2.1. Suppose that R and R′ are two rngs. A ring homomorphism ϕ :
R → R′ is a group homomorphism that satisfies ϕ(x)ϕ(y) = ϕ(xy) for any x, y ∈ R. In


other words, a group homomorphism ϕ : R → R′ is a ring homomorphism if the following
diagram commutes.

      R × R ──(ϕ, ϕ)──▶ R′ × R′
        │                  │
        ·                  ·
        ▼                  ▼
        R ───────ϕ──────▶  R′

Remark. In the context where rings are required to have multiplicative identities, we
usually furthermore require that a ring homomorphism send the multiplicative identity to
the multiplicative identity. There are examples in which a ring homomorphism does not
send 1 to 1, but it is easy to verify that a ring homomorphism always sends 1 to the 1 of
its image.

Example 9.2.1. For any n ∈ N, consider the canonical projection π : Z → Z/nZ.


Then π is a ring homomorphism.

Example 9.2.2. Consider the map ϕ : Z/3Z → Z/6Z such that ϕ(x + 3Z) = 4x + 6Z.
Then ϕ is a ring homomorphism. Note that ϕ(1 + 3Z) = 4 + 6Z, which is not the
multiplicative identity of Z/6Z but the multiplicative identity of the subring im ϕ =
2Z/6Z.
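A minimal Python check of this example (cosets represented by their least non-negative representatives):

    def phi(x):
        return (4 * x) % 6

    for x in range(3):
        for y in range(3):
            assert phi((x + y) % 3) == (phi(x) + phi(y)) % 6   # additive
            assert phi((x * y) % 3) == (phi(x) * phi(y)) % 6   # multiplicative

    # phi(1) = 4 is idempotent (4*4 = 16 = 4 mod 6): the identity of 2Z/6Z.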

Definition 9.2.2. Suppose that R, R′ are two rngs. We say that R, R′ are isomorphic
if there exists a bijective ring homomorphism between R, R′ .

9.3 Subring and Ideal


Definition 9.3.1. If (R, +, ·) is a rng and S is a subset in R such that (S, +, ·) is also
a rng, then we say that S is a subring of R.

Remark. Usually if one assumes that an ordinary ring should contain a multiplicative
identity, then one would require a subring to contain the same multiplicative identity.
This restriction however makes some ideals not subrings. Also as we will see, if R is a
ring with 1 and S is a subring of R that is also a ring with 1, then 1S is not necessarily
1R . In commutative algebra the restriction that the subring contains the exactly same
multiplicative identity makes sense, but for our purpose this will not be needed.

Example 9.3.1. Consider Z/6Z, which is a ring with 1. The subset 2Z/6Z is a subring with 1.
However, the multiplicative identity of Z/6Z is 1 + 6Z, while the multiplicative identity
of 2Z/6Z is 4 + 6Z.

The concept of subring turns out to be somewhat less useful, unlike the case of subgroups.
Instead, we are more interested in ideals, the ring analogue of normal subgroups.
To see how we should define an ideal, let's see what we need for "quotient rings" to make
sense.


Property 9.3.1. Let I be a subring of R (which tells us that (I, +) is a (normal)


subgroup of (R, +)). Then xI, Ix ⊆ I for all x ∈ R if and only if we can endow R/I
(which already has a group structure) with a ring structure such that the canonical
projection is a ring homomorphism. In other words, xI, Ix ⊆ I for all x ∈ R if and only
if we can define · : R/I × R/I → R/I such that the following diagram commutes:

      R × R ──(π, π)──▶ R/I × R/I
        │                  │
        ·                  ·
        ▼                  ▼
        R ───────π──────▶  R/I

Sketch of Proof. It is clear that we have to define (x + I)(y + I) to be (xy + I). We have
to check what it means for this to be well-defined. That is, when does

(x + i1 )(y + i2 ) − xy ∈ I

for every i1 , i2 ∈ I? Since I is a subring, it is equivalent to i1 y + xi2 ∈ I. Since we can


take x = 0 and then y = 0, it is clear that this is equivalent to xI, Ix ⊆ I.

Definition 9.3.2. If I is a subring of a rng R such that IR, RI ⊆ I, then we say that
I is an ideal of R.

Property 9.3.2. If Is (s ∈ S) is a collection of ideals of R indexed by S, then ∩_{s∈S} Is
is an ideal. If furthermore this collection of ideals forms a chain with respect to inclusion
(i.e. Is ⊆ Is′ or Is′ ⊆ Is for any s, s′ ∈ S), then ∪_{s∈S} Is is an ideal.

Therefore it makes sense to define the ideal generated by a subset.


Definition 9.3.3. Suppose that S is a subset of a rng R. The ideal ⟨S⟩ generated by
S is the intersection of all ideals of R containing S.

Remark. In the world of groups we reserved the notation ⟨·⟩ for the subgroup generated
by some set. It turns out nonetheless that ideals are so much more useful than subrings
that we choose to reserve the notation for ideals instead.

Property 9.3.3. Suppose that R is a ring with 1 and u is a unit, then ⟨u⟩ = R.

Now that we’ve determined what a ring homomorphism is and what an ideal is, we
can state the isomorphism theorems in terms of ring. The proofs are left as exercises.
Theorem 9.3.1. (First isomorphism theorem) Suppose that ϕ : R → R′ is a ring
homomorphism, then ker ϕ is an ideal of R and im ϕ is a subring of R′ . Moreover,
R/ ker ϕ is isomorphic to im ϕ.

Theorem 9.3.2. (Second isomorphism theorem) Suppose that S is a subring of a rng


R and I is an ideal, then S + I is a subring and S ∩ I is an ideal of S. Besides, (S + I)/I
is isomorphic to S/(S ∩ I).


Theorem 9.3.3. (Third isomorphism theorem) Suppose that I ⊆ L ⊆ R such that


I, L are both ideals of R. Then L/I is an ideal of R/I, and we have (R/I)/(L/I) is
isomorphic to R/L.

Theorem 9.3.4. (Correspondence theorem) For any epimorphism ϕ : R → R′ , the


subrings of R that contain ker ϕ are in bijection with the subrings of R′ via ϕ. Moreover
the ideals of R that contain ker ϕ are in bijection with the ideals of R′ .

9.4 Integral Domain and Divisibility


Z might be the first ring structure that one learned about. On this specific ring, we can
define divisibility, primes, greatest common divisors and least common multiples. Now that
integral domains have properties similar to those of Z, maybe we can define these concepts
on integral domains too. In this section, we will build up the number theory of integral
domains, and then we will see what further assumptions we might want to make on a good
integral domain.
Property 9.4.1. (Law of cancellation) Suppose that R is an integral domain and
a, b, c ∈ R such that c ̸= 0 and ac = bc, then a = b.

Sketch of Proof. ac = bc implies that (a − b)c = 0. Since R is an integral domain, this


implies that a − b = 0 or c = 0. The latter is false by assumption, so we’re done.

Now let’s first think over what “divisible” in Z means. If a, b are two integers such
that there exists c ∈ Z satisfying ac = b, then we say that b is divisible by a. Now it is
clear how we should generalize this.
Definition 9.4.1. Suppose that R is an integral domain, then b is divisible by a if there
exists c ∈ R such that ac = b. In this case, we say that a is a divisor of b, and b is a
multiple of a. This is denoted as a|b.

Property 9.4.2. The only multiple of 0 is 0. Any element is a divisor of 0. If u is a
unit, then the only divisors of u are the units, and any element is a multiple of u.

Property 9.4.3. If a|b and b|c, then a|c. For any a, b, c ∈ R, we have that a|b if and
only if a|(b + ca).

In Z, we know that if a|b and b|a, then a = ±b. In other words, we cannot differentiate
a from −a if we only use divisibility. The underlying reason is that ±1 are the only units
in Z. A similar phenomenon occurs in the general case.
Property 9.4.4. If a|b and b|a, then there exists a unit u such that b = au.

Sketch of Proof. Let b = au and a = bv. Then a = uva, and so by the law of cancellation
we have a = 0 or uv = 1. If uv = 1, then we’re done. If a = 0, then it is clear that b = 0,
and so the statement still holds.

Definition 9.4.2. For any a, b in an integral domain R, we say that a, b are associates
if there exists a unit u such that b = au.


Property 9.4.5. If we define a ∼ b if and only if a, b are associates, then ∼ is an
equivalence relation.

Example 9.4.1. Consider the integral domain C[x]. We know that x|2x and 2x|x, and
so x, 2x are associates.

Next, let’s think about what greatest common divisor and least common multiple
mean. The greatest common divisor d of a, b is the greatest number among all the
common divisors of a, b, i.e. d is the largest number that satisfies d|a and d|b. However,
we do not necessarily have a good well-ordering on the integral domain R, so let’s try to
describe the greatest common divisor only with divisibility.
If we assume the fundamental theorem of arithmetic, then we know that if d is the
greatest common divisor of a, b and x is an arbitrary common divisor of a, b, then x|d.
Note that this statement only involves divisibility, so let’s make this the definition of
greatest common divisor. We can make a similar definition for least common multiple.
Definition 9.4.3. Let a, b be two elements in an integral domain R. For any x ∈ R, if
x|a, x|b then we say that x is a common divisor of a, b. If a|x, b|x then we say that x is a
common multiple of a, b. If d is a common divisor of a, b such that any common divisor of
a, b divides d, then we say that d is the greatest common divisor of a, b. If l is a common
multiple of a, b such that any common multiple is divisible by l, then we say that l is the
least common multiple.

Property 9.4.6. (Uniqueness of gcd and lcm) If d, d′ are both gcd’s of a, b, then d, d′
are associates. Similarly, if l, l′ are both lcm’s of a, b, then l, l′ are associates.

Sketch of Proof. Since d′ is itself a gcd of a, b, we have by definition that d′ |d. Similarly
d|d′ , and so d, d′ are associates. Same argument works for lcm.

Note that with this definition, the greatest common divisor need not be positive in
Z. The reason that we always make gcd and lcm positive is simply that we prefer
positive integers to negative ones. However, in the general setting there is no particular
reason to favor one associate over another.
Besides, at this point it is unclear whether gcd and lcm always exist. In fact,
there are a lot of scenarios in which gcd and lcm do not exist. Soon we will see that if we
add some constraints to the integral domain R, then gcd and lcm are guaranteed to exist.
Now let’s look at the definition of prime numbers in Z. The usual definition is that
p is prime if p cannot be factored into ab such that a, b ̸= ±1. In the general setting
however, we are used to calling this property irreducible.
Definition 9.4.4. A non-zero non-unit element a ∈ R is irreducible if it cannot be
written as a = xy where x, y are not units.

The term prime is reserved for another property that the primes in Z also have.
Definition 9.4.5. A non-zero non-unit element a ∈ R is prime if for any b, c ∈ R such
that a|bc, we always have a|b or a|c.

Property 9.4.7. A prime element is always irreducible.


Sketch of Proof. If p is prime in R and p = xy, then p | xy, so WLOG assume that p|x.
Since x|p, we have that p, x are associates, and so y is a unit.

9.5 Ideal and Divisibility


We continue to keep the assumption that R is an integral domain in this section. In this
section, we will see the connection between ideal and divisibility.
Property 9.5.1. If R is an integral domain, then for every x ∈ R we have ⟨x⟩ = xR.

Sketch of Proof. Actually we just need that R is a commutative ring with 1. It is easy
to verify that xR is an ideal that contains x, so it remains to show that if x ∈ I for an
ideal I, then xR ⊆ I. This is clear by the definition of an ideal.

Corollary 9.5.1. For any elements a, b ∈ R, we have a|b if and only if ⟨b⟩ ⊆ ⟨a⟩.

As a consequence, an element d is a common divisor of a, b if and only if ⟨a⟩ ⊆ ⟨d⟩


and ⟨b⟩ ⊆ ⟨d⟩. Since ⟨d⟩ is an ideal, we have that ⟨a, b⟩ ⊆ ⟨d⟩.
Property 9.5.2. Suppose that I1 , I2 are two ideals in an integral domain R, then
⟨I1 , I2 ⟩ = I1 + I2 .

The question here is: how should we find the greatest common divisor? We have to
find an element d such that ⟨d⟩ is closest to the ideal ⟨a, b⟩. If there exists d such that
⟨d⟩ = ⟨a, b⟩, then we know that d must be the greatest common divisor. Similarly, if there
exists l such that ⟨l⟩ = ⟨a⟩ ∩ ⟨b⟩, then l is the least common multiple. We naturally hope
that every ideal in R can be expressed as the form ⟨d⟩, or in other words, is “principal”.
Definition 9.5.1. An ideal I is principal if there exists x ∈ I such that I = ⟨x⟩.

Definition 9.5.2. An integral domain R is a principal ideal domain (PID) if every ideal
in R is principal.

Property 9.5.3. If R is a principal ideal domain, then every two elements a, b ∈ R


have gcd and lcm.

Example 9.5.1. As we will soon see, Z is a PID. Now let's take a = 12 and b = 18.
Then 12Z + 18Z = 6Z, and so gcd(12, 18) = 6. Also 12Z ∩ 18Z = 36Z, and so
lcm(12, 18) = 36.

Theorem 9.5.1. (Bezout’s theorem) Suppose that R is a PID. Then for any a, b, c ∈ R,
there exist x, y ∈ R such that ax + by = c if and only if gcd(a, b)|c.

Sketch of Proof. There exist x, y ∈ R such that ax + by = c if and only if c ∈ ⟨a⟩ + ⟨b⟩.
Since R is PID, we know that ⟨a, b⟩ = ⟨d⟩ where d = gcd(a, b), and so the condition is
equivalent to c ∈ ⟨d⟩, which is equivalent to d|c.
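In Z, the gcd and the coefficients x, y can be computed together by the extended Euclidean algorithm. Here is a minimal Python sketch applied to the numbers of Example 9.5.1 (the function name extended_gcd is ours):

    def extended_gcd(a, b):
        # Returns (d, x, y) with a*x + b*y = d = gcd(a, b).
        if b == 0:
            return a, 1, 0
        d, x, y = extended_gcd(b, a % b)
        return d, y, x - (a // b) * y

    d, x, y = extended_gcd(12, 18)
    assert (d, x, y) == (6, -1, 1)       # 12*(-1) + 18*1 = 6
    assert 12 * x + 18 * y == d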

We can also translate the term prime into the language of ideal. If x is prime, then
we know that x|ab implies x|a or x|b. Therefore, ab ∈ ⟨x⟩ implies a ∈ ⟨x⟩ or b ∈ ⟨x⟩.


Definition 9.5.3. Let I be a proper ideal of a commutative rng R. We say that I is a


prime ideal if for every a, b ∈ R whose product is in I, we have a ∈ I or b ∈ I.

Property 9.5.4. If R is an integral domain, then ⟨x⟩ is a prime ideal if and only if
x = 0 or x is prime.

Property 9.5.5. Suppose that R is a commutative ring with 1 and I is an ideal of R.


Then I is prime if and only if R/I is an integral domain.

Sketch of Proof. Passing to R/I amounts to regarding the elements of I as 0. The
condition that I is prime therefore becomes that R/I has no non-zero zero divisors,
which is equivalent to R/I being an integral domain.

Remark. One can see that using ideals to discuss divisibility gives us a lot of benefits,
mainly that we don't have to care about associates anymore: if a and b are associates,
then ⟨a⟩ = ⟨b⟩. Also, divisibility and the related operations can be replaced with taking
subsets, sums and intersections of ideals. One thing that is worth noticing is that we can
also define a multiplication on ideals, and we would naturally hope that I1 ⊆ I2 if and
only if there exists I3 such that I2I3 = I1. "Noetherian" integral domains that satisfy
this condition are called Dedekind domains, and in this kind of integral domain we can
uniquely factorize every ideal into prime ideals. These conditions sound like a lot, but one
can verify that the ring of integers (i.e. the ring consisting of the algebraic integers) of a
number field (i.e. a field lying in C containing Q such that the degree over Q is finite) is
always Dedekind. This was a very important result in the early attempts to prove Fermat's
last theorem.

Now let’s take a look at a similar notion—maximal ideal.


Definition 9.5.4. Let I be a proper ideal of a rng R. We say that I is a maximal ideal
if for every proper ideal I ′ that contains I, we have that I ′ = I.

Property 9.5.6. If R is a commutative ring with 1, then for an ideal I of R, I is


maximal if and only if R/I is a field.

Sketch of Proof. If I is maximal, then for every s ∈ R\I we have that ⟨s, I⟩ = R, which
shows that there exists s′ ∈ R such that ss′ + I = 1 + I, and so R/I is a field.
If R/I is a field, then for every s ∈ R\I we have that 1 + I ⊆ ⟨s, I⟩, and so 1 ∈ ⟨s, I⟩,
which forces ⟨s, I⟩ to be R.

Corollary 9.5.2. Any maximal ideal is a prime ideal in a commutative ring with 1.

Intuitively, every ideal is contained in a maximal ideal, because we can enlarge the
ideal if the ideal that we have is not maximal. This is justified by Zorn’s lemma when R
has a multiplicative identity.
Theorem 9.5.2. Suppose that I is a proper ideal of a ring R with 1, then there exists
a maximal ideal M of R such that I ⊆ M .

Sketch of Proof. Consider the set S of all proper ideals that contain I, with the partial
order ≤ given by I1 ≤ I2 if and only if I1 ⊆ I2. For every chain C, consider the set
$$I_C = \bigcup_{I' \in C} I'.$$
We know that IC is an ideal since it is the union of a chain of ideals. Also IC is a proper
ideal, since 1 ∉ IC. Therefore IC is in S, and so every chain has an upper bound. By Zorn's lemma,
there is a maximal element M in S. Now if M is not a maximal ideal, then there exists
an ideal M ′ such that M ⊆ M ′ and M ′ ̸= M, R. This contradicts the maximality of M
in S, and so M is maximal. Now by the definition of S we know that I ⊆ M .
Corollary 9.5.3. Any nonzero ring with 1 has a maximal ideal.

9.6 Z is a PID

In this section, we will show that Z is a PID, and on top of this, we will try to generalize
the proof and apply it to some other integral domains.
Theorem 9.6.1. Z is a PID.

Sketch of Proof. Suppose that I ⊆ Z is an ideal. If I is the zero ideal, then we're done. So
we can assume that I contains some non-zero elements. Suppose that x ∈ I is an element
that has the smallest non-zero absolute value. We claim that I = ⟨x⟩. For every s ∈ I,
we can express s as qx + r where 0 ≤ r < |x|. Since I is an ideal, we know that qx ∈ I,
and so r = s − qx ∈ I. By the minimality of |x|, we know that r = 0, and so s = qx.
Hence I = ⟨x⟩, as desired.
In this proof, we used the existence of a minimum of the absolute value. In the general
setting, however, we do not have such a thing. To do something similar, we need a
function ϕ : R → N ∪ {0} that emulates the absolute value. The property that we
hope the function has is simply that it makes the Euclidean algorithm work. If such a
function exists, then we say that the integral domain R is Euclidean.
Definition 9.6.1. An integral domain R is a Euclidean domain (ED) if there exists a
function ϕ : R → N ∪ {0} such that:
(1) ϕ(x) = 0 ⇔ x = 0;
(2) For any a, b ∈ R where b is non-zero, there exists q, r ∈ R such that a = qb + r
and ϕ(r) < ϕ(b).

Now we can modify the proof a bit to prove that an ED is always a PID.
Theorem 9.6.2. If R is an ED, then it is also a PID.

Sketch of Proof. Suppose that I ⊆ R is an ideal. If I is the zero ideal, then we're done. So
we can assume that I contains some non-zero elements. Suppose that x ∈ I is a non-zero
element such that ϕ(x) is the smallest among the non-zero elements of I. We claim that
I = ⟨x⟩. For every s ∈ I, we can express s as qx + r where ϕ(r) < ϕ(x). Since I is an
ideal, we know that qx ∈ I, and so r = s − qx ∈ I. By the minimality of ϕ(x), we know
that r = 0, and so s = qx. Hence I = ⟨x⟩, as desired.
The definition of ED seems to be overly artificial, but it turns out to be useful in some
applications.


Example 9.6.1. Consider the polynomial ring F [x] with coefficients in F . It is clear
that F [x] is Euclidean with ϕ(f ) = deg f + 1 (here we follow the convention that deg 0 =
−1). Therefore F [x] is a PID.
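The division with remainder required by condition (2) can be carried out explicitly in F[x]. Here is a minimal Python sketch over F = Q, with polynomials as coefficient lists, constant term first (the function name poly_divmod is ours):

    from fractions import Fraction

    def poly_divmod(a, b):
        # Returns (q, r) with a = q*b + r and deg r < deg b.
        r = [Fraction(c) for c in a]
        b = [Fraction(c) for c in b]
        q = [Fraction(0)] * max(len(r) - len(b) + 1, 1)
        while len(r) >= len(b) and any(r):
            shift = len(r) - len(b)
            c = r[-1] / b[-1]
            q[shift] = c
            for i, bi in enumerate(b):
                r[i + shift] -= c * bi
            while len(r) > 1 and r[-1] == 0:    # drop the zeroed leading term
                r.pop()
        return q, r

    # Divide x^3 + 1 by x^2 + 1: quotient x, remainder 1 - x.
    q, r = poly_divmod([1, 0, 0, 1], [1, 0, 1])
    assert q == [0, 1] and r == [1, -1]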

9.7 Noetherian and Existence of Factorization


In Z, the fundamental theorem of arithmetic states that for every positive integer, there
exists a factorization of it into a product of primes, and this factorization is unique up
to permutation of primes. This theorem allows us to calculate gcd, lcm and some other
functions like Euler’s φ function in a short amount of time. It is natural to think about
whether similar statement holds for some kind of integral domain.
The first question here is, should we factorize an element into a product of primes, or
a product of irreducibles? Intuitively, given a reducible element, we can split the element
into a product of two elements, and continue this procedure until we are left with only
irreducibles. Therefore it makes more sense to consider the factorization into irreducible
elements.
There is an issue to be solved in the discussion above: we have to make sure that
the procedure ends after finitely many steps. Actually, this need not be true (see the
exercises). To make sure that the procedure ends eventually, we have to add some
"finiteness" constraint to the ring.
Definition 9.7.1. Suppose that R is a rng. We say that R is Noetherian if it satisfies
the ascending chain condition: For any sequence of ideals I1 ⊆ I2 ⊆ . . ., the sequence is
eventually constant.

Property 9.7.1. If R is a rng, then the following three conditions are equivalent:
(1) Every ideal in R is finitely generated.
(2) R satisfies the ascending chain condition.
(3) R satisfies the maximal condition: For any set of ideals in R, there exists a maximal
element in a sense that it is not contained in any other ideal in the set.

Sketch of Proof. (1) ⇒ (2): Suppose that I1 ⊆ I2 ⊆ . . . is an ascending chain of ideals,
then consider the ideal I = ∪_{k∈N} Ik. Condition (1) tells us that I is finitely generated, and
so there exist a1, . . . , an ∈ I such that I = ⟨a1, . . . , an⟩. By the definition of I there exists
N such that ai ∈ Im for all 1 ≤ i ≤ n and m > N. This shows that I = ⟨a1, . . . , an⟩ ⊆ Im
for all m > N, and so the ascending chain is eventually constant.
(2) ⇒ (3): For any given set S of ideals in R, consider the partial order defined by
inclusion. For every chain C in S, we know that
$$I_C := \bigcup_{I \in C} I$$
is in S: by the ascending chain condition the chain stabilizes, so IC is itself one of the
ideals in C. Therefore every chain has an upper bound in S, and so by Zorn's lemma
there is a maximal element in S.
(3) ⇒ (1): For any ideal I in R, consider the set SI consisting of the ideals finitely
generated by elements of I. Condition (3) says that there is a maximal element I′
in SI. Now if I′ is not I, then ⟨I′, a⟩ is a strictly larger element of SI for any a ∈ I\I′,
which is a contradiction. Hence I is finitely generated.
Corollary 9.7.1. Any PID is Noetherian.


Theorem 9.7.1. (Existence of factorization into irreducibles for Noetherian domain)


Suppose that R is an integral domain that is Noetherian, then every non-zero element in
R can be factorized into the product of a unit and some irreducibles.

Sketch of Proof. Suppose that the statement is false. Then the set S consisting of non-
zero elements that cannot be factorized is non-empty. Consider the set
S ′ := {⟨x⟩|x ∈ S}.
Since R is Noetherian, by the maximal condition there is an element x ∈ S such that ⟨x⟩
is maximal in S ′ . Clearly x cannot be a unit or zero, and x cannot be irreducible either.
Therefore there exist y, z ∈ R that are not units such that x = yz. If one of y, z, say y,
cannot be factorized, then y ∈ S and ⟨x⟩ ⊆ ⟨y⟩. The maximality forces ⟨y⟩ = ⟨x⟩, which
implies that z is a unit. This is a contradiction. Therefore y, z can both be factorized,
which shows that x can also be factorized, which is again a contradiction.

9.8 PID is UFD


We proved that in a Noetherian domain, the factorization into irreducible elements al-
ways exists. The fundamental theorem of arithmetic furthermore states that in Z, the
factorization is unique in some sense. In this section, we are going to show that actually
the same uniqueness holds in any PID. But before that, we have to think about what
uniqueness actually means.
If R is a Noetherian domain, then we know that every non-zero a can be written as
$$u p_1 p_2 \cdots p_n$$
where u is a unit and p1, . . . , pn are some irreducible elements. Note that one can permute
p1, . . . , pn arbitrarily, and one can replace pi with any of its associates and let u absorb
the ratio. It seems that there are no other reasonable ways to produce new factorizations,
so let's say that the uniqueness of factorization means the factorization is unique up to
permutation and association.
Definition 9.8.1. Suppose that R is an integral domain and a can be factorized into
a product of a unit and some irreducible elements. We say that the factorization of a is
unique if for any units u, u′ and irreducibles p1 , . . . , pn , q1 , . . . , qm such that
a = up1 . . . pn = u′ q1 . . . qm
we have that m = n, and there exists a permutation π of [n] such that pi and qπ(i) are
associates.

Definition 9.8.2. If R is an integral domain such that the factorization of any non-zero
element exists and is unique, then we say that R is a unique factorization domain (UFD).

We know that the definitions of prime and irreducible coincide in Z and Z is a UFD.
It turns out that in an integral domain that factorization always exists, the domain is
UFD if and only if the definitions of prime and irreducible coincide.
Property 9.8.1. Suppose that R is an integral domain where the factorization of any
non-zero element exists, then R is a UFD if and only if every irreducible element is prime.


Sketch of Proof. If R is a UFD, then for every irreducible p and any two elements a, b such
that p|ab, write this as pc = ab and expand this into the factorization into irreducibles.
By the uniqueness of factorization of pc = ab, we know that one of a, b is divisible by an
associate of p, and hence is divisible by p. Therefore p is a prime.
Conversely, if every irreducible element p is prime, then we can prove that the factor-
ization is unique by induction on the number of irreducible elements in the factorization,
denoted by n. It clearly holds when n = 0. Now suppose that it holds when n = k, and
suppose that we can write
a = up1 . . . pk+1 = u′ q1 . . . qm .
Since pk+1 is irreducible, it is prime by assumption. Therefore there exists an i such that
pk+1 | qi. WLOG i = m, for we can permute q1, . . . , qm at will. Since qm is irreducible,
we have that pk+1, qm are associates. Suppose that qm = u′′pk+1 where u′′ is a unit. Then
since R is an integral domain, we may cancel pk+1 and get
$$u p_1 \cdots p_k = (u'u'') q_1 \cdots q_{m-1},$$
and so by the inductive hypothesis we're done.
Theorem 9.8.1. Any PID is a UFD.

Sketch of Proof. It suffices to show that every irreducible element is prime in a PID. Let
p be an irreducible element and a, b be two elements such that p|ab. Since we're in a PID,
we know that da = gcd(a, p) and db = gcd(b, p) exist. Since p is irreducible, we know
that da, db are either units or associates of p. If one of them is an associate of p then
we're done, so suppose that they are both units; WLOG da = db = 1. By Bezout's
theorem there exist xa, xb, ya, yb such that axa + pya = 1 and bxb + pyb = 1. Therefore
$$1 = (ax_a + py_a)(bx_b + py_b) = abx_ax_b + p(ax_ay_b + bx_by_a + py_ay_b).$$
Since p | ab, the right hand side is divisible by p, which shows that p | 1 and hence p is a
unit. Thus we reach a contradiction, showing that one of da, db has to be an associate of p.
Example 9.8.1. For any field F , the integral domain F [x] is a PID and therefore is a
UFD.

Knowing that an integral domain is a UFD gives us a lot of benefit. For example, we
can know that gcd and lcm exist.
Property 9.8.2. If R is a UFD, then for any a, b ∈ R, we have that the gcd and lcm
of a, b exist.

Sketch of Proof. Write a, b as
$$a = u p_1^{\alpha_1} \cdots p_n^{\alpha_n}, \qquad b = u' p_1^{\beta_1} \cdots p_n^{\beta_n}$$
where αi, βi are non-negative integers and p1, . . . , pn are irreducible elements that are
pairwise distinct up to association. Then we have that
$$\gcd(a, b) = p_1^{\min(\alpha_1, \beta_1)} \cdots p_n^{\min(\alpha_n, \beta_n)}, \qquad \mathrm{lcm}(a, b) = p_1^{\max(\alpha_1, \beta_1)} \cdots p_n^{\max(\alpha_n, \beta_n)}.$$
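Here is a minimal Python sketch of this recipe, with factorizations encoded as dictionaries mapping each irreducible (chosen up to association) to its exponent; for R = Z we take 12 = 2²·3 and 18 = 2·3²:

    def gcd_lcm(fa, fb):
        # fa, fb map each irreducible to its exponent
        primes = set(fa) | set(fb)
        g = {p: min(fa.get(p, 0), fb.get(p, 0)) for p in primes}
        l = {p: max(fa.get(p, 0), fb.get(p, 0)) for p in primes}
        return g, l

    g, l = gcd_lcm({2: 2, 3: 1}, {2: 1, 3: 2})
    assert g == {2: 1, 3: 1}    # gcd = 2 * 3 = 6
    assert l == {2: 2, 3: 2}    # lcm = 4 * 9 = 36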


9.9 Polynomial Ring


We know that if F is a field, then we can view the set F [x] of polynomials with coefficients
in F as a ring. In fact, we can generalize this concept to rings. For simplicity, let’s
assume that R is a commutative ring with 1. Then we can follow our instinct to define
the polynomial ring over R.
Definition 9.9.1. Suppose that R is a commutative ring with 1. Then the polynomial
ring R[x] over R is the commutative ring with 1 defined on the set
$$\left\{ \sum_{i=0}^{\infty} a_i x^i \;\middle|\; a_i \in R, \ \exists N \in \mathbb{N} \text{ s.t. } a_i = 0 \ \forall i \ge N \right\}$$
with the addition and multiplication defined as
$$\left( \sum_{i=0}^{\infty} a_i x^i \right) + \left( \sum_{i=0}^{\infty} b_i x^i \right) = \sum_{i=0}^{\infty} (a_i + b_i) x^i$$
and
$$\left( \sum_{i=0}^{\infty} a_i x^i \right) \cdot \left( \sum_{i=0}^{\infty} b_i x^i \right) = \sum_{i=0}^{\infty} \left( \sum_{j+k=i} a_j b_k \right) x^i.$$

Here we define polynomials as infinite sums whose summands are eventually zero. This
is actually just a fancier notation for the usual definition of polynomials in which all the
zero terms are omitted. The addition and multiplication are simply generalizations of
the ones in F[x]. One needs to check that the two operations are well-defined, and that
the chosen addition and multiplication actually satisfy the ring axioms. This is left as an
exercise.
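For concreteness, the multiplication rule above is just a convolution of coefficient sequences. Here is a minimal Python sketch (coefficient lists with constant term first; entries may come from any ring whose elements support + and ·):

    def poly_mul(f, g):
        # Coefficient of x^i in f*g is sum_{j+k=i} a_j * b_k.
        out = [0] * (len(f) + len(g) - 1)
        for j, a in enumerate(f):
            for k, b in enumerate(g):
                out[j + k] += a * b
        return out

    assert poly_mul([1, 1], [1, 1]) == [1, 2, 1]   # (1 + x)^2 = 1 + 2x + x^2 over Z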
We can also consider the polynomial ring over R in multiple variables.
Definition 9.9.2. Suppose that R is a commutative ring with 1. Then the polynomial
ring R[x1 , . . . , xn ] over R in n variables can be inductively defined as
R[x1 , . . . , xn ] = (R[x1 , . . . , xn−1 ])[xn ].

If we expand this definition, then R[x1, . . . , xn] actually consists of the polynomials
$$\sum_{i_1=0}^{\infty} \sum_{i_2=0}^{\infty} \cdots \sum_{i_n=0}^{\infty} a_{i_1,\ldots,i_n} x_1^{i_1} \cdots x_n^{i_n}$$
where $a_{i_1,\ldots,i_n}$ is eventually zero. This is kind of messy, so sometimes we introduce the
multi-index notation i = (i1, . . . , in). If we define x^i to be $x_1^{i_1} \cdots x_n^{i_n}$, then we can instead
write the polynomials in n variables as
$$\sum_{i \in \mathbb{N}_0^n} a_i x^i$$
where ai is eventually zero, in the sense that ai = 0 when i1 + . . . + in is sufficiently large.
It is quite inconvenient to write i1 + . . . + in, so we usually denote this as |i|.
With this notation, we can simply write the definitions of addition and multiplication
as
$$\left( \sum_{i \in \mathbb{N}_0^n} a_i x^i \right) + \left( \sum_{i \in \mathbb{N}_0^n} b_i x^i \right) = \sum_{i \in \mathbb{N}_0^n} (a_i + b_i) x^i$$
and
$$\left( \sum_{i \in \mathbb{N}_0^n} a_i x^i \right) \cdot \left( \sum_{i \in \mathbb{N}_0^n} b_i x^i \right) = \sum_{i \in \mathbb{N}_0^n} \left( \sum_{j+k=i} a_j b_k \right) x^i.$$

In this polynomial ring, we can also define its degree.


Definition 9.9.3. If f ∈ R[x1, . . . , xn] is a non-zero polynomial in n variables, then the
degree deg f of f is defined as the largest non-negative integer N such that the coefficient
of x^i in f is non-zero for some i with |i| = N.

The degree of the zero polynomial is somewhat hard to define. Sometimes we want deg 0
to be 0, and sometimes we want deg 0 to be −1 or −∞. In this note, I will specify what
deg 0 is when necessary.
Property 9.9.1. If f, g ∈ R[x1 , . . . , xn ] are two polynomials, then

deg(f + g) ≤ max(deg f, deg g)

and
deg f g ≤ deg f + deg g.
If furthermore R is an integral domain, then

deg f g = deg f + deg g.

Here deg 0 = −∞.

Definition 9.9.4. A polynomial f in n variables is homogeneous if for any i ∈ N_0^n,
the coefficient of x^i in f being nonzero implies |i| = deg f. Here deg 0 = −∞.

Property 9.9.2. The sum of two homogeneous polynomials of the same degree is still
homogeneous. The product of two homogeneous polynomials is homogeneous.

There are a lot of properties that R[x] inherits from R. By induction, we can
also show that R[x1, . . . , xn] inherits the same properties from R, so let's focus on the
interaction between R and R[x].
Property 9.9.3. If R is an integral domain, then R[x] is also an integral domain.

Sketch of Proof. Suppose that f, g are two nonzero polynomials in R[x] with deg f = n
and deg g = m; then we know that [x^n]f and [x^m]g are nonzero. Here [x^n]f denotes the
coefficient of x^n in f. Now it is clear by definition that [x^{n+m}]fg = ([x^n]f)([x^m]g) ≠ 0
since R is an integral domain, which completes the proof.

Theorem 9.9.1. (Hilbert’s basis theorem) If R is Noetherian, then R[x] is also Noethe-
rian.

Sketch of Proof. For every ideal I in R[x], we have to show that I is finitely generated.
Let LI be the set containing zero and the leading coefficients of polynomials in I. We
can show that LI is an ideal in R. It suffices to show that a − b ∈ LI for every a, b ∈ LI
and ra ∈ LI for every a ∈ LI , r ∈ R.


Suppose that a, b ∈ LI where a, b are the leading coefficients of f, g ∈ I. Then either
a − b = 0 or a − b is the leading coefficient of x^{deg g}f − x^{deg f}g, and so a − b ∈ LI since
x^{deg g}f − x^{deg f}g ∈ I. Besides, if a ∈ LI is the leading coefficient of f and r is an element
of R, then either ra = 0 ∈ LI or ra is the leading coefficient of rf. This shows that
ra ∈ LI, and so LI is an ideal in R. By the assumption, LI is finitely generated.
Suppose that LI = ⟨a1, . . . , an⟩ where ai is the leading coefficient of fi ∈ I. Assume that
N ≥ deg fi for all i. For any 0 ≤ i < N, let Li be the set containing zero and the
leading coefficients of the polynomials in I with degree i. By the same argument we can show
that Li is an ideal in R, and so is finitely generated. Suppose that Li is generated by the
leading coefficients of gi,1, . . . , gi,ti ∈ I, each of degree i. We claim that I is generated by
the fi (1 ≤ i ≤ n) and the gi,j (0 ≤ i < N, 1 ≤ j ≤ ti). If not, denote the ideal generated
by the fi, gi,j by I′. Then there exists a nonzero element f in I\I′ that has the smallest degree.
Case 1. deg f < N. For simplicity let i = deg f. Then we know that there exist
r1, . . . , rti such that
$$[x^i](r_1 g_{i,1} + \cdots + r_{t_i} g_{i,t_i}) = [x^i]f.$$
Therefore f′ := f − (r1gi,1 + · · · + rtigi,ti) is a polynomial with degree at most i − 1. Since
f′ ∈ I and deg f′ < deg f, the minimality forces f′ ∈ I′, and so f ∈ I′, which is a contradiction.
Case 2. deg f ≥ N. For simplicity let i = deg f. Then we know that [x^i]f can be
generated by the leading coefficients of f1, . . . , fn, and so there exist r1, . . . , rn such that
$$f' := f - \left( r_1 x^{i - \deg f_1} f_1 + \cdots + r_n x^{i - \deg f_n} f_n \right)$$
is a polynomial with degree at most i − 1. This forces f′ to be in I′, and consequently
f ∈ I′, which is again a contradiction.
Since we reach a contradiction in either case, we know that I ′ = I, and so we’re
done.
Actually, we also have that if R is a UFD then R[x] is also a UFD. See the exercises for
this.

9.10 Adjoining an element


Let’s keep the assumption that R is a commutative ring with 1 in this section. In the
previous section we view R[x] as a ring consisting of polynomials. One can also see R[x]
as “the smallest commutative ring that contains R and x.” In this section, we are going
to generalize this concept using polynomial ring.
Definition 9.10.1. Suppose that R is a subring of a commutative ring E with 1 such
that 1R = 1E. If α is an element of E, then we denote the smallest subring containing R
and α by R[α]. This is read as "R adjoin α."

Property 9.10.1. (Universal property of polynomial ring) Suppose that R ⊆ E where


R, E are both commutative rings with 1 and 1R = 1E . Then for any α ∈ E, there exists
a unique surjective homomorphism ϕ from R[x] to R[α] such that ϕ|R is the identity map
and ϕ(x) = α.

Sketch of Proof. It is clear that the only homomorphism that satisfies the condition is
$$\phi\left( \sum_{i=0}^{\infty} a_i x^i \right) = \sum_{i=0}^{\infty} a_i \alpha^i.$$


It remains to show that it is surjective. It is clear by definition of the ring that im ϕ ⊆


R[α]. Since im ϕ is a ring that contains R and α, we have that im ϕ = R[α].

Corollary 9.10.1. R[α] is always the quotient ring R[x]/Iα where Iα is the ideal of
polynomials that vanish at α.

We can also go in the opposite direction: if we want to add an element α to R, we
can define R[α] to be R[x]/Iα where Iα is an ideal determined by the properties
that we hope α to have. The only thing that we have to check is that Iα ∩ R = {0}, to
make sure that R is still a subring of R[x]/Iα.
Example 9.10.1. We can use this concept to construct C from R. We know that C is
simply R adjoin an imaginary root i that satisfies i² = −1. Therefore we can consider
Ii = ⟨x² + 1⟩. It is easy to check that Ii ∩ R = {0}, and so C = R[i] ≅ R[x]/⟨x² + 1⟩.
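Here is a minimal Python sketch of the arithmetic in R[x]/⟨x² + 1⟩: the coset of a + bx is represented by the pair (a, b), products are expanded as polynomials, and x² is replaced by −1. The result agrees with the multiplication of complex numbers:

    def mult(u, v):
        a, b = u
        c, d = v
        # (a + b x)(c + d x) = ac + (ad + bc) x + bd x^2, and x^2 = -1 mod <x^2 + 1>
        return (a * c - b * d, a * d + b * c)

    assert mult((1, 2), (3, 4)) == (-5, 10)
    assert (1 + 2j) * (3 + 4j) == -5 + 10j     # same computation in C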

We can similarly define the field analogue of this. The details are left as an exercise.
Definition 9.10.2. Suppose that F is a subfield of K and α is an element of K. Then
we denote the smallest subfield containing F and α by F(α).

Property 9.10.2. (Universal property of the field of rational functions) Suppose that F
is a subfield of K. Then for every α ∈ K that is transcendental over F (i.e. no nonzero
polynomial in F[x] vanishes at α), there exists a unique surjective
homomorphism ϕ : F(x) → F(α) such that ϕ|F is the identity map and ϕ(x) = α. Here,
F(x) is the field with the underlying set
$$\left\{ \frac{f(x)}{g(x)} \;\middle|\; f, g \in F[x],\ g \neq 0 \right\}.$$

9.11 Fraction Field and Localization


In the previous section we constructed C from R. The construction of R from Q is quite
a mess and is irrelevant to algebra, so let's put that aside for now. Instead, let's think
about how one should construct Q from Z.
We know that Q consists of the numbers of the form p/q where p, q ∈ Z and q ≠ 0. If
we want to describe the property of p/q with what we have in Z, the easiest way is that
p/q is the number that satisfies qx = p. It is clear that it suffices to make sure that 1/q
exists. Therefore we can think of Q as a ring where any nonzero element of Z is a unit,
or in other words, Q is a field that contains Z.
To formally construct Q from Z, we can temporarily denote p/q as (p, q). The addition
and multiplication can be defined in the usual way:

(a, b) + (c, d) = (ad + bc, bd);

(a, b)(c, d) = (ac, bd).


Note that we also have to define an equivalence relation (a, b) ∼ (c, d) ⇔ ad = bc, so that
1/3 is the same as 2/6. One has to check that addition and multiplication are compatible
with this equivalence relation; this is left as an exercise, and a sketch of the check appears below.
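Here is a minimal Python sketch of this construction, spot-checking the compatibility for one concrete instance (it is of course not a proof):

    def equiv(u, v):                     # (a, b) ~ (c, d) iff a*d == b*c
        return u[0] * v[1] == u[1] * v[0]

    def add(u, v):
        return (u[0] * v[1] + u[1] * v[0], u[1] * v[1])

    def mul(u, v):
        return (u[0] * v[0], u[1] * v[1])

    # 1/3 ~ 2/6, and adding or multiplying by 1/2 preserves the equivalence:
    assert equiv((1, 3), (2, 6))
    assert equiv(add((1, 3), (1, 2)), add((2, 6), (1, 2)))
    assert equiv(mul((1, 3), (1, 2)), mul((2, 6), (1, 2)))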
It is then clear how we should generalize this concept to a general integral domain.


Definition 9.11.1. Suppose that R is an integral domain. The fraction field F of R is
the field defined on the set
$$(R \times (R \setminus \{0\})) / \sim$$
where ∼ is the equivalence relation
$$(a, b) \sim (c, d) \iff ad = bc$$
and the addition and multiplication are defined as
$$(a, b) + (c, d) = (ad + bc, bd)$$
and
$$(a, b)(c, d) = (ac, bd).$$

Property 9.11.1. (Universal property of the fraction field) Suppose that R is an integral
domain and F is its fraction field. If F′ is a field and ϕ : R → F′ is an injective
ring homomorphism, then there exists a unique ring homomorphism ϕ′ : F → F′ such
that ϕ′|R = ϕ. In other words, there exists a unique ϕ′ such that the following diagram
commutes.

      R ───ϕ───▶ F′
      │          ▲
      │          │ ϕ′
      ▼          │
      F ─────────┘

Sketch of Proof. It is clear that ϕ′((a, b)) = ϕ(a)ϕ(b)^{-1} satisfies the conditions and is the
only homomorphism that does so.

Example 9.11.1. Consider the polynomial ring F[x] over a field F. F[x] is an integral
domain, so we can consider its fraction field, which is usually denoted by F(x). This field
consists of the functions
$$\frac{f(x)}{g(x)}$$
where f, g ∈ F[x] and g is non-zero. We call this kind of function a rational function. Note
that rather than thinking of these functions as functions defined on F, we should think
of them as formal expressions. This is because the function is actually undefined at
the roots of g, and this does not bother us when we only think of the expression
formally.

Example 9.11.2. Now let's consider another integral domain F[[x]], which consists of
the elements
$$\sum_{i=0}^{\infty} a_i x^i$$
where ai ∈ F. Unlike polynomials, elements of F[[x]] can have coefficients that are not
eventually zero. Such an element is usually called a formal power series, and so F[[x]] is
the ring of formal power series in x. It is easy to show that every power series with a
non-zero constant term is a unit, and so the fraction field F((x)) of F[[x]] consists of the
elements
$$\sum_{i=-m}^{\infty} a_i x^i$$
where m is some integer. This is usually called the field of formal Laurent series.
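The claim that a power series with non-zero constant term is a unit is effective: the coefficients of the inverse can be computed recursively. Here is a minimal Python sketch over Q, truncating at x^n (the function name series_inverse is ours):

    from fractions import Fraction

    def series_inverse(a, n):
        # If f = sum a_i x^i with a_0 != 0, the inverse g = sum b_i x^i satisfies
        # b_0 = 1/a_0 and b_i = -(1/a_0) * sum_{k=1..i} a_k * b_{i-k}.
        a = [Fraction(c) for c in a] + [Fraction(0)] * n
        b = [1 / a[0]]
        for i in range(1, n):
            b.append(-sum(a[k] * b[i - k] for k in range(1, i + 1)) / a[0])
        return b

    # 1/(1 - x) = 1 + x + x^2 + ... :
    assert series_inverse([1, -1], 5) == [1, 1, 1, 1, 1]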

Note that if a commutative ring R is not an integral domain and has a zero divisor
c such that cd = 0 where c, d ≠ 0, then ∼ is no longer an equivalence relation. This is
because (0, 1) ∼ (0, d) ∼ (c, 1) but (0, 1) ≁ (c, 1). To address this, we have to make
a different definition for the more general case.
Definition 9.11.2. Suppose that R is a commutative ring with 1 and S is a multi-
plicatively closed subset. Then the localization of R by S is constructed as follows: the
underlying set is (R × S)/∼ where ∼ is the equivalence relation
$$(r_1, s_1) \sim (r_2, s_2) \iff \exists u \in S \text{ s.t. } u(r_1 s_2 - r_2 s_1) = 0.$$
The addition and multiplication are still defined as
$$(r_1, s_1) + (r_2, s_2) = (r_1 s_2 + r_2 s_1, s_1 s_2)$$
and
$$(r_1, s_1)(r_2, s_2) = (r_1 r_2, s_1 s_2).$$
This ring is usually denoted by S^{-1}R.

This solves the problem above: if we allow c, d to be in the denominator, then we


know that 0 = cd ∈ S, which shows that S −1 R is a zero ring. Although it is not useful,
it is at least well-defined.
Property 9.11.2. (Universal property of localization) Suppose that R is a commutative
ring with 1 and S is a multiplicatively closed subset containing 1. If R′ is a commutative
ring with 1 and ϕ : R → R′ is a ring homomorphism sending 1R to 1R′ such that ϕ(s)
is a unit for any s ∈ S, then there exists a unique homomorphism ϕ′ : S −1 R → R′ such
that ϕ′ ◦ π = ϕ where π : R → S −1 R is the canonical map π(r) = (r, 1). In other words,
there exists a unique ϕ′ such that the following diagram commutes.

    R ----ϕ----> R′
    |            ^
    π            | ϕ′
    v            |
    S −1 R ------+

Sketch of Proof. It is clear that we must choose ϕ′ ((r, s)) = ϕ(r)ϕ(s)−1 . We still have
to check that this is well-defined. If (r1 , s1 ) ∼ (r2 , s2 ), then there exists u ∈ S such
that u(r1 s2 − r2 s1 ) = 0. Therefore ϕ(u)(ϕ(r1 )ϕ(s2 ) − ϕ(r2 )ϕ(s1 )) = 0. Since ϕ(u) is a
unit, we can cancel out ϕ(u). By multiplying ϕ(s1 )−1 ϕ(s2 )−1 on both sides, we get that
ϕ(r1 )ϕ(s1 )−1 = ϕ(r2 )ϕ(s2 )−1 .


Note that the canonical map π need not be injective: if S contains a zero divisor c
such that cd = 0 where d ̸= 0, then (0, 1) ∼ (d, 1) and so π(d) = 0.
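For example, take R = Z/6Z and S = {1, 2, 4}, the multiplicatively closed set generated by 2. Then π(3) = 0 since 2 · 3 = 0, and in fact S −1 R ∼= Z/3Z: writing Z/6Z ∼= Z/2Z × Z/3Z, inverting 2 kills the Z/2Z factor.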
When R is an integral domain and S is R\{0}, then the localization of R by S is the
fraction field of R. We can furthermore generalize this:
Definition 9.11.3. Suppose that R is a commutative ring with 1 and p is a prime ideal
in R, then the localization Rp of R at p is the localization of R by R\p.
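For example, for R = Z and p = ⟨p⟩ with p a prime, the localization Z⟨p⟩ is the subring of Q consisting of the fractions a/b with p ∤ b.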

Besides this, we can also take S to be generated by a single element x. That is, we can take
S = {xn |n ∈ N0 }.
Definition 9.11.4. Suppose that R is a commutative ring with 1 and x is an element
in R, then the localization Rx of R away from x is the localization of R by {xn |n ∈ N0 }.
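For example, the localization of Z away from 2 is Z[1/2], the subring of Q consisting of the fractions whose denominators are powers of 2.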

It might be unclear at this point why we call such an operation a localization, but it
will be clear in a moment.

9.12 Nullstellensatz and Algebraic Geometry


Suppose that F is a field. Then we know that F [x] is a PID. Now let’s look for all
maximal ideals in F [x]. Suppose that m is a maximal ideal. Since F [x] is a PID, we
can assume that m is generated by an element f . Clearly f ̸= 0, so since m is prime
we have that f is also prime, and thus irreducible. Conversely, if f is irreducible, then
by Bezout’s theorem we know that for every g ̸∈ ⟨f ⟩, there exist u, v ∈ F [x] such that
uf + vg = 1, and so g + ⟨f ⟩ is a unit in F [x]/⟨f ⟩. This shows that F [x]/⟨f ⟩ is a field,
and so ⟨f ⟩ is maximal.
Example 9.12.1. Take F = C. We know by the fundamental theorem of algebra that
f is irreducible if and only if f = x − c for some c ∈ C. Therefore we know that the
maximal ideals of C[x] are in bijection with the points in C. More generally, if F is an
algebraically closed field, i.e. a field such that each non-constant polynomial in F [x] has
a root in F , then the maximal ideals of F [x] are in bijection with the points in F .

How about the maximal ideals of F [x1 , x2 , . . . , xn ]? We know that for every point
c = (c1 , . . . , cn ) in F n , we can find a maximal ideal ker evc where

evc (f ) = f (c1 , c2 , . . . , cn ).

This is because F [x1 , x2 , . . . , xn ]/ ker evc ∼= im evc = F is a field. Actually, if F is
algebraically closed, then these are the only maximal ideals. This is called the weak form
of Hilbert’s nullstellensatz, and we are going to prove it after we prove a lemma.
Lemma 9.12.1. (Zariski’s lemma) If F is an algebraically closed field and m is a max-
imal ideal of F [x1 , . . . , xn ], then F [x1 , . . . , xn ]/m ∼= F .

Sketch of Proof. The actual proof of this is quite technical and requires much more back-
ground knowledge, so here we are only going to prove the case where F is uncountable.
Think on the bright side, C is an uncountable algebraically closed field!
Let F ′ = F [x1 , . . . , xn ]/m. It is clear that F ⊆ F ′ , and so we can see F ′ as a vector
space over F . Since F ′ can be generated by x1 + m, . . . , xn + m, we know that dimF F ′ is


countable. Now, if some xi + m is not in F , then xi + m is not a root of any nonzero
polynomial in F [x] (such an element would be algebraic over F and hence in F , since F
is algebraically closed). This gives an injective homomorphism F [x] → F ′ sending x to
xi + m, and so by the universal property of the fraction field we know that F (x) embeds
into F ′ . This is absurd since the dimension of F (x) over F is uncountable, for the
uncountably many elements

1/(x − c)   (c ∈ F )

are linearly independent.

Theorem 9.12.1. (Weak form of Hilbert’s nullstellensatz) Suppose that F is an alge-
braically closed field, then the maximal ideals of F [x1 , . . . , xn ] are in bijection with the
points in F n via

c ∈ F n ↦ ker evc .

Sketch of Proof. We only need to prove that for every maximal ideal m of F [x1 , . . . , xn ]
we can find a point c ∈ F n such that m = ker evc . Let ϕi : F [xi ] → F [x1 , . . . , xn ] be the
canonical embedding and π : F [x1 , . . . , xn ] → F [x1 , . . . , xn ]/m be the canonical projec-
tion. By Zariski’s lemma F [x1 , . . . , xn ]/m ∼= F . Therefore π ◦ ϕi is a ring homomorphism
from F [xi ] to F . It is easy to see that π(ϕi (c)) = c for any c ∈ F , and so π ◦ ϕi is surjective.
It is clear that the kernel is m ∩ F [xi ], which is therefore maximal. Therefore we know
that m ∩ F [xi ] = ⟨xi − ci ⟩ for some ci ∈ F . This shows that ⟨x1 − c1 , . . . , xn − cn ⟩ ⊆ m.
Since ker evc = ⟨x1 − c1 , . . . , xn − cn ⟩, we know that m = ker evc .

A corollary of this is that if F is algebraically closed and I is an ideal of F [x1 , . . . , xn ]
such that there is no common zero of the functions in I, then I is the unit ideal. To put it
more clearly, let’s make a definition.
Definition 9.12.1. Suppose that I is an ideal of F [x1 , . . . , xn ]. The vanishing locus
V (I) of I is the subset of F n consisting of the points c such that f (c1 , . . . , cn ) = 0 for
every f ∈ I.

Corollary 9.12.1. If F is an algebraically closed field, then for any ideal I in F [x1 , . . . , xn ],
we have I = ⟨1⟩ if and only if V (I) is empty.

Sketch of Proof. If I = ⟨1⟩, then clearly V (I) is empty because 1 ̸= 0. Conversely,
suppose that V (I) is empty. By the weak form of Hilbert’s nullstellensatz, every maximal
ideal is of the form ker evc and hence has the common zero c; so if I were contained in a
maximal ideal, then c would be a common zero of I, contradicting that V (I) is empty.
Therefore I cannot be contained in any maximal ideal, which forces I to be the unit ideal.

At this point we can see that the nullstellensatz tells us a lot about the zeros of an ideal.
This is why it is called nullstellensatz—the word is German for “theorem of the zeros.”
Using the weak form of Hilbert’s nullstellensatz, we can prove the strong form.

Definition 9.12.2. The radical √I of an ideal I is the ideal containing the elements s
such that there exists n ∈ N satisfying sn ∈ I.

Theorem 9.12.2. (Strong form of Hilbert’s nullstellensatz) Suppose that F is an al-
gebraically closed field and I is an ideal of F [x1 , . . . , xn ]. If J is the ideal consisting of
the polynomials that vanish on V (I), then J = √I.


Sketch of Proof. The proof that I am going to show is the Rabinowitsch trick. It is simple
but kind of comes out of nowhere. For a proof that is longer but easier to come up
with, see the exercise.
It is clear that √I ⊆ J, so it suffices to show that J ⊆ √I. Since F [x1 , . . . , xn ] is
Noetherian, we can suppose that I is generated by g1 , . . . , gn . Now suppose that f is a
polynomial that vanishes on every point where g1 , . . . , gn vanish. We have to show that
there exists r ∈ N such that f r ∈ I.
Introduce a new variable x0 , and consider the polynomials g1 , g2 , . . . , gn , 1 − x0 f . It
is easy to see that those polynomials do not have any common zero, and so the ideal
generated by them is the unit ideal. Hence there exists h0 , . . . , hn ∈ F [x1 , . . . , xn , x0 ]
such that
1 = h1 g1 + h2 g2 + · · · + hn gn + h0 (1 − x0 f ).
Now let x0 = 1/f . Then we have that
1 = h1 (x1 , . . . , xn , 1/f (x1 , . . . , xn )) g1 + · · · + hn (x1 , . . . , xn , 1/f (x1 , . . . , xn )) gn .

Choose r ∈ N greater than the degrees in x0 of the hi ’s. Then

h′i := f (x1 , . . . , xn )^r · hi (x1 , . . . , xn , 1/f (x1 , . . . , xn ))
is a polynomial in F [x1 , . . . , xn ]. Therefore

f r = h′1 g1 + · · · + h′n gn

is also in the ideal I.


Hilbert’s nullstellensatz is the foundation of a branch of mathematics called algebraic
geometry. The weak form tells us that we can see the maximal ideals as points, and the
strong form tells us that if we want to study the zero locus of some polynomials (which
is a geometric object), we can refer to the ring of polynomials (which is an algebraic ob-
ject). This inspired people to try to generalize this phenomenon and develop algebraic
geometry with the help of commutative algebra (i.e. the study of commutative rings with
1).
Now we can explain why the term localization is used. Say we are studying the ring
C[x]. Hilbert’s nullstellensatz says that C[x] describes C, and we can see C[x] as the
well-behaved functions on C. If we want to look closer at the point 0 ∈ C and forget
everything that is far away, then any function that is not zero at 0 should become invertible.
Therefore intuitively the localization at the point 0 should consist of the rational functions
that are defined at 0. This coincides with C[x]⟨x⟩ , which again justifies that the maximal
ideal ⟨x⟩ represents the point 0.

9.13 Random Problem Set


1. (9.1) Show that for every vector space V , the set of linear operators on V forms a
ring. This is usually denoted as End(V ).

2. (9.1) Show that if R is an integral domain, then R[x], the ring of polynomials with
coefficients in R, is also an integral domain.


3. (9.2) (Hard if you didn’t see this before) Show that the only ring automorphism of
R is the identity map.

4. (9.3) Suppose that ϕ : R → R′ is a ring homomorphism and I ′ is an ideal of R′ .


Show that ϕ−1 (I ′ ) is an ideal of R. This is called the contraction of I ′ .

5. (9.4) Consider the integral domain R = {a + b√−5 | a, b ∈ Z}. Show that 2, 3,
(1 + √−5), (1 − √−5) are all irreducible but not prime.

6. (9.5) Show that the contraction of a prime ideal is still a prime ideal.

7. (9.5) Let R be a commutative ring with 1 and I be a prime ideal of R. Show that
if R/I is finite, then I is maximal.

8. (9.5) Show that if (R, +) is an abelian group and we define a · b = 0 in R, then


subgroups of R correspond to ideals of R. Show that given a prime p, the subgroup
G of (Q/Z, +) where
G = {m/p^n + Z | m, n ∈ Z}
has no maximal proper subgroup. Conclude that there exists a rng that does not
have any maximal ideals.

9. (9.6) Show that the integral domain



R = {a + b√−2 | a, b ∈ Z}

is a PID by showing that it is an ED.

10. (9.7) (Slightly harder) Show that the algebraic integers form an integral domain.
Here “algebraic integers” means the numbers that can be represented as roots of
some monic polynomial with integer coefficients. Show that there exists an element
that is not a unit, and show that any non-unit element is reducible.

11. (9.7) If we replace the ascending chain condition with the “descending chain con-
dition”, then we get the definition of an Artinian rng. Show that the descending
chain condition is equivalent to the “minimal” condition. Show that an Artinian
integral domain must be a field.

12. (9.8) Show that in any PID, any non-zero prime ideal is maximal. Conversely, show
that if in a Noetherian UFD, any non-zero prime ideal is maximal, then it is a PID.
Hint: For the second part, try to prove Bezout’s theorem first.

13. (9.8) Show that the integral domain

R = {a + b(1 + √−7)/2 | a, b ∈ Z}
is a UFD by showing that it is an ED. Using this fact, solve the equation

x2 + x + 2 = y 3

in integers.
Hint: 4(x2 + x + 2) = (2x + 1)2 + 7 = ((2x + 1) + √−7)((2x + 1) − √−7).


14. (9.9) Show that if A is a UFD, then A[x], the ring of polynomials with coefficients
in A, is also a UFD. Use this to construct examples of UFDs that are not PIDs.
Hint: You might need to consider the fraction field of A and Gauss’ lemma, which
states that the product of two primitive polynomials (i.e. polynomials whose coef-
ficients’ gcd is 1) is still primitive.

15. (9.9) Use Hilbert’s basis theorem to prove the following statement:
If a1 , a2 , . . . is a sequence of positive integers such that ai ∤ aj for any i ̸= j, then
the set of prime factors of this sequence is infinite.
Try to emulate the proof of Hilbert’s basis theorem to give an elementary proof
of this.

16. (9.10) Show that if F ⊆ K are two fields such that K = F [θ], then K ∼= F [x]/I for
some maximal ideal I in F [x].

17. (9.10) Let R = Z/6Z. Show that there does not exist R[α] where 2α + 1 = 0.

18. (9.10) Let F be a field and f be a non-zero polynomial. Show that F [x]/⟨f ⟩ is
a vector space over F with dimension equal to deg f . Construct a function N :
F [x]/⟨f ⟩ → F such that N (a)N (b) = N (ab) for any a, b ∈ F [x]/⟨f ⟩ and N (a) = a^(deg f)
for any a ∈ F .

19. (9.11) Given a ring homomorphism ϕ : R → R′ , the extension of an ideal I in R


is the ideal ⟨ϕ(I)⟩ in R′ . Suppose that R is a commutative ring with 1 and S is
a multiplicatively closed subset containing 1, show that every ideal in S −1 R is an
extension (with respect to the canonical map) of an ideal in R.

20. (9.11) A commutative ring R with 1 is a local ring if one of the two following
equivalent definitions hold:
(1) R has only one maximal ideal;
(2) R is not a zero ring and any sum of two non-unit elements is non-unit.
Show that these two definitions are equivalent. Show that the localization at a
prime ideal is always local.

21. (9.11) (1) An integral domain R is a valuation ring if for every 0 ̸= x ∈ F , one of
x, x−1 is in R. Here F is the fraction field of R. Show that any valuation ring is a
local ring.
(2) Suppose that R is a valuation ring and F is its fraction field. Let Γ = F × /R× ∪
{∞} where R× is the set of units in R and the abelian quotient group F × /R× is
written additively. We can define a total ordering on Γ by

a ≥ b ⇔ a − b is an image of an element in R

when a, b ̸= ∞, and define ∞ to be the maximal element. Define the valuation map
v : F → Γ such that v(a) = aR× for a ̸= 0 and v(0) = ∞. Show that the valuation
map satisfies the strong triangle inequality

v(a + b) ≥ min(v(a), v(b)).


(3) Show that for any field F , the ring of formal power series F [[x]] is a valuation
ring (and hence local), and the valuation map on the ring of formal Laurent series
F ((x)) is isomorphic to the function

v(f ) = min{i ∈ Z | [xi ]f ̸= 0} for 0 ̸= f ∈ F ((x)), with v(0) = ∞.


22. (9.12) Suppose that R is a nonzero commutative ring with 1. The nilradical √⟨0⟩ is
the ideal consisting of nilpotent elements, i.e. elements some power of which is zero.
Show that the nilradical is the intersection of all prime ideals in R.
Hint: To show that the intersection of all prime ideals lie in the nilradical, construct
a prime ideal for every non-nilpotent element such that the ideal does not contain
that element. It is clear how to do this for units. Try to think of how to somehow
reduce the general case to the unit case.

23. (9.12) Let R be a nonzero commutative ring with 1. R is Jacobson if any prime
ideal is the intersection of the maximal ideals containing it. Using Zariski’s lemma,
show that for any algebraically closed field F and any ideal I in F [x1 , . . . , xn ], the
quotient ring F [x1 , . . . , xn ]/I is Jacobson.
Hint: Follow the solution of the previous question and apply Zariski’s lemma.

24. (9.12) If R is a commutative ring with 1, then its Jacobson radical is the intersection
of all maximal ideals in R. Show that if R is Jacobson, then its nilradical is equal
to its Jacobson radical.

25. (9.12) Using the previous three problems, prove the strong form of Hilbert’s null-
stellensatz with the weak form.

Chapter 10

Module Theory

A vector space is an abelian group endowed with a scalar multiplication with elements
in a field. It is not hard to see that in order to make the definition make sense, it suffices
to require the scalars to be the elements in a ring. Roughly speaking, this is called a
module. Despite the similarity of the definitions of a vector space and a module, the
properties of a module are substantially different from those of a vector space because
of the lack of multiplicative inverses. In this chapter, we will try to see what we can still
get even though we do not necessarily have multiplicative inverses.
Module theory has a lot of applications. As we will see, it is related to group rep-
resentation. Besides, the structure theorem of modules over a PID can be directly applied
to characterize finitely generated abelian groups, and can also help us derive another
canonical form of a linear operator called the rational canonical form.

10.1 Definitions and Examples


We know that a ring is an algebraic structure where we can do addition and multiplication.
These two operations are enough to define the “vector space over a ring.”
Definition 10.1.1. Suppose that R is a rng. An abelian group M is a (left) module
over R if it is equipped with a scalar multiplication · : R × M → M such that
(1) r · (m1 + m2 ) = r · m1 + r · m2 for all r ∈ R, m1 , m2 ∈ M ;
(2) (r1 + r2 ) · m = r1 · m + r2 · m for all r1 , r2 ∈ R, m ∈ M ;
(3) r1 · (r2 · m) = (r1 r2 ) · m for all r1 , r2 ∈ R, m ∈ M .
As always, we drop the dot if it does not lead to any ambiguity.

Remark. When we define a vector space, we furthermore require that 1 · v = v. Since
we do not require a ring to have a multiplicative identity, we do not impose this on a
general module. However, if R is a ring with 1, we usually require 1 · m = m. To avoid
ambiguity, I will call this kind of module a unital module.

Property 10.1.1. 0R · m = 0M for every m ∈ M .

Example 10.1.1. If R is a rng, then R itself is a module over R. Moreover, if I is an


ideal in R, then both R/I and I are modules over R.

Example 10.1.2. If V is a vector space over a field F , then V is also a unital module


over F .

Example 10.1.3. Suppose that G is an abelian group, then we can see G as a Z-module
by defining n · g to be the sums of n copies of g for every n ∈ N and g ∈ G.

Example 10.1.4. Let V be a vector space over F and fix a linear operator T on V . For
every f ∈ F [x] and v ∈ V , we can define f · v to be [f (T )](v). Then we can verify that
V is a module over F [x]. (We will use this construction again in Section 10.8.)

Note that if we fix an element r and consider the left multiplication lr , then lr is a
group homomorphism from M to itself. We call this an endomorphism.
Definition 10.1.2. An endomorphism is a homomorphism from an object O to itself.
If the endomorphisms are compatible with an addition defined on O, then the set End(O)
of endomorphisms on O forms a ring with 1 by defining

(f + g)(o) = f (o) + g(o) ∀f, g ∈ End(O), o ∈ O

and
(f g)(o) = f (g(o)) ∀f, g ∈ End(O), o ∈ O.

Property 10.1.2. Suppose that M is a module over R. Then the function ϕ : R →


End(M ) sending an element r to its left multiplication lr is a ring homomorphism. Con-
versely, if M is an additive abelian group, then any ring homomorphism ϕ : R → End(M )
uniquely determines a module structure of M over R. The requirement that M is a unital
module is equivalent to ϕ(1R ) = id.

Note that this is somewhat similar to a group representation ρ : G → GL(V ). We
can actually write a group representation in the form of a module: if V is a vector space
over F , let F [G] be the set consisting of the elements of the form
over F , let F [G] be the set consisting of the elements of the form

c1 g1 + · · · + cn gn (n ∈ N, ci ∈ F, gi ∈ G),

then we can simply extend the representation to ρ̄ : F [G] → End(V ) by

ρ̄(c1 g1 + · · · + cn gn ) = c1 ρ(g1 ) + · · · + cn ρ(gn ).

We can furthermore generalize this concept.


Definition 10.1.3. Suppose that G is a group and R is a rng. Then the group ring
R[G] of G over R is the rng consisting of the elements of the form

c1 g1 + · · · + cn gn (n ∈ N, ci ∈ R, gi ∈ G)

where addition and multiplication are defined as


∑ ci gi + ∑ c′i gi = ∑ (ci + c′i )gi   (ci , c′i ∈ R, gi ∈ G)

and

(∑i ci gi )(∑j c′j gj ) = ∑i,j ci c′j (gi gj )   (ci , c′j ∈ R, gi , gj ∈ G).

Property 10.1.3. The group ring F [G] is an F -module.


Example 10.1.5. Consider the group ring F [C2 ] where C2 is the cyclic group of order
2 generated by x. Then the group ring consists of c0 + c1 x where c0 , c1 ∈ F . The
multiplication works as

(c0 + c1 x)(c′0 + c′1 x) = (c0 c′0 + c1 c′1 ) + (c0 c′1 + c1 c′0 )x.

Therefore it is clear that F [C2 ] ∼= F [x]/⟨x2 − 1⟩.
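As a quick sanity check of this isomorphism, the following Python sketch (using sympy; an illustration added here) compares the multiplication rule above with multiplication of polynomials modulo x2 − 1.

    from sympy import symbols, expand, rem

    def c2_mul(u, v):
        # (c0 + c1*g)(d0 + d1*g) in F[C2], using g^2 = 1.
        (c0, c1), (d0, d1) = u, v
        return (c0 * d0 + c1 * d1, c0 * d1 + c1 * d0)

    x = symbols('x')
    print(c2_mul((2, 3), (5, 7)))                        # (31, 29)
    print(rem(expand((2 + 3*x) * (5 + 7*x)), x**2 - 1))  # 29*x + 31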

10.2 Submodule and Ideal


As always, let’s think about what a subobject is and what kind of map we will want to
consider.
Definition 10.2.1. Suppose that M is a module over R. A subset N ⊆ M is a
submodule of M if N is a subgroup of M closed under the scalar multiplication, so that
N itself forms a module over R.
Property 10.2.1. If N1 , N2 are submodules of M , then N1 ∩ N2 and N1 + N2 are also


submodules of M .

Therefore it makes sense to define the submodule generated by some elements.


Definition 10.2.2. Suppose that S is a subset of a module M . The submodule ⟨S⟩
generated by S is the intersection of all submodules of M containing S.

It is not hard to write ⟨S⟩ explicitly.


Property 10.2.2. Suppose that S is a subset of a module M over R. Then the sub-
module ⟨S⟩ is the set

{r1 s1 + · · · + rn sn | n ∈ N, ri ∈ R, si ∈ S}.

Definition 10.2.3. A module M is finitely generated if it can be generated by a finite


subset.

Now that we know what a submodule is, we can define quotient modules.
Definition 10.2.4. Suppose that M is a module over R and N is a submodule of
M , then the quotient module M /N is the module defined on the quotient group M /N
endowed with the scalar multiplication r(m + N ) = rm + N .

We know that R can be seen as a module over itself. If R is commutative, then it is
clear that the submodules of R are simply the ideals in R. Therefore we will want to call
the submodules of R some sort of ideal, even when R is not commutative.
Definition 10.2.5. Suppose that R is a rng. A subring I ⊆ R is a left (right) ideal of
R if for every r ∈ R, i ∈ I, we have that ri ∈ I (ir ∈ I resp.).

We know that a rng R is Noetherian if every ascending chain of ideals in R is even-


tually constant. We can simply replace “ideal” with “submodule” to get the definition of
Noetherian module.


Definition 10.2.6. A module M over R is Noetherian if it satisfies the ascending chain


condition: For any sequence of submodules N1 ⊆ N2 ⊆ . . ., the sequence is eventually
constant.

Property 10.2.3. If M is a module over R, then the following three conditions are
equivalent:
(1) Every submodule in M is finitely generated.
(2) M satisfies the ascending chain condition.
(3) M satisfies the maximal condition: Any non-empty set of submodules of M has a
maximal element, i.e. one that is not contained in any other submodule in the set.

Sketch of Proof. The proof is identical to the one for rings.

Note that when R is not commutative, that R is Noetherian is different from that
R is Noetherian as a left module over itself. To differentiate them, let’s make another
definition.
Definition 10.2.7. A rng R is left (right) Noetherian if it is Noetherian as a left (right
resp.) module over itself.

At the end of this section, let’s prove a result that can help us verify if a module is
Noetherian or not.
Property 10.2.4. Suppose that M is a module and N is a submodule of M . Then M
is Noetherian if and only if N and M /N are both Noetherian.

Sketch of Proof. If M is Noetherian, then it is clear that N is also Noetherian. For every
submodule K of M /N , consider the submodule π −1 (K) of M where π is the canonical
projection from M to M /N . Since π −1 (K) is a submodule of a Noetherian module, it
is finitely generated, and so K is also finitely generated. This shows that M /N is also
Noetherian.
Conversely, if N and M /N are both Noetherian, then for every submodule K of M ,
consider the submodule K ∩ N of N and the submodule π(K) of M /N . We know that
both submodules are finitely generated. Suppose that K ∩ N is generated by a1 , . . . , an
and π(K) is generated by b1 +N, . . . , bm +N where b1 , . . . , bm ∈ K, then we can show that
K is generated by a1 , . . . , an , b1 , . . . , bm . For every k ∈ K, we know that k + N ∈ π(K),

and so there exists r1′ , . . . , rm ∈ R such that

k ′ := k − (r1′ b1 + · · · + rm

bm ) ∈ N.

It is clear that k ′ is also in K, and so k ′ ∈ K ∩ N . Hence there exist r1 , . . . , rn ∈ R such
that

k ′ = r1 a1 + · · · + rn an .

Hence we have

k = r1 a1 + · · · + rn an + r1′ b1 + · · · + rm′ bm ∈ ⟨a1 , . . . , an , b1 , . . . , bm ⟩,

as desired.


10.3 Module Homomorphism


Definition 10.3.1. Suppose that M, N are modules over a rng R. A map ϕ : M → N
is a homomorphism of modules over R, or an R-linear map, if it is a group homomorphism
such that rϕ(m) = ϕ(rm) for every r ∈ R, m ∈ M .

Example 10.3.1. Suppose that G, G′ are abelian groups, then we can see G, G′ as two
Z-modules. Then any group homomorphism from G to G′ is a Z-linear map.

Definition 10.3.2. A homomorphism of modules is an isomorphism if it is bijective.


Two modules are isomorphic if there exists an isomorphism between them.

The isomorphism theorems still hold for modules. We just have to modify them a bit.
Theorem 10.3.1. (First isomorphism theorem) Suppose that ϕ : M → M ′ is an R-
linear map, then ker ϕ is a submodule of M and im ϕ is a submodule of M ′ . Moreover,
M / ker ϕ is isomorphic to im ϕ.

Theorem 10.3.2. (Second isomorphism theorem) Suppose that N1 , N2 are submodules


of M , then N1 + N2 is a submodule of M and N1 ∩ N2 is a submodule of N1 and N2 .
Besides, (N1 + N2 )/N1 is isomorphic to N2 /(N1 ∩ N2 ).

Theorem 10.3.3. (Third isomorphism theorem) Suppose that N ⊆ K ⊆ M such that


N, K are both submodules of M , then K/N is a submodule of M /N , and we have
(M /N )/(K/N ) is isomorphic to M /K.

Theorem 10.3.4. (Correspondence theorem) For any epimorphism ϕ : M → M ′ , the


submodules of M containing ker ϕ are in bijection with the submodules of M ′ via ϕ.

If we consider the set of R-linear maps from M to N , we can see that it actually forms
a module over R when R is commutative (in general it is only an abelian group).
Definition 10.3.3. Suppose that M, N are two modules over R. The set HomR (M, N )
is the module over R that contains homomorphisms of modules over R from M to N .
The addition and scalar multiplication are defined as

(ϕ + ϕ′ )(m) = ϕ(m) + ϕ′ (m)

and
(rϕ)(m) = r(ϕ(m))
for every ϕ, ϕ′ ∈ HomR (M, N ), m ∈ M, r ∈ R.

Definition 10.3.4. Suppose that M is a module over R. The set EndR (M ) is a ring
with 1 and at the same time the module over R that contains the R-endomorphisms of M .
In other words, EndR (M ) is the module HomR (M, M ) with an extra multiplication

(ϕϕ′ )(m) = ϕ(ϕ′ (m))

for every ϕ, ϕ′ ∈ EndR (M ) and m ∈ M .


For category-theoretic reasons, given a module M , we usually consider the relation between
N and HomR (M, N ) or the relation between N and HomR (N, M ). This is because
Hom is a “functor.”
Property 10.3.1. Let M, N, N ′ be modules over R. Suppose that ϕ : N → N ′ is an
R-linear map. Then we can construct canonically an R-linear map ϕ∗ : HomR (M, N ) →
HomR (M, N ′ ) by sending ψ ∈ HomR (M, N ) to ϕ ◦ ψ. Moreover, if ϕ is injective, then ϕ∗
is injective.

Sketch of Proof. To show that ϕ∗ is well-defined it suffices to show that the composition
of R-linear maps is still R-linear, which is quite clear. Now let’s verify that ϕ∗ is R-linear.
For every ψ, ψ ′ ∈ HomR (M, N ), r ∈ R and m ∈ M , we have that

(ϕ∗ (ψ + ψ ′ ))(m) = ϕ((ψ + ψ ′ )(m)) = ϕ(ψ(m) + ψ ′ (m)) = (ϕ∗ (ψ))(m) + (ϕ∗ (ψ ′ ))(m)

and
r(ϕ∗ (ψ))(m) = rϕ(ψ(m)) = ϕ(rψ(m)) = ϕ∗ (rψ)(m).
Therefore ϕ∗ is R-linear. Now suppose that ϕ is injective and ϕ∗ (ψ) = 0. Then for every
m ∈ M we have that ϕ(ψ(m)) = 0. Since ϕ is injective, we have that ψ(m) = 0 and so
ψ = 0. This shows that ϕ∗ is also injective.
We can similarly prove the following property.
Property 10.3.2. Let M, N, N ′ be modules over R. Suppose that ϕ : N → N ′ is an
R-linear map. Then we can construct canonically an R-linear map ϕ∗ : HomR (N ′ , M ) →
HomR (N, M ) by sending ψ ∈ HomR (N ′ , M ) to ψ ◦ ϕ. Moreover, if ϕ is surjective, then
ϕ∗ is injective.

10.4 Direct Sum and Direct Product


In the previous section we learned a method to produce a new module from two given
modules. In this section, we are going to introduce another two methods—direct sum
and direct product.
Let’s say that we have a family Ms (s ∈ S) of R-modules indexed by the set S, and
we have a family of R-linear maps ϕs : N → Ms . The question here is whether we can
represent this family of R-linear maps as a single R-linear map.
Definition 10.4.1. Suppose that Ms (s ∈ S) is a family of R-modules indexed by S,
then the direct product ∏s∈S Ms is the module defined naturally on the cartesian product
∏s∈S Ms . To be more specific, the operations are defined as

(as )s∈S + (bs )s∈S = (as + bs )s∈S

and
r(as )s∈S = (ras )s∈S
for every as , bs ∈ Ms and r ∈ R.

Property 10.4.1. (Universal property of direct product) Suppose that Ms (s ∈ S) is a


family of R-modules and N is a module over R. If ϕs is an R-linear map from N to Ms

for every s ∈ S, then there exists a unique R-linear map ϕ : N → s′ ∈S Ms′ such that



πs ◦ ϕ = ϕs for every s ∈ S where πs is the canonical projection from s′ ∈S Ms′ to Ms .
In other words, there is a unique ϕ such that the following diagram commutes for every
s ∈ S:

    N ----ϕs----> Ms
    |             ^
    ϕ             | πs
    v             |
    ∏s′ ∈S Ms′ ---+

Sketch of Proof. It is clear that we have to define ϕ(n) = (ϕs (n))s∈S for every n ∈ N .
The rest is just checking that ϕ is an R-linear map.

If we have instead a family of R-linear maps ϕs : Ms → N , and we try to compress
these into a single R-linear map, then we will get instead the direct sum of the modules.
Definition 10.4.2. Suppose that Ms (s ∈ S) is a family of R-modules indexed by S,
then the direct sum ⊕s∈S Ms is the submodule of ∏s∈S Ms consisting of the elements that
have only finitely many non-zero entries.

Property 10.4.2. (Universal property of direct sum) Suppose that Ms (s ∈ S) is a


family of R-modules and N is a module over R. If ϕs is an R-linear map from Ms to N

for every s ∈ S, then there exists a unique R-linear map ϕ : ⊕s′ ∈S Ms′ → N such that

ϕ ◦ is = ϕs for every s ∈ S where is is the inclusion map from Ms to ⊕s′ ∈S Ms′ . In other
words, there is a unique ϕ such that the following diagram commutes for every s ∈ S:

    Ms ----ϕs----> N
    |              ^
    is             | ϕ
    v              |
    ⊕s′ ∈S Ms′ ----+

Sketch of Proof. It is clear that we have to let



ϕ((ms )s∈S ) = ∑s∈S ϕs (ms ).

This is well-defined since only finitely many of s satisfy that ms ̸= 0. The rest is just
checking that this is an R-linear map.

Note that when S is a finite set, there is no difference between the direct sum
and the direct product of the family. The difference only appears because, in order to make
the universal property of the direct product hold, the direct product has to admit elements
with infinitely many nonzero entries, while the direct sum only has to admit elements with
finitely many nonzero entries since only finite addition is defined.


10.5 Balanced Product and Tensor Product


In Section 8.4, we constructed the tensor product by considering bilinear maps.
A similar thing can also be done in the module setting.
Definition 10.5.1. Suppose that M1 , M2 , N are R-modules. A map ϕ : M1 × M2 → N
is R-bilinear if for every fixed m1 ∈ M1 we have that ϕ(m1 , ·) : M2 → N is R-linear and
for every fixed m2 ∈ M2 we have that ϕ(·, m2 ) : M1 → N is R-linear.

Property 10.5.1. For every R-bilinear map ϕ : M1 × M2 → N , it induces naturally an


R-linear map ϕ̄ : M1 → HomR (M2 , N ).

Sketch of Proof. For every m1 ∈ M1 let ϕ̄(m1 ) = ϕ(m1 , ·). It is clear that ϕ̄ is well-
defined, and all we have to do is to verify that ϕ̄ is R-linear. For any m, m′ ∈ M1 and
r ∈ R we have that
ϕ̄(m + rm′ ) = ϕ(m + rm′ , ·) = ϕ(m, ·) + rϕ(m′ , ·) = ϕ̄(m) + rϕ̄(m′ ).

Note that in the case that R is non-commutative, things get a little off. What we
imagine at first is that a bilinear map is a product-like function. However, when R is
non-commutative and M, N are both left modules, the “imaginary product” is no
longer a bilinear map: there is no reason that we would want r(m × n) = (rm) × n =
m × (rn). A more natural assumption would be that (mr) × n = m × (rn), which requires
M to be instead a right module.
Definition 10.5.2. Suppose that M is a right R-module, N is a left R-module and
G is an abelian group. A map ϕ : M × N → G is an R-balanced product if for every
m ∈ M, n ∈ N we have that ϕ(m, ·), ϕ(·, n) are both group homomorphisms, and that
ϕ(m, rn) = ϕ(mr, n) for every r ∈ R.

To compress the data of an R-balanced product to a group homomorphism, we can


define the tensor product of M and N as follows.
Definition 10.5.3. Suppose that M is a right module over R and N is a left module
over R. Let F (M × N ) be the free abelian group generated by M × N , and let H be the
subgroup of F (M × N ) generated by the elements of one of the following forms:
(m, n1 + n2 ) − (m, n1 ) − (m, n2 ), m ∈ M, n1 , n2 ∈ N ;
(m1 + m2 , n) − (m1 , n) − (m2 , n), m1 , m2 ∈ M, n ∈ N ;
(mr, n) − (m, rn), m ∈ M, n ∈ N, r ∈ R.
Then the tensor product M ⊗R N is the abelian group F (M × N )/H.

The definition looks weird at first sight, but it actually makes sense: Recall that
taking the quotient by H is actually identifying the elements in H with 0. Since we want
that (m, n1 + n2 ) = (m, n1 ) + (m, n2 ), we hope that (m, n1 + n2 ) − (m, n1 ) − (m, n2 ) = 0,
and so we put this element into H. The other two forms arise for similar reasons.
There is a natural map M × N → M ⊗R N . This is usually denoted by ⊗, and
the image of (m, n) is usually denoted by m ⊗R n. When no ambiguity arises,
we will only write ⊗ instead of ⊗R .
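As a small worked example of computing with these relations: in Z/2Z ⊗Z Z/3Z we have 2(m ⊗ n) = (2m) ⊗ n = 0 ⊗ n = 0 and 3(m ⊗ n) = m ⊗ (3n) = m ⊗ 0 = 0, where 0 ⊗ n = m ⊗ 0 = 0 follows from additivity in each slot. Hence m ⊗ n = 3(m ⊗ n) − 2(m ⊗ n) = 0 for all m, n, and so Z/2Z ⊗Z Z/3Z = 0 (compare with problem 6 below).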


Property 10.5.2. (Universal property of tensor product) Suppose that ϕ : M ×N → G


is an R-balanced product. Then there exists a unique group homomorphism ϕ′ : M ⊗R
N → G such that ϕ′ ◦ ⊗ = ϕ. In other words, there exists a unique group homomorphism
ϕ′ such that the following diagram commutes.

    M × N ----ϕ----> G
    |                ^
    ⊗                | ϕ′
    v                |
    M ⊗R N ----------+

Sketch of Proof. Let ϕ̂ be the extension of ϕ from F (M × N ) to G. Then it is clear
by definition that H ⊆ ker ϕ̂, and so this naturally induces a group homomorphism
ϕ′ : M ⊗R N → G, as desired.

Note that if M is a left module over another ring S, then we can endow M ⊗R N with
a module structure over S by s(m ⊗ n) = (sm ⊗ n). Similarly, if N is a right module
over S, then M ⊗R N can be seen as a right module over S. Now if R is commutative,
then there is no difference between left and right modules. This tells us that M ⊗R N is
an R-module.
Property 10.5.3. (Universal property of tensor product over commutative ring) Sup-
pose that R is a commutative rng and M1 , M2 , N are all R-modules. Then for every R-
bilinear map ϕ : M1 × M2 → N there exists a unique R-linear map ϕ′ : M1 ⊗R M2 →
N such that ϕ′ ◦ ⊗ = ϕ. In other words, there is a unique R-linear map such that the
following diagram commutes.

    M1 × M2 ----ϕ----> N
    |                  ^
    ⊗                  | ϕ′
    v                  |
    M1 ⊗R M2 ----------+

Sketch of Proof. Note that when R is commutative, any R-bilinear map is R-balanced.
Therefore there is a unique group homomorphism ϕ′ that makes the diagram commute.
We only need to verify that this is R-linear, which is quite clear and is left as an exercise.

10.6 Group Representation Revisit


We have noticed that a group representation of G on V over F is the same data as an
F [G]-module structure on V . We can therefore restate some of the properties we’ve
discovered for group representations in the module setting.
Definition 10.6.1. A module M over R is simple if there is no proper non-trivial
submodule of M . A module M is semisimple if it can be decomposed as the direct sum
of simple modules.


Property 10.6.1. Suppose that M is a unital module over a ring R with 1. Then the
following three conditions are equivalent:
(1) M is the sum of some simple submodules;
(2) M is semisimple;
(3) For every submodule N of M , there exists a complement P which is a submodule
of M such that M = N ⊕ P .

Sketch of Proof. Let’s first show that (1) implies (2). Suppose that M is the sum of the
family of modules Ni (i ∈ S). Let Ni (i ∈ S ′ ) be a maximal subfamily such
that the sum is direct (the existence is guaranteed by Zorn’s lemma). We have to show
that M ′ := ∑i∈S ′ Ni is M , or equivalently, that it contains every Nj for j ∈ S. Note that if
it does not contain Nj , then since Nj is simple, we have that M ′ ∩ Nj = 0, and so
M ′ + Nj is actually a direct sum, which is a contradiction to the maximality of S ′ .
(2) implies (3) can be proved in a similar way. Suppose that M is the direct sum of
Ni (i ∈ S), and let S ′ be a maximal subset of S such that the sum of Ni (i ∈ S ′ ) is
direct and has zero intersection with N . Then we can prove similarly that N ⊕ M ′ = M
where M ′ = ⊕i∈S ′ Ni .
To show that (3) implies (1), let N be the sum of all simple submodules of M . We have
to show that N = M . Suppose that this is not true, and P is a non-trivial submodule that
is a complement of N . Then there exists a nonzero element x ∈ P , and so 0 ̸= Rx ⊆ P .
Let ϕ : R → Rx be the R-linear map sending r to rx, then R/ ker ϕ ∼= Rx. By Zorn’s
lemma there is a maximal left ideal L of R containing ker ϕ, and so Lx is a maximal submodule
of Rx. Let P ′ be a complement of Lx, then we can show that P ′ ∩ Rx is simple—if S is a
submodule of P ′ ∩ Rx, then S ⊕ Lx is contained in Rx and contains Lx. This shows that
S ⊕ Lx = Lx or Rx. In the first case S = 0, and in the second case S = P ′ ∩ Rx.

Property 10.6.2. The direct sum of a family of semisimple modules is still semisimple.
Any submodule and quotient module of a semisimple module are still semisimple.

Sketch of Proof. Let Mi (i ∈ S) be a family of semisimple modules over R, then every
Mi is the sum of simple submodules of Mi . In other words, for every mi ∈ Mi , we can
decompose mi into mi1 + · · · + miti such that mij belongs to some simple submodule of
Mi . Therefore every element

∑_{i∈S} mi ∈ ⊕_{i∈S} Mi

can be decomposed as

∑_{i∈S} ∑_{j=1}^{ti} mij .

Note that this is a finite sum since ∑_{i∈S} mi is also a finite sum. Therefore ⊕_{i∈S} Mi is a sum
of simple submodules, which shows that ⊕_{i∈S} Mi is semisimple.
Now suppose that M is a semisimple module and N is a submodule of M , then we
know that M is the sum of some simple submodules. For every simple submodule, the
intersection of it and N is either itself or 0. Therefore N is also the sum of some simple
submodules, which implies that N is semisimple.
Lastly, let’s show that M /N is semisimple. We know that there exists P such that
M = N ⊕ P , and so M /N ∼ = P . Since P is a submodule of M , P is semisimple, and so
M /N is also semisimple.


Corollary 10.6.1. If R is a ring with 1 that is semisimple as a module over R, then


every unital module over R is semisimple.

Sketch of Proof. Suppose that R is semisimple over itself and M is a module over R.
Then we can construct a map

ϕ : ⊕m∈M Rm → M

where Rm is just a copy of R and ϕ(1Rm ) = m. Since this map is surjective, M is a


quotient module of ⊕m∈M Rm . Since ⊕m∈M Rm is semisimple, M is also semisimple.

With the definitions of simplicity and semisimplicity, we can now restate Maschke’s
theorem proved before.
Theorem 10.6.1. (Maschke’s theorem) Suppose that V is a finite dimensional space
over C that is also a C[G]-module where G is a finite group, then V is a semisimple
module over C[G].

We can actually prove something stronger under this setting.


Theorem 10.6.2. (Maschke’s theorem) Suppose that G is a finite group and F is a
field with characteristic not dividing |G|, then every F [G]-module is semisimple.

Sketch of Proof. It suffices to show that F [G] is semisimple over itself. This is equivalent
to showing that for every submodule V of F [G], there exists a submodule P such that
F [G] = V ⊕ P , which in turn is equivalent to finding an F [G]-linear map from F [G] onto
V that restricts to the identity on V .
Since an F [G]-module can also be seen as an F -module, or equivalently a vector space over
F , there exists an F -linear projection π from F [G] onto V . Now let ϕ : F [G] → V be such that

ϕ(x) = (1/|G|) ∑_{g∈G} g π(g −1 x).

Then ϕ is again F -linear. Moreover, to check F [G]-linearity it suffices to consider s ∈ G,
for which we have

ϕ(sx) = (1/|G|) ∑_{g∈G} g π(g −1 sx) = (1/|G|) ∑_{g∈G} (sg) π(g −1 x) = s ϕ(x),

where the middle equality comes from reindexing the sum by g ↦ sg,

which shows that ϕ is F [G]-linear. Lastly, we have to show that ϕ is surjective. Since
gV ⊆ V for every g ∈ G, for every v ∈ V we have that

ϕ(v) = (1/|G|) ∑_{g∈G} g π(g −1 v) = (1/|G|) ∑_{g∈G} g g −1 v = v.

Therefore ϕ is surjective, as desired.
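To make the averaging trick concrete, here is a tiny numerical sketch (an illustration added here, not part of the proof): take G = C2 acting on R2 by swapping the two coordinates and V the invariant line spanned by (1, 1). Averaging an arbitrary F -linear projection onto V produces an F [G]-linear one.

    import numpy as np

    P = np.array([[0., 1.], [1., 0.]])    # action of the non-identity g (here g = g^{-1})
    pi = np.array([[1., 0.], [1., 0.]])   # an F-linear projection onto V: (a, b) -> (a, a)

    phi = (pi + P @ pi @ P) / 2           # phi = (1/|G|) * sum_g g pi g^{-1}

    print(phi)                            # [[0.5 0.5], [0.5 0.5]]
    print(np.allclose(phi @ P, P @ phi))  # True: phi commutes with the G-action
    print(phi @ np.array([1., 1.]))       # [1. 1.]: phi is the identity on V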

We can also restate Schur’s lemma in the new language that we just learned.


Theorem 10.6.3. (Schur’s lemma) Suppose that G is a finite group and M1 , M2 are
two simple modules over C[G], then HomC[G] (M1 , M2 ) = 0 unless M1 , M2 are isomorphic.
Moreover, EndC[G] (M1 ) ∼= C.

This actually holds in a much more general setting.


Theorem 10.6.4. (Schur’s lemma) Suppose that R is a ring with 1 and M1 , M2 are
two simple modules over R, then HomR (M1 , M2 ) = 0 unless M1 , M2 are isomorphic.
Moreover, EndR (M1 ) is a ring with 1 such that every nonzero element is invertible.

Sketch of Proof. Suppose that ϕ : M1 → M2 is an R-linear map, then ker ϕ is a submodule


of M1 and im ϕ is a submodule of M2 . Then it is easy to see that if M1 , M2 are not
isomorphic, then im ϕ = 0 and ker ϕ = M1 . This shows that HomR (M1 , M2 ) = 0.
Now following the same argument we can see that every ϕ ∈ EndR (M1 ) is either
0 or an isomorphism. Therefore ϕ is either 0 or invertible.

Corollary 10.6.2. If R contains a field F and M is a simple module over R such that
dimF M is finite, then EndR (M ) is finite dimensional over F . In particular, if F is
algebraically closed, then EndR (M ) ∼= F .

Sketch of Proof. Since EndR (M ) ⊆ EndF (M ) as an F -vector space, we have that
EndR (M ) is finite dimensional over F . Now suppose that F is algebraically closed, and let x ∈ EndR (M ).
Since EndR (M ) is finite dimensional over F , there exists n ∈ N such that x0 , · · · , xn
are linearly dependent over F . This shows that x is a root of a nonzero polynomial with co-
efficients in F , and since F is algebraically closed, we can split the polynomial into
(x − f1 ) · · · (x − fn ) = 0 where fi ∈ F . Since every nonzero element is invertible, we have
that one of the x − fi is zero, and so x ∈ F . Therefore EndR (M ) = F .

10.7 Structure Theorem for Module over PID


In this section, we will introduce a really important result about the structure of modules
over a PID. In the next section, we will see some corollaries derived from this powerful
theorem.
Before we dive into the world of modules over a PID, let’s first develop some more
theory for general modules.
Definition 10.7.1. Suppose that M is a unital module over R. A subset S is said to
freely generate M if it generates M and for every distinct x1 , . . . , xn ∈ S and r1 , . . . , rn ∈ R, we have
that
r1 x1 + · · · + rn xn = 0 ⇒ r1 = · · · = rn = 0.
In this case, we will also say that S is a basis of M . If M is freely generated by some
subset, then we say that M is a free module over R.

Example 10.7.1. We know that if R is a field, then M is a vector space over R and
thus must have a basis. Therefore any module over a field is always free.
On the other hand, we know that Z/2Z is a module over Z. However, for every
x ∈ Z/2Z we have that 2x = 0, which shows that Z/2Z has no basis.


Property 10.7.1. A unital module M over R is free if and only if it is a direct sum of
copies of R.

Sketch of Proof. A direct sum of copies of R is clearly free. Conversely, if M is a free


module, suppose that S is a basis of M . Let Rs (s ∈ S) be copies of R, then we can
construct an R-linear map

ϕ : ⊕s∈S Rs → M

sending 1Rs to s. Then the definition implies that ϕ is a bijection, and so M is isomorphic
to the direct sum of Rs (s ∈ S).

Now let’s prove the first piece of the structure theorem.


Theorem 10.7.1. Suppose that R is a PID. Then any submodule of a finitely generated
free unital module over R is still a finitely generated free module.

We can actually split this into two parts (with a bit of generalization):
Lemma 10.7.1. Suppose that R is a left Noetherian ring. Then any submodule of a
finitely generated unital module over R is finitely generated.

Sketch of Proof. Let M be a finitely generated unital module generated by the elements
m1 , . . . , mn . We can do an induction on n. The statement is trivial when n = 0. Suppose
that it holds when n = k; then for n = k + 1, suppose that N is a submodule of M . Let I be
the left ideal consisting of the elements i such that there exist r1 , . . . , rk ∈ R satisfying

r1 m1 + · · · + rk mk + imk+1 ∈ N.

Since R is left Noetherian, we know that I is finitely generated. Suppose that it is
generated by i1 , . . . , is , and let n1 , . . . , ns be elements in N such that for every j there exist
r1 , . . . , rk ∈ R satisfying

r1 m1 + · · · + rk mk + ij mk+1 = nj ∈ N.

Also suppose that N ′ is the intersection of N and the submodule M ′ generated by


m1 , . . . , mk . Then it is clear that

N ′ + ⟨n1 , . . . , ns ⟩ = N.

Now note that N ′ is a submodule of M ′ , which is generated by k elements. So by induction


hypothesis, N ′ is also finitely generated. This shows that N is also finitely generated.

Lemma 10.7.2. Suppose that R is a PID, M is a free module over R, and N is
a submodule of M . Then N is a free module over R. Moreover, N has a basis of rank
at most the rank of a basis of M .

Sketch of Proof. Let M ∼= ⊕i∈I R. By Zermelo’s well-ordering theorem, we can assume that
there is a well-ordering ≤ on I. For every i ∈ I let Fi = N ∩ ⊕j≤i R. We can consider the
projection map at the i-th position πi : Fi → R; the image will be an ideal of R. Since
R is a PID, we know that there is ai ∈ R such that πi (Fi ) = ai R. For every ai ̸= 0, let
ni ∈ Fi be such that πi (ni ) = ai . We will show that those ni form a basis of N .


Let’s first show that the ni are linearly independent. Suppose that there exist i1 < . . . < ik
and ri1 , . . . , rik ̸= 0 ∈ R such that

ri1 ni1 + · · · + rik nik = 0.

Then by applying πik , we see that


0 = rik aik ,
which is a contradiction since R is an integral domain. Therefore they are linearly inde-
pendent.
Lastly, let’s show that the ni generate N . Suppose that this is not the case; then
the set of j such that Fj is not generated by the ni is non-empty. Since (I, ≤)
is well-ordered, there is a least such j. Suppose that a ∈ Fj is
not generated, and that

a = ri1 1i1 + · · · + rik 1ik

where i1 < · · · < ik = j and ri1 , . . . , rik ̸= 0 ∈ R. By the definition of nj we know that we
can pick r ∈ R such that πj (a − rnj ) = 0. Suppose that

nj = st1 1t1 + · · · + stk′ 1tk′

where t1 < · · · < tk′ = j and st1 , . . . , stk′ ̸= 0 ∈ R. Since a is not generated, we know that
a − rnj is not generated either. However, we know that a − rnj ∈ Fmax(ik−1 ,tk′ −1 ) , which
is a contradiction with the minimality of j.
Remark. The proof above may seem scary, but think for a while about the case where M has
finite rank. It will then be clear why the proof works.

Now suppose that M is a finitely generated module over a PID R, then we know that
there exists m such that M is a quotient module of Rm . We know that the kernel is a
submodule of Rm , and so by the theorem we know that this submodule is also a finitely
generated free module. This tells us that there is a module homomorphism ϕ : Rn → Rm
such that ϕ(Rn ) is the kernel, or in other words, M ∼= Rm /ϕ(Rn ). Therefore the whole
theory of finitely generated modules reduces to the theory of maps from Rn to Rm . If we
can derive a canonical form of linear maps from Rn to Rm , then we can represent every
finitely generated module M over a PID in a canonical form.
Theorem 10.7.2. (Smith normal form) Suppose that R is a PID and ϕ : Rn → Rm is
an R-linear map. Then we can choose a basis β1 , . . . , βn of Rn and a basis β1′ , . . . , βm′ of Rm
such that

ϕ(βi ) = di βi′

for every i = 1, . . . , n and that d1 | d2 | · · · | dn . Here if n > m then di = 0 for all i > m.

Sketch of Proof. Suppose that A is the matrix form of ϕ with respect to any two bases
of Rn and Rm . Suppose that we can do a series of changes of bases making a11 the greatest
common divisor of all entries. Then we can make the a1j and ai1 all zero for i, j > 1 by a
further series of changes of bases, and then we can just induct on min(m, n). Therefore it suffices
to make a11 the greatest common divisor of all entries. To do this, we only need to know
how to make a specific entry the greatest common divisor of the entries in the same row,
or in the same column. To achieve this, we just need a tool such that for each i, j, j ′ , we
can replace aij with the gcd of aij and aij ′ . Since R is a PID, we can assume that the


gcd of aij , aij ′ is xaij + yaij ′ . It is clear that gcd(x, y) = 1, and so there are x′ , y ′ such that
xy ′ − yx′ = 1. This tells us that we can make a change of basis such that the j-th column
is x times j-th column plus y times j ′ -th column, and the j ′ -th column is x′ times j-th
column plus y ′ times j ′ -th column. In this case, aij is replaced by xaij + yaij ′ , which is
the gcd of aij , aij ′ .
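In practice one rarely carries out these base changes by hand; over Z, for instance, sympy can compute the Smith normal form directly. A quick sketch (the example matrix is arbitrary):

    from sympy import Matrix, ZZ
    from sympy.matrices.normalforms import smith_normal_form

    m = Matrix([[12, 6, 4],
                [3, 9, 6],
                [2, 16, 14]])
    # Diagonal entries d1 | d2 | d3, with d1*d2*d3 = det(m) up to units (here, sign).
    print(smith_normal_form(m, domain=ZZ))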
Corollary 10.7.1. (The structure theorem of modules over a PID) Suppose that M is a
finitely generated module over a PID R, then there exist d1 | · · · | dn such that M is
isomorphic to

R/d1 R × · · · × R/dn R.

Moreover, such a sequence is unique (up to associates) if d1 is not a unit.

Corollary 10.7.2. (The primary decomposition of modules over a PID) Suppose that M
is a finitely generated module over a PID R, then there exist q1 , . . . , qn , where each qi is
an associate of a power of an irreducible or is zero, such that M is isomorphic to

R/q1 R × · · · × R/qn R.

Moreover, such a sequence is unique up to permutation and associates.

10.8 Two Applications of the Structure Theorem


To apply the structure theorem, we first have to have a module over a PID. The easiest
example is modules over Z. Since Z-modules are exactly the same as abelian groups,
we can simply translate the theorems proved in the previous section into statements
about finitely generated abelian groups.
Theorem 10.8.1. If G is a finitely generated free abelian group, then every subgroup
of G is also a finitely generated free abelian group.

Theorem 10.8.2. (Fundamental theorem of finitely generated abelian groups) If G is
a finitely generated abelian group, then there exists a sequence of non-negative integers
d1 , d2 , . . . , dn where di |di+1 and d1 ̸= 1 such that

G ∼= Z/d1 Z × Z/d2 Z × · · · × Z/dn Z,

and such a sequence is unique. Alternatively, there exist a sequence of prime powers
q1 , q2 , . . . , qn and a non-negative integer m such that

G ∼= Z/q1 Z × · · · × Z/qn Z × Zm ,

and such an m is unique, and such a sequence is unique up to permutation.

With this theorem, we can confidently say that we understand finitely generated
abelian groups well enough. For example, we can now easily calculate the number of
abelian groups of a given order.
Example 10.8.1. Let’s calculate the number of abelian groups of order 360 = 2^3 × 3^2 × 5.
By the fundamental theorem of finitely generated abelian groups, we know that if G is an
abelian group of order 360, then G can be written as

G ∼= Z/q1 Z × · · · × Z/qn Z


for a sequence of prime powers q1 , . . . , qn . Also, we know that q1 q2 · · · qn = |G| = 360.
Therefore each qi can only be a power of 2, 3 or 5. The part consisting of powers of 2 can
only be Z/8Z, Z/4Z × Z/2Z or (Z/2Z)^3 . The part consisting of powers of 3 can only be
Z/9Z or (Z/3Z)^2 . The part consisting of powers of 5 can only be Z/5Z. Therefore there
are 3 × 2 × 1 = 6 abelian groups of order 360.
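The count can be phrased uniformly: abelian groups of order p^e correspond to partitions of e, so the number of abelian groups of order n is the product of the partition numbers of the exponents in the prime factorization of n. A short sympy sketch:

    from sympy import factorint, npartitions

    def count_abelian_groups(n):
        count = 1
        for p, e in factorint(n).items():
            count *= npartitions(e)  # one abelian p-group of order p^e per partition of e
        return count

    print(count_abelian_groups(360))  # 3 * 2 * 1 = 6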

Another good example of a PID is the polynomial ring F [x] over a field F . We know
that every finitely generated F [x]-module is also an F -module, which means that it is a
vector space over F . Hence it is natural to start with a finite dimensional vector space
V over F .
Now for every v ∈ V and a ∈ F we know how to define av. To have an F [x]-module
structure on V , it remains to determine how to define x · v for every v ∈ V . It is not
hard to show that determining x · v is equivalent to determining a linear operator on V .
Therefore, given a linear operator T , we can define an F [x]-module structure on V by

f · v = [f (T )](v)
for every f ∈ F [x] and v ∈ V . Then, by the structure theorem, we know that V as an
F [x]-module is isomorphic to

F [x]/⟨f1 ⟩ ⊕ F [x]/⟨f2 ⟩ ⊕ · · · ⊕ F [x]/⟨fn ⟩
for some f1 , . . . , fn ∈ F [x]. This decomposes V into direct summands F [x]/⟨fi ⟩, and to
study the operation of T on V , it suffices to study it on each F [x]/⟨fi ⟩. Suppose that fi
is a polynomial of degree n, then F [x]/⟨fi ⟩ is an n-dimensional vector space, and we can

choose {1, x, . . . , xn−1 } as a basis. Assume that fi = xn + ∑_{j=0}^{n−1} aj xj , then

T (1) = x, T (x) = x2 , . . . , T (xn−2 ) = xn−1 , T (xn−1 ) = −∑_{j=0}^{n−1} aj xj .

Hence the matrix form of T with respect to this basis is

    [ 0  0  · · ·  0  −a0   ]
    [ 1  0  · · ·  0  −a1   ]
    [ 0  1  · · ·  0  −a2   ]
    [ ⋮  ⋮    ⋱    ⋮   ⋮    ]
    [ 0  0  · · ·  1  −an−1 ]
Definition 10.8.1. The rational block corresponding to the polynomial f (x) = xn +
∑_{i=0}^{n−1} ai xi is the matrix above.

With this definition, we can now state the result.


Theorem 10.8.3. (Rational canonical form) Suppose that T is a linear operator on a
finite dimensional vector space V , then there exists a basis such that the matrix form of
T is of the form

    [ Q1            ]
    [     Q2        ]
    [        ⋱      ]
    [           Qn  ]

where Qi is the rational block corresponding to the monic polynomial fi ∈ F [x], and
f1 |f2 | · · · |fn . Moreover, this sequence of polynomials is unique.


Definition 10.8.2. The matrices in the form above are said to be in their rational
canonical form.

Corollary 10.8.1. For any square matrix A, it is similar to a matrix in the rational
canonical form. Moreover, there is a unique matrix in the rational canonical form that is
similar to A.

Rational canonical form solves the problem we encountered when talking about Jordan
canonical form: we no longer need to work in an algebraically closed field in order to get
a canonical form of a linear operator. Rational canonical form also tells us how to decide
whether two given matrices are similar. In addition, it also gives a way to calculate the
minimal polynomial of a linear operator.
Property 10.8.1. Suppose that T is a linear operator on a finite dimensional vector
space V , and the rational blocks in the rational canonical form of T correspond to the
polynomials f1 , f2 , . . . , fn ∈ F [x]. Then the minimal polynomial of T is fn .

Sketch of Proof. We know that

V ∼= F [x]/⟨f1 ⟩ ⊕ · · · ⊕ F [x]/⟨fn ⟩

where x · v = T (v) for every v ∈ V . Therefore for any g ∈ F [x], we know that g(T ) = 0 if
and only if fi |g for every i. Since f1 | · · · |fn , this is equivalent to fn |g, and so fn is
the minimal polynomial of T .
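The minimal polynomial can also be computed directly, without first finding the rational canonical form, by detecting the first linear dependence among the powers I, T, T 2 , . . . of the operator. A minimal sympy sketch (an illustration added here, not an optimized routine):

    from sympy import Matrix, symbols

    def minimal_polynomial_of(A):
        # Find the least k with A^k a combination of I, A, ..., A^{k-1},
        # by stacking the vectorized powers and computing a nullspace.
        x = symbols('x')
        n = A.rows
        cols, P = [], Matrix.eye(n)
        while True:
            cols.append(P.reshape(n * n, 1))
            ns = Matrix.hstack(*cols).nullspace()
            if ns:
                c = ns[0]
                c = c / c[c.rows - 1]  # normalize so the top coefficient is 1
                return sum(c[i] * x**i for i in range(c.rows))
            P = A * P

    A = Matrix([[2, 0], [0, 2]])
    print(minimal_polynomial_of(A))  # x - 2, while the characteristic polynomial is (x - 2)^2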

10.9 Random Problem Set


1. (10.1) Show that every abelian group M can be viewed as an End(M )-module.

2. (10.2) We know that every submodule of a Noetherian module is Noetherian.


Show that this is not true if we replace both “Noetherian” with “finitely generated.”
Hint: You might want to consider a (really simple) module over a non-Noetherian
ring.

3. (10.3) Let M, N, N ′ be modules over R. We know that if ϕ : N → N ′ is an injective


R-linear map, then ϕ∗ : Hom(M, N ) → Hom(M, N ′ ) is also injective. Construct an
example where ϕ is surjective instead, but ϕ∗ is not surjective.

4. (10.4) For any R-modules M, N , show that every ϕ ∈ HomR (M, N ) induces a
submodule of M × N , namely {(m, ϕ(m)) | m ∈ M }.

5. (10.5) Suppose that F is a field and V, W are vector spaces over F . Show that
V ⊗ W , where the tensor product is the one defined for vector spaces, is the same as
V ⊗F W as F -modules.

6. (10.5) Compute the module Z/mZ ⊗Z Z/nZ where m, n are positive integers.

7. (10.5) Suppose that R is an integral domain and F is its fraction field. If M, N are
two R-modules and ϕ : M → N is R-linear and injective, show that the induced
map M ⊗R F → N ⊗R F is also injective.

159
Hung-Hsun, Yu 10.9 Random Problem Set

8. (10.6) Try to rewrite the proof of orthogonality of characters into the language of
modules.

9. (10.7) Suppose that M is a module over R such that for every 0 ̸= m ∈ M and
0 ̸= r ∈ R, we have rm ̸= 0 (we say that M is torsion-free in this case). If R is
PID and M is finitely generated and torsion-free, show that M is a free module.
Use this to give an alternative proof of the structure theorem of module over PID.

10. (10.7) Given a matrix M over any PID, let d1 |d2 | . . . |dn be the diagonal entries of
its Smith’s normal form. Show that d1 . . . di is the gcd of the all the i by i minors.

11. (10.8) Suppose that (G, +) is a finite abelian group with order pα1 1 pα2 2 · · · pαnn . Show
that G is cyclic if and only if for every pi , the number of elements x in G that
satisfy pi x = 0 is pαi i −1 .

12. (10.8) Calculate the rational canonical form of


 
−1 2 0 4
 2 0 −1 −2
 
 
 0 1 0 2
−2 1 1 3

and derive the minimal polynomial of this matrix.

160
Chapter 11

Field and Galois Theory

In the last chapter, we are going to study Galois theory, which originates from the attempt
to give a radical formula of roots of polynomials with high degrees and divide an angle
into three identical angles with ruler and compass. Galois theory turns out to be more
powerful than that, and it becomes a canonical and fundamental language in various
fields of math studies.
But before we even dive into Galois theory, which has a lot to do with field extension,
we should first understand field better than we do now. In particular, I will talk more
about finite field, which we try to avoid in most of the notes. This will make it easier to
work with the field with nonzero characteristic, which allows us to understand better the
field where Galois theory does not behave as well.

11.1 Field Extension


The main object (or relationship between objects if you prefer) in this whole chapter will
be two fields F, K where K contains F . Of course we can think of F as a subfield of K,
but it turns out that we usually fix the base field F , and so it is more convenient to think
of K as an “extension” of F .
Definition 11.1.1. Suppose that F is a field and K is a field containing F , then we
say that F ⊆ K is a field extension, or K is an extension of F .

Property 11.1.1. If F ⊆ K is a field extension, then the characteristic of F is the


same as the characteristic of K.

There are a lot of desired properties that we will want the field extensions to have.
Let’s first name some basic ones.
Definition 11.1.2. Suppose that F ⊆ K is a field extension, then K is a vector space
over F . Hence the dimension of K over F is defined, and it is denoted by [K : F ]. We
say that the degree of the extension is [K : F ].

Definition 11.1.3. If F ⊆ K is a field extension, and [K : F ] < ∞, then we say that


this extension is finite, or K is a finite extension of F .

Definition 11.1.4. For any field extension F ⊆ K and any element x ∈ K, we say that
x is algebraic over F if there exists a nonzero polynomial f ∈ F [x] such that f (x) = 0 in

161
Hung-Hsun, Yu 11.1 Field Extension

K. The field extension F ⊆ K is algebraic if every element in K is algebraic over F .

Most of the field extensions discussed in this chapter will be finite, and almost all of
them will be algebraic. In particular, finite extension is always algebraic.
Property 11.1.2. An extension is finite if and only if it is algebraic and K is a finitely
generated F -algebra.

Sketch of Proof. If F ⊆ K is finite, then it is clearly finitely generated. Suppose that


[K : F ] = n, then for every x ∈ K, we know that 1, x, . . . , xn are linealy dependent, and
so we get a polynomial relation of x with coefficients in F . Hence x is algebraic, and so
K is an algebraic of extension.
Conversely, if K is finitely generated F -algebra and also an algebraic extension of
F , suppose that K = F [x1 , . . . , xn ]. We will first show that F [x1 , . . . , xi+1 ] is a finite
extension of F [x1 , . . . , xi ], and then by the following lemma this will conclude the proof.
Hence we reduce the problem to the case where K = F [x].
Since K = F [x], we know that every element in K is of the form f (x) for some
f ∈ F [t]. Since x is algebraic over F , we know that there exists g ∈ F [t] such that
g(x) = 0. Then by taking the remainder of f divided by g, each element in K can be
represented as r(x) for r ∈ F [t] and deg r < deg g. Therefore 1, x, . . . , xdeg g−1 span K
over F , and so K is a finite extension of F .

Lemma 11.1.1. Suppose that L is an extension of F and K is an extension of L,


then F ⊆ K is finite if and only if F ⊆ L, L ⊆ K are both finite, and in this case,
[K : F ] = [K : L][L : F ].

Sketch of Proof. If [K : F ] is finite, then [K : L], [L : F ] ≤ [K : F ] < ∞, which shows


that F ⊆ L, L ⊆ K are both finite.
Conversely, if [K : L], [L : F ] are both finite, then we can choose a basis a1 , . . . , an of
K over L, and a basis b1 , . . . , bm of L over F . Then ai bj (1 ≤ i ≤ n, 1 ≤ j ≤ m) form a
basis of K over F , and so [K : F ] = [K : L][L : F ] < ∞.

From this, we can actually see a way to construct a finite field extension.
Property 11.1.3. Suppose that F is a field and f ∈ F [t] is an irreducible polynomial.
Then F [t]/⟨f ⟩ is a finite extension. Moreover, [F [t]/⟨f ⟩ : F ] = deg f .

The extension that we form this way will always be of the form K = F [θ] for some θ
in K.
Definition 11.1.5. A field extension F ⊆ K is simple if there exists θ ∈ K such that
F [θ] = K. In this case, we say that θ is a primitive element in K over F .

Property 11.1.4. Suppose that F is a field. Then every finite simple extension K of
F is ismorphic to F [t]/⟨f ⟩ for some irreducible polynomial f in F [t].

Sketch of Proof. Suppose that θ is a primitive element of K over F , then F [θ] = K. In


other words, the homomorphism F [t] → K fixing F and sending t to θ is surjective.
Therefore K ∼= F [t]/⟨f ⟩ for some f ∈ F [t] because F [t] is a PID. Since K is a field, we
know that ⟨f ⟩ is maximal, and so f is irreducible.

162
Hung-Hsun, Yu 11.2 Finite Field Part One

Definition 11.1.6. Let F ⊆ K be a field extension and let x ∈ K be an element


algebraic over F , then the monic polynomial f such that F [x] ∼
= F [t]/⟨f ⟩ is called the
minimal polynomial of x. Equivalently, f is the monic polynomial with the smallest
degree such that f (x) = 0 in K.

Example 11.1.1. Consider the field extension R ⊆ C. This is finite and simple: we
can take C = R[i]. Therefore by the above construction we know that C ∼
= R[t]/⟨t2 + 1⟩,
and the minimal polynomial of i is x2 + 1.

Example 11.1.2. We can also consider the polynomial t2 + t + 1 ∈ F2 [t]. This is


irreducible since t2 + t + 1 = t(t + 1) + 1, and so K = F2 [t]/⟨t2 + t + 1⟩ is a finite simple
extension of F2 . Moreover, [K : F2 ] = 2, and so |K| = |F2 |2 = 4. Therefore we get a field
with four elements.

11.2 Finite Field Part One


Now let’s finally start to investigate the properties of finite field. Suppose that F is a
finite field, then we know that char F ̸= 0 since |F | · 1 = 0 by Lagrange’s theorem (or
fundamental theorem of finite abelian group if you prefer). Therefore the characteristic
of F is a prime p. With this characteristic p, we can say a lot about F .
Property 11.2.1. If F is a finite field with characteristic p, then |F | is a power of p.

Sketch of Proof. Suppose that q is a prime dividing |F |, then by Cauchy’s theorem (or
again FTFAG if you prefer) there exists x ̸= 0 such that q · x = 0 in F . Multiplying by
x−1 , we get that q · 1 = 0. Hence q = p, and so |F | can only be the power of p.
Now we can ask ourselves: given a prime p and q, a power of p, how many finite fields
(up to isomorphism) are of order q? If q = pn , then by the idea given by the previous
section, it is natural to take an irreducible polynomial f ∈ Fp [t] with degree n and then
construct F = Fp [t]/⟨f ⟩. However, it is not immediate to see why such polynomial f
exists. We will soon give a construction of a field Fq with order q = pn , but for now, let’s
assume that we already have this field.
Property 11.2.2. Suppose that F is a finite field of order q = pn , then every element
x in F satisfies the relation xq − x = 0.
×|
Sketch of Proof. We know that F × is a group, and so by Lagrange’s theorem 1 = x|F =
xq−1 for all x ̸= 0. Therefore xq = x for every x ∈ F .
Corollary 11.2.1. If F is a finite field of order q, then

(t − x) = tq − t.
x∈F

Moreover, for every f, g ∈ F [t], we have that f (x) = g(x) for all x ∈ F if and only if
tq − t divides f − g.

Sketch of Proof. Since for every x ∈ F , we have that x is a root of tq − t, we know that
t − x divides tq − t. Hence, ∏
(t − x)|(tq − t),
x∈F

163
Hung-Hsun, Yu 11.2 Finite Field Part One

and by comparing the degree and the leading coefficient, we know that those two poly-
nomials are the same.
Now f (x) = g(x) for all x ∈ F if and only if t − x divides f − g for all x ∈ F . Hence
this is equivalent to that tq − t divides f − g.
Example 11.2.1. We know that F = F2 [t]/(t2 +t+1) is a field consisting of 4 elements.
Let x be the image of t in F , then the elements in F are 0, 1, x and x + 1. Then

(t−0)(t−1)(t−x)(t−(x+1)) = (t2 −t)(t2 −(2x+1)t+x(x+1)) = (t2 −t)(t2 +t+1) = t4 −t.

Therefore we can think of F as “the field consisting of roots of tq − t.” This does not
quite make sense now, since we don’t know if it exists or if it is unique. But if tq − t some
how splits into linear factors, then we can get a field consisting of q elements out of this.
Lemma 11.2.1. If F is a field with characteristic p, then (x + y)p = xp + y p for every
x, y in F .

Sketch of Proof. By binomial theorem, we know that


( )
p

p
p i p−i
(x + y) = xy .
i=0 i
Äp ä
Since p divides i
for all i ̸= 0, p, we have that

(x + y)p = xp + y p .

Theorem 11.2.1. If Fp ⊆ K is a field extension, and tq − t splits into linear factors,


then the roots of tq − t in K forms a field of q elements.

Sketch of Proof. Let F be the set of roots of tq − t. We know that 0, 1 ∈ F . It is also


clear that if x ∈ F then −x, x−1 ∈ F . Now to show that F is a field, it remains to show
that x, y ∈ F implies xy ∈ F and x + y ∈ F . We know that if x, y ∈ F , then

(xy)q = xq y q = xy

and
n n n
(x + y)q = (x + y)p = xp + y p = xq + y q = x + y,
which shows that xy, x + y ∈ F , as desired.
We still need to show that F has exactly q elements. Since tq − t splits into linear
factors, if |F | < q, then there exists x ∈ F such that (t − x)2 |tq − t. This shows that
x is a double root of tq − t, and so x should be a root of the derivative of tq − t. Since
the derivative of tq − t is qtq−1 − 1 = −1, we know that this cannot happen. Therefore
|F | = q.
Remark. We use the derivative of polynomials in the proof. However, the usual defini-
tion of taking a derivative does not work in a general field K. Here, taking the derivative
means calculating the derivative formally, i.e.
d ∑ n ∑n
ai ti = iai ti−1 .
dt i=0 i=1

164
Hung-Hsun, Yu 11.3 Splitting Field

It can be checked that the formulas we have for the usual derivatives still hold:

(f + g)′ = f ′ + g ′ , (f − g)′ = f ′ − g ′ , (f g)′ = f ′ g + f g ′ .

Therefore (t − x)2 |f if and only if (t − x)|f and (t − x)|f ′ .

Example 11.2.2. Consider the polynomial f (t) = t4 + t3 + 1 ∈ F2 [t]. f has no roots


in F2 [t], and it is not divisible by t2 + t + 1, the only irreducible polynomial of degree 2.
Therefore f is irreducible in F2 [t], and so we can consider the field K = F2 [t]/(f ) with
16 elements. Again, let the image of t in K be x.
Now, the polynomial of t4 − t happens to split into linear factors in K. It can be seen
by the fact that t4 − t divides t16 − t, and t16 − t splits into linear factors in K. To show
it explicitly, we can write t4 − t as

t(t − 1)(t2 + t + 1) = t(t − 1)(t − (x3 + x))(t − (x3 + x + 1)).

Then we know that {0, 1, x3 + x, x3 + x + 1} forms a field of 4 elements. It can also be


checked that this field is isomorphic to F2 [t]/⟨t2 + t + 1⟩.

11.3 Splitting Field


As we see in the previous section, to construct a finite field of order q, we want to find the
“minimal” field extension of Fp such that tq − t splits into linear factors. More generally,
given a field F and a polynomial f ∈ F [t], we might want to find the “minimal” field
extension such that f splits into linear factors. We will see later that this is one of the
conditions for Galois group to behave well.
Definition 11.3.1. Let F ⊆ K be a field extension and f ∈ F [t] be a polynomial. We
say that K is a splitting field of f over F if f splits into linear factors


n
f (t) = c (t − αi )
i=1

and that K = F [α1 , . . . , αn ].

Example 11.3.1. Suppose that F is a field of order q = pn , then Fp ⊆ F is a field


extension. Moreover, xq − x splits into linear factors in F , and every single element of F
is a root of xq − x. Hence F is a splitting field of xq − x over Fp .

√ extension Q ⊆ Q[t]/⟨t2 − 2⟩. Then Q[t]/⟨t − 2⟩ is


2 2
Example 11.3.2. √ Consider the
isomorphic to Q[ 2], and so Q[ 2] is a splitting field of t − 2 over Q. √
Conversely, consider the extension Q ⊆ Q[t]/⟨t 3
− 2⟩. Then Q[t]/⟨t 3
− 2⟩ ∼
= Q[ 3
2].
√ √ √ √
Then Q[ 2] is not a splitting field of t − 2 over Q since ω 2, ω 2 are not in Q[ 2].
3 3 3 2 3 3

In order√to make it a splitting field, we need to put ω in and enlarge the extension to
Q ⊆ Q[ 3 2, ω]. This also makes the√ degree of extension from 3 to 6.
In fact, we can show that Q[ √
3
2] is not a splitting field over Q, that is, there does
not exist f ∈ Q[t] such that Q[ 2] is a splitting field of f over Q. It will be somehow
3

annoying to show it here, so I will delay the proof until we have enough tools to prove it.

165
Hung-Hsun, Yu 11.4 Finite Field Part 2

This example tells us that using the construction that we have been used might not
give us a splitting field of f over F . That said, using the construction several times can
give us the desired extension.
Theorem 11.3.1. (Existence of splitting field) Let F be a field and f be a polynomial
in F [t]. Then there exists an extension K of F such that K is a splitting field of f over
F.

Sketch of Proof. We can prove this by induction on deg f . The statement clearly holds
when deg f = 1. Now suppose that the statement holds for every deg f < n, then when
deg f = n, we can first assume that f is irreducible. This is because if f = gh with
deg g, deg h < deg f , then we can first find a splitting field of g over K1 , and then find a
splitting field K2 of h over K1 . K2 will then be the splitting field of f over F .
Now if f is irreducible, then we can take K1 ∼ = F [t]/⟨f ⟩ as an extension of F . Let x
be the image of t in K1 , then we know that x is a root of f . Therefore f (t) splits into
(t − x)f ′ (t). Since deg f ′ = deg f − 1, we can take K2 such that it is a splitting field of
f ′ over K1 . Then K2 is a splitting field of f over F , as desired.

Corollary 11.3.1. If q is a power of a prime, then there exists a field of order q.

The splitting field is actually unique up to isomorphism.


Theorem 11.3.2. (Uniqueness of splitting field) Let F be a field and f be a polynomial
in F [t]. If K and K ′ are both splitting fields of f , then K and K ′ are isomorphic.

Sketch of Proof. Again, we can show this by induction. It clearly holds when deg f = 1.
Now suppose that it holds when deg f < n, then when deg f = n, we can still assume
that f is irreducible by the same argument above.
Since f splits into linear factors in K and K ′ , we can choose roots x ∈ K, x′ ∈ K ′
of f . Then the minimal polynomial of x and x′ is f since f is irreducible, and so
F [x] ∼= F [x′ ]. Let f (t) = (t−x)g(t) = (t−x′ )g ′ (t). It is easy to check that the coefficients
of g, g ′ are polynomials in x and x′ , respectively. Therefore g(t) ∈ F [x][t] is identified
to g ′ (t) ∈ F [x′ ][t] when identifying F [x] with F [x′ ]. Since deg g = deg g ′ = deg f − 1
and K, K ′ are splitting fields of g, g ′ over F [x], F [x′ ], respectively, we know by induction
hypothesis that K ∼ = K ′ , as desired.

Corollary 11.3.2. For every power of prime q, there is a unique field (up to isomor-
phism) of order q.

Definition 11.3.2. For every power of prime q, denote the unique field of order q by
Fq .

11.4 Finite Field Part 2


Now that we know there is a unique field Fq of order q for every prime power q, we can
continue to explore more properties of finite field. We can also understand the finite
extensions of finite field a lot better now.

166
Hung-Hsun, Yu 11.4 Finite Field Part 2

Property 11.4.1. Suppose that p is a prime and m, n are two positive integers. Then
there exists a field extension Fpm ⊆ Fpn if and only if m|n. Moreover, if m|n, then such
extension is unique.

Sketch of Proof. If there is a field extension Fpm ⊆ Fpn , suppose that the degree of the
extension is d. Then pn = |Fpn | = |Fpm |d = pmd , and so m|n.
Conversely, if m|n, then we know that the polynomial tp − t divides the polynomial
m

tp − t. Since tp − t splits into linear factors in Fpn , we know that tp − t also splits into
n n m

linear factors in Fpn , and so by Theorem 11.2.1 we know that there is a subset of order
pm in Fpn that forms a field. By the uniqueness we know that this field is Fpm . This
extension is also unique since elements in Fpm are the roots of the polynomial tp − t.
m

Corollary 11.4.1. Suppose that f is an irreducible polynomial in Fp [t] of degree m


and m|n, then f divides tp − t. Moreover, the polynomial tp − t is the product of all
n n

irreducible polynomials in Fp [t] that have degrees dividing n.

Sketch of Proof. We know that Fp [t]/⟨f ⟩ is a field of order pm , and so it is isomorphic to


Fpm . Since there is an extension Fpm ⊆ Fpn , we know that x is a root of tp − t where x is
n

the image of t in Fp [t]/⟨f ⟩. Note that the minimal polynomial of x is f , and so f |tp − t,
n

as desired.
Now to show that tp − t is the product of all irreducible polynomials in Fp [t] that
n


have degrees dividing n, we can group the product x∈Fpn (t − x) in a way that it becomes
the product of some irreducible polynomials (to achieve this, group the x’s that share
the same minimal polynomial). Then we know that tp − t is a product of some distinct
n

irreducible polynomials that have degree dividing n. By the previous part, every such
irreducible polynomial divides tp − t, and so we know that tp − t is in fact the product
n n

of all such irreducible polynomials.


Example 11.4.1. If we set p = 2, n = 2 in the corollary, then we get that

t4 − t = t(t − 1)(t2 + t + 1).

If we set p = 3, n = 2 in the corollary, then we get that

t9 − t = t(t − 1)(t + 1)(t2 + 1)(t2 + t + 1)(t2 − t + 1).

Property 11.4.2. For every prime power q, the multiplicative group F×


q of Fq is a cyclic
group.

Sketch of Proof. For every n ∈ N, let an be the number of nonzero elements with order n
in the multiplicative group. If an > 0 for some n, then by Lagrange’s theorem we know
that n|q − 1. Moreover, suppose that x is an element with order n, then x, x2 , . . . , xn = 1
are n elements that are the roots of the polynomial tn − t. Therefore, there are no other
elements that are roots of tn − t, which also means that there are no other elements of
order n. Since xi is of order n if and only if i, n are coprime, we know that an = φ(n) in
this case. Therefore an ≤ φ(n) in any case.
Since there are q − 1 nonzero elements, we have that

q−1 ∑ ∑
q−1= an = an ≤ φ(n) = q − 1,
n=1 n|q−1 n|q−1

167
Hung-Hsun, Yu 11.5 Algebraic Closure

and so each equaltiy must hold. This shows that aq−1 = φ(q − 1) > 0, and so there exists
an element that generates F×
q .

Definition 11.4.1. If x is an element in F× ×


q that generates Fq , then x is said to be a
primitive root in Fq .

Corollary 11.4.2. Every finite extension of finite fields is simple.

To find a primitive root is actually often a hard task in practice, but the existence
of a primitive root itself already is powerful in some situations. See problem set for an
example.
At this points, we can confidently say that we know enough about finite fields and
finite extensions of finite fields. Starting from the next section, we are going to explore
more properties of field extensions. Now that we have those examples of finite extensions
of finite field, feel free to check those properties on them to help understanding.

11.5 Algebraic Closure


Splitting field is a tool that we can use to include more algebraic elements. Intuitively, if
we continue to do this procedure until this procedure no longer gives us any new elements,
then we get a field extension such that every polynomial splits into linear factors.
Definition 11.5.1. If K is a field such that every polynomial in K[x] splits into linear
factors, or equivalently, every nonconstant polynomial has a root in K[x], then we say
that K is algebraically closed. If F is a field, K is an algebraic extenion of F that is
algebraically closed, then we say that K is the algebraic closure of F . In this case, K is
usually denoted as F̄ .

In this section, we will show that our intuition is correct: for every field F , there exists
a unique algebraic closure K of F up to isomorphism. The proof is not that helpful for
the coming up materials, so feel free to skip the proofs if you feel comfortable assuming
this fact.
Let’s first prove the existence. We will proceed in several steps.
Lemma 11.5.1. Let F be a field. Then there exists an extension K of F such that
every polynomial in F [t] has a root in K.

Sketch of Proof. Let S be the set of irreducible polynomials. For each f ∈ S, associate
a variable tf with it. Let R = F [{tf | f ∈ S}] be the polynomial ring in infinitely many
variables. Consider the ideal I generated by {f (tf ) | f ∈ S}. It is clear that I is not the
unit ideal, and so it is contained in some maximal ideal m of R. Let K = R/m be a field.
Then there is clearly a natural embedding F ⊆ K. Moreover, if we let xf be the image
of tf in K, then for each f ∈ S we have that f (xf ) = 0 since f (tf ) ∈ I ⊆ m.

Lemma 11.5.2. Let F be a field. Then there exists an extension K of F such that K
is algebraically closed.

Sketch of Proof. Let F0 = F . For each n, choose Fn+1 to be an extension Fn such that
every irreducible polynomial in Fn [t] has a root in Fn+1 . Let K be the union of F0 , F1 , . . ..

168
Hung-Hsun, Yu 11.5 Algebraic Closure

Now for every polynomial f ∈ K[t], suppose that



n
f (t) = ai t i
i=0

where ai ∈ K. Therefore for each i = 0, . . . , n there exists mi ∈ N such that ai ∈ Fmi .


Choose m sufficiently large such that ai ∈ Fm for every i = 0, . . . , n. Then f ∈ Fm [t],
and so f has a root in Fm+1 . We can then split f (t) into (t − α)g(t) where α ∈ Fm+1 , g ∈
Fm+1 [t] and deg g = deg f − 1. Therefore by induction we can show that f splits into
linear factors in Fm+deg f [t], which shows that f splits into linear factors in K[t].
Theorem 11.5.1. For every field F , its algebraic closure exists.

Sketch of Proof. We know that there is an extension K of F that is algebraically closed.


Let K ′ be the set of algebraic elements in K over F . Then K ′ is a field. Moreover, we
know that every polynomial f ∈ K ′ [t] splits into linear factors in K[t]. Suppose that

f (t) = c (t − α),
then we know that c ∈ K ′ by comparing the leading coefficient and α is algebraic over
K ′ . It remains to show that α ∈ K ′ , or equivalently α is algebraic over F . Suppose that
the coefficients of f are a0 , . . . , an , then F [a0 , . . . , an ] is a finite extension of F and α is
algebraic over F [a0 , . . . , an ]. Hence F [a0 , . . . , an , α] is a finite extension of F [a0 , . . . , an ],
which in turn shows that F [a0 , . . . , an , α] is a finite extension of F . As a consequence,
F [α] is a finite extension of F , which shows that α is algebraic over F .
Lastly, let’s show that the algebraic closure is unique up to isomorphism. We can
actually prove a stronger statement.
Theorem 11.5.2. Let F̄ be an algebraic closure of a field F , and let K be an algebraic
extension of F . Then there exists an embedding K → F̄ .

Sketch of Proof. The idea here is to define a partial embedding f : L → F̄ for each field
L ⊆ K that contains F , and show that we can extend this until we get an embedding
K → F̄ .
Let D be the set of pairs (L, f ) where L is a subfield of K and at the same time a
extension of F , and f is an embedding L → F̄ . Note that D is nonempty since we are
given an inclusion F ⊆ F̄ . Define a partial order ≤ on D so that (L1 , f1 ) ≤ (L2 , f2 ) if
and only if L1 ⊆ L2 and f2 |L1 = f1 . For every chain C in D, let

LC = L
(L,f )∈C

and define fC (x) = f (x) if x ∈ L and (L, f ) ∈ C. It is easy to check that fC is well-
defined and (LC , fC ) ∈ D. It is also clear that (LC , fC ) is an upper bound of C, and so
by Zorn’s lemma we can choose a maximal element (L, f ).
Now if L ̸= K, then we can choose x ∈ K\L. Since x is algebraic over F , we know
that x is algebraic over L. Also since f (L) contains f (F ) = F , we know that F̄ is also
an algebraic closure of f (L). Let g(t) be the minimal polynomial of x over L, then
L[x] ∼= L[t]/⟨g⟩. Let α ∈ F̄ be a root of f (g(t)) (this exists since F̄ is an algebraic closure
of f (L)), then we can extend f onto L[t]/⟨g⟩ by sending t to α. Therefore we can extend
f onto L[x], which is strictly larger than L. This is a contradiction with the maximality
of (L, f ), and so L = K.

169
Hung-Hsun, Yu 11.6 Automorphism

Corollary 11.5.1. If K1 , K2 are two algebraic closures of F , then K1 ∼


= K2 .

Sketch of Proof. By the theorem, we know that there is an embedding f : K1 → K2 that


fixes F . Now for every element x in K2 , we know that it is algebraic over F , and so it is
in f (K1 ) since f (K1 ) is an algebraic closure of f (F ) = F . Hence f (K1 ) = K2 , and so f
is an isomorphism between K1 and K2 .

11.6 Automorphism
The main object that Galois theory cares about for an extension is the automorphism
group. It might not be clear why it is useful now, but we will soon see its power once we
are able to state and prove the main theorem of Galois theory.
Definition 11.6.1. Suppose that K/F is a field extension. The automorphism group
Aut(K/F ) of the extension K/F is the subgroup of Aut(K) consisting of the automor-
phisms that fix F .

Property 11.6.1. Every automorphism of K/F is an F -linear operator on K.

Example 11.6.1. Consider the field extension C/R. If σ is an automorphism in


Aut(C/R), then for any a, b ∈ R, we have that

σ(a + bi) = a + bσ(i),

and so σ is determined by the value of σ(i). Now note that we can only take σ(i)2 = −1,
which shows that σ(i) = ±i. Therefore Aut(C/R) ∼ = C2 .

Example 11.6.2. Now consider √ the field extension Q[ 3
2]/Q. We know that for every
a, b, c ∈ Q and σ ∈ Aut(Q[ 3 2]/Q), we have
√ √3
√ √
σ(a + b 2 + c 22 ) = a + bσ( 2) + cσ( 2)2 ,
3 3 3


and so σ is uniquely determined by the value of σ( 3 2). Since
√ √
σ( 2)3 − 2 = σ( 23 − 2) = 0,
3 3

√ √ √
the value σ( 3 2) is forced to be 3 2. Therefore Aut(Q[ 3 2]/Q) is a trivial group.
One can see that the reason that the automorphism √ group is trivial is because that
the polynomial t − 2√only has one root
3
√ in Q[ 2]. To fix this, we can take √
3
the splitting
field of t3 − 2 over Q[ 3 2], which √ is Q[ 3
2, ω]. It is still clear that σ ∈ Aut(Q[ √
3
2, ω]/Q) is
determined
√ √ by 3
√ the value of σ( 2) and σ(ω). There are three choices for σ( 2), namely
3

3
2, ω 3 2, ω 2 3 2, and there are two choices for σ(ω), namely ω and ω 2 . Therefore there
are six choices in total, and with some √ hard work one can see that each of them indeed
induces an automorphism in Aut(Q[ √
3
2, ω]/Q).
Although we know | Aut(Q[ 3 2, ω]/Q)| = 6, we don’t quite √ know what √ the group struc-

2 3
ture is yet. Notice that σ is also
√ determined
√ by√ the values of σ( 3
2), σ(ω

3
2) and σ(ω 2),
2 3
and each
√ σ√ should √permute 3
2, ω 3
2 and ω 2. Therefore Aut(Q[ 3
2, ω]/Q) acts
√ on the
set { 3 2, ω 3 2, ω 2 3 2} faithfully, which induces a monomorphism √ from Aut(Q[ 3 2, ω]/Q)
to S3 . Since they have the same order, we know that Aut(Q[ 3 2, ω]/Q) ∼ = S3 .

170
Hung-Hsun, Yu 11.6 Automorphism

What this example tells us is that “splitting” is one of the conditions that the auto-
morphism group does not “degenerate.” This is not the only one condition, but from here
we should see that the conditions should be something about the roots of the polynomials.
The second condition is rather technical, so I will delay it until later.

What we learn from those two examples is that: (1) In a finite extension K/F ,
there are some elements x1 , . . . , xn such that the values of σ(x1 ), . . . , σ(xn ) can determine
σ ∈ Aut(K/F ); (2) The image σ(x) should share the same polynomial relations with x,
and in particular they should share the minimal polynomial; (3) The automorphism group
of finite extension has finite order. The third observation might be somewhat surprising
before we see the examples, but it should be clear now. We will spend the rest of this
section to state those observations formally.
Property 11.6.2. Suppose that K/F is an extension, and K is a finitely generated
F -algebra generated by x1 , . . . , xn . Then for every two elements σ, σ ′ ∈ Aut(K/F ), if
σ(xi ) = σ ′ (xi ) for all i = 1, . . . , n, then σ = σ ′ .

Sketch of Proof. That K is generated by x1 , . . . , xn as an F -algebra is saying that the


ring homomorphism
F [t1 , . . . , tn ] → K
fixing F and sending ti to xi is surjective. Therefore for every x ∈ K there exists
f ∈ F [t1 , . . . , tn ] such that f (x1 , . . . , xn ) = x.
Now suppose that σ(xi ) = σ ′ (xi ) holds for all i. Then

σ(x) = σ(f (x1 , . . . , xn )) = f (σ(x1 ), . . . , σ(xn ))

= f (σ ′ (x1 ), . . . , σ ′ (xn )) = σ ′ (f (x1 , . . . , xn )) = σ ′ (x),


which shows that σ = σ ′ .
Property 11.6.3. Suppose that K/F is an extension and σ is an automorphism of
K/F , then for any algebraic element x ∈ K we have that σ(x) and x share the same
minimal polynomial over F .

Sketch of Proof. Let f be the minimal polynomial of x over F and f ′ be the minimal
polynomial of σ(x) over F . Then

0 = σ(f (x)) = f (σ(x)),

which shows that f ′ |f . Conversely,

0 = σ −1 (f ′ (σ(x)) = f ′ (σ −1 ◦ σ(x)) = f ′ (x),

which shows that f |f ′ . Therefore f = f ′ .


Definition 11.6.2. Let K/F be an extension and x be an algebraic element in K. The
conjugate elements of x in K over F are the roots of the minimal polynomial of x over
F in K, or equivalently, the elements in K that share the same minimal polynomial over
F with x. If y is a conjugate element of x, then we say that x, y are conjugates over F .

Corollary 11.6.1. Let F [θ]/F be a finite simple extension. Then | Aut(F [θ]/F )| is the
number of conjugates of θ over F in F [θ].

171
Hung-Hsun, Yu 11.6 Automorphism

Sketch of Proof. We know that σ(θ) is a conjugate of θ, and once we know which conju-
gate it is, the automorphism σ is then determined. Therefore we just need to check that
for any conjugate θ′ of θ there is an automorphism σ such that σ(θ) = θ′ . Suppose that
the minimal polynomial of θ is f , then we know that

π : F [t]/⟨f ⟩ → F [θ]

fixing F and sending t to θ is an isomorphism. We also know that

π ′ : F [t]/⟨f ⟩ → F [θ]

fixing F and sending t to θ′ is an isomorphism: This is because that π ′ is an isomorphism


from F [t]/⟨f ⟩ to F [θ′ ] and [F [θ′ ] : F ] = deg f , which shows that
[F [θ] : F ]
[F [θ] : F [θ′ ]] = = 1.
[F [θ′ ] : F ]
Now we can take σ = π ′ ◦ π −1 .
Remark. This result is somehow surprising in the sense that sending θ to its conjugates
guarantees that all the other elements are also sent to their conjugates. Later, we will
see a stronger result of this, which tells us (in a certain scenario) how to reconstruct
the minimal polynomial of other elements once we know the minimal polynomial of an
element.

Corollary 11.6.2. If K/F is a finite extension, then there are only finitely many au-
tomorphisms of K/F .

Sketch of Proof. Choose a basis x1 , . . . , xn in K over F . Then x1 , . . . , xn generate K as


an F -algebra, and so the automorphism σ is determined by the values σ(xi ). Since σ(xi )
can only be the roots of the minimal polynomial of xi , there are only finitely many choices
for each σ(xi ). Therefore there are only finitely many possibilities for σ.
In fact, this result can be refined by a lot. To refine this result requires some clever
tricks.
Lemma 11.6.1. Let σ1 , . . . , σn be distinct automorphisms of K/F . Then σ1 , . . . , σn
are linearly independent over K in the sense that if c1 , . . . , cn ∈ K satisfies that

c1 σ1 (x) + · · · + cn σn (x) = 0

for all x ∈ K, then c1 = · · · = cn = 0.

Sketch of Proof. Suppose for the sake of contradiction that they are not linearly inde-
pendent, then we can find a minimal linearly dependent set σ1 , . . . , σn such that

c1 σ1 (x) + . . . + cn σn (x) = 0

for all x ∈ K for some c1 , . . . , cn ∈ K. By the minimality we know that c1 , . . . , cn are


nonzero. It is also clear that n ̸= 1, and so σn ̸= σ1 . Therefore there exists y ∈ K such
that σn (y) ̸= σ1 (y). Now, we can use this to write down two different equations:

c1 σ1 (y)σ1 (x) + . . . + cn σn (y)σn (x) = c1 σ1 (xy) + . . . + cn σn (xy) = 0,

172
Hung-Hsun, Yu 11.7 Fixed Field

c1 σn (y)σ1 (x) + . . . + cn σn (y)σn (x) = σn (y) (c1 σ1 (x) + · · · + cn σn (x)) = 0.


Taking the difference of the two equations, we get that

c1 (σn (y) − σ1 (y)) σ1 (x) + · · · + cn−1 (σn (y) − σn−1 (y)) σn−1 (x) = 0.

Since c1 , σn (y) − σ1 (y) ̸= 0, we know that σ1 , . . . , σn−1 are linearly dependent, which
contradicts the minimality of σ1 , . . . , σn .

Theorem 11.6.1. Suppose that there are n distinct automorphisms in Aut(K/F )


where K/F is an extension, then [K : F ] ≥ n.

Sketch of Proof. If m = [K : F ] < n, let x1 , . . . , xm be a basis of K over F . Consider the


system of linear equations

σ1 (xi )y1 + · · · + σn (xi )yn = 0

for i = 1, . . . , m. Since y1 = · · · = yn = 0 is apparently a solution and there are more


variables than constraints, there is a solution y1 , . . . , yn that are not all zero. Now for
every x ∈ K, we know that it can be represented as x = a1 x1 + · · · + am xm for ai ∈ F .
This shows that
(m )

n ∑
n ∑ ∑
m ∑
n
yj σj (x) = yj σ j ai x i = ai yj σj (xi ) = 0,
j=1 j=1 i=1 i=1 j=1

which shows that σ1 , . . . , σn are linearly dependent. This contradicts the lemma, and so
m < n is impossible.

11.7 Fixed Field


In the previous section, we see that given a field extension, we can get a subgroup of
automorphism group of the larger field. Conversely, if we have a subgroup of the auto-
morphism group, we can construct a field extension.
Definition 11.7.1. Let K be a field and let G be a subgroup of Aut(K). Then the set
F of elements fixed by all elements in G forms a field and is called the fixed field of G.
This is sometime denoted as K G .

Example 11.7.1. Take our good old field extension C/R. Let G be the group Aut(C/R).
Then CG is R since the automorphisms in G is identity and conjugation. In this case,
taking the automorphism group and then taking the fixed field gives us the original field.
√ √
Example 11.7.2. Now consider again the field extenion Q[ 3 2]/Q. Since Aut(Q[ 3 2]/Q)
√ √
3 √
is trivial, we know that Q[ 3 2]Aut(Q[ 2]/Q) is Q[ 3 2]. In this case, taking the automorphism
group and then taking the fixed field gives us a strictly larger √ group.
Again, we can consider instead the field extension Q[ 3 2, ω]/Q.√ Let G ∼ = S3 be
the automorphism group of this extension. Then it is clear that Q[ 2, ω] = Q. We
3 G

can say even more about this field extension. Let H be √ the subgroup
√ of G containing
the√identity√map and the automorphism σ such that σ( 2) = 2, σ(ω) = ω 2 . Then
3 3

Q[ 3 2] = Q[ 3 2, ω]H . This gives us back the original field extension we are working with,

173
Hung-Hsun, Yu 11.7 Fixed Field

which somehow tells us that extending the field extension further until it is splitting is
the right thing to do. √ ′
If we take instead the subgroup H ′ that fixes ω, then we can see that Q[ 3 2, ω]H =
Q[ω].

In the above example, we can see that somehow “taking the fixed field” and “tak-
ing the automorphism group” build up a correspondence between the subgroups of the
automorphism group and the fields that lie between.
Definition 11.7.2. Let F ⊆ K be a field extension. We say that L is an intermediate
field if L is a field such that F ⊆ L ⊆ K.

Property 11.7.1. Given a field extension F ⊆ K. Let SF be the set of intermediate


fields and let SG be the set of subgroups of Aut(K/F ). Then we can construct a map
G : SF → SG such that for every L ∈ SF we take G(L) = Aut(K/L). We can also
construct a map F : SG → SF such that for every H ∈ SG we take F(H) = K H .

Sketch of Proof. We just need to show that the two maps are well-defined. For every
L ∈ SF , we need to show that Aut(K/L) is a subgroup of Aut(K/F ). Note that since
F ⊆ L, every element in Aut(K/L) fixed F and thus is in Aut(K/F ). We also need to
show that for every H ∈ SG , the fixed field K H is an intermediate field. It is certainly a
subfield of K, and so we just need to show that it contains F . Since H is a subgroup of
G, H fixes all the elements in F and so F ⊆ K H , as desired.

Property 11.7.2. The maps F, G constructed above are order-reversing, i.e. if L ⊆ L′


then G(L) ⊇ G(L), and if H ⊆ H ′ then F(H) ⊇ F (H ′ ). Moreover, for every intermediate
field L, the fixed field F(G(L)) contains L, and for every subgroup H of Aut(K/F ), the
subgroup G(F(H)) contains H.

Corollary 11.7.1. F ◦ G ◦ F = F, G ◦ F ◦ G = G.

Sketch of Proof. For every subgroup H, we know by the property that F(G(F(H))) con-
tains F(H). On the other hand, the property states that G(F(H)) contains H, and so
F(G(F(H))) ⊆ H because F is order-reversing. Similary argument works for G.

Corollary 11.7.2. F and G form a bijection between intermediate fields that are fixed
fields of some subgroups, and the subgroups that are automorphism groups of some
intermediate fields.

Therefore in order to get this good correspondence, we will want F to be a fixed field
of K. This is where the focus of Galois theory is.
Definition 11.7.3. Let F ⊆ K be a field extension. If F is the fixed field of Aut(K/F ),
and the extension is algebraic, then we say that this extension is Galois. In this case, we
usually write Gal(K/F ) instead of Aut(K/F ), and we call Gal(K/F ) the Galois group
of the extension. Moreover, if K is the splitting field of f over F , then we also call
Gal(K/F ) the Galois group of f .

In this case, the bound in Theorem 11.6.1 is actually sharp.

174
Hung-Hsun, Yu 11.7 Fixed Field

Theorem 11.7.1. If K/F is an extension where F is the fixed field of a finite auto-
morphism group G, then [K : F ] = |G|.

Sketch of Proof. By Theorem 11.6.1, it suffices to show that [K : F ] ≤ |G|. Otherwise,


if n = [K : F ] > |G| = m, let x1 , . . . , xn be a basis of K over F and σ1 , . . . , σm be the
automorphisms in Gal(K/F ). Consider the system of linear equations

σi (x1 )y1 + · · · + σi (xn )yn = 0

for every i = 1, . . . , m. Then again, since y1 = . . . = yn = 0 is a solution and there


are more variables than the conditions, there is a solution y1 , . . . , yn that is not all zero.
Choose a nonzero solution that contains the least nonzero terms, and WLOG suppose
that y1 , . . . , yr are the nonzero terms. We can WLOG assume that yr = 1. This reduces
the equation to
σi (x1 )y1 + · · · + σi (xr−1 )yr−1 + σi (xr ) = 0.
Since x1 , . . . , xn are linearly independent, y1 , . . . , yr cannot all lie in F . Suppose WLOG
that y1 ̸∈ F . Since F is the fixed field and y1 is not in it, we know that there exists σj
such that yσj (y1 ) ̸= y1 . Therefore
Ä ä
0 = σj σj−1 σi (x1 )y1 + · · · + σj−1 σi (xr ) = σi (x1 )σj (y1 ) + · · · + σi (xr−1 )σj (yj−1 ) + σi (xr ),

and so we have

σi (x1 )[σj (y1 ) − y1 ] + · · · + σi (xr−1 )[σj (yr−1 ) − yr−1 ] = 0,

which contradicts the minimality of y1 , . . . , yn since σj (y1 ) − y1 ̸= 0.

Corollary 11.7.3. If K/F is a finite Galois extension, then [K : F ] = | Gal(K/F )|.

Corollary 11.7.4. If K/F is a finite extension, then G ◦ F is the identity map on SG ,


and so F is injective, G is surjective.

Sketch of Proof. For any subgroup H of Aut(K/F ), we know that [K : K H ] = |H|. We


also have that [K : K H ] = | Aut(K/K H )| since K Aut(K/K ) = F(G(F(H))) = F(H) =
H

K H . Therefore H ⊆ Aut(K/K H ) and |H| = | Aut(K/K H )|, which indicates that H =


Aut(K/K H ).

Example 11.7.3. The extension C/Q is Galois. It has degree 2, and its Galois group
has order 2. √
The extension Q[ 3 2]/Q is not Galois. It has degree 3, while its group of automor-
phisms has order 1. √
The extension Q[ 3 2, ω]/Q is Galois. Its Galois group has order 6, so the theorem
implies
√ that it has degree 6. It is also not hard to show without the theorem that
3
[Q[ 2 : ω]/Q] = 6.

Lastly, let’s validate our claim that splitting field is a desirable property.
Property 11.7.3. If K/F is a finite Galois extension, then it is splitting.

175
Hung-Hsun, Yu 11.8 Separability

Sketch of Proof. It suffices to show that every element has a splitting minimal polynomial
over F in K (this is not that clear and is left as an exercise). Now for every x ∈ K we
can consider the polynomial ∏
(t − σ(x)).
σ∈Gal(K/F )

This polynomial is fixed by all the elements in Gal(K/F ), and so its coefficients are
fixed too. Since F is the fixed field, the coefficients lie in F , which shows that there is
a splitting polynomial that has x has a root. Since the minimal polynomial divides that
polynomial, the minimal polynomial also splits into linear factors in K, as desired.

11.8 Separability
We just proved that finite Galois implies splitting. In fact, it also implies that the
extension is “separable.” Before proving this, let’s first define what separable actually
means.
Definition 11.8.1. Let K/F be a field extension. A polynomial f over F is separable
if it has no double roots in its splitting field. An element x ∈ K is separable over F if its
minimal polynomial is separable. The field extension K/F is separable if every element
in K is separable over F .

It is quite annoying to work with splitting field when we want to verify if an element
is separable or not. The following property allows us to only work in the base field to
verify the separability of an element.
Property 11.8.1. Let K/F be a field extension, x be an algebraic element in K with
the minimal polynomial f . Then x is separable over F if and only if the formal derivative
of f is zero. Here, the formal derivative is simply taking the derivative by the usual
formula for taking derivatives of polynomials. Namely,
d
(an tn + · · · + a1 t + a0 ) = nan tn−1 + · · · + 2a2 t + a1 .
dt

Sketch of Proof. One can verify that, as usual, a polynomial f that splits into linear
factors has double roots if and only if gcd(f, f ′ ) ̸= 1. Now, since f is the minimal
polynomial of x over F , we know that f is irreducible. Therefore f has double roots
⇔ gcd(f, f ′ ) ̸= 1 ⇔ f |f ′ ⇔ f ′ = 0, as desired.
Corollary 11.8.1. Every field extension with characteristic zero is separable.

Corollary 11.8.2. Every finite extension of a finite field is separable.

Sketch of Proof. Let p be the characteristic. We can first prove that Fq is separable over
Fp for every power q of p. Suppose that f ∈ Fp [t] is a polynomial whose formal derivative
is 0, then we can write f as
an tpn + an−1 tp(n−1) + · · · + a1 tp + a0
for some a0 , . . . , an ∈ Fp . Therefore we have that
f (t) = (an tn + · · · + a0 )p ,

176
Hung-Hsun, Yu 11.8 Separability

which shows that f is not irreducible. As a consequence, Fq is separable over Fp . Then


the corollary follows immediately.

The two corollaries show that all the field extensions mentioned above are separable.
This somehow shows that “reasonable” field extensions are separable. However, there are
still extensions that are not separable.
Example 11.8.1. Consider the field extension Fp (x)/Fp (xp ). Then the element x is not
separable over Fp (xp ): the minimal polynomial of x is tp − xp , whose formal derivative is
ptp−1 = 0 (because the characteristic is p).

Extensions that are not separable have a lot of undesired properties, so usually we will
focus on separable extensions. There sure are theories developed for those inseparable
extensions, but they are far beyond the scope of this note.
Now we are ready to prove that finite Galois implies separable.
Theorem 11.8.1. If K/F is a finite Galois extension, then it is separable.

Sketch of Proof. For simplicity, let G = Aut(K/F ). For any element x, let Ox be the
orbit of x with respect to the group action G on K. Then the polynomial

f (t) = (t − y)
y∈Ox

is fixed by the group action G. Since F is the fixed field by G, we know that f (t) ∈ F [t].
We can then conclude that f is the minimal polynomial of x. Since every element in Ox
differs from one another, we have that x is separable over F . Thus, the extension K/F
is separable.

Corollary 11.8.3. Let K/F be a finite Galois extension. Then for every x ∈ K, the
polynomial ∏
(t − σ(x))
σ∈Gal(K/F )

is a power of the minimal polynomial of x over F .

Sketch of Proof. By the proof above, we know that the minimal polynomial is

f (t) = (t − y),
y∈Ox

and so
∏ | Gal(K/F )|
(t − σ(x)) = f (t) |Ox | .
σ∈Gal(K/F )

Finite Galois implies both splitting and separable. The converse is also true: splitting
and separable imply finite Galois.
Lemma 11.8.1. (Transitivity of auotomorphism) Let K be a splitting field of F . Then
for any elements x, y ∈ K that are conjugate with each other over F , there exists an
automorphism σ ∈ Aut(K/F ) such that σ(x) = y.

177
Hung-Hsun, Yu 11.9 Fundamental Theorem of Galois Theory

Sketch of Proof. Suppose that K is the splitting field of f . Since x, y are conjugates,
the two fields F [x] and F [y] are isomorphic. Note that K are the splitting fields of f
over F [x] and F [y], and f is preserved under the isomorphism between F [x] and F [y].
Therefore by the uniqueness of splitting field, the isomorphism between F [x] and F [y]
induces an automorphism of K fixing F and sending x to y.
Theorem 11.8.2. Let K/F be a field extension. Then it is finite and Galois if and
only if it is a splitting field of a separable polynomial f .

Sketch of Proof. If K/F is finite and Galois, then we know that it is separable and K
is also a splitting field of some polynomial f over F . Since K/F is separable, we can
replace f by a separable polynomial so that K is still its splitting field, as desired.
Conversely, if K/F is a splitting field of a separable polynomial f , then we can
induct by the degree of f . It clearly holds when deg f = 1, so let’s assume that it
holds whenever deg f < n. Then when deg f = n, we can factor f (t) into irreducibles
f1 (t), . . . , fm (t). Let α1 , . . . , αs be the roots of deg f1 (t). Then K is the splitting field
of f (t)/(t − αi ) over F [αi ] for every αi , and so by the induction hypothesis K/F [αi ] is
Galois. In other words, for every element that is not in F [αi ] there is an automorphism
in Aut(K/F [αi ]) ⊆ Aut(K/F ) that moves it. Now let x be an element that is fixed by
every automorphism in Aut(K/F ), then we know that x ∈ F [α1 ]. As a consequence there
exist c0 , . . . , cs−1 ∈ F such that

x = c0 + c1 α1 + · · · + cs−1 α1s−1 .

By the transitivity, we can choose an automorphism σi ∈ Aut(K/F ) such that σi (α1 ) =


αi . Then.
x = σ(x) = c0 + c1 αi + · · · + cs−1 αis−1 .
Since α1 , . . . , αs are s different elements, the above identity implies that c1 = · · · = cs−1 =
0, and so x = c0 ∈ F , as desired.

Example 11.8.2. Now let’s get back to the extension Q[ 3 2]/Q. Since the characteristic √
is zero, it is separable.√Now if there exists a polynomial f over Q such that Q[√3 2] is its
splitting field, then Q[ 3 2]/Q is Galois, which is a contradiction. Therefore Q[ 3 2] is not
the splitting field of any polynomial over Q, as asserted before.
In fact, there is another way that does not require the separability to verify if a finite
extension is a splitting field over the base field or not. See the exercise for it.

11.9 Fundamental Theorem of Galois Theory


Now everything is set up, we are ready to state and prove the fundamental theorem of
Galois theory. Basically, it states that for finite Galois extensions, the intermediate fields
are in bijection with the subgroups of the Galois group.
Theorem 11.9.1. (Fundamental theorem of Galois theory) Let K/F be a finite Ga-
lois extension, SF be the set of intermediate fields and SG be the set of subgroups of
Gal(K/F ). Moreover, let G : SF → SG be the map sending the intermediate field L to
Aut(K/L), and let F : SG → SF be the map sending the subgroup G to the fixed group
K G . Then F, G are inverse of each other. Moreover, for every intermediate field L we
have that [K : L] = |G(L)|.

178
Hung-Hsun, Yu 11.9 Fundamental Theorem of Galois Theory

Sketch of Proof. We already know that G ◦ F is the identity map on SG , and so we just
need to show that F ◦ G is the identity map on SF . Since K/F is finite Galois, we
know that there is a separable polynomial f over F such that K is the splitting field of
f over F . Therefore for every intermediate field L we also have that K is the splitting
field of f over L. Thus, K/L is also finite Galois, which shows that K Gal(K/L) = L and
[K : L] = | Gal(K/L)|, as desired.

This is a really powerful tool for enumerating intermediate fields because we usually
have a better understanding of the Galois group. The following are two examples of how
to make use of this statement.
√ √
Example 11.9.1. Consider the extension √ Q[
√ 2, √3]/Q. Then √ its automorphism group
consists of elements σ such that σ( 2) = ± 2, σ( 3) = ± 3. Therefore its automor-
phism group G is ismorphic to K4 . Moreover, it is √ clear that Q √
is its fixed
√ field, and √ so this
is a finite Galois extension. For simplicity, let σij ( 2) = (−1)i 2, σij ( 3) = (−1)j 3 for
i, j = 0, 1. Then the intermediate fields are the fixed fields of {σ00 }, {σ00 , σ10 }, {σ00 , σ01 },
{σ00 , σ11 }, G. √ √
The fixed fields of {σ00 }, G are clearly Q[ 2, 3] and Q,√ respectively.
√ So let’s
√ √ calculate

the three remaining fixed fields. Choose a Q-basis of Q[ 2, 3],√say {1, √ 2, √ 6}.
3,
Then the fixed field of {σ00 , σ10 } should contain the elements a + b 2 + c 3 + d 6 such
that
√ √ √ √ √ √ √ √ √
a + b 2 + c 3 + d 6 = σ10 (a + b 2 + c 3 + d 6) = a − b 2 + c 3 − d 6,

which shows that b = d = 0. Therefore the fixed field of {σ00 , σ10 } is√Q[ 3]. Similarly, √
one can calculate and obtain √that √ the other√two√fixed fields are Q[ 2] and Q[ 6]. A
consequence of this is that Q[ 2 + 3] = Q[ 2, 3].
Before, it would be hard to persuade one that there are only finitely many intermediate
fields. Now with this tool, we can even list all the intermediate fields.

Example 11.9.2. Now √ consider the extenion Q[ 4 2]/Q. This is not Galois since √ the
minimal polynomial
√ of 2 is t − 2, which does not split completely because ±i√ 2 is
4 4 4

not in Q[ 4 2]. To fix this, let’s take the splitting field of t4 − 2 over Q, which is Q[ 4 2, i].
This is finite Galois since t4 − 2 is automatically separable (note that we are working √ with
fields of characteristic 0). It is also easy to calculate that the Galois group of Q[ 2, i]/Q
4

consists of the element σ such that


√ √
σ( 2) = ia 2, σ(i) = (−1)b i
4 4


for a = 0, 1, 2, 3 and b = 0, 1. Denote such σ by σab . Then we can see that Gal(Q[ 4 2, i]/Q) ∼ =
D4 , where we can see σa0 as rotations and σ
√ a1 as reflections.
To find the √ intermediate fields of Q[ 4 2]/Q, it√is the same as finding the intermedi-
ate fields of Q[ 4 2, i]/Q that are contained in Q[ 4 2]. By the fundamental
√ theorem of
Galois √ theory, this √
4
is the same as finding the subgroups of Gal(Q[ 2, i]/Q) that contain
Gal(Q[ 2, i]/Q[ 2]) = {σ00 , σ01 }. There are three such subgroups: {σ00 , σ01 },
4 4

{σ00 , σ01 , σ20 , σ21 } and the entire group. Hence the only nontrivial√ intermediate field is the
fixed field of {σ00 , σ01 , σ20 , σ21 }, which can √ be Q[ √
be verified to √ 2] by some calculations.
In conclusion, the intermediate fields of Q[ 4 2]/Q are Q[ 4 2], Q[ 2] and Q.
What this example demonstrates is that if we are not in a splitting field, then we can
take a suitable extension to make it splitting. If the original extension happens to be

179
Hung-Hsun, Yu 11.10 More on Separability

separable, then the result will be Galois, and then we can apply the fundamental theorem
of Galois theory to get what we want.

If K/F is finite Galois, then we know that K/L is also finite Galois for every interme-
diate field L. On the other hand, L/F is not necessarily Galois, as the previous example
demonstrates. The fundamental theorem of Galois theory also provides a way to see if
L/F is Galois by checking some conditions in the Galois group.
Theorem 11.9.2. (Fundamental theorem of Galois theory Part 2) Let K/F be a finite
Galois extension, and let L be an intermediate field. Then the following are equivalent:

1. L/F is Galois;

2. Gal(K/L) is a normal subgroup of Gal(K/F );

3. σ(L) = L for every σ ∈ Gal(K/F ).

If any of the above holds, then Gal(K/F )/ Gal(K/L) ∼


= Gal(L/F ) in a natural way.

Sketch of Proof. 1. ⇒ 2.: By the uniqueness of splitting field, we can extend any au-
tomorphism of L/F to an automorphism of K/F . Since L/F is finite Galois, there are
[L : F ] automorphisms of L/F . Since [K : L][L : F ] = [K : F ] and | Gal(K/F )| =
[K : F ], | Gal(K/L)| = [K : L], we know that every coset of Gal(K/L) in Gal(K/F ) can
be obtained by extending an automorphism of L/F to one of K/F . This then gives a
compatible group structure of the cosets of Gal(K/L), which implies that Gal(K/L) is a
normal subgroup of Gal(K/F ).
2. ⇒ 3.: For every σ ∈ Gal(K/F ), we can see that σ Gal(K/L)σ −1 = Gal(K/σ(L)).
Therefore σ(L) = L because σ Gal(K/L)σ −1 = Gal(K/L) by the normality of Gal(K/L).
3. ⇒ 1.: The Galois group Gal(K/F ) induces an automorphism group of L since
σ(L) = L for every σ ∈ Gal(K/F ). The fixed field of this automorphism group is F , and
so L/F is Galois.
Now if one of them holds, then by the proof of the implication 1. ⇒ 2. we know that
Gal(K/F )/ Gal(K/L) ∼ = Gal(L/F ) in a natural way.

Example 11.9.3. Consider the extension Q[ 4 2, i]/Q again. Then its Galois group is
isomorphic to D4 , and the normal subgroups are D4 , {1}, C4 and K4 . Therefore
√ the inter-

mediate fields L such that L/Q is Galois are their fixed fields, namely Q[ 2, i], Q[i], Q[ 2]
4

and Q.

11.10 More on Separability


In this section, we will introduce some more results about
√ separable
√ extensions.
√ √ The first
is already implicitly mentioned above: we know that Q[ 2, 3] = Q[ 2 + 3] is a simple
extension by examining the intermediate fields. We can generalize this and prove that
every finite separable extension is simple.
Theorem 11.10.1. (Artin’s primitive element theorem) A finite extension is simple if
and only if there are only finitely many intermediate fields.

180
Hung-Hsun, Yu 11.10 More on Separability

Sketch of Proof. Suppose first that we have a finite extension F [θ]/F , where the minimal
polynomial of θ over F is f . For every intermediate field L, let g be the minimal polyno-
mial of θ over L. Then clearly g divides f . Assume that g(t) = tn + an−1 tn−1 + · · · + a0 ,
then we can show that L = F [a0 , . . . , an−1 ]. Clearly F [a0 , . . . , an−1 ] ⊆ L because ai ∈ L.
Moreover, [F [θ] : F [a0 , . . . , an−1 ]] ≤ n because g(θ) = 0. Since [F [θ] : L] = n, we have
that L = F [a0 , . . . , an−1 ]. As a consequence, there is at most one intermediate field for
every divisor of f . Since there are only finitely many divisors of f , we are done.
Conversely, if K/F is a finite extension with finitely many intermediate fields, we can
assume that F is infinite since finite extensions of finite field are always simple. Choose
θ ∈ K such that [F [θ] : F ] is the largest. If F [θ] ̸= K, then there exists x ∈ K\F [θ].
Consider the intermediate fields F [θ+cx] as c runs through the elements in F . Since there
are only finitely many intermediate fields, there exist c, c′ such that F [θ +cx] = F [θ +c′ x].
In particular, θ+cx, θ+c′ x ∈ F [θ+cx], which shows that (c−c′ )x ∈ F [θ+cx]. Multiplying
(c−c′ )−1 , we get that x ∈ F [θ+cx], and so θ ∈ F [θ+cx]. This implies that F [θ] ⊊ F [θ+cx]
(because x ∈ F [θ + cx]\F [θ]), which contradicts the choice of θ. Therefore F [θ] = K, as
desired.
Corollary 11.10.1. (Another version of primitive element theorem) Every finite sepa-
rable extension is simple.

Sketch of Proof. It suffices to show that if K/F is finite and separable, then there are
only finitely many intermediate fields. Suppose that a1 , . . . , an are elements in K such
that F [a1 , . . . , an ] = K and fi is the minimal polynomial of ai for any i = 1, . . . , n. Take
the splitting field E of f1 f2 . . . fn , then since K/F is separable we have that E/F is also
separable. As a consequence, E/F is finite Galois, and so the intermediate fields of K/F
correspond to the subgroups of Gal(E/F ) that contain Gal(E/K). Since Gal(E/F )
is finite, there are only finitely many subgroups, and so there are only finitely many
intermediate fields, as desired.
Next, we will show some criteria for an extension to be separable. The case where the characteristic is 0 is trivial, so we will assume that the characteristic is p in the remainder of this section.
Lemma 11.10.1. Let K/F be an extension of characteristic p, and let x be an algebraic
element over F in K. Then x is separable over F if and only if F [x] = F [xp ].

Sketch of Proof. If x is separable over F , then it is also separable over F [xp ]. Let g be
the minimal polynomial of x over F[x^p], then g(t) | t^p − x^p = (t − x)^p. Since x is separable,
we know that g is separable, and so g(t) = t − x. This shows that x ∈ F [xp ], and so
F [x] ⊆ F [xp ].
Conversely, if F [x] = F [xp ] but x is not separable, let f be the minimal polynomial of
x over F . Then we know that f ′ = 0, which shows that f (t) = h(tp ) for some polynomial
h over F . Since f is irreducible, we know that h is irreducible. Since x is a root of f , we
know that xp is a root of h. This implies that h is the minimal polynomial of xp over F ,
and so
deg f = [F [x] : F ] = [F [xp ] : F ] = deg h,
which is absurd since deg f = p deg h.
Lemma 11.10.2. Let K/F be a finite extension of characteristic p. Then the following
are equivalent:

181
Hung-Hsun, Yu 11.10 More on Separability

1. there exists a basis {v_1, . . . , v_n} of K over F such that {v_1^p, . . . , v_n^p} is also a basis;
2. for every basis {v_1, . . . , v_n} of K over F, {v_1^p, . . . , v_n^p} is also a basis;
3. K/F is separable.

Sketch of Proof. 1. ⇒ 2.: For every basis {u_1, . . . , u_n}, it suffices to prove that {u_1^p, . . . , u_n^p} spans K over F. It then suffices to show that each v_i^p lies in the span of {u_1^p, . . . , u_n^p} for i = 1, . . . , n. Since {u_1, . . . , u_n} is a basis, we can write

v_i = c_1 u_1 + · · · + c_n u_n,

and so

v_i^p = (c_1 u_1 + · · · + c_n u_n)^p = c_1^p u_1^p + · · · + c_n^p u_n^p,

as desired.
2. ⇒ 3.: For any element x ∈ K, suppose that s = [F[x] : F]. Then {1, x, . . . , x^{s-1}} is linearly independent over F. By the replacement lemma, we can extend this to a basis {1, x, . . . , x^{s-1}, u_{s+1}, . . . , u_n} of K over F. Then {1, x^p, . . . , x^{p(s-1)}, u_{s+1}^p, . . . , u_n^p} is also a basis, and so {1, x^p, . . . , x^{p(s-1)}} is also linearly independent. As a consequence, [F[x^p] : F] ≥ [F[x] : F]. Since F[x^p] ⊆ F[x], we know that F[x^p] = F[x], and so x is separable. We have proved that every element is separable, and so K/F is separable.
3. ⇒ 1.: Let {v_1, . . . , v_n} be a basis of K over F, and let L be the vector space spanned by {v_1^p, . . . , v_n^p}. We can first prove that in fact L = F[v_1^p, . . . , v_n^p]. Since {v_1, . . . , v_n} is a basis, we know that for every i, j there exist c_{ijk} ∈ F such that

v_i v_j = Σ_k c_{ijk} v_k.

Therefore

v_i^p v_j^p = (Σ_k c_{ijk} v_k)^p = Σ_k c_{ijk}^p v_k^p,

as desired. This then shows that L is an intermediate field. Now if a ∈ K\L, then by the same trick we can show that a^p ∈ L. Since a is separable, we know that F[a] = F[a^p]. This is a contradiction since a ∈ F[a] = F[a^p] ⊆ L. Therefore L = K, as desired.
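The step (c_1u_1 + · · · + c_nu_n)^p = c_1^pu_1^p + · · · + c_n^pu_n^p used twice above is the "freshman's dream" in characteristic p. A minimal sanity check in Python (standard library only), using the fact that p divides the binomial coefficients C(p, k) for 0 < k < p:

    from math import comb

    p = 7
    # all middle binomial coefficients vanish mod p:
    print(all(comb(p, k) % p == 0 for k in range(1, p)))   # True
    # hence (a + b)^p = a^p + b^p holds mod p for all integers a, b:
    a, b = 12, 35
    print((a + b)**p % p == (a**p + b**p) % p)             # True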
With this, we can now give two really useful criteria for separability.
Theorem 11.10.2. Let α be an algebraic element over a field F . If α is separable over
F , then F [α] is also separable over F .

Sketch of Proof. Consider the case where the characteristic is p. By Lemma 11.10.1 we know that F[α] = F[α^p], and so, writing n = [F[α] : F], the sets {1, α, . . . , α^{n-1}} and {1, α^p, . . . , α^{p(n-1)}} are both bases of F[α] over F. Hence F[α] is separable over F by Lemma 11.10.2.
Theorem 11.10.3. Let K/F be a finite extension and L be an intermediate field. Then
K/F is separable if and only if K/L, L/F are both separable.

Sketch of Proof. Consider the case where the characteristic is p. If K/F is separable, then clearly K/L and L/F are both separable. Conversely, if K/L and L/F are separable, then there is an L-basis {a_1, . . . , a_n} of K such that {a_1^p, . . . , a_n^p} is also an L-basis. There is also an F-basis {b_1, . . . , b_m} of L such that {b_1^p, . . . , b_m^p} is also an F-basis of L. Then {a_1b_1, . . . , a_nb_m} and {a_1^pb_1^p, . . . , a_n^pb_m^p} are both F-bases of K, and so K/F is separable, as desired.

182
Hung-Hsun, Yu 11.11 Random Problem Set

Corollary 11.10.2. Let α_1, . . . , α_n be algebraic elements over a field F. Then F[α_1, . . . , α_n] is separable over F if and only if α_1, . . . , α_n are all separable over F.

To conclude this section (and also this chapter), let's introduce perfect fields as a perfect ending :).
Definition 11.10.1. A field is a perfect field if every finite extension of it is separable.

Example 11.10.1. Since every finite field extension of characteristic 0 is separable, we know that every field with characteristic 0 is perfect. In addition, every finite field is perfect.

Theorem 11.10.4. A field F of characteristic p is perfect if and only if for every x ∈ F, there exists y ∈ F such that y^p = x.

Sketch of Proof. If F is perfect, then for every x ∈ F , consider the splitting field K of
t^p − x over F and suppose that y is a root of t^p − x. Let f be the minimal polynomial of y, then f(t) | t^p − x = t^p − y^p = (t − y)^p. Since F is perfect, K is separable over F and so
f is separable over F . This then implies that f (t) = t − y, and so y ∈ F , as desired.
Conversely, if for every x ∈ F there exists y ∈ F such that y^p = x, then for every finite extension K of F and every element a ∈ K, assume that its minimal polynomial over F is f. If f'(t) = 0, we can suppose that

f(t) = t^{pn} + Σ_{i=0}^{n-1} c_i t^{pi}

for c_0, . . . , c_{n-1} ∈ F. By assumption we can choose d_i ∈ F with d_i^p = c_i, and then

f(t) = (t^n + Σ_{i=0}^{n-1} d_i t^i)^p,

and so f is not irreducible. This is a contradiction. As a consequence, f'(t) ≠ 0, and so a is separable. This then indicates that every finite extension of F is separable.
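In particular, every finite field is perfect because the Frobenius map x ↦ x^p is injective (it is a field homomorphism) and hence surjective on a finite set. A small sketch checking this on a hypothetical model of F_9 = F_3[t]/(t² + 1):

    p = 3

    def mul(a, b):
        # elements of F_9 are pairs (x, y) standing for x + y*t, with t^2 = -1
        x1, y1 = a
        x2, y2 = b
        return ((x1*x2 - y1*y2) % p, (x1*y2 + x2*y1) % p)

    def frob(a):
        # compute a^p by multiplying a with itself p times
        r = (1, 0)
        for _ in range(p):
            r = mul(r, a)
        return r

    elems = [(x, y) for x in range(p) for y in range(p)]
    print(sorted(map(frob, elems)) == sorted(elems))  # True: Frobenius is a bijection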

11.11 Random Problem Set


1. (11.1) Consider the field extension Q ⊆ C. Find two algebraic elements x, y in C
over Q such that [Q[x] : Q][Q[y] : Q] ̸= [Q[x, y] : Q].

2. (11.1) Let F_p(x, y) be the field of rational functions in x, y, or equivalently, the fraction field of F_p[x, y]. Let F = F_p(x^p, y^p) and K = F_p(x, y), then show that [K : F] = p² and that for every f ∈ K, we have f^p ∈ F. Conclude that F ⊆ K is not simple.

3. (11.2) Show that for every prime p, we have that

Σ_{0 ≤ i_1 < i_2 < ... < i_n ≤ p-1} i_1 i_2 · · · i_n ≡ 0 mod p

for all n = 1, . . . , p − 2. Also, show that

(p − 1)! ≡ −1 mod p.


4. (11.2) Let F_q be the finite field of order q = p^n. Prove, by induction, that for any two polynomials f, g ∈ F_q[x_1, . . . , x_m], if f(x) = g(x) for every x ∈ F_q^m, then f − g ∈ ⟨x_1^q − x_1, . . . , x_m^q − x_m⟩. Using this, show that for every function s : F_q^m → F_q there exists a polynomial f ∈ F_q[x_1, . . . , x_m] such that f(x) = s(x) for all x ∈ F_q^m.

5. (11.2) (Chevalley-Warning theorem) Suppose that F_q is the field of order q, where q is a power of p. Let f_1, . . . , f_n be n polynomials in F_q[x_1, . . . , x_m], and let S be the set of x ∈ F_q^m such that f_1(x) = f_2(x) = · · · = f_n(x) = 0. Show that if deg f_1 + . . . + deg f_n < m, then |S| ≡ 0 mod p.
This in particular shows that if 0 is a (trivial) solution to f_1, . . . , f_n, then there exists another nontrivial solution. Even more particularly, for every a, b, c ∈ F_p, the equation ax² + by² + cz² = 0 has a nontrivial solution in F_p^3.
(Hint: Suppose that f is a polynomial in m variables. Try to construct a polynomial h in m variables such that h(x) ∈ {0, 1} for all x, h(x) = 1 if and only if f(x) = 0, and deg h ≤ (q − 1) deg f. Now, construct a polynomial h such that h(x) ∈ {0, 1} and h(x) = 1 if and only if x ∈ S in two ways, and use the previous exercise.)

6. (11.3) Let F be a field, f be an irreducible polynomial in F[t] of degree n, and K be a splitting field of f over F. Show that n | [K : F] | n!.

7. (11.4) Show that for every prime p and for every positive integer n, there exists an
irreducible polynomial in Fp [t] with degree n. Moreover, if there is a unique one,
then p = n = 2.

8. (11.4) Let a_1, a_2, . . . be a sequence of positive integers. If this sequence satisfies that there is a prime p such that

Σ_{i|n} a_i = p^n

for any positive integer n, show that for any positive integer n, we have n | a_n.

9. (11.4) For any integer n, show that

Σ_{x ∈ F_q^×} x^n

is 0 if q − 1 ∤ n, and is −1 if q − 1 | n.
(Hint: Think about how a primitive root can help here.)

10. (11.5) Let K1 , K2 be two fields. Let D be the set of pairs (A, f ) where A is a subring
of K1 and f is a homomorphism of rings with 1 from A to K2 . Define a partial
order ≤ on D so that (A, f ) ≤ (A′ , f ′ ) if and only if A ⊆ A′ and f = f ′ |A . Show
that if (A, f ) is a maximal element in D, then ker f is the unique maximal ideal of
A.

11. (11.6) Show that Aut(R/Q) is trivial.
(Hint: Show that every automorphism preserves the order.)


12. (11.6) Let k be a field and k(t) be the field of rational functions in t. Consider the rational function

s(t) = (t² − t + 1)³ / (t²(t − 1)²)

and let k(s) be the smallest subfield of k(t) containing s. Then k(t)/k(s) is a field extension. Verify that

σ(f(t)) = f(t), f(1 − t), f(1/t), f(1 − 1/t), f(1/(1 − t)), f(t/(t − 1))

are six automorphisms in Aut(k(t)/k(s)). Also show that there exists a polynomial with coefficients in k(s) of degree 6 that has t as a root. Conclude that [k(t) : k(s)] = 6.

13. (11.7) Let A be a commutative ring. Let G be a finite group of automorphisms of A, and let A^G be the set of elements fixed by all elements in G. Show that A^G is a subring of A, and every element in A is integral over A^G. In other words, for every x ∈ A there exist n ∈ N and a_0, . . . , a_{n-1} ∈ A^G such that

x^n + Σ_{i=0}^{n-1} a_i x^i = 0.

14. (11.8) An extension K/F is called normal if every irreducible polynomial over F that has a root in K splits into linear factors in K. Show that if K is a splitting field over F, then K/F is normal. Conclude that Q[∛2] is not a splitting field over Q.
(Hint: Let f be an irreducible polynomial, and let E be its splitting field over K. Then we need to show that if there is a root α of f that lies in K, then every root β of f also lies in K. Use the uniqueness of splitting field to show that K = K[α] ≅ K[β], which then shows that β ∈ K.)
√ √ √
15. (11.9) Find all the intermediate fields Q[
√ √ √2, 3, 5]/Q. Using this, find a prim-
of
itive element θ such that Q[θ] = Q[ 2, 3, 5].

16. (11.10) An extension K/F of characteristic p is purely inseparable if for every element x ∈ K there exists n ∈ N such that x^{p^n} ∈ F. Show that for any extension K/F of characteristic p, the set S of separable elements over F forms an intermediate field, and K/S is purely inseparable. This field S is called the separable closure of F in K.

Chapter 12

Applications of Field Extension and Galois Theory

In this chapter, we are going to apply what we have learned in the last chapter and
obtain the results about ruler and compass construction and radical formula for solving
polynomials. These are just the classical and basic examples of the applications of field
extensions, and there are a lot of other examples that I will not mention here.
In addition, to have a better understanding of solving polynomials, we are going
to introduce the elementary symmetric polynomials. In some sense, they provide the
most complicated setting for solving polynomials (which is kind of similar to symmetric
group being the most complicated group). They will be really helpful for finding roots of
polynomials, and as we will see lastly, they will allow us to determine the Galois group
of a splitting field of irreducible polynomials with degree less than five without getting
our hands dirty.

12.1 Ruler and Compass Construction


Ruler and compass have long been two important tools for drawing synthetic geometry diagrams. A lot can be done with only a ruler and a compass, such as constructing perpendicular bisectors, angle bisectors, centroids, orthocenters and circumcenters, to name a few. The ancient Greeks
asked if, with only ruler and compass construction, one can “square the circle”, “double
the cube” and “trisect the angle.” The answers turn out to be negative. But before
we answer these questions, let’s first give a formal definition of the ruler and compass
construction.
Definition 12.1.1. Given some points, lines and circles (by default there are only two
points, which we will WLOG assume to be (0, 0) and (1, 0)). A point, line or circle is
called constructible if it can be obtained after finitely many of the following operations:

1. Connect two constructed points to form a line.

2. Draw a circle centered at a constructed point passing through another constructed


point.

3. Take the intersection of two constructed lines, one constructed line and one con-
structed circle, or two constructed circles.


Definition 12.1.2. A real number a is constructible if the point (a, 0) is constructible given two points (0, 0), (1, 0).

Example 12.1.1. Given two points A(0, 0), B(1, 0), draw the circle O1 centered at
A passing through B and the circle O2 centered at B passing through A. Take the
intersections C, D of O1 , O2 . Draw the line CD, and take the intersection E of AB and
CD. Then E has a coordinate (1/2, 0), and so 1/2 is constructible.

We can actually construct way more points.


Property 12.1.1. Suppose that a, b are constructible, then a + b, a − b, ab are all constructible. If b ≠ 0, then a/b is also constructible. If a > 0, then √a is also constructible.

Sketch of Proof. It is clear that it suffices to consider the case where a, b > 0. In the above
example, we actually demonstrate how to construct the midpoint of two constructed
points. Since (a, 0), (b, 0) are constructible, we know that ((a + b)/2, 0) is constructible.
Now take the circle centered at ((a + b)/2, 0) passing through (0, 0), and take the in-
tersection of it with the x axis. Then the intersection other than (0, 0) is (a + b, 0), as
desired. We can similarly construct (a − b, 0).
To construct ab, it suffices to construct a/b for all constructible a, b. This is because 1 is constructible, and so if a, b constructible implies a/b constructible, then b constructible implies 1/b constructible, which then shows that a/(1/b) = ab is constructible.
The way that we construct a/b is pretty simple: draw the y axis, and then mark the points
A(0, 1), B(0, b), C(a, 0). Now draw the line l passing through A parallel to BC, and take
the intersection D of l and the x axis. By similar triangle, we know that the coordinate
of D is (a/b, 0), as desired.

Lastly, to construct √a, draw the y axis and mark the two points A(0, a), B(0, −1). Construct their midpoint and then draw the circle centered at the midpoint passing through both of them. Take an intersection C of the circle with the x-axis. Then by similar triangles, we have that AO/CO = CO/BO, where O is the origin. This shows that CO = √a, and so the coordinate of C is (±√a, 0), as desired.
As a consequence, if we can obtain a real number by taking finitely many square roots,
then that real number will be constructible. We can make it more precise.
Theorem 12.1.1. Let α be a real number. If there exists a chain of fields Q = K0 ⊆
K1 ⊆ · · · ⊆ Kn ⊆ R such that α ∈ Kn and [Ki+1 : Ki ] = 2, then α is constructible.

Sketch of Proof. We can prove that Kn contains only constructible elements. Since a, b
constructible implies a + b, a − b, ab, a/b constructible, the constructible reals form a field.
In particular, the rationals are constructible.
Now we can induct on n. There is nothing to prove when n = 0. As an inductive step,
suppose that every element in Kn−1 is constructible. Since [Kn : Kn−1 ] = 2, we know
that K_n = K_{n-1}[x] for any x ∈ K_n\K_{n-1}. Let t² + pt + q be the minimal polynomial of x, then we know that

x = (−p ± √(p² − 4q)) / 2.

By the induction hypothesis p, q are constructible, and so p² − 4q is also constructible. As a consequence, √(p² − 4q) is constructible, which shows that x is constructible. Then the entire field K_n is constructible, as desired.


It turns out that this sufficient condition for a real to be constructible is also necessary.
To see this, at step n let Kn be the field generated by the coordinates of the constructed
points. For any two points (a0 , b0 ), (a1 , b1 ), the equation of the line passing through these
two points is
(b1 − b0 )x − (a1 − a0 )y = a0 b1 − a1 b0 ,

and so the coefficients are all in Kn . Similarly, the equation of the circle centered at
(a0 , b0 ) passing through (a1 , b1 ) is

(x − a0 )2 + (y − b0 )2 = (a1 − a0 )2 + (b1 − b0 )2 ,

whose coefficients are all in Kn . Therefore the coordinate of the intersection of two lines
still lies in Kn . The coordinate of the intersection of a line and a circle or two circles
satisfies a quadratic equation. This tells us that either Kn+1 = Kn or [Kn+1 : Kn ] = 2.
Hence,
Theorem 12.1.2. Let α be a real number. Then α is constructible if and only if
there exists a chain of fields Q = K0 ⊆ K1 ⊆ · · · ⊆ Kn ⊆ R such that α ∈ Kn and
[Ki+1 : Ki ] = 2.

Corollary 12.1.1. If α is constructible, then α is algebraic and [Q[α] : Q] is a power of 2.


Sketch of Proof. Let K'_i = Q[α] ∩ K_i. Then [K'_{i+1} : K'_i] = 1 or 2. Hence [Q[α] : Q] = [K'_n : K'_0] is a power of 2.

Remark. Note that this is not a sufficient condition. In fact, there is a degree-four algebraic real that is not constructible. The tools that we need to show this will be developed later, so see the exercises for it.

Example 12.1.2. Now we can answer the three problems and prove that it is impossible
to solve the problem with ruler and compass construction.
First, given a circle, we cannot construct a square with the same area. This is because if we could construct such a square, then the side length of it would be √π (given that the radius of the circle is 1). This would then imply that √π is constructible, which is absurd since π is transcendental.
Given a cube, we also cannot construct another cube that is twice as large. This is because ∛2, having a minimal polynomial t³ − 2 of degree 3, is not constructible.

Lastly, if we could always trisect an angle, then we could construct an angle π/9 since
we can construct an angle π/3 (by constructing an equilateral triangle). This then would
show that cos(π/9) is constructible. However, cos(π/9) satisfies the equation

4t³ − 3t = cos(π/3) = 1/2,

or equivalently,

8t³ − 6t − 1 = 0.

This polynomial turns out to be irreducible over Q, and so cos π/9 is not constructible.
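A quick numerical sanity check of this equation (a sketch using only the standard library):

    from math import cos, pi, isclose

    t = cos(pi/9)
    # cos(pi/9) is a root of 8t^3 - 6t - 1, by the triple angle formula:
    print(isclose(8*t**3 - 6*t - 1, 0, abs_tol=1e-12))  # True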


12.2 Solution in Radicals


In the remainder of this chapter, we will focus on solving polynomials (in one variable) by radicals. To understand better what it means to solve a polynomial by radicals, let's begin with some elementary examples.
Example 12.2.1. Suppose that a, b are two numbers, and we want to solve the equation

x2 + ax + b = 0.

By what is taught in middle schools, to solve this, we should rewrite it as

(x + a/2)² = a²/4 − b,

and so

x = −a/2 ± √(a²/4 − b).
Note that this always works when we are working with fields with characteristic not
equal to 2. Therefore if a, b belongs to a field F such that char F ̸= 2, then to obtain the
splitting field of t2 + at + b, it suffices to adjoin the square root of a2 /4 − b.

Example 12.2.2. Now suppose that a, b, c are three numbers, and we want to solve
the equation
x3 + ax2 + bx + c = 0.
We can use the same trick and rewrite this as
(x + a/3)³ + (b − a²/3)(x + a/3) + (c − ab/3 + 2a³/27) = 0.
With appropriate substitutions, it suffices to solve for y with

y 3 + py + q = 0

where p, q are given. Cardano gave a really clever way to solve this in the following way:
Set y = u + v, and rewrite the equation as

u3 + v 3 + q + (3uv + p)(u + v) = 0.

Therefore if we can find u, v such that

u3 + v 3 = −q

and
3uv = −p,
then y = u + v is a solution. Note that the second equation can be written as u3 v 3 =
−p3 /27. Therefore u3 , v 3 are the roots of the polynomial t2 + qt − p3 /27. We then know
that
u³, v³ = −q/2 ± √(q²/4 + p³/27),

and so

u, v = ∛(−q/2 ± √(q²/4 + p³/27)).


Here we are actually cheating a bit: if we see taking a cubic root of s as solving the equa-
tion t3 − s, then there are three roots. This then shows that there are three possibilities
each for u and v, and so there should be nine possibilities for y, which is clearly absurd.
The issue here is that all we are guaranteed is that u3 v 3 = −p3 /27, and so it could be
that 3uv = −ωp for some primitive third root of unity ω. This tells us that we have to
choose the correct u, v and add them together to get y.
This argument works as long as we are working with fields with characteristic not
equal to 2 or 3. If a, b, c belong to a field F such that char F ≠ 2, 3, then p, q ∈ F, and so to obtain the splitting field of t³ + pt + q it suffices to adjoin the square root of q²/4 + p³/27, and then adjoin the cubic roots of −q/2 ± √(q²/4 + p³/27). By the relation x = y − a/3, we can see that this will also be the splitting field of t³ + at² + bt + c.
x = y − a/3, we can see that this will also be the splitting field of t3 + at2 + bt + c.

In the above two examples, we successfully express the roots in terms of the coeffi-
cients with finitely many additions, subtractions, multiplications, divisions, and taking
the n-th roots. We call this kind of solution an algebraic solution or a solution in radicals.
With more hard work, people in the past also found an algebraic solution to quartic poly-
nomials. On the other hand, Abel-Ruffini theorem states that there does not exist such
an expression for a “general” quintic polynomial. In the following sections, we will prove
this with Galois theory. To this end, let’s first give a Galois-theoretic reinterpretation of
solutions in radicals.
Definition 12.2.1. A field extension K/F is a simple radical extension if F[α] = K for some α ∈ K such that there is n ∈ N satisfying α^n ∈ F. A radical series F_0 ⊆ F_1 ⊆ · · · ⊆ F_n is a series of field extensions such that each F_{i+1}/F_i is a simple radical extension.

Definition 12.2.2. A polynomial f over a field F is solvable over F if there is a radical series F_0 ⊆ F_1 ⊆ · · · ⊆ F_n such that F_0 = F and f splits into linear factors in F_n.

Example 12.2.3. We have seen that any quadratic and cubic polynomial over a field
of characteristic 0 is solvable. As a consequence, every quadratic and cubic polynomial
over R and C is solvable.

12.3 Elementary Symmetric Polynomial


Suppose that we are given a monic polynomial f of degree n over F , and K is the splitting
field of f over F . Then f splits into linear factors. Let the n roots be α1 , . . . , αn , so that

f (t) = (t − α1 )(t − α2 ) · · · (t − αn ).

Then the coefficient of t^{n−k} in f is

(−1)^k Σ_{1 ≤ i_1 < i_2 < ... < i_k ≤ n} α_{i_1} α_{i_2} · · · α_{i_k}.

This motivates the following definition:


Definition 12.3.1. Let x_1, . . . , x_n be n indeterminates. The k-th elementary symmetric polynomial in the variables x_1, . . . , x_n is

e_k(x_1, . . . , x_n) = Σ_{1 ≤ i_1 < i_2 < ... < i_k ≤ n} x_{i_1} x_{i_2} · · · x_{i_k}.


Property 12.3.1. For any α_1, . . . , α_n, we have that

(t − α_1)(t − α_2) · · · (t − α_n) = t^n + Σ_{k=1}^{n} (−1)^k e_k(α_1, . . . , α_n) t^{n−k}.
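A small Python sketch checking this (Vieta's formulas) by expanding the product one factor at a time and comparing coefficients with the elementary symmetric polynomials:

    from itertools import combinations
    from math import prod

    alphas = [2, -3, 5, 7]
    coeffs = [1]                      # coefficients of prod(t - a), descending in t
    for a in alphas:
        new = [0]*(len(coeffs) + 1)
        for i, c in enumerate(coeffs):
            new[i] += c               # multiply by t
            new[i + 1] -= a*c         # multiply by -a
        coeffs = new

    def e(k):                         # k-th elementary symmetric polynomial
        return sum(prod(c) for c in combinations(alphas, k))

    print(all(coeffs[k] == (-1)**k * e(k) for k in range(len(alphas) + 1)))  # True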

Example 12.3.1. Consider the field of rational functions F (x1 , . . . , xn ), and consider
the subfield F (e1 , . . . , en ) generated by the n elementary polynomials. Then the field
F (x1 , . . . , xn ) is the splitting field of


t^n + Σ_{k=1}^{n} (−1)^k e_k t^{n−k}

over F (e1 , . . . , en ).

In fact, this example is the “most general” setting.


Property 12.3.2. Suppose that f is a polynomial of degree n over F , and K is the
splitting field of f over F . Then there exists a surjective homomorphism F (x1 , . . . , xn ) →
K such that the subfield F (e1 , . . . , en ) is sent to a subfield of F . In particular, if there
exists a radical series connecting F (x1 , . . . , xn ) and F (e1 , . . . , en ), then f is solvable over
F.

Sketch of Proof. Suppose that α1 , . . . , αn are the n roots of f in K, then we can consider
the homomorphism sending xi to αi . Since K is generated by α1 , . . . , αn , the map is
surjective. Note that for every i, we have that e_i gets sent to e_i(α_1, . . . , α_n), which is (−1)^i [t^{n−i}]f(t). Therefore the image of e_i is in F, which shows that the image of F(e_1, . . . , e_n) is
contained in F . Now if there is a radical series connecting F (x1 , . . . , xn ) and F (e1 , . . . , en ),
then there is a radical series connecting K and the image of F (e1 , . . . , en ). Since the
image of F (e1 , . . . , en ) is contained in F , one can modify the radical series so that the
base field becomes F , which then shows that f is solvable over F . The detail is left as
an exercise.

Now let’s see what properties are there in this most general setting. Let’s first consider
the automorphism group of the extension F (x1 , . . . , xn )/F (e1 , . . . , en ). Since e1 , . . . , en
are symmetric, we can see that permuting the indices of the x_i fixes F(e_1, . . . , e_n). To be
more precise, every permutation π ∈ Sn induces an automorphism σπ of F (x1 , . . . , xn ),
namely
(σπ (f ))(x1 , . . . , xn ) = f (xπ(1) , . . . , xπ(n) ) ∀f ∈ F (x1 , . . . , xn ),
and σπ fixes e1 , . . . , en by definition. Therefore σπ also fixes F (e1 , . . . , en ), which shows
that σπ ∈ Aut(F (x1 , . . . , xn )/F (e1 , . . . , en )), and hence Sn ≤ Aut(F (x1 , . . . , xn )/F (e1 , . . . , en )).
Actually these are all the automorphisms. To show this, it suffices to show that the fixed
field of Sn is F (e1 , . . . , en ), or equivalently, the elementary symmetric polynomials gen-
erate the field of symmetric rational functions. In fact, we can prove something stronger.
Theorem 12.3.1. (Fundamental theorem of symmetric polynomials) For any commu-
tative ring A with 1 and any symmetric polynomial f ∈ A[x1 , . . . , xn ]Sn , there exists a
unique polynomial g ∈ A[x1 , . . . , xn ] such that g(e1 , . . . , en ) = f (x1 , . . . , xn ). In other
words, every symmetric polynomial can be uniquely written as a polynomial of the ele-
mentary symmetric polynomials.


Sketch of Proof. We will prove this by induction on the “degree” of the polynomial f .
Here, instead of the usual definition of degree where x1 , . . . , xn all have the same weight,
we consider the lexicographical order, and use it as the “degree” of the polynomial.
Therefore the polynomial x1 x2 + x2 x3 + x3 x1 + x1 + x2 + x3 should be written as x1 x2 +
x1 x3 + x1 + x2 x3 + x2 + x3 when using descending order, and the leading term becomes
x1 x2 , the degree becomes (1, 1, 0).
Let’s first prove the existence of g. Induct on the degree of f . The case that the
degree of f is (0, 0, . . . , 0). Now suppose that the statement holds for any symmetric
polynomial of lower degree, then suppose that the leading term of f is axd11 · · · xdnn for
some 0 ̸= a ∈ A. Since f is symmetric, we know that d1 ≥ d2 ≥ · · · ≥ dn . This is because
if di < di+1 for some i, then
[x_1^{d_1} · · · x_i^{d_{i+1}} x_{i+1}^{d_i} · · · x_n^{d_n}]f = [x_1^{d_1} · · · x_i^{d_i} x_{i+1}^{d_{i+1}} · · · x_n^{d_n}]f = a ≠ 0,

which contradicts the assumption that a x_1^{d_1} · · · x_i^{d_i} x_{i+1}^{d_{i+1}} · · · x_n^{d_n} is the leading
term. Now set c_i = d_i − d_{i+1} (here d_{n+1} = 0), then it is easy to verify that the degree of the polynomial e_1^{c_1} e_2^{c_2} · · · e_n^{c_n} is (d_1, . . . , d_n). Hence the degree of f' = f − a e_1^{c_1} · · · e_n^{c_n} is smaller. By the induction hypothesis, there is g' such that f'(x_1, . . . , x_n) = g'(e_1, . . . , e_n). Set g(x_1, . . . , x_n) = a x_1^{c_1} · · · x_n^{c_n} + g'(x_1, . . . , x_n), then

f(x_1, . . . , x_n) = f'(x_1, . . . , x_n) + a e_1^{c_1} · · · e_n^{c_n} = g'(e_1, . . . , e_n) + a e_1^{c_1} · · · e_n^{c_n} = g(e_1, . . . , e_n),

as desired.
Now it remains to show the uniqueness. It suffices to show that g(e1 , . . . , en ) ̸= 0
for every nonzero polynomial g. Otherwise, let x_1^{c_1} · · · x_n^{c_n} be the monomial with nonzero coefficient in g such that (c_1 + c_2 + · · · + c_n, c_2 + · · · + c_n, . . . , c_n) is the largest. It is clear that such a monomial is unique. Then the degree of g(e_1, . . . , e_n) is (c_1 + c_2 + · · · + c_n, c_2 + · · · + c_n, . . . , c_n), which contradicts the assumption that g(e_1, . . . , e_n) = 0.

Corollary 12.3.1. F (e1 , . . . , en ) is isomorphic to F (x1 , . . . , xn ).

Corollary 12.3.2. Let K/F be an arbitrary extension, and let α_1, . . . , α_n be n elements in K. If e_i(α_1, . . . , α_n) ∈ F for any i = 1, . . . , n, then for any symmetric polynomial p in n variables over F, we have that p(α_1, . . . , α_n) ∈ F.

Corollary 12.3.3. The fixed field of Sn permuting the indices of x_1, . . . , x_n in F(x_1, . . . , x_n) is F(e_1, . . . , e_n). Therefore, the extension F(x_1, . . . , x_n)/F(e_1, . . . , e_n) is Galois of degree n!.

Example 12.3.2. To demonstrate how the proof works, let’s consider the following
example. The polynomial [(x1 − x2 )(x2 − x3 )(x3 − x1 )]2 is symmetric. We can expand it
as
[(x1 − x2)(x2 − x3)(x3 − x1)]² = (Σ_cyc (x1²x2 − x1x2²))²
= Σ_sym x1⁴x2² − 2 Σ_sym x1⁴x2x3 − 2 Σ_sym x1³x2³ + 2 Σ_sym x1³x2²x3 − 6x1²x2²x3².


Here Σ_cyc x1^a x2^b x3^c stands for the sum over the orbit of x1^a x2^b x3^c when permuted cyclically, and Σ_sym x1^a x2^b x3^c stands for the sum over the orbit of x1^a x2^b x3^c when permuted arbitrarily.


Then by repeatedly using the construction above, we have that


Σ_sym x1⁴x2² − 2 Σ_sym x1⁴x2x3 − 2 Σ_sym x1³x2³ + 2 Σ_sym x1³x2²x3 − 6x1²x2²x3²
= e1²e2² − 4 Σ_sym x1⁴x2x3 − 4 Σ_sym x1³x2³ − 6 Σ_sym x1³x2²x3 − 21x1²x2²x3²
= e1²e2² − 4e1³e3 − 4 Σ_sym x1³x2³ + 6 Σ_sym x1³x2²x3 + 3x1²x2²x3²
= e1²e2² − 4e1³e3 − 4e2³ + 18 Σ_sym x1³x2²x3 + 27x1²x2²x3²
= e1²e2² − 4e1³e3 − 4e2³ + 18e1e2e3 − 27x1²x2²x3²
= e1²e2² − 4e1³e3 − 4e2³ + 18e1e2e3 − 27e3².
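A quick numerical sanity check of the final identity at random integer points (standard library only):

    import random

    for _ in range(5):
        x1, x2, x3 = (random.randint(-9, 9) for _ in range(3))
        e1, e2, e3 = x1 + x2 + x3, x1*x2 + x2*x3 + x3*x1, x1*x2*x3
        lhs = ((x1 - x2)*(x2 - x3)*(x3 - x1))**2
        rhs = e1**2*e2**2 - 4*e1**3*e3 - 4*e2**3 + 18*e1*e2*e3 - 27*e3**2
        print(lhs == rhs)  # True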

Elementary symmetric polynomials are really powerful although they do not involve anything hard. For example, we can construct, with elementary symmetric polynomials, a polynomial that has a + b as a root if we know the minimal polynomials of a and b. See the exercises for some results that can be obtained using the elementary symmetric polynomials.
Now that we know that K(x1 , . . . , xn )/K(e1 , . . . , en ) is the most general setting,
the problem becomes: What are the n’s such that there is a radical series connecting
K(x1 , . . . , xn ) and K(e1 , . . . , en )?

12.4 Kummer Extension


To understand radical series, let's investigate the properties of simple radical extensions, which are the building blocks of radical series. Since Galois extensions are preferable, we will assume that in the extension F[ⁿ√a]/F, the base field F contains the primitive n-th roots of unity. In this way, the extension is Galois, and so we can consider its Galois group. We can actually generalize this a bit more.
Definition 12.4.1. An extension K/F is a Kummer extension if there exists n ∈ N and a1, . . . , am ∈ F such that F contains the primitive n-th roots of unity and that K is the splitting field of (t^n − a1)(t^n − a2) · · · (t^n − am) over F.

Property 12.4.1. Any Kummer extension is Galois.

Sketch of Proof. We can first show that the characteristic of F does not divide n. Let ξ
be a primitive n-th root. Then we know that for every s ∈ Z, we have ξ s = 1 if and only
if n|s. Now if the characteristic of F is a prime factor p of n, then
0 = ξ^n − 1 = (ξ^{n/p} − 1)^p,

and so ξ^{n/p} = 1, which is a contradiction. Therefore the characteristic of F does not divide n.
Now it is clear that we can assume that a1 , . . . , am are nonzero and distinct. To show
that K/F is Galois, it suffices to show that (tn − a1 )(tn − a2 ) · · · (tn − am ) are separable.
Clearly tn − ai and tn − aj do not share any roots for i ̸= j, so it suffices to show that
tn − ai is separable. The formal derivative of it is ntn−1 ̸= 0 which only has 0 as roots.
The assumption that ai ̸= 0 then implies that tn − ai is separable, as desired.


Therefore it makes sense to talk about the Galois group of Kummer extensions. It
turns out that the Galois group behaves really nicely.
Property 12.4.2. Suppose that K/F is a Kummer extension where K is the splitting
field of (tn − a1 )(tn − a2 ) · · · (tn − am ) over F . Then Gal(K/F ) is an abelian group such
that g n = 1 for every g ∈ Gal(K/F ).

Sketch of Proof. Let ξ be a primitive n-th root of unity and αi be a root of tn − ai . Then
the roots of tn − ai are αi , ξαi , . . . , ξ n−1 αi . Therefore any automorphism σ ∈ Gal(K/F )
is determined by the value of σ(αi ), and the value of σ(αi ) can only be ξ j αi for some j.
Now suppose that σ(αi ) = ξ pi αi and γ(αi ) = ξ qi αi for any two automorphisms σ, γ, then

(σ ◦ γ)(αi ) = σ(γ(αi )) = σ(ξ qi αi ) = ξ qi σ(αi ) = ξ pi +qi αi .

This shows that the map Gal(K/F) → (Z/nZ)^m sending σ to (p1, p2, . . . , pm) is a monomorphism. In particular, Gal(K/F) is abelian and for any g ∈ Gal(K/F) we have that g^n = 1.

Corollary 12.4.1. Suppose that f is a solvable polynomial over F where char F = 0 and F contains all the roots of unity. Then the Galois group G of f satisfies the following: there exists a chain of subgroups G = G_0 ⊇ G_1 ⊇ · · · ⊇ G_n = 1 such that G_{i+1} is a normal subgroup of G_i and that G_i/G_{i+1} is abelian.

Sketch of Proof. Suppose that F = F_0 ⊆ F_1 ⊆ · · · ⊆ F_n is a radical series such that F_n contains the splitting field K of f over F, and also suppose that F_{i+1} = F_i[α_i] where α_i^{n_i} ∈ F_i. Now F_n/F is not necessarily Galois. To fix this, let E_0 = F_0, and we are going
to construct Ei inductively such that Ei+1 /Ei is Kummer, Ei+1 /Fi+1 is an extension, and
that Ei /F is Galois. Suppose that Ei is constructed, then let Ei+1 be the splitting field
of

∏_{σ ∈ Gal(E_i/F)} (t^{n_i} − σ(α_i))

over Ei . It is clear that Ei+1 /Ei is Kummer, and that this polynomial is fixed by any
element in Gal(Ei /F ). Therefore this polynomial has coefficients in F , which shows that
Ei+1 /F is Galois. Since αi ∈ Ei+1 we also have Fi+1 ⊆ Ei+1 . Therefore at the end we
will have K ⊆ Fn ⊆ En and that En /F is Galois.
Let G′i = Gal(En /Ei ). Then by the fundamental theorem of Galois theory we know
that G'_i/G'_{i+1} ≅ Gal(E_{i+1}/E_i), and so G'_i/G'_{i+1} is abelian. Let N = Gal(E_n/K), then the theorem also implies that G'_0/N ≅ G. Let G_i = (G'_iN)/N, then the surjective
homomorphism G′i → Gi → Gi /Gi+1 induces a homomorphism G′i /G′i+1 → Gi /Gi+1
since Gi+1 is sent to 1. This shows that Gi /Gi+1 is abelian. Since G0 = G and Gn = 1,
we are done.

We see that if f is solvable, then its Galois group satisfies some properties. For
simplicity, let’s also call this property solvable.
Definition 12.4.2. A group G is solvable if there exists a chain of subgroups G =
G0 ⊇ G1 ⊇ · · · ⊇ Gn = 1 such that Gi+1 is a normal subgroup of Gi and that Gi /Gi+1 is
abelian.


With this definition, we can rewrite the corollary as follows: Suppose that f is a
solvable polynomial over F where char F = 0 and F contains all the roots of unity, then
the Galois group of f is solvable.
The properties of solvable groups are relatively complicated. We will only use the following property and the definition throughout the note. For more properties, see the exercises.
Property 12.4.3. A subgroup of a solvable group is solvable. The image of a solvable group under a homomorphism is solvable.

Sketch of Proof. The proof of the second statement is already presented in the proof
of the corollary. For the first one, let H be a subgroup of a solvable group G where
G = G0 ⊇ · · · ⊇ Gn = 1 is a certificate of the solvability of G. Let Hi = Gi ∩ H, then
by the second isomorphism theorem we have that H_iG_{i+1}/G_{i+1} ≅ H_i/(H_i ∩ G_{i+1}). Since
Hi ∩ Gi+1 = H ∩ Gi ∩ Gi+1 = H ∩ Gi+1 = Hi+1 and Hi Gi+1 ⊆ Gi , the quotient Hi /Hi+1
is a subgroup of Gi /Gi+1 and hence is abelian. Therefore H is also solvable.

12.5 Characterization of Kummer Extension


We have shown that under certain circumstances, solvability of a polynomial implies the
solvability of its Galois group. We can show that the converse is also true. To this end,
we want to show that under certain circumstances, Galois extension with abelian Galois
group is Kummer. Let’s start with an easy example.
Example 12.5.1. Let p be a prime, and F be a field containing a primitive p-th root of unity ξ. Choose an arbitrary element a ∈ F such that ᵖ√a ∉ F. Then the degree of the extension F[ᵖ√a]/F is p, and the extension is also Galois. Since |Gal(F[ᵖ√a]/F)| = p, we have that the Galois group is Z/pZ, and the automorphism σ_i acts on F[ᵖ√a]/F such that σ_i(ᵖ√a) = ξ^i ᵖ√a.
Now the question is if we can recover ᵖ√a given the Galois group. To get more insight, let's consider some arbitrary element x = c_0 + c_1 ᵖ√a + · · · + c_{p−1} (ᵖ√a)^{p−1} for c_i ∈ F. Then

σ_i(x) = σ_i(c_0 + c_1 ᵖ√a + · · · + c_{p−1} (ᵖ√a)^{p−1}) = c_0 + c_1 ξ^i ᵖ√a + · · · + c_{p−1} ξ^{i(p−1)} (ᵖ√a)^{p−1}.

Therefore to extract the term c_1 ᵖ√a, we can consider

Σ_{i ∈ Z/pZ} σ_i(x)/ξ^i = p c_1 ᵖ√a,

and so we recover ᵖ√a up to a factor in F as long as p c_1 ᵖ√a ≠ 0. By Lemma 11.6.1 we can choose x such that the result is nonzero, and so we can recover ᵖ√a with this procedure.
We can furthermore generalize this procedure.


Property 12.5.1. Suppose that K/F is a finite Galois extension and ϕ : Gal(K/F ) →
F × is a homomorphism. Then there exists a ∈ K × such that for every σ ∈ Gal(K/F ),
we have σ(a)/a = ϕ(σ).

Sketch of Proof. For every x ∈ K, consider


a = Σ_{σ' ∈ Gal(K/F)} σ'(x)/ϕ(σ').


Then
σ(a) = Σ_{σ' ∈ Gal(K/F)} σ(σ'(x))/ϕ(σ') = Σ_{σ◦σ' ∈ Gal(K/F)} (σ ◦ σ')(x)ϕ(σ)/ϕ(σ ◦ σ') = ϕ(σ)a,
so it remains to choose x such that a ̸= 0. The existence of such x is guaranteed by
Lemma 11.6.1.
With this, we are ready to prove the converse of Property 12.4.2, which gives another
characterization of Kummer extension.
Theorem 12.5.1. Suppose that K/F is a finite Galois extension and n is a positive
integer such that the following holds:
(i) F contains a primitive n-th root ξ;
(ii) Gal(K/F ) is abelian;
(iii) Every element in Gal(K/F ) has an order dividing n.
Then K/F is Kummer.

Sketch of Proof. By the fundamental theorem of finite abelian groups, the Galois group is
isomorphic to some Z/n1 Z × · · · × Z/nm Z. By the condition (iii), we have that ni |n for
any i = 1, . . . , m. For each i, we know that ξ n/ni is a primitive ni -th root. Therefore we
can consider a homomorphism ϕ_i : Gal(K/F) → F^× such that

ϕ_i((g_1, g_2, . . . , g_m)) = (ξ^{n/n_i})^{g_i}   ∀(g_1, . . . , g_m) ∈ Z/n_1Z × · · · × Z/n_mZ.

Then we know that there exists 0 ̸= αi ∈ K such that σ(αi ) = ϕi (σ)αi for all σ ∈
Gal(K/F ). As a consequence,
σ(α_i^n)/α_i^n = (σ(α_i)/α_i)^n = ϕ_i(σ)^n = ϕ_i(σ^n) = 1.
This shows that ai = αin is fixed by every automorphism, which then shows that ai ∈
F . It remains to show that K is the splitting field of (tn − a1 )(tn − a2 ) · · · (tn − am )
over F , or equivalently, K = F [α1 , . . . , αm ]. Let σ = (g1 , . . . , gm ) be an element in
Gal(K/F [α1 , . . . , αm ]) ≤ Gal(K/F ). If there exists gi such that gi ̸≡ 0 mod ni , then
σ(α_i)/α_i = ϕ_i(σ) = (ξ^{n/n_i})^{g_i} ≠ 1
since ξ n/ni is a primitive ni -th root. This is a contradiction with the assumption that
σ ∈ Gal(K/F [α1 , . . . , αm ]). Therefore the Galois group Gal(K/F [α1 , . . . , αm ]) is trivial,
which shows that K = F [α1 , . . . , αm ], as desired.
Corollary 12.5.1. Suppose that f is a polynomial over a field F with characteristic
zero such that the Galois group G of f is solvable and F contains the primitive |G|-th
roots of unity. Then f is solvable over F .

Sketch of Proof. Suppose that G = G_0 ⊇ G_1 ⊇ · · · ⊇ G_n = 1 certifies the solvability of G, and let K be the splitting field of f over F. Then K/F is Galois. Let F_i be the fixed field
of Gi , then by the fundamental theorem of Galois theory we know that Gal(Fi+1 /Fi ) =
Gi /Gi+1 is abelian and | Gal(Fi+1 /Fi )| divides |G|. Therefore the extension Fi+1 /Fi is
Kummer, which shows that we can split this extension into a series of simple radical
extensions. This shows that K and F can be connected by a radical series, which shows
that f is solvable over F .


At this point we have more or less characterized the solvability of a polynomial over
a field with characteristic 0. The caveat here is that we have to assume that the field
contains sufficient roots of unity, and we will remove this assumption in the next section.

12.6 Natural Irrationality


Let F be a field with characteristic 0, and let F ′ be the extension of F obtained by
adjoining all the roots of unity. For every polynomial f over F , we know that f is
solvable over F ′ if and only if the Galois group of f over F ′ is solvable. However, we
would like to consider the Galois group of f over F instead of F ′ . To achieve this, we
want to relate the Galois group of f over F′ to the Galois group of f over F. This can be
seen as a “base change”, which simply changes the base field that we are working with.
We can generalize this a bit more. Let f be a polynomial over F whose irreducible
factors are separable, K be the splitting field of f over F , F ′ be an arbitrary extension
of F, and K′ be the splitting field of f over F′. Suppose that the roots of f are α1, . . . , αs,
then K ′ = F ′ [α1 , . . . , αs ]. By definition, we know that F [α1 , . . . , αs ] is a splitting field
of f over F and thus is isomorphic to K. As a consequence, we usually identify K with
F [α1 , . . . , αs ] and see K as a subfield of K ′ . With these notations, we have the following:
Theorem 12.6.1. (Natural Irrationality/Base change for Galois theory) Let f be a
polynomial over F whose irreducible factors are separable, F ′ be an extension of F , K ′
be the splitting field of f over F ′ and K a subfield of K ′ that is the splitting field of f
over F . Then Gal(K ′ /F ′ ) is isomorphic to the subgroup of Gal(K/F ) whose fixed field
is K ∩ F ′ .

Sketch of Proof. Let’s first prove that Gal(K ′ /F ′ ) is isomorphic to the subgroup of Gal(K/F ).
To show this, let’s first construct a natural homomorphism Gal(K ′ /F ′ ) → Gal(K/F )
given by restricting on K. Therefore we have to show that every automorphism σ of
K ′ /F ′ maps K to itself and fixes F . Let α1 , . . . , αs be the roots of f , then σ permutes
the roots α1 , . . . , αs . Moreover, since σ fixes F ′ , it also fixes F . Therefore σ sends
K = F [α1 , . . . , αs ] to itself and fixes F .
We also have to show that this natural homomorphism is injective. Suppose that σ
fixes every element in K, then σ fixes α1 , . . . , αs , and so σ also fixes every element in K ′ .
This implies the injectivity, and so Gal(K′/F′) is isomorphic to a subgroup of Gal(K/F).
It remains to show that the fixed field of this subgroup is K ∩ F ′ . Clearly every element
in K ∩ F ′ is fixed. On the other hand, every element in K that is not in K ∩ F ′ does not
belong to F′, and hence is moved by some automorphism of K′/F′. Therefore the fixed
field is K ∩ F ′ , as desired.

This allows us to remove the condition that F contains enough roots of unity, and so
finally we have the following:
Theorem 12.6.2. A polynomial f over a field F with characteristic 0 is solvable if and
only if its Galois group over F is solvable.

Sketch of Proof. If the Galois group G of f over F is solvable, then let F ′ = F [ξ] where
ξ is a primitive |G|-th root of unity. By natural irrationality we know that the Galois group
of f over F ′ is a subgroup of G and hence is solvable. Therefore f is solvable over F ′ .
Since F ′ /F is a simple radical extension, we know that f is also solvable over F .


Conversely, if f is solvable over F , then we can assume that there is a radical series
F = F0 ⊆ · · · ⊆ Fn such that f splits completely in Fn . We can extend this to F ⊆ F0 [ξ] ⊆
· · · ⊆ Fn [ξ] for some suitable root of unity ξ. We then know that Gal(Fn [ξ]/F ) is solvable
since Gal(Fn [ξ]/F0 [ξ]) is solvable and Gal(F0 [ξ]/F ) is abelian. Let K be the subfield of
F_n[ξ] that is the splitting field of f over F, then Gal(K/F) = Gal(F_n[ξ]/F)/Gal(F_n[ξ]/K)
is solvable, as desired.

Now we can prove that there is no radical formula for a general polynomial of degree
at least 5.
Theorem 12.6.3. Let F (e1 , . . . , en ) be the field of symmetric rational function where
F is of characteristic 0. Then the polynomial tn − e1 tn−1 + · · · + (−1)n en is not solvable
for n ≥ 5.

Sketch of Proof. We know that the Galois group of the polynomial is Sn . If Sn is solvable,
then there exists Sn = G0 ⊇ · · · ⊇ Gn = 1 such that Gi /Gi+1 is abelian. Since An is
simple, the only nontrivial normal subgroup of Sn is An , which shows that G1 = An .
However An is simple and non-abelian, which shows that there does not exist G2 such
that G1 /G2 is abelian and that G1 ̸= G2 . This is a contradiction. Therefore Sn is not
solvable, and so the polynomial is not solvable.

We have shown that in general, solutions in radicals for polynomials of degree at least 5 do not exist. In practice however, we usually work with the base field Q. At this point it
is not clear whether there is a polynomial over Q that is not solvable. The following
theorem helps us to construct some concrete examples.
Theorem 12.6.4. If f is an irreducible polynomial over F that is a subfield of R, the
degree of f is a prime p, and there are exactly two roots of f that are not real, then the
Galois group of f over F is Sp .

Sketch of Proof. Let K be the splitting field of f and let α1 , . . . , αp be the roots of f .
Then Gal(K/F) permutes the roots α1, . . . , αp and thus can be seen as a subgroup of Sp. We know that [F[α1] : F] = p, and so p divides |Gal(K/F)|. By Cauchy's theorem there is an
element of Gal(K/F ) that has order p. This shows that there is a p-cycle in Gal(K/F ).
We also know that complex conjugation restricts to an automorphism of K/F, and since there are exactly two roots that are not real, this corresponds to a transposition in Sp. Since Gal(K/F)
contains a p-cycle and a transposition, one can check that Gal(K/F ) is in fact Sp , as
desired.

Example 12.6.1. By Eisenstein criterion (with p = 2), we know that the polynomial
f (x) = x5 − 4x + 2 is irreducible over Q. Moreover, the derivative of x5 − 4x + 2 is 5x4 − 4,
which has two real roots. This shows that x5 − 4x + 2 has at most three real roots, and
by f(−∞) = −∞, f(−1) = 5, f(1) = −1, f(∞) = ∞ we know that there are actually three real roots. Therefore the Galois group of f over Q is S5, which is not solvable.
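A quick sketch of the sign-change count (standard library only):

    f = lambda x: x**5 - 4*x + 2
    print([f(x) for x in (-2, 0, 1, 2)])   # [-22, 2, -1, 26]: three sign changes
    # f'(x) = 5x^4 - 4 has exactly two real roots, so f has at most three real
    # roots; together this gives exactly three real roots and one conjugate pair.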

In general, given a finite group, it is difficult to tell if it is some Galois group of some
Galois extension of Q. This is called the inverse Galois problem, and is still unsolved to the best of my knowledge.


12.7 Radical Formula and Classifying Galois Group


In addition to proving that there are no solutions in radicals for a general quintic polyno-
mial, we can also use the results to give radical formulas for quadratic, cubic and quartic
polynomials. Let’s begin with the easiest case.
Example 12.7.1. Consider the extension F (x1 , x2 )/F (e1 , e2 ) where e1 = x1 + x2 , e2 =
x1 x2 . For simplicity, suppose that char F = 0 and F contains all the roots of unity.
Then the Galois group is S2 = C2 , and so F (x1 , x2 )/F (e1 , e2 ) is Kummer. Following
the procedure we use to prove that this is Kummer, we know that we should first find
a homomorphism from C2 to F (e1 , e2 )× , which we will take as ϕ(1) = 1, ϕ((1 2)) = −1.
Then consider the element

x1/ϕ(1) + (1 2)(x1)/ϕ((1 2)) = x1 − x2.
This is not zero, and so we know that F (x1 , x2 ) = F (e1 , e2 )[x1 − x2 ], and furthermore,
(x1 − x2)² = x1² − 2x1x2 + x2² = e1² − 4e2 ∈ F(e1, e2).
Therefore we know that
x1 = (e1 + (x1 − x2))/2 = (e1 + √(e1² − 4e2))/2,
which agrees with the well-known quadratic formula.

Example 12.7.2. Now consider the extension F (x1 , x2 , x3 )/F (e1 , e2 , e3 ). The Galois
group is S3, which is solvable because C3 is abelian, normal in S3 and S3/C3 ≅ C2 is
also solvable. Let L be the fixed field of C3 , then we know that F (x1 , x2 , x3 )/L and
L/F (e1 , e2 , e3 ) are both Kummer.
Let’s first show that L/F (e1 , e2 , e3 ) is a simple radical extension. We know that
Gal(L/F (e1 , e2 , e3 )) = S3 /C3 , so we can take sgn : S3 /C3 → {±1} ≤ F (e1 , e2 , e3 )× as the
homomorphism. Consider the element x21 x2 + x22 x3 + x23 x1 . This element is fixed by C3
and hence is in L. The procedure then tells us to consider
Σ_{σ ∈ Gal(L/F(e1,e2,e3))} σ(x1²x2 + x2²x3 + x3²x1)/sgn(σ) = Σ_cyc (x1²x2 − x1x2²) = (x1 − x2)(x1 − x3)(x2 − x3).

This is nonzero, and so L = F (e1 , e2 , e3 )[(x1 − x2 )(x1 − x3 )(x2 − x3 )]. Moreover, [(x1 −
x2 )(x1 − x3 )(x2 − x3 )]2 is symmetric and hence lies in F (e1 , e2 , e3 ). The calculation before
tells us that
[(x1 − x2)(x1 − x3)(x2 − x3)]² = e1²e2² − 4e1³e3 − 4e2³ + 18e1e2e3 − 27e3²,

and so

(x1 − x2)(x1 − x3)(x2 − x3) = √(e1²e2² − 4e1³e3 − 4e2³ + 18e1e2e3 − 27e3²).

For simplicity, denote this by ∆.
Now let’s show explicitly that F (x1 , x2 , x3 )/L is a simple radical extension. Let ω be
the primitive third root of unity, and let ϕ : C3 → L× be a homomorphism such that
ϕ((1 2 3)) = ω −1 . Then consider the element
∑ σ(x1 )
= x1 + ωx2 + ω 2 x3 .
σ∈C3 ϕ(σ)


This is nonzero and so F(x1, x2, x3) = L[x1 + ωx2 + ω²x3]. Besides, (x1 + ωx2 + ω²x3)³ is fixed by C3 and therefore lies in L. Expanding this, we get that

(x1 + ωx2 + ω²x3)³ = Σ_cyc (x1³ + 3ωx1²x2 + 3ω²x1x2²) + 6x1x2x3
= (3(ω − ω²)/2)Δ + Σ_cyc x1³ + (3(ω + ω²)/2) Σ_sym x1²x2 + 6x1x2x3
= (3√−3/2)Δ + e1³ − (9/2)e1e2 + (27/2)e3,

and so

x1 + ωx2 + ω²x3 = ∛((3√−3/2)Δ + e1³ − (9/2)e1e2 + (27/2)e3).

Now we want to express x1 in terms of e1, e2, e3 and Δ. It is possible to express x1 as a polynomial of x1 + ωx2 + ω²x3 over L, but there is an easier way. We can also consider

x1 + ω²x2 + ωx3 = ∛(−(3√−3/2)Δ + e1³ − (9/2)e1e2 + (27/2)e3),

and so (x1 + ωx2 + ω²x3) + (x1 + ω²x2 + ωx3) = 2x1 − x2 − x3 can be expressed in terms of e1, e2, e3 and Δ. Adding e1 = x1 + x2 + x3 and dividing the result by 3, we obtain
an expression of x1 . One can verify that this agrees with the cubic formula we derived
before.
Note that since there are three choices for each cube root, we will obtain nine roots
in this way. This phenomenon also appeared before when we tried to derive the radical
formula for the simpler polynomial y 3 + py + q. To fix this, note that

(x1 + ωx2 + ω 2 x3 )(x1 + ω 2 x2 + ωx3 ) = x21 + x22 + x23 − x1 x2 − x2 x3 − x3 x1 = e21 − 3e2 .

Therefore once we determine what x1 + ωx2 + ω²x3 should be, there is only one choice for x1 + ω²x2 + ωx3, which shows that we can only obtain three roots in this way.
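A numerical check of the key identity above (a sketch; cmath.sqrt(-3) is the principal square root i√3 = ω − ω²):

    import cmath

    x1, x2, x3 = 2.0, -1.0, 5.0
    w = cmath.exp(2j*cmath.pi/3)
    e1, e2, e3 = x1 + x2 + x3, x1*x2 + x2*x3 + x3*x1, x1*x2*x3
    delta = (x1 - x2)*(x1 - x3)*(x2 - x3)
    lhs = (x1 + w*x2 + w**2*x3)**3
    rhs = 1.5*cmath.sqrt(-3)*delta + e1**3 - 4.5*e1*e2 + 13.5*e3
    print(abs(lhs - rhs) < 1e-9)  # True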

Example 12.7.3. Lastly, let’s consider the extension F (x1 , x2 , x3 , x4 )/F (e1 , e2 , e3 , e4 ).
The Galois group is S4 , and this is solvable because of the series S4 ⊇ A4 ⊇ K4 ⊇ 1.
Following the same line, let L be the fixed field of A4 and K be √ the fixed field of K4 .
With the same argument, we can show that L = F(e1, e2, e3, e4)[√Δ] where Δ is

[(x1 − x2)(x1 − x3)(x1 − x4)(x2 − x3)(x2 − x4)(x3 − x4)]² ∈ F(e1, e2, e3, e4).

To show that K/L is a simple radical extension, let ϕ : A4 /K4 → L× be a homomor-


phism sending (1 2 3) to ω −1 . Then consider the element
Σ_{σ ∈ A4/K4} σ(x1x2 + x3x4)/ϕ(σ) = (x1x2 + x3x4) + ω(x1x4 + x2x3) + ω²(x1x3 + x2x4).

This is nonzero, and so K = L[(x1x2 + x3x4) + ω(x1x4 + x2x3) + ω²(x1x3 + x2x4)]. One can also show that

[(x1x2 + x3x4) + ω(x1x4 + x2x3) + ω²(x1x3 + x2x4)]³


is fixed by A4 and therefore is in L. We can then express it in terms of e1, e2, e3, e4 and √Δ. We can do the same for (x1x2 + x3x4) + ω²(x1x4 + x2x3) + ω(x1x3 + x2x4), which
allows us to compute x1 x2 + x3 x4 , x1 x4 + x2 x3 and x1 x3 + x2 x4 .
Note that the above steps are similar to what we did for solving cubic equations. In
fact, the above steps are equivalent to solving the cubic polynomial

(t − (x1 x2 + x3 x4 ))(t − (x1 x4 + x2 x3 ))(t − (x1 x3 + x2 x4 )),

which can be rewritten as

t³ − e2t² + (e1e3 − 4e4)t − (e1²e4 + e3² − 4e2e4) ∈ F(e1, e2, e3, e4)[t].

This is usually called the resolvent cubic of the original quartic. To solve the original
quartic, we usually solve the resolvent cubic first as an intermediate step. For simplicity,
let’s denote x1 x2 + x3 x4 , x1 x4 + x2 x3 , x1 x3 + x2 x4 by β1 , β2 , β3 , respectively.
Lastly, let’s show explicitly that F (x1 , x2 , x3 , x4 )/K is Kummer. Note that the Galois
group is K4 , which is not cyclic anymore. Therefore we have to consider more than one
homomorphism from K4 to F (x1 , x2 , x3 , x4 )× . Let’s simply consider all the nontrivial
homomorphisms. Following the same procedure, we get three elements

x1 + x2 − x3 − x4,  x1 − x2 − x3 + x4,  x1 − x2 + x3 − x4.

The squares are


e1² − 4e2 + 4β1,  e1² − 4e2 + 4β2,  e1² − 4e2 + 4β3,
and so we can obtain the expressions for x1 +x2 −x3 −x4 , x1 −x2 −x3 +x4 , x1 −x2 +x3 −x4 .
Together with e1 = x1 + x2 + x3 + x4 , we can solve the system of linear equations and
obtain the expressions for x1 , x2 , x3 and x4 . The actual formula is really complicated so I
am not going to write it down here, but one can write down the exact formula following
this procedure.
Note that we have to make three choices of square roots to determine the value of
x1 + x2 − x3 − x4 , x1 − x2 − x3 + x4 and x1 − x2 + x3 − x4 , which gives rise to eight roots
instead of four. To fix this, we can consider
(x1 + x2 − x3 − x4)(x1 − x2 − x3 + x4)(x1 − x2 + x3 − x4) = Σ x1³ − Σ_sym x1²x2 + 2Σ x1x2x3 = e1³ − 4e1e2 + 8e3,

which reduces the number of choices down to four, as desired.

Example 12.7.4. To give a demonstration of how this works, let’s solve the equation

x4 + x2 − 2x + 1 = 0.

Its resolvent cubic is

t³ − t² + (0 · 2 − 4 · 1)t − (0² · 1 + 2² − 4 · 1 · 1) = t³ − t² − 4t,

whose roots are

0,  (1 ± √17)/2.

Let α1 , α2 , α3 , α4 be the roots such that

0 = β1 = α1α2 + α3α4,
(1 + √17)/2 = β2 = α1α4 + α2α3,
(1 − √17)/2 = β3 = α1α3 + α2α4.
Then

(α1 + α2 − α3 − α4)² = 0² − 4 · 1 + 4β1 = −4,
(α1 − α2 − α3 + α4)² = 0² − 4 · 1 + 4β2 = −2 + 2√17.
Therefore we can choose α1 + α2 − α3 − α4 = 2i and α1 − α2 − α3 + α4 = √(−2 + 2√17). This automatically gives us that

α1 − α2 + α3 − α4 = (0³ − 4 · 0 · 1 + 8 · 2) / (2i · √(−2 + 2√17)) = −i√(2 + 2√17).
Then we can solve the system of linear equations to obtain that

α1 = (0 + 2i + √(−2 + 2√17) − i√(2 + 2√17))/4 = (1/4)√(−2 + 2√17) + (1/2 − (1/4)√(2 + 2√17))i.
This is indeed a root since
α1² = −1/2 + (1/4)√(2 + 2√17) + (−1 + (1/4)√(−2 + 2√17))i = i(α1 − 1),

which shows that

α1⁴ = −(α1 − 1)² = −α1² + 2α1 − 1,
as desired. The other three roots can be solved similarly.
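A numerical verification of this root (a sketch using the standard library):

    import cmath

    s17 = cmath.sqrt(17)
    a1 = (2j + cmath.sqrt(-2 + 2*s17) - 1j*cmath.sqrt(2 + 2*s17)) / 4
    print(abs(a1**4 + a1**2 - 2*a1 + 1) < 1e-12)  # True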

In the above discussions, we’ve seen the product of differences of roots several times.
This quantity actually reveals something about the Galois group.
Definition 12.7.1. Let f be a polynomial of degree n over F , and let K be the splitting
field of f over F . Let α1 , . . . , αn be the n roots of f in K. Then the discriminant of f is
defined as

Δ = ∏_{i<j} (α_i − α_j)²,

which actually is in F .

Property 12.7.1. A polynomial f is separable if and only if its discriminant is not 0.

This again shows that to determine if a polynomial is separable or not, we do not


need to work with the extension. However, taking the derivative and computing the gcd is still a better method which involves less computation.
Property 12.7.2. Let f be a polynomial of degree n over a field F with characteristic 0.
Furthermore, assume that Δ ≠ 0. Then the Galois group of f, when seen as a subgroup of Sn permuting the n roots, is a subgroup of An if and only if √Δ exists in F.



Sketch of Proof. If √Δ ∈ F, then we know that

∏_{i<j} (α_i − α_j) ∈ F,

which shows that every automorphism fixes this element. Since, when permuted by Sn, the fixed group of this element is An (note that −√Δ ≠ √Δ), we know that the Galois group of f lies in An. Conversely, if the Galois group of f is a subgroup of An, then every automorphism fixes this element, which therefore is in F.

This can furthermore be generalized.


Definition 12.7.2. Let f be a polynomial of degree n over a field F with characteristic
0, and let α1 , . . . , αn be the n roots. A polynomial p in n variables x1 , . . . , xn is called
a resolvent invariant for a subgroup G of Sn if the fixed group of p, with respect to the
group action of Sn permuting the n indeterminates x1 , . . . , xn , is exactly G. A resolvent
for a subgroup G of Sn is the polynomial of the form

∏_{p′ ∈ Op} (t − p′(α1, . . . , αn)),

where p is a resolvent invariant for G and Op is the orbit of p under the group action of
Sn . This polynomial has degree [Sn : G] and coefficients in F .

Example 12.7.5. The fixed group of

∏_{i<j} (xi − xj)

is An, and the resolvent that it gives is

(t − ∏_{i<j} (αi − αj))(t + ∏_{i<j} (αi − αj)) = t^2 − ∆.

In addition, when n = 4, the fixed group of x1x2 + x3x4 is ⟨(1 2), (1 3)(2 4)⟩ ≅ D4. Therefore the resolvent cubic is the resolvent for D4.
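This last claim can be checked symbolically: expanding the resolvent over the orbit of x1x2 + x3x4 recovers exactly the resolvent cubic coefficients in the form used in Example 12.7.4. A Python/sympy sketch (assuming sympy is available):

    # The resolvent for D4 equals
    # t^3 - e2*t^2 + (e1*e3 - 4*e4)*t - (e1^2*e4 + e3^2 - 4*e2*e4).
    from sympy import symbols, expand

    t, x1, x2, x3, x4 = symbols('t x1 x2 x3 x4')
    e1 = x1 + x2 + x3 + x4
    e2 = x1*x2 + x1*x3 + x1*x4 + x2*x3 + x2*x4 + x3*x4
    e3 = x1*x2*x3 + x1*x2*x4 + x1*x3*x4 + x2*x3*x4
    e4 = x1*x2*x3*x4

    orbit = [x1*x2 + x3*x4, x1*x3 + x2*x4, x1*x4 + x2*x3]
    resolvent = expand((t - orbit[0])*(t - orbit[1])*(t - orbit[2]))
    cubic = expand(t**3 - e2*t**2 + (e1*e3 - 4*e4)*t
                   - (e1**2*e4 + e3**2 - 4*e2*e4))
    print(expand(resolvent - cubic))  # prints 0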

We can then simply generalize the argument in Property 12.7.2 and get that
Property 12.7.3. Let f be a polynomial of degree n over a field F with characteristic 0, and let p be a resolvent for a subgroup G of Sn. If the Galois group of f is a subgroup of (some conjugate of) G, then p has a root in F; conversely, if p has a simple root in F, then the Galois group of f is a subgroup of (some conjugate of) G.

To conclude this chapter, let’s classify the Galois group of all irreducible polynomials
of degree less than 5. Let n = deg f. The cases n = 1, 2 are trivial, so let's begin
with n = 3. As usual, we assume that we are working over a field F with characteristic
0.
Since f is irreducible, its Galois group acts transitively on the roots, so when n = 3 the Galois group of f can only be C3 = A3 or S3. Therefore it suffices to see whether the discriminant has a square root in F: if it does, then the Galois group is C3; if it does not, then the Galois group is S3.


Example 12.7.6. We know that x^3 − 2 is irreducible over Q. The discriminant is

e1^2 e2^2 − 4e1^3 e3 − 4e2^3 + 18e1e2e3 − 27e3^2 = −108,

which has no square roots in Q. Therefore the Galois group of x^3 − 2 is S3.


Now consider the polynomial x^3 − 7x + 7. This is irreducible over Q by the Eisenstein criterion, and the discriminant is

e1^2 e2^2 − 4e1^3 e3 − 4e2^3 + 18e1e2e3 − 27e3^2 = −4 · (−7)^3 − 27 · (−7)^2 = 7^2,

which shows that its Galois group is C3.
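Both discriminants are easy to double-check with sympy (a quick sketch, assuming it is available):

    # Discriminants of the two cubics above.
    from sympy import symbols, discriminant

    x = symbols('x')
    print(discriminant(x**3 - 2, x))        # -108: not a square, so S3
    print(discriminant(x**3 - 7*x + 7, x))  # 49 = 7^2: a square, so C3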

Now let’s proceed to the case n = 4. By transitivity, the Galois group can only be S4, A4, D4, K4 or C4. We can first look at ∆ and see if it is a square in F. If it is, then the Galois group of f can only be A4 or K4. If it is not, then the Galois group of
f can only be S4 , D4 or C4 . We can also see if the resolvent cubic g of f has a root in F
or not. Note that the discriminant of g is the same as the discriminant of f , which shows
that g is separable. Therefore g has a root if and only if the Galois group is a subgroup
of D4 . In other words, if g is irreducible, then the Galois group of f can only be S4 or
A4 , and if g is reducible, then the Galois group of f can only be D4 , C4 or K4 .
That is a lot, so let’s put it into a table.
                  | g is irreducible | g is reducible
∆ is a square     | A4               | K4
∆ is not a square | S4               | C4 or D4

Now it remains to differentiate C4 and D4. In this case, √∆ ∉ F and g has a root in F. If g splits completely, then the Galois group of f fixes α1α2 + α3α4, α1α4 + α2α3 and α1α3 + α2α4, which shows that the Galois group of f is K4. Therefore we will assume that g has exactly one root, which we will assume to be β = α1α2 + α3α4. To differentiate C4 and D4, we just need to see if (1 2) is in the Galois group.
Consider γ = α1 α2 − α3 α4 . Then

γ^2 = β^2 − 4e4(α1, α2, α3, α4) ∈ F,

and so γ^2∆ ∈ F. Now it is clear that C4 always fixes γ√∆, and (1 2)(γ√∆) = −γ√∆. As a consequence, when γ ≠ 0, we have that the Galois group of f is C4 if and only if γ√∆ ∈ F, i.e. γ^2∆ is a square in F.

We can also consider ζ = α1 + α2 − α3 − α4 . We have that

ζ^2 = e1(α1, α2, α3, α4)^2 − 4e2(α1, α2, α3, α4) + 4β ∈ F,

and C4 always fixes ζ√∆ while (1 2)(ζ√∆) = −ζ√∆. Therefore when ζ ≠ 0, we have that the Galois group of f is C4 if and only if ζ^2∆ is a square in F.
What if γ = ζ = 0? In fact, this cannot occur if f is irreducible: if γ = ζ = 0, then

α1α2 = β/2 ∈ F,   α1 + α2 = e1(α1, α2, α3, α4)/2 ∈ F.

This shows that (t − α1)(t − α2) is a polynomial over F that divides f, which is a contradiction. This concludes the classification of the Galois groups of irreducible polynomials of degree 4.
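To see the whole procedure run on one example, take f = x^4 + x^3 + x^2 + x + 1, whose splitting field is the fifth cyclotomic field, so the answer should come out as C4. A Python/sympy sketch (assuming sympy is available):

    # Classify the Galois group of x^4 + x^3 + x^2 + x + 1.
    from sympy import symbols, discriminant, roots

    x, t = symbols('x t')
    f = x**4 + x**3 + x**2 + x + 1
    e1, e2, e3, e4 = -1, 1, -1, 1   # elementary symmetric values of the roots

    g = t**3 - e2*t**2 + (e1*e3 - 4*e4)*t - (e1**2*e4 + e3**2 - 4*e2*e4)
    print(roots(g, t))   # exactly one rational root, beta = 2

    D = discriminant(f, x)
    print(D)             # 125: not a square, so C4 or D4

    beta = 2
    print(beta**2 - 4*e4)          # gamma^2 = 0, so fall back to zeta
    zeta2 = e1**2 - 4*e2 + 4*beta
    print(zeta2, zeta2*D)          # 5, 625 = 25^2: a square, so C4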


12.8 Random Problem Set


1. (12.1) Show that a regular 7-gon is not constructible.

2. (12.2) Show that the polynomial t^2 + t + x over the field F2(x) is not solvable.

3. (12.3) We have shown that if α, β are both algebraic over F, then F[α, β] is an
algebraic extension of F by some linear algebra argument. The consequence of
this is that if α, β are both algebraic over F, then α + β, αβ are also algebraic over
F. However, the proof is not constructive in the sense that we don't know what
polynomial relations α + β, αβ actually satisfy. Use the elementary symmetric
polynomials to show this in a more constructive way. Using this, find a polynomial
with rational coefficients having √2 + ∛3 as a root.
(Hint: Consider the conjugates of α and β. Given these conjugates, what might
the conjugates of α + β or αβ be?)

4. (12.3) Let B be a commutative ring with 1 and A be a subring of B such that


1A = 1B . Using the elementary symmetric polynomials, show that the elements of
B that are integral over A form a ring with 1. This ring with 1 is called the integral
closure of A in B.
(Remark: As in the case of algebraic closure, there is an alternative proof of this
using modules, which is the ring-analogy of linear algebra.)

5. (12.3) Using the elementary symmetric polynomials, show that for any integer n, the sum

∑_{x ∈ Fq×} x^n

is 0 if q − 1 ∤ n and is −1 if q − 1 | n. (A quick numerical sanity check appears
after this problem set.)
(Hint: The roots of t^(q−1) − 1 are exactly the elements of Fq×.)
6. (12.3) Let pk(x1, . . . , xn) = ∑_i xi^k. Show that for every field F with characteristic 0,

F[x1, . . . , xn]^Sn = F[e1, . . . , en] = F[p1, . . . , pn].

Show that this does not hold if F is replaced with a commutative ring with 1 (say,
Z) or a field with nonzero characteristic (say, F2 ).

7. (12.4) For any two elements g, h in G, the commutator of g, h is ghg⁻¹h⁻¹, which
is denoted by [g, h]. A similar definition works for subsets of G. We also inductively
define Gk as [G, Gk−1], where G1 = G. If there exists n ∈ N such that Gn = 1, then
we say that G is nilpotent. Show that any nilpotent group is solvable. Also show
that if G is nilpotent, then for every proper subgroup H of G, the normalizer of H
is strictly larger than H.
(Hint: As an intermediate step, show that for every h there exists k such that
for every g, the expression [[· · · [[g, h], h], · · · , h], h] = 1 where there are k pairs of
brackets.)

8. (12.4) The upper central series of a group G is a series 1 = Z0 ⊆ Z1 ⊆ · · · such
that Zi+1 is the inverse image of the center of G/Zi under the projection G → G/Zi.
Show that for every finite group G,


it is nilpotent if and only if the upper central series ends at G. Using this, show
that every p-group is nilpotent and hence solvable.
(Hint: Since G is finite, the sequence G1 , G2 , · · · and the upper central series both
stabilize.)

9. (12.4) A subgroup of the symmetric group Sn is said to be transitive if for every


i, j ∈ {1, . . . , n} there is an element π in the subgroup such that π(i) = j. Show
that for every prime p, every transitive solvable subgroup of Sp is isomorphic to a
subgroup of the group of linear transformations x ↦ ax + b, a ≠ 0, from Z/pZ to
Z/pZ that contains all the translations x ↦ x + b.
(Hint: First show that a nontrivial normal subgroup of a transitive subgroup of Sp
is still transitive. Then show that if G is abelian and transitive in Sp, then G is
isomorphic to Cp.)

10. (12.5) Suppose that K/F, together with a positive integer n, is a Kummer extension,
and let A be the subgroup of K× consisting of the elements a satisfying a^n ∈ F.
Show that A/F×, A^n/(F×)^n, G = Gal(K/F) and Hom(G, F×) are isomorphic.
Here Hom(G, F×) is the group of homomorphisms from G to F×.

11. (12.6) Show that for every cyclic group Cn , there is a Galois extension of Q whose
Galois group is Cn . Use this construction to explicitly give a field K such that
Gal(K/Q) = C3 .
(Hint: Do the case where n = p − 1 first. To show this for general n, you might
need a special case of Dirichlet’s theorem which states that there are infinitely many
primes of the form nk + 1.)

12. (12.7) Let G be a maximal transitive solvable subgroup of S5 . Show that any
resolvent for G is of degree 6. If the original quintic polynomial is irreducible and
the resolvent is separable, show that the Galois group of the quintic polynomial
is solvable if and only if the resolvent has a root in the base field. This somehow
shows that solving a quintic polynomial is really hard, for we have to be able to find
a root of a degree-6 polynomial just to tell whether a quintic polynomial is solvable.

13. (12.7) Construct irreducible polynomials of degree 4 over Q whose Galois groups
are S4 , A4 , K4 , D4 and C4 .

14. (12.7) Let f be an irreducible polynomial over Q with a real root α. Show that α
is constructible if and only if the order of the Galois group of f is a power of 2.
(Hint: If the order of the Galois group of f is a power of 2, find a subgroup of index
2. If α is constructible, then there is a series of quadratic extensions connecting Q
and Q[α]. Extend this so that the extensions become Galois.)

15. (12.7) Construct an irreducible polynomial f of degree 4 over Q such that it has a
real root that is not constructible.
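Finally, here is the quick numerical sanity check promised in problem 5, done by brute force over F7 in plain Python (no field theory needed):

    # Sum of x^n over the nonzero residues mod q = 7.
    q = 7
    for n in range(1, 13):
        s = sum(pow(a, n, q) for a in range(1, q)) % q
        print(n, s)  # s = 6 (= -1 mod 7) when 6 divides n, and 0 otherwise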
