Algebra

Hung-Hsun, Yu
Preface
This pdf contains algebra-related material that I have learned so far. Since the only algebra classes that I have taken are 18.701 (Algebra I) and 18.702 (Algebra II) at MIT, most of the material in this pdf will come from those classes and the textbook they use. Besides that, there are some other miscellaneous resources that I have consulted (mostly from the net), either prior to the classes or during them to gain a better understanding of algebra, and I will include those in this pdf too.
Since I cannot correctly recall where all the material comes from, and also since this pdf is not that formal, I will not worry much about citations in this pdf. However, if the reader notices that some citation should be made somewhere, I would appreciate it if the reader could inform me of it so that I can add the citation.
Regarding the format of this pdf, it will primarily consist of sketches of important motivations and proofs, since my motivation for writing this pdf is to keep a note of what I have learned. However, I believe that the blanks I have left can be easily filled in by the readers, and I think that the process of completing the proofs can help the readers learn the material better. If the readers encounter some difficulty completing a proof, most of the complete proofs can be found on the net. That said, the readers are still encouraged to spend some time working on their own before consulting Wikipedia or Google. If the readers find some sketch of proof particularly confusing, please tell me so that I can improve it.
Finally, I hope that my notes can help you understand the beauty of math more and give you more motivation to learn more by yourselves :)
Contents

1 Matrix
  1.1 Basic Operations
  1.2 Reduced Row Echelon Form
  1.3 Determinant
  1.4 Cofactor Matrix and Invertibility
  1.5 Random Problem Set

3 Vector Space
  3.1 Field
  3.2 Definitions and Examples
  3.3 Basis
  3.4 Linear Transformation and Matrix
  3.5 Multilinear Alternating Form
  3.6 Change of Basis
  3.7 Rank and Nullity
  3.8 Another Dimension Formula
  3.9 Application: Lagrange Interpolation
  3.10 Random Problem Set

4 Linear Operator
  4.1 Definition and Examples
  4.2 Determinant and Characteristic Polynomial
  4.3 Invariant Subspace and Eigenspace
  4.4 Cayley-Hamilton Theorem

7 Bilinear Form
  7.1 Bilinear Form and Dual Space
  7.2 Start From Standard Inner Product
  7.3 Symmetric Form
  7.4 Hermitian Form
  7.5 Orthogonality
  7.6 Inner Product Space
  7.7 Spectral Theorem
  7.8 Positive Definite and Semi-definite Matrices
  7.9 Application 1: Quadratic Form and Quadric
  7.10 Application 2: Legendre Polynomial
  7.11 Random Problem Set

9 Ring
  9.1 Definitions and Examples
  9.2 Ring Homomorphism
  9.3 Subring and Ideal
  9.4 Integral Domain and Divisibility
  9.5 Ideal and Divisibility
  9.6 Z is PID
  9.7 Noetherian and Existence of Factorization
  9.8 PID is UFD
  9.9 Polynomial Ring
  9.10 Adjoining an element
  9.11 Fraction Field and Localization
  9.12 Nullstellensatz and Algebraic Geometry
  9.13 Random Problem Set
Chapter 1
Matrix
Matrices are a simple but important tool, so we'll investigate several of their properties at the beginning of this note. Although all the properties can be discussed in a more general setting, it never hurts to start with concrete examples that we are familiar with, so the matrices in this chapter will all have complex entries.
If the readers feel comfortable working and computing with matrices, feel free to skip this chapter. Things get more interesting when matrices are associated with linear algebra :D.
1.1 Basic Operations
Definition 1.1.1. An m by n matrix is a two-dimensional array of numbers with m rows and n columns.
There is a convention of denoting matrices by uppercase letters and their entries by the corresponding lowercase letters, so that aij is the entry of A in the i-th row and j-th column. If no confusion will be created, I will use this convention throughout the whole note.
Merely recording a two-dimensional array is quite boring, so let's define some interesting operations on matrices.
Definition 1.1.3. (Matrix addition/subtraction) For two m by n matrices A, B, define
the m by n matrix C = A ± B to be the matrix satisfying cij = aij ± bij for i = 1, . . . , m
and j = 1, . . . , n.
Definition 1.1.4. (Matrix multiplication) For an m by n matrix A and an n by l matrix B, define the m by l matrix C = AB to be the matrix satisfying
\[ c_{ij} = \sum_{k=1}^{n} a_{ik} b_{kj} \]
for i = 1, . . . , m and j = 1, . . . , l.
The definition of addition is really natural, while one might not see right away the intention behind the definition of multiplication. It will become clear after we introduce the concept of linear operators.
I used to have a hard time doing matrix multiplication, and I learned a trick that helped me a lot. To present this trick, let's work through a concrete example.
Example 1.1.1. Compute
\[ \begin{pmatrix} 0 & 1 & 2 \\ 3 & 4 & 5 \end{pmatrix} \begin{pmatrix} 6 & 7 & 8 & 9 \\ 0 & 1 & 2 & 3 \\ 4 & 5 & 6 & 7 \end{pmatrix}. \]
Solution. Step 1. Lift the right matrix upward and put an empty matrix to the right of the left matrix. Now you can see what size the resulting matrix should be!
\[ \begin{array}{cc} & \begin{pmatrix} 6 & 7 & 8 & 9 \\ 0 & 1 & 2 & 3 \\ 4 & 5 & 6 & 7 \end{pmatrix} \\[4pt] \begin{pmatrix} 0 & 1 & 2 \\ 3 & 4 & 5 \end{pmatrix} & \begin{pmatrix} \cdot & \cdot & \cdot & \cdot \\ \cdot & \cdot & \cdot & \cdot \end{pmatrix} \end{array} \]
Step 2. For every blank entry, draw a horizontal line and record the vector that the line intersects. Draw a vertical line and do the same. For example, if we choose the top left entry to begin with, then the resulting vectors are (0, 1, 2) and (6, 0, 4).
Step 3. Calculate the dot product/inner product of the vectors and put it in the entry. For example, the dot product of (0, 1, 2) and (6, 0, 4) is 0 × 6 + 1 × 0 + 2 × 4 = 8, so we put 8 in the top left entry.
\[ \begin{array}{cc} & \begin{pmatrix} 6 & 7 & 8 & 9 \\ 0 & 1 & 2 & 3 \\ 4 & 5 & 6 & 7 \end{pmatrix} \\[4pt] \begin{pmatrix} 0 & 1 & 2 \\ 3 & 4 & 5 \end{pmatrix} & \begin{pmatrix} 8 & \cdot & \cdot & \cdot \\ \cdot & \cdot & \cdot & \cdot \end{pmatrix} \end{array} \]
If we complete Step 2 and Step 3 for every entry, then the result will be
\[ \begin{array}{cc} & \begin{pmatrix} 6 & 7 & 8 & 9 \\ 0 & 1 & 2 & 3 \\ 4 & 5 & 6 & 7 \end{pmatrix} \\[4pt] \begin{pmatrix} 0 & 1 & 2 \\ 3 & 4 & 5 \end{pmatrix} & \begin{pmatrix} 8 & 11 & 14 & 17 \\ 38 & 50 & 62 & 74 \end{pmatrix} \end{array} \]
Therefore we have
\[ \begin{pmatrix} 0 & 1 & 2 \\ 3 & 4 & 5 \end{pmatrix} \begin{pmatrix} 6 & 7 & 8 & 9 \\ 0 & 1 & 2 & 3 \\ 4 & 5 & 6 & 7 \end{pmatrix} = \begin{pmatrix} 8 & 11 & 14 & 17 \\ 38 & 50 & 62 & 74 \end{pmatrix}. \]
■
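For readers who want to experiment, here is a minimal sketch of the row-times-column rule in Python; the function name matmul and the list-of-lists representation are just my own choices for the sketch, not anything from the course:

```python
def matmul(A, B):
    # C = AB, where c_ij is the dot product of the i-th row of A
    # and the j-th column of B, exactly as in the definition above.
    m, n, l = len(A), len(B), len(B[0])
    assert all(len(row) == n for row in A), "inner dimensions must agree"
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(l)]
            for i in range(m)]

A = [[0, 1, 2], [3, 4, 5]]
B = [[6, 7, 8, 9], [0, 1, 2, 3], [4, 5, 6, 7]]
print(matmul(A, B))  # [[8, 11, 14, 17], [38, 50, 62, 74]]
```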
There is a reason that we say that these operations are addition and multiplication:
they satisfy MOST of the properties of addition and multiplication on integers, reals and
complex numbers.
Property 1.1.1. (Associativity and commutativity of matrix addition) For any m by n matrices A, B, C, the following always hold:
(1) A + (B + C) = (A + B) + C;
(2) A + B = B + A.
1.2 Reduced Row Echelon Form
Consider a system of linear equations. The middle-school method tells us that we can apply the following kinds of operations multiple times to get the answer: adding a multiple of one equation to another, multiplying an equation by a nonzero constant, and exchanging two equations.
Note that we can write such a system of linear equations as Ax = b. Therefore we can "translate" the above operations into the language of matrices.
Definition 1.2.1. Define a row operation of a matrix to be one of the three following
operations:
(1) Adding c times the i-th row to the j-th row, where i ̸= j.
(2) Multiplying the i-th row by c, where c ̸= 0.
(3) Exchanging the i-th row and the j-th row, where i ̸= j.
We can apply the middle-school method to reduce any matrix to a specific form by
row operations. This is called the reduced row echelon form.
Definition 1.2.3. We say that a matrix M is in reduced row echelon form if
(1) The first nonzero entry in every nonzero row is 1. This is called a pivot.
(2) The column positions of the pivots are strictly increasing from top to bottom, and zero rows come last.
(3) Every pivot is the only nonzero entry in its column.
Theorem 1.2.1. For any matrix, we can perform finitely many row operations to bring it into reduced row echelon form.
Applying a row operation to a matrix is the same as left multiplying it by the matrix obtained from applying that row operation to the identity matrix; such matrices are called elementary matrices.
Corollary 1.2.1. For any matrix A, there exist elementary matrices E1, . . . , Ek and a matrix A′ in reduced row echelon form such that E1 · · · Ek A = A′.
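As a computational aside, here is a sketch of row reduction in Python; it works over the rationals via fractions.Fraction to avoid floating-point issues, and the function name rref is my own choice:

```python
from fractions import Fraction

def rref(M):
    # Reduce M to reduced row echelon form using only the three
    # row operations of Definition 1.2.1.
    A = [[Fraction(x) for x in row] for row in M]
    rows, cols, r = len(A), len(A[0]), 0
    for c in range(cols):
        pivot = next((i for i in range(r, rows) if A[i][c] != 0), None)
        if pivot is None:
            continue                              # no pivot in this column
        A[r], A[pivot] = A[pivot], A[r]           # (3) exchange two rows
        A[r] = [x / A[r][c] for x in A[r]]        # (2) scale so the pivot is 1
        for i in range(rows):
            if i != r and A[i][c] != 0:           # (1) clear the rest of the column
                A[i] = [a - A[i][c] * b for a, b in zip(A[i], A[r])]
        r += 1
    return A

print(rref([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))
# [[1, 0, -1], [0, 1, 2], [0, 0, 0]]
```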
1.3 Determinant
The determinant is an important attribute of square matrices. Although the definition of the determinant is somewhat weird and ugly, the striking properties that it has will explain why it appears so early in the note.
There are several possible definitions of the determinant, and I feel like the one involving permutations is the easiest. So let's first define what a permutation is.
Definition 1.3.1. A permutation π on n elements s1 , . . . , sn is a bijection between S
and itself, where S = {s1 , . . . , sn }.
Since we can label the n elements, we can see permutations as permuting the indices.
Therefore we usually assume that the permutations act on the set {1, . . . , n}.
Definition 1.3.2. The set Sn is the collection of permutations on the index set [n] :=
{1, . . . , n}.
It is clear that we can produce any permutation by exchanging two elements multiple times. The following property shows that the number of exchanges is not arbitrary:
Property 1.3.1. For any permutation π, if we can represent it as a composite of k transpositions and as a composite of k′ transpositions, then k and k′ have the same parity.
Sketch of Proof. Consider the parity of the number of inversion pairs (i.e. the pairs (i, j) such that i < j and π(i) > π(j)). Every transposition changes the parity of the number of inversion pairs.
Property 1.3.2. If π and σ are both permutations on n elements, then the parity of π ◦ σ is the sum of the parities of π and σ.
Definition 1.3.4. (Determinant) For an n by n matrix A, the determinant of A is
\[ \det(A) = \sum_{\sigma \in S_n} (-1)^{\sigma} \prod_{i=1}^{n} a_{i\sigma(i)}, \]
where (−1)^σ is 1 if σ is even and −1 if σ is odd.
The reason to define the determinant in this way will become clear after we know
about linear algebra.
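To make the formula concrete, here is a brute-force sketch in Python that computes the determinant directly from the permutation sum (it runs in O(n! · n) time, so it is only for playing with small matrices):

```python
from itertools import permutations
from math import prod

def sign(perm):
    # Parity of the number of inversion pairs, as in Property 1.3.1.
    inversions = sum(1 for i in range(len(perm))
                     for j in range(i + 1, len(perm)) if perm[i] > perm[j])
    return -1 if inversions % 2 else 1

def det(A):
    # Sum of sign(sigma) * a_{1,sigma(1)} * ... * a_{n,sigma(n)}
    # over all permutations sigma of {0, ..., n-1}.
    n = len(A)
    return sum(sign(s) * prod(A[i][s[i]] for i in range(n))
               for s in permutations(range(n)))

print(det([[1, 2], [3, 4]]))  # -2
```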
Property 1.3.3. For any n by n matrices A, B and scalar c, we have
(1) det(cA) = c^n det(A);
(2) det(In) = 1;
(3) det(A) = 0 if two rows of A are identical;
(4) det(A^T) = det(A);
(5) det(AB) = det(A) det(B).
The last property is really amazing, but it is not easy (or at least annoying) to prove at this point. I'll prove it when we revisit the determinant in the world of linear algebra.
It is time-consuming to compute the determinant directly from the formula when the matrix is big. However, the multiplicativity of the determinant makes it possible to compute it faster.
First, observe that if the entries below the diagonal are all zero, then the only term that contributes to the determinant is the product of all the diagonal entries. Therefore we can calculate the determinant quickly if we manage to reduce the matrix to an "upper triangular matrix."
Definition 1.3.5. A square matrix is upper triangular if the entries below the diagonal are all zero. A square matrix is lower triangular if the entries above the diagonal are all zero.
Now for any square matrix A, we know from Corollary 1.2.1 that there exist elementary matrices E1, . . . , Ek and a matrix A′ in reduced row echelon form such that E1 · · · Ek A = A′. Therefore det(E1) · · · det(Ek) det(A) = det(A′). Since A′ must be upper triangular (why?), it remains to calculate the determinants of the elementary matrices. This should not be hard and is left to the readers as an exercise.
1.4 Cofactor Matrix and Invertibility
Definition 1.4.1. An n by n matrix A is invertible if there exists an n by n matrix A^{-1} such that AA^{-1} = A^{-1}A = In.
It is not hard to see that if det(A) = 0 then A is not invertible. We will show that the converse is also true. To show this, we first define the cofactor matrix.
Definition 1.4.2. For any n by n matrix A, the cofactor matrix of A, denoted by
cof(A), is the matrix B such that bij = (−1)i+j det(Aij ) for i, j = 1, . . . , n. Here Aij is
the matrix obtained by deleting the i-th row and the j-th column of A.
Theorem 1.4.1. An n by n matrix A is invertible if and only if det(A) ≠ 0. In this case, A^{-1} = cof(A)^T / det(A).
Sketch of Proof. Let B = cof(A)^T. The following lemma (expansion along a row) is helpful:
Lemma 1.4.1. For any i = 1, . . . , n, we have
\[ \det(A) = \sum_{j=1}^{n} a_{ij} (-1)^{i+j} \det(A_{ij}). \]
Using the lemma (and the analogous expansion along columns), one can check that
\[ \sum_{k=1}^{n} b_{ik} a_{kj} \]
is det(A) when i = j, and is the determinant of a matrix with two identical rows/columns when i ≠ j, which is 0 by Property 1.3.3. Therefore BA = det(A) In.
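As a quick sanity check in the 2 by 2 case:
\[ A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}, \qquad \operatorname{cof}(A) = \begin{pmatrix} d & -c \\ -b & a \end{pmatrix}, \qquad \operatorname{cof}(A)^T A = (ad - bc) I_2, \]
which recovers the familiar formula A^{-1} = (1/(ad − bc)) [[d, −b], [−c, a]] when det(A) = ad − bc ≠ 0.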
Definition 1.4.3. GLn(K) is the set of invertible n by n matrices with entries in K (for us, K = R or C for now). This is called the general linear group.
The "GL" here stands for "general linear group." Leave the magical word "group" aside for a while; it is called "general" because the matrices are not degenerate (i.e. the determinant is non-zero), and it is called "linear" because... well... it has something to do with linear algebra.
I don't know if the above explanation makes the name "GL" less mysterious :p.
1. (1.1) Try to find an injection f from C to M2×2 (R) such that f (x)f (y) = f (xy)
and f (x) ∈ GL2 (R) for any nonzero x. Then consider the function det ◦f : C → R.
Is this multiplicative? What function is this?
3. (1.3) Suppose that A is a square matrix with integer entries. Prove that A^{-1} exists and has integer entries if and only if det(A) = ±1.
Chapter 2
Group
Now the trip to the abstract kingdom begins. If you feel lost, try to grab some concrete examples with you as your compass. It might be difficult at first to deal with these abstract objects, but everything will get better once you get used to them.
Groups are the simplest objects in algebra in the sense that they have the least structure. That said, the structure is still somewhat complicated. The most famous example of this is the classification of "simple groups", which is by no means simple.
As an introduction, the properties introduced in this chapter will all be very basic. The readers are encouraged to be aware of what has been done, or what should be considered, after constructing an abstract algebraic object. This will help a lot when encountering other algebraic objects.
Definition 2.1.1. A law of composition on a set S is a map from S × S to S.
One can immediately come up with a lot of examples of laws of composition: addition/subtraction/multiplication on real numbers, division on non-zero real numbers, taking powers on positive reals, etc. However, it is really hard to say anything clever under this weak condition. So we usually assume something (much) stronger:
Definition 2.1.2. Suppose that · is a law of composition on a set G. (G, ·) is said to
be a group if there exists an element e ∈ G such that for any a, b, c ∈ G the following
hold:
(1) (associativity) (a · b) · c = a · (b · c);
(2) a · e = e · a = a;
(3) There exists an element a′ ∈ G such that a · a′ = a′ · a = e.
In this case, e is called the identity of (G, ·).
Example 2.1.1. (Z, +), (Q, +), (Q\{0}, ×), (R, +), (R\{0}, ×), (C, +), (C\{0}, ×),
(GLn (K), ×) are groups. (N, +), (R, ×), (Z, −), (Mn×n , ×) are not groups.
The first condition shows that the parentheses are in fact not necessary, so we will drop them from now on. The second condition says that there must be a neutral element, and the third says that inverses always exist. Note that neither condition says that such elements are unique, but they are in fact already unique by the conditions.
Property 2.1.1. If e and e′ are both identities of G, then e = e′ .
Property 2.1.2. If a′ and a′′ are both inverses of a (i.e. a′a = aa′ = a′′a = aa′′ = e), then a′ = a′′.
A group is abelian if its law of composition is commutative, i.e. ab = ba for all a, b. Most of the examples we have so far are abelian, but there will soon be a lot of non-abelian examples. Since the abelian case is well-studied, most interest also lies in the non-abelian groups.
Usually we drop the · if it does not cause any confusion. Conventionally, when the
group is written additively (so 0 is the identity and −a is the inverse of a), then it
implicitly means that the group is abelian. When the group is written multiplicatively
(so 1 is the identity and a−1 is the inverse of a), then it does not necessarily mean that
the group is non-abelian, nor does it show that it is abelian.
There is an important property of groups before we go on:
Property 2.1.3. (Law of cancellation) If a, b, x are elements in the group G, then
ax = bx implies a = b. Moreover, xa = xb also implies a = b.
2.2 Symmetric Group
Recall from Definition 1.3.2 that Sn is the set of permutations on [n]. It is easy to check that Sn forms a group under composition; this is called the symmetric group.
To work with this group, we hope to calculate the product (or the composition) efficiently. The cycle notation is really helpful in this regard.
Definition 2.2.2. Suppose that i1, . . . , ik ∈ [n] are k different elements. Then the cycle notation (i1 i2 . . . ik) denotes the permutation π such that
\[ \pi(m) = \begin{cases} i_{j+1} & \text{if } m = i_j \\ m & \text{otherwise} \end{cases} \]
where i_{k+1} = i_1.
Example 2.2.1. If σ is a permutation in S5 such that σ = (1 4 3), then the map table of σ is

  i    | 1 2 3 4 5
  σ(i) | 4 2 1 3 5
Example 2.2.2. Let's actually compute the disjoint cycle representation for a concrete example, say the permutation π = 451326 (in one-line notation). We first consider 1: 1 is mapped to 4, 4 is mapped to 3, and 3 is mapped to 1. Since we come back to 1, we can write
π = (1 4 3) · (??).
The next element that has not been considered is 2: 2 is mapped to 5, and 5 is mapped to 2. Therefore,
π = (1 4 3)(2 5) · (??).
The only element left is 6. Since it is mapped to itself, we finally have
π = (1 4 3)(2 5)(6).
Note that cycles of length 1 are in fact the identity. Therefore we can drop (6) and write
π = (1 4 3)(2 5).
Remark. As we evaluate compositions of functions from right to left, we also calculate products of permutations from right to left. This is somewhat counter-intuitive and worthy of caution.
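Here is a small Python sketch of this right-to-left convention and of the disjoint cycle algorithm from Example 2.2.2; representing permutations as 0-indexed tuples is my own choice for the sketch:

```python
def compose(p, q):
    # (p ∘ q)(i) = p(q(i)): apply q first, then p, i.e. right to left.
    return tuple(p[q[i]] for i in range(len(p)))

def cycles(p):
    # Disjoint cycle representation, dropping 1-cycles (fixed points).
    seen, result = set(), []
    for i in range(len(p)):
        if i not in seen and p[i] != i:
            cycle, j = [], i
            while j not in seen:
                seen.add(j)
                cycle.append(j)
                j = p[j]
            result.append(tuple(cycle))
    return result

# pi = 451326 in one-line notation, shifted to 0-indexing:
pi = (3, 4, 0, 2, 1, 5)
print(cycles(pi))  # [(0, 3, 2), (1, 4)], i.e. (1 4 3)(2 5) in 1-indexed notation
```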
Note that S3 has 3! = 6 elements. It can actually be shown that groups of order less than 6 are always abelian, and therefore S3 is the smallest non-abelian group.
The group structure of Sn is fairly complicated, so whenever you need an example of a finite non-abelian group to test a property, the symmetric group is always your best friend.
2.3 Subgroup
Now that we have defined what a group is, we can think about what the "subobjects" and the "morphisms" (functions) are when it comes to groups. In this section we will first define the concept of subobjects in groups.
Definition 2.3.1. Given a group (G, ·), let H be a nonempty subset of G. We say that H is a subgroup of G if (H, ·) is also a group (in particular, H must be closed under ·). This is denoted by (H, ·) ≤ (G, ·), or H ≤ G when there is no ambiguity.
By the definition, it seems that we have to verify all the conditions to say that H is a subgroup of G. However, since we are given a well-behaved operation on G already, things are a bit easier for its subsets.
Property 2.3.1. A nonempty subset H of a group G is a subgroup of G if and only if ab^{-1} ∈ H for any a, b ∈ H.
Note that for every group G, itself and {e} are always subgroups of G. To say
something clever, we are usually interested in the other cases.
Definition 2.3.2. A subgroup H of G is proper if H ̸= G. A subgroup H of G is trivial
if H only contains the identity of G; otherwise, it is nontrivial.
Remark. In some other texts “proper” means “proper and nontrivial.” Be careful with
the definition when you are reading other books.
Definition 2.3.3. For any subset S ⊆ G, let ⟨S⟩, the subgroup generated by S, be the set of all finite products of elements of S and their inverses (together with the identity).
Property 2.3.2. For any S ⊆ G, the set ⟨S⟩ is indeed a subgroup of G that contains S. Moreover, it is the smallest such subgroup in the sense that if H ≤ G and S ⊆ H then ⟨S⟩ ⊆ H.
Example 2.3.1. In (R, +), the subgroup generated by {1} is Z. In (C\{0}, ×), the subgroup generated by R+ ∪ {i} is (R\{0}) ∪ (R\{0})i, i.e. all the nonzero real and nonzero purely imaginary numbers.
Property 2.3.3. For any x ∈ G, we have ⟨x⟩ := ⟨{x}⟩ = {x^i | i ∈ Z}.
Sketch of Proof. By definition the set {x^i | i ∈ Z} is clearly contained in ⟨x⟩. Therefore it suffices to show that the set is indeed a subgroup of G.
Note that for two distinct i, j ∈ Z, the elements x^i, x^j are not necessarily distinct. To understand the structure of ⟨x⟩ better, let's investigate when x^i = x^j.
Definition 2.3.4. For any x ∈ G, the order of x in G is the smallest positive integer n
such that xn = 1. Denote it by ordG (x) (we’ll omit G if it is clear by context). If such n
does not exist, then define ordG (x) to be infinity.
Property 2.3.4. If ordG(x) < ∞, then x^i = x^j if and only if ordG(x) | i − j, and so ⟨x⟩ = {1, x, . . . , x^{ordG(x)−1}}. If ordG(x) = ∞, then x^i = x^j if and only if i = j.
Sketch of Proof. This is clear by the law of cancellation and the minimality of ordG (x).
Definition 2.3.5. If G is a group such that there exists x ∈ G such that G = ⟨x⟩, then
G is a cyclic group. If |G| is finite, denote G by Cn where n = |G|.
Therefore the property is actually saying that ⟨x⟩ is a cyclic group of order ordG (x).
Corollary 2.3.1. If G is a finite group, then every element in G has finite order.
Indeed a stronger version of this corollary holds, and we will see it really soon.
2.4 Homomorphism
Next is to define what type of functions we are going to consider in group theory. Ideally,
the functions should interact with the group structures. Hence the definition:
Definition 2.4.1. Given two groups G and G′ . A map ϕ : G → G′ is a homomorphism
if for any a, b ∈ G the following holds:
ϕ(a)ϕ(b) = ϕ(ab).
Furthermore, if ϕ is injective, then it is a monomorphism; if ϕ is surjective, then it is an
epimorphism; if ϕ is bijective, then it is an isomorphism.
In other words, ϕ is a homomorphism exactly when the following diagram commutes:

    G × G ---(ϕ, ϕ)---> G′ × G′
      |                    |
      ·                    ·
      ↓                    ↓
      G --------ϕ-------> G′
Example 2.4.1. For any group G and subgroup H ≤ G, we can construct a homomor-
phism (or furthermore a monomorphism) f : H → G by sending h ∈ H to h ∈ G. This
is often called the inclusion homomorphism.
Example 2.4.3. Taking determinant on GLn (R) is a homomorphism from GLn (R) to
R\{0}.
Example 2.4.5. If G and G′ are both cyclic groups and |G| = |G′|, then G and G′ are isomorphic. This justifies the notation Cn, because the structure of Cn only depends on n.
2.5 Equivalence Relation
Definition 2.5.1. A binary relation ∼ on a set S is an equivalence relation if it is reflexive (s ∼ s for all s), symmetric (s ∼ s′ implies s′ ∼ s) and transitive (s ∼ s′ and s′ ∼ s′′ imply s ∼ s′′).
Definition 2.5.2. For any set S with an equivalence relation ∼, and for any element s ∈ S, define the equivalence class [s] of s by [s] := {s′ ∈ S | s ∼ s′}.
Property 2.5.1. For any set S with an equivalence relation and s, s′ ∈ S, the sets
[s], [s′ ] are either the same or disjoint.
Definition 2.5.3. For any set S with an equivalence relation ∼, the set S/∼ is the set of equivalence classes. In other words,
S/∼ = {[s] | s ∈ S}.
Corollary 2.5.1. S/∼ is a partition of S such that for any s, s′ ∈ S, they fall in the same part if and only if s ∼ s′.
Example 2.5.2. For any undirected graph G = (V, E), define an equivalence relation
∼ on V such that v ∼ v ′ if and only if v and v ′ are connected. Then V / ∼ is the set of
connected components of G.
Similarly, for any topological space X, we can define an equivalence relation ∼ on X
such that v ∼ v ′ if and only if they are path-connected, i.e. there exists a continuous
function f : [0, 1] → X such that f (0) = v and f (1) = v ′ . Then X/ ∼ is the set of
path-connected components.
2.6 Coset
In this section, we will furthermore explore the interaction between groups and their
subgroups.
Definition 2.6.1. Let H be a subgroup of a group G. For any a ∈ G, define the left
coset of H in G with respect to a to be
aH := {ah|h ∈ H}.
Property 2.6.1. Let H be a subgroup of G. Then we can verify that the binary relation
∼ on G such that a ∼ b ⇔ a−1 b ∈ H is indeed an equivalence relation. In this case, aH
is the equivalence class of a.
Note that by the law of cancellation, we can show that every coset of H has the same size as H. Hence, writing G/H for the set of left cosets of H in G, we have |H||G/H| = |G| if |G| is finite.
Definition 2.6.2. For any subgroup H of G, the index of H in G is |G/H|. This is
usually denoted by [G : H].
Corollary 2.6.3. (Lagrange’s Thm.) For every subgroup H of a finite group G, the
size of H divides the size of G.
To apply Lagrange's theorem, we can investigate the simplest kind of subgroup of a group G: the cyclic subgroup generated by an element x ∈ G. We know that ⟨x⟩ is of order ordG(x), and Lagrange's theorem tells us that |⟨x⟩| divides |G|. As a corollary,
Corollary 2.6.4. If G is a finite group and x ∈ G, then x|G| = 1.
At this point, we are ready to give a group theory proof of Euler’s theorem:
Theorem 2.6.1. (Euler's) For any positive integer n and any a coprime with n,
a^{φ(n)} ≡ 1 (mod n),
where φ(n) is the number of positive integers that are not greater than n and are coprime with n.
Sketch of Proof. Let G be the set {i | 1 ≤ i ≤ n, gcd(i, n) = 1}, and define i · j = k if n | ij − k. By some simple number-theoretic properties, (G, ·) is a group of order φ(n), and so the theorem is a direct corollary of Corollary 2.6.4.
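For a concrete check, take n = 10: the positive integers at most 10 and coprime with 10 are 1, 3, 7, 9, so φ(10) = 4, and indeed 3^4 = 81 ≡ 1 (mod 10).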
2.7 Normal Subgroup and Quotient Group
Given a subgroup H ≤ G, it is natural to ask when G/H inherits a group structure from G, i.e. when the map π : G → G/H sending a to aH is a homomorphism with respect to the operation aH · bH = abH. The answer: exactly when gHg^{-1} = H for all g ∈ G.
Sketch of Proof. Note that if π is a homomorphism from G to G/H, then abH = π(ab) = π(a)π(b) = aH · bH. Therefore it remains to find the condition for this operation to be well-defined, i.e. if aH = a′H and bH = b′H then abH = a′b′H.
Now assume that the map is well-defined. Take b = b′ = a^{-1} and a′ = ah for some h ∈ H; then aha^{-1}H = eH, which shows that aha^{-1} ∈ H. Therefore gHg^{-1} ⊆ H for all g ∈ G. Replacing g with g^{-1}, we get that g^{-1}Hg ⊆ H, and so H ⊆ gHg^{-1}. As a consequence, gHg^{-1} = H for all g ∈ G. One can also show that if gHg^{-1} = H for all g then the operation is well-defined, and this is left as an exercise.
Definition 2.7.1. H is a normal subgroup of G if H is a subgroup of G such that
gHg −1 = H for any g ∈ G. It is often denoted by H ⊴ G. In this case, endow G/H with
a group structure such that aH · bH = abH. This is called a quotient group.
When thinking of the quotient group G/H, we can think of it as making H the identity (or making H vanish). This will be helpful in the next section.
Example 2.7.1. For any n ∈ N, consider the subgroup nZ := {nz | z ∈ Z} in (Z, +). Since (Z, +) is abelian, the subgroup nZ is normal. Therefore we can consider the quotient group Z/nZ. This is the additive group of integers modulo n. Note that |Z/nZ| = n and Z/nZ = ⟨1 + nZ⟩. This shows that Z/nZ ≅ Cn.
2.8 First Isomorphism Theorem
Definition 2.8.1. For a group homomorphism ϕ : G → G′, the kernel of ϕ is ker(ϕ) := {g ∈ G | ϕ(g) = 1_{G′}}.
One can think of the kernel of ϕ as the set of elements that vanish after being mapped by ϕ. Note that, as mentioned before, taking a quotient group amounts to making a normal subgroup vanish. Therefore there might be a chance that G/ker(ϕ) is isomorphic to the image of ϕ. This is indeed the first isomorphism theorem.
Theorem 2.8.1. (First Isomorphism Theorem) Let ϕ : G → G′ be a group homomorphism. Then ker(ϕ) ⊴ G, the image of ϕ is a subgroup of G′, and the image is isomorphic to G/ker(ϕ).
Sketch of Proof. It is easy to verify that ker(ϕ) ⊴ G and that the image of ϕ is a subgroup
of G′ by definition. Now consider the following diagram.
          ϕ
    G ---------> G′
     \          ↗
   π  \        /  ϕ̃
       ↘      /
      G/ker(ϕ)
One can show that there exists a unique homomorphism ϕ̃ that makes this diagram com-
mute. By the definition of ker(ϕ), the homomorphism ϕ̃ is a monomorphism. Therefore
G/ ker(ϕ) is isomorphic to the image of ϕ (or equivalently ϕ̃) via ϕ̃.
Example 2.8.2. Consider the map from GLn(K) to K^× given by taking the determinant. This is an epimorphism, and so GLn(K)/ker(det) ≅ K^×.
Definition 2.8.2. Let SLn (K) be the kernel of taking determinant in GLn (K). In
other words, SLn (K) is the set of n by n matrices with entries in K of determinant 1.
This is called the special linear group.
Example 2.8.3. For any n ≥ 2, let sgn : Sn → {±1} be the map sending even permutations to 1 and odd permutations to −1. This is an epimorphism, and so Sn/ker(sgn) ≅ {±1}.
Definition 2.8.3. Let An be the kernel of sgn. In other words, An is the set of even
permutations in Sn . This is called the alternating group.
Example 2.8.4. Now consider the symmetric group S4. There are three ways to partition [4] into two parts of the same size, i.e. {1, 2} ∪ {3, 4}, {1, 3} ∪ {2, 4} and {1, 4} ∪ {2, 3}. Label these three partitions as P1, P2, P3.
For any π ∈ S4, when π permutes the index set [4], we can see that
π(P1) := {π(1), π(2)} ∪ {π(3), π(4)}
is still a partition. Similarly π(P2), π(P3) are also partitions of [4]. Therefore π permutes the set {P1, P2, P3}, and so we can construct a map ϕ : S4 → S3 such that ϕ(π) is the permutation of the partitions under π. It is clear that it is a homomorphism. Moreover, ϕ((1 2)) = (2 3) and ϕ((1 3)) = (1 3). Since (1 3) and (2 3) generate S3, we have that ϕ is surjective. Therefore S4/ker(ϕ) ≅ S3.
Now let's examine the kernel. Since |S4| = 24 and |S3| = 6, the kernel must be of size 4. Besides the identity, it is clear that (1 2)(3 4), (1 3)(2 4) and (1 4)(2 3) are in the kernel. Therefore
ker(ϕ) = {e, (1 2)(3 4), (1 3)(2 4), (1 4)(2 3)}.
This group is an abelian group of order 4 whose elements have either order 1 or order 2. It is usually denoted by K4 and called the Klein four group.
In short, K4 ⊴ S4 and S4/K4 ≅ S3.
There is really a reason that we chose S4 in this example. The readers are encouraged to try the similar trick on other symmetric groups, and it should fail in most cases. There is a deep reason behind this, and it will soon be clear.
There are two other isomorphism theorems. However, they are just corollaries of the first isomorphism theorem and they are relatively subtle, so I will omit the sketches of their proofs.
Theorem 2.8.2. (Second isomorphism theorem) Suppose that G is a group, N is a normal subgroup of G and H is a subgroup of G. Then HN is a subgroup of G and H ∩ N is a normal subgroup of H. Moreover, HN/N ≅ H/(H ∩ N).
2.9 Direct Product and Semidirect Product
Definition 2.9.1. For two groups G1, G2, the direct product G1 × G2 is the set of pairs {(g1, g2) | g1 ∈ G1, g2 ∈ G2} with componentwise multiplication (g1, g2)(g1′, g2′) = (g1g1′, g2g2′).
Example 2.9.1. K4 ≅ Z/2Z × Z/2Z.
Property 2.9.1. Suppose that G = G1 × G2. Then there are two natural homomorphisms f1 : G1 → G, f2 : G2 → G such that f1(g1) = (g1, 1_{G2}) and f2(g2) = (1_{G1}, g2). These two homomorphisms are injective, and so fi(Gi) ≅ Gi.
It may seem that taking direct products and taking quotient groups are mutually inverse operations. However, this is not the case for groups.
Example 2.9.3. Consider the cyclic group C4 generated by x. Since x^2 has order 2, the subgroup ⟨x^2⟩ is isomorphic to C2. Consider the map f from C4 to itself such that f(y) = y^2. This is a homomorphism because C4 is abelian. Both the kernel and the image of f are ⟨x^2⟩, and so C4/⟨x^2⟩ ≅ ⟨x^2⟩. However, ⟨x^2⟩ × ⟨x^2⟩ ≅ C2 × C2 is not isomorphic to C4 because every element of C2 × C2 has order 1 or 2.
In this case, we can see that it fails because we cannot "put" C2 back into C4 as a subgroup. More precisely, there does not exist a monomorphism g : C2 → C4 such that f ◦ g is the identity map on C2. However, this is not the only potential point of failure.
Example 2.9.4. Consider the symmetric group S3. We can see that ⟨(1 2 3)⟩ is a normal subgroup of S3 and that
S3/⟨(1 2 3)⟩ ≅ ⟨(1 2)⟩.
However, S3 is not isomorphic to ⟨(1 2 3)⟩ × ⟨(1 2)⟩ since the former is non-abelian while
the latter is abelian.
Although this time we can embed the image back into the group perfectly, the direct product still does not give the correct answer. This is because if S3 were isomorphic to ⟨(1 2 3)⟩ × ⟨(1 2)⟩, then (1 2)(1 2 3)(1 2)^{-1} would be (1 2 3). However, the fact that ⟨(1 2 3)⟩ is a normal subgroup only gives that (1 2)(1 2 3)(1 2)^{-1} ∈ ⟨(1 2 3)⟩. In fact,
(1 2)(1 2 3)(1 2)^{-1} = (1 3 2) = (1 2 3)^2.
Let’s first put this aside and describe the situation more precisely.
Property 2.9.3. Suppose that N ⊴ G and f : G → H is an epimorphism with kernel
N . If there exists monomorphism g : H → G such that f ◦ g is the identity map on H,
then g(H) ∩ N = {1G } and G = N g(H).
Definition 2.9.2. Suppose that N and H are two groups and φ : H → Aut(N) is a group homomorphism, where Aut(N) denotes the group of automorphisms of N (isomorphisms from N to itself) under composition. The semidirect product of N and H with respect to φ is the set {(n, h) | n ∈ N, h ∈ H} with the multiplication
(n1, h1)(n2, h2) = (n1 φ(h1)(n2), h1h2).
This is denoted by N ⋊φ H.
Example 2.9.6. Suppose G1, G2 are two groups and φ : G2 → Aut(G1) is the trivial map. Then G1 × G2 = G1 ⋊φ G2.
Example 2.9.7. Consider the cyclic groups Z/3Z and Z/2Z. Let φ : Z/2Z → Aut(Z/3Z) be the homomorphism sending x to the automorphism that multiplies by 2^x. Then Z/3Z ⋊φ Z/2Z is isomorphic to S3.
Property 2.9.5. Suppose that N and H are two groups and φ : H → Aut(N) is a group homomorphism. Then N is a normal subgroup of N ⋊φ H and
(N ⋊φ H)/N ≅ H.
This leads us to determine the group Aut(G) for any group G. This is however a hard problem to tackle. Therefore we will only consider the case where G is a cyclic group.
Property 2.9.6. Aut(Cn) is isomorphic to (Z/nZ)^×, i.e. the multiplicative group of integers coprime with n modulo n. Also, Aut(Z) ≅ C2.
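For instance, Aut(C8) ≅ (Z/8Z)^× = {1, 3, 5, 7}, in which every element squares to 1 (3^2 = 9 ≡ 1, 5^2 = 25 ≡ 1, 7^2 = 49 ≡ 1 (mod 8)), so Aut(C8) ≅ C2 × C2. In particular, the automorphism group of a cyclic group need not be cyclic.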
2.11 Simple Group and Alternating Group
Definition 2.11.1. A nontrivial group G is simple if its only normal subgroups are the trivial group and G itself. For example, Cp is simple for any prime p by Lagrange's theorem.
These groups are simple for obvious reasons. However, there are still tons of other simple groups. In this section, we are going to prove that An is simple for any n ≥ 5.
Theorem 2.11.1. For any n ≥ 5, the alternating group An is simple.
Sketch of Proof. For any nontrivial N ⊴ An, we are going to first prove that N contains a 3-cycle. A lemma will be helpful:
Lemma 2.11.1. For a cycle (i1 i2 . . . ik) and a permutation σ ∈ Sn, we have
σ (i1 i2 . . . ik) σ^{-1} = (σ(i1) σ(i2) . . . σ(ik)).
Now for any permutation π in N that is not the identity, we can divide into three cases:
Case 1. π contains a cycle of length at least 4. Suppose that the cycle is (i1 i2 i3 i4 . . . ik). Then by the lemma,
(i1 i2 i3) (i1 i2 i3 i4 . . . ik) (i1 i2 i3)^{-1} = (i2 i3 i1 i4 . . . ik).
Note that
(i2 i3 i1 i4 . . . ik)^{-1} (i1 i2 i3 i4 . . . ik) = (i1 ik i3).
As a consequence,
(i1 i2 i3) π^{-1} (i1 i2 i3)^{-1} π = (i1 ik i3).
Since N is normal, this is contained in N.
Case 2. π contains cycles of length at most 3 and there is a 2-cycle. Since π is an even permutation, there must be two 2-cycles. Suppose that those are (a b) and (c d). Then
(a b c) (a b)(c d) (a b c)^{-1} = (b c)(a d).
Note that
(b c)^{-1} (a d)^{-1} (a b)(c d) = (a c)(b d).
As a consequence,
(a b c) π^{-1} (a b c)^{-1} π = (a c)(b d).
Since N is normal, this is contained in N. By the fact that n ≥ 5, there exists an element e other than a, b, c, d. Therefore
(a c e) ((a c)(b d)) (a c e)^{-1} · (a c)(b d) = (c e)(b d)(a c)(b d) = (a e c)
is in N.
Case 3. π only contains cycles of length 3 (and fixed points). If it contains only one 3-cycle, then we're done. Otherwise, there exist two 3-cycles (a b c), (d e f) in the disjoint representation of π. By the lemma,
(a b d) ((a b c)(d e f)) (a b d)^{-1} = (b d c)(a e f).
Note that
((b d c)(a e f))^{-1} (a b c)(d e f) = (a c f b d).
As a consequence,
(a b d) π^{-1} (a b d)^{-1} π = (a c f b d)
is in N. This reduces the case to Case 1.
In conclusion, there is always a 3-cycle in N. Now we are going to prove that every 3-cycle belongs to N. This will conclude the proof since 3-cycles generate An.
Suppose that (i j k) ∈ N and p, q are two other indices. For any 3-cycle (i′ j′ k′), pick a permutation σ such that σ(i) = i′, σ(j) = j′, σ(k) = k′. We can assume that σ ∈ An, for if σ is odd, then we just need to pick σ′ = σ(p q) instead. Now by the lemma,
σ (i j k) σ^{-1} = (i′ j′ k′),
which is in N since N is normal. ■

2.12 Random Problem Set
2. (2.2) (Much harder) For every permutation π ∈ Sn , define l(π) to be the number of
cycles in the disjoint cycle representation (including the 1-cycles). Prove that the
minimum number of 2-cycles whose product is π is n − l(π).
3. (2.3) Show that every group of finite and even order has an element of order 2.
4. (2.3) Show that for any a, b ∈ G, we have ordG (ab) = ordG (ba).
5. (2.4) Show that there exists a group G such that there is a proper subgroup H of
G that is isomorphic to G.
6. (2.5) (Kind of tricky) Assume the axiom of choice in this problem. There are people in a line labeled 0, 1, 2, and so on. Each person has a hat with 0 or 1 written on it, but the person does not know what it is. The person with label i can see the hats of all the people having larger labels. Now they are asked to guess the numbers on their hats simultaneously. Show that if they are allowed to have a discussion beforehand, then it is possible to guarantee that all but finitely many people guess correctly.
7. (2.6) (Much much harder) Suppose that G is a group generated by n elements such that every g ∈ G is of order 3. Prove that |G| ≤ 3^{2^n − 1}.
8. (2.6) Given an n × n matrix of light bulbs and switches. For any switch, whenever it is turned on/off, the bulbs at its position and next to it all alternate from on to off or from off to on. Prove that starting from the state that all bulbs are off, the number of possible states that one can reach is a power of 2.
10. (2.7) Suppose that G is a finite group and p is the smallest prime factor of |G|.
Show that every subgroup of G of index p is normal in G.
12. (2.7) For any group G, define the commutator subgroup [G, G] as the subgroup
generated by
{ghg −1 h−1 |g, h ∈ G}.
Show that [G, G] is a normal subgroup of G and G/[G, G] is abelian.
13. (2.7) Let G be a subgroup of GLn (R). Prove that the matrices that are path-
connected to the identity in G form a normal subgroup of G.
14. (2.7) Suppose that M, N are both normal subgroups of G such that M ∩ N = {1}. Show that mn = nm for any m ∈ M and n ∈ N.
15. (2.9) Suppose that G is a group containing two elements a, b such that ordG(a) = n and aba^{-1} = b^k. Prove that ordG(b) | k^n − 1. Conversely, suppose that d is a positive integer such that d | k^n − 1; then there exists a group G and a, b ∈ G such that ordG(a) = n, ordG(b) = d and aba^{-1} = b^k.
Chapter 3
Vector Space
We will leave the group kingdom for a while and jump into a more concrete area: vector spaces. There are a lot more restrictions on a vector space, which makes it easier to handle. These restrictions are reasonable enough, though, that the concept of a vector space appears in a lot of various topics.
Matrices will appear a lot in this chapter, so it will be better to first get familiar with the operations on them and then read this chapter. Also, which is the row and which is the column might confuse you from time to time. If that happens, make sure to think clearly and keep track of the definitions/conventions.
3.1 Field
Before jumping into the definition of vector space, we have to first define what a field is.
Definition 3.1.1. A field is a non-trivial abelian group (F, +) with an additional law of composition × such that:
(1) There is an element 1 ∈ F such that 1 × x = x for any x ∈ F.
(2) For any 0 ≠ x ∈ F there exists x^{-1} ∈ F such that x × x^{-1} = 1.
(3) (commutativity) a × b = b × a for any a, b ∈ F.
(4) (associativity) (a × b) × c = a × (b × c) for any a, b, c ∈ F.
(5) (distributivity) (a + b) × c = a × c + b × c for any a, b, c ∈ F.
Informally, a field is the set that we can do addition, subtraction, multiplication and
division on.
Example 3.1.1. Q, R, C, Z/pZ are fields (where p is a prime). Z is not a field.
Note that in the definition, we don't require 0 to have a multiplicative inverse. This is because it is impossible for 0 to have one.
Property 3.1.1. Suppose that F is a field. Then 0x = 0 for every x ∈ F .
We will talk more about the properties of fields later, but for now it is enough to know the following property about the "characteristic" of a field.
Property 3.1.2. The order of 1 in the additive group (F, +) is either infinite or a prime. (This order is called the characteristic of F, denoted char(F), with the convention that infinite order corresponds to char(F) = 0.)
Sketch of Proof. Suppose that the order of 1 is ab for some a, b ∈ N with a, b > 1. Then a · 1 and b · 1 are not zero in F while their product is, which is impossible since nonzero elements of a field have multiplicative inverses. Here a · 1 means 1 + 1 + · · · + 1 where 1 appears a times.
3.2 Definitions and Examples
Definition 3.2.1. A vector space V over a field F is an abelian group (V, +) together with a scalar multiplication F × V → V such that for any a, b ∈ F and v, w ∈ V:
(1) 1v = v;
(2) (ab)v = a(bv);
(3) (a + b)v = av + bv;
(4) a(v + w) = av + aw.
In this section, we will mostly discuss properties of vector spaces over a general field. However, it is always helpful to take R or C as the field to get a tangible example.
Example 3.2.1. For any positive integer n and field F, the additive group F^n is a vector space over F. The set of polynomials with coefficients in F (or the polynomial ring over F, denoted by F[x]) is also a vector space over F.
As in the case in group theory, we want to define what subobjects and maps we want
to consider when it comes to vector space.
Definition 3.2.2. Suppose that V is a vector space over F and W is a subset of V such that W is also a vector space over F (with the same operations); then W is called a vector subspace of V.
Definition 3.2.3. Suppose that V, V′ are vector spaces over F. A map f : V → V′ is a linear transformation if
f(av + v′) = af(v) + f(v′)
for any a ∈ F and v, v′ ∈ V. Equivalently, f is a linear transformation if
f(v + v′) = f(v) + f(v′),
f(av) = af(v)
for any a ∈ F and v, v′ ∈ V.
Example 3.2.2. For any vector subspace W of a vector space V , the inclusion map is
a linear transformation.
Example 3.2.3. For any x ∈ F , the evaluation map evx : F [x] → F that evaluates the
polynomials at x is a linear transformation.
Definition 3.2.4. For two vector spaces V, V ′ over F , if there exists a bijective linear
transformation from V to V ′ then we say that V is isomorphic to V ′ , or V and V ′ are
isomorphic.
Example 3.2.4. The set F[x]_{≤d} := {f ∈ F[x] | deg f ≤ d} is a vector subspace of F[x]. It is isomorphic to F^{d+1}.
3.3 Basis
In this section, we will introduce the concept of basis. It is so essential that most of the theory in linear algebra depends on the existence (and an appropriate choice) of a basis. Before that, let's first introduce spanning and linearly independent sets.
Definition 3.3.1. Suppose that V is a vector space over F and S ⊆ V is a subset. The
set S is spanning if for every v ∈ V there exist v1 , . . . , vn ∈ S and a1 , . . . , an ∈ F such
that a1 v1 + · · · + an vn = v. In other words, every element can be represented as a linear
combination of elements in S.
Note that although S may be infinite, we only consider finite sums in linear algebra
with the absence of the concept of limit. For example, the set {1, 0.1, 0.01, . . .} is not
spanning in R over Q.
Definition 3.3.2. Suppose that V is a vector space and S ⊆ V is a subset. The set S is linearly independent if whenever a1v1 + · · · + anvn = 0 for distinct v1, . . . , vn ∈ S and a1, . . . , an ∈ F, we must have a1 = · · · = an = 0.
Definition 3.3.3. Suppose that V is a vector space and S ⊆ V is a subset. If S is both spanning and linearly independent, then S is called a basis of V.
Intuitively generating sets are “larger” than linearly independent sets, so bases are
the sets that fall on the borderline between spanning sets and linearly independent sets.
Property 3.3.1. Suppose that {v1, . . . , vn} is a basis of V; then for any v ∈ V there exists a unique tuple (a1, . . . , an) ∈ F^n such that a1v1 + · · · + anvn = v.
Sketch of Proof. The existence follows from the definition. The uniqueness follows by taking the difference of two tuples that result in the same vector in V.
Corollary 3.3.1. If V is a vector space that has an ordered basis β = (v1, . . . , vn) of size n, then V is isomorphic to F^n. Moreover, we can write the isomorphism ϕ : F^n → V explicitly as
ϕ(a1, . . . , an) = a1v1 + · · · + anvn.
This shows that it is easy to work in a vector space that has a basis. In fact, this is
true for finite dimensional vector spaces, whatever this means.
Theorem 3.3.1. Suppose that V is a vector space that has a finite generating set.
Then there is a finite basis of V .
Sketch of Proof. Consider a spanning set of the smallest size; then it is a basis.
Definition 3.3.4. In this case, define the dimension of V to be the size of basis. This
is independent of the choice of basis and is denoted by dim V .
Remark. In fact, if we assume the axiom of choice (which we will in this note), then there always is a basis of a given vector space, regardless of whether it is finite dimensional. This can be shown by Zorn's lemma and is left as an exercise for readers who are acquainted with the axiom of choice. The dimension of an infinite dimensional vector space is then defined as the cardinality of a basis. That said, most of the discussion of linear algebra in this note will be about finite dimensional vector spaces.
Example 3.3.1. The vector space of polynomials over F of degree at most d is of dimension d + 1, for the set {1, x, . . . , x^d} is a basis of it. Note that the set {1, x + 1, x^2 + x + 1, . . . , x^d + · · · + 1} is also a (rather unusual) basis. This suggests that bases are generally not unique.
3.4 Linear Transformation and Matrix
Property 3.4.1. For any linear transformation T : F^n → F^m, there exists a unique m by n matrix A such that
\[ T(e_j) = \sum_{i=1}^{m} a_{ij} e_i \quad \forall j = 1, \ldots, n, \]
where the e's denote the standard basis vectors; in other words, T is given by left multiplication by A.
This property shows that determining a linear transformation is equivalent to determining the values that it takes on a basis.
Since we showed that any finite dimensional vector space is isomorphic to F^n for some n, we can pull this back to the general setting.
Corollary 3.4.1. Suppose that T : V → V′ is a linear transformation and β, β′ are ordered bases of V, V′ of sizes n, m, respectively. Then there is a unique matrix A ∈ M_{m×n}(F) such that
T(βv) = β′Av
for any v ∈ F^n.
Definition 3.4.2. In the previous setting, A is usually denoted by [T]^{β′}_β.
This tells us that linear transformations between finite dimensional vector spaces and
the matrices are essentially the same. From now on, we will frequently exchange these
two ideas at our convenience.
Note that in suitable situations we can define the composition of linear transformations. Ideally the matrix form of the composition should be the product of the matrices. One can also think of this as the reason multiplication is defined the way it is.
Property 3.4.2. Suppose that T1 : V1 → V2 and T2 : V2 → V3 are two linear transformations and α, β, γ are bases of V1, V2, V3, respectively. Then [T2 ◦ T1]^γ_α = [T2]^γ_β [T1]^β_α.
Example 3.4.1. Consider the derivative map d/dx, viewed as a linear transformation from F[x]_{≤3} to F[x]_{≤2}, with bases β = {1, x, x^2, x^3} and β′ = {1, x, x^2}. We compute
d/dx (1) = 0 + 0x + 0x^2,
d/dx (x) = 1 + 0x + 0x^2,
d/dx (x^2) = 0 + 2x + 0x^2,
d/dx (x^3) = 0 + 0x + 3x^2.
Therefore
\[ \left[\frac{d}{dx}\right]^{\beta'}_{\beta} = \begin{pmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 3 \end{pmatrix}. \]
3.5 Multilinear Alternating Form
Definition 3.5.1. Suppose that V and W are vector spaces over F. A map f : V^n → W is a multilinear form if for each i and each fixed choice of v1, . . . , v_{i−1}, v_{i+1}, . . . , vn ∈ V, the map
v ↦ f(v1, . . . , v_{i−1}, v, v_{i+1}, . . . , vn)
is a linear transformation.
Example 3.5.1. The map from F^n to F taking the product of the n coordinates is multilinear. The map f : (R^n)^2 → R such that f(u, v) = u^T v is multilinear (and bilinear in this case, since there are two variables).
Despite the complicated look of this property, it is actually a type of distributivity. When in doubt, try to think of multilinear forms as products.
Definition 3.5.2. A multilinear form f : V^n → W is alternating if f(v1, . . . , vn) = 0 for any v1, . . . , vn ∈ V with vi = vj for some i ≠ j.
Property 3.5.1. If f is an alternating multilinear form, then exchanging two of the arguments negates the value of f.
Sketch of Proof. It suffices to show the bilinear case. Suppose that v, w ∈ V; then we want to show that f(v, w) + f(w, v) = 0. Expanding
0 = f(v + w, v + w) = f(v, v) + f(v, w) + f(w, v) + f(w, w)
and noting that the first and last terms vanish gives the claim.
Remark. It seems that one can take the previous property (antisymmetry) as the definition, which is more intuitive, and recover the current definition as follows:
f(v, v) = −f(v, v) ⟹ 2f(v, v) = 0 ⟹ f(v, v) = 0.
However this fails when char(F) = 2. This is why we take f(v, v) = 0 as the definition.
Example 3.5.2. The map f : (R^3)^3 → R taking the triple product f(u, v, w) = u · (v × w) is multilinear and alternating. More generally, the map (F^n)^n → F taking the determinant of the matrix formed by the n column vectors is multilinear and alternating.
Property 3.5.2. Suppose that V is an n-dimensional vector space with basis v1, . . . , vn and f : V^n → W is a multilinear alternating form. Then for any n by n matrix A,
\[ f\left(\sum_{j=1}^{n} a_{1j} v_j, \ldots, \sum_{j=1}^{n} a_{nj} v_j\right) = \det(A)\, f(v_1, \ldots, v_n). \]
Sketch of Proof. Expanding by multilinearity,
\[ f\left(\sum_{j=1}^{n} a_{1j} v_j, \ldots, \sum_{j=1}^{n} a_{nj} v_j\right) = \sum_{g} \left(\prod_{i=1}^{n} a_{ig(i)}\right) f(v_{g(1)}, \ldots, v_{g(n)}), \]
where the summation runs through all g : [n] → [n]. Note that if g(i) = g(j) for some i ≠ j, then v_{g(i)} = v_{g(j)} and so the term vanishes. Therefore
\[ \sum_{g} \left(\prod_{i=1}^{n} a_{ig(i)}\right) f(v_{g(1)}, \ldots, v_{g(n)}) = \sum_{\sigma \in S_n} \left(\prod_{i=1}^{n} a_{i\sigma(i)}\right) f(v_{\sigma(1)}, \ldots, v_{\sigma(n)}), \]
and by Property 3.5.1, f(v_{σ(1)}, . . . , v_{σ(n)}) = (−1)^σ f(v1, . . . , vn), which gives exactly det(A) f(v1, . . . , vn).
Now we can finally prove the multiplicativity of the determinant (Property 1.3.3 (5)). Let f be a nonzero multilinear alternating form on (F^n)^n, and for a tuple v of n vectors write LA(v) for the tuple obtained by applying A to each vector. Then by the previous property,
det(AB) f(v) = f(L_{AB}(v)) = f(L_A(L_B(v))) = det(A) f(L_B(v)) = det(A) det(B) f(v)
for any v. Now it remains to choose suitable f and v such that f(v) ≠ 0, and this is left as an exercise.
3.6 Change of Basis
Recall that [id_V]^{β′}_β is the matrix translating coordinates with respect to β into coordinates with respect to β′.
Corollary 3.6.1. (Change of basis formula for linear transformations) Suppose that T : V1 → V2 is a linear transformation, α, α′ are two bases of V1 and β, β′ are two bases of V2. Then
[T]^{β′}_{α′} = [id_{V2}]^{β′}_β [T]^β_α [id_{V1}]^α_{α′}.
This shows that changing bases amounts to left multiplying the coordinate system by an invertible matrix, and left (and right) multiplying the transformation matrix by invertible matrices. The converse is also true, i.e. if we perform such multiplications then we can change the bases accordingly.
Property 3.6.2. Suppose that β is a basis of an n-dimensional vector space V over F and P is an invertible n by n matrix. Then β′ = βP^{-1} is also a basis.
This tells us that A is intrinsically the same as QAP^{-1}, when seen as a linear transformation, for any invertible matrices P, Q. We will use this to discover some useful facts later.
3.7 Rank and Nullity
Definition 3.7.1. For a linear transformation T : V1 → V2, the rank rk(T) is the dimension of the image of T, and the nullity nul(T) is the dimension of the kernel of T.
Theorem 3.7.1. Suppose that T : V1 → V2 is a linear transformation of rank r between finite dimensional vector spaces. Then there exist bases α, β of V1, V2 such that [T]^β_α has 1 in the first r diagonal entries and 0 everywhere else.
This is actually a stronger version of the first isomorphism theorem for vector spaces. Moreover, this shows that the only intrinsic property of a linear transformation (besides the dimensions involved) is its rank. The readers are asked to think this through.
Corollary 3.7.1. (Rank-nullity theorem) Suppose that T : V1 → V2 is a linear trans-
formation, then rk(T ) + nul(T ) = dim(V1 ).
Now that we know that linear transformation and matrix are essentially the same, we
can extend the definitions to the matrices.
Definition 3.7.2. The rank of a matrix A, denoted by rk(A), is the rank of the linear
transformation LA . The nullity of a matrix A, denoted by nul(A), is the nullity of the
linear transformation LA .
Property 3.7.3. Suppose that A and B are two matrices such that AB is defined, then
rk(AB) ≤ rk(A) and rk(AB) ≤ rk(B).
Note that the image of LA is simply the vector space spanned by the columns of A.
This is called the column rank of A in this context. A natural question here is: if we
consider the row rank of A (i.e. the rank of AT or the rank of RA ), will it be the same?
The answer is yes.
Property 3.7.4. The row rank and the column rank of a matrix are identical. Therefore there is no need to distinguish the two terminologies.
Sketch of Proof. Suppose that the column rank of A is r. Then we can write A = CR, where C is an m by r matrix whose columns span the column space of A and R is an r by n matrix recording the coefficients. Every row of A is then a linear combination of the r rows of R, so the row rank of A is at most r. Applying the same argument to A^T shows that the column rank of A is at most its row rank. This shows that the row rank of A is exactly r.
Corollary 3.7.3. The rank of an m by n matrix is at most min(m, n). The rank of a linear transformation from an n-dimensional vector space to an m-dimensional vector space is also at most min(m, n).
Definition 3.7.3. A matrix or linear transformation is said to be of full rank if its rank is the maximum possible among the matrices/linear transformations of the same dimensions.
This definition will become useful in the next chapter, where we start to focus on linear operators restricted to subspaces. For now, just remember that such a thing exists.
3.9 Application: Lagrange Interpolation
Fix n distinct elements x1, . . . , xn ∈ F and consider the linear transformation ϕ : F[x]_{≤n−1} → F^n defined by ϕ(f) = (f(x1), . . . , f(xn)). With respect to the basis {1, x, . . . , x^{n−1}} and the standard basis, the matrix of ϕ is
\[ \begin{pmatrix} 1 & x_1 & \cdots & x_1^{n-1} \\ 1 & x_2 & \cdots & x_2^{n-1} \\ \vdots & \vdots & \ddots & \vdots \\ 1 & x_n & \cdots & x_n^{n-1} \end{pmatrix}. \]
This is called the Vandermonde matrix. To see that ϕ is bijective, we only need to show that the matrix is invertible, or equivalently that the determinant of the Vandermonde matrix is nonzero. Indeed, we can compute the determinant explicitly:
Property 3.9.1. The determinant of the Vandermonde matrix is
\[ \prod_{i<j} (x_j - x_i). \]
Sketch of Proof. It is clear that the determinant is a homogeneous polynomial of degree \binom{n}{2} in x1, . . . , xn. Moreover, if we set xi = xj for some i < j, then since there are two identical rows, the determinant is zero. This shows that xj − xi divides the determinant. As a consequence, the determinant is c ∏_{i<j}(xj − xi) for some constant c. It remains to determine c, which is clearly 1.
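For instance, with n = 2 the matrix is \begin{pmatrix} 1 & x_1 \\ 1 & x_2 \end{pmatrix}, whose determinant x_2 − x_1 is exactly the single factor in the product.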
Remark. In fact there is a way to show that ϕ is bijective without calculating the determinant: we can show that ϕ is injective using the fact that if f(xi) = 0 for all i then (x − x1) · · · (x − xn) | f.
In particular, for any a1, . . . , an ∈ F there is a unique polynomial f of degree less than n with f(xi) = ai for all i, and it can be written down explicitly:
\[ f(x) = \sum_{i=1}^{n} a_i \prod_{j \neq i} \frac{x - x_j}{x_i - x_j}. \]
This is the Lagrange interpolation formula.
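To see the formula in action, here is a small Python sketch (the function name lagrange and the representation of points are my own choices); it evaluates the interpolating polynomial over the rationals:

```python
from fractions import Fraction

def lagrange(points):
    # Return a function evaluating the unique polynomial of degree < n
    # passing through the given n points, via the interpolation formula.
    pts = [(Fraction(x), Fraction(a)) for x, a in points]
    def f(x):
        x, total = Fraction(x), Fraction(0)
        for i, (xi, ai) in enumerate(pts):
            term = ai
            for j, (xj, _) in enumerate(pts):
                if j != i:
                    term *= (x - xj) / (xi - xj)
            total += term
        return total
    return f

f = lagrange([(0, 1), (1, 2), (2, 5)])  # interpolates f(x) = x^2 + 1
print(f(3))  # 10
```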
3.10 Random Problem Set
2. (3.2) Consider the Euclidean space R^3. This is a vector space over R. For any v ∈ R^3 we can consider the map fv : R^3 → R^3 such that fv(u) = u × v for any u ∈ R^3. Show that fv is a linear transformation but is not bijective.
3. (3.3) For every finite dimensional vector space V and a vector subspace W of V ,
show that dim W ≤ dim V . Moreover, show that the equality holds if and only if
V = W.
4. (3.4) Choose your favorite prime p that is not too large. Let β = {1, x, . . . , x^{p−1}} be a basis of F_p[x]_{≤p−1} and β′ be the standard basis of F_p^p. Let T : F_p[x]_{≤p−1} → F_p^p be the linear transformation such that T(f) = (f(0), f(1), . . . , f(p − 1)). Compute [T]^{β′}_β and deduce that T is an isomorphism.
6. (3.6) Compute the order of the group GLn(F_p) by considering the number of (ordered) bases of F_p^n.
7. (3.7) Prove that for any m by n matrix A, we can decompose it into the sum of
rk(A) rank-one matrices, and that this is the least possible. This is called the
rank-one decomposition of A.
8. (3.7) Show that the rank of a matrix is the number of pivots of its reduced row
echelon form. Now consider Problem 2 in Chapter 1 again.
Chapter 4
Linear Operator
In the previous chapter we have seen that the only intrinsic property of a linear transformation is its rank. However, if the domain and the range of a linear transformation are the same vector space, then we usually insist that the bases we choose for the domain and the range be the same. As a consequence, the argument in the previous chapter does not work any more, and we will actually see that in this case the linear transformation possesses many more intrinsic properties. This kind of linear transformation is called a linear operator.
In this chapter we will develop the theory of linear operators. The main result of this chapter will be the Jordan canonical form. There is in fact a more general result, i.e. the rational form. This is nonetheless hard to prove at this point, and I will put it off until we learn about module theory.
4.1 Definition and Examples
Definition 4.1.1. A linear operator on a vector space V is a linear transformation from V to itself.
Since now the domain and the range are the same, we just need to choose one basis.
Definition 4.1.2. Suppose that T is a linear operator on a finite dimensional vector space V with a basis β; then [T]β is the matrix [T]^β_β.
Property 4.1.1. (Change of basis formula for linear operators) Suppose that T is a linear operator on an n-dimensional vector space V and β, β′ are two bases of V. Then
[T]_{β′} = [id]^{β′}_β [T]_β [id]^β_{β′},
or equivalently,
[T]_{β′} = [id]^{β′}_β [T]_β ([id]^{β′}_β)^{-1}.
Conversely, if A is the matrix form of T with respect to some basis of V and Q is an n by n invertible matrix, then there exists a basis of V such that the matrix form of T with respect to that basis is QAQ^{-1}.
Definition 4.1.3. Suppose that A, B are two n by n matrices such that there exists Q ∈ GLn(F) with B = QAQ^{-1}; then we say that A and B are similar.
Example 4.1.1. Consider the rotation rθ on R^2 by angle θ, i.e. the operator whose matrix with respect to the standard basis is
\[ \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}. \]
For any θ we can see that rθ is of full rank. However, the rθ are not all intrinsically the same. In particular, r0 is the identity, while rπ is the reflection about the origin.
From this example we see that the rank alone does not determine the properties of a linear operator. From now on, we will be dedicated to finding other invariants of linear operators.
4.2 Determinant and Characteristic Polynomial
Definition 4.2.1. The determinant of a linear operator T on a finite dimensional vector space is the determinant of any matrix form of T. This is well-defined by the following observation.
Sketch of Proof. Suppose that A, B are two matrix forms of T; then A and B are similar. Therefore they have the same determinant (by the multiplicativity of the determinant).
Now we can take full advantage of the determinant to find another invariant:
Definition 4.2.2. If T is a linear operator on an n-dimensional vector space, then the determinant of (x · id − T), a polynomial in x of degree n, is called the characteristic polynomial of T. This is denoted by charT(x). Similarly, if A is an n by n matrix, then charA(x) is defined to be det(xIn − A).
Property 4.2.2. If A, B are similar square matrices, then charA (x) = charB (x).
Note that the coefficient of x^{n−1} in charA(x) is the negative of the sum of the diagonal entries of A. Since the characteristic polynomial of a linear operator T does not depend on the basis, the sum of the diagonal entries of its matrix form is also independent of the basis. This is another important invariant that is worth a name.
Definition 4.2.3. If A is a square matrix, then the trace of A is the sum of its diagonal
entries. This is denoted by tr(A). If T is a finite dimensional linear operator, then tr(T )
is the trace of any matrix form of T .
We already know from the property of the characteristic polynomial that two similar matrices have the same trace. In fact, this holds in a slightly more general situation.
Property 4.2.3. Suppose that A, B are two square matrices of the same size. Then
tr(AB) = tr(BA).
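The proof is a one-line computation from the definition of matrix multiplication:
\[ \operatorname{tr}(AB) = \sum_{i} (AB)_{ii} = \sum_{i} \sum_{j} a_{ij} b_{ji} = \sum_{j} \sum_{i} b_{ji} a_{ij} = \sum_{j} (BA)_{jj} = \operatorname{tr}(BA). \]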
Now that we know that the characteristic polynomial is also an intrinsic property of a linear operator, we might naturally hope that the characteristic polynomial and the rank determine the linear operator uniquely. This is however not true.
Example 4.2.1. Consider the two matrices
\[ \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}, \quad \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}. \]
They both have characteristic polynomial (x − 1)^2 and rank 2. However, they are not similar, since every vector is fixed by the first one but not by the second one.
4.3 Invariant Subspace and Eigenspace
Definition 4.3.1. Suppose that T is a linear operator on V. A subspace W of V is T-invariant if T(W) ⊆ W.
The intuition for why this might be useful is that if we can break V into pieces of invariant subspaces, then we can kind of "diagonalize" the linear operator.
Property 4.3.1. Suppose that T is a linear operator on a finite dimensional vector space V. If V1, V2, . . . , Vk are T-invariant subspaces such that V = V1 ⊕ · · · ⊕ Vk, then there exists a basis of V such that the matrix form of T is of the form
\[ \begin{pmatrix} A_1 & & & \\ & A_2 & & \\ & & \ddots & \\ & & & A_k \end{pmatrix}, \]
where Ai is a dim(Vi) by dim(Vi) matrix.
Definition 4.3.2. A linear operator is diagonalizable if there is a basis such that its
matrix form is a diagonal matrix. Similarly, a square matrix A is diagonalizable if
there exists an invertible square matrix Q such that $QAQ^{-1}$ is diagonal.
Naturally our hope now is that Eλ1 ⊕· · ·⊕Eλk = V . This is nonetheless not necessarily
true.
Example 4.3.1. Let's consider the rotation on R² again. The characteristic polynomial
of $r_\theta$ is $x^2 - 2\cos\theta\, x + 1$. Since this polynomial has no real roots (unless θ = 0, π), $r_\theta$ has
no eigenvalues. It does not even have any nontrivial proper invariant subspace in R².
This issue can be easily solved if we extend this linear operator to C² over C. In this
case, there are two eigenvalues: $e^{i\theta}$ and $e^{-i\theta}$. After some simple calculation, we can see
that $r_\theta$ can be diagonalized:
$r_\theta = \begin{pmatrix} 1 & 1 \\ i & -i \end{pmatrix}\begin{pmatrix} e^{-i\theta} & 0 \\ 0 & e^{i\theta} \end{pmatrix}\begin{pmatrix} 1 & 1 \\ i & -i \end{pmatrix}^{-1}.$
Therefore we would like to make the assumption that the characteristic polynomial splits
in F . If it does not, we can simply take its “algebraic closure” to try to diagonalize it.
Example 4.3.2. This is however not the only issue that one might encounter. Consider
the matrix
$\begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}.$
Its characteristic polynomial is $(x-1)^2$, which splits completely. However, $E_1$ is of dimension
1 and there is just no way to fix it.
From this example we can see that we also care about the dimensions of the eigenspaces.
Property 4.3.5. If λ is an eigenvalue of T of multiplicity m (as a root of the characteristic polynomial), then
$1 \le \dim(E_\lambda) \le m.$
From this, we see that a linear operator/matrix whose characteristic polynomial splits is
diagonalizable if and only if every eigenspace achieves its maximum possible dimension.
Corollary 4.3.3. If a linear operator/matrix has a splitting characteristic polynomial
whose roots are distinct, then it is diagonalizable.
It nonetheless does not work, and as far as the author is concerned, there is no easy way
to fix this proof. The readers are encouraged to point out the abuse of notation/flaw of
logic here.
Sketch of Proof. Suppose that the transpose of the cofactor matrix of $tI_n - A$ is $B(t)$.
Then $B(t)(tI_n - A) = \mathrm{char}_A(t)\, I_n$. It is clear that every entry of $B(t)$ is a polynomial in
t with degree at most n − 1. Therefore, we can write
$B(t) = \sum_{i=0}^{n-1} B_i t^i.$
Suppose that
$\mathrm{char}_A(t) = \sum_{i=0}^{n} p_i t^i,$
Definition 4.4.1. For a matrix A (or a linear operator on a finite dimensional vector
space), the minimal polynomial of A is the polynomial $f(x) \in F[x]$ with leading coefficient
1 of the smallest degree such that $f(A) = 0$.
Note how powerful the Cayley-Hamilton theorem is: if we directly use the fact that
an n by n matrix A lives in an n²-dimensional vector space, then we can only get that
the degree of the minimal polynomial is at most n². However, the Cayley-Hamilton theorem
not only tells us that the degree of the minimal polynomial is at most n, but also tells us
that the minimal polynomial divides the characteristic polynomial.
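As an illustration (my own addition, not part of the course material), one can verify the Cayley-Hamilton theorem numerically: plugging a matrix into its own characteristic polynomial yields the zero matrix.

    import numpy as np

    rng = np.random.default_rng(1)
    A = rng.standard_normal((3, 3))
    p = np.poly(A)  # coefficients of char_A(x), highest degree first

    # Evaluate char_A(A) by Horner's scheme: result <- result * A + c * I.
    result = np.zeros_like(A)
    for c in p:
        result = result @ A + c * np.eye(3)

    print(np.allclose(result, 0))  # True, up to floating point error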
Note that with the concept of the minimal polynomial, we can now differentiate the
matrices
$\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}, \qquad \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}.$
The first one has minimal polynomial x − 1, while the second one has minimal polynomial
$(x-1)^2$. Although it is generally not true that the minimal polynomial alone determines
a linear operator up to change of basis, this somehow leads us to consider the polynomials
and "where they vanish."
Property 4.4.1. Suppose that T is a linear operator and f ∈ F [x], then ker f (T ) is
T -invariant.
Sketch of Proof. (Here $l = \mathrm{lcm}(f, g)$ and $d = \gcd(f, g)$.) It is easy to see that
$\ker f(T) + \ker g(T) \subseteq \ker l(T)$ and $\ker d(T) \subseteq \ker f(T) \cap \ker g(T)$. To show the opposite
direction, we can apply Bézout's identity to show that there exist $a, b \in F[x]$ such that
$af + bg = d$. Therefore for every $v \in \ker l(T)$,
$v = \frac{bg}{d}(T)v + \frac{af}{d}(T)v \in \ker f(T) + \ker g(T).$
$f = g_1 g_2 \cdots g_k$, then
$V = \ker(T - \lambda_1\,\mathrm{id})^{m_1} \oplus \cdots \oplus \ker(T - \lambda_k\,\mathrm{id})^{m_k}.$
This shows that when the characteristic polynomial splits, the whole space is the
direct sum of the generalized eigenspaces. Therefore we just need to see how T can act
on the generalized eigenspaces.
If we can choose a good basis for any nilpotent linear operator, then we can express
any linear operator in a really good form.
Theorem 4.6.1. If T is a nilpotent linear operator on a finite dimensional vector space V,
then V can be decomposed into a direct sum of T-invariant subspaces $V_1, \ldots, V_k$ such that for
every $V_i$ we can choose a basis $\{\beta_{i1}, \ldots, \beta_{it}\}$ of $V_i$ with $T\beta_{i1} = 0$ and $T\beta_{i(j+1)} = \beta_{ij}$.
Sketch of Proof. Suppose that m is the smallest positive integer such that $T^m = 0$ and
that $\beta_{1m}, \ldots, \beta_{l_m m}$ extend a basis of $\ker T^{m-1}$ to one of $\ker T^m$. Take $\beta_{i(m-1)} = T\beta_{im}$ for
$i = 1, \ldots, l_m$ and suppose that $\beta_{(l_m+1)(m-1)}, \ldots, \beta_{l_{m-1}(m-1)}$ extend the basis of $\ker T^{m-2}$
together with $\{\beta_{1(m-1)}, \ldots, \beta_{l_m(m-1)}\}$ to one of $\ker T^{m-1}$. Continue this operation and we're
done. To put it formally, one can show it by induction.
Similarly, suppose that A is a square matrix; then there exists an invertible matrix Q such
that $QAQ^{-1}$ is of the above form.
Now, combining this with the fact that the vector space can be decomposed into the $\ker(T - \lambda_i\,\mathrm{id})^{m_i}$,
we get that
Theorem 4.6.2. Suppose that T is a linear operator on a finite dimensional vector
space whose characteristic polynomial splits; then there exists a choice of basis such that
the matrix form of T is
$\begin{pmatrix} J_1 & & & \\ & J_2 & & \\ & & \ddots & \\ & & & J_k \end{pmatrix}$
where each square block $J_i$ is of the form
$\begin{pmatrix} \lambda_i & 1 & & & \\ & \lambda_i & 1 & & \\ & & \ddots & \ddots & \\ & & & \lambda_i & 1 \\ & & & & \lambda_i \end{pmatrix}.$
Definition 4.6.2. The blocks are called Jordan blocks. The matrix form made up of
Jordan blocks is said to be in Jordan canonical form.
Remark. Actually it is easy to show that Jordan canonical form of linear opera-
tors/matrices is unique up to a permutation of Jordan blocks. This is fairly trivial and
is left as an exercise.
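For readers who want to experiment, sympy can compute Jordan canonical forms; the following sketch (my own addition; the matrix is a standard textbook example, not one from these notes) checks that A = PJP⁻¹:

    from sympy import Matrix

    A = Matrix([[5, 4, 2, 1],
                [0, 1, -1, -1],
                [-1, -1, 3, 0],
                [1, 1, -1, 2]])
    P, J = A.jordan_form()        # J is block diagonal with Jordan blocks
    print(J)                      # blocks for eigenvalues 1, 2 and a 2-by-2 block for 4
    print(A == P * J * P.inv())   # True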
However, this does not suffice when we want to investigate the relation between two
linear operators; moreover, it does not work if we don't want to extend the field to
its algebraic closure. For the former issue we basically have different tricks for different
situations. For the latter, we can in fact develop another canonical form, called the
rational form. We will get into this after we introduce the concept of modules.
where $a_{ij} \in \mathbb{C}$ and the $x_i$ ($1 \le i \le n$) are differentiable functions from $\mathbb{R}$ to $\mathbb{C}$. We can also write
it as $x'(t) = Ax(t)$ where $A \in M_{n\times n}(\mathbb{C})$ and x is a differentiable function from $\mathbb{R}$ to $\mathbb{C}^n$.
We can first investigate the easiest case where n = 1. In this case, we have $x'(t) = ax$,
which shows that
$at = \int_0^t a\,ds = \int_0^t \frac{x'(s)}{x(s)}\,ds = \int_{x(0)}^{x(t)} \frac{dx}{x} = \log x(t) - \log x(0),$
so $x(t) = x(0)e^{at}$.
When A is not diagonalizable, we can instead take Q such that $QAQ^{-1}$ is in Jordan
canonical form. Therefore the problem is reduced to calculating the solution y to $y' = Jy$
where J is a Jordan block.
Lemma 4.7.1. Suppose that J is an n by n Jordan block corresponding to an eigenvalue
λ and $y' = Jy$; then for any i, $y_i$ is of the form $P_i(t)e^{\lambda t}$ where $P_i$ is a polynomial of degree
at most n − i. Moreover, the space of solutions has dimension n.
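Concretely, the solution with initial value y(0) = y₀ is y(t) = exp(tJ)y₀, and the entries of exp(tJ) are exactly polynomials times $e^{\lambda t}$. A quick numerical check (my own addition, using scipy):

    import numpy as np
    from scipy.linalg import expm

    lam, t = 2.0, 0.7
    J = np.array([[lam, 1.0, 0.0],
                  [0.0, lam, 1.0],
                  [0.0, 0.0, lam]])   # a 3 by 3 Jordan block

    # exp(tJ) = e^{lam*t} * [[1, t, t^2/2], [0, 1, t], [0, 0, 1]]
    expected = np.exp(lam * t) * np.array([[1, t, t**2 / 2],
                                           [0, 1, t],
                                           [0, 0, 1]])
    print(np.allclose(expm(t * J), expected))  # True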
If there are only finitely many states, then we can use a matrix to record the change
of distribution at each step.
Definition 4.8.1. Suppose that there are n states $X_1, \ldots, X_n$ in a Markov process,
and the probability of the state $X_j$ transitioning to $X_i$ is $m_{ij}$; then the matrix $M = (m_{ij})$
is the transition matrix corresponding to this Markov process.
Example 4.8.2. Suppose that there is an ant on a simple graph G with n vertices
$v_1, \ldots, v_n$. At each step, the ant picks a neighbor uniformly at random and moves to the
chosen vertex. Then this is a Markov process, and the transition matrix M satisfies
$m_{ij} = \begin{cases} 0 & \text{if } v_i, v_j \text{ are not neighbors} \\ \frac{1}{d_j} & \text{if } v_i, v_j \text{ are neighbors,} \end{cases}$
where $d_j$ is the degree of $v_j$. This is called a random walk on G.
Property 4.8.1. If M is a transition matrix, then every column sums to 1, and every
entry is non-negative. Moreover, suppose that at the i-th step the distribution is p (i.e.
the probability of the process being at state $X_i$ is $p_i$); then the distribution at step i + 1
is Mp.
In this kind of problem, we are usually interested in the probability distribution after
a long time. That is, given an initial distribution π, we want to calculate $\lim_{k\to\infty} M^k\pi$.
Since this is just some convex combination of the columns of $\lim_{k\to\infty} M^k$, it suffices to
calculate $\lim_{k\to\infty} M^k$.
Lemma 4.8.1. Suppose that M is a matrix whose entries are non-negative and whose
columns all sum to 1. Then the eigenvalue of M with the largest absolute value is 1.
Moreover, if for any proper nonempty subset S of [n], there exists i ∈ S, j ̸∈ S such that
mij ̸= 0, then the eigenvector of M corresponding to 1 is unique (up to scaling).
Sketch of Proof. It suffices to prove the statement for $M^T$. It is clear that 1 is an eigenvalue
since $M^T \mathbf{1} = \mathbf{1}$ where $\mathbf{1}$ is the all-one vector. Suppose that λ is an eigenvalue of
$M^T$ with eigenvector v, and suppose that $v_i$ has the largest absolute value among all the
entries of v; then
$|\lambda||v_i| = |\lambda v_i| = \Big|\sum_{j=1}^n m_{ji} v_j\Big| \le \sum_{j=1}^n m_{ji}|v_j| \le \sum_{j=1}^n m_{ji}|v_i| = |v_i|.$
Therefore $|\lambda| \le 1$.
Now suppose that for any proper nonempty subset S of [n] there exist i ∈ S, j ∉ S such
that $m_{ij} \ne 0$, and suppose that $M^T v = v$. Let S be the set of indices at which |v| attains
its maximum. Since the equality above must hold for any i ∈ S, we have for each k either
$m_{ki} = 0$ or $v_k = v_i$; hence S cannot be a proper nonempty subset of [n], for otherwise
this is a contradiction.
Lemma 4.8.2. Suppose that M is a matrix whose entries are non-negative and whose
columns all sum to 1. Then $E_1' = E_1$, i.e. the generalized eigenspace of 1 coincides with the eigenspace.
Sketch of Proof. Suppose not; then $\|(M^T)^k\|_\infty \to \infty$ as $k \to \infty$, where $\|(M^T)^k\|_\infty$ is the
largest absolute value of the entries of $(M^T)^k$. However, we have $(M^T)^k \mathbf{1} = \mathbf{1}$, which is
a contradiction since the entries of $(M^T)^k$ are non-negative, so they are all at most 1.
Theorem 4.8.1. Suppose that M is a transition matrix corresponding to a Markov
process, and every eigenvalue of M with absolute value 1 is 1; then $\lim_{k\to\infty} M^k$ exists.
Moreover, the columns of $\lim_{k\to\infty} M^k$ are eigenvectors of M corresponding to 1.
Sketch of Proof. Consider the Jordan canonical form of M. Every block either has diagonal
entries with absolute value less than 1 or is a single 1. In the first case, its powers
tend to 0. In the second case, its powers are always 1. Therefore $\lim_{k\to\infty} M^k$ exists. It is
then clear that every column of the limit is an eigenvector corresponding to 1.
Corollary 4.8.1. Suppose that M is a transition matrix corresponding to a Markov
process, every eigenvalue of M with absolute value 1 is 1, and the eigenvector corresponding
to 1 is unique up to scaling; then the distribution tends to that eigenvector (normalized so
that its entries sum to 1) no matter what the initial distribution is.
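As an illustration (my own addition), here is the random walk of Example 4.8.2 on a triangle, computed numerically; the distribution indeed tends to the uniform distribution, which is the 1-eigenvector:

    import numpy as np

    # Random walk on a triangle: from each vertex the ant moves to one of
    # its two neighbors with probability 1/2.
    M = np.array([[0.0, 0.5, 0.5],
                  [0.5, 0.0, 0.5],
                  [0.5, 0.5, 0.0]])

    pi = np.array([1.0, 0.0, 0.0])            # start at vertex 1
    limit = np.linalg.matrix_power(M, 50) @ pi
    print(limit)                              # approximately [1/3, 1/3, 1/3]
    print(np.allclose(M @ limit, limit))      # the limit is fixed by M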
Sketch of Proof. One can prove that M has an eigenvalue λ ̸= 1 such that |λ| = 1 if and
only if G is bipartite (in fact λ = −1 in this case), and dim E1 > 1 if and only if G is not
connected.
Corollary 4.8.3. (Random walk on directed graph) If a directed graph G is aperiodic
(i.e. the gcd of lengths of cycles is 1) and strongly connected, then the probability
distribution tends to a unique distribution.
Sketch of Proof. One can prove that M has an eigenvalue λ ̸= 1 such that |λ| = 1 if and
only if G is periodic (in fact λ is a primitive k-th root of unity if and only if G has period
k), and dim E1 > 1 if and only if G is not strongly connected.
Definition 4.9.2. For a matrix A / a vector v, |A| / |v| is the matrix/vector obtained by
taking absolute values entry-wise.
Definition 4.9.3. The spectral radius of a square matrix A is the largest absolute value
of an eigenvalue of A. This is denoted by ρ(A).
Theorem 4.9.1. (Perron’s Theorem) Suppose that A > 0 is a square matrix, then:
(1) ρ(A) is an eigenvalue of A of multiplicity 1;
(2) If λ is an eigenvalue of A such that |λ| = ρ(A), then λ = ρ(A);
(3) The eigenvector corresponding to ρ(A) is positive;
(4) If v is an eigenvector that is non-negative, then Av = ρ(A)v.
Sketch of Proof. Suppose that λ is an eigenvalue of A such that |λ| = ρ(A) and v is the
corresponding eigenvector; then, entry-wise,
$\rho(A)|v| = |\lambda v| = |Av| \le A|v|.$
Suppose for the sake of contradiction that the equality does not hold; then $u := A|v| -
\rho(A)|v| \ge 0$ and $u \ne 0$. Since $Au > 0$, there exists ε > 0 such that $Au > \varepsilon\rho(A)A|v|$. This
shows that
$A^2|v| - A\rho(A)|v| > \varepsilon\rho(A)A|v|,$
or equivalently,
$\frac{A}{(1+\varepsilon)\rho(A)}\,A|v| > A|v|.$
Let $B = (1+\varepsilon)^{-1}\rho(A)^{-1}A$; then $B(A|v|) > A|v|$. As a consequence, $B^k(A|v|) > A|v|$
for any $k \in \mathbb{N}$. However, $\rho(B) = (1+\varepsilon)^{-1} < 1$ implies that $\lim_{k\to\infty} B^k = 0$. Therefore
0 > A|v|, which is a contradiction. Therefore 0 < A|v| = ρ(A)|v|, and so ρ(A) is an
eigenvalue and the corresponding eigenvector |v| is positive. Moreover, since the equality
must hold, we have that arg(v1 ), . . . , arg(vn ) are the same. This implies that λ = ρ(A).
Now if v ≥ 0 is an eigenvector of A such that Av = λv for some λ, suppose that w
is the eigenvector of AT corresponding to ρ(A). Then λwT v = wT Av = ρ(A)wT v, which
implies that λ = ρ(A).
It remains to show that ρ(A) is of multiplicity 1. We begin by showing that
$\dim E_{\rho(A)} = 1$. Suppose that u, v > 0 satisfy $Au = \rho(A)u$ and $Av = \rho(A)v$; then $A(u - cv) =
\rho(A)(u - cv)$. Pick c such that one entry of u − cv is zero and the remaining ones are
non-negative; then $\rho(A)(u - cv) = A(u - cv) > 0$ if $u - cv \ne 0$, which is impossible.
Therefore u = cv and so $\dim E_{\rho(A)} = 1$.
Finally, we want to show that $\dim E'_{\rho(A)} = 1$. To show this, we first notice that
if $\dim E'_{\rho(A)} > 1$ then $\|C^k\|_\infty \to \infty$, where $C = \rho(A)^{-1}A$ and $\|C^k\|_\infty$ is the largest
absolute value of the entries of $C^k$. Since $\rho(C) = 1$, there is a positive vector x such
that $Cx = x$. Therefore $C^k x = x$ for any $k \in \mathbb{N}$, which keeps the entries of $C^k$ bounded, a
contradiction. Therefore $\dim E'_{\rho(A)} = 1$.
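A quick numerical illustration of Perron's theorem (my own addition; the matrix is an arbitrary positive example):

    import numpy as np

    A = np.array([[1.0, 2.0],
                  [3.0, 1.0]])          # all entries positive
    eigvals, eigvecs = np.linalg.eig(A)
    i = np.argmax(np.abs(eigvals))
    rho, v = eigvals[i].real, eigvecs[:, i].real
    v = v / v.sum()                     # scale so that the entries are positive
    print(rho)                          # the spectral radius, here 1 + sqrt(6)
    print(v)                            # both entries positive
    print(np.allclose(A @ v, rho * v))  # True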
Note that, because of the possible presence of zero entries, the uniqueness and the positivity
can both fail for general non-negative matrices. Frobenius found that most of the uniqueness
and positivity still holds for a certain class of non-negative matrices.
Definition 4.9.4. A non-negative square matrix A is said to be irreducible if there
does not exist a permutation matrix P such that $P^{-1}AP$ is of the form
$\begin{pmatrix} X & Y \\ 0 & Z \end{pmatrix}$
where X and Z are square blocks.
Sketch of Proof. Let G be a directed graph such that there is an edge from vi to vj if and
only if aji is positive. Then the three conditions are all equivalent to that G is strongly
connected.
Sketch of Proof. Let $B = (I_n + A)^{n-1}$; then by the Jordan canonical form it is clear that
the eigenvalues of B are $(1+\lambda)^{n-1}$ where λ ranges over the eigenvalues of A. Therefore
$\rho(B) = (1 + \rho(A))^{n-1}$. By Perron's theorem the multiplicity of $(1 + \rho(A))^{n-1}$ is 1 w.r.t. B, and so
the multiplicity of ρ(A) is at most (and therefore equal to) 1 w.r.t. A. Moreover, suppose
that v is an eigenvector of A corresponding to λ; then
$Bv = \sum_{i=0}^{n-1} \binom{n-1}{i} A^i v = \sum_{i=0}^{n-1} \binom{n-1}{i} \lambda^i v = (1+\lambda)^{n-1} v.$
Therefore (2), (3) are both direct corollaries of Perron's theorem. Now suppose that
ρ(A) = 0; then $Av = \rho(A)v = 0$ for some v > 0, which is clearly a contradiction.
Therefore ρ(A) > 0.
4.10 Random Problem Set
10. (4.6) Suppose that A, B are two diagonalizable matrices that commute, i.e. AB =
BA. Show that A, B are simultaneously diagonalizable, i.e. there exists an invert-
ible matrix Q such that QAQ−1 , QBQ−1 are both diagonal.
11. (4.7) Using the result in Section 4.7, show that if $x^n + a_{n-1}x^{n-1} + \cdots + a_0$ has roots
$\lambda_1, \ldots, \lambda_k$ with multiplicities $m_1, \ldots, m_k$, then the general solution of
12. (4.7) Find all finite dimensional vector spaces W consisting of differentiable functions
in one variable such that f ∈ W implies f′ ∈ W.
13. (4.9) Show that there exists an irreducible matrix A with an eigenvalue
λ such that λ ≠ ρ(A) but |λ| = ρ(A).
Chapter 5
In this chapter we will introduce the concept of group action. This is an important tool
for learning about groups, and we can derive several important theorems from it. Before
jumping into it, we will first begin with a concrete example of group action, namely
symmetry. While developing the theory of symmetry, we will at the same time encounter
more examples of finite non-abelian groups.
5.1 Isometry
Let P be the space Rⁿ. Note that I write P instead of Rⁿ here because I don't want
to endow P with the vector space structure. Instead, I want to think of P "geometrically".
Specifically, for any points x, y in P, I only care about the vector from y to x, denoted by
x − y. This is a vector in Rⁿ.
Definition 5.1.1. An isometry of P is a bijective function f : P → P such that
||f (x) − f (y)|| = ||x − y|| for any x, y ∈ P . In other words, an isometry is a distance-
preserving bijection. All isometries form a group, which is called the isometry group (of
P ). We will denote it by Mn .
This shows that f(−v) = −f(v). Now for any $v \in \mathbb{R}^n$ and $c \in \mathbb{R}_+$ we know that
$\|f((c+1)v)\| = \|f(x + (c+1)v) - f(x + cv)\| + \|f(x + cv) - f(x)\| = \|f(v)\| + \|f(cv)\|.$
Therefore there exists d ≥ 0 such that f(cv) = df(v). It is then clear that d = c.
Now f is a linear operator that preserves the norms of the vectors. This is a special
kind of linear operator that we will be interested in.
Property 5.1.1. Suppose that the matrix form of f corresponding to the standard
basis is M. Then $M^T = M^{-1}$. Conversely, if $M^T = M^{-1}$, then M preserves the norms
of the vectors. This follows from
$M^T = M^{-1} \iff a^T M^T M a = a^T a \ \ \forall a \in \mathbb{R}^n \iff \|Ma\| = \|a\| \ \ \forall a \in \mathbb{R}^n.$
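For instance (my own addition), one can check numerically that a rotation matrix is orthogonal and preserves norms:

    import numpy as np

    theta = 0.3
    M = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])
    print(np.allclose(M.T @ M, np.eye(2)))  # M^T = M^{-1}

    a = np.array([1.0, 2.0])
    print(np.isclose(np.linalg.norm(M @ a), np.linalg.norm(a)))  # norms agree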
From now on, for simplicity, we will identify P with R2 , which induces an embedding
of O2 into M2 .
Property 5.2.3. Every isometry in $M_2$ is either of the form $t_a\rho_\theta$ or of the form $t_a\rho_\theta r$.
Definition 5.2.1. In the first case, we say that the isometry is orientation-preserving.
In the second case, we say that the isometry is orientation-reversing.
Corollary 5.2.1. There are three kinds of isometries of the plane: translations, rotations,
and glide reflections. Here the glide reflection $g_{v,l}$ is the composition of the reflection about l
and the translation by v, where v is parallel to l.
Sketch of Proof. We first deal with the case $t_a\rho_\theta$. For any x ∈ P we know that
$\rho_\theta = t_{-\rho_{x,\theta}(0)}\,\rho_{x,\theta}$. Therefore $t_a\rho_\theta = \rho_{x,\theta}$ if and only if $a = \rho_{x,\theta}(0) = (I_2 - \rho_\theta)x$. Note that
$\det(I_2 - \rho_\theta) = 2 - 2\cos\theta$, which is nonzero unless $\theta \in 2\pi\mathbb{Z}$.
Therefore there is always such an x unless $\theta \in 2\pi\mathbb{Z}$. Hence the isometries in this case can
only be translations or rotations.
Now the remaining is the second case ta ρθ r. We will first show that ρθ r is a reflection
about a line passing through the origin. In other words, there exists ϕ such that ρθ r =
ρϕ rρ−ϕ . A simple calculation shows that picking ϕ = θ/2 suffices. Now it remains to
show that we can somehow “reduce” the component of a that is perpendicular to the
line. For simplicity, let’s WLOG assume that the reflection is r, the reflection about the
x-axis. Then
$t_a r = g_{(a_1,0),\; y = a_2/2}.$
From this, one can see that translation is somehow a degeneration of rotation.
Intuitively, a translation is a rotation about a "point at infinity". Therefore we sometimes
view translations as a part of the generalized rotations.
Property 5.2.4. The composition of two generalized rotations is a generalized rotation.
The composition of two glide reflections is also a generalized rotation. The composition
of one generalized rotation and one glide reflection is a glide reflection.
Example 5.3.1. The symmetry group of a circle consists of the rotations about the center
and the reflections about the lines passing through the center. The symmetry group of the
x-axis contains the translations $t_a$ where a is parallel to the x-axis.
We can also go in the reverse direction. That is, given a subgroup of the isometries,
can we generate a figure that has the given subgroup as its symmetry group?
The simplest way is the following: given a subgroup G of the isometries, first put a figure F′ on
the plane, and then draw the figure $F = \bigcup_{g\in G} g(F')$. It is clear that every element g in G
fixes F. The only thing that we need to do is to make F′ as asymmetric as possible, and
hope that the isometry group of F is precisely G.
Example 5.3.2. Consider the subgroup G generated by $g_{(1,0),y=0}$ and $\rho_{(0,0),\pi}$. The figure
that the funny stickman generates is the following diagram.
[Figure 5.1: Running stickmen]
This figure's symmetry group is precisely G. For instance, this figure is fixed by $t_{(2,0)}$.
On the other hand, $t_{(2,0)} = g_{(1,0),y=0}^2$.
Example 5.3.3. Now consider the subgroup SO2 w.r.t. some coordinate. Suppose that
F is a figure that is fixed by SO2 , then F is a union of some circles centered at the origin.
Therefore F is also fixed by the elements of O₂. This shows that SO₂ is not realizable as the
symmetry group of any figure.
The reason that the copy-and-paste method fails is that if we try to generate F with
a stickman, the copies will just overlap with each other too much, and one cannot tell that
the figure is generated by a stickman at all. This guides us to consider the subgroups of
isometries that are not "that dense."
[Figure 5.2: Sad stickman losing its contour]
Theorem 5.3.1. For every discrete subgroup G of the isometries of plane, there exists
a figure having G as its symmetry group.
Sketch of Proof. We won’t give a complete proof of it in this note. However, after we
deduce some useful properties of discrete symmetry groups, one can do a case-by-case
analysis and construct the corresponding figure of each type of discrete symmetry group.
$t_{Mv} = \phi_{x,M}\, t_v\, \phi_{x,M}^{-1} \in G.$
Sketch of Proof. Suppose that L ≠ {0}; then choose $a' \in L$. Since L is discrete, there is
$a \in \mathbb{R}a' \cap L$ that is nonzero and closest to 0. It is then clear that $\mathbb{R}a \cap L = \mathbb{Z}a$.
If $\mathbb{Z}a \ne L$, choose $b' \in L$ such that a, b′ are linearly independent. Consider the map
$\pi : \mathbb{R}^2 \to \mathbb{R}$ mapping $xa + yb'$ to y. Then π(L) is nontrivial and discrete (this is not quite
obvious but is left as an exercise), and so by the same argument we know that
$\pi(L) = \mathbb{Z}r$ for some $r \in \mathbb{R}$. Suppose that π(b) = r with b ∈ L; then by $\ker\pi \cap L = \mathbb{Z}a$ we know that
$L = \mathbb{Z}a + \mathbb{Z}b$.
Property 5.5.2. Suppose that Ḡ is the point group of a discrete subgroup G ≤ M₂;
then one of the following holds:
(1) Ḡ = $\{\rho_{2k\pi/n} \mid k = 0, \ldots, n-1\}$ for some n ∈ ℕ;
(2) Ḡ = $\{\rho_{2k\pi/n}, \rho_{2k\pi/n} r \mid k = 0, \ldots, n-1\}$ for some n ∈ ℕ, where r is a reflection.
Sketch of Proof. The main step is to show that the rotations in Ḡ form a set
$\{\rho_{2k\pi/n} \mid k = 0, \ldots, n-1\}$
for some n ∈ ℕ. Let θ be the smallest positive angle such that $\rho_\theta \in$ Ḡ; then the rotations in Ḡ are ⟨ρ_θ⟩.
Now choose the smallest positive integer n satisfying nθ ≥ 2π. It is clear that
the equality must hold.
Note that in the first case it is clear that Ḡ ≅ $C_n$. In the second case we see that
Ḡ ≅ ⟨x, r⟩ where $x^n = 1$, $r^2 = 1$ and $rxr^{-1} = x^{-1}$. This is called the dihedral group, and
is denoted by $D_n$. An easy way to visualize this is to think of it as the symmetry group
of a regular n-gon.
Example 5.5.1. $D_1 \cong C_2$, $D_2 \cong K_4$, $D_3 \cong S_3$.
Remark. In fact, one can actually find all the discrete symmetry groups of the plane.
There are 7 types of discrete symmetry groups with one-dimensional lattice groups, called
the frieze groups, and 17 types of discrete symmetry groups with two-dimensional lattice
groups, called the wallpaper groups.
Corollary 5.6.1. The data of a group action is precisely a group homomorphism from
G to the symmetric group of S.
When it comes to group action, the orbits and the stabilizers are usually considered.
Definition 5.6.3. Suppose that G acts on a set S. For any s ∈ S the orbit of s is the
set
orbG (s) := {gs|g ∈ G}.
The stabilizer of s is the subgroup
Gs := {g ∈ G|gs = s}.
Example 5.6.2. Consider the action of $D_n$ on the set of vertices of a regular n-gon. The
orbit of a vertex is the whole set. The stabilizer of a vertex is the subgroup {1, r} where
r is the reflection about the line passing through that vertex.
Besides, orbits and stabilizers have some really nice relations. One can think of
stabilizers as subgroups and orbits as the corresponding cosets.
Property 5.6.2. Suppose that G acts on S, and s is an element of S. Then
(1) $G_{gs} = gG_s g^{-1}$ for any g ∈ G;
(2) (Counting formula for group actions) If G is finite, then $|\mathrm{orb}_G(s)||G_s| = |G|$.
Sketch of Proof. (1) is trivial. To show (2), it suffices to show that the elements of $\mathrm{orb}_G(s)$
are in bijection with the cosets of $G_s$.
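To see the counting formula in action, here is a small Python sketch (my own addition) that builds D₄ as permutations of the four vertices of a square and checks |orb_G(s)||G_s| = |G|:

    # Represent a permutation as a tuple p with p[i] the image of vertex i.
    r = (1, 2, 3, 0)   # rotation by pi/2
    s = (0, 3, 2, 1)   # reflection fixing vertex 0

    def compose(p, q):
        # (p o q)(i) = p(q(i))
        return tuple(p[q[i]] for i in range(4))

    # Generate the whole group D4 from r and s.
    G = {(0, 1, 2, 3)}
    while True:
        bigger = G | {compose(g, h) for g in G for h in (r, s)}
        if bigger == G:
            break
        G = bigger

    v = 0
    orbit = {g[v] for g in G}
    stab = {g for g in G if g[v] == v}
    print(len(G), len(orbit), len(stab))      # 8 4 2
    print(len(orbit) * len(stab) == len(G))   # True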
Example 5.6.5. Consider the rotational group G that fixes a tetrahedron. This group
acts on the set of faces of the tetrahedron. It is clear that the orbit of any face is the whole set.
Moreover, for any face f, there are three elements in G that fix f, namely the rotations
about the axis through the center of f by 0, 2π/3 and 4π/3. Therefore
$|G| = |\mathrm{orb}_G(f)||G_f| = 4 \cdot 3 = 12.$
Now consider the group action of G on the set of all edges. It is clear that the orbit is
still the whole set. Therefore by the counting formula, for any edge e there should be two
elements in G that fix e. We can construct these elements explicitly: suppose that e′ is
the unique edge that has no intersection with e; then the rotation by π about the line passing
through the midpoints of e, e′ fixes e and e′.
Therefore for any face we find two corresponding elements that have order 3, and for
any pair of disjoint edges we find one corresponding element that has order 2. Including
the identity, we find exactly 12 elements in G, 8 of them being of order 3 and 3 of them
being of order 2. In fact, G is isomorphic to A4 .
Therefore λ and $\lambda^{-1}$ have the same multiplicity. Since there are 3 eigenvalues (counted
with multiplicity) and their product is 1, at least one of them is 1.
Note that every non-identity rotation has exactly two poles, and there is no preferred
choice of which should be included in the spin. This can be explained by the fact that
when two people see a rotation through the axis from opposite directions, one would say
that the rotation is clockwise while the other sees counterclockwise.
Now suppose that G is a finite subgroup of SO3 . Define the poles of G to be the poles
of non-identity rotations in G. If p is a pole of g, then hp is a pole of hgh−1 . Therefore
G acts on its poles.
Lemma 5.7.1. Let G be a finite subgroup of SO₃, and let G act on its poles. Suppose
that $O_1, \ldots, O_k$ are the orbits, and for any $p_i \in O_i$ the order of the stabilizer of $p_i$ is $r_i$. Then
$\sum_{i=1}^k \left(1 - \frac{1}{r_i}\right) = 2 - \frac{2}{|G|}.$
Sketch of Proof. We will do a double counting on the spins of G. Note that every non-identity
rotation corresponds to two spins, and so there are 2|G| − 2 spins in total. On
the other hand, for any pole p, if there are $r_p$ rotations that fix p, then there are $r_p - 1$
spins corresponding to p, since one of the $r_p$ rotations is the identity. Hence,
$\sum_p (r_p - 1) = 2|G| - 2.$
Since $r_p$ only depends on the orbit it lies in, we can replace it by $r_i$. Suppose that
$|O_i| = n_i$; then we have
$\sum_{i=1}^k n_i(r_i - 1) = 2|G| - 2.$
By the counting formula, $n_i r_i = |G|$. Therefore we can get the desired identity by dividing
both sides by |G|.
Sketch of Proof. We begin by solving for the $r_i$. Note that on the left hand side of the identity
in Lemma 5.7.1, each term is at least one half. Therefore k ≤ 3, and we can do a case-by-case
analysis. We WLOG assume $r_1 \le r_2 \le r_3$.
If k = 1, then the left hand side is less than 1 while the right hand side is at least 1.
Therefore there is no solution in this case.
If k = 2, then
$\frac{1}{r_1} + \frac{1}{r_2} = \frac{2}{|G|}.$
Since $r_1, r_2$ both divide |G|, this can only occur when $r_1 = r_2 = |G|$. Therefore all the
rotations in G are about the same axis, and so G ≅ $C_n$.
If k = 3, then it is clear that $r_1 = 2$. We first suppose that $r_2 = 2$. Then $r_3 = n$ and |G| =
2n. Consider the poles p, p′ that form the third orbit. There are n rotations that fix p, p′,
and the other half interchange p, p′. Therefore there are n rotations by 2tπ/n about the axis
through p, p′ and n rotations by π about axes perpendicular to it. This is the dihedral group
$D_n$.
If $r_2 > 2$, then since 1/2 + 1/4 + 1/4 = 1, we must have $r_2 = 3$. Since 1/2 + 1/3 + 1/6 = 1,
we must have $r_3 = 3, 4$ or 5. Therefore there are three possibilities:
(i) r1 = 2, r2 = 3, r3 = 3, |G| = 12: G is the tetrahedral group T .
(ii) r1 = 2, r2 = 3, r3 = 4, |G| = 24: G is the octahedral group O.
(iii) r1 = 2, r2 = 3, r3 = 5, |G| = 60: G is the icosahedral group I.
The proof that G is the tetrahedral/octahedral/icosahedral group is more geometry
than algebra and is thus omitted. One way to do this is to identify an orbit whose
elements form a regular polyhedron.
5.8 Random Problem Set
2. (5.2) Suppose that ABC is a triangle in the plane, and D, E, F are points on the
exterior of ∆ABC such that ∆DBC, ∆AEC, ∆ABF are isosceles and ∠BDC =
2α, ∠CEA = 2β, ∠AFB = 2γ where α + β + γ = π.
(1) Consider the isometry $\rho_{F,2\gamma}\,\rho_{E,2\beta}\,\rho_{D,2\alpha}$. Show that this is the identity map.
(2) Consider the image of D under this composition of isometries. Conclude that
∆DEF is a triangle that has angles α, β, γ.
4. (5.3) Suppose that F is a nonempty figure on a plane such that the symmetry group
of F contains all rotations about a given point x. Show that either F is the whole
plane or the symmetry group of F is the same as the one of a circle centered at x.
5. (5.4) (Not hard but takes time) Find the seven frieze groups. A frieze pattern is
a figure whose symmetry group is a frieze group. For each frieze group, find a
corresponding frieze pattern.
Chapter 6
Now that we have group actions as a new tool, we can get back to groups and discover
more useful properties. In this chapter, one can see that different group actions reveal
different properties of a group. Although it may be puzzling why a specific group action
is considered, it might be better to simply memorize the result first, for the result itself
is usually easier to understand than its proof.
Sketch of Proof. Consider the group action of G on itself defined by left multiplication.
For any g ∈ G, if g fixes every element, then gx = x ∀x ∈ G, which implies g = 1.
Therefore the permutation representation of this group action is a monomorphism from
G to the symmetric group of G.
Note that the order of $S_{|G|}$ is |G|!, which is much larger than |G|. Therefore this
theorem is not that useful practically. Theoretically, this theorem implies that the
symmetric groups are in some sense the most complicated groups. This theorem however does not reduce
the study of finite groups to that of symmetric groups: people still understand the
structures of finite groups mainly through simple groups.
This group action actually does not give us too much, so let’s turn to another group
action.
Definition 6.2.1. The orbit of an element g is called the conjugacy class of g, denoted
by Cg . The stabilizer of an element g is called the centralizer, denoted by CG (g). The
number of orbits is called the class number of G.
Property 6.2.1. |Cg ||CG (g)| = |G| for any g ∈ G, and the conjugacy classes form a
partition of G.
Example 6.2.1. Consider the tetrahedral group T. It consists of the identity, eight rotations
by ±2π/3 about a vertex, and three rotations by π about the lines through the midpoints
of pairs of opposite edges. The conjugacy class of the identity only contains the identity. Now suppose that
ρ is a rotation about a vertex v by θ, and ρ′ is another rotation about a vertex v ′ by θ.
Then ρ′ = f ρf −1 for any f ∈ T that maps v to v ′ , and so ρ, ρ′ are conjugates. Since
there are four vertices, we find four members in Cρ . Note that ⟨ρ⟩ is of order three and
is contained in the centralizer of ρ. Since 4 × 3 = 12 = |T |, the conjugacy class of ρ is of
order exactly 4.
It is easy to show that the rest of the rotations form a conjugacy class. Therefore
there is a conjugacy class of order 1, a conjugacy class of order 3 and two conjugacy
classes (corresponding to θ = ±2π/3 respectively) of order 4. A simple sanity check:
1 + 3 + 4 + 4 = 12, and so the conjugacy classes form a partition of T .
The information of conjugacy classes of a group makes it easier to determine all of its
normal subgroups.
Property 6.2.2. A subset N of a group G is a normal subgroup if and only if it is a
subgroup that is a union of conjugacy classes.
Note that since the conjugacy classes form a partition, the sum of their orders is the
order of the group. Together with the counting formula, we get:
Theorem 6.2.1. (Class equation) Suppose that $C_1, \ldots, C_k$ are the conjugacy classes of
a finite group G, and for each i = 1, ..., k pick a representative $g_i$. Then
$|G| = \sum_{i=1}^k |C_i| = \sum_{i=1}^k [G : C_G(g_i)].$
Usually, we will isolate the conjugacy classes of order 1. Their elements form a subgroup
that is called the center.
Definition 6.2.2. Suppose that G is a group. Its center Z is the subgroup consisting of
all s such that gs = sg for all g ∈ G.
Corollary 6.2.1. Suppose that C1 , . . . , Ck are the conjugacy classes of a finite group G
that have orders greater than 1, and for each i = 1, ..., k pick a representative $g_i$. Then
$|G| = |Z| + \sum_{i=1}^k [G : C_G(g_i)].$
More generally, if G acts on a set S, Z is the set of fixed points, and $s_1, \ldots, s_k$ represent
the orbits of size greater than 1, then
$|S| = |Z| + \sum_{i=1}^k [G : G_{s_i}].$
6.3 Normalizer
We can furthermore extend the group action to the subsets of G. The stabilizers are
something that one may consider.
Definition 6.3.1. Suppose that S is a subset of G. The stabilizer of S under conjuga-
tion is called the normalizer of S, which is denoted by NG (S).
Theorem 6.3.1. (N/C Theorem) Suppose that H ≤ G; then $N_G(H)/C_G(H)$ is isomorphic
to a subgroup of Aut(H). Here $C_G(H) = \bigcap_{h\in H} C_G(h)$.
Sketch of Proof. Let $\varphi : N_G(H) \to \mathrm{Aut}(H)$ be the homomorphism such that φ(n) sends
h to $nhn^{-1}$. Then φ is well-defined, and $\ker\varphi = C_G(H)$.
Sketch of Proof. It suffices to show the existence of such a g when p | |G|. Consider the set
$S = \{(g_1, \ldots, g_p) \in G^p \mid g_1 g_2 \cdots g_p = 1\}.$
Since for any $g_1, \ldots, g_{p-1} \in G$ there is a unique $g_p$ such that $g_1 \cdots g_p = 1$, the size of S
is $|G|^{p-1}$. Now consider the group action of $C_p$ on S that permutes the tuples cyclically.
The class equation is
$|G|^{p-1} = |S| = |Z| + \sum_{i=1}^k [C_p : (C_p)_{s_i}] = |Z| + pk.$
Therefore $p \mid |Z|$. Note that $(1_G, \ldots, 1_G) \in Z$, so there exists $(g_1, \ldots, g_p) \in Z$ that is not
$(1_G, \ldots, 1_G)$. Now since every cyclic permutation fixes $(g_1, \ldots, g_p)$, we have $g_1 = \cdots = g_p$.
Therefore $g_1^p = 1_G$ and $g_1 \ne 1_G$.
Remark. When |G| is not divisible by p, there is only one element in Z. Therefore one
can derive Fermat's little theorem from this.
Theorem 6.5.1. (Sylow First Theorem) Suppose that G is a group of order pk m where
p is a prime and m is not divisible by p. Then for every s = 1, . . . , k there exists a
subgroup of G that has order ps .
Sketch of Proof. Let S be the set of subsets of G of size $p^s$, and consider the group
action of G on S defined by left multiplication. For every A ∈ S we can construct an
injection from $G_A$ to A mapping g to ga, where a is a chosen element of A. Therefore
$|G_A| \le |A| = p^s$.
Now suppose that $A_1, \ldots, A_t$ are representatives of the orbits; then
$\binom{p^k m}{p^s} = |S| = \sum_{i=1}^t [G : G_{A_i}].$
Note that
$v_p\left(\binom{p^k m}{p^s}\right) = k - s,$
so there exists i such that $v_p([G : G_{A_i}]) \le k - s$, which implies $v_p(|G_{A_i}|) \ge s$. Since
$|G_{A_i}| \le p^s$, equality holds. Therefore $G_{A_i}$ is the desired subgroup.
Example 6.5.1. Take G = S₄. Then there are three Sylow 2-subgroups according to
Example 2.10.1. It is also easy to see that there are four Sylow 3-subgroups.
Sketch of Proof. Let S be the set of left cosets of H, and consider the group action
of P on S defined by left multiplication. Then the class equation is
$m = [G : H] = |S| = |Z| + \sum_{i=1}^t [P : P_{s_i}].$
Note that since $P_{s_i}$ is not P and P is a p-group, we have $p \mid [P : P_{s_i}]$. Combined with the
fact that p ∤ m, we have that p ∤ |Z|; in particular Z is nonempty, so there is a coset gH that is fixed by
every element of P. Therefore PgH = gH, and so $P(gHg^{-1}) = gHg^{-1}$. This shows that
$P \le gHg^{-1}$.
Theorem 6.5.3. (Sylow Third Theorem) Let n be the number of Sylow p-subgroups
of G. Then n divides the order of G, and n ≡ 1 mod p.
Sketch of Proof. Consider first the group action of G on the set of Sylow p-subgroups
defined by conjugation. By the second theorem we know that the orbit is the whole set.
Therefore the size of the set divides |G|.
Now consider the group action of G on the set S of subsets of size $p^k$ (which is
the special case of the group action considered in the first theorem). Then
$\binom{p^k m}{p^k} = |S| = \sum_{i=1}^t [G : G_{A_i}].$
Note that the number of Sylow p-subgroups is the number of i such that $|G_{A_i}| = p^k$,
so
$\sum_{i=1}^t [G : G_{A_i}] \equiv mn \pmod{p}.$
Therefore
$n \equiv m^{-1}\binom{p^k m}{p^k} \equiv m^{-1} m \equiv 1 \pmod{p}.$
Example 6.5.2. |S4 | = 24. There are three Sylow 2-subgroups, and 3 ≡ 1 mod 2,
3|24. There are four Sylow 3-subgroups, and 4 ≡ 1 mod 3, 4|24.
Sylow theorems, especially the third one, are really helpful to determine the possible
structures of groups of a given order.
Example 6.5.3. Let G be a group of order 15. Then there is one Sylow 3-subgroup
M and one Sylow 5-subgroup N according to the Sylow third theorem. Therefore by Exercise
14 in Chapter 2, G ≅ M × N ≅ ℤ/15ℤ, and so there is exactly one group of order 15 up to
isomorphism.
Example 6.5.4. Now consider a group G of order 21. By the Sylow third theorem, there
is only one Sylow 7-subgroup N, and so N is normal. Now when p = 3, we don't know if
there is one Sylow 3-subgroup or seven Sylow 3-subgroups. If there is only one, then by the
same argument used above we can see that G ≅ ℤ/21ℤ, so let's consider the second case.
Suppose that H is a Sylow 3-subgroup; then G = N ⋊ H. Let x be a generator of N
and y be a generator of H; then $yxy^{-1} = x^i$ for some i = 1, ..., 6. Since y is of order
3, we have $i^3 \equiv 1 \pmod 7$. This shows that i = 1, 2 or 4.
If i = 1, then it is easy to see that the semidirect product is indeed a direct product,
which is a contradiction. If i = 2, then we get another group G′ which is non-abelian.
It is easy to verify that there are 7 Sylow 3-subgroups in G′ . If i = 4, then it is also
isomorphic to G′ via the homomorphism that sends x to x and y to y −1 . Therefore there
are two groups of order 21 up to isomorphism.
$a_1 a_2 \cdots a_n$
Definition 6.6.2. A free group FS generated by a set S is defined on the set of all
reduced words in S. The product of two words is the reduced word of the concatenation.
A group is free if it is isomorphic to a free group generated by some subset of it. The
rank of a free group is the cardinality of its freely generating set.
Example 6.6.1. The free group generated by a single element {x} consists of the
reduced words $x^n$, $x^{-n}$ and ∅ where n ∈ ℕ. Therefore the free group generated by one
element is isomorphic to (ℤ, +).
Property 6.6.1. (Universal property of free groups) Suppose that G is a group generated
by S; then there is a unique homomorphism ϕ from $F_S$ to G such that ϕ(s) = s for every
s ∈ S. In other words, there is a unique homomorphism ϕ : $F_S$ → G that commutes with
the two inclusions of S into $F_S$ and into G.
Sketch of Proof. For every reduced word a1 a2 . . . an in FS it is easy to see that ϕ(a1 . . . an )
has to be ϕ(a1 ) . . . ϕ(an ). To verify that ϕ is a homomorphism, we just need to show that
reduction does not affect the value that ϕ takes, which is true since ϕ(a)ϕ(a−1 ) = aa−1 =
1.
Corollary 6.6.1. Suppose that G is generated by S. Then G is a quotient group of
FS .
Remark. There is an elementary but complicated proof for the case where the subgroup is
finitely generated. The main idea is to do something similar to row reduction to reduce
the length of the generating set.
Remark. Just like the definition of the group that a subset generates, there is an
explicit definition of the normal closure. This is somehow boring and is left as an exercise
for the readers.
from FS to G′ such that ϕ′ (s) = s. Since all the relators vanish in G′ , we have R ⊆ ker ϕ′ .
Therefore the normal closure N of R in FS is also contained in ker ϕ′ . Therefore there is
a natural homomorphism from G = FS /N to G′ ∼ = FS / ker ϕ′ .
Although we can construct a lot of groups by writing down tons of presentations, it
is actually hard to work with the presentations because of the complexity of dealing with
free groups. It is even hard to determine whether two reduced words are the same under
the given relations.
What we are going to do is to use these four easy facts to compute the action. To
demonstrate this, let’s take our favorite dihedral group D4 as an example.
Example 6.7.1. Let G = ⟨x, y | x⁴, y², xyxy⟩ and H = ⟨x⟩ ≤ G. Then we know that
there is definitely one coset of H, namely H itself. For convenience, let's label it 1. We know
that x fixes H, and all the relators fix H. We can record this in a table:
  x x x x  |  y y  |  x y x y
1 1 1 1 1  | 1 . 1 | 1 1 . . 1
The table simply means that the entry in the i-th column is sent to the entry in the
(i + 2)-th column by the symbol written in the (i + 1)-th column; we write a dot for an
entry that is not yet determined. Now we don't know where y sends 1 to. Let's label
this with 2 anyway. Then we can fill in some blanks:
  x x x x  |  y y  |  x y x y
1 1 1 1 1  | 1 2 1 | 1 1 2 . 1
Therefore we know that the preimage of 1 under y is 2. We can hence fill in one more
blank:
  x x x x  |  y y  |  x y x y
1 1 1 1 1  | 1 2 1 | 1 1 2 2 1
This shows that x sends 2 to 2. At this point, we know for sure that all the relators act
on 1 trivially. We still need to check this for 2:
  x x x x  |  y y  |  x y x y
1 1 1 1 1  | 1 2 1 | 1 1 2 2 1
2 2 2 2 2  | 2 1 2 | 2 2 1 1 2
Now everything is compatible with the rules. Since the group action is transitive, we
know that the cosets are precisely 1 and 2. Although at this point we don’t know if 1 and
2 are distinct, there are no rules that tell us that they are the same, and we will prove
(informally) later that this shows that they are different. Therefore the index of ⟨x⟩ is 2,
which is consistent with the fact.
We can also run the same algorithm for ⟨y⟩, and we get the following table.
  x x x x  |  y y  |  x y x y
1 2 3 4 1  | 1 1 1 | 1 2 4 1 1
2 3 4 1 2  | 2 4 2 | 2 3 3 4 2
3 4 1 2 3  | 3 3 3 | 3 4 2 3 3
4 1 2 3 4  | 4 2 4 | 4 1 1 2 4
Now note that $x^2$ does not act trivially on the cosets of ⟨y⟩, which shows that $x^2$ is not
the identity. Therefore |⟨x⟩| = 4, and so G is of order 8, which is again consistent with
the fact.
Besides the order of G and the indices of the subgroups, we also get a lot of bonuses from
this. For example, since x fixes all the cosets of ⟨x⟩, we know that ⟨x⟩ is normal. We also
get permutation representations of G. The first table gives us the representation ϕ(x) =
id, ϕ(y) = (1 2), and the second one gives us the representation ϕ(x) = (1 2 3 4), ϕ(y) =
(2 4). The second one happens to be injective, and so we successfully embed D₄ into
S₄.
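One can verify the second representation directly; the following sketch (my own addition) checks that ϕ(x) = (1 2 3 4) and ϕ(y) = (2 4) satisfy the three relators, written with 0-indexed tuples:

    x = (1, 2, 3, 0)   # the 4-cycle (1 2 3 4)
    y = (0, 3, 2, 1)   # the transposition (2 4)
    e = (0, 1, 2, 3)   # identity

    def mul(p, q):
        # (p o q)(i) = p(q(i))
        return tuple(p[q[i]] for i in range(4))

    x2 = mul(x, x)
    print(mul(x2, x2) == e)   # x^4 = 1
    print(mul(y, y) == e)     # y^2 = 1
    xy = mul(x, y)
    print(mul(xy, xy) == e)   # xyxy = 1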
We can see that although we will definitely learn the index of the subgroup (as long
as it is finite), we might have to do something else to get other information, such as the
order of the subgroup/the group, or the order of a generator. A way to avoid this is to
take H to be the trivial group, at the cost of computational complexity.
Sometimes things do not go that well. There might be some collision of indices which
force the table to collapse.
Example 6.7.2. Let G = ⟨x, y | x², y², yxyxy⟩ and H = ⟨x⟩. We can work out the table
similarly:
  x x  |  y y  |  y x y x y
1 1 1  | 1 2 1 | 1 2 3 4 2 1
2 3 2  | 2 1 2 | 2 1 1 2 3 2
Now we see some collisions here: x sends 3 to 2 and sends 4 to 2. This shows that 3 = 4.
Also y sends 3 to 2 and 1 to 2, which shows that 3 = 1. So we can replace 3, 4 with 1:
  x x  |  y y  |  y x y x y
1 1 1  | 1 2 1 | 1 2 1 1 2 1
2 1 2  | 2 1 2 | 2 1 1 2 1 2
Now there are collisions everywhere. For example, y sends 1 to 2 and also sends 1 to 1,
which shows that 1 = 2. Therefore the index of ⟨x⟩ is 1. In fact, we can directly see
y = 1 from the relations.
To state the algorithm formally is quite annoying (which is true for a lot of algorithms),
so I will state the algorithm informally since the above discussion should already say
enough about the algorithm. As a consequence, it is impossible to prove the correctness
formally, but I will demonstrate why the number of indices is the same as the number of
cosets.
Theorem 6.7.1. (Todd-Coxeter Algorithm) Given a group G = ⟨S | R⟩ and a subgroup
H, at each step we may do one of the following two things:
(1) identify two indices if the rules force us to;
(2) choose a generator x and an index i such that the image of i under x is not yet
determined, and assign a new index j to it.
If we can complete the table so that all four rules are satisfied after finitely
many steps, then the number of cosets is equal to the number of distinct indices in the
table.
Sketch of Proof. We can construct the map from the indices to the cosets inductively. It
is easy for (2). For (1), we have to show that the two indices correspond to the same
cosets at the beginning, which is true since we only do step (1) when we are forced to. It
remains to show that the map is bijective. The surjectivity is clear since the group action
is transitive. The injectivity then follows from the fact that the stabilizer of 1 contains
H.
Sketch of Proof. The key is to spread the total contribution of an orbit uniformly over the
elements of the orbit. Then the number of orbits is
$\sum_{s\in S} |\mathrm{orb}_G(s)|^{-1} = \sum_{s\in S} \frac{|G_s|}{|G|} = \frac{1}{|G|}\,\big|\{(g,s) \in G \times S \mid gs = s\}\big| = \frac{1}{|G|}\sum_{g\in G} |S^g|.$
Example 6.8.1. Consider a ring with n beads, colored with m colors. We are going
to count the number of different colorings of the beads. Here two colorings are considered
the same if we can get one from the other by a rotation.
Consider the action of $C_n$ on the set S of all sequences of n m-colored beads.
Suppose that x is the rotation by 2π/n. For every 0 ≤ i < n, we have $|S^{x^i}| = m^{\gcd(n,i)}$.
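The resulting Burnside count is easy to put into code (my own addition):

    from math import gcd

    # Number of colorings of n beads on a ring with m colors, up to rotation,
    # by Burnside: (1/n) * sum over i of m^gcd(n, i).
    def necklaces(n, m):
        return sum(m ** gcd(n, i) for i in range(n)) // n

    print(necklaces(6, 2))   # 14
    print(necklaces(5, 3))   # 51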
Chapter 7
Bilinear Form
In Section 3.5 we briefly talked about multilinear forms. In this chapter, we will focus
on the easiest case: bilinear forms. As this is the last chapter focusing on the structure of linear
algebra, we will endow real vector spaces with an extra "measurement" by defining
the inner product. We will also talk about the analogue in the complex case.
Most of this chapter will be about real and complex vector spaces. This allows us to
discover more properties, and some of them are really striking. This is probably why
most linear algebra classes end with bilinear forms. So get ready, and let's go
through the last chapter of the linear algebra part!
$B(x\beta, y\beta) = x^T A y$
Property 7.1.2. (Change of coordinate formula for bilinear forms) Suppose that β and
β′ are two bases of a finite dimensional space V and the matrix forms of B with respect
to β and β′ are A and A′, respectively. Then
$A' = ([\mathrm{id}]^{\beta}_{\beta'})^T A\, [\mathrm{id}]^{\beta}_{\beta'}.$
We will refrain from determining the intrinsic properties of bilinear forms because the
fun begins only when we consider bilinear forms that have certain good properties. This
will be discussed in the next section.
Now given a bilinear form B, we can actually construct some objects out of it. For
example, if we fix v, then B(v, ·) is a linear transformation from V to F . Therefore B is
a map from V to "the linear transformations from V to F." This leads us to consider the
dual space.
Definition 7.1.2. Suppose that V is a vector space over F. The dual space of V is the
space V∗ consisting of the linear transformations from V to F. The dual space naturally
possesses a structure of a vector space.
Property 7.1.3. If V is finite dimensional, then V and V∗ have the same dimension.
Moreover, if $\beta_1, \ldots, \beta_n$ form a basis of V, then the linear transformations $T_i$ such that
$T_i\Big(\sum_{j=1}^n a_j\beta_j\Big) = a_i$
form a basis of V∗ (the dual basis).
Property 7.1.4. Given a bilinear form B, we can naturally construct maps
$B_1, B_2 : V \to V^*$ such that
$B_1(v)(w) = B_2(w)(v) = B(v, w).$
The standard inner product has a lot of good properties that we hope "good" abstract
bilinear forms should also have.
Property 7.2.1. The standard inner product has the following properties:
(1) ⟨v, w⟩ = ⟨w, v⟩ for any v, w ∈ Rn ;
(2) ⟨v, v⟩ ≥ 0 for any v ∈ Rn , and the equality holds if and only if v = 0.
(3) If v ∈ Rn satisfies that ⟨v, w⟩ = 0 for all w ∈ Rn , then v = 0.
Back to the standard inner product case. The usual standard basis of Rn is e1 , . . . , en .
This basis interacts with the standard inner product very well.
Property 7.2.2. Suppose that ⟨·, ·⟩ is the standard inner product on Rn and e1 , . . . , en
is the standard basis. Then for any i ̸= j we have ⟨ei , ej ⟩ = 0, and for any i we have
⟨ei , ei ⟩ = 1.
$\langle v, w\rangle = \sum_{i=1}^n v_i w_i.$
Note that the standard inner product on Cn IS NOT A BILINEAR FORM. We are
willing to give up the linearity with respect to the first entry because this keeps the
positive-definiteness.
Definition 7.2.6. A function ⟨·, ·⟩ : V × V → C, where V is a C-vector space, is said to
be hermitian if it satisfies the following:
(1) $\langle cv_1 + v_2, w\rangle = \bar{c}\langle v_1, w\rangle + \langle v_2, w\rangle$;
(2) $\langle v, cw_1 + w_2\rangle = c\langle v, w_1\rangle + \langle v, w_2\rangle$;
(3) $\langle v, w\rangle = \overline{\langle w, v\rangle}$.
We can also define orthogonal, orthonormal, positive definite, positive semi-definite
and inner product in a similar way as we did for bilinear form.
Now that we know which properties might give us something interesting, we are ready
to explore them one by one in the following sections.
Theorem 7.3.1. If V is a finite dimensional vector space over F and char(F ) ̸= 2, then
any symmetric form on V is diagonalizable.
Sketch of Proof. This can be proved by induction. The case dim V = 1 is trivial.
Suppose that the statement is true for dim V = n − 1. For dim V = n, suppose
that $\beta_1, \ldots, \beta_n$ is a basis of V. If there is i such that $B(\beta_i, \beta_i) \ne 0$, WLOG assume that
$B(\beta_n, \beta_n) \ne 0$. Let
$\beta_i' = \beta_i - \frac{B(\beta_i, \beta_n)}{B(\beta_n, \beta_n)}\,\beta_n$
for i = 1, ..., n − 1. Then we have
$B(\beta_i', \beta_n) = B(\beta_i, \beta_n) - \frac{B(\beta_i, \beta_n)}{B(\beta_n, \beta_n)}\,B(\beta_n, \beta_n) = 0.$
Therefore we can take $\beta_1', \ldots, \beta_{n-1}', \beta_n$ as a new basis and apply the induction
hypothesis to the span of $\beta_1', \ldots, \beta_{n-1}'$.
Although we know that when char(F ) ̸= 2 we can diagonalize the symmetric form,
the result is usually not unique. For example, if we scale one element of the orthogonal
basis by c, then the corresponding diagonal entry will be scaled by c2 . This tells us that
when F = R we can adjust the orthogonal basis so that the diagonal entries are 0, 1 or
−1, and when F = C we can make the diagonal entries 0 or 1. In fact, the numbers of
the 0’s, 1’s and −1’s are invariants.
Definition 7.3.1. The kernel of a symmetric form B is the set of all v such that
B(v, w) = 0 for all w ∈ V.
Sketch of Proof. Suppose that β, β′ are two orthogonal bases with respect to B, and there
are r positive diagonal entries and s non-positive diagonal entries with respect to β, and
r′ positive diagonal entries and s′ non-positive diagonal entries with respect to β′. It
suffices to show that r = r′. We will show this by showing that r + s′ ≤ dim V and
r′ + s ≤ dim V.
Suppose that $v_1, \ldots, v_r$ are the elements of β corresponding to the positive entries and
$w_1, \ldots, w_{s'}$ are the elements of β′ corresponding to the non-positive entries. If $a_1, \ldots, a_r$ and
$b_1, \ldots, b_{s'}$ are reals such that
$\sum_{i=1}^r a_i v_i + \sum_{i=1}^{s'} b_i w_i = 0,$
let $v = \sum_{i=1}^r a_i v_i$ and $w = \sum_{i=1}^{s'} b_i w_i$. Then
$0 = B(v, v+w) = B(v,v) + B(v,w) = \sum_{i=1}^r a_i^2 B(v_i, v_i) + B(v,w) \ge B(v,w)$
and
$0 = B(w, v+w) = B(w,w) + B(v,w) = \sum_{i=1}^{s'} b_i^2 B(w_i, w_i) + B(v,w) \le B(v,w).$
Therefore B(v, w) = 0, which forces all the $a_i$ and $b_i$ to vanish; hence the $v_i$ and $w_i$
together are linearly independent, and $r + s' \le \dim V$.
Property 7.4.1. Suppose that V is a finite dimensional C-vector space, and ⟨·, ·⟩ is a
hermitian form on V. Then for any given basis β there is a matrix A (the matrix form of
⟨·, ·⟩ with respect to β) such that for any $x, y \in \mathbb{C}^n$ we have
$\langle x\beta, y\beta\rangle = x^* A y.$
Property 7.4.2. (Change of coordinate formula for hermitian forms) Suppose that β
and β′ are two bases of a finite dimensional C-vector space V and the matrix forms of a
hermitian form ⟨·, ·⟩ with respect to β and β′ are A and A′, respectively. Then
$A' = ([\mathrm{id}]^{\beta}_{\beta'})^* A\, [\mathrm{id}]^{\beta}_{\beta'}.$
$\langle x, y\rangle = x^* A y \quad \forall x, y \in \mathbb{C}^n$
Theorem 7.4.1. If V is a finite dimensional complex vector space, then any hermitian
form on V can be diagonalized.
Sketch of Proof. The proof is similar to the proof for symmetric forms. The only thing
we actually have to prove is that if there are x, y such that ⟨x, y⟩ ≠ 0, then there exists v
such that ⟨v, v⟩ ≠ 0. If ⟨x, x⟩ ≠ 0 or ⟨y, y⟩ ≠ 0, then we're done, so let's suppose that
they are both zero. Let v = ⟨x, y⟩x + y; then $\langle v, v\rangle = 2|\langle x, y\rangle|^2 \ne 0$.
Theorem 7.4.2. (Sylvester's law of inertia) Suppose that ⟨·, ·⟩ is a hermitian form on a
C-vector space V. The numbers of positive diagonal entries, negative diagonal entries
and zero entries of a diagonalization of ⟨·, ·⟩ are independent of the choice of orthogonal
basis.
Sketch of Proof. The proof is identical to the one for symmetric forms.
Basically a lot of properties of symmetric form also hold for hermitian form if one
replaces transpose with conjugate transpose.
The hermitian matrices themselves have a lot of interesting properties. We will introduce
one of them first.
Property 7.4.4. Every eigenvalue of a hermitian matrix is real.
Sketch of Proof. Suppose that A is hermitian and λ is one of its eigenvalues. Let v be a
non-zero vector such that Av = λv. Then
$\bar\lambda\, v^* v = (Av)^* v = v^* A^* v = v^* A v = \lambda\, v^* v,$
and so $\bar\lambda = \lambda$.
Although this corollary is easy, it is somewhat striking. Without the aid of complex
numbers, it would be very hard to show that the characteristic polynomial of a real
symmetric matrix actually splits over the reals.
7.5 Orthogonality
In this section, we will suppose that V is a finite dimensional vector space equipped with
a symmetric or hermitian form B. For simplicity, we will use ∗ to indicate the transpose
if B is a symmetric form, and the conjugate transpose if B is hermitian. For every
subspace W of V, we can consider the set of vectors that are orthogonal to every element
of W.
Definition 7.5.1. The orthogonal complement of a subspace W is the set
{v ∈ V |B(w, v) = 0 ∀w ∈ W }.
Instead of inspecting the orthogonal complement, we can also consider the matrix
form to see if the symmetric/hermitian form is non-degenerate.
Property 7.5.3. B is non-degenerate if and only if its matrix form is invertible.
Sketch of Proof. Suppose that A is the matrix form of B with respect to a basis β. Then
for any v ∈ F n we have vβ ∈ V ⊥ if and only if
w∗ Av = 0 ∀w ∈ F n ,
which is equivalent to Av = 0.
Now if B is non-degenerate on W, then W + W⊥ is a direct sum. We naturally hope
that this direct sum is V. In this way, we would have a canonical way of splitting V into the
direct sum of W and another subspace. Suppose that this is true; then we get a
projection P from V to W such that P² = P and im P = W. Moreover, for any v ∈ V,
the vector Pv − v is orthogonal to W.
Definition 7.5.2. If P is a linear operator such that P² = P and for any w ∈ im P, w′ ∈
ker P we have B(w, w′) = 0, then we say that P is an orthogonal projection.
A = −P −1 Q
Sketch of Proof. Choose a basis of W and a basis of W⊥. We know that the two bases
together form a basis of V. Consider the matrix form of B with respect to this basis.
Then it is of the form
$\begin{pmatrix} P & 0 \\ 0 & S \end{pmatrix}.$
Since B is non-degenerate, both P and S are invertible. This shows that B
is also non-degenerate on W⊥. Consequently, $V = (W^\perp) \oplus (W^\perp)^\perp$, and so $(W^\perp)^\perp = W$
because they have the same dimension.
From these facts, we know that things get easier when the symmetric/hermitian form
is non-degenerate on every subspace. Inner product happens to satisfy this property, and
so we will focus on the case when B is an inner product in the next section.
$v_i' = v_i - \sum_{j=1}^{i-1} \frac{\langle v_j', v_i\rangle}{\langle v_j', v_j'\rangle}\, v_j'.$
Moreover,
$\langle v_1', v_1'\rangle^{-1/2} v_1', \ \ldots, \ \langle v_n', v_n'\rangle^{-1/2} v_n'$
form an orthonormal basis of V.
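Here is a direct transcription of the process into Python (my own addition; it assumes the standard inner product on Rⁿ, while the statement above holds in any inner product space):

    import numpy as np

    def gram_schmidt(vectors):
        ortho = []
        for v in vectors:
            # Subtract the components along the previously built vectors.
            w = v - sum((u @ v) / (u @ u) * u for u in ortho)
            ortho.append(w)
        # Normalize to get an orthonormal family.
        return [u / np.linalg.norm(u) for u in ortho]

    basis = gram_schmidt([np.array([1.0, 1.0, 0.0]),
                          np.array([1.0, 0.0, 1.0]),
                          np.array([0.0, 1.0, 1.0])])
    Q = np.column_stack(basis)
    print(np.allclose(Q.T @ Q, np.eye(3)))  # the family is orthonormal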
When working with inner product spaces, we usually consider the orthonormal basis.
This limits the possibility of the base-change matrix.
Property 7.6.1. If F = R, then a real square matrix A is a base-change matrix between
orthonormal bases of V if and only if A is orthogonal. If F = C, then a complex square
matrix A is such a base-change matrix if and only if $A^* = A^{-1}$, i.e. A is unitary.
Definition 7.6.2. If B = QAQ−1 for some orthogonal matrix Q, then A, B are said
to be orthogonally similar. If B = QAQ−1 for some unitary Q, then A, B are said to be
unitarily similar.
Note that the operator ∗ on matrices is compatible with conjugation by orthogonal/unitary
matrices: if Q is orthogonal/unitary, then $(QAQ^{-1})^* = QA^*Q^{-1}$. On the level of
operators, the adjoint T∗ of T is characterized by
$\langle Tx, y\rangle = \langle x, T^*y\rangle;$
T is self-adjoint if
$\langle Tx, y\rangle = \langle x, Ty\rangle \ \ \forall x, y \in V;$
and T preserves the form if
$\langle Tx, Ty\rangle = \langle x, y\rangle \ \ \forall x, y \in V.$
This definition seems to come out of nowhere at this point, but as we will see, normal
operators are precisely those that can be diagonalized with respect to an orthonormal basis.
Property 7.7.1. A linear operator T is normal if and only if ⟨T x, T y⟩ = ⟨T ∗ x, T ∗ y⟩
for any x, y ∈ V .
Now we are going to prove that normal operators can be diagonalized. The concept
of invariant subspace here is still helpful.
Property 7.7.2. Suppose that W is a T -invariant subspace. Then W ⊥ is a T ∗ -invariant
subspace.
Sketch of Proof. Let w′ be any element in W ⊥ . We have to prove that for any w ∈ W
we have ⟨w, T ∗ w′ ⟩ = 0. This is justified by
⟨w, T ∗ w′ ⟩ = ⟨T w, w′ ⟩ = 0
due to the fact that T w ∈ W .
Lemma 7.7.1. Suppose that T is a normal operator and v is its eigenvector correspond-
ing to the eigenvalue λ. Then v is an eigenvector of T ∗ corresponding to the eigenvalue
λ̄.
Note that unitary matrices and hermitian matrices are all normal, and so these results
apply in particular to those matrices.
Corollary 7.7.3. Every conjugacy class of Un contains at least one diagonal matrix.
Corollary 7.7.4. (Spectral theorem for symmetric matrices) A real square matrix is
orthogonally similar to a diagonal matrix if and only if it is symmetric.
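As a quick numerical illustration of the spectral theorem (a sketch of mine, assuming numpy is available; eigh is numpy's eigendecomposition routine for symmetric/hermitian matrices):

import numpy as np

rng = np.random.default_rng(0)
M = rng.standard_normal((4, 4))
A = (M + M.T) / 2                            # a real symmetric matrix
w, Q = np.linalg.eigh(A)                     # eigenvalues and orthonormal eigenvectors
print(np.allclose(Q.T @ Q, np.eye(4)))       # True: Q is orthogonal
print(np.allclose(Q.T @ A @ Q, np.diag(w)))  # True: A is orthogonally similar to diag(w)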
Actually most of the inner product spaces that people care about are infinite dimen-
sional, and those places are where the fun begins. However, they are more a matter of
calculus than of linear algebra, so I didn't mention them. If you are interested, try to
google it for fun!
7.8 Positive Definite and Semi-definite Matrices
Recall that a hermitian matrix A is positive definite if x∗ Ax > 0 for every non-zero x.
If all the eigenvalues λ1 , . . . , λn of A are positive, then choosing an orthonormal eigen-
basis x1 , . . . , xn , for every x = ∑ ci xi where the ci are not all zero, we have
x∗ Ax = (∑_{i=1}^{n} c̄i x∗i ) A (∑_{i=1}^{n} ci xi ) = (∑_{i=1}^{n} c̄i x∗i )(∑_{i=1}^{n} λi ci xi ) = ∑_{i=1}^{n} |ci |² λi > 0.
We will also use the Cauchy–Binet formula: for A ∈ Mm,n and B ∈ Mn,m with m ≤ n,
det(AB) = ∑_{S⊆[n],|S|=m} det(A[m],S ) det(BS,[m] ),
where for every S ′ ⊆ [m] and S ′′ ⊆ [n], the symbol AS ′ ,S ′′ denotes the submatrix of A
containing the rows indexed by elements in S ′ and the columns indexed by elements in S ′′ ,
and the symbol BS ′′ ,S ′ denotes the submatrix of B containing the rows indexed by
elements in S ′′ and the columns indexed by elements in S ′ .
Theorem 7.8.1. (Sylvester's criterion for positive definiteness) A real symmetric/her-
mitian matrix A is positive definite if and only if all of the leading principal minors of A
are positive.
Sketch of Proof. Suppose that A is positive definite, then there exists an invertible matrix
B such that B ∗ B = A. We can actually show that any principal minor of A is positive.
For any S ⊆ [n], let P ∈ Mn,|S| satisfy pij = 1 if i is the j-th element in S, and pij = 0
otherwise.
Then, by the Cauchy–Binet formula,
det(AS,S ) = det(P ∗ AP ) = det((BP )∗ (BP )) = ∑_{S0 ⊆[n],|S0 |=|S|} det((BP )∗[|S|],S0 ) det((BP )S0 ,[|S|] ) = ∑_{S0 ⊆[n],|S0 |=|S|} det((BP )S0 ,[|S|] )² ≥ 0.
The equality holds if and only if every |S|-minor of BP is zero. However, this is not
possible since B is invertible. Therefore every principal minor of A is positive.
Conversely, if every leading principal minor is positive, then we can induct on n. It
is trivial when n = 1. Suppose that the statement holds for n = k − 1, then for n = k we
can diagonalize the first k − 1 by k − 1 submatrix so that
Q∗ AQ = [ I_{k−1}  v ]
        [ v∗       a ]
for some vector v and real number a. Because
0 < | det(Q)|² det(A) = a − ∑_{i=1}^{k−1} |vi |²,
we have that a > ∑_{i=1}^{k−1} |vi |². Therefore for every x we have that
(Qx)∗ A(Qx) = ∑_{i=1}^{k−1} (|xi |² + 2ℜ(vi x̄i xk )) + a|xk |² = ∑_{i=1}^{k−1} |xi + vi xk |² + (a − ∑_{i=1}^{k−1} |vi |²) |xk |² ≥ 0.
The equality holds only when xk = 0 and xi + vi xk = 0 for all i, which shows that x = 0.
Therefore A is also positive definite, and the desired statement follows by induction.
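To see the criterion in action, here is a small Python sketch of mine comparing it with the eigenvalue test on an example:

import numpy as np

def sylvester_positive_definite(A):
    # check that every leading principal minor det(A[:k, :k]) is positive
    n = A.shape[0]
    return all(np.linalg.det(A[:k, :k]) > 0 for k in range(1, n + 1))

A = np.array([[2.0, -1.0, 0.0],
              [-1.0, 2.0, -1.0],
              [0.0, -1.0, 2.0]])
print(sylvester_positive_definite(A))            # True
print(bool(np.all(np.linalg.eigvalsh(A) > 0)))   # True: agrees with the eigenvalue test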
Theorem 7.8.2. (Sylvester's criterion for positive semi-definiteness) A real symmet-
ric/hermitian matrix A is positive semi-definite if and only if all of the principal minors
of A are non-negative.
Sketch of Proof. By the same argument we can prove that if A is positive semi-definite
then every principal minor is non-negative. Conversely, if every principal minor is non-
negative, then since [x^i ] charA (x) is the sum of the principal (n − i)-minors multiplied
by (−1)^{n−i} , we have that (−1)^{n−i} [x^i ] charA (x) ≥ 0. Therefore for any t > 0 we have that
(−1)^n charA (−t) = ∑_{i=0}^{n} (−1)^{n−i} t^i [x^i ] charA (x) > 0,
and so charA (x) has no negative roots. This shows that A is positive semi-definite.
7.9 Application 1: Quadratic Form and Quadric
A quadric in Rn is the zero locus of a function of the form
q(x1 , x2 , . . . , xn ) + c1 x1 + · · · + cn xn + c,
where q is a quadratic form and c1 , . . . , cn , c are constants.
In this section, we will focus on the case F = R and classify all the conics. To be
more specific, we will determine the orbits of conics under the action of isometries of the
plane. To achieve this, let’s first examine quadratic forms over real.
Property 7.9.1. Suppose that q(x1 , . . . , xn ) = a11 x1² + · · · + ann xn² + 2a12 x1 x2 + · · · +
2a(n−1)n xn−1 xn . Then we have
x^T Ax = q(x)
for all x ∈ Rn , where A is the real symmetric matrix with Aij = aij (setting aji = aij ).
Conversely, for any real symmetric matrix A, the form x^T Ax is a quadratic form in
x1 , . . . , xn .
Corollary 7.9.1. There exists an orthogonal operator T such that q(T x) = ∑_{i=1}^{n} bii xi²,
where the bii are some reals.
Sketch of Proof. Write q(x) = x^T Ax. By the spectral theorem there exists an orthog-
onal matrix Q such that Q^T AQ is diagonal. Therefore q(Qx) = x^T (Q^T AQ)x =
∑_{i=1}^{n} bii xi², where the bii are the diagonal entries of Q^T AQ.
Now let's classify the conics. We know that a conic is the zero locus of the function
x^T Ax + b^T x + c
for some real symmetric 2 by 2 matrix A, two dimensional vector b and constant c. By the
spectral theorem we can assume that A is diagonal. Therefore, after an isometry, com-
pleting squares and rescaling, the equation of a nondegenerate conic becomes one of
ax1² + bx2² − 1 = 0 (ellipse),
ax1² − bx2² − 1 = 0 (hyperbola),
ax1² − x2 = 0 (parabola),
where a, b > 0.
We can do the same thing in R3 and get the classification of nondegenerate quadrics
in R3 :
Theorem 7.9.2. Every nondegenerate quadric in R3 is congruent to one of the follow-
ing:
ax1² + bx2² + cx3² − 1 = 0 (ellipsoid)
ax1² + bx2² − cx3² − 1 = 0 (hyperboloid of one sheet)
ax1² − bx2² − cx3² − 1 = 0 (hyperboloid of two sheets)
ax1² + bx2² − x3 = 0 (elliptic paraboloid)
ax1² − bx2² − x3 = 0 (hyperbolic paraboloid)
where a, b, c > 0.
Since we are less familiar with the nondegenerate quadrics in R3 , let’s give some
geometric descriptions of the five cases.
1. An ellipsoid is like a distorted sphere. The intersection of any plane with an ellipsoid
is either empty, a point or an ellipse. This is why it is called an ellipsoid.
7.10 Application 2: Legendre Polynomial
Integrating by parts repeatedly, for any polynomials P and Q we have
∫_{−1}^{1} P^{(n)}(x)Q(x) dx − (−1)^n ∫_{−1}^{1} P(x)Q^{(n)}(x) dx
= ∑_{i=0}^{n−1} (−1)^i ( P^{(n−i−1)}(1)Q^{(i)}(1) − P^{(n−i−1)}(−1)Q^{(i)}(−1) ).
Sketch of Proof. If m ̸= n, we can assume that m < n, and so ⟨Pm , Pn ⟩ = 0 by the choice
of Pn . Now by the discussion above, we know that
⟨Pn , Pn ⟩ = (−1)^n ∫_{−1}^{1} P(x)P^{(2n)}(x) dx = (−1)^n · ((2n)!/(2^{2n}(n!)²)) ∫_{−1}^{1} (x² − 1)^n dx = ((2n)!/(2^{2n}(n!)²)) ∫_{−1}^{1} (1 − x²)^n dx.
It is well-known that
∫_{−1}^{1} (1 − x²)^n dx = ∫_{0}^{π} sin^{2n+1}(x) dx = 2^{2n+1}(n!)²/(2n + 1)!.
Therefore
⟨Pn , Pn ⟩ = ((2n)!/(2^{2n}(n!)²)) · (2^{2n+1}(n!)²/(2n + 1)!) = 2/(2n + 1).
Now for every f ∈ C([−1, 1], R) we can consider its orthogonal projection to R[x]≤n :
fn = ∑_{i=0}^{n} ((2i + 1)/2) ⟨f, Pi ⟩ Pi (x).
It is clear that ||fn || is increasing, and that ||f − fn ||2 + ||fn ||2 = ||f ||2 . One might guess
that ||f − fn || tends to 0 as n tends to infinity, which shows that f0 , f1 , . . . is a sequence
of polynomials that “tends to” f . This is actually true because “the Parseval’s identity
holds” in this case. Since the proof involves some calculus details, it is omitted here.
Theorem 7.10.1. (A weak form of the Stone–Weierstrass theorem) For every continuous
function f on an interval [a, b], there exists a sequence of polynomials f1 , f2 , . . . such that
lim_{n→∞} ∫_a^b (f(x) − fn (x))² dx = 0.
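One can also check the computation ⟨Pn , Pn ⟩ = 2/(2n + 1) symbolically. A short sketch of mine, using sympy's built-in legendre:

import sympy as sp

x = sp.symbols('x')
for n in range(6):
    P = sp.legendre(n, x)                       # the n-th Legendre polynomial
    assert sp.integrate(P * P, (x, -1, 1)) == sp.Rational(2, 2 * n + 1)
# orthogonality for distinct indices
assert sp.integrate(sp.legendre(2, x) * sp.legendre(5, x), (x, -1, 1)) == 0
print("verified <P_n, P_n> = 2/(2n+1)")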
7.11 Random Problem Set
2. (7.2) Suppose that ⟨·, ·⟩ is a hermitian form on V . Show that for any v ∈ V we
have ⟨v, v⟩ ∈ R.
4. (7.3) Let A be an n by n matrix such that for any i, j, we have Aij = 1 if and only
if |i − j| = 1. Since A is real symmetric, we know that every eigenvalue is real.
Find all of its eigenvalues and the corresponding eigenvectors. Is A diagonalizable?
5. (7.5) Let R[x] be a vector space equipped with the inner product
⟨f, g⟩ = ∫_0^1 f(x)g(x) dx.
7. (7.6) (Riesz representation theorem for finite dimensional real inner product space)
Suppose that ⟨·, ·⟩ is an inner product on a R-vector space V . Prove that the map
from V to V ∗ sending x to φx is an isomorphism. Here
φx (y) = ⟨x, y⟩ ∀y ∈ V.
8. (7.7) Suppose that T is a linear operator on a complex inner product space. Show
that T is normal if and only if there exists a polynomial g such that T ∗ = g(T ).
10. (7.9) (Wolstenholme's inequality) Show that if x1 , x2 , x3 , θ1 , θ2 , θ3 are reals such that
θ1 + θ2 + θ3 = π, then
x1² + x2² + x3² ≥ 2x2 x3 cos θ1 + 2x3 x1 cos θ2 + 2x1 x2 cos θ3 .
Hint: You can prove that cos² θ1 + cos² θ2 + cos² θ3 + 2 cos θ1 cos θ2 cos θ3 = 1. You
can also try to do row operations to calculate the determinant.
11. (7.10) Let F be the set of all real-valued continuous functions of period 2π defined
on R, and define an inner product on it:
⟨f, g⟩ = ∫_{−π}^{π} f(x)g(x) dx.
Show that {1, sin x, sin 2x, . . . , cos x, cos 2x, . . .} is an orthogonal set. Normalize this
orthogonal set. Express f in an infinite linear combination of 1, sin nx, cos nx where
f ∈ F satisfies
f(x) = x + π for −π ≤ x ≤ 0, and f(x) = π − x for 0 ≤ x ≤ π.
Chapter 8
Group representation
Group representation is a relatively new tool for finding properties of groups. We know
little about groups, but we know a lot about vector spaces, so we simply study groups by
examining their (linear) actions on vector spaces. This weird idea somehow takes us really
far and gives us some surprising results.
This note will focus on group representations of finite groups over the complex
numbers. Group representations behave totally differently if we consider fields of
positive characteristic or if the group is not finite, so we will not discuss those here.
8.1 Definition
Definition 8.1.1. A representation of a group G on a vector space V is a homomor-
phism from G to GL(V ) where GL(V ) denotes all the invertible operators on V . It is
faithful if it is injective. The dimension of the representation is the dimension of V .
In particular, every finite group has a faithful finite dimensional representation.
Sketch of Proof. By Cayley's theorem there exists a monomorphism from G to S|G| . Since
there is also a monomorphism from S|G| to GL|G| (C) sending a permutation to its corre-
sponding permutation matrix, we're done.
As always, we have to think about what the subobject is and what the map between
the objects is. Since we hope that a subrepresentation is still a representation of G,
the only possible way to define this is to consider subspaces of V . However, not all
subspaces give a subrepresentation; only those that are invariant under the action of G
do.
Definition. Suppose that (ρ, V ) and (ρ′ , V ′ ) are representations of G. A linear map
T : V → V ′ is equivariant if T ◦ ρ(g) = ρ′ (g) ◦ T for every g ∈ G.
If there is an equivariant map T from (ρ, V ) to (ρ′ , V ′ ) and an equivariant map T ′ from
(ρ′ , V ′ ) to (ρ, V ) such that T, T ′ are mutually inverse, then we say that ρ and ρ′ are
isomorphic.
Definition 8.1.5. A representation is irreducible if it does not have any non-zero proper
subrepresentation. If this is not the case, then the representation is said to be reducible.
Example 8.1.1. For every figure F on the plane R2 , if the symmetry group of F fixes
a point (which we may take to be the origin), then the symmetry group of F is a sub-
group of GL(R2 ), which naturally induces a faithful representation of dimension 2 over R.
Take F to be an equilateral triangle; then we get a faithful representation of S3 . Since
GL2 (R) is a subgroup of GL2 (C), we get a faithful representation of S3 over C of
dimension 2. Call it ρ′2 .
It seems plausible that every representation can be decomposed into a direct sum of
irreducible representations. This is indeed the case when G is finite and the representation
is finite dimensional and over C.
8.2 Unitary Representation and Maschke's Theorem
To prove this decomposition, by induction on the dimension we only need to show that
if ρ1 is a proper subrepresentation of ρ, then there exists another subrepresentation ρ2
such that ρ = ρ1 ⊕ ρ2 . In other words, we need to show that if V ′ is a proper nontrivial
G-invariant subspace, then there exists a G-invariant subspace V ′′ such that V = V ′ ⊕ V ′′ .
The problem here is that given a V ′ , there is no canonical way to choose V ′′ such that
V = V ′ ⊕ V ′′ . The key point here is that, looking back at Example 8.1.2, we find that the
two G-invariant subspaces are orthogonal complement of each other with respect to the
standard inner product. Therefore maybe we can define an appropriate inner product
⟨·, ·⟩ on V and take V ′′ to be the orthogonal complement. To show that V ′′ is G-invariant,
we hope that for every g ∈ G, we have
⟨v ′ , v ′′ ⟩ = 0 ∀v ′ ∈ V ′ ⇒ ⟨v ′ , ρ(g)v ′′ ⟩ = 0 ∀v ′ ∈ V ′ .
Since ρ(g) is invertible and V ′ is G-invariant, the condition
⟨v ′ , ρ(g)v ′′ ⟩ = 0 ∀v ′ ∈ V ′
is the same as
⟨ρ(g)v ′ , ρ(g)v ′′ ⟩ = 0 ∀v ′ ∈ V ′ .
Therefore we will naturally hope that the inner product satisfies
⟨ρ(g)u, ρ(g)v⟩ = ⟨u, v⟩
for all u, v ∈ V .
Definition 8.2.1. Suppose that V is an inner product space over C and ρ is a repre-
sentation of G on V . The representation ρ is said to be unitary if for every g ∈ G, the
operator ρ(g) is unitary.
Now our goal is clear: we have to construct an inner product such that the given
representation is unitary with respect to it. The trick that we will use is the averaging
trick, which will appear again soon after this, so keep it in mind.
Lemma 8.2.1. Suppose that ρ is a representation of a finite group G on V over C.
Then there exists an inner product on V such that ρ is unitary with respect to the inner
product.
Sketch of Proof. Let’s begin with an arbitrary inner product {·, ·} and try to construct
the desired inner product. For every v, w ∈ V , define
⟨v, w⟩ = (1/|G|) ∑_{g∈G} {ρ(g)v, ρ(g)w}.
It is easy to verify that ⟨·, ·⟩ is still hermitian and positive definite. Also, for every g ′ ∈ G
we have
⟨ρ(g ′ )v, ρ(g ′ )w⟩ = (1/|G|) ∑_{g∈G} {ρ(gg ′ )v, ρ(gg ′ )w} = (1/|G|) ∑_{g∈G} {ρ(g)v, ρ(g)w} = ⟨v, w⟩
and so ρ is unitary.
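To see the averaging trick concretely, here is a small numpy sketch of mine using the permutation representation of S3 on C³ (a representation we will meet again in the problem set):

import itertools
import numpy as np

def perm_matrix(p):
    # the matrix sending e_i to e_{p(i)}
    M = np.zeros((len(p), len(p)))
    for i, j in enumerate(p):
        M[j, i] = 1.0
    return M

reps = [perm_matrix(p) for p in itertools.permutations(range(3))]

# an arbitrary positive definite form {v, w} = v* H w, not G-invariant
H = np.array([[5.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]])
# averaging trick: <v, w> = (1/|G|) sum_g {rho(g)v, rho(g)w}
H_avg = sum(R.T @ H @ R for R in reps) / len(reps)
# every rho(g) is unitary for the averaged form
print(all(np.allclose(R.T @ H_avg @ R, H_avg) for R in reps))  # True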
Theorem 8.2.1. Suppose that ρ is a finite dimensional representation of a finite group
G on V over C. Then ρ can be decomposed as a direct sum of irreducible representations.
8.3 Schur's Lemma
Suppose that T is an equivariant map from (ρV , V ) to (ρW , W ). If v ∈ ker T , then
T ρV (g)v = ρW (g)T v = 0, so ker T is a ρV -invariant subspace. If w = T v ∈ im T , then
ρW (g)w = ρW (g)T v = T ρV (g)v ∈ im T , so im T is a ρW -invariant subspace.
Now if ρV and ρW are irreducible, then we immediately get that T must be trivial or
be an isomorphism. In fact, we can prove something even stronger.
Theorem 8.3.1. (Schur’s Lemma) Suppose that ρV , ρW are irreducible representations
of G on V, W over C respectively, and T is an equivariant map from V to W . If V, W are
not isomorphic, then T = 0. If V = W , then T is a scalar multiple of the identity map.
8.4 Interlude: Tensor Product
A bilinear map ϕ : U × V → W is not a linear transformation (because we don't even
have a good way to define a structure of vector space on U × V ), which is kind of sad.
If there exists a vector space U ⊗ V such that every bilinear map ϕ : U × V → W induces
naturally and bijectively a linear map ϕ′ : U ⊗ V → W , then maybe we will be happier.
The first intuition to build U ⊗ V is to consider the vector space spanned by u ⊗ v for
all u ∈ U, v ∈ V . Then we can simply define ϕ′ (u ⊗ v) = ϕ(u, v). We nonetheless have to
record the bilinearity of ϕ in U ⊗ V . Since for every c ∈ F, u1 , u2 ∈ U, v ∈ V we have
ϕ(u1 + cu2 , v) = ϕ(u1 , v) + cϕ(u2 , v),
we automatically hope that (u1 + cu2 ) ⊗ v = u1 ⊗ v + cu2 ⊗ v. Similar property should
also hold when U, V are interchanged.
Definition 8.4.1. For any two vector spaces U, V over F , the tensor product U ⊗ V of
U and V is the vector space spanned by u ⊗ v for all u ∈ U, v ∈ V . Here u ⊗ v satisfies
the relation
(u1 + cu2 ) ⊗ v = u1 ⊗ v + cu2 ⊗ v, u ⊗ (v1 + cv2 ) = u ⊗ v1 + cu ⊗ v2 .
Property 8.4.1. (Universal property of tensor product) For any vector spaces U, V, W
over F , if ϕ : U × V → W is a bilinear map, then there exists a unique linear map
ϕ′ : U ⊗ V → W such that ϕ′ (u ⊗ v) = ϕ(u, v). In other words, there exists a unique
linear map ϕ′ such that ϕ = ϕ′ ◦ φ, where φ : U × V → U ⊗ V is the canonical map
(u, v) ↦ u ⊗ v.
Property 8.4.2. If U and V are finite dimensional, then U ∗ ⊗ V is isomorphic to
L(U, V ), the space of linear maps from U to V .
Sketch of Proof. From the universal property, it is natural to construct a bilinear map
ϕ : U ∗ × V → L(U, V ). For every f ∈ U ∗ and v ∈ V , we have to decide what ϕ(f, v)
is. Since ϕ(f, v) is a linear map from U to V , we have to decide what ϕ(f, v)u is for all
u ∈ U . It is then natural to let
ϕ(f, v)u = (f u)v.
It is easy to show that ϕ is well-defined and bilinear. Therefore there is an induced linear
map ϕ′ from U ∗ ⊗ V to L(U, V ). We can verify that ϕ′ is an isomorphism by directly
constructing the inverse map
ϕ′−1 (T ) = ∑_i u∗i ⊗ T (ui )
where u1 , . . . , un form a basis of U and u∗1 , . . . , u∗n form the corresponding dual basis of
U ∗.
Besides vector spaces, we can also define the tensor product of linear transformations.
Definition 8.4.2. Suppose that T1 : U1 → V1 and T2 : U2 → V2 are two linear maps,
then the tensor product T1 ⊗ T2 is the linear map from U1 ⊗ U2 to V1 ⊗ V2 such that
(T1 ⊗ T2 )(u1 ⊗ u2 ) = T1 (u1 ) ⊗ T2 (u2 )
for all u1 ∈ U1 , u2 ∈ U2 .
Now that we know how to define the tensor product of linear transformations, we can
also define the tensor product of the representation.
Definition 8.4.3. Let ρ1 , ρ2 be the representations of G on V1 , V2 , respectively. Then
the tensor product ρ1 ⊗ ρ2 of ρ1 and ρ2 is the representation on V1 ⊗ V2 that satisfies
(ρ1 ⊗ ρ2 )(g) = ρ1 (g) ⊗ ρ2 (g) for all g ∈ G.
8.5 Character
The data ρ : G → GL(V ) is often too complicated to deal with. If V is finite dimensional,
then we can consider some forgetful map GL(V ) → F that simplifies the data. So far
we’ve always considered determinant when we need to choose a forgetful map. However
determinant does not work quite well in this case (see the problem set for explanations).
In this specific situation, it turns out that taking the trace works better.
Definition 8.5.1. Suppose that ρ is a finite dimensional representation of G. The
character χρ of ρ is a function from G to F such that
χρ (g) = tr ρ(g) ∀g ∈ G.
If G is finite, then we know that for each g ∈ G there exists k ∈ N such that ρ(g)^k = id.
This shows that if ρ is over C, then all the eigenvalues of ρ(g) must be k-th roots of
unity. This tells us that:
Property 8.5.2. If ρ is a representation over C of G of dimension n and G is finite,
then χρ (g) is a sum of n roots of unity. Moreover, χρ (g −1 ) is the complex conjugate of
χρ (g).
Although we choose to forget something, we still hope that χ has some interaction
with the structure of G. This can be achieved by realizing χρ (gh) = tr ρ(g)ρ(h) =
tr ρ(h)ρ(g) = χρ (hg). We can rewrite this in a slightly different way:
Property 8.5.3. Suppose that g, g ′ are conjugates in G, then χρ (g) = χρ (g ′ ).
Property 8.5.4. If G is finite, then the class functions on G (the functions G → C
constant on each conjugacy class) form a vector space over C. Its dimension is the class
number of G, i.e. the number of conjugacy classes.
Example 8.5.1. Let’s get back to our favorite example S3 . We want to evaluate the
possible irreducible characters of S3 . Note that isomorphic representations give the same
characters, so we only need to consider the non-isomorphic irreducible representations.
There are two one-dimensional characters, namely the identity and the sign map. To-
gether with the two-dimensional irreducible representation that we have already dis-
cussed, we have three irreducible representations of S3 now.
Besides, there are three conjugacy classes of S3 . Since characters are class functions,
we only need to determine the values they take on the identity, (1 2) and (1 2 3). By
simple calculation, we know that the three characters are as the following:
e (1 2) (1 2 3)
χ1 1 1 1
χ2 1 −1 1
χ3 2 0 −1
It is easy to see that χ1 , χ2 , χ3 actually form a basis of the class functions. More strikingly,
if we consider the inner product
⟨f, g⟩ = (1/|G|) ∑_{x∈G} f(x) \overline{g(x)},
then χ1 , χ2 , χ3 actually form an orthonormal basis! This tells us that maybe considering
the character that we define can give us some really strong statements that hold, and we
will prove some of them in the next section.
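This orthonormality can be checked by brute force over the six elements of S3 ; a small Python sketch of mine:

import itertools
from fractions import Fraction

def fixed_points(p):
    # in S3, the number of fixed points determines the conjugacy class
    return sum(1 for i, j in enumerate(p) if i == j)

# character values indexed by fixed points: 3 -> e, 1 -> (1 2)-type, 0 -> (1 2 3)-type
chars = [
    {3: 1, 1: 1, 0: 1},    # chi_1, trivial
    {3: 1, 1: -1, 0: 1},   # chi_2, sign
    {3: 2, 1: 0, 0: -1},   # chi_3, two dimensional
]

G = list(itertools.permutations(range(3)))
def inner(f, g):
    # <f, g> = (1/|G|) sum_x f(x) conj(g(x)); all values here are real
    return Fraction(sum(f[fixed_points(x)] * g[fixed_points(x)] for x in G), len(G))

print(all(inner(chars[i], chars[j]) == (1 if i == j else 0)
          for i in range(3) for j in range(3)))  # True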
8.6 Orthonormality of Irreducible Characters
Theorem. The irreducible characters of a finite group G form an orthonormal basis of
the space of class functions on G.
Sketch of Proof. We first prove that the irreducible characters are orthonormal.
Suppose that ρ, ρ′ are two irreducible representations of finite dimension on V and V ′ ,
respectively. Then
⟨χρ , χρ′ ⟩ = (1/|G|) ∑_{g∈G} χρ (g) \overline{χρ′ (g)} = (1/|G|) ∑_{g∈G} χρ (g) χρ′ (g −1 ) = tr( (1/|G|) ∑_{g∈G} ρ(g) ⊗ ρ′ (g −1 ) ).
Note that
(1/|G|) ∑_{g∈G} ρ(g) ⊗ ρ′ (g −1 )
is a linear operator on V ⊗ V ′ , which is isomorphic to L(V ′ , V ), the vector space consisting
of all linear transformations from V ′ to V . Let T be any linear transformation from V ′ to
V , and let ϕ be any isomorphism from L(V ′ , V ) to V ⊗ V ′ . It is easy to verify that
(ρ(g) ⊗ ρ′ (g −1 ))(ϕ(T )) = ϕ(ρ(g)T ρ′ (g −1 )).
Let φ be the linear operator on L(V ′ , V ) such that
φ(T ) = (1/|G|) ∑_{g∈G} ρ(g) T ρ′ (g −1 ),
then we have ⟨χρ , χρ′ ⟩ = tr φ. The main observation here is that any linear transformation
in im φ is equivariant. Therefore it is a direct corollary of Schur’s lemma that if ρ and ρ′
are not isomorphic, then ⟨χρ , χρ′ ⟩ = 0. If ρ and ρ′ are isomorphic, then we can WLOG
assume that ρ = ρ′ . By Schur's lemma, φ(T ) is a scalar multiple of id for every T , so
im φ is the one dimensional space spanned by id. Moreover, it is easy to verify that
φ(id) = id and that φ ◦ φ = φ, so φ is a projection onto im φ. Therefore ⟨χρ , χρ′ ⟩ =
tr φ = dim im φ = 1, as desired.
Now it remains to show that all the irreducible characters indeed form a basis. If f
is a class function such that
⟨f, χ⟩ = 0
for any irreducible character χ, we have to show that f = 0. Let ρ be an arbitrary
irreducible representation of G over C. Then it is easy to verify that the linear operator
Tρ = (1/|G|) ∑_{g∈G} f(g) ρ(g −1 )
is equivariant. Also
tr Tρ = ⟨f, χρ ⟩ = 0.
Hence by Schur’s lemma Tρ = 0. Now consider the regular representation ρreg . By
Maschke’s theorem we know that ρreg can be decomposed as a direct sum of irreducible
representations. Therefore the linear operator
Tρreg = (1/|G|) ∑_{g∈G} f(g) ρreg (g −1 )
is also zero. Now since the operators ρreg (g), g ∈ G, are linearly independent, it is clear
that f = 0.
Corollary 8.6.1. Suppose that G is a finite group. Then the character of a repre-
sentation determines the representation uniquely up to isomorphism. In particular, if
χ1 , . . . , χn are all the irreducible characters, ρ1 , . . . , ρn are the corresponding irreducible
representations, and ρ is a representation with character χ, then
ρ = ⊕_{i=1}^{n} ⟨χ, χi ⟩ ρi .
Corollary 8.6.2. Suppose that G is a finite group. For any irreducible character χ,
we have ⟨χ, χreg ⟩ ̸= 0. In other words, we can get all the irreducible representations by
decomposing the regular representation.
Corollary 8.6.3. Suppose that G is a finite group, χ1 , . . . , χn are all the irreducible
characters, and di is the dimension of ρi . Then ∑_{i=1}^{n} di² = |G|.
Example 8.6.1. Let’s use the theorem that we proved to evaluate the character table
of A4 (or equivalently the tetrahedral group T ). We know that there are four conju-
gacy classes of A4 , where id, (1 2 3), (1 3 2), (1 2)(3 4) are the four representatives.
Therefore there are four irreducible characters. Suppose that di is the dimension of the
i-th irreducible character, then d21 + d22 + d23 + d24 = 12. The only solution to this is
d1 = d2 = d3 = 1, d4 = 3. We can then conclude that the character table is of the
following form.
id (1 2 3) (1 3 2) (1 2)(3 4)
χ1 1 1 1 1
χ2 1 a b c
χ3 1 a′ b′ c′
χ4 3 x y z
Note that A4 /K4 ∼ = C3 . This directly provides us three 1-dimensional characters. There-
fore we can fill in some more information:
id (1 2 3) (1 3 2) (1 2)(3 4)
χ1 1 1 1 1
χ2 1 ω ω2 1
χ3 1 ω2 ω 1
χ4 3 x y z
Now we can compute χ4 by the fact that χ1 , χ2 , χ3 , χ4 are orthonormal. After some easy
calculation, one can get that x = y = 0 and z = −1.
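The completed table can be double-checked numerically: weighting each column by its class size 1, 4, 4, 3, the four rows are orthonormal. A quick Python sketch of mine:

# conjugacy class sizes of A4: id, (1 2 3)-type, (1 3 2)-type, (1 2)(3 4)-type
sizes = [1, 4, 4, 3]
w = complex(-0.5, 3 ** 0.5 / 2)     # omega, a primitive cube root of unity
table = [
    [1, 1, 1, 1],
    [1, w, w ** 2, 1],
    [1, w ** 2, w, 1],
    [3, 0, 0, -1],
]

def inner(u, v):
    # <u, v> = (1/|G|) sum over classes of size * u * conj(v), with |A4| = 12
    return sum(s * complex(a) * complex(b).conjugate()
               for s, a, b in zip(sizes, u, v)) / 12

print(all(abs(inner(table[i], table[j]) - (1 if i == j else 0)) < 1e-9
          for i in range(4) for j in range(4)))  # True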
Example 8.6.2. Besides computing character tables, we can also use this to get
some general results on the class number. Suppose that G is a finite group of odd
order and c is its class number. Moreover, suppose that d1 , . . . , dc are the dimensions of
the irreducible characters of G. Then we have
|G| = ∑_{i=1}^{c} di² ≡ ∑_{i=1}^{c} 1 ≡ c (mod 8)
because each di is odd (di divides |G|). Compared with the method used in Problem 5
in Chapter 6, this is much more straightforward.
8.7 Restricted and Induced Representation
Given a representation ρ of G and a subgroup H of G, the restriction ρ|H is simply ρ
viewed as a representation of H; its character is χ|H . This is pretty straightforward.
The question here is: can we do something that is somehow the inverse of taking the
restriction? That is, given a representation of a subgroup H, how can we construct a
representation of G from that?
Definition 8.7.2. Suppose that H is a subgroup of G, and ρ is a representation of H
on a vector space V . Let VHG be the vector space spanned by the formal symbols gv
(g ∈ G, v ∈ V ) subject to
g(cv1 + v2 ) = cgv1 + gv2
and
(gh)v = g(ρ(h)v)
for all g ∈ G, h ∈ H. Then G acts on VHG naturally by g(g ′ v) = (gg ′ )v for any g, g ′ ∈ G.
This is a representation of G on VHG , denoted by Ind_H^G ρ, and is called the induced
representation of ρ.
We can also state the definition of Ind_H^G ρ in another way. Let gs (s ∈ S) be the
representatives of the left cosets of H. Then VHG is simply
⊕_{s∈S} gs V.
For every g and any s ∈ S, we know that ggs falls into some coset of H. Let gσ(s) be its
representative, and write ggs = gσ(s) hs with hs ∈ H. Then we can define
(Ind_H^G ρ)(g)(gs v) = gσ(s) (ρ(hs )v).
Example 8.7.1. Suppose that the subgroup H of G is trivial and ρ is the trivial
representation of H. Then Ind_H^G ρ is the regular representation of G.
Property. Suppose that χ is the character of a representation ρ of H. Then the
character of Ind_H^G ρ is given by
Ind_H^G χ(x) = (1/|H|) ∑_{g∈G, gxg −1 ∈H} χ(gxg −1 ).
Sketch of Proof. Since we can decompose VHG into ⊕s∈S gs V , it suffices to calculate the
traces block-wise and sum them up. For a fixed x, the trace of IndG H ρ(x) on gs V is
zero if gσ(s) ̸= gs . Therefore we only need to consider the case σ(s) = s. In this case,
gs−1 xgs ∈ H and the trace is χ(gs−1 xgs ). Note that all the elements in gs ’s coset also
satisfy this. Therefore
Ind_H^G χ(x) = ∑_{s∈S, gs−1 xgs ∈H} χ(gs−1 xgs ) = (1/|H|) ∑_{g∈G, gxg −1 ∈H} χ(gxg −1 ).
The same formula can be used to define the induction of an arbitrary class function f
on H:
Ind_H^G f(x) = (1/|H|) ∑_{g∈G, gxg −1 ∈H} f(gxg −1 ).
With this property/definition, we can relate the inner product of the characters on G
to the one of the characters on H.
Theorem 8.7.1. (Frobenius Reciprocity) Suppose that G is a finite group, H is a
subgroup of G, and χ, ϕ are class functions on G, H, respectively. Then
⟨χ, Ind_H^G ϕ⟩G = ⟨χ|H , ϕ⟩H .
Sketch of Proof.
⟨χ, Ind_H^G ϕ⟩G = (1/|G|) ∑_{x∈G} χ(x) \overline{Ind_H^G ϕ(x)}
= (1/(|G||H|)) ∑_{x∈G} ∑_{y∈G, yxy −1 ∈H} χ(x) \overline{ϕ(yxy −1 )}
= (1/|H|) ∑_{x∈H} (1/|G|) ∑_{y∈G} χ(y −1 xy) \overline{ϕ(x)}
= (1/|H|) ∑_{x∈H} χ(x) \overline{ϕ(x)}
= ⟨χ|H , ϕ⟩H .
Example 8.7.2. Let H = {e} and ϕ be the trivial character. Then Ind_H^G ϕ is the
regular character χreg . For every character χ of G, it is easy to see that χ|H is simply
the dimension of χ. Hence,
⟨χ, χreg ⟩G = ⟨χ|H , ϕ⟩H = χ(e),
the dimension of the corresponding representation.
8.8 Dual Representation
Given a representation ρ of G on Cn , the dual representation ρ∗ is the representation
for which the natural pairing between a space and its dual is preserved; in matrix terms
we want
ρ∗ (g)T ρ(g) = In ,
which forces
ρ∗ (g) = ρ(g −1 )T .
Theorem 8.8.1. Suppose that |G| is odd. Then the only self-dual irreducible repre-
sentation of G is the trivial representation.
by the orthogonality. Now for any g ∈ G that has order d, we know that the eigenvalues
of ρ(g) are d-th roots of unity. Therefore
∑_{1≤d′ ≤d, gcd(d′ ,d)=1} χρ (g^{d′})
is an integer. Since ρ is self-dual, we know that ρ(g), ρ(g −1 ) share the eigenvalues. This
shows that we can pair up the non-real eigenvalues of ρ(g) and get that
∑_{1≤d′ ≤d, gcd(d′ ,d)=1} χρ (g^{d′})
8.9 Random Problem Set
3. (8.2) Let S3 act on C3 by permuting the coordinates, and let W be the subspace
whose coordinates sum to zero. Let ρ : S3 → GL(W ) be the corresponding repre-
sentation. Choose a basis of W and write the representation explicitly. Find an
inner product on W such that ρ is unitary. Determine if ρ is irreducible or not.
4. (8.2) Show that if A, B are both diagonalizable linear operators on a finite dimen-
sional vector space, and if they commute, then they are simultaneously diagonalizable.
That is, there exists a basis that diagonalizes both A and B. Using this, prove that
every finite dimensional
representation of a finite abelian group over C decomposes as a direct sum of one
dimensional representations.
5. (8.2) Suppose that G is a finite abelian group. Show that the number of irreducible
representations of G over C is |G|.
Hint: You might need to use the fundamental theorem of finite abelian groups.
Google it if you don’t know what that is.
6. (8.3) Show that Schur’s lemma does not hold if we replace C with R.
10. (8.6) Show that for any prime p, any group of order p² is abelian. This time, use
character theory to prove this.
11. (8.8) Let c be the number of conjugacy classes of a finite group G. Show that if G
is of odd order, then
|G| ≡ c mod 16.
Chapter 9
Ring
We have learned about groups in the preceding chapters. However, in many situations
we want to know not only about a single law of composition but also about the
interaction between two laws of composition. For example, on the integers we can
consider addition and multiplication at the same time. The abstract concept capturing
this is called a ring.
One can view a field as a ring in which we can divide by any non-zero element.
However, fields and rings are substantially distinct from each other. In fact, the structure
of a general ring is so complicated that usually one is only interested in some particular
classes of rings. In this chapter, we will introduce some basic properties of rings and
various extra constraints that we would like to add to the structure of rings.
9.1 Definition
Definition 9.1.1. A ring (R, +, ·) is a set R with two laws of composition such that
(R, +) is an abelian group, · is associative, and the distributive laws x(y + z) = xy + xz
and (x + y)z = xz + yz hold for all x, y, z ∈ R.
Remark. Here we don't require a ring to have a multiplicative identity. Some authors
tend to assume that a ring has a multiplicative identity. To avoid ambiguity, we will from
now on call a ring that does not necessarily have a multiplicative identity a “rng”, and
call a ring that has a multiplicative identity a “ring with 1”.
Example 9.1.1. (Z, +, ×), (F, +, ×), (F [x], +, ×), (Z/nZ, +, ×), Mn×n (F ) are rings with
1. (xF [x], +, ×) is a rng. (N, +, ×), (Z, +, −) are not rngs.
Property 9.1.1. Suppose that R is a rng and 0 is the additive identity. Then 0 · x = 0
for all x ∈ R. Besides, if we denote the sum of n x’s by nx (where n can be negative),
then (nx) · y = n(x · y).
From now on, we will drop · if it does not lead to any ambiguity.
Unlike in a field, there might be two non-zero elements whose product is zero. For
example, 2 · 3 = 0 in Z/6Z.
Definition 9.1.2. If R is a rng and a ∈ R is an element such that there exists b ∈ R\{0}
satisfying ab = 0, then a is called a left zero divisor of R. If there exists b ∈ R\{0} such
that ba = 0, then a is called a right zero divisor of R. If both cases occur simultaneously,
then a is called a two-sided zero divisor. If an element is not a zero divisor, then it is
said to be regular.
Definition 9.1.3. If R is a ring with 1 and a is an element such that the multiplicative
inverse exists, then we call a a unit.
Property 9.1.2. For every ring with 1, the units form a multiplicative group.
In the case of groups, we are sometimes interested in the case when the elements com-
mute with each other. We would also like to restrict ourselves to this case for rings
sometimes, as this simplifies the discussion a lot and is necessary for some extra results.
Definition 9.1.4. If xy = yx for any x, y ∈ R, then R is commutative. If R is
commutative and is a ring with 1, then we say that R is a commutative ring with 1.
Definition 9.1.5. If R is a ring with 1 whose non-zero elements are all regular, then R
is said to be a domain. If it is furthermore commutative, then R is an integral domain.
The name “integral domain” comes from the fact that integral domains share a lot of
properties with Z. This will be more clear in the following sections.
Property 9.1.3. If R is a finite integral domain, then R is a field.
Sketch of Proof. Let x be a non-zero element. Then the map a 7→ ax is injective since x
is not a zero divisor. As R is finite, this shows that the map is bijective, and so there
exists a such that ax = 1. Since R is commutative, a is the multiplicative inverse and so
R is a field.
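The pigeonhole argument can be watched concretely in Z/7Z; a tiny Python check of mine:

# in the finite integral domain Z/7Z, a -> a*x is injective for x != 0,
# hence bijective, so 1 is hit: every non-zero x has an inverse
p = 7
for x in range(1, p):
    images = {a * x % p for a in range(p)}
    assert len(images) == p                          # a -> ax is bijective
    inv = next(a for a in range(p) if a * x % p == 1)
    print(f"{x}^(-1) = {inv} in Z/{p}Z")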
Remark. In fact, we just need the condition that R is a finite domain. In this case,
this statement is Wedderburn’s little theorem. This is not relevant to the content of this
note and so is not included. The readers are welcome to google for it if interested.
9.2 Ring Homomorphism
Definition. A ring homomorphism ϕ : R → R′ between two rngs is a homomorphism
of the additive groups that also respects multiplication, i.e. ϕ(xy) = ϕ(x)ϕ(y) for all
x, y ∈ R. Equivalently, applying ϕ and then multiplying in R′ is the same as multiplying
in R and then applying ϕ.
Remark. In the context that rings are required to have multiplicative identities, we
usually furthermore require that ring homomorphism send the multiplicative identity to
the multiplicative identity. There are examples that a ring homomorphism does not send
1 to 1, but it is easy to verify that a ring homomorphism always sends 1 to the 1 of its
image.
Example 9.2.2. Consider the map ϕ : Z/3Z → Z/6Z such that ϕ(x + 3Z) = 4x + 6Z.
Then ϕ is a ring homomorphism. Note that ϕ(1 + 3Z) = 4 + 6Z, which is not the
multiplicative identity of Z/6Z but the multiplicative identity of the subring im ϕ =
2Z/6Z.
Definition 9.2.2. Suppose that R, R′ are two rngs. We say that R, R′ are isomorphic
if there exists a bijective ring homomorphism between R, R′ .
9.3 Subring and Ideal
Definition. A subring of a rng R is a subset of R that is closed under addition, negation
and multiplication, so that it is itself a rng under the induced operations.
Remark. Usually if one assumes that an ordinary ring should contain a multiplicative
identity, then one would require a subring to contain the same multiplicative identity.
This restriction however makes some ideals not subrings. Also, as we will see, if R is a
ring with 1 and S is a subring of R that is also a ring with 1, then 1S is not necessarily
1R . In commutative algebra the restriction that the subring contains the exact same
multiplicative identity makes sense, but for our purposes this will not be needed.
Example 9.3.1. Consider the ring with 1 Z/6Z. The subset 2Z/6Z is a subring with 1.
However, the multiplicative identity of Z/6Z is 1 + 6Z, while the multiplicative identity
of 2Z/6Z is 4 + 6Z.
The concept of subring turns out to be somewhat useless, unlike the case of subgroups.
Instead, we are more interested in ideals, which are the ring analogue of normal subgroups.
To see how we should define an ideal, let's see what we need to make a “quotient ring”
make sense.
We want the multiplication on R to descend to R/I along the canonical projection
π : R → R/I; that is, multiplying and then applying π should agree with applying π
and then multiplying in R/I.
Sketch of Proof. It is clear that we have to define (x + I)(y + I) to be xy + I. We have
to check what it means for this to be well-defined. That is, we need
(x + i1 )(y + i2 ) − xy = x i2 + i1 y + i1 i2 ∈ I
for all x, y ∈ R and i1 , i2 ∈ I, which motivates the following definition.
Definition 9.3.2. If I is a subring of a rng R such that IR, RI ⊆ I, then we say that
I is an ideal of R.
Remark. In the world of groups we reserve the notation ⟨·⟩ for the subgroup generated
by some set. It turns out nonetheless that ideals are so much more useful than subrings
that we choose to reserve the notation for the ideal generated by a set instead.
Property 9.3.3. Suppose that R is a ring with 1 and u is a unit, then ⟨u⟩ = R.
Now that we’ve determined what a ring homomorphism is and what an ideal is, we
can state the isomorphism theorems in terms of rings. The proofs are left as exercises.
Theorem 9.3.1. (First isomorphism theorem) Suppose that ϕ : R → R′ is a ring
homomorphism, then ker ϕ is an ideal of R and im ϕ is a subring of R′ . Moreover,
R/ ker ϕ is isomorphic to im ϕ.
9.4 Integral Domain and Divisibility
Now let’s first think over what “divisible” in Z means. If a, b are two integers such
that there exists c ∈ Z satisfying ac = b, then we say that b is divisible by a. Now it is
clear how we should generalize this.
Definition 9.4.1. Suppose that R is an integral domain, then b is divisible by a if there
exists c ∈ R such that ac = b. In this case, we say that a is a divisor of b, and b is a
multiple of a. This is denoted as a|b.
Property 9.4.3. If a|b and b|c, then a|c. For any a, b, c ∈ R, we have that a|b if and
only if a|(b + ca).
In Z, we know that if a|b and b|a, then a = ±b. In other words, we cannot differentiate
a from −a if we only use divisibility to differentiate them. The underlying factor here is
that ±1 are the only units in Z. A similar phenomenon occurs in the general case.
Property 9.4.4. If a|b and b|a, then there exists a unit u such that b = au.
Sketch of Proof. Let b = au and a = bv. Then a = uva, and so by the law of cancellation
we have a = 0 or uv = 1. If uv = 1, then we’re done. If a = 0, then it is clear that b = 0,
and so the statement still holds.
Definition 9.4.2. For any a, b in an integral domain R, we say that a, b are associates
if there exists a unit u such that b = au.
123
Hung-Hsun, Yu 9.4 Integral Domain and Divisibility
Example 9.4.1. Consider the integral domain C[x]. We know that x|2x and 2x|x, and
so x, 2x are associates.
Next, let’s think about what greatest common divisor and least common multiple
mean. The greatest common divisor d of a, b is the greatest number among all the
common divisors of a, b, i.e. d is the largest number that satisfies d|a and d|b. However,
we do not necessarily have a good well-ordering on the integral domain R, so let’s try to
describe the greatest common divisor only with divisibility.
If we assume the fundamental theorem of arithmetic, then we know that if d is the
greatest common divisor of a, b and x is an arbitrary common divisor of a, b, then x|d.
Note that this statement only involves divisibility, so let’s make this the definition of
greatest common divisor. We can make a similar definition for least common multiple.
Definition 9.4.3. Let a, b be two elements in an integral domain R. For any x ∈ R, if
x|a, x|b then we say that x is a common divisor of a, b. If a|x, b|x then we say that x is a
common multiple of a, b. If d is a common divisor of a, b such that any common divisor of
a, b divides d, then we say that d is the greatest common divisor of a, b. If l is a common
multiple of a, b such that any common multiple is divisible by l, then we say that l is the
least common multiple.
Property 9.4.6. (Uniqueness of gcd and lcm) If d, d′ are both gcd’s of a, b, then d, d′
are associates. Similarly, if l, l′ are both lcm’s of a, b, then l, l′ are associates.
Sketch of Proof. Since d′ is itself a gcd of a, b, we have by definition that d′ |d. Similarly
d|d′ , and so d, d′ are associates. Same argument works for lcm.
Note that with this definition, the greatest common divisor need not be positive in
Z. The reason that we always make gcd and lcm positive is simply because that we
prefer positive integers to negative integers. However, in the general setting there is no
particular reason that we should favor one associate over another one.
Besides, at this point it is unclear whether gcd and lcm always exist. In fact, there
are a lot of scenarios in which gcd and lcm do not exist. Soon we will see that if we add
some constraints to the integral domain R, then it will be guaranteed that gcd and lcm
always exist.
Now let’s look at the definition of prime numbers in Z. The usual definition is that
p is prime if p cannot be factored into ab such that a, b ̸= ±1. In the general setting
however, we are used to calling this property irreducible.
Definition 9.4.4. A non-zero non-unit element a ∈ R is irreducible if it cannot be
written as a = xy where x, y are not units.
The term prime is reserved for another property that the primes in Z also have.
Definition 9.4.5. A non-zero non-unit element a ∈ R is prime if for any b, c ∈ R such
that a|bc, we always have a|b or a|c.
Property. In an integral domain, every prime element is irreducible.
Sketch of Proof. If p is prime in R and p = xy, then p|xy, so WLOG assume that p|x.
Since x|p, we have that p, x are associates, and so y is a unit.
9.5 Ideal and Divisibility
Property. If R is an integral domain and x ∈ R, then the ideal generated by x is
⟨x⟩ = xR.
Sketch of Proof. Actually we just need that R is a commutative ring with 1. It is easy
to verify that xR is an ideal that contains x, so it remains to show that if x ∈ I for an
ideal I, then xR ⊆ I. This is clear by the definition of an ideal.
Corollary 9.5.1. For any elements a, b ∈ R, we have a|b if and only if ⟨b⟩ ⊆ ⟨a⟩.
The question here is: how should we find the greatest common divisor? We have to
find an element d such that ⟨d⟩ is the smallest principal ideal containing ⟨a, b⟩. If there
exists d such that ⟨d⟩ = ⟨a, b⟩, then we know that d must be the greatest common divisor.
Similarly, if there exists l such that ⟨l⟩ = ⟨a⟩ ∩ ⟨b⟩, then l is the least common multiple.
We naturally hope that every ideal in R can be expressed in the form ⟨d⟩, or in other
words, is “principal”.
Definition 9.5.1. An ideal I is principal if there exists x ∈ I such that I = ⟨x⟩.
Definition 9.5.2. An integral domain R is a principal ideal domain (PID) if every ideal
in R is principal.
Example 9.5.1. As we soon will see, Z is a PID. Now let’s take a = 12 and b = 18.
Then 12Z+18Z = 6Z, and so gcd(12, 18) = 6. 12Z∩18Z = 36Z, and so lcm(12, 18) = 36.
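A tiny Python check of this example (mine, searching a finite window of Z):

a, b = 12, 18
# 12Z + 18Z is an ideal of Z; its generator is its smallest positive element
combos = {a * x + b * y for x in range(-50, 51) for y in range(-50, 51)}
print(min(n for n in combos if n > 0))   # 6, so 12Z + 18Z = 6Z and gcd(12, 18) = 6
# 12Z ∩ 18Z is generated by the smallest positive common multiple
print(min(n for n in range(1, 1000) if n % a == 0 and n % b == 0))  # 36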
Theorem 9.5.1. (Bezout’s theorem) Suppose that R is a PID. Then for any a, b, c ∈ R,
there exist x, y ∈ R such that ax + by = c if and only if gcd(a, b)|c.
Sketch of Proof. There exist x, y ∈ R such that ax + by = c if and only if c ∈ ⟨a⟩ + ⟨b⟩.
Since R is PID, we know that ⟨a, b⟩ = ⟨d⟩ where d = gcd(a, b), and so the condition is
equivalent to c ∈ ⟨d⟩, which is equivalent to d|c.
We can also translate the term prime into the language of ideal. If x is prime, then
we know that x|ab implies x|a or x|b. Therefore, ab ∈ ⟨x⟩ implies a ∈ ⟨x⟩ or b ∈ ⟨x⟩.
Definition. An ideal I of a commutative ring R is prime if I ̸= R and for any a, b ∈ R,
ab ∈ I implies a ∈ I or b ∈ I.
Property 9.5.4. If R is an integral domain, then ⟨x⟩ is a prime ideal if and only if
x = 0 or x is prime.
Property. An ideal I of a commutative ring R with 1 is prime if and only if R/I is an
integral domain.
Sketch of Proof. R/I means that we see the elements in I as 0. Therefore the condition
that I is prime becomes that R/I has no non-zero zero divisors, and so it is equivalent
to R/I being an integral domain.
Remark. One can see that using ideals to discuss divisibility gives us a lot of benefit,
mainly that we don’t have to care about associates anymore—if a and b are associates,
then ⟨a⟩ = ⟨b⟩. Also, divisibility and some other operations can be replaced with being
a subset, addition and taking intersection. One thing worth noticing is that we can also
define a multiplication on the ideals, and we will naturally hope that I1 ⊆ I2 if and
only if there exists I3 such that I2 I3 = I1 . “Noetherian” integral domains that satisfy
this condition are called Dedekind domains, and in this kind of integral domain we can
uniquely factorize every ideal into prime ideals. These conditions sound like a lot, but one
can verify that the ring of integers (i.e. the ring consisting of the algebraic integers) of a
number field (i.e. a field lying in C containing Q such that the degree over Q is finite) is
always Dedekind. This is a very important result in the early attempts at proving
Fermat's last theorem.
Property. Suppose that R is a commutative ring with 1 and I is a proper ideal. Then
I is a maximal ideal if and only if R/I is a field.
Sketch of Proof. If I is maximal, then for every s ∈ R\I we have that ⟨s, I⟩ = R, which
shows that there exists s′ ∈ R such that ss′ + I = 1 + I, and so R/I is a field.
If R/I is a field, then for every s ∈ R\I we have that 1 + I ⊆ ⟨s, I⟩, and so 1 ∈ ⟨s, I⟩,
which forces ⟨s, I⟩ to be R.
Corollary 9.5.2. Any maximal ideal is a prime ideal in a commutative ring with 1.
Intuitively, every ideal is contained in a maximal ideal, because we can enlarge the
ideal if the ideal that we have is not maximal. This is justified by Zorn’s lemma when R
has a multiplicative identity.
Theorem 9.5.2. Suppose that I is a proper ideal of a ring R with 1, then there exists
a maximal ideal M of R such that I ⊆ M .
Sketch of Proof. Consider the set S of all proper ideals that contain I and a partial order
≤ such that I1 ≤ I2 if and only if I1 ⊆ I2 . For every chain C, consider the set
IC = ∪_{I ′ ∈C} I ′ .
We know that IC is an ideal since it is the union of a chain of ideals. Also IC is a proper
ideal, since 1 ∉ IC . Therefore IC is in S, and so every chain has an upper bound. By
Zorn's lemma,
there is a maximal element M in S. Now if M is not a maximal ideal, then there exists
an ideal M ′ such that M ⊆ M ′ and M ′ ̸= M, R. This contradicts the maximality of M
in S, and so M is maximal. Now by the definition of S we know that I ⊆ M .
Corollary 9.5.3. Any nonzero ring with 1 has a maximal ideal.
9.6 Z is PID
In this section, we will show that Z is PID, and on top of this, we will try to generalize
the proof and apply it to some other integral domains.
Theorem 9.6.1. Z is a PID.
Sketch of Proof. Suppose that I ⊆ Z is an ideal. If I is a zero ideal, then we’re done. So
we can assume that I contains some non-zero elements. Suppose that x ∈ I is an element
that has the smallest non-zero absolute value. We claim that I = ⟨x⟩. For every s ∈ I,
we can express s as qx + r where 0 ≤ r < |x|. Since I is an ideal, we know that qx ∈ I,
and so r = s − qx ∈ I. By the minimality of |x|, we know that r = 0, and so s = qx.
Hence I = ⟨x⟩, as desired.
In this proof, we used the existence of a minimum of the absolute value. However,
in the general setting, we do not have such a thing. To do something similar, we need
a function ϕ : R → N ∪ {0} that emulates the absolute value. The property that we
hope the function has is simply that we can make the euclidean algorithm work. If such
a function exists, then we say that the integral domain R is Euclidean.
Definition 9.6.1. An integral domain R is a Euclidean domain (ED) if there exists a
function ϕ : R → N ∪ {0} such that:
(1) ϕ(x) = 0 ⇔ x = 0;
(2) For any a, b ∈ R where b is non-zero, there exists q, r ∈ R such that a = qb + r
and ϕ(r) < ϕ(b).
Now we can modify the proof a bit to prove that an ED is always a PID.
Theorem 9.6.2. If R is an ED, then it is also a PID.
Sketch of Proof. Suppose that I ⊆ R is an ideal. If I is a zero ideal, then we're done. So
we can assume that I contains some non-zero elements. Suppose that x ∈ I is a non-zero
element such that ϕ(x) is the smallest among non-zero elements in I. We claim that
I = ⟨x⟩. For every s ∈ I, we can express s as qx + r where ϕ(r) < ϕ(x). Since I is an
ideal, we know that qx ∈ I, and so r = s − qx ∈ I. By the minimality of ϕ(x), we know
that r = 0, and so s = qx. Hence I = ⟨x⟩, as desired.
The definition of ED seems to be overly artificial, but it turns out to be useful in some
applications.
Example 9.6.1. Consider the polynomial ring F [x] with coefficients in F . It is clear
that F [x] is Euclidean with ϕ(f ) = deg f + 1 (here we follow the convention that deg 0 =
−1). Therefore F [x] is a PID.
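For instance, the division a = qb + r with deg r < deg b behind this example can be carried out with sympy (a small sketch of mine):

import sympy as sp

x = sp.symbols('x')
f = x**4 + 3*x**2 + x + 1
g = x**2 + 1
q, r = sp.div(f, g, x)       # f = q*g + r with deg(r) < deg(g)
print(q, r)                  # x**2 + 2 and x - 1
assert sp.expand(q * g + r - f) == 0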
9.7 Noetherian and Existence of Factorization
Property 9.7.1. If R is a rng, then the following three conditions are equivalent:
(1) Every ideal in R is finitely generated.
(2) R satisfies the ascending chain condition.
(3) R satisfies the maximal condition: For any set of ideals in R, there exists a maximal
element in the sense that it is not contained in any other ideal in the set. A rng
satisfying these equivalent conditions is said to be Noetherian.
(2) ⇒ (3): For any set S of ideals and any chain C in S, the chain has a maximal
element (otherwise we could extract a strictly ascending infinite chain), so the union of C
is in C by the ascending chain condition. Therefore every chain has an upper bound in
S, and so by Zorn's lemma there is a maximal element in S.
(3) ⇒ (1): For any ideal I in R, consider the set SI consisting of the ideals finitely
generated by elements in I. Condition (3) says that there is a maximal element I ′
in SI . Now if I ′ is not I, then ⟨I ′ , a⟩ is a strictly larger element in SI for any a ∈ I\I ′ ,
which is a contradiction. Hence I is finitely generated.
Corollary 9.7.1. Any PID is Noetherian.
Property. If R is a Noetherian integral domain, then every non-zero element of R can
be factorized, i.e. written as a unit times a finite product of irreducible elements.
Sketch of Proof. Suppose that the statement is false. Then the set S consisting of non-
zero elements that cannot be factorized is non-empty. Consider the set
S ′ := {⟨x⟩|x ∈ S}.
Since R is Noetherian, by the maximal condition there is an element x ∈ S such that ⟨x⟩
is maximal in S ′ . Clearly x cannot be a unit or zero, and x cannot be irreducible either.
Therefore there exists y, z ∈ R that are not units such that x = yz. If one of y, z, say y,
cannot be factorized, then y ∈ S and ⟨x⟩ ⊆ ⟨y⟩. The maximality forces ⟨y⟩ = ⟨x⟩, which
implies that z is a unit. This is a contradiction. Therefore y, z can both be factorized,
which shows that x can also be factorized, which is again a contradiction.
9.8 PID is UFD
Definition 9.8.2. If R is an integral domain such that the factorization of any non-zero
element exists and is unique up to reordering and replacing the irreducible factors by
associates, then we say that R is a unique factorization domain (UFD).
We know that the definitions of prime and irreducible coincide in Z and Z is a UFD.
It turns out that in an integral domain that factorization always exists, the domain is
UFD if and only if the definitions of prime and irreducible coincide.
Property 9.8.1. Suppose that R is an integral domain where the factorization of any
non-zero element exists, then R is a UFD if and only if every irreducible element is prime.
Sketch of Proof. If R is a UFD, then for every irreducible p and any two elements a, b such
that p|ab, write this as pc = ab and expand this into the factorization into irreducibles.
By the uniqueness of factorization of pc = ab, we know that one of a, b is divisible by an
associate of p, and hence is divisible by p. Therefore p is a prime.
Conversely, if every irreducible element p is prime, then we can prove that the factor-
ization is unique by induction on the number of irreducible elements in the factorization,
denoted by n. It clearly holds when n = 0. Now suppose that it holds when n = k, and
suppose that we can write
a = up1 . . . pk+1 = u′ q1 . . . qm .
Since pk+1 is irreducible, it is prime by assumption. Therefore there exists an i such that
pk+1 |qi . WLOG i = m, for we can permute q1 , . . . , qm at will. Since qm is irreducible,
we have that pk+1 , qm are associates. Suppose that qm = u′′ pk+1 where u′′ is a unit.
Then since R is an integral domain, we can cancel pk+1 and get
up1 . . . pk = (u′ u′′ )q1 . . . qm−1 ,
and so by the inductive hypothesis we're done.
Theorem 9.8.1. Any PID is a UFD.
Sketch of Proof. It suffices to show that every irreducible element is prime in a PID. Let
p be an irreducible element and a, b be two elements such that p|ab. Since we’re in PID,
we know that da = gcd(a, p) and db = gcd(b, p) exist. Since p is irreducible, we know
that da , db are either units or associates of p. If one of them is an associate of p then
we’re done, so suppose that they are both units. WLOG suppose that da = db = 1. By
Bezout’s theorem there exist xa , xb , ya , yb such that axa + pya = 1 and bxb + pyb = 1.
Therefore
1 = (axa + pya )(bxb + pyb ) = abxa xb + p(axa yb + bxb ya + pya yb ).
Note that the right hand side is divisible by p, which shows that p is a unit. Thus we
reach a contradiction, showing that one of da , db has to be an associate of p.
Example 9.8.1. For any field F , the integral domain F [x] is a PID and therefore is a
UFD.
Knowing that an integral domain is a UFD gives us a lot of benefit. For example, we
can know that gcd and lcm exist.
Property 9.8.2. If R is a UFD, then for any a, b ∈ R, we have that the gcd and lcm
of a, b exist.
9.9 Polynomial Ring
Let R be a commutative ring with 1. The polynomial ring R[x] over R consists of the
formal sums ∑_{i=0}^{∞} ai x^i with ai ∈ R eventually zero, equipped with the operations
(∑_{i=0}^{∞} ai x^i ) + (∑_{i=0}^{∞} bi x^i ) = ∑_{i=0}^{∞} (ai + bi ) x^i
and
(∑_{i=0}^{∞} ai x^i ) · (∑_{i=0}^{∞} bi x^i ) = ∑_{i=0}^{∞} (∑_{j+k=i} aj bk ) x^i .
Here we define the polynomials as an infinite sum whose summands are eventually
zero. This is actually just a fancier notation of the usual definition of polynomials where
all the zeros are omitted. The addition and multiplication are just simply the general-
ization of the ones in F [x]. One needs to check that the two operations are well-defined,
and that this chosen addition and multiplication actually satisfy the ring axioms. This
is left as an exercise.
We can also consider the polynomial ring over R in multiple variables.
Definition 9.9.2. Suppose that R is a commutative ring with 1. Then the polynomial
ring R[x1 , . . . , xn ] over R in n variables can be inductively defined as
R[x1 , . . . , xn ] = (R[x1 , . . . , xn−1 ])[xn ].
Concretely, its elements are the sums
∑ a_{i1 ,...,in} x1^{i1} · · · xn^{in}
where a_{i1 ,...,in} is eventually zero. This is kind of messy, so sometimes we introduce the
multi-index notation i = (i1 , . . . , in ). If we define x^i to be x1^{i1} · · · xn^{in} , then we can instead
write the polynomials in n variables as
∑_{i∈N0^n} ai x^i
with the operations
(∑_{i∈N0^n} ai x^i ) + (∑_{i∈N0^n} bi x^i ) = ∑_{i∈N0^n} (ai + bi ) x^i
and
(∑_{i∈N0^n} ai x^i ) · (∑_{i∈N0^n} bi x^i ) = ∑_{i∈N0^n} (∑_{j+k=i} aj bk ) x^i .
The degree of the zero polynomial is kind of hard to define. There might be sometime
that we want deg 0 to be 0, and maybe sometime we want deg 0 to be −1 or −∞. In this
note, I will specify what deg 0 is if necessary.
Property 9.9.1. If f, g ∈ R[x1 , . . . , xn ] are two polynomials, then
deg(f + g) ≤ max(deg f, deg g)
and
deg f g ≤ deg f + deg g.
If furthermore R is an integral domain, then deg f g = deg f + deg g.
Property 9.9.2. The sum of two homogeneous polynomials of the same degree is still
homogeneous. The product of two homogeneous polynomials is homogeneous.
There are a lot of properties that R[x] can inherit from R. By induction, we can
also show that R[x1 , . . . , xn ] inherit the same properties from R, so let’s focus on the
interaction between R and R[x].
Property 9.9.3. If R is an integral domain, then R[x] is also an integral domain.
Sketch of Proof. Suppose that f, g are two nonzero polynomials in R[x] and deg f =
n, deg g = m. Then we know that [x^n ]f and [x^m ]g are nonzero; here [x^n ]f means
the coefficient of x^n in f . Now it is clear by definition that [x^{n+m} ]f g =
([x^n ]f )([x^m ]g) ̸= 0, which completes the proof.
Theorem 9.9.1. (Hilbert’s basis theorem) If R is Noetherian, then R[x] is also Noethe-
rian.
Sketch of Proof. For every ideal I in R[x], we have to show that I is finitely generated.
Let LI be the set containing zero and the leading coefficients of polynomials in I. We
can show that LI is an ideal in R. It suffices to show that a − b ∈ LI for every a, b ∈ LI
and ra ∈ LI for every a ∈ LI , r ∈ R.
9.10 Adjoining an element
Property. Suppose that R is a subring of a commutative ring R′ with 1 and α ∈ R′ .
Then there exists a unique ring homomorphism ϕ : R[x] → R′ that restricts to the
identity on R and sends x to α.
Sketch of Proof. It is clear that the only homomorphism that satisfies the condition is
ϕ(∑_{i=0}^{∞} ai x^i ) = ∑_{i=0}^{∞} ai α^i .
Corollary 9.10.1. R[α] is always isomorphic to the quotient ring R[x]/Iα , where Iα is
the ideal of polynomials that vanish at α.
We can similarly define the field analogue of this. The details are left as an exercise.
Definition 9.10.2. Suppose that F is a subfield of K and α is an element of K. Then
we denote the smallest subfield of K containing F and α by F (α).
9.11 Fraction Field and Localization
For an integral domain R, the fraction field F of R is constructed on the set
(R × (R\{0}))/∼, where
(a, b) ∼ (c, d) ⇔ ad = bc,
with the operations
(a, b) + (c, d) = (ad + bc, bd)
and
(a, b)(c, d) = (ac, bd).
The fraction field comes with a canonical embedding of R (sending a to (a, 1)) and has
the following universal property: for any injective ring homomorphism ϕ from R to a
field F ′ , there exists a unique homomorphism ϕ′ : F → F ′ extending ϕ.
Sketch of Proof. It is clear that ϕ′ ((a, b)) = ϕ(a)ϕ(b)−1 satisfies the condition and is the
only homomorphism that satisfies this.
Example 9.11.1. Consider the polynomial ring F [x] over a field F . F [x] is an integral
domain, so we can consider its fraction field, which is usually denoted as F (x). This field
consists of the function
f (x)
g(x)
where f, g ∈ F [x] and g is non-zero. We call this kind of function a rational function. Note
that rather than thinking of these functions as functions defined on F , we should think
of them as formal expressions. This is because the function is actually undefined at
the roots of g, and this does not bother us too much when we only think of the expression
formally.
Example 9.11.2. Now let’s consider another integral domain F [[x]] which consists of
the elements
∑_{i=0}^{∞} ai x^i
with ai ∈ F ; this is the ring of formal power series. A formal power series is invertible
exactly when its constant term is a unit, and so the fraction field F ((x)) of F [[x]] consists
of the elements
∑_{i=−m}^{∞} ai x^i
where m is some integer. F ((x)) is usually called the field of formal Laurent series.
Note that if a commutative ring R is not an integral domain and has a zero divisor
c such that cd = 0 where c, d ̸= 0, then ∼ is no longer an equivalence relation. This is
because (0, 1) ∼ (0, d) ∼ (c, 1) but (0, 1) ̸∼ (c, 1). To address this, we have to make
a different definition for the more general case.
Definition 9.11.2. Suppose that R is a commutative ring with 1 and S is a multi-
plicatively closed subset, then the localization of R by S is constructed as follows: The
underlying set is (R × S)/∼ where ∼ is the equivalence relation
(r1 , s1 ) ∼ (r2 , s2 ) ⇔ u(r1 s2 − r2 s1 ) = 0 for some u ∈ S,
and the operations are
(r1 , s1 ) + (r2 , s2 ) = (r1 s2 + r2 s1 , s1 s2 )
and
(r1 , s1 )(r2 , s2 ) = (r1 r2 , s1 s2 ).
This ring is usually denoted as S −1 R.
The localization has the following universal property: if ϕ : R → R′ is a ring homo-
morphism such that ϕ(s) is a unit for every s ∈ S, then there exists a unique homomor-
phism ϕ′ : S −1 R → R′ such that ϕ = ϕ′ ◦ π, where π : R → S −1 R is the canonical map
sending r to (r, 1).
Sketch of Proof. It is clear that we must choose ϕ′ ((r, s)) = ϕ(r)ϕ(s)−1 . We still have
to check that this is well-defined. If (r1 , s1 ) ∼ (r2 , s2 ), then there exists u ∈ S such
that u(r1 s2 − r2 s1 ) = 0. Therefore ϕ(u)(ϕ(r1 )ϕ(s2 ) − ϕ(r2 )ϕ(s1 )) = 0. Since ϕ(u) is a
unit, we can cancel out ϕ(u). By multiplying ϕ(s1 )−1 ϕ(s2 )−1 on both sides, we get that
ϕ(r1 )ϕ(s1 )−1 = ϕ(r2 )ϕ(s2 )−1 .
Note that the canonical map π need not be injective: if S contains a zero divisor c
such that cd = 0 where d ̸= 0, then (0, 1) ∼ (d, 1) and so π(d) = 0.
When R is an integral domain and S is R\{0}, then the localization of R by S is the
fraction field of R. We can furthermore generalize this:
Definition 9.11.3. Suppose that R is a commutative ring with 1 and p is a prime ideal
in R, then the localization Rp of R at p is the localization of R by R\p.
Besides this, we can also take S generated by a single element x. That is, we can take
S = {xn |n ∈ N0 }.
Definition 9.11.4. Suppose that R is a commutative ring with 1 and x is an element
in R, then the localization Rx of R away from x is the localization of R by {xn |n ∈ N0 }.
It might be unclear at this point why we call such an operation a localization, but it
will be clear in a moment.
9.12 Nullstellensatz and Algebraic Geometry

How about the maximal ideals of $F[x_1, x_2, \ldots, x_n]$? We know that for every point $c = (c_1, \ldots, c_n)$ in $F^n$, we can find a maximal ideal $\ker \operatorname{ev}_c$, where
$$\operatorname{ev}_c(f) = f(c_1, c_2, \ldots, c_n).$$
Lemma 9.12.1. (Zariski's lemma) Suppose that F is an algebraically closed field and m is a maximal ideal of $F[x_1, \ldots, x_n]$. Then $F[x_1, \ldots, x_n]/\mathfrak{m} \cong F$.

Sketch of Proof. The actual proof of this is quite technical and requires much more background knowledge, so here we are only going to prove the case where F is uncountable. Think on the bright side: C is an uncountable algebraically closed field!
Let F ′ = F [x1 , . . . , xn ]/m. It is clear that F ⊆ F ′ , and so we can see F ′ as a vector
space over F . Since F ′ can be generated by x1 + m, . . . , xn + m, we know that dimF F ′ is
countable. Now, if any $x_i + \mathfrak{m}$ is not in F, then (since F is algebraically closed) $x_i + \mathfrak{m}$ is not a root of any nonzero polynomial over F. This shows that the polynomial ring F[x] embeds into F′ via $x \mapsto x_i + \mathfrak{m}$, and so by the universal property of the fraction field we know that $F(x) \subseteq F'$. This is absurd since the dimension of F(x) over F is uncountable, for we can choose the uncountable linearly independent family
$$\frac{1}{x - c} \quad (c \in F).$$
Theorem 9.12.1. (Weak Nullstellensatz) If F is an algebraically closed field, then every maximal ideal of $F[x_1, \ldots, x_n]$ is of the form $\ker \operatorname{ev}_c$ for some $c \in F^n$.

Sketch of Proof. We only need to prove that for every maximal ideal m of $F[x_1, \ldots, x_n]$ we can find a point $c \in F^n$ such that $\mathfrak{m} = \ker \operatorname{ev}_c$. Let $\phi_i : F[x_i] \to F[x_1, \ldots, x_n]$ be the canonical embedding and $\pi : F[x_1, \ldots, x_n] \to F[x_1, \ldots, x_n]/\mathfrak{m}$ be the canonical projection. By Zariski's lemma $F[x_1, \ldots, x_n]/\mathfrak{m} \cong F$. Therefore $\pi \circ \phi_i$ is a ring homomorphism
from $F[x_i]$ to F. It is easy to see that $\pi(\phi_i(c)) = c$ for any $c \in F$, and so $\pi \circ \phi_i$ is surjective. It is clear that the kernel is $\mathfrak{m} \cap F[x_i]$, which is therefore maximal. Therefore we know
that m ∩ F [xi ] = ⟨xi − ci ⟩ for some ci ∈ F . This shows that ⟨x1 − c1 , . . . , xn − cn ⟩ ⊆ m.
Since ker evc = ⟨x1 − c1 , . . . , xn − cn ⟩, we know that m = ker evc .
Corollary 9.12.1. If F is an algebraically closed field, then for any ideal I in $F[x_1, \ldots, x_n]$, we have $I = \langle 1 \rangle$ if and only if V(I) is an empty set.
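Computationally, the condition $I = \langle 1 \rangle$ can be tested with a Groebner basis: the reduced Groebner basis of I is {1} exactly when I is the unit ideal. The following Python sketch uses sympy's `groebner` (the basis is computed over Q, but by the corollary the conclusion is about common zeros over C):

```python
# Sketch: testing whether V(I) is empty via Groebner bases.
from sympy import groebner, symbols

x, y = symbols('x y')

# x = 1 and x^2 = -1 have no common zero, even over C:
print(groebner([x - 1, x**2 + 1], x, y))
# GroebnerBasis([1], ...): the unit ideal, so V(I) is empty

# x^2 + y^2 = 1 and x = y + 2 do have common (complex) zeros:
print(groebner([x**2 + y**2 - 1, x - y - 2], x, y))
# a nontrivial basis, so V(I) is nonempty over C
```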
At this point we can see that the Nullstellensatz tells us a lot about the zeros of an ideal. This is why it is called the Nullstellensatz—the word is German for "theorem of the zero locus." Using the weak form of Hilbert's Nullstellensatz, we can prove the strong form.
Definition 9.12.2. The radical $\sqrt{I}$ of an ideal I is the ideal consisting of the elements s such that there exists $n \in \mathbb{N}$ satisfying $s^n \in I$.
Theorem 9.12.2. (Strong Nullstellensatz) Let F be an algebraically closed field, I an ideal in $F[x_1, \ldots, x_n]$, and J the ideal of all polynomials that vanish on V(I). Then $J = \sqrt{I}$.

Sketch of Proof. The proof that I am going to show is the Rabinowitsch trick. It is simple but kind of comes out of nowhere. For a proof that is longer but easier to come up with, see the exercise.
It is clear that $\sqrt{I} \subseteq J$, so it suffices to show that $J \subseteq \sqrt{I}$. Since $F[x_1, \ldots, x_n]$ is
Noetherian, we can suppose that I is generated by g1 , . . . , gn . Now suppose that f is a
polynomial that vanishes on every point where g1 , . . . , gn vanish. We have to show that
there exists r ∈ N such that f r ∈ I.
Introduce a new variable x0 , and consider the polynomials g1 , g2 , . . . , gn , 1 − x0 f . It
is easy to see that those polynomials do not have any common zero, and so the ideal
generated by them is the unit ideal. Hence there exists h0 , . . . , hn ∈ F [x1 , . . . , xn , x0 ]
such that
1 = h1 g1 + h2 g2 + · · · + hn gn + h0 (1 − x0 f ).
Now let x0 = 1/f . Then we have that
$$1 = h_1\!\left(x_1, \ldots, x_n, \frac{1}{f(x_1, \ldots, x_n)}\right) g_1 + \cdots + h_n\!\left(x_1, \ldots, x_n, \frac{1}{f(x_1, \ldots, x_n)}\right) g_n.$$
Choose $r \in \mathbb{N}$ greater than the degree of each $h_i$ in $x_0$. Then
$$h_i' := f(x_1, \ldots, x_n)^r\, h_i\!\left(x_1, \ldots, x_n, \frac{1}{f(x_1, \ldots, x_n)}\right)$$
is a polynomial in $F[x_1, \ldots, x_n]$. Therefore
$$f^r = h_1' g_1 + \cdots + h_n' g_n \in I,$$
as desired.

9.13 Random Problem Set
2. (9.1) Show that if R is an integral domain, then R[x], the ring of polynomials with
coefficients in R, is also an integral domain.
3. (9.2) (Hard if you didn’t see this before) Show that the only ring automorphism of
R is the identity map.
6. (9.5) Show that the contraction of a prime ideal is still a prime ideal.
7. (9.5) Let R be a commutative ring with 1 and I be a prime ideal of R. Show that
if R/I is finite, then I is maximal.
10. (9.7) (Slightly harder) Show that the algebraic integers form an integral domain.
Here “algebraic integers” means the numbers that can be represented as roots of
some monic polynomial with integer coefficients. Show that there exists an element
that is not a unit, and show that any non-unit element is reducible.
11. (9.7) If we replace the ascending chain condition with the “descending chain con-
dition”, then we get the definition of an Artinian rng. Show that the descending
chain condition is equivalent to the “minimal” condition. Show that an Artinian
integral domain must be a field.
12. (9.8) Show that in any PID, any non-zero prime ideal is maximal. Conversely, show
that if in a Noetherian UFD, any non-zero prime ideal is maximal, then it is a PID.
Hint: For the second part, try to prove Bezout’s theorem first.
13. (9.8) Solve the equation
$$x^2 + x + 2 = y^3$$
in integers.
Hint: $4(x^2 + x + 2) = (2x+1)^2 + 7 = \left((2x+1) + \sqrt{-7}\right)\left((2x+1) - \sqrt{-7}\right)$.
14. (9.9) Show that if A is a UFD, then A[x], the ring of polynomials with coefficients
in A, is also a UFD. Use this to construct examples of UFDs that are not PIDs.
Hint: You might need to consider the fraction field of A and Gauss’ lemma, which
states that the product of two primitive polynomials (i.e. polynomials whose coef-
ficients’ gcd is 1) is still primitive.
15. (9.9) Use Hilbert’s basis theorem to prove the following statement:
If a1 , a2 , . . . is a sequence of positive integers such that ai ∤ aj for any i ̸= j, then
show that the set of prime factors of this sequence is infinite.
Try to emulate the proof of Hilbert's basis theorem to give an elementary proof of this.
16. (9.10) Show that if F ⊆ K are two fields such that K = F[θ], then $K \cong F[x]/I$ for some maximal ideal I in F[x].
17. (9.10) Let R = Z/6Z. Show that there does not exist R[α] where 2α + 1 = 0.
18. (9.10) Let F be a field and f a non-zero polynomial. Show that F[x]/⟨f⟩ is a vector space over F with dimension equal to deg f. Construct a function N : F[x]/⟨f⟩ → F such that N(a)N(b) = N(ab) for any a, b ∈ F[x]/⟨f⟩ and $N(a) = a^{\deg f}$ for any a ∈ F.
20. (9.11) A commutative ring R with 1 is a local ring if one of the two following
equivalent definitions hold:
(1) R has only one maximal ideal;
(2) R is not a zero ring and any sum of two non-unit elements is non-unit.
Show that these two definitions are equivalent. Show that the localization at a
prime ideal is always local.
21. (9.11) (1) An integral domain R is a valuation ring if for every $0 \neq x \in F$, one of $x, x^{-1}$ is in R, where F is the fraction field of R. Show that any valuation ring is a local ring.
(2) Suppose that R is a valuation ring and F is its fraction field. Let Γ = F × /R× ∪
{∞} where R× is the set of units in R and the abelian quotient group F × /R× is
written additively. We can define a total ordering on Γ by
a ≥ b ⇔ a − b is an image of an element in R
when a, b ̸= ∞, and define ∞ to be the maximal element. Define the valuation map
v : F → Γ such that v(a) = aR× for a ̸= 0 and v(0) = ∞. Show that the valuation
map satisfies the strong triangle inequality
$$v(a + b) \geq \min(v(a), v(b)).$$
(3) Show that for any field F, the ring of formal power series F[[x]] is a valuation ring (and hence local), and that the valuation map on the field of formal Laurent series F((x)) is isomorphic to the function sending a nonzero Laurent series to the index of its lowest nonzero coefficient (and 0 to ∞).
23. (9.12) Let R be a nonzero commutative ring with 1. R is Jacobson if any prime
ideal is the intersection of the maximal ideals containing it. Using Zariski’s lemma,
show that for any algebraically closed field F and any ideal I in F [x1 , . . . , xn ], the
quotient ring F [x1 , . . . , xn ]/I is Jacobson.
Hint: Follow the solution of the previous question and apply Zariski’s lemma.
24. (9.12) If R is a commutative ring with 1, then its Jacobson radical is the intersection
of all maximal ideals in R. Show that if R is Jacobson, then its nilradical is equal
to its Jacobson radical.
25. (9.12) Using the previous three problems, prove the strong form of Hilbert's Nullstellensatz from the weak form.
Chapter 10
Module Theory
A vector space is an abelian group endowed with a scalar multiplication by elements of a field. It is not hard to see that in order to make the definition make sense, it suffices to require the scalars to be elements of a ring. Roughly speaking, this is called a module. Despite the similarity of the definitions of a vector space and a module, the properties of a module are substantially different from those of a vector space because of the lack of multiplicative inverses. In this chapter, we will try to see what we can still get even though we do not necessarily have multiplicative inverses.

Module theory has a lot of applications. As we will see, it is related to group representation. Besides, the structure theorem of modules over a PID can be directly applied to characterize finitely generated abelian groups, and can also help us derive another canonical form of a linear operator called the rational canonical form.
10.1 Definitions and Examples

Example 10.1.2. If V is a vector space over a field F, then V is also a unital module
over F .
Example 10.1.3. Suppose that G is an abelian group. Then we can see G as a Z-module by defining $n \cdot g$ to be the sum of n copies of g for every $n \in \mathbb{N}$ and $g \in G$ (and $(-n) \cdot g = -(n \cdot g)$).
Example 10.1.4. Let T be a linear operator on a vector space V over F. For every $f \in F[x]$, we can define $f \cdot v$ to be the vector $[f(T)](v)$. Then we can verify that V is a module over F[x].
Note that if we fix an element r and consider the left multiplication $l_r$, then $l_r$ is a group homomorphism from M to itself. We call this an endomorphism.
Definition 10.1.2. An endomorphism is a homomorphism from an object O to itself. If the endomorphisms are compatible with an addition defined on O, then the set End(O) of endomorphisms on O forms a ring with 1 by defining
$$(f + g)(o) = f(o) + g(o)$$
and
$$(fg)(o) = f(g(o)) \quad \forall f, g \in \operatorname{End}(O),\ o \in O.$$
A related construction is the group ring. For a field F and a group G, the group algebra F[G] consists of the formal sums
$$c_1 g_1 + \cdots + c_n g_n \quad (n \in \mathbb{N},\, c_i \in F,\, g_i \in G).$$
More generally, for a ring R, the group ring R[G] consists of the formal sums
$$c_1 g_1 + \cdots + c_n g_n \quad (n \in \mathbb{N},\, c_i \in R,\, g_i \in G)$$
with termwise addition and
$$\left(\sum c_i g_i\right)\left(\sum c_i' g_i\right) = \sum_{i,j} c_i c_j' (g_i g_j) \quad (c_i, c_i' \in R,\, g_i \in G).$$
Example 10.1.5. Consider the group ring F [C2 ] where C2 is the cyclic group of order
2 generated by x. Then the group ring consists of c0 + c1 x where c0 , c1 ∈ F . The
multiplication works as
(c0 + c1 x)(c′0 + c′1 x) = (c0 c′0 + c1 c′1 ) + (c0 c′1 + c1 c′0 )x.
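A short Python sketch of this arithmetic (illustrative, with F = Q; the element c0 + c1·x is stored as a pair):

```python
# Sketch of arithmetic in the group ring F[C2] from Example 10.1.5, using x^2 = 1.
from fractions import Fraction

def grp_mul(u, v):
    c0, c1 = u
    d0, d1 = v
    return (c0 * d0 + c1 * d1, c0 * d1 + c1 * d0)

one = (Fraction(1), Fraction(0))
x = (Fraction(0), Fraction(1))
print(grp_mul(x, x) == one)                     # True: x^2 = 1
e = ((one[0] + x[0]) / 2, (one[1] + x[1]) / 2)  # e = (1 + x)/2
print(grp_mul(e, e) == e)                       # True: e is an idempotent
```

The idempotent e = (1 + x)/2 hints at the decomposition of F[C2] into simple pieces, which we will revisit when discussing semisimplicity.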
For a subset S of a module M over R, the submodule generated by S is
$$\{r_1 s_1 + \cdots + r_n s_n \mid n \in \mathbb{N},\, r_i \in R,\, s_i \in S\}.$$
Now that we know what a submodule is, we can define quotient modules.
Definition 10.2.4. Suppose that M is a module over R and N is a submodule of
M , then the quotient module M /N is the module defined on the quotient group M /N
endowed with the scalar multiplication r(m + N ) = rm + N .
Property 10.2.3. If M is a module over R, then the following three conditions are
equivalent:
(1) Every submodule in M is finitely generated.
(2) M satisfies the ascending chain condition.
(3) M satisfies the maximal condition: any nonempty set of submodules of M has a maximal element, in the sense that it is not contained in any other submodule in the set.
Note that when R is not commutative, R being Noetherian as a ring is different from R being Noetherian as a left module over itself. To differentiate them, let's make another definition.
Definition 10.2.7. A rng R is left (right) Noetherian if it is Noetherian as a left (right
resp.) module over itself.
At the end of this section, let’s prove a result that can help us verify if a module is
Noetherian or not.
Property 10.2.4. Suppose that M is a module and N is a submodule of M . Then M
is Noetherian if and only if N and M /N are both Noetherian.
Sketch of Proof. If M is Noetherian, then it is clear that N is also Noetherian. For every
submodule K of M /N , consider the submodule π −1 (K) of M where π is the canonical
projection from M to M/N. Since $\pi^{-1}(K)$ is a submodule of a Noetherian module, it
is finitely generated, and so K is also finitely generated. This shows that M /N is also
Noetherian.
Conversely, if N and M /N are both Noetherian, then for every submodule K of M ,
consider the submodules K ∩ N of N and the submodules π(K) of M /N . We know that
both submodules are finitely generated. Suppose that K ∩ N is generated by a1 , . . . , an
and π(K) is generated by b1 +N, . . . , bm +N where b1 , . . . , bm ∈ K, then we can show that
K is generated by $a_1, \ldots, a_n, b_1, \ldots, b_m$. For every $k \in K$, we know that $k + N \in \pi(K)$, and so there exist $r_1', \ldots, r_m' \in R$ such that
$$k' := k - (r_1' b_1 + \cdots + r_m' b_m) \in N.$$
Then $k' \in K \cap N$, so there exist $r_1, \ldots, r_n \in R$ with $k' = r_1 a_1 + \cdots + r_n a_n$. Hence
$$k = r_1 a_1 + \cdots + r_n a_n + r_1' b_1 + \cdots + r_m' b_m \in \langle a_1, \ldots, a_n, b_1, \ldots, b_m \rangle,$$
as desired.
10.3 Module Homomorphism

Example 10.3.1. Suppose that G, G′ are abelian groups; then we can see G, G′ as two Z-modules, and any group homomorphism from G to G′ is a Z-linear map.
The isomorphism theorems still hold for modules. We just have to modify them a bit.
Theorem 10.3.1. (First isomorphism theorem) Suppose that ϕ : M → M ′ is an R-
linear map, then ker ϕ is a submodule of M and im ϕ is a submodule of M ′ . Moreover,
M / ker ϕ is isomorphic to im ϕ.
If we consider the set of R-linear maps from M to N , we can see that it actually forms
a module over R.
Definition 10.3.3. Suppose that M, N are two modules over R. The set HomR (M, N )
is the module over R that contains homomorphisms of modules over R from M to N .
The addition and scalar multiplication are defined as
$$(\phi + \phi')(m) = \phi(m) + \phi'(m)$$
and
$$(r\phi)(m) = r(\phi(m))$$
for every ϕ, ϕ′ ∈ HomR (M, N ), m ∈ M, r ∈ R.
Definition 10.3.4. Suppose that M is a module over R. The set $\operatorname{End}_R(M)$ is the ring with 1, and at the same time the module over R, that contains the R-endomorphisms of M. In other words, $\operatorname{End}_R(M)$ is the module $\operatorname{Hom}_R(M, M)$ with the extra multiplication given by composition.
For category-theoretic reasons, given a module M, we usually consider the relation between N and $\operatorname{Hom}_R(M, N)$, or the relation between N and $\operatorname{Hom}_R(N, M)$. This is because Hom is a "functor."
Property 10.3.1. Let M, N, N ′ be modules over R. Suppose that ϕ : N → N ′ is an
R-linear map. Then we can construct canonically an R-linear map ϕ∗ : HomR (M, N ) →
HomR (M, N ′ ) by sending ψ ∈ HomR (M, N ) to ϕ ◦ ψ. Moreover, if ϕ is injective, then ϕ∗
is injective.
Sketch of Proof. To show that ϕ∗ is well-defined it suffices to show that the composition
of R-linear maps is still R-linear, which is quite clear. Now let’s verify that ϕ∗ is R-linear.
For every ψ, ψ ′ ∈ HomR (M, N ), r ∈ R and m ∈ M , we have that
(ϕ∗ (ψ + ψ ′ ))(m) = ϕ((ψ + ψ ′ )(m)) = ϕ(ψ(m) + ψ ′ (m)) = (ϕ∗ (ψ))(m) + (ϕ∗ (ψ ′ ))(m)
and
r(ϕ∗ (ψ))(m) = rϕ(ψ(m)) = ϕ(rψ(m)) = ϕ∗ (rψ)(m).
Therefore ϕ∗ is R-linear. Now suppose that ϕ is injective and ϕ∗ (ψ) = 0. Then for every
m ∈ M we have that ϕ(ψ(m)) = 0. Since ϕ is injective, we have that ψ(m) = 0 and so
ψ = 0. This shows that ϕ∗ is also injective.
We can similarly prove the following property.
Property 10.3.2. Let M, N, N ′ be modules over R. Suppose that ϕ : N → N ′ is an
R-linear map. Then we can construct canonically an R-linear map ϕ∗ : HomR (N ′ , M ) →
HomR (N, M ) by sending ψ ∈ HomR (N ′ , M ) to ψ ◦ ϕ. Moreover, if ϕ is surjective, then
ϕ∗ is injective.
10.4 Direct Sum and Direct Product

Definition 10.4.1. Given a family of modules $M_s$ ($s \in S$) over R, the direct product $\prod_{s \in S} M_s$ is the module defined on the Cartesian product with the operations
$$(a_s)_{s \in S} + (b_s)_{s \in S} = (a_s + b_s)_{s \in S}$$
and
$$r(a_s)_{s \in S} = (ra_s)_{s \in S}$$
for every $a_s, b_s \in M_s$ and $r \in R$.
Property 10.4.1. (Universal property of direct product) For every module N over R and every family of R-linear maps $\phi_s : N \to M_s$ ($s \in S$), there exists a unique R-linear map $\phi : N \to \prod_{s' \in S} M_{s'}$ such that $\pi_s \circ \phi = \phi_s$ for every $s \in S$, where $\pi_s$ is the canonical projection from $\prod_{s' \in S} M_{s'}$ to $M_s$. In other words, there is a unique ϕ such that the following diagram commutes for every $s \in S$:

[diagram: $\phi_s : N \to M_s$, $\phi : N \to \prod_{s' \in S} M_{s'}$, $\pi_s : \prod_{s' \in S} M_{s'} \to M_s$]
Sketch of Proof. It is clear that we have to define ϕ(n) = (ϕs (n))s∈S for every n ∈ N .
The rest is just checking that ϕ is an R-linear map.
Similarly, the direct sum $\bigoplus_{s \in S} M_s$ is the submodule of the direct product consisting of elements with only finitely many nonzero coordinates, and it satisfies the dual universal property: for every family of R-linear maps $\phi_s : M_s \to N$ there is a unique R-linear map $\phi : \bigoplus_{s' \in S} M_{s'} \to N$ with $\phi \circ \iota_s = \phi_s$ for every s, where $\iota_s$ is the canonical inclusion. The map is given by $\phi((m_s)_{s \in S}) = \sum_{s \in S} \phi_s(m_s)$.

[diagram: $\phi_s : M_s \to N$, $\iota_s : M_s \to \bigoplus_{s' \in S} M_{s'}$, $\phi : \bigoplus_{s' \in S} M_{s'} \to N$]
This is well-defined since only finitely many of s satisfy that ms ̸= 0. The rest is just
checking that this is an R-linear map.
Note that when S is a finite set, there is no difference between the direct sum and the direct product of the family. The difference only appears because, to make the universal property of the direct product hold, the direct product has to admit elements with infinitely many nonzero terms, while the direct sum only has to admit elements with finitely many nonzero terms, since only finite addition is defined.
10.5 Balanced Product and Tensor Product

Property 10.5.1. Suppose that R is a commutative ring and $M_1, M_2, N$ are modules over R. Then R-bilinear maps $\phi : M_1 \times M_2 \to N$ correspond to elements $\bar{\phi} \in \operatorname{Hom}_R(M_1, \operatorname{Hom}_R(M_2, N))$.

Sketch of Proof. For every $m_1 \in M_1$ let $\bar{\phi}(m_1) = \phi(m_1, \cdot)$. It is clear that $\bar{\phi}$ is well-defined, and all we have to do is to verify that $\bar{\phi}$ is R-linear. For any $m, m' \in M_1$ and $r \in R$ we have that
$$\bar{\phi}(m + rm') = \phi(m + rm', \cdot) = \phi(m, \cdot) + r\phi(m', \cdot) = \bar{\phi}(m) + r\bar{\phi}(m').$$
Note that in the case that R is non-commutative, things get a little bit off. What we
are imagining first is that a bilinear map is a product-like function. However, when R is
non-commutative and M, N are both left modules, then the “imaginary product” is no
longer a bilinear map: there is no reason that we would want r(m × n) = (rm) × n =
m × (rn). A more natural assumption would be that (mr) × n = m × (rn), which requires
M to be instead a right module.
Definition 10.5.2. Suppose that M is a right R-module, N is a left R-module and
G is an abelian group. A map ϕ : M × N → G is an R-balanced product if for every
$m \in M, n \in N$ we have that $\phi(m, \cdot), \phi(\cdot, n)$ are both group homomorphisms, and that $\phi(m, rn) = \phi(mr, n)$ for every $r \in R$.
Definition 10.5.3. Let M be a right R-module and N a left R-module. The tensor product $M \otimes_R N$ is the quotient of the free abelian group on $M \times N$ by the subgroup H generated by the elements $(m, n_1 + n_2) - (m, n_1) - (m, n_2)$, $(m_1 + m_2, n) - (m_1, n) - (m_2, n)$, and $(mr, n) - (m, rn)$.

The definition looks weird at first sight, but it actually makes sense: recall that taking the quotient by H is actually identifying the elements of H with 0. Since we want that $(m, n_1 + n_2) = (m, n_1) + (m, n_2)$, we hope that $(m, n_1 + n_2) - (m, n_1) - (m, n_2) = 0$, and so we put this element into H. The other two forms come from a similar reason.
There is a natural map from $M \times N$ to $M \otimes_R N$. This is usually denoted by ⊗, and the image of (m, n) is usually denoted by $m \otimes_R n$. When no ambiguity arises, we will only write ⊗ instead of $\otimes_R$.
Property 10.5.2. (Universal property of tensor product) For every R-balanced product $\phi : M \times N \to G$, there exists a unique group homomorphism $\phi' : M \otimes_R N \to G$ such that $\phi' \circ \otimes = \phi$; that is, the following diagram commutes:

[diagram: $\phi : M \times N \to G$, $\otimes : M \times N \to M \otimes_R N$, $\phi' : M \otimes_R N \to G$]
Note that if M is a left module over another ring S, then we can endow M ⊗R N with
a module structure over S by s(m ⊗ n) = (sm ⊗ n). Similarly, if N is a right module
over S, then M ⊗R N can be seen as a right module over S. Now if R is commutative,
then there is no difference between left and right modules. This tells us that M ⊗R N is
an R-module.
Property 10.5.3. (Universal property of tensor product over commutative ring) Sup-
pose that R is a commutative rng and $M_1, M_2, N$ are all R-modules. Then for every R-bilinear map $\phi : M_1 \times M_2 \to N$ there exists a unique R-linear map $\phi' : M_1 \otimes_R M_2 \to$
N such that ϕ′ ◦ ⊗ = ϕ. In other words, there is a unique R-linear map such that the
following diagram commutes.
[diagram: $\phi : M_1 \times M_2 \to N$, $\otimes : M_1 \times M_2 \to M_1 \otimes_R M_2$, $\phi' : M_1 \otimes_R M_2 \to N$]
Sketch of Proof. Note that when R is commutative, any R-bilinear map is R-balanced.
Therefore there is a unique group homomorphism ϕ′ that makes the diagram commute.
We only need to verify that this is R-linear, which is quite clear and is left as an exercise.
10.6 Group Representation Revisit

Property 10.6.1. Suppose that M is a unital module over a ring R with 1. Then the following three conditions are equivalent:
(1) M is the sum of some simple submodules;
(2) M is semisimple (i.e. the direct sum of some simple submodules);
(3) For every submodule N of M, there exists a complement P, which is a submodule of M such that M = N ⊕ P.
Sketch of Proof. Let’s first show that (1) implies (2). Suppose that M is the sum of the
family of modules Ni (i ∈ S). Let Ni (i ∈ S ′ ) be the maximal subset of the family such
that the sum is direct (the existence is guaranteed by Zorn's lemma). We have to show that $M' := \bigoplus_{i \in S'} N_i$ is M, or equivalently, that it contains every $N_j$ for $j \in S$. Note that if it does not contain $N_j$, then since $N_j$ is simple, we have that $M' \cap N_j = 0$, and so $M' + N_j$ is actually a direct sum, which contradicts the maximality of S′.
(2) implies (3) can be proved in a similar way. Suppose that M is the direct sum of $N_i$ ($i \in S$), and let S′ be a maximal subset of S such that the sum of $N_i$ ($i \in S'$) is direct and has trivial intersection with N. Then we can prove similarly that $N \oplus M' = M$, where $M' = \bigoplus_{i \in S'} N_i$.
To show that (3) implies (1), let N be the sum of all simple submodules of M . We have
to show that N = M . Suppose that this is not true, and P is a non-trivial submodule that
is a complement of N . Then there exists a nonzero element x ∈ P , and so 0 ̸= Rx ⊆ P .
Let $\phi : R \to Rx$ be the R-linear map sending r to rx; then $R/\ker\phi \cong Rx$. By Zorn's lemma there is a maximal left ideal $\mathfrak{m}$ of R containing $\ker\phi$, and so $\mathfrak{m}x$ is a maximal submodule of Rx. Let P′ be a complement of $\mathfrak{m}x$; then we can show that $P' \cap Rx$ is simple: if S is a submodule of $P' \cap Rx$, then $S \oplus \mathfrak{m}x$ is contained in Rx and contains $\mathfrak{m}x$. This shows that $S \oplus \mathfrak{m}x = \mathfrak{m}x$ or Rx. In the first case S = 0, and in the second case $S = P' \cap Rx$.
Property 10.6.2. The direct sum of a family of semisimple modules is still semisimple.
Any submodule and quotient module of a semisimple module are still semisimple.
Property 10.6.3. If R is semisimple as a left module over itself, then every unital module over R is semisimple.

Sketch of Proof. Suppose that R is semisimple over itself and M is a module over R. Then we can construct a surjective R-linear map
$$\phi : \bigoplus_{m \in M} R \to M$$
sending $(r_m)_{m \in M}$ to $\sum_{m \in M} r_m m$. The direct sum is semisimple by Property 10.6.2, and M is a quotient of it, so M is semisimple.
With the definitions of simpleness and semisimpleness, we can now restate Maschke's theorem proved before.
Theorem 10.6.1. (Maschke’s theorem) Suppose that V is a finite dimensional space
over C that is also a C[G]-module where G is a finite group, then V is a semisimple
module over C[G].
Sketch of Proof. It suffices to show that F[G] is semisimple over itself, where F = C (the argument works over any field in which |G| is invertible). It is equivalent to show that for every submodule V of F[G], there exists a submodule P such that F[G] = V ⊕ P. This in turn is equivalent to showing that there is an F[G]-linear projection from F[G] onto V.

Since an F[G]-module can also be seen as an F-module, or equivalently a vector space over F, there exists an F-linear projection π from F[G] onto V. Now let $\phi : F[G] \to V$ be such that
$$\phi(x) = \frac{1}{|G|} \sum_{g \in G} g\pi(g^{-1}x).$$
For every $s \in G$ and $x \in F[G]$, we have
$$\phi(sx) = \frac{1}{|G|} \sum_{g \in G} g\pi(g^{-1}sx) = \frac{1}{|G|} \sum_{h \in G} s\,h\pi(h^{-1}x) = s\phi(x)$$
(substituting g = sh), which shows that ϕ is F[G]-linear. Lastly, we have to show that ϕ is surjective. Since for every $g \in G$ we have $g^{-1}V \subseteq V$, on which π acts as the identity, for every $v \in V$ we get
$$\phi(v) = \frac{1}{|G|}\sum_{g \in G} g\pi(g^{-1}v) = \frac{1}{|G|}\sum_{g \in G} g g^{-1} v = v.$$
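A tiny Python sketch of this averaging trick (illustrative, not from the text) for G = C2 acting on Q² by swapping coordinates, where the submodule is the diagonal {(a, a)}: we start from a projection that is not equivariant and average it into one that is.

```python
# Sketch of Maschke averaging for G = C2 acting on Q^2 by coordinate swap.
from fractions import Fraction

def g(v):                       # the nontrivial element of C2
    return (v[1], v[0])

def pi(v):                      # a plain linear projection onto the diagonal
    return (v[0], v[0])         # note: pi is NOT equivariant

def phi(v):                     # phi = (1/|G|) * sum over G of g . pi . g^{-1}
    a, b = pi(v), g(pi(g(v)))
    return tuple((x + y) / 2 for x, y in zip(a, b))

v = (Fraction(3), Fraction(5))
print(phi(v))                   # (4, 4): phi lands in the diagonal
print(phi(g(v)) == g(phi(v)))   # True: phi is C2-equivariant
print(phi((Fraction(7), Fraction(7))))  # (7, 7): phi fixes the submodule
```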
We can also restate Schur’s lemma in the new language that we just learned.
Theorem 10.6.3. (Schur’s lemma) Suppose that G is a finite group and M1 , M2 are
two simple modules over C[G], then HomC[G] (M1 , M2 ) = 0 unless M1 , M2 are isomorphic.
Moreover, $\operatorname{End}_{\mathbb{C}[G]}(M_1) \cong \mathbb{C}$.
Corollary 10.6.2. If R contains a field F and M is a simple module over R such that $\dim_F M$ is finite, then $\operatorname{End}_R(M)$ is finite dimensional over F. In particular, if F is algebraically closed, then $\operatorname{End}_R(M) \cong F$.
Sketch of Proof. Since $\operatorname{End}_R(M) \subseteq \operatorname{End}_F(M)$ as an F-vector space, we have that $\operatorname{End}_R(M)$ is finite dimensional over F. Now if F is algebraically closed, let $x \in \operatorname{End}_R(M)$. Since $\operatorname{End}_R(M)$ is finite dimensional over F, there exists $n \in \mathbb{N}$ such that $x^0, \ldots, x^n$ are linearly dependent over F. This shows that x is a root of a polynomial with coefficients in F, and since F is algebraically closed, we can split the polynomial into $(x - f_1) \cdots (x - f_n) = 0$ where $f_i \in F$. Since every nonzero element of $\operatorname{End}_R(M)$ is invertible (by Schur's lemma, as M is simple), one of the $x - f_i$ is zero, and so $x \in F$. Therefore $\operatorname{End}_R(M) = F$.
10.7 Structure Theorem for Module over PID

Example 10.7.1. We know that if R is a field, then M is a vector space over R and thus must have a basis. Therefore any module over a field is always free.

On the other hand, we know that Z/2Z is a module over Z. However, for every $x \in \mathbb{Z}/2\mathbb{Z}$ we have that 2x = 0, which shows that Z/2Z has no basis, and so it is not free.
Property 10.7.1. A unital module M over R is free if and only if it is a direct sum of
copies of R.
Sketch of Proof. Suppose that M is free with basis S. For each $s \in S$, let $R_s$ be a copy of R, and consider the map $\phi : \bigoplus_{s \in S} R_s \to M$ sending $1_{R_s}$ to s. Then the definition of a basis implies that ϕ is a bijection, and so M is isomorphic to the direct sum of $R_s$ ($s \in S$).
We can actually split this result into two parts (with a bit of generalization):
Lemma 10.7.1. Suppose that R is a left Noetherian ring. Then any submodule of a
finitely generated unital module over R is finitely generated.
Sketch of Proof. Let M be a finitely generated unital module generated by the elements $m_1, \ldots, m_n$. We can do an induction on n. The statement is trivial when n = 0. If it holds when n = k, then when n = k + 1, suppose that N is a submodule of M. Let I be the left ideal consisting of the elements i such that there exist $r_1, \ldots, r_k \in R$ satisfying
$$r_1 m_1 + \cdots + r_k m_k + i m_{k+1} \in N.$$
Since R is left Noetherian, we know that I is finitely generated. Suppose that it is generated by $i_1, \ldots, i_s$, and let $n_1, \ldots, n_s$ be elements in N such that for every j there exist $r_1, \ldots, r_k \in R$ satisfying
$$r_1 m_1 + \cdots + r_k m_k + i_j m_{k+1} = n_j \in N.$$
Let $N' = N \cap \langle m_1, \ldots, m_k \rangle$, which is finitely generated by the induction hypothesis. One can then check that
$$N' + \langle n_1, \ldots, n_s \rangle = N,$$
and so N is finitely generated.
Lemma 10.7.2. Suppose that R is a PID, M is a free module over R, and N is a submodule of M. Then N is a free module over R. Moreover, N has a basis of rank at most the rank of M.
Sketch of Proof. Let $M \cong \bigoplus_{i \in I} R$. By Zermelo's well-ordering theorem, we can assume that there is a well-ordering ≤ on I. For every $i \in I$ let $F_i = N \cap \bigoplus_{j \leq i} R$. We can consider the projection map at the i-th position $\pi_i : F_i \to R$, and the image will be an ideal of R. Since R is a PID, we know that there is $a_i \in R$ such that $\pi_i(F_i) = a_i R$. For every i with $a_i \neq 0$, let $n_i \in F_i$ be such that $\pi_i(n_i) = a_i$. We will show that those $n_i$ form a basis of N.
Let's first show that the $n_i$ are linearly independent: if there exist $i_1 < \ldots < i_k$ and $r_{i_1}, \ldots, r_{i_k} \neq 0$ in R with $r_{i_1} n_{i_1} + \cdots + r_{i_k} n_{i_k} = 0$, then applying $\pi_{i_k}$ gives $r_{i_k} a_{i_k} = 0$, which is impossible in an integral domain.

Now suppose that the $n_i$ do not generate N. Among the elements of N not generated by the $n_i$, choose a whose largest nonzero coordinate j is minimal (this is possible since I is well-ordered), say with nonzero coordinates at positions $t_1 < \cdots < t_{k'} = j$ taking values $s_{t_1}, \ldots, s_{t_{k'}} \neq 0 \in R$. Then $s_j = \pi_j(a) \in a_j R$, say $s_j = r a_j$. Since a is not generated, we know that $a - r n_j$ is not generated either. However, the j-th coordinate of $a - r n_j$ vanishes, so $a - r n_j \in F_{j'}$ for some $j' < j$, which contradicts the minimality of j.
Remark. The proof above may look scary, but think about the case where M has finite rank for a while. It will then be clear why the proof works.
Now suppose that M is a finitely generated module over a PID R, then we know that
there exists m such that M is a quotient module of Rm . We know that the kernel is a
submodule of $R^m$, and so by the theorem above we know that this submodule is also a finitely generated free module. This tells us that there is a module homomorphism $\phi : R^n \to R^m$ such that $\phi(R^n)$ is the kernel, or in other words, $M \cong R^m/\phi(R^n)$. Therefore the whole theory of finitely generated modules reduces to the theory of maps from $R^n$ to $R^m$. If we
can derive a canonical form of linear maps from Rn to Rm , then we can represent every
finitely generated module M over a PID in a canonical form.
Theorem 10.7.2. (Smith normal form) Suppose that R is a PID and $\phi : R^n \to R^m$ is an R-linear map. Then we can choose a basis $\beta_1, \ldots, \beta_n$ of $R^n$ and a basis $\beta_1', \ldots, \beta_m'$ of $R^m$ such that
$$\phi(\beta_i) = d_i \beta_i'$$
for every $i = 1, \ldots, n$, and that $d_1 \mid d_2 \mid \cdots \mid d_n$. Here if n > m, then $d_i = 0$ (and the equation is interpreted as $\phi(\beta_i) = 0$) for all i > m.
Sketch of Proof. Suppose that A is the matrix form of ϕ with respect to any two bases on $R^n$ and $R^m$. If we can do a series of changes of bases to make $a_{11}$ the greatest common divisor of all entries, then we can make $a_{1j}$ and $a_{i1}$ all zero for i, j > 1 by a further series of changes of bases, and then we can just induct on min(m, n). Therefore it suffices to make $a_{11}$ the greatest common divisor of all entries. To do this, we only need to know
how to make a specific entry the greatest common divisor of the entries in the same row,
or in the same column. To achieve this, we just need a tool such that for each i, j, j ′ , we
can replace aij with the gcd of aij and aij ′ . Since R is a PID, we can assume that the
gcd of $a_{ij}, a_{ij'}$ is $xa_{ij} + ya_{ij'}$, where we may take gcd(x, y) = 1, and so there are x′, y′ such that $xy' - yx' = 1$. This tells us that we can make a change of basis such that the j-th column becomes x times the j-th column plus y times the j′-th column, and the j′-th column becomes x′ times the j-th column plus y′ times the j′-th column (this change of basis is invertible precisely because $xy' - yx' = 1$). In this case, $a_{ij}$ is replaced by $xa_{ij} + ya_{ij'}$, which is the gcd of $a_{ij}, a_{ij'}$.
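To make the proof concrete, here is a minimal Python sketch of the algorithm for R = Z (the helper name `snf_diagonal` is hypothetical, not from the text): it repeatedly moves a smallest nonzero entry to the pivot position, performs the Euclidean row and column steps described above, and enforces the divisibility condition by folding an offending row into the pivot row.

```python
def snf_diagonal(matrix):
    """Nonzero diagonal entries d1 | d2 | ... of the Smith normal form
    of an integer matrix, via the row/column gcd steps from the proof."""
    A = [row[:] for row in matrix]
    m, n = len(A), len(A[0])
    diag = []
    for t in range(min(m, n)):
        while True:
            entries = [(abs(A[i][j]), i, j) for i in range(t, m)
                       for j in range(t, n) if A[i][j]]
            if not entries:
                return diag                  # remaining block is zero
            _, pi, pj = min(entries)         # smallest nonzero entry
            A[t], A[pi] = A[pi], A[t]        # move it to position (t, t)
            for row in A:
                row[t], row[pj] = row[pj], row[t]
            p = A[t][t]
            for i in range(t + 1, m):        # Euclidean row steps
                q = A[i][t] // p
                for c in range(t, n):
                    A[i][c] -= q * A[t][c]
            for j in range(t + 1, n):        # Euclidean column steps
                q = A[t][j] // p
                for r in range(t, m):
                    A[r][j] -= q * A[r][t]
            if any(A[i][t] for i in range(t + 1, m)) or \
               any(A[t][j] for j in range(t + 1, n)):
                continue                     # remainders left: retry smaller pivot
            bad = [(i, j) for i in range(t + 1, m)
                   for j in range(t + 1, n) if A[i][j] % p]
            if not bad:
                break
            for c in range(t, n):            # fold a bad row into row t,
                A[t][c] += A[bad[0][0]][c]   # so the gcd step catches it
        diag.append(abs(A[t][t]))
    return diag

print(snf_diagonal([[2, 4, 4], [-6, 6, 12], [10, -4, -16]]))  # [2, 6, 12]
```

Each pass either finishes a pivot or strictly decreases the smallest nonzero absolute value in the block, so the loop terminates.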
Corollary 10.7.1. (The structure theorem of module over PID) Suppose that M is a
finitely generated module over a PID R, then there exists d1 | · · · | dn such that M is
isomorphic to
R/d1 R × · · · × R/dn R.
Moreover, such a sequence is unique (up to conjugation) if d1 is not a unit.
Corollary 10.7.2. (The primitive factorization of module over PID) Suppose that M
is a finitely generated module over a PID R, then there exists q1 , . . . , qn where each qi is
conjugate to a power of an irreducible or zero such that M is isomorphic to
R/q1 R × · · · × R/qn R.
Moreover, such a sequence is unique up to permutation and conjugation.
10.8 Two Applications of the Structure Theorem

With this theorem, we can confidently say that we understand finitely generated abelian groups well enough. For example, we can now easily calculate the number of abelian groups of a given order.
Example 10.8.1. Let's calculate the number of abelian groups of order $360 = 2^3 \times 3^2 \times 5$. By the fundamental theorem of finitely generated abelian groups, we know that if G is an abelian group of order 360, then G can be written as
$$G \cong \mathbb{Z}/q_1\mathbb{Z} \times \cdots \times \mathbb{Z}/q_n\mathbb{Z}$$
where each $q_i$ is a prime power. Grouping the $q_i$ according to the prime involved, the number of such groups is the product of the numbers of partitions of the exponents: $p(3) \cdot p(2) \cdot p(1) = 3 \cdot 2 \cdot 1 = 6$, where p(k) denotes the number of partitions of k.
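This computation can be checked mechanically. Below is a minimal Python sketch (the helper names `partitions` and `num_abelian_groups` are hypothetical) that counts abelian groups of order n as the product of partition numbers of the exponents in the prime factorization.

```python
# Sketch: number of abelian groups of order n via the structure theorem.
from functools import lru_cache

@lru_cache(maxsize=None)
def partitions(n, largest=None):
    """Number of partitions of n into parts of size at most `largest`."""
    if largest is None:
        largest = n
    if n == 0:
        return 1
    return sum(partitions(n - k, k) for k in range(1, min(n, largest) + 1))

def num_abelian_groups(n):
    count, p = 1, 2
    while n > 1:                      # naive trial-division factorization
        if n % p == 0:
            a = 0
            while n % p == 0:
                n //= p
                a += 1
            count *= partitions(a)    # p(a) choices for the p-part of G
        p += 1
    return count

print(num_abelian_groups(360))        # 2^3 * 3^2 * 5 -> p(3)*p(2)*p(1) = 6
```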
Another good example of a PID is the polynomial ring F[x] over a field F. We know that every finitely generated F[x]-module is also an F-module, which means that it is a vector space over F. Hence it is natural to start with a finite dimensional vector space
V over F .
Now for every $v \in V$ and $a \in F$ we know how to define av. To have an F[x]-module structure on V, it remains to determine how to define $x \cdot v$ for every $v \in V$. It is not hard to show that determining $x \cdot v$ is equivalent to determining a linear operator on V. Therefore, given a linear operator T, we can define an F[x]-module structure on V by
$$f \cdot v = [f(T)](v)$$
for every $f \in F[x]$ and $v \in V$. Then, by the structure theorem, we know that V as an F[x]-module is isomorphic to
$$F[x]/\langle f_1 \rangle \oplus F[x]/\langle f_2 \rangle \oplus \cdots \oplus F[x]/\langle f_n \rangle$$
for some f1 , . . . , fn ∈ F [x]. This decomposes V into direct summands F [x]/⟨fi ⟩, and to
study the operation of T on V , it suffices to study it on each F [x]/⟨fi ⟩. Suppose that fi
is a polynomial of degree n, then F[x]/⟨fi⟩ is an n-dimensional vector space, and we can choose $\{1, x, \ldots, x^{n-1}\}$ as a basis. Assume that $f_i = x^n + \sum_{j=0}^{n-1} a_j x^j$; then
$$T(1) = x, \quad T(x) = x^2, \quad \ldots, \quad T(x^{n-2}) = x^{n-1}, \quad T(x^{n-1}) = -\sum_{j=0}^{n-1} a_j x^j.$$
In other words, the matrix of T with respect to this basis has 1's on the subdiagonal, the entries $-a_0, -a_1, \ldots, -a_{n-1}$ down the last column, and 0's elsewhere; this matrix is called the companion matrix of $f_i$.
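A short Python sketch of this construction (the helper name `companion` is hypothetical; f is given by its low-degree coefficients $a_0, \ldots, a_{n-1}$ with $f = x^n + a_{n-1}x^{n-1} + \cdots + a_0$):

```python
# Sketch: the matrix of T on F[x]/<f> in the basis {1, x, ..., x^{n-1}}.
def companion(a):
    n = len(a)
    # column j holds T(x^j): subdiagonal 1's encode T(x^j) = x^{j+1},
    # and the last column holds T(x^{n-1}) = -a_0 - a_1 x - ...
    return [[(1 if i == j + 1 else 0) if j < n - 1 else -a[i]
             for j in range(n)] for i in range(n)]

# f = x^2 + x + 2: T(1) = x and T(x) = -2 - x
print(companion([2, 1]))   # [[0, -2], [1, -1]]
```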
Definition 10.8.2. The matrices in the form above are said to be in their rational
canonical form.
Corollary 10.8.1. For any square matrix A, it is similar to a matrix in the rational
canonical form. Moreover, there is a unique matrix in the rational canonical form that is
similar to A.
Rational canonical form solves the problem we encountered when talking about Jordan
canonical form: we no longer need to work in an algebraically closed field in order to get
a canonical form of a linear operator. Rational canonical form also tells us how we should
tell if two given matrices are similar. In addition, it also gives a way to calculate the
minimal polynomial of a linear operator.
Property 10.8.1. Suppose that T is a linear operator on a finite dimensional vector
space V , and the rational blocks in the rational canonical form of T correspond to the
polynomials f1 , f2 , . . . , fn ∈ F [x]. Then the minimal polynomial of T is fn .
Sketch of Proof. We know that
$$V \cong F[x]/\langle f_1 \rangle \oplus \cdots \oplus F[x]/\langle f_n \rangle$$
when $x \cdot v = T(v)$ for every $v \in V$. Therefore for any $g \in F[x]$, we know that g(T) = 0 if and only if $f_i \mid g$ for every i. Since $f_1 \mid \cdots \mid f_n$, this is equivalent to $f_n \mid g$, and so $f_n$ is the minimal polynomial of T.
10.9 Random Problem Set

4. (10.4) For any R-modules M, N, show that every $\phi \in \operatorname{Hom}_R(M, N)$ induces a submodule of $M \times N$, namely $\{(m, \phi(m)) \mid m \in M\}$.
5. (10.5) Suppose that F is a field and V, W are vector spaces over F . Show that
V ⊗ W where here the tensor product is defined for vector spaces is the same as
V ⊗F W as F -modules.
6. (10.5) Compute the module Z/mZ ⊗Z Z/nZ where m, n are positive integers.
7. (10.5) Suppose that R is an integral domain and F is its fraction field. If M, N are
two R-modules and ϕ : M → N is R-linear and injective, show that the induced
map M ⊗R F → N ⊗R F is also injective.
8. (10.6) Try to rewrite the proof of orthogonality of characters into the language of
modules.
9. (10.7) Suppose that M is a module over R such that for every 0 ̸= m ∈ M and
$0 \neq r \in R$, we have $rm \neq 0$ (we say that M is torsion-free in this case). If R is a PID and M is finitely generated and torsion-free, show that M is a free module.
Use this to give an alternative proof of the structure theorem of module over PID.
10. (10.7) Given a matrix M over any PID, let $d_1 | d_2 | \ldots | d_n$ be the diagonal entries of its Smith normal form. Show that $d_1 \cdots d_i$ is the gcd of all the $i \times i$ minors.
11. (10.8) Suppose that (G, +) is a finite abelian group with order $p_1^{\alpha_1} p_2^{\alpha_2} \cdots p_n^{\alpha_n}$. Show that G is cyclic if and only if for every $p_i$, the number of elements x in G that satisfy $p_i x = 0$ is $p_i$.
Chapter 11
Galois Theory
In this last chapter, we are going to study Galois theory, which originates from the attempts to give a formula in radicals for the roots of polynomials of high degree and to trisect an angle with ruler and compass. Galois theory turns out to be far more powerful than that, and it has become a canonical and fundamental language in various fields of mathematics.
But before we even dive into Galois theory, which has a lot to do with field extensions, we should first understand fields better than we do now. In particular, I will talk more about finite fields, which we have tried to avoid in most of these notes. This will make it easier to work with fields of nonzero characteristic, which in turn helps us understand the fields where Galois theory does not behave as well.
There are a lot of desired properties that we will want the field extensions to have.
Let’s first name some basic ones.
Definition 11.1.2. Suppose that F ⊆ K is a field extension, then K is a vector space
over F . Hence the dimension of K over F is defined, and it is denoted by [K : F ]. We
say that the degree of the extension is [K : F ].
Definition 11.1.4. For any field extension F ⊆ K and any element $x \in K$, we say that x is algebraic over F if there exists a nonzero polynomial $f \in F[t]$ such that f(x) = 0 in K. If every element of K is algebraic over F, we say that the extension is algebraic.
Most of the field extensions discussed in this chapter will be finite, and almost all of them will be algebraic. In particular, a finite extension is always algebraic.
Property 11.1.2. An extension is finite if and only if it is algebraic and K is a finitely
generated F -algebra.
From this, we can actually see a way to construct a finite field extension.
Property 11.1.3. Suppose that F is a field and f ∈ F [t] is an irreducible polynomial.
Then F [t]/⟨f ⟩ is a finite extension. Moreover, [F [t]/⟨f ⟩ : F ] = deg f .
The extension that we form this way will always be of the form K = F [θ] for some θ
in K.
Definition 11.1.5. A field extension F ⊆ K is simple if there exists θ ∈ K such that
F [θ] = K. In this case, we say that θ is a primitive element in K over F .
Property 11.1.4. Suppose that F is a field. Then every finite simple extension K of
F is isomorphic to F[t]/⟨f⟩ for some irreducible polynomial f in F[t].
Example 11.1.1. Consider the field extension $\mathbb{R} \subseteq \mathbb{C}$. This is finite and simple: we can take $\mathbb{C} = \mathbb{R}[i]$. Therefore by the above construction we know that $\mathbb{C} \cong \mathbb{R}[t]/\langle t^2 + 1 \rangle$, and the minimal polynomial of i is $t^2 + 1$.
11.2 Finite Field Part One

Property 11.2.1. If F is a finite field with characteristic p, then |F| is a power of p.

Sketch of Proof. Suppose that q is a prime dividing |F|. Then by Cauchy's theorem (or again FTFAG if you prefer) there exists $x \neq 0$ such that $q \cdot x = 0$ in F. Multiplying by $x^{-1}$, we get that $q \cdot 1 = 0$. Hence q = p, and so |F| can only be a power of p.
Now we can ask ourselves: given a prime p and q, a power of p, how many finite fields
(up to isomorphism) are of order q? If q = pn , then by the idea given by the previous
section, it is natural to take an irreducible polynomial f ∈ Fp [t] with degree n and then
construct $F = \mathbb{F}_p[t]/\langle f \rangle$. However, it is not immediately clear why such a polynomial f exists. We will soon give a construction of a field $\mathbb{F}_q$ with order $q = p^n$, but for now, let's
assume that we already have this field.
Property 11.2.2. Suppose that F is a finite field of order $q = p^n$. Then every element x in F satisfies the relation $x^q - x = 0$.
Sketch of Proof. We know that $F^\times$ is a group of order q − 1, and so by Lagrange's theorem $1 = x^{|F^\times|} = x^{q-1}$ for all $x \neq 0$. Therefore $x^q = x$ for every $x \in F$.
Corollary 11.2.1. If F is a finite field of order q, then
$$\prod_{x \in F} (t - x) = t^q - t.$$
Moreover, for every $f, g \in F[t]$, we have that f(x) = g(x) for all $x \in F$ if and only if $t^q - t$ divides f − g.
Sketch of Proof. Since every $x \in F$ is a root of $t^q - t$, we know that t − x divides $t^q - t$. Hence,
$$\prod_{x \in F} (t - x) \mid (t^q - t),$$
and by comparing the degree and the leading coefficient, we know that those two poly-
nomials are the same.
Now f(x) = g(x) for all $x \in F$ if and only if t − x divides f − g for all $x \in F$. Hence this is equivalent to $t^q - t$ dividing f − g.
Example 11.2.1. We know that $F = \mathbb{F}_2[t]/(t^2 + t + 1)$ is a field consisting of 4 elements. Let x be the image of t in F; then the elements in F are 0, 1, x and x + 1. Then $x^2 = x + 1$ and $x^3 = 1$, and one can check directly that every element of F is a root of $t^4 - t = t(t-1)(t^2+t+1)$.
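This can be verified by brute force in Python (a sketch; an element a + b·x of $\mathbb{F}_4$ is encoded as the pair (a, b)):

```python
# Sketch: arithmetic in F_4 = F_2[t]/(t^2 + t + 1), with x^2 = x + 1.
def mul(u, v):
    a, b = u
    c, d = v
    # (a + bx)(c + dx) = ac + (ad + bc)x + bd x^2, and x^2 = x + 1
    return ((a * c + b * d) % 2, (a * d + b * c + b * d) % 2)

def power(u, n):
    r = (1, 0)
    for _ in range(n):
        r = mul(r, u)
    return r

field = [(0, 0), (1, 0), (0, 1), (1, 1)]     # 0, 1, x, x + 1
print(all(power(u, 4) == u for u in field))  # True: every element satisfies t^4 = t
```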
Therefore we can think of F as "the field consisting of the roots of $t^q - t$." This does not quite make sense yet, since we don't know if such a field exists or if it is unique. But if $t^q - t$ somehow splits into linear factors, then we can get a field consisting of q elements out of this.
Lemma 11.2.1. If F is a field with characteristic p, then $(x + y)^p = x^p + y^p$ for every x, y in F.
Sketch of Proof. By the binomial theorem, $(x+y)^p = \sum_{i=0}^{p} \binom{p}{i} x^i y^{p-i}$. For 0 < i < p, the binomial coefficient $\binom{p}{i}$ is divisible by p and hence vanishes in F, leaving $(x + y)^p = x^p + y^p$.
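A quick numerical sanity check of the lemma (the "freshman's dream") in Python:

```python
# Check (x + y)^p == x^p + y^p mod p for every x, y in F_p, here p = 7.
p = 7
print(all(pow(x + y, p, p) == (pow(x, p, p) + pow(y, p, p)) % p
          for x in range(p) for y in range(p)))   # True
```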
Now let K be a splitting field of $t^q - t$ over $\mathbb{F}_p$, and let F be the set of roots of $t^q - t$ in K. For every $x, y \in F$ we have
$$(xy)^q = x^q y^q = xy$$
and
$$(x + y)^q = (x + y)^{p^n} = x^{p^n} + y^{p^n} = x^q + y^q = x + y$$
(applying the lemma n times), which shows that $xy, x + y \in F$, as desired.
We still need to show that F has exactly q elements. Since $t^q - t$ splits into linear factors, if |F| < q, then there exists $x \in F$ such that $(t - x)^2 \mid t^q - t$. This shows that x is a double root of $t^q - t$, and so x should be a root of the derivative of $t^q - t$. Since the derivative of $t^q - t$ is $qt^{q-1} - 1 = -1$, we know that this cannot happen. Therefore |F| = q.
Remark. We use the derivative of polynomials in the proof. However, the usual defini-
tion of taking a derivative does not work in a general field K. Here, taking the derivative
means calculating the derivative formally, i.e.
$$\frac{d}{dt} \sum_{i=0}^{n} a_i t^i = \sum_{i=1}^{n} i a_i t^{i-1}.$$
It can be checked that the formulas we have for the usual derivatives (linearity and the product rule) still hold. In particular, if
$$f(t) = c \prod_{i=1}^{n} (t - \alpha_i),$$
then $f'(t) = c \sum_{j=1}^{n} \prod_{i \neq j} (t - \alpha_i)$.
11.3 Splitting Field

Definition 11.3.1. Let F be a field and f a polynomial in F[t]. An extension K of F is a splitting field of f over F if f splits into linear factors in K[t] and K is generated over F by the roots of f.

Consider the extension $\mathbb{Q} \subseteq \mathbb{Q}[\sqrt[3]{2}]$: the polynomial $t^3 - 2$ has a root there, but its other two roots $\omega\sqrt[3]{2}$ and $\omega^2\sqrt[3]{2}$ (where ω is a primitive cube root of unity) are not real. In order to make it a splitting field, we need to put ω in and enlarge the extension to $\mathbb{Q} \subseteq \mathbb{Q}[\sqrt[3]{2}, \omega]$. This also raises the degree of the extension from 3 to 6.

In fact, we can show that $\mathbb{Q}[\sqrt[3]{2}]$ is not a splitting field over $\mathbb{Q}$, that is, there does not exist $f \in \mathbb{Q}[t]$ such that $\mathbb{Q}[\sqrt[3]{2}]$ is a splitting field of f over $\mathbb{Q}$. It would be somewhat annoying to show it here, so I will delay the proof until we have enough tools to prove it.
This example tells us that the construction that we have been using might not directly give us a splitting field of f over F. That said, using the construction several times can give us the desired extension.
Theorem 11.3.1. (Existence of splitting field) Let F be a field and f be a polynomial
in F [t]. Then there exists an extension K of F such that K is a splitting field of f over
F.
Sketch of Proof. We can prove this by induction on deg f. The statement clearly holds when deg f = 1. Now suppose that the statement holds for every polynomial of degree less than n. When deg f = n, we can first assume that f is irreducible. This is because if f = gh with deg g, deg h < deg f, then we can first find a splitting field $K_1$ of g over F, and then find a splitting field $K_2$ of h over $K_1$; $K_2$ will then be a splitting field of f over F.
Now if f is irreducible, then we can take $K_1 \cong F[t]/\langle f \rangle$ as an extension of F. Let x be the image of t in $K_1$; then we know that x is a root of f. Therefore f(t) factors as (t − x)g(t). Since deg g = deg f − 1, we can take $K_2$ to be a splitting field of g over $K_1$. Then $K_2$ is a splitting field of f over F, as desired.
Theorem 11.3.2. (Uniqueness of splitting field) Let F be a field, f a polynomial in F[t], and K, K′ two splitting fields of f over F. Then $K \cong K'$.

Sketch of Proof. Again, we can show this by induction. It clearly holds when deg f = 1. Now suppose that it holds when deg f < n. When deg f = n, we can still assume that f is irreducible by the same argument as above.
Since f splits into linear factors in K and K ′ , we can choose roots x ∈ K, x′ ∈ K ′
of f . Then the minimal polynomial of x and x′ is f since f is irreducible, and so
F [x] ∼= F [x′ ]. Let f (t) = (t−x)g(t) = (t−x′ )g ′ (t). It is easy to check that the coefficients
of g, g′ are polynomials in x and x′, respectively. Therefore $g(t) \in F[x][t]$ is identified with $g'(t) \in F[x'][t]$ when identifying F[x] with F[x′]. Since deg g = deg g′ = deg f − 1
and K, K ′ are splitting fields of g, g ′ over F [x], F [x′ ], respectively, we know by induction
hypothesis that K ∼ = K ′ , as desired.
Corollary 11.3.2. For every prime power q, there is a unique field (up to isomorphism) of order q.

Definition 11.3.2. For every prime power q, denote the unique field of order q by $\mathbb{F}_q$.
11.4 Finite Field Part 2
Property 11.4.1. Suppose that p is a prime and m, n are two positive integers. Then there exists a field extension $\mathbb{F}_{p^m} \subseteq \mathbb{F}_{p^n}$ if and only if $m \mid n$. Moreover, if $m \mid n$, then such an extension is unique.
Sketch of Proof. If there is a field extension $\mathbb{F}_{p^m} \subseteq \mathbb{F}_{p^n}$, suppose that the degree of the extension is d. Then $p^n = |\mathbb{F}_{p^n}| = |\mathbb{F}_{p^m}|^d = p^{md}$, and so $m \mid n$.
Conversely, if $m \mid n$, then we know that the polynomial $t^{p^m} - t$ divides the polynomial $t^{p^n} - t$. Since $t^{p^n} - t$ splits into linear factors in $\mathbb{F}_{p^n}$, we know that $t^{p^m} - t$ also splits into linear factors in $\mathbb{F}_{p^n}$, and so by Theorem 11.2.1 we know that there is a subset of order $p^m$ in $\mathbb{F}_{p^n}$ that forms a field. By the uniqueness we know that this field is $\mathbb{F}_{p^m}$. This extension is also unique since the elements of $\mathbb{F}_{p^m}$ are exactly the roots of the polynomial $t^{p^m} - t$.

Property 11.4.2. The polynomial $t^{p^n} - t$ is the product of all monic irreducible polynomials in $\mathbb{F}_p[t]$ whose degree divides n.

Sketch of Proof. First, suppose that $f \in \mathbb{F}_p[t]$ is a monic irreducible polynomial of degree m with $m \mid n$, and let x be the image of t in $\mathbb{F}_p[t]/\langle f \rangle \cong \mathbb{F}_{p^m} \subseteq \mathbb{F}_{p^n}$. Note that the minimal polynomial of x is f, and since $x^{p^n} = x$, we get $f \mid t^{p^n} - t$, as desired.
Now to show that $t^{p^n} - t$ is the product of all monic irreducible polynomials in $\mathbb{F}_p[t]$ that have degree dividing n, we can group the product $\prod_{x \in \mathbb{F}_{p^n}} (t - x)$ in a way that it becomes a product of irreducible polynomials (to achieve this, group the x's that share the same minimal polynomial). Then we know that $t^{p^n} - t$ is a product of some distinct irreducible polynomials that have degree dividing n. By the previous part, every such irreducible polynomial divides $t^{p^n} - t$, and so $t^{p^n} - t$ is in fact the product of all of them.
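This can be checked computationally; here is a sketch for p = 2 and n = 4 with sympy:

```python
# Sketch: t^(2^4) - t factors over F_2 into exactly the irreducible
# polynomials of degree 1, 2 and 4 (the divisors of 4).
from sympy import symbols, factor

t = symbols('t')
print(factor(t**16 - t, modulus=2))
# t * (t + 1) * (t^2 + t + 1) * (the three irreducible quartics over F_2)
```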
Property 11.4.3. The multiplicative group $\mathbb{F}_q^\times$ is cyclic; a generator is called a primitive root.

Sketch of Proof. For every $n \in \mathbb{N}$, let $a_n$ be the number of nonzero elements with order n in the multiplicative group. If $a_n > 0$ for some n, then by Lagrange's theorem we know that $n \mid q - 1$. Moreover, suppose that x is an element with order n; then $x, x^2, \ldots, x^n = 1$ are n distinct elements that are roots of the polynomial $t^n - 1$. Therefore, there are no other elements that are roots of $t^n - 1$, which also means that there are no other elements of order n. Since $x^i$ is of order n if and only if i, n are coprime, we know that $a_n = \varphi(n)$ in this case. Therefore $a_n \leq \varphi(n)$ in any case.
Since there are q − 1 nonzero elements, we have that
$$q - 1 = \sum_{n=1}^{q-1} a_n = \sum_{n \mid q-1} a_n \leq \sum_{n \mid q-1} \varphi(n) = q - 1,$$
and so each equality must hold. This shows that $a_{q-1} = \varphi(q-1) > 0$, and so there exists an element that generates $\mathbb{F}_q^\times$.
To find a primitive root is actually often a hard task in practice, but the existence
of a primitive root itself already is powerful in some situations. See problem set for an
example.
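For small prime q, a primitive root can be found by trial in Python (a sketch; `primitive_root` is a hypothetical helper name):

```python
# Sketch: find a generator of F_q^x for prime q, using the existence result:
# g is a generator iff no proper divisor d of q - 1 satisfies g^d = 1.
def primitive_root(q):
    divisors = [d for d in range(1, q) if (q - 1) % d == 0]
    for g in range(2, q):
        if all(pow(g, d, q) != 1 for d in divisors if d < q - 1):
            return g

print(primitive_root(23))   # 5, the smallest primitive root mod 23
```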
At this point, we can confidently say that we know enough about finite fields and finite extensions of finite fields. Starting from the next section, we are going to explore more properties of field extensions. Now that we have those examples of finite extensions of finite fields, feel free to check those properties on them to help your understanding.
11.5 Algebraic Closure

In this section, we will show that our intuition is correct: for every field F, there exists a unique algebraic closure K of F up to isomorphism (recall that an algebraic closure of F is an algebraic extension of F that is algebraically closed). The proofs are not that helpful for the upcoming material, so feel free to skip them if you feel comfortable assuming this fact.
Let’s first prove the existence. We will proceed in several steps.
Lemma 11.5.1. Let F be a field. Then there exists an extension K of F such that
every polynomial in F [t] has a root in K.
Sketch of Proof. Let S be the set of irreducible polynomials. For each f ∈ S, associate
a variable tf with it. Let R = F [{tf | f ∈ S}] be the polynomial ring in infinitely many
variables. Consider the ideal I generated by $\{f(t_f) \mid f \in S\}$. It is clear that I is not the unit ideal (an identity $1 = \sum_i g_i f_i(t_{f_i})$ would give a contradiction after evaluating at a common root of $f_1, \ldots, f_k$ in a suitable finite extension of F), and so it is contained in some maximal ideal m of R. Let K = R/m, which is a field.
Then there is clearly a natural embedding F ⊆ K. Moreover, if we let xf be the image
of tf in K, then for each f ∈ S we have that f (xf ) = 0 since f (tf ) ∈ I ⊆ m.
Lemma 11.5.2. Let F be a field. Then there exists an extension K of F such that K
is algebraically closed.
Sketch of Proof. Let $F_0 = F$. For each n, choose $F_{n+1}$ to be an extension of $F_n$ such that every irreducible polynomial in $F_n[t]$ has a root in $F_{n+1}$. Let K be the union of $F_0, F_1, \ldots$.
Lemma 11.5.3. Let K be an algebraic extension of F and F̄ an algebraic closure of F. Then the inclusion F ⊆ F̄ extends to an embedding K → F̄.

Sketch of Proof. The idea here is to define a partial embedding f : L → F̄ for each field L ⊆ K that contains F, and show that we can extend this until we get an embedding K → F̄.
Let D be the set of pairs (L, f) where L is a subfield of K and at the same time an extension of F, and f is an embedding L → F̄. Note that D is nonempty since we are given an inclusion F ⊆ F̄. Define a partial order ≤ on D so that $(L_1, f_1) \leq (L_2, f_2)$ if
and only if $L_1 \subseteq L_2$ and $f_2|_{L_1} = f_1$. For every chain C in D, let
$$L_C = \bigcup_{(L,f) \in C} L$$
and define fC (x) = f (x) if x ∈ L and (L, f ) ∈ C. It is easy to check that fC is well-
defined and (LC , fC ) ∈ D. It is also clear that (LC , fC ) is an upper bound of C, and so
by Zorn’s lemma we can choose a maximal element (L, f ).
Now if L ̸= K, then we can choose x ∈ K\L. Since x is algebraic over F , we know
that x is algebraic over L. Also since f (L) contains f (F ) = F , we know that F̄ is also
an algebraic closure of f (L). Let g(t) be the minimal polynomial of x over L, then
L[x] ∼= L[t]/⟨g⟩. Let α ∈ F̄ be a root of f (g(t)) (this exists since F̄ is an algebraic closure
of f (L)), then we can extend f onto L[t]/⟨g⟩ by sending t to α. Therefore we can extend
f onto L[x], which is strictly larger than L. This contradicts the maximality of (L, f), and so L = K.
11.6 Automorphism
The main object that Galois theory cares about for an extension is the automorphism
group. It might not be clear why it is useful now, but we will soon see its power once we
are able to state and prove the main theorem of Galois theory.
Definition 11.6.1. Suppose that K/F is a field extension. The automorphism group
Aut(K/F ) of the extension K/F is the subgroup of Aut(K) consisting of the automor-
phisms that fix F .
Example 11.6.1. Consider the extension C/R. For every $a, b \in \mathbb{R}$ and $\sigma \in \operatorname{Aut}(\mathbb{C}/\mathbb{R})$ we have $\sigma(a + bi) = a + b\sigma(i)$, and so σ is determined by the value of σ(i). Now note that $\sigma(i)^2 = \sigma(i^2) = -1$, which shows that $\sigma(i) = \pm i$. Therefore $\operatorname{Aut}(\mathbb{C}/\mathbb{R}) \cong C_2$.
Example 11.6.2. Now consider the field extension $\mathbb{Q}[\sqrt[3]{2}]/\mathbb{Q}$. We know that for every $a, b, c \in \mathbb{Q}$ and $\sigma \in \operatorname{Aut}(\mathbb{Q}[\sqrt[3]{2}]/\mathbb{Q})$, we have
$$\sigma(a + b\sqrt[3]{2} + c\sqrt[3]{2}^{\,2}) = a + b\sigma(\sqrt[3]{2}) + c\sigma(\sqrt[3]{2})^2,$$
and so σ is uniquely determined by the value of $\sigma(\sqrt[3]{2})$. Since
$$\sigma(\sqrt[3]{2})^3 - 2 = \sigma(\sqrt[3]{2}^{\,3} - 2) = 0,$$
the value $\sigma(\sqrt[3]{2})$ is forced to be $\sqrt[3]{2}$ (the only real cube root of 2 in $\mathbb{Q}[\sqrt[3]{2}]$). Therefore $\operatorname{Aut}(\mathbb{Q}[\sqrt[3]{2}]/\mathbb{Q})$ is a trivial group.
One can see that the reason that the automorphism group is trivial is that the polynomial $t^3 - 2$ only has one root in $\mathbb{Q}[\sqrt[3]{2}]$. To fix this, we can take the splitting field of $t^3 - 2$ over $\mathbb{Q}[\sqrt[3]{2}]$, which is $\mathbb{Q}[\sqrt[3]{2}, \omega]$. It is still clear that $\sigma \in \operatorname{Aut}(\mathbb{Q}[\sqrt[3]{2}, \omega]/\mathbb{Q})$ is determined by the values of $\sigma(\sqrt[3]{2})$ and $\sigma(\omega)$. There are three choices for $\sigma(\sqrt[3]{2})$, namely $\sqrt[3]{2}, \omega\sqrt[3]{2}, \omega^2\sqrt[3]{2}$, and there are two choices for $\sigma(\omega)$, namely $\omega$ and $\omega^2$. Therefore there are six choices in total, and with some hard work one can see that each of them indeed induces an automorphism in $\operatorname{Aut}(\mathbb{Q}[\sqrt[3]{2}, \omega]/\mathbb{Q})$.

Although we know $|\operatorname{Aut}(\mathbb{Q}[\sqrt[3]{2}, \omega]/\mathbb{Q})| = 6$, we don't quite know what the group structure is yet. Notice that σ is also determined by the values of $\sigma(\sqrt[3]{2})$, $\sigma(\omega\sqrt[3]{2})$ and $\sigma(\omega^2\sqrt[3]{2})$, and each σ should permute $\sqrt[3]{2}, \omega\sqrt[3]{2}$ and $\omega^2\sqrt[3]{2}$. Therefore $\operatorname{Aut}(\mathbb{Q}[\sqrt[3]{2}, \omega]/\mathbb{Q})$ acts on the set $\{\sqrt[3]{2}, \omega\sqrt[3]{2}, \omega^2\sqrt[3]{2}\}$ faithfully, which induces a monomorphism from $\operatorname{Aut}(\mathbb{Q}[\sqrt[3]{2}, \omega]/\mathbb{Q})$ to $S_3$. Since they have the same order, we know that $\operatorname{Aut}(\mathbb{Q}[\sqrt[3]{2}, \omega]/\mathbb{Q}) \cong S_3$.
What this example tells us is that "splitting" is one of the conditions under which the automorphism group does not "degenerate." This is not the only condition, but from here we should see that the conditions should be something about the roots of the polynomials. The second condition is rather technical, so I will delay it until later.
What we learn from those two examples is that: (1) In a finite extension K/F ,
there are some elements x1 , . . . , xn such that the values of σ(x1 ), . . . , σ(xn ) can determine
σ ∈ Aut(K/F ); (2) The image σ(x) should share the same polynomial relations with x,
and in particular they should share the minimal polynomial; (3) The automorphism group
of finite extension has finite order. The third observation might be somewhat surprising
before we see the examples, but it should be clear now. We will spend the rest of this section stating those observations formally.
Property 11.6.2. Suppose that K/F is an extension, and K is a finitely generated
F -algebra generated by x1 , . . . , xn . Then for every two elements σ, σ ′ ∈ Aut(K/F ), if
σ(xi ) = σ ′ (xi ) for all i = 1, . . . , n, then σ = σ ′ .
Property 11.6.3. Suppose that K/F is an extension, $\sigma \in \operatorname{Aut}(K/F)$, and $x \in K$ is algebraic over F. Then σ(x) has the same minimal polynomial over F as x; in this case we say that σ(x) is a conjugate of x over F.

Sketch of Proof. Let f be the minimal polynomial of x over F and f′ be the minimal polynomial of σ(x) over F. Then $f(\sigma(x)) = \sigma(f(x)) = 0$, so $f' \mid f$; since f is irreducible and both are monic, f = f′.
Corollary 11.6.1. Let F [θ]/F be a finite simple extension. Then | Aut(F [θ]/F )| is the
number of conjugates of θ over F in F [θ].
Sketch of Proof. We know that σ(θ) is a conjugate of θ, and once we know which conjugate it is, the automorphism σ is determined. Therefore we just need to check that for any conjugate θ′ of θ in F[θ] there is an automorphism σ such that σ(θ) = θ′. Suppose that the minimal polynomial of θ is f. Then we have isomorphisms
$$\pi : F[t]/\langle f \rangle \to F[\theta] \quad \text{sending } t \text{ to } \theta$$
and
$$\pi' : F[t]/\langle f \rangle \to F[\theta'] = F[\theta] \quad \text{sending } t \text{ to } \theta',$$
and $\sigma = \pi' \circ \pi^{-1}$ is an automorphism with σ(θ) = θ′.
Corollary 11.6.2. If K/F is a finite extension, then there are only finitely many au-
tomorphisms of K/F .
Lemma 11.6.1. (Linear independence of characters) Let $\sigma_1, \ldots, \sigma_n$ be distinct automorphisms of a field K. Then they are linearly independent over K: if $c_1, \ldots, c_n \in K$ satisfy
$$c_1 \sigma_1(x) + \cdots + c_n \sigma_n(x) = 0$$
for every $x \in K$, then $c_1 = \cdots = c_n = 0$.

Sketch of Proof. Suppose for the sake of contradiction that they are not linearly independent. Then we can find a minimal linearly dependent set $\sigma_1, \ldots, \sigma_n$, i.e. some $c_1, \ldots, c_n \neq 0$ such that
$$c_1 \sigma_1(x) + \cdots + c_n \sigma_n(x) = 0$$
for every $x \in K$. Since $\sigma_1 \neq \sigma_n$, we can choose $y \in K$ with $\sigma_1(y) \neq \sigma_n(y)$. Applying the relation to yx and subtracting it from $\sigma_n(y)$ times the relation at x, we get
$$c_1(\sigma_n(y) - \sigma_1(y))\sigma_1(x) + \cdots + c_{n-1}(\sigma_n(y) - \sigma_{n-1}(y))\sigma_{n-1}(x) = 0.$$
Since c1 , σn (y) − σ1 (y) ̸= 0, we know that σ1 , . . . , σn−1 are linearly dependent, which
contradicts the minimality of σ1 , . . . , σn .
which shows that $\sigma_1, \ldots, \sigma_n$ are linearly dependent. This contradicts the lemma, and so m < n is impossible.

11.7 Fixed Field

Definition 11.7.1. Let K be a field and G a group of automorphisms of K. The fixed field $K^G$ is the subfield of K consisting of the elements fixed by every $\sigma \in G$.
Example 11.7.1. Take our good old field extension C/R. Let G be the group Aut(C/R). Then $\mathbb{C}^G$ is R, since the automorphisms in G are the identity and complex conjugation. In this case, taking the automorphism group and then taking the fixed field gives us the original field.
Example 11.7.2. Now consider again the field extension $\mathbb{Q}[\sqrt[3]{2}]/\mathbb{Q}$. Since $\operatorname{Aut}(\mathbb{Q}[\sqrt[3]{2}]/\mathbb{Q})$ is trivial, we know that $\mathbb{Q}[\sqrt[3]{2}]^{\operatorname{Aut}(\mathbb{Q}[\sqrt[3]{2}]/\mathbb{Q})}$ is $\mathbb{Q}[\sqrt[3]{2}]$. In this case, taking the automorphism group and then taking the fixed field gives us a strictly larger field.

Again, we can consider instead the field extension $\mathbb{Q}[\sqrt[3]{2}, \omega]/\mathbb{Q}$. Let $G \cong S_3$ be the automorphism group of this extension. Then it is clear that $\mathbb{Q}[\sqrt[3]{2}, \omega]^G = \mathbb{Q}$. We can say even more about this field extension. Let H be the subgroup of G containing the identity map and the automorphism σ such that $\sigma(\sqrt[3]{2}) = \sqrt[3]{2}$, $\sigma(\omega) = \omega^2$. Then $\mathbb{Q}[\sqrt[3]{2}] = \mathbb{Q}[\sqrt[3]{2}, \omega]^H$. This gives us back the original field extension we were working with, which somehow tells us that extending the field extension further until it is splitting is the right thing to do.

If we take instead the subgroup H′ that fixes ω, then we can see that $\mathbb{Q}[\sqrt[3]{2}, \omega]^{H'} = \mathbb{Q}[\omega]$.
In the above example, we can see that somehow “taking the fixed field” and “tak-
ing the automorphism group” build up a correspondence between the subgroups of the
automorphism group and the fields that lie between.
Definition 11.7.2. Let F ⊆ K be a field extension. We say that L is an intermediate
field if L is a field such that F ⊆ L ⊆ K.
Consider the maps F sending an intermediate field $L \in S_F$ to Aut(K/L), and G sending a subgroup $H \in S_G$ to the fixed field $K^H$, where $S_F$ denotes the set of intermediate fields and $S_G$ the set of subgroups of G = Aut(K/F). Both maps are order-reversing, and $L \subseteq G(F(L))$ for every L, while $H \subseteq F(G(H))$ for every H.

Sketch of Proof. We just need to show that the two maps are well-defined (the remaining claims follow directly from the definitions). For every $L \in S_F$, we need to show that Aut(K/L) is a subgroup of Aut(K/F). Note that since $F \subseteq L$, every element in Aut(K/L) fixes F and thus is in Aut(K/F). We also need to show that for every $H \in S_G$, the fixed field $K^H$ is an intermediate field. It is certainly a subfield of K, and so we just need to show that it contains F. Since H is a subgroup of G, H fixes all the elements in F, and so $F \subseteq K^H$, as desired.
Corollary 11.7.1. F ◦ G ◦ F = F, G ◦ F ◦ G = G.
Sketch of Proof. For every intermediate field L, the property gives $L \subseteq G(F(L))$; applying the order-reversing map F gives $F(G(F(L))) \subseteq F(L)$. On the other hand, applying the property to the subgroup F(L) gives $F(L) \subseteq F(G(F(L)))$. Hence F ◦ G ◦ F = F. A similar argument works for G.
Corollary 11.7.2. F and G form a bijection between intermediate fields that are fixed
fields of some subgroups, and the subgroups that are automorphism groups of some
intermediate fields.
Therefore, in order to get this good correspondence, we will want F to be the fixed field of Aut(K/F). This is where the focus of Galois theory is.
Definition 11.7.3. Let F ⊆ K be a field extension. If F is the fixed field of Aut(K/F ),
and the extension is algebraic, then we say that this extension is Galois. In this case, we
usually write Gal(K/F ) instead of Aut(K/F ), and we call Gal(K/F ) the Galois group
of the extension. Moreover, if K is the splitting field of f over F , then we also call
Gal(K/F ) the Galois group of f .
Theorem 11.7.1. If K/F is an extension where F is the fixed field of a finite auto-
morphism group G, then [K : F ] = |G|.
and so we have $[K : F] = |G|$.
Example 11.7.3. The extension C/R is Galois. It has degree 2, and its Galois group has order 2.

The extension $\mathbb{Q}[\sqrt[3]{2}]/\mathbb{Q}$ is not Galois. It has degree 3, while its group of automorphisms has order 1.

The extension $\mathbb{Q}[\sqrt[3]{2}, \omega]/\mathbb{Q}$ is Galois. Its Galois group has order 6, so the theorem implies that it has degree 6. It is also not hard to show without the theorem that $[\mathbb{Q}[\sqrt[3]{2}, \omega] : \mathbb{Q}] = 6$.
Lastly, let’s validate our claim that splitting field is a desirable property.
Property 11.7.3. If K/F is a finite Galois extension, then it is splitting.
Sketch of Proof. It suffices to show that every element has a splitting minimal polynomial
over F in K (this is not that clear and is left as an exercise). Now for every x ∈ K we
can consider the polynomial
$$\prod_{\sigma \in \operatorname{Gal}(K/F)} (t - \sigma(x)).$$
This polynomial is fixed by all the elements in Gal(K/F), and so its coefficients are fixed too. Since F is the fixed field, the coefficients lie in F, which shows that there is a polynomial splitting in K that has x as a root. Since the minimal polynomial divides that polynomial, the minimal polynomial also splits into linear factors in K, as desired.
11.8 Separability
We just proved that finite Galois implies splitting. In fact, it also implies that the
extension is “separable.” Before proving this, let’s first define what separable actually
means.
Definition 11.8.1. Let K/F be a field extension. A polynomial f over F is separable
if it has no double roots in its splitting field. An element x ∈ K is separable over F if its
minimal polynomial is separable. The field extension K/F is separable if every element
in K is separable over F .
It is quite annoying to work with the splitting field when we want to verify whether an element is separable or not. The following property allows us to work only in the base field to verify the separability of an element.
Property 11.8.1. Let K/F be a field extension, and let x be an algebraic element in K with minimal polynomial f. Then x is separable over F if and only if the formal derivative of f is nonzero. Here, the formal derivative is simply taking the derivative by the usual formula for taking derivatives of polynomials. Namely,
d/dt (an t^n + · · · + a1 t + a0) = n an t^{n−1} + · · · + 2a2 t + a1.
Sketch of Proof. One can verify that, as usual, a polynomial f that splits into linear factors has a double root if and only if gcd(f, f′) ≠ 1. Now, since f is the minimal polynomial of x over F, we know that f is irreducible, so gcd(f, f′) ≠ 1 forces f | f′, and since deg f′ < deg f this can happen only when f′ = 0. Therefore f has a double root ⇔ gcd(f, f′) ≠ 1 ⇔ f | f′ ⇔ f′ = 0, as desired.
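As a small computational illustration of this gcd criterion, one can compare a separable and a non-separable polynomial with sympy (a sketch, not part of the original notes):

    from sympy import symbols, gcd, diff

    t = symbols('t')
    f = t**3 - 2
    print(gcd(f, diff(f, t)))    # 1: no double roots, so f is separable
    g = (t**2 + 1)**2
    print(gcd(g, diff(g, t)))    # t**2 + 1: double roots, so g is not separable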
Corollary 11.8.1. Every field extension with characteristic zero is separable.
Sketch of Proof. In characteristic zero, the formal derivative of a minimal polynomial of degree n ≥ 1 has leading term n an t^{n−1} ≠ 0, and hence is nonzero.
Corollary 11.8.2. Every algebraic extension of a finite field is separable.
Sketch of Proof. Let p be the characteristic. We can first prove that Fq is separable over Fp for every power q of p. Suppose that f ∈ Fp[t] is an irreducible polynomial whose formal derivative is 0; then we can write f as
an t^{pn} + a_{n−1} t^{p(n−1)} + · · · + a1 t^p + a0
for some a0, . . . , an ∈ Fp. Since a^p = a for every a ∈ Fp, we then have
f(t) = (an t^n + · · · + a0)^p,
which contradicts the irreducibility of f. Therefore every minimal polynomial over Fp has nonzero formal derivative, and Fq/Fp is separable. For a general extension of Fq, the minimal polynomial of an element over Fq divides its minimal polynomial over Fp, and so it has no double roots either.
The two corollaries show that all the field extensions mentioned above are separable.
This somehow shows that “reasonable” field extensions are separable. However, there are
still extensions that are not separable.
Example 11.8.1. Consider the field extension Fp(x)/Fp(x^p). Then the element x is not separable over Fp(x^p): the minimal polynomial of x is t^p − x^p, whose formal derivative is p t^{p−1} = 0 (because the characteristic is p).
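One can watch the same collapse happen over Fp itself; for instance, with p = 5 and 32 = 2^5, factoring modulo 5 in sympy shows t^5 − 32 degenerating into a p-th power (a quick sketch):

    from sympy import symbols, factor

    t = symbols('t')
    # freshman's dream in characteristic 5: t^5 - 2^5 = (t - 2)^5
    print(factor(t**5 - 32, modulus=5))   # (t + 3)**5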
Extensions that are not separable have a lot of undesired properties, so usually we will focus on separable extensions. There certainly are theories developed for those inseparable extensions, but they are far beyond the scope of this note.
Now we are ready to prove that finite Galois implies separable.
Theorem 11.8.1. If K/F is a finite Galois extension, then it is separable.
Sketch of Proof. For simplicity, let G = Aut(K/F). For any element x, let Ox be the orbit of x with respect to the group action of G on K. Then the polynomial
f(t) = ∏_{y∈Ox} (t − y)
is fixed by the group action of G. Since F is the fixed field of G, we know that f(t) ∈ F[t]. Moreover, any polynomial over F having x as a root must have every element of Ox as a root, so f is the minimal polynomial of x. Since the elements of Ox are pairwise distinct, f has no double roots, and so x is separable over F. Thus, the extension K/F is separable.
Corollary 11.8.3. Let K/F be a finite Galois extension. Then for every x ∈ K, the polynomial
∏_{σ∈Gal(K/F)} (t − σ(x))
is a power of the minimal polynomial of x over F.
Sketch of Proof. By the proof above, we know that the minimal polynomial is
f(t) = ∏_{y∈Ox} (t − y),
and each y ∈ Ox equals σ(x) for exactly |Gal(K/F)|/|Ox| automorphisms σ, so
∏_{σ∈Gal(K/F)} (t − σ(x)) = f(t)^{|Gal(K/F)|/|Ox|}.
Finite Galois implies both splitting and separable. The converse is also true: being the splitting field of a separable polynomial implies finite Galois.
Lemma 11.8.1. (Transitivity of automorphisms) Let K be a splitting field over F. Then for any elements x, y ∈ K that are conjugate to each other over F, there exists an automorphism σ ∈ Aut(K/F) such that σ(x) = y.
Sketch of Proof. Suppose that K is the splitting field of f. Since x, y are conjugates, the two fields F[x] and F[y] are isomorphic. Note that K is the splitting field of f over both F[x] and F[y], and f is preserved under the isomorphism between F[x] and F[y]. Therefore by the uniqueness of splitting fields, the isomorphism between F[x] and F[y] induces an automorphism of K fixing F and sending x to y.
Theorem 11.8.2. Let K/F be a field extension. Then it is finite and Galois if and only if K is the splitting field of a separable polynomial f over F.
Sketch of Proof. If K/F is finite and Galois, then we know that it is separable and K is also a splitting field of some polynomial f over F. Since K/F is separable, we can replace f by a separable polynomial so that K is still its splitting field, as desired.
Conversely, if K is the splitting field of a separable polynomial f over F, then we can induct on the degree of f. It clearly holds when deg f = 1, so let's assume that it holds whenever deg f < n. When deg f = n, we can factor f(t) into irreducibles f1(t), . . . , fm(t). Let α1, . . . , αs be the roots of f1(t). Then K is the splitting field of f(t)/(t − αi) over F[αi] for every αi, and so by the induction hypothesis K/F[αi] is Galois. In other words, for every element that is not in F[αi] there is an automorphism in Aut(K/F[αi]) ⊆ Aut(K/F) that moves it. Now let x be an element that is fixed by every automorphism in Aut(K/F); then we know that x ∈ F[α1]. As a consequence there exist c0, . . . , c_{s−1} ∈ F such that
x = c0 + c1 α1 + · · · + c_{s−1} α1^{s−1}.
Since f1 is separable, the roots α1, . . . , αs are distinct, and by the transitivity lemma there is, for each j, an automorphism in Aut(K/F) sending α1 to αj. Applying it to x gives x = c0 + c1 αj + · · · + c_{s−1} αj^{s−1} for every j, so the polynomial c0 + c1 t + · · · + c_{s−1} t^{s−1} − x of degree at most s − 1 has s distinct roots. Hence c1 = · · · = c_{s−1} = 0 and x = c0 ∈ F. Therefore F is the fixed field of Aut(K/F), and K/F is Galois.
11.9 Fundamental Theorem of Galois Theory
Theorem 11.9.1. (Fundamental theorem of Galois theory) Let K/F be a finite Galois extension. Then F and G form a bijection between the intermediate fields of K/F and the subgroups of Gal(K/F). Moreover, [K : L] = |Gal(K/L)| for every intermediate field L.
Sketch of Proof. We already know that G ◦ F is the identity map on SG, and so we just need to show that F ◦ G is the identity map on SF. Since K/F is finite Galois, we know that there is a separable polynomial f over F such that K is the splitting field of f over F. Therefore for every intermediate field L we also have that K is the splitting field of f over L. Thus, K/L is also finite Galois, which shows that K^{Gal(K/L)} = L and [K : L] = |Gal(K/L)|, as desired.
This is a really powerful tool for enumerating intermediate fields because we usually
have a better understanding of the Galois group. The following are two examples of how
to make use of this statement.
Example 11.9.1. Consider the extension Q[√2, √3]/Q. Its automorphism group consists of the elements σ such that σ(√2) = ±√2 and σ(√3) = ±√3. Therefore its automorphism group G is isomorphic to K4. Moreover, it is clear that Q is its fixed field, and so this is a finite Galois extension. For simplicity, let σij be the automorphism with σij(√2) = (−1)^i √2 and σij(√3) = (−1)^j √3 for i, j = 0, 1. Then the intermediate fields are the fixed fields of {σ00}, {σ00, σ10}, {σ00, σ01}, {σ00, σ11} and G.
The fixed fields of {σ00} and G are clearly Q[√2, √3] and Q, respectively. So let's calculate the three remaining fixed fields. Choose a Q-basis of Q[√2, √3], say {1, √2, √3, √6}. Then the fixed field of {σ00, σ10} consists of the elements a + b√2 + c√3 + d√6 such that
a + b√2 + c√3 + d√6 = σ10(a + b√2 + c√3 + d√6) = a − b√2 + c√3 − d√6,
which shows that b = d = 0. Therefore the fixed field of {σ00, σ10} is Q[√3]. Similarly, one can calculate and obtain that the other two fixed fields are Q[√2] and Q[√6]. A consequence of this is that Q[√2 + √3] = Q[√2, √3].
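The last equality can be double-checked with sympy: if Q[√2 + √3] = Q[√2, √3], the minimal polynomial of √2 + √3 must have degree 4 (a sketch):

    from sympy import symbols, sqrt, minimal_polynomial

    x = symbols('x')
    print(minimal_polynomial(sqrt(2) + sqrt(3), x))   # x**4 - 10*x**2 + 1, degree 4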
Before, it would have been hard to convince oneself that there are only finitely many intermediate fields. Now with this tool, we can even list all the intermediate fields.
Example 11.9.2. Now consider the extension Q[⁴√2]/Q, where ⁴√2 is the real fourth root of 2. This is not Galois since the minimal polynomial of ⁴√2 is t^4 − 2, which does not split completely because ±i ⁴√2 is not in Q[⁴√2]. To fix this, let's take the splitting field of t^4 − 2 over Q, which is Q[⁴√2, i]. This is finite Galois since t^4 − 2 is automatically separable (note that we are working with fields of characteristic 0). It is also easy to calculate that the Galois group of Q[⁴√2, i]/Q consists of the automorphisms σ with
σ(⁴√2) = i^a ⁴√2,  σ(i) = (−1)^b i
for a = 0, 1, 2, 3 and b = 0, 1. Denote such σ by σab. Then we can see that Gal(Q[⁴√2, i]/Q) ≅ D4, where we can see the σa0 as rotations and the σa1 as reflections.
To find the intermediate fields of Q[⁴√2]/Q, it is the same as finding the intermediate fields of Q[⁴√2, i]/Q that are contained in Q[⁴√2]. By the fundamental theorem of Galois theory, this is the same as finding the subgroups of Gal(Q[⁴√2, i]/Q) that contain Gal(Q[⁴√2, i]/Q[⁴√2]) = {σ00, σ01}. There are three such subgroups: {σ00, σ01}, {σ00, σ01, σ20, σ21} and the entire group. Hence the only nontrivial intermediate field is the fixed field of {σ00, σ01, σ20, σ21}, which can be verified to be Q[√2] by some calculations. In conclusion, the intermediate fields of Q[⁴√2]/Q are Q[⁴√2], Q[√2] and Q.
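The starting point of this example, that t^4 − 2 fails to split over Q[⁴√2], can be checked with sympy (again assuming the extension keyword of factor):

    from sympy import symbols, factor, Rational

    x = symbols('x')
    # x^4 - 2 acquires only two linear factors over Q(2**(1/4)),
    # so Q[fourth root of 2]/Q is not a splitting field, hence not Galois:
    print(factor(x**4 - 2, extension=2**Rational(1, 4)))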
What this example demonstrates is that if we are not in a splitting field, then we can
take a suitable extension to make it splitting. If the original extension happens to be
separable, then the result will be Galois, and then we can apply the fundamental theorem
of Galois theory to get what we want.
If K/F is finite Galois, then we know that K/L is also finite Galois for every interme-
diate field L. On the other hand, L/F is not necessarily Galois, as the previous example
demonstrates. The fundamental theorem of Galois theory also provides a way to see if
L/F is Galois by checking some conditions in the Galois group.
Theorem 11.9.2. (Fundamental theorem of Galois theory Part 2) Let K/F be a finite Galois extension, and let L be an intermediate field. Then the following are equivalent:
1. L/F is Galois;
2. Gal(K/L) is a normal subgroup of Gal(K/F);
3. σ(L) = L for every σ ∈ Gal(K/F).
Sketch of Proof. 1. ⇒ 2.: By the uniqueness of splitting fields, we can extend any automorphism of L/F to an automorphism of K/F. Since L/F is finite Galois, there are [L : F] automorphisms of L/F. Since [K : L][L : F] = [K : F] and |Gal(K/F)| = [K : F], |Gal(K/L)| = [K : L], we know that every coset of Gal(K/L) in Gal(K/F) can be obtained by extending an automorphism of L/F to one of K/F. This then gives a compatible group structure on the cosets of Gal(K/L), which implies that Gal(K/L) is a normal subgroup of Gal(K/F).
2. ⇒ 3.: For every σ ∈ Gal(K/F), we can see that σ Gal(K/L)σ^{−1} = Gal(K/σ(L)). Therefore σ(L) = L because σ Gal(K/L)σ^{−1} = Gal(K/L) by the normality of Gal(K/L), and distinct intermediate fields have distinct automorphism groups by the fundamental theorem.
3. ⇒ 1.: The Galois group Gal(K/F) induces an automorphism group of L since σ(L) = L for every σ ∈ Gal(K/F). The fixed field of this automorphism group is F, and so L/F is Galois.
Now if one of them holds, then by the proof of the implication 1. ⇒ 2. we know that
Gal(K/F )/ Gal(K/L) ∼ = Gal(L/F ) in a natural way.
Example 11.9.3. Consider the extension Q[⁴√2, i]/Q again. Its Galois group is isomorphic to D4, whose normal subgroups are {σ00}, the center {σ00, σ20}, the cyclic subgroup C4 = {σ00, σ10, σ20, σ30}, the two Klein four-subgroups {σ00, σ01, σ20, σ21} and {σ00, σ11, σ20, σ31}, and D4 itself. Therefore the intermediate fields L such that L/Q is Galois are their fixed fields, namely Q[⁴√2, i], Q[√2, i], Q[i], Q[√2], Q[i√2] and Q.
11.10 More on Separability
Theorem 11.10.1. (Primitive element theorem) A finite extension K/F is simple, i.e. K = F[θ] for some θ ∈ K, if and only if there are only finitely many intermediate fields of K/F.
Sketch of Proof. Suppose first that we have a finite extension F[θ]/F, where the minimal polynomial of θ over F is f. For every intermediate field L, let g be the minimal polynomial of θ over L. Then clearly g divides f. Assume that g(t) = t^n + a_{n−1}t^{n−1} + · · · + a0; then we can show that L = F[a0, . . . , a_{n−1}]. Clearly F[a0, . . . , a_{n−1}] ⊆ L because ai ∈ L. Moreover, [F[θ] : F[a0, . . . , a_{n−1}]] ≤ n because g(θ) = 0. Since [F[θ] : L] = n, we have that L = F[a0, . . . , a_{n−1}]. As a consequence, there is at most one intermediate field for every monic divisor of f. Since there are only finitely many such divisors, we are done.
Conversely, if K/F is a finite extension with finitely many intermediate fields, we can assume that F is infinite since finite extensions of a finite field are always simple. Choose θ ∈ K such that [F[θ] : F] is the largest. If F[θ] ≠ K, then there exists x ∈ K\F[θ]. Consider the intermediate fields F[θ + cx] as c runs through the elements in F. Since there are only finitely many intermediate fields, there exist c ≠ c′ such that F[θ + cx] = F[θ + c′x]. In particular, θ + cx, θ + c′x ∈ F[θ + cx], which shows that (c − c′)x ∈ F[θ + cx]. Multiplying by (c − c′)^{−1}, we get that x ∈ F[θ + cx], and so θ ∈ F[θ + cx]. This implies that F[θ] ⊊ F[θ + cx] (because x ∈ F[θ + cx]\F[θ]), which contradicts the choice of θ. Therefore F[θ] = K, as desired.
Corollary 11.10.1. (Another version of primitive element theorem) Every finite sepa-
rable extension is simple.
Sketch of Proof. It suffices to show that if K/F is finite and separable, then there are
only finitely many intermediate fields. Suppose that a1 , . . . , an are elements in K such
that F [a1 , . . . , an ] = K and fi is the minimal polynomial of ai for any i = 1, . . . , n. Take
the splitting field E of f1 f2 . . . fn , then since K/F is separable we have that E/F is also
separable. As a consequence, E/F is finite Galois, and so the intermediate fields of K/F
correspond to the subgroups of Gal(E/F ) that contain Gal(E/K). Since Gal(E/F )
is finite, there are only finitely many subgroups, and so there are only finitely many
intermediate fields, as desired.
Next, we will show some criteria for an extension to be separable. The case where the characteristic is 0 is trivial, so we will assume that the characteristic is p in the remainder of this section.
Lemma 11.10.1. Let K/F be an extension of characteristic p, and let x be an algebraic element over F in K. Then x is separable over F if and only if F[x] = F[x^p].
Sketch of Proof. If x is separable over F, then it is also separable over F[x^p]. Let g be the minimal polynomial of x over F[x^p]; then g(t) | t^p − x^p = (t − x)^p. Since x is separable, we know that g is separable, and so g(t) = t − x. This shows that x ∈ F[x^p], and so F[x] ⊆ F[x^p].
Conversely, if F[x] = F[x^p] but x is not separable, let f be the minimal polynomial of x over F. Then we know that f′ = 0, which shows that f(t) = h(t^p) for some polynomial h over F. Since f is irreducible, we know that h is irreducible. Since x is a root of f, we know that x^p is a root of h. This implies that h is the minimal polynomial of x^p over F, and so
deg f = [F[x] : F] = [F[x^p] : F] = deg h,
which is absurd since deg f = p deg h.
Lemma 11.10.2. Let K/F be a finite extension of characteristic p. Then the following
are equivalent:
1. there exists a basis {v1, . . . , vn} of K over F such that {v1^p, . . . , vn^p} is also a basis;
2. for every basis {v1, . . . , vn} of K over F, {v1^p, . . . , vn^p} is also a basis;
3. K/F is separable.
Sketch of Proof. 1. ⇒ 2.: For every basis {u1, . . . , un}, it suffices to prove that {u1^p, . . . , un^p} spans K over F. It then suffices to show that {u1^p, . . . , un^p} spans vi^p for i = 1, . . . , n. Since {u1, . . . , un} is a basis, we can write
vi = c1u1 + · · · + cnun,
and so
vi^p = (c1u1 + · · · + cnun)^p = c1^p u1^p + · · · + cn^p un^p,
as desired.
2. ⇒ 3.: For any element x ∈ K, suppose that s = [F[x] : F]. Then {1, x, . . . , x^{s−1}} is linearly independent over F. By the replacement lemma, we can extend this to a basis {1, x, . . . , x^{s−1}, u_{s+1}, . . . , un} of K over F. Then {1, x^p, . . . , x^{p(s−1)}, u_{s+1}^p, . . . , un^p} is also a basis, and so {1, x^p, . . . , x^{p(s−1)}} is also linearly independent. As a consequence, [F[x^p] : F] ≥ [F[x] : F]. Since F[x^p] ⊆ F[x], we know that F[x^p] = F[x], and so x is separable by Lemma 11.10.1. This proves that every element is separable, and so K/F is separable.
3. ⇒ 1.: Let {v1, . . . , vn} be a basis of K over F, and let L be the vector space spanned by {v1^p, . . . , vn^p}. We can first prove that in fact L = F[v1^p, . . . , vn^p]. Since {v1, . . . , vn} is a basis, we know that for every i, j there exist cijk ∈ F such that
vivj = ∑_k cijk vk.
Therefore
vi^p vj^p = (∑_k cijk vk)^p = ∑_k cijk^p vk^p,
as desired. This then shows that L is an intermediate field. Now for any a ∈ K, the same trick shows that a^p ∈ L. If a ∉ L, then since a is separable, we know that F[a] = F[a^p], which is a contradiction since a ∈ F[a] = F[a^p] ⊆ L. Therefore L = K, as desired.
With this, we can now give two really useful criteria for separability.
Theorem 11.10.2. Let α be an algebraic element over a field F . If α is separable over
F , then F [α] is also separable over F .
Sketch of Proof. The characteristic zero case is trivial, so consider the case where the characteristic is p, and let n = [F[α] : F]. We know that F[α] = F[α^p], and so {1, α, . . . , α^{n−1}} and {1, α^p, . . . , α^{p(n−1)}} are both bases of F[α] over F. Hence F[α] is separable over F by Lemma 11.10.2.
Theorem 11.10.3. Let K/F be a finite extension and L be an intermediate field. Then
K/F is separable if and only if K/L, L/F are both separable.
Sketch of Proof. Consider the case where the characteristic is p. If K/F is separable, then clearly K/L and L/F are both separable. Conversely, if K/L and L/F are separable, then there is an L-basis {a1, . . . , an} of K such that {a1^p, . . . , an^p} is also an L-basis, and an F-basis {b1, . . . , bm} of L such that {b1^p, . . . , bm^p} is also an F-basis of L. Then {aibj} and {ai^p bj^p} are both F-bases of K, and so K/F is separable, as desired.
To conclude this section (and also this chapter), let’s introduce perfect field as a
perfect ending :).
Definition 11.10.1. A field is a perfect field if every finite extension of it is separable.
Property 11.10.1. A field F of characteristic p is perfect if and only if every element of F has a p-th root in F.
Sketch of Proof. If F is perfect, then for every x ∈ F, consider the splitting field K of t^p − x over F and suppose that y is a root of t^p − x. Let f be the minimal polynomial of y over F; then f(t) | t^p − x = t^p − y^p = (t − y)^p. Since F is perfect, K is separable over F, and so f is separable. This then implies that f(t) = t − y, and so y ∈ F, as desired.
Conversely, if for every x ∈ F there exists y ∈ F such that y^p = x, then consider a finite extension K of F, an element a ∈ K, and its minimal polynomial f over F. If f′(t) = 0, we can suppose that
f(t) = t^{pn} + ∑_{i=0}^{n−1} ci t^{pi}.
Choosing bi ∈ F with bi^p = ci, we get f(t) = (t^n + ∑_{i=0}^{n−1} bi t^i)^p, contradicting the irreducibility of f. Therefore f′ ≠ 0, so a is separable, and every finite extension of F is separable.
11.11 Random Problem Set
4. (11.2) Let Fq be the finite field of order q = p^n. Prove, by induction, that for any two polynomials f, g ∈ Fq[x1, . . . , xm], if f(x) = g(x) for every x ∈ Fq^m, then f − g ∈ ⟨x1^q − x1, . . . , xm^q − xm⟩. Using this, show that for every function s : Fq^m → Fq there exists a polynomial f ∈ Fq[x1, . . . , xm] such that f(x) = s(x) for all x ∈ Fq^m.
7. (11.4) Show that for every prime p and for every positive integer n, there exists an
irreducible polynomial in Fp [t] with degree n. Moreover, if there is a unique one,
then p = n = 2.
8. (11.4) Show that n | an for any positive integer n.
9. (11.4) Show that for every positive integer n, the sum ∑_{x∈Fq×} x^n is 0 if q − 1 ∤ n, and is −1 if q − 1 | n.
(Hint: Think about how primitive roots can help here.)
10. (11.5) Let K1 , K2 be two fields. Let D be the set of pairs (A, f ) where A is a subring
of K1 and f is a homomorphism of rings with 1 from A to K2 . Define a partial
order ≤ on D so that (A, f ) ≤ (A′ , f ′ ) if and only if A ⊆ A′ and f = f ′ |A . Show
that if (A, f ) is a maximal element in D, then ker f is the unique maximal ideal of
A.
12. (11.6) Let k be a field and k(t) be the field of rational functions in t. Consider the rational function
s(t) = (t^2 − t + 1)^3 / (t^2 (t − 1)^2)
and let k(s) be the smallest subfield of k(t) containing k and s. Then k(t)/k(s) is a field extension. Verify that
σ(f(t)) = f(t), f(1 − t), f(1/t), f(1 − 1/t), f(1/(1 − t)), f(t/(t − 1))
are six automorphisms in Aut(k(t)/k(s)). Also show that there exists a polynomial with coefficients in k(s) of degree 6 that has t as a root. Conclude that [k(t) : k(s)] = 6.
14. (11.8) An extension K/F is called normal if every irreducible polynomial over F that has a root in K splits into linear factors in K. Show that if K is a splitting field over F, then K/F is normal. Conclude that Q[∛2] is not a splitting field over Q.
(Hint: Let f be an irreducible polynomial, and let E be its splitting field over K. Then we need to show that if there is a root α of f that lies in K, then every root β of f also lies in K. Use the uniqueness of splitting fields to show that K = K[α] ≅ K[β], which then shows that β ∈ K.)
15. (11.9) Find all the intermediate fields of Q[√2, √3, √5]/Q. Using this, find a primitive element θ such that Q[θ] = Q[√2, √3, √5].
Chapter 12
In this chapter, we are going to apply what we have learned in the last chapter and obtain results about ruler and compass constructions and radical formulas for solving polynomials. These are just the classical and basic examples of the applications of field extensions, and there are a lot of other examples that I will not mention here.
In addition, to have a better understanding of solving polynomials, we are going to introduce the elementary symmetric polynomials. In some sense, they provide the most complicated setting for solving polynomials (which is kind of similar to the symmetric group being the most complicated group). They will be really helpful for finding roots of polynomials, and as we will see at the end, they will allow us to determine the Galois group of a splitting field of an irreducible polynomial of degree less than five without getting our hands dirty.
12.1 Ruler and Compass Construction
The three classical problems here are squaring the circle, doubling the cube and trisecting an angle. To discuss them rigorously, we start with the two constructed points (0, 0) and (1, 0), and we may repeatedly perform one of the following operations:
1. Draw the line passing through two constructed points.
2. Draw the circle centered at a constructed point and passing through another constructed point.
3. Take the intersection of two constructed lines, one constructed line and one constructed circle, or two constructed circles.
A real number a is constructible if (a, 0) can be obtained as a constructed point after finitely many steps.
Example 12.1.1. Given two points A(0, 0), B(1, 0), draw the circle O1 centered at A passing through B and the circle O2 centered at B passing through A. Take the intersections C, D of O1, O2. Draw the line CD, and take the intersection E of AB and CD. Then E has coordinates (1/2, 0), and so 1/2 is constructible.
Property 12.1.1. If a, b are constructible, then a + b, a − b, ab and a/b (for b ≠ 0) are constructible. If a > 0 is constructible, then so is √a.
Sketch of Proof. It is clear that it suffices to consider the case where a, b > 0. The above example actually demonstrates how to construct the midpoint of two constructed points. Since (a, 0), (b, 0) are constructible, we know that ((a + b)/2, 0) is constructible. Now take the circle centered at ((a + b)/2, 0) passing through (0, 0), and take its intersection with the x-axis. The intersection other than (0, 0) is (a + b, 0), as desired. We can similarly construct (a − b, 0).
To construct ab, it suffices to construct a/b for all constructible a, b. This is because 1 is constructible, and so if a, b constructible implies a/b constructible, then b constructible implies 1/b constructible, which then shows that a/(1/b) = ab is constructible. The way that we construct a/b is pretty simple: draw the y-axis, and then mark the points A(0, 1), B(0, b), C(a, 0). Now draw the line l passing through A parallel to BC, and take the intersection D of l and the x-axis. By similar triangles, we know that the coordinates of D are (a/b, 0), as desired.
Lastly, to construct √a, draw the y-axis and mark the two points A(0, a), B(0, −1). Construct their midpoint and then draw the circle centered at it passing through both of them. Take an intersection C of the circle with the x-axis. Then by similar triangles, we have that AO/CO = CO/BO where O is the origin. This shows that CO = √a, and so the coordinates of C are (±√a, 0), as desired.
As a consequence, if we can obtain a real number by taking finitely many square roots,
then that real number will be constructible. We can make it more precise.
Theorem 12.1.1. Let α be a real number. If there exists a chain of fields Q = K0 ⊆
K1 ⊆ · · · ⊆ Kn ⊆ R such that α ∈ Kn and [Ki+1 : Ki ] = 2, then α is constructible.
Sketch of Proof. We can prove that Kn contains only constructible elements. Since a, b
constructible implies a + b, a − b, ab, a/b constructible, the constructible reals form a field.
In particular, the rationals are constructible.
Now we can induct on n. There is nothing to prove when n = 0. As an inductive step,
suppose that every element in Kn−1 is constructible. Since [Kn : Kn−1 ] = 2, we know
that Kn = Kn−1[x] for any x ∈ Kn\Kn−1. Let t^2 + pt + q be the minimal polynomial of x; then we know that
x = (−p ± √(p^2 − 4q))/2.
By the induction hypothesis p, q are constructible, and so p^2 − 4q is also constructible. As a consequence, √(p^2 − 4q) is constructible, which shows that x is constructible. Then
the entire field Kn is constructible, as desired.
It turns out that this sufficient condition for a real to be constructible is also necessary.
To see this, at step n let Kn be the field generated by the coordinates of the constructed
points. For any two points (a0 , b0 ), (a1 , b1 ), the equation of the line passing through these
two points is
(b1 − b0)x − (a1 − a0)y = a0b1 − a1b0,
and so the coefficients are all in Kn. Similarly, the equation of the circle centered at (a0, b0) passing through (a1, b1) is
(x − a0)^2 + (y − b0)^2 = (a1 − a0)^2 + (b1 − b0)^2,
whose coefficients are all in Kn. Therefore the coordinates of the intersection of two lines still lie in Kn. The coordinates of the intersection of a line and a circle, or of two circles, satisfy a quadratic equation over Kn. This tells us that either Kn+1 = Kn or [Kn+1 : Kn] = 2. Hence,
Theorem 12.1.2. Let α be a real number. Then α is constructible if and only if
there exists a chain of fields Q = K0 ⊆ K1 ⊆ · · · ⊆ Kn ⊆ R such that α ∈ Kn and
[Ki+1 : Ki ] = 2.
Corollary 12.1.1. If α is constructible, then [Q[α] : Q] is a power of 2.
Sketch of Proof. Let Ki′ = Q[α] ∩ Ki. Then [K′i+1 : Ki′] = 1 or 2. Hence [Q[α] : Q] = [Kn′ : K0′] is a power of 2.
Remark. Note that this is not a sufficient condition. In fact, there is a degree-four real algebraic number that is not constructible. The tool that we need to show that will be developed later; see the exercises for it.
Example 12.1.2. Now we can answer the three problems and prove that it is impossible to solve them with ruler and compass constructions.
First, given a circle, we cannot construct a square with the same area. This is because if we could construct such a square, then its side length would be √π (given that the radius of the circle is 1). This would then imply that √π is constructible, which is absurd since π is transcendental.
Given a cube, we also cannot construct another cube that is twice as large. This is because ∛2, having the minimal polynomial t^3 − 2 of degree 3, is not constructible.
Lastly, if we could always trisect an angle, then we could construct an angle π/9 since we can construct an angle π/3 (by constructing an equilateral triangle). This then would show that cos π/9 is constructible. However, t = cos π/9 satisfies the equation
4t^3 − 3t = cos(π/3) = 1/2,
or equivalently,
8t^3 − 6t − 1 = 0.
This polynomial turns out to be irreducible over Q, and so cos π/9 is not constructible.
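Both irreducibility claims, and the trigonometric identity, are easy to sanity-check with sympy (a sketch):

    from sympy import symbols, Poly, cos, pi

    t = symbols('t')
    print(Poly(t**3 - 2, t).is_irreducible)          # True: doubling the cube fails
    print(Poly(8*t**3 - 6*t - 1, t).is_irreducible)  # True: trisection fails
    # numerical check that cos(pi/9) is a root of 8t^3 - 6t - 1:
    print((8*cos(pi/9)**3 - 6*cos(pi/9) - 1).evalf())  # approximately 0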
12.2 Solution in Radicals
Example 12.2.1. Suppose that a, b are two numbers, and we want to solve the equation
x^2 + ax + b = 0.
Completing the square, we can rewrite this as (x + a/2)^2 = a^2/4 − b, and so x = −a/2 ± √(a^2/4 − b).
Example 12.2.2. Now suppose that a, b, c are three numbers, and we want to solve
the equation
x3 + ax2 + bx + c = 0.
We can use the same trick and rewrite this as
(x + a/3)^3 + (b − a^2/3)(x + a/3) + (c − ab/3 + 2a^3/27) = 0.
With appropriate substitutions, it suffices to solve for y with
y 3 + py + q = 0
where p, q are given. Cardano gave a really clever way to solve this in the following way:
Set y = u + v, and rewrite the equation as
u^3 + v^3 + q + (3uv + p)(u + v) = 0.
Therefore, if we require that u^3 + v^3 = −q and 3uv = −p,
then y = u + v is a solution. Note that the second equation can be written as u^3 v^3 = −p^3/27. Therefore u^3, v^3 are the roots of the polynomial t^2 + qt − p^3/27. We then know that
u^3, v^3 = −q/2 ± √(q^2/4 + p^3/27),
and so
u, v = ∛(−q/2 ± √(q^2/4 + p^3/27)).
Here we are actually cheating a bit: if we see taking a cube root of s as solving the equation t^3 − s = 0, then there are three roots. This then shows that there are three possibilities each for u and v, and so there should be nine possibilities for y, which is clearly absurd. The issue here is that all we are guaranteed is that u^3 v^3 = −p^3/27, and so it could be that 3uv = −ωp for some primitive third root of unity ω. This tells us that we have to choose the correct u, v and add them together to get y.
This argument works as long as we are working with fields of characteristic not equal to 2 or 3. If a, b, c belong to a field F such that char F ≠ 2, 3, then p, q ∈ F, and so to obtain the splitting field of t^3 + pt + q it suffices to adjoin the square root of q^2/4 + p^3/27, and then adjoin the cube roots of −q/2 ± √(q^2/4 + p^3/27). By the relation x = y − a/3, we can see that this will also be the splitting field of t^3 + at^2 + bt + c.
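The point about choosing compatible cube roots can be illustrated with plain complex arithmetic: fix u as any cube root and then force v = −p/(3u), rather than choosing the two cube roots independently. A minimal numeric sketch, with the illustrative choice p = −7, q = 6 (so y^3 + py + q has roots 1, 2, −3):

    import cmath

    p, q = -7.0, 6.0
    d = cmath.sqrt(q*q/4 + p**3/27)   # here q^2/4 + p^3/27 < 0, so d is imaginary
    u = (-q/2 + d) ** (1/3)           # principal complex cube root
    v = -p / (3*u)                    # enforce 3uv = -p instead of a free cube root
    y = u + v
    print(y, abs(y**3 + p*y + q))     # y is approximately 2, residual approximately 0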
In the above two examples, we successfully expressed the roots in terms of the coefficients with finitely many additions, subtractions, multiplications, divisions, and extractions of n-th roots. We call this kind of solution an algebraic solution or a solution in radicals. With more hard work, people in the past also found an algebraic solution for quartic polynomials. On the other hand, the Abel-Ruffini theorem states that there does not exist such an expression for a "general" quintic polynomial. In the following sections, we will prove this with Galois theory. To this end, let's first give a Galois-theoretic reinterpretation of solutions in radicals.
Definition 12.2.1. A field extension K/F is a simple radical extension if K = F[α] for some α ∈ K such that α^n ∈ F for some n ∈ N. A radical series F0 ⊆ F1 ⊆ · · · ⊆ Fn is a series of field extensions such that each Fi+1/Fi is a simple radical extension. A polynomial f over F is solvable over F if there is a radical series F = F0 ⊆ · · · ⊆ Fn such that f splits completely in Fn.
Example 12.2.3. We have seen that any quadratic and cubic polynomial over a field
of characteristic 0 is solvable. As a consequence, every quadratic and cubic polynomial
over R and C is solvable.
12.3 Elementary Symmetric Polynomial
For n variables x1, . . . , xn, the k-th elementary symmetric polynomial is
ek(x1, . . . , xn) = ∑_{i1<···<ik} xi1 · · · xik.
Suppose that f is a monic polynomial of degree n with roots α1, . . . , αn, so that
f(t) = (t − α1)(t − α2) · · · (t − αn).
Expanding the product, we get
(t − α1)(t − α2) · · · (t − αn) = t^n + ∑_{k=1}^{n} (−1)^k ek(α1, . . . , αn) t^{n−k}.
Example 12.3.1. Consider the field of rational functions F(x1, . . . , xn), and consider the subfield F(e1, . . . , en) generated by the n elementary symmetric polynomials. Then the field F(x1, . . . , xn) is the splitting field of
t^n + ∑_{k=1}^{n} (−1)^k ek t^{n−k}
over F(e1, . . . , en).
Property 12.3.1. Let F be a field, and suppose that there is a radical series connecting F(e1, . . . , en) and F(x1, . . . , xn). Then every polynomial f of degree n over F is solvable over F.
Sketch of Proof. Let K be the splitting field of f over F. Suppose that α1, . . . , αn are the n roots of f in K; then we can consider the homomorphism sending xi to αi. Since K is generated by α1, . . . , αn, the map is surjective. Note that for every i, we have that ei gets sent to ei(α1, . . . , αn), which is ±[t^{n−i}]f(t). Therefore the image of ei is in F, which shows that the image of F(e1, . . . , en) is contained in F. Now if there is a radical series connecting F(x1, . . . , xn) and F(e1, . . . , en), then there is a radical series connecting K and the image of F(e1, . . . , en). Since the image of F(e1, . . . , en) is contained in F, one can modify the radical series so that the base field becomes F, which then shows that f is solvable over F. The details are left as an exercise.
Now let's see what properties there are in this most general setting. Let's first consider the automorphism group of the extension F(x1, . . . , xn)/F(e1, . . . , en). Since e1, . . . , en are symmetric, we can see that permuting the indices of the xi fixes F(e1, . . . , en). To be more precise, every permutation π ∈ Sn induces an automorphism σπ of F(x1, . . . , xn), namely
(σπ(f))(x1, . . . , xn) = f(xπ(1), . . . , xπ(n))   ∀f ∈ F(x1, . . . , xn),
and σπ fixes e1, . . . , en by definition. Therefore σπ also fixes F(e1, . . . , en), which shows that σπ ∈ Aut(F(x1, . . . , xn)/F(e1, . . . , en)), and hence Sn ≤ Aut(F(x1, . . . , xn)/F(e1, . . . , en)).
Actually these are all the automorphisms. To show this, it suffices to show that the fixed
field of Sn is F (e1 , . . . , en ), or equivalently, the elementary symmetric polynomials gen-
erate the field of symmetric rational functions. In fact, we can prove something stronger.
Theorem 12.3.1. (Fundamental theorem of symmetric polynomials) For any commu-
tative ring A with 1 and any symmetric polynomial f ∈ A[x1 , . . . , xn ]Sn , there exists a
unique polynomial g ∈ A[x1 , . . . , xn ] such that g(e1 , . . . , en ) = f (x1 , . . . , xn ). In other
words, every symmetric polynomial can be uniquely written as a polynomial of the ele-
mentary symmetric polynomials.
Sketch of Proof. We will prove this by induction on the "degree" of the polynomial f. Here, instead of the usual notion of degree where x1, . . . , xn all have the same weight, we order monomials by the lexicographical order on their exponent vectors, and use the largest one as the "degree" of the polynomial. For example, the polynomial x1x2 + x2x3 + x3x1 + x1 + x2 + x3 should be written as x1x2 + x1x3 + x1 + x2x3 + x2 + x3 when using descending order; the leading term is x1x2, and the degree is (1, 1, 0).
Let's first prove the existence of g by inducting on the degree of f. The case where the degree of f is (0, 0, . . . , 0) is trivial. Now suppose that the statement holds for any symmetric polynomial of lower degree, and suppose that the leading term of f is a x1^{d1} · · · xn^{dn} for some 0 ≠ a ∈ A. Since f is symmetric, we know that d1 ≥ d2 ≥ · · · ≥ dn. This is because if di < di+1 for some i, then
[x1^{d1} · · · xi^{d_{i+1}} x_{i+1}^{d_i} · · · xn^{dn}]f = [x1^{d1} · · · xi^{d_i} x_{i+1}^{d_{i+1}} · · · xn^{dn}]f = a ≠ 0,
which contradicts the assumption that a x1^{d1} · · · xn^{dn} is the leading term (the monomial on the left is larger in the lexicographical order). Now set ci = di − di+1 (here dn+1 = 0); then it is easy to verify that the degree of the polynomial e1^{c1} e2^{c2} · · · en^{cn} is (d1, . . . , dn). Hence the degree of f′ = f − a e1^{c1} · · · en^{cn} is smaller. By the induction hypothesis, there is g′ such that f′(x1, . . . , xn) = g′(e1, . . . , en). Set g(x1, . . . , xn) = a x1^{c1} · · · xn^{cn} + g′(x1, . . . , xn); then
g(e1, . . . , en) = a e1^{c1} · · · en^{cn} + g′(e1, . . . , en) = f(x1, . . . , xn),
as desired.
Now it remains to show the uniqueness. It suffices to show that g(e1, . . . , en) ≠ 0 for every nonzero polynomial g. Otherwise, let x1^{c1} · · · xn^{cn} be the monomial with nonzero coefficient in g such that (c1 + c2 + · · · + cn, c2 + · · · + cn, . . . , cn) is the largest. It is clear that such a monomial is unique. Then the degree of g(e1, . . . , en) is (c1 + c2 + · · · + cn, c2 + · · · + cn, . . . , cn), which is a contradiction with the assumption that g(e1, . . . , en) = 0.
Example 12.3.2. To demonstrate how the proof works, let's consider the following example. The polynomial [(x1 − x2)(x2 − x3)(x3 − x1)]^2 is symmetric. We can expand it as
[(x1 − x2)(x2 − x3)(x3 − x1)]^2 = (∑_cyc (x1^2 x2 − x1 x2^2))^2
= ∑_sym x1^4 x2^2 − 2 ∑_cyc x1^4 x2 x3 − 2 ∑_cyc x1^3 x2^3 + 2 ∑_sym x1^3 x2^2 x3 − 6 x1^2 x2^2 x3^2.
Here ∑_cyc x1^a x2^b x3^c stands for the sum over the orbit of x1^a x2^b x3^c when the variables are permuted cyclically, and ∑_sym x1^a x2^b x3^c stands for the sum over the orbit of x1^a x2^b x3^c when the variables are permuted arbitrarily.
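The expansion above, together with the expression of this symmetric polynomial in terms of e1, e2, e3 that will be used later, can be verified mechanically with sympy (a sketch):

    from sympy import symbols, expand

    x1, x2, x3 = symbols('x1 x2 x3')
    e1 = x1 + x2 + x3
    e2 = x1*x2 + x2*x3 + x3*x1
    e3 = x1*x2*x3
    D = ((x1 - x2)*(x2 - x3)*(x3 - x1))**2
    # the polynomial g(e1, e2, e3) produced by the algorithm:
    g = e1**2*e2**2 - 4*e1**3*e3 - 4*e2**3 + 18*e1*e2*e3 - 27*e3**2
    print(expand(D - g))   # 0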
The elementary symmetric polynomials are really powerful even though nothing hard is involved. For example, we can construct, with the elementary symmetric polynomials, a polynomial that has a + b as a root if we know the minimal polynomials of a and b. See the exercises for some results that can be obtained using the elementary symmetric polynomials.
Now that we know that K(x1, . . . , xn)/K(e1, . . . , en) is the most general setting, the problem becomes: what are the n's such that there is a radical series connecting K(e1, . . . , en) and K(x1, . . . , xn)?
12.4 Kummer Extension
Definition 12.4.1. Let n be a positive integer, and let F be a field containing a primitive n-th root of unity. A field extension K/F is a Kummer extension if K is the splitting field of a polynomial of the form (t^n − a1)(t^n − a2) · · · (t^n − am) over F, where a1, . . . , am are nonzero elements of F.
Property 12.4.1. Every Kummer extension is finite Galois.
Sketch of Proof. We can first show that the characteristic of F does not divide n. Let ξ be a primitive n-th root of unity. Then we know that for every s ∈ Z, we have ξ^s = 1 if and only if n | s. Now if the characteristic of F is a prime factor p of n, then
0 = ξ^n − 1 = (ξ^{n/p} − 1)^p,
which shows that ξ^{n/p} = 1, a contradiction. Therefore the formal derivative n t^{n−1} of t^n − ai is nonzero, so each t^n − ai is separable, and K is the splitting field of a separable polynomial over F. By Theorem 11.8.2, K/F is finite Galois.
Therefore it makes sense to talk about the Galois group of Kummer extensions. It
turns out that the Galois group behaves really nicely.
Property 12.4.2. Suppose that K/F is a Kummer extension where K is the splitting field of (t^n − a1)(t^n − a2) · · · (t^n − am) over F. Then Gal(K/F) is an abelian group such that g^n = 1 for every g ∈ Gal(K/F).
Sketch of Proof. Let ξ be a primitive n-th root of unity and αi be a root of t^n − ai. Then the roots of t^n − ai are αi, ξαi, . . . , ξ^{n−1}αi. Therefore any automorphism σ ∈ Gal(K/F) is determined by the values σ(αi), and each σ(αi) can only be ξ^j αi for some j.
Now suppose that σ(αi) = ξ^{pi}αi and γ(αi) = ξ^{qi}αi for two automorphisms σ, γ. Then σ(γ(αi)) = ξ^{pi+qi}αi = γ(σ(αi)) for every i, and so σγ = γσ. Similarly, σ^n(αi) = ξ^{n pi}αi = αi for every i, and so σ^n = 1.
Corollary 12.4.1. Let f be a solvable polynomial over a field F of characteristic 0 that contains all the roots of unity, let K be the splitting field of f over F, and let G = Gal(K/F). Then there exists a chain of subgroups G = G0 ⊇ G1 ⊇ · · · ⊇ Gn = 1 such that each Gi+1 is a normal subgroup of Gi and each Gi/Gi+1 is abelian.
Sketch of Proof. Since f is solvable over F, there is a radical series F = F0 ⊆ F1 ⊆ · · · ⊆ Fn such that f splits completely in Fn, where Fi+1 = Fi[αi] and ai = αi^{ni} ∈ Fi. Let E0 = F, and inductively let Ei+1 be the splitting field of
∏_{σ∈Gal(Ei/F)} (t^{ni} − σ(ai))
over Ei. It is clear that Ei+1/Ei is Kummer, and that this polynomial is fixed by any element in Gal(Ei/F). Therefore this polynomial has coefficients in F, which shows that Ei+1/F is Galois. Since αi ∈ Ei+1 we also have Fi+1 ⊆ Ei+1. Therefore at the end we will have K ⊆ Fn ⊆ En and that En/F is Galois.
Let G′i = Gal(En /Ei ). Then by the fundamental theorem of Galois theory we know
that G′i /G′i+1 ∼
= Gal(Ei+1 /Ei ), and so G′i /G′i+1 is abelian. Let N = Gal(En /K), then
the theorem also implies that G′0/N ≅ G. Let Gi = (G′iN)/N; then the surjective homomorphism G′i → Gi → Gi/Gi+1 induces a homomorphism G′i/G′i+1 → Gi/Gi+1 since G′i+1 is sent to 1. This shows that Gi/Gi+1 is abelian. Since G0 = G and Gn = 1, we are done.
We see that if f is solvable, then its Galois group satisfies some properties. For
simplicity, let’s also call this property solvable.
Definition 12.4.2. A group G is solvable if there exists a chain of subgroups G =
G0 ⊇ G1 ⊇ · · · ⊇ Gn = 1 such that Gi+1 is a normal subgroup of Gi and that Gi /Gi+1 is
abelian.
With this definition, we can rewrite the corollary as follows: if f is a solvable polynomial over F where char F = 0 and F contains all the roots of unity, then the Galois group of f is solvable.
The properties of solvable groups are relatively complicated. We will only use the following property and the definition throughout the note. For more properties, see the exercises.
Property 12.4.3. A subgroup of a solvable group is solvable. The image of a solvable group under a homomorphism is solvable.
Sketch of Proof. The proof of the second statement is already presented in the proof of the corollary. For the first one, let H be a subgroup of a solvable group G where G = G0 ⊇ · · · ⊇ Gn = 1 is a certificate of the solvability of G. Let Hi = Gi ∩ H; then by the second isomorphism theorem we have that HiGi+1/Gi+1 ≅ Hi/(Hi ∩ Gi+1). Since Hi ∩ Gi+1 = H ∩ Gi ∩ Gi+1 = H ∩ Gi+1 = Hi+1 and HiGi+1 ⊆ Gi, the quotient Hi/Hi+1 is isomorphic to a subgroup of Gi/Gi+1 and hence is abelian. Therefore H is also solvable.
12.5 Characterization of Kummer Extension
Let's first revisit the simplest Kummer extensions. Suppose that F contains a primitive p-th root of unity ξ, and that a ∈ F is not a p-th power in F. Then the degree of F[a^{1/p}]/F is p, and the extension is also Galois. Since |Gal(F[a^{1/p}]/F)| = p, we have that the Galois group is Z/pZ, and the automorphism σi acts on F[a^{1/p}] such that σi(a^{1/p}) = ξ^i a^{1/p}.
Now the question is if we can recover a^{1/p} given the Galois group. To get more insight, let's consider some arbitrary element x = c0 + c1 a^{1/p} + · · · + c_{p−1} a^{(p−1)/p} for ci ∈ F. Then
σi(x) = c0 + c1 ξ^i a^{1/p} + · · · + c_{p−1} ξ^{i(p−1)} a^{(p−1)/p}.
Therefore, to extract the term c1 a^{1/p}, we can consider
∑_{i∈Z/pZ} σi(x)/ξ^i = p c1 a^{1/p},
and so we recover a^{1/p} up to a factor in F as long as p c1 a^{1/p} ≠ 0. By Lemma 11.6.1 we can choose x such that the result is nonzero, and so we can recover a^{1/p} with this procedure.
This computation generalizes as follows.
Lemma 12.5.1. Let K/F be a finite Galois extension, and let ϕ : Gal(K/F) → F× be a group homomorphism. Then there exists 0 ≠ a ∈ K such that σ(a) = ϕ(σ)a for every σ ∈ Gal(K/F).
Sketch of Proof. For x ∈ K, consider a = ∑_{σ′∈Gal(K/F)} σ′(x)/ϕ(σ′). Then
σ(a) = ∑_{σ′∈Gal(K/F)} σ(σ′(x))/ϕ(σ′) = ∑_{σ◦σ′∈Gal(K/F)} (σ ◦ σ′)(x) ϕ(σ)/ϕ(σ ◦ σ′) = ϕ(σ)a,
so it remains to choose x such that a ≠ 0. The existence of such x is guaranteed by Lemma 11.6.1.
With this, we are ready to prove the converse of Property 12.4.2, which gives another
characterization of Kummer extension.
Theorem 12.5.1. Suppose that K/F is a finite Galois extension and n is a positive
integer such that the following holds:
(i) F contains a primitive n-th root of unity ξ;
(ii) Gal(K/F ) is abelian;
(iii) Every element in Gal(K/F ) has an order dividing n.
Then K/F is Kummer.
Sketch of Proof. By the fundamental theorem of finite abelian groups, the Galois group is isomorphic to some Z/n1Z × · · · × Z/nmZ. By condition (iii), we have that ni | n for each i = 1, . . . , m, and so ξ^{n/ni} is a primitive ni-th root of unity. Therefore we can consider the homomorphism ϕi : Gal(K/F) → F× such that
ϕi((g1, g2, . . . , gm)) = (ξ^{n/ni})^{gi}   ∀(g1, . . . , gm) ∈ Z/n1Z × · · · × Z/nmZ.
Then by Lemma 12.5.1 we know that there exists 0 ≠ αi ∈ K such that σ(αi) = ϕi(σ)αi for all σ ∈ Gal(K/F). As a consequence,
σ(αi^n)/αi^n = (σ(αi)/αi)^n = ϕi(σ)^n = ϕi(σ^n) = 1.
This shows that ai = αi^n is fixed by every automorphism, which then shows that ai ∈ F. It remains to show that K is the splitting field of (t^n − a1)(t^n − a2) · · · (t^n − am) over F, or equivalently, K = F[α1, . . . , αm]. Let σ = (g1, . . . , gm) be an element in Gal(K/F[α1, . . . , αm]) ≤ Gal(K/F). If there exists gi such that gi ≢ 0 mod ni, then
σ(αi)/αi = ϕi(σ) = (ξ^{n/ni})^{gi} ≠ 1
since ξ^{n/ni} is a primitive ni-th root of unity. This is a contradiction with the assumption that σ ∈ Gal(K/F[α1, . . . , αm]). Therefore the Galois group Gal(K/F[α1, . . . , αm]) is trivial, which shows that K = F[α1, . . . , αm], as desired.
Corollary 12.5.1. Suppose that f is a polynomial over a field F with characteristic
zero such that the Galois group G of f is solvable and F contains the primitive |G|-th
roots of unity. Then f is solvable over F .
At this point we have more or less characterized the solvability of a polynomial over
a field with characteristic 0. The caveat here is that we have to assume that the field
contains sufficient roots of unity, and we will remove this assumption in the next section.
12.6 Natural Irrationality
Theorem 12.6.1. (Natural irrationality) Let f be a polynomial over F, let K be the splitting field of f over F, and let F′ be an extension of F such that K and F′ lie in a common field. Let K′ be the splitting field of f over F′. Then Gal(K′/F′) is isomorphic to the subgroup of Gal(K/F) whose fixed field is K ∩ F′.
Sketch of Proof. Let's first prove that Gal(K′/F′) is isomorphic to a subgroup of Gal(K/F). To show this, let's first construct a natural homomorphism Gal(K′/F′) → Gal(K/F) given by restriction to K. Therefore we have to show that every automorphism σ of K′/F′ maps K to itself and fixes F. Let α1, . . . , αs be the roots of f; then σ permutes the roots α1, . . . , αs. Moreover, since σ fixes F′, it also fixes F. Therefore σ sends K = F[α1, . . . , αs] to itself and fixes F.
We also have to show that this natural homomorphism is injective. Suppose that σ fixes every element in K; then σ fixes α1, . . . , αs, and so σ also fixes every element in K′. This implies the injectivity, and so Gal(K′/F′) is isomorphic to a subgroup of Gal(K/F). It remains to show that the fixed field of this subgroup is K ∩ F′. Clearly every element in K ∩ F′ is fixed. On the other hand, every element in K that is not in K ∩ F′ does not belong to F′, and hence is moved by some automorphism of K′/F′. Therefore the fixed field is K ∩ F′, as desired.
This allows us to remove the condition that F contains enough roots of unity, and so
finally we have the following:
Theorem 12.6.2. A polynomial f over a field F with characteristic 0 is solvable if and
only if its Galois group over F is solvable.
Sketch of Proof. If the Galois group G of f over F is solvable, then let F′ = F[ξ] where ξ is a primitive |G|-th root of unity. By natural irrationality we know that the Galois group of f over F′ is isomorphic to a subgroup of G and hence is solvable. Therefore f is solvable over F′. Since F′/F is a simple radical extension, we know that f is also solvable over F.
Conversely, if f is solvable over F, then we can assume that there is a radical series F = F0 ⊆ · · · ⊆ Fn such that f splits completely in Fn. We can extend this to F ⊆ F0[ξ] ⊆ · · · ⊆ Fn[ξ] for some suitable root of unity ξ. We then know that Gal(Fn[ξ]/F) is solvable since Gal(Fn[ξ]/F0[ξ]) is solvable and Gal(F0[ξ]/F) is abelian. Let K be the subfield of Fn[ξ] that is the splitting field of f over F; then Gal(K/F) = Gal(Fn[ξ]/F)/Gal(Fn[ξ]/K) is solvable, as desired.
Now we can prove that there is no radical formula for a general polynomial of degree
at least 5.
Theorem 12.6.3. Let F (e1 , . . . , en ) be the field of symmetric rational function where
F is of characteristic 0. Then the polynomial tn − e1 tn−1 + · · · + (−1)n en is not solvable
for n ≥ 5.
Sketch of Proof. We know that the Galois group of the polynomial is Sn . If Sn is solvable,
then there exists Sn = G0 ⊇ · · · ⊇ Gn = 1 such that Gi /Gi+1 is abelian. Since An is
simple, the only nontrivial normal subgroup of Sn is An , which shows that G1 = An .
However An is simple and non-abelian, which shows that there does not exist G2 such
that G1 /G2 is abelian and that G1 ̸= G2 . This is a contradiction. Therefore Sn is not
solvable, and so the polynomial is not solvable.
Theorem 12.6.4. Let p be a prime, and let f be an irreducible polynomial of degree p over Q with exactly two non-real roots. Then the Galois group of f over Q is Sp.
Sketch of Proof. Let K be the splitting field of f and let α1, . . . , αp be the roots of f. Then Gal(K/Q) permutes the roots α1, . . . , αp and thus can be seen as a subgroup of Sp. We know that [Q[α1] : Q] = p, and so p | |Gal(K/Q)|. By Cauchy's theorem there is an element of Gal(K/Q) of order p, which must be a p-cycle. We also know that complex conjugation is an automorphism of K/Q, and since there are exactly two roots that are not real, it corresponds to a transposition in Sp. Since Gal(K/Q) contains a p-cycle and a transposition, one can check that Gal(K/Q) is in fact Sp, as desired.
Example 12.6.1. By the Eisenstein criterion (with p = 2), we know that the polynomial f(x) = x^5 − 4x + 2 is irreducible over Q. Moreover, the derivative of x^5 − 4x + 2 is 5x^4 − 4, which has two real roots. This shows that x^5 − 4x + 2 has at most three real roots, and by f(−∞) = −∞, f(−1) = 5, f(1) = −1, f(∞) = ∞ we know that there are exactly three real roots. Therefore the Galois group of f over Q is S5, which is not solvable.
In general, given a finite group, it is difficult to tell if it is the Galois group of some Galois extension of Q. This is called the inverse Galois problem, and it is still unsolved to the best of my knowledge.
12.7 Radical Formula and Classifying Galois Group
Example 12.7.2. Now consider the extension F(x1, x2, x3)/F(e1, e2, e3). The Galois group is S3, which is solvable because C3 is abelian and normal in S3, and S3/C3 ≅ C2 is also solvable. Let L be the fixed field of C3; then we know that F(x1, x2, x3)/L and L/F(e1, e2, e3) are both Kummer.
Let's first show that L/F(e1, e2, e3) is a simple radical extension. We know that Gal(L/F(e1, e2, e3)) = S3/C3, so we can take sgn : S3/C3 → {±1} ≤ F(e1, e2, e3)× as the homomorphism. Consider the element x1^2 x2 + x2^2 x3 + x3^2 x1. This element is fixed by C3 and hence is in L. The procedure then tells us to consider
∑_{σ∈Gal(L/F(e1,e2,e3))} σ(x1^2 x2 + x2^2 x3 + x3^2 x1)/sgn(σ) = ∑_cyc (x1^2 x2 − x1 x2^2) = (x1 − x2)(x1 − x3)(x2 − x3).
This is nonzero, and so L = F(e1, e2, e3)[(x1 − x2)(x1 − x3)(x2 − x3)]. Moreover, [(x1 − x2)(x1 − x3)(x2 − x3)]^2 is symmetric and hence lies in F(e1, e2, e3). The calculation before tells us that
[(x1 − x2)(x1 − x3)(x2 − x3)]^2 = e1^2 e2^2 − 4e1^3 e3 − 4e2^3 + 18e1e2e3 − 27e3^2,
and so
(x1 − x2)(x1 − x3)(x2 − x3) = √(e1^2 e2^2 − 4e1^3 e3 − 4e2^3 + 18e1e2e3 − 27e3^2).
For simplicity, denote this by √∆.
Now let's show explicitly that F(x1, x2, x3)/L is a simple radical extension. Let ω be the primitive third root of unity, and let ϕ : C3 → L× be a homomorphism such that ϕ((1 2 3)) = ω^{−1}. Then consider the element
∑_{σ∈C3} σ(x1)/ϕ(σ) = x1 + ωx2 + ω^2 x3.
Cubing this element, one can compute that
(x1 + ωx2 + ω^2 x3)^3 = (3(ω − ω^2)/2)√∆ + ∑_cyc x1^3 + (3(ω + ω^2)/2) ∑_sym x1^2 x2 + 6x1x2x3
= (3√−3/2)√∆ + e1^3 − (9/2)e1e2 + (27/2)e3,
and so
x1 + ωx2 + ω^2 x3 = ∛((3√−3/2)√∆ + e1^3 − (9/2)e1e2 + (27/2)e3).
Now we want to express x1 in terms of e1, e2, e3 and √∆. It is possible to express x1 as a polynomial of x1 + ωx2 + ω^2 x3 over L, but there is an easier way. We can also consider
x1 + ω^2 x2 + ωx3 = ∛(−(3√−3/2)√∆ + e1^3 − (9/2)e1e2 + (27/2)e3),
and so (x1 + ωx2 + ω^2 x3) + (x1 + ω^2 x2 + ωx3) = 2x1 − x2 − x3 can be expressed in terms of e1, e2, e3 and √∆. Adding e1 = x1 + x2 + x3 and dividing the result by 3, we obtain
an expression of x1 . One can verify that this agrees with the cubic formula we derived
before.
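The sum of the two cubes (which eliminates √∆) and the product of the two elements (used in the note below) can both be checked with sympy, writing ω explicitly as (−1 + √−3)/2 (a sketch):

    from sympy import symbols, sqrt, I, Rational, expand, simplify

    x1, x2, x3 = symbols('x1 x2 x3')
    w = Rational(-1, 2) + sqrt(3)*I/2          # primitive cube root of unity
    A = x1 + w*x2 + w**2*x3
    B = x1 + w**2*x2 + w*x3
    e1 = x1 + x2 + x3
    e2 = x1*x2 + x2*x3 + x3*x1
    e3 = x1*x2*x3
    print(simplify(expand(A*B) - (e1**2 - 3*e2)))                        # 0
    print(simplify(expand(A**3 + B**3) - (2*e1**3 - 9*e1*e2 + 27*e3)))   # 0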
Note that since there are three choices for each cube root, we would obtain nine roots in this way. This phenomenon also appeared before when we tried to derive the radical formula for the simpler polynomial y^3 + py + q. To fix this, note that
(x1 + ωx2 + ω^2 x3)(x1 + ω^2 x2 + ωx3) = e1^2 − 3e2 ∈ F(e1, e2, e3).
Therefore once we determine what x1 + ωx2 + ω^2 x3 should be, there is only one choice for x1 + ω^2 x2 + ωx3, which shows that we can only obtain three roots in this way.
Example 12.7.3. Lastly, let's consider the extension F(x1, x2, x3, x4)/F(e1, e2, e3, e4). The Galois group is S4, and this is solvable because of the series S4 ⊇ A4 ⊇ K4 ⊇ 1. Following the same line, let L be the fixed field of A4 and K be the fixed field of K4.
With the same argument, we can show that L = F(e1, e2, e3, e4)[√∆], where ∆ is the discriminant ∏_{i<j} (xi − xj)^2. Next, the element
[(x1x2 + x3x4) + ω(x1x4 + x2x3) + ω^2(x1x3 + x2x4)]^3
is fixed by A4 and therefore is in L. We can then express it in terms of e1, e2, e3, e4 and √∆. We can do the same for (x1x2 + x3x4) + ω^2(x1x4 + x2x3) + ω(x1x3 + x2x4), which allows us to compute x1x2 + x3x4, x1x4 + x2x3 and x1x3 + x2x4.
Note that the above steps are similar to what we did for solving cubic equations. In fact, the above steps are equivalent to solving the cubic polynomial
(t − (x1x2 + x3x4))(t − (x1x4 + x2x3))(t − (x1x3 + x2x4)) = t^3 − e2 t^2 + (e1e3 − 4e4)t − (e1^2 e4 − 4e2e4 + e3^2).
This is usually called the resolvent cubic of the original quartic. To solve the original quartic, we usually solve the resolvent cubic first as an intermediate step. For simplicity, let's denote x1x2 + x3x4, x1x4 + x2x3, x1x3 + x2x4 by β1, β2, β3, respectively.
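This expansion of the resolvent cubic into the ei's is a classical identity, and it can be verified mechanically with sympy (a sketch):

    from sympy import symbols, expand

    t, x1, x2, x3, x4 = symbols('t x1 x2 x3 x4')
    e1 = x1 + x2 + x3 + x4
    e2 = x1*x2 + x1*x3 + x1*x4 + x2*x3 + x2*x4 + x3*x4
    e3 = x1*x2*x3 + x1*x2*x4 + x1*x3*x4 + x2*x3*x4
    e4 = x1*x2*x3*x4
    b1 = x1*x2 + x3*x4
    b2 = x1*x4 + x2*x3
    b3 = x1*x3 + x2*x4
    lhs = expand((t - b1)*(t - b2)*(t - b3))
    rhs = expand(t**3 - e2*t**2 + (e1*e3 - 4*e4)*t - (e1**2*e4 - 4*e2*e4 + e3**2))
    print(expand(lhs - rhs))   # 0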
Lastly, let's show explicitly that F(x1, x2, x3, x4)/K is Kummer. Note that the Galois group is K4, which is not cyclic anymore. Therefore we have to consider more than one homomorphism from K4 to F(x1, x2, x3, x4)×. Let's simply consider all three nontrivial homomorphisms K4 → {±1}. Following the same procedure, we get three elements
x1 + x2 − x3 − x4,  x1 − x2 − x3 + x4,  x1 − x2 + x3 − x4,
whose squares lie in K; for instance, (x1 + x2 − x3 − x4)^2 = e1^2 − 4e2 + 4β1. Together with e1 = x1 + x2 + x3 + x4, these three elements let us solve for x1, x2, x3, x4.
Example 12.7.4. To give a demonstration of how this works, let's solve the equation
x^4 + x^2 − 2x + 1 = 0.
Let α1, α2, α3, α4 be its roots, so that e1 = 0, e2 = 1, e3 = 2 and e4 = 1. The resolvent cubic is t^3 − t^2 − 4t = t(t^2 − t − 4), and so
0 = β1 = α1α2 + α3α4,
1/2 + √17/2 = β2 = α1α4 + α2α3,
1/2 − √17/2 = β3 = α1α3 + α2α4.
Then (α1 + α2 − α3 − α4)^2 = e1^2 − 4e2 + 4β1 = −4, (α1 − α2 − α3 + α4)^2 = e1^2 − 4e2 + 4β2 = −2 + 2√17, and (α1 − α2 + α3 − α4)^2 = e1^2 − 4e2 + 4β3 = −2 − 2√17. Taking compatible square roots and adding them to e1 = 0, we can solve for the roots; for instance, 4α1 = √−4 + √(−2 + 2√17) + √(−2 − 2√17) for a suitable choice of the square roots.
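The resolvent roots used above are quick to confirm with sympy (a sketch):

    from sympy import symbols, solve

    t = symbols('t')
    print(solve(t**3 - t**2 - 4*t, t))   # [0, 1/2 - sqrt(17)/2, 1/2 + sqrt(17)/2]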
In the above discussions, we've seen the product of differences of roots several times. This quantity actually reveals something about the Galois group.
Definition 12.7.1. Let f be a polynomial of degree n over F, and let K be the splitting field of f over F. Let α1, . . . , αn be the n roots of f in K. Then the discriminant of f is defined as
∆ = ∏_{i<j} (αi − αj)^2,
which actually is in F.
Property 12.7.2. Let f be a separable polynomial of degree n over a field F with characteristic 0, and let ∆ be its discriminant. Then the Galois group of f, viewed as a subgroup of Sn, is contained in An if and only if √∆ ∈ F.
Sketch of Proof. If √∆ ∈ F, then we know that
∏_{i<j} (αi − αj) ∈ F,
and so this product is fixed by the Galois group. Since every odd permutation sends the product to its negative, the Galois group contains only even permutations, i.e. it is contained in An. Running the same argument backwards gives the converse.
More generally, for a subgroup G of Sn, call a polynomial p ∈ F[x1, . . . , xn] whose stabilizer under the Sn-action is exactly G a resolvent invariant for G, and define the resolvent for G to be
∏_{q∈Op} (t − q(α1, . . . , αn)),
where p is a resolvent invariant for G and Op is the orbit of p under the group action of Sn. This polynomial has degree [Sn : G] and coefficients in F.
We can then simply generalize the argument in Property 12.7.2 and get:
Property 12.7.3. Let f be a polynomial of degree n over a field F with characteristic 0, and let p be a resolvent for a subgroup G of Sn. If the Galois group of f is a subgroup of (some conjugate of) G, then p has a root in F; and if p has a simple root in F, then the Galois group of f is a subgroup of (some conjugate of) G.
To conclude this chapter, let's classify the Galois groups of all irreducible polynomials of degree less than 5. Let n = deg f. The cases n = 1, 2 are trivial, so let's begin with n = 3. As usual, we assume that we are working over a field F with characteristic 0.
By the transitivity of automorphisms, we know that when n = 3, the Galois group of f can only be C3 = A3 or S3. Therefore it suffices to see if the discriminant has a square root in F: if it does, then the Galois group is C3; if it does not, then the Galois group is S3.
Now let's proceed to the case n = 4. By the transitivity, the Galois group can only be S4, A4, D4, K4 or C4. We can first check whether ∆ is a square in F. If it is, then the Galois group of f can only be A4 or K4. If it is not, then the Galois group of f can only be S4, D4 or C4. We can also see if the resolvent cubic g of f has a root in F or not. Note that the discriminant of g is the same as the discriminant of f, which shows that g is separable. Therefore g has a root in F if and only if the Galois group is a subgroup of (a conjugate of) D4. In other words, if g is irreducible, then the Galois group of f can only be S4 or A4, and if g is reducible, then the Galois group of f can only be D4, C4 or K4.
That is a lot, so let's put it into a table.
                      g is irreducible    g is reducible
∆ is a square         A4                  K4
∆ is not a square     S4                  C4 or D4
Now it remains to differentiate C4 and D4. In this case, √∆ ∉ F and g has a root in F. If g split completely, then the Galois group of f would fix α1α2 + α3α4, α1α4 + α2α3 and α1α3 + α2α4, which would force the Galois group of f into K4 ⊆ A4; this cannot happen since ∆ is not a square. Therefore we may assume that g has exactly one root in F, which we take to be β = α1α2 + α3α4. To differentiate C4 and D4, we just need to see if (1 2) is in the Galois group.
Consider γ = α1α2 − α3α4. Then
γ^2 = β^2 − 4e4(α1, α2, α3, α4) ∈ F,
and so γ^2 ∆ ∈ F. Now it is clear that C4 always fixes γ√∆, while (1 2)(γ√∆) = −γ√∆. As a consequence, when γ ≠ 0, we have that the Galois group of f is C4 if and only if γ√∆ ∈ F, i.e. γ^2 ∆ is a square in F.
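As an illustration of the whole classification, consider f = x^4 − 2, whose Galois group we already identified as D4 in Example 11.9.2. A sympy sketch of the three tests (the comments use β and γ as defined in the text):

    from sympy import symbols, discriminant, factor_list

    x, t = symbols('x t')
    print(discriminant(x**4 - 2, x))   # -2048: not a square in Q, so S4, D4 or C4
    # resolvent cubic with e1 = e2 = e3 = 0, e4 = -2:
    print(factor_list(t**3 + 8*t))     # t*(t**2 + 8): reducible, so D4 or C4
    # beta = 0, gamma^2 = beta^2 - 4*e4 = 8, and gamma^2 * Delta = -16384
    # is negative, hence not a square in Q, so the Galois group is D4, as expected.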
12.8 Random Problem Set
2. (12.2) Show that the polynomial t2 + t + x over the field F2 (x) is not solvable.
3. (12.3) We have shown that if α, β are both algebraic over F, then F[α, β] is an algebraic extension of F by some linear algebra argument. The consequence of this is that if α, β are both algebraic over F, then α + β, αβ are also algebraic over F. However, the proof is not constructive in the sense that we don't know what polynomial relation α + β, αβ actually satisfy. Use the elementary symmetric polynomials to show this in a more constructive way. Using this, find a polynomial with rational coefficients having ∛2 + √3 as a root.
(Hint: Consider the conjugates of α and β. Given these conjugates, what might the conjugates of α + β or αβ be? A computational sketch of this approach appears after this problem set.)
5. (12.3) Using the elementary symmetric polynomials, show that for any integer n, the sum
∑_{x∈Fq×} x^n
is 0 if q − 1 ∤ n and is −1 if q − 1 | n.
(Hint: The roots of t^{q−1} − 1 are exactly the elements in Fq×.)
6. (12.3) Let pk(x1, . . . , xn) = ∑_i xi^k. Show that for every field F with characteristic 0, every symmetric polynomial in F[x1, . . . , xn] can be written as a polynomial in p1, . . . , pn. Show that this does not hold if F is replaced with a commutative ring with 1 (say, Z) or a field with nonzero characteristic (say, F2).
7. (12.4) For any two elements g, h in G, the commutator of g, h is ghg^{−1}h^{−1}, which is denoted by [g, h]. A similar definition works for subsets of G. We also inductively define Gk as [G, Gk−1], where G1 = G. If there exists n ∈ N such that Gn = 1, then we say that G is nilpotent. Show that any nilpotent group is solvable. Also show that for every proper subgroup H of a nilpotent group, the normalizer of H is strictly larger than H.
(Hint: As an intermediate step, show that for every h there exists k such that for every g, the expression [[· · · [[g, h], h], · · · , h], h] = 1 where there are k pairs of brackets.)
it is nilpotent if and only if the upper central series ends at G. Using this, show
that every p-group is nilpotent and hence solvable.
(Hint: Since G is finite, the sequence G1 , G2 , · · · and the upper central series both
stabilize)
10. (12.5) Suppose that K/F, together with a positive integer n, is a Kummer extension with Galois group G = Gal(K/F), and A is the subgroup of K× consisting of the elements a satisfying a^n ∈ F. Show that A/F×, A^n/(F×)^n, G and Hom(G, F×) are isomorphic. Here Hom(G, F×) is the group of homomorphisms from G to F×.
11. (12.6) Show that for every cyclic group Cn , there is a Galois extension of Q whose
Galois group is Cn . Use this construction to explicitly give a field K such that
Gal(K/Q) = C3 .
(Hint: Do the case where n = p − 1 first. To show this for general n, you might
need a special case of Dirichlet’s theorem which states that there are infinitely many
primes of the form nk + 1.)
12. (12.7) Let G be a maximal transitive solvable subgroup of S5 . Show that any
resolvent for G is of degree 6. If the original quintic polynomial is irreducible and
the resolvent is separable, show that the Galois group of the quintic polynomial
is solvable if and only if the resolvent has a root in the base field. This somehow
shows that solving a quintic polynomial is really hard, for we have to know how to
find a root for a degree-6 polynomial to tell if a quintic polynomial is solvable.
13. (12.7) Construct irreducible polynomials of degree 4 over Q whose Galois groups
are S4 , A4 , K4 , D4 and C4 .
14. (12.7) Let f be an irreducible polynomial over Q with a real root α. Show that α
is constructible if and only if the order of the Galois group of f is a power of 2.
(Hint: If the order of the Galois group of f is a power of 2, find a subgroup of index 2. If α is constructible, then there is a series of quadratic extensions connecting Q and Q[α]. Extend this so that the extensions become Galois.)
15. (12.7) Construct an irreducible polynomial f of degree 4 over Q such that it has a
real root that is not constructible.
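As promised in the hint to problem 3, the constructive approach can be carried out with resultants; a sympy sketch (the resultant eliminates y, producing a polynomial whose roots are sums of a conjugate of ∛2 and a conjugate of √3):

    from sympy import symbols, resultant, minimal_polynomial, sqrt, Rational

    x, y = symbols('x y')
    # roots of the resultant are (conjugate of cbrt(2)) + (conjugate of sqrt(3)):
    h = resultant(y**3 - 2, (x - y)**2 - 3, y)
    print(h)   # a degree-6 polynomial in x with rational coefficients
    # for comparison, the minimal polynomial should agree with h up to a constant factor:
    print(minimal_polynomial(Rational(2)**Rational(1, 3) + sqrt(3), x))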