Linear Algebra II

Peter Philip∗

Lecture Notes
Originally Created for the Class of Spring Semester 2019 at LMU Munich
Includes Subsequent Corrections and Revisions†

September 19, 2021

Contents
1 Affine Subspaces and Geometry
  1.1 Affine Subspaces
  1.2 Affine Hull and Affine Independence
  1.3 Affine Bases
  1.4 Barycentric Coordinates and Convex Sets
  1.5 Affine Maps
  1.6 Affine Geometry

2 Duality
  2.1 Linear Forms and Dual Spaces
  2.2 Annihilators
  2.3 Hyperplanes and Linear Systems
  2.4 Dual Maps

3 Symmetric Groups

∗E-Mail: [email protected]

†Resources used in the preparation of this text include [Bos13, For17, Lan05, Str08].


4 Multilinear Maps and Determinants
  4.1 Multilinear Maps
  4.2 Alternating Multilinear Maps and Determinants
  4.3 Determinants of Matrices and Linear Maps

5 Direct Sums and Projections

6 Eigenvalues

7 Commutative Rings, Polynomials

8 Characteristic Polynomial, Minimal Polynomial

9 Jordan Normal Form

10 Vector Spaces with Inner Products
  10.1 Definition, Examples
  10.2 Preserving Norm, Metric, Inner Product
  10.3 Orthogonality
  10.4 The Adjoint Map
  10.5 Hermitian, Unitary, and Normal Maps

11 Definiteness of Quadratic Matrices over K

A Multilinear Maps

B Polynomials in Several Variables

C Quotient Rings

D Algebraic Field Extensions
  D.1 Basic Definitions and Properties
  D.2 Algebraic Closure

References

1 Affine Subspaces and Geometry

1.1 Affine Subspaces


Definition 1.1. Let V be a vector space over the field F . Then M ⊆ V is called an
affine subspace of V if, and only if, there exists a vector v ∈ V and a (vector) subspace
U ⊆ V such that M = v + U . We define dim M := dim U to be the dimension of M
(this notion of dimension is well-defined by the following Lem. 1.2(a)).

Thus, the affine subspaces of a vector space V are precisely the translations of vector
subspaces U of V , i.e. the cosets of subspaces U , i.e. the elements of quotient spaces
V /U .
Lemma 1.2. Let V be a vector space over the field F .

(a) If M is an affine subspace of V , then the vector subspace corresponding to M is


unique, i.e. if M = v1 + U1 = v2 + U2 with v1 , v2 ∈ V and vector subspaces U1 , U2 ⊆
V , then
U1 = U2 = {u − v : u, v ∈ M }. (1.1)

(b) If M = v + U is an affine subspace of V , then the vector v in this representation is


unique if, and only if, U = {0}.

Proof. (a): Let M = v1 + U1 with v1 ∈ V and vector subspace U1 ⊆ V . Moreover, let


U := {u − v : u, v ∈ M }. It suffices to show U1 = U . Suppose, u1 ∈ U1 . Since v1 ∈ M
and v1 + u1 ∈ M , we have u1 = v1 + u1 − v1 ∈ U , showing U1 ⊆ U . If a ∈ U , then there
are u1 , u2 ∈ U1 such that a = v1 + u1 − (v1 + u2 ) = u1 − u2 ∈ U1 , showing U ⊆ U1 , as
desired.
(b): If U = {0}, then M = {v} and v is unique. If M = v + U with 0 ≠ u ∈ U, then
M = v + U = v + u + U with v ≠ v + u. □
Definition 1.3. In the situation of Def. 1.1, we call affine subspaces of dimension 0
points, of dimension 1 lines, and of dimension 2 planes – in R2 and R3 , such objects are
easily visualized and they then coincide with the points, lines, and planes with which
one is already familiar.

Affine spaces and vector spaces share many structural properties. In consequence, one
can develop a theory of affine spaces that is in many respects analogous to the theory

of vector spaces, as will be illustrated by some of the notions and results presented in
the following. We start by defining so-called affine combinations, which are, for affine
spaces, what linear combinations are for vector spaces:
Definition 1.4. Let V be a vector space over the field F with v1, . . . , vn ∈ V and
λ1, . . . , λn ∈ F, n ∈ N. Then ∑_{i=1}^{n} λi vi is called an affine combination of v1, . . . , vn if,
and only if, ∑_{i=1}^{n} λi = 1.
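The following minimal numerical sketch (Python with numpy; the points and coefficients are chosen purely for illustration and are not from the notes) shows affine combinations of two points in R² sweeping out the line through them:

```python
# Sketch: affine combinations lam*v1 + (1-lam)*v2 of two points in R^2.
# The coefficients lam and 1-lam sum to 1, so each result is an affine
# combination in the sense of Def. 1.4.
import numpy as np

v1 = np.array([1.0, 0.0])
v2 = np.array([0.0, 1.0])

for lam in (-1.0, 0.0, 0.25, 1.0, 2.0):
    p = lam * v1 + (1.0 - lam) * v2
    # every such p satisfies x + y = 1, i.e. p lies on the line through v1, v2
    assert abs(p.sum() - 1.0) < 1e-12
```

Note that lam may be negative or greater than 1; only the coefficient sum matters.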
Theorem 1.5. Let V be a vector space over the field F, ∅ ≠ M ⊆ V. Then M is
an affine subspace of V if, and only if, M is closed under affine combinations. More
precisely, the following statements are equivalent:

(i) M is an affine subspace of V .


(ii) If n ∈ N, v1, . . . , vn ∈ M, and λ1, . . . , λn ∈ F with ∑_{i=1}^{n} λi = 1, then ∑_{i=1}^{n} λi vi ∈
M.

If char F ≠ 2, then (i) and (ii) are also equivalent to¹

(iii) If v1 , v2 ∈ M and λ1 , λ2 ∈ F with λ1 + λ2 = 1, then λ1 v1 + λ2 v2 ∈ M .

Proof. Exercise. 

The following Th. 1.6 is the analogon of [Phi19, Th. 5.7] for affine spaces:
Theorem 1.6. Let V be a vector space over the field F .

(a) Let I ≠ ∅ be an index set and (Mi)i∈I a family of affine subspaces of V. Then the
intersection M := ⋂_{i∈I} Mi is either empty or it is an affine subspace of V.

(b) In contrast to intersections, unions of affine subspaces are almost never affine sub-
spaces. More precisely, if M1 and M2 are affine subspaces of V and char F ≠ 2 (i.e.
1 ≠ −1 in F), then

M1 ∪ M2 is an affine subspace of V   ⇔   ( M1 ⊆ M2 ∨ M2 ⊆ M1 )   (1.2)
(where “⇐” also holds for char F = 2, but cf. Ex. 1.7 below).
¹For char F = 2, (iii) does not imply (i) and (ii): Let F := Z2 = {0, 1}. Let V be a vector space
over F with #V ≥ 4 (e.g. V = F²). Let p, q, r ∈ V be distinct, M := {p, q, r} (i.e. #M = 3). If
λ1, λ2 ∈ F with λ1 + λ2 = 1, then (λ1, λ2) ∈ {(0, 1), (1, 0)} and (iii) is trivially true. On the other hand,
v := p + q + r is an affine combination of p, q, r, since 1 + 1 + 1 = 1 in F; but v ∉ M: v = p + q + r = p
implies q = −r = r, and v = q or v = r likewise leads to a contradiction (this counterexample was pointed
out by Robin Mader).

Proof. (a): Let M ≠ ∅. We use the characterization of Th. 1.5(ii) to show M is an
affine subspace: If n ∈ N, v1, . . . , vn ∈ M, and λ1, . . . , λn ∈ F with ∑_{k=1}^{n} λk = 1, then
v := ∑_{k=1}^{n} λk vk ∈ Mi for each i ∈ I, implying v ∈ M. Thus, M is an affine subspace of
V.
(b): If M1 ⊆ M2, then M1 ∪ M2 = M2, which is an affine subspace of V. If M2 ⊆
M1, then M1 ∪ M2 = M1, which is an affine subspace of V. For the converse, we
now assume char F ≠ 2, M1 ⊈ M2, and M1 ∪ M2 is an affine subspace of V. We
have to show M2 ⊆ M1. Let m1 ∈ M1 \ M2 and m2 ∈ M2. Since M1 ∪ M2 is an
affine subspace, m2 + m2 − m1 ∈ M1 ∪ M2 by Th. 1.5(ii). If m2 + m2 − m1 ∈ M2,
then m1 = m2 + m2 − (m2 + m2 − m1) ∈ M2, in contradiction to m1 ∉ M2. Thus,
m2 + m2 − m1 ∈ M1. Since char F ≠ 2, we have 2 := 1 + 1 ≠ 0 in F, implying
m2 = 2^{−1}(m2 + m2 − m1) + 2^{−1} m1 ∈ M1, i.e. M2 ⊆ M1. □
Example 1.7. Consider F := Z2 = {0, 1} and the vector space V := F² over F. Then
M1 := U1 := {(0, 0), (1, 0)} = ⟨{(1, 0)}⟩ is a vector subspace and, in particular, an affine
subspace of V. The set M2 := (0, 1) + U1 = {(0, 1), (1, 1)} is also an affine subspace.
Then M1 ∪ M2 = V is an affine subspace, even though neither M1 ⊆ M2 nor M2 ⊆ M1.

1.2 Affine Hull and Affine Independence


Next, we will define the affine hull of a subset A of a vector space, which is the affine
analogon to the linear notion of the span of A (which is sometimes also called the linear
hull of A):
Definition 1.8. Let V be a vector space over the field F, ∅ ≠ A ⊆ V, and

M := { M ∈ P(V) : A ⊆ M ∧ M is an affine subspace of V },

where we recall that P(V) denotes the power set of V. Then the set

aff A := ⋂_{M∈M} M

is called the affine hull of A. We call A a generating set of aff A.


The following Prop. 1.9 is the analogon of [Phi19, Prop. 5.9] for affine spaces:
Proposition 1.9. Let V be a vector space over the field F and ∅ ≠ A ⊆ V.

(a) aff A is an affine subspace of V , namely the smallest affine subspace of V containing
A.

(b) aff A is the set of all affine combinations of elements from A, i.e.

aff A = { ∑_{i=1}^{n} λi ai : n ∈ N ∧ λ1, . . . , λn ∈ F ∧ a1, . . . , an ∈ A ∧ ∑_{i=1}^{n} λi = 1 }.   (1.3)
(c) If A ⊆ B ⊆ V , then aff A ⊆ aff B.
(d) A = aff A if, and only if, A is an affine subspace of V .
(e) aff aff A = aff A.

Proof. (a): Since A ⊆ aff A implies aff A ≠ ∅, (a) is immediate from Th. 1.6(a).
(b): Let W denote the right-hand side of (1.3). If M is an affine subspace of V and
A ⊆ M , then W ⊆ M , since M is closed under affine combinations, showing W ⊆ aff A.
On the other hand, suppose N, n1, . . . , nN ∈ N, a^k_1, . . . , a^k_{n_k} ∈ A for each k ∈ {1, . . . , N},
λ^k_1, . . . , λ^k_{n_k} ∈ F for each k ∈ {1, . . . , N}, and α1, . . . , αN ∈ F such that

∀ k ∈ {1, . . . , N} :   ∑_{i=1}^{n_k} λ^k_i = ∑_{i=1}^{N} αi = 1.

Then

∑_{k=1}^{N} αk ∑_{i=1}^{n_k} λ^k_i a^k_i ∈ W,

since

∑_{k=1}^{N} ∑_{i=1}^{n_k} αk λ^k_i = ∑_{k=1}^{N} αk = 1,

showing W to be an affine subspace of V by Th. 1.5(ii). Thus, aff A ⊆ W, completing
the proof of aff A = W.
(c) is immediate from (b).
(d): If A = aff A, then A is an affine subspace by (a). For the converse, while it is clear
that A ⊆ aff A always holds, if A is an affine subspace, then A ∈ M, where M is as in
Def. 1.8, implying aff A ⊆ A.
(e) now follows by combining (d) with (a). 
Proposition 1.10. Let V be a vector space over the field F , A ⊆ V , M = v + U with
v ∈ A and U a vector subspace of V . Then the following statements are equivalent:

(i) aff A = M .

(ii) ⟨−v + A⟩ = U.

Proof. Exercise. 

We will now define the notions of affine dependence/independence, which are, for affine
spaces, what linear dependence/independence are for vector spaces:

Definition 1.11. Let V be a vector space over the field F .

(a) A vector v ∈ V is called affinely dependent on a subset U of V (or on the vectors


in U ) if, and only if, there exists n ∈ N and u1 , . . . , un ∈ U such that v is an affine
combination of u1 , . . . , un . Otherwise, v is called affinely independent of U .

(b) A subset U of V is called affinely independent if, and only if, whenever 0 ∈ V is
written as a linear combination of distinct elements of U such that the coefficients
have sum 0, then all coefficients must be 0 ∈ F, i.e. if, and only if,

( n ∈ N ∧ W ⊆ U ∧ #W = n ∧ ∑_{u∈W} λu u = 0 ∧ (∀ u ∈ W :  λu ∈ F) ∧ ∑_{u∈W} λu = 0 )
⇒  ∀ u ∈ W :  λu = 0.   (1.4)

Sets that are not affinely independent are called affinely dependent.

As a caveat, it is underlined that, in Def. 1.11(b) above, one does not consider affine
combinations of the vectors u ∈ U , but special linear combinations (this is related to
the fact that 0 is only an affine combination of vectors in U , if aff U is a vector subspace
of V ).

Remark 1.12. It is immediate from Def. 1.11 that if v ∈ V is linearly independent of


U ⊆ V , then it is also affinely independent of U , and, if U ⊆ V is linearly independent,
then U is also affinely independent. However, the converse is, in general, not true (cf.
Ex. 1.13(b),(c) below).

Example 1.13. Let V be a vector space over the field F .

(a) ∅ is affinely independent: Indeed, if U = ∅, then the left side of the implication in
(1.4) is always false (since W ⊆ U means #W = 0), i.e. the implication is true.

(b) Every singleton set {v}, v ∈ V, is affinely independent, since λ1 = ∑_{i=1}^{1} λi = 0
means λ1 = 0 (if v = 0, then {v} is not linearly independent, cf. [Phi19, Ex.
5.13(b)]).

(c) Every set {v, w} with two distinct vectors v, w ∈ V is affinely independent (but
not linearly independent for w = αv with some α ∈ F ): 0 = λv − λw = λ(v − w)
implies λ = 0 or v = w.

There is a close relationship between affine independence and linear independence:

Proposition 1.14. Let V be a vector space over the field F and U ⊆ V . Then the
following statements are equivalent:

(i) U is affinely independent.

(ii) If u0 ∈ U , then U0 := {u − u0 : u ∈ U \ {u0 }} is linearly independent.

(iii) The set X := {(u, 1) ∈ V × F : u ∈ U } is a linearly independent subset of the


vector space V × F .

Proof. Exercise. 
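Prop. 1.14(ii) also yields a practical test for affine independence of finitely many points in Rⁿ: the differences u − u0 must be linearly independent, which can be checked via a rank computation. A hedged sketch (Python with numpy; the helper name and example points are made up):

```python
# Sketch of the test from Prop. 1.14(ii): a finite set is affinely independent
# iff the differences u - u0 (u != u0) are linearly independent.
import numpy as np

def affinely_independent(points):
    pts = np.asarray(points, dtype=float)
    if len(pts) <= 1:
        return True  # the empty set and singletons are affinely independent
    diffs = pts[1:] - pts[0]  # the vectors u - u0
    return np.linalg.matrix_rank(diffs) == len(diffs)

# three non-collinear points in R^2 are affinely independent ...
assert affinely_independent([[0, 0], [1, 0], [0, 1]])
# ... while three collinear points are affinely dependent
assert not affinely_independent([[0, 0], [1, 1], [2, 2]])
```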

The following Prop. 1.15 is the analogon of [Phi19, Prop. 5.14(a)-(c)] for affine spaces:

Proposition 1.15. Let V be a vector space over the field F and U ⊆ V .

(a) U is affinely dependent if, and only if, there exists u0 ∈ U such that u0 is affinely
dependent on U \ {u0 }.

(b) If U is affinely dependent and U ⊆ M ⊆ V , then M is affinely dependent as well.

(c) If U is affinely independent and M ⊆ U , then M is affinely independent as well.

Proof. (a): Suppose, U is affinely dependent. Then there exists W ⊆ U, #W = n ∈ N,
such that ∑_{u∈W} λu u = 0 with λu ∈ F, ∑_{u∈W} λu = 0, and there exists u0 ∈ W with
λ_{u0} ≠ 0. Then

u0 = −λ_{u0}^{−1} ∑_{u∈W\{u0}} λu u = ∑_{u∈W\{u0}} (−λ_{u0}^{−1} λu) u,   ∑_{u∈W\{u0}} (−λ_{u0}^{−1} λu) = (−λ_{u0}^{−1})·(−λ_{u0}) = 1,

showing u0 to be affinely dependent on U \ {u0}. Conversely, if u0 ∈ U is affinely
dependent on U \ {u0}, then there exists n ∈ N, distinct u1, . . . , un ∈ U \ {u0}, and
λ1, . . . , λn ∈ F with ∑_{i=1}^{n} λi = 1 such that

u0 = ∑_{i=1}^{n} λi ui   ⇒   −u0 + ∑_{i=1}^{n} λi ui = 0,

showing U to be affinely dependent, since the coefficient of u0 is −1 ≠ 0 and
−1 + ∑_{i=1}^{n} λi = 0.

(b) and (c) are now both immediate from (a). 

1.3 Affine Bases


Definition 1.16. Let V be a vector space over the field F , let M ⊆ V be an affine
subspace, and B ⊆ V . Then B is called an affine basis of M if, and only if, B is a
generating set for M (i.e. M = aff B) that is also affinely independent.

There is a close relationship between affine bases and vector space bases:
Proposition 1.17. Let V be a vector space over the field F , let M ⊆ V be an affine
subspace, and let B ⊆ M with v ∈ B. Then the following statements are equivalent:

(i) B is an affine basis of M .

(ii) B0 := {b − v : b ∈ B \ {v}} is a vector space basis of the vector space U :=


{v1 − v2 : v1 , v2 ∈ M }.

Proof. As a consequence of Lem. 1.2(a), we know U to be a vector subspace of V and


M = a + U for each a ∈ M . Moreover, v ∈ B ⊆ M implies B0 ⊆ U . According
to Prop. 1.14, B is affinely independent if, and only if, B0 is linearly independent.
According to Prop. 1.10, aff B = M holds if, and only if, ⟨−v + B⟩ = U, which, since
B0 = (−v + B) \ {0}, holds if, and only if, ⟨B0⟩ = U. □

The following Th. 1.18 is the analogon of [Phi19, Th. 5.17] for affine spaces:
Theorem 1.18. Let V be a vector space over the field F , let M ⊆ V be an affine
subspace, and let ∅ 6= B ⊆ V . Then the following statements (i) – (iii) are equivalent:

(i) B is an affine basis of M .



(ii) B is a maximal affinely independent subset of M, i.e. B is affinely independent
and each set A ⊆ M with B ⊊ A is affinely dependent.

(iii) B is a minimal generating set for M, i.e. aff B = M and aff A ⊊ M for each
A ⊊ B.

Proof. Let v ∈ B, and let B0 and U be as in Prop. 1.17 above. Then, due to Prop.
1.14, B is a maximal affinely independent subset of M if, and only if, B0 is a maximal
linearly independent subset of U . Moreover, due to Prop. 1.10, B is a minimal (affine)
generating set for M if, and only if, B0 is a minimal (linear) generating set for U . Thus,
the equivalences of Th. 1.18 follow by combining Prop. 1.17 with [Phi19, Th. 5.17]. 

The following Th. 1.19 is the analogon of [Phi19, Th. 5.23] for affine spaces:
Theorem 1.19. Let V be a vector space over the field F and let M ⊆ V be an affine
subspace.

(a) If S ⊆ M is affinely independent, then there exists an affine basis of M that contains
S.
(b) M has an affine basis B ⊆ M .
(c) Affine bases of M have a unique cardinality, i.e. if B ⊆ M and B̃ ⊆ M are both
affine bases of M , then there exists a bijective map φ : B −→ B̃.
(d) If B is an affine basis of M and S ⊆ M is affinely independent, then there exists
C ⊆ B such that B̃ := S ∪˙ C is an affine basis of M .

Proof. Let v ∈ V and let U be a vector subspace of V such that M = v + U . Then


v ∈ M and U = {v1 − v2 : v1 , v2 ∈ M } according to Lem. 1.2(a).
(a): It suffices to consider the case S ≠ ∅. Thus, let v ∈ S. According to Prop. 1.14(ii),
S0 := {x − v : x ∈ S \ {v}} is a linearly independent subset of U. According to
[Phi19, Th. 5.23(a)], U has a vector space basis B0 with S0 ⊆ B0 ⊆ U. Then, by Prop.
1.17, B := (v + B0) ∪˙ {v} ⊇ S is an affine basis of M.
(b) is immediate from (a).
(c): Let B ⊆ M and B̃ ⊆ M be affine bases of M . Moreover, let b ∈ B and b̃ ∈ B̃.
Then, by Prop. 1.17, B0 := {x − b : x ∈ B \ {b}} and B̃0 := {x − b̃ : x ∈ B̃ \ {b̃}}
are both vector space bases of U . Thus, by [Phi19, Th. 5.23(c)], there exists a bijective
map ψ : B0 −→ B̃0 . Then, clearly, the map
φ : B −→ B̃,   φ(x) := b̃ for x = b,   φ(x) := b̃ + ψ(x − b) for x ≠ b,

is well-defined and bijective, thereby proving (c).


(d): If B ⊆ aff S, then, according to Prop. 1.9(c),(e), M = aff B ⊆ aff aff S = aff S,
i.e. aff S = M , as M is an affine subspace containing S. Thus, S is itself an affine
basis of M and the statement holds with C := ∅. It remains to consider the case,
where there exists b ∈ B \ S such that S ∪ {b} is affinely independent. Then, by Prop.
1.17, B0 := {x − b : x ∈ B \ {b}} is a vector space basis of U and, by Prop. 1.14(ii),
S0 := {x − b : x ∈ S} is a linearly independent subset of U . Thus, by [Phi19, Th.
5.23(d)], there exists C0 ⊆ B0 such that B̃0 := S0 ∪˙ C0 is a vector space basis of U , and,
then, using Prop. 1.17 once again, (b + B̃0) ∪˙ {b} = S ∪˙ C with C := (b + C0) ∪˙ {b} ⊆ B
is an affine basis of M. □

1.4 Barycentric Coordinates and Convex Sets


The following Th. 1.20 is the analogon of [Phi19, Th. 5.19] for affine spaces:

Theorem 1.20. Let V be a vector space over the field F and assume M ⊆ V is an
affine subspace with affine basis B of M . Then each vector v ∈ M has unique barycentric
coordinates with respect to the affine basis B, i.e., for each v ∈ M , there exists a unique
finite subset Bv of B and a unique map c : Bv −→ F \ {0} such that
v = ∑_{b∈Bv} c(b) b   ∧   ∑_{b∈Bv} c(b) = 1.   (1.5)

Proof. The existence of Bv and the map c follows from the fact that the affine basis B
is an affine generating set, aff B = M . For the uniqueness proof, consider finite sets
Bv , B̃v ⊆ B and maps c : Bv −→ F \ {0}, c̃ : B̃v −→ F \ {0} such that
v = ∑_{b∈Bv} c(b) b = ∑_{b∈B̃v} c̃(b) b   ∧   ∑_{b∈Bv} c(b) = ∑_{b∈B̃v} c̃(b) = 1.

Extend both c and c̃ to A := Bv ∪ B̃v by letting c(b) := 0 for b ∈ B̃v \ Bv and c̃(b) := 0
for b ∈ Bv \ B̃v. Then

0 = ∑_{b∈A} ( c(b) − c̃(b) ) b,

such that the affine independence of A implies c(b) = c̃(b) for each b ∈ A, which, in
turn, implies Bv = B̃v and c = c̃. 

Example 1.21. With respect to the affine basis {0, 1} of R over R, the barycentric
coordinates of 1/3 are 2/3 and 1/3, whereas the barycentric coordinates of 5 are −4 and 5.
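For a finite affine basis of Rⁿ, the barycentric coordinates of Th. 1.20 can be computed by solving the linear system ∑ᵢ c(bᵢ) bᵢ = v together with the constraint ∑ᵢ c(bᵢ) = 1. A minimal sketch reproducing Ex. 1.21 numerically (Python with numpy; the helper name is made up):

```python
# Sketch: barycentric coordinates in R^n via a linear solve, stacking the
# coordinate equations sum_i c_i b_i = v on top of the row sum_i c_i = 1.
import numpy as np

def barycentric(v, basis):
    B = np.asarray(basis, dtype=float)      # rows are the basis points b_i
    A = np.vstack([B.T, np.ones(len(B))])   # coordinate rows plus the row of ones
    rhs = np.append(np.asarray(v, dtype=float), 1.0)
    return np.linalg.solve(A, rhs)

# affine basis {0, 1} of R over R, as in Ex. 1.21:
print(barycentric([1/3], [[0.0], [1.0]]))  # -> [ 2/3, 1/3 ]
print(barycentric([5.0], [[0.0], [1.0]]))  # -> [ -4., 5. ]
```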

Remark 1.22. Let V be a vector space over the field F and assume M ⊆ V is an affine
subspace with affine basis B of M .

(a) Caveat: In the literature, one also finds the notion of affine coordinates, however,
this notion of affine coordinates is usually (but not always, so one has to use care)
defined differently from the notion of barycentric coordinates as defined in Th. 1.20
above: For the affine coordinates, one designates one point x0 ∈ B to be the origin
of M . Let v ∈ M and let c : Bv −→ F \ {0} be the map yielding the barycentric
coordinates according to Th. 1.20. We write {x0} ∪ Bv = {x0, x1, . . . , xn} with
distinct elements x1, . . . , xn ∈ M (if any) and we set c(x0) := 0 in case x0 ∉ Bv.
Then

v = ∑_{i=0}^{n} c(xi) xi   ∧   ∑_{i=0}^{n} c(xi) = 1,

which, since 1 − ∑_{i=1}^{n} c(xi) = c(x0), is equivalent to

v = x0 + ∑_{i=1}^{n} c(xi) (xi − x0)   ∧   ∑_{i=0}^{n} c(xi) = 1.

One calls the c(x1 ), . . . , c(xn ), given by the map ca := c↾Bv \{x0 } , the affine coordi-
nates of v with respect to the affine coordinate system {x0 } ∪ (−x0 + B) (for v = x0 ,
ca turns out to be the empty map).

(b) If x1, . . . , xn ∈ M are distinct points that are affinely independent and n := n·1 ≠ 0
in F, then one sometimes calls

v := n^{−1} ∑_{i=1}^{n} xi ∈ M

the barycenter of x1, . . . , xn.

Definition and Remark 1.23. Let V be a vector space over R (we restrict ourselves
to vector spaces over R, since, for a scalar λ, we will need to know what it means
for λ to be positive, i.e. λ > 0 needs to be well-defined). Let v1, . . . , vn ∈ V and
λ1, . . . , λn ∈ R, n ∈ N. Then we call the affine combination ∑_{i=1}^{n} λi vi of v1, . . . , vn a
convex combination of v1, . . . , vn if, and only if, in addition to ∑_{i=1}^{n} λi = 1, one has
λi ≥ 0 for each i ∈ {1, . . . , n}. Moreover, we call C ⊆ V convex if, and only if, C
is closed under convex combinations, i.e. if, and only if, n ∈ N, v1, . . . , vn ∈ C, and
λ1, . . . , λn ∈ R^+_0 with ∑_{i=1}^{n} λi = 1, implies ∑_{i=1}^{n} λi vi ∈ C (analogous to Th. 1.5, C ⊆ V
is then convex if, and only if, each convex combination of merely two elements of C
is again in C). Note that, in contrast to affine subspaces, we allow convex sets to
be empty. Clearly, the convex subsets of R are precisely the intervals (open, closed,
half-open, bounded or unbounded). Convex subsets of R² include triangles and disks.
Analogous to the proof of Th. 1.6(a), one can show that arbitrary intersections of convex
sets are always convex, and, analogous to the definition of the affine hull in Def. 1.8,
one defines the convex hull conv A of a set A ⊆ V by letting

C := { C ∈ P(V) : A ⊆ C ∧ C is a convex subset of V },
conv A := ⋂_{C∈C} C.

Then Prop. 1.9 and its proof still work completely analogously in the convex situation
and one obtains conv A to be the smallest convex subset of V containing A, where conv A
consists precisely of all convex combinations of elements from A; A = conv A holds if, and
only if, A is convex; conv conv A = conv A; and conv A ⊆ conv B for each A ⊆ B ⊆ V .
If n ∈ N0 and A = {x0 , x1 , . . . , xn } ⊆ V is an affinely independent set, consisting of
the n + 1 distinct points x0 , x1 , . . . , xn , then conv A is called an n-dimensional simplex
(or simply an n-simplex) with vertices x0 , x1 , . . . , xn – 0-simplices are called points, 1-
simplices line segments, 2-simplices triangles, and 3-simplices tetrahedra. If {e1 , . . . , ed }
denotes the standard basis of Rd , d ∈ N, then conv{e1 , . . . , en+1 }, 0 ≤ n < d, is called
the standard n-simplex in Rd .
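Barycentric coordinates also give a membership test for simplices: a point lies in the n-simplex conv{x0, . . . , xn} if, and only if, all its barycentric coordinates with respect to the vertices are ≥ 0. A hedged sketch (Python with numpy; the vertices are made up, and the solve step assumes n + 1 affinely independent vertices in Rⁿ):

```python
# Sketch: simplex membership via the sign of the barycentric coordinates.
import numpy as np

def in_simplex(v, vertices):
    V = np.asarray(vertices, dtype=float)   # rows are the vertices x_0,...,x_n
    A = np.vstack([V.T, np.ones(len(V))])
    rhs = np.append(np.asarray(v, dtype=float), 1.0)
    c = np.linalg.solve(A, rhs)             # barycentric coordinates of v
    return bool(np.all(c >= -1e-12))        # in the simplex iff all c_i >= 0

verts = [[0, 0], [1, 0], [0, 1]]            # a triangle (2-simplex) in R^2
assert in_simplex([0.25, 0.25], verts)      # interior point
assert not in_simplex([1.0, 1.0], verts)    # point outside the triangle
```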

1.5 Affine Maps


We first study a special type of affine map, namely so-called translations.
Definition 1.24. Let V be a vector space over the field F . If v ∈ V , then the map

Tv : V −→ V, Tv (x) := x + v,

is called a translation, namely, the translation by v or the translation with translation


vector v. Let T (V ) := {Tv : v ∈ V } denote the set of translations on V .
Proposition 1.25. Let V be a vector space over the field F .

(a) If v ∈ V and A, B ⊆ V , then Tv (A + B) = v + A + B. In particular, translations


map affine subspaces of V into affine subspaces of V .

(b) If v ∈ V , then Tv is bijective with (Tv )−1 = T−v . In particular, T (V ) ⊆ SV , where


SV denotes the symmetric group on V according to [Phi19, Ex. 4.9(b)].

(c) Nontrivial translations are not linear: More precisely, Tv with v ∈ V is linear if,
and only if, v = 0 (i.e. Tv = Id).

(d) If v, w ∈ V , then Tv ◦ Tw = Tv+w = Tw ◦ Tv .

(e) (T(V), ◦) is a commutative subgroup of (SV, ◦). Moreover, (T(V), ◦) ≅ (V, +),
where
I : (V, +) −→ (T(V), ◦),   I(v) := Tv,
constitutes a group isomorphism.

Proof. Exercise. 

We will now define affine maps, which are, for affine spaces, what linear maps are for
vector spaces:

Definition 1.26. Let V and W be vector spaces over the field F. A map A : V −→ W
is called affine if, and only if, there exists a linear map L ∈ L(V, W) and w ∈ W such
that

∀ x ∈ V :   A(x) = (Tw ◦ L)(x) = w + L(x)   (1.6)

(i.e. the affine maps are precisely the compositions of linear maps with translations).
We denote the set of all affine maps from V into W by A(V, W).
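In coordinates, Def. 1.26 says that an affine map is x ↦ w + Lx for a matrix L and a translation vector w. The following sketch (Python with numpy; L and w are made up) also checks that affine maps respect affine combinations, consistent with Prop. 1.29:

```python
# Sketch of an affine map A = T_w after L on R^2, in the sense of (1.6).
import numpy as np

L = np.array([[2.0, 0.0],
              [0.0, 1.0]])   # the linear part
w = np.array([1.0, -1.0])    # the translation vector

def A(x):
    return w + L @ np.asarray(x, dtype=float)

# A maps an affine combination to the affine combination of the images:
x, y, lam = np.array([1.0, 2.0]), np.array([3.0, 0.0]), 0.3
assert np.allclose(A(lam * x + (1 - lam) * y), lam * A(x) + (1 - lam) * A(y))
```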

Proposition 1.27. Let V, W, X be vector spaces over the field F .

(a) If L ∈ L(V, W ) and v ∈ V , then L ◦ Tv = TLv ◦ L ∈ A(V, W ).

(b) If A ∈ A(V, W ), L ∈ L(V, W ), and w ∈ W , then A = Tw ◦ L if, and only if,


T−w ◦ A = L. In particular, A = Tw ◦ L is injective (resp. surjective, resp. bijective)
if, and only if, L is injective (resp. surjective, resp. bijective).

(c) If A : V −→ W is affine and bijective, then A−1 is also affine.

(d) If A : V −→ W and B : W −→ X are affine, then so is B ◦ A.

(e) Define GA(V ) := {A ∈ A(V, V ) : A bijective}. Then (GA(V ), ◦) forms a subgroup


of the symmetric group (SV , ◦) (then, clearly, GL(V ) forms a subgroup of GA(V ),
cf. [Phi19, Cor. 6.23]).

Proof. (a): If L ∈ L(V, W ) and v, x ∈ V , then

(L ◦ Tv )(x) = L(v + x) = Lv + Lx = (TLv ◦ L)(x),

proving L ◦ Tv = TLv ◦ L.

(b) is due to the bijectivity of Tw : One has, since T−w ◦ Tw = Id,

A = Tw ◦ L ⇔ T−w ◦ A = T−w ◦ Tw ◦ L = Id ◦L = L.

Moreover, for each x, y ∈ V and z ∈ W , one has

Ax = w + Lx = w + Ly = Ay ⇔ Lx = Ly,
z + w = Ax = w + Lx ⇔ z = Lx,
z − w = Lx ⇔ z = w + Lx = Ax,

proving A = Tw ◦ L is injective (resp. surjective, resp. bijective) if, and only if, L is
injective (resp. surjective, resp. bijective).
(c): If A = Tw ◦ L with L ∈ L(V, W ) and w ∈ W is affine and bijective, then, by (b), L
is bijective. Thus, A−1 = L−1 ◦ (Tw )−1 = L−1 ◦ T−w , which is affine by (a).
(d): If A = Tw ◦ L, B = Tx ◦ K with L ∈ L(V, W ), w ∈ W , K ∈ L(W, X), x ∈ X, then

∀ a ∈ V :   (B ◦ A)(a) = B(w + La) = x + Kw + (K ◦ L)(a) = ( T_{Kw+x} ◦ (K ◦ L) )(a),

showing B ◦ A to be affine.
(e) is an immediate consequence of (c) and (d). 

Proposition 1.28. Let V and W be vector spaces over the field F .

(a) Let v ∈ V , w ∈ W , L ∈ L(V, W ), and let U be a vector subspace of V . Then

(Tw ◦ L)(v + U ) = w + Lv + L(U )

(in particular, each affine image of an affine subspace is an affine subspace). More-
over, if A := Tw ◦ L and S ⊆ V such that M := v + U = aff S, then A(M ) =
w + Lv + L(U ) = aff(A(S)).

(b) Let y ∈ W , L ∈ L(V, W ), and let U be a vector subspace of W . Then L−1 (U ) is a


vector subspace of V and

∀ v ∈ L^{−1}{y} :   L^{−1}(y + U) = v + L^{−1}(U)

(in particular, each linear preimage of an affine subspace is either empty or an


affine subspace).

(c) If M ⊆ W is an affine subspace of W and A ∈ A(V, W ), then A−1 (M ) is either


empty or an affine subspace of V .

Proof. Exercise. 

The following Prop. 1.29 is the analogon of [Phi19, Prop. 6.5(a),(b)] for affine spaces
(but cf. Caveat 1.30 below):
Proposition 1.29. Let V and W be vector spaces over the field F , and let A : V −→ W
be affine.

(a) If A is injective, then, for each affinely independent subset S of V , A(S) is an


affinely independent subset of W .
(b) A is surjective if, and only if, for each subset S of V with V = aff S, one has
W = aff(A(S)).

Proof. Let w ∈ W and L ∈ L(V, W ) be such that A = Tw ◦ L.


(a): If A is injective, S ⊆ V is affinely independent, and λ1, . . . , λn ∈ F; s1, . . . , sn ∈ S
distinct; n ∈ N; such that ∑_{i=1}^{n} λi = 0 and

0 = ∑_{i=1}^{n} λi A(si) = ∑_{i=1}^{n} λi (w + Lsi) = ( ∑_{i=1}^{n} λi ) w + L( ∑_{i=1}^{n} λi si ) = L( ∑_{i=1}^{n} λi si ),

then ∑_{i=1}^{n} λi si = 0 by [Phi19, Prop. 6.3(d)], implying λ1 = · · · = λn = 0 and, thus,
showing that A(S) is also affinely independent.
(b): If A is not surjective, then aff(A(V)) = A(V) ≠ W, since A(V) is an affine subspace
of W by Prop. 1.28(a). Conversely, if A is surjective, S ⊆ V, and aff S = V, then, by
Prop. 1.28(a),

W = A(V) = A(aff S) = aff(A(S)),

thereby establishing the case. □


Caveat 1.30. Unlike in [Phi19, Prop. 6.5(a)], the converse of Prop. 1.29(a) is, in general,
not true: If dim V ≥ 1 and A ≡ w ∈ W is constant, then A is affine, not injective, but it
maps every nonempty affinely independent subset of V (in fact, every nonempty subset
of V ) onto the affinely independent set {w}.
Corollary 1.31. Let V and W be vector spaces over the field F , and let A : V −→ W
be affine and injective. If M ⊆ V is an affine subspace and B is an affine basis of M ,
then A(B) is an affine basis of A(M ) (Caveat 1.30 above shows that the converse is, in
general, not true).

Proof. Since B is affinely independent, A(B) is affinely independent by Prop. 1.29(a).


On the other hand, A(M ) = aff(A(B)) by Prop. 1.28(a). 

The following Prop. 1.32 shows that affine subspaces are precisely the images of vector
subspaces under translations and also precisely the sets of solutions to linear systems
with nonempty sets of solutions:
Proposition 1.32. Let V be a vector space over the field F and M ⊆ V . Then the
following statements are equivalent:

(i) M is an affine subspace of V .

(ii) There exists v ∈ V and a vector subspace U ⊆ V such that M = Tv (U ).

(iii) There exists a linear map L ∈ L(V, V) and a vector b ∈ V such that ∅ ≠ M =
L^{−1}{b} = {x ∈ V : Lx = b} (if V is finite-dimensional, then L^{−1}{b} = L(L|b),
where L(L|b) denotes the set of solutions to the linear system Lx = b according to
[Phi19, Rem. 8.3]).

Proof. “(i)⇔(ii)”: By the definition of affine subspaces, (i) is equivalent to the existence
of v ∈ V and a vector subspace U ⊆ V such that M = v + U = Tv (U ), which is (ii).
“(iii)⇒(i)”: Let L ∈ L(V, V) and b ∈ V such that ∅ ≠ M = L^{−1}{b}. Let x0 ∈ M. Then,
by [Phi19, Th. 4.20(f)], M = x0 + ker L, showing M to be an affine subspace.
“(i)⇒(iii)”: Now suppose M = v + U with v ∈ V and U a vector subspace of V .
According to [Phi19, Th. 5.27(c)], there exists a subspace W of V such that V = U ⊕W .
Then, clearly, L : V −→ V , L(u + w) := w (where u ∈ U , w ∈ W ), defines a linear map.
Let b := Lv. Then M = L−1 {b}: Indeed, if u ∈ U , then L(v + u) = Lv + 0 = Lv = b,
showing M ⊆ L−1 {b}; if L(u+w) = w = b = Lv, then u+w = v+u+w−v ∈ v+U = M
(since L(u + w − v) = Lw − Lv = w − Lv = 0 implies u + w − v ∈ U ), showing
L−1 {b} ⊆ M . 
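The equivalence (i)⇔(iii) of Prop. 1.32 can be observed numerically: for a solvable system Lx = b, the set of solutions is a particular solution plus the kernel of L. A minimal sketch (Python with numpy; the matrix, right-hand side, and kernel vector are made up):

```python
# Sketch of Prop. 1.32(iii): the solution set of Lx = b is x0 + ker L.
import numpy as np

L = np.array([[1.0, 1.0, 0.0],
              [0.0, 0.0, 1.0]])
b = np.array([2.0, 3.0])

x0, *_ = np.linalg.lstsq(L, b, rcond=None)  # a particular solution (system is solvable)
k = np.array([1.0, -1.0, 0.0])              # spans ker L for this particular L
assert np.allclose(L @ k, 0)

for t in (-2.0, 0.0, 5.0):
    # every element of the affine subspace x0 + <k> solves Lx = b
    assert np.allclose(L @ (x0 + t * k), b)
```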

The following Th. 1.33 is the analogon of [Phi19, Th. 6.9] for affine spaces:
Theorem 1.33. Let V and W be vector spaces over the field F . Moreover, let MV =
v + UV ⊆ V and MW = w + UW ⊆ W be affine subspaces of V and W , respectively,
where v ∈ V , w ∈ W , UV is a vector subspace of V and UW is a vector subspace of
W . Let BV be an affine basis of MV and let BW be an affine basis of MW . Then the
following statements are equivalent:

(i) There exists a linear isomorphism L : UV −→ UW such that

MW = (Tw ◦ L ◦ T−v )(MV ).

(ii) UV and UW are linearly isomorphic.



(iii) dim MV = dim MW .


(iv) #BV = #BW (i.e. there exists a bijective map from BV onto BW ).

Proof. “(i)⇒(ii)” is trivially true.


“(ii)⇒(i)” holds, since the restricted translations T−v : MV −→ UV and Tw : UW −→
MW are, clearly, bijective.
“(ii)⇔(iii)”: By Def. 1.1, (iii) is equivalent to dim UV = dim UW , which, according to
[Phi19, Th. 6.9], is equivalent to (ii).
“(iii)⇔(iv)”: Let x ∈ BV and y ∈ BW . Then, by Prop. 1.17, SV := {b−x : b ∈ BV \{x}}
is a vector space basis of UV and SW := {b − y : b ∈ BW \ {y}} is a vector space basis of
UW , where the restricted translations T−x : BV \{x} −→ SV and T−y : BW \{y} −→ SW
are, clearly, bijective. Thus, if dim MV = dim MW , then there exists a bijective map
φ : SV −→ SW , implying (Ty ◦ φ ◦ T−x ) : BV \ {x} −→ BW \ {y} to be bijective as well.
Conversely, if ψ : BV \ {x} −→ BW \ {y} is bijective, so is (T−y ◦ ψ ◦ Tx ) : SV −→ SW ,
implying dim MV = dim MW . 

Analogous to [Phi19, Def. 6.17], we now consider, for vector spaces V, W over the field
F , A(V, W ) with pointwise addition and scalar multiplication, letting, for each A, B ∈
A(V, W ), λ ∈ F ,
(A + B) : V −→ W, (A + B)(x) := A(x) + B(x),
(λ · A) : V −→ W, (λ · A)(x) := λ · A(x) for each λ ∈ F .
The following Th. 1.34 corresponds to [Phi19, Th. 6.18] and [Phi19, Th. 6.21] for linear
maps.
Theorem 1.34. Let V and W be vector spaces over the field F . Addition and scalar
multiplication on A(V, W ), given by the pointwise definitions above, are well-defined in
the sense that, if A, B ∈ A(V, W ) and λ ∈ F , then A+B ∈ A(V, W ) and λA ∈ A(V, W ).
Moreover, with these pointwise defined operations, A(V, W ) forms a vector space over
F.

Proof. According to [Phi19, Ex. 5.2(c)], it only remains to show that A(V, W ) is a
vector subspace of F(V, W ) = W V . To this end, let A, B ∈ A(V, W ) with A = Tw1 ◦ L1 ,
B = Tw2 ◦ L2 , where w1 , w2 ∈ W , L1 , L2 ∈ L(V, W ), and let λ ∈ F . If v ∈ V , then
(A + B)(v) = w1 + L1 v + w2 + L2 v = w1 + w2 + (L1 + L2 )v,
(λA)(v) = λw1 + λL1 v,
proving A + B = Tw1 +w2 ◦ (L1 + L2 ) ∈ A(V, W ) and λA = Tλw1 ◦ (λL1 ) ∈ A(V, W ), as
desired. 

1.6 Affine Geometry


The subject of affine geometry is concerned with the relationships between affine sub-
spaces, in particular, with the way they are contained in each other.
Definition 1.35. Let V be a vector space over the field F and let M, N ⊆ V be affine
subspaces.

(a) We define the incidence M I N by

M I N   :⇔   ( M ⊆ N ∨ N ⊆ M ).

If M I N holds, then we call M, N incident or M incident with N or N incident


with M .
(b) If M = v + UM and N = w + UN with v, w ∈ V and UM, UN vector subspaces of
V, then we call M, N parallel (denoted M ∥ N) if, and only if, UM I UN.
Proposition 1.36. Let V be a vector space over the field F and let M, N ⊆ V be affine
subspaces.

(a) If M ∥ N, then M I N or M ∩ N = ∅.
(b) If n ∈ N0 and An denotes the set of affine subspaces with dimension n of V , then
the parallelity relation of Def. 1.35(b) constitutes an equivalence relation on An .
(c) If A denotes the set of all affine subspaces of V , then, for dim V ≥ 2, the parallelity
relation of Def. 1.35(b) is not transitive (in particular, not an equivalence relation)
on A.

Proof. (a): Let M = v + UM and N = w + UN with v, w ∈ V and UM , UN vector


subspaces of V . Without loss of generality, assume UM ⊆ UN . Assume there exists
x ∈ M ∩ N . Then, if y ∈ M , then y − x ∈ UM ⊆ UN , implying y = x + (y − x) ∈ N and
M ⊆ N.
(b): It is immediate from Def. 1.35 that ∥ is reflexive and symmetric. It remains to
show ∥ is transitive on An. Thus, suppose M = v + UM, N = w + UN, P = z + UP
with v, w, z ∈ V and UM, UN, UP vector subspaces of dimension n of V. If M ∥ N, then
UM I UN and dim UM = dim UN = n implies UM = UN by [Phi19, Th. 5.27(d)]. In the
same way, N ∥ P implies UN = UP. But then UM = UP and M ∥ P, proving transitivity
of ∥.
(c): Let u, w ∈ V be linearly independent, U := ⟨{u}⟩, W := ⟨{w}⟩. Then U ∥ V and
W ∥ V, but U ∦ W (e.g., due to (a)). □

Caveat 1.37. The statement of Prop. 1.36(b) becomes false if n ∈ N0 is replaced by an
infinite cardinality: In an adaptation of the proof of Prop. 1.36(c), suppose V is a vector
space over the field F, where the distinct vectors v1, v2, . . . are linearly independent, and
define B := {vi : i ∈ N}, U := ⟨B \ {v1}⟩, W := ⟨B \ {v2}⟩, X := ⟨B⟩. Then, clearly,
U ∥ X and W ∥ X, but U ∦ W (e.g., due to Prop. 1.36(a)).

Proposition 1.38. Let V be a vector space over the field F .

(a) If x, y ∈ V with x ≠ y, then there exists a unique line l ⊆ V (i.e. a unique affine
subspace l of V with dim l = 1) such that x, y ∈ l. Moreover, this affine subspace is
given by
l = x + ⟨{x − y}⟩.   (1.7)

(b) If x, y, z ∈ V and there does not exist a line l ⊆ V such that x, y, z ∈ l, then there
exists a unique plane p ⊆ V (i.e. a unique affine subspace p of V with dim p = 2)
such that x, y, z ∈ p. Moreover, this affine subspace is given by

p = x + ⟨{y − x, z − x}⟩.   (1.8)

(c) If v1, . . . , vn ∈ V, n ∈ N, then aff{v1, . . . , vn} = v1 + ⟨{v2 − v1, . . . , vn − v1}⟩.

Proof. Exercise. 
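Prop. 1.38(c) also yields a membership test for the affine hull of finitely many points: x ∈ aff{v1, . . . , vn} if, and only if, x − v1 lies in the span of v2 − v1, . . . , vn − v1, which reduces to a rank comparison. A sketch (Python with numpy; the points are made up):

```python
# Sketch of Prop. 1.38(c): aff{v1,v2,v3} = v1 + <{v2 - v1, v3 - v1}>.
import numpy as np

v1, v2, v3 = np.array([1., 0., 0.]), np.array([0., 1., 0.]), np.array([0., 0., 1.])
U = np.column_stack([v2 - v1, v3 - v1])   # directions spanning the plane

def in_aff(x):
    # x is in the affine hull iff x - v1 lies in the column space of U
    return np.linalg.matrix_rank(np.column_stack([U, x - v1])) == np.linalg.matrix_rank(U)

assert in_aff((v1 + v2 + v3) / 3)          # the barycenter lies in the plane x+y+z=1
assert not in_aff(np.array([0., 0., 0.]))  # the origin does not
```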

Proposition 1.39. Let V, W be vector spaces over the field F and let M, N ⊆ V be
affine subspaces.

(a) If A ∈ A(V, W), then M I N implies A(M) I A(N), and M ∥ N implies A(M) ∥ A(N).

(b) If v ∈ V, then Tv(M) ∥ M.

Proof. (a): Let A ∈ A(V, W). Then M I N implies A(M) I A(N), since M ⊆ N im-
plies A(M) ⊆ A(N) and A(M), A(N) are affine subspaces of W due to Prop. 1.28(a).
Moreover, if M = v + UM, N = w + UN with v, w ∈ V and UM, UN vector subspaces
of V, A = Tx ◦ L with x ∈ W and L ∈ L(V, W), then A(M) = x + Lv + L(UM) and
A(N) = x + Lw + L(UN), such that M ∥ N implies A(M) ∥ A(N), since UM ⊆ UN
implies L(UM) ⊆ L(UN).
(b) is immediate from Tv (M ) = v + w + U for M = w + U with w ∈ V and U a vector
subspace of V . 

2 Duality

2.1 Linear Forms and Dual Spaces


If V is a vector space over the field F , then maps from V into F are often of particular
interest and importance. Such maps are sometimes called functionals or forms. Here,
we will mostly be concerned with linear forms: Let us briefly review some examples of
linear forms that we already encountered in [Phi19]:
Example 2.1. Let V be a vector space over the field F .

(a) Let B be a basis of V. If cv : Bv −→ F \ {0}, Bv ⊆ V, are the corresponding
coordinate maps (i.e. v = ∑_{b∈Bv} cv(b) b for each v ∈ V), then, for each b ∈ B, the
projection onto the coordinate with respect to b,

πb : V −→ F,   πb(v) := cv(b) for b ∈ Bv,   πb(v) := 0 for b ∉ Bv,

is a linear form (cf. [Phi19, Ex. 6.7(b)]).

(b) Let I be a nonempty set, V := F(I, F) = F^I (i.e. the vector space of functions
from I into F). Then, for each i ∈ I, the projection onto the ith coordinate

πi : V −→ F, πi (f ) := f (i),

is a linear form (cf. [Phi19, Ex. 6.7(c)]).

(c) Let F := K, where, as in [Phi19], we write K if K may stand for R or C. Let V be


the set of convergent sequences in K. Then

A : V −→ K,   A((zn)n∈N) := lim_{n→∞} zn,

is a linear form (cf. [Phi19, Ex. 6.7(e)(i)]).

(d) Let a, b ∈ R, a ≤ b, I := [a, b], and let V := R(I, K) be the set of all K-valued
Riemann integrable functions on I. Then
J : V −→ K,   J(f) := ∫_I f,

is a linear form (cf. [Phi19, Ex. 6.7(e)(iii)]).


Definition 2.2. Let V be a vector space over the field F .

(a) The functions from V into F (i.e. the elements of F(V, F ) = F V ) are called func-
tionals or forms on V . In particular, the elements of L(V, F ) are called linear
functionals or linear forms on V .

(b) The set


V ′ := L(V, F ) (2.1)
is called the (linear²) dual space (or just the dual) of V (in the literature, one often
also finds the notation V ∗ instead of V ′ ). We already know from [Phi19, Th. 6.18]
that V ′ constitutes a vector space over F .

Corollary 2.3. Let V be a vector space over the field F . Then each linear form α :
V −→ F is uniquely determined by its values on a basis of V . More precisely, if B is a
basis of V, (λb)b∈B is a family in F, and, for each v ∈ V, cv : Bv −→ F \ {0}, Bv ⊆ V,
is the corresponding coordinate map (i.e. v = ∑_{b∈Bv} cv(b) b for each v ∈ V), then

α : V −→ F,   α(v) = α( ∑_{b∈Bv} cv(b) b ) := ∑_{b∈Bv} cv(b) λb,   (2.2)

is linear, and α̃ ∈ V′ with

∀ b ∈ B :   α̃(b) = λb,

implies α = α̃.

Proof. Corollary 2.3 constitutes a special case of [Phi19, Th. 6.6]. 

Corollary 2.4. Let V be a vector space over the field F and let B be a basis of V .
Using Cor. 2.3, define linear forms αb ∈ V ′ by letting
∀ (b, a) ∈ B × B :   αb(a) := δba = 1 for a = b,   αb(a) := 0 for a ≠ b.   (2.3)

Define
B′ := { αb : b ∈ B }.   (2.4)

(a) Then B ′ is linearly independent.


²In Functional Analysis, where the vector space V over K is endowed with the additional structure
of a topology (e.g., V might be the normed space K^n), one defines the (topological) dual V′_top of V
(there usually also just denoted as V′ or V∗) to consist of all linear functionals on V that are also
continuous with respect to the topology on V (cf. [Phi17b, Ex. 3.1]). Depending on the topology on
V, V′_top can be much smaller than V′ – V′_top tends to be much more useful in an analysis context.

(b) If V is finite-dimensional, dim V = n ∈ N, then B ′ constitutes a basis for V ′ (in


particular, dim V = dim V ′ ). In this case, B ′ is called the dual basis of B (and B
the dual basis of B ′ ).

(c) If dim V = ∞, then ⟨B′⟩ ⊊ V′ and, in particular, B′ is not a basis of V′ (in fact,
in this case, one has dim V′ > dim V, see [Jac75, pp. 244–248]).

Proof. Cor. 2.4(a),(b),(c) constitute special cases of the corresponding cases of [Phi19,
Th. 6.19]. 

Definition 2.5. If V is a vector space over the field F with dim V = n ∈ N and
B := (b1 , . . . , bn ) is an ordered basis of V , then we call B ′ := (α1 , . . . , αn ), where
∀ i, j ∈ {1, . . . , n} :   ( αi ∈ V′ ∧ αi(bj) = δij ),   (2.5)

the ordered dual basis of B (and B the ordered dual basis of B ′ ) – according to Cor.
2.4(b), B ′ is, indeed, an ordered basis of V ′ .

Example 2.6. Consider V := R2 . If b1 := (1, 0), b2 := (1, 1), then B := (b1 , b2 ) is


an ordered basis of V . Then the ordered dual basis B ′ = (α1 , α2 ) of V ′ consists of the
maps α1 , α2 ∈ V ′ with α1 (b1 ) = α2 (b2 ) = 1, α1 (b2 ) = α2 (b1 ) = 0, i.e. with, for each
(v1 , v2 ) ∈ V ,

α1(v1, v2) = α1( (v1 − v2) b1 + v2 b2 ) = v1 − v2,
α2(v1, v2) = α2( (v1 − v2) b1 + v2 b2 ) = v2.
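In coordinates, the computation of Ex. 2.6 amounts to inverting the matrix whose columns are b1, b2: its rows then represent α1, α2, since the duality relations αi(bj) = δij are exactly the matrix identity D B = I. A numerical sketch (Python with numpy):

```python
# Sketch for Ex. 2.6: the dual basis as the rows of the inverse basis matrix.
import numpy as np

B = np.array([[1.0, 1.0],
              [0.0, 1.0]])        # columns are b1 = (1,0), b2 = (1,1)
D = np.linalg.inv(B)              # rows represent alpha1, alpha2

alpha1, alpha2 = D[0], D[1]
v = np.array([4.0, 3.0])          # an arbitrary v = (v1, v2)
assert np.isclose(alpha1 @ v, v[0] - v[1])  # alpha1(v) = v1 - v2
assert np.isclose(alpha2 @ v, v[1])         # alpha2(v) = v2
assert np.allclose(D @ B, np.eye(2))        # alpha_i(b_j) = delta_ij
```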

Notation 2.7. Let V be a vector space over the field F , dim V = n ∈ N, with ordered
basis B = (b1 , . . . , bn ). Moreover, let B ′ = (α1 , . . . , αn ) be the corresponding ordered
dual basis of V ′ . If one then denotes the coordinates of v ∈ V with respect to B as the
column vector

v = (v1, . . . , vn)^t,

then one typically denotes the coordinates of γ ∈ V′ with respect to B′ as the row vector

γ = (γ1 . . . γn)

(this has the advantage that one then can express γ(v) as a matrix product, cf. Rem.
2.8(a) below).

Remark 2.8. We remain in the situation of Not. 2.7 above.



(a) We obtain

γ(v) = ( ∑_{k=1}^{n} γk αk )( ∑_{l=1}^{n} vl bl ) = ∑_{l=1}^{n} ∑_{k=1}^{n} γk vl αk(bl) = ∑_{l=1}^{n} ∑_{k=1}^{n} γk vl δkl
     = ∑_{k=1}^{n} γk vk = (γ1 . . . γn)(v1, . . . , vn)^t.

(b) Let B̃V := (ṽ1, . . . , ṽn) be another ordered basis of V and (cji) ∈ GLn(F) such that

∀ i ∈ {1, . . . , n} :   ṽi = ∑_{j=1}^{n} cji vj.

If B̃′V := (α̃1, . . . , α̃n) denotes the ordered dual basis corresponding to B̃V and
(dji) := (cji)^{−1}, then

∀ i ∈ {1, . . . , n} :   α̃i = ∑_{j=1}^{n} d^t_{ji} αj = ∑_{j=1}^{n} dij αj,

where (d^t_{ji}) denotes the transpose of (dji), i.e. (d^t_{ji}) ∈ GLn(F) with d^t_{ji} := dij for each
(j, i) ∈ {1, . . . , n} × {1, . . . , n}: Indeed, for each k, l ∈ {1, . . . , n}, we obtain

( ∑_{j=1}^{n} dkj αj )(ṽl) = ( ∑_{j=1}^{n} dkj αj )( ∑_{i=1}^{n} cil vi ) = ∑_{j=1}^{n} ∑_{i=1}^{n} dkj cil δji = ∑_{j=1}^{n} dkj cjl = δkl.
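The formula of Rem. 2.8(b) can be checked numerically: writing the coordinates of the new basis vectors into the columns of C, the coordinate rows of the new dual basis are the rows of C⁻¹, and the duality relations α̃i(ṽj) = δij amount to C⁻¹C = I. A minimal sketch (Python with numpy; C is random, hence invertible with probability 1):

```python
# Sketch of Rem. 2.8(b): new dual basis = rows of the inverse of the
# change-of-basis matrix C (coordinates relative to the old basis pair).
import numpy as np

rng = np.random.default_rng(0)
C = rng.normal(size=(3, 3))   # column i: coordinates of v~_i w.r.t. (v_1,...,v_n)
D = np.linalg.inv(C)          # row i: coordinates of alpha~_i w.r.t. (alpha_1,...)

# alpha~_i(v~_j) = delta_ij is exactly D @ C = I:
assert np.allclose(D @ C, np.eye(3))
```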

Proposition 2.9. Let V be a vector space over the field F . If U is a vector subspace of
V and v ∈ V \ U , then there exists α ∈ V ′ , satisfying
α(v) = 1   ∧   ∀ u ∈ U :   α(u) = 0.

Proof. Let BU be a basis of U . Then Bv := {v} ∪˙ BU is linearly independent and,


according to [Phi19, Th. 5.23(a)], there exists a basis B of V such that Bv ⊆ B.
According to Cor. 2.3,
α : V −→ F,   ∀ b ∈ B :   α(b) := 1 for b = v,   α(b) := 0 for b ≠ v,

defines an element of V ′ , which, clearly, satisfies the required conditions. 



Definition 2.10. Let V be a vector space over the field F .

(a) The map


⟨·, ·⟩ : V × V′ −→ F,   ⟨v, α⟩ := α(v),   (2.6)
is called the dual pairing corresponding to V .

(b) The dual of V ′ is called the bidual or the second dual of V . One writes V ′′ := (V ′ )′ .

(c) The map

Φ : V −→ V′′,   v ↦ Φv,   (Φv) : V′ −→ F,   (Φv)(α) := α(v),   (2.7)

is called the canonical embedding of V into V ′′ (cf. Th. 2.11 below).

Theorem 2.11. Let V be a vector space over the field F .

(a) The canonical embedding Φ : V −→ V′′ of (2.7) is a linear monomorphism (i.e. a
linear isomorphism Φ : V ≅ Im Φ ⊆ V′′).

(b) If dim V = n ∈ N, then Φ is a linear isomorphism Φ : V ≅ V′′ (in fact, the
converse is also true, i.e., if Φ is an isomorphism, then dim V < ∞, cf. the remark
in Cor. 2.4(c)).

Proof. (a): Exercise.


(b): According to Cor. 2.4(b), n = dim V = dim V ′ = dim V ′′ . Thus, by [Phi19, Th.
6.10], the linear monomorphism Φ is also an epimorphism. 

Corollary 2.12. Let V be a vector space over the field F , dim V = n ∈ N. If B ′ =


{α1 , . . . , αn } is a basis of V ′ , then there exists a basis B of V such that B and B ′ are
dual.

Proof. According to Th. 2.11(b), the canonical embedding Φ : V −→ V ′′ of (2.7)


constitutes a linear isomorphism. Let B ′′ = {f1 , . . . , fn } be the basis of V ′′ that is dual
to B ′ and, for each i ∈ {1, . . . , n}, bi := Φ−1 (fi ). Then, as Φ is a linear isomorphism,
B := {b1 , . . . , bn } is a basis of V . Moreover, B and B ′ are dual:

∀ i, j ∈ {1, . . . , n} :   αi(bj) = (Φbj)(αi) = fj(αi) = δij,

where we used that B ′ and B ′′ are dual. 



2.2 Annihilators
Definition 2.13. Let V be a vector space over the field F, M ⊆ V, S ⊆ V′. Moreover,
let Φ : V −→ V′′ denote the canonical embedding of (2.7). Then

M⊥ := { α ∈ V′ : ∀ v ∈ M :  α(v) = 0 }  =  V′ for M = ∅,  and  M⊥ = ⋂_{v∈M} ker(Φv) for M ≠ ∅,

is called the (forward) annihilator of M in V′, and

S⊤ := { v ∈ V : ∀ α ∈ S :  α(v) = 0 }  =  V for S = ∅,  and  S⊤ = ⋂_{α∈S} ker α for S ≠ ∅,

is called the (backward) annihilator of S in V. In view of Rem. 2.15 and Ex. 2.16(b)
below, one also calls v ∈ V and α ∈ V′ such that

α(v) = ⟨v, α⟩ = 0   (cf. (2.6))

perpendicular or orthogonal and, in consequence, sets M⊥ and S⊤ are sometimes called
M perp and S perp, respectively.

Lemma 2.14. Let V be a vector space over the field F, M ⊆ V, S ⊆ V′. Then M⊥ is
a subspace of V′ and S⊤ is a subspace of V. Moreover,

M⊥ = ⟨M⟩⊥,   S⊤ = ⟨S⟩⊤.   (2.8)

Proof. Since M⊥ and S⊤ are both intersections of kernels of linear maps, they are
subspaces, since kernels are subspaces by [Phi19, Prop. 6.3(c)] and intersections of sub-
spaces are subspaces by [Phi19, Th. 5.7(a)]. Moreover, it is immediate from Def. 2.13
that M⊥ ⊇ ⟨M⟩⊥ and S⊤ ⊇ ⟨S⟩⊤. On the other hand, consider α ∈ M⊥ and v ∈ S⊤.
Let λ1, . . . , λn ∈ F, n ∈ N. If v1, . . . , vn ∈ M, then

α( ∑_{i=1}^{n} λi vi ) = ∑_{i=1}^{n} λi α(vi) = 0   (using α ∈ M⊥),

showing α ∈ ⟨M⟩⊥ and M⊥ ⊆ ⟨M⟩⊥. Analogously, if α1, . . . , αn ∈ S, then

( ∑_{i=1}^{n} λi αi )(v) = ∑_{i=1}^{n} λi αi(v) = 0   (using v ∈ S⊤),

showing v ∈ ⟨S⟩⊤ and S⊤ ⊆ ⟨S⟩⊤. □



Remark 2.15. On real vector spaces V, one can study so-called inner products (also
called scalar products), ⟨·, ·⟩ : V × V −→ R, (v, w) ↦ ⟨v, w⟩ ∈ R, which, as part
of their definition, have the requirement of being bilinear forms, i.e., for each v ∈ V,
⟨v, ·⟩ : V −→ R is a linear form and, for each w ∈ V, ⟨·, w⟩ : V −→ R is a linear form
(we will come back to vector spaces with inner products again in Sec. 10 below). One then
calls vectors v, w ∈ V perpendicular or orthogonal with respect to ⟨·, ·⟩ if, and only if,
⟨v, w⟩ = 0, so that the notions of Def. 2.13 can be seen as generalizing orthogonality
with respect to inner products (also cf. Ex. 2.16(b) below).
Example 2.16. (a) Let V be a vector space over the field F and let U be a subspace
of V with BU being a basis of U. Then, according to [Phi19, Th. 5.23(a)], there
exists a basis B of V such that BU ⊆ B. Then Cor. 2.3 implies

∀ α ∈ V′ :   ( α ∈ U⊥  ⇔  ∀ b ∈ BU :  α(b) = 0 ).
(b) Consider the real vector space R2 and let

⟨·, ·⟩ : R² × R² −→ R,   ⟨(v1, v2), (w1, w2)⟩ := v1 w1 + v2 w2,

denote the so-called Euclidean inner product on R2 . Then, clearly, for each w =
(w1 , w2 ) ∈ R2 ,

αw : R² −→ R,   αw(v) := ⟨v, w⟩ = v1 w1 + v2 w2,

defines a linear form on R2 . Let v := (1, 2). Then the span of v, i.e. lv := {(λ, 2λ) :
λ ∈ R}, represents the line through v. Moreover, for each w = (w1 , w2 ) ∈ R2 ,

αw ∈ {v}⊥ = (lv)⊥   ⇔   αw(v) = w1 + 2w2 = 0   ⇔   w1 = −2w2
⇔   w ∈ l⊥ := {(−2λ, λ) : λ ∈ R}.

Thus, l⊥ is spanned by (−2, 1) and we see that (lv)⊥ consists precisely of the linear
forms αw that are given by vectors w that are perpendicular to v in the Euclidean
geometrical sense (i.e. in the sense usually taught in high school geometry).

The following notions defined for linear forms in connection with subspaces can some-
times be useful when studying annihilators:
Definition 2.17. Let V be a vector space over the field F and let U be a subspace of
V . Then
R : V ′ −→ U ′ , Rf := f ↾U ,

is called the restriction operator from V to U ;

I : (V /U )′ −→ V ′ , (Ig)(v) := g(v + U ),

is called the inflation operator from V /U to V .


Theorem 2.18. Let V be a vector space over the field F and let U be a subspace of V
with the restriction operator R and the inflation operator I defined as in Def. 2.17.

(a) R : V ′ −→ U ′ is a linear epimorphism with ker R = U ⊥ . Moreover,

dim U ⊥ + dim U ′ = dim V ′ (2.9)

and
U′ ≅ V′/U⊥.   (2.10)
(see [Phi19, Th. 6.8(a)] for the precise meaning of (2.9) in case at least one of the
occurring cardinalities is infinite). If dim V = n ∈ N, then one also has

n = dim V = dim U ⊥ + dim U. (2.11)

(b) I is a linear isomorphism I : (V/U)′ ≅ U⊥.

Proof. (a): Let α, β ∈ V ′ and λ, µ ∈ F . Then, for each u ∈ U ,

( R(λα + µβ) )(u) = λα(u) + µβ(u) = λ(Rα)(u) + µ(Rβ)(u) = ( λ(Rα) + µ(Rβ) )(u),

showing R to be linear. Moreover, for each α ∈ V ′ , one has

α ∈ ker R   ⇔   ∀ u ∈ U :  α(u) = 0   ⇔   α ∈ U⊥,

proving ker R = U ⊥ . Let BU be a basis of U . Then, according to [Phi19, Th. 5.23(a)],


there exists C ⊆ V such that BU ∪˙ C is a basis of V . Consider α ∈ U ′ . Using Cor. 2.3,
define β ∈ V′ by setting β(b) := α(b) for b ∈ BU and β(b) := 0 for b ∈ C.
Then, clearly, Rβ = α, showing R to be surjective. Thus, we have
dim V′ = dim ker R + dim Im R = dim U⊥ + dim U′   (using [Phi19, Th. 6.8(a)]),

thereby proving (2.9). Next, applying the isomorphism theorem of [Phi19, Th. 6.16(a)]
yields
U′ = Im R ≅ V′/ ker R = V′/U⊥,

which is (2.10). Finally, if dim V = n ∈ N, then


n = dim V = dim V′ = dim U⊥ + dim U′ = dim U⊥ + dim U
(using Cor. 2.4(b) for the second and fourth equality and (2.9) for the third),

proving (2.11).
(b): Exercise. 
Theorem 2.19. Let V be a vector space over the field F .

(a) If U is a subspace of V , then (U ⊥ )⊤ = U .

(b) If S is a subspace of V′, then S ⊆ (S⊤)⊥. If dim V = n ∈ N, then one even has
(S⊤)⊥ = S.

(c) If U1 , U2 are subspaces of V , then

(U1 + U2 )⊥ = U1⊥ ∩ U2⊥ , (U1 ∩ U2 )⊥ = U1⊥ + U2⊥ .

(d) If S1 , S2 are subspaces of V ′ , then

(S1 + S2 )⊤ = S1⊤ ∩ S2⊤ , (S1 ∩ S2 )⊤ ⊇ S1⊤ + S2⊤ .

If dim V = n ∈ N, then one also has

(S1 ∩ S2 )⊤ = S1⊤ + S2⊤ .

Proof. (a): Exercise.


(b): According to Def. 2.13, we have
(S⊤)⊥ := { α ∈ V′ : ∀ v ∈ S⊤ :  α(v) = 0 },

showing S ⊆ (S ⊤ )⊥ . Now assume dim V = n ∈ N and suppose there exists α ∈ (S ⊤ )⊥ \S.


Then, according to Prop. 2.9, there exists f ∈ V ′′ , satisfying

f(α) = 1   ∧   ∀ β ∈ S :   f(β) = 0.

Since dim V = n ∈ N, we may employ Th. 2.11(b) to conclude that the canonical
embedding Φ : V −→ V ′′ is a linear isomorphism, in particular, surjective. Thus, there
exists v ∈ V such that f = Φv, i.e. f (γ) = γ(v) for each γ ∈ V ′ . Since f ∈ S ⊥ , we
have β(v) = f (β) = 0 for each β ∈ S, showing v ∈ S ⊤ . Thus, α ∈ (S ⊤ )⊥ implies the
contradiction 0 = α(v) = f (α) = 1. In consequence, (S ⊤ )⊥ \ S = ∅, proving (b).

(c): Let α ∈ (U1 + U2 )⊥ . Then U1 ⊆ U1 + U2 implies α ∈ U1⊥ , U2 ⊆ U1 + U2 implies


α ∈ U2⊥ , showing α ∈ U1⊥ ∩ U2⊥ and (U1 + U2 )⊥ ⊆ U1⊥ ∩ U2⊥ . Conversely, if α ∈ U1⊥ ∩ U2⊥
and u1 ∈ U1 , u2 ∈ U2 , then α(u1 + u2 ) = α(u1 ) + α(u2 ) = 0, showing α ∈ (U1 + U2 )⊥
and U1⊥ ∩ U2⊥ ⊆ (U1 + U2 )⊥ . To prove the second equality in (c), first, let α ∈ U1⊥ + U2⊥ ,
i.e. α = α1 + α2 with α1 ∈ U1⊥ , α2 ∈ U2⊥ . Then, if v ∈ U1 ∩ U2 , one has α(v) =
α1 (v) + α2 (v) = 0, showing α ∈ (U1 ∩ U2 )⊥ and U1⊥ + U2⊥ ⊆ (U1 ∩ U2 )⊥ . Conversely,
let α ∈ (U1 ∩ U2 )⊥ . We now proceed similar to the proof of [Phi19, Th. 5.30(c)]: We
choose bases B∩ , BU1 , BU2 of U1 ∩ U2 , U1 , and U2 , respectively, such that B∩ ⊆ BU1 and
B∩ ⊆ BU2 , defining B1 := BU1 \ B∩ , B2 := BU2 \ B∩ . Then it was shown in the proof of
[Phi19, Th. 5.30(c)] that

B+ := BU1 ∪ BU2 = B1 ∪˙ B2 ∪˙ B∩

is a basis of U1 + U2 and we may choose C ⊆ V such that B := B+ ∪˙ C is a basis of V .


Using Cor. 2.3, we now define α1, α2 ∈ V′ by setting, for each b ∈ B,

α1(b) := α(b) for b ∈ B \ B1,   α1(b) := 0 for b ∈ B1;
α2(b) := α(b) for b ∈ B1,   α2(b) := 0 for b ∈ B \ B1.

Since α↾B∩ ≡ 0, we obtain α1 ↾BU1 = α1 ↾B1 ∪˙ B∩ ≡ 0 and α2 ↾BU2 = α2 ↾B2 ∪˙ B∩ ≡ 0, showing


α1 ∈ U1⊥ and α2 ∈ U2⊥ . On the other hand, we have, for each b ∈ B, α(b) = α1 (b)+α2 (b),
showing α = α1 + α2 , α ∈ U1⊥ + U2⊥ , and (U1 ∩ U2 )⊥ ⊆ U1⊥ + U2⊥ , thereby completing
the proof of (c).
(d): Exercise. 

2.3 Hyperplanes and Linear Systems


In the present section, we combine duality with the theory of affine spaces of Sec. 1 and
with the theory of linear systems of [Phi19, Sec. 8].

Definition 2.20. Let V be a vector space over the field F . If α ∈ V ′ \ {0} and r ∈ F ,
then the set
Hα,r := α−1 {r} = {v ∈ V : α(v) = r} ⊆ V
is called a hyperplane in V .

Notation 2.21. Let V be a vector space over the field F , v ∈ V , and α ∈ V ′ . We then
write
v ⊥ := {v}⊥ , α⊤ := {α}⊤ .

Theorem 2.22. Let V be a vector space over the field F .



(a) Each hyperplane H in V is an affine subspace of V, where dim V = 1 + dim H, i.e.
dim V = dim H if V is infinite-dimensional, and dim H = n − 1 if dim V = n ∈ N.
More precisely, if 0 ≠ α ∈ V′ and r ∈ F, then

∀ w ∈ V with α(w) ≠ 0 :   Hα,r = r α(w)^{−1} w + α⊤ = r α(w)^{−1} w + ker α.   (2.12)

(b) If dim V = n ∈ N and M is an affine subspace of V with dim M = n − 1, then M
is a hyperplane in V, i.e. there exist 0 ≠ α ∈ V′ and r ∈ F such that M = Hα,r.

(c) Let α, β ∈ V′ \ {0} and r, s ∈ F. Then

Hα,r = Hβ,s   ⇔   ∃ 0 ≠ λ ∈ F :   ( β = λα ∧ s = λr ).

Moreover, if α = β, then Hα,r and Hβ,s are parallel.

Proof. (a): Let 0 ≠ α ∈ V′ and r ∈ F with w ∈ V such that α(w) ≠ 0. Define
v := r α(w)^{−1} w. Then α(v) = r, i.e. v ∈ Hα,r = α^{−1}{r}. Thus, by [Phi19, Th. 4.20(f)],
we have

Hα,r = v + ker α = v + {x ∈ V : α(x) = 0} = v + α⊤,

proving (2.12). In particular, we have dim Hα,r = dim ker α and, by [Phi19, Th. 6.8(a)],

dim V = dim ker α + dim Im α = dim Hα,r + dim F = dim Hα,r + 1,

thereby completing the proof of (a).


(b),(c): Exercise. 
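A numerical illustration of Th. 2.22(a) (Python with numpy; the form α, the value r, and the kernel vectors are made up): for α(x) = x1 + x2 + x3 on R³ and r = 1, every point v + u with v as in (2.12) and u ∈ ker α satisfies α(x) = r.

```python
# Sketch: the hyperplane H_{alpha,r} as v + ker(alpha), cf. (2.12).
import numpy as np

a = np.array([1.0, 1.0, 1.0])   # coordinate row of the linear form alpha
r = 1.0
w = np.array([1.0, 0.0, 0.0])   # any w with alpha(w) != 0
v = (r / (a @ w)) * w           # the particular point from (2.12)
assert np.isclose(a @ v, r)

# k1, k2 span ker(alpha); v + s*k1 + t*k2 stays on the hyperplane:
k1, k2 = np.array([1.0, -1.0, 0.0]), np.array([0.0, 1.0, -1.0])
for s, t in [(0.0, 0.0), (1.5, -2.0), (3.0, 0.5)]:
    assert np.isclose(a @ (v + s * k1 + t * k2), r)
```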

Proposition 2.23. Let V be a vector space over the field F , dim V = n ∈ N. If M ⊆ V


is an affine subspace of V with dim M = m ∈ N0 , m < n, then M is the intersection of
n − m hyperplanes in V .

Proof. If dim M = m, then there exists a linear subspace U of V with dim U = m


and v ∈ V such that M = v + U . Then, according to (2.11), dim U ⊥ = n − m. Let
{α1 , . . . , αn−m } be a basis of U ⊥ . Define

∀ i ∈ {1, . . . , n − m} :   ri := αi(v).

We claim M = N := ⋂_{i=1}^{n−m} Hαi,ri : Indeed, if x = v + u with u ∈ U, then

∀ i ∈ {1, . . . , n − m} :   αi(x) = αi(v) + αi(u) = ri + 0 = ri   (using αi ∈ U⊥),

showing x ∈ N and M ⊆ N. Conversely, let x ∈ N. Then

∀ i ∈ {1, . . . , n − m} :   αi(x − v) = ri − ri = 0,  i.e.  x − v ∈ (αi)⊤,

implying

x − v ∈ ⟨{α1, . . . , αn−m}⟩⊤ = (U⊥)⊤ = U   (using Th. 2.19(a)),

showing x ∈ v + U = M and N ⊆ M as claimed. □

Example 2.24. Let F be a field. As in [Phi19, Sec. 8.1], consider the linear system

∀ j ∈ {1, . . . , m} :   ∑_{k=1}^{n} ajk xk = bj,   (2.13)

where m, n ∈ N; b1, . . . , bm ∈ F and the ajk ∈ F, j ∈ {1, . . . , m}, k ∈ {1, . . . , n}.


We know that we can also write (2.13) in matrix form as Ax = b with A = (ajk) ∈
M(m, n, F), m, n ∈ N, and b = (b1, . . . , bm)^t ∈ M(m, 1, F) ≅ F^m. The set of solutions
to (2.13) is

L(A|b) = {x ∈ F^n : Ax = b}.
If we now define the linear forms

∀ j ∈ {1, . . . , m} :   αj : F^n −→ F,   αj( (v1, . . . , vn)^t ) := (aj1 . . . ajn)(v1, . . . , vn)^t = ∑_{k=1}^{n} ajk vk,

then we can rewrite (2.13) as


∀ j ∈ {1, . . . , m} :   x = (x1, . . . , xn)^t ∈ Hαj,bj   ⇔   x ∈ ⋂_{j=1}^{m} Hαj,bj.   (2.14)

Thus, we have L(A|b) = ⋂_{j=1}^{m} Hαj,bj and we can view (2.14) as a geometric interpretation
of (2.13), namely that the solution vectors x are required to lie in the intersection of the
m hyperplanes Hα1 ,b1 , . . . , Hαm ,bm . Even though we know from [Phi19, Th. 8.15] that the
elementary row operations of [Phi19, Def. 8.13] do not change the set of solutions L(A|b),
it might be instructive to reexamine this fact in terms of linear forms and hyperplanes:
The elementary row operation of row switching merely corresponds to changing the order
of the Hαj ,bj in the intersection yielding L(A|b). The elementary row operation of row
multiplication rj 7→ λrj (0 6= λ ∈ F ) does not change L(A|b) due to Hαj ,bj = Hλαj ,λbj
according to Th. 2.22(c). The elementary row operation of row addition rj 7→ rj + λri
2 DUALITY 34

(λ ∈ F , i 6= j) replaces Hαj ,bj by Hαj +λαi ,bj +λbi . We verify, once again, what we already
know form [Phi19, Th. 8.15], namely
 
m
\ m
\ 
L(A|b) = Hαk ,bk = M := 
 H α k ,b k
 ∩ Hα +λα ,b +λb :
 j i j i
k=1 k=1,
k6=j

If x ∈ L(A|b), then (αj + λαi )(x) = bj + λbi , showing x ∈ Hαj +λαi ,bj +λbi and x ∈ M .
Conversely, if x ∈ M , then αj (x) = (αj + λαi )(x) − λαi (x) = bj + λbi − λbi = bj , showing
x ∈ Hαj ,bj and x ∈ L(A|b).

2.4 Dual Maps


Theorem 2.25. Let V, W be vector spaces over the field F . If A ∈ L(V, W ), then there
exists a unique map A′ : W ′ −→ V ′ such that (using the notation of (2.6))

∀ ∀ (A′ β)(v) = hv, A′ βi = hAv, βi = β(Av). (2.15)


v∈V β∈W ′

Moreover, this map turns out to be linear, i.e. A′ ∈ L(W ′ , V ′ ).

Proof. Clearly, given A ∈ L(V, W ), (2.15) uniquely defines a map A′ : W ′ −→ V ′ (for


each β ∈ W ′ , (2.15) defines the map (A′ β) = β ◦ A ∈ L(V, F ) = V ′ ). It merely remains
to check that A′ is linear. To this end, let β, β1 , β2 ∈ W ′ , λ ∈ F , and v ∈ V . Then

A′ (β1 + β2 ) (v) = (β1 + β2 )(Av) = β1 (Av) + β2 (Av) = (A′ β1 )(v) + (A′ β1 )(v)


= (A′ β1 + A′ β2 )(v),
A′ (λβ) (v) = (λβ)(Av) = λ(A′ β)(v),


showing A′ (β1 + β2 ) = A′ β1 + A′ β2 , A′ (λβ) = λA′ (β), and the linearity of A′ . 

Definition 2.26. Let V, W be vector spaces over the field F , A ∈ L(V, W ). Then the
map A′ ∈ L(W ′ , V ′ ) given by Th. 2.26 is called the dual map corresponding to A (or
the transpose of A).

Theorem 2.27. Let F be a field, m, n ∈ N. Let V, W be finite-dimensional vector spaces


over F , dim V = n, dim W = m, where BV := (v1 , . . . , vn ) is an ordered basis of V and
BW := (w1 , . . . , wm ) is an ordered basis of W . Moreover, let BV ′ = (α1 , . . . , αn ) and
BW ′ = (β1 , . . . , βm ) be the corresponding (ordered) dual bases of V ′ and W ′ , respectively.
2 DUALITY 35

If A ∈ L(V, W ) and (aji ) ∈ M(m, n, F ) is the matrix corresponding to A with respect


to BV and BW , then the transpose of (aji ), i.e.

(atji )(j,i)∈{1,...,n}×{1,...,m} ∈ M(n, m, F ), where ∀ atji := aij


(j,i)∈{1,...,n}×{1,...,m}

is the matrix corresponding to A′ with respect to BW ′ and BV ′ .

Proof. If (aji ) is the matrix corresponding to A with respect to BV and BW , then (cf.
[Phi19, Th. 7.10(b)])
Xm
∀ Avi = aji wj , (2.16)
i∈{1,...,n}
j=1

and we have to show


n
X n
X

∀ A βj = atij αi , = aji αi . (2.17)
j∈{1,...,m}
i=1 i=1

Indeed, one computes, for each j ∈ {1, . . . , m} and for each k ∈ {1, . . . , n},
m
! m m
X X X
(A′ βj )vk = βj (Avk ) = βj alk wl = alk βj (wl ) = alk δjl = ajk
l=1 l=1 l=1
n n n
!
X X X
= aji δik = aji αi (vk ) = aji αi vk ,
i=1 i=1 i=1

thereby proving (2.17). 

Remark 2.28. As in Th. 2.27, let m, n ∈ N, let V, W be finite-dimensional vector spaces


over the field F , dim V = n, dim W = m, where BV := (v1 , . . . , vn ) is an ordered basis
of V and BW := (w1 , . . . , wm ) is an ordered basis of W with corresponding (ordered)
dual bases BV ′ = (α1 , . . . , αn ) of V ′ and BW ′ = (β1 , . . . , βm ) of W ′ . Let A ∈ L(V, W )
with dual map A′ ∈ L(W ′ , V ′ ).

(a) If (aji ) ∈ M(m, n, F ) is the matrix corresponding to A with respect to BV and BW


and if one represents elements of Pthe duals as column vectors, then,
Pnaccording to′
m ′ ′
Th. 2.27, one obtains, for γ = i=1 γi βi ∈ W and ǫ := A (γ) = i=1 ǫi αi ∈ V
with γ1 , . . . , γm , ǫ1 , . . . , ǫn ∈ F ,
   
ǫ1 γ1
 ..  t  .. 
 .  = (aji )  .  .
ǫn γm
2 DUALITY 36

However, if one adopts the convention of Not. 2.7 to represent elements of the duals
as row vectors, then one applies transposes in the above equation to obtain
 
ǫ1 . . . ǫn = γ1 . . . γm (aji ),
showing that this notation allows A and A′ to be represented by the same matrix
(aji ).
(b) As in [Phi19, Th. 7.14] and Rem. 2.8(b) above, we now consider basis transitions
n
X m
X
∀ ṽi = cji vj , ∀ w̃i = fji wj , ,
i∈{1,...,n} i∈{1,...,m}
j=1 j=1

B̃V := (ṽ1 , . . . , ṽn ), B̃W := (w̃1 , . . . , w̃m ). We then know from [Phi19, Th. 7.14] that
the matrix representing A with respect to B̃V and B̃W is (fji )−1 (aji )(cji ). Thus, ac-
cording to Th. 2.26, the matrix representing A′ with respect to the dual bases B̃V′ =

t
(α̃1 , . . . , α̃n ) and B̃W = (β̃1 , . . . , β̃m ) is (fji )−1 (aji )(cji ) = (cji )t (aji )t ((fji )−1 )t .
Of course, we can, alternatively, observe that, by Rem. 2.8(b), the basis transi-
tion from BV ′ to B̃V′ is given by ((cji )−1 )t and the basis transition from BW ′ to

B̃W is given by ((fji )−1 )t and compute the matrix representing A′ with respect
to B̃V′ and B̃W ′
via Th.Pm2.26 and [Phi19, Th. 7.14] to obtain (cji )t (aji )t ((fji )−1 )t ,
′ ′
P n ′
as before. If γ = i=1 γi β̃i ∈ W and ǫ := A (γ) = i=1 ǫi α̃i ∈ V with
γ1 , . . . , γm , ǫ1 , . . . , ǫn ∈ F , then this yields
ǫ1 . . . ǫn = γ1 . . . γm (fji )−1 (aji )(cji ).
 

(c) Comparing with [Phi19, Rem. 7.24], we observe that the dual map A′ ∈ L(W ′ , V ′ )
is precisely the transpose map At of the map A considered in [Phi19, Rem. 7.24].
Moreover, as a consequence of Th. 2.26, the rows of the matrix (aji ), representing
A, span Im A′ in the same way that the columns of (aji ) span Im A.
Theorem 2.29. Let V, W be vector spaces over the field F .

(a) The duality map ′ : L(V, W ) −→ L(W ′ , V ′ ), A 7→ A′ , is linear.


(b) If X is another vector space over F , A ∈ L(V, W ) and B ∈ L(W, X), then
(BA)′ = A′ B ′ .

Proof. (a): Let A, B ∈ L(V, W ), λ ∈ F , β ∈ W ′ , and v ∈ V . Then we compute


(A + B)′ (β)(v) = β (A + B)(v) = β(Av + Bv) = β(Av) + β(Bv)


= (A′ β)(v) + (B ′ β)(v) = (A′ + B ′ )(β)(v),


(λA)′ (β)(v) = β (λA)(v) = λβ(Av) = (λA′ )(β)(v),

2 DUALITY 37

showing (A + B)′ = A′ + B ′ , (λA)′ = λA′ , and the linearity of ′ .


(b): If γ ∈ X ′ , then

(B ◦ A)′ (γ) = γ ◦ (B ◦ A) = (γ ◦ B) ◦ A = A′ (γ ◦ B) = A′ (B ′ γ) = (A′ ◦ B ′ )(γ),

showing (B ◦ A)′ = A′ ◦ B ′ . 

Theorem 2.30. Let V, W be vector spaces over the field F and A ∈ L(V, W ).

(a) ker A′ = (Im A)⊥ .

(b) ker A = (Im A′ )⊤ .

(c) A is an epimorphism if, and only if, A′ is a monomorphism.

(d) If A′ is an epimorphism, then A is a monomorphism. If A is a monomorphism and


dim V = n ∈ N, then A′ is an epimorphism.

(e) If A′ is an isomorphism, then A is a isomorphism. If A is a isomorphism and


dim V = n ∈ N, then A′ is an isomorphism.

Proof. (a) is due to the equivalence


 
′ ′
β ∈ ker A ⇔ ∀ β(Av) = (A β)(v) = 0 ⇔ β ∈ (Im A)⊥ .
v∈V

(b): Exercise.
(c): If A is an epimorphism, then Im A = W , implying
(a)
ker A′ = (Im A)⊥ = W ⊥ = {0},

showing A′ to be a monomorphism. Conversely, if A′ is a monomorphism, then ker A′ =


{0}, implying
Th. 2.19(a) ⊤ (a)
Im A = (Im A)⊥ = (ker A′ )⊤ = {0}⊤ = W,

showing A to be an epimorphism.
(d): Exercise.
(e) is now immediate from combining (c) and (d). 
3 SYMMETRIC GROUPS 38

Theorem 2.31. Let V, W be vector spaces over the field F with canonical embeddings
ΦV : V −→ V ′′ and ΦW : W −→ W ′′ according to Def. 2.10(c). Let A ∈ L(V, W ) and
A′′ := (A′ )′ ∈ L(V ′′ , W ′′ ). Then we have

ΦW ◦ A = A′′ ◦ ΦV . (2.18)

Proof. If v ∈ V and β ∈ W ′ , then

(ΦW ◦ A)(v) (β) = β(Av) = (A′ β)(v) = (ΦV v) A′ β = (ΦV v) ◦ A′ (β)


  

= A′′ (ΦV v) (β) = (A′′ ◦ ΦV )(v) (β)


 

proves (2.18). 

3 Symmetric Groups
In preparation for the introduction of the notion of determinant (which we will find
to be a useful tool to further study linear endomorphisms between finite-dimensional
vector spaces), we revisit the symmetric group Sn of [Phi19, Ex. 4.9(b)].

Definition 3.1. Let k, n ∈ N, k ≤ n. A permutation π ∈ Sn is called a k-cycle if, and


only if, there exist k distinct numbers i1 , . . . , ik ∈ {1, . . . , n} such that

ij+1 if i = ij , j ∈ {1, . . . , k − 1},

π(i) = i1 if i = ik , (3.1)

i if i ∈/ {i1 , . . . , ik }.

A 2-cycle is also known as a transposition.

Notation 3.2. Let n ∈ N, π ∈ Sn .

(a) One writes  


i1 i2 ... in
π= ,
π(i1 ) π(i2 ) . . . π(in )
where {i1 , . . . , in } = {1, . . . , n}.

(b) If π is a k-cycle as in (3.1), k ∈ N, then one also writes

π = (i1 i2 . . . ik ). (3.2)
3 SYMMETRIC GROUPS 39

Example 3.3. (a) Consider π ∈ S5 ,


 
5 4 3 2 1
π= = (3 1 5).
3 4 1 2 5

Then
π(1) = 5, π(2) = 2, π(3) = 1, π(4) = 4, π(5) = 3.

(b) Letting π ∈ S5 be as in (a), we have (recalling that the composition on Sn is merely


the usual composition of maps)
  
5 4 3 2 1 1 2 3 4 5
π= = (3 5)(3 1).
4 1 2 5 3 2 3 4 5 1

Lemma 3.4. Let α, k, n ∈ N, α ≤ k ≤ n, and consider distinct numbers i1 , . . . , ik ∈


{1, . . . , n}. Then, in Sn , the following statements hold true:

(a) One has


(i1 i2 . . . ik ) = (iα iα+1 . . . ik i1 . . . iα−1 ).

(b) Let n ≥ 2. If 1 < α < k, then

(i1 i2 . . . ik )(i1 iα ) = (i1 iα+1 . . . ik )(i2 . . . iα );

and, moreover,

(i1 i2 . . . ik )(i1 ik ) = (i2 . . . ik )(i1 ).

(c) Let n ≥ 2. Given β, l ∈ N, β ≤ l ≤ n, and distinct numbers j1 , . . . , jl ∈ {1, . . . , n} \


{i1 , . . . , ik }, one has

(i1 . . . ik )(j1 . . . jl )(i1 j1 ) = (j1 i2 i3 . . . ik i1 j2 j3 . . . jl ).

Proof. Exercise. 

Notation 3.5. Let M, N be sets. Define

S(M, N ) := {(f : M −→ N ) : f bijective}.

Proposition 3.6. Let M, N be sets with #M = #N = n ∈ N0 , S := S(M, N ) (cf. Not.


3.5). Then #S = n!; in particular #SM = n!.
3 SYMMETRIC GROUPS 40

Proof. We conduct the proof via induction: If n = 0, then S contains precisely the
empty map (i.e. the empty set) and #S = 1 = 0! is true. If n = 1 and M = {a},
N = {b}, then S contains precisely the map f : M −→ N , f (a) = b, and #S = 1 = 1!
is true. For the induction step, fix n ∈ N and assume #M = #N = n + 1. Let a ∈ M
and [  
A := S M \ {a}, N \ {b} . (3.3)
b∈N

Since the union in (3.3) is finite and disjoint, one has


 
ind.hyp.
X X
#A = #S M \ {a}, N \ {b} = (n!) = (n + 1) · n! = (n + 1)!.
b∈N b∈N

Thus, it suffices to show

φ : S −→ A, φ(f ) : M \ {a} −→ N \ {f (a)}, φ(f ) := f ↾M \{a} ,

is well-defined and bijective. If f : M −→ N is bijective, then f ↾M \{a} : M \ {a} −→


N \ {f (a)} is bijective as well, i.e. φ is well-defined. Suppose f, g ∈ S with f 6= g. If
f (a) 6= g(a), then φ(f ) 6= φ(g), as they have different ranges. If f (a) = g(a), then there
exists x ∈ M \ {a} with f (x) 6= g(x), implying φ(f )(x) = f (x) 6= g(x) = φ(g)(x), i.e.,
once again, φ(f ) 6= φ(g). Thus, φ is injective. Now let h ∈ S M \ {a}, N \ {b} for some
b ∈ N . Letting (
b for x = a,
f : M −→ N, f (x) :=
h(x) for x 6= a,
we have φ(f ) = h, showing φ to be surjective as well. 

Theorem 3.7. Let n ∈ N.

(a) Each permutation can be decomposed into finitely many disjoint cycles: For each
π ∈ Sn , there exists a decomposition of {1, . . . , n} into disjoint sets A1 , . . . , AN ,
N ∈ N, i.e.
N
[
{1, . . . , n} = Ai and Ai ∩ Aj = ∅ for i 6= j, (3.4)
i=1

such that Ai consists of the distinct elements ai1 , . . . , ai,Ni and

π = (aN 1 . . . aN,NN ) · · · (a11 . . . a1,N1 ). (3.5)

The decomposition (3.5) is unique up to the order of the cycles.


3 SYMMETRIC GROUPS 41

(b) If n ≥ 2, then every permutation π ∈ Sn is the composition of finitely many trans-


positions, where each transposition permutes two juxtaposed elements, i.e.

∀ ∃ ∃ π = τN ◦ · · · ◦ τ1 , (3.6)
π∈Sn N ∈N τ1 ,...,τN ∈T


where T := (i i + 1) : i ∈ {1, . . . , n − 1} .

Proof. (a): We prove the statement by induction on n. For n = 1, there is nothing to


prove. Let n > 1 and choose i ∈ {1, . . . , n}. We claim that
 
k l
∃ π (i) = i ∧ ∀ π (i) 6= i . (3.7)
k∈N l∈{1,...,k−1}

Indeed, since {1, . . . , n} is finite, there must be a smallest k ∈ N such that π k (i) ∈ A1 :=
{i, π(i), . . . , π k−1 (i)}. Since π is bijective, it must be π k (i) = i and (i π(i) . . . , π k−1 (i))
is a k-cycle. We are already done in case k = n. If k < n, then consider B :=
{1, . . . , n} \ A1 . Then, again using the bijectivity of π, π↾B is a permutation onSB with
1 ≤ #B < n. By induction, there are disjoint sets A2 , . . . , AN such that B = N j=2 Aj ,
Aj consists of the distinct elements aj1 , . . . , aj,Nj and

π↾B = (aN 1 . . . aN,NN ) · · · (a21 . . . a2,N2 ).

Since π = (i π(i) . . . , π k−1 (i)) ◦ π ↾B , this finishes the proof of (3.5). If there were
another, different,
SM decomposition of π into cycles, say, given by disjoint sets B1 , . . . , BM ,
{1, . . . , n} = i=1 Bi , M ∈ N, then there were Ai 6= Bj and k ∈ Ai ∩ Bj . But then k
were in the cycle given by Ai and in the cycle given by Bj , implying Ai = {π l (k) : l ∈
N} = Bj , in contradiction to Ai 6= Bj .
(b): We first show that every π ∈ Sn is a composition of finitely many transpositions
(not necessarily transpositions from the set T ): According to (a), it suffices to show
that every cycle is a composition of finitely many transpositions. Since each 1-cycle is
the identity, it is (i) = Id = (1 2) (1 2) for each i ∈ {1, . . . , n}. If (i1 . . . ik ) is a k-cycle,
k ∈ {2, . . . , n}, then

(i1 . . . ik ) = (i1 i2 ) (i2 i3 ) · · · (ik−1 ik ) : (3.8)

Indeed,

 i1
 for i = ik ,
∀ (i1 i2 ) (i2 i3 ) · · · (ik−1 ik )(i) = il+1 for i = il , l ∈ {1, . . . , k − 1},
i∈{1,...,n} 
i for i ∈
/ {i1 , . . . , ik },

3 SYMMETRIC GROUPS 42

proving (3.8). To finish the proof of (b), we observe that every transposition is a
composition of finitely many elements of T : If i, j ∈ {1, . . . , n}, i < j, then
(i j) = (i i + 1) · · · (j − 2 j − 1)(j − 1 j) · · · (i + 1 i + 2)(i i + 1) : (3.9)
Indeed,
∀ (i i + 1) · · · (j − 2 j − 1)(j − 1 j) · · · (i + 1 i + 2)(i i + 1)(k)
k∈{1,...,n}


 j for k = i,

i for k = j,
=


 k for i < k < j,
k for k ∈ / {i, i + 1, . . . , j},

proving (3.9). 
Remark 3.8. Let n ∈ N, n ≥ 2. According to Th. 3.7(a), each π ∈ Sn has a unique
decomposition into N ∈ N cycles as in (3.5). If τ ∈ Sn is a transposition, then, as
a consequence of Lem. 3.4(a),(b),(c), the corresponding cycle decomposition of πτ has
precisely N + 1 cycles (if Lem. 3.4(b) applies) or precisely N − 1 cycles (if Lem. 3.4(c)
applies).
Definition 3.9. Let k ∈ Z. We call the integer k even if, and only if, k ≡ 0 (mod 2)
(cf. notation introduced in [Phi19, Ex. 4.28(a)]; we call k odd if, and only if, k ≡ 1 (mod
2) (i.e. k is even if, and only if, 2 is a divisor of k; k is odd if, and only if, 2 is no divisor
of k, cf. [Phi19, Def. D.2(a)]). The property of being even or odd is called the parity of
the integer k.
Example 3.10. Let Id ∈ S3 be the identity on {1, 2, 3}. Then
Id = (1)(2)(3) = (1 2)(1 2) = (3 2)(3 2) = (1 2)(1 2)(3 2)(3 2),
(1 2 3) = (1 2)(2 3) = (2 3)(1 3) = (1 2)(1 3)(2 3)(1 2),
illustrating that, for n ≥ 2, one can write elements π of Sn as products of transpositions
with varying numbers of factors. However, while the number k ∈ N0 of factors in such
products representing π is not unique, we will prove in the following Th. 3.11(a) that
the parity of k is uniquely determined by π ∈ Sn .
Theorem 3.11. Let n ∈ N.

(a) Let n ≥ 2, π ∈ Sn , N ∈ N, k ∈ N0 . If
k
Y
π= hi (3.10)
i=1
3 SYMMETRIC GROUPS 43

with transpositions h1 , . . . , hk ∈ Sn and π is decomposed into N cycles according to


Th. 3.7(a), then
k ≡ n − N (mod 2), (3.11)
i.e. the parity of k is uniquely determined by π.
(b) The map
(
1 for n = 1,
sgn : Sn −→ {−1, 1}, sgn(π) :=
k (a)
(−1) = (−1)n−N (mod 2) for n ≥ 2,
where, for n ≥ 2, k = k(π) and N = N (π) are as in (a), constitutes a group
epimorphism (here, {−1, 1} is considered as a multiplicative subgroup of R (or Q)
– as we know all groups with two elements to be isomorphic, we have ({−1, 1}, ·) ∼
=
(Z2 , +)). One calls sgn(π) the sign, signature, or signum of the permutation π.

Proof. (a): We conduct the proof via induction on k: For k = 0, the product in (3.10)
is empty, i.e.
π = Id = (1) · · · (n),
yielding N = n, showing (3.11) to hold. If k = 1, then π = h1 is a transposition and,
thus, has N = n − 1 cycles, showing (3.11) to holdonce again.
 Now assume (3.11) for
Qk+1 Qk
k ≥ 1 by induction and consider π = i=1 hi = i=1 hi hk+1 . Thus, π = πk hk+1 ,
Qk
where πk := i=1 hi . If πk has Nk cycles, then, by induction,
k ≡ n − Nk (mod 2). (3.12)
Moreover, from Rem. 3.8 we know N = Nk + 1 or N = Nk − 1. In both cases, (3.12)
implies k + 1 ≡ n − N (mod 2), completing the induction.
(b): For n = 1, there is nothing to prove. For n ≥ 2, we first note sgn to be well-defined,
as the number of cycles N (π) is uniquely determined by π ∈ Sn (and each π ∈ Sn can
be written as a product of transpositions by Th. 3.7(b)). Next, we note sgn to be
surjective, since, for the identity, we can choose k = 0, i.e. sgn(Id) = (−1)0 = 1, and, for
each transposition τ ∈ Sn (as n ≥ 2, Sn contains at least the transposition τ = (1 2)),
we can choose k = 1, i.e. sgn(τ ) = (−1)1 = −1. To verify sgn to be a homomorphism,
let π, σ ∈ Sn . By Th. 3.7(b), there are transpositions τ1 , . . . , τkπ , h1 , . . . , hkσ ∈ Sn such
that ! k !

Y kσ
Y Ykπ Y σ

π= τi , σ = hi ⇒ πσ = τi hi ,
i=1 i=1 i=1 i=1
implying
sgn(πσ) = (−1)kπ +kσ = (−1)kπ (−1)kσ = sgn(π) sgn(σ),
thus, completing the proof that sgn constitutes a homomorphism. 
3 SYMMETRIC GROUPS 44

Proposition 3.12. Let n ∈ N.

(a) One has


Y π(i) − π(j)
∀ sgn(π) = . (3.13)
π∈Sn
1≤i<j≤n
i−j

(b) Let n ≥ 2, π ∈ Sn . As in Th. 3.7(a), we write π as a product of N ∈ N cycles,


N
(3.5) Y
π = (aN 1 . . . aN,NN ) · · · (a11 . . . a1,N1 ) = γi , (3.14)
i=1

where, for each i ∈ {1, . . . , N }, γi := (ai1 . . . ai,Ni ) is a cycle of length Ni ∈ N.


Then PN
sgn(π) = (−1) i=1 (Ni −1) . (3.15)

Proof. (a): For each π ∈ Sn , let σ(π) denote the value given by the right-hand side of
(3.13). If n = 1, then Sn = {Id} and σ(Id) = 1 = sgn(Id), since the product in (3.13) is
empty (and, thus, equal to 1). For n ≥ 2, we first show

∀ σ(π1 ◦ π2 ) = σ(π1 )σ(π2 ) :


π1 ,π2 ∈Sn

For each π1 , π2 ∈ Sn , one computes


Y π1 (π2 (i)) − π1 (π2 (j))
σ(π1 ◦ π2 ) =
1≤i<j≤n
i−j
 
Y π1 (π2 (i)) − π1 (π2 (j)) π2 (i) − π2 (j)
= ·
1≤i<j≤n
π 2 (i) − π 2 (j) i−j
π2 bij.
= σ(π1 )σ(π2 ),

thereby establishing the case. Next, if τ ∈ Sn is a transposition, then there exist elements
i, j ∈ {1, . . . , n} such that i < j and τ = (i j). Thus,

τ (i) − τ (j) j−i


σ(τ ) = = = −1
i−j i−j
holds for each transposition τ . In consequence, if π ∈ Sn is the composition of k ∈ N
transpositions, then
Th. 3.11(b)
σ(π) = (−1)k = sgn(π),
proving (a).
3 SYMMETRIC GROUPS 45

(b): Using (3.14) together with the homomorphism property of sgn given by Th. 3.11(b),
if suffices to show that
sgn(γ) = (−1)k−1 (3.16)
holds for each cycle γ := (i1 . . . ik ) ∈ Sn , where i1 , . . . , ik are distinct elements of
{1, . . . , n}, k ∈ N. According to (3.8), we have
γ = (i1 . . . ik ) = (i1 i2 ) (i2 i3 ) · · · (ik−1 ik ),
showing γ to be the product of k − 1 transpositions, thereby proving (3.16) and the
proposition. 
Definition 3.13. Let n ∈ N, n ≥ 2, and let sgn : Sn −→ {1, −1} be the group
homomorphism defined in Th. 3.11(b) above. We call π ∈ Sn even if, and only if,
sgn(π) = 1; we call π odd if, and only if, sgn(π) = −1. The property of being even or
odd is called the parity of the permutation π. Moreover, we call
An := ker sgn = {π ∈ Sn : sgn(π) = 1}
the alternating group on {1, . . . , n}.
Proposition 3.14. Let n ∈ N, n ≥ 2.

(a) An is a normal subgroup of Sn and one has


= Im sgn = {1, −1} ∼
Sn /An ∼ = Z2 (3.17)
(where {1, −1} is considered with multiplication and Z2 = {0, 1} is considered with
addition modulo 2).
(b) For each transposition τ ∈ Sn , one has τ ∈ / An , Sn = (An τ ) ∪˙ An , where we recall
˙
that ∪ denotes a disjoint union and An τ denotes the coset {πτ : π ∈ An }. Moreover,
#An = #(An τ ) = (n!)/2.

Proof. (a): As the kernel of a homomorphism, An is a normal subgroup by [Phi19,


Ex. 4.25(a)] and, thus, (3.17) is immediate from the isomorphism theorem [Phi19, Th.
4.27(b)].
(b): Let τ ∈ Sn be a transposition. Since sgn(τ ) = (−1)1 = −1, τ ∈/ An . Moreover, if
π ∈ An , then sgn(πτ ) = sgn(π) sgn(τ ) = 1 · (−1) = −1, showing (An τ ) ∩ An = ∅. Let
π ∈ Sn \ An . Then
sgn(π) = −1 ⇒ sgn(πτ ) = 1 ⇒ πτ ∈ An ⇒ π = πτ τ ∈ An τ,
showing Sn = (An τ ) ∪ An . To prove #An = #(An τ ), we note the maps φ : An −→ An τ ,
φ(π) := πτ , φ−1 : An τ −→ An , φ−1 (π) := πτ , to be inverses of each other and, thus,
bijective. Moreover, as we know #Sn = n! from Prop. 3.6, we have n! = #Sn =
#An + #(An τ ) = 2 · (#An ), thereby completing the proof. 
4 MULTILINEAR MAPS AND DETERMINANTS 46

4 Multilinear Maps and Determinants


Employing the preparations of the previous section, we are now in a position to formulate
an ad hoc definition of the notion of determinant. However, it seems more instructive to
first study some more general related notions that embed determinants into a context
that is also of independent interest and importance. Determinants are actually rather
versatile objects: We will see below that, given a finite-dimensional vector space V over
a field F , they can be viewed as maps det : L(V, V ) −→ F , assigning numbers to linear
endomorphisms. However, they can also be viewed as functions det : M(n, F ) −→ F ,
2
assigning numbers to quadratic matrices, and also as polynomial functions det : F n −→
F of degree n in n2 variables. But the point of view, we will focus on first, is determinants
as (alternating) multilinear forms (i.e. F -valued maps) det : V n −→ F .
We start with a general introduction to multilinear maps, which have many important
applications, not only in Algebra, but also in Analysis, where they, e.g., occur in the
form of higher order total derivatives of maps from Rn to Rm (cf. [Phi16b, Sec. 4.6])
and when studying the integration of so-called differential forms (see, e.g., [For17, §19]
and [Kön04, Sec. 13.1]).

4.1 Multilinear Maps


Definition 4.1. Let V and W be vector spaces over the field F , α ∈ N. We call a map
L : V α −→ W (4.1)
multilinear (more precisely, α times linear, bilinear for α = 2) if, and only if, it is linear
in each component, i.e., for each x1 , . . . , xi−1 , xi+1 , . . . , xα , v, w ∈ V , i ∈ {1, . . . , α} and
each λ, µ ∈ F :
L(x1 , . . . , xi−1 , λv + µw, xi+1 , . . . , xα )
= λ L(x1 , . . . , xi−1 , v, xi+1 , . . . , xα ) + µ L(x1 , . . . , xi−1 , w, xi+1 , . . . , xα ), (4.2)
where we note that, here, and in the following, the superscripts merely denote upper
indices and not exponentiation. We denote the set of all α times linear maps from V α
into W by Lα (V, W ). We also set L0 (V, W ) := W . In extension of Def. 2.2(a), we also
call L ∈ Lα (V, F ) a multilinear form, a bilinear form for α = 2.
Remark 4.2. In the situation of Def. 4.1, each Lα (V, W ), α ∈ N0 , constitutes a vector
space over F : It is a subspace of the vector space over F of all functions from V α into W ,
since, clearly, if K, L : V α −→ W are both α times linear and λ, µ ∈ F , then λK + µL
is also α times linear.

4 MULTILINEAR MAPS AND DETERMINANTS 47

The following Th. 4.3 is in generalization of [Phi19, Th. 6.6].

Theorem 4.3. Let V and W be vector spaces over the field F , α ∈ N. Moreover, let
B be a basis of V . Then each α times linear map L ∈ Lα (V, W ) is uniquely determined
by its values on B α : More precisely, if (wb )b∈B α is a family in W , and, for each v ∈ V ,
Bv and cv : Bv −→ F \ {0} are as in [Phi19, Th. 5.19] (providing the coordinates of v
with respect to B in the usual way), then the map
 
X X
L : V α −→ W, L(v 1 , . . . , v α ) = L  cv1 (b1 ) b1 , . . . , cvα (bα ) bα 
b1 ∈Bv1 bα ∈Bvα
X
:= cv1 (b1 ) · · · cvα (bα ) w(b1 ,...,bα ) , (4.3)
(b1 ,...,bα )∈Bv1 ×···×Bvα

is α times linear, and L̃ ∈ Lα (V, W ) with

∀ L̃(b) = wb , (4.4)
b∈B α

implies L = L̃.

Proof. Exercise: Apart from the more elaborate notation, everything works as in the
proof of [Phi19, Th. 6.6]. 

The following Th. 4.4 is in generalization of [Phi19, Th. 6.19].

Theorem 4.4. Let V and W be vector spaces over the field F , let BV and BW be bases
of V and W , respectively, let α ∈ N. Given b1 , . . . , bα ∈ BV and b ∈ BW , and using Th.
4.3, define maps Lb1 ,...,bα ,b ∈ Lα (V, W ) by letting
(
b for (b̃1 , . . . , b̃α ) = (b1 , . . . , bα ),
Lb1 ,...,bα ,b (b̃1 , . . . , b̃α ) := (4.5)
0 otherwise.

Let B := {Lb1 ,...,bα ,b : (b1 , . . . , bα ) ∈ (BV )α , b ∈ BW }.

(a) B is linearly independent.

(b) If V is finite-dimensional, dim V = n ∈ N, BV = {b1 , . . . , bn }, then B constitutes a


basis for Lα (V, W ). If, in addition, dim W = m ∈ N, BW = {w1 , . . . , wm }, then we
can write
dim Lα (V, W ) = (dim V )α · dim W = nα · m. (4.6)
4 MULTILINEAR MAPS AND DETERMINANTS 48

(c) If dim V = ∞ and dim W ≥ 1, then hBi ( Lα (V, W ) and, in particular, B is not a
basis of Lα (V, W ).

Proof. (a): We verify that the elements of B are linearly independent: Let M, N ∈ N.
Let (b11 , . . . , bα1 ), . . . , (b1N , . . . , bαN ) ∈ (BV )α be distinct and let w1 , . . . , wM ∈ BW be
distinct as well. Assume λlk ∈ F to be such that
M X
X N
L := λlk Lb1k ,...,bαk ,wl = 0.
l=1 k=1

Let k̄ ∈ {1, . . . , N }. Then


X N
M X M
X
0= L(b1k̄ , . . . , bαk̄ ) = λlk Lb1k ,...,bαk ,wl (b1k̄ , . . . , bαk̄ ) = λlk̄ wl
l=1 k=1 l=1

implies λ1k̄ = · · · = λM k̄ = 0 due to the linear independence of the wl ∈ BW . As this


holds for each k̄ ∈ {1, . . . , N }, we have established the linear independence of B.
(b): According to (a), it remains to show hBi = Lα (V, W ). Let L ∈ Lα (V, W )
and (i1 , . . . , iα ) ∈ {1, P . . . , n}α . Then there exists a finite set B(i1 ,...,iα ) ⊆ BW such
that L(bi1 , . . . , biα ) = w∈B(i1 ,...,iα ) λw w with λw ∈ F . Now let w1 , . . . , wM , M ∈ N,
S
be an enumeration of the finite set I∈{1,...,n}α BI . Then there exist λj,(i1 ,...,iα ) ∈ F ,
(j, (i1 , . . . , iα )) ∈ {1, . . . , M } × {1, . . . , n}α , such that
M
X
i1 iα
∀ L(b , . . . , b ) = λj,(i1 ,...,iα ) wj .
(i1 ,...,iα )∈{1,...,n}α
j=1
PM P
Letting L̃ := j=1 (i1 ,...,iα )∈{1,...,n}α λj,(i1 ,...,iα ) Lbi1 ,...,biα ,wj , we claim L̃ = L. Indeed,
M
X X
j1 jα
∀ L̃(b , . . . , b ) = λj,(i1 ,...,iα ) Lbi1 ,...,biα ,wj (bj1 , . . . , bjα )
(j1 ,...,jα )∈{1,...,n}α
j=1 (i1 ,...,iα )∈{1,...,n}α
M
X X
= λj,(i1 ,...,iα ) δ(i1 ,...,iα ),(j1 ,...,jα ) wj
j=1 (i1 ,...,iα )∈{1,...,n}α
M
X
= λj,(j1 ,...,jα ) wj = L(bj1 , . . . , bjα ),
j=1

proving L̃ = L by Th. 4.3. Since L ∈ hBi, the proof of (b) is complete.


(c): As dim W ≥ 1, there exists w ∈ BW . If L ∈ hBi, then {b ∈ (BV )α : L(b) 6= 0}
is finite. Thus, if BV is infinite, then the map L ∈ Lα (V, W ) with L(b) := w for each
b ∈ (BV )α is not in hBi, proving (c). 
4 MULTILINEAR MAPS AND DETERMINANTS 49

4.2 Alternating Multilinear Maps and Determinants


Definition 4.5. Let V and W be vector spaces over the field F , α ∈ N. Then A ∈
Lα (V, W ) is called alternating if, and only if,
 
i j
1
∀α α ∃ v =v ⇒ A(v 1 , . . . , v α ) = 0. (4.7)
(v ,...,v )∈V i,j∈{1,...,α}, i6=j

Moreover, define the sets

Alt0 (V, W ) := W, Altα (V, W ) := A ∈ Lα (V, W ) : A alternating




(note that this immediately yields Alt1 (V, W ) = L(V, W )).

Remark 4.6. Let V and W be vector spaces over the field F , α ∈ N. Then Altα (V, W )
is a vector subspace of Lα (V, W ): Indeed, 0 ∈ Altα (V, W ) and, if λ ∈ F and A, B ∈
Lα (V, W ) satisfy (4.7), then λA and A + B satisfy (4.7) as well.
α
Notation 4.7. Let V, W be a sets and α ∈ N. Then, for each f ∈ F(V α , W ) = W V
and each permutation π ∈ Sα , define

(πf ) : V α −→ W, (πf )(v1 , . . . , vα ) := f (vπ(1) , . . . , vπ(α) ).

Lemma 4.8. Let V, W be a sets and α ∈ N. Then

∀ ∀ (π1 π2 )f = π1 (π2 f ). (4.8)


π1 ,π2 ∈Sα f ∈F (V α ,W )

Proof. For each (v1 , . . . , vα ) ∈ V α , let (w1 , . . . , wα ) := (vπ1 (1) , . . . , vπ1 (α) ) ∈ V α . Then,
for each i ∈ {1, . . . , α}, we have wi = vπ1 (i) and wπ2 (i) = vπ1 (π2 (i)) . Thus, we compute

(π1 π2 )f (v1 , . . . , vα ) = f (v(π1 π2 )(1) , . . . , v(π1 π2 )(α) ) = f (vπ1 (π2 (1)) , . . . , vπ1 (π2 (α)) )
= f (wπ2 (1) , . . . , wπ2 (α) ) = (π2 f )(w1 , . . . , wα )

= (π2 f )(vπ1 (1) , . . . , vπ1 (α) ) = π1 (π2 f ) (v1 , . . . , vα ),

thereby establishing (4.8). 

Proposition 4.9. Let V and W be vector spaces over the field F , α ∈ N. Then, given
A ∈ Lα (V, W ), the following statements are equivalent for char F 6= 2, where “(i) ⇒ (ii)”
also holds for char F = 2.

(i) A is alternating.
4 MULTILINEAR MAPS AND DETERMINANTS 50

(ii) For each permutation π ∈ Sα , one has

∀ A(v π(1) , . . . , v π(α) ) = sgn(π) A(v 1 , . . . , v α ). (4.9)


(v 1 ,...,v α )∈V α

Proof. “(i)⇒(ii)”: We first prove (4.9) for transpositions π = (i i+1) with i ∈ {1, . . . , α−
1}: Let v k ∈ V , k ∈ {1, . . . , α} \ {i, i + 1}, and define

B : V × V −→ W, B(v, w) := A(v 1 , . . . , v i−1 , v, w, v i+2 , . . . , v α ).

Then, as A is α times linear and alternating, B is bilinear and alternating. Thus, for
each v, w ∈ V ,

0 = B(v + w, v + w) = B(v, v) + B(v, w) + B(w, v) + B(w, w) = B(v, w) + B(w, v),

implying B(v, w) = −B(w, v), proving (4.9) for this case. For general  π ∈ Sα , (4.9)
now follows from Th. 3.7(b), Th. 3.11(b), and Lem. 4.8: Let T := (i i + 1) ∈ Sα : i ∈
{1, . . . , α − 1} . Then, given π ∈ Sα , Th. 3.7(b) implies the existence of π1 , . . . , πN ∈ T ,
N ∈ N, such that π = π1 · · · πN . Thus, for each (v 1 , . . . , v α ) ∈ V α ,
 
Lem. 4.8
A(v π(1) , . . . , v π(α) ) = (πA)(v 1 , . . . , v α ) = π1 . . . (πN A) (v 1 , . . . , v α )
Th. 3.11(b)
= (−1)N A(v 1 , . . . , v α ) = = sgn(π) A(v 1 , . . . , v α ),

proving (4.9).
“(ii)⇒(i)”: Let char F = 6 2. Let (v 1 , . . . , v α ) ∈ V α and suppose i, j ∈ {1, . . . , α} are
such that i 6= j as well as v i = v j . Then
(ii)
A(v 1 , . . . , v α ) = sgn(ij)A(v 1 , . . . , v α ) = −A(v 1 , . . . , v α ).

Thus, 2A(v 1 , . . . , v α ) = 0 and 2 6= 0 implies A(v 1 , . . . , v α ) = 0. 

The following Ex. 4.10 shows that “(i) ⇐ (ii)” can not be expected to hold in Prop. 4.9
for char F = 2:

Example 4.10. Let F := Z2 = {0, 1}, V := F . Consider the bilinear map A : V 2 −→


F , A(λ, µ) := λµ. Then A is not alternating, since A(1, 1) = 1 · 1 = 1 6= 0. However,
(4.9) does hold for A (due to 1 = −1 ∈ F ): A(1, 1) = 1 = −1 = −A(1, 1) (and
0 = A(0, 0) = −A(0, 0) = A(0, 1) = −A(1, 0)).

Proposition 4.11. Let V and W be vector spaces over the field F , α ∈ N. Then, given
A ∈ Lα (V, W ), the following statements are equivalent:
4 MULTILINEAR MAPS AND DETERMINANTS 51

(i) A is alternating.

(ii) The implication in (4.7) holds whenever j = i + 1 (i.e. whenever two juxtaposed
arguments are identical).

(iii) If the family (v 1 , . . . , v α ) in V is linearly dependent, then A(v 1 , . . . , v α ) = 0.

Proof. Exercise. 

Definition 4.12. Let F be a field and V := F α , α ∈ N. Then det ∈ Altα (V, F ) is called
a determinant if, and only if,
det(e1 , . . . , eα ) = 1,
where e1 , . . . , eα denote the standard basis vectors of V .

Definition 4.13. Let V be a vector space over the field F , α ∈ N. We define the map
Λα : (V ′ )α −→ Altα (V, F ), called outer product or wedge product of linear forms, as
follows: X
Λα (ω1 , . . . , ωα )(v 1 , . . . , v α ) := sgn(π) ωπ(1) (v 1 ) · · · ωπ(α) (v α ) (4.10)
π∈Sα

(cf. Th. 4.15 below). Given linear forms ω1 , . . . , ωα ∈ V ′ , it is common to also use the
notation
ω1 ∧ · · · ∧ ωα := Λα (ω1 , . . . , ωα ).

Lemma 4.14. Let V be a vector space over the field F , α ∈ N. Moreover, let Λα denote
the wedge product of Def. 4.13. Then one has
X
∀ ′ 1 ∀α Λα (ω1 , . . . , ωα )(v 1 , . . . , v α ) := sgn(π) ω1 (v π(1) ) · · · ωα (v π(α) ).
ω1 ,...,ωα ∈V v ,...,v ∈V
π∈Sα
(4.11)

Proof. Using the commutativity of multiplication in F , the bijectivity of permutations,


as well as sgn(π) = sgn(π −1 ), we obtain
X
Λα (ω1 , . . . , ωα )(v 1 , . . . , v α ) = sgn(π) ωπ(1) (v 1 ) · · · ωπ(α) (v α )
π∈Sα
−1 (1) −1 (α)
X
= sgn(π) ω1 (v π ) · · · ωα (v π )
π∈Sα
X
= sgn(π) ω1 (v π(1) ) · · · ωα (v π(α) ),
π∈Sα

proving (4.11). 
4 MULTILINEAR MAPS AND DETERMINANTS 52

Theorem 4.15. Let V be a vector space over the field F , α ∈ N. Moreover, let Λα
denote the wedge product of Def. 4.13.

(a) Λα is well-defined, i.e. it does, indeed, map (V ′ )α into Altα (V, F ).


(b) Λα ∈ Altα V ′ , Altα (V, F ) .


Proof. (a): Let ω1 , . . . , ωα ∈ V ′ and A := Λα (ω1 , . . . , ωα ). First, we show A to be α


times linear: To this end, let λ, µ ∈ F and x, v 1 , . . . , v α ∈ V . Then
A(v 1 , . . . , v i−1 , λv i + µx, v i+1 , . . . , v α )
X
sgn(π) ωπ(1) (v 1 ) · · · ωπ(i−1) (v i−1 ) λωπ(i) (v i ) + µωπ(i) (x) ωπ(i+1) (v i+1 ) . . . ωπ(α) (v α )

=
π∈Sα
X
=λ sgn(π) ωπ(1) (v 1 ) · · · ωπ(i−1) (v i−1 )ωπ(i) (v i )ωπ(i+1) (v i+1 ) . . . ωπ(α) (v α )
π∈Sα
X
+µ sgn(π) ωπ(1) (v 1 ) · · · ωπ(i−1) (v i−1 )ωπ(i) (x)ωπ(i+1) (v i+1 ) . . . ωπ(α) (v α )
π∈Sα

= λA(v , . . . , v i−1 , v i , v i+1 , . . . , v α ) + µA(v 1 , . . . , v i−1 , x, v i+1 , . . . , v α ),


1

proving A ∈ Lα (V, F ). It remains to show A is alternating. To this end, let i, j ∈


{1, . . . , α} with i < j. Then, according to Prop. 3.14(b). τ := (i j) ∈ / Aα and Sα =
˙
(Aα τ ) ∪ Aα . If k ∈ {1, . . . , α} \ {i, j}, then, for each π ∈ Sα , (πτ )(k) = π(k). Thus, if
(v 1 , . . . , v α ) ∈ V α is such that v i = v j , then
(4.11) X
A(v 1 , . . . , v α ) = sgn(π) ω1 (v π(1) ) · · · ωα (v π(α) )
π∈Sα
X 
= sgn(π) ω1 (v π(1) ) · · · ωα (v π(α) ) + sgn(πτ ) ω1 (v (πτ )(1) ) · · · ωα (v (πτ )(α) )
π∈Aα
X 
π(1) π(α) π(1) π(α)
= sgn(π) ω1 (v ) · · · ωα (v ) − sgn(π) ω1 (v ) · · · ωα (v ) = 0,
π∈Aα

showing A to be alternating.
(b) follows from (a): According to (a), Λα maps (V ′′ )α into Altα (V ′ , F ). Thus, if
Φ : V −→ V ′′ is the canonical embedding, then, for each v 1 , . . . , v α ∈ V α and each
ω1 , . . . , ωα ∈ V ′ ,
X
Λα (Φv 1 , . . . , Φv α )(ω1 , . . . , ωα ) = sgn(π) (Φv π(1) )(ω1 ) · · · (Φv π(α) )(ωα )
π∈Sα
X
= sgn(π) ω1 (v π(1) ) · · · ωα (v π(α) )
π∈Sα
(4.11)
= Λα (ω1 , . . . , ωα )(v 1 , . . . , v α ),
4 MULTILINEAR MAPS AND DETERMINANTS 53

thereby completing the proof. 


Corollary 4.16. Let F be a field and V := F α , α ∈ N. Given the standard basis
{e1 , . . . , eα } of V , we define a map det : V αP−→ F as follows: For each (v 1 , . . . , v α ) ∈
V α such that, for each j ∈ {1, . . . , α}, v j = αi=1 aji ei with (aji ) ∈ M(α, F ), let
X
det(v 1 , . . . , v α ) := sgn(π) a1π(1) · · · aαπ(α) . (4.12)
π∈Sα

Then det is a determinant according to Def. 4.12. Moreover,


X
det(v 1 , . . . , v α ) = sgn(π) aπ(1)1 · · · aπ(α)α (4.13)
π∈Sα

also holds.

Proof. For each i ∈ {1, . . . , α}, let ωi := πei : V −→ F be the projection onto the
coordinate with respect to ei according to Ex. 2.1(a). Then, if v j is as above, we have
ωi (v j ) = aji . Thus,
(4.10) X (4.12)
Λα (ω1 , . . . , ωα )(v 1 , . . . , v α ) = sgn(π) ωπ(1) (v 1 ) · · · ωπ(α) (v α ) = det(v 1 , . . . , v α ),
π∈Sα

showing det ∈ Altα (V, F ) by Th. 4.15(a). Then we also have


(4.11) X
det(v 1 , . . . , v α ) = Λα (ω1 , . . . , ωα )(v 1 , . . . , v α ) = sgn(π) ω1 (v π(1) ) · · · ωα (v π(α) )
π∈Sα
X
= sgn(π) aπ(1)1 · · · aπ(α)α ,
π∈Sα

proving (4.13). Moreover, for (v 1 , . . . , v α ) = (e1 , . . . , eα ), (aji ) = (δji ) holds. Thus,


X
det(e1 , . . . , eα ) = sgn(π) δ1π(1) · · · δαπ(α) = sgn(Id) = 1,
π∈Sα

which completes the proof. 

According to Th. 4.17(b) below, the map defined by (4.12) is the only determinant in
Altα (F α , F ).
Theorem 4.17. Let V and W be vector spaces over the field F , let BV and BW be bases
of V and W , respectively, let α ∈ N. Moreover, as in Cor. 2.4, let B ′ := {ωb : b ∈ BV },
where  
∀ ωb ∈ V ′ , ωb (a) := δba . (4.14)
(b,a)∈BV ×BV
4 MULTILINEAR MAPS AND DETERMINANTS 54

In addition, assume < to be a strict total order on BV (for dim V = ∞, the order on
BV exists due to the axiom of choice, cf. [Phi19, Th. A.52(iv)]) and define
(BV )αord := (b1 , . . . , bα ) ∈ (BV )α : b1 < · · · < bα .


Given (b1 , . . . , bα ) ∈ (BV )αord and y ∈ BW , and using Th. 4.15(a), define maps
Ab1 ,...,bα ,y ∈ Altα (V, W ), Ab1 ,...,bα ,y := Λα (ωb1 , . . . , ωbα ) y. (4.15)
: y ∈ BW , (b1 , . . . , bα ) ∈ (BV )αord .

Let B := Ab1 ,...,bα ,y

(a) B is linearly independent.


(b) If V is finite-dimensional, dim V = n ∈ N, BV = {b1 < · · · < bn }, then B consti-
tutes a basis for Altα (V, W ) (note B = ∅, dim Altα (V, W ) = 0 for α > n). If, in
addition, dim W = m ∈ N, then we can write
( 
n
α α
· m for α ≤ n,
dim Alt (V, W ) = (4.16)
0 for α > n.

In particular, dim Altα (F α , F ) = 1, showing that the map det ∈ Altα (F α , F ) is


uniquely determined by Def. 4.12 and given by (4.12).
(c) If V is infinite-dimensional and dim W ≥ 1, then B is not a basis for Altα (V, W ).

Proof. (a): We verify that the elements of B are linearly independent: Note that
∀ α
∀ (bπ(1) , . . . , bπ(α) ) ∈
/ (BV )αord . (4.17)
(b1 ,...,bα )∈(B V )ord Id6=π∈Sα

Thus, if
(b1 , . . . , bα ), (c1 , . . . , cα ) ∈ (BV )αord ,
then
X
Λα (ωb1 , . . . , ωbα )(c1 , . . . , cα ) = sgn(π) ωbπ(1) (c1 ) · · · ωbπ(α) (cα )
π∈Sα
(
(4.17),(4.14) 1 for (b1 , . . . , bα ) = (c1 , . . . , cα ),
= (4.18)
0 otherwise.

Now let M, N ∈ N, let (b11 , . . . , bα1 ), . . . , (b1N , . . . , bαN ) ∈ (BV )αord be distinct, and let
y1 , . . . , yM ∈ BW be distinct as well. Assume λlk ∈ F to be such that
M X
X N
A := λlk Ab1k ,...,bαk ,yl = 0.
l=1 k=1
4 MULTILINEAR MAPS AND DETERMINANTS 55

Let k̄ ∈ {1, . . . , N }. Then


N
M X M
X (4.18) X
0 = A(b1k̄ , . . . , bαk̄ ) = λlk Λα (ωb1k , . . . , ωbαk )(b1k̄ , . . . , bαk̄ ) yl = λlk̄ yl
l=1 k=1 l=1

implies λ1k̄ = · · · = λM k̄ = 0 due to the linear independence of the yl ∈ BW . As this


holds for each k̄ ∈ {1, . . . , N }, we have established the linear independence of B.
(b): Note that
 
(4.17) [Phi16a, Prop. 5.18(b)] n
#(BV )αord =

# I ⊆ BV : #I = α = .
α

According to (a), it remains to show hBi = Altα (V, W ). Let A ∈ Altα (V, W ) and
(bi1 , . . . , biα ) ∈ (B α
V )ord . Then there exists a finite set B(i1 ,...,iα ) ⊆ BW such that
i1 iα
P
A(b , . . . , b ) = y∈B(i1 ,...,iα ) λy y with λy ∈ F . Now let y1 , . . . , yM , M ∈ N, be an
enumeration of the finite set
[
B(i1 ,...,iα ) .
(i1 ,...,iα )∈{1,...,n}α : (bi1 ,...,biα )∈(BV )α
ord

Then, for each (j, (bi1 , . . . , biα )) ∈ {1, . . . , M } × (BV )αord , there exists λj,(i1 ,...,iα ) ∈ F , such
that
M
X
i1 iα
i
∀ α
A(b , . . . , b ) = λj,(i1 ,...,iα ) yj .
(b 1 ,...,biα )∈(BV )ord
j=1
PM P
Letting à := j=1 (bi1 ,...,biα )∈(BV )α λj,(i1 ,...,iα ) Abi1 ,...,biα ,yj , we claim à = A. Indeed, for
ord
each (bj1 , . . . , bjα ) ∈ (BV )αord ,
M
X X
j1 jα
Ã(b , . . . , b ) = λj,(i1 ,...,iα ) Λα (ωbi1 , . . . , ωbiα )(bj1 , . . . , bjα ) yj
j=1 (bi1 ,...,biα )∈(BV )α
ord
M
(4.18) X X
= λj,(i1 ,...,iα ) δ(i1 ,...,iα ),(j1 ,...,jα ) yj
j=1 (bi1 ,...,biα )∈(BV )α
ord
M
X
= λj,(j1 ,...,jα ) yj = A(bj1 , . . . , bjα ),
j=1

proving à = A by (4.9) and Th. 4.3. Since à ∈ hBi, the proof of (b) is complete.
(c): Exercise. 
4 MULTILINEAR MAPS AND DETERMINANTS 56

The following Th. 4.18 compiles some additional rules of importance for alternating
multilinear maps:

Theorem 4.18. Let V and W be vector spaces over the field F , α ∈ N. The following
rules hold for each A ∈ Altα (V, W ):

(a) The value of A remains unchanged if one argument is replaced by the sum of that
argument and a linear combination of the other arguments, i.e., if λ1 , . . . , λα ∈ F ,
v 1 , . . . , v α ∈ V , and i ∈ {1, . . . , α}, then
 
α
X
1 α
 1 i−1 i
λj v j , v i+1 , . . . , v α 

A(v , . . . , v ) = A v , . . . , v , v +

.
j=1
j6=i

(b) If v 1 , . . . , v α ∈ V and aji ∈ F are such that


α
X
j
∀ w = aji v i ,
j∈{1,...,α}
i=1

then
!
X
1 α
A(w , . . . , w ) = sgn(π) a1π(1) · · · aαπ(α) A(v 1 , . . . , v α )
π∈Sα

= det(x1 , . . . , xα ) A(v 1 , . . . , v α ), (4.19)

where α
X
∀ xj := aji ei ∈ F α
j∈{1,...,α}
i=1
α
and {e1 , . . . , eα } is the standard basis of F .

(c) Suppose the family (v 1 , . . . , v α ) in V is linearly independent and such that there
exist w1 , . . . , wα ∈ h{v 1 , . . . , v α }i with A(w1 , . . . , wα ) 6= 0. Then A(v 1 , . . . , v α ) 6= 0
as well.
4 MULTILINEAR MAPS AND DETERMINANTS 57

Proof. (a): One computes, for each i, j ∈ {1, . . . , α} with i 6= j and λj 6= 0:

A(v 1 , . . . , v i + λj v j , . . . , v α )
A∈Lα (V,W )
= λ−1 1 i j j α
j A(v , . . . , v + λj v , . . . , λj v , . . . , v )
A∈Lα (V,W )
= λ−1 1 i j α
j A(v , . . . , v , . . . , λj v , . . . , v )
+λ−1 1 j j α
j A(v , . . . , λj v , . . . , λj v , . . . , v )
(4.7)
= λ−1 1 i j α
j A(v , . . . , v , . . . , λj v , . . . , v ) + 0
A∈Lα (V,W )
= A(v 1 , . . . , v α ).

The general case of (a) then follows via a simple induction.


(b): We calculate
α α
1 α A∈Lα (V,W ) X X
A(w , . . . , w ) = a1i1 · · · aαiα A(v i1 , . . . , v iα )
i1 =1 iα =1
(4.7) X
= a1π(1) · · · aαπ(α) A(v π(1) , . . . , v π(α) )
π∈Sα
!
(4.9) X
= sgn(π) a1π(1) · · · aαπ(α) A(v 1 , . . . , v α ),
π∈Sα

thereby proving the first equality in (4.19). The second equality in (4.19) is now an
immediate consequence of Cor. 4.16.
(c): Exercise. 

4.3 Determinants of Matrices and Linear Maps


In Def. 4.12, we defined a determinant as an alternating multilinear form on F α . In
the following, we will see that determinants can be particularly useful when they are
considered to be maps on quadratic matrices or maps on linear endomorphisms on vector
spaces of finite dimension. We begin by defining determinants on quadratic matrices:

Definition 4.19. Let F be a field and n ∈ N. Then the map


X
det : M(n, F ) −→ F, det(aji ) := sgn(π) a1π(1) · · · anπ(n) , (4.20)
π∈Sn

is called the determinant on M(n, F ).


4 MULTILINEAR MAPS AND DETERMINANTS 58

Notation 4.20. If F is a field, n ∈ N, and det : M(n, F ) −→ F is the determinant,


then, for (aji ) ∈ M(n, F ), one commonly uses the notation

a11 . . . a1n
.. .. .. := det(a ).

. . . ji (4.21)

an1 . . . ann

Notation 4.21. Let F be a field and n ∈ N. As in [Phi19, Rem. 7.4(b)], we denote the
columns and rows of a matrix A := (aji ) ∈ M(n, F ) as follows:
 
a1i
 .. 
ci := ci :=  .  , ri := riA := ai1 . . . ain .
A


i∈{1,...,n}
ani
Remark 4.22. Let F be a field and n ∈ N. If we consider the rows r1 , . . . , rn of the
matrix (aji ) ∈ M(n, F ) as elements of F n , then, by Cor. 4.16,
det(aji ) = det(r1 , . . . , rn ),
where the second det is the map det : (F n )n −→ F defined as in Cor. 4.16. As a further
consequence of Cor. 4.16, in combination with Th. 4.17(b), we see that det of Def. 4.19
is the unique form on M(n, F ) that is multilinear and alternating in the rows of the
matrix and that assigns the value 1 to the identity matrix.
Example 4.23. Let F be a field. We evaluate (4.20) explicitly for n = 1, 2, 3:

(a) For (a11 ) ∈ M(1, F ), we have


X
|a11 | = det(a11 ) = sgn(π) a1π(1) = sgn(Id) a11 = a11 .
π∈S1

(b) For (aji ) ∈ M(2, F ), we have



a11 a12 X
= det(aji ) = sgn(π) a1π(1) a2π(2)
a21 a22
π∈S2

= sgn(Id) a11 a22 + sgn(1 2) a12 a21 = a11 a22 − a12 a21 ,
i.e. the determinant is the product of the elements of the main diagonal minus the
product of the elements of the other diagonal:
+ −
a11 a12
a21 a22
− +
4 MULTILINEAR MAPS AND DETERMINANTS 59

(c) For (aji ) ∈ M(3, F ), we have



a11 a12 a13 X

a21 a22 a23 = det(aji ) = sgn(π) a1π(1) a2π(2) a3π(3)

a31 a32 a33 π∈S3

= sgn(Id) a11 a22 a33 + sgn(1 2) a12 a21 a33 + sgn(1 3) a13 a22 a31
+ sgn(2 3) a11 a23 a32 + sgn(1 2 3) a12 a23 a31 + sgn(1 3 2) a13 a21 a32

= a11 a22 a33 + a12 a23 a31 + a13 a21 a32 − a12 a21 a33 + a13 a22 a31 + a11 a23 a32 .

To remember this formula, one can use the following tableau:

+ + + − − −
a11 a12 a13 a11 a12
a21 a22 a23 a21 a22
a31 a32 a33 a31 a32
− − − + + +

One writes the first two columns of the matrix once again on the right and then
takes the product of the first three diagonals in the direction of the main diagonal
with a positive sign and the first three diagonals in the other direction with a
negative sign.

We can now translate some of the results of Sec. 4.2 into results on determinants of
matrices:

Corollary 4.24. Let F be a field, n ∈ N, and A := (aji ) ∈ M(n, F ). Let r1 , . . . , rn


denote the rows of A and let c1 , . . . , cn denote the columns of A. Then the following
rules hold for the determinant:

(a) det(Idn ) = 1, where Idn denotes the identity matrix in M(n, F ).

(b) det(At ) = det(A).

(c) det is multilinear with regard to matrix rows as well as multilinear with regard to
matrix columns, i.e., for each v ∈ M(1, n, F ), w ∈ M(n, 1, F ), i ∈ {1, . . . , n}, and
4 MULTILINEAR MAPS AND DETERMINANTS 60

λ, µ ∈ F :
r1 r1
   
..  .. 
.  . 
 
 
 ri−1  ri−1 
   
det λ ri + µ v  = λ det(A) + µ det  v  ,
   
 ri+1  ri+1 
   
 ..   . 
 .. 
 . 
rn rn


det c1 . . . ci−1 λ ci + µ w ci+1 . . . cn

= λ det(A) + µ det c1 . . . ci−1 w ci+1 . . . cn .

(d) If λ ∈ F , then det(λA) = λn det(A).

(e) For each permutation π ∈ Sn , one has


 
rπ(1)
det  ...  = det cπ(1) . . . cπ(n) = sgn(π) det(A).
  

rπ(n)

In particular, switching rows i and j or columns i and j, where i, j ∈ {1, . . . , n},


i 6= j, changes the sign of the determinant, i.e.
   
r1 r1
 ..   .. 
. .
 ri   rj 
   
. .
 ..  = − det  ..  ,
det    
r  r 
 j  i
. .
. .  .. 
rn rn
 
det c1 . . . ci . . . cj . . . cn = − det c1 . . . cj . . . ci . . . cn .

(f ) The following statements are equivalent:

(i) det A = 0
(ii) The rows of A are linearly dependent.
(iii) The columns of A are linearly dependent.
4 MULTILINEAR MAPS AND DETERMINANTS 61

(g) Multiplication Rule: If B := (bji ) ∈ M(n, F ), then det(AB) = det(A) det(B).

(h) det(A) = 0 if, and only if, A is singular. If A is invertible, then


−1
det(A−1 ) = det(A) .

(i) The value of the determinant remains the same if one row of a matrix is replaced
by the sum of that row and a scalar multiple of another row. More generally, the
determinant remains the same if one row of a matrix is replaced by the sum of that
row and a linear combination of the other rows. The statement also remains true if
the word “row” is replaced by “column”. Thus, if λ1 , . . . , λn ∈ F and i ∈ {1, . . . , n},
then
r1
 
..
.
 
   
r1 r
 
 i−1 
 ..  P n
r + j=1 λj rj  ,
 
det(A) = det  .  = det  i j6=i 
rn 
 ri+1



 .
..


rn
 
n
X
 
det(A) = det(c1 , . . . , cn ) = det c1 , . . . , ci−1 , ci +
 λj cj , ci+1 , . . . , cn 
.
j=1
j6=i

Proof. As already observed in Rem. 4.22, det : M(n, F ) −→ F can be viewed as the
unique map det ∈ Altn (F n , F ) with det(e1 , . . . , en ) = 1, where one considers
 
r1
 .. 
A =  .  ∈ (F n )n :
rn

Since, for each j ∈ {1, . . . , n},


n
X

rj = aj1 . . . ajn = aji ei ,
i=1
 
a1j n
 ..  X
cj =  .  = aij ei ,
anj i=1
4 MULTILINEAR MAPS AND DETERMINANTS 62

we have
(4.20) X (4.12)
det(A) = sgn(π) a1π(1) · · · anπ(n) = det(r1 , . . . , rn )
π∈Sn
(4.13) X (4.12)
= sgn(π) aπ(1)1 · · · aπ(n)n = det(c1 , . . . , cn ). (4.22)
π∈Sn

(a): det(Idn ) = det(e1 , . . . , en ) = 1.


(b) is immediate from (4.22).
(c) is due to det ∈ Altn (F n , F ) and (4.22).
(d) is a consequence of (c).
(e) is due to det ∈ Altn (F n , F ), (4.22), and (4.9).
(f): Again, we use det ∈ Altn (F n , F ) and (4.22): If det(A) = 0, then det(Idn ) = 1 and
Th. 4.18(c) imply (ii) and (iii). Conversely, if (ii) or (iii) holds, then det(A) = 0 follows
from Prop. 4.11(iii).
(g): Let C := (cji ) := AB. Using Not. 4.21, we have
n n
!
X X
∀ rjC = rjA B = aji riB , rjA = aji ei
j∈{1,...,n}
i=1 i=1

and, thus,
(4.22) (4.19) (4.22)
det(AB) = det(r1C , . . . , rnC ) = det(r1A , . . . , rnA ) det(r1B , . . . , rnB ) = det(A) det(B).

(h): As a consequence of [Phi19, Th. 7.17(a)], A is singular if, and only if, the columns
of A are linearly dependent. Thus, det(A) = 0 if, and only if, A is singular due to (f).
Moreover, if A is invertible, then
(g)
1 = det(Idn ) = det(AA−1 ) = det(A) det(A−1 ).

(i) is due to det ∈ Altn (F n , F ), (4.22), and Th. 4.18(a). 

Theorem 4.25 (Block Matrices). Let F be a field. The determinant of so-called block
matrices over F , where one block is a zero matrix (all entries 0), can be computed as
the product of the determinants of the corresponding blocks. More precisely, if n, m ∈ N,
4 MULTILINEAR MAPS AND DETERMINANTS 63

then
a11 . . . a1n
. .. ..
.
. . . ∗


an1 . . . ann

= det(aji ) det(bji ). (4.23)

0 ... 0 b11 . . . b1m

. .. .. .. .. ..
.. . . . . .

0 ... 0 bm1 . . . bmm

Proof. Suppose (aji ) ∈ M(n + m, F ), where


(
bi−n,j−n for i, j > n,
aji = (4.24)
0 j > n and i ≤ n.

Via obvious embeddings, we can consider Sn ⊆ Sm+n and Tm := S{n+1,...,n+m} ⊆ Sm+n .


If π ∈ Sm+n , then there are precisely two possibilities: Either there exist ω ∈ Sn and
σ ∈ Tm such that π = ωσ, or there exists j ∈ {n + 1, . . . , m + n} such that i := π(j) ≤ n,
in which case ajπ(j) = 0. Thus,
X
det(aji ) = sgn(π) a1π(1) · · · an+m,π(n+m)
π∈Sn+m
X
= sgn(ω) sgn(σ) a1ω(1) · · · an,ω(n) an+1,σ(n+1) · · · an+m,σ(n+m)
(ω,σ)∈Sn ×Tm
! !
X X
= sgn(ω) a1ω(1) · · · anω(n) sgn(σ) an+1,σ(n+1) · · · an+m,σ(n+m)
ω∈Sn σ∈Tm
! !
X X
= sgn(π) a1π(1) · · · anπ(n) sgn(π) b1π(1) · · · bmπ(m)
π∈Sn π∈Sm

= det(aji )↾{1,...,n}×{1,...,n} det(bji ),

proving (4.23). 
Corollary 4.26. Let F be a field and n ∈ N. If (aji ) ∈ M(n, F ) is upper triangular or
lower triangular, then
n
Y
det(aji ) = akk .
k=1

Proof. For (aji ) upper triangular, the statement follows from Th. 4.25 via an obvious
induction on n ∈ N. If (aji ) is lower triangular, then the transpose (aji )t is upper
triangular and the statement the follows from Cor. 4.24(b). 
4 MULTILINEAR MAPS AND DETERMINANTS 64

Definition 4.27. Let F be a field, n ∈ N, n ≥ 2, A = (aji ) ∈ M(n, F ). For each


j, i ∈ {1, . . . , n}, let Mji ∈ M(n − 1, F ) be the (n − 1) × (n − 1) submatrix of A obtained
by deleting the jth row and the ith column of A – the Mji are called the minor matrices
of A; define
Aji := (−1)i+j det(Mji ), (4.25)
where the Aji are called cofactors of A and the det(Mji ) are called the minors of A. Let
à := (Aji )t denote the transpose of the matrix of cofactors of A, called the adjugate
matrix of A.
Lemma 4.28. Let F be a field, n ∈ N, n ≥ 2, A = (aji ) ∈ M(n, F ). For each
j, i ∈ {1, . . . , n}, let R(j, i) ∈ M(n, F ) be the matrix obtained from A by replacing the
j-th row with the standard (row) basis vector ei , and let C(j, i) ∈ M(n, F ) be the matrix
obtained from A by replacing the i-th column with the standard (column) basis vector ej ,
i.e.
 
a11 . . . a1,i−1 a1i a1,i+1 . . . a1n
 .. .. .. .. .. 
 . . . . . 
 
aj−1,1 . . . aj−1,i−1 aj−1,i aj−1,i+1 . . . aj−1,n 
 
R(j, i) :=  0 ... 0 1 0 ... 0  ,
a
 j+1,1 . . . aj+1,i−1 aj+1,i aj+1,i+1 . . . aj+1,n 

 .. .. .. .. .. 
 . . . . . 
an1 . . . an,i−1 ani an,i+1 . . . ann

a11 . . . a1,i−1 0 a1,i+1 . . . a1n


 
 .. .. .. .. .. 
 . . . . . 
aj−1,1 . . . aj−1,i−1 0 aj−1,i+1 . . . aj−1,n 
 
C(j, i) :=  aj1 . . . aj,i−1 1 aj,i+1 . . . aj,n  .
 
aj+1,1 . . . aj+1,i−1 0 aj+1,i+1 . . . aj+1,n 
 
 . .. .. .. .. 
 .. . . . . 
an1 . . . an,i−1 0 an,i+1 . . . ann
Then, we have  
∀ Aji = det R(j, i) = det C(j, i) . (4.26)
j,i∈{1,...,n}

Proof. Let j, i ∈ {1, . . . , n}, and let Mji denote the corresponding minor matrix of A
according to Def. 4.27. Then

 Cor. 4.24(e) i+j 1
0  Cor. 4.24(e) j+i 1

det R(j, i) = (−1) = det C(j, i) = (−1)
∗ Mji 0 Mji
(4.23) (4.25)
= (−1)i+j det(Mji ) = Aji ,
4 MULTILINEAR MAPS AND DETERMINANTS 65

thereby proving (4.26). 

Theorem 4.29. Let F be a field, n ∈ N, n ≥ 2, A = (aji ) ∈ M(n, F ). Moreover,


let à := (Aji )t be the adjugate matrix of A according to Def. 4.27. Then the following
holds:

(a) AÃ = ÃA = (det A) Idn .

(b) If det A 6= 0, then det à = (det A)n−1 .

(c) If det A 6= 0, then A−1 = (det A)−1 Ã.


Pn
(d) Laplace Expansion by Rows: det A = i=1 aji Aji (expansion with respect to the
jth row).
Pn
(e) Laplace Expansion by Columns: det A = j=1 aji Aji (expansion with respect to the
ith column).

Proof. (a): Let C := (cji ) := AÃ, D := (dji ) := ÃA. Also let R(j, i) and C(j, i) be as
in Lem. 4.28. Then, we compute, for each j, i ∈ {1, . . . , n},
 A
r1
 .. 
 . 
 A  (
n n ri−1 
X
t (4.26)
X  Cor. 4.24(c)  A det(A) for i = j,
cji = ajk Aki = ajk det R(i, k) =  rj  = 0
det  
k=1 k=1 rA  for i 6= j,
 i+1 
 . 
 .. 
rnA

n n
X (4.26) X
Atjk aki =

dji = aki det C(k, j)
k=1 k=1
(
Cor. 4.24(c) det(A) for i = j,
det cA . . . cA A
cA A

= 1 j−1 ci j+1 . . . cn =
0 for i 6= j,

proving (a).
(b): We obtain
(a)  Cor. 4.24(d)
det(A) det(Ã) = det(AÃ) = det (det A) Idn = (det A)n ,
4 MULTILINEAR MAPS AND DETERMINANTS 66

which, for det(A) 6= 0, implies det à = (det A)n−1 .


(c) is immediate from (a).
(d): From (a), we obtain
n
X n
X
det(A) = aji Atij = aji Aji ,
i=1 i=1

proving (d).
(e): From (a), we obtain
n
X n
X
det(A) = Atij aji = aji Aji ,
j=1 j=1

proving (e). 
Example 4.30. (a) We use Ex. 4.23(b) and Th. 4.29(d) to compute

1 2
D1 := :
3 4

From Ex. 4.23(b), we obtain

D1 = 1 · 4 − 2 · 3 = −2,

which we also obtain when expanding with respect to the first row according to Th.
4.29(d). Expanding with respect to the second row, we obtain

D1 = −3 · 2 + 4 · 1 = −2.

(b) We use Ex. 4.23(c) and Th. 4.29(e) to compute



1 2 3

D2 := 4 5 6 :
7 8 9

From Ex. 4.23(c), we obtain

D2 = 1·5·9+2·6·7+3·4·8−3·5·7−1·6·8−2·4·9 = 45+84+96−105−48−72 = 0.

Expanding with respect to the third column according to Th. 4.29(d), we obtain

4 5 1 2 1 2
D2 = 3·
−6·
+9·
= 3·(−3)−6·(−6)+9·(−3) = −9+36−27 = 0.
7 8 7 8 4 5
4 MULTILINEAR MAPS AND DETERMINANTS 67

Theorem 4.31 (Cramer’s Rule). Let F be a field and A := (aji ) ∈ M(n, F ), n ∈ N,


n ≥ 2. If b1 , . . . , bn ∈ F and det(A) 6= 0, then the linear system
n
X
∀ ajk xk = bj ,
j∈{1,...,n}
k=1
n
has a unique solution x ∈ F , which is given by
n
X
−1
∀ xj = (det A) Akj bk , (4.27)
j∈{1,...,n}
k=1

where Akj denote the cofactors of A according to Def. 4.27.

Proof. In matrix form, the linear system reads Ax = b, which, for det(A) 6= 0, is
equivalent to x = A−1 b, where A−1 = (det A)−1 à by Th. 4.29(c). Since à := (Aji )t , we
have n n
X X
−1 t −1
∀ xj = (det A) Ajk bk = (det A) Akj bk ,
j∈{1,...,n}
k=1 k=1
proving (4.27). 
Definition 4.32. Let n ∈ N. An element p = (p1 , . . . , pn ) ∈ (N0 )n is called a multi-
index; |p| := p1 + · · · + pn is called the degree of the multi-index. Let R be a ring with
unity. If x = (x1 , . . . , xn ) ∈ Rn and p = (p1 , . . . , pn ) is a multi-index, then we define
xp := xp11 xp22 · · · xpnn . (4.28)
Each function from Rn into R, x 7→ xp , is called a monomial function (in n variables).
A function P from Rn into R is called a polynomial function (in n variables) if, and only
if, it is a linear combination of monomial functions, i.e. if, and only if P has the form
X
P : Rn −→ R, P (x) = ap xp , k ∈ N0 , ap ∈ R (4.29)
|p|≤k

(if R is commutative, then our present definition of polynomial function is a special case
of the one given in Th. B.23 of the Appendix, also cf. Rem. B.24). If F := R is an
infinite field, then, as a consequence of Th. B.23(c), the representation of P in (4.29)
in terms of the monomial functions x 7→ xp is unique and, in this case, we also define,
for each polynomial function given in the form of (4.29), its degree3 , denoted deg(P ),
3
For example, if R = Z2 = {0, 1} and f, g : R −→ R, f (x) := x, g(x) := x2 , then f (0) = g(0) = 0,
f (1) = g(1) = 1, showing f = g and the nonuniqueness of the representation in (4.29) for R = Z2 .
However, it is still possible to generalize our degree definition for polynomial functions to situations,
where the representation in (4.29) is not unique: If P : Rn −→ R is a polynomial function, then for
each representation of P in the form (4.29), one can define the degree (of the representation) as in the
case, where R is an infinite field, then defining deg P to be the minimum of the representation degrees.
4 MULTILINEAR MAPS AND DETERMINANTS 68

to be the largest number d ≤ k such that there is p ∈ (N0 )n with |p| = d and ap 6= 0.
If all ap = 0, i.e. if P ≡ 0, then P is the zero polynomial function and its degree is
defined to be −∞ (some authors use −1 instead); in particular, d is then the degree of
each monomial function x 7→ xp with |p| = d. If F is a field (not necessarily infinite),
then we also define a rational function as a quotient of two polynomial functions: If
P, Q : F n −→ F are polynomials, then
P (x)
(P/Q) : F n \ Q−1 {0} −→ F, (P/Q)(x) := , (4.30)
Q(x)

is called a rational function (in n variables).


Remark 4.33. Let F be a field and n ∈ N. Comparing Def. 4.19 with Def. 4.32, we observe that

    det : F^{n²} −→ F,   det(aji) := ∑_{π∈Sn} sgn(π) a1π(1) · · · anπ(n),

is a polynomial function of degree n (and in n² variables). According to Th. 4.29(c) and (4.25), we have

    inv : GLn(F) −→ GLn(F),   inv(aji) := (aji)^{−1} = ( (−1)^{i+j} det(Mij) / det(aji) ),

where the Mji ∈ M(n − 1, F) denote the minor matrices of (aji). Thus, for each (k, l) ∈ {1, . . . , n}², the component function

    invkl : GLn(F) −→ F,   invkl(aji) = (−1)^{k+l} det(Mlk) / det(aji),

is a rational function invkl = Pkl / det, where Pkl is a polynomial of degree n − 1.


Remark 4.34 (Computation of Determinants). For n ≤ 3, one may well use the formulas of Ex. 4.23 to compute determinants. However, the larger n becomes, the less advisable is the use of formula (4.20) to compute det(aji), as this requires n · n! multiplications (note that n! grows faster for n → ∞ than n^k for each k ∈ N). Sometimes Th. 4.29(d),(e) can help, if one can expand with respect to a row or column, where many entries are 0. However, for a generic large n × n matrix A, it is usually the best strategy⁴ to transform A to triangular form (e.g. by using Gaussian elimination according to [Phi19, Alg. 8.17]) and then use Cor. 4.26 (the number of multiplications then merely grows proportional to n³): We know from Cor. 4.24(e),(i) that the operations of the Gaussian elimination algorithm do not change the determinant, except for sign changes, when switching rows. For the same reasons, one should use Gaussian elimination (or even more efficient algorithms adapted to special situations) rather than the neat-looking explicit formula of Cramer's rule of Th. 4.31 to solve large linear systems and rather than Th. 4.29(c) to compute inverses of large matrices.

⁴ If A has a special structure (e.g. many zeros) and/or the field F has a special structure (e.g. F ∈ {R, C}), then there might well be more efficient methods to compute det(A), studied in the fields Numerical Analysis and Numerical Linear Algebra.
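The strategy just described is easy to implement; the following Python sketch (added for illustration, not from the original notes) reduces A to triangular form over Q, tracking the sign changes caused by row switches, and multiplies the diagonal entries:

    # Determinant via Gaussian elimination (~n^3 operations), exact over Q.
    from fractions import Fraction

    def det_gauss(A):
        A = [[Fraction(x) for x in row] for row in A]
        n, sign = len(A), 1
        for col in range(n):
            # find a pivot; a row switch flips the sign (Cor. 4.24(e))
            pivot = next((r for r in range(col, n) if A[r][col] != 0), None)
            if pivot is None:
                return Fraction(0)          # singular matrix
            if pivot != col:
                A[col], A[pivot] = A[pivot], A[col]
                sign = -sign
            # row additions leave the determinant unchanged (Cor. 4.24(i))
            for r in range(col + 1, n):
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * p for a, p in zip(A[r], A[col])]
        result = Fraction(sign)
        for i in range(n):                  # triangular: product of diagonal
            result *= A[i][i]
        return result

    print(det_gauss([[1, 2, 3], [4, 5, 6], [7, 8, 10]]))  # -3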
Theorem 4.35 (Vandermonde Determinant). Let F be a field and λ0, λ1, . . . , λn ∈ F, n ∈ N. Moreover, let

    V := ( 1   λ0   . . .   λ0^n )
         ( 1   λ1   . . .   λ1^n )
         ( .   .             .   )
         ( 1   λn   . . .   λn^n )   ∈ M(n + 1, F),        (4.31)

which is known as the corresponding Vandermonde matrix. Then its determinant, the so-called Vandermonde determinant, is given by

    det(V) = ∏_{0 ≤ l < k ≤ n} (λk − λl).        (4.32)

Proof. The proof can be conducted by induction with respect to n: For n = 1, we have

    det(V) = det ( 1  λ0 ; 1  λ1 ) = λ1 − λ0 = ∏_{0 ≤ l < k ≤ 1} (λk − λl),

showing (4.32) holds for n = 1. Now let n > 1. Using Cor. 4.24(i), we add the (−λ0)-fold of the nth column to the (n + 1)st column, obtaining in the (n + 1)st column

    (0, λ1^n − λ1^{n−1} λ0, . . . , λn^n − λn^{n−1} λ0)^t.

Next, one adds the (−λ0)-fold of the (n − 1)st column to the nth column, and, successively, the (−λ0)-fold of the mth column to the (m + 1)st column. One finishes, in the nth step, by adding the (−λ0)-fold of the first column to the second column, obtaining

    det(V) = det ( 1   0          0              . . .   0                    )
                 ( 1   λ1 − λ0   λ1² − λ1 λ0    . . .   λ1^n − λ1^{n−1} λ0   )
                 ( .   .          .                      .                    )
                 ( 1   λn − λ0   λn² − λn λ0    . . .   λn^n − λn^{n−1} λ0   ).

Applying (4.23) then yields

    det(V) = 1 · det ( λ1 − λ0   λ1² − λ1 λ0   . . .   λ1^n − λ1^{n−1} λ0 )
                     ( .          .                     .                  )
                     ( λn − λ0   λn² − λn λ0   . . .   λn^n − λn^{n−1} λ0 ).

We now use multilinearity to factor out, for each k ∈ {1, . . . , n}, (λk − λ0) from the kth row, arriving at

    det(V) = ∏_{k=1}^{n} (λk − λ0) · det ( 1   λ1   . . .   λ1^{n−1} )
                                          ( .   .            .       )
                                          ( 1   λn   . . .   λn^{n−1} ),

which is precisely the Vandermonde determinant of the n numbers λ1, . . . , λn. Using the induction hypothesis, we obtain

    det(V) = ∏_{k=1}^{n} (λk − λ0) ∏_{1 ≤ l < k ≤ n} (λk − λl) = ∏_{0 ≤ l < k ≤ n} (λk − λl),

completing the induction proof of (4.32). □
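As a quick sanity check of (4.32), one can compare a directly computed determinant of the matrix (4.31) with the product formula, exactly over Q (an added sketch, not part of the original notes):

    # Verify det(V) = prod_{l<k} (lam_k - lam_l) for sample nodes.
    from fractions import Fraction
    from itertools import combinations

    def det(A):  # Laplace expansion, fine for the 4x4 example below
        if len(A) == 1:
            return A[0][0]
        return sum((-1) ** j * A[0][j]
                   * det([r[:j] + r[j + 1:] for r in A[1:]])
                   for j in range(len(A)))

    lams = [Fraction(x) for x in (0, 1, 2, 5)]
    V = [[l ** j for j in range(len(lams))] for l in lams]   # matrix (4.31)
    rhs = Fraction(1)
    for l, k in combinations(range(len(lams)), 2):           # pairs l < k
        rhs *= lams[k] - lams[l]
    print(det(V) == rhs)  # True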

Remark and Definition 4.36 (Determinant on Linear Endomorphisms). Let V be a finite-dimensional vector space over the field F, n ∈ N, dim V = n, and A ∈ L(V, V). Moreover, let B1 = (v1, . . . , vn) and B2 = (w1, . . . , wn) be ordered bases of V. If (a_ji^(1)) ∈ M(n, F) is the matrix corresponding to A with respect to B1 and (a_ji^(2)) ∈ M(n, F) is the matrix corresponding to A with respect to B2, then we know from [Phi19, Th. 7.14] that there exists (cji) ∈ GLn(F) such that

    (a_ji^(2)) = (cji)^{−1} (a_ji^(1)) (cji)

(namely (cji) such that, for each i ∈ {1, . . . , n}, wi = ∑_{j=1}^{n} cji vj). Thus,

    det(a_ji^(2)) = det(cji)^{−1} det(a_ji^(1)) det(cji) = det(a_ji^(1))

and, in consequence,

    det : L(V, V) −→ F,   det(A) := det(a_ji^(1)),

is well-defined.

Corollary 4.37. Let V be a finite-dimensional vector space over the field F, n ∈ N, dim V = n.

(a) If A, B ∈ L(V, V), then det(AB) = det(A) det(B).

(b) If A ∈ L(V, V), then det(A) = 0 if, and only if, A is not bijective. If A is bijective, then det(A^{−1}) = (det(A))^{−1}.

(c) If A ∈ L(V, V) and λ ∈ F, then det(λA) = λ^n det(A).

Proof. Let BV be an ordered basis of V.

(a): Let (aji), (bji) ∈ M(n, F) be the matrices corresponding to A, B with respect to BV. Then, using [Phi19, Th. 7.10(a)] and Cor. 4.24(g),

    det(AB) = det((aji)(bji)) = det(aji) det(bji) = det(A) det(B).

(b): Since (aji) (with (aji) as before) is singular if, and only if, A is not bijective, (b) is immediate from Cor. 4.24(h).

(c): If (aji) is as before, then, by Cor. 4.24(d),

    det(λA) = det(λ(aji)) = λ^n det(aji) = λ^n det(A),

thereby completing the proof. □

5 Direct Sums and Projections


In [Phi19, Def. 5.10], we defined sums of arbitrary (finite or infinite) families of subspaces.
In [Phi19, Def. 5.28], we defined the direct sum for two subspaces. We will now extend
the notion of direct sum to arbitrary families of subspaces:

Definition 5.1. Let V be a vector space over the field F , let I be an index set and let
(Ui )i∈I be a family of subspaces of V . We say that V is the direct sum of the family of
subspaces (Ui )i∈I if, and only if, the following two conditions hold:
(i) V = ∑_{i∈I} Ui.

(ii) For each finite J ⊆ I and each family (uj)_{j∈J} in V such that uj ∈ Uj for each j ∈ J, one has

    0 = ∑_{j∈J} uj   ⇒   ∀ j ∈ J :   uj = 0.

If V is the direct sum of the Ui, then we write V = ⊕_{i∈I} Ui.

Proposition 5.2. Let V be a vector space over the field F , let I be an index set and let
(Ui )i∈I be a family of subspaces of V . Then the following statements are equivalent:
(i) V = ⊕_{i∈I} Ui.

(ii) For each v ∈ V, there exists a unique finite subset Jv of I and a unique map σv : Jv −→ V \ {0}, j ↦ uj(v) := σv(j), such that

    v = ∑_{j∈Jv} σv(j) = ∑_{j∈Jv} uj(v)   ∧   ∀ j ∈ Jv :   uj(v) ∈ Uj.        (5.1)

(iii) V = ∑_{i∈I} Ui and

    ∀ j ∈ I :   Uj ∩ ∑_{i∈I\{j}} Ui = {0}.        (5.2)

(iv) V = ∑_{i∈I} Ui and, letting I′ := {i ∈ I : Ui ≠ {0}}, each family (ui)_{i∈I′} in V with ui ∈ Ui \ {0} for each i ∈ I′ is linearly independent.
Proof. “(i) ⇔ (ii)”: According to the definition of V = ∑_{i∈I} Ui, the existence of Jv and σv such that (5.1) holds is equivalent to V = ∑_{i∈I} Ui. If (5.1) holds and Iv ⊆ I is finite such that v = ∑_{j∈Iv} τv(j) with τv : Iv −→ V \ {0} and τv(j) ∈ Uj for each j ∈ Iv, then define σv(j) := 0 for each j ∈ Iv \ Jv and τv(j) := 0 for each j ∈ Jv \ Iv. Then

    0 = v − v = ∑_{j∈Jv∪Iv} (σv(j) − τv(j))

and Def. 5.1(ii) implies σv(j) = τv(j) for each j ∈ Jv ∪ Iv as well as Jv = Iv. Conversely, assume there exists J ⊆ I finite and uj ∈ Uj (j ∈ J) such that

    0 = ∑_{j∈J} uj   ∧   ∃ j0 ∈ J :   uj0 ≠ 0.

Then

    uj0 = ∑_{j∈J\{j0}} (−uj)

shows (ii) does not hold (as v := uj0 has two different representations).

“(i) ⇒ (iii)”: If (i) holds and v ∈ Uj ∩ ∑_{i∈I\{j}} Ui for some j ∈ I, then there exists a finite J ⊆ I \ {j} such that v = ∑_{i∈J} ui with ui ∈ Ui for each i ∈ J. Since −v ∈ Uj and

    0 = −v + ∑_{i∈J} ui,

Def. 5.1(ii) implies −v = 0 = v, i.e. (iii).

“(iii) ⇒ (i)”: Let J ⊆ I be finite such that 0 = ∑_{i∈J} ui with ui ∈ Ui for each i ∈ J. Then

    ∀ j ∈ J :   uj = − ∑_{i∈J\{j}} ui ∈ Uj ∩ ∑_{i∈I\{j}} Ui,

i.e. (iii) implies uj = 0 and Def. 5.1(ii).

“(iv) ⇒ (i)”: If J ⊆ I is finite and uj ∈ Uj \ {0} for each j ∈ J with (uj)_{j∈J} linearly independent, then ∑_{j∈J} uj ≠ 0, implying Def. 5.1(ii) via contraposition.

“(i) ⇒ (iv)”: Suppose (ui)_{i∈I′} is a family in V such that ui ∈ Ui \ {0} for each i ∈ I′. If J ⊆ I′ is finite and 0 = ∑_{j∈J} λj uj for some λj ∈ F, then Def. 5.1(ii) implies λj uj = 0 and, thus, λj = 0, for each j ∈ J, yielding the linear independence of (ui)_{i∈I′}. □

Proposition 5.3. Let V be a vector space over the field F.

(a) Let B be a basis of V with a decomposition B = ⋃_{i∈I} Bi into pairwise disjoint sets Bi. If, for each i ∈ I, Ui := ⟨Bi⟩, then V = ⊕_{i∈I} Ui. In particular, V = ⊕_{b∈B} ⟨{b}⟩.

(b) If (Ui)_{i∈I} is a family of subspaces of V such that Bi is a basis of Ui and V = ⊕_{i∈I} Ui, then the Bi are pairwise disjoint and B := ⋃_{i∈I} Bi forms a basis of V.

Proof. Exercise. □

Example 5.4. Consider the vector space V := R² over R and let U1 := ⟨{(1, 0)}⟩, U2 := ⟨{(0, 1)}⟩, U3 := ⟨{(1, 1)}⟩. Then V = Ui + Uj and Ui ∩ Uj = {0} for each i, j ∈ {1, 2, 3} with i ≠ j. However, the sum V = U1 + U2 + U3 is not a direct sum, showing Prop. 5.2(iii) can, in general, not be replaced by the condition Ui ∩ Uj = {0} for each i, j ∈ I with i ≠ j.

Definition 5.5. Let S be a set. Then P : S −→ S is called a projection if, and only if, P² := P ∘ P = P.

Remark 5.6. For each set S, Id : S −→ S is a projection. Moreover, for each x ∈ S, the constant map fx : S −→ S, fx ≡ x, is a projection. If V := S is a vector space over a field F, then fx is linear if, and only if, x = 0. While this shows that not every projection on a vector space is linear, here, we are interested in linear projections. We will see in Th. 5.8 below that there is a close relationship between linear projections and direct sums.

Example 5.7. (a) Let V be a vector space over the field F and let B be a basis of V. If cv : Bv −→ F \ {0}, Bv ⊆ B finite, is the coordinate map for v ∈ V, then, clearly, for each b ∈ B,

    Pb : V −→ V,   Pb(v) := cv(b) b for b ∈ Bv,   Pb(v) := 0 otherwise,

is a linear projection.

(b) Let I be a nonempty set and consider the vector space V := F(I, F) = F^I over the field F. If

    ∀ i ∈ I :   ei : I −→ F,   ei(j) := δij,

then, clearly, for each i ∈ I,

    Pi : V −→ V,   Pi(f) := f(i) ei,

is a linear projection.

(c) Let n ∈ N and let V be the vector space over R, consisting of all functions f : R −→ R such that the n-th derivative f^(n) exists. Then, clearly,

    P : V −→ V,   P(f)(x) := ∑_{k=0}^{n} (f^(k)(0)/k!) x^k,

is a linear projection, where the image of P consists of the subspace of all polynomial functions of degree at most n.
Theorem 5.8. Let V be a vector space over the field F.

(a) Let (Ui)_{i∈I} be a family of subspaces of V such that V = ⊕_{i∈I} Ui. According to Prop. 5.2(ii), for each v ∈ V, there exists a unique finite subset Jv of I and a unique map σv : Jv −→ V \ {0}, j ↦ uj(v) := σv(j), such that

    v = ∑_{j∈Jv} σv(j) = ∑_{j∈Jv} uj(v)   ∧   ∀ j ∈ Jv :   uj(v) ∈ Uj.

Thus, we can define

    ∀ j ∈ I :   Pj : V −→ V,   Pj(v) := σv(j) for j ∈ Jv,   Pj(v) := 0 otherwise.        (5.3)

Then each Pj is a linear projection with Im Pj = Uj and ker Pj = ⊕_{i∈I\{j}} Ui. Moreover, if i ≠ j, then Pi Pj ≡ 0. Defining

    ∑_{i∈I} Pi : V −→ V,   ( ∑_{i∈I} Pi )(v) := ∑_{i∈Jv} Pi(v),        (5.4)

we have Id = ∑_{i∈I} Pi.

(b) Let (Pi )i∈I be a family of projections in L(V, PV ) such that, P for each v ∈ V , the set
Jv := {i ∈ I : Pi (v) 6= 0} is finite. If Id = i∈I Pi (where i∈I Pi is defined as in
(5.4)) and Pi Pj ≡ 0 for each i, j ∈ {1, . . . , n} with i 6= j, then
M
V = Im Pi .
i∈I

(c) If P ∈ L(V, V ) is a projection, then V = ker P ⊕ Im P , Im P = ker(Id −P ),


ker P = Im(Id −P ).

Proof. (a): If v, w ∈ V and λ ∈ F, then, extending σv and σw by 0 to Jv ∪ Jw, we have

    ∀ j ∈ Jv ∪ Jw :   σ_{v+w}(j) = σv(j) + σw(j) ∈ Uj   ∧   σ_{λv}(j) = λσv(j) ∈ Uj,

showing, for each i ∈ I, the linearity of Pi as well as Im Pi = Ui and Pi² = Pi. Clearly, if j ∈ I, then ⊕_{i∈I\{j}} Ui ⊆ ker Pj. On the other hand, if v ∈ ker Pj, then j ∉ Jv, showing v ∈ ⊕_{i∈I\{j}} Ui. If i ≠ j and v ∈ V, then Pj v ∈ Uj ⊆ ker Pi, showing Pi Pj ≡ 0. Finally, for each v ∈ V, we compute

    ( ∑_{i∈I} Pi )(v) = ∑_{i∈Jv} Pi(v) = ∑_{i∈Jv} σv(i) = v,

thereby completing the proof of (a).

(b): For each v ∈ V, we have

    v = Id v = ∑_{i∈Jv} Pi(v) ∈ ∑_{i∈Jv} Im Pi,

proving V = ∑_{i∈I} Im Pi. Now let J ⊆ I be finite such that 0 = ∑_{j∈J} uj with uj ∈ Im Pj for each j ∈ J. Then there exist vj ∈ V such that uj = Pj vj. Thus, we obtain

    ∀ i ∈ J :   0 = Pi(0) = Pi( ∑_{j∈J} uj ) = ∑_{j∈J} Pi Pj vj = Pi Pi vi = Pi vi = ui,

showing Def. 5.1(ii) to hold and proving (b).

(c): We have

    (Id −P)² = (Id −P)(Id −P) = Id −P − P + P² = Id −P,

showing Id −P to be a projection. On the other hand, P(Id −P) = (Id −P)P = P − P² = P − P = 0, i.e.

    V = Im P ⊕ Im(Id −P)        (5.5)

according to (b). We show ker P = Im(Id −P) next: Let v ∈ V and x := (Id −P)v ∈ Im(Id −P). Then Px = Pv − P²v = Pv − Pv = 0, showing x ∈ ker P and Im(Id −P) ⊆ ker P. Conversely, let x ∈ ker P. By (5.5), we write x = v1 + v2 with v1 ∈ Im P and v2 ∈ Im(Id −P) ⊆ ker P. Then v1 = x − v2 ∈ ker P as well, i.e. v1 ∈ Im P ∩ ker P. Then there exists w ∈ V such that v1 = Pw and we compute

    v1 = Pw = P²w = Pv1 = 0,

as v1 ∈ ker P as well. Thus, x = v2 ∈ Im(Id −P), showing ker P ⊆ Im(Id −P) as desired. From (5.5), we then also have V = Im P ⊕ ker P. Since we have seen Id −P to be a projection as well, we also obtain ker(Id −P) = Im(Id −(Id −P)) = Im P. □
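The following minimal Python sketch (an added illustration; the concrete matrix is an assumed example, not taken from the notes) exhibits the decomposition V = Im P ⊕ ker P of Th. 5.8(c) for the projection of R² onto the line spanned by (1, 1) along the direction (1, 0):

    # P maps (x, y) to (y, y): Im P = span{(1,1)}, ker P = span{(1,0)}.
    P = [[0, 1],
         [0, 1]]

    def matmul(A, B):
        return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
                 for j in range(len(B[0]))] for i in range(len(A))]

    def matvec(A, v):
        return [sum(A[i][k] * v[k] for k in range(len(v))) for i in range(len(A))]

    assert matmul(P, P) == P                 # P is a projection: P^2 = P
    v = [3, 5]
    im_part = matvec(P, v)                   # Pv lies in Im P
    ker_part = [v[i] - im_part[i] for i in range(2)]   # (Id - P)v lies in ker P
    assert matvec(P, ker_part) == [0, 0]
    print(im_part, ker_part)                 # [5, 5] [-2, 0]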

6 Eigenvalues
Definition 6.1. Let V be a vector space over the field F and A ∈ L(V, V).

(a) We call λ ∈ F an eigenvalue of A if, and only if, there exists 0 ≠ v ∈ V such that

    Av = λv.        (6.1)

Then each 0 ≠ v ∈ V such that (6.1) holds is called an eigenvector of A for the eigenvalue λ; the set

    EA(λ) := ker(λ Id −A) = {v ∈ V : Av = λv}        (6.2)

is then called the eigenspace of A with respect to the eigenvalue λ. The set

    σ(A) := {λ ∈ F : λ eigenvalue of A}        (6.3)

is called the spectrum of A.

(b) We call A diagonalizable if, and only if, there exists a basis B of V such that each v ∈ B is an eigenvector of A.
Remark 6.2. Let V be a finite-dimensional vector space over the field F , dim V =
n ∈ N, and assume A ∈ L(V, V ) to be diagonalizable. Then there exists a basis B =
{v1 , . . . , vn } of V , consisting of eigenvectors of A, i.e. Avi = λi vi , λi ∈ σ(A), for each
i ∈ {1, . . . , n}. Thus, with respect to B, A is represented by the diagonal matrix diag(λ1, . . . , λn) ∈ M(n, F) (the matrix with λ1, . . . , λn on the diagonal and all other entries 0), which explains the term diagonalizable.


It will be a main goal of the present and the following sections to investigate under
which conditions, given A ∈ L(V, V ) with dim V < ∞, V has a basis B such that, with
respect to B, A is represented by a diagonal matrix. While such a basis does not always
exist, we will see that there always exist bases such that the representing matrix has a
particularly simple structure, a so-called normal form.

Theorem 6.3. Let V be a vector space over the field F and A ∈ L(V, V).

(a) λ ∈ F is an eigenvalue of A if, and only if, ker(λ Id −A) ≠ {0}, i.e. if, and only if, λ Id −A is not injective.

(b) For each eigenvalue λ of A, the eigenspace EA(λ) constitutes a subspace of V.

(c) Let (vλ)_{λ∈σ(A)} be a family in V such that, for each λ ∈ σ(A), vλ is an eigenvector for λ. Then (vλ)_{λ∈σ(A)} is linearly independent (in particular, #σ(A) ≤ dim V).

(d) A is diagonalizable if, and only if, V is the direct sum of the eigenspaces of A, i.e. if, and only if,

    V = ⊕_{λ∈σ(A)} EA(λ).        (6.4)

(e) Let A be diagonalizable and, for each λ ∈ σ(A), let Pλ : V −→ V be the projection with

    Im Pλ = EA(λ)   and   ker Pλ = ⊕_{µ∈σ(A)\{λ}} EA(µ),

given by (d) in combination with Th. 5.8(a). Then

    A = ∑_{λ∈σ(A)} λ Pλ        (6.5a)

and

    ∀ λ ∈ σ(A) :   APλ = Pλ A,        (6.5b)

where (6.5a) is known as the spectral decomposition of A.

Proof. (a) holds, as, for each λ ∈ F and each v ∈ V ,

Av = λv ⇔ λv − Av = 0 ⇔ (λ Id −A)v = 0 ⇔ v ∈ ker(λ Id −A).

(b) holds, as EA (λ) is the kernel of a linear map.



(c): Seeking a contradiction, assume (vλ)_{λ∈σ(A)} to be linearly dependent. Then there exists a minimal family of vectors (vλ1, . . . , vλk), k ∈ N, such that λ1, . . . , λk ∈ σ(A) and there exist c1, . . . , ck ∈ F \ {0} with

    0 = ∑_{i=1}^{k} ci vλi.

We compute

    0 = 0 − 0 = A( ∑_{i=1}^{k} ci vλi ) − λk ∑_{i=1}^{k} ci vλi = ∑_{i=1}^{k} ci λi vλi − λk ∑_{i=1}^{k} ci vλi = ∑_{i=1}^{k−1} ci (λi − λk) vλi.

As we had chosen the family (vλ1, . . . , vλk) to be minimal, we obtain ci (λi − λk) = 0 for each i ∈ {1, . . . , k − 1}, which is a contradiction, since ci ≠ 0 as well as λi ≠ λk.

(d): If A is diagonalizable, V has a basis B, consisting of eigenvectors of A. Letting, for each λ ∈ σ(A), Bλ := {b ∈ B : Ab = λb}, we have that Bλ is a basis of EA(λ). Since we have B = ⋃_{λ∈σ(A)} Bλ, (6.4) now follows from Prop. 5.3(a). Conversely, if (6.4) holds, then V has a basis of eigenvectors of A by means of Prop. 5.3(b).

(e): Exercise. □

Corollary 6.4. Let V be a vector space over the field F and A ∈ L(V, V ). If dim V =
n ∈ N and A has n distinct eigenvalues λ1 , . . . , λn ∈ F , then A is diagonalizable.

Proof. Due to Th. 6.3(b),(c), we must have

    V = ⊕_{i=1}^{n} EA(λi),

showing A to be diagonalizable by Th. 6.3(d). □

The following examples illustrate the dependence of diagonalizability, and even of the
mere existence of eigenvalues, on the structure of the field F .

Example 6.5. (a) Let K ∈ {R, C} and let V be a vector space over K with dim V = 2 and ordered basis B := (v1, v2). Consider A ∈ L(V, V) such that

    Av1 = v2,   Av2 = −v1.

With respect to B, A is then given by the matrix

    M := ( 0  −1 )
         ( 1   0 ).

In consequence,

    M² = ( 0  −1 ) ( 0  −1 ) = ( −1   0 )
         ( 1   0 ) ( 1   0 )   (  0  −1 ),

showing A² = − Id as well. Suppose λ ∈ σ(A) and 0 ≠ v ∈ EA(λ). Then

    −v = A²v = λ²v   ⇒   λ² = −1.

Thus, for K = R, A has no eigenvalues, σ(A) = ∅. For K = C, we obtain

    A(v1 + iv2) = v2 − iv1 = −i(v1 + iv2),
    A(v1 − iv2) = v2 + iv1 = i(v1 − iv2),

showing A to be diagonalizable with σ(A) = {i, −i} and {v1 + iv2, v1 − iv2} being a basis of V of eigenvectors of A.

(b) Over R, consider the vector spaces

    V1 := {(f : R −→ R) : f polynomial function},
    V2 := ⟨{expa : R −→ R : a ∈ R}⟩,

where

    ∀ a ∈ R :   expa : R −→ R,   expa(x) := e^{ax}.

For i ∈ {1, 2}, consider the linear map

    D : Vi −→ Vi,   D(f) := f′.

If P ∈ V1 \ {0}, then deg(DP) < deg(P). If P ∈ V1 is constant, then DP = 0 · P = 0. In consequence, 0 ∈ R is the only eigenvalue of D : V1 −→ V1. On the other hand, for each a ∈ R, D(expa) = a expa, showing σ(D) = R for D : V2 −→ V2. In this case, D is even diagonalizable, since B := {expa : a ∈ R} is a basis of V2 of eigenvectors of D.

(c) Let V be a vector space over the field F and assume char F ≠ 2. Moreover, let A ∈ L(V, V) such that A² = Id (for dim V = 2, a nontrivial example is given by each A represented by the matrix

    ( 0  1 )
    ( 1  0 )

with respect to some ordered basis of V). We claim A to be diagonalizable with σ(A) = {−1, 1} and

    V = EA(−1) ⊕ EA(1) :

Indeed, if x ∈ Im(A + Id), then there exists v ∈ V such that x = (A + Id)v, implying

    Ax = A(A + Id)v = (A² + A)v = (Id +A)v = x,

showing Im(A + Id) ⊆ EA(1). If x ∈ Im(Id −A), then there exists v ∈ V such that x = (Id −A)v, implying

    Ax = A(Id −A)v = (A − A²)v = (A − Id)v = −x,

showing Im(Id −A) ⊆ EA(−1). Now let v ∈ V be arbitrary. We use 2 ≠ 0 in F to obtain

    v = (1/2)(v + Av) + (1/2)(v − Av) ∈ Im(Id +A) + Im(Id −A) ⊆ EA(1) + EA(−1),

showing V = EA(1) + EA(−1).

(d) Let V be a vector space over the field F with dim V = 2 and ordered basis B := (v1, v2). Consider A ∈ L(V, V) such that

    Av1 = v1,   Av2 = v1 + v2.

With respect to B, A is then given by the matrix

    ( 1  1 )
    ( 0  1 ).

Due to Av1 = v1, we have 1 ∈ σ(A). Let v ∈ V. Then there exist c1, c2 ∈ F such that v = c1 v1 + c2 v2. If λ ∈ σ(A) and 0 ≠ v ∈ EA(λ), then

    λ(c1 v1 + c2 v2) = λv = Av = c1 v1 + c2 v1 + c2 v2.

As the coordinates with respect to the basis B are unique, we conclude λc1 = c1 + c2 and λc2 = c2. If c2 ≠ 0, then the second equation yields λ = 1. If c2 = 0, then c1 ≠ 0 and the first equation yields λ = 1. Altogether, we obtain σ(A) = {1} and, since A ≠ Id, A is not diagonalizable.
Definition 6.6. Let S be a set and A : S −→ S. Then U ⊆ S is called A-invariant if,
and only if, A(U ) ⊆ U .
Proposition 6.7. Let V be a vector space over the field F and let U ⊆ V be a subspace. If A ∈ L(V, V) is diagonalizable and U is A-invariant, then A↾U is diagonalizable as well.

Proof. Let A be diagonalizable, let U be A-invariant, and set AU := A↾U. Clearly, for each λ ∈ σ(A), U ∩ EA(λ) = EAU(λ) if λ ∈ σ(AU), and U ∩ EA(λ) = {0} if λ ∉ σ(AU). As A is diagonalizable, from Th. 6.3(d), we know

    V = ⊕_{λ∈σ(A)} EA(λ).        (6.6)

It suffices to show

    U = W := ∑_{λ∈σ(A)} (U ∩ EA(λ)),

since, then

    U = ⊕_{λ∈σ(AU)} EAU(λ)

due to Th. 6.3(d) and Prop. 5.2(iv). Thus, seeking a contradiction, let u ∈ U \ W (note u ≠ 0). Then there exist distinct λ1, . . . , λn ∈ σ(A), n ∈ N, such that u = ∑_{i=1}^{n} vi with vi ∈ EA(λi) \ {0} for each i ∈ {1, . . . , n}, where we may choose u ∈ U \ W such that n ∈ N is minimal. Since U is A-invariant, we know

    Au = ∑_{i=1}^{n} Avi = ∑_{i=1}^{n} λi vi ∈ U.

As λn u ∈ U as well, we conclude

    Au − λn u = ∑_{i=1}^{n−1} (λi − λn) vi ∈ U

as well. Since u ∈ U \ W was chosen such that n is minimal, we must have Au − λn u ∈ U ∩ W. Thus, there exists a finite set σu ⊆ σ(A) such that

    Au − λn u = ∑_{i=1}^{n−1} (λi − λn) vi = ∑_{λ∈σu} wλ   ∧   ∀ λ ∈ σu :   0 ≠ wλ ∈ U ∩ EA(λ).        (6.7)

Since the sum in (6.6) is direct, (6.7) and Prop. 5.2(ii) imply σu = {λ1, . . . , λn−1} and

    ∀ i ∈ {1, . . . , n − 1} :   wλi = (λi − λn) vi,

which, as λi ≠ λn, implies vi ∈ U ∩ EA(λi) ⊆ W for each i ∈ {1, . . . , n − 1}. On the other hand, this then implies

    vn = u − ∑_{i=1}^{n−1} vi ∈ U,

and, since vn ∈ EA(λn), we obtain vn ∈ W, yielding the contradiction u = vn + ∑_{i=1}^{n−1} vi ∈ W. Thus, the assumption that there exists u ∈ U \ W was false, proving U = W as desired. □

We will now use Prop. 6.7 to prove a result regarding the simultaneous diagonalizability
of linear endomorphisms:
Theorem 6.8. Let V be a vector space over the field F and let A1, . . . , An ∈ L(V, V), n ∈ N, be diagonalizable linear endomorphisms. Then the A1, . . . , An are simultaneously diagonalizable (i.e. there exists a basis B of V, consisting of eigenvectors of Ai for each i ∈ {1, . . . , n}) if, and only if,

    ∀ i, j ∈ {1, . . . , n} :   Ai Aj = Aj Ai.        (6.8)

Proof. Suppose B is a basis of V such that

    ∀ i ∈ {1, . . . , n}   ∀ b ∈ B   ∃ λi,b ∈ σ(Ai) :   Ai b = λi,b b.        (6.9)

Then

    ∀ i, j ∈ {1, . . . , n} :   Ai Aj b = λi,b λj,b b = Aj Ai b,

proving (6.8). Conversely, assume (6.8) to hold. We prove (6.9) via induction on n ∈ N. For technical reasons, we actually prove (6.9) via induction on n ∈ N in the following, clearly, equivalent form: There exists a family (Vk)_{k∈K} of subspaces of V such that

(i) V = ⊕_{k∈K} Vk.

(ii) For each k ∈ K, Vk has a basis Bk, consisting of eigenvectors of Ai for each i ∈ {1, . . . , n}.

(iii) For each k ∈ K and each i ∈ {1, . . . , n}, Vk is contained in some eigenspace of Ai, i.e.

    ∀ k ∈ K   ∀ i ∈ {1, . . . , n}   ∃ λik ∈ σ(Ai) :   ( Vk ⊆ EAi(λik),   i.e.   ∀ v ∈ Vk :   Ai v = λik v ).

(iv) For each k, l ∈ K with k ≠ l, there exists i ∈ {1, . . . , n} such that Vk and Vl are not contained in the same eigenspace of Ai, i.e.

    ∀ k, l ∈ K :   ( k ≠ l   ⇒   ∃ i ∈ {1, . . . , n} :   λik ≠ λil ).

For n = 1, we can simply use K := σ(A1) and, for each λ ∈ K, Vλ := EA1(λ). Thus, consider n > 1. By induction, assume (i) – (iv) to hold with n replaced by n − 1. It suffices to show that the spaces Vk, k ∈ K, are all An-invariant, i.e. An(Vk) ⊆ Vk: Then, according to Prop. 6.7, Ank := An↾Vk is diagonalizable, i.e.

    Vk = ⊕_{λ∈σ(Ank)} EAnk(λ).

Now each Vkλ := EAnk(λ) has a basis Bkλ, consisting of eigenvectors of An. Since, for each i ∈ {1, . . . , n − 1}, Bkλ ⊆ Vk ⊆ EAi(λik), Bkλ consists of eigenvectors of Ai for each i ∈ {1, . . . , n}. Letting Kn := {(k, λ) : k ∈ K, λ ∈ σ(Ank)}, (i) – (iv) then hold with K replaced by Kn. Thus, it remains to show An(Vk) ⊆ Vk for each k ∈ K: Fix v ∈ Vk, k ∈ K. By (6.8), we have

    ∀ j ∈ {1, . . . , n − 1} :   Aj(An v) = An(Aj v) = An(λjk v).

Moreover, there exists a finite set Kv ⊆ K such that

    An v = ∑_{l∈Kv} vl   ∧   ∀ l ∈ Kv :   vl ∈ Vl \ {0}.

Then

    ∀ j ∈ {1, . . . , n − 1} :   ∑_{l∈Kv} λjk vl = λjk(An v) = An(λjk v) = Aj(An v) = ∑_{l∈Kv} Aj vl = ∑_{l∈Kv} λjl vl.

As the sum in (i) is direct, Prop. 5.2(ii) implies λjk vl = λjl vl for each l ∈ Kv. For each l ∈ Kv, we have vl ≠ 0, implying λjk = λjl for each j ∈ {1, . . . , n − 1}. Thus, by (iv), k = l, i.e. Kv = {k} and An v = vk ∈ Vk as desired. □

In general, computing eigenvalues is a difficult task (we will say more about this issue
later in Rem. 8.3(e) below). The following results can sometimes help, where Th. 6.9(a)
is most useful for dim V small:

Theorem 6.9. Let V be a vector space over the field F , dim V = n ∈ N. Let A ∈
L(V, V ).

(a) λ ∈ F is an eigenvalue of A if, and only if,

det(λ Id −A) = 0.

(b) If there exists a basis B of V such that the matrix (aji) ∈ M(n, F) of A with respect to B is upper or lower triangular, then the diagonal elements aii are precisely the eigenvalues of A, i.e. σ(A) = {aii : i ∈ {1, . . . , n}}.

Proof. (a): According to Th. 6.3(a), λ ∈ σ(A) is equivalent to λ Id −A not being in-
jective, which (as V is finite-dimensional) is equivalent to det(λ Id −A) = 0 by Cor.
4.37(b).

(b): For each λ ∈ F, we have

    det(λ Id −A) = det(λ Idn −(aji)) = ∏_{i=1}^{n} (λ − aii).

Thus, by (a), σ(A) = {aii : i ∈ {1, . . . , n}}. □

Example 6.10. Consider the vector space V := R² over R and, with respect to the standard basis, let A ∈ L(V, V) be given by the matrix

    M := ( 3  −2 )
         ( 1   0 ).

Then, for each λ ∈ R,

    det(λ Id −A) = det ( λ − 3   2 ; −1   λ ) = (λ − 3) · λ + 2 = λ² − 3λ + 2 = (λ − 1)(λ − 2),

i.e. σ(A) = {1, 2} by Th. 6.9(a). Since

    M (v1, v2)^t = (v1, v2)^t   ⇒   v1 = v2,        M (v1, v2)^t = (2v1, 2v2)^t   ⇒   v1 = 2v2,

B := {(1, 1), (2, 1)} is a basis of eigenvectors of A.

Remark 6.11. Let V be a vector space over the field F, dim V = n ∈ N, and A ∈ L(V, V). Moreover, let λ ∈ σ(A). Clearly, one has

    {0} ⊆ ker(A − λ Id) ⊆ ker(A − λ Id)² ⊆ . . .

and the inclusion can be strict at most n times. Let

    r(λ) := min{ k ∈ N : ker(A − λ Id)^k = ker(A − λ Id)^{k+1} }.        (6.10)

Then

    ∀ k ∈ N :   ker(A − λ Id)^{r(λ)} = ker(A − λ Id)^{r(λ)+k} :        (6.11)

Indeed, otherwise, let k0 := min{k ∈ N : ker(A − λ Id)^{r(λ)} ⊊ ker(A − λ Id)^{r(λ)+k}}. Then there exists v ∈ V such that (A − λ Id)^{r(λ)+k0} v = 0, but (A − λ Id)^{r(λ)+k0−1} v ≠ 0. However, that means w := (A − λ Id)^{k0−1} v ∈ ker(A − λ Id)^{r(λ)+1}, but w ∉ ker(A − λ Id)^{r(λ)}, in contradiction to the definition of r(λ).

Definition 6.12. Let V be a vector space over the field F, dim V = n ∈ N, and A ∈ L(V, V). Moreover, let λ ∈ σ(A). The number

    ma(λ) := dim ker(A − λ Id)^{r(λ)} ∈ {1, . . . , n},        (6.12)

where r(λ) is given by (6.10), is called the algebraic multiplicity of the eigenvalue λ, whereas

    mg(λ) := dim ker(A − λ Id) ∈ {1, . . . , n}        (6.13)

is called its geometric multiplicity. We call λ simple if, and only if, mg(λ) = ma(λ) = 1; we call λ semisimple if, and only if, mg(λ) = ma(λ). For each k ∈ {1, . . . , r(λ)}, the space

    EA^k(λ) := ker(A − λ Id)^k

is called the generalized eigenspace of rank k of A, corresponding to the eigenvalue λ; each v ∈ EA^k(λ) \ EA^{k−1}(λ), k ≥ 2, is called a generalized eigenvector of rank k corresponding to the eigenvalue λ (an eigenvector v ∈ EA(λ) is sometimes called a generalized eigenvector of rank 1).

Proposition 6.13. Let V be a vector space over the field F , dim V = n ∈ N, and
A ∈ L(V, V ).

(a) If λ ∈ σ(A) and r(λ) is given by (6.10), then

    1 ≤ r(λ) ≤ n.        (6.14)

If k ∈ N is such that 1 ≤ k < k + 1 ≤ r(λ), then

    1 ≤ mg(λ) = dim EA^1(λ) ≤ dim EA^k(λ) < dim EA^{k+1}(λ) ≤ dim EA^{r(λ)}(λ) = ma(λ) ≤ n,        (6.15)

which implies, in particular,

    r(λ) ≤ ma(λ),        (6.16a)
    1 ≤ mg(λ) ≤ ma(λ) ≤ n,        (6.16b)
    0 ≤ ma(λ) − mg(λ) ≤ n − 1.        (6.16c)

(b) If λ ∈ σ(A), then, for each k ∈ {1, . . . , r(λ)}, the generalized eigenspace EA^k(λ) is A-invariant, i.e.

    A(EA^k(λ)) ⊆ EA^k(λ).

(c) If A is diagonalizable, then mg(λ) = ma(λ) holds for each λ ∈ σ(A) (but cf. Ex. 6.14 below).

Proof. (a): Both (6.14) and (6.15) are immediate from Rem. 6.11 together with the definitions of r(λ), mg(λ) and ma(λ). Then (6.16a) follows from (6.15), since dim EA^{k+1}(λ) − dim EA^k(λ) ≥ 1; (6.16b) is immediate from (6.15); (6.16c) is immediate from (6.16b).

(b): Due to A(A − λ Id) = (A − λ Id)A, one has

    ∀ k ∈ N0 :   A(ker(A − λ Id)^k) ⊆ ker(A − λ Id)^k :

Indeed, if v ∈ ker(A − λ Id)^k, then

    (A − λ Id)^k(Av) = A(A − λ Id)^k v = 0.

(c): Exercise. □
Example 6.14. Let V be a vector space over the field F, dim V = n ∈ N, n ≥ 2. Let λ ∈ F. We show that there always exists a map A ∈ L(V, V) such that λ ∈ σ(A) and such that the difference between ma(λ) and mg(λ) is maximal, namely

    ma(λ) − mg(λ) = n − 1 :

Let B = (v1, . . . , vn) be an ordered basis of V, and let A ∈ L(V, V) be such that

    Av1 = λv1   ∧   ∀ i ∈ {2, . . . , n} :   Avi = λvi + v_{i−1}.

Then, with respect to B, A is represented by the matrix M with λ on the diagonal, 1 on the superdiagonal, and 0 everywhere else:

    M := ( λ  1            )
         (    λ  1         )
         (       .  .      )
         (          λ  1   )
         (             λ   ).

We use an induction over k ∈ {1, . . . , n} to show

    ∀ k ∈ {1, . . . , n} :   ( ∀ 1 ≤ i ≤ k :   (A − λ Id)^k vi = 0   ∧   ∀ k < i ≤ n :   (A − λ Id)^k vi = v_{i−k} ) :        (6.17)

For k = 1, we have

    (A − λ Id)v1 = λv1 − λv1 = 0,   ∀ 1 < i ≤ n :   (A − λ Id)vi = λvi + v_{i−1} − λvi = v_{i−1},

as needed. For k > 1, we have, using the induction hypothesis,

    ∀ 1 ≤ i < k :   (A − λ Id)^k vi = 0,        (A − λ Id)^k vk = (A − λ Id)v1 = 0,
    ∀ k < i ≤ n :   (A − λ Id)^k vi = (A − λ Id)v_{i−k+1} = λv_{i−k+1} + v_{i−k} − λv_{i−k+1} = v_{i−k},

completing the induction. In particular, since dim V = n, (6.17) yields, for each 1 < k ≤ n, dim EA^k(λ) − dim EA^{k−1}(λ) = 1 and dim EA^k(λ) = k, implying mg(λ) = 1 and ma(λ) = n, as claimed.
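The dimension count dim EA^k(λ) = k can be confirmed numerically, e.g. for n = 4 and λ = 2 (an added sketch assuming numpy; it uses dim ker N^k = n − rank N^k):

    import numpy as np

    n, lam = 4, 2.0
    A = lam * np.eye(n) + np.diag(np.ones(n - 1), k=1)   # the matrix M above
    N = A - lam * np.eye(n)
    for k in range(1, n + 1):
        dim_ker = n - np.linalg.matrix_rank(np.linalg.matrix_power(N, k))
        print(k, dim_ker)   # prints 1 1, 2 2, 3 3, 4 4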

Definition 6.15. Let n ∈ N. Consider the vector space V := F^n over the field F. All of the notions we introduced in this section for linear endomorphisms A ∈ L(V, V) (e.g. eigenvalue, eigenvector, eigenspace, multiplicity of an eigenvalue, diagonalizability, etc.), one also defines for quadratic matrices M ∈ M(n, F): The notions are then meant with respect to the linear map AM that M represents with respect to the standard basis of F^n.

Example 6.16. Let F be a field and n ∈ N. If M ∈ M(n, F) is diagonalizable, then, according to Def. 6.15, there exists a regular matrix T ∈ GLn(F) and a diagonal matrix D ∈ M(n, F) such that D = T^{−1} M T. A simple induction then shows

    ∀ k ∈ N0 :   M^k = T D^k T^{−1}.

Clearly, if one knows T and T^{−1}, this can tremendously simplify the computation of M^k, especially if k is large and M is fully populated. However, computing T and T^{−1} can also be difficult, and it depends on the situation if it is a good option to pursue this route.

7 Commutative Rings, Polynomials


We have already seen in the previous section that the eigenvalues of a quadratic matrix are precisely the zeros of its characteristic polynomial function. In order to further
study the relation between certain polynomials and the structure of a matrix (and the
structure of corresponding linear maps), we will need to investigate some of the general
theory of polynomials. We will take this opportunity to also learn more about the
general theory of commutative rings, which is of algebraic interest beyond our current
interest in matrix-related polynomials.

Definition 7.1. Let R be a commutative ring with unity. We call

    R[X] := R_fin^{N0} := { (f : N0 −→ R) : #f^{−1}(R \ {0}) < ∞ }        (7.1)

the set of polynomials over R (i.e. a polynomial over R is a sequence (ai)_{i∈N0} in R such that all, but finitely many, of the entries ai are 0, cf. [Phi19, Ex. 5.16(c)]). We then have the pointwise-defined addition and scalar multiplication on R[X], which it inherits from R^{N0}:

    ∀ f, g ∈ R[X] :   (f + g) : N0 −→ R,   (f + g)(i) := f(i) + g(i),
    ∀ f ∈ R[X]   ∀ λ ∈ R :   (λ · f) : N0 −→ R,   (λ · f)(i) := λ f(i),        (7.2)

where we know from [Phi19, Ex. 5.16(c)] that, with these compositions, R[X] forms a vector space over R, provided R is a field and, then, B = {ei : i ∈ N0}, where

    ∀ i ∈ N0 :   ei : N0 −→ R,   ei(j) := δij,

provides the standard basis of the vector space R[X]. In the current context, we will now write X^i := ei and we will call these polynomials monomials. Furthermore, we define a multiplication on R[X] by letting

    (ai)_{i∈N0} · (bi)_{i∈N0} := (ci)_{i∈N0},   ci := ∑_{k+l=i} ak bl := ∑_{(k,l)∈(N0)² : k+l=i} ak bl = ∑_{k=0}^{i} ak b_{i−k}.        (7.3)

If f := (ai)_{i∈N0} ∈ R[X], then we call the ai ∈ R the coefficients of f, and we define the degree of f by

    deg f := −∞ for f ≡ 0,   deg f := max{i ∈ N0 : ai ≠ 0} for f ≢ 0        (7.4)

(defining deg(0) = −∞ instead of deg(0) = −1 has the advantage that formulas (7.5a), (7.5b) below then also hold for the zero polynomial). If deg f = n ∈ N0 and an = 1, then the polynomial f is called monic.

Remark 7.2. In the situation of Def. 7.1, using the notation X^i = ei, we can write addition, scalar multiplication, and multiplication in the following, perhaps, more familiar-looking forms: If λ ∈ R, f = ∑_{i=0}^{n} fi X^i, g = ∑_{i=0}^{n} gi X^i, n ∈ N0, f0, . . . , fn, g0, . . . , gn ∈ R, then

    f + g = ∑_{i=0}^{n} (fi + gi) X^i,
    λf = ∑_{i=0}^{n} (λfi) X^i,
    fg = ∑_{i=0}^{2n} ( ∑_{k+l=i} fk gl ) X^i.

Recall from [Phi19, Def. and Rem. 4.41] that an element x in a ring with unity R is called invertible if, and only if, there exists x̄ ∈ R such that x x̄ = x̄ x = 1, and that (R∗, ·) denotes the group of invertible elements of R.

Lemma 7.3. Let R be a ring with unity and x ∈ R∗.

(a) x is not a zero divisor.

(b) If S is also a ring with unity and φ : R −→ S is a unital ring homomorphism, then φ(x) ∈ S∗.

Proof. (a): Let x ∈ R∗ and x̄ ∈ R such that x x̄ = x̄ x = 1. If y ∈ R such that xy = 0, then y = 1 · y = x̄ x y = 0; if y ∈ R such that yx = 0, then y = y · 1 = y x x̄ = 0. In consequence, x is not a zero divisor.

(b): Let x̄ ∈ R such that x x̄ = x̄ x = 1. Then

    1 = φ(1) = φ(x x̄) = φ(x̄ x) = φ(x)φ(x̄) = φ(x̄)φ(x),

proving φ(x) ∈ S∗. □
Theorem 7.4. Let R be a commutative ring with unity.

(a) If f, g ∈ R[X] with f = (fi)_{i∈N0}, g = (gi)_{i∈N0}, then

    deg(f + g) = −∞ ≤ max{deg f, deg g}   if f = −g,
    deg(f + g) = max{i ∈ N0 : fi ≠ −gi} ≤ max{deg f, deg g}   otherwise,        (7.5a)

    deg(fg) ≤ deg f + deg g.        (7.5b)

If the highest nonzero coefficient of f or of g is not a zero divisor (e.g. if this coefficient is an invertible element, cf. Lem. 7.3), then one even has

    deg(fg) = deg f + deg g.        (7.5c)

(b) (R[X], +, ·) forms a commutative ring with unity, where 1 = X^0 is the neutral element of multiplication.

Proof. (a): If f ≡ 0, then f + g = g and fg ≡ 0, i.e. the degree formulas hold if f ≡ 0 or g ≡ 0. It is also immediate from (7.2) that (7.5a) holds in the remaining case. If deg f = n ∈ N0, deg g = m ∈ N0, then, for each i ∈ N0 with i > m + n, we have, for k, l ∈ N0 with k + l = i, that k > n or l > m, showing (fg)i = ∑_{k+l=i} fk gl = 0, proving (7.5b). If fn is not a zero divisor, then (fg)_{m+n} = fn gm ≠ 0, proving (7.5c).

(b): We already know from [Phi19, Ex. 4.9(e)] that (R[X], +) forms a commutative group. To verify associativity of multiplication, let a, b, c, d, f, g, h ∈ R[X],

    a := (ai)_{i∈N0},   b := (bi)_{i∈N0},   c := (ci)_{i∈N0},   d := (di)_{i∈N0},
    f := (fi)_{i∈N0},   g := (gi)_{i∈N0},   h := (hi)_{i∈N0},

such that d := ab, f := bc, g := (ab)c, h := a(bc). Then, for each i ∈ N0,

    gi = ∑_{k+l=i} dk cl = ∑_{k+l=i} ∑_{m+n=k} am bn cl = ∑_{m+n+l=i} am bn cl
       = ∑_{m+k=i} ∑_{n+l=k} am bn cl = ∑_{m+k=i} am fk = hi,

proving g = h, as desired. To verify distributivity, let a, b, c, d, f, g ∈ R[X] be as before, but this time such that d := ab, f := ac, and g := a(b + c). Then, for each i ∈ N0,

    gi = ∑_{k+l=i} ak (bl + cl) = ∑_{k+l=i} ak bl + ∑_{k+l=i} ak cl = di + fi,

proving g = d + f, as desired. To verify commutativity of multiplication, let a, b, c, d ∈ R[X] be as before, but this time such that c := ab, d := ba. Then, for each i ∈ N0,

    ci = ∑_{k+l=i} ak bl = ∑_{k+l=i} bl ak = di,

proving c = d, as desired. Finally, if b := X^0, then b0 = 1 and bi = 0 for i > 0, yielding, for c := ab and each i ∈ N0,

    ci = ∑_{k+l=i} ak bl = ∑_{k+0=i} ak b0 = ai,

showing X^0 to be neutral and completing the proof. □


Definition 7.5. Let R, R′ be rings (with unity). We call R′ a ring extension of R if,
and only if, there exists a (unital) ring monomorphism ι : R −→ R′ (if R′ is a ring
extension of R, then one might even identify the elements of R and ι(R) and consider
R to be a subset of R′ ). If R, R′ are fields, then one also calls R′ a field extension of R.
Example 7.6. (a) If R is a commutative ring with unity, then R[X] is a ring extension of R via the unital ring monomorphism

    ι : R −→ R[X],   ι(r) := r X^0 :

Indeed, ι is unital, since ι(1) = X^0; ι is a ring homomorphism, since, for each r, s ∈ R, ι(r + s) = (r + s)X^0 = rX^0 + sX^0 = ι(r) + ι(s) and ι(rs) = rsX^0 = rX^0 · sX^0 = ι(r) ι(s); ι is injective, since, for r ≠ 0, ι(r) = rX^0 ≢ 0.

(b) If R is a ring (with unity) and n ∈ N, then the matrix ring M(n, R) (cf. [Phi19, Ex. 7.7(c)]) is a ring extension of R via the (unital) ring monomorphism

    ι : R −→ M(n, R),   ι(r) := diag(r, . . . , r) :

Indeed, ι is a ring homomorphism, since, for each r, s ∈ R,

    ι(r + s) = diag(r + s, . . . , r + s) = diag(r, . . . , r) + diag(s, . . . , s) = ι(r) + ι(s),
    ι(rs) = diag(rs, . . . , rs) = diag(r, . . . , r) diag(s, . . . , s) = ι(r) ι(s)   (cf. [Phi19, (7.28)]);

ι is injective, since, for r ≠ 0, ι(r) = diag(r, . . . , r) ≠ 0. If R is a ring with unity, then ι is unital, since ι(1) = Idn.
Proposition 7.7. Let R be a commutative ring with unity without nonzero zero divisors. Then R[X] has no nonzero zero divisors and (R[X])∗ = R∗.

Proof. Since R has no nonzero zero divisors, (7.5c) holds, i.e., if f, g ∈ R[X] with f, g ≠ 0, then deg(fg) = deg f + deg g ≥ 0, showing fg ≠ 0, such that f, g cannot be zero divisors. First note that R∗ ⊆ (R[X])∗ always holds according to Lem. 7.3(b). If f ∈ R[X] \ R, then deg f ≥ 1 and (7.5c) implies deg(fg) ≥ 1 for each 0 ≠ g ∈ R[X], i.e. fg ≠ 1. This shows f ∉ (R[X])∗ and also that each g ∈ R \ R∗ is not in (R[X])∗, thereby proving R∗ = (R[X])∗. □
Example 7.8. Let R := Z4 = Z/(4Z). Then (due to 4 = 0 in Z4)

    (2X^1 + 1X^0)(2X^1 + 1X^0) = 0X^2 + 0X^1 + 1X^0 = X^0,

showing 2X^1 + 1X^0 ∈ (R[X])∗ \ R∗, i.e. (R[X])∗ ≠ R∗ can occur if R has nonzero zero divisors. This also provides an example, where the degree formula (7.5c) does not hold.

Next, we will prove a remainder theorem for polynomials, which can be seen as an analogue of the remainder theorem for integers (cf. [Phi19, Th. D.1]):
Theorem 7.9 (Remainder Theorem). Let R be a commutative ring with unity. Let g = ∑_{i=0}^{d} gi X^i ∈ R[X], deg g = d, where gd ∈ R∗. Then, for each f ∈ R[X], there exist unique polynomials q, r ∈ R[X] such that

    f = qg + r   ∧   deg r < d.        (7.6)

Proof. Uniqueness: Suppose f = qg + r = q′g + r′ with q, q′, r, r′ ∈ R[X] and deg r, deg r′ < d. Then, by (7.5a), (7.5c),

    0 = (q − q′)g + (r − r′)   ⇒   deg(q − q′) + deg g = deg(r − r′).

However, since deg(r − r′) < d and deg g = d, this can only hold for q = q′, which, in turn, implies r = r′ as well.

Existence: We prove the existence of q, r ∈ R[X], satisfying (7.6), via induction on n := deg f ∈ N0 ∪ {−∞}: If deg f < d, then set q := 0 and r := f (this, in particular, takes care of the base case f ≡ 0). Now suppose f = ∑_{i=0}^{n} fi X^i with fn ≠ 0, n ≥ d. Define

    h := f − fn gd^{−1} X^{n−d} g.

Then the degree of the subtracted polynomial is n, where the coefficient of X^n is fn gd^{−1} gd = fn, implying deg h < n. Thus, by the induction hypothesis, there exist qh, rh ∈ R[X] such that h = qh g + rh and deg rh < d. Then

    f = h + fn gd^{−1} X^{n−d} g = qh g + rh + fn gd^{−1} X^{n−d} g = (qh + fn gd^{−1} X^{n−d}) g + rh.

Hence, letting q := qh + fn gd^{−1} X^{n−d} and r := rh completes the proof. □
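The existence part of the proof is, in fact, an algorithm: one repeatedly subtracts fn gd^{−1} X^{n−d} g to eliminate the leading term. The following Python sketch (added illustration with a made-up helper name; coefficient lists run from degree 0 upwards, over Q) mirrors that procedure:

    from fractions import Fraction

    def poly_divmod(f, g):
        f = [Fraction(x) for x in f]
        g = [Fraction(x) for x in g]
        d = len(g) - 1                   # deg g (leading coefficient g[d] != 0)
        q = [Fraction(0)] * max(len(f) - d, 1)
        r = f[:]
        while len(r) - 1 >= d and any(r):
            n = len(r) - 1               # current degree of the remainder
            coeff = r[n] / g[d]          # f_n * g_d^{-1}, as in the proof
            q[n - d] += coeff
            for i in range(d + 1):       # subtract coeff * X^{n-d} * g
                r[n - d + i] -= coeff * g[i]
            while len(r) > 1 and r[-1] == 0:
                r.pop()                  # drop leading terms that vanished
        return q, r

    # X^3 + 1 = (X^2 - X + 1)(X + 1) + 0:
    q, r = poly_divmod([1, 0, 0, 1], [1, 1])
    print(q, r)   # quotient 1 - X + X^2, remainder 0 (as Fractions)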
Definition and Remark 7.10. Let R be a commutative ring with unity and let R′ be a ring extension of R, where ι : R −→ R′ is a unital ring monomorphism. For each x ∈ R′ such that

    ∀ r ∈ R :   rx = xr        (7.7)

(note that we do not require R′ to be commutative, cf. Ex. 7.6(b)), the map

    ǫx : R[X] −→ R′,   f ↦ ǫx(f) = ǫx( ∑_{i=0}^{deg f} fi X^i ) := ∑_{i=0}^{deg f} fi x^i,        (7.8)

is called the substitution homomorphism or evaluation homomorphism corresponding to x (a typical example, where one wants to use a proper ring extension rather than R, is the substitution of matrices from M(n, R), n ∈ N, for X): Indeed, if x ∈ R′ satisfies (7.7), then ǫx is unital, since ǫx(X^0) = x^0 = 1; ǫx is a ring homomorphism, since, for each f = ∑_{i=0}^{deg f} fi X^i ∈ R[X] and g = ∑_{i=0}^{deg g} gi X^i ∈ R[X], one has

    ǫx(f + g) = ∑_{i=0}^{deg(f+g)} (fi + gi) x^i = ∑_{i=0}^{deg f} fi x^i + ∑_{i=0}^{deg g} gi x^i = ǫx(f) + ǫx(g),

    ǫx(fg) = ∑_{i=0}^{deg(fg)} ( ∑_{k+l=i} fk gl ) x^i = ∑_{k=0}^{deg f} ∑_{l=0}^{deg g} fk gl x^{k+l}
           = ( ∑_{k=0}^{deg f} fk x^k ) ( ∑_{l=0}^{deg g} gl x^l ) = ǫx(f) ǫx(g),

where the last-but-one equality uses (7.7). Moreover, ǫx is linear, since, for each λ ∈ R,

    ǫx(λf) = ∑_{i=0}^{deg f} (λfi) x^i = λ ∑_{i=0}^{deg f} fi x^i = λ ǫx(f).

We call x ∈ R′ a zero or a root of f ∈ R[X] if, and only if, ǫx(f) = 0.

Definition 7.11. Let F be a field. We call F algebraically closed if, and only if, for each
f ∈ F [X] with deg f ≥ 1, there exists λ ∈ F such that ǫλ (f ) = 0, i.e. such that λ is a zero
of f , as defined in Def. and Rem. 7.10 (cf. Th. 7.38). It is an important result of Algebra
that every field F is contained in an algebraically closed field (a proof is provided in Th.
D.16 of the Appendix – some additional required material from nonlinear algebra, not
covered in the main text, is also included in the Appendix).

Notation 7.12. Let R be a commutative ring with unity. One commonly uses the simplified notation X := X^1 ∈ R[X] and s := s X^0 ∈ R[X] for each s ∈ R.

Proposition 7.13. Let R be a commutative ring with unity. For each f ∈ R[X] with deg f = n ∈ N and each s ∈ R, there exists q ∈ R[X] with deg q = n − 1 such that

    f = ǫs(f) + (X − s) q = ǫs(f) X^0 + (X^1 − s X^0) q,        (7.9a)

where ǫs is the substitution homomorphism according to Def. and Rem. 7.10. In particular, if s is a zero of f, then

    f = (X − s) q.        (7.9b)

Proof. According to Th. 7.9, there exist q, r ∈ R[X] such that f = q(X − s) + r with deg r < deg(X − s) = 1. Thus, r ∈ R and deg q = n − 1 by (7.5c) (which holds, as X − s is monic). Applying ǫs to f = q(X − s) + r then yields ǫs(f) = ǫs(q)(s − s) + r = r, proving (7.9). □

Corollary 7.14. Let F be a field. If f ∈ F[X] with deg f = n ∈ N0, then f has at most n zeros. Moreover, there exist k ∈ {0, . . . , n} and q ∈ F[X] with deg q = n − k and such that

    f = q ∏_{j=1}^{k} (X − λj),        (7.10a)

where q does not have any zeros in F and N := {λ1, . . . , λk} = {λ ∈ F : ǫλ(f) = 0} is the set of zeros of f (N = ∅ and f = q is possible, and it can also occur that all λj in (7.10a) are identical). We can rewrite (7.10a) as

    f = q ∏_{j=1}^{l} (X − µj)^{mj},        (7.10b)

where µ1, . . . , µl ∈ F, l ∈ {0, . . . , k}, are the distinct zeros of f, and mj ∈ N with ∑_{j=1}^{l} mj = k. Then mj is called the multiplicity of the zero µj of f.

If F is algebraically closed, then k = n and q ∈ F.


Proof. For f ∈ F [X] with deg f = n ∈ N0 , the representation (7.10a) follows from (7.9b)
combined with a straightforward induction, and (7.10b) is immediate from (7.10a). From
(7.9b) combined with the degree formula (7.5c), we then also know k ≤ n, i.e. f ∈ F [X]
with deg f = n ∈ N0 can have at most n zeros. If F is algebraically closed and q has no
zeros, then deg q = 0, implying k = n. □
Example 7.15. (a) Since C is algebraically closed by [Phi16a, Th. 8.32], for each f ∈ C[X] with n := deg f ∈ N, there exist numbers c, λ1, . . . , λn ∈ C such that

    f = c ∏_{j=1}^{n} (X − λj)        (7.11)

(the λ1, . . . , λn are precisely all the zeros of f, some or all of which might be identical).

(b) For each f ∈ R[X] with n := deg f ∈ N, there exist numbers n1, n2 ∈ N0 and c, ξ1, . . . , ξn1, α1, . . . , αn2, β1, . . . , βn2 ∈ R such that

    n = n1 + 2n2        (7.12a)

and

    f = c ∏_{j=1}^{n1} (X − ξj) ∏_{j=1}^{n2} (X² + αj X + βj) :        (7.12b)

Indeed, if f has only real coefficients, then we can take complex conjugates to obtain, for each λ ∈ C, that ǫλ(f) = 0 implies ǫλ̄(f) = 0 (as ǫλ̄(f) is the complex conjugate of ǫλ(f)), showing that the nonreal zeros of f (if any) must occur in conjugate pairs. Moreover,

    (X − λ)(X − λ̄) = X² − (λ + λ̄)X + λλ̄ = X² − 2X Re λ + |λ|²,

showing that (7.11) implies (7.12).


Remark 7.16. Let R be a commutative ring with unity. In Def. 4.32, we defined poly-
nomial functions (of several variables). The following Th. 7.17 illuminates the relation
between the polynomials of Def. 7.1 and the polynomial functions (of one variable) of
Def. 4.32 (for a generalization to polynomials of several variables, see Th. B.23 and Rem.
B.24 of the Appendix). Let Pol(R) denote the set of polynomial functions from R into R. Clearly, Pol(R) is a subring with unity of R^R (cf. [Phi19, Ex. 4.42(a)]). If R is a field, then Pol(R) also is a vector subspace of R^R.

Theorem 7.17. Let R be a commutative ring with unity and consider the map

    φ : R[X] −→ Pol(R),   f ↦ φ(f),   φ(f)(x) := ǫx(f).        (7.13)

(a) φ is a unital ring epimorphism. If R is a field, then φ is also a linear epimorphism.


(b) If R is finite, then φ is not a monomorphism.
(c) If F := R is an infinite field, then φ is an isomorphism.

Proof. (a): If f = X^0, then φ(f) ≡ 1. We also know from Def. and Rem. 7.10 that, for each x ∈ R, ǫx is a linear ring homomorphism. Thus, if f, g ∈ R[X] and λ ∈ R, then, for each x ∈ R,

    φ(f + g)(x) = ǫx(f + g) = ǫx(f) + ǫx(g) = (φ(f) + φ(g))(x),
    φ(fg)(x) = ǫx(fg) = ǫx(f) ǫx(g) = (φ(f)φ(g))(x),
    φ(λf)(x) = ǫx(λf) = λ ǫx(f) = (λ φ(f))(x).

Moreover, φ is an epimorphism since, if P ∈ Pol(R) with P(x) = ∑_{i=0}^{n} fi x^i, where f0, . . . , fn ∈ R, n ∈ N0, then P = φ(f) with f = ∑_{i=0}^{n} fi X^i.

(b): If R is finite, then R^R and Pol(R) ⊆ R^R are finite, whereas R[X] = R_fin^{N0} is infinite (also cf. Ex. 7.18 below).

(c): If F is an infinite field and f ∈ F[X] is such that P := φ(f) ≡ 0, then each λ ∈ F is a zero of f, showing f to have infinitely many zeros. Thus, according to Cor. 7.14, deg f ∉ N, implying f = 0. Thus, ker φ = {0} and φ is a monomorphism. □
Example 7.18. If R is a finite commutative ring with unity, then f := ∏_{λ∈R} (X^1 − λ) ∈ R[X] \ {0}, but, using φ of (7.13), φ(f) ≡ 0. For a concrete example, consider the field with two elements, R := Z2 = {0, 1}. Then, 0 ≠ f := X^2 + X^1 = X^1(X^1 + 1) ∈ R[X], but, for each x ∈ R, φ(f)(x) = x(x + 1) = 0.
Remark 7.19. If F is a field and P ∈ Pol(F) can be written as P(x) = ∑_{i=0}^{n} ai x^i with ai ∈ F, n ∈ N0, then the representation is unique if, and only if, F has at least n + 1 elements (in particular, the monomial functions x ↦ x^i, i ∈ {0, . . . , n}, are linearly independent if, and only if, F has at least n + 1 elements): Indeed, if F has less than n + 1 elements, then, as in Ex. 7.18 (and again using φ of (7.13)), φ(f) = 0, where f := ∏_{λ∈F} (X^1 − λ) ∈ F[X] \ {0} and 1 ≤ deg f ≤ n. If g := ∑_{i=0}^{n} ai X^i, then P = φ(g) = φ(g + f). Since g + f ≠ g and deg(g + f) ≤ n, if we write g + f = ∑_{i=0}^{n} bi X^i with bi ∈ F, then P(x) = ∑_{i=0}^{n} bi x^i, showing the nonuniqueness of the representation of P. Conversely, assume F has at least n + 1 elements. If P(x) = ∑_{i=0}^{n} bi x^i with bi ∈ F is also a representation of P, then φ(h) = 0, where h := ∑_{i=0}^{n} (bi − ai)X^i. Thus, h has at least n + 1 zeros and deg h ≤ n together with Cor. 7.14 implies h = 0, i.e. ai = bi for each i ∈ {0, . . . , n}. In consequence, the representation of P is unique.

Definition 7.20. Let R be a commutative ring with unity and 1 6= 0.

(a) We call R integral or an integral domain if, and only if, R does not have any nonzero
zero divisors.
(b) We call R Euclidean if, and only if, R is an integral domain and there exists a map
deg : R \ {0} −→ N0 such that, for each f, g ∈ R with g 6= 0, there exist q, r ∈ R,
satisfying 
f = qg + r ∧ deg r < deg g ∨ r = 0 . (7.14)
The map deg is then called a degree map or a Euclidean map of R.
Example 7.21. (a) Every field F is a Euclidean ring, where we can choose deg : F∗ = F \ {0} −→ N0, deg ≡ 0, as the degree map: If g ∈ F∗, then, given f ∈ F, we choose q := f g^{−1} and r := 0. Then, clearly, (7.14) holds.

(b) Z is a Euclidean ring, where we can choose deg : Z \ {0} −→ N0, deg(k) := |k|, as the degree map: According to [Phi19, Th. D.1], for each f, g ∈ N, there exist q, r ∈ N0 such that f = qg + r and 0 ≤ r < g, also implying −f = −qg − r with |−r| < g, f = −q(−g) + r, and −f = q(−g) − r, showing that (7.14) can always be satisfied (for f = 0, merely set q := r := 0).

(c) If F is a field, then F[X] is a Euclidean ring, where we can choose the degree map as in Def. 7.1: If F is a field, then, for each 0 ≠ g ∈ F[X], we can apply Th. 7.9 to obtain (7.14).
Definition 7.22. Let R be a commutative ring with unity.

(a) a ⊆ R is called an ideal in R if, and only if, the following two conditions hold:
(i) (a, +) is a subgroup of (R, +).
(ii) For each x ∈ R and each a ∈ a, one has ax ∈ a (which, as 1 ∈ R, is equivalent
to aR = a).
(b) An ideal a ⊆ R is called principal if, and only if, there exists a ∈ R such that
a = (a) := aR.
(c) R is called principal if, and only if, every ideal in R is principal.
(d) R is called a principal ideal domain if, and only if, R is both principal and integral.
Remark 7.23. Let R be a commutative ring with unity and let a ⊆ R be an ideal.
Since (a, +) is a subgroup of (R, +) and a, b ∈ a implies ab ∈ a, a is always a subring
of R. However, as 1 ∈ a implies a = R, (0) 6= a is a subring with unity if, and only if,
a = R.

Proposition 7.24. Let R be a commutative ring with unity.

(a) {0} = (0) and (1) = R are principal ideals of R.


(b) If S is a ring and φ : R −→ S is a ring homomorphism, then ker φ = φ^{−1}{0} is an ideal in R.

(c) Let S be a commutative ring with unity and let φ : R −→ S be a ring homomorphism. If b ⊆ S is an ideal in S, then φ^{−1}(b) is an ideal in R. If φ is even an epimorphism and a ⊆ R is an ideal in R, then φ(a) is an ideal in S (however, cf. Ex. 7.27(c) below).

(d) If a and b are ideals in R, then a + b is an ideal in R.

(e) If (ai)_{i∈I} is a family of ideals in R, I ≠ ∅, then a := ⋂_{i∈I} ai is an ideal in R as well.

(f) If I ≠ ∅ is an index set, partially ordered by ≤ in a way such that, for each i, j ∈ I, there exists k ∈ I with i, j ≤ k (if I is totally ordered by ≤, then one can use k := max{i, j}), and (ai)_{i∈I} is an increasing family of ideals in R (i.e., for each i, j ∈ I with i ≤ j, one has ai ⊆ aj), then a := ⋃_{i∈I} ai is an ideal in R as well.

Proof. (a) is clear.


(b): If x ∈ R and a ∈ ker φ, then φ(ax) = φ(a)φ(x) = 0 · φ(x) = 0, showing ax ∈ ker φ.
We also know (ker φ, +) to be a subgroup of (R, +).
(c): According to [Phi19, Th. 4.20(a)], (φ^{−1}(b), +) is a subgroup of (R, +). Moreover, if x ∈ R and a ∈ φ^{−1}(b), then φ(a) ∈ b and, thus, φ(ax) = φ(a)φ(x) ∈ b, since b is an ideal in S. Thus, ax ∈ φ^{−1}(b), showing φ^{−1}(b) to be an ideal in R. According to [Phi19,
Th. 4.20(d)], (φ(a), +) is a subgroup of (S, +). Moreover, if y ∈ S and b ∈ φ(a), then
there exist x ∈ R and a ∈ a such that y = φ(x) and b = φ(a). Then
by = φ(a)φ(x) = φ(ax),
showing by ∈ φ(a), since ax ∈ a (using a being an ideal in R). Thus, φ(a) is an ideal in
S.
(d): If x ∈ R, a ∈ a, and b ∈ b, then x(a + b) = xa + xb ∈ a + b. Moreover, if a1 , a2 ∈ a
and b1 , b2 ∈ b, then (a1 + b1 ) + (a2 + b2 ) = (a1 + a2 ) + (b1 + b2 ) ∈ a + b as well as
−(a1 + b1 ) = −a1 − b1 ∈ a + b, showing (a + b, +) to be a subgroup of (R, +).
(e): Let x ∈ R and a ∈ a. Then, for each i ∈ I, a ∈ ai , and, thus xa ∈ ai , showing
xa ∈ a. We also know (a, +) to be a subgroup of (R, +) by [Phi19, Th. 4.18(d)].
(f): Let x ∈ R and a ∈ a. Then there exists i ∈ I such that a ∈ ai , implying xa ∈ ai ⊆ a.
We also know (a, +) to be a subgroup of (R, +) by [Phi19, Th. 4.18(f)]. 

The following Prop. 7.25 is the ideal analogue of [Phi19, Prop. 5.9] for vector spaces.
Proposition 7.25. Let R be a commutative ring with unity, A ⊆ R, and

    S := { a ⊆ R : A ⊆ a ∧ a is ideal in R }.

Then the set

    (A) := ⋂_{a∈S} a        (7.15)

is called the ideal generated by A (the notation (a) for a principal ideal with a ∈ R can then be seen as a short form of ({a})). Moreover, A is called a generating set of the ideal b in R if, and only if, (A) = b.

(a) (A) is an ideal in R, namely the smallest ideal in R containing A.

(b) If A = ∅, then (A) = {0}; if A ≠ ∅, then

    (A) = { ∑_{i=1}^{n} ri ai : n ∈ N ∧ r1, . . . , rn ∈ R ∧ a1, . . . , an ∈ A }.        (7.16)

(c) If A ⊆ B ⊆ R, then (A) ⊆ (B).

(d) A = (A) if, and only if, A is an ideal in R.

(e) ((A)) = (A).

Proof. Exercise. □
Theorem 7.26. If R is a Euclidean ring, then R is a principal ideal domain.

Proof. Let R be a Euclidean ring with degree map deg : R \ {0} −→ N0. Moreover, let a ⊆ R be an ideal, a ≠ (0). Let a ∈ a \ {0} be such that

    deg(a) = min{deg(x) : 0 ≠ x ∈ a}.

Then a = (a): Indeed, let f ∈ a. According to (7.14), f = qa + r with q, r ∈ R and deg(r) < deg(a) or r = 0. Then r = f − qa ∈ a and the choice of a implies r = 0 and f = qa ∈ (a), showing a ⊆ (a). As (a) ⊆ a also holds (since a is an ideal), we have a = (a), as desired. □
Example 7.27. (a) If F is a field, then (0) and F = (1) are the only ideals in F (in particular, each field is a principal ideal domain): Indeed, if a is an ideal in F, 0 ≠ a ∈ a, and x ∈ F, then x = xa^{−1}a ∈ a.

(b) Z and F[X] (where F is a field) are principal ideal domains according to Th. 7.26, since we know from Ex. 7.21(b),(c) that both rings are Euclidean rings.

(c) According to Rem. 7.23, a proper subring with unity S of the commutative ring with unity R can never be an ideal in R (and then the unital ring monomorphism ι : S −→ R, ι(x) := x, shows that the subring S = Im ι does not need to be an ideal). For example, Z is a subring of Q, but not an ideal in Q; Q is a subring of R, but not an ideal in R.

(d) The ring Z4 = {0, 1, 2, 3} is principal: (2) = {0, 2} and, if a is an ideal in Z4 with 3 ∈ a, then 3 + 3 = 2 ∈ a and 2 + 3 = 1 ∈ a, showing a = Z4. However, since 2 · 2 = 0, Z4 is not a principal ideal domain.

(e) The set A := 2Z ∪ 3Z ⊆ Z satisfies Def. 7.22(a)(ii): If k, l ∈ Z, then kl · 2 ∈ 2Z ⊆ A and kl · 3 ∈ 3Z ⊆ A, but A is not an ideal in Z, since 2 + 3 ∉ A. This example also shows that unions of ideals need not be ideals.

(f) The ring Z[X] is not principal: Let

    a := { (fi)_{i∈N0} ∈ Z[X] : f0 is even }.

Then, clearly, a is an ideal in Z[X]. Moreover, 2X^0 ∈ a and X^1 ∈ a. However, if f ∈ a such that 2 = 2X^0 ∈ (f), then f ∈ {−2, 2}, showing X^1 ∉ (f). Thus, the ideal a is not principal.

Proposition 7.28. Let F be a field and let R 6= {0} be a ring with unity. Then every
unital ring homomorphism φ : F −→ R is injective (in particular, every unital ring
homomorphism between fields is injective).

Proof. According to Prop. 7.24(b), ker φ is an ideal in F. Thus, from Ex. 7.27(a), ker φ = {0} or ker φ = F. As φ is unital, φ(1) = 1 ≠ 0, showing ker φ = {0}, i.e. φ is injective. □

We now want to show that the analogue of the fundamental theorem of arithmetic
[Phi19, Th. D.6] holds in every Euclidean ring (in particular, in F [X], if F is a field)
and even in every principal ideal domain. We begin with some preparations:

Definition 7.29. Let R be an integral domain.

(a) We call x, y ∈ R associated if, and only if, there exists a ∈ R∗ such that x = ay.

(b) Let x, y ∈ R. We define x to be a divisor⁵ of y (and also say that x divides y, denoted x| y) if, and only if, there exists c ∈ R such that y = cx. If x is not a divisor of y, then we write x ∤ y.

(c) Let ∅ ≠ M ⊆ R. We call d ∈ R a greatest common divisor of the elements of M if, and only if,

    ( ∀ x ∈ M :   d| x )   ∧   ∀ r ∈ R :   ( ( ∀ x ∈ M :   r| x )   ⇒   r| d ).        (7.17)

(d) 0 ≠ p ∈ R \ R∗ is called irreducible if, and only if,

    ∀ x, y ∈ R :   ( p = xy   ⇒   (x ∈ R∗ ∨ y ∈ R∗) ).        (7.18)

Otherwise, p is called reducible.

(e) 0 ≠ p ∈ R \ R∗ is called prime if, and only if,

    ∀ x, y ∈ R :   ( p| xy   ⇒   (p| x ∨ p| y) ).        (7.19)

⁵ One has to be especially cautious with divisors of 0, as there is an inconsistency between the present definition and the definition of zero divisor in [Phi19, Def. and Rem. 4.32] (both definitions are the ones commonly used in the literature and there does not seem to be a good way to avoid this issue): According to the present definition, every x ∈ R is a divisor of 0; however, according to [Phi19, Def. and Rem. 4.32], a zero divisor x ∈ R must satisfy 0 = cx with c ≠ 0. Thus, if one encounters a zero divisor, one needs to determine from the context, which of the two definitions is the relevant one.
Before looking at some examples, we prove two propositions:

Proposition 7.30. Let R be an integral domain.

(a) Cancellation Law: If a, x, y ∈ R such that a 6= 0 and ax = ay, then x = y.

(b) If a| b and b| a, then a, b are associated.

(c) If (a) = (b), then a, b are associated.

(d) Let ∅ 6= M ⊆ R. If r, d ∈ R are both greatest common divisors of the elements of


M , then r, d are associated.

(e) If 0 6= p ∈ R \ R∗ is prime, then p is irreducible.


^5 One has to be especially cautious with divisors of 0, as there is an inconsistency between the
present definition and the definition of zero divisor in [Phi19, Def. and Rem. 4.32] (both definitions are
the ones commonly used in the literature and there does not seem to be a good way to avoid this issue):
According to the present definition, every x ∈ R is a divisor of 0; however, according to [Phi19, Def.
and Rem. 4.32], a zero divisor x ∈ R must satisfy 0 = cx with c ≠ 0. Thus, if one encounters a zero
divisor, one needs to determine from context which of the two definitions is the relevant one.

(f ) If a1 , . . . , an ∈ R \ {0}, n ∈ N, and d ∈ R is such that we have the equality of ideals

(a1 ) + · · · + (an ) = (d), (7.20)

then d is a greatest common divisor of a1 , . . . , an .

Proof. (a): If ax = ay, then a(x − y) = ax − ay = 0. Since R has no nonzero zero
divisors and a ≠ 0, this means x = y.

(b): If a| b and b| a, then there exist x, y ∈ R such that b = xa and a = yb. Thus,
b = xa = xyb and (a) yields 1 = xy, showing x, y ∈ R^* and a, b being associated.

(c): If (a) = (b), then a ∈ (b) and b ∈ (a), i.e. there exist x, y ∈ R such that a = xb and
b = ya, i.e. b| a and a| b. Thus, a, b are associated by (b).

(d): If r, d ∈ R are both greatest common divisors of the elements of M, then r| d and
d| r, i.e. r, d are associated by (b).

(e): Let 0 ≠ p ∈ R \ R^* be prime and assume p = xy = 1 · xy. Then p| xy and, as p
is prime, p| x or p| y. By possibly renaming x, y, we may assume p| x, i.e. there exists
c ∈ R with x = cp, implying p = xy = cpy and 1 = cy by (a). Thus, c, y ∈ R^*, i.e. p is
irreducible.

(f): As a_1, . . . , a_n ∈ (d), there exist c_1, . . . , c_n ∈ R such that a_i = c_i d for each i ∈
{1, . . . , n}, showing d| a_i for each i ∈ {1, . . . , n}. Now suppose r ∈ R is such that r| a_i for
each i ∈ {1, . . . , n}, i.e. there exist x_1, . . . , x_n ∈ R with a_i = x_i r for each i ∈ {1, . . . , n}.
On the other hand, d ∈ (a_1) + · · · + (a_n) implies the existence of s_1, . . . , s_n ∈ R with
d = ∑_{i=1}^n s_i a_i. Then

    d = ∑_{i=1}^n s_i a_i = ∑_{i=1}^n s_i x_i r = ( ∑_{i=1}^n s_i x_i ) r,

showing r| d and proving (f). □


Proposition 7.31. Let R be a principal ideal domain.

(a) Bézout's Lemma, cf. [Phi19, Th. D.4]: If a_1, . . . , a_n ∈ R \ {0}, n ∈ N, and d ∈ R is
a greatest common divisor of a_1, . . . , a_n, then (7.20) holds. In particular,

    ∃_{x_1,...,x_n ∈ R}   x_1 a_1 + · · · + x_n a_n = d,        (7.21)

which is known as Bézout's identity (usually for n = 2). An important special case
is that, if 1 is a greatest common divisor of a_1, . . . , a_n, then

    ∃_{x_1,...,x_n ∈ R}   x_1 a_1 + · · · + x_n a_n = 1.        (7.22)

(b) Let 0 ≠ p ∈ R \ R^*. Then p is prime if, and only if, p is irreducible.

Proof. (a): Let d be a greatest common divisor of a_1, . . . , a_n. Since R is a principal
ideal domain, there exists d_1 ∈ R with (a_1) + · · · + (a_n) = (d_1) and, by Prop. 7.30(f),
d_1 is a greatest common divisor of a_1, . . . , a_n. Then, by Prop. 7.30(d), there exists
r ∈ R^* such that d = r d_1, implying (d) = (d_1), proving (a).

(b): Due to Prop. 7.30(e), it only remains to prove that p is prime if it is irreducible.
Thus, assume p to be irreducible and let x, y ∈ R be such that p| xy, i.e. xy = ap with
a ∈ R. If p ∤ x, then 1 is a greatest common divisor of p and x and, according to (7.22),
there exist r, s ∈ R such that rp + sx = 1. Then

    y = y · 1 = y(rp + sx) = yrp + sxy = yrp + sap = (yr + sa)p,

showing p| y and p prime. □
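The extended Euclidean algorithm makes Bézout's identity constructive in every Euclidean ring. The following Python sketch (an illustration added here, not part of the original notes) computes the Bézout coefficients for R = Z; the same recursion works verbatim in F[X] if integer division with remainder is replaced by polynomial division:

    # Extended Euclidean algorithm: returns (g, x, y) with x*a + y*b == g,
    # where g is a greatest common divisor of a and b (a sketch for R = Z).
    def extended_euclid(a, b):
        old_r, r = a, b
        old_x, x = 1, 0
        old_y, y = 0, 1
        while r != 0:
            q = old_r // r
            old_r, r = r, old_r - q * r
            old_x, x = x, old_x - q * x
            old_y, y = y, old_y - q * y
        return old_r, old_x, old_y

    g, x, y = extended_euclid(12, 18)
    assert g == 6 and x * 12 + y * 18 == 6    # Bezout's identity (7.21)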


Example 7.32. Let R be an integral domain.

(a) For each x ∈ R, due to x = 1 · x, 1| x and x| x.

(b) If F := R is a field, then F \ F ∗ = {0}, i.e. F has neither irreducible elements nor
prime elements.

(c) If R = Z, then R∗ = {−1, 1}, i.e. p ∈ Z is irreducible if, and only if, |p| is a prime
number in N (and, since Z is a principal ideal domain by Ex. 7.27(b), p ∈ Z is
irreducible if, and only if, it is prime).

(d) For each λ ∈ R, X − λ = X^1 − λX^0 ∈ R[X] is irreducible due to (7.5c). For R = ℝ,
X^2 + 1 is irreducible: Otherwise, there exist λ, µ ∈ ℝ with X^2 + 1 = (X + λ)(X + µ),
yielding the contradiction 0 = ǫ_{−λ}((X + λ)(X + µ)) = ǫ_{−λ}(X^2 + 1) = λ^2 + 1. On the
other hand, Ex. 7.15(b) shows that, if f ∈ ℝ[X] is irreducible, then deg f ∈ {1, 2}.

(e) Suppose

    R := ℚ + X^1 ℝ[X] = { (f_i)_{i∈N_0} ∈ ℝ[X] : f_0 ∈ ℚ }.

Clearly, R is a subring of ℝ[X]. Then X = X^1 ∈ R is irreducible, but X is not
prime, since X| 2X^2 = (√2 X)(√2 X), but X ∤ √2 X, since √2 ∉ ℚ. Then, as a
consequence of Prop. 7.31(b), R cannot be a principal ideal domain. Indeed, the
ideal a := (X) + (√2 X) is not principal in R: Clearly, X and √2 X are not common
multiples of any noninvertible f ∈ R.
Lemma 7.33. Let R be a principal ideal domain. Let I ≠ ∅ be an index set totally
ordered by ≤, and let (a_i)_{i∈I} be an increasing family of ideals in R. According to Prop.
7.24(f), we can form the ideal a := ⋃_{i∈I} a_i. Then there exists i_0 ∈ I such that a = a_{i_0}.

Proof. As R is principal, there exists a ∈ R such that (a) = a. Since a ∈ a, there exists
i_0 ∈ I such that a ∈ a_{i_0}, implying (a) ⊆ a_{i_0} ⊆ a = (a) and establishing the claim. □

Theorem 7.34 (Existence of Prime Factorization). Let R be a principal ideal domain.
If 0 ≠ a ∈ R \ R^*, then there exist prime elements p_1, . . . , p_n ∈ R, n ∈ N, such that

    a = p_1 · · · p_n.        (7.23)

Proof. Let S be the set of all ideals (a) in R that are generated by elements 0 ≠ a ∈ R \ R^*
that do not have a prime factorization as in (7.23). We need to prove S = ∅. Seeking a
contradiction, assume S ≠ ∅ and note that set inclusion ⊆ provides a partial order on S.
If C ≠ ∅ is a totally ordered subset of S, then, by Prop. 7.24(f), a := ⋃_{c∈C} c is an ideal
in R and, by Lem. 7.33, there exists c ∈ C such that a = c, showing a ∈ S to provide an
upper bound for C. Thus, Zorn's lemma [Phi19, Th. 5.22] applies, yielding a maximal
element m ∈ S (i.e. maximal in S with respect to ⊆). Then there exists a ∈ R \ R^* such
that m = (a) and a does not have a prime factorization. In particular, a is not prime,
i.e. a must be reducible by Prop. 7.31(b). Thus, there exist a_1, a_2 ∈ R \ (R^* ∪ {0}) such
that a = a_1 a_2. Then (a) ⊊ (a_1) and (a) ⊊ (a_2): Indeed, if a_1 = ra = r a_1 a_2 with r ∈ R,
then Prop. 7.30(a) yields 1 = r a_2 and a_2 ∈ R^* (and analogously for (a) = (a_2)). Due
to the maximality of m = (a) in S, we conclude (a_1), (a_2) ∉ S. Thus, a_1, a_2 both must
have prime factorizations, yielding the desired contradiction that a = a_1 a_2 must have a
prime factorization as well. □

Remark 7.35. In particular, we obtain from Th. 7.34 that each k ∈ Z \ {−1, 0, 1} and
each f ∈ F[X], with F being a field and deg f ≥ 1, has a prime factorization. However,
for R = Z and R = F[X], we can prove the existence of a prime factorization for
each 0 ≠ a ∈ R \ R^* in a simpler way and without making use of Zorn's lemma: Let
deg : R \ {0} −→ N_0 be the degree map as in Ex. 7.21(b),(c). We conduct the proof via
induction on deg(a) ∈ N: If a itself is prime, then there is nothing to prove and this,
in particular, takes care of the base case of the induction. If a is not prime, then it is
reducible, i.e. a = a_1 a_2 with a_1, a_2 ∈ R \ (R^* ∪ {0}). In particular, 1 ≤ deg a_1, deg a_2 <
deg a. Thus, by induction, a_1, a_2 both have prime factorizations, implying a to have a
prime factorization as well.
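For R = Z and R = Q[X], computer algebra systems provide such prime factorizations directly; the following small sympy illustration is added here and is not part of the original notes:

    import sympy as sp

    # Prime factorization in Z: 360 = 2^3 * 3^2 * 5, unique up to order and sign.
    assert sp.factorint(360) == {2: 3, 3: 2, 5: 1}

    # Prime factorization in Q[X]: X^4 - 1 = (X - 1)(X + 1)(X^2 + 1).
    X = sp.symbols('X')
    content, factors = sp.factor_list(X**4 - 1, X)
    assert set(f for f, mult in factors) == {X - 1, X + 1, X**2 + 1}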

Theorem 7.36 (Uniqueness of Prime Factorization). Let R be an integral domain and
x ∈ R. Suppose

    x = p_1 · · · p_n = a r_1 · · · r_m,        (7.24)

where a ∈ R^*; p_1, . . . , p_n ∈ R, n ∈ N, are prime; and r_1, . . . , r_m ∈ R, m ∈ N, are
irreducible. Then m = n and there exists a permutation π ∈ S_n such that, for each
i ∈ {1, . . . , n}, r_i and p_{π(i)} are associated (cf. Def. 7.29(a)).

Proof. The proof is conducted via induction on n ∈ N. Since p_1 is prime and p_1 | r_1 · · · r_m,
there exists i ∈ {1, . . . , m} such that p_1 | r_i. Since r_i is irreducible, we must have r_i =
a_1 p_1 with a_1 ∈ R^*. For n = 1 (the induction base case), this yields m = 1 and p_1, r_1
associated, as desired. For n > 1, we have

    p_2 · · · p_n = a a_1 r_1 · · · r_{i−1} r_{i+1} · · · r_m

and we employ the induction hypothesis to obtain n − 1 = m − 1 (i.e. n = m) and a
bijective map σ : {1, . . . , n} \ {i} −→ {2, . . . , n} such that, for each j ∈ {1, . . . , n} \ {i},
r_j and p_{σ(j)} are associated. Then

    π : {1, . . . , n} −→ {1, . . . , n},   π(k) := 1 for k = i,   π(k) := σ(k) for k ≠ i,

defines a permutation π ∈ S_n such that, for each j ∈ {1, . . . , n}, r_j and p_{π(j)} are
associated. □

Corollary 7.37. If R is a principal ideal domain (e.g. R = Z or R = F[X], where F is
a field), then each 0 ≠ a ∈ R \ R^* admits a factorization into prime elements, which is
unique up to the order of the primes and up to association (rings R with this property
are called factorial; factorial integral domains are called unique factorization domains).

Proof. One merely combines Th. 7.34 with Th. 7.36. □

Theorem 7.38. Let F be a field. Then the following statements are equivalent:

(i) F is algebraically closed.

(ii) For each f ∈ F[X] with deg f = n ∈ N, there exist c ∈ F and λ_1, . . . , λ_n ∈ F
(not necessarily distinct), such that

    f = c ∏_{i=1}^n (X^1 − λ_i).        (7.25)

(iii) f ∈ F[X] is irreducible if, and only if, deg f = 1.

Proof. “(i) ⇔ (ii)”: If F is algebraically closed, then (ii) is given by Cor. 7.14. That (ii)
implies (i) is immediate.

“(i) ⇔ (iii)”: We already noted in Ex. 7.32(d) that each X − λ with λ ∈ F is irreducible,
i.e. each sX − λ with s ∈ F \ {0} is irreducible as well. If F is algebraically closed and
deg f > 1, then (7.9b) shows f to be reducible. Conversely, if (iii) holds, then an
induction over n = deg f ∈ N shows each f ∈ F[X] with deg f ∈ N to have a zero:
Indeed, f = aX + b with a, b ∈ F, a ≠ 0, has −ba^{-1} as a zero, and, if deg f > 1, then f
is reducible, i.e. there exist g, h ∈ F[X] with 1 ≤ deg g, deg h < deg f such that f = gh.
By induction, g and h must have a zero, i.e. f must have a zero as well. □

In [Phi19, Ex. 4.39], we saw how to obtain the field of rational numbers Q from the ring
of integers Z. The same construction actually still works if Z is replaced by an arbitrary
integral domain R, resulting in the so-called field of fractions of R (in the following
section, we will use the field of fractions of F [X] in the definition of the characteristic
polynomial of A ∈ L(V, V ), where V is a vector space over F ). This gives rise to the
following Th. 7.39.

Theorem 7.39. Let R be an integral domain. One defines the field of fractions^6 F of
R as the quotient set F := (R × (R \ {0}))/∼ with respect to the equivalence relation ∼
on R × (R \ {0}) defined by

    (a, b) ∼ (c, d)   :⇔   ad = bc,        (7.26)

where, as usual, we will write

    a/b := [(a, b)]        (7.27)

for the equivalence class of (a, b) with respect to ∼. Addition on F is defined by

    + : F × F −→ F,   (a/b, c/d) ↦ a/b + c/d := (ad + bc)/(bd).        (7.28)

Multiplication on F is defined by

    · : F × F −→ F,   (a/b, c/d) ↦ (a/b) · (c/d) := (ac)/(bd).        (7.29)

Then (F, +, ·) does, indeed, form a field, where 0/1 and 1/1 are the neutral elements
with respect to addition and multiplication, respectively, (−a)/b is the additive inverse
to a/b, whereas b/a is the multiplicative inverse to a/b with a ≠ 0. The map

    ι : R −→ F,   ι(k) := k/1,        (7.30)

is a unital ring monomorphism and it is customary to identify R with ι(R), just writing
k instead of k/1.

^6 Caveat: The field of fractions of R should not be confused with the quotient field or factor field of
R with respect to a maximal ideal m in R (cf. Th. C.8(c)) – this is a different construction, leading to
different objects (e.g., for R = Z, to the finite fields Z_p, where p ∈ N is prime, cf. Ex. C.11 and [Phi19,
Ex. 4.38]).

Proof. Exercise. 
Example 7.40. (a) Q is the field of fractions of Z.

(b) If R is an integral domain, then we know from Prop. B.11 that R[X] is an integral
domain as well. The field of fractions of R[X] is denoted by R(X) and is called the
field of rational fractions over R.
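For R = Z, Python's fractions.Fraction realizes exactly the construction of Th. 7.39, storing a reduced representative of each class a/b; the following check of (7.26)–(7.29) is an illustration added here, not part of the original notes:

    from fractions import Fraction

    # (a, b) ~ (c, d) :<=> ad = bc, cf. (7.26): (2, 4) and (1, 2) lie in the same class.
    assert Fraction(2, 4) == Fraction(1, 2)

    # Addition and multiplication follow (7.28) and (7.29):
    a, b, c, d = 1, 3, 2, 5
    assert Fraction(a, b) + Fraction(c, d) == Fraction(a * d + b * c, b * d)
    assert Fraction(a, b) * Fraction(c, d) == Fraction(a * c, b * d)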
Definition and Remark 7.41. Let R be an integral domain. We show that the field
of fractions of R (as defined in Th. 7.39) is the smallest field containing R: Let L be
some arbitrary field extension of R. Define

    S := { F ⊆ L : R ⊆ F ∧ F is a subfield of L }        (7.31)

and

    K := ⋂_{F∈S} F.        (7.32)

According to [Phi19, Ex. 4.36(d)], K is a field, namely the smallest subfield of L
containing R. If F(R) denotes the field of fractions of R, then

    φ : F(R) −→ K,   φ(a/b) := ab^{-1},        (7.33)

constitutes an isomorphism: Indeed, φ is well-defined, since

    a/b = c/d  ⇒  ad = bc  ⇒  φ(a/b) = ab^{-1} = cd^{-1} = φ(c/d),

and since the definition of S guarantees ab^{-1} ∈ F for each a, b ∈ R with b ≠ 0 and each
F ∈ S; φ is a homomorphism, since, for each a, b, c, d ∈ R with b, d ≠ 0,

    φ(a/b) + φ(c/d) = ab^{-1} + cd^{-1} = (ad + bc)(bd)^{-1} = φ((ad + bc)/(bd)) = φ(a/b + c/d),
    φ(a/b) · φ(c/d) = ab^{-1} cd^{-1} = ac(bd)^{-1} = φ((ac)/(bd)) = φ((a/b) · (c/d));

φ is injective, since a, b ∈ R \ {0} implies φ(a/b) = ab^{-1} ≠ 0; φ is surjective, since
Im φ ⊆ K is itself a subfield of L that contains R, implying K ⊆ Im φ and Im φ = K.

8 Characteristic Polynomial, Minimal Polynomial


We will now apply the theory of polynomials to further study linear endomorphisms on
finite-dimensional vector spaces. The starting point is Th. 6.9(a), which states that, if

V is a vector space over the field F , then the eigenvalues of A ∈ L(V, V ) are precisely
the zeros of the polynomial function
pA : F −→ F, pA (t) := det(t Id −A).
In order to make the results of the previous section available, instead of associating a
polynomial function with A ∈ L(V, V ), we will associate an actual polynomial (this also
avoids issues related to the fact that, in the case of finite fields, different polynomials
can give rise to the same polynomial function according to Th. 7.17(b)). The idea is
to replace t ↦ det(t Id − A) with det(X Id_n − M_A), where M_A is the matrix of A with
respect to an ordered basis of V . If V is a vector space over the field F , then the entries
of the matrix X Id_n − M_A are elements of the ring F[X]. However, we defined
determinants only for matrices with entries in fields. Thus, to make the following definition
consistent with our definition of determinants, we consider the elements of X Idn −MA
to be elements of F (X), the field of rational fractions over F (cf. Ex. 7.40(b)):
Definition 8.1. Let V be a vector space over the field F, dim V = n ∈ N, and A ∈
L(V, V). Moreover, let B be an ordered basis of V and let M_A ∈ M(n, F) be the matrix
of A with respect to B. Since F(X) is a field extension of F, we may consider M_A as
an element of M(n, F(X)). We define

    χ_A := det(X Id_n − M_A) ∈ F[X]

to be the characteristic polynomial of A.
Proposition 8.2. Let V be a vector space over the field F, dim V = n ∈ N, and
A ∈ L(V, V).

(a) The characteristic polynomial χ_A is well-defined by Def. 8.1, i.e. if B_1 and B_2 are
ordered bases of V and M_1, M_2 are the matrices of A with respect to B_1, B_2, respectively,
then

    χ_1 := det(X Id_n − M_1) = χ_2 := det(X Id_n − M_2).

(b) The spectrum σ(A) is precisely the set of zeros of χ_A.

Proof. (a): Let T ∈ GL_n(F) be such that M_2 = T^{-1} M_1 T. Then

    χ_2 = det(X Id_n − T^{-1} M_1 T) = det( T^{-1} (X Id_n − M_1) T ) = (det T^{-1}) χ_1 (det T) = χ_1,

proving (a).

(b): If λ ∈ F, then we have, by Th. 6.9(a),

    λ ∈ σ(A)  ⇔  ǫ_λ(χ_A) = det(λ Id − A) = 0,

thereby establishing the claim. □
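Prop. 8.2(a) is easy to observe with a computer algebra system: similar matrices yield the same characteristic polynomial. The sympy snippet below is an illustration added here, not part of the original notes:

    import sympy as sp

    X = sp.symbols('X')
    M1 = sp.Matrix([[2, 1], [0, 3]])
    T = sp.Matrix([[1, 1], [1, 2]])      # invertible transition matrix, det T = 1
    M2 = T.inv() * M1 * T                # same endomorphism w.r.t. another basis

    # charpoly computes det(X * Id_n - M); both bases give the same polynomial.
    assert M1.charpoly(X) == M2.charpoly(X)
    assert sp.factor(M1.charpoly(X).as_expr()) == (X - 2) * (X - 3)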

Remark 8.3. Let V be a vector space over the field F, dim V = n ∈ N, and A ∈
L(V, V).

(a) If B is an ordered basis of V, the matrix (a_{ji}) ∈ M(n, F) represents A with respect
to B, and we let (c_{ji}) := (X Id_n − (a_{ji})), then

    χ_A = det( X Id_n − (a_{ji}) ) = ∏_{i=1}^n (X − a_{ii}) + ∑_{π∈S_n\{Id}} sgn(π) ∏_{i=1}^n c_{iπ(i)}        (8.1)

shows χ_A to be monic (i.e. the coefficient of X^n is 1) with deg χ_A = n: Indeed,
clearly, the degree of the first summand is n and, for each π ∈ S_n \ {Id}, the degree
of the corresponding summand is at most n − 2.

(b) Some authors prefer to define the characteristic polynomial of A as the polynomial

    χ̃_A := det(A − X Id) = (−1)^n χ_A.

While χ̃_A still has the property that σ(A) is precisely the set of zeros of χ̃_A, χ̃_A is
monic only for n even. On the other hand, χ̃_A has the advantage that ǫ_0(χ̃_A) =
det(A).

(c) According to Prop. 8.2(b), the task of finding the eigenvalues of A is the same as
the task of finding the zeros of the characteristic polynomial χ_A. So one might hope
that only particularly simple polynomials can occur as characteristic polynomials.
However, this is not the case: Indeed, every monic polynomial of degree n occurs
as a characteristic polynomial: Let a_1, . . . , a_n ∈ F and

    f := X^n + ∑_{i=1}^n a_i X^{n−i}.

We define the companion matrix of f to be

    M(f) := ( −a_1  −a_2  −a_3  . . .  −a_{n−1}  −a_n )
            (  1     0                               )
            (        1     0                         )
            (              . . .  . . .              )
            (                     1        0         )

(ones on the subdiagonal, all unspecified entries being 0) and claim χ_A = f, if
A ∈ L(F^n, F^n) is the linear map represented by M(f) with respect to the standard
basis of F^n: Indeed, using Laplace expansion with respect to the first row, we obtain

    χ_A = det ( X + a_1   a_2   a_3   . . .   a_{n−1}   a_n )
              (   −1       X                                )
              (           −1      X                         )
              (                   . . .   . . .             )
              (                           −1        X       )
        = (−1)^{n+1} a_n (−1)^{n−1} + (−1)^n a_{n−1} (−1)^{n−2} X
          + · · · + (−1)^3 a_2 (−1)^1 X^{n−2} + (−1)^2 (X + a_1) X^{n−1}
        = ∑_{i=0}^{n−2} (−1)^{2(n−i)} a_{n−i} X^i + (X + a_1) X^{n−1}
        = X^n + ∑_{i=0}^{n−1} a_{n−i} X^i = X^n + ∑_{i=1}^n a_i X^{n−i} = f.
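The companion matrix construction is easy to verify by machine; the following sympy sketch (an illustration added here, not part of the original notes) builds M(f) for a sample monic f and confirms χ_A = f:

    import sympy as sp

    X = sp.symbols('X')

    def companion(a):
        # Companion matrix M(f) of f = X^n + a_1 X^(n-1) + ... + a_n,
        # given a = [a_1, ..., a_n] as in Rem. 8.3(c).
        n = len(a)
        M = sp.zeros(n, n)
        for i in range(n):
            M[0, i] = -a[i]          # first row: -a_1, ..., -a_n
        for i in range(1, n):
            M[i, i - 1] = 1          # ones on the subdiagonal
        return M

    # f = X^3 - 2X^2 + 5X - 7, i.e. a_1 = -2, a_2 = 5, a_3 = -7:
    M = companion([-2, 5, -7])
    f = X**3 - 2*X**2 + 5*X - 7
    assert sp.expand(M.charpoly(X).as_expr() - f) == 0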

(d) In Ex. 6.5(a), we saw that the considered linear endomorphism A had eigenvalues
for F = C, but no eigenvalues for F = R, which we can now relate to the fact that
χ_A = X^2 + 1 has no zeros over R, but χ_A = (X − i)(X + i) with zeros ±i over C.
(e) Given that eigenvalues are precisely the zeros of the characteristic polynomial, and
given that, according to (c), every monic polynomial of degree n can occur as the
characteristic polynomial of a matrix, it is not surprising that computing eigenvalues
is, in general, a difficult task, even if F is algebraically closed, guaranteeing the
eigenvalues' existence. It is a result of Algebra that, for a generic polynomial
of degree at least 5, it is not possible to obtain its zeros using so-called radicals
(which are, roughly, zeros of polynomials of the form X^k − λ, k ∈ N, λ ∈ F,
see, e.g., [Bos13, Def. 6.1.1] for a precise definition) in finitely many steps (cf.,
e.g., [Bos13, Cor. 6.1.7]). In practice, one often has to make use of approximative
numerical methods (see, e.g., [Phi21, Sec. 7]). Having said that, let us note that
the problem of computing eigenvalues is, indeed, typically easier than the general
problem of computing zeros of polynomials. This is due to the fact that the difficulty
of computing the zeros of a polynomial depends tremendously on the form in which
the polynomial is given: It is typically hard if the polynomial is expanded into
the form f = ∑_{i=0}^n a_i X^i, but it is easy (trivial, in fact) if the polynomial is
given in the factored form f = c ∏_{i=1}^n (X − λ_i). If the characteristic polynomial
is given implicitly by a matrix, one is, in general, somewhere between the two
extremes. In particular, for a large matrix, it usually makes no sense to compute
the characteristic polynomial in its expanded form (this is an expensive task in itself
and, in the process, one even loses the additional structure given by the matrix). It
makes much more sense to use methods tailored to the computation of eigenvalues,
and, if available, one should make use of additional structure a matrix might have.

Theorem 8.4. Let V be a vector space over the field F, dim V = n ∈ N, and A ∈
L(V, V). Then there exists an ordered basis B of V such that the matrix M of A with
respect to B is triangular if, and only if, there exist distinct λ_1, . . . , λ_l ∈ F, l ∈ N, and
n_1, . . . , n_l ∈ N with

    ∑_{i=1}^l n_i = n   ∧   χ_A = ∏_{i=1}^l (X − λ_i)^{n_i}.        (8.2)

In this case, σ(A) = {λ_1, . . . , λ_l},

    ∀_{i∈{1,...,l}}   n_i = m_a(λ_i)        (8.3)

(i.e. the algebraic multiplicity of λ_i is precisely the multiplicity of λ_i as a zero of χ_A),
and one can choose B such that M has the upper triangular form

    M = ( λ_1                          )
        (     . . .            *       )
        (          λ_1                 )
        (               . . .          )
        (                    λ_l       )
        (   0                  . . .   )
        (                          λ_l )        (8.4)

where each λ_i occurs precisely n_i times on the diagonal (all entries below the diagonal
being 0). Moreover, one then has

    det A = det M = ∏_{λ∈σ(A)} λ^{m_a(λ)} = ∏_{i=1}^l λ_i^{n_i}.        (8.5)

Proof. If there exists a basis B of V such that the matrix M = (m_{ji}) of A with respect
to B is triangular, then, by Def. 8.1 and Cor. 4.26,

    χ_A = det(X Id_n − M) = ∏_{i=1}^n (X − m_{ii}).

Combining factors, where the m_{ii} are equal, yields (8.2). For the converse, we assume
(8.2) and prove the existence of the basis B such that M has the form of (8.4) via
induction on n. For n = 1, there is nothing to prove. Thus, let n > 1. Then λ_1 must be
an eigenvalue of A with some eigenvector 0 ≠ v_1 ∈ V. Then, if B_1 := (v_1, . . . , v_n) is an
ordered basis of V, the matrix M_1 of A with respect to B_1 has the block form

    M_1 = ( λ_1  *  )
          ( 0    N  ),   N ∈ M(n − 1, F).

According to Th. 4.25, we obtain

    ∏_{i=1}^l (X − λ_i)^{n_i} = χ_A = (X − λ_1) χ_N   ⇒   χ_N = (X − λ_1)^{n_1 − 1} ∏_{i=2}^l (X − λ_i)^{n_i}.

Let U := ⟨{v_1}⟩ and W := V/⟨{v_1}⟩. Then dim W = n − 1 and, by [Phi19, Cor. 6.13(a)],
B_W := (v_2 + U, . . . , v_n + U) is an ordered basis of W. Let A_1 ∈ L(W, W) be such that,
with respect to B_W, A_1 has the matrix N. Then, by induction hypothesis, there exists
an ordered basis C_W = (w_2 + U, . . . , w_n + U) of W (w_2, . . . , w_n ∈ V), such that, with
respect to C_W, the matrix N_1 ∈ M(n − 1, F) of A_1 has the form (8.4), except that λ_1
occurs precisely n_1 − 1 times on the diagonal. That N_1 is the matrix of A_1 means, for
N_1 = (ν_{ji})_{(j,i)∈{2,...,n}^2}, that

    ∀_{i∈{2,...,n}}   A_1(w_i + U) = ∑_{j=2}^n ν_{ji} (w_j + U).

Then, by [Phi19, Cor. 6.13(b)], B := (v_1, w_2, . . . , w_n) is an ordered basis of V and,
with respect to B, the matrix M of A has the form (8.4): According to [Phi19, Th.
7.14], there exists an (n − 1) × (n − 1) transition matrix T_1 = (t_{ji})_{(j,i)∈{2,...,n}^2} such that
N_1 = T_1^{-1} N T_1 and

    ∀_{i∈{2,...,n}}   w_i = ∑_{j=2}^n t_{ji} v_j,   w_i + U = ∑_{j=2}^n t_{ji} (v_j + U),

implying

    M = ( 1  0        ) ( λ_1  *  ) ( 1  0   )   =   ( λ_1   *   )
        ( 0  T_1^{-1} ) ( 0    N  ) ( 0  T_1 )       ( 0     N_1 ).

It remains to verify (8.3). Letting w_1 := v_1, we have B = (w_1, . . . , w_n). For each
k ∈ {1, . . . , n_1} and the standard column basis vector e_k, we obtain

    (M − λ_1 Id_n) e_k = ( (m_{ji}) − λ_1 (δ_{ji}) ) e_k = (m_{1k}, . . . , m_{k−1,k}, 0, . . . , 0)^T
    ⇒   (A − λ_1 Id) w_k = ∑_{α=1}^{k−1} m_{αk} w_α,

showing (A − λ_1 Id) w_k ∈ ⟨{w_1, . . . , w_{k−1}}⟩ and {w_1, . . . , w_{n_1}} ⊆ ker(A − λ_1 Id)^{n_1}. On
the other hand, for each k ∈ {n_1 + 1, . . . , n}, we obtain

    (M − λ_1 Id_n) e_k = (m_{1k}, . . . , m_{k−1,k}, λ − λ_1, 0, . . . , 0)^T
    ⇒   (A − λ_1 Id) w_k = (λ − λ_1) w_k + ∑_{α=1}^{k−1} m_{αk} w_α,

where λ ∈ {λ_2, . . . , λ_l}, showing w_k ∉ ker(A − λ_1 Id)^N for each N ∈ N. Thus,

    n_1 = dim ker(A − λ_1 Id)^{n_1} = dim ker(A − λ_1 Id)^{r(λ_1)} = m_a(λ_1),

where r(λ_1) is as defined in Rem. 6.11. Now note that λ_1 was chosen arbitrarily in the
above argument. The same argument shows, for each i ∈ {1, . . . , l}, the existence of a
basis B′ such that λ_i appears in the upper left block of M. In particular, we obtain
n_i = m_a(λ_i) for each i ∈ {1, . . . , l}.

Finally, if (8.4) holds, then (8.5) is immediate from Cor. 4.26. □

Corollary 8.5. Let V be a vector space over the algebraically closed field F , dim V =
n ∈ N, and A ∈ L(V, V ). Then there exists an ordered basis B of V such that the
matrix M of A with respect to B is triangular. Moreover, (8.5) then holds, i.e. det A is
the product of the eigenvalues of A, where each eigenvalue is multiplied according to its
algebraic multiplicity.

Proof. This is immediate from combining Th. 8.4 with Th. 7.38(ii). 
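As a quick illustration of (8.5) (added here, not part of the original notes), one can compare det A with the product of the eigenvalues, counted with algebraic multiplicity:

    import sympy as sp

    A = sp.Matrix([[2, 1], [1, 2]])    # eigenvalues 1 and 3, each of multiplicity 1
    product = sp.Mul(*[lam**mult for lam, mult in A.eigenvals().items()])
    assert A.det() == product == 3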

Theorem 8.6. Let V be a vector space over the field F, dim V = n ∈ N, and A ∈
L(V, V). There exists a unique monic polynomial 0 ≠ µ_A ∈ F[X] (called the minimal
polynomial of A), satisfying the following two conditions:

(i) ǫ_A(µ_A) = 0, where ǫ_A : F[X] −→ L(V, V) is the substitution homomorphism
according to Def. and Rem. 7.10 (noting L(V, V) to be a ring extension of F via
the unital ring monomorphism ι : F −→ L(V, V), ι(a) := a Id).

(ii) For each f ∈ F[X] such that ǫ_A(f) = 0, µ_A is a divisor of f, i.e. µ_A | f.

Proof. Let a := {f ∈ F[X] : ǫ_A(f) = 0}. Clearly, a is an ideal in F[X] and, thus, as
F[X] is a principal ideal domain, there exists g ∈ F[X] such that a = (g). Clearly, g
satisfies both (i) and (ii). We need to show g ≠ 0, i.e. a ≠ (0) = {0}. To this end,
note that, since dim L(V, V) = n^2, the n^2 + 1 maps Id, A, A^2, . . . , A^{n^2} ∈ L(V, V) must
be linearly dependent, i.e. there exist c_0, . . . , c_{n^2} ∈ F, not all 0, such that

    0 = ∑_{i=0}^{n^2} c_i A^i,

showing 0 ≠ f := ∑_{i=0}^{n^2} c_i X^i ∈ a. If h ∈ F[X] also satisfies (i) and (ii), then h| g
and g| h, implying g, h to be associated. In consequence, µ_A is the unique monic such
element of F[X]. □
Remark 8.7. Let F be a field.

(a) We extend Def. 6.15 to the characteristic polynomial and to the minimal polynomial:
Let n ∈ N. Consider the vector space V := F n over the field F . If M ∈ M(n, F ),
then χM and µM denote the characteristic polynomial and the minimal polynomial
of the linear map AM that M represents with respect to the standard basis of F n .

(b) Let V be a vector space over the field F , dim V = n ∈ N. As M(n, F ) is a ring
extension of F , we can plug M ∈ M(n, F ) into elements of F [X]. Moreover, if
f ∈ F [X], A ∈ L(V, V ) and M ∈ M(n, F ) represents A with respect to a basis B
of V , then, due to [Phi19, Th. 7.10(a)], ǫM (f ) represents ǫA (f ) with respect to B.
Example 8.8. Let F be a field. Consider

    M := ( 0  0  1 )
         ( 0  0  0 )
         ( 0  0  0 )  ∈ M(3, F).

We claim that the minimal polynomial is µ_M = X^2: Indeed, M^2 = 0 implies ǫ_M(X^2) = 0
and, if f = ∑_{i=0}^n f_i X^i ∈ F[X], n ∈ N, then ǫ_M(f) = f_0 Id_3 + f_1 M. Thus, if ǫ_M(f) = 0,
then f_0 = f_1 = 0, implying X^2 | f and showing µ_M = X^2.
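Since µ_M is the monic divisor of χ_M of least degree that annihilates M (it divides χ_M, cf. Th. 8.9(b) below), it can be found by brute force over the monic divisors of χ_M. The sympy sketch below (an illustration added here, not part of the original notes) confirms µ_M = X^2 for the matrix above:

    import sympy as sp
    from itertools import product

    def min_poly(M, x):
        # Minimal polynomial of M: the monic divisor of charpoly(M) of least
        # degree that vanishes at M.
        n = M.shape[0]
        factors = sp.factor_list(M.charpoly(x).as_expr(), x)[1]   # [(g_i, r_i)]
        candidates = []
        for exps in product(*[range(r + 1) for _, r in factors]):
            f = sp.Poly(sp.Mul(*[g**e for (g, _), e in zip(factors, exps)]), x)
            if f.degree() >= 1:
                candidates.append(f)
        for f in sorted(candidates, key=lambda p: p.degree()):
            val = sp.zeros(n, n)
            for c in f.all_coeffs():           # evaluate f at M via Horner's scheme
                val = val * M + c * sp.eye(n)
            if val == sp.zeros(n, n):
                return f

    x = sp.symbols('x')
    M = sp.Matrix([[0, 0, 1], [0, 0, 0], [0, 0, 0]])
    assert min_poly(M, x).as_expr() == x**2    # cf. Ex. 8.8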
Theorem 8.9 (Cayley-Hamilton). Let V be a vector space over the field F , dim V =
n ∈ N, and A ∈ L(V, V ). If χA and µA denote the characteristic and the minimal
polynomial of A, respectively, then the following statements hold true:

(a) χA (A) := ǫA (χA ) = 0.

(b) µA | χA and, in particular, deg µA ≤ deg χA = n.

(c) λ ∈ σ(A) if, and only if, µA (λ) := ǫλ (µA ) = 0, i.e. the eigenvalues of A are precisely
the zeros of the minimal polynomial µA .

(d) If #σ(A) = n (i.e. if A has n distinct eigenvalues), then χA = µA .

Proof. (a): Let B be an ordered basis of V and let (m_{ji}) := M_A be the matrix of A with
respect to B. Moreover, let N be the adjugate matrix of X Id_n − M_A, i.e., up to factors
of ±1, N contains the determinants of the (n − 1) × (n − 1) submatrices of X Id_n − M_A.
According to Th. 4.29(a), we then have

    χ_A Id_n = det(X Id_n − M_A) Id_n = N (X Id_n − M_A).        (8.6)

Since X Id_n − M_A contains only entries of degree at most 1 (deg(X − m_{ii}) = 1, all other
entries having degree 0 or degree −∞), each entry n_{ji} of N has degree at most n − 1,
i.e.

    ∀_{(j,i)∈{1,...,n}^2}   ∃_{b_{0,j,i},...,b_{n−1,j,i} ∈ F}   n_{ji} = ∑_{k=0}^{n−1} b_{k,j,i} X^k.

If, for each k ∈ {0, . . . , n − 1}, we let B_k := (b_{k,j,i}) ∈ M(n, F), then N = ∑_{k=0}^{n−1} B_k X^k.
Plugging this into (8.6) yields

    χ_A Id_n = (B_0 + B_1 X + · · · + B_{n−1} X^{n−1}) (X Id_n − M_A)
             = −B_0 M_A + (B_0 − B_1 M_A) X + (B_1 − B_2 M_A) X^2
               + · · · + (B_{n−2} − B_{n−1} M_A) X^{n−1} + B_{n−1} X^n.        (8.7)

Writing χ_A = X^n + ∑_{i=0}^{n−1} a_i X^i with a_0, . . . , a_{n−1} ∈ F, the coefficients in front of each
X^i in (8.7) must agree: Indeed, in each entry of the respective matrix, we have an
element of F[X] and, in each entry, the coefficients of X^i must agree (due to the linear
independence of the X^i) – hence, the matrix coefficients of X^i in (8.7) must agree as
well. This yields

    a_0 Id_n = −B_0 M_A,
    a_1 Id_n = B_0 − B_1 M_A,
    . . .
    a_{n−1} Id_n = B_{n−2} − B_{n−1} M_A,
    Id_n = B_{n−1}.

Thus, ǫ_{M_A}(χ_A) turns out to be the telescoping sum

    ǫ_{M_A}(χ_A) = (M_A)^n + ∑_{i=0}^{n−1} a_i (M_A)^i = ∑_{i=0}^{n−1} a_i Id_n (M_A)^i + Id_n (M_A)^n
                 = −B_0 M_A + ∑_{i=1}^{n−1} (B_{i−1} − B_i M_A) (M_A)^i + B_{n−1} (M_A)^n = 0.

As φ : L(V, V) −→ M(n, F), φ(A) := M_A, is a ring isomorphism by [Phi19, Th.
7.10(a)], we also obtain ǫ_A(χ_A) = φ^{-1}(ǫ_{M_A}(χ_A)) = 0, thereby proving (a).

(b) is an immediate consequence of (a) in combination with Th. 8.6(ii).

(c): Suppose λ ∈ σ(A) and let 0 ≠ v ∈ V be a corresponding eigenvector. Also let
m_0, . . . , m_l ∈ F be such that µ_A = ∑_{i=0}^l m_i X^i, l ∈ N. Then we compute

    0 = ǫ_A(µ_A) v = ( ∑_{i=0}^l m_i A^i ) v = ∑_{i=0}^l m_i (A^i v) = ∑_{i=0}^l m_i (λ^i v)
      = ( ∑_{i=0}^l m_i λ^i ) v = ǫ_λ(µ_A) v,

showing ǫ_λ(µ_A) = 0. Conversely, if λ ∈ F is such that ǫ_λ(µ_A) = 0, then (b) implies
ǫ_λ(χ_A) = 0, i.e. λ ∈ σ(A) by Prop. 8.2(b).

(d): If #σ(A) = n, then µ_A has n distinct zeros by (c), implying deg µ_A = n. Since µ_A
is monic and µ_A | χ_A by (b), we have µ_A = χ_A as claimed. □
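A direct machine check of (a) for a concrete matrix (an illustration added here, not part of the original notes):

    import sympy as sp

    x = sp.symbols('x')
    M = sp.Matrix([[1, 2], [3, 4]])
    chi = M.charpoly(x)                  # chi_M = x^2 - 5x - 2

    val = sp.zeros(2, 2)                 # substitute M into chi_M (Horner's scheme)
    for c in chi.all_coeffs():
        val = val * M + c * sp.eye(2)
    assert val == sp.zeros(2, 2)         # Cayley-Hamilton: chi_M(M) = 0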
 
Example 8.10. Let F be a field. If

    M := ( 0  1 )
         ( 0  0 )  ∈ M(2, F),

then χ_M = X^2. Since µ_M | χ_M and ǫ_M(X) = M ≠ 0, we must have µ_M = χ_M. On the
other hand, if

    N := ( 0  0 )
         ( 0  0 )  ∈ M(2, F),

then χ_N = X^2 and µ_N = X. Here N is diagonalizable, while M is not (cf. Ex. 6.5(d)
and Ex. 6.14), even though χ_M = χ_N; hence, in general, one cannot decide
diagonalizability merely by looking at the characteristic polynomial. However,
we will see in Th. 8.14 below that the minimal polynomial does allow one to decide
diagonalizability.
Caveat 8.11. One has to use care when substituting matrices and endomorphisms
into polynomials: For example, one must be aware that, in the expression X Id_n − M,
the polynomial X is a scalar. Thus, when substituting a matrix B ∈ M(n, F) for X,
one must not use matrix multiplication between B and Id_n: For example, for n = 2,

    B := ( 1  0 )         M := ( 0  0 )
         ( 0  0 ),             ( 0  0 ),

we obtain B^2 = B and

    ǫ_B( det(X Id_n − M) ) = ǫ_B(X^2) = B^2 ≠ 0 = det B = det(B − M).


The following result further clarifies the relation between χA and µA :


Proposition 8.12. Let V be a vector space over the field F , dim V = n ∈ N, and
A ∈ L(V, V ).

(a) One has χA | (µA )n and, in particular, each irreducible factor of χA must be an
irreducible factor of µA .

(b) There exists an ordered basis B of V such that the matrix M of A with respect to
B is triangular if, and only if, there exist λ1 , . . . , λl ∈ F , l ∈ N, and n1 , . . . , nl ∈ N
with
Yl
µA = (X − λi )ni .
i=1

Proof. (a): Let M ∈ M(n, F ) be a matrix representing A with respect to some ordered
basis of V . Let G be an algebraically closed field with F ⊆ G (cf. Def. 7.11). We can
consider M as an element of M(n, G) and, then, σ(M ) is precisely the set of zeros of
both χM = χA and µA in G. As G is algebraically closed, for each λ ∈ σ(M ), there
exists mλ , nλ ∈ N such that mλ ≤ nλ ≤ n and
Y Y
µA = (X − λ)mλ | χA = (X − λ)nλ .
λ∈σ(M ) λ∈σ(M )

Letting q := λ∈σ(M ) (X − λ)n mλ −nλ , we have q ∈ G[X] as well as q = (µA )n (χA )−1 ,
Q

i.e. q χA = (µA )n , proving χA | (µA )n (in both G[X] and F [X], since q = (µA )n (χA )−1 ∈
F (X) ∩ G[X] = F [X]).
(b) follows by combining (a) with Th. 8.4. 

Theorem 8.13. Let V be a vector space over the field F, dim V = n ∈ N, and A ∈
L(V, V). Suppose the minimal polynomial µ_A can be written in the form µ_A = g_1 · · · g_l,
l ∈ N, where g_1, . . . , g_l ∈ F[X] are such that, whenever i ≠ j, then 1 is a greatest
common divisor of g_i and g_j. Then

    V = ⨁_{i=1}^l ker g_i(A) = ⨁_{i=1}^l ker ǫ_A(g_i).

Proof. Define

    ∀_{i∈{1,...,l}}   h_i := ∏_{k=1, k≠i}^l g_k.

Then 1 is a greatest common divisor of h_1, . . . , h_l: Indeed, as 1 is a greatest common
divisor of g_i and g_j for i ≠ j, the sets of prime factors of the g_i must all be disjoint.
Thus, if f ∈ F[X] is a divisor of h_i, i ∈ {1, . . . , l}, then f does not share a prime factor
with g_i. If f | h_i holds for each i ∈ {1, . . . , l}, then f does not share a prime factor
with any g_i, implying f ∈ F \ {0}, i.e. 1 is a greatest common divisor of h_1, . . . , h_l. In
consequence, (7.22) implies

    ∃_{f_1,...,f_l ∈ F[X]}   1 = ∑_{i=1}^l f_i h_i

and

    ∀_{v∈V}   v = Id v = ǫ_A(1) v = ∑_{i=1}^l ǫ_A(f_i) ǫ_A(h_i) v.        (8.8)

We verify that, for each i ∈ {1, . . . , l}, ǫ_A(f_i) ǫ_A(h_i) v ∈ ker ǫ_A(g_i): Indeed, since g_i h_i =
µ_A, one has

    ǫ_A(g_i) ǫ_A(f_i) ǫ_A(h_i) v = ǫ_A(f_i) ǫ_A(µ_A) v = 0.

Thus, (8.8) proves V = ∑_{i=1}^l ker ǫ_A(g_i). According to Prop. 5.2(iii), it remains to show

    ∀_{i∈{1,...,l}}   U := ker ǫ_A(g_i) ∩ ∑_{j∈{1,...,l}\{i}} ker ǫ_A(g_j) = {0}.

To this end, fix i ∈ {1, . . . , l} and note ǫ_A(g_i)(U) = {0} = ǫ_A(h_i)(U). On the other
hand, 1 is a greatest common divisor of g_i, h_i, i.e. (7.22) provides r_i, s_i ∈ F[X] such that
1 = r_i g_i + s_i h_i, yielding

    {0} = ( ǫ_A(r_i) ǫ_A(g_i) + ǫ_A(s_i) ǫ_A(h_i) )(U) = ǫ_A(1)(U) = Id(U) = U,

thereby completing the proof. □
Theorem 8.14. Let V be a vector space over the field F, dim V = n ∈ N, and A ∈
L(V, V). Then A is diagonalizable if, and only if, there exist distinct λ_1, . . . , λ_l ∈ F,
l ∈ N, such that µ_A = ∏_{i=1}^l (X − λ_i).

Proof. Suppose A is diagonalizable and let B be a basis of V, consisting of eigenvectors
of A. Define g := ∏_{λ∈σ(A)} (X − λ). For each b ∈ B, there exists λ ∈ σ(A) such that
Ab = λb. Thus,

    ǫ_A(g)(b) = ∏_{λ∈σ(A)} (A − λ Id) b = 0.

According to Th. 8.6(ii), we have µ_A | g. Since, by Th. 8.9(c), each λ ∈ σ(A) is a zero
of µ_A, we have deg µ_A = deg g. As both µ_A and g are monic, this means µ_A = g.
Conversely, suppose µ_A = ∏_{i=1}^l (X − λ_i) with distinct λ_1, . . . , λ_l ∈ F, l ∈ N. Then, by
Th. 8.13,

    V = ⨁_{i=1}^l ker ǫ_A(X − λ_i) = ⨁_{i=1}^l ker(A − λ_i Id) = ⨁_{i=1}^l E_A(λ_i),

proving A to be diagonalizable by Th. 6.3(d). □



Example 8.15. (a) Let V be a vector space over C, dim V = n ∈ N. If A ∈ L(V, V) is
such that there exists m ∈ N with A^m = Id, then A^m − Id = 0 and µ_A | (X^m − 1) =
∏_{k=1}^m (X − ζ_k), ζ_k := e^{k 2πi/m}. As the roots of unity ζ_k are all distinct (cf. [Phi16a,
Cor. 8.31]), A is diagonalizable by Th. 8.14.

(b) Let V be a vector space over the field F, dim V = n ∈ N, and let P ∈ L(V, V) be a
projection, i.e. P^2 = P. Then P^2 − P = 0 and µ_P | (X^2 − X) = X(X − 1). Thus,
we obtain the three cases

    µ_P = X           for P = 0,
    µ_P = X − 1       for P = Id,
    µ_P = X(X − 1)    otherwise.

(c) Let V be a vector space over the field F, dim V = n ∈ N, and let A ∈ L(V, V) be a
so-called involution, i.e. A^2 = Id. Then A^2 − Id = 0 and µ_A | (X^2 − 1) = (X + 1)(X − 1).
Thus, we obtain the three cases

    µ_A = X − 1             for A = Id,
    µ_A = X + 1             for A = −Id,
    µ_A = (X + 1)(X − 1)    otherwise.

If A ≠ ±Id, then, according to Th. 8.14, A is diagonalizable if, and only if, 1 ≠ −1,
i.e. if, and only if, char F ≠ 2. Even though A ≠ ±Id is not diagonalizable for
char F = 2, there still exists an ordered basis B of V such that the matrix M of A
with respect to B is triangular (all diagonal elements being 1), due to Prop. 8.12(b).
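Th. 8.14 can be observed with sympy for case (b): a projection P ∉ {0, Id} has the squarefree minimal polynomial X(X − 1) and is diagonalizable, while the matrix M of Ex. 8.10, with µ_M = X^2, is not (an illustration added here, not part of the original notes):

    import sympy as sp

    P = sp.Matrix([[1, 1], [0, 0]])      # P^2 = P, a projection with P != 0, Id
    assert P * P == P and P.is_diagonalizable()

    M = sp.Matrix([[0, 1], [0, 0]])      # mu_M = X^2 is not squarefree
    assert not M.is_diagonalizable()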
Proposition 8.16. Let V be a vector space over the real numbers R, dim V = n ∈ N,
and A ∈ L(V, V ).

(a) There exists a vector subspace U of V such that dim U ∈ {1, 2} and U is A-invariant
(i.e. A(U ) ⊆ U ).

(b) There exist an ordered basis B of V and matrices M_1, . . . , M_l, l ∈ N, such that
each M_i is either 1 × 1 or 2 × 2 over R, and such that the matrix M of A with
respect to B has the block triangular form

    M = ( M_1   *     *   )
        (     . . .   *   )
        (           M_l   ).

Proof. Exercise. □

9 Jordan Normal Form


If V is a vector space over the algebraically closed field F , dim V = n ∈ N, A ∈ L(V, V ),
then one can always find an ordered basis B of V such that the corresponding matrix M
of A is in so-called Jordan normal form, which is an especially simple (upper) triangular
form, where the eigenvalues are found on the diagonal, the value 1 can, possibly, occur
directly above the diagonal, and all other entries are 0. However, we will also see below,
that, if F is not algebraically closed, then one can still obtain a normal form for M ,
albeit, in general, it is more complicated than the Jordan normal form.
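For a first impression, sympy can compute the Jordan normal form of a concrete matrix; the snippet below is an illustration added here (not part of the original notes), while the general theory follows in the remainder of this section:

    import sympy as sp

    A = sp.Matrix([[3, 1, 0],
                   [0, 3, 0],
                   [1, 0, 2]])
    P, J = A.jordan_form()               # transformation with A = P * J * P^(-1)
    assert sp.simplify(P * J * P.inv() - A) == sp.zeros(3, 3)
    print(J)   # block diagonal: one 2x2 Jordan block for 3, one 1x1 block for 2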
Definition 9.1. Let V be a vector space over the field F, dim V = n ∈ N, A ∈ L(V, V).

(a) A vector subspace U of V is called A-cyclic if, and only if, U is A-invariant (i.e.
A(U) ⊆ U) and

    ∃_{v∈V}   U = ⟨{A^i v : i ∈ N_0}⟩.

(b) V is called A-irreducible if, and only if, V = U_1 ⊕ U_2, with U_1, U_2 both A-invariant
vector subspaces of V, implies U_1 = V or U_2 = V.
Proposition 9.2. Let V be a finite-dimensional vector space over the field F, r ∈ N,
A ∈ L(V, V). Suppose V is A-cyclic,

    V = ⟨{A^i v : i ∈ N_0}⟩,   v ∈ V,

and

    ∃_{a_0,...,a_r ∈ F}   µ_A = ∑_{i=0}^r a_i X^i,   deg µ_A = r.

Then χ_A = µ_A, dim V = r, B := (v, Av, . . . , A^{r−1} v) is an ordered basis of V, and, with
respect to B, A has the matrix

    M = ( 0  0  . . .  . . .  −a_0     )
        ( 1  0                −a_1     )
        ( 0  1                −a_2     )
        (       . . .         .        )
        (          1    0     −a_{r−2} )
        ( . . .  . . .  1     −a_{r−1} )  ∈ M(r, F).        (9.1)

Proof. To show that v, Av, . . . , A^{r−1} v are linearly independent, let λ_0, . . . , λ_{r−1} ∈ F be
such that

    0 = ∑_{i=0}^{r−1} λ_i A^i v.        (9.2)

Define g := ∑_{i=0}^{r−1} λ_i X^i ∈ F[X]. We need to show g = 0. Indeed, by (9.2), we have

    ∀_{i∈N_0}   ǫ_A(g) A^i v = A^i ǫ_A(g) v = 0,

which, as the A^i v generate V, implies ǫ_A(g) = 0. Since deg g < deg µ_A, this yields g = 0
and the linear independence of v, Av, . . . , A^{r−1} v. We show ⟨B⟩ = V next: Seeking a
contradiction, assume A^m v ∉ ⟨B⟩, where we choose m ∈ N to be minimal. Then m ≥ r
and there exist λ_0, . . . , λ_{r−1} ∈ F such that A^{m−1} v = ∑_{i=0}^{r−1} λ_i A^i v. Then m > r yields
the contradiction

    A^m v = ∑_{i=1}^r λ_{i−1} A^i v ∈ ⟨B⟩.

However, m = r also yields a contradiction due to

    0 = ǫ_A(µ_A) v = A^r v + ∑_{i=0}^{r−1} a_i A^i v   ⇒   A^r v ∈ ⟨B⟩.

Thus, B is an ordered basis of V. Then deg χ_A = r, implying χ_A = µ_A. Finally, if
e_k is the kth standard column basis vector of F^r, k ∈ {1, . . . , r}, then M e_k = e_{k+1} for
k ∈ {1, . . . , r − 1} and M e_r = −∑_{i=0}^{r−1} a_i e_{i+1}, showing M to be the matrix of A with
respect to B, since A(A^{r−1} v) = A^r v = −∑_{i=0}^{r−1} a_i A^i v. □

Proposition 9.3. Let V be a vector space over the field F , dim V = n ∈ N, A ∈
L(V, V ). Moreover, let U be an A-invariant subspace of V , 1 ≤ l := dim U < n, and
define the linear maps

AU : U −→ U, AU := A↾U ,
AV /U : V /U −→ V /U, AV /U (v + U ) := Av + U.

Then the following holds:

(a) Let B := (v1 , . . . , vn ) be an ordered basis of V such that BU := (v1 , . . . , vl ) is an


ordered basis of U . Then the matrix M of A with respect to B has the block form
 
    M = (m_{ji}) = ( M_U  *       )
                   ( 0    M_{V/U} ),   M_U ∈ M(l, F),   M_{V/U} ∈ M(n − l, F),

where MU is the matrix of AU with respect to BU and MV /U is the matrix of AV /U


with respect to the ordered basis BV /U := (vl+1 + U, . . . , vn + U ) of V /U .

(b) χA = χAU χAV /U .

(c) µAU | µA and µAV /U | µA .



Proof. AU is well-defined, since U is A-invariant, and linear as A is linear. Moreover,


AV /U is well-defined, since, for each v, w ∈ V with v − w ∈ U , Av + U = Av + A(w −
v) + U = Aw + U ; AV /U is linear, since, for each v, w ∈ V and λ ∈ F ,

AV /U (v + U + w + U ) = A(v + w) + U = Av + U + Aw + U
= AV /U (v + U ) + AV /U (w + U ),
AV /U (λv + U ) = A(λv) + U = λ(Av + U ) = λ AV /U (v + U ).

(a): Since U is A-invariant, we have

    ∀_{i∈{1,...,l}}   ∃_{m_{1i},...,m_{li} ∈ F}   A v_i = ∑_{j=1}^l m_{ji} v_j,

showing M to have the claimed form with M_U being the matrix of A_U with respect to
B_U. Moreover, by [Phi19, Cor. 6.13(a)], B_{V/U} is, indeed, an ordered basis of V/U and

    ∀_{i∈{l+1,...,n}}   A_{V/U}(v_i + U) = A v_i + U = ∑_{j=1}^n m_{ji} v_j + U = ∑_{j=l+1}^n m_{ji} v_j + U,

proving M_{V/U} to be the matrix of A_{V/U} with respect to B_{V/U}.

(b): We compute, using (a) and Th. 4.25,

    χ_A = det(X Id_n − M) = det(X Id_l − M_U) det(X Id_{n−l} − M_{V/U}) = χ_{A_U} χ_{A_{V/U}}.

(c): Since ǫ_A(µ_A) v = 0 for each v ∈ V, ǫ_A(µ_A) v = 0 for each v ∈ U, proving ǫ_{A_U}(µ_A) = 0
and µ_{A_U} | µ_A. Similarly, ǫ_{A_{V/U}}(µ_A)(v + U) = ǫ_A(µ_A) v + U = 0 for each v ∈ V, proving
ǫ_{A_{V/U}}(µ_A) = 0 and µ_{A_{V/U}} | µ_A. □

Comparing Prop. 9.3(b),(c) above, one might wonder if the analogue of Prop. 9.3(b)
also holds for the minimal polynomials. The following Ex. 9.4 shows that, in general, it
does not:

Example 9.4. Let F be a field and V := F^2. Then, for A := Id ∈ L(V, V), µ_A = X − 1.
If U is an arbitrary 1-dimensional subspace of V, then, using the notation of Prop. 9.3,
µ_{A_U} = X − 1 = µ_{A_{V/U}}, i.e. µ_{A_U} µ_{A_{V/U}} = (X − 1)^2 ≠ µ_A.
Lemma 9.5. Let V be a vector space over the field F , dim V = n ∈ N, A ∈ L(V, V ).
Suppose V is A-cyclic.

(a) If µA = gh with g, h ∈ F [X], then we have:



(i) dim ker ǫA (h) = deg h.


(ii) ker ǫA (h) = Im ǫA (g).
(b) If λ ∈ σ(A), then dim EA (λ) = 1, i.e. every eigenspace of A has dimension 1.

Proof. (a): To prove (i), let U := Im ǫ_A(h) = ǫ_A(h)(V) and define A_U, A_{V/U} as in Prop.
9.3. As V is A-cyclic (say, generated by v ∈ V), U is A_U-cyclic (generated by ǫ_A(h) v)
and V/U is A_{V/U}-cyclic (generated by v + U). Thus, Prop. 9.2 yields

    χ_A = µ_A,   χ_{A_U} = µ_{A_U},   χ_{A_{V/U}} = µ_{A_{V/U}},

implying, by Prop. 9.3(b),

    g h = µ_A = χ_A = χ_{A_U} χ_{A_{V/U}} = µ_{A_U} µ_{A_{V/U}}.        (9.3)

If v ∈ V, then ǫ_A(g) ǫ_A(h) v = ǫ_A(µ_A) v = 0, showing ǫ_{A_U}(g) = 0 and µ_{A_U} | g, deg µ_{A_U} ≤
deg g. Similarly, ǫ_{A_{V/U}}(h)(v + U) = ǫ_A(h) v + U = 0 (since ǫ_A(h) v ∈ U), proving
ǫ_{A_{V/U}}(h) = 0 and µ_{A_{V/U}} | h, deg µ_{A_{V/U}} ≤ deg h. Since we also have deg g + deg h =
deg µ_{A_U} + deg µ_{A_{V/U}} by (9.3), we must have deg g = deg µ_{A_U} and deg h = deg µ_{A_{V/U}}.
Thus,

    dim U = deg χ_{A_U} = deg µ_{A_U} = deg g = deg µ_A − deg h.

According to the isomorphism theorem [Phi19, Th. 6.16(a)], we know

    V / ker ǫ_A(h) ≅ Im ǫ_A(h) = U,

implying

    deg h = deg µ_A − dim U = dim V − dim U = dim V − dim( V / ker ǫ_A(h) )
          = dim V − ( dim V − dim ker ǫ_A(h) ) = dim ker ǫ_A(h),

thereby proving (i). We proceed to prove (ii): Since ǫ_A(h)(Im ǫ_A(g)) = {0}, we have
Im ǫ_A(g) ⊆ ker ǫ_A(h). To prove equality, we note that (i) must also hold with g instead
of h and compute

    dim Im ǫ_A(g) = dim V − dim ker ǫ_A(g) = deg χ_A − deg g = deg µ_A − deg g
                  = deg h = dim ker ǫ_A(h)   (using (i) for g),

completing the proof of (ii).

(b): Let λ ∈ σ(A). Then λ is a zero of µ_A and, by Prop. 7.13, there exists q ∈ F[X]
such that µ_A = (X − λ) q. Hence, by (a)(i),

    dim E_A(λ) = dim ker(A − λ Id) = deg(X − λ) = 1,

proving (b). □

Lemma 9.6. Let V be a vector space over the field F , dim V = n ∈ N, A ∈ L(V, V ).
Suppose µA = g r , where g ∈ F [X] is irreducible and r ∈ N. If U is an A-cyclic
subspace of V such that U has maximal dimension, then µA↾U = µA and there exists an
A-invariant subspace W such that V = U ⊕ W .

Proof. Letting A_U := A↾_U, we show µ_{A_U} = µ_A first: According to Prop. 9.3(c), we
have µ_{A_U} | µ_A, i.e. there exists 1 ≤ r_1 ≤ r such that µ_{A_U} = g^{r_1}. According to Prop.
9.2, χ_{A_U} = µ_{A_U}, implying dim U = deg µ_{A_U} = r_1 deg g. Let 0 ≠ v ∈ V and define
U_1 := ⟨{A^i v : i ∈ N_0}⟩. Then U_1 is an A-cyclic subspace of V and the maximality of U
implies µ_{A_{U_1}} | g^{r_1}, implying ǫ_A(g^{r_1})(U_1) = {0}. As 0 ≠ v ∈ V was arbitrary, this shows
ǫ_A(g^{r_1}) = 0, implying µ_A | g^{r_1} = µ_{A_U}, showing µ_{A_U} = µ_A (and r_1 = r) as claimed.

The proof of the existence of W is now conducted via induction on n ∈ N. For n = 1,
we have U = V and there is nothing to prove (W = {0}). Thus, let n > 1. If U = V,
then, as for n = 1, we can merely set W := {0}. Thus, assume U ≠ V. First, consider the
case that V/U is A_{V/U}-reducible, where A_{V/U} is as in Prop. 9.3. Then there exist A-invariant
subspaces {0} ⊊ V_1, V_2 ⊊ V such that V/U = (V_1/U) ⊕ (V_2/U), i.e. V_1 + V_2 + U = V and
V_1 ∩ V_2 ⊆ U (since v ∈ V_1 ∩ V_2 implies v + U ∈ (V_1/U) ∩ (V_2/U) = U). Replacing V_1, V_2
by V_1 + U, V_2 + U, respectively, we may also assume V_1 ∩ V_2 = U. As U ⊆ V_1, V_2 and
dim V_1, dim V_2 < n, we can use the induction hypothesis to obtain A-invariant subspaces
W_1, W_2 of V_1, V_2, respectively, such that V_1 = U ⊕ W_1, V_2 = U ⊕ W_2. Let W := W_1 ⊕ W_2
(the sum is direct, as w ∈ W_1 ∩ W_2 implies w ∈ V_1 ∩ V_2 = U, i.e. w = 0). If v ∈ V,
then V = V_1 + V_2 implies the existence of u_1, u_2 ∈ U and w_1 ∈ W_1, w_2 ∈ W_2 such that
v = u_1 + w_1 + u_2 + w_2, showing V = U + W. Using V/U = (V_1/U) ⊕ (V_2/U) as well as
W_1 ≅ V_1/U, W_2 ≅ V_2/U by [Phi19, Th. 6.16(b)], we compute

    dim V − dim U + dim(U ∩ W) = dim W = dim W_1 + dim W_2
                                = dim(V_1/U) + dim(V_2/U)
                                = dim(V/U) = dim V − dim U,

showing U ∩ W = {0} and V = U ⊕ W. It remains to consider the case that V/U is
A_{V/U}-irreducible. As dim V/U < n, the induction hypothesis applies to V/U, implying
V/U to be A_{V/U}-cyclic, i.e. there exists v ∈ V with V/U = ⟨{(A_{V/U})^i (v + U) : i ∈ N_0}⟩.
Since µ := µ_{A_{V/U}} | µ_A by Prop. 9.3(c), we have µ = g^s with 1 ≤ s ≤ r. Then

    ǫ_A(g^{r−s}) ǫ_A(g^s) v = ǫ_A(µ_A) v = 0,

showing

    ǫ_A(g^s) v = ǫ_A(µ) v ∈ U ∩ ker ǫ_A(g^{r−s}).        (9.4)

Since µ | µ_A = µ_{A_U}, we can apply Lem. 9.5(a)(ii) to obtain

    U ∩ ker ǫ_A(g^{r−s}) = ker ǫ_{A_U}(g^{r−s}) = Im ǫ_{A_U}(g^s),

which, when combined with (9.4), yields some u_v ∈ U with

    ǫ_A(g^s) u_v = ǫ_A(g^s) v.        (9.5)

This, finally, allows us to define

    w_0 := v − u_v,   W := ⟨{A^i w_0 : i ∈ N_0}⟩.

Clearly, W is A-invariant. If x ∈ V, then x + U ∈ V/U and, since dim(V/U) < n,
there exist u ∈ U and λ_0, . . . , λ_n ∈ F such that

    x = u + ∑_{i=0}^n λ_i A^i v = u + ∑_{i=0}^n λ_i A^i (w_0 + u_v) = u + ∑_{i=0}^n λ_i A^i u_v + ∑_{i=0}^n λ_i A^i w_0,

proving V = U + W. Since, by (9.5),

    ǫ_A(g^s) w_0 = ǫ_A(g^s) v − ǫ_A(g^s) u_v = 0,

we have dim W = deg µ_{A_W} ≤ deg(g^s) = deg µ = dim(V/U), implying (as V = U + W)
dim W = dim(V/U) and V = U ⊕ W. □

Theorem 9.7. Let V be a vector space over the field F , dim V = n ∈ N, A ∈ L(V, V ).

(a) If V is A-irreducible, then V is A-cyclic.

(b) Suppose µA = g r , where g ∈ F [X] is irreducible and r ∈ N. Then V is A-irreducible,


if, and only if, V is A-cyclic.

(c) There exist subspaces U_1, . . . , U_l of V, l ∈ N, such that

    V = ⨁_{i=1}^l U_i

and each U_i is both A-irreducible and A-cyclic.

Proof. (a): We must have µ_A = g^r with r ∈ N and g ∈ F[X] irreducible, since, otherwise,
V is A-reducible by Th. 8.13. In consequence, V is A-cyclic by Lem. 9.6.

(b): According to (a), it only remains to show that V being A-cyclic implies V to be
A-irreducible. Thus, let V be A-cyclic and V = V_1 ⊕ V_2 with A-invariant subspaces
V_1, V_2 ⊆ V. Then, by Prop. 9.3(c), there exist 1 ≤ r_1, r_2 ≤ r such that µ_{A_{V_1}} = g^{r_1} and
µ_{A_{V_2}} = g^{r_2}, where we choose V_1 such that r_2 ≤ r_1. Then ǫ_A(µ_{A_{V_1}})(V_1) = ǫ_A(µ_{A_{V_1}})(V_2) =
{0}, showing µ_A = µ_{A_{V_1}}. As χ_A = µ_A by Prop. 9.2, we have dim V_1 ≥ deg µ_{A_{V_1}} =
deg χ_A = dim V, showing V_1 = V as desired.

(c): The proof is conducted via induction on n ∈ N. If V is A-irreducible, then, by
(a), V is also A-cyclic and the statement holds (in particular, this yields the base case
n = 1). If V is A-reducible, then there exist A-invariant subspaces V_1, V_2 of V such that
V = V_1 ⊕ V_2 with dim V_1, dim V_2 < n. Thus, by induction hypothesis, both V_1 and V_2
can be written as a direct sum of subspaces that are both A-irreducible and A-cyclic,
proving the same for V. □

We now have all preparations in place to prove the existence of normal forms having
matrices with block diagonal form, where the blocks all look like the matrix of (9.1).
However, before we state and prove the corresponding theorem, we still provide a
proposition that will help to address the uniqueness of such normal forms:

Proposition 9.8. Let V be a vector space over the field F, dim V = n ∈ N, A ∈
L(V, V). Moreover, suppose we have a decomposition

    V = ⨁_{i=1}^l U_i,   l ∈ N,        (9.6)

where the U_1, . . . , U_l all are A-invariant and A-irreducible (and, thus, A-cyclic by Th.
9.7(a)) subspaces of V.

(a) If µ_A = g_1^{r_1} · · · g_m^{r_m} is the prime factorization of µ_A (i.e. each g_i ∈ F[X] is
irreducible, r_1, . . . , r_m ∈ N, m ∈ N), then

    V = ⨁_{i=1}^m ker ǫ_A(g_i^{r_i})        (9.7)

and the decomposition of (9.6) is a refinement of (9.7) in the sense that each U_i is
contained in some ker ǫ_A(g_j^{r_j}).

(b) If µ_A = g^r with g ∈ F[X] irreducible, r ∈ N, then, for each i ∈ {1, . . . , l}, we have
µ_{A_{U_i}} = g^{r_i}, dim U_i = r_i deg g, with 1 ≤ r_i ≤ r. If

    ∀_{k∈N_0}   l_k := #{ i ∈ {1, . . . , l} : µ_{A_{U_i}} = g^k }

(i.e. l_k is the number of summands U_i with µ_{A_{U_i}} = g^k), then

    l = ∑_{k=1}^r l_k,   dim V = (deg g) ∑_{k=1}^r k l_k,        (9.8)

and

    ∀_{s∈{0,...,r}}   dim Im ǫ_A(g^s) = (deg g) ∑_{k=s}^r l_k (k − s).        (9.9)

In consequence, the numbers l_k are uniquely determined by A.

Proof. (a): That (9.7) holds is an immediate consequence of Th. 8.13. Moreover, if U is an
A-invariant and A-irreducible subspace of V, then µ_{A_U} | µ_A and Th. 8.13 imply µ_{A_U} = g_j^s
for some j ∈ {1, . . . , m} and 1 ≤ s ≤ r_j. In consequence, U ⊆ ker ǫ_A(g_j^{r_j}) as claimed.

(b): The proof of (a) already showed that, for each i ∈ {1, . . . , l}, µ_{A_{U_i}} = g^{r_i}, with
1 ≤ r_i ≤ r, and then dim U_i = r_i deg g by Prop. 9.2. From the definitions of r and l_k, it
is immediate that l = ∑_{k=1}^r l_k. As the sum in (9.6) is direct, we obtain

    dim V = ∑_{i=1}^l dim U_i = ∑_{k=1}^r l_k k deg g = (deg g) ∑_{k=1}^r k l_k.

To prove (9.9), fix s ∈ {0, . . . , r} and set h := g^s. For each v ∈ V, there exist u_1, . . . , u_l ∈
V such that u_i ∈ U_i and v = ∑_{i=1}^l u_i. Then

    ǫ_A(h) v = ∑_{i=1}^l ǫ_A(h) u_i,

implying

    ker ǫ_A(h) = ⨁_{i=1}^l ker ǫ_{A_{U_i}}(h),        (9.10)

due to the fact that ǫ_A(h) v = 0 if, and only if, ǫ_A(h) u_i = 0 for each i ∈ {1, . . . , l}. In
the case, where µ_{A_{U_i}} = g^{r_i} with r_i ≤ s, we have ker ǫ_{A_{U_i}}(h) = U_i, i.e.

    dim ker ǫ_{A_{U_i}}(h) = dim U_i = r_i deg g,        (9.11a)

while, in the case, where µ_{A_{U_i}} = g^{r_i} with s < r_i, we can apply Lem. 9.5(a)(i) to obtain

    dim ker ǫ_{A_{U_i}}(h) = deg(g^s) = s deg g.        (9.11b)

Putting everything together yields

    dim Im ǫ_A(g^s) = dim Im ǫ_A(h) = dim V − dim ker ǫ_A(h)
                    = (deg g) ∑_{k=1}^r k l_k − ∑_{i=1}^l dim ker ǫ_{A_{U_i}}(h)   (by (9.8), (9.10))
                    = (deg g) ∑_{k=s}^r l_k (k − s)   (by (9.11)),

proving (9.9). To see that the l_k are uniquely determined by A, observe l_k = 0 for k > r
and k = 0, and, for 1 ≤ k ≤ r, (9.9) implies the recursion

    l_r = (deg g)^{-1} dim Im ǫ_A(g^{r−1}),        (9.12a)
    ∀_{s∈{1,...,r−1}}   l_s = (deg g)^{-1} dim Im ǫ_A(g^{s−1}) − ∑_{k=s+1}^r l_k (k − (s − 1)),        (9.12b)

thereby completing the proof. □


Theorem 9.9 (General Normal Form). Let V be a vector space over the field F, dim V =
n ∈ N, A ∈ L(V, V).

(a) There exist subspaces U_1, . . . , U_l of V, l ∈ N, such that each U_i is A-cyclic and
A-irreducible, satisfying

    V = ⨁_{i=1}^l U_i.

Moreover, for each i ∈ {1, . . . , l}, there exists v_i ∈ U_i such that

    B_i := {v_i, A v_i, . . . , A^{r_i − 1} v_i},   r_i := dim U_i,

is a basis of U_i. Then µ_{A_{U_i}} = ∑_{k=0}^{r_i} a_k^{(i)} X^k with a_0^{(i)}, . . . , a_{r_i}^{(i)} ∈ F,
A_{U_i} := A↾_{U_i}, and, with respect to the ordered basis

    B := (v_1, . . . , A^{r_1 − 1} v_1, . . . , v_l, . . . , A^{r_l − 1} v_l),

A has the block diagonal matrix

    M := ( M_1   0   . . .   0  )
         (  0   M_2  . . .   0  )
         (           . . .      )
         (  0   . . .  0    M_l ),

each block having the form of (9.1), namely

    ∀_{i∈{1,...,l}}   M_i = ( 0  0  . . .  . . .  −a_0^{(i)}       )
                            ( 1  0                −a_1^{(i)}       )
                            ( 0  1                −a_2^{(i)}       )
                            (       . . .         .                )
                            (          1    0     −a_{r_i−2}^{(i)} )
                            ( . . .  . . .  1     −a_{r_i−1}^{(i)} ).

(b) If

    V = ⨁_{i=1}^m W_i

is another decomposition of V into A-invariant and A-irreducible subspaces W_1, . . . ,
W_m of V, m ∈ N, then m = l and there exist a permutation π ∈ S_l and T ∈ GL(V)
such that

    T A = A T   ∧   ∀_{i∈{1,...,l}}   T(U_i) = W_{π(i)}.        (9.13)

Proof. (a): The existence of the claimed decomposition was already shown in Th. 9.7(c)
and the remaining statements are then provided by Prop. 9.2, where the A-invariance
of the U_i yields the block diagonal structure of M.

(b): We divide the proof into three steps:

Step 1: Assume U and W to be A-cyclic subspaces of V such that 1 ≤ s := dim U =
dim W ≤ n and let u ∈ U, w ∈ W be such that B_U := {u, . . . , A^{s−1} u}, B_W :=
{w, . . . , A^{s−1} w} are bases of U, W, respectively. Define S ∈ L(U, W) by letting

    ∀_{i∈{0,...,s−1}}   S(A^i u) := A^i w

(then S is invertible, as it maps the basis B_U onto the basis B_W). We verify SA = AS:
Indeed,

    ∀_{i∈{0,...,s−2}}   S A(A^i u) = S(A^{i+1} u) = A^{i+1} w = A(A^i w) = A S(A^i u)

and, letting a_0, . . . , a_s ∈ F be such that µ_{A_U} = µ_{A_W} = ∑_{i=0}^s a_i X^i (cf. Prop. 9.2),

    S A(A^{s−1} u) = S(A^s u) = S( −∑_{i=0}^{s−1} a_i A^i u ) = −∑_{i=0}^{s−1} a_i A^i w
                  = A^s w = A(A^{s−1} w) = A S(A^{s−1} u).

Step 2: Assume µ_A = g^s, where g ∈ F[X] is irreducible and s ∈ N. Letting

    ∀_{k∈N_0}   I(U, k) := { i ∈ {1, . . . , l} : µ_{A_{U_i}} = g^k },
                I(W, k) := { i ∈ {1, . . . , m} : µ_{A_{W_i}} = g^k },

the uniqueness of the numbers l_k in Prop. 9.8(b) shows

    ∀_{k∈N_0}   l_k = #I(U, k) = #I(W, k)

and, in particular, m = l, and the existence of π ∈ S_l such that, for each k ∈ N and each
i ∈ I(U, k), one has π(i) ∈ I(W, k). Thus, by Step 1, for each i ∈ {1, . . . , l}, there exists
an invertible S_i ∈ L(U_i, W_{π(i)}) such that S_i A = A S_i. Define T ∈ GL(V) by letting, for
each v ∈ V such that v = ∑_{i=1}^l u_i with u_i ∈ U_i, T v := ∑_{i=1}^l S_i u_i. Then, clearly, T
satisfies (9.13).

Step 3: We now consider the general situation of (b). Let µ_A = g_1^{r_1} · · · g_s^{r_s} be the prime
factorization of µ_A (i.e. each g_i ∈ F[X] is irreducible, r_1, . . . , r_s ∈ N, s ∈ N). According
to Prop. 9.8(a), there exist sets I_1, . . . , I_s ⊆ {1, . . . , l} and J_1, . . . , J_s ⊆ {1, . . . , m} such
that

    ∀_{k∈{1,...,s}}   ker ǫ_A(g_k^{r_k}) = ⨁_{i∈I_k} U_i = ⨁_{i∈J_k} W_i.

Then, by Step 2, we have #I_k = #J_k for each k ∈ {1, . . . , s}, implying

    l = ∑_{k=1}^s #I_k = ∑_{k=1}^s #J_k = m

and the existence of a permutation π ∈ S_l such that π(I_k) = J_k for each k ∈ {1, . . . , s}.
Again using Step 2, we can now, in addition, choose π ∈ S_l such that

    ∀_{i∈{1,...,l}}   ∃_{T_i ∈ L(U_i, W_{π(i)})}   ( T_i invertible ∧ T_i A = A T_i ).

Define T ∈ GL(V) by letting, for each v ∈ V such that v = ∑_{i=1}^l u_i with u_i ∈ U_i,
T v := ∑_{i=1}^l T_i u_i. Then, clearly, T satisfies (9.13). □
P


The following Prop. 9.10 can sometimes be helpful in actually finding the Ui of Th.
9.9(a):

Proposition 9.10. Let V be a vector space over the field F, dim V = n ∈ N, A ∈
L(V, V), and let µ_A = g_1^{r_1} · · · g_m^{r_m} be the prime factorization of µ_A (i.e. each g_i ∈ F[X]
is irreducible, r_1, . . . , r_m ∈ N, m ∈ N). For each i ∈ {1, . . . , m}, let V_i := ker ǫ_A(g_i^{r_i}).

(a) For each i ∈ {1, . . . , m}, we have µ_{A_{V_i}} = g_i^{r_i} (recall V = ⨁_{i=1}^m V_i).

(b) For each i ∈ {1, . . . , m}, in the decomposition V = ⨁_{k=1}^l U_k of Th. 9.9(a), there
exists at least one U_k with U_k ⊆ V_i and dim U_k = deg(g_i^{r_i}).

(c) As in (b), let V = ⨁_{k=1}^l U_k be the decomposition of Th. 9.9(a). Then, for each i ∈
{1, . . . , m} and each k ∈ {1, . . . , l} such that U_k ⊆ V_i, one has dim U_k ≤ deg(g_i^{r_i}).

Proof. (a): Let i ∈ {1, . . . , m}. According to Prop. 9.3(c), we have µ_{A_{V_i}} = g_i^s with
1 ≤ s ≤ r_i. For each v ∈ V, there are v_1, . . . , v_m ∈ V such that v = ∑_{i=1}^m v_i and v_i ∈ V_i,
implying

    ǫ_A( g_1^{r_1} · · · g_{i−1}^{r_{i−1}} g_i^s g_{i+1}^{r_{i+1}} · · · g_m^{r_m} ) v = 0

and r_i ≤ s, i.e. r_i = s.

(b): Let i ∈ {1, . . . , m}. According to (a), we have µ_{A_{V_i}} = g_i^{r_i}. Using the uniqueness
of the decomposition V = ⨁_{k=1}^l U_k in the sense of Th. 9.9(b) together with Lem. 9.6,
there exists U_k with U_k ⊆ V_i such that U_k is an A-cyclic subspace of V_i of maximal
dimension. Then Lem. 9.6 also yields µ_{A_{U_k}} = µ_{A_{V_i}} = g_i^{r_i}, which, as U_k is A-cyclic,
implies dim U_k = deg(g_i^{r_i}).

(c): Let i ∈ {1, . . . , m}. Again, we know µ_{A_{V_i}} = g_i^{r_i} by (a). If k ∈ {1, . . . , l} is such that
U_k ⊆ V_i, then Prop. 9.3(c) yields µ_{A_{U_k}} | µ_{A_{V_i}} = g_i^{r_i}, showing dim U_k ≤ deg(g_i^{r_i}). □
Remark 9.11. In general, in the situation of Prop. 9.10, the knowledge of µ_A and
χ_A does not suffice to uniquely determine the normal form of Th. 9.9(a). In general,
for each g_i and each s ∈ {1, . . . , r_i}, one needs to determine dim Im ǫ_A(g_i^s); these
dimensions then determine the numbers l_k of (9.9), i.e. the number of subspaces U_j with
µ_{A_{U_j}} = g_i^k. This then determines the matrix M of Th. 9.9(a) (up to the order of the
diagonal blocks), since one obtains precisely l_k many blocks of size k deg g_i and the
entries of these blocks are given by the coefficients of g_i^k.
Example 9.12. (a) Let F be a field and V := F^6. Assume A ∈ L(V, V) has

    χ_A = (X − 2)^2 (X − 3)^4,   µ_A = (X − 2)^2 (X − 3)^3.

We want to determine the decomposition V = ⨁_{i=1}^l U_i of Th. 9.9(a) and the matrix
M with respect to the corresponding basis of V given in Th. 9.9(a): We know
from Prop. 9.10(b) that we can choose U_1 ⊆ ker(A − 2 Id)^2 with dim U_1 = 2 and
U_2 ⊆ ker(A − 3 Id)^3 with dim U_2 = 3. As dim V = 6, this then yields V = U_1 ⊕ U_2 ⊕ U_3
with dim U_3 = 1. We also know σ(A) = {2, 3} and, according to Th. 8.4, the
algebraic multiplicities are m_a(2) = 2, m_a(3) = 4. Moreover,

    4 = m_a(3) = dim ker(A − 3 Id)^4 = dim ker ǫ_A((X − 3)^4),

implying U_3 ⊆ ker(A − 3 Id)^4. As (X − 2)^2 = X^2 − 4X + 4 and (X − 3)^3 =
X^3 − 9X^2 + 27X − 27, M has the form

    M = ( 0  −4                       )
        ( 1   4                       )
        (         0  0   27          )
        (         1  0  −27          )
        (         0  1    9          )
        (                        3   )

(all entries outside the three diagonal blocks being 0).
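One can reproduce this block matrix with sympy by assembling companion blocks of the given polynomial powers; the sketch below (an illustration added here, not part of the original notes) checks that the assembled M has the prescribed characteristic polynomial:

    import sympy as sp

    X = sp.symbols('X')

    def companion_block(p, x):
        # Companion matrix (9.1) of a monic polynomial p of degree r:
        # ones on the subdiagonal, last column -a_0, ..., -a_{r-1}.
        coeffs = sp.Poly(p, x).all_coeffs()      # descending: [1, ..., a_1, a_0]
        r = len(coeffs) - 1
        M = sp.zeros(r, r)
        for i in range(1, r):
            M[i, i - 1] = 1
        for i in range(r):
            M[i, r - 1] = -coeffs[r - i]         # row i gets -a_i
        return M

    M = sp.diag(companion_block((X - 2)**2, X),
                companion_block((X - 3)**3, X),
                companion_block(X - 3, X))
    assert sp.expand(M.charpoly(X).as_expr()) == sp.expand((X - 2)**2 * (X - 3)**4)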

(b) Let F := Z2 = {0, 1} and V := F 3 . Assume, with respect to some basis of V ,


A ∈ L(V, V ) has the matrix
 
1 1 1
N := 1 0 1 .
1 0 0
We compute (using 1 = −1 and 0 = 2 in F )

X − 1 −1 −1

χA = det(X Id3 −N ) = −1 X −1
−1 0 X

= X 2 (X − 1) − 1 − X − X = X 3 − X 2 − 2X − 1 = X 3 + X 2 + 1.
Since χA = X(X 2 + X) + 1 and χA = (X + 1) X 2 + 1, χA is irreducible. Thus
χA = µA , V is A-irreducible and A-cyclic and the matrix of A with respect to the
basis of Th. 9.9(a) is (again making use of −1 = 1 in F )
 
0 0 1
M = 1 0 0 .
0 1 1
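The computation over Z_2 is easily checked by machine; the following sketch is my addition and assumes sympy (Poly(..., modulus=2) reduces the coefficients mod 2).

    import sympy as sp

    X = sp.symbols('X')
    N = sp.Matrix([[1, 1, 1],
                   [1, 0, 1],
                   [1, 0, 0]])

    chi = N.charpoly(X).as_expr()        # X**3 - X**2 - 2*X - 1 over Z
    chi2 = sp.Poly(chi, X, modulus=2)    # reduction mod 2: X**3 + X**2 + 1
    print(chi2)
    print(chi2.is_irreducible)           # True, so chi_A = mu_A and V is A-cyclic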

For fields that are algebraically closed, we can improve the normal form of Th. 9.9 to the so-called Jordan normal form:
Theorem 9.13 (Jordan Normal Form). Let V be a vector space over the field F, dim V = n ∈ N, A ∈ L(V, V). Assume there exist distinct λ_1, ..., λ_m ∈ F such that

    µ_A = Π_{i=1}^m (X − λ_i)^{r_i},   σ(A) = {λ_1, ..., λ_m},   m, r_1, ..., r_m ∈ N    (9.14)

(if F is algebraically closed, then (9.14) always holds).

(a) There exist subspaces U_1, ..., U_l of V, l ∈ N, such that each U_i is A-cyclic and A-irreducible, satisfying

    V = ⊕_{k=1}^l U_k.

Moreover, for each k ∈ {1, ..., l}, there exist v_k ∈ U_k and i = i(k) ∈ {1, ..., m} such that

    U_k ⊆ ker(A − λ_i Id)^{r_i}

and

    J_k := { v_k, (A − λ_i Id)v_k, ..., (A − λ_i Id)^{s_k − 1} v_k },
    s_k := dim U_k ≤ r_i ≤ dim ker(A − λ_i Id)^{r_i},

is a basis of U_k (note that, in general, the same i will correspond to many distinct subspaces U_k). Then, with respect to the ordered basis

    J := ( (A − λ_{i(1)} Id)^{s_1 − 1} v_1, ..., v_1, ..., (A − λ_{i(l)} Id)^{s_l − 1} v_l, ..., v_l ),

A has a matrix in Jordan normal form, i.e. the block diagonal matrix

        [ N_1  0   ...  0   ]
    N = [ 0    N_2 ...  0   ]
        [ ...            ...]
        [ 0    ...  0   N_l ],

each block (called a Jordan block) having the form

    N_k = (λ_{i(k)}) ∈ M(1, F)   for s_k = 1,

and, for s_k > 1, N_k ∈ M(s_k, F) is the matrix with λ_{i(k)} in each diagonal entry, 1 in each entry directly above the diagonal, and 0 everywhere else:

          [ λ_{i(k)}  1                       ]
          [        λ_{i(k)}  1                ]
    N_k = [              ...   ...            ] .
          [                 λ_{i(k)}   1      ]
          [          0            λ_{i(k)}    ]

(b) In the situation of (a), recall from Def. 6.12 that, for each λ ∈ σ(A), r(λ) is such that m_a(λ) = dim ker(A − λ Id)^{r(λ)}; that, for each s ∈ {1, ..., r(λ)},

    E_A^s(λ) = ker(A − λ Id)^s

is called the corresponding generalized eigenspace of rank s of A; and that each v ∈ E_A^s(λ) \ E_A^{s−1}(λ), s ≥ 2, is called a generalized eigenvector of rank s. We obtain

    ∀_{i∈{1,...,m}}  r_i = r(λ_i),
    ∀_{k∈{1,...,l}}  U_k ⊆ E_A^{s_k}(λ_{i(k)}) ⊆ E_A^{r_{i(k)}}(λ_{i(k)}).

Moreover, for each k ∈ {1, ..., l}, v_k is a generalized eigenvector of rank s_k and the basis J_k consists of generalized eigenvectors, containing precisely one generalized eigenvector of rank s for each s ∈ {1, ..., s_k}. Define

    ∀_{i∈{1,...,m}} ∀_{s∈N}  l(i, s) := #{ k ∈ {1, ..., l} : U_k ⊆ ker(A − λ_i Id)^{r_i} ∧ dim U_k = s }
(thus, l(i, s) is the number of Jordan blocks of size s corresponding to the eigenvalue λ_i – apart from the slightly different notation used here, the l(i, s) are precisely the numbers called l_k in Prop. 9.8(b)). Then, for each i ∈ {1, ..., m},

    l(i, r_i) = dim ker(A − λ_i Id)^{r_i} − dim ker(A − λ_i Id)^{r_i − 1},    (9.15a)

    ∀_{s∈{1,...,r_i − 1}}  l(i, s) = dim ker(A − λ_i Id)^{r_i} − dim ker(A − λ_i Id)^{s−1}
                                       − Σ_{j=s+1}^{r_i} l(i, j) (j − (s − 1)).    (9.15b)

Thus, in general, one needs to determine, for each i ∈ {1, ..., m} and each s ∈ {1, ..., r_i}, dim ker(A − λ_i Id)^s to know the precise structure of N.

(c) For the sake of completeness and convenience, we restate Th. 9.9(b): If

    V = ⊕_{i=1}^L W_i

is another decomposition of V into A-invariant and A-irreducible subspaces W_1, ..., W_L of V, L ∈ N, then L = l and there exist a permutation π ∈ S_l and T ∈ GL(V) such that

    T A = A T  ∧  ∀_{i∈{1,...,l}}  T(U_i) = W_{π(i)}.

Proof. (a),(b): The A-cyclic and A-irreducible subspaces U_1, ..., U_l are given by Th. 9.9(a). As in Th. 9.9(a), for each k ∈ {1, ..., l}, let v_k ∈ U_k be such that

    B_k := {v_k, Av_k, ..., A^{s_k − 1} v_k},   s_k := dim U_k,

is a basis of U_k. By Prop. 9.8(a), there exists i := i(k) ∈ {1, ..., m} such that U_k ⊆ ker(A − λ_i Id)^{r_i}. For s_k = 1, we have J_k = B_k = {v_k}. For s_k > 1, letting, for each j ∈ {0, ..., s_k − 1}, w_j := (A − λ_i Id)^j v_k, we obtain

    ∀_{j∈{0,...,s_k − 2}}  A w_j = A(A − λ_i Id)^j v_k = (A − λ_i Id + λ_i Id)(A − λ_i Id)^j v_k = w_{j+1} + λ_i w_j    (9.16a)

and, using (A − λ_i Id)^{s_k} v_k = 0 due to ε_{A↾U_k}(µ_{A↾U_k}) = 0,

    A w_{s_k − 1} = A(A − λ_i Id)^{s_k − 1} v_k = (A − λ_i Id)^{s_k} v_k + λ_i w_{s_k − 1} = λ_i w_{s_k − 1}.    (9.16b)

Thus, with respect to the ordered basis (w_{s_k − 1}, ..., w_0), A↾U_k has the matrix N_k. As each U_k is A-invariant, this also proves N to be the matrix of A with respect to J. Proposition 9.10(a),(c) yields s_k ≤ r_i ≤ dim ker(A − λ_i Id)^{r_i}. Moreover, (9.16) shows that J_k contains precisely one generalized eigenvector of rank s for each s ∈ {1, ..., s_k}. Finally, (9.15), i.e. the formulas for the l(i, s), are given by the recursion (9.12) (which was inferred from (9.9)), using that, in the current situation, g = X − λ_i, deg g = 1, and (with V_i := ker(A − λ_i Id)^{r_i}),

    ∀_{s∈N_0}  dim Im ε_{A↾V_i}(g^s) = dim ker(A − λ_i Id)^{r_i} − dim ker(A − λ_i Id)^s.

(c) was already proved, as it is merely a restatement of Th. 9.9(b). □


Example 9.14. (a) In Ex. 9.12(a), we considered V := F^6 (F some field) and A ∈ L(V, V) with

    χ_A = (X − 2)^2 (X − 3)^4,   µ_A = (X − 2)^2 (X − 3)^3.

We obtained V = ⊕_{i=1}^3 U_i with U_1 ⊆ ker(A − 2 Id)^2, dim U_1 = 2, and U_2, U_3 ⊆ ker(A − 3 Id)^3 with dim U_2 = 3, dim U_3 = 1. Thus, the corresponding matrix in Jordan normal form is

        [ 2  1  0  0  0  0 ]
        [ 0  2  0  0  0  0 ]
    N = [ 0  0  3  1  0  0 ]
        [ 0  0  0  3  1  0 ]
        [ 0  0  0  0  3  0 ]
        [ 0  0  0  0  0  3 ].

(b) Consider the 8 × 8 matrices

          [2 1 0 0 0 0 0 0]          [2 1 0 0 0 0 0 0]
          [0 2 1 0 0 0 0 0]          [0 2 1 0 0 0 0 0]
          [0 0 2 0 0 0 0 0]          [0 0 2 0 0 0 0 0]
    N_1 = [0 0 0 2 1 0 0 0],   N_2 = [0 0 0 2 1 0 0 0]
          [0 0 0 0 2 1 0 0]          [0 0 0 0 2 0 0 0]
          [0 0 0 0 0 2 0 0]          [0 0 0 0 0 2 1 0]
          [0 0 0 0 0 0 2 0]          [0 0 0 0 0 0 2 0]
          [0 0 0 0 0 0 0 2]          [0 0 0 0 0 0 0 2]

over some field F. Both matrices are in Jordan normal form with

    χ_{N_1} = χ_{N_2} = (X − 2)^8,   µ_{N_1} = µ_{N_2} = (X − 2)^3.

Both have the same total number of Jordan blocks, namely 4, which corresponds to

    dim ker(N_1 − 2 Id_8) = dim ker(N_2 − 2 Id_8) = 4.

The differences appear in the generalized eigenspaces of higher rank: N_1 has two linearly independent generalized eigenvectors of rank 2, whereas N_2 has three linearly independent generalized eigenvectors of rank 2, yielding

    dim ker(N_1 − 2 Id_8)^2 − dim ker(N_1 − 2 Id_8) = 2,  i.e.  dim ker(N_1 − 2 Id_8)^2 = 6,
    dim ker(N_2 − 2 Id_8)^2 − dim ker(N_2 − 2 Id_8) = 3,  i.e.  dim ker(N_2 − 2 Id_8)^2 = 7.

Next, N_1 has two linearly independent generalized eigenvectors of rank 3, whereas N_2 has one linearly independent generalized eigenvector of rank 3, yielding

    dim ker(N_1 − 2 Id_8)^3 − dim ker(N_1 − 2 Id_8)^2 = 2,  i.e.  dim ker(N_1 − 2 Id_8)^3 = 8,
    dim ker(N_2 − 2 Id_8)^3 − dim ker(N_2 − 2 Id_8)^2 = 1,  i.e.  dim ker(N_2 − 2 Id_8)^3 = 8.

From (9.15a), we obtain (with i = 1)

    l_{N_1}(1, 3) = 2,   l_{N_2}(1, 3) = 1,

corresponding to N_1 having two blocks of size 3 and N_2 having one block of size 3. From (9.15b), we obtain (with i = 1)

    l_{N_1}(1, 2) = 8 − 4 − 2·(3 − 1) = 0,   l_{N_2}(1, 2) = 8 − 4 − 1·(3 − 1) = 2,

corresponding to N_1 having no blocks of size 2 and N_2 having two blocks of size 2. To check consistency, we use (9.15b) again to obtain

    l_{N_1}(1, 1) = 8 − 0 − 0·(2 − 0) − 2·(3 − 0) = 2,   l_{N_2}(1, 1) = 8 − 0 − 2·(2 − 0) − 1·(3 − 0) = 1,

corresponding to N_1 having two blocks of size 1 and N_2 having one block of size 1.
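The recursion (9.15) is easily mechanized; the following short sketch is my addition (plain Python, no libraries) and reproduces the block counts of this example from the kernel dimensions.

    def jordan_block_counts(d):
        """d = [d_0, d_1, ..., d_r] with d_s = dim ker(A - lambda Id)^s, d_0 = 0;
        returns {s: l(i, s)}, implementing (9.15a) and (9.15b)."""
        r = len(d) - 1
        l = {r: d[r] - d[r - 1]}                      # (9.15a)
        for s in range(r - 1, 0, -1):                 # (9.15b), s = r-1 down to 1
            l[s] = d[r] - d[s - 1] - sum(l[j] * (j - (s - 1)) for j in range(s + 1, r + 1))
        return l

    # Ex. 9.14(b): N_1 has d = [0, 4, 6, 8], N_2 has d = [0, 4, 7, 8]:
    print(jordan_block_counts([0, 4, 6, 8]))  # {3: 2, 2: 0, 1: 2}
    print(jordan_block_counts([0, 4, 7, 8]))  # {3: 1, 2: 2, 1: 1}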
Remark 9.15. In the situation of Th. 9.13, we saw that, in order to find a matrix N in Jordan normal form for A, according to Th. 9.13(b), in general, one has to know dim ker(A − λ_i Id)^s for each i ∈ {1, ..., m} and each s ∈ {1, ..., r_i}. On the other hand, given a matrix M of A, one might also want to find the transition matrix T ∈ GL_n(F) such that M = T N T^{−1}. As it turns out, if one has already determined generalized eigenvectors forming bases of the ker(A − λ_i Id)^s, one may use these same vectors for the columns of T: Indeed, let M = T N T^{−1} and let t_1, ..., t_n denote the columns of T (i.e. t_j = T e_j). If j ∈ {1, ..., n} corresponds to a column of N with λ_i being the only nonzero entry, then

    M t_j = T N T^{−1} t_j = T N e_j = T λ_i e_j = λ_i t_j,

showing t_j to be a corresponding eigenvector. If j ∈ {1, ..., n} corresponds to a column of N having nonzero entries λ_i and 1 (above the λ_i), then

    (M − λ_i Id_n) t_j = (T N T^{−1} − λ_i Id_n) t_j = T N e_j − λ_i t_j = T(e_{j−1} + λ_i e_j) − λ_i t_j
                       = t_{j−1} + λ_i t_j − λ_i t_j = t_{j−1},

showing t_j to be a generalized eigenvector of rank ≥ 2 corresponding to the Jordan block containing the index j.
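For concrete matrices, computer algebra systems carry out this computation; e.g., sympy's Matrix.jordan_form returns both a Jordan form J and a transition matrix P whose columns are chains of generalized eigenvectors as described above. The sketch below is my addition; the sample matrix is an arbitrary choice.

    import sympy as sp

    M = sp.Matrix([[5, 4, 2, 1],
                   [0, 1, -1, -1],
                   [-1, -1, 3, 0],
                   [1, 1, -1, 2]])
    P, J = M.jordan_form()
    print(J)                                  # Jordan blocks of M
    print(sp.simplify(P * J * P.inv() - M))   # zero matrix: M = P J P^{-1}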

10 Vector Spaces with Inner Products


In this section, the field F will always be R or C. As before, we write K if K may stand
for R or C.

10.1 Definition, Examples


Definition 10.1. Let X be a vector space over K. A function h·, ·i : X × X −→ K
is called an inner product or a scalar product on X if, and only if, the following three
conditions are satisfied:

(i) hx, xi ∈ R+ for each 0 6= x ∈ X (i.e. an inner product is positive definite).

(ii) hλx + µy, zi = λhx, zi + µhy, zi for each x, y, z ∈ X and each λ, µ ∈ K (i.e. an
inner product is K-linear in its first argument).

(iii) hx, yi = hy, xi for each x, y ∈ X (i.e. an inner product is conjugate-symmetric,


even symmetric for K = R).

Lemma 10.2. For each inner product h·, ·i on a vector space X over K, the following
formulas are valid:

(a) hx, λy + µzi = λ̄hx, yi + µ̄hx, zi for each x, y, z ∈ X and each λ, µ ∈ K, i.e. h·, ·i
is conjugate-linear in its second argument, even linear for K = R. Together with
Def. 10.1(ii), this means that h·, ·i is a sesquilinear form, even a bilinear form for
K = R.

(b) h0, xi = hx, 0i = 0 for each x ∈ X.

Proof. (a): One computes, for each x, y, z ∈ X and each λ, µ ∈ K,

    hx, λy + µzi  =(Def. 10.1(iii))=  \overline{hλy + µz, xi}  =(Def. 10.1(ii))=  \overline{λhy, xi + µhz, xi}
                  = λ̄ \overline{hy, xi} + µ̄ \overline{hz, xi}  =(Def. 10.1(iii))=  λ̄hx, yi + µ̄hx, zi.

(b): One computes, for each x ∈ X,

    hx, 0i  =(Def. 10.1(iii))=  \overline{h0, xi} = \overline{h0x, xi}  =(Def. 10.1(ii))=  \overline{0 · hx, xi} = 0,

thereby completing the proof of the lemma. □



Remark 10.3. If X is a vector space over K with an inner product h·, ·i, then the map

    k · k : X −→ R_0^+,   kxk := √(hx, xi),

defines a norm on X (cf. [Phi16b, Prop. 1.65]). One calls this the norm induced by the inner product.
Definition 10.4. Let X be a vector space over K. If h·, ·i is an inner product on X, then (X, h·, ·i) is called an inner product space or a pre-Hilbert space. An inner product space is called a Hilbert space if, and only if, (X, k · k) is a Banach space, where k · k is the induced norm, i.e. kxk := √(hx, xi). Frequently, the inner product on X is understood and X itself is referred to as an inner product space or Hilbert space.
Example 10.5. (a) On the space K^n, n ∈ N, we define an inner product by letting, for each z = (z_1, ..., z_n) ∈ K^n, w = (w_1, ..., w_n) ∈ K^n:

    z · w := Σ_{j=1}^n z_j w̄_j    (10.1)

(called the standard inner product on K^n, also the Euclidean inner product for K = R). Let us verify that (10.1), indeed, defines an inner product in the sense of Def. 10.1: If z ≠ 0, then there is j_0 ∈ {1, ..., n} such that z_{j_0} ≠ 0. Thus, z · z = Σ_{j=1}^n |z_j|² ≥ |z_{j_0}|² > 0, i.e. Def. 10.1(i) is satisfied. Next, let z, w, u ∈ K^n and λ, µ ∈ K. One computes

    (λz + µw) · u = Σ_{j=1}^n (λz_j + µw_j) ū_j = Σ_{j=1}^n λz_j ū_j + Σ_{j=1}^n µw_j ū_j = λ(z · u) + µ(w · u),

i.e. Def. 10.1(ii) is satisfied. For Def. 10.1(iii), merely note that

    z · w = Σ_{j=1}^n z_j w̄_j = \overline{Σ_{j=1}^n w_j z̄_j} = \overline{w · z}.

Hence, we have shown that (10.1) defines an inner product according to Def. 10.1. Due to [Phi16b, Prop. 1.59(b)], the induced norm is complete, i.e. K^n with the inner product of (10.1) is a Hilbert space.
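A small numerical illustration (my addition, assuming numpy): note that (10.1) is conjugate-linear in the second argument, whereas numpy's vdot conjugates its first argument, so the arguments must be swapped.

    import numpy as np

    def inner(z, w):
        """Standard inner product (10.1): sum_j z_j * conj(w_j)."""
        return np.sum(z * np.conj(w))

    rng = np.random.default_rng(0)
    z = rng.normal(size=4) + 1j * rng.normal(size=4)
    w = rng.normal(size=4) + 1j * rng.normal(size=4)

    print(np.isclose(inner(z, w), np.conj(inner(w, z))))  # Def. 10.1(iii)
    print(np.isclose(inner(z, w), np.vdot(w, z)))         # matches numpy's convention
    print(inner(z, z).real > 0)                           # Def. 10.1(i) for z != 0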

(b) Let a, b ∈ R, a < b. We define an inner product on the space X := C[a, b] of continuous K-valued functions on [a, b] by letting

    h·, ·i : X × X −→ K,   hf, gi := ∫_a^b f ḡ.    (10.2)

We verify that (10.2), indeed, defines an inner product: If f ∈ X, f ≢ 0, then there exists t ∈ [a, b] such that |f(t)| = α ∈ R^+. As f is continuous, there exists δ > 0 such that

    ∀_{s∈[a,b]∩[t−δ,t+δ]}  |f(s)| > α/2,

implying

    hf, f i = ∫_a^b |f|² ≥ ∫_{[a,b]∩[t−δ,t+δ]} |f|² > 0.

Next, let f, g, h ∈ X and λ, µ ∈ K. One computes

    hλf + µg, hi = ∫_a^b (λf + µg) h̄ = λ ∫_a^b f h̄ + µ ∫_a^b g h̄ = λhf, hi + µhg, hi.

Moreover,

    ∀_{f,g∈X}  hf, gi = ∫_a^b f ḡ = \overline{∫_a^b g f̄} = \overline{hg, f i},

showing h·, ·i to be an inner product on X. As it turns out, (X, k · k₂) is an example of an inner product space that is not a Hilbert space: With respect to the norm induced by h·, ·i (usually called the 2-norm, denoted k · k₂), X is dense in L²[a, b] (the space of square-integrable functions with respect to Lebesgue (or Borel) measure on [a, b], cf., e.g., [Phi17a, Th. 2.49(a)]), which is the completion of X with respect to k · k₂.

10.2 Preserving Norm, Metric, Inner Product


The following Th. 10.6 provides important relations between the inner product and its
induced norm and metric:

Theorem 10.6. Let (X, h·, ·i) be an inner product space over K with induced norm k · k. Then the following assertions hold true:

(a) Cauchy-Schwarz Inequality:

    ∀_{x,y∈X}  |hx, yi| ≤ kxk kyk.

(b) Parallelogram Law:

    ∀_{x,y∈X}  kx + yk² + kx − yk² = 2 (kxk² + kyk²).    (10.3)
(c) Metric Expressed via Norm and Inner Product:

    ∀_{x,y∈X}  kx − yk² = kxk² − hx, yi − hy, xi + kyk².    (10.4)

(d) If K = R, then

    ∀_{x,y∈X}  hx, yi = (1/4)(kx + yk² − kx − yk²)  =(10.3)=  (1/2)(kxk² + kyk² − kx − yk²).    (10.5)

If K = C, then

    ∀_{x,y∈X}  hx, yi = (1/4)(kx + yk² − kx − yk² + i kx + iyk² − i kx − iyk²).

Proof. (a) was proved as [Phi16b, Th. 1.64].

(b),(c): The computation

    kx − yk² = hx − y, x − yi = kxk² − hx, yi − hy, xi + kyk²

proves (c), whereas

    kx + yk² + kx − yk² = kxk² + hx, yi + hy, xi + kyk² + kxk² − hx, yi − hy, xi + kyk²
                        = 2 (kxk² + kyk²)

proves (10.3).
(d): If K = R, then

    kx + yk² − kx − yk² = 4 hx, yi.

If K = C, then

    kx + yk² − kx − yk² + i kx + iyk² − i kx − iyk² = 4 Rehx, yi + 4i Rehx, iyi
        = 4 Rehx, yi + 4i Imhx, yi = 4 hx, yi,

proving (d). □

One can actually also show (with more effort) that a normed space that satisfies (10.3)
must be an inner product space, see, e.g., [Wer11, Th. V.1.7].
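The complex polarization identity of Th. 10.6(d) can be sanity-checked numerically; the sketch below is my addition (assuming numpy and the standard inner product (10.1)).

    import numpy as np

    def inner(z, w):
        return np.sum(z * np.conj(w))        # convention (10.1)

    rng = np.random.default_rng(1)
    x = rng.normal(size=3) + 1j * rng.normal(size=3)
    y = rng.normal(size=3) + 1j * rng.normal(size=3)

    nsq = lambda v: np.sum(np.abs(v) ** 2)   # squared induced norm
    polar = (nsq(x + y) - nsq(x - y) + 1j * nsq(x + 1j * y) - 1j * nsq(x - 1j * y)) / 4
    print(np.isclose(polar, inner(x, y)))    # True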
Definition 10.7. Let (X, k · k), (Y, k · k) be normed vector spaces over K, f : X −→ Y .

(a) One calls f norm-preserving if, and only if,

∀ kf (v)k = kvk.
v∈X

(b) One calls f isometric if, and only if,

∀ kf (u) − f (v)k = ku − vk
u,v∈X

(i.e. if, and only if, f preserves the metric induced by the norm on X).

(c) If the norms on X and Y are induced via inner products h·, ·i on X and Y , re-
spectively, kvk2 = hv, vi, then one calls f inner product-preserving if, and only
if,
∀ hf (u), f (v)i = hu, vi.
u,v∈X

While, in general, neither norm-preserving nor isometric implies any of the other prop-
erties defined in Def. 10.7 (cf. Ex. 10.9(a),(b) below), there exist simple as well as subtle
relationships between these notions and also relationships with linearity:
 
Theorem 10.8. Let (X, h·, ·i), (Y, h·, ·i) be inner product spaces over K, f : X −→ Y. We consider X, Y with the respective induced norms and metrics, kvk² = hv, vi, d(u, v) = ku − vk.

(a) If f is inner product-preserving, then f is norm-preserving, isometric, and linear.

(b) If K = R and f is isometric as well as norm-preserving, then f is inner product-


preserving (this does, in general, not hold for K = C, see Ex. 10.9(c)).

(c) If f is isometric with f (0) = 0, then f is norm-preserving and, for K = R, also


inner product-preserving (however, once again, cf. Ex. 10.9(c) for K = C).

(d) If K = R and f is isometric, then f is affine (i.e. x 7→ f (x) − f (0) is linear, cf.
Def. 1.26)7 . As in (b) and (c), Ex. 10.9(c) below shows that the result does not
extend to K = C.

(e) If f is linear, then the following statements are equivalent (where the equivalence
between (i) and (ii) even holds in arbitrary normed vector spaces X, Y over K):

(i) f is isometric.
7
This result holds in more general situations: If X and Y are arbitrary normed vector spaces over
R and f : X −→ Y is isometric, then f must be linear, provided Y is strictly convex (cf. [FJ03, Th.
1.3.8] and see Ex. 10.9(e) for a definition of strictly convex spaces) or provided f is surjective (this is
the Mazur-Ulam Theorem, cf. [FJ03, Th. 1.3.5]). However, there exist nonlinear isometries into spaces
that are not strictly convex, cf. Ex. 10.9(e).

(ii) f is norm-preserving.
(iii) f is inner product-preserving.

Proof. (a): Assume f to be inner product-preserving and let u, v ∈ X, λ ∈ K. Then,

kf (v)k2 = hf (v), f (v)i = hv, vi = kvk2 ,

showing f to be norm-preserving. In consequence,


(10.4)
kf (u) − f (v)k2 = kf (u)k2 − hf (u), f (v)i − hf (v), f (u)i + kf (v)k2
= kuk2 − hu, vi − hv, ui + kvk2 ,

showing f to be isometric. Moreover,

kf (λu) − λf (u)k2 = hf (λu) − λf (u), f (λu) − λf (u)i


= hf (λu), f (λu)i − hf (λu), λf (u)i − hλf (u), f (λu)i + hλf (u), λf (u)i
= hλu, λui − λhf (λu), f (u)i − λhf (u), f (λu)i + λλhf (u), f (u)i
= λλhu, ui − λλhu, ui − λλhu, ui + λλhu, ui = 0,

proving f (λu) = λf (u), and

kf (u + v) − (f (u) + f (v))k2 = hf (u + v) − (f (u) + f (v)), f (u + v) − (f (u) + f (v))i


= hf (u + v), f (u + v)i − hf (u + v), (f (u) + f (v))i
− h(f (u) + f (v)), f (u + v)i + h(f (u) + f (v)), (f (u) + f (v))i
= hu + v, u + vi − hf (u + v), f (u)i − hf (u + v), f (v)i
− hf (u), f (u + v)i − hf (v), f (u + v)i
+ hf (u), f (u)i + hf (u), f (v)i + hf (v), f (u)i + hf (v), f (v)i
= hu, ui + hu, vi + hv, ui + hv, vi
− hu, ui − hv, ui − hu, vi − hv, vi
− hu, ui − hu, vi − hv, ui − hv, vi
+ hu, ui + hu, vi + hv, ui + hv, vi = 0,

proving f (u + v) = f (u) + f (v).


(b) is immediate from (10.5).
(c): If f is isometric with f (0) = 0, then f is always norm-preserving due to

∀ kf (x)k = kf (x) − 0k = kf (x) − f (0)k = kx − 0k = kxk


x∈X

(this works in arbitrary normed vector spaces X, Y over R or C). According to (b), f is
then also inner product-preserving.
(d): Let f be isometric. To show f is affine, it suffices to show

F : X −→ Y, F (x) := f (x) − f (0),

is linear. Due to

∀ kF (u) − F (v)k = kf (u) − f (v)k = ku − vk,


u,v∈X

F is isometric as well. Thus, by (c), F is inner product-preserving, which, by (a), implies


F to be linear, as desired.
(e): (iii) ⇒ (i),(ii) holds due to (a). Next, we show (i) ⇔ (ii): As f is linear, we have

    (ii) ⇔ ∀_{u∈X} kf(u)k = kuk ⇔ ∀_{u,v∈X} kf(u) − f(v)k = kf(u − v)k = ku − vk ⇔ (i).

It remains to prove (ii) ⇒ (iii). To this end, let u, v ∈ X. Then (ii) and the linearity of
f imply

hf (u), f (u)i + hf (u), f (v)i + hf (v), f (u)i + hf (v), f (v)i = hf (u) + f (v), f (u) + f (v)i
= hf (u + v), f (u + v)i = kf (u + v)k2 = ku + vk2
= hu + v, u + vi = hu, ui + hu, vi + hv, ui + hv, vi

and, thus, using (ii) again,

hf (u), f (v)i + hf (v), f (u)i = hu, vi + hv, ui.

Similarly, using the linearity of f and (ii) again,

    hf (u), f (u)i − ihf (u), f (v)i + ihf (v), f (u)i + hf (v), f (v)i = hf (u + iv), f (u + iv)i
        = hu + iv, u + ivi = hu, ui − ihu, vi + ihv, ui + hv, vi

and, thus,
hf (u), f (v)i − hf (v), f (u)i = hu, vi − hv, ui.
Adding both results and dividing by 2 then yields (iii). 

Example 10.9. Let (X, k · k), (Y, k · k) be normed vector spaces over K.

(a) If 0 6= a ∈ X, then the translation

f : X −→ X, f (x) := x + a,

is isometric due to

∀ kf (u) − f (v)k = ku + a − (v + a)k = ku − vk.


u,v∈X

However, f is not norm-preserving due to

kf (0)k = kak 6= 0,

and, if h·, ·i is an inner product on X, then f is not inner product-preserving due


to
hf (0), f (0)i = ha, ai 6= 0.

(b) The following maps g and h are norm-preserving, but neither continuous (with respect to the induced metric) nor linear. Thus, according to [Phi16b, Lem. 2.32(b)], the maps are not isometric and, using Th. 10.8(a), they are not inner product-preserving (if h·, ·i is an inner product on X). To define g, let 0 ≠ a ∈ X and set

    g : X −→ X,   g(x) := { x    for x ≠ a,
                            −a   for x = a.

Clearly, g is norm-preserving, discontinuous in a, and nonlinear (e.g. 2a = g(2a) ≠ −2a = 2g(a)). The map

    h : K −→ K,   h(x) := { x    for Re x ∈ Q and Im x ∈ Q,
                            −x   otherwise,

is, clearly, norm-preserving, nowhere continuous except in 0, and nonlinear (e.g. −1 − √2 = h(1 + √2) ≠ 1 − √2 = h(1) + h(√2)).

(c) The map

    f : C −→ C,   f(z) := z̄,

satisfies f(0) = 0, is surjective, and it is norm-preserving as well as isometric due to |z̄| = |z| and |z̄ − w̄| = |z − w|, but neither inner product-preserving nor C-linear (it is conjugate-linear due to f(wz) = w̄ f(z)). If we consider (C², k · k₂), then the map

    g : C² −→ C²,   g(z, w) := (z̄, w),

satisfies g(0) = 0, is surjective, norm-preserving (due to kg(z, w)k₂² = z z̄ + w w̄ = k(z, w)k₂²), and isometric (due to

    kg(z, w) − g(u, v)k₂² = k(\overline{z − u}, w − v)k₂² = |z − u|² + |w − v|²
                          = k(z, w) − (u, v)k₂²),

but neither C-linear nor conjugate-linear.


(d) Let (X, k · k) = (R, | · |) and (Y, k · k) = (R2 , k · k∞ ). Then the map

f : R −→ R2 , f (x) := (x, sin x),

satisfies f (0) = (0, 0), is isometric (due to

kf (x) − f (y)k∞ = max{|x − y|, | sin x − sin y|} = |x − y|,

recalling sin to be 1-Lipschitz), but not linear.


(e) The normed vector space (Y, k · k) over R is called strictly convex if, and only if, there do not exist vectors u, v ∈ Y with u ≠ v and such that

    ∀_{α∈[0,1]}  kα u + (1 − α) vk = 1    (10.6)

(if k · k is induced by an inner product on Y, then (Y, k · k) is strictly convex; whereas (R^n, k · k_∞) is not strictly convex for n > 1). If (Y, k · k) is not strictly convex and u, v ∈ Y with u ≠ v are such that (10.6) holds, then

    f : R −→ Y,   f(s) := { su              for s ≤ 1,
                            u + (s − 1) v   for s > 1,

satisfies f(0) = 0 and is isometric: To prove isometry, note

    kf(s) − f(t)k = { ksu − tuk = |s − t|                       for s, t ≤ 1,
                      ku + (s − 1)v − u − (t − 1)vk = |s − t|   for 1 < s, t,

and, for s ≤ 1 < t,

    kf(s) − f(t)k = ksu − u − (t − 1)vk = k(t − 1)v + (1 − s)uk
                  = (t − s) k ((t − 1)v + (1 − s)u) / (t − s) k = |s − t|,

since 0 ≤ (t − 1)/(t − s), (1 − s)/(t − s) ≤ 1 and (t − 1)/(t − s) + (1 − s)/(t − s) = 1, so that (10.6) applies. However, f is not linear, e.g., since

    u + v = f(2) = f(1 + 1) ≠ f(1) + f(1) = 2u.



10.3 Orthogonality

Definition 10.10. Let X, h·, ·i be an inner product space over K.

(a) x, y ∈ X are called orthogonal or perpendicular (denoted x ⊥ y) if, and only if,
hx, yi = 0.
(b) Let E ⊆ X. Define the perpendicular space E⊥ to E (called E perp) by

    E⊥ := { y ∈ X : ∀_{x∈E}  hx, yi = 0 }.    (10.7)

Caveat: As is common, we use the same symbol to denote the perpendicular space
that we used to denote the forward annihilator in Def. 2.13, even though these
objects are not the same: The perpendicular space is a subset of X, whereas the
forward annihilator is a subset of X ′ . In the following, when dealing with inner
product spaces, E ⊥ will always mean the perpendicular space.
(c) If X = V1 ⊕ V2 with subspaces V1 , V2 of X, then we call X the orthogonal sum of
V1 , V2 if, and only if, v1 ⊥ v2 for each v1 ∈ V1 , v2 ∈ V2 . In this case, we also write
X = V1 ⊥ V2 .
(d) Let S ⊆ X. Then S is an orthogonal system if, and only if, x ⊥ y for each x, y ∈ S
with x 6= y. A unit vector is x ∈ X such that kxk = 1 (with respect to the
induced norm on X). Then S is called an orthonormal system if, and only if, S
is an orthogonal system consisting entirely of unit vectors. Finally, S is called an
orthonormal basis if, and only if, it is a maximal orthonormal system in the sense
that, if S ⊆ T ⊆ X and T is an orthonormal system, then S = T (caveat: if X is
an infinite-dimensional Hilbert space, then an orthonormal basis of X is not(!) a
vector space basis of X).

Lemma 10.11. Let X, h·, ·i be an inner product space over K, E ⊆ X.

(a) E ∩ E ⊥ ⊆ {0}.
(b) E ⊥ is a subspace of X.
(c) X ⊥ = {0} and {0}⊥ = X.

Proof. (a): If x ∈ E ∩ E ⊥ , then hx, xi = 0, implying x = 0.


(b): We have 0 ∈ E⊥ and

    ∀_{λ,µ∈K} ∀_{y1,y2∈E⊥} ∀_{x∈E}  hx, λy1 + µy2 i = λ̄hx, y1 i + µ̄hx, y2 i = 0,

showing λy1 + µy2 ∈ E ⊥ , i.e. E ⊥ is a subspace of X.


(c): If x ∈ X⊥, then 0 = hx, xi, implying x = 0. On the other hand, hx, 0i = 0 for each x ∈ X by Lem. 10.2(b). □

Proposition 10.12. Let X, h·, ·i be an inner product space over K and let S ⊆ X be
an orthogonal system.

(a) S \ {0} is linearly independent.

(b) Pythagoras' Theorem: If s_1, ..., s_n ∈ S are distinct, n ∈ N, then

    k Σ_{i=1}^n s_i k² = Σ_{i=1}^n ks_i k²,

where k · k is the induced norm on X.

Proof. (a): Suppose n ∈ N and λ_1, ..., λ_n ∈ K together with s_1, ..., s_n ∈ S \ {0} distinct are such that Σ_{i=1}^n λ_i s_i = 0. Then, as hs_k, s_j i = 0 for each k ≠ j, we obtain

    ∀_{j∈{1,...,n}}  0 = h0, s_j i = h Σ_{i=1}^n λ_i s_i, s_j i = Σ_{i=1}^n λ_i hs_i, s_j i = λ_j hs_j, s_j i,

which yields λ_j = 0 by Def. 10.1(i). Thus, we have shown that λ_j = 0 for each j ∈ {1, ..., n}, which establishes the linear independence of S \ {0}.

(b): We compute

    k Σ_{i=1}^n s_i k² = h Σ_{i=1}^n s_i, Σ_{j=1}^n s_j i = Σ_{i=1}^n Σ_{j=1}^n hs_i, s_j i = Σ_{i=1}^n hs_i, s_i i = Σ_{i=1}^n ks_i k²,

thereby establishing the case. □

To obtain orthogonal systems and orthonormal systems in inner product spaces, one
can apply the algorithm provided by the following Th. 10.13:

Theorem 10.13 (Gram-Schmidt Orthogonalization). Let (X, h·, ·i) be an inner product space over K with induced norm k · k. Let x_0, x_1, ... be a finite or infinite sequence of vectors in X. Define v_0, v_1, ... recursively as follows:

    v_0 := x_0,   v_n := x_n − Σ_{k=0, v_k ≠ 0}^{n−1}  (hx_n, v_k i / kv_k k²) v_k    (10.8)

for each n ∈ N, additionally assuming that n is less than or equal to the maximal index of the sequence x_0, x_1, ... if the sequence is finite. Then the set {v_0, v_1, ...} constitutes an orthogonal system. Of course, by omitting the v_k = 0 and by dividing each v_k ≠ 0 by its norm, one can also obtain an orthonormal system (nonempty if at least one v_k ≠ 0). Moreover, v_n = 0 if, and only if, x_n ∈ span{x_0, ..., x_{n−1}}. In particular, if the x_0, x_1, ... are all linearly independent, then so are the v_0, v_1, ....

Proof. We show by induction on n ∈ N_0 that, for each 0 ≤ m < n, v_n ⊥ v_m: For n = 0, there is nothing to show. Thus, let n > 0 and 0 ≤ m < n. By induction, hv_k, v_m i = 0 for each 0 ≤ k, m < n such that k ≠ m. For v_m = 0, hv_n, v_m i = 0 is clear. Otherwise,

    hv_n, v_m i = h x_n − Σ_{k=0, v_k ≠ 0}^{n−1} (hx_n, v_k i / kv_k k²) v_k , v_m i
                = hx_n, v_m i − (hx_n, v_m i / kv_m k²) hv_m, v_m i = 0,

thereby establishing the case. So we know that v_0, v_1, ... constitutes an orthogonal system. Next, by induction, for each n, we obtain v_n ∈ span{x_0, ..., x_n} directly from (10.8). Thus, v_n = 0 implies x_n = Σ_{k=0, v_k ≠ 0}^{n−1} (hx_n, v_k i / kv_k k²) v_k ∈ span{x_0, ..., x_{n−1}}. Conversely, if x_n ∈ span{x_0, ..., x_{n−1}}, then

    dim span{v_0, ..., v_{n−1}, v_n} = dim span{x_0, ..., x_{n−1}, x_n} = dim span{x_0, ..., x_{n−1}}
                                    = dim span{v_0, ..., v_{n−1}},

which, due to Prop. 10.12(a), implies v_n = 0. Finally, if all x_0, x_1, ... are linearly independent, then all v_k ≠ 0, k = 0, 1, ..., such that the v_0, v_1, ... are linearly independent by Prop. 10.12(a). □
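For concreteness, here is a direct transcription of (10.8) into Python (my addition, assuming numpy and the standard inner product (10.1); vectors that are numerically zero are skipped, as in the theorem).

    import numpy as np

    def gram_schmidt(xs, tol=1e-12):
        """Apply (10.8) to the sequence xs; returns the orthogonal system v_0, v_1, ..."""
        vs = []
        for x in xs:
            v = x.astype(complex)
            for vk in vs:
                nk = np.vdot(vk, vk).real                 # ||v_k||^2
                if nk > tol:
                    v = v - (np.vdot(vk, x) / nk) * vk    # (<x_n, v_k>/||v_k||^2) v_k
            vs.append(v)
        return vs

    xs = [np.array([1.0, 1.0, 0.0]), np.array([1.0, 0.0, 1.0]), np.array([0.0, 1.0, 1.0])]
    vs = gram_schmidt(xs)
    print(np.round([[np.vdot(v, w) for w in vs] for v in vs], 12))  # diagonal Gram matrix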

Example 10.14. In the space C[−1, 1] with the inner product according to Ex. 10.5(b), consider

    ∀_{i∈N_0}  x_i : [−1, 1] −→ K,   x_i(x) := x^i.

We check that the first four orthogonal polynomials resulting from applying (10.8) to x_0, x_1, ... are given by

    v_0(x) = 1,   v_1(x) = x,   v_2(x) = x² − 1/3,   v_3(x) = x³ − (3/5)x.

One has v_0 = x_0 ≡ 1 and, then, obtains successively from (10.8):

    v_1(x) = x_1(x) − (hx_1, v_0 i / kv_0 k²) v_0(x) = x − (∫_{−1}^1 x dx) / (∫_{−1}^1 dx) = x,

    v_2(x) = x_2(x) − (hx_2, v_0 i / kv_0 k²) v_0(x) − (hx_2, v_1 i / kv_1 k²) v_1(x)
           = x² − (∫_{−1}^1 x² dx) / (∫_{−1}^1 dx) − ((∫_{−1}^1 x³ dx) / (∫_{−1}^1 x² dx)) x
           = x² − 1/3,

    v_3(x) = x_3(x) − (hx_3, v_0 i / kv_0 k²) v_0(x) − (hx_3, v_1 i / kv_1 k²) v_1(x) − (hx_3, v_2 i / kv_2 k²) v_2(x)
           = x³ − (∫_{−1}^1 x³ dx) / (∫_{−1}^1 dx) − ((∫_{−1}^1 x⁴ dx) / (∫_{−1}^1 x² dx)) x
             − ((∫_{−1}^1 x³ (x² − 1/3) dx) / (∫_{−1}^1 (x² − 1/3)² dx)) (x² − 1/3)
           = x³ − ((2/5) / (2/3)) x = x³ − (3/5)x.
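The above integrals can be delegated to a computer algebra system; the following sketch is my addition (assuming sympy and K = R) and reproduces v_0, ..., v_3.

    import sympy as sp

    x = sp.symbols('x')
    inner = lambda f, g: sp.integrate(f * g, (x, -1, 1))   # real inner product of Ex. 10.5(b)

    vs = []
    for i in range(4):                       # apply (10.8) to 1, x, x^2, x^3
        v = x**i
        for vk in vs:
            v -= inner(x**i, vk) / inner(vk, vk) * vk
        vs.append(sp.expand(v))

    print(vs)   # [1, x, x**2 - 1/3, x**3 - 3*x/5]

Up to normalization, these are the Legendre polynomials.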
 
Definition 10.15. Let X, h·, ·i and Y, h·, ·i be inner product spaces over K. We
call X and Y isometrically isomorphic if, and only if, there exists an isometric linear
isomorphism A ∈ L(X, Y ) (i.e., by Th. 10.8(e), a linear isomorphism A, satisfying
hAu, Avi = hu, vi for each u, v ∈ X).

Theorem 10.16. Let X, h·, ·i be a finite-dimensional inner product space over K.

(a) An orthonormal system S ⊆ X is an orthonormal basis of X if, and only if, S is a


vector space basis of X.

(b) X has an orthonormal basis.



(c) If Y, h·, ·i is an inner product space over K, then X and Y are isometrically
isomorphic if, and only if, dim X = dim Y .

(d) If U is a subspace of X, then the following holds:

(i) X = U ⊥ U ⊥ .
(ii) dim U ⊥ = dim X − dim U .
(iii) (U ⊥ )⊥ = U .

Proof. (a): Let S be an orthonormal system. If S is an orthonormal basis, then, by Prop.


10.12(a) and Th. 10.13, S is a maximal linearly independent subset of X, showing S to
be a vector space basis of X. Conversely, if S is a vector space basis of X, then, again
using Prop. 10.12(a), S must be a maximal orthonormal system, i.e. an orthonormal
basis of X.
(b): According to Th. 10.13, applying Gram-Schmidt orthogonalization to a vector space
basis of X yields an orthonormal basis of X.

(c): We already know that the existence of a linear isomorphism between X and Y implies dim X = dim Y. Conversely, assume n := dim X = dim Y ∈ N, and let B_X = {x_1, ..., x_n}, B_Y = {y_1, ..., y_n} be orthonormal bases of X, Y, respectively. Define A ∈ L(X, Y) by letting Ax_i = y_i for each i ∈ {1, ..., n}. As A(B_X) = B_Y, A is a linear isomorphism. Moreover, if λ_1, ..., λ_n, µ_1, ..., µ_n ∈ K, then

    h A(Σ_{i=1}^n λ_i x_i), A(Σ_{j=1}^n µ_j x_j) i = Σ_{i=1}^n Σ_{j=1}^n λ_i µ̄_j hy_i, y_j i = Σ_{i=1}^n Σ_{j=1}^n λ_i µ̄_j δ_ij
        = Σ_{i=1}^n Σ_{j=1}^n λ_i µ̄_j hx_i, x_j i = h Σ_{i=1}^n λ_i x_i, Σ_{j=1}^n µ_j x_j i,

showing A to be isometric.
(d): Let {x1 , . . . , xn } be a basis of X such that {x1 , . . . , xm }, 1 ≤ m ≤ n, is a basis
of U , dim X = n, dim U = m. In this case, if v1 , . . . , vn are given by Gram-Schmidt
orthogonalization according to Th. 10.13, then Th. 10.13 yields {v1 , . . . , vn } to be a
basis of X and {v1 , . . . , vm } to be a basis of U . Since {vm+1 , . . . , vn } ⊆ U ⊥ , we have
X = U + U ⊥ . As we also know U ∩ U ⊥ = {0} by Lem. 10.11(a), we have shown
X = U ⊥ U ⊥ , then yielding dim U ⊥ = dim X − dim U as well. As a consequence of (ii),
we have dim(U ⊥ )⊥ = dim U , which, together with U ⊆ (U ⊥ )⊥ , yields (U ⊥ )⊥ = U . 

Using Zorn’s lemma, one can extend Th. 10.16(b) to infinite-dimensional spaces (cf.
[Phi17b, Th. 4.31(a)]). However, as remarked before, an orthonormal basis of a Hilbert
space X is a vector space basis of X if, and only if, dim X < ∞ (cf. [Phi17b, Rem.
4.33]). Moreover, Th. 10.16(c) also extends to infinite-dimensional Hilbert spaces, which
are isometrically isomorphic if, and only if, they have orthonormal bases of the same
cardinality (cf. [Phi17b, Th. 4.31(c)]). If one adds the assumption that the subspace U be closed (with respect to the induced norm), then Th. 10.16(d) extends to infinite-dimensional Hilbert spaces as well (cf. [Phi17b, Th. 4.20(e),(f)]).
If X is a finite-dimensional inner product space, then all linear forms on X (i.e. all
elements of the dual X ′ ) are given by means of the inner product on X:

Theorem 10.17. Let X, h·, ·i be a finite-dimensional inner product space over K.
Then the map
ψ : X −→ X ′ , ψ(y) := αy , (10.9)
where
αy : X −→ K, αy (a) := ha, yi, (10.10)
is bijective and conjugate-linear (in particular, each α ∈ X ′ can be represented by y ∈ X,
and, if K = R, then ψ is a linear isomorphism).

Proof. According to Def. 10.1(ii), for each y ∈ X, α_y is linear, i.e. ψ is well-defined. Moreover,

    ∀_{λ,µ∈K} ∀_{y1,y2∈X} ∀_{a∈X}  ψ(λy1 + µy2)(a) = ha, λy1 + µy2 i = λ̄ha, y1 i + µ̄ha, y2 i
                                                   = (λ̄ψ(y1) + µ̄ψ(y2))(a),

showing ψ to be conjugate-linear. It remains to show ψ is bijective. Let {x_1, ..., x_n} be a basis of X, dim X = n. It suffices to show that B := {α_{x_1}, ..., α_{x_n}} is a basis of X′. As dim X = dim X′, it even suffices to show that B is linearly independent. To this end, let λ_1, ..., λ_n ∈ K be such that 0 = Σ_{i=1}^n λ_i α_{x_i}. Then, for each v ∈ X,

    0 = ( Σ_{i=1}^n λ_i α_{x_i} ) v = Σ_{i=1}^n λ_i α_{x_i} v = Σ_{i=1}^n λ_i hv, x_i i = h v, Σ_{i=1}^n λ̄_i x_i i,

showing Σ_{i=1}^n λ̄_i x_i ∈ X⊥ = {0}. As the x_i are linearly independent, this yields λ_1 = ··· = λ_n = 0 and the desired linear independence of B. □
Remark 10.18. The above Th. 10.17 is a finite-dimensional version of the Riesz representation theorem for Hilbert spaces, which states that Th. 10.17 remains true if X is an infinite-dimensional Hilbert space and X′ is replaced by the topological dual X′_top of X, consisting of all linear forms on X that are also continuous with respect to the induced norm on X, cf. [Phi17b, Ex. 3.1] (recall that, if X is finite-dimensional, then all linear forms on X are automatically continuous, cf. [Phi16b, Ex. 2.16]).
Proposition 10.19. (a) If M = (mkl ) ∈ M(n, C), n ∈ N, is such that
∀ x∗ M x = 0, (10.11)
x∈Cn

then M = 0.

(b) If X, h·, ·i is a finite-dimensional inner product space over C and A ∈ L(X, X) is
such that
∀ hAx, xi = 0, (10.12)
x∈X

then A = 0.
Caveat: This result does not extend to finite-dimensional vector spaces over R (cf. Ex.
10.20 below).

Proof. (a): From matrix multiplication, we have

    x* M x = (x̄_1, ..., x̄_n) · ( Σ_{l=1}^n m_{1l} x_l , ... , Σ_{l=1}^n m_{nl} x_l )ᵗ = Σ_{k=1}^n Σ_{l=1}^n m_{kl} x̄_k x_l.

Choosing x to be the standard basis vector e_α, α ∈ {1, ..., n}, we obtain

    0 = e*_α M e_α = m_αα.

Now let α, β ∈ {1, ..., n} with α ≠ β and assume m_αβ = a + bi, m_βα = c + di with a, b, c, d ∈ R. If x := (s + ti)e_α + e_β with s, t ∈ R, then

    0 = Σ_{k=1}^n Σ_{l=1}^n m_{kl} x̄_k x_l = m_αβ x̄_α x_β + m_βα x̄_β x_α = (a + bi)(s − ti) + (c + di)(s + ti)
      = as + bt + cs − dt + (bs − at + ct + ds)i
      = (a + c)s + (b − d)t + ((b + d)s + (c − a)t) i,

implying

    (a + c)s + (b − d)t = 0  ∧  (b + d)s + (c − a)t = 0.

Choosing s = 0 and t = c − a yields c = a; choosing s = a + c and t = 0 then yields a + c = 2a = 0, implying a = 0 = c. Likewise, choosing s = 0 and t = b − d yields b = d; then choosing s = b + d and t = 0 yields b + d = 2b = 0, implying b = 0 = d. Thus, m_αβ = m_βα = 0, completing the proof that M = 0.
(b): Let n := dim X ∈ N. According to Th. 10.16(c), there exists a linear isomorphism I : X −→ C^n such that

    ∀_{x,y∈X}  hx, yi = hIx, Iyi₂,

where h·, ·i₂ denotes the standard inner product on C^n (i.e. hu, vi₂ = Σ_{k=1}^n u_k v̄_k). Thus, if we let B := I ∘ A ∘ I⁻¹, then B ∈ L(C^n, C^n) and

    ∀_{u∈C^n}  hBu, ui₂ = h(I ∘ A ∘ I⁻¹)u, (I ∘ I⁻¹)ui₂ = hA(I⁻¹u), I⁻¹ui = 0.

Now, if M ∈ M(n, C) represents B with respect to the standard basis of C^n, then, for each u ∈ C^n, u* M u = hM u, ui₂ = hBu, ui₂ = 0, such that M = 0 by (a). Thus, B = 0, also implying A = I⁻¹ ∘ B ∘ I = 0. □
Example 10.20. Consider R^n, n ∈ N, n ≥ 2, with the standard inner product (i.e. hu, vi = Σ_{k=1}^n u_k v_k), and the standard basis {e_1, ..., e_n}. Suppose the linear map A : R^n −→ R^n is defined by Ae_1 := −e_2, Ae_2 := e_1, Ae_k := 0 for each k ∈ {1, ..., n} \ {1, 2}. Then A ≠ 0, but

    ∀_{x=(x_1,...,x_n)∈R^n}  hAx, xi = hx_2 e_1 − x_1 e_2, xi = x_2 x_1 − x_1 x_2 = 0.

Proposition 10.35 below will provide more thorough information on the real case in
comparison with Prop. 10.19.

10.4 The Adjoint Map


 
Definition 10.21. Let (X_1, h·, ·i), (X_2, h·, ·i) be finite-dimensional inner product spaces over K. Moreover, let A ∈ L(X_1, X_2), let A′ ∈ L(X_2′, X_1′) be the dual map according to Def. 2.26, and let ψ_1 : X_1 −→ X_1′, ψ_2 : X_2 −→ X_2′ be the maps given by Th. 10.17. Then the map

    A* : X_2 −→ X_1,   A* := ψ_1⁻¹ ∘ A′ ∘ ψ_2,    (10.13)

is called the adjoint map of A.


 
Theorem 10.22. Let X1 , h·, ·i , X2 , h·, ·i be finite-dimensional inner product spaces
over K. Let A ∈ L(X1 , X2 ).

(a) One has A∗ ∈ L(X2 , X1 ), and A∗ is the unique map X2 −→ X1 such that

∀ ∀ hAx, yi = hx, A∗ yi. (10.14)


x∈X1 y∈X2

(b) One has A∗∗ = A.

(c) One has that A 7→ A∗ is a conjugate-linear bijection of L(X1 , X2 ) onto L(X2 , X1 ).

(d) (IdX1 )∗ = IdX1 .



(e) If X3 , h·, ·i is another finite-dimensional inner product space over K and B ∈
L(X2 , X3 ), then
(B ◦ A)∗ = A∗ ◦ B ∗ .

(f ) ker(A∗ ) = (Im A)⊥ and Im(A∗ ) = (ker A)⊥ .

(g) A is a monomorphism if, and only if, A∗ is an epimorphism.

(h) A is an epimorphism if, and only if, A* is a monomorphism.

(i) A−1 ∈ L(X2 , X1 ) exists if, and only if, (A∗ )−1 ∈ L(X1 , X2 ) exists, and, in that case,

(A∗ )−1 = (A−1 )∗ .

(j) A is isometric if, and only if, A∗ = A−1 .



Proof. (a): Let A ∈ L(X1 , X2 ). Then A∗ ∈ L(X2 , X1 ), since A′ is linear, and ψ1−1 and
ψ2 are both conjugate-linear. Moreover, we know A′ is the unique map on X2′ such that

∀ ∀ A′ (β)(x) = β(A(x)).
β∈X2′ x∈X1

Thus, for each x ∈ X1 and each y ∈ X2 ,

hx, A∗ yi = hx, (ψ1−1 ◦ A′ ◦ ψ2 )(y)i = ψ1 (ψ1−1 ◦ A′ ◦ ψ2 )(y) (x)




= A′ (ψ2 (y))(x) = ψ2 (y)(Ax) = hAx, yi,

proving (10.14). For each y ∈ X_2 and each x ∈ X_1, we have hAx, yi = (ψ_2(y))(Ax) = ((ψ_2(y)) ∘ A)(x). Then Th. 10.17 and (10.14) imply A*(y) = ψ_1⁻¹((ψ_2(y)) ∘ A), showing A* to be uniquely determined by (10.14).
(b): According to (a), A∗∗ is the unique map X1 −→ X2 such that

∀ ∀ hA∗ y, xi = hy, A∗∗ xi.


x∈X1 y∈X2

Comparing with (10.14) yields A = A∗∗ .


(c): If A, B ∈ L(X1 , X2 ) and λ ∈ K, then, for each y ∈ X2 ,

(A + B)∗ (y) = (ψ1−1 ◦ (A + B)′ ◦ ψ2 )(y) = (ψ1−1 ◦ (A′ + B ′ ) ◦ ψ2 )(y)


= (ψ1−1 ◦ A′ ◦ ψ2 )(y) + (ψ1−1 ◦ B ′ ◦ ψ2 )(y) = (A∗ + B ∗ )(y)

and
(λA)∗ (y) = (ψ1−1 ◦ (λA)′ ◦ ψ2 )(y) = λ(ψ1−1 ◦ A′ ◦ ψ2 )(y) = (λA∗ )(y),
showing A 7→ A∗ to be conjugate-linear. Moreover, A 7→ A∗ is bijective due to (b).
(d): One has (IdX1 )∗ = ψ1−1 ◦ (IdX1 )′ ◦ ψ1 = ψ1−1 ◦ IdX1′ ◦ψ1 = IdX1 .
(e): Let ψ3 : X3 −→ X3′ be given by Th. 10.17. Then

A∗ ◦ B ∗ = ψ1−1 ◦ A′ ◦ ψ2 ◦ ψ2−1 ◦ B ′ ◦ ψ3 = ψ1−1 ◦ (B ◦ A)′ ◦ ψ3 = (B ◦ A)∗ .

(f): We have

y ∈ ker(A∗ ) ⇔ ∀ hx, A∗ yi = 0 ⇔ ∀ hAx, yi = 0 ⇔ y ∈ (Im A)⊥ .


x∈X1 x∈X1

Applying the first part with A replaced by A∗ yields

ker A = ker A∗∗ = (Im(A∗ ))⊥



and, thus, using Th. 10.16(d)(iii),


⊥
Im(A∗ ) = (Im(A∗ ))⊥ = (ker A)⊥ .

(g): According to (f), we have


ker A = {0} ⇔ Im(A∗ ) = (ker A)⊥ = {0}⊥ = X1 .

(h): According to (f), we have


ker(A∗ ) = {0} ⇔ Im A = (ker(A∗ ))⊥ = {0}⊥ = X2 .

(i): One has


A−1 ∈ L(X2 , X1 ) exists ⇔ (A−1 )′ = (A′ )−1 ∈ L(X1′ , X2′ ) exists
⇔ (A∗ )−1 = (ψ1−1 ◦ A′ ◦ ψ2 )−1 ∈ L(X1 , X2 ) exists.
Moreover, if A−1 ∈ L(X2 , X1 ) exists, then
(A∗ )−1 = ψ2−1 ◦ (A−1 )′ ◦ ψ1 = (A−1 )∗ ,
completing the proof.
(j): If A is isometric, then
∀ ∀ hAx, yi = hAx, AA−1 yi = hx, A−1 yi,
x∈X1 y∈X2

such that (a) implies A∗ = A−1 . Conversely, if A∗ = A−1 , then


∀ hAu, Avi = hu, A∗ Avi = hu, vi,
u,v∈X1

proving A to be isometric. 

For extensions of Def. 10.21 and Th. 10.22 to infinite-dimensional Hilbert spaces, see
[Phi17b, Def. 4.34], [Phi17b, Cor. 4.35].
Definition 10.23. Let m, n ∈ N and let M := (m_kl) ∈ M(m, n, K) be an m × n matrix over K. We call

    M̄ := (m̄_kl) ∈ M(m, n, K)

the complex conjugate matrix of M and

    M* := (M̄)ᵗ = \overline{(Mᵗ)} ∈ M(n, m, K)

the adjoint matrix of M (thus, for K = R, the adjoint matrix is the same as the transpose matrix).

 
Theorem 10.24. Let X, h·, ·i , Y, h·, ·i be finite-dimensional inner product spaces
over K. Let A ∈ L(X, Y ).

(a) Let BX := (x1 , . . . , xn ) and BY := (y1 , . . . , ym ) be ordered orthonormal bases of X


and Y , respectively. If M = (mkl ) ∈ M(m, n, K) is the matrix of A with respect to
BX and BY , then the adjoint matrix M ∗ represents the adjoint map A∗ ∈ L(Y, X)
with respect to BY and BX .
Pn
(b) Let X = Y , A ∈ L(X, X). Then det(A∗ )P= det A. If χA = k
k=0 ak X is the
n k
characteristic polynomial of A, then χA∗ = k=0 ak X and
σ(A∗ ) = λ : λ ∈ σ(A) .


Proof. (a): Suppose

    ∀_{l∈{1,...,n}}  A x_l = Σ_{k=1}^m m_kl y_k   ∧   ∀_{k∈{1,...,m}}  A* y_k = Σ_{l=1}^n n_kl x_l

with n_kl ∈ K. Then, for each (k, l) ∈ {1, ..., m} × {1, ..., n},

    m_kl = hA x_l, y_k i = hx_l, A* y_k i = n̄_kl,

showing M* to be the matrix of A* with respect to B_Y and B_X.
(b): For M as in (a), one has

    det(A*) = det(M*) = det((M̄)ᵗ) = det(M̄)  =(∗)=  \overline{det M} = \overline{det A},

where (∗) holds, as the forming of complex conjugates commutes with the forming of sums and products of complex numbers. Next, using this fact again together with the linearity of forming the transpose of a matrix, we compute

    χ_{A*} = det(X Id_n − M*) = det((X Id_n − M̄)ᵗ) = det(X Id_n − M̄) = Σ_{k=0}^n ā_k X^k.

Thus,

    λ ∈ σ(A*)  ⇔  ε_λ(χ_{A*}) = Σ_{k=0}^n ā_k λ^k = 0
               ⇔  ε_{λ̄}(χ_A) = Σ_{k=0}^n a_k λ̄^k = 0  ⇔  λ̄ ∈ σ(A),

thereby proving (b). □
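Th. 10.24(a) identifies the adjoint map with the conjugate-transpose matrix; numerically, (10.14) can then be verified directly. The following sketch is my addition (assuming numpy and the standard inner product (10.1)).

    import numpy as np

    rng = np.random.default_rng(2)
    A = rng.normal(size=(3, 2)) + 1j * rng.normal(size=(3, 2))   # A : C^2 -> C^3
    x = rng.normal(size=2) + 1j * rng.normal(size=2)
    y = rng.normal(size=3) + 1j * rng.normal(size=3)

    inner = lambda z, w: np.sum(z * np.conj(w))   # convention (10.1)
    A_star = A.conj().T                           # adjoint matrix M* = conj(M^t)

    print(np.isclose(inner(A @ x, y), inner(x, A_star @ y)))   # (10.14) holds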




Definition 10.25. Let X, h·, ·i be an inner product space over K and let U be a
subspace of X such that X = U ⊥ U ⊥ . Then the linear projection PU : X −→ U ,
PU (u + v) := u for u + v ∈ X with u ∈ U and v ∈ U ⊥ is called the orthogonal projection
from X onto U .

Theorem 10.26. Let X, h·, ·i be an inner product space over K, let U be a subspace
of X such that X = U ⊥ U ⊥ , and let PU : X −→ U be the orthogonal projection onto
U . Moreover, let k · k denote the induced norm on X.

(a) One has

    ∀_{x∈X} ∀_{u∈U}  ( u ≠ P_U(x) ⇒ kP_U(x) − xk < ku − xk ),

i.e. P_U(x) is the strict minimum of the function u ↦ ku − xk from U into R_0^+, and P_U(x) can be viewed as the best approximation to x in U.

(b) If B_U := {u_1, ..., u_n} is an orthonormal basis of U, dim U = n ∈ N, then

    ∀_{x∈X}  P_U(x) = Σ_{i=1}^n hx, u_i i u_i.

Proof. (a): Let x ∈ X and u ∈ U with u 6= PU (x). Then PU (x)−u ∈ U and x−PU (x) ∈
ker PU = U ⊥ . Thus,
Prop. 10.12(b)
ku − xk2 = kPU (x) − xk2 + ku − PU (x)k2 > kPU (x) − xk2 ,

thereby proving (a).


(b): Let x ∈ X. As BU is a basis of U , there exist λ1 , . . . , λn ∈ K such that
n
X
PU (x) = λ i ui .
i=1

Recalling x − PU (x) ∈ ker PU = U ⊥ , we obtain

∀ hx, ui i − hPU (x), ui i = hx − PU (x), ui i = 0


i∈{1,...,n}

and, thus,
∀ hx, ui i = hPU (x), ui i = λi ,
i∈{1,...,n}

thereby proving (b). 
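A small numerical illustration of Th. 10.26 (my addition, assuming numpy; here U is spanned by two standard basis vectors, which form an orthonormal basis of U).

    import numpy as np

    rng = np.random.default_rng(3)
    u1 = np.array([1.0, 0.0, 0.0])
    u2 = np.array([0.0, 1.0, 0.0])                   # orthonormal basis of U
    x = rng.normal(size=3)

    P_x = sum(np.dot(x, u) * u for u in (u1, u2))    # Th. 10.26(b), real case

    # any other u in U is a strictly worse approximation, Th. 10.26(a):
    u = P_x + 0.1 * u1
    print(np.linalg.norm(P_x - x) < np.linalg.norm(u - x))   # True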




Proposition 10.27. Let X, h·, ·i be an inner product space over K with induced norm
k · k. Let P ∈ L(X, X) be a projection. Then P is an orthogonal projection onto
U := Im P (i.e. X = Im P ⊥ ker P ) if, and only if P = P ∗ .

Proof. First, assume X = Im P ⊥ ker P . Then, for each x = ux + vx and y = uy + vy


with ux , uy ∈ Im P and vx , vy ∈ ker P , we have

hP x, yi = hux , uy i + hux , vy i = hux , uy i + 0 = hux , uy i + hvx , uy i = hx, P yi,

proving P = P ∗ . Conversely, assume P = P ∗ . Then, for each x ∈ X,


(∗)
kP xk2 = hP x, P xi = hP 2 x, xi = hP x, xi ≤ kP xk kxk,

where (∗) holds due to the Cauchy-Schwarz inequality [Phi16b, (1.41)]. In consequence,

∀ kP xk ≤ kxk.
x∈X

Now let u ∈ Im P and 0 ≠ v ∈ ker P. Define

    y := u − (hu, vi / kvk²) v.

Then hy, vi = 0 and P y = u, implying

    kyk² ≥ kP yk² = kuk² = k y + (hu, vi / kvk²) v k²  =(Prop. 10.12(b))=  kyk² + |hu, vi|² / kvk².

As this yields hu, vi = 0, we have X = Im P ⊥ ker P, as desired. □

Example 10.28. Let n ∈ N_0 and define

    T_n := { f : [−π, π] −→ C :  f(t) = Σ_{k=−n}^n γ_k e^{ikt}  ∧  γ_{−n}, ..., γ_n ∈ C }

(due to Euler's formula, relating the exponential function to sine and cosine, the elements of T_n are known as trigonometric polynomials). Clearly, U := T_n is a subspace of the space X := C[−π, π] of Ex. 10.5(b). As it turns out, we even have X = U ⊥ U⊥ (we will not prove this here, but it follows from [Phi17b, Th. 4.20(e)], since the finite-dimensional subspace U is automatically a closed subspace, cf. [Phi17b, Th. 1.16(b)]). Thus, one has an orthogonal projection P_U from X onto U, yielding the best approximation of a continuous function by a trigonometric polynomial. Moreover, if one has an orthonormal basis of U, then one can use Th. 10.26(b) to compute P_U(x) for each function x ∈ X. We verify that an orthonormal basis is given by the functions

    u_k : [−π, π] −→ C,   u_k(t) := e^{ikt} / √(2π)   (k ∈ {−n, ..., n}):

One computes, for each k, l ∈ {−n, ..., n},

    hu_k, u_l i = ∫_{−π}^π u_k ū_l = (1/(2π)) ∫_{−π}^π e^{i(k−l)t} dt
                = { 1                                                 for k = l,
                    (1/(2π)) [ e^{i(k−l)t} / (i(k−l)) ]_{−π}^π = 0    for k ≠ l.

Thus, for each x ∈ X, the orthogonal projection is

    P_U(x) = (1/√(2π)) Σ_{k=−n}^n u_k ∫_{−π}^π x(t) e^{−ikt} dt .

For example, let x(t) = t and n = 1. Then, since

    ∫_{−π}^π t e^{it} dt = [ t e^{it} / i ]_{−π}^π − ∫_{−π}^π (e^{it} / i) dt = 2πi − 0 = 2πi,
    ∫_{−π}^π t dt = 0,
    ∫_{−π}^π t e^{−it} dt = −2πi,

we obtain

    P_U(x)(t) = (1/(2π)) ( 2πi e^{−it} + 0 − 2πi e^{it} ) = i (e^{−it} − e^{it}) = 2 sin t.
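The result P_U(x) = 2 sin t can be confirmed numerically; the following sketch is my addition (assuming numpy; the integrals are approximated by the trapezoidal rule).

    import numpy as np

    t = np.linspace(-np.pi, np.pi, 4001)
    x = t                                    # the function x(t) = t

    P = np.zeros_like(t, dtype=complex)
    for k in (-1, 0, 1):                     # n = 1
        ck = np.trapz(x * np.exp(-1j * k * t), t) / (2 * np.pi)   # (1/2pi) int x e^{-ikt} dt
        P += ck * np.exp(1j * k * t)

    print(np.max(np.abs(P.real - 2 * np.sin(t))) < 1e-3)   # True: P_U(x) = 2 sin t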

10.5 Hermitian, Unitary, and Normal Maps



Definition 10.29. Let X, h·, ·i be a finite-dimensional inner product space over K
and A ∈ L(X, X). Moreover, let M ∈ M(n, K), n ∈ N.

(a) We call A normal if, and only if, AA∗ = A∗ A; likewise, we call M normal if, and
only if, M M ∗ = M ∗ M . We define

Nor(X) := A ∈ L(X, X) : A is normal ,

Norn (K) := M ∈ M(n, K) : M is normal .

(b) We call A Hermitian or self-adjoint if, and only if, A = A∗ ; likewise, we call M
Hermitian or self-adjoint if, and only if, M = M ∗ (thus, for K = R, Hermitian is

the same as symmetric, a notion we previously defined in [Phi19, Def. 7.25(c)] for
quadratic matrices M with M = M t and that is now extended to Hermitian maps A
for K = R). We call A skew-Hermitian if, and only if, −A = A∗ ; likewise, we call M
skew-Hermitian if, and only if, −M = M ∗ (thus, for K = R, skew-Hermitian is the
same as skew-symmetric, a notion we previously defined in [Phi19, Def. 7.30(a)] for
quadratic matrices M with −M = M t and that is now extended to skew-Hermitian
maps A for K = R). Moreover, we call A_Her := (1/2)(A + A*) the Hermitian part of A, M_Her := (1/2)(M + M*) the Hermitian part of M, A_skHer := (1/2)(A − A*) the skew-Hermitian part of A, and M_skHer := (1/2)(M − M*) the skew-Hermitian part of M
(thus, for K = R, Asym := AHer and Msym := MHer are the same as the symmetric
parts of A and M , respectively; Askew := AskHer and Mskew := MskHer are the same
as the skew-symmetric parts of A and M , respectively; notions previously defined
in [Phi19, Def. 7.30(b)] for quadratic matrices M , now extended to maps A for
K = R). We define

Herm(X) := A ∈ L(X, X) : A is Hermitian ,

Hermn (K) := M ∈ M(n, K) : M is Hermitian ,

skHerm(X) := A ∈ L(X, X) : A is skew-Hermitian ,

skHermn (K) := M ∈ M(n, K) : M is skew-Hermitian ,

and, for K = R,

Sym(X) := A ∈ L(X, X) : A is symmetric ,

Symn (R) := M ∈ M(n, R) : M is symmetric ,

Skew(X) := A ∈ L(X, X) : A is skew-symmetric ,

Skewn (R) := M ∈ M(n, R) : M is skew-symmetric .

(c) We call A ∈ GL(X) unitary if, and only if, A−1 = A∗ ; likewise, we call M ∈ GLn (K)
unitary if, and only if, M −1 = M ∗ . If K = R, then we also call unitary maps and
matrices orthogonal. We define

U(X) := A ∈ L(X, X) : A is unitary ,

Un (K) := M ∈ M(n, K) : M is unitary ,

and, for K = R,

O(X) := A ∈ L(X, X) : A is orthogonal ,

On (R) := M ∈ M(n, R) : M is orthogonal .


Proposition 10.30. Let (X, h·, ·i) be a finite-dimensional inner product space over K, n ∈ N. We have Herm(X) ⊆ Nor(X), Herm_n(K) ⊆ Nor_n(K), U(X) ⊆ Nor(X), and U_n(K) ⊆ Nor_n(K).

Proof. That Hermitian implies normal is immediate. If A ∈ U(X), then AA∗ = A∗ A =


Id, i.e. A ∈ Nor(X). 

Proposition 10.31. Let X, h·, ·i be a finite-dimensional inner product space over K,
n ∈ N.

(a) U(X) is a subgroup of GL(X); Un (K) is a subgroup of GLn (K).

(b) If A ∈ U(X) and M ∈ Un (K), then | det A| = | det M | = 1 (specializing to K = R


then yields that, for A, M orthogonal, one has det A, det M ∈ {−1, 1}).

(c) Let A, B ∈ Herm(X). Then A+B ∈ Herm(X) (if λ ∈ R, then also λA ∈ Herm(X),
showing Herm(X) to be a vector subspace of L(X, X) for K = R). If A ∈ GL(X),
then A−1 ∈ Herm(X). If AB = BA, then AB ∈ Herm(X). The analogous results
also hold for Hermitian matrices.

Proof. (a): If A, B ∈ U(X), then (AB)−1 = B −1 A−1 = B ∗ A∗ = (AB)∗ , showing


AB ∈ U(X). Also A = (A−1 )−1 = (A∗ )∗ , showing A−1 ∈ U(X) and establishing the
case.
(b): It suffices to consider A ∈ U(X). Then A A* = Id, implying, by Th. 10.24(b),

    |det A|² = det A · \overline{det A} = det A · det(A*) = det(A A*) = det Id = 1.

(c): If A, B ∈ Herm(X), then (A + B)∗ = A∗ + B ∗ = A + B, showing A + B ∈ Herm(X).


If λ ∈ R, then (λA)∗ = λA∗ = λA, showing λA ∈ Herm(X). If A ∈ Herm(X) ∩ GL(X),
then (A−1 )∗ = (A∗ )−1 = A−1 , proving A−1 ∈ Herm(X) ∩ GL(X). If AB = BA, then
(AB)∗ = B ∗ A∗ = BA = AB, showing AB ∈ Herm(X). 

In general, for normal maps and normal matrices, neither the sum nor the product is
normal. However, one can show that, if A, B are normal with AB = BA, then A + B
and AB are normal – this makes use of the diagonalizability of normal maps over C and
is not quite as easy as one might think, cf. Prop. 10.43 below.

Proposition 10.32. Let X, h·, ·i be a finite-dimensional inner product space over K,
n ∈ N.

(a) Let A, B ∈ skHerm(X). Then A + B ∈ skHerm(X) (if λ ∈ R, then λA ∈


skHerm(X), showing skHerm(X) to be a vector subspace of L(X, X) for K = R).
If A ∈ GL(X), then A⁻¹ ∈ skHerm(X)⁸. The analogous results also hold for skew-Hermitian matrices.

(b) Herm(X) ∩ skHerm(X) = {0} and Hermn (K) ∩ skHermn (K) = {0}.

Proof. (a): If A, B ∈ skHerm(X), then (A + B)∗ = A∗ + B ∗ = −A − B = −(A + B),


showing A + B ∈ skHerm(X). If λ ∈ R, then (λA)∗ = λA∗ = −λA, showing λA ∈
skHerm(X). If A ∈ skHerm(X) ∩ GL(X), then (A−1 )∗ = (A∗ )−1 = (−A)−1 = −A−1 ,
proving A−1 ∈ skHerm(X) ∩ GL(X).
(b): Let A ∈ Herm(X) ∩ skHerm(X). Then −A = A∗ = A, i.e. 2A = 0 and A = 0. 

Proposition 10.33. Let X, h·, ·i be a finite-dimensional inner product space over K,
A ∈ L(X, X).

(a) The Hermitian part A_Her = (1/2)(A + A*) of A is Hermitian; the skew-Hermitian part A_skHer = (1/2)(A − A*) of A is skew-Hermitian.

(b) A can be uniquely decomposed into its Hermitian and skew-Hermitian parts, i.e.
A = AHer + AskHer and, if A = S + B with S ∈ Herm(X) and B ∈ skHerm(X), then
S = AHer and B = AskHer .

(c) A is Hermitian if, and only if, A = AHer ; A is skew-Hermitian if, and only if,
A = AskHer .

The analogous results also hold for matrices, M ∈ M(n, K), n ∈ N.

Proof. (a): A_Her is Hermitian due to

    (A_Her)* = ((1/2)(A + A*))* = (1/2)(A* + A) = A_Her.

A_skHer is skew-Hermitian due to

    (A_skHer)* = ((1/2)(A − A*))* = (1/2)(A* − A) = −A_skHer.
⁸ Note that, for K = R, A ∈ skHerm(X) ∩ GL(X) can only exist if 1 ≤ d := dim X is even: If A ∈ skHerm(X), then \overline{det A} = det(A*) = det(−A) = (−1)^d det A, where we have used Th. 10.24(b) and Cor. 4.37(c). Thus, for d odd, Re(det A) = 0 (implying A ∉ GL(X) for K = R) and, for d even, Im(det A) = 0 (i.e. det A ∈ R).

(b): A_Her + A_skHer = (1/2)A + (1/2)A* + (1/2)A − (1/2)A* = A is immediate. If A = S + B with S ∈ Herm(X) and B ∈ skHerm(X), then

    A_Her + A_skHer = A = S + B   ⇒   A_Her − S = B − A_skHer,

where AHer − S ∈ Herm(X) by Prop. 10.31(c) and B − AskHer ∈ skHerm(X) by Prop.


10.32(a), showing AHer = S and AskHer = B by Prop. 10.32(b).
(c): If A = A_Her, then A is Hermitian by (a); if A = A_skHer, then A is skew-Hermitian by (a). If A is Hermitian, then A_skHer = A − A_Her ∈ Herm(X) ∩ skHerm(X) = {0}, showing
A = AHer . If A is skew-Hermitian, then AHer = A − AskHer ∈ Herm(X) ∩ skHerm(X) =
{0}, showing A = AskHer . 
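The decomposition of Prop. 10.33 is easily computed for matrices; a short sketch (my addition, assuming numpy):

    import numpy as np

    rng = np.random.default_rng(4)
    M = rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3))

    M_her = (M + M.conj().T) / 2
    M_skher = (M - M.conj().T) / 2

    print(np.allclose(M_her.conj().T, M_her))        # Hermitian part is Hermitian
    print(np.allclose(M_skher.conj().T, -M_skher))   # skew-Hermitian part
    print(np.allclose(M_her + M_skher, M))           # decomposition of Prop. 10.33(b)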

Proposition 10.34. Let X, h·, ·i be an inner product space over C, dim X = n ∈ N,
with ordered orthonormal basis B := (x1 , . . . , xn ). Let k · k denote the induced norm.
Moreover, let H ∈ L(X, X), and let M = (mkl ) ∈ M(n, C), M ∗ = (m∗kl ) ∈ M(n, C) be
the respective matrices of H and H ∗ with respect to B. Then the following statements
are equivalent:

(i) H and M are Hermitian.

(ii) hx, Hxi ∈ R for each x ∈ X.

(iii) x∗ M x ∈ R for each x ∈ Cn .

Caveat: This result does not extend to finite-dimensional inner product spaces over R:
Over R, (ii) and (iii) hold for every H ∈ L(X, X), M ∈ M(n, C), even though, for
n ≥ 2, not every such H, M is symmetric (cf. Ex. 10.20).

Proof. "(i)⇒(ii)": If H is Hermitian and x ∈ X, then, using Def. 10.1(iii) and (10.14),

    hx, Hxi = \overline{hHx, xi} = \overline{hx, H*xi} = \overline{hx, Hxi},

showing hx, Hxi ∈ R.

"(ii)⇒(i)": For each x ∈ X, we have hH*x, xi = \overline{hx, H*xi} = \overline{hHx, xi} = hx, Hxi by (10.14) and Def. 10.1(iii), and, thus,

    h(H* − H)x, xi = hH*x, xi − hHx, xi = hx, Hxi − \overline{hx, Hxi}  =(ii)=  0.

Thus, by Prop. 10.19(b), we have H* − H = 0 and H* = H, proving (i).


Now consider X := Cn with the standard inner product h·, ·i and let H ∈ L(X, X) such
that M ∈ M(n, C) represents H with respect to the standard basis. Then, for each
x ∈ Cn , x∗ M x = hM x, xi = hx, M xi, showing the equivalence between (ii) and (iii). 

The following Prop. 10.35 is related to Prop. 10.19, highlighting important differences
between the complex and the real situation.

Proposition 10.35. Let X, h·, ·i be a finite-dimensional inner product space over R,
n ∈ N.

(a) If M = (mkl ) ∈ Symn (R) is such that

∀ x∗ M x = 0, (10.15)
x∈Rn

then M = 0.

(b) If A ∈ Sym(X) is such that


∀ hAx, xi = 0, (10.16)
x∈X

then A = 0.

(c) M ∈ M(n, R) is skew-symmetric if, and only if, (10.15) holds true.

(d) A ∈ L(X, X) is skew-symmetric if, and only if, (10.16) holds true.

Proof. (a): As in the proof of Prop. 10.19, we obtain from matrix multiplication
n X
X n
xt M x = mkl xk xl ,
k=1 l=1

where choosing x := eα , α ∈ {1, . . . , n}, yields 0 = etα M eα = mαα ; whereas choosing


x := eα + eβ for α 6= β with α, β ∈ {1, . . . , n} yields 0 = mαβ + mβα = 2mαβ , showing
mαβ = mβα = 0 and M = 0.
(b): Let n := dim X ∈ N. According to Th. 10.16(c), there exists a linear isomorphism
I : X −→ Rn such that
∀ hx, yi = hIx, Iyi2 ,
x,y∈X

where h·, ·i2 denotes the standard inner product on Rn . Moreover, we then also know
I t = I −1 from Th. 10.22(j). Thus, if we let B := I ◦ A ◦ I −1 , then B ∈ Sym(Rn ), since
B t = (I −1 )t ◦ At ◦ I t = B. Moreover,

    ∀_{u∈R^n}  hBu, ui₂ = h(I ∘ A ∘ I⁻¹)u, (I ∘ I⁻¹)ui₂ = hA(I⁻¹u), I⁻¹ui = 0.

Now, if M ∈ M(n, R) represents B with respect to the standard basis of Rn , then M


is symmetric and, for each u ∈ Rn , ut M u = hM u, ui2 = hBu, ui2 = 0, such that M = 0
by (a). Thus, B = 0, also implying A = I −1 ◦ B ◦ I = 0.

(d): If A is skew-symmetric, then, for each x ∈ X,

hAx, xi = hx, At xi = −hx, Axi = −hAx, xi,

proving 2hAx, xi = 0 and (10.16). Conversely, if (10.16), then, since AskHer is skew-
symmetric,

∀ 0 = hAx, xi = h(AHer + AskHer )x, xi = hAHer x, xi + hAskHer x, xi = hAHer x, xi.


x∈X

As AHer is symmetric, (b) implies AHer = 0, i.e. A = AskHer is skew-symmetric.


(c): If M is skew-symmetric, then, with respect to the standard basis of Rn (which
is orthonormal with respect to the standard inner product on Rn ), M represents a
skew-symmetric map A ∈ L(Rn , Rn ). As A satisfies (10.16) by (d), M satisfies (10.15).
Conversely, if M satisfies (10.15) and A is as before, then A satisfies (10.16) and is, thus,
skew-symmetric by (d), implying M to be skew-symmetric as well. 

Proposition 10.36. Let X, h·, ·i be an inner product space over K, dim X = n ∈ N,
with ordered orthonormal basis B := (x1 , . . . , xn ). Let k · k denote the induced norm.
Moreover, let U ∈ L(X, X), and let M = (mkl ) ∈ M(n, K), M ∗ = (m∗kl ) ∈ M(n, K) be
the respective matrices of U and U ∗ with respect to B. Then the following statements
are equivalent:

(i) U and M are unitary.

(ii) U ∗ and M ∗ are unitary.

(iii) The columns of M form an orthonormal basis of Kn with respect to the standard
inner product on Kn .

(iv) The rows of M form an orthonormal basis of Kn with respect to the standard inner
product on Kn .

(v) M t is unitary.

(vi) M is unitary.

(vii) hU x, U yi = hx, yi holds for each x, y ∈ X.

(viii) kU xk = kxk for each x ∈ X.

Proof. “(i)⇔(ii)”: U −1 = U ∗ is equivalent to Id = U U ∗ , which is equivalent to (U ∗ )−1 =


U = (U ∗ )∗ .

"(i)⇔(iii)": M⁻¹ = M* implies, for the columns of M,

    (m_{1k}, ..., m_{nk}) · \overline{(m_{1j}, ..., m_{nj})} = Σ_{l=1}^n m_{lk} m̄_{lj} = Σ_{l=1}^n m*_{jl} m_{lk} = { 0 for k ≠ j,  1 for k = j },    (10.17)

showing that the columns of M form an orthonormal basis of Kn with respect to the
standard inner product on Kn . Conversely, if the columns of M form an orthonormal
basis of Kn with respect to the standard inner product, then they satisfy (10.17), which
implies M ∗ M = Id.
"(i)⇔(iv)": M⁻¹ = M* implies, for the rows of M,

    (m_{k1}, ..., m_{kn}) · \overline{(m_{j1}, ..., m_{jn})} = Σ_{l=1}^n m_{kl} m̄_{jl} = Σ_{l=1}^n m_{kl} m*_{lj} = { 0 for k ≠ j,  1 for k = j },    (10.18)

showing that the rows of M form an orthonormal basis of Kn with respect to the standard
inner product on Kn . Conversely, if the rows of M form an orthonormal basis of Kn
with respect to the standard inner product, then they satisfy (10.18), which implies
M ∗ M = Id.
“(i)⇔(v)”: Since the rows of M are the columns of M t , the equivalence of (i) and (v)
is immediate from (iii) and (iv).
“(i)⇔(vi)”: Since M = (M ∗ )t , the equivalence of (i) and (vi) is immediate from (ii) and
(v).
"(i)⇔(vii)" is a special case of what was shown in Th. 10.22(j).
“(vii)⇔(viii)” holds due to Th. 10.8(e). 
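Prop. 10.36(iii),(iv) can be observed on any unitary matrix, e.g. the Q factor of a QR decomposition; the sketch below is my addition (assuming numpy).

    import numpy as np

    rng = np.random.default_rng(5)
    M = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
    Q, R = np.linalg.qr(M)

    print(np.allclose(Q.conj().T @ Q, np.eye(4)))   # columns orthonormal: Q* Q = Id
    print(np.allclose(Q @ Q.conj().T, np.eye(4)))   # rows orthonormal:    Q Q* = Id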

Corollary 10.37. Let (X, ⟨·, ·⟩) be a finite-dimensional inner product space over R and
let f : X −→ X. Then f is an isometry (e.g. a Euclidean isometry of Rn) if, and only if,
f = L + a with an orthogonal map L ∈ O(X) and a ∈ X.

Proof. We know each translation Tᵥ : X −→ X, Tᵥ(x) := x + v, to be an isometry by
Ex. 10.9(a) and, clearly, compositions of isometries are isometries.
“⇐”: If f = L + a with L ∈ O(X) and a ∈ X, then f = Ta ◦ L, where L is an isometric
linear isomorphism by Th. 10.36(vii) (cf. Def. 10.15). Thus, f must be isometric as well.
“⇒”: If f is an isometry, then f is affine by Th. 10.8(d), i.e. f = L+a with L ∈ L(X, X)
and a ∈ X. Then L = f − a = T−a ◦ f , showing L to be isometric. Thus, L must be
orthogonal by Th. 10.36. 

In preparation for results on the diagonalizability of normal, Hermitian, and unitary


maps, we prove the following proposition:

Proposition 10.38. Let X, h·, ·i be an inner product space over K, dim X = n ∈ N,
and A ∈ Nor(X).

(a) If 0 ≠ x ∈ X is an eigenvector to the eigenvalue λ ∈ K of A, then x is also an
    eigenvector to the eigenvalue λ̄ of A∗.
(b) If U is an A-invariant subspace of X, then A(U ⊥ ) ⊆ U ⊥ and A∗ (U ) ⊆ U .

Proof. (a): It suffices to show ⟨A∗x − λ̄x, A∗x − λ̄x⟩ = 0. To this end, we use Ax = λx
to compute

    ⟨A∗x − λ̄x, A∗x − λ̄x⟩ = ⟨A∗x, A∗x⟩ − λ̄⟨x, A∗x⟩ − λ⟨A∗x, x⟩ + λ̄λ⟨x, x⟩
                          = ⟨AA∗x, x⟩ − λ̄⟨Ax, x⟩ − λ⟨x, Ax⟩ + λ̄λ⟨x, x⟩
                          = ⟨A∗Ax, x⟩ − λ̄λ⟨x, x⟩ − λλ̄⟨x, x⟩ + λ̄λ⟨x, x⟩
                          = ⟨Ax, Ax⟩ − λλ̄⟨x, x⟩ = 0,

thereby establishing the case.
(b): Let dim U = m ∈ N and let B_U := {u_1, . . . , u_m} be an orthonormal basis of U. As
U is A-invariant, there exist a_kl ∈ K such that

    ∀ l ∈ {1, . . . , m} :   Au_l = ∑_{k=1}^m a_kl u_k .

Define
    ∀ l ∈ {1, . . . , m} :   x_l := A∗u_l − ∑_{k=1}^m ā_lk u_k .

To show the A∗-invariance of U, it suffices to show that, for each l ∈ {1, . . . , m}, x_l = 0,
i.e. ⟨x_l, x_l⟩ = 0. To this end, for each l ∈ {1, . . . , m}, we compute

    ⟨x_l, x_l⟩ = ⟨A∗u_l, A∗u_l⟩ − ∑_{k=1}^m a_lk ⟨A∗u_l, u_k⟩ − ∑_{k=1}^m ā_lk ⟨u_k, A∗u_l⟩ + ∑_{j=1}^m ∑_{k=1}^m a_lj ā_lk ⟨u_k, u_j⟩
              = ⟨A∗Au_l, u_l⟩ − ∑_{k=1}^m a_lk ⟨u_l, Au_k⟩ − ∑_{k=1}^m ā_lk ⟨Au_k, u_l⟩ + ∑_{k=1}^m |a_lk|²
              = ∑_{k=1}^m |a_kl|² − ∑_{k=1}^m ∑_{j=1}^m a_lk ā_jk ⟨u_l, u_j⟩ − ∑_{k=1}^m ∑_{j=1}^m ā_lk a_jk ⟨u_j, u_l⟩ + ∑_{k=1}^m |a_lk|²
              = ∑_{k=1}^m |a_kl|² − ∑_{k=1}^m |a_lk|² − ∑_{k=1}^m |a_lk|² + ∑_{k=1}^m |a_lk|² = ∑_{k=1}^m |a_kl|² − ∑_{k=1}^m |a_lk|² ,

implying
    ∑_{l=1}^m ⟨x_l, x_l⟩ = ∑_{l=1}^m ∑_{k=1}^m |a_kl|² − ∑_{l=1}^m ∑_{k=1}^m |a_lk|² = 0.

As ⟨x_l, x_l⟩ ≥ 0 for each l ∈ {1, . . . , m}, this implies the desired ⟨x_l, x_l⟩ = 0 for each
l ∈ {1, . . . , m}, thereby proving A∗(U) ⊆ U. We will now make use of this result to also
show A(U⊥) ⊆ U⊥: Let u ∈ U and x ∈ U⊥. Then, as A∗u ∈ U,

    ⟨u, Ax⟩ = ⟨A∗u, x⟩ = 0,

proving Ax ∈ U⊥. 
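Prop. 10.38(a) can be illustrated numerically: the sketch below (assuming NumPy,
with a concrete real normal matrix; an illustration, not a proof) checks that an
eigenvector of A for λ is an eigenvector of A∗ for λ̄:

    # Numerical sketch for Prop. 10.38(a), assuming NumPy.
    import numpy as np

    N = np.array([[1.0, -2.0],
                  [2.0,  1.0]])            # N N* = N* N, so N is normal
    assert np.allclose(N @ N.conj().T, N.conj().T @ N)

    lam, V = np.linalg.eig(N)              # eigenvalues 1 ± 2i
    x = V[:, 0]
    # x is an eigenvector of N* for the conjugate eigenvalue:
    assert np.allclose(N.conj().T @ x, np.conj(lam[0]) * x)
    print("Prop. 10.38(a) check passed")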

Theorem 10.39. Let (X, ⟨·, ·⟩) be an inner product space over C, dim X = n ∈ N.

(a) The following statements are equivalent for A ∈ L(X, X):

(i) A ∈ Nor(X).
(ii) There exists an orthonormal basis B of X, consisting of eigenvectors of A.
(iii) There exists f ∈ C[X], deg f ≤ n − 1, such that A∗ = ǫA (f ), where, as before,
ǫA : C[X] −→ L(X, X) denotes the substitution homomorphism introduced in
Def. and Rem. 7.10.

In particular, each A ∈ Nor(X) is diagonalizable.

(b) The following statements are equivalent for M ∈ M(n, C):

(i) M ∈ Norn (C).


(ii) There exists a unitary matrix U ∈ Un (C) such that D = U −1 M U is a diagonal
matrix.

Proof. (a): “(i)⇒(ii)”: We prove the existence of the orthonormal basis of eigenvectors
via induction on n ∈ N. For n = 1, there is nothing to prove. Thus, let n > 1. As
C is algebraically closed, there exists λ ∈ σ(A). Let 0 6= v ∈ X be a corresponding
eigenvector and U := span{v}. According to Prop. 10.38(b), both U and U⊥ are A-
invariant. Moreover, A ↾U ⊥ is normal, since, if A and A∗ commute on X, they also
commute on the subspace U ⊥ . Thus, by induction hypothesis, U ⊥ has an orthonormal
basis B′, consisting of eigenvectors of A. Then B′ ∪ {v/‖v‖} is an orthonormal basis
of X, consisting of eigenvectors of A.
“(ii)⇒(iii)”: Let {v1 , . . . , vn } be an orthonormal basis of X, consisting of eigenvectors of
A, where Avj = λj vj , i.e. λ1 , . . . , λn ∈ C are the corresponding eigenvalues of A. Using
Lagrange interpolation according to [Phi21, Th. 3.4], let f := ∑_{k=0}^{n−1} f_k X^k ∈ C[X]
with f₀, . . . , f_{n−1} ∈ C be a polynomial of degree at most n − 1, satisfying

    ∀ j ∈ {1, . . . , n} :   ǫ_{λ_j}(f) = λ̄_j ,                        (10.19)

and define B := ǫ_A(f) (if all eigenvalues are distinct, then f is uniquely determined
by (10.19), otherwise, there exist infinitely many such polynomials, all resulting in the
same map B = A∗, see below). Then, for each y := ∑_{j=1}^n β_j v_j with β₁, . . . , βₙ ∈ C, we
obtain
    By = ( ∑_{k=0}^{n−1} f_k A^k )( ∑_{j=1}^n β_j v_j ) = ∑_{j=1}^n β_j ( ∑_{k=0}^{n−1} f_k A^k ) v_j = ∑_{j=1}^n β_j ( ∑_{k=0}^{n−1} f_k λ_j^k ) v_j
       = ∑_{j=1}^n β_j λ̄_j v_j   (by (10.19)).

As we also have, for each x := ∑_{j=1}^n α_j v_j with α₁, . . . , αₙ ∈ C, that Ax = ∑_{j=1}^n α_j λ_j v_j,
we conclude

    ⟨x, By⟩ = ⟨ ∑_{j=1}^n α_j v_j , ∑_{k=1}^n β_k λ̄_k v_k ⟩ = ∑_{j=1}^n α_j λ_j β̄_j = ⟨ ∑_{j=1}^n α_j λ_j v_j , ∑_{k=1}^n β_k v_k ⟩ = ⟨Ax, y⟩,

proving B = A∗ by Th. 10.22(a).


“(iii)⇒(i)”: According to (iii), there exists f := ∑_{k=0}^{n−1} f_k X^k ∈ C[X] with f₀, . . . , f_{n−1} ∈ C
such that A∗ = ǫ_A(f). Thus,

    AA∗ = A ( ∑_{k=0}^{n−1} f_k A^k ) = ( ∑_{k=0}^{n−1} f_k A^k ) A = A∗A,

proving A ∈ Nor(X).
(b): “(i)⇒(ii)”: Let M ∈ Norn (C). We consider M as a normal map on Cn with the
standard inner product h·, ·i, where the standard basis of Cn constitutes an orthonormal
basis. Then we know there exists an ordered orthonormal basis B := (x1 , . . . , xn ) of
Cn such that, with respect to B, M has the diagonal matrix D. Thus, there exists
U = (ukl ) ∈ GLn (C) such that D = U −1 M U and
    ∀ l ∈ {1, . . . , n} :   x_l = ∑_{k=1}^n u_kl e_k .

Then

    ∀ l, j ∈ {1, . . . , n} :   ∑_{k=1}^n u_kl ū_kj = ∑_{k=1}^n ∑_{m=1}^n u_kl ū_mj ⟨e_k, e_m⟩ = ⟨x_l, x_j⟩ = δ_lj ,

showing the columns of U to be orthonormal and U to be unitary.


“(ii)⇒(i)”: Assume there exists a unitary matrix U ∈ Un(C) such that D = U⁻¹MU is
a diagonal matrix. Then M = UDU⁻¹ and, since the diagonal matrices D and D∗ commute,

    MM∗ = UDU⁻¹(U⁻¹)∗D∗U∗ = UD Idn D∗U∗ = UD∗ Idn DU∗
        = (U⁻¹)∗D∗U∗UDU⁻¹ = M∗M,

proving M ∈ Norn(C). 
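For a concrete normal matrix, the unitary diagonalization of Th. 10.39(b) can also be
observed numerically. A minimal sketch (assuming NumPy and, for simplicity, distinct
eigenvalues, so that the eigenvector matrix returned by np.linalg.eig is already
unitary up to rounding):

    # Numerical sketch for Th. 10.39(b), assuming NumPy and distinct eigenvalues.
    import numpy as np

    M = np.array([[2.0, -3.0],
                  [3.0,  2.0]])                    # normal: M M* = M* M
    assert np.allclose(M @ M.conj().T, M.conj().T @ M)

    lam, U = np.linalg.eig(M)                      # columns of U: eigenvectors
    assert np.allclose(U.conj().T @ U, np.eye(2))  # U is unitary
    D = np.linalg.inv(U) @ M @ U                   # D = U^{-1} M U
    assert np.allclose(D, np.diag(lam))
    print("unitary diagonalization verified")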


Example 10.40. Consider R² with the standard inner product. We already know from
Ex. 6.5(a) that

    M := ( 0  −1 )
         ( 1   0 )

has no real eigenvalues (the characteristic polynomial is χ_M = X² + 1). On the other
hand, M is unitary (in particular, normal), showing one cannot expect Th. 10.39 to
hold with C replaced by R.
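A quick numerical illustration (assuming NumPy): over C, the spectrum of M is
{i, −i}, matching χ_M = X² + 1, while over R there are no eigenvalues:

    # Sketch for Ex. 10.40, assuming NumPy.
    import numpy as np

    M = np.array([[0.0, -1.0],
                  [1.0,  0.0]])
    print(np.linalg.eigvals(M))   # approximately [0.+1.j, 0.-1.j]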

Theorem 10.41. Let (X, ⟨·, ·⟩) be an inner product space over K, dim X = n ∈ N,
and A ∈ Herm(X). Then σ(A) ⊆ R and there exists an orthonormal basis B of X,
consisting of eigenvectors of A. In particular, A is diagonalizable. Moreover, if M ∈
Hermn (C), then there exists a unitary matrix U ∈ Un (C) such that D = U −1 M U is a
diagonal matrix. Also, in particular, for K = R, each A ∈ Sym(X) is diagonalizable,
and, if M ∈ Symn (R), then there exists an orthogonal matrix U ∈ On (R) such that
D = U −1 M U is a diagonal matrix.

Proof. Let λ ∈ σ(A) and let 0 ≠ x ∈ X be a corresponding eigenvector. Then, according
to Prop. 10.38(a), λx = Ax = A∗x = λ̄x, showing λ = λ̄ and λ ∈ R. As A ∈ Herm(X)
implies A to be normal, the case K = C is now immediate from Th. 10.39(a)(ii). It
remains to consider K = R. First, consider n = 2. If B₀ := (x₁, x₂) is an ordered
orthonormal basis of X, then the matrix M ∈ M(2, R) of A with respect to B₀ must
have the form

    M = ( a  b )
        ( b  c )

with a, b, c ∈ R. Thus, the characteristic polynomial is

    χ_A = (X − a)(X − c) − b² = X² − (a + c)X + ac − b²

with the zeros

    λ = (a + c)/2 ± √( (a + c)²/4 − ac + b² ) = (a + c)/2 ± √( ((a − c)² + 4b²)/4 ) ∈ R,
showing A to be diagonalizable in this case. The rest of the proof is now conducted
analogous to the proof of implication “(i)⇒(ii)” of Th. 10.39(a): We prove the existence

of the orthonormal basis of eigenvectors via induction on n ∈ N. For n = 1, there


is nothing to prove. Thus, let n > 1. According to Prop. 8.16(a), there exists an A-
invariant subspace W of X such that dim W ∈ {1, 2}. If dim W = 1, then A has an
eigenvalue λ and a corresponding eigenvector 0 6= v ∈ X. If dim W = 2, then A↾W is,
clearly, also Hermitian, and the above-considered case n = 2 yields that A↾W (and, thus,
A) has an eigenvalue λ and a corresponding eigenvector 0 ≠ v ∈ X. Let U := span{v}.
According to Prop. 10.38(b), both U and U⊥ are A-invariant. Moreover, A↾_U⊥ is also
Hermitian and, by induction hypothesis, U ⊥ has an orthonormal basis B ′ , consisting of
eigenvectors of A. Thus, X also has an orthonormal basis, consisting of eigenvectors
of A. Now let M ∈ Symn (R). We consider M as a symmetric map on Rn with the
standard inner product h·, ·i, where the standard basis of Rn constitutes an orthonormal
basis. Then we know there exists an ordered orthonormal basis B := (x1 , . . . , xn ) of
Rn such that, with respect to B, M has the diagonal matrix D. Thus, there exists
U = (ukl ) ∈ GLn (R) such that D = U −1 M U and
    ∀ l ∈ {1, . . . , n} :   x_l = ∑_{k=1}^n u_kl e_k .

Then

    ∀ l, j ∈ {1, . . . , n} :   ∑_{k=1}^n u_kl u_kj = ∑_{k=1}^n ∑_{m=1}^n u_kl u_mj ⟨e_k, e_m⟩ = ⟨x_l, x_j⟩ = δ_lj ,

showing the columns of U to be orthonormal and U to be orthogonal. 
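In the real symmetric case, Th. 10.41 is exactly what np.linalg.eigh computes. A
small numerical sketch (assuming NumPy):

    # Sketch for Th. 10.41 (real symmetric case), assuming NumPy.
    import numpy as np

    A = np.array([[2.0, 1.0, 0.0],
                  [1.0, 3.0, 1.0],
                  [0.0, 1.0, 2.0]])
    lam, U = np.linalg.eigh(A)                  # spectral decomposition
    assert np.allclose(U.T @ U, np.eye(3))      # U orthogonal
    assert np.allclose(U.T @ A @ U, np.diag(lam))
    assert np.all(np.isreal(lam))               # real spectrum
    print("eigenvalues:", lam)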

The following commutation theorem extends to continuous linear operators on infinite-


dimensional Hilbert spaces (see, e.g., [Rud73, Th. 12.16]).

Theorem 10.42 (Fuglede). Let (X, ⟨·, ·⟩) be a finite-dimensional inner product space
over K. If A ∈ L(X, X) and N ∈ Nor(X), then AN = N A implies AN ∗ = N ∗ A. The
analogous result also holds for matrices A ∈ M(n, K), N ∈ Norn (K), n ∈ N.

Proof. First, we consider the case K = C: According to Th. 10.39(a)(iii), there exists a
polynomial f ∈ C[X] such that N ∗ = ǫN (f ). Clearly, AN = N A implies A to commute
with powers of N and, thus, with N ∗ = ǫN (f ). If A ∈ M(n, C), N ∈ Norn (C), then, with
respect to the standard basis of Cn , A represents a map fA ∈ L(Cn , Cn ) and N represents
a map fN ∈ Nor(Cn ). Then AN = N A implies fA fN = fN fA , which, as already shown,
implies fA (fN )∗ = (fN )∗ fA , which, in turn, implies AN ∗ = N ∗ A (since AN ∗ represents
fA (fN )∗ and N ∗ A represents (fN )∗ fA ). For matrices, the case K = R is an immediate
special case of the case K = C. Now, if K = R, A ∈ L(X, X) and N ∈ Nor(X),
then there exists n ∈ N and an ordered orthonormal basis B := (v1 , . . . , vn ) of X such
that, with respect to B, A is represented by MA ∈ M(n, R) and N is represented by

MN ∈ Norn(R). Then AN = NA implies MA MN = MN MA, which, as already shown,


implies MA (MN )∗ = (MN )∗ MA , which, in turn, implies AN ∗ = N ∗ A (since MA (MN )∗
represents AN ∗ and (MN )∗ MA represents N ∗ A). 
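Fuglede's theorem is also easy to test on examples. The following sketch (assuming
NumPy; here A is chosen as a polynomial in N, so AN = NA holds by construction)
checks the conclusion AN∗ = N∗A:

    # Numerical sketch of Fuglede's Th. 10.42, assuming NumPy.
    import numpy as np

    N = np.array([[1.0, -1.0],
                  [1.0,  1.0]])       # normal
    A = N @ N                         # any polynomial in N commutes with N
    assert np.allclose(A @ N, N @ A)
    assert np.allclose(A @ N.conj().T, N.conj().T @ A)   # AN* = N*A
    print("Fuglede check passed")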

Proposition 10.43. Let (X, ⟨·, ·⟩) be a finite-dimensional inner product space over K.
If A ∈ Nor(X) ∩ GL(X), then A−1 ∈ Nor(X) ∩ GL(X). Let A, B ∈ Nor(X) and λ ∈ K.
Then λA ∈ Nor(X). If AB = BA, then A + B ∈ Nor(X) and AB ∈ Nor(X). The
analogous results also hold for normal matrices.

Proof. If A ∈ Nor(X) ∩ GL(X), then


A−1 (A−1 )∗ = A−1 (A∗ )−1 = (A∗ A)−1 = (AA∗ )−1 = (A−1 )∗ A−1 ,
proving A⁻¹ ∈ Nor(X) ∩ GL(X). Now let A, B ∈ Nor(X) and λ ∈ K. Then

    λA(λA)∗ = λA λ̄A∗ = λ̄A∗ λA = (λA)∗ λA

shows λA ∈ Nor(X). If AB = BA, then
    AB(AB)∗ = AB(BA)∗ = ABA∗B∗ = AA∗BB∗ = A∗AB∗B = A∗B∗AB = (BA)∗AB = (AB)∗AB,

    (A + B)(A + B)∗ = (A + B)(A∗ + B∗) = AA∗ + AB∗ + BA∗ + BB∗
                    = A∗A + B∗A + A∗B + B∗B = (A∗ + B∗)(A + B) = (A + B)∗(A + B),

where Th. 10.42 was used in both computations (it yields BA∗ = A∗B and AB∗ = B∗A
from AB = BA and the normality of A and B),
showing AB ∈ Nor(X) and A + B ∈ Nor(X). Let us provide another proof of the
normality of AB and A + B that does not make use of Fuglede’s Th. 10.42, but uses
the diagonalizability of A, B more directly (for K = C, this works directly; for K = R,
one can, e.g., extend X to a vector space Y over C, also extending A, B to Y in a
natural way): Since AB = BA, we know from Th. 6.8 that A, B are simultaneously
diagonalizable, i.e. there exists a basis {v1 , . . . , vn } of X (n ∈ N) such that there exist
α1 , . . . , αn , β1 , . . . , βn ∈ C with
 
    ∀ j ∈ {1, . . . , n} :   Av_j = α_j v_j  ∧  Bv_j = β_j v_j .

Moreover, according to Prop. 10.38(a), we then also have

    ∀ j ∈ {1, . . . , n} :   A∗v_j = ᾱ_j v_j  ∧  B∗v_j = β̄_j v_j .

Thus,

    ∀ j ∈ {1, . . . , n} :   (AB)(AB)∗v_j = α_j β_j β̄_j ᾱ_j v_j = β̄_j ᾱ_j α_j β_j v_j = (AB)∗(AB)v_j ,

proving AB ∈ Nor(X). In the same way, one also sees B ∗ A = AB ∗ and A∗ B = BA∗
such that the computation from above, once again, shows (A + B)(A∗ + B ∗ ) = (A +
B)∗ (A + B). 

11 Definiteness of Quadratic Matrices over K


Definition 11.1. Let n ∈ N and let A = (akl ) ∈ M(n, K).

(a) A is called positive semidefinite if, and only if, x∗Ax ∈ R₀⁺ for each x ∈ Kn.

(b) A is called positive definite if, and only if, A is positive semidefinite and x∗ Ax =
0 ⇔ x = 0, i.e. if, and only if, x∗ Ax > 0 for each 0 6= x ∈ Kn .

(c) A is called negative semidefinite if, and only if, x∗Ax ∈ R₀⁻ for each x ∈ Kn.

(d) A is called negative definite if, and only if, A is negative semidefinite and x∗ Ax =
0 ⇔ x = 0, i.e. if, and only if, x∗ Ax < 0 for each 0 6= x ∈ Kn .

(e) A is called indefinite if, and only if, A is neither positive semidefinite nor negative
    semidefinite, i.e. if, and only if,

        ( ∃ x ∈ Kn :  x∗Ax ∉ R )  ∨  ( ∃ x, y ∈ Kn :  x∗Ax ∈ R⁺ ∧ y∗Ay ∈ R⁻ ).

Lemma 11.2. Let n ∈ N and let A = (akl ) ∈ M(n, K).

(a) A is positive definite (positive semidefinite) if, and only if, −A is negative definite
(negative semidefinite).

(b) A is indefinite if, and only if, −A is indefinite.

Proof. The equivalences are immediate from Def. 11.1, since, for each x ∈ Kn , x∗ Ax >
0 ⇔ x∗ (−A)x < 0, x∗ Ax = 0 ⇔ x∗ (−A)x = 0, x∗ Ax ∈ R ⇔ x∗ (−A)x ∈ R. 

Theorem 11.3. Let n ∈ N and let A = (akl ) ∈ M(n, C).

(a) The following statements are equivalent:

(i) A is positive semidefinite (resp. positive definite).


(ii) A is Hermitian and all eigenvalues of A are nonnegative (resp. positive) real
numbers.

Moreover, if A is positive semidefinite (resp. positive definite), then det A ≥ 0 (resp.


det A > 0).
(b) The following statements are equivalent:
(i) A is negative semidefinite (resp. negative definite).
(ii) A is Hermitian and all eigenvalues of A are nonpositive (resp. negative) real
numbers.
Moreover, if A is negative semidefinite (resp. negative definite), then det A ≥ 0
(resp. det A > 0) for n even and det A ≤ 0 (resp. det A < 0) for n odd.
(c) The following statements are equivalent:
(i) A is indefinite.
(ii) A is not Hermitian, or A is Hermitian and A has at least one positive and
one negative eigenvalue.

Proof. (a): If A is positive semidefinite, then A is Hermitian by Prop. 10.34 and, thus,
by Th. 10.41, all eigenvalues of A are real. If λ ∈ R is an eigenvalue of A and x ∈ Cn \{0}
is a corresponding eigenvector, then
    0 ≤ x∗Ax = x∗λx = λ‖x‖₂² ,

where the inequality is strict in the case where A is positive definite. As ‖x‖₂² ∈ R⁺,
λ ∈ R₀⁺ and even λ ∈ R⁺ if A is positive definite. Then det A ≥ 0 (resp. det A > 0) also
follows, as det A is the product of the eigenvalues of A (cf. Cor. 8.5). Now assume A to
be Hermitian with only nonnegative (resp. positive) eigenvalues λ1 , . . . , λn ∈ R and let
{v₁, . . . , vₙ} be a corresponding orthonormal basis of eigenvectors (i.e. Av_j = λ_j v_j and
v_j∗ v_k = δ_jk). Then, for each x ∈ Cn, there exist α₁, . . . , αₙ ∈ C such that x = ∑_{j=1}^n α_j v_j,
implying
    x∗Ax = ( ∑_{k=1}^n α_k v_k )∗ A ( ∑_{j=1}^n α_j v_j ) = ( ∑_{k=1}^n ᾱ_k v_k∗ )( ∑_{j=1}^n α_j λ_j v_j )
         = ∑_{k=1}^n ᾱ_k α_k λ_k = ∑_{k=1}^n |α_k|² λ_k ,

showing x∗Ax ∈ R₀⁺, with x∗Ax ∈ R⁺ for λ₁, . . . , λₙ ∈ R⁺ and x ≠ 0. Thus, A is
positive semidefinite and even positive definite if all λ_j are positive.
(b) follows by combining (a) with Lem. 11.2.
(c) is an immediate consequence of (a) and (b). 
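Th. 11.3 suggests a practical test for the definiteness of a Hermitian matrix via its
(real) eigenvalues. A small sketch (assuming NumPy; classify_hermitian is a
hypothetical helper name, and the tolerance is a numerical convenience not present in
the exact statement):

    # Definiteness of a Hermitian matrix via eigenvalues (Th. 11.3), assuming NumPy.
    import numpy as np

    def classify_hermitian(A, tol=1e-12):
        assert np.allclose(A, A.conj().T), "A must be Hermitian"
        lam = np.linalg.eigvalsh(A)          # real eigenvalues, ascending
        if np.all(lam > tol):   return "positive definite"
        if np.all(lam >= -tol): return "positive semidefinite"
        if np.all(lam < -tol):  return "negative definite"
        if np.all(lam <= tol):  return "negative semidefinite"
        return "indefinite"

    print(classify_hermitian(np.array([[2.0, 1.0], [1.0, 2.0]])))   # positive definite
    print(classify_hermitian(np.array([[1.0, 0.0], [0.0, -1.0]])))  # indefinite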

Theorem 11.4. Let n ∈ N and let A = (akl ) ∈ M(n, R).

(a) The following statements are equivalent:


(i) A is positive semidefinite (resp. positive definite).
(ii) The symmetric part Asym of A is positive semidefinite (resp. positive definite).
(iii) All eigenvalues of Asym are nonnegative (resp. positive) real numbers.
Moreover, if A is positive semidefinite (resp. positive definite), then det Asym ≥ 0
(resp. det Asym > 0).
(b) The following statements are equivalent:
(i) A is negative semidefinite (resp. negative definite).
(ii) The symmetric part Asym of A is negative semidefinite (resp. negative definite).
(iii) All eigenvalues of Asym are nonpositive (resp. negative) real numbers.
Moreover, if A is negative semidefinite (resp. negative definite), then det Asym ≥ 0
(resp. det Asym > 0) for n even and det Asym ≤ 0 (resp. det Asym < 0) for n odd.
(c) The following statements are equivalent:
(i) A is indefinite.
(ii) Asym is indefinite.
(iii) Asym has at least one positive and one negative eigenvalue.

Proof. (a): Since A = AHer + AskHer = Asym + Askew by Prop. 10.33(b), “(i) ⇔ (ii)” holds
as Prop. 10.35(c) yields x∗ Askew x = 0 and x∗ Ax = x∗ Asym x for each x ∈ Rn . Since Asym
is Hermitian, “(ii) ⇔ (iii)” is due to Th. 11.3(a).
(b) follows by combining (a) with Lem. 11.2.
(c) is an immediate consequence of (a) and (b). 
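Th. 11.4 reduces the real (possibly non-symmetric) case to the symmetric part
A_sym = (A + Aᵗ)/2. A minimal numerical sketch (assuming NumPy):

    # Sketch for Th. 11.4, assuming NumPy.
    import numpy as np

    A = np.array([[2.0, 5.0],
                  [-3.0, 2.0]])          # not symmetric
    A_sym = (A + A.T) / 2
    x = np.array([1.7, -0.4])
    # x^t A x = x^t A_sym x, since x^t A_skew x = 0:
    assert np.isclose(x @ A @ x, x @ A_sym @ x)
    print(np.linalg.eigvalsh(A_sym))     # eigenvalues 1, 3 > 0: A positive definite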
Notation 11.5. If A = (a_ij) is an n × n matrix, n ∈ N, then, for 1 ≤ k ≤ l ≤ n, let

    A^{kl} := ( a_kk  . . .  a_kl )
              (  ⋮    ⋱     ⋮  )                                       (11.1)
              ( a_lk  . . .  a_ll )

denote the (1 + l − k) × (1 + l − k) principal submatrix of A, i.e.

    ∀ k, l ∈ {1, . . . , n}, 1 ≤ k ≤ l ≤ n,  ∀ i, j ∈ {1, . . . , 1 + l − k} :
        a^{kl}_{ij} := a_{i+k−1, j+k−1} .                               (11.2)

Proposition 11.6. Let A = (aαβ ) ∈ M(n, K), n ∈ N. Then A is positive (semi-)definite


if, and only if, every principal submatrix Akl of A, 1 ≤ k ≤ l ≤ n, is positive
(semi-)definite.

Proof. If all principal submatrices of A are positive (semi-)definite, then, as A = A^{1n},


A is positive (semi-)definite. Now assume A to be positive (semi-)definite and fix k, l ∈
{1, . . . , n} with 1 ≤ k ≤ l ≤ n. Let x = (xk , . . . , xl ) ∈ K1+l−k \ {0} and extend x to Kn
by 0, calling the extended vector y:
    y = (y₁, . . . , yₙ) ∈ Kn \ {0},   y_α := x_α for k ≤ α ≤ l,   y_α := 0 otherwise.   (11.3)

We now consider x and y as column vectors and compute

    x∗ A^{kl} x = ∑_{α,β=k}^l a_αβ x̄_α x_β = ∑_{α,β=1}^n a_αβ ȳ_α y_β > 0   (resp. ≥ 0),    (11.4)

showing A^{kl} to be positive (semi-)definite. 

Theorem 11.7. Let m, n ∈ N and A ∈ M(m, n, C). Then A∗ A ∈ M(n, C) is Hermi-


tian and positive semidefinite. For m = n and det A 6= 0, A∗ A is even positive definite.

Proof. That A∗ A is Hermitian is due to

(A∗ A)∗ = A∗ (A∗ )∗ = A∗ A.

Moreover, if x ∈ Kn, then x∗A∗Ax = (Ax)∗(Ax) = ‖Ax‖₂² ∈ R₀⁺, showing A∗A to be
positive semidefinite. If m = n and det A ≠ 0, then, by Th. 10.24(b), det(A∗A) =
det A∗ · det A = |det A|² ∈ R⁺. Thus, 0 is not an eigenvalue of A∗A and A∗A must be
positive definite according to Th. 11.3(a)(ii). 
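A numerical sketch for Th. 11.7 (assuming NumPy; random complex A):

    # A*A is Hermitian positive semidefinite (Th. 11.7), assuming NumPy.
    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.standard_normal((4, 3)) + 1j * rng.standard_normal((4, 3))
    G = A.conj().T @ A                               # Gram matrix A*A
    assert np.allclose(G, G.conj().T)                # Hermitian
    assert np.all(np.linalg.eigvalsh(G) >= -1e-12)   # positive semidefinite
    x = rng.standard_normal(3)
    assert np.isclose((x.conj() @ G @ x).real, np.linalg.norm(A @ x) ** 2)
    print("A*A is positive semidefinite")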

A Multilinear Maps
Theorem A.1. Let V and W be vector spaces over the field F , α ∈ N. Then, as vector
spaces over F , L(V, Lα−1 (V, W )) and Lα (V, W ) are isomorphic via the isomorphism

Φ : L(V, Lα−1 (V, W )) −→ Lα (V, W ),


Φ(L)(x1 , . . . , xα ) := L(x1 )(x2 , . . . , xα ). (A.1)

Proof. Since L is linear and L(x1 ) is (α − 1) times linear, Φ(L) is, indeed, an element
of Lα (V, W ), showing that Φ is well-defined by (A.1). Next, we verify Φ to be linear: If
λ ∈ F and K, L ∈ L(V, Lα−1 (V, W )), then

Φ(λL)(x1 , . . . , xα ) = (λL)(x1 )(x2 , . . . , xα ) = λ(L(x1 )(x2 , . . . , xα )) = λΦ(L)(x1 , . . . , xα )

and

Φ(K + L)(x1 , . . . , xα ) = (K + L)(x1 )(x2 , . . . , xα ) = (K(x1 ) + L(x1 ))(x2 , . . . , xα )


= K(x1 )(x2 , . . . , xα ) + L(x1 )(x2 , . . . , xα )
= Φ(K)(x1 , . . . , xα ) + Φ(L)(x1 , . . . , xα )
= (Φ(K) + Φ(L))(x1 , . . . , xα ),

proving Φ to be linear. Now we show Φ to be injective. To this end, we show that,


if L 6= 0, then Φ(L) 6= 0. If L 6= 0, then there exist x1 , . . . , xα ∈ V such that
L(x1 )(x2 , . . . , xα ) 6= 0, showing that Φ(L) 6= 0 as needed. To verify Φ is also surjective,
let K ∈ Lα (V, W ). Define L : V −→ Lα−1 (V, W ) by letting

L(x1 )(x2 , . . . , xα ) := K(x1 , . . . , xα ). (A.2)

Then, clearly, for each x1 ∈ V , L(x1 ) ∈ Lα−1 (V, W ). Moreover, L is linear, i.e. L ∈
L(V, Lα−1 (V, W )). Comparing (A.2) with (A.1) shows Φ(L) = K, i.e. Φ is surjective,
completing the proof. 

B Polynomials in Several Variables


Let (R, +, ·) be a commutative ring with unity. According to Def. 7.1, polynomials in
one variable over R are precisely the linear combinations of monomials X 0 , X 1 , X 2 , . . .
Similarly, we now want polynomials in two variables over R to be the linear combinations
of monomials of the form X1k X2l = X2l X1k . The generalization to finitely many variables
X1 , . . . , Xn and even to infinitely many variables (Xi )i∈I (where I is an arbitrary infinite
set and the monomials are finite products of the Xi ) is then straightforward. We will
actually present a construction of polynomials that is even more general, namely the
construction of M -polynomials, where M is a commutative monoid (cf. Def. B.1 below):
Knowing this general construction is useful if one wants to pursue the study of Algebra
further, it comes at virtually no extra difficulty, and it elegantly includes all types of
polynomials mentioned above. We will define M -polynomials in Def. B.4 and we will
see how polynomials in finitely many variables as well as in infinitely many variables
arise as special cases in Ex. B.8(a)-(c).

Definition B.1. (a) A semigroup (M, ◦) is called a monoid if, and only if, there exists
a neutral element e ∈ M (thus, a magma (M, ◦) is a monoid if, and only if, ◦ is
associative and M contains a neutral element).

(b) Let (M, ◦) be a monoid, ∅ = 6 U ⊆ M . We call U a submonoid of M if, and only


if, (U, ◦) forms a monoid, where the composition on U is the restriction of the
composition on M , i.e. ◦↾U ×U .

Lemma B.2. Let (M, ◦) be a monoid, ∅ 6= U ⊆ M . Then U is a submonoid of M if,


and only if, (i) and (ii) hold, where

(i) For each u, v ∈ U , one has u ◦ v ∈ U .

(ii) e ∈ U , where e denotes the neutral element of M .

Proof. If (U, ◦) is a monoid, then, clearly, it must satisfy (i) and (ii). Thus, we merely
need to show that (i) and (ii) are sufficient for (U, ◦) to be a monoid. According to (i),
◦ maps U × U into U . As ◦ is associative on M , it is also associative on U and, thus,
(ii) yields (U, ◦) to be a monoid. 

Example B.3. (a) (N0 , +) constitutes a commutative monoid.

(b) Let (M, ·) be a monoid with neutral element e ∈ M and let I be a set. Then, by
[Phi19, Th. 4.9(e)], F(I, M ) = M I becomes a monoid, if · is defined pointwise on
M I , where · is also commutative on M I if it is commutative on M . A submonoid
of (M^I, ·) is given by (M^I_fin, ·), where, as in [Phi19, Ex. 5.16(c)], M^I_fin denotes the
set of functions f : I −→ M such that there exists a finite set If ⊆ I, satisfying

    f(i) = e for each i ∈ I \ I_f ,                                     (B.1a)
    f(i) ≠ e for each i ∈ I_f .                                         (B.1b)

Indeed, M^I_fin is a submonoid of M^I: If f, g ∈ M^I_fin, then I_{fg} ⊆ I_f ∪ I_g, showing
fg ∈ M^I_fin; f_e ∈ M^I_fin for f_e ≡ e.
I I
f g ∈ Mfin ; fe ∈ Mfin for fe ≡ e.

(c) Let I be a set and n ∈ N. By combining (a) and (b), we obtain the commutative
monoids ((N0 )n , +), ((N0 )I , +), ((N0 )Ifin , +).

Definition B.4. Let (R, +, ·) be a commutative ring with unity and let (M, +) be a
commutative monoid. We call
    R[M] := R^M_fin = { f : M −→ R : #f⁻¹(R \ {0}) < ∞ }                (B.2)

the set of M -polynomials over R. We then have the pointwise-defined addition and
scalar multiplication on R[M ], which it inherits from RM :

∀ (f + g) : M −→ R, (f + g)(i) := f (i) + g(i),


f,g∈R[M ]
(B.3)
∀ ∀ (λ · f ) : M −→ R, (λ · f )(i) := λ f (i),
f ∈R[M ] λ∈R

where we know from [Phi19, Ex. 5.16(c)] that, with these compositions, R[M ] forms a
vector space over R, provided R is a field and, then, B = {ei : i ∈ M }, where

∀ ei : M −→ R, ei (j) := δij ,
i∈M

provides the standard basis of the vector space R[M ]. In the current context, we will
now write X i := ei and we will call these polynomials monomials. Furthermore, we
define a multiplication on R[M ] by letting

(ai )i∈M , (bi )i∈M 7→ (ci )i∈M := (ai )i∈M · (bi )i∈M ,
(B.4)
X X
ci := ak bl := ak b l ,
k+l=i (k,l)∈M 2 : k+l=i

where we note that, due to (B.2), only finitely many of the summands in the sum are
nonzero. If f := (ai )i∈M ∈ R[M ], then we call the ai ∈ R the coefficients of f .

Remark B.5. In the situation of Def. B.4, using the notation X^i = e_i, we can write
addition, scalar multiplication, and multiplication in the following, perhaps, more
familiar-looking forms: If λ ∈ R, f = ∑_{i∈M} f_i X^i, g = ∑_{i∈M} g_i X^i (each f_i, g_i ∈ R), then
    f + g = ∑_{i∈M} (f_i + g_i) X^i ,
    λf = ∑_{i∈M} (λf_i) X^i ,
    fg = ∑_{i∈M} ( ∑_{k+l=i} f_k g_l ) X^i

(as in (B.4), due to (B.2), only finitely many of the summands in each sum are nonzero).
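The definitions (B.3) and (B.4) translate directly into code. The following minimal
sketch (pure Python; add, mul and the dict representation are hypothetical helper
names, not notation from the notes) represents an element of R[M] as a dictionary
mapping monoid elements i to their nonzero coefficients f_i, with multiplication given
by the convolution (B.4):

    # Minimal sketch of the monoid ring R[M] of Def. B.4 (pure Python).
    def add(f, g):
        h = dict(f)
        for i, c in g.items():
            h[i] = h.get(i, 0) + c
            if h[i] == 0:
                del h[i]
        return h

    def mul(f, g, plus=lambda k, l: k + l):
        h = {}
        for k, a in f.items():
            for l, b in g.items():
                i = plus(k, l)                  # monoid operation on exponents
                h[i] = h.get(i, 0) + a * b
        return {i: c for i, c in h.items() if c != 0}

    # (1 + X)^2 = 1 + 2X + X^2 in Z[N0]:
    f = {0: 1, 1: 1}
    print(mul(f, f))                            # {0: 1, 1: 2, 2: 1}
    # X1 * X2 in Z[(N0)^2], exponents added coordinatewise:
    tadd = lambda k, l: tuple(a + b for a, b in zip(k, l))
    print(mul({(1, 0): 1}, {(0, 1): 1}, plus=tadd))   # {(1, 1): 1}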

Theorem B.6. Let (R, +, ·) be a commutative ring with unity and let (M, +) be a
commutative monoid. Then (R[M ], +, ·) forms a commutative ring with unity, where
1 = X 0 is the neutral element of multiplication.

Proof. We already know from [Phi19, Ex. 4.9(e)] that (R[M ], +) forms a commutative
group. To verify associativity of multiplication, let a, b, c, d, f, g, h ∈ R[M ],

a := (ai )i∈M , b := (bi )i∈M , c := (ci )i∈M , d := (di )i∈M ,


f := (fi )i∈M , g := (gi )i∈M , h := (hi )i∈M ,

such that d := ab, f := bc, g := (ab)c, h := a(bc). Then, for each i ∈ M ,


    g_i = ∑_{k+l=i} d_k c_l = ∑_{k+l=i} ∑_{m+n=k} a_m b_n c_l = ∑_{m+n+l=i} a_m b_n c_l
        = ∑_{m+k=i} a_m ∑_{n+l=k} b_n c_l = ∑_{m+k=i} a_m f_k = h_i ,

proving g = h, as desired. To verify distributivity, let a, b, c, d, f, g ∈ R[M ] be as before,


but this time such that d := ab, f := ac, and g := a(b + c). Then, for each i ∈ M ,
    g_i = ∑_{k+l=i} a_k (b_l + c_l) = ∑_{k+l=i} a_k b_l + ∑_{k+l=i} a_k c_l = d_i + f_i ,

proving g = d + f , as desired. To verify commutativity of multiplication, let a, b, c, d ∈


R[M ] be as before, but this time such that c := ab, d := ba. Then, for each i ∈ M ,
    c_i = ∑_{k+l=i} a_k b_l = ∑_{k+l=i} b_l a_k = d_i ,

proving c = d, as desired. Finally, if b := X 0 , then b0 = 1 and bi = 0 for each i ∈ M \{0},


yielding, for c := ab and each i ∈ M ,
    c_i = ∑_{k+l=i} a_k b_l = ∑_{k+0=i} a_k b_0 = a_i ,

showing X 0 to be neutral and completing the proof. 


Proposition B.7. If R is a commutative ring with unity and (M, +) is a commutative
monoid, then R[M ] is a ring extension of R via the unital ring monomorphism

ι : R −→ R[M ], ι(r) := r X 0 .

Proof. The map ι is unital, since ι(1) = X 0 ; ι is a ring homomorphism, since, for each
r, s ∈ R, ι(r +s) = (r +s)X 0 = rX 0 +sX 0 = ι(r)+ι(s) and ι(rs) = rsX 0 = rX 0 ·sX 0 =
ι(r) ι(s); ι is injective, since, for r 6= 0, ι(r) = rX 0 6≡ 0. 
Example B.8. Let (R, +, ·) be a commutative ring with unity and let (M, +) be a
commutative monoid.

(a) N0 -polynomials over R are polynomials in one variable over R as defined in Def.
7.1: For (M, +) = (N0 , +), the definition of an M -polynomial over R according to
Def. B.4 is precisely the same as that of a polynomial over R according to Def. 7.1,
i.e. R[X] = R[N0 ].

(b) (N0 )n -polynomials are polynomials in n variables: If (M, +) = ((N0 )n , +), then one
can interpret the X occurring in the monomial X (i1 ,...,in ) with (i1 , . . . , in ) ∈ (N0 )n
as the n-tuple of variables X = (X1 , . . . , Xn ) such that the monomial becomes
X (i1 ,...,in ) = X1i1 · · · Xnin . In consequence, it is also common to introduce the notation

R[X1 , . . . , Xn ] := R[(N0 )n ].

(c) (N0 )Ifin -polynomials, where I is an arbitrary set, are polynomials in the variables
(Xi )i∈I (possibly infinitely many): If ν ∈ (N0 )Ifin , then, consistently with the nota-
tion of Ex. B.3(b), we let Iν := ν −1 (N), i.e. Iν is the finite set satisfying

ν(i) = 0 for each i ∈ I \ Iν ,


ν(i) 6= 0 for each i ∈ Iν .

If (M, +) = ((N0 )Ifin , +), then one can interpret the X occurring in the monomial
X ν with ν ∈ (N0 )Ifin as the (#Iν )-tuple of variables X = (Xi )i∈Iν (with X = X 0 = 1
if I_ν = ∅), such that the monomial becomes X^ν = ∏_{i∈I_ν} X_i^{ν(i)}. In consequence, it
is also common to introduce the notation

R[(Xi )i∈I ] := R[(N0 )Ifin ].

If J ⊆ I is a finite subset of I, then the polynomial ring in finitely many variables


R[(N0 )J ] is, clearly, isomorphic to the ring R[X1 , . . . , X#J ] of (b). Moreover, we can
view R[(N0 )J ] as a subring of R[(N0 )Ifin ] via the ring extension given by the unital
ring monomorphism

    ι : R[(N0)^J] −→ R[(N0)^I_fin],
    ι( ∑_{ν∈(N0)^J} f_ν X^ν ) := ∑_{ν∈(N0)^J} f_ν X^ν̃ = ∑_{ν∈(N0)^J} f_ν ∏_{i∈J} X_i^{ν(i)} ,

where, for each ν : J −→ N0 , we define


    ν̃ ∈ (N0)^I_fin ,   ν̃(i) := ν(i) for i ∈ J,   ν̃(i) := 0 for i ∈ I \ J.

Next, for each f = ∑_{ν∈(N0)^I_fin} f_ν X^ν ∈ R[(N0)^I_fin], we define the finite set

    f(I) := ∪_{ν∈(N0)^I_fin: f_ν≠0} I_ν .                               (B.5)

By definition of f(I), we have

    f = ∑_{ν∈(N0)^{f(I)}} f_ν X^ν ∈ R[(N0)^{f(I)}],

showing
    R[(N0)^I_fin] = ∪_{J⊆I: #J<∞} R[(N0)^J].

Theorem B.9. Let R be a commutative ring with unity and let M be a commutative
monoid. Moreover, let S be another commutative ring with unity and assume we have
a unital ring homomorphism φ : R −→ S as well as a homomorphism µ : (M, +) −→
(S, ·) with µ(0) = 1. Then the map
    Φ : R[M] −→ S,   Φ( ∑_{i∈M} f_i X^i ) := ∑_{i∈M} φ(f_i) µ(i),       (B.6)

constitutes the unique ring homomorphism Φ : R[M ] −→ S that satisfies Φ ↾R = φ


(considering R as a subset of R[M ] due to Prop. B.7; in particular, Φ is also unital) as
well as
    ∀ i ∈ M :   Φ(X^i) = µ(i)                                           (B.7)

(one calls this kind of property a universal property to the polynomial ring R[M ], as one
can show it uniquely determines R[M ] up to a canonical isomorphism, cf. [Bos13, Ch.
2.5]).

Proof. If Φ is defined by (B.6), then, for each f = ∑_{i∈M} f_i X^i, g = ∑_{i∈M} g_i X^i ∈ R[M],
one computes (recalling that all occurring sums are, actually, finite)

    Φ(f + g) = ∑_{i∈M} φ(f_i + g_i) µ(i) = ∑_{i∈M} (φ(f_i) + φ(g_i)) µ(i)
             = ∑_{i∈M} φ(f_i) µ(i) + ∑_{i∈M} φ(g_i) µ(i) = Φ(f) + Φ(g)

as well as

    Φ(fg) = ∑_{i∈M} φ( ∑_{k+l=i} f_k g_l ) µ(i) = ∑_{i∈M} ∑_{k+l=i} φ(f_k) φ(g_l) µ(i)
          = ∑_{k,l∈M} φ(f_k) φ(g_l) µ(k + l) = ∑_{k,l∈M} φ(f_k) φ(g_l) µ(k) µ(l)
          = ( ∑_{k∈M} φ(f_k) µ(k) )( ∑_{l∈M} φ(g_l) µ(l) ) = Φ(f)Φ(g),

showing Φ to be a ring homomorphism. Next, for each r ∈ R, we have


Φ(r X 0 ) = φ(r) µ(0) = φ(r),
showing Φ↾R = φ. Since φ is unital,
    ∀ i ∈ M :   Φ(X^i) = Φ(1 · X^i) = φ(1) µ(i) = 1 · µ(i) = µ(i),

proving (B.7).
To prove uniqueness, let Ψ : R[M ] −→ S be an arbitrary ring homomorphism such that
Ψ↾R = φ and Ψ(X^i) = µ(i) for each i ∈ M. Then, for each f = ∑_{i∈M} f_i X^i ∈ R[M],

    Ψ( ∑_{i∈M} f_i X^i ) = ∑_{i∈M} Ψ(f_i) Ψ(X^i) = ∑_{i∈M} φ(f_i) µ(i) = Φ( ∑_{i∈M} f_i X^i ),

establishing Ψ = Φ, as desired. 

From now on, we will restrict ourselves to the cases of Ex. B.8, where, actually, Ex.
B.8(a) is a special case of Ex. B.8(b), and Ex. B.8(b), in turn, is a special case of Ex.
B.8(c). Thus, we restrict ourselves to M -polynomials, where (M, +) = ((N0 )Ifin , +) and
I may be an arbitrary set.
The ring isomorphisms provided by the following Cor. B.10 sometimes allow one to
establish properties for polynomial rings in finitely many variables via induction proofs.
Corollary B.10. Let (R, +, ·) be a commutative ring with unity and n ∈ N, n ≥ 2. Then
R[X1 , . . . , Xn ] and (R[X1 , . . . , Xn−1 ])[Xn ] are isomorphic via the ring isomorphism
    Φ : (R[X1, . . . , Xn−1])[Xn] ≅ R[X1, . . . , Xn],
    Φ( ∑_{k=0}^N ( ∑_{ν∈(N0)^{n−1}} f_{ν,k} X^ν ) Xn^k ) := ∑_{α∈(N0)^n} f_α X^α ,

where
    f_α := f_{ν,k}  if α(n) = k and α↾_{1,...,n−1} = ν,    f_α := 0  otherwise.
Moreover, noting that we have the unital ring monomorphism (the map called ι in Ex.
B.8(c))
 
    φ : R[X1, . . . , Xn−1] −→ R[X1, . . . , Xn],
    φ( ∑_{ν∈(N0)^{n−1}} f_ν X^ν ) := ∑_{ν∈(N0)^{n−1}} f_ν X^ν Xn^0 ,

and the homomorphism

    µ : (N0, +) −→ (R[X1, . . . , Xn], ·),   µ(k) := Xn^k ,

Φ is the unique ring homomorphism from (R[X1, . . . , Xn−1])[Xn] into R[X1, . . . , Xn] with
Φ↾_{R[X1,...,Xn−1]} = φ and

    ∀ k ∈ N0 :   Φ(Xn^k) = Xn^k .

Proof. We apply Th. B.9 with R replaced by R[X1, . . . , Xn−1], M := N0, and S :=
R[X1, . . . , Xn]. We call the ring homomorphism given by (B.6) Ψ and show Ψ = Φ: If
∑_{k=0}^N ( ∑_{ν∈(N0)^{n−1}} f_{ν,k} X^ν ) Xn^k ∈ (R[X1, . . . , Xn−1])[Xn], then

    Ψ( ∑_{k=0}^N ( ∑_{ν∈(N0)^{n−1}} f_{ν,k} X^ν ) Xn^k ) = ∑_{k=0}^N φ( ∑_{ν∈(N0)^{n−1}} f_{ν,k} X^ν ) µ(k)
        = ∑_{k=0}^N ( ∑_{ν∈(N0)^{n−1}} f_{ν,k} X^ν ) Xn^k = Φ( ∑_{k=0}^N ( ∑_{ν∈(N0)^{n−1}} f_{ν,k} X^ν ) Xn^k ),

proving Ψ = Φ. Thus, it merely remains to show Φ is bijective. To this end, let
f := ∑_{k=0}^N ( ∑_{ν∈(N0)^{n−1}} f_{ν,k} X^ν ) Xn^k ∈ (R[X1, . . . , Xn−1])[Xn]. If f ≠ 0, then there
exist k ∈ N0 and ν ∈ (N0)^{n−1} such that f_{ν,k} ≠ 0. Then, if α ∈ (N0)^n is such that
α↾_{1,...,n−1} = ν and α(n) = k, then f_α = f_{ν,k} ≠ 0, showing Φ(f) ≠ 0 and injectivity of Φ.
If f = ∑_{α∈(N0)^n} f_α X^α ∈ R[X1, . . . , Xn], then

    Φ( ∑_{k∈N0} ( ∑_{α∈(N0)^n: α(n)=k} f_α X^{α↾_{1,...,n−1}} ) Xn^k ) = f,

showing surjectivity of Φ, thereby completing the proof. 



Proposition B.11. Let R be a commutative ring with unity and let I be a set. If R is
an integral domain (i.e. if it has no nonzero zero divisors), then R[(N0 )Ifin ] is an integral
domain as well and, moreover, (R[(N0 )Ifin ])∗ = R∗ .

Proof. To apply Cor. B.10, note that the result was proved for the polynomial ring in one
variable (i.e. for R[X]) in Prop. 7.7. Now let n ∈ N, n ≥ 2, and, by induction hypothesis,
assume R[X1, . . . , Xn−1] has no nonzero zero divisors and (R[X1, . . . , Xn−1])∗ = R∗.
Then Prop. 7.7 yields that (R[X1, . . . , Xn−1])[Xn] has no nonzero zero divisors and
((R[X1, . . . , Xn−1])[Xn])∗ = R∗. An application of Cor. B.10, in turn, provides that
R[X1, . . . , Xn] has no nonzero zero divisors and (R[X1, . . . , Xn])∗ = R∗, completing
the induction proof of the proposition’s assertion for polynomial rings in finitely many
variables. Proceeding to the case of a general, possibly infinite, set I, if f, g ∈ R[(N0 )Ifin ]\
{0}, then, using the notation introduced in (B.5), we have
f ∈ R[(N0 )f (I) ], g ∈ R[(N0 )g(I) ], f g ∈ R[(N0 )(f g)(I) ] ⊆ R[(N0 )f (I)∪g(I) ].
Since R[(N0 )f (I)∪g(I) ] is a polynomial ring in finitely many variables, we conclude f g 6= 0,
showing R[(N0 )Ifin ] has no nonzero zero divisors. Similarly, f ∈ (R[(N0 )Ifin ])∗ implies
f ∈ R∗ due to f ∈ R[(N0 )f (I) ] being an element of a polynomial ring in finitely many
variables. 
Notation B.12. If I is a set and ν ∈ (N0 )Ifin , then define
    |ν| := ∑_{i∈I} ν(i).

Lemma B.13. If I is a set, then


    ∀ ν₁, ν₂ ∈ (N0)^I_fin :   |ν₁ + ν₂| = |ν₁| + |ν₂|.                  (B.8)

Proof. Noting that the following sums are, actually, finite, we compute, for each
ν₁, ν₂ ∈ (N0)^I_fin,

    |ν₁ + ν₂| = ∑_{i∈I} (ν₁ + ν₂)(i) = ∑_{i∈I} (ν₁(i) + ν₂(i)) = ∑_{i∈I} ν₁(i) + ∑_{i∈I} ν₂(i) = |ν₁| + |ν₂|,

thereby establishing the case. 


Definition B.14. Let (R, +, ·) be a commutative ring with unity and let I be a set.

(a) If f = (f_ν)_{ν∈(N0)^I_fin} = ∑_{ν∈(N0)^I_fin} f_ν X^ν ∈ R[(N0)^I_fin], then we define the degree of f by

        deg f := −∞ for f ≡ 0,   deg f := max{|ν| : f_ν ≠ 0} for f ≢ 0.   (B.9)

(b) We call 0 ≠ f = ∑_{ν∈(N0)^I_fin} f_ν X^ν ∈ R[(N0)^I_fin] homogeneous of degree d ∈ N0 if, and
    only if,
        ∀ ν ∈ (N0)^I_fin :   ( f_ν ≠ 0 ⇒ |ν| = d ).

    Moreover, we also define the zero polynomial to be homogeneous of every degree
    d ∈ N0.

(c) For f = ∑_{ν∈(N0)^I_fin} f_ν X^ν ∈ R[(N0)^I_fin] and d ∈ N0, we call

        h_d(f) := ∑_{ν∈(N0)^I_fin: |ν|=d} f_ν X^ν

    the homogeneous component of degree d of f (clearly, deg h_d(f) = d or h_d(f) = 0).

Remark B.15. In the situation of Def. B.14(c), for f = ∑_{ν∈(N0)^I_fin} f_ν X^ν ∈ R[(N0)^I_fin],
the definition of the h_d(f) immediately yields

    f = ∑_{d=0}^∞ h_d(f) = ∑_{d=0}^{deg f} h_d(f).                      (B.10)

Lemma B.16. Let R be a commutative ring with unity and let I be a set. If f, g ∈
R[(N0)^I_fin] are homogeneous of degree d_f and d_g, respectively (d_f, d_g ∈ N0), then fg is
homogeneous of degree d_f + d_g (note that, according to Def. B.14(b), this does not
exclude the possibility fg = 0).

Proof. If f = ∑_{ν∈(N0)^I_fin} f_ν X^ν and g = ∑_{ν∈(N0)^I_fin} g_ν X^ν, then

    fg = ∑_{ν∈(N0)^I_fin} ( ∑_{ν₁+ν₂=ν} f_{ν₁} g_{ν₂} ) X^ν .

Since f is homogeneous of degree d_f and g is homogeneous of degree d_g, we have

    f_{ν₁} g_{ν₂} ≠ 0  ⇒  ( f_{ν₁} ≠ 0 ∧ g_{ν₂} ≠ 0 )  ⇒  ( |ν₁| = d_f ∧ |ν₂| = d_g )
                       ⇒  |ν| = |ν₁ + ν₂| = |ν₁| + |ν₂| = d_f + d_g   (by (B.8)),

proving fg to be homogeneous of degree d_f + d_g. 



Theorem B.17. Let R be a commutative ring with unity and let I be a set. If f, g ∈
R[(N0)^I_fin] with f = ∑_{ν∈(N0)^I_fin} f_ν X^ν, g = ∑_{ν∈(N0)^I_fin} g_ν X^ν, then

    deg(f + g) = −∞ ≤ max{deg f, deg g}   if f = −g,
    deg(f + g) = max{|ν| : f_ν ≠ −g_ν} ≤ max{deg f, deg g}   otherwise,      (B.11a)
    deg(fg) ≤ deg f + deg g.                                                 (B.11b)

If R is an integral domain, then one even has

    deg(fg) = deg f + deg g.                                                 (B.11c)

Proof. If f ≡ 0, then f + g = g and fg ≡ 0, i.e. the degree formulas hold if f ≡ 0 or
g ≡ 0. It is also immediate from (B.3) that (B.11a) holds in the remaining case. Using
Rem. B.15, we obtain

    f = ∑_{d₁=0}^{deg f} h_{d₁}(f),  g = ∑_{d₂=0}^{deg g} h_{d₂}(g)   ⇒   fg = ∑_{d₁=0}^{deg f} ∑_{d₂=0}^{deg g} h_{d₁}(f) h_{d₂}(g).

According to Lem. B.16, each product h_{d₁}(f) h_{d₂}(g) is homogeneous of degree d₁ + d₂
(where, in general, h_{d₁}(f) h_{d₂}(g) = 0 is not excluded), proving (B.11b). If R has no
nonzero zero divisors, then, by Prop. B.11, R[(N0)^I_fin] has no nonzero zero divisors,
implying h_{deg f}(f) h_{deg g}(g) ≠ 0 and, thus, (B.11c). 
Definition and Remark B.18. Let R be a commutative ring with unity, let I be a
set, and let R′ be a commutative ring extension of R, where ι : R −→ R′ is a unital
ring monomorphism. For each x := (x_i)_{i∈I} ∈ (R′)^I, the map

    ǫ_x : R[(N0)^I_fin] −→ R′,   f ↦ ǫ_x(f) = ǫ_x( ∑_{ν∈(N0)^I_fin} f_ν X^ν ) := ∑_{ν∈(N0)^I_fin} f_ν x^ν ,   (B.12)

where

    ∀ ν ∈ (N0)^I_fin  ∀ x = (x_i)_{i∈I} ∈ (R′)^I :   x^ν := ∏_{i∈I} x_i^{ν(i)}    (B.13)

(the product is, actually, always finite and, thus, well-defined due to the commutativ-
ity of R′) is called the substitution homomorphism or evaluation homomorphism corre-
sponding to x: Indeed, if x ∈ (R′)^I, then we may apply Th. B.9 with S := R′, φ := ι,
M := (N0)^I_fin, and µ : (M, +) −→ (R′, ·) defined by

    µ(ν) := x^ν = ∏_{i∈I} x_i^{ν(i)} :                                   (B.14)

Then µ(0) = ∏_{i∈I} x_i^0 = 1 ∈ R′ and

    µ(ν₁ + ν₂) = ∏_{i∈I} x_i^{ν₁(i)+ν₂(i)} = ∏_{i∈I} x_i^{ν₁(i)} x_i^{ν₂(i)} = ( ∏_{i∈I} x_i^{ν₁(i)} )( ∏_{i∈I} x_i^{ν₂(i)} ) = µ(ν₁) µ(ν₂)

shows µ to be a homomorphism. Moreover, for each f = ∑_{ν∈(N0)^I_fin} f_ν X^ν ∈ R[(N0)^I_fin],

    ǫ_x(f) = ∑_{ν∈(N0)^I_fin} f_ν x^ν = ∑_{ν∈(N0)^I_fin} ι(f_ν) µ(ν) = Φ( ∑_{ν∈(N0)^I_fin} f_ν X^ν ) = Φ(f)   (by (B.6)),

identifying ǫ_x to be the unital ring homomorphism given by (B.6) of Th. B.9.


The map ǫ_x is linear, since, for each λ ∈ R and f as before,

    ǫ_x(λf) = ∑_{ν∈(N0)^I_fin} (λf_ν) x^ν = λ ∑_{ν∈(N0)^I_fin} f_ν x^ν = λ ǫ_x(f).

We call x ∈ (R′)^I a zero or a root of f ∈ R[(N0)^I_fin] if, and only if, ǫ_x(f) = 0.
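The substitution homomorphism (B.12) is equally direct to implement for finitely
many variables, continuing the dictionary representation of the previous sketch (pure
Python; evaluate is a hypothetical helper name):

    # Sketch of ǫ_x from (B.12) for polynomials in n variables (pure Python).
    from math import prod

    def evaluate(f, x):
        # ǫ_x(f) = Σ_ν f_ν Π_i x_i^{ν(i)}, with f a dict: exponent tuple -> coefficient
        return sum(c * prod(xi ** e for xi, e in zip(x, nu)) for nu, c in f.items())

    # f = X1^2 + 2 X1 X2 in Z[X1, X2], evaluated at x = (3, 5): 9 + 30 = 39
    f = {(2, 0): 1, (1, 1): 2}
    print(evaluate(f, (3, 5)))   # 39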
Remark B.19. While Def. and Rem. B.18 is the analogue of Def. and Rem. 7.10
for polynomials in one variable, we note that condition (7.7) has been replaced by
the stronger assumption that R′ be commutative. This was used in the proof that
µ : (M, +) −→ (R′ , ·) is a homomorphism. It would suffice to replace (7.7) by the
assumption that, given x := (xi )i∈I ∈ (R′ )I , ab = ba holds for all a, b ∈ R ∪ {xi : i ∈ I},
but this still means that, for polynomials in several variables, one can, in general, no
longer use rings of matrices over R for R′ (one can no longer substitute matrices over R
for the variables of the polynomial).
Lemma B.20. Let (R, +, ·) be a commutative ring with unity and n ∈ N, n ≥ 2. If

    Φ : (R[X1, . . . , Xn−1])[Xn] ≅ R[X1, . . . , Xn]

is the ring isomorphism given by Cor. B.10 and R′ is a commutative ring extension of
R, then, for each x = (x₁, . . . , xₙ) ∈ (R′)^n and each (f₀, . . . , f_N) ∈ (R[X1, . . . , Xn−1])^{N+1},
N ∈ N0:

    ǫ_x( Φ( ∑_{k=0}^N f_k Xn^k ) ) = ǫ_{x_n}( ∑_{k=0}^N ǫ_{(x₁,...,x_{n−1})}(f_k) Xn^k ).

Proof. Suppose, for each k ∈ {0, . . . , N}, we have f_k = ∑_{ν∈(N0)^{n−1}} f_{ν,k} X^ν ∈
R[X1, . . . , Xn−1]. From Cor. B.10, we recall

    Φ( ∑_{k=0}^N ( ∑_{ν∈(N0)^{n−1}} f_{ν,k} X^ν ) Xn^k ) = ∑_{α∈(N0)^n} f_α X^α ,

where
    f_α := f_{ν,k}  if α(n) = k and α↾_{1,...,n−1} = ν,    f_α := 0  otherwise.
Thus, using the notation of (B.13), for each x = (x₁, . . . , xₙ) ∈ (R′)^n,

    ǫ_x( Φ( ∑_{k=0}^N ( ∑_{ν∈(N0)^{n−1}} f_{ν,k} X^ν ) Xn^k ) ) = ∑_{α∈(N0)^n} f_α x^α
        = ∑_{k=0}^N ∑_{ν∈(N0)^{n−1}} f_{ν,k} x_n^k ∏_{i=1}^{n−1} x_i^{ν(i)} = ∑_{k=0}^N ( ∑_{ν∈(N0)^{n−1}} f_{ν,k} ∏_{i=1}^{n−1} x_i^{ν(i)} ) x_n^k
        = ǫ_{x_n}( ∑_{k=0}^N ǫ_{(x₁,...,x_{n−1})}( ∑_{ν∈(N0)^{n−1}} f_{ν,k} X^ν ) Xn^k ),

thereby establishing the case. 


Notation B.21. Let R be a commutative ring with unity, let I be a set, and let R′
be a commutative ring extension of R. Moreover, let x := (xi )i∈I ∈ (R′ )I and let
ǫx : R[(N0 )Ifin ] −→ R′ be the substitution homomorphism of Def. and Rem. B.18. We
introduce the notation
    R[x] := R[(x_i)_{i∈I}] := Im ǫ_x ⊆ R′.

If R is an integral domain and L := R′ is a field, then we use R(x) := R((x_i)_{i∈I}) to




denote the field of fractions of R[x], which, using the isomorphism of Def. and Rem.
7.41, we consider to be a subset of L, i.e.
R ⊆ R[x] ⊆ R(x) ⊆ L.
If n ∈ N and x1 , . . . , xn ∈ R′ , then we also use the simplified notation R[x1 , . . . , xn ] and
R(x1 , . . . , xn ), respectively.
Proposition B.22. In the situation of Not. B.21, the following holds:

(a)     R[x] = ∩_{S∈S} S,   where S := { S ⊆ R′ : R ∪ {x_i : i ∈ I} ⊆ S ∧ S is a subring of R′ },

i.e. R[x] is the smallest subring of R′ , containing R as well as all xi , i ∈ I. More-


over, it also holds that [  
R[x] = R (xi )i∈J .
J⊆I: #J<∞

(b) If R is an integral domain and L := R′ is a field, then

        R(x) = ∩_{F∈F} F,   where F := { F ⊆ L : R ∪ {x_i : i ∈ I} ⊆ F ∧ F is a subfield of L },

    i.e. R(x) is the smallest subfield of L, containing R as well as all x_i, i ∈ I. More-
    over, it also holds that

        R(x) = ∪_{J⊆I: #J<∞} R((x_i)_{i∈J}).

Proof. (a): According to [Phi19, Ex. 4.36(c)], R[x] = Im ǫ_x is a subring of R′. Since
ǫ_x(r) = r ∈ Im ǫ_x for each r ∈ R as well as ǫ_x(X_i) = x_i for each i ∈ I, R ∪ {x_i : i ∈
I} ⊆ R[x], i.e. R[x] ∈ S and ∩_{S∈S} S ⊆ R[x]. To prove the remaining inclusion, let S
be a subring of R′ such that R ∪ {x_i : i ∈ I} ⊆ S. Since S is then closed under sums
and products, if f ∈ R[(N0)^I_fin], then ǫ_x(f) ∈ S, showing R[x] = Im ǫ_x ⊆ S and
R[x] ⊆ ∩_{S∈S} S. To prove the remaining representation of R[x], let
U := ∪_{J⊆I: #J<∞} R[(x_i)_{i∈J}]. Then U is a subring of R′ by [Phi19, Ex. 4.36(f)],
since the set M of finite subsets of I is partially ordered by inclusion with
J₁, J₂ ⊆ J₁ ∪ J₂ ∈ M for each J₁, J₂ ∈ M. Clearly, R ∪ {x_i : i ∈ I} ⊆ U, i.e. U ∈ S and
R[x] = ∩_{S∈S} S ⊆ U. To prove the remaining inclusion, note that J ⊆ I directly
implies R[(x_i)_{i∈J}] ⊆ R[x]. Thus, U ⊆ R[x], completing the proof.
(b) follows directly from (a) by applying Def. and Rem. 7.41 with R replaced by R[x].
The remaining representation of R(x) is proved precisely by the same argument as the
analogous representation of R[x] in (a). 

Theorem B.23. Let R be a commutative ring with unity, let I be a set, and consider
the map

    φ : R[(N0)^I_fin] −→ R^{(R^I)},   f ↦ φ(f),   φ(f)(x) := ǫ_x(f).     (B.15)

We define
    Pol(R, (x_i)_{i∈I}) := R[(x_i)_{i∈I}] := φ( R[(N0)^I_fin] )
and call the elements of Pol(R, (x_i)_{i∈I}) polynomial functions (in n variables for #I =
n ∈ N; in infinitely many variables for I being infinite).
I
(a) φ is a unital ring homomorphism (in particular, Pol(R, (xi )i∈I ) is a subring of R(R )
and φ is a unital ring epimorphism onto Pol(R, (xi )i∈I )). If R is a field, then φ
is also linear (in particular, Pol(R, (xi )i∈I ) is then a vector subspace of the vector
I
space R(R ) over R and φ is then a linear epimorphism onto Pol(R, (xi )i∈I )).

(b) If R is finite and I is nonempty, then φ is not a monomorphism.



(c) If F := R is an infinite field, then φ : R[(N0 )Ifin ] −→ Pol(R, (xi )i∈I ) is an isomor-
phism.
Proof. We first recall that we know R^{(R^I)} to be a commutative ring with unity from
[Phi19, Ex. 4.42(a)]. Similarly, if R is a field, then R^{(R^I)} is a vector space over R
according to [Phi19, Ex. 5.2(a)].
(a): If f = X 0 , then φ(f ) ≡ 1. We also know from Def. and Rem. B.18 that, for each
x ∈ RI , ǫx is a linear ring homomorphism. Thus, if f, g ∈ R[(N0 )Ifin ] and λ ∈ R, then,
for each x ∈ RI ,

    φ(f + g)(x) = ǫ_x(f + g) = ǫ_x(f) + ǫ_x(g) = (φ(f) + φ(g))(x),
    φ(fg)(x) = ǫ_x(fg) = ǫ_x(f) ǫ_x(g) = (φ(f) φ(g))(x),
    φ(λf)(x) = ǫ_x(λf) = λ ǫ_x(f) = (λ φ(f))(x).

Thus, φ is a unital linear ring epimorphism onto Pol(R, (x_i)_{i∈I}) (directly from the defini-
tion of Pol(R, (xi )i∈I )). In consequence, Pol(R, (xi )i∈I ) is a commutative ring with unity
by [Phi19, Prop. 4.37] and, if R is a field, then Pol(R, (xi )i∈I ) is a vector space over R
by [Phi19, Prop. 6.3(c)].
(b): If R and I are both finite, then R^{(R^I)} and Pol(R, (x_i)_{i∈I}) ⊆ R^{(R^I)} are finite as
well, whereas R[(N0)^I_fin] = R^{((N0)^I_fin)}_fin is infinite (if I ≠ ∅). In particular, φ cannot be
injective.
Now let I be infinite and let J ⊆ I be nonempty and finite. We recall the unital ring
monomorphism ι : R[(N0 )J ] −→ R[(N0 )Ifin ] introduced in Ex. B.8(c) and also consider
    φ_J : R[(N0)^J] −→ R^{(R^J)},   f ↦ φ_J(f),   φ_J(f)(x) := ǫ_x(f),

where we already know φ_J is not injective. If we define

    ι_J : R^{(R^J)} −→ R^{(R^I)},   ι_J(P)((x_i)_{i∈I}) := P((x_i)_{i∈J}),

then
    φ↾_{ι(R[(N0)^J])} = ι_J ◦ φ_J ◦ ι⁻¹ :                               (B.16)
Indeed, if f = ∑_{ν∈(N0)^J} f_ν X^ν ∈ R[(N0)^J], x := (x_i)_{i∈I} ∈ R^I and x_J := (x_i)_{i∈J}, then

    φ(ι(f))(x) = φ( ∑_{ν∈(N0)^J} f_ν ∏_{i∈J} X_i^{ν(i)} )(x) = ∑_{ν∈(N0)^J} f_ν ∏_{i∈J} x_i^{ν(i)}
               = φ_J( ∑_{ν∈(N0)^J} f_ν ∏_{i∈J} X_i^{ν(i)} )(x_J) = φ_J(f)(x_J) = ι_J(φ_J(f))(x),

thereby proving (B.16). Since ι : R[(N0 )J ] −→ ι(R[(N0 )J ]) is bijective and φJ is not


injective, (B.16) shows φ restricted to ι(R[(N0 )J ]) is not injective, i.e. φ is not injective.
(c): It remains to show φ is a monomorphism, i.e. ker φ = {0}. For polynomials in one
variable over an infinite field F, this was shown in Th. 7.17(c). Next, we use induction
to extend this result to polynomials in finitely many variables: Let F be an infinite field,
n ≥ 2, and let
Φ : (F [X1 , . . . , Xn−1 ])[Xn ] ∼
= F [X1 , . . . , Xn ]
denote the ring isomorphism given by Cor. B.10. Suppose f ∈ F[X1, . . . , Xn] with

    φ(f) = 0 ∈ F^{(F^n)}.

By induction, we have the isomorphisms

ψ : F [X1 , . . . , Xn−1 ] −→ Pol[F, (x1 , . . . , xn−1 )],


g 7→ ψ(g), ψ(g)(x1 , . . . , xn−1 ) := ǫ(x1 ,...,xn−1 ) (g)

and
φn : F [Xn ] −→ Pol[F ], h 7→ φn (h), φn (h)(xn ) := ǫxn (h).
Now let (f1 , . . . , fN ) ∈ (F [X1 , . . . , Xn−1 ])N , N ∈ N0 , such that Φ−1 (f ) = N k
P
k=0 fk Xn .
n
Then we know from Lem. B.20 that, for each x = (x1 , . . . , xn ) ∈ F ,
N
    0 = φ(f)(x) = ǫ_x(f) = ǫ_x( Φ( ∑_{k=0}^N f_k Xn^k ) ) = ǫ_{x_n}( ∑_{k=0}^N ǫ_{(x₁,...,x_{n−1})}(f_k) Xn^k ),

showing ∑_{k=0}^N ǫ_{(x₁,...,x_{n−1})}(f_k) Xn^k ∈ ker φ_n, i.e. ∑_{k=0}^N ǫ_{(x₁,...,x_{n−1})}(f_k) Xn^k = 0, since φ_n is
consequence, we have shown Φ−1 (f ) = 0 and f = 0 as well, proving φ to be injective, as
desired. This concludes the induction and the proof for the case of polynomials in finitely
many variables. It merely to consider the case, where φ(f ) = 0 and f ∈ F [(N0 )Ifin ], I
being infinite and F being and infinite field. According to Ex. B.8(c), f ∈ F [(N0 )f (I) ] ⊆
F [(N0 )Ifin ], where f (I) ⊆ I is the finite set defined in Ex. B.8(c). Thus, f is, actually, a
polynomial in only finitely many variables and, since we already know φ↾F [(N0 )f (I) ] to be
injective, φ(f ) = 0 implies f = 0, proving φ to be injective. 
Remark B.24. Let R be a commutative ring with unity, let I be a set. If P : R^I −→ R
is a polynomial function as defined in Th. B.23, then there exists f = ∑_{ν∈(N0)^I_fin} f_ν X^ν ∈
R[(N0)^I_fin] with P = φ(f). Thus, for each x = (x_i)_{i∈I} ∈ R^I,

    P(x) = ǫ_x(f) = ∑_{ν∈(N0)^I_fin} f_ν x^ν = ∑_{ν∈(N0)^I_fin} f_ν ∏_{i∈I} x_i^{ν(i)}     (B.17)

(all sums and products being, actually, finite). Thus, the polynomial functions are
precisely the linear combinations of the monomial functions x 7→ xν , ν ∈ (N0 )Ifin . Caveat:
In general, the representation of P given by (B.17) is not unique: For example, if R is
finite and I is nonempty, then it is not unique due to Th. B.23(b) (also cf. Ex. 7.18 and
Rem. 7.19).
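The failure of injectivity over a finite ring (Th. B.23(b)) can be seen concretely for
R = Z/pZ: by Fermat's little theorem, the nonzero polynomial X^p − X induces the
zero function. A one-line check (plain Python):

    # Sketch for Th. B.23(b) with p = 5: X^5 − X is nonzero, but φ(X^5 − X) = 0.
    p = 5
    values = [(x ** p - x) % p for x in range(p)]
    print(values)                 # [0, 0, 0, 0, 0]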
Remark B.25. In Cor. 7.37, we concluded that the ring S := F [X] is factorial for
each field F (i.e. each 0 6= a ∈ S \ S ∗ admits a factorization into prime elements, which
is unique up to the order of the primes and up to association), as a consequence of
F [X] being a principal ideal domain. One can actually show, much more generally,
that, if R is a factorial ring, then R[(N0 )Ifin ] is a factorial ring as well (proofs for the
case of polynomial rings in finitely many variables can be found, e.g., in [Bos13, Sec.
2.7] and [Lan05, Ch. 4§2] – the case, where I is infinite can then be treated by the
method we used several times above, using that each element f ∈ R[(N0 )Ifin ] is, actually,
a polynomial in only finitely many variables). The reason the general result does not
follow as easily as the one in Cor. 7.37 lies in the fact that, in general, R[(N0 )Ifin ] is no
longer a principal ideal domain.

C Quotient Rings
In [Phi19, Def. 4.26], we defined the quotient group G/N of a group G with respect
to a normal subgroup N ; in [Phi19, Sec. 6.2], we defined the quotient space V /U of a
vector space V over a field F with respect to a subspace U . For a ring R, there exists
an analogous notion, where ideals a ⊆ R now play the role of the normal subgroup.
For simplicity, we will restrict ourselves to the case of a commutative ring R with unity.
According to Def. 7.22(a), every ideal a ⊆ R is an additive subgroup of R and we can
form the quotient group R/a. We will see below that we can even make R/a into a
commutative ring with unity, called the quotient ring or factor ring of R with respect
to a, where a being an ideal guarantees the well-definedness of the multiplication on
R/a. As we write (R, +) as an additive group, we write the respective cosets (i.e. the
elements of R/a) as x + a, x ∈ R.
Theorem C.1. Let R be a commutative ring with unity and let a ⊆ R be an ideal in R.

(a) The compositions


+ : R/a × R/a −→ R/a, (x + a) + (y + a) := x + y + a, (C.1a)
· : R/a × R/a −→ R/a, (x + a) · (y + a) := xy + a, (C.1b)
are well-defined, i.e. the results do not depend on the chosen representatives of the
respective cosets.

(b) The natural (group) epimorphism of [Phi19, Th. 4.27(a)]

φa : R −→ R/a, φa (x) := x + a, (C.2)

satisfies

    ∀ x, y ∈ R :   φ_a(x + y) = φ_a(x) + φ_a(y),                        (C.3a)
    ∀ x, y ∈ R :   φ_a(xy) = φ_a(x) · φ_a(y).                           (C.3b)

(c) R/a with the compositions of (a) forms a commutative ring with unity and φa of
(b) constitutes a unital ring epimorphism.

Proof. (a): The composition + is well-defined by [Phi19, Th. 4.27(a)]. To verify that
· is well-defined as well, suppose x, y, x′, y′ ∈ R are such that x + a = x′ + a and
y + a = y′ + a. We need to show xy + a = x′y′ + a. There exist a_x, a_y ∈ a such that
x′ = x + a_x, y′ = y + a_y. Then we obtain

    xy + a = (x′ − a_x)(y′ − a_y) + a = x′y′ − x′a_y − a_x y′ + a_x a_y + a = x′y′ + a,
as needed, where we used that a is an additive subgroup of R and that az = za ∈ a for


each a ∈ a, z ∈ R.
(b): (C.3a) holds, as φa is a homomorphism with respect to + by [Phi19, Th. 4.27(a)].
To verify (C.3b), let x, y ∈ R. Then
(C.1b)
φa (xy) = xy + a = (x + a) · (y + a) = φa (x) · φa (y),

thereby establishing the case.


(c): In view of (b), R/a is a ring with unity by [Phi19, Prop. 4.37] and · is commutative
by [Phi19, Prop. 4.11(c)]. 
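For R = Z and a = (n), the well-definedness of the compositions (C.1) can be checked
directly, since the coset x + nZ is represented by x % n. A small sketch (plain Python;
representatives chosen arbitrarily within their cosets):

    # Sketch of Th. C.1 for R = Z, a = (n): compositions on Z/nZ are well-defined.
    n = 6
    x, x2 = 5, 5 + 4 * n          # same coset x + nZ
    y, y2 = 4, 4 - 7 * n          # same coset y + nZ
    assert (x + y) % n == (x2 + y2) % n    # well-defined addition (C.1a)
    assert (x * y) % n == (x2 * y2) % n    # well-defined multiplication (C.1b)
    print((x * y) % n)            # 2, i.e. (5 + 6Z)·(4 + 6Z) = 2 + 6Z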

Lemma C.2. Let (G, ·) be a group (not necessarily commutative) and let N ⊆ G be a
normal subgroup of G. Consider the map

ΦN : P(G) −→ P(G/N ), ΦN (A) := φN (A), (C.4)

where φN is the natural epimorphism of [Phi19, (4.25)].

(a) U is a subgroup of G if, and only if, ΦN (U ) is a subgroup of G/N .



(b) If one lets

        A := { U ∈ P(G) : U is a subgroup of G and N ⊆ U ⊆ G },
        B := { V ∈ P(G/N) : V is a subgroup of G/N },

    then Φ_N : A −→ B is bijective, where

        Φ_N⁻¹ : B −→ A,   Φ_N⁻¹(B) := φ_N⁻¹(B).                         (C.5)

Proof. (a) is merely a special case of [Phi19, Th. 4.20(a),(d)].


(b): According to (a), ΦN does map A into B. Since N = ker φN , (a) also shows
ΦN (A) = B. It remains to show ΦN is injective. To this end, suppose U, V ∈ A with
U ⊆ V , U 6= V . If x ∈ G such that φN (x) = xN ∈ φN (U ), then there exist u ∈ U and
nu , nx ∈ N such that xnx = unu , i.e. x = unu n−1 x ∈ U , since U is a subgroup of G with
N ⊆ U . The contraposition to what we have just shown says that v ∈ V \ U implies
φN (v) ∈/ φN (U ), i.e. φN (v) ∈ ΦN (V ) \ ΦN (U ), proving ΦN : A −→ B to be injective.
Now that we know ΦN : A −→ B to be bijective, (C.5) is clear from (C.4). 

Next, we formulate and prove a version of the previous lemma for commutative rings
with unity (where ideals now replace subgroups).
Lemma C.3. Let R be a commutative ring with unity and let a ⊆ R be an ideal in R.
Consider the map of (C.4), which we here denote as
Φa : P(R) −→ P(R/a), Φa (A) := φa (A).

(a) b is an ideal in R if, and only if, Φa (b) is an ideal in R/a.


(b) If one lets

        A_R := { b ∈ P(R) : b is an ideal in R and a ⊆ b ⊆ R },
        B_R := { B ∈ P(R/a) : B is an ideal in R/a },

    then Φ_a : A_R −→ B_R is bijective, where

        Φ_a⁻¹ : B_R −→ A_R ,   Φ_a⁻¹(B) := φ_a⁻¹(B).

Proof. (a) is merely a special case of Prop. 7.24(c).


(b): According to (a), Φa does map AR into BR . Since a = ker φa , (a) also shows
Φa (AR ) = BR . If A is the set of Lem. C.2(b), then AR ⊆ A and, thus, Φa : AR −→ BR
is injective by Lem. C.2(b). The representation of Φ−1
a is then also clear from (C.5). 

In [Phi19, Th. 4.27(b)], we proved the isomorphism theorem for groups: If G and H are
groups and φ : G −→ H is a homomorphism, then G/ker φ ≅ Im φ. Now let R be a
commutative ring with unity and let S be another ring. Since R and S are, in particular,
additive groups, if φ : R −→ S is a ring homomorphism, then (R/ker φ, +) ≅ (Im φ, +)
and, since ker φ = φ−1 {0} is an ideal in R by Prop. 7.24(b), it is natural to ask, whether
R/ ker φ and Im φ are isomorphic as rings. In Th. C.4 below we see this, indeed, to be
the case.
Theorem C.4 (Isomorphism Theorem). Let R be a commutative ring with unity and
let S be another ring. If φ : R −→ S is a ring homomorphism, then

    (R/ker φ, +, ·) ≅ (Im φ, +, ·).                                     (C.6)

More precisely, the map

f : R/ ker φ −→ Im φ, f (x + ker φ) := φ(x), (C.7)

is well-defined and constitutes an ring isomorphism. If fe : R −→ R/ ker φ denotes


the natural epimorphism and ι : Im φ −→ S, ι(x) := x, denotes the embedding, then
fm : R/ ker φ −→ S, fm := ι ◦ f , is a ring monomorphism such that

φ = fm ◦ fe . (C.8)

Proof. All assertions, except that f , fe , and fm are multiplicative homomorphisms, were
already proved in [Phi19, Th. 4.27(b)]. Moreover, fe is a multiplicative homomorphism
by (C.3b) and, thus, so is fm (by [Phi19, Prop. 4.11(a)]), once we have shown f to be a
multiplicative homomorphism. Thus, it merely remains to show

    f((x + ker φ)(y + ker φ)) = f(x + ker φ) f(y + ker φ)

for each x, y ∈ R. Indeed, if x, y ∈ R, then

    f((x + ker φ)(y + ker φ)) = f(xy + ker φ) = φ(xy) = φ(x) φ(y) = f(x + ker φ) f(y + ker φ),

as desired. 
Definition C.5. Let R be a commutative ring with unity and let a be an ideal in R.

(a) a is called a proper ideal if, and only if, a 6= R.

(b) a is called a prime ideal if, and only if, a is proper and
 
        ∀ x, y ∈ R :   ( xy ∈ a ⇒ x ∈ a ∨ y ∈ a ).

(c) a is called a maximal ideal if, and only if, a is proper and
 
        ∀ b ⊆ R, b an ideal :   ( a ⊆ b ⇒ b = a ∨ b = R ).

Lemma C.6. Let R be an integral domain, 0 6= p ∈ R. Then the following statements


are equivalent:

(i) p is prime.

(ii) (p) is prime.

Proof. Suppose p is prime. Then 0 6= p ∈ R \ R∗ , showing (p) to be proper. If x, y ∈ R


with xy ∈ (p), then xy = ap with a ∈ R, showing p| xy. As p is prime, this means
p| x ∨ p| y, showing x ∈ (p) or y ∈ (p), i.e. (p) is prime. Conversely, assume (p) to
be prime. Then p ∈ R \ R∗ , since (p) is proper. If p| xy, then there exists a ∈ R such
that pa = xy, i.e. xy ∈ (p). Thus, as (p) is prime, x ∈ (p) or y ∈ (p). If x ∈ (p), then
x = ax p with ax ∈ R, showing p| x; if y ∈ (p), then y = ay p with ay ∈ R, showing p| y.
In consequence, p is prime. 

Lemma C.7. Let R be a commutative ring with unity. Then the following statements
are equivalent:

(i) R is a field.

(ii) The ideal (0) is maximal in R.

Proof. If R is a field, then we know from Ex. 7.27(a) that (0) and R are the only ideals
in R, proving (0) to be maximal. Conversely, assume (0) to be a maximal ideal in R. If
0 ≠ x ∈ R, then the ideal (x) must be all of R (since (x) ≠ (0) and (0) is maximal).
Since (x) = R and 1 ∈ R, there exists y ∈ R such that xy = 1, showing R \ {0} = R∗
(every nonzero element of R is invertible), i.e. R is a field. 

Theorem C.8. Let R be a commutative ring with unity and let a be an ideal in R.

(a) The following statements are equivalent:

(i) a is proper.
(ii) R/a ≠ {0} (i.e. the quotient ring contains more than one element).

(b) The following statements are equivalent:

(i) a is prime.

(ii) R/a is an integral domain.

(c) The following statements are equivalent:

(i) a is maximal.
(ii) (0) is a maximal ideal in R/a.
(iii) R/a is a field (called the quotient field or factor field of R with respect to a).

Proof. (a): If a is not proper, then a = R and R/a = {0}. If a is proper, then there
exists x ∈ R \ a, i.e. x + a ≠ a (since a is an additive subgroup of R). Thus, R/a contains
at least two elements.
(b): If a is prime, then it is proper and, by (a), 0 ≠ 1 in R/a. Moreover, if x, y ∈ R such
that
a = (x + a)(y + a) = xy + a (C.9)
then xy ∈ a and, thus, x ∈ a or y ∈ a (as a is prime). In consequence, x + a = a or y + a = a, showing
R/a to be an integral domain. Conversely, if R/a is an integral domain, then 0 ≠ 1
in R/a and a is proper by (a). Moreover, if x, y ∈ R such that xy ∈ a, then (C.9)
holds, implying x + a = a or y + a = a (as the integral domain R/a has no nonzero zero
divisors). Thus, x ∈ a or y ∈ a, proving a to be prime.
(c): Letting AR and BR be as in Lem. C.3(b), we have the equivalences
(i) ⇔ #AR = 2 ⇔ #BR = 2 ⇔ (ii),

where the middle equivalence holds by Lem. C.3(b).

The equivalence “(ii)⇔(iii)” is provided by Lem. C.7. 
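For example, in Z, the ideal (4) is proper, but not prime (one has 2 · 2 ∈ (4), while 2 ∉ (4)); correspondingly, Z/4Z is not an integral domain, since (2 + (4))(2 + (4)) = 0 + (4). On the other hand, (5) is maximal in Z and Z/5Z is a field (cf. Ex. C.11 below).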

Corollary C.9. Let R be a commutative ring with unity and let a be an ideal in R. If
a is maximal, then a is prime.

Proof. If a is maximal, then (by Th. C.8(c)) R/a is a field, implying, in particular, that R/a
is an integral domain. Thus, by Th. C.8(b), a is prime. 

Theorem C.10. Let R be an integral domain and 0 ≠ a ∈ R \ R∗. Consider the


following statements:

(i) (a) is a maximal ideal in R.

(ii) a is prime.

(iii) a is irreducible.

Then (i) ⇒ (ii) ⇒ (iii) always holds. Moreover, if R is a principal ideal domain, then
the three statements are even equivalent.

Proof. “(i) ⇒ (ii)”: If (a) is maximal, then, according to Cor. C.9, (a) is prime. Then
a is prime by Lem. C.6.
“(ii) ⇒ (iii)” was already shown in Prop. 7.30(e).
Now assume R to be a principal ideal domain. It only remains to show “(iii) ⇒ (i)”.
To this end, suppose a is irreducible and b is such that (a) ⊆ (b) ⊆ R. Then a ∈ (b), i.e.
there exists c ∈ R such that a = cb. As a is irreducible, this implies c ∈ R∗ or b ∈ R∗ . If
b ∈ R∗ , then (b) = R. If c ∈ R∗ , then b = ac−1 , proving (b) ⊆ (a) and (a) = (b). Thus,
(a) is maximal in R. 
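Without the assumption that R is a principal ideal domain, (ii) does not imply (i): In the integral domain Z[X], the element X is prime by Lem. C.6 (as Z[X]/(X) ∼= Z is an integral domain, (X) is a prime ideal by Th. C.8(b)), but (X) is not maximal, since Z[X]/(X) ∼= Z is not a field (cf. Th. C.8(c)).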

Example C.11. We already know from Ex. 7.27(b) that Z is a principal ideal domain.
Thus, the ideals in Z are precisely the sets (n) = nZ with n ∈ N0 (where (n) = (−n)).
We also already know from [Phi19, Ex. 4.38] that the quotient ring Zn := Z/nZ is a
field if, and only if, n ∈ N is prime (in [Phi19, Ex. 4.38], we still avoided using the term
quotient ring). We can now concisely recover and summarize these previous results using
the notions and results of the present section by stating that the following assertions
are equivalent for n ∈ N:

(i) n is irreducible.

(ii) n is prime.

(iii) (n) = nZ is a prime ideal.

(iv) (n) = nZ is a maximal ideal.

(v) Zn is an integral domain.

(vi) Zn is a field.

Indeed, as Z is a principal ideal domain, (i) ⇔ (ii) ⇔ (iv) by Th. C.10. Moreover,
(iii) ⇔ (v) by Th. C.8(b) and (iv) ⇔ (vi) by Th. C.8(c). The still missing implication
(v) ⇒ (vi) was, actually, also already shown in [Phi19, Ex. 4.38] (it was shown that, if
n is not prime, then Zn has nonzero zero divisors).
Since Z has no nonzero zero divisors, (0) is a prime ideal in Z – it is the only prime
ideal in Z that is not a maximal ideal.


In the following Th. C.12, we use Zorn’s lemma [Phi19, Th. 5.22] to prove the existence
of maximal ideals. In Th. D.16 below, this result will then be used to establish the
existence of algebraic closures.

Theorem C.12. Let R be a commutative ring with unity and let a be a proper ideal
in R. Then there exists a maximal ideal m in R such that a ⊆ m (in particular, each
commutative ring with unity contains at least one maximal ideal).

Proof. Let S be the set of all proper ideals in R that contain a. Note that set inclusion
⊆ provides a partial order on S. If C ≠ ∅ is a totally ordered subset of S, then, by
Prop. 7.24(f), s := ⋃_{c∈C} c is an ideal in R. If c ∈ C, then 1 ∉ c, since c is proper. Thus,
1 ∉ s, showing s to be proper, i.e. s ∈ S provides an upper bound for C. Thus, Zorn’s
lemma [Phi19, Th. 5.22] applies, yielding a maximal element m ∈ S (i.e. maximal in S
with respect to ⊆), which is, thus, a maximal ideal in R that contains a. 

D Algebraic Field Extensions

D.1 Basic Definitions and Properties


Definition D.1. Let F ⊆ L be fields, i.e. let L be a field extension of F .

(a) We know from [Phi19, Ex. 5.2(b)] that L is a vector space over F . The dimension
of L as a vector space over F is denoted [L : F ] and is called the degree of L over
F . The field extension is called finite for [L : F ] < ∞ and infinite for [L : F ] = ∞.

(b) λ ∈ L is called algebraic over F if, and only if, ker ǫλ ≠ {0}, where ǫλ : F [X] −→ L is
the substitution homomorphism of Def. and Rem. 7.10 (i.e. if, and only if, ǫλ (f ) = 0
for some nonzero polynomial f with coefficients in F ); λ ∈ L is called transcendental
over F if, and only if, λ is not algebraic over F .

(c) The field extension L is called algebraic over F if, and only if, each λ ∈ L is algebraic
over F .

Example D.2. (a) If F is a field and λ ∈ F , then λ is algebraic over F due to ǫλ(X − λ) = 0.

(b) Consider the field extension C of R. Then [C : R] = 2 and C is algebraic over R,
since each z ∈ C satisfies ǫz(X^2 − (2 Re z) X + |z|^2) = 0: Indeed,

X^2 − (2 Re z) X + |z|^2 = X^2 − (z + z̄) X + z z̄ = (X − z)(X − z̄).

(c) Consider the field extension R of Q. For each q ∈ Q_0^+ and each n ∈ N, λ := q^{1/n} ∈ R is algebraic over Q, since ǫλ(X^n − q) = 0. The real numbers e, π ∈ R are transcendental over Q; however, the proof is fairly involved (see, e.g., [Lan05,
App. 1]). We can conclude [R : Q] = ∞ and that R is not algebraic over Q using
simple cardinality considerations: Since Qn is countable for each n ∈ N and R is
uncountable, we see [R : Q] = ∞. Since, for each n ∈ N0 , the set

Q[X]n := span{1, X, . . . , X n } = {f ∈ Q[X] : deg f ≤ n},

forms an (n + 1)-dimensional vector subspace of the vector space Q[X] over Q, each Q[X]n ∼= Q^{n+1} is countable, and Q[X] = ⋃_{n∈N0} Q[X]n is countable as well.
According to Cor. 7.14, 0 ≠ f ∈ Q[X]n ⊆ R[X]n has at most n zeros (in Q and in
R), showing that

S := {x ∈ R : x algebraic over Q} = ⋃_{n∈N} {x ∈ R : ǫx (f ) = 0 for some f ∈ Q[X]n}

is countable, i.e. S ⊊ R and, thus, R is not algebraic over Q.

Theorem D.3 (Degree Theorem). Consider fields F, K, L such that F ⊆ K ⊆ L. Then

[L : F ] = [L : K] · [K : F ], (D.1)

where the equation holds in N̄ := N ∪ {∞} if one uses the convention that n · ∞ =
∞ · n = ∞ for each n ∈ N̄.

Proof. Let BK be a basis of K as a vector space over F and let BL be a basis of L as a


vector space over K. It suffices to show that

B := {κ λ : κ ∈ BK ∧ λ ∈ BL }

is a basis of L as a vector space over F . We first show B to be linearly independent


over F . Suppose

∑_{i=1}^n ∑_{j=1}^m cij κi λj = 0,

where κ1 , . . . , κn are distinct elements from BK (n ∈ N), λ1 , . . . , λm are distinct elements


from BL (m ∈ N), and cij ∈ F . Since

0 = ∑_{i=1}^n ∑_{j=1}^m cij κi λj = ∑_{j=1}^m ( ∑_{i=1}^n cij κi ) λj ,

each ∑_{i=1}^n cij κi ∈ K, and λ1 , . . . , λm are linearly independent over K, we obtain

∀ j ∈ {1, . . . , m} : ∑_{i=1}^n cij κi = 0,

implying cij = 0 for all (i, j) ∈ {1, . . . , n} × {1, . . . , m}, due to the linear independence
of κ1 , . . . , κn over F . This completes the proof that B is linearly independent over F . It
remains to show B is a generating set for L over F . To this end, let α ∈ L. Then there
exists m ∈ N as well as λ1 , . . . , λm ∈ BL and β1 , . . . , βm ∈ K such that α = ∑_{j=1}^m βj λj .
Next, there exists n ∈ N as well as κ1 , . . . , κn ∈ BK and cij ∈ F such that

∀ j ∈ {1, . . . , m} : βj = ∑_{i=1}^n cij κi

(by using cij := 0 if necessary, we can use the same κ1 , . . . , κn for each βj ). Thus, we
obtain

α = ∑_{j=1}^m βj λj = ∑_{j=1}^m ∑_{i=1}^n cij κi λj ,

proving B to be a generating set for L over F . 
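As a concrete illustration of (D.1), consider the tower Q ⊆ Q(√2) ⊆ Q(√2, i): Here [Q(√2) : Q] = 2 (with basis BK = {1, √2}) and [Q(√2, i) : Q(√2)] = 2 (with basis BL = {1, i}), such that [Q(√2, i) : Q] = 2 · 2 = 4, where, as in the proof above, B = {1, √2, i, √2 i} forms a basis of Q(√2, i) over Q.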

Theorem D.4. Let F, L be fields such that L is a finite field extension of F (i.e. F ⊆ L
and [L : F ] < ∞). Then L is an algebraic field extension of F (however, there also exist
infinite algebraic field extensions, see Ex. D.12 below).

Proof. If [L : F ] = n ∈ N and λ ∈ L, then the n + 1 elements λ^0 , . . . , λ^n ∈ L are linearly
dependent over F , i.e. there exist c0 , . . . , cn ∈ F , not all zero, such that

ǫλ( ∑_{i=0}^n ci X^i ) = ∑_{i=0}^n ci λ^i = 0

and ∑_{i=0}^n ci X^i ≠ 0 ∈ F [X], showing λ to be algebraic over F . Since λ ∈ L was arbitrary,
L is algebraic over F . 
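For example, λ := 1 + √2 lies in the extension Q(√2) of Q with [Q(√2) : Q] = 2, so 1, λ, λ^2 must be linearly dependent over Q; indeed, λ^2 = 3 + 2√2 = 2λ + 1, i.e. ǫλ(X^2 − 2X − 1) = 0, showing λ to be algebraic over Q.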

Theorem D.5. Let F, L be fields with F ⊆ L, let α ∈ L be algebraic over F , and let
ǫα : F [X] −→ L be the corresponding substitution homomorphism.

(a) There exists a unique monic polynomial µα ∈ F [X] such that ker ǫα = (µα ). More-
over, this polynomial µα is both prime and irreducible, and it is the unique monic
polynomial f ∈ F [X] such that ǫα (f ) = 0 and such that f is of minimal degree in

ker ǫα . One calls µα the minimal polynomial9 or the irreducible polynomial of the
algebraic element α over F .

(b) If µα is the minimal polynomial of α over F as defined in (a), then one has

F [X]/ ker ǫα = F [X]/(µα ) ∼= Im ǫα = F [α] = F (α) ⊆ L, (D.2)

where F [α] and F (α) are according to Not. B.21. In particular, F [α] is a field
extension of F . The degree of this field extension is [F [α] : F ] = deg µα and, in
consequence, F [α] ∼= F [X]/ ker ǫα is algebraic over F .

Proof. (a): F [X] is a principal ideal domain according to Ex. 7.27(b). Thus, there
exists a monic µα ∈ F [X] such that ker ǫα = (µα ). If f ∈ F [X] is another monic
polynomial such that ker ǫα = (f ), then, by Prop. 7.30(c), f and µα are associated and,
by Prop. 7.7, there exists r ∈ F [X]∗ = F \ {0} such that f = r µα ; as both f and µα are monic, r = 1 and f = µα . To see µα is
the unique monic polynomial of minimal degree in ker ǫα , let g ∈ F [X]. Then g ∈ F or
deg(gµα ) = deg g + deg µα > deg µα by (7.5c). According to the isomorphism Th. C.4,
F [X]/ ker ǫα = F [X]/(µα ) ∼= Im ǫα ⊆ L. As a subring of the field L, Im ǫα is an integral
domain. Thus, (µα ) is prime by Th. C.8(b), implying µα to be prime by Lem. C.6 and
irreducible by Th. C.10.
(b): In the proof of (a), we already noticed (D.2) to hold due to the isomorphism Th.
C.4, except that we still need to verify F [α] = F (α), i.e. we need to show F [α] is a
field. However, by (a), µα is irreducible and, thus, (µα ) is maximal by Th. C.10. In
consequence, F [α] ∼= F [X]/(µα ) is a field by Th. C.8(c). Next, we show [F [α] : F ] = deg µα : Let

φ : F [X] −→ F [X]/(µα ), f ↦ f̄ := φ(f ) = f + (µα ),
be the canonical epimorphism. Suppose n := deg µα . If f ∈ F [X], then, according to
the remainder Th. 7.9, there exist unique polynomials q, r ∈ F [X] such that

f = q µα + r ∧ deg r < n,
i.e. f uniquely determines c0 , . . . , cn−1 ∈ F such that r = ∑_{i=0}^{n−1} ci X^i . Applying φ yields

f̄ = q̄ µ̄α + r̄ = r̄ = ∑_{i=0}^{n−1} ci X̄^i ,

9 According to (b), F (α) is a finite-dimensional vector space over F . Thus, we can consider the F -linear
endomorphism A : F (α) −→ F (α), A(x) := αx. Then µα is actually also the minimal polynomial of A
in the sense of Th. 8.6: Indeed, for each x ∈ F (α), ǫA (µα )(x) = ǫα (µα ) · x = 0 · x = 0, showing ǫA (µα ) = 0. For
each polynomial f ∈ F [X] such that ǫA (f ) = 0, one has ǫα (f ) = 0 and f = gµα for some g ∈ F [X],
showing µα | f .

showing {X̄^0 , . . . , X̄^{n−1}} to be a generating set for F [X]/(µα ) as a vector space over F .
To verify the set is also linearly independent, suppose a0 , . . . , an−1 ∈ F are such that

∑_{i=0}^{n−1} ai X̄^i = φ( ∑_{i=0}^{n−1} ai X^i ) = (µα ) = 0 ∈ F [X]/(µα ).

Then ∑_{i=0}^{n−1} ai X^i ∈ ker ǫα , implying a0 = · · · = an−1 = 0, since deg ∑_{i=0}^{n−1} ai X^i < n.
Thus, {X̄^0 , . . . , X̄^{n−1}} is a basis of F [X]/(µα ) and, using the isomorphism of (D.2),
{α^0 , . . . , α^{n−1}} is a basis of F [α], proving [F [α] : F ] = n = deg µα . Finally, as [F [α] :
F ] = deg µα < ∞, F (α) ∼= F [X]/ ker ǫα is algebraic over F by Th. D.4. 
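For example, for α := √2 and F := Q, one has µα = X^2 − 2 (monic, irreducible over Q, and ǫα (µα ) = 0), such that (b) yields [Q[√2] : Q] = 2 with basis {1, √2}. That Q[√2] = Q(√2) is a field can also be verified directly: for (a, b) ≠ (0, 0), one has (a + b √2)^{−1} = (a − b √2)/(a^2 − 2b^2) ∈ Q[√2], where a^2 − 2b^2 ≠ 0, since √2 ∉ Q.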

Definition D.6. Let F, L be fields with F ⊆ L and recall Not. B.21 as well as Prop.
B.22(b).

(a) L is called a simple field extension of F if, and only if, there exists λ ∈ L such that
L = F (λ). In this case, [F (λ) : F ] is called the degree of λ over F .

(b) L is called a finitely generated field extension of F if, and only if, there exist
λ1 , . . . , λn ∈ L, n ∈ N, such that L = F (λ1 , . . . , λn ).

Example D.7. Let F, L be fields with F ⊆ L. If τ ∈ L is transcendental over F , then


the field extension F (τ ) is finitely generated, but neither algebraic nor finite: Indeed,
since τ ∈ F (τ ), F (τ ) is not algebraic; F (τ ) is not finite, since {τ^n : n ∈ N0 } is linearly
independent over F : If

0 = ∑_{n=0}^N cn τ^n = ǫτ( ∑_{n=0}^N cn X^n )

with c0 , . . . , cN ∈ F , N ∈ N0 , then ∑_{n=0}^N cn X^n ∈ ker ǫτ , implying c0 = · · · = cN = 0,
since τ is transcendental over F . In combination with Cor. D.9 below, this example
shows that, for λ1 , . . . , λn ∈ L, n ∈ N, F (λ1 , . . . , λn ) is a finite field extension of F if,
and only if, the elements λ1 , . . . , λn are all algebraic over F .

Theorem D.8. Let F, L be fields with F ⊆ L. Suppose n ∈ N and α1 , . . . , αn ∈ L are


algebraic over F such that L = F (α1 , . . . , αn ). Then the following holds true:

(a) L = F (α1 , . . . , αn ) = F [α1 , . . . , αn ].

(b) L is a finite (and, thus, algebraic) field extension of F .

Proof. We carry out the proof via induction on n ∈ N, where the base case (n = 1)
was already done in Th. D.5(b). Now let n > 1. By induction, we know K :=

F (α1 , . . . , αn−1 ) = F [α1 , . . . , αn−1 ] to be a finite field extension of F . Since αn is alge-


braic over K (as αn is algebraic over F ⊆ K), Th. D.5(b) yields K[αn ] = K(αn ). Thus,
F [α1 , . . . , αn ] = K[αn ] is a field, proving L = F (α1 , . . . , αn ) = F [α1 , . . . , αn ], as desired.
Moreover, since [K : F ] < ∞ and [L : K] < ∞, Th. D.3 yields

[L : F ] = [L : K] · [K : F ] < ∞,

thereby completing the induction and the proof. 
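For example, Q(√2, √3) = Q[√2, √3] is a finite (and, thus, algebraic) field extension of Q; in particular, inverses can be written without denominators, e.g. (√2 + √3)^{−1} = √3 − √2 ∈ Q[√2, √3], due to (√3 + √2)(√3 − √2) = 3 − 2 = 1.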


Corollary D.9. Let F, L be fields with F ⊆ L. Then the following statements are
equivalent:

(i) L is a finite field extension of F .


(ii) L is finitely generated over F by finitely many algebraic elements.
(iii) L is a finitely generated algebraic field extension of F .

Proof. (ii) implies (i) and (iii) by Th. D.8. If (iii), then L = F (α1 , . . . , αn ) with
α1 , . . . , αn ∈ L. However, as L is algebraic over F , the elements α1 , . . . , αn ∈ L are
all algebraic over F , showing (iii) implies (ii). Finally, assume (i), i.e. there exists
n ∈ N and B := {α1 , . . . , αn } ⊆ L such that B forms a basis of L over F . Then
L = F (α1 , . . . , αn ) and α1 , . . . , αn are algebraic over F by Th. D.4, showing (i) implies
(ii). 
Corollary D.10. Let F, L be fields with F ⊆ L. Then the following statements are
equivalent:

(i) L is algebraic over F .


(ii) There exists a family A := (αi )i∈I in L of algebraic elements over F such that
L = F (A).

Proof. It is immediate that (i) implies (ii) (as one can choose A := L). Conversely, if
L = F (A), where A := (αi )i∈I is a family of algebraic elements over F , then, by Prop.
B.22(b), L = F (A) = ⋃_{J⊆I: #J<∞} F ((αi )i∈J ), where each F ((αi )i∈J ) with finite J is
algebraic over F by Th. D.8. Thus, each x ∈ L is algebraic over F , proving (i). 
Theorem D.11. Let F, K, L be fields such that F ⊆ K ⊆ L and let α ∈ L.

(a) If K is algebraic over F and α is algebraic over K, then α is algebraic over F .


(b) L is algebraic over F if, and only if, K is algebraic over F and L is algebraic over
K.

Proof. (a): Since α is algebraic over K, there exists n ∈ N and κ0 , . . . , κn ∈ K, not all
equal to 0, such that ∑_{i=0}^n κi α^i = 0. This shows that α is also already algebraic over
K0 := F (κ0 , . . . , κn ) ⊆ K and [K0 [α] : K0 ] < ∞ by Th. D.5(b). Moreover, [K0 : F ] < ∞
by Th. D.8 and, thus, [K0 [α] : F ] = [K0 [α] : K0 ] · [K0 : F ] < ∞ by Th. D.3. Thus, K0 [α]
is algebraic over F and, in particular, α is algebraic over F .
(b): If K is algebraic over F and L is algebraic over K, then L is algebraic over F by
(a). Conversely, if L is algebraic over F , then K ⊆ L is algebraic over F , and L is
algebraic over K, as F ⊆ K. 
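For example, α := 2^{1/4} is algebraic over K := Q(√2) (due to α^2 − √2 = 0) and K is algebraic over Q, such that (a) shows α to be algebraic over Q; indeed, ǫα (X^4 − 2) = 0.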
Example D.12. Consider the field of algebraic numbers, defined by

A := {α ∈ C : α is algebraic over Q}.

Indeed, A is a field, since, if α, β ∈ A, then A contains the field Q(α, β) by Th. D.8,
i.e., in particular, αβ, α + β, −α ∈ A and, for 0 ≠ α, α^{−1} ∈ A. Thus, A is an algebraic
field extension of Q, where an argument completely analogous to the one in Ex. D.2(c)
shows A to be countable and, in particular, A ⊊ C. One can show that A is not a
finite field extension of Q: Since (p^{1/n})^n − p = 0 for each n, p ∈ N, we have p^{1/n} ∈ A
for each n, p ∈ N. Since, for p prime, X^n − p ∈ Q[X] is the minimal polynomial of p^{1/n}
and deg(X^n − p) = n, one obtains [A : Q] ≥ [Q(p^{1/n}) : Q] = n, showing [A : Q] = ∞.
However, to actually prove X^n − p ∈ Q[X] to be the minimal polynomial of p^{1/n} for p
prime, one needs to show X^n − p is irreducible. This turns out to be somewhat tricky
and is usually obtained by using Eisenstein’s irreducibility criterion (cf., e.g., [Bos13,
Th. 2.8.1]).

D.2 Algebraic Closure


Definition D.13. If F, L are fields such that L is a field extension of F , then L is called
an algebraic closure of F if, and only if, L is both algebraically closed and algebraic over
F . In this case, one often writes F̄ for L.

The goal of the present section is to show every field is contained in an algebraic closure
(Th. D.16) and that all algebraic closures of a field are isomorphic (Cor. D.20). Both
results are based on suitable applications of Zorn’s lemma. In preparation for Th. D.16,
we first show in the following Th. D.14 that one can always extend a field F to a field
L such that a particular given polynomial over F has a zero in L.
Theorem D.14. Let F be a field and f ∈ F [X] such that deg f ≥ 1. Then there
exists an algebraic field extension L of F such that f has a zero in L (i.e. such that

ǫα (f ) = 0 for some α ∈ L). Moreover, if f is irreducible over F , then one may choose
L := F [X]/(f ).

Proof. It suffices to consider the case where f is irreducible over F : If f is not irre-
ducible, then, by Cor. 7.37, one writes f = f1 · · · fn with irreducible f1 , . . . , fn ∈ F [X],
n ∈ N, showing f to have a zero in L if an irreducible factor fi has a zero in L. Thus,
we now assume f to be irreducible. Then, according to Th. C.10, the ideal (f ) is
maximal in F [X], i.e. L := F [X]/(f ) is a field by Th. C.8(c). In the usual way, we
consider F ⊆ F [X], i.e. F [X] as a ring extension of F , and we consider the canonical
epimorphism
φ : F [X] −→ L = F [X]/(f ), φ(g) = g + (f ).
Then φ↾F : F −→ L is a unital ring homomorphism between fields and, thus, injective,
by Prop. 7.28. Thus, L is a field extension of F , where we can consider F ⊆ L, if we
identify F with φ(F ). We claim that φ(X) ∈ L is the desired zero of f : Indeed, if
c0 , . . . , cn ∈ F (n ∈ N) are such that f = ∑_{i=0}^n ci X^i , then

ǫφ(X)(f ) = ∑_{i=0}^n ci φ(X)^i = φ( ∑_{i=0}^n ci X^i ) = φ(f ) = 0 ∈ L,

where, in the notation, we did not distinguish between ci ∈ F and φ(ci ) ∈ L. We


note that L is algebraic over F by Th. D.5(b), since f is irreducible and, thus, after
multiplication by a suitable c ∈ F \ {0}, is the minimal polynomial of its zero φ(X) over F . 
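The classical example is F := R with the irreducible polynomial f := X^2 + 1: Here, L = R[X]/(X^2 + 1) is an algebraic field extension of R in which φ(X) is a zero of X^2 + 1, and L ∼= C via the isomorphism determined by φ(X) ↦ i.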
Corollary D.15. For a field F , the following statements are equivalent:

(i) F is algebraically closed.


(ii) If L is an algebraic field extension of F , then F = L.

Proof. “(i)⇒(ii)”: Suppose F is algebraically closed and let L be an algebraic field


extension of F . If α ∈ L, then α is algebraic over F . If µα ∈ F [X] is the corresponding
minimal polynomial, then, by Cor. 7.14 (and using that F is algebraically closed and that µα is monic),

µα = ∏_{j=1}^n (X − αj ), n = deg µα ∈ N, {α1 , . . . , αn } ⊆ F.

As µα is irreducible, this implies n = 1 and α1 = α ∈ F . As α ∈ L was arbitrary, this


yields F = L.
“(ii)⇒(i)”: Assuming (ii), we show F to be algebraically closed: To this end, let f ∈
F [X] with deg f ≥ 1. According to Th. D.14, there exists an algebraic field extension L
of F such that f has a zero in L. However, according to (ii), we have F = L, showing
f has a zero in F , i.e. F is algebraically closed. 

Theorem D.16. Every field F is contained in an algebraic closure F̄ .

Proof. Consider the set

I := {f ∈ F [X] : deg f ≥ 1}.
In a first step, we construct an algebraic field extension L1 of F such that each f ∈ I has
a zero in L1 : The method of construction is basically the same as in the proof of Th. D.14,
however, somewhat complicated by the fact that the set I is infinite. In consequence,
we now consider the polynomial ring in infinitely many variables F [(Xf )f ∈I ] (cf. Ex.
B.8(c)). As F [(Xf )f ∈I ] is a ring extension of F , we have, for each g ∈ I, the substitution
homomorphism
ǫXg : F [X] −→ F [(Xf )f ∈I ].
For each g ∈ I, we let g(Xg ) := ǫXg (g) (i.e. one may think of g(Xg ) as the element of
F [(Xf )f ∈I ] one obtains by replacing the “variable” X in g ∈ I by the “variable” Xg ).
Using Prop. 7.25, we may let a be the smallest ideal in F [(Xf )f ∈I ] containing all g(Xg )
with g ∈ I, i.e.

a := ({g(Xg ) : g ∈ I}).
We show a to be a proper ideal in F [(Xf )f ∈I ]: If a is not proper in F [(Xf )f ∈I ], then, using
Prop. 7.25(b), there exists n ∈ N and h1 , . . . , hn ∈ F [(Xf )f ∈I ] as well as g1 , . . . , gn ∈ I
such that

∑_{i=1}^n hi gi (Xgi ) = 1. (D.3)

After applying Th. D.14 n times, we obtain a field extension K of F such that each of
the gi has a zero αi ∈ K. Define x := (xf )f ∈I by letting
xf := αi for f = gi , and xf := 0 otherwise,

and consider the corresponding substitution homomorphism ǫx : F [(Xf )f ∈I ] −→ K.


Then

ǫx( ∑_{i=1}^n hi gi (Xgi ) ) = ∑_{i=1}^n ǫx (hi ) ǫαi (gi ) = 0,

in contradiction to (D.3). Thus, a is proper in F [(Xf )f ∈I ] as desired. By Th. C.12, there


exists a maximal ideal m in F [(Xf )f ∈I ] such that a ⊆ m. We can now define L1 := F [(Xf )f ∈I ]/m
and know L1 to be a field by Th. C.8(c). As before, we consider F ⊆ F [(Xf )f ∈I ], i.e.
F [(Xf )f ∈I ] as a ring extension of F , and we consider the canonical epimorphism

φ : F [(Xf )f ∈I ] −→ L1 = F [(Xf )f ∈I ]/m, φ(h) = h + m.



Then φ↾F : F −→ L1 is a unital ring homomorphism between fields and, thus, injective,
by Prop. 7.28. Thus, L1 is a field extension of F , where we can consider F ⊆ L1 , if
we identify F with φ(F ). We proceed analogously to the proof of Th. D.14 and show
φ(Xf ) ∈ L1 is the desired zero of f ∈ I: Indeed, if c0 , . . . , cn ∈ F (n ∈ N) are such that
f = ∑_{i=0}^n ci X^i ∈ I, then

ǫφ(Xf )(f ) = ∑_{i=0}^n ci φ(Xf )^i = φ( ∑_{i=0}^n ci (Xf )^i ) = φ(f (Xf )) = 0 ∈ L1
(the last equality due to f (Xf ) ∈ a ⊆ m),

where, in the notation, we did not distinguish between ci ∈ F and φ(ci ) ∈ L1 . Next, we
show L1 to be algebraic over F : With α := (φ(Xf ))f ∈I and ǫα : F [(Xf )f ∈I ] −→ L1 , we
have F [α] = Im ǫα as a subring of L1 . According to the isomorphism Th. C.4,

F [(Xf )f ∈I ]/ ker ǫα ∼= Im ǫα = F [α] ⊆ L1 .

Moreover, according to Cor. D.10, F (α) ⊆ L1 is algebraic over F and, using Th. D.8(a),
we have
F (α) = ⋃_{J⊆I: #J<∞} F ((φ(Xf ))f ∈J ) = ⋃_{J⊆I: #J<∞} F [(φ(Xf ))f ∈J ] = F [α].

Thus, F [α] is a field and ker ǫα is a maximal ideal in F [(Xf )f ∈I ]. Since a ⊆ ker ǫα , this
shows m = ker ǫα and L1 = F (α). In particular, L1 is algebraic over F .
One can now inductively iterate the above construction to obtain a sequence (Lk )k∈N of
fields such that
F ⊆ L1 ⊆ L2 ⊆ . . . ,
where, for each k ∈ N, Lk+1 is an algebraic field extension of Lk and each f ∈ Lk [X] with
deg f ≥ 1 has a zero in Lk+1 . Then, as a consequence of Th. D.11(b), each Lk is also
algebraic over F . If we now let

F̄ := ⋃_{k∈N} Lk ,

then F̄ is a field according to [Phi19, Ex. 4.36(f)] (actually, one first needs to extend +
and · to F̄ , which is straightforward, since for a, b ∈ F̄ , there exists k ∈ N such that
a, b ∈ Lk , and, thus, a + b and a · b are already defined in Lk – as the Lk are nested,
this yields a well-defined + and · on F̄ ). Now F̄ is an algebraic closure of F : Indeed,
if α ∈ F̄ , then α ∈ Lk for some k ∈ N and, thus, α is algebraic over F , proving F̄ to
be algebraic over F . If f ∈ F̄ [X], then, as f has only finitely many coefficients, there
exists k ∈ N such that f ∈ Lk [X]. Then f has a zero α ∈ Lk+1 ⊆ F̄ , showing F̄ to be
algebraically closed. 

It is remarked in [Bos13], after the proof of [Bos13, Th. 3.4.4], that one can show by
different means that, in the situation of the proof of Th. D.16 above, one actually has
L1 = F̄ , i.e. one does, in fact, obtain an algebraic closure of F in the first step of the
construction.
In preparation for showing that two algebraic closures of a field F are necessarily iso-
morphic, we need to briefly study extensions of homomorphisms between fields (this
topic is also of algebraic interest beyond its application here).
Lemma D.17. Let F, L be fields and let σ : F −→ L be a unital homomorphism. Then
σ extends to a map
σ : F [X] −→ L[X], f = ∑_{i=0}^n fi X^i ↦ f σ := ∑_{i=0}^n σ(fi ) X^i . (D.4)

(a) σ : F [X] −→ L[X] is still a unital homomorphism.

(b) Let f ∈ F [X], x ∈ F . If ǫx (f ) = 0, then ǫσ(x) (f σ ) = 0.


Proof. (a): If f, g ∈ F [X], where f = ∑_{i=0}^n fi X^i , g = ∑_{i=0}^n gi X^i , then

(f + g)σ = ∑_{i=0}^n σ(fi + gi ) X^i = ∑_{i=0}^n (σ(fi ) + σ(gi )) X^i
= ∑_{i=0}^n σ(fi ) X^i + ∑_{i=0}^n σ(gi ) X^i = f σ + g σ ,

(f g)σ = ∑_{i=0}^{2n} σ( ∑_{k+l=i} fk gl ) X^i = ∑_{i=0}^{2n} ( ∑_{k+l=i} σ(fk ) σ(gl ) ) X^i = f σ g σ ,

proving σ : F [X] −→ L[X] to be a homomorphism.


(b): If f = ∑_{i=0}^n fi X^i ∈ F [X], then

ǫσ(x)(f σ ) = ∑_{i=0}^n σ(fi ) σ(x)^i = σ( ∑_{i=0}^n fi x^i ) = σ(ǫx (f )) = σ(0) = 0,

as claimed. 
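For example, if σ : C −→ C is complex conjugation, then f σ is obtained from f by conjugating all coefficients, and (b) shows that the zeros of f σ are precisely the complex conjugates of the zeros of f ; in particular, for f ∈ R[X] (where f σ = f ), the nonreal zeros of f occur in conjugate pairs.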
Proposition D.18. Let F, K be fields such that K = F (α) is a simple algebraic field
extension of F , α ∈ K. Let L be another field and let σ : F −→ L be a unital
homomorphism. Let µα ∈ F [X] denote the minimal polynomial of α over F and let
µσα ∈ L[X] denote its image under σ according to (D.4).

(a) If τ : K −→ L is a homomorphism extending σ (i.e. τ ↾F = σ), then ǫτ (α) (µσα ) = 0.


(b) For each zero λ ∈ L of µσα , there exists a unique homomorphism τ : K −→ L such
that τ extends σ and τ (α) = λ.
(c) One has

#{τ : K −→ L : τ ↾F = σ and τ is a homomorphism}
= #{λ ∈ L : ǫλ (µσα ) = 0} ≤ deg µσα ≤ deg µα .


Proof. (a) is immediate from Lem. D.17(b), since ǫα (µα ) = 0 and µτα = µσα .
(b): By definition, K = F (α) is the field of fractions of F [α] = Im ǫα , ǫα : F [X] −→ K.
If τ1 , τ2 : K −→ L are homomorphisms extending σ such that τ1 (α) = τ2 (α) and f, g ∈
F [X] are such that α is not a zero of g, then

τ1( ǫα (f ) / ǫα (g) ) = ǫτ1 (α) (f σ ) / ǫτ1 (α) (g σ ) = ǫτ2 (α) (f σ ) / ǫτ2 (α) (g σ ) = τ2( ǫα (f ) / ǫα (g) ),

showing τ1 = τ2 , proving the uniqueness statement (one could have even omitted the
denominators in the previous computation, as we know K = F (α) = F [α] from Th.
D.5(b)). To prove the existence statement, let λ ∈ L be such that ǫλ (µσα ) = 0. Consider
the homomorphisms

ǫα : F [X] −→ F (α) = K, ψ := ǫλ ◦ σ : F [X] −→ L.

Then we know ker ǫα = (µα ) from Th. D.5(b). We also know (µα ) ⊆ ker ψ, since, for
each f ∈ F [X],

ψ(f µα ) = ǫλ (f σ µσα ) = ǫλ (f σ )ǫλ (µσα ) = ǫλ (f σ ) · 0 = 0.

If
φ : F [X] −→ F [X]/(µα ), φ(f ) = f + (µα ),
is the canonical epimorphism, then, by the isomorphism Th. C.4, we can write ǫα =
φα ◦ φ with a monomorphism φα : F [X]/(µα ) −→ K and we can write ψ = φψ ◦ φ
with a monomorphism φψ : F [X]/(µα ) −→ L. As mentioned above, we know K =
F (α) = F [α], implying ǫα to be surjective, i.e. φα is also an epimorphism and, thus, an
isomorphism. We claim
τ : K −→ L, τ := φψ ◦ φα^{−1}

to be the desired extension of σ with τ (α) = λ (as a composition of homomorphisms, τ


is a homomorphism). Indeed, for each x ∈ F ⊆ F [X],

φα( x + (µα ) ) = (φα ◦ φ)(x) = ǫα (x) = x ∈ K,

implying

τ (x) = (φψ ◦ φα^{−1})(x) = φψ( x + (µα ) ) = (φψ ◦ φ)(x) = ψ(x) = σ(x) ∈ L,

proving τ ↾F = σ. Analogously, for X ∈ F [X],



φα( X + (µα ) ) = (φα ◦ φ)(X) = ǫα (X) = α ∈ K,

implying

τ (α) = (φψ ◦ φα^{−1})(α) = φψ( X + (µα ) ) = (φψ ◦ φ)(X) = ψ(X) = ǫλ (X) = λ,

thereby establishing the case.


(c): The equality of cardinalities is immediate from combining (a) and (b). Then the
first estimate is due to Cor. 7.14 and the second estimate is due to (D.4). 
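For example, let F := Q, K := Q(√2) = Q(α) with α := √2, L := C, and let σ : Q −→ C be the inclusion. Then µα = X^2 − 2 = µσα has precisely the two zeros ±√2 in C, such that, by (b) and (c), there exist precisely two homomorphisms τ : Q(√2) −→ C with τ ↾Q = σ, given by τ (√2) = √2 (the inclusion) and τ (√2) = −√2, respectively.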
Theorem D.19. Let F, K be fields such that K is an algebraic field extension of F .
Let L be another field and let σ : F −→ L be a unital homomorphism.

(a) If L is algebraically closed, then there exists a homomorphism τ : K −→ L such


that τ ↾F = σ.

(b) If both K and L are algebraically closed, L is algebraic over σ(F ), and τ : K −→ L
is a homomorphism such that τ ↾F = σ, then τ is an isomorphism.

Proof. (a): The proof is basically a combination of Prop. D.18(b) with Zorn’s lemma:
To apply Zorn’s lemma of [Phi19, Th. A.52(iii)], we define a partial order on the set
M := {(H, ϕ) : F ⊆ H ⊆ K, H is a field, ϕ : H −→ L is a homomorphism with ϕ↾F = σ}

by letting

(H1 , ϕ1 ) ≤ (H2 , ϕ2 ) :⇔ H1 ⊆ H2 ∧ ϕ2 ↾H1 = ϕ1 .
Then (F, σ) ∈ M, i.e. M ≠ ∅. Every chain C ⊆ M has an upper bound, namely
(HC , ϕC ) with HC := ⋃_{(H,ϕ)∈C} H and ϕC (x) := ϕ(x), where (H, ϕ) ∈ C is chosen such
that x ∈ H (since C is a chain, the value of ϕC (x) does not actually depend on the
choice of (H, ϕ) ∈ C and is, thus, well-defined). Clearly, F ⊆ HC ⊆ K, HC is a field
by [Phi19, Ex. 4.36(f)], and ϕC is a homomorphism with ϕC ↾F = σ. Thus, Zorn’s lemma
applies, yielding a maximal element (Hmax , ϕmax ) ∈ M. We claim that Hmax = K:
Indeed, if there exists α ∈ K \ Hmax , then, by assumption, α is algebraic over F ,
and we may consider the minimal polynomial µα ∈ F [X] of α over F . If µσα ∈ L[X]

denotes the image of µα under σ according to (D.4), then µσα has a zero λ ∈ L, since
L is algebraically closed. Thus, by Prop. D.18(b), we can extend ϕmax to Hmax (α),
where Hmax ⊊ Hmax (α) ⊆ K, in contradiction to the maximality of (Hmax , ϕmax ). In
consequence, τ := ϕmax : K −→ L is the desired extension of σ.
(b): Under the hypotheses of (b), τ is injective by Prop. 7.28 and, thus, an isomorphism
between the fields K and τ (K), as τ (K) must be a field by [Phi19, Prop. 4.37]. In
consequence, if K is algebraically closed, then so is τ (K) (e.g. due to Lem. D.17(b)).
Since L is algebraic over σ(F ), L is algebraic over τ (K) ⊇ σ(F ), and Cor. D.15(ii) yields
τ (K) = L, showing τ to be an isomorphism between K and L as claimed. 

Corollary D.20. Let F be a field. If L1 and L2 are both algebraic closures of F , then
there exists an isomorphism φ : L1 ∼ = L2 such that φ↾F = IdF (however, this existence
result is nonconstructive, as it is based on an application of Zorn’s lemma).

Proof. We have Id : F −→ Li , i ∈ {1, 2}. According to Th. D.19(a), there exists a


homomorphism φ : L1 −→ L2 such that φ ↾F = Id. Since L2 is algebraic over F , it is
also algebraic over φ(L1 ) ⊇ F , i.e. φ is an isomorphism by Th. D.19(b). 

References
[Bos13] Siegfried Bosch. Algebra, 8th ed. Springer-Verlag, Berlin, 2013 (German).

[FJ03] Richard J. Fleming and James E. Jamison. Isometries on Banach Spaces:


Function Spaces. Monographs and Surveys in Pure and Applied Mathematics,
Vol. 129, CRC Press, Boca Raton, USA, 2003.

[For17] Otto Forster. Analysis 3, 8th ed. Springer Spektrum, Wiesbaden, Ger-
many, 2017 (German).

[Jac75] Nathan Jacobson. Lectures in Abstract Algebra II. Linear Algebra. Gradu-
ate Texts in Mathematics, Springer, New York, 1975.

[Kön04] Konrad Königsberger. Analysis 2, 5th ed. Springer-Verlag, Berlin, 2004


(German).

[Lan05] Serge Lang. Algebra, revised 3rd ed. Graduate Texts in Mathematics, Vol.
211, Springer, New York, 2005.

[Phi16a] P. Philip. Analysis I: Calculus of One Real Variable. Lecture Notes, LMU Mu-
nich, 2015/2016, AMS Open Math Notes Ref. # OMN:202109.111306, avail-
able in PDF format at
https://www.ams.org/open-math-notes/omn-view-listing?listingId=111306.

[Phi16b] P. Philip. Analysis II: Topology and Differential Calculus of Several Vari-
ables. Lecture Notes, LMU Munich, 2016, AMS Open Math Notes Ref. #
OMN:202109.111307, available in PDF format at
https://www.ams.org/open-math-notes/omn-view-listing?listingId=111307.

[Phi17a] P. Philip. Analysis III: Measure and Integration Theory of Several Variables.
Lecture Notes, LMU Munich, 2016/2017, AMS Open Math Notes Ref. #
OMN:202109.111308, available in PDF format at
https://www.ams.org/open-math-notes/omn-view-listing?listingId=111308.

[Phi17b] P. Philip. Functional Analysis. Lecture Notes, Ludwig-Maximilians-Universität, Germany, 2017, available in PDF format at
http://www.math.lmu.de/~philip/publications/lectureNotes/philipPeter_FunctionalAnalysis.pdf.

[Phi19] P. Philip. Linear Algebra I. Lecture Notes, Ludwig-Maximilians-Universität,


Germany, 2018/2019, available in PDF format at
http://www.math.lmu.de/~philip/publications/lectureNotes/philipPeter_LinearAlgebra1.pdf.

[Phi21] P. Philip. Numerical Mathematics I. Lecture Notes, Ludwig-Maximilians-Universität, Germany, 2020/2021, available in PDF format at
http://www.math.lmu.de/~philip/publications/lectureNotes/philipPeter_NumericalMathematics1.pdf.

[Rud73] W. Rudin. Functional Analysis. McGraw-Hill Book Company, New York, 1973.

[Str08] Gernot Stroth. Lineare Algebra, 2nd ed. Berliner Studienreihe zur Math-
ematik, Vol. 7, Heldermann Verlag, Lemgo, Germany, 2008 (German).

[Wer11] D. Werner. Funktionalanalysis, 7th ed. Springer-Verlag, Berlin, 2011 (Ger-


man).
