Linear Algebra II

Peter Philip∗

Lecture Notes
Originally Created for the Class of Spring Semester 2019 at LMU Munich
Includes Subsequent Corrections and Revisions†
Contents

1 Affine Subspaces and Geometry
1.1 Affine Subspaces
1.2 Affine Hull and Affine Independence
1.3 Affine Bases
1.4 Barycentric Coordinates and Convex Sets
1.5 Affine Maps
1.6 Affine Geometry

2 Duality
2.1 Linear Forms and Dual Spaces
2.2 Annihilators
2.3 Hyperplanes and Linear Systems
2.4 Dual Maps

3 Symmetric Groups

4 Multilinear Maps and Determinants

5 Direct Sums and Projections

6 Eigenvalues

References

∗E-Mail: [email protected]

†Resources used in the preparation of this text include [Bos13, For17, Lan05, Str08].
1 Affine Subspaces and Geometry
Thus, the affine subspaces of a vector space V are precisely the translations of vector
subspaces U of V , i.e. the cosets of subspaces U , i.e. the elements of quotient spaces
V /U .
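For instance, in V = R², the line M := (0, 1) + ⟨{(1, 1)}⟩ = {(x, x + 1) : x ∈ R} is an affine subspace (a coset of the subspace U := ⟨{(1, 1)}⟩), but not a vector subspace, since 0 ∉ M.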
Lemma 1.2. Let V be a vector space over the field F .
Affine spaces and vector spaces share many structural properties. In consequence, one can develop a theory of affine spaces that is in many respects analogous to the theory of vector spaces, as will be illustrated by some of the notions and results presented in the following. We start by defining so-called affine combinations, which are, for affine spaces, what linear combinations are for vector spaces:
Definition 1.4. Let V be a vector space over the field F with v_1, . . . , v_n ∈ V and λ_1, . . . , λ_n ∈ F, n ∈ N. Then ∑_{i=1}^{n} λ_i v_i is called an affine combination of v_1, . . . , v_n if, and only if, ∑_{i=1}^{n} λ_i = 1.
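For instance, 2v_1 − v_2 is an affine combination of v_1, v_2, since 2 + (−1) = 1, and, for char F ≠ 2, so is the midpoint 2⁻¹v_1 + 2⁻¹v_2.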
Theorem 1.5. Let V be a vector space over the field F, ∅ ≠ M ⊆ V. Then M is an affine subspace of V if, and only if, M is closed under affine combinations. More
precisely, the following statements are equivalent:
Proof. Exercise.
The following Th. 1.6 is the analogue of [Phi19, Th. 5.7] for affine spaces:
Theorem 1.6. Let V be a vector space over the field F .
(b) In contrast to intersections, unions of affine subspaces are almost never affine subspaces. More precisely, if M_1 and M_2 are affine subspaces of V and char F ≠ 2 (i.e. 1 ≠ −1 in F), then

M_1 ∪ M_2 is an affine subspace of V ⇔ M_1 ⊆ M_2 ∨ M_2 ⊆ M_1 (1.2)

(where “⇐” also holds for char F = 2, but cf. Ex. 1.7 below).
¹For char F = 2, (iii) does not imply (i) and (ii): Let F := Z_2 = {0, 1}. Let V be a vector space over F with #V ≥ 4 (e.g. V = F²). Let p, q, r ∈ V be distinct, M := {p, q, r} (i.e. #M = 3). If λ_1, λ_2 ∈ F with λ_1 + λ_2 = 1, then (λ_1, λ_2) ∈ {(0, 1), (1, 0)} and (iii) is trivially true. On the other hand, v := p + q + r is an affine combination of p, q, r, since 1 + 1 + 1 = 1 in F; but v ∉ M: v = p + q + r = p implies q = −r = r, a contradiction, and v = q and v = r likewise lead to contradictions (this counterexample was pointed out by Robin Mader).
The following Prop. 1.9 is the analogue of [Phi19, Prop. 5.9] for affine spaces:
Proposition 1.9. Let V be a vector space over the field F and ∅ 6= A ⊆ V .
(a) aff A is an affine subspace of V , namely the smallest affine subspace of V containing
A.
(b) aff A is the set of all affine combinations of elements from A, i.e.

aff A = { ∑_{i=1}^{n} λ_i a_i : n ∈ N ∧ λ_1, . . . , λ_n ∈ F ∧ a_1, . . . , a_n ∈ A ∧ ∑_{i=1}^{n} λ_i = 1 }. (1.3)
(c) If A ⊆ B ⊆ V , then aff A ⊆ aff B.
(d) A = aff A if, and only if, A is an affine subspace of V .
(e) aff aff A = aff A.
Proof. (a): Since A ⊆ aff A implies aff A ≠ ∅, (a) is immediate from Th. 1.6(a).
(b): Let W denote the right-hand side of (1.3). If M is an affine subspace of V and
A ⊆ M , then W ⊆ M , since M is closed under affine combinations, showing W ⊆ aff A.
On the other hand, suppose N, n_1, . . . , n_N ∈ N, a_{k1}, . . . , a_{k n_k} ∈ A for each k ∈ {1, . . . , N}, λ_{k1}, . . . , λ_{k n_k} ∈ F for each k ∈ {1, . . . , N}, and α_1, . . . , α_N ∈ F such that

∀_{k∈{1,...,N}} ∑_{i=1}^{n_k} λ_{ki} = ∑_{i=1}^{N} α_i = 1.

Then

∑_{k=1}^{N} α_k ∑_{i=1}^{n_k} λ_{ki} a_{ki} ∈ W,

since

∑_{k=1}^{N} ∑_{i=1}^{n_k} α_k λ_{ki} = ∑_{k=1}^{N} α_k = 1,
(i) aff A = M.

(ii) ⟨−v + A⟩ = U.
Proof. Exercise.
We will now define the notions of affine dependence/independence, which are, for affine
spaces, what linear dependence/independence are for vector spaces:
(b) A subset U of V is called affinely independent if, and only if, whenever 0 ∈ V is written as a linear combination of distinct elements of U such that the coefficients have sum 0, then all coefficients must be 0 ∈ F, i.e. if, and only if,

( n ∈ N ∧ W ⊆ U ∧ #W = n ∧ ∑_{u∈W} λ_u u = 0 ∧ ∀_{u∈W} λ_u ∈ F ∧ ∑_{u∈W} λ_u = 0 ) ⇒ ∀_{u∈W} λ_u = 0. (1.4)
Sets that are not affinely independent are called affinely dependent.
As a caveat, it is underlined that, in Def. 1.11(b) above, one does not consider affine combinations of the vectors u ∈ U, but special linear combinations (this is related to the fact that 0 is an affine combination of vectors in U only if aff U is a vector subspace of V).
(a) ∅ is affinely independent: Indeed, if U = ∅, then the left side of the implication in
(1.4) is always false (since W ⊆ U means #W = 0), i.e. the implication is true.
(c) Every set {v, w} with two distinct vectors v, w ∈ V is affinely independent (but not linearly independent in the case w = αv with some α ∈ F): here, 0 = λv − λw = λ(v − w) implies λ = 0, since v ≠ w.
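For instance, in F², the set {0, e_1, e_2} is affinely independent (λ_1 · 0 + λ_2 e_1 + λ_3 e_2 = 0 with λ_1 + λ_2 + λ_3 = 0 forces λ_2 = λ_3 = 0 and then λ_1 = 0), but it is linearly dependent, since it contains 0.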
Proposition 1.14. Let V be a vector space over the field F and U ⊆ V . Then the
following statements are equivalent:
Proof. Exercise.
The following Prop. 1.15 is the analogue of [Phi19, Prop. 5.14(a)-(c)] for affine spaces:
(a) U is affinely dependent if, and only if, there exists u0 ∈ U such that u0 is affinely
dependent on U \ {u0 }.
There is a close relationship between affine bases and vector space bases:
Proposition 1.17. Let V be a vector space over the field F , let M ⊆ V be an affine
subspace, and let B ⊆ M with v ∈ B. Then the following statements are equivalent:
The following Th. 1.18 is the analogue of [Phi19, Th. 5.17] for affine spaces:
Theorem 1.18. Let V be a vector space over the field F, let M ⊆ V be an affine subspace, and let ∅ ≠ B ⊆ V. Then the following statements (i) – (iii) are equivalent:
Proof. Let v ∈ B, and let B0 and U be as in Prop. 1.17 above. Then, due to Prop.
1.14, B is a maximal affinely independent subset of M if, and only if, B0 is a maximal
linearly independent subset of U . Moreover, due to Prop. 1.10, B is a minimal (affine)
generating set for M if, and only if, B0 is a minimal (linear) generating set for U . Thus,
the equivalences of Th. 1.18 follow by combining Prop. 1.17 with [Phi19, Th. 5.17].
The following Th. 1.19 is the analogue of [Phi19, Th. 5.23] for affine spaces:
Theorem 1.19. Let V be a vector space over the field F and let M ⊆ V be an affine
subspace.
(a) If S ⊆ M is affinely independent, then there exists an affine basis of M that contains
S.
(b) M has an affine basis B ⊆ M .
(c) Affine bases of M have a unique cardinality, i.e. if B ⊆ M and B̃ ⊆ M are both
affine bases of M , then there exists a bijective map φ : B −→ B̃.
(d) If B is an affine basis of M and S ⊆ M is affinely independent, then there exists
C ⊆ B such that B̃ := S ∪˙ C is an affine basis of M .
Theorem 1.20. Let V be a vector space over the field F and assume M ⊆ V is an
affine subspace with affine basis B of M . Then each vector v ∈ M has unique barycentric
coordinates with respect to the affine basis B, i.e., for each v ∈ M , there exists a unique
finite subset Bv of B and a unique map c : Bv −→ F \ {0} such that
v = ∑_{b∈B_v} c(b) b ∧ ∑_{b∈B_v} c(b) = 1. (1.5)
Proof. The existence of Bv and the map c follows from the fact that the affine basis B
is an affine generating set, aff B = M . For the uniqueness proof, consider finite sets
Bv , B̃v ⊆ B and maps c : Bv −→ F \ {0}, c̃ : B̃v −→ F \ {0} such that
v = ∑_{b∈B_v} c(b) b = ∑_{b∈B̃_v} c̃(b) b ∧ ∑_{b∈B_v} c(b) = ∑_{b∈B̃_v} c̃(b) = 1.

Extend both c and c̃ to A := B_v ∪ B̃_v by letting c(b) := 0 for b ∈ B̃_v \ B_v and c̃(b) := 0 for b ∈ B_v \ B̃_v. Then

0 = ∑_{b∈A} ( c(b) − c̃(b) ) b,
such that the affine independence of A implies c(b) = c̃(b) for each b ∈ A, which, in
turn, implies Bv = B̃v and c = c̃.
Example 1.21. With respect to the affine basis {0, 1} of R over R, the barycentric coordinates of 1/3 are 2/3 and 1/3, whereas the barycentric coordinates of 5 are −4 and 5.
Remark 1.22. Let V be a vector space over the field F and assume M ⊆ V is an affine
subspace with affine basis B of M .
(a) Caveat: In the literature, one also finds the notion of affine coordinates, however,
this notion of affine coordinates is usually (but not always, so one has to use care)
defined differently from the notion of barycentric coordinates as defined in Th. 1.20
above: For the affine coordinates, one designates one point x0 ∈ B to be the origin
of M . Let v ∈ M and let c : Bv −→ F \ {0} be the map yielding the barycentric
coordinates according to Th. 1.20. We write {x_0} ∪ B_v = {x_0, x_1, . . . , x_n} with distinct elements x_1, . . . , x_n ∈ M (if any) and we set c(x_0) := 0 in case x_0 ∉ B_v. Then

v = ∑_{i=0}^{n} c(x_i) x_i ∧ ∑_{i=0}^{n} c(x_i) = 1,

which, since 1 − ∑_{i=1}^{n} c(x_i) = c(x_0), is equivalent to

v = x_0 + ∑_{i=1}^{n} c(x_i)(x_i − x_0) ∧ ∑_{i=0}^{n} c(x_i) = 1.
One calls the c(x_1), . . . , c(x_n), given by the map c_a := c↾_{B_v\{x_0}}, the affine coordinates of v with respect to the affine coordinate system {x_0} ∪ (−x_0 + B) (for v = x_0, c_a turns out to be the empty map).
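For instance, continuing Ex. 1.21 with M = R, B = {0, 1}, and x_0 := 0: for v = 1/3, the barycentric coordinates are (2/3, 1/3), while the single affine coordinate is c(x_1) = 1/3, in accordance with v = x_0 + (1/3)(x_1 − x_0).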
(b) If x_1, . . . , x_n ∈ M are distinct points that are affinely independent and n := n · 1 ≠ 0 in F, then one sometimes calls

v := (1/n) ∑_{i=1}^{n} x_i ∈ M

the barycenter of x_1, . . . , x_n.
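For instance, for an affinely independent triple x_1, x_2, x_3 ∈ R² (the vertices of a nondegenerate triangle), the barycenter (x_1 + x_2 + x_3)/3 is the triangle's centroid; its barycentric coordinates with respect to {x_1, x_2, x_3} are (1/3, 1/3, 1/3).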
Definition and Remark 1.23. Let V be a vector space over R (we restrict ourselves to vector spaces over R, since, for a scalar λ, we will need to know what it means for λ to be positive, i.e. λ > 0 needs to be well-defined). Let v_1, . . . , v_n ∈ V and λ_1, . . . , λ_n ∈ R, n ∈ N. Then we call the affine combination ∑_{i=1}^{n} λ_i v_i of v_1, . . . , v_n a convex combination of v_1, . . . , v_n if, and only if, in addition to ∑_{i=1}^{n} λ_i = 1, one has λ_i ≥ 0 for each i ∈ {1, . . . , n}. Moreover, we call C ⊆ V convex if, and only if, C is closed under convex combinations, i.e. if, and only if, n ∈ N, v_1, . . . , v_n ∈ C, and λ_1, . . . , λ_n ∈ R_0^+ with ∑_{i=1}^{n} λ_i = 1, implies ∑_{i=1}^{n} λ_i v_i ∈ C (analogous to Th. 1.5, C ⊆ V is then convex if, and only if, each convex combination of merely two elements of C is again in C). Note that, in contrast to affine subspaces, we allow convex sets to be empty. Clearly, the convex subsets of R are precisely the intervals (open, closed, half-open, bounded or unbounded). Convex subsets of R² include triangles and disks. Analogous to the proof of Th. 1.6(a), one can show that arbitrary intersections of convex sets are always convex, and, analogous to the definition of the affine hull in Def. 1.8, one defines the convex hull conv A of a set A ⊆ V by letting

C := { C ∈ P(V) : A ⊆ C ∧ C is a convex subset of V },

conv A := ⋂_{C∈C} C.
Then Prop. 1.9 and its proof still work completely analogously in the convex situation
and one obtains conv A to be the smallest convex subset of V containing A, where conv A
consists precisely of all convex combinations of elements from A; A = conv A holds if, and
only if, A is convex; conv conv A = conv A; and conv A ⊆ conv B for each A ⊆ B ⊆ V .
If n ∈ N0 and A = {x0 , x1 , . . . , xn } ⊆ V is an affinely independent set, consisting of
the n + 1 distinct points x0 , x1 , . . . , xn , then conv A is called an n-dimensional simplex
(or simply an n-simplex) with vertices x0 , x1 , . . . , xn – 0-simplices are called points, 1-
simplices line segments, 2-simplices triangles, and 3-simplices tetrahedra. If {e1 , . . . , ed }
denotes the standard basis of Rd , d ∈ N, then conv{e1 , . . . , en+1 }, 0 ≤ n < d, is called
the standard n-simplex in Rd .
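For instance, the standard 1-simplex in R², conv{e_1, e_2} = {(t, 1 − t) : t ∈ [0, 1]}, is the line segment joining (1, 0) and (0, 1), and the standard 2-simplex in R³ is the triangle with vertices e_1, e_2, e_3.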
T_v : V −→ V, T_v(x) := x + v,

(c) Nontrivial translations are not linear: More precisely, T_v with v ∈ V is linear if, and only if, v = 0 (i.e. T_v = Id).
Proof. Exercise.
We will now define affine maps, which are, for affine spaces, what linear maps are for
vector spaces:
Definition 1.26. Let V and W be vector spaces over the field F. A map A : V −→ W is called affine if, and only if, there exists a linear map L ∈ L(V, W) and w ∈ W such that

∀_{x∈V} A(x) = (T_w ◦ L)(x) = w + L(x) (1.6)

(i.e. the affine maps are precisely the compositions of linear maps with translations). We denote the set of all affine maps from V into W by A(V, W).
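For instance, A : R −→ R, A(x) := 2x + 3, is affine (here L(x) = 2x and w = 3), but not linear, since A(0) = 3 ≠ 0.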
proving L ◦ Tv = TLv ◦ L.
A = T_w ◦ L ⇔ T_{−w} ◦ A = T_{−w} ◦ T_w ◦ L = Id ◦ L = L.
Ax = w + Lx = w + Ly = Ay ⇔ Lx = Ly,
z + w = Ax = w + Lx ⇔ z = Lx,
z − w = Lx ⇔ z = w + Lx = Ax,
proving A = Tw ◦ L is injective (resp. surjective, resp. bijective) if, and only if, L is
injective (resp. surjective, resp. bijective).
(c): If A = Tw ◦ L with L ∈ L(V, W ) and w ∈ W is affine and bijective, then, by (b), L
is bijective. Thus, A−1 = L−1 ◦ (Tw )−1 = L−1 ◦ T−w , which is affine by (a).
(d): If A = Tw ◦ L, B = Tx ◦ K with L ∈ L(V, W ), w ∈ W , K ∈ L(W, X), x ∈ X, then
∀_{a∈V} (B ◦ A)(a) = B(w + La) = x + Kw + (K ◦ L)(a) = ( T_{Kw+x} ◦ (K ◦ L) )(a),
showing B ◦ A to be affine.
(e) is an immediate consequence of (c) and (d).
(in particular, each affine image of an affine subspace is an affine subspace). Moreover, if A := T_w ◦ L and S ⊆ V such that M := v + U = aff S, then A(M) = w + Lv + L(U) = aff(A(S)).
∀_{v∈L^{−1}{y}} L^{−1}(y + U) = v + L^{−1}(U)
Proof. Exercise.
The following Prop. 1.29 is the analogue of [Phi19, Prop. 6.5(a),(b)] for affine spaces
(but cf. Caveat 1.30 below):
Proposition 1.29. Let V and W be vector spaces over the field F , and let A : V −→ W
be affine.
The following Prop. 1.32 shows that affine subspaces are precisely the images of vector
subspaces under translations and also precisely the sets of solutions to linear systems
with nonempty sets of solutions:
Proposition 1.32. Let V be a vector space over the field F and M ⊆ V . Then the
following statements are equivalent:
(iii) There exists a linear map L ∈ L(V, V) and a vector b ∈ V such that ∅ ≠ M = L^{−1}{b} = {x ∈ V : Lx = b} (if V is finite-dimensional, then L^{−1}{b} = L(L|b), where L(L|b) denotes the set of solutions to the linear system Lx = b according to [Phi19, Rem. 8.3]).
Proof. “(i)⇔(ii)”: By the definition of affine subspaces, (i) is equivalent to the existence
of v ∈ V and a vector subspace U ⊆ V such that M = v + U = Tv (U ), which is (ii).
“(iii)⇒(i)”: Let L ∈ L(V, V) and b ∈ V such that ∅ ≠ M = L^{−1}{b}. Let x_0 ∈ M. Then,
by [Phi19, Th. 4.20(f)], M = x0 + ker L, showing M to be an affine subspace.
“(i)⇒(iii)”: Now suppose M = v + U with v ∈ V and U a vector subspace of V .
According to [Phi19, Th. 5.27(c)], there exists a subspace W of V such that V = U ⊕W .
Then, clearly, L : V −→ V , L(u + w) := w (where u ∈ U , w ∈ W ), defines a linear map.
Let b := Lv. Then M = L−1 {b}: Indeed, if u ∈ U , then L(v + u) = Lv + 0 = Lv = b,
showing M ⊆ L−1 {b}; if L(u+w) = w = b = Lv, then u+w = v+u+w−v ∈ v+U = M
(since L(u + w − v) = Lw − Lv = w − Lv = 0 implies u + w − v ∈ U ), showing
L−1 {b} ⊆ M .
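For instance, for V = R², the line M := {(x_1, x_2) ∈ R² : x_1 + x_2 = 1} is both the translate (1, 0) + ⟨{(1, −1)}⟩ = T_{(1,0)}(⟨{(1, −1)}⟩) of a vector subspace and the solution set L^{−1}{b} of a linear system, e.g. with L : R² −→ R², L(x_1, x_2) := (x_1 + x_2, 0), and b := (1, 0).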
The following Th. 1.33 is the analogue of [Phi19, Th. 6.9] for affine spaces:
Theorem 1.33. Let V and W be vector spaces over the field F . Moreover, let MV =
v + UV ⊆ V and MW = w + UW ⊆ W be affine subspaces of V and W , respectively,
where v ∈ V , w ∈ W , UV is a vector subspace of V and UW is a vector subspace of
W . Let BV be an affine basis of MV and let BW be an affine basis of MW . Then the
following statements are equivalent:
Analogous to [Phi19, Def. 6.17], we now consider, for vector spaces V, W over the field F, A(V, W) with pointwise addition and scalar multiplication, letting, for each A, B ∈ A(V, W), λ ∈ F,

(A + B) : V −→ W, (A + B)(x) := A(x) + B(x),

(λ · A) : V −→ W, (λ · A)(x) := λ · A(x).
The following Th. 1.34 corresponds to [Phi19, Th. 6.18] and [Phi19, Th. 6.21] for linear
maps.
Theorem 1.34. Let V and W be vector spaces over the field F . Addition and scalar
multiplication on A(V, W ), given by the pointwise definitions above, are well-defined in
the sense that, if A, B ∈ A(V, W ) and λ ∈ F , then A+B ∈ A(V, W ) and λA ∈ A(V, W ).
Moreover, with these pointwise defined operations, A(V, W ) forms a vector space over
F.
Proof. According to [Phi19, Ex. 5.2(c)], it only remains to show that A(V, W) is a vector subspace of F(V, W) = W^V. To this end, let A, B ∈ A(V, W) with A = T_{w_1} ◦ L_1, B = T_{w_2} ◦ L_2, where w_1, w_2 ∈ W, L_1, L_2 ∈ L(V, W), and let λ ∈ F. If v ∈ V, then
(A + B)(v) = w1 + L1 v + w2 + L2 v = w1 + w2 + (L1 + L2 )v,
(λA)(v) = λw1 + λL1 v,
proving A + B = Tw1 +w2 ◦ (L1 + L2 ) ∈ A(V, W ) and λA = Tλw1 ◦ (λL1 ) ∈ A(V, W ), as
desired.
(a) If M ∥ N, then M I N or M ∩ N = ∅.

(b) If n ∈ N_0 and A_n denotes the set of n-dimensional affine subspaces of V, then the parallelity relation of Def. 1.35(b) constitutes an equivalence relation on A_n.

(c) If A denotes the set of all affine subspaces of V, then, for dim V ≥ 2, the parallelity relation of Def. 1.35(b) is not transitive (in particular, not an equivalence relation) on A.
(a) If x, y ∈ V with x ≠ y, then there exists a unique line l ⊆ V (i.e. a unique affine subspace l of V with dim l = 1) such that x, y ∈ l. Moreover, this affine subspace is given by

l = x + ⟨{x − y}⟩. (1.7)
(b) If x, y, z ∈ V and there does not exist a line l ⊆ V such that x, y, z ∈ l, then there
exists a unique plane p ⊆ V (i.e. a unique affine subspace p of V with dim p = 2)
such that x, y, z ∈ p. Moreover, this affine subspace is given by
Proof. Exercise.
Proposition 1.39. Let V, W be vector spaces over the field F and let M, N ⊆ V be
affine subspaces.
(b) If v ∈ V, then T_v(M) ∥ M.

Proof. (a): Let A ∈ A(V, W). Then M I N implies A(M) I A(N), since M ⊆ N implies A(M) ⊆ A(N) and A(M), A(N) are affine subspaces of W due to Prop. 1.28(a). Moreover, if M = v + U_M, N = w + U_N with v, w ∈ V and U_M, U_N vector subspaces of V, A = T_x ◦ L with x ∈ W and L ∈ L(V, W), then A(M) = x + Lv + L(U_M) and A(N) = x + Lw + L(U_N), such that M ∥ N implies A(M) ∥ A(N), since U_M ⊆ U_N implies L(U_M) ⊆ L(U_N).
(b) is immediate from Tv (M ) = v + w + U for M = w + U with w ∈ V and U a vector
subspace of V .
2 Duality
(b) Let I be a nonempty set, V := F(I, F) = F^I (i.e. the vector space of functions from I into F). Then, for each i ∈ I, the projection onto the ith coordinate,

π_i : V −→ F, π_i(f) := f(i),

defines a linear form on V.

(d) Let a, b ∈ R, a ≤ b, I := [a, b], and let V := R(I, K) be the set of all K-valued Riemann integrable functions on I. Then the integral

J : V −→ K, J(f) := ∫_I f,

defines a linear form on V.
(a) The functions from V into F (i.e. the elements of F(V, F) = F^V) are called functionals or forms on V. In particular, the elements of L(V, F) are called linear functionals or linear forms on V.
Corollary 2.3. Let V be a vector space over the field F. Then each linear form α : V −→ F is uniquely determined by its values on a basis of V. More precisely, if B is a basis of V, (λ_b)_{b∈B} is a family in F, and, for each v ∈ V, c_v : B_v −→ F \ {0}, B_v ⊆ B, is the corresponding coordinate map (i.e. v = ∑_{b∈B_v} c_v(b) b for each v ∈ V), then

α : V −→ F, α(v) = α( ∑_{b∈B_v} c_v(b) b ) := ∑_{b∈B_v} c_v(b) λ_b, (2.2)

defines a linear form on V with α(b) = λ_b for each b ∈ B; moreover, α̃ ∈ V′ with ∀_{b∈B} α̃(b) = λ_b implies α = α̃.
Corollary 2.4. Let V be a vector space over the field F and let B be a basis of V .
Using Cor. 2.3, define linear forms α_b ∈ V′ by letting

∀_{(b,a)∈B×B} α_b(a) := δ_{ba} = { 1 for a = b, 0 for a ≠ b }. (2.3)

Define

B′ := { α_b : b ∈ B }. (2.4)
Proof. Cor. 2.4(a),(b),(c) constitute special cases of the corresponding cases of [Phi19,
Th. 6.19].
Definition 2.5. If V is a vector space over the field F with dim V = n ∈ N and B := (b_1, . . . , b_n) is an ordered basis of V, then we call B′ := (α_1, . . . , α_n), where

∀_{i∈{1,...,n}} ( α_i ∈ V′ ∧ α_i(b_j) = δ_{ij} ), (2.5)

the ordered dual basis of B (and B the ordered dual basis of B′) – according to Cor. 2.4(b), B′ is, indeed, an ordered basis of V′.
Notation 2.7. Let V be a vector space over the field F, dim V = n ∈ N, with ordered basis B = (b_1, . . . , b_n). Moreover, let B′ = (α_1, . . . , α_n) be the corresponding ordered dual basis of V′. If one then denotes the coordinates of v ∈ V with respect to B as the column vector

v = (v_1, . . . , v_n)^t,

then one typically denotes the coordinates of γ ∈ V′ with respect to B′ as the row vector

γ = (γ_1 . . . γ_n)

(this has the advantage that one then can express γ(v) as a matrix product, cf. Rem. 2.8(a) below).
(a) We obtain

γ(v) = ( ∑_{k=1}^{n} γ_k α_k )( ∑_{l=1}^{n} v_l b_l ) = ∑_{l=1}^{n} ∑_{k=1}^{n} γ_k v_l α_k(b_l) = ∑_{l=1}^{n} ∑_{k=1}^{n} γ_k v_l δ_{kl} = ∑_{k=1}^{n} γ_k v_k = (γ_1 . . . γ_n)(v_1, . . . , v_n)^t.
(b) Let B̃_V := (ṽ_1, . . . , ṽ_n) be another ordered basis of V and (c_{ji}) ∈ GL_n(F) such that

∀_{i∈{1,...,n}} ṽ_i = ∑_{j=1}^{n} c_{ji} v_j.

If B̃′_V := (α̃_1, . . . , α̃_n) denotes the ordered dual basis corresponding to B̃_V and (d_{ji}) := (c_{ji})^{−1}, then

∀_{i∈{1,...,n}} α̃_i = ∑_{j=1}^{n} (d^t)_{ji} α_j = ∑_{j=1}^{n} d_{ij} α_j,
Proposition 2.9. Let V be a vector space over the field F . If U is a vector subspace of
V and v ∈ V \ U , then there exists α ∈ V ′ , satisfying
α(v) = 1 ∧ ∀_{u∈U} α(u) = 0.
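For instance, for V = F², U = ⟨{e_1}⟩, and v = e_2 ∈ V \ U, the coordinate projection α := π_2 (i.e. α(x_1, x_2) = x_2) satisfies α(v) = 1 and α(u) = 0 for each u ∈ U.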
(b) The dual of V ′ is called the bidual or the second dual of V . One writes V ′′ := (V ′ )′ .
2.2 Annihilators
Definition 2.13. Let V be a vector space over the field F, M ⊆ V, S ⊆ V′. Moreover, let Φ : V −→ V′′ denote the canonical embedding of (2.7). Then

M^⊥ := { α ∈ V′ : ∀_{v∈M} α(v) = 0 } = { V′ for M = ∅, ⋂_{v∈M} ker(Φv) for M ≠ ∅ }

is called the (forward) annihilator of M in V′, and

S^⊤ := { v ∈ V : ∀_{α∈S} α(v) = 0 } = { V for S = ∅, ⋂_{α∈S} ker α for S ≠ ∅ }

is called the (backward) annihilator of S in V. In view of Rem. 2.15 and Ex. 2.16(b) below, one also calls v ∈ V and α ∈ V′ such that

α(v) = ⟨v, α⟩ = 0 (2.6)

perpendicular or orthogonal.
M^⊥ = ⟨M⟩^⊥, S^⊤ = ⟨S⟩^⊤. (2.8)
Proof. Since M^⊥ and S^⊤ are both intersections of kernels of linear maps, they are subspaces, since kernels are subspaces by [Phi19, Prop. 6.3(c)] and intersections of subspaces are subspaces by [Phi19, Th. 5.7(a)]. Moreover, it is immediate from Def. 2.13 that M^⊥ ⊇ ⟨M⟩^⊥ and S^⊤ ⊇ ⟨S⟩^⊤. On the other hand, consider α ∈ M^⊥ and v ∈ S^⊤. Let λ_1, . . . , λ_n ∈ F, n ∈ N. If v_1, . . . , v_n ∈ M, then

α( ∑_{i=1}^{n} λ_i v_i ) = ∑_{i=1}^{n} λ_i α(v_i) = 0  (since α ∈ M^⊥),
Remark 2.15. On real vector spaces V, one can study so-called inner products (also called scalar products), ⟨·, ·⟩ : V × V −→ R, (v, w) ↦ ⟨v, w⟩ ∈ R, which, as part of their definition, have the requirement of being bilinear forms, i.e., for each v ∈ V, ⟨v, ·⟩ : V −→ R is a linear form and, for each w ∈ V, ⟨·, w⟩ : V −→ R is a linear form (we will come back to vector spaces with inner products again in Sec. 10 below). One then calls vectors v, w ∈ V perpendicular or orthogonal with respect to ⟨·, ·⟩ if, and only if, ⟨v, w⟩ = 0, so that the notions of Def. 2.13 can be seen as generalizing orthogonality with respect to inner products (also cf. Ex. 2.16(b) below).
Example 2.16. (a) Let V be a vector space over the field F and let U be a subspace
of V with BU being a basis of U . Then, according to [Phi19, Th. 5.23(a)], there
exists a basis B of V such that BU ⊆ B. Then Cor. 2.3 implies
∀_{α∈V′} ( α ∈ U^⊥ ⇔ ∀_{b∈B_U} α(b) = 0 ).
(b) Let

⟨·, ·⟩ : R² × R² −→ R, ⟨v, w⟩ := v_1w_1 + v_2w_2,

denote the so-called Euclidean inner product on R². Then, clearly, for each w = (w_1, w_2) ∈ R²,

α_w : R² −→ R, α_w(v) := ⟨v, w⟩ = v_1w_1 + v_2w_2,

defines a linear form on R². Let v := (1, 2). Then the span of v, i.e. l_v := {(λ, 2λ) : λ ∈ R}, represents the line through v. Moreover, for each w = (w_1, w_2) ∈ R², α_w ∈ l_v^⊥ if, and only if, 0 = α_w(v) = ⟨v, w⟩ = w_1 + 2w_2. Thus, l_v^⊥ is spanned by α_{(−2,1)} and we see that l_v^⊥ consists precisely of the linear forms α_w that are given by vectors w that are perpendicular to v in the Euclidean geometrical sense (i.e. in the sense usually taught in high school geometry).
The following notions defined for linear forms in connection with subspaces can some-
times be useful when studying annihilators:
Definition 2.17. Let V be a vector space over the field F and let U be a subspace of
V . Then
R : V′ −→ U′, Rf := f↾_U,

I : (V/U)′ −→ V′, (Ig)(v) := g(v + U),

and

U′ ≅ V′/U^⊥ (2.10)

(see [Phi19, Th. 6.8(a)] for the precise meaning of (2.9) in case at least one of the occurring cardinalities is infinite). If dim V = n ∈ N, then one also has
α ∈ ker R ⇔ ∀_{u∈U} α(u) = 0 ⇔ α ∈ U^⊥,

thereby proving (2.9). Next, applying the isomorphism theorem of [Phi19, Th. 6.16(a)] yields

U′ = Im R ≅ V′/ ker R = V′/U^⊥,

proving (2.11).

(b): Exercise.
Theorem 2.19. Let V be a vector space over the field F .
f(α) = 1 ∧ ∀_{β∈S} f(β) = 0.
Since dim V = n ∈ N, we may employ Th. 2.11(b) to conclude that the canonical
embedding Φ : V −→ V ′′ is a linear isomorphism, in particular, surjective. Thus, there
exists v ∈ V such that f = Φv, i.e. f (γ) = γ(v) for each γ ∈ V ′ . Since f ∈ S ⊥ , we
have β(v) = f (β) = 0 for each β ∈ S, showing v ∈ S ⊤ . Thus, α ∈ (S ⊤ )⊥ implies the
contradiction 0 = α(v) = f (α) = 1. In consequence, (S ⊤ )⊥ \ S = ∅, proving (b).
B_+ := B_{U_1} ∪ B_{U_2} = B_1 ∪̇ B_2 ∪̇ B_∩
Definition 2.20. Let V be a vector space over the field F. If α ∈ V′ \ {0} and r ∈ F, then the set

H_{α,r} := α^{−1}{r} = {v ∈ V : α(v) = r} ⊆ V

is called a hyperplane in V.
Notation 2.21. Let V be a vector space over the field F, v ∈ V, and α ∈ V′. We then write

v^⊥ := {v}^⊥, α^⊤ := {α}^⊤.
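For instance, the hyperplanes in R² are precisely the lines: each α ∈ (R²)′ \ {0} has the form α(v) = a_1v_1 + a_2v_2 with (a_1, a_2) ≠ (0, 0), so that H_{α,r} = {v ∈ R² : a_1v_1 + a_2v_2 = r}; analogously, the hyperplanes in R³ are precisely the planes.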
∀_{i∈{1,...,n−m}} r_i := α_i(v).

We claim M = N := ⋂_{i=1}^{n−m} H_{α_i,r_i}: Indeed, if x = v + u with u ∈ U, then

∀_{i∈{1,...,n−m}} α_i(x) = α_i(v) + α_i(u) = r_i + 0 = r_i  (since α_i ∈ U^⊥),

showing M ⊆ N. Conversely, if x ∈ N, then

∀_{i∈{1,...,n−m}} α_i(x − v) = r_i − r_i = 0, i.e. x − v ∈ α_i^⊤,

implying

x − v ∈ ⟨{α_1, . . . , α_{n−m}}⟩^⊤ = (U^⊥)^⊤ = U  (by Th. 2.19(a)),

showing x ∈ v + U = M and N ⊆ M as claimed.
Example 2.24. Let F be a field. As in [Phi19, Sec. 8.1], consider the linear system

∀_{j∈{1,...,m}} ∑_{k=1}^{n} a_{jk} x_k = b_j. (2.13)
(λ ∈ F, i ≠ j) replaces H_{α_j,b_j} by H_{α_j+λα_i, b_j+λb_i}. We verify, once again, what we already know from [Phi19, Th. 8.15], namely

L(A|b) = ⋂_{k=1}^{m} H_{α_k,b_k} = M := ( ⋂_{k=1, k≠j}^{m} H_{α_k,b_k} ) ∩ H_{α_j+λα_i, b_j+λb_i} :
If x ∈ L(A|b), then (αj + λαi )(x) = bj + λbi , showing x ∈ Hαj +λαi ,bj +λbi and x ∈ M .
Conversely, if x ∈ M , then αj (x) = (αj + λαi )(x) − λαi (x) = bj + λbi − λbi = bj , showing
x ∈ Hαj ,bj and x ∈ L(A|b).
( A′(β_1 + β_2) )(v) = (β_1 + β_2)(Av) = β_1(Av) + β_2(Av) = (A′β_1)(v) + (A′β_2)(v) = (A′β_1 + A′β_2)(v),

( A′(λβ) )(v) = (λβ)(Av) = λ(A′β)(v),
Definition 2.26. Let V, W be vector spaces over the field F, A ∈ L(V, W). Then the map A′ ∈ L(W′, V′) given by Th. 2.25 is called the dual map corresponding to A (or the transpose of A).
Proof. If (a_{ji}) is the matrix corresponding to A with respect to B_V and B_W, then (cf. [Phi19, Th. 7.10(b)])

∀_{i∈{1,...,n}} Av_i = ∑_{j=1}^{m} a_{ji} w_j, (2.16)
Indeed, one computes, for each j ∈ {1, . . . , m} and for each k ∈ {1, . . . , n},

(A′β_j)v_k = β_j(Av_k) = β_j( ∑_{l=1}^{m} a_{lk} w_l ) = ∑_{l=1}^{m} a_{lk} β_j(w_l) = ∑_{l=1}^{m} a_{lk} δ_{jl} = a_{jk} = ∑_{i=1}^{n} a_{ji} δ_{ik} = ∑_{i=1}^{n} a_{ji} α_i(v_k) = ( ∑_{i=1}^{n} a_{ji} α_i )(v_k),
However, if one adopts the convention of Not. 2.7 to represent elements of the duals as row vectors, then one applies transposes in the above equation to obtain

(ε_1 . . . ε_n) = (γ_1 . . . γ_m)(a_{ji}),

showing that this notation allows A and A′ to be represented by the same matrix (a_{ji}).
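For instance, let A : F² −→ F² have the matrix (a_{ji}) with rows (1 2) and (3 4) with respect to the standard basis, and let β := (1 0) ∈ (F²)′ be the first coordinate projection. Then (A′β)(v) = β(Av) = v_1 + 2v_2 for each v ∈ F², i.e., as a row vector, A′β = (1 0)(a_{ji}) = (1 2), in accordance with the above.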
(b) As in [Phi19, Th. 7.14] and Rem. 2.8(b) above, we now consider basis transitions

∀_{i∈{1,...,n}} ṽ_i = ∑_{j=1}^{n} c_{ji} v_j,  ∀_{i∈{1,...,m}} w̃_i = ∑_{j=1}^{m} f_{ji} w_j,

B̃_V := (ṽ_1, . . . , ṽ_n), B̃_W := (w̃_1, . . . , w̃_m). We then know from [Phi19, Th. 7.14] that the matrix representing A with respect to B̃_V and B̃_W is (f_{ji})^{−1}(a_{ji})(c_{ji}). Thus, according to Th. 2.27, the matrix representing A′ with respect to the dual bases B̃′_V = (α̃_1, . . . , α̃_n) and B̃′_W = (β̃_1, . . . , β̃_m) is ( (f_{ji})^{−1}(a_{ji})(c_{ji}) )^t = (c_{ji})^t (a_{ji})^t ((f_{ji})^{−1})^t. Of course, we can, alternatively, observe that, by Rem. 2.8(b), the basis transition from B′_V to B̃′_V is given by ((c_{ji})^{−1})^t and the basis transition from B′_W to B̃′_W is given by ((f_{ji})^{−1})^t and compute the matrix representing A′ with respect to B̃′_V and B̃′_W via Th. 2.27 and [Phi19, Th. 7.14] to obtain (c_{ji})^t (a_{ji})^t ((f_{ji})^{−1})^t, as before. If γ = ∑_{i=1}^{m} γ_i β̃_i ∈ W′ and ε := A′(γ) = ∑_{i=1}^{n} ε_i α̃_i ∈ V′ with γ_1, . . . , γ_m, ε_1, . . . , ε_n ∈ F, then this yields

(ε_1 . . . ε_n) = (γ_1 . . . γ_m)(f_{ji})^{−1}(a_{ji})(c_{ji}).
(c) Comparing with [Phi19, Rem. 7.24], we observe that the dual map A′ ∈ L(W ′ , V ′ )
is precisely the transpose map At of the map A considered in [Phi19, Rem. 7.24].
Moreover, as a consequence of Th. 2.26, the rows of the matrix (aji ), representing
A, span Im A′ in the same way that the columns of (aji ) span Im A.
Theorem 2.29. Let V, W be vector spaces over the field F .
showing (B ◦ A)′ = A′ ◦ B ′ .
Theorem 2.30. Let V, W be vector spaces over the field F and A ∈ L(V, W ).
(b): Exercise.
(c): If A is an epimorphism, then Im A = W, implying

ker A′ = (Im A)^⊥ = W^⊥ = {0}  (by (a)),

showing A′ to be a monomorphism.
(d): Exercise.
(e) is now immediate from combining (c) and (d).
Theorem 2.31. Let V, W be vector spaces over the field F with canonical embeddings Φ_V : V −→ V′′ and Φ_W : W −→ W′′ according to Def. 2.10(c). Let A ∈ L(V, W) and A′′ := (A′)′ ∈ L(V′′, W′′). Then we have

Φ_W ◦ A = A′′ ◦ Φ_V. (2.18)

proves (2.18).
3 Symmetric Groups
In preparation for the introduction of the notion of determinant (which we will find to be a useful tool for the further study of linear endomorphisms of finite-dimensional vector spaces), we revisit the symmetric group S_n of [Phi19, Ex. 4.9(b)].
π = (i_1 i_2 . . . i_k). (3.2)

Then

π(1) = 5, π(2) = 2, π(3) = 1, π(4) = 4, π(5) = 3,

and, moreover,
Proof. Exercise.
Proof. We conduct the proof via induction: If n = 0, then S contains precisely the empty map (i.e. the empty set) and #S = 1 = 0! is true. If n = 1 and M = {a}, N = {b}, then S contains precisely the map f : M −→ N, f(a) = b, and #S = 1 = 1! is true. For the induction step, fix n ∈ N and assume #M = #N = n + 1. Let a ∈ M and

A := ⋃_{b∈N} S( M \ {a}, N \ {b} ). (3.3)
(a) Each permutation can be decomposed into finitely many disjoint cycles: For each π ∈ S_n, there exists a decomposition of {1, . . . , n} into disjoint sets A_1, . . . , A_N, N ∈ N, i.e.

{1, . . . , n} = ⋃_{i=1}^{N} A_i and A_i ∩ A_j = ∅ for i ≠ j, (3.4)

∀_{π∈S_n} ∃_{N∈N} ∃_{τ_1,...,τ_N∈T} π = τ_N ◦ · · · ◦ τ_1, (3.6)

where T := { (i i+1) : i ∈ {1, . . . , n − 1} }.
Indeed, since {1, . . . , n} is finite, there must be a smallest k ∈ N such that π^k(i) ∈ A_1 := {i, π(i), . . . , π^{k−1}(i)}. Since π is bijective, we must have π^k(i) = i, and (i π(i) . . . π^{k−1}(i)) is a k-cycle. We are already done in case k = n. If k < n, then consider B := {1, . . . , n} \ A_1. Then, again using the bijectivity of π, π↾_B is a permutation on B with 1 ≤ #B < n. By induction, there are disjoint sets A_2, . . . , A_N such that B = ⋃_{j=2}^{N} A_j, where A_j consists of the distinct elements a_{j1}, . . . , a_{j,N_j}, and

Since π = (i π(i) . . . π^{k−1}(i)) ◦ π↾_B, this finishes the proof of (3.5). If there were another, different, decomposition of π into cycles, say, given by disjoint sets B_1, . . . , B_M, {1, . . . , n} = ⋃_{i=1}^{M} B_i, M ∈ N, then there would be A_i ≠ B_j with k ∈ A_i ∩ B_j. But then k would be in the cycle given by A_i and in the cycle given by B_j, implying A_i = {π^l(k) : l ∈ N} = B_j, in contradiction to A_i ≠ B_j.
(b): We first show that every π ∈ S_n is a composition of finitely many transpositions (not necessarily transpositions from the set T): According to (a), it suffices to show that every cycle is a composition of finitely many transpositions. Since each 1-cycle is the identity, we have (i) = Id = (1 2)(1 2) for each i ∈ {1, . . . , n}. If (i_1 . . . i_k) is a k-cycle, k ∈ {2, . . . , n}, then

(i_1 . . . i_k) = (i_1 i_2)(i_2 i_3) · · · (i_{k−1} i_k). (3.8)

Indeed,

∀_{i∈{1,...,n}} ( (i_1 i_2)(i_2 i_3) · · · (i_{k−1} i_k) )(i) = { i_1 for i = i_k, i_{l+1} for i = i_l, l ∈ {1, . . . , k−1}, i for i ∉ {i_1, . . . , i_k} },
proving (3.8). To finish the proof of (b), we observe that every transposition is a composition of finitely many elements of T: If i, j ∈ {1, . . . , n}, i < j, then

(i j) = (i i+1) · · · (j−2 j−1)(j−1 j) · · · (i+1 i+2)(i i+1) : (3.9)

Indeed,

∀_{k∈{1,...,n}} ( (i i+1) · · · (j−2 j−1)(j−1 j) · · · (i+1 i+2)(i i+1) )(k) = { j for k = i, i for k = j, k for i < k < j, k for k ∉ {i, i+1, . . . , j} },

proving (3.9).
Remark 3.8. Let n ∈ N, n ≥ 2. According to Th. 3.7(a), each π ∈ Sn has a unique
decomposition into N ∈ N cycles as in (3.5). If τ ∈ Sn is a transposition, then, as
a consequence of Lem. 3.4(a),(b),(c), the corresponding cycle decomposition of πτ has
precisely N + 1 cycles (if Lem. 3.4(b) applies) or precisely N − 1 cycles (if Lem. 3.4(c)
applies).
Definition 3.9. Let k ∈ Z. We call the integer k even if, and only if, k ≡ 0 (mod 2), and we call k odd if, and only if, k ≡ 1 (mod 2) (cf. the notation introduced in [Phi19, Ex. 4.28(a)]); i.e. k is even if, and only if, 2 is a divisor of k, and k is odd if, and only if, 2 is not a divisor of k (cf. [Phi19, Def. D.2(a)]). The property of being even or odd is called the parity of the integer k.
Example 3.10. Let Id ∈ S3 be the identity on {1, 2, 3}. Then
Id = (1)(2)(3) = (1 2)(1 2) = (3 2)(3 2) = (1 2)(1 2)(3 2)(3 2),
(1 2 3) = (1 2)(2 3) = (2 3)(1 3) = (1 2)(1 3)(2 3)(1 2),
illustrating that, for n ≥ 2, one can write elements π of Sn as products of transpositions
with varying numbers of factors. However, while the number k ∈ N0 of factors in such
products representing π is not unique, we will prove in the following Th. 3.11(a) that
the parity of k is uniquely determined by π ∈ Sn .
Theorem 3.11. Let n ∈ N.
(a) Let n ≥ 2, π ∈ S_n, N ∈ N, k ∈ N_0. If

π = ∏_{i=1}^{k} h_i (3.10)

with transpositions h_1, . . . , h_k ∈ S_n, where N denotes the number of cycles in the decomposition (3.5) of π, then

k ≡ n − N (mod 2). (3.11)
Proof. (a): We conduct the proof via induction on k: For k = 0, the product in (3.10) is empty, i.e.

π = Id = (1) · · · (n),

yielding N = n, showing (3.11) to hold. If k = 1, then π = h_1 is a transposition and, thus, has N = n − 1 cycles, showing (3.11) to hold once again. Now assume (3.11) for k ≥ 1 by induction and consider π = ∏_{i=1}^{k+1} h_i = ( ∏_{i=1}^{k} h_i ) h_{k+1}. Thus, π = π_k h_{k+1}, where π_k := ∏_{i=1}^{k} h_i. If π_k has N_k cycles, then, by induction,

k ≡ n − N_k (mod 2). (3.12)

Moreover, from Rem. 3.8 we know N = N_k + 1 or N = N_k − 1. In both cases, (3.12) implies k + 1 ≡ n − N (mod 2), completing the induction.
(b): For n = 1, there is nothing to prove. For n ≥ 2, we first note sgn to be well-defined, as the number of cycles N(π) is uniquely determined by π ∈ S_n (and each π ∈ S_n can be written as a product of transpositions by Th. 3.7(b)). Next, we note sgn to be surjective, since, for the identity, we can choose k = 0, i.e. sgn(Id) = (−1)⁰ = 1, and, for each transposition τ ∈ S_n (as n ≥ 2, S_n contains at least the transposition τ = (1 2)), we can choose k = 1, i.e. sgn(τ) = (−1)¹ = −1. To verify sgn to be a homomorphism, let π, σ ∈ S_n. By Th. 3.7(b), there are transpositions τ_1, . . . , τ_{k_π}, h_1, . . . , h_{k_σ} ∈ S_n such that

π = ∏_{i=1}^{k_π} τ_i, σ = ∏_{i=1}^{k_σ} h_i ⇒ πσ = ( ∏_{i=1}^{k_π} τ_i )( ∏_{i=1}^{k_σ} h_i ),

implying

sgn(πσ) = (−1)^{k_π+k_σ} = (−1)^{k_π}(−1)^{k_σ} = sgn(π) sgn(σ),

thus completing the proof that sgn constitutes a homomorphism.
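For instance, π := (1 2 3) ∈ S_3 has N(π) = 1 cycle in its decomposition according to (3.5), so sgn(π) = (−1)^{3−1} = 1; this is consistent with the representation (1 2 3) = (1 2)(2 3) from Ex. 3.10, a product of k = 2 transpositions.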
Proof. (a): For each π ∈ Sn , let σ(π) denote the value given by the right-hand side of
(3.13). If n = 1, then Sn = {Id} and σ(Id) = 1 = sgn(Id), since the product in (3.13) is
empty (and, thus, equal to 1). For n ≥ 2, we first show
thereby establishing the case. Next, if τ ∈ Sn is a transposition, then there exist elements
i, j ∈ {1, . . . , n} such that i < j and τ = (i j). Thus,
(b): Using (3.14) together with the homomorphism property of sgn given by Th. 3.11(b), it suffices to show that

sgn(γ) = (−1)^{k−1} (3.16)

holds for each cycle γ := (i_1 . . . i_k) ∈ S_n, where i_1, . . . , i_k are distinct elements of {1, . . . , n}, k ∈ N. According to (3.8), we have

γ = (i_1 . . . i_k) = (i_1 i_2)(i_2 i_3) · · · (i_{k−1} i_k),

showing γ to be the product of k − 1 transpositions, thereby proving (3.16) and the proposition.
Definition 3.13. Let n ∈ N, n ≥ 2, and let sgn : Sn −→ {1, −1} be the group
homomorphism defined in Th. 3.11(b) above. We call π ∈ Sn even if, and only if,
sgn(π) = 1; we call π odd if, and only if, sgn(π) = −1. The property of being even or
odd is called the parity of the permutation π. Moreover, we call
An := ker sgn = {π ∈ Sn : sgn(π) = 1}
the alternating group on {1, . . . , n}.
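For instance, A_3 = {Id, (1 2 3), (1 3 2)}: by Ex. 3.10 and Th. 3.11, the two 3-cycles are even, while the three transpositions (1 2), (1 3), (2 3) of S_3 are odd.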
Proposition 3.14. Let n ∈ N, n ≥ 2.
4 Multilinear Maps and Determinants

Theorem 4.3. Let V and W be vector spaces over the field F, α ∈ N. Moreover, let B be a basis of V. Then each α times linear map L ∈ L^α(V, W) is uniquely determined by its values on B^α: More precisely, if (w_b)_{b∈B^α} is a family in W, and, for each v ∈ V, B_v and c_v : B_v −→ F \ {0} are as in [Phi19, Th. 5.19] (providing the coordinates of v with respect to B in the usual way), then the map

L : V^α −→ W, L(v^1, . . . , v^α) = L( ∑_{b^1∈B_{v^1}} c_{v^1}(b^1) b^1, . . . , ∑_{b^α∈B_{v^α}} c_{v^α}(b^α) b^α ) := ∑_{(b^1,...,b^α)∈B_{v^1}×···×B_{v^α}} c_{v^1}(b^1) · · · c_{v^α}(b^α) w_{(b^1,...,b^α)}, (4.3)

is α times linear; moreover, L̃ ∈ L^α(V, W) with

∀_{b∈B^α} L̃(b) = w_b (4.4)

implies L = L̃.
Proof. Exercise: Apart from the more elaborate notation, everything works as in the
proof of [Phi19, Th. 6.6].
Theorem 4.4. Let V and W be vector spaces over the field F , let BV and BW be bases
of V and W , respectively, let α ∈ N. Given b1 , . . . , bα ∈ BV and b ∈ BW , and using Th.
4.3, define maps L_{b^1,...,b^α,b} ∈ L^α(V, W) by letting

L_{b^1,...,b^α,b}(b̃^1, . . . , b̃^α) := { b for (b̃^1, . . . , b̃^α) = (b^1, . . . , b^α), 0 otherwise }. (4.5)
(c) If dim V = ∞ and dim W ≥ 1, then ⟨B⟩ ⊊ L^α(V, W) and, in particular, B is not a basis of L^α(V, W).
Proof. (a): We verify that the elements of B are linearly independent: Let M, N ∈ N. Let (b^1_1, . . . , b^α_1), . . . , (b^1_N, . . . , b^α_N) ∈ (B_V)^α be distinct and let w_1, . . . , w_M ∈ B_W be distinct as well. Assume λ_{lk} ∈ F to be such that

L := ∑_{l=1}^{M} ∑_{k=1}^{N} λ_{lk} L_{b^1_k,...,b^α_k,w_l} = 0.
Remark 4.6. Let V and W be vector spaces over the field F, α ∈ N. Then Alt^α(V, W) is a vector subspace of L^α(V, W): Indeed, 0 ∈ Alt^α(V, W) and, if λ ∈ F and A, B ∈ L^α(V, W) satisfy (4.7), then λA and A + B satisfy (4.7) as well.
Notation 4.7. Let V, W be sets and α ∈ N. Then, for each f ∈ F(V^α, W) = W^{V^α} and each permutation π ∈ S_α, define

πf : V^α −→ W, (πf)(v_1, . . . , v_α) := f(v_{π(1)}, . . . , v_{π(α)}).
Proof. For each (v_1, . . . , v_α) ∈ V^α, let (w_1, . . . , w_α) := (v_{π_1(1)}, . . . , v_{π_1(α)}) ∈ V^α. Then, for each i ∈ {1, . . . , α}, we have w_i = v_{π_1(i)} and w_{π_2(i)} = v_{π_1(π_2(i))}. Thus, we compute

( (π_1π_2)f )(v_1, . . . , v_α) = f(v_{(π_1π_2)(1)}, . . . , v_{(π_1π_2)(α)}) = f(v_{π_1(π_2(1))}, . . . , v_{π_1(π_2(α))}) = f(w_{π_2(1)}, . . . , w_{π_2(α)}) = (π_2f)(w_1, . . . , w_α) = (π_2f)(v_{π_1(1)}, . . . , v_{π_1(α)}) = ( π_1(π_2f) )(v_1, . . . , v_α),
Proposition 4.9. Let V and W be vector spaces over the field F, α ∈ N. Then, given A ∈ L^α(V, W), the following statements are equivalent for char F ≠ 2, where “(i) ⇒ (ii)” also holds for char F = 2.
(i) A is alternating.
Proof. “(i)⇒(ii)”: We first prove (4.9) for transpositions π = (i i+1) with i ∈ {1, . . . , α−1}: Let v^k ∈ V, k ∈ {1, . . . , α} \ {i, i+1}, and define

B : V² −→ W, B(v, w) := A(v^1, . . . , v^{i−1}, v, w, v^{i+2}, . . . , v^α).

Then, as A is α times linear and alternating, B is bilinear and alternating. Thus, for each v, w ∈ V,

0 = B(v + w, v + w) = B(v, v) + B(v, w) + B(w, v) + B(w, w) = B(v, w) + B(w, v),

implying B(v, w) = −B(w, v), proving (4.9) for this case. For general π ∈ S_α, (4.9) now follows from Th. 3.7(b), Th. 3.11(b), and Lem. 4.8: Let T := { (i i+1) ∈ S_α : i ∈ {1, . . . , α−1} }. Then, given π ∈ S_α, Th. 3.7(b) implies the existence of π_1, . . . , π_N ∈ T, N ∈ N, such that π = π_1 · · · π_N. Thus, for each (v^1, . . . , v^α) ∈ V^α,

A(v^{π(1)}, . . . , v^{π(α)}) = (πA)(v^1, . . . , v^α) = ( π_1(. . . (π_N A)) )(v^1, . . . , v^α)  (by Lem. 4.8)
= (−1)^N A(v^1, . . . , v^α) = sgn(π) A(v^1, . . . , v^α)  (by Th. 3.11(b)),

proving (4.9).
“(ii)⇒(i)”: Let char F ≠ 2. Let (v^1, . . . , v^α) ∈ V^α and suppose i, j ∈ {1, . . . , α} are such that i ≠ j as well as v^i = v^j. Then, by (ii),

A(v^1, . . . , v^α) = sgn((i j)) A(v^1, . . . , v^α) = −A(v^1, . . . , v^α),

implying A(v^1, . . . , v^α) = 0, since char F ≠ 2.
The following Ex. 4.10 shows that “(i) ⇐ (ii)” cannot be expected to hold in Prop. 4.9 for char F = 2:
Proposition 4.11. Let V and W be vector spaces over the field F, α ∈ N. Then, given A ∈ L^α(V, W), the following statements are equivalent:
(i) A is alternating.
(ii) The implication in (4.7) holds whenever j = i + 1 (i.e. whenever two juxtaposed
arguments are identical).
Proof. Exercise.
Definition 4.12. Let F be a field and V := F^α, α ∈ N. Then det ∈ Alt^α(V, F) is called a determinant if, and only if,

det(e_1, . . . , e_α) = 1,

where e_1, . . . , e_α denote the standard basis vectors of V.
Definition 4.13. Let V be a vector space over the field F, α ∈ N. We define the map Λ_α : (V′)^α −→ Alt^α(V, F), called outer product or wedge product of linear forms, as follows:

Λ_α(ω_1, . . . , ω_α)(v^1, . . . , v^α) := ∑_{π∈S_α} sgn(π) ω_{π(1)}(v^1) · · · ω_{π(α)}(v^α) (4.10)

(cf. Th. 4.15 below). Given linear forms ω_1, . . . , ω_α ∈ V′, it is common to also use the notation

ω_1 ∧ · · · ∧ ω_α := Λ_α(ω_1, . . . , ω_α).
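For instance, for α = 2, (4.10) reads

(ω_1 ∧ ω_2)(v^1, v^2) = ω_1(v^1)ω_2(v^2) − ω_2(v^1)ω_1(v^2),

i.e. ω_1 ∧ ω_2 evaluates pairs of vectors via the 2 × 2 determinant of the matrix (ω_i(v^j)).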
Lemma 4.14. Let V be a vector space over the field F, α ∈ N. Moreover, let Λ_α denote the wedge product of Def. 4.13. Then one has

∀_{ω_1,...,ω_α∈V′} ∀_{v^1,...,v^α∈V} Λ_α(ω_1, . . . , ω_α)(v^1, . . . , v^α) = ∑_{π∈S_α} sgn(π) ω_1(v^{π(1)}) · · · ω_α(v^{π(α)}). (4.11)

proving (4.11).
Theorem 4.15. Let V be a vector space over the field F, α ∈ N. Moreover, let Λ_α denote the wedge product of Def. 4.13.

showing A to be alternating.

(b) follows from (a): According to (a), Λ_α maps (V′′)^α into Alt^α(V′, F). Thus, if Φ : V −→ V′′ is the canonical embedding, then, for each v^1, . . . , v^α ∈ V and each ω_1, . . . , ω_α ∈ V′,

Λ_α(Φv^1, . . . , Φv^α)(ω_1, . . . , ω_α) = ∑_{π∈S_α} sgn(π) (Φv^{π(1)})(ω_1) · · · (Φv^{π(α)})(ω_α) = ∑_{π∈S_α} sgn(π) ω_1(v^{π(1)}) · · · ω_α(v^{π(α)}) = Λ_α(ω_1, . . . , ω_α)(v^1, . . . , v^α)  (by (4.11)),

also holds.
Proof. For each i ∈ {1, . . . , α}, let ω_i := π_{e_i} : V −→ F be the projection onto the coordinate with respect to e_i according to Ex. 2.1(a). Then, if v^j is as above, we have ω_i(v^j) = a_{ji}. Thus,

Λ_α(ω_1, . . . , ω_α)(v^1, . . . , v^α) = ∑_{π∈S_α} sgn(π) ω_{π(1)}(v^1) · · · ω_{π(α)}(v^α) = det(v^1, . . . , v^α)  (by (4.10) and (4.12)),
According to Th. 4.17(b) below, the map defined by (4.12) is the only determinant in Alt^α(F^α, F).
Theorem 4.17. Let V and W be vector spaces over the field F, let B_V and B_W be bases of V and W, respectively, let α ∈ N. Moreover, as in Cor. 2.4, let B′ := {ω_b : b ∈ B_V}, where

∀_{(b,a)∈B_V×B_V} ( ω_b ∈ V′, ω_b(a) := δ_{ba} ). (4.14)
In addition, assume < to be a strict total order on B_V (for dim V = ∞, the order on B_V exists due to the axiom of choice, cf. [Phi19, Th. A.52(iv)]) and define

(B_V)^α_ord := { (b^1, . . . , b^α) ∈ (B_V)^α : b^1 < · · · < b^α }.

Given (b^1, . . . , b^α) ∈ (B_V)^α_ord and y ∈ B_W, and using Th. 4.15(a), define maps

A_{b^1,...,b^α,y} ∈ Alt^α(V, W), A_{b^1,...,b^α,y} := Λ_α(ω_{b^1}, . . . , ω_{b^α}) y. (4.15)

Let B := { A_{b^1,...,b^α,y} : y ∈ B_W, (b^1, . . . , b^α) ∈ (B_V)^α_ord }.
Proof. (a): We verify that the elements of B are linearly independent: Note that

∀_{(b^1,...,b^α)∈(B_V)^α_ord} ∀_{Id≠π∈S_α} (b^{π(1)}, . . . , b^{π(α)}) ∉ (B_V)^α_ord. (4.17)

Thus, if

(b^1, . . . , b^α), (c^1, . . . , c^α) ∈ (B_V)^α_ord,

then

Λ_α(ω_{b^1}, . . . , ω_{b^α})(c^1, . . . , c^α) = ∑_{π∈S_α} sgn(π) ω_{b^{π(1)}}(c^1) · · · ω_{b^{π(α)}}(c^α) = { 1 for (b^1, . . . , b^α) = (c^1, . . . , c^α), 0 otherwise }  (by (4.17) and (4.14)). (4.18)

Now let M, N ∈ N, let (b^1_1, . . . , b^α_1), . . . , (b^1_N, . . . , b^α_N) ∈ (B_V)^α_ord be distinct, and let y_1, . . . , y_M ∈ B_W be distinct as well. Assume λ_{lk} ∈ F to be such that

A := ∑_{l=1}^{M} ∑_{k=1}^{N} λ_{lk} A_{b^1_k,...,b^α_k,y_l} = 0.
According to (a), it remains to show ⟨B⟩ = Alt^α(V, W). Let A ∈ Alt^α(V, W) and (b^{i_1}, . . . , b^{i_α}) ∈ (B_V)^α_ord. Then there exists a finite set B_{(i_1,...,i_α)} ⊆ B_W such that A(b^{i_1}, . . . , b^{i_α}) = ∑_{y∈B_{(i_1,...,i_α)}} λ_y y with λ_y ∈ F. Now let y_1, . . . , y_M, M ∈ N, be an enumeration of the finite set

⋃_{(i_1,...,i_α)∈{1,...,n}^α : (b^{i_1},...,b^{i_α})∈(B_V)^α_ord} B_{(i_1,...,i_α)}.

Then, for each (j, (b^{i_1}, . . . , b^{i_α})) ∈ {1, . . . , M} × (B_V)^α_ord, there exists λ_{j,(i_1,...,i_α)} ∈ F such that

∀_{(b^{i_1},...,b^{i_α})∈(B_V)^α_ord} A(b^{i_1}, . . . , b^{i_α}) = ∑_{j=1}^{M} λ_{j,(i_1,...,i_α)} y_j.

Letting à := ∑_{j=1}^{M} ∑_{(b^{i_1},...,b^{i_α})∈(B_V)^α_ord} λ_{j,(i_1,...,i_α)} A_{b^{i_1},...,b^{i_α},y_j}, we claim à = A. Indeed, for each (b^{j_1}, . . . , b^{j_α}) ∈ (B_V)^α_ord,

Ã(b^{j_1}, . . . , b^{j_α}) = ∑_{j=1}^{M} ∑_{(b^{i_1},...,b^{i_α})∈(B_V)^α_ord} λ_{j,(i_1,...,i_α)} Λ_α(ω_{b^{i_1}}, . . . , ω_{b^{i_α}})(b^{j_1}, . . . , b^{j_α}) y_j
= ∑_{j=1}^{M} ∑_{(b^{i_1},...,b^{i_α})∈(B_V)^α_ord} λ_{j,(i_1,...,i_α)} δ_{(i_1,...,i_α),(j_1,...,j_α)} y_j  (by (4.18))
= ∑_{j=1}^{M} λ_{j,(j_1,...,j_α)} y_j = A(b^{j_1}, . . . , b^{j_α}),

proving à = A by (4.9) and Th. 4.3. Since à ∈ ⟨B⟩, the proof of (b) is complete.

(c): Exercise.
The following Th. 4.18 compiles some additional rules of importance for alternating
multilinear maps:
Theorem 4.18. Let V and W be vector spaces over the field F, α ∈ N. The following rules hold for each A ∈ Alt^α(V, W):
(a) The value of A remains unchanged if one argument is replaced by the sum of that argument and a linear combination of the other arguments, i.e., if λ_1, . . . , λ_α ∈ F, v^1, . . . , v^α ∈ V, and i ∈ {1, . . . , α}, then

A(v^1, . . . , v^α) = A( v^1, . . . , v^{i−1}, v^i + ∑_{j=1, j≠i}^{α} λ_j v^j, v^{i+1}, . . . , v^α ).
then

A(w^1, . . . , w^α) = ( ∑_{π∈S_α} sgn(π) a_{1π(1)} · · · a_{απ(α)} ) A(v^1, . . . , v^α) = det(x^1, . . . , x^α) A(v^1, . . . , v^α), (4.19)

where

∀_{j∈{1,...,α}} x^j := ∑_{i=1}^{α} a_{ji} e_i ∈ F^α

and {e_1, . . . , e_α} is the standard basis of F^α.
(c) Suppose the family (v^1, . . . , v^α) in V is linearly independent and such that there exist w^1, . . . , w^α ∈ ⟨{v^1, . . . , v^α}⟩ with A(w^1, . . . , w^α) ≠ 0. Then A(v^1, . . . , v^α) ≠ 0 as well.
Proof. (a): For a single j ≠ i with λ_j ≠ 0 (the case λ_j = 0 being trivial, and the general case following by iteration), one computes

A(v^1, . . . , v^i + λ_j v^j, . . . , v^α)
= λ_j^{−1} A(v^1, . . . , v^i + λ_j v^j, . . . , λ_j v^j, . . . , v^α)  (as A ∈ L^α(V, W))
= λ_j^{−1} A(v^1, . . . , v^i, . . . , λ_j v^j, . . . , v^α) + λ_j^{−1} A(v^1, . . . , λ_j v^j, . . . , λ_j v^j, . . . , v^α)  (as A ∈ L^α(V, W))
= λ_j^{−1} A(v^1, . . . , v^i, . . . , λ_j v^j, . . . , v^α) + 0  (by (4.7))
= A(v^1, . . . , v^α)  (as A ∈ L^α(V, W)).
thereby proving the first equality in (4.19). The second equality in (4.19) is now an
immediate consequence of Cor. 4.16.
(c): Exercise.
Notation 4.21. Let F be a field and n ∈ N. As in [Phi19, Rem. 7.4(b)], we denote the columns and rows of a matrix A := (a_{ji}) ∈ M(n, F) as follows:

∀_{i∈{1,...,n}} c_i := c_i^A := (a_{1i}, . . . , a_{ni})^t,  r_i := r_i^A := (a_{i1} . . . a_{in}).
Remark 4.22. Let F be a field and n ∈ N. If we consider the rows r_1, . . . , r_n of the matrix (a_{ji}) ∈ M(n, F) as elements of F^n, then, by Cor. 4.16,

det(a_{ji}) = det(r_1, . . . , r_n),

where the second det is the map det : (F^n)^n −→ F defined as in Cor. 4.16. As a further consequence of Cor. 4.16, in combination with Th. 4.17(b), we see that det of Def. 4.19 is the unique form on M(n, F) that is multilinear and alternating in the rows of the matrix and that assigns the value 1 to the identity matrix.
Example 4.23. Let F be a field. We evaluate (4.20) explicitly for n = 1, 2, 3:

(a) For n = 1: det(a_{11}) = sgn(Id) a_{11} = a_{11}.

(b) For n = 2:

det(a_{ji}) = sgn(Id) a_{11}a_{22} + sgn((1 2)) a_{12}a_{21} = a_{11}a_{22} − a_{12}a_{21},

i.e. the determinant is the product of the elements of the main diagonal minus the product of the elements of the other diagonal.

(c) For n = 3:

det(a_{ji}) = sgn(Id) a_{11}a_{22}a_{33} + sgn((1 2)) a_{12}a_{21}a_{33} + sgn((1 3)) a_{13}a_{22}a_{31} + sgn((2 3)) a_{11}a_{23}a_{32} + sgn((1 2 3)) a_{12}a_{23}a_{31} + sgn((1 3 2)) a_{13}a_{21}a_{32}
= a_{11}a_{22}a_{33} + a_{12}a_{23}a_{31} + a_{13}a_{21}a_{32} − (a_{12}a_{21}a_{33} + a_{13}a_{22}a_{31} + a_{11}a_{23}a_{32}).

As a mnemonic, one writes the first two columns of the matrix once again on the right and then takes the product of the first three diagonals in the direction of the main diagonal with a positive sign and the first three diagonals in the other direction with a negative sign.
We can now translate some of the results of Sec. 4.2 into results on determinants of
matrices:
(c) det is multilinear with regard to matrix rows as well as multilinear with regard to matrix columns, i.e., for each v ∈ M(1, n, F), w ∈ M(n, 1, F), i ∈ {1, . . . , n}, and λ, µ ∈ F (using the row and column notation of Not. 4.21):

det(r_1, . . . , r_{i−1}, λr_i + µv, r_{i+1}, . . . , r_n) = λ det(A) + µ det(r_1, . . . , r_{i−1}, v, r_{i+1}, . . . , r_n),

det(c_1, . . . , c_{i−1}, λc_i + µw, c_{i+1}, . . . , c_n) = λ det(A) + µ det(c_1, . . . , c_{i−1}, w, c_{i+1}, . . . , c_n).
(i) det A = 0.

(ii) The rows of A are linearly dependent.

(iii) The columns of A are linearly dependent.
(i) The value of the determinant remains the same if one row of a matrix is replaced by the sum of that row and a scalar multiple of another row. More generally, the determinant remains the same if one row of a matrix is replaced by the sum of that row and a linear combination of the other rows. The statement also remains true if the word “row” is replaced by “column”. Thus, if λ_1, . . . , λ_n ∈ F and i ∈ {1, . . . , n}, then

det(A) = det(r_1, . . . , r_n) = det( r_1, . . . , r_{i−1}, r_i + ∑_{j=1, j≠i}^{n} λ_j r_j, r_{i+1}, . . . , r_n ),

det(A) = det(c_1, . . . , c_n) = det( c_1, . . . , c_{i−1}, c_i + ∑_{j=1, j≠i}^{n} λ_j c_j, c_{i+1}, . . . , c_n ).
Proof. As already observed in Rem. 4.22, det : M(n, F) −→ F can be viewed as the unique map det ∈ Alt^n(F^n, F) with det(e_1, . . . , e_n) = 1, where one considers A = (r_1, . . . , r_n) ∈ (F^n)^n as the tuple of its rows: we have

det(A) = ∑_{π∈S_n} sgn(π) a_{1π(1)} · · · a_{nπ(n)} = det(r_1, . . . , r_n)  (by (4.20), (4.12))
= ∑_{π∈S_n} sgn(π) a_{π(1)1} · · · a_{π(n)n} = det(c_1, . . . , c_n)  (by (4.13), (4.12)). (4.22)
and, thus,

det(AB) = det(r_1^C, . . . , r_n^C) = det(r_1^A, . . . , r_n^A) det(r_1^B, . . . , r_n^B) = det(A) det(B)  (using (4.22), (4.19), and (4.22) again).
(h): As a consequence of [Phi19, Th. 7.17(a)], A is singular if, and only if, the columns of A are linearly dependent. Thus, det(A) = 0 if, and only if, A is singular, due to (f). Moreover, if A is invertible, then

1 = det(Id_n) = det(AA^{−1}) = det(A) det(A^{−1})  (by (g)).
Theorem 4.25 (Block Matrices). Let F be a field. The determinant of so-called block matrices over F, where one block is a zero matrix (all entries 0), can be computed as the product of the determinants of the corresponding blocks. More precisely, if n, m ∈ N, (a_{ji}) ∈ M(n, F), (b_{ji}) ∈ M(m, F), then

det ( (a_{ji}) ∗ ; 0 (b_{ji}) ) = det(a_{ji}) det(b_{ji}), (4.23)

where the block matrix on the left-hand side has (a_{ji}) as its upper left n × n block, (b_{ji}) as its lower right m × m block, an arbitrary block ∗ ∈ M(n, m, F) in its upper right corner, and the zero block 0 ∈ M(m, n, F) in its lower left corner.
proving (4.23).
Corollary 4.26. Let F be a field and n ∈ N. If (a_{ji}) ∈ M(n, F) is upper triangular or lower triangular, then

det(a_{ji}) = ∏_{k=1}^{n} a_{kk}.

Proof. For (a_{ji}) upper triangular, the statement follows from Th. 4.25 via an obvious induction on n ∈ N. If (a_{ji}) is lower triangular, then the transpose (a_{ji})^t is upper triangular and the statement then follows from Cor. 4.24(b).
Proof. Let j, i ∈ {1, . . . , n}, and let M_{ji} denote the corresponding minor matrix of A according to Def. 4.27. Then, using Cor. 4.24(e),

det R(j, i) = (−1)^{i+j} det ( 1 0 ; ∗ M_{ji} ),  det C(j, i) = (−1)^{j+i} det ( 1 ∗ ; 0 M_{ji} ),

in the block notation of (4.23), such that, by (4.23) and (4.25),

det R(j, i) = det C(j, i) = (−1)^{i+j} det(M_{ji}) = A_{ji},
Proof. (a): Let C := (c_{ji}) := AÃ, D := (d_{ji}) := ÃA. Also let R(j, i) and C(j, i) be as in Lem. 4.28. Then, we compute, for each j, i ∈ {1, . . . , n},

c_{ji} = ∑_{k=1}^{n} a_{jk} (Ã)_{ki} = ∑_{k=1}^{n} a_{jk} det R(i, k)  (by (4.26))
= det( r_1^A, . . . , r_{i−1}^A, r_j^A, r_{i+1}^A, . . . , r_n^A )  (by Cor. 4.24(c))
= { det(A) for i = j, 0 for i ≠ j },

d_{ji} = ∑_{k=1}^{n} (Ã)_{jk} a_{ki} = ∑_{k=1}^{n} a_{ki} det C(k, j)  (by (4.26))
= det( c_1^A, . . . , c_{j−1}^A, c_i^A, c_{j+1}^A, . . . , c_n^A )  (by Cor. 4.24(c))
= { det(A) for i = j, 0 for i ≠ j },

proving (a).
(b): We obtain

det(A) det(Ã) = det(AÃ) = det( (det A) Id_n ) = (det A)^n  (by (a) and Cor. 4.24(d)),
proving (d).
(e): From (a), we obtain

det(A) = ∑_{j=1}^{n} (Ã)_{ij} a_{ji} = ∑_{j=1}^{n} a_{ji} A_{ji},

proving (e).
Example 4.30. (a) We use Ex. 4.23(b) and Th. 4.29(d) to compute

D_1 := det ( 1 2 ; 3 4 ) :

D_1 = 1 · 4 − 2 · 3 = −2,

which we also obtain when expanding with respect to the first row according to Th. 4.29(d). Expanding with respect to the second row, we obtain

D_1 = −3 · 2 + 4 · 1 = −2.
(b) For D_2 := det ( 1 2 3 ; 4 5 6 ; 7 8 9 ), Ex. 4.23(c) yields

D_2 = 1·5·9 + 2·6·7 + 3·4·8 − 3·5·7 − 1·6·8 − 2·4·9 = 45 + 84 + 96 − 105 − 48 − 72 = 0.

Expanding with respect to the third column according to Th. 4.29(d), we obtain

D_2 = 3 · det ( 4 5 ; 7 8 ) − 6 · det ( 1 2 ; 7 8 ) + 9 · det ( 1 2 ; 4 5 ) = 3·(−3) − 6·(−6) + 9·(−3) = −9 + 36 − 27 = 0.
Proof. In matrix form, the linear system reads Ax = b, which, for det(A) ≠ 0, is equivalent to x = A^{−1}b, where A^{−1} = (det A)^{−1} Ã by Th. 4.29(c). Since Ã := (A_{ji})^t, we have

∀_{j∈{1,...,n}} x_j = (det A)^{−1} ∑_{k=1}^{n} (Ã)_{jk} b_k = (det A)^{−1} ∑_{k=1}^{n} A_{kj} b_k,

proving (4.27).
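As a standard illustration, consider the system over R given by x_1 + 2x_2 = 5, 3x_1 + 4x_2 = 6, i.e. A = ( 1 2 ; 3 4 ) with det(A) = −2 according to Ex. 4.30(a). Cramer's rule (in its familiar replaced-column form, equivalent to (4.27)) then yields

x_1 = det ( 5 2 ; 6 4 ) / (−2) = (20 − 12)/(−2) = −4,  x_2 = det ( 1 5 ; 3 6 ) / (−2) = (6 − 15)/(−2) = 9/2,

which one verifies by substituting into the system.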
Definition 4.32. Let n ∈ N. An element p = (p_1, . . . , p_n) ∈ (N_0)^n is called a multi-index; |p| := p_1 + · · · + p_n is called the degree of the multi-index. Let R be a ring with unity. If x = (x_1, . . . , x_n) ∈ R^n and p = (p_1, . . . , p_n) is a multi-index, then we define

x^p := x_1^{p_1} x_2^{p_2} · · · x_n^{p_n}. (4.28)

Each function from R^n into R, x ↦ x^p, is called a monomial function (in n variables). A function P from R^n into R is called a polynomial function (in n variables) if, and only if, it is a linear combination of monomial functions, i.e. if, and only if, P has the form

P : R^n −→ R, P(x) = ∑_{|p|≤k} a_p x^p, k ∈ N_0, a_p ∈ R (4.29)
(if R is commutative, then our present definition of polynomial function is a special case of the one given in Th. B.23 of the Appendix, also cf. Rem. B.24). If F := R is an infinite field, then, as a consequence of Th. B.23(c), the representation of P in (4.29) in terms of the monomial functions x ↦ x^p is unique and, in this case, we also define, for each polynomial function given in the form of (4.29), its degree³, denoted deg(P), to be the largest number d ≤ k such that there is p ∈ (N_0)^n with |p| = d and a_p ≠ 0; in particular, d is then the degree of each monomial function x ↦ x^p with |p| = d. If all a_p = 0, i.e. if P ≡ 0, then P is the zero polynomial function and its degree is defined to be −∞ (some authors use −1 instead). If F is a field (not necessarily infinite), then we also define a rational function as a quotient of two polynomial functions: If P, Q : F^n −→ F are polynomial functions, then

(P/Q) : F^n \ Q^{−1}{0} −→ F, (P/Q)(x) := P(x)/Q(x), (4.30)

³For example, if R = Z_2 = {0, 1} and f, g : R −→ R, f(x) := x, g(x) := x², then f(0) = g(0) = 0, f(1) = g(1) = 1, showing f = g and the nonuniqueness of the representation in (4.29) for R = Z_2. However, it is still possible to generalize our degree definition for polynomial functions to situations where the representation in (4.29) is not unique: If P : R^n −→ R is a polynomial function, then, for each representation of P in the form (4.29), one can define the degree (of the representation) as in the case where R is an infinite field, then defining deg P to be the minimum of the representation degrees.
inv : GL_n(F) −→ GL_n(F), inv(a_{ji}) := (a_{ji})^{−1} = ( (−1)^{i+j} det(M_{ij}) / det(a_{ji}) ),

where the M_{ji} ∈ M(n − 1, F) denote the minor matrices of (a_{ji}). Thus, for each (k, l) ∈ {1, . . . , n}², the component function

inv_{kl} : GL_n(F) −→ F, inv_{kl}(a_{ji}) = (−1)^{k+l} det(M_{lk}) / det(a_{ji}),
merely grows proportional to n³): We know from Cor. 4.24(e),(i) that the operations of the Gaussian elimination algorithm do not change the determinant, except for sign changes when switching rows. For the same reasons, one should use Gaussian elimination (or even more efficient algorithms adapted to special situations) rather than the neat-looking explicit formula of Cramer's rule of Th. 4.31 to solve large linear systems, and rather than Th. 4.29(c) to compute inverses of large matrices.
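For instance, to compute det ( 1 2 ; 3 4 ) via Gaussian elimination, one subtracts the 3-fold of the first row from the second row, which, by Cor. 4.24(i), leaves the determinant unchanged and yields the upper triangular matrix ( 1 2 ; 0 −2 ); by Cor. 4.26, the determinant is then the product of the diagonal entries, 1 · (−2) = −2, matching Ex. 4.30(a).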
Theorem 4.35 (Vandermonde Determinant). Let F be a field and λ_0, λ_1, . . . , λ_n ∈ F, n ∈ N. Moreover, let

V := ( 1 λ_0 . . . λ_0^n ; 1 λ_1 . . . λ_1^n ; ⋮ ; 1 λ_n . . . λ_n^n ) ∈ M(n + 1, F), (4.31)

which is known as the corresponding Vandermonde matrix. Then its determinant, the so-called Vandermonde determinant, is given by

det(V) = ∏_{k,l=0, k>l}^{n} (λ_k − λ_l). (4.32)
Proof. The proof can be conducted by induction with respect to n: For n = 1, we have

det(V) = det( 1 λ0 ; 1 λ1 ) = λ1 − λ0 = ∏_{k,l=0, k>l}^{1} (λk − λl),
showing (4.32) holds for n = 1. Now let n > 1. Using Cor. 4.24(i) and adding the (−λ0)-fold of the nth column to the (n + 1)st column, we obtain in the (n + 1)st column

( 0, λ1^n − λ1^{n−1} λ0, . . . , λn^n − λn^{n−1} λ0 )^t.

Next, one adds the (−λ0)-fold of the (n − 1)st column to the nth column, and, successively, the (−λ0)-fold of the mth column to the (m + 1)st column. One finishes, in the nth step, by adding the (−λ0)-fold of the first column to the second column, obtaining

det(V) = det( 1 λ0 . . . λ0^n ; 1 λ1 . . . λ1^n ; ⋮ ; 1 λn . . . λn^n )
       = det( 1 0 0 . . . 0 ; 1 λ1−λ0 λ1²−λ1λ0 . . . λ1^n−λ1^{n−1}λ0 ; ⋮ ; 1 λn−λ0 λn²−λnλ0 . . . λn^n−λn^{n−1}λ0 ).
We now use multilinearity to factor out, for each k ∈ {1, . . . , n}, (λk − λ0) from the kth row, arriving at

det(V) = ∏_{k=1}^{n} (λk − λ0) · det( 1 λ1 . . . λ1^{n−1} ; ⋮ ; 1 λn . . . λn^{n−1} ),

where the remaining determinant is precisely the Vandermonde determinant of the n numbers λ1, . . . , λn (i.e. (4.31) with n replaced by n − 1). Using the induction hypothesis, we obtain

det(V) = ∏_{k=1}^{n} (λk − λ0) · ∏_{k,l=1, k>l}^{n} (λk − λl) = ∏_{k,l=0, k>l}^{n} (λk − λl),

thereby completing the induction and proving (4.32).
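Formula (4.32) is also easy to test numerically; the following Python lines (illustrative only) compare numpy's determinant of the Vandermonde matrix (4.31) with the product on the right-hand side of (4.32):

import numpy as np
from itertools import combinations

lam = [2.0, -1.0, 3.0, 0.5]                       # λ0, ..., λn with n = 3
V = np.array([[l**j for j in range(len(lam))] for l in lam])
product = np.prod([lam[k] - lam[l]
                   for l, k in combinations(range(len(lam)), 2)])
print(np.linalg.det(V), product)                  # both print ≈ -67.5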
(namely (c_{ji}) such that, for each i ∈ {1, . . . , n}, w_i = ∑_{j=1}^{n} c_{ji} v_j). Thus, the matrices representing A with respect to different ordered bases of V are similar and, hence, have the same determinant, and, in consequence,

det : L(V, V) −→ F, det(A) := det((a_{ji})),

is well-defined.
(b) If A ∈ L(V, V ), then det(A) = 0 if, and only if, A is not bijective. If A is bijective,
then det(A−1 ) = (det(A))−1 .
(b): Since (aji ) (with (aji ) as before) is singular if, and only if, A is not bijective, (b) is
immediate from Cor. 4.24(h).
(c): If (a_{ji}) is as before, then, by Cor. 4.24(d),

det(λA) = det( λ (a_{ji}) ) = λ^n det((a_{ji})) = λ^n det(A),
Definition 5.1. Let V be a vector space over the field F, let I be an index set and let (Ui)_{i∈I} be a family of subspaces of V. We say that V is the direct sum of the family of subspaces (Ui)_{i∈I} if, and only if, the following two conditions hold:

(i) V = ∑_{i∈I} Ui.

(ii) For each finite J ⊆ I and each family (uj)_{j∈J} in V such that uj ∈ Uj for each j ∈ J, one has

0 = ∑_{j∈J} uj ⇒ ∀ j∈J : uj = 0.

If V is the direct sum of the Ui, then we write V = ⊕_{i∈I} Ui.
Proposition 5.2. Let V be a vector space over the field F , let I be an index set and let
(Ui )i∈I be a family of subspaces of V . Then the following statements are equivalent:
(i) V = ⊕_{i∈I} Ui.

(ii) For each v ∈ V, there exists a unique finite subset Jv of I and a unique map σv : Jv −→ V \ {0}, j 7→ uj(v) := σv(j), such that

v = ∑_{j∈Jv} σv(j) = ∑_{j∈Jv} uj(v) ∧ ∀ j∈Jv : uj(v) ∈ Uj. (5.1)

(iii) V = ∑_{i∈I} Ui and

∀ j∈I : Uj ∩ ∑_{i∈I\{j}} Ui = {0}. (5.2)

(iv) V = ∑_{i∈I} Ui and, letting I′ := {i ∈ I : Ui ≠ {0}}, each family (ui)_{i∈I′} in V with ui ∈ Ui \ {0} for each i ∈ I′ is linearly independent.
Proof. “(i) ⇔ (ii)”: According to the definition of V = ⊕_{i∈I} Ui, the existence of Jv and σv such that (5.1) holds is equivalent to V = ∑_{i∈I} Ui. If (5.1) holds and Iv ⊆ I is finite such that v = ∑_{j∈Iv} τv(j) with τv : Iv −→ V \ {0} and τv(j) ∈ Uj for each j ∈ Iv, then define σv(j) := 0 for each j ∈ Iv \ Jv and τv(j) := 0 for each j ∈ Jv \ Iv. Then

0 = v − v = ∑_{j∈Jv∪Iv} (σv(j) − τv(j))

and Def. 5.1(ii) implies σv(j) = τv(j) for each j ∈ Jv ∪ Iv as well as Jv = Iv. Conversely, assume there exists J ⊆ I finite and uj ∈ Uj (j ∈ J) such that

0 = ∑_{j∈J} uj ∧ ∃ j0∈J : u_{j0} ≠ 0.

Then

u_{j0} = ∑_{j∈J\{j0}} (−uj)

shows (ii) does not hold (as v := u_{j0} has two different representations).
“(i) ⇒ (iii)”: If (i) holds and v ∈ Uj ∩ ∑_{i∈I\{j}} Ui for some j ∈ I, then there exists a finite J ⊆ I \ {j} such that v = ∑_{i∈J} ui with ui ∈ Ui for each i ∈ J. Since −v ∈ Uj and

0 = −v + ∑_{i∈J} ui,
Proof. Exercise.
Example 5.4. Consider the vector space V := R2 over R and let U1 := h{(1, 0)}i,
U2 := h{(0, 1)}i, U3 := h{(1, 1)}i. Then V = Ui + Uj and Ui ∩ Uj = {0} for each
i, j ∈ {1, 2, 3} with i 6= j. In particular, the sum V = U1 + U2 + U3 is not a direct sum,
showing Prop. 5.2(iii) can, in general, not be replaced by the condition Ui ∩ Uj = {0}
for each i, j ∈ I with i 6= j.
Definition 5.5. Let S be a set. Then P : S −→ S is called a projection if, and only if,
P 2 := P ◦ P = P .
Example 5.7. (a) Let V be a vector space over the field F and let B be a basis of
V . If cv : Bv −→ F \ {0}, Bv ⊆ V finite, is the coordinate map for v ∈ V , then,
(b) Let (Pi)_{i∈I} be a family of projections in L(V, V) such that, for each v ∈ V, the set Jv := {i ∈ I : Pi(v) ≠ 0} is finite. If Id = ∑_{i∈I} Pi (where ∑_{i∈I} Pi is defined as in (5.4)) and Pi Pj ≡ 0 for each i, j ∈ I with i ≠ j, then

V = ⊕_{i∈I} Im Pi.
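A minimal numerical illustration of (b) for I = {1, 2} (the concrete matrices below are chosen for the illustration and do not come from the text): P1 and P2 are projections with P1 + P2 = Id and P1 P2 = P2 P1 = 0, so R² = Im P1 ⊕ Im P2 and each vector decomposes uniquely.

import numpy as np

P1 = np.array([[1.0, 1.0], [0.0, 0.0]])    # projection onto span{(1,0)}
P2 = np.eye(2) - P1                        # the complementary projection

assert np.allclose(P1 @ P1, P1) and np.allclose(P2 @ P2, P2)
assert np.allclose(P1 @ P2, 0) and np.allclose(P2 @ P1, 0)

v = np.array([3.0, 2.0])
print(P1 @ v, P2 @ v)    # the unique decomposition v = u1 + u2 with ui in Im Pi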
6 Eigenvalues
Definition 6.1. Let V be a vector space over the field F and A ∈ L(V, V ).
(a) We call λ ∈ F an eigenvalue of A if, and only if, there exists 0 6= v ∈ V such that
Av = λv. (6.1)
Then each 0 6= v ∈ V such that (6.1) holds is called an eigenvector of A for the
eigenvalue λ; the set
E_A(λ) := ker(λ Id −A) = { v ∈ V : Av = λv } (6.2)

is then called the eigenspace of A with respect to the eigenvalue λ. The set

σ(A) := { λ ∈ F : λ is an eigenvalue of A } (6.3)
is called the spectrum of A.
(b) We call A diagonalizable if, and only if, there exists a basis B of V such that each v ∈ B is an eigenvector of A.
Remark 6.2. Let V be a finite-dimensional vector space over the field F , dim V =
n ∈ N, and assume A ∈ L(V, V ) to be diagonalizable. Then there exists a basis B =
{v1 , . . . , vn } of V , consisting of eigenvectors of A, i.e. Avi = λi vi , λi ∈ σ(A), for each
i ∈ {1, . . . , n}. Thus, with respect to B, A is represented by the diagonal matrix
diag(λ1, . . . , λn) (i.e. the matrix with diagonal entries λ1, . . . , λn and all other entries equal to 0),
which explains the term diagonalizable.
It will be a main goal of the present and the following sections to investigate under
which conditions, given A ∈ L(V, V ) with dim V < ∞, V has a basis B such that, with
respect to B, A is represented by a diagonal matrix. While such a basis does not always
exist, we will see that there always exist bases such that the representing matrix has a
particularly simple structure, a so-called normal form.
Theorem 6.3. Let V be a vector space over the field F and A ∈ L(V, V ).
(a) λ ∈ F is an eigenvalue of A if, and only if, ker(λ Id −A) 6= {0}, i.e. if, and only if,
λ Id −A is not injective.
(c) Let (vλ)_{λ∈σ(A)} be a family in V such that, for each λ ∈ σ(A), vλ is an eigenvector for
λ. Then (vλ )λ∈σ(A) is linearly independent (in particular, #σ(A) ≤ dim V ).
(d) A is diagonalizable if, and only if, V is the direct sum of the eigenspaces of A, i.e. if, and only if,

V = ⊕_{λ∈σ(A)} E_A(λ). (6.4)
(e) Let A be diagonalizable and, for each λ ∈ σ(A), let Pλ : V −→ V be the projection with

Im Pλ = E_A(λ) and ker Pλ = ⊕_{µ∈σ(A)\{λ}} E_A(µ).

Then

∀ λ∈σ(A) : A Pλ = Pλ A, (6.5b)
(c): Seeking a contradiction, assume (vλ )λ∈σ(A) to be linearly dependent. Then there
exists a minimal family of vectors (vλ1 , . . . , vλk ), k ∈ N, such that λ1 , . . . , λk ∈ σ(A) and
there exist c1 , . . . , ck ∈ F \ {0} with
0 = ∑_{i=1}^{k} ci v_{λi}.

We compute

0 = 0 − 0 = A( ∑_{i=1}^{k} ci v_{λi} ) − λk ∑_{i=1}^{k} ci v_{λi} = ∑_{i=1}^{k} ci λi v_{λi} − λk ∑_{i=1}^{k} ci v_{λi} = ∑_{i=1}^{k−1} ci (λi − λk) v_{λi}.
As we had chosen the family (vλ1 , . . . , vλk ) to be minimal, we obtain ci (λi − λk ) = 0 for
each i ∈ {1, . . . , k − 1}, which is a contradiction, since ci 6= 0 as well as λi 6= λk .
(d): If A is diagonalizable, then V has a basis B, consisting of eigenvectors of A. Letting, for each λ ∈ σ(A), Bλ := {b ∈ B : Ab = λb}, we have that Bλ is a basis of E_A(λ). Since B = ⋃̇_{λ∈σ(A)} Bλ (a disjoint union), (6.4) now follows from Prop. 5.3(a). Conversely, if (6.4) holds, then V has a basis of eigenvectors of A by means of Prop. 5.3(b).
(e): Exercise.
Corollary 6.4. Let V be a vector space over the field F and A ∈ L(V, V ). If dim V =
n ∈ N and A has n distinct eigenvalues λ1 , . . . , λn ∈ F , then A is diagonalizable.
The following examples illustrate the dependence of diagonalizability, and even of the
mere existence of eigenvalues, on the structure of the field F .
Example 6.5. (a) Let K ∈ {R, C} and let V be a vector space over K with dim V = 2 and ordered basis B := (v1, v2). Consider A ∈ L(V, V) such that Av1 = v2 and Av2 = −v1. With respect to B, A is then given by the matrix M := ( 0 −1 ; 1 0 ). In consequence,

M² = ( 0 −1 ; 1 0 ) ( 0 −1 ; 1 0 ) = ( −1 0 ; 0 −1 ),

i.e. A² = −Id. Thus, if λ ∈ K were an eigenvalue of A with eigenvector 0 ≠ v ∈ V, then

−v = A²v = λ²v ⇒ λ² = −1.

For K = R, this is impossible, i.e. σ(A) = ∅. For K = C, one checks A(v1 + iv2) = v2 − iv1 = −i(v1 + iv2) and A(v1 − iv2) = v2 + iv1 = i(v1 − iv2), showing A to be diagonalizable with σ(A) = {i, −i} and {v1 + iv2, v1 − iv2} being a basis of V of eigenvectors of A.
where

∀ a∈R : exp_a : R −→ R, exp_a(x) := e^{ax},

and

D : Vi −→ Vi, D(f) := f′.
(c) Let V be a vector space over the field F and assume char F ≠ 2. Moreover, let A ∈ L(V, V) such that A² = Id (for dim V = 2, a nontrivial example is given by each A represented by the matrix ( 0 1 ; 1 0 ) with respect to some ordered basis of V). We claim A to be diagonalizable with σ(A) = {−1, 1} and

V = E_A(−1) ⊕ E_A(1):
showing Im(A + Id) ⊆ E_A(1). If x ∈ Im(Id −A), then there exists v ∈ V such that x = (Id −A)v, implying Ax = (A − A²)v = −(Id −A)v = −x, i.e. x ∈ E_A(−1) and Im(Id −A) ⊆ E_A(−1).
(d) Let V be a vector space over the field F with dim V = 2 and ordered basis B :=
(v1 , v2 ). Consider A ∈ L(V, V ) such that
Av1 = v1 ,
Av2 = v1 + v2 .
With respect to B, A is then given by the matrix ( 1 1 ; 0 1 ). Due to Av1 = v1, we
have 1 ∈ σ(A). Let v ∈ V . Then there exist c1 , c2 ∈ F such that v = c1 v1 + c2 v2 .
If λ ∈ σ(A) and 0 6= v ∈ EA (λ), then
λ(c1 v1 + c2 v2 ) = λv = Av = c1 v1 + c2 v1 + c2 v2 .
As the coordinates with respect to the basis B are unique, we conclude λc1 = c1 +c2
and λc2 = c2 . If c2 6= 0, then the second equation yields λ = 1. If c2 = 0, then
c1 6= 0 and the first equation yields λ = 1. Altogether, we obtain σ(A) = {1} and,
since A 6= Id, A is not diagonalizable.
Definition 6.6. Let S be a set and A : S −→ S. Then U ⊆ S is called A-invariant if,
and only if, A(U ) ⊆ U .
Proposition 6.7. Let V be a vector space over the field F and let U ⊆ V be a subspace.
If A ∈ L(V, V ) is diagonalizable and U is A-invariant, then A ↾U is diagonalizable as
well.
Proof. It suffices to show

U = W := ∑_{λ∈σ(A)} ( U ∩ E_A(λ) ),

since, then,

U = ⊕_{λ∈σ(A↾U)} E_{A↾U}(λ)
due to Th. 6.3(d) and Prop. 5.2(iv). Thus, seeking a contradiction, let u ∈ U \ W (note u ≠ 0). Then there exist distinct λ1, . . . , λn ∈ σ(A), n ∈ N, such that u = ∑_{i=1}^{n} vi with vi ∈ E_A(λi) \ {0} for each i ∈ {1, . . . , n}, where we may choose u ∈ U \ W such that n ∈ N is minimal. Since U is A-invariant, we know

Au = ∑_{i=1}^{n} Avi = ∑_{i=1}^{n} λi vi ∈ U.
As λn u ∈ U as well, we conclude

Au − λn u = ∑_{i=1}^{n−1} (λi − λn) vi ∈ U.

Since the sum in (6.6) is direct, (6.7) and Prop. 5.2(ii) imply σu = {λ1, . . . , λn−1} and

∀ i∈{1,...,n−1}, λi≠λn : w_{λi} = (λi − λn) vi ⇒ vi ∈ U ∩ E_A(λi) ⊆ W.
We will now use Prop. 6.7 to prove a result regarding the simultaneous diagonalizability
of linear endomorphisms:
Theorem 6.8. Let V be a vector space over the field F and let A1 , . . . , An ∈ L(V, V ),
n ∈ N, be diagonalizable linear endomorphisms. Then the A1 , . . . , An are simultaneously
diagonalizable (i.e. there exists a basis B of V , consisting of eigenvectors of Ai for each
i ∈ {1, . . . , n}) if, and only if,
∀ i,j∈{1,...,n} : Ai Aj = Aj Ai. (6.8)
Then

∀ i,j∈{1,...,n} : Ai Aj b = λ_{i,b} λ_{j,b} b = Aj Ai b,
proving (6.8). Conversely, assume (6.8) to hold. We prove (6.9) via induction on n ∈ N.
For technical reasons, we actually prove (6.9) via induction on n ∈ N in the following,
clearly, equivalent form: There exists a family (Vk)_{k∈K} of subspaces of V such that

(i) V = ⊕_{k∈K} Vk.

(ii) For each k ∈ K, Vk has a basis Bk, consisting of eigenvectors of Ai for each i ∈ {1, . . . , n}.

(iii) For each k ∈ K and each i ∈ {1, . . . , n}, Vk is contained in some eigenspace of Ai, i.e.

∀ k∈K ∀ i∈{1,...,n} ∃ λik∈σ(Ai) : ( Vk ⊆ E_{Ai}(λik), i.e. ∀ v∈Vk : Ai v = λik v ).

(iv) For each k, l ∈ K with k ≠ l, there exists i ∈ {1, . . . , n} such that Vk and Vl are not contained in the same eigenspace of Ai, i.e.

∀ k,l∈K : ( k ≠ l ⇒ ∃ i∈{1,...,n} : λik ≠ λil ).
For n = 1, we can simply use K := σ(A1 ) and, for each λ ∈ K, Vλ := EA1 (λ). Thus,
consider n > 1. By induction, assume (i) – (iv) to hold with n replaced by n − 1. It
suffices to show that the spaces Vk , k ∈ K, are all An -invariant, i.e. An (Vk ) ⊆ Vk : Then,
according to Prop. 6.7, A_{nk} := An ↾Vk is diagonalizable, i.e.

Vk = ⊕_{λ∈σ(A_{nk})} E_{A_{nk}}(λ).
Now each Vkλ := EAnk (λ) has a basis Bkλ , consisting of eigenvectors of An . Since, for
each i ∈ {1, . . . , n − 1}, Bkλ ⊆ Vk ⊆ EAi (λik ), Bkλ consists of eigenvectors of Ai for each
i ∈ {1, . . . , n}. Letting Kn := {(k, λ) : k ∈ K, λ ∈ σ(Ank )}, (i) – (iv) then hold with
K replaced by Kn. Thus, it remains to show An(Vk) ⊆ Vk for each k ∈ K: Fix v ∈ Vk, k ∈ K. We have, by (6.8) and (iii),

∀ j∈{1,...,n−1} : Aj(An v) = An(Aj v) = An(λjk v) = λjk (An v).

According to (i), we may write An v = ∑_{l∈Kv} vl with finite Kv ⊆ K and vl ∈ Vl \ {0} for each l ∈ Kv. Then

∀ j∈{1,...,n−1} : ∑_{l∈Kv} λjk vl = λjk (An v) = Aj(An v) = Aj ∑_{l∈Kv} vl = ∑_{l∈Kv} λjl vl.
As the sum in (i) is direct, Prop. 5.2(ii) implies λjk vl = λjl vl for each l ∈ Kv . For each
l ∈ Kv , we have vl 6= 0, implying λjk = λjl for each j ∈ {1, . . . , n − 1}. Thus, by (iv),
k = l, i.e. Kv = {k} and An v = vk ∈ Vk as desired.
In general, computing eigenvalues is a difficult task (we will say more about this issue
later in Rem. 8.3(e) below). The following results can sometimes help, where Th. 6.9(a)
is most useful for dim V small:
Theorem 6.9. Let V be a vector space over the field F , dim V = n ∈ N. Let A ∈
L(V, V ).
(a) For each λ ∈ F, one has λ ∈ σ(A) if, and only if, det(λ Id −A) = 0.
(b) If there exists a basis B of V such that the matrix (a_{ji}) ∈ M(n, F) of A with respect to B is upper or lower triangular, then the diagonal elements a_{ii} are precisely the eigenvalues of A, i.e. σ(A) = { a_{ii} : i ∈ {1, . . . , n} }.
Proof. (a): According to Th. 6.3(a), λ ∈ σ(A) is equivalent to λ Id −A not being in-
jective, which (as V is finite-dimensional) is equivalent to det(λ Id −A) = 0 by Cor.
4.37(b).
Example 6.10. Consider the vector space V := R² over R and, with respect to the standard basis, let A ∈ L(V, V) be given by the matrix M := ( 3 −2 ; 1 0 ). Then, for each λ ∈ R,

det(λ Id −A) = det( λ−3 2 ; −1 λ ) = (λ − 3) · λ + 2 = λ² − 3λ + 2 = (λ − 1)(λ − 2),

showing σ(A) = {1, 2} by Th. 6.9(a).
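The computation can be replicated numerically; the following Python lines (illustrative) compare the roots of λ² − 3λ + 2 with the spectrum that numpy computes directly from M:

import numpy as np

M = np.array([[3.0, -2.0], [1.0, 0.0]])
print(np.roots([1.0, -3.0, 2.0]))    # roots of λ² − 3λ + 2: [2. 1.]
print(np.linalg.eigvals(M))          # the same spectrum, computed from M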
Remark 6.11. Let V be a vector space over the field F, dim V = n ∈ N, and A ∈ L(V, V). Moreover, let λ ∈ σ(A). Clearly, one has the increasing chain of subspaces

ker(A − λ Id) ⊆ ker(A − λ Id)² ⊆ · · · ⊆ V.

Since dim V = n < ∞, this chain must become stationary; let

r(λ) := min{ k ∈ N : ker(A − λ Id)^k = ker(A − λ Id)^{k+1} }. (6.10)

Then

∀ k∈N : ker(A − λ Id)^{r(λ)} = ker(A − λ Id)^{r(λ)+k} : (6.11)
Definition 6.12. Let V be a vector space over the field F, dim V = n ∈ N, and A ∈ L(V, V). Moreover, let λ ∈ σ(A). The number

m_a(λ) := dim ker(A − λ Id)^{r(λ)} ∈ {1, . . . , n}, (6.12)

where r(λ) is given by (6.10), is called the algebraic multiplicity of the eigenvalue λ, whereas

m_g(λ) := dim ker(A − λ Id) ∈ {1, . . . , n} (6.13)

is called its geometric multiplicity. We call λ simple if, and only if, m_g(λ) = m_a(λ) = 1; we call λ semisimple if, and only if, m_g(λ) = m_a(λ). For each k ∈ {1, . . . , r(λ)}, the space

E_A^k(λ) := ker(A − λ Id)^k

is called the generalized eigenspace of rank k of A, corresponding to the eigenvalue λ; each v ∈ E_A^k(λ) \ E_A^{k−1}(λ), k ≥ 2, is called a generalized eigenvector of rank k corresponding to the eigenvalue λ (an eigenvector v ∈ E_A(λ) is sometimes called a generalized eigenvector of rank 1).
Proposition 6.13. Let V be a vector space over the field F, dim V = n ∈ N, and A ∈ L(V, V).

(a) If λ ∈ σ(A), then

1 ≤ r(λ) ≤ n. (6.14)
(b) If λ ∈ σ(A), then, for each k ∈ {1, . . . , r(λ)}, the generalized eigenspace EAk (λ) is
A-invariant, i.e.
A EAk (λ) ⊆ EAk (λ).
(c) If A is diagonalizable, then mg (λ) = ma (λ) holds for each λ ∈ σ(A) (but cf. Ex.
6.14 below).
Proof. (a): Both (6.14) and (6.15) are immediate from Rem. 6.11 together with the defi-
nitions of r(λ), mg (λ) and ma (λ). Then (6.16a) follows from (6.15), since dim EAk+1 (λ) −
dim EAk (λ) ≥ 1; (6.16b) is immediate from (6.15); (6.16c) is immediate from (6.16b).
(c): Exercise.
Example 6.14. Let V be a vector space over the field F, dim V = n ∈ N, n ≥ 2. Let λ ∈ F. We show that there always exists a map A ∈ L(V, V) such that λ ∈ σ(A) and such that the difference between m_a(λ) and m_g(λ) is maximal, namely

m_a(λ) − m_g(λ) = n − 1:

Let B = (v1, . . . , vn) be an ordered basis of V, and let A ∈ L(V, V) be such that

Av1 = λv1 ∧ ∀ i∈{2,...,n} : Avi = λvi + v_{i−1}.

For k = 1, we have

(A − λ Id)v1 = λv1 − λv1 = 0, ∀ 1<i≤n : (A − λ Id)vi = λvi + v_{i−1} − λvi = v_{i−1},
completing the induction. In particular, since dim V = n, (6.17) yields, for each 1 <
k ≤ n, dim EAk (λ) − dim EAk−1 (λ) = 1 and dim EAk (λ) = k, implying mg (λ) = 1 and
ma (λ) = n, as claimed.
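Over F = R, where numerical rank computations are available, the chain of generalized eigenspaces in Ex. 6.14 can be checked directly; the following Python sketch (illustrative, for n = 4 and λ = 2) computes dim E_A^k(λ) = dim ker(A − λ Id)^k via the rank:

import numpy as np

n, lam = 4, 2.0
A = lam * np.eye(n) + np.diag(np.ones(n - 1), k=1)   # Avi = λvi + v_{i-1}
N = A - lam * np.eye(n)
for k in range(1, n + 1):
    dim_ker = n - np.linalg.matrix_rank(np.linalg.matrix_power(N, k))
    print(k, dim_ker)    # prints 1 1, 2 2, 3 3, 4 4: mg(λ) = 1, ma(λ) = n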
Definition 6.15. Let n ∈ N. Consider the vector space V := F n over the field F . All
of the notions we introduced in this section for linear endomorphisms A ∈ L(V, V ) (e.g.
eigenvalue, eigenvector, eigenspace, multiplicity of an eigenvalue, diagonalizability, etc.),
one also defines for square matrices M ∈ M(n, F): The notions are then meant with
respect to the linear map AM that M represents with respect to the standard basis of
F n.
If M ∈ M(n, F) is diagonalizable, i.e. M = T D T^{−1} with T ∈ GLn(F) and a diagonal matrix D ∈ M(n, F), then

∀ k∈N0 : M^k = T D^k T^{−1}.

Clearly, if one knows T and T^{−1}, this can tremendously simplify the computation of M^k, especially if k is large and M is fully populated. However, computing T and T^{−1} can also be difficult, and it depends on the situation whether it is a good option to pursue this route.
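A short numpy sketch of this route (illustrative; it tacitly assumes that M is in fact diagonalizable, which np.linalg.eig does not verify):

import numpy as np

M = np.array([[3.0, -2.0], [1.0, 0.0]])    # diagonalizable with σ = {1, 2}
eigvals, T = np.linalg.eig(M)              # columns of T are eigenvectors
k = 10
Mk = T @ np.diag(eigvals**k) @ np.linalg.inv(T)         # M^k = T D^k T^{-1}
print(np.allclose(Mk, np.linalg.matrix_power(M, k)))    # True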
7 Commutative Rings, Polynomials

Definition 7.1. Let R be a commutative ring with unity. Let

R[X] := R^{N0}_fin := { (ai)_{i∈N0} ∈ R^{N0} : ai = 0 for all but finitely many i ∈ N0 }

denote the set of polynomials over R (i.e. a polynomial over R is a sequence (ai)_{i∈N0} in R such that all, but finitely many, of the entries ai are 0, cf. [Phi19, Ex. 5.16(c)]). We then have the pointwise-defined addition and scalar multiplication on R[X], which it inherits from R^{N0}:

∀ f,g∈R[X] : (f + g) : N0 −→ R, (f + g)(i) := f(i) + g(i),
∀ f∈R[X] ∀ λ∈R : (λ · f) : N0 −→ R, (λ · f)(i) := λ f(i), (7.2)
where we know from [Phi19, Ex. 5.16(c)] that, with these compositions, R[X] forms a
vector space over R, provided R is a field and, then, B = {ei : i ∈ N0 }, where
∀ ei : N0 −→ R, ei (j) := δij ,
i∈N0
provides the standard basis of the vector space R[X]. In the current context, we will
now write X i := ei and we will call these polynomials monomials. Furthermore, we
define a multiplication on R[X] by letting
(ai)_{i∈N0}, (bi)_{i∈N0} 7→ (ci)_{i∈N0} := (ai)_{i∈N0} · (bi)_{i∈N0},

ci := ∑_{k+l=i} ak bl := ∑_{(k,l)∈(N0)²: k+l=i} ak bl = ∑_{k=0}^{i} ak b_{i−k}. (7.3)
If f := (ai )i∈N0 ∈ R[X], then we call the ai ∈ R the coefficients of f , and we define the
degree of f by

deg f := −∞ for f ≡ 0, deg f := max{i ∈ N0 : ai ≠ 0} for f ≢ 0 (7.4)
(defining deg(0) = −∞ instead of deg(0) = −1 has the advantage that formulas (7.5a),
(7.5b) below then also hold for the zero polynomial). If deg f = n ∈ N0 and an = 1,
then the polynomial f is called monic.
Remark 7.2. In the situation of Def. 7.1, using the notation X^i = ei, we can write addition, scalar multiplication, and multiplication in the following, perhaps, more familiar-looking forms: If λ ∈ R, f = ∑_{i=0}^{n} fi X^i, g = ∑_{i=0}^{n} gi X^i, n ∈ N0, f0, . . . , fn, g0, . . . , gn ∈ R, then

f + g = ∑_{i=0}^{n} (fi + gi) X^i,

λ f = ∑_{i=0}^{n} (λ fi) X^i,

f g = ∑_{i=0}^{2n} ( ∑_{k+l=i} fk gl ) X^i.
Recall from [Phi19, Def. and Rem. 4.41] that an element x in a ring with unity R is called invertible if, and only if, there exists x̄ ∈ R such that x x̄ = x̄ x = 1, and that (R*, ·) denotes the group of invertible elements of R.
(b) (R[X], +, ·) forms a commutative ring with unity, where 1 = X 0 is the neutral
element of multiplication.
ι : R −→ R[X], ι(r) := r X^0:
Proof. Since R has no nonzero zero divisors, (7.5c) holds, i.e., if f, g ∈ R[X] with f, g ≠ 0, then deg(f g) = deg f + deg g ≥ 0, showing f g ≠ 0, so that f, g cannot be zero divisors. First note that R* ⊆ (R[X])* always holds according to Lem. 7.3(b). If f ∈ R[X] \ R, then deg f ≥ 1 and (7.5c) implies deg(f g) ≥ 1 for each 0 ≠ g ∈ R[X], i.e. f g ≠ 1. This shows f ∉ (R[X])* and also that each g ∈ R \ R* is not in (R[X])*, thereby proving R* = (R[X])*.
Example 7.8. Let R := Z4 = Z/(4Z). Then (due to 4 = 0 in Z4)

(2X^1 + 1X^0)(2X^1 + 1X^0) = 0X^2 + 0X^1 + 1X^0 = X^0,

showing 2X^1 + 1X^0 to be invertible in R[X]; in particular, R* = (R[X])* can fail if R has nonzero zero divisors.
Next, we will prove a remainder theorem for polynomials, which can be seen as an
analogon of the remainder theorem for integers (cf. [Phi19, Th. D.1]):
Theorem 7.9 (Remainder Theorem). Let R be a commutative ring with unity. Let g = ∑_{i=0}^{d} gi X^i ∈ R[X], deg g = d, where gd ∈ R*. Then, for each f ∈ R[X], there exist unique polynomials q, r ∈ R[X] such that

f = q g + r ∧ deg r < d.
However, since deg(r − r′ ) < d and deg g = d, this can only hold for q = q ′ , which, in
turn, implies r = r′ as well.
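The existence part of Th. 7.9 is constructive; the following Python sketch (illustrative; it assumes coefficients in the field Q, represented by Fraction, so that the leading coefficient of g is automatically invertible) computes q and r by repeatedly subtracting multiples of g:

from fractions import Fraction

def poly_divmod(f, g):
    # f, g are coefficient lists (index i holds the coefficient of X^i), g != 0
    f = [Fraction(a) for a in f]
    g = [Fraction(a) for a in g]
    q = [Fraction(0)] * max(len(f) - len(g) + 1, 1)
    r = f[:]
    lead = g[-1]
    while len(r) >= len(g) and any(r):
        k = len(r) - len(g)              # kill the top term of r with coef·X^k·g
        coef = r[-1] / lead
        q[k] = coef
        for i, b in enumerate(g):
            r[i + k] -= coef * b
        while r and r[-1] == 0:
            r.pop()                      # drop leading terms that vanished
    return q, r

# f = X³ - 2X + 5, g = X - 3: f = q g + r with deg r < deg g
q, r = poly_divmod([5, -2, 0, 1], [-3, 1])
print(q, r)    # q = X² + 3X + 7, r = [26]; indeed ε₃(f) = 27 - 6 + 5 = 26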
Definition 7.11. Let F be a field. We call F algebraically closed if, and only if, for each
f ∈ F [X] with deg f ≥ 1, there exists λ ∈ F such that ǫλ (f ) = 0, i.e. such that λ is a zero
of f , as defined in Def. and Rem. 7.10 (cf. Th. 7.38). It is an important result of Algebra
that every field F is contained in an algebraically closed field (a proof is provided in Th.
D.16 of the Appendix – some additional required material from nonlinear algebra, not
covered in the main text, is also included in the Appendix).
Notation 7.12. Let R be a commutative ring with unity. One commonly uses the
simplified notation X := X 1 ∈ R[X] and s := s X 0 ∈ R[X] for each s ∈ R.
Proposition 7.13. Let R be a commutative ring with unity. For each f ∈ R[X] with
deg f = n ∈ N and each s ∈ R, there exists q ∈ R[X] with deg q = n − 1 such that
f = ǫ_s(f) + (X − s) q = ǫ_s(f) X^0 + (X^1 − s X^0) q, (7.9a)
where ǫs is the substitution homomorphism according to Def. and Rem. 7.10. In partic-
ular, if s is a zero of f , then
f = (X − s) q. (7.9b)
Proof. According to Th. 7.9, there exist q, r ∈ R[X] such that f = q(X − s) + r with
deg r < deg(X − s) = 1. Thus, r ∈ R and deg q = n − 1 by (7.5c) (which holds, as X − s
is monic). Applying ǫs to f = q(X − s) + r then yields ǫs (f ) = ǫs (q)(s − s) + r = r,
proving (7.9).
Proof. For f ∈ F [X] with deg f = n ∈ N0 , the representation (7.10a) follows from (7.9b)
combined with a straightforward induction, and (7.10b) is immediate from (7.10a). From
(7.9b) combined with the degree formula (7.5c), we then also know k ≤ n, i.e. f ∈ F [X]
with deg f = n ∈ N0 can have at most n zeros. If F is algebraically closed and q has no
zeros, then deg q = 0, implying k = n.
Example 7.15. (a) Since C is algebraically closed by [Phi16a, Th. 8.32], for each f ∈
C[X] with n := deg f ∈ N, there exist numbers c, λ1 , . . . , λn ∈ C such that
f = c ∏_{j=1}^{n} (X − λj) (7.11)
(the λ1 , . . . , λn are precisely all the zeros of f , some or all of which might be iden-
tical).
(b) For each f ∈ R[X] with n := deg f ∈ N, there exist numbers n1, n2 ∈ N0 and c, ξ1, . . . , ξ_{n1}, α1, . . . , α_{n2}, β1, . . . , β_{n2} ∈ R such that

n = n1 + 2n2 (7.12a)

and

f = c ∏_{j=1}^{n1} (X − ξj) ∏_{j=1}^{n2} (X² + αj X + βj): (7.12b)
Indeed, if f has only real coefficients, then we can take complex conjugates to obtain, for each λ ∈ C, that ǫ_{λ̄}(f) is the complex conjugate of ǫ_λ(f), i.e.

ǫ_λ(f) = 0 ⇒ ǫ_{λ̄}(f) = 0,

showing that the nonreal zeros of f (if any) must occur in conjugate pairs. Moreover,
Theorem 7.17. Let R be a commutative ring with unity and consider the map

φ : R[X] −→ Pol(R), f 7→ φ(f), φ(f)(x) := ǫx(f). (7.13)

(a) φ is a unital ring epimorphism, which is also linear (i.e. compatible with scalar multiplication).

(b) If R is finite, then φ is not a monomorphism.

(c) If F := R is an infinite field, then φ is a monomorphism (and, thus, an isomorphism).
Proof. (a): If f = X^0, then φ(f) ≡ 1. We also know from Def. and Rem. 7.10 that, for each x ∈ R, ǫx is a linear ring homomorphism. Thus, if f, g ∈ R[X] and λ ∈ R, then, for each x ∈ R,

φ(f + g)(x) = ǫx(f + g) = ǫx(f) + ǫx(g) = (φ(f) + φ(g))(x),
φ(f g)(x) = ǫx(f g) = ǫx(f) ǫx(g) = (φ(f) φ(g))(x),
φ(λ f)(x) = ǫx(λ f) = λ ǫx(f) = (λ φ(f))(x).

Moreover, φ is an epimorphism since, if P ∈ Pol(R) with P(x) = ∑_{i=0}^{n} fi x^i, where f0, . . . , fn ∈ R, n ∈ N0, then P = φ(f) with f = ∑_{i=0}^{n} fi X^i.
(b): If R is finite, then R^R and Pol(R) ⊆ R^R are finite, whereas R[X] = R^{N0}_fin is infinite (also cf. Ex. 7.18 below).
(c): If F is an infinite field and f ∈ F [X] is such that P := φ(f ) ≡ 0, then each λ ∈ F
is a zero of f , showing f to have infinitely many zeros. Thus, according to Cor. 7.14,
deg f ∈/ N, implying f = 0. Thus, ker φ = {0} and φ is a monomorphism.
Example 7.18. If R is a finite commutative ring with unity, then f := ∏_{λ∈R} (X^1 − λ) ∈ R[X] \ {0}, but, using φ of (7.13), φ(f) ≡ 0. For a concrete example, consider the field with two elements, R := Z2 = {0, 1}. Then, 0 ≠ f := X^2 + X^1 = X^1 (X^1 + 1) ∈ R[X], but, for each x ∈ R, φ(f)(x) = x(x + 1) = 0.
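In code, the concrete example reads as follows (Python, illustrative):

f = [0, 1, 1]    # coefficients of X¹ + X² over Z2

def phi_f(x, p=2):
    # the induced polynomial function φ(f), evaluated modulo p
    return sum(a * x**i for i, a in enumerate(f)) % p

print([phi_f(x) for x in (0, 1)])    # [0, 0], although f ≠ 0 in R[X]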
Remark 7.19. If F is a field and P ∈ Pol(F) can be written as P(x) = ∑_{i=0}^{n} ai x^i with ai ∈ F, n ∈ N0, then the representation is unique if, and only if, F has at least n + 1 elements (in particular, the monomial functions x 7→ x^i, i ∈ {0, . . . , n}, are linearly independent if, and only if, F has at least n + 1 elements): Indeed, if F has less than n + 1 elements, then, as in Ex. 7.18 (and again using φ of (7.13)), φ(f) = 0, where f := ∏_{λ∈F} (X^1 − λ) ∈ F[X] \ {0} and 1 ≤ deg f ≤ n. If g := ∑_{i=0}^{n} ai X^i, then P = φ(g) = φ(g + f). Since g + f ≠ g and deg(g + f) ≤ n, if we write g + f = ∑_{i=0}^{n} bi X^i with bi ∈ F, then P(x) = ∑_{i=0}^{n} bi x^i, showing the nonuniqueness of the representation of P. Conversely, assume F has at least n + 1 elements. If P(x) = ∑_{i=0}^{n} bi x^i with bi ∈ F is also a representation of P, then φ(h) = 0, where h := ∑_{i=0}^{n} (bi − ai) X^i. Thus, h has at least n + 1 zeros and deg h ≤ n together with Cor. 7.14 implies h = 0, i.e. ai = bi for each i ∈ {0, . . . , n}. In consequence, the representation of P is unique.
Definition 7.20. Let R be a commutative ring with unity.

(a) We call R integral or an integral domain if, and only if, R does not have any nonzero zero divisors.

(b) We call R Euclidean if, and only if, R is an integral domain and there exists a map deg : R \ {0} −→ N0 such that, for each f, g ∈ R with g ≠ 0, there exist q, r ∈ R, satisfying

f = q g + r ∧ ( deg r < deg g ∨ r = 0 ). (7.14)

The map deg is then called a degree map or a Euclidean map of R.
Example 7.21. (a) Every field F is a Euclidean ring, where we can choose deg : F ∗ =
F \ {0} −→ N0 , deg ≡ 0, as the degree map: If g ∈ F ∗ , then, given f ∈ F , we
choose q := f g −1 and r := 0. Then, clearly, (7.14) holds.
(b) Z is a Euclidean ring, where we can choose deg : Z \ {0} −→ N0 , deg(k) := |k|,
as the degree map: According to [Phi19, Th. D.1], for each f, g ∈ N, there exist
q, r ∈ N0 such that f = qg + r and 0 ≤ r < g, also implying −f = −qg − r with
| − r| < g, f = −q(−g) + r, and −f = q(−g) − r, showing that (7.14) can always
be satisfied (for f = 0, merely set q := r := 0).
(c) If F is a field, then F [X] is a Euclidean ring, where we can choose the degree map
as in Def. 7.1: If F is a field, then, for each 0 6= g ∈ F [X], we can apply Th. 7.9 to
obtain (7.14).
Definition 7.22. Let R be a commutative ring with unity.
(a) a ⊆ R is called an ideal in R if, and only if, the following two conditions hold:
(i) (a, +) is a subgroup of (R, +).
(ii) For each x ∈ R and each a ∈ a, one has ax ∈ a (which, as 1 ∈ R, is equivalent
to aR = a).
(b) An ideal a ⊆ R is called principal if, and only if, there exists a ∈ R such that
a = (a) := aR.
(c) R is called principal if, and only if, every ideal in R is principal.
(d) R is called a principal ideal domain if, and only if, R is both principal and integral.
Remark 7.23. Let R be a commutative ring with unity and let a ⊆ R be an ideal.
Since (a, +) is a subgroup of (R, +) and a, b ∈ a implies ab ∈ a, a is always a subring
of R. However, as 1 ∈ a implies a = R, (0) 6= a is a subring with unity if, and only if,
a = R.
The following Prop. 7.25 is the ideal analogue of [Phi19, Prop. 5.9] for vector spaces.
Proposition 7.25. Let R be a commutative ring with unity, A ⊆ R, and

S := { a ⊆ R : A ⊆ a ∧ a is an ideal in R }.

Then (A) := ⋂_{a∈S} a is an ideal in R, namely the smallest ideal in R containing A; (A) is called the ideal generated by A (the notation (a) for a principal ideal with a ∈ R can then be seen as a short form of ({a})). Moreover, A is called a generating set of the ideal b in R if, and only if, (A) = b.
Proof. Exercise.
Theorem 7.26. If R is a Euclidean ring, then R is a principal ideal domain.
Proof. Let R be a Euclidean ring with degree map deg : R \ {0} −→ N0. Moreover, let a ⊆ R be an ideal, a ≠ (0). Let a ∈ a \ {0} be such that deg(a) = min{ deg(x) : x ∈ a \ {0} }.
(b) Z and F [X] (where F is a field) are principal ideal domains according to Th. 7.26,
since we know from Ex. 7.21(b),(c) that both rings are Euclidean rings.
(c) According to Rem. 7.23, a proper subring with unity S of the commutative ring with unity R can never be an ideal in R (in particular, the unital ring monomorphism ι : S −→ R, ι(x) := x, shows that the image Im ι = S of a unital ring monomorphism does not need to be an ideal). For example, Z is a subring of Q, but not an ideal in Q; Q is a subring of R, but not an ideal in R.
(d) The ring Z4 = {0, 1, 2, 3} is principal: (2) = {0, 2} and, if a is an ideal in Z4 with
3 ∈ a, then 3 + 3 = 2 ∈ a and 2 + 3 = 1 ∈ a, showing a = Z4 . However, since
2 · 2 = 0, Z4 is not a principal ideal domain.
Proposition 7.28. Let F be a field and let R 6= {0} be a ring with unity. Then every
unital ring homomorphism φ : F −→ R is injective (in particular, every unital ring
homomorphism between fields is injective).
Proof. According to Prop. 7.24(b), ker φ is an ideal in F . Thus, from Ex. 7.27(a),
ker φ = {0} or ker φ = F . As φ is unital, φ(1) = 1 6= 0, showing ker φ = {0}, i.e. φ is
injective.
We now want to show that the analogue of the fundamental theorem of arithmetic
[Phi19, Th. D.6] holds in every Euclidean ring (in particular, in F [X], if F is a field)
and even in every principal ideal domain. We begin with some preparations:
(a) We call x, y ∈ R associated if, and only if, there exists a ∈ R∗ such that x = ay.
∃ x1,...,xn∈R : x1 a1 + · · · + xn an = d, (7.21)

which is known as Bézout's identity (usually for n = 2). An important special case is that, if 1 is a greatest common divisor of a1, . . . , an, then

∃ x1,...,xn∈R : x1 a1 + · · · + xn an = 1. (7.22)
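For the Euclidean rings of Ex. 7.21, the coefficients in (7.21) can be computed by the extended Euclidean algorithm; the following Python sketch (illustrative) treats R = Z and n = 2, and the same scheme works verbatim in F[X] once a division with remainder is available:

def xgcd(a, b):
    # returns (d, x, y) with x*a + y*b = d, d a greatest common divisor
    x0, x1, y0, y1 = 1, 0, 0, 1
    while b != 0:
        q, a, b = a // b, b, a % b    # one division step as in (7.14)
        x0, x1 = x1, x0 - q * x1
        y0, y1 = y1, y0 - q * y1
    return a, x0, y0

d, x, y = xgcd(240, 46)
print(d, x, y, x * 240 + y * 46)      # 2 -9 47 2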
(b) If F := R is a field, then F \ F ∗ = {0}, i.e. F has neither irreducible elements nor
prime elements.
(c) If R = Z, then R∗ = {−1, 1}, i.e. p ∈ Z is irreducible if, and only if, |p| is a prime
number in N (and, since Z is a principal ideal domain by Ex. 7.27(b), p ∈ Z is
irreducible if, and only if, it is prime).
(e) Suppose

R := Q + X^1 R[X] = { (fi)_{i∈N0} ∈ R[X] : f0 ∈ Q }.

Clearly, R is a subring of R[X]. Then X = X^1 ∈ R is irreducible, but X is not prime, since X | 2X² = (√2 X)(√2 X), but X ∤ √2 X, since √2 ∉ Q. Then, as a consequence of Prop. 7.31(b), R cannot be a principal ideal domain. Indeed, the ideal a := (X) + (√2 X) is not principal in R: Clearly, X and √2 X are not common multiples of any noninvertible f ∈ R.
Lemma 7.33. Let R be a principal ideal domain. Let I ≠ ∅ be an index set totally ordered by ≤, and let (ai)_{i∈I} be an increasing family of ideals in R. According to Prop. 7.24(f), we can form the ideal a := ⋃_{i∈I} ai. Then there exists i0 ∈ I such that a = a_{i0}.
Proof. As R is principal, there exists a ∈ R such that (a) = a. Since a ∈ a, there exists
i0 ∈ I such that a ∈ ai0 , implying (a) ⊆ ai0 ⊆ a = (a) and establishing the case.
Theorem 7.34. Let R be a principal ideal domain. Then every element 0 ≠ a ∈ R \ R* has a prime factorization, i.e. there exist prime elements p1, . . . , pn ∈ R, n ∈ N, such that

a = p1 · · · pn. (7.23)
Proof. Let S be the set of all ideals (a) in R that are generated by elements 0 6= a ∈ R\R∗
that do not have a prime factorization as in (7.23). We need to prove S = ∅. Seeking a
contradiction, assume S ≠ ∅ and note that set inclusion ⊆ provides a partial order on S. If C ≠ ∅ is a totally ordered subset of S, then, by Prop. 7.24(f), a := ⋃_{c∈C} c is an ideal in R and, by Lem. 7.33, there exists c ∈ C such that a = c, showing a ∈ S to provide an upper bound for C.
upper bound for C. Thus, Zorn’s lemma [Phi19, Th. 5.22] applies, yielding a maximal
element m ∈ S (i.e. maximal in S with respect to ⊆). Then there exists a ∈ R \ R∗ such
that m = (a) and a does not have a prime factorization. In particular, a is not prime,
i.e. a must be reducible by Prop. 7.31(b). Thus, there exist a1 , a2 ∈ R \ (R∗ ∪ {0}) such
that a = a1 a2 . Then (a) ( (a1 ) and (a) ( (a2 ): Indeed, if a1 = ra = ra1 a2 with r ∈ R,
then Prop. 7.30(a) yields 1 = ra2 and a2 ∈ R∗ (and analogously for (a) = (a2 )). Due
to the maximality of m = (a) in S, we conclude (a1 ), (a2 ) ∈ / S. Thus, a1 , a2 both must
have prime factorizations, yielding the desired contradiction that a = a1 a2 must have a
prime factorization as well.
Remark 7.35. In particular, we obtain from Th. 7.34 that each k ∈ Z \ {−1, 0, 1} and
each f ∈ F [X] with F being a field and deg f ≥ 1 has a prime factorization. However,
for R = Z and R = F [X], we can prove the existence of a prime factorization for
each 0 6= a ∈ R \ R∗ in a simpler way and without making use of Zorn’s lemma: Let
deg : R \ {0} −→ N0 be the degree map as in Ex. 7.21(b),(c): We conduct the proof via
induction on deg(a) ∈ N: If a itself is prime, then there is nothing to prove, and this,
in particular, takes care of the base case of the induction. If a is not prime, then it is
reducible, i.e. a = a1 a2 with a1 , a2 ∈ R \ (R∗ ∪ {0}). In particular, 1 ≤ deg a1 , deg a2 <
deg a. Thus, by induction a1 , a2 both have prime factorizations, implying a to have a
prime factorization as well.
defines a permutation π ∈ Sn such that for each j ∈ {1, . . . , n}, rj and pπ(j) are associ-
ated.
Theorem 7.38. Let F be a field. Then the following statements are equivalent:

(i) F is algebraically closed.

(ii) Every f ∈ F[X] with deg f = n ∈ N decomposes into linear factors, i.e. there exist c, λ1, . . . , λn ∈ F such that f = c ∏_{j=1}^{n} (X − λj).

(iii) Every irreducible f ∈ F[X] has deg f = 1.

Proof. “(i) ⇔ (ii)”: If F is algebraically closed, then (ii) is given by Cor. 7.14. That (ii) implies (i) is immediate.
“(i) ⇔ (iii)”: We already noted in Ex. 7.32(d) that each X − λ with λ ∈ F is irreducible, i.e. each sX − λ with s ∈ F \ {0} is irreducible as well. If F is algebraically closed and deg f > 1, then (7.9b) shows f to be reducible. Conversely, if (iii) holds, then an induction over n = deg f ∈ N shows each f ∈ F[X] with deg f ∈ N to have a zero: Indeed, f = aX + b with a, b ∈ F, a ≠ 0, has −b a^{−1} as a zero, and, if deg f > 1, then f is reducible, i.e. there exist g, h ∈ F[X] with 1 ≤ deg g, deg h < deg f such that f = g h. By induction, g and h must have a zero, i.e. f must have a zero as well.
In [Phi19, Ex. 4.39], we saw how to obtain the field of rational numbers Q from the ring
of integers Z. The same construction actually still works if Z is replaced by an arbitrary
integral domain R, resulting in the so-called field of fractions of R (in the following
section, we will use the field of fractions of F [X] in the definition of the characteristic
polynomial of A ∈ L(V, V ), where V is a vector space over F ). This gives rise to the
following Th. 7.39.
Theorem 7.39. Let R be an integral domain. One defines the field of fractions F of R as the quotient set F := (R × (R \ {0}))/∼ with respect to the equivalence relation ∼ on R × (R \ {0}) defined by

(a, b) ∼ (c, d) :⇔ a d = b c.
Proof. Exercise.
Example 7.40. (a) Q is the field of fractions of Z.
(b) If R is an integral domain, then we know from Prop. B.11 that R[X] is an integral
domain as well. The field of fractions of R[X] is denoted by R(X) and is called the
field of rational fractions over R.
Definition and Remark 7.41. Let R be an integral domain. We show that the field
of fractions of R (as defined in Th. 7.39) is the smallest field containing R: Let L be
some arbitrary field extension of R. Define
S := { F ⊆ L : R ⊆ F ∧ F is a subfield of L } (7.31)

and

K := ⋂_{F∈S} F. (7.32)
According to [Phi19, Ex. 4.36(d)], K is a field, namely the smallest subfield of L, con-
taining R. If F (R) denotes the field of fractions of R, then
φ : F(R) −→ K, φ(a/b) := a b^{−1} (7.33)

constitutes an isomorphism: Indeed, φ is well-defined, since

a/b = c/d ⇒ a d = b c ⇒ φ(a/b) = a b^{−1} = c d^{−1} = φ(c/d),
and since the definition of S guarantees a b^{−1} ∈ F for each a, b ∈ R with b ≠ 0 and each F ∈ S; φ is a homomorphism, since, for each a, b, c, d ∈ R with b, d ≠ 0,

φ(a/b) + φ(c/d) = a b^{−1} + c d^{−1} = (a d + b c) b^{−1} d^{−1} = φ((a d + b c)/(b d)) = φ(a/b + c/d),
φ(a/b) · φ(c/d) = a b^{−1} c d^{−1} = a c (b d)^{−1} = φ((a c)/(b d)) = φ(a/b · c/d);

φ is injective, since a, b ∈ R \ {0} implies φ(a/b) = a b^{−1} ≠ 0; φ is surjective, since Im φ ⊆ K is itself a subfield of L that contains R, implying K ⊆ Im φ and Im φ = K.
8 Characteristic Polynomial, Minimal Polynomial

We know from Th. 6.9(a) that, if V is a finite-dimensional vector space over the field F, then the eigenvalues of A ∈ L(V, V) are precisely the zeros of the polynomial function

p_A : F −→ F, p_A(t) := det(t Id −A).
In order to make the results of the previous section available, instead of associating a
polynomial function with A ∈ L(V, V ), we will associate an actual polynomial (this also
avoids issues related to the fact that, in the case of finite fields, different polynomials
can give rise to the same polynomial function according to Th. 7.17(b)). The idea is
to replace t 7→ det(t Id −A) with det(X Idn −MA ), where MA is the matrix of A with
respect to an ordered basis of V . If V is a vector space over the field F , then the entries
of the matrix X Idn −MA are elements of the ring F [X]. However, we defined deter-
minants only for matrices with entries in fields. Thus, to make the following definition
consistent with our definition of determinants, we consider the elements of X Idn −MA
to be elements of F (X), the field of rational fractions over F (cf. Ex. 7.40(b)):
Definition 8.1. Let V be a vector space over the field F , dim V = n ∈ N, and A ∈
L(V, V). Moreover, let B be an ordered basis of V and let MA ∈ M(n, F) be the matrix
of A with respect to B. Since F (X) is a field extension of F , we may consider MA as
an element of M(n, F (X)). We define
χA := det(X Idn −MA ) ∈ F [X]
to be the characteristic polynomial of A.
Proposition 8.2. Let V be a vector space over the field F , dim V = n ∈ N, and
A ∈ L(V, V ).
(a) The characteristic polynomial χA is well-defined by Def. 8.1, i.e. if B1 and B2 are
ordered bases of V and M1 , M2 are the matrices of A with respect to B1 , B2 , respec-
tively, then
χ1 := det(X Idn −M1 ) = χ2 := det(X Idn −M2 ).
(b) The spectrum σ(A) is precisely the set of zeros of χA .
proving (a).
(b): If λ ∈ F, then we have, by Th. 6.9(a),

λ ∈ σ(A) ⇔ ǫ_λ(χA) = det(λ Id −A) = 0,

thereby establishing the case.
Remark 8.3. Let V be a vector space over the field F , dim V = n ∈ N, and A ∈
L(V, V ).
(a) If B is an ordered basis of V, the matrix (a_{ji}) ∈ M(n, F) represents A with respect to B, and we let (c_{ji}) := (X Idn −(a_{ji})), then

χA = det( X Idn −(a_{ji}) ) = ∏_{i=1}^{n} (X − a_{ii}) + ∑_{π∈Sn\{Id}} sgn(π) ∏_{i=1}^{n} c_{iπ(i)} (8.1)
(b) Some authors prefer to define the characteristic polynomial of A as the polynomial χ̃A := det(MA − X Idn) instead. While χ̃A still has the property that σ(A) is precisely the set of zeros of χ̃A, χ̃A is monic only for n even (as χ̃A = (−1)^n χA). On the other hand, χ̃A has the advantage that ǫ0(χ̃A) = det(A).
(c) According to Prop. 8.2(b), the task of finding the eigenvalues of A is the same as
the task of finding the zeros of the characteristic polynomial χA . So one might hope
that only particularly simple polynomials can occur as characteristic polynomials.
However, this is not the case: Indeed, every monic polynomial of degree n occurs
as a characteristic polynomial: Let a1, . . . , an ∈ F and

f := X^n + ∑_{i=1}^{n} ai X^{n−i}.
(d) In Ex. 6.5(a), we saw that the considered linear endomorphism A had eigenvalues
for F = C, but no eigenvalues for F = R, which we can now relate to the fact that
χA = X 2 + 1 has no zeros over R, but χA = (X − i)(X + i) with zeros ±i over C.
(e) Given that eigenvalues are precisely the zeros of the characteristic polynomial, and
given that, according to (c), every monic polynomial of degree n can occur as the
characteristic polynomial of a matrix, it is not surprising that computing eigenvalues
is, in general, a difficult task, even if F is algebraically closed, guaranteeing the
eigenvalues’ existence. It is a result of Algebra that, for a generic polynomial
of degree at least 5, it is not possible to obtain its zeros using so-called radicals
(which are, roughly, zeros of polynomials of the form X k − λ, k ∈ N, λ ∈ F ,
see, e.g., [Bos13, Def. 6.1.1] for a precise definition) in finitely many steps (cf.,
e.g., [Bos13, Cor. 6.1.7]). In practice, one often has to make use of approximative
numerical methods (see, e.g., [Phi21, Sec. 7]). Having said that, let us note that
the problem of computing eigenvalues is, indeed, typically easier than the general
problem of computing zeros of polynomials. This is due to the fact that the difficulty
of computing the zeros of a polynomial depends tremendously on the form in which
the polynomial is given: It is typically hard if the polynomial is expanded into the form f = ∑_{i=0}^{n} ai X^i, but it is easy (trivial, in fact) if the polynomial is given in a factored form f = c ∏_{i=1}^{n} (X − λi). If the characteristic polynomial
is given implicitly by a matrix, one is, in general, somewhere between the two
extremes. In particular, for a large matrix, it usually makes no sense to compute
the characteristic polynomial in its expanded form (this is an expensive task in itself
and, in the process, one even loses the additional structure given by the matrix). It
makes much more sense to use methods tailored to the computation of eigenvalues,
and, if available, one should make use of additional structure a matrix might have.
Theorem 8.4. Let V be a vector space over the field F, dim V = n ∈ N, and A ∈ L(V, V). Then there exists an ordered basis B of V such that the matrix M of A with respect to B is triangular if, and only if, there exist distinct λ1, . . . , λl ∈ F, l ∈ N, and n1, . . . , nl ∈ N with

∑_{i=1}^{l} ni = n ∧ χA = ∏_{i=1}^{l} (X − λi)^{ni}. (8.2)

In this case, one has

∀ i∈{1,...,l} : ni = m_a(λi) (8.3)

and B can be chosen such that M is upper triangular of the form (8.4), where each λi occurs precisely ni times on the diagonal. Moreover, one then has

det A = det M = ∏_{λ∈σ(A)} λ^{m_a(λ)} = ∏_{i=1}^{l} λi^{ni}. (8.5)
Proof. If there exists a basis B of V such that the matrix M = (m_{ji}) of A with respect to B is triangular, then, by Def. 8.1 and Cor. 4.26,

χA = det(X Idn −M) = ∏_{i=1}^{n} (X − m_{ii}).

Combining factors, where the m_{ii} are equal, yields (8.2). For the converse, we assume (8.2) and prove the existence of the basis B such that M has the form of (8.4) via
induction on n. For n = 1, there is nothing to prove. Thus, let n > 1. Then λ1 must be an eigenvalue of A with some eigenvector 0 ≠ v1 ∈ V. Then, if B1 := (v1, . . . , vn) is an ordered basis of V, the matrix M1 of A with respect to B1 has the block form

M1 = ( λ1 * ; 0 N ), N ∈ M(n − 1, F).
According to Th. 4.25, we obtain

∏_{i=1}^{l} (X − λi)^{ni} = χA = (X − λ1) χN ⇒ χN = (X − λ1)^{n1−1} ∏_{i=2}^{l} (X − λi)^{ni}.
Let U := h{v1 }i and W := V /h{v1 }i. Then dim W = n−1 and, by [Phi19, Cor. 6.13(a)],
BW := (v2 + U, . . . , vn + U ) is an ordered basis of W . Let A1 ∈ L(W, W ) be such that,
with respect to BW , A1 has the matrix N . Then, by induction hypothesis, there exists
an ordered basis CW = (w2 + U, . . . , wn + U ) of W (w2 , . . . , wn ∈ V ), such that, with
respect to CW , the matrix N1 ∈ M(n − 1, F ) of A1 has the form (8.4), except that λ1
occurs precisely n1 − 1 times on the diagonal. That N1 is the matrix of A1 means, for
N1 = (ν_{ji})_{(j,i)∈{2,...,n}²}, that

∀ i∈{2,...,n} : A1(wi + U) = ∑_{j=2}^{n} ν_{ji} (wj + U),
implying

M = ( 1 0 ; 0 T1^{−1} ) ( λ1 * ; 0 N ) ( 1 0 ; 0 T1 ) = ( λ1 * ; 0 N1 ).
It remains to verify (8.3). Letting w1 := v1, we have B = (w1, . . . , wn). For each k ∈ {1, . . . , n1} and the standard column basis vector ek, we obtain

(M − λ1 Idn) ek = ((m_{ji}) − λ1 (δ_{ji})) ek = (m_{1k}, . . . , m_{k−1,k}, 0, . . . , 0)^t ⇒ (A − λ1 Id) wk = ∑_{α=1}^{k−1} m_{αk} wα,
whereas, for each k ∈ {n1 + 1, . . . , n} with corresponding diagonal element λ ∈ {λ2, . . . , λl}, we obtain

(M − λ1 Idn) ek = (m_{1k}, . . . , m_{k−1,k}, λ − λ1, 0, . . . , 0)^t ⇒ (A − λ1 Id) wk = (λ − λ1) wk + ∑_{α=1}^{k−1} m_{αk} wα,
where r(λ1) is as defined in Rem. 6.11. Now note that λ1 was chosen arbitrarily in the above argument. The same argument, applied to λi instead of λ1, shows the existence of a basis B′ such that λi, i ∈ {1, . . . , l}, appears in the upper left block of M. In particular, we obtain ni = m_a(λi) for each i ∈ {1, . . . , l}.
Finally, if (8.4) holds, then (8.5) is immediate from Cor. 4.26.
Corollary 8.5. Let V be a vector space over the algebraically closed field F , dim V =
n ∈ N, and A ∈ L(V, V ). Then there exists an ordered basis B of V such that the
matrix M of A with respect to B is triangular. Moreover, (8.5) then holds, i.e. det A is
the product of the eigenvalues of A, where each eigenvalue is multiplied according to its
algebraic multiplicity.
Proof. This is immediate from combining Th. 8.4 with Th. 7.38(ii).
Theorem 8.6. Let V be a vector space over the field F, dim V = n ∈ N, and A ∈ L(V, V). There exists a unique monic polynomial 0 ≠ µA ∈ F[X] (called the minimal polynomial of A), satisfying the following two conditions:

(i) ǫA(µA) = 0.

(ii) For each f ∈ F[X] with ǫA(f) = 0, one has µA | f.

Proof. The set a := { f ∈ F[X] : ǫA(f) = 0 } is an ideal in F[X] and, as F[X] is a principal ideal domain by Ex. 7.27(b), there exists g ∈ F[X] such that a = (g); such a generator g then satisfies both (i) and (ii). We need to show g ≠ 0, i.e. a ≠ (0) = {0}. To this end, note that, since dim L(V, V) = n², the n² + 1 maps Id, A, A², . . . , A^{n²} ∈ L(V, V) must be linearly dependent, i.e. there exist c0, . . . , c_{n²} ∈ F, not all 0, such that

0 = ∑_{i=0}^{n²} ci A^i,

showing 0 ≠ f := ∑_{i=0}^{n²} ci X^i ∈ a. If h ∈ F[X] also satisfies (i) and (ii), then h | g and g | h, implying g, h to be associated. In consequence, µA is the unique monic such element of F[X].
Remark 8.7. Let F be a field.
(a) We extend Def. 6.15 to the characteristic polynomial and to the minimal polynomial:
Let n ∈ N. Consider the vector space V := F n over the field F . If M ∈ M(n, F ),
then χM and µM denote the characteristic polynomial and the minimal polynomial
of the linear map AM that M represents with respect to the standard basis of F n .
(b) Let V be a vector space over the field F , dim V = n ∈ N. As M(n, F ) is a ring
extension of F , we can plug M ∈ M(n, F ) into elements of F [X]. Moreover, if
f ∈ F [X], A ∈ L(V, V ) and M ∈ M(n, F ) represents A with respect to a basis B
of V , then, due to [Phi19, Th. 7.10(a)], ǫM (f ) represents ǫA (f ) with respect to B.
Example 8.8. Let F be a field. Consider

M := ( 0 0 1 ; 0 0 0 ; 0 0 0 ) ∈ M(3, F):
We claim that the minimal polynomial is µM = X²: Indeed, M² = 0 implies ǫM(X²) = 0, and, if f = ∑_{i=0}^{n} fi X^i ∈ F[X], n ∈ N, then ǫM(f) = f0 Id3 + f1 M. Thus, if ǫM(f) = 0, then f0 = f1 = 0, implying X² | f and showing µM = X².
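The argument of Ex. 8.8 can be automated: one looks for the least k such that Id, M, . . . , M^k are linearly dependent and reads off the monic dependence. A Python sketch (illustrative, over F = R):

import numpy as np

M = np.array([[0.0, 0.0, 1.0],
              [0.0, 0.0, 0.0],
              [0.0, 0.0, 0.0]])
powers = [np.eye(3).flatten()]                       # vectorized Id, M, M², ...
while True:
    powers.append(np.linalg.matrix_power(M, len(powers)).flatten())
    A = np.column_stack(powers)
    if np.linalg.matrix_rank(A) < A.shape[1]:        # first linear dependence
        coeffs, *_ = np.linalg.lstsq(A[:, :-1], -A[:, -1], rcond=None)
        print(np.append(coeffs, 1.0))                # [0. 0. 1.]: µ_M = X²
        break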
Theorem 8.9 (Cayley-Hamilton). Let V be a vector space over the field F, dim V = n ∈ N, and A ∈ L(V, V). If χA and µA denote the characteristic and the minimal polynomial of A, respectively, then the following statements hold true:

(a) ǫA(χA) = 0, i.e. A is annihilated by its own characteristic polynomial.

(b) For each f ∈ F[X], one has ǫA(f) = 0 if, and only if, µA | f; in particular, by (a), µA | χA.

(c) λ ∈ σ(A) if, and only if, µA(λ) := ǫλ(µA) = 0, i.e. the eigenvalues of A are precisely the zeros of the minimal polynomial µA.
Proof. (a): Let B be an ordered basis of V and let (mji ) := MA be the matrix of A with
respect to B. Moreover, let N be the adjugate matrix of X Idn −MA , i.e., up to factors
of ±1, N contains the determinants of the (n − 1) × (n − 1) submatrices of X Idn −MA .
According to Th. 4.29(a), we then have
χA Idn = det(X Idn −MA ) Idn = N (X Idn −MA ). (8.6)
Since X Idn −MA contains only entries of degree at most 1 (deg(X − mii ) = 1, all other
entries having degree 0 or degree −∞), each entry nji of N has degree at most n − 1,
i.e.

∀ (j,i)∈{1,...,n}² ∃ b_{0,j,i},...,b_{n−1,j,i}∈F : n_{ji} = ∑_{k=0}^{n−1} b_{k,j,i} X^k.

If, for each k ∈ {0, . . . , n − 1}, we let Bk := (b_{k,j,i}) ∈ M(n, F), then N = ∑_{k=0}^{n−1} Bk X^k.
Plugging this into (8.6) yields
χA Idn = (B0 + B1 X + · · · + Bn−1 X n−1 ) (X Idn −MA )
= −B0 MA + (B0 − B1 MA ) X + (B1 − B2 MA ) X 2
+ · · · + (Bn−2 − Bn−1 MA ) X n−1 + Bn−1 X n . (8.7)
Writing χA = X^n + ∑_{i=0}^{n−1} ai X^i with a0, . . . , a_{n−1} ∈ F, the coefficients in front of each X^i in (8.7) must agree: Indeed, in each entry of the respective matrix, we have an element of F[X] and, in each entry, the coefficients of X^i must agree (due to the linear independence of the X^i) – hence, the matrix coefficients of X^i in (8.7) must agree as well. This yields
a0 Idn = −B0 MA ,
a1 Idn = B0 − B1 MA ,
..
.
an−1 Idn = Bn−2 − Bn−1 MA ,
Idn = Bn−1 .
Thus, ǫMA(χA) turns out to be the telescoping sum

ǫMA(χA) = (MA)^n + ∑_{i=0}^{n−1} ai (MA)^i = ∑_{i=0}^{n−1} ai Idn (MA)^i + Idn (MA)^n
= −B0 MA + ∑_{i=1}^{n−1} (B_{i−1} − Bi MA)(MA)^i + B_{n−1} (MA)^n = 0.
Proposition 8.12. Let V be a vector space over the field F, dim V = n ∈ N, and A ∈ L(V, V).

(a) One has χA | (µA)^n and, in particular, each irreducible factor of χA must be an irreducible factor of µA.

(b) There exists an ordered basis B of V such that the matrix M of A with respect to B is triangular if, and only if, there exist λ1, . . . , λl ∈ F, l ∈ N, and n1, . . . , nl ∈ N with

µA = ∏_{i=1}^{l} (X − λi)^{ni}.
Proof. (a): Let M ∈ M(n, F ) be a matrix representing A with respect to some ordered
basis of V . Let G be an algebraically closed field with F ⊆ G (cf. Def. 7.11). We can
consider M as an element of M(n, G) and, then, σ(M ) is precisely the set of zeros of
both χM = χA and µA in G. As G is algebraically closed, for each λ ∈ σ(M), there exist mλ, nλ ∈ N such that mλ ≤ nλ ≤ n and

µA = ∏_{λ∈σ(M)} (X − λ)^{mλ} | χA = ∏_{λ∈σ(M)} (X − λ)^{nλ}.

Letting q := ∏_{λ∈σ(M)} (X − λ)^{n mλ − nλ}, we have q ∈ G[X] as well as q = (µA)^n (χA)^{−1}, i.e. q χA = (µA)^n, proving χA | (µA)^n (in both G[X] and F[X], since q = (µA)^n (χA)^{−1} ∈ F(X) ∩ G[X] = F[X]).
(b) follows by combining (a) with Th. 8.4.
Theorem 8.13. Let V be a vector space over the field F , dim V = n ∈ N, and A ∈
L(V, V ). Suppose the minimal polynomial µA can be written in the form µA = g1 · · · gl ,
l ∈ N, where g1 , . . . , gl ∈ F [X] are such that, whenever i 6= j, then 1 is a greatest
common divisor of gi and gj . Then
V = ⊕_{i=1}^{l} ker gi(A) = ⊕_{i=1}^{l} ker ǫA(gi).
Proof. Define

∀ i∈{1,...,l} : hi := ∏_{k=1, k≠i}^{l} gk.

Then 1 is a greatest common divisor of h1, . . . , hl, such that (7.22) provides f1, . . . , fl ∈ F[X] with f1 h1 + · · · + fl hl = 1, and

∀ v∈V : v = Id v = ǫA(1) v = ∑_{i=1}^{l} ǫA(fi) ǫA(hi) v. (8.8)
We verify that, for each i ∈ {1, . . . , l}, ǫA(fi) ǫA(hi) v ∈ ker ǫA(gi): Indeed, since gi hi = µA, one has

ǫA(gi) ǫA(fi) ǫA(hi) v = ǫA(fi) ǫA(µA) v = 0.

Thus, (8.8) proves V = ∑_{i=1}^{l} ker ǫA(gi). According to Prop. 5.2(iii), it remains to show

∀ i∈{1,...,l} : U := ker ǫA(gi) ∩ ∑_{j∈{1,...,l}\{i}} ker ǫA(gj) = {0}.

To this end, fix i ∈ {1, . . . , l} and note ǫA(gi)(U) = {0} = ǫA(hi)(U). On the other hand, 1 is a greatest common divisor of gi, hi, i.e. (7.22) provides ri, si ∈ F[X] such that 1 = ri gi + si hi, yielding

{0} = (ǫA(ri) ǫA(gi) + ǫA(si) ǫA(hi))(U) = ǫA(1)(U) = Id(U) = U,
thereby completing the proof.
Theorem 8.14. Let V be a vector space over the field F, dim V = n ∈ N, and A ∈ L(V, V). Then A is diagonalizable if, and only if, there exist distinct λ1, . . . , λl ∈ F, l ∈ N, such that µA = ∏_{i=1}^{l} (X − λi).
Proof. If A is diagonalizable, then, letting λ1, . . . , λl denote the distinct elements of σ(A) and g := ∏_{i=1}^{l} (X − λi), each element of a basis of eigenvectors of A is annihilated by ǫA(g), implying ǫA(g) = 0. According to Th. 8.9(b), we have µA | g. Since, by Th. 8.9(c), each λ ∈ σ(A) is a zero of µA, we have deg µA = deg g. As both µA and g are monic, this means µA = g. Conversely, suppose µA = ∏_{i=1}^{l} (X − λi) with distinct λ1, . . . , λl ∈ F, l ∈ N. Then, by Th. 8.13,

V = ⊕_{i=1}^{l} ker ǫA(X − λi) = ⊕_{i=1}^{l} ker(A − λi Id) = ⊕_{i=1}^{l} E_A(λi),

i.e. A is diagonalizable by Th. 6.3(d).
(b) Let V be a vector space over the field F, dim V = n ∈ N, and let P ∈ L(V, V) be a projection, i.e. P² = P. Then P² − P = 0 and µP | (X² − X) = X(X − 1). Thus, we obtain the three cases

µP = X for P = 0, µP = X − 1 for P = Id, µP = X(X − 1) otherwise.
(c) Let V be a vector space over the field F, dim V = n ∈ N, and let A ∈ L(V, V) be a so-called involution, i.e. A² = Id. Then A² − Id = 0 and µA | (X² − 1) = (X + 1)(X − 1). Thus, we obtain the three cases

µA = X − 1 for A = Id, µA = X + 1 for A = − Id, µA = (X + 1)(X − 1) otherwise.
If A 6= ± Id, then, according to Th. 8.14, A is diagonalizable if, and only if, 1 6= −1,
i.e. if, and only if, char F 6= 2. Even though A 6= ± Id is not diagonalizable for
char F = 2, there still exists an ordered basis B of V such that the matrix M of A
with respect to B is triangular (all diagonal elements being 1), due to Prop. 8.12(b).
Proposition 8.16. Let V be a vector space over the real numbers R, dim V = n ∈ N,
and A ∈ L(V, V ).
(a) There exists a vector subspace U of V such that dim U ∈ {1, 2} and U is A-invariant
(i.e. A(U ) ⊆ U ).
Proof. Exercise.
Definition 9.1. Let V be a vector space over the field F and A ∈ L(V, V).

(a) A vector subspace U of V is called A-cyclic if, and only if, U is A-invariant (i.e. A(U) ⊆ U) and

∃ v∈V : U = ⟨{ A^i v : i ∈ N0 }⟩.

(b) V is called A-irreducible if, and only if, V = U1 ⊕ U2, with U1, U2 both A-invariant vector subspaces of V, implies U1 = V or U2 = V.
Proposition 9.2. Let V be a finite-dimensional vector space over the field F, r ∈ N, A ∈ L(V, V). Suppose V is A-cyclic,

V = ⟨{ A^i v : i ∈ N0 }⟩, v ∈ V,

and

∃ a0,...,ar∈F : µA = ∑_{i=0}^{r} ai X^i, deg µA = r.

Then B := {v, Av, . . . , A^{r−1} v} is a basis of V; in particular, dim V = deg µA = r and χA = µA.
Proof. To show that v, Av, . . . , A^{r−1} v are linearly independent, let λ0, . . . , λ_{r−1} ∈ F be such that

0 = ∑_{i=0}^{r−1} λi A^i v. (9.2)
Define g := ∑_{i=0}^{r−1} λi X^i ∈ F[X]. We need to show g = 0. Indeed, we have, by (9.2),

∀ i∈N0 : ǫA(g) A^i v = A^i ǫA(g) v = 0,

which, as the A^i v generate V, implies ǫA(g) = 0. Since deg g < deg µA, this yields g = 0 and the linear independence of v, Av, . . . , A^{r−1} v. We show ⟨B⟩ = V next: Seeking a contradiction, assume A^m v ∉ ⟨B⟩, where we choose m ∈ N to be minimal. Then m ≥ r; moreover, since ǫA(µA) v = 0 and µA is monic, we have A^r v = −∑_{i=0}^{r−1} ai A^i v ∈ ⟨B⟩, implying even m > r. There exist λ0, . . . , λ_{r−1} ∈ F such that A^{m−1} v = ∑_{i=0}^{r−1} λi A^i v. Then m > r yields the contradiction

A^m v = ∑_{i=0}^{r−1} λi A^{i+1} v ∈ ⟨B⟩.
A_U : U −→ U, A_U := A↾U,
A_{V/U} : V/U −→ V/U, A_{V/U}(v + U) := Av + U.

A_{V/U} is linear, since

A_{V/U}((v + U) + (w + U)) = A(v + w) + U = (Av + U) + (Aw + U) = A_{V/U}(v + U) + A_{V/U}(w + U),
A_{V/U}(λv + U) = A(λv) + U = λ(Av + U) = λ A_{V/U}(v + U).
showing M to have the claimed form with MU being the matrix of AU with respect to
BU . Moreover, by [Phi19, Cor. 6.13(a)], BV /U is, indeed, an ordered basis of V /U and
∀ i∈{l+1,...,n} : A_{V/U}(vi + U) = Avi + U = ∑_{j=1}^{n} m_{ji} vj + U = ∑_{j=l+1}^{n} m_{ji} vj + U,
(c): Since ǫA (µA )v = 0 for each v ∈ V , ǫA (µA )v = 0 for each v ∈ U , proving ǫAU (µA ) = 0
and µAU | µA . Similarly, ǫAV /U (µA )(v + U ) = ǫA (µA )v + U = 0 for each v ∈ V , proving
ǫAV /U (µA ) = 0 and µAV /U | µA .
Comparing Prop. 9.3(b),(c) above, one might wonder if the analogon of Prop. 9.3(b)
also holds for the minimal polynomials. The following Ex. 9.4 shows that, in general, it
does not:
Example 9.4. Let F be a field and V := F². Then, for A := Id ∈ L(V, V), µA = X − 1. If U is an arbitrary 1-dimensional subspace of V, then, using the notation of Prop. 9.3, µ_{A_U} = X − 1 = µ_{A_{V/U}}, i.e. µ_{A_U} µ_{A_{V/U}} = (X − 1)² ≠ µA.
Lemma 9.5. Let V be a vector space over the field F, dim V = n ∈ N, A ∈ L(V, V). Suppose V is A-cyclic and let g, h ∈ F[X] be such that µA = g h.

(a) The following holds:

(i) dim ker ǫA(h) = deg h.

(ii) ker ǫA(h) = Im ǫA(g).

(b) For each λ ∈ σ(A), one has dim E_A(λ) = 1.
Proof. (a): To prove (i), let U := Im h(A) = ǫA (h)(V ) and define AU , AV /U as in Prop.
9.3. As V is A-cyclic (say, generated by v ∈ V ), U is AU -cyclic (generated by ǫA (h)v)
and V /U is AV /U -cyclic (generated by v + U ). Thus, Prop. 9.2 yields
χA = µA, χ_{A_U} = µ_{A_U}, χ_{A_{V/U}} = µ_{A_{V/U}},

implying, by Prop. 9.3(b),

g h = µA = χA = χ_{A_U} χ_{A_{V/U}} = µ_{A_U} µ_{A_{V/U}}. (9.3)
If v ∈ V , then ǫA (g)ǫA (h)v = ǫA (µA )v = 0, showing ǫAU (g) = 0 and µAU | g, deg µAU ≤
deg g. Similarly, ǫAV /U (h)(v + U ) = ǫA (h)v + U = 0 (since ǫA (h)v ∈ U ), proving
ǫAV /U (h) = 0 and µAV /U | h, deg µAV /U ≤ deg h. Since we also have deg g + deg h =
deg µAU + deg µAV /U by (9.3), we must have deg g = deg µAU and deg h = deg µAV /U .
Thus,
dim U = deg χAU = deg µAU = deg g = deg µA − deg h.
According to the isomorphism theorem [Phi19, Th. 6.16(a)], we know

V / ker ǫA(h) ≅ Im ǫA(h) = U,

implying

deg h = deg µA − dim U = dim V − dim U = dim V − dim(V / ker ǫA(h)) = dim V − (dim V − dim ker ǫA(h)) = dim ker ǫA(h),
thereby proving (i). We proceed to prove (ii): Since ǫA(h)(Im ǫA(g)) = {0}, we have Im ǫA(g) ⊆ ker ǫA(h). To prove equality, we note that (i) must also hold with g instead of h and compute

dim Im ǫA(g) = dim V − dim ker ǫA(g) = deg χA − deg g = deg µA − deg g = deg h = dim ker ǫA(h),

completing the proof of (ii).
(b): Let λ ∈ σ(A). Then λ is a zero of µA and, by Prop. 7.13, there exists q ∈ F[X] such that µA = (X − λ) q. Hence, by (a)(i),

dim E_A(λ) = dim ker(A − λ Id) = deg(X − λ) = 1,

proving (b).
Lemma 9.6. Let V be a vector space over the field F , dim V = n ∈ N, A ∈ L(V, V ).
Suppose µA = g r , where g ∈ F [X] is irreducible and r ∈ N. If U is an A-cyclic
subspace of V such that U has maximal dimension, then µA↾U = µA and there exists an
A-invariant subspace W such that V = U ⊕ W .
ǫA(g^{r−s}) ǫA(g^s) v = ǫA(µA) v = 0,

showing

ǫA(g^s) v = ǫA(µ) v ∈ U ∩ ker ǫA(g^{r−s}). (9.4)

Since µ | µA = µ_{A_U}, we can apply Lem. 9.5(a)(ii) to obtain

U ∩ ker ǫA(g^{r−s}) = ker ǫ_{A_U}(g^{r−s}) = Im ǫ_{A_U}(g^s);

in particular, there exists uv ∈ U such that

ǫA(g^s) uv = ǫA(g^s) v. (9.5)

Set

w0 := v − uv, W := ⟨{ A^i w0 : i ∈ N0 }⟩,

proving V = U + W. Since, by (9.5),

ǫA(g^s) w0 = ǫA(g^s) v − ǫA(g^s) uv = 0,
Theorem 9.7. Let V be a vector space over the field F, dim V = n ∈ N, A ∈ L(V, V).

(a) If V is A-irreducible, then µA = g^r with g ∈ F[X] irreducible, r ∈ N, and V is A-cyclic.

(b) V is A-irreducible if, and only if, V is A-cyclic with µA = g^r, where g ∈ F[X] is irreducible and r ∈ N.

(c) V is the direct sum of finitely many A-invariant and A-irreducible subspaces of V.
Proof. (a): We must have µA = g r with r ∈ N and g ∈ F [X] irreducible, since, otherwise
V is A-reducible by Th. 8.13. In consequence, V is A-cyclic by Lem. 9.6.
(b): According to (a), it only remains to show that V being A-cyclic implies V to be
A-irreducible. Thus, let V be A-cyclic and V = V1 ⊕ V2 with A-invariant subspaces
V1, V2 ⊆ V. Then, by Prop. 9.3(c), there exist 1 ≤ r1, r2 ≤ r such that µ_{A_{V1}} = g^{r1} and µ_{A_{V2}} = g^{r2}, where we choose V1 such that r2 ≤ r1. Then ǫA(µ_{A_{V1}})(V1) = ǫA(µ_{A_{V1}})(V2) = {0}, i.e. ǫA(g^{r1}) = 0 and g^r = µA | g^{r1}, implying r1 = r.
We now have all preparations in place to prove the existence of normal forms having
matrices with block diagonal form, where the blocks all look like the matrix of (9.1).
However, before we state and prove the corresponding theorem, we still provide a propo-
sition that will help to address the uniqueness of such normal forms:
Proposition 9.8. Let V be a vector space over the field F, dim V = n ∈ N, A ∈ L(V,V). Moreover, suppose we have a decomposition
V = ⊕_{i=1}^{l} U_i,  l ∈ N,  (9.6)
where the U1 , . . . , Ul all are A-invariant and A-irreducible (and, thus, A-cyclic by Th.
9.7(a)) subspaces of V .
and the decomposition of (9.6) is a refinement of (9.7) in the sense that each U_i is contained in some ker ǫ_A(g_j^{r_j}).
(b) If µ_A = g^r with g ∈ F[X] irreducible, r ∈ N, then, for each i ∈ {1,...,l}, we have µ_{A_{U_i}} = g^{r_i}, dim U_i = r_i deg g, with 1 ≤ r_i ≤ r. If
∀ k ∈ N_0:  l_k := #{ i ∈ {1,...,l} : µ_{A_{U_i}} = g^k },
then
∀ s ∈ {0,...,r}:  dim Im ǫ_A(g^s) = (deg g) ∑_{k=s}^{r} l_k (k − s).  (9.9)
implying
ker ǫ_A(h) = ⊕_{i=1}^{l} ker ǫ_{A_{U_i}}(h),  (9.10)
due to the fact that ǫ_A(h) v = 0 if, and only if, ǫ_A(h) u_i = 0 for each i ∈ {1,...,l}. In the case where µ_{A_{U_i}} = g^{r_i} with r_i ≤ s, we have ker ǫ_{A_{U_i}}(h) = U_i, i.e.
while, in the case where µ_{A_{U_i}} = g^{r_i} with s < r_i, we can apply Lem. 9.5(a)(i) to obtain
proving (9.9). To see that the l_k are uniquely determined by A, observe l_k = 0 for k > r and k = 0, and, for 1 ≤ k ≤ r, (9.9) implies the recursion
l_r = (deg g)^{−1} dim Im ǫ_A(g^{r−1}),  (9.12a)
∀ s ∈ {1,...,r−1}:  l_s = (deg g)^{−1} dim Im ǫ_A(g^{s−1}) − ∑_{k=s+1}^{r} l_k (k − (s − 1)).  (9.12b)
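In computational practice, the recursion (9.12) can be carried out directly. The following Python sketch (our own illustration, not part of the original text) does this for the special case deg g = 1, i.e. g = X − λ, where dim Im ǫ_A(g^s) is the rank of (A − λ Id)^s; the matrix J below is assumed test data.

import numpy as np

# Sketch: recover the block counts l_k of (9.9) from the recursion (9.12),
# for g = X - lam (deg g = 1), so dim Im eps_A(g^s) = rank((A - lam*Id)^s).
def block_counts(A, lam, r):
    n = A.shape[0]
    dim_im = [np.linalg.matrix_rank(np.linalg.matrix_power(A - lam * np.eye(n), s))
              for s in range(r + 1)]
    l = [0] * (r + 1)
    l[r] = dim_im[r - 1]                                   # (9.12a) with deg g = 1
    for s in range(r - 1, 0, -1):                          # (9.12b)
        l[s] = dim_im[s - 1] - sum(l[k] * (k - (s - 1)) for k in range(s + 1, r + 1))
    return l[1:]                                           # l_1, ..., l_r

# Test data: one 3x3 and one 1x1 Jordan block for the eigenvalue 2:
J = np.diag([2., 2., 2., 2.]) + np.diag([1., 1., 0.], k=1)
print(block_counts(J, 2.0, 3))                             # -> [1, 0, 1]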
(b) If
V = ⊕_{i=1}^{m} W_i
Proof. (a): The existence of the claimed decomposition was already shown in Th. 9.7(c)
and the remaining statements are then provided by Prop. 9.2, where the A-invariance
of the Ui yields the block diagonal structure of M .
(b): We divide the proof into three steps:
Step 1: Assume U and W to be A-cyclic subspaces of V such that 1 ≤ s := dim U = dim W ≤ n and let u ∈ U, w ∈ W be such that B_U := {u, ..., A^{s−1}u}, B_W := {w, ..., A^{s−1}w} are bases of U, W, respectively. Define S ∈ L(U, W) by letting
∀ i ∈ {0,...,s−1}:  S(A^i u) := A^i w
(then S is invertible, as it maps the basis BU onto the basis BW ). We verify SA = AS:
Indeed,
and, letting a_0,...,a_s ∈ F be such that µ_{A_U} = µ_{A_W} = ∑_{i=0}^{s} a_i X^i (cf. Prop. 9.2),
SA(A^{s−1}u) = S(A^s u) = S(−∑_{i=0}^{s−1} a_i A^i u) = −∑_{i=0}^{s−1} a_i A^i w
∀ k ∈ N_0:  l_k = #I(U, k) = #I(W, k)
and, in particular, m = l, and the existence of π ∈ Sl such that, for each k ∈ N and each
i ∈ I(U, k), one has π(i) ∈ I(W, k). Thus, by Step 1, for each i ∈ {1, . . . , l}, there exists
an invertible S_i ∈ L(U_i, W_{π(i)}) such that S_i A = A S_i. Define T ∈ GL(V) by letting, for each v ∈ V such that v = ∑_{i=1}^{l} u_i with u_i ∈ U_i, T v := ∑_{i=1}^{l} S_i u_i. Then, clearly, T satisfies (9.13).
Step 3: We now consider the general situation of (b). Let µ_A = g_1^{r_1} ··· g_s^{r_s} be the prime factorization of µ_A (i.e. each g_i ∈ F[X] is irreducible, r_1,...,r_s ∈ N, s ∈ N). According to Prop. 9.8(a), there exist sets I_1,...,I_s ⊆ {1,...,l} and J_1,...,J_s ⊆ {1,...,m} such that
∀ k ∈ {1,...,s}:  ker ǫ_A(g_k^{r_k}) = ⊕_{i∈I_k} U_i = ⊕_{i∈J_k} W_i.
Then, by Step 2, we have #I_k = #J_k for each k ∈ {1,...,s}, implying
l = ∑_{k=1}^{s} #I_k = ∑_{k=1}^{s} #J_k = m
and the existence of a permutation π ∈ Sl such that π(Ik ) = Jk for each k ∈ {1, . . . , s}.
Again using Step 2, we can now, in addition, choose π ∈ S_l such that
∀ i ∈ {1,...,l}  ∃ T_i ∈ L(U_i, W_{π(i)}):  T_i invertible ∧ T_i A = A T_i.
Define T ∈ GL(V) by letting, for each v ∈ V such that v = ∑_{i=1}^{l} u_i with u_i ∈ U_i, T v := ∑_{i=1}^{l} T_i u_i. Then, clearly, T satisfies (9.13).
The following Prop. 9.10 can sometimes be helpful in actually finding the Ui of Th.
9.9(a):
(b) For each i ∈ {1,...,m}, in the decomposition V = ⊕_{k=1}^{l} U_k of Th. 9.9(a), there exists at least one U_k with U_k ⊆ V_i and dim U_k = deg(g_i^{r_i}).
(c) As in (b), let V = ⊕_{k=1}^{l} U_k be the decomposition of Th. 9.9(a). Then, for each i ∈ {1,...,m} and each k ∈ {1,...,l} such that U_k ⊆ V_i, one has dim U_k ≤ deg(g_i^{r_i}).
Proof. (a): Let i ∈ {1,...,m}. According to Prop. 9.3(c), we have µ_{A_{V_i}} = g_i^s with 1 ≤ s ≤ r_i. For each v ∈ V, there are v_1,...,v_m ∈ V such that v = ∑_{i=1}^{m} v_i and v_i ∈ V_i, implying
ǫ_A(g_1^{r_1} ··· g_{i−1}^{r_{i−1}} g_i^s g_{i+1}^{r_{i+1}} ··· g_m^{r_m}) v = 0
and r_i ≤ s, i.e. r_i = s.
(b): Let i ∈ {1,...,m}. According to (a), we have µ_{A_{V_i}} = g_i^{r_i}. Using the uniqueness of the decomposition V = ⊕_{k=1}^{l} U_k in the sense of Th. 9.9(b) together with Lem. 9.6, there exists U_k with U_k ⊆ V_i such that U_k is an A-cyclic subspace of V_i of maximal dimension. Then Lem. 9.6 also yields µ_{A_{U_k}} = µ_{A_{V_i}} = g_i^{r_i}, which, as U_k is A-cyclic, implies dim U_k = deg(g_i^{r_i}).
(c): Let i ∈ {1,...,m}. Again, we know µ_{A_{V_i}} = g_i^{r_i} by (a). If k ∈ {1,...,l} is such that U_k ⊆ V_i, then Prop. 9.3(c) yields µ_{A_{U_k}} | µ_{A_{V_i}} = g_i^{r_i}, showing dim U_k ≤ deg(g_i^{r_i}).
Remark 9.11. In general, in the situation of Prop. 9.10, the knowledge of µ_A and χ_A does not suffice to uniquely determine the normal form of Th. 9.9(a). Instead, for each g_i and each s ∈ {1,...,r_i}, one needs to determine dim Im ǫ_A(g_i^s); these dimensions then determine the numbers l_k of (9.9), i.e. the number of subspaces U_j with µ_{A_{U_j}} = g_i^k. This then determines the matrix M of Th. 9.9(a) (up to the order of the diagonal blocks), since one obtains precisely l_k many blocks of size k deg g_i and the entries of these blocks are given by the coefficients of g_i^k.
Example 9.12. (a) Let F be a field and V := F⁶. Assume A ∈ L(V,V) has
χ_A = (X − 2)²(X − 3)⁴,  µ_A = (X − 2)²(X − 3)³.
We want to determine the decomposition V = ⊕_{i=1}^{l} U_i of Th. 9.9(a) and the matrix M with respect to the corresponding basis of V given in Th. 9.9(a): We know from Prop. 9.10(b) that we can choose U_1 ⊆ ker(A − 2 Id)² with dim U_1 = 2 and U_2 ⊆ ker(A − 3 Id)³ with dim U_2 = 3. As dim V = 6, this then yields V = U_1 ⊕ U_2 ⊕ U_3 with dim U_3 = 1. We also know σ(A) = {2, 3} and, according to Th. 8.4, the algebraic multiplicities are m_a(2) = 2, m_a(3) = 4. Moreover,
4 = m_a(3) = dim ker(A − 3 Id)⁴ = dim ker ǫ_A((X − 3)⁴),
implying U_3 ⊆ ker(A − 3 Id)⁴. As (X − 2)² = X² − 4X + 4 and (X − 3)³ = X³ − 9X² + 27X − 27, M has the block diagonal form
      ( 0 −4               )
      ( 1  4               )
M =   (      0  0  27      )
      (      1  0 −27      )
      (      0  1   9      )
      (                 3  )
(all remaining entries being 0).
= X²(X − 1) − 1 − X − X = X³ − X² − 2X − 1 = X³ + X² + 1.
Since χ_A = X(X² + X) + 1 and χ_A = (X + 1)X² + 1, neither 0 nor 1 is a zero of χ_A; as deg χ_A = 3, this shows χ_A to be irreducible. Thus χ_A = µ_A, V is A-irreducible and A-cyclic and the matrix of A with respect to the basis of Th. 9.9(a) is (again making use of −1 = 1 in F)
      ( 0 0 1 )
M =   ( 1 0 0 ).
      ( 0 1 1 )
For fields that are algebraically closed, we can improve the normal form of Th. 9.9 to the so-called Jordan normal form:
Theorem 9.13 (Jordan Normal Form). Let V be a vector space over the field F, dim V = n ∈ N, A ∈ L(V,V). Assume there exist distinct λ_1,...,λ_m ∈ F such that
µ_A = ∏_{i=1}^{m} (X − λ_i)^{r_i},  σ(A) = {λ_1,...,λ_m},  m, r_1,...,r_m ∈ N.  (9.14)
Moreover, for each k ∈ {1,...,l}, there exists v_k ∈ U_k and i = i(k) ∈ {1,...,m} such that
U_k ⊆ ker(A − λ_i Id)^{r_i}
and
J_k := (v_k, (A − λ_i Id)v_k, ..., (A − λ_i Id)^{s_k−1} v_k),
A has a matrix in Jordan normal form, i.e. the block diagonal matrix
      ( N_1  0  ···  0  )
N :=  (  0  N_2 ···  0  ),
      (  ⋮        ⋱  ⋮  )
      (  0  ···  0  N_l )
each block (called a Jordan block) having the form
N_k = (λ_{i(k)}) ∈ M(1, F)  for s_k = 1,
      ( λ_{i(k)}   1                     )
      (       λ_{i(k)}   1               )
N_k = (              ⋱      ⋱            ) ∈ M(s_k, F)  for s_k > 1
      (            λ_{i(k)}   1          )
      (                   λ_{i(k)}       )
(λ_{i(k)} on the diagonal, 1 on the superdiagonal, 0 everywhere else).
(b) In the situation of (a) and recalling from Def. 6.12 that, for each λ ∈ σ(A), r(λ) is such that m_a(λ) = dim ker(A − λ Id)^{r(λ)} and, for each s ∈ {1,...,r(λ)},
E_A^s(λ) = ker(A − λ Id)^s
is called the corresponding generalized eigenspace of rank s of A, and each v ∈ E_A^s(λ) \ E_A^{s−1}(λ), s ≥ 2, is called a generalized eigenvector of rank s, we obtain
∀ i ∈ {1,...,m}:  r_i = r(λ_i),
∀ k ∈ {1,...,l}:  U_k ⊆ E_A^{s_k}(λ_{i(k)}) ⊆ E_A^{r_{i(k)}}(λ_{i(k)}).
Moreover, for each k ∈ {1,...,l}, v_k is a generalized eigenvector of rank s_k and the basis J_k consists of generalized eigenvectors, containing precisely one generalized eigenvector of rank s for each s ∈ {1,...,s_k}. Define
∀ i ∈ {1,...,m} ∀ s ∈ N:  l(i,s) := #{ k ∈ {1,...,l} : U_k ⊆ ker(A − λ_i Id)^{r_i} ∧ dim U_k = s }
(thus, l(i, s) is the number of Jordan blocks of size s corresponding to the eigenvalue
λi – apart from the slightly different notation used here, the l(i, s) are precisely the
numbers called lk in Prop. 9.8(b)). Then, for each i ∈ {1, . . . , m},
Thus, in general, one needs to determine, for each i ∈ {1,...,m} and each s ∈ {1,...,r_i}, dim ker(A − λ_i Id)^s to know the precise structure of N.
(c) For the sake of completeness and convenience, we restate Th. 9.9(b): If
V = ⊕_{i=1}^{m} W_i
Proof. (a),(b): The A-cyclic and A-irreducible subspaces U1 , . . . , Ul are given by Th.
9.9(a). As in Th. 9.9(a), for each k ∈ {1, . . . , l}, let vk ∈ Uk be such that
Thus, with respect to the ordered basis (w_{s_k−1}, ..., w_0), A_{U_k} has the matrix N_k. As each U_k is A-invariant, this also proves N to be the matrix of A with respect to J. Proposition 9.10(a),(c) yields s_k ≤ r_i ≤ dim ker(A − λ_i Id)^{r_i}. Moreover, (9.16) shows
that J_k contains precisely one generalized eigenvector of rank s for each s ∈ {1,...,s_k}. Finally, (9.15), i.e. the formulas for the l(i,s), are given by the recursion (9.12) (which was inferred from (9.9)), using that, in the current situation, g = X − λ_i, deg g = 1, and (with V_i := ker(A − λ_i Id)^{r_i}),
Both have the same total number of Jordan blocks, namely 4, which corresponds to
dim ker(N_1 − 2 Id_8) = dim ker(N_2 − 2 Id_8) = 4.
The differences appear in the generalized eigenspaces of higher order: N_1 has two linearly independent generalized eigenvectors of rank 2, whereas N_2 has three linearly independent generalized eigenvectors of rank 2, yielding
dim ker(N_1 − 2 Id_8)² − dim ker(N_1 − 2 Id_8) = 2,  i.e. dim ker(N_1 − 2 Id_8)² = 6,
dim ker(N_2 − 2 Id_8)² − dim ker(N_2 − 2 Id_8) = 3,  i.e. dim ker(N_2 − 2 Id_8)² = 7.
Next, N_1 has two linearly independent generalized eigenvectors of rank 3, whereas N_2 has one linearly independent generalized eigenvector of rank 3, yielding
dim ker(N_1 − 2 Id_8)³ − dim ker(N_1 − 2 Id_8)² = 2,  i.e. dim ker(N_1 − 2 Id_8)³ = 8,
dim ker(N_2 − 2 Id_8)³ − dim ker(N_2 − 2 Id_8)² = 1,  i.e. dim ker(N_2 − 2 Id_8)³ = 8.
From (9.15a), we obtain (with i = 1)
l_{N_1}(1,3) = 2,  l_{N_2}(1,3) = 1,
corresponding to N_1 having two blocks of size 3 and N_2 having one block of size 3. From (9.15b), we obtain (with i = 1)
l_{N_1}(1,2) = 8 − 4 − 2(3 − 1) = 0,  l_{N_2}(1,2) = 8 − 4 − 1(3 − 1) = 2,
corresponding to N_1 having no blocks of size 2 and N_2 having two blocks of size 2. To check consistency, we use (9.15b) again to obtain
l_{N_1}(1,1) = 8 − 0 − 0(2 − 0) − 2(3 − 0) = 2,  l_{N_2}(1,1) = 8 − 0 − 2(2 − 0) − 1(3 − 0) = 1,
corresponding to N_1 having two blocks of size 1 and N_2 having one block of size 1.
Remark 9.15. In the situation of Th. 9.13, we saw that, in order to find a matrix N in Jordan normal form for A, according to Th. 9.13(b), in general, for each i ∈ {1,...,m} and each s ∈ {1,...,r_i}, one has to know dim ker(A − λ_i Id)^s. On the other hand, given a matrix M of A, one might also want to find the transition matrix T ∈ GL_n(F) such that M = T N T^{−1}. As it turns out, if one has already determined generalized eigenvectors forming bases of ker(A − λ_i Id)^s, one may use these same vectors for the columns of T: Indeed, if M = T N T^{−1} and t_1,...,t_n denote the columns of T, then, if j ∈ {1,...,n} corresponds to a column of N with λ_i being the only nonzero entry, then
M t_j = T N T^{−1} t_j = T N e_j = T λ_i e_j = λ_i t_j,
showing t_j to be a corresponding eigenvector. If j ∈ {1,...,n} corresponds to a column of N having nonzero entries λ_i and 1 (above the λ_i), then
(M − λ_i Id_n) t_j = (T N T^{−1} − λ_i Id_n) t_j = T N e_j − λ_i t_j = T(e_{j−1} + λ_i e_j) − λ_i t_j = t_{j−1} + λ_i t_j − λ_i t_j = t_{j−1},
showing t_j to be a generalized eigenvector of rank ≥ 2 corresponding to the Jordan block containing the index j.
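As a small numerical illustration of this remark (a sketch with assumed example data, not taken from the text), one can build T column by column from a Jordan chain and verify M T = T N:

import numpy as np

# Assumed example data: M has the single eigenvalue 2 with one 2x2 Jordan
# block; v is a generalized eigenvector of rank 2.
M = np.array([[3., 1.],
              [-1., 1.]])
lam = 2.0
v = np.array([1., 0.])              # (M - 2 Id)^2 v = 0, (M - 2 Id) v != 0
t1 = (M - lam * np.eye(2)) @ v      # eigenvector: column 1 of T
T = np.column_stack([t1, v])        # column 2 is the rank-2 vector itself
N = np.array([[lam, 1.],
              [0., lam]])           # the corresponding Jordan block
assert np.allclose(M @ T, T @ N)    # i.e. M = T N T^{-1}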
(ii) hλx + µy, zi = λhx, zi + µhy, zi for each x, y, z ∈ X and each λ, µ ∈ K (i.e. an
inner product is K-linear in its first argument).
Lemma 10.2. For each inner product h·, ·i on a vector space X over K, the following
formulas are valid:
(a) hx, λy + µzi = λ̄hx, yi + µ̄hx, zi for each x, y, z ∈ X and each λ, µ ∈ K, i.e. h·, ·i
is conjugate-linear in its second argument, even linear for K = R. Together with
Def. 10.1(ii), this means that h·, ·i is a sesquilinear form, even a bilinear form for
K = R.
Remark 10.3. If X is a vector space over K with an inner product ⟨·,·⟩, then the map
‖·‖ : X −→ R_0^+,  ‖x‖ := √⟨x, x⟩,
defines a norm on X (cf. [Phi16b, Prop. 1.65]). One calls this the norm induced by the inner product.
Definition 10.4. Let X be a vector space over K. If ⟨·,·⟩ is an inner product on X, then (X, ⟨·,·⟩) is called an inner product space or a pre-Hilbert space. An inner product space is called a Hilbert space if, and only if, (X, ‖·‖) is a Banach space, where ‖·‖ is the induced norm, i.e. ‖x‖ := √⟨x, x⟩. Frequently, the inner product on X is understood and X itself is referred to as an inner product space or Hilbert space.
Example 10.5. (a) On the space K^n, n ∈ N, we define an inner product by letting, for each z = (z_1,...,z_n) ∈ K^n, w = (w_1,...,w_n) ∈ K^n:
z · w := ∑_{j=1}^{n} z_j w̄_j  (10.1)
(called the standard inner product on K^n, also the Euclidean inner product for K = R). Let us verify that (10.1), indeed, defines an inner product in the sense of Def. 10.1: If z ≠ 0, then there is j_0 ∈ {1,...,n} such that z_{j_0} ≠ 0. Thus, z · z = ∑_{j=1}^{n} |z_j|² ≥ |z_{j_0}|² > 0, i.e. Def. 10.1(i) is satisfied. Next, let z, w, u ∈ K^n and λ, µ ∈ K. One computes
(λz + µw) · u = ∑_{j=1}^{n} (λz_j + µw_j) ū_j = λ ∑_{j=1}^{n} z_j ū_j + µ ∑_{j=1}^{n} w_j ū_j = λ(z · u) + µ(w · u),
i.e. Def. 10.1(ii) is satisfied. For Def. 10.1(iii), merely note that
z · w = ∑_{j=1}^{n} z_j w̄_j = \overline{∑_{j=1}^{n} w_j z̄_j} = \overline{w · z}.
Hence, we have shown that (10.1) defines an inner product according to Def. 10.1. Due to [Phi16b, Prop. 1.59(b)], the induced norm is complete, i.e. K^n with the inner product of (10.1) is a Hilbert space.
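A quick numerical sanity check of the properties just verified (our own sketch; the helper ip is not from the text):

import numpy as np

# Our own helper realizing (10.1); z, w are test data.
def ip(z, w):
    return np.sum(z * np.conj(w))         # z . w = sum_j z_j conj(w_j)

z = np.array([1 + 2j, 3j])
w = np.array([2 - 1j, 1 + 1j])
assert np.isclose(ip(z, w), np.conj(ip(w, z)))               # Def. 10.1(iii)
assert ip(z, z).real > 0 and np.isclose(ip(z, z).imag, 0.0)  # Def. 10.1(i)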
implying
⟨f, f⟩ = ∫_a^b |f|² ≥ ∫_{[a,b]∩[t−δ,t+δ]} |f|² > 0.
Moreover,
∀ f, g ∈ X:  ⟨f, g⟩ = ∫_Ω f ḡ dµ = \overline{∫_Ω g f̄ dµ} = \overline{⟨g, f⟩},
(d) If K = R, then
∀ x, y ∈ X:  ⟨x, y⟩ = ¼(‖x+y‖² − ‖x−y‖²) = ½(‖x‖² + ‖y‖² − ‖x−y‖²),  (10.5)
where the second equality holds by (10.3). If K = C, then
∀ x, y ∈ X:  ⟨x, y⟩ = ¼(‖x+y‖² − ‖x−y‖² + i‖x+iy‖² − i‖x−iy‖²).
‖x+y‖² + ‖x−y‖² = ‖x‖² + ⟨x,y⟩ + ⟨y,x⟩ + ‖y‖² + ‖x‖² − ⟨x,y⟩ − ⟨y,x⟩ + ‖y‖² = 2(‖x‖² + ‖y‖²)
proves (10.3).
(d): If K = R, then
‖x+y‖² − ‖x−y‖² = 4⟨x, y⟩.
If K = C, then
proving (d).
One can actually also show (with more effort) that a normed space that satisfies (10.3)
must be an inner product space, see, e.g., [Wer11, Th. V.1.7].
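As an aside, the complex polarization identity of Lem. 10.6(d) is easy to test numerically; the following sketch (our own, with random test data) checks it for the standard inner product on C⁴:

import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=4) + 1j * rng.normal(size=4)
y = rng.normal(size=4) + 1j * rng.normal(size=4)

def ip(u, v):
    return np.sum(u * np.conj(v))   # standard inner product on C^n

def nsq(u):                         # ||u||^2
    return ip(u, u).real

lhs = ip(x, y)
rhs = (nsq(x + y) - nsq(x - y) + 1j * nsq(x + 1j * y) - 1j * nsq(x - 1j * y)) / 4
assert np.isclose(lhs, rhs)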
Definition 10.7. Let (X, ‖·‖), (Y, ‖·‖) be normed vector spaces over K, f : X −→ Y.
∀ v ∈ X:  ‖f(v)‖ = ‖v‖.
∀ u, v ∈ X:  ‖f(u) − f(v)‖ = ‖u − v‖
(i.e. if, and only if, f preserves the metric induced by the norm on X).
(c) If the norms on X and Y are induced via inner products ⟨·,·⟩ on X and Y, respectively, ‖v‖² = ⟨v, v⟩, then one calls f inner product-preserving if, and only if,
∀ u, v ∈ X:  ⟨f(u), f(v)⟩ = ⟨u, v⟩.
While, in general, neither norm-preserving nor isometric implies any of the other prop-
erties defined in Def. 10.7 (cf. Ex. 10.9(a),(b) below), there exist simple as well as subtle
relationships between these notions and also relationships with linearity:
Theorem 10.8. Let (X, ⟨·,·⟩), (Y, ⟨·,·⟩) be inner product spaces over K, f : X −→ Y. We consider X, Y with the respective induced norms and metrics, ‖v‖² = ⟨v, v⟩, d(u, v) = ‖u − v‖.
(d) If K = R and f is isometric, then f is affine (i.e. x 7→ f (x) − f (0) is linear, cf.
Def. 1.26)7 . As in (b) and (c), Ex. 10.9(c) below shows that the result does not
extend to K = C.
(e) If f is linear, then the following statements are equivalent (where the equivalence
between (i) and (ii) even holds in arbitrary normed vector spaces X, Y over K):
(i) f is isometric.
7
This result holds in more general situations: If X and Y are arbitrary normed vector spaces over
R and f : X −→ Y is isometric, then f must be linear, provided Y is strictly convex (cf. [FJ03, Th.
1.3.8] and see Ex. 10.9(e) for a definition of strictly convex spaces) or provided f is surjective (this is
the Mazur-Ulam Theorem, cf. [FJ03, Th. 1.3.5]). However, there exist nonlinear isometries into spaces
that are not strictly convex, cf. Ex. 10.9(e).
10 VECTOR SPACES WITH INNER PRODUCTS 141
(ii) f is norm-preserving.
(iii) f is inner product-preserving.
(this works in arbitrary normed vector spaces X, Y over R or C). According to (b), f is
then also inner product-preserving.
(d): Let f be isometric. To show f is affine, it suffices to show
g : X −→ Y,  g(x) := f(x) − f(0),
is linear. Due to
It remains to prove (ii) ⇒ (iii). To this end, let u, v ∈ X. Then (ii) and the linearity of f imply
⟨f(u), f(u)⟩ + ⟨f(u), f(v)⟩ + ⟨f(v), f(u)⟩ + ⟨f(v), f(v)⟩ = ⟨f(u) + f(v), f(u) + f(v)⟩ = ⟨f(u+v), f(u+v)⟩ = ‖f(u+v)‖² = ‖u+v‖² = ⟨u+v, u+v⟩ = ⟨u, u⟩ + ⟨u, v⟩ + ⟨v, u⟩ + ⟨v, v⟩
and, thus,
⟨f(u), f(v)⟩ + ⟨f(v), f(u)⟩ = ⟨u, v⟩ + ⟨v, u⟩.
Similarly,
⟨f(u), f(u)⟩ − i⟨f(u), f(v)⟩ + i⟨f(v), f(u)⟩ + ⟨f(v), f(v)⟩ = ⟨f(u+iv), f(u+iv)⟩ = ⟨u+iv, u+iv⟩ = ⟨u, u⟩ − i⟨u, v⟩ + i⟨v, u⟩ + ⟨v, v⟩
and, thus,
⟨f(u), f(v)⟩ − ⟨f(v), f(u)⟩ = ⟨u, v⟩ − ⟨v, u⟩.
Adding both results and dividing by 2 then yields (iii).
Example 10.9. Let (X, k · k), (Y, k · k) be normed vector spaces over K.
(a) Let 0 ≠ a ∈ X. Then
f : X −→ X,  f(x) := x + a,
is isometric due to ‖f(u) − f(v)‖ = ‖(u + a) − (v + a)‖ = ‖u − v‖; however, f is neither linear nor norm-preserving, since
‖f(0)‖ = ‖a‖ ≠ 0 = ‖0‖.
(b) The following maps g and h are norm-preserving, but neither continuous (with respect to the induced metric) nor linear. Thus, according to [Phi16b, Lem. 2.32(b)], the maps are not isometric and, using Th. 10.8(a), they are not inner product-preserving (if ⟨·,·⟩ is an inner product on X). To define g, let 0 ≠ a ∈ X and set
g : X −→ X,  g(x) := { x for x ≠ a;  −a for x = a }.
Clearly, g is norm-preserving, discontinuous in a, and nonlinear (e.g. 2a = g(2a) ≠ −2a = 2g(a)). The map
h : K −→ K,  h(x) := { x for Re x ∈ Q and Im x ∈ Q;  −x otherwise },
is, clearly, norm-preserving, nowhere continuous, except in 0, and nonlinear (e.g. −1 − √2 = h(1 + √2) ≠ 1 − √2 = h(1) + h(√2)).
(if ‖·‖ is induced by an inner product on Y, then (Y, ‖·‖) is strictly convex; whereas (R^n, ‖·‖_∞) is not strictly convex for n > 1). If (Y, ‖·‖) is not strictly convex and u, v ∈ Y with u ≠ v are such that (10.6) holds, then
f : R −→ Y,  f(s) := { su for s ≤ 1;  u + (s − 1)v for s > 1, }
10.3 Orthogonality
Definition 10.10. Let X, h·, ·i be an inner product space over K.
(a) x, y ∈ X are called orthogonal or perpendicular (denoted x ⊥ y) if, and only if,
hx, yi = 0.
(b) Let E ⊆ X. Define the perpendicular space E^⊥ to E (called E perp) by
E^⊥ := { y ∈ X : ∀ x ∈ E:  ⟨x, y⟩ = 0 }.  (10.7)
Caveat: As is common, we use the same symbol to denote the perpendicular space
that we used to denote the forward annihilator in Def. 2.13, even though these
objects are not the same: The perpendicular space is a subset of X, whereas the
forward annihilator is a subset of X ′ . In the following, when dealing with inner
product spaces, E ⊥ will always mean the perpendicular space.
(c) If X = V1 ⊕ V2 with subspaces V1 , V2 of X, then we call X the orthogonal sum of
V1 , V2 if, and only if, v1 ⊥ v2 for each v1 ∈ V1 , v2 ∈ V2 . In this case, we also write
X = V1 ⊥ V2 .
(d) Let S ⊆ X. Then S is an orthogonal system if, and only if, x ⊥ y for each x, y ∈ S with x ≠ y. A unit vector is x ∈ X such that ‖x‖ = 1 (with respect to the
induced norm on X). Then S is called an orthonormal system if, and only if, S
is an orthogonal system consisting entirely of unit vectors. Finally, S is called an
orthonormal basis if, and only if, it is a maximal orthonormal system in the sense
that, if S ⊆ T ⊆ X and T is an orthonormal system, then S = T (caveat: if X is
an infinite-dimensional Hilbert space, then an orthonormal basis of X is not(!) a
vector space basis of X).
Lemma 10.11. Let X, h·, ·i be an inner product space over K, E ⊆ X.
(a) E ∩ E ⊥ ⊆ {0}.
(b) E ⊥ is a subspace of X.
(c) X ⊥ = {0} and {0}⊥ = X.
which yields λj = 0 by Def. 10.1(i). Thus, we have shown that λj = 0 for each j ∈
{1, . . . , n}, which establishes the linear independence of S \ {0}.
(b): We compute
‖∑_{i=1}^{n} s_i‖² = ⟨∑_{i=1}^{n} s_i, ∑_{j=1}^{n} s_j⟩ = ∑_{i=1}^{n} ∑_{j=1}^{n} ⟨s_i, s_j⟩ = ∑_{i=1}^{n} ⟨s_i, s_i⟩ = ∑_{i=1}^{n} ‖s_i‖²,
To obtain orthogonal systems and orthonormal systems in inner product spaces, one
can apply the algorithm provided by the following Th. 10.13:
Theorem 10.13 (Gram-Schmidt Orthogonalization). Let (X, ⟨·,·⟩) be an inner product space over K with induced norm ‖·‖. Let x_0, x_1, ... be a finite or infinite sequence of vectors in X. Define v_0, v_1, ... recursively as follows:
v_0 := x_0,  v_n := x_n − ∑_{k=0, v_k≠0}^{n−1} (⟨x_n, v_k⟩ / ‖v_k‖²) v_k  (10.8)
for each n ∈ N, additionally assuming that n is less than or equal to the max index of the sequence x_0, x_1, ... if the sequence is finite. Then the set {v_0, v_1, ...} constitutes an orthogonal system. Of course, by omitting the v_k = 0 and by dividing each v_k ≠ 0 by its norm, one can also obtain an orthonormal system (nonempty if at least one v_k ≠ 0). Moreover, v_n = 0 if, and only if, x_n ∈ span{x_0,...,x_{n−1}}. In particular, if the x_0, x_1, ... are all linearly independent, then so are the v_0, v_1, ....
which, due to Prop. 10.12(a), implies v_n = 0. Finally, if all x_0, x_1, ... are linearly independent, then all v_k ≠ 0, k = 0, 1, ..., such that the v_0, v_1, ... are linearly independent by Prop. 10.12(a).
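The recursion (10.8) translates directly into an algorithm. The following Python sketch (our own helper, assuming vectors in K^n with the standard inner product ⟨a, b⟩ = ∑_j a_j b̄_j) implements it, skipping v_k = 0 as in the theorem:

import numpy as np

def gram_schmidt(xs, tol=1e-12):
    vs = []
    for x in xs:
        v = np.array(x, dtype=complex)
        for u in vs:
            nu = np.vdot(u, u).real                # ||v_k||^2
            if nu > tol:                           # skip the v_k = 0
                v = v - (np.vdot(u, x) / nu) * u   # <x_n, v_k>/||v_k||^2 * v_k
        vs.append(v)
    return vs

v0, v1, v2 = gram_schmidt([[1., 1., 0.], [1., 0., 1.], [0., 1., 1.]])
assert abs(np.vdot(v0, v1)) < 1e-12 and abs(np.vdot(v1, v2)) < 1e-12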
Example 10.14. In the space C[−1, 1] with the inner product according to Ex. 10.5(b), consider
∀ i ∈ N_0:  x_i : [−1, 1] −→ K,  x_i(x) := x^i.
We check that the first four orthogonal polynomials resulting from applying (10.8) to x_0, x_1, ... are given by
v_0(x) = 1,  v_1(x) = x,  v_2(x) = x² − 1/3,  v_3(x) = x³ − (3/5)x.
One has v_0 = x_0 ≡ 1 and, then, obtains successively from (10.8):
v_1(x) = x_1(x) − (⟨x_1, v_0⟩/‖v_0‖²) v_0(x) = x − (∫_{−1}^{1} x dx) / (∫_{−1}^{1} dx) = x,
v_2(x) = x_2(x) − (⟨x_2, v_0⟩/‖v_0‖²) v_0(x) − (⟨x_2, v_1⟩/‖v_1‖²) v_1(x)
  = x² − (∫_{−1}^{1} x² dx)/2 − ((∫_{−1}^{1} x³ dx) / (∫_{−1}^{1} x² dx)) x = x² − 1/3,
v_3(x) = x_3(x) − (⟨x_3, v_0⟩/‖v_0‖²) v_0(x) − (⟨x_3, v_1⟩/‖v_1‖²) v_1(x) − (⟨x_3, v_2⟩/‖v_2‖²) v_2(x)
  = x³ − (∫_{−1}^{1} x³ dx)/2 − ((∫_{−1}^{1} x⁴ dx) / (∫_{−1}^{1} x² dx)) x − ((∫_{−1}^{1} x³ (x² − 1/3) dx) / (∫_{−1}^{1} (x² − 1/3)² dx)) (x² − 1/3)
  = x³ − ((2/5)/(2/3)) x = x³ − (3/5)x.
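The above computations can also be verified symbolically; a sketch using sympy (our own check, assuming the L² inner product on [−1, 1] from Ex. 10.5(b) with K = R):

import sympy as sp

x = sp.symbols('x')

def ip(f, g):
    return sp.integrate(f * g, (x, -1, 1))   # <f, g> on [-1, 1]

vs = []
for i in range(4):                           # apply (10.8) to 1, x, x^2, x^3
    v = x**i
    for u in vs:
        v = sp.expand(v - ip(x**i, u) / ip(u, u) * u)
    vs.append(v)

assert sp.expand(vs[2] - (x**2 - sp.Rational(1, 3))) == 0
assert sp.expand(vs[3] - (x**3 - sp.Rational(3, 5) * x)) == 0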
Definition 10.15. Let (X, ⟨·,·⟩) and (Y, ⟨·,·⟩) be inner product spaces over K. We call X and Y isometrically isomorphic if, and only if, there exists an isometric linear isomorphism A ∈ L(X, Y) (i.e., by Th. 10.8(e), a linear isomorphism A, satisfying ⟨Au, Av⟩ = ⟨u, v⟩ for each u, v ∈ X).
Theorem 10.16. Let (X, ⟨·,·⟩) be a finite-dimensional inner product space over K.
(i) X = U ⊥ U^⊥.
(ii) dim U^⊥ = dim X − dim U.
(iii) (U^⊥)^⊥ = U.
(c): We already know that the existence of a linear isomorphism between X and Y
implies dim X = dim Y . Conversely, assume n := dim X = dim Y ∈ N, and let BX =
{x1 , . . . , xn }, BY = {y1 , . . . , yn } be orthonormal bases of X, Y , respectively. Define
A ∈ L(X, Y ) by letting Axi = yi for each i ∈ {1, . . . , n}. As A(BX ) = BY , A is a linear
isomorphism. Moreover, if λ_1,...,λ_n, µ_1,...,µ_n ∈ K, then
⟨A(∑_{i=1}^{n} λ_i x_i), A(∑_{j=1}^{n} µ_j x_j)⟩ = ∑_{i=1}^{n} ∑_{j=1}^{n} λ_i µ̄_j ⟨y_i, y_j⟩ = ∑_{i=1}^{n} ∑_{j=1}^{n} λ_i µ̄_j δ_ij
  = ∑_{i=1}^{n} ∑_{j=1}^{n} λ_i µ̄_j ⟨x_i, x_j⟩ = ⟨∑_{i=1}^{n} λ_i x_i, ∑_{j=1}^{n} µ_j x_j⟩,
showing A to be isometric.
(d): Let {x_1,...,x_n} be a basis of X such that {x_1,...,x_m}, 1 ≤ m ≤ n, is a basis of U, dim X = n, dim U = m. In this case, if v_1,...,v_n are given by Gram-Schmidt orthogonalization according to Th. 10.13, then Th. 10.13 yields {v_1,...,v_n} to be a basis of X and {v_1,...,v_m} to be a basis of U. Since {v_{m+1},...,v_n} ⊆ U^⊥, we have X = U + U^⊥. As we also know U ∩ U^⊥ = {0} by Lem. 10.11(a), we have shown X = U ⊥ U^⊥, then yielding dim U^⊥ = dim X − dim U as well. As a consequence of (ii), we have dim(U^⊥)^⊥ = dim U, which, together with U ⊆ (U^⊥)^⊥, yields (U^⊥)^⊥ = U.
Using Zorn’s lemma, one can extend Th. 10.16(b) to infinite-dimensional spaces (cf.
[Phi17b, Th. 4.31(a)]). However, as remarked before, an orthonormal basis of a Hilbert
space X is a vector space basis of X if, and only if, dim X < ∞ (cf. [Phi17b, Rem.
4.33]). Moreover, Th. 10.16(c) also extends to infinite-dimensional Hilbert spaces, which
are isometrically isomorphic if, and only if, they have orthonormal bases of the same
cardinality (cf. [Phi17b, Th. 4.31(c)]). If one adds the assumption that the subspace U be closed (with respect to the induced norm), then Th. 10.16(d) extends to infinite-dimensional Hilbert spaces as well (cf. [Phi17b, Th. 4.20(e),(f)]).
If X is a finite-dimensional inner product space, then all linear forms on X (i.e. all
elements of the dual X ′ ) are given by means of the inner product on X:
Theorem 10.17. Let (X, ⟨·,·⟩) be a finite-dimensional inner product space over K. Then the map
ψ : X −→ X′,  ψ(y) := α_y,  (10.9)
where
α_y : X −→ K,  α_y(a) := ⟨a, y⟩,  (10.10)
is bijective and conjugate-linear (in particular, each α ∈ X′ can be represented by y ∈ X, and, if K = R, then ψ is a linear isomorphism).
of X, consisting of all linear forms on X that are also continuous with respect to the
induced norm on X, cf. [Phi17b, Ex. 3.1] (recall that, if X is finite-dimensional, then
all linear forms on X are automatically continuous, cf. [Phi16b, Ex. 2.16]).
Proposition 10.19. (a) If M = (m_kl) ∈ M(n, C), n ∈ N, is such that
∀ x ∈ C^n:  x* M x = 0,  (10.11)
then M = 0.
(b) If (X, ⟨·,·⟩) is a finite-dimensional inner product space over C and A ∈ L(X, X) is such that
∀ x ∈ X:  ⟨Ax, x⟩ = 0,  (10.12)
then A = 0.
Caveat: This result does not extend to finite-dimensional vector spaces over R (cf. Ex. 10.20 below).
0 = e_α* M e_α = m_αα.
Now let α, β ∈ {1,...,n} with α ≠ β and assume m_αβ = a + bi, m_βα = c + di with a, b, c, d ∈ R. If x := (s + ti) e_α + e_β, then
0 = ∑_{k=1}^{n} ∑_{l=1}^{n} m_kl x̄_k x_l = m_αβ x̄_α x_β + m_βα x̄_β x_α = (a + bi)(s − ti) + (c + di)(s + ti)
  = as + bt + cs − dt + (bs − at + ct + ds)i
  = ((a + c)s + (b − d)t) + ((b + d)s + (c − a)t) i,
implying
(a + c)s + (b − d)t = 0  ∧  (b + d)s + (c − a)t = 0.
Choosing s = 0 and t = c − a yields c = a; choosing s = a + c and t = 0 then yields a + c = 2a = 0, implying a = 0 = c. Likewise, choosing s = 0 and t = b − d yields b = d; then choosing s = b + d and t = 0 yields b + d = 2b = 0, implying b = 0 = d. Thus, m_αβ = m_βα = 0, completing the proof that M = 0.
(b): Let n := dim X ∈ N. According to Th. 10.16(c), there exists a linear isomorphism I : X −→ C^n such that
∀ x, y ∈ X:  ⟨x, y⟩ = ⟨Ix, Iy⟩_2,
where ⟨·,·⟩_2 denotes the standard inner product on C^n (i.e. ⟨u, v⟩_2 = ∑_{k=1}^{n} u_k v̄_k). Thus, if we let B := I ∘ A ∘ I^{−1}, then B ∈ L(C^n, C^n) and, for each u ∈ C^n, ⟨Bu, u⟩_2 = ⟨A I^{−1}u, I^{−1}u⟩ = 0. Now, if M ∈ M(n, C) represents B with respect to the standard basis of C^n, then, for each u ∈ C^n, u* M u = ⟨Mu, u⟩_2 = ⟨Bu, u⟩_2 = 0, such that M = 0 by (a). Thus, B = 0, also implying A = I^{−1} ∘ B ∘ I = 0.
Example 10.20. Consider R^n, n ∈ N, n ≥ 2, with the standard inner product (i.e. ⟨u, v⟩ = ∑_{k=1}^{n} u_k v_k), and the standard basis {e_1,...,e_n}. Suppose the linear map A : R^n −→ R^n is defined by Ae_1 := −e_2, Ae_2 := e_1, Ae_k := 0 for each k ∈ {1,...,n} \ {1, 2}. Then A ≠ 0, but
∀ x = (x_1,...,x_n) ∈ R^n:  ⟨Ax, x⟩ = ⟨x_2 e_1 − x_1 e_2, x⟩ = x_2 x_1 − x_1 x_2 = 0.
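The contrast between Prop. 10.19 and Ex. 10.20 can be observed numerically; in the following sketch (our own, with n = 2), x^t A x vanishes for every real x, while a complex vector already detects A ≠ 0:

import numpy as np

# A is the map of Ex. 10.20 for n = 2 (skew-symmetric, A != 0).
A = np.array([[0., 1.],
              [-1., 0.]])                 # Ae1 = -e2, Ae2 = e1
rng = np.random.default_rng(1)
for _ in range(100):
    xr = rng.normal(size=2)
    assert abs(xr @ A @ xr) < 1e-12       # x^t A x = 0 for every real x
z = np.array([1.0, 1j])
print(z.conj() @ A @ z)                   # -> 2j: over C, x* A x = 0 fails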
Proposition 10.35 below will provide more thorough information on the real case in
comparison with Prop. 10.19.
A* : X_2 −→ X_1,  A* := ψ_1^{−1} ∘ A′ ∘ ψ_2,  (10.13)
(a) One has A* ∈ L(X_2, X_1), and A* is the unique map X_2 −→ X_1 such that
(i) A^{−1} ∈ L(X_2, X_1) exists if, and only if, (A*)^{−1} ∈ L(X_1, X_2) exists, and, in that case,
Proof. (a): Let A ∈ L(X_1, X_2). Then A* ∈ L(X_2, X_1), since A′ is linear, and ψ_1^{−1} and ψ_2 are both conjugate-linear. Moreover, we know A′ is the unique map on X_2′ such that
∀ β ∈ X_2′ ∀ x ∈ X_1:  A′(β)(x) = β(A(x)),
proving (10.14). For each y ∈ X_2 and each x ∈ X_1, we have ⟨Ax, y⟩ = (ψ_2(y))(Ax) = ((ψ_2(y)) ∘ A)(x). Then Th. 10.17 and (10.14) imply A*(y) = ψ_1^{−1}((ψ_2(y)) ∘ A), showing A* to be uniquely determined by (10.14).
(b): According to (a), A** is the unique map X_1 −→ X_2 such that
and
(λA)*(y) = (ψ_1^{−1} ∘ (λA)′ ∘ ψ_2)(y) = λ̄ (ψ_1^{−1} ∘ A′ ∘ ψ_2)(y) = (λ̄ A*)(y),
showing A ↦ A* to be conjugate-linear. Moreover, A ↦ A* is bijective due to (b).
(d): One has (Id_{X_1})* = ψ_1^{−1} ∘ (Id_{X_1})′ ∘ ψ_1 = ψ_1^{−1} ∘ Id_{X_1′} ∘ ψ_1 = Id_{X_1}.
(e): Let ψ3 : X3 −→ X3′ be given by Th. 10.17. Then
(f): We have
proving A to be isometric.
For extensions of Def. 10.21 and Th. 10.22 to infinite-dimensional Hilbert spaces, see
[Phi17b, Def. 4.34], [Phi17b, Cor. 4.35].
Definition 10.23. Let m, n ∈ N and let M := (m_kl) ∈ M(m, n, K) be an m × n matrix over K. We call
M̄ := (m̄_kl) ∈ M(m, n, K)
the complex conjugate matrix of M and
M* := (M̄)^t = \overline{(M^t)} ∈ M(n, m, K)
the adjoint matrix of M (thus, for K = R, the adjoint matrix is the same as the transpose matrix).
Theorem 10.24. Let (X, ⟨·,·⟩), (Y, ⟨·,·⟩) be finite-dimensional inner product spaces over K. Let A ∈ L(X, Y).
where (∗) holds, as the forming of complex conjugates commutes with the forming of sums and products of complex numbers. Next, using this fact again together with the linearity of forming the transpose of a matrix, we compute
χ_{A*} = det(X Id_n − M*) = det((X Id_n − M̄)^t) = det(X Id_n − M̄) = \overline{det(X Id_n − M)} = ∑_{k=0}^{n} ā_k X^k,
where the complex conjugate of a polynomial is formed by conjugating its coefficients. Thus,
λ ∈ σ(A*)  ⇔  ǫ_λ(χ_{A*}) = ∑_{k=0}^{n} ā_k λ^k = 0  ⇔  ǫ_{λ̄}(χ_A) = ∑_{k=0}^{n} a_k λ̄^k = 0  ⇔  λ̄ ∈ σ(A),
Definition 10.25. Let X, h·, ·i be an inner product space over K and let U be a
subspace of X such that X = U ⊥ U ⊥ . Then the linear projection PU : X −→ U ,
PU (u + v) := u for u + v ∈ X with u ∈ U and v ∈ U ⊥ is called the orthogonal projection
from X onto U .
Theorem 10.26. Let X, h·, ·i be an inner product space over K, let U be a subspace
of X such that X = U ⊥ U ⊥ , and let PU : X −→ U be the orthogonal projection onto
U . Moreover, let k · k denote the induced norm on X.
Proof. (a): Let x ∈ X and u ∈ U with u ≠ P_U(x). Then P_U(x) − u ∈ U and x − P_U(x) ∈ ker P_U = U^⊥. Thus,
‖u − x‖² = ‖P_U(x) − x‖² + ‖u − P_U(x)‖² > ‖P_U(x) − x‖²,
where the first equality holds by Prop. 10.12(b),
and, thus,
∀ i ∈ {1,...,n}:  ⟨x, u_i⟩ = ⟨P_U(x), u_i⟩ = λ_i,
Proposition 10.27. Let (X, ⟨·,·⟩) be an inner product space over K with induced norm ‖·‖. Let P ∈ L(X, X) be a projection. Then P is an orthogonal projection onto U := Im P (i.e. X = Im P ⊥ ker P) if, and only if, P = P*.
where (∗) holds due to the Cauchy-Schwarz inequality [Phi16b, (1.41)]. In consequence,
∀ x ∈ X:  ‖Px‖ ≤ ‖x‖.
y := u − (⟨u, v⟩/‖v‖²) v.
(due to Euler’s formula, relating the exponential function to sine and cosine, the elements
of Tn are known as trigonometric polynomials). Clearly, U := Tn is a subspace of the
space X := C[−π, π] of Ex. 10.5(b). As it turns out, we even have X = U ⊥ U ⊥ (we will
not prove this here, but it follows from [Phi17b, Th. 4.20(e)], since the finite-dimensional
subspace U is automatically a closed subspace, cf. [Phi17b, Th. 1.16(b)]). Thus, one
has an orthogonal projection PU from X onto U , yielding the best approximation of a
continuous function by a trigonometric polynomial. Moreover, if one has an orthonormal
10 VECTOR SPACES WITH INNER PRODUCTS 158
basis of U, then one can use Th. 10.26(b) to compute P_U(x) for each function x ∈ X. We verify that an orthonormal basis is given by the functions
u_k : [−π, π] −→ C,  u_k(t) := e^{ikt}/√(2π)  (k ∈ {−n,...,n}):
One computes, for each k, l ∈ {−n,...,n},
∫_{−π}^{π} u_k ū_l = (1/(2π)) ∫_{−π}^{π} e^{i(k−l)t} dt = { 1 for k = l;  (1/(2π)) [e^{i(k−l)t}/(i(k−l))]_{−π}^{π} = 0 for k ≠ l }.
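Combining this orthonormal basis with Th. 10.26(b) yields a computable best approximation; the following sketch (our own discretization, approximating the integrals by the trapezoidal rule) computes P_U(x):

import numpy as np

# P_U(x) = sum_k <x, u_k> u_k for the u_k above; integrals approximated
# numerically (our own choices of grid and test function).
n = 3
t = np.linspace(-np.pi, np.pi, 4001)
x = np.abs(t)                                    # a continuous function on [-pi, pi]
proj = np.zeros_like(t, dtype=complex)
for k in range(-n, n + 1):
    u_k = np.exp(1j * k * t) / np.sqrt(2 * np.pi)
    c_k = np.trapz(x * np.conj(u_k), t)          # <x, u_k>
    proj += c_k * u_k
# proj now approximates the best L^2-approximation of x in T_n (cf. Th. 10.26(a)).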
(a) We call A normal if, and only if, AA* = A*A; likewise, we call M normal if, and only if, M M* = M* M. We define
Nor(X) := { A ∈ L(X, X) : A is normal },
Nor_n(K) := { M ∈ M(n, K) : M is normal }.
(b) We call A Hermitian or self-adjoint if, and only if, A = A*; likewise, we call M Hermitian or self-adjoint if, and only if, M = M* (thus, for K = R, Hermitian is the same as symmetric, a notion we previously defined in [Phi19, Def. 7.25(c)] for quadratic matrices M with M = M^t and that is now extended to Hermitian maps A for K = R). We call A skew-Hermitian if, and only if, −A = A*; likewise, we call M skew-Hermitian if, and only if, −M = M* (thus, for K = R, skew-Hermitian is the same as skew-symmetric, a notion we previously defined in [Phi19, Def. 7.30(a)] for quadratic matrices M with −M = M^t and that is now extended to skew-Hermitian maps A for K = R). Moreover, we call A_Her := ½(A + A*) the Hermitian part of A, M_Her := ½(M + M*) the Hermitian part of M, A_skHer := ½(A − A*) the skew-Hermitian part of A, and M_skHer := ½(M − M*) the skew-Hermitian part of M (thus, for K = R, A_sym := A_Her and M_sym := M_Her are the same as the symmetric parts of A and M, respectively; A_skew := A_skHer and M_skew := M_skHer are the same as the skew-symmetric parts of A and M, respectively; notions previously defined in [Phi19, Def. 7.30(b)] for quadratic matrices M, now extended to maps A for K = R). We define
Herm(X) := { A ∈ L(X, X) : A is Hermitian },
Herm_n(K) := { M ∈ M(n, K) : M is Hermitian },
skHerm(X) := { A ∈ L(X, X) : A is skew-Hermitian },
skHerm_n(K) := { M ∈ M(n, K) : M is skew-Hermitian },
and, for K = R,
Sym(X) := { A ∈ L(X, X) : A is symmetric },
Sym_n(R) := { M ∈ M(n, R) : M is symmetric },
Skew(X) := { A ∈ L(X, X) : A is skew-symmetric },
Skew_n(R) := { M ∈ M(n, R) : M is skew-symmetric }.
(c) We call A ∈ GL(X) unitary if, and only if, A^{−1} = A*; likewise, we call M ∈ GL_n(K) unitary if, and only if, M^{−1} = M*. If K = R, then we also call unitary maps and matrices orthogonal. We define
U(X) := { A ∈ L(X, X) : A is unitary },
U_n(K) := { M ∈ M(n, K) : M is unitary },
and, for K = R,
O(X) := { A ∈ L(X, X) : A is orthogonal },
O_n(R) := { M ∈ M(n, R) : M is orthogonal }.
Proposition 10.30. Let (X, ⟨·,·⟩) be a finite-dimensional inner product space over K, n ∈ N. We have Herm(X) ⊆ Nor(X), Herm_n(K) ⊆ Nor_n(K), U(X) ⊆ Nor(X), U_n(K) ⊆ Nor_n(K),
(c) Let A, B ∈ Herm(X). Then A+B ∈ Herm(X) (if λ ∈ R, then also λA ∈ Herm(X),
showing Herm(X) to be a vector subspace of L(X, X) for K = R). If A ∈ GL(X),
then A−1 ∈ Herm(X). If AB = BA, then AB ∈ Herm(X). The analogous results
also hold for Hermitian matrices.
In general, for normal maps and normal matrices, neither the sum nor the product is
normal. However, one can show that, if A, B are normal with AB = BA, then A + B
and AB are normal – this makes use of the diagonalizability of normal maps over C and
is not quite as easy as one might think, cf. Prop. 10.43 below.
Proposition 10.32. Let (X, ⟨·,·⟩) be a finite-dimensional inner product space over K, n ∈ N.
(b) Herm(X) ∩ skHerm(X) = {0} and Herm_n(K) ∩ skHerm_n(K) = {0}.
(b) A can be uniquely decomposed into its Hermitian and skew-Hermitian parts, i.e. A = A_Her + A_skHer and, if A = S + B with S ∈ Herm(X) and B ∈ skHerm(X), then S = A_Her and B = A_skHer.
(c) A is Hermitian if, and only if, A = A_Her; A is skew-Hermitian if, and only if, A = A_skHer.
Caveat: This result does not extend to finite-dimensional inner product spaces over R: Over R, (ii) and (iii) hold for every H ∈ L(X, X), M ∈ M(n, R), even though, for n ≥ 2, not every such H, M is symmetric (cf. Ex. 10.20).
The following Prop. 10.35 is related to Prop. 10.19, highlighting important differences
between the complex and the real situation.
Proposition 10.35. Let (X, ⟨·,·⟩) be a finite-dimensional inner product space over R, n ∈ N.
∀ x ∈ R^n:  x* M x = 0,  (10.15)
then M = 0.
then A = 0.
(c) M ∈ M(n, R) is skew-symmetric if, and only if, (10.15) holds true.
(d) A ∈ L(X, X) is skew-symmetric if, and only if, (10.16) holds true.
Proof. (a): As in the proof of Prop. 10.19, we obtain from matrix multiplication
x^t M x = ∑_{k=1}^{n} ∑_{l=1}^{n} m_kl x_k x_l,
where ⟨·,·⟩_2 denotes the standard inner product on R^n. Moreover, we then also know I^t = I^{−1} from Th. 10.22(j). Thus, if we let B := I ∘ A ∘ I^{−1}, then B ∈ Sym(R^n), since B^t = (I^{−1})^t ∘ A^t ∘ I^t = B. Moreover,
proving 2⟨Ax, x⟩ = 0 and (10.16). Conversely, if (10.16) holds, then, since A_skHer is skew-symmetric,
(iii) The columns of M form an orthonormal basis of K^n with respect to the standard inner product on K^n.
(iv) The rows of M form an orthonormal basis of K^n with respect to the standard inner product on K^n.
(v) M^t is unitary.
(vi) M̄ is unitary.
“(i)⇔(iii)”: M^{−1} = M* implies
(m_{1k},...,m_{nk}) · (m_{1j},...,m_{nj}) = ∑_{l=1}^{n} m_lk m̄_lj = ∑_{l=1}^{n} m*_jl m_lk = { 0 for k ≠ j,  1 for k = j },  (10.17)
showing that the columns of M form an orthonormal basis of K^n with respect to the standard inner product on K^n. Conversely, if the columns of M form an orthonormal basis of K^n with respect to the standard inner product, then they satisfy (10.17), which implies M* M = Id.
“(i)⇔(iv)”: M^{−1} = M* implies
(m_{k1},...,m_{kn}) · (m_{j1},...,m_{jn}) = ∑_{l=1}^{n} m_kl m̄_jl = ∑_{l=1}^{n} m_kl m*_lj = { 0 for k ≠ j,  1 for k = j },  (10.18)
showing that the rows of M form an orthonormal basis of K^n with respect to the standard inner product on K^n. Conversely, if the rows of M form an orthonormal basis of K^n with respect to the standard inner product, then they satisfy (10.18), which implies M M* = Id.
“(i)⇔(v)”: Since the rows of M are the columns of M t , the equivalence of (i) and (v)
is immediate from (iii) and (iv).
“(i)⇔(vi)”: Since M̄ = (M*)^t, the equivalence of (i) and (vi) is immediate from (ii) and (v).
“(i)⇔(vii)” is a special case of what was shown in Th. 10.22(i).
“(vii)⇔(viii)” holds due to Th. 10.8(e).
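The equivalences of the preceding theorem are easily illustrated numerically (our own sketch; the rotation matrix is assumed test data):

import numpy as np

theta = 0.7
M = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])   # a rotation, hence orthogonal/unitary
assert np.allclose(M.conj().T @ M, np.eye(2))     # columns orthonormal: M* M = Id
assert np.allclose(M @ M.conj().T, np.eye(2))     # rows orthonormal:    M M* = Id
assert np.allclose(np.linalg.inv(M), M.conj().T)  # M^{-1} = M*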
Corollary 10.37. Let (X, ⟨·,·⟩) be a finite-dimensional inner product space over R, let f : X −→ X. Then f is an isometry (e.g. a Euclidean isometry of R^n) if, and only if, f = L + a with an orthogonal map L ∈ O(X) and a ∈ X.
Proof. (a): It suffices to show ⟨A*x − λ̄x, A*x − λ̄x⟩ = 0. To this end, we use Ax = λx to compute
⟨A*x − λ̄x, A*x − λ̄x⟩ = ⟨A*x, A*x⟩ − λ̄⟨x, A*x⟩ − λ⟨A*x, x⟩ + λλ̄⟨x, x⟩
  = ⟨AA*x, x⟩ − λ̄⟨Ax, x⟩ − λ⟨x, Ax⟩ + λλ̄⟨x, x⟩
  = ⟨A*Ax, x⟩ − λλ̄⟨x, x⟩ − λλ̄⟨x, x⟩ + λλ̄⟨x, x⟩
  = ⟨Ax, Ax⟩ − λλ̄⟨x, x⟩ = 0,
thereby establishing the case.
(b): Let dim U = m ∈ N and let B_U := {u_1,...,u_m} be an orthonormal basis of U. As U is A-invariant, there exist a_kl ∈ K such that
∀ l ∈ {1,...,m}:  Au_l = ∑_{k=1}^{m} a_kl u_k.
Define
∀ l ∈ {1,...,m}:  x_l := A*u_l − ∑_{k=1}^{m} ā_lk u_k.
To show the A*-invariance of U, it suffices to show that, for each l ∈ {1,...,m}, x_l = 0, i.e. ⟨x_l, x_l⟩ = 0. To this end, for each l ∈ {1,...,m}, we compute
⟨x_l, x_l⟩ = ⟨A*u_l, A*u_l⟩ − ∑_{k=1}^{m} a_lk ⟨A*u_l, u_k⟩ − ∑_{k=1}^{m} ā_lk ⟨u_k, A*u_l⟩ + ∑_{j=1}^{m} ∑_{k=1}^{m} ā_lk a_lj ⟨u_k, u_j⟩
  = ⟨A*Au_l, u_l⟩ − ∑_{k=1}^{m} a_lk ⟨u_l, Au_k⟩ − ∑_{k=1}^{m} ā_lk ⟨Au_k, u_l⟩ + ∑_{k=1}^{m} |a_lk|²
  = ∑_{k=1}^{m} |a_kl|² − ∑_{k=1}^{m} ∑_{j=1}^{m} ā_jk a_lk ⟨u_l, u_j⟩ − ∑_{k=1}^{m} ∑_{j=1}^{m} a_jk ā_lk ⟨u_j, u_l⟩ + ∑_{k=1}^{m} |a_lk|²
  = ∑_{k=1}^{m} |a_kl|² − ∑_{k=1}^{m} |a_lk|² − ∑_{k=1}^{m} |a_lk|² + ∑_{k=1}^{m} |a_lk|² = ∑_{k=1}^{m} |a_kl|² − ∑_{k=1}^{m} |a_lk|²,
implying
∑_{l=1}^{m} ⟨x_l, x_l⟩ = 0.
As ⟨x_l, x_l⟩ ≥ 0 for each l ∈ {1,...,m}, this implies the desired ⟨x_l, x_l⟩ = 0 for each l ∈ {1,...,m}, thereby proving A*(U) ⊆ U. We will now make use of this result to also show A(U^⊥) ⊆ U^⊥: Let u ∈ U and x ∈ U^⊥. Then
⟨u, Ax⟩ = ⟨A*u, x⟩ = 0,
where the last equality holds since A*u ∈ U, proving Ax ∈ U^⊥.
Theorem 10.39. Let (X, ⟨·,·⟩) be an inner product space over C, dim X = n ∈ N.
(i) A ∈ Nor(X).
(ii) There exists an orthonormal basis B of X, consisting of eigenvectors of A.
(iii) There exists f ∈ C[X], deg f ≤ n − 1, such that A* = ǫ_A(f), where, as before, ǫ_A : C[X] −→ L(X, X) denotes the substitution homomorphism introduced in Def. and Rem. 7.10.
Proof. (a): “(i)⇒(ii)”: We prove the existence of the orthonormal basis of eigenvectors via induction on n ∈ N. For n = 1, there is nothing to prove. Thus, let n > 1. As C is algebraically closed, there exists λ ∈ σ(A). Let 0 ≠ v ∈ X be a corresponding eigenvector and U := span{v}. According to Prop. 10.38(b), both U and U^⊥ are A-invariant. Moreover, A↾_{U^⊥} is normal, since, if A and A* commute on X, they also commute on the subspace U^⊥. Thus, by induction hypothesis, U^⊥ has an orthonormal basis B′, consisting of eigenvectors of A. Thus, X also has an orthonormal basis, consisting of eigenvectors of A.
“(ii)⇒(iii)”: Let {v1 , . . . , vn } be an orthonormal basis of X, consisting of eigenvectors of
A, where Avj = λj vj , i.e. λ1 , . . . , λn ∈ C are the corresponding eigenvalues of A. Using
Lagrange interpolation according to [Phi21, Th. 3.4], let f := ∑_{k=0}^{n−1} f_k X^k ∈ C[X] with f_0,...,f_{n−1} ∈ C be a polynomial of degree at most n − 1, satisfying
∀ j ∈ {1,...,n}:  ǫ_{λ_j}(f) = λ̄_j,  (10.19)
and define B := ǫ_A(f) (if all eigenvalues are distinct, then f is uniquely determined by (10.19), otherwise, there exist infinitely many such polynomials, all resulting in the same map B = A*, see below). Then, for each y := ∑_{j=1}^{n} β_j v_j with β_1,...,β_n ∈ C, we obtain
By = (∑_{k=0}^{n−1} f_k A^k)(∑_{j=1}^{n} β_j v_j) = ∑_{j=1}^{n} β_j ∑_{k=0}^{n−1} f_k A^k v_j = ∑_{j=1}^{n} β_j (∑_{k=0}^{n−1} f_k λ_j^k) v_j = ∑_{j=1}^{n} β_j λ̄_j v_j,
where the last equality holds by (10.19).
proving A ∈ Nor(X).
(b): “(i)⇒(ii)”: Let M ∈ Norn (C). We consider M as a normal map on Cn with the
standard inner product h·, ·i, where the standard basis of Cn constitutes an orthonormal
basis. Then we know there exists an ordered orthonormal basis B := (x1 , . . . , xn ) of
Cn such that, with respect to B, M has the diagonal matrix D. Thus, there exists
U = (u_kl) ∈ GL_n(C) such that D = U^{−1} M U and
∀ l ∈ {1,...,n}:  x_l = ∑_{k=1}^{n} u_kl e_k.
Then
∀ l, j ∈ {1,...,n}:  ∑_{k=1}^{n} u_kl ū_kj = ∑_{k=1}^{n} ∑_{m=1}^{n} u_kl ū_mj ⟨e_k, e_m⟩ = ⟨x_l, x_j⟩ = δ_lj,
M M* = U D U^{−1} (U^{−1})* D* U* = U D Id_n D* U* = U D* Id_n D U* = (U^{−1})* D* U* U D U^{−1} = M* M,
χ_A = (X − a)(X − c) − b² = X² − (a + c)X + ac − b²
Proof. First, we consider the case K = C: According to Th. 10.39(a)(iii), there exists a
polynomial f ∈ C[X] such that N ∗ = ǫN (f ). Clearly, AN = N A implies A to commute
with powers of N and, thus, with N ∗ = ǫN (f ). If A ∈ M(n, C), N ∈ Norn (C), then, with
respect to the standard basis of Cn , A represents a map fA ∈ L(Cn , Cn ) and N represents
a map fN ∈ Nor(Cn ). Then AN = N A implies fA fN = fN fA , which, as already shown,
implies fA (fN )∗ = (fN )∗ fA , which, in turn, implies AN ∗ = N ∗ A (since AN ∗ represents
fA (fN )∗ and N ∗ A represents (fN )∗ fA ). For matrices, the case K = R is an immediate
special case of the case K = C. Now, if K = R, A ∈ L(X, X) and N ∈ Nor(X),
then there exists n ∈ N and an ordered orthonormal basis B := (v1 , . . . , vn ) of X such
that, with respect to B, A is represented by MA ∈ M(n, R) and N is represented by
Thus,
∀ j ∈ {1,...,n}:  (AB)(AB)* v_j = α_j β_j β̄_j ᾱ_j v_j = β̄_j ᾱ_j α_j β_j v_j = (AB)*(AB) v_j,
proving AB ∈ Nor(X). In the same way, one also sees B*A = AB* and A*B = BA*, such that the computation from above, once again, shows (A + B)(A* + B*) = (A + B)*(A + B).
(b) A is called positive definite if, and only if, A is positive semidefinite and x*Ax = 0 ⇔ x = 0, i.e. if, and only if, x*Ax > 0 for each 0 ≠ x ∈ K^n.
(d) A is called negative definite if, and only if, A is negative semidefinite and x*Ax = 0 ⇔ x = 0, i.e. if, and only if, x*Ax < 0 for each 0 ≠ x ∈ K^n.
(e) A is called indefinite if, and only if, A is neither positive semidefinite nor negative semidefinite, i.e. if, and only if,
(∃ x ∈ K^n:  x*Ax ∉ R)  ∨  (∃ x, y ∈ K^n:  x*Ax ∈ R⁺ ∧ y*Ay ∈ R⁻).
(a) A is positive definite (positive semidefinite) if, and only if, −A is negative definite (negative semidefinite).
Proof. The equivalences are immediate from Def. 11.1, since, for each x ∈ K^n, x*Ax > 0 ⇔ x*(−A)x < 0, x*Ax = 0 ⇔ x*(−A)x = 0, x*Ax ∈ R ⇔ x*(−A)x ∈ R.
Proof. (a): If A is positive semidefinite, then A is Hermitian by Prop. 10.34 and, thus, by Th. 10.41, all eigenvalues of A are real. If λ ∈ R is an eigenvalue of A and x ∈ C^n \ {0} is a corresponding eigenvector, then
0 ≤ x*Ax = x*λx = λ‖x‖₂²,
where the inequality is strict in the case where A is positive definite. As ‖x‖₂² ∈ R⁺, λ ∈ R₀⁺ and even λ ∈ R⁺ if A is positive definite. Then det A ≥ 0 (resp. det A > 0) also
follows, as det A is the product of the eigenvalues of A (cf. Cor. 8.5). Now assume A to be Hermitian with only nonnegative (resp. positive) eigenvalues λ_1,...,λ_n ∈ R and let {v_1,...,v_n} be a corresponding orthonormal basis of eigenvectors (i.e. Av_j = λ_j v_j and v_j* v_k = δ_jk). Then, for each x ∈ C^n, there exist α_1,...,α_n ∈ C such that x = ∑_{j=1}^{n} α_j v_j, implying
x*Ax = (∑_{k=1}^{n} α_k v_k)* A (∑_{j=1}^{n} α_j v_j) = (∑_{k=1}^{n} α_k* v_k*)(∑_{j=1}^{n} α_j λ_j v_j) = ∑_{k=1}^{n} α_k* α_k λ_k = ∑_{k=1}^{n} |α_k|² λ_k,
showing x*Ax ∈ R₀⁺, with x*Ax ∈ R⁺ for λ_1,...,λ_n ∈ R⁺ and x ≠ 0. Thus, A is positive semidefinite and even positive definite if all λ_j are positive.
(b) follows by combining (a) with Lem. 11.2.
(c) is an immediate consequence of (a) and (b).
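The eigenvalue criterion of Th. 11.3 is also how definiteness is typically checked numerically; a minimal sketch (our own helper, not from the text):

import numpy as np

# A Hermitian matrix is positive (semi)definite iff all eigenvalues are
# positive (nonnegative), cf. Th. 11.3.
def is_positive_definite(A, tol=1e-12):
    assert np.allclose(A, A.conj().T)        # criterion applies to Hermitian A
    return bool(np.all(np.linalg.eigvalsh(A) > tol))

A = np.array([[2., 1.], [1., 2.]])           # eigenvalues 1 and 3
print(is_positive_definite(A))               # True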
Proof. (a): Since A = A_Her + A_skHer = A_sym + A_skew by Prop. 10.33(b), “(i) ⇔ (ii)” holds as Prop. 10.35(c) yields x*A_skew x = 0 and x*Ax = x*A_sym x for each x ∈ R^n. Since A_sym is Hermitian, “(ii) ⇔ (iii)” is due to Th. 11.3(a).
(b) follows by combining (a) with Lem. 11.2.
(c) is an immediate consequence of (a) and (b).
Notation 11.5. If A = (a_ij) is an n × n matrix, n ∈ N, then, for 1 ≤ k ≤ l ≤ n, let
         ( a_kk ··· a_kl )
A_kl :=  (  ⋮    ⋱   ⋮  )  (11.1)
         ( a_lk ··· a_ll )
A Multilinear Maps
Theorem A.1. Let V and W be vector spaces over the field F, α ∈ N. Then, as vector spaces over F, L(V, L^{α−1}(V, W)) and L^α(V, W) are isomorphic via the isomorphism
Proof. Since L is linear and L(x_1) is (α − 1) times linear, Φ(L) is, indeed, an element of L^α(V, W), showing that Φ is well-defined by (A.1). Next, we verify Φ to be linear: If λ ∈ F and K, L ∈ L(V, L^{α−1}(V, W)), then
and
Then, clearly, for each x_1 ∈ V, L(x_1) ∈ L^{α−1}(V, W). Moreover, L is linear, i.e. L ∈ L(V, L^{α−1}(V, W)). Comparing (A.2) with (A.1) shows Φ(L) = K, i.e. Φ is surjective, completing the proof.
Definition B.1. (a) A semigroup (M, ◦) is called a monoid if, and only if, there exists
a neutral element e ∈ M (thus, a magma (M, ◦) is a monoid if, and only if, ◦ is
associative and M contains a neutral element).
Proof. If (U, ◦) is a monoid, then, clearly, it must satisfy (i) and (ii). Thus, we merely
need to show that (i) and (ii) are sufficient for (U, ◦) to be a monoid. According to (i),
◦ maps U × U into U . As ◦ is associative on M , it is also associative on U and, thus,
(ii) yields (U, ◦) to be a monoid.
(b) Let (M, ·) be a monoid with neutral element e ∈ M and let I be a set. Then, by [Phi19, Th. 4.9(e)], F(I, M) = M^I becomes a monoid, if · is defined pointwise on M^I, where · is also commutative on M^I if it is commutative on M. A submonoid of (M^I, ·) is given by (M_fin^I, ·), where, as in [Phi19, Ex. 5.16(c)], M_fin^I denotes the set of functions f : I −→ M such that there exists a finite set I_f ⊆ I, satisfying
(c) Let I be a set and n ∈ N. By combining (a) and (b), we obtain the commutative monoids ((N_0)^n, +), ((N_0)^I, +), ((N_0)_fin^I, +).
Definition B.4. Let (R, +, ·) be a commutative ring with unity and let (M, +) be a
commutative monoid. We call
R[M] := R_fin^M = { (f : M −→ R) : #f^{−1}(R \ {0}) < ∞ }  (B.2)
the set of M -polynomials over R. We then have the pointwise-defined addition and
scalar multiplication on R[M ], which it inherits from RM :
where we know from [Phi19, Ex. 5.16(c)] that, with these compositions, R[M] forms a vector space over R, provided R is a field and, then, B = {e_i : i ∈ M}, where
∀ i ∈ M:  e_i : M −→ R,  e_i(j) := δ_ij,
provides the standard basis of the vector space R[M]. In the current context, we will now write X^i := e_i and we will call these polynomials monomials. Furthermore, we define a multiplication on R[M] by letting
((a_i)_{i∈M}, (b_i)_{i∈M}) ↦ (c_i)_{i∈M} := (a_i)_{i∈M} · (b_i)_{i∈M},
c_i := ∑_{k+l=i} a_k b_l := ∑_{(k,l)∈M²: k+l=i} a_k b_l,  (B.4)
where we note that, due to (B.2), only finitely many of the summands in the sum are
nonzero. If f := (ai )i∈M ∈ R[M ], then we call the ai ∈ R the coefficients of f .
Remark B.5. In the situation of Def. B.4, using the notation X^i = e_i, we can write addition, scalar multiplication, and multiplication in the following, perhaps, more familiar-looking forms: If λ ∈ R, f = ∑_{i∈M} f_i X^i, g = ∑_{i∈M} g_i X^i (each f_i, g_i ∈ R), then
f + g = ∑_{i∈M} (f_i + g_i) X^i,
λf = ∑_{i∈M} (λf_i) X^i,
fg = ∑_{i∈M} (∑_{k+l=i} f_k g_l) X^i
(as in (B.4), due to (B.2), only finitely many of the summands in each sum are nonzero).
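For (M, +) = ((N_0)^n, +), the convolution product (B.4) is straightforward to implement; the following Python sketch (our own encoding of polynomials as dictionaries mapping exponent tuples to nonzero coefficients, cf. (B.2)) illustrates it:

from collections import defaultdict

def poly_mul(f, g):
    c = defaultdict(int)
    for k, a in f.items():
        for l, b in g.items():
            i = tuple(ki + li for ki, li in zip(k, l))   # k + l in M
            c[i] += a * b                                # c_i = sum_{k+l=i} a_k b_l
    return {i: v for i, v in c.items() if v != 0}

f = {(1, 0): 1, (0, 1): 1}                               # X1 + X2
print(poly_mul(f, f))       # {(2, 0): 1, (1, 1): 2, (0, 2): 1}, i.e. (X1 + X2)^2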
Theorem B.6. Let (R, +, ·) be a commutative ring with unity and let (M, +) be a commutative monoid. Then (R[M], +, ·) forms a commutative ring with unity, where 1 = X^0 is the neutral element of multiplication.
Proof. We already know from [Phi19, Ex. 4.9(e)] that (R[M ], +) forms a commutative
group. To verify associativity of multiplication, let a, b, c, d, f, g, h ∈ R[M ],
ι : R −→ R[M],  ι(r) := r X^0.
Proof. The map ι is unital, since ι(1) = X^0; ι is a ring homomorphism, since, for each r, s ∈ R, ι(r + s) = (r + s)X^0 = rX^0 + sX^0 = ι(r) + ι(s) and ι(rs) = rsX^0 = rX^0 · sX^0 = ι(r) ι(s); ι is injective, since, for r ≠ 0, ι(r) = rX^0 ≢ 0.
Example B.8. Let (R, +, ·) be a commutative ring with unity and let (M, +) be a
commutative monoid.
(a) N_0-polynomials over R are polynomials in one variable over R as defined in Def. 7.1: For (M, +) = (N_0, +), the definition of an M-polynomial over R according to Def. B.4 is precisely the same as that of a polynomial over R according to Def. 7.1, i.e. R[X] = R[N_0].
(b) (N_0)^n-polynomials are polynomials in n variables: If (M, +) = ((N_0)^n, +), then one can interpret the X occurring in the monomial X^{(i_1,...,i_n)} with (i_1,...,i_n) ∈ (N_0)^n as the n-tuple of variables X = (X_1,...,X_n) such that the monomial becomes X^{(i_1,...,i_n)} = X_1^{i_1} ··· X_n^{i_n}. In consequence, it is also common to introduce the notation R[X_1,...,X_n] := R[(N_0)^n].
(c) (N_0)_fin^I-polynomials, where I is an arbitrary set, are polynomials in the variables (X_i)_{i∈I} (possibly infinitely many): If ν ∈ (N_0)_fin^I, then, consistently with the notation of Ex. B.3(b), we let I_ν := ν^{−1}(N), i.e. I_ν is the finite set satisfying
If (M, +) = ((N_0)_fin^I, +), then one can interpret the X occurring in the monomial X^ν with ν ∈ (N_0)_fin^I as the (#I_ν)-tuple of variables X = (X_i)_{i∈I_ν} (with X = X^0 = 1 if I_ν = ∅), such that the monomial becomes X^ν = ∏_{i∈I_ν} X_i^{ν(i)}. In consequence, it is also common to introduce the notation
f(I) := ∪_{ν∈(N_0)_fin^I : f_ν ≠ 0} I_ν,  (B.5)
showing
R[(N_0)_fin^I] = ∪_{J⊆I: #J<∞} R[(N_0)^J].
Theorem B.9. Let R be a commutative ring with unity and let M be a commutative
monoid. Moreover, let S be another commutative ring with unity and assume we have
a unital ring homomorphism φ : R −→ S as well as a homomorphism µ : (M, +) −→
(S, ·) with µ(0) = 1. Then the map
Φ : R[M] −→ S,  Φ(∑_{i∈M} f_i X^i) := ∑_{i∈M} φ(f_i) µ(i),  (B.6)
(one calls this kind of property a universal property to the polynomial ring R[M ], as one
can show it uniquely determines R[M ] up to a canonical isomorphism, cf. [Bos13, Ch.
2.5]).
as well as
Φ(fg) = ∑_{i∈M} φ(∑_{k+l=i} f_k g_l) µ(i) = ∑_{i∈M} (∑_{k+l=i} φ(f_k) φ(g_l)) µ(i)
  = ∑_{i∈M} ∑_{k+l=i} φ(f_k) φ(g_l) µ(i) = ∑_{k,l∈M} φ(f_k) φ(g_l) µ(k + l)
  = ∑_{k,l∈M} φ(f_k) φ(g_l) µ(k) µ(l) = (∑_{k∈M} φ(f_k) µ(k))(∑_{l∈M} φ(g_l) µ(l)) = Φ(f) Φ(g),
proving (B.7).
To prove uniqueness, let Ψ : R[M] −→ S be an arbitrary ring homomorphism such that Ψ↾_R = φ and Ψ(X^i) = µ(i) for each i ∈ M. Then, for each f = ∑_{i∈M} f_i X^i ∈ R[M],
Ψ(∑_{i∈M} f_i X^i) = ∑_{i∈M} Ψ(f_i) Ψ(X^i) = ∑_{i∈M} φ(f_i) µ(i) = Φ(∑_{i∈M} f_i X^i),
establishing Ψ = Φ, as desired.
From now on, we will restrict ourselves to the cases of Ex. B.8, where, actually, Ex.
B.8(a) is a special case of Ex. B.8(b), and Ex. B.8(b), in turn, is a special case of Ex.
B.8(c). Thus, we restrict ourselves to M-polynomials, where (M, +) = ((N_0)_fin^I, +) and I may be an arbitrary set.
The ring isomorphisms provided by the following Cor. B.10 sometimes allow one to
establish properties for polynomial rings in finitely many variables via induction proofs.
Corollary B.10. Let (R, +, ·) be a commutative ring with unity and n ∈ N, n ≥ 2. Then R[X_1,...,X_n] and (R[X_1,...,X_{n−1}])[X_n] are isomorphic via the ring isomorphism
Φ : (R[X_1,...,X_{n−1}])[X_n] ≅ R[X_1,...,X_n],
Φ(∑_{k=0}^{N} (∑_{ν∈(N_0)^{n−1}} f_{ν,k} X^ν) X_n^k) := ∑_{α∈(N_0)^n} f_α X^α,
where
f_α := { f_{ν,k} if α(n) = k and α↾_{{1,...,n−1}} = ν;  0 otherwise }.
Moreover, noting that we have the unital ring monomorphism (the map called ι in Ex. B.8(c))
φ : R[X_1,...,X_{n−1}] −→ R[X_1,...,X_n],  φ(∑_{ν∈(N_0)^{n−1}} f_ν X^ν) := ∑_{ν∈(N_0)^{n−1}} f_ν X^ν X_n^0,
and the homomorphism
µ : (N_0, +) −→ (R[X_1,...,X_n], ·),  µ(k) := X_n^k,
Φ is the unique ring homomorphism from (R[X_1,...,X_{n−1}])[X_n] into R[X_1,...,X_n] with Φ↾_{R[X_1,...,X_{n−1}]} = φ and
∀ k ∈ N_0:  Φ(X_n^k) = X_n^k.
Proposition B.11. Let R be a commutative ring with unity and let I be a set. If R is an integral domain (i.e. if it has no nonzero zero divisors), then R[(N_0)_fin^I] is an integral domain as well and, moreover, (R[(N_0)_fin^I])* = R*.
Proof. To apply Cor. B.10, note that the result was proved for the polynomial ring in one variable (i.e. for R[X]) in Prop. 7.7. Now let n ∈ N, n ≥ 2, and, by induction hypothesis, assume R[X_1,...,X_{n−1}] has no nonzero zero divisors and (R[X_1,...,X_{n−1}])* = R*. Then Prop. 7.7 yields that (R[X_1,...,X_{n−1}])[X_n] has no nonzero zero divisors and ((R[X_1,...,X_{n−1}])[X_n])* = R*. An application of Cor. B.10, in turn, provides that R[X_1,...,X_n] has no nonzero zero divisors and (R[X_1,...,X_n])* = R*, completing the induction proof of the proposition's assertion for polynomial rings in finitely many variables. Proceeding to the case of a general, possibly infinite, set I, if f, g ∈ R[(N_0)_fin^I] \ {0}, then, using the notation introduced in (B.5), we have
f ∈ R[(N_0)^{f(I)}],  g ∈ R[(N_0)^{g(I)}],  fg ∈ R[(N_0)^{(fg)(I)}] ⊆ R[(N_0)^{f(I)∪g(I)}].
Since R[(N_0)^{f(I)∪g(I)}] is a polynomial ring in finitely many variables, we conclude fg ≠ 0, showing R[(N_0)_fin^I] has no nonzero zero divisors. Similarly, f ∈ (R[(N_0)_fin^I])* implies f ∈ R* due to f ∈ R[(N_0)^{f(I)}] being an element of a polynomial ring in finitely many variables.
Notation B.12. If I is a set and ν ∈ (N_0)_fin^I, then define
|ν| := ∑_{i∈I} ν(i).
Proof. Noting that the following sums are, actually, finite, we compute, for each ν_1, ν_2 ∈ (N_0)_fin^I,
|ν_1 + ν_2| = ∑_{i∈I} (ν_1 + ν_2)(i) = ∑_{i∈I} (ν_1(i) + ν_2(i)) = ∑_{i∈I} ν_1(i) + ∑_{i∈I} ν_2(i) = |ν_1| + |ν_2|,
h_d(f) := ∑_{ν∈(N_0)_fin^I : |ν|=d} f_ν X^ν
Remark B.15. In the situation of Def. B.14(c), for f = ∑_{ν∈(N_0)_fin^I} f_ν X^ν ∈ R[(N_0)_fin^I], the definition of the h_d(f) immediately yields
f = ∑_{d=0}^{∞} h_d(f) = ∑_{d=0}^{deg f} h_d(f).  (B.10)
Lemma B.16. Let R be a commutative ring with unity and let I be a set. If f, g ∈ R[(N_0)_fin^I] are homogeneous of degree d_f and d_g, respectively (d_f, d_g ∈ N_0), then fg is homogeneous of degree d_f + d_g (note that, according to Def. B.14(b), this does not exclude the possibility fg = 0).
Proof. If f = ∑_{ν∈(N_0)_fin^I} f_ν X^ν and g = ∑_{ν∈(N_0)_fin^I} g_ν X^ν, then
fg = ∑_{ν∈(N_0)_fin^I} (∑_{ν_1+ν_2=ν} f_{ν_1} g_{ν_2}) X^ν.
Theorem B.17. Let R be a commutative ring with unity and let I be a set. If f, g ∈ R[(N_0)_fin^I] with f = ∑_{ν∈(N_0)_fin^I} f_ν X^ν, g = ∑_{ν∈(N_0)_fin^I} g_ν X^ν, then
deg(f + g) = { −∞ if f = −g;  max{|ν| : f_ν ≠ −g_ν} otherwise } ≤ max{deg f, deg g},  (B.11a)
deg(fg) ≤ deg f + deg g.  (B.11b)
According to Lem. B.16, each product h_{d_1}(f) h_{d_2}(g) is homogeneous of degree d_1 + d_2 (where, in general, h_{d_1}(f) h_{d_2}(g) = 0 is not excluded), proving (B.11b). If R has no nonzero zero divisors, then, by Prop. B.11, R[(N_0)_fin^I] has no nonzero zero divisors, implying h_{deg f}(f) h_{deg g}(g) ≠ 0 and, thus, (B.11c).
Definition and Remark B.18. Let R be a commutative ring with unity, let I be a set, and let R′ be a commutative ring extension of R, where ι : R −→ R′ is a unital ring monomorphism. For each x := (x_i)_{i∈I} ∈ (R′)^I, the map

    ǫ_x : R[(N_0)^I_fin] −→ R′,  f ↦ ǫ_x(f) = ǫ_x(Σ_{ν∈(N_0)^I_fin} f_ν X^ν) := Σ_{ν∈(N_0)^I_fin} f_ν x^ν,    (B.12)

where

    ∀ ν ∈ (N_0)^I_fin  ∀ x = (x_i)_{i∈I} ∈ (R′)^I :  x^ν := Π_{i∈I} x_i^{ν(i)}    (B.13)

(the product is, actually, always finite and, thus, well-defined due to the commutativity of R′), is called the substitution homomorphism or evaluation homomorphism corresponding to x: Indeed, given x ∈ (R′)^I, we may apply Th. B.9 with S := R′, φ := ι, M := (N_0)^I_fin, and µ : (M, +) −→ (R′, ·) defined by

    µ(ν) := x^ν = Π_{i∈I} x_i^{ν(i)}:    (B.14)

Then µ(0) = Π_{i∈I} x_i^0 = 1 ∈ R′ and

    µ(ν_1 + ν_2) = Π_{i∈I} x_i^{ν_1(i)+ν_2(i)} = Π_{i∈I} (x_i^{ν_1(i)} x_i^{ν_2(i)}) = (Π_{i∈I} x_i^{ν_1(i)}) (Π_{i∈I} x_i^{ν_2(i)}) = µ(ν_1) µ(ν_2),

showing µ to be a homomorphism of the monoids ((N_0)^I_fin, +) and (R′, ·), such that Th. B.9, indeed, yields ǫ_x to be a unital ring homomorphism. We call x ∈ (R′)^I a zero or a root of f ∈ R[(N_0)^I_fin] if, and only if, ǫ_x(f) = 0.
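The substitution homomorphism ǫ_x of (B.12) admits an equally direct illustration; in the following ad hoc Python sketch (not part of the formal development), evaluate computes ǫ_x(f) for a polynomial over Z and exhibits a zero in the sense just defined.

from math import prod

def evaluate(f, x):
    """eps_x(f) as in (B.12): sum of f_nu * x^nu, with x^nu = prod_i x_i^nu(i), cf. (B.13)."""
    return sum(c * prod(xi ** e for xi, e in zip(x, nu)) for nu, c in f.items())

f = {(2, 0): 1, (0, 2): 1, (0, 0): -25}  # X1^2 + X2^2 - 25 over Z
print(evaluate(f, (3, 4)))   # 0, so x = (3, 4) is a zero (root) of f
print(evaluate(f, (1, 1)))   # -23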
Remark B.19. While Def. and Rem. B.18 is the analogon to Def. and Rem. 7.10
for polynomials in one variable, we note that condition (7.7) has been replaced by
the stronger assumption that R′ be commutative. This was used in the proof that
µ : (M, +) −→ (R′ , ·) is a homomorphism. It would suffice to replace (7.7) by the
assumption that, given x := (xi )i∈I ∈ (R′ )I , ab = ba holds for all a, b ∈ R ∪ {xi : i ∈ I},
but this still means that, for polynomials in several variables, one can, in general, no
longer use rings of matrices over R for R′ (one can no longer substitute matrices over R
for the variables of the polynomial).
Lemma B.20. Let (R, +, ·) be a commutative ring with unity and n ∈ N, n ≥ 2. If

    Φ : (R[X_1, ..., X_{n-1}])[X_n] ≅ R[X_1, ..., X_n]

is the ring isomorphism given by Cor. B.10 and R′ is a commutative ring extension of R, then, for each x = (x_1, ..., x_n) ∈ (R′)^n and each (f_0, ..., f_N) ∈ (R[X_1, ..., X_{n-1}])^{N+1}, N ∈ N_0:

    ǫ_x(Φ(Σ_{k=0}^N f_k X_n^k)) = ǫ_{x_n}(Σ_{k=0}^N ǫ_{(x_1,...,x_{n-1})}(f_k) X_n^k).

Proof. Suppose, for each k ∈ {0, ..., N}, we have f_k = Σ_{ν∈(N_0)^{n-1}} f_{ν,k} X^ν ∈ R[X_1, ..., X_{n-1}]. From Cor. B.10, we recall

    Φ(Σ_{k=0}^N Σ_{ν∈(N_0)^{n-1}} f_{ν,k} X^ν X_n^k) = Σ_{α∈(N_0)^n} f_α X^α,
where

    f_α := { f_{ν,k}   if α(n) = k and α↾_{{1,...,n-1}} = ν,
           { 0         otherwise.

Thus, using the notation of (B.13), for each x = (x_1, ..., x_n) ∈ (R′)^n,

    ǫ_x(Φ(Σ_{k=0}^N Σ_{ν∈(N_0)^{n-1}} f_{ν,k} X^ν X_n^k)) = Σ_{α∈(N_0)^n} f_α x^α
        = Σ_{k=0}^N Σ_{ν∈(N_0)^{n-1}} f_{ν,k} x_n^k Π_{i=1}^{n-1} x_i^{ν(i)}
        = Σ_{k=0}^N (Σ_{ν∈(N_0)^{n-1}} f_{ν,k} Π_{i=1}^{n-1} x_i^{ν(i)}) x_n^k
        = ǫ_{x_n}(Σ_{k=0}^N ǫ_{(x_1,...,x_{n-1})}(f_k) X_n^k),

proving the lemma.
Notation B.21. Let R be a commutative ring with unity, let I be a set, and let R′ be a commutative ring extension of R such that L := R′ is a field. Given x := (x_i)_{i∈I} ∈ (R′)^I, let R[x] := Im ǫ_x, with the substitution homomorphism ǫ_x of (B.12), and let R(x) denote the field of fractions of R[x], which, using the isomorphism of Def. and Rem. 7.41, we consider to be a subset of L, i.e.

    R ⊆ R[x] ⊆ R(x) ⊆ L.

If n ∈ N and x_1, ..., x_n ∈ R′, then we also use the simplified notation R[x_1, ..., x_n] and R(x_1, ..., x_n), respectively.
Proposition B.22. In the situation of Not. B.21, the following holds:

(a) R[x] is the smallest subring of R′ containing R ∪ {x_i : i ∈ I}, i.e.

    R[x] = ⋂_{S∈S} S,  where  S := { S ⊆ R′ : R ∪ {x_i : i ∈ I} ⊆ S ∧ S is a subring of R′ }.

(b) R(x) = ⋃_{J⊆I: #J<∞} R((x_i)_{i∈J}).
Theorem B.23. Let R be a commutative ring with unity, let I be a set, and consider the map

    φ : R[(N_0)^I_fin] −→ R^(R^I),  f ↦ φ(f),  φ(f)(x) := ǫ_x(f).    (B.15)

We define

    Pol(R, (x_i)_{i∈I}) := R[(x_i)_{i∈I}] := φ(R[(N_0)^I_fin])

and call the elements of Pol(R, (x_i)_{i∈I}) polynomial functions (in n variables for #I = n ∈ N; in infinitely many variables for I being infinite).

(a) φ is a unital ring homomorphism (in particular, Pol(R, (x_i)_{i∈I}) is a subring of R^(R^I) and φ is a unital ring epimorphism onto Pol(R, (x_i)_{i∈I})). If R is a field, then φ is also linear (in particular, Pol(R, (x_i)_{i∈I}) is then a vector subspace of the vector space R^(R^I) over R and φ is then a linear epimorphism onto Pol(R, (x_i)_{i∈I})).

(b) If R is finite and I ≠ ∅, then φ is not injective (i.e. distinct polynomials can induce the same polynomial function).

(c) If F := R is an infinite field, then φ : R[(N_0)^I_fin] −→ Pol(R, (x_i)_{i∈I}) is an isomorphism.
Proof. We first recall that we know R^(R^I) to be a commutative ring with unity from [Phi19, Ex. 4.42(a)]. Similarly, if R is a field, then R^(R^I) is a vector space over R according to [Phi19, Ex. 5.2(a)].

(a): If f = X^0, then φ(f) ≡ 1. We also know from Def. and Rem. B.18 that, for each x ∈ R^I, ǫ_x is a unital ring homomorphism (which, for R a field, is also linear). Thus, if f, g ∈ R[(N_0)^I_fin] and λ ∈ R, then, for each x ∈ R^I,

    φ(f + g)(x) = ǫ_x(f + g) = ǫ_x(f) + ǫ_x(g) = (φ(f) + φ(g))(x),
    φ(fg)(x) = ǫ_x(fg) = ǫ_x(f) ǫ_x(g) = (φ(f) φ(g))(x),
    φ(λf)(x) = ǫ_x(λf) = λ ǫ_x(f) = (λ φ(f))(x).

Thus, φ is a unital (and, for R a field, linear) ring epimorphism onto Pol(R, (x_i)_{i∈I}) (directly from the definition of Pol(R, (x_i)_{i∈I})). In consequence, Pol(R, (x_i)_{i∈I}) is a commutative ring with unity by [Phi19, Prop. 4.37] and, if R is a field, then Pol(R, (x_i)_{i∈I}) is a vector space over R by [Phi19, Prop. 6.3(c)].
(b): If R and I are both finite, then R^(R^I) and Pol(R, (x_i)_{i∈I}) ⊆ R^(R^I) are finite as well, whereas R[(N_0)^I_fin] is infinite (if I ≠ ∅). In particular, φ cannot be injective. Now let I be infinite and let J ⊆ I be nonempty and finite. We recall the unital ring monomorphism ι : R[(N_0)^J] −→ R[(N_0)^I_fin] introduced in Ex. B.8(c) and also consider

    φ_J : R[(N_0)^J] −→ R^(R^J),  f ↦ φ_J(f),  φ_J(f)(x) := ǫ_x(f),

as well as ι_J : R^(R^J) −→ R^(R^I), ι_J(P)(x) := P(x_J), where x_J := (x_i)_{i∈J} for x = (x_i)_{i∈I} ∈ R^I. Then

    φ↾_{ι(R[(N_0)^J])} = ι_J ∘ φ_J ∘ ι^{-1}:    (B.16)

Indeed, if f = Σ_{ν∈(N_0)^J} f_ν X^ν ∈ R[(N_0)^J], x := (x_i)_{i∈I} ∈ R^I and x_J := (x_i)_{i∈J}, then

    φ(ι(f))(x) = φ(Σ_{ν∈(N_0)^J} f_ν Π_{i∈J} X_i^{ν(i)})(x) = Σ_{ν∈(N_0)^J} f_ν Π_{i∈J} x_i^{ν(i)}
               = φ_J(Σ_{ν∈(N_0)^J} f_ν Π_{i∈J} X_i^{ν(i)})(x_J) = φ_J(f)(x_J) = ι_J(φ_J(f))(x),

proving (B.16). As R and J are finite with J ≠ ∅, φ_J is not injective by the case already treated, i.e. there exist f, g ∈ R[(N_0)^J] with f ≠ g and φ_J(f) = φ_J(g). Then ι(f) ≠ ι(g), while (B.16) yields φ(ι(f)) = φ(ι(g)), showing φ is not injective in this case as well.

(c): Suppose φ(f) = 0 for some f ∈ F[(N_0)^I_fin]; we have to show f = 0. First, consider the case I = {1, ..., n}, n ∈ N, proceeding by induction on n: For n = 1, φ(f) = 0 means f vanishes at every x ∈ F; as F is infinite, while a nonzero polynomial in one variable over a field has only finitely many zeros, f = 0. For the induction step, let n ≥ 2 and consider

    ψ : F[X_1, ..., X_{n-1}] −→ Pol(F, (x_1, ..., x_{n-1})),  ψ(h)(x_1, ..., x_{n-1}) := ǫ_{(x_1,...,x_{n-1})}(h),

which is injective by the induction hypothesis, and

    φ_n : F[X_n] −→ Pol(F, x_n),  h ↦ φ_n(h),  φ_n(h)(x_n) := ǫ_{x_n}(h),

which is injective by the case n = 1. Now let (f_0, ..., f_N) ∈ (F[X_1, ..., X_{n-1}])^{N+1}, N ∈ N_0, be such that Φ^{-1}(f) = Σ_{k=0}^N f_k X_n^k. Then we know from Lem. B.20 that, for each x = (x_1, ..., x_n) ∈ F^n,

    0 = φ(f)(x) = ǫ_x(f) = ǫ_x(Φ(Σ_{k=0}^N f_k X_n^k)) = ǫ_{x_n}(Σ_{k=0}^N ǫ_{(x_1,...,x_{n-1})}(f_k) X_n^k),

showing Σ_{k=0}^N ǫ_{(x_1,...,x_{n-1})}(f_k) X_n^k ∈ ker φ_n, i.e. Σ_{k=0}^N ǫ_{(x_1,...,x_{n-1})}(f_k) X_n^k = 0, since φ_n is injective. As (x_1, ..., x_{n-1}) ∈ F^{n-1} was arbitrary, f_0, ..., f_N ∈ ker ψ, implying f_0 = ··· = f_N = 0, since ψ is injective. In consequence, we have shown Φ^{-1}(f) = 0 and f = 0 as well, proving φ to be injective, as desired. This concludes the induction and the proof for the case of polynomials in finitely many variables. It merely remains to consider the case where φ(f) = 0 and f ∈ F[(N_0)^I_fin], I being infinite and F being an infinite field. According to Ex. B.8(c), f ∈ F[(N_0)^{f(I)}] ⊆ F[(N_0)^I_fin], where f(I) ⊆ I is the finite set defined in Ex. B.8(c). Thus, f is, actually, a polynomial in only finitely many variables and, since we already know φ↾_{F[(N_0)^{f(I)}]} to be injective, φ(f) = 0 implies f = 0, proving φ to be injective.
Remark B.24. Let R be a commutative ring with unity, let I be a set. If P : R^I −→ R is a polynomial function as defined in Th. B.23, then there exists f = Σ_{ν∈(N_0)^I_fin} f_ν X^ν ∈ R[(N_0)^I_fin] with P = φ(f). Thus, for each x = (x_i)_{i∈I} ∈ R^I,

    P(x) = ǫ_x(f) = Σ_{ν∈(N_0)^I_fin} f_ν x^ν = Σ_{ν∈(N_0)^I_fin} f_ν Π_{i∈I} x_i^{ν(i)}    (B.17)

(all sums and products being, actually, finite). Thus, the polynomial functions are precisely the linear combinations of the monomial functions x ↦ x^ν, ν ∈ (N_0)^I_fin. Caveat: In general, the representation of P given by (B.17) is not unique: For example, if R is finite and I is nonempty, then it is not unique due to Th. B.23(b) (also cf. Ex. 7.18 and Rem. 7.19).
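The failure of injectivity asserted in Th. B.23(b), and thus the non-uniqueness just described, can be observed directly in the smallest case; the following ad hoc Python sketch (illustration only) evaluates the nonzero polynomial X + X^2 over F_2 = Z_2 and shows it induces the zero function.

def evaluate_mod2(coeffs, x):
    """Evaluate a univariate polynomial over F_2 (coefficient list, lowest degree first)."""
    return sum(c * x ** i for i, c in enumerate(coeffs)) % 2

f = [0, 1, 1]  # the nonzero polynomial X + X^2 in F_2[X]
print([evaluate_mod2(f, x) for x in (0, 1)])  # [0, 0]: f != 0, yet phi(f) = 0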
Remark B.25. In Cor. 7.37, we concluded that the ring S := F[X] is factorial for each field F (i.e. each 0 ≠ a ∈ S \ S^* admits a factorization into prime elements, which is unique up to the order of the primes and up to association), as a consequence of F[X] being a principal ideal domain. One can actually show, much more generally, that, if R is a factorial ring, then R[(N_0)^I_fin] is a factorial ring as well (proofs for the case of polynomial rings in finitely many variables can be found, e.g., in [Bos13, Sec. 2.7] and [Lan05, Ch. 4, §2]; the case where I is infinite can then be treated by the method we used several times above, using that each element f ∈ R[(N_0)^I_fin] is, actually, a polynomial in only finitely many variables). The reason the general result does not follow as easily as the one in Cor. 7.37 lies in the fact that, in general, R[(N_0)^I_fin] is no longer a principal ideal domain.
C Quotient Rings
In [Phi19, Def. 4.26], we defined the quotient group G/N of a group G with respect to a normal subgroup N; in [Phi19, Sec. 6.2], we defined the quotient space V/U of a vector space V over a field F with respect to a subspace U. For a ring R, there exists an analogous notion, where ideals a ⊆ R now play the role of the normal subgroup. For simplicity, we will restrict ourselves to the case of a commutative ring R with unity. According to Def. 7.22(a), every ideal a ⊆ R is an additive subgroup of R and we can form the quotient group R/a. We will see below that we can even make R/a into a commutative ring with unity, called the quotient ring or factor ring of R with respect to a, where a being an ideal guarantees the well-definedness of the multiplication on R/a. As we write (R, +) as an additive group, we write the respective cosets (i.e. the elements of R/a) as x + a, x ∈ R.
Theorem C.1. Let R be a commutative ring with unity and let a ⊆ R be an ideal in R.

(a) The compositions

    + : R/a × R/a −→ R/a,  (x + a) + (y + a) := (x + y) + a,    (C.1)
    · : R/a × R/a −→ R/a,  (x + a) · (y + a) := xy + a,    (C.2)

are well-defined.

(b) The canonical map φ_a : R −→ R/a, φ_a(x) := x + a, satisfies

    ∀ x, y ∈ R :  φ_a(x + y) = φ_a(x) + φ_a(y),    (C.3a)
    ∀ x, y ∈ R :  φ_a(xy) = φ_a(x) φ_a(y).    (C.3b)

(c) R/a with the compositions of (a) forms a commutative ring with unity and φ_a of (b) constitutes a unital ring epimorphism.

Proof. (a): The composition + is well-defined by [Phi19, Th. 4.27(a)]. To verify that · is well-defined as well, suppose x, y, x′, y′ ∈ R are such that x + a = x′ + a and y + a = y′ + a. We need to show xy + a = x′y′ + a. There exist a_x, a_y ∈ a such that x′ = x + a_x, y′ = y + a_y. Since a is an ideal in R, we have x′a_y, a_x y′, a_x a_y ∈ a, and we obtain

    xy + a = (x′ − a_x)(y′ − a_y) + a = x′y′ − x′a_y − a_x y′ + a_x a_y + a = x′y′ + a,

as desired.
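As a concrete instance of Th. C.1 (in the special case R = Z, a = nZ, treated in detail in Ex. C.11 below), the following ad hoc Python sketch stores cosets via canonical representatives and checks that the compositions of (a) do not depend on the chosen representatives; the class name Zn is an illustration only.

class Zn:
    """The coset x + nZ in Z/nZ, stored via the canonical representative x mod n."""
    def __init__(self, x, n):
        self.n, self.x = n, x % n
    def __add__(self, other):   # (x + nZ) + (y + nZ) := (x + y) + nZ
        return Zn(self.x + other.x, self.n)
    def __mul__(self, other):   # (x + nZ) * (y + nZ) := xy + nZ
        return Zn(self.x * other.x, self.n)
    def __eq__(self, other):
        return (self.n, self.x) == (other.n, other.x)
    def __repr__(self):
        return f"{self.x} + {self.n}Z"

# Well-definedness: different representatives of the same cosets yield the same results:
n = 6
assert Zn(2, n) * Zn(5, n) == Zn(2 + n, n) * Zn(5 - n, n)  # both equal 4 + 6Z
assert Zn(2, n) + Zn(5, n) == Zn(2 + n, n) + Zn(5 - n, n)  # both equal 1 + 6Z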
Lemma C.2. Let (G, ·) be a group (not necessarily commutative) and let N ⊆ G be a normal subgroup of G. Consider the map

    Φ_N : P(G) −→ P(G/N),  Φ_N(A) := φ_N(A),    (C.4)

where φ_N : G −→ G/N, φ_N(g) := gN, denotes the canonical epimorphism.
Next, we formulate and prove a version of the previous lemma for commutative rings with unity (where ideals now replace subgroups).

Lemma C.3. Let R be a commutative ring with unity and let a ⊆ R be an ideal in R. Consider the map of (C.4), which we here denote as

    Φ_a : P(R) −→ P(R/a),  Φ_a(A) := φ_a(A).
In [Phi19, Th. 4.27(b)], we proved the isomorphism theorem for groups: If G and H are groups and φ : G −→ H is a homomorphism, then G/ker φ ≅ Im φ. Now let R be a commutative ring with unity and let S be another ring. Since R and S are, in particular, additive groups, if φ : R −→ S is a ring homomorphism, then (R/ker φ, +) ≅ (Im φ, +) and, since ker φ = φ^{-1}{0} is an ideal in R by Prop. 7.24(b), it is natural to ask whether R/ker φ and Im φ are isomorphic as rings. In Th. C.4 below, we see this, indeed, to be the case.
Theorem C.4 (Isomorphism Theorem). Let R be a commutative ring with unity and let S be another ring. If φ : R −→ S is a ring homomorphism, then

    (R/ker φ, +, ·) ≅ (Im φ, +, ·).    (C.6)

More precisely,

    f : R/ker φ −→ Im φ,  f(x + ker φ) := φ(x),    (C.7)

is a ring isomorphism and, with the canonical epimorphism f_e : R −→ R/ker φ, f_e(x) := x + ker φ, and the monomorphism f_m : R/ker φ −→ S, f_m(x + ker φ) := φ(x), one has the factorization

    φ = f_m ∘ f_e.    (C.8)

Proof. All assertions, except that f, f_e, and f_m are multiplicative homomorphisms, were already proved in [Phi19, Th. 4.27(b)]. Moreover, f_e is a multiplicative homomorphism by (C.3b) and, thus, so is f_m (by [Phi19, Prop. 4.11(a)]), once we have shown f to be a multiplicative homomorphism. Thus, it merely remains to show, for each x, y ∈ R,

    f((x + ker φ)(y + ker φ)) = f(x + ker φ) f(y + ker φ):

Indeed,

    f((x + ker φ)(y + ker φ)) = f(xy + ker φ) = φ(xy) = φ(x) φ(y) = f(x + ker φ) f(y + ker φ),

as desired.
Definition C.5. Let R be a commutative ring with unity and let a be an ideal in R.

(a) a is called proper if, and only if, a ≠ R.

(b) a is called a prime ideal if, and only if, a is proper and

    ∀ x, y ∈ R :  (xy ∈ a  ⇒  x ∈ a ∨ y ∈ a).

(c) a is called a maximal ideal if, and only if, a is proper and, for each ideal b ⊆ R,

    a ⊆ b  ⇒  b = a ∨ b = R.
Lemma C.6. Let R be a commutative ring with unity and let 0 ≠ p ∈ R \ R^*. Then the following statements are equivalent:

(i) p is prime.

(ii) (p) is a prime ideal in R.
Lemma C.7. Let R be a commutative ring with unity. Then the following statements are equivalent:

(i) R is a field.

(ii) (0) is a maximal ideal in R.

Proof. If R is a field, then we know from Ex. 7.27(a) that (0) and R are the only ideals in R, proving (0) to be maximal. Conversely, assume (0) to be a maximal ideal in R. If 0 ≠ x ∈ R, then the ideal (x) must be all of R (since (x) ≠ (0) and (0) is maximal). Since (x) = R and 1 ∈ R, there exists y ∈ R such that xy = 1, showing R \ {0} = R^* (every nonzero element of R is invertible), i.e. R is a field.
Theorem C.8. Let R be a commutative ring with unity and let a be an ideal in R.

(a) The following statements are equivalent:

    (i) a is proper.

    (ii) R/a ≠ {0} (i.e. the quotient ring contains more than one element).

(b) The following statements are equivalent:

    (i) a is prime.

    (ii) R/a is an integral domain.

(c) The following statements are equivalent:

    (i) a is maximal.

    (ii) (0) is a maximal ideal in R/a.

    (iii) R/a is a field (called the quotient field or factor field of R with respect to a).
Proof. (a): If a is not proper, then a = R and R/a = {0}. If a is proper, then there
exists x ∈ R \a, i.e. x+a 6= a (since a is an additive subgroup of R). Thus, R/a contains
at least two elements.
(b): If a is prime, then it is proper and, by (a), 0 ≠ 1 in R/a. Moreover, if x, y ∈ R are such that

    a = (x + a)(y + a) = xy + a,    (C.9)
then xy ∈ a and x ∈ a or y ∈ a (as a is prime). Thus, x + a = a or y + a = a, showing
R/a to be an integral domain. Conversely, if R/a is an integral domain, then 0 6= 1
in R/a and a is proper by (a). Moreover, if x, y ∈ R such that xy ∈ a, then (C.9)
holds, implying x + a = a or y + a = a (as the integral domain R/a has no nonzero zero
divisors). Thus, x ∈ a or y ∈ a, proving a to be prime.
(c): Letting A_R and B_R be as in Lem. C.3(b), we have the equivalences

    (i) ⇔ #A_R = 2 ⇔ #B_R = 2 ⇔ (ii),

where the middle equivalence holds according to Lem. C.3(b). Moreover, (ii) ⇔ (iii) by Lem. C.7 (applied to the ring R/a).
Corollary C.9. Let R be a commutative ring with unity and let a be an ideal in R. If
a is maximal, then a is prime.
Proof. If a is maximal, then (by Th. C.8(c)) R/a is a field, implying, in particular, R/a
is an integral domain. Thus, by Th. C.8(b), a is prime.
Theorem C.10. Let R be an integral domain and let 0 ≠ a ∈ R \ R^*. Consider the following statements:

(i) (a) is a maximal ideal in R.

(ii) a is prime.

(iii) a is irreducible.

Then (i) ⇒ (ii) ⇒ (iii) always holds. Moreover, if R is a principal ideal domain, then the three statements are even equivalent.
Proof. “(i) ⇒ (ii)”: If (a) is maximal, then, according to Cor. C.9, (a) is prime. Then a is prime by Lem. C.6.
“(ii) ⇒ (iii)” was already shown in Prop. 7.30(e).
Now assume R to be a principal ideal domain. It only remains to show “(iii) ⇒ (i)”. To this end, suppose a is irreducible and b ∈ R is such that (a) ⊆ (b) ⊆ R. Then a ∈ (b), i.e. there exists c ∈ R such that a = cb. As a is irreducible, this implies c ∈ R^* or b ∈ R^*. If b ∈ R^*, then (b) = R. If c ∈ R^*, then b = c^{-1} a ∈ (a), proving (b) ⊆ (a) and (a) = (b). Thus, (a) is maximal in R.
Example C.11. We already know from Ex. 7.27(b) that Z is a principal ideal domain.
Thus, the ideals in Z are precisely the sets (n) = nZ with n ∈ N0 (where (n) = (−n)).
We also already know from [Phi19, Ex. 4.38] that the quotient ring Zn := Z/nZ is a
field if, and only if, n ∈ N is prime (in [Phi19, Ex. 4.38], we still avoided using the term
quotient ring). We can now concisely recover and summarize these previous results using
the notions and results of the present section by stating that the following assertions
are equivalent for n ∈ N:
(i) n is irreducible.

(ii) n is prime.

(iii) (n) is a prime ideal in Z.

(iv) (n) is a maximal ideal in Z.

(v) Z_n is an integral domain.

(vi) Z_n is a field.
Indeed, as Z is a principal ideal domain, (i) ⇔ (ii) ⇔ (iv) by Th. C.10. Moreover,
(iii) ⇔ (v) by Th. C.8(b) and (iv) ⇔ (vi) by Th. C.8(c). The still missing implication
(v) ⇒ (vi) was, actually, also already shown in [Phi19, Ex. 4.38] (it was shown that, if
n is not prime, then Zn has nonzero zero divisors).
Since Z has no nonzero zero divisors, (0) is a prime ideal in Z; it is the only prime ideal in Z that is not a maximal ideal.
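The equivalences of Ex. C.11 can be verified by brute force for small n; in the following ad hoc Python sketch, is_field, is_integral_domain, and is_prime are naive illustrative tests written for this purpose, not library functions.

def is_prime(n):
    return n >= 2 and all(n % d != 0 for d in range(2, n))

def is_integral_domain(n):
    """Z_n has no nonzero zero divisors."""
    return all(x * y % n != 0 for x in range(1, n) for y in range(1, n))

def is_field(n):
    """Every nonzero element of Z_n has a multiplicative inverse."""
    return all(any(x * y % n == 1 for y in range(1, n)) for x in range(1, n))

for n in range(2, 20):
    assert is_prime(n) == is_integral_domain(n) == is_field(n)
print("(ii) <=> (v) <=> (vi) confirmed for n = 2, ..., 19")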
In the following Th. C.12, we use Zorn’s lemma [Phi19, Th. 5.22] to prove the existence
of maximal ideals. In Th. D.16 below, this result will then be used to establish the
existence of algebraic closures.
Theorem C.12. Let R be a commutative ring with unity and let a be a proper ideal
in R. Then there exists a maximal ideal m in R such that a ⊆ m (in particular, each
commutative ring with unity contains at least one maximal ideal).
Proof. Let S be the set of all proper ideals in R that contain a. Note that set inclusion ⊆ provides a partial order on S. If C ≠ ∅ is a totally ordered subset of S, then, by Prop. 7.24(f), s := ⋃_{c∈C} c is an ideal in R. If c ∈ C, then 1 ∉ c, since c is proper. Thus, 1 ∉ s, showing s to be proper, i.e. s ∈ S provides an upper bound for C. Thus, Zorn's lemma [Phi19, Th. 5.22] applies, yielding a maximal element m ∈ S (i.e. maximal in S with respect to ⊆), which is, thus, a maximal ideal in R that contains a.
D Algebraic Field Extensions

Definition D.1. Let F, L be fields with F ⊆ L (one then calls L a field extension of F).

(a) We know from [Phi19, Ex. 5.2(b)] that L is a vector space over F. The dimension of L as a vector space over F is denoted [L : F] and is called the degree of L over F. The field extension is called finite for [L : F] < ∞ and infinite for [L : F] = ∞.
(b) λ ∈ L is called algebraic over F if, and only if, ker ǫ_λ ≠ {0}, where ǫ_λ : F[X] −→ L is the substitution homomorphism of Def. and Rem. 7.10 (i.e. if, and only if, ǫ_λ(f) = 0 for some nonzero polynomial f with coefficients in F); λ ∈ L is called transcendental over F if, and only if, λ is not algebraic over F.

(c) The field extension L is called algebraic over F if, and only if, each λ ∈ L is algebraic over F.
(b) Consider the field extension C of R. Then [C : R] = 2 and C is algebraic over R, since each z ∈ C satisfies ǫ_z(X^2 − (2 Re z) X + |z|^2) = 0: Indeed,

    X^2 − (2 Re z) X + |z|^2 = X^2 − (z + z̄) X + z z̄ = (X − z)(X − z̄).
Theorem D.3 (Degree Formula). Let F, K, L be fields with F ⊆ K ⊆ L. Then

    [L : F] = [L : K] · [K : F],    (D.1)

where the equation holds in N̄ := N ∪ {∞} if one uses the convention that n · ∞ = ∞ · n = ∞ for each n ∈ N̄.
Proof. We consider the case n := [K : F] < ∞ and m := [L : K] < ∞ (if one of the two degrees is infinite, then correspondingly large linearly independent subsets of L over F exist, yielding [L : F] = ∞ in accordance with (D.1)). Let B_K := {κ_1, ..., κ_n} be a basis of K over F and let B_L := {λ_1, ..., λ_m} be a basis of L over K. We show that

    B := {κ λ : κ ∈ B_K ∧ λ ∈ B_L}

is a basis of L over F: To verify linear independence, let c_{ij} ∈ F, (i, j) ∈ {1, ..., n} × {1, ..., m}, be such that

    0 = Σ_{j=1}^m Σ_{i=1}^n c_{ij} κ_i λ_j = Σ_{j=1}^m (Σ_{i=1}^n c_{ij} κ_i) λ_j.

Since each Σ_{i=1}^n c_{ij} κ_i ∈ K and λ_1, ..., λ_m are linearly independent over K, we obtain

    ∀ j ∈ {1, ..., m} :  Σ_{i=1}^n c_{ij} κ_i = 0,
implying cij = 0 for all (i, j) ∈ {1, . . . , n} × {1, . . . , m}, due to the linear independence
of κ1 , . . . , κn over F . This completes the proof that B is linearly independent over F . It
remains to show B is a generating set for L over F. To this end, let α ∈ L. As B_L is a basis of L over K, there exist β_1, ..., β_m ∈ K such that α = Σ_{j=1}^m β_j λ_j. As B_K is a basis of K over F, there exist c_{ij} ∈ F such that

    ∀ j ∈ {1, ..., m} :  β_j = Σ_{i=1}^n c_{ij} κ_i.

Thus, we obtain

    α = Σ_{j=1}^m β_j λ_j = Σ_{j=1}^m Σ_{i=1}^n c_{ij} κ_i λ_j,

showing α to be an F-linear combination of elements of B. In consequence, B is a basis of L over F with #B = nm (the products κ_i λ_j are pairwise distinct due to the linear independence shown above), proving [L : F] = nm = [L : K] · [K : F].
Theorem D.4. Let F, L be fields such that L is a finite field extension of F (i.e. F ⊆ L and [L : F] < ∞). Then L is an algebraic field extension of F (however, there also exist algebraic field extensions that are infinite, see Ex. D.12 below).

Proof. Let α ∈ L and n := [L : F]. Then the n + 1 elements α^0, α^1, ..., α^n of L must be linearly dependent over F, i.e. there exist c_0, ..., c_n ∈ F, not all 0, such that Σ_{i=0}^n c_i α^i = 0, showing α to be algebraic over F.
Theorem D.5. Let F, L be fields with F ⊆ L, let α ∈ L be algebraic over F, and let ǫ_α : F[X] −→ L be the corresponding substitution homomorphism.

(a) There exists a unique monic polynomial µ_α ∈ F[X] such that ker ǫ_α = (µ_α). Moreover, this polynomial µ_α is both prime and irreducible, and it is the unique monic polynomial f ∈ F[X] such that ǫ_α(f) = 0 and such that f is of minimal degree in ker ǫ_α. One calls µ_α the minimal polynomial⁹ or the irreducible polynomial of the algebraic element α over F.
(b) If µ_α is the minimal polynomial of α over F as defined in (a), then one has

    F(α) = F[α] ≅ F[X]/(µ_α),  [F(α) : F] = deg µ_α,    (D.2)

and F(α) is an algebraic field extension of F.
Proof. (a): F[X] is a principal ideal domain according to Ex. 7.27(b). Thus, there exists a monic µ_α ∈ F[X] such that ker ǫ_α = (µ_α). If f ∈ F[X] is another monic polynomial such that ker ǫ_α = (f), then, by Prop. 7.30(c), f and µ_α are associated and, by Prop. 7.7, there exists r ∈ F^* = (F[X])^* such that f = r µ_α, implying f = µ_α. To see µ_α is the unique monic polynomial of minimal degree in ker ǫ_α, let 0 ≠ g ∈ F[X]: Then either g ∈ F (in which case g µ_α is monic only for g = 1) or deg(g µ_α) = deg g + deg µ_α > deg µ_α by (7.5c). According to the isomorphism Th. C.4, F[X]/ker ǫ_α = F[X]/(µ_α) ≅ Im ǫ_α ⊆ L. As a subring of the field L, Im ǫ_α is an integral domain. Thus, (µ_α) is prime by Th. C.8(b), implying µ_α to be prime by Lem. C.6 and irreducible by Th. C.10.
(b): In the proof of (a), we already noticed (D.2) to hold due to the isomorphism Th. C.4, except that we still need to verify F[α] = F(α), i.e. we need to show F[α] is a field. However, by (a), µ_α is irreducible and, thus, (µ_α) is maximal by Th. C.10. In consequence, F[α] ≅ F[X]/(µ_α) is a field by Th. C.8(c). Next, we show [F[α] : F] = deg µ_α: Let

    φ : F[X] −→ F[X]/(µ_α),  f ↦ f̄ := φ(f) = f + (µ_α),

be the canonical epimorphism. Suppose n := deg µ_α. If f ∈ F[X], then, according to the remainder Th. 7.9, there exist unique polynomials q, r ∈ F[X] such that

    f = q µ_α + r  ∧  deg r < n,

i.e. f uniquely determines c_0, ..., c_{n-1} ∈ F such that r = Σ_{i=0}^{n-1} c_i X^i. Applying φ yields (noting φ(µ_α) = 0)

    f̄ = r̄ = Σ_{i=0}^{n-1} c_i X̄^i,

showing {X̄^0, ..., X̄^{n-1}} to be a generating set of F[X]/(µ_α) over F.
⁹According to (b), F(α) is a finite-dimensional vector space over F. Thus, we can consider the F-linear endomorphism A : F(α) −→ F(α), A(x) := αx. Then µ_α is actually also the minimal polynomial of A in the sense of Th. 8.6: Indeed, for each x ∈ F(α), ǫ_A(µ_α)(x) = ǫ_α(µ_α) · x = 0 · x = 0, showing ǫ_A(µ_α) = 0; and, for each polynomial f ∈ F[X] such that ǫ_A(f) = 0, one has ǫ_α(f) = ǫ_A(f)(1) = 0 and, thus, f = g µ_α for some g ∈ F[X], showing µ_α | f.
To verify linear independence, let a_0, ..., a_{n-1} ∈ F be such that

    Σ_{i=0}^{n-1} a_i X̄^i = (Σ_{i=0}^{n-1} a_i X^i) + (µ_α) = 0 ∈ F[X]/(µ_α).

Then Σ_{i=0}^{n-1} a_i X^i ∈ ker ǫ_α, implying a_0 = ··· = a_{n-1} = 0, since deg(Σ_{i=0}^{n-1} a_i X^i) < n. Thus, {X̄^0, ..., X̄^{n-1}} is a basis of F[X]/(µ_α) and, using the isomorphism of (D.2), {α^0, ..., α^{n-1}} is a basis of F[α], proving [F[α] : F] = n = deg µ_α. Finally, as [F(α) : F] = deg µ_α < ∞, F(α) ≅ F[X]/ker ǫ_α is algebraic over F by Th. D.4.
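As a concrete illustration of Th. D.5(b), the following ad hoc Python sketch performs arithmetic in Q(√2) ≅ Q[X]/(X^2 − 2), representing elements in the basis {1, √2} as pairs of rationals; multiplication reduces X^2 to 2 (i.e. takes the remainder mod X^2 − 2), and inversion exhibits that F[α] is indeed a field. The function names are illustrative only.

from fractions import Fraction as Fr

def mul(u, v):
    """(a + b*r)(c + d*r) with r = sqrt(2): reduce r^2 to 2 (remainder mod X^2 - 2)."""
    (a, b), (c, d) = u, v
    return (a * c + 2 * b * d, a * d + b * c)

def inv(u):
    """1/(a + b*r) = (a - b*r)/(a^2 - 2*b^2); the denominator is nonzero for u != 0,
    since X^2 - 2 has no rational zero."""
    a, b = u
    den = a * a - 2 * b * b
    return (a / den, -b / den)

u = (Fr(1), Fr(1))                                            # 1 + sqrt(2)
assert mul(u, inv(u)) == (Fr(1), Fr(0))                       # Q[alpha] is a field
assert mul((Fr(0), Fr(1)), (Fr(0), Fr(1))) == (Fr(2), Fr(0))  # alpha^2 = 2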
Definition D.6. Let F, L be fields with F ⊆ L and recall Not. B.21 as well as Prop. B.22(b).

(a) L is called a simple field extension of F if, and only if, there exists λ ∈ L such that L = F(λ). In this case, [F(λ) : F] is called the degree of λ over F.

(b) L is called a finitely generated field extension of F if, and only if, there exist λ_1, ..., λ_n ∈ L, n ∈ N, such that L = F(λ_1, ..., λ_n).
Theorem D.8. Let F, L be fields with F ⊆ L and assume L = F(λ_1, ..., λ_n), where n ∈ N and λ_1, ..., λ_n ∈ L are algebraic over F. Then:

(a) L = F(λ_1, ..., λ_n) = F[λ_1, ..., λ_n].

(b) L is a finite (and, thus, algebraic) field extension of F.

Proof. We carry out the proof via induction on n ∈ N, where the base case (n = 1) was already done in Th. D.5(b). Now let n > 1. By induction, we know K := F(λ_1, ..., λ_{n-1}) = F[λ_1, ..., λ_{n-1}] satisfies [K : F] < ∞. As λ_n is algebraic over K, Th. D.5(b) yields L = K(λ_n) = K[λ_n] = F[λ_1, ..., λ_n] and [L : K] < ∞, such that Th. D.3 implies

    [L : F] = [L : K] · [K : F] < ∞,

and L is algebraic over F by Th. D.4.
Corollary D.9. Let F, L be fields with F ⊆ L. Then the following statements are equivalent:

(i) L is a finite field extension of F.

(ii) There exist n ∈ N and α_1, ..., α_n ∈ L, algebraic over F, such that L = F(α_1, ..., α_n).

(iii) L is a finitely generated and algebraic field extension of F.

Proof. (ii) implies (i) and (iii) by Th. D.8. If (iii), then L = F(α_1, ..., α_n) with α_1, ..., α_n ∈ L. However, as L is algebraic over F, the elements α_1, ..., α_n ∈ L are all algebraic over F, showing (iii) implies (ii). Finally, assume (i), i.e. there exist n ∈ N and B := {α_1, ..., α_n} ⊆ L such that B forms a basis of L over F. Then L = F(α_1, ..., α_n) and α_1, ..., α_n are algebraic over F by Th. D.4, showing (i) implies (ii).
Corollary D.10. Let F, L be fields with F ⊆ L. Then the following statements are equivalent:

(i) L is an algebraic field extension of F.

(ii) There exists a family A := (α_i)_{i∈I} of elements of L, all algebraic over F, such that L = F(A).

Proof. It is immediate that (i) implies (ii) (as one can choose A := L). Conversely, if L = F(A), where A := (α_i)_{i∈I} is a family of algebraic elements over F, then, by Prop. B.22(b),

    L = F(A) = ⋃_{J⊆I: #J<∞} F((α_i)_{i∈J}),

where each F((α_i)_{i∈J}) with finite J is algebraic over F by Th. D.8. Thus, each x ∈ L is algebraic over F, proving (i).
Theorem D.11. Let F, K, L be fields such that F ⊆ K ⊆ L and let α ∈ L.

(a) If K is algebraic over F and α is algebraic over K, then α is algebraic over F.

(b) L is algebraic over F if, and only if, K is algebraic over F and L is algebraic over K.
Proof. (a): Since α is algebraic over K, there exist n ∈ N and κ_0, ..., κ_n ∈ K, not all equal to 0, such that Σ_{i=0}^n κ_i α^i = 0. This shows that α is also already algebraic over K_0 := F(κ_0, ..., κ_n) ⊆ K, and [K_0[α] : K_0] < ∞ by Th. D.5(b). Moreover, [K_0 : F] < ∞ by Th. D.8 and, thus, [K_0[α] : F] = [K_0[α] : K_0] · [K_0 : F] < ∞ by Th. D.3. Thus, K_0[α] is algebraic over F by Th. D.4 and, in particular, α is algebraic over F.
(b): If K is algebraic over F and L is algebraic over K, then L is algebraic over F by
(a). Conversely, if L is algebraic over F , then K ⊆ L is algebraic over F , and L is
algebraic over K, as F ⊆ K.
Example D.12. Consider the field of algebraic numbers, defined by

    A := {α ∈ C : α is algebraic over Q}.

Indeed, A is a field, since, if α, β ∈ A, then A contains the field Q(α, β) by Th. D.8, i.e., in particular, αβ, α + β, −α ∈ A and, for 0 ≠ α, α^{-1} ∈ A. Thus, A is an algebraic field extension of Q, where an argument completely analogous to the one in Ex. D.2(c) shows A to be countable and, in particular, A ⊊ C. One can show that A is not a finite field extension of Q: Since (p^{1/n})^n − p = 0 for each n, p ∈ N, we have p^{1/n} ∈ A for each n, p ∈ N. Since, for p prime, X^n − p ∈ Q[X] is the minimal polynomial of p^{1/n} and deg(X^n − p) = n, one obtains [A : Q] ≥ [Q(p^{1/n}) : Q] = n, showing [A : Q] = ∞. However, to actually prove X^n − p ∈ Q[X] to be the minimal polynomial of p^{1/n} for p prime, one needs to show X^n − p is irreducible. This turns out to be somewhat tricky and is usually obtained by using Eisenstein's irreducibility criterion (cf., e.g., [Bos13, Th. 2.8.1]).
The goal of the present section is to show every field is contained in an algebraic closure (Th. D.16) and that all algebraic closures of a field are isomorphic (Cor. D.20). Both results are based on suitable applications of Zorn's lemma. In preparation for Th. D.16, we first show in the following Th. D.14 that one can always extend a field F to a field L such that a particular given polynomial over F has a zero in L.

Definition D.13. A field L is called algebraically closed if, and only if, each f ∈ L[X] with deg f ≥ 1 has a zero in L. A field extension F̄ of a field F is called an algebraic closure of F if, and only if, F̄ is algebraic over F and F̄ is algebraically closed.
Theorem D.14. Let F be a field and f ∈ F[X] such that deg f ≥ 1. Then there exists an algebraic field extension L of F such that f has a zero in L (i.e. such that ǫ_α(f) = 0 for some α ∈ L). Moreover, if f is irreducible over F, then one may choose L := F[X]/(f).

Proof. It suffices to consider the case where f is irreducible over F: If f is not irreducible, then, by Cor. 7.37, one writes f = f_1 ··· f_n with irreducible f_1, ..., f_n ∈ F[X], n ∈ N, showing f to have a zero in L if an irreducible factor f_i has a zero in L. Thus, we now assume f to be irreducible. Then, according to Th. C.10, the ideal (f) is maximal in F[X], i.e. L := F[X]/(f) is a field by Th. C.8(c). In the usual way, we consider F ⊆ F[X], i.e. F[X] as a ring extension of F, and we consider the canonical epimorphism

    φ : F[X] −→ L = F[X]/(f),  φ(g) = g + (f).

Then φ↾_F : F −→ L is a unital ring homomorphism between fields and, thus, injective, by Prop. 7.28. Thus, L is a field extension of F, where we can consider F ⊆ L, if we identify F with φ(F). We claim that φ(X) ∈ L is the desired zero of f: Indeed, if c_0, ..., c_n ∈ F (n ∈ N) are such that f = Σ_{i=0}^n c_i X^i, then

    ǫ_{φ(X)}(f) = Σ_{i=0}^n c_i φ(X)^i = φ(Σ_{i=0}^n c_i X^i) = φ(f) = 0 ∈ L,

where, in the notation, we did not distinguish between c_i ∈ F and φ(c_i) ∈ L. Finally, as L = F(φ(X)) with φ(X) algebraic over F, L is algebraic over F by Th. D.8.
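The construction of Th. D.14 can be carried out by hand in the smallest nontrivial case: for F = F_2 and the irreducible f = X^2 + X + 1, the field L = F[X]/(f) has four elements. The following ad hoc Python sketch (illustration only) verifies that the coset φ(X) is a zero of f in L.

def add(u, v):
    """Addition in L = F_2[X]/(X^2 + X + 1); elements are pairs (c0, c1) = c0 + c1*X."""
    return ((u[0] + v[0]) % 2, (u[1] + v[1]) % 2)

def mul(u, v):
    """Multiplication mod 2 and mod X^2 + X + 1, i.e. X^2 is replaced by X + 1."""
    (a0, a1), (b0, b1) = u, v
    c0, c1, c2 = a0 * b0, a0 * b1 + a1 * b0, a1 * b1  # coefficients before reduction
    return ((c0 + c2) % 2, (c1 + c2) % 2)

x, one = (0, 1), (1, 0)  # the coset phi(X) and the unit of L
# phi(X) is a zero of f = X^2 + X + 1 in L:
assert add(add(mul(x, x), x), one) == (0, 0)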
After applying Th. D.14 n times, we obtain a field extension K of F such that each of the g_i has a zero α_i ∈ K. Define x := (x_f)_{f∈I} by letting

    x_f := { α_i   for f = g_i,
           { 0     otherwise,
Then φ↾_F : F −→ L_1 is a unital ring homomorphism between fields and, thus, injective, by Prop. 7.28. Thus, L_1 is a field extension of F, where we can consider F ⊆ L_1, if we identify F with φ(F). We proceed analogously to the proof of Th. D.14 and show φ(X_f) ∈ L_1 is the desired zero of f ∈ I: Indeed, if c_0, ..., c_n ∈ F (n ∈ N) are such that f = Σ_{i=0}^n c_i X^i ∈ I, then

    ǫ_{φ(X_f)}(f) = Σ_{i=0}^n c_i φ(X_f)^i = φ(Σ_{i=0}^n c_i (X_f)^i) = φ(f(X_f)) = 0 ∈ L_1

(the last equality holding due to f(X_f) ∈ m),
where, in the notation, we did not distinguish between c_i ∈ F and φ(c_i) ∈ L_1. Next, we show L_1 to be algebraic over F: With α := (φ(X_f))_{f∈I} and ǫ_α : F[(X_f)_{f∈I}] −→ L_1, we have F[α] = Im ǫ_α as a subring of L_1. According to the isomorphism Th. C.4,

    F[(X_f)_{f∈I}]/ker ǫ_α ≅ Im ǫ_α = F[α] ⊆ L_1.

Moreover, according to Cor. D.10, F(α) ⊆ L_1 is algebraic over F and, using Th. D.8(a), we have

    F(α) = ⋃_{J⊆I: #J<∞} F((φ(X_f))_{f∈J}) = ⋃_{J⊆I: #J<∞} F[(φ(X_f))_{f∈J}] = F[α].

Thus, F[α] is a field and ker ǫ_α is a maximal ideal in F[(X_f)_{f∈I}]. Since m ⊆ ker ǫ_α, this shows m = ker ǫ_α and L_1 = F(α). In particular, L_1 is algebraic over F.
One can now inductively iterate the above construction to obtain a sequence (Lk )k∈N of
fields such that
F ⊆ L1 ⊆ L2 ⊆ . . . ,
where, for each k ∈ N, Lk+1 is an algebraic field extension of Lk and f ∈ Lk [X] with
deg f ≥ 1 has a zero in Lk+1 . Then, as a consequence of Th. D.11(b), each Lk is also
algebraic over F. If we now let

    F̄ := ⋃_{k∈N} L_k,

then F̄ is a field according to [Phi19, Ex. 4.36(f)] (actually, one first needs to extend + and · to F̄, which is straightforward, since for a, b ∈ F̄, there exists k ∈ N such that a, b ∈ L_k, and, thus, a + b and a · b are already defined in L_k; as the L_k are nested, this yields a well-defined + and · on F̄). Now F̄ is an algebraic closure of F: Indeed, if α ∈ F̄, then α ∈ L_k for some k ∈ N and, thus, α is algebraic over F, proving F̄ to be algebraic over F. If f ∈ F̄[X] with deg f ≥ 1, then, as f has only finitely many coefficients, there exists k ∈ N such that f ∈ L_k[X]. Then f has a zero α ∈ L_{k+1} ⊆ F̄, showing F̄ to be algebraically closed.
It is remarked in [Bos13], after the proof of [Bos13, Th. 3.4.4], that one can show by different means that, in the situation of the proof of Th. D.16 above, one actually has L_1 = F̄, i.e. one does, in fact, obtain an algebraic closure of F already in the first step of the construction.
In preparation for showing that two algebraic closures of a field F are necessarily isomorphic, we need to briefly study extensions of homomorphisms between fields (a topic that is also of algebraic interest beyond its application here).
Lemma D.17. Let F, L be fields and let σ : F −→ L be a unital homomorphism.

(a) σ extends to a map

    σ : F[X] −→ L[X],  f = Σ_{i=0}^n f_i X^i ↦ f^σ := Σ_{i=0}^n σ(f_i) X^i.    (D.4)

(b) If x ∈ F is a zero of f ∈ F[X], then σ(x) ∈ L is a zero of f^σ: Indeed, if ǫ_x(f) = 0, then

    ǫ_{σ(x)}(f^σ) = Σ_{i=0}^n σ(f_i) σ(x)^i = σ(Σ_{i=0}^n f_i x^i) = σ(ǫ_x(f)) = σ(0) = 0,

as claimed.
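For a concrete instance of Lem. D.17, take σ : C −→ C to be complex conjugation (a unital field homomorphism); the following ad hoc Python sketch checks, on an example, that a zero x of f yields the zero σ(x) of f^σ.

def sigma_poly(coeffs):
    """f^sigma as in (D.4) for sigma = complex conjugation: conjugate each coefficient."""
    return [c.conjugate() for c in coeffs]

def evaluate(coeffs, x):
    """eps_x(f), coefficients listed lowest degree first."""
    return sum(c * x ** i for i, c in enumerate(coeffs))

f = [2 + 2j, -(3 + 1j), 1]  # f = (X - (1+i)) * (X - 2) = X^2 - (3+i)X + (2+2i)
x = 1 + 1j
assert evaluate(f, x) == 0                          # x is a zero of f
assert evaluate(sigma_poly(f), x.conjugate()) == 0  # sigma(x) is a zero of f^sigma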
Proposition D.18. Let F, K be fields such that K = F(α) is a simple algebraic field extension of F, α ∈ K. Let L be another field and let σ : F −→ L be a unital homomorphism. Let µ_α ∈ F[X] denote the minimal polynomial of α over F and let µ_α^σ ∈ L[X] denote its image under σ according to (D.4).

(a) If τ : K −→ L is a homomorphism extending σ (i.e. with τ↾_F = σ), then τ(α) is a zero of µ_α^σ.

(b) For each zero λ ∈ L of µ_α^σ, there exists a unique homomorphism τ : K −→ L such that τ↾_F = σ and τ(α) = λ.
Proof. (a) is immediate from Lem. D.17(b), since ǫ_α(µ_α) = 0 and µ_α^τ = µ_α^σ.

(b): By definition, K = F(α) is the field of fractions of F[α] = Im ǫ_α, ǫ_α : F[X] −→ K. If τ_1, τ_2 : K −→ L are homomorphisms extending σ such that τ_1(α) = τ_2(α) and f, g ∈ F[X] are such that α is not a zero of g, then

    τ_1(ǫ_α(f)/ǫ_α(g)) = ǫ_{τ_1(α)}(f^σ)/ǫ_{τ_1(α)}(g^σ) = ǫ_{τ_2(α)}(f^σ)/ǫ_{τ_2(α)}(g^σ) = τ_2(ǫ_α(f)/ǫ_α(g)),

showing τ_1 = τ_2, proving the uniqueness statement (one could have even omitted the denominators in the previous computation, as we know K = F(α) = F[α] from Th. D.5(b)). To prove the existence statement, let λ ∈ L be such that ǫ_λ(µ_α^σ) = 0. Consider the homomorphisms

    ǫ_α : F[X] −→ K  and  ψ : F[X] −→ L,  ψ(f) := ǫ_λ(f^σ).

Then we know ker ǫ_α = (µ_α) from Th. D.5(b). We also know (µ_α) ⊆ ker ψ, since, for each f ∈ F[X],

    ψ(f µ_α) = ǫ_λ((f µ_α)^σ) = ǫ_λ(f^σ) ǫ_λ(µ_α^σ) = 0.
If

    φ : F[X] −→ F[X]/(µ_α),  φ(f) = f + (µ_α),

is the canonical epimorphism, then, by the isomorphism Th. C.4, we can write ǫ_α = φ_α ∘ φ with a monomorphism φ_α : F[X]/(µ_α) −→ K and we can write ψ = φ_ψ ∘ φ with a monomorphism φ_ψ : F[X]/(µ_α) −→ L. As mentioned above, we know K = F(α) = F[α], implying ǫ_α to be surjective, i.e. φ_α is also an epimorphism and, thus, an isomorphism. We claim that

    τ : K −→ L,  τ := φ_ψ ∘ φ_α^{-1},

is the desired homomorphism: If x ∈ F, then x = ǫ_α(x) = φ_α(x + (µ_α)) (viewing x as a constant polynomial in F[X]), implying

    τ(x) = (φ_ψ ∘ φ_α^{-1})(x) = φ_ψ(x + (µ_α)) = (φ_ψ ∘ φ)(x) = ψ(x) = σ(x) ∈ L,

i.e. τ↾_F = σ. Moreover, α = ǫ_α(X) = φ_α(X + (µ_α)), implying

    τ(α) = (φ_ψ ∘ φ_α^{-1})(α) = φ_ψ(X + (µ_α)) = (φ_ψ ∘ φ)(X) = ψ(X) = ǫ_λ(X) = λ,

as desired.
Theorem D.19. Let F, K, L be fields, let K be an algebraic field extension of F, and let σ : F −→ L be a unital homomorphism.

(a) If L is algebraically closed, then there exists a homomorphism τ : K −→ L extending σ, i.e. such that τ↾_F = σ.

(b) If both K and L are algebraically closed, L is algebraic over σ(F), and τ : K −→ L is a homomorphism such that τ↾_F = σ, then τ is an isomorphism.
Proof. (a): The proof is basically a combination of Prop. D.18(b) with Zorn's lemma: To apply Zorn's lemma of [Phi19, Th. A.52(iii)], we define a partial order on the set

    M := {(H, ϕ) : F ⊆ H ⊆ K, H is a field, ϕ : H −→ L is a homomorphism with ϕ↾_F = σ}

by letting

    (H_1, ϕ_1) ≤ (H_2, ϕ_2)  :⇔  H_1 ⊆ H_2 ∧ ϕ_2↾_{H_1} = ϕ_1.

Then (F, σ) ∈ M, i.e. M ≠ ∅. Every chain C ⊆ M has an upper bound, namely (H_C, ϕ_C) with H_C := ⋃_{(H,ϕ)∈C} H and ϕ_C(x) := ϕ(x), where (H, ϕ) ∈ C is chosen such that x ∈ H (since C is a chain, the value of ϕ_C(x) does not actually depend on the choice of (H, ϕ) ∈ C and is, thus, well-defined). Clearly, F ⊆ H_C ⊆ K, H_C is a field by [Phi19, Ex. 4.36(f)], and ϕ_C is a homomorphism with ϕ_C↾_F = σ. Thus, Zorn's lemma applies, yielding a maximal element (H_max, ϕ_max) ∈ M. We claim that H_max = K:
Indeed, if there exists α ∈ K \ H_max, then, by assumption, α is algebraic over F, and we may consider the minimal polynomial µ_α ∈ F[X] of α over F. If µ_α^σ ∈ L[X] denotes the image of µ_α under σ according to (D.4), then µ_α^σ has a zero λ ∈ L, since L is algebraically closed. Thus, by Prop. D.18(b), we can extend ϕ_max to H_max(α), where H_max ⊊ H_max(α) ⊆ K, in contradiction to the maximality of (H_max, ϕ_max). In consequence, τ := ϕ_max : K −→ L is the desired extension of σ.
(b): Under the hypotheses of (b), τ is injective by Prop. 7.28 and, thus, an isomorphism
between the fields K and τ (K), as τ (K) must be a field by [Phi19, Prop. 4.37]. In
consequence, if K is algebraically closed, then so is τ (K) (e.g. due to Lem. D.17(b)).
Since L is algebraic over σ(F ), L is algebraic over τ (K) ⊇ σ(F ), and Cor. D.15(ii) yields
τ (K) = L, showing τ to be an isomorphism between K and L as claimed.
Corollary D.20. Let F be a field. If L_1 and L_2 are both algebraic closures of F, then there exists an isomorphism φ : L_1 −→ L_2 such that φ↾_F = Id_F (however, this existence result is nonconstructive, as it is based on an application of Zorn's lemma).

Proof. Applying Th. D.19(a) with K := L_1, L := L_2, and σ := Id_F (viewed as a homomorphism F −→ L_2) yields a homomorphism φ : L_1 −→ L_2 with φ↾_F = Id_F. As L_1 and L_2 are algebraically closed and L_2 is algebraic over F = σ(F), Th. D.19(b) shows φ to be an isomorphism.
References
[Bos13] Siegfried Bosch. Algebra, 8th ed. Springer-Verlag, Berlin, 2013 (German).
[For17] Otto Forster. Analysis 3, 8th ed. Springer Spektrum, Wiesbaden, Ger-
many, 2017 (German).
[Jac75] Nathan Jacobson. Lectures in Abstract Algebra II. Linear Algebra. Gradu-
ate Texts in Mathematics, Springer, New York, 1975.
[Lan05] Serge Lang. Algebra, revised 3rd ed. Graduate Texts in Mathematics, Vol.
211, Springer, New York, 2005.
[Phi16a] P. Philip. Analysis I: Calculus of One Real Variable. Lecture Notes, LMU Mu-
nich, 2015/2016, AMS Open Math Notes Ref. # OMN:202109.111306, avail-
able in PDF format at
https://www.ams.org/open-math-notes/omn-view-listing?listingId=111306.
[Phi16b] P. Philip. Analysis II: Topology and Differential Calculus of Several Vari-
ables. Lecture Notes, LMU Munich, 2016, AMS Open Math Notes Ref. #
OMN:202109.111307, available in PDF format at
https://www.ams.org/open-math-notes/omn-view-listing?listingId=111307.
[Phi17a] P. Philip. Analysis III: Measure and Integration Theory of Several Variables.
Lecture Notes, LMU Munich, 2016/2017, AMS Open Math Notes Ref. #
OMN:202109.111308, available in PDF format at
https://www.ams.org/open-math-notes/omn-view-listing?listingId=111308.
[Phi19] P. Philip. Linear Algebra I. Lecture Notes, LMU Munich, 2018/2019.

[Rud73] W. Rudin. Functional Analysis. McGraw-Hill Book Company, New York, 1973.
[Str08] Gernot Stroth. Lineare Algebra, 2nd ed. Berliner Studienreihe zur Math-
ematik, Vol. 7, Heldermann Verlag, Lemgo, Germany, 2008 (German).