Math Basics
Calculus
Jean Gallier
Department of Computer and Information Science
University of Pennsylvania
Philadelphia, PA 19104, USA
e-mail: [email protected]
© Jean Gallier
December 9, 2014
Contents
1 Introduction
2 Vector Spaces, Bases, Linear Maps
2.1 Groups, Rings, and Fields
2.2 Vector Spaces
2.3 Linear Independence, Subspaces
2.4 Bases of a Vector Space
2.5 Linear Maps
2.6 Quotient Spaces
2.7 Summary
5 Determinants
5.1 Permutations, Signature of a Permutation
5.2 Alternating Multilinear Maps
5.3 Definition of a Determinant
5.4 Inverse Matrices and Determinants
5.5 Systems of Linear Equations and Determinants
5.6 Determinant of a Linear Map
5.7 The Cayley-Hamilton Theorem
5.8 Permanents
5.9 Further Readings
16.4 Summary
22 Annihilating Polynomials; Primary Decomposition
22.1 Annihilating Polynomials and the Minimal Polynomial
22.2 Minimal Polynomials of Diagonalizable Linear Maps
22.3 The Primary Decomposition Theorem
22.4 Nilpotent Linear Maps and Jordan Form
23 Tensor Algebras
23.1 Tensor Products
23.2 Bases of Tensor Products
23.3 Some Useful Isomorphisms for Tensor Products
23.4 Duality for Tensor Products
23.5 Tensor Algebras
23.6 Symmetric Tensor Powers
23.7 Bases of Symmetric Powers
23.8 Some Useful Isomorphisms for Symmetric Powers
23.9 Duality for Symmetric Powers
23.10 Symmetric Algebras
23.11 Exterior Tensor Powers
23.12 Bases of Exterior Powers
23.13 Some Useful Isomorphisms for Exterior Powers
23.14 Duality for Exterior Powers
23.15 Exterior Algebras
23.16 The Hodge $*$-Operator
23.17 Testing Decomposability; Left and Right Hooks
23.18 Vector-Valued Alternating Forms
23.19 The Pfaffian Polynomial
26 Topology
26.1 Metric Spaces and Normed Vector Spaces
26.2 Topological Spaces
26.3 Continuous Functions, Limits
26.4 Connected Sets
26.5 Compact Sets
26.6 Continuous Linear and Multilinear Maps
26.7 Normed Affine Spaces
26.8 Further Readings
27 A Detour On Fractals
27.1 Iterated Function Systems and Fractals
28 Differential Calculus
28.1 Directional Derivatives, Total Derivatives
28.2 Jacobian Matrices
28.3 The Implicit and The Inverse Function Theorems
28.4 Tangent Spaces and Differentials
28.5 Second-Order and Higher-Order Derivatives
28.6 Taylor's formula, Faà di Bruno's formula
28.7 Vector Fields, Covariant Derivatives, Lie Brackets
28.8 Further Readings
Chapter 1
Introduction
Chapter 2
Vector Spaces, Bases, Linear Maps
2.1 Groups, Rings, and Fields
In the following three chapters, the basic algebraic structures (groups, rings, fields, vector
spaces) are reviewed, with a major emphasis on vector spaces. Basic notions of linear algebra
such as vector spaces, subspaces, linear combinations, linear independence, bases, quotient
spaces, linear maps, matrices, change of basis, direct sums, linear forms, dual spaces, hyperplanes, and the transpose of a linear map are reviewed.
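The set $\mathbb{R}$ of real numbers has two operations $+\colon \mathbb{R} \times \mathbb{R} \to \mathbb{R}$ (addition) and $*\colon \mathbb{R} \times \mathbb{R} \to \mathbb{R}$ (multiplication) satisfying properties that make $\mathbb{R}$ into an abelian group under $+$, and
$\mathbb{R} - \{0\} = \mathbb{R}^*$ into an abelian group under $*$. Recall the definition of a group.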
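Definition 2.1. A group is a set $G$ equipped with a binary operation $\cdot\colon G \times G \to G$ that
associates an element $a \cdot b \in G$ to every pair of elements $a, b \in G$, and having the following
properties: $\cdot$ is associative, has an identity element $e \in G$, and every element in $G$ is invertible
(w.r.t. $\cdot$). More explicitly, this means that the following equations hold for all $a, b, c \in G$:
(G1) $a \cdot (b \cdot c) = (a \cdot b) \cdot c$ (associativity);
(G2) $a \cdot e = e \cdot a = a$ (identity);
(G3) For every $a \in G$, there is some $a^{-1} \in G$ such that $a \cdot a^{-1} = a^{-1} \cdot a = e$ (inverse).
A group $G$ is abelian (or commutative) if $a \cdot b = b \cdot a$ for all $a, b \in G$. A set $M$ together with an operation $\cdot\colon M \times M \to M$ and an element $e$ satisfying only conditions (G1) and (G2) is called a monoid.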
Example 2.1.
1. The set $\mathbb{Z} = \{\ldots, -n, \ldots, -1, 0, 1, \ldots, n, \ldots\}$ of integers is a group under addition,
with identity element $0$. However, $\mathbb{Z}^* = \mathbb{Z} - \{0\}$ is not a group under multiplication.
2. The set $\mathbb{Q}$ of rational numbers (fractions $p/q$ with $p, q \in \mathbb{Z}$ and $q \neq 0$) is a group
under addition, with identity element $0$. The set $\mathbb{Q}^* = \mathbb{Q} - \{0\}$ is also a group under
multiplication, with identity element $1$.
3. Given any nonempty set $S$, the set of bijections $f\colon S \to S$, also called permutations
of $S$, is a group under function composition (i.e., the multiplication of $f$ and $g$ is the
composition $g \circ f$), with identity element the identity function $\mathrm{id}_S$. This group is not
abelian as soon as $S$ has more than two elements.
4. The set of $n \times n$ invertible matrices with real (or complex) coefficients is a group under
matrix multiplication, with identity element the identity matrix $I_n$. This group is
called the general linear group and is usually denoted by $\mathbf{GL}(n, \mathbb{R})$ (or $\mathbf{GL}(n, \mathbb{C})$).
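For a finite set, the group axioms can be checked by brute force. The following is a minimal sketch, not part of the original text: the helper names `compose`, `is_group`, and `is_abelian` are hypothetical, and a permutation of $\{0, \ldots, n-1\}$ is assumed to be stored as the tuple of its values. It illustrates item 3 of Example 2.1 on sets with two and three elements.

```python
from itertools import permutations, product

def compose(g, f):
    """Return g o f, where a permutation p is stored as the tuple (p(0), ..., p(n-1))."""
    return tuple(g[f[i]] for i in range(len(f)))

def is_group(elements, op, identity):
    """Brute-force check of closure and of axioms (G1), (G2), (G3) on a finite set."""
    elems = set(elements)
    closed = all(op(a, b) in elems for a, b in product(elems, repeat=2))
    assoc = all(op(a, op(b, c)) == op(op(a, b), c)
                for a, b, c in product(elems, repeat=3))
    ident = all(op(a, identity) == a and op(identity, a) == a for a in elems)
    inverses = all(any(op(a, b) == identity and op(b, a) == identity for b in elems)
                   for a in elems)
    return closed and assoc and ident and inverses

def is_abelian(elements, op):
    """Check a . b = b . a for all pairs."""
    return all(op(a, b) == op(b, a) for a, b in product(elements, repeat=2))

S2 = list(permutations(range(2)))   # the 2 bijections of {0, 1}
S3 = list(permutations(range(3)))   # the 6 bijections of {0, 1, 2}

print(is_group(S3, compose, (0, 1, 2)))   # group axioms hold
print(is_abelian(S2, compose))            # two elements: abelian
print(is_abelian(S3, compose))            # three elements: not abelian
```

The check is exhaustive, so it only makes sense for small finite sets; it says nothing about infinite groups such as $\mathbb{Z}$ or $\mathbf{GL}(n, \mathbb{R})$.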
It is customary to denote the operation of an abelian group $G$ by $+$, in which case the
inverse $a^{-1}$ of an element $a \in G$ is denoted by $-a$.
The identity element of a group is unique. In fact, we can prove a more general fact:
Fact 1. If a binary operation $\cdot\colon M \times M \to M$ is associative and if $e' \in M$ is a left identity
and $e'' \in M$ is a right identity, which means that
$$e' \cdot a = a \quad \text{for all } a \in M \tag{G2l}$$
and
$$a \cdot e'' = a \quad \text{for all } a \in M, \tag{G2r}$$
then $e' = e''$.
Proof. If we let $a = e''$ in equation (G2l), we get
$$e' \cdot e'' = e'',$$
and if we let $a = e'$ in equation (G2r), we get
$$e' \cdot e'' = e',$$
and thus
$$e' = e' \cdot e'' = e'',$$
as claimed.
Fact 1 implies that the identity element of a monoid is unique, and since every group is
a monoid, the identity element of a group is unique. Furthermore, every element in a group
has a unique inverse. This is a consequence of a slightly more general fact:
Fact 2. In a monoid $M$ with identity element $e$, if some element $a \in M$ has some left inverse
$a' \in M$ and some right inverse $a'' \in M$, which means that
$$a' \cdot a = e \tag{G3l}$$
and
$$a \cdot a'' = e, \tag{G3r}$$
then $a' = a''$.
Proof. Using (G3l) and the fact that $e$ is an identity element, we have
$$(a' \cdot a) \cdot a'' = e \cdot a'' = a''.$$
Similarly, using (G3r) and the fact that $e$ is an identity element, we have
$$a' \cdot (a \cdot a'') = a' \cdot e = a'.$$
However, since $M$ is a monoid, the operation $\cdot$ is associative, so
$$a' = a' \cdot (a \cdot a'') = (a' \cdot a) \cdot a'' = a'',$$
as claimed.
Remark: Axioms (G2) and (G3) can be weakened a bit by requiring only (G2r) (the existence of a right identity) and (G3r) (the existence of a right inverse for every element) (or
(G2l) and (G3l)). It is a good exercise to prove that the group axioms (G2) and (G3) follow
from (G2r) and (G3r).
If a group $G$ has a finite number $n$ of elements, we say that $G$ is a group of order $n$. If
$G$ is infinite, we say that $G$ has infinite order. The order of a group is usually denoted by
$|G|$ (if $G$ is finite).
Given a group $G$, for any two subsets $R, S \subseteq G$, we let
$$RS = \{r \cdot s \mid r \in R,\ s \in S\}.$$
In particular, for any $g \in G$, if $R = \{g\}$, we write
$$gS = \{g \cdot s \mid s \in S\},$$
and similarly, if $S = \{g\}$, we write
$$Rg = \{r \cdot g \mid r \in R\}.$$
For any $g \in G$, define $L_g$, the left translation by $g$, by $L_g(a) = ga$, for all $a \in G$, and
$R_g$, the right translation by $g$, by $R_g(a) = ag$, for all $a \in G$. Observe that $L_g$ and $R_g$ are
bijections. We show this for $L_g$, the proof for $R_g$ being similar.
If $L_g(a) = L_g(b)$, then $ga = gb$, and multiplying on the left by $g^{-1}$, we get $a = b$, so $L_g$
is injective. For any $b \in G$, we have $L_g(g^{-1}b) = gg^{-1}b = b$, so $L_g$ is surjective. Therefore, $L_g$
is bijective.
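The bijectivity of translations can be observed directly on a small group. The sketch below is an illustration only: it assumes the group of permutations of $\{0, 1, 2\}$ stored as tuples, and the names `compose`, `left_translation`, and `right_translation` are hypothetical.

```python
from itertools import permutations

def compose(g, f):
    """Return g o f for permutations stored as tuples of values."""
    return tuple(g[f[i]] for i in range(len(f)))

G = list(permutations(range(3)))

def left_translation(g):
    """L_g as a dictionary a -> g a."""
    return {a: compose(g, a) for a in G}

def right_translation(g):
    """R_g as a dictionary a -> a g."""
    return {a: compose(a, g) for a in G}

# On a finite set, a map G -> G whose image is all of G is a bijection.
all_bijective = all(
    set(left_translation(g).values()) == set(G)
    and set(right_translation(g).values()) == set(G)
    for g in G
)
print(all_bijective)
```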
Definition 2.2. Given a group $G$, a subset $H$ of $G$ is a subgroup of $G$ iff
(1) The identity element $e$ of $G$ also belongs to $H$ ($e \in H$);
(2) For all $h_1, h_2 \in H$, we have $h_1h_2 \in H$;
(3) For all $h \in H$, we have $h^{-1} \in H$.
The proof of the following proposition is left as an exercise.
Proposition 2.1. Given a group $G$, a subset $H \subseteq G$ is a subgroup of $G$ iff $H$ is nonempty
and whenever $h_1, h_2 \in H$, then $h_1h_2^{-1} \in H$.
If the group $G$ is finite, then the following criterion can be used.
Proposition 2.2. Given a finite group $G$, a subset $H \subseteq G$ is a subgroup of $G$ iff
(1) $e \in H$;
(2) $H$ is closed under multiplication.
Proof. We just have to prove that condition (3) of Definition 2.2 holds. For any $a \in H$, since
the left translation $L_a$ is bijective, its restriction to $H$ is injective, and since $H$ is finite and closed under multiplication, this restriction maps $H$ into $H$ and is
also bijective. Since $e \in H$, there is a unique $b \in H$ such that $L_a(b) = ab = e$. However, if
$a^{-1}$ is the inverse of $a$ in $G$, we also have $L_a(a^{-1}) = aa^{-1} = e$, and by injectivity of $L_a$, we
have $a^{-1} = b \in H$.
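Proposition 2.2 can be tested exhaustively on a small finite group. The following sketch is an illustration under stated assumptions (permutations of $\{0, 1, 2\}$ stored as tuples; hypothetical helper names): it enumerates every subset containing $e$ that is closed under multiplication and checks that each one is also closed under inverses.

```python
from itertools import combinations, permutations

def compose(g, f):
    """Return g o f for permutations stored as tuples of values."""
    return tuple(g[f[i]] for i in range(len(f)))

def inverse(g):
    """Inverse permutation: if g(i) = j then inverse(g)(j) = i."""
    inv = [0] * len(g)
    for i, j in enumerate(g):
        inv[j] = i
    return tuple(inv)

G = list(permutations(range(3)))
e = (0, 1, 2)
others = [g for g in G if g != e]

def closed_under_product(H):
    return all(compose(a, b) in H for a in H for b in H)

# All subsets containing e that satisfy condition (2) of Proposition 2.2.
closed_subsets = []
for r in range(len(others) + 1):
    for extra in combinations(others, r):
        H = {e, *extra}
        if closed_under_product(H):
            closed_subsets.append(H)

# Condition (3) of Definition 2.2 then holds automatically.
all_have_inverses = all(inverse(h) in H for H in closed_subsets for h in H)
print(len(closed_subsets), all_have_inverses)
```

Finiteness matters here: the natural numbers form a subset of $\mathbb{Z}$ that contains $0$ and is closed under addition, yet it is not a subgroup.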
Definition 2.3. If $H$ is a subgroup of $G$ and $g \in G$ is any element, the sets of the form $gH$
are called left cosets of $H$ in $G$ and the sets of the form $Hg$ are called right cosets of $H$ in
$G$.
The left cosets (resp. right cosets) of $H$ induce an equivalence relation $\sim$ defined as
follows: For all $g_1, g_2 \in G$,
$$g_1 \sim g_2 \quad \text{iff} \quad g_1H = g_2H$$
(resp. $g_1 \sim g_2$ iff $Hg_1 = Hg_2$). Obviously, $\sim$ is an equivalence relation.
Now, we claim that $g_1H = g_2H$ iff $g_2^{-1}g_1H = H$ iff $g_2^{-1}g_1 \in H$.
Proof. If we apply the bijection $L_{g_2^{-1}}$ to both $g_1H$ and $g_2H$ we get $L_{g_2^{-1}}(g_1H) = g_2^{-1}g_1H$
and $L_{g_2^{-1}}(g_2H) = H$, so $g_1H = g_2H$ iff $g_2^{-1}g_1H = H$. If $g_2^{-1}g_1H = H$, since $1 \in H$, we get
$g_2^{-1}g_1 \in H$. Conversely, if $g_2^{-1}g_1 \in H$, since $H$ is a group, the left translation $L_{g_2^{-1}g_1}$ is a
bijection of $H$, so $g_2^{-1}g_1H = H$. Thus, $g_2^{-1}g_1H = H$ iff $g_2^{-1}g_1 \in H$.
It follows that the equivalence class of an element $g \in G$ is the coset $gH$ (resp. $Hg$).
Since $L_g$ is a bijection between $H$ and $gH$, the cosets $gH$ all have the same cardinality. The
map $L_{g^{-1}} \circ R_g$ is a bijection between the left coset $gH$ and the right coset $Hg$, so they also
have the same cardinality. Since the distinct cosets $gH$ form a partition of $G$, we obtain the
following fact:
Proposition 2.3. (Lagrange) For any finite group G and any subgroup H of G, the order
h of H divides the order n of G.
The ratio n/h is denoted by (G : H) and is called the index of H in G. The index (G : H)
is the number of left (and right) cosets of H in G. Proposition 2.3 can be stated as
|G| = (G : H)|H|.
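The counting argument behind Proposition 2.3 can be traced on a concrete case. The sketch below is only an illustration: it assumes $G$ is the group of permutations of $\{0, 1, 2\}$ (stored as tuples) and $H$ is the two-element subgroup generated by the transposition of $0$ and $1$; the helper names are hypothetical.

```python
from itertools import permutations

def compose(g, f):
    """Return g o f for permutations stored as tuples of values."""
    return tuple(g[f[i]] for i in range(len(f)))

G = list(permutations(range(3)))
H = [(0, 1, 2), (1, 0, 2)]   # identity and the transposition swapping 0 and 1

def left_coset(g, H):
    """The coset gH as a frozenset, so that equal cosets compare equal."""
    return frozenset(compose(g, h) for h in H)

def right_coset(H, g):
    """The coset Hg as a frozenset."""
    return frozenset(compose(h, g) for h in H)

left_cosets = {left_coset(g, H) for g in G}
right_cosets = {right_coset(H, g) for g in G}
index = len(left_cosets)     # the index (G : H)

print(index, len(right_cosets), index * len(H) == len(G))
```

Every coset has $|H|$ elements and the distinct cosets cover $G$ without overlap, which is exactly the identity $|G| = (G : H)|H|$.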
The set of left cosets of H in G (which, in general, is not a group) is denoted G/H.
The points of G/H are obtained by collapsing all the elements in a coset into a single
element.
It is tempting to define a multiplication operation on left cosets (or right cosets) by
setting
$$(g_1H)(g_2H) = (g_1g_2)H,$$
but this operation is not well defined in general, unless the subgroup H possesses a special
property. This property is typical of the kernels of group homomorphisms, so we are led to
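Definition 2.4. Given any two groups $G$ and $G'$, a function $\varphi\colon G \to G'$ is a homomorphism
iff
$$\varphi(g_1g_2) = \varphi(g_1)\varphi(g_2), \quad \text{for all } g_1, g_2 \in G.$$
Taking $g_1 = g_2 = e$ (in $G$), we see that
$$\varphi(e) = e',$$
and taking $g_1 = g$ and $g_2 = g^{-1}$, we see that
$$\varphi(g^{-1}) = \varphi(g)^{-1}.$$
If $\varphi\colon G \to G'$ and $\psi\colon G' \to G''$ are group homomorphisms, then $\psi \circ \varphi\colon G \to G''$ is also a
homomorphism. If $\varphi\colon G \to G'$ is a homomorphism of groups, and $H \subseteq G$, $H' \subseteq G'$ are two
subgroups, then it is easily checked that
$$\mathrm{Im}\, H = \varphi(H) = \{\varphi(g) \mid g \in H\}$$
is a subgroup of $G'$, and that $\varphi^{-1}(H') = \{g \in G \mid \varphi(g) \in H'\}$ is a subgroup of $G$. In particular, when $H' = \{e'\}$, we obtain the kernel $\mathrm{Ker}\,\varphi = \{g \in G \mid \varphi(g) = e'\}$ of $\varphi$.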
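A group homomorphism $\varphi\colon G \to G'$ is an isomorphism iff there is a homomorphism $\psi\colon G' \to G$ such that $\psi \circ \varphi = \mathrm{id}_G$ and $\varphi \circ \psi = \mathrm{id}_{G'}$.
In this case, $\psi$ is unique and it is denoted $\varphi^{-1}$. When $\varphi$ is an isomorphism we say that the
groups $G$ and $G'$ are isomorphic. It is easy to see that a bijective homomorphism is an
isomorphism. When $G' = G$, a group isomorphism is called an automorphism. Note that the left
translations $L_g$ and the right translations $R_g$ are bijections of $G$, but they are not automorphisms unless $g = e$, since $L_g(e) = R_g(e) = g$ whereas a homomorphism must map $e$ to $e$.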
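We claim that $H = \mathrm{Ker}\,\varphi$ satisfies the following property:
$$gH = Hg, \quad \text{for all } g \in G. \tag{$*$}$$
Note that ($*$) is equivalent to
$$gHg^{-1} = H, \quad \text{for all } g \in G,$$
and the above is equivalent to
$$gHg^{-1} \subseteq H, \quad \text{for all } g \in G.$$
This is because $gHg^{-1} \subseteq H$ implies $H \subseteq g^{-1}Hg$, and this for all $g \in G$. But,
$$\varphi(ghg^{-1}) = \varphi(g)\varphi(h)\varphi(g^{-1}) = \varphi(g)e'\varphi(g)^{-1} = \varphi(g)\varphi(g)^{-1} = e',$$
for all $h \in H = \mathrm{Ker}\,\varphi$ and all $g \in G$. Thus, by definition of $H = \mathrm{Ker}\,\varphi$, we have $gHg^{-1} \subseteq H$.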
Definition 2.5. For any group $G$, a subgroup $N$ of $G$ is a normal subgroup of $G$ iff
$$gNg^{-1} = N, \quad \text{for all } g \in G.$$
This is denoted by $N \lhd G$.
Observe that if G is abelian, then every subgroup of G is normal.
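Normality is easy to test by brute force in a small nonabelian group. The following sketch is an illustration, not part of the original text: it assumes the group of permutations of $\{0, 1, 2\}$ stored as tuples, and the names `compose`, `inverse`, and `is_normal` are hypothetical.

```python
from itertools import permutations

def compose(g, f):
    """Return g o f for permutations stored as tuples of values."""
    return tuple(g[f[i]] for i in range(len(f)))

def inverse(g):
    """Inverse permutation: if g(i) = j then inverse(g)(j) = i."""
    inv = [0] * len(g)
    for i, j in enumerate(g):
        inv[j] = i
    return tuple(inv)

def is_normal(N, G):
    """Check g N g^{-1} = N for every g in G (Definition 2.5)."""
    N_set = set(N)
    return all(
        {compose(compose(g, n), inverse(g)) for n in N} == N_set
        for g in G
    )

G = list(permutations(range(3)))
A3 = [(0, 1, 2), (1, 2, 0), (2, 0, 1)]   # identity and the two 3-cycles
H = [(0, 1, 2), (1, 0, 2)]               # identity and one transposition

print(is_normal(A3, G))   # the subgroup of 3-cycles is normal
print(is_normal(H, G))    # a subgroup generated by a transposition is not
```

For the non-normal subgroup `H`, the left cosets and right cosets give different partitions of `G`, which is why coset multiplication is not well defined for it.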
If N is a normal subgroup of G, the equivalence relation induced by left cosets is the
same as the equivalence induced by right cosets. Furthermore, this equivalence relation is
a congruence, which means that: For all $g_1, g_2, g_1', g_2' \in G$,
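Recall that a ring is a set $A$ equipped with two operations $+\colon A \times A \to A$ (addition) and $*\colon A \times A \to A$ (multiplication), with a zero element $0$ and a multiplicative identity $1$, satisfying the following axioms for all $a, b, c \in A$:
$$a + (b + c) = (a + b) + c \quad \text{(associativity of $+$)} \tag{2.1}$$
$$a + b = b + a \quad \text{(commutativity of $+$)} \tag{2.2}$$
$$a + 0 = 0 + a = a \quad \text{(zero)} \tag{2.3}$$
$$a + (-a) = (-a) + a = 0 \quad \text{(additive inverse)} \tag{2.4}$$
$$a * (b * c) = (a * b) * c \quad \text{(associativity of $*$)} \tag{2.5}$$
$$a * 1 = 1 * a = a \quad \text{(identity for $*$)} \tag{2.6}$$
$$(a + b) * c = (a * c) + (b * c) \quad \text{(distributivity)} \tag{2.7}$$
$$a * (b + c) = (a * b) + (a * c) \quad \text{(distributivity)} \tag{2.8}$$
From (2.7) and (2.8), we easily obtain
$$a * 0 = 0 * a = 0 \tag{2.9}$$
$$a * (-b) = (-a) * b = -(a * b). \tag{2.10}$$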
Note that (2.9) implies that if $1 = 0$, then $a = 0$ for all $a \in A$, and thus, $A = \{0\}$. The
ring $A = \{0\}$ is called the trivial ring. A ring for which $1 \neq 0$ is called nontrivial. The
multiplication $a * b$ of two elements $a, b \in A$ is often denoted by $ab$.
Example 2.2.
2. The group $\mathbb{R}[X]$ of polynomials in one variable with real coefficients is a ring under
multiplication of polynomials. It is a commutative ring.
3. The group of $n \times n$ matrices $\mathrm{M}_n(\mathbb{R})$ is a ring under matrix multiplication. However, it
is not a commutative ring.
4. The group $C(]a, b[)$ of continuous functions $f\colon ]a, b[ \to \mathbb{R}$ is a ring under the operation
$f \cdot g$ defined such that
$$(f \cdot g)(x) = f(x)g(x)$$
for all $x \in ]a, b[$.
Again, the reader will easily check that the ring axioms are satisfied, with $[0]$ as zero
and $[1]$ as multiplicative unit. The resulting ring is denoted by $\mathbb{Z}/p\mathbb{Z}$.^1 Observe that
if $p$ is composite, then this ring has zero-divisors. For example, if $p = 4$, then we have
$$2 \cdot 2 \equiv 0 \pmod{4}.$$
^1 The notation $\mathbb{Z}_p$ is sometimes used instead of $\mathbb{Z}/p\mathbb{Z}$ but it clashes with the notation for the $p$-adic integers
so we prefer not to use it.
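The zero-divisors of $\mathbb{Z}/n\mathbb{Z}$ can be listed by exhaustive search for small $n$. The sketch below is a minimal illustration with a hypothetical helper `zero_divisors`; residue classes are represented by the integers $0, \ldots, n - 1$.

```python
def zero_divisors(n):
    """Nonzero residues a modulo n such that a * b = 0 (mod n) for some nonzero b."""
    return [a for a in range(1, n)
            if any((a * b) % n == 0 for b in range(1, n))]

print(zero_divisors(4))   # [2], since 2 * 2 = 4 = 0 (mod 4)
print(zero_divisors(5))   # [], 5 is prime
print(zero_divisors(6))   # [2, 3, 4]
```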
if $n \geq 0$ (with $0 \cdot a = 0$) and
$$n \cdot a = -(-n) \cdot a$$
$$h(n) = n \cdot 1_A$$
is a ring homomorphism (where $1_A$ is the multiplicative identity of $A$).
2. Given any real $\lambda \in \mathbb{R}$, the evaluation map $\eta_\lambda\colon \mathbb{R}[X] \to \mathbb{R}$ defined by
$$\eta_\lambda(f(X)) = f(\lambda)$$
for every polynomial $f(X) \in \mathbb{R}[X]$ is a ring homomorphism.
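The homomorphism property of the evaluation map can be spot-checked numerically. This is a sketch under stated assumptions, not a proof: polynomials are stored as coefficient lists (lowest degree first), integer coefficients are used so that the arithmetic is exact, and the helper names are hypothetical.

```python
def poly_add(f, g):
    """Coefficientwise sum of two polynomials given as coefficient lists."""
    n = max(len(f), len(g))
    return [(f[i] if i < len(f) else 0) + (g[i] if i < len(g) else 0)
            for i in range(n)]

def poly_mul(f, g):
    """Product of two nonempty coefficient lists (Cauchy product)."""
    prod = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            prod[i + j] += a * b
    return prod

def evaluate(f, lam):
    """Evaluate f at lam using Horner's rule."""
    result = 0
    for c in reversed(f):
        result = result * lam + c
    return result

f = [1, 2, 3]        # 1 + 2X + 3X^2
g = [-4, 0, 5, 1]    # -4 + 5X^2 + X^3
lam = 7

# Evaluation is compatible with +, with *, and sends the constant 1 to 1.
print(evaluate(poly_add(f, g), lam) == evaluate(f, lam) + evaluate(g, lam))
print(evaluate(poly_mul(f, g), lam) == evaluate(f, lam) * evaluate(g, lam))
print(evaluate([1], lam) == 1)
```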
A ring homomorphism $h\colon A \to B$ is an isomorphism iff there is a homomorphism $g\colon B \to A$
such that $g \circ h = \mathrm{id}_A$ and $h \circ g = \mathrm{id}_B$. Then, $g$ is unique and denoted by $h^{-1}$. It is easy
to show that a bijective ring homomorphism $h\colon A \to B$ is an isomorphism. An isomorphism
from a ring to itself is called an automorphism.
$$h(x^{-1}) = h(x)^{-1}.$$
2.2 Vector Spaces
For every $n \geq 1$, let $\mathbb{R}^n$ be the set of $n$-tuples $x = (x_1, \ldots, x_n)$. Addition can be extended to
$\mathbb{R}^n$ as follows:
$$(x_1, \ldots, x_n) + (y_1, \ldots, y_n) = (x_1 + y_1, \ldots, x_n + y_n).$$
We can also define an operation $\cdot\colon \mathbb{R} \times \mathbb{R}^n \to \mathbb{R}^n$ as follows:
$$\lambda \cdot (x_1, \ldots, x_n) = (\lambda x_1, \ldots, \lambda x_n).$$
The resulting algebraic structure has some interesting properties, those of a vector space.
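The two operations on $\mathbb{R}^n$ translate directly into code. The sketch below is an illustration with hypothetical function names `vec_add` and `scalar_mul`; tuples stand in for $n$-tuples, and integer entries are used in the example so that equality checks are exact.

```python
def vec_add(x, y):
    """Componentwise sum of two n-tuples of the same length."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    return tuple(a + b for a, b in zip(x, y))

def scalar_mul(lam, x):
    """Multiply every component of x by the scalar lam."""
    return tuple(lam * a for a in x)

x, y = (1, 2, 3), (4, 5, 6)
print(vec_add(x, y))        # (5, 7, 9)
print(scalar_mul(2, x))     # (2, 4, 6)

# One of the vector space properties: lam . (x + y) = lam . x + lam . y
print(scalar_mul(2, vec_add(x, y)) == vec_add(scalar_mul(2, x), scalar_mul(2, y)))
```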
Before defining vector spaces, we need to discuss a strategic choice which, depending
on how it is settled, may reduce or increase headaches in dealing with notions such as linear
combinations and linear dependence (or independence). The issue has to do with using sets
of vectors versus sequences of vectors.
Our experience tells us that it is preferable to use sequences of vectors; even better,
indexed families of vectors. (We are not alone in having opted for sequences over sets, and
we are in good company; for example, Artin [4], Axler [7], and Lang [67] use sequences.
Nevertheless, some prominent authors such as Lax [71] use sets. We leave it to the reader
to conduct a survey on this issue.)
The issue is that the binary operation $+$ only tells us how to compute $a_1 + a_2$ for two
elements of $A$, but it does not tell us what the sum of three or more elements is. For example,
how should $a_1 + a_2 + a_3$ be defined?
What we have to do is to define $a_1 + a_2 + a_3$ by using a sequence of steps each involving two
elements, and there are two possible ways to do this: $a_1 + (a_2 + a_3)$ and $(a_1 + a_2) + a_3$. If our
operation $+$ is not associative, these are different values. If it is associative, then $a_1 + (a_2 + a_3) =
(a_1 + a_2) + a_3$, but then there are still six possible permutations of the indices $1, 2, 3$, and if
$+$ is not commutative, these values are generally different. If our operation is commutative,
then all six permutations have the same value.
Thus, if $+$ is associative and commutative,
it seems intuitively clear that a sum of the form $\sum_{i \in I} a_i$ does not depend on the order of the
operations used to compute it.
This is indeed the case, but a rigorous proof requires induction, and such a proof is
surprisingly involved. Readers may accept without proof the fact that sums of the form
$\sum_{i \in I} a_i$ are indeed well defined, and jump directly to Definition 2.9. For those who want to
see the gory details, here we go.
First, we define sums $\sum_{i \in I} a_i$, where $I$ is a finite sequence of distinct natural numbers,
say $I = (i_1, \ldots, i_m)$. If $I = (i_1, \ldots, i_m)$ with $m \geq 2$, we denote the sequence $(i_2, \ldots, i_m)$ by
$I - \{i_1\}$. We proceed by induction on the size $m$ of $I$. Let
$$\sum_{i \in I} a_i = a_{i_1}, \quad \text{if } m = 1,$$
$$\sum_{i \in I} a_i = a_{i_1} + \Bigl(\sum_{i \in I - \{i_1\}} a_i\Bigr), \quad \text{if } m > 1.$$
If the operation $+$ is not associative, the grouping of the terms matters. For instance, in
general
$$a_1 + (a_2 + (a_3 + a_4)) \neq (a_1 + a_2) + (a_3 + a_4).$$
However, if the operation $+$ is associative, the sum $\sum_{i \in I} a_i$ should not depend on the grouping
of the elements in $I$, as long as their order is preserved. For example, if $I = (1, 2, 3, 4, 5)$,
$J_1 = (1, 2)$, and $J_2 = (3, 4, 5)$, we expect that
$$\sum_{i \in I} a_i = \Bigl(\sum_{j \in J_1} a_j\Bigr) + \Bigl(\sum_{j \in J_2} a_j\Bigr).$$
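The inductive definition of $\sum_{i \in I} a_i$ and the role of associativity can be made concrete. The sketch below is an illustration only: `indexed_sum` is a hypothetical name for the right-nested definition ($a_{i_1}$ combined with the sum over the remaining indices), and subtraction serves as an example of a non-associative operation.

```python
import operator

def indexed_sum(a, I, op):
    """sum_{i in I} a_i for a nonempty index sequence I: a_{i1} op (sum over the rest)."""
    if len(I) == 1:
        return a[I[0]]
    return op(a[I[0]], indexed_sum(a, I[1:], op))

a = {1: 10, 2: 4, 3: 3, 4: 2, 5: 1}
I, J1, J2 = (1, 2, 3, 4, 5), (1, 2), (3, 4, 5)

# Associative operation: splitting I into J1 and J2 gives the same value.
whole_add = indexed_sum(a, I, operator.add)
split_add = operator.add(indexed_sum(a, J1, operator.add),
                         indexed_sum(a, J2, operator.add))

# Non-associative operation (subtraction): the grouping changes the result.
whole_sub = indexed_sum(a, I, operator.sub)    # 10 - (4 - (3 - (2 - 1)))
split_sub = operator.sub(indexed_sum(a, J1, operator.sub),
                         indexed_sum(a, J2, operator.sub))   # (10 - 4) - (3 - (2 - 1))

print(whole_add, split_add)   # 20 20
print(whole_sub, split_sub)   # 8 4
```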
Proposition 2.5. Given any nonempty set $A$ equipped with an associative binary operation
$+\colon A \times A \to A$, for any nonempty finite sequence $I$ of distinct natural numbers and for
any partition of $I$ into $p$ nonempty sequences $I_{k_1}, \ldots, I_{k_p}$, for some nonempty sequence $K =
(k_1, \ldots, k_p)$ of distinct natural numbers such that $k_i < k_j$ implies that $\alpha < \beta$ for all $\alpha \in I_{k_i}$
and all $\beta \in I_{k_j}$, for every sequence $(a_i)_{i \in I}$ of elements in $A$, we have
$$\sum_{\alpha \in I} a_\alpha = \sum_{k \in K} \Bigl(\sum_{\alpha \in I_k} a_\alpha\Bigr).$$
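Proof. We proceed by induction on the size $n$ of $I$, the case $n = 1$ being trivial. Assume that $n > 1$ and that $p \geq 2$ (if $p = 1$, then $I_{k_1} = I$ and there is nothing to prove), and write $J = (k_2, \ldots, k_p)$. There are two cases.
Case 1. The sequence $I_{k_1}$ has a single element, say $\beta$, which is the first element of $I$.
In this case, write $C$ for the sequence obtained from $I$ by deleting its first element $\beta$. By
definition,
$$\sum_{\alpha \in I} a_\alpha = a_\beta + \Bigl(\sum_{\alpha \in C} a_\alpha\Bigr),$$
and
$$\sum_{k \in K} \Bigl(\sum_{\alpha \in I_k} a_\alpha\Bigr) = a_\beta + \Bigl(\sum_{j \in J} \Bigl(\sum_{\alpha \in I_j} a_\alpha\Bigr)\Bigr).$$
Since $C$ has $n - 1$ elements, the induction hypothesis gives
$$\sum_{\alpha \in C} a_\alpha = \sum_{j \in J} \Bigl(\sum_{\alpha \in I_j} a_\alpha\Bigr),$$
which yields our identity.
Case 2. The sequence $I_{k_1}$ has at least two elements. Let $\beta$ be the first element of $I$ (and thus of $I_{k_1}$), let $I'$ and $I'_{k_1}$ be the sequences obtained from $I$ and $I_{k_1}$ by deleting $\beta$, and let $I'_{k_i} = I_{k_i}$ for $i = 2, \ldots, p$. The sequence $I'$ has $n - 1$ elements, so by the induction hypothesis applied to $I'$ and the $I'_{k_i}$, we get
$$\sum_{\alpha \in I'} a_\alpha = \sum_{k \in K} \Bigl(\sum_{\alpha \in I'_k} a_\alpha\Bigr) = \Bigl(\sum_{\alpha \in I'_{k_1}} a_\alpha\Bigr) + \Bigl(\sum_{j \in J} \Bigl(\sum_{\alpha \in I_j} a_\alpha\Bigr)\Bigr).$$
If we add the left-hand side to $a_\beta$, by definition we get $\sum_{\alpha \in I} a_\alpha$.
If we add the right-hand side to $a_\beta$, using associativity and the definition of an indexed sum,
we get
$$a_\beta + \Bigl(\Bigl(\sum_{\alpha \in I'_{k_1}} a_\alpha\Bigr) + \Bigl(\sum_{j \in J} \Bigl(\sum_{\alpha \in I_j} a_\alpha\Bigr)\Bigr)\Bigr) = \Bigl(a_\beta + \Bigl(\sum_{\alpha \in I'_{k_1}} a_\alpha\Bigr)\Bigr) + \Bigl(\sum_{j \in J} \Bigl(\sum_{\alpha \in I_j} a_\alpha\Bigr)\Bigr)$$
$$= \Bigl(\sum_{\alpha \in I_{k_1}} a_\alpha\Bigr) + \Bigl(\sum_{j \in J} \Bigl(\sum_{\alpha \in I_j} a_\alpha\Bigr)\Bigr) = \sum_{k \in K} \Bigl(\sum_{\alpha \in I_k} a_\alpha\Bigr),$$
as claimed.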
If $I = (1, \ldots, n)$, we also write $\sum_{i=1}^{n} a_i$ instead of $\sum_{i \in I} a_i$. Since $+$ is associative, Proposition 2.5
shows that the sum $\sum_{i=1}^{n} a_i$ is independent of the grouping of its elements, which
justifies the use of the notation $a_1 + \cdots + a_n$ (without any parentheses).
If we also assume that our associative binary operation on $A$ is commutative, then we
can show that the sum $\sum_{i \in I} a_i$ does not depend on the ordering of the index set $I$.
Proposition 2.6. Given any nonempty set $A$ equipped with an associative and commutative
binary operation $+\colon A \times A \to A$, for any two nonempty finite sequences $I$ and $J$ of distinct
natural numbers such that $J$ is a permutation of $I$ (in other words, the underlying sets of $I$
and $J$ are identical), for every sequence $(a_i)_{i \in I}$ of elements in $A$, we have
$$\sum_{\alpha \in I} a_\alpha = \sum_{\alpha \in J} a_\alpha.$$
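Proof. We proceed by induction on the number $p$ of elements in $I$, the case $p = 1$ being trivial. For $p > 1$, to simplify notation, assume that $I = (1, \ldots, p)$ and that $J$ is a permutation $(i_1, \ldots, i_p)$ of $I$. First, assume that $2 \leq i_1 \leq p - 1$, let $J'$ and $I'$ be the sequences obtained from $J$ and $I$ by deleting $i_1$, and let $P = (1, \ldots, i_1 - 1)$ and $Q = (i_1 + 1, \ldots, p)$, so that $I'$ is the concatenation of $P$ and $Q$. By the induction hypothesis applied to $J'$ and $I'$, and then by Proposition 2.5 applied to $I'$ and its partition $(P, Q)$, we have
$$\sum_{\alpha \in J'} a_\alpha = \sum_{\alpha \in I'} a_\alpha = \Bigl(\sum_{i=1}^{i_1 - 1} a_i\Bigr) + \Bigl(\sum_{i=i_1+1}^{p} a_i\Bigr).$$
If we add the left-hand side to $a_{i_1}$, by definition we get $\sum_{\alpha \in J} a_\alpha$. If we add the right-hand side to $a_{i_1}$,
then using associativity and commutativity several times (more rigorously, using induction
on $i_1 - 1$), we get
$$a_{i_1} + \Bigl(\Bigl(\sum_{i=1}^{i_1 - 1} a_i\Bigr) + \Bigl(\sum_{i=i_1+1}^{p} a_i\Bigr)\Bigr) = \Bigl(\sum_{i=1}^{i_1 - 1} a_i\Bigr) + a_{i_1} + \Bigl(\sum_{i=i_1+1}^{p} a_i\Bigr) = \sum_{i=1}^{p} a_i,$$
as claimed.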
The cases where i1 = 1 or i1 = p are treated similarly, but in a simpler manner since
either P = () or Q = () (where () denotes the empty sequence).
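Having done all this, we can now make sense of sums of the form $\sum_{i \in I} a_i$, for any finite
indexed set $I$ and any family $a = (a_i)_{i \in I}$ of elements in $A$, where $A$ is a set equipped with a
binary operation $+$ which is associative and commutative.
Indeed, since $I$ is finite, it is in bijection with the set $\{1, \ldots, n\}$ for some $n \in \mathbb{N}$, and any
total ordering $\preceq$ on $I$ corresponds to a permutation $I_\preceq$ of $\{1, \ldots, n\}$ (where
we identify a permutation with its image). For any total ordering $\preceq$ on $I$, we define $\sum_{i \in I, \preceq} a_i$ as
$$\sum_{i \in I, \preceq} a_i = \sum_{j \in I_\preceq} a_j.$$
Then, for any other total ordering $\preceq'$ on $I$, we have
$$\sum_{i \in I, \preceq'} a_i = \sum_{j \in I_{\preceq'}} a_j,$$
and since $I_\preceq$ and $I_{\preceq'}$ are different permutations of $\{1, \ldots, n\}$, by Proposition 2.6, we have
$$\sum_{j \in I_\preceq} a_j = \sum_{j \in I_{\preceq'}} a_j.$$
Therefore, the sum $\sum_{i \in I, \preceq} a_i$ does not depend on the total ordering on $I$. We define the sum
$\sum_{i \in I} a_i$ as the common value $\sum_{j \in I_\preceq} a_j$ for all total orderings $\preceq$ of $I$.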
Vector spaces are defined as follows.
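Definition 2.9. Given a field $K$ (with addition $+$ and multiplication $*$), a vector space over
$K$ (or $K$-vector space) is a set $E$ (of vectors) together with two operations $+\colon E \times E \to E$
(called vector addition),^2 and $\cdot\colon K \times E \to E$ (called scalar multiplication) satisfying the
following conditions for all $\alpha, \beta \in K$ and all $u, v \in E$:
(V0) $E$ is an abelian group w.r.t. $+$, with identity element $0$;^3
(V1) $\alpha \cdot (u + v) = (\alpha \cdot u) + (\alpha \cdot v)$;
(V2) $(\alpha + \beta) \cdot u = (\alpha \cdot u) + (\beta \cdot u)$;
(V3) $(\alpha * \beta) \cdot u = \alpha \cdot (\beta \cdot u)$;
(V4) $1 \cdot u = u$.
^2 The symbol $+$ is overloaded, since it denotes both addition in the field $K$ and addition of vectors in $E$.
It is usually clear from the context which $+$ is intended.
^3 The symbol $0$ is also overloaded, since it represents both the zero in $K$ (a scalar) and the identity element
of $E$ (the zero vector). Confusion rarely arises, but one may prefer using $\mathbf{0}$ for the zero vector.
for all $(x_1, \ldots, x_n) \in \mathbb{R}^n$ and all $\lambda \in \mathbb{R}$. Axioms (V0)-(V3) are all satisfied, but (V4) fails.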
Less trivial examples can be given using the notion of a basis, which has not been defined
yet.
The field K itself can be viewed as a vector space over itself, addition of vectors being
addition in the field, and multiplication by a scalar being multiplication in the field.
Example 2.6.
1. The fields $\mathbb{R}$ and $\mathbb{C}$ are vector spaces over $\mathbb{R}$.
2. The groups $\mathbb{R}^n$ and $\mathbb{C}^n$ are vector spaces over $\mathbb{R}$, and $\mathbb{C}^n$ is a vector space over $\mathbb{C}$.
3. The ring $\mathbb{R}[X]$ of polynomials is a vector space over $\mathbb{R}$, and $\mathbb{C}[X]$ is a vector space over
$\mathbb{R}$ and $\mathbb{C}$. The ring of $n \times n$ matrices $\mathrm{M}_n(\mathbb{R})$ is a vector space over $\mathbb{R}$.
4. The ring $C(]a, b[)$ of continuous functions $f\colon ]a, b[ \to \mathbb{R}$ is a vector space over $\mathbb{R}$.
Let $E$ be a vector space. We would like to define the important notions of linear combination and linear independence. These notions can be defined for sets of vectors in $E$,
but it will turn out to be more convenient to define them for families $(v_i)_{i \in I}$, where $I$ is an
arbitrary index set.
2.3 Linear Independence, Subspaces
One of the most useful properties of vector spaces is that they possess bases. What this
means is that in every vector space $E$, there is some set of vectors $\{e_1, \ldots, e_n\}$ such that
every vector $v \in E$ can be written as a linear combination
$$v = \lambda_1 e_1 + \cdots + \lambda_n e_n$$
of the $e_i$, for some scalars $\lambda_1, \ldots, \lambda_n \in K$. Furthermore, the $n$-tuple $(\lambda_1, \ldots, \lambda_n)$ as above
is unique.
This description is fine when $E$ has a finite basis $\{e_1, \ldots, e_n\}$, but this is not always the
case! For example, the vector space of real polynomials, $\mathbb{R}[X]$, does not have a finite basis
but instead it has an infinite basis, namely
$$1, X, X^2, \ldots, X^n, \ldots$$
One might wonder if it is possible for a vector space to have bases of different sizes, or even
to have a finite basis as well as an infinite basis. We will see later on that this is not possible;
all bases of a vector space have the same number of elements (cardinality), which is called
the dimension of the space. However, we have the following problem: If a vector space has
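$$v = \sum_{i \in I} \lambda_i u_i.$$
When $I = \emptyset$, we stipulate that $v = 0$. (By Proposition 2.6, sums of the form $\sum_{i \in I} \lambda_i u_i$ are
well defined.) We say that a family $(u_i)_{i \in I}$ is linearly independent if for every family $(\lambda_i)_{i \in I}$
of scalars in $K$,
$$\sum_{i \in I} \lambda_i u_i = 0 \quad \text{implies that} \quad \lambda_i = 0 \text{ for all } i \in I.$$
Equivalently, a family $(u_i)_{i \in I}$ is linearly dependent if there is some family $(\lambda_i)_{i \in I}$ of scalars
in $K$ such that
$$\sum_{i \in I} \lambda_i u_i = 0 \quad \text{and} \quad \lambda_j \neq 0 \text{ for some } j \in I.$$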
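$$\sum_{i \in I} \lambda_i u_i = \lambda_1 u + \lambda_2 v + \lambda_1 u$$
makes sense. Using sets of vectors in the definition of a linear combination does not allow
such linear combinations; this is too restrictive.
Unravelling Definition 2.10, a family $(u_i)_{i \in I}$ is linearly dependent iff some $u_j$ in the family
can be expressed as a linear combination of the other vectors in the family. Indeed, there is
some family $(\lambda_i)_{i \in I}$ of scalars in $K$ such that
$$\sum_{i \in I} \lambda_i u_i = 0 \quad \text{and} \quad \lambda_j \neq 0 \text{ for some } j \in I,$$
which implies that
$$u_j = \sum_{i \in (I - \{j\})} -\lambda_j^{-1} \lambda_i u_i.$$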
Observe that one of the reasons for defining linear dependence for families of vectors
rather than for sets of vectors is that our definition allows multiple occurrences of a vector.
This is important because a matrix may contain identical columns, and we would like to say
that these columns are linearly dependent. The definition of linear dependence for sets does
not allow us to do that.
The above also shows that a family $(u_i)_{i \in I}$ is linearly independent iff either $I = \emptyset$, or $I$
consists of a single element $i$ and $u_i \neq 0$, or $|I| \geq 2$ and no vector $u_j$ in the family can be
expressed as a linear combination of the other vectors in the family.
When $I$ is nonempty, if the family $(u_i)_{i \in I}$ is linearly independent, note that $u_i \neq 0$ for
all $i \in I$. Otherwise, if $u_i = 0$ for some $i \in I$, then we get a nontrivial linear dependence
$\sum_{i \in I} \lambda_i u_i = 0$ by picking any nonzero $\lambda_i$ and letting $\lambda_k = 0$ for all $k \in I$ with $k \neq i$, since
$\lambda_i 0 = 0$. If $|I| \geq 2$, we must also have $u_i \neq u_j$ for all $i, j \in I$ with $i \neq j$, since otherwise we
get a nontrivial linear dependence by picking $\lambda_i = \lambda$ and $\lambda_j = -\lambda$ for any nonzero $\lambda$, and
letting $\lambda_k = 0$ for all $k \in I$ with $k \neq i, j$.
Thus, the definition of linear independence implies that a nontrivial linearly independent
family is actually a set. This explains why certain authors choose to define linear independence for sets of vectors. The problem with this approach is that linear dependence, which
is the logical negation of linear independence, is then only defined for sets of vectors. However, as we pointed out earlier, it is really desirable to define linear dependence for families
allowing multiple occurrences of the same vector.
Example 2.7.
1. Any two distinct scalars $\lambda, \mu \neq 0$ in $K$ are linearly dependent.
2. In $\mathbb{R}^3$, the vectors $(1, 0, 0)$, $(0, 1, 0)$, and $(0, 0, 1)$ are linearly independent.
3. In $\mathbb{R}^4$, the vectors $(1, 1, 1, 1)$, $(0, 1, 1, 1)$, $(0, 0, 1, 1)$, and $(0, 0, 0, 1)$ are linearly independent.
4. In $\mathbb{R}^2$, the vectors $u = (1, 1)$, $v = (0, 1)$ and $w = (2, 3)$ are linearly dependent, since
$w = 2u + v$.
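The claims of Example 2.7 can be checked mechanically. The following is a minimal sketch, assuming vectors with rational entries given as tuples; the helper names `rank` and `is_linearly_independent` are illustrative and not part of the text. It uses the fact that a finite family is linearly independent iff Gaussian elimination finds as many pivots as there are vectors.

```python
from fractions import Fraction


def rank(vectors):
    """Rank of a list of equal-length vectors, by Gaussian elimination over the rationals."""
    rows = [[Fraction(x) for x in v] for v in vectors]
    n_cols = len(rows[0]) if rows else 0
    pivots = 0
    for col in range(n_cols):
        # Look for a row, not yet used as a pivot row, with a nonzero entry in this column.
        pivot = next((r for r in range(pivots, len(rows)) if rows[r][col] != 0), None)
        if pivot is None:
            continue
        rows[pivots], rows[pivot] = rows[pivot], rows[pivots]
        for r in range(pivots + 1, len(rows)):
            factor = rows[r][col] / rows[pivots][col]
            rows[r] = [a - factor * b for a, b in zip(rows[r], rows[pivots])]
        pivots += 1
    return pivots


def is_linearly_independent(vectors):
    """A finite family is independent iff its rank equals the number of its members."""
    return rank(vectors) == len(vectors)


family_2 = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]
family_3 = [(1, 1, 1, 1), (0, 1, 1, 1), (0, 0, 1, 1), (0, 0, 0, 1)]
family_4 = [(1, 1), (0, 1), (2, 3)]          # w = 2u + v
repeated = [(1, 0), (1, 0)]                  # a family may repeat a vector

print(is_linearly_independent(family_2))     # True
print(is_linearly_independent(family_3))     # True
print(is_linearly_independent(family_4))     # False
print(is_linearly_independent(repeated))     # False
```

Because the input is a list rather than a set, a repeated vector is detected as a dependence, which is exactly the reason given above for working with families.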
Note that a family $(u_i)_{i \in I}$ is linearly independent iff $(u_j)_{j \in J}$ is linearly independent for
every finite subset $J$ of $I$ (even when $I = \emptyset$). Indeed, when $\sum_{i \in I} \lambda_i u_i = 0$, the family $(\lambda_i)_{i \in I}$
of scalars in $K$ has finite support, and thus $\sum_{i \in I} \lambda_i u_i = 0$ really means that $\sum_{j \in J} \lambda_j u_j = 0$
for a finite subset $J$ of $I$. When $I$ is finite, we often assume that it is the set $I = \{1, 2, \ldots, n\}$.
In this case, we denote the family $(u_i)_{i \in I}$ as $(u_1, \ldots, u_n)$.
The notion of a subspace of a vector space is defined as follows.
Definition 2.11. Given a vector space $E$, a subset $F$ of $E$ is a linear subspace (or subspace)
of $E$ if $F$ is nonempty and $\lambda u + \mu v \in F$ for all $u, v \in F$, and all $\lambda, \mu \in K$.
It is easy to see that a subspace $F$ of $E$ is indeed a vector space, since the restriction
of $+ \colon E \times E \to E$ to $F \times F$ is indeed a function $+ \colon F \times F \to F$, and the restriction of
$\cdot \colon K \times E \to E$ to $K \times F$ is indeed a function $\cdot \colon K \times F \to F$.
It is also easy to see that any intersection of subspaces is a subspace. Since $F$ is nonempty,
if we pick any vector $u \in F$ and if we let $\lambda = \mu = 0$, then $\lambda u + \mu u = 0u + 0u = 0$, so every
subspace contains the vector $0$. For any nonempty finite index set $I$, one can show by
induction on the cardinality of $I$ that if $(u_i)_{i \in I}$ is any family of vectors $u_i \in F$ and $(\lambda_i)_{i \in I}$ is
any family of scalars, then $\sum_{i \in I} \lambda_i u_i \in F$.
The subspace {0} will be denoted by (0), or even 0 (with a mild abuse of notation).
Example 2.8.
1. In $\mathbb{R}^2$, the set of vectors $u = (x, y)$ such that
$$x + y = 0$$
is a subspace.
2. In $\mathbb{R}^3$, the set of vectors $u = (x, y, z)$ such that
$$x + y + z = 0$$
is a subspace.
3. For any $n \geq 0$, the set of polynomials $f(X) \in \mathbb{R}[X]$ of degree at most $n$ is a subspace
of $\mathbb{R}[X]$.
4. The set of upper triangular $n \times n$ matrices is a subspace of the space of $n \times n$ matrices.
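Proposition 2.7. Given any vector space $E$, if $S$ is any nonempty subset of $E$, then the
smallest subspace $\langle S \rangle$ (or $\mathrm{Span}(S)$) of $E$ containing $S$ is the set of all (finite) linear combinations of elements from $S$.
Proof. We prove that the set $\mathrm{Span}(S)$ of all linear combinations of elements of $S$ is a subspace
of $E$, leaving as an exercise the verification that every subspace containing $S$ also contains
$\mathrm{Span}(S)$.
First, $\mathrm{Span}(S)$ is nonempty since it contains $S$ (which is nonempty). If $u = \sum_{i \in I} \lambda_i u_i$
and $v = \sum_{j \in J} \mu_j v_j$ are any two linear combinations in $\mathrm{Span}(S)$, for any two scalars $\lambda, \mu \in K$,
$$
\begin{aligned}
\lambda u + \mu v &= \lambda \sum_{i \in I} \lambda_i u_i + \mu \sum_{j \in J} \mu_j v_j \\
&= \sum_{i \in I} \lambda \lambda_i u_i + \sum_{j \in J} \mu \mu_j v_j \\
&= \sum_{i \in I - J} \lambda \lambda_i u_i + \sum_{i \in I \cap J} (\lambda \lambda_i + \mu \mu_i) u_i + \sum_{j \in J - I} \mu \mu_j v_j,
\end{aligned}
$$
which is a linear combination with index set $I \cup J$, and thus $\lambda u + \mu v \in \mathrm{Span}(S)$, which
proves that $\mathrm{Span}(S)$ is a subspace.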
One might wonder what happens if we add extra conditions to the coefficients involved
in forming linear combinations. Here are three natural restrictions which turn out to be
important (as usual, we assume that our index sets are finite):
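(1) Consider combinations $\sum_{i \in I} \lambda_i u_i$ for which
$$\sum_{i \in I} \lambda_i = 1.$$
These are called affine combinations. One should realize that every linear combination
$\sum_{i \in I} \lambda_i u_i$ can be viewed as an affine combination. For example, if $k$ is an index not in $I$,
if we let $J = I \cup \{k\}$, $u_k = 0$, and $\lambda_k = 1 - \sum_{i \in I} \lambda_i$, then $\sum_{j \in J} \lambda_j u_j$ is an affine
combination and
$$\sum_{i \in I} \lambda_i u_i = \sum_{j \in J} \lambda_j u_j.$$
However, we get new spaces. For example, in $\mathbb{R}^3$, the set of all affine combinations of
the three vectors $e_1 = (1, 0, 0)$, $e_2 = (0, 1, 0)$, and $e_3 = (0, 0, 1)$, is the plane passing
through these three points. Since it does not contain $0 = (0, 0, 0)$, it is not a linear
subspace.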
(2) Consider combinations $\sum_{i \in I} \lambda_i u_i$ for which
$$\lambda_i \geq 0, \quad \text{for all } i \in I.$$
These are called positive (or conic) combinations. It turns out that positive combinations of families of vectors are cones. They show up naturally in convex optimization.
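(3) Consider combinations $\sum_{i \in I} \lambda_i u_i$ for which we require (1) and (2), that is
$$\sum_{i \in I} \lambda_i = 1, \quad \text{and} \quad \lambda_i \geq 0 \quad \text{for all } i \in I.$$
These are called convex combinations. Given any finite family of vectors, the set of all
convex combinations of these vectors is a convex polyhedron. Convex polyhedra play a
very important role in convex optimization.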
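The three restrictions only concern the coefficients, so they are easy to test. Here is a small sketch, assuming rational coefficients; the function names `classify_coefficients` and `combine` are made up for this illustration.

```python
from fractions import Fraction


def classify_coefficients(coefficients):
    """Report which of the three restrictions a finite family of coefficients satisfies."""
    coeffs = [Fraction(c) for c in coefficients]
    sums_to_one = sum(coeffs) == 1                 # restriction (1)
    nonnegative = all(c >= 0 for c in coeffs)      # restriction (2)
    return {
        "affine": sums_to_one,
        "positive": nonnegative,
        "convex": sums_to_one and nonnegative,     # restriction (3) = (1) and (2)
    }


def combine(coefficients, vectors):
    """Return the linear combination sum_i lambda_i u_i of equal-length tuples."""
    dim = len(vectors[0])
    return tuple(
        sum(Fraction(c) * v[k] for c, v in zip(coefficients, vectors))
        for k in range(dim)
    )


e1, e2, e3 = (1, 0, 0), (0, 1, 0), (0, 0, 1)
third = Fraction(1, 3)

print(classify_coefficients([third, third, third]))  # affine, positive and convex
print(classify_coefficients([2, -1, 0]))             # affine only
print(classify_coefficients([1, 1, 0]))              # positive only
print(combine([2, -1, 0], [e1, e2, e3]))             # a point of the plane x + y + z = 1
```

For the vectors $e_1, e_2, e_3$, every affine combination has coordinates summing to 1, which is one way to see that the plane through these three points misses the origin.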
2.4 Bases of a Vector Space
Given a vector space $E$, given a family $(v_i)_{i \in I}$, the subset $V$ of $E$ consisting of the null vector $0$
and of all linear combinations of $(v_i)_{i \in I}$ is easily seen to be a subspace of $E$. Subspaces having
such a generating family play an important role, and motivate the following definition.
Definition 2.12. Given a vector space $E$ and a subspace $V$ of $E$, a family $(v_i)_{i \in I}$ of vectors
$v_i \in V$ spans $V$ or generates $V$ if for every $v \in V$, there is some family $(\lambda_i)_{i \in I}$ of scalars in
$K$ such that
$$v = \sum_{i \in I} \lambda_i v_i.$$
We also say that the elements of $(v_i)_{i \in I}$ are generators of $V$ and that $V$ is spanned by $(v_i)_{i \in I}$,
or generated by $(v_i)_{i \in I}$. If a subspace $V$ of $E$ is generated by a finite family $(v_i)_{i \in I}$, we say
that $V$ is finitely generated. A family $(u_i)_{i \in I}$ that spans $V$ and is linearly independent is
called a basis of $V$.
Example 2.9.
1. In $\mathbb{R}^3$, the vectors $(1, 0, 0)$, $(0, 1, 0)$, and $(0, 0, 1)$ form a basis.
2. The vectors $(1, 1, 1, 1)$, $(1, 1, -1, -1)$, $(1, -1, 0, 0)$, $(0, 0, 1, -1)$ form a basis of $\mathbb{R}^4$ known
as the Haar basis. This basis and its generalization to dimension $2^n$ are crucial in
wavelet theory.
3. In the subspace of polynomials in $\mathbb{R}[X]$ of degree at most $n$, the polynomials $1, X, X^2,
\ldots, X^n$ form a basis.
4. The Bernstein polynomials $\binom{n}{k} (1 - X)^{n-k} X^k$ for $k = 0, \ldots, n$, also form a basis of
that space. These polynomials play a major role in the theory of spline curves.
It is a standard result of linear algebra that every vector space $E$ has a basis, and that
for any two bases $(u_i)_{i \in I}$ and $(v_j)_{j \in J}$, $I$ and $J$ have the same cardinality. In particular, if $E$
has a finite basis of $n$ elements, every basis of $E$ has $n$ elements, and the integer $n$ is called
the dimension of the vector space $E$. We begin with a crucial lemma.
Lemma 2.8. Given a linearly independent family $(u_i)_{i \in I}$ of elements of a vector space $E$, if
$v \in E$ is not a linear combination of $(u_i)_{i \in I}$, then the family $(u_i)_{i \in I} \cup_k (v)$ obtained by adding
$v$ to the family $(u_i)_{i \in I}$ is linearly independent (where $k \notin I$).
Proof. Assume that $\mu v + \sum_{i \in I} \lambda_i u_i = 0$, for any family $(\lambda_i)_{i \in I}$ of scalars in $K$. If $\mu \neq 0$, then
$\mu$ has an inverse (because $K$ is a field), and thus we have $v = -\sum_{i \in I} (\mu^{-1} \lambda_i) u_i$, showing
that $v$ is a linear combination of $(u_i)_{i \in I}$ and contradicting the hypothesis. Thus, $\mu = 0$. But
then, we have $\sum_{i \in I} \lambda_i u_i = 0$, and since the family $(u_i)_{i \in I}$ is linearly independent, we have
$\lambda_i = 0$ for all $i \in I$.
The next theorem holds in general, but the proof is more sophisticated for vector spaces
that do not have a finite set of generators. Thus, in this chapter, we only prove the theorem
for finitely generated vector spaces.
Theorem 2.9. Given any finite family $S = (u_i)_{i \in I}$ generating a vector space $E$ and any
linearly independent subfamily $L = (u_j)_{j \in J}$ of $S$ (where $J \subseteq I$), there is a basis $B$ of $E$ such
that $L \subseteq B \subseteq S$.
Proof. Consider the set of linearly independent families $B$ such that $L \subseteq B \subseteq S$. Since this
set is nonempty and finite, it has some maximal element, say $B = (u_h)_{h \in H}$. We claim that
$B$ generates $E$. Indeed, if $B$ does not generate $E$, then there is some $u_p \in S$ that is not a
linear combination of vectors in $B$ (since $S$ generates $E$), with $p \notin H$. Then, by Lemma
2.8, the family $B' = (u_h)_{h \in H \cup \{p\}}$ is linearly independent, and since $L \subseteq B \subset B' \subseteq S$, this
contradicts the maximality of $B$. Thus, $B$ is a basis of $E$ such that $L \subseteq B \subseteq S$.
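For a finite generating family, a maximal independent family can be built greedily: start from $L$ and add a generator whenever it is not a linear combination of the vectors already kept, which by Lemma 2.8 preserves independence. The sketch below is one way to do this over the rationals, not the argument of the proof itself; `reduce_against` and `extend_to_basis` are hypothetical helper names.

```python
from fractions import Fraction


def reduce_against(vector, echelon):
    """Subtract multiples of the stored rows; the result is zero iff vector lies in their span."""
    v = [Fraction(x) for x in vector]
    for pivot_col, row in echelon:
        if v[pivot_col] != 0:
            factor = v[pivot_col] / row[pivot_col]
            v = [a - factor * b for a, b in zip(v, row)]
    return v


def extend_to_basis(independent, generators):
    """Return B with L <= B <= S, where L = independent is assumed to be a subfamily of S = generators."""
    basis, echelon = [], []
    for vector in list(independent) + list(generators):
        residue = reduce_against(vector, echelon)
        pivot_col = next((k for k, x in enumerate(residue) if x != 0), None)
        if pivot_col is not None:
            # vector is not a combination of the vectors kept so far: keep it (Lemma 2.8).
            basis.append(tuple(vector))
            echelon.append((pivot_col, residue))
    return basis


S = [(1, 1, 0), (2, 2, 0), (0, 1, 0), (1, 2, 0), (0, 0, 1)]
L = [(1, 1, 0)]
print(extend_to_basis(L, S))  # [(1, 1, 0), (0, 1, 0), (0, 0, 1)]
```

The result depends on the order in which the generators are visited; the theorem only asserts that some basis $B$ with $L \subseteq B \subseteq S$ exists.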
Remark: Theorem 2.9 also holds for vector spaces that are not finitely generated. In this
case, the problem is to guarantee the existence of a maximal linearly independent family $B$
such that $L \subseteq B \subseteq S$. The existence of such a maximal family can be shown using Zorn's
lemma, see Appendix 31 and the references given there.
A situation where the full generality of Theorem 2.9 is needed is the case of the vector
space $\mathbb{R}$ over the field of coefficients $\mathbb{Q}$. The numbers $1$ and $\sqrt{2}$ are linearly independent
over $\mathbb{Q}$, so according to Theorem 2.9, the linearly independent family $L = (1, \sqrt{2})$ can be
extended to a basis $B$ of $\mathbb{R}$. Since $\mathbb{R}$ is uncountable and $\mathbb{Q}$ is countable, such a basis must
be uncountable!
Let $(v_i)_{i \in I}$ be a family of vectors in $E$. We say that $(v_i)_{i \in I}$ is a maximal linearly independent
family of $E$ if it is linearly independent, and if for any vector $w \in E$, the family $(v_i)_{i \in I} \cup_k \{w\}$
obtained by adding $w$ to the family $(v_i)_{i \in I}$ is linearly dependent. We say that $(v_i)_{i \in I}$ is a
minimal generating family of $E$ if it spans $E$, and if for any index $p \in I$, the family $(v_i)_{i \in I - \{p\}}$
obtained by removing $v_p$ from the family $(v_i)_{i \in I}$ does not span $E$.
The following proposition giving useful properties characterizing a basis is an immediate
consequence of Theorem 2.9.
Proposition 2.10. Given a vector space $E$, for any family $B = (v_i)_{i \in I}$ of vectors of $E$, the
following properties are equivalent:
(1) $B$ is a basis of $E$.
(2) $B$ is a maximal linearly independent family of $E$.
(3) $B$ is a minimal generating family of $E$.
Proof. We prove the equivalence of (1) and (2), leaving the equivalence of (1) and (3) as an
exercise.
Assume (1). We claim that $B$ is a maximal linearly independent family. If $B$ is not a
maximal linearly independent family, then there is some vector $w \in E$ such that the family
$B'$ obtained by adding $w$ to $B$ is linearly independent. However, since $B$ is a basis of $E$, the
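$$u_{m+1} = \sum_{i=1}^{m} \lambda_i u_i + \sum_{j=m+1}^{n} \lambda_j v_j.$$
If $\lambda_j = 0$ for all $j$ with $m + 1 \leq j \leq n$, we get
$$u_{m+1} = \sum_{i=1}^{m} \lambda_i u_i,$$
a nontrivial linear dependence of the $u_i$, which is impossible since $(u_1, \ldots, u_{m+1})$ are linearly
independent.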
$$v_{m+1} = -\sum_{i=1}^{m} (\lambda_{m+1}^{-1} \lambda_i) u_i + \lambda_{m+1}^{-1} u_{m+1} - \sum_{j=m+2}^{n} (\lambda_{m+1}^{-1} \lambda_j) v_j.$$
Observe that the families $(u_1, \ldots, u_m, v_{m+1}, \ldots, v_n)$ and $(u_1, \ldots, u_{m+1}, v_{m+2}, \ldots, v_n)$ generate
the same subspace, since $u_{m+1}$ is a linear combination of $(u_1, \ldots, u_m, v_{m+1}, \ldots, v_n)$ and $v_{m+1}$
is a linear combination of $(u_1, \ldots, u_{m+1}, v_{m+2}, \ldots, v_n)$. Since $(u_1, \ldots, u_m, v_{m+1}, \ldots, v_n)$ and
$(v_1, \ldots, v_n)$ generate the same subspace, we conclude that $(u_1, \ldots, u_{m+1}, v_{m+2}, \ldots, v_n)$
and $(v_1, \ldots, v_n)$ generate the same subspace, which concludes the induction step.
For the sake of completeness, here is a more formal statement of the replacement lemma
(and its proof).
Proposition 2.12. (Replacement lemma, version 2) Given a vector space $E$, let $(u_i)_{i \in I}$ be
any finite linearly independent family in $E$, where $|I| = m$, and let $(v_j)_{j \in J}$ be any finite family
such that every $u_i$ is a linear combination of $(v_j)_{j \in J}$, where $|J| = n$. Then, there exists a set
$L$ and an injection $\rho \colon L \to J$ (a relabeling function) such that $L \cap I = \emptyset$, $|L| = n - m$, and
the families $(u_i)_{i \in I} \cup (v_{\rho(l)})_{l \in L}$ and $(v_j)_{j \in J}$ generate the same subspace of $E$. In particular,
$m \leq n$.
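Proof. We proceed by induction on $|I| = m$. When $m = 0$, the family $(u_i)_{i \in I}$ is empty, and
the proposition holds trivially with $L = J$ ($\rho$ is the identity). Assume $|I| = m + 1$. Consider
the linearly independent family $(u_i)_{i \in (I - \{p\})}$, where $p$ is any member of $I$. By the induction
hypothesis, there exists a set $L$ and an injection $\rho \colon L \to J$ such that $L \cap (I - \{p\}) = \emptyset$,
$|L| = n - m$, and the families $(u_i)_{i \in (I - \{p\})} \cup (v_{\rho(l)})_{l \in L}$ and $(v_j)_{j \in J}$ generate the same subspace
of $E$. If $p \in L$, we can replace $L$ by $(L - \{p\}) \cup \{p'\}$ where $p'$ does not belong to $I \cup L$, and
replace $\rho$ by the injection $\rho'$ which agrees with $\rho$ on $L - \{p\}$ and such that $\rho'(p') = \rho(p)$.
Thus, we can always assume that $L \cap I = \emptyset$. Since $u_p$ is a linear combination of $(v_j)_{j \in J}$
and the families $(u_i)_{i \in (I - \{p\})} \cup (v_{\rho(l)})_{l \in L}$ and $(v_j)_{j \in J}$ generate the same subspace of $E$, $u_p$ is
a linear combination of $(u_i)_{i \in (I - \{p\})} \cup (v_{\rho(l)})_{l \in L}$. Let
$$u_p = \sum_{i \in (I - \{p\})} \lambda_i u_i + \sum_{l \in L} \lambda_l v_{\rho(l)}. \tag{1}$$
If $\lambda_l = 0$ for all $l \in L$, we have
$$\sum_{i \in (I - \{p\})} \lambda_i u_i - u_p = 0,$$
contradicting the fact that $(u_i)_{i \in I}$ is linearly independent. Thus, $\lambda_l \neq 0$ for some $l \in L$, say
$l = q$. Since $\lambda_q \neq 0$, we have
$$v_{\rho(q)} = \sum_{i \in (I - \{p\})} (-\lambda_q^{-1} \lambda_i) u_i + \lambda_q^{-1} u_p + \sum_{l \in (L - \{q\})} (-\lambda_q^{-1} \lambda_l) v_{\rho(l)}. \tag{2}$$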
We claim that the families $(u_i)_{i \in (I - \{p\})} \cup (v_{\rho(l)})_{l \in L}$ and $(u_i)_{i \in I} \cup (v_{\rho(l)})_{l \in (L - \{q\})}$ generate the
same subspace of $E$. Indeed, the second family is obtained from the first by replacing $v_{\rho(q)}$ by $u_p$,
and vice-versa, and $u_p$ is a linear combination of $(u_i)_{i \in (I - \{p\})} \cup (v_{\rho(l)})_{l \in L}$, by (1), and $v_{\rho(q)}$ is a
linear combination of $(u_i)_{i \in I} \cup (v_{\rho(l)})_{l \in (L - \{q\})}$, by (2). Thus, the families $(u_i)_{i \in I} \cup (v_{\rho(l)})_{l \in (L - \{q\})}$
and $(v_j)_{j \in J}$ generate the same subspace of $E$, and the proposition holds for $L - \{q\}$ and the
restriction of the injection $\rho \colon L \to J$ to $L - \{q\}$, since $L \cap I = \emptyset$ and $|L| = n - m$ imply that
$(L - \{q\}) \cap I = \emptyset$ and $|L - \{q\}| = n - (m + 1)$.
The idea is that $m$ of the vectors $v_j$ can be replaced by the linearly independent $u_i$'s in
such a way that the same subspace is still generated. The purpose of the function $\rho \colon L \to J$
is to pick $n - m$ elements $j_1, \ldots, j_{n-m}$ of $J$ and to relabel them $l_1, \ldots, l_{n-m}$ in such a way
that these new indices do not clash with the indices in $I$; this way, the vectors $v_{j_1}, \ldots, v_{j_{n-m}}$
that survive (i.e. are not replaced) are relabeled $v_{l_1}, \ldots, v_{l_{n-m}}$, and the other $m$ vectors $v_j$
with $j \in J - \{j_1, \ldots, j_{n-m}\}$ are replaced by the $u_i$. The index set of this new family is $I \cup L$.
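As a small worked illustration of the replacement step (the specific vectors are chosen here for the example and do not come from the text), take $E = \mathbb{R}^3$, $m = 1$, $n = 3$:

```latex
% One independent vector u_1 and the generating family (v_1, v_2, v_3).
u_1 = (1, 1, 0), \qquad v_1 = (1, 0, 0), \quad v_2 = (0, 1, 0), \quad v_3 = (0, 0, 1).

% Express u_1 over the v_j:
u_1 = 1 \cdot v_1 + 1 \cdot v_2 + 0 \cdot v_3 .

% The coefficient of v_1 is nonzero, so v_1 can be solved for and replaced by u_1:
v_1 = u_1 - v_2 ,
\qquad\text{hence}\qquad
\mathrm{Span}(u_1, v_2, v_3) = \mathrm{Span}(v_1, v_2, v_3) = \mathbb{R}^3 .

% The coefficient of v_3 is zero, so v_3 must not be the vector that is replaced:
\mathrm{Span}(u_1, v_1, v_2) = \{(x, y, 0) \mid x, y \in \mathbb{R}\} \neq \mathbb{R}^3 .
```

Here $n - m = 2$ of the $v_j$ survive, and the choice of which vector to replace is dictated by a nonzero coefficient, exactly as in the proof.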
Actually, one can prove that Proposition 2.12 implies Theorem 2.9 when the vector space
is finitely generated. Putting Theorem 2.9 and Proposition 2.12 together, we obtain the
following fundamental theorem.
Theorem 2.13. Let $E$ be a finitely generated vector space. Any family $(u_i)_{i \in I}$ generating $E$
contains a subfamily $(u_j)_{j \in J}$ which is a basis of $E$. Any linearly independent family $(u_i)_{i \in I}$
can be extended to a family $(u_j)_{j \in J}$ which is a basis of $E$ (with $I \subseteq J$). Furthermore, for
every two bases $(u_i)_{i \in I}$ and $(v_j)_{j \in J}$ of $E$, we have $|I| = |J| = n$ for some fixed integer $n \geq 0$.
Proof. The first part follows immediately by applying Theorem 2.9 with $L = \emptyset$ and $S =
(u_i)_{i \in I}$. For the second part, consider the family $S' = (u_i)_{i \in I} \cup (v_h)_{h \in H}$, where $(v_h)_{h \in H}$ is any
finite family generating $E$, and with $I \cap H = \emptyset$. Then, apply Theorem 2.9 to
$L = (u_i)_{i \in I}$ and to $S'$. For the last statement, assume that $(u_i)_{i \in I}$ and $(v_j)_{j \in J}$ are bases of
$E$. Since $(u_i)_{i \in I}$ is linearly independent and $(v_j)_{j \in J}$ spans $E$, Proposition 2.12 implies that
$|I| \leq |J|$. A symmetric argument yields $|J| \leq |I|$.
Remark: Theorem 2.13 also holds for vector spaces that are not finitely generated. This
can be shown as follows. Let $(u_i)_{i \in I}$ be a basis of $E$, let $(v_j)_{j \in J}$ be a generating family of $E$,
and assume that $I$ is infinite. For every $j \in J$, let $L_j \subseteq I$ be the finite set
$$L_j = \{i \in I \mid v_j = \sum_{i \in I} \lambda_i u_i, \ \lambda_i \neq 0\}.$$
When $E$ is not finitely generated, we say that $E$ is of infinite dimension. The dimension
of a vector space $E$ is the common cardinality of all of its bases and is denoted by $\dim(E)$.
Clearly, if the field $K$ itself is viewed as a vector space, then every family $(a)$ where $a \in K$
and $a \neq 0$ is a basis. Thus $\dim(K) = 1$. Note that $\dim(\{0\}) = 0$.
If $E$ is a vector space, for any subspace $U$ of $E$, if $\dim(U) = 1$, then $U$ is called a line; if
$\dim(U) = 2$, then $U$ is called a plane. If $\dim(U) = k$, then $U$ is sometimes called a $k$-plane.
Let $(u_i)_{i \in I}$ be a basis of a vector space $E$. For any vector $v \in E$, since the family $(u_i)_{i \in I}$
generates $E$, there is a family $(\lambda_i)_{i \in I}$ of scalars in $K$, such that
$$v = \sum_{i \in I} \lambda_i u_i.$$
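and since $(u_i)_{i \in I}$ is linearly independent, we must have $\lambda_i - \mu_i = 0$ for all $i \in I$, that is, $\lambda_i = \mu_i$
for all $i \in I$. The converse is shown by contradiction. If $(u_i)_{i \in I}$ were linearly dependent, there
would be a family $(\mu_i)_{i \in I}$ of scalars not all null such that
$$\sum_{i \in I} \mu_i u_i = 0$$
and $\mu_j \neq 0$ for some $j \in I$. But then,
$$v = \sum_{i \in I} \lambda_i u_i + 0 = \sum_{i \in I} \lambda_i u_i + \sum_{i \in I} \mu_i u_i = \sum_{i \in I} (\lambda_i + \mu_i) u_i,$$
with $\lambda_j \neq \lambda_j + \mu_j$ since $\mu_j \neq 0$, contradicting the assumption that $(\lambda_i)_{i \in I}$ is the unique family
such that $v = \sum_{i \in I} \lambda_i u_i$.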
If $(u_i)_{i \in I}$ is a basis of a vector space $E$, for any vector $v \in E$, if $(x_i)_{i \in I}$ is the unique
family of scalars in $K$ such that
$$v = \sum_{i \in I} x_i u_i,$$
each $x_i$ is called the component (or coordinate) of index $i$ of $v$ with respect to the basis $(u_i)_{i \in I}$.
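Computing coordinates amounts to solving a linear system whose columns are the basis vectors. The following sketch assumes a basis of $\mathbb{Q}^n$ given as a list of tuples and uses the Haar basis of $\mathbb{R}^4$ from Example 2.9, written with its signs as $(1,1,1,1)$, $(1,1,-1,-1)$, $(1,-1,0,0)$, $(0,0,1,-1)$; the function name `coordinates` is illustrative.

```python
from fractions import Fraction


def coordinates(basis, v):
    """Solve sum_i x_i * basis[i] = v by Gauss-Jordan elimination over the rationals."""
    n = len(basis)
    # Augmented matrix: column j is the j-th basis vector, the last column is v.
    m = [[Fraction(basis[j][i]) for j in range(n)] + [Fraction(v[i])] for i in range(n)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if m[r][col] != 0), None)
        if pivot is None:
            raise ValueError("the given family is not a basis")
        m[col], m[pivot] = m[pivot], m[col]
        lead = m[col][col]
        m[col] = [entry / lead for entry in m[col]]
        for r in range(n):
            if r != col and m[r][col] != 0:
                factor = m[r][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[col])]
    return [m[i][n] for i in range(n)]


haar = [(1, 1, 1, 1), (1, 1, -1, -1), (1, -1, 0, 0), (0, 0, 1, -1)]
x = coordinates(haar, (6, 4, 5, 1))
print(x)  # the four coordinates 4, 1, 1, 2 (as Fractions)
```

Indeed $4(1,1,1,1) + (1,1,-1,-1) + (1,-1,0,0) + 2(0,0,1,-1) = (6,4,5,1)$, and by uniqueness of coordinates no other family of scalars works.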
Given a field $K$ and any (nonempty) set $I$, we can form a vector space $K^{(I)}$ which, in
some sense, is the standard vector space of dimension $|I|$.
Definition 2.13. Given a field $K$ and any (nonempty) set $I$, let $K^{(I)}$ be the subset of the
cartesian product $K^I$ consisting of all families $(\lambda_i)_{i \in I}$ with finite support of scalars in $K$.
We define addition and multiplication by a scalar as follows:
$$(\lambda_i)_{i \in I} + (\mu_i)_{i \in I} = (\lambda_i + \mu_i)_{i \in I},$$
and
$$\lambda \cdot (\mu_i)_{i \in I} = (\lambda \mu_i)_{i \in I}.$$
It is immediately verified that addition and multiplication by a scalar are well defined.
Thus, $K^{(I)}$ is a vector space. Furthermore, because families with finite support are considered,
the family $(e_i)_{i \in I}$ of vectors $e_i$, defined such that $(e_i)_j = 0$ if $j \neq i$ and $(e_i)_i = 1$, is
clearly a basis of the vector space $K^{(I)}$. When $I = \{1, \ldots, n\}$, we denote $K^{(I)}$ by $K^n$. The
function $\iota \colon I \to K^{(I)}$, such that $\iota(i) = e_i$ for every $i \in I$, is clearly an injection.
When $I$ is a finite set, $K^{(I)} = K^I$, but this is false when $I$ is infinite. In fact, $\dim(K^{(I)}) =
|I|$, but $\dim(K^I)$ is strictly greater when $I$ is infinite.
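A family with finite support can be stored by recording only its nonzero entries, which works even when the index set is infinite. The sketch below models elements of $K^{(I)}$ (with $K = \mathbb{Q}$ and $I$ any set of hashable Python values) as dictionaries; `normalize`, `add`, `scale` and `e` are names chosen for this illustration.

```python
from fractions import Fraction


def normalize(family):
    """Keep only the support: indices whose coefficient is nonzero."""
    return {i: Fraction(c) for i, c in family.items() if c != 0}


def add(family_a, family_b):
    """Componentwise sum (lambda_i + mu_i); the support stays finite."""
    total = dict(family_a)
    for i, c in family_b.items():
        total[i] = total.get(i, 0) + c
    return normalize(total)


def scale(scalar, family):
    """Scalar multiple (lambda * mu_i)."""
    return normalize({i: scalar * c for i, c in family.items()})


def e(i):
    """Basis vector e_i: coefficient 1 at index i and 0 elsewhere."""
    return {i: Fraction(1)}


# Here the index set I is the (infinite) set of all strings.
v = add(scale(2, e("a")), scale(-3, e("b")))
print(v)                              # coefficient 2 at "a", coefficient -3 at "b"
print(add(v, scale(3, e("b"))))       # only the coefficient 2 at "a" remains
```

Every element built this way is a finite linear combination of the $e_i$, which reflects why $(e_i)_{i \in I}$ generates $K^{(I)}$ but not $K^I$ when $I$ is infinite.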
Many interesting mathematical structures are vector spaces. A very important example
is the set of linear maps between two vector spaces to be defined in the next section. Here
is an example that will prepare us for the vector space of linear maps.
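Example 2.10. Let $X$ be any nonempty set and let $E$ be a vector space. The set of all
functions $f \colon X \to E$ can be made into a vector space as follows: Given any two functions
$f \colon X \to E$ and $g \colon X \to E$, let $(f + g) \colon X \to E$ be defined such that
$$(f + g)(x) = f(x) + g(x)$$
for all $x \in X$, and for every $\lambda \in K$, let $\lambda f \colon X \to E$ be defined such that
$$(\lambda f)(x) = \lambda f(x)$$
for all $x \in X$. The axioms of a vector space are easily verified. Now, let $E = K$, and let $I$
be the set of all one-element subsets of $X$. For every $S \in I$, let $f_S \colon X \to E$ be the function
such that $f_S(x) = 1$ iff $x \in S$, and $f_S(x) = 0$ iff $x \notin S$. We leave as an exercise to show that
$(f_S)_{S \in I}$ is linearly independent. (If $I$ were the set of all nonempty subsets of $X$, the family
would be linearly dependent as soon as $X$ has two distinct elements $a, b$, since
$f_{\{a, b\}} = f_{\{a\}} + f_{\{b\}}$.)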
2.5 Linear Maps
A function between two vector spaces that preserves the vector space structure is called
a homomorphism of vector spaces, or linear map. Linear maps formalize the concept of
linearity of a function. In the rest of this section, we assume that all vector spaces are over
a given field K (say R).
Definition 2.14. Given two vector spaces $E$ and $F$, a linear map between $E$ and $F$ is a
function $f \colon E \to F$ satisfying the following two conditions:
$$f(x + y) = f(x) + f(y) \quad \text{for all } x, y \in E;$$
$$f(\lambda x) = \lambda f(x) \quad \text{for all } \lambda \in K, \ x \in E.$$
Setting $x = y = 0$ in the first identity, we get $f(0) = 0$. The basic property of linear
maps is that they transform linear combinations into linear combinations. Given a family
$(u_i)_{i \in I}$ of vectors in $E$, given any family $(\lambda_i)_{i \in I}$ of scalars in $K$, we have
$$f\Big(\sum_{i \in I} \lambda_i u_i\Big) = \sum_{i \in I} \lambda_i f(u_i).$$
The above identity is shown by induction on the size of the support of the family $(\lambda_i u_i)_{i \in I}$,
using the properties of Definition 2.14.
Example 2.11.
1. The map $f \colon \mathbb{R}^2 \to \mathbb{R}^2$ defined such that
$$x' = x - y$$
$$y' = x + y$$
is a linear map. The reader should check that it is the composition of a rotation by
$\pi/4$ with a magnification of ratio $\sqrt{2}$.
2. For any vector space $E$, the identity map $\mathrm{id} \colon E \to E$ given by
$$\mathrm{id}(u) = u \quad \text{for all } u \in E$$
is a linear map. When we want to be more precise, we write $\mathrm{id}_E$ instead of $\mathrm{id}$.
3. The map $D \colon \mathbb{R}[X] \to \mathbb{R}[X]$ defined such that
$$D(f(X)) = f'(X),$$
where $f'(X)$ is the derivative of the polynomial $f(X)$, is a linear map.
4. The map $\Phi \colon C([a, b]) \to \mathbb{R}$ given by
$$\Phi(f) = \int_a^b f(t)\,dt,$$
where $C([a, b])$ is the set of continuous functions defined on the interval $[a, b]$, is a linear
map.
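Items 1 and 3 can be checked numerically. This is only a sanity check under the assumption that polynomials are stored as coefficient lists `[a0, a1, a2, ...]`; the names `rotate_then_scale` and `derivative` are illustrative.

```python
import math


def f(x, y):
    """The map of item 1: (x, y) -> (x - y, x + y)."""
    return (x - y, x + y)


def rotate_then_scale(x, y, angle=math.pi / 4, ratio=math.sqrt(2)):
    """Rotation by `angle` followed by a magnification of ratio `ratio`."""
    c, s = math.cos(angle), math.sin(angle)
    return (ratio * (c * x - s * y), ratio * (s * x + c * y))


def derivative(coeffs):
    """The map D of item 3 on a coefficient list [a0, a1, a2, ...]."""
    return [k * a for k, a in enumerate(coeffs)][1:]


# Item 1: both descriptions agree (up to floating-point rounding).
point = (3.0, -2.0)
print(f(*point), rotate_then_scale(*point))

# Item 3: D(2p + 3q) = 2 D(p) + 3 D(q) for p = 1 + 2X + 3X^2 and q = X + 4X^2.
p, q = [1, 2, 3], [0, 1, 4]
lhs = derivative([2 * a + 3 * b for a, b in zip(p, q)])
rhs = [2 * a + 3 * b for a, b in zip(derivative(p), derivative(q))]
print(lhs == rhs)  # True
```

A check on sample inputs is of course not a proof of linearity; it only illustrates the two identities of Definition 2.14 on concrete data.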
$$\langle f, g \rangle = \int_a^b f(t) g(t)\,dt,$$
is linear in each of the variables $f$, $g$. It also satisfies the properties $\langle f, g \rangle = \langle g, f \rangle$ and
$\langle f, f \rangle = 0$ iff $f = 0$. It is an example of an inner product.
Definition 2.15. Given a linear map $f \colon E \to F$, we define its image (or range) $\mathrm{Im}\, f = f(E)$,
as the set
$$\mathrm{Im}\, f = \{y \in F \mid (\exists x \in E)(y = f(x))\},$$
and its kernel (or nullspace) $\mathrm{Ker}\, f = f^{-1}(0)$, as the set
$$\mathrm{Ker}\, f = \{x \in E \mid f(x) = 0\}.$$
Proposition 2.15. Given a linear map $f \colon E \to F$, the set $\mathrm{Im}\, f$ is a subspace of $F$ and the
set $\mathrm{Ker}\, f$ is a subspace of $E$. The linear map $f \colon E \to F$ is injective iff $\mathrm{Ker}\, f = 0$ (where $0$
is the trivial subspace $\{0\}$).
Proof. Given any $x, y \in \mathrm{Im}\, f$, there are some $u, v \in E$ such that $x = f(u)$ and $y = f(v)$,
and for all $\lambda, \mu \in K$, we have
$$f(\lambda u + \mu v) = \lambda f(u) + \mu f(v) = \lambda x + \mu y,$$
and thus, $\lambda x + \mu y \in \mathrm{Im}\, f$, showing that $\mathrm{Im}\, f$ is a subspace of $F$.
Given any $x, y \in \mathrm{Ker}\, f$, we have $f(x) = 0$ and $f(y) = 0$, and thus,
$$f(\lambda x + \mu y) = \lambda f(x) + \mu f(y) = 0,$$
that is, $\lambda x + \mu y \in \mathrm{Ker}\, f$, showing that $\mathrm{Ker}\, f$ is a subspace of $E$.
First, assume that $\mathrm{Ker}\, f = 0$. We need to prove that $f(x) = f(y)$ implies that $x = y$.
However, if $f(x) = f(y)$, then $f(x) - f(y) = 0$, and by linearity of $f$ we get $f(x - y) = 0$.
Because $\mathrm{Ker}\, f = 0$, we must have $x - y = 0$, that is $x = y$, so $f$ is injective. Conversely,
assume that $f$ is injective. If $x \in \mathrm{Ker}\, f$, that is $f(x) = 0$, since $f(0) = 0$ we have $f(x) =
f(0)$, and by injectivity, $x = 0$, which proves that $\mathrm{Ker}\, f = 0$. Therefore, $f$ is injective iff
$\mathrm{Ker}\, f = 0$.
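A short worked example may help fix the definitions; the map below is chosen for illustration and ties in with Example 2.8.

```latex
% A linear map from R^3 to R:
f \colon \mathbb{R}^3 \to \mathbb{R}, \qquad f(x, y, z) = x + y + z .

% Its kernel is the plane of Example 2.8, item 2:
\mathrm{Ker}\, f = \{(x, y, z) \in \mathbb{R}^3 \mid x + y + z = 0\} .

% Its image is all of R, since every t in R is attained:
f(t, 0, 0) = t \quad\Longrightarrow\quad \mathrm{Im}\, f = \mathbb{R} .

% f is not injective, in agreement with Ker f \neq 0:
f(1, -1, 0) = 0 = f(0, 0, 0), \qquad (1, -1, 0) \neq (0, 0, 0) .
```

By contrast, the map $(x, y) \mapsto (x - y, x + y)$ of Example 2.11 has kernel $0$, since $x - y = 0$ and $x + y = 0$ force $x = y = 0$, so it is injective.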
Since by Proposition 2.15, the image Im f of a linear map f is a subspace of F , we can
define the rank rk(f ) of f as the dimension of Im f .
A fundamental property of bases in a vector space is that they allow the definition of
linear maps as unique homomorphic extensions, as shown in the following proposition.
Proposition 2.16. Given any two vector spaces $E$ and $F$, given any basis $(u_i)_{i \in I}$ of $E$,
given any other family of vectors $(v_i)_{i \in I}$ in $F$, there is a unique linear map $f \colon E \to F$ such
that $f(u_i) = v_i$ for all $i \in I$. Furthermore, $f$ is injective iff $(v_i)_{i \in I}$ is linearly independent,
and $f$ is surjective iff $(v_i)_{i \in I}$ generates $F$.
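Proof. If such a linear map $f \colon E \to F$ exists, since $(u_i)_{i \in I}$ is a basis of $E$, every vector $x \in E$
can be written uniquely as a linear combination
$$x = \sum_{i \in I} x_i u_i,$$
and by linearity, we must have
$$f(x) = \sum_{i \in I} x_i f(u_i) = \sum_{i \in I} x_i v_i.$$
Define the function $f \colon E \to F$ by letting
$$f(x) = \sum_{i \in I} x_i v_i$$
for every $x = \sum_{i \in I} x_i u_i$. It is easy to verify that $f$ is indeed linear, it is unique by the
previous reasoning, and obviously, $f(u_i) = v_i$.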
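Now, assume that $f$ is injective. Let $(\lambda_i)_{i \in I}$ be any family of scalars, and assume that
$$\sum_{i \in I} \lambda_i v_i = 0.$$
Since $v_i = f(u_i)$ for every $i \in I$, we have
$$f\Big(\sum_{i \in I} \lambda_i u_i\Big) = \sum_{i \in I} \lambda_i f(u_i) = \sum_{i \in I} \lambda_i v_i = 0.$$
Since $f$ is injective iff $\mathrm{Ker}\, f = 0$, we have
$$\sum_{i \in I} \lambda_i u_i = 0,$$
and since $(u_i)_{i \in I}$ is a basis, we have $\lambda_i = 0$ for all $i \in I$, which shows that $(v_i)_{i \in I}$ is linearly
independent. Conversely, assume that $(v_i)_{i \in I}$ is linearly independent. Since $(u_i)_{i \in I}$ is a basis
of $E$, every vector $x \in E$ is a linear combination $x = \sum_{i \in I} \lambda_i u_i$ of $(u_i)_{i \in I}$. If
$$f(x) = f\Big(\sum_{i \in I} \lambda_i u_i\Big) = 0,$$
then
$$\sum_{i \in I} \lambda_i v_i = \sum_{i \in I} \lambda_i f(u_i) = f\Big(\sum_{i \in I} \lambda_i u_i\Big) = 0,$$
and $\lambda_i = 0$ for all $i \in I$ because $(v_i)_{i \in I}$ is linearly independent, which means that $x = 0$.
Therefore, $\mathrm{Ker}\, f = 0$, which implies that $f$ is injective. The part where $f$ is surjective is left
as a simple exercise.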
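The construction in the proof is short enough to write down directly. The sketch below takes $E = \mathbb{Q}^n$ with its canonical basis $(e_1, \ldots, e_n)$, an assumption made only to keep coordinates trivial, and builds the map $x \mapsto \sum_i x_i v_i$ from the prescribed images; `linear_extension` is a hypothetical name.

```python
def linear_extension(images):
    """Return the linear map f with f(e_i) = images[i] for the canonical basis (e_i)."""
    dim = len(images[0])

    def f(x):
        # f(x) = sum_i x_i * v_i, computed component by component.
        return tuple(sum(x_i * v[k] for x_i, v in zip(x, images)) for k in range(dim))

    return f


# Independent images: the resulting map from Q^2 to Q^3 is injective.
f = linear_extension([(1, 0, 1), (0, 1, 1)])
print(f((1, 0)), f((0, 1)))   # (1, 0, 1) (0, 1, 1)
print(f((2, 3)))              # (2, 3, 5)

# Dependent images, v_2 = 2 v_1: a nonzero vector is sent to 0, so g is not injective.
g = linear_extension([(1, 2), (2, 4)])
print(g((2, -1)))             # (0, 0)
```

The second map illustrates the "injective iff $(v_i)_{i \in I}$ is linearly independent" part: the dependence $2 v_1 - v_2 = 0$ produces the nonzero kernel element $2 e_1 - e_2$.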
By the second part of Proposition 2.16, an injective linear map $f \colon E \to F$ sends a basis
$(u_i)_{i \in I}$ to a linearly independent family $(f(u_i))_{i \in I}$ of $F$, which is also a basis when $f$ is
bijective. Also, when $E$ and $F$ have the same finite dimension $n$, $(u_i)_{i \in I}$ is a basis of $E$, and
$f \colon E \to F$ is injective, then $(f(u_i))_{i \in I}$ is a basis of $F$ (by Proposition 2.10).
We can now show that the vector space $K^{(I)}$ of Definition 2.13 has a universal property
that amounts to saying that $K^{(I)}$ is the vector space freely generated by $I$. Recall that
$\iota \colon I \to K^{(I)}$, such that $\iota(i) = e_i$ for every $i \in I$, is an injection from $I$ to $K^{(I)}$.
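Proposition 2.17. Given any set $I$, for any vector space $F$, and for any function $f \colon I \to F$,
there is a unique linear map $\overline{f} \colon K^{(I)} \to F$, such that
$$f = \overline{f} \circ \iota,$$
as in the following diagram:
$$
\begin{array}{ccc}
I & \xrightarrow{\ \iota\ } & K^{(I)} \\
 & {\scriptstyle f} \searrow & \downarrow {\scriptstyle \overline{f}} \\
 & & F
\end{array}
$$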
$$f(x) = \sum_{i \in I} x_i f(u_i) = \sum_{i \in I} x_i v_i.$$
This shows that $f$ is unique if it exists. Conversely, assume that $(u_i)_{i \in I}$ does not generate $E$.
Since $F$ is nontrivial, there is some vector $y \in F$ such that $y \neq 0$. Since $(u_i)_{i \in I}$ does
not generate $E$, there is some vector $w \in E$ that is not in the subspace generated by $(u_i)_{i \in I}$.
By Theorem 2.13, there is a linearly independent subfamily $(u_i)_{i \in I_0}$ of $(u_i)_{i \in I}$ generating the
same subspace. Since by hypothesis, $w \in E$ is not in the subspace generated by $(u_i)_{i \in I_0}$, by
Lemma 2.8 and by Theorem 2.13 again, there is a basis $(e_j)_{j \in I_0 \cup J}$ of $E$, such that $e_i = u_i$,
for all $i \in I_0$, and $w = e_{j_0}$, for some $j_0 \in J$. Letting $(v_i)_{i \in I}$ be the family in $F$ such that
$v_i = 0$ for all $i \in I$, defining $f \colon E \to F$ to be the constant linear map with value $0$, we have
a linear map such that $f(u_i) = 0$ for all $i \in I$. By Proposition 2.16, there is a unique linear
map $g \colon E \to F$ such that $g(w) = y$, and $g(e_j) = 0$, for all $j \in (I_0 \cup J) - \{j_0\}$. By definition
of the basis $(e_j)_{j \in I_0 \cup J}$ of $E$, we have $g(u_i) = 0$ for all $i \in I$, and since $f \neq g$, this contradicts
the fact that there is at most one such map.
(2) If the family $(u_i)_{i \in I}$ is linearly independent, then by Theorem 2.13, $(u_i)_{i \in I}$ can be
extended to a basis of $E$, and the conclusion follows by Proposition 2.16. Conversely, assume
that $(u_i)_{i \in I}$ is linearly dependent. Then, there is some family $(\lambda_i)_{i \in I}$ of scalars (not all zero)
such that
$$\sum_{i \in I} \lambda_i u_i = 0.$$
By the assumption, for any nonzero vector $y \in F$, for every $i \in I$, there is some linear map
$f_i \colon E \to F$, such that $f_i(u_i) = y$, and $f_i(u_j) = 0$, for $j \in I - \{i\}$. Then, we would get
$$0 = f_i\Big(\sum_{j \in I} \lambda_j u_j\Big) = \sum_{j \in I} \lambda_j f_i(u_j) = \lambda_i y,$$
and since $y \neq 0$, this implies $\lambda_i = 0$, for every $i \in I$. Thus, $(u_i)_{i \in I}$ is linearly independent.
Given vector spaces $E$, $F$, and $G$, and linear maps $f \colon E \to F$ and $g \colon F \to G$, it is easily
verified that the composition $g \circ f \colon E \to G$ of $f$ and $g$ is a linear map.
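A linear map $f \colon E \to F$ is an isomorphism iff there is a linear map $g \colon F \to E$ such that
$$g \circ f = \mathrm{id}_E \quad \text{and} \quad f \circ g = \mathrm{id}_F. \tag{$*$}$$
Such a map $g$ is unique. This is because if $g$ and $h$ both satisfy $g \circ f = \mathrm{id}_E$, $f \circ g = \mathrm{id}_F$,
$h \circ f = \mathrm{id}_E$, and $f \circ h = \mathrm{id}_F$, then
$$g = g \circ \mathrm{id}_F = g \circ (f \circ h) = (g \circ f) \circ h = \mathrm{id}_E \circ h = h.$$
The map $g$ satisfying $(*)$ above is called the inverse of $f$ and it is also denoted by $f^{-1}$.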
Proposition 2.16 implies that if $E$ and $F$ are two vector spaces, $(u_i)_{i \in I}$ is a basis of $E$,
and $f \colon E \to F$ is a linear map which is an isomorphism, then the family $(f(u_i))_{i \in I}$ is a basis
of $F$.
One can verify that if $f \colon E \to F$ is a bijective linear map, then its inverse $f^{-1} \colon F \to E$
is also a linear map, and thus $f$ is an isomorphism.
Another useful corollary of Proposition 2.16 is this:
Proposition 2.19. Let $E$ be a vector space of finite dimension $n \geq 1$ and let $f \colon E \to E$ be
any linear map. The following properties hold:
(1) If $f$ has a left inverse $g$, that is, if $g$ is a linear map such that $g \circ f = \mathrm{id}$, then $f$ is an
isomorphism and $f^{-1} = g$.
(2) If $f$ has a right inverse $h$, that is, if $h$ is a linear map such that $f \circ h = \mathrm{id}$, then $f$ is
an isomorphism and $f^{-1} = h$.
Proof. (1) The equation $g \circ f = \mathrm{id}$ implies that $f$ is injective; this is a standard result
about functions (if $f(x) = f(y)$, then $g(f(x)) = g(f(y))$, which implies that $x = y$ since
$g \circ f = \mathrm{id}$). Let $(u_1, \ldots, u_n)$ be any basis of $E$. By Proposition 2.16, since $f$ is injective,
$(f(u_1), \ldots, f(u_n))$ is linearly independent, and since $E$ has dimension $n$, it is a basis of
$E$ (if $(f(u_1), \ldots, f(u_n))$ does not span $E$, then it can be extended to a basis with
strictly more than $n$ elements, contradicting Theorem 2.13). Then, $f$ is bijective, and by a previous
observation its inverse is a linear map. We also have
$$g = g \circ \mathrm{id} = g \circ (f \circ f^{-1}) = (g \circ f) \circ f^{-1} = \mathrm{id} \circ f^{-1} = f^{-1}.$$
(2) The equation $f \circ h = \mathrm{id}$ implies that $f$ is surjective; this is a standard result about
functions (for any $y \in E$, we have $f(h(y)) = y$). Let $(u_1, \ldots, u_n)$ be any basis of $E$. By
Proposition 2.16, since $f$ is surjective, $(f(u_1), \ldots, f(u_n))$ spans $E$, and since $E$ has dimension
$n$, it is a basis of $E$ (if $(f(u_1), \ldots, f(u_n))$ is not linearly independent, then because it spans
$E$, it contains a basis with strictly fewer than $n$ elements, contradicting Theorem 2.13).
Then, $f$ is bijective, and by a previous observation its inverse is a linear map. We also have
$$h = \mathrm{id} \circ h = (f^{-1} \circ f) \circ h = f^{-1} \circ (f \circ h) = f^{-1} \circ \mathrm{id} = f^{-1}.$$
This completes the proof.
The set of all linear maps between two vector spaces $E$ and $F$ is denoted by $\mathrm{Hom}(E, F)$
or by $L(E; F)$ (the notation $L(E; F)$ is usually reserved to the set of continuous linear maps,
where $E$ and $F$ are normed vector spaces). When we wish to be more precise and specify
the field $K$ over which the vector spaces $E$ and $F$ are defined we write $\mathrm{Hom}_K(E, F)$.
The set $\mathrm{Hom}(E, F)$ is a vector space under the operations defined at the end of Section
2.1, namely
$$(f + g)(x) = f(x) + g(x)$$
and
$$(\lambda f)(x) = \lambda f(x)$$
for all $x \in E$. The point worth checking carefully is that $\lambda f$ is indeed a linear map, which
uses the commutativity of $*$ in the field $K$. Indeed, we have
$$(\lambda f)(\mu x) = \lambda f(\mu x) = \lambda \mu f(x) = \mu \lambda f(x) = \mu (\lambda f)(x).$$
When $E$ and $F$ have finite dimensions, the vector space $\mathrm{Hom}(E, F)$ also has finite dimension,
as we shall see shortly. When $E = F$, a linear map $f \colon E \to E$ is also called an
endomorphism. It is also important to note that composition confers to $\mathrm{Hom}(E, E)$ a ring
structure. Indeed, composition is an operation $\circ \colon \mathrm{Hom}(E, E) \times \mathrm{Hom}(E, E) \to \mathrm{Hom}(E, E)$,
which is associative and has an identity $\mathrm{id}_E$, and the distributivity properties hold:
$$(g_1 + g_2) \circ f = g_1 \circ f + g_2 \circ f;$$
$$g \circ (f_1 + f_2) = g \circ f_1 + g \circ f_2.$$
The ring $\mathrm{Hom}(E, E)$ is an example of a noncommutative ring. It is easily seen that the
set of bijective linear maps $f \colon E \to E$ is a group under composition. Bijective linear maps
are also called automorphisms. The group of automorphisms of $E$ is called the general linear
group (of $E$), and it is denoted by $\mathrm{GL}(E)$, or by $\mathrm{Aut}(E)$, or when $E = K^n$, by $\mathrm{GL}(n, K)$,
or even by $\mathrm{GL}(n)$.
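Noncommutativity already shows up in dimension 2. The following sketch uses two endomorphisms of $\mathbb{R}^2$ chosen for illustration (a coordinate swap and a projection) and compares the two orders of composition.

```python
def swap(v):
    """f(x, y) = (y, x): a bijective linear map, hence an element of GL(2)."""
    x, y = v
    return (y, x)


def project(v):
    """g(x, y) = (x, 0): a linear map that is not bijective."""
    x, y = v
    return (x, 0)


def compose(outer, inner):
    """Return the composition outer o inner."""
    return lambda v: outer(inner(v))


v = (1, 2)
print(compose(swap, project)(v))   # (0, 1)
print(compose(project, swap)(v))   # (2, 0)
print(compose(swap, swap)(v))      # (1, 2): swap is its own inverse
```

Since the two compositions disagree on $(1, 2)$, we have $f \circ g \neq g \circ f$ in $\mathrm{Hom}(\mathbb{R}^2, \mathbb{R}^2)$. The projection also sends the nonzero vector $(0, 1)$ to $0$, so it is an endomorphism that is not an automorphism.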
Although in this book, we will not have many occasions to use quotient spaces, they are
fundamental in algebra. The next section may be omitted until needed.
2.6 Quotient Spaces
Let $E$ be a vector space, and let $M$ be any subspace of $E$. The subspace $M$ induces a relation
$\equiv_M$ on $E$, defined as follows: For all $u, v \in E$,
$$u \equiv_M v \quad \text{iff} \quad u - v \in M.$$
Proposition 2.20. Given any vector space $E$ and any subspace $M$ of $E$, the relation $\equiv_M$
is an equivalence relation with the following two congruential properties:
1. If $u_1 \equiv_M v_1$ and $u_2 \equiv_M v_2$, then $u_1 + u_2 \equiv_M v_1 + v_2$, and
2. if $u \equiv_M v$, then $\lambda u \equiv_M \lambda v$.
Proof. It is obvious that $\equiv_M$ is an equivalence relation. Note that $u_1 \equiv_M v_1$ and $u_2 \equiv_M v_2$
are equivalent to $u_1 - v_1 = w_1$ and $u_2 - v_2 = w_2$, with $w_1, w_2 \in M$, and thus,
$$(u_1 + u_2) - (v_1 + v_2) = w_1 + w_2,$$
2.7 Summary
The main concepts and results of this chapter are listed below:
• Groups, rings and fields.
• The notion of a vector space.
• Families of vectors.
• Linear combinations of vectors; linear dependence and linear independence of a family of vectors.
• Linear subspaces.
• Spanning (or generating) family; generators, finitely generated subspace; basis of a subspace.
• Every linearly independent family can be extended to a basis (Theorem 2.9).
Chapter 3
Matrices and Linear Maps
3.1 Matrices
Proposition 2.16 shows that given two vector spaces $E$ and $F$ and a basis $(u_j)_{j \in J}$ of $E$,
every linear map $f \colon E \to F$ is uniquely determined by the family $(f(u_j))_{j \in J}$ of the images
under $f$ of the vectors in the basis $(u_j)_{j \in J}$. Thus, in particular, taking $F = K^{(J)}$, we get an
isomorphism between any vector space $E$ of dimension $|J|$ and $K^{(J)}$. If $J = \{1, \ldots, n\}$, a
vector space $E$ of dimension $n$ is isomorphic to the vector space $K^n$. If we also have a basis
$(v_i)_{i \in I}$ of $F$, then every vector $f(u_j)$ can be written in a unique way as
$$f(u_j) = \sum_{i \in I} a_{ij} v_i,$$
where $j \in J$, for a family of scalars $(a_{ij})_{i \in I}$. Thus, with respect to the two bases $(u_j)_{j \in J}$
of $E$ and $(v_i)_{i \in I}$ of $F$, the linear map $f$ is completely determined by a possibly infinite
$I \times J$-matrix $M(f) = (a_{ij})_{i \in I, \, j \in J}$.
Remark: Note that we intentionally assigned the index set $J$ to the basis $(u_j)_{j \in J}$ of $E$,
and the index set $I$ to the basis $(v_i)_{i \in I}$ of $F$, so that the rows of the matrix $M(f)$ associated
with $f \colon E \to F$ are indexed by $I$, and the columns of the matrix $M(f)$ are indexed by $J$.
Obviously, this causes a mildly unpleasant reversal. If we had considered the bases $(u_i)_{i \in I}$ of
$E$ and $(v_j)_{j \in J}$ of $F$, we would obtain a $J \times I$-matrix $M(f) = (a_{ji})_{j \in J, \, i \in I}$. No matter what
we do, there will be a reversal! We decided to stick to the bases $(u_j)_{j \in J}$ of $E$ and $(v_i)_{i \in I}$ of
$F$, so that we get an $I \times J$-matrix $M(f)$, knowing that we may occasionally suffer from this
decision!
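When $I$ and $J$ are finite, and say, when $|I| = m$ and $|J| = n$, the linear map $f$ is
determined by the matrix $M(f)$ whose entries in the $j$-th column are the components of the
vector $f(u_j)$ over the basis $(v_1, \ldots, v_m)$, that is, the matrix
$$
M(f) = \begin{pmatrix}
a_{11} & \cdots & a_{1n} \\
a_{21} & \cdots & a_{2n} \\
\vdots & \ddots & \vdots \\
a_{m1} & \cdots & a_{mn}
\end{pmatrix}.
$$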
We will now show that when E and F have finite dimension, linear maps can be very
conveniently represented by matrices, and that composition of linear maps corresponds to
matrix multiplication. We will follow rather closely an elegant presentation method due to
Emil Artin.
Let $E$ and $F$ be two vector spaces, and assume that $E$ has a finite basis $(u_1, \ldots, u_n)$ and
that $F$ has a finite basis $(v_1, \ldots, v_m)$. Recall that we have shown that every vector $x \in E$
can be written in a unique way as
$$x = x_1 u_1 + \cdots + x_n u_n,$$
and similarly every vector $y \in F$ can be written in a unique way as
$$y = y_1 v_1 + \cdots + y_m v_m.$$
Let $f \colon E \to F$ be a linear map between $E$ and $F$. Then, for every $x = x_1 u_1 + \cdots + x_n u_n$ in
$E$, by linearity, we have
$$f(x) = x_1 f(u_1) + \cdots + x_n f(u_n).$$
Let
$$f(u_j) = a_{1j} v_1 + \cdots + a_{mj} v_m,$$
or more concisely,
$$f(u_j) = \sum_{i=1}^{m} a_{ij} v_i,$$
for every $j$, $1 \leq j \leq n$. This can be expressed by writing the coefficients $a_{1j}, a_{2j}, \ldots, a_{mj}$ of
$f(u_j)$ over the basis $(v_1, \ldots, v_m)$, as the $j$th column of a matrix, as shown below:
v1
v2
..
.
vm
f (u1 ) f (u2 )
a11
a12
a21
a22
..
..
.
.
am1 am2
. . . f (un )
. . . a1n
. . . a2n
..
...
.
. . . amn
Then, substituting the right-hand side of each $f(u_j)$ into the expression for $f(x)$, we get
$$f(x) = x_1 \Bigl(\sum_{i=1}^{m} a_{i1} v_i\Bigr) + \cdots + x_n \Bigl(\sum_{i=1}^{m} a_{in} v_i\Bigr),$$
3.1. MATRICES
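which, by regrouping terms to obtain a linear combination of the $v_i$, yields
$$f(x) = \Bigl(\sum_{j=1}^{n} a_{1j} x_j\Bigr) v_1 + \cdots + \Bigl(\sum_{j=1}^{n} a_{mj} x_j\Bigr) v_m.$$
Thus, letting $f(x) = y = y_1 v_1 + \cdots + y_m v_m$, we have
$$y_i = \sum_{j=1}^{n} a_{ij} x_j \qquad (1)$$
for all $i$, $1 \le i \le m$.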
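Equation (1) can be tried out numerically. The following is a minimal Python sketch, not part of the original text: the helper name `apply_matrix` and the sample $2 \times 3$ matrix are hypothetical, chosen only for illustration.

```python
def apply_matrix(a, x):
    """Return y with y[i] = sum_j a[i][j] * x[j], as in equation (1)."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in a]

# Hypothetical matrix with m = 2 rows and n = 3 columns.
# Column j holds the coordinates of f(u_j) over (v_1, v_2).
A = [[1, 2, 0],
     [0, -1, 3]]

x = [2, 1, 1]            # coordinates of x over (u_1, u_2, u_3)
y = apply_matrix(A, x)   # coordinates of f(x) over (v_1, v_2)
print(y)
```

Feeding in the coordinates of a basis vector, such as `[1, 0, 0]` for $u_1$, returns the corresponding column of the matrix, which is exactly how the matrix was defined.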
To make things more concrete, let us treat the case where $n = 3$ and $m = 2$. In this case,
$$\begin{aligned}
f(u_1) &= a_{11} v_1 + a_{21} v_2 \\
f(u_2) &= a_{12} v_1 + a_{22} v_2 \\
f(u_3) &= a_{13} v_1 + a_{23} v_2.
\end{aligned}$$
Let $E$, $F$, and $G$ be three vector spaces with respective bases $(u_1, \ldots, u_p)$ for $E$, $(v_1, \ldots, v_n)$ for $F$, and $(w_1, \ldots, w_m)$ for $G$. Let $g\colon E \to F$ and $f\colon F \to G$ be linear maps. As explained earlier, $g\colon E \to F$ is determined by the images of the basis vectors $u_j$, and $f\colon F \to G$ is determined by the images of the basis vectors $v_k$. We would like to understand how $f \circ g\colon E \to G$ is determined by the images of the basis vectors $u_j$.
Remark: Note that we are considering linear maps $g\colon E \to F$ and $f\colon F \to G$, instead of $f\colon E \to F$ and $g\colon F \to G$, which yields the composition $f \circ g\colon E \to G$ instead of $g \circ f\colon E \to G$. Our perhaps unusual choice is motivated by the fact that if $f$ is represented by a matrix $M(f) = (a_{ik})$ and $g$ is represented by a matrix $M(g) = (b_{kj})$, then $f \circ g\colon E \to G$ is represented by the product $AB$ of the matrices $A$ and $B$. If we had adopted the other choice where $f\colon E \to F$ and $g\colon F \to G$, then $g \circ f\colon E \to G$ would be represented by the product $BA$. Personally, we find it easier to remember the formula for the entry in row $i$ and column $j$ of the product of two matrices when this product is written as $AB$, rather than $BA$. Obviously, this is a matter of taste! We will have to live with our perhaps unorthodox choice.
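Thus, let
$$f(v_k) = \sum_{i=1}^{m} a_{ik} w_i,$$
for every $k$, $1 \le k \le n$, and let
$$g(u_j) = \sum_{k=1}^{n} b_{kj} v_k,$$
for every $j$, $1 \le j \le p$. In matrix form, we have
$$\begin{array}{c|cccc}
 & f(v_1) & f(v_2) & \cdots & f(v_n) \\ \hline
w_1 & a_{11} & a_{12} & \cdots & a_{1n} \\
w_2 & a_{21} & a_{22} & \cdots & a_{2n} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
w_m & a_{m1} & a_{m2} & \cdots & a_{mn}
\end{array}$$
and
$$\begin{array}{c|cccc}
 & g(u_1) & g(u_2) & \cdots & g(u_p) \\ \hline
v_1 & b_{11} & b_{12} & \cdots & b_{1p} \\
v_2 & b_{21} & b_{22} & \cdots & b_{2p} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
v_n & b_{n1} & b_{n2} & \cdots & b_{np}
\end{array}$$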
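By previous considerations, for every
$$x = x_1 u_1 + \cdots + x_p u_p,$$
letting $g(x) = y = y_1 v_1 + \cdots + y_n v_n$, we have
$$y_k = \sum_{j=1}^{p} b_{kj} x_j \qquad (2)$$
for all $k$, $1 \le k \le n$, and for every
$$y = y_1 v_1 + \cdots + y_n v_n,$$
letting $f(y) = z = z_1 w_1 + \cdots + z_m w_m$, we have
$$z_i = \sum_{k=1}^{n} a_{ik} y_k \qquad (3)$$
for all $i$, $1 \le i \le m$. Then, if $y = g(x)$ and $z = f(y)$, we have $z = f(g(x))$, and in view of (2) and (3), we have
$$\begin{aligned}
z_i &= \sum_{k=1}^{n} a_{ik} \Bigl(\sum_{j=1}^{p} b_{kj} x_j\Bigr) \\
    &= \sum_{k=1}^{n} \sum_{j=1}^{p} a_{ik} b_{kj} x_j \\
    &= \sum_{j=1}^{p} \sum_{k=1}^{n} a_{ik} b_{kj} x_j \\
    &= \sum_{j=1}^{p} \Bigl(\sum_{k=1}^{n} a_{ik} b_{kj}\Bigr) x_j.
\end{aligned}$$
Thus, defining $c_{ij}$ such that
$$c_{ij} = \sum_{k=1}^{n} a_{ik} b_{kj},$$
for $1 \le i \le m$ and $1 \le j \le p$, we have
$$z_i = \sum_{j=1}^{p} c_{ij} x_j. \qquad (4)$$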
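Definition 3.1. An $m \times n$-matrix over $K$ is a family $(a_{ij})_{1 \le i \le m,\ 1 \le j \le n}$ of scalars in $K$, represented by an array
$$\begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
a_{m1} & a_{m2} & \cdots & a_{mn}
\end{pmatrix}.$$
In the special case where $m = 1$, we have a row vector, represented by
$$(a_{11} \ \cdots \ a_{1n}),$$
and in the special case where $n = 1$, we have a column vector, represented by
$$\begin{pmatrix} a_{11} \\ \vdots \\ a_{m1} \end{pmatrix}.$$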
In these last two cases, we usually omit the constant index 1 (first index in case of a row, second index in case of a column). The set of all $m \times n$-matrices is denoted by $M_{m,n}(K)$ or $M_{m,n}$. An $n \times n$-matrix is called a square matrix of dimension $n$. The set of all square matrices of dimension $n$ is denoted by $M_n(K)$, or $M_n$.
Remark: As defined, a matrix $A = (a_{ij})_{1 \le i \le m,\ 1 \le j \le n}$ is a family, that is, a function from $\{1, 2, \ldots, m\} \times \{1, 2, \ldots, n\}$ to $K$. As such, there is no reason to assume an ordering on the indices. Thus, the matrix $A$ can be represented in many different ways as an array, by adopting different orders for the rows or the columns. However, it is customary (and usually convenient) to assume the natural ordering on the sets $\{1, 2, \ldots, m\}$ and $\{1, 2, \ldots, n\}$, and to represent $A$ as an array according to this ordering of the rows and columns.
We also define some operations on matrices as follows.
Definition 3.2. Given two $m \times n$ matrices $A = (a_{ij})$ and $B = (b_{ij})$, we define their sum $A + B$ as the matrix $C = (c_{ij})$ such that $c_{ij} = a_{ij} + b_{ij}$; that is,
$$\begin{pmatrix}
a_{11} & \cdots & a_{1n} \\
a_{21} & \cdots & a_{2n} \\
\vdots & \ddots & \vdots \\
a_{m1} & \cdots & a_{mn}
\end{pmatrix}
+
\begin{pmatrix}
b_{11} & \cdots & b_{1n} \\
b_{21} & \cdots & b_{2n} \\
\vdots & \ddots & \vdots \\
b_{m1} & \cdots & b_{mn}
\end{pmatrix}
=
\begin{pmatrix}
a_{11} + b_{11} & \cdots & a_{1n} + b_{1n} \\
a_{21} + b_{21} & \cdots & a_{2n} + b_{2n} \\
\vdots & \ddots & \vdots \\
a_{m1} + b_{m1} & \cdots & a_{mn} + b_{mn}
\end{pmatrix}.$$
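For any matrix $A = (a_{ij})$, we let $-A$ be the matrix $(-a_{ij})$. Given a scalar $\lambda \in K$, we define the matrix $\lambda A$ as the matrix $C = (c_{ij})$ such that $c_{ij} = \lambda a_{ij}$; that is,
$$\lambda \begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
a_{m1} & a_{m2} & \cdots & a_{mn}
\end{pmatrix}
=
\begin{pmatrix}
\lambda a_{11} & \lambda a_{12} & \cdots & \lambda a_{1n} \\
\lambda a_{21} & \lambda a_{22} & \cdots & \lambda a_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
\lambda a_{m1} & \lambda a_{m2} & \cdots & \lambda a_{mn}
\end{pmatrix}.$$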
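Given an $m \times n$ matrix $A = (a_{ik})$ and an $n \times p$ matrix $B = (b_{kj})$, we define their product $AB$ as the $m \times p$ matrix $C = (c_{ij})$ such that
$$c_{ij} = \sum_{k=1}^{n} a_{ik} b_{kj},$$
for $1 \le i \le m$ and $1 \le j \le p$. In the product $AB = C$ shown below
$$\begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
a_{m1} & a_{m2} & \cdots & a_{mn}
\end{pmatrix}
\begin{pmatrix}
b_{11} & b_{12} & \cdots & b_{1p} \\
b_{21} & b_{22} & \cdots & b_{2p} \\
\vdots & \vdots & \ddots & \vdots \\
b_{n1} & b_{n2} & \cdots & b_{np}
\end{pmatrix}
=
\begin{pmatrix}
c_{11} & c_{12} & \cdots & c_{1p} \\
c_{21} & c_{22} & \cdots & c_{2p} \\
\vdots & \vdots & \ddots & \vdots \\
c_{m1} & c_{m2} & \cdots & c_{mp}
\end{pmatrix}$$
note that the entry of index $i$ and $j$ of the matrix $AB$ obtained by multiplying the matrices $A$ and $B$ can be identified with the product of the row matrix corresponding to the $i$-th row of $A$ with the column matrix corresponding to the $j$-th column of $B$:
$$(a_{i1} \ \cdots \ a_{in}) \begin{pmatrix} b_{1j} \\ \vdots \\ b_{nj} \end{pmatrix} = \sum_{k=1}^{n} a_{ik} b_{kj}.$$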
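To see the row-times-column rule at work, and to check on one instance that the product matrix represents the composition of the two maps, here is a small Python sketch. The helper names `mat_mul` and `mat_vec` and the sample matrices are hypothetical and serve only as an illustration.

```python
def mat_mul(a, b):
    """Product of an m x n matrix a and an n x p matrix b (lists of rows)."""
    n, p = len(b), len(b[0])
    if any(len(row) != n for row in a):
        raise ValueError("inner dimensions do not match")
    # c[i][j] = sum over k of a[i][k] * b[k][j]: row i of a times column j of b.
    return [[sum(row[k] * b[k][j] for k in range(n)) for j in range(p)]
            for row in a]

def mat_vec(a, x):
    """Apply the matrix a to the coordinate vector x."""
    return [sum(a_ik * x_k for a_ik, x_k in zip(row, x)) for row in a]

A = [[1, 2], [0, 1], [3, 0]]        # hypothetical M(f), 3 x 2
B = [[1, 0, 2, 1], [-1, 1, 0, 3]]   # hypothetical M(g), 2 x 4
C = mat_mul(A, B)                   # 3 x 4, plays the role of M(f o g)

x = [1, 2, 0, -1]
# Applying g and then f agrees with applying the product matrix once.
print(mat_vec(A, mat_vec(B, x)), mat_vec(C, x))
```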
The square matrix $I_n$ of dimension $n$ containing 1 on the diagonal and 0 everywhere else is called the identity matrix. It is denoted as
$$I_n = \begin{pmatrix}
1 & 0 & \cdots & 0 \\
0 & 1 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & 1
\end{pmatrix}.$$
Given an $m \times n$ matrix $A = (a_{ij})$, its transpose $A^\top = (a^\top_{ji})$ is the $n \times m$-matrix such that $a^\top_{ji} = a_{ij}$, for all $i$, $1 \le i \le m$, and all $j$, $1 \le j \le n$.
The following observation will be useful later on when we discuss the SVD. Given any $m \times n$ matrix $A$ and any $n \times p$ matrix $B$, if we denote the columns of $A$ by $A^1, \ldots, A^n$ and the rows of $B$ by $B_1, \ldots, B_n$, then we have
$$AB = A^1 B_1 + \cdots + A^n B_n.$$
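The column-times-row decomposition can be checked on a small case. The sketch below is an illustration only; the helpers `mat_mul`, `outer`, and `mat_add` and the two $2 \times 2$ matrices are hypothetical.

```python
def mat_mul(a, b):
    """Usual row-by-column product of two matrices given as lists of rows."""
    return [[sum(row[k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for row in a]

def outer(col, row):
    """Product of a column (m x 1) with a row (1 x p): an m x p matrix."""
    return [[c * r for r in row] for c in col]

def mat_add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

A = [[1, 2], [3, 4]]
B = [[0, 1], [5, -1]]

columns_of_A = [list(col) for col in zip(*A)]   # k-th column of A
total = [[0, 0], [0, 0]]
for col, row in zip(columns_of_A, B):           # pair it with the k-th row of B
    total = mat_add(total, outer(col, row))

print(total, mat_mul(A, B))
```

Each term in the sum is a matrix of rank at most one, which is why this form of the product is convenient when discussing the SVD.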
For every square matrix $A$ of dimension $n$, it is immediately verified that $A I_n = I_n A = A$. If a matrix $B$ such that $AB = BA = I_n$ exists, then it is unique, and it is called the inverse of $A$. The matrix $B$ is also denoted by $A^{-1}$. An invertible matrix is also called a nonsingular matrix, and a matrix that is not invertible is called a singular matrix.
Proposition 2.19 shows that if a square matrix $A$ has a left inverse, that is, a matrix $B$ such that $BA = I$, or a right inverse, that is, a matrix $C$ such that $AC = I$, then $A$ is actually invertible; so $B = A^{-1}$ and $C = A^{-1}$. These facts also follow from Proposition 4.14.
It is immediately verified that the set $M_{m,n}(K)$ of $m \times n$ matrices is a vector space under addition of matrices and multiplication of a matrix by a scalar. Consider the $m \times n$-matrices $E_{i,j} = (e_{hk})$, defined such that $e_{ij} = 1$, and $e_{hk} = 0$ if $h \ne i$ or $k \ne j$. It is clear that every matrix $A = (a_{ij}) \in M_{m,n}(K)$ can be written in a unique way as
$$A = \sum_{i=1}^{m} \sum_{j=1}^{n} a_{ij} E_{i,j}.$$
Thus, the family $(E_{i,j})_{1 \le i \le m,\ 1 \le j \le n}$ is a basis of the vector space $M_{m,n}(K)$, which has dimension $mn$.
Remark: Definition 3.1 and Definition 3.2 also make perfect sense when $K$ is a (commutative) ring rather than a field. In this more general setting, the framework of vector spaces is too narrow, but we can consider structures over a commutative ring $A$ satisfying all the axioms of Definition 2.9. Such structures are called modules. The theory of modules is (much) more complicated than that of vector spaces. For example, modules do not always have a basis, and other properties holding for vector spaces usually fail for modules. When a module has a basis, it is called a free module. For example, when $A$ is a commutative ring, the structure $A^n$ is a module such that the vectors $e_i$, with $(e_i)_i = 1$ and $(e_i)_j = 0$ for $j \ne i$, form a basis of $A^n$. Thus, $A^n$ is a free module, and many properties of vector spaces still hold for $A^n$. As another example, when $A$ is a commutative ring, $M_{m,n}(A)$ is a free module with basis $(E_{i,j})_{1 \le i \le m,\ 1 \le j \le n}$. Polynomials over a commutative ring also form a free module of infinite dimension.
Square matrices provide a natural example of a noncommutative ring with zero divisors.
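Example 3.1. For example, letting $A$, $B$ be the $2 \times 2$-matrices
$$A = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}, \qquad B = \begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix},$$
then
$$AB = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix} \begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix} = \begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix},$$
and
$$BA = \begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix} = \begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix}.$$
Thus $AB \ne BA$, and $AB = 0$ even though both $A \ne 0$ and $B \ne 0$.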
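$$f(u_j) = \sum_{i=1}^{m} a_{ij} v_i,$$
for every $j$, $1 \le j \le n$.
The coefficients $a_{1j}, a_{2j}, \ldots, a_{mj}$ of $f(u_j)$ over the basis $(v_1, \ldots, v_m)$ form the $j$th column of the matrix $M(f)$ shown below:
$$\begin{array}{c|cccc}
 & f(u_1) & f(u_2) & \cdots & f(u_n) \\ \hline
v_1 & a_{11} & a_{12} & \cdots & a_{1n} \\
v_2 & a_{21} & a_{22} & \cdots & a_{2n} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
v_m & a_{m1} & a_{m2} & \cdots & a_{mn}
\end{array}$$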
The matrix $M(f)$ associated with the linear map $f\colon E \to F$ is called the matrix of $f$ with respect to the bases $(u_1, \ldots, u_n)$ and $(v_1, \ldots, v_m)$. When $E = F$ and the basis $(v_1, \ldots, v_m)$ is identical to the basis $(u_1, \ldots, u_n)$ of $E$, the matrix $M(f)$ associated with $f\colon E \to E$ (as above) is called the matrix of $f$ with respect to the basis $(u_1, \ldots, u_n)$.
Remark: As in the remark after Definition 3.1, there is no reason to assume that the vectors in the bases $(u_1, \ldots, u_n)$ and $(v_1, \ldots, v_m)$ are ordered in any particular way. However, it is often convenient to assume the natural ordering. When this is so, authors sometimes refer to the matrix $M(f)$ as the matrix of $f$ with respect to the ordered bases $(u_1, \ldots, u_n)$ and $(v_1, \ldots, v_m)$.
Then, given a linear map $f\colon E \to F$ represented by the matrix $M(f) = (a_{ij})$ w.r.t. the bases $(u_1, \ldots, u_n)$ and $(v_1, \ldots, v_m)$, by equations (1) and the definition of matrix multiplication, the equation $y = f(x)$ corresponds to the matrix equation $M(y) = M(f) M(x)$, that is,
$$\begin{pmatrix} y_1 \\ \vdots \\ y_m \end{pmatrix}
=
\begin{pmatrix}
a_{11} & \cdots & a_{1n} \\
\vdots & \ddots & \vdots \\
a_{m1} & \cdots & a_{mn}
\end{pmatrix}
\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}.$$
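Recall that
$$\begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
a_{m1} & a_{m2} & \cdots & a_{mn}
\end{pmatrix}
\begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}
= x_1 \begin{pmatrix} a_{11} \\ a_{21} \\ \vdots \\ a_{m1} \end{pmatrix}
+ x_2 \begin{pmatrix} a_{12} \\ a_{22} \\ \vdots \\ a_{m2} \end{pmatrix}
+ \cdots
+ x_n \begin{pmatrix} a_{1n} \\ a_{2n} \\ \vdots \\ a_{mn} \end{pmatrix}.$$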
The above notation seems reasonable, but it has the slight disadvantage that in the expression $M_{\mathcal{U},\mathcal{V}}(f)\, x_{\mathcal{U}}$, the input argument $x_{\mathcal{U}}$ which is fed to the matrix $M_{\mathcal{U},\mathcal{V}}(f)$ does not appear next to the subscript $\mathcal{U}$ in $M_{\mathcal{U},\mathcal{V}}(f)$. We could have used the notation $M_{\mathcal{V},\mathcal{U}}(f)$, and some people do that. But then, we find it a bit confusing that $\mathcal{V}$ comes before $\mathcal{U}$ when $f$ maps from the space $E$ with the basis $\mathcal{U}$ to the space $F$ with the basis $\mathcal{V}$. So, we prefer to use the notation $M_{\mathcal{U},\mathcal{V}}(f)$.
Be aware that other authors such as Meyer [80] use the notation $[f]_{\mathcal{U},\mathcal{V}}$, and others such as Dummit and Foote [32] use the notation $M_{\mathcal{U}}^{\mathcal{V}}(f)$, instead of $M_{\mathcal{U},\mathcal{V}}(f)$. This gets worse! You may find the notation $M_{\mathcal{V}}^{\mathcal{U}}(f)$ (as in Lang [67]), or ${}_{\mathcal{U}}[f]_{\mathcal{V}}$, or other strange notations.
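Let us illustrate the representation of a linear map by a matrix in a concrete situation. Let $E$ be the vector space $\mathbb{R}[X]_4$ of polynomials of degree at most 4, let $F$ be the vector space $\mathbb{R}[X]_3$ of polynomials of degree at most 3, and let the linear map be the derivative map $d$: that is,
$$d(P + Q) = dP + dQ, \qquad d(\lambda P) = \lambda\, dP,$$
with $\lambda \in \mathbb{R}$. We choose $(1, x, x^2, x^3, x^4)$ as a basis of $E$ and $(1, x, x^2, x^3)$ as a basis of $F$. Then, the $4 \times 5$ matrix $D$ associated with $d$ is obtained by expressing the derivative $dx^i$ of each basis vector $x^i$ for $i = 0, 1, 2, 3, 4$ over the basis $(1, x, x^2, x^3)$. We find
$$D = \begin{pmatrix}
0 & 1 & 0 & 0 & 0 \\
0 & 0 & 2 & 0 & 0 \\
0 & 0 & 0 & 3 & 0 \\
0 & 0 & 0 & 0 & 4
\end{pmatrix}.$$
Then, if $P$ denotes the polynomial
$$P = 3x^4 - 5x^3 + x^2 - 7x + 5,$$
we have
$$dP = 12x^3 - 15x^2 + 2x - 7.$$
The polynomial $P$ is represented by the vector $(5, -7, 1, -5, 3)$, the polynomial $dP$ is represented by the vector $(-7, 2, -15, 12)$, and we have
$$\begin{pmatrix}
0 & 1 & 0 & 0 & 0 \\
0 & 0 & 2 & 0 & 0 \\
0 & 0 & 0 & 3 & 0 \\
0 & 0 & 0 & 0 & 4
\end{pmatrix}
\begin{pmatrix} 5 \\ -7 \\ 1 \\ -5 \\ 3 \end{pmatrix}
=
\begin{pmatrix} -7 \\ 2 \\ -15 \\ 12 \end{pmatrix},$$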
as expected! The kernel (nullspace) of d consists of the polynomials of degree 0, that is, the
constant polynomials. Therefore dim(Ker d) = 1, and from
dim(E) = dim(Ker d) + dim(Im d)
(see Theorem 4.11), we get dim(Im d) = 4 (since dim(E) = 5).
For fun, let us figure out the linear map from the vector space $\mathbb{R}[X]_3$ to the vector space $\mathbb{R}[X]_4$ given by integration (finding the primitive, or anti-derivative) of $x^i$, for $i = 0, 1, 2, 3$. The $5 \times 4$ matrix $S$ representing $\int$ with respect to the same bases as before is
$$S = \begin{pmatrix}
0 & 0 & 0 & 0 \\
1 & 0 & 0 & 0 \\
0 & 1/2 & 0 & 0 \\
0 & 0 & 1/3 & 0 \\
0 & 0 & 0 & 1/4
\end{pmatrix}.$$
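We verify that $DS = I_4$,
$$\begin{pmatrix}
0 & 1 & 0 & 0 & 0 \\
0 & 0 & 2 & 0 & 0 \\
0 & 0 & 0 & 3 & 0 \\
0 & 0 & 0 & 0 & 4
\end{pmatrix}
\begin{pmatrix}
0 & 0 & 0 & 0 \\
1 & 0 & 0 & 0 \\
0 & 1/2 & 0 & 0 \\
0 & 0 & 1/3 & 0 \\
0 & 0 & 0 & 1/4
\end{pmatrix}
=
\begin{pmatrix}
1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 \\
0 & 0 & 1 & 0 \\
0 & 0 & 0 & 1
\end{pmatrix}.$$
On the other hand, $SD \ne I_5$; instead,
$$\begin{pmatrix}
0 & 0 & 0 & 0 \\
1 & 0 & 0 & 0 \\
0 & 1/2 & 0 & 0 \\
0 & 0 & 1/3 & 0 \\
0 & 0 & 0 & 1/4
\end{pmatrix}
\begin{pmatrix}
0 & 1 & 0 & 0 & 0 \\
0 & 0 & 2 & 0 & 0 \\
0 & 0 & 0 & 3 & 0 \\
0 & 0 & 0 & 0 & 4
\end{pmatrix}
=
\begin{pmatrix}
0 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 1
\end{pmatrix}.$$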
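The derivative and integration matrices are small enough to check by machine. The following Python sketch uses exact fractions; the helper names `mat_mul` and `mat_vec` are hypothetical, and the matrices are the $D$ and $S$ of the example, with respect to the monomial bases.

```python
from fractions import Fraction as F

def mat_mul(a, b):
    return [[sum(row[k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for row in a]

def mat_vec(a, x):
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in a]

# Derivative R[X]_4 -> R[X]_3 over the bases (1, x, ..., x^4) and (1, x, x^2, x^3).
D = [[0, 1, 0, 0, 0],
     [0, 0, 2, 0, 0],
     [0, 0, 0, 3, 0],
     [0, 0, 0, 0, 4]]

# Integration R[X]_3 -> R[X]_4 over the same bases (constant of integration 0).
S = [[0, 0, 0, 0],
     [1, 0, 0, 0],
     [0, F(1, 2), 0, 0],
     [0, 0, F(1, 3), 0],
     [0, 0, 0, F(1, 4)]]

P = [5, -7, 1, -5, 3]      # 3x^4 - 5x^3 + x^2 - 7x + 5, lowest degree first
dP = mat_vec(D, P)         # coordinates of the derivative

DS = mat_mul(D, S)         # 4 x 4
SD = mat_mul(S, D)         # 5 x 5
print(dP, DS, SD)
```

The product `SD` fails to be the identity only in its first diagonal entry: differentiating and then integrating loses the constant term.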
where $M(x)$ is the column vector associated with the vector $x$ and $M(g(x))$ is the column vector associated with $g(x)$, as explained in Definition 3.3.
Thus, $M\colon \mathrm{Hom}(E, F) \to M_{n,p}$ is an isomorphism of vector spaces, and when $p = n$ and the basis $(v_1, \ldots, v_n)$ is identical to the basis $(u_1, \ldots, u_p)$, $M\colon \mathrm{Hom}(E, E) \to M_n$ is an isomorphism of rings.
Proof. That $M(g(x)) = M(g) M(x)$ was shown just before stating the proposition, using identity (1). The identities $M(g + h) = M(g) + M(h)$ and $M(\lambda g) = \lambda M(g)$ are straightforward, and $M(f \circ g) = M(f) M(g)$ follows from (4) and the definition of matrix multiplication. The mapping $M\colon \mathrm{Hom}(E, F) \to M_{n,p}$ is clearly injective, and since every matrix defines a linear map, it is also surjective, and thus bijective. In view of the above identities, it is an isomorphism (and similarly for $M\colon \mathrm{Hom}(E, E) \to M_n$).
In view of Proposition 3.2, it seems preferable to represent vectors from a vector space of finite dimension as column vectors rather than row vectors. Thus, from now on, we will denote vectors of $\mathbb{R}^n$ (or more generally, of $K^n$) as column vectors.
It is important to observe that the isomorphism $M\colon \mathrm{Hom}(E, F) \to M_{n,p}$ given by Proposition 3.2 depends on the choice of the bases $(u_1, \ldots, u_p)$ and $(v_1, \ldots, v_n)$, and similarly for the isomorphism $M\colon \mathrm{Hom}(E, E) \to M_n$, which depends on the choice of the basis $(u_1, \ldots, u_n)$. Thus, it would be useful to know how a change of basis affects the representation of a linear map $f\colon E \to F$ as a matrix. The following simple proposition is needed.
Proposition 3.3. Let $E$ be a vector space, and let $(u_1, \ldots, u_n)$ be a basis of $E$. For every family $(v_1, \ldots, v_n)$, let $P = (a_{ij})$ be the matrix defined such that $v_j = \sum_{i=1}^{n} a_{ij} u_i$. The matrix $P$ is invertible iff $(v_1, \ldots, v_n)$ is a basis of $E$.
Proof. Note that we have $P = M(f)$, the matrix associated with the unique linear map $f\colon E \to E$ such that $f(u_i) = v_i$. By Proposition 2.16, $f$ is bijective iff $(v_1, \ldots, v_n)$ is a basis of $E$. Furthermore, it is obvious that the identity matrix $I_n$ is the matrix associated with the identity $\mathrm{id}\colon E \to E$ w.r.t. any basis. If $f$ is an isomorphism, then $f \circ f^{-1} = f^{-1} \circ f = \mathrm{id}$, and by Proposition 3.2, we get $M(f) M(f^{-1}) = M(f^{-1}) M(f) = I_n$, showing that $P$ is invertible and that $M(f^{-1}) = P^{-1}$.
Proposition 3.3 suggests the following definition.
Definition 3.4. Given a vector space $E$ of dimension $n$, for any two bases $(u_1, \ldots, u_n)$ and $(v_1, \ldots, v_n)$ of $E$, let $P = (a_{ij})$ be the invertible matrix defined such that
$$v_j = \sum_{i=1}^{n} a_{ij} u_i,$$
which is also the matrix of the identity $\mathrm{id}\colon E \to E$ with respect to the bases $(v_1, \ldots, v_n)$ and $(u_1, \ldots, u_n)$, in that order. Indeed, we express each $\mathrm{id}(v_j) = v_j$ over the basis $(u_1, \ldots, u_n)$. The coefficients $a_{1j}, a_{2j}, \ldots, a_{nj}$ of $v_j$ over the basis $(u_1, \ldots, u_n)$ form the $j$th column of the matrix $P$ shown below:
$$\begin{array}{c|cccc}
 & v_1 & v_2 & \cdots & v_n \\ \hline
u_1 & a_{11} & a_{12} & \cdots & a_{1n} \\
u_2 & a_{21} & a_{22} & \cdots & a_{2n} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
u_n & a_{n1} & a_{n2} & \cdots & a_{nn}
\end{array}$$
The matrix $P$ is called the change of basis matrix from $(u_1, \ldots, u_n)$ to $(v_1, \ldots, v_n)$.
Clearly, the change of basis matrix from $(v_1, \ldots, v_n)$ to $(u_1, \ldots, u_n)$ is $P^{-1}$. Since $P = (a_{ij})$ is the matrix of the identity $\mathrm{id}\colon E \to E$ with respect to the bases $(v_1, \ldots, v_n)$ and $(u_1, \ldots, u_n)$, given any vector $x \in E$, if $x = x_1 u_1 + \cdots + x_n u_n$ over the basis $(u_1, \ldots, u_n)$ and $x = x'_1 v_1 + \cdots + x'_n v_n$ over the basis $(v_1, \ldots, v_n)$, from Proposition 3.2, we have
$$\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}
=
\begin{pmatrix}
a_{11} & \cdots & a_{1n} \\
\vdots & \ddots & \vdots \\
a_{n1} & \cdots & a_{nn}
\end{pmatrix}
\begin{pmatrix} x'_1 \\ \vdots \\ x'_n \end{pmatrix},$$
showing that the old coordinates $(x_i)$ of $x$ (over $(u_1, \ldots, u_n)$) are expressed in terms of the new coordinates $(x'_i)$ of $x$ (over $(v_1, \ldots, v_n)$).
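Now we face the painful task of assigning a good notation incorporating the bases $\mathcal{U} = (u_1, \ldots, u_n)$ and $\mathcal{V} = (v_1, \ldots, v_n)$ into the notation for the change of basis matrix from $\mathcal{U}$ to $\mathcal{V}$. Because the change of basis matrix from $\mathcal{U}$ to $\mathcal{V}$ is the matrix of the identity map $\mathrm{id}_E$ with respect to the bases $\mathcal{V}$ and $\mathcal{U}$ in that order, we could denote it by $M_{\mathcal{V},\mathcal{U}}(\mathrm{id})$ (Meyer [80] uses the notation $[I]_{\mathcal{V},\mathcal{U}}$), which we abbreviate as
$$P_{\mathcal{V},\mathcal{U}}.$$
Note that
$$P_{\mathcal{U},\mathcal{V}} = P_{\mathcal{V},\mathcal{U}}^{-1}.$$
Then, if we write $x_{\mathcal{U}} = (x_1, \ldots, x_n)$ for the old coordinates of $x$ with respect to the basis $\mathcal{U}$ and $x_{\mathcal{V}} = (x'_1, \ldots, x'_n)$ for the new coordinates of $x$ with respect to the basis $\mathcal{V}$, we have
$$x_{\mathcal{U}} = P_{\mathcal{V},\mathcal{U}}\, x_{\mathcal{V}}, \qquad x_{\mathcal{V}} = P_{\mathcal{V},\mathcal{U}}^{-1}\, x_{\mathcal{U}}.$$
The above may look backward, but remember that the matrix $M_{\mathcal{U},\mathcal{V}}(f)$ takes input expressed over the basis $\mathcal{U}$ to output expressed over the basis $\mathcal{V}$. Consequently, $P_{\mathcal{V},\mathcal{U}}$ takes input expressed over the basis $\mathcal{V}$ to output expressed over the basis $\mathcal{U}$, and $x_{\mathcal{U}} = P_{\mathcal{V},\mathcal{U}}\, x_{\mathcal{V}}$ matches this point of view!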
Beware that some authors (such as Artin [4]) define the change of basis matrix from $\mathcal{U}$ to $\mathcal{V}$ as $P_{\mathcal{U},\mathcal{V}} = P_{\mathcal{V},\mathcal{U}}^{-1}$. Under this point of view, the old basis $\mathcal{U}$ is expressed in terms of the new basis $\mathcal{V}$. We find this a bit unnatural. Also, in practice, it seems that the new basis is often expressed in terms of the old basis, rather than the other way around.
Since the matrix $P = P_{\mathcal{V},\mathcal{U}}$ expresses the new basis $(v_1, \ldots, v_n)$ in terms of the old basis $(u_1, \ldots, u_n)$, we observe that the coordinates $(x_i)$ of a vector $x$ vary in the opposite direction of the change of basis. For this reason, vectors are sometimes said to be contravariant. However, this expression does not make sense! Indeed, a vector is an intrinsic quantity that does not depend on a specific basis. What makes sense is that the coordinates of a vector vary in a contravariant fashion.
Let us consider some concrete examples of change of bases.
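Example 3.2. Let $E = F = \mathbb{R}^2$, with $u_1 = (1, 0)$, $u_2 = (0, 1)$, $v_1 = (1, 1)$ and $v_2 = (-1, 1)$. The change of basis matrix $P$ from the basis $\mathcal{U} = (u_1, u_2)$ to the basis $\mathcal{V} = (v_1, v_2)$ is
$$P = \begin{pmatrix} 1 & -1 \\ 1 & 1 \end{pmatrix}$$
and its inverse is
$$P^{-1} = \begin{pmatrix} 1/2 & 1/2 \\ -1/2 & 1/2 \end{pmatrix}.$$
The old coordinates $(x_1, x_2)$ with respect to $(u_1, u_2)$ are expressed in terms of the new coordinates $(x'_1, x'_2)$ with respect to $(v_1, v_2)$ by
$$\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} 1 & -1 \\ 1 & 1 \end{pmatrix} \begin{pmatrix} x'_1 \\ x'_2 \end{pmatrix},$$
and the new coordinates $(x'_1, x'_2)$ with respect to $(v_1, v_2)$ are expressed in terms of the old coordinates $(x_1, x_2)$ with respect to $(u_1, u_2)$ by
$$\begin{pmatrix} x'_1 \\ x'_2 \end{pmatrix} = \begin{pmatrix} 1/2 & 1/2 \\ -1/2 & 1/2 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}.$$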
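A quick numerical check of this change of basis, as a Python sketch. It assumes $v_1 = (1, 1)$ and $v_2 = (-1, 1)$ (the two vectors must differ for $\mathcal{V}$ to be a basis); the helper `mat_vec` and the sample coordinates are hypothetical.

```python
from fractions import Fraction as F

def mat_vec(a, x):
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in a]

# Columns of P are v1 = (1, 1) and v2 = (-1, 1) expressed over (u1, u2).
P = [[1, -1],
     [1, 1]]
P_inv = [[F(1, 2), F(1, 2)],
         [F(-1, 2), F(1, 2)]]

x_new = [3, 1]                     # hypothetical coordinates over (v1, v2)
x_old = mat_vec(P, x_new)          # old coordinates over (u1, u2)
x_back = mat_vec(P_inv, x_old)     # new coordinates recovered from the old ones

# Direct check: 3*v1 + 1*v2 computed componentwise.
direct = [3 * 1 + 1 * (-1), 3 * 1 + 1 * 1]
print(x_old, direct, x_back)
```

Note the direction: $P$ turns new coordinates into old ones, and $P^{-1}$ goes the other way.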
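Example 3.3. Let $E = F = \mathbb{R}[X]_3$ be the set of polynomials of degree at most 3, and consider the bases $\mathcal{U} = (1, x, x^2, x^3)$ and $\mathcal{V} = (B_0^3(x), B_1^3(x), B_2^3(x), B_3^3(x))$, where $B_0^3(x), B_1^3(x), B_2^3(x), B_3^3(x)$ are the Bernstein polynomials of degree 3, given by
$$B_0^3(x) = (1 - x)^3, \quad B_1^3(x) = 3(1 - x)^2 x, \quad B_2^3(x) = 3(1 - x) x^2, \quad B_3^3(x) = x^3.$$
By expanding the Bernstein polynomials, we find that the change of basis matrix $P_{\mathcal{V},\mathcal{U}}$ is given by
$$P_{\mathcal{V},\mathcal{U}} = \begin{pmatrix}
1 & 0 & 0 & 0 \\
-3 & 3 & 0 & 0 \\
3 & -6 & 3 & 0 \\
-1 & 3 & -3 & 1
\end{pmatrix}.$$
We also find that the inverse of $P_{\mathcal{V},\mathcal{U}}$ is
$$P_{\mathcal{V},\mathcal{U}}^{-1} = \begin{pmatrix}
1 & 0 & 0 & 0 \\
1 & 1/3 & 0 & 0 \\
1 & 2/3 & 1/3 & 0 \\
1 & 1 & 1 & 1
\end{pmatrix}.$$
Therefore, the coordinates of the polynomial $2x^3 - x + 1$ over the basis $\mathcal{V}$ are
$$\begin{pmatrix} 1 \\ 2/3 \\ 1/3 \\ 2 \end{pmatrix}
=
\begin{pmatrix}
1 & 0 & 0 & 0 \\
1 & 1/3 & 0 & 0 \\
1 & 2/3 & 1/3 & 0 \\
1 & 1 & 1 & 1
\end{pmatrix}
\begin{pmatrix} 1 \\ -1 \\ 0 \\ 2 \end{pmatrix},$$
and so
$$2x^3 - x + 1 = B_0^3(x) + \frac{2}{3} B_1^3(x) + \frac{1}{3} B_2^3(x) + 2 B_3^3(x).$$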
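The Bernstein coordinates can be double-checked with exact arithmetic. This Python sketch is an illustration only: the helper names are hypothetical, the matrices are the change of basis matrix of the example and its inverse, and the evaluation point $x = 1/2$ is an arbitrary choice.

```python
from fractions import Fraction as F

def mat_vec(a, x):
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in a]

# Column j holds the monomial coefficients of the Bernstein polynomial B_j^3.
P = [[1, 0, 0, 0],
     [-3, 3, 0, 0],
     [3, -6, 3, 0],
     [-1, 3, -3, 1]]
P_inv = [[1, 0, 0, 0],
         [1, F(1, 3), 0, 0],
         [1, F(2, 3), F(1, 3), 0],
         [1, 1, 1, 1]]

poly_U = [1, -1, 0, 2]              # 2x^3 - x + 1 over (1, x, x^2, x^3)
coords_V = mat_vec(P_inv, poly_U)   # coordinates over the Bernstein basis
back = mat_vec(P, coords_V)         # should give poly_U again

def bernstein3(t):
    """Values of B_0^3, ..., B_3^3 at t."""
    return [(1 - t) ** 3, 3 * (1 - t) ** 2 * t, 3 * (1 - t) * t ** 2, t ** 3]

t = F(1, 2)
via_bernstein = sum(c * b for c, b in zip(coords_V, bernstein3(t)))
via_monomials = sum(c * t ** i for i, c in enumerate(poly_U))
print(coords_V, via_bernstein, via_monomials)
```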
Our next example is the Haar wavelets, a fundamental tool in signal processing.
3.2
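The Haar basis vectors of $\mathbb{R}^4$ are
$$w_1 = \begin{pmatrix} 1 \\ 1 \\ 1 \\ 1 \end{pmatrix}, \quad
w_2 = \begin{pmatrix} 1 \\ 1 \\ -1 \\ -1 \end{pmatrix}, \quad
w_3 = \begin{pmatrix} 1 \\ -1 \\ 0 \\ 0 \end{pmatrix}, \quad
w_4 = \begin{pmatrix} 0 \\ 0 \\ 1 \\ -1 \end{pmatrix}.$$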
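Note that these vectors are pairwise orthogonal, so they are indeed linearly independent (we will see this in a later chapter). Let $\mathcal{W} = \{w_1, w_2, w_3, w_4\}$ be the Haar basis, and let $\mathcal{U} = \{e_1, e_2, e_3, e_4\}$ be the canonical basis of $\mathbb{R}^4$. The change of basis matrix $W = P_{\mathcal{W},\mathcal{U}}$ from $\mathcal{U}$ to $\mathcal{W}$ is given by
$$W = \begin{pmatrix}
1 & 1 & 1 & 0 \\
1 & 1 & -1 & 0 \\
1 & -1 & 0 & 1 \\
1 & -1 & 0 & -1
\end{pmatrix},$$
and we easily find that the inverse of $W$ is given by
$$W^{-1} = \begin{pmatrix}
1/4 & 0 & 0 & 0 \\
0 & 1/4 & 0 & 0 \\
0 & 0 & 1/2 & 0 \\
0 & 0 & 0 & 1/2
\end{pmatrix}
\begin{pmatrix}
1 & 1 & 1 & 1 \\
1 & 1 & -1 & -1 \\
1 & -1 & 0 & 0 \\
0 & 0 & 1 & -1
\end{pmatrix}.$$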
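So, the vector $v = (6, 4, 5, 1)$ over the basis $\mathcal{U}$ becomes $c = (c_1, c_2, c_3, c_4)$ over the Haar basis $\mathcal{W}$, with
$$\begin{pmatrix} c_1 \\ c_2 \\ c_3 \\ c_4 \end{pmatrix}
=
\begin{pmatrix}
1/4 & 0 & 0 & 0 \\
0 & 1/4 & 0 & 0 \\
0 & 0 & 1/2 & 0 \\
0 & 0 & 0 & 1/2
\end{pmatrix}
\begin{pmatrix}
1 & 1 & 1 & 1 \\
1 & 1 & -1 & -1 \\
1 & -1 & 0 & 0 \\
0 & 0 & 1 & -1
\end{pmatrix}
\begin{pmatrix} 6 \\ 4 \\ 5 \\ 1 \end{pmatrix}
=
\begin{pmatrix} 4 \\ 1 \\ 1 \\ 2 \end{pmatrix}.$$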
Given a signal $v = (v_1, v_2, v_3, v_4)$, we first transform $v$ into its coefficients $c = (c_1, c_2, c_3, c_4)$ over the Haar basis by computing $c = W^{-1} v$. Observe that
$$c_1 = \frac{v_1 + v_2 + v_3 + v_4}{4}$$
is the overall average value of the signal $v$. The coefficient $c_1$ corresponds to the background of the image (or of the sound). Then, $c_2$ gives the coarse details of $v$, whereas $c_3$ gives the details in the first half of $v$, and $c_4$ gives the details in the second half of $v$.
Reconstruction of the signal consists in computing $v = W c$. The trick for good compression is to throw away some of the coefficients of $c$ (set them to zero), obtaining a compressed signal $\hat{c}$, and still retain enough crucial information so that the reconstructed signal $\hat{v} = W \hat{c}$ looks almost as good as the original signal $v$. Thus, the steps are:
$$\text{input } v \ \longrightarrow\ \text{coefficients } c = W^{-1} v \ \longrightarrow\ \text{compressed } \hat{c} \ \longrightarrow\ \text{compressed } \hat{v} = W \hat{c}.$$
This kind of compression scheme makes modern video conferencing possible.
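The four steps above can be sketched in a few lines of Python for the 4-dimensional Haar basis. The function names and the threshold value are hypothetical; the text does not say which coefficients to discard, so dropping coefficients of absolute value at most 1 is only an assumption made for the demonstration.

```python
# Columns of W are the Haar basis vectors w1, w2, w3, w4 of R^4.
W = [[1, 1, 1, 0],
     [1, 1, -1, 0],
     [1, -1, 0, 1],
     [1, -1, 0, -1]]
SCALE = [1 / 4, 1 / 4, 1 / 2, 1 / 2]   # W^{-1} = diag(SCALE) * W^T

def analyze(v):
    """Haar coefficients c = W^{-1} v."""
    return [SCALE[j] * sum(W[i][j] * v[i] for i in range(4)) for j in range(4)]

def synthesize(c):
    """Reconstruction v = W c."""
    return [sum(W[i][j] * c[j] for j in range(4)) for i in range(4)]

def compress(c, threshold):
    """Set to zero every coefficient of absolute value <= threshold (assumed rule)."""
    return [0 if abs(x) <= threshold else x for x in c]

v = [6, 4, 5, 1]
c = analyze(v)              # coefficients over the Haar basis
c_hat = compress(c, 1)      # keep only the larger coefficients
v_hat = synthesize(c_hat)   # approximate signal
print(c, c_hat, v_hat)
```

With this threshold only the average and the second-half detail survive, so the first half of the reconstructed signal is flat.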
It turns out that there is a faster way to find $c = W^{-1} v$, without actually using $W^{-1}$. This has to do with the multiscale nature of Haar wavelets.
Given the original signal $v = (6, 4, 5, 1)$ shown in Figure 3.1, we compute averages and half differences of consecutive pairs, obtaining the averages $(5, 3)$ and the half differences $(1, 2)$ shown in Figure 3.2. We get the coefficients $c_3 = 1$ and $c_4 = 2$. Then, again we compute averages and half differences, this time of the pair $(5, 3)$, obtaining Figure 3.3. We get the coefficients $c_1 = 4$ and $c_2 = 1$.
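For signals of length $2^3 = 8$, the Haar basis vectors are the columns of the matrix
$$W = \begin{pmatrix}
1 & 1 & 1 & 0 & 1 & 0 & 0 & 0 \\
1 & 1 & 1 & 0 & -1 & 0 & 0 & 0 \\
1 & 1 & -1 & 0 & 0 & 1 & 0 & 0 \\
1 & 1 & -1 & 0 & 0 & -1 & 0 & 0 \\
1 & -1 & 0 & 1 & 0 & 0 & 1 & 0 \\
1 & -1 & 0 & 1 & 0 & 0 & -1 & 0 \\
1 & -1 & 0 & -1 & 0 & 0 & 0 & 1 \\
1 & -1 & 0 & -1 & 0 & 0 & 0 & -1
\end{pmatrix}.$$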
The columns of this matrix are orthogonal and it is easy to see that
$$W^{-1} = \mathrm{diag}(1/8, 1/8, 1/4, 1/4, 1/2, 1/2, 1/2, 1/2)\, W^\top.$$
A pattern is beginning to emerge. It looks like the second Haar basis vector $w_2$ is the mother of all the other basis vectors, except the first, whose purpose is to perform averaging. Indeed, in general, given
$$w_2 = \underbrace{(1, \ldots, 1, -1, \ldots, -1)}_{2^n},$$
the other Haar basis vectors are obtained by a scaling and shifting process. Starting from $w_2$, the scaling process generates the vectors
$$w_3, w_5, w_9, \ldots, w_{2^j + 1}, \ldots, w_{2^{n-1} + 1},$$
such that $w_{2^{j+1}+1}$ is obtained from $w_{2^j+1}$ by forming two consecutive blocks of $1$ and $-1$ of half the size of the blocks in $w_{2^j+1}$, and setting all other entries to zero. Observe that $w_{2^j+1}$ has $2^j$ blocks of $2^{n-j}$ elements. The shifting process consists in shifting the blocks of $1$ and $-1$ in $w_{2^j+1}$ to the right by inserting a block of $(k-1)2^{n-j}$ zeros from the left, with $0 \le j \le n-1$ and $1 \le k \le 2^j$. Thus, we obtain the following formula for $w_{2^j+k}$:
$$w_{2^j+k}(i) = \begin{cases}
0 & 1 \le i \le (k-1)2^{n-j} \\
1 & (k-1)2^{n-j} + 1 \le i \le (k-1)2^{n-j} + 2^{n-j-1} \\
-1 & (k-1)2^{n-j} + 2^{n-j-1} + 1 \le i \le k 2^{n-j} \\
0 & k 2^{n-j} + 1 \le i \le 2^n,
\end{cases}$$
with $0 \le j \le n-1$ and $1 \le k \le 2^j$. Of course
$$w_1 = \underbrace{(1, \ldots, 1)}_{2^n}.$$
The above formulae look a little better if we change our indexing slightly by letting $k$ vary from $0$ to $2^j - 1$ and using the index $j$ instead of $2^j$. In this case, the Haar basis is denoted by
$$w_1, h^0_0, h^1_0, h^1_1, h^2_0, h^2_1, h^2_2, h^2_3, \ldots, h^j_k, \ldots, h^{n-1}_{2^{n-1}-1},$$
and
$$h^j_k(i) = \begin{cases}
0 & 1 \le i \le k 2^{n-j} \\
1 & k 2^{n-j} + 1 \le i \le k 2^{n-j} + 2^{n-j-1} \\
-1 & k 2^{n-j} + 2^{n-j-1} + 1 \le i \le (k+1) 2^{n-j} \\
0 & (k+1) 2^{n-j} + 1 \le i \le 2^n,
\end{cases}$$
with $0 \le j \le n-1$ and $0 \le k \le 2^j - 1$.
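The case formula for the Haar basis vectors translates directly into code. The following Python sketch (function names hypothetical, indices shifted to start at 0) builds each vector with one block of $1$ followed by one block of $-1$, and lists the basis in the order $w_1, h^0_0, h^1_0, h^1_1, \ldots$

```python
def haar_vector(n, j, k):
    """h^j_k as a list of length 2**n, for 0 <= j <= n-1 and 0 <= k <= 2**j - 1."""
    block = 2 ** (n - j)        # length of the support of h^j_k
    half = block // 2
    h = [0] * (2 ** n)
    for i in range(k * block, k * block + half):
        h[i] = 1                # first half of the block
    for i in range(k * block + half, (k + 1) * block):
        h[i] = -1               # second half of the block
    return h

def haar_basis(n):
    """The 2**n Haar basis vectors: the averaging vector, then all h^j_k."""
    basis = [[1] * (2 ** n)]
    for j in range(n):
        for k in range(2 ** j):
            basis.append(haar_vector(n, j, k))
    return basis

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

basis3 = haar_basis(3)
for w in basis3:
    print(w)
```

The vectors returned by `haar_basis(n)` are the columns of the Haar matrix of dimension $2^n$, and their squared lengths are the reciprocals of the diagonal entries that appear in the formula for the inverse.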
It turns out that there is a way to understand these formulae better if we interpret a vector $u = (u_1, \ldots, u_m)$ as a piecewise linear function over the interval $[0, 1)$. We define the function $\mathrm{plf}(u)$ such that
$$\mathrm{plf}(u)(x) = u_i, \qquad \frac{i-1}{m} \le x < \frac{i}{m}, \quad 1 \le i \le m.$$
In words, the function $\mathrm{plf}(u)$ has the value $u_1$ on the interval $[0, 1/m)$, the value $u_2$ on $[1/m, 2/m)$, etc., and the value $u_m$ on the interval $[(m-1)/m, 1)$. For example, the piecewise linear function associated with the vector
$$u = (2.4, 2.2, 2.15, 2.05, 6.8, 2.8, -1.1, -1.3)$$
is shown in Figure 3.4.
Then, each basis vector $h^j_k$ corresponds to the function
$$\psi^j_k = \mathrm{plf}(h^j_k).$$
Consider the function $\psi$ given by
$$\psi(x) = \begin{cases}
1 & \text{if } 0 \le x < 1/2 \\
-1 & \text{if } 1/2 \le x < 1 \\
0 & \text{otherwise,}
\end{cases}$$
whose graph is shown in Figure 3.5. Then, it is easy to see that $\psi^j_k$ is given by the simple expression
$$\psi^j_k(x) = \psi(2^j x - k), \qquad 0 \le j \le n-1, \quad 0 \le k \le 2^j - 1.$$
The above formula makes it clear that $\psi^j_k$ is obtained from $\psi$ by scaling and shifting. The function $\phi^0_0 = \mathrm{plf}(w_1)$ is the piecewise linear function with the constant value 1 on $[0, 1)$, and the functions $\psi^j_k$ together with $\phi^0_0$ are known as the Haar wavelets.
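Rather than using $W^{-1}$ to convert a vector $u$ to a vector $c$ of coefficients over the Haar basis, and the matrix $W$ to reconstruct the vector $u$ from its Haar coefficients $c$, we can use faster algorithms that use averaging and differencing.
If $c$ is a vector of Haar coefficients of dimension $2^n$, we compute the sequence of vectors $u^0, u^1, \ldots, u^n$ as follows:
$$\begin{aligned}
u^0 &= c \\
u^{j+1} &= u^j \\
u^{j+1}(2i - 1) &= u^j(i) + u^j(2^j + i) \\
u^{j+1}(2i) &= u^j(i) - u^j(2^j + i),
\end{aligned}$$
for $j = 0, \ldots, n-1$ and $i = 1, \ldots, 2^j$. The reconstructed vector (signal) is $u = u^n$.
If $u$ is a vector of dimension $2^n$, we compute the sequence of vectors $c^n, c^{n-1}, \ldots, c^0$ as follows:
$$\begin{aligned}
c^n &= u \\
c^j &= c^{j+1} \\
c^j(i) &= (c^{j+1}(2i - 1) + c^{j+1}(2i))/2 \\
c^j(2^j + i) &= (c^{j+1}(2i - 1) - c^{j+1}(2i))/2,
\end{aligned}$$
for $j = n-1, \ldots, 0$ and $i = 1, \ldots, 2^j$. The vector over the Haar basis is $c = c^0$.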
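We leave it as an exercise to implement the above programs in Matlab using two variables $u$ and $c$, and by building iteratively $2^j$. Here is an example of the conversion of a vector to its Haar coefficients for $n = 3$.
Given the sequence $u = (31, 29, 23, 17, -6, -8, -2, -4)$, we get the sequence
$$\begin{aligned}
c^3 &= (31, 29, 23, 17, -6, -8, -2, -4) \\
c^2 &= (30, 20, -7, -3, 1, 3, 1, 1) \\
c^1 &= (25, -5, 5, -2, 1, 3, 1, 1) \\
c^0 &= (10, 15, 5, -2, 1, 3, 1, 1),
\end{aligned}$$
so $c = (10, 15, 5, -2, 1, 3, 1, 1)$. Conversely, given $c = (10, 15, 5, -2, 1, 3, 1, 1)$, we get the sequence
$$\begin{aligned}
u^0 &= (10, 15, 5, -2, 1, 3, 1, 1) \\
u^1 &= (25, -5, 5, -2, 1, 3, 1, 1) \\
u^2 &= (30, 20, -7, -3, 1, 3, 1, 1) \\
u^3 &= (31, 29, 23, 17, -6, -8, -2, -4),
\end{aligned}$$
which gives back $u = (31, 29, 23, 17, -6, -8, -2, -4)$.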
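The text suggests Matlab for this exercise; the sketch below uses Python instead, with 0-based indices, and is only one possible way to organize the averaging and differencing passes. The function names `haar_forward` and `haar_inverse` are hypothetical.

```python
def haar_forward(u):
    """Haar coefficients of u (length 2**n) by repeated averaging and differencing."""
    c = list(u)
    size = len(c) // 2                  # plays the role of 2**j, for j = n-1, ..., 0
    while size >= 1:
        nxt = list(c)                   # entries beyond 2*size are left unchanged
        for i in range(size):
            nxt[i] = (c[2 * i] + c[2 * i + 1]) / 2          # average of a pair
            nxt[size + i] = (c[2 * i] - c[2 * i + 1]) / 2   # half difference
        c = nxt
        size //= 2
    return c

def haar_inverse(c):
    """Reconstruct the signal from its Haar coefficients by adding and subtracting."""
    u = list(c)
    size = 1                            # plays the role of 2**j, for j = 0, ..., n-1
    while size < len(u):
        nxt = list(u)
        for i in range(size):
            nxt[2 * i] = u[i] + u[size + i]
            nxt[2 * i + 1] = u[i] - u[size + i]
        u = nxt
        size *= 2
    return u

u = [31, 29, 23, 17, -6, -8, -2, -4]
c = haar_forward(u)
print(c)
print(haar_inverse(c))
```

Each pass touches half as many entries as the previous one, so the total work is linear in the length of the signal, compared with quadratic work for a product with $W^{-1}$.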
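There is another recursive method for constructing the Haar matrix $W_n$ of dimension $2^n$ that makes it clearer why the above algorithms are indeed correct (which nobody seems to prove!). If we split $W_n$ into two $2^n \times 2^{n-1}$ matrices, then the second matrix containing the last $2^{n-1}$ columns of $W_n$ has a very simple structure: it consists of the vector
$$\underbrace{(1, -1, 0, \ldots, 0)}_{2^n}$$
and $2^{n-1} - 1$ shifted copies of it, as illustrated below for $n = 3$:
$$\begin{pmatrix}
1 & 0 & 0 & 0 \\
-1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 \\
0 & -1 & 0 & 0 \\
0 & 0 & 1 & 0 \\
0 & 0 & -1 & 0 \\
0 & 0 & 0 & 1 \\
0 & 0 & 0 & -1
\end{pmatrix}.$$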
Then, we form the $2^n \times 2^{n-2}$ matrix obtained by doubling each column of odd index, which means replacing each such column by a column in which the block of $1$ is doubled and the block of $-1$ is doubled. In general, given a current matrix of dimension $2^n \times 2^j$, we form a $2^n \times 2^{j-1}$ matrix by doubling each column of odd index, which means that we replace each such column by a column in which the block of $1$ is doubled and the block of $-1$ is doubled. We repeat this process $n - 1$ times until we get the vector
$$\underbrace{(1, \ldots, 1, -1, \ldots, -1)}_{2^n}.$$
The first vector is the averaging vector $\underbrace{(1, \ldots, 1)}_{2^n}$. This process is illustrated below for $n = 3$:
$$\begin{pmatrix} 1 \\ 1 \\ 1 \\ 1 \\ -1 \\ -1 \\ -1 \\ -1 \end{pmatrix}
\Longleftarrow
\begin{pmatrix}
1 & 0 \\
1 & 0 \\
-1 & 0 \\
-1 & 0 \\
0 & 1 \\
0 & 1 \\
0 & -1 \\
0 & -1
\end{pmatrix}
\Longleftarrow
\begin{pmatrix}
1 & 0 & 0 & 0 \\
-1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 \\
0 & -1 & 0 & 0 \\
0 & 0 & 1 & 0 \\
0 & 0 & -1 & 0 \\
0 & 0 & 0 & 1 \\
0 & 0 & 0 & -1
\end{pmatrix}$$
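Adding the averaging vector $(1, \ldots, 1)$ as the first column, we obtain
$$W_3 = \begin{pmatrix}
1 & 1 & 1 & 0 & 1 & 0 & 0 & 0 \\
1 & 1 & 1 & 0 & -1 & 0 & 0 & 0 \\
1 & 1 & -1 & 0 & 0 & 1 & 0 & 0 \\
1 & 1 & -1 & 0 & 0 & -1 & 0 & 0 \\
1 & -1 & 0 & 1 & 0 & 0 & 1 & 0 \\
1 & -1 & 0 & 1 & 0 & 0 & -1 & 0 \\
1 & -1 & 0 & -1 & 0 & 0 & 0 & 1 \\
1 & -1 & 0 & -1 & 0 & 0 & 0 & -1
\end{pmatrix}.$$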
Observe that the right block (of size $2^n \times 2^{n-1}$) shows clearly how the detail coefficients in the second half of the vector $c$ are added and subtracted to the entries in the first half of the partially reconstructed vector after $n - 1$ steps.
An important and attractive feature of the Haar basis is that it provides a multiresolution analysis of a signal. Indeed, given a signal $u$, if $c = (c_1, \ldots, c_{2^n})$ is the vector of its Haar coefficients, the coefficients with low index give coarse information about $u$, and the coefficients with high index represent fine information. For example, if $u$ is an audio signal corresponding to a Mozart concerto played by an orchestra, $c_1$ corresponds to the background noise, $c_2$ to the bass, $c_3$ to the first cello, $c_4$ to the second cello, $c_5, c_6, c_7, c_8$ to the violas, then the violins, etc. This multiresolution feature of wavelets can be exploited to compress a signal, that is, to use fewer coefficients to represent it. Here is an example.
Consider the signal
$$u = (2.4, 2.2, 2.15, 2.05, 6.8, 2.8, -1.1, -1.3),$$
whose Haar transform is
$$c = (2, 0.2, 0.1, 3, 0.1, 0.05, 2, 0.1).$$
The piecewise-linear curves corresponding to $u$ and $c$ are shown in Figure 3.6. Since some of the coefficients in $c$ are small (smaller than or equal to 0.2) we can compress $c$ by replacing them by 0. We get
$$c_2 = (2, 0, 0, 3, 0, 0, 2, 0),$$
and the reconstructed signal is
$$u_2 = (2, 2, 2, 2, 7, 3, -1, -1).$$
The piecewise-linear curves corresponding to $u_2$ and $c_2$ are shown in Figure 3.7.
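This compression example can be replayed with exact arithmetic. The Python sketch below takes the last two entries of $u$ to be negative, which is what the stated transform $c$ (whose first entry, the average, is 2) requires. The function names are hypothetical, and the transform is the averaging and differencing scheme described earlier.

```python
from fractions import Fraction

def haar_forward(u):
    """Haar coefficients by repeated averaging and half differencing."""
    c, size = list(u), len(u) // 2
    while size >= 1:
        nxt = list(c)
        for i in range(size):
            nxt[i] = (c[2 * i] + c[2 * i + 1]) / 2
            nxt[size + i] = (c[2 * i] - c[2 * i + 1]) / 2
        c, size = nxt, size // 2
    return c

def haar_inverse(c):
    """Signal reconstructed from Haar coefficients."""
    u, size = list(c), 1
    while size < len(u):
        nxt = list(u)
        for i in range(size):
            nxt[2 * i] = u[i] + u[size + i]
            nxt[2 * i + 1] = u[i] - u[size + i]
        u, size = nxt, size * 2
    return u

# Exact decimal inputs avoid floating-point noise around the threshold 0.2.
u = [Fraction(s) for s in ("2.4", "2.2", "2.15", "2.05", "6.8", "2.8", "-1.1", "-1.3")]
c = haar_forward(u)

threshold = Fraction("0.2")
c2 = [x if abs(x) > threshold else Fraction(0) for x in c]   # drop small coefficients
u2 = haar_inverse(c2)

print([float(x) for x in c])
print([float(x) for x in u2])
```

Only three of the eight coefficients are kept, yet the reconstructed signal stays within 0.4 of the original in every entry.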
Figure 3.9: The compressed signal handel and its Haar transform
matrices (even rectangular) without any extra effort! This allows for the compression of
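digital images. But first, we address the issue of normalization of the Haar coefficients. As we observed earlier, the $2^n \times 2^n$ matrix $W_n$ of Haar basis vectors has orthogonal columns, but its columns do not have unit length. As a consequence, $W_n^\top$ is not the inverse of $W_n$, but rather the matrix
$$W_n^{-1} = D_n W_n^\top$$
with
$$D_n = \mathrm{diag}\Bigl(2^{-n}, \underbrace{2^{-n}}_{2^0}, \underbrace{2^{-(n-1)}, 2^{-(n-1)}}_{2^1}, \underbrace{2^{-(n-2)}, \ldots, 2^{-(n-2)}}_{2^2}, \ldots, \underbrace{2^{-1}, \ldots, 2^{-1}}_{2^{n-1}}\Bigr).$$
Therefore, we define the orthogonal matrix
$$H_n = W_n D_n^{1/2},$$
whose columns are the Haar basis vectors scaled to unit length, where
$$D_n^{1/2} = \mathrm{diag}\Bigl(2^{-n/2}, \underbrace{2^{-n/2}}_{2^0}, \underbrace{2^{-(n-1)/2}, 2^{-(n-1)/2}}_{2^1}, \underbrace{2^{-(n-2)/2}, \ldots, 2^{-(n-2)/2}}_{2^2}, \ldots, \underbrace{2^{-1/2}, \ldots, 2^{-1/2}}_{2^{n-1}}\Bigr).$$
We call $H_n$ the normalized Haar transform matrix. Because $H_n$ is orthogonal, $H_n^{-1} = H_n^\top$. Given a vector (signal) $u$, we call $c = H_n^\top u$ the normalized Haar coefficients of $u$. Then, a moment of reflection shows that we have to slightly modify the algorithms to compute $H_n^\top u$ and $H_n c$ as follows: When computing the sequence of $u^j$s, use
$$\begin{aligned}
u^{j+1}(2i - 1) &= (u^j(i) + u^j(2^j + i))/\sqrt{2} \\
u^{j+1}(2i) &= (u^j(i) - u^j(2^j + i))/\sqrt{2},
\end{aligned}$$
and when computing the sequence of $c^j$s, use
$$\begin{aligned}
c^j(i) &= (c^{j+1}(2i - 1) + c^{j+1}(2i))/\sqrt{2} \\
c^j(2^j + i) &= (c^{j+1}(2i - 1) - c^{j+1}(2i))/\sqrt{2}.
\end{aligned}$$
Note that things are now more symmetric, at the expense of a division by $\sqrt{2}$. However, for long vectors, it turns out that these algorithms are numerically more stable.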
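The effect of normalization can be observed numerically: with division by $\sqrt{2}$ in both directions, the transform preserves the Euclidean norm, as an orthogonal matrix must. The following Python sketch is an illustration under that reading; the function names are hypothetical.

```python
import math

SQRT2 = math.sqrt(2)

def haar_forward_normalized(u):
    """Normalized Haar coefficients: sums and differences divided by sqrt(2)."""
    c, size = list(u), len(u) // 2
    while size >= 1:
        nxt = list(c)
        for i in range(size):
            nxt[i] = (c[2 * i] + c[2 * i + 1]) / SQRT2
            nxt[size + i] = (c[2 * i] - c[2 * i + 1]) / SQRT2
        c, size = nxt, size // 2
    return c

def haar_inverse_normalized(c):
    """Inverse of the normalized transform, again dividing by sqrt(2)."""
    u, size = list(c), 1
    while size < len(u):
        nxt = list(u)
        for i in range(size):
            nxt[2 * i] = (u[i] + u[size + i]) / SQRT2
            nxt[2 * i + 1] = (u[i] - u[size + i]) / SQRT2
        u, size = nxt, size * 2
    return u

def norm(v):
    return math.sqrt(sum(x * x for x in v))

u = [31, 29, 23, 17, -6, -8, -2, -4]
c = haar_forward_normalized(u)
u_back = haar_inverse_normalized(c)
print(norm(u), norm(c))
```

The first coefficient is now the sum of the entries divided by $\sqrt{2^n}$ rather than their average.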
Remark: Some authors (for example, Stollnitz, Derose and Salesin [103]) rescale $c$ by $1/\sqrt{2^n}$ and $u$ by $\sqrt{2^n}$. This is because the norm of the basis functions $\psi^j_k$ is not equal to 1 (under the inner product $\langle f, g \rangle = \int_0^1 f(t) g(t)\, dt$). The normalized basis functions are the functions $\sqrt{2^j}\, \psi^j_k$.
Let us now explain the 2D version of the Haar transform. We describe the version using the matrix $W_n$, the method using $H_n$ being identical (except that $H_n^{-1} = H_n^\top$, but this does not hold for $W_n^{-1}$). Given a $2^m \times 2^n$ matrix $A$, we can first convert the rows of $A$ to their Haar coefficients using the Haar transform $W_n^{-1}$, obtaining a matrix $B$, and then convert the columns of $B$ to their Haar coefficients, using the matrix $W_m^{-1}$. Because columns and rows are exchanged in the first step,
$$B = A (W_n^{-1})^\top,$$
and in the second step $C = W_m^{-1} B$, thus, we have
$$C = W_m^{-1} A (W_n^{-1})^\top = D_m W_m^\top A W_n D_n.$$
In the other direction, given a matrix $C$ of Haar coefficients, we reconstruct the matrix $A$ (the image) by first applying $W_m$ to the columns of $C$, obtaining $B$, and then $W_n^\top$ to the rows of $B$. Therefore
$$A = W_m C W_n^\top.$$
Of course, we don't actually have to invert $W_m$ and $W_n$ and perform matrix multiplications. We just have to use our algorithms using averaging and differencing. Here is an example.
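If
$$A = \begin{pmatrix}
64 & 2 & 3 & 61 & 60 & 6 & 7 & 57 \\
9 & 55 & 54 & 12 & 13 & 51 & 50 & 16 \\
17 & 47 & 46 & 20 & 21 & 43 & 42 & 24 \\
40 & 26 & 27 & 37 & 36 & 30 & 31 & 33 \\
32 & 34 & 35 & 29 & 28 & 38 & 39 & 25 \\
41 & 23 & 22 & 44 & 45 & 19 & 18 & 48 \\
49 & 15 & 14 & 52 & 53 & 11 & 10 & 56 \\
8 & 58 & 59 & 5 & 4 & 62 & 63 & 1
\end{pmatrix},$$
then
$$C = \begin{pmatrix}
32.5 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 4 & -4 & 4 & -4 \\
0 & 0 & 0 & 0 & 4 & -4 & 4 & -4 \\
0 & 0 & 0.5 & 0.5 & 27 & -25 & 23 & -21 \\
0 & 0 & -0.5 & -0.5 & -11 & 9 & -7 & 5 \\
0 & 0 & 0.5 & 0.5 & -5 & 7 & -9 & 11 \\
0 & 0 & -0.5 & -0.5 & 21 & -23 & 25 & -27
\end{pmatrix}.$$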
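As we can see, $C$ has more zero entries than $A$; it is a compressed version of $A$. We can further compress $C$ by setting to 0 all entries of absolute value at most 0.5. Then, we get
$$C_2 = \begin{pmatrix}
32.5 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 4 & -4 & 4 & -4 \\
0 & 0 & 0 & 0 & 4 & -4 & 4 & -4 \\
0 & 0 & 0 & 0 & 27 & -25 & 23 & -21 \\
0 & 0 & 0 & 0 & -11 & 9 & -7 & 5 \\
0 & 0 & 0 & 0 & -5 & 7 & -9 & 11 \\
0 & 0 & 0 & 0 & 21 & -23 & 25 & -27
\end{pmatrix}.$$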
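We find that the reconstructed image is
$$A_2 = \begin{pmatrix}
63.5 & 1.5 & 3.5 & 61.5 & 59.5 & 5.5 & 7.5 & 57.5 \\
9.5 & 55.5 & 53.5 & 11.5 & 13.5 & 51.5 & 49.5 & 15.5 \\
17.5 & 47.5 & 45.5 & 19.5 & 21.5 & 43.5 & 41.5 & 23.5 \\
39.5 & 25.5 & 27.5 & 37.5 & 35.5 & 29.5 & 31.5 & 33.5 \\
31.5 & 33.5 & 35.5 & 29.5 & 27.5 & 37.5 & 39.5 & 25.5 \\
41.5 & 23.5 & 21.5 & 43.5 & 45.5 & 19.5 & 17.5 & 47.5 \\
49.5 & 15.5 & 13.5 & 51.5 & 53.5 & 11.5 & 9.5 & 55.5 \\
7.5 & 57.5 & 59.5 & 5.5 & 3.5 & 61.5 & 63.5 & 1.5
\end{pmatrix},$$
in which every entry differs from the corresponding entry of $A$ by exactly 0.5.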
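The 2D example can be reproduced with the averaging and differencing scheme applied first to rows and then to columns. The Python sketch below is one way to organize this computation, with hypothetical function names; all intermediate values are dyadic fractions, so floating-point arithmetic is exact here.

```python
def haar_forward(u):
    c, size = list(u), len(u) // 2
    while size >= 1:
        nxt = list(c)
        for i in range(size):
            nxt[i] = (c[2 * i] + c[2 * i + 1]) / 2
            nxt[size + i] = (c[2 * i] - c[2 * i + 1]) / 2
        c, size = nxt, size // 2
    return c

def haar_inverse(c):
    u, size = list(c), 1
    while size < len(u):
        nxt = list(u)
        for i in range(size):
            nxt[2 * i] = u[i] + u[size + i]
            nxt[2 * i + 1] = u[i] - u[size + i]
        u, size = nxt, size * 2
    return u

def transpose(m):
    return [list(row) for row in zip(*m)]

def haar2d(a):
    """Transform every row, then every column."""
    b = [haar_forward(row) for row in a]
    return transpose([haar_forward(col) for col in transpose(b)])

def haar2d_inverse(c):
    """Reconstruct every column, then every row."""
    b = transpose([haar_inverse(col) for col in transpose(c)])
    return [haar_inverse(row) for row in b]

A = [[64, 2, 3, 61, 60, 6, 7, 57],
     [9, 55, 54, 12, 13, 51, 50, 16],
     [17, 47, 46, 20, 21, 43, 42, 24],
     [40, 26, 27, 37, 36, 30, 31, 33],
     [32, 34, 35, 29, 28, 38, 39, 25],
     [41, 23, 22, 44, 45, 19, 18, 48],
     [49, 15, 14, 52, 53, 11, 10, 56],
     [8, 58, 59, 5, 4, 62, 63, 1]]

C = haar2d(A)
C2 = [[0 if abs(x) <= 0.5 else x for x in row] for row in C]   # threshold from the text
A2 = haar2d_inverse(C2)
print(A2[0])
```

Thresholding removes sixteen small coefficients, and the reconstruction error is the same in every entry.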
If we use the normalized matrices $H_m$ and $H_n$, then the equations relating the image matrix $A$ and its normalized Haar transform $C$ are
$$C = H_m^\top A H_n, \qquad A = H_m C H_n^\top.$$
The Haar transform can also be used to send large images progressively over the internet.
Indeed, we can start sending the Haar coefficients of the matrix C starting from the coarsest
coefficients (the first column from top down, then the second column, etc.) and at the
receiving end we can start reconstructing the image as soon as we have received enough
data.
Observe that instead of performing all rounds of averaging and differencing on each row
and each column, we can perform partial encoding (and decoding). For example, we can
perform a single round of averaging and differencing for each row and each column. The
result is an image consisting of four subimages, where the top left quarter is a coarser version
of the original, and the rest (consisting of three pieces) contain the finest detail coefficients.
We can also perform two rounds of averaging and differencing, or three rounds, etc. This
process is illustrated on the image shown in Figure 3.12. The result of performing one round,
two rounds, three rounds, and nine rounds of averaging is shown in Figure 3.13. Since our
images have size $512 \times 512$, nine rounds of averaging yield the Haar transform, displayed as
the image on the bottom right. The original image has completely disappeared! We leave it
as a fun exercise to modify the algorithms involving averaging and differencing to perform
k rounds of averaging/differencing. The reconstruction algorithm is a little tricky.
A nice and easily accessible account of wavelets and their uses in image processing and
computer graphics can be found in Stollnitz, DeRose and Salesin [103]. A very detailed
account is given in Strang and Nguyen [106], but this book assumes a fair amount of
background in signal processing.
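We can easily find a basis of $2^n \times 2^n = 2^{2n}$ vectors $w_{ij}$ ($2^n \times 2^n$ matrices) for the linear
map that reconstructs an image from its Haar coefficients, in the sense that for any matrix
$C$ of Haar coefficients, the image matrix $A$ is given by
$$A = \sum_{i=1}^{2^n} \sum_{j=1}^{2^n} c_{ij} w_{ij}.$$
Similarly, there is a basis of matrices $h_{ij}$ for the linear map that computes the Haar
coefficients, in the sense that for any image matrix $A$, its matrix $C$ of Haar coefficients is given by
$$C = \sum_{i=1}^{2^n} \sum_{j=1}^{2^n} a_{ij} h_{ij}.$$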
Figure 3.13: Haar transforms after one, two, three, and nine rounds of averaging
3.3
The effect of a change of bases on the representation of a linear map is described in the
following proposition.
Proposition 3.4. Let $E$ and $F$ be vector spaces, let $\mathcal{U} = (u_1, \dots, u_n)$ and $\mathcal{U}' = (u_1', \dots, u_n')$
be two bases of $E$, and let $\mathcal{V} = (v_1, \dots, v_m)$ and $\mathcal{V}' = (v_1', \dots, v_m')$ be two bases of $F$. Let
$P = P_{\mathcal{U}',\mathcal{U}}$ be the change of basis matrix from $\mathcal{U}$ to $\mathcal{U}'$, and let $Q = P_{\mathcal{V}',\mathcal{V}}$ be the change of
basis matrix from $\mathcal{V}$ to $\mathcal{V}'$. For any linear map $f \colon E \to F$, let $M(f) = M_{\mathcal{U},\mathcal{V}}(f)$ be the matrix
associated to $f$ w.r.t. the bases $\mathcal{U}$ and $\mathcal{V}$, and let $M'(f) = M_{\mathcal{U}',\mathcal{V}'}(f)$ be the matrix associated
to $f$ w.r.t. the bases $\mathcal{U}'$ and $\mathcal{V}'$. We have
$$M'(f) = Q^{-1} M(f) P,$$
or more explicitly
$$M_{\mathcal{U}',\mathcal{V}'}(f) = P_{\mathcal{V}',\mathcal{V}}^{-1} M_{\mathcal{U},\mathcal{V}}(f) P_{\mathcal{U}',\mathcal{U}} = P_{\mathcal{V},\mathcal{V}'} M_{\mathcal{U},\mathcal{V}}(f) P_{\mathcal{U}',\mathcal{U}}.$$
Proof. Since $f \colon E \to F$ can be written as $f = \mathrm{id}_F \circ f \circ \mathrm{id}_E$, since $P$ is the matrix of $\mathrm{id}_E$
w.r.t. the bases $(u_1', \dots, u_n')$ and $(u_1, \dots, u_n)$, and $Q^{-1}$ is the matrix of $\mathrm{id}_F$ w.r.t. the bases
$(v_1, \dots, v_m)$ and $(v_1', \dots, v_m')$, by Proposition 3.2, we have $M'(f) = Q^{-1} M(f) P$.
As a corollary, we get the following result.
Corollary 3.5. Let $E$ be a vector space, and let $\mathcal{U} = (u_1, \dots, u_n)$ and $\mathcal{U}' = (u_1', \dots, u_n')$ be
two bases of $E$. Let $P = P_{\mathcal{U}',\mathcal{U}}$ be the change of basis matrix from $\mathcal{U}$ to $\mathcal{U}'$. For any linear
map $f \colon E \to E$, let $M(f) = M_{\mathcal{U}}(f)$ be the matrix associated to $f$ w.r.t. the basis $\mathcal{U}$, and let
$M'(f) = M_{\mathcal{U}'}(f)$ be the matrix associated to $f$ w.r.t. the basis $\mathcal{U}'$. We have
$$M'(f) = P^{-1} M(f) P,$$
or more explicitly,
$$M_{\mathcal{U}'}(f) = P_{\mathcal{U}',\mathcal{U}}^{-1} M_{\mathcal{U}}(f) P_{\mathcal{U}',\mathcal{U}} = P_{\mathcal{U},\mathcal{U}'} M_{\mathcal{U}}(f) P_{\mathcal{U}',\mathcal{U}}.$$
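Example 3.4. Let $E = \mathbb{R}^2$, $\mathcal{U} = (e_1, e_2)$ where $e_1 = (1, 0)$ and $e_2 = (0, 1)$ are the canonical
basis vectors, let $\mathcal{V} = (v_1, v_2) = (e_1, e_1 - e_2)$, and let
$$A = \begin{pmatrix} 2 & 1 \\ 0 & 1 \end{pmatrix}.$$
The change of basis matrix $P = P_{\mathcal{V},\mathcal{U}}$ from $\mathcal{U}$ to $\mathcal{V}$ is
$$P = \begin{pmatrix} 1 & 1 \\ 0 & -1 \end{pmatrix},$$
and we check that $P^{-1} = P$. Therefore, in the basis $\mathcal{V}$, the matrix representing the linear map $f$ defined by $A$ is
$$A' = P^{-1} A P = P A P = \begin{pmatrix} 1 & 1 \\ 0 & -1 \end{pmatrix} \begin{pmatrix} 2 & 1 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & 1 \\ 0 & -1 \end{pmatrix} = \begin{pmatrix} 2 & 0 \\ 0 & 1 \end{pmatrix} = D,$$
a diagonal matrix. Therefore, in the basis $\mathcal{V}$, it is clear what the action of $f$ is: it is a stretch
by a factor of 2 in the $v_1$ direction and it is the identity in the $v_2$ direction. Observe that $v_1$
and $v_2$ are not orthogonal.
What happened is that we diagonalized the matrix $A$. The diagonal entries 2 and 1 are
the eigenvalues of $A$ (and $f$) and $v_1$ and $v_2$ are corresponding eigenvectors. We will come
back to eigenvalues and eigenvectors later on.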
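The computation of Example 3.4 can be checked mechanically. The sketch below uses exact rational arithmetic from the Python standard library; the helpers `matmul` and `inverse_2x2` are illustrative names introduced here, and the matrices are entered as lists of rows.

```python
from fractions import Fraction


def matmul(X, Y):
    """Product of two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def inverse_2x2(M):
    """Inverse of an invertible 2 x 2 matrix, computed exactly."""
    (a, b), (c, d) = M
    det = Fraction(a * d - b * c)
    if det == 0:
        raise ValueError("matrix is not invertible")
    return [[d / det, -b / det], [-c / det, a / det]]


A = [[2, 1], [0, 1]]
P = [[1, 1], [0, -1]]      # columns are v1 = e1 and v2 = e1 - e2

P_inv = inverse_2x2(P)
D = matmul(matmul(P_inv, A), P)   # matrix of f in the basis (v1, v2)

# v2 is an eigenvector of A for the eigenvalue 1.
v2 = [[1], [-1]]
A_v2 = matmul(A, v2)
```

Here `P_inv` equals `P`, and `D` is the diagonal matrix with entries 2 and 1, in agreement with the example.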
The above example showed that the same linear map can be represented by different
matrices. This suggests making the following definition:
Definition 3.5. Two $n \times n$ matrices $A$ and $B$ are said to be similar iff there is some invertible
matrix $P$ such that
$$B = P^{-1} A P.$$
It is easily checked that similarity is an equivalence relation. From our previous considerations,
two $n \times n$ matrices $A$ and $B$ are similar iff they represent the same linear map with
respect to two different bases. The following surprising fact can be shown: every square
matrix $A$ is similar to its transpose $A^\top$. The proof requires advanced concepts that we will
not discuss in these notes (the Jordan form, or similarity invariants).
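If $\mathcal{U} = (u_1, \dots, u_n)$ and $\mathcal{V} = (v_1, \dots, v_n)$ are two bases of $E$, the change of basis matrix
$$P = P_{\mathcal{V},\mathcal{U}} = \begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
a_{n1} & a_{n2} & \cdots & a_{nn}
\end{pmatrix}$$
from $(u_1, \dots, u_n)$ to $(v_1, \dots, v_n)$ is the matrix whose $j$th column consists of the coordinates
of $v_j$ over the basis $(u_1, \dots, u_n)$, which means that
$$v_j = \sum_{i=1}^{n} a_{ij} u_i.$$
It is natural to extend the matrix notation and to express the vector $(v_1, \dots, v_n)^\top$ in $E^n$ as the
product of a matrix times the vector $(u_1, \dots, u_n)^\top$ in $E^n$, namely as
$$\begin{pmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{pmatrix} =
\begin{pmatrix}
a_{11} & a_{21} & \cdots & a_{n1} \\
a_{12} & a_{22} & \cdots & a_{n2} \\
\vdots & \vdots & \ddots & \vdots \\
a_{1n} & a_{2n} & \cdots & a_{nn}
\end{pmatrix}
\begin{pmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{pmatrix},$$
but notice that the matrix involved is not $P$, but its transpose $P^\top$.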
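This observation has the following consequence: if $\mathcal{U} = (u_1, \dots, u_n)$ and $\mathcal{V} = (v_1, \dots, v_n)$
are two bases of $E$ and if
$$\begin{pmatrix} v_1 \\ \vdots \\ v_n \end{pmatrix} = A \begin{pmatrix} u_1 \\ \vdots \\ u_n \end{pmatrix},$$
that is,
$$v_i = \sum_{j=1}^{n} a_{ij} u_j,$$
then for any vector of $E$ written as
$$\sum_{i=1}^{n} x_i u_i = \sum_{k=1}^{n} y_k v_k,$$
we have
$$\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} = A^\top \begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix},$$
and so
$$\begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix} = (A^\top)^{-1} \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}.$$
It is easy to see that $(A^\top)^{-1} = (A^{-1})^\top$. Also, if $\mathcal{U} = (u_1, \dots, u_n)$, $\mathcal{V} = (v_1, \dots, v_n)$, and
$\mathcal{W} = (w_1, \dots, w_n)$ are three bases of $E$, and if the change of basis matrix from $\mathcal{U}$ to $\mathcal{V}$ is
$P = P_{\mathcal{V},\mathcal{U}}$ and the change of basis matrix from $\mathcal{V}$ to $\mathcal{W}$ is $Q = P_{\mathcal{W},\mathcal{V}}$, then
$$\begin{pmatrix} v_1 \\ \vdots \\ v_n \end{pmatrix} = P^\top \begin{pmatrix} u_1 \\ \vdots \\ u_n \end{pmatrix}, \qquad
\begin{pmatrix} w_1 \\ \vdots \\ w_n \end{pmatrix} = Q^\top \begin{pmatrix} v_1 \\ \vdots \\ v_n \end{pmatrix},$$
so
$$\begin{pmatrix} w_1 \\ \vdots \\ w_n \end{pmatrix} = Q^\top P^\top \begin{pmatrix} u_1 \\ \vdots \\ u_n \end{pmatrix} = (PQ)^\top \begin{pmatrix} u_1 \\ \vdots \\ u_n \end{pmatrix},$$
which means that the change of basis matrix $P_{\mathcal{W},\mathcal{U}}$ from $\mathcal{U}$ to $\mathcal{W}$ is $PQ$. This proves that
$$P_{\mathcal{W},\mathcal{U}} = P_{\mathcal{V},\mathcal{U}} P_{\mathcal{W},\mathcal{V}}.$$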
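The composition rule for change of basis matrices can be illustrated on a small case. In the sketch below, the basis $\mathcal{U}$ is taken to be the canonical basis of $\mathbb{R}^2$, and the matrices `P` and `Q` are arbitrary invertible matrices chosen for illustration; the helper names are assumptions introduced here. Each $w_j$ is expanded by hand as a combination of the $v_i$ and compared with the columns of $PQ$.

```python
def matmul(X, Y):
    """Product of two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def column(M, j):
    return [row[j] for row in M]


P = [[1, 1], [0, -1]]   # column i holds the coordinates of v_i over U
Q = [[2, 1], [1, 1]]    # column j holds the coordinates of w_j over V

# Expand w_j = sum_i Q[i][j] * v_i, with each v_i given by its coordinates over U.
W_over_U = []
for j in range(2):
    w = [0, 0]
    for i in range(2):
        v_i = column(P, i)
        w = [w[k] + Q[i][j] * v_i[k] for k in range(2)]
    W_over_U.append(w)

PQ = matmul(P, Q)
columns_of_PQ = [column(PQ, j) for j in range(2)]
```

The list `columns_of_PQ` coincides with `W_over_U`, which is the statement that the change of basis matrix from $\mathcal{U}$ to $\mathcal{W}$ is $PQ$.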
3.4 Summary
The main concepts and results of this chapter are listed below:
• The representation of linear maps by matrices.
• The vector space of linear maps $\mathrm{Hom}_K(E, F)$.
• The vector space $M_{m,n}(K)$ of $m \times n$ matrices over the field $K$; the ring $M_n(K)$ of $n \times n$ matrices over the field $K$.
• Column vectors, row vectors.
• Matrix operations: addition, scalar multiplication, multiplication.
• The matrix representation mapping $M \colon \mathrm{Hom}(E, F) \to M_{n,p}$ and the representation isomorphism (Proposition 3.2).
• Haar basis vectors and a glimpse at Haar wavelets.
• Change of basis matrix and Proposition 3.4.
Chapter 4
Direct Sums, The Dual Space, Duality
4.1
Before considering linear forms and hyperplanes, we define the notion of direct sum and
prove some simple propositions. There is a subtle point, which is that if we attempt to
define the direct sum $E \coprod F$ of two vector spaces using the cartesian product $E \times F$, we
don't quite get the right notion because elements of $E \times F$ are ordered pairs, but we want
$E \coprod F = F \coprod E$. Thus, we want to think of the elements of $E \coprod F$ as unordered pairs of
elements. It is possible to do so by considering the direct sum of a family $(E_i)_{i \in \{1,2\}}$, and
more generally of a family $(E_i)_{i \in I}$. For simplicity, we begin by considering the case where
$I = \{1, 2\}$.
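Definition 4.1. Given a family $(E_i)_{i \in \{1,2\}}$ of two vector spaces, we define the (external)
direct sum $E_1 \coprod E_2$ (or coproduct) of the family $(E_i)_{i \in \{1,2\}}$ as the set
$$E_1 \coprod E_2 = \{\{\langle 1, u\rangle, \langle 2, v\rangle\} \mid u \in E_1, v \in E_2\},$$
with addition
$$\{\langle 1, u_1\rangle, \langle 2, v_1\rangle\} + \{\langle 1, u_2\rangle, \langle 2, v_2\rangle\} = \{\langle 1, u_1 + u_2\rangle, \langle 2, v_1 + v_2\rangle\},$$
and scalar multiplication
$$\lambda\{\langle 1, u\rangle, \langle 2, v\rangle\} = \{\langle 1, \lambda u\rangle, \langle 2, \lambda v\rangle\}.$$
We define the injections $in_1 \colon E_1 \to E_1 \coprod E_2$ and $in_2 \colon E_2 \to E_1 \coprod E_2$ as the linear maps
defined such that
$$in_1(u) = \{\langle 1, u\rangle, \langle 2, 0\rangle\},$$
and
$$in_2(v) = \{\langle 1, 0\rangle, \langle 2, v\rangle\}.$$
Note that
$$E_2 \coprod E_1 = \{\{\langle 2, v\rangle, \langle 1, u\rangle\} \mid v \in E_2, u \in E_1\} = E_1 \coprod E_2.$$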
Thus, every member $\{\langle 1, u\rangle, \langle 2, v\rangle\}$ of $E_1 \coprod E_2$ can be viewed as an unordered pair consisting
of the two vectors $u$ and $v$, tagged with the index 1 and 2, respectively.
Remark: In fact, $E_1 \coprod E_2$ is just the product $\prod_{i \in \{1,2\}} E_i$ of the family $(E_i)_{i \in \{1,2\}}$.
This is not to be confused with the cartesian product $E_1 \times E_2$. The vector space $E_1 \times E_2$
is the set of all ordered pairs $\langle u, v\rangle$, where $u \in E_1$ and $v \in E_2$, with addition and
multiplication by a scalar defined such that
$$\langle u_1, v_1\rangle + \langle u_2, v_2\rangle = \langle u_1 + u_2, v_1 + v_2\rangle,$$
$$\lambda\langle u, v\rangle = \langle \lambda u, \lambda v\rangle.$$
There is a bijection between $\prod_{i \in \{1,2\}} E_i$ and $E_1 \times E_2$, but as we just saw, elements of
$\prod_{i \in \{1,2\}} E_i$ are certain sets. The product $E_1 \times \cdots \times E_n$ of any number of vector spaces
can also be defined. We will do this shortly.
The following property holds.
Proposition 4.1. Given any two vector spaces $E_1$ and $E_2$, the set $E_1 \coprod E_2$ is a vector
space. For every pair of linear maps $f \colon E_1 \to G$ and $g \colon E_2 \to G$, there is a unique linear
map $f + g \colon E_1 \coprod E_2 \to G$ such that $(f + g) \circ in_1 = f$ and $(f + g) \circ in_2 = g$, as in the
following diagram:
$$\begin{array}{ccc}
E_1 & & \\
{\scriptstyle in_1}\downarrow & \searrow{\scriptstyle f} & \\
E_1 \coprod E_2 & \xrightarrow{\;f + g\;} & G \\
{\scriptstyle in_2}\uparrow & \nearrow{\scriptstyle g} & \\
E_2 & &
\end{array}$$
Proof. Define
$$(f + g)(\{\langle 1, u\rangle, \langle 2, v\rangle\}) = f(u) + g(v),$$
for every $u \in E_1$ and $v \in E_2$. It is immediately verified that $f + g$ is the unique linear map
with the required properties.
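We already noted that $E_1 \coprod E_2$ is in bijection with $E_1 \times E_2$. If we define the projections
$\pi_1 \colon E_1 \coprod E_2 \to E_1$ and $\pi_2 \colon E_1 \coprod E_2 \to E_2$ such that
$$\pi_1(\{\langle 1, u\rangle, \langle 2, v\rangle\}) = u,$$
and
$$\pi_2(\{\langle 1, u\rangle, \langle 2, v\rangle\}) = v,$$
we have the following proposition.
Proposition 4.2. Given any two vector spaces $E_1$ and $E_2$, for every pair of linear maps
$f \colon D \to E_1$ and $g \colon D \to E_2$, there is a unique linear map $f \times g \colon D \to E_1 \coprod E_2$ such that
$\pi_1 \circ (f \times g) = f$ and $\pi_2 \circ (f \times g) = g$, as in the following diagram:
$$\begin{array}{ccc}
 & & E_1 \\
 & \nearrow{\scriptstyle f} & \uparrow{\scriptstyle \pi_1} \\
D & \xrightarrow{\;f \times g\;} & E_1 \coprod E_2 \\
 & \searrow{\scriptstyle g} & \downarrow{\scriptstyle \pi_2} \\
 & & E_2
\end{array}$$
Proof. Define
$$(f \times g)(w) = \{\langle 1, f(w)\rangle, \langle 2, g(w)\rangle\},$$
for every $w \in D$. It is immediately verified that $f \times g$ is the unique linear map with the
required properties.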
Remark: It is a peculiarity of linear algebra that direct sums and products of finite families
are isomorphic. However, this is no longer true for products and sums of infinite families.
When $U, V$ are subspaces of a vector space $E$, letting $i_1 \colon U \to E$ and $i_2 \colon V \to E$ be the
inclusion maps, if $U \coprod V$ is isomorphic to $E$ under the map $i_1 + i_2$ given by Proposition
4.1, we say that $E$ is a direct sum of $U$ and $V$, and we write $E = U \coprod V$ (with a slight abuse
of notation, since $E$ and $U \coprod V$ are only isomorphic). It is also convenient to define the sum
$U_1 + \cdots + U_p$ and the internal direct sum $U_1 \oplus \cdots \oplus U_p$ of any number of subspaces of $E$.
Definition 4.2. Given $p \geq 2$ vector spaces $E_1, \dots, E_p$, the product $F = E_1 \times \cdots \times E_p$ can
be made into a vector space by defining addition and scalar multiplication as follows:
$$(u_1, \dots, u_p) + (v_1, \dots, v_p) = (u_1 + v_1, \dots, u_p + v_p)$$
$$\lambda(u_1, \dots, u_p) = (\lambda u_1, \dots, \lambda u_p),$$
for all $u_i, v_i \in E_i$ and all $\lambda \in K$. With the above addition and multiplication, the vector
space $F = E_1 \times \cdots \times E_p$ is called the direct product of the vector spaces $E_1, \dots, E_p$.
As a special case, when $E_1 = \cdots = E_p = K$, we find again the vector space $F = K^p$.
The projection maps $pr_i \colon E_1 \times \cdots \times E_p \to E_i$ given by
$$pr_i(u_1, \dots, u_p) = u_i$$
are clearly linear. Similarly, the maps $in_i \colon E_i \to E_1 \times \cdots \times E_p$ given by
$$in_i(u_i) = (0, \dots, 0, u_i, 0, \dots, 0)$$
are injective and linear. If $\dim(E_i) = n_i$ and if $(e_{i1}, \dots, e_{i n_i})$ is a basis of $E_i$ for $i = 1, \dots, p$,
then it is easy to see that the $n_1 + \cdots + n_p$ vectors
$$(e_{11}, 0, \dots, 0), \ \dots, \ (e_{1 n_1}, 0, \dots, 0), \ \dots$$
with $u_i \in U_i$ for $i = 1, \dots, p$. It is clear that this map is linear, and so its image is a subspace
of $E$ denoted by
$$U_1 + \cdots + U_p$$
and called the sum of the subspaces $U_1, \dots, U_p$. It is immediately verified that $U_1 + \cdots + U_p$
is the smallest subspace of $E$ containing $U_1, \dots, U_p$. This also implies that $U_1 + \cdots + U_p$ does
not depend on the order of the factors $U_i$; in particular,
$$U_1 + U_2 = U_2 + U_1.$$
If the map $a$ is injective, then $\mathrm{Ker}\, a = 0$, which means that if $u_i \in U_i$ for $i = 1, \dots, p$ and if
$$u_1 + \cdots + u_p = 0,$$
then $u_1 = \cdots = u_p = 0$.
Definition 4.3. For any vector space $E$ and any $p \geq 2$ subspaces $U_1, \dots, U_p$ of $E$, if the
map $a$ defined above is injective, then the sum $U_1 + \cdots + U_p$ is called a direct sum and it is
denoted by
$$U_1 \oplus \cdots \oplus U_p.$$
$$E = U_1 \oplus \cdots \oplus U_p.$$
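$$U_i \cap \Bigl(\sum_{j=1, j \neq i}^{p} U_j\Bigr) = (0), \qquad i = 1, \dots, p.$$
(3) We have
$$U_i \cap \Bigl(\sum_{j=1}^{i-1} U_j\Bigr) = (0), \qquad i = 2, \dots, p.$$
$$u = u_1 + \cdots + u_p$$
for some $u_i \in U_i$ for $i = 1, \dots, p$, we can define the maps $\pi_i \colon E \to U_i$, called projections, by
$$\pi_i(u) = \pi_i(u_1 + \cdots + u_p) = u_i.$$
It is easy to check that these maps are linear and satisfy the following properties:
$$\pi_j \circ \pi_i = \begin{cases} \pi_i & \text{if } i = j \\ 0 & \text{if } i \neq j, \end{cases}$$
$$\pi_1 + \cdots + \pi_p = \mathrm{id}_E.$$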
$$\frac{A + A^\top}{2}, \qquad \frac{A - A^\top}{2}.$$
such that $\bigl(\sum_{i \in I} h_i\bigr) \circ in_i = h_i$, for every $i \in I$.
Remark: When $E_i = E$ for all $i \in I$, we denote $\coprod_{i \in I} E_i$ by $E^{(I)}$. In particular, when
$E_i = K$ for all $i \in I$, we find the vector space $K^{(I)}$ of Definition 2.13.
We also have the following basic proposition about injective or surjective linear maps.
Proposition 4.9. Let $E$ and $F$ be vector spaces, and let $f \colon E \to F$ be a linear map. If
$f \colon E \to F$ is injective, then there is a surjective linear map $r \colon F \to E$ called a retraction,
such that $r \circ f = \mathrm{id}_E$. If $f \colon E \to F$ is surjective, then there is an injective linear map
$s \colon F \to E$ called a section, such that $f \circ s = \mathrm{id}_F$.
Proof. Let $(u_i)_{i \in I}$ be a basis of $E$. Since $f \colon E \to F$ is an injective linear map, by Proposition
2.16, $(f(u_i))_{i \in I}$ is linearly independent in $F$. By Theorem 2.9, there is a basis $(v_j)_{j \in J}$ of $F$,
where $I \subseteq J$, and where $v_i = f(u_i)$ for all $i \in I$. By Proposition 2.16, a linear map $r \colon F \to E$
can be defined such that $r(v_i) = u_i$ for all $i \in I$, and $r(v_j) = w$ for all $j \in (J - I)$, where $w$
is any given vector in $E$, say $w = 0$. Since $r(f(u_i)) = u_i$ for all $i \in I$, by Proposition 2.16,
we have $r \circ f = \mathrm{id}_E$.
$$\mathrm{Ker}\, f \xrightarrow{\;i\;} E \xrightarrow{\;f'\;} \mathrm{Im}\, f,$$
Proposition 4.12. Given a vector space $E$, if $U$ and $V$ are any two subspaces of $E$, then
$$\dim(U) + \dim(V) = \dim(U + V) + \dim(U \cap V),$$
an equation known as Grassmann's relation.
Proof. Recall that $U + V$ is the image of the linear map
$$a \colon U \times V \to E$$
given by
$$a(u, v) = u + v,$$
and that we proved earlier that the kernel $\mathrm{Ker}\, a$ of $a$ is isomorphic to $U \cap V$. By Theorem
4.11,
$$\dim(U \times V) = \dim(\mathrm{Ker}\, a) + \dim(\mathrm{Im}\, a),$$
but $\dim(U \times V) = \dim(U) + \dim(V)$, $\dim(\mathrm{Ker}\, a) = \dim(U \cap V)$, and $\mathrm{Im}\, a = U + V$, so the
Grassmann relation holds.
The Grassmann relation can be very useful to figure out whether two subspaces have a
nontrivial intersection in spaces of dimension $> 3$. For example, it is easy to see that in $\mathbb{R}^5$,
there are subspaces $U$ and $V$ with $\dim(U) = 3$ and $\dim(V) = 2$ such that $U \cap V = (0)$; for
example, let $U$ be generated by the vectors $(1, 0, 0, 0, 0)$, $(0, 1, 0, 0, 0)$, $(0, 0, 1, 0, 0)$, and $V$ be
generated by the vectors $(0, 0, 0, 1, 0)$ and $(0, 0, 0, 0, 1)$. However, we claim that if $\dim(U) = 3$
and $\dim(V) = 3$, then $\dim(U \cap V) \geq 1$. Indeed, by the Grassmann relation, we have
$$\dim(U) + \dim(V) = \dim(U + V) + \dim(U \cap V),$$
namely
$$3 + 3 = 6 = \dim(U + V) + \dim(U \cap V),$$
and since $\dim(U + V) \leq 5$, we must have $\dim(U \cap V) \geq 1$.
$$\dim(U \cap V) = n - 2.$$
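The Grassmann relation gives a practical way to compute $\dim(U \cap V)$ from spanning sets: compute $\dim(U)$, $\dim(V)$, and $\dim(U + V)$ as ranks, and solve for the remaining term. The sketch below does this with exact Gaussian elimination; the functions `rank` and `intersection_dim` are illustrative names introduced here, and the subspace spanned by $(0,0,1,0,0)$, $(0,0,0,1,0)$, $(0,0,0,0,1)$ is an extra example chosen for illustration.

```python
from fractions import Fraction


def rank(rows):
    """Rank of a matrix (list of rows) by Gaussian elimination over the rationals."""
    m = [[Fraction(x) for x in row] for row in rows]
    ncols = len(m[0]) if m else 0
    rk = 0
    for col in range(ncols):
        pivot = next((r for r in range(rk, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rk], m[pivot] = m[pivot], m[rk]
        for r in range(rk + 1, len(m)):
            factor = m[r][col] / m[rk][col]
            m[r] = [x - factor * y for x, y in zip(m[r], m[rk])]
        rk += 1
    return rk


def intersection_dim(U, V):
    """dim(U ∩ V) obtained from the Grassmann relation; U and V are spanning lists."""
    return rank(U) + rank(V) - rank(U + V)   # U + V (list concatenation) spans the sum


U = [(1, 0, 0, 0, 0), (0, 1, 0, 0, 0), (0, 0, 1, 0, 0)]
V = [(0, 0, 0, 1, 0), (0, 0, 0, 0, 1)]
V3 = [(0, 0, 1, 0, 0), (0, 0, 0, 1, 0), (0, 0, 0, 0, 1)]

dim_U_cap_V = intersection_dim(U, V)     # subspaces of dimensions 3 and 2
dim_U_cap_V3 = intersection_dim(U, V3)   # two subspaces of dimension 3 in R^5
```

Note that this uses the relation rather than proving it: the intersection is never computed directly, only its dimension.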
Here is a characterization of direct sums that follows directly from Theorem 4.11.
$$w' - v' = u',$$
$$(w + w') - (v + v') = (u + u'),$$
where $u + u' \in U$. Similarly, if
$$w - v = u$$
where $u \in U$, then we have
$$\lambda w - \lambda v = \lambda u,$$
The notion of rank of a linear map or of a matrix is an important one, both theoretically
and practically, since it is the key to the solvability of linear equations. Recall from Definition
2.15 that the rank $\mathrm{rk}(f)$ of a linear map $f \colon E \to F$ is the dimension $\dim(\mathrm{Im}\, f)$ of the image
subspace $\mathrm{Im}\, f$ of $F$.
We have the following simple proposition.
Proposition 4.16. Given a linear map $f \colon E \to F$, the following properties hold:
(i) $\mathrm{rk}(f) = \mathrm{codim}(\mathrm{Ker}\, f)$.
(ii) $\mathrm{rk}(f) + \dim(\mathrm{Ker}\, f) = \dim(E)$.
(iii) $\mathrm{rk}(f) \leq \min(\dim(E), \dim(F))$.
Proof. Since by Proposition 4.11, $\dim(E) = \dim(\mathrm{Ker}\, f) + \dim(\mathrm{Im}\, f)$, and by definition,
$\mathrm{rk}(f) = \dim(\mathrm{Im}\, f)$, we have $\mathrm{rk}(f) = \mathrm{codim}(\mathrm{Ker}\, f)$. Since $\mathrm{rk}(f) = \dim(\mathrm{Im}\, f)$, (ii) follows
from $\dim(E) = \dim(\mathrm{Ker}\, f) + \dim(\mathrm{Im}\, f)$. As for (iii), since $\mathrm{Im}\, f$ is a subspace of $F$, we have
$\mathrm{rk}(f) \leq \dim(F)$, and since $\mathrm{rk}(f) + \dim(\mathrm{Ker}\, f) = \dim(E)$, we have $\mathrm{rk}(f) \leq \dim(E)$.
The rank of a matrix is defined as follows.
Definition 4.5. Given an $m \times n$ matrix $A = (a_{ij})$ over the field $K$, the rank $\mathrm{rk}(A)$ of the
matrix $A$ is the maximum number of linearly independent columns of $A$ (viewed as vectors
in $K^m$).
In view of Proposition 2.10, the rank of a matrix $A$ is the dimension of the subspace of
$K^m$ generated by the columns of $A$. Let $E$ and $F$ be two vector spaces, and let $(u_1, \dots, u_n)$
be a basis of $E$, and $(v_1, \dots, v_m)$ a basis of $F$. Let $f \colon E \to F$ be a linear map, and let $M(f)$
be its matrix w.r.t. the bases $(u_1, \dots, u_n)$ and $(v_1, \dots, v_m)$. Since the rank $\mathrm{rk}(f)$ of $f$ is the
dimension of $\mathrm{Im}\, f$, which is generated by $(f(u_1), \dots, f(u_n))$, the rank of $f$ is the maximum
number of linearly independent vectors in $(f(u_1), \dots, f(u_n))$, which is equal to the number
of linearly independent columns of $M(f)$, since $F$ and $K^m$ are isomorphic. Thus, we have
$\mathrm{rk}(f) = \mathrm{rk}(M(f))$, for every matrix representing $f$.
We will see later, using duality, that the rank of a matrix $A$ is also equal to the maximal
number of linearly independent rows of $A$.
If $U$ is a hyperplane, then $E = U \oplus V$ for some subspace $V$ of dimension 1. However, a
subspace $V$ of dimension 1 is generated by any nonzero vector $v \in V$, and thus we denote
$V$ by $Kv$, and we write $E = U \oplus Kv$. Clearly, $v \notin U$. Conversely, let $x \in E$ be a vector
such that $x \notin U$ (and thus, $x \neq 0$). We claim that $E = U \oplus Kx$. Indeed, since $U$ is a
hyperplane, we have $E = U \oplus Kv$ for some $v \notin U$ (with $v \neq 0$). Then, $x \in E$ can be written
in a unique way as $x = u + \lambda v$, where $u \in U$, and since $x \notin U$, we must have $\lambda \neq 0$, and
thus, $v = -\lambda^{-1} u + \lambda^{-1} x$. Since $E = U \oplus Kv$, this shows that $E = U + Kx$. Since $x \notin U$,
we have $U \cap Kx = (0)$, and thus $E = U \oplus Kx$.
4.2
We already observed that the field $K$ itself is a vector space (over itself). The vector space
$\mathrm{Hom}(E, K)$ of linear maps from $E$ to the field $K$, the linear forms, plays a particular role.
We take a quick look at the connection between $E$ and $\mathrm{Hom}(E, K)$, its dual space $E^*$. As we
will see shortly, every linear map $f \colon E \to F$ gives rise to a linear map $f^\top \colon F^* \to E^*$, and it
turns out that in a suitable basis, the matrix of $f^\top$ is the transpose of the matrix of $f$. Thus,
the notion of dual space provides a conceptual explanation of the phenomena associated with
transposition. But it does more, because it allows us to view subspaces as solutions of sets
of linear equations and vice-versa.
Consider the following set of two linear equations in $\mathbb{R}^3$,
$$x - y + z = 0$$
$$x - y - z = 0,$$
and let us find out what is their set $V$ of common solutions $(x, y, z) \in \mathbb{R}^3$. By subtracting
the second equation from the first, we get $2z = 0$, and by adding the two equations, we find
that $2(x - y) = 0$, so the set $V$ of solutions is given by
$$y = x$$
$$z = 0.$$
This is a one dimensional subspace of $\mathbb{R}^3$. Geometrically, this is the line of equation $y = x$
in the plane $z = 0$.
Now, why did we say that the above equations are linear? This is because, as functions
of $(x, y, z)$, both maps $f_1 \colon (x, y, z) \mapsto x - y + z$ and $f_2 \colon (x, y, z) \mapsto x - y - z$ are linear. The
set of all such linear functions from $\mathbb{R}^3$ to $\mathbb{R}$ is a vector space; we used this fact to form linear
combinations of the equations $f_1$ and $f_2$. Observe that the dimension of the subspace $V$
is 1. The ambient space has dimension $n = 3$ and there are two independent equations
$f_1, f_2$, so it appears that the dimension $\dim(V)$ of the subspace $V$ defined by $m$ independent
equations is
$$\dim(V) = n - m.$$
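The count $\dim(V) = n - m$ for the two equations above can be checked by computing the rank of the coefficient matrix. This is a minimal sketch; `rank` is an illustrative helper introduced here (Gaussian elimination over the rationals), and the number $m$ of independent equations is taken to be that rank.

```python
from fractions import Fraction


def rank(rows):
    """Rank of a matrix (list of rows) by Gaussian elimination over the rationals."""
    m = [[Fraction(x) for x in row] for row in rows]
    ncols = len(m[0]) if m else 0
    rk = 0
    for col in range(ncols):
        pivot = next((r for r in range(rk, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rk], m[pivot] = m[pivot], m[rk]
        for r in range(rk + 1, len(m)):
            factor = m[r][col] / m[rk][col]
            m[r] = [x - factor * y for x, y in zip(m[r], m[rk])]
        rk += 1
    return rk


# Coefficients of f1(x, y, z) = x - y + z and f2(x, y, z) = x - y - z.
equations = [(1, -1, 1), (1, -1, -1)]
n = 3
m = rank(equations)          # number of independent equations
dim_V = n - m

# The vector (1, 1, 0) lies on the line y = x, z = 0 and satisfies both equations.
solution = (1, 1, 0)
residuals = [sum(c * x for c, x in zip(eq, solution)) for eq in equations]
```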
for every $v \in E$,
We shall see that the map $\mathrm{eval}_E$ is injective, and that it is an isomorphism when $E$ has finite
dimension.
We now formalize the notion of the set $V^0$ of linear equations vanishing on all vectors in
a given subspace $V \subseteq E$, and the notion of the set $U^0$ of common solutions of a given set
$U \subseteq E^*$ of linear equations. The duality theorem (Theorem 4.17) shows that the dimensions
of $V$ and $V^0$, and the dimensions of $U$ and $U^0$, are related in a crucial way. It also shows that,
in finite dimension, the maps $V \mapsto V^0$ and $U \mapsto U^0$ are inverse bijections from subspaces of
$E$ to subspaces of $E^*$.
Definition 4.7. Given a vector space $E$ and its dual $E^*$, we say that a vector $v \in E$ and a
linear form $u^* \in E^*$ are orthogonal if $\langle u^*, v\rangle = 0$. Given a subspace $V$ of $E$ and a subspace $U$
of $E^*$, we say that $V$ and $U$ are orthogonal if $\langle u^*, v\rangle = 0$ for every $u^* \in U$ and every $v \in V$.
Given a subset $V$ of $E$ (resp. a subset $U$ of $E^*$), the orthogonal $V^0$ of $V$ is the subspace $V^0$
of $E^*$ defined such that
$$V^0 = \{u^* \in E^* \mid \langle u^*, v\rangle = 0, \text{ for every } v \in V\}$$
(resp. the orthogonal $U^0$ of $U$ is the subspace $U^0$ of $E$ defined such that
$$U^0 = \{v \in E \mid \langle u^*, v\rangle = 0, \text{ for every } u^* \in U\}).$$
The subspace $V^0 \subseteq E^*$ is also called the annihilator of $V$. The subspace $U^0 \subseteq E$
annihilated by $U \subseteq E^*$ does not have a special name. It seems reasonable to call it the
linear subspace (or linear variety) defined by $U$.
Informally, $V^0$ is the set of linear equations that vanish on $V$, and $U^0$ is the set of common
zeros of all linear equations in $U$.
We can also define $V^0$ by
$$V^0 = \{u^* \in E^* \mid V \subseteq \mathrm{Ker}\, u^*\}$$
and $U^0$ by
$$U^0 = \bigcap_{u^* \in U} \mathrm{Ker}\, u^*.$$
Indeed, if $V_1 \subseteq V_2 \subseteq E$, then for any $f^* \in V_2^0$ we have $f^*(v) = 0$ for all $v \in V_2$, and thus
$f^*(v) = 0$ for all $v \in V_1$, so $f^* \in V_1^0$. Similarly, if $U_1 \subseteq U_2 \subseteq E^*$, then for any $v \in U_2^0$, we
have $f^*(v) = 0$ for all $f^* \in U_2$, so $f^*(v) = 0$ for all $f^* \in U_1$, which means that $v \in U_1^0$.
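Here are some examples. Let $E = M_2(\mathbb{R})$, the space of real $2 \times 2$ matrices, and let $V$ be
the subspace of $M_2(\mathbb{R})$ spanned by the matrices
$$\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \quad \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}, \quad \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix}.$$
We check immediately that the subspace $V$ consists of all matrices of the form
$$\begin{pmatrix} b & a \\ a & c \end{pmatrix},$$
that is, all symmetric matrices. The matrices
$$\begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix}$$
in $V$ satisfy the equation
$$a_{12} - a_{21} = 0,$$
and all scalar multiples of this equation, so $V^0$ is the subspace of $E^*$ spanned by the linear
form given by $u^*(a_{11}, a_{12}, a_{21}, a_{22}) = a_{12} - a_{21}$. We have
$$\dim(V^0) = \dim(E) - \dim(V) = 4 - 3 = 1.$$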
The above example generalizes to $E = M_n(\mathbb{R})$ for any $n \geq 1$, but this time, consider the
space $U$ of linear forms asserting that a matrix $A$ is symmetric; these are the linear forms
spanned by the $n(n - 1)/2$ equations
$$a_{ij} - a_{ji} = 0, \qquad 1 \leq i < j \leq n.$$
Note that there are no constraints on diagonal entries, and half of the equations
$$a_{ij} - a_{ji} = 0, \qquad 1 \leq i \neq j \leq n$$
are redundant. It is easy to check that the equations (linear forms) for which $i < j$ are linearly
independent. To be more precise, let $U$ be the space of linear forms in $E^*$ spanned by the
linear forms
$$u_{ij}^*(a_{11}, \dots, a_{1n}, a_{21}, \dots, a_{2n}, \dots, a_{n1}, \dots, a_{nn}) = a_{ij} - a_{ji}, \qquad 1 \leq i < j \leq n.$$
Then, the set $U^0$ of common solutions of these equations is the space $S(n)$ of symmetric
matrices. This space has dimension
$$\frac{n(n + 1)}{2} = n^2 - \frac{n(n - 1)}{2}.$$
We leave it as an exercise to find a basis of $S(n)$.
If $E = M_n(\mathbb{R})$, consider the subspace $U$ of linear forms in $E^*$ spanned by the linear forms
$$u_{ij}^*(a_{11}, \dots, a_{1n}, a_{21}, \dots, a_{2n}, \dots, a_{n1}, \dots, a_{nn}) = a_{ij} + a_{ji}, \qquad 1 \leq i < j \leq n,$$
$$u_{ii}^*(a_{11}, \dots, a_{1n}, a_{21}, \dots, a_{2n}, \dots, a_{n1}, \dots, a_{nn}) = a_{ii}, \qquad 1 \leq i \leq n.$$
It is easy to see that these linear forms are linearly independent, so $\dim(U) = n(n + 1)/2$.
The space $U^0$ of matrices $A \in M_n(\mathbb{R})$ satisfying all of the above equations is clearly the space
$\mathrm{Skew}(n)$ of skew-symmetric matrices. The dimension of $U^0$ is
$$\frac{n(n - 1)}{2} = n^2 - \frac{n(n + 1)}{2}.$$
We leave it as an exercise to find a basis of Skew(n).
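The two dimension counts above can be checked for small $n$ by writing each linear form as a vector of $n^2$ coefficients and computing the rank of the family. This is a sketch under the assumption that entry $a_{ij}$ is stored at position $in + j$ (0-based indices); `rank`, `form`, and `dims` are illustrative names introduced here.

```python
from fractions import Fraction


def rank(rows):
    """Rank of a matrix (list of rows) by Gaussian elimination over the rationals."""
    m = [[Fraction(x) for x in row] for row in rows]
    ncols = len(m[0]) if m else 0
    rk = 0
    for col in range(ncols):
        pivot = next((r for r in range(rk, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rk], m[pivot] = m[pivot], m[rk]
        for r in range(rk + 1, len(m)):
            factor = m[r][col] / m[rk][col]
            m[r] = [x - factor * y for x, y in zip(m[r], m[rk])]
        rk += 1
    return rk


def form(n, terms):
    """Coefficient vector (length n*n) of the linear form sum of c * a_ij."""
    vec = [0] * (n * n)
    for (i, j), c in terms:
        vec[i * n + j] += c
    return vec


def dims(n):
    """Dimensions of S(n) and Skew(n), computed as n^2 minus the rank of the forms."""
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    sym_forms = [form(n, [((i, j), 1), ((j, i), -1)]) for i, j in pairs]
    skew_forms = ([form(n, [((i, j), 1), ((j, i), 1)]) for i, j in pairs]
                  + [form(n, [((i, i), 1)]) for i in range(n)])
    return n * n - rank(sym_forms), n * n - rank(skew_forms)


results = {n: dims(n) for n in range(1, 5)}
```

For each $n$ tried, the pair returned by `dims(n)` agrees with $(n(n+1)/2,\ n(n-1)/2)$.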
For yet another example, with $E = M_n(\mathbb{R})$, for any $A \in M_n(\mathbb{R})$, consider the linear form
in $E^*$ given by
$$\mathrm{tr}(A) = a_{11} + a_{22} + \cdots + a_{nn},$$
called the trace of $A$. The subspace $U^0$ of $E$ consisting of all matrices $A$ such that $\mathrm{tr}(A) = 0$
is a space of dimension $n^2 - 1$. We leave it as an exercise to find a basis of this space.
The dimension equations
$$\dim(V) + \dim(V^0) = \dim(E)$$
$$\dim(U) + \dim(U^0) = \dim(E)$$
are always true (if $E$ is finite-dimensional). This is part of the duality theorem (Theorem
4.17).
In contrast with the previous examples, given a matrix $A \in M_n(\mathbb{R})$, the equations
asserting that $A^\top A = I$ are not linear constraints. For example, for $n = 2$, we have
$$a_{11}^2 + a_{21}^2 = 1$$
$$a_{12}^2 + a_{22}^2 = 1$$
$$a_{11} a_{12} + a_{21} a_{22} = 0.$$
Remarks:
(1) The notation $V^0$ (resp. $U^0$) for the orthogonal of a subspace $V$ of $E$ (resp. a subspace
$U$ of $E^*$) is not universal. Other authors use the notation $V^\perp$ (resp. $U^\perp$). However,
the notation $V^\perp$ is also used to denote the orthogonal complement of a subspace $V$
with respect to an inner product on a space $E$, in which case $V^\perp$ is a subspace of $E$
and not a subspace of $E^*$ (see Chapter 10). To avoid confusion, we prefer using the
notation $V^0$.
(2) Since linear forms can be viewed as linear equations (at least in finite dimension), given
a subspace (or even a subset) $U$ of $E^*$, we can define the set $Z(U)$ of common zeros of
the equations in $U$ by
$$Z(U) = \{v \in E \mid u^*(v) = 0, \text{ for all } u^* \in U\}.$$
Of course $Z(U) = U^0$, but the notion $Z(U)$ can be generalized to more general kinds
of equations, namely polynomial equations. In this more general setting, $U$ is a set of
polynomials in $n$ variables with coefficients in $K$ (where $n = \dim(E)$). Sets of the form
$Z(U)$ are called algebraic varieties. Linear forms correspond to the special case where
homogeneous polynomials of degree 1 are considered.
If $V$ is a subset of $E$, it is natural to associate with $V$ the set of polynomials in
$K[X_1, \dots, X_n]$ that vanish on $V$. This set, usually denoted $I(V)$, has some special
properties that make it an ideal. If $V$ is a linear subspace of $E$, it is natural to restrict
our attention to the space $V^0$ of linear forms that vanish on $V$, and in this case we
identify $I(V)$ and $V^0$ (although technically, $I(V)$ is no longer an ideal).
For any arbitrary set of polynomials $U \subseteq K[X_1, \dots, X_n]$ (resp. $V \subseteq E$) the relationship
between $I(Z(U))$ and $U$ (resp. $Z(I(V))$ and $V$) is generally not simple, even though
we always have
$$U \subseteq I(Z(U)) \quad (\text{resp. } V \subseteq Z(I(V))).$$
However, when the field $K$ is algebraically closed, then $I(Z(U))$ is equal to the radical
of the ideal $U$, a famous result due to Hilbert known as the Nullstellensatz (see Lang
[67] or Dummit and Foote [32]). The study of algebraic varieties is the main subject
of algebraic geometry, a beautiful but formidable subject. For a taste of algebraic
geometry, see Lang [67] or Dummit and Foote [32].
The duality theorem (Theorem 4.17) shows that the situation is much simpler if we
restrict our attention to linear subspaces; in this case
$$U = I(Z(U)) \quad \text{and} \quad V = Z(I(V)).$$
We claim that $V \subseteq V^{00}$ for every subspace $V$ of $E$, and that $U \subseteq U^{00}$ for every subspace
$U$ of $E^*$.
Indeed, for any $v \in V$, to show that $v \in V^{00}$ we need to prove that $u^*(v) = 0$ for all
$u^* \in V^0$. However, $V^0$ consists of all linear forms $u^*$ such that $u^*(y) = 0$ for all $y \in V$; in
particular, since $v \in V$, $u^*(v) = 0$ for all $u^* \in V^0$, as required.
Similarly, for any $u^* \in U$, to show that $u^* \in U^{00}$ we need to prove that $u^*(v) = 0$ for
all $v \in U^0$. However, $U^0$ consists of all vectors $v$ such that $f^*(v) = 0$ for all $f^* \in U$; in
particular, since $u^* \in U$, $u^*(v) = 0$ for all $v \in U^0$, as required.
We will see shortly that in finite dimension, we have $V = V^{00}$ and $U = U^{00}$.
Given a vector space $E$ and any basis $(u_i)_{i \in I}$ for $E$, we can associate to each $u_i$ a linear
form $u_i^* \in E^*$, and the $u_i^*$ have some remarkable properties.
Definition 4.8. Given a vector space $E$ and any basis $(u_i)_{i \in I}$ for $E$, by Proposition 2.16,
for every $i \in I$, there is a unique linear form $u_i^*$ such that
$$u_i^*(u_j) = \begin{cases} 1 & \text{if } i = j \\ 0 & \text{if } i \neq j, \end{cases}$$
for every $j \in I$. The linear form $u_i^*$ is called the coordinate form of index $i$ w.r.t. the basis
$(u_i)_{i \in I}$.
Given an index set $I$, authors often define the so called Kronecker symbol $\delta_{ij}$, such
that
$$\delta_{ij} = \begin{cases} 1 & \text{if } i = j \\ 0 & \text{if } i \neq j, \end{cases}$$
for all $i, j \in I$. Then, $u_i^*(u_j) = \delta_{ij}$.
The reason for the terminology coordinate form is as follows: If $E$ has finite dimension
and if $(u_1, \dots, u_n)$ is a basis of $E$, for any vector
$$v = \lambda_1 u_1 + \cdots + \lambda_n u_n,$$
we have
$$u_i^*(v) = u_i^*(\lambda_1 u_1 + \cdots + \lambda_n u_n)
= \lambda_1 u_i^*(u_1) + \cdots + \lambda_i u_i^*(u_i) + \cdots + \lambda_n u_i^*(u_n)
= \lambda_i,$$
since $u_i^*(u_j) = \delta_{ij}$. Therefore, $u_i^*$ is the linear function that returns the $i$th coordinate of a
vector expressed over the basis $(u_1, \dots, u_n)$.
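For a concrete basis of $\mathbb{R}^2$, the coordinate forms can be read off from an inverse matrix: if $B$ is the matrix whose columns are the basis vectors, then the rows of $B^{-1}$ are the coefficient vectors of $u_1^*$ and $u_2^*$. The sketch below checks this; the basis $u_1 = (2, 1)$, $u_2 = (1, 1)$ and the helper names are assumptions chosen for illustration.

```python
from fractions import Fraction


def inverse_2x2(M):
    """Inverse of an invertible 2 x 2 matrix, computed exactly."""
    (a, b), (c, d) = M
    det = Fraction(a * d - b * c)
    if det == 0:
        raise ValueError("matrix is not invertible")
    return [[d / det, -b / det], [-c / det, a / det]]


def apply_form(form, vector):
    """Value of the linear form with the given coefficients on a vector."""
    return sum(c * x for c, x in zip(form, vector))


u1, u2 = (2, 1), (1, 1)
B = [[u1[0], u2[0]], [u1[1], u2[1]]]   # basis vectors as columns
u1_star, u2_star = inverse_2x2(B)      # rows of the inverse are the coordinate forms

# Kronecker symbol check: u_i^*(u_j) = 1 if i = j, 0 otherwise.
kronecker = [[apply_form(f, u) for u in (u1, u2)] for f in (u1_star, u2_star)]

# The coordinate forms recover the coordinates of v = 3*u1 + 4*u2.
v = (3 * u1[0] + 4 * u2[0], 3 * u1[1] + 4 * u2[1])
coordinates = [apply_form(u1_star, v), apply_form(u2_star, v)]
```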
Given a vector space $E$ and a subspace $U$ of $E$, by Theorem 2.9, every basis $(u_i)_{i \in I}$ of $U$
can be extended to a basis $(u_j)_{j \in I \cup J}$ of $E$, where $I \cap J = \emptyset$. We have the following important
theorem adapted from E. Artin [3] (Chapter 1).
Theorem 4.17. (Duality theorem) Let $E$ be a vector space. The following properties hold:
(a) For every basis $(u_i)_{i \in I}$ of $E$, the family $(u_i^*)_{i \in I}$ of coordinate forms is linearly independent.
(b) For every subspace $V$ of $E$, we have $V^{00} = V$.
(c) For every subspace $V$ of finite codimension $m$ of $E$, for every subspace $W$ of $E$ such
that $E = V \oplus W$ (where $W$ is of finite dimension $m$), for every basis $(u_i)_{i \in I}$ of $E$ such
that $(u_1, \dots, u_m)$ is a basis of $W$, the family $(u_1^*, \dots, u_m^*)$ is a basis of the orthogonal
$V^0$ of $V$ in $E^*$, so that
$$\dim(V^0) = \mathrm{codim}(V).$$
Furthermore, we have $V^{00} = V$.
$$\sum_{i \in I} \lambda_i u_i^* = 0,$$
for a family $(\lambda_i)_{i \in I}$ (of scalars in $K$). Since $(\lambda_i)_{i \in I}$ has finite support, there is a finite subset
$J$ of $I$ such that $\lambda_i = 0$ for all $i \in I - J$, and we have
$$\sum_{j \in J} \lambda_j u_j^* = 0.$$
Applying the linear form $\sum_{j \in J} \lambda_j u_j^*$ to each $u_j$ ($j \in J$), by Definition 4.8, since $u_i^*(u_j) = 1$ if
$i = j$ and 0 otherwise, we get $\lambda_j = 0$ for all $j \in J$, that is $\lambda_i = 0$ for all $i \in I$ (by definition
of $J$ as the support). Thus, $(u_i^*)_{i \in I}$ is linearly independent.
(b) Clearly, we have $V \subseteq V^{00}$. If $V \neq V^{00}$, then let $(u_i)_{i \in I \cup J}$ be a basis of $V^{00}$ such that
$(u_i)_{i \in I}$ is a basis of $V$ (where $I \cap J = \emptyset$). Since $V \neq V^{00}$, $u_{j_0} \in V^{00}$ for some $j_0 \in J$ (and
thus, $j_0 \notin I$). Since $u_{j_0} \in V^{00}$, $u_{j_0}$ is orthogonal to every linear form in $V^0$. Now, we have
$u_{j_0}^*(u_i) = 0$ for all $i \in I$, and thus $u_{j_0}^* \in V^0$. However, $u_{j_0}^*(u_{j_0}) = 1$, contradicting the fact
that $u_{j_0}$ is orthogonal to every linear form in $V^0$. Thus, $V = V^{00}$.
(c) Let $J = I - \{1, \dots, m\}$. Every linear form $f^* \in V^0$ is orthogonal to every $u_j$, for
$j \in J$, and thus, $f^*(u_j) = 0$, for all $j \in J$. For such a linear form $f^* \in V^0$, let
$$g^* = f^*(u_1) u_1^* + \cdots + f^*(u_m) u_m^*.$$
for every $v \in E$, is a linear map, and that its kernel $\mathrm{Ker}\, h$ is precisely $U^0$. Then, by
Proposition 4.11,
$$E \approx \mathrm{Ker}(h) \oplus \mathrm{Im}\, h = U^0 \oplus \mathrm{Im}\, h,$$
and since $\dim(\mathrm{Im}\, h) \leq m$, we deduce that $U^0$ is a subspace of $E$ of finite codimension at
most $m$, and by (c), we have $\dim(U^{00}) = \mathrm{codim}(U^0) \leq m = \dim(U)$. However, it is clear
that $U \subseteq U^{00}$, which implies $\dim(U) \leq \dim(U^{00})$, and so $\dim(U^{00}) = \dim(U) = m$, and we
must have $U = U^{00}$.
One should be careful that this bijection does not extend to subspaces of $E^*$ of infinite
dimension.
When $E$ is of infinite dimension, for every basis $(u_i)_{i \in I}$ of $E$, the family $(u_i^*)_{i \in I}$ of coordinate
forms is never a basis of $E^*$. It is linearly independent, but it is too small to
generate $E^*$. For example, if $E = \mathbb{R}^{(\mathbb{N})}$, where $\mathbb{N} = \{0, 1, 2, \dots\}$, the map $f \colon E \to \mathbb{R}$ that
sums the nonzero coordinates of a vector in $E$ is a linear form, but it is easy to see that it
cannot be expressed as a linear combination of coordinate forms. As a consequence, when
$E$ is of infinite dimension, $E$ and $E^*$ are not isomorphic.
Here is another example illustrating the power of Theorem 4.17. Let $E = M_n(\mathbb{R})$, and
consider the equations asserting that the sum of the entries in every row of a matrix in $M_n(\mathbb{R})$
is equal to the same number. We have $n - 1$ equations
$$\sum_{j=1}^{n} (a_{ij} - a_{i+1\,j}) = 0, \qquad 1 \leq i \leq n - 1,$$
and it is easy to see that they are linearly independent. Therefore, the space $U$ of linear
forms in $E^*$ spanned by the above linear forms (equations) has dimension $n - 1$, and the
space $U^0$ of matrices satisfying all these equations has dimension $n^2 - n + 1$. It is not so
obvious to find a basis for this space.
When E is of finite dimension n and (u1 , . . . , un ) is a basis of E, we noted that the family
(u1 , . . . , un ) is a basis of the dual space E (called the dual basis of (u1 , . . . , un )). Let us see
how the coordinates of a linear form over the dual basis (u1 , . . . , un ) vary under a change
of basis.
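Let $(u_1, \ldots, u_n)$ and $(v_1, \ldots, v_n)$ be two bases of $E$, and let $P = (a_{i j})$ be the change of basis matrix from $(u_1, \ldots, u_n)$ to $(v_1, \ldots, v_n)$, so that
\[ v_j = \sum_{i=1}^{n} a_{i j} u_i, \]
and let $P^{-1} = (b_{i j})$ be the inverse of $P$, so that
\[ u_i = \sum_{j=1}^{n} b_{j i} v_j. \]
Since $u_i^*(u_j) = \delta_{i j}$ and $v_i^*(v_j) = \delta_{i j}$, we get
\[ v_j^*(u_i) = v_j^*\Bigl(\sum_{k=1}^{n} b_{k i} v_k\Bigr) = b_{j i}, \]
and thus
\[ v_j^* = \sum_{i=1}^{n} b_{j i} u_i^*, \]
and
\[ u_i^* = \sum_{j=1}^{n} a_{i j} v_j^*. \]
This means that the change of basis from the dual basis $(u_1^*, \ldots, u_n^*)$ to the dual basis $(v_1^*, \ldots, v_n^*)$ is $(P^{-1})^\top$. Since
\[ \varphi^* = \sum_{i=1}^{n} \varphi_i u_i^* = \sum_{i=1}^{n} \varphi'_i v_i^*, \]
we get
\[ \varphi'_j = \sum_{i=1}^{n} a_{i j} \varphi_i, \]
so the new coordinates $\varphi'_j$ are expressed in terms of the old coordinates $\varphi_i$ using the matrix $P^\top$. If we use the row vectors $(\varphi_1, \ldots, \varphi_n)$ and $(\varphi'_1, \ldots, \varphi'_n)$, we have
\[ (\varphi'_1, \ldots, \varphi'_n) = (\varphi_1, \ldots, \varphi_n) P. \]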
Comparing with the change of basis
\[ v_j = \sum_{i=1}^{n} a_{i j} u_i, \]
we note that this time, the coordinates $(\varphi_i)$ of the linear form $\varphi^*$ change in the same direction as the change of basis. For this reason, we say that the coordinates of linear forms are covariant. By abuse of language, it is often said that linear forms are covariant, which explains why the term covector is also used for a linear form.
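The covariant rule $(\varphi'_1, \ldots, \varphi'_n) = (\varphi_1, \ldots, \varphi_n) P$ can be illustrated on a small example. This is only a sketch under stated assumptions: the matrix `P`, the form `phi`, the coordinates `x_new` and the helper names are hypothetical, chosen so that $P$ has an integer inverse. It also uses the fact that the coordinates of a vector change by $x = P x'$.

```python
def mat_vec(m, x):
    """Matrix times column vector."""
    return [sum(m[i][j] * x[j] for j in range(len(x))) for i in range(len(m))]

def row_times_mat(row, m):
    """Row vector times matrix."""
    return [sum(row[i] * m[i][j] for i in range(len(row)))
            for j in range(len(m[0]))]

# Change of basis matrix: v_j = sum_i P[i][j] u_i (hypothetical example, det = 1).
P = [[2, 1],
     [1, 1]]
P_inv = [[1, -1],
         [-1, 2]]

phi = [3, 5]      # coordinates of a linear form over the dual basis (u_1^*, u_2^*)
x_new = [4, -7]   # coordinates of a vector over the new basis (v_1, v_2)

x_old = mat_vec(P, x_new)         # old coordinates of the vector: x = P x'
phi_new = row_times_mat(phi, P)   # new coordinates of the form: phi' = phi P

# The scalar phi(x) does not depend on the basis used to compute it.
value_old = sum(a * b for a, b in zip(phi, x_old))
value_new = sum(a * b for a, b in zip(phi_new, x_new))
```

The form's coordinates are multiplied by $P$ while the vector's new coordinates are recovered with $P^{-1}$, and the two effects cancel in the pairing, so `value_old` and `value_new` coincide.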
Observe that if $(e_1, \ldots, e_n)$ is a basis of the vector space $E$, then, as a linear map from $E$ to $K$, every linear form $f \in E^*$ is represented by a $1 \times n$ matrix, that is, by a row vector
\[ (\lambda_1 \ \cdots \ \lambda_n), \]
with respect to the basis $(e_1, \ldots, e_n)$ of $E$, and $1$ of $K$, where $f(e_i) = \lambda_i$. A vector $u = \sum_{i=1}^{n} u_i e_i \in E$ is represented by an $n \times 1$ matrix, that is, by a column vector
\[ \begin{pmatrix} u_1 \\ \vdots \\ u_n \end{pmatrix}, \]
and the action of $f$ on $u$, namely $f(u)$, is represented by the matrix product
\[ (\lambda_1 \ \cdots \ \lambda_n) \begin{pmatrix} u_1 \\ \vdots \\ u_n \end{pmatrix} = \lambda_1 u_1 + \cdots + \lambda_n u_n. \]
On the other hand, with respect to the dual basis $(e_1^*, \ldots, e_n^*)$ of $E^*$, the linear form $f$ is represented by the column vector
\[ \begin{pmatrix} \lambda_1 \\ \vdots \\ \lambda_n \end{pmatrix}. \]
Remark: In many texts using tensors, vectors are often indexed with lower indices. If so, it is more convenient to write the coordinates of a vector $x$ over the basis $(u_1, \ldots, u_n)$ as $(x^i)$, using an upper index, so that
\[ x = \sum_{i=1}^{n} x^i u_i, \]
and in a change of basis, we have
\[ v_j = \sum_{i=1}^{n} a^i_j u_i \]
and
\[ x^i = \sum_{j=1}^{n} a^i_j x'^j. \]
Dually, linear forms are indexed with upper indices. Then, it is more convenient to write the coordinates of a covector $\varphi^*$ over the dual basis $(u^{*1}, \ldots, u^{*n})$ as $(\varphi_i)$, using a lower index, so that
\[ \varphi^* = \sum_{i=1}^{n} \varphi_i u^{*i} \]
and in a change of basis, we have
\[ u^{*i} = \sum_{j=1}^{n} a^i_j v^{*j} \]
and
\[ \varphi'_j = \sum_{i=1}^{n} a^i_j \varphi_i. \]
With these conventions, the index of summation appears once in upper position and once in lower position, and the summation sign can be safely omitted, a trick due to Einstein. For example, we can write
\[ \varphi'_j = a^i_j \varphi_i \]
as an abbreviation for
\[ \varphi'_j = \sum_{i=1}^{n} a^i_j \varphi_i. \]
For another example of the use of Einstein's notation, if the vectors $(v_1, \ldots, v_n)$ are linear combinations of the vectors $(u_1, \ldots, u_n)$, with
\[ v_i = \sum_{j=1}^{n} a_{i j} u_j, \qquad 1 \le i \le n, \]
then the above equations are written as
\[ v_i = a_i^j u_j, \qquad 1 \le i \le n. \]
Thus, in Einstein's notation, the $n \times n$ matrix $(a_{i j})$ is denoted by $(a_i^j)$, a $(1, 1)$-tensor.
Beware that some authors view a matrix as a mapping between coordinates, in which case the matrix $(a_{i j})$ is denoted by $(a^i_j)$.
We will now pin down the relationship between a vector space $E$ and its bidual $E^{**}$.
Proposition 4.18. Let $E$ be a vector space. The following properties hold:
(a) The linear map $\mathrm{eval}_E\colon E \to E^{**}$ defined such that
\[ \mathrm{eval}_E(v) = \mathrm{eval}_v \quad \text{for all } v \in E, \]
is injective.
(b) When $E$ is of finite dimension $n$, the linear map $\mathrm{eval}_E\colon E \to E^{**}$ is an isomorphism (called the canonical isomorphism).
If $E$ is of finite dimension $n$, by Theorem 4.17, for every basis $(u_1, \ldots, u_n)$, the family $(u_1^*, \ldots, u_n^*)$ is a basis of the dual space $E^*$, and thus the family $(u_1^{**}, \ldots, u_n^{**})$ is a basis of the bidual $E^{**}$. This shows that $\dim(E) = \dim(E^{**}) = n$, and since by part (a), we know that $\mathrm{eval}_E\colon E \to E^{**}$ is injective, in fact, $\mathrm{eval}_E\colon E \to E^{**}$ is bijective (because an injective map carries a linearly independent family to a linearly independent family, and in a vector space of dimension $n$, a linearly independent family of $n$ vectors is a basis, see Proposition 2.10).
When a vector space $E$ has infinite dimension, $E$ and its bidual $E^{**}$ are never isomorphic.
When $E$ is of finite dimension and $(u_1, \ldots, u_n)$ is a basis of $E$, in view of the canonical isomorphism $\mathrm{eval}_E\colon E \to E^{**}$, the basis $(u_1^{**}, \ldots, u_n^{**})$ of the bidual is identified with $(u_1, \ldots, u_n)$.
Proposition 4.18 can be reformulated very fruitfully in terms of pairings (adapted from E. Artin [3], Chapter 1). Given two vector spaces $E$ and $F$ over a field $K$, we say that a function $\varphi\colon E \times F \to K$ is bilinear if for every $v \in F$, the map $u \mapsto \varphi(u, v)$ (from $E$ to $K$) is linear, and for every $u \in E$, the map $v \mapsto \varphi(u, v)$ (from $F$ to $K$) is linear.
Definition 4.9. Given two vector spaces $E$ and $F$ over $K$, a pairing between $E$ and $F$ is a bilinear map $\varphi\colon E \times F \to K$. Such a pairing is nondegenerate iff
(1) for every $u \in E$, if $\varphi(u, v) = 0$ for all $v \in F$, then $u = 0$, and
(2) for every $v \in F$, if $\varphi(u, v) = 0$ for all $u \in E$, then $v = 0$.
A pairing $\varphi\colon E \times F \to K$ is often denoted by $\langle -, - \rangle\colon E \times F \to K$. For example, the map $\langle -, - \rangle\colon E^* \times E \to K$ defined earlier is a nondegenerate pairing (use the proof of (a) in Proposition 4.18).
Given a pairing $\varphi\colon E \times F \to K$, we can define two maps $l_\varphi\colon E \to F^*$ and $r_\varphi\colon F \to E^*$ as follows: For every $u \in E$, we define the linear form $l_\varphi(u)$ in $F^*$ such that
\[ l_\varphi(u)(y) = \varphi(u, y) \quad \text{for every } y \in F, \]
and for every $v \in F$, we define the linear form $r_\varphi(v)$ in $E^*$ such that
\[ r_\varphi(v)(x) = \varphi(x, v) \quad \text{for every } x \in E. \]
We have the following useful proposition.
Proposition 4.19. Given two vector spaces $E$ and $F$ over $K$, for every nondegenerate pairing $\varphi\colon E \times F \to K$ between $E$ and $F$, the maps $l_\varphi\colon E \to F^*$ and $r_\varphi\colon F \to E^*$ are linear and injective. Furthermore, if $E$ and $F$ have finite dimension, then this dimension is the same and $l_\varphi\colon E \to F^*$ and $r_\varphi\colon F \to E^*$ are bijections.
4.3
Actually, Proposition 4.20 below follows from parts (c) and (d) of Theorem 4.17, but we feel that it is also interesting to give a more direct proof.
Proposition 4.20. Let $E$ be a vector space. The following properties hold:
(a) Given any nonnull linear form $f^* \in E^*$, its kernel $H = \mathrm{Ker}\, f^*$ is a hyperplane.
(b) For any hyperplane $H$ in $E$, there is a (nonnull) linear form $f^* \in E^*$ such that $H = \mathrm{Ker}\, f^*$.
(c) Given any hyperplane $H$ in $E$ and any (nonnull) linear form $f^* \in E^*$ such that $H = \mathrm{Ker}\, f^*$, for every linear form $g^* \in E^*$, $H = \mathrm{Ker}\, g^*$ iff $g^* = \lambda f^*$ for some $\lambda \neq 0$ in $K$.
Proof. (a) If $f^* \in E^*$ is nonnull, there is some vector $v_0 \in E$ such that $f^*(v_0) \neq 0$. Let $H = \mathrm{Ker}\, f^*$. For every $v \in E$, we have
\[ f^*\Bigl(v - \frac{f^*(v)}{f^*(v_0)}\, v_0\Bigr) = f^*(v) - \frac{f^*(v)}{f^*(v_0)}\, f^*(v_0) = f^*(v) - f^*(v) = 0. \]
Thus,
\[ v - \frac{f^*(v)}{f^*(v_0)}\, v_0 = h \in H, \]
and
\[ v = h + \frac{f^*(v)}{f^*(v_0)}\, v_0, \]
that is, $E = H + K v_0$.
(c) Let $H$ be a hyperplane in $E$, and let $f^* \in E^*$ be any (nonnull) linear form such that $H = \mathrm{Ker}\, f^*$. Clearly, if $g^* = \lambda f^*$ for some $\lambda \neq 0$, then $H = \mathrm{Ker}\, g^*$. Conversely, assume that $H = \mathrm{Ker}\, g^*$ for some nonnull linear form $g^*$. From (a), we have $E = H \oplus K v_0$, for some $v_0$ such that $f^*(v_0) \neq 0$ and $g^*(v_0) \neq 0$. Then, observe that
\[ g^* - \frac{g^*(v_0)}{f^*(v_0)}\, f^* \]
is a linear form that vanishes on $H$, since both $f^*$ and $g^*$ vanish on $H$, but also vanishes on $K v_0$. Thus, $g^* = \lambda f^*$, with
\[ \lambda = \frac{g^*(v_0)}{f^*(v_0)}. \]
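The decomposition $v = h + \frac{f^*(v)}{f^*(v_0)} v_0$ used in the proof can be tried on a concrete linear form on $\mathbb{Q}^3$. This is an illustrative sketch only: the forms `f` and `g`, the vectors `v0` and `v`, and the helpers `apply_form` and `split_along` are hypothetical choices, with linear forms stored as coefficient lists.

```python
from fractions import Fraction

def apply_form(f, v):
    """Value of the linear form with coefficient list f on the vector v."""
    return sum(Fraction(a) * b for a, b in zip(f, v))

def split_along(f, v0, v):
    """Return (h, c) with v = h + c * v0 and h in Ker f, assuming f(v0) != 0."""
    c = apply_form(f, v) / apply_form(f, v0)
    h = [Fraction(x) - c * y for x, y in zip(v, v0)]
    return h, c

f = [1, 2, -1]    # a nonnull linear form on Q^3
g = [3, 6, -3]    # g = 3 f, so g has the same kernel as f
v0 = [1, 0, 0]    # f(v0) = 1 != 0
v = [4, 5, 6]

h, c = split_along(f, v0, v)
lam = apply_form(g, v0) / apply_form(f, v0)   # the scalar lambda of part (c)
```

Here `h` lies in the common kernel of `f` and `g`, and `lam` recovers the proportionality factor between the two forms.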
We leave as an exercise the fact that every subspace $V \neq E$ of a vector space $E$ is the intersection of all hyperplanes that contain $V$. We now consider the notion of transpose of a linear map and of a matrix.
4.4
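Given a linear map $f\colon E \to F$, it is possible to define a map $f^\top\colon F^* \to E^*$ which has some interesting properties.
Definition 4.10. Given a linear map $f\colon E \to F$, the transpose $f^\top\colon F^* \to E^*$ of $f$ is the linear map defined such that
\[ f^\top(v^*) = v^* \circ f, \quad \text{for every } v^* \in F^*. \]
[Commutative diagram: $f\colon E \to F$, $v^*\colon F \to K$, and the composite $f^\top(v^*) = v^* \circ f\colon E \to K$.]
Note the reversal of composition on the right-hand side of $(g \circ f)^\top = f^\top \circ g^\top$.
The equation $(g \circ f)^\top = f^\top \circ g^\top$ implies the following useful proposition.
Proposition 4.21. If $f\colon E \to F$ is any linear map, then the following properties hold:
(1) If $f$ is injective, then $f^\top$ is surjective.
(2) If $f$ is surjective, then $f^\top$ is injective.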
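We also have the following property showing the naturality of the eval map.
Proposition 4.22. For any linear map $f\colon E \to F$, we have
\[ f^{\top\top} \circ \mathrm{eval}_E = \mathrm{eval}_F \circ f. \]
[Commutative diagram: the square formed by $f\colon E \to F$, $f^{\top\top}\colon E^{**} \to F^{**}$, $\mathrm{eval}_E\colon E \to E^{**}$ and $\mathrm{eval}_F\colon F \to F^{**}$.]
Proof. For every $u \in E$ and every $\varphi \in F^*$, we have
\begin{align*}
(f^{\top\top} \circ \mathrm{eval}_E)(u)(\varphi) &= \langle f^{\top\top}(\mathrm{eval}_E(u)), \varphi \rangle \\
&= \langle \mathrm{eval}_E(u), f^\top(\varphi) \rangle \\
&= \langle f^\top(\varphi), u \rangle \\
&= \langle \varphi, f(u) \rangle \\
&= \langle \mathrm{eval}_F(f(u)), \varphi \rangle \\
&= \langle (\mathrm{eval}_F \circ f)(u), \varphi \rangle \\
&= (\mathrm{eval}_F \circ f)(u)(\varphi),
\end{align*}
which proves that $f^{\top\top} \circ \mathrm{eval}_E = \mathrm{eval}_F \circ f$.
If $E$ and $F$ are finite-dimensional, then $\mathrm{eval}_E$ and $\mathrm{eval}_F$ are isomorphisms, so Proposition 4.22 shows that if we identify $E$ with its bidual $E^{**}$ and $F$ with its bidual $F^{**}$ then
\[ (f^\top)^\top = f. \]
As a corollary of Proposition 4.22, if $\dim(E)$ is finite, then we have
\[ \mathrm{Ker}(f^{\top\top}) = \mathrm{eval}_E(\mathrm{Ker}(f)). \]
Indeed, if $E$ is finite-dimensional, the map $\mathrm{eval}_E\colon E \to E^{**}$ is an isomorphism, so every $\varphi \in E^{**}$ is of the form $\varphi = \mathrm{eval}_E(u)$ for some $u \in E$, the map $\mathrm{eval}_F\colon F \to F^{**}$ is injective, and we have
\begin{align*}
f^{\top\top}(\varphi) = 0 \quad &\text{iff} \quad f^{\top\top}(\mathrm{eval}_E(u)) = 0 \\
&\text{iff} \quad \mathrm{eval}_F(f(u)) = 0 \\
&\text{iff} \quad f(u) = 0 \\
&\text{iff} \quad u \in \mathrm{Ker}(f) \\
&\text{iff} \quad \varphi \in \mathrm{eval}_E(\mathrm{Ker}(f)).
\end{align*}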
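Proof. We have
\[ \langle w^*, f(v) \rangle = \langle f^\top(w^*), v \rangle, \]
for all $v \in E$ and all $w^* \in F^*$, and thus, we have $\langle w^*, f(v) \rangle = 0$ for every $v \in V$, i.e. $w^* \in f(V)^0$, iff $\langle f^\top(w^*), v \rangle = 0$ for every $v \in V$, iff $f^\top(w^*) \in V^0$, i.e. $w^* \in (f^\top)^{-1}(V^0)$, proving that
\[ f(V)^0 = (f^\top)^{-1}(V^0). \]
Since we already observed that $E^0 = 0$, letting $V = E$ in the above identity, we obtain that
\[ \mathrm{Ker}\, f^\top = (\mathrm{Im}\, f)^0. \]
From the equation
\[ \langle w^*, f(v) \rangle = \langle f^\top(w^*), v \rangle, \]
we deduce that $v \in (\mathrm{Im}\, f^\top)^0$ iff $\langle f^\top(w^*), v \rangle = 0$ for all $w^* \in F^*$ iff $\langle w^*, f(v) \rangle = 0$ for all $w^* \in F^*$. Assume that $v \in (\mathrm{Im}\, f^\top)^0$. If we pick a basis $(w_i)_{i \in I}$ of $F$, then we have the linear forms $w_i^*\colon F \to K$ such that $w_i^*(w_j) = \delta_{i j}$, and since we must have $\langle w_i^*, f(v) \rangle = 0$ for all $i \in I$ and $(w_i)_{i \in I}$ is a basis of $F$, we conclude that $f(v) = 0$, and thus $v \in \mathrm{Ker}\, f$ (this is because $\langle w_i^*, f(v) \rangle$ is the coefficient of $f(v)$ associated with the basis vector $w_i$). Conversely, if $v \in \mathrm{Ker}\, f$, then $\langle w^*, f(v) \rangle = 0$ for all $w^* \in F^*$, so we conclude that $v \in (\mathrm{Im}\, f^\top)^0$. Therefore, $v \in (\mathrm{Im}\, f^\top)^0$ iff $v \in \mathrm{Ker}\, f$; that is,
\[ \mathrm{Ker}\, f = (\mathrm{Im}\, f^\top)^0, \]
as claimed.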
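The following proposition gives a natural interpretation of the dual $(E/U)^*$ of a quotient space $E/U$.
Proposition 4.24. For any subspace $U$ of a vector space $E$, if $p\colon E \to E/U$ is the canonical surjection onto $E/U$, then $p^\top$ is injective and
\[ \mathrm{Im}(p^\top) = U^0 = (\mathrm{Ker}(p))^0. \]
Therefore, $p^\top$ is a linear isomorphism between $(E/U)^*$ and $U^0$.
Proof. Since $p$ is surjective, by Proposition 4.21, the map $p^\top$ is injective. Obviously, $U = \mathrm{Ker}(p)$. Observe that $\mathrm{Im}(p^\top)$ consists of all linear forms $\psi \in E^*$ such that $\psi = \varphi \circ p$ for some $\varphi \in (E/U)^*$, and since $\mathrm{Ker}(p) = U$, we have $U \subseteq \mathrm{Ker}(\psi)$. Conversely, for any linear form $\psi \in E^*$, if $U \subseteq \mathrm{Ker}(\psi)$, then $\psi$ factors through $E/U$ as $\psi = \overline{\psi} \circ p$ as shown in the following commutative diagram
[Commutative diagram: $p\colon E \to E/U$, $\psi\colon E \to K$, and $\overline{\psi}\colon E/U \to K$.]
where $\overline{\psi}\colon E/U \to K$ is given by
\[ \overline{\psi}(\overline{v}) = \psi(v), \quad v \in E, \]
where $\overline{v} \in E/U$ denotes the equivalence class of $v \in E$. The map $\overline{\psi}$ does not depend on the representative chosen in the equivalence class $\overline{v}$, since if $\overline{v'} = \overline{v}$, that is $v' - v = u \in U$, then $\psi(v') = \psi(v + u) = \psi(v) + \psi(u) = \psi(v) + 0 = \psi(v)$. Therefore, we have
\begin{align*}
\mathrm{Im}(p^\top) &= \{ \varphi \circ p \mid \varphi \in (E/U)^* \} \\
&= \{ \psi\colon E \to K \mid U \subseteq \mathrm{Ker}(\psi) \} \\
&= U^0,
\end{align*}
which proves our result.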
Proposition 4.24 yields another proof of part (b) of the duality theorem (Theorem 4.17) that does not involve the existence of bases (in infinite dimension).
Proposition 4.25. For any vector space $E$ and any subspace $V$ of $E$, we have $V^{00} = V$.
Proof. We begin by observing that $V^0 = V^{000}$. This is because, for any subspace $U$ of $E^*$, we have $U \subseteq U^{00}$, so $V^0 \subseteq V^{000}$. Furthermore, $V \subseteq V^{00}$ holds, and for any two subspaces $M, N$ of $E$, if $M \subseteq N$ then $N^0 \subseteq M^0$, so we get $V^{000} \subseteq V^0$. Write $V_1 = V^{00}$, so that $V_1^0 = V^{000} = V^0$. We wish to prove that $V_1 = V$.
Since $V \subseteq V_1 = V^{00}$, the canonical projection $p_1\colon E \to E/V_1$ factors as $p_1 = f \circ p$ as in the diagram below,
[Commutative diagram: $p\colon E \to E/V$, $f\colon E/V \to E/V_1$, and $p_1\colon E \to E/V_1$.]
where $p\colon E \to E/V$ is the canonical projection onto $E/V$ and $f\colon E/V \to E/V_1$ is the quotient map induced by $p_1$, with $f(\overline{u}_{E/V}) = p_1(u) = \overline{u}_{E/V_1}$, for all $u \in E$ (since $V \subseteq V_1$, if $u - u' = v \in V$, then $u - u' = v \in V_1$, so $p_1(u) = p_1(u')$). Since $p_1$ is surjective, so is $f$. We wish to prove that $f$ is actually an isomorphism, and for this, it is enough to show that $f$ is injective. By transposing all the maps, we get the commutative diagram
[Commutative diagram: $p^\top\colon (E/V)^* \to E^*$, $f^\top\colon (E/V_1)^* \to (E/V)^*$, and $p_1^\top\colon (E/V_1)^* \to E^*$.]
but by Proposition 4.24, the maps $p^\top\colon (E/V)^* \to V^0$ and $p_1^\top\colon (E/V_1)^* \to V_1^0$ are isomorphisms, and since $V^0 = V_1^0$, we have the following diagram where both $p^\top$ and $p_1^\top$ are isomorphisms:
[Commutative diagram: $p^\top\colon (E/V)^* \to V^0$, $f^\top\colon (E/V_1)^* \to (E/V)^*$, and $p_1^\top\colon (E/V_1)^* \to V^0$.]
Therefore, $f^\top = (p^\top)^{-1} \circ p_1^\top$ is an isomorphism. We claim that this implies that $f$ is injective.
If $f$ is not injective, then there is some $x \in E/V$ such that $x \neq 0$ and $f(x) = 0$, so for every $\varphi \in (E/V_1)^*$, we have $f^\top(\varphi)(x) = \varphi(f(x)) = 0$. However, there is a linear form $\psi \in (E/V)^*$ such that $\psi(x) = 1$, so $\psi \neq f^\top(\varphi)$ for all $\varphi \in (E/V_1)^*$, contradicting the fact that $f^\top$ is surjective. To find such a linear form $\psi$, pick any supplement $W$ of $Kx$ in $E/V$, so that $E/V = Kx \oplus W$ ($W$ is a hyperplane in $E/V$ not containing $x$), and define $\psi$ to be zero on $W$ and $1$ on $x$ (see footnote 3). Therefore, $f$ is injective, and since we already know that it is surjective, it is bijective. This means that the canonical map $f\colon E/V \to E/V_1$ with $V \subseteq V_1$ is an isomorphism, which implies that $V = V_1 = V^{00}$ (otherwise, if $v \in V_1 - V$, then $p_1(v) = 0$, so $f(p(v)) = p_1(v) = 0$, but $p(v) \neq 0$ since $v \notin V$, and $f$ is not injective).
The following theorem shows the relationship between the rank of $f$ and the rank of $f^\top$.
Theorem 4.26. Given a linear map $f\colon E \to F$, the following properties hold.
(a) The dual $(\mathrm{Im}\, f)^*$ of $\mathrm{Im}\, f$ is isomorphic to $\mathrm{Im}\, f^\top = f^\top(F^*)$; that is,
\[ (\mathrm{Im}\, f)^* \cong \mathrm{Im}\, f^\top. \]
(b) $\mathrm{rk}(f) \le \mathrm{rk}(f^\top)$. If $\mathrm{rk}(f)$ is finite, we have $\mathrm{rk}(f) = \mathrm{rk}(f^\top)$.
Proof. (a) Consider the linear maps
\[ E \xrightarrow{\;p\;} \mathrm{Im}\, f \xrightarrow{\;j\;} F, \]
Footnote 3: Using Zorn's lemma, we pick $W$ maximal among all subspaces of $E/V$ such that $Kx \cap W = (0)$; then, $E/V = Kx \oplus W$.
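Remarks:
1. If $\dim(E)$ is finite, following an argument of Dan Guralnik, we can also prove that $\mathrm{rk}(f) = \mathrm{rk}(f^\top)$ as follows.
We know from Proposition 4.23 applied to $f^\top\colon F^* \to E^*$ that
\[ \mathrm{Ker}(f^{\top\top}) = (\mathrm{Im}\, f^\top)^0, \]
and we showed as a consequence of Proposition 4.22 that
\[ \mathrm{Ker}(f^{\top\top}) = \mathrm{eval}_E(\mathrm{Ker}(f)). \]
It follows (since $\mathrm{eval}_E$ is an isomorphism) that
\[ \dim((\mathrm{Im}\, f^\top)^0) = \dim(\mathrm{Ker}(f^{\top\top})) = \dim(\mathrm{Ker}(f)) = \dim(E) - \dim(\mathrm{Im}\, f), \]
and since
\[ \dim(\mathrm{Im}\, f^\top) + \dim((\mathrm{Im}\, f^\top)^0) = \dim(E), \]
we get
\[ \dim(\mathrm{Im}\, f^\top) = \dim(\mathrm{Im}\, f). \]
2. As indicated by Dan Guralnik, if $\dim(E)$ is finite, the above result can be used to prove that
\[ \mathrm{Im}\, f^\top = (\mathrm{Ker}(f))^0. \]
From
\[ \langle f^\top(\varphi), u \rangle = \langle \varphi, f(u) \rangle \]
for all $\varphi \in F^*$ and all $u \in E$, we see that if $u \in \mathrm{Ker}(f)$, then $\langle f^\top(\varphi), u \rangle = \langle \varphi, 0 \rangle = 0$, which means that $f^\top(\varphi) \in (\mathrm{Ker}(f))^0$, and thus, $\mathrm{Im}\, f^\top \subseteq (\mathrm{Ker}(f))^0$. For the converse, since $\dim(E)$ is finite, we have
\[ \dim((\mathrm{Ker}(f))^0) = \dim(E) - \dim(\mathrm{Ker}(f)) = \dim(\mathrm{Im}\, f), \]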
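Since $p$ is surjective, $p^\top$ is injective, since $j$ is injective, $j^\top$ is surjective, and since $\overline{f}$ is bijective, $\overline{f}^\top$ is also bijective. It follows that $(E/\mathrm{Ker}(f))^* = \mathrm{Im}(\overline{f}^\top \circ j^\top)$, and we have
\[ \mathrm{Im}\, f^\top = \mathrm{Im}\, p^\top. \]
Since $p\colon E \to E/\mathrm{Ker}(f)$ is the canonical surjection, by Proposition 4.24 applied to $U = \mathrm{Ker}(f)$, we get
\[ \mathrm{Im}\, f^\top = \mathrm{Im}\, p^\top = (\mathrm{Ker}(f))^0, \]
as claimed.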
\[ f^\top(v_i^*) = a^\top_{1 i} u_1^* + \cdots + a^\top_{j i} u_j^* + \cdots + a^\top_{n i} u_n^* \]
over the basis $(u_1^*, \ldots, u_n^*)$, which is just $a^\top_{j i} = f^\top(v_i^*)(u_j) = \langle f^\top(v_i^*), u_j \rangle$. Since
We now can give a very short proof of the fact that the rank of a matrix is equal to the rank of its transpose.
Proposition 4.29. Given an $m \times n$ matrix $A$ over a field $K$, we have $\mathrm{rk}(A) = \mathrm{rk}(A^\top)$.
Proof. The matrix $A$ corresponds to a linear map $f\colon K^n \to K^m$, and by Theorem 4.26, $\mathrm{rk}(f) = \mathrm{rk}(f^\top)$. By Proposition 4.28, the linear map $f^\top$ corresponds to $A^\top$. Since $\mathrm{rk}(A) = \mathrm{rk}(f)$, and $\mathrm{rk}(A^\top) = \mathrm{rk}(f^\top)$, we conclude that $\mathrm{rk}(A) = \mathrm{rk}(A^\top)$.
Thus, given an $m \times n$ matrix $A$, the maximum number of linearly independent columns is equal to the maximum number of linearly independent rows. There are other ways of proving this fact that do not involve the dual space, but instead some elementary transformations on rows and columns.
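The equality of row rank and column rank can be observed on a small example. The sketch below uses the elementary-transformation approach (Gaussian elimination over the rationals) rather than duality; the helper names `rank` and `transpose` and the sample matrix `A` are hypothetical.

```python
from fractions import Fraction

def rank(rows):
    """Rank of a matrix (list of rows), by Gaussian elimination over Q."""
    m = [[Fraction(x) for x in row] for row in rows]
    ncols = len(m[0]) if m else 0
    r = 0
    for c in range(ncols):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, len(m)):
            factor = m[i][c] / m[r][c]
            m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

def transpose(rows):
    """Transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*rows)]

# A 3 x 4 example: the second row is twice the first one.
A = [[1, 2, 3, 4],
     [2, 4, 6, 8],
     [1, 0, 1, 0]]
```

Calling `rank(A)` and `rank(transpose(A))` should return the same number, here the number of independent rows of `A`.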
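Proposition 4.29 immediately yields the following criterion for determining the rank of a matrix:
The $3 \times 2$ matrix
\[ A = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \\ a_{31} & a_{32} \end{pmatrix} \]
has rank 2 iff one of the three $2 \times 2$ matrices
\[ \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix} \qquad \begin{pmatrix} a_{11} & a_{12} \\ a_{31} & a_{32} \end{pmatrix} \qquad \begin{pmatrix} a_{21} & a_{22} \\ a_{31} & a_{32} \end{pmatrix} \]
is invertible. We will see in Chapter 5 that this is equivalent to the fact that the determinant of one of the above matrices is nonzero. This is not a very efficient way of finding the rank of a matrix. We will see that there are better ways using various decompositions such as LU, QR, or SVD.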
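For a $3 \times 2$ matrix, the criterion above amounts to testing three $2 \times 2$ determinants. The following sketch, with hypothetical helper names `det2` and `has_rank_two`, spells this out; it relies on the fact, stated above, that a $2 \times 2$ matrix is invertible iff its determinant is nonzero.

```python
from itertools import combinations

def det2(m):
    """Determinant of a 2 x 2 matrix given as a list of two rows."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def has_rank_two(a):
    """For a 3 x 2 matrix a: True iff one of the three 2 x 2 submatrices
    obtained by keeping two of the rows is invertible."""
    return any(det2([a[i], a[k]]) != 0 for i, k in combinations(range(3), 2))

rank_one_example = [[1, 2], [2, 4], [3, 6]]   # all rows proportional
rank_two_example = [[1, 2], [2, 4], [0, 1]]   # rows 1 and 3 are independent
```

As the text notes, this is not an efficient method in general, since the number of submatrices to test grows quickly with the size of the matrix.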
4.5
(1) The column space of $A$, denoted by $\mathrm{Im}\, A$ or $\mathcal{R}(A)$; this is the subspace of $\mathbb{R}^m$ spanned by the columns of $A$, which corresponds to the image $\mathrm{Im}\, f$ of $f$.
(2) The kernel or nullspace of $A$, denoted by $\mathrm{Ker}\, A$ or $\mathcal{N}(A)$; this is the subspace of $\mathbb{R}^n$ consisting of all vectors $x \in \mathbb{R}^n$ such that $Ax = 0$.
(3) The row space of $A$, denoted by $\mathrm{Im}\, A^\top$ or $\mathcal{R}(A^\top)$; this is the subspace of $\mathbb{R}^n$ spanned by the rows of $A$, or equivalently, spanned by the columns of $A^\top$, which corresponds to the image $\mathrm{Im}\, f^\top$ of $f^\top$.
(4) The left kernel or left nullspace of $A$, denoted by $\mathrm{Ker}\, A^\top$ or $\mathcal{N}(A^\top)$; this is the kernel (nullspace) of $A^\top$, the subspace of $\mathbb{R}^m$ consisting of all vectors $y \in \mathbb{R}^m$ such that $A^\top y = 0$, or equivalently, $y^\top A = 0$.
Recall that the dimension $r$ of $\mathrm{Im}\, f$, which is also equal to the dimension of the column space $\mathrm{Im}\, A = \mathcal{R}(A)$, is the rank of $A$ (and $f$). Then, some of our previous results can be reformulated as follows:
1. The column space $\mathcal{R}(A)$ of $A$ has dimension $r$.
2. The nullspace $\mathcal{N}(A)$ of $A$ has dimension $n - r$.
3. The row space $\mathcal{R}(A^\top)$ has dimension $r$.
4. The left nullspace $\mathcal{N}(A^\top)$ of $A$ has dimension $m - r$.
The above statements constitute what Strang calls the Fundamental Theorem of Linear Algebra, Part I (see Strang [105]).
The two statements
\[ \mathrm{Ker}\, f = (\mathrm{Im}\, f^\top)^0 \]
\[ \mathrm{Ker}\, f^\top = (\mathrm{Im}\, f)^0 \]
translate to
(1) The nullspace of $A$ is the orthogonal of the row space of $A$.
(2) The left nullspace of $A$ is the orthogonal of the column space of $A$.
The above statements constitute what Strang calls the Fundamental Theorem of Linear Algebra, Part II (see Strang [105]).
Since vectors are represented by column vectors and linear forms by row vectors (over a basis in $E$ or $F$), a vector $x \in \mathbb{R}^n$ is orthogonal to a linear form $y$ if
\[ yx = 0. \]
Then, a vector $x \in \mathbb{R}^n$ is orthogonal to the row space of $A$ iff $x$ is orthogonal to every row of $A$, namely $Ax = 0$, which is equivalent to the fact that $x$ belongs to the nullspace of $A$. Similarly, the column vector $y \in \mathbb{R}^m$ (representing a linear form over the dual basis of $F^*$) belongs to the nullspace of $A^\top$ iff $A^\top y = 0$, iff $y^\top A = 0$, which means that the linear form given by $y^\top$ (over the basis in $F$) is orthogonal to the column space of $A$.
Since (2) is equivalent to the fact that the column space of $A$ is equal to the orthogonal of the left nullspace of $A$, we get the following criterion for the solvability of an equation of the form $Ax = b$:
The equation $Ax = b$ has a solution iff for all $y \in \mathbb{R}^m$, if $A^\top y = 0$, then $y^\top b = 0$.
Indeed, the condition on the right-hand side says that $b$ is orthogonal to the left nullspace of $A$, that is, that $b$ belongs to the column space of $A$.
This criterion can be cheaper to check than checking directly that $b$ is spanned by the columns of $A$. For example, if we consider the system
\begin{align*}
x_1 - x_2 &= b_1 \\
x_2 - x_3 &= b_2 \\
x_3 - x_1 &= b_3
\end{align*}
which, in matrix form, is written $Ax = b$ as below:
\[ \begin{pmatrix} 1 & -1 & 0 \\ 0 & 1 & -1 \\ -1 & 0 & 1 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = \begin{pmatrix} b_1 \\ b_2 \\ b_3 \end{pmatrix}, \]
we see that the rows of the matrix $A$ add up to $0$. In fact, it is easy to convince ourselves that the left nullspace of $A$ is spanned by $y = (1, 1, 1)$, and so the system is solvable iff $y^\top b = 0$, namely
\[ b_1 + b_2 + b_3 = 0. \]
Note that the above criterion can also be stated negatively as follows:
The equation $Ax = b$ has no solution iff there is some $y \in \mathbb{R}^m$ such that $A^\top y = 0$ and $y^\top b \neq 0$.
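The solvability criterion for this system can be compared with a direct test. The sketch below is an illustration under stated assumptions: it uses the standard fact that $Ax = b$ is solvable iff $A$ and the augmented matrix $(A \mid b)$ have the same rank, and the helper names `rank`, `y_times_A`, `solvable` and `criterion` are hypothetical.

```python
from fractions import Fraction

def rank(rows):
    """Rank of a matrix (list of rows), by Gaussian elimination over Q."""
    m = [[Fraction(x) for x in row] for row in rows]
    ncols = len(m[0]) if m else 0
    r = 0
    for c in range(ncols):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, len(m)):
            factor = m[i][c] / m[r][c]
            m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

A = [[1, -1, 0],
     [0, 1, -1],
     [-1, 0, 1]]
y = [1, 1, 1]   # spans the left nullspace of A

def y_times_A(y, a):
    """The row vector y^T A."""
    return [sum(y[i] * a[i][j] for i in range(len(a))) for j in range(len(a[0]))]

def solvable(a, b):
    """Direct test: Ax = b is solvable iff rank(A) == rank(A | b)."""
    augmented = [row + [bi] for row, bi in zip(a, b)]
    return rank(a) == rank(augmented)

def criterion(b):
    """Test from the text: y^T b = 0, that is, b1 + b2 + b3 = 0."""
    return sum(yi * bi for yi, bi in zip(y, b)) == 0
```

Since $A$ has rank 2 and $m = 3$, the left nullspace has dimension $m - r = 1$, which is consistent with it being spanned by the single vector `y`.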
4.6
Summary
The main concepts and results of this chapter are listed below:
Direct products, sums, direct sums.
Projections.
(Proposition 4.23).
If $F$ is finite-dimensional, then
\[ \mathrm{rk}(f) = \mathrm{rk}(f^\top). \]
(Theorem 4.26).
The matrix of the transpose map $f^\top$ is equal to the transpose of the matrix of the map $f$ (Proposition 4.28).
For any $m \times n$ matrix $A$,
\[ \mathrm{rk}(A) = \mathrm{rk}(A^\top). \]
Chapter 5
Determinants
5.1
Permutations, Signature of a Permutation
This chapter contains a review of determinants and their use in linear algebra. We begin
with permutations and the signature of a permutation. Next, we define multilinear maps
and alternating multilinear maps. Determinants are introduced as alternating multilinear
maps taking the value 1 on the unit matrix (following Emil Artin). It is then shown how
to compute a determinant using the Laplace expansion formula, and the connection with
the usual definition is made. It is shown how determinants can be used to invert matrices
and to solve (at least in theory!) systems of linear equations (the Cramer formulae). The
determinant of a linear map is defined. We conclude by defining the characteristic polynomial
of a matrix (and of a linear map) and by proving the celebrated Cayley-Hamilton theorem
which states that every matrix is a zero of its characteristic polynomial (we give two proofs;
one computational, the other one more conceptual).
Determinants can be defined in several ways. For example, determinants can be defined
in a fancy way in terms of the exterior algebra (or alternating algebra) of a vector space.
We will follow a more algorithmic approach due to Emil Artin. No matter which approach
is followed, we need a few preliminaries about permutations on a finite set. We need to
show that every permutation on n elements is a product of transpositions, and that the
parity of the number of transpositions involved is an invariant of the permutation. Let
$[n] = \{1, 2, \ldots, n\}$, where $n \in \mathbb{N}$, and $n > 0$.
Definition 5.1. A permutation on $n$ elements is a bijection $\pi\colon [n] \to [n]$. When $n = 1$, the only function from $[1]$ to $[1]$ is the constant map: $1 \mapsto 1$. Thus, we will assume that $n \ge 2$. A transposition is a permutation $\tau\colon [n] \to [n]$ such that, for some $i < j$ (with $1 \le i < j \le n$), $\tau(i) = j$, $\tau(j) = i$, and $\tau(k) = k$, for all $k \in [n] - \{i, j\}$. In other words, a transposition exchanges two distinct elements $i, j \in [n]$. A cyclic permutation of order $k$ (or $k$-cycle) is a permutation $\sigma\colon [n] \to [n]$ such that, for some sequence $(i_1, i_2, \ldots, i_k)$ of distinct elements of $[n]$ with $2 \le k \le n$,
\[ \sigma(i_1) = i_2,\ \sigma(i_2) = i_3,\ \ldots,\ \sigma(i_{k-1}) = i_k,\ \sigma(i_k) = i_1, \]
and $\sigma(j) = j$, for $j \in [n] - \{i_1, \ldots, i_k\}$. The set $\{i_1, \ldots, i_k\}$ is called the domain of the cyclic permutation, and the cyclic permutation is sometimes denoted by $(i_1, i_2, \ldots, i_k)$.
If $\tau$ is a transposition, clearly, $\tau \circ \tau = \mathrm{id}$. Also, a cyclic permutation of order 2 is a transposition, and for a cyclic permutation $\sigma$ of order $k$, we have $\sigma^k = \mathrm{id}$. Clearly, the composition of two permutations is a permutation and every permutation has an inverse which is also a permutation. Therefore, the set of permutations on $[n]$ is a group often denoted $S_n$. It is easy to show by induction that the group $S_n$ has $n!$ elements. We will also use the terminology product of permutations (or transpositions), as a synonym for composition of permutations.
The following proposition shows the importance of cyclic permutations and transpositions.
Proposition 5.1. For every $n \ge 2$, for every permutation $\pi\colon [n] \to [n]$, there is a partition of $[n]$ into $r$ subsets called the orbits of $\pi$, with $1 \le r \le n$, where each set $J$ in this partition is either a singleton $\{i\}$, or it is of the form
\[ J = \{i, \pi(i), \pi^2(i), \ldots, \pi^{r_i - 1}(i)\}, \]
where $r_i$ is the smallest integer such that $\pi^{r_i}(i) = i$ and $2 \le r_i \le n$. If $\pi$ is not the identity, then it can be written in a unique way (up to the order) as a composition $\pi = \sigma_1 \circ \cdots \circ \sigma_s$ of cyclic permutations with disjoint domains, where $s$ is the number of orbits with at least two elements. Every permutation $\pi\colon [n] \to [n]$ can be written as a nonempty composition of transpositions.
Proof. Consider the relation $R_\pi$ defined on $[n]$ as follows: $i R_\pi j$ iff there is some $k \ge 1$ such that $j = \pi^k(i)$. We claim that $R_\pi$ is an equivalence relation. Transitivity is obvious. We claim that for every $i \in [n]$, there is some least $r$ ($1 \le r \le n$) such that $\pi^r(i) = i$.
Indeed, consider the following sequence of $n + 1$ elements:
\[ \langle i, \pi(i), \pi^2(i), \ldots, \pi^n(i) \rangle. \]
Since $[n]$ only has $n$ distinct elements, there are some $h, k$ with $0 \le h < k \le n$ such that
\[ \pi^h(i) = \pi^k(i), \]
and since $\pi$ is a bijection, this implies $\pi^{k-h}(i) = i$, where $0 < k - h \le n$. Thus, we proved that there is some integer $m \ge 1$ such that $\pi^m(i) = i$, so there is such a smallest integer $r$.
Now, for every $i \in [n]$, the equivalence class (orbit) of $i$ is a subset of $[n]$, either the singleton $\{i\}$ or a set of the form
\[ J = \{i, \pi(i), \pi^2(i), \ldots, \pi^{r_i - 1}(i)\}, \]
where $r_i$ is the smallest integer such that $\pi^{r_i}(i) = i$ and $2 \le r_i \le n$, and in the second case, the restriction of $\pi$ to $J$ induces a cyclic permutation $\sigma_i$, and $\pi = \sigma_1 \circ \cdots \circ \sigma_s$, where $s$ is the number of equivalence classes having at least two elements.
For the second part of the proposition, we proceed by induction on $n$. If $n = 2$, there are exactly two permutations on $[2]$, the transposition $\tau$ exchanging 1 and 2, and the identity. However, $\mathrm{id}_2 = \tau^2$. Now, let $n \ge 3$. If $\pi(n) = n$, since by the induction hypothesis, the restriction of $\pi$ to $[n - 1]$ can be written as a product of transpositions, $\pi$ itself can be written as a product of transpositions. If $\pi(n) = k \neq n$, letting $\tau$ be the transposition such that $\tau(n) = k$ and $\tau(k) = n$, it is clear that $\tau \circ \pi$ leaves $n$ invariant, and by the induction hypothesis, we have $\tau \circ \pi = \tau_m \circ \cdots \circ \tau_1$ for some transpositions, and thus
\[ \pi = \tau \circ \tau_m \circ \cdots \circ \tau_1, \]
a product of transpositions (since $\tau \circ \tau = \mathrm{id}_n$).
Remark: When $\pi = \mathrm{id}_n$ is the identity permutation, we can agree that the composition of 0 transpositions is the identity. The second part of Proposition 5.1 shows that the transpositions generate the group of permutations $S_n$.
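Both parts of Proposition 5.1 are constructive, and can be sketched in a few lines of code. In this illustration a permutation on $[n]$ is stored as a dictionary mapping $i$ to $\pi(i)$; this representation and the helper names `orbits`, `compose`, `transposition` and `as_transpositions` are assumptions made for the example. The second function follows the induction of the proof: if $\pi(n) = k \neq n$, compose on the left with the transposition exchanging $n$ and $k$, then continue with $n - 1$.

```python
def orbits(perm):
    """Orbits of perm (a dict i -> perm(i)), each listed as i, perm(i), ..."""
    seen, result = set(), []
    for i in sorted(perm):
        if i in seen:
            continue
        orbit, j = [], i
        while j not in seen:
            seen.add(j)
            orbit.append(j)
            j = perm[j]
        result.append(orbit)
    return result

def compose(p, q):
    """The composition p o q, that is, i -> p(q(i))."""
    return {i: p[q[i]] for i in q}

def transposition(n, a, b):
    """The transposition on [n] exchanging a and b."""
    t = {i: i for i in range(1, n + 1)}
    t[a], t[b] = b, a
    return t

def as_transpositions(perm):
    """Pairs (a_1, b_1), ..., (a_m, b_m) such that perm is the composition
    t_1 o t_2 o ... o t_m, where t_k exchanges a_k and b_k."""
    n = len(perm)
    current = dict(perm)
    factors = []
    for top in range(n, 1, -1):
        k = current[top]
        if k != top:
            tau = transposition(n, top, k)
            current = compose(tau, current)   # tau o current now fixes top
            factors.append((top, k))
    return factors

# Example on [5]: the 3-cycle (1, 3, 2) composed with the transposition (4, 5).
pi = {1: 3, 2: 1, 3: 2, 4: 5, 5: 4}
factors = as_transpositions(pi)

# Rebuild pi from its factors to check the decomposition.
rebuilt = {i: i for i in pi}
for a, b in reversed(factors):
    rebuilt = compose(transposition(len(pi), a, b), rebuilt)
```

The decomposition into transpositions is not unique; this procedure returns one particular choice, and the identity yields the empty list, in agreement with the remark above.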
In writing a permutation $\pi$ as a composition $\pi = \sigma_1 \circ \cdots \circ \sigma_s$ of cyclic permutations, it is clear that the order of the $\sigma_i$ does not matter, since their domains are disjoint. Given a permutation written as a product of transpositions, we now show that the parity of the number of transpositions is an invariant.
Definition 5.2. For every $n \ge 2$, since every permutation $\pi\colon [n] \to [n]$ defines a partition of $r$ subsets over which $\pi$ acts either as the identity or as a cyclic permutation, let $\epsilon(\pi)$, called the signature of $\pi$, be defined by $\epsilon(\pi) = (-1)^{n-r}$, where $r$ is the number of sets in the partition.
If $\tau$ is a transposition exchanging $i$ and $j$, it is clear that the partition associated with $\tau$ consists of $n - 1$ equivalence classes, the set $\{i, j\}$, and the $n - 2$ singleton sets $\{k\}$, for $k \in [n] - \{i, j\}$, and thus, $\epsilon(\tau) = (-1)^{n-(n-1)} = (-1)^1 = -1$.
Proposition 5.2. For every $n \ge 2$, for every permutation $\pi\colon [n] \to [n]$, for every transposition $\tau$, we have
\[ \epsilon(\tau \circ \pi) = -\epsilon(\pi). \]
Consequently, for every product of transpositions such that $\pi = \tau_m \circ \cdots \circ \tau_1$, we have
\[ \epsilon(\pi) = (-1)^m, \]
which shows that the parity of the number of transpositions is an invariant.
Proof. Assume that $\tau(i) = j$ and $\tau(j) = i$, where $i < j$. There are two cases, depending on whether $i$ and $j$ are in the same equivalence class $J_l$ of $R_\pi$, or in distinct equivalence classes. If $i$ and $j$ are in the same class $J_l$, then if
\[ J_l = \{i_1, \ldots, i_p, \ldots, i_q, \ldots, i_k\}, \]
where $i_p = i$ and $i_q = j$, since
\[ \tau(\pi(\pi^{-1}(i_p))) = \tau(i_p) = \tau(i) = j = i_q \]
and
\[ \tau(\pi(i_{q-1})) = \tau(i_q) = \tau(j) = i = i_p, \]
it is clear that $J_l$ splits into two subsets, one of which is $\{i_p, \ldots, i_{q-1}\}$, and thus, the number of classes associated with $\tau \circ \pi$ is $r + 1$, and $\epsilon(\tau \circ \pi) = (-1)^{n-r-1} = -(-1)^{n-r} = -\epsilon(\pi)$. If $i$ and $j$ are in distinct equivalence classes $J_l$ and $J_m$, say
\[ \{i_1, \ldots, i_p, \ldots, i_h\} \]
and
\[ \{j_1, \ldots, j_q, \ldots, j_k\}, \]
where $i_p = i$ and $j_q = j$, since
\[ \tau(\pi(\pi^{-1}(i_p))) = \tau(i_p) = \tau(i) = j = j_q \]
and
\[ \tau(\pi(\pi^{-1}(j_q))) = \tau(j_q) = \tau(j) = i = i_p, \]
we see that the classes $J_l$ and $J_m$ merge into a single class, and thus, the number of classes associated with $\tau \circ \pi$ is $r - 1$, and $\epsilon(\tau \circ \pi) = (-1)^{n-r+1} = -(-1)^{n-r} = -\epsilon(\pi)$.
Now, let $\pi = \tau_m \circ \cdots \circ \tau_1$ be any product of transpositions. By the first part of the proposition, we have
\[ \epsilon(\pi) = (-1)^{m-1} \epsilon(\tau_1) = (-1)^{m-1}(-1) = (-1)^m, \]
since $\epsilon(\tau_1) = -1$ for a transposition.
Remark: When $\pi = \mathrm{id}_n$ is the identity permutation, since we agreed that the composition of 0 transpositions is the identity, it is still correct that $(-1)^0 = \epsilon(\mathrm{id}) = +1$. From the proposition, it is immediate that $\epsilon(\pi' \circ \pi) = \epsilon(\pi')\epsilon(\pi)$. In particular, since $\pi^{-1} \circ \pi = \mathrm{id}_n$, we get $\epsilon(\pi^{-1}) = \epsilon(\pi)$.
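The properties of the signature stated above can be checked exhaustively on a small symmetric group. In this sketch, permutations are tuples over the 0-based set $\{0, \ldots, n-1\}$ (an assumption made for convenience; the text uses $[n] = \{1, \ldots, n\}$), and the helper names `signature`, `compose` and `inverse` are hypothetical. The signature is computed exactly as in Definition 5.2, as $(-1)^{n-r}$ with $r$ the number of orbits.

```python
from itertools import permutations

def signature(perm):
    """(-1)^(n - r), where r is the number of orbits of perm; perm[i] is the
    image of i, with 0-based indices."""
    n, seen, r = len(perm), set(), 0
    for i in range(n):
        if i not in seen:
            r += 1
            j = i
            while j not in seen:
                seen.add(j)
                j = perm[j]
    return (-1) ** (n - r)

def compose(p, q):
    """The composition p o q, that is, i -> p[q[i]]."""
    return tuple(p[q[i]] for i in range(len(q)))

def inverse(p):
    """The inverse permutation of p."""
    inv = [0] * len(p)
    for i, j in enumerate(p):
        inv[j] = i
    return tuple(inv)

S4 = list(permutations(range(4)))
multiplicative = all(
    signature(compose(p, q)) == signature(p) * signature(q)
    for p in S4 for q in S4
)
inverse_has_same_signature = all(signature(inverse(p)) == signature(p) for p in S4)
```

Both flags should be true for $S_4$, and exactly half of the 24 permutations have signature $+1$.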
We can now proceed with the definition of determinants.
5.2
Alternating Multilinear Maps
First, we define multilinear maps, symmetric multilinear maps, and alternating multilinear maps.
Remark: Most of the definitions and results presented in this section also hold when $K$ is a commutative ring, and when we consider modules over $K$ (free modules, when bases are needed).
Let $E_1, \ldots, E_n$, and $F$, be vector spaces over a field $K$, where $n \ge 1$.
Definition 5.3. A function $f\colon E_1 \times \cdots \times E_n \to F$ is a multilinear map (or an $n$-linear map) if it is linear in each argument, holding the others fixed. More explicitly, for every $i$, $1 \le i \le n$, for all $x_1 \in E_1, \ldots, x_{i-1} \in E_{i-1}$, $x_{i+1} \in E_{i+1}, \ldots, x_n \in E_n$, for all $x, y \in E_i$, for all $\lambda \in K$,
\begin{align*}
f(x_1, \ldots, x_{i-1}, x + y, x_{i+1}, \ldots, x_n) &= f(x_1, \ldots, x_{i-1}, x, x_{i+1}, \ldots, x_n) \\
&\quad + f(x_1, \ldots, x_{i-1}, y, x_{i+1}, \ldots, x_n), \\
f(x_1, \ldots, x_{i-1}, \lambda x, x_{i+1}, \ldots, x_n) &= \lambda f(x_1, \ldots, x_{i-1}, x, x_{i+1}, \ldots, x_n).
\end{align*}
When $F = K$, we call $f$ an $n$-linear form (or multilinear form). If $n \ge 2$ and $E_1 = E_2 = \cdots = E_n$, an $n$-linear map $f\colon E \times \cdots \times E \to F$ is called symmetric, if $f(x_1, \ldots, x_n) = f(x_{\pi(1)}, \ldots, x_{\pi(n)})$, for every permutation $\pi$ on $\{1, \ldots, n\}$. An $n$-linear map $f\colon E \times \cdots \times E \to F$ is called alternating, if $f(x_1, \ldots, x_n) = 0$ whenever $x_i = x_{i+1}$, for some $i$, $1 \le i \le n - 1$ (in other words, when two adjacent arguments are equal). It does no harm to agree that when $n = 1$, a linear map is considered to be both symmetric and alternating, and we will do so.
When $n = 2$, a 2-linear map $f\colon E_1 \times E_2 \to F$ is called a bilinear map. We have already seen several examples of bilinear maps. Multiplication $\cdot\colon K \times K \to K$ is a bilinear map, treating $K$ as a vector space over itself. More generally, multiplication $\cdot\colon A \times A \to A$ in a ring $A$ is a bilinear map, viewing $A$ as a module over itself.
The operation $\langle -, - \rangle\colon E^* \times E \to K$ applying a linear form to a vector is a bilinear map.
Symmetric bilinear maps (and multilinear maps) play an important role in geometry (inner products, quadratic forms), and in differential calculus (partial derivatives).
A bilinear map is symmetric if $f(u, v) = f(v, u)$, for all $u, v \in E$.
Alternating multilinear maps satisfy the following simple but crucial properties.
Proposition 5.3. Let $f\colon E \times \cdots \times E \to F$ be an $n$-linear alternating map, with $n \ge 2$. The following properties hold:
(1)
\[ f(\ldots, x_i, x_{i+1}, \ldots) = -f(\ldots, x_{i+1}, x_i, \ldots) \]
(2)
\[ f(\ldots, x_i, \ldots, x_j, \ldots) = 0, \]
where $x_i = x_j$, and $1 \le i < j \le n$.
(3)
\[ f(\ldots, x_i, \ldots, x_j, \ldots) = -f(\ldots, x_j, \ldots, x_i, \ldots), \]
where $1 \le i < j \le n$.
(4)
\[ f(\ldots, x_i, \ldots) = f(\ldots, x_i + \lambda x_j, \ldots), \]
for any $\lambda \in K$, and where $i \neq j$.
Proof. (1) By multilinearity applied twice, we have
\begin{align*}
f(\ldots, x_i + x_{i+1}, x_i + x_{i+1}, \ldots) &= f(\ldots, x_i, x_i, \ldots) + f(\ldots, x_i, x_{i+1}, \ldots) \\
&\quad + f(\ldots, x_{i+1}, x_i, \ldots) + f(\ldots, x_{i+1}, x_{i+1}, \ldots),
\end{align*}
and since $f$ is alternating, this yields
\[ 0 = f(\ldots, x_i, x_{i+1}, \ldots) + f(\ldots, x_{i+1}, x_i, \ldots), \]
that is, $f(\ldots, x_i, x_{i+1}, \ldots) = -f(\ldots, x_{i+1}, x_i, \ldots)$.
(2) If $x_i = x_j$ and $i$ and $j$ are not adjacent, we can interchange $x_i$ and $x_{i+1}$, and then $x_i$ and $x_{i+2}$, etc., until $x_i$ and $x_j$ become adjacent. By (1),
\[ f(\ldots, x_i, \ldots, x_j, \ldots) = \epsilon f(\ldots, x_i, x_j, \ldots), \]
where $\epsilon = +1$ or $-1$, but $f(\ldots, x_i, x_j, \ldots) = 0$, since $x_i = x_j$, and (2) holds.
(3) follows from (2) as in (1). (4) is an immediate consequence of (2).
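Properties (1) to (4) can be observed on a familiar 3-linear alternating form, the scalar triple product on $K^3$. This example is not taken from the text; the function `f` and the sample vectors are hypothetical choices for illustration.

```python
def f(x, y, z):
    """Scalar triple product x . (y x z), a 3-linear alternating form on K^3."""
    return (x[0] * (y[1] * z[2] - y[2] * z[1])
            - x[1] * (y[0] * z[2] - y[2] * z[0])
            + x[2] * (y[0] * z[1] - y[1] * z[0]))

x, y, z = [1, 2, 3], [0, 1, 4], [5, 6, 0]
lam = 7

value = f(x, y, z)
swapped_adjacent = f(y, x, z)        # property (1): sign changes
equal_nonadjacent = f(x, y, x)       # property (2): vanishes
swapped_nonadjacent = f(z, y, x)     # property (3): sign changes
shifted = f([a + lam * b for a, b in zip(x, z)], y, z)   # property (4): unchanged
```

Adding a multiple of one argument to another, as in the last line, is the operation that later justifies row and column manipulations of determinants.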
Proposition 5.3 will now be used to show a fundamental property of alternating multilinear maps. First, we need to extend the matrix notation a little bit. Let $E$ be a vector space over $K$. Given an $n \times n$ matrix $A = (a_{i j})$ over $K$, we can define a map $L(A)\colon E^n \to E^n$ as follows:
\begin{align*}
L(A)_1(u) &= a_{1 1} u_1 + \cdots + a_{1 n} u_n, \\
&\ \ \vdots \\
L(A)_n(u) &= a_{n 1} u_1 + \cdots + a_{n n} u_n,
\end{align*}
for all $u_1, \ldots, u_n \in E$, with $u = (u_1, \ldots, u_n)$. It is immediately verified that $L(A)$ is linear. Then, given two $n \times n$ matrices $A = (a_{i j})$ and $B = (b_{i j})$, by repeating the calculations establishing the product of matrices (just before Definition 3.1), we can show that
\[ L(AB) = L(A) \circ L(B). \]
It is then convenient to use the matrix notation to describe the effect of the linear map $L(A)$, as
\[ \begin{pmatrix} L(A)_1(u) \\ L(A)_2(u) \\ \vdots \\ L(A)_n(u) \end{pmatrix} = \begin{pmatrix} a_{1 1} & a_{1 2} & \ldots & a_{1 n} \\ a_{2 1} & a_{2 2} & \ldots & a_{2 n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n 1} & a_{n 2} & \ldots & a_{n n} \end{pmatrix} \begin{pmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{pmatrix}. \]
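Lemma 5.4. Let $f\colon E \times \cdots \times E \to F$ be an $n$-linear alternating map. Let $(u_1, \ldots, u_n)$ and $(v_1, \ldots, v_n)$ be two families of $n$ vectors, such that
\begin{align*}
v_1 &= a_{1 1} u_1 + \cdots + a_{n 1} u_n, \\
&\ \ \vdots \\
v_n &= a_{1 n} u_1 + \cdots + a_{n n} u_n.
\end{align*}
Equivalently, letting
\[ A = \begin{pmatrix} a_{1 1} & a_{1 2} & \ldots & a_{1 n} \\ a_{2 1} & a_{2 2} & \ldots & a_{2 n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n 1} & a_{n 2} & \ldots & a_{n n} \end{pmatrix}, \]
assume that we have
\[ \begin{pmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{pmatrix} = A^\top \begin{pmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{pmatrix}. \]
Then,
\[ f(v_1, \ldots, v_n) = \Bigl( \sum_{\pi \in S_n} \epsilon(\pi)\, a_{\pi(1)\, 1} \cdots a_{\pi(n)\, n} \Bigr) f(u_1, \ldots, u_n), \]
where the sum ranges over all permutations $\pi$ on $\{1, \ldots, n\}$.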
The quantity
\[ \det(A) = \sum_{\pi \in S_n} \epsilon(\pi)\, a_{\pi(1)\, 1} \cdots a_{\pi(n)\, n} \]
is in fact the value of the determinant of $A$ (which, as we shall see shortly, is also equal to the determinant of $A^\top$). However, working directly with the above definition is quite awkward, and we will proceed via a slightly indirect route.
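Although awkward for proofs and hopelessly slow for large $n$ (the sum has $n!$ terms), the formula above is easy to evaluate directly on small matrices. The following is a minimal sketch with hypothetical helper names `signature` and `det_leibniz`; indices are 0-based, and the signature is computed as $(-1)^{n-r}$ from the number $r$ of orbits, as in Definition 5.2.

```python
from itertools import permutations

def signature(perm):
    """(-1)^(n - r), where r is the number of orbits of perm (0-based tuple)."""
    n, seen, r = len(perm), set(), 0
    for i in range(n):
        if i not in seen:
            r += 1
            j = i
            while j not in seen:
                seen.add(j)
                j = perm[j]
    return (-1) ** (n - r)

def det_leibniz(a):
    """Sum over all permutations pi of sgn(pi) * a[pi(0)][0] * ... * a[pi(n-1)][n-1]."""
    n = len(a)
    total = 0
    for perm in permutations(range(n)):
        term = signature(perm)
        for j in range(n):
            term *= a[perm[j]][j]
        total += term
    return total

A = [[1, 2, 3],
     [0, 1, 4],
     [5, 6, 0]]
A_transposed = [list(col) for col in zip(*A)]
```

On this example the value for `A` and for its transpose coincide, in line with the remark that $\det(A) = \det(A^\top)$.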
5.3
Definition of a Determinant
Recall that the set of all square $n \times n$-matrices with coefficients in a field $K$ is denoted by $M_n(K)$.
Definition 5.4. A determinant is defined as any map
\[ D \colon M_n(K) \to K, \]
which, when viewed as a map on $(K^n)^n$, i.e., a map of the $n$ columns of a matrix, is $n$-linear alternating and such that $D(I_n) = 1$ for the identity matrix $I_n$. Equivalently, we can consider a vector space $E$ of dimension $n$, some fixed basis $(e_1, \ldots, e_n)$, and define
\[ D \colon E^n \to K \]
as an $n$-linear alternating map such that $D(e_1, \ldots, e_n) = 1$.
First, we will show that such maps $D$ exist, using an inductive definition that also gives a recursive method for computing determinants. Actually, we will define a family $(\mathcal{D}_n)_{n \ge 1}$ of (finite) sets of maps $D \colon M_n(K) \to K$. Second, we will show that determinants are in fact uniquely defined, that is, we will show that each $\mathcal{D}_n$ consists of a single map. This will show the equivalence of the direct definition $\det(A)$ of Lemma 5.4 with the inductive definition $D(A)$. Finally, we will prove some basic properties of determinants, using the uniqueness theorem.
Given a matrix $A \in M_n(K)$, we denote its $n$ columns by $A^1, \ldots, A^n$.
Definition 5.5. For every $n \ge 1$, we define a finite set $\mathcal{D}_n$ of maps $D \colon M_n(K) \to K$ inductively as follows:
When $n = 1$, $\mathcal{D}_1$ consists of the single map $D$ such that $D(A) = a$, where $A = (a)$, with $a \in K$.
Assume that $\mathcal{D}_{n-1}$ has been defined, where $n \ge 2$. We define the set $\mathcal{D}_n$ as follows. For every matrix $A \in M_n(K)$, let $A_{ij}$ be the $(n-1) \times (n-1)$-matrix obtained from $A = (a_{ij})$ by deleting row $i$ and column $j$. Then, $\mathcal{D}_n$ consists of all the maps $D$ such that, for some $i$, $1 \le i \le n$,
\[ D(A) = (-1)^{i+1} a_{i1} D(A_{i1}) + \cdots + (-1)^{i+n} a_{in} D(A_{in}), \]
where for every $j$, $1 \le j \le n$, $D(A_{ij})$ is the result of applying any $D$ in $\mathcal{D}_{n-1}$ to $A_{ij}$.
We confess that the use of the same letter $D$ for the member of $\mathcal{D}_n$ being defined, and for members of $\mathcal{D}_{n-1}$, may be slightly confusing. We considered using subscripts to distinguish, but this seems to complicate things unnecessarily. One should not worry too much anyway, since it will turn out that each $\mathcal{D}_n$ contains just one map.
Each $(-1)^{i+j} D(A_{ij})$ is called the cofactor of $a_{ij}$, and the inductive expression for $D(A)$ is called a Laplace expansion of $D$ according to the $i$-th row. Given a matrix $A \in M_n(K)$, each $D(A)$ is called a determinant of $A$.
We can think of each member of $\mathcal{D}_n$ as an algorithm to evaluate the determinant of $A$. The main point is that these algorithms, which recursively evaluate a determinant using all possible Laplace row expansions, all yield the same result, $\det(A)$.
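We will prove shortly that $D(A)$ is uniquely defined (at the moment, it is not clear that $\mathcal{D}_n$ consists of a single map). Assuming this fact, given an $n \times n$-matrix $A = (a_{ij})$,
\[
A = \begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
a_{n1} & a_{n2} & \cdots & a_{nn}
\end{pmatrix},
\]
its determinant is denoted by $D(A)$ or $\det(A)$, or more explicitly by
\[
\det(A) = \begin{vmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
a_{n1} & a_{n2} & \cdots & a_{nn}
\end{vmatrix}.
\]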
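For a $3 \times 3$ matrix
\[
A = \begin{pmatrix}
a_{11} & a_{12} & a_{13} \\
a_{21} & a_{22} & a_{23} \\
a_{31} & a_{32} & a_{33}
\end{pmatrix},
\]
expanding according to the first row, we have
\[
D(A) = a_{11} \begin{vmatrix} a_{22} & a_{23} \\ a_{32} & a_{33} \end{vmatrix}
- a_{12} \begin{vmatrix} a_{21} & a_{23} \\ a_{31} & a_{33} \end{vmatrix}
+ a_{13} \begin{vmatrix} a_{21} & a_{22} \\ a_{31} & a_{32} \end{vmatrix},
\]
that is,
\[ D(A) = a_{11}(a_{22}a_{33} - a_{32}a_{23}) - a_{12}(a_{21}a_{33} - a_{31}a_{23}) + a_{13}(a_{21}a_{32} - a_{31}a_{22}), \]
which gives the explicit formula
\[ D(A) = a_{11}a_{22}a_{33} + a_{21}a_{32}a_{13} + a_{31}a_{12}a_{23} - a_{11}a_{32}a_{23} - a_{21}a_{12}a_{33} - a_{31}a_{22}a_{13}. \]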
Theorem 5.6. For every $n \ge 1$, for every $D \in \mathcal{D}_n$, for every matrix $A \in M_n(K)$, we have
\[ D(A) = \sum_{\pi \in S_n} \epsilon(\pi)\, a_{\pi(1)\,1} \cdots a_{\pi(n)\,n}, \]
where the sum ranges over all permutations $\pi$ on $\{1, \ldots, n\}$. As a consequence, $\mathcal{D}_n$ consists of a single map for every $n \ge 1$, and this map is given by the above explicit formula.
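The agreement claimed by Theorem 5.6 can be checked numerically on small matrices. The sketch below is only an illustration, not part of the text's development: the function names `sign`, `det_leibniz` and `det_laplace` are hypothetical, indices are 0-based, and the permutation sum is exponential in $n$, so it is suitable for tiny examples only.

```python
from itertools import permutations


def sign(perm):
    """Signature of a permutation given as a tuple of 0-based images."""
    n = len(perm)
    inversions = sum(
        1 for i in range(n) for j in range(i + 1, n) if perm[i] > perm[j]
    )
    return -1 if inversions % 2 else 1


def det_leibniz(a):
    """Sum over all permutations pi of sign(pi) * a[pi(0)][0] * ... * a[pi(n-1)][n-1]."""
    n = len(a)
    total = 0
    for perm in permutations(range(n)):
        term = sign(perm)
        for col in range(n):
            term *= a[perm[col]][col]
        total += term
    return total


def det_laplace(a, row=0):
    """Laplace expansion along the given row (sub-determinants use row 0)."""
    n = len(a)
    if n == 1:
        return a[0][0]
    total = 0
    for j in range(n):
        # Delete the chosen row and column j to form the (n-1) x (n-1) matrix.
        minor = [
            [a[r][c] for c in range(n) if c != j] for r in range(n) if r != row
        ]
        total += (-1) ** (row + j) * a[row][j] * det_laplace(minor)
    return total
```

Expanding along any row should return the same value as the permutation sum, which is the content of the uniqueness statement.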
Proof. Consider the standard basis $(e_1, \ldots, e_n)$ of $K^n$, where $(e_i)_i = 1$ and $(e_i)_j = 0$, for $j \ne i$. Then, each column $A^j$ of $A$ corresponds to a vector $v_j$ whose coordinates over the basis $(e_1, \ldots, e_n)$ are the components of $A^j$, that is, we can write
\[
\begin{aligned}
v_1 &= a_{11} e_1 + \cdots + a_{n1} e_n, \\
&\;\;\vdots \\
v_n &= a_{1n} e_1 + \cdots + a_{nn} e_n.
\end{aligned}
\]
Since by Lemma 5.5, each $D$ is a multilinear alternating map, by applying Lemma 5.4, we get
\[ D(A) = D(v_1, \ldots, v_n) = \Bigl( \sum_{\pi \in S_n} \epsilon(\pi)\, a_{\pi(1)\,1} \cdots a_{\pi(n)\,n} \Bigr) D(e_1, \ldots, e_n), \]
where the sum ranges over all permutations $\pi$ on $\{1, \ldots, n\}$. But $D(e_1, \ldots, e_n) = D(I_n)$, and by Lemma 5.5, we have $D(I_n) = 1$. Thus,
\[ D(A) = \sum_{\pi \in S_n} \epsilon(\pi)\, a_{\pi(1)\,1} \cdots a_{\pi(n)\,n}, \]
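\[ \sum_{\pi \in S_n} \epsilon(\pi)\, a_{\pi(1)\,1} \cdots a_{\pi(n)\,n}, \]
where the sum ranges over all permutations $\pi$ on $\{1, \ldots, n\}$. Since a permutation is invertible, every product
\[ a_{\pi(1)\,1} \cdots a_{\pi(n)\,n} \]
can be rewritten as
\[ a_{1\,\pi^{-1}(1)} \cdots a_{n\,\pi^{-1}(n)}, \]
and since $\epsilon(\pi^{-1}) = \epsilon(\pi)$ and the sum is taken over all permutations on $\{1, \ldots, n\}$, we have
\[ \sum_{\pi \in S_n} \epsilon(\pi)\, a_{\pi(1)\,1} \cdots a_{\pi(n)\,n} = \sum_{\sigma \in S_n} \epsilon(\sigma)\, a_{1\,\sigma(1)} \cdots a_{n\,\sigma(n)}, \]
where $\pi$ and $\sigma$ range over all permutations. But it is immediately verified that
\[ \det(A^\top) = \sum_{\sigma \in S_n} \epsilon(\sigma)\, a_{1\,\sigma(1)} \cdots a_{n\,\sigma(n)}. \]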
A useful consequence of Corollary 5.7 is that the determinant of a matrix is also a multilinear alternating map of its rows. This fact, combined with the fact that the determinant of
a matrix is a multilinear alternating map of its columns is often useful for finding short-cuts
in computing determinants. We illustrate this point on the following example which shows
up in polynomial interpolation.
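Example 5.2. Consider the so-called Vandermonde determinant
\[
V(x_1, \ldots, x_n) = \begin{vmatrix}
1 & 1 & \cdots & 1 \\
x_1 & x_2 & \cdots & x_n \\
x_1^2 & x_2^2 & \cdots & x_n^2 \\
\vdots & \vdots & \ddots & \vdots \\
x_1^{n-1} & x_2^{n-1} & \cdots & x_n^{n-1}
\end{vmatrix}.
\]
We claim that
\[ V(x_1, \ldots, x_n) = \prod_{1 \le i < j \le n} (x_j - x_i), \]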
etc, multiply row $i - 1$ by $x_1$ and subtract it from row $i$, until we reach row 1. We obtain the following determinant:
\[
V(x_1, \ldots, x_n) = \begin{vmatrix}
1 & 1 & \cdots & 1 \\
0 & x_2 - x_1 & \cdots & x_n - x_1 \\
0 & x_2(x_2 - x_1) & \cdots & x_n(x_n - x_1) \\
\vdots & \vdots & \ddots & \vdots \\
0 & x_2^{n-2}(x_2 - x_1) & \cdots & x_n^{n-2}(x_n - x_1)
\end{vmatrix}.
\]
Now, expanding this determinant according to the first column and using multilinearity, we can factor $(x_i - x_1)$ from the column of index $i - 1$ of the matrix obtained by deleting the first row and the first column, and thus
\[ V(x_1, \ldots, x_n) = (x_2 - x_1)(x_3 - x_1) \cdots (x_n - x_1) V(x_2, \ldots, x_n), \]
which establishes the induction step.
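As a sanity check on the product formula, the following sketch compares a directly computed Vandermonde determinant with $\prod_{i<j}(x_j - x_i)$ for a few sample points. The helper names are hypothetical, exact rational arithmetic is assumed via `fractions.Fraction`, and the determinant is computed by a naive first-row Laplace expansion, which is adequate only for small $n$.

```python
from fractions import Fraction
from itertools import combinations


def det(a):
    """Determinant by Laplace expansion along the first row (small n only)."""
    if len(a) == 1:
        return a[0][0]
    total = 0
    for j in range(len(a)):
        minor = [row[:j] + row[j + 1:] for row in a[1:]]
        total += (-1) ** j * a[0][j] * det(minor)
    return total


def vandermonde_matrix(xs):
    """Row i (0-based) holds the i-th powers x_1^i, ..., x_n^i."""
    return [[Fraction(x) ** i for x in xs] for i in range(len(xs))]


def vandermonde_product(xs):
    """Product of (x_j - x_i) over all index pairs i < j."""
    result = Fraction(1)
    for i, j in combinations(range(len(xs)), 2):
        result *= Fraction(xs[j]) - Fraction(xs[i])
    return result
```

Two equal sample points give a repeated column, so both sides vanish, consistent with the alternating property.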
Lemma 5.4 can be reformulated nicely as follows.
Proposition 5.8. Let $f \colon E \times \cdots \times E \to F$ be an $n$-linear alternating map. Let $(u_1, \ldots, u_n)$ and $(v_1, \ldots, v_n)$ be two families of $n$ vectors, such that
\[
\begin{aligned}
v_1 &= a_{11} u_1 + \cdots + a_{1n} u_n, \\
&\;\;\vdots \\
v_n &= a_{n1} u_1 + \cdots + a_{nn} u_n.
\end{aligned}
\]
Equivalently, letting
\[
A = \begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
a_{n1} & a_{n2} & \cdots & a_{nn}
\end{pmatrix}
\]
As a consequence, we get the very useful property that the determinant of a product of
matrices is the product of the determinants of these matrices.
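Proposition 5.9. For any two $n \times n$-matrices $A$ and $B$, we have $\det(AB) = \det(A) \det(B)$.
Proof. We use Proposition 5.8 as follows: let $(e_1, \ldots, e_n)$ be the standard basis of $K^n$, and let
\[
\begin{pmatrix} w_1 \\ w_2 \\ \vdots \\ w_n \end{pmatrix}
= AB \begin{pmatrix} e_1 \\ e_2 \\ \vdots \\ e_n \end{pmatrix}.
\]
Then, we get
\[ \det(w_1, \ldots, w_n) = \det(AB) \det(e_1, \ldots, e_n) = \det(AB), \]
since $\det(e_1, \ldots, e_n) = 1$. Now, letting
\[
\begin{pmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{pmatrix}
= B \begin{pmatrix} e_1 \\ e_2 \\ \vdots \\ e_n \end{pmatrix},
\]
we get
\[ \det(v_1, \ldots, v_n) = \det(B), \]
and since
\[
\begin{pmatrix} w_1 \\ w_2 \\ \vdots \\ w_n \end{pmatrix}
= A \begin{pmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{pmatrix},
\]
we get
\[ \det(w_1, \ldots, w_n) = \det(A) \det(v_1, \ldots, v_n) = \det(A) \det(B). \]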
It should be noted that all the results of this section, up to now, also hold when $K$ is a commutative ring, and not necessarily a field. We can now characterize when an $n \times n$-matrix $A$ is invertible in terms of its determinant $\det(A)$.
5.4 Inverse Matrices and Determinants
In the next two sections, $K$ is a commutative ring and when needed, a field.
Definition 5.6. Let $K$ be a commutative ring. Given a matrix $A \in M_n(K)$, let $\widetilde{A} = (b_{ij})$ be the matrix defined such that
\[ b_{ij} = (-1)^{i+j} \det(A_{ji}), \]
the cofactor of $a_{ji}$. The matrix $\widetilde{A}$ is called the adjugate of $A$, and each matrix $A_{ji}$ is called a minor of the matrix $A$.
If $j \ne i$, we can form the matrix $A'$ by replacing the $j$-th row of $A$ by the $i$-th row of $A$. Now, the matrix $A_{jk}$ obtained by deleting row $j$ and column $k$ from $A$ is equal to the matrix $A'_{jk}$ obtained by deleting row $j$ and column $k$ from $A'$, since $A$ and $A'$ only differ by the $j$-th row. Thus,
\[ \det(A_{jk}) = \det(A'_{jk}), \]
and we have
\[ c_{ij} = a_{i1} (-1)^{j+1} \det(A'_{j1}) + \cdots + a_{in} (-1)^{j+n} \det(A'_{jn}). \]
However, this is the expansion of $\det(A')$ according to the $j$-th row, since the $j$-th row of $A'$ is equal to the $i$-th row of $A$, and since $A'$ has two identical rows $i$ and $j$, because $\det$ is an alternating map of the rows (see an earlier remark), we have $\det(A') = 0$. Thus, we have shown that $c_{ii} = \det(A)$, and $c_{ij} = 0$, when $j \ne i$, and so
\[ A \widetilde{A} = \det(A) I_n. \]
It is also obvious from the definition of $\widetilde{A}$ that
\[ \widetilde{A}^\top = \widetilde{A^\top}. \]
Then, applying the first part of the argument to $A^\top$, we have
\[ A^\top \widetilde{A^\top} = \det(A^\top) I_n, \]
and since $\det(A^\top) = \det(A)$, $\widetilde{A^\top} = \widetilde{A}^\top$, and $(\widetilde{A} A)^\top = A^\top \widetilde{A}^\top$, we get
\[ \det(A) I_n = A^\top \widetilde{A^\top} = A^\top \widetilde{A}^\top = (\widetilde{A} A)^\top, \]
that is,
\[ (\widetilde{A} A)^\top = \det(A) I_n, \]
which yields
\[ \widetilde{A} A = \det(A) I_n, \]
since $I_n^\top = I_n$. This proves that
\[ A \widetilde{A} = \widetilde{A} A = \det(A) I_n. \]
As a consequence, if $\det(A)$ is invertible, we have $A^{-1} = (\det(A))^{-1} \widetilde{A}$. Conversely, if $A$ is invertible, from $A A^{-1} = I_n$, by Proposition 5.9, we have $\det(A) \det(A^{-1}) = 1$, and $\det(A)$ is invertible.
When $K$ is a field, an element $a \in K$ is invertible iff $a \ne 0$. In this case, the second part of the proposition can be stated as $A$ is invertible iff $\det(A) \ne 0$. Note in passing that this method of computing the inverse of a matrix is usually not practical.
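Although impractical for large matrices, the adjugate identity is easy to verify on small ones. The following is a minimal sketch with hypothetical helper names, using 0-based indices and exact arithmetic; it assumes the determinant is computed by first-row Laplace expansion, with the determinant of the empty matrix taken to be 1 so that the $1 \times 1$ case works.

```python
from fractions import Fraction


def det(a):
    """Laplace expansion along the first row; det of the 0 x 0 matrix is 1."""
    if not a:
        return 1
    total = 0
    for j in range(len(a)):
        minor = [row[:j] + row[j + 1:] for row in a[1:]]
        total += (-1) ** j * a[0][j] * det(minor)
    return total


def delete_row_col(a, i, j):
    """Matrix obtained from a by deleting row i and column j."""
    return [row[:j] + row[j + 1:] for r, row in enumerate(a) if r != i]


def adjugate(a):
    """Entry (i, j) is (-1)^(i+j) det(A_ji): note the transposed indices."""
    n = len(a)
    return [
        [(-1) ** (i + j) * det(delete_row_col(a, j, i)) for j in range(n)]
        for i in range(n)
    ]


def matmul(a, b):
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def inverse(a):
    """A^{-1} = det(A)^{-1} * adjugate(A), valid when det(A) is nonzero."""
    d = det(a)
    if d == 0:
        raise ValueError("matrix is not invertible")
    return [[Fraction(entry, d) for entry in row] for row in adjugate(a)]
```

Both products `matmul(A, adjugate(A))` and `matmul(adjugate(A), A)` should equal $\det(A) I_n$, even when $\det(A) = 0$.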
We now consider some applications of determinants to linear independence and to solving
systems of linear equations. Although these results hold for matrices over an integral domain,
their proofs require more sophisticated methods (it is necessary to use the fraction field of
the integral domain, K). Therefore, we assume again that K is a field.
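Let $A$ be an $n \times n$-matrix, $x$ a column vector of variables, and $b$ another column vector, and let $A^1, \ldots, A^n$ denote the columns of $A$. Observe that the system of equations $Ax = b$,
\[
\begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
a_{n1} & a_{n2} & \cdots & a_{nn}
\end{pmatrix}
\begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}
=
\begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{pmatrix}
\]
is equivalent to
\[ x_1 A^1 + \cdots + x_j A^j + \cdots + x_n A^n = b, \]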
Proof. First, assume that the columns $A^1, \ldots, A^n$ of $A$ are linearly dependent. Then, there are $x_1, \ldots, x_n \in K$, such that
\[ x_1 A^1 + \cdots + x_j A^j + \cdots + x_n A^n = 0, \]
\[ \det(A^1, \ldots, x_1 A^1 + \cdots + x_j A^j + \cdots + x_n A^n, \ldots, A^n) = \det(A^1, \ldots, 0, \ldots, A^n) = 0, \]
where $0$ occurs in the $j$-th position, by multilinearity, all terms containing two identical columns $A^k$ for $k \ne j$ vanish, and we get
\[ x_j \det(A^1, \ldots, A^n) = 0. \]
If we combine Proposition 5.11 with Proposition 4.30, we obtain the following criterion
for finding the rank of a matrix.
Proposition 5.12. Given any $m \times n$ matrix $A$ over a field $K$ (typically $K = \mathbb{R}$ or $K = \mathbb{C}$), the rank of $A$ is the maximum natural number $r$ such that there is an $r \times r$ submatrix $B$ of $A$ obtained by selecting $r$ rows and $r$ columns of $A$, and such that $\det(B) \ne 0$.
5.5 Systems of Linear Equations and Determinants
We now characterize when a system of linear equations of the form Ax = b has a unique
solution.
Proposition 5.13. Given an $n \times n$-matrix $A$ over a field $K$, the following properties hold:
(1) For every column vector $b$, there is a unique column vector $x$ such that $Ax = b$ iff the only solution to $Ax = 0$ is the trivial vector $x = 0$, iff $\det(A) \ne 0$.
(2) If $\det(A) \ne 0$, the unique solution of $Ax = b$ is given by the expressions
xj =
5.6 Determinant of a Linear Map
5.7 The Cayley-Hamilton Theorem
We conclude this chapter with an interesting and important application of Proposition 5.10, the Cayley-Hamilton theorem. The results of this section apply to matrices over any commutative ring $K$. First, we need the concept of the characteristic polynomial of a matrix.
Definition 5.8. If $K$ is any commutative ring, for every $n \times n$ matrix $A \in M_n(K)$, the characteristic polynomial $P_A(X)$ of $A$ is the determinant
\[ P_A(X) = \det(XI - A). \]
\[
P_A(X) = \begin{vmatrix} X - a & -b \\ -c & X - d \end{vmatrix} = X^2 - (a + d)X + ad - bc.
\]
We can substitute the matrix $A$ for the variable $X$ in the polynomial $P_A(X)$, obtaining a matrix $P_A$. If we write
\[ P_A(X) = X^n + c_1 X^{n-1} + \cdots + c_n, \]
then
\[ P_A = A^n + c_1 A^{n-1} + \cdots + c_n I. \]
We have the following remarkable theorem.
Theorem 5.15. (Cayley-Hamilton) If $K$ is any commutative ring, for every $n \times n$ matrix $A \in M_n(K)$, if we let
\[ P_A(X) = X^n + c_1 X^{n-1} + \cdots + c_n \]
be the characteristic polynomial of $A$, then
\[ P_A = A^n + c_1 A^{n-1} + \cdots + c_n I = 0. \]
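The theorem can be checked on concrete rational matrices. The sketch below is an illustration under explicit assumptions: the coefficients $c_1, \ldots, c_n$ are obtained with the Faddeev-LeVerrier recurrence, a technique swapped in here for convenience (it is not the method of the text and requires division by $1, \ldots, n$, so it does not apply over an arbitrary commutative ring), and all function names are hypothetical.

```python
from fractions import Fraction


def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def identity(n):
    return [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]


def add_scalar(m, c):
    """Return m + c * I."""
    return [[m[i][j] + (c if i == j else 0) for j in range(len(m))] for i in range(len(m))]


def char_poly_coeffs(a):
    """Return [c_1, ..., c_n] with P_A(X) = X^n + c_1 X^(n-1) + ... + c_n.

    Faddeev-LeVerrier: M_1 = I, c_k = -trace(A M_k) / k, M_(k+1) = A M_k + c_k I.
    """
    n = len(a)
    m = identity(n)
    coeffs = []
    for k in range(1, n + 1):
        am = matmul(a, m)
        c = -Fraction(sum(am[i][i] for i in range(n)), k)
        coeffs.append(c)
        m = add_scalar(am, c)
    return coeffs


def evaluate_at_matrix(a, coeffs):
    """Horner evaluation of A^n + c_1 A^(n-1) + ... + c_n I."""
    result = identity(len(a))
    for c in coeffs:
        result = add_scalar(matmul(result, a), c)
    return result
```

For the $2 \times 2$ matrix with rows $(1, 2)$ and $(3, 4)$, the coefficients are $-(a + d) = -5$ and $ad - bc = -2$, matching the explicit $2 \times 2$ formula, and substituting the matrix gives the zero matrix.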
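Proof. We can view the matrix $B = XI - A$ as a matrix with coefficients in the polynomial ring $K[X]$, and then we can form the matrix $\widetilde{B}$ which is the transpose of the matrix of cofactors of elements of $B$. Each entry in $\widetilde{B}$ is an $(n-1) \times (n-1)$ determinant, and thus a polynomial of degree at most $n - 1$, so we can write $\widetilde{B}$ as
\[ \widetilde{B} = X^{n-1} B_0 + X^{n-2} B_1 + \cdots + B_{n-1}, \]
for some matrices $B_0, \ldots, B_{n-1}$ with coefficients in $K$. For example, when $n = 2$, we have
\[
B = \begin{pmatrix} X - a & -b \\ -c & X - d \end{pmatrix}, \qquad
\widetilde{B} = \begin{pmatrix} X - d & b \\ c & X - a \end{pmatrix}
= X \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} + \begin{pmatrix} -d & b \\ c & -a \end{pmatrix}.
\]
By Proposition 5.10, we have
\[ B \widetilde{B} = \det(B) I = P_A(X) I. \]
On the other hand, we have
\[ B \widetilde{B} = (XI - A)(X^{n-1} B_0 + X^{n-2} B_1 + \cdots + X^{n-j-1} B_j + \cdots + B_{n-1}), \]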
i = 1, . . . , n,
then this means that $p(f)(e_i) = 0$ for $i = 1, \ldots, n$, which means that the linear map $p(f)$ vanishes on $E$. We can also check, as we did in Section 5.2, that if $A$ and $B$ are two $n \times n$ matrices and if $(u_1, \ldots, u_n)$ are any $n$ vectors, then
\[
A \cdot \left( B \cdot \begin{pmatrix} u_1 \\ \vdots \\ u_n \end{pmatrix} \right)
= (AB) \cdot \begin{pmatrix} u_1 \\ \vdots \\ u_n \end{pmatrix}.
\]
This suggests the plan of attack for our second proof of the Cayley-Hamilton theorem. For simplicity, we prove the theorem for vector spaces over a field. The proof goes through for a free module over a commutative ring.
Theorem 5.16. (Cayley-Hamilton) For every finite-dimensional vector space $E$ over a field $K$, for every linear map $f \colon E \to E$, for every basis $(e_1, \ldots, e_n)$, if $A$ is the matrix of $f$ over the basis $(e_1, \ldots, e_n)$ and if
\[ P_A(X) = X^n + c_1 X^{n-1} + \cdots + c_n \]
is the characteristic polynomial of $A$, then
\[ P_A(f) = f^n + c_1 f^{n-1} + \cdots + c_n \mathrm{id} = 0. \]
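Proof. Since the columns of $A$ consist of the vectors $f(e_j)$ expressed over the basis $(e_1, \ldots, e_n)$, we have
\[ f(e_j) = \sum_{i=1}^{n} a_{ij} e_i, \quad 1 \le j \le n. \]
\[ X \cdot e_j = \sum_{i=1}^{n} a_{ij} e_i, \quad 1 \le j \le n, \]
which yields
\[ \sum_{i=1}^{j-1} -a_{ij} e_i + (X - a_{jj}) e_j + \sum_{i=j+1}^{n} -a_{ij} e_i = 0, \quad 1 \le j \le n. \]
Observe that the transpose of the characteristic polynomial shows up, so the above system can be written as
\[
\begin{pmatrix}
X - a_{11} & -a_{21} & \cdots & -a_{n1} \\
-a_{12} & X - a_{22} & \cdots & -a_{n2} \\
\vdots & \vdots & \ddots & \vdots \\
-a_{1n} & -a_{2n} & \cdots & X - a_{nn}
\end{pmatrix}
\cdot
\begin{pmatrix} e_1 \\ e_2 \\ \vdots \\ e_n \end{pmatrix}
=
\begin{pmatrix} 0 \\ 0 \\ \vdots \\ 0 \end{pmatrix}.
\]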
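\[
\widetilde{B} \cdot \left( B \cdot \begin{pmatrix} e_1 \\ \vdots \\ e_n \end{pmatrix} \right)
= (\widetilde{B} B) \cdot \begin{pmatrix} e_1 \\ \vdots \\ e_n \end{pmatrix}
= P_A I \cdot \begin{pmatrix} e_1 \\ \vdots \\ e_n \end{pmatrix}
= \widetilde{B} \cdot \begin{pmatrix} 0 \\ \vdots \\ 0 \end{pmatrix}
= \begin{pmatrix} 0 \\ \vdots \\ 0 \end{pmatrix};
\]
that is,
\[ P_A \cdot e_j = 0, \quad j = 1, \ldots, n, \]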
5.8 Permanents
If we drop the sign $\epsilon(\pi)$ of every permutation from the above formula, we obtain a quantity known as the permanent:
\[ \mathrm{per}(A) = \sum_{\pi \in S_n} a_{\pi(1)\,1} \cdots a_{\pi(n)\,n}. \]
Permanents and determinants were investigated as early as 1812 by Cauchy. It is clear from the above definition that the permanent is a multilinear and symmetric form. We also have
\[ \mathrm{per}(A) = \mathrm{per}(A^\top), \]
and the following unsigned version of the Laplace expansion formula:
\[ \mathrm{per}(A) = a_{i1}\, \mathrm{per}(A_{i1}) + \cdots + a_{ij}\, \mathrm{per}(A_{ij}) + \cdots + a_{in}\, \mathrm{per}(A_{in}), \]
for $i = 1, \ldots, n$. However, unlike determinants which have a clear geometric interpretation as signed volumes, permanents do not have any natural geometric interpretation. Furthermore, determinants can be evaluated efficiently, for example using the conversion to row reduced echelon form, but computing the permanent is hard.
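The definition and the unsigned Laplace expansion can be compared directly on small matrices. This is a brute-force sketch with hypothetical function names and 0-based indices; both versions take time exponential in $n$, which is in keeping with the remark that the permanent is hard to compute.

```python
from itertools import permutations


def permanent(a):
    """Sum over all permutations pi of a[pi(0)][0] * ... * a[pi(n-1)][n-1]."""
    n = len(a)
    total = 0
    for perm in permutations(range(n)):
        term = 1
        for col in range(n):
            term *= a[perm[col]][col]
        total += term
    return total


def permanent_laplace(a, row=0):
    """Unsigned Laplace expansion along the given row."""
    n = len(a)
    if n == 1:
        return a[0][0]
    total = 0
    for j in range(n):
        # A_ij: delete the chosen row and column j.
        minor = [
            [a[r][c] for c in range(n) if c != j] for r in range(n) if r != row
        ]
        total += a[row][j] * permanent_laplace(minor)
    return total


def transpose(a):
    return [list(col) for col in zip(*a)]
```

For the matrix with rows $(1, 2)$ and $(3, 4)$ the permanent is $1 \cdot 4 + 2 \cdot 3 = 10$, whereas the determinant is $-2$.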
Permanents turn out to have various combinatorial interpretations. One of these is in
terms of perfect matchings of bipartite graphs which we now discuss.
Recall that a bipartite (undirected) graph $G = (V, E)$ is a graph whose set of nodes $V$ can be partitioned into two nonempty disjoint subsets $V_1$ and $V_2$, such that every edge $e \in E$ has one endpoint in $V_1$ and one endpoint in $V_2$. An example of a bipartite graph with 14 nodes is shown in Figure 5.8; its nodes are partitioned into the two sets $\{x_1, x_2, x_3, x_4, x_5, x_6, x_7\}$ and $\{y_1, y_2, y_3, y_4, y_5, y_6, y_7\}$.
[Figure 5.8: a bipartite graph on the nodes $x_1, \ldots, x_7$ and $y_1, \ldots, y_7$.]
[Figure: a second drawing on the same nodes $x_1, \ldots, x_7$ and $y_1, \ldots, y_7$.]
Obviously, a perfect matching in a bipartite graph can exist only if its set of nodes has a partition in two blocks of equal size, say $\{x_1, \ldots, x_m\}$ and $\{y_1, \ldots, y_m\}$. Then, there is a bijection between perfect matchings and bijections $\pi \colon \{x_1, \ldots, x_m\} \to \{y_1, \ldots, y_m\}$ such that $\pi(x_i) = y_j$ iff there is an edge between $x_i$ and $y_j$.
Now, every bipartite graph $G$ with a partition of its nodes into two sets of equal size as above is represented by an $m \times m$ matrix $A = (a_{ij})$ such that $a_{ij} = 1$ iff there is an edge between $x_i$ and $y_j$, and $a_{ij} = 0$ otherwise. Using the interpretation of perfect matchings as bijections $\pi \colon \{x_1, \ldots, x_m\} \to \{y_1, \ldots, y_m\}$, we see that the permanent $\mathrm{per}(A)$ of the $(0, 1)$-matrix $A$ representing the bipartite graph $G$ counts the number of perfect matchings in $G$.
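To make the counting interpretation concrete, the sketch below counts perfect matchings by direct backtracking and compares the count with the permanent of the $(0, 1)$-matrix. The graph, the function names, and the 0-based node numbering are illustrative assumptions; the sample graph is a 6-cycle, not the graph of the figure.

```python
from itertools import permutations


def permanent(a):
    """Permanent from the definition: sum over bijections sigma of prod a[i][sigma(i)]."""
    n = len(a)
    total = 0
    for sigma in permutations(range(n)):
        term = 1
        for i in range(n):
            term *= a[i][sigma[i]]
        total += term
    return total


def count_perfect_matchings(a):
    """Match x_0, x_1, ... in order, each to a still-unused neighbor y_j."""
    m = len(a)

    def extend(i, used):
        if i == m:
            return 1  # every x_i has been matched
        return sum(
            extend(i + 1, used | {j})
            for j in range(m)
            if a[i][j] == 1 and j not in used
        )

    return extend(0, frozenset())


# Hypothetical example: x_i is joined to y_i and y_(i+1 mod 3), a cycle of length 6.
cycle6 = [
    [1, 1, 0],
    [0, 1, 1],
    [1, 0, 1],
]
```

A cycle of length 6 has exactly two perfect matchings (the two alternating edge sets), and the permanent of `cycle6` is also 2.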
In a famous paper published in 1979, Leslie Valiant proves that computing the permanent is a #P-complete problem. Such problems are suspected to be intractable. It is known that if a polynomial-time algorithm existed to solve a #P-complete problem, then we would have $P = NP$, which is believed to be very unlikely.
Another combinatorial interpretation of the permanent can be given in terms of systems of distinct representatives. Given a finite set $S$, let $(A_1, \ldots, A_n)$ be any sequence of nonempty subsets of $S$ (not necessarily distinct). A system of distinct representatives (for short SDR) of the sets $A_1, \ldots, A_n$ is a sequence of $n$ distinct elements $(a_1, \ldots, a_n)$, with $a_i \in A_i$ for $i = 1, \ldots, n$. The number of SDRs of a sequence of sets plays an important role in combinatorics. Now, if $S = \{1, 2, \ldots, n\}$ and if we associate to any sequence $(A_1, \ldots, A_n)$ of nonempty subsets of $S$ the matrix $A = (a_{ij})$ defined such that $a_{ij} = 1$ if $j \in A_i$ and $a_{ij} = 0$ otherwise, then the permanent $\mathrm{per}(A)$ counts the number of SDRs of the sets $A_1, \ldots, A_n$.
This interpretation of permanents in terms of SDRs can be used to prove bounds for the permanents of various classes of matrices. Interested readers are referred to van Lint and Wilson [113] (Chapters 11 and 12). In particular, a proof of a theorem known as Van der Waerden conjecture is given in Chapter 12. This theorem states that for any $n \times n$ matrix $A$ with nonnegative entries in which all row-sums and column-sums are 1 (doubly stochastic matrices), we have
\[ \mathrm{per}(A) \ge \frac{n!}{n^n}, \]
with equality for the matrix in which all entries are equal to $1/n$.
5.9 Further Readings
Thorough expositions of the material covered in Chapters 2-4 and 5 can be found in Strang [105, 104], Lax [71], Lang [67], Artin [4], Mac Lane and Birkhoff [73], Hoffman and Kunze [62], Bourbaki [14, 15], Van Der Waerden [112], Serre [96], Horn and Johnson [57], and Bertin [12]. These notions of linear algebra are nicely put to use in classical geometry, see Berger [8, 9], Tisseron [109] and Dieudonné [28].
Chapter 6
Gaussian Elimination, LU-Factorization, Cholesky Factorization, Reduced Row Echelon Form
6.1
Curve interpolation is a problem that arises frequently in computer graphics and in robotics (path planning). There are many ways of tackling this problem and in this section we will describe a solution using cubic splines. Such splines consist of cubic Bezier curves. They are often used because they are cheap to implement and give more flexibility than quadratic Bezier curves.
A cubic Bezier curve $C(t)$ (in $\mathbb{R}^2$ or $\mathbb{R}^3$) is specified by a list of four control points $(b_0, b_1, b_2, b_3)$ and is given parametrically by the equation
\[ C(t) = (1 - t)^3\, b_0 + 3(1 - t)^2 t\, b_1 + 3(1 - t) t^2\, b_2 + t^3\, b_3. \]
Clearly, $C(0) = b_0$, $C(1) = b_3$, and for $t \in [0, 1]$, the point $C(t)$ belongs to the convex hull of the control points $b_0, b_1, b_2, b_3$. The polynomials
\[ (1 - t)^3, \quad 3(1 - t)^2 t, \quad 3(1 - t) t^2, \quad t^3 \]
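The parametric equation translates directly into code. The following is a minimal sketch with hypothetical function names, using exact rational arithmetic so that the endpoint and convexity properties can be checked without rounding; the control points are arbitrary sample values.

```python
from fractions import Fraction


def cubic_weights(t):
    """The four polynomials (1-t)^3, 3(1-t)^2 t, 3(1-t) t^2, t^3 evaluated at t."""
    s = 1 - t
    return (s ** 3, 3 * s ** 2 * t, 3 * s * t ** 2, t ** 3)


def bezier_point(control_points, t):
    """Evaluate C(t) for control points (b0, b1, b2, b3) given as coordinate tuples."""
    weights = cubic_weights(t)
    dim = len(control_points[0])
    return tuple(
        sum(w * p[k] for w, p in zip(weights, control_points)) for k in range(dim)
    )


# Arbitrary sample control points in the plane.
sample = [(0, 0), (1, 2), (3, 2), (4, 0)]
```

The weights are nonnegative on $[0, 1]$ and sum to 1, which is why $C(t)$ stays in the convex hull of the control points.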
Figure 6.2: A Bezier curve with an inflexion point
and $d_{N+1} = x_N$
Figure 6.4: A $C^2$ cubic interpolation spline curve passing through the points $x_0, x_1, x_2, x_3, x_4, x_5, x_6, x_7$
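It can be shown that $d_1, \ldots, d_{N-1}$ are given by the linear system
\[
\begin{pmatrix}
\frac{7}{2} & 1 & & & \\
1 & 4 & 1 & & 0 \\
 & \ddots & \ddots & \ddots & \\
0 & & 1 & 4 & 1 \\
 & & & 1 & \frac{7}{2}
\end{pmatrix}
\begin{pmatrix} d_1 \\ d_2 \\ \vdots \\ d_{N-2} \\ d_{N-1} \end{pmatrix}
=
\begin{pmatrix} 6x_1 - \frac{3}{2} d_0 \\ 6x_2 \\ \vdots \\ 6x_{N-2} \\ 6x_{N-1} - \frac{3}{2} d_N \end{pmatrix}.
\]
We will show later that the above matrix is invertible because it is strictly diagonally dominant.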
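Once the above system is solved, the Bezier cubics $C_1, \ldots, C_N$ are determined as follows (we assume $N \ge 2$): For $2 \le i \le N - 1$, the control points $(b^i_0, b^i_1, b^i_2, b^i_3)$ of $C_i$ are given by
\[
\begin{aligned}
b^i_0 &= x_{i-1} \\
b^i_1 &= \tfrac{2}{3} d_{i-1} + \tfrac{1}{3} d_i \\
b^i_2 &= \tfrac{1}{3} d_{i-1} + \tfrac{2}{3} d_i \\
b^i_3 &= x_i.
\end{aligned}
\]
\[
\begin{aligned}
b^N_0 &= x_{N-1} \\
b^N_1 &= \tfrac{1}{2} d_{N-1} + \tfrac{1}{2} d_N \\
b^N_2 &= d_N \\
b^N_3 &= x_N.
\end{aligned}
\]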
We will now describe various methods for solving linear systems. Since the matrix of the
above system is tridiagonal, there are specialized methods which are more efficient than the
general methods. We will discuss a few of these methods.
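As one illustration of a specialized method for tridiagonal systems, here is a sketch of the classical forward-elimination / back-substitution scheme often called the Thomas algorithm. It is offered as an assumption about what such a method can look like, not as the specific method developed in the text; it performs no pivoting, which is safe for strictly diagonally dominant matrices such as the spline matrix above. The function name and argument layout are hypothetical.

```python
from fractions import Fraction


def solve_tridiagonal(lower, diag, upper, rhs):
    """Solve a tridiagonal system without pivoting.

    lower[i-1] is the entry (i, i-1), diag[i] the entry (i, i),
    upper[i] the entry (i, i+1), all 0-based. Inputs are converted
    to Fractions so that the result is exact.
    """
    n = len(diag)
    a = [Fraction(v) for v in lower]
    d = [Fraction(v) for v in diag]
    c = [Fraction(v) for v in upper]
    r = [Fraction(v) for v in rhs]
    # Forward elimination: remove the subdiagonal entry of each row.
    for i in range(1, n):
        factor = a[i - 1] / d[i - 1]
        d[i] -= factor * c[i - 1]
        r[i] -= factor * r[i - 1]
    # Back-substitution on the resulting upper bidiagonal system.
    x = [Fraction(0)] * n
    x[n - 1] = r[n - 1] / d[n - 1]
    for i in range(n - 2, -1, -1):
        x[i] = (r[i] - c[i] * x[i + 1]) / d[i]
    return x
```

The cost is linear in the number of unknowns, compared with the cubic cost of general elimination on a full matrix.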
6.2
(2) One does not solve (large) linear systems by computing determinants (using Cramer's formulae). This is because this method requires a number of additions (resp. multiplications) proportional to $(n + 1)!$ (resp. $(n + 2)!$).
The key idea on which most direct methods (as opposed to iterative methods, that look for an approximation of the solution) are based is that if $A$ is an upper-triangular matrix, which means that $a_{ij} = 0$ for $1 \le j < i \le n$ (resp. lower-triangular, which means that $a_{ij} = 0$ for $1 \le i < j \le n$), then computing the solution $x$ is trivial. Indeed, say $A$ is an upper-triangular matrix
\[
A = \begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1\,n-2} & a_{1\,n-1} & a_{1n} \\
0 & a_{22} & \cdots & a_{2\,n-2} & a_{2\,n-1} & a_{2n} \\
0 & 0 & \ddots & \vdots & \vdots & \vdots \\
 & & \ddots & & \vdots & \vdots \\
0 & 0 & \cdots & 0 & a_{n-1\,n-1} & a_{n-1\,n} \\
0 & 0 & \cdots & 0 & 0 & a_{nn}
\end{pmatrix}.
\]
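Solving an upper-triangular system amounts to back-substitution: the last equation gives $x_n$, the one before gives $x_{n-1}$, and so on. The sketch below is a minimal illustration with a hypothetical function name; it assumes all diagonal entries are nonzero and uses exact rational arithmetic. The sample system is an illustrative choice.

```python
from fractions import Fraction


def back_substitution(u, b):
    """Solve U x = b for an upper-triangular U with nonzero diagonal entries."""
    n = len(b)
    x = [Fraction(0)] * n
    for i in reversed(range(n)):
        if u[i][i] == 0:
            raise ValueError("zero diagonal entry: U is not invertible")
        # Everything to the right of the diagonal is already known.
        known = sum(u[i][j] * x[j] for j in range(i + 1, n))
        x[i] = Fraction(b[i] - known) / u[i][i]
    return x


# Illustrative system: 2x + y + z = 5, -8y - 2z = -12, z = 2.
U = [[2, 1, 1], [0, -8, -2], [0, 0, 1]]
b = [5, -12, 2]
```

Here `back_substitution(U, b)` gives $z = 2$, then $y = 1$, then $x = 1$, using about $n^2/2$ multiplications in general.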
new system
\[
\begin{aligned}
2x + y + z &= 5 \\
-8y - 2z &= -12 \\
8y + 3z &= 14.
\end{aligned}
\]
This time, we can eliminate the variable $y$ from the third equation by adding the second equation to the third:
\[
\begin{aligned}
2x + y + z &= 5 \\
-8y - 2z &= -12 \\
z &= 2.
\end{aligned}
\]
This last system is upper-triangular. Using back-substitution, we find the solution: $z = 2$, $y = 1$, $x = 1$.
Observe that we have performed only row operations. The general method is to iteratively
eliminate variables using simple row operations (namely, adding or subtracting a multiple of
a row to another row of the matrix) while simultaneously applying these operations to the
vector b, to obtain a system, M Ax = M b, where M A is upper-triangular. Such a method is
called Gaussian elimination. However, one extra twist is needed for the method to work in
all cases: It may be necessary to permute rows, as illustrated by the following example:
\[
\begin{aligned}
x + y + z &= 1 \\
x + y + 3z &= 1 \\
2x + 5y + 8z &= 1.
\end{aligned}
\]
In order to eliminate $x$ from the second and third row, we subtract the first row from the second and we subtract twice the first row from the third:
\[
\begin{aligned}
x + y + z &= 1 \\
2z &= 0 \\
3y + 6z &= -1.
\end{aligned}
\]
Now, the trouble is that $y$ does not occur in the second row; so, we can't eliminate $y$ from the third row by adding or subtracting a multiple of the second row to it. The remedy is simple: Permute the second and the third row! We get the system:
\[
\begin{aligned}
x + y + z &= 1 \\
3y + 6z &= -1 \\
2z &= 0,
\end{aligned}
\]
which is already in triangular form. Another example where some permutations are needed
is:
z = 1
2x + 7y + 2z = 1
4x 6y
= 1.
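The elimination procedure with row permutations can be sketched as follows. This is a minimal illustration under stated assumptions: the function name is hypothetical, the pivot is simply the first nonzero entry at or below the diagonal (not a numerically motivated choice), indices are 0-based, and exact rational arithmetic is used. The sample system is the first example above that needed a row permutation.

```python
from fractions import Fraction


def gaussian_elimination(a, b):
    """Solve A x = b; return the solution and the list of row swaps performed."""
    n = len(a)
    # Augmented matrix [A | b] with exact entries.
    m = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(a, b)]
    swaps = []
    for k in range(n):
        # Find a row at or below k with a nonzero entry in column k.
        pivot_row = next((i for i in range(k, n) if m[i][k] != 0), None)
        if pivot_row is None:
            raise ValueError("matrix is singular")
        if pivot_row != k:
            m[k], m[pivot_row] = m[pivot_row], m[k]
            swaps.append((k, pivot_row))
        # Zero the entries below the pivot.
        for i in range(k + 1, n):
            factor = m[i][k] / m[k][k]
            m[i] = [vi - factor * vk for vi, vk in zip(m[i], m[k])]
    # Back-substitution on the upper-triangular system.
    x = [Fraction(0)] * n
    for i in reversed(range(n)):
        known = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - known) / m[i][i]
    return x, swaps


A = [[1, 1, 1], [1, 1, 3], [2, 5, 8]]
b = [1, 1, 1]
```

On this system the routine swaps the second and third rows (0-based rows 1 and 2) and returns $x = 4/3$, $y = -1/3$, $z = 0$.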
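\[
A_k = \begin{pmatrix}
a^k_{11} & a^k_{12} & \cdots & \cdots & \cdots & a^k_{1n} \\
 & a^k_{22} & \cdots & \cdots & \cdots & a^k_{2n} \\
 & & \ddots & \vdots & & \vdots \\
 & & & a^k_{kk} & \cdots & a^k_{kn} \\
 & & & \vdots & & \vdots \\
 & & & a^k_{nk} & \cdots & a^k_{nn}
\end{pmatrix}.
\]
Actually, note that
\[ a^k_{ij} = a^i_{ij} \]
for all $i, j$ with $1 \le i \le k - 2$ and $i \le j \le n$, since the first $k - 1$ rows remain unchanged after the $(k - 1)$th step.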
We will prove later that $\det(A_k) = \pm \det(A)$. Consequently, $A_k$ is invertible. The fact that $A_k$ is invertible iff $A$ is invertible can also be shown without determinants from the fact that there is some invertible matrix $M_k$ such that $A_k = M_k A$, as we will see shortly.
Since $A_k$ is invertible, some entry $a^k_{ik}$ with $k \le i \le n$ is nonzero. Otherwise, the last $n - k + 1$ entries in the first $k$ columns of $A_k$ would be zero, and the first $k$ columns of $A_k$ would yield $k$ vectors in $\mathbb{R}^{k-1}$. But then, the first $k$ columns of $A_k$ would be linearly dependent and $A_k$ would not be invertible, a contradiction.
So, one of the entries $a^k_{ik}$ with $k \le i \le n$ can be chosen as pivot, and we permute the $k$th row with the $i$th row, obtaining the matrix $\alpha^k = (\alpha^k_{jl})$. The new pivot is $\pi_k = \alpha^k_{kk}$, and we zero the entries $i = k + 1, \ldots, n$ in column $k$ by adding $-\alpha^k_{ik}/\pi_k$ times row $k$ to row $i$. At the end of this step, we have $A_{k+1}$. Observe that the first $k - 1$ rows of $A_k$ are identical to the first $k - 1$ rows of $A_{k+1}$.
It is easy to figure out what kind of matrices perform the elementary row operations used during Gaussian elimination. The key point is that if $A = PB$, where $A, B$ are $m \times n$ matrices and $P$ is a square matrix of dimension $m$, if (as usual) we denote the rows of $A$ and $B$ by $A_1, \ldots, A_m$ and $B_1, \ldots, B_m$, then the formula
\[ a_{ij} = \sum_{k=1}^{m} p_{ik} b_{kj} \]
giving the $(i, j)$th entry in $A$ shows that the $i$th row of $A$ is a linear combination of the rows of $B$:
\[ A_i = p_{i1} B_1 + \cdots + p_{im} B_m. \]
Therefore, multiplication of a matrix on the left by a square matrix performs row operations. Similarly, multiplication of a matrix on the right by a square matrix performs column operations.
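The permutation of the $k$th row with the $i$th row is achieved by multiplying $A$ on the left by the transposition matrix $P(i, k)$, which is the matrix obtained from the identity matrix by permuting rows $i$ and $k$, i.e.,
\[
P(i, k) = \begin{pmatrix}
1 & & & & & & \\
 & \ddots & & & & & \\
 & & 0 & \cdots & 1 & & \\
 & & \vdots & \ddots & \vdots & & \\
 & & 1 & \cdots & 0 & & \\
 & & & & & \ddots & \\
 & & & & & & 1
\end{pmatrix}.
\]
Observe that $\det(P(i, k)) = -1$. Furthermore, $P(i, k)$ is symmetric ($P(i, k)^\top = P(i, k)$), and
\[ P(i, k)^{-1} = P(i, k). \]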
During the permutation step (2), if row k and row i need to be permuted, the matrix A
is multiplied on the left by the matrix Pk such that Pk = P (i, k), else we set Pk = I.
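Adding $\beta$ times row $j$ to row $i$ is achieved by multiplying $A$ on the left by the elementary matrix,
\[ E_{i,j;\beta} = I + \beta e_{ij}, \]
where
\[
(e_{ij})_{kl} = \begin{cases}
1 & \text{if } k = i \text{ and } l = j \\
0 & \text{if } k \ne i \text{ or } l \ne j,
\end{cases}
\]
i.e.,
\[
E_{i,j;\beta} = \begin{pmatrix}
1 & & & & \\
 & \ddots & & & \\
 & & 1 & & \\
 & \beta & & \ddots & \\
 & & & & 1
\end{pmatrix}
\quad \text{or} \quad
E_{i,j;\beta} = \begin{pmatrix}
1 & & & & \\
 & \ddots & & \beta & \\
 & & 1 & & \\
 & & & \ddots & \\
 & & & & 1
\end{pmatrix}.
\]
On the left, $i > j$, and on the right, $i < j$. Observe that the inverse of $E_{i,j;\beta} = I + \beta e_{ij}$ is $E_{i,j;-\beta} = I - \beta e_{ij}$ and that $\det(E_{i,j;\beta}) = 1$. Therefore, during step 3 (the elimination step), the matrix $A$ is multiplied on the left by a product $E_k$ of matrices of the form $E_{i,k;\beta_{i,k}}$, with $i > k$.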
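The effect of transposition matrices and elementary matrices acting on the left can be checked on a small example. The sketch below uses hypothetical function names and 0-based row indices (so `transposition(n, 0, 2)` corresponds to $P(1, 3)$ in the text's numbering); the sample matrix is arbitrary.

```python
def identity(n):
    return [[1 if r == c else 0 for c in range(n)] for r in range(n)]


def transposition(n, i, k):
    """Identity matrix with rows i and k exchanged (0-based)."""
    p = identity(n)
    p[i], p[k] = p[k], p[i]
    return p


def elementary(n, i, j, beta):
    """I + beta * e_ij: left multiplication adds beta times row j to row i."""
    e = identity(n)
    e[i][j] += beta
    return e


def matmul(a, b):
    return [
        [sum(a[r][k] * b[k][c] for k in range(len(b))) for c in range(len(b[0]))]
        for r in range(len(a))
    ]


A = [[1, 2, 3], [4, 5, 6], [7, 8, 10]]
```

For instance, `matmul(elementary(3, 2, 0, -7), A)` replaces the third row of `A` by itself minus 7 times the first row, and multiplying by `elementary(3, 2, 0, 7)` undoes the operation.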
k = 1, . . . , n,
where $\pi_k$ is the pivot obtained after $k - 1$ elimination steps. Therefore, the $k$th pivot is given by
if k = 2, . . . , n.
det(A[1..k 1, 1..k 1])
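Proof. First, assume that $A = LU$ is an $LU$-factorization of $A$. We can write
\[
A = \begin{pmatrix} A[1..k, 1..k] & A_2 \\ A_3 & A_4 \end{pmatrix}
= \begin{pmatrix} L_1 & 0 \\ L_3 & L_4 \end{pmatrix}
\begin{pmatrix} U_1 & U_2 \\ 0 & U_4 \end{pmatrix}
= \begin{pmatrix} L_1 U_1 & L_1 U_2 \\ L_3 U_1 & L_3 U_2 + L_4 U_4 \end{pmatrix},
\]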
k = 1, . . . , n,
as claimed.
Remark: The use of determinants in the first part of the proof of Proposition 6.2 can be
avoided if we use the fact that a triangular matrix is invertible iff all its diagonal entries are
nonzero.
Corollary 6.3. ($LU$-Factorization) Let $A$ be an invertible $n \times n$-matrix. If every matrix $A[1..k, 1..k]$ is invertible for $k = 1, \ldots, n$, then Gaussian elimination requires no pivoting and yields an $LU$-factorization $A = LU$.
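Proof. We proved in Proposition 6.2 that in this case Gaussian elimination requires no pivoting. Then, since every elementary matrix $E_{i,k;\beta}$ is lower-triangular (since we always arrange that the pivot $\pi_k$ occurs above the rows that it operates on), since $E_{i,k;\beta}^{-1} = E_{i,k;-\beta}$ and the $E_k$'s are products of $E_{i,k;\beta_{i,k}}$'s, from
\[ E_{n-1} \cdots E_2 E_1 A = U, \]
where $U$ is an upper-triangular matrix, we get
\[ A = LU, \]
where $L = E_1^{-1} E_2^{-1} \cdots E_{n-1}^{-1}$ is a lower-triangular matrix. Furthermore, as the diagonal entries of each $E_{i,k;\beta}$ are 1, the diagonal entries of each $E_k^{-1}$ are also 1.
\[
\begin{pmatrix} 2 & 1 & 1 & 0 \\ 4 & 3 & 3 & 1 \\ 8 & 7 & 9 & 5 \\ 6 & 7 & 9 & 8 \end{pmatrix}
=
\begin{pmatrix} 1 & 0 & 0 & 0 \\ 2 & 1 & 0 & 0 \\ 4 & 3 & 1 & 0 \\ 3 & 4 & 1 & 1 \end{pmatrix}
\begin{pmatrix} 2 & 1 & 1 & 0 \\ 0 & 1 & 1 & 1 \\ 0 & 0 & 2 & 2 \\ 0 & 0 & 0 & 2 \end{pmatrix}.
\]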
One of the main reasons why the existence of an $LU$-factorization for a matrix $A$ is interesting is that if we need to solve several linear systems $Ax = b$ corresponding to the same matrix $A$, we can do this cheaply by solving the two triangular systems
\[ Lw = b, \quad \text{and} \quad Ux = w. \]
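The two-stage solve can be sketched as follows. This is a minimal illustration assuming that no pivoting is needed (every leading principal submatrix is invertible); the function names are hypothetical, arithmetic is exact, and the $4 \times 4$ matrix is the factorization example shown above.

```python
from fractions import Fraction


def lu_factor(a):
    """Return (L, U) with A = L U, L unit lower-triangular, U upper-triangular."""
    n = len(a)
    u = [[Fraction(v) for v in row] for row in a]
    l = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    for k in range(n - 1):
        if u[k][k] == 0:
            raise ValueError("zero pivot: pivoting would be required")
        for i in range(k + 1, n):
            l[i][k] = u[i][k] / u[k][k]  # multiplier stored in column k of L
            u[i] = [ui - l[i][k] * uk for ui, uk in zip(u[i], u[k])]
    return l, u


def forward_substitution(l, b):
    """Solve L w = b for a lower-triangular L with nonzero diagonal."""
    w = []
    for i in range(len(b)):
        known = sum(l[i][j] * w[j] for j in range(i))
        w.append((Fraction(b[i]) - known) / l[i][i])
    return w


def back_substitution(u, w):
    """Solve U x = w for an upper-triangular U with nonzero diagonal."""
    n = len(w)
    x = [Fraction(0)] * n
    for i in reversed(range(n)):
        known = sum(u[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (w[i] - known) / u[i][i]
    return x


def lu_solve(l, u, b):
    return back_substitution(u, forward_substitution(l, b))


A = [[2, 1, 1, 0], [4, 3, 3, 1], [8, 7, 9, 5], [6, 7, 9, 8]]
```

Once `lu_factor(A)` has been computed, each additional right-hand side costs only two triangular solves, which is the saving the paragraph above refers to.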
The following easy proposition shows that, in principle, $A$ can be premultiplied by some permutation matrix $P$, so that $PA$ can be converted to upper-triangular form without using any pivoting. Permutations are discussed in some detail in Section 20.3, but for now we just need their definition. A permutation matrix is a square matrix that has a single 1 in every row and every column and zeros everywhere else. It is shown in Section 20.3 that every permutation matrix is a product of transposition matrices (the $P(i, k)$s), and that $P$ is invertible with inverse $P^\top$.
Proposition 6.4. Let $A$ be an invertible $n \times n$-matrix. Then, there is some permutation matrix $P$ so that $PA[1..k, 1..k]$ is invertible for $k = 1, \ldots, n$.
Proof. The case $n = 1$ is trivial, and so is the case $n = 2$ (we swap the rows if necessary). If $n \ge 3$, we proceed by induction. Since $A$ is invertible, its columns are linearly independent; in particular, its first $n - 1$ columns are also linearly independent. Delete the last column of $A$. Since the remaining $n - 1$ columns are linearly independent, there are also $n - 1$ linearly independent rows in the corresponding $n \times (n - 1)$ matrix. Thus, there is a permutation of these $n$ rows so that the $(n - 1) \times (n - 1)$ matrix consisting of the first $n - 1$ rows is invertible. But, then, there is a corresponding permutation matrix $P_1$, so that the first $n - 1$ rows and columns of $P_1 A$ form an invertible matrix $A'$. Applying the induction hypothesis to the $(n - 1) \times (n - 1)$ matrix $A'$, we see that there is some permutation matrix $P_2$ (leaving the $n$th row fixed), so that $P_2 P_1 A[1..k, 1..k]$ is invertible, for $k = 1, \ldots, n - 1$. Since $A$ is invertible in the first place and $P_1$ and $P_2$ are invertible, $P_2 P_1 A$ is also invertible, and we are done.
Remark: One can also prove Proposition 6.4 using a clever reordering of the Gaussian elimination steps suggested by Trefethen and Bau [110] (Lecture 21). Indeed, we know that if $A$ is invertible, then there are permutation matrices $P_i$ and products of elementary matrices $E_i$, so that
\[ A_n = E_{n-1} P_{n-1} \cdots E_2 P_2 E_1 P_1 A, \]
where $U = A_n$ is upper-triangular. For example, when $n = 4$, we have $E_3 P_3 E_2 P_2 E_1 P_1 A = U$. We can define new matrices $E'_1, E'_2, E'_3$ which are still products of elementary matrices so that we have
\[ E'_3 E'_2 E'_1 P_3 P_2 P_1 A = U. \]
Indeed, if we let $E'_3 = E_3$, $E'_2 = P_3 E_2 P_3^{-1}$, and $E'_1 = P_3 P_2 E_1 P_2^{-1} P_3^{-1}$, we easily verify that each $E'_k$ is a product of elementary matrices and that
\[ E'_3 E'_2 E'_1 P_3 P_2 P_1 = E_3 (P_3 E_2 P_3^{-1})(P_3 P_2 E_1 P_2^{-1} P_3^{-1}) P_3 P_2 P_1 = E_3 P_3 E_2 P_2 E_1 P_1. \]
It can also be proved that $E'_1, E'_2, E'_3$ are lower triangular (see Theorem 6.5).
In general, we let
\[ E'_k = P_{n-1} \cdots P_{k+1} E_k P_{k+1}^{-1} \cdots P_{n-1}^{-1}, \]
and we have
\[ E'_{n-1} \cdots E'_1 P_{n-1} \cdots P_1 A = U, \]
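\[
L = \begin{pmatrix}
1 & 0 & 0 & \cdots & 0 \\
\ell_{21} & 1 & 0 & \cdots & 0 \\
\ell_{31} & \ell_{32} & 1 & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & 0 \\
\ell_{n1} & \ell_{n2} & \ell_{n3} & \cdots & 1
\end{pmatrix},
\]
where the $k$th column of $L$ is the $k$th column of $E_k^{-1}$, for $k = 1, \ldots, n - 1$.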
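(3) If $E_{n-1} P_{n-1} \cdots E_1 P_1 A = U$ is the result of Gaussian elimination with some pivoting, write $A_k = E_{k-1} P_{k-1} \cdots E_1 P_1 A$, and define $E^k_j$, with $1 \le j \le n - 1$ and $j \le k \le n - 1$, such that, for $j = 1, \ldots, n - 2$,
\[
\begin{aligned}
E^j_j &= E_j \\
E^k_j &= P_k E^{k-1}_j P_k, \quad \text{for } k = j + 1, \ldots, n - 1,
\end{aligned}
\]
and
\[ E^{n-1}_{n-1} = E_{n-1}. \]
Then,
\[
\begin{aligned}
E^k_j &= P_k P_{k-1} \cdots P_{j+1} E_j P_{j+1} \cdots P_{k-1} P_k \\
U &= E^{n-1}_{n-1} \cdots E^{n-1}_1 P_{n-1} \cdots P_1 A,
\end{aligned}
\]
and if we set
\[
\begin{aligned}
P &= P_{n-1} \cdots P_1 \\
L &= (E^{n-1}_1)^{-1} \cdots (E^{n-1}_{n-1})^{-1},
\end{aligned}
\]
then
\[ PA = LU. \]
Furthermore,
\[ (E^k_j)^{-1} = I + \mathcal{E}^k_j, \quad 1 \le j \le n - 1, \ j \le k \le n - 1, \]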
0
0
0
.
.
..
.
. ..
..
.
.
0
0
0
Ejk =
k
0
j+1j 0
. .
..
..
.. ..
.
.
k
0 `nj 0
..
.
0
..
.
0
,
0
. . ..
. .
0
we have
Ejk = I Ejk ,
and
Ejk = Pk Ejk1 ,
1 j n 2, j + 1 k n 1,
0
k
21
k
31
..
k = k .
k+11
k
k+21
.
..
kn1
0
0
0
0 0
.
..
0
0
0
0 ..
. 0
.
.
.
..
.. 0
k32
0
0 ..
..
..
.. ..
...
.
0
0 .
. .
k
k
k+12 k+1k 0 0
..
k
k
. 0
k+22 k+2k 0
..
..
.. .. . . ..
..
.
. .
.
.
. .
k
k
n2 nk 0 0
0 0
`1 0
21
1 = .. .. . .
. .
.
1
`n1 0
0
0
.. ,
.
0
0 k1
21
0
k1
31
.
..
1
0
k = (I + k )Ek I =
0 k1
k1
0 k1
k+11
..
.
0 k1
n1
0
..
.
...
0 k1
0
0
0 k1
32
..
.
0 k1
k2
0 k1
k k1
0
k1
k
k+12 k+1
k1 `k+1k
..
..
..
..
.
.
.
.
0 k1
0 k1
k
n2
n k1
`nk
..
. 0
..
. 0
.. ..
. .
,
0
..
. 0
.. . . ..
. .
.
0
..
.
..
.
..
.
with Pk = I or Pk = P (k, i) for some i > k. This means that in assembling L, row k
and row i of k1 need to be permuted when a pivoting step permuting row k and row
i of Ak is required. Then
I + k = (E1k )1 (Ekk )1
k = E1k Ekk ,
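(2) When $P = I$, we have $L = E_1^{-1} E_2^{-1} \cdots E_{n-1}^{-1}$, where $E_k$ is the product of $n - k$ elementary matrices of the form $E_{i,k;-\ell_i}$, where $E_{i,k;-\ell_i}$ subtracts $\ell_i$ times row $k$ from row $i$, with $\ell_{ik} = a^k_{ik}/a^k_{kk}$, $1 \le k \le n - 1$, and $k + 1 \le i \le n$. Then, it is immediately verified that
\[
E_k = \begin{pmatrix}
1 & \cdots & 0 & 0 & \cdots & 0 \\
\vdots & \ddots & \vdots & \vdots & \ddots & \vdots \\
0 & \cdots & 1 & 0 & \cdots & 0 \\
0 & \cdots & -\ell_{k+1\,k} & 1 & \cdots & 0 \\
\vdots & \ddots & \vdots & \vdots & \ddots & \vdots \\
0 & \cdots & -\ell_{nk} & 0 & \cdots & 1
\end{pmatrix},
\]
and that
\[
E_k^{-1} = \begin{pmatrix}
1 & \cdots & 0 & 0 & \cdots & 0 \\
\vdots & \ddots & \vdots & \vdots & \ddots & \vdots \\
0 & \cdots & 1 & 0 & \cdots & 0 \\
0 & \cdots & \ell_{k+1\,k} & 1 & \cdots & 0 \\
\vdots & \ddots & \vdots & \vdots & \ddots & \vdots \\
0 & \cdots & \ell_{nk} & 0 & \cdots & 1
\end{pmatrix}.
\]
If we define $L_k$ by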
`21
1
0
0
.
..
`31
`32
0
..
Lk = ..
...
.
1
.
0
0
0
..
.
..
.
..
.
..
.
0
1
.
0 ..
0
0
1
2 k n 1,
because multiplication on the right by Ek1 adds `i times column i to column k (of the matrix
Lk1 ) with i > k, and column i of Lk1 has only the nonzero entry 1 as its ith element.
Since
Lk = E11 Ek1 , 1 k n 1,
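For part (3), we first check by induction on $k$ that
\[
A_{k+1} = E_k^k \cdots E_1^k P_k \cdots P_1 A, \quad k = 1, \ldots, n-2.
\]
The case $k = 1$ holds because $A_2 = E_1 P_1 A$ and $E_1^1 = E_1$. Now, if $k \ge 2$,
\[
A_{k+1} = E_k P_k A_k,
\]
and by the induction hypothesis $A_k = E_{k-1}^{k-1} \cdots E_1^{k-1} P_{k-1} \cdots P_1 A$. Since $P_k P_k = I$, we can insert occurrences of $P_k P_k$ and get
\begin{align*}
A_{k+1} &= E_k P_k E_{k-1}^{k-1} (P_k P_k) \cdots (P_k P_k) E_2^{k-1} (P_k P_k) E_1^{k-1} (P_k P_k) P_{k-1} \cdots P_1 A \\
&= E_k (P_k E_{k-1}^{k-1} P_k) \cdots (P_k E_2^{k-1} P_k)(P_k E_1^{k-1} P_k) P_k P_{k-1} \cdots P_1 A.
\end{align*}
Observe that $P_k$ has been moved to the right of the elimination steps. However, by definition,
\[
E_j^k = P_k E_j^{k-1} P_k, \quad j = 1, \ldots, k-1, \qquad E_k^k = E_k,
\]
so we get
\[
A_{k+1} = E_k^k E_{k-1}^k \cdots E_2^k E_1^k P_k \cdots P_1 A,
\]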
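and for $k = n-1$ we obtain
\[
U = A_n = E_{n-1}^{n-1} \cdots E_1^{n-1} P_{n-1} \cdots P_1 A.
\]
Since
\[
E_j^k = P_k E_j^{k-1} P_k, \quad k = j+1, \ldots, n-1,
\]
and since $E_{n-1}^{n-1} = E_{n-1}$ and $P_k^{-1} = P_k$, we get $(E_j^j)^{-1} = E_j^{-1}$ for $j = 1, \ldots, n-1$, and for $j = 1, \ldots, n-2$, we have
\[
(E_j^k)^{-1} = P_k (E_j^{k-1})^{-1} P_k, \quad k = j+1, \ldots, n-1.
\]
Since
\[
(E_j^{k-1})^{-1} = I + \mathcal{E}_j^{k-1}
\]
and $P_k P_k = I$, we get $(E_j^k)^{-1} = I + P_k \mathcal{E}_j^{k-1} P_k$. It remains to check that
\[
\mathcal{E}_j^k = \begin{pmatrix}
0 & \cdots & 0 & 0 & \cdots & 0 \\
\vdots & \ddots & \vdots & \vdots & & \vdots \\
0 & \cdots & 0 & 0 & \cdots & 0 \\
0 & \cdots & \ell^k_{j+1\,j} & 0 & \cdots & 0 \\
\vdots & & \vdots & \vdots & \ddots & \vdots \\
0 & \cdots & \ell^k_{nj} & 0 & \cdots & 0
\end{pmatrix},
\]
and that
\[
\mathcal{E}_j^k = P_k \mathcal{E}_j^{k-1}, \quad 1 \le j \le n-2, \ j+1 \le k \le n-1.
\]
Assume inductively that
\[
\mathcal{E}_j^{k-1} = \begin{pmatrix}
0 & \cdots & 0 & 0 & \cdots & 0 \\
\vdots & \ddots & \vdots & \vdots & & \vdots \\
0 & \cdots & 0 & 0 & \cdots & 0 \\
0 & \cdots & \ell^{k-1}_{j+1\,j} & 0 & \cdots & 0 \\
\vdots & & \vdots & \vdots & \ddots & \vdots \\
0 & \cdots & \ell^{k-1}_{nj} & 0 & \cdots & 0
\end{pmatrix}.
\]
If $P_k = P(k, i)$ with $i \ge k+1$, then since $j \le k-1$, columns $i$ and $k$ of $\mathcal{E}_j^{k-1}$ are zero, so $\mathcal{E}_j^{k-1} P(k, i) = \mathcal{E}_j^{k-1}$.
We also know that multiplying $(E_j^{k-1})^{-1}$ on the left by $P(k, i)$ will permute rows $i$ and $k$, which shows that $\mathcal{E}_j^k$ has the desired form, as claimed. Since all $\mathcal{E}_j^k$ are strictly lower triangular, all $(E_j^k)^{-1} = I + \mathcal{E}_j^k$ are lower triangular, so the product
\[
L = (E_1^{n-1})^{-1} \cdots (E_{n-1}^{n-1})^{-1}
\]
is also lower triangular. It remains to prove, by induction on $k$, that $I + \Lambda_k = (E_1^k)^{-1} \cdots (E_k^k)^{-1}$ and $\Lambda_k = \mathcal{E}_1^k + \cdots + \mathcal{E}_k^k$ for $k = 1, \ldots, n-1$.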
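For $k = 1$, we have $E_1^1 = E_1$ with
\[
E_1 = \begin{pmatrix}
1 & 0 & \cdots & 0 \\
-\ell^1_{21} & 1 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
-\ell^1_{n1} & 0 & \cdots & 1
\end{pmatrix}.
\]
We get
\[
(E_1^1)^{-1} = \begin{pmatrix}
1 & 0 & \cdots & 0 \\
\ell^1_{21} & 1 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
\ell^1_{n1} & 0 & \cdots & 1
\end{pmatrix} = I + \Lambda_1.
\]
Since $(E_1^1)^{-1} = I + \mathcal{E}_1^1$, we also get $\Lambda_1 = \mathcal{E}_1^1$, and the base step holds.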
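Since $(E_j^k)^{-1} = I + \mathcal{E}_j^k$ with
\[
\mathcal{E}_j^k = \begin{pmatrix}
0 & \cdots & 0 & 0 & \cdots & 0 \\
\vdots & \ddots & \vdots & \vdots & & \vdots \\
0 & \cdots & 0 & 0 & \cdots & 0 \\
0 & \cdots & \ell^k_{j+1\,j} & 0 & \cdots & 0 \\
\vdots & & \vdots & \vdots & \ddots & \vdots \\
0 & \cdots & \ell^k_{nj} & 0 & \cdots & 0
\end{pmatrix},
\]
and since $\mathcal{E}_i^k \mathcal{E}_j^k = 0$ for $i < j$, we get
\[
(E_1^{k-1})^{-1} \cdots (E_{k-1}^{k-1})^{-1} = I + \mathcal{E}_1^{k-1} + \cdots + \mathcal{E}_{k-1}^{k-1}, \quad 2 \le k \le n. \tag{$*$}
\]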
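Similarly, from the fact that $\mathcal{E}_j^{k-1} P(k, i) = \mathcal{E}_j^{k-1}$ if $i \ge k+1$ and $j \le k-1$ and since
\[
(E_j^k)^{-1} = I + P_k \mathcal{E}_j^{k-1}, \quad 1 \le j \le n-2, \ j+1 \le k \le n-1,
\]
we get
\[
(E_1^k)^{-1} \cdots (E_{k-1}^k)^{-1} = I + P_k (\mathcal{E}_1^{k-1} + \cdots + \mathcal{E}_{k-1}^{k-1}), \quad 2 \le k \le n-1. \tag{$**$}
\]
By the induction hypothesis and $(*)$, $\Lambda_{k-1} = \mathcal{E}_1^{k-1} + \cdots + \mathcal{E}_{k-1}^{k-1}$, so $(**)$ gives $(E_1^k)^{-1} \cdots (E_{k-1}^k)^{-1} = I + P_k \Lambda_{k-1}$, and since $E_k^k = E_k$, we obtain $(E_1^k)^{-1} \cdots (E_k^k)^{-1} = (I + P_k \Lambda_{k-1}) E_k^{-1}$. However, by definition
\[
I + \Lambda_k = (I + P_k \Lambda_{k-1}) E_k^{-1},
\]
which proves that
\[
I + \Lambda_k = (E_1^k)^{-1} \cdots (E_{k-1}^k)^{-1} (E_k^k)^{-1}, \tag{$\dagger$}
\]
and finishes the induction step for the proof of this formula.
If we apply equation $(*)$ again with $k + 1$ in place of $k$, we have
\[
(E_1^k)^{-1} \cdots (E_k^k)^{-1} = I + \mathcal{E}_1^k + \cdots + \mathcal{E}_k^k,
\]
and together with $(\dagger)$, we obtain
\[
\Lambda_k = \mathcal{E}_1^k + \cdots + \mathcal{E}_k^k,
\]
also finishing the induction step for the proof of this formula. For $k = n-1$ in $(\dagger)$, we obtain the desired equation: $L = I + \Lambda_{n-1}$.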
Part (3) of Theorem 6.5 shows the remarkable fact that in assembling the matrix L while
performing Gaussian elimination with pivoting, the only change to the algorithm is to make
the same transposition on the rows of $L$ (really $\Lambda_k$, since the ones are not altered) that we
make on the rows of $A$ (really $A_k$) during a pivoting step involving row $k$ and row $i$. We
can also assemble $P$ by starting with the identity matrix and applying to $P$ the same row
transpositions that we apply to $A$ and $\Lambda$. Here is an example illustrating this method.
\[
A = \begin{pmatrix}
1 & 2 & -3 & 4 \\
4 & 8 & 12 & -8 \\
2 & 3 & 2 & 1 \\
-3 & -1 & 1 & -4
\end{pmatrix}.
\]
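We set $P_0 = I_4$, and we can also set $\Lambda_0 = 0$. The first step is to permute row 1 and row 2, using the pivot 4. We also apply this permutation to $P_0$:
\[
A'_1 = \begin{pmatrix}
4 & 8 & 12 & -8 \\
1 & 2 & -3 & 4 \\
2 & 3 & 2 & 1 \\
-3 & -1 & 1 & -4
\end{pmatrix}, \qquad
P_1 = \begin{pmatrix}
0 & 1 & 0 & 0 \\
1 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 \\
0 & 0 & 0 & 1
\end{pmatrix}.
\]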
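Next, we subtract 1/4 times row 1 from row 2, 1/2 times row 1 from row 3, and add 3/4 times row 1 to row 4, and start assembling $\Lambda$:
\[
A_2 = \begin{pmatrix}
4 & 8 & 12 & -8 \\
0 & 0 & -6 & 6 \\
0 & -1 & -4 & 5 \\
0 & 5 & 10 & -10
\end{pmatrix}, \qquad
\Lambda_1 = \begin{pmatrix}
0 & 0 & 0 & 0 \\
1/4 & 0 & 0 & 0 \\
1/2 & 0 & 0 & 0 \\
-3/4 & 0 & 0 & 0
\end{pmatrix}, \qquad
P_1 = \begin{pmatrix}
0 & 1 & 0 & 0 \\
1 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 \\
0 & 0 & 0 & 1
\end{pmatrix}.
\]
Next we permute row 2 and row 4, using the pivot 5. We also apply this permutation to $\Lambda$ and $P$:
\[
A'_3 = \begin{pmatrix}
4 & 8 & 12 & -8 \\
0 & 5 & 10 & -10 \\
0 & -1 & -4 & 5 \\
0 & 0 & -6 & 6
\end{pmatrix}, \qquad
\Lambda'_2 = \begin{pmatrix}
0 & 0 & 0 & 0 \\
-3/4 & 0 & 0 & 0 \\
1/2 & 0 & 0 & 0 \\
1/4 & 0 & 0 & 0
\end{pmatrix}, \qquad
P_2 = \begin{pmatrix}
0 & 1 & 0 & 0 \\
0 & 0 & 0 & 1 \\
0 & 0 & 1 & 0 \\
1 & 0 & 0 & 0
\end{pmatrix}.
\]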
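Next we add 1/5 times row 2 to row 3, and update $\Lambda'_2$:
\[
A_3 = \begin{pmatrix}
4 & 8 & 12 & -8 \\
0 & 5 & 10 & -10 \\
0 & 0 & -2 & 3 \\
0 & 0 & -6 & 6
\end{pmatrix}, \qquad
\Lambda_2 = \begin{pmatrix}
0 & 0 & 0 & 0 \\
-3/4 & 0 & 0 & 0 \\
1/2 & -1/5 & 0 & 0 \\
1/4 & 0 & 0 & 0
\end{pmatrix}, \qquad
P_2 = \begin{pmatrix}
0 & 1 & 0 & 0 \\
0 & 0 & 0 & 1 \\
0 & 0 & 1 & 0 \\
1 & 0 & 0 & 0
\end{pmatrix}.
\]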
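Next we permute row 3 and row 4, using the pivot $-6$. We also apply this permutation to $\Lambda$ and $P$:
\[
A'_4 = \begin{pmatrix}
4 & 8 & 12 & -8 \\
0 & 5 & 10 & -10 \\
0 & 0 & -6 & 6 \\
0 & 0 & -2 & 3
\end{pmatrix}, \qquad
\Lambda'_3 = \begin{pmatrix}
0 & 0 & 0 & 0 \\
-3/4 & 0 & 0 & 0 \\
1/4 & 0 & 0 & 0 \\
1/2 & -1/5 & 0 & 0
\end{pmatrix}, \qquad
P_3 = \begin{pmatrix}
0 & 1 & 0 & 0 \\
0 & 0 & 0 & 1 \\
1 & 0 & 0 & 0 \\
0 & 0 & 1 & 0
\end{pmatrix}.
\]
Finally, we subtract 1/3 times row 3 from row 4, and update $\Lambda'_3$:
\[
A_4 = \begin{pmatrix}
4 & 8 & 12 & -8 \\
0 & 5 & 10 & -10 \\
0 & 0 & -6 & 6 \\
0 & 0 & 0 & 1
\end{pmatrix}, \qquad
\Lambda_3 = \begin{pmatrix}
0 & 0 & 0 & 0 \\
-3/4 & 0 & 0 & 0 \\
1/4 & 0 & 0 & 0 \\
1/2 & -1/5 & 1/3 & 0
\end{pmatrix}, \qquad
P_3 = \begin{pmatrix}
0 & 1 & 0 & 0 \\
0 & 0 & 0 & 1 \\
1 & 0 & 0 & 0 \\
0 & 0 & 1 & 0
\end{pmatrix}.
\]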
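Consequently, adding the identity to $\Lambda_3$, we obtain
\[
L = \begin{pmatrix}
1 & 0 & 0 & 0 \\
-3/4 & 1 & 0 & 0 \\
1/4 & 0 & 1 & 0 \\
1/2 & -1/5 & 1/3 & 1
\end{pmatrix}, \qquad
U = \begin{pmatrix}
4 & 8 & 12 & -8 \\
0 & 5 & 10 & -10 \\
0 & 0 & -6 & 6 \\
0 & 0 & 0 & 1
\end{pmatrix}, \qquad
P = \begin{pmatrix}
0 & 1 & 0 & 0 \\
0 & 0 & 0 & 1 \\
1 & 0 & 0 & 0 \\
0 & 0 & 1 & 0
\end{pmatrix}.
\]
We check that
\[
PA = \begin{pmatrix}
0 & 1 & 0 & 0 \\
0 & 0 & 0 & 1 \\
1 & 0 & 0 & 0 \\
0 & 0 & 1 & 0
\end{pmatrix}
\begin{pmatrix}
1 & 2 & -3 & 4 \\
4 & 8 & 12 & -8 \\
2 & 3 & 2 & 1 \\
-3 & -1 & 1 & -4
\end{pmatrix}
= \begin{pmatrix}
4 & 8 & 12 & -8 \\
-3 & -1 & 1 & -4 \\
1 & 2 & -3 & 4 \\
2 & 3 & 2 & 1
\end{pmatrix},
\]
and that
\[
LU = \begin{pmatrix}
1 & 0 & 0 & 0 \\
-3/4 & 1 & 0 & 0 \\
1/4 & 0 & 1 & 0 \\
1/2 & -1/5 & 1/3 & 1
\end{pmatrix}
\begin{pmatrix}
4 & 8 & 12 & -8 \\
0 & 5 & 10 & -10 \\
0 & 0 & -6 & 6 \\
0 & 0 & 0 & 1
\end{pmatrix}
= \begin{pmatrix}
4 & 8 & 12 & -8 \\
-3 & -1 & 1 & -4 \\
1 & 2 & -3 & 4 \\
2 & 3 & 2 & 1
\end{pmatrix} = PA.
\]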
Note that if one is willing to overwrite the lower triangular part of the evolving matrix $A$,
one can store the evolving $\Lambda$ there, since these entries will eventually be zero anyway! There
is also no need to save explicitly the permutation matrix $P$. One could instead record the
permutation steps in an extra column (record the vector $(\pi(1), \ldots, \pi(n))$ corresponding to
the permutation $\pi$ applied to the rows). We let the reader write such a bold and space-efficient version of LU-decomposition!
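One possible shape for such a space-efficient version is sketched below. This is only an illustration, not the author's code: the function name `lu_in_place` is hypothetical, exact `Fraction` arithmetic is an assumption made so that the result can be compared with the worked example, and the pivot is chosen by partial pivoting (largest absolute value in the column), which happens to select the same pivots 4, 5, and $-6$ as above. The multipliers are stored below the diagonal, so swapping two full rows permutes $A_k$ and $\Lambda$ at the same time.

```python
from fractions import Fraction

def lu_in_place(a):
    """Return (packed, perm): packed holds U on and above the diagonal and the
    multipliers of L (without the unit diagonal) below it; perm[i] is the
    0-based index of the row of the input that ends up as row i of P*A."""
    n = len(a)
    m = [[Fraction(v) for v in row] for row in a]  # work on an exact copy
    perm = list(range(n))
    for k in range(n - 1):
        # Partial pivoting: pick the row with the largest |entry| in column k.
        p = max(range(k, n), key=lambda i: abs(m[i][k]))
        if m[p][k] == 0:
            raise ValueError("matrix is singular")
        if p != k:
            # Swapping whole rows permutes both the A part and the stored multipliers.
            m[k], m[p] = m[p], m[k]
            perm[k], perm[p] = perm[p], perm[k]
        for i in range(k + 1, n):
            m[i][k] /= m[k][k]              # multiplier, stored where the zero would go
            for j in range(k + 1, n):
                m[i][j] -= m[i][k] * m[k][j]
    return m, perm

A = [[1, 2, -3, 4], [4, 8, 12, -8], [2, 3, 2, 1], [-3, -1, 1, -4]]
packed, perm = lu_in_place(A)
# perm == [1, 3, 0, 2]: rows 2, 4, 1, 3 of A, matching the matrix P of the example.
```

Reading `packed` row by row gives the entries of $U$ on and above the diagonal and those of $\Lambda_3$ below it, so $L$ is recovered by putting ones on the diagonal.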
As a corollary of Theorem 6.5(1), we can show the following result.
Proposition 6.6. If an invertible symmetric matrix $A$ has an LU-decomposition, then $A$
has a factorization of the form
\[
A = LDL^\top,
\]
where $L$ is a lower-triangular matrix whose diagonal entries are equal to 1, and where $D$
consists of the pivots. Furthermore, such a decomposition is unique.
Proof. If $A$ has an LU-factorization, then it has an LDU factorization
\[
A = LDU,
\]
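After elimination with the pivot $10^{-4}$, we get the system
\[
10^{-4} x + y = 1, \qquad (1 - 10^4)\, y = 2 - 10^4,
\]
whose exact solution is
\[
x = \frac{10^4}{10^4 - 1}, \qquad y = \frac{10^4 - 2}{10^4 - 1}.
\]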
However, if roundoff takes place on the fourth digit, then $10^4 - 1 = 9999$ and $10^4 - 2 = 9998$
will both be rounded off to 9990, and then the solution is $x = 0$ and $y = 1$, very far from the
exact solution where $x \approx 1$ and $y \approx 1$. The problem is that we picked a very small pivot. If
instead we permute the equations, the pivot is 1, and after elimination, we get the system
\[
x + y = 2, \qquad (1 - 10^{-4})\, y = 1 - 2 \times 10^{-4}.
\]
This time, $1 - 10^{-4} = 0.9999$ and $1 - 2 \times 10^{-4} = 0.9998$ are rounded off to 0.999 and the
solution is $x = 1$, $y = 1$, much closer to the exact solution.
To remedy this problem, one may use the strategy of partial pivoting. This consists of
choosing during step $k$ ($1 \le k \le n-1$) one of the entries $a^k_{ik}$ such that
\[
|a^k_{ik}| = \max_{k \le p \le n} |a^k_{pk}|.
\]
By maximizing the value of the pivot, we avoid dividing by undesirably small pivots.
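The effect of the small pivot can be reproduced with a short experiment. The sketch below is an illustration under stated assumptions: the system is taken to be $10^{-4}x + y = 1$, $x + y = 2$ (the one consistent with the eliminated equations above), the helper name `solve_2x2` is hypothetical, and three-significant-digit truncating arithmetic is simulated with Python's `decimal` module.

```python
from decimal import Decimal, ROUND_DOWN, localcontext

def solve_2x2(rows, digits=3):
    """Eliminate on the first column of a 2x2 system given as rows (a1, a2, b),
    keeping only `digits` significant digits after every operation."""
    with localcontext() as ctx:
        ctx.prec = digits
        ctx.rounding = ROUND_DOWN          # truncate, so 9999 becomes 9990
        (a11, a12, b1), (a21, a22, b2) = [[Decimal(v) for v in r] for r in rows]
        m = a21 / a11                      # multiplier for the elimination step
        a22 = a22 - m * a12
        b2 = b2 - m * b1
        y = b2 / a22                       # back-substitution
        x = (b1 - a12 * y) / a11
        return x, y

system = [("1e-4", "1", "1"), ("1", "1", "2")]
no_pivot = solve_2x2(system)               # pivot 1e-4
with_pivot = solve_2x2(system[::-1])       # equations permuted, pivot 1
```

With these settings `no_pivot` evaluates to $(0, 1)$ and `with_pivot` to $(1, 1)$, the two outcomes described in the text.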
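A matrix $A$ is strictly column diagonally dominant if
\[
|a_{jj}| > \sum_{i=1,\, i \neq j}^{n} |a_{ij}|, \quad \text{for } j = 1, \ldots, n
\]
(resp. strictly row diagonally dominant if
\[
|a_{ii}| > \sum_{j=1,\, j \neq i}^{n} |a_{ij}|, \quad \text{for } i = 1, \ldots, n.)
\]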
It has been known for a long time (before 1900, say by Hadamard) that if a matrix A
is strictly column diagonally dominant (resp. strictly row diagonally dominant), then it is
invertible. (This is a good exercise, try it!) It can also be shown that if A is strictly column
diagonally dominant, then Gaussian elimination with partial pivoting does not actually require pivoting (See Problem 21.6 in Trefethen and Bau [110], or Question 2.19 in Demmel
[27]).
Another strategy, called complete pivoting, consists in choosing some entry $a^k_{ij}$, where
$k \le i, j \le n$, such that
\[
|a^k_{ij}| = \max_{k \le p, q \le n} |a^k_{pq}|.
\]
However, in this method, if the chosen pivot is not in column k, it is also necessary to
permute columns. This is achieved by multiplying on the right by a permutation matrix.
However, complete pivoting tends to be too expensive in practice, and partial pivoting is the
method of choice.
A special case where the LU -factorization is particularly efficient is the case of tridiagonal
matrices, which we now consider.
6.3
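Consider the following tridiagonal matrix:
\[
A = \begin{pmatrix}
b_1 & c_1 & & & & \\
a_2 & b_2 & c_2 & & & \\
 & a_3 & b_3 & c_3 & & \\
 & & \ddots & \ddots & \ddots & \\
 & & & a_{n-1} & b_{n-1} & c_{n-1} \\
 & & & & a_n & b_n
\end{pmatrix}.
\]
Define the sequence
\[
\delta_0 = 1, \qquad \delta_1 = b_1, \qquad \delta_k = b_k \delta_{k-1} - a_k c_{k-1} \delta_{k-2}, \quad 2 \le k \le n.
\]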
Proposition 6.7. If $A$ is the tridiagonal matrix above, then $\delta_k = \det(A[1..k, 1..k])$ for
$k = 1, \ldots, n$.
Proof. By expanding $\det(A[1..k, 1..k])$ with respect to its last row, the proposition follows
by induction on $k$.
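Theorem 6.8. If $A$ is the tridiagonal matrix above and $\delta_k \ne 0$ for $k = 1, \ldots, n$, then $A$ has
the following LU-factorization:
\[
A = \begin{pmatrix}
1 & & & & \\
a_2 \frac{\delta_0}{\delta_1} & 1 & & & \\
 & a_3 \frac{\delta_1}{\delta_2} & 1 & & \\
 & & \ddots & \ddots & \\
 & & & a_n \frac{\delta_{n-2}}{\delta_{n-1}} & 1
\end{pmatrix}
\begin{pmatrix}
\frac{\delta_1}{\delta_0} & c_1 & & & \\
 & \frac{\delta_2}{\delta_1} & c_2 & & \\
 & & \ddots & \ddots & \\
 & & & \frac{\delta_{n-1}}{\delta_{n-2}} & c_{n-1} \\
 & & & & \frac{\delta_n}{\delta_{n-1}}
\end{pmatrix}.
\]
Indeed, the diagonal entries of the product are
\[
a_k \frac{\delta_{k-2}}{\delta_{k-1}} c_{k-1} + \frac{\delta_k}{\delta_{k-1}} = b_k, \quad 2 \le k \le n,
\]
since $\delta_k = b_k \delta_{k-1} - a_k c_{k-1} \delta_{k-2}$.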
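It follows that there is a simple method to solve a linear system $Ax = d$ where $A$ is
tridiagonal (and $\delta_k \ne 0$ for $k = 1, \ldots, n$). For this, it is convenient to squeeze the diagonal
matrix $\Delta$ defined such that $\Delta_{kk} = \delta_k/\delta_{k-1}$ into the factorization so that $A = (L\Delta)(\Delta^{-1}U)$,
and if we let
\[
z_1 = \frac{c_1}{b_1}, \qquad z_k = c_k \frac{\delta_{k-1}}{\delta_k}, \quad 2 \le k \le n-1, \qquad
z_n = \frac{\delta_n}{\delta_{n-1}} = b_n - a_n z_{n-1},
\]
then $A = (L\Delta)(\Delta^{-1}U)$ is written as
\[
A = \begin{pmatrix}
\frac{c_1}{z_1} & & & & \\
a_2 & \frac{c_2}{z_2} & & & \\
 & a_3 & \frac{c_3}{z_3} & & \\
 & & \ddots & \ddots & \\
 & & a_{n-1} & \frac{c_{n-1}}{z_{n-1}} & \\
 & & & a_n & z_n
\end{pmatrix}
\begin{pmatrix}
1 & z_1 & & & \\
 & 1 & z_2 & & \\
 & & \ddots & \ddots & \\
 & & & 1 & z_{n-1} \\
 & & & & 1
\end{pmatrix}.
\]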
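As a consequence, the system $Ax = d$ can be solved by constructing three sequences: First,
the sequence
\[
z_1 = \frac{c_1}{b_1}, \qquad z_k = \frac{c_k}{b_k - a_k z_{k-1}}, \quad k = 2, \ldots, n-1, \qquad z_n = b_n - a_n z_{n-1},
\]
next
\[
w_1 = \frac{d_1}{b_1}, \qquad w_k = \frac{d_k - a_k w_{k-1}}{b_k - a_k z_{k-1}}, \quad k = 2, \ldots, n,
\]
and finally
\[
x_n = w_n, \qquad x_k = w_k - z_k x_{k+1}, \quad k = n-1, n-2, \ldots, 1.
\]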
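The three sequences translate directly into a short routine. The following is a minimal sketch, assuming $n \ge 2$, all $\delta_k \ne 0$ (so no denominator vanishes), and exact rational arithmetic; the function name `solve_tridiagonal` and the convention that `a[0]` is an unused placeholder are choices made here for illustration.

```python
from fractions import Fraction

def solve_tridiagonal(a, b, c, d):
    """Solve A x = d for tridiagonal A.
    a: subdiagonal with a[0] unused, b: diagonal (length n),
    c: superdiagonal (length n-1), d: right-hand side (length n)."""
    n = len(b)
    a, b, c, d = ([Fraction(v) for v in seq] for seq in (a, b, c, d))
    z = [c[0] / b[0]]                       # z_1
    w = [d[0] / b[0]]                       # w_1
    for k in range(1, n):
        denom = b[k] - a[k] * z[k - 1]      # b_k - a_k z_{k-1}
        if k < n - 1:
            z.append(c[k] / denom)          # z_k, only needed up to n-1
        w.append((d[k] - a[k] * w[k - 1]) / denom)
    x = [Fraction(0)] * n
    x[n - 1] = w[n - 1]                     # x_n = w_n
    for k in range(n - 2, -1, -1):
        x[k] = w[k] - z[k] * x[k + 1]       # back-substitution
    return x

# A has diagonal 2 and off-diagonals 1; the right-hand side is A (1, 2, 3)^T.
solution = solve_tridiagonal([0, 1, 1], [2, 2, 2], [1, 1], [4, 8, 8])
```

The routine performs a number of operations proportional to $n$, which is what makes the tridiagonal case particularly efficient.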
(1) The matrix $A$ is invertible. (Indeed, if $Ax = 0$, then $x^\top Ax = 0$, which implies $x = 0$.)
(2) We have $a_{ii} > 0$ for $i = 1, \ldots, n$. (Just observe that for $x = e_i$, the $i$th canonical basis
vector of $\mathbb{R}^n$, we have $e_i^\top A e_i = a_{ii} > 0$.)
(3) For every $n \times n$ invertible matrix $Z$, the matrix $Z^\top AZ$ is symmetric positive definite
iff $A$ is symmetric positive definite.
Next, we prove that a symmetric positive definite matrix has a special LU-factorization
of the form $A = BB^\top$, where $B$ is a lower-triangular matrix whose diagonal elements are
strictly positive. This is the Cholesky factorization.
6.4
First, we note that a symmetric positive definite matrix satisfies the condition of Proposition
6.2.
Proposition 6.9. If $A$ is a symmetric positive definite matrix, then $A[1..k, 1..k]$ is symmetric
positive definite, and thus invertible for $k = 1, \ldots, n$.
Proof. Since $A$ is symmetric, each $A[1..k, 1..k]$ is also symmetric. If $w \in \mathbb{R}^k$, with $1 \le k \le n$,
we let $x \in \mathbb{R}^n$ be the vector with $x_i = w_i$ for $i = 1, \ldots, k$ and $x_i = 0$ for $i = k+1, \ldots, n$.
Now, since $A$ is symmetric positive definite, we have $x^\top Ax > 0$ for all $x \in \mathbb{R}^n$ with $x \ne 0$.
This holds in particular for all vectors $x$ obtained from nonzero vectors $w \in \mathbb{R}^k$ as defined
earlier, and clearly
\[
x^\top Ax = w^\top A[1..k, 1..k]\, w,
\]
which implies that $A[1..k, 1..k]$ is positive definite. Thus, $A[1..k, 1..k]$ is also invertible.
Proposition 6.9 can be strengthened as follows: A symmetric matrix $A$ is positive definite
iff $\det(A[1..k, 1..k]) > 0$ for $k = 1, \ldots, n$.
The above fact is known as Sylvester's criterion. We will prove it after establishing the
Cholesky factorization.
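Let $A$ be an $n \times n$ symmetric positive definite matrix and write
\[
A = \begin{pmatrix} a_{11} & W^\top \\ W & C \end{pmatrix},
\]
where $C$ is an $(n-1) \times (n-1)$ symmetric matrix and $W$ is an $(n-1) \times 1$ matrix. Since $A$
is symmetric positive definite, $a_{11} > 0$, and we can compute $\alpha = \sqrt{a_{11}}$. The trick is that we
can factor $A$ uniquely as
\[
A = \begin{pmatrix} a_{11} & W^\top \\ W & C \end{pmatrix}
= \begin{pmatrix} \alpha & 0 \\ W/\alpha & I \end{pmatrix}
\begin{pmatrix} 1 & 0 \\ 0 & C - WW^\top/a_{11} \end{pmatrix}
\begin{pmatrix} \alpha & W^\top/\alpha \\ 0 & I \end{pmatrix},
\]
i.e., as $A = B_1 A_1 B_1^\top$, where $B_1$ is lower-triangular with positive diagonal entries. Thus, $B_1$
is invertible, and by fact (3) above, $A_1$ is also symmetric positive definite.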
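\[
A = \begin{pmatrix} a_{11} & W^\top \\ W & C \end{pmatrix}
= \begin{pmatrix} \alpha & 0 \\ W/\alpha & I \end{pmatrix}
\begin{pmatrix} 1 & 0 \\ 0 & C - WW^\top/a_{11} \end{pmatrix}
\begin{pmatrix} \alpha & W^\top/\alpha \\ 0 & I \end{pmatrix}
= B_1 A_1 B_1^\top.
\]
If $C - WW^\top/a_{11} = LL^\top$ with $L$ lower triangular with positive diagonal entries, then
\begin{align*}
A &= \begin{pmatrix} \alpha & 0 \\ W/\alpha & I \end{pmatrix}
\begin{pmatrix} 1 & 0 \\ 0 & C - WW^\top/a_{11} \end{pmatrix}
\begin{pmatrix} \alpha & W^\top/\alpha \\ 0 & I \end{pmatrix} \\
&= \begin{pmatrix} \alpha & 0 \\ W/\alpha & I \end{pmatrix}
\begin{pmatrix} 1 & 0 \\ 0 & LL^\top \end{pmatrix}
\begin{pmatrix} \alpha & W^\top/\alpha \\ 0 & I \end{pmatrix} \\
&= \begin{pmatrix} \alpha & 0 \\ W/\alpha & I \end{pmatrix}
\begin{pmatrix} 1 & 0 \\ 0 & L \end{pmatrix}
\begin{pmatrix} 1 & 0 \\ 0 & L^\top \end{pmatrix}
\begin{pmatrix} \alpha & W^\top/\alpha \\ 0 & I \end{pmatrix} \\
&= \begin{pmatrix} \alpha & 0 \\ W/\alpha & L \end{pmatrix}
\begin{pmatrix} \alpha & W^\top/\alpha \\ 0 & L^\top \end{pmatrix}.
\end{align*}
Therefore, if we let
\[
B = \begin{pmatrix} \alpha & 0 \\ W/\alpha & L \end{pmatrix},
\]
we have a unique lower-triangular matrix with positive diagonal entries and $A = BB^\top$.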
The uniqueness of the Cholesky decomposition can also be established using the uniqueness of an LU decomposition. Indeed, if $A = B_1 B_1^\top = B_2 B_2^\top$ where $B_1$ and $B_2$ are lower
triangular with positive diagonal entries, if we let $\Delta_1$ (resp. $\Delta_2$) be the diagonal matrix
consisting of the diagonal entries of $B_1$ (resp. $B_2$) so that $(\Delta_k)_{ii} = (B_k)_{ii}$ for $k = 1, 2$, then
we have two LU decompositions
\[
A = (B_1 \Delta_1^{-1})(\Delta_1 B_1^\top) = (B_2 \Delta_2^{-1})(\Delta_2 B_2^\top)
\]
with $B_1 \Delta_1^{-1}$, $B_2 \Delta_2^{-1}$ unit lower triangular, and $\Delta_1 B_1^\top$, $\Delta_2 B_2^\top$ upper triangular. By uniqueness
of LU factorization (Theorem 6.5(1)), we have
\[
B_1 \Delta_1^{-1} = B_2 \Delta_2^{-1}, \qquad \Delta_1 B_1^\top = \Delta_2 B_2^\top, \tag{$*$}
\]
and transposing the second equation gives $B_1 \Delta_1 = B_2 \Delta_2$.
The diagonal entries of $B_1 \Delta_1$ are $(B_1)_{ii}^2$ and similarly the diagonal entries of $B_2 \Delta_2$ are $(B_2)_{ii}^2$,
so the above equation implies that
\[
(B_1)_{ii}^2 = (B_2)_{ii}^2, \quad i = 1, \ldots, n.
\]
Since the diagonal entries of both $B_1$ and $B_2$ are assumed to be positive, we must have
\[
(B_1)_{ii} = (B_2)_{ii}, \quad i = 1, \ldots, n;
\]
that is, $\Delta_1 = \Delta_2$, and since both are invertible, we conclude from $(*)$ that $B_1 = B_2$.
The proof of Theorem 6.10 immediately yields an algorithm to compute B from A by
solving for a lower triangular matrix $B$ such that $A = BB^\top$. For $j = 1, \ldots, n$,
\[
b_{jj} = \left( a_{jj} - \sum_{k=1}^{j-1} b_{jk}^2 \right)^{1/2},
\]
and for $i = j+1, \ldots, n$,
\[
b_{ij} = \left( a_{ij} - \sum_{k=1}^{j-1} b_{ik} b_{jk} \right) / b_{jj}.
\]
The above formulae are used to compute the $j$th column of $B$ from top-down, using the first
$j - 1$ columns of $B$ previously computed, and the matrix $A$.
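The column-by-column formulae can be sketched as follows. This is an illustrative implementation in floating point, not tuned for numerical robustness; the function name `cholesky` is a choice made here, and the input is assumed to be symmetric (only its lower triangle is read).

```python
import math

def cholesky(a):
    """Return the lower-triangular B with positive diagonal such that A = B B^T.
    Raises ValueError if a non-positive value appears under a square root,
    which means that A is not positive definite."""
    n = len(a)
    b = [[0.0] * n for _ in range(n)]
    for j in range(n):
        # Diagonal entry: b_jj = sqrt(a_jj - sum_k b_jk^2).
        s = a[j][j] - sum(b[j][k] ** 2 for k in range(j))
        if s <= 0:
            raise ValueError("matrix is not positive definite")
        b[j][j] = math.sqrt(s)
        # Entries below the diagonal in column j.
        for i in range(j + 1, n):
            b[i][j] = (a[i][j] - sum(b[i][k] * b[j][k] for k in range(j))) / b[j][j]
    return b

B = cholesky([[4, 12, -16], [12, 37, -43], [-16, -43, 98]])
# B == [[2, 0, 0], [6, 1, 0], [-8, 5, 3]]
```

Only the first $j - 1$ columns of `b` are read while column $j$ is being filled, exactly as in the description above.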
Remark: It can be shown that this method requires $n^3/6 + O(n^2)$ additions, $n^3/6 + O(n^2)$
multiplications, $n^2/2 + O(n)$ divisions, and $O(n)$ square root extractions. Thus, the Cholesky
method requires half of the number of operations required by Gaussian elimination (since
Gaussian elimination requires $n^3/3 + O(n^2)$ additions, $n^3/3 + O(n^2)$ multiplications, and
$n^2/2 + O(n)$ divisions). It also requires half of the space (only $B$ is needed, as opposed to
both $L$ and $U$). Furthermore, it can be shown that Cholesky's method is numerically stable
(see Trefethen and Bau [110], Lecture 23).
Remark: If $A = BB^\top$, where $B$ is any invertible matrix, then $A$ is symmetric positive
definite.
Proof. Obviously, $BB^\top$ is symmetric, and since $B$ is invertible, $B^\top$ is invertible, and from
\[
x^\top Ax = x^\top BB^\top x = (B^\top x)^\top B^\top x,
\]
it is clear that $x^\top Ax > 0$ if $x \ne 0$.
We now give three more criteria for a symmetric matrix to be positive definite.
Proposition 6.11. Let $A$ be any $n \times n$ symmetric matrix. The following conditions are
equivalent:
(a) $A$ is positive definite.
(b) All principal minors of $A$ are positive; that is: $\det(A[1..k, 1..k]) > 0$ for $k = 1, \ldots, n$
(Sylvester's criterion).
(c) $A$ has an LU-factorization and all pivots are positive.
(d) $A$ has an $LDL^\top$-factorization and all pivots in $D$ are positive.
Proof. By Proposition 6.9, if $A$ is symmetric positive definite, then each matrix $A[1..k, 1..k]$ is
symmetric positive definite for $k = 1, \ldots, n$. By the Cholesky decomposition, $A[1..k, 1..k] = Q^\top Q$ for some invertible matrix $Q$, so $\det(A[1..k, 1..k]) = \det(Q)^2 > 0$. This shows that (a)
implies (b).
If $\det(A[1..k, 1..k]) > 0$ for $k = 1, \ldots, n$, then each $A[1..k, 1..k]$ is invertible. By Proposition 6.2, the matrix $A$ has an LU-factorization, and since the pivots $\pi_k$ are given by
\[
\pi_k = \begin{cases}
a_{11} = \det(A[1..1, 1..1]) & \text{if } k = 1, \\[4pt]
\dfrac{\det(A[1..k, 1..k])}{\det(A[1..k-1, 1..k-1])} & \text{if } k = 2, \ldots, n,
\end{cases}
\]
we see that $\pi_k > 0$ for $k = 1, \ldots, n$. Thus (b) implies (c).
Assume $A$ has an LU-factorization and that the pivots are all positive. Since $A$ is
symmetric, this implies that $A$ has a factorization of the form
\[
A = LDL^\top,
\]
with $L$ lower-triangular with 1s on its diagonal, and where $D$ is a diagonal matrix with
positive entries on the diagonal (the pivots). This shows that (c) implies (d).
Given a factorization $A = LDL^\top$ with all pivots in $D$ positive, if we form the diagonal
matrix
\[
\sqrt{D} = \mathrm{diag}(\sqrt{\pi_1}, \ldots, \sqrt{\pi_n})
\]
and if we let $B = L\sqrt{D}$, then $A = BB^\top$ with $B$ invertible, so $A$ is symmetric positive definite. Thus (d) implies (a).
Criterion (c) yields a simple computational test to check whether a symmetric matrix is
positive definite. There is one more criterion for a symmetric matrix to be positive definite:
its eigenvalues must be positive. We will have to learn about the spectral theorem for
symmetric matrices to establish this criterion.
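Criterion (c) can be turned into a small test along the following lines. This is a sketch under assumptions: the function name `is_positive_definite` is hypothetical, exact `Fraction` arithmetic is used so that the sign of each pivot is decided without roundoff, and elimination is performed without any row exchange.

```python
from fractions import Fraction

def is_positive_definite(a):
    """Check symmetry, then run Gaussian elimination without pivoting and
    require every pivot to be strictly positive (criterion (c))."""
    n = len(a)
    m = [[Fraction(v) for v in row] for row in a]
    if any(m[i][j] != m[j][i] for i in range(n) for j in range(i)):
        return False                       # not symmetric
    for k in range(n):
        if m[k][k] <= 0:
            return False                   # zero or negative pivot
        for i in range(k + 1, n):
            f = m[i][k] / m[k][k]
            for j in range(k, n):
                m[i][j] -= f * m[k][j]
    return True

good = is_positive_definite([[2, -1, 0], [-1, 2, -1], [0, -1, 2]])
bad = is_positive_definite([[1, 2], [2, 1]])   # pivots are 1 and -3
```

A zero pivot at step $k$ means that $\det(A[1..k, 1..k]) = 0$, so by criterion (b) the matrix cannot be positive definite, which is why the test stops there.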
For more on the stability analysis and efficient implementation methods of Gaussian
elimination, LU -factoring and Cholesky factoring, see Demmel [27], Trefethen and Bau [110],
Ciarlet [24], Golub and Van Loan [49], Meyer [80], Strang [104, 105], and Kincaid and Cheney
[63].
6.5
Gaussian elimination described in Section 6.2 can also be applied to rectangular matrices.
This yields a method for determining whether a system Ax = b is solvable, and a description
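of all the solutions when the system is solvable, for any rectangular $m \times n$ matrix $A$.
It turns out that the discussion is simpler if we rescale all pivots to be 1, and for this we
need a third kind of elementary matrix. For any $\lambda \ne 0$, let $E_{i,\lambda}$ be the $n \times n$ diagonal matrix
\[
E_{i,\lambda} = \begin{pmatrix}
1 & & & & \\
 & \ddots & & & \\
 & & \lambda & & \\
 & & & \ddots & \\
 & & & & 1
\end{pmatrix},
\]
with $(E_{i,\lambda})_{ii} = \lambda$ ($1 \le i \le n$). Note that $E_{i,\lambda}$ is also given by
\[
E_{i,\lambda} = I + (\lambda - 1) e_{i\,i},
\]
and that $E_{i,\lambda}$ is invertible with
\[
E_{i,\lambda}^{-1} = E_{i,\lambda^{-1}}.
\]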
The result is that after performing such elimination steps, we obtain a matrix that has
a special shape known as a reduced row echelon matrix . Here is an example illustrating this
process: Starting from the matrix
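\[
A_1 = \begin{pmatrix} 1 & 0 & 2 & 1 & 5 \\ 1 & 1 & 5 & 2 & 7 \\ 1 & 2 & 8 & 4 & 12 \end{pmatrix}
\]
we perform the following steps
\[
A_1 \longrightarrow A_2 = \begin{pmatrix} 1 & 0 & 2 & 1 & 5 \\ 0 & 1 & 3 & 1 & 2 \\ 0 & 2 & 6 & 3 & 7 \end{pmatrix},
\]
by subtracting row 1 from row 2 and row 3;
\[
A_2 \longrightarrow \begin{pmatrix} 1 & 0 & 2 & 1 & 5 \\ 0 & 2 & 6 & 3 & 7 \\ 0 & 1 & 3 & 1 & 2 \end{pmatrix}
\longrightarrow \begin{pmatrix} 1 & 0 & 2 & 1 & 5 \\ 0 & 1 & 3 & 3/2 & 7/2 \\ 0 & 1 & 3 & 1 & 2 \end{pmatrix}
\longrightarrow A_3 = \begin{pmatrix} 1 & 0 & 2 & 1 & 5 \\ 0 & 1 & 3 & 3/2 & 7/2 \\ 0 & 0 & 0 & -1/2 & -3/2 \end{pmatrix},
\]
after choosing 2 as the pivot and permuting row 2 and row 3, dividing row 2 by 2, and subtracting row 2 from row 3;
\[
A_3 \longrightarrow \begin{pmatrix} 1 & 0 & 2 & 1 & 5 \\ 0 & 1 & 3 & 3/2 & 7/2 \\ 0 & 0 & 0 & 1 & 3 \end{pmatrix}
\longrightarrow A_4 = \begin{pmatrix} 1 & 0 & 2 & 0 & 2 \\ 0 & 1 & 3 & 0 & -1 \\ 0 & 0 & 0 & 1 & 3 \end{pmatrix},
\]
after dividing row 3 by $-1/2$, subtracting row 3 from row 1, and subtracting $(3/2) \times$ row 3
from row 2.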
It is clear that columns 1, 2 and 4 are linearly independent, that column 3 is a linear
combination of columns 1 and 2, and that column 5 is a linear combination of columns
1, 2, 4.
In general, the sequence of steps leading to a reduced echelon matrix is not unique. For
example, we could have chosen 1 instead of 2 as the second pivot in matrix $A_2$. Nevertheless,
the reduced row echelon matrix obtained from any given matrix is unique; that is, it does
not depend on the sequence of steps that are followed during the reduction process. This
fact is not so easy to prove rigorously, but we will do it later.
If we want to solve a linear system of equations of the form Ax = b, we apply elementary
row operations to both the matrix A and the right-hand side b. To do this conveniently, we
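form the augmented matrix $(A, b)$, which is the $m \times (n + 1)$ matrix obtained by adding $b$ as
an extra column to the matrix $A$. For example if
\[
A = \begin{pmatrix} 1 & 0 & 2 & 1 \\ 1 & 1 & 5 & 2 \\ 1 & 2 & 8 & 4 \end{pmatrix}
\quad \text{and} \quad
b = \begin{pmatrix} 5 \\ 7 \\ 12 \end{pmatrix},
\]
then the augmented matrix is
\[
(A, b) = \begin{pmatrix} 1 & 0 & 2 & 1 & 5 \\ 1 & 1 & 5 & 2 & 7 \\ 1 & 2 & 8 & 4 & 12 \end{pmatrix}.
\]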
Now, for any matrix M , since
M (A, b) = (M A, M b),
performing elementary row operations on (A, b) is equivalent to simultaneously performing
operations on both A and b. For example, consider the system
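\[
\begin{array}{rcrcrcrcr}
x_1 & & & + & 2x_3 & + & x_4 & = & 5 \\
x_1 & + & x_2 & + & 5x_3 & + & 2x_4 & = & 7 \\
x_1 & + & 2x_2 & + & 8x_3 & + & 4x_4 & = & 12.
\end{array}
\]
Its augmented matrix is the matrix
\[
(A, b) = \begin{pmatrix} 1 & 0 & 2 & 1 & 5 \\ 1 & 1 & 5 & 2 & 7 \\ 1 & 2 & 8 & 4 & 12 \end{pmatrix}
\]
considered above, so the reduction steps applied to this matrix yield the system
\[
\begin{array}{rcrcrcrcr}
x_1 & & & + & 2x_3 & & & = & 2 \\
 & & x_2 & + & 3x_3 & & & = & -1 \\
 & & & & & & x_4 & = & 3.
\end{array}
\]
This reduced system has the same set of solutions as the original, and obviously $x_3$ can be
chosen arbitrarily. Therefore, our system has infinitely many solutions given by
\[
x_1 = 2 - 2x_3, \qquad x_2 = -1 - 3x_3, \qquad x_4 = 3,
\]
where $x_3$ is arbitrary.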
The following proposition shows that the set of solutions of a system Ax = b is preserved
by any sequence of row operations.
Proposition 6.12. Given any $m \times n$ matrix $A$ and any vector $b \in \mathbb{R}^m$, for any sequence
of elementary row operations $E_1, \ldots, E_k$, if $P = E_k \cdots E_1$ and $(A', b') = P(A, b)$, then the
solutions of $Ax = b$ are the same as the solutions of $A'x = b'$.
Proof. Since each elementary row operation $E_i$ is invertible, so is $P$, and since $(A', b') = P(A, b)$, then $A' = PA$ and $b' = Pb$. If $x$ is a solution of the original system $Ax = b$, then
multiplying both sides by $P$ we get $PAx = Pb$; that is, $A'x = b'$, so $x$ is a solution of the
new system. Conversely, assume that $x$ is a solution of the new system, that is $A'x = b'$.
Then, because $A' = PA$, $b' = Pb$, and $P$ is invertible, we get
\[
Ax = P^{-1}A'x = P^{-1}b' = b,
\]
so $x$ is a solution of the original system $Ax = b$.
For example, the matrix
\[
A = \begin{pmatrix} 1 & 6 & 0 & 1 \\ 0 & 0 & 1 & 2 \\ 0 & 0 & 0 & 0 \end{pmatrix}
\]
is a reduced row echelon matrix.
The following proposition shows that every matrix can be converted to a reduced row
echelon form using row operations.
\[
A_1 = \begin{pmatrix}
0 & \cdots & 0 & 1 & \times & \cdots & \times \\
0 & \cdots & 0 & 0 & \times & \cdots & \times \\
\vdots & & \vdots & \vdots & \vdots & & \vdots \\
0 & \cdots & 0 & 0 & \times & \cdots & \times
\end{pmatrix},
\]
where $\times$ stands for an arbitrary scalar, or more concisely
\[
A_1 = \begin{pmatrix} 0 & 1 & B \\ 0 & 0 & D \end{pmatrix},
\]
where $D$ is a $(m-1) \times (n-j)$ matrix. If $j = n$, we are done. Otherwise, by the induction
hypothesis applied to $D$, there is a sequence of row operations that converts $D$ to a reduced
row echelon matrix $R'$, and these row operations do not affect the first row of $A_1$, which
means that $A_1$ is reduced to a matrix of the form
\[
R = \begin{pmatrix} 0 & 1 & B \\ 0 & 0 & R' \end{pmatrix}.
\]
Because $R'$ is a reduced row echelon matrix, the matrix $R$ satisfies conditions (a) and (b) of
the reduced row echelon form. Finally, the entries above all pivots in $R'$ can be cleared out
by subtracting suitable multiples of the rows of $R'$ containing a pivot. The resulting matrix
also satisfies condition (c), and the induction step is complete.
Remark: There is a Matlab function named rref that converts any matrix to its reduced
row echelon form.
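For readers without Matlab, the reduction can be sketched in a few lines of Python. This is an illustration only: it reuses the name `rref` for a hypothetical helper written here, it uses exact `Fraction` arithmetic, and it simply takes the first nonzero entry in each column as pivot.

```python
from fractions import Fraction

def rref(matrix):
    """Return (R, pivots): the reduced row echelon form of `matrix` and the
    0-based indices of its pivot columns."""
    m = [[Fraction(v) for v in row] for row in matrix]
    rows, cols = len(m), len(m[0])
    pivots = []
    r = 0                                   # row where the next pivot will be placed
    for c in range(cols):
        if r == rows:
            break
        p = next((i for i in range(r, rows) if m[i][c] != 0), None)
        if p is None:
            continue                        # no pivot in this column
        m[r], m[p] = m[p], m[r]             # row exchange
        pivot = m[r][c]
        m[r] = [v / pivot for v in m[r]]    # rescale the pivot to 1
        for i in range(rows):
            if i != r and m[i][c] != 0:
                f = m[i][c]
                m[i] = [vi - f * vr for vi, vr in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
    return m, pivots

R, pivots = rref([[1, 0, 2, 1, 5], [1, 1, 5, 2, 7], [1, 2, 8, 4, 12]])
# R == [[1, 0, 2, 0, 2], [0, 1, 3, 0, -1], [0, 0, 0, 1, 3]], pivots == [0, 1, 3]
```

The number of entries in `pivots` is the rank of the matrix, in line with the observation that follows.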
If A is any matrix and if R is a reduced row echelon form of A, the second part of
Proposition 6.13 can be sharpened a little. Namely, the rank of A is equal to the number of
pivots in R.
This is because the structure of a reduced row echelon matrix makes it clear that its rank
is equal to the number of pivots.
Given a system of the form $Ax = b$, we can apply the reduction procedure to the augmented matrix $(A, b)$ to obtain a reduced row echelon matrix $(A', b')$ such that the system
$A'x = b'$ has the same solutions as the original system $Ax = b$. The advantage of the reduced
system $A'x = b'$ is that there is a simple test to check whether this system is solvable, and
to find its solutions if it is solvable.
Indeed, if any row of the matrix $A'$ is zero and if the corresponding entry in $b'$ is nonzero,
then it is a pivot and we have the equation
\[
0 = 1,
\]
which means that the system $A'x = b'$ has no solution. On the other hand, if there is no
pivot in $b'$, then for every row $i$ in which $b'_i \ne 0$, there is some column $j$ in $A'$ where the
entry on row $i$ is 1 (a pivot). Consequently, we can assign arbitrary values to the variable
$x_k$ if column $k$ does not contain a pivot, and then solve for the pivot variables.
For example, if we consider the reduced row echelon matrix
\[
(A', b') = \begin{pmatrix} 1 & 6 & 0 & 1 & 0 \\ 0 & 0 & 1 & 2 & 0 \\ 0 & 0 & 0 & 0 & 1 \end{pmatrix},
\]
there is no solution, since the last row has a pivot in $b'$ (it encodes the equation $0 = 1$). On the other hand, the reduced system
\[
(A', b') = \begin{pmatrix} 1 & 6 & 0 & 1 & 1 \\ 0 & 0 & 1 & 2 & 3 \\ 0 & 0 & 0 & 0 & 0 \end{pmatrix}
\]
has solutions. We can pick the variables $x_2$, $x_4$ corresponding to nonpivot columns arbitrarily,
and then solve for $x_3$ (using the second equation) and $x_1$ (using the first equation).
The above reasoning proved the following theorem:
Theorem 6.15. Given any system $Ax = b$ where $A$ is a $m \times n$ matrix, if the augmented
matrix $(A, b)$ is a reduced row echelon matrix, then the system $Ax = b$ has a solution iff there
is no pivot in $b$. In that case, an arbitrary value can be assigned to the variable $x_j$ if column
$j$ does not contain a pivot.
Nonpivot variables are often called free variables.
Putting Proposition 6.14 and Theorem 6.15 together we obtain a criterion to decide
whether a system $Ax = b$ has a solution: Convert the augmented system $(A, b)$ to a row
reduced echelon matrix $(A', b')$ and check whether $b'$ has no pivot.
Remark: When writing a program implementing row reduction, we may stop when the last
column of the matrix $A$ is reached. In this case, the test whether the system $Ax = b$ is
solvable is that the row-reduced matrix $A'$ has no zero row of index $i > r$ such that $b'_i \ne 0$
(where $r$ is the number of pivots, and $b'$ is the row-reduced right-hand side).
If we have a homogeneous system Ax = 0, which means that b = 0, of course x = 0 is
always a solution, but Theorem 6.15 implies that if the system Ax = 0 has more variables
than equations, then it has some nonzero solution (we call it a nontrivial solution).
Proposition 6.16. Given any homogeneous system $Ax = 0$ of $m$ equations in $n$ variables,
if $m < n$, then there is a nonzero vector $x \in \mathbb{R}^n$ such that $Ax = 0$.
Proof. Convert the matrix $A$ to a reduced row echelon matrix $A'$. We know that $Ax = 0$ iff
$A'x = 0$. If $r$ is the number of pivots of $A'$, we must have $r \le m$, so by Theorem 6.15 we may
assign arbitrary values to $n - r > 0$ nonpivot variables and we get nontrivial solutions.
Theorem 6.15 can also be used to characterize when a square matrix is invertible. First,
note the following simple but important fact:
If a square $n \times n$ matrix $A$ is a row reduced echelon matrix, then either $A$ is the identity
or the bottom row of A is zero.
Proposition 6.17. Let A be a square matrix of dimension n. The following conditions are
equivalent:
(a) The matrix A can be reduced to the identity by a sequence of elementary row operations.
(b) The matrix A is a product of elementary matrices.
(c) The matrix A is invertible.
(d) The system of homogeneous equations Ax = 0 has only the trivial solution x = 0.
Proof. First, we prove that (a) implies (b). If $A$ can be reduced to the identity by a sequence
of row operations $E_1, \ldots, E_p$, this means that $E_p \cdots E_1 A = I$. Since each $E_i$ is invertible,
we get
\[
A = E_1^{-1} \cdots E_p^{-1},
\]
where each $E_i^{-1}$ is also an elementary row operation, so (b) holds. Now if (b) holds, since
elementary row operations are invertible, $A$ is invertible, and (c) holds. If $A$ is invertible, we
already observed that the homogeneous system $Ax = 0$ has only the trivial solution $x = 0$,
because from $Ax = 0$, we get $A^{-1}Ax = A^{-1}0$; that is, $x = 0$. It remains to prove that (d)
implies (a), and for this we prove the contrapositive: if (a) does not hold, then (d) does not
hold.
Using our basic observation about reducing square matrices, if $A$ does not reduce to the
identity, then $A$ reduces to a row echelon matrix $A'$ whose bottom row is zero. Say $A' = PA$,
where $P$ is a product of elementary row operations. Because the bottom row of $A'$ is zero,
the system $A'x = 0$ has at most $n - 1$ nontrivial equations, and by Proposition 6.16, this
system has a nontrivial solution $x$. But then $Ax = P^{-1}A'x = 0$ with $x \ne 0$, so (d) does not hold.
Proposition 6.17 also yields a way to compute the inverse of an invertible matrix $A$: if $E_p \cdots E_1 A = I$, then multiplying both sides on the right by $A^{-1}$ gives
\[
A^{-1} = E_p \cdots E_1.
\]
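From a practical point of view, we can build up the product $E_p \cdots E_1$ by reducing to row
echelon form the augmented $n \times 2n$ matrix $(A, I_n)$ obtained by adding the $n$ columns of the
identity matrix to $A$. This is just another way of performing the Gauss-Jordan procedure.
Here is an example: let us find the inverse of the matrix
\[
A = \begin{pmatrix} 5 & 4 \\ 6 & 5 \end{pmatrix}.
\]
We form the $2 \times 4$ block matrix
\[
(A, I) = \begin{pmatrix} 5 & 4 & 1 & 0 \\ 6 & 5 & 0 & 1 \end{pmatrix}
\]
and apply elementary row operations to reduce $A$ to the identity. For example:
\[
(A, I) = \begin{pmatrix} 5 & 4 & 1 & 0 \\ 6 & 5 & 0 & 1 \end{pmatrix}
\longrightarrow \begin{pmatrix} 5 & 4 & 1 & 0 \\ 1 & 1 & -1 & 1 \end{pmatrix}
\]
by subtracting row 1 from row 2,
\[
\begin{pmatrix} 5 & 4 & 1 & 0 \\ 1 & 1 & -1 & 1 \end{pmatrix}
\longrightarrow \begin{pmatrix} 1 & 0 & 5 & -4 \\ 1 & 1 & -1 & 1 \end{pmatrix}
\]
by subtracting $4 \times$ row 2 from row 1,
\[
\begin{pmatrix} 1 & 0 & 5 & -4 \\ 1 & 1 & -1 & 1 \end{pmatrix}
\longrightarrow \begin{pmatrix} 1 & 0 & 5 & -4 \\ 0 & 1 & -6 & 5 \end{pmatrix} = (I, A^{-1}),
\]
by subtracting row 1 from row 2. Thus
\[
A^{-1} = \begin{pmatrix} 5 & -4 \\ -6 & 5 \end{pmatrix}.
\]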
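The same procedure on $(A, I_n)$ can be sketched in code. This is an illustrative version, not a tuned implementation: the function name `inverse` is a choice made here, arithmetic is exact (`Fraction`), and the sequence of row operations differs from the hand computation above, although the resulting inverse is the same.

```python
from fractions import Fraction

def inverse(a):
    """Invert a square matrix by reducing the augmented matrix (A, I) to (I, A^{-1})."""
    n = len(a)
    aug = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(a)]
    for c in range(n):
        p = next((i for i in range(c, n) if aug[i][c] != 0), None)
        if p is None:
            raise ValueError("matrix is not invertible")
        aug[c], aug[p] = aug[p], aug[c]          # bring a nonzero pivot into place
        pivot = aug[c][c]
        aug[c] = [v / pivot for v in aug[c]]     # rescale the pivot to 1
        for i in range(n):
            if i != c:
                f = aug[i][c]
                aug[i] = [vi - f * vc for vi, vc in zip(aug[i], aug[c])]
    return [row[n:] for row in aug]              # right half is the inverse

A_inv = inverse([[5, 4], [6, 5]])
# A_inv == [[5, -4], [-6, 5]]
```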
Proposition 6.17 can also be used to give an elementary proof of the fact that if a square
matrix $A$ has a left inverse $B$ (resp. a right inverse $B$), so that $BA = I$ (resp. $AB = I$),
then $A$ is invertible and $A^{-1} = B$. This is an interesting exercise, try it!
For the sake of completeness, we prove that the reduced row echelon form of a matrix is
unique. The neat proof given below is borrowed and adapted from W. Kahan.
Proposition 6.18. Let $A$ be any $m \times n$ matrix. If $U$ and $V$ are two reduced row echelon
matrices obtained from $A$ by applying two sequences of elementary row operations $E_1, \ldots, E_p$
and $F_1, \ldots, F_q$, so that
\[
U = E_p \cdots E_1 A \quad \text{and} \quad V = F_q \cdots F_1 A,
\]
then $U = V$ and $E_p \cdots E_1 = F_q \cdots F_1$. In other words, the reduced row echelon form of any
matrix is unique.
Proof. Let
\[
C = E_p \cdots E_1 F_1^{-1} \cdots F_q^{-1}
\]
so that
\[
U = CV \quad \text{and} \quad V = C^{-1}U.
\]
Therefore, we may simplify our task by striking out columns of zeros from $U$, $V$, and $A$,
since they will have corresponding indices. We still use $n$ to denote the number of columns of
$A$. Observe that because $U$ and $V$ are reduced row echelon matrices with no zero columns,
we must have $u_1 = v_1 = \ell_1$.
Claim. If $U$ and $V$ are reduced row echelon matrices without zero columns such that
$U = CV$, for all $k \ge 1$, if $k \le n$, then $\ell_k$ occurs in $U$ iff $\ell_k$ occurs in $V$, and if $\ell_k$ does occur
in $U$, then
1. $\ell_k$ occurs for the same index $j_k$ in both $U$ and $V$;
2. the first $j_k$ columns of $U$ and $V$ match;
3. the subsequent columns in $U$ and $V$ (of index $> j_k$) whose elements beyond the $k$th
all vanish also match;
4. the first $k$ columns of $C$ match the first $k$ columns of $I_n$.
We prove this claim by induction on $k$.
For the base case $k = 1$, we already know that $u_1 = v_1 = \ell_1$. We also have
\[
c_1 = C\ell_1 = Cv_1 = u_1 = \ell_1.
\]
which means that $x_1 - x_0 \in \mathrm{Ker}(A)$. Therefore, $Z \subseteq x_0 + \mathrm{Ker}(A)$, where $x_0$ is a special
solution of $Ax = b$. Conversely, if $Ax_0 = b$, then for any $z \in \mathrm{Ker}(A)$, we have $Az = 0$, and
so
\[
A(x_0 + z) = Ax_0 + Az = b + 0 = b,
\]
which shows that $x_0 + \mathrm{Ker}(A) \subseteq Z$. Therefore, $Z = x_0 + \mathrm{Ker}(A)$.
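Given a linear system $Ax = b$, reduce the augmented matrix $(A, b)$ to its row echelon
form $(A', b')$. As we showed before, the system $Ax = b$ has a solution iff $b'$ contains no pivot.
Assume that this is the case. Then, if $(A', b')$ has $r$ pivots, which means that $A'$ has $r$ pivots
since $b'$ has no pivot, we know that the first $r$ columns of $I_n$ appear in $A'$.
We can permute the columns of $A'$ and renumber the variables in $x$ correspondingly so
that the first $r$ columns of $I_n$ match the first $r$ columns of $A'$, and then our reduced echelon
matrix is of the form $(R, b')$ with
\[
R = \begin{pmatrix} I_r & F \\ 0_{m-r,r} & 0_{m-r,n-r} \end{pmatrix}
\quad \text{and} \quad
b' = \begin{pmatrix} d \\ 0_{m-r} \end{pmatrix},
\]
where $F$ is an $r \times (n-r)$ matrix and $d \in \mathbb{R}^r$. Since
\[
\begin{pmatrix} I_r & F \\ 0_{m-r,r} & 0_{m-r,n-r} \end{pmatrix}
\begin{pmatrix} d \\ 0_{n-r} \end{pmatrix}
= \begin{pmatrix} d \\ 0_{m-r} \end{pmatrix},
\]
we see that
\[
x_0 = \begin{pmatrix} d \\ 0_{n-r} \end{pmatrix}
\]
is a special solution of $Rx = b'$.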
To find a basis of the kernel, observe that the $n - r$ columns of the $n \times (n-r)$ matrix
\[
N = \begin{pmatrix} -F \\ I_{n-r} \end{pmatrix}
\]
satisfy $RN = 0$, and that they
form a basis of the kernel of $A$. This is because $N$ contains the identity matrix $I_{n-r}$ as a
submatrix, so the columns of $N$ are linearly independent. In summary, if $N^1, \ldots, N^{n-r}$ are
the columns of $N$, then the general solution of the equation $Ax = b$ is given by
\[
x = \begin{pmatrix} d \\ 0_{n-r} \end{pmatrix} + x_{r+1} N^1 + \cdots + x_n N^{n-r},
\]
where $x_{r+1}, \ldots, x_n$ are the free variables; that is, the nonpivot variables.
In the general case where the columns corresponding to pivots are mixed with the columns
corresponding to free variables, we find the special solution as follows. Let $i_1 < \cdots < i_r$ be
the indices of the columns corresponding to pivots. Then, assign $b'_k$ to the pivot variable
$x_{i_k}$ for $k = 1, \ldots, r$, and set all other variables to 0. To find a basis of the kernel, we
form the $n - r$ vectors $N^k$ obtained as follows. Let $j_1 < \cdots < j_{n-r}$ be the indices of the
columns corresponding to free variables. For every column $j_k$ corresponding to a free variable
($1 \le k \le n - r$), form the vector $N^k$ defined so that the entries $N^k_{i_1}, \ldots, N^k_{i_r}$ are equal to the
negatives of the first $r$ entries in column $j_k$ (flip the sign of these entries); let $N^k_{j_k} = 1$, and set
all other entries to zero. The presence of the 1 in position $j_k$ guarantees that $N^1, \ldots, N^{n-r}$
are linearly independent.
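The recipe for the general case can be sketched as follows. This is a minimal illustration under assumptions: the input is already a reduced row echelon matrix $(A', b')$ with no pivot in $b'$, the pivot columns are supplied as 0-based indices, and the function name `special_solution_and_kernel` is hypothetical.

```python
from fractions import Fraction

def special_solution_and_kernel(aug, pivots):
    """aug: rows of a reduced row echelon matrix (A', b'), last column is b'.
    pivots: 0-based pivot columns i_1 < ... < i_r of A'.
    Returns (x0, basis): a special solution and a list of kernel basis vectors."""
    n = len(aug[0]) - 1
    free = [j for j in range(n) if j not in pivots]
    x0 = [Fraction(0)] * n
    for k, i_k in enumerate(pivots):
        x0[i_k] = Fraction(aug[k][n])       # pivot variable x_{i_k} gets b'_k
    basis = []
    for j in free:
        v = [Fraction(0)] * n
        for k, i_k in enumerate(pivots):
            v[i_k] = -Fraction(aug[k][j])   # flip the sign of the entries of column j
        v[j] = Fraction(1)                  # the 1 in the position of the free variable
        basis.append(v)
    return x0, basis

# Reduced form of the system x1 + 2 x3 + x4 = 5, ... considered earlier.
x0, basis = special_solution_and_kernel(
    [[1, 0, 2, 0, 2], [0, 1, 3, 0, -1], [0, 0, 0, 1, 3]], [0, 1, 3])
# x0 == [2, -1, 0, 3] and basis == [[-2, -3, 1, 0]]
```

This agrees with the solutions $x_1 = 2 - 2x_3$, $x_2 = -1 - 3x_3$, $x_4 = 3$ found earlier for that system.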
As an illustration of the above method, consider the problem of finding a basis of the
subspace $V$ of $n \times n$ matrices $A \in M_n(\mathbb{R})$ satisfying the following properties:
1. The sum of the entries in every row has the same value (say $c_1$);
2. The sum of the entries in every column has the same value (say $c_2$).
It turns out that $c_1 = c_2$ and that the $2n - 2$ equations corresponding to the above conditions
are linearly independent. We leave the proof of these facts as an interesting exercise. By the
duality theorem, the dimension of the space $V$ of matrices satisfying the above equations is
$n^2 - (2n - 2)$. Let us consider the case $n = 4$. There are 6 equations, and the space $V$ has
dimension 10. The equations are
\begin{align*}
a_{11} + a_{12} + a_{13} + a_{14} - a_{21} - a_{22} - a_{23} - a_{24} &= 0 \\
a_{21} + a_{22} + a_{23} + a_{24} - a_{31} - a_{32} - a_{33} - a_{34} &= 0 \\
a_{31} + a_{32} + a_{33} + a_{34} - a_{41} - a_{42} - a_{43} - a_{44} &= 0 \\
a_{11} + a_{21} + a_{31} + a_{41} - a_{12} - a_{22} - a_{32} - a_{42} &= 0 \\
a_{12} + a_{22} + a_{32} + a_{42} - a_{13} - a_{23} - a_{33} - a_{43} &= 0 \\
a_{13} + a_{23} + a_{33} + a_{43} - a_{14} - a_{24} - a_{34} - a_{44} &= 0,
\end{align*}
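and the corresponding matrix is
\[
A = \left(\begin{array}{rrrrrrrrrrrrrrrr}
1 & 1 & 1 & 1 & -1 & -1 & -1 & -1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 1 & 1 & 1 & -1 & -1 & -1 & -1 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 1 & 1 & 1 & -1 & -1 & -1 & -1 \\
1 & -1 & 0 & 0 & 1 & -1 & 0 & 0 & 1 & -1 & 0 & 0 & 1 & -1 & 0 & 0 \\
0 & 1 & -1 & 0 & 0 & 1 & -1 & 0 & 0 & 1 & -1 & 0 & 0 & 1 & -1 & 0 \\
0 & 0 & 1 & -1 & 0 & 0 & 1 & -1 & 0 & 0 & 1 & -1 & 0 & 0 & 1 & -1
\end{array}\right).
\]
The result of performing the reduction to row echelon form yields the following matrix
in rref:
\[
U = \left(\begin{array}{rrrrrrrrrrrrrrrr}
1 & 0 & 0 & 0 & 0 & -1 & -1 & -1 & 0 & -1 & -1 & -1 & 2 & 1 & 1 & 1 \\
0 & 1 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 1 & 0 & 0 & -1 & 0 & -1 & -1 \\
0 & 0 & 1 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 1 & 0 & -1 & -1 & 0 & -1 \\
0 & 0 & 0 & 1 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 1 & -1 & -1 & -1 & 0 \\
0 & 0 & 0 & 0 & 1 & 1 & 1 & 1 & 0 & 0 & 0 & 0 & -1 & -1 & -1 & -1 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 1 & 1 & 1 & -1 & -1 & -1 & -1
\end{array}\right).
\]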
The list pivlist of indices of the pivot variables and the list freelist of indices of the free
variables is given by
pivlist = (1, 2, 3, 4, 5, 9),
freelist = (6, 7, 8, 10, 11, 12, 13, 14, 15, 16).
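After applying the algorithm to find a basis of the kernel of $U$, we find the following $16 \times 10$
matrix
\[
BK = \left(\begin{array}{rrrrrrrrrr}
1 & 1 & 1 & 1 & 1 & 1 & -2 & -1 & -1 & -1 \\
-1 & 0 & 0 & -1 & 0 & 0 & 1 & 0 & 1 & 1 \\
0 & -1 & 0 & 0 & -1 & 0 & 1 & 1 & 0 & 1 \\
0 & 0 & -1 & 0 & 0 & -1 & 1 & 1 & 1 & 0 \\
-1 & -1 & -1 & 0 & 0 & 0 & 1 & 1 & 1 & 1 \\
1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & -1 & -1 & -1 & 1 & 1 & 1 & 1 \\
0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1
\end{array}\right).
\]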
The reader should check that in each column $j$ of $BK$, the lowest 1 belongs to the
row whose index is the $j$th element in freelist, and that in each column $j$ of $BK$, the signs of
the entries whose indices belong to pivlist are the flipped signs of the 6 entries in the column
of $U$ corresponding to the $j$th index in freelist. We can now read off from $BK$ the $4 \times 4$ matrices
that form a basis of $V$: every column of $BK$ corresponds to a matrix whose rows have been
concatenated. We get the following 10 matrices:
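\[
M_1 = \begin{pmatrix} 1 & -1 & 0 & 0 \\ -1 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix}, \quad
M_2 = \begin{pmatrix} 1 & 0 & -1 & 0 \\ -1 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix}, \quad
M_3 = \begin{pmatrix} 1 & 0 & 0 & -1 \\ -1 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix},
\]
\[
M_4 = \begin{pmatrix} 1 & -1 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ -1 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix}, \quad
M_5 = \begin{pmatrix} 1 & 0 & -1 & 0 \\ 0 & 0 & 0 & 0 \\ -1 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix}, \quad
M_6 = \begin{pmatrix} 1 & 0 & 0 & -1 \\ 0 & 0 & 0 & 0 \\ -1 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 \end{pmatrix},
\]
\[
M_7 = \begin{pmatrix} -2 & 1 & 1 & 1 \\ 1 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 \end{pmatrix}, \quad
M_8 = \begin{pmatrix} -1 & 0 & 1 & 1 \\ 1 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \end{pmatrix}, \quad
M_9 = \begin{pmatrix} -1 & 1 & 0 & 1 \\ 1 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{pmatrix},
\]
\[
M_{10} = \begin{pmatrix} -1 & 1 & 1 & 0 \\ 1 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}.
\]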
Recall that a magic square is a square matrix that satisfies the two conditions about
the sum of the entries in each row and in each column to be the same number, and also
the additional two constraints that the main descending and the main ascending diagonals
add up to this common number. Furthermore, the entries are also required to be positive
integers. For n = 4, the additional two equations are
\begin{align*}
a_{22} + a_{33} + a_{44} - a_{12} - a_{13} - a_{14} &= 0 \\
a_{41} + a_{32} + a_{23} - a_{11} - a_{12} - a_{13} &= 0,
\end{align*}
and the 8 equations stating that a matrix is a magic square are linearly independent. Again,
by running row elimination, we get a basis of the generalized magic squares whose entries
are not restricted to be positive integers. We find a basis of 8 matrices. For n = 3, we find
a basis of 3 matrices.
A magic square is said to be normal if its entries are precisely the integers $1, 2, \ldots, n^2$.
Then, since the sum of these entries is
\[ 1 + 2 + 3 + \cdots + n^2 = \frac{n^2(n^2 + 1)}{2}, \]
and since each row (and column) sums to the same number, this common value (the magic
sum) is
\[ \frac{n(n^2 + 1)}{2}. \]
It is easy to see that there are no normal magic squares for n = 2. For n = 3, the magic sum
is 15, and for n = 4, it is 34. In the case n = 3, we have the additional condition that the
rows and columns add up to 15, so we end up with a solution parametrized by two numbers
x1 , x2 ; namely,
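\[
\begin{pmatrix}
x_1 + x_2 - 5 & 10 - x_2 & 10 - x_1 \\
20 - 2x_1 - x_2 & 5 & 2x_1 + x_2 - 10 \\
x_1 & x_2 & 15 - x_1 - x_2
\end{pmatrix}.
\]
Thus, in order to find a normal magic square, we have the additional inequality constraints
\begin{align*}
x_1 + x_2 &> 5 \\
x_1 &< 10 \\
x_2 &< 10 \\
2x_1 + x_2 &< 20 \\
2x_1 + x_2 &> 10 \\
x_1 &> 0 \\
x_2 &> 0 \\
x_1 + x_2 &< 15,
\end{align*}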
and all 9 entries in the matrix must be distinct. After a tedious case analysis, we discover the
remarkable fact that there is a unique normal magic square (up to rotations and reflections):
2 7 6
9 5 1 .
4 3 8
It turns out that there are 880 different normal magic squares for n = 4, and 275,305,224
normal magic squares for n = 5 (up to rotations and reflections). Even for n = 4, it takes a
fair amount of work to enumerate them all!
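For $n = 3$ the "tedious case analysis" can be delegated to a short search. The sketch below, which is an illustration and not part of the original text, enumerates integer values of the two parameters $x_1, x_2$ in the parametrized matrix given above and keeps the matrices whose entries are exactly $1, \ldots, 9$. The function name `normal_magic_squares_3` is our own.

```python
def normal_magic_squares_3():
    """All 3x3 normal magic squares, found through the two-parameter family."""
    squares = []
    for x1 in range(1, 10):
        for x2 in range(1, 10):
            sq = [
                [x1 + x2 - 5, 10 - x2, 10 - x1],
                [20 - 2 * x1 - x2, 5, 2 * x1 + x2 - 10],
                [x1, x2, 15 - x1 - x2],
            ]
            # Normal means: the nine entries are exactly 1, 2, ..., 9.
            if sorted(e for row in sq for e in row) == list(range(1, 10)):
                squares.append(sq)
    return squares

squares = normal_magic_squares_3()
print(len(squares))   # 8: one square together with its rotations and reflections
for sq in squares:
    print(sq)
```

The eight solutions are the images of a single square under the eight symmetries of the square, which is the uniqueness statement made above.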
Instead of performing elementary row operations on a matrix $A$, we can perform elementary column operations, which means that we multiply $A$ by elementary matrices on the
right. As elementary row and column operations, $P(i, k)$, $E_{i,j;\beta}$, $E_{i,\lambda}$ perform the following
actions:
1. As a row operation, $P(i, k)$ permutes row $i$ and row $k$.
2. As a column operation, $P(i, k)$ permutes column $i$ and column $k$.
3. The inverse of $P(i, k)$ is $P(i, k)$ itself.
6.6
In this section, we characterize the linear isomorphisms of a vector space $E$ that leave every
vector in some hyperplane fixed. These maps turn out to be the linear maps that are
represented in some suitable basis by elementary matrices of the form $E_{i,j;\beta}$ (transvections)
or $E_{i,\lambda}$ (dilatations). Furthermore, the transvections generate the group $\mathbf{SL}(E)$, and the
dilatations generate the group $\mathbf{GL}(E)$.
Let $H$ be any hyperplane in $E$, and pick some (nonzero) vector $v \in E$ such that $v \notin H$,
so that
\[ E = H \oplus Kv. \]
Assume that $f \colon E \to E$ is a linear isomorphism such that $f(u) = u$ for all $u \in H$, and that
$f$ is not the identity. We have
\[ f(v) = h + \alpha v, \]
for some $h \in H$ and some $\alpha \in K$. If $x = y + tv$ with $y \in H$ and $t \in K$,
then
\[ f(x) = f(y) + f(tv) = y + tf(v) = y + th + t\alpha v, \]
and since $\alpha x = \alpha y + t\alpha v$ and $x = y + tv$, we get
\begin{align*}
f(x) - \alpha x &= (1 - \alpha)y + th \\
f(x) - x &= t(h + (\alpha - 1)v).
\end{align*}
Observe that if $E$ is finite-dimensional, then in a basis of $E$ consisting of $v$ followed by basis
vectors of $H$, the matrix of $f$ is a lower triangular matrix whose diagonal entries are
all 1 except the first entry which is equal to $\alpha$. Therefore, $\det(f) = \alpha$.
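Case 1. $\alpha \neq 1$.
We have $f(x) = \alpha x$ iff $(1 - \alpha)y + th = 0$ iff
\[ y = \frac{t}{\alpha - 1}\, h. \]
Then, if we let $w = h + (\alpha - 1)v$, for $y = (t/(\alpha - 1))h$ we have
\[ x = y + tv = \frac{t}{\alpha - 1}\, h + tv = \frac{t}{\alpha - 1}(h + (\alpha - 1)v) = \frac{t}{\alpha - 1}\, w, \]
which shows that $f(x) = \alpha x$ iff $x \in Kw$.
Case 2. $\alpha = 1$.
In this case, $f(x) - x = th$,
that is, $f(x) - x \in Kh$ for all $x \in E$. Assume that the hyperplane $H$ is given as the kernel
of some linear form $\varphi$, and let $a = \varphi(v)$. We have $a \neq 0$, since $v \notin H$. For any $x \in E$, we
have
\[ \varphi(x - a^{-1}\varphi(x)v) = \varphi(x) - a^{-1}\varphi(x)\varphi(v) = \varphi(x) - \varphi(x) = 0, \]
which shows that $x - a^{-1}\varphi(x)v \in H$ for all $x \in E$. Since every vector in $H$ is fixed by $f$, we
get
\begin{align*}
x - a^{-1}\varphi(x)v &= f(x - a^{-1}\varphi(x)v) \\
&= f(x) - a^{-1}\varphi(x)f(v),
\end{align*}
so
\[ f(x) = x + \varphi(x)(f(a^{-1}v) - a^{-1}v). \]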
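Since $f(z) - z \in H$ for all $z \in E$ in this case, the vector $u = f(a^{-1}v) - a^{-1}v$ belongs to $H$, so that
\[ f(x) = x + \varphi(x)u, \qquad \varphi(u) = 0. \tag{$*$} \]
Therefore, we proved that every linear isomorphism of $E$ that leaves every vector in some
hyperplane $H$ fixed and has the property that $f(x) - x \in H$ for all $x \in E$ is given by a map
$\tau_{\varphi,u}$ as defined by equation $(*)$, where $\varphi$ is some nonzero linear form defining $H$ and $u$ is
some vector in $H$. We have $\tau_{\varphi,u} = \mathrm{id}$ iff $u = 0$.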
Definition 6.3. Given any hyperplane $H$ in $E$, for any nonzero linear form $\varphi \in E^*$
defining $H$ (which means that $H = \mathrm{Ker}(\varphi)$) and any nonzero vector $u \in H$, the linear map
$\tau_{\varphi,u}$ given by
\[ \tau_{\varphi,u}(x) = x + \varphi(x)u, \qquad \varphi(u) = 0, \]
for all $x \in E$ is called a transvection of hyperplane $H$ and direction $u$. The map $\tau_{\varphi,u}$ leaves
every vector in $H$ fixed, and $\tau_{\varphi,u}(x) - x \in Ku$ for all $x \in E$.
The above arguments show the following result.
Proposition 6.20. Let f : E
that f (x) = x for all x H,
vector u E such that u
/H
otherwise, f is a dilatation of
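\[
\begin{pmatrix}
\alpha & 0 & \cdots & 0 \\
0 & 1 & & 0 \\
\vdots & & \ddots & \vdots \\
0 & 0 & \cdots & 1
\end{pmatrix},
\]
which is an elementary matrix of the form $E_{1,\alpha}$. Conversely, it is clear that every elementary
matrix of the form $E_{i,\alpha}$ with $\alpha \neq 0, 1$ is a dilatation.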
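Now, assume that $f$ is a transvection of hyperplane $H$ and direction $u \in H$. Pick some
$v \notin H$, and pick some basis $(u, e_3, \ldots, e_n)$ of $H$, so that $(v, u, e_3, \ldots, e_n)$ is a basis of $E$. Since
$f(v) - v \in Ku$, the matrix of $f$ is of the form
\[
\begin{pmatrix}
1 & 0 & \cdots & 0 \\
\alpha & 1 & & 0 \\
\vdots & & \ddots & \vdots \\
0 & 0 & \cdots & 1
\end{pmatrix},
\]
which is an elementary matrix of the form $E_{2,1;\alpha}$. Conversely, it is clear that every elementary
matrix of the form $E_{i,j;\alpha}$ ($\alpha \neq 0$) is a transvection.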
The following proposition is an interesting exercise that requires good mastery of the
elementary row operations $E_{i,j;\beta}$.
Proposition 6.22. Given any invertible $n \times n$ matrix $A$, there is a matrix $S$ such that
\[ SA = \begin{pmatrix} I_{n-1} & 0 \\ 0 & \alpha \end{pmatrix} = E_{n,\alpha}, \]
with $\alpha = \det(A)$, and where $S$ is a product of elementary matrices of the form $E_{i,j;\beta}$; that
is, $S$ is a composition of transvections.
Surprisingly, every transvection is the composition of two dilatations!
Proposition 6.23. If the field $K$ is not of characteristic 2, then every transvection $f$ of
hyperplane $H$ can be written as $f = d_2 \circ d_1$, where $d_1, d_2$ are dilatations of hyperplane $H$,
where the direction of $d_1$ can be chosen arbitrarily.
Proof. Pick some dilatation $d_1$ of hyperplane $H$ and scale factor $\alpha \neq 0, 1$. Then, $d_2 = f \circ d_1^{-1}$
leaves every vector in $H$ fixed, and $\det(d_2) = \alpha^{-1} \neq 1$. By Proposition 6.21, the linear map
$d_2$ is a dilatation of hyperplane $H$, and we have $f = d_2 \circ d_1$, as claimed.
Observe that in Proposition 6.23, we can pick $\alpha = -1$; that is, every transvection of
hyperplane $H$ is the composition of two symmetries about the hyperplane $H$, one of which
can be picked arbitrarily.
Remark: Proposition 6.23 holds as long as $K \neq \{0, 1\}$.
The following important result is now obtained.
Theorem 6.24. Let $E$ be any finite-dimensional vector space over a field $K$ of characteristic
not equal to 2. Then, the group $\mathbf{SL}(E)$ is generated by the transvections, and the group
$\mathbf{GL}(E)$ is generated by the dilatations.
Proof. Consider any $f \in \mathbf{SL}(E)$, and let $A$ be its matrix in any basis. By Proposition 6.22,
there is a matrix $S$ such that
\[ SA = \begin{pmatrix} I_{n-1} & 0 \\ 0 & \alpha \end{pmatrix} = E_{n,\alpha}, \]
with $\alpha = \det(A)$, and where $S$ is a product of elementary matrices of the form $E_{i,j;\beta}$. Since
$\det(A) = 1$, we have $\alpha = 1$, and the result is proved. Otherwise, $E_{n,\alpha}$ is a dilatation, $S$ is a
product of transvections, and by Proposition 6.23, every transvection is the composition of
two dilatations, so the second result is also proved.
We conclude this section by proving that any two transvections are conjugate in $\mathbf{GL}(E)$.
Let $\tau_{\varphi,u}$ ($u \neq 0$) be a transvection and let $g \in \mathbf{GL}(E)$ be any invertible linear map. We have
\begin{align*}
(g \circ \tau_{\varphi,u} \circ g^{-1})(x) &= g(g^{-1}(x) + \varphi(g^{-1}(x))u) \\
&= x + \varphi(g^{-1}(x))g(u).
\end{align*}
Let us find the hyperplane determined by the linear form $x \mapsto \varphi(g^{-1}(x))$. This is the set of
vectors $x \in E$ such that $\varphi(g^{-1}(x)) = 0$, which holds iff $g^{-1}(x) \in H$ iff $x \in g(H)$. Therefore,
$\mathrm{Ker}(\varphi \circ g^{-1}) = g(H) = H'$, and we have $g(u) \in g(H) = H'$, so $g \circ \tau_{\varphi,u} \circ g^{-1}$ is the transvection
of hyperplane $H' = g(H)$ and direction $u' = g(u)$ (with $u' \in H'$).
Conversely, let $\tau_{\psi,u'}$ be some transvection ($u' \neq 0$) of hyperplane $H' = \mathrm{Ker}(\psi)$. Pick some vectors $v, v'$ such that
$\varphi(v) = \psi(v') = 1$, so that
\[ E = H \oplus Kv = H' \oplus Kv'. \]
There is a linear map $g \in \mathbf{GL}(E)$ such that $g(u) = u'$, $g(v) = v'$, and $g(H) = H'$. To
define $g$, pick a basis $(v, u, e_2, \ldots, e_{n-1})$ where $(u, e_2, \ldots, e_{n-1})$ is a basis of $H$ and pick a
basis $(v', u', e'_2, \ldots, e'_{n-1})$ where $(u', e'_2, \ldots, e'_{n-1})$ is a basis of $H'$; then $g$ is defined so that
$g(v) = v'$, $g(u) = u'$, and $g(e_i) = e'_i$, for $i = 2, \ldots, n - 1$. If $n = 2$, then $e_i$ and $e'_i$ are
missing. Then, we have
\[ (g \circ \tau_{\varphi,u} \circ g^{-1})(x) = x + \varphi(g^{-1}(x))u'. \]
Now, $\varphi \circ g^{-1}$ also determines the hyperplane $H' = g(H)$, so we have $\varphi \circ g^{-1} = \lambda\psi$ for some
nonzero $\lambda$ in $K$. Since $v' = g(v)$, we get
\[ \varphi(v) = \varphi \circ g^{-1}(v') = \lambda\psi(v'), \]
and since $\varphi(v) = \psi(v') = 1$, we must have $\lambda = 1$. It follows that
\[ (g \circ \tau_{\varphi,u} \circ g^{-1})(x) = x + \psi(x)u' = \tau_{\psi,u'}(x). \]
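The first half of this argument, that conjugating a transvection by $g$ gives the transvection with linear form $\varphi \circ g^{-1}$ and direction $g(u)$, can be checked on a small example. The sketch below works in dimension 3 with integer matrices; the particular $g$, $\varphi$, $u$ and the helper names `matmul` and `transvection` are our own choices for illustration.

```python
def matmul(a, b):
    """Product of two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transvection(phi, u):
    """Matrix of x -> x + phi(x) u, where phi is given by its row of coefficients."""
    assert sum(p * x for p, x in zip(phi, u)) == 0, "a transvection needs phi(u) = 0"
    n = len(u)
    return [[(1 if i == j else 0) + u[i] * phi[j] for j in range(n)]
            for i in range(n)]

# An invertible map g and its inverse (checked below rather than assumed).
g = [[1, 1, 0], [0, 1, 1], [0, 0, 1]]
g_inv = [[1, -1, 1], [0, 1, -1], [0, 0, 1]]
identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
assert matmul(g, g_inv) == identity

phi = [1, 0, 0]          # phi(x) = x_1, so H is the plane x_1 = 0
u = [0, 1, 0]            # direction in H
tau = transvection(phi, u)

conjugate = matmul(matmul(g, tau), g_inv)                 # g o tau o g^{-1}
phi_new = matmul([phi], g_inv)[0]                         # phi o g^{-1}
u_new = [row[0] for row in matmul(g, [[x] for x in u])]   # g(u)
expected = transvection(phi_new, u_new)
print(conjugate == expected)
```

The call `transvection(phi_new, u_new)` also re-checks that $g(u)$ lies in the new hyperplane, since it asserts $(\varphi \circ g^{-1})(g(u)) = 0$.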
In summary, we proved almost all parts of the following result.
Proposition 6.25. Let $E$ be any finite-dimensional vector space. For every transvection
$\tau_{\varphi,u}$ ($u \neq 0$) and every linear map $g \in \mathbf{GL}(E)$, the map $g \circ \tau_{\varphi,u} \circ g^{-1}$ is the transvection
of hyperplane $g(H)$ and direction $g(u)$ (that is, $g \circ \tau_{\varphi,u} \circ g^{-1} = \tau_{\varphi \circ g^{-1}, g(u)}$). For every other
transvection $\tau_{\psi,u'}$ ($u' \neq 0$), there is some $g \in \mathbf{GL}(E)$ such that $\tau_{\psi,u'} = g \circ \tau_{\varphi,u} \circ g^{-1}$; in other
words any two transvections ($\neq \mathrm{id}$) are conjugate in $\mathbf{GL}(E)$. Moreover, if $n \geq 3$, then the
linear isomorphism $g$ as above can be chosen so that $g \in \mathbf{SL}(E)$.
Proof. We just need to prove that if $n \geq 3$, then for any two transvections $\tau_{\varphi,u}$ and $\tau_{\psi,u'}$
($u, u' \neq 0$), there is some $g \in \mathbf{SL}(E)$ such that $\tau_{\psi,u'} = g \circ \tau_{\varphi,u} \circ g^{-1}$. As before, we pick a basis
$(v, u, e_2, \ldots, e_{n-1})$ where $(u, e_2, \ldots, e_{n-1})$ is a basis of $H$, we pick a basis $(v', u', e'_2, \ldots, e'_{n-1})$
where $(u', e'_2, \ldots, e'_{n-1})$ is a basis of $H'$, and we define $g$ as the unique linear map such that
$g(v) = v'$, $g(u) = u'$, and $g(e_i) = e'_i$, for $i = 2, \ldots, n - 1$. But, in this case, both $H$ and
$H' = g(H)$ have dimension at least 2, so in any basis of $H'$ including $u'$, there is some basis
vector $e'_2$ independent of $u'$, and we can rescale $e'_2$ in such a way that the matrix of $g$ over
the two bases has determinant $+1$.
6.7
Summary
The main concepts and results of this chapter are listed below:
One does not solve (large) linear systems by computing determinants.
Upper-triangular (lower-triangular ) matrices.
Solving by back-substitution (forward-substitution).
Gaussian elimination.
Permuting rows.
The pivot of an elimination step; pivoting.
Transposition matrix ; elementary matrix .
The Gaussian elimination theorem (Theorem 6.1).
Gauss-Jordan factorization.
LU -factorization; Necessary and sufficient condition for the existence of an
LU -factorization (Proposition 6.2).
LDU -factorization.
P A = LU theorem (Theorem 6.5).
$LDL^\top$-factorization of a symmetric matrix.
Chapter 7
Vector Norms and Matrix Norms
7.1
In order to define how close two vectors or two matrices are, and in order to define the
convergence of sequences of vectors or matrices, we can use the notion of a norm. Recall
that $\mathbb{R}_+ = \{x \in \mathbb{R} \mid x \geq 0\}$. Also recall
that
if $z = a + ib \in \mathbb{C}$ is a complex number, with
(positivity)
(triangle inequality)
\[ \|y - x\| = \|-(x - y)\| = \|x - y\|, \]
we also have
\[ \|y\| - \|x\| \leq \|x - y\|. \]
Therefore,
\[ \bigl|\, \|x\| - \|y\| \,\bigr| \leq \|x - y\|, \quad \text{for all } x, y \in E. \tag{$*$} \]
Observe that setting $\lambda = 0$ in (N2), we deduce that $\|0\| = 0$ without assuming (N1).
Then, by setting $y = 0$ in $(*)$, we obtain
\[ \bigl|\, \|x\| \,\bigr| \leq \|x\|, \quad \text{for all } x \in E. \]
Therefore, the condition $\|x\| \geq 0$ in (N1) follows from (N2) and (N3), and (N1) can be
replaced by the weaker condition
(N1') For all $x \in E$, if $\|x\| = 0$ then $x = 0$.
A function $\| \; \| \colon E \to \mathbb{R}$ satisfying axioms (N2) and (N3) is called a seminorm. From the
above discussion, a seminorm also has the properties
$\|x\| \geq 0$ for all $x \in E$, and $\|0\| = 0$.
However, there may be nonzero vectors x E such that kxk = 0. Let us give some
examples of normed vector spaces.
Example 7.1.
1. Let $E = \mathbb{R}$, and $\|x\| = |x|$, the absolute value of $x$.
2. Let $E = \mathbb{C}$, and $\|z\| = |z|$, the modulus of $z$.
3. Let $E = \mathbb{R}^n$ (or $E = \mathbb{C}^n$). There are three standard norms. For every $(x_1, \ldots, x_n) \in E$,
we have the norm $\|x\|_1$, defined such that
\[ \|x\|_1 = |x_1| + \cdots + |x_n|, \]
we have the Euclidean norm $\|x\|_2$, defined such that
\[ \|x\|_2 = \bigl(|x_1|^2 + \cdots + |x_n|^2\bigr)^{1/2} \]
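\[ \alpha\beta \leq \frac{\alpha^p}{p} + \frac{\beta^q}{q}. \tag{$*$} \]
To prove the above inequality, we use the fact that the exponential function $t \mapsto e^t$ satisfies
the following convexity inequality:
\[ e^{\theta x + (1 - \theta)y} \leq \theta e^x + (1 - \theta)e^y, \]
for all $x, y \in \mathbb{R}$ and all $\theta$ with $0 \leq \theta \leq 1$.
Since the case $\alpha\beta = 0$ is trivial, let us assume that $\alpha > 0$ and $\beta > 0$. If we replace $\theta$ by
$1/p$, $x$ by $p \log \alpha$ and $y$ by $q \log \beta$ (so that $1 - \theta = 1/q$), then we get
\[ e^{\frac{1}{p} p \log \alpha + \frac{1}{q} q \log \beta} \leq \frac{1}{p} e^{p \log \alpha} + \frac{1}{q} e^{q \log \beta}, \]
which simplifies to
\[ \alpha\beta \leq \frac{\alpha^p}{p} + \frac{\beta^q}{q}, \]
as claimed.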
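A quick numerical sanity check of the inequality just proved can be written as follows. This is only an illustrative sketch: the function name `young_gap`, the sampling ranges, and the tolerance are our own choices, and a random test is of course no substitute for the convexity argument.

```python
import random

def young_gap(alpha, beta, p):
    """alpha^p/p + beta^q/q - alpha*beta, where 1/p + 1/q = 1 and p > 1."""
    q = p / (p - 1.0)
    return alpha ** p / p + beta ** q / q - alpha * beta

random.seed(0)  # fixed seed so that the run is reproducible
gaps = [
    young_gap(random.uniform(0, 10), random.uniform(0, 10), random.uniform(1.1, 5))
    for _ in range(1000)
]
min_gap = min(gaps)
print(min_gap >= -1e-9)       # the gap is never negative, up to rounding

# Equality holds when alpha^p = beta^q, for instance alpha = beta = 2 and p = q = 2.
print(young_gap(2.0, 2.0, 2.0))
```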
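We will now prove that for any two vectors $u, v \in E$, we have
\[ \sum_{i=1}^n |u_i v_i| \leq \|u\|_p \|v\|_q. \tag{$**$} \]
Since the above is trivial if $u = 0$ or $v = 0$, let us assume that $u \neq 0$ and $v \neq 0$. Then, the
inequality $(*)$ with $\alpha = |u_i| / \|u\|_p$ and $\beta = |v_i| / \|v\|_q$ yields
\[ \frac{|u_i v_i|}{\|u\|_p \|v\|_q} \leq \frac{|u_i|^p}{p \|u\|_p^p} + \frac{|v_i|^q}{q \|v\|_q^q}, \]
for $i = 1, \ldots, n$, and by summing up these inequalities (the right-hand sides add up to $1/p + 1/q = 1$), we get
\[ \sum_{i=1}^n |u_i v_i| \leq \|u\|_p \|v\|_q, \]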
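as claimed. To finish the proof, we simply have to prove that property (N3) holds, since
(N1) and (N2) are clear. Now, for $i = 1, \ldots, n$, we can write
\[ (|u_i| + |v_i|)^p = |u_i|(|u_i| + |v_i|)^{p-1} + |v_i|(|u_i| + |v_i|)^{p-1}, \]
so that by summing up these equations we get
\[ \sum_{i=1}^n (|u_i| + |v_i|)^p = \sum_{i=1}^n |u_i|(|u_i| + |v_i|)^{p-1} + \sum_{i=1}^n |v_i|(|u_i| + |v_i|)^{p-1}, \]
and using the inequality $(**)$, we get
\[ \sum_{i=1}^n (|u_i| + |v_i|)^p \leq (\|u\|_p + \|v\|_p) \Bigl(\sum_{i=1}^n (|u_i| + |v_i|)^{(p-1)q}\Bigr)^{1/q}. \]
However, $1/p + 1/q = 1$ implies $pq = p + q$, that is, $(p - 1)q = p$, so we have
\[ \sum_{i=1}^n (|u_i| + |v_i|)^p \leq (\|u\|_p + \|v\|_p) \Bigl(\sum_{i=1}^n (|u_i| + |v_i|)^p\Bigr)^{1/q}, \]
which yields
\[ \Bigl(\sum_{i=1}^n (|u_i| + |v_i|)^p\Bigr)^{1/p} \leq \|u\|_p + \|v\|_p. \]
Since $|u_i + v_i| \leq |u_i| + |v_i|$, the above implies the triangle inequality $\|u + v\|_p \leq \|u\|_p + \|v\|_p$,
as claimed.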
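For $p > 1$ and $1/p + 1/q = 1$, the inequality
\[ \sum_{i=1}^n |u_i v_i| \leq \Bigl(\sum_{i=1}^n |u_i|^p\Bigr)^{1/p} \Bigl(\sum_{i=1}^n |v_i|^q\Bigr)^{1/q} \]
is known as Hölder's inequality. For the Hermitian inner product
\[ \langle u, v \rangle = \sum_{i=1}^n u_i \overline{v_i}, \]
we have
\[ |\langle u, v \rangle| \leq \sum_{i=1}^n |u_i \overline{v_i}| = \sum_{i=1}^n |u_i v_i|, \]
so the inequality is often stated in the form $|\langle u, v \rangle| \leq \|u\|_p \|v\|_q$,
also called Hölder's inequality, which, for $p = 2$ is the standard Cauchy–Schwarz inequality.
The triangle inequality for the $\ell^p$-norm,
\[ \Bigl(\sum_{i=1}^n |u_i + v_i|^p\Bigr)^{1/p} \leq \Bigl(\sum_{i=1}^n |u_i|^p\Bigr)^{1/p} + \Bigl(\sum_{i=1}^n |v_i|^p\Bigr)^{1/p}, \]
is known as Minkowski's inequality.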
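Both inequalities are easy to probe numerically. The following sketch computes the slack in Hölder's inequality and in the triangle inequality for a pair of complex vectors and several exponents $p$; the helper names `p_norm`, `holder_gap`, `minkowski_gap` and the sample vectors are our own, chosen only for illustration.

```python
def p_norm(x, p):
    """The l^p norm of a real or complex vector x (p >= 1)."""
    return sum(abs(t) ** p for t in x) ** (1.0 / p)

def holder_gap(u, v, p):
    """||u||_p ||v||_q - sum |u_i v_i|, with q the conjugate exponent of p."""
    q = p / (p - 1.0)
    return p_norm(u, p) * p_norm(v, q) - sum(abs(a * b) for a, b in zip(u, v))

def minkowski_gap(u, v, p):
    """||u||_p + ||v||_p - ||u + v||_p."""
    return p_norm(u, p) + p_norm(v, p) - p_norm([a + b for a, b in zip(u, v)], p)

u = [3, -4j, 1 + 2j]
v = [1, 2, -2]
gaps = [(holder_gap(u, v, p), minkowski_gap(u, v, p)) for p in (1.5, 2.0, 3.0, 7.0)]
for h, m in gaps:
    print(h >= 0, m >= 0)

# For p = 2 and proportional vectors, Cauchy-Schwarz is an equality (up to rounding).
print(abs(holder_gap([1, 2], [2, 4], 2.0)) < 1e-9)
```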
It is very useful to observe that if we represent (as usual) $u = (u_1, \ldots, u_n)$ and $v = (v_1, \ldots, v_n)$
(in $\mathbb{R}^n$) by column vectors, then their Euclidean inner product is given by
\[ \langle u, v \rangle = u^\top v = v^\top u, \]
and when $u, v \in \mathbb{C}^n$, their Hermitian inner product is given by
\[ \langle u, v \rangle = v^* u = \overline{u^* v}. \]
In particular, when $u = v$, in the complex case we get
\[ \|u\|_2^2 = u^* u, \]
and in the real case, this becomes
\[ \|u\|_2^2 = u^\top u. \]
As convenient as these notations are, we still recommend that you do not abuse them; the
notation $\langle u, v \rangle$ is more intrinsic and still works when our vector space is infinite dimensional.
The following proposition is easy to show.
Proposition 7.2. The following inequalities hold for all $x \in \mathbb{R}^n$ (or $x \in \mathbb{C}^n$):
\[ \|x\|_\infty \leq \|x\|_1 \leq n \|x\|_\infty, \]
Definition 7.2. Given any (real or complex) vector space $E$, two norms $\| \; \|_a$ and $\| \; \|_b$ are
equivalent iff there exist some positive reals $C_1, C_2 > 0$, such that
\[ \|u\|_a \leq C_1 \|u\|_b \quad \text{and} \quad \|u\|_b \leq C_2 \|u\|_a, \quad \text{for all } u \in E. \]
Given any norm $\| \; \|$ on a vector space $E$ of dimension $n$, for any basis $(e_1, \ldots, e_n)$ of $E$,
observe that for any vector $x = x_1 e_1 + \cdots + x_n e_n$, we have
\[ \|x\| = \|x_1 e_1 + \cdots + x_n e_n\| \leq |x_1| \|e_1\| + \cdots + |x_n| \|e_n\| \leq C(|x_1| + \cdots + |x_n|) = C \|x\|_1, \]
with $C = \max_{1 \leq i \leq n} \|e_i\|$ and
\[ \|x\|_1 = \|x_1 e_1 + \cdots + x_n e_n\|_1 = |x_1| + \cdots + |x_n|. \]
The above implies that
\[ \bigl|\, \|u\| - \|v\| \,\bigr| \leq \|u - v\| \leq C \|u - v\|_1, \]
which means that the map $u \mapsto \|u\|$ is continuous with respect to the norm $\| \; \|_1$.
Let $S_1^{n-1}$ be the unit sphere with respect to the norm $\| \; \|_1$, namely
\[ S_1^{n-1} = \{x \in E \mid \|x\|_1 = 1\}. \]
Now, $S_1^{n-1}$ is a closed and bounded subset of a finite-dimensional vector space, so by
Heine–Borel (or equivalently, by Bolzano–Weierstrass), $S_1^{n-1}$ is compact. On the other hand, it
is a well known result of analysis that any continuous real-valued function on a nonempty
compact set has a minimum and a maximum, and that they are achieved. Using these facts,
we can prove the following important theorem:
Theorem 7.3. If $E$ is any real or complex vector space of finite dimension, then any two
norms on $E$ are equivalent.
Proof. It is enough to prove that any norm $\| \; \|$ is equivalent to the 1-norm. We already proved
that the function $x \mapsto \|x\|$ is continuous with respect to the norm $\| \; \|_1$ and we observed that
the unit sphere $S_1^{n-1}$ is compact. Now, we just recalled that because the function $f \colon x \mapsto \|x\|$
is continuous and because $S_1^{n-1}$ is compact, the function $f$ has a minimum $m$ and a maximum
$M$, and because $\|x\|$ is never zero on $S_1^{n-1}$, we must have $m > 0$. Consequently, we just
proved that if $\|x\|_1 = 1$, then
\[ 0 < m \leq \|x\| \leq M, \]
so for any $x \in E$ with $x \neq 0$, we get
\[ m \leq \|x / \|x\|_1\| \leq M, \]
which implies
\[ m \|x\|_1 \leq \|x\| \leq M \|x\|_1. \]
Since the above inequality holds trivially if $x = 0$, we just proved that $\| \; \|$ and $\| \; \|_1$ are
equivalent, as claimed.
Next, we will consider norms on matrices.
7.2
Matrix Norms
For simplicity of exposition, we will consider the vector spaces $M_n(\mathbb{R})$ and $M_n(\mathbb{C})$ of square
$n \times n$ matrices. Most results also hold for the spaces $M_{m,n}(\mathbb{R})$ and $M_{m,n}(\mathbb{C})$ of rectangular
$m \times n$ matrices. Since $n \times n$ matrices can be multiplied, the idea behind matrix norms is
that they should behave well with respect to matrix multiplication.
Definition 7.3. A matrix norm $\| \; \|$ on the space of square $n \times n$ matrices in $M_n(K)$, with
$K = \mathbb{R}$ or $K = \mathbb{C}$, is a norm on the vector space $M_n(K)$, with the additional property called
submultiplicativity that
\[ \|AB\| \leq \|A\| \|B\|, \]
for all $A, B \in M_n(K)$. A norm on matrices satisfying the above property is often called a
submultiplicative matrix norm.
Since $I^2 = I$, from $\|I\| = \|I^2\| \leq \|I\|^2$, we get $\|I\| \geq 1$, for every matrix norm.
Before giving examples of matrix norms, we need to review some basic definitions about
matrices. Given any matrix $A = (a_{ij}) \in M_{m,n}(\mathbb{C})$, the conjugate $\overline{A}$ of $A$ is the matrix such
that
\[ \overline{A}_{ij} = \overline{a_{ij}}, \quad 1 \leq i \leq m, \ 1 \leq j \leq n. \]
\[ 1 \leq i \leq m, \ 1 \leq j \leq n. \]
\[ AA^* = A^*A, \]
has the complex eigenvalues $i$ and $-i$, but no real eigenvalues. Thus, typically, even for real
matrices, we consider complex eigenvalues.
iff
iff
iff
iff
and $U \neq 0$, we have $\|U\| \neq 0$, and get
\[ \rho(A) = |\lambda| \leq \|A\|, \]
as claimed.
Proposition 7.4 also holds for any real matrix norm $\| \; \|$ on $M_n(\mathbb{R})$ but the proof is more
subtle and requires the notion of induced norm. We prove it after giving Definition 7.7.
Now, it turns out that if $A$ is a real $n \times n$ symmetric matrix, then the eigenvalues of $A$
are all real and there is some orthogonal matrix $Q$ such that
\[ A = Q \, \mathrm{diag}(\lambda_1, \ldots, \lambda_n) \, Q^\top, \]
where $\mathrm{diag}(\lambda_1, \ldots, \lambda_n)$ denotes the matrix whose only nonzero entries (if any) are its diagonal
entries, which are the (real) eigenvalues of $A$. Similarly, if $A$ is a complex $n \times n$ Hermitian
matrix, then the eigenvalues of $A$ are all real and there is some unitary matrix $U$ such that
\[ A = U \, \mathrm{diag}(\lambda_1, \ldots, \lambda_n) \, U^*, \]
where $\mathrm{diag}(\lambda_1, \ldots, \lambda_n)$ denotes the matrix whose only nonzero entries (if any) are its diagonal
entries, which are the (real) eigenvalues of $A$.
We now return to matrix norms. We begin with the so-called Frobenius norm, which is
just the norm $\| \; \|_2$ on $\mathbb{C}^{n^2}$, where the $n \times n$ matrix $A$ is viewed as the vector obtained by
concatenating together the rows (or the columns) of $A$. The reader should check that for
any $n \times n$ complex matrix $A = (a_{ij})$,
\[ \Bigl(\sum_{i,j=1}^n |a_{ij}|^2\Bigr)^{1/2} = \sqrt{\mathrm{tr}(A^*A)} = \sqrt{\mathrm{tr}(AA^*)}. \]
Definition 7.6. The Frobenius norm $\| \; \|_F$ is defined so that for every square $n \times n$ matrix
$A \in M_n(\mathbb{C})$,
\[ \|A\|_F = \Bigl(\sum_{i,j=1}^n |a_{ij}|^2\Bigr)^{1/2} = \sqrt{\mathrm{tr}(AA^*)} = \sqrt{\mathrm{tr}(A^*A)}. \]
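The identity behind Definition 7.6 is easy to check on a concrete matrix. The sketch below computes the Frobenius norm entrywise and via $\mathrm{tr}(A^*A)$, and also tests submultiplicativity on one product; the sample matrices and helper names are our own and serve only as an illustration.

```python
import math

def conj_transpose(a):
    """The adjoint A^* of a matrix given as a list of rows."""
    return [[a[i][j].conjugate() for i in range(len(a))] for j in range(len(a[0]))]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def trace(a):
    return sum(a[i][i] for i in range(len(a)))

def frobenius(a):
    """Square root of the sum of the squared moduli of the entries."""
    return math.sqrt(sum(abs(x) ** 2 for row in a for x in row))

A = [[1 + 2j, 3], [0, -1j]]
B = [[2, 1j], [1, 1]]

via_entries = frobenius(A) ** 2
via_trace = trace(matmul(conj_transpose(A), A)).real   # tr(A^* A) is real
print(via_entries, via_trace)

# Submultiplicativity on this pair: ||AB||_F <= ||A||_F ||B||_F.
print(frobenius(matmul(A, B)) <= frobenius(A) * frobenius(B))
```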
The following proposition shows that the Frobenius norm is a matrix norm satisfying other
nice properties.
Proposition 7.5. The Frobenius norm $\| \; \|_F$ on $M_n(\mathbb{C})$ satisfies the following properties:
(1) It is a matrix norm; that is, $\|AB\|_F \leq \|A\|_F \|B\|_F$, for all $A, B \in M_n(\mathbb{C})$.
(2) It is unitarily invariant, which means that for all unitary matrices $U, V$, we have
\[ \|A\|_F = \|UA\|_F = \|AV\|_F = \|UAV\|_F. \]
(3) $\sqrt{\rho(A^*A)} \leq \|A\|_F \leq \sqrt{n} \sqrt{\rho(A^*A)}$, for all $A \in M_n(\mathbb{C})$.
Proof. (1) The only property that requires a proof is the fact $\|AB\|_F \leq \|A\|_F \|B\|_F$. This
follows from the Cauchy–Schwarz inequality:
\begin{align*}
\|AB\|_F^2 &= \sum_{i,j=1}^n \Bigl|\sum_{k=1}^n a_{ik} b_{kj}\Bigr|^2 \\
&\leq \sum_{i,j=1}^n \Bigl(\sum_{h=1}^n |a_{ih}|^2\Bigr) \Bigl(\sum_{k=1}^n |b_{kj}|^2\Bigr) \\
&= \Bigl(\sum_{i,h=1}^n |a_{ih}|^2\Bigr) \Bigl(\sum_{k,j=1}^n |b_{kj}|^2\Bigr) = \|A\|_F^2 \|B\|_F^2.
\end{align*}
(2) We have
\[ \|A\|_F^2 = \mathrm{tr}(A^*A) = \mathrm{tr}(VV^*A^*A) = \mathrm{tr}(V^*A^*AV) = \|AV\|_F^2, \]
and
\[ \|A\|_F^2 = \mathrm{tr}(A^*A) = \mathrm{tr}(A^*U^*UA) = \|UA\|_F^2. \]
The identity
\[ \|A\|_F = \|UAV\|_F \]
follows from the previous two.
(3) It is well known that the trace of a matrix is equal to the sum of its eigenvalues.
Furthermore, $A^*A$ is Hermitian positive semidefinite (which means that its eigenvalues are
nonnegative), so $\rho(A^*A)$ is the largest eigenvalue of $A^*A$ and
\[ \rho(A^*A) \leq \mathrm{tr}(A^*A) \leq n \rho(A^*A), \]
which yields (3) by taking square roots.
Remark: The Frobenius norm is also known as the Hilbert-Schmidt norm or the Schur
norm. So many famous names associated with such a simple thing!
We now give another method for obtaining matrix norms using subordinate norms. First,
we need a proposition that shows that in a finite-dimensional space, the linear map induced
by a matrix is bounded, and thus continuous.
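Proposition 7.6. For every norm $\| \; \|$ on $\mathbb{C}^n$ (or $\mathbb{R}^n$), for every matrix $A \in M_n(\mathbb{C})$ (or
$A \in M_n(\mathbb{R})$), there is a real constant $C_A > 0$, such that
\[ \|Au\| \leq C_A \|u\|, \]
for every vector $u \in \mathbb{C}^n$ (or $u \in \mathbb{R}^n$ if $A$ is real).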
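\[ \sup_{\substack{x \in \mathbb{C}^n \\ x \neq 0}} \frac{\|Ax\|}{\|x\|} \leq C_A. \]
\[ \sup_{\substack{x \in \mathbb{C}^n \\ x \neq 0}} \frac{\|Ax\|}{\|x\|} = \sup_{\substack{x \in \mathbb{C}^n \\ \|x\| = 1}} \|Ax\|. \]
Similarly
\[ \sup_{\substack{x \in \mathbb{R}^n \\ x \neq 0}} \frac{\|Ax\|}{\|x\|} = \sup_{\substack{x \in \mathbb{R}^n \\ \|x\| = 1}} \|Ax\|. \]
\[ \|A\| = \sup_{\substack{x \in \mathbb{C}^n \\ x \neq 0}} \frac{\|Ax\|}{\|x\|} = \sup_{\substack{x \in \mathbb{C}^n \\ \|x\| = 1}} \|Ax\|. \]
The function $A \mapsto \|A\|$ is called the subordinate matrix norm or operator norm induced
by the norm $\| \; \|$.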
It is easy to check that the function $A \mapsto \|A\|$ is indeed a norm, and by definition, it
satisfies the property
\[ \|Ax\| \leq \|A\| \|x\|, \quad \text{for all } x \in \mathbb{C}^n. \]
A norm $\| \; \|$ on $M_n(\mathbb{C})$ satisfying the above property is said to be subordinate to the vector
norm $\| \; \|$ on $\mathbb{C}^n$. As a consequence of the above inequality, we have
\[ \|ABx\| \leq \|A\| \|Bx\| \leq \|A\| \|B\| \|x\|, \]
for all $x \in \mathbb{C}^n$, which implies that
\[ \|AB\| \leq \|A\| \|B\| \]
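\[ \|A\|_{\mathbb{R}} = \sup_{\substack{x \in \mathbb{R}^n \\ x \neq 0}} \frac{\|Ax\|}{\|x\|} = \sup_{\substack{x \in \mathbb{R}^n \\ \|x\| = 1}} \|Ax\|. \]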
for all real matrices $A \in M_n(\mathbb{R})$. However, it is possible to construct vector norms $\| \; \|$ on $\mathbb{C}^n$
and real matrices $A$ such that
\[ \|A\|_{\mathbb{R}} < \|A\|. \]
In order to avoid this kind of difficulty, we define subordinate matrix norms over $M_n(\mathbb{C})$.
Luckily, it turns out that $\|A\|_{\mathbb{R}} = \|A\|$ for the vector norms $\| \; \|_1$, $\| \; \|_2$, and $\| \; \|_\infty$.
We now prove Proposition 7.4 for real matrix norms.
Proposition 7.7. For any matrix norm $\| \; \|$ on $M_n(\mathbb{R})$ and for any square $n \times n$ matrix
$A \in M_n(\mathbb{R})$, we have
\[ \rho(A) \leq \|A\|. \]
Proof. We follow the proof in Denis Serre's book [96]. If $A$ is a real matrix, the problem is
that the eigenvectors associated with the eigenvalue of maximum modulus may be complex.
We use a trick based on the fact that for every matrix $A$ (real or complex),
\[ \rho(A^k) = (\rho(A))^k, \]
which is left as an exercise (use Proposition 8.5 which shows that if $(\lambda_1, \ldots, \lambda_n)$ are the (not
necessarily distinct) eigenvalues of $A$, then $(\lambda_1^k, \ldots, \lambda_n^k)$ are the eigenvalues of $A^k$, for $k \geq 1$).
Pick any complex norm $\| \; \|_c$ on $\mathbb{C}^n$ and let $\| \; \|_c$ denote the corresponding induced norm
on matrices. The restriction of $\| \; \|_c$ to real matrices is a real norm that we also denote by
$\| \; \|_c$. Now, by Theorem 7.3, since $M_n(\mathbb{R})$ has finite dimension $n^2$, there is some constant
$C > 0$ so that
\[ \|A\|_c \leq C \|A\|, \quad \text{for all } A \in M_n(\mathbb{R}). \]
Furthermore, for every $k \geq 1$ and for every real $n \times n$ matrix $A$, by Proposition 7.4, $\rho(A^k) \leq \|A^k\|_c$,
and because $\| \; \|$ is a matrix norm, $\|A^k\| \leq \|A\|^k$, so we have
\[ (\rho(A))^k = \rho(A^k) \leq \|A^k\|_c \leq C \|A^k\| \leq C \|A\|^k, \]
for all $k \geq 1$. It follows that
\[ \rho(A) \leq C^{1/k} \|A\|, \quad \text{for all } k \geq 1. \]
However, because $C > 0$, we have $\lim_{k \to \infty} C^{1/k} = 1$ (we have $\lim_{k \to \infty} \frac{1}{k} \log(C) = 0$). Therefore, we conclude that
\[ \rho(A) \leq \|A\|, \]
as desired.
We now determine explicitly what are the subordinate matrix norms associated with the
vector norms $\| \; \|_1$, $\| \; \|_2$, and $\| \; \|_\infty$.
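\begin{align*}
\|A\|_1 &= \sup_{\substack{x \in \mathbb{C}^n \\ \|x\|_1 = 1}} \|Ax\|_1 = \max_j \sum_{i=1}^n |a_{ij}| \\
\|A\|_\infty &= \sup_{\substack{x \in \mathbb{C}^n \\ \|x\|_\infty = 1}} \|Ax\|_\infty = \max_i \sum_{j=1}^n |a_{ij}| \\
\|A\|_2 &= \sup_{\substack{x \in \mathbb{C}^n \\ \|x\|_2 = 1}} \|Ax\|_2 = \sqrt{\rho(A^*A)} = \sqrt{\rho(AA^*)}.
\end{align*}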
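\[ \sum_{i=1}^n |a_{ij}|. \]
It remains to show that equality can be achieved. For this let $j_0$ be some index such that
\[ \max_j \sum_i |a_{ij}| = \sum_i |a_{ij_0}|, \]
\[ \|Au\|_\infty = \max_i \Bigl|\sum_j a_{ij} u_j\Bigr| \leq \max_i \sum_j |a_{ij}| \, \|u\|_\infty, \]
\[ \sum_{j=1}^n |a_{ij}|. \]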
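The closed forms for the norms induced by $\| \; \|_1$ and $\| \; \|_\infty$ (maximum absolute column sum and maximum absolute row sum) and the vectors at which the supremum is attained can be illustrated as follows. This is a sketch for a real matrix only; the sample matrix and the helper names are our own. For complex entries the sign vector used for the $\infty$-norm would have to be replaced by unit-modulus phases.

```python
def norm_1(a):
    """Operator norm induced by the 1-norm: largest absolute column sum."""
    return max(sum(abs(a[i][j]) for i in range(len(a))) for j in range(len(a[0])))

def norm_inf(a):
    """Operator norm induced by the sup norm: largest absolute row sum."""
    return max(sum(abs(x) for x in row) for row in a)

def apply(a, x):
    """Matrix-vector product."""
    return [sum(row[j] * x[j] for j in range(len(x))) for row in a]

A = [[1, -2, 3], [-4, 5, -6], [7, -8, 9]]
n = len(A)

# 1-norm: equality is reached at the basis vector e_{j0} of a heaviest column.
j0 = max(range(n), key=lambda j: sum(abs(A[i][j]) for i in range(n)))
e_j0 = [1 if j == j0 else 0 for j in range(n)]
print(norm_1(A), sum(abs(t) for t in apply(A, e_j0)))

# sup norm: equality is reached at the sign vector of a heaviest row.
i0 = max(range(n), key=lambda i: sum(abs(x) for x in A[i]))
u = [1 if x >= 0 else -1 for x in A[i0]]
print(norm_inf(A), max(abs(t) for t in apply(A, u)))
```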
\[ \|A\|_2^2 = \sup_{\substack{x \in \mathbb{C}^n \\ x^*x = 1}} x^*A^*Ax. \]
Since the matrix $A^*A$ is Hermitian, it has real eigenvalues and it can be diagonalized with
respect to a unitary matrix. These facts can be used to prove that the function $x \mapsto
x^*A^*Ax$ has a maximum on the sphere $x^*x = 1$ equal to the largest eigenvalue of $A^*A$,
namely, $\rho(A^*A)$. We postpone the proof until we discuss optimizing quadratic functions.
Therefore,
\[ \|A\|_2 = \sqrt{\rho(A^*A)}. \]
Let us now prove that $\rho(A^*A) = \rho(AA^*)$. First, assume that $\rho(A^*A) > 0$. In this case,
there is some eigenvector $u$ ($\neq 0$) such that
\[ A^*Au = \rho(A^*A)u, \]
and since $\rho(A^*A) > 0$, we must have $Au \neq 0$. Since $Au \neq 0$,
\[ AA^*(Au) = \rho(A^*A)Au \]
which means that $\rho(A^*A)$ is an eigenvalue of $AA^*$, and thus
\[ \rho(A^*A) \leq \rho(AA^*). \]
Because $(A^*)^* = A$, by replacing $A$ by $A^*$, we get
\[ \rho(AA^*) \leq \rho(A^*A), \]
and so $\rho(A^*A) = \rho(AA^*)$.
\[ \|A\|_2^2 = \rho(A^*A) = \rho(A^*U^*UA) = \|UA\|_2^2. \]
Finally, if $A$ is a normal matrix ($AA^* = A^*A$), it can be shown that there is some unitary
matrix $U$ so that
\[ A = UDU^*, \]
where $D = \mathrm{diag}(\lambda_1, \ldots, \lambda_n)$ is a diagonal matrix consisting of the eigenvalues of $A$, and thus
\[ A^*A = (UDU^*)^* UDU^* = UD^*U^*UDU^* = UD^*DU^*. \]
However, $D^*D = \mathrm{diag}(|\lambda_1|^2, \ldots, |\lambda_n|^2)$, which proves that
\[ \rho(A^*A) = \rho(D^*D) = \max_i |\lambda_i|^2 = (\rho(A))^2, \]
so that $\|A\|_2 = \rho(A)$. By Proposition 7.5 (3), we also have $\|A\|_2 = \sqrt{\rho(A^*A)} \leq \|A\|_F$,
which shows that the Frobenius norm is an upper bound on the spectral norm. The Frobenius
norm is much easier to compute than the spectral norm.
The reader will check that the above proof still holds if the matrix $A$ is real, confirming
the fact that $\|A\|_{\mathbb{R}} = \|A\|$ for the vector norms $\| \; \|_1$, $\| \; \|_2$, and $\| \; \|_\infty$. It is also easy to verify
that the proof goes through for rectangular matrices, with the same formulae. Similarly,
the Frobenius norm is also a norm on rectangular matrices. For these norms, whenever $AB$
makes sense, we have
\[ \|AB\| \leq \|A\| \|B\|. \]
Remark: Let $(E, \| \; \|)$ and $(F, \| \; \|)$ be two normed vector spaces (for simplicity of notation,
we use the same symbol $\| \; \|$ for the norms on $E$ and $F$; this should not cause any confusion).
Recall that a function $f \colon E \to F$ is continuous if for every $a \in E$, for every $\epsilon > 0$, there is
some $\eta > 0$ such that for all $x \in E$,
\[ \text{if} \quad \|x - a\| \leq \eta \quad \text{then} \quad \|f(x) - f(a)\| \leq \epsilon. \]
It is not hard to show that a linear map $f \colon E \to F$ is continuous iff there is some constant
$C > 0$ such that
\[ \|f(x)\| \leq C \|x\| \quad \text{for all } x \in E. \]
If so, we say that $f$ is bounded (or a linear bounded operator). We let $\mathcal{L}(E; F)$ denote the
set of all continuous (equivalently, bounded) linear maps from $E$ to $F$. Then, we can define
the operator norm (or subordinate norm) $\| \; \|$ on $\mathcal{L}(E; F)$ as follows: for every $f \in \mathcal{L}(E; F)$,
\[ \|f\| = \sup_{\substack{x \in E \\ x \neq 0}} \frac{\|f(x)\|}{\|x\|} = \sup_{\substack{x \in E \\ \|x\| = 1}} \|f(x)\|, \]
or equivalently by
\[ \|f\| = \inf\{\lambda \in \mathbb{R} \mid \|f(x)\| \leq \lambda \|x\|, \text{ for all } x \in E\}. \]
It is not hard to show that the map $f \mapsto \|f\|$ is a norm on $\mathcal{L}(E; F)$ satisfying the property
\[ \|f(x)\| \leq \|f\| \|x\| \]
for all $x \in E$, and that if $f \in \mathcal{L}(E; F)$ and $g \in \mathcal{L}(F; G)$, then
\[ \|g \circ f\| \leq \|g\| \|f\|. \]
Operator norms play an important role in functional analysis, especially when the spaces $E$
and $F$ are complete.
The following proposition will be needed when we deal with the condition number of a
matrix.
Proposition 7.9. Let $\| \; \|$ be any matrix norm and let $B$ be a matrix such that $\|B\| < 1$.
(1) If $\| \; \|$ is a subordinate matrix norm, then the matrix $I + B$ is invertible and
\[ \|(I + B)^{-1}\| \leq \frac{1}{1 - \|B\|}. \]
(2) If a matrix of the form $I + B$ is singular, then $\|B\| \geq 1$ for every matrix norm (not
necessarily subordinate).
Proof. (1) Observe that $(I + B)u = 0$ implies $Bu = -u$, so
\[ \|u\| = \|Bu\|. \]
Recall that
\[ \|Bu\| \leq \|B\| \|u\| \]
for every subordinate norm. Since $\|B\| < 1$, if $u \neq 0$, then
\[ \|Bu\| < \|u\|, \]
which contradicts $\|u\| = \|Bu\|$. Therefore, we must have $u = 0$, which proves that $I + B$ is
injective, and thus bijective, i.e., invertible. Then, we have
\[ (I + B)^{-1} + B(I + B)^{-1} = (I + B)(I + B)^{-1} = I, \]
so we get
\[ (I + B)^{-1} = I - B(I + B)^{-1}, \]
which yields (using $\|I\| = 1$ for a subordinate norm)
\[ \|(I + B)^{-1}\| \leq 1 + \|B\| \, \|(I + B)^{-1}\|, \]
and finally,
\[ \|(I + B)^{-1}\| \leq \frac{1}{1 - \|B\|}. \]
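The bound in part (1) can be checked exactly on a small example. The sketch below uses the operator norm induced by $\| \; \|_\infty$ (maximum absolute row sum) and rational arithmetic; the matrix $B$ and the helper names are our own choices, and a single example only illustrates the statement.

```python
from fractions import Fraction as F

def norm_inf(m):
    """Operator norm induced by the sup norm: largest absolute row sum."""
    return max(sum(abs(x) for x in row) for row in m)

def inverse_2x2(m):
    """Inverse of an invertible 2x2 matrix via the adjugate formula."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

B = [[F(1, 5), F(-3, 10)], [F(1, 10), F(2, 5)]]         # ||B|| = 1/2 < 1
I_plus_B = [[1 + B[0][0], B[0][1]], [B[1][0], 1 + B[1][1]]]

inv_norm = norm_inf(inverse_2x2(I_plus_B))
bound = 1 / (1 - norm_inf(B))
print(inv_norm, bound, inv_norm <= bound)
```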
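\[
T = \begin{pmatrix}
\lambda_1 & t_{12} & t_{13} & \cdots & t_{1n} \\
0 & \lambda_2 & t_{23} & \cdots & t_{2n} \\
\vdots & \vdots & \ddots & \ddots & \vdots \\
0 & 0 & \cdots & \lambda_{n-1} & t_{n-1\,n} \\
0 & 0 & \cdots & 0 & \lambda_n
\end{pmatrix},
\]
where $\lambda_1, \ldots, \lambda_n$ are the eigenvalues of $A$. For every $\delta \neq 0$, define the diagonal matrix
\[ D_\delta = \mathrm{diag}(1, \delta, \delta^2, \ldots, \delta^{n-1}), \]
and consider the matrix
\[
(UD_\delta)^{-1} A (UD_\delta) = D_\delta^{-1} T D_\delta =
\begin{pmatrix}
\lambda_1 & \delta t_{12} & \delta^2 t_{13} & \cdots & \delta^{n-1} t_{1n} \\
0 & \lambda_2 & \delta t_{23} & \cdots & \delta^{n-2} t_{2n} \\
\vdots & \vdots & \ddots & \ddots & \vdots \\
0 & 0 & \cdots & \lambda_{n-1} & \delta t_{n-1\,n} \\
0 & 0 & \cdots & 0 & \lambda_n
\end{pmatrix}.
\]
Now, define the function $\| \; \| \colon M_n(\mathbb{C}) \to \mathbb{R}$ by
\[ \|B\| = \|(UD_\delta)^{-1} B (UD_\delta)\|_\infty, \]
for every $B \in M_n(\mathbb{C})$. Then it is easy to verify that the above function is the matrix norm
subordinate to the vector norm
\[ v \mapsto \|(UD_\delta)^{-1} v\|_\infty. \]
Furthermore, for every $\epsilon > 0$, we can pick $\delta$ so that
\[ \sum_{j=i+1}^n |\delta^{j-i} t_{ij}| \leq \epsilon, \quad 1 \leq i \leq n - 1, \]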
7.3
Unfortunately, there exist linear systems $Ax = b$ whose solutions are not stable under small
perturbations of either $b$ or $A$. For example, consider the system
\[
\begin{pmatrix} 10 & 7 & 8 & 7 \\ 7 & 5 & 6 & 5 \\ 8 & 6 & 10 & 9 \\ 7 & 5 & 9 & 10 \end{pmatrix}
\begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{pmatrix} =
\begin{pmatrix} 32 \\ 23 \\ 33 \\ 31 \end{pmatrix}.
\]
The reader should check that it has the solution $x = (1, 1, 1, 1)$. If we perturb slightly the
right-hand side, obtaining the new system
\[
\begin{pmatrix} 10 & 7 & 8 & 7 \\ 7 & 5 & 6 & 5 \\ 8 & 6 & 10 & 9 \\ 7 & 5 & 9 & 10 \end{pmatrix}
\begin{pmatrix} x_1 + \Delta x_1 \\ x_2 + \Delta x_2 \\ x_3 + \Delta x_3 \\ x_4 + \Delta x_4 \end{pmatrix} =
\begin{pmatrix} 32.1 \\ 22.9 \\ 33.1 \\ 30.9 \end{pmatrix},
\]
the new solution turns out to be $x + \Delta x = (9.2, -12.6, 4.5, -1.1)$. In other words, a relative error
of the order 1/200 in the data (here, $b$) produces a relative error of the order 10/1 in the
solution, which represents an amplification of the relative error of the order 2000.
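Both solutions quoted above can be reproduced exactly. The sketch below solves the original and the perturbed system by Gauss-Jordan elimination over the rationals, so that no rounding enters the comparison; the function name `solve` is our own, and the routine assumes an invertible matrix.

```python
from fractions import Fraction

def solve(a, b):
    """Solve a x = b exactly by Gauss-Jordan elimination (a assumed invertible)."""
    n = len(a)
    # Augmented matrix [a | b] with exact rational entries.
    m = [[Fraction(x) for x in row] + [Fraction(y)] for row, y in zip(a, b)]
    for c in range(n):
        p = next(i for i in range(c, n) if m[i][c] != 0)   # find a nonzero pivot
        m[c], m[p] = m[p], m[c]
        m[c] = [x / m[c][c] for x in m[c]]
        for i in range(n):
            if i != c:
                f = m[i][c]
                m[i] = [x - f * y for x, y in zip(m[i], m[c])]
    return [row[n] for row in m]

A = [[10, 7, 8, 7], [7, 5, 6, 5], [8, 6, 10, 9], [7, 5, 9, 10]]
b = [32, 23, 33, 31]
b_perturbed = [Fraction("32.1"), Fraction("22.9"), Fraction("33.1"), Fraction("30.9")]

x = solve(A, b)
x_perturbed = solve(A, b_perturbed)
print([float(t) for t in x])
print([float(t) for t in x_perturbed])
```

A change of 0.1 in each component of the right-hand side moves the first component of the solution from 1 to 9.2, which is the instability described in the text.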
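If we instead perturb the matrix slightly, we obtain the system
\[
\begin{pmatrix} 10 & 7 & 8.1 & 7.2 \\ 7.08 & 5.04 & 6 & 5 \\ 8 & 5.98 & 9.98 & 9 \\ 6.99 & 4.99 & 9 & 9.98 \end{pmatrix}
\begin{pmatrix} x_1 + \Delta x_1 \\ x_2 + \Delta x_2 \\ x_3 + \Delta x_3 \\ x_4 + \Delta x_4 \end{pmatrix} =
\begin{pmatrix} 32 \\ 23 \\ 33 \\ 31 \end{pmatrix}.
\]
This time, the solution is $x + \Delta x = (-81, 137, -34, 22)$. Again, a small change in the data alters
the result rather drastically. Yet, the original system is symmetric, has determinant 1, and
has integer entries. The problem is that the matrix of the system is badly conditioned, a
concept that we will now explain.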
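Given an invertible matrix $A$, first, assume that we perturb $b$ to $b + \delta b$, and let us analyze
the change between the two exact solutions $x$ and $x + \delta x$ of the two systems
\begin{align*}
Ax &= b \\
A(x + \delta x) &= b + \delta b.
\end{align*}
We also assume that we have some norm $\| \; \|$ and we use the subordinate matrix norm on
matrices. From
\begin{align*}
Ax &= b \\
Ax + A\delta x &= b + \delta b,
\end{align*}
we get
\[ \delta x = A^{-1} \delta b, \]
and we conclude that
\begin{align*}
\|\delta x\| &\leq \|A^{-1}\| \|\delta b\| \\
\|b\| &\leq \|A\| \|x\|.
\end{align*}
Consequently, the relative error in the result $\|\delta x\| / \|x\|$ is bounded in terms of the relative
error $\|\delta b\| / \|b\|$ in the data as follows:
\[ \frac{\|\delta x\|}{\|x\|} \leq \bigl(\|A\| \|A^{-1}\|\bigr) \frac{\|\delta b\|}{\|b\|}. \]
Now let us assume that $A$ is perturbed to $A + \Delta A$, and let us analyze the change between
the exact solutions of the two systems
\begin{align*}
Ax &= b \\
(A + \Delta A)(x + \Delta x) &= b.
\end{align*}
The second equation yields $Ax + A\Delta x + \Delta A(x + \Delta x) = b$, and by subtracting the first
equation we get
\[ \Delta x = -A^{-1} \Delta A (x + \Delta x). \]
It follows that
\[ \|\Delta x\| \leq \|A^{-1}\| \|\Delta A\| \|x + \Delta x\|, \]
which can be rewritten as
\[ \frac{\|\Delta x\|}{\|x + \Delta x\|} \leq \bigl(\|A\| \|A^{-1}\|\bigr) \frac{\|\Delta A\|}{\|A\|}. \]
Observe that the above reasoning is valid even if the matrix $A + \Delta A$ is singular, as long
as $x + \Delta x$ is a solution of the second system. Furthermore, if $\|\Delta A\|$ is small enough, it is
not unreasonable to expect that the ratio $\|\Delta x\| / \|x + \Delta x\|$ is close to $\|\Delta x\| / \|x\|$. This will
be made more precise later.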
In summary, for each of the two perturbations, we see that the relative error in the result
is bounded by the relative error in the data, multiplied by the number $\|A\| \|A^{-1}\|$. In fact, this
factor turns out to be optimal and this suggests the following definition:
Definition 7.9. For any subordinate matrix norm $\| \; \|$, for any invertible matrix $A$, the
number
\[ \mathrm{cond}(A) = \|A\| \|A^{-1}\| \]
is called the condition number of $A$ relative to $\| \; \|$.
The condition number $\mathrm{cond}(A)$ measures the sensitivity of the linear system $Ax = b$ to
variations in the data $b$ and $A$; a feature referred to as the condition of the system. Thus,
when we say that a linear system is ill-conditioned, we mean that the condition number of
its matrix is large. We can sharpen the preceding analysis as follows:
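To make the definition concrete, the sketch below computes the condition number of the $4 \times 4$ matrix of the earlier example relative to the norm induced by $\| \; \|_\infty$ (maximum absolute row sum). The choice of this norm, and the helper names `inverse` and `norm_inf`, are our own; the text itself goes on to study the spectral norm.

```python
from fractions import Fraction

def inverse(a):
    """Exact inverse by Gauss-Jordan elimination on [a | I] (a assumed invertible)."""
    n = len(a)
    m = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(a)]
    for c in range(n):
        p = next(i for i in range(c, n) if m[i][c] != 0)   # find a nonzero pivot
        m[c], m[p] = m[p], m[c]
        m[c] = [x / m[c][c] for x in m[c]]
        for i in range(n):
            if i != c:
                f = m[i][c]
                m[i] = [x - f * y for x, y in zip(m[i], m[c])]
    return [row[n:] for row in m]

def norm_inf(a):
    """Operator norm induced by the sup norm: largest absolute row sum."""
    return max(sum(abs(x) for x in row) for row in a)

A = [[10, 7, 8, 7], [7, 5, 6, 5], [8, 6, 10, 9], [7, 5, 9, 10]]
A_inv = inverse(A)
cond_inf = norm_inf(A) * norm_inf(A_inv)
print(norm_inf(A), norm_inf(A_inv), cond_inf)   # 33 136 4488
```

A condition number in the thousands is consistent with the amplification of the relative error observed in the numerical example.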
Proposition 7.11. Let $A$ be an invertible matrix and let $x$ and $x + \delta x$ be the solutions of
the linear systems
\begin{align*}
Ax &= b \\
A(x + \delta x) &= b + \delta b.
\end{align*}
If $b \neq 0$, then the inequality
\[ \frac{\|\delta x\|}{\|x\|} \leq \mathrm{cond}(A) \frac{\|\delta b\|}{\|b\|} \]
holds and is the best possible. This means that for a given matrix $A$, there exist some vectors
$b \neq 0$ and $\delta b \neq 0$ for which equality holds.
Proof. We already proved the inequality. Now, because $\| \; \|$ is a subordinate matrix norm,
there exist some vectors $x \neq 0$ and $\delta b \neq 0$ for which
\[ \|A^{-1} \delta b\| = \|A^{-1}\| \|\delta b\| \quad \text{and} \quad \|Ax\| = \|A\| \|x\|. \]
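Proposition 7.12. Let $A$ be an invertible matrix and let $x$ and $x + \Delta x$ be the solutions of the two systems
$$\begin{aligned} Ax &= b \\ (A + \Delta A)(x + \Delta x) &= b. \end{aligned}$$
If $b \neq 0$, then the inequality
$$\frac{\|\Delta x\|}{\|x + \Delta x\|} \le \mathrm{cond}(A)\,\frac{\|\Delta A\|}{\|A\|}$$
holds and is the best possible. This means that given a matrix $A$, there exist a vector $b \neq 0$ and a matrix $\Delta A \neq 0$ for which equality holds. Furthermore, if $\|\Delta A\|$ is small enough (for instance, if $\|\Delta A\| < 1/\|A^{-1}\|$), we have
$$\frac{\|\Delta x\|}{\|x\|} \le \mathrm{cond}(A)\,\frac{\|\Delta A\|}{\|A\|}\,(1 + O(\|\Delta A\|));$$
in fact, we have
$$\frac{\|\Delta x\|}{\|x\|} \le \mathrm{cond}(A)\,\frac{\|\Delta A\|}{\|A\|}\,\frac{1}{1 - \|A^{-1}\|\,\|\Delta A\|}.$$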
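Proof. The first inequality has already been proved. To show that equality can be achieved, let $w$ be any vector such that $w \neq 0$ and
$$\|A^{-1}w\| = \|A^{-1}\|\,\|w\|,$$
and let $\beta \neq 0$ be any real number. Now, the vectors
$$\begin{aligned} \Delta x &= -\beta A^{-1}w \\ x + \Delta x &= w \\ b &= (A + \beta I)w \end{aligned}$$
and the matrix
$$\Delta A = \beta I$$
satisfy the equations
$$\begin{aligned} Ax &= b \\ (A + \Delta A)(x + \Delta x) &= b \\ \|\Delta x\| = |\beta|\,\|A^{-1}w\| &= \|\Delta A\|\,\|A^{-1}\|\,\|x + \Delta x\|. \end{aligned}$$
Finally, we can pick $\beta$ so that $-\beta$ is not equal to any of the eigenvalues of $A$, so that $A + \Delta A = A + \beta I$ is invertible and $b$ is nonzero.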
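For the second part, assume that $\|\Delta A\| < 1/\|A^{-1}\|$, so that $\|A^{-1}\Delta A\| \le \|A^{-1}\|\,\|\Delta A\| < 1$. Then the matrix $I + A^{-1}\Delta A$ is invertible and
$$\|(I + A^{-1}\Delta A)^{-1}\| \le \frac{1}{1 - \|A^{-1}\Delta A\|} \le \frac{1}{1 - \|A^{-1}\|\,\|\Delta A\|}.$$
Since $\Delta x = -A^{-1}\Delta A(x + \Delta x)$, we get $(I + A^{-1}\Delta A)\Delta x = -A^{-1}\Delta A\,x$, and thus
$$\|\Delta x\| \le \frac{\|A^{-1}\|\,\|\Delta A\|}{1 - \|A^{-1}\|\,\|\Delta A\|}\,\|x\|,$$
which can be written as
$$\frac{\|\Delta x\|}{\|x\|} \le \mathrm{cond}(A)\,\frac{\|\Delta A\|}{\|A\|}\,\frac{1}{1 - \|A^{-1}\|\,\|\Delta A\|},$$
as claimed.
Remark: If $A$ and $b$ are perturbed simultaneously, so that we get the system $(A + \Delta A)(x + \Delta x) = b + \delta b$, and if $\|\Delta A\| < 1/\|A^{-1}\|$ and $b \neq 0$, then the same argument yields
$$\frac{\|\Delta x\|}{\|x\|} \le \frac{\mathrm{cond}(A)}{1 - \|A^{-1}\|\,\|\Delta A\|}\left(\frac{\|\Delta A\|}{\|A\|} + \frac{\|\delta b\|}{\|b\|}\right).$$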
We now list some properties of condition numbers and figure out what $\mathrm{cond}(A)$ is in the case of the spectral norm (the matrix norm induced by $\|\ \|_2$). First, we need to introduce a very important factorization of matrices, the singular value decomposition, for short, SVD.
It can be shown that given any $n \times n$ matrix $A \in M_n(\mathbb{C})$, there exist two unitary matrices $U$ and $V$, and a real diagonal matrix $\Sigma = \mathrm{diag}(\sigma_1, \ldots, \sigma_n)$, with $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_n \ge 0$, such that
$$A = V\Sigma U^*.$$
The nonnegative numbers $\sigma_1, \ldots, \sigma_n$ are called the singular values of $A$.
If $A$ is a real matrix, the matrices $U$ and $V$ are orthogonal matrices. The factorization $A = V\Sigma U^*$ implies that
$$A^*A = U\Sigma^2U^* \quad\text{and}\quad AA^* = V\Sigma^2V^*,$$
which shows that $\sigma_1^2, \ldots, \sigma_n^2$ are the eigenvalues of both $A^*A$ and $AA^*$, that the columns of $U$ are corresponding eigenvectors for $A^*A$, and that the columns of $V$ are corresponding eigenvectors for $AA^*$. In the case of a normal matrix, if $\lambda_1, \ldots, \lambda_n$ are the (complex) eigenvalues of $A$, then
$$\sigma_i = |\lambda_i|, \quad 1 \le i \le n.$$
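Proposition 7.13. For every invertible matrix $A \in M_n(\mathbb{C})$, the following properties hold:
(1)
$$\begin{aligned} \mathrm{cond}(A) &\ge 1, \\ \mathrm{cond}(A) &= \mathrm{cond}(A^{-1}), \\ \mathrm{cond}(\alpha A) &= \mathrm{cond}(A) \quad\text{for all } \alpha \in \mathbb{C} - \{0\}. \end{aligned}$$
(2) If $\mathrm{cond}_2(A)$ denotes the condition number of $A$ with respect to the spectral norm, then
$$\mathrm{cond}_2(A) = \frac{\sigma_1}{\sigma_n},$$
where $\sigma_1 \ge \cdots \ge \sigma_n$ are the singular values of $A$.
(3) If the matrix $A$ is normal, then
$$\mathrm{cond}_2(A) = \frac{|\lambda_1|}{|\lambda_n|},$$
where $\lambda_1, \ldots, \lambda_n$ are the eigenvalues of $A$ sorted so that $|\lambda_1| \ge \cdots \ge |\lambda_n|$.
(4) If $A$ is a unitary or an orthogonal matrix, then $\mathrm{cond}_2(A) = 1$.
(5) The condition number $\mathrm{cond}_2(A)$ is invariant under unitary transformations, which means that
$$\mathrm{cond}_2(A) = \mathrm{cond}_2(UA) = \mathrm{cond}_2(AV),$$
for all unitary matrices $U$ and $V$.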
Proof. The properties in (1) are immediate consequences of the properties of subordinate matrix norms. In particular, $AA^{-1} = I$ implies
$$1 = \|I\| \le \|A\|\,\|A^{-1}\| = \mathrm{cond}(A).$$
(2) We showed earlier that $\|A\|_2^2 = \rho(A^*A)$, which is the largest eigenvalue of $A^*A$. Since we just saw that the eigenvalues of $A^*A$ are $\sigma_1^2 \ge \cdots \ge \sigma_n^2$, where $\sigma_1, \ldots, \sigma_n$ are the singular values of $A$, we have
$$\|A\|_2 = \sigma_1.$$
Now, if $A$ is invertible, then $\sigma_1 \ge \cdots \ge \sigma_n > 0$, and it is easy to show that the eigenvalues of $(A^*A)^{-1}$ are $\sigma_n^{-2} \ge \cdots \ge \sigma_1^{-2}$, which shows that
$$\|A^{-1}\|_2 = \sigma_n^{-1},$$
and thus
$$\mathrm{cond}_2(A) = \frac{\sigma_1}{\sigma_n}.$$
(3) This follows from the fact that $\|A\|_2 = \rho(A)$ for a normal matrix.
(4) If $A$ is a unitary matrix, then $A^*A = AA^* = I$, so $\rho(A^*A) = 1$, and $\|A\|_2 = \sqrt{\rho(A^*A)} = 1$. We also have $\|A^{-1}\|_2 = \|A^*\|_2 = \sqrt{\rho(AA^*)} = 1$, and thus $\mathrm{cond}_2(A) = 1$.
(5) This follows immediately from the unitary invariance of the spectral norm.
Proposition 7.13 (4) shows that unitary and orthogonal transformations are very well-conditioned, and part (5) shows that unitary transformations preserve the condition number.
In order to compute cond2 (A), we need to compute the top and bottom singular values
of A, which may be hard. The inequality
Thus, if $A$ is nearly singular, then there will be some orthonormal pair $u, v$ such that $Au$ and $Av$ are nearly parallel; the angle $\theta(A)$ will then be small and $\cot(\theta(A)/2)$ will be large. For more details, see Horn and Johnson [57] (Section 5.8 and Section 7.4).
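It should be noted that in general (if $A$ is not a normal matrix) a matrix could have a very large condition number even if all its eigenvalues are identical! For example, if we consider the $n \times n$ matrix
$$A = \begin{pmatrix} 1 & 2 & 0 & 0 & \cdots & 0 & 0 \\ 0 & 1 & 2 & 0 & \cdots & 0 & 0 \\ 0 & 0 & 1 & 2 & \cdots & 0 & 0 \\ \vdots & \vdots & \ddots & \ddots & \ddots & \vdots & \vdots \\ 0 & 0 & \cdots & 0 & 1 & 2 & 0 \\ 0 & 0 & \cdots & 0 & 0 & 1 & 2 \\ 0 & 0 & \cdots & 0 & 0 & 0 & 1 \end{pmatrix}$$
it turns out that $\mathrm{cond}_2(A) \ge 2^{n-1}$.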
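One way to see where the lower bound $2^{n-1}$ comes from is sketched below, under the assumption that we only want a lower bound and not the exact value of $\mathrm{cond}_2(A)$. Since $\|A\|_2 \ge \|Ae_1\|_2 = 1$ and $\|A^{-1}\|_2 \ge \|A^{-1}e_n\|_2$, it suffices to solve $Ay = e_n$ by back-substitution and look at the size of $y$. The function name `solve_upper_bidiagonal` and the choice $n = 10$ are hypothetical.

```python
def solve_upper_bidiagonal(n, rhs):
    """Solve A y = rhs by back-substitution, where A has 1 on the diagonal
    and 2 on the superdiagonal (the n x n matrix discussed in the text)."""
    y = [0] * n
    y[n - 1] = rhs[n - 1]
    for i in range(n - 2, -1, -1):
        # Row i reads y[i] + 2 * y[i + 1] = rhs[i].
        y[i] = rhs[i] - 2 * y[i + 1]
    return y

n = 10
e_n = [0] * (n - 1) + [1]
y = solve_upper_bidiagonal(n, e_n)            # y = A^{-1} e_n
norm_y = sum(t * t for t in y) ** 0.5          # a lower bound for ||A^{-1}||_2

# ||A||_2 >= ||A e_1||_2 = 1, so cond_2(A) >= norm_y >= |y[0]| = 2^(n-1).
print(y[0], norm_y, 2 ** (n - 1))
```

The first component of $y$ is $(-2)^{n-1}$, which already forces $\|A^{-1}\|_2 \ge 2^{n-1}$ even though every eigenvalue of $A$ equals 1.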
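A classical example of a matrix with a very large condition number is the Hilbert matrix $H^{(n)}$, the $n \times n$ matrix with
$$H^{(n)}_{ij} = \frac{1}{i + j - 1}.$$
For example, when $n = 5$,
$$H^{(5)} = \begin{pmatrix} 1 & \frac{1}{2} & \frac{1}{3} & \frac{1}{4} & \frac{1}{5} \\ \frac{1}{2} & \frac{1}{3} & \frac{1}{4} & \frac{1}{5} & \frac{1}{6} \\ \frac{1}{3} & \frac{1}{4} & \frac{1}{5} & \frac{1}{6} & \frac{1}{7} \\ \frac{1}{4} & \frac{1}{5} & \frac{1}{6} & \frac{1}{7} & \frac{1}{8} \\ \frac{1}{5} & \frac{1}{6} & \frac{1}{7} & \frac{1}{8} & \frac{1}{9} \end{pmatrix}.$$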
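For the matrix
$$A = \begin{pmatrix} 10 & 7 & 8 & 7 \\ 7 & 5 & 6 & 5 \\ 8 & 6 & 10 & 9 \\ 7 & 5 & 9 & 10 \end{pmatrix},$$
which is a symmetric positive definite matrix, it can be shown that its eigenvalues, which in this case are also its singular values because $A$ is SPD, are
$$\lambda_1 \approx 30.2887 > \lambda_2 \approx 3.858 > \lambda_3 \approx 0.8431 > \lambda_4 \approx 0.01015,$$
so that
$$\mathrm{cond}_2(A) = \frac{\lambda_1}{\lambda_4} \approx 2984.$$
The reader should check that for the perturbation of the right-hand side $b$ used earlier, the relative errors $\|\delta x\| / \|x\|$ and $\|\delta b\| / \|b\|$ satisfy the inequality
$$\frac{\|\delta x\|}{\|x\|} \le \mathrm{cond}(A)\,\frac{\|\delta b\|}{\|b\|}.$$
7.4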
The problem of solving an inconsistent linear system Ax = b often arises in practice. This
is a system where b does not belong to the column space of A, usually with more equations
than variables. Thus, such a system has no solution. Yet, we would still like to solve such
a system, at least approximately.
Such systems often arise when trying to fit some data. For example, we may have a set
of 3D data points
{p1 , . . . , pn },
and we have reason to believe that these points are nearly coplanar. We would like to find
a plane that best fits our data points. Recall that the equation of a plane is
$$\alpha x + \beta y + \gamma z + \delta = 0,$$
with $(\alpha, \beta, \gamma) \neq (0, 0, 0)$. Thus, every plane is either not parallel to the $x$-axis ($\alpha \neq 0$) or not parallel to the $y$-axis ($\beta \neq 0$) or not parallel to the $z$-axis ($\gamma \neq 0$).
Say we have reasons to believe that the plane we are looking for is not parallel to the $z$-axis. If we are wrong, in the least squares solution, one of the coefficients, $\alpha$, $\beta$, will be very large. If $\gamma \neq 0$, then we may assume that our plane is given by an equation of the form
$$z = ax + by + d,$$
and we would like this equation to be satisfied for all the $p_i$'s, which leads to a system of $n$ equations in 3 unknowns $a, b, d$, with $p_i = (x_i, y_i, z_i)$:
$$\begin{aligned} ax_1 + by_1 + d &= z_1 \\ &\ \,\vdots \\ ax_n + by_n + d &= z_n. \end{aligned}$$
However, if $n$ is larger than 3, such a system generally has no solution. Since the above system can't be solved exactly, we can try to find a solution $(a, b, d)$ that minimizes the least-squares error
$$\sum_{i=1}^{n} (ax_i + by_i + d - z_i)^2.$$
This is what Legendre and Gauss figured out in the early 1800s!
In general, given a linear system
$$Ax = b,$$
we solve the least squares problem: minimize $\|Ax - b\|_2^2$.
Fortunately, every $n \times m$-matrix $A$ can be written as
$$A = VDU^\top$$
where $U$ and $V$ are orthogonal and $D$ is a rectangular diagonal matrix with non-negative entries (singular value decomposition, or SVD); see Chapter 16.
The SVD can be used to solve an inconsistent system. It is shown in Chapter 17 that there is a vector $x$ of smallest norm minimizing $\|Ax - b\|_2$. It is given by the (Penrose) pseudo-inverse of $A$ (itself given by the SVD).
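The plane-fitting problem above can be sketched in a few lines. Note that this sketch does not use the SVD or the pseudo-inverse: it solves the normal equations $A^\top A\,t = A^\top z$ instead, which gives the same minimizer when the columns of $A$ are linearly independent (that is, when the points $(x_i, y_i)$ are not all collinear). The function name `fit_plane` and the sample points are hypothetical, and exact rational arithmetic is an arbitrary choice made to keep the example free of rounding.

```python
from fractions import Fraction

def fit_plane(points):
    """Least-squares fit of z = a*x + b*y + d through 3D points,
    obtained by solving the normal equations (A^T A) t = A^T z."""
    rows = [(Fraction(x), Fraction(y), Fraction(1)) for x, y, _ in points]
    zs = [Fraction(z) for _, _, z in points]
    # Augmented 3 x 4 system [A^T A | A^T z].
    M = [[sum(r[i] * r[j] for r in rows) for j in range(3)]
         + [sum(r[i] * z for r, z in zip(rows, zs))] for i in range(3)]
    # Gauss-Jordan elimination; assumes A^T A is invertible.
    for col in range(3):
        pivot = next(r for r in range(col, 3) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        M[col] = [v / M[col][col] for v in M[col]]
        for r in range(3):
            if r != col:
                f = M[r][col]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[col])]
    return tuple(row[3] for row in M)

# Points lying exactly on z = 2x - y + 3: the fit recovers the plane.
exact = [(0, 0, 3), (1, 0, 5), (0, 1, 2), (1, 1, 4), (2, 1, 6)]
print(fit_plane(exact))

# Four points that are not coplanar: the system is inconsistent,
# and the fit is the least-squares compromise.
noisy = [(0, 0, 0), (1, 0, 1), (0, 1, 1), (1, 1, 3)]
print(fit_plane(noisy))
```

For ill-conditioned problems the normal equations square the condition number, which is one reason the SVD-based approach mentioned in the text is usually preferred in practice.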
It has been observed that solving in the least-squares sense may give too much weight to outliers, that is, points clearly outside the best-fit plane. In this case, it is preferable to minimize (the $\ell^1$-norm)
$$\sum_{i=1}^{n} |ax_i + by_i + d - z_i|.$$
This does not appear to be a linear problem, but we can use a trick to convert this minimization problem into a linear program (which means a problem involving linear constraints).
Note that $|x| = \max\{x, -x\}$. So, by introducing new variables $e_1, \ldots, e_n$, our minimization problem is equivalent to the linear program (LP):
$$\begin{aligned} \text{minimize} \quad & e_1 + \cdots + e_n \\ \text{subject to} \quad & ax_i + by_i + d - z_i \le e_i, & 1 \le i \le n, \\ & -(ax_i + by_i + d - z_i) \le e_i, & 1 \le i \le n. \end{aligned}$$
For an optimal solution, we must have equality, since otherwise we could decrease some ei
and get an even better solution. Of course, we are no longer dealing with pure linear
algebra, since our constraints are inequalities.
We prefer not getting into linear programming right now, but the above example provides
a good reason to learn more about linear programming!
7.5 Summary
The main concepts and results of this chapter are listed below:
Norms and normed vector spaces.
The triangle inequality.
The Euclidean norm; the $\ell^p$-norms.
Hölder's inequality; the Cauchy–Schwarz inequality; Minkowski's inequality.
Hermitian inner product and Euclidean inner product.
Equivalent norms.
All norms on a finite-dimensional vector space are equivalent (Theorem 7.3).
Matrix norms.
Hermitian, symmetric and normal matrices. Orthogonal and unitary matrices.
The trace of a matrix.
Eigenvalues and eigenvectors of a matrix.
The characteristic polynomial of a matrix.
The spectral radius $\rho(A)$ of a matrix $A$.
The Frobenius norm.
The Frobenius norm is a unitarily invariant matrix norm.
Bounded linear maps.
Subordinate matrix norms.
Characterization of the subordinate matrix norms for the vector norms $\|\ \|_1$, $\|\ \|_2$, and $\|\ \|_\infty$.
Chapter 8
Eigenvectors and Eigenvalues
8.1
Given a finite-dimensional vector space $E$, let $f \colon E \to E$ be any linear map. If, by luck, there is a basis $(e_1, \ldots, e_n)$ of $E$ with respect to which $f$ is represented by a diagonal matrix
$$D = \begin{pmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \ddots & \vdots \\ \vdots & \ddots & \ddots & 0 \\ 0 & \cdots & 0 & \lambda_n \end{pmatrix},$$
then the action of $f$ on $E$ is very simple; in every direction $e_i$, we have
$$f(e_i) = \lambda_i e_i.$$
We can think of $f$ as a transformation that stretches or shrinks space along the directions $e_1, \ldots, e_n$ (at least if $E$ is a real vector space). In terms of matrices, the above property translates into the fact that there is an invertible matrix $P$ and a diagonal matrix $D$ such that a matrix $A$ can be factored as
$$A = PDP^{-1}.$$
When this happens, we say that $f$ (or $A$) is diagonalizable, the $\lambda_i$'s are called the eigenvalues of $f$, and the $e_i$'s are eigenvectors of $f$. For example, we will see that every symmetric matrix can be diagonalized. Unfortunately, not every matrix can be diagonalized. For example, the matrix
$$A_1 = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}$$
can't be diagonalized. Sometimes, a matrix fails to be diagonalizable because its eigenvalues do not belong to the field of coefficients, such as
$$A_2 = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix},$$
whose eigenvalues are $\pm i$. This is not a serious problem because $A_2$ can be diagonalized over the complex numbers. However, $A_1$ is a fatal case! Indeed, its eigenvalues are both 1 and the problem is that $A_1$ does not have enough eigenvectors to span $E$.
The next best thing is that there is a basis with respect to which f is represented by an
upper triangular matrix. In this case we say that f can be triangularized . As we will see
in Section 8.2, if all the eigenvalues of f belong to the field of coefficients K, then f can be
triangularized. In particular, this is the case if K = C.
Now, an alternative to triangularization is to consider the representation of $f$ with respect to two bases $(e_1, \ldots, e_n)$ and $(f_1, \ldots, f_n)$, rather than a single basis. In this case, if $K = \mathbb{R}$ or $K = \mathbb{C}$, it turns out that we can even pick these bases to be orthonormal, and we get a diagonal matrix $\Sigma$ with nonnegative entries, such that
$$f(e_i) = \sigma_i f_i, \quad 1 \le i \le n.$$
The nonzero $\sigma_i$'s are the singular values of $f$, and the corresponding representation is the singular value decomposition, or SVD. The SVD plays a very important role in applications, and will be considered in detail later.
In this section, we focus on the possibility of diagonalizing a linear map, and we introduce the relevant concepts to do so. Given a vector space $E$ over a field $K$, let $I$ denote the identity map on $E$.
Definition 8.1. Given any vector space $E$ and any linear map $f \colon E \to E$, a scalar $\lambda \in K$ is called an eigenvalue, or proper value, or characteristic value of $f$ if there is some nonzero vector $u \in E$ such that
$$f(u) = \lambda u.$$
Equivalently, $\lambda$ is an eigenvalue of $f$ if $\mathrm{Ker}\,(\lambda I - f)$ is nontrivial (i.e., $\mathrm{Ker}\,(\lambda I - f) \neq \{0\}$). A vector $u \in E$ is called an eigenvector, or proper vector, or characteristic vector of $f$ if $u \neq 0$ and if there is some $\lambda \in K$ such that
$$f(u) = \lambda u;$$
the scalar $\lambda$ is then an eigenvalue, and we say that $u$ is an eigenvector associated with $\lambda$. Given any eigenvalue $\lambda \in K$, the nontrivial subspace $\mathrm{Ker}\,(\lambda I - f)$ consists of all the eigenvectors associated with $\lambda$ together with the zero vector; this subspace is denoted by $E_\lambda(f)$, or $E(\lambda, f)$, or even by $E_\lambda$, and is called the eigenspace associated with $\lambda$, or proper subspace associated with $\lambda$.
Note that distinct eigenvectors may correspond to the same eigenvalue, but distinct
eigenvalues correspond to disjoint sets of eigenvectors.
Remark: As we emphasized in the remark following Definition 7.4, we require an eigenvector to be nonzero. This requirement seems to have more benefits than drawbacks, even though it may be considered somewhat inelegant because the set of all eigenvectors associated with an eigenvalue is not a subspace since the zero vector is excluded.
Let us now assume that $E$ is of finite dimension $n$. The next proposition shows that the eigenvalues of a linear map $f \colon E \to E$ are the roots of a polynomial associated with $f$.
Proposition 8.1. Let $E$ be any vector space of finite dimension $n$ and let $f$ be any linear map $f \colon E \to E$. The eigenvalues of $f$ are the roots (in $K$) of the polynomial
$$\det(\lambda I - f).$$
Proof. A scalar $\lambda \in K$ is an eigenvalue of $f$ iff there is some vector $u \neq 0$ in $E$ such that
$$f(u) = \lambda u$$
iff
$$(\lambda I - f)(u) = 0$$
iff $\lambda I - f$ is not injective iff (since $E$ is finite-dimensional) $\lambda I - f$ is not invertible iff
$$\det(\lambda I - f) = 0.$$
In view of the importance of the polynomial $\det(\lambda I - f)$, we have the following definition.
Definition 8.2. Given any vector space $E$ of dimension $n$, for any linear map $f \colon E \to E$, the polynomial $P_f(X) = \chi_f(X) = \det(XI - f)$ is called the characteristic polynomial of $f$. For any square matrix $A$, the polynomial $P_A(X) = \chi_A(X) = \det(XI - A)$ is called the characteristic polynomial of $A$.
Note that we already encountered the characteristic polynomial in Section 5.7; see Definition 5.8.
Given any basis $(e_1, \ldots, e_n)$, if $A = M(f)$ is the matrix of $f$ w.r.t. $(e_1, \ldots, e_n)$, we can compute the characteristic polynomial $\chi_f(X) = \det(XI - f)$ of $f$ by expanding the following determinant:
$$\det(XI - A) = \begin{vmatrix} X - a_{11} & -a_{12} & \cdots & -a_{1n} \\ -a_{21} & X - a_{22} & \cdots & -a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ -a_{n1} & -a_{n2} & \cdots & X - a_{nn} \end{vmatrix}.$$
If we expand this determinant, we find that
$$\chi_A(X) = \det(XI - A) = X^n - (a_{11} + \cdots + a_{nn})X^{n-1} + \cdots + (-1)^n \det(A).$$
The sum $\mathrm{tr}(A) = a_{11} + \cdots + a_{nn}$ of the diagonal elements of $A$ is called the trace of $A$. Since we proved in Section 5.7 that the characteristic polynomial only depends on the linear map $f$, the above shows that $\mathrm{tr}(A)$ has the same value for all matrices $A$ representing $f$. Thus, the trace of a linear map is well-defined; we have $\mathrm{tr}(f) = \mathrm{tr}(A)$ for any matrix $A$ representing $f$.
Remark: The characteristic polynomial of a linear map is sometimes defined as $\det(f - XI)$. Since
$$\det(f - XI) = (-1)^n \det(XI - f),$$
this makes essentially no difference but the version $\det(XI - f)$ has the small advantage that the coefficient of $X^n$ is $+1$.
If we write
$$\chi_A(X) = \det(XI - A) = X^n - \tau_1(A)X^{n-1} + \cdots + (-1)^k\tau_k(A)X^{n-k} + \cdots + (-1)^n\tau_n(A),$$
then we just proved that
$$\tau_1(A) = \mathrm{tr}(A) \quad\text{and}\quad \tau_n(A) = \det(A).$$
It is also possible to express $\tau_k(A)$ in terms of determinants of certain submatrices of $A$. For any nonempty subset $I \subseteq \{1, \ldots, n\}$, say $I = \{i_1 < \cdots < i_k\}$, let $A_{I,I}$ be the $k \times k$ submatrix of $A$ whose $j$th column consists of the elements $a_{i_h i_j}$, where $h = 1, \ldots, k$. Equivalently, $A_{I,I}$ is the matrix obtained from $A$ by first selecting the columns whose indices belong to $I$, and then the rows whose indices also belong to $I$. Then, it can be shown that
$$\tau_k(A) = \sum_{\substack{I \subseteq \{1,\ldots,n\} \\ |I| = k}} \det(A_{I,I}).$$
If all the roots $\lambda_1, \ldots, \lambda_n$ of the polynomial $\det(XI - A)$ belong to the field $K$, then we can write
$$\chi_A(X) = \det(XI - A) = (X - \lambda_1)\cdots(X - \lambda_n),$$
where some of the $\lambda_i$'s may appear more than once. Consequently,
$$\tau_k(A) = \sum_{\substack{I \subseteq \{1,\ldots,n\} \\ |I| = k}} \prod_{i \in I} \lambda_i,$$
and in particular
$$\mathrm{tr}(A) = \lambda_1 + \cdots + \lambda_n \quad\text{and}\quad \det(A) = \lambda_1 \cdots \lambda_n,$$
where $\lambda_1, \ldots, \lambda_n$ are the eigenvalues of $f$ (and $A$), where some of the $\lambda_i$'s may appear more than once. In particular, $f$ is not invertible iff it admits 0 as an eigenvalue.
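The formula expressing $\tau_k(A)$ as a sum of $k \times k$ principal minors can be checked numerically. The following is a small sketch on an arbitrarily chosen $3 \times 3$ integer matrix; the helper names `det`, `tau`, and `char_poly_at` are hypothetical, and the determinant is computed by naive cofactor expansion, which is only reasonable for very small matrices.

```python
from itertools import combinations

def det(M):
    """Determinant by Laplace expansion along the first row (small matrices only)."""
    if not M:
        return 1
    return sum((-1) ** j * M[0][j] * det([row[:j] + row[j + 1:] for row in M[1:]])
               for j in range(len(M)))

def tau(A, k):
    """Sum of det(A_{I,I}) over all index sets I of size k."""
    n = len(A)
    return sum(det([[A[i][j] for j in I] for i in I])
               for I in combinations(range(n), k))

def char_poly_at(A, x):
    """Evaluate det(xI - A) directly at the scalar x."""
    n = len(A)
    return det([[(x if i == j else 0) - A[i][j] for j in range(n)] for i in range(n)])

A = [[2, 1, 0], [1, 3, 1], [0, 1, 4]]
n = len(A)
taus = [tau(A, k) for k in range(1, n + 1)]   # tau_1 = tr(A), ..., tau_n = det(A)

# Compare det(xI - A) with x^n - tau_1 x^(n-1) + ... + (-1)^n tau_n at a few points.
for x in range(-3, 4):
    expansion = x ** n + sum((-1) ** k * taus[k - 1] * x ** (n - k) for k in range(1, n + 1))
    assert char_poly_at(A, x) == expansion
print(taus)
```

Since both sides are polynomials of degree $n$ in $x$, agreement at more than $n$ points confirms that they are equal for this matrix.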
Remark: Depending on the field $K$, the characteristic polynomial $\chi_A(X) = \det(XI - A)$ may or may not have roots in $K$. This motivates considering algebraically closed fields, which are fields $K$ such that every polynomial with coefficients in $K$ has all its roots in $K$. For example, over $K = \mathbb{R}$, not every polynomial has real roots. If we consider the matrix
$$A = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix},$$
then the characteristic polynomial $\det(XI - A)$ has no real roots unless $\theta = k\pi$. However, over the field $\mathbb{C}$ of complex numbers, every polynomial has roots. For example, the matrix above has the roots $\cos\theta \pm i\sin\theta = e^{\pm i\theta}$.
It is possible to show that every linear map $f$ over a complex vector space $E$ must have some (complex) eigenvalue without having recourse to determinants (and the characteristic polynomial). Let $n = \dim(E)$, pick any nonzero vector $u \in E$, and consider the sequence
$$u, f(u), f^2(u), \ldots, f^n(u).$$
Since the above sequence has $n + 1$ vectors and $E$ has dimension $n$, these vectors must be linearly dependent, so there are some complex numbers $c_0, \ldots, c_m$, not all zero, such that
$$c_0 f^m(u) + c_1 f^{m-1}(u) + \cdots + c_m u = 0,$$
where $m \le n$ is the largest integer such that the coefficient of $f^m(u)$ is nonzero ($m$ must exist since we have a nontrivial linear dependency). Now, because the field $\mathbb{C}$ is algebraically closed, the polynomial
$$c_0 X^m + c_1 X^{m-1} + \cdots + c_m$$
can be written as a product of linear factors as
$$c_0 X^m + c_1 X^{m-1} + \cdots + c_m = c_0(X - \lambda_1)\cdots(X - \lambda_m)$$
for some complex numbers $\lambda_1, \ldots, \lambda_m \in \mathbb{C}$, not necessarily distinct. But then, since $c_0 \neq 0$,
$$c_0 f^m(u) + c_1 f^{m-1}(u) + \cdots + c_m u = 0$$
is equivalent to
$$(f - \lambda_1 I) \circ \cdots \circ (f - \lambda_m I)(u) = 0.$$
If all the linear maps $f - \lambda_i I$ were injective, then their composition would be injective, contradicting the fact that $u \neq 0$. Therefore, some map $f - \lambda_i I$ has a nontrivial kernel, which means that $\lambda_i$ is an eigenvalue of $f$.
Thus, from Proposition 8.3, if $\lambda_1, \ldots, \lambda_m$ are all the pairwise distinct eigenvalues of $f$ (where $m \le n$), we have a direct sum
$$E_{\lambda_1} \oplus \cdots \oplus E_{\lambda_m}$$
of the eigenspaces $E_{\lambda_i}$. Unfortunately, it is not always the case that
$$E = E_{\lambda_1} \oplus \cdots \oplus E_{\lambda_m}.$$
When
$$E = E_{\lambda_1} \oplus \cdots \oplus E_{\lambda_m},$$
we say that $f$ is diagonalizable (and similarly for any matrix associated with $f$). Indeed, picking a basis in each $E_{\lambda_i}$, we obtain a matrix which is a diagonal matrix consisting of the eigenvalues, each $\lambda_i$ occurring a number of times equal to the dimension of $E_{\lambda_i}$. This happens if the algebraic multiplicity and the geometric multiplicity of every eigenvalue are equal. In particular, when the characteristic polynomial has $n$ distinct roots, then $f$ is diagonalizable. It can also be shown that symmetric matrices have real eigenvalues and can be diagonalized.
For a negative example, we leave as an exercise to show that the matrix
$$M = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}$$
cannot be diagonalized, even though 1 is an eigenvalue. The problem is that the eigenspace of 1 only has dimension 1. The matrix
$$A = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}$$
cannot be diagonalized either, because it has no real eigenvalues, unless $\theta = k\pi$. However, over the field of complex numbers, it can be diagonalized.
8.2
Unfortunately, not every linear map on a complex vector space can be diagonalized. The next best thing is to triangularize, which means to find a basis over which the matrix has zero entries below the main diagonal. Fortunately, such a basis always exists.
We say that a square matrix $A$ is an upper triangular matrix if it has the following shape,
$$\begin{pmatrix} a_{11} & a_{12} & a_{13} & \cdots & a_{1\,n-1} & a_{1n} \\ 0 & a_{22} & a_{23} & \cdots & a_{2\,n-1} & a_{2n} \\ 0 & 0 & a_{33} & \cdots & a_{3\,n-1} & a_{3n} \\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & 0 & \cdots & a_{n-1\,n-1} & a_{n-1\,n} \\ 0 & 0 & 0 & \cdots & 0 & a_{nn} \end{pmatrix},$$
i.e., $a_{ij} = 0$ whenever $j < i$, $1 \le i, j \le n$.
Theorem 8.4. Given any finite dimensional vector space $E$ over a field $K$, for any linear map $f \colon E \to E$, there is a basis $(u_1, \ldots, u_n)$ with respect to which $f$ is represented by an upper triangular matrix (in $M_n(K)$) iff all the eigenvalues of $f$ belong to $K$. Equivalently, for every $n \times n$ matrix $A \in M_n(K)$, there is an invertible matrix $P$ and an upper triangular matrix $T$ (both in $M_n(K)$) such that
$$A = PTP^{-1}$$
iff all the eigenvalues of $A$ belong to $K$.
Proof. If there is a basis $(u_1, \ldots, u_n)$ with respect to which $f$ is represented by an upper triangular matrix $T$ in $M_n(K)$, then since the eigenvalues of $f$ are the diagonal entries of $T$, all the eigenvalues of $f$ belong to $K$.
For the converse, we proceed by induction on the dimension $n$ of $E$. For $n = 1$ the result is obvious. If $n > 1$, since by assumption $f$ has all its eigenvalues in $K$, pick some eigenvalue $\lambda_1 \in K$ of $f$, and let $u_1$ be some corresponding (nonzero) eigenvector. We can find $n - 1$ vectors $(v_2, \ldots, v_n)$ such that $(u_1, v_2, \ldots, v_n)$ is a basis of $E$, and let $F$ be the subspace of dimension $n - 1$ spanned by $(v_2, \ldots, v_n)$. In the basis $(u_1, v_2, \ldots, v_n)$, the matrix of $f$ is of the form
$$U = \begin{pmatrix} \lambda_1 & a_{12} & \cdots & a_{1n} \\ 0 & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ 0 & a_{n2} & \cdots & a_{nn} \end{pmatrix},$$
since its first column contains the coordinates of $\lambda_1 u_1$ over the basis $(u_1, v_2, \ldots, v_n)$. If we let $p \colon E \to F$ be the projection defined such that $p(u_1) = 0$ and $p(v_i) = v_i$ when $2 \le i \le n$, the linear map $g \colon F \to F$ defined as the restriction of $p \circ f$ to $F$ is represented by the $(n-1) \times (n-1)$ matrix $V = (a_{ij})_{2 \le i,j \le n}$ over the basis $(v_2, \ldots, v_n)$. We need to prove that all the eigenvalues of $g$ belong to $K$. However, since the first column of $U$ has at most one nonzero entry, we get
$$\chi_U(X) = \det(XI - U) = (X - \lambda_1)\det(XI - V) = (X - \lambda_1)\chi_V(X),$$
where $\chi_U(X)$ is the characteristic polynomial of $U$ and $\chi_V(X)$ is the characteristic polynomial of $V$. It follows that $\chi_V(X)$ divides $\chi_U(X)$, and since all the roots of $\chi_U(X)$ are in $K$, all the roots of $\chi_V(X)$ are also in $K$. Consequently, we can apply the induction hypothesis, and there is a basis $(u_2, \ldots, u_n)$ of $F$ such that $g$ is represented by an upper triangular matrix $(b_{ij})_{1 \le i,j \le n-1}$. However,
$$E = Ku_1 \oplus F,$$
and thus $(u_1, \ldots, u_n)$ is a basis for $E$. Since $p$ is the projection from $E = Ku_1 \oplus F$ onto $F$ and $g \colon F \to F$ is the restriction of $p \circ f$ to $F$, we have
$$f(u_1) = \lambda_1 u_1$$
and
$$f(u_{i+1}) = a_{1i}u_1 + \sum_{j=1}^{i} b_{ij}u_{j+1}$$
for some $a_{1i} \in K$, when $1 \le i \le n - 1$. But then the matrix of $f$ with respect to $(u_1, \ldots, u_n)$ is upper triangular.
For the matrix version, we assume that $A$ is the matrix of $f$ with respect to some basis. Then, we just proved that there is a change of basis matrix $P$ such that $A = PTP^{-1}$ where $T$ is upper triangular.
If $A = PTP^{-1}$ where $T$ is upper triangular, note that the diagonal entries of $T$ are the eigenvalues $\lambda_1, \ldots, \lambda_n$ of $A$. Indeed, $A$ and $T$ have the same characteristic polynomial. Also, if $A$ is a real matrix whose eigenvalues are all real, then $P$ can be chosen to be real, and if $A$ is a rational matrix whose eigenvalues are all rational, then $P$ can be chosen to be rational. Since any polynomial over $\mathbb{C}$ has all its roots in $\mathbb{C}$, Theorem 8.4 implies that every complex $n \times n$ matrix can be triangularized.
If $\lambda$ is an eigenvalue of the matrix $A$ and if $u$ is an eigenvector associated with $\lambda$, from
$$Au = \lambda u,$$
we obtain
$$A^2u = A(Au) = A(\lambda u) = \lambda Au = \lambda^2 u,$$
which shows that $\lambda^2$ is an eigenvalue of $A^2$ for the eigenvector $u$. An obvious induction shows that $\lambda^k$ is an eigenvalue of $A^k$ for the eigenvector $u$, for all $k \ge 1$. Now, if all eigenvalues $\lambda_1, \ldots, \lambda_n$ of $A$ are in $K$, it follows that $\lambda_1^k, \ldots, \lambda_n^k$ are eigenvalues of $A^k$. However, it is not obvious that $A^k$ does not have other eigenvalues. In fact, this can't happen, and this can be proved using Theorem 8.4.
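The role played by triangular matrices here can be illustrated on a small example. For an upper triangular matrix $T$, every power $T^k$ is again upper triangular with diagonal entries $t_{ii}^k$, so any polynomial $q(T)$ is upper triangular with diagonal entries $q(t_{ii})$. The sketch below checks this for one arbitrarily chosen $T$ and $q$; the helper names `matmul` and `poly_of_matrix` are hypothetical.

```python
def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def poly_of_matrix(coeffs, A):
    """Evaluate q(A) by Horner's rule; coeffs lists the coefficients of q
    from the highest degree down to the constant term."""
    n = len(A)
    identity = [[int(i == j) for j in range(n)] for i in range(n)]
    result = [[0] * n for _ in range(n)]
    for c in coeffs:
        result = matmul(result, A)
        result = [[result[i][j] + c * identity[i][j] for j in range(n)] for i in range(n)]
    return result

def q(x):
    return x * x - 3 * x + 2      # q(X) = X^2 - 3X + 2

T = [[2, 5, -1], [0, 3, 4], [0, 0, 2]]   # upper triangular, diagonal entries 2, 3, 2
qT = poly_of_matrix([1, -3, 2], T)

diagonal = [qT[i][i] for i in range(3)]
below = [qT[i][j] for i in range(3) for j in range(i)]
print(diagonal, below)
```

Because the eigenvalues of a triangular matrix are its diagonal entries, the eigenvalues of $q(T)$ in this example are exactly the values $q(t_{ii})$.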
Proposition 8.5. Given any $n \times n$ matrix $A \in M_n(K)$ with coefficients in a field $K$, if all eigenvalues $\lambda_1, \ldots, \lambda_n$ of $A$ are in $K$, then for every polynomial $q(X) \in K[X]$, the eigenvalues of $q(A)$ are exactly $(q(\lambda_1), \ldots, q(\lambda_n))$.
Proof. By Theorem 8.4, there is an upper triangular matrix $T$ and an invertible matrix $P$ (both in $M_n(K)$) such that
$$A = PTP^{-1}.$$
Since $A$ and $T$ are similar, they have the same eigenvalues (with the same multiplicities), so the diagonal entries of $T$ are the eigenvalues of $A$. Since
$$A^k = PT^kP^{-1}, \quad k \ge 1,$$
Using Theorem 8.6, we can derive the fact that if $A$ is a Hermitian matrix, then there is a unitary matrix $U$ and a real diagonal matrix $D$ such that $A = UDU^*$. Indeed, since $A^* = A$, we get
$$UT^*U^* = UTU^*,$$
which implies that $T^* = T$. Since $T$ is an upper triangular matrix, $T^*$ is a lower triangular matrix, which implies that $T$ is a real diagonal matrix. In fact, applying this result to a (real) symmetric matrix $A$, we obtain the fact that all the eigenvalues of a symmetric matrix are real, and by applying Theorem 8.6 again, we conclude that $A = QDQ^\top$, where $Q$ is orthogonal and $D$ is a real diagonal matrix. We will also prove this in Chapter 13.
When $A$ has complex eigenvalues, there is a version of Theorem 8.6 involving only real matrices provided that we allow $T$ to be block upper-triangular (the diagonal entries may be $2 \times 2$ matrices or real entries).
Theorem 8.6 is not a very practical result but it is a useful theoretical result to cope with matrices that cannot be diagonalized. For example, it can be used to prove that every complex matrix is the limit of a sequence of diagonalizable matrices that have distinct eigenvalues!
Remark: There is another way to prove Proposition 8.5 that does not use Theorem 8.4, but instead uses the fact that given any field $K$, there is a field extension $\overline{K}$ of $K$ ($K \subseteq \overline{K}$) such that every polynomial $q(X) = c_0X^m + \cdots + c_{m-1}X + c_m$ (of degree $m \ge 1$) with coefficients $c_i \in K$ factors as
$$q(X) = c_0(X - \alpha_1)\cdots(X - \alpha_m), \quad \alpha_i \in \overline{K},\ i = 1, \ldots, m.$$
The field $\overline{K}$ is called an algebraically closed field (and an algebraic closure of $K$).
Assume that all eigenvalues $\lambda_1, \ldots, \lambda_n$ of $A$ belong to $K$. Let $q(X)$ be any polynomial (in $K[X]$) and let $\mu \in \overline{K}$ be any eigenvalue of $q(A)$ (this means that $\mu$ is a zero of the characteristic polynomial $\chi_{q(A)}(X) \in K[X]$ of $q(A)$. Since $\overline{K}$ is algebraically closed, $\chi_{q(A)}(X)$ has all its roots in $\overline{K}$). We claim that $\mu = q(\lambda_i)$ for some eigenvalue $\lambda_i$ of $A$.
Proof. (After Lax [71], Chapter 6). Since $\overline{K}$ is algebraically closed, the polynomial $\mu - q(X)$ factors as
$$\mu - q(X) = c_0(X - \alpha_1)\cdots(X - \alpha_m),$$
8.3 Location of Eigenvalues
If $A$ is an $n \times n$ complex (or real) matrix, it would be useful to know, even roughly, where the eigenvalues of $A$ are located in the complex plane $\mathbb{C}$. The Gershgorin discs provide some precise information about this.
Definition 8.4. For any complex $n \times n$ matrix $A$, for $i = 1, \ldots, n$, let
$$R'_i(A) = \sum_{\substack{j=1 \\ j \neq i}}^{n} |a_{ij}|$$
and let
$$G(A) = \bigcup_{i=1}^{n} \{z \in \mathbb{C} \mid |z - a_{ii}| \le R'_i(A)\}.$$
Each disc $\{z \in \mathbb{C} \mid |z - a_{ii}| \le R'_i(A)\}$ is called a Gershgorin disc and their union $G(A)$ is called the Gershgorin domain.
Although easy to prove, the following theorem is very useful:
Theorem 8.7. (Gershgorin's disc theorem) For any complex $n \times n$ matrix $A$, all the eigenvalues of $A$ belong to the Gershgorin domain $G(A)$. Furthermore the following properties hold:
(1) If $A$ is strictly row diagonally dominant, that is
$$|a_{ii}| > \sum_{j=1,\, j \neq i}^{n} |a_{ij}|, \quad\text{for } i = 1, \ldots, n,$$
then $A$ is invertible.
(2) If $A$ is strictly row diagonally dominant, and if $a_{ii} > 0$ for $i = 1, \ldots, n$, then every eigenvalue of $A$ has a strictly positive real part.
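Proof. Let $\lambda$ be any eigenvalue of $A$ and let $u$ be a corresponding eigenvector (recall that we must have $u \neq 0$). Let $k$ be an index such that
$$|u_k| = \max_{1 \le i \le n} |u_i|.$$
Since $Au = \lambda u$, we have
$$(\lambda - a_{kk})u_k = \sum_{\substack{j=1 \\ j \neq k}}^{n} a_{kj}u_j,$$
which implies that
$$|\lambda - a_{kk}|\,|u_k| \le \sum_{\substack{j=1 \\ j \neq k}}^{n} |a_{kj}|\,|u_j| \le |u_k| \sum_{\substack{j=1 \\ j \neq k}}^{n} |a_{kj}|.$$
Since $u \neq 0$ and $|u_k| = \max_{1 \le i \le n} |u_i|$, we must have $u_k \neq 0$, and it follows that
$$|\lambda - a_{kk}| \le \sum_{\substack{j=1 \\ j \neq k}}^{n} |a_{kj}| = R'_k(A),$$
and thus
$$\lambda \in \{z \in \mathbb{C} \mid |z - a_{kk}| \le R'_k(A)\} \subseteq G(A),$$
as claimed.
(1) Strict row diagonal dominance implies that 0 does not belong to any of the Gershgorin discs, so all eigenvalues of $A$ are nonzero, and $A$ is invertible.
(2) If $A$ is strictly row diagonally dominant and $a_{ii} > 0$ for $i = 1, \ldots, n$, then each of the Gershgorin discs lies strictly in the right half-plane, so every eigenvalue of $A$ has a strictly positive real part.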
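The theorem is easy to try out on a small matrix. The sketch below computes the row Gershgorin discs of an arbitrarily chosen $2 \times 2$ real matrix and checks that its eigenvalues, obtained from the quadratic formula applied to $X^2 - \mathrm{tr}(A)X + \det(A)$, lie in the Gershgorin domain. The function names and the tolerance `tol` are hypothetical choices for this illustration.

```python
import cmath

def gershgorin_discs(A):
    """Return the (center, radius) pairs of the row Gershgorin discs of A."""
    n = len(A)
    return [(A[i][i], sum(abs(A[i][j]) for j in range(n) if j != i)) for i in range(n)]

def in_gershgorin_domain(z, A, tol=1e-12):
    """True if z lies in at least one Gershgorin disc (up to a small tolerance)."""
    return any(abs(z - center) <= radius + tol for center, radius in gershgorin_discs(A))

def eigenvalues_2x2(A):
    """Roots of X^2 - tr(A) X + det(A), computed with the quadratic formula."""
    tr = A[0][0] + A[1][1]
    dt = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    s = cmath.sqrt(tr * tr - 4 * dt)
    return (tr + s) / 2, (tr - s) / 2

# Strictly row diagonally dominant with positive diagonal: 4 > 1 and 3 > 2.
A = [[4, 1], [-2, 3]]
discs = gershgorin_discs(A)
eigs = eigenvalues_2x2(A)
print(discs, eigs)
```

In this example the eigenvalues are complex conjugates lying in the disc centered at 3 but outside the disc centered at 4, which shows that the theorem only guarantees membership in the union of the discs; their real parts are positive, as predicted by part (2).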
In particular, Theorem 8.7 implies that if a symmetric matrix is strictly row diagonally dominant and has strictly positive diagonal entries, then it is positive definite. Theorem 8.7 is sometimes called the Gershgorin–Hadamard theorem.
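Since $A$ and $A^\top$ have the same eigenvalues (even for complex matrices) we also have a version of Theorem 8.7 for the discs of radius
$$C'_j(A) = \sum_{\substack{i=1 \\ i \neq j}}^{n} |a_{ij}|,$$
whose domain is the union of the discs $\{z \in \mathbb{C} \mid |z - a_{jj}| \le C'_j(A)\}$ (Theorem 8.8). All the eigenvalues of $A$ belong to this domain, and furthermore the following properties hold:
(1) If $A$ is strictly column diagonally dominant, that is
$$|a_{jj}| > \sum_{i=1,\, i \neq j}^{n} |a_{ij}|, \quad\text{for } j = 1, \ldots, n,$$
then $A$ is invertible.
(2) If $A$ is strictly column diagonally dominant, and if $a_{ii} > 0$ for $i = 1, \ldots, n$, then every eigenvalue of $A$ has a strictly positive real part.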
There are refinements of Gershgorin's theorem and eigenvalue location results involving other domains besides discs; for more on this subject, see Horn and Johnson [57], Sections 6.1 and 6.2.
Remark: Neither strict row diagonal dominance nor strict column diagonal dominance is necessary for invertibility. Also, if we relax all strict inequalities to inequalities, then row diagonal dominance (or column diagonal dominance) is not a sufficient condition for invertibility.
8.4 Summary
The main concepts and results of this chapter are listed below:
Diagonal matrix.
Eigenvalues, eigenvectors; the eigenspace associated with an eigenvalue.
The characteristic polynomial.
The trace.
Algebraic and geometric multiplicity.
Eigenspaces associated with distinct eigenvalues form a direct sum (Proposition 8.3).
Reduction of a matrix to an upper-triangular matrix.
Schur decomposition.
Gershgorin's discs can be used to locate the eigenvalues of a complex matrix; see Theorems 8.7 and 8.8.
Chapter 9
Iterative Methods for Solving Linear Systems
9.1
In Chapter 6 we have discussed some of the main methods for solving systems of linear equations. These methods are direct methods, in the sense that they yield exact solutions (assuming infinite precision!).
Another class of methods for solving linear systems consists in approximating solutions using iterative methods. The basic idea is this: Given a linear system $Ax = b$ (with $A$ a square invertible matrix), find another matrix $B$ and a vector $c$, such that
1. The matrix $I - B$ is invertible.
2. The unique solution $\widetilde{x}$ of the system $Ax = b$ is identical to the unique solution $\widetilde{u}$ of the system
$$u = Bu + c,$$
and then, starting from any vector $u_0$, compute the sequence $(u_k)$ given by
$$u_{k+1} = Bu_k + c, \quad k \in \mathbb{N}.$$
Under certain conditions (to be clarified soon), the sequence $(u_k)$ converges to a limit $\widetilde{u}$ which is the unique solution of $u = Bu + c$, and thus of $Ax = b$.
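The iteration $u_{k+1} = Bu_k + c$ can be sketched directly. In the example below, the matrix $B$ and the vector $c$ are arbitrary illustrative choices (not derived from any particular splitting method); with $A = I - B$ and $b = c$, the systems $Ax = b$ and $u = Bu + c$ have the same solution, namely $(4, 4)$. The function name `iterate` and the number of steps are hypothetical.

```python
def iterate(B, c, u0, steps):
    """Run u_{k+1} = B u_k + c for the given number of steps and return the last iterate."""
    u = list(u0)
    for _ in range(steps):
        u = [sum(bij * uj for bij, uj in zip(row, u)) + ci for row, ci in zip(B, c)]
    return u

# Illustrative data: the eigenvalues of B are 0.75 and 0.25, so the iterates converge.
B = [[0.5, 0.25], [0.25, 0.5]]
c = [1.0, 1.0]

u = iterate(B, c, [0.0, 0.0], 100)
print(u)

# The limit solves (I - B) u = c, i.e. 0.5*u1 - 0.25*u2 = 1 and -0.25*u1 + 0.5*u2 = 1.
residual = [ui - (sum(bij * uj for bij, uj in zip(row, u)) + ci)
            for ui, row, ci in zip(u, B, c)]
print(residual)
```

Starting from a different $u_0$ changes the early iterates but not the limit, which is the behavior the convergence conditions discussed next are meant to guarantee.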
Consequently, it is important to find conditions that ensure the convergence of the above sequences and to have tools to compare the rate of convergence of these sequences. Thus, we begin with some general results about the convergence of sequences of vectors and matrices.
Let $(E, \|\ \|)$ be a normed vector space. Recall that a sequence $(u_k)$ of vectors $u_k \in E$ converges to a limit $u \in E$, if for every $\epsilon > 0$, there is some natural number $N$ such that
$$\|u_k - u\| \le \epsilon, \quad\text{for all } k \ge N.$$
We write
$$u = \lim_{k \to \infty} u_k.$$
If $E$ is a finite-dimensional vector space and $\dim(E) = n$, we know from Theorem 7.3 that any two norms are equivalent, and if we choose the norm $\|\ \|_\infty$, we see that the convergence of the sequence of vectors $u_k$ is equivalent to the convergence of the $n$ sequences of scalars formed by the components of these vectors (over any basis). The same property applies to the finite-dimensional vector space $M_{m,n}(K)$ of $m \times n$ matrices (with $K = \mathbb{R}$ or $K = \mathbb{C}$), which means that the convergence of a sequence of matrices $A_k = (a_{ij}^{(k)})$ is equivalent to the convergence of the $m \times n$ sequences of scalars $(a_{ij}^{(k)})$, with $i, j$ fixed ($1 \le i \le m$, $1 \le j \le n$).
The first theorem below gives a necessary and sufficient condition for the sequence $(B^k)$ of powers of a matrix $B$ to converge to the zero matrix. Recall that the spectral radius $\rho(B)$ of a matrix $B$ is the maximum of the moduli $|\lambda_i|$ of the eigenvalues of $B$.
Theorem 9.1. For any square matrix $B$, the following conditions are equivalent:
(1) $\lim_{k \to \infty} B^k = 0$,
(2) $\lim_{k \to \infty} B^k v = 0$, for all vectors $v$,
(3) $\rho(B) < 1$,
(4) $\|B\| < 1$, for some subordinate matrix norm $\|\ \|$.
Proof. Assume (1) and let $\|\ \|$ be a vector norm on $E$ and $\|\ \|$ be the corresponding matrix norm. For every vector $v \in E$, because $\|\ \|$ is a matrix norm, we have
$$\|B^k v\| \le \|B^k\|\,\|v\|,$$
and since $\lim_{k \to \infty} B^k = 0$ means that $\lim_{k \to \infty} \|B^k\| = 0$, we conclude that $\lim_{k \to \infty} \|B^k v\| = 0$, that is, $\lim_{k \to \infty} B^k v = 0$. This proves that (1) implies (2).
Assume (2). If we had $\rho(B) \ge 1$, then there would be some eigenvector $u$ ($\neq 0$) and some eigenvalue $\lambda$ such that
$$Bu = \lambda u, \quad |\lambda| = \rho(B) \ge 1,$$
but then the sequence $(B^k u)$ would not converge to 0, because $B^k u = \lambda^k u$ and $|\lambda^k| = |\lambda|^k \ge 1$. It follows that (2) implies (3).
Assume that (3) holds, that is, $\rho(B) < 1$. By Proposition 7.10, we can find $\epsilon > 0$ small enough that $\rho(B) + \epsilon < 1$, and a subordinate matrix norm $\|\ \|$ such that
$$\|B\| \le \rho(B) + \epsilon,$$
which is (4).
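Condition (4) only asks for some subordinate norm with $\|B\| < 1$; a particular norm of $B$ may well exceed 1 even though the powers $B^k$ tend to zero. The sketch below illustrates this on an arbitrarily chosen $2 \times 2$ upper triangular matrix with $\rho(B) = 0.5$, using the norm $\|\ \|_\infty$ (largest absolute row sum). The helper names are hypothetical.

```python
def matmul(A, B):
    """Product of two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def matrix_power(B, k):
    """Compute B^k by repeated multiplication (k >= 0)."""
    n = len(B)
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(k):
        result = matmul(result, B)
    return result

def norm_inf(A):
    """Subordinate matrix norm for the infinity norm: largest absolute row sum."""
    return max(sum(abs(t) for t in row) for row in A)

# Upper triangular, so the eigenvalues are the diagonal entries: rho(B) = 0.5 < 1.
B = [[0.5, 10.0], [0.0, 0.5]]

norms = {k: norm_inf(matrix_power(B, k)) for k in (1, 5, 20, 100)}
print(norms)
```

The norms first stay above 1 (the off-diagonal entry of $B^k$ is $10k \cdot 0.5^{k-1}$) and only then decay, which shows that convergence of $B^k$ to zero is governed by the spectral radius and not by any single norm of $B$.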
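For every square matrix $B$ and every matrix norm $\|\ \|$, we have
$$\lim_{k \to \infty} \|B^k\|^{1/k} = \rho(B).$$
Proof. We know from Proposition 7.4 that $\rho(B) \le \|B\|$, and since $\rho(B) = (\rho(B^k))^{1/k}$, we deduce that
$$\rho(B) \le \|B^k\|^{1/k} \quad\text{for all } k \ge 1,$$
and so
$$\rho(B) \le \lim_{k \to \infty} \|B^k\|^{1/k}.$$
Now, let us prove that for every $\epsilon > 0$, there is some integer $N(\epsilon)$ such that
$$\|B^k\|^{1/k} \le \rho(B) + \epsilon \quad\text{for all } k \ge N(\epsilon),$$
which proves that
$$\lim_{k \to \infty} \|B^k\|^{1/k} \le \rho(B).$$
For any given $\epsilon > 0$, let $B_\epsilon$ be the matrix
$$B_\epsilon = \frac{B}{\rho(B) + \epsilon}.$$
Since $\rho(B_\epsilon) < 1$, Theorem 9.1 implies that $\lim_{k \to \infty} B_\epsilon^k = 0$. Consequently, there is some integer $N(\epsilon)$ such that for all $k \ge N(\epsilon)$, we have
$$\|B_\epsilon^k\| = \frac{\|B^k\|}{(\rho(B) + \epsilon)^k} \le 1,$$
which implies that $\|B^k\|^{1/k} \le \rho(B) + \epsilon$, as claimed.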
9.2
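Recall that iterative methods for solving a linear system $Ax = b$ (with $A$ invertible) consist in finding some matrix $B$ and some vector $c$, such that $I - B$ is invertible, and the unique solution $\tilde{x}$ of $Ax = b$ is equal to the unique solution $\tilde{u}$ of $u = Bu + c$. Then, starting from any vector $u_0$, compute the sequence $(u_k)$ given by
\[ u_{k+1} = Bu_k + c, \quad k \in \mathbb{N}, \]
and say that the iterative method is convergent iff
\[ \lim_{k \to \infty} u_k = \tilde{u} \]
for every initial vector $u_0$.
Define the error vector $e_k$ by
\[ e_k = u_k - \tilde{u}, \]
where $\tilde{u}$ is the unique solution of the system $u = Bu + c$. Clearly, the iterative method is convergent iff
\[ \lim_{k \to \infty} e_k = 0. \]
We claim that
\[ e_k = B^k e_0, \quad k \ge 0, \]
where $e_0 = u_0 - \tilde{u}$. This is proved by induction on $k$. The base case $k = 0$ is trivial. For the induction step, we have $u_{k+1} = Bu_k + c$, and because $\tilde{u} = B\tilde{u} + c$ and $e_k = B^k e_0$ (by the induction hypothesis), we obtain
\[ u_{k+1} - \tilde{u} = Bu_k - B\tilde{u} = B(u_k - \tilde{u}) = Be_k = BB^k e_0 = B^{k+1} e_0, \]
proving the induction step. Thus, the iterative method converges iff
\[ \lim_{k \to \infty} B^k e_0 = 0. \]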
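The identity $e_k = B^k e_0$ can be checked numerically. The sketch below uses hypothetical data (a $2 \times 2$ matrix $B$ with $\rho(B) = 0.5$ and a vector $c$ chosen so that the fixed point is $(3, 2)$); the function names are illustrative only.

```python
def mat_vec(B, v):
    """Matrix-vector product for B given as a list of rows."""
    return [sum(b * x for b, x in zip(row, v)) for row in B]

def iterate(B, c, u0, steps):
    """Return [u_0, ..., u_steps] for the recurrence u_{k+1} = B u_k + c."""
    seq = [list(u0)]
    for _ in range(steps):
        seq.append([x + y for x, y in zip(mat_vec(B, seq[-1]), c)])
    return seq

# Hypothetical data: rho(B) = 0.5 (B is triangular), and u_fixed solves u = B u + c.
B = [[0.5, 0.25],
     [0.0, 0.5]]
c = [1.0, 1.0]
u_fixed = [3.0, 2.0]

seq = iterate(B, c, [0.0, 0.0], 30)

# Compare e_k = u_k - u_fixed with B^k e_0, obtained by applying B repeatedly to e_0.
power_e0 = [u - s for u, s in zip(seq[0], u_fixed)]
max_gap = 0.0
for u_k in seq:
    e_k = [u - s for u, s in zip(u_k, u_fixed)]
    max_gap = max(max_gap, max(abs(a - b) for a, b in zip(e_k, power_e0)))
    power_e0 = mat_vec(B, power_e0)

final_error = max(abs(u - s) for u, s in zip(seq[-1], u_fixed))
print(max_gap, final_error)
```

Since $\rho(B) < 1$, the error after 30 steps is already very small, in agreement with Theorem 9.1.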
The next proposition is needed to compare the rate of convergence of iterative methods. It shows that asymptotically, the error vector $e_k = B^k e_0$ behaves at worst like $(\rho(B))^k$.
Proposition 9.4. Let $\|\cdot\|$ be any vector norm, let $B$ be a matrix such that $I - B$ is invertible, and let $\tilde{u}$ be the unique solution of $u = Bu + c$.
(1) If $(u_k)$ is any sequence defined iteratively by
\[ u_{k+1} = Bu_k + c, \quad k \in \mathbb{N}, \]
then
\[ \lim_{k \to \infty} \Big[ \sup_{\|u_0 - \tilde{u}\| = 1} \|u_k - \tilde{u}\|^{1/k} \Big] = \rho(B). \]
(2) Let $B_1$ and $B_2$ be two matrices such that $I - B_1$ and $I - B_2$ are invertible, assume that both $u = B_1 u + c_1$ and $u = B_2 u + c_2$ have the same unique solution $\tilde{u}$, and consider any two sequences $(u_k)$ and $(v_k)$ defined inductively by
\[ u_{k+1} = B_1 u_k + c_1, \qquad v_{k+1} = B_2 v_k + c_2, \]
with $u_0 = v_0$. If $\rho(B_1) < \rho(B_2)$, then for any $\epsilon > 0$, there is some integer $N(\epsilon)$, such that for all $k \ge N(\epsilon)$, we have
\[ \sup_{\|u_0 - \tilde{u}\| = 1} \left( \frac{\|v_k - \tilde{u}\|}{\|u_k - \tilde{u}\|} \right)^{1/k} \ge \frac{\rho(B_2)}{\rho(B_1) + \epsilon}. \]
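Proof. Let $\|\cdot\|$ also denote the subordinate matrix norm. Since $u_k - \tilde{u} = B^k e_0$ with $e_0 = u_0 - \tilde{u}$, the definition of the subordinate norm gives
\[ \sup_{\|e_0\| = 1} \|B^k e_0\|^{1/k} = \|B^k\|^{1/k}, \]
and statement (1) follows from Proposition 9.2.
For statement (2), because $u_0 = v_0$, we have
\[ u_k - \tilde{u} = B_1^k e_0, \qquad v_k - \tilde{u} = B_2^k e_0, \]
with $e_0 = u_0 - \tilde{u} = v_0 - \tilde{u}$. Again, by Proposition 9.2, for every $\epsilon > 0$, there is some natural number $N(\epsilon)$ such that if $k \ge N(\epsilon)$, then
\[ \sup_{\|e_0\| = 1} \|B_1^k e_0\|^{1/k} \le \rho(B_1) + \epsilon. \]
Furthermore, for all $k \ge N(\epsilon)$, there exists a vector $e_0 = e_0(k)$ such that
\[ \|e_0\| = 1 \quad \text{and} \quad \|B_2^k e_0\|^{1/k} = \|B_2^k\|^{1/k} \ge \rho(B_2), \]
which implies statement (2).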
In light of the above, we see that when we investigate new iterative methods, we have to deal with the following two problems:
1. Given an iterative method with matrix $B$, determine whether the method is convergent. This involves determining whether $\rho(B) < 1$, or equivalently whether there is a subordinate matrix norm such that $\|B\| < 1$. By Proposition 7.9, this implies that $I - B$ is invertible (since $\|-B\| = \|B\|$, Proposition 7.9 applies).
2. Given two convergent iterative methods, compare them. The iterative method which is faster is that whose matrix has the smaller spectral radius.
We now discuss three iterative methods for solving linear systems:
1. Jacobi's method
2. Gauss-Seidel's method
3. The relaxation method.
9.3
The methods described in this section are instances of the following scheme: Given a linear system $Ax = b$, with $A$ invertible, suppose we can write $A$ in the form
\[ A = M - N, \]
with $M$ invertible, and "easy to invert," which means that $M$ is close to being a diagonal or a triangular matrix (perhaps by blocks). Then, $Au = b$ is equivalent to
\[ Mu = Nu + b, \]
that is,
\[ u = M^{-1}Nu + M^{-1}b. \]
Therefore, we are in the situation described in the previous sections with $B = M^{-1}N$ and $c = M^{-1}b$. In fact, since $A = M - N$, we have
\[ B = M^{-1}N = M^{-1}(M - A) = I - M^{-1}A, \]
which shows that $I - B = M^{-1}A$ is invertible. The iterative method associated with the matrix $B = M^{-1}N$ is given by
\[ u_{k+1} = M^{-1}Nu_k + M^{-1}b, \quad k \ge 0, \]
starting from any arbitrary vector $u_0$. From a practical point of view, we do not invert $M$, and instead we solve iteratively the systems
\[ Mu_{k+1} = Nu_k + b, \quad k \ge 0. \]
Various methods correspond to various ways of choosing $M$ and $N$ from $A$. The first two methods choose $M$ and $N$ as disjoint submatrices of $A$, but the relaxation method allows some overlapping of $M$ and $N$.
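To describe the various choices of $M$ and $N$, it is convenient to write $A$ in terms of three submatrices $D, E, F$, as
\[ A = D - E - F, \]
where the only nonzero entries in $D$ are the diagonal entries of $A$, the only nonzero entries in $-E$ are the entries of $A$ below the diagonal, and the only nonzero entries in $-F$ are the entries of $A$ above the diagonal. More explicitly, if
\[ A = \begin{pmatrix}
a_{11} & a_{12} & a_{13} & \cdots & a_{1\,n-1} & a_{1n} \\
a_{21} & a_{22} & a_{23} & \cdots & a_{2\,n-1} & a_{2n} \\
a_{31} & a_{32} & a_{33} & \cdots & a_{3\,n-1} & a_{3n} \\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
a_{n-1\,1} & a_{n-1\,2} & a_{n-1\,3} & \cdots & a_{n-1\,n-1} & a_{n-1\,n} \\
a_{n1} & a_{n2} & a_{n3} & \cdots & a_{n\,n-1} & a_{nn}
\end{pmatrix}, \]
then
\[ D = \begin{pmatrix}
a_{11} & 0 & 0 & \cdots & 0 & 0 \\
0 & a_{22} & 0 & \cdots & 0 & 0 \\
0 & 0 & a_{33} & \cdots & 0 & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & 0 & \cdots & a_{n-1\,n-1} & 0 \\
0 & 0 & 0 & \cdots & 0 & a_{nn}
\end{pmatrix}, \]
\[ -E = \begin{pmatrix}
0 & 0 & 0 & \cdots & 0 & 0 \\
a_{21} & 0 & 0 & \cdots & 0 & 0 \\
a_{31} & a_{32} & 0 & \cdots & 0 & 0 \\
\vdots & \vdots & \ddots & \ddots & \vdots & \vdots \\
a_{n-1\,1} & a_{n-1\,2} & a_{n-1\,3} & \ddots & 0 & 0 \\
a_{n1} & a_{n2} & a_{n3} & \cdots & a_{n\,n-1} & 0
\end{pmatrix}, \quad
-F = \begin{pmatrix}
0 & a_{12} & a_{13} & \cdots & a_{1\,n-1} & a_{1n} \\
0 & 0 & a_{23} & \cdots & a_{2\,n-1} & a_{2n} \\
0 & 0 & 0 & \ddots & a_{3\,n-1} & a_{3n} \\
\vdots & \vdots & \vdots & \ddots & \ddots & \vdots \\
0 & 0 & 0 & \cdots & 0 & a_{n-1\,n} \\
0 & 0 & 0 & \cdots & 0 & 0
\end{pmatrix}. \]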
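In Jacobi's method, we assume that all diagonal entries in $A$ are nonzero, and we pick
\[ M = D, \qquad N = E + F, \]
so that
\[ B = M^{-1}N = D^{-1}(E + F) = I - D^{-1}A. \]
As a matter of notation, we let
\[ J = I - D^{-1}A = D^{-1}(E + F), \]
which is called Jacobi's matrix. The corresponding method, Jacobi's iterative method, computes the sequence $(u_k)$ using the recurrence
\[ u_{k+1} = D^{-1}(E + F)u_k + D^{-1}b, \quad k \ge 0. \]
In practice, we iteratively solve the systems
\[ Du_{k+1} = (E + F)u_k + b, \quad k \ge 0. \]
If we write $u_k = (u_1^k, \ldots, u_n^k)$, this is the system
\[ \begin{aligned}
a_{11} u_1^{k+1} &= -a_{12} u_2^k - a_{13} u_3^k - \cdots - a_{1n} u_n^k + b_1 \\
a_{22} u_2^{k+1} &= -a_{21} u_1^k - a_{23} u_3^k - \cdots - a_{2n} u_n^k + b_2 \\
&\ \ \vdots \\
a_{n-1\,n-1} u_{n-1}^{k+1} &= -a_{n-1\,1} u_1^k - \cdots - a_{n-1\,n-2} u_{n-2}^k - a_{n-1\,n} u_n^k + b_{n-1} \\
a_{nn} u_n^{k+1} &= -a_{n1} u_1^k - a_{n2} u_2^k - \cdots - a_{n\,n-1} u_{n-1}^k + b_n.
\end{aligned} \]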
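The Jacobi system above translates almost line by line into code. The following is a minimal pure-Python sketch; the test matrix (strictly diagonally dominant, tridiagonal) and the right-hand side are hypothetical and were chosen so that the exact solution is $(1, 2, 3)$.

```python
def jacobi(A, b, u0, steps):
    """Run `steps` Jacobi sweeps: every new component uses only old values u_j^k."""
    n = len(A)
    u = list(u0)
    for _ in range(steps):
        # Equation i: a_ii * u_i^{k+1} = b_i - sum_{j != i} a_ij * u_j^k
        u = [(b[i] - sum(A[i][j] * u[j] for j in range(n) if j != i)) / A[i][i]
             for i in range(n)]
    return u

# Hypothetical test system; A * (1, 2, 3) = (2, 4, 10).
A = [[4.0, -1.0, 0.0],
     [-1.0, 4.0, -1.0],
     [0.0, -1.0, 4.0]]
b = [2.0, 4.0, 10.0]

u = jacobi(A, b, [0.0, 0.0, 0.0], 50)
print(u)
```

Note that the whole vector $u_k$ must be kept until the sweep is finished, because every equation reads only old values.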
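Observe that we can try to "speed up" the method by using the new value $u_1^{k+1}$ instead of $u_1^k$ in solving for $u_2^{k+1}$ using the second equation, and more generally, use $u_1^{k+1}, \ldots, u_{i-1}^{k+1}$ instead of $u_1^k, \ldots, u_{i-1}^k$ in solving for $u_i^{k+1}$ in the $i$th equation. This observation leads to the system
\[ \begin{aligned}
a_{11} u_1^{k+1} &= -a_{12} u_2^k - a_{13} u_3^k - \cdots - a_{1n} u_n^k + b_1 \\
a_{22} u_2^{k+1} &= -a_{21} u_1^{k+1} - a_{23} u_3^k - \cdots - a_{2n} u_n^k + b_2 \\
&\ \ \vdots \\
a_{n-1\,n-1} u_{n-1}^{k+1} &= -a_{n-1\,1} u_1^{k+1} - \cdots - a_{n-1\,n-2} u_{n-2}^{k+1} - a_{n-1\,n} u_n^k + b_{n-1} \\
a_{nn} u_n^{k+1} &= -a_{n1} u_1^{k+1} - a_{n2} u_2^{k+1} - \cdots - a_{n\,n-1} u_{n-1}^{k+1} + b_n,
\end{aligned} \]
which, in matrix form, is written
\[ (D - E)u_{k+1} = Fu_k + b, \quad k \ge 0. \]
This is the method of Gauss-Seidel. It corresponds to the choice $M = D - E$ and $N = F$, and its matrix is the Gauss-Seidel matrix $L_1 = (D - E)^{-1}F$. Only the values $u_1^{k+1}, \ldots, u_{i-1}^{k+1}, u_{i+1}^k, \ldots, u_n^k$ are needed to compute $u_i^{k+1}$. We also show that in certain important cases (for example, if $A$ is a tridiagonal matrix), the method of Gauss-Seidel converges faster than Jacobi's method (in this case, they both converge or diverge simultaneously).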
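A minimal sketch of the Gauss-Seidel sweep, in pure Python and on a hypothetical test system whose exact solution is $(1, 2, 3)$: the only change with respect to a Jacobi sweep is that the vector is updated in place, so equation $i$ automatically sees the new values $u_1^{k+1}, \ldots, u_{i-1}^{k+1}$.

```python
def gauss_seidel(A, b, u0, steps):
    """Run `steps` Gauss-Seidel sweeps, overwriting u component by component."""
    n = len(A)
    u = list(u0)
    for _ in range(steps):
        for i in range(n):
            # u[j] already holds u_j^{k+1} for j < i and still holds u_j^k for j > i.
            s = sum(A[i][j] * u[j] for j in range(n) if j != i)
            u[i] = (b[i] - s) / A[i][i]
    return u

# Hypothetical test system; A * (1, 2, 3) = (2, 4, 10).
A = [[4.0, -1.0, 0.0],
     [-1.0, 4.0, -1.0],
     [0.0, -1.0, 4.0]]
b = [2.0, 4.0, 10.0]

u = gauss_seidel(A, b, [0.0, 0.0, 0.0], 25)
print(u)
```

Because the update is done in place, a single vector of storage suffices.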
The new ingredient in the relaxation method is to incorporate part of the matrix $D$ into $N$: we define $M$ and $N$ by
\[ M = \frac{D}{\omega} - E, \qquad N = \frac{1 - \omega}{\omega} D + F, \]
where $\omega \ne 0$ is a real parameter to be suitably chosen. Actually, we show in Section 9.4 that for the relaxation method to converge, we must have $\omega \in (0, 2)$. Note that the case $\omega = 1$ corresponds to the method of Gauss-Seidel.
If we assume that all diagonal entries of $D$ are nonzero, the matrix $M$ is invertible. The matrix $B$ is denoted by $L_\omega$ and called the matrix of relaxation, with
\[ L_\omega = \left( \frac{D}{\omega} - E \right)^{-1} \left( \frac{1 - \omega}{\omega} D + F \right) = (D - \omega E)^{-1}((1 - \omega)D + \omega F). \]
The number $\omega$ is called the parameter of relaxation. When $\omega > 1$, the relaxation method is known as successive overrelaxation, abbreviated as SOR.
At first glance, the relaxation matrix $L_\omega$ seems a lot more complicated than the Gauss-Seidel matrix $L_1$, but the iterative system associated with the relaxation method is very similar to the method of Gauss-Seidel, and is quite simple. Indeed, the system associated with the relaxation method is given by
\[ \left( \frac{D}{\omega} - E \right) u_{k+1} = \left( \frac{1 - \omega}{\omega} D + F \right) u_k + b, \]
which is equivalent to
\[ (D - \omega E)u_{k+1} = ((1 - \omega)D + \omega F)u_k + \omega b, \]
and can be written
\[ Du_{k+1} = Du_k - \omega(Du_k - Eu_{k+1} - Fu_k - b). \]
Explicitly, this is the system
\[ \begin{aligned}
a_{11} u_1^{k+1} &= a_{11} u_1^k - \omega(a_{11} u_1^k + a_{12} u_2^k + a_{13} u_3^k + \cdots + a_{1n} u_n^k - b_1) \\
a_{22} u_2^{k+1} &= a_{22} u_2^k - \omega(a_{21} u_1^{k+1} + a_{22} u_2^k + a_{23} u_3^k + \cdots + a_{2n} u_n^k - b_2) \\
&\ \ \vdots \\
a_{nn} u_n^{k+1} &= a_{nn} u_n^k - \omega(a_{n1} u_1^{k+1} + a_{n2} u_2^{k+1} + \cdots + a_{n\,n-1} u_{n-1}^{k+1} + a_{nn} u_n^k - b_n).
\end{aligned} \]
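The explicit relaxation system can be sketched in a few lines of pure Python. The function name `sor`, the test system (exact solution $(1, 2, 3)$), and the value $\omega = 1.1$ are assumptions made for this illustration.

```python
def sor(A, b, u0, omega, steps):
    """Run `steps` relaxation sweeps with parameter omega (omega = 1 is Gauss-Seidel)."""
    n = len(A)
    u = list(u0)
    for _ in range(steps):
        for i in range(n):
            # Residual of equation i, using new values for j < i and old values for j >= i.
            r = sum(A[i][j] * u[j] for j in range(n)) - b[i]
            # a_ii * u_i^{k+1} = a_ii * u_i^k - omega * r
            u[i] -= omega * r / A[i][i]
    return u

# Hypothetical test system; A is symmetric positive definite and A * (1, 2, 3) = (2, 4, 10).
A = [[4.0, -1.0, 0.0],
     [-1.0, 4.0, -1.0],
     [0.0, -1.0, 4.0]]
b = [2.0, 4.0, 10.0]

u = sor(A, b, [0.0, 0.0, 0.0], 1.1, 60)
print(u)
```

With `omega = 1.0` each sweep coincides with a Gauss-Seidel sweep, which matches the remark that $\omega = 1$ corresponds to the method of Gauss-Seidel.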
What remains to be done is to find conditions that ensure the convergence of the relaxation method (and the Gauss-Seidel method), that is:
1. Find conditions on $\omega$, namely some interval $I \subseteq \mathbb{R}$ so that $\omega \in I$ implies $\rho(L_\omega) < 1$; we will prove that $\omega \in (0, 2)$ is a necessary condition.
2. Find if there exists some optimal value $\omega_0$ of $\omega \in I$, so that
\[ \rho(L_{\omega_0}) = \inf_{\omega \in I} \rho(L_\omega). \]
We will give partial answers to the above questions in the next section.
It is also possible to extend the methods of this section by using block decompositions of the form $A = D - E - F$, where $D$, $E$, and $F$ consist of blocks, and with $D$ an invertible block-diagonal matrix.
9.4
We begin with a general criterion for the convergence of an iterative method associated with a (complex) Hermitian positive definite matrix $A = M - N$. Next, we apply this result to the relaxation method.
Proposition 9.5. Let $A$ be any Hermitian positive definite matrix, written as
\[ A = M - N, \]
with $M$ invertible. Then, $M^* + N$ is Hermitian, and if it is positive definite, then
\[ \rho(M^{-1}N) < 1, \]
so that the iterative method converges.
Proof. Since $M = A + N$ and $A$ is Hermitian, $A^* = A$, so we get
\[ M^* + N = A^* + N^* + N = A + N + N^* = M + N^* = (M^* + N)^*, \]
which shows that $M^* + N$ is indeed Hermitian.
Because $A$ is Hermitian positive definite, the function
\[ v \mapsto (v^* A v)^{1/2} \]
from $\mathbb{C}^n$ to $\mathbb{R}$ is a vector norm $\|\cdot\|$, and let $\|\cdot\|$ also denote its subordinate matrix norm. We prove that
\[ \|M^{-1}N\| < 1, \]
which, by Theorem 9.1, proves that $\rho(M^{-1}N) < 1$. By definition,
\[ \|M^{-1}N\| = \|I - M^{-1}A\| = \sup_{\|v\| = 1} \|v - M^{-1}Av\|, \]
which leads us to evaluate $\|v - M^{-1}Av\|$ when $\|v\| = 1$. If we write $w = M^{-1}Av$, using the facts that $\|v\| = 1$, $v = A^{-1}Mw$, $A^* = A$, and $A = M - N$, we have
\[ \begin{aligned}
\|v - w\|^2 &= (v - w)^* A (v - w) \\
&= \|v\|^2 - v^* A w - w^* A v + w^* A w \\
&= 1 - w^* M^* w - w^* M w + w^* A w \\
&= 1 - w^* (M^* + N) w.
\end{aligned} \]
Now, since we assumed that $M^* + N$ is positive definite, if $w \ne 0$, then $w^*(M^* + N)w > 0$, and we conclude that
\[ \text{if } \|v\| = 1 \text{ then } \|v - M^{-1}Av\| < 1. \]
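Finally, the function $v \mapsto \|v - M^{-1}Av\|$ is continuous, so it achieves its maximum on the compact unit sphere $\{v \in \mathbb{C}^n \mid \|v\| = 1\}$, which proves that
\[ \sup_{\|v\| = 1} \|v - M^{-1}Av\| < 1, \]
and completes the proof.
Theorem 9.6. (Ostrowski-Reich) If $A = D - E - F$ is Hermitian positive definite and if $0 < \omega < 2$, then the relaxation method converges.
Proof. Recall that for the relaxation method, $A = M - N$ with
\[ M = \frac{D}{\omega} - E, \qquad N = \frac{1 - \omega}{\omega} D + F, \]
and because $D^* = D$, $E^* = F$ (since $A$ is Hermitian) and $\omega \ne 0$ is real, we have
\[ M^* + N = \frac{D^*}{\omega} - E^* + \frac{1 - \omega}{\omega} D + F = \frac{2 - \omega}{\omega} D. \]
If $D$ consists of the diagonal entries of $A$, then we know from Section 6.3 that these entries are all positive, and since $\omega \in (0, 2)$, we see that the matrix $((2 - \omega)/\omega)D$ is positive definite. If $D$ consists of diagonal blocks of $A$, because $A$ is positive definite, by choosing vectors $z$ obtained by picking a nonzero vector for each block of $D$ and padding with zeros, we see that each block of $D$ is positive definite, and thus $D$ itself is positive definite. Therefore, in all cases, $M^* + N$ is positive definite, and we conclude by using Proposition 9.5.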
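If we allow the parameter $\omega$ to be a nonzero complex number $\omega \in \mathbb{C}$, then
\[ M^* + N = \frac{D^*}{\overline{\omega}} - E^* + \frac{1 - \omega}{\omega} D + F = \left( \frac{1}{\omega} + \frac{1}{\overline{\omega}} - 1 \right) D. \]
But,
\[ \frac{1}{\omega} + \frac{1}{\overline{\omega}} - 1 = \frac{\omega + \overline{\omega} - \omega\overline{\omega}}{\omega\overline{\omega}} = \frac{1 - (\omega - 1)(\overline{\omega} - 1)}{|\omega|^2} = \frac{1 - |\omega - 1|^2}{|\omega|^2}, \]
so the relaxation method also converges for $\omega \in \mathbb{C}$, provided that
\[ |\omega - 1| < 1. \]
This condition reduces to $0 < \omega < 2$ if $\omega$ is real.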
Unfortunately, Theorem 9.6 does not apply to Jacobi's method, but in special cases, Proposition 9.5 can be used to prove its convergence. On the positive side, if a matrix is strictly column (or row) diagonally dominant, then it can be shown that the method of Jacobi and the method of Gauss-Seidel both converge. The relaxation method also converges if $\omega \in (0, 1]$, but this is not a very useful result because the speed-up of convergence usually occurs for $\omega > 1$.
We now prove that, without any assumption on $A = D - E - F$, other than the fact that $A$ and $D$ are invertible, in order for the relaxation method to converge, we must have $\omega \in (0, 2)$.
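Proposition 9.7. Given any matrix $A = D - E - F$, with $A$ and $D$ invertible, for any $\omega \ne 0$, we have
\[ \rho(L_\omega) \ge |\omega - 1|. \]
Therefore, the relaxation method (possibly by blocks) does not converge unless $\omega \in (0, 2)$. If we allow $\omega$ to be complex, then we must have
\[ |\omega - 1| < 1 \]
for the relaxation method to converge.
Proof. Observe that the product $\lambda_1 \cdots \lambda_n$ of the eigenvalues of $L_\omega$, which is equal to $\det(L_\omega)$, is given by
\[ \lambda_1 \cdots \lambda_n = \det(L_\omega) = \frac{\det\left( \frac{1 - \omega}{\omega} D + F \right)}{\det\left( \frac{D}{\omega} - E \right)} = (1 - \omega)^n, \]
because both matrices are triangular with diagonal parts $\frac{1 - \omega}{\omega} D$ and $\frac{1}{\omega} D$. It follows that
\[ \rho(L_\omega) \ge |\lambda_1 \cdots \lambda_n|^{1/n} = |\omega - 1|. \]
The proof is the same if $\omega \in \mathbb{C}$.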
We now consider the case where $A$ is a tridiagonal matrix, possibly by blocks. In this case, we obtain precise results about the spectral radius of $J$ and $L_\omega$, and as a consequence, about the convergence of these methods. We also obtain some information about the rate of convergence of these methods. We begin with the case $\omega = 1$, which is technically easier to deal with. The following proposition gives us the precise relationship between the spectral radii $\rho(J)$ and $\rho(L_1)$ of the Jacobi matrix and the Gauss-Seidel matrix.
Proposition 9.8. Let $A$ be a tridiagonal matrix (possibly by blocks). If $\rho(J)$ is the spectral radius of the Jacobi matrix and $\rho(L_1)$ is the spectral radius of the Gauss-Seidel matrix, then we have
\[ \rho(L_1) = (\rho(J))^2. \]
Consequently, the method of Jacobi and the method of Gauss-Seidel both converge or both diverge simultaneously (even when $A$ is tridiagonal by blocks); when they converge, the method of Gauss-Seidel converges faster than Jacobi's method.
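Proof. We begin with a preliminary result. Let $A(\mu)$ be a tridiagonal matrix by blocks of the form
\[ A(\mu) = \begin{pmatrix}
A_1 & \mu^{-1} C_1 & 0 & \cdots & 0 \\
\mu B_1 & A_2 & \mu^{-1} C_2 & \ddots & \vdots \\
0 & \ddots & \ddots & \ddots & 0 \\
\vdots & \ddots & \mu B_{p-2} & A_{p-1} & \mu^{-1} C_{p-1} \\
0 & \cdots & 0 & \mu B_{p-1} & A_p
\end{pmatrix}, \]
then
\[ \det(A(\mu)) = \det(A(1)), \quad \mu \ne 0. \]
Indeed, if $Q(\mu)$ is the block-diagonal matrix $\mathrm{diag}(\mu I, \mu^2 I, \ldots, \mu^p I)$, then $A(\mu) = Q(\mu) A(1) Q(\mu)^{-1}$.
Next, the eigenvalues of $J = D^{-1}(E + F)$ are the zeros of the polynomial $q_J(\lambda) = \det(\lambda D - E - F)$, and the eigenvalues of $L_1 = (D - E)^{-1}F$ are the zeros of the polynomial $q_{L_1}(\lambda) = \det(\lambda D - \lambda E - F)$.
Since $A$ is tridiagonal (or tridiagonal by blocks), using our preliminary result with $\mu = \lambda \ne 0$, we get
\[ q_{L_1}(\lambda^2) = \det(\lambda^2 D - \lambda^2 E - F) = \det(\lambda^2 D - \lambda E - \lambda F) = \lambda^n q_J(\lambda). \]
By continuity, the above equation also holds for $\lambda = 0$. But then, we deduce that:
1. For any $\beta \ne 0$, if $\beta$ is an eigenvalue of $L_1$, then $\beta^{1/2}$ and $-\beta^{1/2}$ are both eigenvalues of $J$, where $\beta^{1/2}$ is one of the complex square roots of $\beta$.
2. For any $\alpha \ne 0$, if $\alpha$ and $-\alpha$ are both eigenvalues of $J$, then $\alpha^2$ is an eigenvalue of $L_1$.
The above immediately implies that $\rho(L_1) = (\rho(J))^2$.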
We now consider the more general situation where $\omega$ is any real in $(0, 2)$.
Proposition 9.9. Let $A$ be a tridiagonal matrix (possibly by blocks), and assume that the eigenvalues of the Jacobi matrix are all real. If $\omega \in (0, 2)$, then the method of Jacobi and the method of relaxation both converge or both diverge simultaneously (even when $A$ is tridiagonal by blocks). When they converge, the function $\omega \mapsto \rho(L_\omega)$ (for $\omega \in (0, 2)$) has a unique minimum equal to $\omega_0 - 1$ for
\[ \omega_0 = \frac{2}{1 + \sqrt{1 - (\rho(J))^2}}, \]
where $1 < \omega_0 < 2$ if $\rho(J) > 0$. We also have $\rho(L_1) = (\rho(J))^2$, as before.
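Proof. The proof is very technical and can be found in Serre [96] and Ciarlet [24]. As in the proof of the previous proposition, we begin by showing that the eigenvalues of the matrix $L_\omega$ are the zeros of the polynomial
\[ q_{L_\omega}(\lambda) = \det\left( \frac{\lambda + \omega - 1}{\omega} D - \lambda E - F \right) = \det\left( \frac{D}{\omega} - E \right) p_{L_\omega}(\lambda), \]
where $p_{L_\omega}(\lambda)$ is the characteristic polynomial of $L_\omega$. Then, using the preliminary fact from Proposition 9.8, it is easy to show that
\[ q_{L_\omega}(\lambda^2) = \lambda^n q_J\left( \frac{\lambda^2 + \omega - 1}{\lambda\omega} \right), \]
for all $\lambda \in \mathbb{C}$, with $\lambda \ne 0$. This time, we cannot extend the above equation to $\lambda = 0$. This leads us to consider the equation
\[ \frac{\lambda^2 + \omega - 1}{\lambda\omega} = \alpha, \]
which is equivalent to
\[ \lambda^2 - \alpha\omega\lambda + \omega - 1 = 0, \]
for all $\lambda \ne 0$. Since $\lambda \ne 0$, the above equivalence does not hold for $\omega = 1$, but this is not a problem since the case $\omega = 1$ has already been considered in the previous proposition. Then, we can show the following:
1. For every $\beta \ne 0$, if $\beta$ is an eigenvalue of $L_\omega$, then
\[ \frac{\beta + \omega - 1}{\beta^{1/2}\omega}, \qquad -\frac{\beta + \omega - 1}{\beta^{1/2}\omega} \]
are eigenvalues of $J$.
2. For every $\alpha \ne 0$, if $\alpha$ and $-\alpha$ are eigenvalues of $J$, then $\mu_+(\alpha, \omega)$ and $\mu_-(\alpha, \omega)$ are eigenvalues of $L_\omega$, where $\mu_+(\alpha, \omega)$ and $\mu_-(\alpha, \omega)$ are the squares of the roots of the equation
\[ \lambda^2 - \alpha\omega\lambda + \omega - 1 = 0. \]
It follows that
\[ \rho(L_\omega) = \max_{\alpha \mid p_J(\alpha) = 0} \{ \max(|\mu_+(\alpha, \omega)|, |\mu_-(\alpha, \omega)|) \}, \]
and since we are assuming that $J$ has real roots, we are led to study the function
\[ M(\alpha, \omega) = \max\{ |\mu_+(\alpha, \omega)|, |\mu_-(\alpha, \omega)| \}, \]
where $\alpha \in \mathbb{R}$ and $\omega \in (0, 2)$. Actually, because $M(-\alpha, \omega) = M(\alpha, \omega)$, it is only necessary to consider the case where $\alpha \ge 0$.
Note that for $\alpha \ne 0$, the roots of the equation
\[ \lambda^2 - \alpha\omega\lambda + \omega - 1 = 0 \]
are
\[ \frac{\alpha\omega \pm \sqrt{\alpha^2\omega^2 - 4\omega + 4}}{2}. \]
In turn, this leads to consider the roots of the equation
\[ \omega^2\alpha^2 - 4\omega + 4 = 0, \]
which are
\[ \frac{2(1 \pm \sqrt{1 - \alpha^2})}{\alpha^2}, \]
for $\alpha \ne 0$. Since we have
\[ \frac{2(1 + \sqrt{1 - \alpha^2})}{\alpha^2} = \frac{2(1 + \sqrt{1 - \alpha^2})(1 - \sqrt{1 - \alpha^2})}{\alpha^2(1 - \sqrt{1 - \alpha^2})} = \frac{2}{1 - \sqrt{1 - \alpha^2}} \]
and
\[ \frac{2(1 - \sqrt{1 - \alpha^2})}{\alpha^2} = \frac{2(1 + \sqrt{1 - \alpha^2})(1 - \sqrt{1 - \alpha^2})}{\alpha^2(1 + \sqrt{1 - \alpha^2})} = \frac{2}{1 + \sqrt{1 - \alpha^2}}, \]
these roots are
\[ \omega_0(\alpha) = \frac{2}{1 + \sqrt{1 - \alpha^2}}, \qquad \omega_1(\alpha) = \frac{2}{1 - \sqrt{1 - \alpha^2}}. \]
Observe that the expression for $\omega_0(\alpha)$ is exactly the expression in the statement of our proposition! The rest of the proof consists in analyzing the variations of the function $M(\alpha, \omega)$ by considering various cases for $\alpha$. In the end, we find that the minimum of $\rho(L_\omega)$ is obtained for $\omega_0(\rho(J))$. The details are tedious and we omit them. The reader will find complete proofs in Serre [96] and Ciarlet [24].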
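Combining the results of Theorem 9.6 and Proposition 9.9, we obtain the following result which gives precise information about the spectral radii of the matrices $J$, $L_1$, and $L_\omega$.
Proposition 9.10. Let $A$ be a tridiagonal matrix (possibly by blocks) which is Hermitian positive definite. Then, the methods of Jacobi, Gauss-Seidel, and relaxation all converge for $\omega \in (0, 2)$. There is a unique optimal relaxation parameter
\[ \omega_0 = \frac{2}{1 + \sqrt{1 - (\rho(J))^2}} \]
such that
\[ \rho(L_{\omega_0}) = \inf_{0 < \omega < 2} \rho(L_\omega) = \omega_0 - 1. \]
Proof. In order to apply Proposition 9.9, we have to check that the eigenvalues of $J = D^{-1}(E + F)$ are real. If $\alpha$ is an eigenvalue of $J$ with eigenvector $u \ne 0$, then $(E + F)u = \alpha Du$, so $Au = (D - E - F)u = (1 - \alpha)Du$, and thus
\[ u^* A u = (1 - \alpha)\, u^* D u, \]
and since $A$ and $D$ are Hermitian positive definite, we have $u^* A u > 0$ and $u^* D u > 0$ if $u \ne 0$, which proves that $\alpha \in \mathbb{R}$. The rest follows from Theorem 9.6 and Proposition 9.9.
Remark: It is preferable to overestimate rather than underestimate the relaxation parameter when the optimum relaxation parameter is not known exactly.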
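To get a feeling for the formula of Proposition 9.10, the sketch below evaluates it on an assumed model problem that is not discussed in the text: the $n \times n$ tridiagonal matrix with $2$ on the diagonal and $-1$ on the two adjacent diagonals. For this matrix the Jacobi matrix has eigenvalues $\cos(j\pi/(n+1))$, $j = 1, \ldots, n$, so $\rho(J) = \cos(\pi/(n+1))$. The function name `optimal_relaxation` is made up for this example.

```python
import math

def optimal_relaxation(rho_j):
    """omega_0 = 2 / (1 + sqrt(1 - rho(J)^2)), for 0 <= rho(J) < 1."""
    return 2.0 / (1.0 + math.sqrt(1.0 - rho_j ** 2))

# Assumed model problem: A = tridiag(-1, 2, -1) of size n, which is
# Hermitian positive definite with rho(J) = cos(pi / (n + 1)).
n = 20
rho_jacobi = math.cos(math.pi / (n + 1))
rho_gauss_seidel = rho_jacobi ** 2          # Proposition 9.8
omega0 = optimal_relaxation(rho_jacobi)
rho_sor = omega0 - 1.0                      # Proposition 9.10

print(rho_jacobi, rho_gauss_seidel, omega0, rho_sor)
```

For this example the three spectral radii are strictly ordered, $\rho(L_{\omega_0}) < \rho(L_1) < \rho(J) < 1$, and $\omega_0$ lies strictly between $1$ and $2$.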
9.5
Summary
The main concepts and results of this chapter are listed below:
• Iterative methods. Splitting $A$ as $A = M - N$.
• Convergence of a sequence of vectors or matrices.
• A criterion for the convergence of the sequence $(B^k)$ of powers of a matrix $B$ to zero in terms of the spectral radius $\rho(B)$.
• A characterization of the spectral radius $\rho(B)$ as the limit of the sequence $(\|B^k\|^{1/k})$.
• A criterion of the convergence of iterative methods.
• Asymptotic behavior of iterative methods.
• Splitting $A$ as $A = D - E - F$, and the methods of Jacobi, Gauss-Seidel, and relaxation (and SOR).
• The Jacobi matrix, $J = D^{-1}(E + F)$.
• The Gauss-Seidel matrix, $L_1 = (D - E)^{-1}F$.
• The matrix of relaxation, $L_\omega = (D - \omega E)^{-1}((1 - \omega)D + \omega F)$.
• Convergence of iterative methods: a general result when $A = M - N$ is Hermitian positive definite.
• A sufficient condition for the convergence of the methods of Jacobi, Gauss-Seidel, and relaxation. The Ostrowski-Reich Theorem: $A$ is Hermitian positive definite, and $\omega \in (0, 2)$.
• A necessary condition for the convergence of the methods of Jacobi, Gauss-Seidel, and relaxation: $\omega \in (0, 2)$.
• The case of tridiagonal matrices (possibly by blocks). Simultaneous convergence or divergence of Jacobi's method and Gauss-Seidel's method, and comparison of the spectral radii $\rho(J)$ and $\rho(L_1)$: $\rho(L_1) = (\rho(J))^2$.
• The case of tridiagonal, Hermitian positive definite matrices (possibly by blocks). The methods of Jacobi, Gauss-Seidel, and relaxation all converge.
• In the above case, there is a unique optimal relaxation parameter $\omega_0$ for which $\rho(L_{\omega_0}) < \rho(L_1) = (\rho(J))^2 < \rho(J)$ (if $\rho(J) \ne 0$).
Chapter 10
Euclidean Spaces
Rien n'est beau que le vrai. (Nothing is beautiful but the truth.)
Hermann Minkowski
10.1
So far, the framework of vector spaces allows us to deal with ratios of vectors and linear
combinations, but there is no way to express the notion of length of a line segment or to talk
about orthogonality of vectors. A Euclidean structure allows us to deal with metric notions
such as orthogonality and length (or distance).
This chapter covers the bare bones of Euclidean geometry. Deeper aspects of Euclidean
geometry are investigated in Chapter 11. One of our main goals is to give the basic properties
of the transformations that preserve the Euclidean structure, rotations and reflections, since
they play an important role in practice. Euclidean geometry is the study of properties
invariant under certain affine maps called rigid motions. Rigid motions are the maps that
preserve the distance between points.
We begin by defining inner products and Euclidean spaces. The Cauchy-Schwarz inequality and the Minkowski inequality are shown. We define orthogonality of vectors and of subspaces, orthogonal bases, and orthonormal bases. We prove that every finite-dimensional Euclidean space has orthonormal bases. The first proof uses duality, and the second one the Gram-Schmidt orthogonalization procedure. The QR-decomposition for invertible matrices is shown as an application of the Gram-Schmidt procedure. Linear isometries (also called orthogonal transformations) are defined and studied briefly. We conclude with a short section in which some applications of Euclidean geometry are sketched. One of the most important applications, the method of least squares, is discussed in Chapter 17.
For a more detailed treatment of Euclidean geometry, see Berger [8, 9], Snapper and Troyer [99], or any other book on geometry, such as Pedoe [88], Coxeter [26], Fresnel [40], Tisseron [109], or Cagnac, Ramis, and Commeau [19]. Serious readers should consult Emil Artin's famous book [3], which contains an in-depth study of the orthogonal group, as well as other groups arising in geometry. It is still worth consulting some of the older classics, such as Hadamard [53, 54] and Rouché and de Comberousse [89]. The first edition of [53] was published in 1898, and finally reached its thirteenth edition in 1947! In this chapter it is assumed that all vector spaces are defined over the field $\mathbb{R}$ of real numbers unless specified otherwise (in a few cases, over the complex numbers $\mathbb{C}$).
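First, we define a Euclidean structure on a vector space. Technically, a Euclidean structure over a vector space $E$ is provided by a symmetric bilinear form on the vector space satisfying some extra properties. Recall that a bilinear form $\varphi \colon E \times E \to \mathbb{R}$ is definite if for every $u \in E$, $u \ne 0$ implies that $\varphi(u, u) \ne 0$, and positive if for every $u \in E$, $\varphi(u, u) \ge 0$.
Definition 10.1. A Euclidean space is a real vector space $E$ equipped with a symmetric bilinear form $\varphi \colon E \times E \to \mathbb{R}$ that is positive definite. More explicitly, $\varphi \colon E \times E \to \mathbb{R}$ satisfies the following axioms:
\[ \begin{aligned}
\varphi(u_1 + u_2, v) &= \varphi(u_1, v) + \varphi(u_2, v), \\
\varphi(u, v_1 + v_2) &= \varphi(u, v_1) + \varphi(u, v_2), \\
\varphi(\lambda u, v) &= \lambda \varphi(u, v), \\
\varphi(u, \lambda v) &= \lambda \varphi(u, v), \\
\varphi(u, v) &= \varphi(v, u), \\
u \ne 0 &\text{ implies that } \varphi(u, u) > 0.
\end{aligned} \]
The real number $\varphi(u, v)$ is also called the inner product (or scalar product) of $u$ and $v$. We also define the quadratic form associated with $\varphi$ as the function $\Phi \colon E \to \mathbb{R}_+$ such that
\[ \Phi(u) = \varphi(u, u), \]
for all $u \in E$.
Since $\varphi$ is bilinear, we have $\varphi(0, 0) = 0$, and since it is positive definite, we have the stronger fact that
\[ \varphi(u, u) = 0 \quad \text{iff} \quad u = 0, \]
that is, $\Phi(u) = 0$ iff $u = 0$.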
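Given an inner product $\varphi \colon E \times E \to \mathbb{R}$ on a vector space $E$, we also denote $\varphi(u, v)$ by
\[ u \cdot v \quad \text{or} \quad \langle u, v \rangle \quad \text{or} \quad (u \mid v), \]
and $\sqrt{\Phi(u)}$ by $\|u\|$.
Example 10.1. The standard example of a Euclidean space is $\mathbb{R}^n$, under the inner product $\cdot$ defined such that
\[ (x_1, \ldots, x_n) \cdot (y_1, \ldots, y_n) = x_1 y_1 + x_2 y_2 + \cdots + x_n y_n. \]
This Euclidean space is denoted by $\mathbb{E}^n$.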
Example 10.2. Let $E$ be a vector space of dimension 2, and let $(e_1, e_2)$ be a basis of $E$. If $a > 0$ and $b^2 - ac < 0$, the bilinear form defined such that
\[ \varphi(x_1 e_1 + y_1 e_2, x_2 e_1 + y_2 e_2) = a x_1 x_2 + b(x_1 y_2 + x_2 y_1) + c y_1 y_2 \]
yields a Euclidean structure on $E$. In this case,
\[ \Phi(x e_1 + y e_2) = a x^2 + 2bxy + c y^2. \]
Example 10.3. Let $C[a, b]$ denote the set of continuous functions $f \colon [a, b] \to \mathbb{R}$. It is easily checked that $C[a, b]$ is a vector space of infinite dimension. Given any two functions $f, g \in C[a, b]$, let
\[ \langle f, g \rangle = \int_a^b f(t) g(t)\, dt. \]
We leave as an easy exercise that $\langle -, - \rangle$ is indeed an inner product on $C[a, b]$. In the case where $a = -\pi$ and $b = \pi$ (or $a = 0$ and $b = 2\pi$, this makes basically no difference), one should compute
\[ \langle \sin px, \sin qx \rangle, \]
for all natural numbers $p, q \ge 1$. The outcome of these calculations is what makes Fourier analysis possible!
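The computation suggested in Example 10.3 can be approximated numerically. The sketch below is an illustration only: it replaces the integral by a composite midpoint rule (the helper names `inner` and `sin_p` and the number of sample points are assumptions of this example) and evaluates $\langle \sin px, \sin qx \rangle$ on $[-\pi, \pi]$.

```python
import math

def inner(f, g, a, b, n=2000):
    """Approximate <f, g> = integral_a^b f(t) g(t) dt by the composite midpoint rule."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) * g(a + (i + 0.5) * h) for i in range(n))

def sin_p(p):
    """Return the function t -> sin(p t)."""
    return lambda t: math.sin(p * t)

same = inner(sin_p(2), sin_p(2), -math.pi, math.pi)    # p = q
cross = inner(sin_p(2), sin_p(3), -math.pi, math.pi)   # p != q

print(same, cross)
```

The first value is close to $\pi$ and the second is close to $0$, so distinct functions $\sin px$ are orthogonal for this inner product.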
Example 10.4. Let $E = M_n(\mathbb{R})$ be the vector space of real $n \times n$ matrices. If we view a matrix $A \in M_n(\mathbb{R})$ as a "long" column vector obtained by concatenating together its columns, we can define the inner product of two matrices $A, B \in M_n(\mathbb{R})$ as
\[ \langle A, B \rangle = \sum_{i,j=1}^n a_{ij} b_{ij}. \]
Since this can be viewed as the Euclidean product on $\mathbb{R}^{n^2}$, it is an inner product on $M_n(\mathbb{R})$. The corresponding norm
\[ \|A\|_F = \sqrt{\mathrm{tr}(A^\top A)} \]
is the Frobenius norm.
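Let us observe that $\varphi$ can be recovered from $\Phi$. Indeed, by bilinearity and symmetry, we have
\[ \begin{aligned}
\Phi(u + v) &= \varphi(u + v, u + v) \\
&= \varphi(u, u + v) + \varphi(v, u + v) \\
&= \varphi(u, u) + 2\varphi(u, v) + \varphi(v, v) \\
&= \Phi(u) + 2\varphi(u, v) + \Phi(v).
\end{aligned} \]
Thus, we have
\[ \varphi(u, v) = \frac{1}{2} [\Phi(u + v) - \Phi(u) - \Phi(v)]. \]
We also say that $\varphi$ is the polar form of $\Phi$.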
If $E$ is finite-dimensional and if $\varphi \colon E \times E \to \mathbb{R}$ is a bilinear form on $E$, given any basis $(e_1, \ldots, e_n)$ of $E$, we can write $x = \sum_{i=1}^n x_i e_i$ and $y = \sum_{j=1}^n y_j e_j$, and we have
\[ \varphi(x, y) = \varphi\Big( \sum_{i=1}^n x_i e_i, \sum_{j=1}^n y_j e_j \Big) = \sum_{i,j=1}^n x_i y_j \varphi(e_i, e_j). \]
If we let $G$ be the matrix $G = (\varphi(e_i, e_j))$, and if $x$ and $y$ are the column vectors associated with $(x_1, \ldots, x_n)$ and $(y_1, \ldots, y_n)$, then we can write
\[ \varphi(x, y) = x^\top G y = y^\top G^\top x. \]
Note that we are committing an abuse of notation, since $x = \sum_{i=1}^n x_i e_i$ is a vector in $E$, but the column vector associated with $(x_1, \ldots, x_n)$ belongs to $\mathbb{R}^n$. To avoid this minor abuse, we could denote the column vector associated with $(x_1, \ldots, x_n)$ by $\mathbf{x}$ (and similarly $\mathbf{y}$ for the column vector associated with $(y_1, \ldots, y_n)$), in which case the "correct" expression for $\varphi(x, y)$ is
\[ \varphi(x, y) = \mathbf{x}^\top G \mathbf{y}. \]
However, in view of the isomorphism between $E$ and $\mathbb{R}^n$, to keep notation as simple as possible, we will use $x$ and $y$ instead of $\mathbf{x}$ and $\mathbf{y}$.
Also observe that $\varphi$ is symmetric iff $G = G^\top$, and $\varphi$ is positive definite iff the matrix $G$ is positive definite, that is,
\[ x^\top G x > 0 \quad \text{for all } x \in \mathbb{R}^n,\ x \ne 0. \]
The matrix $G$ associated with an inner product is called the Gram matrix of the inner product with respect to the basis $(e_1, \ldots, e_n)$.
Conversely, if $A$ is a symmetric positive definite $n \times n$ matrix, it is easy to check that the bilinear form
\[ \langle x, y \rangle = x^\top A y \]
is an inner product. If we make a change of basis from the basis $(e_1, \ldots, e_n)$ to the basis $(f_1, \ldots, f_n)$, and if the change of basis matrix is $P$ (where the $j$th column of $P$ consists of the coordinates of $f_j$ over the basis $(e_1, \ldots, e_n)$), then with respect to coordinates $x'$ and $y'$ over the basis $(f_1, \ldots, f_n)$, we have
\[ \langle x, y \rangle = x^\top G y = x'^\top P^\top G P y', \]
so the matrix of our inner product over the basis $(f_1, \ldots, f_n)$ is $P^\top G P$. We summarize these facts in the following proposition.
Proposition 10.1. Let $E$ be a finite-dimensional vector space, and let $(e_1, \ldots, e_n)$ be a basis of $E$.
1. For any inner product $\langle -, - \rangle$ on $E$, if $G = (\langle e_i, e_j \rangle)$ is the Gram matrix of the inner product $\langle -, - \rangle$ w.r.t. the basis $(e_1, \ldots, e_n)$, then $G$ is symmetric positive definite.
2. For any change of basis matrix $P$, the Gram matrix of $\langle -, - \rangle$ with respect to the new basis is $P^\top G P$.
3. If $A$ is any $n \times n$ symmetric positive definite matrix, then
\[ \langle x, y \rangle = x^\top A y \]
is an inner product on $E$.
We will see later that a symmetric matrix is positive definite iff its eigenvalues are all positive.
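The change of basis rule $G \mapsto P^\top G P$ can be checked on a small example. The sketch below is pure Python; the Gram matrix $G$, the change of basis matrix $P$, and the coordinate vectors are hypothetical values picked for the illustration.

```python
def transpose(P):
    """Transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*P)]

def mat_mul(X, Y):
    """Product of two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def mat_vec(X, v):
    """Matrix-vector product."""
    return [sum(x * y for x, y in zip(row, v)) for row in X]

def form(G, x, y):
    """Evaluate the bilinear form x^T G y."""
    return sum(xi * gi for xi, gi in zip(x, mat_vec(G, y)))

# Hypothetical Gram matrix over (e1, e2): symmetric, with a = 2 > 0 and b^2 - ac = 1 - 6 < 0.
G = [[2.0, 1.0],
     [1.0, 3.0]]
# Columns of P are the coordinates of f1 = e1 and f2 = e1 + e2 over (e1, e2).
P = [[1.0, 1.0],
     [0.0, 1.0]]

G_new = mat_mul(mat_mul(transpose(P), G), P)   # Gram matrix over (f1, f2)

x_new, y_new = [1.0, 2.0], [-1.0, 1.0]         # coordinates over (f1, f2)
x_old, y_old = mat_vec(P, x_new), mat_vec(P, y_new)

print(G_new, form(G, x_old, y_old), form(G_new, x_new, y_new))
```

Both evaluations of the inner product agree, and the entry $(2, 2)$ of the new Gram matrix is $\langle e_1 + e_2, e_1 + e_2 \rangle = 2 + 2 \cdot 1 + 3 = 7$.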
One of the very important properties of an inner product $\varphi$ is that the map $u \mapsto \sqrt{\Phi(u)}$ is a norm.
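Proposition 10.2. Let $E$ be a Euclidean space with inner product $\varphi$, and let $\Phi$ be the corresponding quadratic form. For all $u, v \in E$, we have the Cauchy-Schwarz inequality
\[ \varphi(u, v)^2 \le \Phi(u)\Phi(v), \]
the equality holding iff $u$ and $v$ are linearly dependent.
We also have the Minkowski inequality
\[ \sqrt{\Phi(u + v)} \le \sqrt{\Phi(u)} + \sqrt{\Phi(v)}, \]
the equality holding iff $u$ and $v$ are linearly dependent, where in addition if $u \ne 0$ and $v \ne 0$, then $u = \lambda v$ for some $\lambda > 0$.
Proof. For any vectors $u, v \in E$, we define the function $T \colon \mathbb{R} \to \mathbb{R}$ such that
\[ T(\lambda) = \Phi(u + \lambda v), \]
for all $\lambda \in \mathbb{R}$. Using bilinearity and symmetry, we have
\[ \begin{aligned}
\Phi(u + \lambda v) &= \varphi(u + \lambda v, u + \lambda v) \\
&= \varphi(u, u + \lambda v) + \lambda \varphi(v, u + \lambda v) \\
&= \varphi(u, u) + 2\lambda \varphi(u, v) + \lambda^2 \varphi(v, v) \\
&= \Phi(u) + 2\lambda \varphi(u, v) + \lambda^2 \Phi(v).
\end{aligned} \]
Since $\varphi$ is positive definite, $T(\lambda) \ge 0$ for all $\lambda \in \mathbb{R}$. If $\Phi(v) = 0$, then $v = 0$, so $\varphi(u, v) = 0$ and the inequality is trivial. If $\Phi(v) > 0$, the quadratic polynomial $T$ cannot have two distinct real roots, so its discriminant is nonpositive, which means that $\varphi(u, v)^2 - \Phi(u)\Phi(v) \le 0$, that is,
\[ |\varphi(u, v)| \le \sqrt{\Phi(u)\Phi(v)}. \]
Equality holds iff $v = 0$ or $T$ has a double root $\lambda_0$, in which case $\Phi(u + \lambda_0 v) = 0$, that is, $u + \lambda_0 v = 0$; in both cases $u$ and $v$ are linearly dependent, and the converse is immediate.
For the Minkowski inequality, since both sides are nonnegative and $\Phi(u + v) = \Phi(u) + 2\varphi(u, v) + \Phi(v)$, squaring shows that the inequality is equivalent to
\[ \varphi(u, v) \le \sqrt{\Phi(u)\Phi(v)}, \]
which is trivial when $\varphi(u, v) \le 0$, and follows from the Cauchy-Schwarz inequality when $\varphi(u, v) \ge 0$. Thus, the Minkowski inequality holds. Finally, assume that $u \ne 0$ and $v \ne 0$, and that
\[ \sqrt{\Phi(u + v)} = \sqrt{\Phi(u)} + \sqrt{\Phi(v)}. \]
When this is the case, we have
\[ \varphi(u, v) = \sqrt{\Phi(u)\Phi(v)}, \]
and we know from the discussion of the Cauchy-Schwarz inequality that the equality holds iff $u$ and $v$ are linearly dependent. The Minkowski inequality is an equality when $u$ or $v$ is null. Otherwise, if $u \ne 0$ and $v \ne 0$, then $u = \lambda v$ for some $\lambda \ne 0$, and since
\[ \varphi(u, v) = \lambda \varphi(v, v) = \sqrt{\Phi(u)\Phi(v)}, \]
by positivity, we must have $\lambda > 0$.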
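By the parallelogram law, we have
\[ \|x + y + z\|^2 = 2\|x + z\|^2 + 2\|y\|^2 - \|x - y + z\|^2 \]
and
\[ \|x + y + z\|^2 = 2\|y + z\|^2 + 2\|x\|^2 - \|y - x + z\|^2, \]
where the second formula is obtained by swapping $x$ and $y$. Then by adding up these equations (and dividing by 2), we get
\[ \|x + y + z\|^2 = \|x\|^2 + \|y\|^2 + \|x + z\|^2 + \|y + z\|^2 - \frac{1}{2}\|x - y + z\|^2 - \frac{1}{2}\|y - x + z\|^2. \]
Replacing $z$ by $-z$ in the above equation, we get
\[ \|x + y - z\|^2 = \|x\|^2 + \|y\|^2 + \|x - z\|^2 + \|y - z\|^2 - \frac{1}{2}\|x - y - z\|^2 - \frac{1}{2}\|y - x - z\|^2. \]
Since $\|x - y + z\| = \|-(x - y + z)\| = \|y - x - z\|$ and $\|y - x + z\| = \|-(y - x + z)\| = \|x - y - z\|$, by subtracting the last two equations, we get
\[ \begin{aligned}
\langle x + y, z \rangle &= \frac{1}{4}(\|x + y + z\|^2 - \|x + y - z\|^2) \\
&= \frac{1}{4}(\|x + z\|^2 - \|x - z\|^2) + \frac{1}{4}(\|y + z\|^2 - \|y - z\|^2) \\
&= \langle x, z \rangle + \langle y, z \rangle,
\end{aligned} \]
as desired.
Proving that
\[ \langle \lambda x, y \rangle = \lambda \langle x, y \rangle \quad \text{for all } \lambda \in \mathbb{R} \]
is a little tricky. The strategy is to prove the identity for $\lambda \in \mathbb{Z}$, then to promote it to $\mathbb{Q}$, and then to $\mathbb{R}$ by continuity.
Since
\[ \begin{aligned}
\langle -u, v \rangle &= \frac{1}{4}(\|-u + v\|^2 - \|-u - v\|^2) \\
&= \frac{1}{4}(\|u - v\|^2 - \|u + v\|^2) \\
&= -\langle u, v \rangle,
\end{aligned} \]
the property holds for $\lambda = -1$. By linearity and by induction, for any $n \in \mathbb{N}$ with $n \ge 1$, writing $n = n - 1 + 1$, we get
\[ \langle \lambda x, y \rangle = \lambda \langle x, y \rangle \quad \text{for all } \lambda \in \mathbb{N}, \]
and since the above also holds for $\lambda = -1$, it holds for all $\lambda \in \mathbb{Z}$. For $\lambda = p/q$ with $p, q \in \mathbb{Z}$ and $q \ne 0$, we have
\[ q \langle (p/q)u, v \rangle = \langle pu, v \rangle = p \langle u, v \rangle, \]
which shows that
\[ \langle (p/q)u, v \rangle = (p/q) \langle u, v \rangle, \]
and thus
\[ \langle \lambda x, y \rangle = \lambda \langle x, y \rangle \quad \text{for all } \lambda \in \mathbb{Q}. \]
To finish the proof, we use the fact that a norm is a continuous map $x \mapsto \|x\|$. Then, the continuous function $t \mapsto \frac{1}{t} \langle tu, v \rangle$ defined on $\mathbb{R} - \{0\}$ agrees with $\langle u, v \rangle$ on $\mathbb{Q} - \{0\}$, so it is equal to $\langle u, v \rangle$ on $\mathbb{R} - \{0\}$. The case $\lambda = 0$ is trivial, so we are done.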
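The formula $\langle u, v \rangle = \frac{1}{4}(\|u + v\|^2 - \|u - v\|^2)$ used in this proof can be illustrated on $\mathbb{R}^3$ with the standard Euclidean norm. The sketch below is only a sanity check on hypothetical integer vectors (so that the arithmetic is exact); it is not a substitute for the argument above.

```python
def dot(x, y):
    """Standard inner product on R^n."""
    return sum(a * b for a, b in zip(x, y))

def norm_sq(x):
    """Squared Euclidean norm."""
    return dot(x, x)

def polar(x, y):
    """Inner product recovered from the norm: (||x + y||^2 - ||x - y||^2) / 4."""
    s = [a + b for a, b in zip(x, y)]
    d = [a - b for a, b in zip(x, y)]
    return (norm_sq(s) - norm_sq(d)) / 4

x, y, z = [1, 2, 3], [4, -5, 6], [-2, 0, 7]
x_plus_y = [a + b for a, b in zip(x, y)]

print(polar(x, y), dot(x, y))
print(polar(x_plus_y, z), polar(x, z) + polar(y, z))
```

The first line prints the same value twice, and the second line confirms additivity in the first argument for these particular vectors.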
We now define orthogonality.
10.2
An inner product on a vector space gives the ability to define the notion of orthogonality. Families of nonnull pairwise orthogonal vectors must be linearly independent. They are called orthogonal families. In a vector space of finite dimension it is always possible to find orthogonal bases. This is very useful theoretically and practically. Indeed, in an orthogonal basis, finding the coordinates of a vector is very cheap: It takes an inner product. Fourier series make crucial use of this fact. When $E$ has finite dimension, we prove that the inner product on $E$ induces a natural isomorphism between $E$ and its dual space $E^*$. This allows us to define the adjoint of a linear map in an intrinsic fashion (i.e., independently of bases). It is also possible to orthonormalize any basis (certainly when the dimension is finite). We give two proofs, one using duality, the other more constructive using the Gram-Schmidt orthonormalization procedure.
Definition 10.2. Given a Euclidean space $E$, any two vectors $u, v \in E$ are orthogonal, or perpendicular, if $u \cdot v = 0$. Given a family $(u_i)_{i \in I}$ of vectors in $E$, we say that $(u_i)_{i \in I}$ is orthogonal if $u_i \cdot u_j = 0$ for all $i, j \in I$, where $i \ne j$. We say that the family $(u_i)_{i \in I}$ is orthonormal if $u_i \cdot u_j = 0$ for all $i, j \in I$, where $i \ne j$, and $\|u_i\| = u_i \cdot u_i = 1$, for all $i \in I$. For any subset $F$ of $E$, the set
\[ F^\perp = \{ v \in E \mid u \cdot v = 0, \text{ for all } u \in F \}, \]
of all vectors orthogonal to all vectors in $F$, is called the orthogonal complement of $F$.
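Since inner products are positive definite, observe that for any vector $u \in E$, we have
\[ u \cdot v = 0 \text{ for all } v \in E \quad \text{iff} \quad u = 0. \]
For example, in the space $C[-\pi, \pi]$ with the inner product of Example 10.3, it is easily checked that
\[ \langle \sin px, \sin qx \rangle = \begin{cases} \pi & \text{if } p = q,\ p, q \ge 1, \\ 0 & \text{if } p \ne q,\ p, q \ge 1, \end{cases} \]
and
\[ \langle \cos px, \cos qx \rangle = \begin{cases} \pi & \text{if } p = q,\ p, q \ge 1, \\ 0 & \text{if } p \ne q,\ p, q \ge 0. \end{cases} \]
Proposition 10.3. Given a Euclidean space $E$, for any family $(u_i)_{i \in I}$ of nonnull vectors in $E$, if $(u_i)_{i \in I}$ is orthogonal, then it is linearly independent.
Proposition 10.4. Given a Euclidean space $E$, any two vectors $u, v \in E$ are orthogonal iff
\[ \|u + v\|^2 = \|u\|^2 + \|v\|^2. \]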
One of the most useful features of orthonormal bases is that they afford a very simple method for computing the coordinates of a vector over any basis vector. Indeed, assume that $(e_1, \ldots, e_m)$ is an orthonormal basis. For any vector
\[ x = x_1 e_1 + \cdots + x_m e_m, \]
if we compute the inner product $x \cdot e_i$, we get
\[ x \cdot e_i = x_1 e_1 \cdot e_i + \cdots + x_i e_i \cdot e_i + \cdots + x_m e_m \cdot e_i = x_i, \]
since
\[ e_i \cdot e_j = \begin{cases} 1 & \text{if } i = j, \\ 0 & \text{if } i \ne j. \end{cases} \]
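As a concrete instance of the rule $x_i = x \cdot e_i$, the sketch below computes the coordinates of a vector of $\mathbb{R}^2$ over a hypothetical orthonormal basis (the standard basis rotated by $45$ degrees) and then reconstructs the vector from these coordinates.

```python
import math

def dot(x, y):
    """Standard inner product on R^n."""
    return sum(a * b for a, b in zip(x, y))

# Hypothetical orthonormal basis of R^2: the standard basis rotated by 45 degrees.
s = 1.0 / math.sqrt(2.0)
e1 = [s, s]
e2 = [-s, s]

x = [3.0, 1.0]

# Each coordinate costs a single inner product: x_i = x . e_i.
coords = [dot(x, e) for e in (e1, e2)]

# Rebuild x from its coordinates to check the computation.
rebuilt = [coords[0] * a + coords[1] * b for a, b in zip(e1, e2)]

print(coords, rebuilt)
```

No linear system has to be solved, which is the practical advantage of orthonormal bases mentioned above.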
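When $E$ has infinite dimension, a basis $(u_i)_{i \in I}$ of $E$ only allows expressions of the form $x = \sum_{i \in I} x_i u_i$ where the family of scalars $(x_i)_{i \in I}$ has finite support, which means that $x_i = 0$ for all $i \in I - J$, where $J$ is a finite set. Thus, even though the family $(\sin px)_{p \ge 1} \cup (\cos qx)_{q \ge 0}$ is orthogonal (it is not orthonormal, but becomes so if we divide every trigonometric function by $\sqrt{\pi}$, and the constant function $\cos 0x = 1$ by $\sqrt{2\pi}$), the fact that a function $f$ can be written as a Fourier series
\[ f(x) = a_0 + \sum_{k=1}^{\infty} (a_k \cos kx + b_k \sin kx) \]
does not mean that $(\sin px)_{p \ge 1} \cup (\cos qx)_{q \ge 0}$ is a basis of this vector space of functions, because in general, the families $(a_k)$ and $(b_k)$ do not have finite support! In order for this infinite linear combination to make sense, it is necessary to prove that the partial sums
\[ a_0 + \sum_{k=1}^{n} (a_k \cos kx + b_k \sin kx) \]
of the series converge to a limit when $n$ goes to infinity. This requires a topology on the space.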
A very important property of Euclidean spaces of finite dimension is that the inner
product induces a canonical bijection (i.e., independent of the choice of bases) between the
vector space $E$ and its dual $E^*$.
Given a Euclidean space $E$, for any vector $u \in E$, let $\varphi_u \colon E \to \mathbb{R}$ be the map defined
such that
$$\varphi_u(v) = u \cdot v, \quad \text{for all } v \in E.$$
Since the inner product is bilinear, the map $\varphi_u$ is a linear form in $E^*$. Thus, we have a
map $\flat \colon E \to E^*$, defined such that
$$\flat(u) = \varphi_u.$$
Theorem 10.5. Given a Euclidean space $E$, the map $\flat \colon E \to E^*$ defined such that
$$\flat(u) = \varphi_u$$
is linear and injective. When $E$ is also of finite dimension, the map $\flat \colon E \to E^*$ is a canonical
isomorphism.
Proof. That $\flat \colon E \to E^*$ is a linear map follows immediately from the fact that the inner
product is bilinear. If $\varphi_u = \varphi_v$, then $\varphi_u(w) = \varphi_v(w)$ for all $w \in E$, which by definition of $\varphi_u$
means that
$$u \cdot w = v \cdot w$$
for all $w \in E$, which by bilinearity is equivalent to
$$(v - u) \cdot w = 0$$
for all $w \in E$, which implies that $u = v$, since the inner product is positive definite. Thus,
$\flat \colon E \to E^*$ is injective. Finally, when $E$ is of finite dimension $n$, we know that $E^*$ is also of
dimension $n$, and then $\flat \colon E \to E^*$ is bijective.
The inverse of the isomorphism $\flat \colon E \to E^*$ is denoted by $\sharp \colon E^* \to E$.
As a consequence of Theorem 10.5, if $E$ is a Euclidean space of finite dimension, every
linear form $f \in E^*$ corresponds to a unique $u \in E$ such that
$$f(v) = u \cdot v,$$
for every $v \in E$. In particular, if $f$ is not the null form, the kernel of $f$, which is a hyperplane
$H$, is precisely the set of vectors that are orthogonal to $u$.
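In $\mathbb{R}^n$ with the standard inner product, the vector $u$ representing a linear form $f$ can be read off from the values of $f$ on the standard basis, since $u_i = u \cdot e_i = f(e_i)$. The sketch below illustrates this under that assumption; the helper `representing_vector` and the sample form `f` are hypothetical and chosen only for illustration.

```python
def dot(u, v):
    """Standard inner product on R^n."""
    return sum(a * b for a, b in zip(u, v))

def representing_vector(f, n):
    """Return the vector u in R^n such that f(v) = u . v for a linear form f.

    Over the standard (orthonormal) basis, the i-th coordinate of u is f(e_i).
    """
    basis = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    return [f(e) for e in basis]

# A hypothetical linear form on R^3, used only as an example.
def f(v):
    return 2 * v[0] - v[1] + 4 * v[2]

u = representing_vector(f, 3)
v = [1.0, 5.0, -2.0]
print(u, f(v), dot(u, v))
```

The kernel of `f` is then the plane of vectors orthogonal to `u`.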
Remarks:
(1) The musical map $\flat \colon E \to E^*$ is not surjective when $E$ has infinite dimension. The
result can be salvaged by restricting our attention to continuous linear maps, and by
assuming that the vector space $E$ is a Hilbert space (i.e., $E$ is a complete normed vector
space w.r.t. the Euclidean norm). This is the famous little Riesz theorem (or Riesz
representation theorem).
(2) Theorem 10.5 still holds if the inner product on $E$ is replaced by a nondegenerate
symmetric bilinear form $\varphi$. We say that a symmetric bilinear form $\varphi \colon E \times E \to \mathbb{R}$ is
nondegenerate if for every $u \in E$,
if $\varphi(u, v) = 0$ for all $v \in E$, then $u = 0$.
For example, the symmetric bilinear form on $\mathbb{R}^4$ defined such that
$$\varphi((x_1, x_2, x_3, x_4), (y_1, y_2, y_3, y_4)) = x_1 y_1 + x_2 y_2 + x_3 y_3 - x_4 y_4$$
is nondegenerate. However, there are nonnull vectors $u \in \mathbb{R}^4$ such that $\varphi(u, u) = 0$,
which is impossible in a Euclidean space. Such vectors are called isotropic.
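As a small illustration of remark (2), the sketch below evaluates the bilinear form on $\mathbb{R}^4$ with signature $(+, +, +, -)$ on an isotropic vector. The names `phi` and `witness` are illustrative only; `witness` returns, for a nonnull $u$, a vector $v$ with $\varphi(u, v) > 0$, which is one way to see that the form is nondegenerate.

```python
def phi(x, y):
    """The symmetric bilinear form x1 y1 + x2 y2 + x3 y3 - x4 y4 on R^4."""
    return x[0] * y[0] + x[1] * y[1] + x[2] * y[2] - x[3] * y[3]

def witness(u):
    """Return v = (u1, u2, u3, -u4), so that phi(u, v) = u1^2 + u2^2 + u3^2 + u4^2."""
    return (u[0], u[1], u[2], -u[3])

u = (1, 0, 0, 1)              # a nonnull vector
print(phi(u, u))              # u is isotropic: 1 + 0 + 0 - 1
print(phi(u, witness(u)))     # positive, so u is not in the kernel of the form
```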
The existence of the isomorphism $\flat \colon E \to E^*$ is crucial to the existence of adjoint maps.
The importance of adjoint maps stems from the fact that the linear maps arising in physical
problems are often self-adjoint, which means that $f = f^*$. Moreover, self-adjoint maps can
be diagonalized over orthonormal bases of eigenvectors. This is the key to the solution of
many problems in mechanics, and engineering in general (see Strang [104]).
Let $E$ be a Euclidean space of finite dimension $n$, and let $f \colon E \to E$ be a linear map.
For every $u \in E$, the map
$$v \mapsto u \cdot f(v)$$
is clearly a linear form in $E^*$, and by Theorem 10.5, there is a unique vector in $E$ denoted
by $f^*(u)$ such that
$$f^*(u) \cdot v = u \cdot f(v),$$
for every $v \in E$. The following simple proposition shows that the map $f^*$ is linear.
Proposition 10.6. Given a Euclidean space $E$ of finite dimension, for every linear map
$f \colon E \to E$, there is a unique linear map $f^* \colon E \to E$ such that
$$f^*(u) \cdot v = u \cdot f(v),$$
for all $u, v \in E$. The map $f^*$ is called the adjoint of $f$ (w.r.t. the inner product).
Proof. Given $u_1, u_2 \in E$, since the inner product is bilinear, we have
$$(u_1 + u_2) \cdot f(v) = u_1 \cdot f(v) + u_2 \cdot f(v),$$
for all $v \in E$, and
and
$$f^*(u_2) \cdot v = u_2 \cdot f(v),$$
for all $v \in E$, we get
called normal linear maps. We will see later on that normal maps can always be diagonalized
over orthonormal bases of eigenvectors, but this will require using a Hermitian inner product
(over C).
Given two Euclidean spaces $E$ and $F$, where the inner product on $E$ is denoted by $\langle -, - \rangle_1$
and the inner product on $F$ is denoted by $\langle -, - \rangle_2$, given any linear map $f \colon E \to F$, it is
immediately verified that the proof of Proposition 10.6 can be adapted to show that there
is a unique linear map $f^* \colon F \to E$ such that
$$\langle f(u), v \rangle_2 = \langle u, f^*(v) \rangle_1$$
for all $u \in E$ and all $v \in F$. The linear map $f^*$ is also called the adjoint of $f$.
Remark: Given any basis for $E$ and any basis for $F$, it is possible to characterize the matrix
of the adjoint $f^*$ of $f$ in terms of the matrix of $f$, and the symmetric matrices defining the
inner products. We will do so with respect to orthonormal bases. Also, since inner products
are symmetric, the adjoint $f^*$ of $f$ is also characterized by
$$f(u) \cdot v = u \cdot f^*(v),$$
for all $u, v \in E$.
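With respect to an orthonormal basis, the matrix of the adjoint is the transpose of the matrix of $f$ (this is recorded in Proposition 10.12 and in the chapter summary). The sketch below checks the defining identity $f^*(u) \cdot v = u \cdot f(v)$ on one pair of vectors, assuming the standard inner product on $\mathbb{R}^2$; the matrix `A` and the vectors are arbitrary examples, not taken from the text.

```python
def dot(u, v):
    """Standard inner product on R^n."""
    return sum(a * b for a, b in zip(u, v))

def matvec(A, x):
    """Matrix-vector product, with A given as a list of rows."""
    return [dot(row, x) for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

A = [[1.0, 2.0],
     [3.0, 4.0]]           # matrix of f over the standard basis (arbitrary)
A_star = transpose(A)       # matrix of the adjoint f* over the same basis

u = [1.0, -1.0]
v = [2.0, 5.0]
lhs = dot(matvec(A_star, u), v)   # f*(u) . v
rhs = dot(u, matvec(A, v))        # u . f(v)
print(lhs, rhs)
```

A single pair $(u, v)$ is of course only a spot check; the identity holds for all pairs because $(A^\top u)^\top v = u^\top (A v)$.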
We can also use Theorem 10.5 to show that any Euclidean space of finite dimension has
an orthonormal basis.
Proposition 10.7. Given any nontrivial Euclidean space $E$ of finite dimension $n \geq 1$, there
is an orthonormal basis $(u_1, \ldots, u_n)$ for $E$.
Proof. We proceed by induction on $n$. When $n = 1$, take any nonnull vector $v \in E$, which
exists, since we assumed $E$ nontrivial, and let
$$u = \frac{v}{\|v\|}.$$
If $n \geq 2$, again take any nonnull vector $v \in E$, and let
$$u_1 = \frac{v}{\|v\|}.$$
Consider the linear form $\varphi_{u_1}$ associated with $u_1$. Since $u_1 \neq 0$, by Theorem 10.5, the linear
form $\varphi_{u_1}$ is nonnull, and its kernel is a hyperplane $H$. Since $\varphi_{u_1}(w) = 0$ iff $u_1 \cdot w = 0$,
the hyperplane $H$ is the orthogonal complement of $\{u_1\}$. Furthermore, since $u_1 \neq 0$ and
the inner product is positive definite, $u_1 \cdot u_1 \neq 0$, and thus, $u_1 \notin H$, which implies that
$E = H \oplus \mathbb{R}u_1$. However, since $E$ is of finite dimension $n$, the hyperplane $H$ has dimension
$n - 1$, and by the induction hypothesis, we can find an orthonormal basis $(u_2, \ldots, u_n)$ for $H$.
Now, because $H$ and the one dimensional space $\mathbb{R}u_1$ are orthogonal and $E = H \oplus \mathbb{R}u_1$, it is
clear that $(u_1, \ldots, u_n)$ is an orthonormal basis for $E$.
There is a more constructive way of proving Proposition 10.7, using a procedure known as
the Gram-Schmidt orthonormalization procedure. Among other things, the Gram-Schmidt
orthonormalization procedure yields the QR-decomposition for matrices, an important tool
in numerical methods.
Proposition 10.8. Given any nontrivial Euclidean space $E$ of finite dimension $n \geq 1$, from
any basis $(e_1, \ldots, e_n)$ for $E$ we can construct an orthonormal basis $(u_1, \ldots, u_n)$ for $E$, with
the property that for every $k$, $1 \leq k \leq n$, the families $(e_1, \ldots, e_k)$ and $(u_1, \ldots, u_k)$ generate
the same subspace.
Proof. We proceed by induction on $n$. For $n = 1$, let
$$u_1 = \frac{e_1}{\|e_1\|}.$$
For $n \geq 2$, we also let
$$u_1 = \frac{e_1}{\|e_1\|},$$
and assuming that $(u_1, \ldots, u_k)$ is an orthonormal system that generates the same subspace
as $(e_1, \ldots, e_k)$, for every $k$ with $1 \leq k < n$, we note that the vector
$$u'_{k+1} = e_{k+1} - \sum_{i=1}^{k} (e_{k+1} \cdot u_i)\, u_i$$
is nonnull, since otherwise, because $(u_1, \ldots, u_k)$ and $(e_1, \ldots, e_k)$ generate the same subspace,
$(e_1, \ldots, e_{k+1})$ would be linearly dependent, which is absurd, since $(e_1, \ldots, e_n)$ is a basis.
Thus, the norm of the vector $u'_{k+1}$ being nonzero, we use the following construction of the
vectors $u_k$ and $u'_k$:
$$u'_1 = e_1, \qquad u_1 = \frac{u'_1}{\|u'_1\|},$$
and for the inductive step
$$u'_{k+1} = e_{k+1} - \sum_{i=1}^{k} (e_{k+1} \cdot u_i)\, u_i, \qquad u_{k+1} = \frac{u'_{k+1}}{\|u'_{k+1}\|}.$$
Note that $u'_{k+1}$ is obtained by subtracting from $e_{k+1}$ the projection of $e_{k+1}$ itself onto the
orthonormal vectors $u_1, \ldots, u_k$ that have already been computed. Then, $u'_{k+1}$ is normalized.
Remarks:
(1) The QR-decomposition can now be obtained very easily, but we postpone this until
Section 10.5.
(2) We could compute $u'_{k+1}$ using the formula
$$u'_{k+1} = e_{k+1} - \sum_{i=1}^{k} \frac{e_{k+1} \cdot u'_i}{\|u'_i\|^2}\, u'_i,$$
and normalize the vectors $u'_k$ at the end. This time, we are subtracting from $e_{k+1}$
the projection of $e_{k+1}$ itself onto the orthogonal vectors $u'_1, \ldots, u'_k$. This might be
preferable when writing a computer program.
(3) The proof of Proposition 10.8 also works for a countably infinite basis for $E$, producing
a countably infinite orthonormal basis.
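The inductive construction in the proof of Proposition 10.8 translates almost line by line into code. The following is a minimal sketch, assuming the standard inner product on $\mathbb{R}^n$ and vectors stored as Python lists; the function name `gram_schmidt` and the sample basis are illustrative, and the exact zero test is only a placeholder for a proper tolerance.

```python
import math

def dot(u, v):
    """Standard inner product on R^n."""
    return sum(a * b for a, b in zip(u, v))

def gram_schmidt(basis):
    """Orthonormalize linearly independent vectors (classical Gram-Schmidt)."""
    ortho = []
    for e in basis:
        # Coefficients (e . u_i) are all computed from the original vector e.
        coeffs = [dot(e, u) for u in ortho]
        # u' = e - sum_i (e . u_i) u_i
        u_prime = list(e)
        for c, u in zip(coeffs, ortho):
            u_prime = [a - c * b for a, b in zip(u_prime, u)]
        norm = math.sqrt(dot(u_prime, u_prime))
        if norm == 0.0:
            raise ValueError("input vectors are linearly dependent")
        # Normalize u' to obtain the next orthonormal vector.
        ortho.append([a / norm for a in u_prime])
    return ortho

basis = [[1.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 1.0]]
U = gram_schmidt(basis)
for u in U:
    print(u)
```

By construction, the first $k$ output vectors span the same subspace as the first $k$ input vectors.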
Example 10.6. If we consider polynomials and the inner product
$$\langle f, g \rangle = \int_{-1}^{1} f(t) g(t)\, dt,$$
where $f_n$
and $P_n(x) = \dfrac{1}{2^n n!} f_n^{(n)}(x)$,
$$\langle f, g \rangle = \int_{-1}^{1} \frac{f(t) g(t)}{\sqrt{1 - t^2}}\, dt.$$
We leave it as an exercise to prove that the above defines an inner product. It can be shown
that the polynomials $T_n(x)$ given by
$$T_n(x) = \cos(n \arccos x), \quad n \geq 0,$$
(equivalently, with $x = \cos\theta$, we have $T_n(\cos\theta) = \cos(n\theta)$) are orthogonal with respect to
the above inner product. These polynomials are the Chebyshev polynomials. Their norm is
not equal to 1. Instead, we have
$$\langle T_n, T_n \rangle = \begin{cases} \dfrac{\pi}{2} & \text{if } n > 0, \\ \pi & \text{if } n = 0. \end{cases}$$
Using the identity $(\cos\theta + i \sin\theta)^n = \cos n\theta + i \sin n\theta$ and the binomial formula, we obtain
the following expression for $T_n(x)$:
$$T_n(x) = \sum_{k=0}^{\lfloor n/2 \rfloor} \binom{n}{2k} (x^2 - 1)^k x^{n - 2k}.$$
n 1.
$$T_n(x) = \frac{(x - \sqrt{x^2 - 1})^n + (x + \sqrt{x^2 - 1})^n}{2}.$$
The polynomial $T_n$ has $n$ distinct roots in the interval $[-1, 1]$. The Chebyshev polynomials
play an important role in approximation theory. They are used as an approximation to a
best polynomial approximation of a continuous function under the sup-norm ($\infty$-norm).
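The two descriptions of $T_n$ given above, $\cos(n \arccos x)$ and the binomial sum, can be compared numerically on $[-1, 1]$. The sketch below does this and also lists the roots $\cos\bigl((2j - 1)\pi / (2n)\bigr)$, $1 \leq j \leq n$, which follow from $T_n(\cos\theta) = \cos(n\theta)$; the function names are illustrative and not part of the text.

```python
import math

def chebyshev_explicit(n, x):
    """T_n(x) as the sum over k of C(n, 2k) (x^2 - 1)^k x^(n - 2k)."""
    return sum(math.comb(n, 2 * k) * (x * x - 1) ** k * x ** (n - 2 * k)
               for k in range(n // 2 + 1))

def chebyshev_trig(n, x):
    """T_n(x) = cos(n arccos x), valid for -1 <= x <= 1."""
    return math.cos(n * math.acos(x))

def chebyshev_roots(n):
    """The n roots of T_n: cos((2j - 1) pi / (2n)) for j = 1, ..., n."""
    return [math.cos((2 * j - 1) * math.pi / (2 * n)) for j in range(1, n + 1)]

for n in range(5):
    print(n, chebyshev_explicit(n, 0.5), chebyshev_trig(n, 0.5))
print(chebyshev_roots(3))
```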
The inner products of the last two examples are special cases of an inner product of the
form
$$\langle f, g \rangle = \int_{-1}^{1} W(t) f(t) g(t)\, dt,$$
where $W(t)$ is a weight function. If $W$ is a nonzero continuous function such that $W(x) \geq 0$
on $(-1, 1)$, then the above bilinear form is indeed positive definite. Families of orthogonal
polynomials used in approximation theory and in physics arise by a suitable choice of the
weight function $W$. Besides the previous two examples, the Hermite polynomials correspond
to $W(x) = e^{-x^2}$, the Laguerre polynomials to $W(x) = e^{-x}$, and the Jacobi polynomials
to $W(x) = (1 - x)^{\alpha} (1 + x)^{\beta}$, with $\alpha, \beta > -1$. Comprehensive treatments of orthogonal
polynomials can be found in Lebedev [72], Sansone [91], and Andrews, Askey and Roy [2].
As a consequence of Proposition 10.7 (or Proposition 10.8), given any Euclidean space $E$
of finite dimension $n$, if $(e_1, \ldots, e_n)$ is an orthonormal basis for $E$, then for any two vectors
$u = u_1 e_1 + \cdots + u_n e_n$ and $v = v_1 e_1 + \cdots + v_n e_n$, the inner product $u \cdot v$ is expressed as
$$u \cdot v = (u_1 e_1 + \cdots + u_n e_n) \cdot (v_1 e_1 + \cdots + v_n e_n) = \sum_{i=1}^{n} u_i v_i,$$
and the norm $\|u\|$ as
$$\|u\| = \left( \sum_{i=1}^{n} u_i^2 \right)^{1/2}.$$
The fact that a Euclidean space always has an orthonormal basis implies that any Gram
matrix $G$ can be written as
$$G = Q^\top Q,$$
for some invertible matrix $Q$. Indeed, we know that under a change of basis with matrix $P$, a Gram
matrix $G$ becomes $G' = P^\top G P$. If the basis corresponding to $G'$ is orthonormal, then $G' = I$,
so $G = (P^{-1})^\top P^{-1}$.
We can also prove the following proposition regarding orthogonal spaces.
Proposition 10.9. Given any nontrivial Euclidean space $E$ of finite dimension $n \geq 1$, for
any subspace $F$ of dimension $k$, the orthogonal complement $F^\perp$ of $F$ has dimension $n - k$,
and $E = F \oplus F^\perp$. Furthermore, we have $F^{\perp\perp} = F$.
Proof. From Proposition 10.7, the subspace $F$ has some orthonormal basis $(u_1, \ldots, u_k)$. This
linearly independent family $(u_1, \ldots, u_k)$ can be extended to a basis $(u_1, \ldots, u_k, v_{k+1}, \ldots, v_n)$,
10.3
In this section we consider linear maps between Euclidean spaces that preserve the Euclidean
norm. These transformations, sometimes called rigid motions, play an important role in
geometry.
Definition 10.3. Given any two nontrivial Euclidean spaces $E$ and $F$ of the same finite
dimension $n$, a function $f \colon E \to F$ is an orthogonal transformation, or a linear isometry, if
it is linear and
$$\|f(u)\| = \|u\|, \quad \text{for all } u \in E.$$
Remarks:
(1) A linear isometry is often defined as a linear map such that
$$\|f(v) - f(u)\| = \|v - u\|,$$
for all $u, v \in E$. Since the map $f$ is linear, the two definitions are equivalent. The
second definition just focuses on preserving the distance between vectors.
(2) Sometimes, a linear map satisfying the condition of Definition 10.3 is called a metric
map, and a linear isometry is defined as a bijective metric map.
An isometry (without the word linear) is sometimes defined as a function $f \colon E \to F$ (not
necessarily linear) such that
$$\|f(v) - f(u)\| = \|v - u\|,$$
for all $u, v \in E$, i.e., as a function that preserves the distance. This requirement turns out to
be very strong. Indeed, the next proposition shows that all these definitions are equivalent
when $E$ and $F$ are of finite dimension, and for functions such that $f(0) = 0$.
Proposition 10.10. Given any two nontrivial Euclidean spaces $E$ and $F$ of the same finite
dimension $n$, for every function $f \colon E \to F$, the following properties are equivalent:
(1) $f$ is a linear map and $\|f(u)\| = \|u\|$, for all $u \in E$;
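for all $u, v \in E$, then for any vector $\tau \in E$, the function $g \colon E \to F$ defined such that
$$g(u) = f(\tau + u) - f(\tau)$$
for all $u \in E$ is a linear map such that $g(0) = 0$ and (3) holds. Clearly, $g(0) = f(\tau) - f(\tau) = 0$.
Note that from the hypothesis
$$\|f(v) - f(u)\| = \|v - u\|$$
for all $u, v \in E$, we conclude that
$$\begin{aligned} \|g(v) - g(u)\| &= \|f(\tau + v) - f(\tau) - (f(\tau + u) - f(\tau))\| \\ &= \|f(\tau + v) - f(\tau + u)\| \\ &= \|\tau + v - (\tau + u)\| \\ &= \|v - u\| \end{aligned}$$
for all $u, v \in E$. Setting $u = 0$ and using $g(0) = 0$, we get $\|g(v)\| = \|v\|$ for all $v \in E$. In other words, $g$ preserves both the distance and the norm.
To prove that $g$ preserves the inner product, we use the simple fact that
$$2 u \cdot v = \|u\|^2 + \|v\|^2 - \|u - v\|^2$$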
and thus $g(u) \cdot g(v) = u \cdot v$, for all $u, v \in E$, which is (3). In particular, if $f(0) = 0$, by letting
$\tau = 0$, we have $g = f$, and $f$ preserves the scalar product, i.e., (3) holds.
Now assume that (3) holds. Since $E$ is of finite dimension, we can pick an orthonormal basis $(e_1, \ldots, e_n)$ for $E$. Since $f$ preserves inner products, $(f(e_1), \ldots, f(e_n))$ is also
orthonormal, and since $F$ also has dimension $n$, it is a basis of $F$. Then note that for any
$u = u_1 e_1 + \cdots + u_n e_n$, we have
$$u_i = u \cdot e_i,$$
for all $i$, $1 \leq i \leq n$. Thus, we have
$$f(u) = \sum_{i=1}^{n} (f(u) \cdot f(e_i)) f(e_i),$$
and since $f(u) \cdot f(e_i) = u \cdot e_i$, we get
$$f(u) = \sum_{i=1}^{n} (u \cdot e_i) f(e_i) = \sum_{i=1}^{n} u_i f(e_i),$$
which shows that $f$ is linear. Obviously, $f$ preserves the Euclidean norm, and (3) implies
(1).
Finally, if $f(u) = f(v)$, then by linearity $f(v - u) = 0$, so that $\|f(v - u)\| = 0$, and since
$f$ preserves norms, we must have $\|v - u\| = 0$, and thus $u = v$. Thus, $f$ is injective, and
since $E$ and $F$ have the same finite dimension, $f$ is bijective.
Remarks:
(i) The dimension assumption is needed only to prove that (3) implies (1) when $f$ is not
known to be linear, and to prove that $f$ is surjective, but the proof shows that (1)
implies that $f$ is injective.
(ii) The implication that (3) implies (1) holds if we also assume that $f$ is surjective, even
if $E$ has infinite dimension.
In (2), when $f$ does not satisfy the condition $f(0) = 0$, the proof shows that $f$ is an affine
map. Indeed, taking any vector $\tau$ as an origin, the map $g$ is linear, and
$$f(\tau + u) = f(\tau) + g(u) \quad \text{for all } u \in E.$$
From section 19.7, this shows that $f$ is affine with associated linear map $g$.
This fact is worth recording as the following proposition.
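Proposition 10.11. Given any two nontrivial Euclidean spaces $E$ and $F$ of the same finite
dimension $n$, for every function $f \colon E \to F$, if
$$\|f(v) - f(u)\| = \|v - u\| \quad \text{for all } u, v \in E,$$
then $f$ is an affine map, and its associated linear map $g$ is an isometry.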
10.4
In this section we explore some of the basic properties of the orthogonal group and of
orthogonal matrices.
Proposition 10.12. Let $E$ be any Euclidean space of finite dimension $n$, and let $f \colon E \to E$
be any linear map. The following properties hold:
(1) The linear map $f \colon E \to E$ is an isometry iff
$$f \circ f^* = f^* \circ f = \mathrm{id}.$$
(2) For every orthonormal basis $(e_1, \ldots, e_n)$ of $E$, if the matrix of $f$ is $A$, then the matrix
of $f^*$ is the transpose $A^\top$ of $A$, and $f$ is an isometry iff $A$ satisfies the identities
$$A A^\top = A^\top A = I_n,$$
where $I_n$ denotes the identity matrix of order $n$, iff the columns of $A$ form an orthonormal basis of $E$, iff the rows of $A$ form an orthonormal basis of $E$.
Proof. (1) The linear map $f \colon E \to E$ is an isometry iff
$$f(u) \cdot f(v) = u \cdot v,$$
for all $u, v \in E$, iff
$$(f^*(f(u)) - u) \cdot v = 0$$
for all $u, v \in E$. Since the inner product is positive definite, we must have
$$f^*(f(u)) - u = 0$$
for all $u \in E$, that is,
$$f \circ f^* = f^* \circ f = \mathrm{id}.$$
The proof of Proposition 10.10 (3) also shows that if $f$ is an isometry, then the image of an
orthonormal basis $(u_1, \ldots, u_n)$ is an orthonormal basis. Students often ask why orthogonal
matrices are not called orthonormal matrices, since their columns (and rows) are orthonormal
bases! I have no good answer, but isometries do preserve orthogonality, and orthogonal
matrices correspond to isometries.
Recall that the determinant $\det(f)$ of a linear map $f \colon E \to E$ is independent of the
choice of a basis in $E$. Also, for every matrix $A \in \mathrm{M}_n(\mathbb{R})$, we have $\det(A) = \det(A^\top)$, and
for any two $n \times n$ matrices $A$ and $B$, we have $\det(AB) = \det(A) \det(B)$. Then, if $f$ is an
isometry, and $A$ is its matrix with respect to any orthonormal basis, $A A^\top = A^\top A = I_n$
implies that $\det(A)^2 = 1$, that is, either $\det(A) = 1$, or $\det(A) = -1$. It is also clear that
the isometries of a Euclidean space of dimension $n$ form a group, and that the isometries of
determinant $+1$ form a subgroup. This leads to the following definition.
Definition 10.5. Given a Euclidean space $E$ of dimension $n$, the set of isometries $f \colon E \to E$
forms a subgroup of $\mathrm{GL}(E)$ denoted by $\mathrm{O}(E)$, or $\mathrm{O}(n)$ when $E = \mathbb{R}^n$, called the orthogonal
group (of $E$). For every isometry $f$, we have $\det(f) = \pm 1$, where $\det(f)$ denotes the determinant of $f$. The isometries such that $\det(f) = 1$ are called rotations, or proper isometries,
or proper orthogonal transformations, and they form a subgroup of the special linear group
$\mathrm{SL}(E)$ (and of $\mathrm{O}(E)$), denoted by $\mathrm{SO}(E)$, or $\mathrm{SO}(n)$ when $E = \mathbb{R}^n$, called the special orthogonal group (of $E$). The isometries such that $\det(f) = -1$ are called improper isometries,
or improper orthogonal transformations, or flip transformations.
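A small numerical illustration of Definition 10.5 in dimension 2: a rotation matrix is orthogonal with determinant $+1$, and a reflection across a coordinate axis is orthogonal with determinant $-1$. This is a sketch under the assumption that matrices are stored as lists of rows; the helper names, the angle, and the tolerance are arbitrary choices.

```python
import math

def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def is_orthogonal(A, tol=1e-12):
    """Check A A^T = I up to a tolerance (for square A this also gives A^T A = I)."""
    n = len(A)
    P = matmul(A, transpose(A))
    return all(abs(P[i][j] - (1.0 if i == j else 0.0)) <= tol
               for i in range(n) for j in range(n))

def det2(A):
    """Determinant of a 2 x 2 matrix."""
    return A[0][0] * A[1][1] - A[0][1] * A[1][0]

t = math.pi / 6
rotation = [[math.cos(t), -math.sin(t)],
            [math.sin(t), math.cos(t)]]      # an element of SO(2)
flip = [[1.0, 0.0],
        [0.0, -1.0]]                         # an improper isometry in O(2)

print(is_orthogonal(rotation), det2(rotation))
print(is_orthogonal(flip), det2(flip))
```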
As an immediate corollary of the Gram-Schmidt orthonormalization procedure, we obtain
the QR-decomposition for invertible matrices.
10.5
Now that we have the definition of an orthogonal matrix, we can explain how the Gram-Schmidt
orthonormalization procedure immediately yields the QR-decomposition for matrices.
Proposition 10.13. Given any real $n \times n$ matrix $A$, if $A$ is invertible, then there is an
orthogonal matrix $Q$ and an upper triangular matrix $R$ with positive diagonal entries such
that $A = QR$.
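Proof. We can view the columns of $A$ as vectors $A^1, \ldots, A^n$ in $\mathbb{E}^n$. If $A$ is invertible, then they
are linearly independent, and we can apply Proposition 10.8 to produce an orthonormal basis
using the Gram-Schmidt orthonormalization procedure. Recall that we construct vectors
$Q^k$ and $Q'^k$ as follows:
$$Q'^1 = A^1, \qquad Q^1 = \frac{Q'^1}{\|Q'^1\|},$$
and for the inductive step
$$Q'^{k+1} = A^{k+1} - \sum_{i=1}^{k} (A^{k+1} \cdot Q^i)\, Q^i, \qquad Q^{k+1} = \frac{Q'^{k+1}}{\|Q'^{k+1}\|}.$$
Expressing the vectors $A^j$ in terms of the $Q^i$, we get the triangular system
$$\begin{aligned} A^1 &= \|Q'^1\| Q^1, \\ &\ \,\vdots \\ A^j &= (A^j \cdot Q^1)\, Q^1 + \cdots + (A^j \cdot Q^i)\, Q^i + \cdots + \|Q'^j\| Q^j, \\ &\ \,\vdots \\ A^n &= (A^n \cdot Q^1)\, Q^1 + \cdots + (A^n \cdot Q^{n-1})\, Q^{n-1} + \|Q'^n\| Q^n. \end{aligned}$$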
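For example, the matrix
$$A = \begin{pmatrix} 0 & 0 & 5 \\ 0 & 4 & 1 \\ 1 & 1 & 1 \end{pmatrix}$$
has the QR-decomposition $A = QR$ with
$$Q = \begin{pmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \end{pmatrix} \quad \text{and} \quad R = \begin{pmatrix} 1 & 1 & 1 \\ 0 & 4 & 1 \\ 0 & 0 & 5 \end{pmatrix}.$$
Another example is
$$A = \begin{pmatrix} 1 & 1 & 2 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{pmatrix} = \begin{pmatrix} 1/\sqrt{2} & 1/\sqrt{2} & 0 \\ 0 & 0 & 1 \\ 1/\sqrt{2} & -1/\sqrt{2} & 0 \end{pmatrix} \begin{pmatrix} \sqrt{2} & 1/\sqrt{2} & \sqrt{2} \\ 0 & 1/\sqrt{2} & \sqrt{2} \\ 0 & 0 & 1 \end{pmatrix}.$$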
The QR-decomposition yields a rather efficient and numerically stable method for solving
systems of linear equations. Indeed, given a system $Ax = b$, where $A$ is an $n \times n$ invertible
matrix, writing $A = QR$, since $Q$ is orthogonal, we get
$$Rx = Q^\top b,$$
and since $R$ is upper triangular, we can solve it by back substitution, that is, by solving for the
last variable $x_n$ first, substituting its value into the system, then solving for $x_{n-1}$, etc. The
QR-decomposition is also very useful in solving least squares problems (we will come back
to this later on), and for finding eigenvalues. It can be easily adapted to the case where $A$ is
a rectangular $m \times n$ matrix with independent columns (thus, $n \leq m$). In this case, $Q$ is not
quite orthogonal. It is an $m \times n$ matrix whose columns are orthonormal, and $R$ is an invertible
$n \times n$ upper triangular matrix with positive diagonal entries. For more on QR, see Strang
[104, 105], Golub and Van Loan [49], Demmel [27], Trefethen and Bau [110], or Serre [96].
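The solution method just described (form $Q^\top b$, then solve the triangular system from the last variable up) can be sketched as follows. The factors `Q` and `R` are those of the $3 \times 3$ example $A = QR$ with $Q$ a permutation matrix; the right-hand side `b` and the helper names are assumptions made only for this illustration.

```python
def transpose(A):
    return [list(col) for col in zip(*A)]

def matvec(A, x):
    """Matrix-vector product, with A given as a list of rows."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def back_substitute(R, c):
    """Solve R x = c for an upper triangular R with nonzero diagonal entries."""
    n = len(R)
    x = [0.0] * n
    for i in range(n - 1, -1, -1):          # last variable first
        s = sum(R[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (c[i] - s) / R[i][i]
    return x

A = [[0.0, 0.0, 5.0], [0.0, 4.0, 1.0], [1.0, 1.0, 1.0]]
Q = [[0.0, 0.0, 1.0], [0.0, 1.0, 0.0], [1.0, 0.0, 0.0]]
R = [[1.0, 1.0, 1.0], [0.0, 4.0, 1.0], [0.0, 0.0, 5.0]]

b = [5.0, 9.0, 6.0]                          # arbitrary right-hand side
x = back_substitute(R, matvec(transpose(Q), b))
print(x, matvec(A, x))
```

Since $Q$ is orthogonal, no inversion is needed: multiplying by $Q^\top$ replaces the general system by a triangular one.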
It should also be said that the Gram-Schmidt orthonormalization procedure that we have
presented is not very stable numerically, and instead, one should use the modified Gram-Schmidt
method. To compute $Q'^{k+1}$, instead of projecting $A^{k+1}$ onto $Q^1, \ldots, Q^k$ in a single
step, it is better to perform $k$ projections. We compute $Q_1^{k+1}, Q_2^{k+1}, \ldots, Q_k^{k+1}$
as follows:
$$Q_1^{k+1} = A^{k+1} - (A^{k+1} \cdot Q^1)\, Q^1,$$
$$Q_{i+1}^{k+1} = Q_i^{k+1} - (Q_i^{k+1} \cdot Q^{i+1})\, Q^{i+1},$$
where $1 \leq i \leq k - 1$, and then $Q'^{k+1} = Q_k^{k+1}$.
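The difference between the two variants is small in code: the modified method computes each projection coefficient from the partially reduced vector rather than from the original column. The following is a minimal sketch under the same assumptions as before (standard inner product, vectors as lists); `modified_gram_schmidt` is an illustrative name and the exact zero test stands in for a proper tolerance.

```python
import math

def dot(u, v):
    """Standard inner product on R^n."""
    return sum(a * b for a, b in zip(u, v))

def modified_gram_schmidt(vectors):
    """Orthonormalize linearly independent vectors, one projection at a time."""
    ortho = []
    for a in vectors:
        q = list(a)
        for u in ortho:
            # The coefficient uses the current, already reduced vector q,
            # not the original vector a (this is the only change from the
            # classical procedure).
            c = dot(q, u)
            q = [x - c * y for x, y in zip(q, u)]
        norm = math.sqrt(dot(q, q))
        if norm == 0.0:
            raise ValueError("input vectors are linearly dependent")
        ortho.append([x / norm for x in q])
    return ortho

columns = [[1.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 1.0]]
Qs = modified_gram_schmidt(columns)
for q in Qs:
    print(q)
```

In exact arithmetic both variants produce the same vectors; they differ only in how rounding errors accumulate.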
10.6
Euclidean geometry has applications in computational geometry, in particular Voronoi diagrams and Delaunay triangulations. In turn, Voronoi diagrams have applications in motion
planning (see O'Rourke [87]).
Euclidean geometry also has applications to matrix analysis. Recall that a real $n \times n$
matrix $A$ is symmetric if it is equal to its transpose $A^\top$. One of the most important properties
of symmetric matrices is that they have real eigenvalues and that they can be diagonalized
by an orthogonal matrix (see Chapter 13). This means that for every symmetric matrix $A$,
there is a diagonal matrix $D$ and an orthogonal matrix $P$ such that
$$A = P D P^\top.$$
Even though it is not always possible to diagonalize an arbitrary matrix, there are various
decompositions involving orthogonal matrices that are of great practical interest. For example, every real matrix $A$ has a QR-decomposition, which says that $A$ can be expressed as
$$A = QR,$$
where $Q$ is orthogonal and $R$ is an upper triangular matrix. This can be obtained from the
Gram-Schmidt orthonormalization procedure, as we saw in Section 10.5, or better, using
Householder matrices, as shown in Section 11.2. There is also the polar decomposition,
which says that a real matrix $A$ can be expressed as
$$A = QS,$$
where $Q$ is orthogonal and $S$ is symmetric positive semidefinite (which means that the eigenvalues of $S$ are nonnegative). Such a decomposition is important in continuum mechanics
and in robotics, since it separates stretching from rotation. Finally, there is the wonderful
singular value decomposition, abbreviated as SVD, which says that a real matrix $A$ can be
expressed as
$$A = V D U^\top,$$
where $U$ and $V$ are orthogonal and $D$ is a diagonal matrix with nonnegative entries (see
Chapter 16). This decomposition leads to the notion of pseudo-inverse, which has many
applications in engineering (least squares solutions, etc.). For an excellent presentation of all
these notions, we highly recommend Strang [105, 104], Golub and Van Loan [49], Demmel
[27], Serre [96], and Trefethen and Bau [110].
The method of least squares, invented by Gauss and Legendre around 1800, is another
great application of Euclidean geometry. Roughly speaking, the method is used to solve
inconsistent linear systems $Ax = b$, where the number of equations is greater than the
number of variables. Since an exact solution generally does not exist, the method of least squares consists
in finding a solution $x$ minimizing the Euclidean norm $\|Ax - b\|^2$, that is, the sum of the
squares of the errors. It turns out that there is always a unique solution $x^+$ of smallest
norm minimizing $\|Ax - b\|^2$, and that it is a solution of the square system
$$A^\top A x = A^\top b,$$
called the system of normal equations. The solution $x^+$ can be found either by using the QR-decomposition in terms of Householder transformations, or by using the notion of pseudo-inverse of a matrix. The pseudo-inverse can be computed using the SVD.
Least squares methods are used extensively in computer vision. More details on the method
of least squares and pseudo-inverses can be found in Chapter 17.
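As a concrete instance of the normal equations, consider fitting a line $y \approx c_0 + c_1 t$ to data points. Here $A$ has rows $(1, t_i)$, so $A^\top A$ is a $2 \times 2$ matrix and the system can be solved in closed form. The sketch below assumes at least two distinct abscissas; the function name and the data are hypothetical. It forms $A^\top A$ explicitly for readability, which is not the numerically preferred route mentioned in the text (QR or SVD).

```python
def fit_line(ts, ys):
    """Least squares fit y ~ c0 + c1 t by solving A^T A c = A^T y.

    A has rows (1, t_i), so A^T A = [[n, sum t], [sum t, sum t^2]]
    and A^T y = (sum y, sum t y).
    """
    n = len(ts)
    st = sum(ts)
    stt = sum(t * t for t in ts)
    sy = sum(ys)
    sty = sum(t * y for t, y in zip(ts, ys))
    det = n * stt - st * st
    if det == 0:
        raise ValueError("need at least two distinct abscissas")
    # Cramer's rule on the 2 x 2 system of normal equations.
    c0 = (stt * sy - st * sty) / det
    c1 = (n * sty - st * sy) / det
    return c0, c1

ts = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 4.0, 8.0]     # made-up data, not on a common line
c0, c1 = fit_line(ts, ys)
residuals = [y - (c0 + c1 * t) for t, y in zip(ts, ys)]
print(c0, c1, residuals)
```

The residual vector $b - Ax^+$ is orthogonal to the columns of $A$, which is exactly what the normal equations express.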
10.7 Summary
The main concepts and results of this chapter are listed below:
Bilinear forms; positive definite bilinear forms.
Inner products, scalar products, Euclidean spaces.
Quadratic form associated with a bilinear form.
The Euclidean space $\mathbb{E}^n$.
The polar form of a quadratic form.
Gram matrix associated with an inner product.
The Cauchy-Schwarz inequality; the Minkowski inequality.
The parallelogram law.
Orthogonality, orthogonal complement $F^\perp$; orthonormal family.
The musical isomorphisms $\flat \colon E \to E^*$ and $\sharp \colon E^* \to E$ (when $E$ is finite-dimensional); Theorem 10.5.
The adjoint of a linear map (with respect to an inner product).
Existence of an orthonormal basis in a finite-dimensional Euclidean space (Proposition 10.7).
The Gram-Schmidt orthonormalization procedure (Proposition 10.8).
The Legendre and the Chebyshev polynomials.
Linear isometries (orthogonal transformations, rigid motions).
The orthogonal group, orthogonal matrices.
The matrix representing the adjoint $f^*$ of a linear map $f$ is the transpose of the matrix representing $f$.
The orthogonal group $\mathrm{O}(n)$ and the special orthogonal group $\mathrm{SO}(n)$.
QR-decomposition for invertible matrices.
Chapter 11
QR-Decomposition for Arbitrary Matrices
11.1 Orthogonal Reflections
Hyperplane reflections are represented by matrices called Householder matrices. These matrices play an important role in numerical methods, for instance for solving systems of linear
equations, solving least squares problems, for computing eigenvalues, and for transforming a
symmetric matrix into a tridiagonal matrix. We prove a simple geometric lemma that immediately yields the QR-decomposition of arbitrary matrices in terms of Householder matrices.
Orthogonal symmetries are a very important example of isometries. First let us review
the definition of projections. Given a vector space $E$, let $F$ and $G$ be subspaces of $E$ that
form a direct sum $E = F \oplus G$. Since every $u \in E$ can be written uniquely as $u = v + w$,
where $v \in F$ and $w \in G$, we can define the two projections $p_F \colon E \to F$ and $p_G \colon E \to G$ such
that $p_F(u) = v$ and $p_G(u) = w$. It is immediately verified that $p_G$ and $p_F$ are linear maps,
and that $p_F^2 = p_F$, $p_G^2 = p_G$, $p_F \circ p_G = p_G \circ p_F = 0$, and $p_F + p_G = \mathrm{id}$.
Definition 11.1. Given a vector space $E$, for any two subspaces $F$ and $G$ that form a direct
sum $E = F \oplus G$, the symmetry (or reflection) with respect to $F$ and parallel to $G$ is the
linear map $s \colon E \to E$ defined such that
$$s(u) = 2 p_F(u) - u,$$
for every $u \in E$.
Because $p_F + p_G = \mathrm{id}$, note that we also have
$$s(u) = p_F(u) - p_G(u)$$
and
$$s(u) = u - 2 p_G(u),$$
with respect to the origin. When $G$ is a plane, $p = n - 2$, and $\det(s) = (-1)^2 = 1$, so that a
flip about $F$ is a rotation. In particular, when $n = 3$, $F$ is a line, and a flip about the line
$F$ is indeed a rotation of measure $\pi$.
Remark: Given any two orthogonal subspaces $F, G$ forming a direct sum $E = F \oplus G$, let
$f$ be the symmetry with respect to $F$ and parallel to $G$, and let $g$ be the symmetry with
respect to $G$ and parallel to $F$. We leave it as an exercise to show that
$$f \circ g = g \circ f = -\mathrm{id}.$$
When $F = H$ is a hyperplane, we can give an explicit formula for $s(u)$ in terms of any
nonnull vector $w$ orthogonal to $H$. Indeed, from
$$u = p_H(u) + p_G(u),$$
since $p_G(u) \in G$ and $G$ is spanned by $w$, which is orthogonal to $H$, we have
$$p_G(u) = \lambda w$$
for some $\lambda \in \mathbb{R}$, and we get
$$u \cdot w = \lambda \|w\|^2,$$
and thus
$$p_G(u) = \frac{(u \cdot w)}{\|w\|^2}\, w.$$
Since
$$s(u) = u - 2 p_G(u),$$
we get
$$s(u) = u - 2 \frac{(u \cdot w)}{\|w\|^2}\, w.$$
Such reflections are represented by matrices called Householder matrices, and they play
an important role in numerical matrix analysis (see Kincaid and Cheney [63] or Ciarlet
[24]). Householder matrices are symmetric and orthogonal. It is easily checked that over an
orthonormal basis $(e_1, \ldots, e_n)$, a hyperplane reflection about a hyperplane $H$ orthogonal to
a nonnull vector $w$ is represented by the matrix
$$H = I_n - 2 \frac{W W^\top}{\|W\|^2} = I_n - 2 \frac{W W^\top}{W^\top W},$$
where $W$ is the column vector of the coordinates of $w$ over the basis $(e_1, \ldots, e_n)$, and $I_n$ is
the identity $n \times n$ matrix. Since
$$p_G(u) = \frac{(u \cdot w)}{\|w\|^2}\, w,$$
the matrix representing $p_G$ is
$$\frac{W W^\top}{W^\top W},$$
and since $p_H + p_G = \mathrm{id}$, the matrix representing $p_H$ is
$$I_n - \frac{W W^\top}{W^\top W}.$$
These formulae can be used to derive a formula for a rotation of $\mathbb{R}^3$, given the direction $w$
of its axis of rotation and given the angle of rotation.
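The Householder matrix $I_n - 2 W W^\top / (W^\top W)$ is easy to build and test entry by entry. The sketch below constructs it for one vector $w$ (an arbitrary choice) and checks the expected behaviour: $w$ is sent to $-w$, and a vector orthogonal to $w$ is left fixed. The function names are illustrative.

```python
def householder(w):
    """Matrix I - 2 w w^T / (w^T w) of the reflection about the hyperplane orthogonal to w."""
    n = len(w)
    ww = sum(a * a for a in w)
    if ww == 0:
        raise ValueError("w must be a nonnull vector")
    return [[(1.0 if i == j else 0.0) - 2.0 * w[i] * w[j] / ww for j in range(n)]
            for i in range(n)]

def matvec(A, x):
    """Matrix-vector product, with A given as a list of rows."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]

w = [1.0, 2.0, 2.0]            # arbitrary nonnull vector
H = householder(w)

print(matvec(H, w))                    # close to -w
print(matvec(H, [2.0, -1.0, 0.0]))     # a vector orthogonal to w stays fixed
```

In numerical practice the matrix is usually not formed; the reflection is applied directly through $u \mapsto u - 2 \frac{(u \cdot w)}{\|w\|^2} w$.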
The following fact is the key to the proof that every isometry can be decomposed as a
product of reflections.
Proposition 11.1. Let $E$ be any nontrivial Euclidean space. For any two vectors $u, v \in E$,
if $\|u\| = \|v\|$, then there is a hyperplane $H$ such that the reflection $s$ about $H$ maps $u$ to $v$,
and if $u \neq v$, then this reflection is unique.
Proof. If $u = v$, then any hyperplane containing $u$ does the job. Otherwise, we must have
$H = \{v - u\}^\perp$, and by the above formula,
$$s(u) = u - 2 \frac{(u \cdot (v - u))}{\|v - u\|^2}\, (v - u) = u + \frac{2\|u\|^2 - 2 u \cdot v}{\|v - u\|^2}\, (v - u),$$
and since
$$\|v - u\|^2 = \|u\|^2 + \|v\|^2 - 2 u \cdot v$$
and $\|u\| = \|v\|$, we have
$$s(u) = u + (v - u) = v.$$
We now show that hyperplane reflections can be used to obtain another proof of the
QR-decomposition.
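Before moving on, the key fact of Proposition 11.1 can be checked on a small example: for two vectors of equal norm, the reflection about the hyperplane orthogonal to $w = v - u$ exchanges them. The vectors below are an arbitrary choice with $\|u\| = \|v\| = 5$, and `reflect` is an illustrative helper.

```python
def dot(u, v):
    """Standard inner product on R^n."""
    return sum(a * b for a, b in zip(u, v))

def reflect(x, w):
    """Reflection of x about the hyperplane orthogonal to w: x - 2 (x.w)/(w.w) w."""
    c = 2.0 * dot(x, w) / dot(w, w)
    return [a - c * b for a, b in zip(x, w)]

u = [3.0, 4.0]
v = [5.0, 0.0]                         # same norm as u
w = [b - a for a, b in zip(u, v)]      # w = v - u

print(reflect(u, w))   # equals v
print(reflect(v, w))   # equals u
```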
11.2
First, we state the result geometrically. When translated in terms of Householder matrices,
we obtain the fact advertised earlier that every matrix (not necessarily invertible) has a
QR-decomposition.
Proposition 11.2. Let $E$ be a nontrivial Euclidean space of dimension $n$. For any orthonormal basis $(e_1, \ldots, e_n)$ and for any $n$-tuple of vectors $(v_1, \ldots, v_n)$, there is a sequence of $n$
isometries $h_1, \ldots, h_n$ such that $h_i$ is a hyperplane reflection or the identity, and if $(r_1, \ldots, r_n)$
are the vectors given by
$$r_j = h_n \circ \cdots \circ h_2 \circ h_1(v_j),$$
then every $r_j$ is a linear combination of the vectors $(e_1, \ldots, e_j)$, $1 \leq j \leq n$. Equivalently, the
matrix $R$ whose columns are the components of the $r_j$ over the basis $(e_1, \ldots, e_n)$ is an upper
triangular matrix. Furthermore, the $h_i$ can be chosen so that the diagonal entries of $R$ are
nonnegative.
Proof. We proceed by induction on $n$. For $n = 1$, we have $v_1 = \lambda e_1$ for some $\lambda \in \mathbb{R}$. If
$\lambda \geq 0$, we let $h_1 = \mathrm{id}$, else if $\lambda < 0$, we let $h_1 = -\mathrm{id}$, the reflection about the origin.
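For $n \geq 2$, we first have to find $h_1$. Let
$$r_{1,1} = \|v_1\|.$$
If $v_1 = r_{1,1} e_1$, we let $h_1 = \mathrm{id}$. Otherwise, there is a unique hyperplane reflection $h_1$ such that
$$h_1(v_1) = r_{1,1} e_1,$$
defined such that
$$h_1(u) = u - 2 \frac{(u \cdot w_1)}{\|w_1\|^2}\, w_1$$
for all $u \in E$, where
$$w_1 = r_{1,1} e_1 - v_1.$$
The map $h_1$ is the reflection about the hyperplane $H_1$ orthogonal to the vector $w_1 = r_{1,1} e_1 - v_1$. Letting
$$r_1 = h_1(v_1) = r_{1,1} e_1,$$
it is obvious that $r_1$ belongs to the subspace spanned by $e_1$, and $r_{1,1} = \|v_1\|$ is nonnegative.
Next, assume that we have found $k$ linear maps $h_1, \ldots, h_k$, hyperplane reflections or the
identity, where $1 \leq k \leq n - 1$, such that if $(r_1, \ldots, r_k)$ are the vectors given by
$$r_j = h_k \circ \cdots \circ h_2 \circ h_1(v_j),$$
then every $r_j$ is a linear combination of the vectors $(e_1, \ldots, e_j)$, $1 \leq j \leq k$. The vectors
$(e_1, \ldots, e_k)$ form a basis for the subspace denoted by $U'_k$, the vectors $(e_{k+1}, \ldots, e_n)$ form
a basis for the subspace denoted by $U''_k$, the subspaces $U'_k$ and $U''_k$ are orthogonal, and
$E = U'_k \oplus U''_k$. Let
$$u_{k+1} = h_k \circ \cdots \circ h_2 \circ h_1(v_{k+1}).$$
We can write
$$u_{k+1} = u'_{k+1} + u''_{k+1},$$
where $u'_{k+1} \in U'_k$ and $u''_{k+1} \in U''_k$. Let $r_{k+1,k+1} = \|u''_{k+1}\|$. If $u''_{k+1} = r_{k+1,k+1} e_{k+1}$, we let
$h_{k+1} = \mathrm{id}$. Otherwise, there is a unique hyperplane reflection $h_{k+1}$ such that
$h_{k+1}(u''_{k+1}) = r_{k+1,k+1} e_{k+1}$, defined such that
$$h_{k+1}(u) = u - 2 \frac{(u \cdot w_{k+1})}{\|w_{k+1}\|^2}\, w_{k+1}$$
for all $u \in E$, where $w_{k+1} = r_{k+1,k+1} e_{k+1} - u''_{k+1}$.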
The map $h_{k+1}$ is the reflection about the hyperplane $H_{k+1}$ orthogonal to the vector
$w_{k+1} = r_{k+1,k+1} e_{k+1} - u''_{k+1}$. However, since $u''_{k+1}, e_{k+1} \in U''_k$ and $U'_k$ is orthogonal to $U''_k$, the subspace
$U'_k$ is contained in $H_{k+1}$, and thus, the vectors $(r_1, \ldots, r_k)$ and $u'_{k+1}$, which belong to $U'_k$, are
invariant under $h_{k+1}$. This proves that
$$h_{k+1}(u_{k+1}) = h_{k+1}(u'_{k+1}) + h_{k+1}(u''_{k+1}) = u'_{k+1} + r_{k+1,k+1} e_{k+1}$$
is a linear combination of $(e_1, \ldots, e_{k+1})$. Letting
$$r_{k+1} = h_{k+1}(u_{k+1}) = u'_{k+1} + r_{k+1,k+1} e_{k+1},$$
since $u_{k+1} = h_k \circ \cdots \circ h_2 \circ h_1(v_{k+1})$, the vector
$$r_{k+1} = h_{k+1} \circ \cdots \circ h_2 \circ h_1(v_{k+1})$$
is a linear combination of $(e_1, \ldots, e_{k+1})$. The coefficient of $r_{k+1}$ over $e_{k+1}$ is $r_{k+1,k+1} = \|u''_{k+1}\|$,
which is nonnegative. This concludes the induction step, and thus the proof.
Remarks:
(1) Since every $h_i$ is a hyperplane reflection or the identity,
$$\rho = h_n \circ \cdots \circ h_2 \circ h_1$$
is an isometry.
(2) If we allow negative diagonal entries in $R$, the last isometry $h_n$ may be omitted.
$w_k = r_{k,k} e_k - u''_k,$
Proof. The $j$th column of $A$ can be viewed as a vector $v_j$ over the canonical basis $(e_1, \ldots, e_n)$
of $\mathbb{E}^n$ (where $(e_j)_i = 1$ if $i = j$, and $0$ otherwise, $1 \leq i, j \leq n$). Applying Proposition 11.2
to $(v_1, \ldots, v_n)$, there is a sequence of $n$ isometries $h_1, \ldots, h_n$ such that $h_i$ is a hyperplane
reflection or the identity, and if $(r_1, \ldots, r_n)$ are the vectors given by
$$r_j = h_n \circ \cdots \circ h_2 \circ h_1(v_j),$$
then every $r_j$ is a linear combination of the vectors $(e_1, \ldots, e_j)$, $1 \leq j \leq n$. Letting $R$ be the
matrix whose columns are the vectors $r_j$, and $H_i$ the matrix associated with $h_i$, it is clear
that
$$R = H_n \cdots H_2 H_1 A,$$
where $R$ is upper triangular and every $H_i$ is either a Householder matrix or the identity.
However, $h_i \circ h_i = \mathrm{id}$ for all $i$, $1 \leq i \leq n$, and so
$$v_j = h_1 \circ h_2 \circ \cdots \circ h_n(r_j)$$
for all $j$, $1 \leq j \leq n$. But $\rho = h_1 \circ h_2 \circ \cdots \circ h_n$ is an isometry represented by the orthogonal
matrix $Q = H_1 H_2 \cdots H_n$. It is clear that $A = QR$, where $R$ is upper triangular. As we noted
in Proposition 11.2, the diagonal entries of $R$ can be chosen to be nonnegative.
Remarks:
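(1) Letting
$$A_{k+1} = H_k \cdots H_2 H_1 A,$$
the matrix $A_{k+1}$ has the shape
$$A_{k+1} = \begin{pmatrix}
\times & \times & \times & u_1^{k+1} & \times & \times & \times & \times \\
0 & \times & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots \\
0 & 0 & \times & u_k^{k+1} & \times & \times & \times & \times \\
0 & 0 & 0 & u_{k+1}^{k+1} & \times & \times & \times & \times \\
0 & 0 & 0 & u_{k+2}^{k+1} & \times & \times & \times & \times \\
\vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots \\
0 & 0 & 0 & u_{n-1}^{k+1} & \times & \times & \times & \times \\
0 & 0 & 0 & u_n^{k+1} & \times & \times & \times & \times
\end{pmatrix},$$
where $\times$ stands for an unspecified entry and the $(k + 1)$th column of the matrix is the vector
$$u_{k+1} = h_k \circ \cdots \circ h_2 \circ h_1(v_{k+1}),$$
and thus
$$u'_{k+1} = (u_1^{k+1}, \ldots, u_k^{k+1})$$
and
$$u''_{k+1} = (u_{k+1}^{k+1}, u_{k+2}^{k+1}, \ldots, u_n^{k+1}).$$
If the last $n - k - 1$ entries in column $k + 1$ are all zero, there is nothing to do, and we
let $H_{k+1} = I$. Otherwise, we kill these $n - k - 1$ entries by multiplying $A_{k+1}$ on the
left by the Householder matrix $H_{k+1}$ sending
$$(0, \ldots, 0, u_{k+1}^{k+1}, \ldots, u_n^{k+1}) \quad \text{to} \quad (0, \ldots, 0, r_{k+1,k+1}, 0, \ldots, 0),$$
where $r_{k+1,k+1} = \|(u_{k+1}^{k+1}, \ldots, u_n^{k+1})\|$.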
(2) If A is invertible and the diagonal entries of R are positive, it can be shown that Q
and R are unique.
(3) If we allow negative diagonal entries in R, the matrix Hn may be omitted (Hn = I).
(4) The method allows the computation of the determinant of $A$. We have
$$\det(A) = (-1)^m\, r_{1,1} \cdots r_{n,n},$$
where $m$ is the number of Householder matrices (not the identity) among the $H_i$.
(5) The condition number of the matrix A is preserved (see Strang [105], Golub and Van
Loan [49], Trefethen and Bau [110], Kincaid and Cheney [63], or Ciarlet [24]). This is
very good for numerical stability.
(6) The method also applies to a rectangular $m \times n$ matrix. In this case, $R$ is also an
$m \times n$ matrix (and it is upper triangular).
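The procedure of Remark (1) and the determinant formula of Remark (4) can be sketched in code. The following is a minimal illustration in pure Python for real square matrices, not an implementation taken from the text: the names `householder_qr` and `det_via_qr` and the tolerance `1e-24` are assumptions made for this example, and no care is taken about the cancellation issues that a production routine would address.

```python
import math

def householder_qr(A):
    """Reduce a real square matrix A to upper triangular form R.

    Returns (R, m), where m counts the steps in which a genuine
    Householder reflection (not the identity) was applied.
    """
    n = len(A)
    R = [[float(x) for x in row] for row in A]
    m = 0
    for k in range(n):
        # u'' = entries k..n-1 of column k; r = ||u''|| is the new diagonal entry.
        u = [R[i][k] for i in range(k, n)]
        r = math.sqrt(sum(t * t for t in u))
        # w = r e_k - u''; the reflection about the hyperplane orthogonal to w
        # sends u'' to (r, 0, ..., 0).
        w = [-t for t in u]
        w[0] += r
        ww = sum(t * t for t in w)
        if ww < 1e-24:  # assumed tolerance: column already has the right shape, H = I
            continue
        m += 1
        # Apply H = I - 2 w w^T / (w . w) to rows k..n-1 of every column.
        for j in range(n):
            s = sum(w[i] * R[k + i][j] for i in range(len(w)))
            c = 2.0 * s / ww
            for i in range(len(w)):
                R[k + i][j] -= c * w[i]
    return R, m

def det_via_qr(A):
    """det(A) = (-1)^m r_11 ... r_nn, as in Remark (4)."""
    R, m = householder_qr(A)
    d = (-1.0) ** m
    for i in range(len(R)):
        d *= R[i][i]
    return d
```

For instance, on the permutation matrix `[[0.0, 1.0], [1.0, 0.0]]` a single reflection is used and `det_via_qr` returns a value close to `-1`, which matches the sign rule $(-1)^m$.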
11.3
Summary
The main concepts and results of this chapter are listed below:
Symmetry (or reflection) with respect to F and parallel to G.
Orthogonal symmetry (or reflection) with respect to F and parallel to G; reflections,
flips.
Hyperplane reflections and Householder matrices.
A key fact about reflections (Proposition 11.1).
QR-decomposition in terms of Householder transformations (Theorem 11.3).
Chapter 12
Hermitian Spaces
12.1
In this chapter we generalize the basic results of Euclidean geometry presented in Chapter
10 to vector spaces over the complex numbers. Such a generalization is inevitable, and not
simply a luxury. For example, linear maps may not have real eigenvalues, but they always
have complex eigenvalues. Furthermore, some very important classes of linear maps can
be diagonalized if they are extended to the complexification of a real vector space. This
is the case for orthogonal matrices, and, more generally, normal matrices. Also, complex
vector spaces are often the natural framework in physics or engineering, and they are more
convenient for dealing with Fourier series. However, some complications arise due to complex
conjugation.
Recall that for any complex number $z \in \mathbb{C}$, if $z = x + iy$ where $x, y \in \mathbb{R}$, we let $\Re z = x$,
the real part of $z$, and $\Im z = y$, the imaginary part of $z$. We also denote the conjugate of
$z = x + iy$ by $\overline{z} = x - iy$, and the absolute value (or length, or modulus) of $z$ by $|z|$. Recall
that $|z|^2 = z\overline{z} = x^2 + y^2$.
There are many natural situations where a map $\varphi \colon E \times E \to \mathbb{C}$ is linear in its first
argument and only semilinear in its second argument, which means that $\varphi(u, \mu v) = \overline{\mu}\,\varphi(u, v)$,
as opposed to $\varphi(u, \mu v) = \mu\,\varphi(u, v)$. For example, the natural inner product to deal with
functions $f \colon \mathbb{R} \to \mathbb{C}$, especially Fourier series, is
$$\langle f, g \rangle = \int_{-\pi}^{\pi} f(x)\overline{g(x)}\,dx,$$
which is semilinear (but not linear) in g. Thus, when generalizing a result from the real case
of a Euclidean space to the complex case, we always have to check very carefully that our
proofs do not rely on linearity in the second argument. Otherwise, we need to revise our
proofs, and sometimes the result is simply wrong!
Note that restricted to real coefficients, a sesquilinear form is bilinear (we sometimes say
$\mathbb{R}$-bilinear). The function $\Phi \colon E \to \mathbb{C}$ defined such that $\Phi(u) = \varphi(u, u)$ for all $u \in E$ is called
the quadratic form associated with $\varphi$.
The standard example of a Hermitian form on $\mathbb{C}^n$ is the map $\varphi$ defined such that
$$\varphi((x_1, \ldots, x_n), (y_1, \ldots, y_n)) = x_1\overline{y_1} + x_2\overline{y_2} + \cdots + x_n\overline{y_n}.$$
This map is also positive definite, but before dealing with these issues, we show the following
useful proposition.
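Proposition 12.1. Given a complex vector space $E$, the following properties hold:
(1) A sesquilinear form $\varphi \colon E \times E \to \mathbb{C}$ is a Hermitian form iff $\varphi(u, u) \in \mathbb{R}$ for all $u \in E$.
(2) If $\varphi \colon E \times E \to \mathbb{C}$ is a sesquilinear form, then
$$4\varphi(u, v) = \varphi(u + v, u + v) - \varphi(u - v, u - v) + i\varphi(u + iv, u + iv) - i\varphi(u - iv, u - iv),$$
and
$$2\varphi(u, v) = (1 + i)(\varphi(u, u) + \varphi(v, v)) - \varphi(u - v, u - v) - i\varphi(u - iv, u - iv).$$
These are called polarization identities.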
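Both polarization identities can be checked numerically for the standard Hermitian form on $\mathbb{C}^n$. The sketch below is only an illustration; the helper names `phi`, `quad`, `add`, `polarization_4`, and `polarization_2` and the sample vectors are assumptions made for this example.

```python
def phi(u, v):
    """Standard Hermitian form on C^n: linear in u, semilinear in v."""
    return sum(complex(a) * complex(b).conjugate() for a, b in zip(u, v))

def quad(u):
    """Quadratic form Phi(u) = phi(u, u)."""
    return phi(u, u)

def add(u, v, c=1):
    """Return the vector u + c*v."""
    return [a + c * b for a, b in zip(u, v)]

def polarization_4(u, v):
    """Recover phi(u, v) from the first polarization identity."""
    return (quad(add(u, v)) - quad(add(u, v, -1))
            + 1j * quad(add(u, v, 1j)) - 1j * quad(add(u, v, -1j))) / 4

def polarization_2(u, v):
    """Recover phi(u, v) from the second polarization identity."""
    return ((1 + 1j) * (quad(u) + quad(v))
            - quad(add(u, v, -1)) - 1j * quad(add(u, v, -1j))) / 2

u = [1 + 2j, -3j, 0.5]
v = [2 - 1j, 1 + 1j, -4]
print(abs(polarization_4(u, v) - phi(u, v)) < 1e-12)
print(abs(polarization_2(u, v) - phi(u, v)) < 1e-12)
```

Both comparisons should print `True` up to rounding, since each identity only uses values of the quadratic form.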
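Proof. (1) If $\varphi$ is a Hermitian form, then
$$\varphi(v, u) = \overline{\varphi(u, v)}$$
implies that
$$\varphi(u, u) = \overline{\varphi(u, u)},$$
and thus $\varphi(u, u) \in \mathbb{R}$. If $\varphi$ is sesquilinear and $\varphi(u, u) \in \mathbb{R}$ for all $u \in E$, then
$$\varphi(u + v, u + v) = \varphi(u, u) + \varphi(u, v) + \varphi(v, u) + \varphi(v, v),$$
which proves that
$$\varphi(u, v) + \varphi(v, u) = \alpha,$$
where $\alpha$ is real, and changing $u$ to $iu$, we have
$$i(\varphi(u, v) - \varphi(v, u)) = \beta,$$
where $\beta$ is real, and thus
$$\varphi(u, v) = \frac{\alpha - i\beta}{2} \quad\text{and}\quad \varphi(v, u) = \frac{\alpha + i\beta}{2},$$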
Proposition 12.1 shows that a sesquilinear form is completely determined by the quadratic
form $\Phi(u) = \varphi(u, u)$, even if $\varphi$ is not Hermitian. This is false for a real bilinear form, unless
it is symmetric. For example, the bilinear form $\varphi \colon \mathbb{R}^2 \times \mathbb{R}^2 \to \mathbb{R}$ defined such that
$$\varphi((x_1, y_1), (x_2, y_2)) = x_1 y_2 - x_2 y_1$$
is not identically zero, and yet it is null on the diagonal. However, a real symmetric bilinear
form is indeed determined by its values on the diagonal, as we saw in Chapter 10.
As in the Euclidean case, Hermitian forms for which $\varphi(u, u) \ge 0$ play an important role.
We warn our readers that some authors, such as Lang [69], define a pre-Hilbert space as
what we define as a Hermitian space. We prefer following the terminology used in Schwartz
[93] and Bourbaki [16]. The quantity $\varphi(u, v)$ is usually called the Hermitian product of $u$
and $v$. We will occasionally call it the inner product of $u$ and $v$.
Given a pre-Hilbert space $\langle E, \varphi \rangle$, as in the case of a Euclidean space, we also denote
$\varphi(u, v)$ by
$$u \cdot v \quad\text{or}\quad \langle u, v \rangle \quad\text{or}\quad (u \mid v),$$
and $\sqrt{\Phi(u)}$ by $\|u\|$.
Example 12.1. The complex vector space $\mathbb{C}^n$ under the Hermitian form
$$\varphi((x_1, \ldots, x_n), (y_1, \ldots, y_n)) = x_1\overline{y_1} + x_2\overline{y_2} + \cdots + x_n\overline{y_n}$$
is a Hermitian space.
Example 12.2. Let $l^2$ denote the set of all countably infinite sequences $x = (x_i)_{i \in \mathbb{N}}$ of
complex numbers such that $\sum_{i=0}^{\infty} |x_i|^2$ is defined (i.e., the sequence $\sum_{i=0}^{n} |x_i|^2$ converges as
$n \to \infty$). It can be shown that the map $\varphi \colon l^2 \times l^2 \to \mathbb{C}$ defined such that
$$\varphi((x_i)_{i \in \mathbb{N}}, (y_i)_{i \in \mathbb{N}}) = \sum_{i=0}^{\infty} x_i\overline{y_i}$$
is well defined, and $l^2$ is a Hermitian space under $\varphi$. Actually, $l^2$ is even a Hilbert space.
Example 12.3. Let $C_{\mathrm{piece}}[a, b]$ be the set of piecewise bounded continuous functions
$f \colon [a, b] \to \mathbb{C}$ under the Hermitian form
$$\langle f, g \rangle = \int_a^b f(x)\overline{g(x)}\,dx.$$
It is easy to check that this Hermitian form is positive, but it is not definite. Thus, under
this Hermitian form, $C_{\mathrm{piece}}[a, b]$ is only a pre-Hilbert space.
Example 12.4. Let $C[a, b]$ be the set of complex-valued continuous functions $f \colon [a, b] \to \mathbb{C}$
under the Hermitian form
$$\langle f, g \rangle = \int_a^b f(x)\overline{g(x)}\,dx.$$
It is easy to check that this Hermitian form is positive definite. Thus, $C[a, b]$ is a Hermitian
space.
Example 12.5. Let $E = M_n(\mathbb{C})$ be the vector space of complex $n \times n$ matrices. If we
view a matrix $A \in M_n(\mathbb{C})$ as a long column vector obtained by concatenating together its
columns, we can define the Hermitian product of two matrices $A, B \in M_n(\mathbb{C})$ as
$$\langle A, B \rangle = \sum_{i,j=1}^{n} a_{ij}\overline{b_{ij}}.$$
Since this can be viewed as the standard Hermitian product on $\mathbb{C}^{n^2}$, it is a Hermitian product
on $M_n(\mathbb{C})$. The corresponding norm
$$\|A\|_F = \sqrt{\mathrm{tr}(A^* A)}$$
is the Frobenius norm (see Section 7.2).
If $E$ is finite-dimensional and if $\varphi \colon E \times E \to \mathbb{C}$ is a sesquilinear form on $E$, given any
basis $(e_1, \ldots, e_n)$ of $E$, we can write $x = \sum_{i=1}^{n} x_i e_i$ and $y = \sum_{j=1}^{n} y_j e_j$, and we have
$$\varphi(x, y) = \varphi\left(\sum_{i=1}^{n} x_i e_i, \sum_{j=1}^{n} y_j e_j\right) = \sum_{i,j=1}^{n} x_i \overline{y_j}\, \varphi(e_i, e_j).$$
If we let $G$ be the matrix $G = (\varphi(e_i, e_j))$, and if $x$ and $y$ are the column vectors associated
with $(x_1, \ldots, x_n)$ and $(y_1, \ldots, y_n)$, then we can write
$$\varphi(x, y) = x^\top G\, \overline{y} = y^* G^\top x,$$
where $\overline{y}$ corresponds to $(\overline{y_1}, \ldots, \overline{y_n})$. As in Section 10.1, we are committing the slight abuse of
notation of letting $x$ denote both the vector $x = \sum_{i=1}^{n} x_i e_i$ and the column vector associated
with $(x_1, \ldots, x_n)$ (and similarly for $y$). The correct expression for $\varphi(x, y)$ is
$$\varphi(x, y) = \mathbf{y}^* G^\top \mathbf{x} = \mathbf{x}^\top G\, \overline{\mathbf{y}}.$$
Observe that in $\varphi(x, y) = y^* G^\top x$, the matrix involved is the transpose of $G = (\varphi(e_i, e_j))$.
Furthermore, observe that $\varphi$ is Hermitian iff $G = G^*$, and $\varphi$ is positive definite iff the
matrix $G$ is positive definite, that is,
$$x^\top G\, \overline{x} > 0 \quad\text{for all } x \in \mathbb{C}^n,\ x \neq 0.$$
The matrix $G$ associated with a Hermitian product is called the Gram matrix of the Hermitian product with respect to the basis $(e_1, \ldots, e_n)$.
Remark: To avoid the transposition in the expression for $\varphi(x, y) = y^* G^\top x$, some authors
(such as Hoffman and Kunze [62]), define the Gram matrix $G' = (g'_{ij})$ associated with $\langle -, - \rangle$
so that $(g'_{ij}) = (\varphi(e_j, e_i))$; that is, $G' = G^\top$.
Conversely, if $A$ is a Hermitian positive definite $n \times n$ matrix, it is easy to check that the
Hermitian form
$$\langle x, y \rangle = y^* A x$$
is positive definite. If we make a change of basis from the basis $(e_1, \ldots, e_n)$ to the basis
$(f_1, \ldots, f_n)$, and if the change of basis matrix is $P$ (where the $j$th column of $P$ consists of
the coordinates of $f_j$ over the basis $(e_1, \ldots, e_n)$), then with respect to coordinates $x'$ and $y'$
over the basis $(f_1, \ldots, f_n)$, we have
$$x^\top G\, \overline{y} = x'^\top P^\top G\, \overline{P}\, \overline{y'},$$
so the matrix of our inner product over the basis $(f_1, \ldots, f_n)$ is $P^\top G \overline{P} = (\overline{P})^* G \overline{P}$. We
summarize these facts in the following proposition.
Proposition 12.2. Let $E$ be a finite-dimensional vector space, and let $(e_1, \ldots, e_n)$ be a basis
of $E$.
1. For any Hermitian inner product $\langle -, - \rangle$ on $E$, if $G = (\langle e_i, e_j \rangle)$ is the Gram matrix of
the Hermitian product $\langle -, - \rangle$ w.r.t. the basis $(e_1, \ldots, e_n)$, then $G$ is Hermitian positive
definite.
2. For any change of basis matrix $P$, the Gram matrix of $\langle -, - \rangle$ with respect to the new
basis is $(\overline{P})^* G \overline{P}$.
3. If $A$ is any $n \times n$ Hermitian positive definite matrix, then
$$\langle x, y \rangle = y^* A x$$
is a Hermitian product on $E$.
We will see later that a Hermitian matrix is positive definite iff its eigenvalues are all
positive.
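As a small illustration of the Gram matrix, the sketch below computes $G = (\langle e_i, e_j \rangle)$ for a basis of $\mathbb{C}^2$ under the standard Hermitian product and compares $\langle x, y \rangle$ with $x^\top G\, \overline{y}$ computed from coordinates. The basis, the coordinate vectors, and the helper names `herm`, `gram`, `combo`, and `form_from_gram` are assumptions chosen for this example.

```python
def herm(u, v):
    """Standard Hermitian product on C^n (linear in u, semilinear in v)."""
    return sum(complex(a) * complex(b).conjugate() for a, b in zip(u, v))

def gram(basis):
    """Gram matrix G = (<e_i, e_j>) of the given basis."""
    return [[herm(ei, ej) for ej in basis] for ei in basis]

def combo(coords, basis):
    """The vector sum_i coords[i] * basis[i]."""
    n = len(basis[0])
    return [sum(c * e[k] for c, e in zip(coords, basis)) for k in range(n)]

def form_from_gram(G, x, y):
    """x^T G conj(y), where x and y are coordinate vectors over the basis."""
    return sum(x[i] * G[i][j] * complex(y[j]).conjugate()
               for i in range(len(x)) for j in range(len(y)))

basis = [[1, 1j], [0, 1]]          # a (non-orthonormal) basis of C^2
G = gram(basis)                    # [[2, 1j], [-1j, 1]] up to float formatting
x, y = [2 - 1j, 3], [1j, 1 + 1j]   # coordinates over the basis

lhs = herm(combo(x, basis), combo(y, basis))
rhs = form_from_gram(G, x, y)
is_hermitian = all(G[i][j] == G[j][i].conjugate()
                   for i in range(2) for j in range(2))
print(abs(lhs - rhs) < 1e-12, is_hermitian)
```

Here $G$ is Hermitian with determinant $1$ and positive diagonal, which is consistent with item 1 of Proposition 12.2.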
The following result reminiscent of the first polarization identity of Proposition 12.1 can
be used to prove that two linear maps are identical.
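Proposition 12.3. Given any Hermitian space $E$ with Hermitian product $\langle -, - \rangle$, for any
linear map $f \colon E \to E$, if $\langle f(x), x \rangle = 0$ for all $x \in E$, then $f = 0$.
Proof. Compute $\langle f(x + y), x + y \rangle$ and $\langle f(x - y), x - y \rangle$:
$$\langle f(x + y), x + y \rangle = \langle f(x), x \rangle + \langle f(x), y \rangle + \langle f(y), x \rangle + \langle f(y), y \rangle$$
$$\langle f(x - y), x - y \rangle = \langle f(x), x \rangle - \langle f(x), y \rangle - \langle f(y), x \rangle + \langle f(y), y \rangle;$$
then, subtract the second equation from the first, to obtain
$$\langle f(x + y), x + y \rangle - \langle f(x - y), x - y \rangle = 2(\langle f(x), y \rangle + \langle f(y), x \rangle).$$
If $\langle f(u), u \rangle = 0$ for all $u \in E$, we get
$$\langle f(x), y \rangle + \langle f(y), x \rangle = 0 \quad\text{for all } x, y \in E.$$
Then, the above equation also holds if we replace $x$ by $ix$, and we obtain
$$i\langle f(x), y \rangle - i\langle f(y), x \rangle = 0, \quad\text{for all } x, y \in E,$$
so we have
$$\langle f(x), y \rangle + \langle f(y), x \rangle = 0$$
$$\langle f(x), y \rangle - \langle f(y), x \rangle = 0,$$
which implies that $\langle f(x), y \rangle = 0$ for all $x, y \in E$. Since $\langle -, - \rangle$ is positive definite, we have
$f(x) = 0$ for all $x \in E$; that is, $f = 0$.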
One should be careful not to apply Proposition 12.3 to a linear map on a real Euclidean
space, because it is false! The reader should find a counterexample.
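One possible counterexample, offered here only as an illustration, is the rotation by $90^\circ$ in $\mathbb{R}^2$, $f(x_1, x_2) = (-x_2, x_1)$: it is nonzero, yet $\langle f(x), x \rangle = -x_2 x_1 + x_1 x_2 = 0$ for every real $x$. The sketch below (with assumed helper names `f` and `herm`) also shows that the same matrix acting on $\mathbb{C}^2$ no longer satisfies the hypothesis, as Proposition 12.3 requires.

```python
def f(x):
    """Rotation by 90 degrees: f(x1, x2) = (-x2, x1)."""
    return [-x[1], x[0]]

def herm(u, v):
    """Standard Hermitian product (reduces to the dot product on real vectors)."""
    return sum(complex(a) * complex(b).conjugate() for a, b in zip(u, v))

# On real vectors, <f(x), x> vanishes although f is not the zero map.
real_vals = [herm(f(x), x) for x in ([1, 0], [2, -3], [0.5, 4])]

# On the complex vector (1, i), the same map gives a nonzero value.
complex_val = herm(f([1, 1j]), [1, 1j])

print(real_vals)     # three zeros
print(complex_val)   # -2j
```

The complex vector $(1, i)$ is an eigenvector of $f$, which is why the hypothesis fails there.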
The Cauchy-Schwarz inequality and the Minkowski inequalities extend to pre-Hilbert
spaces and to Hermitian spaces.
Proposition 12.4. Let $\langle E, \varphi \rangle$ be a pre-Hilbert space with associated quadratic form $\Phi$. For
all $u, v \in E$, we have the Cauchy-Schwarz inequality
$$|\varphi(u, v)| \le \sqrt{\Phi(u)}\sqrt{\Phi(v)}.$$
Furthermore, if $\langle E, \varphi \rangle$ is a Hermitian space, the equality holds iff $u$ and $v$ are linearly dependent.
We also have the Minkowski inequality
$$\sqrt{\Phi(u + v)} \le \sqrt{\Phi(u)} + \sqrt{\Phi(v)}.$$
Furthermore, if $\langle E, \varphi \rangle$ is a Hermitian space, the equality holds iff $u$ and $v$ are linearly dependent, where in addition, if $u \neq 0$ and $v \neq 0$, then $u = \lambda v$ for some real $\lambda$ such that
$\lambda > 0$.
$$\Phi(u + t_0 e^{i\theta} v) = 0.$$
$$\sqrt{\Phi(u)}\sqrt{\Phi(v)}.$$
$$\sqrt{\Phi(u)}\sqrt{\Phi(v)},$$
If $\varphi$ is positive definite and $u$ and $v$ are linearly dependent, it is immediately verified that
we get an equality. Conversely, if equality holds in the Minkowski inequality, we must have
$$\Re(\varphi(u, v)) = \sqrt{\Phi(u)}\sqrt{\Phi(v)},$$
which implies that
$$|\varphi(u, v)| = \sqrt{\Phi(u)}\sqrt{\Phi(v)},$$
If $\langle E, \varphi \rangle$ is only pre-Hilbertian, $\|u\|$ is called a seminorm. In this case, the condition
$$\|u\| = 0 \quad\text{implies}\quad u = 0$$
is not necessarily true. However, the Cauchy-Schwarz inequality shows that if $\|u\| = 0$, then
$u \cdot v = 0$ for all $v \in E$.
Remark: As in the case of real vector spaces, a norm on a complex vector space is induced
by some positive definite Hermitian product $\langle -, - \rangle$ iff it satisfies the parallelogram law:
$$\|u + v\|^2 + \|u - v\|^2 = 2(\|u\|^2 + \|v\|^2).$$
This time, the Hermitian product is recovered using the polarization identity from Proposition 12.1:
$$4\langle u, v \rangle = \|u + v\|^2 - \|u - v\|^2 + i\|u + iv\|^2 - i\|u - iv\|^2.$$
It is easy to check that $\langle u, u \rangle = \|u\|^2$, and
$$\langle v, u \rangle = \overline{\langle u, v \rangle}$$
$$\langle iu, v \rangle = i\langle u, v \rangle,$$
so it is enough to check linearity in the variable $u$, and only for real scalars. This is easily
done by applying the proof from Section 10.1 to the real and imaginary part of $\langle u, v \rangle$; the
details are left as an exercise.
We will now basically mirror the presentation of Euclidean geometry given in Chapter
10 rather quickly, leaving out most proofs, except when they need to be seriously amended.
12.2
In this section we assume that we are dealing with Hermitian spaces. We denote the Hermitian inner product by $u \cdot v$ or $\langle u, v \rangle$. The concepts of orthogonality, orthogonal family of
vectors, orthonormal family of vectors, and orthogonal complement of a set of vectors are
unchanged from the Euclidean case (Definition 10.2).
For example, the set $C[-\pi, \pi]$ of continuous functions $f \colon [-\pi, \pi] \to \mathbb{C}$ is a Hermitian
space under the product
$$\langle f, g \rangle = \int_{-\pi}^{\pi} f(x)\overline{g(x)}\,dx,$$
and the family $(e^{ikx})_{k \in \mathbb{Z}}$ is orthogonal.
Propositions 10.3 and 10.4 hold without any changes. It is easy to show that
$$\left\|\sum_{i=1}^{n} u_i\right\|^2 = \sum_{i=1}^{n} \|u_i\|^2 + \sum_{1 \le i < j \le n} 2\Re(u_i \cdot u_j).$$
Analogously to the case of Euclidean spaces of finite dimension, the Hermitian product
induces a canonical bijection (i.e., independent of the choice of bases) between the vector
space $E$ and the space $E^*$. This is one of the places where conjugation shows up, but in this
case, troubles are minor.
Given a Hermitian space $E$, for any vector $u \in E$, let $\varphi^l_u \colon E \to \mathbb{C}$ be the map defined
such that
$$\varphi^l_u(v) = \overline{u \cdot v}, \quad\text{for all } v \in E.$$
Similarly, for any vector $v \in E$, let $\varphi^r_v \colon E \to \mathbb{C}$ be the map defined such that
$$\varphi^r_v(u) = u \cdot v, \quad\text{for all } u \in E.$$
Since the Hermitian product is linear in its first argument $u$, the map $\varphi^r_v$ is a linear form
in $E^*$, and since it is semilinear in its second argument $v$, the map $\varphi^l_u$ is also a linear form
in $E^*$. Thus, we have two maps $\flat^l \colon E \to E^*$ and $\flat^r \colon E \to E^*$, defined such that
$$\flat^l(u) = \varphi^l_u, \quad\text{and}\quad \flat^r(v) = \varphi^r_v.$$
for all u E
As a corollary of the isomorphism $\flat \colon E \to E^*$, if $E$ is a Hermitian space of finite dimension, then every linear form $f \in E^*$ corresponds to a unique $v \in E$, such that
$$f(u) = u \cdot v, \quad\text{for every } u \in E.$$
In particular, if $f$ is not the null form, the kernel of $f$, which is a hyperplane $H$, is precisely
the set of vectors that are orthogonal to $v$.
Remark: The musical map $\flat \colon E \to E^*$ is not surjective when $E$ has infinite dimension.
This result can be salvaged by restricting our attention to continuous linear maps, and by
assuming that the vector space $E$ is a Hilbert space.
The existence of the isomorphism $\flat \colon E \to E^*$ is crucial to the existence of adjoint maps.
Indeed, Theorem 12.5 allows us to define the adjoint of a linear map on a Hermitian space.
Let $E$ be a Hermitian space of finite dimension $n$, and let $f \colon E \to E$ be a linear map. For
every $u \in E$, the map
$$v \mapsto \overline{u \cdot f(v)}$$
is clearly a linear form in $E^*$, and by Theorem 12.5, there is a unique vector in $E$ denoted
by $f^*(u)$, such that
$$\overline{f^*(u) \cdot v} = \overline{u \cdot f(v)},$$
that is,
$$f^*(u) \cdot v = u \cdot f(v), \quad\text{for every } v \in E.$$
Proposition 12.6. Given a Hermitian space $E$ of finite dimension, for every linear map
$f \colon E \to E$ there is a unique linear map $f^* \colon E \to E$ such that
$$f^*(u) \cdot v = u \cdot f(v),$$
for all $u, v \in E$. The map $f^*$ is called the adjoint of $f$ (w.r.t. the Hermitian product).
Proof. Careful inspection of the proof of Proposition 10.6 reveals that it applies unchanged.
The only potential problem is in proving that $f^*(\lambda u) = \lambda f^*(u)$, but everything takes place
in the first argument of the Hermitian product, and there, we have linearity.
The fact that
$$v \cdot u = \overline{u \cdot v}$$
X
n
i=1
|ui |
n
X
ui vi ,
i=1
1/2
.
The fact that a Hermitian space always has an orthonormal basis implies that any Gram
matrix $G$ can be written as
$$G = Q^* Q,$$
for some invertible matrix $Q$. Indeed, we know that in a change of basis matrix, a Gram
matrix $G$ becomes $G' = (\overline{P})^* G \overline{P}$. If the basis corresponding to $G'$ is orthonormal, then
$G' = I$, so $G = (\overline{P}^{-1})^* \overline{P}^{-1}$.
Proposition 10.9 also holds unchanged.
Proposition 12.10. Given any nontrivial Hermitian space $E$ of finite dimension $n \ge 1$, for
any subspace $F$ of dimension $k$, the orthogonal complement $F^\perp$ of $F$ has dimension $n - k$,
and $E = F \oplus F^\perp$. Furthermore, we have $F^{\perp\perp} = F$.
12.3
In this section we consider linear maps between Hermitian spaces that preserve the Hermitian
norm. All definitions given for Euclidean spaces in Section 10.3 extend to Hermitian spaces,
except that orthogonal transformations are called unitary transformations, but Proposition
10.10 extends only with a modified condition (2). Indeed, the old proof that (2) implies
(3) does not work, and the implication is in fact false! It can be repaired by strengthening
condition (2). For the sake of completeness, we state the Hermitian version of Definition
10.3.
Definition 12.4. Given any two nontrivial Hermitian spaces $E$ and $F$ of the same finite
dimension $n$, a function $f \colon E \to F$ is a unitary transformation, or a linear isometry, if it is
linear and
$$\|f(u)\| = \|u\|, \quad\text{for all } u \in E.$$
Proposition 10.10 can be salvaged by strengthening condition (2).
Proposition 12.11. Given any two nontrivial Hermitian spaces $E$ and $F$ of the same finite
dimension $n$, for every function $f \colon E \to F$, the following properties are equivalent:
(1) $f$ is a linear map and $\|f(u)\| = \|u\|$, for all $u \in E$;
(2) $\|f(v) - f(u)\| = \|v - u\|$ and $f(iu) = i f(u)$, for all $u, v \in E$.
(3) $f(u) \cdot f(v) = u \cdot v$, for all $u, v \in E$.
Furthermore, such a map is bijective.
Proof. The proof that (2) implies (3) given in Proposition 10.10 needs to be revised as
follows. We use the polarization identity
$$2\varphi(u, v) = (1 + i)(\|u\|^2 + \|v\|^2) - \|u - v\|^2 - i\|u - iv\|^2.$$
Since $f(iv) = i f(v)$, we get $f(0) = 0$ by setting $v = 0$, so the function $f$ preserves distance
and norm, and we get
$$\begin{aligned}
2\varphi(f(u), f(v)) &= (1 + i)(\|f(u)\|^2 + \|f(v)\|^2) - \|f(u) - f(v)\|^2 - i\|f(u) - i f(v)\|^2 \\
&= (1 + i)(\|f(u)\|^2 + \|f(v)\|^2) - \|f(u) - f(v)\|^2 - i\|f(u) - f(iv)\|^2 \\
&= (1 + i)(\|u\|^2 + \|v\|^2) - \|u - v\|^2 - i\|u - iv\|^2 \\
&= 2\varphi(u, v),
\end{aligned}$$
which shows that $f$ preserves the Hermitian inner product, as desired. The rest of the proof
is unchanged.
Remarks:
(i) In the Euclidean case, we proved that the assumption
$$\|f(v) - f(u)\| = \|v - u\| \text{ for all } u, v \in E \text{ and } f(0) = 0 \tag{2'}$$
implies (3). For this we used the polarization identity
$$2u \cdot v = \|u\|^2 + \|v\|^2 - \|u - v\|^2.$$
(ii) If we modify (2) by changing the second condition by now requiring that there be some
$\tau \in E$ such that
$$f(\tau + iu) = f(\tau) + i(f(\tau + u) - f(\tau))$$
for all $u \in E$, then the function $g \colon E \to E$ defined such that
$$g(u) = f(\tau + u) - f(\tau)$$
satisfies the old conditions of (2), and the implications (2) $\to$ (3) and (3) $\to$ (1) prove
that $g$ is linear, and thus that $f$ is affine. In view of the first remark, some condition
involving $i$ is needed on $f$, in addition to the fact that $f$ is distance-preserving.
12.4
In this section, as a mirror image of our treatment of the isometries of a Euclidean space,
we explore some of the fundamental properties of the unitary group and of unitary matrices.
As an immediate corollary of the Gram-Schmidt orthonormalization procedure, we obtain
the QR-decomposition for invertible matrices. In the Hermitian framework, the matrix of
the adjoint of a linear map is not given by the transpose of the original matrix, but by the
conjugate of its transpose.
Definition 12.5. Given a complex $m \times n$ matrix $A$, the transpose $A^\top$ of $A$ is the $n \times m$
matrix $A^\top = (a^\top_{i j})$ defined such that
$$a^\top_{i j} = a_{j i},$$
and the conjugate $\overline{A}$ of $A$ is the $m \times n$ matrix $\overline{A} = (b_{i j})$ defined such that
$$b_{i j} = \overline{a_{i j}}$$
for all $i, j$, $1 \le i \le m$, $1 \le j \le n$. The adjoint $A^*$ of $A$ is the matrix defined such that
$$A^* = \overline{(A^\top)} = (\overline{A})^\top.$$
Proposition 12.12. Let $E$ be any Hermitian space of finite dimension $n$, and let $f \colon E \to E$
be any linear map. The following properties hold:
Remarks:
(1) The conditions $A A^* = I_n$, $A^* A = I_n$, and $A^{-1} = A^*$ are equivalent. Given any two
orthonormal bases $(u_1, \ldots, u_n)$ and $(v_1, \ldots, v_n)$, if $P$ is the change of basis matrix from
$(u_1, \ldots, u_n)$ to $(v_1, \ldots, v_n)$, it is easy to show that the matrix $P$ is unitary. The proof
of Proposition 12.11 (3) also shows that if $f$ is an isometry, then the image of an
orthonormal basis $(u_1, \ldots, u_n)$ is an orthonormal basis.
(2) Using the explicit formula for the determinant, we see immediately that
$$\det(\overline{A}) = \overline{\det(A)}.$$
If $f$ is unitary and $A$ is its matrix with respect to any orthonormal basis, from $AA^* = I$,
we get
$$\det(AA^*) = \det(A)\det(A^*) = \det(A)\overline{\det(A^\top)} = \det(A)\overline{\det(A)} = |\det(A)|^2,$$
and so $|\det(A)| = 1$. It is clear that the isometries of a Hermitian space of dimension
$n$ form a group, and that the isometries of determinant $+1$ form a subgroup.
This leads to the following definition.
Definition 12.7. Given a Hermitian space $E$ of dimension $n$, the set of isometries $f \colon E \to E$
forms a subgroup of $\mathbf{GL}(E, \mathbb{C})$ denoted by $\mathbf{U}(E)$, or $\mathbf{U}(n)$ when $E = \mathbb{C}^n$, called the
unitary group (of $E$). For every isometry $f$ we have $|\det(f)| = 1$, where $\det(f)$ denotes
the determinant of $f$. The isometries such that $\det(f) = 1$ are called rotations, or proper
isometries, or proper unitary transformations, and they form a subgroup of the special
linear group $\mathbf{SL}(E, \mathbb{C})$ (and of $\mathbf{U}(E)$), denoted by $\mathbf{SU}(E)$, or $\mathbf{SU}(n)$ when $E = \mathbb{C}^n$, called
the special unitary group (of $E$). The isometries such that $\det(f) \neq 1$ are called improper
isometries, or improper unitary transformations, or flip transformations.
A very important example of unitary matrices is provided by Fourier matrices (up to a
factor of $\sqrt{n}$), matrices that arise in the various versions of the discrete Fourier transform.
For more on this topic, see the problems, and Strang [104, 106].
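To make the remark about Fourier matrices concrete, the following sketch builds the $n \times n$ matrix with entries $\omega^{jk}/\sqrt{n}$ and checks that $A^* A = I_n$. The sign convention $\omega = e^{-2\pi i/n}$ and the helper names `fourier_matrix`, `adjoint`, `matmul`, and `is_unitary` are assumptions made for this illustration; other conventions differ only by conjugation.

```python
import cmath
import math

def fourier_matrix(n, normalized=True):
    """Matrix with entries omega^(j*k), omega = exp(-2*pi*i/n), optionally divided by sqrt(n)."""
    scale = math.sqrt(n) if normalized else 1.0
    return [[cmath.exp(-2j * math.pi * j * k / n) / scale for k in range(n)]
            for j in range(n)]

def adjoint(A):
    """Conjugate transpose A^*."""
    return [[A[i][j].conjugate() for i in range(len(A))] for j in range(len(A[0]))]

def matmul(A, B):
    """Plain matrix product of two list-of-lists matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def is_unitary(A, tol=1e-12):
    """Check A^* A = I up to the tolerance tol."""
    P = matmul(adjoint(A), A)
    n = len(A)
    return all(abs(P[i][j] - (1 if i == j else 0)) < tol
               for i in range(n) for j in range(n))

print(is_unitary(fourier_matrix(8)))                    # True
print(is_unitary(fourier_matrix(8, normalized=False)))  # False: A^* A = 8 I
```

Without the factor $1/\sqrt{n}$ the product $A^* A$ equals $n I_n$, which is why the text says "up to a factor of $\sqrt{n}$".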
Now that we have the definition of a unitary matrix, we can explain how the Gram-Schmidt
orthonormalization procedure immediately yields the QR-decomposition for matrices.
Proposition 12.13. Given any $n \times n$ complex matrix $A$, if $A$ is invertible, then there is a
unitary matrix $Q$ and an upper triangular matrix $R$ with positive diagonal entries such that
$A = QR$.
The proof is absolutely the same as in the real case!
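The construction behind Proposition 12.13 can be sketched as follows, applying classical Gram-Schmidt to the columns of $A$ with the standard Hermitian product on $\mathbb{C}^n$. This is a minimal illustration, not a numerically robust routine; the function names and the tolerance `1e-12` are assumptions made for this example.

```python
import math

def herm(u, v):
    """Standard Hermitian product: linear in u, semilinear in v."""
    return sum(complex(a) * complex(b).conjugate() for a, b in zip(u, v))

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def adjoint(A):
    return [[complex(A[i][j]).conjugate() for i in range(len(A))]
            for j in range(len(A[0]))]

def qr_gram_schmidt(A):
    """Return (Q, R) with A = QR, Q unitary, R upper triangular with positive diagonal."""
    n = len(A)
    cols = [[complex(A[i][j]) for i in range(n)] for j in range(n)]
    q_cols = []
    R = [[0j] * n for _ in range(n)]
    for k, a in enumerate(cols):
        w = a[:]
        for j, q in enumerate(q_cols):
            R[j][k] = herm(a, q)          # component of a_k along q_j
            w = [wi - R[j][k] * qi for wi, qi in zip(w, q)]
        norm = math.sqrt(herm(w, w).real)
        if norm < 1e-12:                  # assumed tolerance
            raise ValueError("matrix is (numerically) not invertible")
        R[k][k] = complex(norm)           # positive diagonal entry
        q_cols.append([wi / norm for wi in w])
    Q = [[q_cols[j][i] for j in range(n)] for i in range(n)]
    return Q, R

A = [[1, 1j], [1j, 2]]
Q, R = qr_gram_schmidt(A)
QR = matmul(Q, R)
print(all(abs(QR[i][j] - A[i][j]) < 1e-12 for i in range(2) for j in range(2)))
```

The diagonal entries of $R$ are the norms of the orthogonalized columns, so they are positive exactly when the columns of $A$ are linearly independent.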
Due to space limitations, we will not study the isometries of a Hermitian space in this
chapter. However, the reader will find such a study in the supplements on the web site (see
http://www.cis.upenn.edu/~jean/gbooks/geom2.html).
12.5
In this section, we assume that the field $K$ is not a field of characteristic 2. Recall that a
linear map $f \colon E \to E$ is an involution iff $f^2 = \mathrm{id}$, and is idempotent iff $f^2 = f$. We know
from Proposition 4.7 that if $f$ is idempotent, then
$$E = \mathrm{Im}(f) \oplus \mathrm{Ker}(f),$$
and that the restriction of $f$ to its image is the identity. For this reason, a linear idempotent
map is called a projection. The connection between involutions and projections is given by the
following simple proposition.
Proposition 12.14. For any linear map $f \colon E \to E$, we have $f^2 = \mathrm{id}$ iff $\frac{1}{2}(\mathrm{id} - f)$ is a
projection iff $\frac{1}{2}(\mathrm{id} + f)$ is a projection; in this case, $f$ is equal to the difference of the two
projections $\frac{1}{2}(\mathrm{id} + f)$ and $\frac{1}{2}(\mathrm{id} - f)$.
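Proof. We have
$$\left(\frac{1}{2}(\mathrm{id} - f)\right)^2 = \frac{1}{4}(\mathrm{id} - 2f + f^2),$$
so
$$\left(\frac{1}{2}(\mathrm{id} - f)\right)^2 = \frac{1}{2}(\mathrm{id} - f) \quad\text{iff}\quad f^2 = \mathrm{id}.$$
We also have
$$\left(\frac{1}{2}(\mathrm{id} + f)\right)^2 = \frac{1}{4}(\mathrm{id} + 2f + f^2),$$
so
$$\left(\frac{1}{2}(\mathrm{id} + f)\right)^2 = \frac{1}{2}(\mathrm{id} + f) \quad\text{iff}\quad f^2 = \mathrm{id}.$$
Let $U^+ = \mathrm{Ker}\left(\frac{1}{2}(\mathrm{id} - f)\right)$ and $U^- = \mathrm{Im}\left(\frac{1}{2}(\mathrm{id} - f)\right)$. If $f^2 = \mathrm{id}$, then
$(\mathrm{id} - f) \circ (\mathrm{id} + f) = \mathrm{id} - f^2 = 0$, so $\mathrm{Im}\left(\frac{1}{2}(\mathrm{id} + f)\right) \subseteq \mathrm{Ker}\left(\frac{1}{2}(\mathrm{id} - f)\right)$. Conversely, if
$u \in \mathrm{Ker}\left(\frac{1}{2}(\mathrm{id} - f)\right)$, then $f(u) = u$, so $\frac{1}{2}(\mathrm{id} + f)(u) = u$, and thus
$$\mathrm{Ker}\left(\frac{1}{2}(\mathrm{id} - f)\right) \subseteq \mathrm{Im}\left(\frac{1}{2}(\mathrm{id} + f)\right).$$
Therefore,
$$U^+ = \mathrm{Ker}\left(\frac{1}{2}(\mathrm{id} - f)\right) = \mathrm{Im}\left(\frac{1}{2}(\mathrm{id} + f)\right),$$
and so, $f(u) = u$ on $U^+$ and $f(u) = -u$ on $U^-$. The involutions of $E$ that are unitary
transformations are characterized as follows.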
Proposition 12.15. Let $f \in \mathbf{GL}(E)$ be an involution. The following properties are equivalent:
(a) The map $f$ is unitary; that is, $f \in \mathbf{U}(E)$.
(b) The subspaces $U^- = \mathrm{Im}(\frac{1}{2}(\mathrm{id} - f))$ and $U^+ = \mathrm{Im}(\frac{1}{2}(\mathrm{id} + f))$ are orthogonal.
Furthermore, if $E$ is finite-dimensional, then (a) and (b) are equivalent to
(c) The map is self-adjoint; that is, $f = f^*$.
Proof. If $f$ is unitary, then from $\langle f(u), f(v) \rangle = \langle u, v \rangle$ for all $u, v \in E$, we see that if $u \in U^+$
and $v \in U^-$, we get
$$\langle u, v \rangle = \langle f(u), f(v) \rangle = \langle u, -v \rangle = -\langle u, v \rangle,$$
so $2\langle u, v \rangle = 0$, which implies $\langle u, v \rangle = 0$, that is, $U^+$ and $U^-$ are orthogonal. Thus, (a)
implies (b).
Conversely, if (b) holds, since $f(u) = u$ on $U^+$ and $f(u) = -u$ on $U^-$, we see that
$\langle f(u), f(v) \rangle = \langle u, v \rangle$ if $u, v \in U^+$ or if $u, v \in U^-$. Since $E = U^+ \oplus U^-$ and since $U^+$ and $U^-$
are orthogonal, we also have $\langle f(u), f(v) \rangle = \langle u, v \rangle$ for all $u, v \in E$, and (b) implies (a).
A unitary involution is the identity on $U^+ = \mathrm{Im}(\frac{1}{2}(\mathrm{id} + f))$, and $f(v) = -v$ for all
$v \in U^- = \mathrm{Im}(\frac{1}{2}(\mathrm{id} - f))$. Furthermore, $E$ is an orthogonal direct sum $E = U^+ \oplus U^-$. We
say that $f$ is an orthogonal reflection about $U^+$. In the special case where $U^+$ is a hyperplane,
we say that $f$ is a hyperplane reflection. We already studied hyperplane reflections in the
Euclidean case; see Chapter 11.
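If $f \colon E \to E$ is a projection ($f^2 = f$), then
$$(\mathrm{id} - 2f)^2 = \mathrm{id} - 4f + 4f^2 = \mathrm{id} - 4f + 4f = \mathrm{id},$$
so $\mathrm{id} - 2f$ is an involution. As a consequence, we get the following result.
Proposition 12.16. If $f \colon E \to E$ is a projection ($f^2 = f$), then $\mathrm{Ker}(f)$ and $\mathrm{Im}(f)$ are
orthogonal iff $f^* = f$.
Proof. Apply Proposition 12.15 to $g = \mathrm{id} - 2f$. Since $\mathrm{id} - g = 2f$ we have
$$U^+ = \mathrm{Ker}\left(\frac{1}{2}(\mathrm{id} - g)\right) = \mathrm{Ker}(f)$$
and
$$U^- = \mathrm{Im}\left(\frac{1}{2}(\mathrm{id} - g)\right) = \mathrm{Im}(f),$$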
12.6
Dual Norms
In the remark following the proof of Proposition 7.8, we explained that if $(E, \|\ \|)$ and
$(F, \|\ \|)$ are two normed vector spaces and if we let $L(E; F)$ denote the set of all continuous
(equivalently, bounded) linear maps from $E$ to $F$, then, we can define the operator norm (or
subordinate norm) $\|\ \|$ on $L(E; F)$ as follows: for every $f \in L(E; F)$,
$$\|f\| = \sup_{\substack{x \in E \\ x \neq 0}} \frac{\|f(x)\|}{\|x\|} = \sup_{\substack{x \in E \\ \|x\| = 1}} \|f(x)\|.$$
In particular, if $F = \mathbb{C}$, then $L(E; F) = E'$ is the dual space of $E$, and we get the operator
norm denoted by $\|\ \|_*$ given by
$$\|f\|_* = \sup_{\substack{x \in E \\ \|x\| = 1}} |f(x)|.$$
be the dual norm of $\|\ \|$ (on $E$). If $E$ is a real Euclidean space, then the dual norm is defined
by
$$\|y\|^D = \sup_{\substack{x \in E \\ \|x\| = 1}} \langle x, y \rangle$$
for all $y \in E$.
Beware that $\|\ \|$ is generally not the Hermitian norm associated with the Hermitian inner
product. The dual norm shows up in convex programming; see Boyd and Vandenberghe [17],
Chapters 2, 3, 6, 9.
The fact that $\|\ \|^D$ is a norm follows from the fact that $\|\ \|_*$ is a norm and can also be
checked directly. It is worth noting that the triangle inequality for $\|\ \|^D$ comes for free, in
the sense that it holds for any function $p \colon E \to \mathbb{R}$. Indeed, we have
$$p^D(x + y) = \sup_{p(z) = 1} |\langle z, x + y \rangle| \le \sup_{p(z) = 1} |\langle z, x \rangle| + \sup_{p(z) = 1} |\langle z, y \rangle| = p^D(x) + p^D(y).$$
If $p \colon E \to \mathbb{R}$ is a function such that
(1) $p(x) \ge 0$ for all $x \in E$, and $p(x) = 0$ iff $x = 0$;
(2) $p(\lambda x) = |\lambda| p(x)$, for all $x \in E$ and all $\lambda \in \mathbb{C}$;
then we say that $p$ is a pre-norm. Obviously, every norm is a pre-norm, but a pre-norm
may not satisfy the triangle inequality. However, we just showed that the dual norm of any
pre-norm is actually a norm.
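Since $E$ is finite dimensional, the unit sphere $S^{n-1} = \{x \in E \mid \|x\| = 1\}$ is compact, so
there is some $x_0 \in S^{n-1}$ such that
$$\|y\|^D = |\langle x_0, y \rangle|.$$
If $\langle x_0, y \rangle = \rho e^{i\theta}$, with $\rho \ge 0$, then
$$|\langle e^{-i\theta} x_0, y \rangle| = |e^{-i\theta} \langle x_0, y \rangle| = |e^{-i\theta} \rho e^{i\theta}| = \rho,$$
so
$$\|y\|^D = \rho = |\langle e^{-i\theta} x_0, y \rangle|,$$
with $\|e^{-i\theta} x_0\| = \|x_0\| = 1$. On the other hand,
$$\Re\langle x, y \rangle \le |\langle x, y \rangle|,$$
so we get
$$\|y\|^D = \sup_{\substack{x \in E \\ \|x\| = 1}} |\langle x, y \rangle| = \sup_{\substack{x \in E \\ \|x\| = 1}} \Re\langle x, y \rangle.$$
which yields
$$|\langle x, y \rangle| \le \|x\| \, \|y\|^D.$$
The second inequality holds because $|\langle x, y \rangle| = |\langle y, x \rangle|$.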
$$\|y\|^D_\infty = \|y\|_1$$
$$\|y\|^D_2 = \|y\|_2.$$
Thus, the Euclidean norm is autodual. More generally, if $p, q \ge 1$ and $1/p + 1/q = 1$, we
have
$$\|y\|^D_p = \|y\|_q.$$
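The pairing between the $\ell^1$ and $\ell^\infty$ norms can be checked directly in the real Euclidean case. A linear functional attains its maximum over a compact convex polytope at a vertex, so it suffices to evaluate $\langle x, y \rangle$ at the vertices of the corresponding unit ball. The sketch below is an illustration under that assumption, restricted to real vectors; the function names `dual_of_l1` and `dual_of_linf` are hypothetical.

```python
from itertools import product

def dot(x, y):
    """Real Euclidean inner product."""
    return sum(a * b for a, b in zip(x, y))

def dual_of_l1(y):
    """sup of <x, y> over ||x||_1 = 1, evaluated at the vertices +/- e_i."""
    n = len(y)
    vertices = []
    for i in range(n):
        for s in (1.0, -1.0):
            e = [0.0] * n
            e[i] = s
            vertices.append(e)
    return max(dot(x, y) for x in vertices)

def dual_of_linf(y):
    """sup of <x, y> over ||x||_inf = 1, evaluated at the sign vectors."""
    return max(dot(x, y) for x in product((1.0, -1.0), repeat=len(y)))

y = [3.0, -4.0, 1.5]
print(dual_of_l1(y))    # 4.0 = max |y_i|  (the l-infinity norm of y)
print(dual_of_linf(y))  # 8.5 = sum |y_i|  (the l-1 norm of y)
```

The enumeration of sign vectors grows as $2^n$, so this is only meant for small examples.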
It can also be shown that the dual of the spectral norm is the trace norm (or nuclear norm)
from Section 16.3. We close this section by stating the following duality theorem.
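Theorem 12.18. If $E$ is a finite-dimensional Hermitian space, then for any norm $\|\ \|$ on
$E$, we have
$$\|y\|^{DD} = \|y\|$$
for all $y \in E$.
Proof. By Proposition 12.17, we have
$$|\langle x, y \rangle| \le \|x\|^D \|y\|,$$
so we get
$$\|y\|^{DD} = \sup_{\|x\|^D = 1} |\langle x, y \rangle| \le \|y\|, \quad\text{for all } y \in E.$$
It remains to prove that
$$\|y\| \le \|y\|^{DD}, \quad\text{for all } y \in E.$$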
Proofs of this fact can be found in Horn and Johnson [57] (Section 5.5), and in Serre [96]
(Chapter 7). The proof makes use of the fact that a nonempty, closed, convex set has a
supporting hyperplane through each of its boundary points, a result known as Minkowski's
lemma. This result is a consequence of the Hahn-Banach theorem; see Gallier [44]. We give
the proof in the case where $E$ is a real Euclidean space. Some minor modifications have to
be made when dealing with complex vector spaces and are left as an exercise.
Since the unit ball $B = \{z \in E \mid \|z\| \le 1\}$ is closed and convex, the Minkowski lemma
says for every $x$ such that $\|x\| = 1$, there is an affine map $g$, of the form
$$g(z) = \langle z, w \rangle - \langle x, w \rangle$$
with $\|w\| = 1$, such that $g(x) = 0$ and $g(z) \le 0$ for all $z$ such that $\|z\| \le 1$. Then, it is clear
that
$$\sup_{\|z\| = 1} \langle z, w \rangle = \langle x, w \rangle,$$
and so
$$\|w\|^D = \langle x, w \rangle.$$
It follows that
$$\|x\|^{DD} \ge \langle w / \|w\|^D, x \rangle = \frac{\langle x, w \rangle}{\|w\|^D} = 1 = \|x\|$$
for all $x$ such that $\|x\| = 1$. By homogeneity, this is true for all $y \in E$, which completes the
proof in the real case. When $E$ is a complex vector space, we have to view the unit ball $B$
as a closed convex set in $\mathbb{R}^{2n}$ and we use the fact that there is a real affine map of the form
$$g(z) = \Re\langle z, w \rangle - \Re\langle x, w \rangle$$
such that $g(x) = 0$ and $g(z) \le 0$ for all $z$ with $\|z\| = 1$, so that $\|w\|^D = \Re\langle x, w \rangle$.
More details on dual norms and unitarily invariant norms can be found in Horn and
Johnson [57] (Chapters 5 and 7).
12.7
Summary
The main concepts and results of this chapter are listed below:
Semilinear maps.
Sesquilinear forms; Hermitian forms.
Quadratic form associated with a sesquilinear form.
Polarization identities.
Positive and positive definite Hermitian forms; pre-Hilbert spaces, Hermitian spaces.
Gram matrix associated with a Hermitian product.
The Cauchy-Schwarz inequality and the Minkowski inequality.
Hermitian inner product, Hermitian norm.
The parallelogram law.
The musical isomorphisms $\flat \colon E \to E^*$ and $\sharp \colon E^* \to E$; Theorem 12.5 ($E$ is finite-dimensional).
The adjoint of a linear map (with respect to a Hermitian inner product).
Existence of orthonormal bases in a Hermitian space (Proposition 12.8).
Gram-Schmidt orthonormalization procedure.
Chapter 13
Spectral Theorems in Euclidean and
Hermitian Spaces
13.1
Introduction
The goal of this chapter is to show that there are nice normal forms for symmetric matrices,
skew-symmetric matrices, orthogonal matrices, and normal matrices. The spectral theorem
for symmetric matrices states that symmetric matrices have real eigenvalues and that they
can be diagonalized over an orthonormal basis. The spectral theorem for Hermitian matrices
states that Hermitian matrices also have real eigenvalues and that they can be diagonalized
over a complex orthonormal basis. Normal real matrices can be block diagonalized over an
orthonormal basis with blocks having size at most two, and there are refinements of this
normal form for skew-symmetric and orthogonal matrices.
13.2
We begin by studying normal maps, to understand the structure of their eigenvalues and
eigenvectors. This section and the next two were inspired by Lang [67], Artin [4], Mac Lane
and Birkhoff [73], Berger [8], and Bertin [12].
Definition 13.1. Given a Euclidean space $E$, a linear map $f \colon E \to E$ is normal if
$$f \circ f^* = f^* \circ f.$$
A linear map $f \colon E \to E$ is self-adjoint if $f = f^*$, skew-self-adjoint if $f = -f^*$, and orthogonal
if $f \circ f^* = f^* \circ f = \mathrm{id}$.
Obviously, a self-adjoint, skew-self-adjoint, or orthogonal linear map is a normal linear
map. Our first goal is to show that for every normal linear map $f \colon E \to E$, there is an
orthonormal basis (w.r.t. $\langle -, - \rangle$) such that the matrix of $f$ over this basis has an especially
nice form: it is a block diagonal matrix in which the blocks are either one-dimensional
matrices (i.e., single entries) or two-dimensional matrices of the form
$$\begin{pmatrix} \lambda & \mu \\ -\mu & \lambda \end{pmatrix}.$$
This normal form can be further refined if $f$ is self-adjoint, skew-self-adjoint, or orthogonal. As a first step, we show that $f$ and $f^*$ have the same kernel when $f$ is normal.
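Proposition 13.1. Given a Euclidean space $E$, if $f \colon E \to E$ is a normal linear map, then
$\mathrm{Ker}\, f = \mathrm{Ker}\, f^*$.
Proof. First, let us prove that
$$\langle f(u), f(v) \rangle = \langle f^*(u), f^*(v) \rangle$$
for all $u, v \in E$. Since $f^*$ is the adjoint of $f$ and $f \circ f^* = f^* \circ f$, we have
$$\begin{aligned}
\langle f(u), f(u) \rangle &= \langle u, (f^* \circ f)(u) \rangle, \\
&= \langle u, (f \circ f^*)(u) \rangle, \\
&= \langle f^*(u), f^*(u) \rangle.
\end{aligned}$$
Since $\langle -, - \rangle$ is positive definite,
$$\langle f(u), f(u) \rangle = 0 \quad\text{iff}\quad f(u) = 0,$$
$$\langle f^*(u), f^*(u) \rangle = 0 \quad\text{iff}\quad f^*(u) = 0,$$
and since
$$\langle f(u), f(u) \rangle = \langle f^*(u), f^*(u) \rangle,$$
we have
$$f(u) = 0 \quad\text{iff}\quad f^*(u) = 0.$$
Consequently, $\mathrm{Ker}\, f = \mathrm{Ker}\, f^*$.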
The next step is to show that for every linear map $f\colon E \to E$ there is some subspace $W$ of dimension 1 or 2 such that $f(W) \subseteq W$. When $\dim(W) = 1$, the subspace $W$ is actually an eigenspace for some real eigenvalue of $f$. Furthermore, when $f$ is normal, there is a subspace $W$ of dimension 1 or 2 such that $f(W) \subseteq W$ and $f^*(W) \subseteq W$. The difficulty is that the eigenvalues of $f$ are not necessarily real. One way to get around this problem is to complexify both the vector space $E$ and the inner product $\langle -, - \rangle$.
Every real vector space $E$ can be embedded into a complex vector space $E_{\mathbb{C}}$, and every linear map $f\colon E \to E$ can be extended to a linear map $f_{\mathbb{C}}\colon E_{\mathbb{C}} \to E_{\mathbb{C}}$.
Definition 13.2. Given a real vector space $E$, let $E_{\mathbb{C}}$ be the structure $E \times E$ under the addition operation
$$(u_1, u_2) + (v_1, v_2) = (u_1 + v_1, u_2 + v_2),$$
and let multiplication by a complex scalar $z = x + iy$ be defined such that
$$(x + iy) \cdot (u, v) = (xu - yv, yu + xv).$$
The space $E_{\mathbb{C}}$ is called the complexification of $E$.
It is easily shown that the structure $E_{\mathbb{C}}$ is a complex vector space. It is also immediate that
$$(0, v) = i(v, 0),$$
and thus, identifying $E$ with the subspace of $E_{\mathbb{C}}$ consisting of all vectors of the form $(u, 0)$, we can write
$$(u, v) = u + iv.$$
Observe that if $(e_1, \ldots, e_n)$ is a basis of $E$ (a real vector space), then $(e_1, \ldots, e_n)$ is also a basis of $E_{\mathbb{C}}$ (recall that $e_i$ is an abbreviation for $(e_i, 0)$).
A linear map $f\colon E \to E$ is extended to the linear map $f_{\mathbb{C}}\colon E_{\mathbb{C}} \to E_{\mathbb{C}}$ defined such that
$$f_{\mathbb{C}}(u + iv) = f(u) + i f(v).$$
For any basis $(e_1, \ldots, e_n)$ of $E$, the matrix $M(f)$ representing $f$ over $(e_1, \ldots, e_n)$ is identical to the matrix $M(f_{\mathbb{C}})$ representing $f_{\mathbb{C}}$ over $(e_1, \ldots, e_n)$, where we view $(e_1, \ldots, e_n)$ as a basis of $E_{\mathbb{C}}$. As a consequence, $\det(zI - M(f)) = \det(zI - M(f_{\mathbb{C}}))$, which means that $f$ and $f_{\mathbb{C}}$ have the same characteristic polynomial (which has real coefficients). We know that every polynomial of degree $n$ with real (or complex) coefficients always has $n$ complex roots (counted with their multiplicity), and the roots of $\det(zI - M(f_{\mathbb{C}}))$ that are real (if any) are the eigenvalues of $f$.
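Next, we need to extend the inner product on $E$ to an inner product on $E_{\mathbb{C}}$.
The inner product $\langle -, - \rangle$ on a Euclidean space $E$ is extended to the Hermitian positive definite form $\langle -, - \rangle_{\mathbb{C}}$ on $E_{\mathbb{C}}$ as follows:
$$\langle u_1 + iv_1, u_2 + iv_2 \rangle_{\mathbb{C}} = \langle u_1, u_2 \rangle + \langle v_1, v_2 \rangle + i(\langle u_2, v_1 \rangle - \langle u_1, v_2 \rangle).$$
It is easily verified that $\langle -, - \rangle_{\mathbb{C}}$ is indeed a Hermitian form that is positive definite, and it is clear that $\langle -, - \rangle_{\mathbb{C}}$ agrees with $\langle -, - \rangle$ on real vectors. Then, given any linear map $f\colon E \to E$, it is easily verified that the map $f^*_{\mathbb{C}}$ defined such that
$$f^*_{\mathbb{C}}(u + iv) = f^*(u) + i f^*(v)$$
for all $u, v \in E$ is the adjoint of $f_{\mathbb{C}}$ w.r.t. $\langle -, - \rangle_{\mathbb{C}}$.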
Assuming again that E is a Hermitian space, observe that Proposition 13.1 also holds.
We deduce the following corollary.
Proposition 13.2. Given a Hermitian space $E$, for any normal linear map $f\colon E \to E$, we have $\mathrm{Ker}(f) \cap \mathrm{Im}(f) = (0)$.
Proof. Assume $v \in \mathrm{Ker}(f) \cap \mathrm{Im}(f)$, which means that $v = f(u)$ for some $u \in E$, and $f(v) = 0$. By Proposition 13.1, $\mathrm{Ker}(f) = \mathrm{Ker}(f^*)$, so $f(v) = 0$ implies that $f^*(v) = 0$. Consequently,
$$0 = \langle f^*(v), u \rangle = \langle v, f(u) \rangle = \langle v, v \rangle,$$
and thus, $v = 0$.
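We also have the following crucial proposition relating the eigenvalues of $f$ and $f^*$.
Proposition 13.3. Given a Hermitian space $E$, for any normal linear map $f\colon E \to E$, a vector $u$ is an eigenvector of $f$ for the eigenvalue $\lambda$ (in $\mathbb{C}$) iff $u$ is an eigenvector of $f^*$ for the eigenvalue $\overline{\lambda}$.
Proof. First, it is immediately verified that the adjoint of $f - \lambda\, \mathrm{id}$ is $f^* - \overline{\lambda}\, \mathrm{id}$. Furthermore, $f - \lambda\, \mathrm{id}$ is normal. Indeed,
$$\begin{aligned}
(f - \lambda\, \mathrm{id}) \circ (f - \lambda\, \mathrm{id})^* &= (f - \lambda\, \mathrm{id}) \circ (f^* - \overline{\lambda}\, \mathrm{id}) \\
&= f \circ f^* - \overline{\lambda} f - \lambda f^* + \lambda \overline{\lambda}\, \mathrm{id} \\
&= f^* \circ f - \lambda f^* - \overline{\lambda} f + \overline{\lambda} \lambda\, \mathrm{id} \\
&= (f^* - \overline{\lambda}\, \mathrm{id}) \circ (f - \lambda\, \mathrm{id}) \\
&= (f - \lambda\, \mathrm{id})^* \circ (f - \lambda\, \mathrm{id}).
\end{aligned}$$
Applying Proposition 13.1 to $f - \lambda\, \mathrm{id}$, for every nonzero vector $u$, we see that
$$(f - \lambda\, \mathrm{id})(u) = 0 \text{ iff } (f^* - \overline{\lambda}\, \mathrm{id})(u) = 0,$$
which is exactly the statement of the proposition.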
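The next proposition shows a very important property of normal linear maps: eigenvectors corresponding to distinct eigenvalues are orthogonal.
Proposition 13.4. Given a Hermitian space $E$, for any normal linear map $f\colon E \to E$, if $u$ and $v$ are eigenvectors of $f$ associated with the eigenvalues $\lambda$ and $\mu$ (in $\mathbb{C}$) where $\lambda \neq \mu$, then $\langle u, v \rangle = 0$.
Proof. Let us compute $\langle f(u), v \rangle$ in two different ways. Since $v$ is an eigenvector of $f$ for $\mu$, by Proposition 13.3, $v$ is also an eigenvector of $f^*$ for $\overline{\mu}$, and we have
$$\langle f(u), v \rangle = \langle \lambda u, v \rangle = \lambda \langle u, v \rangle$$
and
$$\langle f(u), v \rangle = \langle u, f^*(v) \rangle = \langle u, \overline{\mu} v \rangle = \mu \langle u, v \rangle,$$
where the last identity holds because of the semilinearity in the second argument, and thus
$$\lambda \langle u, v \rangle = \mu \langle u, v \rangle,$$
that is,
$$(\lambda - \mu) \langle u, v \rangle = 0,$$
which implies that $\langle u, v \rangle = 0$, since $\lambda \neq \mu$.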
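As a small illustration of Proposition 13.4 (a sketch, not part of the original development), consider the cyclic shift matrix, which is orthogonal and hence normal, and whose eigenvalues are the distinct $n$th roots of unity. The unit eigenvectors returned by NumPy should then be pairwise orthogonal for the Hermitian inner product. The choice of this matrix is an assumption made for the example.

```python
import numpy as np

n = 5
# Cyclic shift: C maps e_j to e_{j+1 mod n}; C is orthogonal, hence normal.
C = np.roll(np.eye(n), 1, axis=0)

eigvals, V = np.linalg.eig(C)   # columns of V are unit eigenvectors
gram = V.conj().T @ V           # Gram matrix for the Hermitian inner product

# Distinct eigenvalues of a normal matrix give orthogonal eigenvectors,
# so the Gram matrix should be close to the identity.
print(np.round(np.abs(gram), 6))
```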
We can also show easily that the eigenvalues of a self-adjoint linear map are real.
Proposition 13.5. Given a Hermitian space $E$, all the eigenvalues of any self-adjoint linear map $f\colon E \to E$ are real.
Proof. Let $z$ (in $\mathbb{C}$) be an eigenvalue of $f$ and let $u$ be an eigenvector for $z$. We compute $\langle f(u), u \rangle$ in two different ways. We have
$$\langle f(u), u \rangle = \langle zu, u \rangle = z \langle u, u \rangle,$$
and since $f = f^*$, we also have
$$\langle f(u), u \rangle = \langle u, f^*(u) \rangle = \langle u, f(u) \rangle = \langle u, zu \rangle = \overline{z} \langle u, u \rangle.$$
Thus,
$$z \langle u, u \rangle = \overline{z} \langle u, u \rangle,$$
which implies that $z = \overline{z}$, since $u \neq 0$, and $z$ is indeed real.
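There is also a version of Proposition 13.5 for a (real) Euclidean space $E$ and a self-adjoint map $f\colon E \to E$.
Proposition 13.6. Given a Euclidean space $E$, if $f\colon E \to E$ is any self-adjoint linear map, then every eigenvalue $\lambda$ of $f_{\mathbb{C}}$ is real and is actually an eigenvalue of $f$ (which means that there is some real eigenvector $u \in E$ such that $f(u) = \lambda u$). Therefore, all the eigenvalues of $f$ are real.
Proof. Let $E_{\mathbb{C}}$ be the complexification of $E$, $\langle -, - \rangle_{\mathbb{C}}$ the complexification of the inner product $\langle -, - \rangle$ on $E$, and $f_{\mathbb{C}}\colon E_{\mathbb{C}} \to E_{\mathbb{C}}$ the complexification of $f\colon E \to E$. By definition of $f_{\mathbb{C}}$ and $\langle -, - \rangle_{\mathbb{C}}$, if $f$ is self-adjoint, we have
$$\begin{aligned}
\langle f_{\mathbb{C}}(u_1 + iv_1), u_2 + iv_2 \rangle_{\mathbb{C}} &= \langle f(u_1) + i f(v_1), u_2 + iv_2 \rangle_{\mathbb{C}} \\
&= \langle f(u_1), u_2 \rangle + \langle f(v_1), v_2 \rangle + i(\langle u_2, f(v_1) \rangle - \langle f(u_1), v_2 \rangle) \\
&= \langle u_1, f(u_2) \rangle + \langle v_1, f(v_2) \rangle + i(\langle f(u_2), v_1 \rangle - \langle u_1, f(v_2) \rangle) \\
&= \langle u_1 + iv_1, f(u_2) + i f(v_2) \rangle_{\mathbb{C}} \\
&= \langle u_1 + iv_1, f_{\mathbb{C}}(u_2 + iv_2) \rangle_{\mathbb{C}},
\end{aligned}$$
which shows that $f_{\mathbb{C}}$ is self-adjoint with respect to $\langle -, - \rangle_{\mathbb{C}}$.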
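$$\begin{pmatrix} \lambda_1 & & \dots & \\ & \lambda_2 & \dots & \\ \vdots & \vdots & \ddots & \vdots \\ & & \dots & \lambda_n \end{pmatrix},$$
with $\lambda_i \in \mathbb{R}$.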
Proof. We proceed by induction on the dimension $n$ of $E$ as follows. If $n = 1$, the result is trivial. Assume now that $n \geq 2$. From Proposition 13.6, all the eigenvalues of $f$ are real, so pick some eigenvalue $\lambda \in \mathbb{R}$, and let $w$ be some eigenvector for $\lambda$. By dividing $w$ by its norm, we may assume that $w$ is a unit vector. Let $W$ be the subspace of dimension 1 spanned by $w$. Clearly, $f(W) \subseteq W$. We claim that $f(W^\perp) \subseteq W^\perp$, where $W^\perp$ is the orthogonal complement of $W$.
Indeed, for any $v \in W^\perp$, that is, if $\langle v, w \rangle = 0$, because $f$ is self-adjoint and $f(w) = \lambda w$, we have
$$\langle f(v), w \rangle = \langle v, f(w) \rangle = \langle v, \lambda w \rangle = \lambda \langle v, w \rangle = 0$$
f (W ) W .
we have
f (u) = u v
and f (v) = u + v,
and f (v) = u + v,
The beginning of the proof of Proposition 13.9 actually shows that for every linear map $f\colon E \to E$ there is some subspace $W$ such that $f(W) \subseteq W$, where $W$ has dimension 1 or 2. In general, it doesn't seem possible to prove that $W^\perp$ is invariant under $f$. However, this happens when $f$ is normal.
We can finally prove our first main theorem.
Theorem 13.10. (Main spectral theorem) Given a Euclidean space $E$ of dimension $n$, for every normal linear map $f\colon E \to E$, there is an orthonormal basis $(e_1, \ldots, e_n)$ such that the matrix of $f$ w.r.t. this basis is a block diagonal matrix of the form
$$\begin{pmatrix} A_1 & & \dots & \\ & A_2 & \dots & \\ \vdots & \vdots & \ddots & \vdots \\ & & \dots & A_p \end{pmatrix}$$
such that each block $A_j$ is either a one-dimensional matrix (i.e., a real scalar) or a two-dimensional matrix of the form
$$A_j = \begin{pmatrix} \lambda_j & -\mu_j \\ \mu_j & \lambda_j \end{pmatrix},$$
where $\lambda_j, \mu_j \in \mathbb{R}$, with $\mu_j > 0$.
Proof. We proceed by induction on the dimension $n$ of $E$ as follows. If $n = 1$, the result is trivial. Assume now that $n \geq 2$. First, since $\mathbb{C}$ is algebraically closed (i.e., every nonconstant polynomial has a root in $\mathbb{C}$), the linear map $f_{\mathbb{C}}\colon E_{\mathbb{C}} \to E_{\mathbb{C}}$ has some eigenvalue $z = \lambda + i\mu$ (where $\lambda, \mu \in \mathbb{R}$). Let $w = u + iv$ be some eigenvector of $f_{\mathbb{C}}$ for $\lambda + i\mu$ (where $u, v \in E$). We can now apply Proposition 13.9.
If $\mu = 0$, then either $u$ or $v$ is an eigenvector of $f$ for $\lambda \in \mathbb{R}$. Let $W$ be the subspace of dimension 1 spanned by $e_1 = u/\|u\|$ if $u \neq 0$, or by $e_1 = v/\|v\|$ otherwise. It is obvious that $f(W) \subseteq W$ and $f^*(W) \subseteq W$. The orthogonal $W^\perp$ of $W$ has dimension $n - 1$, and by Proposition 13.8, we have $f(W^\perp) \subseteq W^\perp$. But the restriction of $f$ to $W^\perp$ is also normal, and we conclude by applying the induction hypothesis to $W^\perp$.
If $\mu \neq 0$, then $\langle u, v \rangle = 0$ and $\langle u, u \rangle = \langle v, v \rangle$, and if $W$ is the subspace spanned by $u/\|u\|$ and $v/\|v\|$, then $f(W) = W$ and $f^*(W) = W$. We also know that the restriction of $f$ to $W$ has the matrix
$$\begin{pmatrix} \lambda & \mu \\ -\mu & \lambda \end{pmatrix}$$
with respect to the basis $(u/\|u\|, v/\|v\|)$. If $\mu < 0$, we let $\lambda_1 = \lambda$, $\mu_1 = -\mu$, $e_1 = u/\|u\|$, and $e_2 = v/\|v\|$. If $\mu > 0$, we let $\lambda_1 = \lambda$, $\mu_1 = \mu$, $e_1 = v/\|v\|$, and $e_2 = u/\|u\|$. In all cases, it is easily verified that the matrix of the restriction of $f$ to $W$ w.r.t. the orthonormal basis $(e_1, e_2)$ is
$$A_1 = \begin{pmatrix} \lambda_1 & -\mu_1 \\ \mu_1 & \lambda_1 \end{pmatrix},$$
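$$\begin{pmatrix} \lambda_1 & & \dots & \\ & \lambda_2 & \dots & \\ \vdots & \vdots & \ddots & \vdots \\ & & \dots & \lambda_n \end{pmatrix},$$
where $\lambda_j \in \mathbb{C}$.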
Proof. We proceed by induction on the dimension $n$ of $E$ as follows. If $n = 1$, the result is trivial. Assume now that $n \geq 2$. Since $\mathbb{C}$ is algebraically closed (i.e., every nonconstant polynomial has a root in $\mathbb{C}$), the linear map $f\colon E \to E$ has some eigenvalue $\lambda \in \mathbb{C}$, and let $w$ be some unit eigenvector for $\lambda$. Let $W$ be the subspace of dimension 1 spanned by $w$. Clearly, $f(W) \subseteq W$. By Proposition 13.3, $w$ is an eigenvector of $f^*$ for $\overline{\lambda}$, and thus $f^*(W) \subseteq W$. By Proposition 13.8, we also have $f(W^\perp) \subseteq W^\perp$. The restriction of $f$ to $W^\perp$ is still normal, and we conclude by applying the induction hypothesis to $W^\perp$ (whose dimension is $n - 1$).
Thus, in particular, self-adjoint, skew-self-adjoint, and orthogonal linear maps can be
diagonalized with respect to an orthonormal basis of eigenvectors. In this latter case, though,
an orthogonal map is called a unitary map. Also, Proposition 13.5 shows that the eigenvalues
of a self-adjoint linear map are real. It is easily shown that skew-self-adjoint maps have
eigenvalues that are pure imaginary or null, and that unitary maps have eigenvalues of
absolute value 1.
Remark: There is a converse to Theorem 13.11, namely, if there is an orthonormal basis
(e1 , . . . , en ) of eigenvectors of f , then f is normal. We leave the easy proof as an exercise.
13.3
Theorem 13.12. Given a Euclidean space $E$ of dimension $n$, for every self-adjoint linear map $f\colon E \to E$, there is an orthonormal basis $(e_1, \ldots, e_n)$ of eigenvectors of $f$ such that the matrix of $f$ w.r.t. this basis is a diagonal matrix
$$\begin{pmatrix} \lambda_1 & & \dots & \\ & \lambda_2 & \dots & \\ \vdots & \vdots & \ddots & \vdots \\ & & \dots & \lambda_n \end{pmatrix},$$
where $\lambda_i \in \mathbb{R}$.
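Proof. We already proved this; see Theorem 13.7. However, it is instructive to give a more direct method not involving the complexification of $\langle -, - \rangle$ and Proposition 13.5.
Since $\mathbb{C}$ is algebraically closed, $f_{\mathbb{C}}$ has some eigenvalue $\lambda + i\mu$, and let $u + iv$ be some eigenvector of $f_{\mathbb{C}}$ for $\lambda + i\mu$, where $\lambda, \mu \in \mathbb{R}$ and $u, v \in E$. We saw in the proof of Proposition 13.9 that
$$f(u) = \lambda u - \mu v \quad\text{and}\quad f(v) = \mu u + \lambda v.$$
Since $f = f^*$, we have $\langle f(u), v \rangle = \langle u, f(v) \rangle$. Now
$$\langle f(u), v \rangle = \langle \lambda u - \mu v, v \rangle = \lambda \langle u, v \rangle - \mu \langle v, v \rangle$$
and
$$\langle u, f(v) \rangle = \langle u, \mu u + \lambda v \rangle = \mu \langle u, u \rangle + \lambda \langle u, v \rangle,$$
and thus we get
$$\lambda \langle u, v \rangle - \mu \langle v, v \rangle = \mu \langle u, u \rangle + \lambda \langle u, v \rangle,$$
that is,
$$\mu (\langle u, u \rangle + \langle v, v \rangle) = 0,$$
which implies $\mu = 0$, since either $u \neq 0$ or $v \neq 0$. Therefore, $\lambda$ is a real eigenvalue of $f$.
Now, going back to the proof of Theorem 13.10, only the case where $\mu = 0$ applies, and the induction shows that all the blocks are one-dimensional.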
Theorem 13.12 implies that if $\lambda_1, \ldots, \lambda_p$ are the distinct real eigenvalues of $f$, and $E_i$ is the eigenspace associated with $\lambda_i$, then
$$E = E_1 \oplus \cdots \oplus E_p,$$
(X), . . . ,
(X)
x1
xn
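$$\begin{pmatrix} A_1 & & \dots & \\ & A_2 & \dots & \\ \vdots & \vdots & \ddots & \vdots \\ & & \dots & A_p \end{pmatrix}$$
such that each block $A_j$ is either 0 or a two-dimensional matrix of the form
$$A_j = \begin{pmatrix} 0 & -\mu_j \\ \mu_j & 0 \end{pmatrix},$$
where $\mu_j \in \mathbb{R}$, with $\mu_j > 0$. In particular, the eigenvalues of $f_{\mathbb{C}}$ are pure imaginary of the form $\pm i\mu_j$ or 0.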
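Proof. The case where $n = 1$ is trivial. As in the proof of Theorem 13.10, $f_{\mathbb{C}}$ has some eigenvalue $z = \lambda + i\mu$, where $\lambda, \mu \in \mathbb{R}$. We claim that $\lambda = 0$. First, we show that
$$\langle f(w), w \rangle = 0$$
for all $w \in E$. Indeed, since $f = -f^*$, we get
$$\langle f(w), w \rangle = \langle w, f^*(w) \rangle = \langle w, -f(w) \rangle = -\langle w, f(w) \rangle = -\langle f(w), w \rangle,$$
since $\langle -, - \rangle$ is symmetric. This implies that
$$\langle f(w), w \rangle = 0.$$
Applying this to $u$ and $v$ and using the fact that
$$f(u) = \lambda u - \mu v \quad\text{and}\quad f(v) = \mu u + \lambda v,$$
we get
$$0 = \langle f(u), u \rangle = \lambda \langle u, u \rangle - \mu \langle u, v \rangle$$
and
$$0 = \langle f(v), v \rangle = \mu \langle u, v \rangle + \lambda \langle v, v \rangle.$$
Adding these two equations yields $\lambda (\langle u, u \rangle + \langle v, v \rangle) = 0$, and since $u \neq 0$ or $v \neq 0$, we have $\lambda = 0$.
Then, going back to the proof of Theorem 13.10, unless $\mu = 0$, the case where $u$ and $v$ are orthogonal and span a subspace of dimension 2 applies, and the induction shows that all the blocks are two-dimensional or reduced to 0.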
Remark: One will note that if $f$ is skew-self-adjoint, then $i f_{\mathbb{C}}$ is self-adjoint w.r.t. $\langle -, - \rangle_{\mathbb{C}}$. By Proposition 13.5, the map $i f_{\mathbb{C}}$ has real eigenvalues, which implies that the eigenvalues of $f_{\mathbb{C}}$ are pure imaginary or 0.
Finally, we consider orthogonal linear maps.
Theorem 13.14. Given a Euclidean space $E$ of dimension $n$, for every orthogonal linear map $f\colon E \to E$ there is an orthonormal basis $(e_1, \ldots, e_n)$ such that the matrix of $f$ w.r.t. this basis is a block diagonal matrix of the form
$$\begin{pmatrix} A_1 & & \dots & \\ & A_2 & \dots & \\ \vdots & \vdots & \ddots & \vdots \\ & & \dots & A_p \end{pmatrix}$$
such that each block $A_j$ is either $1$, $-1$, or a two-dimensional matrix of the form
$$A_j = \begin{pmatrix} \cos\theta_j & -\sin\theta_j \\ \sin\theta_j & \cos\theta_j \end{pmatrix}$$
where $0 < \theta_j < \pi$. In particular, the eigenvalues of $f_{\mathbb{C}}$ are of the form $\cos\theta_j \pm i \sin\theta_j$, $1$, or $-1$.
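Proof. The case where $n = 1$ is trivial. As in the proof of Theorem 13.10, $f_{\mathbb{C}}$ has some eigenvalue $z = \lambda + i\mu$, where $\lambda, \mu \in \mathbb{R}$. It is immediately verified that $f \circ f^* = f^* \circ f = \mathrm{id}$ implies that $f_{\mathbb{C}} \circ f_{\mathbb{C}}^* = f_{\mathbb{C}}^* \circ f_{\mathbb{C}} = \mathrm{id}$, so the map $f_{\mathbb{C}}$ is unitary. In fact, the eigenvalues of $f_{\mathbb{C}}$ have absolute value 1. Indeed, if $z$ (in $\mathbb{C}$) is an eigenvalue of $f_{\mathbb{C}}$, and $u$ is an eigenvector for $z$, we have
$$\langle f_{\mathbb{C}}(u), f_{\mathbb{C}}(u) \rangle = \langle zu, zu \rangle = z\overline{z} \langle u, u \rangle$$
and
$$\langle f_{\mathbb{C}}(u), f_{\mathbb{C}}(u) \rangle = \langle u, (f_{\mathbb{C}}^* \circ f_{\mathbb{C}})(u) \rangle = \langle u, u \rangle,$$
so $z\overline{z} = 1$, since $u \neq 0$.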
$$\begin{pmatrix} A_1 & & & & \\ & \ddots & & & \\ & & A_r & & \\ & & & -I_q & \\ & & & & I_p \end{pmatrix}$$
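$$\begin{pmatrix} A_1 & & & \\ & \ddots & & \\ & & A_m & \\ & & & I_{n-2m} \end{pmatrix}$$
where the first $m$ blocks $A_j$ are of the form
$$A_j = \begin{pmatrix} \cos\theta_j & -\sin\theta_j \\ \sin\theta_j & \cos\theta_j \end{pmatrix}$$
with $0 < \theta_j \leq \pi$.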
$$\bigcap_{i=1}^{t} E(1, s_i) \subseteq E(1, f),$$
where $E(1, s_i)$ is the hyperplane defining the reflection $s_i$. By the Grassmann relation, if we intersect $t \leq n$ hyperplanes, the dimension of their intersection is at least $n - t$. Thus, $n - t \leq p$, that is, $t \geq n - p$, and $n - p$ is the smallest number of reflections composing $f$.
As a corollary of Theorem 13.15, we obtain the following fact: if the dimension $n$ of the Euclidean space $E$ is odd, then every rotation $f \in \mathbf{SO}(E)$ admits 1 as an eigenvalue.
Proof. The characteristic polynomial $\det(XI - f)$ of $f$ has odd degree $n$ and has real coefficients, so it must have some real root $\lambda$. Since $f$ is an isometry, its $n$ eigenvalues are of the form $+1$, $-1$, and $e^{\pm i\theta}$, with $0 < \theta < \pi$, so $\lambda = \pm 1$. Now, the eigenvalues $e^{\pm i\theta}$ appear in conjugate pairs, and since $n$ is odd, the number of real eigenvalues of $f$ is odd. This implies that $+1$ is an eigenvalue of $f$, since otherwise $-1$ would be the only real eigenvalue of $f$, and since its multiplicity is odd, we would have $\det(f) = -1$, contradicting the fact that $f$ is a rotation.
When $n = 3$, we obtain the result due to Euler which says that every 3D rotation $R$ has an invariant axis $D$, and that restricted to the plane orthogonal to $D$, it is a 2D rotation. Furthermore, if $(a, b, c)$ is a unit vector defining the axis $D$ of the rotation $R$ and if the angle of the rotation is $\theta$, if $B$ is the skew-symmetric matrix
$$B = \begin{pmatrix} 0 & -c & b \\ c & 0 & -a \\ -b & a & 0 \end{pmatrix},$$
then it can be shown that
$$R = I + \sin\theta\, B + (1 - \cos\theta) B^2.$$
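The formula above is easy to experiment with. The sketch below (the function name `rodrigues` and the sample axis and angle are illustrative choices, not taken from the text) builds $R$ from a unit axis $(a, b, c)$ and an angle $\theta$, and then checks that $R$ is a rotation fixing the axis.

```python
import numpy as np

def rodrigues(axis, theta):
    """Rotation matrix R = I + sin(theta) B + (1 - cos(theta)) B^2,
    where B is the skew-symmetric matrix built from the unit axis (a, b, c)."""
    a, b, c = np.asarray(axis, dtype=float) / np.linalg.norm(axis)
    B = np.array([[0.0, -c, b],
                  [c, 0.0, -a],
                  [-b, a, 0.0]])
    return np.eye(3) + np.sin(theta) * B + (1.0 - np.cos(theta)) * (B @ B)

theta = 0.7
axis = np.array([1.0, 2.0, 2.0]) / 3.0    # a unit vector
R = rodrigues(axis, theta)

print(np.allclose(R.T @ R, np.eye(3)))    # R is orthogonal
print(np.isclose(np.linalg.det(R), 1.0))  # R is a rotation
print(np.allclose(R @ axis, axis))        # the axis is fixed (eigenvalue 1)
```

The trace of $R$ equals $1 + 2\cos\theta$, which gives a quick way to recover the angle from the matrix.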
The theorems of this section and of the previous section can be immediately applied to
matrices.
13.4
A real $n \times n$ matrix $A$ is
normal if $A A^\top = A^\top A$,
symmetric if $A^\top = A$,
skew-symmetric if $A^\top = -A$,
orthogonal if $A A^\top = A^\top A = I_n$.
Recall from Proposition 10.12 that when $E$ is a Euclidean space and $(e_1, \ldots, e_n)$ is an orthonormal basis for $E$, if $A$ is the matrix of a linear map $f\colon E \to E$ w.r.t. the basis $(e_1, \ldots, e_n)$, then $A^\top$ is the matrix of the adjoint $f^*$ of $f$. Consequently, a normal linear map has a normal matrix, a self-adjoint linear map has a symmetric matrix, a skew-self-adjoint linear map has a skew-symmetric matrix, and an orthogonal linear map has an orthogonal matrix. Similarly, if $E$ and $F$ are Euclidean spaces, $(u_1, \ldots, u_n)$ is an orthonormal basis for $E$, and $(v_1, \ldots, v_m)$ is an orthonormal basis for $F$, if a linear map $f\colon E \to F$ has the matrix $A$ w.r.t. the bases $(u_1, \ldots, u_n)$ and $(v_1, \ldots, v_m)$, then its adjoint $f^*$ has the matrix $A^\top$ w.r.t. the bases $(v_1, \ldots, v_m)$ and $(u_1, \ldots, u_n)$.
Furthermore, if $(u_1, \ldots, u_n)$ is another orthonormal basis for $E$ and $P$ is the change of basis matrix whose columns are the components of the $u_i$ w.r.t. the basis $(e_1, \ldots, e_n)$, then $P$ is orthogonal, and for any linear map $f\colon E \to E$, if $A$ is the matrix of $f$ w.r.t. $(e_1, \ldots, e_n)$ and $B$ is the matrix of $f$ w.r.t. $(u_1, \ldots, u_n)$, then
$$B = P^\top A P.$$
As a consequence, Theorems 13.10 and 13.12-13.14 can be restated as follows.
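Theorem 13.16. For every normal matrix $A$ there is an orthogonal matrix $P$ and a block diagonal matrix $D$ such that $A = P D P^\top$, where $D$ is of the form
$$D = \begin{pmatrix} D_1 & & \dots & \\ & D_2 & \dots & \\ \vdots & \vdots & \ddots & \vdots \\ & & \dots & D_p \end{pmatrix}$$
such that each block $D_j$ is either a one-dimensional matrix (i.e., a real scalar) or a two-dimensional matrix of the form
$$D_j = \begin{pmatrix} \lambda_j & -\mu_j \\ \mu_j & \lambda_j \end{pmatrix},$$
where $\lambda_j, \mu_j \in \mathbb{R}$, with $\mu_j > 0$.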
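Theorem 13.17. For every symmetric matrix $A$ there is an orthogonal matrix $P$ and a diagonal matrix $D$ such that $A = P D P^\top$, where $D$ is of the form
$$D = \begin{pmatrix} \lambda_1 & & \dots & \\ & \lambda_2 & \dots & \\ \vdots & \vdots & \ddots & \vdots \\ & & \dots & \lambda_n \end{pmatrix},$$
where $\lambda_i \in \mathbb{R}$.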
Theorem 13.18. For every skew-symmetric matrix $A$ there is an orthogonal matrix $P$ and a block diagonal matrix $D$ such that $A = P D P^\top$, where $D$ is of the form
$$D = \begin{pmatrix} D_1 & & \dots & \\ & D_2 & \dots & \\ \vdots & \vdots & \ddots & \vdots \\ & & \dots & D_p \end{pmatrix}$$
such that each block $D_j$ is either 0 or a two-dimensional matrix of the form
$$D_j = \begin{pmatrix} 0 & -\mu_j \\ \mu_j & 0 \end{pmatrix},$$
where $\mu_j \in \mathbb{R}$, with $\mu_j > 0$. In particular, the eigenvalues of $A$ are pure imaginary of the form $\pm i\mu_j$, or 0.
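Theorems 13.17 and 13.18 can be observed numerically. The sketch below uses randomly generated matrices (an arbitrary choice made for illustration). For the symmetric case, `numpy.linalg.eigh` returns real eigenvalues and an orthogonal $P$ with $A = P D P^\top$. For the skew-symmetric case, it uses the fact that $iA$ is Hermitian, so the eigenvalues of $A$ are $-i$ times real numbers.

```python
import numpy as np

rng = np.random.default_rng(1)
M = rng.standard_normal((5, 5))

# Symmetric case: A = P D P^T with P orthogonal and D real diagonal.
S = (M + M.T) / 2
lam, P = np.linalg.eigh(S)
print(np.allclose(P @ np.diag(lam) @ P.T, S))

# Skew-symmetric case: i*K is Hermitian, so its eigenvalues mu are real,
# and the eigenvalues of K are the pure imaginary numbers -i*mu.
K = (M - M.T) / 2
mu = np.linalg.eigvalsh(1j * K)     # real, sorted in ascending order
eigs_K = np.linalg.eigvals(K)
print(np.round(mu, 6))              # symmetric about 0; contains 0 since n is odd
print(np.max(np.abs(eigs_K.real)))  # real parts vanish up to rounding
```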
Theorem 13.19. For every orthogonal matrix $A$ there is an orthogonal matrix $P$ and a block diagonal matrix $D$ such that $A = P D P^\top$, where $D$ is of the form
$$D = \begin{pmatrix} D_1 & & \dots & \\ & D_2 & \dots & \\ \vdots & \vdots & \ddots & \vdots \\ & & \dots & D_p \end{pmatrix}$$
A complex $n \times n$ matrix $A$ is
normal if $A A^* = A^* A$,
Hermitian if $A^* = A$,
skew-Hermitian if $A^* = -A$,
unitary if $A A^* = A^* A = I_n$.
Recall from Proposition 12.12 that when $E$ is a Hermitian space and $(e_1, \ldots, e_n)$ is an orthonormal basis for $E$, if $A$ is the matrix of a linear map $f\colon E \to E$ w.r.t. the basis $(e_1, \ldots, e_n)$, then $A^*$ is the matrix of the adjoint $f^*$ of $f$. Consequently, a normal linear map has a normal matrix, a self-adjoint linear map has a Hermitian matrix, a skew-self-adjoint linear map has a skew-Hermitian matrix, and a unitary linear map has a unitary matrix.
Similarly, if $E$ and $F$ are Hermitian spaces, $(u_1, \ldots, u_n)$ is an orthonormal basis for $E$, and $(v_1, \ldots, v_m)$ is an orthonormal basis for $F$, if a linear map $f\colon E \to F$ has the matrix $A$ w.r.t. the bases $(u_1, \ldots, u_n)$ and $(v_1, \ldots, v_m)$, then its adjoint $f^*$ has the matrix $A^*$ w.r.t. the bases $(v_1, \ldots, v_m)$ and $(u_1, \ldots, u_n)$.
Furthermore, if $(u_1, \ldots, u_n)$ is another orthonormal basis for $E$ and $P$ is the change of basis matrix whose columns are the components of the $u_i$ w.r.t. the basis $(e_1, \ldots, e_n)$, then $P$ is unitary, and for any linear map $f\colon E \to E$, if $A$ is the matrix of $f$ w.r.t. $(e_1, \ldots, e_n)$ and $B$ is the matrix of $f$ w.r.t. $(u_1, \ldots, u_n)$, then
$$B = P^* A P.$$
Theorem 13.11 can be restated in terms of matrices as follows. We can also say a little more about eigenvalues (easy exercise left to the reader).
Theorem 13.20. For every complex normal matrix $A$ there is a unitary matrix $U$ and a diagonal matrix $D$ such that $A = U D U^*$. Furthermore, if $A$ is Hermitian, then $D$ is a real matrix; if $A$ is skew-Hermitian, then the entries in $D$ are pure imaginary or zero; and if $A$ is unitary, then the entries in $D$ have absolute value 1.
13.5
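$$A = \begin{pmatrix} 0 & & & & & \\ 1 & 0 & & & & \\ & 1 & 0 & & & \\ & & \ddots & \ddots & & \\ & & & 1 & 0 & \\ & & & & 1 & 0 \end{pmatrix}$$
has the eigenvalue 0 with multiplicity $n$. However, if we perturb the top rightmost entry of $A$ by $\epsilon$, it is easy to see that the characteristic polynomial of the matrix
$$A(\epsilon) = \begin{pmatrix} 0 & & & & & \epsilon \\ 1 & 0 & & & & \\ & 1 & 0 & & & \\ & & \ddots & \ddots & & \\ & & & 1 & 0 & \\ & & & & 1 & 0 \end{pmatrix}$$
is $X^n - \epsilon$. It follows that if $n = 40$ and $\epsilon = 10^{-40}$, the eigenvalues of $A(10^{-40})$ are the 40 complex roots of $X^{40} = 10^{-40}$, all of which have modulus $10^{-1}$. Thus, we see that a very small change ($\epsilon = 10^{-40}$) to the matrix $A$ causes a huge change to the eigenvalues of $A$ (from 0 to modulus $10^{-1}$). Indeed, the relative error is $10^{39}$. Worse, due to machine precision, since very small numbers are treated as 0, the error on the computation of eigenvalues (for example, of the matrix $A(10^{-40})$) can be very large.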
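This sensitivity is easy to reproduce at a smaller scale. The sketch below uses $n = 8$ and $\epsilon = 10^{-8}$ (values chosen here only so that everything stays comfortably within double precision; the helper name `shift_matrix` is hypothetical). The eigenvalues of $A(\epsilon)$ should then have modulus $\epsilon^{1/n} = 10^{-1}$, even though the perturbation has size $10^{-8}$.

```python
import numpy as np

def shift_matrix(n, eps=0.0):
    """Matrix with ones on the subdiagonal and eps in the top-right corner."""
    A = np.diag(np.ones(n - 1), k=-1)
    A[0, n - 1] = eps
    return A

n, eps = 8, 1e-8
A_eps = shift_matrix(n, eps)

# A(eps)^n = eps * I, so every eigenvalue satisfies lambda^n = eps.
print(np.allclose(np.linalg.matrix_power(A_eps, n), eps * np.eye(n)))

moduli = np.abs(np.linalg.eigvals(A_eps))
print(moduli)   # all close to eps**(1/n) = 0.1
```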
This phenomenon is similar to the phenomenon discussed in Section 7.3 where we studied the effect of a small perturbation of the coefficients of a linear system $Ax = b$ on its solution. In Section 7.3, we saw that the behavior of a linear system under small perturbations is governed by the condition number $\mathrm{cond}(A)$ of the matrix $A$. In the case of the eigenvalue problem (finding the eigenvalues of a matrix), we will see that the conditioning of the problem depends on the condition number of the change of basis matrix $P$ used in reducing the matrix $A$ to its diagonal form $D = P^{-1} A P$, rather than on the condition number of $A$ itself. The following proposition, in which we assume that $A$ is diagonalizable and that the matrix norm $\|\ \|$ satisfies a special condition (satisfied by the operator norms $\|\ \|_p$ for $p = 1, 2, \infty$), is due to Bauer and Fike (1960).
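Proposition 13.21. Let $A \in M_n(\mathbb{C})$ be a diagonalizable matrix, $P$ be an invertible matrix, and $D$ be a diagonal matrix $D = \mathrm{diag}(\lambda_1, \ldots, \lambda_n)$ such that
$$A = P D P^{-1},$$
and let $\|\ \|$ be a matrix norm such that
$$\|\mathrm{diag}(\alpha_1, \ldots, \alpha_n)\| = \max_{1 \leq i \leq n} |\alpha_i|,$$
for every diagonal matrix. Then, for every perturbation matrix $\delta A$, if we write
$$B_i = \{z \in \mathbb{C} \mid |z - \lambda_i| \leq \mathrm{cond}(P) \|\delta A\|\},$$
for every eigenvalue $\lambda$ of $A + \delta A$, we have
$$\lambda \in \bigcup_{k=1}^{n} B_k.$$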
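Proof. Let $\lambda$ be any eigenvalue of the matrix $A + \delta A$. If $\lambda = \lambda_j$ for some $j$, then the result is trivial. Thus, assume that $\lambda \neq \lambda_j$ for $j = 1, \ldots, n$. In this case, the matrix $D - \lambda I$ is invertible (since its eigenvalues are $\lambda_j - \lambda$ for $j = 1, \ldots, n$), and we have
$$\begin{aligned}
P^{-1}(A + \delta A - \lambda I)P &= D - \lambda I + P^{-1}(\delta A)P \\
&= (D - \lambda I)(I + (D - \lambda I)^{-1} P^{-1}(\delta A)P).
\end{aligned}$$
Since $\lambda$ is an eigenvalue of $A + \delta A$, the matrix $A + \delta A - \lambda I$ is singular, so the matrix
$$I + (D - \lambda I)^{-1} P^{-1}(\delta A)P$$
must also be singular. By Proposition 7.9(2), we have
$$1 \leq \|(D - \lambda I)^{-1} P^{-1}(\delta A)P\|,$$
and since $\|\ \|$ is a matrix norm,
$$1 \leq \|(D - \lambda I)^{-1}\| \, \|P^{-1}\| \, \|\delta A\| \, \|P\|.$$
Now, $(D - \lambda I)^{-1}$ is a diagonal matrix with entries $1/(\lambda_i - \lambda)$, so by our assumption on the norm, there is some index $k$ (one minimizing $|\lambda_i - \lambda|$) for which
$$\|(D - \lambda I)^{-1}\| = \frac{1}{|\lambda_k - \lambda|},$$
and we obtain
$$|\lambda - \lambda_k| \leq \|P^{-1}\| \, \|\delta A\| \, \|P\| = \mathrm{cond}(P) \|\delta A\|,$$
which proves our result.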
Proposition 13.21 implies that for any diagonalizable matrix $A$, if we define $\Gamma(A)$ by
$$\Gamma(A) = \inf\{\mathrm{cond}(P) \mid P^{-1} A P = D\},$$
then for every eigenvalue $\lambda$ of $A + \delta A$, we have
$$\lambda \in \bigcup_{k=1}^{n} \{z \in \mathbb{C} \mid |z - \lambda_k| \leq \Gamma(A) \|\delta A\|\}.$$
The number $\Gamma(A)$ is called the conditioning of $A$ relative to the eigenvalue problem. If $A$ is a normal matrix, since by Theorem 13.20, $A$ can be diagonalized with respect to a unitary matrix $U$, and since for the spectral norm $\|U\|_2 = 1$, we see that $\Gamma(A) = 1$. Therefore, normal matrices are very well conditioned w.r.t. the eigenvalue problem. In fact, for every eigenvalue $\lambda$ of $A + \delta A$ (with $A$ normal), we have
$$\lambda \in \bigcup_{k=1}^{n} \{z \in \mathbb{C} \mid |z - \lambda_k| \leq \|\delta A\|_2\}.$$
If $A$ and $A + \delta A$ are both symmetric (or Hermitian), there are sharper results; see Proposition 13.27.
Note that the matrix $A(\epsilon)$ from the beginning of the section is not normal.
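The Bauer-Fike bound of Proposition 13.21 can be checked on a random example. In the sketch below, the matrices `P`, `A`, and `dA` are randomly generated (an assumption made purely for illustration), and the spectral norm is used, which satisfies the condition on diagonal matrices. Each eigenvalue of $A + \delta A$ should lie within $\mathrm{cond}(P)\,\|\delta A\|_2$ of some eigenvalue of $A$.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5
P = rng.standard_normal((n, n))          # invertible with probability 1
lam = rng.standard_normal(n)             # eigenvalues of A
A = P @ np.diag(lam) @ np.linalg.inv(P)  # A = P D P^{-1}
dA = 1e-3 * rng.standard_normal((n, n))  # perturbation matrix

bound = np.linalg.cond(P, 2) * np.linalg.norm(dA, 2)
perturbed = np.linalg.eigvals(A + dA)

# Distance from each perturbed eigenvalue to the nearest eigenvalue of A.
dist = np.array([np.min(np.abs(z - lam)) for z in perturbed])
print(dist.max(), "<=", bound)
```

The bound is often pessimistic in practice, since it uses one particular $P$ rather than the infimum defining the conditioning of $A$.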
13.6
A fact that is used frequently in optimization problems is that the eigenvalues of a symmetric matrix are characterized in terms of what is known as the Rayleigh ratio, defined by
$$R(A)(x) = \frac{x^\top A x}{x^\top x}, \quad x \in \mathbb{R}^n, \ x \neq 0.$$
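The following proposition is often used to prove the correctness of various optimization or approximation problems (for example PCA).
Proposition 13.22. (Rayleigh-Ritz) If $A$ is a symmetric $n \times n$ matrix with eigenvalues $\lambda_1 \leq \lambda_2 \leq \cdots \leq \lambda_n$ and if $(u_1, \ldots, u_n)$ is any orthonormal basis of eigenvectors of $A$, where $u_i$ is a unit eigenvector associated with $\lambda_i$, then
$$\max_{x \neq 0} \frac{x^\top A x}{x^\top x} = \lambda_n$$
and
$$\max_{x \neq 0,\ x \in \{u_{n-k+1}, \ldots, u_n\}^\perp} \frac{x^\top A x}{x^\top x} = \lambda_{n-k}, \quad 1 \leq k \leq n - 1.$$
Equivalently, if $V_k$ is the subspace spanned by $(u_1, \ldots, u_k)$, then
$$\lambda_k = \max_{x \neq 0,\ x \in V_k} \frac{x^\top A x}{x^\top x}, \quad k = 1, \ldots, n.$$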
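Proof. First, observe that
$$\max_{x \neq 0} \frac{x^\top A x}{x^\top x} = \max_x \{x^\top A x \mid x^\top x = 1\},$$
and similarly,
$$\max_{x \neq 0,\ x \in \{u_{n-k+1}, \ldots, u_n\}^\perp} \frac{x^\top A x}{x^\top x} = \max_x \left\{x^\top A x \mid (x \in \{u_{n-k+1}, \ldots, u_n\}^\perp) \wedge (x^\top x = 1)\right\}.$$
Since $A$ is a symmetric matrix, its eigenvalues are real and it can be diagonalized with respect to an orthonormal basis of eigenvectors, so let $(u_1, \ldots, u_n)$ be such a basis. If we write
$$x = \sum_{i=1}^{n} x_i u_i,$$
a simple computation shows that
$$x^\top A x = \sum_{i=1}^{n} \lambda_i x_i^2.$$
If $x^\top x = 1$, then $\sum_{i=1}^{n} x_i^2 = 1$, and since $\lambda_1 \leq \lambda_2 \leq \cdots \leq \lambda_n$, we get
$$x^\top A x = \sum_{i=1}^{n} \lambda_i x_i^2 \leq \lambda_n \left(\sum_{i=1}^{n} x_i^2\right) = \lambda_n.$$
Thus,
$$\max_x \{x^\top A x \mid x^\top x = 1\} \leq \lambda_n,$$
and since this maximum is achieved for $e_n = (0, 0, \ldots, 1)$ (the coordinates of $u_n$ over the basis $(u_1, \ldots, u_n)$), we conclude that
$$\max_x \{x^\top A x \mid x^\top x = 1\} = \lambda_n.$$
Next, observe that $x \in \{u_{n-k+1}, \ldots, u_n\}^\perp$ and $x^\top x = 1$ iff $x_{n-k+1} = \cdots = x_n = 0$ and $\sum_{i=1}^{n-k} x_i^2 = 1$. Consequently, for such an $x$, we have
$$x^\top A x = \sum_{i=1}^{n-k} \lambda_i x_i^2 \leq \lambda_{n-k} \left(\sum_{i=1}^{n-k} x_i^2\right) = \lambda_{n-k}.$$
Thus,
$$\max_x \left\{x^\top A x \mid (x \in \{u_{n-k+1}, \ldots, u_n\}^\perp) \wedge (x^\top x = 1)\right\} \leq \lambda_{n-k},$$
and since this maximum is achieved for $e_{n-k} = (0, \ldots, 0, 1, 0, \ldots, 0)$ with a 1 in position $n - k$, we conclude that
$$\max_x \left\{x^\top A x \mid (x \in \{u_{n-k+1}, \ldots, u_n\}^\perp) \wedge (x^\top x = 1)\right\} = \lambda_{n-k},$$
as claimed.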
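The statement just proved can be illustrated by sampling. The sketch below (random symmetric matrix and random test vectors, both assumptions made for the example; `rayleigh` is a hypothetical helper name) checks that the Rayleigh ratio never exceeds $\lambda_n$, that it never exceeds $\lambda_{n-k}$ on vectors orthogonal to $u_{n-k+1}, \ldots, u_n$, and that both bounds are attained at the corresponding eigenvectors.

```python
import numpy as np

def rayleigh(A, x):
    """Rayleigh ratio x^T A x / x^T x for a nonzero vector x."""
    return (x @ A @ x) / (x @ x)

rng = np.random.default_rng(3)
n, k = 6, 2
M = rng.standard_normal((n, n))
A = (M + M.T) / 2
lam, U = np.linalg.eigh(A)   # lam ascending; columns of U are u_1, ..., u_n

X = rng.standard_normal((1000, n))
ratios = np.array([rayleigh(A, x) for x in X])

# Project the samples onto span(u_1, ..., u_{n-k}) = {u_{n-k+1}, ..., u_n}^perp.
Uk = U[:, :n - k]
Y = X @ Uk @ Uk.T
restricted = np.array([rayleigh(A, y) for y in Y])

print(ratios.max(), "<=", lam[-1])
print(restricted.max(), "<=", lam[n - k - 1])
```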
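For our purposes, we need the version of Proposition 13.22 applying to min instead of max, whose proof is obtained by a trivial modification of the proof of Proposition 13.22.
Proposition 13.23. (Rayleigh-Ritz) If $A$ is a symmetric $n \times n$ matrix with eigenvalues $\lambda_1 \leq \lambda_2 \leq \cdots \leq \lambda_n$ and if $(u_1, \ldots, u_n)$ is any orthonormal basis of eigenvectors of $A$, where $u_i$ is a unit eigenvector associated with $\lambda_i$, then
$$\min_{x \neq 0} \frac{x^\top A x}{x^\top x} = \lambda_1$$
and
$$\min_{x \neq 0,\ x \in \{u_1, \ldots, u_{i-1}\}^\perp} \frac{x^\top A x}{x^\top x} = \lambda_i, \quad 2 \leq i \leq n.$$
Equivalently, if $W_k = V_{k-1}^\perp$ denotes the subspace spanned by $(u_k, \ldots, u_n)$ (with $V_0 = (0)$), then
$$\lambda_k = \min_{x \neq 0,\ x \in W_k} \frac{x^\top A x}{x^\top x} = \min_{x \neq 0,\ x \in V_{k-1}^\perp} \frac{x^\top A x}{x^\top x}, \quad k = 1, \ldots, n.$$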
Propositions 13.22 and 13.23 together are known as the Rayleigh-Ritz theorem.
As an application of Propositions 13.22 and 13.23, we prove a proposition which allows us to compare the eigenvalues of two symmetric matrices $A$ and $B = R^\top A R$, where $R$ is a rectangular matrix satisfying the equation $R^\top R = I$.
First, we need a definition. Given an $n \times n$ symmetric matrix $A$ and an $m \times m$ symmetric matrix $B$, with $m \leq n$, if $\lambda_1 \leq \lambda_2 \leq \cdots \leq \lambda_n$ are the eigenvalues of $A$ and $\mu_1 \leq \mu_2 \leq \cdots \leq \mu_m$ are the eigenvalues of $B$, then we say that the eigenvalues of $B$ interlace the eigenvalues of $A$ if
$$\lambda_i \leq \mu_i \leq \lambda_{n-m+i}, \quad i = 1, \ldots, m.$$
j = 1, . . . , i 1,
we have $Rv \in (U_{i-1})^\perp$. By Proposition 13.23 and using the fact that $R^\top R = I$, we have
$$\lambda_i \leq \frac{(Rv)^\top A Rv}{(Rv)^\top Rv} = \frac{v^\top B v}{v^\top v}.$$
max
x6=0,x{vi+1 ,...,vn }
x> Bx
x> Bx
=
max
,
x6=0,x{v1 ,...,vi } x> x
x> x
w> Bw
i
w> w
for all w Vi ,
v > Bv
i ,
v>v
i = 1, . . . , m.
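We can apply the same argument to the symmetric matrices $-A$ and $-B$, to conclude that
$$-\lambda_{n-m+i} \leq -\mu_i,$$
that is,
$$\mu_i \leq \lambda_{n-m+i}, \quad i = 1, \ldots, m.$$
Therefore,
$$\lambda_i \leq \mu_i \leq \lambda_{n-m+i}, \quad i = 1, \ldots, m,$$
as desired.
(b) If $\lambda_i = \mu_i$, then
$$\lambda_i = \frac{(Rv)^\top A Rv}{(Rv)^\top Rv} = \frac{v^\top B v}{v^\top v} = \mu_i,$$
so $v$ must be an eigenvector for $B$ and $Rv$ must be an eigenvector for $A$, both for the eigenvalue $\lambda_i = \mu_i$.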
Proposition 13.24 immediately implies the Poincaré separation theorem. It can be used in situations, such as in quantum mechanics, where one has information about the inner products $u_i^\top A u_j$.
Proposition 13.25. (Poincaré separation theorem) Let $A$ be an $n \times n$ symmetric (or Hermitian) matrix, let $r$ be some integer with $1 \leq r \leq n$, and let $(u_1, \ldots, u_r)$ be $r$ orthonormal vectors. Let $B = (u_i^\top A u_j)$ (an $r \times r$ matrix), let $\lambda_1(A) \leq \ldots \leq \lambda_n(A)$ be the eigenvalues of $A$ and $\lambda_1(B) \leq \ldots \leq \lambda_r(B)$ be the eigenvalues of $B$; then we have
$$\lambda_k(A) \leq \lambda_k(B) \leq \lambda_{k+n-r}(A), \quad k = 1, \ldots, r.$$
If $P_1$ is the $n \times (n - 1)$ matrix obtained from the identity matrix by dropping its last column, we have $P_1^\top P_1 = I$, and the matrix $B = P_1^\top A P_1$ is the matrix obtained from $A$ by deleting its last row and its last column. In this case, the interlacing result is
$$\lambda_1 \leq \mu_1 \leq \lambda_2 \leq \mu_2 \leq \cdots \leq \mu_{n-2} \leq \lambda_{n-1} \leq \mu_{n-1} \leq \lambda_n,$$
a genuine interlacing. We obtain similar results with the matrix $P_{n-r}$ obtained by dropping the last $n - r$ columns of the identity matrix and setting $B = P_{n-r}^\top A P_{n-r}$ ($B$ is the $r \times r$ matrix obtained from $A$ by deleting its last $n - r$ rows and columns). In this case, we have the following interlacing inequalities known as the Cauchy interlacing theorem:
$$\lambda_k \leq \mu_k \leq \lambda_{k+n-r}, \quad k = 1, \ldots, r. \qquad (*)$$
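The Cauchy interlacing inequalities can be checked on a random symmetric matrix. In the sketch below (sizes and matrix chosen arbitrarily for illustration), $B$ is the leading $r \times r$ principal submatrix of $A$, and indices are shifted by one because Python arrays start at 0.

```python
import numpy as np

rng = np.random.default_rng(4)
n, r = 6, 4
M = rng.standard_normal((n, n))
A = (M + M.T) / 2
B = A[:r, :r]   # delete the last n - r rows and columns

lam = np.linalg.eigvalsh(A)   # lambda_1 <= ... <= lambda_n
mu = np.linalg.eigvalsh(B)    # mu_1 <= ... <= mu_r

tol = 1e-10
# lambda_k <= mu_k <= lambda_{k+n-r} for k = 1, ..., r (0-based index k below).
interlaced = all(lam[k] <= mu[k] + tol and mu[k] <= lam[k + n - r] + tol
                 for k in range(r))
print(interlaced)
```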
Another useful tool to prove eigenvalue equalities is the Courant-Fischer characterization of the eigenvalues of a symmetric matrix, also known as the Min-max (and Max-min) theorem.
Theorem 13.26. (Courant-Fischer) Let $A$ be a symmetric $n \times n$ matrix with eigenvalues $\lambda_1 \leq \lambda_2 \leq \cdots \leq \lambda_n$ and let $(u_1, \ldots, u_n)$ be any orthonormal basis of eigenvectors of $A$, where $u_i$ is a unit eigenvector associated with $\lambda_i$. If $\mathcal{V}_k$ denotes the set of subspaces of $\mathbb{R}^n$ of dimension $k$, then
$$\lambda_k = \max_{W \in \mathcal{V}_{n-k+1}} \ \min_{x \in W,\ x \neq 0} \frac{x^\top A x}{x^\top x},$$
$$\lambda_k = \min_{W \in \mathcal{V}_k} \ \max_{x \in W,\ x \neq 0} \frac{x^\top A x}{x^\top x}.$$
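Proof. Let us consider the second equality, the proof of the first equality being similar. Observe that the space $V_k$ spanned by $(u_1, \ldots, u_k)$ has dimension $k$, and by Proposition 13.22, we have
$$\lambda_k = \max_{x \neq 0,\ x \in V_k} \frac{x^\top A x}{x^\top x} \geq \min_{W \in \mathcal{V}_k} \ \max_{x \in W,\ x \neq 0} \frac{x^\top A x}{x^\top x}.$$
Therefore, we need to prove the reverse inequality; that is, we have to show that
$$\lambda_k \leq \max_{x \neq 0,\ x \in W} \frac{x^\top A x}{x^\top x}, \quad \text{for all } W \in \mathcal{V}_k.$$
Now, for any $W \in \mathcal{V}_k$, if we can prove that $W \cap V_{k-1}^\perp \neq (0)$, then for any nonzero $v \in W \cap V_{k-1}^\perp$, by Proposition 13.23, we have
$$\lambda_k = \min_{x \neq 0,\ x \in V_{k-1}^\perp} \frac{x^\top A x}{x^\top x} \leq \frac{v^\top A v}{v^\top v} \leq \max_{x \in W,\ x \neq 0} \frac{x^\top A x}{x^\top x}.$$
It remains to prove that $\dim(W \cap V_{k-1}^\perp) \geq 1$. However, $\dim(V_{k-1}) = k - 1$, so $\dim(V_{k-1}^\perp) = n - k + 1$, and by hypothesis $\dim(W) = k$. By the Grassmann relation,
$$\dim(W) + \dim(V_{k-1}^\perp) = \dim(W \cap V_{k-1}^\perp) + \dim(W + V_{k-1}^\perp),$$
and since $\dim(W + V_{k-1}^\perp) \leq \dim(\mathbb{R}^n) = n$, we get
$$k + n - k + 1 \leq \dim(W \cap V_{k-1}^\perp) + n;$$
that is, $1 \leq \dim(W \cap V_{k-1}^\perp)$, as claimed.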
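The Courant-Fischer theorem yields the following useful result about perturbing the eigenvalues of a symmetric matrix due to Hermann Weyl.
Proposition 13.27. Given two $n \times n$ symmetric matrices $A$ and $B = A + \delta A$, if $\alpha_1 \leq \alpha_2 \leq \cdots \leq \alpha_n$ are the eigenvalues of $A$ and $\beta_1 \leq \beta_2 \leq \cdots \leq \beta_n$ are the eigenvalues of $B$, then
$$|\alpha_k - \beta_k| \leq \rho(\delta A) \leq \|\delta A\|_2, \quad k = 1, \ldots, n.$$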
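Proof. Let $\mathcal{V}_k$ be defined as in the Courant-Fischer theorem and let $V_k$ be the subspace spanned by the $k$ eigenvectors associated with $\alpha_1, \ldots, \alpha_k$. By the Courant-Fischer theorem applied to $B$, we have
$$\begin{aligned}
\beta_k &= \min_{W \in \mathcal{V}_k} \ \max_{x \in W,\ x \neq 0} \frac{x^\top B x}{x^\top x} \\
&\leq \max_{x \in V_k} \frac{x^\top B x}{x^\top x} \\
&= \max_{x \in V_k} \left(\frac{x^\top A x}{x^\top x} + \frac{x^\top \delta A\, x}{x^\top x}\right) \\
&\leq \max_{x \in V_k} \frac{x^\top A x}{x^\top x} + \max_{x \in V_k} \frac{x^\top \delta A\, x}{x^\top x}.
\end{aligned}$$
By Proposition 13.22, we have
$$\alpha_k = \max_{x \in V_k} \frac{x^\top A x}{x^\top x},$$
so we obtain
$$\begin{aligned}
\beta_k &\leq \max_{x \in V_k} \frac{x^\top A x}{x^\top x} + \max_{x \in V_k} \frac{x^\top \delta A\, x}{x^\top x} \\
&= \alpha_k + \max_{x \in V_k} \frac{x^\top \delta A\, x}{x^\top x} \\
&\leq \alpha_k + \max_{x \in \mathbb{R}^n} \frac{x^\top \delta A\, x}{x^\top x}.
\end{aligned}$$
Since $\delta A$ is symmetric, by Proposition 13.22 the last maximum is the largest eigenvalue of $\delta A$, which is at most $\rho(\delta A) \leq \|\delta A\|_2$, so $\beta_k \leq \alpha_k + \rho(\delta A)$. By exchanging the roles of $A$ and $B$ (and replacing $\delta A$ by $-\delta A$), we also have $\alpha_k \leq \beta_k + \rho(\delta A)$, and thus
$$|\alpha_k - \beta_k| \leq \rho(\delta A) \leq \|\delta A\|_2, \quad k = 1, \ldots, n,$$
as claimed.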
Proposition 13.27 also holds for Hermitian matrices.
A pretty result of Wielandt and Hoffman asserts that
$$\sum_{k=1}^{n} (\alpha_k - \beta_k)^2 \leq \|\delta A\|_F^2,$$
where $\|\ \|_F$ is the Frobenius norm. However, the proof is significantly harder than the above proof; see Lax [71].
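Both perturbation bounds can be observed numerically. The sketch below uses a random symmetric matrix and a random symmetric perturbation (assumptions made only for illustration), and compares the sorted eigenvalues of $A$ and $A + \delta A$ against the spectral radius and the Frobenius norm of $\delta A$.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 6
M = rng.standard_normal((n, n))
A = (M + M.T) / 2
E = 1e-2 * rng.standard_normal((n, n))
dA = (E + E.T) / 2                     # symmetric perturbation

alpha = np.linalg.eigvalsh(A)          # eigenvalues of A, ascending
beta = np.linalg.eigvalsh(A + dA)      # eigenvalues of B = A + dA, ascending

rho = np.max(np.abs(np.linalg.eigvalsh(dA)))   # spectral radius of dA
weyl_gap = np.max(np.abs(alpha - beta))        # max_k |alpha_k - beta_k|
hw_lhs = np.sum((alpha - beta) ** 2)           # sum_k (alpha_k - beta_k)^2
hw_rhs = np.linalg.norm(dA, "fro") ** 2

print(weyl_gap, "<=", rho)
print(hw_lhs, "<=", hw_rhs)
```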
The Courant-Fischer theorem can also be used to prove some famous inequalities due to Hermann Weyl. Given two symmetric (or Hermitian) matrices $A$ and $B$, let $\lambda_i(A)$, $\lambda_i(B)$, and $\lambda_i(A + B)$ denote the $i$th eigenvalue of $A$, $B$, and $A + B$, respectively, arranged in nondecreasing order.
Proposition 13.28. (Weyl) Given two symmetric (or Hermitian) $n \times n$ matrices $A$ and $B$, the following inequalities hold: For all $i, j, k$ with $1 \leq i, j, k \leq n$:
1. If $i + j = k + 1$, then
$$\lambda_i(A) + \lambda_j(B) \leq \lambda_k(A + B).$$
2. If $i + j = k + n$, then
$$\lambda_k(A + B) \leq \lambda_i(A) + \lambda_j(B).$$
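Proof. Observe that the first set of inequalities is obtained from the second set by replacing $A$ by $-A$ and $B$ by $-B$, so it is enough to prove the second set of inequalities. By the Courant-Fischer theorem, there is a subspace $H$ of dimension $n - k + 1$ such that
$$\lambda_k(A + B) = \min_{x \in H,\ x \neq 0} \frac{x^\top (A + B) x}{x^\top x}.$$
Similarly, there exist a subspace $F$ of dimension $i$ and a subspace $G$ of dimension $j$ such that
$$\lambda_i(A) = \max_{x \in F,\ x \neq 0} \frac{x^\top A x}{x^\top x}, \qquad \lambda_j(B) = \max_{x \in G,\ x \neq 0} \frac{x^\top B x}{x^\top x}.$$
We claim that $F \cap G \cap H \neq (0)$. To prove this, we use the Grassmann relation twice. First,
$$\dim(F \cap G \cap H) = \dim(F) + \dim(G \cap H) - \dim(F + (G \cap H)) \geq \dim(F) + \dim(G \cap H) - n,$$
and second,
$$\dim(G \cap H) = \dim(G) + \dim(H) - \dim(G + H) \geq \dim(G) + \dim(H) - n,$$
so
$$\dim(F \cap G \cap H) \geq \dim(F) + \dim(G) + \dim(H) - 2n.$$
However,
$$\dim(F) + \dim(G) + \dim(H) = i + j + n - k + 1$$
and $i + j = k + n$, so we have
$$\dim(F \cap G \cap H) \geq i + j + n - k + 1 - 2n = k + n + n - k + 1 - 2n = 1,$$
which shows that $F \cap G \cap H \neq (0)$. Then, for any unit vector $z \in F \cap G \cap H$, we have
$$\lambda_k(A + B) \leq z^\top (A + B) z, \quad \lambda_i(A) \geq z^\top A z, \quad \lambda_j(B) \geq z^\top B z,$$
which establishes the desired inequality $\lambda_k(A + B) \leq \lambda_i(A) + \lambda_j(B)$.
In particular, taking $i = j = k = n$ in the second set of inequalities gives
$$\lambda_n(A + B) \leq \lambda_n(A) + \lambda_n(B).$$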
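The Weyl inequalities can be tested exhaustively over all admissible index triples for a given pair of matrices. The function `weyl_holds` below is a hypothetical helper written for this sketch, and the random symmetric matrices are an illustrative choice. Indices $i, j, k$ are 1-based as in Proposition 13.28 and are shifted when indexing arrays.

```python
import numpy as np

def weyl_holds(A, B, tol=1e-10):
    """Check both families of Weyl inequalities for symmetric A and B."""
    n = A.shape[0]
    a = np.linalg.eigvalsh(A)
    b = np.linalg.eigvalsh(B)
    s = np.linalg.eigvalsh(A + B)
    for i in range(1, n + 1):
        for j in range(1, n + 1):
            k = i + j - 1          # case i + j = k + 1
            if k <= n and a[i - 1] + b[j - 1] > s[k - 1] + tol:
                return False
            k = i + j - n          # case i + j = k + n
            if k >= 1 and s[k - 1] > a[i - 1] + b[j - 1] + tol:
                return False
    return True

rng = np.random.default_rng(6)
M1 = rng.standard_normal((5, 5))
M2 = rng.standard_normal((5, 5))
A = (M1 + M1.T) / 2
B = (M2 + M2.T) / 2
print(weyl_holds(A, B))
```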
13.7 Summary
The main concepts and results of this chapter are listed below:
• Normal linear maps, self-adjoint linear maps, skew-self-adjoint linear maps, and orthogonal linear maps.
• Properties of the eigenvalues and eigenvectors of a normal linear map.
• The complexification of a real vector space, of a linear map, and of a Euclidean inner product.
• The eigenvalues of a self-adjoint map in a Hermitian space are real.
• The eigenvalues of a self-adjoint map in a Euclidean space are real.
• Every self-adjoint linear map on a Euclidean space has an orthonormal basis of eigenvectors.
• Every normal linear map on a Euclidean space can be block diagonalized (blocks of size at most $2 \times 2$) with respect to an orthonormal basis of eigenvectors.
• Every normal linear map on a Hermitian space can be diagonalized with respect to an orthonormal basis of eigenvectors.
• The spectral theorems for self-adjoint, skew-self-adjoint, and orthogonal linear maps (on a Euclidean space).
• The spectral theorems for normal, symmetric, skew-symmetric, and orthogonal (real) matrices.
• The spectral theorems for normal, Hermitian, skew-Hermitian, and unitary (complex) matrices.
• The conditioning of eigenvalue problems.
• The Rayleigh ratio and the Rayleigh-Ritz theorem.
• Interlacing inequalities and the Cauchy interlacing theorem.
• The Poincaré separation theorem.
• The Courant-Fischer theorem.
• Inequalities involving perturbations of the eigenvalues of a symmetric matrix.
• The Weyl inequalities.
Chapter 14
Bilinear Forms and Their Geometries
14.1 Bilinear Forms
In this chapter, we study the structure of a $K$-vector space $E$ endowed with a nondegenerate bilinear form $\varphi\colon E \times E \to K$ (for any field $K$), which can be viewed as a kind of generalized inner product. Unlike the case of an inner product, there may be nonzero vectors $u \in E$ such that $\varphi(u, u) = 0$, so the map $u \mapsto \varphi(u, u)$ can no longer be interpreted as a notion of square length (also, $\varphi(u, u)$ may not be real and positive!). However, the notion of orthogonality survives: we say that $u, v \in E$ are orthogonal iff $\varphi(u, v) = 0$. Under some additional conditions on $\varphi$, it is then possible to split $E$ into orthogonal subspaces having some special properties. It turns out that the special cases where $\varphi$ is symmetric (or Hermitian) or skew-symmetric (or skew-Hermitian) can be handled uniformly using a deep theorem due to Witt (the Witt decomposition theorem (1936)).
We begin with the very general situation of a bilinear form $\varphi\colon E \times F \to K$, where $K$ is an arbitrary field, possibly of characteristic 2. Even though at first glance this may appear to be an unnecessary abstraction, this situation arises in attempting to prove properties of a bilinear map $\varphi\colon E \times E \to K$, because it may be necessary to restrict $\varphi$ to different subspaces $U$ and $V$ of $E$. This general approach was pioneered by Chevalley [22], E. Artin [3], and Bourbaki [13]. The third source was a major source of inspiration, and many proofs are taken from it. Other useful references include Snapper and Troyer [99], Berger [9], Jacobson [59], Grove [52], Taylor [108], and Berndt [11].
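Definition 14.1. Given two vector spaces $E$ and $F$ over a field $K$, a map $\varphi\colon E \times F \to K$ is a bilinear form iff the following conditions hold: For all $u, u_1, u_2 \in E$, all $v, v_1, v_2 \in F$, for all $\lambda \in K$, we have
\[ \varphi(u_1 + u_2, v) = \varphi(u_1, v) + \varphi(u_2, v) \]
\[ \varphi(u, v_1 + v_2) = \varphi(u, v_1) + \varphi(u, v_2) \]
\[ \varphi(\lambda u, v) = \lambda\varphi(u, v) \]
\[ \varphi(u, \lambda v) = \lambda\varphi(u, v). \]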
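A bilinear form as in Definition 14.1 is sometimes called a pairing. The first two conditions imply that $\varphi(0, v) = \varphi(u, 0) = 0$ for all $u \in E$ and all $v \in F$.
If $E = F$, observe that
\[ \varphi(\lambda u + \mu v, \lambda u + \mu v) = \lambda\varphi(u, \lambda u + \mu v) + \mu\varphi(v, \lambda u + \mu v) = \lambda^2\varphi(u, u) + \lambda\mu\varphi(u, v) + \lambda\mu\varphi(v, u) + \mu^2\varphi(v, v). \]
If we let $\lambda = \mu = 1$, we get
\[ \varphi(u + v, u + v) = \varphi(u, u) + \varphi(u, v) + \varphi(v, u) + \varphi(v, v). \]
If $\varphi$ is symmetric, which means that
\[ \varphi(u, v) = \varphi(v, u) \quad \text{for all } u, v \in E, \]
then
\[ 2\varphi(u, v) = \varphi(u + v, u + v) - \varphi(u, u) - \varphi(v, v). \]
The function $\Phi$ defined such that
\[ \Phi(u) = \varphi(u, u), \quad u \in E, \]
is called the quadratic form associated with $\varphi$. If the field $K$ is not of characteristic 2, then $\varphi$ is completely determined by its quadratic form $\Phi$. The symmetric bilinear form $\varphi$ is called the polar form of $\Phi$. This suggests the following definition.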
Definition 14.2. A function $\Phi\colon E \to K$ is a quadratic form on $E$ if the following conditions hold:
(1) We have $\Phi(\lambda u) = \lambda^2\Phi(u)$, for all $u \in E$ and all $\lambda \in K$.
(2) The map $\varphi'$ given by $\varphi'(u, v) = \Phi(u + v) - \Phi(u) - \Phi(v)$ is bilinear. Obviously, the map $\varphi'$ is symmetric.
Since $\Phi(x + x) = \Phi(2x) = 4\Phi(x)$, we have
\[ \varphi'(u, u) = 2\Phi(u), \quad u \in E. \]
If the field $K$ is not of characteristic 2, then $\varphi = \frac{1}{2}\varphi'$ is the unique symmetric bilinear form such that $\varphi(u, u) = \Phi(u)$ for all $u \in E$. The bilinear form $\varphi = \frac{1}{2}\varphi'$ is called the polar form of $\Phi$. In this case, there is a bijection between the set of symmetric bilinear forms on $E$ and the set of quadratic forms on $E$.
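As a small numerical check of the polar form, the following Python sketch works over $K = \mathbb{Q}$ (which is not of characteristic 2) with exact rational arithmetic. The matrix `M`, the vectors `u` and `v`, and the helper names are illustrative assumptions, not part of the text.

```python
from fractions import Fraction

def bilinear(M, x, y):
    """Evaluate phi(x, y) = sum over i, j of x_i * M[i][j] * y_j."""
    return sum(x[i] * M[i][j] * y[j]
               for i in range(len(x)) for j in range(len(y)))

def quadratic(M, u):
    """Quadratic form Phi(u) = phi(u, u)."""
    return bilinear(M, u, u)

def polar(M, u, v):
    """Polar form (Phi(u + v) - Phi(u) - Phi(v)) / 2, computed from Phi alone."""
    w = [a + b for a, b in zip(u, v)]
    return (quadratic(M, w) - quadratic(M, u) - quadratic(M, v)) / 2

# Hypothetical symmetric matrix over Q, chosen only for illustration.
M = [[Fraction(2), Fraction(-1)],
     [Fraction(-1), Fraction(3)]]
u = [Fraction(1), Fraction(2)]
v = [Fraction(3), Fraction(-1)]

# The polar form recovers the original symmetric bilinear form.
print(polar(M, u, v) == bilinear(M, u, v))  # True
```

The division by 2 in `polar` is exactly the step that is unavailable in characteristic 2.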
If $K$ is a field of characteristic 2, then $\varphi'$ is alternating, which means that
\[ \varphi'(u, u) = 0 \quad \text{for all } u \in E. \]
Thus, $\Phi$ cannot be recovered from the symmetric bilinear form $\varphi'$. However, there is some (nonsymmetric) bilinear form $\psi$ such that $\Phi(u) = \psi(u, u)$ for all $u \in E$. Thus, quadratic forms are more general than symmetric bilinear forms (except in characteristic $\neq 2$).
In general, if K is a field of any characteristic, the identity
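Definition 14.3. Given a bilinear map $\varphi\colon E \times F \to K$, for every $u \in E$, let $l_\varphi(u)$ be the linear form in $F^*$ given by
\[ l_\varphi(u)(y) = \varphi(u, y) \quad \text{for all } y \in F, \]
and for every $v \in F$, let $r_\varphi(v)$ be the linear form in $E^*$ given by
\[ r_\varphi(v)(x) = \varphi(x, v) \quad \text{for all } x \in E. \]
Because $\varphi$ is bilinear, the maps $l_\varphi\colon E \to F^*$ and $r_\varphi\colon F \to E^*$ are linear.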
Definition 14.4. A bilinear map $\varphi\colon E \times F \to K$ is said to be nondegenerate iff the following conditions hold:
(1) For every $u \in E$, if $\varphi(u, v) = 0$ for all $v \in F$, then $u = 0$, and
(2) For every $v \in F$, if $\varphi(u, v) = 0$ for all $u \in E$, then $v = 0$.
The following proposition shows the importance of $l_\varphi$ and $r_\varphi$.
Proposition 14.1. Given a bilinear map $\varphi\colon E \times F \to K$, the following properties hold:
(a) The map $l_\varphi$ is injective iff property (1) of Definition 14.4 holds.
(b) The map $r_\varphi$ is injective iff property (2) of Definition 14.4 holds.
(c) The bilinear form $\varphi$ is nondegenerate iff $l_\varphi$ and $r_\varphi$ are injective.
(d) If the bilinear form $\varphi$ is nondegenerate and if $E$ and $F$ have finite dimensions, then $\dim(E) = \dim(F)$, and $l_\varphi\colon E \to F^*$ and $r_\varphi\colon F \to E^*$ are linear isomorphisms.
Proof. (a) Assume that (1) of Definition 14.4 holds. If $l_\varphi(u) = 0$, then $l_\varphi(u)$ is the linear form whose value is 0 for all $y$; that is,
\[ l_\varphi(u)(y) = \varphi(u, y) = 0 \quad \text{for all } y \in F, \]
and by (1) of Definition 14.4, we must have $u = 0$. Therefore, $l_\varphi$ is injective. Conversely, if $l_\varphi$ is injective, and if
\[ l_\varphi(u)(y) = \varphi(u, y) = 0 \quad \text{for all } y \in F, \]
then $l_\varphi(u)$ is the zero form, and by injectivity of $l_\varphi$, we get $u = 0$; that is, (1) of Definition 14.4 holds.
(b) The proof is obtained by swapping the arguments of $\varphi$.
(c) This follows from (a) and (b).
(d) If $E$ and $F$ are finite dimensional, then $\dim(E) = \dim(E^*)$ and $\dim(F) = \dim(F^*)$. Since $\varphi$ is nondegenerate, $l_\varphi\colon E \to F^*$ and $r_\varphi\colon F \to E^*$ are injective, so $\dim(E) \leq \dim(F^*) = \dim(F)$ and $\dim(F) \leq \dim(E^*) = \dim(E)$, which implies that
\[ \dim(E) = \dim(F), \]
and thus, $l_\varphi\colon E \to F^*$ and $r_\varphi\colon F \to E^*$ are bijective.
As a corollary of Proposition 14.1, we have the following characterization of a nondegenerate bilinear map. The proof is left as an exercise.
Proposition 14.2. Given a bilinear map $\varphi\colon E \times F \to K$, if $E$ and $F$ have the same finite dimension, then the following properties are equivalent:
(1) The map $l_\varphi$ is injective.
(2) The map $l_\varphi$ is surjective.
(3) The map $r_\varphi$ is injective.
(4) The map $r_\varphi$ is surjective.
(5) The bilinear form $\varphi$ is nondegenerate.
Observe that in terms of the canonical pairing between $E^*$ and $E$ given by
\[ \langle f, u \rangle = f(u), \quad f \in E^*,\ u \in E, \]
\[ \varphi\Big(\sum_{i=1}^m x_i e_i, \sum_{j=1}^n y_j f_j\Big) = \sum_{i=1}^m \sum_{j=1}^n x_i \varphi(e_i, f_j) y_j. \]
This shows that $\varphi$ is completely determined by the $m \times n$ matrix $M = (\varphi(e_i, f_j))$, and in matrix form, we have
\[ \varphi(x, y) = x^\top M y = y^\top M^\top x, \]
where $x$ and $y$ are the column vectors associated with $(x_1, \ldots, x_m) \in K^m$ and $(y_1, \ldots, y_n) \in K^n$. As in Section 10.1, we are committing the slight abuse of notation of letting $x$ denote both the vector $x = \sum_{i=1}^m x_i e_i$ and the column vector associated with $(x_1, \ldots, x_m)$ (and similarly for $y$). We call $M$ the matrix of $\varphi$ with respect to the bases $(e_1, \ldots, e_m)$ and $(f_1, \ldots, f_n)$.
If $m = \dim(E) = \dim(F) = n$, then it is easy to check that $\varphi$ is nondegenerate iff $M$ is invertible iff $\det(M) \neq 0$.
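The determinant criterion can be tried out on small examples. The sketch below is only an illustration over $\mathbb{Q}$: the matrices `M_good` and `M_bad`, the vector `u`, and the helper functions are assumptions made for this example, and the determinant is computed by plain Gaussian elimination.

```python
from fractions import Fraction

def phi(M, x, y):
    """Evaluate phi(x, y) = x^T M y for an m x n matrix M."""
    return sum(x[i] * M[i][j] * y[j]
               for i in range(len(M)) for j in range(len(M[0])))

def det(M):
    """Determinant of a square matrix by Gaussian elimination over Q."""
    A = [[Fraction(a) for a in row] for row in M]
    n, d = len(A), Fraction(1)
    for c in range(n):
        pivot = next((r for r in range(c, n) if A[r][c] != 0), None)
        if pivot is None:
            return Fraction(0)        # no pivot in this column: singular
        if pivot != c:
            A[c], A[pivot] = A[pivot], A[c]
            d = -d                    # a row swap flips the sign
        d *= A[c][c]
        for r in range(c + 1, n):
            f = A[r][c] / A[c][c]
            A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return d

M_good = [[1, 0], [0, -1]]   # invertible, so the form is nondegenerate
M_bad = [[1, 2], [2, 4]]     # rank 1, so the form is degenerate
u = [2, -1]                  # nonzero vector with u^T M_bad = 0

print(det(M_good))                                    # -1
print(det(M_bad))                                     # 0
print([phi(M_bad, u, e) for e in ([1, 0], [0, 1])])   # [0, 0]
```

For `M_bad`, the vector `u` is nonzero yet pairs to zero with both basis vectors, which is a direct failure of condition (1) of nondegeneracy.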
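As we will see later, most bilinear forms that we will encounter are equivalent to one whose matrix is of the following form:
1. $I_n$, $-I_n$.
2. If $p + q = n$, with $p, q \geq 1$,
\[ I_{p,q} = \begin{pmatrix} I_p & 0 \\ 0 & -I_q \end{pmatrix}. \]
3. If $n = 2m$,
\[ J_{m,m} = \begin{pmatrix} 0 & I_m \\ -I_m & 0 \end{pmatrix}. \]
4. If $n = 2m$,
\[ A_{m,m} = I_{m,m} J_{m,m} = \begin{pmatrix} 0 & I_m \\ I_m & 0 \end{pmatrix}. \]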
xi e i =
i x2i ,
i=1
i=1
xi e i =
x2i ,
i=1
i=1
xi e i =
i=p+1
i=1
i=1
with $0 \leq p, q$ and $p + q \leq n$.
Proof. The first statement is a direct consequence of Theorem 14.4. If $K = \mathbb{C}$, then every $\lambda_i$ has a square root $\mu_i$, and if we replace $e_i$ by $e_i/\mu_i$, we obtain the desired form.
If $K = \mathbb{R}$, then there are two cases:
1. If $\lambda_i > 0$, let $\mu_i$ be a positive square root of $\lambda_i$ and replace $e_i$ by $e_i/\mu_i$.
2. If $\lambda_i < 0$, let $\mu_i$ be a positive square root of $-\lambda_i$ and replace $e_i$ by $e_i/\mu_i$.
In the nondegenerate case, the matrices corresponding to the complex and the real case are $I_n$, $-I_n$, and $I_{p,q}$. Observe that the second statement of Proposition 14.4 holds in any field in which every element has a square root. In the case $K = \mathbb{R}$, we can show that $(p, q)$ only depends on $\varphi$.
For any subspace $U$ of $E$, we say that $\varphi$ is positive definite on $U$ iff $\varphi(u, u) > 0$ for all nonzero $u \in U$, and we say that $\varphi$ is negative definite on $U$ iff $\varphi(u, u) < 0$ for all nonzero $u \in U$. Then, let
\[ r = \max\{\dim(U) \mid U \subseteq E,\ \varphi \text{ is positive definite on } U\} \]
and let
\[ s = \max\{\dim(U) \mid U \subseteq E,\ \varphi \text{ is negative definite on } U\}. \]
Proposition 14.6. (Sylvester's inertia law) Given any symmetric bilinear form $\varphi\colon E \times E \to \mathbb{R}$ with $\dim(E) = n$, for any basis $(e_1, \ldots, e_n)$ of $E$ such that
\[ \Phi\Big(\sum_{i=1}^n x_i e_i\Big) = \sum_{i=1}^p x_i^2 - \sum_{i=p+1}^{p+q} x_i^2, \]
These last two results can be generalized to ordered fields. For example, see Snapper and
Troyer [99], Artin [3], and Bourbaki [13].
14.2
Sesquilinear Forms
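\[ 2(\lambda - \overline{\lambda})\varphi(u, v) = \lambda\varphi(u + v, u + v) - \lambda\varphi(u - v, u - v) - \varphi(u + \lambda v, u + \lambda v) + \varphi(u - \lambda v, u - \lambda v). \]
If the automorphism $\lambda \mapsto \overline{\lambda}$ is not the identity, then there is some $\lambda \in K$ such that $\lambda - \overline{\lambda} \neq 0$, and if $K$ is not of characteristic 2, then we see that the sesquilinear form $\varphi$ is completely determined by its restriction to the diagonal (that is, the set of values $\{\varphi(u, u) \mid u \in E\}$). In the special case where $K = \mathbb{C}$, we can pick $\lambda = i$, and we get
\[ 4\varphi(u, v) = \varphi(u + v, u + v) - \varphi(u - v, u - v) + i\varphi(u + iv, u + iv) - i\varphi(u - iv, u - iv). \]
Remark: If the automorphism $\lambda \mapsto \overline{\lambda}$ is the identity, then in general $\varphi$ is not determined by its value on the diagonal, unless $\varphi$ is symmetric.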
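The complex polarization identity can be checked numerically. The sketch below assumes the convention that the form is linear in its first argument and conjugate-linear in its second, represented as $x^\top M \overline{y}$; the matrix `M` and the vectors are arbitrary illustrative choices with Gaussian-integer entries, so the floating-point arithmetic is exact. Note that `M` is deliberately not Hermitian: the identity needs only sesquilinearity.

```python
def phi(M, x, y):
    """Sesquilinear form x^T M conj(y): linear in x, conjugate-linear in y."""
    return sum(x[i] * M[i][j] * y[j].conjugate()
               for i in range(len(x)) for j in range(len(y)))

def on_diagonal(M, u):
    """Value of the form on the diagonal, phi(u, u)."""
    return phi(M, u, u)

def combine(u, v, lam):
    """Return the vector u + lam * v."""
    return [a + lam * b for a, b in zip(u, v)]

def polarize(M, u, v):
    """Recover phi(u, v) from diagonal values only (the case lambda = i)."""
    q = lambda lam: on_diagonal(M, combine(u, v, lam))
    return (q(1) - q(-1) + 1j * q(1j) - 1j * q(-1j)) / 4

# Hypothetical data chosen for illustration.
M = [[1 + 2j, 3 + 0j], [0 - 1j, 2 + 1j]]
u = [1 + 1j, 2 + 0j]
v = [0 + 3j, 1 - 1j]

print(polarize(M, u, v) == phi(M, u, v))  # True
```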
In the sesquilinear setting, it turns out that the following two cases are of interest:
1. We have
\[ \varphi(v, u) = \overline{\varphi(u, v)}, \quad \text{for all } u, v \in E, \]
in which case we say that $\varphi$ is Hermitian. In the special case where $K = \mathbb{C}$ and the involutive automorphism is conjugation, we see that $\varphi(u, u) \in \mathbb{R}$, for $u \in E$.
2. We have
\[ \varphi(v, u) = -\overline{\varphi(u, v)}, \quad \text{for all } u, v \in E, \]
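Proof. We give the proof in the Hermitian case, the skew-Hermitian case being left as an exercise. Assume that $\varphi$ is alternating. From the identity
\[ \varphi(u + v, u + v) = \varphi(u, u) + \varphi(u, v) + \overline{\varphi(u, v)} + \varphi(v, v), \]
we get
\[ \varphi(u, v) = -\overline{\varphi(u, v)} \quad \text{for all } u, v \in E. \]
Since $\varphi$ is not the zero form, there exist some nonzero vectors $u, v \in E$ such that $\varphi(u, v) = 1$. For any $\lambda \in K$, we have
\[ \lambda\varphi(u, v) = \varphi(\lambda u, v) = -\overline{\varphi(\lambda u, v)} = -\overline{\lambda}\,\overline{\varphi(u, v)}, \]
and since $\varphi(u, v) = 1$, we get
\[ \lambda = -\overline{\lambda} \quad \text{for all } \lambda \in K. \]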
The reader should check that because we used $\overline{\varphi(u, y)}$ in the definition of $l_\varphi(u)(y)$, the function $l_\varphi(u)$ is indeed a linear form in $F^*$. It is also easy to check that $l_\varphi$ is a linear map $l_\varphi\colon E \to \overline{F}^*$, and that $r_\varphi$ is a linear map $r_\varphi\colon F \to \overline{E}^*$ (equivalently, $l_\varphi\colon E \to F^*$ and $r_\varphi\colon F \to E^*$ are semilinear).
The notion of a nondegenerate sesquilinear form is identical to the notion for bilinear forms. For the convenience of the reader, we repeat the definition.
\[ \varphi\Big(\sum_{i=1}^m x_i e_i, \sum_{j=1}^n y_j f_j\Big) = \sum_{i=1}^m \sum_{j=1}^n x_i \varphi(e_i, f_j) \overline{y}_j. \]
This shows that $\varphi$ is completely determined by the $m \times n$ matrix $M = (\varphi(e_i, f_j))$, and in matrix form, we have
\[ \varphi(x, y) = x^\top M \overline{y} = y^* M^\top x, \]
where $x$ and $\overline{y}$ are the column vectors associated with $(x_1, \ldots, x_m) \in K^m$ and $(\overline{y}_1, \ldots, \overline{y}_n) \in K^n$, and $y^* = \overline{y}^\top$. As earlier, we are committing the slight abuse of notation of letting $x$ denote both the vector $x = \sum_{i=1}^m x_i e_i$ and the column vector associated with $(x_1, \ldots, x_m)$ (and similarly for $y$). We call $M$ the matrix of $\varphi$ with respect to the bases $(e_1, \ldots, e_m)$ and $(f_1, \ldots, f_n)$.
If $m = \dim(E) = \dim(F) = n$, then $\varphi$ is nondegenerate iff $M$ is invertible iff $\det(M) \neq 0$.
Observe that if $\varphi$ is a Hermitian form ($E = F$) and if $K$ does not have characteristic 2, then by Theorem 14.9, there is a basis of $E$ with respect to which the matrix $M$ representing $\varphi$ is a diagonal matrix. If $K = \mathbb{C}$, then the diagonal entries are real, and this allows us to classify completely the Hermitian forms.
Proposition 14.10. Given any Hermitian form $\varphi\colon E \times E \to \mathbb{C}$ with $\dim(E) = n$, there is a basis $(e_1, \ldots, e_n)$ of $E$ such that
\[ \Phi\Big(\sum_{i=1}^n x_i e_i\Big) = \sum_{i=1}^p |x_i|^2 - \sum_{i=p+1}^{p+q} |x_i|^2, \]
with $0 \leq p, q$ and $p + q \leq n$.
The proof of Proposition 14.10 is the same as the real case of Proposition 14.5. Sylvester's inertia law (Proposition 14.6) also holds for Hermitian forms: $p$ and $q$ only depend on $\varphi$.
14.3
Orthogonality
is a notational ambiguity if $U = V$. In this case, we may write $U^{\perp_r}$ for the right orthogonal and $U^{\perp_l}$ for the left orthogonal.
The above discussion brings up the following point: When is orthogonality symmetric?
If $\varphi$ is bilinear, it is shown in E. Artin [3] (and in Jacobson [59]) that orthogonality is symmetric iff either $\varphi$ is symmetric or $\varphi$ is alternating ($\varphi(u, u) = 0$ for all $u \in E$).
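If $\varphi$ is sesquilinear, the answer is more complicated. In addition to the previous two cases, there is a third possibility:
\[ \varphi(u, v) = \epsilon\overline{\varphi(v, u)} \quad \text{for all } u, v \in E, \]
where $\epsilon$ is some nonzero element in $K$. We say that $\varphi$ is $\epsilon$-Hermitian. Observe that
\[ \varphi(u, u) = \epsilon\overline{\varphi(u, u)}, \]
so if $\varphi$ is not alternating, then $\varphi(u, u) \neq 0$ for some $u$, and we must have $\epsilon\overline{\epsilon} = 1$. The most common cases are
1. $\epsilon = 1$, in which case $\varphi$ is Hermitian, and
2. $\epsilon = -1$, in which case $\varphi$ is skew-Hermitian.
If $\varphi$ is alternating and $K$ is not of characteristic 2, then the automorphism $\lambda \mapsto \overline{\lambda}$ must be the identity if $\varphi$ is nonzero. If so, $\varphi$ is skew-symmetric, so $\epsilon = -1$.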
In summary, if $\varphi$ is either symmetric, alternating, or $\epsilon$-Hermitian, then orthogonality is symmetric, and it makes sense to talk about the orthogonal subspace $U^\perp$ of $U$.
Observe that if $\varphi$ is $\epsilon$-Hermitian, then
\[ r_\varphi = \epsilon\, l_\varphi. \]
This is because
\[ l_\varphi(u)(y) = \overline{\varphi(u, y)} \]
\[ r_\varphi(u)(y) = \varphi(y, u) = \epsilon\overline{\varphi(u, y)}, \]
so $r_\varphi = \epsilon\, l_\varphi$.
If $E$ and $F$ are finite-dimensional with bases $(e_1, \ldots, e_m)$ and $(f_1, \ldots, f_n)$, and if $\varphi$ is represented by the $m \times n$ matrix $M$, then $\varphi$ is $\epsilon$-Hermitian iff
\[ M = \epsilon M^*, \]
where $M^* = (\overline{M})^\top$ (as usual). This captures the following kinds of familiar matrices:
1. Symmetric matrices ($\epsilon = 1$)
2. Skew-symmetric matrices ($\epsilon = -1$)
3. Hermitian matrices ($\epsilon = 1$)
4. Skew-Hermitian matrices ($\epsilon = -1$).
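Going back to a sesquilinear form $\varphi\colon E \times F \to K$, for any subspace $U$ of $E$, it is easy to check that
\[ U \subseteq (U^\perp)^\perp, \]
and that for any subspace $V$ of $F$, we have
\[ V \subseteq (V^\perp)^\perp. \]
For simplicity of notation, we write $U^{\perp\perp}$ instead of $(U^\perp)^\perp$ (and $V^{\perp\perp}$ instead of $(V^\perp)^\perp$).
Given any two subspaces $U_1$ and $U_2$ of $E$, if $U_1 \subseteq U_2$, then $U_2^\perp \subseteq U_1^\perp$ (and similarly for any two subspaces $V_1 \subseteq V_2$ of $F$). As a consequence, it is easy to show that
\[ U^\perp = U^{\perp\perp\perp}, \quad V^\perp = V^{\perp\perp\perp}. \]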
Proposition 14.11. For any sesquilinear form $\varphi\colon E \times F \to K$, the space $E/F^\perp$ is finite-dimensional iff the space $F/E^\perp$ is finite-dimensional; if so, $\dim(E/F^\perp) = \dim(F/E^\perp)$.
Proof. Since the sesquilinear form $[\varphi]\colon (E/F^\perp) \times (F/E^\perp) \to K$ is nondegenerate, the maps $l_{[\varphi]}\colon (E/F^\perp) \to (F/E^\perp)^*$ and $r_{[\varphi]}\colon (F/E^\perp) \to (E/F^\perp)^*$ are injective. If $\dim(E/F^\perp) = m$, then $\dim(E/F^\perp) = \dim((E/F^\perp)^*)$, so by injectivity of $r_{[\varphi]}$, we have $\dim(F/E^\perp) \leq \dim((E/F^\perp)^*) = m$. A similar reasoning using the injectivity of $l_{[\varphi]}$ applies if $\dim(F/E^\perp) = n$, and we get $\dim(E/F^\perp) \leq \dim((F/E^\perp)^*) = n$. Therefore, $\dim(E/F^\perp) = m$ is finite iff $\dim(F/E^\perp) = n$ is finite, in which case $m = n$.
If $U$ is a subspace of a space $E$, recall that the codimension of $U$ is the dimension of $E/U$, which is also equal to the dimension of any subspace $V$ such that $E$ is a direct sum of $U$ and $V$ ($E = U \oplus V$).
Proposition 14.11 implies the following useful fact.
14.4
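for all $x \in E_1$.
Thus, we get a function $f^{*_l}\colon E_2 \to E_1$. We claim that this function is a linear map. For any $v_1, v_2 \in E_2$, we have
\[ \begin{aligned} \varphi_2(f(x), v_1 + v_2) &= \varphi_2(f(x), v_1) + \varphi_2(f(x), v_2) \\ &= \varphi_1(x, f^{*_l}(v_1)) + \varphi_1(x, f^{*_l}(v_2)) \\ &= \varphi_1(x, f^{*_l}(v_1) + f^{*_l}(v_2)) \\ &= \varphi_1(x, f^{*_l}(v_1 + v_2)), \end{aligned} \]
for all $x \in E_1$. Since $r_{\varphi_1}$ is injective, we conclude that
\[ f^{*_l}(v_1 + v_2) = f^{*_l}(v_1) + f^{*_l}(v_2). \]
For any $\lambda \in K$, we have
\[ \begin{aligned} \varphi_2(f(x), \lambda v) &= \overline{\lambda}\varphi_2(f(x), v) \\ &= \overline{\lambda}\varphi_1(x, f^{*_l}(v)) \\ &= \varphi_1(x, \lambda f^{*_l}(v)) \\ &= \varphi_1(x, f^{*_l}(\lambda v)), \end{aligned} \]
for all $x \in E_1$. Since $r_{\varphi_1}$ is injective, we conclude that
\[ f^{*_l}(\lambda v) = \lambda f^{*_l}(v). \]
Therefore, $f^{*_l}$ is linear. We call it the left adjoint of $f$.
Now, for any fixed $u \in E_2$, we can consider the linear form in $E_1^*$ given by
\[ x \mapsto \overline{\varphi_2(u, f(x))}, \quad x \in E_1. \]
Since $l_{\varphi_1}\colon E_1 \to E_1^*$ is bijective, there is a unique $y \in E_1$ so that
\[ \varphi_2(u, f(x)) = \varphi_1(y, x), \]
for all $x \in E_1$.
The map $f^{*_l}$ is called the left adjoint of $f$, and the map $f^{*_r}$ is called the right adjoint of $f$.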
If $E_1$ and $E_2$ are finite-dimensional with bases $(e_1, \ldots, e_m)$ and $(f_1, \ldots, f_n)$, then we can work out the matrices $A^{*_l}$ and $A^{*_r}$ corresponding to the left adjoint $f^{*_l}$ and the right adjoint $f^{*_r}$ of $f$. Assuming that $f$ is represented by the $n \times m$ matrix $A$, $\varphi_1$ is represented by the $m \times m$ matrix $M_1$, and $\varphi_2$ is represented by the $n \times n$ matrix $M_2$, we find that
\[ A^{*_l} = (\overline{M_1})^{-1} A^* \overline{M_2} \]
\[ A^{*_r} = (M_1^\top)^{-1} A^* M_2^\top. \]
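In the bilinear case (the automorphism is the identity, so conjugation disappears and $A^*$ becomes $A^\top$), the two formulas can be verified directly against the defining identities $\varphi_2(f(x), v) = \varphi_1(x, f^{*_l}(v))$ and $\varphi_2(u, f(x)) = \varphi_1(f^{*_r}(u), x)$. The following sketch does this over $\mathbb{Q}$ with $m = n = 2$; the matrices `M1`, `M2`, `A` and all helper names are hypothetical choices for illustration.

```python
from fractions import Fraction as Fr

def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def transpose(A):
    return [list(col) for col in zip(*A)]

def inv2(M):
    """Inverse of an invertible 2 x 2 matrix."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def apply(A, x):
    """Matrix-vector product A x."""
    return [sum(A[i][k] * x[k] for k in range(len(x))) for i in range(len(A))]

def form(M, x, y):
    """Bilinear form x^T M y."""
    return sum(x[i] * M[i][j] * y[j]
               for i in range(len(x)) for j in range(len(y)))

# Hypothetical nondegenerate, nonsymmetric forms and a linear map.
M1 = [[Fr(1), Fr(2)], [Fr(0), Fr(1)]]
M2 = [[Fr(2), Fr(1)], [Fr(1), Fr(1)]]
A = [[Fr(1), Fr(-1)], [Fr(3), Fr(2)]]

# Bilinear versions of the two formulas (no conjugation).
A_left = matmul(matmul(inv2(M1), transpose(A)), M2)
A_right = matmul(matmul(inv2(transpose(M1)), transpose(A)), transpose(M2))

vectors = [[Fr(1), Fr(0)], [Fr(0), Fr(1)], [Fr(2), Fr(-3)]]
left_ok = all(form(M2, apply(A, x), v) == form(M1, x, apply(A_left, v))
              for x in vectors for v in vectors)
right_ok = all(form(M2, u, apply(A, x)) == form(M1, apply(A_right, u), x)
               for x in vectors for u in vectors)
print(left_ok, right_ok)  # True True
```

Because `M1` is not symmetric here, the two adjoints are in general different matrices.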
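If $\varphi_1$ and $\varphi_2$ are symmetric bilinear forms, then $f^{*_l} = f^{*_r}$. This also holds if $\varphi$ is $\epsilon$-Hermitian. Indeed, since
\[ \varphi_2(u, f(x)) = \varphi_1(f^{*_r}(u), x), \]
we get
\[ \epsilon\overline{\varphi_2(f(x), u)} = \epsilon\overline{\varphi_1(x, f^{*_r}(u))}, \]
and since $\lambda \mapsto \overline{\lambda}$ is an involution, we get
\[ \varphi_2(f(x), u) = \varphi_1(x, f^{*_r}(u)). \]
Since we also have
\[ \varphi_2(f(x), u) = \varphi_1(x, f^{*_l}(u)), \]
we obtain
\[ \varphi_1(x, f^{*_r}(u)) = \varphi_1(x, f^{*_l}(u)) \quad \text{for all } x \in E_1 \text{ and all } u \in E_2, \]
and since $\varphi_1$ is nondegenerate, we conclude that $f^{*_l} = f^{*_r}$. Whenever $f^{*_l} = f^{*_r}$, we use the simpler notation $f^*$.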
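If $f\colon E_1 \to E_2$ and $g\colon E_1 \to E_2$ are two linear maps, we have the following properties:
\[ (f + g)^{*_l} = f^{*_l} + g^{*_l} \]
\[ \mathrm{id}^{*_l} = \mathrm{id} \]
\[ (\lambda f)^{*_l} = \overline{\lambda} f^{*_l}, \]
and similarly for right adjoints. If $E_3$ is another space, $\varphi_3$ is a sesquilinear form on $E_3$, and if $l_{\varphi_2}$ and $r_{\varphi_2}$ are bijective, then for any linear maps $f\colon E_1 \to E_2$ and $g\colon E_2 \to E_3$, we have
\[ (g \circ f)^{*_l} = f^{*_l} \circ g^{*_l}, \]
and similarly for right adjoints. Furthermore, if $E_1 = E_2$ and $\varphi\colon E \times E \to K$ is $\epsilon$-Hermitian, for any linear map $f\colon E \to E$ (recall that in this case $f^{*_l} = f^{*_r} = f^*$), we have
\[ f^{**} = \epsilon\overline{\epsilon} f. \]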
14.5
The notion of adjoint is a good tool to investigate the notion of isometry between spaces equipped with sesquilinear forms. First, we define metric maps and isometries.
Definition 14.11. If $(E_1, \varphi_1)$ and $(E_2, \varphi_2)$ are two pairs of spaces and sesquilinear maps $\varphi_1\colon E_1 \times E_1 \to K$ and $\varphi_2\colon E_2 \times E_2 \to K$, a metric map from $(E_1, \varphi_1)$ to $(E_2, \varphi_2)$ is a linear map $f\colon E_1 \to E_2$ such that
\[ \varphi_1(u, v) = \varphi_2(f(u), f(v)) \quad \text{for all } u, v \in E_1. \]
We say that $\varphi_1$ and $\varphi_2$ are equivalent iff there is a metric map $f\colon E_1 \to E_2$ which is bijective. Such a metric map is called an isometry.
The problem of classifying sesquilinear forms up to equivalence is an important but very difficult problem. Solving this problem depends intimately on properties of the field $K$, and a complete answer is only known in a few cases. The problem is easily solved for $K = \mathbb{R}$ and $K = \mathbb{C}$. It is also solved for finite fields and for $K = \mathbb{Q}$ (the rationals), but the solution is surprisingly involved!
It is hard to say anything interesting if $\varphi_1$ is degenerate and if the linear map $f$ does not have adjoints. The next few propositions make use of natural conditions on $\varphi_1$ that yield a useful criterion for being a metric map.
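Proposition 14.15. With the same assumptions as in Definition 14.10, if $f\colon E_1 \to E_2$ is a bijective linear map, then we have
\[ \varphi_1(x, y) = \varphi_2(f(x), f(y)) \quad \text{for all } x, y \in E_1 \quad \text{iff} \quad f^{-1} = f^{*_l} = f^{*_r}. \]
Proof. We have
\[ \varphi_1(x, y) = \varphi_2(f(x), f(y)) \]
iff
\[ \varphi_1(x, y) = \varphi_2(f(x), f(y)) = \varphi_1(x, f^{*_l}(f(y))) \]
iff
\[ \varphi_1(x, (\mathrm{id} - f^{*_l} \circ f)(y)) = 0 \quad \text{for all } x, y \in E_1. \]
Since $\varphi_1$ is nondegenerate, we must have
\[ f^{*_l} \circ f = \mathrm{id}, \]
which implies that $f^{-1} = f^{*_l}$. Similarly,
\[ \varphi_1(x, y) = \varphi_2(f(x), f(y)) \]
iff
\[ \varphi_1(x, y) = \varphi_2(f(x), f(y)) = \varphi_1(f^{*_r}(f(x)), y) \]
iff
\[ \varphi_1((\mathrm{id} - f^{*_r} \circ f)(x), y) = 0 \quad \text{for all } x, y \in E_1. \]
Since $\varphi_1$ is nondegenerate, we must have
\[ f^{*_r} \circ f = \mathrm{id}, \]
which implies that $f^{-1} = f^{*_r}$. Therefore, $f^{-1} = f^{*_l} = f^{*_r}$. For the converse, do the computations in reverse.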
As a corollary, we get the following important proposition.
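Proposition 14.16. If $\varphi\colon E \times E \to K$ is a sesquilinear map, and if $l_\varphi$ and $r_\varphi$ are bijective, then for every bijective linear map $f\colon E \to E$, we have
\[ \varphi(f(x), f(y)) = \varphi(x, y) \quad \text{for all } x, y \in E \quad \text{iff} \quad f^{-1} = f^{*_l} = f^{*_r}. \]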
$(*)$
then f is injective.
(2) If $E$ is finite-dimensional and if $\varphi$ is nondegenerate, then the linear maps $f\colon E \to E$ satisfying $(*)$ form a group. The inverse of $f$ is given by $f^{-1} = f^*$.
Proof. (1) If $f(x) = 0$, then
\[ \varphi(x, y) = \varphi(f(x), f(y)) = \varphi(0, f(y)) = 0 \quad \text{for all } y \in E. \]
Since $l_\varphi$ is injective, we must have $x = 0$, and thus $f$ is injective.
(2) If $E$ is finite-dimensional, since a linear map satisfying $(*)$ is injective, it is a bijection. By Proposition 14.16, we have $f^{-1} = f^*$. We also have
\[ \varphi(f^*(x), f^*(y)) = \varphi((f \circ f^*)(x), y) = \varphi(x, y) = \varphi((f^* \circ f)(x), y) = \varphi(f(x), f(y)), \]
which shows that $f^*$ satisfies $(*)$. If $\varphi(f(x), f(y)) = \varphi(x, y)$ for all $x, y \in E$ and $\varphi(g(x), g(y)) = \varphi(x, y)$ for all $x, y \in E$, then we have
\[ \varphi((g \circ f)(x), (g \circ f)(y)) = \varphi(f(x), f(y)) = \varphi(x, y) \quad \text{for all } x, y \in E. \]
Obviously, the identity map $\mathrm{id}_E$ satisfies $(*)$. Therefore, the set of linear maps satisfying $(*)$ is a group.
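If $\varphi$ is symmetric, then the group $\mathbf{Isom}(\varphi)$ is denoted $\mathbf{O}(\varphi)$ and called the orthogonal group of $\varphi$. If $\varphi$ is alternating, then the group $\mathbf{Isom}(\varphi)$ is denoted $\mathbf{Sp}(\varphi)$ and called the symplectic group of $\varphi$. If $\varphi$ is $\epsilon$-Hermitian, then the group $\mathbf{Isom}(\varphi)$ is denoted $\mathbf{U}_\epsilon(\varphi)$ and called the $\epsilon$-unitary group of $\varphi$. When $\epsilon = 1$, we drop $\epsilon$ and just say unitary group.
If $(e_1, \ldots, e_n)$ is a basis of $E$, $\varphi$ is represented by the $n \times n$ matrix $M$, and $f$ is represented by the $n \times n$ matrix $A$, then we find that $f \in \mathbf{Isom}(\varphi)$ iff
\[ A^* M^\top A = M^\top \quad \text{iff} \quad A^\top M \overline{A} = M, \]
The group $\mathbf{O}(n)$ is the orthogonal group, $\mathbf{Sp}(2n, \mathbb{R})$ is the real symplectic group, and $\mathbf{SO}(n)$ is the special orthogonal group. We can define the group
\[ \{A \in \mathrm{Mat}_{2n}(\mathbb{R}) \mid A^\top A_{n,n} A = A_{n,n}\}, \]
The group $\mathbf{U}(n)$ is the unitary group, $\mathbf{Sp}(2n, \mathbb{C})$ is the complex symplectic group, and $\mathbf{SU}(n)$ is the special unitary group.
It can be shown that if $A \in \mathbf{Sp}(2n, \mathbb{R})$ or if $A \in \mathbf{Sp}(2n, \mathbb{C})$, then $\det(A) = 1$.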
14.6
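\[ \begin{aligned} &= (U \cap U^\perp \cap V^\perp) + (V \cap U^\perp \cap V^\perp) \\ &= (U \cap U^\perp) + (V \cap V^\perp) \\ &= \mathrm{rad}(U) + \mathrm{rad}(V). \end{aligned} \]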
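(i) $U$ is nondegenerate.
(ii) $U^\perp$ is nondegenerate.
(iii) $E = U \oplus U^\perp$.
Proof. By definition, $\mathrm{rad}(U^\perp) = U^\perp \cap U^{\perp\perp}$, and since $\varphi$ is nondegenerate and $U$ is finite-dimensional, $U^{\perp\perp} = U$, so $\mathrm{rad}(U^\perp) = U^\perp \cap U^{\perp\perp} = U \cap U^\perp = \mathrm{rad}(U)$.
By Proposition 14.19, (i) implies (iii). If $E = U \oplus U^\perp$, then $\mathrm{rad}(U) = U \cap U^\perp = (0)$, so $U$ is nondegenerate and (iii) implies (i). Since $\mathrm{rad}(U) = \mathrm{rad}(U^\perp)$, (iii) also implies (ii). Now, if $U^\perp$ is nondegenerate, we have $U^\perp \cap U^{\perp\perp} = (0)$, and since $U \subseteq U^{\perp\perp}$, we get
\[ U \cap U^\perp \subseteq U^{\perp\perp} \cap U^\perp = (0), \]
which shows that $U$ is nondegenerate, proving the implication (ii) $\Longrightarrow$ (i).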
If $E$ is finite-dimensional, we have the following results.
Proposition 14.21. Given an $\epsilon$-Hermitian form $\varphi\colon E \times E \to K$ on a finite-dimensional space $E$, if $\varphi$ is nondegenerate, then for every subspace $U$ of $E$ we have
(i) $\dim(U) + \dim(U^\perp) = \dim(E)$.
(ii) $U^{\perp\perp} = U$.
Proof. (i) Since $\varphi$ is nondegenerate and $E$ is finite-dimensional, the semilinear map $l_\varphi\colon E \to E^*$ is bijective. By transposition, the inclusion $i\colon U \to E$ yields a surjection $r\colon E^* \to U^*$ (with $r(f) = f \circ i$ for every $f \in E^*$; the map $f \circ i$ is the restriction of the linear form $f$ to $U$). It follows that the semilinear map $r \circ l_\varphi\colon E \to U^*$ given by
\[ (r \circ l_\varphi)(x)(u) = \overline{\varphi(x, u)}, \quad x \in E,\ u \in U, \]
is surjective, and its kernel is $U^\perp$. Thus, we have
\[ \dim(U^*) + \dim(U^\perp) = \dim(E), \]
and since $\dim(U) = \dim(U^*)$ because $U$ is finite-dimensional, we get
\[ \dim(U) + \dim(U^\perp) = \dim(U^*) + \dim(U^\perp) = \dim(E). \]
(ii) Applying the above formula to $U^\perp$, we deduce that $\dim(U) = \dim(U^{\perp\perp})$. Since $U \subseteq U^{\perp\perp}$, we must have $U^{\perp\perp} = U$.
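with respect to which the matrix representing $\varphi$ is a block diagonal matrix $M$ of the form
\[ M = \begin{pmatrix} J & & & & \\ & J & & & \\ & & \ddots & & \\ & & & J & \\ & & & & 0_{n-2r} \end{pmatrix}, \]
with
\[ J = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}. \]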
Proof. If $\varphi = 0$, then $E^\perp = E$ and we are done. Otherwise, there are two nonzero vectors $u, v \in E$ such that $\varphi(u, v) \neq 0$, so by Proposition 14.22, we obtain a hyperbolic plane $W_1$ spanned by two vectors $u_1, v_1$ such that $\varphi(u_1, v_1) = 1$. The subspace $W_1$ is nondegenerate (for example, $\det(J) = 1$), so by Proposition 14.20, we get a direct sum
\[ E = W_1 \oplus W_1^\perp. \]
By Proposition 14.13, we also have
\[ E^\perp = (W_1 \oplus W_1^\perp)^\perp = W_1^\perp \cap W_1^{\perp\perp} = \mathrm{rad}(W_1^\perp). \]
By the induction hypothesis applied to $W_1^\perp$, we obtain our theorem.
The following corollary follows immediately.
Proposition 14.24. Let $\varphi\colon E \times E \to K$ be an alternating bilinear form on a space $E$ of finite dimension $n$.
(1) The rank of $\varphi$ is even.
(2) If $\varphi$ is nondegenerate, then $\dim(E) = n$ is even.
(3) Two alternating bilinear forms $\varphi_1\colon E_1 \times E_1 \to K$ and $\varphi_2\colon E_2 \times E_2 \to K$ are equivalent iff $\dim(E_1) = \dim(E_2)$ and $\varphi_1$ and $\varphi_2$ have the same rank.
The only part that requires a proof is part (3), which is left as an easy exercise.
If $\varphi$ is nondegenerate, then $n = 2r$, and a basis of $E$ as in Theorem 14.23 is called a symplectic basis. The space $E$ is called a hyperbolic space (or symplectic space).
Observe that if we reorder the vectors in the basis
\[ (u_1, v_1, \ldots, u_r, v_r, w_1, \ldots, w_{n-2r}) \]
to obtain the basis $(u_1, \ldots, u_r, v_1, \ldots, v_r, w_1, \ldots, w_{n-2r})$, then the matrix representing $\varphi$ becomes
\[ \begin{pmatrix} 0 & I_r & 0 \\ -I_r & 0 & 0 \\ 0 & 0 & 0_{n-2r} \end{pmatrix}. \]
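This particularly simple matrix is often preferable, especially when dealing with the matrices (symplectic matrices) representing the isometries of $\varphi$ (in which case $n = 2r$).
We now return to the Witt decomposition. From now on, $\varphi\colon E \times E \to K$ is an $\epsilon$-Hermitian form. The following assumption will be needed:
Property (T). For every $u \in E$, there is some $\alpha \in K$ such that $\varphi(u, u) = \alpha + \epsilon\overline{\alpha}$.
Property (T) is always satisfied if $\varphi$ is alternating, or if $K$ is of characteristic $\neq 2$ and $\epsilon = \pm 1$, with $\alpha = \frac{1}{2}\varphi(u, u)$.
The following (bizarre) technical lemma will be needed.
Lemma 14.25. Let $\varphi$ be an $\epsilon$-Hermitian form on $E$ and assume that $\varphi$ satisfies property (T). For any totally isotropic subspace $U \neq (0)$ of $E$, for every $x \in E$ not orthogonal to $U$, and for every $\alpha \in K$, there is some $y \in U$ so that
\[ \varphi(x + y, x + y) = \alpha + \epsilon\overline{\alpha}. \]
Proof. By property (T), we have $\varphi(x, x) = \beta + \epsilon\overline{\beta}$ for some $\beta \in K$. For any $y \in U$, since $\varphi$ is $\epsilon$-Hermitian, $\varphi(y, x) = \epsilon\overline{\varphi(x, y)}$, and since $U$ is totally isotropic, $\varphi(y, y) = 0$, so we have
\[ \begin{aligned} \varphi(x + y, x + y) &= \varphi(x, x) + \varphi(x, y) + \varphi(y, x) + \varphi(y, y) \\ &= \beta + \epsilon\overline{\beta} + \varphi(x, y) + \epsilon\overline{\varphi(x, y)} \\ &= \beta + \varphi(x, y) + \epsilon\overline{(\beta + \varphi(x, y))}. \end{aligned} \]
Since $x$ is not orthogonal to $U$, the function $y \mapsto \varphi(x, y) + \beta$ is not the constant function. Consequently, this function takes the value $\alpha$ for some $y \in U$, which proves the lemma.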
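Definition 14.14. Let $\varphi$ be an $\epsilon$-Hermitian form on $E$. A Witt decomposition of $E$ is a triple $(U, U', W)$, such that
(i) $E = U \oplus U' \oplus W$ (a direct sum)
(ii) $U$ and $U'$ are totally isotropic
(iii) $W$ is nondegenerate and orthogonal to $U \oplus U'$.
\[ \begin{pmatrix} 0 & A & 0 \\ \epsilon\overline{A}^\top & 0 & 0 \\ 0 & 0 & B \end{pmatrix} \]
\[ E = (U \oplus U') \overset{\perp}{\oplus} W. \]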
As a warm up for Proposition 14.27, we prove an analog of Proposition 14.22 in the case
of a symmetric bilinear form.
Proposition 14.26. Let $\varphi\colon E \times E \to K$ be a nondegenerate symmetric bilinear form with $K$ a field of characteristic different from 2. For any nonzero isotropic vector $u$, there is another nonzero isotropic vector $v$ such that $\varphi(u, v) = 2$, and $u$ and $v$ are linearly independent. In the basis $(u, v/2)$, the restriction of $\varphi$ to the plane spanned by $u$ and $v/2$ is of the form
\[ \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}. \]
Proof. Since $\varphi$ is nondegenerate, there is some nonzero vector $z$ such that (rescaling $z$ if necessary) $\varphi(u, z) = 1$. If
\[ v = 2z - \varphi(z, z)u, \]
then since $\varphi(u, u) = 0$ and $\varphi(u, z) = 1$, note that
\[ \varphi(u, v) = \varphi(u, 2z - \varphi(z, z)u) = 2\varphi(u, z) - \varphi(z, z)\varphi(u, u) = 2, \]
and
\[ \begin{aligned} \varphi(v, v) &= \varphi(2z - \varphi(z, z)u, 2z - \varphi(z, z)u) \\ &= 4\varphi(z, z) - 4\varphi(z, z)\varphi(u, z) + \varphi(z, z)^2\varphi(u, u) \\ &= 4\varphi(z, z) - 4\varphi(z, z) = 0. \end{aligned} \]
If $u$ and $z$ were linearly dependent, as $u, z \neq 0$, we could write $z = \mu u$ for some $\mu \neq 0$, but then, we would have
\[ \varphi(u, z) = \varphi(u, \mu u) = \mu\varphi(u, u) = 0, \]
contradicting the fact that $\varphi(u, z) \neq 0$. Then $u$ and $v = 2z - \varphi(z, z)u$ are also linearly independent, since otherwise $z$ could be expressed as a multiple of $u$. The rest is obvious.
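The construction $v = 2z - \varphi(z, z)u$ in this proof is easy to carry out by machine. The sketch below works over $\mathbb{Q}$ with a symmetric form given by a matrix; the matrix `M`, the isotropic vector `u`, and the search for $z$ among the standard basis vectors are assumptions made for illustration (any $z$ with $\varphi(u, z) \neq 0$ would do after rescaling).

```python
from fractions import Fraction

def phi(M, x, y):
    """Symmetric bilinear form x^T M y (M is assumed symmetric)."""
    return sum(x[i] * M[i][j] * y[j]
               for i in range(len(x)) for j in range(len(y)))

def isotropic_partner(M, u):
    """Given a nonzero isotropic u, return v with phi(u, v) = 2 and phi(v, v) = 0."""
    n = len(u)
    for k in range(n):
        e = [Fraction(int(i == k)) for i in range(n)]   # k-th basis vector
        c = phi(M, u, e)
        if c != 0:
            z = [a / c for a in e]                      # rescale so phi(u, z) = 1
            zz = phi(M, z, z)
            return [2 * a - zz * b for a, b in zip(z, u)]
    raise ValueError("phi(u, .) vanishes on a basis: the form is degenerate")

# Hypothetical example: the form x1*y1 + x2*y2 - x3*y3 on Q^3.
M = [[Fraction(1), Fraction(0), Fraction(0)],
     [Fraction(0), Fraction(1), Fraction(0)],
     [Fraction(0), Fraction(0), Fraction(-1)]]
u = [Fraction(1), Fraction(0), Fraction(1)]   # isotropic: 1 + 0 - 1 = 0

v = isotropic_partner(M, u)
print(phi(M, u, u), phi(M, u, v), phi(M, v, v))  # 0 2 0
```

With respect to the pair $(u, v/2)$, the restriction of the form then has zero diagonal entries and off-diagonal entries equal to 1.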
Proposition 14.26 yields a plane spanned by two vectors $u_1, v_1$ such that $\varphi(u_1, u_1) = \varphi(v_1, v_1) = 0$ and $\varphi(u_1, v_1) = 1$. Such a plane is called an Artinian plane. Proposition 14.26 also shows that nonzero isotropic vectors come in pairs.
Remark: Some authors refer to the above plane as a hyperbolic plane. Berger (and others) point out that this terminology is undesirable because the notion of hyperbolic plane already exists in differential geometry and refers to a very different object.
We leave it as an exercise to figure out that the group of isometries of the Artinian plane, the set of all $2 \times 2$ matrices $A$ such that
\[ A^\top \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} A = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \]
consists of all matrices of the form
\[ \begin{pmatrix} \lambda & 0 \\ 0 & \lambda^{-1} \end{pmatrix} \quad \text{or} \quad \begin{pmatrix} 0 & \lambda \\ \lambda^{-1} & 0 \end{pmatrix}, \quad \lambda \in K - \{0\}. \]
In particular, if $K = \mathbb{R}$, then this group, denoted $\mathbf{O}(1, 1)$, has four connected components.
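One direction of this exercise (that both families of matrices are isometries) can be checked mechanically. The sketch below tests the condition $A^\top J A = J$ over $\mathbb{Q}$ for one sample value of $\lambda$; the value `lam`, the shear matrix used as a counterexample, and the helper names are illustrative assumptions, and the check does not prove that there are no other isometries.

```python
from fractions import Fraction

J = [[0, 1], [1, 0]]   # matrix of the Artinian plane in the basis (u1, v1)

def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def transpose(A):
    return [list(col) for col in zip(*A)]

def is_isometry(A):
    """Test the condition A^T J A = J."""
    return matmul(matmul(transpose(A), J), A) == J

lam = Fraction(3, 2)                       # any nonzero scalar would do
diagonal = [[lam, 0], [0, 1 / lam]]
antidiagonal = [[0, lam], [1 / lam, 0]]
shear = [[1, 1], [0, 1]]                   # invertible, but not an isometry

print(is_isometry(diagonal), is_isometry(antidiagonal), is_isometry(shear))
# True True False
```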
The first step in showing the existence of a Witt decomposition is this.
Proposition 14.27. Let $\varphi$ be an $\epsilon$-Hermitian form on $E$, assume that $\varphi$ is nondegenerate and satisfies property (T), and let $U$ be any totally isotropic subspace of $E$ of finite dimension $\dim(U) = r$.
(1) If $U'$ is any totally isotropic subspace of dimension $r$ and if $U' \cap U^\perp = (0)$, then $U \oplus U'$ is nondegenerate, and for any basis $(u_1, \ldots, u_r)$ of $U$, there is a basis $(u'_1, \ldots, u'_r)$ of $U'$ such that $\varphi(u_i, u'_j) = \delta_{ij}$, for all $i, j = 1, \ldots, r$.
(2) If $W$ is any totally isotropic subspace of dimension at most $r$ and if $W \cap U^\perp = (0)$, then there exists a totally isotropic subspace $U'$ with $\dim(U') = r$ such that $W \subseteq U'$ and $U' \cap U^\perp = (0)$.
Proof. (1) Let $\varphi'$ be the restriction of $\varphi$ to $U \times U'$. Since $U' \cap U^\perp = (0)$, for any $v \in U'$, if $\varphi(u, v) = 0$ for all $u \in U$, then $v = 0$. Thus, $\varphi'$ is nondegenerate (we only have to check on the left since $\varphi$ is $\epsilon$-Hermitian). Then, the assertion about bases follows from the version of Proposition 14.3 for sesquilinear forms. Since $U$ is totally isotropic, $U \subseteq U^\perp$, and since $U' \cap U^\perp = (0)$, we must have $U' \cap U = (0)$, which shows that we have a direct sum $U \oplus U'$.
It remains to prove that $U + U'$ is nondegenerate. Observe that
\[ H = (U + U') \cap (U + U')^\perp = (U + U') \cap U^\perp \cap U'^\perp. \]
Since $U$ is totally isotropic, $U \subseteq U^\perp$, and since $U' \cap U^\perp = (0)$, we have
\[ (U + U') \cap U^\perp = (U \cap U^\perp) + (U' \cap U^\perp) = U + (0) = U, \]
Proposition 14.29. Any two $\epsilon$-Hermitian neutral forms satisfying property (T) defined on spaces of the same dimension are equivalent.
Note that under the conditions of Proposition 14.28, $(U, U', (U \oplus U')^\perp)$ is a Witt decomposition for $E$. By Proposition 14.27(1), the block $A$ in the matrix of $\varphi$ is the identity matrix.
The following proposition shows that every subspace $U$ of $E$ can be embedded into a nondegenerate subspace.
Proposition 14.30. Let $\varphi$ be an $\epsilon$-Hermitian form on $E$ which is nondegenerate and satisfies property (T). For any subspace $U$ of $E$ of finite dimension, if we write
\[ U = V \overset{\perp}{\oplus} W, \]
for some orthogonal complement $W$ of $V = \mathrm{rad}(U)$, and if we let $r = \dim(\mathrm{rad}(U))$, then there exists a totally isotropic subspace $V'$ of dimension $r$ such that $V \cap V' = (0)$, and $(V \oplus V') \overset{\perp}{\oplus} W = V' \oplus U$ is nondegenerate.
We leave the second statement about extending f as an exercise (use the fact that f (U ) =
(U W ) D and E = S N , so E = S (U W ) D.
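\[ \begin{pmatrix} 0 & I_r & 0 \\ I_r & 0 & 0 \\ 0 & 0 & P \end{pmatrix}, \]
where either $n = 2r$ and $P$ does not occur, or $n > 2r$ and $P$ is a definite symmetric matrix.
(c) if $\varphi$ is $\epsilon$-Hermitian (the involutive automorphism $\lambda \mapsto \overline{\lambda}$ is not the identity), then $\varphi$ is represented by a matrix of the form
\[ \begin{pmatrix} 0 & I_r & 0 \\ \epsilon I_r & 0 & 0 \\ 0 & 0 & P \end{pmatrix}, \]
where either $n = 2r$ and $P$ does not occur, or $n > 2r$ and $P$ is a definite matrix such that $P^* = \epsilon P$.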
Proof. Part (1) follows from Theorem 14.31. By Proposition 14.28, we obtain a totally isotropic subspace $U'$ of dimension $r$ such that $U \cap U' = (0)$. By applying Theorem 14.31 to $U_1 = U$ and $U_2 = U'$, we get $U = W = (0)$, which proves (2). Part (3) is an immediate consequence of (2).
As a consequence of Theorem 14.32, we make the following definition.
Definition 14.15. Let $E$ be a vector space of finite dimension $n$, and let $\varphi$ be an $\epsilon$-Hermitian form on $E$ which is nondegenerate and satisfies property (T). The index (or Witt index) $\nu$ of $\varphi$ is the common dimension of all totally isotropic maximal subspaces of $E$. We have $2\nu \leq n$.
Neutral forms only exist if $n$ is even, in which case $\nu = n/2$. Forms of index $\nu = 0$ have no nonzero isotropic vectors. When $K = \mathbb{R}$, this is satisfied by positive definite or negative definite symmetric forms. When $K = \mathbb{C}$, this is satisfied by positive definite or negative definite Hermitian forms. The vector space of a neutral Hermitian form ($\epsilon = +1$) is an Artinian space, and the vector space of a neutral alternating form is a hyperbolic space.
If the field K is algebraically closed, we can describe all nondegenerate quadratic forms.
Proposition 14.33. If $K$ is algebraically closed and $E$ has dimension $n$, then for every nondegenerate quadratic form $\Phi$, there is a basis $(e_1, \ldots, e_n)$ such that $\Phi$ is given by
\[ \Phi\Big(\sum_{i=1}^n x_i e_i\Big) = \begin{cases} \sum_{i=1}^m x_i x_{m+i} & \text{if } n = 2m \\ \sum_{i=1}^m x_i x_{m+i} + x_{2m+1}^2 & \text{if } n = 2m + 1. \end{cases} \]
Proof. We work with the polar form $\varphi$ of $\Phi$. Let $U_1$ and $U_2$ be some totally isotropic subspaces such that $U_1 \cap U_2 = (0)$ given by Theorem 14.32, and let $q$ be their common dimension. Then, $W = U = (0)$. Since we can pick bases $(e_1, \ldots, e_q)$ in $U_1$ and $(e_{q+1}, \ldots, e_{2q})$ in $U_2$ such that $\varphi(e_i, e_{j+q}) = \delta_{ij}$, for $i, j = 1, \ldots, q$, it suffices to prove that $\dim(D) \leq 1$. If $x, y \in D$ with $x \neq 0$, from the identity
\[ \Phi(y - \lambda x) = \Phi(y) - \lambda\varphi(x, y) + \lambda^2\Phi(x) \]
and the fact that $\Phi(x) \neq 0$ since $x \in D$ and $x \neq 0$, we see that the equation $\Phi(y - \lambda x) = 0$ has at least one solution $\lambda$ (because $K$ is algebraically closed). Since $\Phi(z) \neq 0$ for every nonzero $z \in D$, we get $y = \lambda x$, and thus $\dim(D) \leq 1$, as claimed.
We also have the following proposition which has applications in number theory.
Proposition 14.34. If $\Phi$ is any nondegenerate quadratic form such that there is some nonzero vector $x \in E$ with $\Phi(x) = 0$, then for every $\alpha \in K$, there is some $y \in E$ such that $\Phi(y) = \alpha$.
The proof is left as an exercise. We now turn to the Witt extension theorem.
14.7
Witt's Theorem
Witt's theorem was referred to as a "scandal" by Emil Artin. What he meant by this is that one had to wait until 1936 (Witt [115]) to formulate and prove a theorem at once so simple in its statement and underlying concepts, and so useful in various domains (geometry, arithmetic of quadratic forms).
Besides Witt's original proof (Witt [115]), Chevalley's proof [22] seems to be the best proof that applies to the symmetric as well as the skew-symmetric case. The proof in Bourbaki [13] is based on Chevalley's proof, and so are a number of other proofs. This is the one we follow (slightly reorganized). In the symmetric case, Serre's exposition is hard to beat (see Serre [97], Chapter IV).
Theorem 14.35. (Witt, 1936) Let $E$ and $E'$ be two finite-dimensional spaces respectively
equipped with two nondegenerate $\epsilon$-Hermitian forms $\varphi$ and $\varphi'$ satisfying condition (T), and
assume that there is an isometry between $(E, \varphi)$ and $(E', \varphi')$. For any subspace $U$ of $E$,
every injective metric linear map $f$ from $U$ into $E'$ extends to an isometry from $E$ to $E'$.
Proof. Since $(E, \varphi)$ and $(E', \varphi')$ are isometric, we may assume that $E' = E$ and $\varphi' = \varphi$ (if
$h \colon E \to E'$ is an isometry, then $h^{-1} \circ f$ is an injective metric map from $U$ into $E$. The
details are left to the reader). We begin with the following observation. If $U_1$ and $U_2$ are
two subspaces of $E$ such that $U_1 \cap U_2 = (0)$ and if we have metric linear maps $f_1 \colon U_1 \to E$
and $f_2 \colon U_2 \to E$ such that
$$\varphi(f_1(u_1), f_2(u_2)) = \varphi(u_1, u_2) \quad \text{for } u_i \in U_i \ (i = 1, 2), \qquad (*)$$
then the linear map $f \colon U_1 \oplus U_2 \to E$ given by $f(u_1 + u_2) = f_1(u_1) + f_2(u_2)$ extends $f_1$ and
$f_2$ and is metric. Indeed, since $f_1$ and $f_2$ are metric and using $(*)$, we have
$$\begin{aligned} \varphi(f_1(u_1) + f_2(u_2), f_1(v_1) + f_2(v_2)) &= \varphi(f_1(u_1), f_1(v_1)) + \varphi(f_1(u_1), f_2(v_2)) \\ &\quad + \varphi(f_2(u_2), f_1(v_1)) + \varphi(f_2(u_2), f_2(v_2)) \\ &= \varphi(u_1, v_1) + \varphi(u_1, v_2) + \varphi(u_2, v_1) + \varphi(u_2, v_2) \\ &= \varphi(u_1 + u_2, v_1 + v_2). \end{aligned}$$
Furthermore, if $f_1$ and $f_2$ are injective, then so is $f$.
We now proceed by induction on the dimension $r$ of $U$. The case $r = 0$ is trivial. For
the induction step, $r \ge 1$ so $U \ne (0)$, and let $H$ be any hyperplane in $U$. Let $f \colon U \to E$
be an injective metric linear map. By the induction hypothesis, the restriction $f_0$ of $f$ to $H$
extends to an isometry $g_0$ of $E$. If $g_0$ extends $f$, we are done. Otherwise, $H$ is the subspace
of elements of $U$ left fixed by $g_0^{-1} \circ f$. If the theorem holds in this situation, namely the
¹ Curiously, some references to Witt's paper claim its date of publication to be 1936, but others say 1937.
The answer to this mystery is that Volume 176 of Crelle's Journal was published in four issues. The cover
page of volume 176 mentions the year 1937, but Witt's paper is dated May 1936. This is not the only paper
of Witt appearing in this volume!
()
In this case, formula $(**)$ shows that $f(U)$ is not contained in $D^{\perp}$ (check this!). Consequently,
$$U \cap D^{\perp} = f(U) \cap D^{\perp} = H.$$
We can pick $V$ to be any supplement of $H$ in $D^{\perp}$, and the above formula shows that $V \cap U =
V \cap f(U) = (0)$. Since $U \oplus V$ contains the hyperplane $D^{\perp}$ (since $D^{\perp} = H \oplus V$ and $H \subseteq U$),
and $U \oplus V \ne D^{\perp}$ (since $U$ is not contained in $D^{\perp}$ and $V \subseteq D^{\perp}$), we must have $E = U \oplus V$,
and as we showed as a consequence of hypothesis (V), $f$ can be extended to an isometry of
$U \oplus V = E$.
Case (b). $U \subseteq D^{\perp}$.
We show that case (b) can be reduced to the situation where $U = D^{\perp}$ and $f$ is an
isometry of $U$. For this, we show that there exists a subspace $V$ of $D^{\perp}$, such that
$$D^{\perp} = U \oplus V = f(U) \oplus V.$$
This is obvious if $U = f(U)$. Otherwise, let $x \in U$ with $x \notin H$, and let $y \in f(U)$ with $y \notin H$.
Since $f(H) = H$ (pointwise), $f$ is injective, and $H$ is a hyperplane in $U$, we have
$$U = H \oplus Kx, \qquad f(U) = H \oplus Ky.$$
We claim that $x + y \notin U$. Otherwise, since $y = x + y - x$, with $x + y, x \in U$ and since
$y \in f(U)$, we would have $y \in U \cap f(U) = H$, a contradiction. Similarly, $x + y \notin f(U)$. It
follows that
$$U + f(U) = U \oplus K(x + y) = f(U) \oplus K(x + y).$$
The above argument shows that we are reduced to the situation where $U = D^{\perp}$ is a
hyperplane in $E$ and $f$ is an isometry of $U$. If we pick any $v \notin U$, then $E = U \oplus Kv$, and if
we can find some $v_1 \in E$ such that
$$\varphi(f(u), v_1) = \varphi(u, v) \quad \text{for all } u \in U,$$
$$\varphi(v_1, v_1) = \varphi(v, v),$$
then as we showed at the beginning of the proof, we can extend $f$ to a metric map $g$ of
$U + Kv = E$ such that $g(v) = v_1$.
To find $v_1$, let us prove that for every $v \in E$, there is some $v' \in E$ such that
$$\varphi(f(u), v') = \varphi(u, v) \quad \text{for all } u \in U. \qquad (*{*}*)$$
This is because the linear form $u \mapsto \varphi(f^{-1}(u), v)$ ($u \in U$) is the restriction of a linear form
$\psi \in E^*$, and since $\varphi$ is nondegenerate, there is some (unique) $v' \in E$, such that
$$\psi(x) = \varphi(x, v') \quad \text{for all } x \in E,$$
Remark: Witt's cancellation theorem can be used to define an equivalence relation on Hermitian spaces and to define a group structure on these equivalence classes. This way, we
obtain the Witt group, but we will not discuss it here.
14.8 Symplectic Groups
In this section, we are dealing with a nondegenerate alternating form $\varphi$ on a vector space $E$
of dimension $n$. As we saw earlier, $n$ must be even, say $n = 2m$. By Theorem 14.23, there
is a direct sum decomposition of $E$ into pairwise orthogonal subspaces
$$E = W_1 \oplus \cdots \oplus W_m,$$
where each $W_i$ is a hyperbolic plane. Each $W_i$ has a basis $(u_i, v_i)$, with $\varphi(u_i, u_i) = \varphi(v_i, v_i) = 0$
and $\varphi(u_i, v_i) = 1$, for $i = 1, \ldots, m$. In the basis
$$(u_1, \ldots, u_m, v_1, \ldots, v_m),$$
$\varphi$ is represented by the matrix
$$J_{m,m} = \begin{pmatrix} 0 & I_m \\ -I_m & 0 \end{pmatrix}.$$
The symplectic group $\mathrm{Sp}(2m, K)$ is the group of isometries of $\varphi$. The maps in $\mathrm{Sp}(2m, K)$
are called symplectic maps. With respect to the above basis, $\mathrm{Sp}(2m, K)$ is the group of
$2m \times 2m$ matrices $A$ such that
$$A^{\top} J_{m,m} A = J_{m,m}.$$
Matrices satisfying the above identity are called symplectic matrices. In this section, we show
that $\mathrm{Sp}(2m, K)$ is a subgroup of $\mathrm{SL}(2m, K)$ (that is, $\det(A) = +1$ for all $A \in \mathrm{Sp}(2m, K)$),
and we show that $\mathrm{Sp}(2m, K)$ is generated by special linear maps called symplectic transvections.
First, we leave it as an easy exercise to show that $\mathrm{Sp}(2, K) = \mathrm{SL}(2, K)$. The reader
should also prove that $\mathrm{Sp}(2m, K)$ has a subgroup isomorphic to $\mathrm{GL}(m, K)$.
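The defining identity $A^{\top} J_{m,m} A = J_{m,m}$ is easy to test numerically. The following is a minimal sketch over $K = \mathbb{R}$, assuming NumPy; the helper names `J`, `is_symplectic`, and `embed_gl` are illustrative and not part of the text. The embedding $B \mapsto \mathrm{diag}(B, (B^{\top})^{-1})$ is one possible way (an assumption, not necessarily the intended solution of the exercise) to realize a copy of $\mathrm{GL}(m, \mathbb{R})$ inside $\mathrm{Sp}(2m, \mathbb{R})$.

```python
import numpy as np

def J(m):
    """Matrix J_{m,m} = [[0, I_m], [-I_m, 0]] of the alternating form."""
    I = np.eye(m)
    Z = np.zeros((m, m))
    return np.block([[Z, I], [-I, Z]])

def is_symplectic(A, tol=1e-9):
    """Check the identity A^T J A = J for a 2m x 2m matrix A."""
    m = A.shape[0] // 2
    return np.allclose(A.T @ J(m) @ A, J(m), atol=tol)

def embed_gl(B):
    """Embed an invertible m x m matrix B as diag(B, (B^T)^{-1})."""
    m = B.shape[0]
    Z = np.zeros((m, m))
    return np.block([[B, Z], [Z, np.linalg.inv(B.T)]])

# A 2 x 2 matrix of determinant 1 is symplectic (the case m = 1).
S = np.array([[2.0, 3.0], [1.0, 2.0]])
# An invertible 2 x 2 matrix embedded in Sp(4, R).
B = np.array([[1.0, 2.0], [0.0, 3.0]])
A = embed_gl(B)
print(is_symplectic(S), is_symplectic(A), np.linalg.det(A))
```

Floating-point arithmetic only approximates the identity, hence the tolerance in `is_symplectic`.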
Next we characterize the symplectic maps $f$ that leave fixed every vector in some given
hyperplane $H$, that is,
$$f(v) = v \quad \text{for all } v \in H.$$
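which shows that $f(u) - u \in H^{\perp}$ for all $u \in E$. Therefore, $f - \mathrm{id}$ is a linear map from $E$
into the line $H^{\perp}$ whose kernel contains $H$, which means that there is some nonzero vector
$w \in H^{\perp}$ and some linear form $\psi$ such that
$$f(u) = u + \psi(u)w, \quad u \in E.$$
Since $f$ is an isometry, we must have $\varphi(f(u), f(v)) = \varphi(u, v)$ for all $u, v \in E$, which means
that
$$\begin{aligned} \varphi(u, v) &= \varphi(f(u), f(v)) \\ &= \varphi(u + \psi(u)w, v + \psi(v)w) \\ &= \varphi(u, v) + \psi(u)\varphi(w, v) + \psi(v)\varphi(u, w) + \psi(u)\psi(v)\varphi(w, w) \\ &= \varphi(u, v) + \psi(u)\varphi(w, v) - \psi(v)\varphi(w, u), \end{aligned}$$
which yields
$$\psi(u)\varphi(w, v) = \psi(v)\varphi(w, u) \quad \text{for all } u, v \in E.$$
Since $\varphi$ is nondegenerate, we can pick some $v_0$ such that $\varphi(w, v_0) \ne 0$, and we get
$\psi(u)\varphi(w, v_0) = \psi(v_0)\varphi(w, u)$ for all $u \in E$; that is,
$$\psi(u) = \lambda\varphi(w, u) \quad \text{for all } u \in E,$$
for some $\lambda \in K$. Therefore, $f$ is of the form
$$f(u) = u + \lambda\varphi(w, u)w, \quad \text{for all } u \in E.$$
It is also clear that every $f$ of the above form is a symplectic map. If $\lambda = 0$, then $f = \mathrm{id}$.
Otherwise, if $\lambda \ne 0$, then $f(u) = u$ iff $\varphi(w, u) = 0$ iff $u \in (Kw)^{\perp} = H$, where $H$ is a
hyperplane. Thus, $f$ fixes every vector in the hyperplane $H$. Note that since $\varphi$ is alternating,
$\varphi(w, w) = 0$, which means that $w \in H$.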
In summary, we have characterized all the symplectic maps that leave every vector in
some hyperplane fixed, and we make the following definition.
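for all $u \in E$,
for some nonzero $w \in E$ and some $\lambda \in K$. If $\lambda \ne 0$, the subspace of vectors left fixed by $f$
is the hyperplane $H = (Kw)^{\perp}$. The map $f$ is also denoted $\tau_{u,\lambda}$.
Observe that
$$\tau_{u,\lambda} \circ \tau_{u,\mu} = \tau_{u,\lambda+\mu}$$
and $\tau_{u,\lambda} = \mathrm{id}$ iff $\lambda = 0$. The above shows that $\det(\tau_{u,\lambda}) = 1$, since when $\lambda \ne 0$, we have
$\tau_{u,\lambda} = (\tau_{u,\lambda/2})^2$.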
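The properties of transvections listed above can be checked on a concrete example. This is a minimal sketch over $K = \mathbb{R}$, assuming NumPy and the standard form $\varphi(x, y) = x^{\top} J_{m,m}\, y$; the function name `transvection` and the sample vector are illustrative choices. In matrix form, the map $u \mapsto u + \lambda\varphi(w, u)w$ is $I + \lambda\, w w^{\top} J_{m,m}$.

```python
import numpy as np

def J(m):
    """Matrix J_{m,m} = [[0, I_m], [-I_m, 0]]."""
    I = np.eye(m)
    Z = np.zeros((m, m))
    return np.block([[Z, I], [-I, Z]])

def transvection(w, lam):
    """Matrix of u -> u + lam * phi(w, u) * w, with phi(x, y) = x^T J y."""
    m = w.size // 2
    col = w.reshape(-1, 1)
    return np.eye(2 * m) + lam * (col @ (col.T @ J(m)))

w = np.array([1.0, 2.0, -1.0, 3.0])
T1 = transvection(w, 0.7)
T2 = transvection(w, -1.5)

# Composition law: tau_{w,lam} o tau_{w,mu} = tau_{w,lam+mu}.
print(np.allclose(T1 @ T2, transvection(w, 0.7 - 1.5)))
# Each transvection preserves the form and has determinant 1.
print(np.allclose(T1.T @ J(2) @ T1, J(2)), np.linalg.det(T1))
```

The composition law holds because the cross term carries the factor $\varphi(w, w) = 0$.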
Our next goal is to show that if u and v are any two nonzero vectors in E, then there is
a simple symplectic map f such that f (u) = v.
Proposition 14.38. Given any two nonzero vectors $u, v \in E$, there is a symplectic map
$f$ such that $f(u) = v$, and $f$ is either a symplectic transvection, or the composition of two
symplectic transvections.
Proof. There are two cases.
Case 1. $\varphi(u, v) \ne 0$.
In this case, $u \ne v$, since $\varphi(u, u) = 0$. Let us look for a symplectic transvection of the
form $\tau_{v-u,\lambda}$. We want
$$v = u + \lambda\varphi(v - u, u)(v - u) = u + \lambda\varphi(v, u)(v - u),$$
which yields
$$(\lambda\varphi(v, u) - 1)(v - u) = 0.$$
Since $\varphi(u, v) \ne 0$ and $\varphi(v, u) = -\varphi(u, v)$, we can pick $\lambda = \varphi(v, u)^{-1}$ and $\tau_{v-u,\lambda}$ maps $u$ to $v$.
Case 2. $\varphi(u, v) = 0$.
If $u = v$, use $\tau_{u,0} = \mathrm{id}$. Now, assume $u \ne v$. We claim that it is possible to pick some
$w \in E$ such that $\varphi(u, w) \ne 0$ and $\varphi(v, w) \ne 0$. Indeed, if $(Ku)^{\perp} = (Kv)^{\perp}$, then pick any
nonzero vector $w$ not in the hyperplane $(Ku)^{\perp}$. Otherwise, $(Ku)^{\perp}$ and $(Kv)^{\perp}$ are two
distinct hyperplanes, so neither is contained in the other (they have the same dimension),
so pick any nonzero vector $w_1$ such that $w_1 \in (Ku)^{\perp}$ and $w_1 \notin (Kv)^{\perp}$, and pick any
E = U U ,
and if is a transvection of H , then we can form the linear map idU whose restriction
Theorem 14.40. The symplectic group $\mathrm{Sp}(2m, K)$ is generated by the symplectic transvections. For every transvection $f \in \mathrm{Sp}(2m, K)$, we have $\det(f) = 1$.
Proof. Let $G$ be the subgroup of $\mathrm{Sp}(2m, K)$ generated by the transvections. We need to
prove that $G = \mathrm{Sp}(2m, K)$. Let $(u_1, v_1, \ldots, u_m, v_m)$ be a symplectic basis of $E$, and let
$f \in \mathrm{Sp}(2m, K)$ be any symplectic map. Then, $f$ maps $(u_1, v_1, \ldots, u_m, v_m)$ to another symplectic
basis $(u'_1, v'_1, \ldots, u'_m, v'_m)$. If we prove that there is some $g \in G$ such that $g(u_i) = u'_i$ and
$g(v_i) = v'_i$ for $i = 1, \ldots, m$, then $f = g$ and $G = \mathrm{Sp}(2m, K)$.
We use induction on $i$ to prove that there is some $g_i \in G$ so that $g_i$ maps $(u_1, v_1, \ldots, u_i, v_i)$
to $(u'_1, v'_1, \ldots, u'_i, v'_i)$.
The base case $i = 1$ follows from Proposition 14.39.
For the induction step, assume that we have some $g_i \in G$ mapping $(u_1, v_1, \ldots, u_i, v_i)$
to $(u'_1, v'_1, \ldots, u'_i, v'_i)$, and let $(u''_{i+1}, v''_{i+1}, \ldots, u''_m, v''_m)$ be the image of $(u_{i+1}, v_{i+1}, \ldots, u_m, v_m)$
by $g_i$. If $U$ is the subspace spanned by $(u'_1, v'_1, \ldots, u'_i, v'_i)$, then each hyperbolic plane
$W'_{i+k}$ given by $(u'_{i+k}, v'_{i+k})$ and each hyperbolic plane $W''_{i+k}$ given by $(u''_{i+k}, v''_{i+k})$ belongs to
$U^{\perp}$. Using the remark before the theorem and Proposition 14.39, we can find a transvection
$\tau$ mapping $W''_{i+1}$ onto $W'_{i+1}$ and leaving every vector in $U$ fixed. Then, $\tau \circ g_i$ maps
$(u_1, v_1, \ldots, u_{i+1}, v_{i+1})$ to $(u'_1, v'_1, \ldots, u'_{i+1}, v'_{i+1})$, establishing the induction step.
For the second statement, since we already proved that every transvection has a determinant equal to $+1$, this also holds for any composition of transvections in $G$, and since
$G = \mathrm{Sp}(2m, K)$, we are done.
It can also be shown that the center of $\mathrm{Sp}(2m, K)$ is reduced to the subgroup $\{\mathrm{id}, -\mathrm{id}\}$.
The projective symplectic group $\mathrm{PSp}(2m, K)$ is the quotient group $\mathrm{Sp}(2m, K)/\{\mathrm{id}, -\mathrm{id}\}$.
All symplectic projective groups are simple, except $\mathrm{PSp}(2, \mathbb{F}_2)$, $\mathrm{PSp}(2, \mathbb{F}_3)$, and $\mathrm{PSp}(4, \mathbb{F}_2)$;
see Grove [52].
The orders of the symplectic groups over finite fields can be determined. For details, see
Artin [3], Jacobson [59] and Grove [52].
An interesting property of symplectic spaces is that the determinant of a skew-symmetric
matrix B is the square of some polynomial Pf(B) called the Pfaffian; see Jacobson [59] and
Artin [3]. We leave considerations of the Pfaffian to the exercises.
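As a small illustration of the Pfaffian, here is a sketch for the $4 \times 4$ case only, assuming NumPy. It uses the standard explicit formula $\mathrm{Pf}(B) = b_{12}b_{34} - b_{13}b_{24} + b_{14}b_{23}$ for a skew-symmetric $4 \times 4$ matrix; the function name `pfaffian4` and the sample entries are illustrative.

```python
import numpy as np

def pfaffian4(B):
    """Pfaffian of a 4 x 4 skew-symmetric matrix (explicit formula)."""
    return B[0, 1] * B[2, 3] - B[0, 2] * B[1, 3] + B[0, 3] * B[1, 2]

# Build a skew-symmetric matrix from its strictly upper triangular part.
upper = np.array([[0.0, 1.0, 2.0, 3.0],
                  [0.0, 0.0, 4.0, 5.0],
                  [0.0, 0.0, 0.0, 6.0],
                  [0.0, 0.0, 0.0, 0.0]])
B = upper - upper.T

pf = pfaffian4(B)              # 1*6 - 2*5 + 3*4
det = np.linalg.det(B)
print(pf, det)                 # det(B) should equal pf**2 up to rounding
```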
We now take a look at the orthogonal groups.
14.9 Orthogonal Groups
In this section, we are dealing with a nondegenerate symmetric bilinear form $\varphi$ over a finite-dimensional vector space $E$ of dimension $n$ over a field of characteristic not equal to 2.
Recall that the orthogonal group $\mathrm{O}(\varphi)$ is the group of isometries of $\varphi$; that is, the group of
linear maps $f \colon E \to E$ such that
$$\varphi(f(u), f(v)) = \varphi(u, v) \quad \text{for all } u, v \in E.$$
The elements of $\mathrm{O}(\varphi)$ are also called orthogonal transformations. If $M$ is the matrix of $\varphi$ in
any basis, then a matrix $A$ represents an orthogonal transformation iff
$$A^{\top} M A = M.$$
Since $\varphi$ is nondegenerate, $M$ is invertible, so taking determinants gives $\det(A)^2 = 1$, that is, $\det(A) = \pm 1$. The subgroup
$$\mathrm{SO}(\varphi) = \{f \in \mathrm{O}(\varphi) \mid \det(f) = 1\}$$
is called the special orthogonal group (of $\varphi$), and its members are called rotations (or proper
orthogonal transformations). Isometries $f \in \mathrm{O}(\varphi)$ such that $\det(f) = -1$ are called improper
orthogonal transformations, or sometimes reversions.
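$$E = H \oplus H^{\perp}.$$
For any nonzero vector $u \in D = H^{\perp}$, consider the map $\tau_u$ given by
$$\tau_u(v) = v - 2\,\frac{\varphi(v, u)}{\varphi(u, u)}\,u \quad \text{for all } v \in E.$$
If we replace $u$ by $\lambda u$ with $\lambda \ne 0$, we get
$$\tau_{\lambda u}(v) = v - 2\,\frac{\varphi(v, \lambda u)}{\varphi(\lambda u, \lambda u)}\,\lambda u = v - 2\,\frac{\lambda^2\varphi(v, u)}{\lambda^2\varphi(u, u)}\,u = v - 2\,\frac{\varphi(v, u)}{\varphi(u, u)}\,u,$$
which shows that $\tau_u$ depends only on the line $D$, and thus only on the hyperplane $H$. Therefore,
denote by $\tau_H$ the linear map $\tau_u$ determined as above by any nonzero vector $u \in H^{\perp}$. Note
that if $v \in H$, then
$$\tau_H(v) = v,$$
and if $v \in D$, then
$$\tau_H(v) = -v.$$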
(v, u)
u for all v E
(u, u)
is an involutive isometry of E called the reflection through (or about) the hyperplane H.
Remarks:
1. It can be shown that if $f \in \mathrm{O}(\varphi)$ leaves every vector in some hyperplane $H$ fixed, then
either $f = \mathrm{id}$ or $f = \tau_H$; see Taylor [108] (Chapter 11). Thus, there is no analog to
symplectic transvections in the orthogonal group.
2. If $K = \mathbb{R}$ and $\varphi$ is the usual Euclidean inner product, the matrices corresponding to
hyperplane reflections are called Householder matrices.
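The reflection formula can be tried out for a form that is not positive definite. The sketch below assumes NumPy, takes $K = \mathbb{R}$, and represents $\varphi$ by a symmetric invertible matrix $M$ via $\varphi(x, y) = x^{\top} M y$; the name `reflection` and the sample data are illustrative. With $M = I$ the same function returns a Householder matrix.

```python
import numpy as np

def reflection(M, u):
    """Matrix of v -> v - 2 phi(v, u)/phi(u, u) u, with phi(x, y) = x^T M y."""
    q = u @ M @ u                      # phi(u, u); u must be nonisotropic
    if abs(q) < 1e-12:
        raise ValueError("u is isotropic: phi(u, u) = 0")
    # phi(v, u) = (M u)^T v because M is symmetric.
    return np.eye(u.size) - 2.0 * np.outer(u, M @ u) / q

M = np.diag([1.0, 1.0, -1.0])          # a nondegenerate indefinite form
u = np.array([1.0, 2.0, 1.0])          # phi(u, u) = 1 + 4 - 1 = 4
T = reflection(M, u)

print(np.allclose(T.T @ M @ T, M))     # isometry
print(np.allclose(T @ T, np.eye(3)))   # involution
print(np.allclose(T @ u, -u))          # u is sent to -u
print(np.linalg.det(T))                # improper: determinant -1
```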
Our goal is to prove that $\mathrm{O}(\varphi)$ is generated by the hyperplane reflections. The following
proposition is needed.
vu (u) = u 2
Next, assume that $n \ge 1$. Since $\varphi$ is nondegenerate, we know that there is some nonisotropic vector $u \in E$. There are three cases.
Case 1. $f(u) = u$.
erate, $E = H \oplus Ku$, and since $f(u) = u$, we must have $f(H) = H$. The restriction $f'$
of $f$ to $H$ is an isometry of $H$. By the induction hypothesis, we can write
$$f' = \tau'_k \circ \cdots \circ \tau'_1,$$
where $\tau'_i$ is some hyperplane reflection about a hyperplane $L_i$ in $H$, with $k \le 2n - 3$. We
can extend each $\tau'_i$ to a reflection $\tau_i$ about the hyperplane $L_i \oplus Ku$ so that $\tau_i(u) = u$, and
clearly,
$$f = \tau_k \circ \cdots \circ \tau_1.$$
Case 2. $f(u) = -u$.
where $\tau$ and the $\tau_i$ are hyperplane reflections, with $k \le 2n - 3$, and we get a total of $2n - 2$
hyperplane reflections.
Case 3. $f(u) \ne u$ and $f(u) \ne -u$.
where $\tau_{f(u)-u}$ and the $\tau_i$ are hyperplane reflections, with $k \le 2n - 3$, and we get a total of
$2n - 2$ hyperplane reflections.
If $f(u) + u$ is not isotropic, then the reflection $\tau_{f(u)+u}$ is such that
$$\tau_{f(u)+u}(u) = -f(u),$$
and since $\tau_{f(u)+u}^2 = \mathrm{id}$, if $g = \tau_{f(u)+u} \circ f$, then $g(u) = -u$, and we are back to case (2). We
obtain
$$f = \tau_{f(u)+u} \circ \tau \circ \tau_k \circ \cdots \circ \tau_1,$$
where $\tau$, $\tau_{f(u)+u}$ and the $\tau_i$ are hyperplane reflections, with $k \le 2n - 3$, and we get a total of
$2n - 1$ hyperplane reflections. This proves the induction step.
The bound $2n - 1$ is not optimal. The strong version of the Cartan-Dieudonné theorem
says that at most $n$ reflections are needed, but the proof is harder. Here is a neat proof due
to E. Artin (see [3], Chapter III, Section 4).
Case 1 remains unchanged. Case 2 is slightly different: $f(u) - u$ is not isotropic. Since
$\varphi(f(u) + u, f(u) - u) = 0$, as in the first subcase of Case (3), $g = \tau_{f(u)-u} \circ f$ is such that
$g(u) = u$ and we are back to Case 1. This only costs one more reflection.
The new (bad) case is:
Case 3. $f(u) - u$ is nonzero and isotropic for all nonisotropic $u \in E$. In this case, what
saves us is that $E$ must be an Artinian space of dimension $n = 2m$ and that $f$ must be a
rotation ($f \in \mathrm{SO}(\varphi)$).
If we accept this fact, then pick any hyperplane reflection $\tau$. Then, since $f$ is a rotation,
$g = \tau \circ f$ is not a rotation because $\det(g) = \det(\tau)\det(f) = (-1)(+1) = -1$, so $g$ cannot fall
under Case 3 (otherwise $g$ would be a rotation). We are back to Case 1 or Case 2, and using the induction
hypothesis, we get
$$g = \tau_k \circ \cdots \circ \tau_1,$$
where each $\tau_i$ is a hyperplane reflection, and $k \le 2m$. Since $g$ is not a rotation, $k$ is odd, so actually
$k \le 2m - 1$, and then $f = \tau \circ \tau_k \circ \cdots \circ \tau_1$, the composition of at most $k + 1 \le 2m$ hyperplane
reflections.
Therefore, except for the fact that in Case 3, $E$ must be an Artinian space of dimension
$n = 2m$ and that $f$ must be a rotation, which has not been proven yet, we proved the
following theorem.
Theorem 14.43. (Cartan-Dieudonné, strong form) Let $\varphi$ be a nondegenerate symmetric
bilinear form on a $K$-vector space $E$ of dimension $n$ ($\mathrm{char}(K) \ne 2$). Then, every isometry
$f \in \mathrm{O}(\varphi)$ with $f \ne \mathrm{id}$ is the composition of at most $n$ hyperplane reflections.
To fill in the gap, we need two propositions.
Proposition 14.44. Let $(E, \varphi)$ be an Artinian space of dimension $2m$, and let $U$ be a totally
isotropic subspace of dimension $m$. For any isometry $f \in \mathrm{O}(\varphi)$, if $f(U) = U$, then $\det(f) = 1$ ($f$ is a
rotation).
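Proof. We know that we can find a basis $(u_1, \ldots, u_m, v_1, \ldots, v_m)$ of $E$ such that $(u_1, \ldots, u_m)$ is a
basis of $U$ and $\varphi$ is represented by the matrix
$$A_{m,m} = \begin{pmatrix} 0 & I_m \\ I_m & 0 \end{pmatrix}.$$
Since $f(U) = U$, the matrix representing $f$ is of the form
$$A = \begin{pmatrix} B & C \\ 0 & D \end{pmatrix}.$$
The condition $A^{\top} A_{m,m} A = A_{m,m}$ translates as
$$\begin{pmatrix} B^{\top} & 0 \\ C^{\top} & D^{\top} \end{pmatrix} \begin{pmatrix} 0 & I_m \\ I_m & 0 \end{pmatrix} \begin{pmatrix} B & C \\ 0 & D \end{pmatrix} = \begin{pmatrix} 0 & I_m \\ I_m & 0 \end{pmatrix},$$
that is,
$$\begin{pmatrix} B^{\top} & 0 \\ C^{\top} & D^{\top} \end{pmatrix} \begin{pmatrix} 0 & D \\ B & C \end{pmatrix} = \begin{pmatrix} 0 & B^{\top}D \\ D^{\top}B & C^{\top}D + D^{\top}C \end{pmatrix} = \begin{pmatrix} 0 & I_m \\ I_m & 0 \end{pmatrix},$$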
0
=
,
1
1
0
1
E=U U ,
E = U = (V V 0 ) W,
where $V \oplus V' = \mathrm{Ar}_{2r} = W^{\perp}$ is an Artinian space. Any isometry $h$ of $E$ which is the identity
on $U$ and with $\det(h) = -1$ is the identity on $W$, and thus it must map $W^{\perp} = \mathrm{Ar}_{2r} = V \oplus V'$
into itself, and the restriction $h'$ of $h$ to $\mathrm{Ar}_{2r}$ has $\det(h') = -1$. However, $h'$ is the identity
on $V = \mathrm{rad}(U)$, a totally isotropic subspace of $\mathrm{Ar}_{2r}$ of dimension $r$, and by Proposition 14.44,
we have $\det(h') = +1$, a contradiction.
It can be shown that the center of $\mathrm{O}(\varphi)$ is $\{\mathrm{id}, -\mathrm{id}\}$. For further properties of orthogonal
groups, see Grove [52], Jacobson [59], Taylor [108], and Artin [3].
Chapter 15
Variational Approximation of Boundary-Value Problems; Introduction to the Finite Elements Method
15.1
Consider a beam of unit length supported at its ends in 0 and 1, stretched along its axis by
a force P , and subjected to a transverse load f (x)dx per element dx, as illustrated in Figure
15.1.
Figure 15.1: Vertical deflection of a beam
The bending moment $u(x)$ at the abscissa $x$ is the solution of a boundary problem (BP)
of the form
$$-u''(x) + c(x)u(x) = f(x), \quad 0 < x < 1,$$
$$u(0) = \alpha, \quad u(1) = \beta,$$
where $c(x) = P/(EI(x))$, where $E$ is the Young's modulus of the material of which the beam
is made and $I(x)$ is the principal moment of inertia of the cross-section of the beam at the
abscissa $x$, and with $\alpha = \beta = 0$. For this problem, we may assume that $c(x) \ge 0$ for all
$x \in [0, 1]$.
Remark: The vertical deflection $w(x)$ of the beam and the bending moment $u(x)$ are related
by the equation
$$u(x) = EI\,\frac{d^2 w}{dx^2}.$$
If we seek a solution $u \in C^2([0, 1])$, that is, a function whose first and second derivatives
exist and are continuous, then it can be shown that the problem has a unique solution
(assuming $c$ and $f$ to be continuous functions on $[0, 1]$).
Except in very rare situations, this problem has no closed-form solution, so we are led to
seek approximations of the solutions.
One way to proceed is to use the finite difference method, where we discretize the
problem and replace derivatives by differences. Another way is to use a variational approach.
In this approach, we follow a somewhat surprising path in which we come up with a so-called
weak formulation of the problem, by using a trick based on integrating by parts!
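For comparison with the variational approach developed next, here is a minimal sketch of the finite difference method for the boundary problem $-u'' + cu = f$ on $(0, 1)$ with $u(0) = u(1) = 0$, assuming NumPy. The second derivative is replaced by the standard centered difference $(u_{i-1} - 2u_i + u_{i+1})/h^2$; the function name `solve_bvp_fd` and the test data ($c = 0$, $f = 1$, whose exact solution is $x(1 - x)/2$) are illustrative choices.

```python
import numpy as np

def solve_bvp_fd(c, f, N):
    """Finite differences for -u'' + c u = f on (0,1), u(0) = u(1) = 0.

    c and f are vectorized functions; N is the number of interior nodes.
    Returns the interior nodes x_i = i h and the approximate values u_i.
    """
    h = 1.0 / (N + 1)
    x = h * np.arange(1, N + 1)
    off = -np.ones(N - 1) / h**2
    A = np.diag(2.0 / h**2 + c(x)) + np.diag(off, 1) + np.diag(off, -1)
    u = np.linalg.solve(A, f(x))
    return x, u

# c = 0 and f = 1: the exact solution is u(x) = x (1 - x) / 2.
x, u = solve_bvp_fd(lambda x: np.zeros_like(x), lambda x: np.ones_like(x), N=9)
exact = x * (1.0 - x) / 2.0
print(np.max(np.abs(u - exact)))
```

A dense matrix is used only for brevity; since the matrix is tridiagonal, a banded solver would be the usual choice for large $N$.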
First, let us observe that we can always assume that $\alpha = \beta = 0$, by looking for a solution
of the form $u(x) - (\alpha(1 - x) + \beta x)$. This turns out to be crucial when we integrate by parts.
There are a lot of subtle mathematical details involved to make what follows rigorous, but
here, we will take a relaxed approach.
First, we need to specify the space of weak solutions. This will be the vector space $V$ of
continuous functions $f$ on $[0, 1]$, with $f(0) = f(1) = 0$, and which are piecewise continuously
differentiable on $[0, 1]$. This means that there is a finite number of points $x_0, \ldots, x_{N+1}$ with
$x_0 = 0$ and $x_{N+1} = 1$, such that $f'(x_i)$ is undefined for $i = 1, \ldots, N$, but otherwise $f'$ is
defined and continuous on each interval $(x_i, x_{i+1})$ for $i = 0, \ldots, N$.¹ The space $V$ becomes a
Euclidean vector space under the inner product
$$\langle f, g \rangle_V = \int_0^1 (f(x)g(x) + f'(x)g'(x))\,dx,$$
0<x<1
¹ We also assume that $f'(x)$ has a limit when $x$ tends to a boundary of $(x_i, x_{i+1})$.
()
$$-\int_0^1 u''(x)v(x)\,dx + \int_0^1 c(x)u(x)v(x)\,dx = \int_0^1 f(x)v(x)\,dx. \qquad (*)$$
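Now, the trick is to use integration by parts on the first term. Recall that
$$(u'v)' = u''v + u'v',$$
and to be careful about discontinuities, write
$$\int_0^1 u''(x)v(x)\,dx = \sum_{i=0}^{N} \int_{x_i}^{x_{i+1}} u''(x)v(x)\,dx.$$
On each interval, integration by parts gives
$$\begin{aligned} \int_{x_i}^{x_{i+1}} u''(x)v(x)\,dx &= [u'(x)v(x)]_{x=x_i}^{x=x_{i+1}} - \int_{x_i}^{x_{i+1}} u'(x)v'(x)\,dx \\ &= u'(x_{i+1})v(x_{i+1}) - u'(x_i)v(x_i) - \int_{x_i}^{x_{i+1}} u'(x)v'(x)\,dx. \end{aligned}$$
It follows that
$$\begin{aligned} \int_0^1 u''(x)v(x)\,dx &= \sum_{i=0}^{N} \int_{x_i}^{x_{i+1}} u''(x)v(x)\,dx \\ &= \sum_{i=0}^{N} \big(u'(x_{i+1})v(x_{i+1}) - u'(x_i)v(x_i)\big) - \sum_{i=0}^{N} \int_{x_i}^{x_{i+1}} u'(x)v'(x)\,dx \\ &= u'(1)v(1) - u'(0)v(0) - \int_0^1 u'(x)v'(x)\,dx. \end{aligned}$$
However, the test function $v$ satisfies the boundary conditions $v(0) = v(1) = 0$ (recall that
$v \in V$), so we get
$$\int_0^1 u''(x)v(x)\,dx = -\int_0^1 u'(x)v'(x)\,dx.$$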
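Substituting into the integrated equation, we obtain
$$\int_0^1 u'(x)v'(x)\,dx + \int_0^1 c(x)u(x)v(x)\,dx = \int_0^1 f(x)v(x)\,dx,$$
or
$$\int_0^1 (u'v' + cuv)\,dx = \int_0^1 fv\,dx, \quad \text{for all } v \in V. \qquad (**)$$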
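$$a(u, v) = \int_0^1 (u'v' + cuv)\,dx, \quad \text{for all } u, v \in V,$$
$$\widetilde{f}(v) = \int_0^1 f(x)v(x)\,dx, \quad \text{for all } v \in V.$$
Then, $(**)$ becomes
$$a(u, v) = \widetilde{f}(v), \quad \text{for all } v \in V.$$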
for all $v \in V$,   (WF)
where
$$a(u, v) = \int_0^1 (u'v' + cuv)\,dx, \quad \text{for all } u, v \in V,$$
and
$$\widetilde{f}(v) = \int_0^1 f(x)v(x)\,dx, \quad \text{for all } v \in V.$$
(2) If $c(x) \ge 0$ for all $x \in [0, 1]$, then a function $u \in V$ is a solution of (WF) iff $u$
minimizes $J(v)$, that is,
$$J(u) = \inf_{v \in V} J(v),$$
with
$$J(v) = \frac{1}{2}\,a(v, v) - \widetilde{f}(v), \quad v \in V.$$
Furthermore, $u$ is unique.
for all v V.
(f 0 (x))2 dx,
for all v V.
and so
kvk2V
Z
=
0
1
2
since
Z
a(v, v) =
Z
0
((v 0 )2 + cv 2 )dx.
for all u, v V .
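$$J(u + v) - J(u) = \frac{1}{2}\,a(v, v) \ge \frac{1}{4}\,\|v\|_V^2 \ge 0 \quad \text{for all } v \in V,$$
since $a(u, v) - \widetilde{f}(v) = 0$ for all $v \in V$. Therefore, $J$ achieves a minimum for $u$.
We also have
$$J(u + \theta v) - J(u) = \theta\,(a(u, v) - \widetilde{f}(v)) + \frac{\theta^2}{2}\,a(v, v) \quad \text{for all } \theta \in \mathbb{R},$$
and if $J$ achieves a minimum for $u$, then $J(u + \theta v) - J(u) \ge 0$ for all $\theta \in \mathbb{R}$. A quadratic function of $\theta$
that vanishes at $\theta = 0$ and is nonnegative everywhere must have a zero linear coefficient. Consequently, if $J$ achieves a minimum for $u$,
then $a(u, v) = \widetilde{f}(v)$, which means that $u$ is a solution of (WF).
Finally, assuming that $c(x) \ge 0$, we claim that if $v \in V$ and $v \ne 0$, then $a(v, v) > 0$. This
is because if $a(v, v) = 0$, since
$$\|v\|_V^2 \le 2a(v, v) \quad \text{for all } v \in V,$$
we would have $\|v\|_V = 0$, that is, $v = 0$. Then, if $v \ne 0$, from
$$J(u + v) - J(u) = \frac{1}{2}\,a(v, v) \quad \text{for all } v \in V$$
we see that $J(u + v) > J(u)$, so the minimum $u$ is unique.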
Theorem 15.1 shows that every solution $u$ of our boundary problem (BP) is a solution
(in fact, unique) of the equation (WF).
The equation (WF) is called the weak form or variational equation associated with the
boundary problem. The idea of deriving these equations is due to Ritz and Galerkin.
Now, the natural question is whether the variational equation (WF) has a solution, and
whether this solution, if it exists, is also a solution of the boundary problem (it must belong
to $C^2([0, 1])$, which is far from obvious). Then, (BP) and (WF) would be equivalent.
Some fancy tools of analysis can be used to prove these assertions. The first difficulty is
that the vector space $V$ is not the right space of solutions, because in order for the variational
problem to have a solution, it must be complete. So, we must construct a completion of the
vector space $V$. This can be done and we get the Sobolev space $H_0^1(0, 1)$. Then, the question
of the regularity of the "weak solution" can also be tackled.
We will not worry about all this. Instead, let us find approximations of the problem (WF).
Instead of using the infinite-dimensional vector space $V$, we consider finite-dimensional subspaces $V_a$ (with $\dim(V_a) = n$) of $V$, and we consider the discrete problem:
Find a function $u^{(a)} \in V_a$, such that
$$a(u^{(a)}, v) = \widetilde{f}(v), \quad \text{for all } v \in V_a. \qquad \text{(DWF)}$$
Since $V_a$ is finite dimensional (of dimension $n$), let us pick a basis of functions $(w_1, \ldots, w_n)$
in $V_a$, so that every function $u \in V_a$ can be written as
$$u = u_1 w_1 + \cdots + u_n w_n.$$
Then, the equation (DWF) holds iff
$$a(u, w_j) = \widetilde{f}(w_j), \quad j = 1, \ldots, n,$$
that is, by bilinearity of $a$, iff
$$\sum_{i=1}^{n} a(w_i, w_j)\,u_i = \widetilde{f}(w_j), \quad 1 \le j \le n. \qquad (*)$$
Because $a(v, v) \ge \frac{1}{2}\|v\|_{V}^2$, the bilinear form $a$ is symmetric positive definite, and thus
the matrix $(a(w_i, w_j))$ is symmetric positive definite, and thus invertible. Therefore, (DWF)
has a solution given by a linear system!
From a practical point of view, we have to compute the integrals
$$a_{ij} = a(w_i, w_j) = \int_0^1 (w_i' w_j' + c\,w_i w_j)\,dx,$$
and
$$b_j = \widetilde{f}(w_j) = \int_0^1 f(x)w_j(x)\,dx.$$
However, if the basis functions are simple enough, this can be done by hand. Otherwise,
numerical integration methods must be used, but there are some good ones.
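To make the discrete problem concrete, here is a minimal Galerkin sketch, assuming NumPy. It uses the polynomial basis $w_k(x) = x^k(1 - x)$, $k = 1, \ldots, n$ (an illustrative choice of $V_a$, not one proposed in the text; each $w_k$ vanishes at 0 and 1) and computes the integrals $a_{ij}$ and $b_j$ by Gauss-Legendre quadrature. The names `integrate01` and `galerkin` are hypothetical helpers.

```python
import numpy as np
from numpy.polynomial import Polynomial
from numpy.polynomial.legendre import leggauss

def integrate01(g, npts=20):
    """Gauss-Legendre quadrature of a vectorized function g over [0, 1]."""
    t, wts = leggauss(npts)            # nodes and weights on [-1, 1]
    x = 0.5 * (t + 1.0)                # map the nodes to [0, 1]
    return 0.5 * np.sum(wts * g(x))

def galerkin(c, f, n):
    """Solve sum_i a(w_i, w_j) u_i = f~(w_j) for the basis w_k = x^k (1 - x)."""
    basis = [Polynomial([0.0, 1.0]) ** k * Polynomial([1.0, -1.0])
             for k in range(1, n + 1)]
    dbasis = [w.deriv() for w in basis]
    A = np.empty((n, n))
    b = np.empty(n)
    for j in range(n):
        b[j] = integrate01(lambda x: f(x) * basis[j](x))
        for i in range(n):
            A[i, j] = integrate01(
                lambda x: dbasis[i](x) * dbasis[j](x)
                + c(x) * basis[i](x) * basis[j](x))
    # A is symmetric, so solving A u = b matches the system indexed by j.
    return basis, np.linalg.solve(A, b)

# c = 0, f = 1: the exact solution x (1 - x) / 2 equals 0.5 * w_1.
basis, coeffs = galerkin(lambda x: 0.0 * x, lambda x: 1.0 + 0.0 * x, n=3)
print(coeffs)
```

Monomial-type bases become ill-conditioned as $n$ grows, which is one practical reason for preferring the piecewise polynomial bases with small support discussed below.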
Let us also remark that the proof of Theorem 15.1 also shows that the unique solution of
(DWF) is the unique minimizer of $J$ over all functions in $V_a$. It is also possible to compare
the approximate solution $u^{(a)} \in V_a$ with the exact solution $u \in V$.
Theorem 15.2. Suppose $c(x) \ge 0$ for all $x \in [0, 1]$. For every finite-dimensional subspace
$V_a$ ($\dim(V_a) = n$) of $V$, for every basis $(w_1, \ldots, w_n)$ of $V_a$, the following properties hold:
(1) There is a unique function $u^{(a)} \in V_a$ such that
$$a(u^{(a)}, v) = \widetilde{f}(v), \quad \text{for all } v \in V_a, \qquad \text{(DWF)}$$
We proved (1) and (2), but we will omit the proof of (3) which can be found in Ciarlet
[24].
Let us now give examples of the subspaces Va used in practice. They usually consist of
piecewise polynomial functions.
Pick an integer $N \ge 1$ and subdivide $[0, 1]$ into $N + 1$ intervals $[x_i, x_{i+1}]$, where
$$x_i = hi, \quad h = \frac{1}{N + 1}, \quad i = 0, \ldots, N + 1.$$
There are various ways to prove this. One way is to use the Bernstein basis, because
the $k$th derivative of a polynomial is given by a formula in terms of its control points. For
example, for $m = 1$, every degree 3 polynomial can be written as
$$P(x) = (1 - x)^3 b_0 + 3(1 - x)^2 x\, b_1 + 3(1 - x)x^2\, b_2 + x^3 b_3,$$
with $b_0, b_1, b_2, b_3 \in \mathbb{R}$, and we showed that
$$P'(0) = 3(b_1 - b_0), \qquad P'(1) = 3(b_3 - b_2).$$
Given $P(0)$ and $P(1)$, we determine $b_0$ and $b_3$, and from $P'(0)$ and $P'(1)$, we determine $b_1$
and $b_2$.
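The cubic case above amounts to solving four equations in closed form: $b_0 = P(0)$, $b_3 = P(1)$, $b_1 = b_0 + P'(0)/3$, and $b_2 = b_3 - P'(1)/3$. Here is a short sketch in plain Python; the function names and the sample data are illustrative.

```python
def cubic_control_points(p0, p1, d0, d1):
    """Control points of the cubic with P(0)=p0, P(1)=p1, P'(0)=d0, P'(1)=d1."""
    b0 = p0
    b3 = p1
    b1 = b0 + d0 / 3.0      # from P'(0) = 3 (b1 - b0)
    b2 = b3 - d1 / 3.0      # from P'(1) = 3 (b3 - b2)
    return b0, b1, b2, b3

def bernstein_cubic(b, x):
    """Evaluate the cubic with control points b = (b0, b1, b2, b3) at x."""
    b0, b1, b2, b3 = b
    return ((1 - x) ** 3 * b0 + 3 * (1 - x) ** 2 * x * b1
            + 3 * (1 - x) * x ** 2 * b2 + x ** 3 * b3)

b = cubic_control_points(p0=1.0, p1=2.0, d0=-1.0, d1=3.0)
print(b, bernstein_cubic(b, 0.0), bernstein_cubic(b, 1.0))
```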
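In general, for a polynomial of degree $m$ written as
$$P(x) = \sum_{j=0}^{m} b_j B_j^m(x)$$
in terms of the Bernstein basis $(B_0^m(x), \ldots, B_m^m(x))$ with
$$B_j^m(x) = \binom{m}{j}(1 - x)^{m-j}x^j,$$
$$P^{(k)}(0) = m(m - 1)\cdots(m - k + 1)\left(\sum_{i=0}^{k} \binom{k}{i}(-1)^{k-i}\,b_i\right),$$
and, when the polynomial is parametrized over an interval $[r, s]$ instead of $[0, 1]$, the derivative at the left endpoint is
$$P^{(k)}(r) = \frac{m(m - 1)\cdots(m - k + 1)}{(s - r)^k}\left(\sum_{i=0}^{k} \binom{k}{i}(-1)^{k-i}\,b_i\right),$$
with a similar formula for the derivative at the right endpoint. In our case, we set $r = x_i$, $s = x_{i+1}$.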
Now, if the $2m + 2$ values
$$P(0), P^{(1)}(0), \ldots, P^{(m)}(0), P(1), P^{(1)}(1), \ldots, P^{(m)}(1)$$
are given, we obtain a triangular system that determines uniquely the $2m + 2$ control points
$b_0, \ldots, b_{2m+1}$.
Recall that $C^m([0, 1])$ denotes the set of $C^m$ functions $f$ on $[0, 1]$, which means that
$f, f^{(1)}, \ldots, f^{(m)}$ exist and are continuous on $[0, 1]$.
We define the vector space $V_N^m$ as the subspace of $C^m([0, 1])$ consisting of all functions $f$
such that
1. $f(0) = f(1) = 0$.
2. The restriction of $f$ to $[x_i, x_{i+1}]$ is a polynomial of degree at most $2m + 1$, for $i = 0, \ldots, N$.
Observe that the functions in $V_N^0$ are the piecewise affine functions $f$ with $f(0) = f(1) = 0$;
an example is shown in Figure 15.2.
ih
$$\sum_{i=1}^{N} v(ih)\,w_i(x), \quad x \in [0, 1].$$
Figure 15.3: A basis "hat function" $w_i$ (supported on $[(i - 1)h, (i + 1)h]$, with peak at $ih$)
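In general, it is not hard to see that $V_N^m$ has dimension $(m + 1)N + 2m$: each of the $N$ interior nodes carries $m + 1$ free values $f, f^{(1)}, \ldots, f^{(m)}$, and each of the two endpoints carries $m$ free derivatives since $f(0) = f(1) = 0$. This gives $N$ for $m = 0$ and $2N + 2$ for $m = 1$.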
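Going back to our problem (the bending of a beam), assuming that $c$ and $f$ are constant
functions, it is not hard to show that the linear system $(*)$ becomes
$$\frac{1}{h}\begin{pmatrix} 2 + \frac{2c}{3}h^2 & -1 + \frac{c}{6}h^2 & & & \\ -1 + \frac{c}{6}h^2 & 2 + \frac{2c}{3}h^2 & -1 + \frac{c}{6}h^2 & & \\ & \ddots & \ddots & \ddots & \\ & & -1 + \frac{c}{6}h^2 & 2 + \frac{2c}{3}h^2 & -1 + \frac{c}{6}h^2 \\ & & & -1 + \frac{c}{6}h^2 & 2 + \frac{2c}{3}h^2 \end{pmatrix} \begin{pmatrix} u_1 \\ u_2 \\ \vdots \\ u_{N-1} \\ u_N \end{pmatrix} = h \begin{pmatrix} f \\ f \\ \vdots \\ f \\ f \end{pmatrix}.$$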
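The tridiagonal system for the hat-function basis is straightforward to assemble and solve. The following is a minimal sketch assuming NumPy and constant $c$ and $f$; the function name `beam_fem_system` is illustrative. For $c = 0$ and $f = 1$ the computed nodal values can be compared with the exact solution $x(1 - x)/2$.

```python
import numpy as np

def beam_fem_system(c, f, N):
    """Assemble (1/h) * tridiag(e, d, e) and the right-hand side h * f * 1.

    d = 2 + (2c/3) h^2 on the diagonal, e = -1 + (c/6) h^2 off the diagonal,
    for constant c and f and the N hat functions with nodes ih, h = 1/(N+1).
    """
    h = 1.0 / (N + 1)
    d = 2.0 + 2.0 * c * h**2 / 3.0
    e = -1.0 + c * h**2 / 6.0
    A = (d * np.eye(N) + e * (np.eye(N, k=1) + np.eye(N, k=-1))) / h
    b = h * f * np.ones(N)
    return A, b

A, b = beam_fem_system(c=0.0, f=1.0, N=9)
u = np.linalg.solve(A, b)
x = np.arange(1, 10) / 10.0
print(np.max(np.abs(u - x * (1.0 - x) / 2.0)))
```

The matrix is symmetric positive definite, as predicted by the positive definiteness of the bilinear form $a$.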
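We can also find a basis of $2N + 2$ cubic functions for $V_N^1$ consisting of functions with
small support. This basis consists of the $N$ functions $w_i^0$ and of the $N + 2$ functions $w_j^1$
$$w_i^0(x) = \begin{cases} \dfrac{1}{h^3}\,(x - (i - 1)h)^2\,((2i + 1)h - 2x), & (i - 1)h \le x \le ih, \\[2mm] \dfrac{1}{h^3}\,((i + 1)h - x)^2\,(2x - (2i - 1)h), & ih \le x \le (i + 1)h, \end{cases}$$
and
$$w_j^1(x) = \begin{cases} -\dfrac{1}{h^2}\,(jh - x)(x - (j - 1)h)^2, & (j - 1)h \le x \le jh, \\[2mm] \dfrac{1}{h^2}\,((j + 1)h - x)^2\,(x - jh), & jh \le x \le (j + 1)h, \end{cases}$$
$$\sum_{i=1}^{N} v(ih)\,w_i^0(x) + \sum_{j=0}^{N+1} v'(jh)\,w_j^1(x), \quad x \in [0, 1].$$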
we find that if $c = 0$, the matrix $A$ of the system $(*)$ is tridiagonal by blocks, where the blocks
are $2 \times 2$, $2 \times 1$, or $1 \times 2$ matrices, and with single entries in the top left and bottom right
corner. A different order of the basis vectors would mess up the tridiagonal block structure
of $A$. We leave the details as an exercise.
Let us now take a quick look at a two-dimensional problem, the bending of an elastic
membrane.
[Figure: the basis functions $w_i^0$, $w_0^1$, $w_j^1$, and $w_{N+1}^1$, plotted over $[0, 1]$ with nodes $ih$ and $jh$]
15.2
Consider an elastic membrane attached to a round contour whose projection on the $(x_1, x_2)$-plane is the boundary $\Gamma$ of an open, connected, bounded region $\Omega$ in the $(x_1, x_2)$-plane, as
illustrated in Figure 15.5. In other words, we view the membrane as a surface consisting of
the set of points $(x, z)$ given by an equation of the form
$$z = u(x),$$
with $x = (x_1, x_2) \in \Omega$, where $u \colon \Omega \to \mathbb{R}$ is some sufficiently regular function, and we think
of $u(x)$ as the vertical displacement of this membrane.
We assume that this membrane is under the action of a vertical force $\tau f(x)dx$ per surface
element in the horizontal plane (where $\tau$ is the tension of the membrane). The problem is
[Figure 15.5: the membrane, with labels $f(x)dx$, $g(y)$, $u(x)$, $x_1$, $x_2$, $dx$, $x$, $y$]
x ,
where $g \colon \Gamma \to \mathbb{R}$ represents the height of the contour of the membrane. We are looking for
a function $u$ in $C^2(\Omega) \cap C^1(\overline{\Omega})$. The operator $\Delta$ is the Laplacian, and it is given by
$$\Delta u(x) = \frac{\partial^2 u}{\partial x_1^2}(x) + \frac{\partial^2 u}{\partial x_2^2}(x).$$
This is an example of a boundary problem, since the solution $u$ of the PDE must satisfy the
condition $u(x) = g(x)$ on the boundary of the domain $\Omega$. The above equation is known as
Poisson's equation, and when $f = 0$ as Laplace's equation.
It can be proved that if the data $f$, $g$ and $\Gamma$ are sufficiently smooth, then the problem has
a unique solution.
To get a weak formulation of the problem, first we have to make the boundary condition
homogeneous, which means that $g(x) = 0$ on $\Gamma$. It turns out that $g$ can be extended to the
whole of $\overline{\Omega}$ as some sufficiently smooth function $\widehat{h}$, so we can look for a solution of the form
$u - \widehat{h}$, but for simplicity, let us assume that the contour of $\Omega$ lies in a plane parallel to the
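(where $n$ denotes the outward pointing unit normal to the surface). Because $v = 0$ on $\Gamma$, the
integral $\int_{\Gamma}$ drops out, and we get an equation of the form
$$a(u, v) = \widetilde{f}(v) \quad \text{for all } v \in V,$$
where $a$ is the bilinear form given by
$$a(u, v) = \int_{\Omega} \left(\frac{\partial u}{\partial x_1}\frac{\partial v}{\partial x_1} + \frac{\partial u}{\partial x_2}\frac{\partial v}{\partial x_2}\right)dx$$
and $\widetilde{f}$ is the linear form given by
$$\widetilde{f}(v) = \int_{\Omega} fv\,dx.$$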
We get the same equation as in Section 15.1, but over a set of functions defined on a
two-dimensional domain. As before, we can choose a finite-dimensional subspace $V_a$ of $V$
and consider the discrete problem with respect to $V_a$. Again, if we pick a basis $(w_1, \ldots, w_n)$
of $V_a$, a vector $u = u_1 w_1 + \cdots + u_n w_n$ is a solution of the Weak Formulation of our problem
iff $u = (u_1, \ldots, u_n)$ is a solution of the linear system
$$Au = b,$$
with $A = (a(w_i, w_j))$ and $b = (\widetilde{f}(w_j))$. However, the integrals that give the entries in $A$ and
$b$ are much more complicated.
An approach to deal with this problem is the method of finite elements. The idea is
to also discretize the boundary curve $\Gamma$. If we assume that $\Gamma$ is a polygonal line, then we
can triangulate the domain $\Omega$, and then we consider spaces of functions which are piecewise
defined on the triangles of the triangulation of $\Omega$. The simplest functions are piecewise affine
and look like tents erected above groups of triangles. Again, we can define base functions
with small support, so that the matrix $A$ is tridiagonal by blocks.
The finite element method is a vast subject and it is presented in many books of various
degrees of difficulty and obscurity. Let us simply state three important requirements of the
finite element method:
1. Good triangulations must be found. This in itself is a vast research topic. Delaunay
triangulations are good candidates.
2. Good spaces of functions must be found; typically piecewise polynomials and splines.
3. Good bases consisting of functions with small support must be found, so that integrals
can be easily computed and sparse banded matrices arise.
We now consider boundary problems where the solution varies with time.
15.3
Consider a homogeneous string (or rope) of constant cross-section, of length $L$, and stretched
(in a vertical plane) between its two ends which are assumed to be fixed and located along
the $x$-axis at $x = 0$ and at $x = L$.
with $c = \sqrt{\tau/\rho}$, where $\rho$ is the linear density of the string, known as the one-dimensional
wave equation.
Furthermore, the initial shape of the string is known at $t = 0$, as well as the distribution
of the initial velocities along the string; in other words, there are two functions $u_{i,0}$ and $u_{i,1}$
such that
$$u(x, 0) = u_{i,0}(x), \quad 0 \le x \le L,$$
$$\frac{\partial u}{\partial t}(x, 0) = u_{i,1}(x), \quad 0 \le x \le L.$$
For example, if the string is simply released from its given starting position, we have $u_{i,1} = 0$.
Lastly, because the ends of the string are fixed, we must have
$$u(0, t) = u(L, t) = 0, \quad t \ge 0.$$
$$\frac{1}{c^2}\frac{\partial^2 u}{\partial t^2}(x, t) - \frac{\partial^2 u}{\partial x^2}(x, t) = 0,$$
Let us try our trick of multiplying by a test function $v$ depending only on $x$, $C^1$ on $[0, L]$,
and such that $v(0) = v(L) = 0$, and integrate by parts. We get the equation
$$\int_0^L \frac{\partial^2 u}{\partial t^2}(x, t)v(x)\,dx - c^2 \int_0^L \frac{\partial^2 u}{\partial x^2}(x, t)v(x)\,dx = 0.$$
For the first term, we get
$$\begin{aligned} \int_0^L \frac{\partial^2 u}{\partial t^2}(x, t)v(x)\,dx &= \int_0^L \frac{\partial^2}{\partial t^2}[u(x, t)v(x)]\,dx \\ &= \frac{d^2}{dt^2}\int_0^L u(x, t)v(x)\,dx \\ &= \frac{d^2}{dt^2}\langle u, v \rangle, \end{aligned}$$
where $\langle u, v \rangle$ is the inner product in $L^2([0, L])$. The fact that it is legitimate to move $\partial^2/\partial t^2$
outside of the integral needs to be justified rigorously, but we won't do it here.
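For the second term, we get
$$-\int_0^L \frac{\partial^2 u}{\partial x^2}(x, t)\,v(x)\,dx = -\left[\frac{\partial u}{\partial x}(x, t)\,v(x)\right]_{x=0}^{x=L} + \int_0^L \frac{\partial u}{\partial x}(x, t)\,\frac{dv}{dx}(x)\,dx,$$
and because $v \in V$, we have $v(0) = v(L) = 0$, so we obtain
$$-\int_0^L \frac{\partial^2 u}{\partial x^2}(x, t)\,v(x)\,dx = \int_0^L \frac{\partial u}{\partial x}(x, t)\,\frac{dv}{dx}(x)\,dx.$$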
Our integrated equation becomes
$$\frac{d^2}{dt^2}\langle u, v\rangle + c^2 \int_0^L \frac{\partial u}{\partial x}(x, t)\,\frac{dv}{dx}(x)\,dx = 0, \quad \text{for all } v \in V \text{ and all } t \ge 0.$$
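$$\frac{d^2}{dt^2}\langle u, v\rangle + a(u, v) = 0, \quad \text{for all } v \in V \text{ and all } t \ge 0,$$
$$u(x, 0) = u_{i,0}(x), \quad 0 \le x \le L \quad \text{(initial condition)},$$
$$\frac{\partial u}{\partial t}(x, 0) = u_{i,1}(x), \quad 0 \le x \le L \quad \text{(initial condition)}.$$
for all $v \in V$
(Poincaré's inequality), which shows that $a$ is positive definite on $V$. The above method is
known as the method of Rayleigh-Ritz.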
A study of the above equation requires some sophisticated tools of analysis which go
far beyond the scope of these notes. Let us just say that there is a countable sequence of
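solutions with separated variables of the form
$$u_k^{(1)} = \sin\left(\frac{k\pi x}{L}\right)\cos\left(\frac{k\pi c t}{L}\right), \quad u_k^{(2)} = \sin\left(\frac{k\pi x}{L}\right)\sin\left(\frac{k\pi c t}{L}\right), \quad k \in \mathbb{N}_+,$$
called modes (or normal modes). Complete solutions of the problem are series obtained by
combining the normal modes, and they are of the form
$$u(x, t) = \sum_{k=1}^{\infty} \sin\left(\frac{k\pi x}{L}\right)\left(A_k \cos\left(\frac{k\pi c t}{L}\right) + B_k \sin\left(\frac{k\pi c t}{L}\right)\right),$$
where the coefficients $A_k$, $B_k$ are determined from the Fourier series of $u_{i,0}$ and $u_{i,1}$.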
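As a numerical sanity check, the following Python sketch evaluates the two families of normal modes and estimates the residual $(1/c^2)\,\partial^2 u/\partial t^2 - \partial^2 u/\partial x^2$ by central differences. The function names `mode` and `wave_residual`, the sample values `L = 1` and `c = 2`, and the step `h` are illustrative assumptions, not part of the notes.

```python
import math

def mode(k, x, t, L=1.0, c=2.0, kind=1):
    """Normal mode u_k^(1) (kind=1) or u_k^(2) (kind=2) of a string fixed at 0 and L."""
    space = math.sin(k * math.pi * x / L)
    phase = k * math.pi * c * t / L
    return space * (math.cos(phase) if kind == 1 else math.sin(phase))

def wave_residual(k, x, t, L=1.0, c=2.0, kind=1, h=1e-4):
    """Central-difference estimate of (1/c^2) u_tt - u_xx at the point (x, t)."""
    u = lambda xx, tt: mode(k, xx, tt, L, c, kind)
    u_tt = (u(x, t + h) - 2 * u(x, t) + u(x, t - h)) / h**2
    u_xx = (u(x + h, t) - 2 * u(x, t) + u(x - h, t)) / h**2
    return u_tt / c**2 - u_xx

# The residual should be close to zero (up to discretization error),
# and the mode should vanish at both ends of the string.
residual = wave_residual(3, 0.3, 0.7)
left_end = mode(3, 0.0, 0.7)
right_end = mode(3, 1.0, 0.7)
```

The residual is only approximately zero because the second derivatives are replaced by finite differences; a smaller `h` reduces the truncation error until rounding error takes over.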
We now consider discrete approximations of our problem. As before, consider a finite
dimensional subspace $V_a$ of $V$ and assume that we have approximations $u_{a,0}$ and $u_{a,1}$ of $u_{i,0}$
and $u_{i,1}$. If we pick a basis $(w_1, \ldots, w_n)$ of $V_a$, then we can write our unknown function
$u(x, t)$ as
$$u(x, t) = u_1(t)w_1 + \cdots + u_n(t)w_n,$$
where $u_1, \ldots, u_n$ are functions of $t$. Then, if we write $u = (u_1, \ldots, u_n)$, the discrete version
of our problem is
$$A\frac{d^2 u}{dt^2} + Ku = 0,$$
$$u(x, 0) = u_{a,0}(x), \quad 0 \le x \le L,$$
$$\frac{\partial u}{\partial t}(x, 0) = u_{a,1}(x), \quad 0 \le x \le L,$$
where $A = (\langle w_i, w_j\rangle)$ and $K = (a(w_i, w_j))$ are two symmetric matrices, called the mass
matrix and the stiffness matrix, respectively. In fact, because $a$ and the inner product
$\langle -, -\rangle$ are positive definite, these matrices are also positive definite.
We have made some progress since we now have a system of ODEs, and we can solve it
by analogy with the scalar case. So, we look for solutions of the form $U \cos \omega t$ (or $U \sin \omega t$),
where $U$ is an $n$-dimensional vector. We find that we should have
$$(K - \omega^2 A)U \cos \omega t = 0,$$
which implies that $\omega$ must be a solution of the equation
$$KU = \omega^2 AU.$$
Thus, we have to find some $\lambda$ such that
$$KU = \lambda AU,$$
a problem known as a generalized eigenvalue problem, since the ordinary eigenvalue problem
for $K$ is
$$KU = \lambda U.$$
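For a $2 \times 2$ instance the generalized eigenvalue problem can be solved by hand, since $\det(K - \lambda A) = 0$ is a quadratic in $\lambda$. The sketch below does exactly that; the function name and the sample matrices `K` and `A` are hypothetical and chosen only so that the roots come out as round numbers.

```python
import math

def generalized_eigenvalues_2x2(K, A):
    """Roots lambda of det(K - lambda*A) = 0 for symmetric 2x2 K and A (A positive definite)."""
    (k11, k12), (_, k22) = K
    (a11, a12), (_, a22) = A
    qa = a11 * a22 - a12 * a12                     # det(A), positive for A positive definite
    qb = -(k11 * a22 + k22 * a11 - 2 * k12 * a12)  # coefficient of lambda
    qc = k11 * k22 - k12 * k12                     # det(K)
    disc = math.sqrt(qb * qb - 4 * qa * qc)
    return sorted([(-qb - disc) / (2 * qa), (-qb + disc) / (2 * qa)])

# Hypothetical symmetric positive definite sample matrices.
K = [[2.0, -1.0], [-1.0, 2.0]]
A = [[4.0, 1.0], [1.0, 4.0]]
lambdas = generalized_eigenvalues_2x2(K, A)
omegas = [math.sqrt(lam) for lam in lambdas]   # frequencies, since lambda = omega^2
```

For these matrices the vectors $(1, 1)$ and $(1, -1)$ satisfy $KU = \lambda AU$ with $\lambda = 1/5$ and $\lambda = 1$ respectively, which can be checked by direct multiplication.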
$i = 1, \ldots, n$,
are linearly independent and are solutions of the generalized eigenvalue problem; that is,
$$KU_i = \omega_i^2 AU_i, \quad i = 1, \ldots, n.$$
More is true. Because the vectors $Y_1, \ldots, Y_n$ are orthonormal, and because $Y_i = L^\top U_i$,
from
$$(Y_i)^\top Y_j = \delta_{ij},$$
we get
$$(U_i)^\top LL^\top U_j = \delta_{ij}, \quad 1 \le i, j \le n,$$
and since $A = LL^\top$, this yields
$$(U_i)^\top AU_j = \delta_{ij}, \quad 1 \le i, j \le n.$$
$$\sum_{k=1}^{n} U_i^k w_k.$$
As a final step, let us look again for a solution of our discrete weak formulation of the
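problem, this time expressing the unknown solution $u(x, t)$ over the modal basis $(U^1, \ldots, U^n)$,
say
$$u = \sum_{j=1}^{n} \tilde{u}_j(t)U^j,$$
where each $\tilde{u}_j$ is a function of $t$. Because
$$u = \sum_{j=1}^{n} \tilde{u}_j(t)U^j = \sum_{j=1}^{n} \tilde{u}_j(t)\left(\sum_{k=1}^{n} U_j^k w_k\right) = \sum_{k=1}^{n}\left(\sum_{j=1}^{n} \tilde{u}_j(t)U_j^k\right)w_k,$$
if we write $u = (u_1, \ldots, u_n)$ with $u_k = \sum_{j=1}^{n} \tilde{u}_j(t)U_j^k$ for $k = 1, \ldots, n$, we see that
$$u = \sum_{j=1}^{n} \tilde{u}_j U_j,$$
and since $KU_j = \omega_j^2 AU_j$ for $j = 1, \ldots, n$, the equation
$$A\frac{d^2 u}{dt^2} + Ku = 0$$
becomes
$$\sum_{j=1}^{n} [(\tilde{u}_j)'' + \omega_j^2 \tilde{u}_j]AU_j = 0.$$
Since $A$ is invertible and since $(U_1, \ldots, U_n)$ are linearly independent, the vectors $(AU_1,
\ldots, AU_n)$ are linearly independent, and consequently we get the system of $n$ ODEs
$$(\tilde{u}_j)'' + \omega_j^2 \tilde{u}_j = 0, \quad 1 \le j \le n.$$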
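$0 \le x \le L$,
$0 \le x \le L$,
by expressing $u_{a,0}$ and $u_{a,1}$ on the modal basis $(U^1, \ldots, U^n)$. Furthermore, the modal functions $(U^1, \ldots, U^n)$ form a basis of $V_a$ that is orthonormal for the inner product $\langle -, -\rangle$ and orthogonal for $a$, since $(U_i)^\top AU_j = \delta_{ij}$ and $(U_i)^\top KU_j = \omega_j^2\delta_{ij}$.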
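If we use the vector space $V_N^0$ of piecewise
and $K$ are familiar! Indeed,
$$A = \frac{1}{h}\begin{pmatrix} 2 & -1 & 0 & 0 & 0 \\ -1 & 2 & -1 & 0 & 0 \\ \vdots & \ddots & \ddots & \ddots & \vdots \\ 0 & 0 & -1 & 2 & -1 \\ 0 & 0 & 0 & -1 & 2 \end{pmatrix}$$
and
$$K = \frac{h}{6}\begin{pmatrix} 4 & 1 & 0 & 0 & 0 \\ 1 & 4 & 1 & 0 & 0 \\ \vdots & \ddots & \ddots & \ddots & \vdots \\ 0 & 0 & 1 & 4 & 1 \\ 0 & 0 & 0 & 1 & 4 \end{pmatrix}.$$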
To conclude this section, let us discuss briefly the wave equation for an elastic membrane,
as described in Section 15.2. This time, we look for a function $u : \mathbb{R}_+ \times \Omega \to \mathbb{R}$ satisfying
the following conditions:
$$\frac{1}{c^2}\frac{\partial^2 u}{\partial t^2}(x, t) - \Delta u(x, t) = f(x, t), \quad x \in \Omega,\ t > 0,$$
$$u(x, t) = 0, \quad x \in \Gamma,\ t \ge 0 \quad \text{(boundary condition)},$$
$$u(x, 0) = u_{i,0}(x), \quad x \in \Omega \quad \text{(initial condition)},$$
$$\frac{\partial u}{\partial t}(x, 0) = u_{i,1}(x), \quad x \in \Omega \quad \text{(initial condition)}.$$
Assuming that $f = 0$, we look for solutions in the subspace $V$ of the Sobolev space $H_0^1(\Omega)$
consisting of functions $v$ such that $v = 0$ on $\Gamma$. Multiplying by a test function $v \in V$ and
using Green's first identity, we get the weak formulation of our problem:
and
$$\langle u, v\rangle = \int_\Omega uv\,dx.$$
$$A\frac{d^2 u}{dt^2} + Ku = 0,$$
$$u(x, 0) = u_{a,0}(x), \quad x \in \Omega,$$
$$\frac{\partial u}{\partial t}(x, 0) = u_{a,1}(x), \quad x \in \Omega,$$
where $A = (\langle w_i, w_j\rangle)$ and $K = (a(w_i, w_j))$ are two symmetric positive definite matrices.
In principle, the problem is solved, but it may be difficult to find good spaces $V_a$, good
triangulations of $\Omega$, and good bases of $V_a$, to be able to compute the matrices $A$ and $K$, and
to ensure that they are sparse.
Chapter 16
Singular Value Decomposition and
Polar Form
16.1
In this section we assume that we are dealing with a real Euclidean space $E$. Let $f : E \to E$
be any linear map. In general, it may not be possible to diagonalize f . We show that every
linear map can be diagonalized if we are willing to use two orthonormal bases. This is the
celebrated singular value decomposition (SVD). A close cousin of the SVD is the polar form
of a linear map, which shows how a linear map can be decomposed into its purely rotational
component (perhaps with a flip) and its purely stretching part.
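The key observation is that $f^* \circ f$ is self-adjoint, since
$$\langle (f^* \circ f)(u), v\rangle = \langle f(u), f(v)\rangle = \langle u, (f^* \circ f)(v)\rangle.$$
Similarly, $f \circ f^*$ is self-adjoint.
The fact that $f^* \circ f$ and $f \circ f^*$ are self-adjoint is very important, because it implies that
$f^* \circ f$ and $f \circ f^*$ can be diagonalized and that they have real eigenvalues. In fact, these
eigenvalues are all nonnegative. Indeed, if $u$ is an eigenvector of $f^* \circ f$ for the eigenvalue $\lambda$,
then
$$\langle (f^* \circ f)(u), u\rangle = \langle f(u), f(u)\rangle$$
and
$$\langle (f^* \circ f)(u), u\rangle = \lambda\langle u, u\rangle,$$
and thus
$$\lambda\langle u, u\rangle = \langle f(u), f(u)\rangle,$$
which implies that $\lambda \ge 0$, since $\langle -, -\rangle$ is positive definite. The same argument works
for $f \circ f^*$. The situation is even better, since we will show shortly that $f^* \circ f$ and $f \circ f^*$
have the same eigenvalues.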
Remark: Given any two linear maps $f : E \to F$ and $g : F \to E$, where $\dim(E) = n$ and
$\dim(F) = m$, it can be shown that
$$(-\lambda)^m \det(g \circ f - \lambda I_n) = (-\lambda)^n \det(f \circ g - \lambda I_m),$$
and thus $g \circ f$ and $f \circ g$ always have the same nonzero eigenvalues!
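The identity in the remark can be checked exactly on a small integer example. The sketch below uses hypothetical matrices `F` (of size $m \times n$ with $m = 2$, $n = 3$) and `G` (of size $n \times m$) and compares both sides of the identity for several integer values of $\lambda$; all helper names are illustrative.

```python
def matmul(X, Y):
    """Product of two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

def det(M):
    """Determinant by cofactor expansion along the first row (fine for tiny matrices)."""
    if len(M) == 1:
        return M[0][0]
    total = 0
    for j, a in enumerate(M[0]):
        minor = [row[:j] + row[j + 1:] for row in M[1:]]
        total += (-1) ** j * a * det(minor)
    return total

def shifted(M, lam):
    """Return M - lam * I for a square matrix M."""
    return [[M[i][j] - (lam if i == j else 0) for j in range(len(M))] for i in range(len(M))]

F = [[1, 2, 0], [0, 1, 3]]     # matrix of f : E -> F, here m = 2 rows and n = 3 columns
G = [[1, 0], [2, 1], [0, 1]]   # matrix of g : F -> E, n x m
m, n = 2, 3
GF, FG = matmul(G, F), matmul(F, G)   # GF is n x n, FG is m x m

def identity_holds(lam):
    """Check (-lam)^m det(GF - lam I_n) == (-lam)^n det(FG - lam I_m) in exact arithmetic."""
    return (-lam) ** m * det(shifted(GF, lam)) == (-lam) ** n * det(shifted(FG, lam))

checks = [identity_holds(lam) for lam in range(-5, 6)]
```

Because all entries are integers, the comparison is exact and no tolerance is needed.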
Definition 16.1. The square roots $\sigma_i > 0$ of the positive eigenvalues of $f^* \circ f$ (and $f \circ f^*$)
are called the singular values of $f$.
Definition 16.2. A self-adjoint linear map $f : E \to E$ whose eigenvalues are nonnegative is
called positive semidefinite (or positive), and if $f$ is also invertible, $f$ is said to be positive
definite. In the latter case, every eigenvalue of $f$ is strictly positive.
We just showed that $f^* \circ f$ and $f \circ f^*$ are positive semidefinite self-adjoint linear maps.
This fact has the remarkable consequence that every linear map has two important decompositions:
1. The polar form.
2. The singular value decomposition (SVD).
The wonderful thing about the singular value decomposition is that there exist two orthonormal bases (u1 , . . . , un ) and (v1 , . . . , vn ) such that, with respect to these bases, f is
a diagonal matrix consisting of the singular values of f , or 0. Thus, in some sense, f can
always be diagonalized with respect to two orthonormal bases. The SVD is also a useful tool
for solving overdetermined linear systems in the least squares sense and for data analysis, as
we show later on.
First, we show some useful relationships between the kernels and the images of $f$, $f^*$,
$f^* \circ f$, and $f \circ f^*$. Recall that if $f : E \to F$ is a linear map, the image $\mathrm{Im}\, f$ of $f$ is the
subspace $f(E)$ of $F$, and the rank of $f$ is the dimension $\dim(\mathrm{Im}\, f)$ of its image. Also recall
that (Theorem 4.11)
$$\dim(\mathrm{Ker}\, f) + \dim(\mathrm{Im}\, f) = \dim(E),$$
and that (Propositions 10.9 and 12.10) for every subspace $W$ of $E$,
$$\dim(W) + \dim(W^\perp) = \dim(E).$$
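Proposition 16.1. Given any two Euclidean spaces $E$ and $F$, where $E$ has dimension $n$
and $F$ has dimension $m$, for any linear map $f : E \to F$, we have
$$\mathrm{Ker}\, f = \mathrm{Ker}\,(f^* \circ f),$$
$$\mathrm{Ker}\, f^* = \mathrm{Ker}\,(f \circ f^*),$$
$$\mathrm{Ker}\, f = (\mathrm{Im}\, f^*)^\perp,$$
$$\mathrm{Ker}\, f^* = (\mathrm{Im}\, f)^\perp,$$
$$\dim(\mathrm{Im}\, f) = \dim(\mathrm{Im}\, f^*),$$
and $f$, $f^*$, $f^* \circ f$, and $f \circ f^*$ have the same rank.
Proof. To simplify the notation, we will denote the inner products on $E$ and $F$ by the same
symbol $\langle -, -\rangle$ (to avoid subscripts). If $f(u) = 0$, then $(f^* \circ f)(u) = f^*(f(u)) = f^*(0) = 0$,
and so $\mathrm{Ker}\, f \subseteq \mathrm{Ker}\,(f^* \circ f)$. By definition of $f^*$, we have
$$\langle f(u), f(u)\rangle = \langle (f^* \circ f)(u), u\rangle$$
for all $u \in E$. If $(f^* \circ f)(u) = 0$, since $\langle -, -\rangle$ is positive definite, we must have $f(u) = 0$,
and so $\mathrm{Ker}\,(f^* \circ f) \subseteq \mathrm{Ker}\, f$. Therefore,
$$\mathrm{Ker}\, f = \mathrm{Ker}\,(f^* \circ f).$$
The proof that $\mathrm{Ker}\, f^* = \mathrm{Ker}\,(f \circ f^*)$ is similar.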
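By definition of $f^*$, we have
$$\langle f(u), v\rangle = \langle u, f^*(v)\rangle \qquad (*)$$
for all $u \in E$ and all $v \in F$.
Let us explain why $\mathrm{Ker}\, f = (\mathrm{Im}\, f^*)^\perp$, the proof of the other equation being similar.
Because the inner product is positive definite, for every $u \in E$, we have
$u \in \mathrm{Ker}\, f$
iff $f(u) = 0$
iff $\langle f(u), v\rangle = 0$ for all $v$,
iff $\langle u, f^*(v)\rangle = 0$ for all $v$, by $(*)$,
iff $u \in (\mathrm{Im}\, f^*)^\perp$.
Since
$$\dim(\mathrm{Im}\, f) = n - \dim(\mathrm{Ker}\, f)$$
and
$$\dim(\mathrm{Im}\, f^*) = n - \dim((\mathrm{Im}\, f^*)^\perp),$$
from
$$\mathrm{Ker}\, f = (\mathrm{Im}\, f^*)^\perp$$
we also have
$$\dim(\mathrm{Ker}\, f) = \dim((\mathrm{Im}\, f^*)^\perp),$$
from which we obtain
$$\dim(\mathrm{Im}\, f) = \dim(\mathrm{Im}\, f^*).$$
Since
$$\dim(\mathrm{Ker}\,(f^* \circ f)) + \dim(\mathrm{Im}\,(f^* \circ f)) = \dim(E),$$
$\mathrm{Ker}\,(f^* \circ f) = \mathrm{Ker}\, f$ and $\mathrm{Ker}\, f = (\mathrm{Im}\, f^*)^\perp$, we get
$$\dim((\mathrm{Im}\, f^*)^\perp) + \dim(\mathrm{Im}\,(f^* \circ f)) = \dim(E).$$
Since
$$\dim((\mathrm{Im}\, f^*)^\perp) + \dim(\mathrm{Im}\, f^*) = \dim(E),$$
we deduce that
$$\dim(\mathrm{Im}\, f^*) = \dim(\mathrm{Im}\,(f^* \circ f)).$$
A similar proof shows that
$$\dim(\mathrm{Im}\, f) = \dim(\mathrm{Im}\,(f \circ f^*)).$$
Consequently, $f$, $f^*$, $f^* \circ f$, and $f \circ f^*$ have the same rank.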
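In matrix terms, the last statement says that $A$, $A^\top$, $A^\top A$, and $AA^\top$ have the same rank. The sketch below checks this on one hypothetical $4 \times 3$ matrix of rank 2, using exact rational arithmetic so that no rounding can blur the rank; the helper names are illustrative.

```python
from fractions import Fraction

def transpose(M):
    return [list(col) for col in zip(*M)]

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

def rank(M):
    """Rank by Gaussian elimination over the rationals (exact, no rounding)."""
    rows = [[Fraction(x) for x in row] for row in M]
    r = 0
    for col in range(len(rows[0])):
        # Find a row at or below position r with a nonzero entry in this column.
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
        if r == len(rows):
            break
    return r

# Hypothetical example: the second row is twice the first, and the fourth row
# is (first row) - (third row), so the rank is 2.
A = [[1, 2, 3], [2, 4, 6], [1, 0, 1], [0, 2, 2]]
At = transpose(A)
ranks = [rank(A), rank(At), rank(matmul(At, A)), rank(matmul(A, At))]
```

One example is of course not a proof; it only illustrates the statement of Proposition 16.1 on concrete data.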
We will now prove that every square matrix has an SVD. Stronger results can be obtained
if we first consider the polar form and then derive the SVD from it (there are uniqueness
properties of the polar decomposition). For our purposes, uniqueness results are not as
important so we content ourselves with existence results, whose proofs are simpler. Readers
interested in a more general treatment are referred to [44].
The early history of the singular value decomposition is described in a fascinating paper
by Stewart [101]. The SVD is due to Beltrami and Camille Jordan independently (1873,
1874). Gauss is the grandfather of all this, for his work on least squares (1809, 1823)
(but Legendre also published a paper on least squares!). Then come Sylvester, Schmidt, and
Hermann Weyl. Sylvester's work was apparently opaque. He gave a computational method
to find an SVD. Schmidt's work really has to do with integral equations and symmetric and
asymmetric kernels (1907). Weyl's work has to do with perturbation theory (1912). Autonne
came up with the polar decomposition (1902, 1915). Eckart and Young extended SVD to
rectangular matrices (1936, 1939).
Theorem 16.2. (Singular value decomposition) For every real $n \times n$ matrix $A$ there are two
orthogonal matrices $U$ and $V$ and a diagonal matrix $D$ such that $A = VDU^\top$, where $D$ is of
the form
$$D = \begin{pmatrix} \sigma_1 & & \ldots & \\ & \sigma_2 & \ldots & \\ \vdots & \vdots & \ddots & \vdots \\ & & \ldots & \sigma_n \end{pmatrix},$$
where $\sigma_1, \ldots, \sigma_r$ are the singular values of $A$, i.e., the (positive) square roots of the nonzero
eigenvalues of $A^\top A$ and $AA^\top$, and $\sigma_{r+1} = \cdots = \sigma_n = 0$. The columns of $U$ are eigenvectors
of $A^\top A$, and the columns of $V$ are eigenvectors of $AA^\top$.
Proof. Since $A^\top A$ is a symmetric matrix, in fact, a positive semidefinite matrix, there exists
an orthogonal matrix $U$ such that
$$A^\top A = UD^2U^\top,$$
with $D = \mathrm{diag}(\sigma_1, \ldots, \sigma_r, 0, \ldots, 0)$, where $\sigma_1^2, \ldots, \sigma_r^2$ are the nonzero eigenvalues of $A^\top A$,
and where $r$ is the rank of $A$; that is, $\sigma_1, \ldots, \sigma_r$ are the singular values of $A$. It follows that
$$U^\top A^\top AU = (AU)^\top AU = D^2,$$
and if we let $f_j$ be the $j$th column of $AU$ for $j = 1, \ldots, n$, then we have
$$\langle f_i, f_j\rangle = \sigma_i^2\delta_{ij}, \quad 1 \le i, j \le r$$
and
$$f_j = 0, \quad r + 1 \le j \le n.$$
If we define $(v_1, \ldots, v_r)$ by
$$v_j = \sigma_j^{-1}f_j, \quad 1 \le j \le r,$$
then we have
$$\langle v_i, v_j\rangle = \delta_{ij}, \quad 1 \le i, j \le r,$$
$1 \le i \le n$, $1 \le j \le r$
$$AA^\top = VD^2V^\top,$$
which shows that $A^\top A$ and $AA^\top$ have the same eigenvalues, that the columns of $U$ are
eigenvectors of $A^\top A$, and that the columns of $V$ are eigenvectors of $AA^\top$.
Theorem 16.2 suggests the following definition.
Definition 16.3. A triple $(U, D, V)$ such that $A = VDU^\top$, where $U$ and $V$ are orthogonal
and $D$ is a diagonal matrix whose entries are nonnegative (it is positive semidefinite) is called
a singular value decomposition (SVD) of $A$.
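The construction used in the proof of Theorem 16.2 (diagonalize $A^\top A$ to get $U$ and the $\sigma_j$, then set $v_j = \sigma_j^{-1} A u_j$) can be carried out in closed form for a $2 \times 2$ matrix. The following is a minimal sketch under two stated assumptions: $A$ is invertible, and $A^\top A$ is not already diagonal. The function names and the sample matrix are hypothetical, and this is not how production SVD routines work.

```python
import math

def svd_2x2(A):
    """SVD A = V D U^T of an invertible real 2x2 matrix, built as in the proof:
    diagonalize A^T A, then set v_j = A u_j / sigma_j.
    Returns (sigmas, columns of U, columns of V)."""
    (a11, a12), (a21, a22) = A
    # Entries of the symmetric matrix A^T A = [[p, q], [q, s]].
    p = a11 * a11 + a21 * a21
    q = a11 * a12 + a21 * a22
    s = a12 * a12 + a22 * a22
    if q == 0:
        raise ValueError("sketch assumes A^T A is not already diagonal")
    mean, radius = (p + s) / 2, math.hypot((p - s) / 2, q)
    sigmas, U_cols, V_cols = [], [], []
    for lam in (mean + radius, mean - radius):   # eigenvalues of A^T A, largest first
        norm = math.hypot(q, lam - p)
        u = (q / norm, (lam - p) / norm)         # unit eigenvector of A^T A for lam
        sigma = math.sqrt(lam)
        f = (a11 * u[0] + a12 * u[1], a21 * u[0] + a22 * u[1])   # column of AU
        sigmas.append(sigma)
        U_cols.append(u)
        V_cols.append((f[0] / sigma, f[1] / sigma))
    return sigmas, U_cols, V_cols

def reconstruct(sigmas, U_cols, V_cols):
    """Sum of the rank-one matrices sigma_j * v_j * u_j^T."""
    return [[sum(sg * v[i] * u[j] for sg, u, v in zip(sigmas, U_cols, V_cols))
             for j in range(2)] for i in range(2)]

A = [[3.0, 0.0], [4.0, 5.0]]
sigmas, U_cols, V_cols = svd_2x2(A)
A_rebuilt = reconstruct(sigmas, U_cols, V_cols)
```

For this sample matrix, $A^\top A$ has eigenvalues 45 and 5, so the singular values are $\sqrt{45}$ and $\sqrt{5}$, and their product equals $|\det A| = 15$.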
The proof of Theorem 16.2 shows that there are two orthonormal bases $(u_1, \ldots, u_n)$ and
$(v_1, \ldots, v_n)$, where $(u_1, \ldots, u_n)$ are eigenvectors of $A^\top A$ and $(v_1, \ldots, v_n)$ are eigenvectors
of $AA^\top$. Furthermore, $(u_1, \ldots, u_r)$ is an orthonormal basis of $\mathrm{Im}\, A^\top$, $(u_{r+1}, \ldots, u_n)$ is an
orthonormal basis of $\mathrm{Ker}\, A$, $(v_1, \ldots, v_r)$ is an orthonormal basis of $\mathrm{Im}\, A$, and $(v_{r+1}, \ldots, v_n)$
is an orthonormal basis of $\mathrm{Ker}\, A^\top$.
Using a remark made in Chapter 3, if we denote the columns of $U$ by $u_1, \ldots, u_n$ and the
columns of $V$ by $v_1, \ldots, v_n$, then we can write
$$A = VDU^\top = \sigma_1 v_1 u_1^\top + \cdots + \sigma_r v_r u_r^\top.$$
As a consequence, if $r$ is a lot smaller than $n$ (we write $r \ll n$), we see that $A$ can be
reconstructed from $U$ and $V$ using a much smaller number of elements. This idea will be
used to provide low-rank approximations of a matrix. The idea is to keep only the $k$ top
singular values for some suitable $k \le r$ for which $\sigma_{k+1}, \ldots, \sigma_r$ are very small.
Remarks:
(1) In Strang [105] the matrices $U, V, D$ are denoted by $U = Q_2$, $V = Q_1$, and $D = \Sigma$, and
an SVD is written as $A = Q_1 \Sigma Q_2^\top$. This has the advantage that $Q_1$ comes before $Q_2$ in
$A = Q_1 \Sigma Q_2^\top$. This has the disadvantage that $A$ maps the columns of $Q_2$ (eigenvectors
of $A^\top A$) to multiples of the columns of $Q_1$ (eigenvectors of $AA^\top$).
(2) Algorithms for actually computing the SVD of a matrix are presented in Golub and
Van Loan [49], Demmel [27], and Trefethen and Bau [110], where the SVD and its
applications are also discussed quite extensively.
(3) The SVD also applies to complex matrices. In this case, for every complex $n \times n$ matrix
$A$, there are two unitary matrices $U$ and $V$ and a diagonal matrix $D$ such that
$$A = VDU^*,$$
where $D$ is a diagonal matrix consisting of real entries $\sigma_1, \ldots, \sigma_n$, where $\sigma_1, \ldots, \sigma_r$ are
the singular values of $A$, i.e., the positive square roots of the nonzero eigenvalues of
$A^*A$ and $AA^*$, and $\sigma_{r+1} = \ldots = \sigma_n = 0$.
$$A = \frac{1}{2}\begin{pmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & 1 & -1 \\ 1 & -1 & -1 & 1 \end{pmatrix}$$
is both orthogonal and symmetric, and $A = RS$ with $R = A$ and $S = I$, which implies that
some of the eigenvalues of $A$ are negative.
Remark: In the complex case, the polar decomposition states that for every complex $n \times n$
matrix A, there is some unitary matrix U and some positive semidefinite Hermitian matrix
H such that
A = U H.
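$$A = \begin{pmatrix} 1 & 2 & 0 & 0 & \ldots & 0 & 0 \\ 0 & 1 & 2 & 0 & \ldots & 0 & 0 \\ 0 & 0 & 1 & 2 & \ldots & 0 & 0 \\ \vdots & \vdots & \ddots & \ddots & \ddots & \vdots & \vdots \\ 0 & 0 & \ldots & 0 & 1 & 2 & 0 \\ 0 & 0 & \ldots & 0 & 0 & 1 & 2 \\ 0 & 0 & \ldots & 0 & 0 & 0 & 1 \end{pmatrix}$$
has the eigenvalue 1 with multiplicity $n$, but its singular values, $\sigma_1 \ge \cdots \ge \sigma_n$, which are
the positive square roots of the eigenvalues of the matrix $B = A^\top A$ with
$$B = \begin{pmatrix} 1 & 2 & 0 & 0 & \ldots & 0 & 0 \\ 2 & 5 & 2 & 0 & \ldots & 0 & 0 \\ 0 & 2 & 5 & 2 & \ldots & 0 & 0 \\ \vdots & \vdots & \ddots & \ddots & \ddots & \vdots & \vdots \\ 0 & 0 & \ldots & 2 & 5 & 2 & 0 \\ 0 & 0 & \ldots & 0 & 2 & 5 & 2 \\ 0 & 0 & \ldots & 0 & 0 & 2 & 5 \end{pmatrix}$$
have a wide spread, since
$$\frac{\sigma_1}{\sigma_n} = \mathrm{cond}_2(A) \ge 2^{n-1}.$$
$k = 1, \ldots, n - 1$.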
16.2
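$$D = \begin{pmatrix} \sigma_1 & & \ldots & \\ & \sigma_2 & \ldots & \\ \vdots & \vdots & \ddots & \vdots \\ & & \ldots & \sigma_n \\ 0 & \vdots & \ldots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & \vdots & \ldots & 0 \end{pmatrix} \quad \text{or} \quad D = \begin{pmatrix} \sigma_1 & & \ldots & & 0 & \ldots & 0 \\ & \sigma_2 & \ldots & & 0 & \ldots & 0 \\ \vdots & \vdots & \ddots & \vdots & 0 & \vdots & 0 \\ & & \ldots & \sigma_m & 0 & \ldots & 0 \end{pmatrix},$$
where $\sigma_1, \ldots, \sigma_r$ are the singular values of $A$, i.e. the (positive) square roots of the nonzero
eigenvalues of $A^\top A$ and $AA^\top$, and $\sigma_{r+1} = \ldots = \sigma_p = 0$, where $p = \min(m, n)$. The columns
of $U$ are eigenvectors of $A^\top A$, and the columns of $V$ are eigenvectors of $AA^\top$.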
Proof. As in the proof of Theorem 16.2, since $A^\top A$ is symmetric positive semidefinite, there
exists an $n \times n$ orthogonal matrix $U$ such that
$$A^\top A = U\Sigma^2U^\top,$$
with $\Sigma = \mathrm{diag}(\sigma_1, \ldots, \sigma_r, 0, \ldots, 0)$, where $\sigma_1^2, \ldots, \sigma_r^2$ are the nonzero eigenvalues of $A^\top A$,
and where $r$ is the rank of $A$. Observe that $r \le \min\{m, n\}$, and $AU$ is an $m \times n$ matrix. It
follows that
$$U^\top A^\top AU = (AU)^\top AU = \Sigma^2,$$
and if we let $f_j \in \mathbb{R}^m$ be the $j$th column of $AU$ for $j = 1, \ldots, n$, then we have
$$\langle f_i, f_j\rangle = \sigma_i^2\delta_{ij}, \quad 1 \le i, j \le r$$
and
$$f_j = 0, \quad r + 1 \le j \le n.$$
If we define $(v_1, \ldots, v_r)$ by
$$v_j = \sigma_j^{-1}f_j, \quad 1 \le j \le r,$$
then we have
$$\langle v_i, v_j\rangle = \delta_{ij}, \quad 1 \le i, j \le r,$$
$1 \le i \le m$, $1 \le j \le r$
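$$D = \begin{pmatrix} \Sigma \\ 0_{m-n} \end{pmatrix} = \begin{pmatrix} \sigma_1 & & \ldots & \\ & \sigma_2 & \ldots & \\ \vdots & \vdots & \ddots & \vdots \\ & & \ldots & \sigma_n \\ 0 & \vdots & \ldots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & \vdots & \ldots & 0 \end{pmatrix},$$
else if $n \ge m$, then we let
$$D = \begin{pmatrix} \sigma_1 & & \ldots & & 0 & \ldots & 0 \\ & \sigma_2 & \ldots & & 0 & \ldots & 0 \\ \vdots & \vdots & \ddots & \vdots & 0 & \vdots & 0 \\ & & \ldots & \sigma_m & 0 & \ldots & 0 \end{pmatrix},$$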
which shows that $A^\top A$ and $AA^\top$ have the same nonzero eigenvalues, that the columns of $U$
are eigenvectors of $A^\top A$, and that the columns of $V$ are eigenvectors of $AA^\top$.
A triple $(U, D, V)$ such that $A = VDU^\top$ is called a singular value decomposition (SVD)
of $A$.
Even though the matrix $D$ is an $m \times n$ rectangular matrix, since its only nonzero entries
are on the descending diagonal, we still say that $D$ is a diagonal matrix.
If we view $A$ as the representation of a linear map $f : E \to F$, where $\dim(E) = n$ and
$\dim(F) = m$, the proof of Theorem 16.4 shows that there are two orthonormal bases $(u_1, \ldots,
u_n)$ and $(v_1, \ldots, v_m)$ for $E$ and $F$, respectively, where $(u_1, \ldots, u_n)$ are eigenvectors of $f^* \circ f$
and $(v_1, \ldots, v_m)$ are eigenvectors of $f \circ f^*$. Furthermore, $(u_1, \ldots, u_r)$ is an orthonormal basis
of $\mathrm{Im}\, f^*$, $(u_{r+1}, \ldots, u_n)$ is an orthonormal basis of $\mathrm{Ker}\, f$, $(v_1, \ldots, v_r)$ is an orthonormal basis
of $\mathrm{Im}\, f$, and $(v_{r+1}, \ldots, v_m)$ is an orthonormal basis of $\mathrm{Ker}\, f^*$.
The SVD of matrices can be used to define the pseudo-inverse of a rectangular matrix; we
will do so in Chapter 17. The reader may also consult Strang [105], Demmel [27], Trefethen
and Bau [110], and Golub and Van Loan [49].
One of the spectral theorems states that a symmetric matrix can be diagonalized by
an orthogonal matrix. There are several numerical methods to compute the eigenvalues
of a symmetric matrix A. One method consists in tridiagonalizing A, which means that
there exists some orthogonal matrix P and some symmetric tridiagonal matrix T such that
$A = PTP^\top$. In fact, this can be done using Householder transformations. It is then possible
to compute the eigenvalues of $T$ using a bisection method based on Sturm sequences. One can
also use Jacobi's method. For details, see Golub and Van Loan [49], Chapter 8, Demmel [27],
Trefethen and Bau [110], Lecture 26, or Ciarlet [24]. Computing the SVD of a matrix A is
more involved. Most methods begin by finding orthogonal matrices U and V and a bidiagonal
matrix $B$ such that $A = VBU^\top$. This can also be done using Householder transformations.
Observe that $B^\top B$ is symmetric tridiagonal. Thus, in principle, the previous method to
diagonalize a symmetric tridiagonal matrix can be applied. However, it is unwise to compute
$B^\top B$ explicitly, and more subtle methods are used for this last step. Again, see Golub and
Van Loan [49], Chapter 8, Demmel [27], and Trefethen and Bau [110], Lecture 31.
The polar form has applications in continuum mechanics. Indeed, in any deformation it
is important to separate stretching from rotation. This is exactly what QS achieves. The
orthogonal part Q corresponds to rotation (perhaps with an additional reflection), and the
symmetric matrix $S$ to stretching (or compression). The real eigenvalues $\sigma_1, \ldots, \sigma_r$ of $S$ are
the stretch factors (or compression factors) (see Marsden and Hughes [76]). The fact that
S can be diagonalized by an orthogonal matrix corresponds to a natural choice of axes, the
principal axes.
The SVD has applications to data compression, for instance in image processing. The
idea is to retain only singular values whose magnitudes are significant enough. The SVD
can also be used to determine the rank of a matrix when other methods such as Gaussian
elimination produce very small pivots. One of the main applications of the SVD is the
computation of the pseudo-inverse. Pseudo-inverses are the key to the solution of various
optimization problems, in particular the method of least squares. This topic is discussed in
the next chapter (Chapter 17). Applications of the material of this chapter can be found
in Strang [105, 104]; Ciarlet [24]; Golub and Van Loan [49], which contains many other
references; Demmel [27]; and Trefethen and Bau [110].
16.3
The singular values of a matrix can be used to define various norms on matrices which
have found recent applications in quantum information theory and in spectral graph theory.
Following Horn and Johnson [58] (Section 3.4) we can make the following definitions:
Definition 16.5. For any matrix $A \in M_{m,n}(\mathbb{C})$, let $q = \min\{m, n\}$, and if $\sigma_1 \ge \cdots \ge \sigma_q$ are
the singular values of $A$, for any $k$ with $1 \le k \le q$, let
$$N_k(A) = \sigma_1 + \cdots + \sigma_k,$$
called the Ky Fan $k$-norm of $A$.
More generally, for any $p \ge 1$ and any $k$ with $1 \le k \le q$, let
$$N_{k;p}(A) = (\sigma_1^p + \cdots + \sigma_k^p)^{1/p},$$
called the Ky Fan $p$-$k$-norm of $A$. When $k = q$, $N_{q;p}$ is also called the Schatten $p$-norm.
Observe that when $k = 1$, $N_1(A) = \sigma_1$, and the Ky Fan norm $N_1$ is simply the spectral
norm from Chapter 7, which is the subordinate matrix norm associated with the Euclidean
norm. When $k = q$, the Ky Fan norm $N_q$ is given by
$$N_q(A) = \sigma_1 + \cdots + \sigma_q = \mathrm{tr}((A^*A)^{1/2})$$
and is called the trace norm or nuclear norm. When $p = 2$ and $k = q$, the Ky Fan $N_{q;2}$ norm
is given by
$$N_{q;2}(A) = (\sigma_1^2 + \cdots + \sigma_q^2)^{1/2} = \sqrt{\mathrm{tr}(A^*A)} = \|A\|_F,$$
which is the Frobenius norm of $A$.
It can be shown that Nk and Nk;p are unitarily invariant norms, and that when m = n,
they are matrix norms; see Horn and Johnson [58] (Section 3.4, Corollary 3.4.4 and Problem
3).
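Given the singular values, the norms of Definition 16.5 are a one-line computation. The sketch below takes the singular values as input (it does not compute them) and recovers the spectral, nuclear, and Frobenius norms as special cases; the function name `ky_fan` and the sample values are illustrative.

```python
def ky_fan(sigmas, k, p=1):
    """Ky Fan p-k-norm computed from a list of singular values.
    With p = 1 this is the Ky Fan k-norm; with k = len(sigmas) it is the Schatten p-norm."""
    top = sorted(sigmas, reverse=True)[:k]   # the k largest singular values
    return sum(s ** p for s in top) ** (1.0 / p)

# Singular values of the diagonal matrix diag(3, 2, 1).
sigmas = [3.0, 2.0, 1.0]
spectral = ky_fan(sigmas, 1)         # N_1: largest singular value
nuclear = ky_fan(sigmas, 3)          # N_q: sum of all singular values
frobenius = ky_fan(sigmas, 3, p=2)   # N_{q;2}: square root of the sum of squares
```

For a diagonal matrix the Frobenius value can be cross-checked directly against the matrix entries: $\sqrt{3^2 + 2^2 + 1^2} = \sqrt{14}$.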
16.4
Summary
The main concepts and results of this chapter are listed below:
For any linear map $f : E \to E$ on a Euclidean space $E$, the maps $f^* \circ f$ and $f \circ f^*$ are
self-adjoint and positive semidefinite.
The singular values of a linear map.
Positive semidefinite and positive definite self-adjoint maps.
Relationships between $\mathrm{Im}\, f$, $\mathrm{Ker}\, f$, $\mathrm{Im}\, f^*$, and $\mathrm{Ker}\, f^*$.
The singular value decomposition theorem for square matrices (Theorem 16.2).
The SVD of a matrix.
The polar decomposition of a matrix.
The Weyl inequalities.
The singular value decomposition theorem for $m \times n$ matrices (Theorem 16.4).
Ky Fan k-norms, Ky Fan p-k-norms, Schatten p-norms.
Chapter 17
Applications of SVD and
Pseudo-inverses
Of all the principles that can be proposed for this purpose, I think there is none more
general, more exact, or easier to apply than the one we have used in the preceding
researches, which consists of making the sum of the squares of the errors a minimum.
By this means a kind of equilibrium is established among the errors which, preventing
the extremes from prevailing, is well suited to revealing the state of the system that
comes closest to the truth.
Legendre, 1805, Nouvelles Méthodes pour la détermination des Orbites des
Comètes
17.1
This chapter presents several applications of SVD. The first one is the pseudo-inverse, which
plays a crucial role in solving linear systems by the method of least squares. The second application is data compression. The third application is principal component analysis (PCA),
whose purpose is to identify patterns in data and understand the variance-covariance structure of the data. The fourth application is the best affine approximation of a set of data, a
problem closely related to PCA.
The method of least squares is a way of solving an overdetermined system of linear
equations
Ax = b,
i.e., a system in which $A$ is a rectangular $m \times n$ matrix with more equations than unknowns
(when m > n). Historically, the method of least squares was used by Gauss and Legendre
to solve problems in astronomy and geodesy. The method was first published by Legendre
in 1805 in a paper on methods for determining the orbits of comets. However, Gauss had
already used the method of least squares as early as 1801 to determine the orbit of the asteroid
Ceres, and he published a paper about it in 1810 after the discovery of the asteroid Pallas.
Incidentally, it is in that same paper that Gaussian elimination using pivots is introduced.
The reason why more equations than unknowns arise in such problems is that repeated
measurements are taken to minimize errors. This produces an overdetermined and often
inconsistent system of linear equations. For example, Gauss solved a system of eleven equations in six unknowns to determine the orbit of the asteroid Pallas. As a concrete illustration,
suppose that we observe the motion of a small object, assimilated to a point, in the plane.
From our observations, we suspect that this point moves along a straight line, say of equation
y = dx + c. Suppose that we observed the moving point at three different locations (x1 , y1 ),
(x2 , y2 ), and (x3 , y3 ). Then we should have
c + dx1 = y1 ,
c + dx2 = y2 ,
c + dx3 = y3 .
If there were no errors in our measurements, these equations would be compatible, and c
and d would be determined by only two of the equations. However, in the presence of errors,
the system may be inconsistent. Yet we would like to find c and d!
The idea of the method of least squares is to determine $(c, d)$ such that it minimizes the
sum of the squares of the errors, namely,
$$(c + dx_1 - y_1)^2 + (c + dx_2 - y_2)^2 + (c + dx_3 - y_3)^2.$$
In general, for an overdetermined $m \times n$ system $Ax = b$, what Gauss and Legendre discovered
is that there are solutions $x$ minimizing
$$\|Ax - b\|^2$$
(where $\|u\|^2 = u_1^2 + \cdots + u_n^2$, the square of the Euclidean norm of the vector $u = (u_1, \ldots, u_n)$),
and that these solutions are given by the square $n \times n$ system
$$A^\top Ax = A^\top b,$$
called the normal equations. Furthermore, when the columns of $A$ are linearly independent,
it turns out that $A^\top A$ is invertible, and so $x$ is unique and given by
$$x = (A^\top A)^{-1}A^\top b.$$
Note that $A^\top A$ is a symmetric matrix, one of the nice features of the normal equations of a
least squares problem. For instance, the normal equations for the above problem are
$$\begin{pmatrix} 3 & x_1 + x_2 + x_3 \\ x_1 + x_2 + x_3 & x_1^2 + x_2^2 + x_3^2 \end{pmatrix}\begin{pmatrix} c \\ d \end{pmatrix} = \begin{pmatrix} y_1 + y_2 + y_3 \\ x_1y_1 + x_2y_2 + x_3y_3 \end{pmatrix}.$$
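The $2 \times 2$ normal equations for the line $y = dx + c$ can be solved directly with Cramer's rule. The sketch below does this for an arbitrary number of observations; the function name `fit_line` and the three sample points are made up for illustration.

```python
def fit_line(points):
    """Least squares line y = d*x + c through the points, via the 2x2 normal equations
    [[m, sx], [sx, sxx]] (c, d)^T = (sy, sxy)^T, solved by Cramer's rule."""
    m = len(points)
    sx = sum(x for x, _ in points)
    sy = sum(y for _, y in points)
    sxx = sum(x * x for x, _ in points)
    sxy = sum(x * y for x, y in points)
    det = m * sxx - sx * sx      # nonzero as long as the x_i are not all equal
    c = (sy * sxx - sx * sxy) / det
    d = (m * sxy - sx * sy) / det
    return c, d

# Three hypothetical observations that do not lie on a common line.
points = [(0.0, 1.0), (1.0, 2.0), (2.0, 4.0)]
c, d = fit_line(points)

# Errors c + d*x_i - y_i; at the minimum they are orthogonal to both columns of A,
# that is, to the all-ones column and to the column (x_1, x_2, x_3).
errors = [c + d * x - y for x, y in points]
sum_errors = sum(errors)
sum_x_errors = sum(x * e for (x, _), e in zip(points, errors))
```

For these points the solution is $c = 5/6$ and $d = 3/2$, and the two orthogonality sums vanish up to rounding.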
In fact, given any real $m \times n$ matrix $A$, there is always a unique $x^+$ of minimum norm
that minimizes $\|Ax - b\|^2$, even when the columns of $A$ are linearly dependent. How do we
prove this, and how do we find $x^+$?
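and
vectors $\overrightarrow{py}$ and $\overrightarrow{bp}$ are orthogonal, which implies that
$$\|\overrightarrow{by}\|^2 = \|\overrightarrow{bp}\|^2 + \|\overrightarrow{py}\|^2.$$
Thus, $p$ is indeed the unique point in $U$ that minimizes the distance from $b$ to any point in
$U$.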
To show that there is a unique $x^+$ of minimum norm minimizing the (square) error
$\|Ax - b\|^2$, we use the fact that
$$\mathbb{R}^n = \mathrm{Ker}\, A \oplus (\mathrm{Ker}\, A)^\perp.$$
Indeed, every $x \in \mathbb{R}^n$ can be written uniquely as $x = u + v$, where $u \in \mathrm{Ker}\, A$ and $v \in
(\mathrm{Ker}\, A)^\perp$, and since $u$ and $v$ are orthogonal,
$$\|x\|^2 = \|u\|^2 + \|v\|^2.$$
Furthermore, since $u \in \mathrm{Ker}\, A$, we have $Au = 0$, and thus $Ax = p$ iff $Av = p$, which shows
that the solutions of $Ax = p$ for which $x$ has minimum norm must belong to $(\mathrm{Ker}\, A)^\perp$.
However, the restriction of $A$ to $(\mathrm{Ker}\, A)^\perp$ is injective. This is because if $Av_1 = Av_2$, where
$v_1, v_2 \in (\mathrm{Ker}\, A)^\perp$, then $A(v_2 - v_1) = 0$, which implies $v_2 - v_1 \in \mathrm{Ker}\, A$, and since $v_1, v_2 \in
(\mathrm{Ker}\, A)^\perp$, we also have $v_2 - v_1 \in (\mathrm{Ker}\, A)^\perp$, and consequently, $v_2 - v_1 = 0$. This shows that
there is a unique $x$ of minimum norm minimizing $\|Ax - b\|^2$, and that it must belong to
$(\mathrm{Ker}\, A)^\perp$.
The proof also shows that $x$ minimizes $\|Ax - b\|^2$ iff $\overrightarrow{pb} = b - Ax$ is orthogonal to $U$,
which can be expressed by saying that $b - Ax$ is orthogonal to every column of $A$. However,
this is equivalent to
$$A^\top(b - Ax) = 0, \quad \text{i.e.,} \quad A^\top Ax = A^\top b.$$
Finally, it turns out that the minimum norm least squares solution $x^+$ can be found in terms
of the pseudo-inverse $A^+$ of $A$, which is itself obtained from any SVD of $A$.
Actually, it seems that $A^+$ depends on the specific choice of $U$ and $V$ in an SVD $(U, D, V)$
for $A$, but the next theorem shows that this is not so.
Theorem 17.2. The least squares solution of smallest norm of the linear system $Ax = b$,
where $A$ is an $m \times n$ matrix, is given by
$$x^+ = A^+b = UD^+V^\top b.$$
Proof. First, assume that $A$ is a (rectangular) diagonal matrix $D$, as above. Then, since $x$
minimizes $\|Dx - b\|^2$ iff $Dx$ is the projection of $b$ onto the image subspace $F$ of $D$, it is fairly
obvious that $x^+ = D^+b$. Otherwise, we can write
$$A = VDU^\top,$$
where $U$ and $V$ are orthogonal. However, since $V$ is an isometry,
$$\|Ax - b\| = \|VDU^\top x - b\| = \|DU^\top x - V^\top b\|.$$
Letting $y = U^\top x$, we have $\|x\| = \|y\|$, since $U$ is an isometry, and since $U$ is surjective,
$\|Ax - b\|$ is minimized iff $\|Dy - V^\top b\|$ is minimized, and we have shown that the least
squares solution of smallest norm is
$$y^+ = D^+V^\top b.$$
Since $y = U^\top x$, with $\|x\| = \|y\|$, we get
$$x^+ = UD^+V^\top b = A^+b.$$
Thus, the pseudo-inverse provides the optimal solution to the least squares problem.
By Lemma 17.2 and Theorem 17.1, $A^+b$ is uniquely defined for every $b$, and thus $A^+$
depends only on $A$.
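The first step of the proof, the diagonal case, is easy to make concrete. The sketch below assumes the usual definition of $D^+$ (invert the nonzero diagonal entries of $D$ and leave the zero ones at zero); the function name and the sample data are illustrative.

```python
def diag_pinv_solve(sigmas, b, n):
    """Minimum-norm least squares solution x+ = D+ b of D x = b, where D is an
    m x n 'diagonal' matrix with diagonal entries sigmas (zeros allowed)
    and b has m components."""
    return [b[i] / sigmas[i] if i < len(sigmas) and sigmas[i] != 0 else 0.0
            for i in range(n)]

# D is 3 x 2 with diagonal (2, 0); the second and third components of b cannot
# be matched by any D x, so they only contribute to the error.
sigmas = [2.0, 0.0]
b = [4.0, 5.0, 6.0]
x_plus = diag_pinv_solve(sigmas, b, n=2)

# Squared error of x_plus, and of a competitor with the same D x but larger norm.
def squared_error(x):
    Dx = [sigmas[i] * x[i] if i < len(sigmas) else 0.0 for i in range(len(b))]
    return sum((dx - bi) ** 2 for dx, bi in zip(Dx, b))

competitor = [2.0, 7.0]
```

Any vector of the form $(2, t)$ gives the same error here, since the second diagonal entry of $D$ is zero; choosing $t = 0$ is what makes the norm of the solution minimal.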
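$$AA^+ = U\Sigma V^\top V\Sigma^+U^\top = U\Sigma\Sigma^+U^\top = U\begin{pmatrix} I_r & 0 \\ 0 & 0_{n-r} \end{pmatrix}U^\top$$
and
$$A^+A = V\Sigma^+U^\top U\Sigma V^\top = V\Sigma^+\Sigma V^\top = V\begin{pmatrix} I_r & 0 \\ 0 & 0_{n-r} \end{pmatrix}V^\top.$$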
We immediately get
$$(AA^+)^2 = AA^+,$$
$$(A^+A)^2 = A^+A,$$
so both $AA^+$ and $A^+A$ are orthogonal projections (since they are both symmetric). We
claim that $AA^+$ is the orthogonal projection onto the range of $A$ and $A^+A$ is the orthogonal
projection onto $\mathrm{Ker}(A)^\perp = \mathrm{Im}(A^\top)$, the range of $A^\top$.
Obviously, we have $\mathrm{range}(AA^+) \subseteq \mathrm{range}(A)$, and for any $y = Ax \in \mathrm{range}(A)$, since
$AA^+A = A$, we have
$$AA^+y = AA^+Ax = Ax = y,$$
so the image of $AA^+$ is indeed the range of $A$. It is also clear that $\mathrm{Ker}(A) \subseteq \mathrm{Ker}(A^+A)$, and
since $AA^+A = A$, we also have $\mathrm{Ker}(A^+A) \subseteq \mathrm{Ker}(A)$, and so
$$\mathrm{Ker}(A^+A) = \mathrm{Ker}(A).$$
Since $A^+A$ is Hermitian, $\mathrm{range}(A^+A) = \mathrm{Ker}(A^+A)^\perp = \mathrm{Ker}(A)^\perp$, as claimed.
It will also be useful to see that $\mathrm{range}(A) = \mathrm{range}(AA^+)$ consists of all vectors $y \in \mathbb{R}^n$
such that
$$U^\top y = \begin{pmatrix} z \\ 0 \end{pmatrix},$$
with $z \in \mathbb{R}^r$.
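Indeed, if $y = Ax$, then
$$U^\top y = U^\top Ax = U^\top U\Sigma V^\top x = \Sigma V^\top x = \begin{pmatrix} \Sigma_r & 0 \\ 0 & 0_{n-r} \end{pmatrix}V^\top x = \begin{pmatrix} z \\ 0 \end{pmatrix},$$
If $y = A^+Au$, then
$$y = A^+Au = V\begin{pmatrix} I_r & 0 \\ 0 & 0_{n-r} \end{pmatrix}V^\top u = V\begin{pmatrix} z \\ 0 \end{pmatrix},$$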
If $A$ is a normal matrix, which means that $AA^\top = A^\top A$, then there is an intimate
relationship between SVDs of $A$ and block diagonalizations of $A$. As a consequence, the
pseudo-inverse of a normal matrix $A$ can be obtained directly from a block diagonalization
of $A$.
If $A$ is a (real) normal matrix, then we know from Theorem 13.16 that $A$ can be block
diagonalized with respect to an orthogonal matrix $U$ as
$$A = U\Lambda U^\top,$$
where $\Lambda$ is the (real) block diagonal matrix
$$\Lambda = \mathrm{diag}(B_1, \ldots, B_n),$$
consisting either of $2 \times 2$ blocks of the form
$$B_j = \begin{pmatrix} \lambda_j & -\mu_j \\ \mu_j & \lambda_j \end{pmatrix}$$
with $\mu_j \ne 0$, or of one-dimensional blocks $B_k = (\lambda_k)$. Then we have the following proposition:
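Proposition 17.3. For any (real) normal matrix $A$ and any block diagonalization $A =
U\Lambda U^\top$ of $A$ as above, the pseudo-inverse of $A$ is given by
$$A^+ = U\Lambda^+U^\top,$$
where $\Lambda^+$ is the pseudo-inverse of $\Lambda$. Furthermore, if
$$\Lambda = \begin{pmatrix} \Lambda_r & 0 \\ 0 & 0 \end{pmatrix},$$
where $\Lambda_r$ has rank $r$, then
$$\Lambda^+ = \begin{pmatrix} \Lambda_r^{-1} & 0 \\ 0 & 0 \end{pmatrix}.$$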
Proof. Assume that $B_1, \ldots, B_p$ are $2 \times 2$ blocks and that $\lambda_{2p+1}, \ldots, \lambda_n$ are the scalar entries.
We know that the numbers $\lambda_j \pm i\mu_j$, and the $\lambda_{2p+k}$ are the eigenvalues of $A$. Let $\rho_{2j-1} =$
$1 \le j \le n$.
$$\Lambda^+ = \begin{pmatrix} \Lambda_r^{-1} & 0 \\ 0 & 0 \end{pmatrix}.$$
Therefore, the pseudo-inverse of a normal matrix can be computed directly from any block
diagonalization of $A$, as claimed.
The following properties, due to Penrose, characterize the pseudo-inverse of a matrix.
We have already proved that the pseudo-inverse satisfies these equations. For a proof of the
converse, see Kincaid and Cheney [63].
Lemma 17.4. Given any $m \times n$ matrix $A$ (real or complex), the pseudo-inverse $A^+$ of $A$ is
the unique $n \times m$ matrix satisfying the following properties:
$$AA^+A = A,$$
$$A^+AA^+ = A^+,$$
$$(AA^+)^\top = AA^+,$$
$$(A^+A)^\top = A^+A.$$
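The four Penrose properties can be verified exactly on a small example. The sketch below handles only the special case of a real matrix with two linearly independent columns, where the pseudo-inverse is $A^+ = (A^\top A)^{-1}A^\top$ (the formula appearing in the normal equations); it is not a general pseudo-inverse routine, and the helper names and the sample matrix are illustrative.

```python
from fractions import Fraction

def transpose(M):
    return [list(col) for col in zip(*M)]

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

def pinv_full_column_rank(A):
    """A+ = (A^T A)^{-1} A^T for an m x 2 matrix with linearly independent columns,
    computed in exact rational arithmetic."""
    At = transpose(A)
    (p, q), (_, s) = matmul(At, A)          # A^T A = [[p, q], [q, s]]
    det = Fraction(p * s - q * q)           # nonzero because the columns are independent
    inverse = [[s / det, -q / det], [-q / det, p / det]]
    return matmul(inverse, At)

A = [[1, 0], [1, 1], [1, 2]]   # hypothetical 3 x 2 matrix of rank 2
P = pinv_full_column_rank(A)   # candidate for A+

penrose = [
    matmul(matmul(A, P), A) == A,             # A A+ A = A
    matmul(matmul(P, A), P) == P,             # A+ A A+ = A+
    transpose(matmul(A, P)) == matmul(A, P),  # A A+ is symmetric
    transpose(matmul(P, A)) == matmul(P, A),  # A+ A is symmetric
]
```

Because the columns of this `A` are independent, $A^+A$ is the $2 \times 2$ identity, while $AA^+$ is the orthogonal projection onto the range of $A$ and is not the identity.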
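$$\begin{pmatrix} R_1 \\ 0_{m-n} \end{pmatrix}x = \begin{pmatrix} c \\ d \end{pmatrix},$$
where $R_1$ is an invertible $n \times n$ matrix (since $A$ has rank $n$), $c \in \mathbb{R}^n$, and $d \in \mathbb{R}^{m-n}$, and the
least squares solution of smallest norm is
$$x^+ = R_1^{-1}c.$$
Since $R_1$ is a triangular matrix, it is very easy to invert $R_1$.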
The method of least squares is one of the most effective tools of the mathematical sciences.
There are entire books devoted to it. Readers are advised to consult Strang [105], Golub and
Van Loan [49], Demmel [27], and Trefethen and Bau [110], where extensions and applications
of least squares (such as weighted least squares and recursive least squares) are described.
Golub and Van Loan [49] also contains a very extensive bibliography, including a list of
books on least squares.
17.2
Among the many applications of SVD, a very useful one is data compression, notably for
images. In order to make precise the notion of closeness of matrices, we review briefly the
notion of matrix norm. We assume that the reader is familiar with the concepts of vector
norm and matrix norm. The concept of a norm is defined in Chapter 7 and the reader
may want to review it before reading any further.
Given an $m \times n$ matrix $A$ of rank $r$, we would like to find a best approximation of $A$ by a
matrix $B$ of rank $k \le r$ (actually, $k < r$) so that $\|A - B\|_2$ (or $\|A - B\|_F$) is minimized.
470
Proposition 17.5. Let $A$ be an $m \times n$ matrix of rank $r$ and let $V D U^\top = A$ be an SVD for $A$. Write $u_i$ for the columns of $U$, $v_i$ for the columns of $V$, and $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_p$ for the singular values of $A$ ($p = \min(m, n)$). Then a matrix of rank $k < r$ closest to $A$ (in the $\|\ \|_2$ norm) is given by
$$A_k = \sum_{i=1}^{k} \sigma_i v_i u_i^\top = V \,\mathrm{diag}(\sigma_1, \ldots, \sigma_k)\, U^\top$$
and $\|A - A_k\|_2 = \sigma_{k+1}$.
Proof. By construction, $A_k$ has rank $k$, and we have
$$\|A - A_k\|_2 = \Big\| \sum_{i=k+1}^{p} \sigma_i v_i u_i^\top \Big\|_2 = \big\| V \,\mathrm{diag}(0, \ldots, 0, \sigma_{k+1}, \ldots, \sigma_p)\, U^\top \big\|_2 = \sigma_{k+1}.$$
It remains to show that $\|A - B\|_2 \ge \sigma_{k+1}$ for all rank-$k$ matrices $B$. Let $B$ be any rank-$k$ matrix, so its kernel has dimension $n - k$. The subspace $U_{k+1}$ spanned by $(u_1, \ldots, u_{k+1})$ has dimension $k + 1$, and because the sum of the dimensions of the kernel of $B$ and of $U_{k+1}$ is $(n - k) + k + 1 = n + 1$, these two subspaces must intersect in a subspace of dimension at least 1. Pick any unit vector $h$ in $\mathrm{Ker}(B) \cap U_{k+1}$. Then since $Bh = 0$, we have
$$\|A - B\|_2^2 \ge \|(A - B)h\|_2^2 = \|Ah\|_2^2 = \big\| V D U^\top h \big\|_2^2 \ge \sigma_{k+1}^2 \big\| U^\top h \big\|_2^2 = \sigma_{k+1}^2,$$
which proves our claim.
Note that $A_k$ can be stored using $(m + n)k$ entries, as opposed to $mn$ entries. When $k \ll m$, this is a substantial gain.
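To get a feel for the gain, one can simply compare the two counts. The sketch below uses assumed dimensions (a $512 \times 512$ image kept at rank 20); the function name `storage_counts` is hypothetical.

```python
def storage_counts(m, n, k):
    """Return (entries for the rank-k factors, entries for the full matrix)."""
    return (m + n) * k, m * n

# Assumed sizes, chosen only for illustration.
low_rank, full = storage_counts(m=512, n=512, k=20)
ratio = low_rank / full
print(low_rank, full, ratio)
```

The count $(m + n)k$ corresponds to storing the $k$ vectors $\sigma_i v_i$ (of length $m$) and the $k$ vectors $u_i$ (of length $n$).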
A nice example of the use of Proposition 17.5 in image compression is given in Demmel [27], Chapter 3, Section 3.2.3, pages 113-115; see the Matlab demo.
An interesting topic that we have not addressed is the actual computation of an SVD. This is a very interesting but tricky subject. Most methods reduce the computation of an SVD to the diagonalization of a well-chosen symmetric matrix (which is not $A^\top A$). Interested readers should read Section 5.4 of Demmel's excellent book [27], which contains an overview of most known methods and an extensive list of references.
17.3
Name                     year   length
Carl Friedrich Gauss     1777   0
Camille Jordan           1838   12
Adrien-Marie Legendre    1752   0
Bernhard Riemann         1826   15
David Hilbert            1862   2
Henri Poincaré           1854   5
Emmy Noether             1882   0
Karl Weierstrass         1815   0
Eugenio Beltrami         1835   2
Hermann Schwarz          1843   20
We usually form the $n \times d$ matrix $X$ whose $i$th row is $X_i$, with $1 \le i \le n$. Then the $j$th column is denoted by $C_j$ ($1 \le j \le d$). It is sometimes called a feature vector, but this terminology is far from being universally accepted. In fact, many people in computer vision call the data points $X_i$ feature vectors!
The purpose of principal components analysis, for short PCA, is to identify patterns in data and understand the variance-covariance structure of the data. This is useful for the following tasks:
1. Data reduction: Often much of the variability of the data can be accounted for by a
smaller number of principal components.
2. Interpretation: PCA can show relationships that were not previously suspected.
Given a vector (a sample of measurements) $x = (x_1, \ldots, x_n) \in \mathbb{R}^n$, recall that the mean (or average) $\bar{x}$ of $x$ is given by
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}.$$
We let $x - \bar{x}$ denote the centered data point
$$x - \bar{x} = (x_1 - \bar{x}, \ldots, x_n - \bar{x}).$$
In order to measure the spread of the $x_i$'s around the mean, we define the sample variance (for short, variance) $\mathrm{var}(x)$ (or $s^2$) of the sample $x$ by
$$\mathrm{var}(x) = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}.$$
There is a reason for using $n - 1$ instead of $n$. The above definition makes $\mathrm{var}(x)$ an unbiased estimator of the variance of the random variable being sampled. However, we don't need to worry about this. Curious readers will find an explanation of these peculiar definitions in Epstein [35] (Chapter 14, Section 14.5), or in any decent statistics book.
Given two vectors $x = (x_1, \ldots, x_n)$ and $y = (y_1, \ldots, y_n)$, the sample covariance (for short, covariance) of $x$ and $y$ is given by
$$\mathrm{cov}(x, y) = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n - 1}.$$
The covariance of $x$ and $y$ measures how $x$ and $y$ vary from the mean with respect to each other. Obviously, $\mathrm{cov}(x, y) = \mathrm{cov}(y, x)$ and $\mathrm{cov}(x, x) = \mathrm{var}(x)$.
Note that
$$\mathrm{cov}(x, y) = \frac{(x - \bar{x})^\top (y - \bar{y})}{n - 1}.$$
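These definitions translate directly into a few lines of code. The sketch below is illustrative only; the function names `mean`, `cov`, and `var` are assumptions, and the sample data are the year and length columns of the table in this section.

```python
def mean(x):
    return sum(x) / len(x)

def cov(x, y):
    """Sample covariance with the n - 1 denominator used in the text."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)

def var(x):
    """Sample variance, computed as cov(x, x)."""
    return cov(x, x)

# The two columns of the table in this section.
year = [1777, 1838, 1752, 1826, 1862, 1854, 1882, 1815, 1835, 1843]
length = [0, 12, 0, 15, 2, 5, 0, 0, 2, 20]

print(mean(year), mean(length))
print(var(length), cov(year, length))
```

Defining `var` through `cov` mirrors the identity $\mathrm{cov}(x, x) = \mathrm{var}(x)$ noted above.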
$\frac{1}{n-1}$
$\mu_2 = 5.6$,
Name                      year   length
Carl Friedrich Gauss     -51.4    -5.6
Camille Jordan             9.6     6.4
Adrien-Marie Legendre    -76.4    -5.6
Bernhard Riemann          -2.4     9.4
David Hilbert             33.6    -3.6
Henri Poincaré            25.6    -0.6
Emmy Noether              53.6    -5.6
Karl Weierstrass         -13.4    -5.6
Eugenio Beltrami           6.6    -3.6
Hermann Schwarz           14.6    14.4
We can think of the vector $C_j$ as representing the features of $X$ in the direction $e_j$ (the $j$th canonical basis vector in $\mathbb{R}^d$, namely $e_j = (0, \ldots, 1, \ldots, 0)$, with a 1 in the $j$th position).
If $v \in \mathbb{R}^d$ is a unit vector, we wish to consider the projection of the data points $X_1, \ldots, X_n$ onto the line spanned by $v$. Recall from Euclidean geometry that if $x \in \mathbb{R}^d$ is any vector and $v \in \mathbb{R}^d$ is a unit vector, the projection of $x$ onto the line spanned by $v$ is
$$\langle x, v \rangle v.$$
Thus, with respect to the basis $v$, the projection of $x$ has coordinate $\langle x, v \rangle$. If $x$ is represented by a row vector and $v$ by a column vector, then
$$\langle x, v \rangle = xv.$$
Therefore, the vector $Y \in \mathbb{R}^n$ consisting of the coordinates of the projections of $X_1, \ldots, X_n$ onto the line spanned by $v$ is given by $Y = Xv$, and this is the linear combination
$$Xv = v_1 C_1 + \cdots + v_d C_d$$
of the columns of $X$ (with $v = (v_1, \ldots, v_d)$).
Observe that because $\mu_j$ is the mean of the vector $C_j$ (the $j$th column of $X$), we get
$$\overline{Y} = \overline{Xv} = v_1 \mu_1 + \cdots + v_d \mu_d,$$
and so the centered point $Y - \overline{Y}$ is given by
$$Y - \overline{Y} = v_1 (C_1 - \mu_1) + \cdots + v_d (C_d - \mu_d) = (X - \mu)v.$$
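Furthermore, if $Y = Xv$ and $Z = Xw$, then
$$\begin{aligned} \mathrm{cov}(Y, Z) &= \frac{((X - \mu)v)^\top (X - \mu)w}{n - 1} \\ &= v^\top \frac{1}{n - 1} (X - \mu)^\top (X - \mu)\, w \\ &= v^\top \Sigma\, w, \end{aligned}$$
where $\Sigma = \frac{1}{n-1}(X - \mu)^\top (X - \mu)$ is the sample covariance matrix of $X$, and
$$\mathrm{var}(Y) = v^\top \frac{1}{n - 1} (X - \mu)^\top (X - \mu)\, v.$$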
The above suggests that we should move the origin to the centroid $\mu$ of the $X_i$'s and consider the matrix $X - \mu$ of the centered data points $X_i - \mu$.
From now on, beware that we denote the columns of $X - \mu$ by $C_1, \ldots, C_d$ and that $Y$ denotes the centered point $Y = (X - \mu)v = \sum_{j=1}^{d} v_j C_j$, where $v$ is a unit vector.
Basic idea of PCA: The principal components of $X$ are uncorrelated projections $Y$ of the data points $X_1, \ldots, X_n$ onto some directions $v$ (where the $v$'s are unit vectors) such that $\mathrm{var}(Y)$ is maximal.
This suggests the following definition:
Definition 17.2. Given an $n \times d$ matrix $X$ of data points $X_1, \ldots, X_n$, if $\mu$ is the centroid of the $X_i$'s, then a first principal component of $X$ (first PC) is a centered point $Y_1 = (X - \mu)v_1$, the projection of $X_1, \ldots, X_n$ onto a direction $v_1$ such that $\mathrm{var}(Y_1)$ is maximized, where $v_1$ is a unit vector (recall that $Y_1 = (X - \mu)v_1$ is a linear combination of the $C_j$'s, the columns of $X - \mu$).
More generally, if $Y_1, \ldots, Y_k$ are $k$ principal components of $X$ along some unit vectors $v_1, \ldots, v_k$, where $1 \le k < d$, a $(k+1)$th principal component of $X$ ($(k+1)$th PC) is a centered point $Y_{k+1} = (X - \mu)v_{k+1}$, the projection of $X_1, \ldots, X_n$ onto some direction $v_{k+1}$ such that $\mathrm{var}(Y_{k+1})$ is maximized, subject to $\mathrm{cov}(Y_h, Y_{k+1}) = 0$ for all $h$ with $1 \le h \le k$, and where $v_{k+1}$ is a unit vector (recall that $Y_h = (X - \mu)v_h$ is a linear combination of the $C_j$'s). The $v_h$ are called principal directions.
The following lemma is the key to the main result about PCA:
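Lemma 17.6. If $A$ is a symmetric $d \times d$ matrix with eigenvalues $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_d$ and if $(u_1, \ldots, u_d)$ is any orthonormal basis of eigenvectors of $A$, where $u_i$ is a unit eigenvector associated with $\lambda_i$, then
$$\max_{x \neq 0} \frac{x^\top A x}{x^\top x} = \lambda_1$$
(with the maximum attained for $x = u_1$) and
$$\max_{x \neq 0,\ x \in \{u_1, \ldots, u_k\}^\perp} \frac{x^\top A x}{x^\top x} = \lambda_{k+1}$$
(with the maximum attained for $x = u_{k+1}$), where $1 \le k \le d - 1$.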
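Proof. First, observe that
$$\max_{x \neq 0} \frac{x^\top A x}{x^\top x} = \max_{x} \{ x^\top A x \mid x^\top x = 1 \},$$
and similarly,
$$\max_{x \neq 0,\ x \in \{u_1, \ldots, u_k\}^\perp} \frac{x^\top A x}{x^\top x} = \max_{x} \{ x^\top A x \mid (x \in \{u_1, \ldots, u_k\}^\perp) \wedge (x^\top x = 1) \}.$$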
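Since $A$ is a symmetric matrix, its eigenvalues are real and it can be diagonalized with respect to an orthonormal basis of eigenvectors, so let $(u_1, \ldots, u_d)$ be such a basis. If we write
$$x = \sum_{i=1}^{d} x_i u_i,$$
a simple computation shows that
$$x^\top A x = \sum_{i=1}^{d} \lambda_i x_i^2.$$
If $x^\top x = 1$, then $\sum_{i=1}^{d} x_i^2 = 1$, and since $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_d$, we get
$$x^\top A x = \sum_{i=1}^{d} \lambda_i x_i^2 \le \lambda_1 \sum_{i=1}^{d} x_i^2 = \lambda_1.$$
Thus,
$$\max_{x} \{ x^\top A x \mid x^\top x = 1 \} \le \lambda_1,$$
and since this maximum is achieved for $e_1 = (1, 0, \ldots, 0)$, we conclude that
$$\max_{x} \{ x^\top A x \mid x^\top x = 1 \} = \lambda_1.$$
Next, observe that $x \in \{u_1, \ldots, u_k\}^\perp$ and $x^\top x = 1$ iff $x_1 = \cdots = x_k = 0$ and $\sum_{i=1}^{d} x_i^2 = 1$. Consequently, for such an $x$, we have
$$x^\top A x = \sum_{i=k+1}^{d} \lambda_i x_i^2 \le \lambda_{k+1} \sum_{i=k+1}^{d} x_i^2 = \lambda_{k+1}.$$
Thus,
$$\max_{x} \{ x^\top A x \mid (x \in \{u_1, \ldots, u_k\}^\perp) \wedge (x^\top x = 1) \} \le \lambda_{k+1},$$
and since this maximum is achieved for $e_{k+1} = (0, \ldots, 0, 1, 0, \ldots, 0)$ with a 1 in position $k + 1$, we conclude that
$$\max_{x} \{ x^\top A x \mid (x \in \{u_1, \ldots, u_k\}^\perp) \wedge (x^\top x = 1) \} = \lambda_{k+1},$$
as claimed.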
The quantity
$$\frac{x^\top A x}{x^\top x}$$
is known as the Rayleigh-Ritz ratio and Lemma 17.6 is often known as part of the Rayleigh-Ritz theorem.
Lemma 17.6 also holds if $A$ is a Hermitian matrix and if we replace $x^\top A x$ by $x^* A x$ and $x^\top x$ by $x^* x$. The proof is unchanged, since a Hermitian matrix has real eigenvalues and is diagonalized with respect to an orthonormal basis of eigenvectors (with respect to the Hermitian inner product).
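A quick numerical look at the Rayleigh-Ritz ratio can make Lemma 17.6 concrete. The sketch below assumes the symmetric matrix $A = \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}$, whose eigenvalues are $3$ and $1$ with unit eigenvectors $(1, 1)/\sqrt{2}$ and $(1, -1)/\sqrt{2}$; the function name `rayleigh` is an illustrative choice.

```python
import math

A = [[2.0, 1.0],
     [1.0, 2.0]]          # symmetric, eigenvalues 3 and 1

def rayleigh(A, x):
    """Rayleigh-Ritz ratio x^T A x / x^T x for a nonzero 2-vector x."""
    Ax = [A[0][0] * x[0] + A[0][1] * x[1],
          A[1][0] * x[0] + A[1][1] * x[1]]
    return (x[0] * Ax[0] + x[1] * Ax[1]) / (x[0] ** 2 + x[1] ** 2)

# Sample the ratio on unit vectors (cos t, sin t), one per degree.
samples = [rayleigh(A, (math.cos(t), math.sin(t)))
           for t in [2 * math.pi * i / 360 for i in range(360)]]

u1 = (1 / math.sqrt(2), 1 / math.sqrt(2))    # unit eigenvector for lambda_1 = 3
u2 = (1 / math.sqrt(2), -1 / math.sqrt(2))   # unit eigenvector for lambda_2 = 1
print(max(samples), rayleigh(A, u1), rayleigh(A, u2))
```

In dimension 2, the vectors orthogonal to $u_1$ are exactly the multiples of $u_2$, so the second part of the lemma (with $k = 1$) reduces to evaluating the ratio at $u_2$.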
We then have the following fundamental result showing how the SVD of $X - \mu$ yields the PCs:
Theorem 17.7. (SVD yields PCA) Let $X$ be an $n \times d$ matrix of data points $X_1, \ldots, X_n$, and let $\mu$ be the centroid of the $X_i$'s. If $X - \mu = V D U^\top$ is an SVD decomposition of $X - \mu$ and if the main diagonal of $D$ consists of the singular values $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_d$, then the centered points $Y_1, \ldots, Y_d$, where
$$Y_k = (X - \mu)u_k = k\text{th column of } VD$$
and $u_k$ is the $k$th column of $U$, are $d$ principal components of $X$. Furthermore,
$$\mathrm{var}(Y_k) = \frac{\sigma_k^2}{n - 1}$$
and $\mathrm{cov}(Y_h, Y_k) = 0$ whenever $h \neq k$.
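Proof. Recall that for any unit vector $v$, the centered projection $Y = (X - \mu)v$ has variance
$$\mathrm{var}(Y) = v^\top \frac{1}{n - 1} (X - \mu)^\top (X - \mu)\, v.$$
Since $X - \mu = V D U^\top$ and $V$ is orthogonal, we have $(X - \mu)^\top (X - \mu) = U D^2 U^\top$, where $D^2$ stands for $D^\top D = \mathrm{diag}(\sigma_1^2, \ldots, \sigma_d^2)$. Thus, for $Y = (X - \mu)v$ and $Z = (X - \mu)w$,
$$\mathrm{cov}(Y, Z) = v^\top U \frac{1}{(n - 1)} D^2 U^\top w.$$
The eigenvalues of $U \frac{1}{(n-1)} D^2 U^\top$ are
$$\frac{\sigma_1^2}{n - 1} \ge \cdots \ge \frac{\sigma_d^2}{n - 1},$$
and
$$\mathrm{var}(Y) = v^\top U \frac{1}{(n - 1)} D^2 U^\top v,$$
where $v$ is a unit vector. By Lemma 17.6, the maximum of the above quantity is the largest eigenvalue of $U \frac{1}{(n-1)} D^2 U^\top$, namely $\frac{\sigma_1^2}{n-1}$, and it is achieved for $u_1$, the first column of $U$. Now we get
$$Y_1 = (X - \mu)u_1 = V D U^\top u_1,$$
and since the columns of $U$ form an orthonormal basis, $U^\top u_1 = e_1 = (1, 0, \ldots, 0)$, and so $Y_1$ is indeed the first column of $VD$.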
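For the induction step, recall that for $Y = (X - \mu)v$ and $Z = (X - \mu)w$ we have
$$\mathrm{cov}(Y, Z) = v^\top U \frac{1}{(n - 1)} D^2 U^\top w.$$
Applying Lemma 17.6 once more, now on the orthogonal complement of $\{u_1, \ldots, u_k\}$, the constrained maximum of the variance is achieved for $u_{k+1}$, so that $Y_{k+1} = (X - \mu)u_{k+1} = V D U^\top u_{k+1}$, and since the columns of $U$ form an orthonormal basis, $U^\top u_{k+1} = e_{k+1}$, and $Y_{k+1}$ is indeed the $(k+1)$th column of $VD$, which completes the proof of the induction step.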
The $d$ columns $u_1, \ldots, u_d$ of $U$ are usually called the principal directions of $X - \mu$ (and $X$). We note that not only do we have $\mathrm{cov}(Y_h, Y_k) = 0$ whenever $h \neq k$, but the directions $u_1, \ldots, u_d$ along which the data are projected are mutually orthogonal.
We know from our study of SVD that $\sigma_1^2, \ldots, \sigma_d^2$ are the eigenvalues of the symmetric positive semidefinite matrix $(X - \mu)^\top (X - \mu)$ and that $u_1, \ldots, u_d$ are corresponding eigenvectors. Numerically, it is preferable to use SVD on $X - \mu$ rather than to compute explicitly $(X - \mu)^\top (X - \mu)$ and then diagonalize it. Indeed, the explicit computation of $A^\top A$ from a matrix $A$ can be numerically quite unstable, and good SVD algorithms avoid computing $A^\top A$ explicitly.
In general, since an SVD of $X - \mu$ is not unique, the principal directions $u_1, \ldots, u_d$ are not unique. This can happen when a data set has some rotational symmetries, and in such a case, PCA is not a very good method for analyzing the data set.
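For a data set with only $d = 2$ features, the principal directions can be found in closed form, which makes a small self-contained illustration possible. The sketch below uses the (year, length) data of the table in this section. Note the swapped-in technique: it diagonalizes the $2 \times 2$ covariance matrix directly instead of computing an SVD of $X - \mu$, which is acceptable for so small an example but is not the numerically preferred route described above. All variable names are illustrative assumptions.

```python
import math

# (year, length) rows of the data table in this section.
X = [(1777, 0), (1838, 12), (1752, 0), (1826, 15), (1862, 2),
     (1854, 5), (1882, 0), (1815, 0), (1835, 2), (1843, 20)]
n = len(X)

# Centroid mu and the centered columns C1, C2 of X - mu.
mu = [sum(row[j] for row in X) / n for j in range(2)]
C1 = [row[0] - mu[0] for row in X]
C2 = [row[1] - mu[1] for row in X]

def cov(a, b):
    """Sample covariance of two already-centered columns."""
    return sum(p * q for p, q in zip(a, b)) / (n - 1)

s11, s12, s22 = cov(C1, C1), cov(C1, C2), cov(C2, C2)

# Closed-form eigenvalues of the symmetric 2 x 2 covariance matrix.
half_trace = (s11 + s22) / 2
radius = math.hypot((s11 - s22) / 2, s12)
lam1, lam2 = half_trace + radius, half_trace - radius

# Unit eigenvector for lam1 (valid here because s12 != 0) and its orthogonal mate.
norm = math.hypot(s12, lam1 - s11)
u1 = (s12 / norm, (lam1 - s11) / norm)
u2 = (-u1[1], u1[0])

# Principal components: coordinates of the centered points along u1 and u2.
Y1 = [a * u1[0] + b * u1[1] for a, b in zip(C1, C2)]
Y2 = [a * u2[0] + b * u2[1] for a, b in zip(C1, C2)]
print(mu, u1, cov(Y1, Y1), cov(Y1, Y2))
```

The variance of the first principal component equals the largest eigenvalue of the covariance matrix, and the two components are uncorrelated, in agreement with Theorem 17.7.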
17.4
A problem very close to PCA (and based on least squares) is to best approximate a data set of $n$ points $X_1, \ldots, X_n$, with $X_i \in \mathbb{R}^d$, by a $p$-dimensional affine subspace $A$ of $\mathbb{R}^d$, with $1 \le p \le d - 1$ (the terminology rank $d - p$ is also used).
First, consider $p = d - 1$. Then $A = A_1$ is an affine hyperplane (in $\mathbb{R}^d$), and it is given by an equation of the form
$$a_1 x_1 + \cdots + a_d x_d + c = 0.$$
By best approximation, we mean that $(a_1, \ldots, a_d, c)$ solves the homogeneous linear system
$$\begin{pmatrix} x_{11} & \cdots & x_{1d} & 1 \\ \vdots & & \vdots & \vdots \\ x_{n1} & \cdots & x_{nd} & 1 \end{pmatrix} \begin{pmatrix} a_1 \\ \vdots \\ a_d \\ c \end{pmatrix} = \begin{pmatrix} 0 \\ \vdots \\ 0 \end{pmatrix}$$
in the least squares sense, subject to the condition that $a = (a_1, \ldots, a_d)$ is a unit vector, that is, $a^\top a = 1$, where $X_i = (x_{i1}, \ldots, x_{id})$.
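If we form the symmetric matrix
$$\begin{pmatrix} x_{11} & \cdots & x_{1d} & 1 \\ \vdots & & \vdots & \vdots \\ x_{n1} & \cdots & x_{nd} & 1 \end{pmatrix}^{\top} \begin{pmatrix} x_{11} & \cdots & x_{1d} & 1 \\ \vdots & & \vdots & \vdots \\ x_{n1} & \cdots & x_{nd} & 1 \end{pmatrix}$$
involved in the normal equations, we see that the bottom row (and last column) of that matrix is
$$n\mu_1 \quad \cdots \quad n\mu_d \quad n,$$
where $n\mu_j = \sum_{i=1}^{n} x_{ij}$ is $n$ times the mean of the column $C_j$ of $X$.
Therefore, if $(a_1, \ldots, a_d, c)$ is a least squares solution, that is, a solution of the normal equations, we must have
$$n\mu_1 a_1 + \cdots + n\mu_d a_d + nc = 0,$$
that is,
$$a_1 \mu_1 + \cdots + a_d \mu_d + c = 0,$$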
which means that the hyperplane $A_1$ must pass through the centroid $\mu$ of the data points $X_1, \ldots, X_n$. Then we can rewrite the original system with respect to the centered data $X_i - \mu$, and we find that the variable $c$ drops out and we get the system
$$(X - \mu)a = 0,$$
where $a = (a_1, \ldots, a_d)$.
Thus, we are looking for a unit vector $a$ solving $(X - \mu)a = 0$ in the least squares sense, that is, some $a$ such that $a^\top a = 1$ minimizing
$$a^\top (X - \mu)^\top (X - \mu)a.$$
Compute some SVD $V D U^\top$ of $X - \mu$, where the main diagonal of $D$ consists of the singular values $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_d$ of $X - \mu$ arranged in descending order. Then
$$a^\top (X - \mu)^\top (X - \mu)a = a^\top U D^2 U^\top a,$$
and, arguing as in Lemma 17.6, the minimum of this quantity over unit vectors is the smallest eigenvalue $\sigma_d^2$ of $U D^2 U^\top$, achieved when $a$ is the last column of $U$. Letting $U_{d-1}$ denote the linear hyperplane orthogonal to $a$, where $a$ is the last column in $U$ for some SVD $V D U^\top$ of $X - \mu$, we have shown that the affine hyperplane $A_1 = \mu + U_{d-1}$ is a best approximation of the data set $X_1, \ldots, X_n$ in the least squares sense.
It is easy to show that this hyperplane $A_1 = \mu + U_{d-1}$ minimizes the sum of the square distances of each $X_i$ to its orthogonal projection onto $A_1$. Also, since $U_{d-1}$ is the orthogonal complement of $a$, the last column of $U$, we see that $U_{d-1}$ is spanned by the first $d - 1$ columns of $U$, that is, the first $d - 1$ principal directions of $X - \mu$.
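In the plane ($d = 2$), the best affine hyperplane is a best-fit line through the centroid whose unit normal $a$ is an eigenvector for the smallest eigenvalue of $(X - \mu)^\top (X - \mu)$. The sketch below is a minimal illustration under assumptions: the four sample points are invented (they lie exactly on $y = 2x + 1$, so all residuals should vanish), and the $2 \times 2$ eigenproblem is solved in closed form rather than through an SVD.

```python
import math

points = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]   # all on y = 2x + 1
n = len(points)
mu = (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)

# Entries of the 2 x 2 matrix (X - mu)^T (X - mu).
sxx = sum((x - mu[0]) ** 2 for x, _ in points)
sxy = sum((x - mu[0]) * (y - mu[1]) for x, y in points)
syy = sum((y - mu[1]) ** 2 for _, y in points)

# Smallest eigenvalue and a unit eigenvector a for it (closed form for 2 x 2).
lam_min = (sxx + syy) / 2 - math.hypot((sxx - syy) / 2, sxy)
if sxy != 0:
    a = (sxy, lam_min - sxx)
else:                      # already diagonal: pick the axis with the smaller spread
    a = (1.0, 0.0) if sxx <= syy else (0.0, 1.0)
norm = math.hypot(a[0], a[1])
a = (a[0] / norm, a[1] / norm)

# Hyperplane a1*x1 + a2*x2 + c = 0 passing through the centroid.
c = -(a[0] * mu[0] + a[1] * mu[1])
residuals = [a[0] * x + a[1] * y + c for x, y in points]
print(a, c, residuals)
```

Because $a$ is a unit vector, each residual is the signed distance from the point to the line, which is the quantity whose sum of squares is being minimized.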
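All this can be generalized to a best $(d - k)$-dimensional affine subspace $A_k$ approximating $X_1, \ldots, X_n$ in the least squares sense ($1 \le k \le d - 1$). Such an affine subspace $A_k$ is cut out by $k$ independent hyperplanes $H_i$ (with $1 \le i \le k$), each given by some equation
$$a_{i1} x_1 + \cdots + a_{id} x_d + c_i = 0.$$
If we write $a_i = (a_{i1}, \ldots, a_{id})$, to say that the $H_i$ are independent means that $a_1, \ldots, a_k$ are linearly independent. In fact, we may assume that $a_1, \ldots, a_k$ form an orthonormal system.
Then, finding a best $(d - k)$-dimensional affine subspace $A_k$ amounts to solving the homogeneous linear system
$$\begin{pmatrix} X & \mathbf{1} & 0 & 0 & \cdots & 0 & 0 \\ \vdots & \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & 0 & 0 & \cdots & X & \mathbf{1} \end{pmatrix} \begin{pmatrix} a_1 \\ c_1 \\ \vdots \\ a_k \\ c_k \end{pmatrix} = \begin{pmatrix} 0 \\ \vdots \\ 0 \end{pmatrix},$$
in the least squares sense (where $\mathbf{1}$ denotes the column vector of $n$ ones), subject to the conditions $a_i^\top a_j = \delta_{ij}$ for all $i, j$ with $1 \le i, j \le k$.
Again, it is easy to see that each hyperplane $H_i$ must pass through the centroid $\mu$ of $X_1, \ldots, X_n$, and by switching to the centered data $X_i - \mu$ we get the system
$$\begin{pmatrix} X - \mu & 0 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & X - \mu \end{pmatrix} \begin{pmatrix} a_1 \\ \vdots \\ a_k \end{pmatrix} = \begin{pmatrix} 0 \\ \vdots \\ 0 \end{pmatrix},$$
with $a_i^\top a_j = \delta_{ij}$ for all $i, j$ with $1 \le i, j \le k$.
The resulting best approximation is $A_k = \mu + U_{d-k}$, where $U_{d-k}$ is the linear subspace spanned by the first $d - k$ principal directions of $X - \mu$, that is, the first $d - k$ columns of $U$. Consequently, we get the following interesting interpretation of PCA (actually, principal directions):
Theorem 17.8. Let $X$ be an $n \times d$ matrix of data points $X_1, \ldots, X_n$, and let $\mu$ be the centroid of the $X_i$'s. If $X - \mu = V D U^\top$ is an SVD decomposition of $X - \mu$ and if the main diagonal of $D$ consists of the singular values $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_d$, then a best $(d - k)$-dimensional affine approximation $A_k$ of $X_1, \ldots, X_n$ in the least squares sense is given by
$$A_k = \mu + U_{d-k},$$
where $U_{d-k}$ is the linear subspace spanned by the first $d - k$ columns of $U$, the first $d - k$ principal directions of $X - \mu$ ($1 \le k \le d - 1$).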
There are many applications of PCA to data compression, dimension reduction, and pattern analysis. The basic idea is that in many cases, given a data set $X_1, \ldots, X_n$, with $X_i \in \mathbb{R}^d$, only a small subset of $m < d$ of the features is needed to describe the data set accurately.
If $u_1, \ldots, u_d$ are the principal directions of $X - \mu$, then the first $m$ projections of the data (the first $m$ principal components, i.e., the first $m$ columns of $VD$) onto the first $m$ principal directions represent the data without much loss of information. Thus, instead of using the original data points $X_1, \ldots, X_n$, with $X_i \in \mathbb{R}^d$, we can use their projections onto the first $m$ principal directions $Y_1, \ldots, Y_m$, where $Y_i \in \mathbb{R}^m$ and $m < d$, obtaining a compressed version of the original data set.
For example, PCA is used in computer vision for face recognition. Sirovich and Kirby (1987) seem to be the first to have had the idea of using PCA to compress facial images. They introduced the term eigenpicture to refer to the principal directions, $u_i$. However, an explicit face recognition algorithm was given only later, by Turk and Pentland (1991). They renamed eigenpictures as eigenfaces.
For details on the topic of eigenfaces, see Forsyth and Ponce [39] (Chapter 22, Section 22.3.2), where you will also find exact references to Turk and Pentland's papers.
Another interesting application of PCA is to the recognition of handwritten digits. Such an application is described in Hastie, Tibshirani, and Friedman [55] (Chapter 14, Section 14.5.1).
Chapter 18
Quadratic Optimization Problems
18.1
In this chapter, we consider two classes of quadratic optimization problems that appear frequently in engineering and in computer science (especially in computer vision):
1. Minimizing
$$f(x) = \frac{1}{2} x^\top A x + x^\top b$$
over all $x \in \mathbb{R}^n$, or subject to linear or affine constraints.
2. Minimizing
$$f(x) = \frac{1}{2} x^\top A x + x^\top b$$
over the unit sphere.
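$$\Big\langle \sum_{i=1}^{n} x_i e_i,\ f\Big( \sum_{i=1}^{n} x_i e_i \Big) \Big\rangle = \Big\langle \sum_{i=1}^{n} x_i e_i,\ \sum_{i=1}^{n} \lambda_i x_i e_i \Big\rangle = \sum_{i=1}^{n} \lambda_i x_i^2,$$
which is strictly positive, since $\lambda_i > 0$ for $i = 1, \ldots, n$, and $x_i^2 > 0$ for some $i$, since $x \neq 0$.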
Conversely, assume that
$$\sum_{i=1}^{n} \lambda_i x_i^2,$$
$$P(x) = \frac{1}{2} x^\top A x - x^\top b$$
has a global minimum when $A$ is symmetric positive definite.
Proposition 18.2. Given a quadratic function
$$P(x) = \frac{1}{2} x^\top A x - x^\top b,$$
if $A$ is symmetric positive definite, then $P(x)$ has a unique global minimum for the solution of the linear system $Ax = b$. The minimum value of $P(x)$ is
$$P(A^{-1} b) = -\frac{1}{2} b^\top A^{-1} b.$$
Proof. Since $A$ is positive definite, it is invertible, since its eigenvalues are all strictly positive. Let $x = A^{-1} b$, and compute $P(y) - P(x)$ for any $y \in \mathbb{R}^n$. Since $Ax = b$, we get
$$\begin{aligned} P(y) - P(x) &= \frac{1}{2} y^\top A y - y^\top b - \frac{1}{2} x^\top A x + x^\top b \\ &= \frac{1}{2} y^\top A y - y^\top A x + \frac{1}{2} x^\top A x \\ &= \frac{1}{2} (y - x)^\top A (y - x). \end{aligned}$$
Since $A$ is positive definite, the last expression is nonnegative (and it is zero only when $y = x$, which gives uniqueness), and thus
$$P(y) \ge P(x)$$
for all $y \in \mathbb{R}^n$, which proves that $x = A^{-1} b$ is a global minimum of $P(x)$. A simple computation yields
$$P(A^{-1} b) = -\frac{1}{2} b^\top A^{-1} b.$$
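Proposition 18.2 can be checked by hand on a $2 \times 2$ example. The sketch below assumes $A = \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}$ (symmetric positive definite) and $b = (1, 0)$, solves $Ax = b$ by Cramer's rule, and compares $P$ at the solution with the predicted value $-\frac{1}{2} b^\top A^{-1} b$ and with values of $P$ at nearby points. The data and names are illustrative assumptions.

```python
A = [[2.0, 1.0],
     [1.0, 2.0]]      # symmetric positive definite (eigenvalues 1 and 3)
b = [1.0, 0.0]

def P(x):
    """P(x) = 1/2 x^T A x - x^T b for a 2-vector x."""
    Ax = [A[0][0] * x[0] + A[0][1] * x[1], A[1][0] * x[0] + A[1][1] * x[1]]
    return 0.5 * (x[0] * Ax[0] + x[1] * Ax[1]) - (x[0] * b[0] + x[1] * b[1])

# Solve A x = b by Cramer's rule (fine for a 2 x 2 system).
det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
x_star = [(b[0] * A[1][1] - A[0][1] * b[1]) / det,
          (A[0][0] * b[1] - A[1][0] * b[0]) / det]

# Predicted minimum value -1/2 b^T A^{-1} b = -1/2 b^T x_star.
predicted = -0.5 * (b[0] * x_star[0] + b[1] * x_star[1])

# Values of P on a small grid around x_star (the grid includes x_star itself).
nearby = [P([x_star[0] + dx, x_star[1] + dy])
          for dx in (-0.1, 0.0, 0.1) for dy in (-0.1, 0.0, 0.1)]
print(x_star, P(x_star), predicted, min(nearby))
```

For larger systems one would of course use a factorization of $A$ (for instance Cholesky) rather than Cramer's rule.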
Remarks:
(1) The quadratic function $P(x)$ is also given by
$$P(x) = \frac{1}{2} x^\top A x - b^\top x,$$
but the definition using $x^\top b$ is more convenient for the proof of Proposition 18.2.
(2) If $P(x)$ contains a constant term $c \in \mathbb{R}$, so that
$$P(x) = \frac{1}{2} x^\top A x - x^\top b + c,$$
the proof of Proposition 18.2 still shows that $P(x)$ has a unique global minimum for $x = A^{-1} b$, but the minimal value is
$$P(A^{-1} b) = -\frac{1}{2} b^\top A^{-1} b + c.$$
Thus, when the energy function $P(x)$ of a system is given by a quadratic function
$$P(x) = \frac{1}{2} x^\top A x - x^\top b,$$
where $A$ is symmetric positive definite, finding the global minimum of $P(x)$ is equivalent to solving the linear system $Ax = b$. Sometimes, it is useful to recast a linear problem $Ax = b$ as a variational problem (finding the minimum of some energy function). However, very often, a minimization problem comes with extra constraints that must be satisfied for all admissible solutions. For instance, we may want to minimize the quadratic function
$$Q(y_1, y_2) = \frac{1}{2} \left( y_1^2 + y_2^2 \right)$$
subject to the constraint $2y_1 - y_2 = 5$.
We shall prove that our constrained minimization problem has a unique solution given by the system of linear equations
$$\begin{aligned} C^{-1} y + A\lambda &= b, \\ A^\top y &= f, \end{aligned}$$
which can be written in matrix form as
$$\begin{pmatrix} C^{-1} & A \\ A^\top & 0 \end{pmatrix} \begin{pmatrix} y \\ \lambda \end{pmatrix} = \begin{pmatrix} b \\ f \end{pmatrix}.$$
Note that the matrix of this system is symmetric. Eliminating $y$ from the first equation
$$C^{-1} y + A\lambda = b,$$
we get
$$y = C(b - A\lambda),$$
and substituting into the second equation, we get
$$A^\top C(b - A\lambda) = f,$$
that is,
$$A^\top C A \lambda = A^\top C b - f.$$
However, by a previous remark, since $C$ is symmetric positive definite and the columns of $A$ are linearly independent, $A^\top C A$ is symmetric positive definite, and thus invertible. Note that this way of solving the system requires solving for the Lagrange multipliers first.
Letting $e = b - A\lambda$, we also note that the system
$$\begin{pmatrix} C^{-1} & A \\ A^\top & 0 \end{pmatrix} \begin{pmatrix} y \\ \lambda \end{pmatrix} = \begin{pmatrix} b \\ f \end{pmatrix}$$
is equivalent to the system
$$\begin{aligned} e &= b - A\lambda, \\ y &= Ce, \\ A^\top y &= f. \end{aligned}$$
The latter system is called the equilibrium equations by Strang [104]. Indeed, Strang shows that the equilibrium equations of many physical systems can be put in the above form. This includes spring-mass systems, electrical networks, and trusses, which are structures built from elastic bars. In each case, $y$, $e$, $b$, $C$, $\lambda$, $f$, and $K = A^\top C A$ have a physical interpretation. The matrix $K = A^\top C A$ is usually called the stiffness matrix. Again, the reader is referred to Strang [104].
In order to prove that our constrained minimization problem has a unique solution, we proceed to prove that the constrained minimization of $Q(y)$ subject to $A^\top y = f$ is equivalent to the unconstrained maximization of another function $-P(\lambda)$. We get $P(\lambda)$ by minimizing the Lagrangian $L(y, \lambda)$ treated as a function of $y$ alone. Since $C^{-1}$ is symmetric positive definite and
$$L(y, \lambda) = \frac{1}{2} y^\top C^{-1} y - (b - A\lambda)^\top y - \lambda^\top f,$$
by Proposition 18.2 the global minimum (with respect to $y$) of $L(y, \lambda)$ is obtained for the solution $y$ of
$$C^{-1} y = b - A\lambda,$$
that is, when
$$y = C(b - A\lambda),$$
and the minimum of $L(y, \lambda)$ is
$$\min_{y} L(y, \lambda) = -\frac{1}{2} (A\lambda - b)^\top C (A\lambda - b) - \lambda^\top f.$$
Letting
$$P(\lambda) = \frac{1}{2} (A\lambda - b)^\top C (A\lambda - b) + \lambda^\top f,$$
we claim that the solution of the constrained minimization of $Q(y)$ subject to $A^\top y = f$ is equivalent to the unconstrained maximization of $-P(\lambda)$. Of course, since we minimized $L(y, \lambda)$ with respect to $y$, we have
$$L(y, \lambda) \ge -P(\lambda)$$
for all $y$ and all $\lambda$. However, when the constraint $A^\top y = f$ holds, $L(y, \lambda) = Q(y)$, and thus for any admissible $y$, which means that $A^\top y = f$, we have
$$\min_{y} Q(y) \ge \max_{\lambda} -P(\lambda).$$
In order to prove that the unique minimum of the constrained problem $Q(y)$ subject to $A^\top y = f$ is the unique maximum of $-P(\lambda)$, we compute $Q(y) + P(\lambda)$.
Proposition 18.3. The quadratic constrained minimization problem of Definition 18.3 has a unique solution $(y, \lambda)$ given by the system
$$\begin{pmatrix} C^{-1} & A \\ A^\top & 0 \end{pmatrix} \begin{pmatrix} y \\ \lambda \end{pmatrix} = \begin{pmatrix} b \\ f \end{pmatrix}.$$
Furthermore, the component $\lambda$ of the above solution is the unique value for which $-P(\lambda)$ is maximum.
Proof. As we suggested earlier, let us compute $Q(y) + P(\lambda)$, assuming that the constraint $A^\top y = f$ holds. Eliminating $f$, since $b^\top y = y^\top b$ and $\lambda^\top A^\top y = y^\top A\lambda$, we get
$$\begin{aligned} Q(y) + P(\lambda) &= \frac{1}{2} y^\top C^{-1} y - b^\top y + \frac{1}{2} (A\lambda - b)^\top C (A\lambda - b) + \lambda^\top f \\ &= \frac{1}{2} (C^{-1} y + A\lambda - b)^\top C (C^{-1} y + A\lambda - b). \end{aligned}$$
Since $C$ is positive definite, the last expression is nonnegative. In fact, it is null iff
$$C^{-1} y + A\lambda - b = 0,$$
that is,
$$C^{-1} y + A\lambda = b.$$
But then the unique constrained minimum of $Q(y)$ subject to $A^\top y = f$ is equal to the unique maximum of $-P(\lambda)$ exactly when $A^\top y = f$ and $C^{-1} y + A\lambda = b$, which proves the proposition.
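Remarks:
(1) There is a form of duality going on in this situation. The constrained minimization of $Q(y)$ subject to $A^\top y = f$ is called the primal problem, and the unconstrained maximization of $-P(\lambda)$ is called the dual problem. Duality is the fact stated slightly loosely as
$$\min_{y} Q(y) = \max_{\lambda} -P(\lambda).$$
Recalling that $e = b - A\lambda$, we can also write
$$P(\lambda) = \frac{1}{2} e^\top C e + \lambda^\top f.$$
This expression often represents the total potential energy of a system. Again, the optimal solution is the one that minimizes the potential energy (and thus maximizes $-P(\lambda)$).
(2) It is immediately verified that the equations of Proposition 18.3 are equivalent to the equations stating that the partial derivatives of the Lagrangian $L(y, \lambda)$ are null:
$$\begin{aligned} \frac{\partial L}{\partial y_i} &= 0, \quad i = 1, \ldots, m, \\ \frac{\partial L}{\partial \lambda_j} &= 0, \quad j = 1, \ldots, n. \end{aligned}$$
For the example of minimizing $Q(y_1, y_2) = \frac{1}{2}(y_1^2 + y_2^2)$ subject to $2y_1 - y_2 = 5$, the Lagrangian is
$$L(y_1, y_2, \lambda) = \frac{1}{2} \left( y_1^2 + y_2^2 \right) + \lambda (2y_1 - y_2 - 5),$$
and the equations stating that the Lagrangian has a saddle point are
$$\begin{aligned} y_1 + 2\lambda &= 0, \\ y_2 - \lambda &= 0, \\ 2y_1 - y_2 - 5 &= 0. \end{aligned}$$
We obtain the solution $(y_1, y_2, \lambda) = (2, -1, -1)$.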
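The same solution can be reached through the elimination route described earlier, solving $A^\top C A \lambda = A^\top C b - f$ for the multiplier first and then recovering $y = C(b - A\lambda)$. The sketch below is a minimal illustration for a single constraint. It assumes that the example corresponds to $C = I$, $b = 0$, $A = (2, -1)^\top$, and $f = 5$ in the notation $Q(y) = \frac{1}{2} y^\top C^{-1} y - b^\top y$, $A^\top y = f$; the variable names are hypothetical.

```python
# Data for the example: Q(y) = 1/2 y^T C^{-1} y - b^T y subject to A^T y = f,
# with C = I, b = 0, A = (2, -1)^T, f = 5 (a single constraint).
C_diag = [1.0, 1.0]      # diagonal entries of C
b = [0.0, 0.0]
A = [2.0, -1.0]          # the single column of A
f = 5.0

# Solve A^T C A lambda = A^T C b - f for the multiplier first ...
AtCA = sum(a * c * a for a, c in zip(A, C_diag))
AtCb = sum(a * c * bi for a, c, bi in zip(A, C_diag, b))
lam = (AtCb - f) / AtCA

# ... then recover y = C (b - A lambda).
y = [c * (bi - a * lam) for c, bi, a in zip(C_diag, b, A)]

# Check that the constraint A^T y = f holds.
constraint = sum(a * yi for a, yi in zip(A, y))
print(y, lam, constraint)
```

With several constraints, $A^\top C A$ becomes a symmetric positive definite matrix and the scalar division is replaced by the solution of a linear system.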
Much more should be said about the use of Lagrange multipliers in optimization or
variational problems. This is a vast topic. Least squares methods and Lagrange multipliers
are used to tackle many problems in computer graphics and computer vision; see Trucco
and Verri [111], Metaxas [79], Jain, Katsuri, and Schunck [60], Faugeras [37], and Foley, van
Dam, Feiner, and Hughes [38]. For a lucid introduction to optimization methods, see Ciarlet
[24].
18.2
In this section, we complete the study initiated in Section 18.1 and give necessary and sufficient conditions for the quadratic function $\frac{1}{2} x^\top A x + x^\top b$ to have a global minimum. We begin with the following simple fact:
Proposition 18.4. If $A$ is an invertible symmetric matrix, then the function
$$f(x) = \frac{1}{2} x^\top A x + x^\top b$$
has a minimum value iff $A \succ 0$, in which case this optimal value is obtained for a unique value of $x$, namely $x^* = -A^{-1} b$, and with
$$f(-A^{-1} b) = -\frac{1}{2} b^\top A^{-1} b.$$
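Proof. Observe that
$$f(x) = \frac{1}{2} x^\top A x + x^\top b = \frac{1}{2} (x + A^{-1} b)^\top A (x + A^{-1} b) - \frac{1}{2} b^\top A^{-1} b.$$
If $A$ has some negative eigenvalue, say $-\lambda$ (with $\lambda > 0$), if we pick any eigenvector $u$ of $A$ associated with $-\lambda$, then for any $\alpha \in \mathbb{R}$ with $\alpha \neq 0$, if we let $x = \alpha u - A^{-1} b$, then since $Au = -\lambda u$, we get
$$\begin{aligned} f(x) &= \frac{1}{2} (x + A^{-1} b)^\top A (x + A^{-1} b) - \frac{1}{2} b^\top A^{-1} b \\ &= \frac{1}{2} \alpha^2 u^\top A u - \frac{1}{2} b^\top A^{-1} b \\ &= -\frac{1}{2} \alpha^2 \lambda \|u\|_2^2 - \frac{1}{2} b^\top A^{-1} b, \end{aligned}$$
and since $\alpha$ can be made as large as we want and $\lambda > 0$, we see that $f$ has no minimum. Consequently, in order for $f$ to have a minimum, we must have $A \succeq 0$, and since $A$ is invertible, $A \succ 0$. In this case, since $(x + A^{-1} b)^\top A (x + A^{-1} b) \ge 0$, it is clear that the minimum value of $f$ is achieved when $x + A^{-1} b = 0$, that is, $x = -A^{-1} b$.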
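The unboundedness argument in the proof can be observed numerically. The sketch below assumes the invertible but indefinite matrix $A = \mathrm{diag}(2, -1)$ and $b = (1, 1)$, so that $-\lambda = -1$ with unit eigenvector $u = (0, 1)$, and it compares $f(\alpha u - A^{-1} b)$ with the closed form $-\frac{1}{2} \alpha^2 \lambda \|u\|_2^2 - \frac{1}{2} b^\top A^{-1} b$ for growing $\alpha$. All data and names are illustrative assumptions.

```python
A_diag = [2.0, -1.0]     # A = diag(2, -1): invertible, with negative eigenvalue -1
b = [1.0, 1.0]
u = [0.0, 1.0]           # unit eigenvector for the eigenvalue -lam, with lam = 1
lam = 1.0

def f(x):
    """f(x) = 1/2 x^T A x + x^T b for the diagonal matrix A above."""
    quad = sum(d * xi * xi for d, xi in zip(A_diag, x))
    return 0.5 * quad + sum(xi * bi for xi, bi in zip(x, b))

A_inv_b = [bi / d for bi, d in zip(b, A_diag)]
const = -0.5 * sum(bi * yi for bi, yi in zip(b, A_inv_b))   # -1/2 b^T A^{-1} b

# Pairs (f(x), predicted value) along x = alpha * u - A^{-1} b.
values = []
for alpha in (1.0, 10.0, 100.0):
    x = [alpha * ui - yi for ui, yi in zip(u, A_inv_b)]
    predicted = -0.5 * alpha ** 2 * lam + const
    values.append((f(x), predicted))
print(values)
```

The values decrease like $-\frac{1}{2}\alpha^2$, which is why no minimum can exist when $A$ has a negative eigenvalue.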
Let us now consider the case of an arbitrary symmetric matrix $A$.
Proposition 18.5. If $A$ is a symmetric matrix, then the function
$$f(x) = \frac{1}{2} x^\top A x + x^\top b$$
has a minimum value iff $A \succeq 0$ and $(I - AA^+)b = 0$, in which case this minimum value is
$$p^* = -\frac{1}{2} b^\top A^+ b.$$
Furthermore, if $A = U^\top \Sigma U$ is an SVD of $A$, then the optimal value is achieved by all $x \in \mathbb{R}^n$ of the form
$$x = -A^+ b + U^\top \begin{pmatrix} 0 \\ z \end{pmatrix},$$
for any $z \in \mathbb{R}^{n-r}$, where $r$ is the rank of $A$.
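Proof. The case that $A$ is invertible is taken care of by Proposition 18.4, so we may assume that $A$ is singular. If $A$ has rank $r < n$, then we can diagonalize $A$ as
$$A = U^\top \begin{pmatrix} \Sigma_r & 0 \\ 0 & 0 \end{pmatrix} U,$$
where $U$ is an orthogonal matrix and $\Sigma_r$ is an $r \times r$ diagonal invertible matrix. If we write
$$Ux = \begin{pmatrix} y \\ z \end{pmatrix} \quad \text{and} \quad Ub = \begin{pmatrix} c \\ d \end{pmatrix},$$
with $y, c \in \mathbb{R}^r$ and $z, d \in \mathbb{R}^{n-r}$, we get
$$f(x) = \frac{1}{2} y^\top \Sigma_r y + y^\top c + z^\top d.$$
For $y = 0$ we get $f(x) = z^\top d$, so $f$ has no minimum unless $d = 0$, which means that $b$ is in the range of $A$, that is, $(I - AA^+)b = 0$. If $d = 0$, then by Proposition 18.4 applied to $\Sigma_r$, $f$ has a minimum iff $\Sigma_r \succ 0$, which is equivalent to $A \succeq 0$, and the minimum is achieved for $y^* = -\Sigma_r^{-1} c$ (and, in particular, for $z = 0$), which means that
$$Ux^* = \begin{pmatrix} -\Sigma_r^{-1} c \\ 0 \end{pmatrix} \quad \text{and} \quad Ub = \begin{pmatrix} c \\ 0 \end{pmatrix},$$
that is,
$$x^* = U^\top \begin{pmatrix} -\Sigma_r^{-1} c \\ 0 \end{pmatrix} = -U^\top \begin{pmatrix} \Sigma_r^{-1} & 0 \\ 0 & 0 \end{pmatrix} \begin{pmatrix} c \\ 0 \end{pmatrix} = -U^\top \begin{pmatrix} \Sigma_r^{-1} & 0 \\ 0 & 0 \end{pmatrix} Ub = -A^+ b,$$
and the minimum value of $f$ is
$$f(x^*) = -\frac{1}{2} b^\top A^+ b.$$
For any $x \in \mathbb{R}^n$ of the form
$$x = -A^+ b + U^\top \begin{pmatrix} 0 \\ z \end{pmatrix},$$
for any $z \in \mathbb{R}^{n-r}$, our previous calculations show that $f(x) = -\frac{1}{2} b^\top A^+ b$.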
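A diagonal example shows the whole family of minimizers at once. The sketch below assumes $A = \mathrm{diag}(2, 0)$ and $b = (-2, 0)$, so that $U = I$, $r = 1$, and $A^+ = \mathrm{diag}(1/2, 0)$; it checks the range condition $(I - AA^+)b = 0$ and evaluates $f$ at several points of the form $-A^+ b + (0, z)$. The data and names are illustrative assumptions.

```python
A_diag = [2.0, 0.0]      # A = diag(2, 0): symmetric positive semidefinite, rank 1
b = [-2.0, 0.0]          # b lies in the range of A, so (I - A A^+) b = 0

# For a diagonal matrix, the pseudo-inverse inverts the nonzero entries only.
A_plus_diag = [1.0 / d if d != 0 else 0.0 for d in A_diag]

def f(x):
    """f(x) = 1/2 x^T A x + x^T b for the diagonal matrix A above."""
    quad = sum(d * xi * xi for d, xi in zip(A_diag, x))
    return 0.5 * quad + sum(xi * bi for xi, bi in zip(x, b))

# (I - A A^+) b, computed entry by entry.
range_defect = [bi - d * p * bi for bi, d, p in zip(b, A_diag, A_plus_diag)]

# Predicted minimum value p* = -1/2 b^T A^+ b.
p_star = -0.5 * sum(bi * p * bi for bi, p in zip(b, A_plus_diag))

# Minimizers: x = -A^+ b + (0, z) for any z (here U is the identity).
minimizers = [[-A_plus_diag[0] * b[0], z] for z in (-3.0, 0.0, 7.5)]
print(range_defect, p_star, [f(x) for x in minimizers])
```

Changing `b` to a vector with a nonzero second entry violates the range condition, and $f$ then decreases without bound along the second coordinate.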
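The case in which we add either linear constraints of the form $C^\top x = 0$ or affine constraints of the form $C^\top x = t$ (where $t \neq 0$) can be reduced to the unconstrained case using a QR-decomposition of $C$ or $N$. Let us show how to do this for linear constraints of the form $C^\top x = 0$.
If we use a QR decomposition of $C$, by permuting the columns, we may assume that
$$C = Q^\top \begin{pmatrix} R & S \\ 0 & 0 \end{pmatrix},$$
where $R$ is an $r \times r$ invertible upper triangular matrix and $S$ is an $r \times (m - r)$ matrix ($C$ has rank $r$). Then, if we let
$$x = Q^\top \begin{pmatrix} y \\ z \end{pmatrix},$$
where $y \in \mathbb{R}^r$ and $z \in \mathbb{R}^{n-r}$, then $C^\top x = 0$ becomes
$$\begin{pmatrix} R^\top & 0 \\ S^\top & 0 \end{pmatrix} Qx = \begin{pmatrix} R^\top & 0 \\ S^\top & 0 \end{pmatrix} \begin{pmatrix} y \\ z \end{pmatrix} = 0,$$
which implies $y = 0$ (since $R$ is invertible), and our original problem becomes
$$\begin{array}{ll} \text{minimize} & \dfrac{1}{2} (y^\top, z^\top) Q A Q^\top \begin{pmatrix} y \\ z \end{pmatrix} + (y^\top, z^\top) Q b \\ \text{subject to} & y = 0,\ y \in \mathbb{R}^r,\ z \in \mathbb{R}^{n-r}. \end{array}$$
Thus, the constraint $C^\top x = 0$ has been eliminated, and if we write
$$Q A Q^\top = \begin{pmatrix} G_{11} & G_{12} \\ G_{21} & G_{22} \end{pmatrix} \quad \text{and} \quad Qb = \begin{pmatrix} b_1 \\ b_2 \end{pmatrix}, \quad b_1 \in \mathbb{R}^r,\ b_2 \in \mathbb{R}^{n-r},$$
our problem becomes
$$\text{minimize} \quad \frac{1}{2} z^\top G_{22} z + z^\top b_2, \quad z \in \mathbb{R}^{n-r},$$
the problem solved in Proposition 18.5.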
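Constraints of the form $C^\top x = t$ (where $t \neq 0$) can be handled in a similar fashion. Assuming that $C$ is an $n \times m$ matrix of full rank $m \le n$, with QR-decomposition $C = P \begin{pmatrix} R \\ 0 \end{pmatrix}$, where $P$ is an orthogonal matrix and $R$ is an $m \times m$ invertible upper triangular matrix, and writing $x = P \begin{pmatrix} y \\ z \end{pmatrix}$ with $y \in \mathbb{R}^m$ and $z \in \mathbb{R}^{n-m}$, the equation $C^\top x = t$ becomes $(R^\top, 0) \begin{pmatrix} y \\ z \end{pmatrix} = t$, which yields
$$R^\top y = t.$$
Since $R$ is invertible, we get $y = (R^\top)^{-1} t$, and then it is easy to see that our original problem reduces to an unconstrained problem in terms of the matrix $P^\top A P$; the details are left as an exercise.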
18.3
In this section we discuss various quadratic optimization problems mostly arising from computer vision (image segmentation and contour grouping). These problems can be reduced to the following basic optimization problem: Given an $n \times n$ real symmetric matrix $A$
$$\begin{array}{ll} \text{maximize} & x^\top A x \\ \text{subject to} & x^\top x = 1,\ x \in \mathbb{R}^n. \end{array}$$
In view of Lemma 17.6, the maximum value of $x^\top A x$ on the unit sphere is equal to the largest eigenvalue $\lambda_1$ of the matrix $A$, and it is achieved for any unit eigenvector $u_1$ associated with $\lambda_1$.
A variant of the above problem often encountered in computer vision consists in maximizing $x^\top A x$ on the ellipsoid given by an equation of the form
$$x^\top B x = 1,$$
where $B$ is a symmetric positive definite matrix. Since $B$ is positive definite, it can be diagonalized as
$$B = Q D Q^\top,$$
where $Q$ is an orthogonal matrix and $D$ is a diagonal matrix,
$$D = \mathrm{diag}(d_1, \ldots, d_n),$$
with $d_i > 0$, for $i = 1, \ldots, n$. If we define the matrices $B^{1/2}$ and $B^{-1/2}$ by
$$B^{1/2} = Q \,\mathrm{diag}\big( \sqrt{d_1}, \ldots, \sqrt{d_n} \big)\, Q^\top$$
and
$$B^{-1/2} = Q \,\mathrm{diag}\big( 1/\sqrt{d_1}, \ldots, 1/\sqrt{d_n} \big)\, Q^\top,$$
it is clear that these matrices are symmetric, that $B^{-1/2} B B^{-1/2} = I$, and that $B^{1/2}$ and $B^{-1/2}$ are mutual inverses. Then, if we make the change of variable
$$x = B^{-1/2} y,$$
the equation $x^\top B x = 1$ becomes $y^\top y = 1$, and the optimization problem
$$\begin{array}{ll} \text{maximize} & x^\top A x \\ \text{subject to} & x^\top B x = 1,\ x \in \mathbb{R}^n, \end{array}$$
is equivalent to the problem
$$\begin{array}{ll} \text{maximize} & y^\top B^{-1/2} A B^{-1/2} y \\ \text{subject to} & y^\top y = 1,\ y \in \mathbb{R}^n. \end{array}$$
A complex version of our basic optimization problem, in which $A$ is a Hermitian matrix, is
$$\begin{array}{ll} \text{maximize} & x^* A x \\ \text{subject to} & x^* x = 1,\ x \in \mathbb{C}^n. \end{array}$$
Again by Lemma 17.6, the maximum value of $x^* A x$ on the unit sphere is equal to the largest eigenvalue $\lambda_1$ of the matrix $A$ and it is achieved for any unit eigenvector $u_1$ associated with $\lambda_1$.
It is worth pointing out that if $A$ is a skew-Hermitian matrix, that is, if $A^* = -A$, then $x^* A x$ is pure imaginary or zero.
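The change of variable $x = B^{-1/2} y$ used for the ellipsoid constraint can be traced on a small case. The sketch below assumes a diagonal $B = \mathrm{diag}(4, 1)$ (so that $Q = I$) and $A = \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}$, and it finds the top eigenpair of the $2 \times 2$ matrix $B^{-1/2} A B^{-1/2}$ in closed form. The data and variable names are illustrative assumptions.

```python
import math

A = [[2.0, 1.0],
     [1.0, 2.0]]
B_diag = [4.0, 1.0]                       # B = diag(4, 1), so Q = I in B = Q D Q^T

# B^{-1/2} for a diagonal B, and M = B^{-1/2} A B^{-1/2}.
s = [1.0 / math.sqrt(d) for d in B_diag]
M = [[s[i] * A[i][j] * s[j] for j in range(2)] for i in range(2)]

# Largest eigenvalue of the symmetric 2 x 2 matrix M and a unit eigenvector y.
lam1 = (M[0][0] + M[1][1]) / 2 + math.hypot((M[0][0] - M[1][1]) / 2, M[0][1])
y = (M[0][1], lam1 - M[0][0])             # valid because M[0][1] != 0 here
norm = math.hypot(y[0], y[1])
y = (y[0] / norm, y[1] / norm)

# Back to the original variable: x = B^{-1/2} y.
x = (s[0] * y[0], s[1] * y[1])
xBx = sum(B_diag[i] * x[i] ** 2 for i in range(2))
xAx = sum(x[i] * A[i][j] * x[j] for i in range(2) for j in range(2))
print(lam1, xBx, xAx)
```

The point $x$ lies on the ellipsoid $x^\top B x = 1$, and the value $x^\top A x$ it attains equals the largest eigenvalue of $B^{-1/2} A B^{-1/2}$.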
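When linear constraints of the form $C^\top x = 0$ are added, the problem becomes
$$\begin{array}{ll} \text{minimize} & x^\top A x \\ \text{subject to} & x^\top x = 1,\ C^\top x = 0,\ x \in \mathbb{R}^n. \end{array}$$
Golub shows that the linear constraint $C^\top x = 0$ can be eliminated as follows: If we use a QR decomposition of $C$, by permuting the columns, we may assume that
$$C = Q^\top \begin{pmatrix} R & S \\ 0 & 0 \end{pmatrix},$$
where $R$ is an $r \times r$ invertible upper triangular matrix and $S$ is an $r \times (p - r)$ matrix (assuming $C$ has rank $r$). Then if we let
$$x = Q^\top \begin{pmatrix} y \\ z \end{pmatrix},$$
where $y \in \mathbb{R}^r$ and $z \in \mathbb{R}^{n-r}$, then $C^\top x = 0$ becomes
$$\begin{pmatrix} R^\top & 0 \\ S^\top & 0 \end{pmatrix} Qx = \begin{pmatrix} R^\top & 0 \\ S^\top & 0 \end{pmatrix} \begin{pmatrix} y \\ z \end{pmatrix} = 0,$$
which implies $y = 0$, and our original problem becomes
$$\begin{array}{ll} \text{minimize} & (y^\top, z^\top) Q A Q^\top \begin{pmatrix} y \\ z \end{pmatrix} \\ \text{subject to} & z^\top z = 1,\ z \in \mathbb{R}^{n-r},\ y = 0,\ y \in \mathbb{R}^r. \end{array}$$
Thus, the constraint $C^\top x = 0$ has been eliminated, and if we write
$$Q A Q^\top = \begin{pmatrix} G_{11} & G_{12} \\ G_{12}^\top & G_{22} \end{pmatrix},$$
our problem becomes
$$\begin{array}{ll} \text{minimize} & z^\top G_{22} z \\ \text{subject to} & z^\top z = 1,\ z \in \mathbb{R}^{n-r}, \end{array}$$
a standard eigenvalue problem. If we let
$$J = \begin{pmatrix} 0 & 0 \\ 0 & I_{n-r} \end{pmatrix},$$
then
$$J Q A Q^\top J = \begin{pmatrix} 0 & 0 \\ 0 & G_{22} \end{pmatrix},$$
and if we set
$$P = Q^\top J Q,$$
then
$$P A P = Q^\top J Q A Q^\top J Q.$$
Now, $Q^\top J Q A Q^\top J Q$ and $J Q A Q^\top J$ have the same eigenvalues, so $P A P$ and $J Q A Q^\top J$ also have the same eigenvalues. It follows that the solutions of our optimization problem are among the eigenvalues of $K = P A P$, and at least $r$ of those are 0. Using the fact that $C C^+$ is the projection onto the range of $C$, where $C^+$ is the pseudo-inverse of $C$, it can also be shown that
$$P = I - C C^+,$$
the projection onto the kernel of $C^\top$. In particular, when $n \ge p$ and $C$ has full rank (the columns of $C$ are linearly independent), then we know that $C^+ = (C^\top C)^{-1} C^\top$ and
$$P = I - C (C^\top C)^{-1} C^\top.$$
This fact is used by Cour and Shi [25] and implicitly by Yu and Shi [116].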
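When $C$ consists of a single nonzero column $c$ (so $p = 1$), the projection reduces to $P = I - cc^\top / (c^\top c)$, which is easy to verify directly. The sketch below is an illustration under that assumption; the vector `c`, the test vector `x`, and the helper `apply` are hypothetical choices.

```python
c = [1.0, 2.0, 2.0]                      # a single constraint vector, C = c (n = 3, p = 1)
n = len(c)
cc = sum(ci * ci for ci in c)            # C^T C is the scalar c^T c

# P = I - C (C^T C)^{-1} C^T = I - c c^T / (c^T c).
P = [[(1.0 if i == j else 0.0) - c[i] * c[j] / cc for j in range(n)]
     for i in range(n)]

def apply(M, v):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

x = [3.0, -1.0, 4.0]
Px = apply(P, x)
PPx = apply(P, Px)                                    # projecting twice changes nothing
constraint = sum(ci * pi for ci, pi in zip(c, Px))    # C^T (P x), should vanish
print(Px, constraint)
```

The two checks correspond to the defining properties of the projection onto the kernel of $C^\top$: its image satisfies the constraint, and it is idempotent.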
The problem of adding affine constraints of the form $N^\top x = t$, where $t \neq 0$, also comes up in practice. At first glance, this problem may not seem harder than the linear problem in which $t = 0$, but it is. This problem was extensively studied in a paper by Gander, Golub, and von Matt [45] (1989).
Gander, Golub, and von Matt consider the following problem: Given an $(n + m) \times (n + m)$ real symmetric matrix $A$ (with $n > 0$), an $(n + m) \times m$ matrix $N$ with full rank, and a nonzero vector $t \in \mathbb{R}^m$ with $\|(N^\top)^\dagger t\| < 1$ (where $(N^\top)^\dagger$ denotes the pseudo-inverse of $N^\top$),
$$\begin{array}{ll} \text{minimize} & x^\top A x \\ \text{subject to} & x^\top x = 1,\ N^\top x = t,\ x \in \mathbb{R}^{n+m}. \end{array}$$
The condition $\|(N^\top)^\dagger t\| < 1$ ensures that the problem has a solution and is not trivial. The authors begin by proving that the affine constraint $N^\top x = t$ can be eliminated. One way to do so is to use a QR decomposition of $N$. If
$$N = P \begin{pmatrix} R \\ 0 \end{pmatrix},$$
where $P$ is an orthogonal matrix and $R$ is an $m \times m$ invertible upper triangular matrix, then if we observe that
$$\begin{aligned} x^\top A x &= x^\top P P^\top A P P^\top x, \\ N^\top x &= (R^\top, 0) P^\top x = t, \\ x^\top x &= x^\top P P^\top x = 1, \end{aligned}$$
and if we write
$$P^\top A P = \begin{pmatrix} B & \Gamma^\top \\ \Gamma & C \end{pmatrix} \quad \text{and} \quad P^\top x = \begin{pmatrix} y \\ z \end{pmatrix},$$
then we get
$$\begin{aligned} x^\top A x &= y^\top B y + 2 z^\top \Gamma y + z^\top C z, \\ R^\top y &= t, \\ y^\top y + z^\top z &= 1. \end{aligned}$$
Thus
$$y = (R^\top)^{-1} t,$$
and if we write
$$s^2 = 1 - y^\top y > 0$$
and
$$b = \Gamma y,$$
we get the simplified problem
$$\begin{array}{ll} \text{minimize} & z^\top C z + 2 z^\top b \\ \text{subject to} & z^\top z = s^2,\ z \in \mathbb{R}^n. \end{array}$$
18.4
Summary
The main concepts and results of this chapter are listed below:
• Quadratic optimization problems; quadratic functions.
• Symmetric positive definite and positive semidefinite matrices.
• The positive semidefinite cone ordering.
• Existence of a global minimum when $A$ is symmetric positive definite.
• Constrained quadratic optimization problems.
• Lagrange multipliers; Lagrangian.
• Primal and dual problems.
• Quadratic optimization problems: the case of a symmetric invertible matrix $A$.
• Quadratic optimization problems: the general case of a symmetric matrix $A$.
• Adding linear constraints of the form $C^\top x = 0$.
• Adding affine constraints of the form $C^\top x = t$, with $t \neq 0$.
• Maximizing a quadratic function over the unit sphere.
• Maximizing a quadratic function over an ellipsoid.
• Maximizing a Hermitian quadratic form.
• Adding linear constraints of the form $C^\top x = 0$.
• Adding affine constraints of the form $N^\top x = t$, with $t \neq 0$.
Chapter 19
Basics of Affine Geometry
L'algèbre n'est qu'une géométrie écrite; la géométrie n'est qu'une algèbre figurée. (Algebra is but written geometry; geometry is but figured algebra.)
Sophie Germain
19.1
Affine Spaces
Geometrically, curves and surfaces are usually considered to be sets of points with some
special properties, living in a space consisting of points. Typically, one is also interested
in geometric properties invariant under certain transformations, for example, translations,
rotations, projections, etc. One could model the space of points as a vector space, but this is
not very satisfactory for a number of reasons. One reason is that the point corresponding to
the zero vector (0), called the origin, plays a special role, when there is really no reason to have
a privileged origin. Another reason is that certain notions, such as parallelism, are handled
in an awkward manner. But the deeper reason is that vector spaces and affine spaces really
have different geometries. The geometric properties of a vector space are invariant under
the group of bijective linear maps, whereas the geometric properties of an affine space are
invariant under the group of bijective affine maps, and these two groups are not isomorphic.
Roughly speaking, there are more affine maps than linear maps.
Affine spaces provide a better framework for doing geometry. In particular, it is possible
to deal with points, curves, surfaces, etc., in an intrinsic manner, that is, independently
of any specific choice of a coordinate system. As in physics, this is highly desirable to
really understand what is going on. Of course, coordinate systems have to be chosen to
finally carry out computations, but one should learn to resist the temptation to resort to
coordinate systems until it is really necessary.
Affine spaces are the right framework for dealing with motions, trajectories, and physical
forces, among other things. Thus, affine geometry is crucial to a clean presentation of
kinematics, dynamics, and other parts of physics (for example, elasticity). After all, a rigid
motion is an affine map, but not a linear map in general. Also, given an $m \times n$ matrix A
and a vector $b \in \mathbb{R}^m$, the set $U = \{x \in \mathbb{R}^n \mid Ax = b\}$ of solutions of the system Ax = b is an
\[ b = a + \overrightarrow{ab}, \]
addition being understood as addition in $\mathbb{R}^3$. Then, in the standard frame, given a point
$x = (x_1, x_2, x_3)$, the position of x is the vector $\overrightarrow{Ox} = (x_1, x_2, x_3)$, which coincides with the
point itself. In the standard frame, points and vectors are identified. Points and free vectors
are illustrated in Figure 19.1.
[Figure 19.1: points and free vectors.]
\[ \overrightarrow{Ox} = (x_1, x_2, x_3) \]
in the frame $(O, (e_1, e_2, e_3))$ and
\[ \overrightarrow{\Omega x} = (x_1 - \omega_1, x_2 - \omega_2, x_3 - \omega_3) \]
in the frame $(\Omega, (e_1, e_2, e_3))$, since
\[ \overrightarrow{Ox} = \overrightarrow{O\Omega} + \overrightarrow{\Omega x} \quad\text{and}\quad \overrightarrow{O\Omega} = (\omega_1, \omega_2, \omega_3). \]
We note that in the second frame $(\Omega, (e_1, e_2, e_3))$, points and position vectors are no longer
identified. This gives us evidence that points are not vectors. It may be computationally
convenient to deal with points using position vectors, but such a treatment is not frame
invariant, which has undesirable effects.
Inspired by physics, we deem it important to define points and properties of points that
are frame invariant. An undesirable side effect of the present approach shows up if we attempt
to define linear combinations of points. First, let us review the notion of linear combination
of vectors. Given two vectors u and v of coordinates $(u_1, u_2, u_3)$ and $(v_1, v_2, v_3)$ with respect
to the basis $(e_1, e_2, e_3)$, for any two scalars $\lambda, \mu$, we can define the linear combination $\lambda u + \mu v$
as the vector of coordinates
\[ (\lambda u_1 + \mu v_1,\; \lambda u_2 + \mu v_2,\; \lambda u_3 + \mu v_3). \]
If we choose a different basis $(e'_1, e'_2, e'_3)$ and if the matrix P expressing the vectors $(e'_1, e'_2, e'_3)$
over the basis $(e_1, e_2, e_3)$ is
\[ P = \begin{pmatrix} a_1 & b_1 & c_1 \\ a_2 & b_2 & c_2 \\ a_3 & b_3 & c_3 \end{pmatrix}, \]
which means that the columns of P are the coordinates of the $e'_j$ over the basis $(e_1, e_2, e_3)$,
since
\[ u_1 e_1 + u_2 e_2 + u_3 e_3 = u'_1 e'_1 + u'_2 e'_2 + u'_3 e'_3 \]
and
\[ v_1 e_1 + v_2 e_2 + v_3 e_3 = v'_1 e'_1 + v'_2 e'_2 + v'_3 e'_3, \]
it is easy to see that the coordinates $(u_1, u_2, u_3)$ and $(v_1, v_2, v_3)$ of u and v with respect to
the basis $(e_1, e_2, e_3)$ are given in terms of the coordinates $(u'_1, u'_2, u'_3)$ and $(v'_1, v'_2, v'_3)$ of u and
v with respect to the basis $(e'_1, e'_2, e'_3)$ by the matrix equations
\[ \begin{pmatrix} u_1 \\ u_2 \\ u_3 \end{pmatrix} = P \begin{pmatrix} u'_1 \\ u'_2 \\ u'_3 \end{pmatrix} \quad\text{and}\quad \begin{pmatrix} v_1 \\ v_2 \\ v_3 \end{pmatrix} = P \begin{pmatrix} v'_1 \\ v'_2 \\ v'_3 \end{pmatrix}. \]
From the above, we get
\[ \begin{pmatrix} u'_1 \\ u'_2 \\ u'_3 \end{pmatrix} = P^{-1} \begin{pmatrix} u_1 \\ u_2 \\ u_3 \end{pmatrix} \quad\text{and}\quad \begin{pmatrix} v'_1 \\ v'_2 \\ v'_3 \end{pmatrix} = P^{-1} \begin{pmatrix} v_1 \\ v_2 \\ v_3 \end{pmatrix}, \]
and thus, by linearity,
\[ \begin{pmatrix} \lambda u'_1 + \mu v'_1 \\ \lambda u'_2 + \mu v'_2 \\ \lambda u'_3 + \mu v'_3 \end{pmatrix} = \lambda P^{-1} \begin{pmatrix} u_1 \\ u_2 \\ u_3 \end{pmatrix} + \mu P^{-1} \begin{pmatrix} v_1 \\ v_2 \\ v_3 \end{pmatrix} = P^{-1} \begin{pmatrix} \lambda u_1 + \mu v_1 \\ \lambda u_2 + \mu v_2 \\ \lambda u_3 + \mu v_3 \end{pmatrix}. \]
Everything worked out because the change of basis does not involve a change of origin. On the
other hand, if we consider the change of frame from the frame $(O, (e_1, e_2, e_3))$ to the frame
unless $\lambda + \mu = 1$.
Thus, we have discovered a major difference between vectors and points: The notion of
linear combination of vectors is basis independent, but the notion of linear combination of
points is frame dependent. In order to salvage the notion of linear combination of points,
some restriction is needed: The scalar coefficients must add up to 1.
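The frame dependence described above can be checked numerically. The following is only a sketch under stated assumptions: the points a, b, the second origin Omega, and the helper combine_in_frame are hypothetical names chosen for illustration, and both frames share the same basis. The helper forms the combination on position vectors relative to a chosen origin and then maps the result back to a point.

```python
from fractions import Fraction as F

def combine_in_frame(weights, points, origin):
    """Form sum_i w_i * (p_i - origin) on position vectors, then map back to a point."""
    dim = len(origin)
    return tuple(
        origin[k] + sum(w * (p[k] - origin[k]) for w, p in zip(weights, points))
        for k in range(dim)
    )

# Hypothetical sample data (not from the text).
a = (F(1), F(0), F(2))
b = (F(3), F(4), F(0))
O = (F(0), F(0), F(0))        # origin of the standard frame
Omega = (F(1), F(1), F(1))    # origin of a second frame with the same basis

affine_weights = (F(1, 3), F(2, 3))   # coefficients add up to 1
other_weights = (F(1), F(1))          # coefficients add up to 2

same_1 = combine_in_frame(affine_weights, (a, b), O)
same_2 = combine_in_frame(affine_weights, (a, b), Omega)
diff_1 = combine_in_frame(other_weights, (a, b), O)
diff_2 = combine_in_frame(other_weights, (a, b), Omega)

print(same_1 == same_2)   # the two frames agree when the weights sum to 1
print(diff_1 == diff_2)   # they disagree otherwise
```

When the weights sum to s, the two results differ by (1 - s) times the vector from O to Omega, which is why the discrepancy vanishes exactly when s = 1.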
A clean way to handle the problem of frame invariance and to deal with points in a more
intrinsic manner is to make a clearer distinction between points and vectors. We duplicate
$\mathbb{R}^3$ into two copies, the first copy corresponding to points, where we forget the vector space
structure, and the second copy corresponding to free vectors, where the vector space structure
is important. Furthermore, we make explicit the important fact that the vector space $\mathbb{R}^3$ acts
on the set of points $\mathbb{R}^3$: Given any point $a = (a_1, a_2, a_3)$ and any vector $v = (v_1, v_2, v_3)$,
we obtain the point
\[ a + v = (a_1 + v_1, a_2 + v_2, a_3 + v_3), \]
which can be thought of as the result of translating a to b using the vector v. We can imagine
that v is placed such that its origin coincides with a and that its tip coincides with b. This
action $+\colon \mathbb{R}^3 \times \mathbb{R}^3 \to \mathbb{R}^3$ satisfies some crucial properties. For example,
\[ a + 0 = a, \]
\[ (a + u) + v = a + (u + v), \]
and for any two points a, b, there is a unique free vector $\overrightarrow{ab}$ such that
\[ b = a + \overrightarrow{ab}. \]
It turns out that the above properties, although trivial in the case of $\mathbb{R}^3$, are all that is
needed to define the abstract notion of affine space (or affine structure). The basic idea is
to consider two (distinct) sets E and $\overrightarrow{E}$, where E is a set of points (with no structure) and
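Intuitively, we can think of the elements of $\overrightarrow{E}$ as forces moving the points in E, considered
a translation. By this, we mean that for every force $u \in \overrightarrow{E}$, the action of the force u is to
move every point $a \in E$ to the point $a + u \in E$ obtained by the translation corresponding
or a triple $\langle E, \overrightarrow{E}, + \rangle$ consisting of a nonempty set E (of points), a vector space $\overrightarrow{E}$ (of trans
\[ b = a + \overrightarrow{ab} \]
The dimension of the affine space $\langle E, \overrightarrow{E}, + \rangle$ is the dimension $\dim(\overrightarrow{E})$ of the vector space
Conditions (A1) and (A2) say that the (abelian) group $\overrightarrow{E}$ acts on E, and condition (A3)
\[ \overrightarrow{a(a + v)} = v \]
for all $a \in E$ and all $v \in \overrightarrow{E}$, since $\overrightarrow{a(a + v)}$ is the unique vector such that $a + v = a + \overrightarrow{a(a + v)}$.
[Figure: points a, b = a + u, c = a + w of E and vectors u, v, w of $\overrightarrow{E}$.]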
The axioms defining an affine space $\langle E, \overrightarrow{E}, + \rangle$ can be interpreted intuitively as saying
that E and $\overrightarrow{E}$ are two different ways of looking at the same object, but wearing different
sets of glasses, the second set of glasses depending on the choice of an origin in E. Indeed,
we can choose to look at the points in E, forgetting that every pair (a, b) of points defines a
unique vector $\overrightarrow{ab}$ in $\overrightarrow{E}$, or we can choose to look at the vectors u in $\overrightarrow{E}$, forgetting the points
in E. Furthermore, if we also pick any point a in E, a point that can be viewed as an origin
in E, then we can recover all the points in E as the translated points a + u for all $u \in \overrightarrow{E}$.
\[ b \mapsto \overrightarrow{ab}, \]
where $b \in E$. The composition of the first mapping with the second is
\[ u \mapsto a + u \mapsto \overrightarrow{a(a + u)}, \]
which, in view of (A3), yields u. The composition of the second with the first mapping is
\[ b \mapsto \overrightarrow{ab} \mapsto a + \overrightarrow{ab}, \]
which, in view of (A3), yields b. Thus, these compositions are the identity from $\overrightarrow{E}$ to $\overrightarrow{E}$
and the identity from E to E, and the mappings are both bijections.
When we identify E with $\overrightarrow{E}$ via the mapping $b \mapsto \overrightarrow{ab}$, we say that we consider E as the
vector space obtained by taking a as the origin in E, and we denote it by $E_a$. Because $E_a$ is
a vector space, to be consistent with our notational conventions we should use the notation
Thus, an affine space $\langle E, \overrightarrow{E}, + \rangle$ is a way of defining a vector space structure on a set of
points E, without making a commitment to a fixed origin in E. Nevertheless, as soon as
we commit to an origin a in E, we can view E as the vector space $E_a$. However, we urge
the reader to think of E as a physical set of points and of $\overrightarrow{E}$ as a set of forces acting on E,
rather than reducing E to some isomorphic copy of $\mathbb{R}^n$. After all, points are points, and not
vectors! For notational simplicity, we will often denote an affine space $\langle E, \overrightarrow{E}, + \rangle$ by $(E, \overrightarrow{E})$,
or even by E. The vector space $\overrightarrow{E}$ is called the vector space associated with E.
One should be careful about the overloading of the addition symbol +. Addition
is well-defined on vectors, as in u + v; the translate a + u of a point $a \in E$ by a
vector $u \in \overrightarrow{E}$ is also well-defined, but addition of points a + b does not make sense. In
this respect, the notation $b - a$ for the unique vector u such that b = a + u is somewhat
confusing, since it suggests that points can be subtracted (but not added!).
Any vector space $\overrightarrow{E}$ has an affine space structure specified by choosing $E = \overrightarrow{E}$, and
letting + be addition in the vector space $\overrightarrow{E}$. We will refer to the affine structure $\langle \overrightarrow{E}, \overrightarrow{E}, + \rangle$
vector space $\mathbb{R}^n$ can be viewed as the affine space $\langle \mathbb{R}^n, \mathbb{R}^n, + \rangle$, denoted by $\mathbb{A}^n$. In general,
if K is any field, the affine space $\langle K^n, K^n, + \rangle$ is denoted by $\mathbb{A}^n_K$. In order to distinguish
between the double role played by members of $\mathbb{R}^n$, points and vectors, we will denote points
by row vectors, and vectors by column vectors. Thus, the action of the vector space $\mathbb{R}^n$ over
the set $\mathbb{R}^n$ simply viewed as a set of points is given by
\[ (a_1, \ldots, a_n) + \begin{pmatrix} u_1 \\ \vdots \\ u_n \end{pmatrix} = (a_1 + u_1, \ldots, a_n + u_n). \]
We will also use the convention that if $x = (x_1, \ldots, x_n) \in \mathbb{R}^n$, then the column vector
associated with x is denoted by $\mathbf{x}$ (in boldface notation). Abusing the notation slightly, if
$a \in \mathbb{R}^n$ is a point, we also write $a \in \mathbb{A}^n$. The affine space $\mathbb{A}^n$ is called the real affine space of
dimension n. In most cases, we will consider n = 1, 2, 3.
19.2
Let us now give an example of an affine space that is not given as a vector space (at least, not
in an obvious fashion). Consider the subset L of $\mathbb{A}^2$ consisting of all points (x, y) satisfying
the equation
\[ x + y - 1 = 0. \]
The set L is the line of slope $-1$ passing through the points (1, 0) and (0, 1) shown in Figure
19.3.
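[Figure: points a, b, c and the vectors $\overrightarrow{ab}$, $\overrightarrow{bc}$, $\overrightarrow{ac}$.]
19.3 Chasles's Identity
\[ c = b + \overrightarrow{bc} = (a + \overrightarrow{ab}) + \overrightarrow{bc} = a + (\overrightarrow{ab} + \overrightarrow{bc}) \]
by (A2), and thus, by (A3),
\[ \overrightarrow{ab} + \overrightarrow{bc} = \overrightarrow{ac}, \]
which is known as Chasles's identity. Since $a = a + \overrightarrow{aa}$ and $a = a + 0$, by (A3) we get
\[ \overrightarrow{aa} = 0. \]
Thus, letting a = c in Chasles's identity, we get
\[ \overrightarrow{ba} = -\overrightarrow{ab}. \]
Given any four points $a, b, c, d \in E$, since by Chasles's identity
\[ \overrightarrow{ab} + \overrightarrow{bc} = \overrightarrow{ad} + \overrightarrow{dc} = \overrightarrow{ac}, \]
we have
\[ \overrightarrow{ab} = \overrightarrow{dc} \quad\text{iff}\quad \overrightarrow{bc} = \overrightarrow{ad}. \]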
19.4
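(1) If $\sum_{i \in I} \lambda_i = 1$, then
\[ a + \sum_{i \in I} \lambda_i \overrightarrow{aa_i} = b + \sum_{i \in I} \lambda_i \overrightarrow{ba_i}. \]
(2) If $\sum_{i \in I} \lambda_i = 0$, then
\[ \sum_{i \in I} \lambda_i \overrightarrow{aa_i} = \sum_{i \in I} \lambda_i \overrightarrow{ba_i}. \]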
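(1) By Chasles's identity, we have
\[
\begin{aligned}
a + \sum_{i \in I} \lambda_i \overrightarrow{aa_i}
  &= a + \sum_{i \in I} \lambda_i (\overrightarrow{ab} + \overrightarrow{ba_i}) \\
  &= a + \Bigl(\sum_{i \in I} \lambda_i\Bigr) \overrightarrow{ab} + \sum_{i \in I} \lambda_i \overrightarrow{ba_i} \\
  &= a + \overrightarrow{ab} + \sum_{i \in I} \lambda_i \overrightarrow{ba_i} && \text{since } \sum_{i \in I} \lambda_i = 1 \\
  &= b + \sum_{i \in I} \lambda_i \overrightarrow{ba_i} && \text{since } b = a + \overrightarrow{ab}.
\end{aligned}
\]
(2) We also have
\[
\begin{aligned}
\sum_{i \in I} \lambda_i \overrightarrow{aa_i}
  &= \sum_{i \in I} \lambda_i (\overrightarrow{ab} + \overrightarrow{ba_i}) \\
  &= \Bigl(\sum_{i \in I} \lambda_i\Bigr) \overrightarrow{ab} + \sum_{i \in I} \lambda_i \overrightarrow{ba_i} \\
  &= \sum_{i \in I} \lambda_i \overrightarrow{ba_i},
\end{aligned}
\]
since $\sum_{i \in I} \lambda_i = 0$.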
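Thus, by Lemma 19.1, for any family of points $(a_i)_{i \in I}$ in E, for any family $(\lambda_i)_{i \in I}$ of
scalars such that $\sum_{i \in I} \lambda_i = 1$, the point
\[ x = a + \sum_{i \in I} \lambda_i \overrightarrow{aa_i} \]
is independent of the choice of the origin $a \in E$. This property motivates the following
definition.
Definition 19.2. For any family of points $(a_i)_{i \in I}$ in E, for any family $(\lambda_i)_{i \in I}$ of scalars such
that $\sum_{i \in I} \lambda_i = 1$, and for any $a \in E$, the point
\[ a + \sum_{i \in I} \lambda_i \overrightarrow{aa_i} \]
(which is independent of a) is called the barycenter of the points $a_i$ assigned the weights $\lambda_i$,
and it is denoted by $\sum_{i \in I} \lambda_i a_i$. The barycenter x is the unique point such that
\[ \overrightarrow{ax} = \sum_{i \in I} \lambda_i \overrightarrow{aa_i} \quad\text{for every } a \in E. \]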
(3) When $\sum_{i \in I} \lambda_i = 0$, the vector $\sum_{i \in I} \lambda_i \overrightarrow{aa_i}$ does not depend on the point a, and we may
denote it by $\sum_{i \in I} \lambda_i a_i$. This observation will be used to define a vector space in which
linear combinations of both points and vectors make sense, regardless of the value of
$\sum_{i \in I} \lambda_i$.
Figure 19.5 illustrates the geometric construction of the barycenters $g_1$ and $g_2$ of the
weighted points $(a, \tfrac14)$, $(b, \tfrac14)$, and $(c, \tfrac12)$, and $(a, -1)$, $(b, 1)$, and $(c, 1)$.
The point $g_1$ can be constructed geometrically as the middle of the segment joining c to
the middle $\tfrac12 a + \tfrac12 b$ of the segment (a, b), since
\[ g_1 = \frac12 \left( \frac12 a + \frac12 b \right) + \frac12 c. \]
The point $g_2$ can be constructed geometrically as the point such that the middle $\tfrac12 b + \tfrac12 c$ of
the segment (b, c) is the middle of the segment $(a, g_2)$, since
\[ g_2 = -a + 2 \left( \frac12 b + \frac12 c \right). \]
[Figure 19.5: the barycenters $g_1$ and $g_2$ of the weighted points above.]
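The two constructions above can be reproduced with exact arithmetic. This is a minimal sketch, assuming sample coordinates for a, b, c in the plane (they are not from the text) and a hypothetical helper barycenter that evaluates $a + \sum_i \lambda_i \overrightarrow{aa_i}$ for a chosen origin. It also illustrates that the result does not depend on that origin.

```python
from fractions import Fraction as F

def barycenter(weighted_points, origin):
    """Return origin + sum_i w_i * (p_i - origin); the weights must add up to 1."""
    if sum(w for _, w in weighted_points) != 1:
        raise ValueError("weights must add up to 1")
    dim = len(origin)
    return tuple(
        origin[k] + sum(w * (p[k] - origin[k]) for p, w in weighted_points)
        for k in range(dim)
    )

# Hypothetical sample points.
a, b, c = (F(0), F(0)), (F(4), F(0)), (F(0), F(4))

family_1 = [(a, F(1, 4)), (b, F(1, 4)), (c, F(1, 2))]
family_2 = [(a, F(-1)), (b, F(1)), (c, F(1))]

g1 = barycenter(family_1, origin=a)
g2 = barycenter(family_2, origin=a)

# Same families, computed from an unrelated origin.
far = (F(7), F(-3))
g1_far = barycenter(family_1, origin=far)
g2_far = barycenter(family_2, origin=far)

print(g1 == g1_far, g2 == g2_far)
```

With these coordinates the middle of (b, c) is (2, 2), which is also the middle of (a, g2), matching the geometric description.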
Such a curve is called a Bézier curve, and (a, b, c, d) are called its control points. Note that
the curve passes through a and d, but generally not through b and c. It can be shown
that any point F(t) on the curve can be constructed using an algorithm performing affine
interpolation steps (the de Casteljau algorithm).
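The de Casteljau algorithm mentioned above can be sketched as repeated affine interpolation between consecutive points. The code below is an illustration only: the control points are hypothetical, and the function names lerp and de_casteljau are not from the text. Each step replaces n points by n - 1 barycenters with weights (1 - t, t).

```python
from fractions import Fraction as F

def lerp(p, q, t):
    """Affine interpolation (1 - t) p + t q of two points."""
    return tuple((1 - t) * pk + t * qk for pk, qk in zip(p, q))

def de_casteljau(control_points, t):
    """Evaluate the Bezier curve with the given control points at parameter t."""
    pts = list(control_points)
    while len(pts) > 1:
        # One round of affine interpolation between consecutive points.
        pts = [lerp(p, q, t) for p, q in zip(pts, pts[1:])]
    return pts[0]

# Hypothetical control points (a, b, c, d) of a cubic curve.
a, b, c, d = (0, 0), (1, 2), (3, 2), (4, 0)

start = de_casteljau([a, b, c, d], 0)
end = de_casteljau([a, b, c, d], 1)
middle = de_casteljau([a, b, c, d], F(1, 2))
print(start, end, middle)
```

At t = 1/2 the result agrees with the barycenter of (a, b, c, d) with weights (1/8, 3/8, 3/8, 1/8), the values of the cubic Bernstein polynomials at 1/2.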
19.5 Affine Subspaces
Definition 19.3. Given an affine space $\langle E, \overrightarrow{E}, + \rangle$, a subset V of E is an affine subspace (of
$\langle E, \overrightarrow{E}, + \rangle$) if for every family of weighted points $((a_i, \lambda_i))_{i \in I}$ in V such that $\sum_{i \in I} \lambda_i = 1$,
the barycenter $\sum_{i \in I} \lambda_i a_i$ belongs to V.
An affine subspace is also called a flat by some authors. According to Definition 19.3,
the empty set is trivially an affine subspace, and every intersection of affine subspaces is an
affine subspace.
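As an example, consider the subset U of $\mathbb{R}^2$ defined by
\[ U = \{ (x, y) \in \mathbb{R}^2 \mid ax + by = c \}, \]
i.e., the set of solutions of the equation
\[ ax + by = c, \]
where it is assumed that $a \neq 0$ or $b \neq 0$. Given any m points $(x_i, y_i) \in U$ and any m scalars
$\lambda_i$ such that $\lambda_1 + \cdots + \lambda_m = 1$, we claim that
\[ \sum_{i=1}^m \lambda_i (x_i, y_i) \in U. \]
Indeed, $(x_i, y_i) \in U$ means that
\[ a x_i + b y_i = c, \]
and if we multiply both sides of this equation by $\lambda_i$ and add up the resulting m equations,
we get
\[ \sum_{i=1}^m (\lambda_i a x_i + \lambda_i b y_i) = \sum_{i=1}^m \lambda_i c, \]
and since $\lambda_1 + \cdots + \lambda_m = 1$, we get
\[ a \left( \sum_{i=1}^m \lambda_i x_i \right) + b \left( \sum_{i=1}^m \lambda_i y_i \right) = \left( \sum_{i=1}^m \lambda_i \right) c = c, \]
which shows that
\[ \left( \sum_{i=1}^m \lambda_i x_i,\ \sum_{i=1}^m \lambda_i y_i \right) = \sum_{i=1}^m \lambda_i (x_i, y_i) \in U. \]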
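\[ \overrightarrow{U} = \{ (x, y) \in \mathbb{R}^2 \mid ax + by = 0 \}, \]
i.e., the set of solutions of the homogeneous equation
\[ ax + by = 0 \]
\[ \sum_{i=1}^m \lambda_i (x_i, y_i) \in \overrightarrow{U}, \]
this time without any restriction on the $\lambda_i$, since the right-hand side of the equation is
zero. For any $(x_0, y_0) \in U$, we have
\[ U = (x_0, y_0) + \overrightarrow{U}, \]
where
\[ (x_0, y_0) + \overrightarrow{U} = \{ (x_0 + u_1, y_0 + u_2) \mid (u_1, u_2) \in \overrightarrow{U} \}. \]
First, $(x_0, y_0) + \overrightarrow{U} \subseteq U$, since $a x_0 + b y_0 = c$ and $a u_1 + b u_2 = 0$ for all $(u_1, u_2) \in \overrightarrow{U}$. Second,
if $(x, y) \in U$, then ax + by = c, and since we also have $a x_0 + b y_0 = c$, by subtraction, we get
\[ a(x - x_0) + b(y - y_0) = 0, \]
which shows that $(x - x_0, y - y_0) \in \overrightarrow{U}$, and thus $(x, y) \in (x_0, y_0) + \overrightarrow{U}$. Hence, we also have
\[ U = (x_0, y_0) + \overrightarrow{U}. \]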
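The argument above can be checked on a concrete line. The following is a small sketch under assumptions: the coefficients 2x + 3y = 6, the sample points, and the helper names in_U and in_direction are all hypothetical. It confirms that a combination with weights summing to 1 stays in U, that a combination with weights summing to 2 leaves U, and that differences of points of U solve the homogeneous equation.

```python
from fractions import Fraction as F

# Hypothetical coefficients of the line a x + b y = c.
a, b, c = F(2), F(3), F(6)

def in_U(p):
    """Membership in the affine line a x + b y = c."""
    return a * p[0] + b * p[1] == c

def in_direction(u):
    """Membership in the direction, i.e. the homogeneous line a x + b y = 0."""
    return a * u[0] + b * u[1] == 0

def combine(weights, points):
    return tuple(sum(w * p[k] for w, p in zip(weights, points)) for k in range(2))

points = [(F(3), F(0)), (F(0), F(2)), (F(-3), F(4))]   # three points of U

affine_combo = combine([F(1, 2), F(3, 2), F(-1)], points)   # weights sum to 1
bad_combo = combine([F(1), F(1), F(0)], points)             # weights sum to 2

difference = tuple(p - q for p, q in zip(points[0], points[1]))
translated = tuple(p + 5 * u for p, u in zip(points[0], difference))

print(in_U(affine_combo), in_U(bad_combo), in_direction(difference), in_U(translated))
```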
More generally, it is easy to prove the following fact. Given any $m \times n$ matrix A and any
vector $b \in \mathbb{R}^m$, the subset U of $\mathbb{R}^n$ defined by
\[ U = \{ x \in \mathbb{R}^n \mid Ax = b \} \]
is an affine subspace of $\mathbb{A}^n$.
Actually, observe that Ax = b should really be written as $A x^\top = b$, to be consistent with
our convention that points are represented by row vectors. We can also use the boldface
notation for column vectors, in which case the equation is written as $A\mathbf{x} = b$. For the sake of
minimizing the amount of notation, we stick to the simpler (yet incorrect) notation Ax = b.
If we consider the corresponding homogeneous equation Ax = 0, the set
\[ \overrightarrow{U} = \{ x \in \mathbb{R}^n \mid Ax = 0 \} \]
is a subspace of $\mathbb{R}^n$, and for any $x_0 \in U$, we have
\[ U = x_0 + \overrightarrow{U}. \]
E . Let V be a nonempty subset of E. For every family (a1 , . . . , an ) in V , for any family
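$(\lambda_1, \ldots, \lambda_n)$ of scalars, and for every point $a \in V$, observe that for every $x \in E$,
\[ x = a + \sum_{i=1}^n \lambda_i \overrightarrow{aa_i} \]
iff x is the barycenter of the family of weighted points
$((a_1, \lambda_1), \ldots, (a_n, \lambda_n), (a, 1 - \sum_{i=1}^n \lambda_i))$,
since
\[ \sum_{i=1}^n \lambda_i + \left( 1 - \sum_{i=1}^n \lambda_i \right) = 1. \]
Given any point $a \in E$ and any subset $\overrightarrow{V}$ of $\overrightarrow{E}$, let $a + \overrightarrow{V}$ denote the following subset of E:
\[ a + \overrightarrow{V} = \{ a + v \mid v \in \overrightarrow{V} \}. \]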
V =a+ V
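Lemma 19.2. Let $\langle E, \overrightarrow{E}, + \rangle$ be an affine space.
(1) A nonempty subset V of E is an affine subspace iff for every point $a \in V$, the set
\[ \overrightarrow{V_a} = \{ \overrightarrow{ax} \mid x \in V \} \]
is a subspace of $\overrightarrow{E}$. Consequently, $V = a + \overrightarrow{V_a}$. Furthermore,
\[ \overrightarrow{V} = \{ \overrightarrow{xy} \mid x, y \in V \} \]
is a subspace of $\overrightarrow{E}$ and $\overrightarrow{V_a} = \overrightarrow{V}$ for all $a \in V$. Thus, $V = a + \overrightarrow{V}$.
(2) For any subspace $\overrightarrow{V}$ of $\overrightarrow{E}$ and for any $a \in E$, the set $V = a + \overrightarrow{V}$ is an affine subspace.
Proof. The proof is straightforward, and is omitted. It is also given in Gallier [43].
In particular, when E is the natural affine space associated with a vector space $\overrightarrow{E}$, Lemma
19.2 shows that every affine subspace of E is of the form $u + \overrightarrow{U}$, for a subspace $\overrightarrow{U}$ of $\overrightarrow{E}$.
It is also clear that the map $+\colon V \times \overrightarrow{V} \to V$ induced by $+\colon E \times \overrightarrow{E} \to E$ confers to $\langle V, \overrightarrow{V}, + \rangle$ an
affine structure. Figure 19.7 illustrates the notion of affine subspace.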
We say that two affine subspaces U and V are parallel if their directions are identical.
that $\overrightarrow{ac} = \lambda \overrightarrow{ab}$, and we define the ratio
\[ \frac{\overrightarrow{ac}}{\overrightarrow{ab}} = \lambda. \]
Lemma 19.3. Given an affine space $\langle E, \overrightarrow{E}, + \rangle$, for any family $(a_i)_{i \in I}$ of points in E, the
set V of barycenters $\sum_{i \in I} \lambda_i a_i$ (where $\sum_{i \in I} \lambda_i = 1$) is the smallest affine subspace containing
$(a_i)_{i \in I}$.
Proof. If $(a_i)_{i \in I}$ is empty, then $V = \emptyset$, because of the condition $\sum_{i \in I} \lambda_i = 1$. If $(a_i)_{i \in I}$ is
nonempty, then the smallest affine subspace containing $(a_i)_{i \in I}$ must contain the set V of
barycenters $\sum_{i \in I} \lambda_i a_i$, and thus, it is enough to show that V is closed under affine combinations, which is immediately verified.
Given a nonempty subset S of E, the smallest affine subspace of E generated by S is
often denoted by $\langle S \rangle$. For example, a line specified by two distinct points a and b is denoted
by $\langle a, b \rangle$, or even (a, b), and similarly for planes, etc.
Remarks:
(1) Since it can be shown that the barycenter of n weighted points can be obtained by
repeated computations of barycenters of two weighted points, a nonempty subset V
of E is an affine subspace iff for every two points $a, b \in V$, the set V contains all
barycentric combinations of a and b. If V contains at least two points, then V is an
affine subspace iff for any two distinct points $a, b \in V$, the set V contains the line
determined by a and b, that is, the set of all points $(1 - \lambda)a + \lambda b$, $\lambda \in \mathbb{R}$.
(2) This result still holds if the field K has at least three distinct elements, but the proof
is trickier!
19.6
Corresponding to the notion of linear independence in vector spaces, we have the notion of
affine independence. Given a family (ai )iI of points in an affine space E, we will reduce the
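notion of (affine) independence of these points to the (linear) independence of the families
$(\overrightarrow{a_i a_j})_{j \in (I - \{i\})}$ of vectors obtained by choosing any $a_i$ as an origin. First, the following lemma
shows that it is sufficient to consider only one of these families.
Lemma 19.4. Given an affine space $\langle E, \overrightarrow{E}, + \rangle$, let $(a_i)_{i \in I}$ be a family of points in E. If the
family $(\overrightarrow{a_i a_j})_{j \in (I - \{i\})}$ is linearly independent for some $i \in I$, then $(\overrightarrow{a_i a_j})_{j \in (I - \{i\})}$ is linearly
independent for every $i \in I$.
Proof. Assume that the family $(\overrightarrow{a_i a_j})_{j \in (I - \{i\})}$ is linearly independent for some specific $i \in I$.
Let $k \in I$ with $k \neq i$, and assume that there are some scalars $(\lambda_j)_{j \in (I - \{k\})}$ such that
\[ \sum_{j \in (I - \{k\})} \lambda_j \overrightarrow{a_k a_j} = 0. \]
Since
\[ \overrightarrow{a_k a_j} = \overrightarrow{a_k a_i} + \overrightarrow{a_i a_j}, \]
we have
\[
\begin{aligned}
\sum_{j \in (I - \{k\})} \lambda_j \overrightarrow{a_k a_j}
  &= \sum_{j \in (I - \{k\})} \lambda_j \overrightarrow{a_k a_i} + \sum_{j \in (I - \{k\})} \lambda_j \overrightarrow{a_i a_j} \\
  &= \sum_{j \in (I - \{k\})} \lambda_j \overrightarrow{a_k a_i} + \sum_{j \in (I - \{i, k\})} \lambda_j \overrightarrow{a_i a_j} \\
  &= \sum_{j \in (I - \{i, k\})} \lambda_j \overrightarrow{a_i a_j} - \Bigl( \sum_{j \in (I - \{k\})} \lambda_j \Bigr) \overrightarrow{a_i a_k},
\end{aligned}
\]
and thus
\[ \sum_{j \in (I - \{i, k\})} \lambda_j \overrightarrow{a_i a_j} - \Bigl( \sum_{j \in (I - \{k\})} \lambda_j \Bigr) \overrightarrow{a_i a_k} = 0. \]
Since the family $(\overrightarrow{a_i a_j})_{j \in (I - \{i\})}$ is linearly independent, we must have $\lambda_j = 0$ for all
$j \in (I - \{i, k\})$ and $\sum_{j \in (I - \{k\})} \lambda_j = 0$, which implies that $\lambda_j = 0$ for all $j \in (I - \{k\})$.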
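Definition 19.4. Given an affine space $\langle E, \overrightarrow{E}, + \rangle$, a family $(a_i)_{i \in I}$ of points in E is affinely
independent if the family $(\overrightarrow{a_i a_j})_{j \in (I - \{i\})}$ is linearly independent for some $i \in I$.
Definition 19.4 is reasonable, because by Lemma 19.4, the independence of the family
$(\overrightarrow{a_i a_j})_{j \in (I - \{i\})}$ does not depend on the choice of $a_i$. A crucial property of linearly independent
vectors $(u_1, \ldots, u_m)$ is that if a vector v is a linear combination
\[ v = \sum_{i=1}^m \lambda_i u_i \]
of the $u_i$, then the $\lambda_i$ are unique. A similar result holds for affinely independent points.
[Figure 19.8: affine independence; the points $a_0, a_1, a_2$ of E and the vectors $\overrightarrow{a_0 a_1}$, $\overrightarrow{a_0 a_2}$ of $\overrightarrow{E}$.]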
Lemma 19.5. Given an affine space $\langle E, \overrightarrow{E}, + \rangle$, let $(a_0, \ldots, a_m)$ be a family of m + 1 points
in E. Let $x \in E$, and assume that $x = \sum_{i=0}^m \lambda_i a_i$, where $\sum_{i=0}^m \lambda_i = 1$. Then, the family
$(\lambda_0, \ldots, \lambda_m)$ such that $x = \sum_{i=0}^m \lambda_i a_i$ is unique iff the family $(\overrightarrow{a_0 a_1}, \ldots, \overrightarrow{a_0 a_m})$ is linearly
independent.
Proof. The proof is straightforward and is omitted. It is also given in Gallier [43].
Lemma 19.5 suggests the notion of affine frame. Affine frames are the affine analogues
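of bases in vector spaces. Let $\langle E, \overrightarrow{E}, + \rangle$ be a nonempty affine space, and let $(a_0, \ldots, a_m)$
be a family of m + 1 points in E. The family $(a_0, \ldots, a_m)$ determines the family of m
vectors $(\overrightarrow{a_0 a_1}, \ldots, \overrightarrow{a_0 a_m})$ in $\overrightarrow{E}$. Conversely, given a point $a_0$ in E and a family of m vectors
$(u_1, \ldots, u_m)$ in $\overrightarrow{E}$, we obtain the family of m + 1 points $(a_0, \ldots, a_m)$ in E, where $a_i = a_0 + u_i$,
$1 \le i \le m$. Thus, it is equivalent to consider a family of m + 1 points $(a_0, \ldots, a_m)$ in
E, and a pair $(a_0, (u_1, \ldots, u_m))$, where the $u_i$ are vectors in $\overrightarrow{E}$. Figure 19.8 illustrates the
notion of affine independence.
Remark: The above observation also applies to infinite families $(a_i)_{i \in I}$ of points in E and
families $(u_i)_{i \in I - \{0\}}$ of vectors in $\overrightarrow{E}$, provided that the index set I contains 0.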
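When $(\overrightarrow{a_0 a_1}, \ldots, \overrightarrow{a_0 a_m})$ is a basis of $\overrightarrow{E}$ then, for every $x \in E$, since $x = a_0 + \overrightarrow{a_0 x}$, there
is a unique family $(x_1, \ldots, x_m)$ of scalars such that
\[ x = a_0 + x_1 \overrightarrow{a_0 a_1} + \cdots + x_m \overrightarrow{a_0 a_m}. \]
The scalars $(x_1, \ldots, x_m)$ may be considered as coordinates with respect to
$(a_0, (\overrightarrow{a_0 a_1}, \ldots, \overrightarrow{a_0 a_m}))$. Since
\[ x = a_0 + \sum_{i=1}^m x_i \overrightarrow{a_0 a_i} \quad\text{iff}\quad x = \left( 1 - \sum_{i=1}^m x_i \right) a_0 + \sum_{i=1}^m x_i a_i, \]
$x \in E$ can also be expressed uniquely as
\[ x = \sum_{i=0}^m \lambda_i a_i \]
with $\sum_{i=0}^m \lambda_i = 1$, and where $\lambda_0 = 1 - \sum_{i=1}^m x_i$, and $\lambda_i = x_i$ for $1 \le i \le m$. The scalars
$(\lambda_0, \ldots, \lambda_m)$ are also certain kinds of coordinates with respect to $(a_0, \ldots, a_m)$. All this is
summarized in the following definition.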
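Definition 19.5. Given an affine space $\langle E, \overrightarrow{E}, + \rangle$, an affine frame with origin $a_0$ is a family
$(a_0, \ldots, a_m)$ of m + 1 points in E such that $(\overrightarrow{a_0 a_1}, \ldots, \overrightarrow{a_0 a_m})$ is a basis of $\overrightarrow{E}$.
Then, every $x \in E$ can be expressed as
\[ x = a_0 + x_1 \overrightarrow{a_0 a_1} + \cdots + x_m \overrightarrow{a_0 a_m} \]
for a unique family $(x_1, \ldots, x_m)$ of scalars, called the coordinates of x w.r.t. the affine frame
$(a_0, (\overrightarrow{a_0 a_1}, \ldots, \overrightarrow{a_0 a_m}))$. Furthermore, every $x \in E$ can be written as
\[ x = \lambda_0 a_0 + \cdots + \lambda_m a_m \]
for some unique family $(\lambda_0, \ldots, \lambda_m)$ of scalars such that $\lambda_0 + \cdots + \lambda_m = 1$ called the barycentric
coordinates of x with respect to the affine frame $(a_0, \ldots, a_m)$.
The coordinates $(x_1, \ldots, x_m)$ and the barycentric coordinates $(\lambda_0, \ldots, \lambda_m)$ are related by
the equations $\lambda_0 = 1 - \sum_{i=1}^m x_i$ and $\lambda_i = x_i$, for $1 \le i \le m$. An affine frame is called an
affine basis by some authors. A family $(a_i)_{i \in I}$ of points in E is affinely dependent if it is not
affinely independent. We can also characterize affinely dependent families as follows.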
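Lemma 19.6. Given an affine space $\langle E, \overrightarrow{E}, + \rangle$, let $(a_i)_{i \in I}$ be a family of points in E. The
family $(a_i)_{i \in I}$ is affinely dependent iff there is a family $(\lambda_i)_{i \in I}$ such that $\lambda_j \neq 0$ for some
$j \in I$, $\sum_{i \in I} \lambda_i = 0$, and $\sum_{i \in I} \lambda_i \overrightarrow{x a_i} = 0$ for every $x \in E$.
Proof. By Lemma 19.5, the family $(a_i)_{i \in I}$ is affinely dependent iff the family of vectors
$(\overrightarrow{a_i a_j})_{j \in (I - \{i\})}$ is linearly dependent for some $i \in I$. For any $i \in I$, the family $(\overrightarrow{a_i a_j})_{j \in (I - \{i\})}$
is linearly dependent iff there is a family $(\lambda_j)_{j \in (I - \{i\})}$ such that $\lambda_j \neq 0$ for some j, and such
that
\[ \sum_{j \in (I - \{i\})} \lambda_j \overrightarrow{a_i a_j} = 0. \]
Then, for any $x \in E$, we have
\[
\begin{aligned}
\sum_{j \in (I - \{i\})} \lambda_j \overrightarrow{a_i a_j}
  &= \sum_{j \in (I - \{i\})} \lambda_j (\overrightarrow{x a_j} - \overrightarrow{x a_i}) \\
  &= \sum_{j \in (I - \{i\})} \lambda_j \overrightarrow{x a_j} - \Bigl( \sum_{j \in (I - \{i\})} \lambda_j \Bigr) \overrightarrow{x a_i},
\end{aligned}
\]
and letting $\lambda_i = -\sum_{j \in (I - \{i\})} \lambda_j$, we get $\sum_{i \in I} \lambda_i \overrightarrow{x a_i} = 0$, with $\sum_{i \in I} \lambda_i = 0$ and $\lambda_j \neq 0$ for
some $j \in I$. The converse is obvious by setting $x = a_i$ for some i such that $\lambda_i \neq 0$, since
$\sum_{i \in I} \lambda_i = 0$ implies that $\lambda_j \neq 0$, for some $j \neq i$.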
Even though Lemma 19.6 is rather dull, it is one of the key ingredients in the proof of
beautiful and deep theorems about convex sets, such as Carathéodory's theorem, Radon's
theorem, and Helly's theorem.
iff $\overrightarrow{ab}$ and $\overrightarrow{ac}$ are linearly independent, which means that a, b, and c are not on the same line
(they are not collinear). In this case, the affine subspace generated by (a, b, c) is the set of all
points $(1 - \lambda - \mu)a + \lambda b + \mu c$, which is the unique plane containing a, b, and c. A family of
spanned by $(a_0, \ldots, a_n)$). When n = 1, we get the segment between $a_0$ and $a_1$, including
$a_0$ and $a_1$. When n = 2, we get the interior of the triangle whose vertices are $a_0, a_1, a_2$,
including boundary points (the edges). When n = 3, we get the interior of the tetrahedron
whose vertices are $a_0, a_1, a_2, a_3$, including boundary points (faces and edges). The set
\[ \{ a_0 + \lambda_1 \overrightarrow{a_0 a_1} + \cdots + \lambda_n \overrightarrow{a_0 a_n} \mid \text{where } 0 \le \lambda_i \le 1\ (\lambda_i \in \mathbb{R}) \} \]
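\[ x = \lambda_0 a + \lambda_1 b + \lambda_2 c, \]
where $\lambda_0 + \lambda_1 + \lambda_2 = 1$. How can we compute $\lambda_0, \lambda_1, \lambda_2$? Letting $a = (a_1, a_2, a_3)$, $b =
(b_1, b_2, b_3)$, $c = (c_1, c_2, c_3)$, and $x = (x_1, x_2, x_3)$ be the coordinates of a, b, c, x in the standard
frame of $\mathbb{A}^3$, it is tempting to solve the system of equations
\[ \begin{pmatrix} a_1 & b_1 & c_1 \\ a_2 & b_2 & c_2 \\ a_3 & b_3 & c_3 \end{pmatrix} \begin{pmatrix} \lambda_0 \\ \lambda_1 \\ \lambda_2 \end{pmatrix} = \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix}. \]
However, there is a problem when the origin of the coordinate system belongs to the plane
(a, b, c), since in this case, the matrix is not invertible! What we should really be doing is to
solve the system
\[ \lambda_0 \overrightarrow{Oa} + \lambda_1 \overrightarrow{Ob} + \lambda_2 \overrightarrow{Oc} = \overrightarrow{Ox}, \]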
where O is any point not in the plane (a, b, c). An alternative is to use certain well-chosen
cross products.
It can be shown that barycentric coordinates correspond to various ratios of areas and
volumes; see the problems.
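One way to sidestep the choice of the auxiliary point O is to work in the affine frame (a, b, c) itself. The sketch below is not the method of the text: it solves $x = a + x_1 \overrightarrow{ab} + x_2 \overrightarrow{ac}$ through the 2 x 2 Gram system of the vectors ab and ac (a swapped-in technique), then sets $\lambda_0 = 1 - x_1 - x_2$. The function name barycentric_in_plane and the sample points are hypothetical. The example deliberately uses a plane through the origin, the case where the naive 3 x 3 system is singular.

```python
from fractions import Fraction as F

def sub(p, q):
    return tuple(pk - qk for pk, qk in zip(p, q))

def dot(u, v):
    return sum(uk * vk for uk, vk in zip(u, v))

def barycentric_in_plane(a, b, c, x):
    """Barycentric coordinates (l0, l1, l2) of x w.r.t. the affine frame (a, b, c).

    Assumes a, b, c are affinely independent and x lies in their plane.
    """
    u, v, w = sub(b, a), sub(c, a), sub(x, a)
    uu, uv, vv = dot(u, u), dot(u, v), dot(v, v)
    det = uu * vv - uv * uv          # Gram determinant, zero iff a, b, c are collinear
    if det == 0:
        raise ValueError("a, b, c are not affinely independent")
    wu, wv = dot(w, u), dot(w, v)
    x1 = (wu * vv - wv * uv) / det   # Cramer's rule on the Gram system
    x2 = (uu * wv - uv * wu) / det
    residual = sub(w, tuple(x1 * uk + x2 * vk for uk, vk in zip(u, v)))
    if any(r != 0 for r in residual):
        raise ValueError("x is not in the plane of a, b, c")
    return (1 - x1 - x2, x1, x2)

# Hypothetical triangle in the plane z = 0, which contains the origin.
a = (F(1), F(0), F(0))
b = (F(0), F(1), F(0))
c = (F(-1), F(-1), F(0))
x = (F(0), F(0), F(0))

coords = barycentric_in_plane(a, b, c, x)
print(coords)
```

Exact fractions are used so that the in-plane check can compare against zero; with floating-point inputs a tolerance would be needed instead.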
19.7 Affine Maps
Corresponding to linear maps we have the notion of an affine map. An affine map is defined
as a map preserving affine combinations.
Definition 19.6. Given two affine spaces $\langle E, \overrightarrow{E}, + \rangle$ and $\langle E', \overrightarrow{E'}, +' \rangle$, a function $f\colon E \to E'$
is an affine map iff for every family $((a_i, \lambda_i))_{i \in I}$ of weighted points in E such that $\sum_{i \in I} \lambda_i = 1$,
we have
\[ f\left( \sum_{i \in I} \lambda_i a_i \right) = \sum_{i \in I} \lambda_i f(a_i). \]
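Affine maps can be obtained from linear maps as follows. For simplicity of notation, the
same symbol + is used for both affine spaces (instead of using both + and +').
Given any point $a \in E$, any point $b \in E'$, and any linear map $h\colon \overrightarrow{E} \to \overrightarrow{E'}$, we claim that
the map $f\colon E \to E'$ defined such that
\[ f(a + v) = b + h(v) \]
is an affine map. Indeed, for any family $(\lambda_i)_{i \in I}$ of scalars with $\sum_{i \in I} \lambda_i = 1$ and any family
$(v_i)_{i \in I}$, since
\[ \sum_{i \in I} \lambda_i (a + v_i) = a + \sum_{i \in I} \lambda_i \overrightarrow{a(a + v_i)} = a + \sum_{i \in I} \lambda_i v_i \]
and
\[ \sum_{i \in I} \lambda_i (b + h(v_i)) = b + \sum_{i \in I} \lambda_i \overrightarrow{b(b + h(v_i))} = b + \sum_{i \in I} \lambda_i h(v_i), \]
we have
\[
\begin{aligned}
f\left( \sum_{i \in I} \lambda_i (a + v_i) \right)
  &= f\left( a + \sum_{i \in I} \lambda_i v_i \right) \\
  &= b + h\left( \sum_{i \in I} \lambda_i v_i \right) \\
  &= b + \sum_{i \in I} \lambda_i h(v_i) \\
  &= \sum_{i \in I} \lambda_i (b + h(v_i)) \\
  &= \sum_{i \in I} \lambda_i f(a + v_i).
\end{aligned}
\]
Note that the condition $\sum_{i \in I} \lambda_i = 1$ was implicitly used in deriving that
\[ \sum_{i \in I} \lambda_i (a + v_i) = a + \sum_{i \in I} \lambda_i v_i \]
and
\[ \sum_{i \in I} \lambda_i (b + h(v_i)) = b + \sum_{i \in I} \lambda_i h(v_i). \]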
Since
\[ \begin{pmatrix} 1 & 1 \\ 1 & 3 \end{pmatrix} = \sqrt{2} \begin{pmatrix} \sqrt{2}/2 & -\sqrt{2}/2 \\ \sqrt{2}/2 & \sqrt{2}/2 \end{pmatrix} \begin{pmatrix} 1 & 2 \\ 0 & 1 \end{pmatrix}, \]
this affine map is the composition of a shear, followed by a rotation of angle $\pi/4$, followed by
a magnification of ratio $\sqrt{2}$, followed by a translation. The effect of this map on the square
(a, b, c, d) is shown in Figure 19.11. The image of the square (a, b, c, d) is the parallelogram
$(a', b', c', d')$.
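The following lemma shows the converse of what we just showed. Every affine map is
determined by the image of any point and a linear map.
Lemma 19.7. Given an affine map $f\colon E \to E'$, there is a unique linear map $\overrightarrow{f}\colon \overrightarrow{E} \to \overrightarrow{E'}$ such that
\[ f(a + v) = f(a) + \overrightarrow{f}(v), \]
for every $a \in E$ and every $v \in \overrightarrow{E}$.
[Figure 19.11: the effect of the affine map on the square (a, b, c, d).]
Proof. Let $a \in E$ be any point in E. We claim that the map defined such that
\[ \overrightarrow{f}(v) = \overrightarrow{f(a) f(a + v)} \]
for every $v \in \overrightarrow{E}$ is a linear map. Indeed, we can write
\[ a + \lambda v = \lambda (a + v) + (1 - \lambda) a, \]
since $a + \lambda v = a + \lambda \overrightarrow{a(a + v)} + (1 - \lambda) \overrightarrow{aa}$, and also
\[ a + u + v = (a + u) + (a + v) - a, \]
since $a + u + v = a + \overrightarrow{a(a + u)} + \overrightarrow{a(a + v)} - \overrightarrow{aa}$. Since f preserves barycenters, we get
\[ f(a + \lambda v) = \lambda f(a + v) + (1 - \lambda) f(a). \]
If we recall that $x = \sum_{i \in I} \lambda_i a_i$ is the barycenter of $((a_i, \lambda_i))_{i \in I}$ (with $\sum_{i \in I} \lambda_i = 1$) iff
\[ \overrightarrow{bx} = \sum_{i \in I} \lambda_i \overrightarrow{b a_i} \quad\text{for every } b \in E, \]
we get
\[ \overrightarrow{f(a) f(a + \lambda v)} = \lambda \overrightarrow{f(a) f(a + v)} + (1 - \lambda) \overrightarrow{f(a) f(a)} = \lambda \overrightarrow{f(a) f(a + v)}, \]
showing that $\overrightarrow{f}(\lambda v) = \lambda \overrightarrow{f}(v)$. We also have
\[ f(a + u + v) = f(a + u) + f(a + v) - f(a), \]
from which we get
\[ \overrightarrow{f(a) f(a + u + v)} = \overrightarrow{f(a) f(a + u)} + \overrightarrow{f(a) f(a + v)}, \]
showing that $\overrightarrow{f}(u + v) = \overrightarrow{f}(u) + \overrightarrow{f}(v)$. Consequently, $\overrightarrow{f}$ is a linear map. For any other
point $b \in E$, since
\[ b + v = a + \overrightarrow{ab} + v = a + \overrightarrow{a(a + v)} - \overrightarrow{aa} + \overrightarrow{ab}, \]
we have $b + v = (a + v) - a + b$, and since f preserves barycenters, we get
\[ f(b + v) = f(a + v) - f(a) + f(b), \]
which implies that
\[
\begin{aligned}
\overrightarrow{f(b) f(b + v)} &= \overrightarrow{f(b) f(a + v)} - \overrightarrow{f(b) f(a)} + \overrightarrow{f(b) f(b)} \\
  &= \overrightarrow{f(a) f(b)} + \overrightarrow{f(b) f(a + v)} \\
  &= \overrightarrow{f(a) f(a + v)}.
\end{aligned}
\]
Thus, $\overrightarrow{f(b) f(b + v)} = \overrightarrow{f(a) f(a + v)}$, which shows that the definition of $\overrightarrow{f}$ does not depend
on the choice of $a \in E$. The fact that $\overrightarrow{f}$ is unique is obvious: We must have $\overrightarrow{f}(v) =
\overrightarrow{f(a) f(a + v)}$.
The unique linear map $\overrightarrow{f}\colon \overrightarrow{E} \to \overrightarrow{E'}$ given by Lemma 19.7 is called the linear map
associated with the affine map f.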
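Note that the condition
\[ f(a + v) = f(a) + \overrightarrow{f}(v), \]
for every $a \in E$ and every $v \in \overrightarrow{E}$, can be stated equivalently as
\[ f(x) = f(a) + \overrightarrow{f}(\overrightarrow{ax}), \quad\text{or}\quad \overrightarrow{f(a) f(x)} = \overrightarrow{f}(\overrightarrow{ax}), \]
for all $a, x \in E$. Lemma 19.7 shows that for any affine map $f\colon E \to E'$, there are points
$a \in E$, $b \in E'$, and a unique linear map $\overrightarrow{f}\colon \overrightarrow{E} \to \overrightarrow{E'}$, such that
\[ f(a + v) = b + \overrightarrow{f}(v), \]
for all $v \in \overrightarrow{E}$ (just let b = f(a), for any $a \in E$). Affine maps for which $\overrightarrow{f}$ is the identity
map are called translations. Indeed, if $\overrightarrow{f} = \mathrm{id}$, then
\[
\begin{aligned}
f(x) = f(a) + \overrightarrow{f}(\overrightarrow{ax}) &= f(a) + \overrightarrow{ax} = x + \overrightarrow{xa} + \overrightarrow{a f(a)} + \overrightarrow{ax} \\
  &= x + \overrightarrow{xa} + \overrightarrow{a f(a)} - \overrightarrow{xa} = x + \overrightarrow{a f(a)},
\end{aligned}
\]
and so
\[ \overrightarrow{x f(x)} = \overrightarrow{a f(a)}, \]
which shows that f is the translation induced by the vector $\overrightarrow{a f(a)}$ (which does not depend
on a).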
Since an affine map preserves barycenters, and since an affine subspace V is closed under
barycentric combinations, the image f(V) of V is an affine subspace in $E'$. So, for example,
the image of a line is a point or a line, and the image of a plane is either a point, a line, or
a plane.
It is easily verified that the composition of two affine maps is an affine map. Also, given
affine maps $f\colon E \to E'$ and $g\colon E' \to E''$, we have
\[ \overrightarrow{g \circ f} = \overrightarrow{g} \circ \overrightarrow{f}. \]
An affine map $f\colon E \to E'$ is constant iff $\overrightarrow{f}\colon \overrightarrow{E} \to \overrightarrow{E'}$ is the null (constant) linear map equal
to 0 for all $v \in \overrightarrow{E}$.
If E is an affine space of dimension m and $(a_0, a_1, \ldots, a_m)$ is an affine frame for E, then
for any other affine space F and for any sequence $(b_0, b_1, \ldots, b_m)$ of m + 1 points in F, there
is a unique affine map $f\colon E \to F$ such that $f(a_i) = b_i$, for $0 \le i \le m$. Indeed, f must be
such that
\[ f(\lambda_0 a_0 + \cdots + \lambda_m a_m) = \lambda_0 b_0 + \cdots + \lambda_m b_m, \]
where $\lambda_0 + \cdots + \lambda_m = 1$, and this defines a unique affine map on all of E, since $(a_0, a_1, \ldots, a_m)$
is an affine frame for E.
Using affine frames, affine maps can be represented in terms of matrices. We explain how
an affine map $f\colon E \to E$ is represented with respect to a frame $(a_0, \ldots, a_n)$ in E, the more
general case where an affine map $f\colon E \to F$ is represented with respect to two affine frames
$(a_0, \ldots, a_n)$ in E and $(b_0, \ldots, b_m)$ in F being analogous. Since
\[ x = x_1 \overrightarrow{a_0 a_1} + \cdots + x_n \overrightarrow{a_0 a_n}, \]
\[ \overrightarrow{a_0 f(a_0)} = b_1 \overrightarrow{a_0 a_1} + \cdots + b_n \overrightarrow{a_0 a_n}, \]
\[ \overrightarrow{a_0 f(a_0 + x)} = y_1 \overrightarrow{a_0 a_1} + \cdots + y_n \overrightarrow{a_0 a_n}, \]
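part of the affine map. Affine maps do not always have a fixed point. Obviously, nonnull
translations have no fixed point. A less trivial example is given by the affine map
\[ \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} \mapsto \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} + \begin{pmatrix} 1 \\ 0 \end{pmatrix}. \]
This map is a reflection about the x-axis followed by a translation along the x-axis. The
affine map
\[ \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} \mapsto \begin{pmatrix} 1 & -\sqrt{3} \\ \sqrt{3}/4 & 1/4 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} + \begin{pmatrix} 1 \\ 1 \end{pmatrix} \]
can also be written as
\[ \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} \mapsto \begin{pmatrix} 2 & 0 \\ 0 & 1/2 \end{pmatrix} \begin{pmatrix} 1/2 & -\sqrt{3}/2 \\ \sqrt{3}/2 & 1/2 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} + \begin{pmatrix} 1 \\ 1 \end{pmatrix}, \]
which shows that it is the composition of a rotation of angle $\pi/3$, followed by a stretch (by a
factor of 2 along the x-axis, and by a factor of $\tfrac12$ along the y-axis), followed by a translation.
It is easy to show that this affine map has a unique fixed point. On the other hand, the
affine map
\[ \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} \mapsto \begin{pmatrix} 8/5 & -6/5 \\ 3/10 & 2/5 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} + \begin{pmatrix} 1 \\ 1 \end{pmatrix} \]
has no fixed point, even though
\[ \begin{pmatrix} 8/5 & -6/5 \\ 3/10 & 2/5 \end{pmatrix} = \begin{pmatrix} 2 & 0 \\ 0 & 1/2 \end{pmatrix} \begin{pmatrix} 4/5 & -3/5 \\ 3/5 & 4/5 \end{pmatrix}, \]
and the second matrix is a rotation of angle $\theta$ such that $\cos\theta = \tfrac45$ and $\sin\theta = \tfrac35$. For more
on fixed points of affine maps, see the problems.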
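The three examples above can be checked by solving $(I - A)x = b$, since x is a fixed point of $x \mapsto Ax + b$ exactly when this system holds. The following sketch is restricted to 2 x 2 matrices; the function name classify_fixed_points and its return labels are hypothetical. The first and third maps use exact fractions, while the second uses floating point because of $\sqrt{3}$.

```python
import math
from fractions import Fraction as F

def classify_fixed_points(A, b):
    """Classify the fixed points of x -> A x + b for a 2x2 matrix A.

    Returns ("unique", point), ("none", None) or ("many", None).
    """
    m = [[1 - A[0][0], -A[0][1]], [-A[1][0], 1 - A[1][1]]]   # I - A
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    if det != 0:
        x = (b[0] * m[1][1] - m[0][1] * b[1]) / det          # Cramer's rule
        y = (m[0][0] * b[1] - m[1][0] * b[0]) / det
        return "unique", (x, y)
    if all(e == 0 for row in m for e in row):
        # A is the identity: a translation, with fixed points only if b = 0.
        return ("many", None) if b[0] == 0 and b[1] == 0 else ("none", None)
    # Rank 1: the system is solvable iff b is parallel to the columns of I - A.
    columns = ((m[0][0], m[1][0]), (m[0][1], m[1][1]))
    solvable = all(col[0] * b[1] - col[1] * b[0] == 0 for col in columns)
    return ("many", None) if solvable else ("none", None)

reflection = classify_fixed_points([[F(1), F(0)], [F(0), F(-1)]], (F(1), F(0)))
stretch = classify_fixed_points([[F(8, 5), F(-6, 5)], [F(3, 10), F(2, 5)]], (F(1), F(1)))

s3 = math.sqrt(3)
A2 = [[1.0, -s3], [s3 / 4, 0.25]]
kind, p = classify_fixed_points(A2, (1.0, 1.0))
print(reflection[0], stretch[0], kind, p)
```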
There is a useful trick to convert the equation y = Ax + b into what looks like a linear
equation. The trick is to consider an $(n + 1) \times (n + 1)$ matrix. We add 1 as the (n + 1)th
component to the vectors x, y, and b, and form the $(n + 1) \times (n + 1)$ matrix
\[ \begin{pmatrix} A & b \\ 0 & 1 \end{pmatrix} \]
so that y = Ax + b is equivalent to
\[ \begin{pmatrix} y \\ 1 \end{pmatrix} = \begin{pmatrix} A & b \\ 0 & 1 \end{pmatrix} \begin{pmatrix} x \\ 1 \end{pmatrix}. \]
This trick is very useful in kinematics and dynamics, where A is a rotation matrix. Such
affine maps are called rigid motions.
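A practical consequence of this trick is that composing affine maps becomes plain matrix multiplication. The sketch below is illustrative only: the helper names homogeneous, matmul and apply_affine, and the two sample maps, are assumptions rather than anything defined in the text.

```python
def homogeneous(A, b):
    """Build the (n+1) x (n+1) matrix [[A, b], [0, 1]] from an n x n matrix A and a vector b."""
    n = len(A)
    return [list(A[i]) + [b[i]] for i in range(n)] + [[0] * n + [1]]

def matmul(M, N):
    """Plain matrix product of two lists of rows."""
    return [
        [sum(M[i][k] * N[k][j] for k in range(len(N))) for j in range(len(N[0]))]
        for i in range(len(M))
    ]

def apply_affine(H, x):
    """Apply the affine map encoded by H to the point x, using the column (x, 1)."""
    column = [[xi] for xi in x] + [[1]]
    y = matmul(H, column)
    return tuple(row[0] for row in y[:-1])

# Hypothetical maps: f is a quarter turn plus a translation, g a scaling plus a translation.
Hf = homogeneous([[0, -1], [1, 0]], [1, 2])
Hg = homogeneous([[2, 0], [0, 2]], [0, 1])

x = (3, 4)
step_by_step = apply_affine(Hg, apply_affine(Hf, x))   # g(f(x))
composed = apply_affine(matmul(Hg, Hf), x)             # one matrix for g after f
print(step_by_step, composed)
```

The last row of a product of such matrices is again (0, ..., 0, 1), so the composite is still of the same form, with linear part the product of the linear parts.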
If $f\colon E \to E'$ is a bijective affine map, given any three collinear points a, b, c in E,
with $a \neq b$, where, say, $c = (1 - \lambda)a + \lambda b$, since f preserves barycenters, we have $f(c) =
(1 - \lambda) f(a) + \lambda f(b)$, which shows that f(a), f(b), f(c) are collinear in $E'$. There is a converse
to this property, which is simpler to state when the ground field is $K = \mathbb{R}$. The converse
states that given any bijective function $f\colon E \to E'$ between two real affine spaces of the
same dimension $n \ge 2$, if f maps any three collinear points to collinear points, then f is
affine. The proof is rather long (see Berger [8] or Samuel [90]).
Given three collinear points a, b, c, where $a \neq c$, we have $b = (1 - \lambda)a + \lambda c$ for some
unique $\lambda$, and we define the ratio of the sequence a, b, c, as
\[ \mathrm{ratio}(a, b, c) = \frac{\lambda}{1 - \lambda} = \frac{\overrightarrow{ab}}{\overrightarrow{bc}}, \]
provided that $\lambda \neq 1$, i.e., $b \neq c$. When b = c, we agree that $\mathrm{ratio}(a, b, c) = \infty$. We warn our
readers that other authors define the ratio of a, b, c as $-\mathrm{ratio}(a, b, c) = \dfrac{\overrightarrow{ba}}{\overrightarrow{bc}}$. Since affine maps
preserve barycenters, it is clear that affine maps preserve the ratio of three points.
19.8 Affine Groups
We now take a quick look at the bijective affine maps. Given an affine space E, the set of
affine bijections $f\colon E \to E$ is clearly a group, called the affine group of E, and denoted by
GA(E). Recall that the group of bijective linear maps of the vector space $\overrightarrow{E}$ is denoted by
$\mathrm{GL}(\overrightarrow{E})$. Writing $L\colon \mathrm{GA}(E) \to \mathrm{GL}(\overrightarrow{E})$ for the map $f \mapsto \overrightarrow{f}$, the
subgroup $\mathrm{DIL}(E) = L^{-1}(\mathbb{R}^* \mathrm{id}_{\overrightarrow{E}})$ of GA(E) is particularly interesting. It turns out that it
is the disjoint union of the translations and of the dilatations of ratio $\lambda \neq 1$. The elements
of DIL(E) are called affine dilatations.
Given any point $a \in E$, and any scalar $\lambda \in \mathbb{R}$, a dilatation or central dilatation (or
homothety) of center a and ratio $\lambda$ is a map $H_{a,\lambda}$ defined such that
\[ H_{a,\lambda}(x) = a + \lambda \overrightarrow{ax}, \]
for every $x \in E$.
Remark: The terminology does not seem to be universally agreed upon. The terms affine
dilatation and central dilatation are used by Pedoe [88]. Snapper and Troyer use the term
dilation for an affine dilatation and magnification for a central dilatation [99]. Samuel uses
homothety for a central dilatation, a direct translation of the French homothétie [90]. Since
dilation is shorter than dilatation and somewhat easier to pronounce, perhaps we should use
that!
Observe that $H_{a,\lambda}(a) = a$, and when $\lambda \neq 0$ and $x \neq a$, $H_{a,\lambda}(x)$ is on the line defined by
a and x, and is obtained by scaling $\overrightarrow{ax}$ by $\lambda$.
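[Figure: a central dilatation mapping the points b, c to b', c'.]
When $\lambda \neq 0$, $H_{a,\lambda}$ is an affine bijection. It is immediately verified that
\[ H_{a,\lambda} \circ H_{a,\mu} = H_{a,\lambda\mu}. \]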
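The composition rule for central dilatations with a common center is easy to test on coordinates. This is a minimal sketch; the factory function dilatation and the sample center and point are hypothetical names and values chosen for illustration.

```python
def dilatation(center, ratio):
    """Return the map x -> center + ratio * (x - center)."""
    def h(x):
        return tuple(c + ratio * (xk - c) for c, xk in zip(center, x))
    return h

a = (1, 2)        # hypothetical common center
x = (4, -1)       # hypothetical test point

h2, h3, h6 = dilatation(a, 2), dilatation(a, 3), dilatation(a, 6)

composed = h2(h3(x))
direct = h6(x)
print(composed, direct, h2(a))
```

The center is a fixed point of every such map, which the last printed value illustrates.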
We have the following useful result.
Lemma 19.8. Given any affine space E, for any affine bijection $f \in \mathrm{GA}(E)$, if $\overrightarrow{f} = \lambda\, \mathrm{id}_{\overrightarrow{E}}$,
for some $\lambda \in \mathbb{R}^*$ with $\lambda \neq 1$, then there is a unique point $c \in E$ such that $f = H_{c,\lambda}$.
Proof. The proof is straightforward, and is omitted. It is also given in Gallier [43].
Clearly, if $\overrightarrow{f} = \mathrm{id}_{\overrightarrow{E}}$, the affine map f is a translation. Thus, the group of affine
dilatations DIL(E) is the disjoint union of the translations and of the dilatations of ratio
$\lambda \neq 0, 1$. Affine dilatations can be given a purely geometric characterization.
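Another point worth mentioning is that affine bijections preserve the ratio of volumes of
parallelotopes. Indeed, given any basis $B = (u_1, \ldots, u_m)$ of the vector space $\overrightarrow{E}$ associated
with the affine space $E$, given any $m + 1$ affinely independent points $(a_0, \ldots, a_m)$, we can
compute the determinant $\det_B(\overrightarrow{a_0a_1}, \ldots, \overrightarrow{a_0a_m})$ with respect to the basis $B$. For any bijective
affine map $f\colon E \to E$, since
$$\det\nolimits_B\bigl(\overrightarrow{f}(\overrightarrow{a_0a_1}), \ldots, \overrightarrow{f}(\overrightarrow{a_0a_m})\bigr) = \det(\overrightarrow{f})\,\det\nolimits_B(\overrightarrow{a_0a_1}, \ldots, \overrightarrow{a_0a_m})$$
and the determinant of a linear map is intrinsic (i.e., depends only on $\overrightarrow{f}$, and not on the
particular basis $B$), we conclude that the ratio
$$\frac{\det_B\bigl(\overrightarrow{f}(\overrightarrow{a_0a_1}), \ldots, \overrightarrow{f}(\overrightarrow{a_0a_m})\bigr)}{\det_B(\overrightarrow{a_0a_1}, \ldots, \overrightarrow{a_0a_m})} = \det(\overrightarrow{f})$$
is independent of the basis $B$. In particular, the affine bijections $f \in \mathrm{GA}(E)$
such that $\det(\overrightarrow{f}) = 1$ preserve volumes. These affine maps form a subgroup $\mathrm{SA}(E)$ of
$\mathrm{GA}(E)$ called the special affine group of $E$. We now take a glimpse at affine geometry.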
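As a small illustration of the ratio above, the following sketch works in the plane with the standard basis, where the parallelotope spanned by $(a_0, a_1, a_2)$ is a parallelogram. The map $x \mapsto Ax + b$ and the sample points are arbitrary choices made for this example; the ratio of the two determinants comes out equal to $\det(A)$ whatever the points are.

```python
from fractions import Fraction

def det2(u, v):
    """Determinant of the pair of plane vectors (u, v)."""
    return u[0] * v[1] - u[1] * v[0]

def vec(p, q):
    """The vector from point p to point q."""
    return (q[0] - p[0], q[1] - p[1])

def apply_affine(A, b, p):
    """Apply the affine map x -> A x + b to the point p."""
    return (A[0][0] * p[0] + A[0][1] * p[1] + b[0],
            A[1][0] * p[0] + A[1][1] * p[1] + b[1])

def volume_ratio(A, b, a0, a1, a2):
    """Ratio of det(f(a0)f(a1), f(a0)f(a2)) to det(a0a1, a0a2)."""
    before = det2(vec(a0, a1), vec(a0, a2))
    fa0, fa1, fa2 = (apply_affine(A, b, p) for p in (a0, a1, a2))
    after = det2(vec(fa0, fa1), vec(fa0, fa2))
    return Fraction(after, before)

A = ((2, 1), (1, 3))   # linear part, an arbitrary invertible matrix
b = (5, -7)            # translation part, irrelevant to the ratio
det_A = A[0][0] * A[1][1] - A[0][1] * A[1][0]

r1 = volume_ratio(A, b, (0, 0), (1, 0), (1, 2))
r2 = volume_ratio(A, b, (3, -1), (4, 4), (-2, 5))
```

Both `r1` and `r2` equal `det_A`, independently of the chosen parallelogram.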
19.9
In this section we state and prove three fundamental results of affine geometry. Roughly
speaking, affine geometry is the study of properties invariant under affine bijections. We now
prove one of the oldest and most basic results of affine geometry, the theorem of Thales.
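Lemma 19.9. Given any affine space $E$, if $H_1, H_2, H_3$ are any three distinct parallel hyperplanes, and $A$ and $B$ are any two lines not parallel to $H_i$, letting $a_i = H_i \cap A$ and $b_i = H_i \cap B$,
then the following ratios are equal:
$$\frac{\overrightarrow{a_1a_3}}{\overrightarrow{a_1a_2}} = \frac{\overrightarrow{b_1b_3}}{\overrightarrow{b_1b_2}} = \rho.$$
Conversely, for any point $d$ on the line $A$, if
$$\frac{\overrightarrow{a_1d}}{\overrightarrow{a_1a_2}} = \rho,$$
then $d = a_3$.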
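Proof. Figure 19.13 illustrates the theorem of Thales. We sketch a proof, leaving the details
as an exercise. Since $H_1, H_2, H_3$ are parallel, they have the same direction $\overrightarrow{H}$, a hyperplane
in $\overrightarrow{E}$. Let $u \in \overrightarrow{E} - \overrightarrow{H}$ be any nonnull vector such that $A = a_1 + \mathbb{R}u$. Since $A$ is not parallel to
$H$, we have $\overrightarrow{E} = \overrightarrow{H} \oplus \mathbb{R}u$, and thus we can define the linear map $p\colon \overrightarrow{E} \to \mathbb{R}u$, the projection
on $\mathbb{R}u$ parallel to $\overrightarrow{H}$. This linear map induces an affine map $f\colon E \to A$ defined by
$f(b_1 + w) = a_1 + p(w)$
for all $w \in \overrightarrow{E}$. Clearly, $f(b_1) = a_1$, and since $H_1, H_2, H_3$ all have direction $\overrightarrow{H}$, we also have
$f(b_2) = a_2$ and $f(b_3) = a_3$. Since $f$ is affine, it preserves ratios, and thus
$$\frac{\overrightarrow{a_1a_3}}{\overrightarrow{a_1a_2}} = \frac{\overrightarrow{b_1b_3}}{\overrightarrow{b_1b_2}}.$$
The converse is immediate.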
[Figure 19.13: The theorem of Thales]
We also have the following simple lemma, whose proof is left as an easy exercise.
Lemma 19.10. Given any affine space $E$, given any two distinct points $a, b \in E$, and for
any affine dilatation $f$ different from the identity, if $a' = f(a)$, $D = \langle a, b\rangle$ is the line passing
through $a$ and $b$, and $D'$ is the line parallel to $D$ and passing through $a'$, the following are
equivalent:
(i) $b' = f(b)$;
(ii) If $f$ is a translation, then $b'$ is the intersection of $D'$ with the line parallel to $\langle a, a'\rangle$
passing through $b$;
If $f$ is a dilatation of center $c$, then $b' = D' \cap \langle c, b\rangle$.
The first case is the parallelogram law, and the second case follows easily from Thales'
theorem.
We are now ready to prove two classical results of affine geometry, Pappus's theorem and
Desargues's theorem. Actually, these results are theorems of projective geometry, and we
are stating affine versions of these important results. There are stronger versions that are
best proved using projective geometry.
Lemma 19.11. Given any affine plane $E$, any two distinct lines $D$ and $D'$, then for any
distinct points $a, b, c$ on $D$ and $a', b', c'$ on $D'$, if $a, b, c, a', b', c'$ are distinct from the intersection of $D$ and $D'$ (if $D$ and $D'$ intersect) and if the lines $\langle a, b'\rangle$ and $\langle a', b\rangle$ are parallel,
and the lines $\langle b, c'\rangle$ and $\langle b', c\rangle$ are parallel, then the lines $\langle a, c'\rangle$ and $\langle a', c\rangle$ are parallel.
Proof. Pappus's theorem is illustrated in Figure 19.14. If $D$ and $D'$ are not parallel, let $d$
be their intersection. Let $f$ be the dilatation of center $d$ such that $f(a) = b$, and let $g$ be the
dilatation of center $d$ such that $g(b) = c$. Since the lines $\langle a, b'\rangle$ and $\langle a', b\rangle$ are parallel, and
the lines $\langle b, c'\rangle$ and $\langle b', c\rangle$ are parallel, by Lemma 19.10 we have $a' = f(b')$ and $b' = g(c')$.
However, we observed that dilatations with the same center commute, and thus $f \circ g = g \circ f$,
and thus, letting $h = g \circ f$, we get $c = h(a)$ and $a' = h(c')$. Again, by Lemma 19.10, the
lines $\langle a, c'\rangle$ and $\langle a', c\rangle$ are parallel. If $D$ and $D'$ are parallel, we use translations instead of
dilatations.
There is a converse to Pappus's theorem, which yields a fancier version of Pappus's
theorem, but it is easier to prove it using projective geometry. It should be noted that
in axiomatic presentations of projective geometry, Pappus's theorem is equivalent to the
commutativity of the ground field $K$ (in the present case, $K = \mathbb{R}$). We now prove an affine
version of Desargues's theorem.
Lemma 19.12. Given any affine space $E$, and given any two triangles $(a, b, c)$ and $(a', b', c')$,
where $a, b, c, a', b', c'$ are all distinct, if $\langle a, b\rangle$ and $\langle a', b'\rangle$ are parallel and $\langle b, c\rangle$ and $\langle b', c'\rangle$ are
parallel, then $\langle a, c\rangle$ and $\langle a', c'\rangle$ are parallel iff the lines $\langle a, a'\rangle$, $\langle b, b'\rangle$, and $\langle c, c'\rangle$ are either
parallel or concurrent (i.e., intersect in a common point).
[Figure 19.14: Pappus's theorem (affine version)]
identical. Since $\overrightarrow{a'c'}$ and $\overrightarrow{b'c'}$ are linearly independent, these lines have a unique intersection,
which must be $c'' = c'$.
The direction where it is assumed that the lines $\langle a, a'\rangle$, $\langle b, b'\rangle$, and $\langle c, c'\rangle$ are either parallel
or concurrent is left as an exercise (in fact, the proof is quite similar).
19.10
Affine Hyperplanes
We now consider affine forms and affine hyperplanes. In Section 19.5 we observed that the
set $L$ of solutions of an equation
$$ax + by = c$$
is an affine subspace of $\mathbb{A}^2$ of dimension 1, in fact, a line (provided that $a$ and $b$ are not both
null). It would be equally easy to show that the set $P$ of solutions of an equation
$$ax + by + cz = d$$
is an affine subspace of $\mathbb{A}^3$ of dimension 2, in fact, a plane (provided that $a, b, c$ are not all
null). More generally, the set $H$ of solutions of an equation
$$\lambda_1 x_1 + \cdots + \lambda_m x_m = \mu$$
is an affine subspace of $\mathbb{A}^m$, and if $\lambda_1, \ldots, \lambda_m$ are not all null, it turns out that it is a subspace
of dimension $m - 1$ called a hyperplane.
We can interpret the equation
$$\lambda_1 x_1 + \cdots + \lambda_m x_m = \mu$$
in terms of the map $f\colon \mathbb{R}^m \to \mathbb{R}$ defined such that
$$f(x_1, \ldots, x_m) = \lambda_1 x_1 + \cdots + \lambda_m x_m - \mu$$
for all $(x_1, \ldots, x_m) \in \mathbb{R}^m$. It is immediately verified that this map is affine, and the set $H$ of
solutions of the equation
$$\lambda_1 x_1 + \cdots + \lambda_m x_m = \mu$$
is the null set, or kernel, of the affine map $f\colon \mathbb{A}^m \to \mathbb{R}$, in the sense that
$$H = f^{-1}(0) = \{x \in \mathbb{A}^m \mid f(x) = 0\},$$
where $x = (x_1, \ldots, x_m)$.
Thus, it is interesting to consider affine forms, which are just affine maps $f\colon E \to \mathbb{R}$
from an affine space to $\mathbb{R}$. Unlike linear forms $f^*$, for which $\mathrm{Ker}\, f^*$ is never empty (since it
always contains the vector 0), it is possible that $f^{-1}(0) = \emptyset$ for an affine form $f$. Given an
affine map $f\colon E \to \mathbb{R}$, we also denote $f^{-1}(0)$ by $\mathrm{Ker}\, f$, and we call it the kernel of $f$. Recall
that an (affine) hyperplane is an affine subspace of codimension 1. The relationship between
affine hyperplanes and affine forms is given by the following lemma.
Lemma 19.13. Let $E$ be an affine space. The following properties hold:
(a) Given any nonconstant affine form $f\colon E \to \mathbb{R}$, its kernel $H = \mathrm{Ker}\, f$ is a hyperplane.
(b) For any hyperplane $H$ in $E$, there is a nonconstant affine form $f\colon E \to \mathbb{R}$ such that
$H = \mathrm{Ker}\, f$. For any other affine form $g\colon E \to \mathbb{R}$ such that $H = \mathrm{Ker}\, g$, there is some
$\lambda \in \mathbb{R}$ such that $g = \lambda f$ (with $\lambda \neq 0$).
(c) Given any hyperplane $H$ in $E$ and any (nonconstant) affine form $f\colon E \to \mathbb{R}$ such that
$H = \mathrm{Ker}\, f$, every hyperplane $H'$ parallel to $H$ is defined by a nonconstant affine form
$g$ such that $g(a) = f(a) - \lambda$, for all $a \in E$ and some $\lambda \in \mathbb{R}$.
Proof. The proof is straightforward, and is omitted. It is also given in Gallier [43].
When $E$ is of dimension $n$, given an affine frame $(a_0, (u_1, \ldots, u_n))$ of $E$ with origin
$a_0$, recall from Definition 19.5 that every point of $E$ can be expressed uniquely as $x =
a_0 + x_1 u_1 + \cdots + x_n u_n$, where $(x_1, \ldots, x_n)$ are the coordinates of $x$ with respect to the affine
frame $(a_0, (u_1, \ldots, u_n))$.
Also recall that every linear form $f^*$ is such that $f^*(x) = \lambda_1 x_1 + \cdots + \lambda_n x_n$, for every
$x = x_1 u_1 + \cdots + x_n u_n$ and some $\lambda_1, \ldots, \lambda_n \in \mathbb{R}$. Since an affine form $f\colon E \to \mathbb{R}$ satisfies the
property $f(a_0 + x) = f(a_0) + \overrightarrow{f}(x)$, denoting $f(a_0 + x)$ by $f(x_1, \ldots, x_n)$, we see that we have
$$f(x_1, \ldots, x_n) = \lambda_1 x_1 + \cdots + \lambda_n x_n + \mu,$$
where $\mu = f(a_0) \in \mathbb{R}$ and $\lambda_1, \ldots, \lambda_n \in \mathbb{R}$. Thus, a hyperplane is the set of points whose
coordinates $(x_1, \ldots, x_n)$ satisfy the (affine) equation
$$\lambda_1 x_1 + \cdots + \lambda_n x_n + \mu = 0.$$
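The coordinate description of an affine form can be made concrete with a short sketch. It assumes the standard affine frame of $\mathbb{A}^3$ and an arbitrarily chosen form $f(x, y, z) = 2x - y + 3z - 6$; the helper names are illustrative only. The sketch recovers $\mu = f(a_0)$ and the $\lambda_i$ by evaluating $f$ at the origin and at the points $a_0 + u_i$, and checks that a barycenter of points of the kernel stays in the kernel.

```python
from fractions import Fraction

def affine_form(coeffs, mu):
    """Return the affine form f(x) = sum(coeffs[i] * x[i]) + mu."""
    def f(x):
        return sum(l * xi for l, xi in zip(coeffs, x)) + mu
    return f

def coefficients(f, n):
    """Recover (lambda_1, ..., lambda_n) and mu from the values of f."""
    origin = [Fraction(0)] * n
    mu = f(origin)                    # mu = f(a_0)
    lambdas = []
    for i in range(n):
        point = list(origin)
        point[i] = Fraction(1)        # the point a_0 + u_i
        lambdas.append(f(point) - mu)
    return lambdas, mu

f = affine_form([2, -1, 3], -6)
lambdas, mu = coefficients(f, 3)

# Three points of the hyperplane Ker f and one of their barycenters.
p, q, r = (3, 0, 0), (0, -6, 0), (0, 0, 2)
weights = (Fraction(1, 2), Fraction(1, 4), Fraction(1, 4))
bary = tuple(sum(w * pt[k] for w, pt in zip(weights, (p, q, r)))
             for k in range(3))
```

The barycenter `bary` again satisfies `f(bary) == 0`, as expected of an affine subspace.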
19.11
In this section we take a closer look at the intersection of affine subspaces. This subsection
can be omitted at first reading.
First, we need a result of linear algebra. Given a vector space $E$ and any two subspaces $M$
and $N$, there are several interesting linear maps. We have the canonical injections $i\colon M \to
M + N$ and $j\colon N \to M + N$, the canonical injections $in_1\colon M \to M \oplus N$ and $in_2\colon N \to M \oplus N$,
and thus, injections $f\colon M \cap N \to M \oplus N$ and $g\colon M \cap N \to M \oplus N$, where $f$ is the composition
of the inclusion map from $M \cap N$ to $M$ with $in_1$, and $g$ is the composition of the inclusion
map from $M \cap N$ to $N$ with $in_2$. Then, we have the maps $f + g\colon M \cap N \to M \oplus N$, and
$i - j\colon M \oplus N \to M + N$.
Lemma 19.14. Given a vector space $E$ and any two subspaces $M$ and $N$, with the definitions
above,
$$0 \longrightarrow M \cap N \xrightarrow{\;f+g\;} M \oplus N \xrightarrow{\;i-j\;} M + N \longrightarrow 0$$
is a short exact sequence, which means that $f + g$ is injective, $i - j$ is surjective, and that
$\mathrm{Im}\,(f + g) = \mathrm{Ker}\,(i - j)$. As a consequence, we have the Grassmann relation
$$\dim(M) + \dim(N) = \dim(M + N) + \dim(M \cap N).$$
Proof. It is obvious that $i - j$ is surjective and that $f + g$ is injective. Assume that $(i -
j)(u + v) = 0$, where $u \in M$, and $v \in N$. Then, $i(u) = j(v)$, and thus, by definition of $i$ and
$j$, there is some $w \in M \cap N$, such that $i(u) = j(v) = w \in M \cap N$. By definition of $f$ and
$g$, $u = f(w)$ and $v = g(w)$, and thus $\mathrm{Im}\,(f + g) = \mathrm{Ker}\,(i - j)$, as desired. The second part
of the lemma follows from standard results of linear algebra (see Artin [4], Strang [105], or
Lang [67]).
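The Grassmann relation can be checked on a small example. The sketch below is not taken from the text: it assumes that $M$ and $N$ are given by linearly independent lists of vectors in $\mathbb{Q}^4$, computes $\dim(M + N)$ as a rank, and computes a basis of $M \cap N$ independently by solving $\sum_i \alpha_i m_i - \sum_j \beta_j n_j = 0$, which is the kernel of $i - j$ written in coordinates.

```python
from fractions import Fraction

def rref(rows):
    """Reduced row echelon form over the rationals; returns (matrix, pivot columns)."""
    m = [[Fraction(x) for x in r] for r in rows]
    pivots, rk = [], 0
    for col in range(len(m[0])):
        pivot = next((r for r in range(rk, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rk], m[pivot] = m[pivot], m[rk]
        pv = m[rk][col]
        m[rk] = [x / pv for x in m[rk]]
        for r in range(len(m)):
            if r != rk and m[r][col] != 0:
                factor = m[r][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[rk])]
        pivots.append(col)
        rk += 1
    return m, pivots

def rank(rows):
    return len(rref(rows)[1])

def null_space(rows):
    """Basis of the solutions v of rows * v = 0."""
    m, pivots = rref(rows)
    ncols = len(rows[0])
    basis = []
    for fc in (c for c in range(ncols) if c not in pivots):
        v = [Fraction(0)] * ncols
        v[fc] = Fraction(1)
        for r, pc in enumerate(pivots):
            v[pc] = -m[r][fc]
        basis.append(v)
    return basis

def intersection_basis(basis_M, basis_N):
    """Basis of M cap N, assuming both input lists are linearly independent."""
    n = len(basis_M[0])
    # Column i holds m_i, column len(basis_M) + j holds -n_j.
    rows = [[m[k] for m in basis_M] + [-v[k] for v in basis_N] for k in range(n)]
    result = []
    for coeffs in null_space(rows):
        alphas = coeffs[:len(basis_M)]
        result.append(tuple(sum(a * m[k] for a, m in zip(alphas, basis_M))
                            for k in range(n)))
    return result

basis_M = [(1, 0, 0, 0), (0, 1, 0, 0), (0, 0, 1, 0)]
basis_N = [(0, 1, 1, 0), (0, 0, 0, 1)]
dim_M, dim_N = rank(basis_M), rank(basis_N)
dim_sum = rank(basis_M + basis_N)
cap = intersection_basis(basis_M, basis_N)
```

With these choices, `dim_M + dim_N` and `dim_sum + len(cap)` are both equal to 5.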
We now prove a simple lemma about the intersection of affine subspaces.
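Lemma 19.15. Given any affine space $E$, for any two nonempty affine subspaces $M$ and
$N$, the following facts hold:
(1) $M \cap N \neq \emptyset$ iff $\overrightarrow{ab} \in \overrightarrow{M} + \overrightarrow{N}$ for some $a \in M$ and some $b \in N$.
(2) $M \cap N$ consists of a single point iff $\overrightarrow{ab} \in \overrightarrow{M} + \overrightarrow{N}$ for some $a \in M$ and some $b \in N$,
and $\overrightarrow{M} \cap \overrightarrow{N} = \{0\}$.
Proof. (1) Pick any $a \in M$ and any $b \in N$, which is possible since $M$ and $N$ are nonempty.
Since $\overrightarrow{M} = \{\overrightarrow{ax} \mid x \in M\}$ and
$\overrightarrow{N} = \{\overrightarrow{by} \mid y \in N\}$, if $M \cap N \neq \emptyset$, for any $c \in M \cap N$ we have
$\overrightarrow{ab} = \overrightarrow{ac} - \overrightarrow{bc}$, with $\overrightarrow{ac} \in \overrightarrow{M}$ and $\overrightarrow{bc} \in \overrightarrow{N}$, and thus, $\overrightarrow{ab} \in \overrightarrow{M} + \overrightarrow{N}$. Conversely, assume that
$\overrightarrow{ab} \in \overrightarrow{M} + \overrightarrow{N}$ for some $a \in M$ and some $b \in N$. Then $\overrightarrow{ab} = \overrightarrow{ax} + \overrightarrow{by}$, for some $x \in M$ and
some $y \in N$. But we also have
$$\overrightarrow{ab} = \overrightarrow{ax} + \overrightarrow{xy} + \overrightarrow{yb},$$
and thus $\overrightarrow{xy} = \overrightarrow{by} - \overrightarrow{yb} = 2\,\overrightarrow{by}$. Hence $x = 2b - y$ is the barycenter of the weighted points $(b, 2)$ and
$(y, -1)$. Thus $x$ also belongs to $N$, since $N$ being an affine subspace, it is closed under
barycenters. Thus, $x \in M \cap N$, and $M \cap N \neq \emptyset$.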
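(2) Note that in general, if $M \cap N \neq \emptyset$, then
$$\overrightarrow{M \cap N} = \overrightarrow{M} \cap \overrightarrow{N},$$
because
$$\overrightarrow{M \cap N} = \{\overrightarrow{ab} \mid a, b \in M \cap N\} = \{\overrightarrow{ab} \mid a, b \in M\} \cap \{\overrightarrow{ab} \mid a, b \in N\} = \overrightarrow{M} \cap \overrightarrow{N}.$$
Since $M \cap N = c + \overrightarrow{M \cap N}$ for any $c \in M \cap N$, we have
$$M \cap N = c + \overrightarrow{M} \cap \overrightarrow{N} \quad\text{for any } c \in M \cap N.$$
From this it follows that if $M \cap N \neq \emptyset$, then $M \cap N$ consists of a single point iff $\overrightarrow{M} \cap \overrightarrow{N} = \{0\}$.
This fact together with what we proved in (1) proves (2).
(3) This is left as an easy exercise.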
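Remarks:
(1) The proof of Lemma 19.15 shows that if $M \cap N \neq \emptyset$, then $\overrightarrow{ab} \in \overrightarrow{M} + \overrightarrow{N}$ for all $a \in M$
and all $b \in N$.
(2) Lemma 19.15 implies that for any two nonempty affine subspaces $M$ and $N$, if $\overrightarrow{E} =
\overrightarrow{M} \oplus \overrightarrow{N}$, then $M \cap N$ consists of a single point. Indeed, if $\overrightarrow{E} = \overrightarrow{M} \oplus \overrightarrow{N}$, then $\overrightarrow{ab} \in \overrightarrow{M} + \overrightarrow{N}$
for all $a \in M$ and all $b \in N$, and since $\overrightarrow{M} \cap \overrightarrow{N} = \{0\}$, the result follows from part (2)
of the lemma.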
We can now state the following lemma.
Lemma 19.16. Given an affine space $E$ and any two nonempty affine subspaces $M$ and $N$,
if $S$ is the least affine subspace containing $M$ and $N$, then the following properties hold:
(1) If $M \cap N = \emptyset$, then
and
(2) If $M \cap N \neq \emptyset$, then
Proof. The proof is not difficult, using Lemma 19.15 and Lemma 19.14, but we leave it as
an exercise.
19.12 Problems
Problem 19.1. Given a triangle $(a, b, c)$, give a geometric construction of the barycenter of
the weighted points $(a, \frac14)$, $(b, \frac14)$, and $(c, \frac12)$. Give a geometric construction of the barycenter
of the weighted points $(a, \frac32)$, $(b, \frac32)$, and $(c, -2)$.
Problem 19.2. Given a tetrahedron $(a, b, c, d)$ and any two distinct points $x, y \in \{a, b, c, d\}$,
let $m_{x,y}$ be the middle of the edge $(x, y)$. Prove that the barycenter $g$ of the weighted points
$(a, \frac14)$, $(b, \frac14)$, $(c, \frac14)$, and $(d, \frac14)$ is the common intersection of the line segments $(m_{a,b}, m_{c,d})$,
$(m_{a,c}, m_{b,d})$, and $(m_{a,d}, m_{b,c})$. Show that if $g_d$ is the barycenter of the weighted points
$(a, \frac13)$, $(b, \frac13)$, $(c, \frac13)$, then $g$ is the barycenter of $(d, \frac14)$ and $(g_d, \frac34)$.
Problem 19.3. Let $E$ be a nonempty set, and $\overrightarrow{E}$ a vector space and assume that there is a
function $\Phi\colon E \times E \to \overrightarrow{E}$, such that if we denote $\Phi(a, b)$ by $\overrightarrow{ab}$, the following properties hold:
(1) $\overrightarrow{ab} + \overrightarrow{bc} = \overrightarrow{ac}$, for all $a, b, c \in E$;
(2) For every $a \in E$, the map $\Phi_a\colon E \to \overrightarrow{E}$ defined such that for every $b \in E$, $\Phi_a(b) = \overrightarrow{ab}$,
is a bijection.
Note. We showed in the text that an affine space $(E, \overrightarrow{E}, +)$ satisfies the properties stated
above. Thus, we obtain an equivalent characterization of affine spaces.
Problem 19.4. Given any three points $a, b, c$ in the affine plane $\mathbb{A}^2$, letting $(a_1, a_2)$, $(b_1, b_2)$,
and $(c_1, c_2)$ be the coordinates of $a, b, c$, with respect to the standard affine frame for $\mathbb{A}^2$,
prove that $a, b, c$ are collinear iff
$$\begin{vmatrix} a_1 & b_1 & c_1 \\ a_2 & b_2 & c_2 \\ 1 & 1 & 1 \end{vmatrix} = 0,$$
i.e., the determinant is null.
Letting $(a_0, a_1, a_2)$, $(b_0, b_1, b_2)$, and $(c_0, c_1, c_2)$ be the barycentric coordinates of $a, b, c$ with
respect to the standard affine frame for $\mathbb{A}^2$, prove that $a, b, c$ are collinear iff
$$\begin{vmatrix} a_0 & b_0 & c_0 \\ a_1 & b_1 & c_1 \\ a_2 & b_2 & c_2 \end{vmatrix} = 0.$$
Given any four points $a, b, c, d$ in the affine space $\mathbb{A}^3$, letting $(a_1, a_2, a_3)$, $(b_1, b_2, b_3)$, $(c_1, c_2, c_3)$,
and $(d_1, d_2, d_3)$ be the coordinates of $a, b, c, d$, with respect to the standard affine frame for
$\mathbb{A}^3$, prove that $a, b, c, d$ are coplanar iff
$$\begin{vmatrix} a_1 & b_1 & c_1 & d_1 \\ a_2 & b_2 & c_2 & d_2 \\ a_3 & b_3 & c_3 & d_3 \\ 1 & 1 & 1 & 1 \end{vmatrix} = 0,$$
i.e., the determinant is null.
Letting $(a_0, a_1, a_2, a_3)$, $(b_0, b_1, b_2, b_3)$, $(c_0, c_1, c_2, c_3)$, and $(d_0, d_1, d_2, d_3)$ be the barycentric
coordinates of $a, b, c, d$, with respect to the standard affine frame for $\mathbb{A}^3$, prove that $a, b, c, d$
are coplanar iff
$$\begin{vmatrix} a_0 & b_0 & c_0 & d_0 \\ a_1 & b_1 & c_1 & d_1 \\ a_2 & b_2 & c_2 & d_2 \\ a_3 & b_3 & c_3 & d_3 \end{vmatrix} = 0.$$
Problem 19.5. The function $f\colon \mathbb{A} \to \mathbb{A}^3$ given by
$$t \mapsto (t, t^2, t^3)$$
defines what is called a twisted cubic curve. Given any four pairwise distinct values $t_1, t_2, t_3, t_4$,
prove that the points $f(t_1)$, $f(t_2)$, $f(t_3)$, and $f(t_4)$ are not coplanar.
Hint. Have you heard of the Vandermonde determinant?
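A numerical experiment does not replace the proof asked for in Problem 19.5, but it can make the hint plausible. The sketch below evaluates the coplanarity determinant of Problem 19.4 (coordinates in columns, a last row of ones) on four points of the twisted cubic; the function names are illustrative only.

```python
def det(m):
    """Determinant by Laplace expansion along the first row (fine for small matrices)."""
    if len(m) == 1:
        return m[0][0]
    total = 0
    for j, entry in enumerate(m[0]):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * entry * det(minor)
    return total

def coplanarity_det(points):
    """Determinant with the point coordinates in columns and a last row of ones."""
    rows = [[p[k] for p in points] for k in range(3)]
    rows.append([1] * len(points))
    return det(rows)

def twisted_cubic(t):
    return (t, t ** 2, t ** 3)

on_curve = coplanarity_det([twisted_cubic(t) for t in (0, 1, 2, 3)])
in_plane = coplanarity_det([(0, 0, 0), (1, 0, 0), (0, 1, 0), (1, 1, 0)])
```

For the parameters 0, 1, 2, 3 the determinant `on_curve` is nonzero, while four points of the plane $z = 0$ give `in_plane == 0`.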
Problem 19.6. For any two distinct points $a, b \in \mathbb{A}^2$ of barycentric coordinates $(a_0, a_1, a_2)$
and $(b_0, b_1, b_2)$ with respect to any given affine frame $(O, i, j)$, show that the equation of the
line $\langle a, b\rangle$ determined by $a$ and $b$ is
$$\begin{vmatrix} a_0 & b_0 & x \\ a_1 & b_1 & y \\ a_2 & b_2 & z \end{vmatrix} = 0,$$
or, equivalently,
$$(a_1 b_2 - a_2 b_1)x + (a_2 b_0 - a_0 b_2)y + (a_0 b_1 - a_1 b_0)z = 0,$$
where $(x, y, z)$ are the barycentric coordinates of the generic point on the line $\langle a, b\rangle$.
Prove that the equation of a line in barycentric coordinates is of the form
$$ux + vy + wz = 0,$$
where $u \neq v$ or $v \neq w$ or $u \neq w$. Show that two equations
$$ux + vy + wz = 0 \quad\text{and}\quad u'x + v'y + w'z = 0$$
represent the same line in barycentric coordinates iff $(u', v', w') = \lambda(u, v, w)$ for some $\lambda \in \mathbb{R}$
(with $\lambda \neq 0$).
A triple $(u, v, w)$ where $u \neq v$ or $v \neq w$ or $u \neq w$ is called a system of tangential
coordinates of the line defined by the equation
$$ux + vy + wz = 0.$$
Problem 19.7. Given two lines $D$ and $D'$ in $\mathbb{A}^2$ defined by tangential coordinates $(u, v, w)$
and $(u', v', w')$ (as defined in Problem 19.6), let
$$d = \begin{vmatrix} u & v & w \\ u' & v' & w' \\ 1 & 1 & 1 \end{vmatrix} = vw' - wv' + wu' - uw' + uv' - vu'.$$
(a) Prove that $D$ and $D'$ have a unique intersection point iff $d \neq 0$, and that when it
exists, the barycentric coordinates of this intersection point are
$$\frac{1}{d}\,(vw' - wv',\; wu' - uw',\; uv' - vu').$$
(b) Letting $(O, i, j)$ be any affine frame for $\mathbb{A}^2$, recall that when $x + y + z = 0$, for any
point $a$, the vector
$$x\,\overrightarrow{aO} + y\,\overrightarrow{ai} + z\,\overrightarrow{aj}$$
is independent of $a$ and equal to
$$y\,\overrightarrow{Oi} + z\,\overrightarrow{Oj} = (y, z).$$
The triple $(x, y, z)$ such that $x + y + z = 0$ is called the barycentric coordinates of the vector
$y\,\overrightarrow{Oi} + z\,\overrightarrow{Oj}$.
$$\begin{vmatrix} u & v & w \\ u' & v' & w' \\ u'' & v'' & w'' \end{vmatrix} = 0.$$
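The formula of part (a) of Problem 19.7 lends itself to a direct computation. The following sketch, with an illustrative function name, takes two triples of tangential coordinates and returns either the normalized barycentric coordinates of the intersection point (when $d \neq 0$) or the unnormalized triple, which then sums to zero and can be read as the barycentric coordinates of a direction vector. It assumes that the two lines are distinct; for identical lines the triple is $(0, 0, 0)$.

```python
from fractions import Fraction

def intersect(line1, line2):
    """Intersection of the lines ux + vy + wz = 0 and u'x + v'y + w'z = 0.

    Returns ("point", normalized barycentric coordinates) when d != 0,
    and ("vector", barycentric coordinates summing to 0) when d == 0.
    """
    u, v, w = line1
    u2, v2, w2 = line2
    coords = (v * w2 - w * v2, w * u2 - u * w2, u * v2 - v * u2)
    d = sum(coords)   # equals the determinant d of the problem
    if d != 0:
        return "point", tuple(Fraction(c, d) for c in coords)
    return "vector", coords

# With the frame (O, i, j), a point of barycentric coordinates (x, y, z)
# has Cartesian coordinates (y, z).
origin = intersect((0, 1, 0), (0, 0, 1))    # the lines y = 0 and z = 0
middle = intersect((1, 0, 0), (0, 1, -1))   # the lines x = 0 and y = z
parallel = intersect((1, 0, 0), (0, 1, 1))  # the lines x = 0 and y + z = 0
```

In the last case the two lines have Cartesian equations $X + Y = 1$ and $X + Y = 0$, and the returned triple $(0, -1, 1)$ corresponds to their common direction $(-1, 1)$.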
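Problem 19.8. Let $(A, B, C)$ be a triangle in $\mathbb{A}^2$. Let $M, N, P$ be three points respectively
on the lines $BC$, $CA$, and $AB$, of barycentric coordinates $(0, m', m'')$, $(n, 0, n'')$, and $(p, p', 0)$,
w.r.t. the affine frame $(A, B, C)$.
(a) Assuming that $M \neq C$, $N \neq A$, and $P \neq B$, i.e., $m' n'' p \neq 0$, show that
$$\frac{\overrightarrow{MB}}{\overrightarrow{MC}}\,\frac{\overrightarrow{NC}}{\overrightarrow{NA}}\,\frac{\overrightarrow{PA}}{\overrightarrow{PB}} = -\frac{m'' n p'}{m' n'' p}.$$
(b) Prove Menelaus's theorem: The points $M, N, P$ are collinear iff
$$m'' n p' + m' n'' p = 0.$$
When $M \neq C$, $N \neq A$, and $P \neq B$, this is equivalent to
$$\frac{\overrightarrow{MB}}{\overrightarrow{MC}}\,\frac{\overrightarrow{NC}}{\overrightarrow{NA}}\,\frac{\overrightarrow{PA}}{\overrightarrow{PB}} = 1.$$
(c) Prove Ceva's theorem: The lines $AM$, $BN$, $CP$ have a unique intersection point or
are parallel iff
$$m'' n p' - m' n'' p = 0.$$
When $M \neq C$, $N \neq A$, and $P \neq B$, this is equivalent to
$$\frac{\overrightarrow{MB}}{\overrightarrow{MC}}\,\frac{\overrightarrow{NC}}{\overrightarrow{NA}}\,\frac{\overrightarrow{PA}}{\overrightarrow{PB}} = -1.$$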
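The product of ratios in Problem 19.8 can be evaluated from the barycentric coordinates alone, since $\overrightarrow{MB}/\overrightarrow{MC} = -m''/m'$, $\overrightarrow{NC}/\overrightarrow{NA} = -n/n''$, and $\overrightarrow{PA}/\overrightarrow{PB} = -p'/p$ for normalized coordinates. The sketch below uses these identities (derived here, not quoted from the text) on two examples chosen for illustration: the three midpoints of the sides, whose cevians are the medians, and three points cut out on the sides by the line $x - 2y + 3z = 0$.

```python
from fractions import Fraction as F

def ratio_product(M, N, P):
    """Product (MB/MC)(NC/NA)(PA/PB) from normalized barycentric coordinates.

    M = (0, m1, m2) on BC, N = (n0, 0, n2) on CA, P = (p0, p1, 0) on AB,
    all with respect to the affine frame (A, B, C).
    """
    _, m1, m2 = M
    n0, _, n2 = N
    p0, p1, _ = P
    return (-m2 / m1) * (-n0 / n2) * (-p1 / p0)

# Midpoints of the sides: the lines AM, BN, CP are the medians (concurrent).
ceva = ratio_product((0, F(1, 2), F(1, 2)), (F(1, 2), 0, F(1, 2)), (F(1, 2), F(1, 2), 0))

# Points where the line x - 2y + 3z = 0 meets BC, CA, AB (collinear).
menelaus = ratio_product((0, F(3, 5), F(2, 5)), (F(3, 2), 0, F(-1, 2)), (F(2, 3), F(1, 3), 0))
```

The first product is $-1$ (Ceva) and the second is $1$ (Menelaus).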
Problem 19.9. This problem uses notions and results from Problems 19.6 and 19.7. In view
of (a) and (b) of Problem 19.7, it is natural to extend the notion of barycentric coordinates
of a point in $\mathbb{A}^2$ as follows. Given any affine frame $(a, b, c)$ in $\mathbb{A}^2$, we will say that the
barycentric coordinates $(x, y, z)$ of a point $M$, where $x + y + z = 1$, are the normalized
barycentric coordinates of $M$. Then, any triple $(x, y, z)$ such that $x + y + z \neq 0$ is also called
a system of barycentric coordinates for the point of normalized barycentric coordinates
$$\frac{1}{x + y + z}\,(x, y, z).$$
With this convention, the intersection of the two lines $D$ and $D'$ is either a point or a vector,
in both cases of barycentric coordinates
$$(vw' - wv',\; wu' - uw',\; uv' - vu').$$
When the above is a vector, we can think of it as a point at infinity (in the direction of the
line defined by that vector).
Let $(D_0, D_0')$, $(D_1, D_1')$, and $(D_2, D_2')$ be three pairs of six distinct lines, such that the
four lines belonging to any union of two of the above pairs are neither parallel nor concurrent
(have a common intersection point). If $D_0$ and $D_0'$ have a unique intersection point, let $M$ be
this point, and if $D_0$ and $D_0'$ are parallel, let $M$ denote a nonnull vector defining the common
direction of $D_0$ and $D_0'$. In either case, let $(m, m', m'')$ be the barycentric coordinates of $M$,
as explained at the beginning of the problem. We call $M$ the intersection of $D_0$ and $D_0'$.
Similarly, define $N = (n, n', n'')$ as the intersection of $D_1$ and $D_1'$, and $P = (p, p', p'')$ as the
intersection of $D_2$ and $D_2'$.
Prove that
$$\begin{vmatrix} m & n & p \\ m' & n' & p' \\ m'' & n'' & p'' \end{vmatrix} = 0$$
iff either
(i) $(D_0, D_0')$, $(D_1, D_1')$, and $(D_2, D_2')$ are pairs of parallel lines; or
(ii) the lines of some pair $(D_i, D_i')$ are parallel, each pair $(D_j, D_j')$ (with $j \neq i$) has a unique
intersection point, and these two intersection points are distinct and determine a line
parallel to the lines of the pair $(D_i, D_i')$; or
(iii) each pair $(D_i, D_i')$ ($i = 0, 1, 2$) has a unique intersection point, and these points $M, N, P$
are distinct and collinear.
Problem 19.10. Prove the following version of Desargues's theorem. Let $A, B, C, A', B', C'$
be six distinct points of $\mathbb{A}^2$. If no three of these points are collinear, then the lines $AA'$, $BB'$,
and $CC'$ are parallel or concurrent iff the intersection points $M, N, P$ (in the sense of Problem
19.7) of the pairs of lines $(BC, B'C')$, $(CA, C'A')$, and $(AB, A'B')$ are collinear in the sense
of Problem 19.9.
Problem 19.11. Prove the following version of Pappus's theorem. Let $D$ and $D'$ be distinct
lines, and let $A, B, C$ and $A', B', C'$ be distinct points respectively on $D$ and $D'$. If these
points are all distinct from the intersection of $D$ and $D'$ (if it exists), then the intersection
points (in the sense of Problem 19.7) of the pairs of lines $(BC', CB')$, $(CA', AC')$, and
$(AB', BA')$ are collinear in the sense of Problem 19.9.
Problem 19.12. The purpose of this problem is to prove Pascal's theorem for the nondegenerate conics. In the affine plane $\mathbb{A}^2$, a conic is the set of points of coordinates $(x, y)$ such
that
$$\alpha x^2 + \beta y^2 + 2\gamma xy + 2\delta x + 2\lambda y + \mu = 0,$$
where $\alpha \neq 0$ or $\beta \neq 0$ or $\gamma \neq 0$. We can write the equation of the conic as
$$(x, y, 1) \begin{pmatrix} \alpha & \gamma & \delta \\ \gamma & \beta & \lambda \\ \delta & \lambda & \mu \end{pmatrix} \begin{pmatrix} x \\ y \\ 1 \end{pmatrix} = 0.$$
If we now use barycentric coordinates $(x, y, z)$ (where $x + y + z = 1$), the column vector $(x, y, 1)^\top$ is equal to $CX$, where
$$B = \begin{pmatrix} \alpha & \gamma & \delta \\ \gamma & \beta & \lambda \\ \delta & \lambda & \mu \end{pmatrix}, \quad
C = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 1 & 1 & 1 \end{pmatrix}, \quad
X = \begin{pmatrix} x \\ y \\ z \end{pmatrix}.$$
(a) Letting $A = C^\top B C$, prove that the equation of the conic becomes
$$X^\top A X = 0.$$
Prove that $A$ is symmetric, that $\det(A) = \det(B)$, and that $X^\top A X$ is homogeneous of degree
2. The equation $X^\top A X = 0$ is called the homogeneous equation of the conic.
We say that a conic of homogeneous equation $X^\top A X = 0$ is nondegenerate if $\det(A) \neq 0$,
and degenerate if $\det(A) = 0$. Show that this condition does not depend on the choice of the
affine frame.
(b) Given an affine frame $(A, B, C)$, prove that any conic passing through $A, B, C$ has
an equation of the form
$$ayz + bxz + cxy = 0.$$
Prove that a conic containing more than one point is degenerate iff it contains three distinct
collinear points. In this case, the conic is the union of two lines.
(c) Prove Pascal's theorem. Given any six distinct points $A, B, C, A', B', C'$, if no three of
the above points are collinear, then a nondegenerate conic passes through these six points iff
the intersection points $M, N, P$ (in the sense of Problem 19.7) of the pairs of lines $(BC', CB')$,
$(CA', AC')$ and $(AB', BA')$ are collinear in the sense of Problem 19.9.
Hint. Use the affine frame $(A, B, C)$, and let $(a, a', a'')$, $(b, b', b'')$, and $(c, c', c'')$ be the
barycentric coordinates of $A', B', C'$ respectively, and show that $M, N, P$ have barycentric
coordinates
$$(bc, cb', c''b), \quad (c'a, c'a', c''a'), \quad (ab'', a''b', a''b'').$$
Problem 19.13. The centroid of a triangle $(a, b, c)$ is the barycenter of $(a, \frac13)$, $(b, \frac13)$, $(c, \frac13)$.
If an affine map takes the vertices of triangle $\Delta_1 = \{(0, 0), (6, 0), (0, 9)\}$ to the vertices of
triangle $\Delta_2 = \{(1, 1), (5, 4), (3, 1)\}$, does it also take the centroid of $\Delta_1$ to the centroid of
$\Delta_2$? Justify your answer.
Problem 19.14. Let $E$ be an affine space over $\mathbb{R}$, and let $(a_1, \ldots, a_n)$ be any $n \geq 3$ points
in $E$. Let $(\lambda_1, \ldots, \lambda_n)$ be any $n$ scalars in $\mathbb{R}$, with $\lambda_1 + \cdots + \lambda_n = 1$. Show that there must
be some $i$, $1 \leq i \leq n$, such that $\lambda_i \neq 1$. To simplify the notation, assume that $\lambda_1 \neq 1$. Show
that the barycenter $\lambda_1 a_1 + \cdots + \lambda_n a_n$ can be obtained by first determining the barycenter $b$
of the $n - 1$ points $a_2, \ldots, a_n$ assigned some appropriate weights, and then the barycenter of
$a_1$ and $b$ assigned the weights $\lambda_1$ and $\lambda_2 + \cdots + \lambda_n$. From this, show that the barycenter of
any $n \geq 3$ points can be determined by repeated computations of barycenters of two points.
Deduce from the above that a nonempty subset $V$ of $E$ is an affine subspace iff whenever $V$
contains any two points $x, y \in V$, then $V$ contains the entire line $(1 - \lambda)x + \lambda y$, $\lambda \in \mathbb{R}$.
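The reduction described in Problem 19.14 can be turned into a short recursive procedure. The sketch below works over the rationals and is only one possible organization, with illustrative names: it picks an index $i$ with $\lambda_i \neq 1$, computes the barycenter $b$ of the remaining points with weights $\lambda_j/(1 - \lambda_i)$, and finishes with the two-point barycenter of $(a_i, \lambda_i)$ and $(b, 1 - \lambda_i)$.

```python
from fractions import Fraction

def bary2(p, q, s, t):
    """Barycenter of the weighted points (p, s) and (q, t), with s + t = 1."""
    assert s + t == 1
    return tuple(s * pi + t * qi for pi, qi in zip(p, q))

def barycenter(points, weights):
    """Barycenter of weighted points (weights summing to 1) via two-point barycenters."""
    assert sum(weights) == 1
    if len(points) == 1:
        return tuple(Fraction(c) for c in points[0])
    # Over Q (or R), some weight differs from 1 as soon as there are >= 2 points.
    i = next(k for k, w in enumerate(weights) if w != 1)
    rest_points = points[:i] + points[i + 1:]
    rest_weights = [w / (1 - weights[i]) for w in weights[:i] + weights[i + 1:]]
    b = barycenter(rest_points, rest_weights)
    return bary2(points[i], b, weights[i], 1 - weights[i])

pts = [(0, 0), (4, 0), (0, 4), (4, 4)]
wts = [Fraction(1), Fraction(1), Fraction(-2), Fraction(1)]
stepwise = barycenter(pts, wts)
direct = tuple(sum(w * p[k] for w, p in zip(wts, pts)) for k in range(2))
```

The example deliberately includes weights equal to 1, so that the first point cannot be split off directly.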
Problem 19.15. Assume that $K$ is a field such that $2 = 1 + 1 \neq 0$, and let $E$ be an affine
space over $K$. In the case where $\lambda_1 + \cdots + \lambda_n = 1$ and $\lambda_i = 1$, for $1 \leq i \leq n$ and $n \geq 3$,
show that the barycenter $a_1 + a_2 + \cdots + a_n$ can still be computed by repeated computations
of barycenters of two points.
Finally, assume that the field $K$ contains at least three elements (thus, there is some
$\mu \in K$ such that $\mu \neq 0$ and $\mu \neq 1$, but $2 = 1 + 1 = 0$ is possible). Prove that the barycenter
of any $n \geq 3$ points can be determined by repeated computations of barycenters of two
points. Prove that a nonempty subset $V$ of $E$ is an affine subspace iff whenever $V$ contains
any two points $x, y \in V$, then $V$ contains the entire line $(1 - \lambda)x + \lambda y$, $\lambda \in K$.
Hint. When $2 = 0$, $\lambda_1 + \cdots + \lambda_n = 1$ and $\lambda_i = 1$, for $1 \leq i \leq n$, show that $n$ must be
odd, and that the problem reduces to computing the barycenter of three points in two steps
involving two barycenters. Since there is some $\mu \in K$ such that $\mu \neq 0$ and $\mu \neq 1$, note that
$\mu^{-1}$ and $(1 - \mu)^{-1}$ both exist, and use the fact that
$$\frac{-\mu}{1 - \mu} + \frac{1}{1 - \mu} = 1.$$
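Problem 19.16. (i) Let $(a, b, c)$ be three points in $\mathbb{A}^2$, and assume that $(a, b, c)$ are not
collinear. For any point $x \in \mathbb{A}^2$, if $x = \lambda_0 a + \lambda_1 b + \lambda_2 c$, where $(\lambda_0, \lambda_1, \lambda_2)$ are the barycentric
coordinates of $x$ with respect to $(a, b, c)$, show that
$$\lambda_0 = \frac{\det(\overrightarrow{xb}, \overrightarrow{bc})}{\det(\overrightarrow{ab}, \overrightarrow{ac})}, \quad
\lambda_1 = \frac{\det(\overrightarrow{ax}, \overrightarrow{ac})}{\det(\overrightarrow{ab}, \overrightarrow{ac})}, \quad
\lambda_2 = \frac{\det(\overrightarrow{ab}, \overrightarrow{ax})}{\det(\overrightarrow{ab}, \overrightarrow{ac})}.$$
Conclude that $\lambda_0, \lambda_1, \lambda_2$ are certain signed ratios of the areas of the triangles $(a, b, c)$, $(x, a, b)$,
$(x, a, c)$, and $(x, b, c)$.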
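The determinant formulas of part (i) translate directly into a computation of barycentric coordinates in the plane. The following sketch assumes integer Cartesian coordinates with respect to the standard affine frame; the function names are illustrative only.

```python
from fractions import Fraction

def vec(p, q):
    """The vector from point p to point q."""
    return (q[0] - p[0], q[1] - p[1])

def det2(u, v):
    return u[0] * v[1] - u[1] * v[0]

def barycentric(a, b, c, x):
    """Barycentric coordinates of x with respect to the triangle (a, b, c)."""
    d = det2(vec(a, b), vec(a, c))
    if d == 0:
        raise ValueError("a, b, c are collinear")
    l0 = Fraction(det2(vec(x, b), vec(b, c)), d)
    l1 = Fraction(det2(vec(a, x), vec(a, c)), d)
    l2 = Fraction(det2(vec(a, b), vec(a, x)), d)
    return l0, l1, l2

a, b, c, x = (0, 0), (4, 0), (0, 4), (1, 2)
l0, l1, l2 = barycentric(a, b, c, x)
# Rebuild x from its barycentric coordinates as a consistency check.
rebuilt = tuple(l0 * a[k] + l1 * b[k] + l2 * c[k] for k in range(2))
```

For this triangle the coordinates of $x = (1, 2)$ are $(\frac14, \frac14, \frac12)$, they sum to 1, and `rebuilt` gives back `x`.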
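(ii) Let $(a, b, c)$ be three points in $\mathbb{A}^3$, and assume that $(a, b, c)$ are not collinear. For any
point $x$ in the plane determined by $(a, b, c)$, if $x = \lambda_0 a + \lambda_1 b + \lambda_2 c$, where $(\lambda_0, \lambda_1, \lambda_2)$ are the
barycentric coordinates of $x$ with respect to $(a, b, c)$, show that
$$\lambda_0 = \frac{\overrightarrow{xb} \times \overrightarrow{bc}}{\overrightarrow{ab} \times \overrightarrow{ac}}, \quad
\lambda_1 = \frac{\overrightarrow{ax} \times \overrightarrow{ac}}{\overrightarrow{ab} \times \overrightarrow{ac}}, \quad
\lambda_2 = \frac{\overrightarrow{ab} \times \overrightarrow{ax}}{\overrightarrow{ab} \times \overrightarrow{ac}}.$$
Given any point $O$ not in the plane of the triangle $(a, b, c)$, prove that
$$\lambda_1 = \frac{\det(\overrightarrow{Oa}, \overrightarrow{Ox}, \overrightarrow{Oc})}{\det(\overrightarrow{Oa}, \overrightarrow{Ob}, \overrightarrow{Oc})}, \quad
\lambda_2 = \frac{\det(\overrightarrow{Oa}, \overrightarrow{Ob}, \overrightarrow{Ox})}{\det(\overrightarrow{Oa}, \overrightarrow{Ob}, \overrightarrow{Oc})},$$
and
$$\lambda_0 = \frac{\det(\overrightarrow{Ox}, \overrightarrow{Ob}, \overrightarrow{Oc})}{\det(\overrightarrow{Oa}, \overrightarrow{Ob}, \overrightarrow{Oc})}.$$
(iii) Let $(a, b, c, d)$ be four points in $\mathbb{A}^3$, and assume that $(a, b, c, d)$ are not coplanar. For
any point $x \in \mathbb{A}^3$, if $x = \lambda_0 a + \lambda_1 b + \lambda_2 c + \lambda_3 d$, where $(\lambda_0, \lambda_1, \lambda_2, \lambda_3)$ are the barycentric
coordinates of $x$ with respect to $(a, b, c, d)$, show that
$$\lambda_1 = \frac{\det(\overrightarrow{ax}, \overrightarrow{ac}, \overrightarrow{ad})}{\det(\overrightarrow{ab}, \overrightarrow{ac}, \overrightarrow{ad})}, \quad
\lambda_2 = \frac{\det(\overrightarrow{ab}, \overrightarrow{ax}, \overrightarrow{ad})}{\det(\overrightarrow{ab}, \overrightarrow{ac}, \overrightarrow{ad})}, \quad
\lambda_3 = \frac{\det(\overrightarrow{ab}, \overrightarrow{ac}, \overrightarrow{ax})}{\det(\overrightarrow{ab}, \overrightarrow{ac}, \overrightarrow{ad})},$$
and
$$\lambda_0 = \frac{\det(\overrightarrow{xb}, \overrightarrow{bc}, \overrightarrow{bd})}{\det(\overrightarrow{ab}, \overrightarrow{ac}, \overrightarrow{ad})}.$$
Conclude that $\lambda_0, \lambda_1, \lambda_2, \lambda_3$ are certain signed ratios of the volumes of the five tetrahedra
$(a, b, c, d)$, $(x, a, b, c)$, $(x, a, b, d)$, $(x, a, c, d)$, and $(x, b, c, d)$.
(iv) Let $(a_0, \ldots, a_m)$ be $m + 1$ points in $\mathbb{A}^m$, and assume that they are affinely independent.
For any point $x \in \mathbb{A}^m$, if $x = \lambda_0 a_0 + \cdots + \lambda_m a_m$, where $(\lambda_0, \ldots, \lambda_m)$ are the barycentric
coordinates of $x$ with respect to $(a_0, \ldots, a_m)$, show that
$$\lambda_i = \frac{\det(\overrightarrow{a_0a_1}, \ldots, \overrightarrow{a_0a_{i-1}}, \overrightarrow{a_0x}, \overrightarrow{a_0a_{i+1}}, \ldots, \overrightarrow{a_0a_m})}{\det(\overrightarrow{a_0a_1}, \ldots, \overrightarrow{a_0a_{i-1}}, \overrightarrow{a_0a_i}, \overrightarrow{a_0a_{i+1}}, \ldots, \overrightarrow{a_0a_m})}$$
for every $i$, $1 \leq i \leq m$, and
$$\lambda_0 = \frac{\det(\overrightarrow{xa_1}, \overrightarrow{a_1a_2}, \ldots, \overrightarrow{a_1a_m})}{\det(\overrightarrow{a_0a_1}, \ldots, \overrightarrow{a_0a_i}, \ldots, \overrightarrow{a_0a_m})}.$$
Conclude that $\lambda_i$ is the signed ratio of the volumes of the simplexes $(a_0, \ldots, x, \ldots, a_m)$ and
$(a_0, \ldots, a_i, \ldots, a_m)$, where $0 \leq i \leq m$.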
Problem 19.17. With respect to the standard affine frame for the plane $\mathbb{A}^2$, consider the
three geometric transformations $f_1, f_2, f_3$ defined by
$$x' = -\frac{1}{4}x - \frac{\sqrt{3}}{4}y + \frac{3}{4}, \quad y' = \frac{\sqrt{3}}{4}x - \frac{1}{4}y + \frac{\sqrt{3}}{4},$$
$$x' = -\frac{1}{4}x + \frac{\sqrt{3}}{4}y - \frac{3}{4}, \quad y' = -\frac{\sqrt{3}}{4}x - \frac{1}{4}y + \frac{\sqrt{3}}{4},$$
$$x' = \frac{1}{2}x, \quad y' = \frac{1}{2}y + \frac{\sqrt{3}}{2}.$$
(a) Prove that these maps are affine. Can you describe geometrically what their action
is (rotation, translation, scaling)?
(b) Given any polygonal line $L$, define the following sequence of polygonal lines:
$$S_0 = L, \qquad S_{n+1} = f_1(S_n) \cup f_2(S_n) \cup f_3(S_n).$$
Construct $S_1$ starting from the line segment $L = ((-1, 0), (1, 0))$.
Can you figure out what $S_n$ looks like in general? (You may want to write a computer
program.) Do you think that $S_n$ has a limit?
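A possible starting point for the computer program suggested in Problem 19.17 is sketched below. The signs and the $\sqrt{3}$ factors of the three maps were damaged in this copy of the text, so the coefficients used here are an assumption: $f_1$ and $f_2$ are taken to be rotations by $\pm 120°$ composed with a scaling by $\frac12$ and a translation, and $f_3$ a scaling by $\frac12$ followed by a translation, with $L = ((-1, 0), (1, 0))$. Under this reading $S_1$ is a connected polygonal line from $(-1, 0)$ to $(1, 0)$.

```python
import math

SQRT3 = math.sqrt(3.0)

# Assumed coefficients (see the note above); floating point is used throughout.
def f1(p):
    x, y = p
    return (-x / 4 - SQRT3 * y / 4 + 3 / 4, SQRT3 * x / 4 - y / 4 + SQRT3 / 4)

def f2(p):
    x, y = p
    return (-x / 4 + SQRT3 * y / 4 - 3 / 4, -SQRT3 * x / 4 - y / 4 + SQRT3 / 4)

def f3(p):
    x, y = p
    return (x / 2, y / 2 + SQRT3 / 2)

def step(polylines):
    """One step S_{n+1} = f1(S_n) U f2(S_n) U f3(S_n); a polyline is a list of points."""
    return [[f(p) for p in line] for f in (f1, f2, f3) for line in polylines]

def iterate(line, n):
    """Return S_n as a list of polylines, starting from S_0 = [line]."""
    s = [line]
    for _ in range(n):
        s = step(s)
    return s

L = [(-1.0, 0.0), (1.0, 0.0)]
S1 = iterate(L, 1)
S4 = iterate(L, 4)
```

Each step triples the number of segments, so `S4` consists of 81 segments; the lists of points can be passed to any plotting tool to see what $S_n$ looks like.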
Problem 19.18. In the plane $\mathbb{A}^2$, with respect to the standard affine frame, a point of
coordinates $(x, y)$ can be represented as the complex number $z = x + iy$. Consider the set
of geometric transformations of the form
$$z \mapsto az + b,$$
where $a, b$ are complex numbers such that $a \neq 0$.
(a) Prove that these maps are affine. Describe what these maps do geometrically.
(b) Prove that the above set of maps is a group under composition.
(c) Consider the set of geometric transformations of the form
$$z \mapsto az + b \quad\text{or}\quad z \mapsto a\overline{z} + b,$$
where $a, b$ are complex numbers such that $a \neq 0$, and where $\overline{z} = x - iy$ if $z = x + iy$.
Describe what these maps do geometrically. Prove that these maps are affine and that this
set of maps is a group under composition.
Problem 19.19. Given a group $G$, a subgroup $H$ of $G$ is called a normal subgroup of $G$ iff
$xHx^{-1} = H$ for all $x \in G$ (where $xHx^{-1} = \{xhx^{-1} \mid h \in H\}$).
(i) Given any two subgroups $H$ and $K$ of a group $G$, let
$$HK = \{hk \mid h \in H,\ k \in K\}.$$
Prove that every $x \in HK$ can be written in a unique way as $x = hk$ for $h \in H$ and $k \in K$
iff $H \cap K = \{1\}$, where 1 is the identity element of $G$.
(ii) If $H$ and $K$ are subgroups of $G$, and $H$ is a normal subgroup of $G$, prove that $HK$
is a subgroup of $G$. Furthermore, if $G = HK$ and $H \cap K = \{1\}$, prove that $G$ is isomorphic
to $H \times K$ under the multiplication operation
$$(h_1, k_1) \cdot (h_2, k_2) = (h_1 k_1 h_2 k_1^{-1},\ k_1 k_2).$$
When $G = HK$, where $H, K$ are subgroups of $G$, $H$ is a normal subgroup of $G$, and
$H \cap K = \{1\}$, we say that $G$ is the semidirect product of $H$ and $K$.
(iii) Let $(E, \overrightarrow{E})$ be an affine space. Recall that the affine group of $E$, denoted by $\mathrm{GA}(E)$,
is the set of affine bijections of $E$, and that the linear group of $\overrightarrow{E}$, denoted by $\mathrm{GL}(\overrightarrow{E})$, is
the group of bijective linear maps of $\overrightarrow{E}$. The map $f \mapsto \overrightarrow{f}$ defines a group homomorphism
$L\colon \mathrm{GA}(E) \to \mathrm{GL}(\overrightarrow{E})$, and the kernel of this map is the set of translations on $E$, denoted
as $T(E)$. Prove that $T(E)$ is a normal subgroup of $\mathrm{GA}(E)$.
(iv) For any $a \in E$, let
$$\mathrm{GA}_a(E) = \{f \in \mathrm{GA}(E) \mid f(a) = a\},$$
the set of affine bijections leaving $a$ fixed. Prove that $\mathrm{GA}_a(E)$ is a subgroup of $\mathrm{GA}(E)$,
and that $\mathrm{GA}_a(E)$ is isomorphic to $\mathrm{GL}(\overrightarrow{E})$. Prove that $\mathrm{GA}(E)$ is isomorphic to the semidirect
product of $T(E)$ and $\mathrm{GA}_a(E)$.
Hint. Note that if $u = \overrightarrow{f(a)a}$ and $t_u$ is the translation associated with the vector $u$, then
$t_u \circ f \in \mathrm{GA}_a(E)$ (where the translation $t_u$ is defined such that $t_u(a) = a + u$ for every $a \in E$).
(v) Given a group $G$, let $\mathrm{Aut}(G)$ denote the set of isomorphisms $f\colon G \to G$. Prove
that the set $\mathrm{Aut}(G)$ is a group under composition (called the group of automorphisms of $G$).
(vi) Given any two groups $H$ and $K$ and a homomorphism $\theta\colon K \to \mathrm{Aut}(H)$, we define $H \times_\theta K$
as the set $H \times K$ under the multiplication operation
$$(h_1, k_1) \cdot (h_2, k_2) = (h_1\,\theta(k_1)(h_2),\ k_1 k_2).$$
Prove that $H \times_\theta K$ is a group.
Hint. The inverse of $(h, k)$ is $(\theta(k^{-1})(h^{-1}),\ k^{-1})$.
Prove that the group $H \times_\theta K$ is the semidirect product of the subgroups
$\{(h, 1) \mid h \in H\}$ and $\{(1, k) \mid k \in K\}$. The group $H \times_\theta K$ is also called the semidirect
product of $H$ and $K$ relative to $\theta$.
Note. It is natural to identify $\{(h, 1) \mid h \in H\}$ with $H$ and $\{(1, k) \mid k \in K\}$ with $K$.
If $G$ is the semidirect product of two subgroups $H$ and $K$ as defined in (ii), prove that
the map $\gamma\colon K \to \mathrm{Aut}(H)$ defined by conjugation such that
$$\gamma(k)(h) = khk^{-1}$$
is a homomorphism, and that $G$ is isomorphic to $H \times_\gamma K$.
(vii) Let $\mathrm{SL}(\overrightarrow{E})$ be the subgroup of $\mathrm{GL}(\overrightarrow{E})$ consisting of the linear maps such that
$\det(f) = 1$ (the special linear group of $\overrightarrow{E}$), and let $\mathrm{SA}(E)$ be the subgroup of $\mathrm{GA}(E)$ (the
special affine group of $E$) consisting of the affine maps $f$ such that $\overrightarrow{f} \in \mathrm{SL}(\overrightarrow{E})$. Prove that
(viii) Assume that $(E, \overrightarrow{E})$ is a Euclidean affine space. Let $\mathrm{SO}(\overrightarrow{E})$ be the special orthogonal
group of $\overrightarrow{E}$ (the isometries with determinant $+1$), and let $\mathrm{SE}(E)$ be the subgroup
of $\mathrm{SA}(E)$ (the special Euclidean group of $E$) consisting of the affine isometries $f$ such that
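Problem 19.20. The purpose of this problem is to study certain affine maps of $\mathbb{A}^2$.
(1) Consider affine maps of the form
$$\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} \mapsto \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} + \begin{pmatrix} b_1 \\ b_2 \end{pmatrix}.$$
Prove that such maps have a unique fixed point $c$ if $\theta \neq 2k\pi$, for all integers $k$. Show that
these are rotations of center $c$, which means that with respect to a frame with origin $c$ (the
unique fixed point), these affine maps are represented by rotation matrices.
(2) Consider affine maps of the form
$$\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} \mapsto \begin{pmatrix} \lambda\cos\theta & -\lambda\sin\theta \\ \mu\sin\theta & \mu\cos\theta \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} + \begin{pmatrix} b_1 \\ b_2 \end{pmatrix}.$$
Prove that such maps have a unique fixed point iff $(\lambda + \mu)\cos\theta \neq 1 + \lambda\mu$. Prove that if
$\lambda\mu = 1$ and $\lambda > 0$, there is some angle $\theta$ for which either there is no fixed point, or there are
infinitely many fixed points.
(3) Prove that the affine map
$$\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} \mapsto \begin{pmatrix} 8/5 & -6/5 \\ 3/10 & 2/5 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} + \begin{pmatrix} 1 \\ 1 \end{pmatrix}$$
has no fixed point.
(4) Prove that an arbitrary affine map
$$\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} \mapsto \begin{pmatrix} a_1 & a_2 \\ a_3 & a_4 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} + \begin{pmatrix} b_1 \\ b_2 \end{pmatrix}$$
has a unique fixed point iff the matrix
$$\begin{pmatrix} a_1 - 1 & a_2 \\ a_3 & a_4 - 1 \end{pmatrix}$$
is invertible.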
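The criterion of part (4) can be explored with a few lines of code. The sketch below, whose function name is illustrative, solves $(A - I)x = -b$ by Cramer's rule over the rationals and returns `None` when $A - I$ is singular, a case that covers both "no fixed point" and "infinitely many fixed points". The second example uses the matrix of part (3) with the entry $-6/5$; that sign is a reconstruction, chosen because it is the one that makes $A - I$ singular.

```python
from fractions import Fraction as F

def fixed_point(A, b):
    """Unique fixed point of x -> A x + b in the plane, or None if A - I is singular."""
    (a1, a2), (a3, a4) = A
    m11, m12, m21, m22 = a1 - 1, a2, a3, a4 - 1
    det = m11 * m22 - m12 * m21
    if det == 0:
        return None
    # Solve (A - I) x = -b by Cramer's rule.
    r1, r2 = -b[0], -b[1]
    x1 = (r1 * m22 - m12 * r2) / det
    x2 = (m11 * r2 - r1 * m21) / det
    return (x1, x2)

# Rotation by 90 degrees followed by the translation (1, 1).
rotation = ((F(0), F(-1)), (F(1), F(0)))
c = fixed_point(rotation, (F(1), F(1)))

# The map of part (3), under the sign assumption stated above.
no_unique = fixed_point(((F(8, 5), F(-6, 5)), (F(3, 10), F(2, 5))), (F(1), F(1)))
```

For the rotation the center found is $c = (0, 1)$, which is indeed mapped to itself.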
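Problem 19.21. Let $(E, \overrightarrow{E})$ be any affine space of finite dimension. For every affine map
$f\colon E \to E$, let $\mathrm{Fix}(f) = \{a \in E \mid f(a) = a\}$ be the set of fixed points of $f$.
(i) Prove that if $\mathrm{Fix}(f) \neq \emptyset$, then $\mathrm{Fix}(f)$ is an affine subspace of $E$ such that for every
$b \in \mathrm{Fix}(f)$,
$$\overrightarrow{\Omega f(a)} - \overrightarrow{\Omega a} = \overrightarrow{\Omega f(\Omega)} + \overrightarrow{f}(\overrightarrow{\Omega a}) - \overrightarrow{\Omega a},$$
for any two points $\Omega, a \in E$.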
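Problem 19.22. Given two affine spaces $(E, \overrightarrow{E})$ and $(F, \overrightarrow{F})$, let $\mathcal{A}(E, F)$ be the set of all
affine maps $f\colon E \to F$.
(i) Prove that the set $\mathcal{A}(E, \overrightarrow{F})$ (viewing $\overrightarrow{F}$ as an affine space) is a vector space under
the operations $f + g$ and $\lambda f$ defined such that
$$(f + g)(a) = f(a) + g(a), \qquad (\lambda f)(a) = \lambda f(a),$$
for all $a \in E$.
(ii) Define an action $+\colon \mathcal{A}(E, F) \times \mathcal{A}(E, \overrightarrow{F}) \to \mathcal{A}(E, F)$ of $\mathcal{A}(E, \overrightarrow{F})$ on
$\mathcal{A}(E, F)$, such that for every $a \in E$, every $f \in \mathcal{A}(E, F)$, and every $h \in \mathcal{A}(E, \overrightarrow{F})$,
$$(f + h)(a) = f(a) + h(a).$$
Prove that this action makes $\mathcal{A}(E, F)$ into an affine space with associated vector space $\mathcal{A}(E, \overrightarrow{F})$.
Hint. Show that for any two affine maps $f, g \in \mathcal{A}(E, F)$, the map $\overrightarrow{fg}$ defined such that
$$\overrightarrow{fg}(a) = \overrightarrow{f(a)g(a)}$$
(for every $a \in E$) is affine, and thus $\overrightarrow{fg} \in \mathcal{A}(E, \overrightarrow{F})$. Furthermore, $\overrightarrow{fg}$ is the unique map in
$\mathcal{A}(E, \overrightarrow{F})$ such that $f + \overrightarrow{fg} = g$.
(iii) If $E$ has dimension $m$ and $F$ has dimension $n$, prove that $\mathcal{A}(E, \overrightarrow{F})$ has dimension
$n + mn = n(m + 1)$.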
Problem 19.23. Let $(c_1, \ldots, c_n)$ be $n \geq 3$ points in $\mathbb{A}^m$ (where $m \geq 2$). Investigate whether
there is a closed polygon with $n$ vertices $(a_1, \ldots, a_n)$ such that $c_i$ is the middle of the edge
$(a_i, a_{i+1})$ for every $i$ with $1 \leq i \leq n - 1$, and $c_n$ is the middle of the edge $(a_n, a_1)$.
Hint. The parity (odd or even) of $n$ plays an important role. When $n$ is odd, there is a
unique solution, and when $n$ is even, there are no solutions or infinitely many solutions.
Clarify under which conditions there are infinitely many solutions.
Problem 19.24. Given an affine space $E$ of dimension $n$ and an affine frame $(a_0, \ldots, a_n)$ for
$E$, let $f : E \to E$ and $g : E \to E$ be two affine maps represented by the two $(n + 1) \times (n + 1)$
matrices
\[
\begin{pmatrix} A & b \\ 0 & 1 \end{pmatrix}
\quad\text{and}\quad
\begin{pmatrix} B & c \\ 0 & 1 \end{pmatrix}
\]
w.r.t. the frame $(a_0, \ldots, a_n)$. We also say that $f$ and $g$ are represented by $(A, b)$ and $(B, c)$.
$b = \overrightarrow{a_0' f(a_0)}$ is expressed over the basis $(\overrightarrow{a_0' a_1'}, \ldots, \overrightarrow{a_0' a_n'})$, and $a_{i\,j}$ is the $i$th coefficient of
$\overrightarrow{f}(\overrightarrow{a_0 a_j})$ over the basis $(\overrightarrow{a_0' a_1'}, \ldots, \overrightarrow{a_0' a_n'})$. Given any three frames $(a_0, \ldots, a_n)$, $(a_0', \ldots, a_n')$,
and $(a_0'', \ldots, a_n'')$, for any two affine maps $f : E \to E$ and $g : E \to E$, if $f$ has the matrix
representation $(A, b)$ w.r.t. $(a_0, \ldots, a_n)$ and $(a_0', \ldots, a_n')$ and $g$ has the matrix representation
$(B, c)$ w.r.t. $(a_0', \ldots, a_n')$ and $(a_0'', \ldots, a_n'')$, prove that $g \circ f$ has the matrix representation
$(B, c)(A, b)$ w.r.t. $(a_0, \ldots, a_n)$ and $(a_0'', \ldots, a_n'')$.
(4) Given two affine frames $(a_0, \ldots, a_n)$ and $(a_0', \ldots, a_n')$ for $E$, there is a unique affine
map $h : E \to E$ such that $h(a_i) = a_i'$ for $i = 0, \ldots, n$, and we let $(P, \omega)$ be its associated
matrix representation with respect to the frame $(a_0, \ldots, a_n)$. Note that $\omega = \overrightarrow{a_0 a_0'}$, and that
the frame $(a_0, \ldots, a_n)$), prove that $g \circ f$ is represented by the matrix $(A, b)(B, c)$ w.r.t. the
frame $(a_0, \ldots, a_n)$.
Remark: Note that this is the opposite of what happens if $f$ and $g$ are both represented
by matrices w.r.t. the fixed frame $(a_0, \ldots, a_n)$, where $g \circ f$ is represented by the matrix
$(B, c)(A, b)$. The frame $(a_0', \ldots, a_n')$ can be viewed as a moving frame. The above has
applications in robotics, for example to rotation matrices expressed in terms of Euler angles,
or roll, pitch, and yaw.
Problem 19.25. (a) Let $E$ be a vector space, and let $U$ and $V$ be two subspaces of $E$ so
that they form a direct sum $E = U \oplus V$. Recall that this means that every vector $x \in E$ can
be written as $x = u + v$, for some unique $u \in U$ and some unique $v \in V$. Define the function
$p_U : E \to U$ (resp. $p_V : E \to V$) so that $p_U(x) = u$ (resp. $p_V(x) = v$), where $x = u + v$, as
explained above. Check that $p_U$ and $p_V$ are linear.
(b) Now assume that E is an affine space (nontrivial), and let U and V be affine subspaces
qU (a) = p
(a) (resp. qV (a) = p
(a)),
U
V
for every a E.
Prove that qU does not depend on the choice of V (resp. qV does not depend on the
choice of $U$). Define the map $p_U : E \to U$ (resp. $p_V : E \to V$) so that
\[ p_U(a) = a - q_V(a) \quad (\text{resp. } p_V(a) = a - q_U(a)), \]
for every $a \in E$.
Chapter 20
Polynomials, Ideals and PIDs
20.1 Multisets
This chapter contains a review of polynomials and their basic properties. First, multisets
are defined. Polynomials in one variable are defined next. The notion of a polynomial
function in one argument is defined. Polynomials in several variables are defined, and so is
the notion of a polynomial function in several arguments. The Euclidean division algorithm is
presented, and the main consequences of its existence are derived. Ideals are defined, and the
characterization of greatest common divisors (gcds) of polynomials in one variable in terms
of ideals is shown. We also prove the Bezout identity. Next, we consider the factorization of
polynomials in one variable into irreducible factors, and we show that this factorization
is unique. Roots of polynomials and their multiplicity
are defined. It is shown that a nonnull polynomial in one variable and of degree m over an
integral domain has at most m roots. The chapter ends with a brief treatment of polynomial
interpolation: Lagrange, Newton, and Hermite interpolants are introduced.
In this chapter, it is assumed that all rings considered are commutative. Recall that a
(commutative) ring $A$ is an integral domain (or an entire ring) if $1 \neq 0$, and if $ab = 0$, then
either $a = 0$ or $b = 0$, for all $a, b \in A$. This second condition is equivalent to saying that if
$a \neq 0$ and $b \neq 0$, then $ab \neq 0$. Also, recall that $a \neq 0$ is not a zero divisor if $ab \neq 0$ whenever
$b \neq 0$. Observe that a field is an integral domain.
Our goal is to define polynomials in one or more indeterminates (or variables) X1 , . . . , Xn ,
with coefficients in a ring A. This can be done in several ways, and we choose a definition
that has the advantage of extending immediately from one to several variables. First, we
need to review the notion of a (finite) multiset.
Definition 20.1. Given a set $I$, a (finite) multiset over $I$ is any function $M : I \to \mathbb{N}$ such
that $M(i) \neq 0$ for finitely many $i \in I$. The multiset $M$ such that $M(i) = 0$ for all $i \in I$ is
the empty multiset, and it is denoted by $0$. If $M(i) = k \neq 0$, we say that $i$ is a member of
$M$ of multiplicity $k$. The union $M_1 + M_2$ of two multisets $M_1$ and $M_2$ is defined such that
$(M_1 + M_2)(i) = M_1(i) + M_2(i)$, for every $i \in I$. If $I$ is finite, say $I = \{1, \ldots, n\}$, the multiset
However, beware that when $n \geq 2$, the set $\mathbb{N}^{(n)}$ of multisets cannot be identified with the
set of strings in $\{1, \ldots, n\}^*$, because multiset union is commutative, but concatenation
of strings in $\{1, \ldots, n\}^*$ is not commutative when $n \geq 2$. This is because in a multiset
$k_1 \cdot 1 + \cdots + k_n \cdot n$, the order is irrelevant, whereas in a string, the order is relevant. For
example, $2 \cdot 1 + 3 \cdot 2 = 3 \cdot 2 + 2 \cdot 1$, but $11222 \neq 22211$, as strings over $\{1, 2\}$.
Nevertheless, $\mathbb{N}^{(n)}$ and the set $\mathbb{N}^n$ of ordered $n$-tuples under component-wise addition
are isomorphic under the map
\[ k_1 \cdot 1 + \cdots + k_n \cdot n \mapsto (k_1, \ldots, k_n). \]
Thus, since the notation $(k_1, \ldots, k_n)$ is less cumbersome than $k_1 \cdot 1 + \cdots + k_n \cdot n$, it will be
preferred. We just have to remember that the order of the $k_i$ is really irrelevant.
But when $I$ is infinite, beware that $\mathbb{N}^{(I)}$ and the set $\mathbb{N}^I$ of ordered $I$-tuples are not
isomorphic.
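The contrast between multisets and strings can be checked on a small example. The following is only an illustrative sketch: it models a finite multiset over $\{1, \ldots, n\}$ with Python's `collections.Counter`, and the helper names `multiset`, `union`, and `as_tuple` are hypothetical, introduced here for illustration.

```python
from collections import Counter

def multiset(*members):
    """Finite multiset over I, stored as a map i -> multiplicity M(i)."""
    return Counter(members)

def union(m1, m2):
    """Multiset union: (M1 + M2)(i) = M1(i) + M2(i)."""
    return m1 + m2

def as_tuple(m, n):
    """Send k1*1 + ... + kn*n to the n-tuple (k1, ..., kn)."""
    return tuple(m[i] for i in range(1, n + 1))

m1 = multiset(1, 1)       # the multiset 2*1
m2 = multiset(2, 2, 2)    # the multiset 3*2

left = union(m1, m2)      # 2*1 + 3*2
right = union(m2, m1)     # 3*2 + 2*1

# Union is commutative, while string concatenation is not.
print(left == right)             # multisets agree
print("11" + "222" == "222" + "11")  # strings differ
print(as_tuple(left, 2))         # the associated 2-tuple
```

Under the map to tuples, union of multisets corresponds to component-wise addition of the tuples.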
We are now ready to define polynomials.
20.2 Polynomials
and $R = PQ$ such that
\[ c_k = \sum_{i + j = k} a_i b_j. \]
We define the polynomial $e_k$ such that $e_k(k) = 1$ and $e_k(i) = 0$ for $i \neq k$. We also denote
$e_0$ by $1$ when $k = 0$. Given a polynomial $P$, the $a_k = P(k) \in A$ are called the coefficients
of $P$. If $P$ is not the null polynomial, there is a greatest $n \geq 0$ such that $a_n \neq 0$ (and thus,
$a_k = 0$ for all $k > n$) called the degree of $P$ and denoted by $\deg(P)$. Then, $P$ is written
uniquely as
\[ P = a_0 e_0 + a_1 e_1 + \cdots + a_n e_n. \]
When $P$ is the null polynomial, we let $\deg(P) = -\infty$.
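As a minimal sketch of these definitions, a polynomial in one variable can be stored as the list of its coefficients $a_k = P(k)$, lowest index first, with the null polynomial as the empty list. The function names `trim`, `degree`, and `poly_mul` are hypothetical, and integer coefficients are assumed only for the sake of the example.

```python
def trim(p):
    """Drop trailing zero coefficients so the last entry is the leading one."""
    p = list(p)
    while p and p[-1] == 0:
        p.pop()
    return p

def degree(p):
    """Greatest n with a_n != 0, and -infinity for the null polynomial."""
    p = trim(p)
    return len(p) - 1 if p else float("-inf")

def poly_mul(p, q):
    """Product R = PQ with c_k = sum over i + j = k of a_i * b_j."""
    p, q = trim(p), trim(q)
    if not p or not q:
        return []
    r = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            r[i + j] += a * b
    return trim(r)

square = poly_mul([1, 1], [1, 1])   # (1 + X)(1 + X)
print(square, degree(square))
```

The double loop is a direct transcription of the convolution formula for $c_k$; it is not meant to be efficient.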
There is an injection of $A$ into $P_A(1)$ given by the map $a \mapsto a1$ (recall that $1$ denotes $e_0$).
There is also an injection of $\mathbb{N}$ into $P_A(1)$ given by the map $k \mapsto e_k$. Observe that $e_k = e_1^k$
(with $e_1^0 = e_0 = 1$). In order to alleviate the notation, we often denote $e_1$ by $X$, and we call
$X$ a variable (or indeterminate). Then, $e_k = e_1^k$ is denoted by $X^k$. Adopting this notation,
given a nonnull polynomial $P$ of degree $n$, if $P(k) = a_k$, $P$ is denoted by
\[ P = a_0 + a_1 X + \cdots + a_n X^n, \]
or by
\[ P = a_n X^n + a_{n-1} X^{n-1} + \cdots + a_0, \]
if this is more convenient (the order of the terms does not matter anyway). Sometimes, it
will also be convenient to write a polynomial as
\[ P = a_0 X^n + a_1 X^{n-1} + \cdots + a_n. \]
The set $P_A(1)$ is also denoted by $A[X]$ and a polynomial $P$ may be denoted by $P(X)$.
In denoting polynomials, we will use both upper-case and lower-case letters, usually, $P, Q,
R, S, p, q, r, s$, but also $f, g, h$, etc., if needed (as long as no ambiguities arise).
Given a nonnull polynomial $P$ of degree $n$, the nonnull coefficient $a_n$ is called the leading
coefficient of $P$. The coefficient $a_0$ is called the constant term of $P$. A polynomial of the
form $a_k X^k$ is called a monomial. We say that $a_k X^k$ occurs in $P$ if $a_k \neq 0$. A nonzero
polynomial $P$ of degree $n$ is called a monic polynomial (or unitary polynomial, or monic) if
$a_n = 1$, where $a_n$ is its leading coefficient, and such a polynomial can be written as
\[ P = X^n + a_{n-1} X^{n-1} + \cdots + a_0 \]
or
\[ P = X^n + a_1 X^{n-1} + \cdots + a_n. \]
The choice of the variable X to denote e1 is standard practice, but there is nothing special
about X. We could have chosen Y , Z, or any other symbol, as long as no ambiguities
arise.
Formally, the definition of $P_A(1)$ has nothing to do with $X$. The reason for using $X$ is
simply convenience. Indeed, it is more convenient to write a polynomial as $P = a_0 + a_1 X +
\cdots + a_n X^n$ rather than as $P = a_0 e_0 + a_1 e_1 + \cdots + a_n e_n$.
We have the following simple but crucial proposition.
Proposition 20.1. Given two nonnull polynomials $P(X) = a_0 + a_1 X + \cdots + a_m X^m$ of degree
$m$ and $Q(X) = b_0 + b_1 X + \cdots + b_n X^n$ of degree $n$, if either $a_m$ or $b_n$ is not a zero divisor,
then $a_m b_n \neq 0$, and thus, $PQ \neq 0$ and
\[ \deg(PQ) = \deg(P) + \deg(Q). \]
In particular, if $A$ is an integral domain, then $A[X]$ is an integral domain.
Proof. Since the coefficient of $X^{m+n}$ in $PQ$ is $a_m b_n$, and since we assumed that either $a_m$ or
$b_n$ is not a zero divisor, we have $a_m b_n \neq 0$, and thus, $PQ \neq 0$ and
\[ \deg(PQ) = \deg(P) + \deg(Q). \]
Then, it is obvious that $A[X]$ is an integral domain.
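The hypothesis on zero divisors cannot be dropped. As a hedged illustration (the choice of the rings Z/6Z and Z/7Z and the helper name `poly_mul_mod` are assumptions made here, not part of the text), the sketch below multiplies $1 + 2X$ and $1 + 3X$ with coefficients reduced modulo $n$. Modulo 6 the leading coefficients 2 and 3 are zero divisors and the degree drops, while modulo 7 the degrees add.

```python
def poly_mul_mod(p, q, n):
    """Multiply two nonnull polynomials (coefficient lists, lowest degree
    first) with coefficients taken in Z/nZ, dropping trailing zeros."""
    r = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            r[i + j] = (r[i + j] + a * b) % n
    while r and r[-1] == 0:
        r.pop()
    return r

p = [1, 2]   # 1 + 2X
q = [1, 3]   # 1 + 3X

prod6 = poly_mul_mod(p, q, 6)   # 6X^2 vanishes in Z/6Z
prod7 = poly_mul_mod(p, q, 7)   # Z/7Z is a field, degrees add

print(prod6, len(prod6) - 1)
print(prod7, len(prod7) - 1)
```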
It is easily verified that $A[X]$ is a commutative ring, with multiplicative identity $1X^0 = 1$.
It is also easily verified that A[X] satisfies all the conditions of Definition 2.9, but A[X] is
not a vector space, since A is not necessarily a field.
A structure satisfying the axioms of Definition 2.9 when K is a ring (and not necessarily a
field) is called a module. As we mentioned in Section 4.2, we will not study modules because
they fail to have some of the nice properties that vector spaces have, and thus, they are
harder to study. For example, there are modules that do not have a basis.
However, when the ring $A$ is a field, $A[X]$ is a vector space. But even when $A$ is just a
ring, the family of polynomials $(X^k)_{k \in \mathbb{N}}$ is a basis of $A[X]$, since every polynomial $P(X)$ can
be written in a unique way as $P(X) = a_0 + a_1 X + \cdots + a_n X^n$ (with $P(X) = 0$ when $P(X)$
is the null polynomial). Thus, $A[X]$ is a free module.
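Next, we want to define the notion of evaluating a polynomial $P(X)$ at some $\alpha \in A$. For
this, we need a proposition.
Proposition 20.2. Let $A, B$ be two rings and let $h : A \to B$ be a ring homomorphism.
For any $\beta \in B$, there is a unique ring homomorphism $\varphi : A[X] \to B$ extending $h$ such that
$\varphi(X) = \beta$, as in the following diagram (where we denote by $h + \beta$ the map $h + \beta : A \cup \{X\} \to B$
such that $(h + \beta)(a) = h(a)$ for all $a \in A$ and $(h + \beta)(X) = \beta$):
\[
\begin{array}{ccc}
A \cup \{X\} & \longrightarrow & A[X] \\
 & {\scriptstyle h + \beta} \searrow & \downarrow {\scriptstyle \varphi} \\
 & & B
\end{array}
\]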
Proof. Let $\varphi(0) = 0$, and for every nonnull polynomial $P(X) = a_0 + a_1 X + \cdots + a_n X^n$, let
\[ \varphi(P(X)) = h(a_0) + h(a_1)\beta + \cdots + h(a_n)\beta^n. \]
It is easily verified that $\varphi$ is the unique homomorphism $\varphi : A[X] \to B$ extending $h$ such that
$\varphi(X) = \beta$.
Taking $A = B$ in Proposition 20.2 and $h : A \to A$ the identity, for every $\alpha \in A$, there
is a unique homomorphism $\varphi_\alpha : A[X] \to A$ such that $\varphi_\alpha(X) = \alpha$, and for every polynomial
$P(X)$, we write $\varphi_\alpha(P(X))$ as $P(\alpha)$ and we call $P(\alpha)$ the value of $P(X)$ at $X = \alpha$. Thus, we
can define a function $P_A : A \to A$ such that $P_A(\alpha) = P(\alpha)$, for all $\alpha \in A$. This function is
called the polynomial function induced by $P$.
More generally, $P_B$ can be defined for any (commutative) ring $B$ such that $A \subseteq B$. In
general, it is possible that $P_A = Q_A$ for distinct polynomials $P, Q$. We will see shortly
conditions for which the map $P \mapsto P_A$ is injective. In particular, this is true for $A = \mathbb{R}$ (in
general, any infinite integral domain). We now define polynomials in $n$ variables.
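The remark that distinct polynomials may induce the same polynomial function can be illustrated over a finite ring. The sketch below is an assumption-laden example rather than part of the text: it takes $A = \mathbb{Z}/n\mathbb{Z}$, evaluates by Horner's rule, and uses the hypothetical helper names `evaluate` and `induced_function`.

```python
def evaluate(p, alpha, n):
    """Value P(alpha) in Z/nZ, for P given by its coefficient list
    (lowest degree first), computed by Horner's rule."""
    value = 0
    for a in reversed(p):
        value = (value * alpha + a) % n
    return value

def induced_function(p, n):
    """Table of the polynomial function P_A on A = Z/nZ."""
    return [evaluate(p, alpha, n) for alpha in range(n)]

p = [0, 1, 1]   # X + X^2, a nonnull polynomial
zero = []       # the null polynomial

# Over Z/2Z both induce the zero function, although p != zero.
print(induced_function(p, 2), induced_function(zero, 2))
# Over Z/3Z the two functions differ.
print(induced_function(p, 3), induced_function(zero, 3))
```

Since $\mathbb{Z}/2\mathbb{Z}$ is a finite field, this is consistent with the claim that injectivity of $P \mapsto P_A$ holds for infinite integral domains.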
Definition 20.3. Given $n \geq 1$ and a ring $A$, the set $P_A(n)$ of polynomials over $A$ in $n$
variables is the set of functions $P : \mathbb{N}^{(n)} \to A$ such that $P(k_1, \ldots, k_n) \neq 0$ for finitely many
$(k_1, \ldots, k_n) \in \mathbb{N}^{(n)}$. The polynomial such that $P(k_1, \ldots, k_n) = 0$ for all $(k_1, \ldots, k_n)$ is
the null (or zero) polynomial and it is denoted by $0$. We define addition of polynomials,
multiplication by a scalar, and multiplication of polynomials, as follows: Given any three
polynomials $P, Q, R \in P_A(n)$, letting $a_{(k_1,\ldots,k_n)} = P(k_1, \ldots, k_n)$, $b_{(k_1,\ldots,k_n)} = Q(k_1, \ldots, k_n)$,
$c_{(k_1,\ldots,k_n)} = R(k_1, \ldots, k_n)$, for every $(k_1, \ldots, k_n) \in \mathbb{N}^{(n)}$, we define $R = P + Q$ such that
\[ c_{(k_1,\ldots,k_n)} = a_{(k_1,\ldots,k_n)} + b_{(k_1,\ldots,k_n)}, \]
$R = \lambda P$, where $\lambda \in A$, such that
\[ c_{(k_1,\ldots,k_n)} = \lambda a_{(k_1,\ldots,k_n)}, \]
and $R = PQ$, such that
\[ c_{(k_1,\ldots,k_n)} = \sum_{(i_1,\ldots,i_n) + (j_1,\ldots,j_n) = (k_1,\ldots,k_n)} a_{(i_1,\ldots,i_n)} b_{(j_1,\ldots,j_n)}. \]
For every $(k_1, \ldots, k_n) \in \mathbb{N}^{(n)}$, we let $e_{(k_1,\ldots,k_n)}$ be the polynomial such that
\[ e_{(k_1,\ldots,k_n)}(k_1, \ldots, k_n) = 1 \quad\text{and}\quad e_{(k_1,\ldots,k_n)}(h_1, \ldots, h_n) = 0, \]
for $(h_1, \ldots, h_n) \neq (k_1, \ldots, k_n)$. We also denote $e_{(0,\ldots,0)}$ by $1$. Given a polynomial $P$, the
$a_{(k_1,\ldots,k_n)} = P(k_1, \ldots, k_n) \in A$ are called the coefficients of $P$. If $P$ is not the null polynomial,
there is a greatest $d \geq 0$ such that $a_{(k_1,\ldots,k_n)} \neq 0$ for some $(k_1, \ldots, k_n) \in \mathbb{N}^{(n)}$, with $d =
k_1 + \cdots + k_n$, called the total degree of $P$ and denoted by $\deg(P)$. Then, $P$ is written
uniquely as
\[ P = \sum_{(k_1,\ldots,k_n) \in \mathbb{N}^{(n)}} a_{(k_1,\ldots,k_n)} e_{(k_1,\ldots,k_n)}. \]
There is an injection of $A$ into $P_A(n)$ given by the map $a \mapsto a1$ (where $1$ denotes $e_{(0,\ldots,0)}$).
There is also an injection of $\mathbb{N}^{(n)}$ into $P_A(n)$ given by the map $(h_1, \ldots, h_n) \mapsto e_{(h_1,\ldots,h_n)}$. Note
that $e_{(h_1,\ldots,h_n)} e_{(k_1,\ldots,k_n)} = e_{(h_1+k_1,\ldots,h_n+k_n)}$. In order to alleviate the notation, let $X_1, \ldots, X_n$
be $n$ distinct variables and denote $e_{(0,\ldots,0,1,0,\ldots,0)}$, where $1$ occurs in the position $i$, by $X_i$
(where $1 \leq i \leq n$). With this convention, in view of $e_{(h_1,\ldots,h_n)} e_{(k_1,\ldots,k_n)} = e_{(h_1+k_1,\ldots,h_n+k_n)}$, the
polynomial $e_{(k_1,\ldots,k_n)}$ is denoted by $X_1^{k_1} \cdots X_n^{k_n}$ (with $e_{(0,\ldots,0)} = X_1^0 \cdots X_n^0 = 1$) and it is called
a primitive monomial. Then, $P$ is also written as
\[ P = \sum_{(k_1,\ldots,k_n) \in \mathbb{N}^{(n)}} a_{(k_1,\ldots,k_n)} X_1^{k_1} \cdots X_n^{k_n}. \]
A monomial $a_{(k_1,\ldots,k_n)} X_1^{k_1} \cdots X_n^{k_n}$ occurs in the polynomial $P$ if $a_{(k_1,\ldots,k_n)} \neq 0$.
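A polynomial
\[ P = \sum_{(k_1,\ldots,k_n) \in \mathbb{N}^{(n)}} a_{(k_1,\ldots,k_n)} X_1^{k_1} \cdots X_n^{k_n} \]
is homogeneous of degree $d$ if
\[ \deg(X_1^{k_1} \cdots X_n^{k_n}) = d, \]
for every monomial $a_{(k_1,\ldots,k_n)} X_1^{k_1} \cdots X_n^{k_n}$ occurring in $P$. If $P$ is a polynomial of total degree
$d$, it is clear that $P$ can be written uniquely as
\[ P = P^{(0)} + P^{(1)} + \cdots + P^{(d)}, \]
where $P^{(i)}$ is the sum of all monomials of degree $i$ occurring in $P$, where $0 \leq i \leq d$.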
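The decomposition into homogeneous parts is easy to compute once a polynomial in $n$ variables is stored as a map from exponent tuples $(k_1, \ldots, k_n)$ to coefficients. The sketch below assumes that representation; the names `homogeneous_components` and `total_degree` are hypothetical, and returning $-\infty$ for the null polynomial simply mirrors the one-variable convention and is an assumption here.

```python
from collections import defaultdict

def total_degree(p):
    """Greatest k1 + ... + kn over the monomials occurring in P."""
    degrees = [sum(exps) for exps, coeff in p.items() if coeff != 0]
    return max(degrees) if degrees else float("-inf")

def homogeneous_components(p):
    """Group the monomials occurring in P by degree: i -> P^(i)."""
    parts = defaultdict(dict)
    for exps, coeff in p.items():
        if coeff != 0:
            parts[sum(exps)][exps] = coeff
    return dict(parts)

# P = 3 + X1 + 2 X1 X2 + X2^2 + 5 X1^2 X2 in two variables.
p = {(0, 0): 3, (1, 0): 1, (1, 1): 2, (0, 2): 1, (2, 1): 5}
parts = homogeneous_components(p)

print(total_degree(p))
for i in sorted(parts):
    print(i, parts[i])
```

Degrees $i$ for which no monomial occurs are simply absent from the result, which corresponds to $P^{(i)} = 0$.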
let
(P (X1 , . . . , Xn )) =
Given any nonnull polynomial $P(X_1, \ldots, X_n) = \sum_{(k_1,\ldots,k_n) \in \mathbb{N}^{(n)}} a_{(k_1,\ldots,k_n)} X_1^{k_1} \cdots X_n^{k_n}$ in
$A[X_1, \ldots, X_n]$, where $n \geq 2$, $P(X_1, \ldots, X_n)$ can be uniquely written as
\[ P(X_1, \ldots, X_n) = \sum Q_{k_n}(X_1, \ldots, X_{n-1}) X_n^{k_n}, \]
where each polynomial $Q_{k_n}(X_1, \ldots, X_{n-1})$ is in $A[X_1, \ldots, X_{n-1}]$. Thus, even if $A$ is a field,
$A[X_1, \ldots, X_{n-1}]$ is not a field, which confirms that it is useful (and necessary!) to consider
polynomials over rings that are not necessarily fields.
It is not difficult to show that $A[X_1, \ldots, X_n]$ and $A[X_1, \ldots, X_{n-1}][X_n]$ are isomorphic
rings. This way, it is often possible to prove properties of polynomials in several variables
$X_1, \ldots, X_n$, by induction on the number $n$ of variables. For example, given two nonnull
polynomials $P(X_1, \ldots, X_n)$ of total degree $p$ and $Q(X_1, \ldots, X_n)$ of total degree $q$, since we
assumed that $A$ is an integral domain, we can prove that
\[ \deg(PQ) = \deg(P) + \deg(Q), \]
and that $A[X_1, \ldots, X_n]$ is an integral domain.
Next, we will consider the division of polynomials (in one variable).
20.3
We know that every natural number $n \geq 2$ can be written uniquely as a product of powers of
prime numbers and that prime numbers play a very important role in arithmetic. It would be
nice if every polynomial could be expressed (uniquely) as a product of irreducible factors.
This is indeed the case for polynomials over a field. The fact that there is a division algorithm
for the natural numbers is essential for obtaining many of the arithmetical properties of the
natural numbers. As we shall see next, there is also a division algorithm for polynomials in
A[X], when A is a field.
Proposition 20.4. Let $A$ be a ring, let $f(X), g(X) \in A[X]$ be two polynomials of degree
$m = \deg(f)$ and $n = \deg(g)$ with $f(X) \neq 0$, and assume that the leading coefficient $a_m$ of
$f(X)$ is invertible. Then, there exist unique polynomials $q(X)$ and $r(X)$ in $A[X]$ such that
\[ g = fq + r \quad\text{and}\quad \deg(r) < \deg(f). \]
If n < m, then let q = 0 and r = g. Since deg(g) < deg(f ) and r = g, we have deg(r) <
deg(f ).
and thus,
\[ g_1(X) = g(X) - b_n a_m^{-1} X^{n-m} f(X) = f(X) q_1(X) + r(X), \]
from which, letting $q(X) = b_n a_m^{-1} X^{n-m} + q_1(X)$, we get
\[ g = fq + r \]
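The reduction step $g \mapsto g - b_n a_m^{-1} X^{n-m} f$ can be turned into a short program. The following is a minimal sketch under stated assumptions: coefficients are taken in the field $\mathbb{Q}$ (via Python's `fractions.Fraction`) so that the leading coefficient of $f$ is always invertible, polynomials are coefficient lists with the lowest degree first, and the name `poly_divmod` is hypothetical.

```python
from fractions import Fraction

def poly_divmod(g, f):
    """Return (q, r) with g = f*q + r and deg(r) < deg(f), over Q."""
    f = [Fraction(c) for c in f]
    r = [Fraction(c) for c in g]
    while f and f[-1] == 0:
        f.pop()
    if not f:
        raise ZeroDivisionError("f must be a nonnull polynomial")
    while r and r[-1] == 0:
        r.pop()
    m = len(f) - 1
    q = [Fraction(0)] * max(len(r) - m, 0)
    inv_lead = 1 / f[-1]                  # a_m^{-1}
    while len(r) - 1 >= m:
        n = len(r) - 1
        c = r[-1] * inv_lead              # b_n * a_m^{-1}
        q[n - m] = c
        for i, a in enumerate(f):         # subtract c * X^(n-m) * f
            r[n - m + i] -= c * a
        while r and r[-1] == 0:
            r.pop()
    return q, r

q1, r1 = poly_divmod([-1, 0, 0, 1], [-1, 1])   # (X^3 - 1) by (X - 1)
q2, r2 = poly_divmod([1, 0, 1], [0, 2])        # (X^2 + 1) by 2X
print(q1, r1)
print(q2, r2)
```

Each pass through the loop lowers the degree of the running remainder by at least one, which is why the procedure terminates.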
20.4
Note that every $a \in A$ divides $0$. However, it is customary to say that $a$ is a zero divisor
iff $ac = 0$ for some $c \neq 0$. With this convention, $0$ is a zero divisor unless $A = \{0\}$ (the
trivial ring), and $A$ is an integral domain iff $0$ is the only zero divisor in $A$.
Given $a, b \in A$ with $a, b \neq 0$, if $(a) = (b)$ then there exist $c, d \in A$ such that $a = bc$ and
$b = ad$. From this, we get $a = adc$ and $b = bcd$, that is, $a(1 - dc) = 0$ and $b(1 - cd) = 0$. If $A$
is an integral domain, we get $dc = 1$ and $cd = 1$, that is, $c$ is invertible with inverse $d$. Thus,
when $A$ is an integral domain, we have $b = ad$, with $d$ invertible. The converse is obvious: if
$b = ad$ with $d$ invertible, then $(a) = (b)$.
As a summary, if $A$ is an integral domain, for any $a, b \in A$ with $a, b \neq 0$, we have $(a) = (b)$
iff there exists some invertible $d \in A$ such that $b = ad$. An invertible element $u \in A$ is also
called a unit.
(2) Assume that $I$ is a maximal ideal. As in (1), $A/I$ is not the trivial ring $(0)$. Let
$[a] \neq 0$ in $A/I$. We need to prove that $[a]$ has a multiplicative inverse. Since $[a] \neq 0$, we
have $a \notin I$. Let $I_a$ be the ideal generated by $I$ and $a$. We have
\[ I \subseteq I_a \quad\text{and}\quad I \neq I_a, \]
since $a \notin I$, and since $I$ is maximal, this implies that
\[ I_a = A. \]
However, we know that
\[ I_a = \{ax + h \mid x \in A,\ h \in I\}, \]
and thus, there is some $x \in A$ and some $h \in I$ so that
\[ ax + h = 1, \]
which proves that $[a][x] = 1$, as desired.
Conversely, assume that $A/I$ is a field. Again, since $A/I$ is not the trivial ring, $I \neq A$.
Let $J$ be any proper ideal such that $I \subseteq J$, and assume that $I \neq J$. Thus, there is some
$j \in J - I$, and since $\mathrm{Ker}\, \pi = I$ (where $\pi : A \to A/I$ is the quotient map), we have $\pi(j) \neq 0$. Since $A/I$ is a field and $\pi$ is surjective,
there is some $k \in A$ so that $\pi(j)\pi(k) = 1$, which implies that
\[ jk - 1 = i \]
for some $i \in I$, and since $I \subseteq J$ and $J$ is an ideal, it follows that $1 = jk - i \in J$, showing
that $J = A$, a contradiction. Therefore, $I = J$, and $I$ is a maximal ideal.
As a corollary, we obtain the following useful result. It emphasizes the importance of
maximal ideals.
Corollary 20.8. Given any ring A, every maximal ideal I in A is a prime ideal.
Proof. If I is a maximal ideal, then, by Proposition 20.7, the quotient ring A/I is a field.
However, a field is an integral domain, and by Proposition 20.7 (again), I is a prime ideal.
Observe that a ring A is an integral domain iff (0) is a prime ideal. This is an example
of a prime ideal which is not a maximal ideal, as immediately seen in A = Z, where (p) is a
maximal ideal for every prime number p.
A less obvious example of a prime ideal which is not a maximal ideal, is the ideal (X) in
the ring of polynomials Z[X]. Indeed, (X, 2) is also a prime ideal, but (X) is properly
contained in (X, 2).
Definition 20.5. An integral domain in which every ideal is a principal ideal is called a
principal ring or principal ideal domain, for short, a PID.
The ring Z is a PID. This is a consequence of the existence of a (Euclidean) division
algorithm. As we shall see next, when K is a field, the ring K[X] is also a principal ring.
However, when $n \geq 2$, the ring $K[X_1, \ldots, X_n]$ is not principal. For example, in the ring
$K[X, Y]$, the ideal $(X, Y)$ generated by $X$ and $Y$ is not principal. First, since $(X, Y)$
is the set of all polynomials of the form $Xq_1 + Yq_2$, where $q_1, q_2 \in K[X, Y]$, except when
$Xq_1 + Yq_2 = 0$, we have $\deg(Xq_1 + Yq_2) \geq 1$. Thus, $1 \notin (X, Y)$. Now if there was some $p \in
K[X, Y]$ such that $(X, Y) = (p)$, since $1 \notin (X, Y)$, we must have $\deg(p) \geq 1$. But we would
also have $X = pq_1$ and $Y = pq_2$, for some $q_1, q_2 \in K[X, Y]$. Since $\deg(X) = \deg(Y) = 1$,
this is impossible.
Even though K[X, Y ] is not a principal ring, a suitable version of unique factorization in
terms of irreducible factors holds. The ring K[X, Y ] (and more generally K[X1 , . . . , Xn ]) is
what is called a unique factorization domain, for short, UFD, or a factorial ring.
From this point until Definition 20.10, we consider polynomials in one variable over a
field K.
Remark: Although we already proved part (1) of Proposition 20.9 in a more general situation above, we reprove it in the special case of polynomials. This may offend the purists,
but most readers will probably not mind.
Proposition 20.9. Let K be a field. The following properties hold:
(1) For any two nonzero polynomials $f, g \in K[X]$, $(f) = (g)$ iff there is some $\lambda \neq 0$ in $K$
such that $g = \lambda f$.
(2) For every nonnull ideal $I$ in $K[X]$, there is a unique monic polynomial $f \in K[X]$ such
that $I = (f)$.
Proof. (1) If $(f) = (g)$, there are some nonzero polynomials $q_1, q_2 \in K[X]$ such that $g = f q_1$
and $f = g q_2$. Thus, we have $f = f q_1 q_2$, which implies $f(1 - q_1 q_2) = 0$. Since $K$ is a
field, by Proposition 20.1, $K[X]$ has no zero divisor, and since we assumed $f \neq 0$, we must
have $q_1 q_2 = 1$. However, if either $q_1$ or $q_2$ is not a constant, by Proposition 20.1 again,
$\deg(q_1 q_2) = \deg(q_1) + \deg(q_2) \geq 1$, contradicting $q_1 q_2 = 1$, since $\deg(1) = 0$. Thus, both
$q_1, q_2 \in K - \{0\}$, and (1) holds with $\lambda = q_1$. In the other direction, it is obvious that $g = \lambda f$
implies that $(f) = (g)$.
(2) Since we are assuming that $I$ is not the null ideal, there is some polynomial of smallest
degree in $I$, and since $K$ is a field, by suitable multiplication by a scalar, we can make sure
that this polynomial is monic. Thus, let $f$ be a monic polynomial of smallest degree in $I$.
By (ID2), it is clear that $(f) \subseteq I$. Now, let $g \in I$. Using the Euclidean algorithm, there
exist unique $q, r \in K[X]$ such that
\[ g = qf + r \]
In particular, note that $f$ and $g$ are relatively prime when $f$ is a nonzero constant
polynomial (a scalar $\lambda \neq 0$ in $K$) and $g$ is any nonzero polynomial.
We can characterize gcds of polynomials as follows.
Proposition 20.10. Let $K$ be a field and let $f, g \in K[X]$ be any two nonzero polynomials.
For every polynomial $d \in K[X]$, the following properties are equivalent:
(1) The polynomial $d$ is a gcd of $f$ and $g$.
(2) The polynomial $d$ divides $f$ and $g$ and there exist $u, v \in K[X]$ such that
\[ d = uf + vg. \]
(3) The ideals $(f)$, $(g)$, and $(d)$ satisfy the equation
\[ (d) = (f) + (g). \]
In addition, $d \neq 0$, and $d$ is unique up to multiplication by a nonzero scalar in $K$.
Proof. Given any two nonzero polynomials $u, v \in K[X]$, observe that $u$ divides $v$ iff $(v) \subseteq (u)$.
Now, (2) can be restated as $(f) \subseteq (d)$, $(g) \subseteq (d)$, and $d \in (f) + (g)$, which is equivalent to
$(d) = (f) + (g)$, namely (3).
If (2) holds, since $d = uf + vg$, whenever $h \in K[X]$ divides $f$ and $g$, then $h$ divides $d$,
and $d$ is a gcd of $f$ and $g$.
Assume that $d$ is a gcd of $f$ and $g$. Then, since $d$ divides $f$ and $d$ divides $g$, we have
$(f) \subseteq (d)$ and $(g) \subseteq (d)$, and thus $(f) + (g) \subseteq (d)$, and $(f) + (g)$ is not the null ideal since $f$ and
$g$ are nonzero. By Proposition 20.9, there exists a monic polynomial $d_1 \in K[X]$ such that
$(d_1) = (f) + (g)$. Then, $d_1$ divides both $f$ and $g$, and since $d$ is a gcd of $f$ and $g$, then $d_1$
divides $d$, which shows that $(d) \subseteq (d_1) = (f) + (g)$. Consequently, $(f) + (g) = (d)$, and (3)
holds.
Since $(d) = (f) + (g)$ and $f$ and $g$ are nonzero, the last part of the proposition is
obvious.
As a consequence of Proposition 20.10, two nonzero polynomials $f, g \in K[X]$ are relatively prime iff there exist $u, v \in K[X]$ such that
uf + vg = 1.
The identity
d = uf + vg
of part (2) of Proposition 20.10 is often called the Bezout identity.
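Coefficients $u, v$ satisfying the Bezout identity can be computed by the extended Euclidean algorithm. The sketch below is one possible implementation under stated assumptions, not the text's own procedure: it works over $K = \mathbb{Q}$ using `fractions.Fraction`, stores polynomials as coefficient lists (lowest degree first), assumes both inputs are nonzero, and uses the hypothetical helper names `trim`, `add`, `mul`, `divmod_poly`, and `bezout`.

```python
from fractions import Fraction

def trim(p):
    p = list(p)
    while p and p[-1] == 0:
        p.pop()
    return p

def add(p, q):
    n = max(len(p), len(q))
    return trim([(p[i] if i < len(p) else 0) + (q[i] if i < len(q) else 0)
                 for i in range(n)])

def mul(p, q):
    if not p or not q:
        return []
    r = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            r[i + j] += a * b
    return trim(r)

def divmod_poly(g, f):
    """Euclidean division g = f*q + r with deg(r) < deg(f); f nonzero, trimmed."""
    r, m = trim(g), len(f) - 1
    q = [Fraction(0)] * max(len(r) - m, 0)
    while len(r) - 1 >= m:
        shift = len(r) - 1 - m
        c = r[-1] / f[-1]
        q[shift] = c
        for i, a in enumerate(f):
            r[shift + i] -= c * a
        r = trim(r)
    return q, r

def bezout(f, g):
    """Return (d, u, v) with d = u*f + v*g and d a monic gcd of f and g."""
    r0, u0, v0 = trim([Fraction(c) for c in f]), [Fraction(1)], []
    r1, u1, v1 = trim([Fraction(c) for c in g]), [], [Fraction(1)]
    while r1:
        q, r = divmod_poly(r0, r1)
        neg_q = [-c for c in q]
        r0, u0, v0, r1, u1, v1 = (r1, u1, v1, r,
                                  add(u0, mul(neg_q, u1)),
                                  add(v0, mul(neg_q, v1)))
    lead = r0[-1]                       # normalize so that d is monic
    return ([c / lead for c in r0], [c / lead for c in u0],
            [c / lead for c in v0])

f = [-1, 0, 1]    # X^2 - 1
g = [2, -3, 1]    # X^2 - 3X + 2 = (X - 1)(X - 2)
d, u, v = bezout(f, g)
print(d, u, v)
```

Throughout the loop, each remainder is kept in the form $u f + v g$, which is exactly the invariant needed for part (2) of Proposition 20.10.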
We derive more useful consequences of Proposition 20.10.
Proposition 20.11. Let $K$ be a field and let $f, g \in K[X]$ be any two nonzero polynomials.
For every gcd $d \in K[X]$ of $f$ and $g$, the following properties hold:
(1) For every nonzero polynomial $q \in K[X]$, the polynomial $dq$ is a gcd of $fq$ and $gq$.
(2) For every nonzero polynomial $q \in K[X]$, if $q$ divides $f$ and $g$, then $d/q$ is a gcd of $f/q$
and $g/q$.
Proof. (1) By Proposition 20.10 (2), $d$ divides $f$ and $g$, and there exist $u, v \in K[X]$, such
that
\[ d = uf + vg. \]
Then, $dq$ divides $fq$ and $gq$, and
\[ dq = ufq + vgq. \]
By Proposition 20.10 (2), $dq$ is a gcd of $fq$ and $gq$. The proof of (2) is similar.
The following proposition is used often.
Proposition 20.12. (Euclid's proposition) Let $K$ be a field and let $f, g, h \in K[X]$ be any
nonzero polynomials. If $f$ divides $gh$ and $f$ is relatively prime to $g$, then $f$ divides $h$.
Proof. From Proposition 20.10, $f$ and $g$ are relatively prime iff there exist some polynomials
$u, v \in K[X]$ such that
\[ uf + vg = 1. \]
Then, we have
\[ ufh + vgh = h, \]
and since $f$ divides $gh$, it divides both $ufh$ and $vgh$, and so, $f$ divides $h$.
Proposition 20.13. Let $K$ be a field and let $f, g_1, \ldots, g_m \in K[X]$ be some nonzero polynomials. If $f$ and $g_i$ are relatively prime for all $i$, $1 \leq i \leq m$, then $f$ and $g_1 \cdots g_m$ are relatively
prime.
Proof. We proceed by induction on $m$. The case $m = 1$ is trivial. Let $h = g_2 \cdots g_m$. By the
induction hypothesis, $f$ and $h$ are relatively prime. Let $d$ be a gcd of $f$ and $g_1 h$. We claim
that $d$ is relatively prime to $g_1$. Otherwise, $d$ and $g_1$ would have some nonconstant gcd $d_1$
which would divide both $f$ and $g_1$, contradicting the fact that $f$ and $g_1$ are relatively prime.
Now, by Proposition 20.12, since $d$ divides $g_1 h$ and $d$ and $g_1$ are relatively prime, $d$ divides
$h = g_2 \cdots g_m$. But then, $d$ is a divisor of $f$ and $h$, and since $f$ and $h$ are relatively prime, $d$
must be a constant, and $f$ and $g_1 \cdots g_m$ are relatively prime.
Definition 20.6 is generalized to any finite number of polynomials as follows.
Definition 20.7. Given any nonzero polynomials $f_1, \ldots, f_n \in K[X]$, where $n \geq 2$, a polynomial $d \in K[X]$ is a greatest common divisor of $f_1, \ldots, f_n$ (for short, a gcd of $f_1, \ldots, f_n$)
if $d$ divides each $f_i$ and whenever $h \in K[X]$ divides each $f_i$, then $h$ divides $d$. We say that
$f_1, \ldots, f_n$ are relatively prime if $1$ is a gcd of $f_1, \ldots, f_n$.
It is easily shown that Proposition 20.10 can be generalized to any finite number of
polynomials, and similarly for its relevant corollaries. The details are left as an exercise.
Proposition 20.14. Let $K$ be a field and let $f_1, \ldots, f_n \in K[X]$ be any $n \geq 2$ nonzero
polynomials. For every polynomial $d \in K[X]$, the following properties are equivalent:
(1) The polynomial $d$ is a gcd of $f_1, \ldots, f_n$.
(2) The polynomial $d$ divides each $f_i$ and there exist $u_1, \ldots, u_n \in K[X]$ such that
\[ d = u_1 f_1 + \cdots + u_n f_n. \]
(3) The ideals $(f_i)$, and $(d)$ satisfy the equation
\[ (d) = (f_1) + \cdots + (f_n). \]
In addition, $d \neq 0$, and $d$ is unique up to multiplication by a nonzero scalar in $K$.
As a consequence of Proposition 20.14, some polynomials $f_1, \ldots, f_n \in K[X]$ are relatively
prime iff there exist $u_1, \ldots, u_n \in K[X]$ such that
\[ u_1 f_1 + \cdots + u_n f_n = 1. \]
The identity
u1 f1 + + un fn = 1
20.5
\[ X^4 + 1 = (X^2 - \sqrt{2}X + 1)(X^2 + \sqrt{2}X + 1). \]
However, in view of the above factorization, X 4 + 1 is irreducible over Q.
It can be shown that the irreducible polynomials over $\mathbb{R}$ are the polynomials of degree 1,
or the polynomials of degree 2 of the form $aX^2 + bX + c$, for which $b^2 - 4ac < 0$ (i.e., those
having no real roots). This is not easy to prove! Over the complex numbers $\mathbb{C}$, the only
irreducible polynomials are those of degree 1. This is a version of a fact often referred to as
the Fundamental Theorem of Algebra, or, as the French sometimes say, as d'Alembert's
theorem!
We already observed that for any two nonzero polynomials $f, g \in K[X]$, $f$ divides $g$ iff
$(g) \subseteq (f)$. In view of the definition of a maximal ideal given in Definition 20.4, we now prove
that a polynomial $p \in K[X]$ is irreducible iff $(p)$ is a maximal ideal in $K[X]$.
Proposition 20.15. A polynomial $p \in K[X]$ is irreducible iff $(p)$ is a maximal ideal in
$K[X]$.
Proof. Since $K[X]$ is an integral domain, for all nonzero polynomials $p, q \in K[X]$, $\deg(pq) =
\deg(p) + \deg(q)$, and thus, $(p) \neq K[X]$ iff $\deg(p) \geq 1$. Assume that $p \in K[X]$ is irreducible.
Since every ideal in $K[X]$ is a principal ideal, every ideal in $K[X]$ is of the form $(q)$, for
some $q \in K[X]$. If $(p) \subseteq (q)$, with $\deg(q) \geq 1$, then $q$ divides $p$, and since $p \in K[X]$ is
irreducible, this implies that $p = \lambda q$ for some $\lambda \neq 0$ in $K$, and so, $(p) = (q)$. Thus, $(p)$ is a
maximal ideal. Conversely, assume that $(p)$ is a maximal ideal. Then, as we showed above,
$\deg(p) \geq 1$, and if $q$ divides $p$, with $\deg(q) \geq 1$, then $(p) \subseteq (q)$, and since $(p)$ is a maximal
ideal, this implies that $(p) = (q)$, which means that $p = \lambda q$ for some $\lambda \neq 0$ in $K$, and so, $p$
is irreducible.
Let $p \in K[X]$ be irreducible. Then, for every nonzero polynomial $g \in K[X]$, either $p$ and
$g$ are relatively prime, or $p$ divides $g$. Indeed, if $d$ is any gcd of $p$ and $g$, if $d$ is a constant, then
$p$ and $g$ are relatively prime, and if not, because $p$ is irreducible, we have $d = \lambda p$ for some
$\lambda \neq 0$ in $K$, and thus, $p$ divides $g$. As a consequence, if $p, q \in K[X]$ are both irreducible,
then either $p$ and $q$ are relatively prime, or $p = \lambda q$ for some $\lambda \neq 0$ in $K$. In particular, if
$p, q \in K[X]$ are both irreducible monic polynomials and $p \neq q$, then $p$ and $q$ are relatively
prime.
We now prove the (unique) factorization of polynomials into irreducible factors.
Theorem 20.16. Given any field $K$, for every nonzero polynomial
\[ f = a_d X^d + a_{d-1} X^{d-1} + \cdots + a_0 \]
of degree $d = \deg(f) \geq 1$ in $K[X]$, there exists a unique set $\{\langle p_1, k_1 \rangle, \ldots, \langle p_m, k_m \rangle\}$ such that
\[ f = a_d p_1^{k_1} \cdots p_m^{k_m}, \]
where the $p_i \in K[X]$ are distinct irreducible monic polynomials, the $k_i$ are (not necessarily
distinct) integers, and $m \geq 1$, $k_i \geq 1$.
Proof. First, we prove the existence of such a factorization by induction on $d = \deg(f)$.
Clearly, it is enough to prove the result for monic polynomials $f$ of degree $d = \deg(f) \geq 1$.
If $d = 1$, then $f = X + a_0$, which is an irreducible monic polynomial.
Assume $d \geq 2$, and assume the induction hypothesis for all monic polynomials of degree
$< d$. Consider the set $S$ of all monic polynomials $g$ such that $\deg(g) \geq 1$ and $g$ divides
$f$. Since $f \in S$, the set $S$ is nonempty, and thus, $S$ contains some monic polynomial $p_1$ of
minimal degree. Since $\deg(p_1) \geq 1$, the monic polynomial $p_1$ must be irreducible. Otherwise
we would have $p_1 = g_1 g_2$, for some monic polynomials $g_1, g_2$ such that $\deg(p_1) > \deg(g_1) \geq 1$
and $\deg(p_1) > \deg(g_2) \geq 1$, and since $p_1$ divides $f$, then $g_1$ would divide $f$, contradicting
the minimality of the degree of $p_1$. Thus, we have $f = p_1 q$, for some irreducible monic
polynomial $p_1$, with $q$ also monic. Since $\deg(p_1) \geq 1$, we have $\deg(q) < \deg(f)$, and we can
apply the induction hypothesis to $q$. Thus, we obtain a factorization of the desired form.
We now prove uniqueness. Assume that
\[ f = a_d p_1^{k_1} \cdots p_m^{k_m}, \]
and
\[ f = a_d q_1^{h_1} \cdots q_n^{h_n}. \]
Thus, we have
\[ a_d p_1^{k_1} \cdots p_m^{k_m} = a_d q_1^{h_1} \cdots q_n^{h_n}. \]
and since $q_1$ and the $p_i$ are irreducible monic, we must have $m = 1$ and $p_1 = q_1$.
If $h_1 + \cdots + h_n \geq 2$, since $K[X]$ is an integral domain and since $h_1 \geq 1$, we have
\[ p_1^{k_1} \cdots p_m^{k_m} = q_1 q, \]
with
\[ q = q_1^{h_1 - 1} \cdots q_n^{h_n}, \]
for some $q, r \in A$, with $\varphi(r) < \varphi(b)$. Since $b \in I$ and $I$ is an ideal, we also have $bq \in I$,
and since $a, bq \in I$ and $I$ is an ideal, then $r \in I$ with $\varphi(r) < \varphi(b) = m$, contradicting the
minimality of $m$. Thus, $r = 0$ and $a \in (b)$. But then,
\[ I \subseteq (b), \]
and since $b \in I$, we get
\[ I = (b), \]
and $A$ is a PID.
As a corollary of Proposition 20.17, the ring $\mathbb{Z}$ is a Euclidean domain (using the function
$\varphi(a) = |a|$) and thus, a PID. If $K$ is a field, the function $\varphi$ on $K[X]$ defined such that
\[
\varphi(f) =
\begin{cases}
0 & \text{if } f = 0, \\
\deg(f) + 1 & \text{if } f \neq 0,
\end{cases}
\]
Remark: Given any integer $d \in \mathbb{Z}$ such that $d \neq 0, 1$ and $d$ does not have any square factor
greater than one, the quadratic field $\mathbb{Q}(\sqrt{d})$ is the field consisting of all complex numbers
of the form $a + ib\sqrt{-d}$ if $d < 0$, and of all the real numbers of the form $a + b\sqrt{d}$ if $d > 0$,
with $a, b \in \mathbb{Q}$. The subring of $\mathbb{Q}(\sqrt{d})$ consisting of all elements as above for which $a, b \in \mathbb{Z}$
is denoted by $\mathbb{Z}[\sqrt{d}]$. We define the ring of integers of the field $\mathbb{Q}(\sqrt{d})$ as the subring of
Observe that when $d \equiv 2 \pmod 4$ or $d \equiv 3 \pmod 4$, the ring of integers of $\mathbb{Q}(\sqrt{d})$ is equal to
$\mathbb{Z}[\sqrt{d}]$. For more on quadratic fields and their rings of integers, see Stark [100] (Chapter 8)
or Niven, Zuckerman and Montgomery [86] (Chapter 9). It can be shown that the rings of
integers of $\mathbb{Q}(\sqrt{d})$, where $d = -19, -43, -67, -163$, are PIDs, but not Euclidean rings.
Actually the rings of integers of $\mathbb{Q}(\sqrt{d})$ that are Euclidean domains are completely determined but the proof is quite difficult. It turns out that there are twenty-one such rings corresponding to the integers: $-11, -7, -3, -2, -1, 2, 3, 5, 6, 7, 11, 13, 17, 19, 21, 29, 33, 37, 41, 57$
and $73$, see Stark [100] (Chapter 8).
It is possible to characterize a larger class of rings (in terms of ideals), factorial rings (or
unique factorization domains), for which unique factorization holds (see Section 21.1). We
now consider zeros (or roots) of polynomials.
20.6 Roots of Polynomials
Since fields are integral domains, Theorem 20.22 holds for nonnull polynomials over fields
and in particular, for R and C. An important consequence of Theorem 20.22 is the following.
Proposition 20.23. Let $A$ be an integral domain. For any two polynomials $f, g \in A[X]$, if $\deg(f) \leq n$, $\deg(g) \leq n$, and if there are $n + 1$ distinct elements $\alpha_1, \alpha_2, \ldots, \alpha_{n+1} \in A$ (with $\alpha_i \neq \alpha_j$ for $i \neq j$) such that $f(\alpha_i) = g(\alpha_i)$ for all $i$, $1 \leq i \leq n + 1$, then $f = g$.
Proof. Assume $f \neq g$. Then $f - g$ is nonnull, and since $f(\alpha_i) = g(\alpha_i)$ for all $i$, $1 \leq i \leq n + 1$, the polynomial $f - g$ has $n + 1$ distinct roots. Thus, $f - g$ has $n + 1$ distinct roots and is of degree at most $n$, which contradicts Theorem 20.22.
Proposition 20.23 is often used to show that polynomials coincide. We will use it to show
some interpolation formulae due to Lagrange and Hermite. But first, we characterize the
multiplicity of a root of a polynomial. For this, we need the notion of derivative familiar in
analysis. Actually, we can simply define this notion algebraically.
First, we need to rule out some undesirable behaviors. Given a field $K$, as we saw in Example 2.4, we can define a homomorphism $\chi\colon \mathbb{Z} \to K$ given by
$$\chi(n) = n \cdot 1,$$
where $1$ is the multiplicative identity of $K$. Recall that we define $n \cdot a$ by
$$n \cdot a = \underbrace{a + \cdots + a}_{n}$$
if $n \geq 0$ (with $0 \cdot a = 0$) and
$$n \cdot a = -(-n) \cdot a$$
if $n < 0$. We say that the field $K$ is of characteristic zero if the homomorphism $\chi$ is injective. Then, for any $a \in K$ with $a \neq 0$, we have $n \cdot a \neq 0$ for all $n \neq 0$.
The fields $\mathbb{Q}$, $\mathbb{R}$, and $\mathbb{C}$ are of characteristic zero. In fact, it is easy to see that every field of characteristic zero contains a subfield isomorphic to $\mathbb{Q}$. Thus, finite fields cannot be of characteristic zero.
Remark: If a field is not of characteristic zero, it is not hard to show that its characteristic, that is, the smallest $n \geq 2$ such that $n \cdot 1 = 0$, is a prime number $p$. The characteristic $p$ of $K$ is the generator of the principal ideal $p\mathbb{Z}$, the kernel of the homomorphism $\chi\colon \mathbb{Z} \to K$. Thus, every finite field has characteristic $p$ for some prime $p$. Infinite fields of nonzero characteristic also exist.
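As a small illustration of the remark, the following sketch computes the smallest $n \geq 1$ with $n \cdot 1 = 0$ in $\mathbb{Z}/p\mathbb{Z}$ by repeated addition. The function name `characteristic_mod` is hypothetical, and the representation of the field as integers modulo a prime `p` is an assumption made for the example.

```python
def characteristic_mod(p: int) -> int:
    """Smallest n >= 1 such that n * 1 == 0 in Z/pZ (assumes p >= 2).

    The sum 1 + 1 + ... + 1 is accumulated one term at a time, mirroring
    the definition of n * 1 as a repeated sum.
    """
    total, n = 1 % p, 1
    while total != 0:
        total = (total + 1) % p
        n += 1
    return n


if __name__ == "__main__":
    for p in (2, 3, 7, 13):
        print(p, characteristic_mod(p))
```

For a prime modulus the loop stops exactly at `n = p`, in agreement with the statement that the characteristic of a finite field is a prime.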
The following standard properties of derivatives are recalled without proof (prove them
as an exercise).
Given any two polynomials $f, g \in A[X]$, we have
$$(f + g)' = f' + g',$$
$$(fg)' = f'g + fg'.$$
For example, if $f = (X - \alpha)^k g$ and $k \geq 1$, we have
$$f' = k(X - \alpha)^{k-1} g + (X - \alpha)^k g'.$$
We can now give a criterion for the existence of simple roots. The first proposition holds for
any ring.
Proposition 20.24. Let $A$ be any ring. For every nonnull polynomial $f \in A[X]$, $\alpha \in A$ is a simple root of $f$ iff $\alpha$ is a root of $f$ and $\alpha$ is not a root of $f'$.
Proof. Since $\alpha \in A$ is a root of $f$, we have $f = (X - \alpha)g$ for some $g \in A[X]$. Now, $\alpha$ is a simple root of $f$ iff $g(\alpha) \neq 0$. However, we have $f' = g + (X - \alpha)g'$, and so $f'(\alpha) = g(\alpha)$. Thus, $\alpha$ is a simple root of $f$ iff $f'(\alpha) \neq 0$.
We can improve the previous proposition as follows.
Proposition 20.25. Let $A$ be any ring. For every nonnull polynomial $f \in A[X]$, let $\alpha \in A$ be a root of multiplicity $k \geq 1$ of $f$. Then, $\alpha$ is a root of multiplicity at least $k - 1$ of $f'$. If $A$ is a field of characteristic zero, then $\alpha$ is a root of multiplicity $k - 1$ of $f'$.
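The two propositions suggest a simple test for multiplicities in characteristic zero: differentiate until the value at the root is no longer zero. The sketch below assumes polynomials with integer (or rational) coefficients stored as lists `[c0, c1, ..., cn]`; the helper names `derivative`, `evaluate`, and `multiplicity` are illustrative choices, not notation from the text.

```python
def derivative(f):
    """Formal derivative of f = [c0, c1, ..., cn] (c_i is the coefficient of X^i)."""
    return [i * c for i, c in enumerate(f)][1:]


def evaluate(f, a):
    """Evaluate f at a using Horner's scheme."""
    result = 0
    for c in reversed(f):
        result = result * a + c
    return result


def multiplicity(f, a):
    """Multiplicity of a as a root of the nonzero polynomial f.

    Valid in characteristic zero: each differentiation lowers the
    multiplicity of a root by exactly one.
    """
    k = 0
    while f and evaluate(f, a) == 0:
        f = derivative(f)
        k += 1
    return k


if __name__ == "__main__":
    # f = (X - 1)^2 (X + 2) = X^3 - 3X + 2
    f = [2, -3, 0, 1]
    print(multiplicity(f, 1), multiplicity(f, -2), multiplicity(f, 0))
```

In positive characteristic this count can fail, since a derivative may vanish identically; this is why Proposition 20.25 only guarantees multiplicity at least $k - 1$ for general rings.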
f = 0,
When $f \in A[X_1, \ldots, X_n]$ and $n \geq 2$, one should not apply Proposition 20.26 abusively. For example, let
$$f(X, Y) = X^2 + Y^2 - 1,$$
for every $t \in \mathbb{R}$, it would be tempting to say that $f = 0$. But what is wrong with the above reasoning is that there are no two infinite subsets $R_1, R_2$ of $\mathbb{R}$ such that $f(\alpha_1, \alpha_2) = 0$ for all $(\alpha_1, \alpha_2) \in R_1 \times R_2$. For every $\alpha_1 \in \mathbb{R}$, there are at most two $\alpha_2 \in \mathbb{R}$ such that $f(\alpha_1, \alpha_2) = 0$. What the example shows, though, is that a nonnull polynomial $f \in A[X_1, \ldots, X_n]$ where $n \geq 2$ can have an infinite number of zeros. This is in contrast with nonnull polynomials in one variable over an infinite field (which have a number of roots bounded by their degree).
We now look at polynomial interpolation.
20.7
Polynomial Interpolation (Lagrange, Newton, Hermite)
Let $K$ be a field. First, we consider the following interpolation problem: Given a sequence $(\alpha_1, \ldots, \alpha_{m+1})$ of pairwise distinct scalars in $K$ and any sequence $(\beta_1, \ldots, \beta_{m+1})$ of scalars in $K$, where the $\beta_j$ are not necessarily distinct, find a polynomial $P(X)$ of degree $\leq m$ such that
$$P(\alpha_1) = \beta_1, \ldots, P(\alpha_{m+1}) = \beta_{m+1}.$$
First, observe that if such a polynomial exists, then it is unique. Indeed, this is a consequence of Proposition 20.23. Thus, we just have to find any polynomial of degree $\leq m$. Consider the following so-called Lagrange polynomials:
$$L_i(X) = \frac{\prod_{j \neq i} (X - \alpha_j)}{\prod_{j \neq i} (\alpha_i - \alpha_j)}.$$
Note that $L_i(\alpha_i) = 1$ and that $L_i(\alpha_j) = 0$, for all $j \neq i$. But then,
$$P(X) = \beta_1 L_1 + \cdots + \beta_{m+1} L_{m+1}$$
is the unique desired polynomial, since clearly, $P(\alpha_i) = \beta_i$. Such a polynomial is called a Lagrange interpolant. Also note that the polynomials $(L_1, \ldots, L_{m+1})$ form a basis of the vector space of all polynomials of degree $\leq m$. Indeed, if we had
$$\lambda_1 L_1(X) + \cdots + \lambda_{m+1} L_{m+1}(X) = 0,$$
setting $X$ to $\alpha_i$, we would get $\lambda_i = 0$. Thus, the $L_i$ are linearly independent, and by the previous argument, they are a set of generators. We call $(L_1, \ldots, L_{m+1})$ the Lagrange basis (of order $m + 1$).
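The construction can be tried out with exact rational arithmetic. The following is a minimal sketch, assuming $K = \mathbb{Q}$ represented by Python's `fractions.Fraction`; the function names `lagrange_basis` and `lagrange_interpolant` are illustrative, and indices start at 0 rather than 1.

```python
from fractions import Fraction


def lagrange_basis(alphas, i, x):
    """Value at x of the i-th Lagrange polynomial for the nodes alphas:
    the product over j != i of (x - alpha_j) / (alpha_i - alpha_j)."""
    value = Fraction(1)
    for j, a in enumerate(alphas):
        if j != i:
            value *= Fraction(x - a, alphas[i] - a)
    return value


def lagrange_interpolant(alphas, betas, x):
    """Value at x of P = sum_i beta_i * L_i, the Lagrange interpolant."""
    return sum(b * lagrange_basis(alphas, i, x) for i, b in enumerate(betas))


if __name__ == "__main__":
    alphas, betas = [0, 1, 2], [1, 3, 7]      # samples of X^2 + X + 1
    print([lagrange_interpolant(alphas, betas, a) for a in alphas])
    print(lagrange_interpolant(alphas, betas, 3))
```

The data are samples of $X^2 + X + 1$, so by the uniqueness argument the interpolant must reproduce that polynomial at every point, not only at the nodes.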
It is known from numerical analysis that from a computational point of view, the Lagrange basis is not very good. Newton proposed another solution, the method of divided differences. Consider the polynomial $P(X)$ of degree $\leq m$, called the Newton interpolant,
$$P(X) = \lambda_0 + \lambda_1 (X - \alpha_1) + \lambda_2 (X - \alpha_1)(X - \alpha_2) + \cdots + \lambda_m (X - \alpha_1)(X - \alpha_2) \cdots (X - \alpha_m).$$
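A minimal sketch of how the coefficients $\lambda_0, \ldots, \lambda_m$ of the Newton form can be obtained, assuming the standard divided-difference recurrence from numerical analysis and exact arithmetic over $\mathbb{Q}$. The names `newton_coefficients` and `newton_evaluate` are hypothetical, and the nodes are indexed from 0.

```python
from fractions import Fraction


def newton_coefficients(alphas, betas):
    """Coefficients lambda_0, ..., lambda_m of the Newton form.

    Standard in-place divided-difference scheme: after pass `level`,
    coeffs[k] holds the divided difference on alphas[k - level .. k].
    """
    coeffs = [Fraction(b) for b in betas]
    n = len(alphas)
    for level in range(1, n):
        for k in range(n - 1, level - 1, -1):
            coeffs[k] = (coeffs[k] - coeffs[k - 1]) / (alphas[k] - alphas[k - level])
    return coeffs


def newton_evaluate(alphas, coeffs, x):
    """Evaluate lambda_0 + lambda_1 (x - a_1) + ... by nested multiplication."""
    result = Fraction(0)
    for lam, a in zip(reversed(coeffs), reversed(alphas)):
        result = result * (x - a) + lam
    return result


if __name__ == "__main__":
    alphas, betas = [0, 1, 2], [1, 3, 7]
    lams = newton_coefficients(alphas, betas)
    print(lams, newton_evaluate(alphas, lams, 3))
```

One practical advantage of this form is that adding a new node only appends one coefficient; the earlier ones are unchanged.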
using
Q(1 , . . . , i+1 ) =
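Then, the computation can be arranged into a triangular array reminiscent of Pascal's triangle, as follows:
Initially, $Q(\alpha_j) = \beta_j$, $1 \leq j \leq m + 1$, and
$$\begin{array}{llll}
Q(\alpha_1) & & & \\
 & Q(\alpha_1, \alpha_2) & & \\
Q(\alpha_2) & & Q(\alpha_1, \alpha_2, \alpha_3) & \\
 & Q(\alpha_2, \alpha_3) & & \cdots \\
Q(\alpha_3) & & Q(\alpha_2, \alpha_3, \alpha_4) & \\
 & Q(\alpha_3, \alpha_4) & & \cdots \\
Q(\alpha_4) & & \cdots & \\
\cdots & \cdots & &
\end{array}$$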
In this computation, each successive column is obtained by forming the difference quotients of the preceding column according to the formula
$$Q(\alpha_k, \ldots, \alpha_{i+k}) = \frac{Q(\alpha_{k+1}, \ldots, \alpha_{i+k}) - Q(\alpha_k, \ldots, \alpha_{i+k-1})}{\alpha_{i+k} - \alpha_k}.$$
Now consider interpolating the function
$$f(x) = \frac{1}{1 + x^2},$$
in the interval $[-5, +5]$. Assuming a uniform distribution of points on the curve in the interval $[-5, +5]$, as the degree of the Lagrange interpolant increases, the interpolant shows wilder and wilder oscillations around the points $x = -5$ and $x = +5$. This phenomenon becomes quite noticeable beginning for degree 14, and gets worse and worse. For degree 22, things are quite bad! Equivalently, one may consider the function
$$f(x) = \frac{1}{1 + 25x^2},$$
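The oscillation phenomenon can be observed numerically. The sketch below is only an illustration under stated assumptions: floating-point arithmetic, equally spaced nodes on $[-5, 5]$, the function $1/(1 + x^2)$, and a maximum taken over a finite sample grid rather than the whole interval. The names `lagrange_eval` and `max_interpolation_error` are hypothetical.

```python
def lagrange_eval(xs, ys, x):
    """Evaluate the Lagrange interpolant through the points (xs[i], ys[i]) at x."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total


def max_interpolation_error(degree, a=-5.0, b=5.0, samples=1001):
    """Largest sampled error |P(x) - f(x)| for f(x) = 1 / (1 + x^2),
    where P interpolates f at degree + 1 equally spaced nodes in [a, b]."""
    def f(x):
        return 1.0 / (1.0 + x * x)

    xs = [a + (b - a) * k / degree for k in range(degree + 1)]
    ys = [f(x) for x in xs]
    grid = [a + (b - a) * k / (samples - 1) for k in range(samples)]
    return max(abs(lagrange_eval(xs, ys, x) - f(x)) for x in grid)


if __name__ == "__main__":
    for degree in (4, 10, 14, 20):
        print(degree, max_interpolation_error(degree))
```

Running the loop shows the sampled error growing with the degree once the degree is moderately large, with the largest deviations near the endpoints of the interval.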
P (1 ) = 10 ,
D1 P (1 ) = 11 ,
Note that the above equations constitute $n + 1$ constraints, and thus, we can expect that there is a unique polynomial of degree $n$ satisfying the above problem. This is indeed the case and such a polynomial is called a Hermite polynomial. We call the above problem the Hermite interpolation problem.
Proposition 20.28. The Hermite interpolation problem has a unique solution of degree $n$, where $n = n_1 + \cdots + n_{m+1} + m$.
Proof. First, we prove that the Hermite interpolation problem has at most one solution. Assume that $P$ and $Q$ are two distinct solutions of degree $n$. Then, by Proposition 20.25 and the criterion following it, $P - Q$ has among its roots $\alpha_1$ of multiplicity at least $n_1 + 1$, $\ldots$, $\alpha_{m+1}$ of multiplicity at least $n_{m+1} + 1$. However, by Theorem 20.22, we should have
$$n_1 + 1 + \cdots + n_{m+1} + 1 = n_1 + \cdots + n_{m+1} + m + 1 \leq n,$$
which is a contradiction, since $n = n_1 + \cdots + n_{m+1} + m$. Thus, $P = Q$. We are left with proving the existence of a Hermite interpolant. A quick way to do so is to use Proposition 5.13, which tells us that given a square matrix $A$ over a field $K$, the following properties hold:
For every column vector $B$, there is a unique column vector $X$ such that $AX = B$ iff the only solution to $AX = 0$ is the trivial vector $X = 0$ iff $D(A) \neq 0$.
Proposition 20.28 shows the existence of unique polynomials $H_j^i(X)$ of degree $n$ such that $D^i H_j^i(\alpha_j) = 1$ and $D^k H_j^i(\alpha_l) = 0$, for $k \neq i$ or $l \neq j$, $1 \leq j, l \leq m + 1$, $0 \leq i, k \leq n_j$. The polynomials $H_j^i$ are called Hermite basis polynomials.
One problem with Proposition 20.28 is that it does not give an explicit way of computing the Hermite basis polynomials. We first show that this can be done explicitly in the special cases $n_1 = \cdots = n_{m+1} = 1$, and $n_1 = \cdots = n_{m+1} = 2$, and then suggest a method using a generalized Newton interpolant.
Assume that $n_1 = \cdots = n_{m+1} = 1$. We try $H_j^0 = (a(X - \alpha_j) + b)L_j^2$, and $H_j^1 = (c(X - \alpha_j) + d)L_j^2$, where $L_j$ is the Lagrange interpolant determined earlier. Since
$$DH_j^0 = aL_j^2 + 2(a(X - \alpha_j) + b)L_j DL_j,$$
requiring that $H_j^0(\alpha_j) = 1$, $H_j^0(\alpha_k) = 0$, $DH_j^0(\alpha_j) = 0$, and $DH_j^0(\alpha_k) = 0$, for $k \neq j$, implies $b = 1$ and $a = -2DL_j(\alpha_j)$. Similarly, from the requirements $H_j^1(\alpha_j) = 0$, $H_j^1(\alpha_k) = 0$, $DH_j^1(\alpha_j) = 1$, and $DH_j^1(\alpha_k) = 0$, $k \neq j$, we get $c = 1$ and $d = 0$.
Thus, we have the Hermite polynomials
$$H_j^0 = (1 - 2DL_j(\alpha_j)(X - \alpha_j))L_j^2,$$
$$H_j^1 = (X - \alpha_j)L_j^2.$$
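For instance, with $m = 1$, $\alpha_1 = 0$, and $\alpha_2 = 1$, so that $L_1 = 1 - X$ and $L_2 = X$, these formulas give the cubic polynomials
$$H_1^0 = 2X^3 - 3X^2 + 1,$$
$$H_2^0 = -2X^3 + 3X^2,$$
$$H_1^1 = X^3 - 2X^2 + X,$$
$$H_2^1 = X^3 - X^2.$$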
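The four cubics above are the familiar cubic Hermite basis on the nodes 0 and 1: each one has exactly one of the four quantities (value at 0, value at 1, derivative at 0, derivative at 1) equal to 1 and the other three equal to 0. The sketch below checks this directly; the dictionary keys and helper names are labels chosen for the example, with coefficient lists stored in increasing degree.

```python
def evaluate(f, x):
    """Horner evaluation of f = [c0, c1, ..., cn] at x."""
    result = 0
    for c in reversed(f):
        result = result * x + c
    return result


def derivative(f):
    """Formal derivative of a coefficient list."""
    return [i * c for i, c in enumerate(f)][1:]


def endpoint_data(f):
    """Return (f(0), f(1), f'(0), f'(1))."""
    d = derivative(f)
    return (evaluate(f, 0), evaluate(f, 1), evaluate(d, 0), evaluate(d, 1))


# Coefficient lists [c0, c1, c2, c3] of the four cubics.
CUBIC_HERMITE = {
    "value at 0": [1, 0, -3, 2],        # 2X^3 - 3X^2 + 1
    "value at 1": [0, 0, 3, -2],        # -2X^3 + 3X^2
    "derivative at 0": [0, 1, -2, 1],   # X^3 - 2X^2 + X
    "derivative at 1": [0, 0, -1, 1],   # X^3 - X^2
}

if __name__ == "__main__":
    for name, f in CUBIC_HERMITE.items():
        print(name, endpoint_data(f))
```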
$$t = \frac{X - a}{b - a}.$$
Observe the presence of the extra factor $(b - a)$ in front of $m_0$ and $m_1$; the formula would be false otherwise!
Going back to the general problem, it seems to us that a kind of Newton interpolant will
be more manageable. Let
$$P_0^0(X) = 1,$$
$$P_j^0(X) = (X - \alpha_1)^{n_1+1} \cdots (X - \alpha_j)^{n_j+1}, \quad 1 \leq j \leq m,$$
$$P(X) = \sum_{j=0,\, i=0}^{j=m,\, i=n_{j+1}} \lambda_j^i P_j^i(X).$$
We can think of $P(X)$ as a generalized Newton interpolant. We can compute the derivatives $D^k P_j^i$, for $1 \leq k \leq n_{j+1}$, and if we look for the Hermite basis polynomials $H_j^i(X)$ such that $D^i H_j^i(\alpha_j) = 1$ and $D^k H_j^i(\alpha_l) = 0$, for $k \neq i$ or $l \neq j$, $1 \leq j, l \leq m + 1$, $0 \leq i, k \leq n_j$, we find that we have to solve triangular systems of linear equations. Thus, as in the simple case $n_1 = \cdots = n_{m+1} = 0$, we can solve successively for the $\lambda_j^i$. Obviously, the computations are quite formidable and we leave such considerations for further study.
Chapter 21
UFDs, Noetherian Rings, Hilbert's
Basis Theorem
21.1
We saw in Section 20.5 that if K is a field, then every nonnull polynomial in K[X] can
be factored as a product of irreducible factors, and that such a factorization is essentially
unique. The same property holds for the ring $K[X_1, \ldots, X_n]$ where $n \geq 2$, but a different proof is needed.
The reason why unique factorization holds for $K[X_1, \ldots, X_n]$ is that if $A$ is an integral domain for which unique factorization holds in some suitable sense, then the property of unique factorization lifts to the polynomial ring $A[X]$. Such rings are called factorial rings, or unique factorization domains. The first step is to define the notion of irreducible element in an integral domain, and then to define a factorial ring. It will turn out that in a factorial ring, any nonnull element $a$ is irreducible (or prime) iff the principal ideal $(a)$ is a prime ideal.
Recall that given a ring $A$, a unit is any invertible element (w.r.t. multiplication). The set of units of $A$ is denoted by $A^*$. It is a group under multiplication, with identity $1$. Also, given $a, b \in A$, recall that $a$ divides $b$ if $b = ac$ for some $c \in A$; equivalently, $a$ divides $b$ iff $(b) \subseteq (a)$. Any nonzero $a \in A$ is divisible by any unit $u$, since $a = u(u^{-1}a)$. The relation "$a$ divides $b$," often denoted by $a \mid b$, is reflexive and transitive, and thus, a preorder on $A - \{0\}$.
Definition 21.1. Let $A$ be an integral domain. Some element $a \in A$ is irreducible if $a \neq 0$, $a \notin A^*$ ($a$ is not a unit), and whenever $a = bc$, then either $b$ or $c$ is a unit (where $b, c \in A$). Equivalently, $a \in A$ is reducible if $a = 0$, or $a \in A^*$ ($a$ is a unit), or $a = bc$ where $b, c \notin A^*$ ($b$, $c$ are both noninvertible) and $b, c \neq 0$.
Observe that if $a \in A$ is irreducible and $u \in A$ is a unit, then $ua$ is also irreducible. Generally, if $a \in A$, $a \neq 0$, and $u$ is a unit, then $a$ and $ua$ are said to be associated. This is the equivalence relation on nonnull elements of $A$ induced by the divisibility preorder.
There are integral domains that are not UFDs. For example, in the subring $\mathbb{Z}[\sqrt{-5}]$ of $\mathbb{C}$, we have
$$9 = 3 \cdot 3 = (2 + i\sqrt{5})(2 - i\sqrt{5}),$$
and it can be shown that $3$, $2 + i\sqrt{5}$, and $2 - i\sqrt{5}$ are irreducible, and that the units are $\pm 1$. The uniqueness condition (2) fails and $\mathbb{Z}[\sqrt{-5}]$ is not a UFD.
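The irreducibility claims can be checked with the multiplicative norm $N(a + ib\sqrt{5}) = a^2 + 5b^2$. This norm argument is a standard one and is an assumption of the sketch below, not something spelled out in the text. Elements are stored as pairs `(a, b)`, and the function names are illustrative.

```python
def norm(a, b):
    """Norm of a + b*i*sqrt(5): a^2 + 5*b^2. It is multiplicative."""
    return a * a + 5 * b * b


def multiply(x, y):
    """Product of two elements (a, b) and (c, d) of Z[sqrt(-5)]."""
    (a, b), (c, d) = x, y
    return (a * c - 5 * b * d, a * d + b * c)


def elements_of_norm(n):
    """All (a, b) with a^2 + 5*b^2 == n, found by a bounded search."""
    bound = int(n ** 0.5) + 1
    return [(a, b)
            for a in range(-bound, bound + 1)
            for b in range(-bound, bound + 1)
            if norm(a, b) == n]


if __name__ == "__main__":
    print(multiply((3, 0), (3, 0)), multiply((2, 1), (2, -1)))
    print(elements_of_norm(1))   # the units
    print(elements_of_norm(3))   # candidates for proper factors of norm-9 elements
```

Since $3$ and $2 \pm i\sqrt{5}$ all have norm 9, a proper factor of any of them would need norm 3. The search finds no such element, while the only elements of norm 1 are $\pm 1$, so the two factorizations of 9 are genuinely different.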
Remark: For $d \in \mathbb{Z}$ with $d < 0$, it is known that the ring of integers of $\mathbb{Q}(\sqrt{d})$ is a UFD iff $d$ is one of the nine integers $d = -1, -2, -3, -7, -11, -19, -43, -67$ and $-163$. This is a hard theorem that was conjectured by Gauss but not proved until 1966, independently by Stark and Baker. Heegner had published a proof of this result in 1952 but there was some doubt about its validity. After finding his proof, Stark reexamined Heegner's proof and concluded that it was essentially correct after all. In sharp contrast, when $d$ is a positive integer, the problem of determining which of the rings of integers of $\mathbb{Q}(\sqrt{d})$ are UFDs is still open. It can also be shown that if $d < 0$, then the ring $\mathbb{Z}[\sqrt{d}]$ is a UFD iff $d = -1$ or $d = -2$. If $d \equiv 1 \pmod 4$, then $\mathbb{Z}[\sqrt{d}]$ is never a UFD. For more details about these remarkable results, see Stark [100] (Chapter 8).
Proposition 21.2. Let $A$ be an integral domain satisfying condition (1) in Definition 21.2. Then, condition (2) in Definition 21.2 is equivalent to the following condition:
(2') If $a \in A$ is irreducible and $a$ divides the product $bc$, where $b, c \in A$ and $b, c \neq 0$, then either $a$ divides $b$ or $a$ divides $c$.
Proof. First, assume that (2) holds. Let $bc = ad$, where $d \in A$, $d \neq 0$. If $b$ is a unit, then
$$c = adb^{-1},$$
and $c$ is divisible by $a$. A similar argument applies to $c$. Thus, we may assume that $b$ and $c$ are not units. In view of (1), we can write
$$b = p_1 \cdots p_m \quad\text{and}\quad c = p_{m+1} \cdots p_{m+n},$$
where $p_i \in A$ is irreducible. Since $bc = ad$, $a$ is irreducible, and $b, c$ are not units, $d$ cannot be a unit. In view of (1), we can write
$$d = q_1 \cdots q_r,$$
where $q_i \in A$ is irreducible. Thus,
$$p_1 \cdots p_m p_{m+1} \cdots p_{m+n} = a q_1 \cdots q_r,$$
where all the factors involved are irreducible. By (2), we must have
$$a = u_{i_0} p_{i_0}$$
$$a = a_1 \cdots a_m = b_1 \cdots b_n,$$
where $a_i \in A$ and $b_j \in A$ are irreducible. Without loss of generality, we may assume that $m \leq n$. We proceed by induction on $m$. If $m = 1$,
$$a_1 = b_1 \cdots b_n,$$
and since $a_1$ is irreducible, $u = b_1 \cdots b_{i-1} b_{i+1} \cdots b_n$ must be a unit for some $i$, $1 \leq i \leq n$. Thus, (2) holds with $n = 1$ and $a_1 = b_i u$. Assume that $m > 1$ and that the induction hypothesis holds for $m - 1$. Since
$$a_1 a_2 \cdots a_m = b_1 \cdots b_n,$$
$a_1$ divides $b_1 \cdots b_n$, and in view of (2'), $a_1$ divides some $b_j$. Since $a_1$ and $b_j$ are irreducible, we must have $b_j = u_j a_1$, where $u_j \in A$ is a unit. Since $A$ is an integral domain,
$$a_1 a_2 \cdots a_m = b_1 \cdots b_{j-1} u_j a_1 b_{j+1} \cdots b_n$$
implies that
$$a_2 \cdots a_m = (u_j b_1) \cdots b_{j-1} b_{j+1} \cdots b_n,$$
and by the induction hypothesis, $m - 1 = n - 1$ and $a_i = v_i b_{\tau(i)}$ for some units $v_i \in A$ and some bijection $\tau$ between $\{2, \ldots, m\}$ and $\{1, \ldots, j - 1, j + 1, \ldots, m\}$. However, the bijection $\tau$ extends to a permutation $\sigma$ of $\{1, \ldots, m\}$ by letting $\sigma(1) = j$, and the result holds by letting $v_1 = u_j^{-1}$.
As a corollary of Proposition 21.2, we get the converse of Proposition 21.1.
Proposition 21.3. Let $A$ be a factorial ring. For any $a \in A$ with $a \neq 0$, the principal ideal $(a)$ is a prime ideal iff $a$ is irreducible.
Proof. In view of Proposition 21.1, we just have to prove that if $a \in A$ is irreducible, then the principal ideal $(a)$ is a prime ideal. Indeed, if $bc \in (a)$, then $a$ divides $bc$, and by Proposition 21.2, property (2') implies that either $a$ divides $b$ or $a$ divides $c$, that is, either $b \in (a)$ or $c \in (a)$, which means that $(a)$ is prime.
Because Proposition 21.3 holds, in a UFD, an irreducible element is often called a prime.
In a UFD $A$, every nonzero element $a \in A$ that is not a unit can be expressed as a product $a = a_1 \cdots a_n$ of irreducible elements $a_i$, and by property (2), the number $n$ of factors only depends on $a$, that is, it is the same for all factorizations into irreducible factors. We agree that this number is $0$ for a unit.
Remark: If $A$ is a UFD, we can state the factorization properties so that they also apply to units:
Proof. Assume that $f(X) = ag(X)$, for some $g(X) \in A[X]$. Since $a \neq 0$ and $A$ is an integral domain, $f(X)$ and $g(X)$ have the same degree $m$, and since for every $i$ ($0 \leq i \leq m$) the coefficient of $X^i$ in $f(X)$ is equal to the coefficient of $X^i$ in $ag(X)$, we have $f_i = ag_i$, and whenever $f_i \neq 0$, we see that $a$ divides $f_i$.
Lemma 21.5. (Gauss's lemma) Let $A$ be a UFD. For any $a \in A$, if $a$ is irreducible and $a$ divides the product $f(X)g(X)$ of two polynomials $f(X), g(X) \in A[X]$, then either $a$ divides $f(X)$ or $a$ divides $g(X)$.
Proof. Let $f(X) = f_m X^m + \cdots + f_i X^i + \cdots + f_0$ and $g(X) = g_n X^n + \cdots + g_j X^j + \cdots + g_0$. Assume that $a$ divides neither $f(X)$ nor $g(X)$. By the (easy) converse of Proposition 21.4, there is some $i$ ($0 \leq i \leq m$) such that $a$ does not divide $f_i$, and there is some $j$ ($0 \leq j \leq n$) such that $a$ does not divide $g_j$. Pick $i$ and $j$ minimal such that $a$ does not divide $f_i$ and $a$ does not divide $g_j$. The coefficient $c_{i+j}$ of $X^{i+j}$ in $f(X)g(X)$ is
$$c_{i+j} = f_0 g_{i+j} + f_1 g_{i+j-1} + \cdots + f_i g_j + \cdots + f_{i+j} g_0$$
(letting $f_h = 0$ if $h > m$ and $g_k = 0$ if $k > n$). From the choice of $i$ and $j$, $a$ cannot divide $f_i g_j$, since $a$ being irreducible, by (2') of Proposition 21.2, $a$ would divide $f_i$ or $g_j$. However, by the choice of $i$ and $j$, $a$ divides every other nonnull term in the sum for $c_{i+j}$, and since $a$ is irreducible and divides $f(X)g(X)$, by Proposition 21.4, $a$ divides $c_{i+j}$, which implies that $a$ divides $f_i g_j$, a contradiction. Thus, either $a$ divides $f(X)$ or $a$ divides $g(X)$.
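For $A = \mathbb{Z}$, Gauss's lemma can be illustrated with integer coefficient lists. The sketch below assumes polynomials stored as `[c0, c1, ...]`; the names `poly_mul`, `divides_poly`, and `content` are hypothetical. It also checks, on one example, the well-known consequence that the gcd of the coefficients (the content) is multiplicative.

```python
from functools import reduce
from math import gcd


def poly_mul(f, g):
    """Product of two integer polynomials given as coefficient lists."""
    product = [0] * (len(f) + len(g) - 1)
    for i, fi in enumerate(f):
        for j, gj in enumerate(g):
            product[i + j] += fi * gj
    return product


def divides_poly(a, f):
    """True if the integer a divides every coefficient of f."""
    return all(c % a == 0 for c in f)


def content(f):
    """Nonnegative gcd of the coefficients of f."""
    return reduce(gcd, f, 0)


if __name__ == "__main__":
    f = [2, 4]         # 2 + 4X
    g = [3, 9, 6]      # 3 + 9X + 6X^2
    fg = poly_mul(f, g)
    print(fg, divides_poly(3, fg), divides_poly(3, f), divides_poly(3, g))
    print(content(fg), content(f) * content(g))
```

Here the prime 3 divides every coefficient of the product and, as the lemma predicts, it divides every coefficient of one of the factors.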
As a corollary, we get the following proposition.
Proposition 21.6. Let $A$ be a UFD. For any $a \in A$, $a \neq 0$, if $a$ divides the product $f(X)g(X)$ of two polynomials $f(X), g(X) \in A[X]$ and $f(X)$ is irreducible and of degree at least 1, then $a$ divides $g(X)$.
Proof. The proposition is trivial if $a$ is a unit. Otherwise, $a = a_1 \cdots a_m$ where $a_i \in A$ is irreducible. Using induction and applying Lemma 21.5, we conclude that $a$ divides $g(X)$.
We now show that Lemma 21.5 also applies to the case where a is an irreducible polynomial. This requires a little excursion involving the fraction field F of A.
Remark: If A is a UFD, it is possible to prove the uniqueness condition (2) for A[X] directly
without using the fraction field of A, see Malliavin [75], Chapter 3.
Given an integral domain $A$, we can construct a field $F$ such that every element of $F$ is of the form $a/b$, where $a, b \in A$, $b \neq 0$, using essentially the method for constructing the field $\mathbb{Q}$ of rational numbers from the ring $\mathbb{Z}$ of integers.
Proposition 21.7. Let $A$ be an integral domain.
(1) There is a field $F$ and an injective ring homomorphism $i\colon A \to F$ such that every element of $F$ is of the form $i(a)i(b)^{-1}$, where $a, b \in A$, $b \neq 0$.
(2) For every field $K$ and every injective ring homomorphism $h\colon A \to K$, there is a (unique) field homomorphism $\widehat{h}\colon F \to K$ such that
$$\widehat{h}(i(a)i(b)^{-1}) = h(a)h(b)^{-1}$$
for all $a, b \in A$, $b \neq 0$.
(3) The field $F$ in (1) is unique up to isomorphism.
Proof. (1) Consider the binary relation $\simeq$ on $A \times (A - \{0\})$ defined as follows:
$$(a, b) \simeq (a', b') \quad\text{iff}\quad ab' = a'b.$$
It is easily seen that $\simeq$ is an equivalence relation. Note that the fact that $A$ is an integral domain is used to prove transitivity. The equivalence class of $(a, b)$ is denoted by $a/b$. Clearly, $(0, b) \simeq (0, 1)$ for all nonzero $b \in A$, and we denote the class of $(0, 1)$ also by $0$. The equivalence class $a/1$ of $(a, 1)$ is also denoted by $a$. We define addition and multiplication on $A \times (A - \{0\})$ as follows:
$$(a, b) + (a', b') = (ab' + a'b, bb'),$$
$$(a, b) \cdot (a', b') = (aa', bb').$$
It is easily verified that $\simeq$ is congruential w.r.t. $+$ and $\cdot$, which means that $+$ and $\cdot$ are well-defined on equivalence classes modulo $\simeq$. When $a, b \neq 0$, the inverse of $a/b$ is $b/a$, and it is easily verified that $F$ is a field. The map $i\colon A \to F$ defined such that $i(a) = a/1$ is an injection of $A$ into $F$ and clearly
$$\frac{a}{b} = i(a)i(b)^{-1}.$$
(2) Given an injective ring homomorphism $h\colon A \to K$ into a field $K$,
$$\frac{a}{b} = \frac{a'}{b'} \quad\text{iff}\quad ab' = a'b,$$
The field F given by Proposition 21.7 is called the fraction field of A, and it is denoted
by Frac(A).
In particular, given an integral domain $A$, since $A[X_1, \ldots, X_n]$ is also an integral domain, we can form the fraction field of the polynomial ring $A[X_1, \ldots, X_n]$, denoted by $F(X_1, \ldots, X_n)$, where $F = \mathrm{Frac}(A)$ is the fraction field of $A$. It is also called the field of rational functions over $F$, although the terminology is a bit misleading, since elements of $F(X_1, \ldots, X_n)$ only define functions when the denominator is nonnull.
We now have the following crucial lemma which shows that if a polynomial $f(X)$ is reducible over $F[X]$ where $F$ is the fraction field of $A$, then $f(X)$ is already reducible over $A[X]$.
Lemma 21.8. Let $A$ be a UFD and let $F$ be the fraction field of $A$. For any nonnull polynomial $f(X) \in A[X]$ of degree $m$, if $f(X)$ is not the product of two polynomials of degree strictly smaller than $m$, then $f(X)$ is irreducible in $F[X]$.
Proof. Assume that $f(X)$ is reducible in $F[X]$ and that $f(X)$ is neither null nor a unit. Then,
$$f(X) = G(X)H(X),$$
where $G(X), H(X) \in F[X]$ are polynomials of degree $p, q \geq 1$. Let $a$ be the product of the denominators of the coefficients of $G(X)$, and $b$ the product of the denominators of the coefficients of $H(X)$. Then, $a, b \neq 0$, $g_1(X) = aG(X) \in A[X]$ has degree $p \geq 1$, $h_1(X) = bH(X) \in A[X]$ has degree $q \geq 1$, and
$$abf(X) = g_1(X)h_1(X).$$
Let $c = ab$. If $c$ is a unit, then $f(X)$ is also reducible in $A[X]$. Otherwise, $c = c_1 \cdots c_n$, where $c_i \in A$ is irreducible. We now use induction on $n$ to prove that
$$f(X) = g(X)h(X),$$
for some polynomials $g(X) \in A[X]$ of degree $p \geq 1$ and $h(X) \in A[X]$ of degree $q \geq 1$.
for some polynomials $g_2(X) \in A[X]$ of degree $p \geq 1$ and $h_2(X) \in A[X]$ of degree $q \geq 1$. By the induction hypothesis, we get
$$f(X) = g(X)h(X),$$
for some polynomials $g(X) \in A[X]$ of degree $p \geq 1$ and $h(X) \in A[X]$ of degree $q \geq 1$, showing that $f(X)$ is reducible in $A[X]$.
Finally, we can prove that (2') holds.
Lemma 21.9. Let $A$ be a UFD. Given any three nonnull polynomials $f(X), g(X), h(X) \in A[X]$, if $f(X)$ is irreducible and $f(X)$ divides the product $g(X)h(X)$, then either $f(X)$ divides $g(X)$ or $f(X)$ divides $h(X)$.
Proof. If $f(X)$ has degree $0$, then the result follows from Lemma 21.5. Thus, we may assume that the degree of $f(X)$ is $m \geq 1$. Let $F$ be the fraction field of $A$. By Lemma 21.8, $f(X)$ is also irreducible in $F[X]$. Since $F[X]$ is a UFD (by Theorem 20.16), either $f(X)$ divides $g(X)$ or $f(X)$ divides $h(X)$, in $F[X]$. Assume that $f(X)$ divides $g(X)$, the other case being similar. Then,
$$g(X) = f(X)G(X),$$
for some $G(X) \in F[X]$. If $a$ is the product of the denominators of the coefficients of $G$, we have
$$ag(X) = q_1(X)f(X),$$
where $q_1(X) = aG(X) \in A[X]$. If $a$ is a unit, we see that $f(X)$ divides $g(X)$. Otherwise, $a = a_1 \cdots a_n$, where $a_i \in A$ is irreducible. We prove by induction on $n$ that
$$g(X) = q(X)f(X)$$
for some $q(X) \in A[X]$.
by Lemma 21.5, $a_1$ divides $q_1(X)$. Thus, $q_1(X) = a_1 q(X)$ where $q(X) \in A[X]$. Since $A[X]$ is an integral domain, we get
$$g(X) = q(X)f(X),$$
and $f(X)$ divides $g(X)$. If $n > 1$, from
$$a_1 \cdots a_n g(X) = q_1(X)f(X),$$
we note that $a_1$ divides $q_1(X)f(X)$, and as in the previous case, $a_1$ divides $q_1(X)$. Thus, $q_1(X) = a_1 q_2(X)$ where $q_2(X) \in A[X]$, and we get
$$a_2 \cdots a_n g(X) = q_2(X)f(X).$$
$$\bigcup_{n \geq 1} (a_n)$$
$$\bigcup_{n \geq 1} (a_n) = (a)$$
for some $a \in A$. However, there must be some $n$ such that $a \in (a_n)$, and thus,
$$(a_n) \subseteq (a) \subseteq (a_n),$$
and the chain stabilizes at $(a_n)$. As a consequence, for any ideal $(d)$ such that
$$(a_n) \subseteq (d)$$
and $(a_n) \neq (d)$, $d$ has the desired factorization. Observe that $a_n$ is not irreducible, since $(a_n) \in S$, and thus,
$$a_n = bc$$
for some $b, c \in A$, where neither $b$ nor $c$ is a unit. Then,
$$(a_n) \subseteq (b) \quad\text{and}\quad (a_n) \subseteq (c).$$
If $(a_n) = (b)$, then $b = a_n u$ for some $u \in A$, and then
$$a_n = a_n uc,$$
so that
$$1 = uc,$$
since $A$ is an integral domain, and thus, $c$ is a unit, a contradiction. Thus, $(a_n) \neq (b)$, and similarly, $(a_n) \neq (c)$. But then, both $b$ and $c$ factor as products of irreducible elements and so does $a_n = bc$, a contradiction. This implies that $S = \emptyset$.
To prove the uniqueness of factorizations, we use Proposition 21.2. Assume that $a$ is irreducible and that $a$ divides $bc$. If $a$ does not divide $b$, by a previous remark, $a$ and $b$ are relatively prime, and by Proposition 21.11, there are some $x, y \in A$ such that
$$ax + by = 1.$$
Thus,
$$acx + bcy = c,$$
and since $a$ divides $bc$, we see that $a$ must divide $c$, as desired.
Thus, we get another justification of the fact that $\mathbb{Z}$ is a UFD and that if $K$ is a field, then $K[X]$ is a UFD.
It should also be noted that in a UFD, gcds of nonnull elements always exist. Indeed, this is trivial if $a$ or $b$ is a unit, and otherwise, we can write
$$a = p_1 \cdots p_m \quad\text{and}\quad b = q_1 \cdots q_n$$
where $p_i, q_j \in A$ are irreducible, and the product of the common factors of $a$ and $b$ is a gcd of $a$ and $b$ (it is $1$ if there are no common factors).
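For $A = \mathbb{Z}$ the recipe reads: factor both integers into primes and multiply the common prime factors, counted with the smaller multiplicity. The following sketch assumes positive integers and uses naive trial division; the names `factorize` and `gcd_by_factorization` are illustrative.

```python
from collections import Counter


def factorize(n):
    """Prime factorization of a positive integer as a Counter {prime: exponent}."""
    factors = Counter()
    p = 2
    while p * p <= n:
        while n % p == 0:
            factors[p] += 1
            n //= p
        p += 1
    if n > 1:
        factors[n] += 1
    return factors


def gcd_by_factorization(a, b):
    """Product of the common prime factors of a and b (1 if there are none)."""
    common = factorize(a) & factorize(b)   # multiset intersection: minimum exponents
    result = 1
    for p, e in common.items():
        result *= p ** e
    return result


if __name__ == "__main__":
    print(gcd_by_factorization(360, 84), gcd_by_factorization(35, 12))
```

This is far less efficient than the Euclidean algorithm, but it works verbatim in any UFD where factorizations can be computed, even when no Euclidean division is available.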
We conclude this section on UFDs by proving a proposition characterizing when a UFD is a PID. The proof is nontrivial and makes use of Zorn's lemma (several times).
Proposition 21.13. Let $A$ be a ring that is a UFD, and not a field. Then, $A$ is a PID iff every nonzero prime ideal is maximal.
Proof. Assume that $A$ is a PID that is not a field. Consider any nonzero prime ideal, $(p)$, and pick any proper ideal $\mathfrak{A}$ in $A$ such that
$$(p) \subseteq \mathfrak{A}.$$
Since $A$ is a PID, the ideal $\mathfrak{A}$ is a principal ideal, so $\mathfrak{A} = (q)$, and since $\mathfrak{A}$ is a proper nonzero ideal, $q \neq 0$ and $q$ is not a unit. Since
$$(p) \subseteq (q),$$
$q$ divides $p$, and we have $p = qp_1$ for some $p_1 \in A$. Now, by Proposition 21.1, since $p \neq 0$ and $(p)$ is a prime ideal, $p$ is irreducible. But then, since $p = qp_1$ and $p$ is irreducible, $p_1$ must be a unit (since $q$ is not a unit), which implies that
$$(p) = (q);$$
that is, $(p)$ is a maximal ideal.
Conversely, let us assume that every nonzero prime ideal is maximal. First, we prove that every prime ideal is principal. This is obvious for $(0)$. If $\mathfrak{A}$ is a nonzero prime ideal, then, by hypothesis, it is maximal. Since $\mathfrak{A} \neq (0)$, there is some nonzero element $a \in \mathfrak{A}$. Since $\mathfrak{A}$ is maximal, $a$ is not a unit, and since $A$ is a UFD, there is a factorization $a = a_1 \cdots a_n$ of $a$ into irreducible elements. Since $\mathfrak{A}$ is prime, we have $a_i \in \mathfrak{A}$ for some $i$. Now, by Proposition 21.3, since $a_i$ is irreducible, the ideal $(a_i)$ is prime, and so, by hypothesis, $(a_i)$ is maximal. Since $(a_i) \subseteq \mathfrak{A}$ and $(a_i)$ is maximal, we get $\mathfrak{A} = (a_i)$.
Next, assume that $A$ is not a PID. Define the set, $\mathcal{F}$, by
$$\mathcal{F} = \{\mathfrak{A} \mid \mathfrak{A} \subseteq A,\ \mathfrak{A} \text{ is not a principal ideal}\}.$$
Since $A$ is not a PID, the set $\mathcal{F}$ is nonempty. Also, the reader will easily check that every chain in $\mathcal{F}$ is bounded. Then, by Zorn's lemma (Lemma 31.1), the set $\mathcal{F}$ has some maximal element, $\mathfrak{A}$. Clearly, $\mathfrak{A} \neq (0)$ is a proper ideal (since $A = (1)$), and $\mathfrak{A}$ is not prime, since we just showed that prime ideals are principal. Then, by Theorem 31.3, there is some maximal ideal, $\mathfrak{M}$, so that $\mathfrak{A} \subseteq \mathfrak{M}$. However, a maximal ideal is prime, and we have shown that a prime ideal is principal. Thus,
$$\mathfrak{A} \subseteq (p),$$
for some $p \in A$ that is not a unit. Moreover, by Proposition 21.1, the element $p$ is irreducible. Define
$$\mathfrak{B} = \{a \in A \mid pa \in \mathfrak{A}\}.$$
Observe that the above proof shows that Proposition 21.13 also holds under the assumption that every prime ideal is principal.
21.2
The Chinese Remainder Theorem
In this section, which is a bit of an interlude, we prove a basic result about quotients of
commutative rings by products of ideals that are pairwise relatively prime. This result has
applications in number theory and in the structure theorem for finitely generated modules
over a PID, which will be presented later.
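Given two ideals $\mathfrak{a}$ and $\mathfrak{b}$ of a ring $A$, we define the ideal $\mathfrak{ab}$ as the set of all finite sums of the form
$$a_1 b_1 + \cdots + a_k b_k, \quad a_i \in \mathfrak{a},\ b_i \in \mathfrak{b}.$$
The reader should check that $\mathfrak{ab}$ is indeed an ideal. Observe that $\mathfrak{ab} \subseteq \mathfrak{a}$ and $\mathfrak{ab} \subseteq \mathfrak{b}$, so that
$$\mathfrak{ab} \subseteq \mathfrak{a} \cap \mathfrak{b}.$$
In general, equality does not hold. However if
$$\mathfrak{a} + \mathfrak{b} = A,$$
then we have
$$\mathfrak{ab} = \mathfrak{a} \cap \mathfrak{b}.$$
This is because there is some $a \in \mathfrak{a}$ and some $b \in \mathfrak{b}$ such that
$$a + b = 1,$$
so for every $x \in \mathfrak{a} \cap \mathfrak{b}$, we have
$$x = xa + xb,$$
which shows that $x \in \mathfrak{ab}$. Ideals $\mathfrak{a}$ and $\mathfrak{b}$ of $A$ that satisfy the condition $\mathfrak{a} + \mathfrak{b} = A$ are sometimes said to be comaximal.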
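In $A = \mathbb{Z}$ every ideal is of the form $n\mathbb{Z}$, and the three operations have well-known generators: the sum is generated by the gcd, the intersection by the lcm, and the product by the ordinary product. The sketch below encodes nonzero ideals by positive generators (an assumption of the example) and compares the product with the intersection; the function names are hypothetical.

```python
from math import gcd


def sum_generator(a, b):
    """Generator of aZ + bZ."""
    return gcd(a, b)


def intersection_generator(a, b):
    """Generator of the intersection of aZ and bZ (the lcm of a and b)."""
    return a * b // gcd(a, b)


def product_generator(a, b):
    """Generator of the product ideal (aZ)(bZ)."""
    return a * b


if __name__ == "__main__":
    for a, b in [(4, 9), (4, 6)]:
        print(a, b, sum_generator(a, b), intersection_generator(a, b), product_generator(a, b))
```

For 4 and 9 the sum is all of $\mathbb{Z}$ and the product equals the intersection; for 4 and 6 the ideals are not comaximal and the product $24\mathbb{Z}$ is strictly smaller than the intersection $12\mathbb{Z}$.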
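We define the homomorphism $\varphi\colon A \to A/\mathfrak{a} \times A/\mathfrak{b}$ by
$$\varphi(x) = (\overline{x}_{\mathfrak{a}}, \overline{x}_{\mathfrak{b}}),$$
where $\overline{x}_{\mathfrak{a}}$ is the equivalence class of $x$ modulo $\mathfrak{a}$ (resp. $\overline{x}_{\mathfrak{b}}$ is the equivalence class of $x$ modulo $\mathfrak{b}$). Recall that the ideal $\mathfrak{a}$ defines the equivalence relation $\equiv_{\mathfrak{a}}$ on $A$ given by
$$x \equiv_{\mathfrak{a}} y \quad\text{iff}\quad x - y \in \mathfrak{a},$$
and that $A/\mathfrak{a}$ is the quotient ring of equivalence classes $\overline{x}_{\mathfrak{a}}$, where $x \in A$, and similarly for $A/\mathfrak{b}$. Sometimes, we also write $x \equiv y \pmod{\mathfrak{a}}$ for $x \equiv_{\mathfrak{a}} y$.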
(mod a)
(mod b).
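$$\mathfrak{a}_1 + \mathfrak{a}_2 \cdots \mathfrak{a}_n = A.$$
$$\mathfrak{a}_1 \cap \mathfrak{a}_2 \cap \cdots \cap \mathfrak{a}_n = \mathfrak{a}_1 \cap (\mathfrak{a}_2 \cdots \mathfrak{a}_n) = \mathfrak{a}_1 \mathfrak{a}_2 \cdots \mathfrak{a}_n.$$
Let us now prove that $\varphi$ is surjective by induction. The case $n = 2$ is Theorem 21.14. Let $x_1, \ldots, x_n$ be any $n \geq 3$ elements of $A$. First, applying Theorem 21.14 to $\mathfrak{a}_1$ and $\mathfrak{a}_2 \cdots \mathfrak{a}_n$, we can find $y_1 \in A$ such that
$$y_1 \equiv 1 \pmod{\mathfrak{a}_1}$$
$$y_1 \equiv 0 \pmod{\mathfrak{a}_2 \cdots \mathfrak{a}_n}.$$
By the induction hypothesis, we can find $y_2, \ldots, y_n \in A$ such that for all $i, j$ with $2 \leq i, j \leq n$,
$$y_i \equiv 1 \pmod{\mathfrak{a}_i}$$
$$y_i \equiv 0 \pmod{\mathfrak{a}_j}, \quad j \neq i.$$
We claim that
$$x = x_1 y_1 + x_2 y_2 + \cdots + x_n y_n$$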
(mod ai ),
(mod ai ),
i = 2, . . . , n
(mod ai ),
i = 2, . . . , n.
()
For $i = 1$, we get
$$x \equiv x_1 \pmod{\mathfrak{a}_1},$$
therefore
$$x \equiv x_i \pmod{\mathfrak{a}_i}, \quad i = 1, \ldots, n,$$
proving surjectivity.
The classical version of the Chinese Remainder Theorem is the case where $A = \mathbb{Z}$ and where the ideals $\mathfrak{a}_i$ are defined by $n$ pairwise relatively prime integers $m_1, \ldots, m_n$. By the Bézout identity, since $m_i$ and $m_j$ are relatively prime whenever $i \neq j$, there exist some $u_i, u_j \in \mathbb{Z}$ such that $u_i m_i + u_j m_j = 1$, and so $m_i\mathbb{Z} + m_j\mathbb{Z} = \mathbb{Z}$. In this case, we get an isomorphism
$$\mathbb{Z}/(m_1 \cdots m_n)\mathbb{Z} \cong \prod_{i=1}^{n} \mathbb{Z}/m_i\mathbb{Z}.$$
$$\mathbb{Z}/p_i^{r_i}\mathbb{Z}.$$
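In the previous situation where the integers $m_1, \ldots, m_n$ are pairwise relatively prime, if we write $m = m_1 \cdots m_n$ and $m'_i = m/m_i$ for $i = 1, \ldots, n$, then $m_i$ and $m'_i$ are relatively prime, and so $m'_i$ has an inverse modulo $m_i$. If $t_i$ is such an inverse, so that
$$m'_i t_i \equiv 1 \pmod{m_i},$$
then for any integers $a_1, \ldots, a_n$, the integer $x = a_1 t_1 m'_1 + \cdots + a_n t_n m'_n$ satisfies the congruences
$$x \equiv a_i \pmod{m_i}, \quad i = 1, \ldots, n.$$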
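The classical recipe can be written out as a short program: with $m = m_1 \cdots m_n$, $m'_i = m/m_i$, and $t_i$ an inverse of $m'_i$ modulo $m_i$, the sum of the terms $a_i t_i m'_i$ has residue $a_i$ modulo each $m_i$. The sketch below assumes pairwise relatively prime positive moduli and Python 3.8 or later (for `math.prod` and the three-argument `pow` with exponent `-1`); the name `crt` and the residue symbols $a_i$ are choices made for the example.

```python
from math import prod


def crt(residues, moduli):
    """Return x with 0 <= x < m_1 * ... * m_n and x = a_i (mod m_i) for all i.

    The moduli are assumed to be pairwise relatively prime.
    """
    m = prod(moduli)
    x = 0
    for a_i, m_i in zip(residues, moduli):
        m_prime = m // m_i              # product of the other moduli
        t_i = pow(m_prime, -1, m_i)     # inverse of m_prime modulo m_i
        x += a_i * t_i * m_prime
    return x % m


if __name__ == "__main__":
    x = crt([2, 3, 2], [3, 5, 7])
    print(x, [x % m for m in (3, 5, 7)])
```

The term for index $i$ vanishes modulo every $m_j$ with $j \neq i$ because $m'_i$ is a multiple of $m_j$, which is the same mechanism as the elements $y_i$ in the surjectivity proof.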
Theorem 21.15 can be used to characterize rings isomorphic to finite products of quotient
rings. Such rings play a role in the structure theorem for torsion modules over a PID.
Given $n$ rings $A_1, \ldots, A_n$, recall that the product ring $A = A_1 \times \cdots \times A_n$ is the ring in which addition and multiplication are defined componentwise. That is,
$$(a_1, \ldots, a_n) + (b_1, \ldots, b_n) = (a_1 + b_1, \ldots, a_n + b_n)$$
$$(a_1, \ldots, a_n) \cdot (b_1, \ldots, b_n) = (a_1 b_1, \ldots, a_n b_n).$$
The additive identity is $0_A = (0, \ldots, 0)$ and the multiplicative identity is $1_A = (1, \ldots, 1)$. Then, for $i = 1, \ldots, n$, we can define the element $e_i \in A$ as follows:
$$e_i = (0, \ldots, 0, 1, 0, \ldots, 0),$$
where the $1$ occurs in position $i$. Observe that the following properties hold for all $i, j = 1, \ldots, n$:
$$e_i^2 = e_i$$
$$e_i e_j = 0, \quad i \neq j$$
$$e_1 + \cdots + e_n = 1_A.$$
Also, for any element $a = (a_1, \ldots, a_n) \in A$, we have
$$e_i a = (0, \ldots, 0, a_i, 0, \ldots, 0) = \mathrm{pr}_i(a),$$
where $\mathrm{pr}_i$ is the projection of $A$ onto $A_i$. As a consequence
$$\mathrm{Ker}\,(\mathrm{pr}_i) = (1_A - e_i)A.$$
Definition 21.3. Given a commutative ring $A$, a direct decomposition of $A$ is a sequence $(\mathfrak{b}_1, \ldots, \mathfrak{b}_n)$ of ideals in $A$ such that there is an isomorphism $A \cong A/\mathfrak{b}_1 \times \cdots \times A/\mathfrak{b}_n$.
The following theorem gives useful conditions characterizing direct decompositions of a ring.
Theorem 21.16. Let $A$ be a commutative ring and let $(\mathfrak{b}_1, \ldots, \mathfrak{b}_n)$ be a sequence of ideals in $A$. The following conditions are equivalent:
(a) The sequence $(\mathfrak{b}_1, \ldots, \mathfrak{b}_n)$ is a direct decomposition of $A$.
(b) There exist some elements $e_1, \ldots, e_n$ of $A$ such that
$$e_i^2 = e_i$$
$$e_i e_j = 0, \quad i \neq j$$
$$e_1 + \cdots + e_n = 1_A,$$
and $\mathfrak{b}_i = (1_A - e_i)A$, for $i, j = 1, \ldots, n$.
(c) We have $\mathfrak{b}_i + \mathfrak{b}_j = A$ for all $i \neq j$, and $\mathfrak{b}_1 \cdots \mathfrak{b}_n = (0)$.
(d) We have $\mathfrak{b}_i + \mathfrak{b}_j = A$ for all $i \neq j$, and $\mathfrak{b}_1 \cap \cdots \cap \mathfrak{b}_n = (0)$.
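Condition (b) can be made concrete in $A = \mathbb{Z}/m\mathbb{Z}$ with $m = m_1 \cdots m_n$ and pairwise relatively prime $m_i$. The sketch below computes elements $e_i$ that are 1 modulo $m_i$ and 0 modulo the other moduli, so that $(1 - e_i)A$ is the ideal generated by $m_i$. The function names are hypothetical, and Python 3.8 or later is assumed for `math.prod` and modular inverses via `pow`.

```python
from math import prod


def orthogonal_idempotents(moduli):
    """Elements e_1, ..., e_n of Z/mZ (m = product of the moduli) with
    e_i = 1 mod m_i and e_i = 0 mod m_j for j != i.
    The moduli are assumed to be pairwise relatively prime."""
    m = prod(moduli)
    return [(m // m_i) * pow(m // m_i, -1, m_i) % m for m_i in moduli]


def principal_ideal(generator, m):
    """The ideal generated by `generator` in Z/mZ, as a set of residues."""
    return {generator * x % m for x in range(m)}


if __name__ == "__main__":
    moduli = [2, 3]
    m = prod(moduli)
    es = orthogonal_idempotents(moduli)
    print(es)
    print([sorted(principal_ideal(1 - e, m)) for e in es])
```

For $\mathbb{Z}/6\mathbb{Z}$ this yields $e_1 = 3$ and $e_2 = 4$, and the ideals $(1 - e_i)A$ are the multiples of 2 and of 3 respectively, matching the decomposition $\mathbb{Z}/6\mathbb{Z} \cong \mathbb{Z}/2\mathbb{Z} \times \mathbb{Z}/3\mathbb{Z}$.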
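Proof. Assume (a). Since we have an isomorphism $A \cong A/\mathfrak{b}_1 \times \cdots \times A/\mathfrak{b}_n$, we may identify $A$ with $A/\mathfrak{b}_1 \times \cdots \times A/\mathfrak{b}_n$, and $\mathfrak{b}_i$ with $\mathrm{Ker}\,(\mathrm{pr}_i)$. Then, $e_1, \ldots, e_n$ are the elements defined just before Definition 21.3. As noted, $\mathfrak{b}_i = \mathrm{Ker}\,(\mathrm{pr}_i) = (1_A - e_i)A$. This proves (b).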
i=1
Y
n
i=1
i=1
n
X
xi (1A
ei )
i=1
= 0,
which proves that $\mathfrak{b}_1 \cdots \mathfrak{b}_n = (0)$. Thus, (c) holds.
The equivalence of (c) and (d) follows from the proof of Theorem 21.15.
The fact that (c) implies (a) is an immediate consequence of Theorem 21.15.
21.3
Given a (commutative) ring $A$ (with unit element $1$), an ideal $\mathfrak{A} \subseteq A$ is said to be finitely generated if there exists a finite set $\{a_1, \ldots, a_n\}$ of elements from $\mathfrak{A}$ so that
$$\mathfrak{A} = (a_1, \ldots, a_n) = \{\lambda_1 a_1 + \cdots + \lambda_n a_n \mid \lambda_i \in A,\ 1 \leq i \leq n\}.$$
If $K$ is a field, it turns out that every polynomial ideal $\mathfrak{A}$ in $K[X_1, \ldots, X_m]$ is finitely generated. This fact, due to Hilbert and known as Hilbert's basis theorem, has very important consequences. For example, in algebraic geometry, one is interested in the zero locus of a set of polynomial equations, i.e., the set, $V(\mathcal{P})$, of $n$-tuples $(\lambda_1, \ldots, \lambda_n) \in K^n$ so that
$$P_i(\lambda_1, \ldots, \lambda_n) = 0$$
for all polynomials $P_i(X_1, \ldots, X_n)$ in some given family, $\mathcal{P} = (P_i)_{i \in I}$. However, it is clear that
$$V(\mathcal{P}) = V(\mathfrak{A}),$$
where $\mathfrak{A}$ is the ideal generated by $\mathcal{P}$. Then, Hilbert's basis theorem says that $V(\mathfrak{A})$ is actually defined by a finite number of polynomials (any set of generators of $\mathfrak{A}$), even if $\mathcal{P}$ is infinite.
The property that every ideal in a ring is finitely generated is equivalent to other natural properties, one of which is the so-called ascending chain condition, abbreviated a.c.c. Before proving Hilbert's basis theorem, we explore the equivalence of these conditions.
Definition 21.4. Let $A$ be a commutative ring with unit 1. We say that $A$ satisfies the ascending chain condition, for short, the a.c.c., if for every ascending chain of ideals
$$\mathfrak{A}_1 \subseteq \mathfrak{A}_2 \subseteq \cdots \subseteq \mathfrak{A}_i \subseteq \cdots,$$
there is some integer $n \ge 1$ so that
$$\mathfrak{A}_i = \mathfrak{A}_n$$
for all $i \ge n + 1$.
We say that $A$ satisfies the maximum condition if every nonempty collection $C$ of ideals in $A$ has a maximal element, i.e., there is some ideal $\mathfrak{A} \in C$ which is not contained in any other ideal in $C$.
Proposition 21.17. A ring $A$ satisfies the a.c.c. if and only if it satisfies the maximum condition.
Proof. Suppose that $A$ does not satisfy the a.c.c. Then, there is an infinite strictly ascending sequence of ideals
$$\mathfrak{A}_1 \subset \mathfrak{A}_2 \subset \cdots \subset \mathfrak{A}_i \subset \cdots,$$
and the collection $C = \{\mathfrak{A}_i\}$ has no maximal element.
Conversely, assume that $A$ satisfies the a.c.c. Let $C$ be a nonempty collection of ideals. Since $C$ is nonempty, we may pick some ideal $\mathfrak{A}_1$ in $C$. If $\mathfrak{A}_1$ is not maximal, then there is some ideal $\mathfrak{A}_2$ in $C$ so that
$$\mathfrak{A}_1 \subset \mathfrak{A}_2.$$
Using this process, if $C$ has no maximal element, we can define by induction an infinite strictly increasing sequence
$$\mathfrak{A}_1 \subset \mathfrak{A}_2 \subset \cdots \subset \mathfrak{A}_i \subset \cdots.$$
However, the a.c.c. implies that such a sequence cannot exist. Therefore, $C$ has a maximal element.
Having shown that the a.c.c. is equivalent to the maximum condition, we now prove that the a.c.c. is equivalent to the fact that every ideal is finitely generated.
Proposition 21.18. A ring $A$ satisfies the a.c.c. if and only if every ideal is finitely generated.
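Proof. Assume that every ideal is finitely generated. Consider an ascending sequence of ideals
$$\mathfrak{A}_1 \subseteq \mathfrak{A}_2 \subseteq \cdots \subseteq \mathfrak{A}_i \subseteq \cdots.$$
Observe that $\mathfrak{A} = \bigcup_i \mathfrak{A}_i$ is also an ideal. By hypothesis, $\mathfrak{A}$ has a finite generating set $\{a_1, \ldots, a_n\}$. By definition of $\mathfrak{A}$, each $a_i$ belongs to some $\mathfrak{A}_{j_i}$, and since the $\mathfrak{A}_i$ form an ascending chain, there is some $m$ so that $a_i \in \mathfrak{A}_m$ for $i = 1, \ldots, n$. But then,
$$\mathfrak{A}_i = \mathfrak{A}_m$$
for all $i \ge m$, and the a.c.c. holds.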
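Conversely, assume that the a.c.c. holds. Let $\mathfrak{A}$ be any ideal in $A$ and consider the family $C$ of subideals of $\mathfrak{A}$ that are finitely generated. The family $C$ is nonempty, since $(0)$ is a subideal of $\mathfrak{A}$. By Proposition 21.17, the family $C$ has some maximal element, say $\mathfrak{B}$. For any $a \in \mathfrak{A}$, the ideal $\mathfrak{B} + (a)$ (where $\mathfrak{B} + (a) = \{b + \lambda a \mid b \in \mathfrak{B},\ \lambda \in A\}$) is also finitely generated (since $\mathfrak{B}$ is finitely generated), and by maximality, we have
$$\mathfrak{B} = \mathfrak{B} + (a).$$
So, we get $a \in \mathfrak{B}$ for all $a \in \mathfrak{A}$, and thus, $\mathfrak{A} = \mathfrak{B}$, and $\mathfrak{A}$ is finitely generated.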
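To see the a.c.c. at work in a familiar ring, the sketch below uses $\mathbb{Z}$, where the ideal $(a_1, \ldots, a_k)$ is the principal ideal generated by $\gcd(a_1, \ldots, a_k)$. The function name `ideal_chain` and the sample integers are assumptions made for illustration only.

```python
from math import gcd


def ideal_chain(generators):
    """Return g_1, g_2, ... where (g_k) = (a_1, ..., a_k) as ideals of Z.

    Hypothetical helper: in Z, the ideal generated by a_1, ..., a_k is
    principal, generated by the gcd of the a_i.
    """
    chain, g = [], 0
    for a in generators:
        g = gcd(g, a)
        chain.append(g)
    return chain


chain = ideal_chain([360, 84, 90, 35, 12, 1000])

# (g_k) is contained in (g_{k+1}) exactly when g_{k+1} divides g_k.
is_ascending = all(chain[k] % chain[k + 1] == 0 for k in range(len(chain) - 1))

# First index from which the chain is stationary.
stable_from = next(k for k in range(len(chain)) if chain[k] == chain[-1])

print(chain, is_ascending, stable_from)
```

Each strict inclusion removes at least one prime factor from the generator, so a chain starting at $(360)$ can grow strictly only finitely many times, which is the a.c.c. in this special case.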
Definition 21.5. A commutative ring $A$ (with unit 1) is called noetherian if it satisfies the a.c.c. A noetherian domain is a noetherian ring that is also a domain.
By Proposition 21.17 and Proposition 21.18, a noetherian ring can also be defined as a ring that either satisfies the maximum condition or such that every ideal is finitely generated. The proof of Hilbert's basis theorem will make use of the following lemma:
Lemma 21.19. Let $A$ be a (commutative) ring. For every ideal $\mathfrak{A}$ in $A[X]$, for every $i \ge 0$, let $L_i(\mathfrak{A})$ denote the set of elements of $A$ consisting of $0$ and of the coefficients of $X^i$ in all the polynomials $f(X) \in \mathfrak{A}$ which are of degree $i$. Then, the $L_i(\mathfrak{A})$'s form an ascending chain of ideals in $A$. Furthermore, if $\mathfrak{B}$ is any ideal of $A[X]$ so that $\mathfrak{A} \subseteq \mathfrak{B}$ and if $L_i(\mathfrak{A}) = L_i(\mathfrak{B})$ for all $i \ge 0$, then $\mathfrak{A} = \mathfrak{B}$.
Proof. That $L_i(\mathfrak{A})$ is an ideal and that $L_i(\mathfrak{A}) \subseteq L_{i+1}(\mathfrak{A})$ follows from the fact that if $f(X) \in \mathfrak{A}$ and $g(X) \in \mathfrak{A}$, then $f(X) + g(X)$, $\lambda f(X)$, and $X f(X)$ all belong to $\mathfrak{A}$. Now, let $g(X)$ be any polynomial in $\mathfrak{B}$, and assume that $g(X)$ has degree $n$. Since $L_n(\mathfrak{A}) = L_n(\mathfrak{B})$, there is some polynomial $f_n(X)$ in $\mathfrak{A}$, of degree $n$, so that $g(X) - f_n(X)$ is of degree at most $n - 1$. Now, since $\mathfrak{A} \subseteq \mathfrak{B}$, the polynomial $g(X) - f_n(X)$ belongs to $\mathfrak{B}$. Using this process, we can define by induction a sequence of polynomials $f_{n+i}(X) \in \mathfrak{A}$, so that each $f_{n+i}(X)$ is either zero or has degree $n - i$, and
$$g(X) - (f_n(X) + f_{n+1}(X) + \cdots + f_{n+i}(X))$$
is of degree at most $n - i - 1$. Note that this last polynomial must be zero when $i = n$, and thus, $g(X) \in \mathfrak{A}$.
We now prove Hilbert's basis theorem. The proof is substantially Hilbert's original proof. A slightly shorter proof can be given but it is not as transparent as Hilbert's proof (see the remark just after the proof of Theorem 21.20, and Zariski and Samuel [117], Chapter IV, Section 1, Theorem 1).
Theorem 21.20. (Hilbert's basis theorem) If $A$ is a noetherian ring, then $A[X]$ is also a noetherian ring.
Proof. Let $\mathfrak{A}$ be any ideal in $A[X]$, and denote by $L$ the set of elements of $A$ consisting of $0$ and of all the coefficients of the highest degree terms of all the polynomials in $\mathfrak{A}$. Observe that
$$L = \bigcup_{i} L_i(\mathfrak{A}).$$
$$g_1(X) = \sum_{i=1}^{n} \lambda_i f_i(X) X^{d - d_i},$$
where $d_i$ is the degree of $f_i(X)$. Now, $g(X) - g_1(X)$ is a polynomial in $\mathfrak{A}$ of degree at most $d - 1$. By repeating this procedure, we get a sequence of polynomials $g_i(X)$ in $\mathfrak{B}$, having strictly decreasing degrees, and such that the polynomial
$$g(X) - (g_1(X) + \cdots + g_i(X))$$
is of degree at most $d - i$. This polynomial must be of degree at most $q - 1$ as soon as $i = d - q + 1$. Thus, we proved that every polynomial in $\mathfrak{A}$ of degree $d \ge q$ belongs to $\mathfrak{B}$.
It remains to take care of the polynomials in $\mathfrak{A}$ of degree at most $q - 1$. Since $A$ is noetherian, each ideal $L_i(\mathfrak{A})$ is finitely generated, and let $\{a_{i1}, \ldots, a_{in_i}\}$ be a set of generators for $L_i(\mathfrak{A})$ (for $i = 0, \ldots, q - 1$). Let $f_{ij}(X)$ be a polynomial in $\mathfrak{A}$ having $a_{ij} X^i$ as its highest degree term. Given any polynomial $g(X) \in \mathfrak{A}$ of degree $d \le q - 1$, if we denote its term of highest degree by $a X^d$, then, as in the previous argument, we can write
$$a = \lambda_1 a_{d1} + \cdots + \lambda_{n_d} a_{dn_d},$$
and we define
$$g_1(X) = \sum_{i=1}^{n_d} \lambda_i f_{di}(X) X^{d - d_i},$$
where $d_i$ is the degree of $f_{di}(X)$. Then, $g(X) - g_1(X)$ is a polynomial in $\mathfrak{A}$ of degree at most $d - 1$, and by repeating this procedure at most $q$ times, we get an element of $\mathfrak{A}$ of degree 0, and the latter is a linear combination of the $f_{0i}$'s. This proves that every polynomial in $\mathfrak{A}$ of degree at most $q - 1$ is a combination of the polynomials $f_{ij}(X)$, for $0 \le i \le q - 1$ and $1 \le j \le n_i$. Therefore, $\mathfrak{A}$ is generated by the $f_k(X)$'s and the $f_{ij}(X)$'s, a finite number of polynomials.
Remark: Only a small part of Lemma 21.19 was used in the above proof, namely, the fact that $L_i(\mathfrak{A})$ is an ideal. A shorter proof of Theorem 21.20 making full use of Lemma 21.19 can be given as follows:
Proof. (Second proof) Let $(\mathfrak{A}_i)_{i \ge 1}$ be an ascending sequence of ideals in $A[X]$. Consider the doubly indexed family $(L_i(\mathfrak{A}_j))$ of ideals in $A$. Since $A$ is noetherian, by the maximum condition, this family has a maximal element $L_p(\mathfrak{A}_q)$. Since the $L_i(\mathfrak{A}_j)$'s form an ascending sequence when either $i$ or $j$ is fixed, we have $L_i(\mathfrak{A}_j) = L_p(\mathfrak{A}_q)$ for all $i$ and $j$ with $i \ge p$ and $j \ge q$, and thus, $L_i(\mathfrak{A}_q) = L_i(\mathfrak{A}_j)$ for all $i$ and $j$ with $i \ge p$ and $j \ge q$. On the other hand, for any fixed $i$, the a.c.c. shows that there exists some integer $n(i)$ so that $L_i(\mathfrak{A}_j) = L_i(\mathfrak{A}_{n(i)})$ for all $j \ge n(i)$. Since $L_i(\mathfrak{A}_q) = L_i(\mathfrak{A}_j)$ when $i \ge p$ and $j \ge q$, we may take $n(i) = q$ if $i \ge p$. This shows that there is some $n_0$ so that $n(i) \le n_0$ for all $i \ge 0$, and thus, we have $L_i(\mathfrak{A}_j) = L_i(\mathfrak{A}_{n_0})$ for every $i$ and for every $j \ge n_0$. By Lemma 21.19, we get $\mathfrak{A}_j = \mathfrak{A}_{n_0}$ for every $j \ge n_0$, establishing the fact that $A[X]$ satisfies the a.c.c.
Using induction, we immediately obtain the following important result.
Corollary 21.21. If A is a noetherian ring, then A[X1 , . . . , Xn ] is also a noetherian ring.
Since a field K is obviously noetherian (since it has only two ideals, (0) and K), we also
have:
Corollary 21.22. If K is a field, then K[X1 , . . . , Xn ] is a noetherian ring.
21.4
Further Readings
The material of this Chapter is thoroughly covered in Lang [67], Artin [4], Mac Lane and
Birkhoff [73], Bourbaki [14, 15], Malliavin [75], Zariski and Samuel [117], and Van Der
Waerden [112].
Chapter 22
Annihilating Polynomials and the
Primary Decomposition
22.1
22.2
In this section, we prove that if the minimal polynomial $m_f$ of a linear map $f$ is of the form
$$m_f = (X - \lambda_1) \cdots (X - \lambda_k)$$
for distinct scalars $\lambda_1, \ldots, \lambda_k \in K$, then $f$ is diagonalizable. This is a powerful result that has a number of implications. We need a few properties of invariant subspaces.
Given a linear map $f : E \to E$, recall that a subspace $W$ of $E$ is invariant under $f$ if $f(u) \in W$ for all $u \in W$.
Proposition 22.2. Let $W$ be a subspace of $E$ invariant under the linear map $f : E \to E$ (where $E$ is finite-dimensional). Then, the minimal polynomial of the restriction $f \mid W$ of $f$ to $W$ divides the minimal polynomial of $f$, and the characteristic polynomial of $f \mid W$ divides the characteristic polynomial of $f$.
Sketch of proof. The key ingredient is that we can pick a basis $(e_1, \ldots, e_n)$ of $E$ in which $(e_1, \ldots, e_k)$ is a basis of $W$. Then, the matrix of $f$ over this basis is a block matrix of the form
$$A = \begin{pmatrix} B & C \\ 0 & D \end{pmatrix},$$
where $B$ is a $k \times k$ matrix, $D$ is a $(n - k) \times (n - k)$ matrix, and $C$ is a $k \times (n - k)$ matrix. Then
$$\det(XI - A) = \det(XI - B) \det(XI - D),$$
Proof. Observe that (a) and (b) together assert that the $f$-conductor of $u$ into $W$ is a polynomial of the form $X - \lambda_i$. Pick any vector $v \in E$ not in $W$, and let $g$ be the conductor of $v$ into $W$. Since $g$ divides $m$ and $v \notin W$, the polynomial $g$ is not a constant, and thus it is of the form
$$g = (X - \lambda_1)^{s_1} \cdots (X - \lambda_k)^{s_k},$$
with at least some $s_i > 0$. Choose some index $j$ such that $s_j > 0$. Then $X - \lambda_j$ is a factor of $g$, so we can write
$$g = (X - \lambda_j) q.$$
By definition of $g$, the vector $u = q(f)(v)$ cannot be in $W$, since otherwise $g$ would not be of minimal degree. However,
$$(f - \lambda_j\,\mathrm{id})(u) = (f - \lambda_j\,\mathrm{id})(q(f)(v)) = g(f)(v)$$
is in $W$, which concludes the proof.
We can now prove the main result of this section.
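Theorem 22.5. Let $f : E \to E$ be a linear map on a finite-dimensional space $E$. Then $f$ is diagonalizable iff its minimal polynomial $m$ is of the form
$$m = (X - \lambda_1) \cdots (X - \lambda_k),$$
where $\lambda_1, \ldots, \lambda_k$ are distinct elements of $K$.
Proof. We already showed in Section 22.2 that if $f$ is diagonalizable, then its minimal polynomial is of the above form (where $\lambda_1, \ldots, \lambda_k$ are the distinct eigenvalues of $f$).
For the converse, let $W$ be the subspace spanned by all the eigenvectors of $f$. If $W \neq E$, since $W$ is invariant under $f$, by Proposition 22.4, there is some vector $u \notin W$ such that for some $\lambda_j$, we have
$$(f - \lambda_j\,\mathrm{id})(u) \in W.$$
Let $v = (f - \lambda_j\,\mathrm{id})(u) \in W$. Since $v \in W$, we can write
$$v = w_1 + \cdots + w_k$$
where $f(w_i) = \lambda_i w_i$ (either $w_i = 0$ or $w_i$ is an eigenvector for $\lambda_i$), and so, for every polynomial $h$, we have
$$h(f)(v) = h(\lambda_1) w_1 + \cdots + h(\lambda_k) w_k,$$
which shows that $h(f)(v) \in W$ for every polynomial $h$. We can write
$$m = (X - \lambda_j) q$$
for some polynomial $q$, and since $\lambda_j$ is a root of $q - q(\lambda_j)$, we can also write $q - q(\lambda_j) = p\,(X - \lambda_j)$ for some polynomial $p$. Since $m$ is the minimal polynomial of $f$, we have $0 = m(f)(u) = (f - \lambda_j\,\mathrm{id})(q(f)(u))$, which implies that $q(f)(u) \in W$ (either $q(f)(u) = 0$, or it is an eigenvector associated with $\lambda_j$). However,
$$q(f)(u) - q(\lambda_j) u = p(f)((f - \lambda_j\,\mathrm{id})(u)) = p(f)(v),$$
and since $p(f)(v) \in W$ and $q(f)(u) \in W$, we conclude that $q(\lambda_j) u \in W$. But, $u \notin W$, which implies that $q(\lambda_j) = 0$, so $\lambda_j$ is a double root of $m$, a contradiction. Therefore, we must have $W = E$.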
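Theorem 22.5 yields a concrete test when the eigenvalues are known and lie in $K$: $f$ is diagonalizable exactly when the product of the maps $f - \lambda_i\,\mathrm{id}$ over the distinct eigenvalues is zero. The following is a minimal sketch of that test on two $2 \times 2$ integer matrices; the function names and the sample matrices are illustrative assumptions, not part of the text.

```python
def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def shift(A, lam):
    """Return A - lam * I."""
    n = len(A)
    return [[A[i][j] - (lam if i == j else 0) for j in range(n)]
            for i in range(n)]


def annihilated_by_simple_roots(A, eigenvalues):
    """Check whether the product of (A - lam I) over distinct eigenvalues is 0.

    Hypothetical helper: the eigenvalues are supplied by the caller.
    """
    n = len(A)
    P = [[int(i == j) for j in range(n)] for i in range(n)]
    for lam in sorted(set(eigenvalues)):
        P = matmul(P, shift(A, lam))
    return all(x == 0 for row in P for x in row)


A = [[2, 1], [0, 3]]   # eigenvalues 2 and 3, minimal polynomial (X-2)(X-3)
B = [[2, 1], [0, 2]]   # single eigenvalue 2, minimal polynomial (X-2)^2

a_is_diagonalizable = annihilated_by_simple_roots(A, [2, 3])
b_is_diagonalizable = annihilated_by_simple_roots(B, [2, 2])
print(a_is_diagonalizable, b_is_diagonalizable)
```

The matrix `B` fails the test because $B - 2I \neq 0$ while $(B - 2I)^2 = 0$, so its minimal polynomial has a double root.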
Remark: Proposition 22.4 can be used to give a quick proof of Theorem 8.4.
Using Theorem 22.5, we can give a short proof about commuting diagonalizable linear maps. If $\mathcal{F}$ is a family of linear maps on a vector space $E$, we say that $\mathcal{F}$ is a commuting family iff $f \circ g = g \circ f$ for all $f, g \in \mathcal{F}$.
Proposition 22.6. Let $\mathcal{F}$ be a finite commuting family of diagonalizable linear maps on a vector space $E$. There exists a basis of $E$ such that every linear map in $\mathcal{F}$ is represented in that basis by a diagonal matrix.
Proof. We proceed by induction on $n = \dim(E)$. If $n = 1$, there is nothing to prove. If $n > 1$, there are two cases. If all linear maps in $\mathcal{F}$ are of the form $\lambda\,\mathrm{id}$ for some $\lambda \in K$, then the proposition holds trivially. In the second case, let $f \in \mathcal{F}$ be some linear map in $\mathcal{F}$ which is not a scalar multiple of the identity. In this case, $f$ has at least two distinct eigenvalues $\lambda_1, \ldots, \lambda_k$, and because $f$ is diagonalizable, $E$ is the direct sum of the corresponding eigenspaces $E_{\lambda_1}, \ldots, E_{\lambda_k}$. For every index $i$, the eigenspace $E_{\lambda_i}$ is invariant under $f$ and under every other linear map $g$ in $\mathcal{F}$, since for any $g \in \mathcal{F}$ and any $u \in E_{\lambda_i}$, because $f$ and $g$ commute, we have
$$f(g(u)) = g(f(u)) = g(\lambda_i u) = \lambda_i g(u),$$
so $g(u) \in E_{\lambda_i}$. Let $\mathcal{F}_i$ be the family obtained by restricting each $g \in \mathcal{F}$ to $E_{\lambda_i}$. By Proposition 22.2, the minimal polynomial of every linear map $g \mid E_{\lambda_i}$ in $\mathcal{F}_i$ divides the minimal polynomial $m_g$ of $g$, and since $g$ is diagonalizable, $m_g$ is a product of distinct linear factors, so the minimal polynomial of $g \mid E_{\lambda_i}$ is also a product of distinct linear factors. By Theorem 22.5, the linear map $g \mid E_{\lambda_i}$ is diagonalizable, so $\mathcal{F}_i$ is again a commuting family of diagonalizable maps. Since $k > 1$, we have $\dim(E_{\lambda_i}) < \dim(E)$ for $i = 1, \ldots, k$, and by the induction hypothesis, for each $i$ there is a basis of $E_{\lambda_i}$ over which every $g \mid E_{\lambda_i}$ in $\mathcal{F}_i$ is represented by a diagonal matrix. Since the above argument holds for all $i$, by combining the bases of the $E_{\lambda_i}$, we obtain a basis of $E$ such that every linear map in $\mathcal{F}$ is represented by a diagonal matrix.
Remark: Proposition 22.6 also holds for infinite commuting families $\mathcal{F}$ of diagonalizable linear maps, because $E$ being finite dimensional, there is a finite subfamily of linearly independent linear maps in $\mathcal{F}$ spanning $\mathcal{F}$. There is also an analogous result for commuting families of linear maps represented by upper triangular matrices.
22.3
for some r 1?
This result is very nice but seems to require that the eigenvalues of $f$ all belong to $K$. Actually, it is a special case of a more general result involving the factorization of the minimal polynomial $m$ into its irreducible monic factors (see Theorem 20.16),
$$m = p_1^{r_1} \cdots p_k^{r_k},$$
where the $p_i$ are distinct irreducible monic polynomials over $K$.
Theorem 22.7. (Primary Decomposition Theorem) Let $f : E \to E$ be a linear map on the finite-dimensional vector space $E$ over the field $K$. Write the minimal polynomial $m$ of $f$ as
$$m = p_1^{r_1} \cdots p_k^{r_k},$$
where the $p_i$ are distinct irreducible monic polynomials over $K$, and the $r_i$ are positive integers. Let
$$W_i = \mathrm{Ker}\,(p_i^{r_i}(f)), \quad i = 1, \ldots, k.$$
Then
Proof. The trick is to construct projections $\pi_i$ using the polynomials $p_j^{r_j}$ so that the range of $\pi_i$ is equal to $W_i$. Let
$$g_i = m / p_i^{r_i} = \prod_{j \neq i} p_j^{r_j}.$$
Note that
$$p_i^{r_i} g_i = m.$$
Since $p_1, \ldots, p_k$ are irreducible and distinct, they are relatively prime. Then, using Proposition 20.13, it is easy to show that $g_1, \ldots, g_k$ are relatively prime. Otherwise, some irreducible polynomial $p$ would divide all of $g_1, \ldots, g_k$, so by Proposition 20.13 it would be equal to one of the irreducible factors $p_i$. But, that $p_i$ is missing from $g_i$, a contradiction. Therefore, by Proposition 20.14, there exist some polynomials $h_1, \ldots, h_k$ such that
$$g_1 h_1 + \cdots + g_k h_k = 1.$$
Let $q_i = g_i h_i$ and let $\pi_i = q_i(f) = g_i(f) h_i(f)$. We have
$$q_1 + \cdots + q_k = 1,$$
and since $m$ divides $q_i q_j$ for $i \neq j$, we get
$$\pi_1 + \cdots + \pi_k = \mathrm{id}, \qquad \pi_i \pi_j = 0, \quad i \neq j.$$
(We implicitly used the fact that if $p, q$ are two polynomials, the linear maps $p(f) \circ q(f)$ and $q(f) \circ p(f)$ are the same since $p(f)$ and $q(f)$ are polynomials in the powers of $f$, which commute.) Composing the first equation with $\pi_i$ and using the second equation, we get
$$\pi_i^2 = \pi_i.$$
Therefore, the $\pi_i$ are projections, and $E$ is the direct sum of the images of the $\pi_i$. Indeed, every $u \in E$ can be expressed as
$$u = \pi_1(u) + \cdots + \pi_k(u).$$
Also, if
$$\pi_1(u) + \cdots + \pi_k(u) = 0,$$
then by applying $\pi_i$ we get
$$0 = \pi_i^2(u) = \pi_i(u), \quad i = 1, \ldots, k.$$
j 6= i.
u E,
But then, $q g_i$ is divisible by the minimal polynomial $m = p_i^{r_i} g_i$ of $f$, and since $p_i^{r_i}$ and $g_i$ are relatively prime, by Euclid's Proposition, $p_i^{r_i}$ must divide $q$. This finishes the proof that the minimal polynomial of $f_i$ is $p_i^{r_i}$, which is (c).
If all the eigenvalues of $f$ belong to the field $K$, we obtain the following result.
Theorem 22.8. (Primary Decomposition Theorem, Version 2) Let $f : E \to E$ be a linear map on the finite-dimensional vector space $E$ over the field $K$. If all the eigenvalues $\lambda_1, \ldots, \lambda_k$ of $f$ belong to $K$, write
$$m = (X - \lambda_1)^{r_1} \cdots (X - \lambda_k)^{r_k}$$
for the minimal polynomial of $f$,
$$\chi_f = (X - \lambda_1)^{n_1} \cdots (X - \lambda_k)^{n_k}$$
for the characteristic polynomial of $f$, with $1 \le r_i \le n_i$, and let
$$W_i = \mathrm{Ker}\,(\lambda_i\,\mathrm{id} - f)^{r_i}, \quad i = 1, \ldots, k.$$
Then
i = 1, . . . , k.
Because $E$ is the direct sum of the $W_i$, we have $\dim(W_1) + \cdots + \dim(W_k) = n$, and since $n_1 + \cdots + n_k = n$, we must have
$$\dim(W_i) = n_i, \quad i = 1, \ldots, k,$$
proving (c).
Definition 22.3. If $\lambda \in K$ is an eigenvalue of $f$, we define a generalized eigenvector of $f$ as a nonzero vector $u \in E$ such that
$$(\lambda\,\mathrm{id} - f)^r(u) = 0,$$
for some $r \ge 1$.
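we have
$$f = f \circ \pi_1 + \cdots + f \circ \pi_k,$$
and so we get
$$f - D = (f - \lambda_1\,\mathrm{id}) \circ \pi_1 + \cdots + (f - \lambda_k\,\mathrm{id}) \circ \pi_k.$$
Since the $\pi_i$ are polynomials in $f$, they commute with $f$, and if we write $N = f - D$, using the properties of the $\pi_i$, we get
$$N^r = (f - \lambda_1\,\mathrm{id})^r \circ \pi_1 + \cdots + (f - \lambda_k\,\mathrm{id})^r \circ \pi_k.$$
Therefore, if $r = \max\{r_i\}$, we have $(f - \lambda_i\,\mathrm{id})^r \circ \pi_i = 0$ for $i = 1, \ldots, k$, which implies that
$$N^r = 0.$$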
A linear map g : E E is said to be nilpotent if there is some positive integer r such
that g r = 0.
Since $N$ is a polynomial in $f$, it commutes with $f$, and thus with $D$. From
$$D = \lambda_1 \pi_1 + \cdots + \lambda_k \pi_k,$$
and
$$\pi_1 + \cdots + \pi_k = \mathrm{id},$$
we see that
$$D - \lambda_i\,\mathrm{id} = \lambda_1 \pi_1 + \cdots + \lambda_k \pi_k - \lambda_i(\pi_1 + \cdots + \pi_k)$$
$$= (\lambda_1 - \lambda_i)\pi_1 + \cdots + (\lambda_{i-1} - \lambda_i)\pi_{i-1} + (\lambda_{i+1} - \lambda_i)\pi_{i+1} + \cdots + (\lambda_k - \lambda_i)\pi_k.$$
Since the projections $\pi_j$ with $j \neq i$ vanish on $W_i$, the above equation implies that $D - \lambda_i\,\mathrm{id}$ vanishes on $W_i$ and that $(D - \lambda_j\,\mathrm{id})(W_i) \subseteq W_i$, and thus that the minimal polynomial of $D$ is
$$(X - \lambda_1) \cdots (X - \lambda_k).$$
Since the $\lambda_i$ are distinct, by Theorem 22.5, the linear map $D$ is diagonalizable, so we have shown that when all the eigenvalues of $f$ belong to $K$, there exist a diagonalizable linear map $D$ and a nilpotent linear map $N$, such that
$$f = D + N, \qquad DN = ND,$$
and $N$ and $D$ are polynomials in $f$.
A decomposition of $f$ as above is called a Jordan decomposition. In fact, we can prove more: The maps $D$ and $N$ are uniquely determined by $f$.
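The construction of the projections $\pi_i$ and of $D$ and $N$ can be traced on a small matrix. The sketch below uses a $3 \times 3$ matrix with minimal polynomial $(X - 2)^2 (X - 3)$, so that $g_1 = X - 3$ and $g_2 = (X - 2)^2$; the Bézout identity $(X - 2)^2 - (X - 3)(X - 1) = 1$ gives $h_1 = -(X - 1)$ and $h_2 = 1$. The matrix, the hand-computed coefficients, and the helper names are illustrative assumptions.

```python
def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def lincomb(a, A, b, B):
    """Return a*A + b*B for matrices of the same shape."""
    return [[a * x + b * y for x, y in zip(ra, rb)] for ra, rb in zip(A, B)]


I3 = [[int(i == j) for j in range(3)] for i in range(3)]
F = [[2, 1, 0],
     [0, 2, 0],
     [0, 0, 3]]

F_minus_1 = lincomb(1, F, -1, I3)
F_minus_2 = lincomb(1, F, -2, I3)
F_minus_3 = lincomb(1, F, -3, I3)

# pi_1 = g_1(F) h_1(F) = -(F - 3I)(F - I), pi_2 = g_2(F) h_2(F) = (F - 2I)^2
pi1 = [[-x for x in row] for row in matmul(F_minus_3, F_minus_1)]
pi2 = matmul(F_minus_2, F_minus_2)

# D = lambda_1 pi_1 + lambda_2 pi_2 with lambda_1 = 2, lambda_2 = 3; N = F - D
D = lincomb(2, pi1, 3, pi2)
N = lincomb(1, F, -1, D)

print(D)
print(N)
```

In this example $D$ is the diagonal part of the matrix and $N$ its strictly upper triangular part, but that is special to matrices already in this shape; in general $D$ and $N$ are obtained only through the polynomials in $f$ described above.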
Since both $N$ and $N'$ are nilpotent, we have $N^{r_1} = 0$ and $(N')^{r_2} = 0$, for some $r_1, r_2 > 0$, so for $r \ge r_1 + r_2$, the right-hand side of the above expression is zero, which shows that $N' - N$ is nilpotent. (In fact, it is easy to see that $r_1 = r_2 = n$ works.) It follows that $D - D' = N' - N$ is both diagonalizable and nilpotent. Clearly, the minimal polynomial of a nilpotent linear map is of the form $X^r$ for some $r > 0$ (and $r \le \dim(E)$). But $D - D'$ is diagonalizable, so its minimal polynomial has simple roots, which means that $r = 1$. Therefore, the minimal polynomial of $D - D'$ is $X$, which says that $D - D' = 0$, and then $N = N'$.
If K is an algebraically closed field, then Theorem 22.9 holds. This is the case when
K = C. This theorem reduces the study of linear maps (from E to itself) to the study of
nilpotent operators. There is a special normal form for such operators which is discussed in
the next section.
22.4
This section is devoted to a normal form for nilpotent maps. We follow Godement's exposition [47]. Let $f : E \to E$ be a nilpotent linear map on a finite-dimensional vector space over a field $K$, and assume that $f$ is not the zero map. Then, there is a smallest positive integer $r \ge 1$ such that $f^r \neq 0$ and $f^{r+1} = 0$. Clearly, the polynomial $X^{r+1}$ annihilates $f$, and it is the minimal polynomial of $f$ since $f^r \neq 0$. It follows that $r + 1 \le n = \dim(E)$. Let us define the subspaces $N_i$ by
$$N_i = \mathrm{Ker}\,(f^i), \quad i \ge 0.$$
Note that $N_0 = (0)$, $N_1 = \mathrm{Ker}\,(f)$, and $N_{r+1} = E$. Also, it is obvious that
$$N_i \subseteq N_{i+1}, \quad i \ge 0.$$
Proposition 22.10. Given a nilpotent linear map $f$ with $f^r \neq 0$ and $f^{r+1} = 0$ as above, the inclusions in the following sequence are strict:
$$(0) = N_0 \subset N_1 \subset \cdots \subset N_r \subset N_{r+1} = E.$$
Proof. We proceed by contradiction. Assume that $N_i = N_{i+1}$ for some $i$ with $0 \le i \le r$. Since $f^{r+1} = 0$, for every $u \in E$, we have
$$0 = f^{r+1}(u) = f^{i+1}(f^{r-i}(u)),$$
which shows that $f^{r-i}(u) \in N_{i+1}$. Since $N_i = N_{i+1}$, we get $f^{r-i}(u) \in N_i$, and thus $f^r(u) = 0$. Since this holds for all $u \in E$, we see that $f^r = 0$, a contradiction.
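The strictly increasing chain of kernels can be observed numerically. The sketch below computes $\dim \mathrm{Ker}\,(f^i)$ for a $4 \times 4$ nilpotent matrix with $f^2 \neq 0$ and $f^3 = 0$ (so $r = 2$), using exact rational Gaussian elimination; the matrix and the helper names `rank` and `kernel_dims` are illustrative assumptions.

```python
from fractions import Fraction


def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def rank(M):
    """Rank of M by Gaussian elimination over the rationals."""
    M = [[Fraction(x) for x in row] for row in M]
    rows, cols = len(M), len(M[0])
    r = 0
    for c in range(cols):
        pivot = next((i for i in range(r, rows) if M[i][c] != 0), None)
        if pivot is None:
            continue
        M[r], M[pivot] = M[pivot], M[r]
        for i in range(r + 1, rows):
            factor = M[i][c] / M[r][c]
            M[i] = [a - factor * b for a, b in zip(M[i], M[r])]
        r += 1
        if r == rows:
            break
    return r


def kernel_dims(F):
    """Return [dim Ker(F^0), dim Ker(F^1), ...], stopping once F^i = 0.

    At most n + 1 powers are examined, so the loop also ends if F is
    not nilpotent.
    """
    n = len(F)
    power = [[int(i == j) for j in range(n)] for i in range(n)]
    dims = []
    for _ in range(n + 1):
        dims.append(n - rank(power))
        if dims[-1] == n:
            break
        power = matmul(power, F)
    return dims


F = [[0, 1, 0, 0],
     [0, 0, 1, 0],
     [0, 0, 0, 0],
     [0, 0, 0, 0]]

dims = kernel_dims(F)
print(dims)
```

The dimensions increase strictly until they reach $\dim(E) = 4$, in agreement with Proposition 22.10.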
Proposition 22.11. Given a nilpotent linear map $f$ with $f^r \neq 0$ and $f^{r+1} = 0$, for any integer $i$ with $1 \le i \le r$, for any subspace $U$ of $E$, if $U \cap N_i = (0)$, then $f(U) \cap N_{i-1} = (0)$, and the restriction of $f$ to $U$ is an isomorphism onto $f(U)$.
Proof. Pick $v \in f(U) \cap N_{i-1}$. We have $v = f(u)$ for some $u \in U$ and $f^{i-1}(v) = 0$, which means that $f^i(u) = 0$. Then, $u \in U \cap N_i$, so $u = 0$ since $U \cap N_i = (0)$, and $v = f(u) = 0$. Therefore, $f(U) \cap N_{i-1} = (0)$. The restriction of $f$ to $U$ is obviously surjective on $f(U)$. Suppose that $f(u) = 0$ for some $u \in U$. Then $u \in U \cap N_1 \subseteq U \cap N_i = (0)$ (since $i \ge 1$), so $u = 0$, which proves that $f$ is also injective on $U$.
Proposition 22.12. Given a nilpotent linear map $f$ with $f^r \neq 0$ and $f^{r+1} = 0$, there exists a sequence of subspaces $U_1, \ldots, U_{r+1}$ of $E$ with the following properties:
(1) $N_i = N_{i-1} \oplus U_i$, for $i = 1, \ldots, r + 1$.
(2) We have $f(U_i) \subseteq U_{i-1}$, and the restriction of $f$ to $U_i$ is an injection, for $i = 2, \ldots, r + 1$.
and $f(U_{r+1}) \subseteq U_r$.
The fact that $f$ is an injection from $U_i$ into $U_{i-1}$ follows from Proposition 22.11. Therefore, the induction step is proved. The construction stops when $i = 1$.
Because $N_0 = (0)$ and $N_{r+1} = E$, we see that $E$ is the direct sum of the $U_i$:
$$E = U_1 \oplus \cdots \oplus U_{r+1},$$
with $f(U_i) \subseteq U_{i-1}$, and $f$ an injection from $U_i$ to $U_{i-1}$, for $i = r + 1, \ldots, 2$. By a clever choice of bases in the $U_i$, we obtain the following nice theorem.
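Theorem 22.13. For any nilpotent linear map $f : E \to E$ on a finite-dimensional vector space $E$ of dimension $n$ over a field $K$, there is a basis of $E$ such that the matrix $N$ of $f$ is of the form
$$N = \begin{pmatrix}
0 & \nu_1 & 0 & \cdots & 0 & 0 \\
0 & 0 & \nu_2 & \cdots & 0 & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & 0 & \cdots & 0 & \nu_n \\
0 & 0 & 0 & \cdots & 0 & 0
\end{pmatrix},$$
where $\nu_i = 1$ or $\nu_i = 0$.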
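$\bigoplus_{i=1}^{r+1} U_i$. Then, we define
$$e^{r+1}_1, \ldots, e^{r+1}_{n_{r+1}}$$
with
$$e^{i-1}_j = f(e^i_j), \quad j = 1, \ldots, n_i.$$
$$\begin{array}{ccccccccccc}
e^{r+1}_1 & \cdots & e^{r+1}_{n_{r+1}} & & & & & & & & \\
e^{r}_1 & \cdots & e^{r}_{n_{r+1}} & e^{r}_{n_{r+1}+1} & \cdots & e^{r}_{n_r} & & & & & \\
e^{r-1}_1 & \cdots & e^{r-1}_{n_{r+1}} & e^{r-1}_{n_{r+1}+1} & \cdots & e^{r-1}_{n_r} & e^{r-1}_{n_r+1} & \cdots & e^{r-1}_{n_{r-1}} & & \\
\vdots & & \vdots & \vdots & & \vdots & \vdots & & \vdots & & \\
e^{1}_1 & \cdots & e^{1}_{n_{r+1}} & e^{1}_{n_{r+1}+1} & \cdots & e^{1}_{n_r} & e^{1}_{n_r+1} & \cdots & e^{1}_{n_{r-1}} & \cdots & e^{1}_{n_1}
\end{array}$$
$j = 1, \ldots, n_1$.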
Finally, we define the basis $(e_1, \ldots, e_n)$ by listing each column of the above matrix from the bottom-up, starting with column one, then column two, etc. This means that we list the vectors $e^i_j$ in the following order:
For $j = 1, \ldots, n_{r+1}$, list $e^1_j, \ldots, e^{r+1}_j$;
In general, for $i = r, \ldots, 1$,
for $j = n_{i+1} + 1, \ldots, n_i$, list $e^1_j, \ldots, e^i_j$.
Then, because $f(e^1_j) = 0$ and $e^{i-1}_j = f(e^i_j)$ for $i \ge 2$, either
$$f(e_i) = 0 \quad \text{or} \quad f(e_i) = e_{i-1},$$
which proves the theorem.
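$$J_r(\lambda) = \begin{pmatrix}
\lambda & 1 & 0 & \cdots & 0 \\
0 & \lambda & 1 & \cdots & 0 \\
\vdots & \vdots & \ddots & \ddots & \vdots \\
0 & 0 & 0 & \ddots & 1 \\
0 & 0 & 0 & \cdots & \lambda
\end{pmatrix},$$
where $\lambda \in K$, with $J_1(\lambda) = (\lambda)$ if $r = 1$. A
matrix of the form
$$J = \begin{pmatrix}
J_{r_1}(\lambda_1) & \cdots & 0 \\
\vdots & \ddots & \vdots \\
0 & \cdots & J_{r_m}(\lambda_m)
\end{pmatrix},$$
where each $J_{r_k}(\lambda_k)$ is a Jordan block associated with some $\lambda_k \in K$, and with $r_1 + \cdots + r_m = n$.
To simplify notation, we often write
matrix with four blocks: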
[Example Jordan matrix $J$ with four blocks; its entries are not recoverable from the extracted text.]
Theorem 22.14. (Jordan form) Let $E$ be a vector space of dimension $n$ over a field $K$ and let $f : E \to E$ be a linear map. The following properties are equivalent:
(1) The eigenvalues of $f$ all belong to $K$ (i.e. the roots of the characteristic polynomial $\chi_f$ all belong to $K$).
(2) There is a basis of $E$ in which the matrix of $f$ is a Jordan matrix.
Proof. Assume (1). First we apply Theorem 22.8, and we get a direct sum $E = \bigoplus_{i=1}^{k} W_i$, such that the restriction of $g_i = f - \lambda_i\,\mathrm{id}$ to $W_i$ is nilpotent. By Theorem 22.13, there is a basis of $W_i$ such that the matrix of the restriction of $g_i$ is of the form
$$G_i = \begin{pmatrix}
0 & \nu_1 & 0 & \cdots & 0 & 0 \\
0 & 0 & \nu_2 & \cdots & 0 & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & 0 & \cdots & 0 & \nu_{n_i} \\
0 & 0 & 0 & \cdots & 0 & 0
\end{pmatrix},$$
Chapter 23
Tensor Algebras, Symmetric Algebras
and Exterior Algebras
23.1
Tensor Products
We begin by defining tensor products of vector spaces over a field and then we investigate
some basic properties of these tensors, in particular the existence of bases and duality. After
this, we investigate special kinds of tensors, namely symmetric tensors and skew-symmetric
tensors. Tensor products of modules over a commutative ring with identity will be discussed
very briefly. They show up naturally when we consider the space of sections of a tensor
product of vector bundles.
Given a linear map $f : E \to F$, we know that if we have a basis $(u_i)_{i \in I}$ for $E$, then $f$ is completely determined by its values $f(u_i)$ on the basis vectors. For a multilinear map $f : E^n \to F$, we don't know if there is such a nice property but it would certainly be very useful.
In many respects, tensor products allow us to define multilinear maps in terms of their action on a suitable basis. The crucial idea is to linearize, that is, to create a new vector space $E^{\otimes n}$ such that the multilinear map $f : E^n \to F$ is turned into a linear map $f_\otimes : E^{\otimes n} \to F$ which is equivalent to $f$ in a strong sense. If in addition, $f$ is symmetric, then we can define a symmetric tensor power $\mathrm{Sym}^n(E)$, and every symmetric multilinear map $f : E^n \to F$ is turned into a linear map $f_\odot : \mathrm{Sym}^n(E) \to F$ which is equivalent to $f$ in a strong sense. Similarly, if $f$ is alternating, then we can define a skew-symmetric tensor power $\bigwedge^n(E)$, and every alternating multilinear map is turned into a linear map $f_\wedge : \bigwedge^n(E) \to F$ which is equivalent to $f$ in a strong sense.
Tensor products can be defined in various ways, some more abstract than others. We tried to stay down to earth, without excess!
Let $K$ be a given field, and let $E_1, \ldots, E_n$ be $n \ge 2$ given vector spaces. For any vector space $F$, recall that a map $f : E_1 \times \cdots \times E_n \to F$ is multilinear iff it is linear in each of its
The set of multilinear maps as above forms a vector space denoted $L(E_1, \ldots, E_n; F)$ or $\mathrm{Hom}(E_1, \ldots, E_n; F)$. When $n = 1$, we have the vector space of linear maps $L(E, F)$ (also denoted $\mathrm{Hom}(E, F)$). (To be very precise, we write $\mathrm{Hom}_K(E_1, \ldots, E_n; F)$ and $\mathrm{Hom}_K(E, F)$.) As usual, the dual space $E^*$ of $E$ is defined by $E^* = \mathrm{Hom}(E, K)$.
Before proceeding any further, we recall a basic fact about pairings. We will use this fact
to deal with dual spaces of tensors.
Definition 23.1. Given two vector spaces $E$ and $F$, a map $\langle -, - \rangle : E \times F \to K$ is a nondegenerate pairing iff it is bilinear, $\langle u, v \rangle = 0$ for all $v \in F$ implies $u = 0$, and $\langle u, v \rangle = 0$ for all $u \in E$ implies $v = 0$. A nondegenerate pairing induces two linear maps $\varphi : E \to F^*$ and $\psi : F \to E^*$ defined by
$$\varphi(u)(y) = \langle u, y \rangle, \qquad \psi(v)(x) = \langle x, v \rangle,$$
for all $u, x \in E$ and all $v, y \in F$.
Proposition 23.1. For every nondegenerate pairing $\langle -, - \rangle : E \times F \to K$, the induced maps $\varphi : E \to F^*$ and $\psi : F \to E^*$ are linear and injective. Furthermore, if $E$ and $F$ are finite dimensional, then $\varphi : E \to F^*$ and $\psi : F \to E^*$ are bijective.
Proof. The maps $\varphi : E \to F^*$ and $\psi : F \to E^*$ are linear because $u, v \mapsto \langle u, v \rangle$ is bilinear. Assume that $\varphi(u) = 0$. This means that $\varphi(u)(y) = \langle u, y \rangle = 0$ for all $y \in F$, and as our pairing is nondegenerate, we must have $u = 0$. Similarly, $\psi$ is injective. If $E$ and $F$ are finite dimensional, then $\dim(E) = \dim(E^*)$ and $\dim(F) = \dim(F^*)$. However, the injectivity of $\varphi$ and $\psi$ implies that $\dim(E) \le \dim(F^*)$ and $\dim(F) \le \dim(E^*)$. Consequently $\dim(E) \le \dim(F)$ and $\dim(F) \le \dim(E)$, so $\dim(E) = \dim(F)$. Therefore, $\dim(E) = \dim(F^*)$ and $\varphi$ is bijective (and similarly $\dim(F) = \dim(E^*)$ and $\psi$ is bijective).
Proposition 23.1 shows that when $E$ and $F$ are finite dimensional, a nondegenerate pairing induces canonical isomorphisms $\varphi : E \to F^*$ and $\psi : F \to E^*$; that is, isomorphisms that do not depend on the choice of bases. An important special case is the case where $E = F$ and we have an inner product (a symmetric, positive definite bilinear form) on $E$.
Remark: When we use the term canonical isomorphism, we mean that such an isomorphism is defined independently of any choice of bases. For example, if $E$ is a finite dimensional vector space and $(e_1, \ldots, e_n)$ is any basis of $E$, we have the dual basis $(e_1^*, \ldots, e_n^*)$ of
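$$u^\flat(e_i) = \langle u, e_i \rangle = \Bigl\langle \sum_{j=1}^{n} u_j e_j, e_i \Bigr\rangle = \sum_{j=1}^{n} u_j \langle e_j, e_i \rangle = \sum_{j=1}^{n} g_{ij} u_j,$$
so we get
$$u^\flat = \sum_{i=1}^{n} \omega_i e_i^*, \quad \text{with} \quad \omega_i = \sum_{j=1}^{n} g_{ij} u_j.$$
If we use the convention that coordinates of vectors are written using superscripts ($u = \sum_{i=1}^{n} u^i e_i$) and coordinates of one-forms (covectors) are written using subscripts ($\omega = \sum_{i=1}^{n} \omega_i e_i^*$), then the map $\flat$ has the effect of lowering (flattening!) indices. The inverse of $\flat$ is denoted $\sharp : E^* \to E$. If we write $\omega \in E^*$ as $\omega = \sum_{i=1}^{n} \omega_i e_i^*$ and $\omega^\sharp \in E$ as $\omega^\sharp = \sum_{j=1}^{n} (\omega^\sharp)^j e_j$, since
$$\omega_i = \omega(e_i) = \langle \omega^\sharp, e_i \rangle = \sum_{j=1}^{n} (\omega^\sharp)^j g_{ij}, \quad 1 \le i \le n,$$
we get
$$(\omega^\sharp)^i = \sum_{j=1}^{n} g^{ij} \omega_j,$$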
for all $u, v \in E$.
$$\sum_{k=1}^{n} g^{ik} e_k,$$
$u, v \in E$,
is clearly bilinear. It is also clear that the above defines a linear map from $\mathrm{Hom}(E, E)$ to $\mathrm{Hom}(E, E; K)$. This map is injective, because if $f(u, v) = 0$ for all $u, v \in E$, as $\langle -, - \rangle$ is an inner product, we get $g(u) = 0$ for all $u \in E$. Furthermore, both spaces $\mathrm{Hom}(E, E)$ and $\mathrm{Hom}(E, E; K)$ have the same dimension, so our linear map is an isomorphism.
If $(e_1, \ldots, e_n)$ is an orthonormal basis of $E$, then we check immediately that the trace of a linear map $g$ (which is independent of the choice of a basis) is given by
$$\mathrm{tr}(g) = \sum_{i=1}^{n} \langle g(e_i), e_i \rangle,$$
$$\sum_{i=1}^{n} f(e_i, e_i),$$
for any orthonormal basis $(e_1, \ldots, e_n)$ of $E$. We can also check directly that the above expression is independent of the choice of an orthonormal basis.
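The independence of the trace formula from the orthonormal basis can be checked exactly on a small case. The sketch below evaluates $\sum_i \langle g(e_i), e_i \rangle$ for a $2 \times 2$ matrix in the standard basis and in a rotated orthonormal basis with rational coordinates; the matrix, the bases, and the function names are illustrative assumptions, with the standard dot product on $\mathbb{Q}^2$ as inner product.

```python
from fractions import Fraction


def apply(G, v):
    """Apply the matrix G (list of rows) to the vector v."""
    return [sum(G[i][j] * v[j] for j in range(len(v))) for i in range(len(G))]


def inner(u, v):
    """Standard dot product, used here as the inner product."""
    return sum(a * b for a, b in zip(u, v))


def trace_via_basis(G, basis):
    """Sum of <G e_i, e_i> over the vectors of an orthonormal basis."""
    return sum(inner(apply(G, e), e) for e in basis)


G = [[1, 2],
     [3, 4]]

standard = [[1, 0], [0, 1]]
rotated = [[Fraction(3, 5), Fraction(4, 5)],
           [Fraction(-4, 5), Fraction(3, 5)]]

# Orthonormality of the rotated basis.
gram = [[inner(a, b) for b in rotated] for a in rotated]

t_standard = trace_via_basis(G, standard)
t_rotated = trace_via_basis(G, rotated)
print(t_standard, t_rotated)
```

Both sums agree with the sum of the diagonal entries of `G`, even though the individual terms $\langle g(e_i), e_i \rangle$ differ between the two bases.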
We will also need the following Proposition to show that various families are linearly
independent.
Proposition 23.3. Let $E$ and $F$ be two nontrivial vector spaces and let $(u_i)_{i \in I}$ be any family of vectors $u_i \in E$. The family $(u_i)_{i \in I}$ is linearly independent iff for every family $(v_i)_{i \in I}$ of vectors $v_i \in F$, there is some linear map $f : E \to F$ so that $f(u_i) = v_i$ for all $i \in I$.
Proof. Left as an exercise.
First, we define tensor products, and then we prove their existence and uniqueness up to
isomorphism.
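Definition 23.2. A tensor product of $n \ge 2$ vector spaces $E_1, \ldots, E_n$ is a vector space $T$ together with a multilinear map $\varphi : E_1 \times \cdots \times E_n \to T$, such that for every vector space $F$ and for every multilinear map $f : E_1 \times \cdots \times E_n \to F$, there is a unique linear map $f_\otimes : T \to F$ with
$$f(u_1, \ldots, u_n) = f_\otimes(\varphi(u_1, \ldots, u_n)),$$
for all $u_1 \in E_1, \ldots, u_n \in E_n$, or for short
$$f = f_\otimes \circ \varphi.$$
Equivalently, there is a unique linear map $f_\otimes$ such that the following diagram commutes:
$$\begin{array}{ccc}
E_1 \times \cdots \times E_n & \xrightarrow{\ \varphi\ } & T \\
 & {\scriptstyle f}\searrow & \downarrow{\scriptstyle f_\otimes} \\
 & & F
\end{array}$$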
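First, we show that any two tensor products $(T_1, \varphi_1)$ and $(T_2, \varphi_2)$ for $E_1, \ldots, E_n$, are isomorphic.
Proposition 23.4. Given any two tensor products $(T_1, \varphi_1)$ and $(T_2, \varphi_2)$ for $E_1, \ldots, E_n$, there is an isomorphism $h : T_1 \to T_2$ such that
$$\varphi_2 = h \circ \varphi_1.$$
Proof. Focusing on $(T_1, \varphi_1)$, we have a multilinear map $\varphi_2 : E_1 \times \cdots \times E_n \to T_2$, and thus there is a unique linear map $(\varphi_2)_\otimes : T_1 \to T_2$ with
$$\varphi_2 = (\varphi_2)_\otimes \circ \varphi_1.$$
Similarly, focusing now on $(T_2, \varphi_2)$, we have a multilinear map $\varphi_1 : E_1 \times \cdots \times E_n \to T_1$, and thus there is a unique linear map $(\varphi_1)_\otimes : T_2 \to T_1$ with
$$\varphi_1 = (\varphi_1)_\otimes \circ \varphi_2.$$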
$i \in I$. Furthermore, it is easy to show that for any vector space $F$, and for any function $f : I \to F$, there is a unique linear map $\overline{f} : K^{(I)} \to F$ such that $f = \overline{f} \circ \iota$, as in the following diagram:
$$\begin{array}{ccc}
I & \xrightarrow{\ \iota\ } & K^{(I)} \\
 & {\scriptstyle f}\searrow & \downarrow{\scriptstyle \overline{f}} \\
 & & F
\end{array}$$
This shows that $K^{(I)}$ is the free vector space generated by $I$. Now, apply this construction to the cartesian product $I = E_1 \times \cdots \times E_n$, obtaining the free vector space $M = K^{(I)}$ on $I = E_1 \times \cdots \times E_n$. Since every $e_i$ is uniquely associated with some $n$-tuple $i = (u_1, \ldots, u_n) \in E_1 \times \cdots \times E_n$, we denote $e_i$ by $(u_1, \ldots, u_n)$.
Next, let $N$ be the subspace of $M$ generated by the vectors of the following type:
$$(u_1, \ldots, u_i + v_i, \ldots, u_n) - (u_1, \ldots, u_i, \ldots, u_n) - (u_1, \ldots, v_i, \ldots, u_n),$$
$$(u_1, \ldots, \lambda u_i, \ldots, u_n) - \lambda (u_1, \ldots, u_i, \ldots, u_n).$$
We let $E_1 \otimes \cdots \otimes E_n$ be the quotient $M/N$ of the free vector space $M$ by $N$, $\pi : M \to M/N$ be the quotient map, and set
$$\varphi = \pi \circ \iota.$$
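Because $f$ is multilinear, note that we must have $\overline{f}(w) = 0$ for every $w \in N$. But then, $\overline{f} : M \to F$ induces a linear map $h : M/N \to F$ such that
$$f = h \circ \pi \circ \iota,$$
by defining $h([z]) = \overline{f}(z)$ for every $z \in M$, where $[z]$ denotes the equivalence class in $M/N$ of $z \in M$:
$$\begin{array}{ccc}
E_1 \times \cdots \times E_n & \xrightarrow{\ \pi \circ \iota\ } & K^{(E_1 \times \cdots \times E_n)}/N \\
 & {\scriptstyle f}\searrow & \downarrow{\scriptstyle h} \\
 & & F
\end{array}$$
Indeed, the fact that $\overline{f}$ vanishes on $N$ ensures that $h$ is well defined on $M/N$, and it is clearly linear by definition. However, we showed that such a linear map $h$ is unique, and thus it agrees with the linear map $f_\otimes$ defined by
$$f_\otimes(u_1 \otimes \cdots \otimes u_n) = f(u_1, \ldots, u_n)$$
on the generators of $E_1 \otimes \cdots \otimes E_n$.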
What is important about Theorem 23.5 is not so much the construction itself but the fact that it produces a tensor product with the universal mapping property with respect to multilinear maps. Indeed, Theorem 23.5 yields a canonical isomorphism
$$\mathrm{L}(E_1 \otimes \cdots \otimes E_n, F) \cong \mathrm{L}(E_1, \ldots, E_n; F)$$
between the vector space of linear maps $\mathrm{L}(E_1 \otimes \cdots \otimes E_n, F)$ and the vector space of multilinear maps $\mathrm{L}(E_1, \ldots, E_n; F)$, via the linear map $- \circ \varphi$ defined by
$$h \mapsto h \circ \varphi,$$
where $h \in \mathrm{L}(E_1 \otimes \cdots \otimes E_n, F)$. Indeed, $h \circ \varphi$ is clearly multilinear, and since by Theorem 23.5, for every multilinear map $f \in \mathrm{L}(E_1, \ldots, E_n; F)$, there is a unique linear map $f_\otimes \in \mathrm{L}(E_1 \otimes \cdots \otimes E_n, F)$ such that $f = f_\otimes \circ \varphi$, the map $- \circ \varphi$ is bijective. As a matter of fact, its inverse is the map
$$f \mapsto f_\otimes.$$
Using the Hom notation, the above canonical isomorphism is written
$$\mathrm{Hom}(E_1 \otimes \cdots \otimes E_n, F) \cong \mathrm{Hom}(E_1, \ldots, E_n; F).$$
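To make this isomorphism concrete, here is a minimal sketch in coordinates, assuming $E_1 = K^2$ and $E_2 = K^3$ with their standard bases and integer scalars. The names `tensor`, `bilinear`, `induced_linear` and the matrix `A` are hypothetical and chosen only for illustration. The bilinear map with coefficient matrix `A` and the linear map on the tensor product with the flattened coefficients take the same value on a simple tensor.

```python
from itertools import product

def tensor(u, v):
    """Coordinates of u (x) v in the basis e_i (x) f_j, listed row by row."""
    return [ui * vj for ui, vj in product(u, v)]

def bilinear(a, u, v):
    """The bilinear map f(u, v) = sum_ij a[i][j] * u[i] * v[j]."""
    return sum(a[i][j] * u[i] * v[j]
               for i in range(len(u)) for j in range(len(v)))

def induced_linear(a, z):
    """The linear map on the tensor product: a dot product with the flattened coefficients."""
    flat = [aij for row in a for aij in row]
    return sum(c * zi for c, zi in zip(flat, z))

A = [[1, 2, 0], [-1, 3, 5]]   # hypothetical coefficients of a bilinear map on K^2 x K^3
u, v = [2, -1], [4, 0, 3]

value_multilinear = bilinear(A, u, v)
value_linear = induced_linear(A, tensor(u, v))
```

Both values agree, which is the identity $f = f_\otimes \circ \varphi$ evaluated at one pair $(u, v)$.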
Remarks:
(1) To be very precise, since the tensor product depends on the field $K$, we should subscript the symbol $\otimes$ with $K$ and write
$$E_1 \otimes_K \cdots \otimes_K E_n.$$
However, we often omit the subscript $K$ unless confusion may arise.
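(2) For $F = K$, the base field, we obtain a canonical isomorphism between the vector space $\mathrm{L}(E_1 \otimes \cdots \otimes E_n, K)$ and the vector space of multilinear forms $\mathrm{L}(E_1, \ldots, E_n; K)$. However, $\mathrm{L}(E_1 \otimes \cdots \otimes E_n, K)$ is the dual space $(E_1 \otimes \cdots \otimes E_n)^*$, and thus the vector space of multilinear forms $\mathrm{L}(E_1, \ldots, E_n; K)$ is canonically isomorphic to $(E_1 \otimes \cdots \otimes E_n)^*$. We write
$$\mathrm{L}(E_1, \ldots, E_n; K) \cong (E_1 \otimes \cdots \otimes E_n)^*.$$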
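The fact that the map $\varphi \colon E_1 \times \cdots \times E_n \to E_1 \otimes \cdots \otimes E_n$ is multilinear can also be expressed as follows:
$$u_1 \otimes \cdots \otimes (v_i + w_i) \otimes \cdots \otimes u_n = (u_1 \otimes \cdots \otimes v_i \otimes \cdots \otimes u_n) + (u_1 \otimes \cdots \otimes w_i \otimes \cdots \otimes u_n),$$
$$u_1 \otimes \cdots \otimes (\lambda u_i) \otimes \cdots \otimes u_n = \lambda (u_1 \otimes \cdots \otimes u_i \otimes \cdots \otimes u_n).$$
Of course, this is just what we wanted! Tensors in $E_1 \otimes \cdots \otimes E_n$ are also called $n$-tensors, and tensors of the form $u_1 \otimes \cdots \otimes u_n$, where $u_i \in E_i$, are called simple (or decomposable) $n$-tensors. Those $n$-tensors that are not simple are often called compound $n$-tensors.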
Not only do tensor products act on spaces, but they also act on linear maps (they are functors). Given two linear maps $f \colon E \to E'$ and $g \colon F \to F'$, we can define $h \colon E \times F \to E' \otimes F'$ by
$$h(u, v) = f(u) \otimes g(v).$$
It is immediately verified that $h$ is bilinear, and thus it induces a unique linear map
$$f \otimes g \colon E \otimes F \to E' \otimes F'$$
such that
$$(f \otimes g)(u \otimes v) = f(u) \otimes g(v).$$
If we also have linear maps $f' \colon E' \to E''$ and $g' \colon F' \to F''$, we can easily verify that the linear maps $(f' \circ f) \otimes (g' \circ g)$ and $(f' \otimes g') \circ (f \otimes g)$ agree on all vectors of the form $u \otimes v \in E \otimes F$. Since these vectors generate $E \otimes F$, we conclude that
$$(f' \circ f) \otimes (g' \circ g) = (f' \otimes g') \circ (f \otimes g).$$
The generalization to the tensor product $f_1 \otimes \cdots \otimes f_n$ of $n \ge 3$ linear maps $f_i \colon E_i \to F_i$ is immediate, and left to the reader.
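In coordinates, the matrix of $f \otimes g$ with respect to the bases $e_i \otimes f_j$ (ordered row by row) is the Kronecker product of the matrices of $f$ and $g$. The following sketch, which assumes small integer matrices and uses the hypothetical helper names `matmul` and `kron`, checks the functoriality identity $(f' \circ f) \otimes (g' \circ g) = (f' \otimes g') \circ (f \otimes g)$ on one example.

```python
def matmul(a, b):
    """Product of two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def kron(a, b):
    """Kronecker product: rows indexed by (i, k), columns by (j, l)."""
    return [[a[i][j] * b[k][l]
             for j in range(len(a[0])) for l in range(len(b[0]))]
            for i in range(len(a)) for k in range(len(b))]

f = [[1, 2], [0, 1]]
f2 = [[2, 0], [1, 1]]    # plays the role of f'
g = [[1, -1], [3, 0]]
g2 = [[0, 1], [1, 2]]    # plays the role of g'

lhs = kron(matmul(f2, f), matmul(g2, g))    # (f' o f) (x) (g' o g)
rhs = matmul(kron(f2, g2), kron(f, g))      # (f' (x) g') o (f (x) g)
```

A single numerical check is of course not a proof; the argument in the text (agreement on the generators $u \otimes v$) is what establishes the identity in general.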
23.2
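for some family of scalars $(v^k_j)_{j \in I_k}$. Let $F$ be any nontrivial vector space. We show that for every family
$$(w_{i_1, \ldots, i_n})_{(i_1, \ldots, i_n) \in I_1 \times \cdots \times I_n}$$
of vectors in $F$, there is some linear map $h \colon E_1 \otimes \cdots \otimes E_n \to F$ such that
$$h(u^1_{i_1} \otimes \cdots \otimes u^n_{i_n}) = w_{i_1, \ldots, i_n}.$$
Then, by Proposition 23.3, it follows that
$$(u^1_{i_1} \otimes \cdots \otimes u^n_{i_n})_{(i_1, \ldots, i_n) \in I_1 \times \cdots \times I_n}$$
is linearly independent. However, since $(u^k_i)_{i \in I_k}$ is a basis for $E_k$, the $u^1_{i_1} \otimes \cdots \otimes u^n_{i_n}$ also generate $E_1 \otimes \cdots \otimes E_n$, and thus, they form a basis of $E_1 \otimes \cdots \otimes E_n$.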
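We define the function $f \colon E_1 \times \cdots \times E_n \to F$ as follows:
$$f\Big(\sum_{j_1 \in I_1} v^1_{j_1} u^1_{j_1}, \ldots, \sum_{j_n \in I_n} v^n_{j_n} u^n_{j_n}\Big) = \sum_{j_1 \in I_1, \ldots, j_n \in I_n} v^1_{j_1} \cdots v^n_{j_n}\, w_{j_1, \ldots, j_n}.$$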
for some unique family of scalars $\lambda_{i_1, \ldots, i_n} \in K$, all zero except for a finite number.
23.3
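Proposition 23.7. Given 3 vector spaces $E, F, G$, there exist unique canonical isomorphisms
(1) $E \otimes F \cong F \otimes E$
(2) $(E \otimes F) \otimes G \cong E \otimes (F \otimes G) \cong E \otimes F \otimes G$
(3) $(E \oplus F) \otimes G \cong (E \otimes G) \oplus (F \otimes G)$
(4) $K \otimes E \cong E$
such that respectively
(a) $u \otimes v \mapsto v \otimes u$
(b) $(u \otimes v) \otimes w \mapsto u \otimes (v \otimes w) \mapsto u \otimes v \otimes w$
(c) $(u, v) \otimes w \mapsto (u \otimes w, v \otimes w)$
(d) $\lambda \otimes u \mapsto \lambda u$.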
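Proof. These isomorphisms are proved using the universal mapping property of tensor products. We illustrate the proof method on (2). Fix some $w \in G$. The map
$$(u, v) \mapsto u \otimes v \otimes w$$
from $E \times F$ to $E \otimes F \otimes G$ is bilinear, and thus there is a linear map $f_w \colon E \otimes F \to E \otimes F \otimes G$ such that $f_w(u \otimes v) = u \otimes v \otimes w$.
Next, consider the map
$$(z, w) \mapsto f_w(z),$$
from $(E \otimes F) \times G$ into $E \otimes F \otimes G$. It is easily seen to be bilinear, and thus it induces a linear map
$$f \colon (E \otimes F) \otimes G \to E \otimes F \otimes G$$
such that $f((u \otimes v) \otimes w) = u \otimes v \otimes w$.
Also consider the map
$$(u, v, w) \mapsto (u \otimes v) \otimes w$$
from $E \times F \times G$ to $(E \otimes F) \otimes G$. It is trilinear, and thus there is a linear map
$$g \colon E \otimes F \otimes G \to (E \otimes F) \otimes G$$
such that $g(u \otimes v \otimes w) = (u \otimes v) \otimes w$. Clearly, $f \circ g$ and $g \circ f$ are identity maps, and thus $f$ and $g$ are isomorphisms. The other cases are similar.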
23.4
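In this section, all vector spaces are assumed to have finite dimension. Let us now see how tensor products behave under duality. For this, we define a pairing between $E_1^* \otimes \cdots \otimes E_n^*$ and $E_1 \otimes \cdots \otimes E_n$ as follows: For any fixed $(v_1^*, \ldots, v_n^*) \in E_1^* \times \cdots \times E_n^*$, we have the multilinear map
$$l_{v_1^*, \ldots, v_n^*} \colon (u_1, \ldots, u_n) \mapsto v_1^*(u_1) \cdots v_n^*(u_n)$$
from $E_1 \times \cdots \times E_n$ to $K$. The map $l_{v_1^*, \ldots, v_n^*}$ extends uniquely to a linear map $L_{v_1^*, \ldots, v_n^*} \colon E_1 \otimes \cdots \otimes E_n \to K$. We also have the multilinear map
$$(v_1^*, \ldots, v_n^*) \mapsto L_{v_1^*, \ldots, v_n^*}$$
from $E_1^* \times \cdots \times E_n^*$ to $\mathrm{Hom}(E_1 \otimes \cdots \otimes E_n, K)$, which extends to a linear map $L$ from $E_1^* \otimes \cdots \otimes E_n^*$ to $\mathrm{Hom}(E_1 \otimes \cdots \otimes E_n, K)$. However, in view of the isomorphism
$$\mathrm{Hom}(U \otimes V, W) \cong \mathrm{Hom}(U, \mathrm{Hom}(V, W)),$$
we can view $L$ as a linear map
$$L \colon (E_1^* \otimes \cdots \otimes E_n^*) \otimes (E_1 \otimes \cdots \otimes E_n) \to K,$$
which corresponds to a bilinear map
$$(E_1^* \otimes \cdots \otimes E_n^*) \times (E_1 \otimes \cdots \otimes E_n) \to K,$$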
$$\alpha(u^* \otimes f)(x) = u^*(x) f.$$
Proposition 23.9. If $E$ and $F$ are vector spaces, then the following properties hold:
(1) The linear map $\alpha \colon E^* \otimes F \to \mathrm{Hom}(E, F)$ is injective.
(2) If $E$ is finite-dimensional, then $\alpha \colon E^* \otimes F \to \mathrm{Hom}(E, F)$ is a canonical isomorphism.
(3) If $F$ is finite-dimensional, then $\alpha \colon E^* \otimes F \to \mathrm{Hom}(E, F)$ is a canonical isomorphism.
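Proof. (1) Let $(e_i^*)_{i \in I}$ be a basis of $E^*$ and let $(f_j)_{j \in J}$ be a basis of $F$. Then, we know that $(e_i^* \otimes f_j)_{i \in I, j \in J}$ is a basis of $E^* \otimes F$. To prove that $\alpha$ is injective, let us show that its kernel is reduced to $(0)$. For any vector
$$\omega = \sum_{i \in I', j \in J'} \lambda_{ij}\, e_i^* \otimes f_j$$
in $E^* \otimes F$, with $I'$ and $J'$ some finite sets, assume that $\alpha(\omega) = 0$. This means that for every $x \in E$, we have $\alpha(\omega)(x) = 0$; that is,
$$\sum_{i \in I', j \in J'} \alpha(\lambda_{ij}\, e_i^* \otimes f_j)(x) = \sum_{j \in J'} \Big(\sum_{i \in I'} \lambda_{ij}\, e_i^*(x)\Big) f_j = 0.$$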
But, then $(e_i^*)_{i \in I'}$ would be linearly dependent, contradicting the fact that $(e_i^*)_{i \in I}$ is a basis of $E^*$, so we must have
$$\lambda_{ij} = 0,$$
$$(\alpha(e_1^* \otimes f_1 + \cdots + e_n^* \otimes f_n))(x) = \sum_{i=1}^n (\alpha(e_i^* \otimes f_i))(x) = \sum_{i=1}^n e_i^*(x) f_i.$$
23.5 Tensor Algebras
is also denoted as
$$\bigotimes^m V \quad\text{or}\quad V^{\otimes m}$$
and is called the $m$-th tensor power of $V$ (with $V^{\otimes 1} = V$, and $V^{\otimes 0} = K$). We can pack all the tensor powers of $V$ into the big vector space
$$T(V) = \bigoplus_{m \ge 0} V^{\otimes m},$$
also denoted $T^\bullet(V)$ to avoid confusion with the tangent bundle. This is an interesting object because we can define a multiplication operation on it which makes it into an algebra called the tensor algebra of $V$. When $V$ is of finite dimension $n$, this space corresponds to the algebra of polynomials with coefficients in $K$ in $n$ noncommuting variables.
Let us recall the definition of an algebra over a field. Let $K$ denote any (commutative) field, although for our purposes, we may assume that $K = \mathbb{R}$ (and occasionally, $K = \mathbb{C}$). Since we will only be dealing with associative algebras with a multiplicative unit, we only define algebras of this kind.
Definition 23.3. Given a field $K$, a $K$-algebra is a $K$-vector space $A$ together with a bilinear operation $\cdot \colon A \times A \to A$, called multiplication, which makes $A$ into a ring with unity $1$ (or $1_A$, when we want to be very precise). This means that $\cdot$ is associative and that there is a multiplicative identity element $1$ so that $1 \cdot a = a \cdot 1 = a$, for all $a \in A$. Given two $K$-algebras $A$ and $B$, a $K$-algebra homomorphism $h \colon A \to B$ is a linear map that is also a ring homomorphism, with $h(1_A) = 1_B$.
For example, the ring $\mathrm{M}_n(K)$ of all $n \times n$ matrices over a field $K$ is a $K$-algebra.
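where $v_i \in V^{\otimes n_i}$ and the $n_i$ are natural numbers with $n_i \ne n_j$ if $i \ne j$, to define multiplication in $T(V)$, using bilinearity, it is enough to define multiplication operations $\cdot \colon V^{\otimes m} \times V^{\otimes n} \to V^{\otimes (m+n)}$, which, using the isomorphisms $V^{\otimes n} \cong \iota_n(V^{\otimes n})$, yield multiplication operations $\cdot \colon \iota_m(V^{\otimes m}) \times \iota_n(V^{\otimes n}) \to \iota_{m+n}(V^{\otimes (m+n)})$. More precisely, we use the canonical isomorphism
$$V^{\otimes m} \otimes V^{\otimes n} \cong V^{\otimes (m+n)}$$
which defines a bilinear operation
$$V^{\otimes m} \times V^{\otimes n} \to V^{\otimes (m+n)},$$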
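$$\cdots \otimes \underbrace{V \otimes \cdots \otimes V}_{n} \cong V^{\otimes (m+n)},$$
which can be shown using methods similar to those used to prove associativity. Of course, the multiplication $V^{\otimes m} \times V^{\otimes n} \to V^{\otimes (m+n)}$ is defined so that
$$(v_1 \otimes \cdots \otimes v_m) \cdot (w_1 \otimes \cdots \otimes w_n) = v_1 \otimes \cdots \otimes v_m \otimes w_1 \otimes \cdots \otimes w_n.$$
(This has to be made rigorous by using isomorphisms involving the associativity of tensor products; for details, see Atiyah and Macdonald [5].)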
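The multiplication rule on simple tensors of basis vectors is just concatenation of words, which is why $T(V)$ behaves like polynomials in noncommuting variables. A minimal sketch, assuming a basis $e_1, e_2, \ldots$ of $V$ and integer scalars: an element of $T(V)$ is stored as a dictionary mapping a word $(i_1, \ldots, i_m)$, standing for $e_{i_1} \otimes \cdots \otimes e_{i_m}$, to its coefficient. The representation and the name `multiply` are illustrative assumptions, not notation from the text.

```python
from collections import defaultdict

def multiply(x, y):
    """Product in T(V): bilinear extension of concatenation of basis words."""
    out = defaultdict(int)
    for wx, cx in x.items():
        for wy, cy in y.items():
            out[wx + wy] += cx * cy
    return {w: c for w, c in out.items() if c != 0}

one = {(): 1}                 # the unit, living in V^{(x)0} = K
e1 = {(1,): 1}
e2 = {(2,): 1}
x = {(1,): 2, (1, 2): 3}      # 2 e1 + 3 e1 (x) e2, a non-homogeneous element

e1e2 = multiply(e1, e2)       # e1 (x) e2
e2e1 = multiply(e2, e1)       # e2 (x) e1, a different basis word
```

The two products `e1e2` and `e2e1` differ, which illustrates that the multiplication is not commutative, while the empty word acts as the unit.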
Remark: It is important to note that multiplication in $T(V)$ is not commutative. Also, in all rigor, the unit $\mathbf{1}$ of $T(V)$ is not equal to $1$, the unit of the field $K$. However, in view of the injection $\iota_0 \colon K \to T(V)$, for the sake of notational simplicity, we will denote $\mathbf{1}$ by $1$. More generally, in view of the injections $\iota_n \colon V^{\otimes n} \to T(V)$, we identify elements of $V^{\otimes n}$ with their images in $T(V)$.
The algebra $T(V)$ satisfies a universal mapping property which shows that it is unique up to isomorphism. For simplicity of notation, let $i \colon V \to T(V)$ be the natural injection of $V$ into $T(V)$.
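Proposition 23.10. Given any $K$-algebra $A$, for any linear map $f \colon V \to A$, there is a unique $K$-algebra homomorphism $\overline{f} \colon T(V) \to A$ so that
$$f = \overline{f} \circ i,$$
as in the diagram below:
[Commutative diagram: $i \colon V \to T(V)$, $f \colon V \to A$, and $\overline{f} \colon T(V) \to A$.]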
and the multiplication behaves well w.r.t. the grading, i.e., $\cdot \colon V^{\otimes m} \times V^{\otimes n} \to V^{\otimes (m+n)}$. Generally, a $K$-algebra $E$ is said to be a graded algebra iff there is a sequence of subspaces $E^n \subseteq E$ such that
$$E = \bigoplus_{n \ge 0} E^n$$
(with $E^0 = K$) and the multiplication respects the grading; that is, $\cdot \colon E^m \times E^n \to E^{m+n}$. Elements in $E^n$ are called homogeneous elements of rank (or degree) $n$.
In differential geometry and in physics it is necessary to consider slightly more general tensors.
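Definition 23.4. Given a vector space $V$, for any pair of nonnegative integers $(r, s)$, the tensor space $T^{r,s}(V)$ of type $(r, s)$ is the tensor product
$$T^{r,s}(V) = V^{\otimes r} \otimes (V^*)^{\otimes s} = \underbrace{V \otimes \cdots \otimes V}_{r} \otimes \underbrace{V^* \otimes \cdots \otimes V^*}_{s},$$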
$$(T^{r,s}(V))^* \cong \mathrm{Hom}(V^r, (V^*)^s; K).$$
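For finite dimensional vector spaces, the duality of Section 23.4 is also easily extended to the tensor spaces $T^{r,s}(V)$. We define the pairing
$$T^{r,s}(V^*) \times T^{r,s}(V) \to K$$
as follows: If
$$v^* = v_1^* \otimes \cdots \otimes v_r^* \otimes u_{r+1} \otimes \cdots \otimes u_{r+s} \in T^{r,s}(V^*)$$
and
$$u = u_1 \otimes \cdots \otimes u_r \otimes v_{r+1}^* \otimes \cdots \otimes v_{r+s}^* \in T^{r,s}(V),$$
then
$$(v^*, u) = v_1^*(u_1) \cdots v_{r+s}^*(u_{r+s}).$$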
The tradition in classical tensor notation is to use lower indices on vectors and upper indices on linear forms, and in accordance with the Einstein summation convention (or Einstein notation) the position of the indices on the coefficients is reversed. The Einstein summation convention is to assume that a summation is performed for all values of every index that appears simultaneously once as an upper index and once as a lower index. According to this convention, the tensor $\alpha$ above is written
$$\alpha = a^{i_1, \ldots, i_r}_{j_1, \ldots, j_s}\, e_{i_1} \otimes \cdots \otimes e_{i_r} \otimes e^{j_1} \otimes \cdots \otimes e^{j_s}.$$
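Definition 23.5. For all $r, s \ge 1$, the contraction $c_{i,j} \colon T^{r,s}(V) \to T^{r-1,s-1}(V)$, with $1 \le i \le r$ and $1 \le j \le s$, is the linear map defined on generators by
$$c_{i,j}(u_1 \otimes \cdots \otimes u_r \otimes v_1^* \otimes \cdots \otimes v_s^*) = v_j^*(u_i)\, u_1 \otimes \cdots \otimes \widehat{u_i} \otimes \cdots \otimes u_r \otimes v_1^* \otimes \cdots \otimes \widehat{v_j^*} \otimes \cdots \otimes v_s^*,$$
where the hat over an argument means that it should be omitted.
As
$$c_{1,1}(e_i \otimes e^j) = \delta_{i,j},$$
we get
$$c_{1,1}(h) = \sum_{i=1}^n a^i_i = \mathrm{tr}(h),$$
where $\mathrm{tr}(h)$ is the trace of $h$, where $h$ is viewed as the linear map given by the matrix $(a^i_j)$. Actually, since $c_{1,1}$ is defined independently of any basis, $c_{1,1}$ provides an intrinsic definition of the trace of a linear map $h \in \mathrm{Hom}(V, V)$.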
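As a small numerical illustration of this basis independence, the sketch below computes $c_{1,1}$ as the sum of the diagonal coefficients of a matrix and checks that the value is unchanged after a change of basis $a \mapsto p^{-1} a p$. The matrices `a`, `p`, `p_inv` and the function names are hypothetical choices made for the example.

```python
def matmul(a, b):
    """Product of two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def contraction_11(a):
    """c_{1,1} of h = sum a[i][j] e_i (x) e^j: keep only the coefficients with i == j."""
    return sum(a[i][i] for i in range(len(a)))

a = [[1, 2], [3, 4]]          # matrix of h in a first basis
p = [[2, 1], [1, 1]]          # hypothetical change of basis (determinant 1)
p_inv = [[1, -1], [-1, 2]]    # its inverse
a_new = matmul(matmul(p_inv, a), p)   # matrix of the same map h in the new basis
```

Here `contraction_11(a)` and `contraction_11(a_new)` coincide, as the intrinsic definition predicts.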
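Remark: Using the Einstein summation convention, if
$$\alpha = a^{i_1, \ldots, i_r}_{j_1, \ldots, j_s}\, e_{i_1} \otimes \cdots \otimes e_{i_r} \otimes e^{j_1} \otimes \cdots \otimes e^{j_s},$$
then
$$c_{k,l}(\alpha) = a^{i_1, \ldots, i_{k-1}, i, i_{k+1}, \ldots, i_r}_{j_1, \ldots, j_{l-1}, i, j_{l+1}, \ldots, j_s}\, e_{i_1} \otimes \cdots \otimes \widehat{e_{i_k}} \otimes \cdots \otimes e_{i_r} \otimes e^{j_1} \otimes \cdots \otimes \widehat{e^{j_l}} \otimes \cdots \otimes e^{j_s}.$$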
If $E$ and $F$ are two $K$-algebras, we know that their tensor product $E \otimes F$ exists as a vector space. We can make $E \otimes F$ into an algebra as well. Indeed, we have the multilinear map
$$E \times F \times E \times F \to E \otimes F$$
given by $(a, b, c, d) \mapsto (ac) \otimes (bd)$, where $ac$ is the product of $a$ and $c$ in $E$ and $bd$ is the product of $b$ and $d$ in $F$. By the universal mapping property, we get a linear map
$$E \otimes F \otimes E \otimes F \to E \otimes F.$$
Using the isomorphism
$$E \otimes F \otimes E \otimes F \cong (E \otimes F) \otimes (E \otimes F),$$
we get a linear map
$$(E \otimes F) \otimes (E \otimes F) \to E \otimes F,$$
and thus a bilinear map
$$(E \otimes F) \times (E \otimes F) \to E \otimes F,$$
which is our multiplication operation in $E \otimes F$. This multiplication is determined by
$$(a \otimes b) \cdot (c \otimes d) = (ac) \otimes (bd).$$
One immediately checks that $E \otimes F$ with this multiplication is a $K$-algebra.
We now turn to symmetric tensors.
23.6
Our goal is to come up with a notion of tensor product that will allow us to treat symmetric multilinear maps as linear maps. First, note that we have to restrict ourselves to a single vector space $E$, rather than $n$ vector spaces $E_1, \ldots, E_n$, so that symmetry makes sense. Recall that a multilinear map $f \colon E^n \to F$ is symmetric iff
$$f(u_{\sigma(1)}, \ldots, u_{\sigma(n)}) = f(u_1, \ldots, u_n),$$
for all $u_i \in E$ and all permutations $\sigma \colon \{1, \ldots, n\} \to \{1, \ldots, n\}$. The group of permutations on $\{1, \ldots, n\}$ (the symmetric group) is denoted $\mathfrak{S}_n$. The vector space of all symmetric multilinear maps $f \colon E^n \to F$ is denoted by $\mathrm{S}^n(E; F)$. Note that $\mathrm{S}^1(E; F) = \mathrm{Hom}(E, F)$.
We could proceed directly as in Theorem 23.5 and construct symmetric tensor products from scratch. However, since we already have the notion of a tensor product, there is a more economical method. First, we define symmetric tensor powers.
Definition 23.6. An $n$-th symmetric tensor power of a vector space $E$, where $n \ge 1$, is a vector space $S$ together with a symmetric multilinear map $\varphi \colon E^n \to S$ such that, for every vector space $F$ and for every symmetric multilinear map $f \colon E^n \to F$, there is a unique linear map $f_\odot \colon S \to F$, with
$$f(u_1, \ldots, u_n) = f_\odot(\varphi(u_1, \ldots, u_n)),$$
for all $u_1, \ldots, u_n \in E$; that is,
$$f = f_\odot \circ \varphi.$$
Equivalently, there is a unique linear map $f_\odot$ such that the following diagram commutes:
[Commutative diagram: $\varphi \colon E^n \to S$, $f \colon E^n \to F$, and $f_\odot \colon S \to F$.]
First, we show that any two symmetric $n$-th tensor powers $(S_1, \varphi_1)$ and $(S_2, \varphi_2)$ for $E$ are isomorphic.
Proposition 23.11. Given any two symmetric $n$-th tensor powers $(S_1, \varphi_1)$ and $(S_2, \varphi_2)$ for $E$, there is an isomorphism $h \colon S_1 \to S_2$ such that
$$\varphi_2 = h \circ \varphi_1.$$
Proof. Replace tensor product by $n$-th symmetric tensor power in the proof of Proposition 23.4.
We now give a construction that produces a symmetric $n$-th tensor power of a vector space $E$.
Theorem 23.12. Given a vector space $E$, a symmetric $n$-th tensor power $(\mathrm{Sym}^n(E), \varphi)$ for $E$ can be constructed ($n \ge 1$). Furthermore, denoting $\varphi(u_1, \ldots, u_n)$ as $u_1 \odot \cdots \odot u_n$, the symmetric tensor power $\mathrm{Sym}^n(E)$ is generated by the vectors $u_1 \odot \cdots \odot u_n$, where $u_1, \ldots, u_n \in E$, and for every symmetric multilinear map $f \colon E^n \to F$, the unique linear map $f_\odot \colon \mathrm{Sym}^n(E) \to F$ such that $f = f_\odot \circ \varphi$ is defined by
$$f_\odot(u_1 \odot \cdots \odot u_n) = f(u_1, \ldots, u_n)$$
on the generators $u_1 \odot \cdots \odot u_n$ of $\mathrm{Sym}^n(E)$.
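Proof. The tensor power $E^{\otimes n}$ is too big, and thus we define an appropriate quotient. Let $C$ be the subspace of $E^{\otimes n}$ generated by the vectors of the form
$$u_1 \otimes \cdots \otimes u_n - u_{\sigma(1)} \otimes \cdots \otimes u_{\sigma(n)},$$
for all $u_i \in E$, and all permutations $\sigma \colon \{1, \ldots, n\} \to \{1, \ldots, n\}$. We claim that the quotient space $(E^{\otimes n})/C$ does the job.
Let $p \colon E^{\otimes n} \to (E^{\otimes n})/C$ be the quotient map. Let $\varphi \colon E^n \to (E^{\otimes n})/C$ be the map
$$(u_1, \ldots, u_n) \mapsto p(u_1 \otimes \cdots \otimes u_n),$$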
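[Commutative diagram: $E^n \to E^{\otimes n}$, with $f \colon E^n \to F$ and $f_\otimes \colon E^{\otimes n} \to F$.]
[Commutative diagram: $E^n \to (E^{\otimes n})/C$, with $f \colon E^n \to F$ and $h \colon (E^{\otimes n})/C \to F$.]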
Again, the actual construction is not important. What is important is that the symmetric $n$-th power has the universal mapping property with respect to symmetric multilinear maps.
Remark: The notation $\odot$ for the commutative multiplication of symmetric tensor powers is not standard. Another notation commonly used is $\cdot$. We often abbreviate symmetric tensor power as symmetric power. The symmetric power $\mathrm{Sym}^n(E)$ is also denoted $\mathrm{Sym}^n E$ or $\mathrm{S}(E)$. To be consistent with the use of $\odot$, we could have used the notation $\bigodot^n E$. Clearly, $\mathrm{Sym}^1(E) \cong E$, and it is convenient to set $\mathrm{Sym}^0(E) = K$.
The last identity shows that the operation $\odot$ is commutative. Thus, we can view the symmetric tensor $u_1 \odot \cdots \odot u_n$ as a multiset.
Theorem 23.12 yields a canonical isomorphism
$$\mathrm{Hom}(\mathrm{Sym}^n(E), F) \cong \mathrm{S}(E^n; F),$$
between the vector space of linear maps $\mathrm{Hom}(\mathrm{Sym}^n(E), F)$ and the vector space of symmetric multilinear maps $\mathrm{S}(E^n; F)$, via the linear map $- \circ \varphi$ defined by
$$h \mapsto h \circ \varphi,$$
where $h \in \mathrm{Hom}(\mathrm{Sym}^n(E), F)$. Indeed, $h \circ \varphi$ is clearly symmetric multilinear, and since by Theorem 23.12, for every symmetric multilinear map $f \in \mathrm{S}(E^n; F)$, there is a unique linear map $f_\odot \in \mathrm{Hom}(\mathrm{Sym}^n(E), F)$ such that $f = f_\odot \circ \varphi$, the map $- \circ \varphi$ is bijective. As a matter of fact, its inverse is the map
$$f \mapsto f_\odot.$$
In particular, when $F = K$, we get a canonical isomorphism
$$(\mathrm{Sym}^n(E))^* \cong \mathrm{S}^n(E; K).$$
Symmetric tensors in $\mathrm{Sym}^n(E)$ are also called symmetric $n$-tensors, and tensors of the form $u_1 \odot \cdots \odot u_n$, where $u_i \in E$, are called simple (or decomposable) symmetric $n$-tensors. Those symmetric $n$-tensors that are not simple are often called compound symmetric $n$-tensors.
by
It is immediately verified that $h$ is symmetric bilinear, and thus it induces a unique linear map
$$f \odot g \colon \mathrm{Sym}^2(E) \to \mathrm{Sym}^2(E'),$$
such that
$$(f \odot g)(u \odot v) = f(u) \odot g(v).$$
If we also have linear maps $f' \colon E' \to E''$ and $g' \colon E' \to E''$, we can easily verify that
$$(f' \circ f) \odot (g' \circ g) = (f' \odot g') \circ (f \odot g).$$
The generalization to the symmetric tensor product $f_1 \odot \cdots \odot f_n$ of $n \ge 3$ linear maps $f_i \colon E \to E'$ is immediate, and left to the reader.
23.7
The vectors $u_1 \odot \cdots \odot u_n$, where $u_1, \ldots, u_n \in E$, generate $\mathrm{Sym}^n(E)$, but they are not linearly independent. We will prove a version of Proposition 23.6 for symmetric tensor powers. For this, recall that a (finite) multiset over a set $I$ is a function $M \colon I \to \mathbb{N}$, such that $M(i) \ne 0$ for finitely many $i \in I$, and that the set of all multisets over $I$ is denoted as $\mathbb{N}^{(I)}$. We let $\mathrm{dom}(M) = \{i \in I \mid M(i) \ne 0\}$, which is a finite set. Then, for any multiset $M \in \mathbb{N}^{(I)}$, note that the sum $\sum_{i \in I} M(i)$ makes sense, since $\sum_{i \in I} M(i) = \sum_{i \in \mathrm{dom}(M)} M(i)$, and $\mathrm{dom}(M)$ is finite. For every multiset $M \in \mathbb{N}^{(I)}$, for any $n \ge 2$, we define the set $J_M$ of functions $\eta \colon \{1, \ldots, n\} \to \mathrm{dom}(M)$, as follows:
$$J_M = \Big\{\eta \;\Big|\; \eta \colon \{1, \ldots, n\} \to \mathrm{dom}(M),\ |\eta^{-1}(i)| = M(i),\ i \in \mathrm{dom}(M),\ \sum_{i \in I} M(i) = n\Big\}.$$
In other words, if $\sum_{i \in I} M(i) = n$ and $\mathrm{dom}(M) = \{i_1, \ldots, i_k\}$, any function $\eta \in J_M$ specifies a sequence of length $n$, consisting of $M(i_1)$ occurrences of $i_1$, $M(i_2)$ occurrences of $i_2$, $\ldots$, $M(i_k)$ occurrences of $i_k$. Intuitively, any $\eta$ defines a permutation of the sequence (of length $n$)
$$\langle \underbrace{i_1, \ldots, i_1}_{M(i_1)}, \underbrace{i_2, \ldots, i_2}_{M(i_2)}, \ldots, \underbrace{i_k, \ldots, i_k}_{M(i_k)} \rangle.$$
as $u^{\odot k}$.
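For a small multiset, the set $J_M$ can be enumerated directly. The sketch below assumes the multiset is given as a Python dictionary from indices to positive multiplicities (so its keys are $\mathrm{dom}(M)$); the function name `J` and the sample multiset are illustrative. Each element of $J_M$ is listed as the sequence $(\eta(1), \ldots, \eta(n))$, and the count is compared with the multinomial coefficient $n!/\prod_i M(i)!$.

```python
from itertools import permutations
from math import factorial, prod

def J(M):
    """All sequences (eta(1), ..., eta(n)) in which each index i occurs exactly M[i] times."""
    base = [i for i, count in sorted(M.items()) for _ in range(count)]
    return sorted(set(permutations(base)))

M = {"a": 2, "b": 1}     # hypothetical multiset with n = 3
seqs = J(M)
n = sum(M.values())
expected_count = factorial(n) // prod(factorial(c) for c in M.values())
```

Enumerating all permutations and removing duplicates is only reasonable for small $n$; it is meant to mirror the definition, not to be efficient.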
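We can now prove the following proposition.
Proposition 23.13. Given a vector space $E$, if $(u_i)_{i \in I}$ is a basis for $E$, then the family of vectors
$$\Big(u_{i_1}^{\odot M(i_1)} \odot \cdots \odot u_{i_k}^{\odot M(i_k)}\Big)_{M \in \mathbb{N}^{(I)},\ \sum_{i \in I} M(i) = n,\ \{i_1, \ldots, i_k\} = \mathrm{dom}(M)}$$
is a basis of the symmetric $n$-th tensor power $\mathrm{Sym}^n(E)$.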
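Proof. The proof is very similar to that of Proposition 23.6. For any nontrivial vector space $F$, for any family of vectors
$$(w_M)_{M \in \mathbb{N}^{(I)},\ \sum_{i \in I} M(i) = n},$$
we show that there is some linear map $h \colon \mathrm{Sym}^n(E) \to F$ such that
$$h\Big(u_{i_1}^{\odot M(i_1)} \odot \cdots \odot u_{i_k}^{\odot M(i_k)}\Big) = w_M,$$
where $\{i_1, \ldots, i_k\} = \mathrm{dom}(M)$. We define the map $f \colon E^n \to F$ as follows:
$$f\Big(\sum_{j_1 \in I} v^1_{j_1} u_{j_1}, \ldots, \sum_{j_n \in I} v^n_{j_n} u_{j_n}\Big) = \sum_{M \in \mathbb{N}^{(I)},\ \sum_{i \in I} M(i) = n} \Big(\sum_{\eta \in J_M} v^1_{\eta(1)} \cdots v^n_{\eta(n)}\Big) w_M.$$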
It is not difficult to verify that $f$ is symmetric and multilinear. By the universal mapping property of the symmetric tensor product, the linear map $f_\odot \colon \mathrm{Sym}^n(E) \to F$ such that $f = f_\odot \circ \varphi$ is the desired map $h$. Then, by Proposition 23.3, it follows that the family
$$\Big(u_{i_1}^{\odot M(i_1)} \odot \cdots \odot u_{i_k}^{\odot M(i_k)}\Big)_{M \in \mathbb{N}^{(I)},\ \sum_{i \in I} M(i) = n,\ \{i_1, \ldots, i_k\} = \mathrm{dom}(M)}$$
is linearly independent. Using the commutativity of $\odot$, we can also show that these vectors generate $\mathrm{Sym}^n(E)$, and thus, they form a basis for $\mathrm{Sym}^n(E)$. The details are left as an exercise.
As a consequence, when $I$ is finite, say of size $p = \dim(E)$, the dimension of $\mathrm{Sym}^n(E)$ is the number of finite multisets $(j_1, \ldots, j_p)$, such that $j_1 + \cdots + j_p = n$, $j_k \ge 0$. We leave as an exercise to show that this number is $\binom{p+n-1}{n}$. Thus, if $\dim(E) = p$, then the dimension of $\mathrm{Sym}^n(E)$ is $\binom{p+n-1}{n}$. Compare with the dimension of $E^{\otimes n}$, which is $p^n$. In particular, when $p = 2$, the dimension of $\mathrm{Sym}^n(E)$ is $n + 1$. This can also be seen directly.
Remark: The number $\binom{p+n-1}{n}$ is also the number of homogeneous monomials
$$X_1^{j_1} \cdots X_p^{j_p}$$
of total degree $n$ in $p$ variables (we have $j_1 + \cdots + j_p = n$). This is not a coincidence! Symmetric tensor products are closely related to polynomials (for more on this, see the next remark).
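The dimension count can be checked by brute force for small values. In the sketch below (the function name `sym_basis` is a hypothetical choice), each basis vector of $\mathrm{Sym}^n(E)$ is represented by a nondecreasing sequence of $n$ indices in $\{1, \ldots, p\}$, which is the same data as a multiset of size $n$ or a monomial of total degree $n$ in $p$ variables.

```python
from itertools import combinations_with_replacement
from math import comb

def sym_basis(p, n):
    """Nondecreasing index sequences of length n over {1, ..., p}, one per basis vector."""
    return list(combinations_with_replacement(range(1, p + 1), n))

dim_p3_n2 = len(sym_basis(3, 2))   # compare with comb(3 + 2 - 1, 2)
dim_p2_n5 = len(sym_basis(2, 5))   # compare with n + 1 when p = 2
```

For comparison, the full tensor power $E^{\otimes n}$ has dimension $p^n$, that is 9 and 32 in these two cases.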
Given a vector space $E$ and a basis $(u_i)_{i \in I}$ for $E$, Proposition 23.13 shows that every symmetric tensor $z \in \mathrm{Sym}^n(E)$ can be written in a unique way as
$$z = \sum_{\substack{M \in \mathbb{N}^{(I)} \\ \sum_{i \in I} M(i) = n \\ \{i_1, \ldots, i_k\} = \mathrm{dom}(M)}} \lambda_M\, u_{i_1}^{\odot M(i_1)} \odot \cdots \odot u_{i_k}^{\odot M(i_k)},$$
for some unique family of scalars $\lambda_M \in K$, all zero except for a finite number.
This looks like a homogeneous polynomial of total degree $n$, where the monomials of total degree $n$ are the symmetric tensors
$$u_{i_1}^{\odot M(i_1)} \odot \cdots \odot u_{i_k}^{\odot M(i_k)}$$
in the indeterminates $u_i$, where $i \in I$ (recall that $M(i_1) + \cdots + M(i_k) = n$). Again, this is not a coincidence. Polynomials can be defined in terms of symmetric tensors.
23.8
We can show the following property of the symmetric tensor product, using the proof technique of Proposition 23.7:
$$\mathrm{Sym}^n(E \oplus F) \cong \bigoplus_{k=0}^{n} \mathrm{Sym}^k(E) \otimes \mathrm{Sym}^{n-k}(F).$$
23.9
In this section, all vector spaces are assumed to have finite dimension. We define a nondegenerate pairing $\mathrm{Sym}^n(E^*) \times \mathrm{Sym}^n(E) \to K$ as follows: Consider the multilinear map
$$(E^*)^n \times E^n \to K$$
given by
$$(v_1^*, \ldots, v_n^*, u_1, \ldots, u_n) \mapsto \sum_{\sigma \in \mathfrak{S}_n} v_{\sigma(1)}^*(u_1) \cdots v_{\sigma(n)}^*(u_n).$$
Note that the expression on the right-hand side is almost the determinant $\det(v_j^*(u_i))$, except that the sign $\mathrm{sgn}(\sigma)$ is missing (where $\mathrm{sgn}(\sigma)$ is the signature of the permutation $\sigma$; that is, the parity of the number of transpositions into which $\sigma$ can be factored). Such an expression is called a permanent.
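The difference between the two expressions is easy to see on a small matrix. The following sketch assumes the values $v_j^*(u_i)$ have been collected in a matrix `m` with `m[j][i]` $= v_j^*(u_i)$; the helper names `sign`, `term`, `permanent` and `determinant` are illustrative. Both functions sum the same products over all permutations, and only the determinant weights them by the signature.

```python
from itertools import permutations

def sign(perm):
    """Signature of a permutation given as a tuple of 0-based images, via its inversion count."""
    inversions = sum(1 for i in range(len(perm))
                     for j in range(i + 1, len(perm)) if perm[i] > perm[j])
    return -1 if inversions % 2 else 1

def term(m, perm):
    """Product of m[sigma(i)][i] over i, i.e. v*_{sigma(1)}(u_1) ... v*_{sigma(n)}(u_n)."""
    out = 1
    for i, j in enumerate(perm):
        out *= m[j][i]
    return out

def permanent(m):
    return sum(term(m, s) for s in permutations(range(len(m))))

def determinant(m):
    return sum(sign(s) * term(m, s) for s in permutations(range(len(m))))

m = [[1, 2], [3, 4]]
m_swapped = [[2, 1], [4, 3]]   # same matrix with u_1 and u_2 exchanged
```

Exchanging two arguments leaves the permanent unchanged and flips the sign of the determinant, which matches the symmetric versus alternating behavior discussed in the text.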
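It is easily checked that this expression is symmetric w.r.t. the $u_i$'s and also w.r.t. the $v_j^*$. For any fixed $(v_1^*, \ldots, v_n^*) \in (E^*)^n$, we get a symmetric multilinear map
$$l_{v_1^*, \ldots, v_n^*} \colon (u_1, \ldots, u_n) \mapsto \sum_{\sigma \in \mathfrak{S}_n} v_{\sigma(1)}^*(u_1) \cdots v_{\sigma(n)}^*(u_n)$$
from $E^n$ to $K$. The map $l_{v_1^*, \ldots, v_n^*}$ extends uniquely to a linear map $L_{v_1^*, \ldots, v_n^*} \colon \mathrm{Sym}^n(E) \to K$. Now, we also have the symmetric multilinear map
$$(v_1^*, \ldots, v_n^*) \mapsto L_{v_1^*, \ldots, v_n^*}$$
from $(E^*)^n$ to $\mathrm{Hom}(\mathrm{Sym}^n(E), K)$, which extends to a linear map $L$ from $\mathrm{Sym}^n(E^*)$ to $\mathrm{Hom}(\mathrm{Sym}^n(E), K)$. However, in view of the isomorphism
$$\mathrm{Hom}(U \otimes V, W) \cong \mathrm{Hom}(U, \mathrm{Hom}(V, W)),$$
we can view $L$ as a linear map
$$L \colon \mathrm{Sym}^n(E^*) \otimes \mathrm{Sym}^n(E) \to K,$$
which corresponds to a bilinear map
$$\mathrm{Sym}^n(E^*) \times \mathrm{Sym}^n(E) \to K.$$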
Now, this pairing is nondegenerate. This can be shown using bases and we leave it as an exercise to the reader (see Knapp [64], Appendix A). Therefore, we get a canonical isomorphism
$$(\mathrm{Sym}^n(E))^* \cong \mathrm{Sym}^n(E^*).$$
Since we also have an isomorphism
$$(\mathrm{Sym}^n(E))^* \cong \mathrm{S}^n(E, K),$$
we get a canonical isomorphism
$$\mu \colon \mathrm{Sym}^n(E^*) \cong \mathrm{S}^n(E, K)$$
which allows us to interpret symmetric tensors over $E^*$ as symmetric multilinear maps.
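Remark: The isomorphism $\mu \colon \mathrm{Sym}^n(E^*) \cong \mathrm{S}^n(E, K)$ discussed above can be described explicitly as the linear extension of the map given by
$$\mu(v_1^* \odot \cdots \odot v_n^*)(u_1, \ldots, u_n) = \sum_{\sigma \in \mathfrak{S}_n} v_{\sigma(1)}^*(u_1) \cdots v_{\sigma(n)}^*(u_n).$$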
It is immediately checked that this is a left action of the symmetric group $\mathfrak{S}_n$ on $E^{\otimes n}$, and the tensors $z \in E^{\otimes n}$ such that
$$\sigma \cdot z = z, \quad \text{for all } \sigma \in \mathfrak{S}_n$$
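$$\frac{1}{n!} \sum_{\sigma \in \mathfrak{S}_n} \sigma \cdot (u_1 \otimes \cdots \otimes u_n) = \frac{1}{n!} \sum_{\sigma \in \mathfrak{S}_n} u_{\sigma(1)} \otimes \cdots \otimes u_{\sigma(n)}.$$
As the right hand side is clearly symmetric, we get a linear map $\eta \colon \mathrm{Sym}^n(E) \to E^{\otimes n}$. Clearly, $\eta(\mathrm{Sym}^n(E))$ is the set of symmetrized tensors in $E^{\otimes n}$. If we consider the map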
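It turns out that $\mathrm{Ker}\, S = E^{\otimes n} \cap I = \mathrm{Ker}\, \pi$, where $I$ is the two-sided ideal of $T(E)$ generated by all tensors of the form $u \otimes v - v \otimes u \in E^{\otimes 2}$ (for example, see Knapp [64], Appendix A). Therefore, $\eta$ is injective,
$$E^{\otimes n} = \eta(\mathrm{Sym}^n(E)) \oplus (E^{\otimes n} \cap I) = \eta(\mathrm{Sym}^n(E)) \oplus \mathrm{Ker}\, \pi,$$
and the symmetric tensor power $\mathrm{Sym}^n(E)$ is naturally embedded into $E^{\otimes n}$.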
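For $n = 2$ this decomposition can be seen on coefficient matrices. The sketch below assumes that $S$ is the symmetrization operator $z \mapsto \frac{1}{n!} \sum_{\sigma} \sigma \cdot z$, and that a tensor $z = \sum_{i,j} z_{ij}\, e_i \otimes e_j \in E^{\otimes 2}$ is stored as the matrix $(z_{ij})$, so that the nontrivial permutation acts by transposition. The function name `symmetrize` and the sample tensors are hypothetical.

```python
from fractions import Fraction

def symmetrize(z):
    """S(z) for z in E (x) E given by its coefficient matrix: average of z and its transpose."""
    n = len(z)
    return [[Fraction(z[i][j] + z[j][i], 2) for j in range(n)] for i in range(n)]

z = [[1, 4], [0, 3]]
s = symmetrize(z)                     # a symmetrized tensor
commutator = [[0, 1], [-1, 0]]        # e_1 (x) e_2 - e_2 (x) e_1, an element of the ideal I
```

Applying `symmetrize` twice gives the same result as applying it once (it is a projection), and the commutator is sent to zero, in line with $\mathrm{Ker}\, S = E^{\otimes n} \cap I$. Exact fractions are used because of the factor $1/n!$.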
23.10 Symmetric Algebras
As in the case of tensors, we can pack together all the symmetric powers $\mathrm{Sym}^n(V)$ into an algebra
$$\mathrm{Sym}(V) = \bigoplus_{m \ge 0} \mathrm{Sym}^m(V),$$
called the symmetric tensor algebra of $V$. We could adapt what we did in Section 23.5 for general tensor powers to symmetric tensors, but since we already have the algebra $T(V)$, we can proceed faster. If $I$ is the two-sided ideal generated by all tensors of the form $u \otimes v - v \otimes u \in V^{\otimes 2}$, we set
$$\mathrm{Sym}^\bullet(V) = T(V)/I.$$
Then, $\mathrm{Sym}^\bullet(V)$ automatically inherits a multiplication operation which is commutative, and since $T(V)$ is graded, that is
$$T(V) = \bigoplus_{m \ge 0} V^{\otimes m},$$
we have
$$\mathrm{Sym}^\bullet(V) = \bigoplus_{m \ge 0} V^{\otimes m}/(I \cap V^{\otimes m}).$$
$$\mathrm{Sym}^\bullet(V) \cong \mathrm{Sym}(V).$$
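Proposition 23.14. Given any commutative $K$-algebra $A$, for any linear map $f \colon V \to A$, there is a unique $K$-algebra homomorphism $\overline{f} \colon \mathrm{Sym}(V) \to A$ so that
$$f = \overline{f} \circ i,$$
as in the diagram below:
[Commutative diagram: $i \colon V \to \mathrm{Sym}(V)$, $f \colon V \to A$, and $\overline{f} \colon \mathrm{Sym}(V) \to A$.]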
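$$\sum_{\sigma \in \mathfrak{S}_n} v_{\sigma(1)}^*(u_1) \cdots v_{\sigma(n)}^*(u_n).$$
[Diagram involving $\mathrm{Sym}^{m+n}(E^*)$ and $\mathrm{S}^m(E, K) \times \mathrm{S}^n(E, K)$.]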
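The answer is yes! The solution is to define this multiplication such that, for $f \in \mathrm{S}^m(E, K)$ and $g \in \mathrm{S}^n(E, K)$,
$$(f \cdot g)(u_1, \ldots, u_{m+n}) = \sum_{\sigma \in \mathrm{shuffle}(m,n)} f(u_{\sigma(1)}, \ldots, u_{\sigma(m)})\, g(u_{\sigma(m+1)}, \ldots, u_{\sigma(m+n)}),$$
where $\mathrm{shuffle}(m, n)$ consists of all $(m, n)$-shuffles; that is, permutations $\sigma$ of $\{1, \ldots, m + n\}$ such that $\sigma(1) < \cdots < \sigma(m)$ and $\sigma(m + 1) < \cdots < \sigma(m + n)$. We urge the reader to check this fact.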
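Since an $(m, n)$-shuffle is determined by the set of its first $m$ values, the shuffles can be enumerated by choosing an $m$-element subset of $\{1, \ldots, m+n\}$ and listing the remaining values in increasing order. The sketch below does exactly this; the function name `shuffles` is a hypothetical choice, and each permutation is returned as the tuple $(\sigma(1), \ldots, \sigma(m+n))$.

```python
from itertools import combinations
from math import comb

def shuffles(m, n):
    """All (m, n)-shuffles as tuples (sigma(1), ..., sigma(m+n)) with 1-based values."""
    out = []
    for first in combinations(range(1, m + n + 1), m):
        rest = tuple(k for k in range(1, m + n + 1) if k not in first)
        out.append(first + rest)
    return out

s11 = shuffles(1, 1)
s22 = shuffles(2, 2)
```

In particular there are $\binom{m+n}{m}$ shuffles, far fewer than the $(m+n)!$ permutations of $\{1, \ldots, m+n\}$.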
Another useful canonical isomorphism (of $K$-algebras) is
$$\mathrm{Sym}(E \oplus F) \cong \mathrm{Sym}(E) \otimes \mathrm{Sym}(F).$$
23.11
(ii) Clearly, the symmetric group $\mathfrak{S}_n$ acts on $\mathrm{Alt}^n(E; F)$ on the left, via
$$\sigma \cdot f(u_1, \ldots, u_n) = f(u_{\sigma(1)}, \ldots, u_{\sigma(n)}).$$
so that
$$2 f(\ldots, u_i, u_i, \ldots) = 0,$$
and in every characteristic except 2, we conclude that $f(\ldots, u_i, u_i, \ldots) = 0$, namely $f$ is alternating.
Proposition 23.15 shows that in every characteristic except 2, alternating and skew-symmetric multilinear maps are identical. Using Proposition 23.15 we easily deduce the following crucial fact:
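Proposition 23.16. Let $f \colon E^n \to F$ be an alternating multilinear map. For any families of vectors $(u_1, \ldots, u_n)$ and $(v_1, \ldots, v_n)$, with $u_i, v_i \in E$, if
$$v_j = \sum_{i=1}^n a_{ij} u_i, \quad 1 \le j \le n,$$
then
$$f(v_1, \ldots, v_n) = \Big(\sum_{\sigma \in \mathfrak{S}_n} \mathrm{sgn}(\sigma)\, a_{\sigma(1),1} \cdots a_{\sigma(n),n}\Big) f(u_1, \ldots, u_n) = \det(A)\, f(u_1, \ldots, u_n),$$
where $A$ is the $n \times n$ matrix $A = (a_{ij})$.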
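A quick numerical check of the relation $f(v_1, \ldots, v_n) = \det(A)\, f(u_1, \ldots, u_n)$ for $n = 2$, as a sketch under stated assumptions: the alternating bilinear form `f` on $K^3$, the vectors `us` and the matrix `a` below are arbitrary hypothetical choices, and `combo` builds $v_j = \sum_i a_{ij} u_i$.

```python
def f(x, y):
    """A hypothetical alternating bilinear form on K^3 (one coordinate of the cross product)."""
    return x[1] * y[2] - x[2] * y[1]

def combo(a, us, j):
    """The vector v_j = sum_i a[i][j] * u_i."""
    return [sum(a[i][j] * us[i][k] for i in range(len(us)))
            for k in range(len(us[0]))]

us = [[1, 2, 3], [0, 1, 4]]      # u_1, u_2
a = [[2, 1], [1, 3]]             # coefficients a_ij
v1, v2 = combo(a, us, 0), combo(a, us, 1)
det_a = a[0][0] * a[1][1] - a[0][1] * a[1][0]

lhs = f(v1, v2)
rhs = det_a * f(us[0], us[1])
```

Note that `f` is not itself a determinant of the full coordinate matrix; the identity only uses that `f` is bilinear and alternating.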
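$$f = f_\wedge \circ \varphi.$$
Equivalently, there is a unique linear map $f_\wedge$ such that the following diagram commutes:
[Commutative diagram: $\varphi \colon E^n \to A$, $f \colon E^n \to F$, and $f_\wedge \colon A \to F$.]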
First, we show that any two $n$-th exterior tensor powers $(A_1, \varphi_1)$ and $(A_2, \varphi_2)$ for $E$ are isomorphic.
Proposition 23.17. Given any two $n$-th exterior tensor powers $(A_1, \varphi_1)$ and $(A_2, \varphi_2)$ for $E$, there is an isomorphism $h \colon A_1 \to A_2$ such that
$$\varphi_2 = h \circ \varphi_1.$$
Proof. Replace tensor product by $n$-th exterior tensor power in the proof of Proposition 23.4.
We now give a construction that produces an $n$-th exterior tensor power of a vector space $E$.
Theorem 23.18. Given a vector space $E$, an $n$-th exterior tensor power $(\bigwedge^n(E), \varphi)$ for $E$ can be constructed ($n \ge 1$). Furthermore, denoting $\varphi(u_1, \ldots, u_n)$ as $u_1 \wedge \cdots \wedge u_n$, the exterior tensor power $\bigwedge^n(E)$ is generated by the vectors $u_1 \wedge \cdots \wedge u_n$, where $u_1, \ldots, u_n \in E$, and for every alternating multilinear map $f \colon E^n \to F$, the unique linear map $f_\wedge \colon \bigwedge^n(E) \to F$ such that $f = f_\wedge \circ \varphi$ is defined by
$$f_\wedge(u_1 \wedge \cdots \wedge u_n) = f(u_1, \ldots, u_n)$$
on the generators $u_1 \wedge \cdots \wedge u_n$ of $\bigwedge^n(E)$.
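Proof sketch. We can give a quick proof using the tensor algebra $T(E)$. Let $I_a$ be the two-sided ideal of $T(E)$ generated by all tensors of the form $u \otimes u \in E^{\otimes 2}$. Then, let
$$\bigwedge^n(E) = E^{\otimes n}/(I_a \cap E^{\otimes n})$$
and let $\pi$ be the projection $\pi \colon E^{\otimes n} \to \bigwedge^n(E)$. If we let $u_1 \wedge \cdots \wedge u_n = \pi(u_1 \otimes \cdots \otimes u_n)$, it is easy to check that $(\bigwedge^n(E), \wedge)$ satisfies the conditions of Theorem 23.18.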
Remark: We can also define
$$\bigwedge(E) = T(E)/I_a = \bigoplus_{n \ge 0} \bigwedge^n(E),$$
the exterior algebra of $E$. This is the skew-symmetric counterpart of $\mathrm{Sym}(E)$, and we will study it a little later.
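For simplicity of notation, we may write $\bigwedge^n E$ for $\bigwedge^n(E)$. We also abbreviate exterior tensor power as exterior power. Clearly, $\bigwedge^1(E) \cong E$, and it is convenient to set $\bigwedge^0(E) = K$.
The fact that the map $\varphi \colon E^n \to \bigwedge^n(E)$ is alternating and multilinear can also be expressed as follows:
$$u_1 \wedge \cdots \wedge (u_i + v_i) \wedge \cdots \wedge u_n = (u_1 \wedge \cdots \wedge u_i \wedge \cdots \wedge u_n) + (u_1 \wedge \cdots \wedge v_i \wedge \cdots \wedge u_n),$$
$$u_1 \wedge \cdots \wedge (\lambda u_i) \wedge \cdots \wedge u_n = \lambda (u_1 \wedge \cdots \wedge u_i \wedge \cdots \wedge u_n),$$
$$u_{\sigma(1)} \wedge \cdots \wedge u_{\sigma(n)} = \mathrm{sgn}(\sigma)\, u_1 \wedge \cdots \wedge u_n,$$
for all $\sigma \in \mathfrak{S}_n$.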
between the vector space of linear maps $\mathrm{Hom}(\bigwedge^n(E), F)$ and the vector space of alternating multilinear maps $\mathrm{Alt}^n(E; F)$, via the linear map $- \circ \varphi$ defined by
$$h \mapsto h \circ \varphi,$$
where $h \in \mathrm{Hom}(\bigwedge^n(E), F)$. In particular, when $F = K$, we get a canonical isomorphism
$$\Big(\bigwedge^n(E)\Big)^* \cong \mathrm{Alt}^n(E; K).$$
Tensors $\alpha \in \bigwedge^n(E)$ are called alternating $n$-tensors or alternating tensors of degree $n$ and we write $\deg(\alpha) = n$. Tensors of the form $u_1 \wedge \cdots \wedge u_n$, where $u_i \in E$, are called simple (or decomposable) alternating $n$-tensors. Those alternating $n$-tensors that are not simple are often called compound alternating $n$-tensors. Simple tensors $u_1 \wedge \cdots \wedge u_n \in \bigwedge^n(E)$ are also called $n$-vectors and tensors in $\bigwedge^n(E^*)$ are often called (alternating) $n$-forms.
Given two linear maps $f \colon E \to E'$ and $g \colon E \to E'$, we can define $h \colon E \times E \to \bigwedge^2(E')$ by
$$h(u, v) = f(u) \wedge g(v).$$
It is immediately verified that $h$ is alternating bilinear, and thus it induces a unique linear map
$$f \wedge g \colon \bigwedge^2(E) \to \bigwedge^2(E')$$
such that
$$(f \wedge g)(u \wedge v) = f(u) \wedge g(v).$$
If we also have linear maps $f' \colon E' \to E''$ and $g' \colon E' \to E''$, we can easily verify that
$$(f' \circ f) \wedge (g' \circ g) = (f' \wedge g') \circ (f \wedge g).$$
The generalization to the alternating product $f_1 \wedge \cdots \wedge f_n$ of $n \ge 3$ linear maps $f_i \colon E \to E'$ is immediate, and left to the reader.
23.12
Let $E$ be any vector space. For any basis $(u_i)_{i \in \Sigma}$ for $E$, we assume that some total ordering $\le$ on $\Sigma$ has been chosen. Call the pair $((u_i)_{i \in \Sigma}, \le)$ an ordered basis. Then, for any nonempty finite subset $I \subseteq \Sigma$, let
$$u_I = u_{i_1} \wedge \cdots \wedge u_{i_m},$$
where $I = \{i_1, \ldots, i_m\}$, with $i_1 < \cdots < i_m$.
Since $\bigwedge^n(E)$ is generated by the tensors of the form $v_1 \wedge \cdots \wedge v_n$, with $v_i \in E$, in view of skew-symmetry, it is clear that the tensors $u_I$, with $|I| = n$, generate $\bigwedge^n(E)$. Actually, they form a basis.
Proposition 23.19. Given any vector space $E$, if $E$ has finite dimension $d = \dim(E)$, then for all $n > d$, the exterior power $\bigwedge^n(E)$ is trivial; that is, $\bigwedge^n(E) = (0)$. If $n \le d$ or if $E$ is infinite dimensional, then for every ordered basis $((u_i)_{i \in \Sigma}, \le)$, the family $(u_I)$ is a basis of $\bigwedge^n(E)$, where $I$ ranges over finite nonempty subsets of $\Sigma$ of size $|I| = n$.
Proof. First, assume that $E$ has finite dimension $d = \dim(E)$ and that $n > d$. We know that $\bigwedge^n(E)$ is generated by the tensors of the form $v_1 \wedge \cdots \wedge v_n$, with $v_i \in E$. If $u_1, \ldots, u_d$ is a basis of $E$, as every $v_i$ is a linear combination of the $u_j$, when we expand $v_1 \wedge \cdots \wedge v_n$ using multilinearity, we get a linear combination of the form
$$v_1 \wedge \cdots \wedge v_n = \sum_{(j_1, \ldots, j_n)} \lambda_{(j_1, \ldots, j_n)}\, u_{j_1} \wedge \cdots \wedge u_{j_n},$$
where each $(j_1, \ldots, j_n)$ is some sequence of integers $j_k \in \{1, \ldots, d\}$. As $n > d$, each sequence $(j_1, \ldots, j_n)$ must contain two identical elements. By alternation, $u_{j_1} \wedge \cdots \wedge u_{j_n} = 0$, and so $v_1 \wedge \cdots \wedge v_n = 0$. It follows that $\bigwedge^n(E) = (0)$.
$$\bigwedge^n(E) \to K.$$
where the above sum is finite and involves nonempty finite subsets $I \subseteq \Sigma$ with $|I| = n$, for every such $I$, when we apply $L_I$ we get
$$\lambda_I = 0,$$
proving linear independence.
As a corollary, if $E$ is finite dimensional, say $\dim(E) = d$, and if $1 \le n \le d$, then we have
$$\dim\Big(\bigwedge^n(E)\Big) = \binom{d}{n},$$
and if $n > d$, then $\dim(\bigwedge^n(E)) = 0$.
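The basis $(u_I)$ is indexed by the $n$-element subsets of the index set, so for a finite-dimensional space these dimensions can be tabulated by listing subsets. A minimal sketch, with the hypothetical function name `wedge_basis` and the sample value $d = 4$:

```python
from itertools import combinations
from math import comb

def wedge_basis(d, n):
    """Index sets I = {i_1 < ... < i_n} in {1, ..., d}, one per basis vector u_I."""
    return list(combinations(range(1, d + 1), n))

d = 4
dims = [len(wedge_basis(d, n)) for n in range(d + 2)]   # n = 0, 1, ..., d + 1
```

The list `dims` contains the binomial coefficients $\binom{d}{n}$ followed by a zero for $n = d + 1$; the case $n = 0$ corresponds to the empty index set.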
Remark: When $n = 0$, if we set $u_\emptyset = 1$, then $(u_\emptyset) = (1)$ is a basis of $\bigwedge^0(V) = K$.
$\bigwedge^n(E)$. As
$$L_{u_1, \ldots, u_n}(u_1 \wedge \cdots \wedge u_n) = 1,$$
we conclude that $u_1 \wedge \cdots \wedge u_n \ne 0$.
Proposition 23.20 shows that, geometrically, every nonzero wedge $u_1 \wedge \cdots \wedge u_n$ corresponds to some oriented version of an $n$-dimensional subspace of $E$.
23.13
We can show the following property of the exterior tensor product, using the proof technique of Proposition 23.7:
$$\bigwedge^n(E \oplus F) \cong \bigoplus_{k=0}^{n} \bigwedge^k(E) \otimes \bigwedge^{n-k}(F).$$
23.14
It is easily checked that this expression is alternating w.r.t. the $u_i$'s and also w.r.t. the $v_j^*$. For any fixed $(v_1^*, \ldots, v_n^*) \in (E^*)^n$, we get an alternating multilinear map
$$l_{v_1^*, \ldots, v_n^*} \colon (u_1, \ldots, u_n) \mapsto \det(v_j^*(u_i))$$
from $E^n$ to $K$. By the argument used in the symmetric case, we get a bilinear map
$$\bigwedge^n(E^*) \times \bigwedge^n(E) \to K.$$
Now, this pairing is nondegenerate. This can be shown using bases and we leave it as an exercise to the reader. Therefore, we get a canonical isomorphism
$$\Big(\bigwedge^n(E)\Big)^* \cong \bigwedge^n(E^*).$$
with the factor $\frac{1}{n!}$ added in front of the determinant. Each version has its own merits and drawbacks. Morita [83] uses $\mu'$ because it is more convenient than $\mu$ when dealing with characteristic classes. On the other hand, when using $\mu'$, some extra factor is needed in defining the wedge operation of alternating multilinear forms (see Section 23.15) and for exterior differentiation. The version $\mu$ is the one adopted by Warner [114], Knapp [64], Fulton and Harris [42], and Cartan [20, 21].
by
$$f^\top(v^*) = v^* \circ f, \quad v^* \in F^*.$$
Consequently, we have
$$f^\top(v^*)(u) = v^*(f(u)),$$
for all $u \in E$ and all $v^* \in F^*$.
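For any $p \ge 1$, the map
$$(u_1, \ldots, u_p) \mapsto f(u_1) \wedge \cdots \wedge f(u_p)$$
from $E^p$ to $\bigwedge^p F$ is multilinear alternating, so it induces a linear map $\bigwedge^p f \colon \bigwedge^p E \to \bigwedge^p F$ defined on generators by
$$\Big(\bigwedge^p f\Big)(u_1 \wedge \cdots \wedge u_p) = f(u_1) \wedge \cdots \wedge f(u_p).$$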
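Combining $\bigwedge^p$ and duality, we get a linear map $\bigwedge^p f^\top \colon \bigwedge^p F^* \to \bigwedge^p E^*$ defined on generators by
$$\Big(\bigwedge^p f^\top\Big)(v_1^* \wedge \cdots \wedge v_p^*) = f^\top(v_1^*) \wedge \cdots \wedge f^\top(v_p^*).$$
$$\mu\Big(\Big(\bigwedge^p f^\top\Big)(\omega)\Big)(u_1, \ldots, u_p) = \mu(\omega)(f(u_1), \ldots, f(u_p)), \quad \omega \in \bigwedge^p F^*,\ u_1, \ldots, u_p \in E.$$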
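$$\mu\Big(\Big(\bigwedge^p f^\top\Big)(v_1^* \wedge \cdots \wedge v_p^*)\Big)(u_1, \ldots, u_p) = \mu(f^\top(v_1^*) \wedge \cdots \wedge f^\top(v_p^*))(u_1, \ldots, u_p)$$
$$= \det(f^\top(v_j^*)(u_i))$$
$$= \det(v_j^*(f(u_i)))$$
$$= \mu(v_1^* \wedge \cdots \wedge v_p^*)(f(u_1), \ldots, f(u_p)),$$
as claimed.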
The map $\bigwedge^p f^\top$ is often denoted $f^*$, although this is an ambiguous notation since $p$ is dropped. Proposition 23.21 gives us the behavior of $f^*$ under the identification of $\bigwedge^p E^*$ and $\mathrm{Alt}^p(E; K)$ via the isomorphism $\mu$.
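As in the case of symmetric powers, the map from $E^n$ to $\bigwedge^n(E)$ given by $(u_1, \ldots, u_n) \mapsto u_1 \wedge \cdots \wedge u_n$ yields a surjection $\pi \colon E^{\otimes n} \to \bigwedge^n(E)$.
As the right hand side is clearly an alternating map, we get a linear map $\eta \colon \bigwedge^n(E) \to E^{\otimes n}$. Clearly, $\eta(\bigwedge^n(E))$ is the set of antisymmetrized tensors in $E^{\otimes n}$. If we consider the map $A = \eta \circ \pi \colon E^{\otimes n} \to E^{\otimes n}$, it is easy to check that $A \circ A = A$. Therefore, $A$ is a projection, and by linear algebra, we know that
$$E^{\otimes n} = A(E^{\otimes n}) \oplus \mathrm{Ker}\, A = \eta\Big(\bigwedge^n(E)\Big) \oplus \mathrm{Ker}\, A.$$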
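It turns out that $\mathrm{Ker}\, A = E^{\otimes n} \cap I_a = \mathrm{Ker}\, \pi$, where $I_a$ is the two-sided ideal of $T(E)$ generated by all tensors of the form $u \otimes u \in E^{\otimes 2}$ (for example, see Knapp [64], Appendix A). Therefore, $\eta$ is injective,
$$E^{\otimes n} = \eta\Big(\bigwedge^n(E)\Big) \oplus (E^{\otimes n} \cap I_a) = \eta\Big(\bigwedge^n(E)\Big) \oplus \mathrm{Ker}\, \pi,$$
and the exterior tensor power $\bigwedge^n(E)$ is naturally embedded into $E^{\otimes n}$.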
23.15 Exterior Algebras
As in the case of symmetric tensors, we can pack together all the exterior powers
an algebra
m
^
M^
(V ) =
(V ),
Vn
(V ) into
m0
called the exterior algebra (or Grassmann algebra) of V . We mimic the procedure used
for symmetric powers. If Ia is the two-sided ideal generated by all tensors of the form
u u V 2 , we set
^
(V ) = T (V )/Ia .
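Then, $\bigwedge^{\bullet}(V)$ automatically inherits a multiplication operation, called wedge product, and since $T(V)$ is graded, that is
$$T(V) = \bigoplus_{m \geq 0} V^{\otimes m},$$
we have
$$\bigwedge^{\bullet}(V) = \bigoplus_{m \geq 0} V^{\otimes m}/(I_a \cap V^{\otimes m}).$$
Moreover,
$$\bigwedge^m(V) \cong V^{\otimes m}/(I_a \cap V^{\otimes m}),$$
so
$$\bigwedge^{\bullet}(V) \cong \bigwedge(V).$$
When $V$ has finite dimension $d$, we have a finite direct sum
$$\bigwedge(V) = \bigoplus_{m=0}^{d} \bigwedge^m(V),$$
and since each $\bigwedge^m(V)$ has dimension $\binom{d}{m}$, we deduce that
$$\dim\Big(\bigwedge(V)\Big) = 2^d = 2^{\dim(V)}.$$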
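As a small illustration of this dimension count, the following Python sketch enumerates the index sets $I = \{i_1 < \cdots < i_m\}$ that label the basis vectors $e_I$ of each exterior power and adds up their number. The function name `exterior_basis_labels` is introduced here for illustration only.

```python
from itertools import combinations
from math import comb

def exterior_basis_labels(d):
    """Index sets I = (i1 < ... < im) labelling the basis vectors e_I,
    grouped by degree m, for a vector space of dimension d."""
    return {m: list(combinations(range(1, d + 1), m)) for m in range(d + 1)}

d = 4
labels = exterior_basis_labels(d)
dims = [len(labels[m]) for m in range(d + 1)]

print(dims)       # [1, 4, 6, 4, 1]  (binomial coefficients)
print(sum(dims))  # 16 = 2**4
```

The degree-0 part is labelled by the empty index set, which corresponds to the scalar $1$.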
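The multiplication $\wedge : \bigwedge^m(V) \times \bigwedge^n(V) \to \bigwedge^{m+n}(V)$ is skew-symmetric in the following precise sense: for all $\alpha \in \bigwedge^m(V)$ and all $\beta \in \bigwedge^n(V)$, we have
$$\beta \wedge \alpha = (-1)^{mn}\, \alpha \wedge \beta.$$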
The above discussion suggests that it might be useful to know when an alternating tensor is simple, that is, decomposable. It can be shown that for tensors $\alpha \in \bigwedge^2(V)$, $\alpha \wedge \alpha = 0$ iff $\alpha$ is simple. A general criterion for decomposability can be given in terms of some operations known as left hook and right hook (also called interior products); see Section 23.17.
It is easy to see that $\bigwedge(V)$ satisfies the following universal mapping property:
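Proposition 23.23. Given any $K$-algebra $A$, for any linear map $f : V \to A$, if $(f(v))^2 = 0$ for all $v \in V$, then there is a unique $K$-algebra homomorphism $\overline{f} : \bigwedge(V) \to A$ so that
$$f = \overline{f} \circ i,$$
as in the diagram below:
$$\begin{array}{ccc}
V & \xrightarrow{\;\; i \;\;} & \bigwedge(V) \\
 & {\scriptstyle f} \searrow & \downarrow {\scriptstyle \overline{f}} \\
 & & A
\end{array}$$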
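When $E$ is finite-dimensional, we also have the multiplication $\wedge : \bigwedge^m(E^*) \times \bigwedge^n(E^*) \to \bigwedge^{m+n}(E^*)$. The following question then arises:
Can we define a multiplication $\mathrm{Alt}^m(E; K) \times \mathrm{Alt}^n(E; K) \to \mathrm{Alt}^{m+n}(E; K)$ directly on alternating multilinear forms, so that the following diagram commutes:
$$\begin{array}{ccc}
\bigwedge^m(E^*) \times \bigwedge^n(E^*) & \xrightarrow{\;\;\wedge\;\;} & \bigwedge^{m+n}(E^*) \\
\downarrow {\scriptstyle \mu_m \times \mu_n} & & \downarrow {\scriptstyle \mu_{m+n}} \\
\mathrm{Alt}^m(E; K) \times \mathrm{Alt}^n(E; K) & \xrightarrow{\;\;\wedge\;\;} & \mathrm{Alt}^{m+n}(E; K)
\end{array}$$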
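As in the symmetric case, the answer is yes! The solution is to define this multiplication such that, for $f \in \mathrm{Alt}^m(E; K)$ and $g \in \mathrm{Alt}^n(E; K)$,
$$(f \wedge g)(u_1, \ldots, u_{m+n}) = \sum_{\sigma \in \mathrm{shuffle}(m,n)} \mathrm{sgn}(\sigma)\, f(u_{\sigma(1)}, \ldots, u_{\sigma(m)})\, g(u_{\sigma(m+1)}, \ldots, u_{\sigma(m+n)}),$$
where $\mathrm{shuffle}(m, n)$ consists of all $(m, n)$-shuffles; that is, permutations $\sigma$ of $\{1, \ldots, m + n\}$ such that $\sigma(1) < \cdots < \sigma(m)$ and $\sigma(m+1) < \cdots < \sigma(m+n)$. For example, when $m = n = 1$, we have
$$(f \wedge g)(u, v) = f(u)g(v) - g(u)f(v).$$
When $m = 1$ and $n \geq 2$, check that
$$(f \wedge g)(u_1, \ldots, u_{n+1}) = \sum_{i=1}^{n+1} (-1)^{i-1} f(u_i)\, g(u_1, \ldots, \widehat{u_i}, \ldots, u_{n+1}),$$
where the hat over the argument $u_i$ means that it should be omitted.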
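The shuffle formula can be tried out directly. The following Python sketch is only an illustration under simple assumptions: alternating forms are represented as plain Python callables taking vectors (tuples) as arguments, positions are numbered from 0, and the helper names `shuffles` and `wedge` are introduced here, not taken from the text.

```python
from itertools import combinations

def shuffles(m, n):
    """Yield (sign, sigma) for each (m, n)-shuffle sigma of the positions 0..m+n-1."""
    positions = range(m + n)
    for first in combinations(positions, m):
        rest = tuple(i for i in positions if i not in first)
        sigma = first + rest  # increasing on the first m and on the last n slots
        inversions = sum(1 for a in range(m + n) for b in range(a + 1, m + n)
                         if sigma[a] > sigma[b])
        yield (-1) ** inversions, sigma

def wedge(f, m, g, n):
    """Wedge product of an alternating m-form f and an alternating n-form g."""
    def product(*u):
        if len(u) != m + n:
            raise ValueError("expected %d arguments" % (m + n))
        return sum(sign
                   * f(*[u[i] for i in sigma[:m]])
                   * g(*[u[i] for i in sigma[m:]])
                   for sign, sigma in shuffles(m, n))
    return product

# Two linear forms on R^3: the first and second coordinate functions.
def f(u):
    return u[0]

def g(u):
    return u[1]

u, v, w = (1, 2, 3), (4, 5, 6), (7, 8, 10)
fg = wedge(f, 1, g, 1)
print(fg(u, v))  # -3, which is f(u)g(v) - g(u)f(v) = 1*5 - 2*4
```

Enumerating a shuffle by choosing which positions go to the first factor avoids looping over all $(m+n)!$ permutations; there are only $\binom{m+n}{m}$ shuffles.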
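As a result, the direct sum
$$\mathrm{Alt}(E) = \bigoplus_{n \geq 0} \mathrm{Alt}^n(E; K)$$
is an algebra under the above multiplication, and this algebra is isomorphic to $\bigwedge(E^*)$. For the record, we state
Proposition 23.24. When $E$ is finite dimensional, the maps $\mu : \bigwedge^n(E^*) \to \mathrm{Alt}^n(E; K)$ induced by the linear extensions of the maps given by
$$\mu(v_1^* \wedge \cdots \wedge v_n^*)(u_1, \ldots, u_n) = \det(v_j^*(u_i))$$
yield a canonical isomorphism of algebras $\mu : \bigwedge(E^*) \to \mathrm{Alt}(E)$, where the multiplication in $\mathrm{Alt}(E)$ is defined by the maps $\wedge : \mathrm{Alt}^m(E; K) \times \mathrm{Alt}^n(E; K) \to \mathrm{Alt}^{m+n}(E; K)$, with
$$(f \wedge g)(u_1, \ldots, u_{m+n}) = \sum_{\sigma \in \mathrm{shuffle}(m,n)} \mathrm{sgn}(\sigma)\, f(u_{\sigma(1)}, \ldots, u_{\sigma(m)})\, g(u_{\sigma(m+1)}, \ldots, u_{\sigma(m+n)}),$$
where $\mathrm{shuffle}(m, n)$ consists of all $(m, n)$-shuffles, that is, permutations $\sigma$ of $\{1, \ldots, m + n\}$ such that $\sigma(1) < \cdots < \sigma(m)$ and $\sigma(m + 1) < \cdots < \sigma(m + n)$.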
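Remark: The algebra $\bigwedge(E)$ is a graded algebra. Given two graded algebras $E$ and $F$, we can make a new tensor product $E \mathbin{\widehat{\otimes}} F$, where $E \mathbin{\widehat{\otimes}} F$ is equal to $E \otimes F$ as a vector space, but with a skew-commutative multiplication given by
$$(a \otimes b) \wedge (c \otimes d) = (-1)^{\deg(b)\deg(c)} (ac) \otimes (bd),$$
where $a \in E^m$, $b \in F^p$, $c \in E^n$, $d \in F^q$. Then, it can be shown that
$$\bigwedge(E \oplus F) \cong \bigwedge(E) \mathbin{\widehat{\otimes}} \bigwedge(F).$$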
23.16
nk
^
V,
It is easy to show that if $(e_1, \ldots, e_n)$ is an orthonormal basis of $V$, then the basis of $\bigwedge^k V$ consisting of the $e_I$ (where $I = \{i_1, \ldots, i_k\}$, with $1 \leq i_1 < \cdots < i_k \leq n$) is an orthonormal basis of $\bigwedge^k V$. Since the inner product on $V$ induces an inner product on $V^*$ (recall that
]
]
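$$\ast : \bigwedge^k V \to \bigwedge^{n-k} V,$$
called the Hodge $\ast$-operator, as follows: For any choice of a positively oriented orthonormal basis $(e_1, \ldots, e_n)$ of $V$, set
$$\ast(e_1 \wedge \cdots \wedge e_k) = e_{k+1} \wedge \cdots \wedge e_n.$$
In particular, for $k = 0$ and $k = n$, we have
$$\ast(1) = e_1 \wedge \cdots \wedge e_n, \qquad \ast(e_1 \wedge \cdots \wedge e_n) = 1.$$
It is easy to see that the definition of $\ast$ does not depend on the choice of positively oriented orthonormal basis.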
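To make the definition concrete on all basis vectors $e_I$, here is a hedged Python sketch. It assumes a fixed positively oriented orthonormal basis $(e_1, \ldots, e_n)$ and uses the fact that the reordered basis $(e_I, e_J)$, with $J$ the complement of $I$, is positively oriented exactly when the permutation $(I, J)$ is even. The names `hodge_star` and `double_star_sign_ok` are illustrative.

```python
from itertools import combinations

def hodge_star(I, n):
    """Return (sign, J) such that *(e_I) = sign * e_J, where J is the
    complement of I in {1, ..., n} listed in increasing order."""
    I = tuple(sorted(I))
    J = tuple(j for j in range(1, n + 1) if j not in I)
    # Sign of the permutation (I, J): count pairs (i, j) in I x J with i > j.
    inversions = sum(1 for i in I for j in J if i > j)
    return (-1) ** inversions, J

def double_star_sign_ok(n):
    """Check that applying * twice to e_I multiplies it by (-1)^(k(n-k))."""
    for k in range(n + 1):
        for I in combinations(range(1, n + 1), k):
            s1, J = hodge_star(I, n)
            s2, back = hodge_star(J, n)
            if back != I or s1 * s2 != (-1) ** (k * (n - k)):
                return False
    return True

print(hodge_star((1,), 3))     # (1, (2, 3)):  *e1 = e2 ^ e3
print(hodge_star((2,), 3))     # (-1, (1, 3)): *e2 = -e1 ^ e3
print(double_star_sign_ok(4))  # True
```

The last check agrees with the sign $(-1)^{k(n-k)}$ that appears when $\ast$ is applied twice to a $k$-vector.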
The Hodge $\ast$-operators $\ast : \bigwedge^k V \to \bigwedge^{n-k} V$ induce a linear bijection $\ast : \bigwedge(V) \to \bigwedge(V)$. We also have Hodge $\ast$-operators $\ast : \bigwedge^k V^* \to \bigwedge^{n-k} V^*$.
The following proposition is easy to show:
Proposition 23.25. If $V$ is any oriented vector space of dimension $n$, for every $k$ with $0 \leq k \leq n$, we have
(i) $\ast\ast = (-\mathrm{id})^{k(n-k)}$.
Vk
V.
$$\ast(1) = \frac{1}{\sqrt{\det(\langle v_i, v_j \rangle)}}\; v_1 \wedge \cdots \wedge v_n$$
23.17
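In this section, all vector spaces are assumed to have finite dimension. Say $\dim(E) = n$. Using our nonsingular pairing
$$\langle -, - \rangle : \bigwedge^p E^* \times \bigwedge^p E \to K \qquad (1 \leq p \leq n)$$
defined on generators by
$$\langle u_1^* \wedge \cdots \wedge u_p^*, v_1 \wedge \cdots \wedge v_p \rangle = \det(u_i^*(v_j)),$$
we define various contraction operations
$$\lrcorner : \bigwedge^p E \times \bigwedge^{p+q} E^* \to \bigwedge^q E^* \qquad \text{(left hook)}$$
and
$$\llcorner : \bigwedge^{p+q} E^* \times \bigwedge^p E \to \bigwedge^q E^* \qquad \text{(right hook)},$$
as well as the versions obtained by replacing $E$ by $E^*$ and $E^{**}$ by $E$. We begin with the left interior product or left hook, $\lrcorner$.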
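Let $u \in \bigwedge^p E$. For any $q$ such that $p + q \leq n$, multiplication on the right by $u$ is a linear map
$$\wedge_R(u) : \bigwedge^q E \to \bigwedge^{p+q} E$$
given by
$$v \mapsto v \wedge u,$$
where $v \in \bigwedge^q E$. The transpose of $\wedge_R(u)$ yields a linear map
$$(\wedge_R(u))^t : \Big(\bigwedge^{p+q} E\Big)^* \to \Big(\bigwedge^q E\Big)^*,$$
which, using the isomorphisms $(\bigwedge^{p+q} E)^* \cong \bigwedge^{p+q} E^*$ and $(\bigwedge^q E)^* \cong \bigwedge^q E^*$, can be viewed as a map
$$(\wedge_R(u))^t : \bigwedge^{p+q} E^* \to \bigwedge^q E^*$$
given by
$$z^* \mapsto z^* \circ \wedge_R(u),$$
where $z^* \in \bigwedge^{p+q} E^*$.
We denote $z^* \circ \wedge_R(u)$ by
$$u \lrcorner z^*.$$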
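In terms of our pairing, $u \lrcorner z^*$ is characterized by
$$\langle u \lrcorner z^*, v \rangle = \langle z^*, v \wedge u \rangle,$$
for all $u \in \bigwedge^p E$, $v \in \bigwedge^q E$ and $z^* \in \bigwedge^{p+q} E^*$. This defines the left hook
$$\lrcorner : \bigwedge^p E \times \bigwedge^{p+q} E^* \to \bigwedge^q E^*.$$
Exchanging the roles of $E$ and $E^*$, we also get a left hook
$$\lrcorner : \bigwedge^p E^* \times \bigwedge^{p+q} E \to \bigwedge^q E,$$
characterized by
$$\langle v^* \wedge u^*, z \rangle = \langle v^*, u^* \lrcorner z \rangle,$$
for all $u^* \in \bigwedge^p E^*$, $v^* \in \bigwedge^q E^*$ and $z \in \bigwedge^{p+q} E$.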
In order to proceed any further, we need some combinatorial properties of the basis of $\bigwedge^p E$ constructed from a basis $(e_1, \ldots, e_n)$ of $E$. Recall that for any (nonempty) subset $I \subseteq \{1, \ldots, n\}$, we let
$$e_I = e_{i_1} \wedge \cdots \wedge e_{i_p},$$
where $I = \{i_1, \ldots, i_p\}$ with $i_1 < \cdots < i_p$. We also let $e_\emptyset = 1$.
Given any two subsets $H, L \subseteq \{1, \ldots, n\}$, let
$$\rho_{H,L} = \begin{cases} 0 & \text{if } H \cap L \neq \emptyset, \\ (-1)^{\nu} & \text{if } H \cap L = \emptyset, \end{cases}$$
where
$$\nu = |\{(h, l) \mid (h, l) \in H \times L,\; h > l\}|.$$
Proposition 23.26. For any basis $(e_1, \ldots, e_n)$ of $E$ the following properties hold:
(1) If $H \cap L = \emptyset$, $|H| = h$, and $|L| = l$, then
$$\rho_{H,L}\, \rho_{L,H} = (-1)^{hl}.$$
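The sign attached to a pair of index sets is easy to compute mechanically. The following Python sketch (the names `rho` and `check_property_1` are illustrative, not from the text) evaluates it from the definition, by counting the pairs $(h, l)$ with $h > l$, and checks property (1) by brute force on all disjoint pairs of subsets of $\{1, \ldots, 5\}$.

```python
from itertools import combinations

def rho(H, L):
    """Sign attached to index sets H and L: 0 if they overlap, otherwise
    (-1)**nu with nu the number of pairs (h, l) in H x L such that h > l."""
    H, L = set(H), set(L)
    if H & L:
        return 0
    nu = sum(1 for h in H for l in L if h > l)
    return (-1) ** nu

def check_property_1(n):
    """Brute-force check of rho(H, L) * rho(L, H) == (-1)**(|H| |L|)
    for all disjoint H, L inside {1, ..., n}."""
    subsets = [set(c) for k in range(n + 1)
               for c in combinations(range(1, n + 1), k)]
    return all(rho(H, L) * rho(L, H) == (-1) ** (len(H) * len(L))
               for H in subsets for L in subsets if not (H & L))

print(rho({2}, {1, 3}))     # -1  (only the pair (2, 1) has h > l)
print(rho({1, 3}, {2}))     # -1  (only the pair (3, 2) has h > l)
print(check_property_1(5))  # True
```

The check works because each pair $(h, l)$ with $h \neq l$ is counted in exactly one of the two exponents, so the exponents add up to $|H| \cdot |L|$.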
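For the left hook
$$\lrcorner : \bigwedge^p E \times \bigwedge^{p+q} E^* \to \bigwedge^q E^*,$$
we have
$$e_H \lrcorner e_L^* = 0 \quad \text{if } H \not\subseteq L, \qquad e_H \lrcorner e_L^* = \rho_{L-H,H}\, e_{L-H}^* \quad \text{if } H \subseteq L.$$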
Similar formulae hold for y :
have the
Vp
Vp+q
Vq
p
^
p+q
q
^
E ,
Vs
E .
Proof. We can prove the above identity assuming that $x^*$ and $y^*$ are of the form $e_I^*$ and $e_J^*$ using Proposition 23.26, but this is rather tedious. There is also a proof involving determinants; see Warner [114], Chapter 2.
Thus, $\lrcorner$ is almost an anti-derivation, except that the sign $(-1)^s$ is applied to the wrong factor.
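It is also possible to define a right interior product or right hook $\llcorner$, using multiplication on the left rather than multiplication on the right. Then, $\llcorner$ defines a right action
$$\llcorner : \bigwedge^{p+q} E^* \times \bigwedge^p E \to \bigwedge^q E^*$$
such that
$$\langle z^*, u \wedge v \rangle = \langle z^* \llcorner u, v \rangle,$$
for all $u \in \bigwedge^p E$, $v \in \bigwedge^q E$, and $z^* \in \bigwedge^{p+q} E^*$. Similarly, there is a right action
$$\llcorner : \bigwedge^{p+q} E \times \bigwedge^p E^* \to \bigwedge^q E.$$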
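Since the left hook $\lrcorner : \bigwedge^p E \times \bigwedge^{p+q} E^* \to \bigwedge^q E^*$ is defined by
$$\langle u \lrcorner z^*, v \rangle = \langle z^*, v \wedge u \rangle,$$
for all $u \in \bigwedge^p E$, $v \in \bigwedge^q E$ and $z^* \in \bigwedge^{p+q} E^*$, and the right hook $\llcorner : \bigwedge^{p+q} E^* \times \bigwedge^p E \to \bigwedge^q E^*$ by
$$\langle z^* \llcorner u, v \rangle = \langle z^*, u \wedge v \rangle,$$
for all $u \in \bigwedge^p E$, $v \in \bigwedge^q E$, and $z^* \in \bigwedge^{p+q} E^*$, and since $v \wedge u = (-1)^{pq}\, u \wedge v$, we get
$$z^* \llcorner u = (-1)^{pq}\, u \lrcorner z^*,$$
for all $u \in \bigwedge^p E$ and $z^* \in \bigwedge^{p+q} E^*$.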
Using the above property and Proposition 23.27, we get the following version of Proposition 23.27 for the right hook:
Proposition 23.28. For the right hook
p+q
x:
p
^
q
^
E ,
Vr
E .
Thus, $\llcorner$ is an anti-derivation.
For $u \in E$, the right hook $z^* \llcorner u$ is also denoted $i(u)z^*$, and called insertion operator or interior product. This operator plays an important role in differential geometry. If we view $z^* \in \bigwedge^{n+1}(E^*)$ as an alternating multilinear map in $\mathrm{Alt}^{n+1}(E; K)$, then $i(u)z^* \in \mathrm{Alt}^n(E; K)$ is given by
$$(i(u)z^*)(v_1, \ldots, v_n) = z^*(u, v_1, \ldots, v_n).$$
Note that certain authors, such as Shafarevitch [98], denote our right hook $z^* \llcorner u$ (which is also the right hook in Bourbaki [14] and Fulton and Harris [42]) by $u \lrcorner z^*$.
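Using the two versions of $\lrcorner$, we can define linear maps $\gamma : \bigwedge^p E \to \bigwedge^{n-p} E^*$ and $\delta : \bigwedge^p E^* \to \bigwedge^{n-p} E$. For any basis $(e_1, \ldots, e_n)$ of $E$, if we let $M = \{1, \ldots, n\}$, $e = e_1 \wedge \cdots \wedge e_n$, and $e^* = e_1^* \wedge \cdots \wedge e_n^*$, then
$$\gamma(u) = u \lrcorner e^* \quad \text{and} \quad \delta(v^*) = v^* \lrcorner e,$$
for all $u \in \bigwedge^p E$ and all $v^* \in \bigwedge^p E^*$. The following proposition is easily shown.
Proposition 23.29. The linear maps $\gamma : \bigwedge^p E \to \bigwedge^{n-p} E^*$ and $\delta : \bigwedge^p E^* \to \bigwedge^{n-p} E$ are isomorphisms. The isomorphisms $\gamma$ and $\delta$ map decomposable vectors to decomposable vectors. Furthermore, if $z \in \bigwedge^p E$ is decomposable, then $\langle \gamma(z), z \rangle = 0$, and similarly for $z \in \bigwedge^p E^*$. If $(e_1', \ldots, e_n')$ is any other basis of $E$ and $\gamma' : \bigwedge^p E \to \bigwedge^{n-p} E^*$ and $\delta' : \bigwedge^p E^* \to \bigwedge^{n-p} E$ are the corresponding isomorphisms, then $\gamma' = \lambda \gamma$ and $\delta' = \lambda^{-1} \delta$ for some nonzero $\lambda \in K$.
Proof. Using Proposition 23.26, for any subset $J \subseteq \{1, \ldots, n\} = M$ such that $|J| = p$, we have
$$\gamma(e_J) = e_J \lrcorner e^* = \rho_{M-J,J}\, e_{M-J}^* \quad \text{and} \quad \delta(e_J^*) = e_J^* \lrcorner e = \rho_{M-J,J}\, e_{M-J}.$$
Thus,
$$\delta \circ \gamma(e_J) = \rho_{M-J,J}\, \rho_{J,M-J}\, e_J = (-1)^{p(n-p)} e_J.$$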
V
Proof. First, let W be any subspace such that z p (E) and let
P(e1 , . . . , er , er+1 , . . . , en ) be
= 1.
It follows that
eI y z = eI y (z 0 + w z 00 ) = eI y z 0 + eI y (w z 00 ) = eI y z 0 + w,
/ W 0 . Therefore, W is indeed generated by the
with eI y z 0 W 0 , which shows that eIVy z
p1
as u y z for some u
E , and since (u y z) z = 0 for all u
E , we get
$$e_j \wedge z = 0 \quad \text{for } j = 1, \ldots, n.$$
By wedging $z = \sum_I \lambda_I e_I$ with each $e_j$, as $n > p$, we deduce $\lambda_I = 0$ for all $I$, so $z = 0$, a contradiction. Therefore, $n = p$ and $z$ is decomposable.
Vp1
are
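$$(e_H^* \lrcorner z) \wedge z = 0.$$
Since $(e_H^* \lrcorner z) \wedge z \in \bigwedge^{p+1} E$, this is equivalent to
$$e_J^*((e_H^* \lrcorner z) \wedge z) = 0$$
for all $H, J \subseteq \{1, \ldots, n\}$, with $|H| = p - 1$ and $|J| = p + 1$. Then, for all $I, I' \subseteq \{1, \ldots, n\}$ with $|I| = |I'| = p$, we can show that
$$e_J^*((e_H^* \lrcorner e_I) \wedge e_{I'}) = 0,$$
unless there is some $i \in \{1, \ldots, n\}$ such that
$$I - H = \{i\}, \qquad J - I' = \{i\}.$$
In this case,
$$e_J^*\big((e_H^* \lrcorner e_{H \cup \{i\}}) \wedge e_{J - \{i\}}\big) = \rho_{\{i\},H}\, \rho_{\{i\},J-\{i\}}.$$
If we let
$$\epsilon_{i,J,H} = \rho_{\{i\},H}\, \rho_{\{i\},J-\{i\}},$$
we have $\epsilon_{i,J,H} = +1$ if the parity of the number of $j \in J$ such that $j < i$ is the same as the parity of the number of $h \in H$ such that $h < i$, and $\epsilon_{i,J,H} = -1$ otherwise.
Finally, we obtain the following criterion in terms of quadratic equations (Plücker's equations) for the decomposability of an alternating tensor:
Proposition 23.32. (Grassmann-Plücker's Equations) For $z = \sum_I \lambda_I e_I \in \bigwedge^p E$, the conditions for $z \neq 0$ to be decomposable are
$$\sum_{i \in J - H} \epsilon_{i,J,H}\, \lambda_{H \cup \{i\}}\, \lambda_{J - \{i\}} = 0,$$
for all $H, J \subseteq \{1, \ldots, n\}$ such that $|H| = p - 1$ and $|J| = p + 1$.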
Using these criteria, it is a good exercise to prove that if $\dim(E) = n$, then every tensor in $\bigwedge^{n-1}(E)$ is decomposable. This can also be shown directly.
It should be noted that the equations given by Proposition 23.32 are not independent. For example, when $\dim(E) = n = 4$ and $p = 2$, these equations reduce to the single equation
$$\lambda_{12}\lambda_{34} - \lambda_{13}\lambda_{24} + \lambda_{14}\lambda_{23} = 0.$$
When the field $K$ is the field of complex numbers, this is the homogeneous equation of a quadric in $\mathbb{CP}^5$ known as the Klein quadric. The points on this quadric are in one-to-one correspondence with the lines in $\mathbb{CP}^3$.
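The single quadratic equation for $n = 4$, $p = 2$ can be checked numerically. The Python sketch below computes the coordinates $\lambda_{ij} = u_i v_j - u_j v_i$ of a decomposable tensor $u \wedge v$ in the basis $e_i \wedge e_j$ ($i < j$) and evaluates the left-hand side of the equation; the helper names `plucker_coords` and `klein` are illustrative.

```python
from itertools import combinations

def plucker_coords(u, v):
    """Coordinates lambda_{ij} (i < j, 1-based) of u ^ v in the basis e_i ^ e_j."""
    n = len(u)
    return {(i + 1, j + 1): u[i] * v[j] - u[j] * v[i]
            for i, j in combinations(range(n), 2)}

def klein(lam):
    """Left-hand side of the Plucker equation for n = 4, p = 2."""
    return (lam[1, 2] * lam[3, 4]
            - lam[1, 3] * lam[2, 4]
            + lam[1, 4] * lam[2, 3])

# A decomposable tensor u ^ v satisfies the equation.
decomposable = plucker_coords((1, 2, 3, 4), (0, 1, -1, 5))
print(klein(decomposable))  # 0

# e1 ^ e2 + e3 ^ e4 does not, so it is not decomposable.
non_decomposable = {(1, 2): 1, (1, 3): 0, (1, 4): 0,
                    (2, 3): 0, (2, 4): 0, (3, 4): 1}
print(klein(non_decomposable))  # 1
```

For the first example, the coordinates are $\lambda_{12} = 1$, $\lambda_{13} = -1$, $\lambda_{14} = 5$, $\lambda_{23} = -5$, $\lambda_{24} = 6$, $\lambda_{34} = 19$, and indeed $19 + 6 - 25 = 0$.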
23.18
$$\mathrm{Alt}^n(E; F) \cong \Big(\bigwedge^n(E^*)\Big) \otimes F.$$
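Note that $F$ may have infinite dimension. This isomorphism allows us to view the tensors in $\big(\bigwedge^n(E^*)\big) \otimes F$ as vector-valued alternating forms. We also let
$$\bigwedge(E; F) = \bigoplus_{n \geq 0} \Big(\bigwedge^n(E^*)\Big) \otimes F = \Big(\bigwedge(E^*)\Big) \otimes F.$$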
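$$\wedge_\Phi : \Big(\bigwedge^m(E^*) \otimes F\Big) \times \Big(\bigwedge^n(E^*) \otimes G\Big) \to \bigwedge^{m+n}(E^*) \otimes H$$
by
$$(\alpha \otimes f) \wedge_\Phi (\beta \otimes g) = (\alpha \wedge \beta) \otimes \Phi(f, g).$$
As in Section 23.15 (following H. Cartan [21]) we can also define a multiplication
$$\wedge_\Phi : \mathrm{Alt}^m(E; F) \times \mathrm{Alt}^n(E; G) \to \mathrm{Alt}^{m+n}(E; H)$$
directly on alternating multilinear maps as follows: For $f \in \mathrm{Alt}^m(E; F)$ and $g \in \mathrm{Alt}^n(E; G)$,
$$(f \wedge_\Phi g)(u_1, \ldots, u_{m+n}) = \sum_{\sigma \in \mathrm{shuffle}(m,n)} \mathrm{sgn}(\sigma)\, \Phi\big(f(u_{\sigma(1)}, \ldots, u_{\sigma(m)}),\, g(u_{\sigma(m+1)}, \ldots, u_{\sigma(m+n)})\big),$$
where $\mathrm{shuffle}(m, n)$ consists of all $(m, n)$-shuffles; that is, permutations $\sigma$ of $\{1, \ldots, m + n\}$ such that $\sigma(1) < \cdots < \sigma(m)$ and $\sigma(m + 1) < \cdots < \sigma(m + n)$.
In general, not much can be said about $\wedge_\Phi$, unless $\Phi$ has some additional properties. In particular, $\wedge_\Phi$ is generally not associative. We also have the map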
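$$\mu_F : \Big(\bigwedge^n(E^*)\Big) \otimes F \to \mathrm{Alt}^n(E; F)$$
defined on generators by
$$\mu_F((v_1^* \wedge \cdots \wedge v_n^*) \otimes a)(u_1, \ldots, u_n) = \det(v_j^*(u_i))\, a.$$
Proposition 23.33. The map
$$\mu_F : \Big(\bigwedge^n(E^*)\Big) \otimes F \to \mathrm{Alt}^n(E; F)$$
is a canonical isomorphism. Furthermore, given any three vector spaces $F, G, H$, and any bilinear map $\Phi : F \times G \to H$, for all $\omega \in (\bigwedge^m(E^*)) \otimes F$ and all $\eta \in (\bigwedge^n(E^*)) \otimes G$,
$$\mu_H(\omega \wedge_\Phi \eta) = \mu_F(\omega) \wedge_\Phi \mu_G(\eta).$$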
Proof. Since we already know that $(\bigwedge^n(E^*)) \otimes F$ and $\mathrm{Alt}^n(E; F)$ are isomorphic, it is enough to show that $\mu_F$ maps some basis of $(\bigwedge^n(E^*)) \otimes F$ to linearly independent elements. Pick some bases $(e_1, \ldots, e_p)$ in $E$ and $(f_j)_{j \in J}$ in $F$. Then, we know that the vectors $e_I^* \otimes f_j$, where $I \subseteq \{1, \ldots, p\}$ and $|I| = n$, form a basis of $(\bigwedge^n(E^*)) \otimes F$. If we have a linear dependence
$$\sum_{I,j} \lambda_{I,j}\, \mu_F(e_I^* \otimes f_j) = 0,$$
applying the above combination to each $(e_{i_1}, \ldots, e_{i_n})$ ($I = \{i_1, \ldots, i_n\}$, $i_1 < \cdots < i_n$), we get the linear combination
$$\sum_j \lambda_{I,j} f_j = 0,$$
and by linear independence of the $f_j$'s, we get $\lambda_{I,j} = 0$ for all $I$ and all $j$. Therefore, the $\mu_F(e_I^* \otimes f_j)$ are linearly independent, and we are done. The second part of the proposition is easily checked (a simple computation).
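A special case of interest is the case where $F = G = H$ is a Lie algebra and $\Phi(a, b) = [a, b]$ is the Lie bracket of $F$. In this case, using a basis $(f_1, \ldots, f_r)$ of $F$, if we write $\omega = \sum_i \alpha_i \otimes f_i$ and $\eta = \sum_j \beta_j \otimes f_j$, we have
$$[\omega, \eta] = \sum_{i,j} \alpha_i \wedge \beta_j \otimes [f_i, f_j].$$
Consequently,
$$[\eta, \omega] = (-1)^{mn+1} [\omega, \eta].$$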
The following proposition will be useful in dealing with vector-valued differential forms:
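Proposition 23.34. If $(e_1, \ldots, e_p)$ is any basis of $E$, then every element $\omega \in (\bigwedge^n(E^*)) \otimes F$ can be written in a unique way as
$$\omega = \sum_I e_I^* \otimes f_I, \qquad f_I \in F,$$
where the $e_I^*$ are the basis vectors of $\bigwedge^n(E^*)$ given by Proposition 23.19.
Proof. Since, by Proposition 23.19, the $e_I^*$ form a basis of $\bigwedge^n(E^*)$, elements of the form $e_I^* \otimes f$ span $(\bigwedge^n(E^*)) \otimes F$. Now, if we apply $\mu_F(\omega)$ to $(e_{i_1}, \ldots, e_{i_n})$, where $I = \{i_1, \ldots, i_n\} \subseteq \{1, \ldots, p\}$, we get
$$\mu_F(\omega)(e_{i_1}, \ldots, e_{i_n}) = \mu_F(e_I^* \otimes f_I)(e_{i_1}, \ldots, e_{i_n}) = f_I.$$
$$\mu_F(u_1^* \wedge \cdots \wedge u_n^* \otimes f) = \mu(u_1^* \wedge \cdots \wedge u_n^*)\, f.$$
Proposition 23.35. If $(e_1, \ldots, e_p)$ is any basis of $E$, then every element $\omega \in \mathrm{Alt}^n(E; F)$ can be written in a unique way as
$$\omega = \sum_I e_I^*\, f_I, \qquad f_I \in F,$$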
23.19
Let $\mathfrak{so}(2n)$ denote the vector space (actually, Lie algebra) of $2n \times 2n$ real skew-symmetric matrices. It is well-known that every matrix $A \in \mathfrak{so}(2n)$ can be written as
$$A = P D P^\top,$$
where $P$ is an orthogonal matrix and where $D$ is a block diagonal matrix
$$D = \begin{pmatrix} D_1 & & & \\ & D_2 & & \\ & & \ddots & \\ & & & D_n \end{pmatrix}$$
consisting of $2 \times 2$ blocks of the form
$$D_i = \begin{pmatrix} 0 & -a_i \\ a_i & 0 \end{pmatrix}.$$
For a proof, see Horn and Johnson [57] (Corollary 2.5.14), Gantmacher [46] (Chapter IX), or Gallier [44] (Chapter 11).
Since $\det(D_i) = a_i^2$ and $\det(A) = \det(P D P^\top) = \det(D) = \det(D_1) \cdots \det(D_n)$, we get
$$\det(A) = (a_1 \cdots a_n)^2.$$
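The Pfaffian is a polynomial function $\mathrm{Pf}(A)$ in skew-symmetric $2n \times 2n$ matrices $A$ (a polynomial in $(2n - 1)n$ variables) such that
$$\mathrm{Pf}(A)^2 = \det(A),$$
and for every arbitrary matrix $B$,
$$\mathrm{Pf}(B A B^\top) = \mathrm{Pf}(A) \det(B).$$
The Pfaffian shows up in the definition of the Euler class of a vector bundle. There is a simple way to define the Pfaffian using some exterior algebra. Let $(e_1, \ldots, e_{2n})$ be any basis of $\mathbb{R}^{2n}$. For any matrix $A \in \mathfrak{so}(2n)$, let
$$\omega(A) = \sum_{i<j} a_{ij}\, e_i \wedge e_j,$$
where $A = (a_{ij})$. Then, $\bigwedge^n \omega(A) = \omega(A) \wedge \cdots \wedge \omega(A)$ ($n$ factors) is a multiple of $e_1 \wedge e_2 \wedge \cdots \wedge e_{2n}$.
Definition 23.9. For every skew symmetric matrix $A \in \mathfrak{so}(2n)$, the Pfaffian polynomial or Pfaffian, is the degree $n$ polynomial $\mathrm{Pf}(A)$ defined by
$$\bigwedge^n \omega(A) = n!\, \mathrm{Pf}(A)\, e_1 \wedge e_2 \wedge \cdots \wedge e_{2n}.$$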
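Clearly, $\mathrm{Pf}(A)$ is independent of the basis chosen. If $A$ is the block diagonal matrix $D$, a simple calculation shows that
$$\omega(D) = -(a_1\, e_1 \wedge e_2 + a_2\, e_3 \wedge e_4 + \cdots + a_n\, e_{2n-1} \wedge e_{2n})$$
and that
$$\bigwedge^n \omega(D) = (-1)^n\, n!\, a_1 \cdots a_n\, e_1 \wedge e_2 \wedge \cdots \wedge e_{2n},$$
and so
$$\mathrm{Pf}(D) = (-1)^n a_1 \cdots a_n.$$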
i,j
k,l
k bki ek ,
k,l
>
Now,
=2
n
^
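$$= C\, f_1 \wedge f_2 \wedge \cdots \wedge f_{2n},$$
for some $C \in \mathbb{R}$. If $B$ is singular, then the $f_i$ are linearly dependent, which implies that $f_1 \wedge f_2 \wedge \cdots \wedge f_{2n} = 0$, in which case
$$\mathrm{Pf}(B A B^\top) = 0,$$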
$$= 2^n\, n!\, \mathrm{Pf}(A)\, f_1 \wedge f_2 \wedge \cdots \wedge f_{2n}.$$
n
^
and as
we get
n
^
as claimed.
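Remark: It can be shown that the polynomial $\mathrm{Pf}(A)$ is the unique polynomial with integer coefficients such that $\mathrm{Pf}(A)^2 = \det(A)$ and $\mathrm{Pf}(\mathrm{diag}(S, \ldots, S)) = +1$, where
$$S = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix};$$
see Milnor and Stasheff [82] (Appendix C, Lemma 9). There is also an explicit formula for $\mathrm{Pf}(A)$, namely:
$$\mathrm{Pf}(A) = \frac{1}{2^n\, n!} \sum_{\sigma \in S_{2n}} \mathrm{sgn}(\sigma) \prod_{i=1}^{n} a_{\sigma(2i-1)\, \sigma(2i)}.$$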
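For small matrices, the explicit formula can be evaluated by brute force. The Python sketch below is an illustration only: it sums over all $(2n)!$ permutations, so it is usable only for very small $n$, and the helper names (`sign`, `pfaffian`, `det`, `congruence`) are introduced here. It checks the two characteristic properties $\mathrm{Pf}(A)^2 = \det(A)$ and $\mathrm{Pf}(BAB^\top) = \mathrm{Pf}(A)\det(B)$ on one integer example.

```python
from fractions import Fraction
from itertools import permutations
from math import factorial

def sign(p):
    """Signature of a permutation given as a tuple of 0-based indices."""
    inv = sum(1 for a in range(len(p)) for b in range(a + 1, len(p))
              if p[a] > p[b])
    return -1 if inv % 2 else 1

def pfaffian(A):
    """Pfaffian of a 2n x 2n skew-symmetric matrix via the explicit formula."""
    n = len(A) // 2
    total = 0
    for s in permutations(range(2 * n)):
        term = sign(s)
        for i in range(n):
            term *= A[s[2 * i]][s[2 * i + 1]]
        total += term
    return Fraction(total, 2 ** n * factorial(n))

def det(A):
    """Determinant via the Leibniz formula (fine for tiny matrices)."""
    total = 0
    for s in permutations(range(len(A))):
        term = sign(s)
        for row, col in enumerate(s):
            term *= A[row][col]
        total += term
    return total

def congruence(B, A):
    """Return the matrix B A B^T."""
    n = len(A)
    BA = [[sum(B[i][k] * A[k][j] for k in range(n)) for j in range(n)]
          for i in range(n)]
    return [[sum(BA[i][k] * B[j][k] for k in range(n)) for j in range(n)]
            for i in range(n)]

A = [[0, 1, 2, 3], [-1, 0, 4, 5], [-2, -4, 0, 6], [-3, -5, -6, 0]]
B = [[1, 2, 0, 0], [0, 1, 0, 1], [3, 0, 1, 0], [0, 0, 2, 1]]

print(pfaffian(A))                                         # 8
print(pfaffian(A) ** 2 == det(A))                          # True
print(pfaffian(congruence(B, A)) == pfaffian(A) * det(B))  # True
```

For a $4 \times 4$ matrix the formula reduces to $a_{12}a_{34} - a_{13}a_{24} + a_{14}a_{23}$, which gives $6 - 10 + 12 = 8$ for the matrix `A` above.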
Beware, some authors use a different sign convention and require the Pfaffian to have the value $+1$ on the matrix $\mathrm{diag}(S', \ldots, S')$, where
$$S' = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}.$$
For example, if $\mathbb{R}^{2n}$ is equipped with an inner product $\langle -, - \rangle$, then some authors define $\omega(A)$ as
$$\omega(A) = \sum_{i<j} \langle A e_i, e_j \rangle\, e_i \wedge e_j,$$
where $A = (a_{ij})$. But then, $\langle A e_i, e_j \rangle = a_{ji}$ and not $a_{ij}$, and this Pfaffian takes the value $+1$ on the matrix $\mathrm{diag}(S', \ldots, S')$. This version of the Pfaffian differs from our version by the factor $(-1)^n$. In this respect, Madsen and Tornehave [74] seem to have an incorrect sign in Proposition B6 of Appendix C.
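We will also need another property of Pfaffians. Recall that the ring $M_n(\mathbb{C})$ of $n \times n$ matrices over $\mathbb{C}$ is embedded in the ring $M_{2n}(\mathbb{R})$ of $2n \times 2n$ matrices with real coefficients, using the injective homomorphism that maps every entry $z = a + ib \in \mathbb{C}$ to the $2 \times 2$ matrix
$$\begin{pmatrix} a & -b \\ b & a \end{pmatrix}.$$
If $A \in M_n(\mathbb{C})$, let $A_{\mathbb{R}} \in M_{2n}(\mathbb{R})$ denote the real matrix obtained by the above process.
Observe that every skew Hermitian matrix $A \in \mathfrak{u}(n)$ (i.e., with $A^* = \overline{A}^\top = -A$) yields a matrix $A_{\mathbb{R}} \in \mathfrak{so}(2n)$.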
Proposition 23.37. For every skew Hermitian matrix $A \in \mathfrak{u}(n)$, we have
$$\mathrm{Pf}(A_{\mathbb{R}}) = i^n \det(A).$$
Proof. It is well-known that a skew Hermitian matrix can be diagonalized with respect to a unitary matrix $U$ and that the eigenvalues are pure imaginary or zero, so we can write
$$A = U\, \mathrm{diag}(i a_1, \ldots, i a_n)\, U^*,$$
for some reals $a_j \in \mathbb{R}$. Consequently, we get
$$A_{\mathbb{R}} = U_{\mathbb{R}}\, \mathrm{diag}(D_1, \ldots, D_n)\, U_{\mathbb{R}}^\top,$$
where
$$D_j = \begin{pmatrix} 0 & -a_j \\ a_j & 0 \end{pmatrix}$$
and
$$\mathrm{Pf}(A_{\mathbb{R}}) = \mathrm{Pf}(\mathrm{diag}(D_1, \ldots, D_n)) = (-1)^n a_1 \cdots a_n,$$
as we saw before. On the other hand,
$$\det(A) = \det(\mathrm{diag}(i a_1, \ldots, i a_n)) = i^n a_1 \cdots a_n,$$
and as $(-1)^n = i^n i^n$, we get
$$\mathrm{Pf}(A_{\mathbb{R}}) = i^n \det(A),$$
as claimed.
Madsen and Tornehave [74] state Proposition 23.37 using the factor $(-i)^n$, which is wrong.
Chapter 24
Introduction to Modules; Modules over a PID
24.1
In this chapter, we introduce modules over a commutative ring (with unity). After a quick overview of fundamental concepts such as free modules, torsion modules, and some basic results about them, we focus on finitely generated modules over a PID and we prove the structure theorems for this class of modules (invariant factors and elementary divisors). Our main goal is not to give a comprehensive exposition of modules, but instead to apply the structure theorem to the $K[X]$-module $E_f$ defined by a linear map $f$ acting on a finite-dimensional vector space $E$, and to obtain several normal forms for $f$, including the rational canonical form.
A module is the generalization of a vector space $E$ over a field $K$ obtained by replacing the field $K$ by a commutative ring $A$ (with unity 1). Although formally the definition is the same, the fact that some nonzero elements of $A$ are not invertible has some serious consequences. For example, it is possible that $\lambda \cdot u = 0$ for some nonzero $\lambda \in A$ and some nonzero $u \in E$, and a module may no longer have a basis.
For the sake of completeness, we give the definition of a module, although it is the same as Definition 2.9 with the field $K$ replaced by a ring $A$. In this chapter, all rings under consideration are assumed to be commutative and to have an identity element 1.
Definition 24.1. Given a ring $A$, a (left) module over $A$ (or $A$-module) is a set $M$ (of vectors) together with two operations $+ : M \times M \to M$ (called vector addition),$^1$ and $\cdot : A \times M \to M$ (called scalar multiplication) satisfying the following conditions for all $\alpha, \beta \in A$ and all $u, v \in M$:
(M0) $M$ is an abelian group w.r.t. $+$, with identity element $0$;
(M1) $\alpha \cdot (u + v) = (\alpha \cdot u) + (\alpha \cdot v)$;
(M2) $(\alpha + \beta) \cdot u = (\alpha \cdot u) + (\beta \cdot u)$;
(M3) $(\alpha * \beta) \cdot u = \alpha \cdot (\beta \cdot u)$;
(M4) $1 \cdot u = u$.
$^1$ The symbol $+$ is overloaded, since it denotes both addition in the ring $A$ and addition of vectors in $M$. It is usually clear from the context which $+$ is intended.
Unless specified otherwise or unless we are dealing with several different rings, in the rest of this chapter, we assume that all $A$-modules are defined with respect to a fixed ring $A$. Thus, we will refer to an $A$-module simply as a module.
From (M0), a module always contains the null vector $0$, and thus is nonempty. From (M1), we get $\alpha \cdot 0 = 0$, and $\alpha \cdot (-v) = -(\alpha \cdot v)$. From (M2), we get $0 \cdot v = 0$, and $(-\alpha) \cdot v = -(\alpha \cdot v)$. The ring $A$ itself can be viewed as a module over itself, addition of vectors being addition in the ring, and multiplication by a scalar being multiplication in the ring.
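When the ring $A$ is a field, an $A$-module is a vector space. When $A = \mathbb{Z}$, a $\mathbb{Z}$-module is just an abelian group, with the action given by
$$\begin{aligned}
0 \cdot u &= 0, \\
n \cdot u &= \underbrace{u + \cdots + u}_{n}, && n > 0, \\
n \cdot u &= -((-n) \cdot u), && n < 0.
\end{aligned}$$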
All definitions from Section 2.3, linear combinations, linear independence and linear dependence, subspaces renamed as submodules, apply unchanged to modules. Proposition 2.7 also holds for the module spanned by a set of vectors. The definition of a basis (Definition 2.12) also applies to modules, but the only result from Section 2.4 that holds for modules is Proposition 2.14. Unfortunately, it is no longer true that every module has a basis. For example, for any integer $m \geq 2$, the $\mathbb{Z}$-module $\mathbb{Z}/m\mathbb{Z}$ has no basis. Similarly, $\mathbb{Q}$, as a $\mathbb{Z}$-module, has no basis. In fact, any two distinct nonzero elements $p_1/q_1$ and $p_2/q_2$ are linearly dependent, since
$$(p_2 q_1) \frac{p_1}{q_1} - (p_1 q_2) \frac{p_2}{q_2} = 0.$$
Definition 2.13 can be generalized to rings and yields free modules.
Definition 24.2. Given a commutative ring $A$ and any (nonempty) set $I$, let $A^{(I)}$ be the subset of the cartesian product $A^I$ consisting of all families $(\lambda_i)_{i \in I}$ with finite support of scalars in $A$. We define addition and multiplication by a scalar as follows:
$$(\lambda_i)_{i \in I} + (\mu_i)_{i \in I} = (\lambda_i + \mu_i)_{i \in I},$$
and
$$\lambda \cdot (\mu_i)_{i \in I} = (\lambda \mu_i)_{i \in I}.$$
It is immediately verified that addition and multiplication by a scalar are well defined. Thus, $A^{(I)}$ is a module. Furthermore, because families with finite support are considered, the family $(e_i)_{i \in I}$ of vectors $e_i$, defined such that $(e_i)_j = 0$ if $j \neq i$ and $(e_i)_i = 1$, is clearly a basis of the module $A^{(I)}$. When $I = \{1, \ldots, n\}$, we denote $A^{(I)}$ by $A^n$. The function $\iota : I \to A^{(I)}$, such that $\iota(i) = e_i$ for every $i \in I$, is clearly an injection.
Definition 24.3. An A-module M is free iff it has a basis.
The module A(I) is a free module.
All definitions from Section 2.5 apply to modules, linear maps, kernel, image, except the
definition of rank, which has to be defined differently. Propositions 2.15, 2.16, 2.17, and
2.18 hold for modules. However, the other propositions do not generalize to modules. The
definition of an isomorphism generalizes to modules. As a consequence, a module is free iff
it is isomorphic to a module of the form A(I) .
Section 2.6 generalizes to modules. Given a submodule N of a module M , we can define
the quotient module M/N .
If $\mathfrak{a}$ is an ideal in $A$ and if $M$ is an $A$-module, we define $\mathfrak{a}M$ as the set of finite sums of the form
$$a_1 m_1 + \cdots + a_k m_k, \qquad a_i \in \mathfrak{a},\; m_i \in M.$$
It is immediately verified that $\mathfrak{a}M$ is a submodule of $M$.
Interestingly, the part of Theorem 2.13 that asserts that any two bases of a vector space
have the same cardinality holds for modules. One way to prove this fact is to pass to a
vector space by a quotient process.
Theorem 24.1. For any free module M , any two bases of M have the same cardinality.
Proof sketch. We give the argument for finite bases, but it also holds for infinite bases. The trick is to pick any maximal ideal $\mathfrak{m}$ in $A$ (whose existence is guaranteed by Theorem 31.3). Then, $A/\mathfrak{m}$ is a field, and $M/\mathfrak{m}M$ can be made into a vector space over $A/\mathfrak{m}$; we leave the details as an exercise. If $(u_1, \ldots, u_n)$ is a basis of $M$, then it is easy to see that the image of this basis is a basis of the vector space $M/\mathfrak{m}M$. By Theorem 2.13, the number $n$ of elements in any basis of $M/\mathfrak{m}M$ is an invariant, so any two bases of $M$ must have the same number of elements.
The common number of elements in any basis of a free module is called the dimension
(or rank ) of the free module.
One should realize that the notion of linear independence in a module is a little tricky. According to the definition, the one-element sequence $(u)$ consisting of a single nonzero vector is linearly independent if for all $\lambda \in A$, if $\lambda u = 0$ then $\lambda = 0$. However, there are free modules that contain nonzero vectors that are not linearly independent! For example, the ring $A = \mathbb{Z}/6\mathbb{Z}$ viewed as a module over itself has the basis $(1)$, but the zero-divisors, such as 2 or 4, are not linearly independent. Using language introduced in Definition 24.4, a free module may have torsion elements. There are also nonfree modules such that every nonzero vector is linearly independent, such as $\mathbb{Q}$ over $\mathbb{Z}$.
All definitions from Section 3.1 about matrices apply to free modules, and so do all the propositions. Similarly, all definitions from Section 4.1 about direct sums and direct products apply to modules. All propositions that do not involve extending bases still hold. The important Proposition 4.10 survives in the following form.
Proposition 24.2. Let $f : E \to F$ be a surjective linear map between two $A$-modules with $F$ a free module. Given any basis $(v_1, \ldots, v_r)$ of $F$, for any $r$ vectors $u_1, \ldots, u_r \in E$ such that $f(u_i) = v_i$ for $i = 1, \ldots, r$, the vectors $(u_1, \ldots, u_r)$ are linearly independent and the module $E$ is the direct sum
$$E = \mathrm{Ker}(f) \oplus U,$$
where $U$ is the free submodule of $E$ spanned by the basis $(u_1, \ldots, u_r)$.
Proof. Pick any $w \in E$, write $f(w)$ over the basis $(v_1, \ldots, v_r)$ as $f(w) = a_1 v_1 + \cdots + a_r v_r$, and let $u = a_1 u_1 + \cdots + a_r u_r$. Observe that
$$\begin{aligned}
f(w - u) &= f(w) - f(u) \\
&= a_1 v_1 + \cdots + a_r v_r - (a_1 f(u_1) + \cdots + a_r f(u_r)) \\
&= a_1 v_1 + \cdots + a_r v_r - (a_1 v_1 + \cdots + a_r v_r) \\
&= 0.
\end{aligned}$$
Therefore, $h = w - u \in \mathrm{Ker}(f)$, and since $w = h + u$ with $h \in \mathrm{Ker}(f)$ and $u \in U$, we have $E = \mathrm{Ker}(f) + U$.
If $u = a_1 u_1 + \cdots + a_r u_r \in U$ also belongs to $\mathrm{Ker}(f)$, then
$$0 = f(u) = f(a_1 u_1 + \cdots + a_r u_r) = a_1 v_1 + \cdots + a_r v_r,$$
and since $(v_1, \ldots, v_r)$ is a basis, $a_i = 0$ for $i = 1, \ldots, r$, which shows that $\mathrm{Ker}(f) \cap U = (0)$. Therefore, we have a direct sum
$$E = \mathrm{Ker}(f) \oplus U.$$
Finally, if
$$a_1 u_1 + \cdots + a_r u_r = 0,$$
the above reasoning shows that $a_i = 0$ for $i = 1, \ldots, r$, so $(u_1, \ldots, u_r)$ are linearly independent. Therefore, the module $U$ is a free module.
If a ring has zero divisors, then the set of all torsion elements in an A-module M may not
be a submodule of M . For example, if M = A = Z/6Z, then Mtor = {2, 3, 4}, but 3 + 4 = 1
is not a torsion element. Also, a free module may not be torsion-free because there may be
torsion elements, as the example of Z/6Z as a free module over itself shows.
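The example of $\mathbb{Z}/6\mathbb{Z}$ is small enough to check by enumeration. In the Python sketch below, residues are represented by the integers $0, \ldots, m-1$, and the function name `nonzero_torsion_elements` is introduced for illustration; it lists the nonzero $x$ for which $ax = 0$ for some nonzero $a$.

```python
def nonzero_torsion_elements(m):
    """Nonzero x in Z/mZ (viewed as a module over itself) such that
    a * x = 0 for some nonzero a in Z/mZ."""
    return [x for x in range(1, m)
            if any((a * x) % m == 0 for a in range(1, m))]

tor = nonzero_torsion_elements(6)
print(tor)                          # [2, 3, 4]
print((3 + 4) % 6 in tor)           # False: 3 + 4 = 1 is not a torsion element
print(nonzero_torsion_elements(5))  # []: Z/5Z is a field, hence has no zero divisors
```

The second line shows concretely why the torsion elements fail to form a submodule when the ring has zero divisors.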
However, if A is an integral domain, then a free module is torsion-free and Mtor is a
submodule of M . (Recall that an integral domain is commutative).
Proposition 24.3. If $A$ is an integral domain, then for any $A$-module $M$, the set $M_{\mathrm{tor}}$ of torsion elements in $M$ is a submodule of $M$.
Proof. If $x, y \in M$ are torsion elements ($x, y \neq 0$), then there exist some nonzero elements $a, b \in A$ such that $ax = 0$ and $by = 0$. Since $A$ is an integral domain, $ab \neq 0$, and then for all $\lambda, \mu \in A$, we have
$$ab(\lambda x + \mu y) = b\lambda a x + a \mu b y = 0.$$
Therefore, $M_{\mathrm{tor}}$ is a submodule of $M$.
The module Mtor is called the torsion submodule of M . If Mtor = (0), then we say that
M is torsion-free, and if M = Mtor , then we say that M is a torsion module.
If $M$ is not finitely generated, then it is possible that $M_{\mathrm{tor}} \neq 0$, yet the annihilator of $M_{\mathrm{tor}}$ is reduced to $0$ (find an example). However, if $M$ is finitely generated, this cannot happen, since if $x_1, \ldots, x_n$ generate $M$ and if $a_1, \ldots, a_n$ annihilate $x_1, \ldots, x_n$, then $a_1 \cdots a_n$ annihilates every element of $M$.
Proposition 24.4. If $A$ is an integral domain, then for any $A$-module $M$, the quotient module $M/M_{\mathrm{tor}}$ is torsion-free.
Proof. Let $\overline{x}$ be an element of $M/M_{\mathrm{tor}}$ and assume that $a\overline{x} = 0$ for some $a \neq 0$ in $A$. This means that $ax \in M_{\mathrm{tor}}$, so there is some $b \neq 0$ in $A$ such that $bax = 0$. Since $a, b \neq 0$ and $A$ is an integral domain, $ba \neq 0$, so $x \in M_{\mathrm{tor}}$, which means that $\overline{x} = 0$.
If $A$ is an integral domain and if $F$ is a free $A$-module with basis $(u_1, \ldots, u_n)$, then $F$ can be embedded in a $K$-vector space $F_K$ isomorphic to $K^n$, where $K = \mathrm{Frac}(A)$ is the fraction field of $A$. Similarly, any submodule $M$ of $F$ is embedded into a subspace $M_K$ of $F_K$. Note that any linearly independent vectors $(u_1, \ldots, u_m)$ in the $A$-module $M$ remain linearly independent in the vector space $M_K$, because any linear dependence over $K$ is of the form
$$\frac{a_1}{b_1} u_1 + \cdots + \frac{a_m}{b_m} u_m = 0$$
for some $a_i, b_i \in A$, with $b_1 \cdots b_m \neq 0$, so if we multiply by $b_1 \cdots b_m \neq 0$, we get a linear dependence in the $A$-module $M$. Then, we see that the maximum number of linearly independent vectors in the $A$-module $M$ is at most $n$. The maximum number of linearly independent vectors in a finitely generated submodule of a free module (over an integral domain) is called the rank of the module $M$. If $(u_1, \ldots, u_m)$ are linearly independent where $m$ is the rank of $M$, then for every nonzero $v \in M$, there are some $a, a_1, \ldots, a_m \in A$, not all zero, such that
$$av = a_1 u_1 + \cdots + a_m u_m.$$
We must have $a \neq 0$, since otherwise, linear independence of the $u_i$ would imply that $a_1 = \cdots = a_m = 0$, contradicting the fact that $a, a_1, \ldots, a_m \in A$ are not all zero.
We can also prove that a finitely generated torsion-free module over a PID is actually free. We will give another proof of this fact later, but the following proof is instructive.
Proposition 24.6. If $A$ is a PID and if $M$ is a finitely generated module which is torsion-free, then $M$ is free.
Proof. Let $(y_1, \ldots, y_n)$ be some generators for $M$, and let $(u_1, \ldots, u_m)$ be a maximal subsequence of $(y_1, \ldots, y_n)$ which is linearly independent. If $m = n$, we are done. Otherwise, due to the maximality of $m$, for $i = 1, \ldots, n$, there is some $a_i \neq 0$ such that $a_i y_i$ can be expressed as a linear combination of $(u_1, \ldots, u_m)$. If we let $a = a_1 \cdots a_n$, then $a_1 \cdots a_n y_i \in Au_1 \oplus \cdots \oplus Au_m$ for $i = 1, \ldots, n$, which shows that
$$aM \subseteq Au_1 \oplus \cdots \oplus Au_m.$$
Now, $A$ is an integral domain, and since $a_i \neq 0$ for $i = 1, \ldots, n$, we have $a = a_1 \cdots a_n \neq 0$, and because $M$ is torsion-free, the map $x \mapsto ax$ is injective. It follows that $M$ is isomorphic to a submodule of the free module $Au_1 \oplus \cdots \oplus Au_m$. By Proposition 24.5, this submodule is free, and thus, $M$ is free.
Although we will obtain this result as a corollary of the structure theorem for finitely
generated modules over a PID, we are in the position to give a quick proof of the following
theorem.
Theorem 24.7. Let $M$ be a finitely generated module over a PID. Then $M/M_{\mathrm{tor}}$ is free, and there exists a free submodule $F$ of $M$ such that $M$ is the direct sum
$$M = M_{\mathrm{tor}} \oplus F.$$
The dimension of $F$ is uniquely determined.
Proof. By Proposition 24.4, $M/M_{\mathrm{tor}}$ is torsion-free, and since $M$ is finitely generated, it is also finitely generated. By Proposition 24.6, $M/M_{\mathrm{tor}}$ is free. We have the quotient linear map $\pi : M \to M/M_{\mathrm{tor}}$, which is surjective, and $M/M_{\mathrm{tor}}$ is free, so by Proposition 24.2, there is a free module $F$ isomorphic to $M/M_{\mathrm{tor}}$ such that
$$M = \mathrm{Ker}(\pi) \oplus F = M_{\mathrm{tor}} \oplus F.$$
Since $F$ is isomorphic to $M/M_{\mathrm{tor}}$, the dimension of $F$ is uniquely determined.
Theorem 24.7 reduces the study of finitely generated modules over a PID to the study of finitely generated torsion modules. This is the path followed by Lang [67] (Chapter III, section 7).
24.2
Since modules are generally not free, it is natural to look for techniques for dealing with nonfree modules. The hint is that if $M$ is an $A$-module and if $(u_i)_{i \in I}$ is any set of generators for $M$, then we know that there is a surjective homomorphism $\varphi : A^{(I)} \to M$ from the free module $A^{(I)}$ generated by $I$ onto $M$. Furthermore, $M$ is isomorphic to $A^{(I)}/\mathrm{Ker}(\varphi)$. Then, we can pick a set of generators $(v_j)_{j \in J}$ for $\mathrm{Ker}(\varphi)$, and again there is a surjective map $\psi : A^{(J)} \to \mathrm{Ker}(\varphi)$ from the free module $A^{(J)}$ generated by $J$ onto $\mathrm{Ker}(\varphi)$. The map $\psi$ can be viewed as a linear map from $A^{(J)}$ to $A^{(I)}$, we have
$$\mathrm{Im}(\psi) = \mathrm{Ker}(\varphi),$$
and $\varphi$ is surjective. Note that $M$ is isomorphic to $A^{(I)}/\mathrm{Im}(\psi)$. In such a situation we say that we have an exact sequence and this is denoted by the diagram
$$A^{(J)} \xrightarrow{\;\;\psi\;\;} A^{(I)} \xrightarrow{\;\;\varphi\;\;} M \longrightarrow 0.$$
A(I)
2. $\varphi$ is surjective.
Consequently, $M$ is isomorphic to $A^{(I)}/\mathrm{Im}(\psi)$. If $I$ and $J$ are both finite, we say that this is a finite presentation of $M$.
Observe that in the case of a finite presentation, $I$ and $J$ are finite, and if $|J| = n$ and $|I| = m$, then $\psi$ is a linear map $\psi : A^n \to A^m$, so it is given by some $m \times n$ matrix $R$ with coefficients in $A$ called the presentation matrix of $M$. Every column $R_j$ of $R$ may be thought of as a relation
$$a_{j1} e_1 + \cdots + a_{jm} e_m = 0$$
among the generators $e_1, \ldots, e_m$ of $A^m$, so we have $n$ relations among these generators. Also the images of $e_1, \ldots, e_m$ in $M$ are generators of $M$, so we can think of the above relations as relations among the generators of $M$. The submodule of $A^m$ spanned by the columns of $R$ is the set of relations of $M$, and the columns of $R$ are called a complete set of relations for $M$. The vectors $e_1, \ldots, e_m$ are called a set of generators for $M$. We may also say that the generators $e_1, \ldots, e_m$ and the relations $R_1, \ldots, R_n$ (the columns of $R$) are a (finite) presentation of the module $M$.
For example, the $\mathbb{Z}$-module presented by the $1 \times 1$ matrix $R = (5)$ is the quotient, $\mathbb{Z}/5\mathbb{Z}$, of $\mathbb{Z}$ by the submodule $5\mathbb{Z}$ corresponding to the single relation
$$5e_1 = 0.$$
But $\mathbb{Z}/5\mathbb{Z}$ has other presentations. For example, if we consider the matrix of relations
$$R = \begin{pmatrix} 2 & -1 \\ 1 & 2 \end{pmatrix},$$
presenting the module $M$, then we have the relations
$$\begin{aligned} 2e_1 + e_2 &= 0 \\ -e_1 + 2e_2 &= 0. \end{aligned}$$
From the first equation, we get $e_2 = -2e_1$, and substituting into the second equation we get
$$-5e_1 = 0.$$
It follows that the generator $e_2$ can be eliminated and $M$ is generated by the single generator $e_1$ satisfying the relation
$$5e_1 = 0,$$
which shows that $M \cong \mathbb{Z}/5\mathbb{Z}$.
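The isomorphism class of a module presented by an integer matrix can also be read off mechanically. The Python sketch below relies on a standard fact that is not established in the passage above and is assumed here: over $\mathbb{Z}$, if $d_k$ denotes the gcd of all $k \times k$ minors of $R$, then the quotients $d_k/d_{k-1}$ are the invariant factors of $R$, and the presented module is the direct sum of the cyclic modules $\mathbb{Z}/(d_k/d_{k-1})\mathbb{Z}$ (plus a free part if $R$ does not have full row rank). The example uses the matrix whose columns encode the relations $2e_1 + e_2 = 0$ and $-e_1 + 2e_2 = 0$; the function names are illustrative.

```python
from itertools import combinations
from math import gcd

def det(M):
    """Integer determinant by Laplace expansion along the first row."""
    if len(M) == 1:
        return M[0][0]
    total = 0
    for j, a in enumerate(M[0]):
        if a:
            minor = [row[:j] + row[j + 1:] for row in M[1:]]
            total += (-1) ** j * a * det(minor)
    return total

def invariant_factors(R):
    """Nonzero invariant factors of an integer matrix R, computed as
    d_k / d_{k-1}, where d_k is the gcd of all k x k minors of R."""
    rows, cols = len(R), len(R[0])
    factors, prev = [], 1
    for k in range(1, min(rows, cols) + 1):
        d_k = 0
        for rs in combinations(range(rows), k):
            for cs in combinations(range(cols), k):
                d_k = gcd(d_k, det([[R[r][c] for c in cs] for r in rs]))
        if d_k == 0:  # all k x k minors vanish: the rank is k - 1
            break
        factors.append(d_k // prev)
        prev = d_k
    return factors

print(invariant_factors([[5]]))              # [5]
print(invariant_factors([[2, -1], [1, 2]]))  # [1, 5]
```

An invariant factor equal to 1 contributes the trivial module $\mathbb{Z}/1\mathbb{Z} = 0$, so both matrices present $\mathbb{Z}/5\mathbb{Z}$, in agreement with the elimination carried out by hand. Computing all minors is exponential in the size of the matrix; it is only meant for tiny examples.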
The above example shows that many different matrices can present the same module.
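The elimination above can be checked mechanically. The following is a minimal sketch, assuming the relation matrix has columns $(2, 1)$ and $(-1, 2)$ (so that the relations read $2e_1 + e_2 = 0$ and $-e_1 + 2e_2 = 0$); the helper name `in_column_lattice` is hypothetical. It tests whether an integer vector is an integer combination of the columns of a nonsingular $2 \times 2$ matrix, and uses this to find the order of the class of $e_1$ in the quotient.

```python
def in_column_lattice(R, v):
    """Return True if the integer vector v is an integer combination of
    the columns of the nonsingular 2x2 integer matrix R (hypothetical helper)."""
    (a, b), (c, d) = R
    det = a * d - b * c
    if det == 0:
        raise ValueError("R must be nonsingular")
    # Cramer's rule: R x = v has the rational solution x = adj(R) v / det,
    # so v lies in the lattice iff both numerators are divisible by det.
    x_num = d * v[0] - b * v[1]
    y_num = -c * v[0] + a * v[1]
    return x_num % det == 0 and y_num % det == 0


# Assumed relation matrix: columns (2, 1) and (-1, 2).
R = [[2, -1], [1, 2]]

# Smallest k >= 1 such that k*e1 is a relation, i.e. the order of e1 in M.
order_of_e1 = next(k for k in range(1, 100) if in_column_lattice(R, (k, 0)))

# e2 + 2*e1 = (2, 1) is a relation, so e2 = -2*e1 in M.
e2_eliminated = in_column_lattice(R, (2, 1))
```

Under these assumptions the class of $e_1$ has order 5 and $e_2$ is a multiple of $e_1$, which matches the conclusion that the module is cyclic of order 5.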
Here are some useful rules for manipulating a relation matrix without changing the isomorphism class of the module M it presents.
$$R = \begin{pmatrix} 3 & 8 & 7 & 9 \\ 2 & 4 & 6 & 6 \\ 1 & 2 & 2 & 1 \end{pmatrix}.$$
0 2
0 0
1 2
1 6
2 4 .
2 1
Proof. (1) Pick any $w \in E$, write $f(w)$ over the generators $(v_1, \ldots, v_r)$ of $\mathrm{Im}(f)$ as $f(w) = a_1 v_1 + \cdots + a_r v_r$, and let $u = a_1 u_1 + \cdots + a_r u_r$. Observe that
$$\begin{aligned} f(w - u) &= f(w) - f(u) \\ &= a_1 v_1 + \cdots + a_r v_r - (a_1 f(u_1) + \cdots + a_r f(u_r)) \\ &= a_1 v_1 + \cdots + a_r v_r - (a_1 v_1 + \cdots + a_r v_r) \\ &= 0. \end{aligned}$$
Therefore, $h = w - u \in \mathrm{Ker}(f)$, and since $w = h + u$ with $h \in \mathrm{Ker}(f)$ and $u \in U$, we have $E = \mathrm{Ker}(f) + U$, as claimed. If $\mathrm{Ker}(f)$ is also finitely generated, by taking the union of a finite set of generators for $\mathrm{Ker}(f)$ and $(u_1, \ldots, u_r)$, we obtain a finite set of generators for $E$.
(2) If $(u_1, \ldots, u_n)$ generate $E$, it is obvious that $(f(u_1), \ldots, f(u_n))$ generate $\mathrm{Im}(f)$.
Theorem 24.10. A ring $A$ is Noetherian iff every submodule $N$ of a finitely generated $A$-module $M$ is itself finitely generated.
Proof. First, assume that every submodule $N$ of a finitely generated $A$-module $M$ is itself finitely generated. The ring $A$ is a module over itself and it is generated by the single element 1. Furthermore, every submodule of $A$ is an ideal, so the hypothesis implies that every ideal in $A$ is finitely generated, which shows that $A$ is Noetherian.
Now, assume $A$ is Noetherian. First, observe that it is enough to prove the theorem for the finitely generated free modules $A^n$ (with $n \ge 1$). Indeed, assume that we proved for every $n \ge 1$ that every submodule of $A^n$ is finitely generated. If $M$ is any finitely generated $A$-module, then there is a surjection $\varphi\colon A^n \to M$ for some $n$ (where $n$ is the number of elements of a finite generating set for $M$). Given any submodule $N$ of $M$, $L = \varphi^{-1}(N)$ is a submodule of $A^n$. By assumption, the submodule $L$ of $A^n$ is finitely generated, and then $N = \varphi(L)$ is finitely generated.
It remains to prove the theorem for $M = A^n$. We proceed by induction on $n$. For $n = 1$, a submodule $N$ of $A$ is an ideal, and since $A$ is Noetherian, $N$ is finitely generated. For the induction step where $n > 1$, consider the projection $\pi\colon A^n \to A^{n-1}$ given by
$$\pi(a_1, \ldots, a_n) = (a_1, \ldots, a_{n-1}).$$
The kernel of $\pi$ is the module
$$\mathrm{Ker}(\pi) = \{(0, \ldots, 0, a_n) \in A^n \mid a_n \in A\} \cong A.$$
For any submodule $N$ of $A^n$, let $\varphi\colon N \to A^{n-1}$ be the restriction of $\pi$ to $N$. Since $\varphi(N)$ is a submodule of $A^{n-1}$, by the induction hypothesis, $\mathrm{Im}(\varphi) = \varphi(N)$ is finitely generated. Also, $\mathrm{Ker}(\varphi) = N \cap \mathrm{Ker}(\pi)$ is a submodule of $\mathrm{Ker}(\pi) \cong A$, and thus $\mathrm{Ker}(\varphi)$ is isomorphic to an ideal of $A$, and thus is finitely generated (since $A$ is Noetherian). Since both $\mathrm{Im}(\varphi)$ and $\mathrm{Ker}(\varphi)$ are finitely generated, by Proposition 24.9, the submodule $N$ is also finitely generated.
24.3
It is possible to define tensor products of modules over a ring, just as in Section 23.1, and the results of this section continue to hold. The results of Section 23.3 also continue to hold since they are based on the universal mapping property. However, the results of Section 23.2 on bases generally fail, except for free modules. Similarly, the results of Section 23.4 on duality generally fail. Tensor algebras can be defined for modules, as in Section 23.5. Symmetric tensors and alternating tensors can be defined for modules but again, results involving bases generally fail.
Tensor products of modules have some unexpected properties. For example, if $p$ and $q$ are relatively prime integers, then
$$\mathbb{Z}/p\mathbb{Z} \otimes_{\mathbb{Z}} \mathbb{Z}/q\mathbb{Z} = (0).$$
This is because, by Bézout's identity, there are $a, b \in \mathbb{Z}$ such that
$$ap + bq = 1,$$
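$$P^{**} \cong P$$
$$\mathrm{Hom}_A(P, Q) \cong P^* \otimes_A Q.$$
Proof sketch. We only consider the second isomorphism. Since $P$ is projective, we have some $A$-modules, $P_1$, $F$, with
$$P \oplus P_1 = F,$$
where $F$ is some free module. Now, we know that for any $A$-modules, $U, V, W$, we have
$$\mathrm{Hom}_A(U \oplus V, W) \cong \mathrm{Hom}_A(U, W) \prod \mathrm{Hom}_A(V, W) \cong \mathrm{Hom}_A(U, W) \oplus \mathrm{Hom}_A(V, W),$$
so
$$P^* \oplus P_1^* \cong F^*.$$
By tensoring with $Q$ and using the fact that tensor distributes w.r.t. coproducts, we get
$$(P^* \otimes_A Q) \oplus (P_1^* \otimes Q) \cong (P^* \oplus P_1^*) \otimes_A Q \cong F^* \otimes_A Q.$$
Now, the proof of Proposition 23.9 goes through because $F$ is free and finitely generated, so
$$\alpha_\otimes\colon (P^* \otimes_A Q) \oplus (P_1^* \otimes Q) \cong F^* \otimes_A Q \longrightarrow \mathrm{Hom}_A(F, Q) \cong \mathrm{Hom}_A(P, Q) \oplus \mathrm{Hom}_A(P_1, Q)$$
is an isomorphism and as $\alpha_\otimes$ maps $P^* \otimes_A Q$ to $\mathrm{Hom}_A(P, Q)$, it yields an isomorphism between these two spaces.
The isomorphism $\alpha_\otimes\colon P^* \otimes_A Q \cong \mathrm{Hom}_A(P, Q)$ of Proposition 24.11 is still given by
$$\alpha_\otimes(u^* \otimes f)(x) = u^*(x) f, \qquad u^* \in P^*,\ f \in Q,\ x \in P.$$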
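$$\Bigl(\bigoplus_{i \in I} M_i\Bigr) \otimes \Bigl(\bigoplus_{j \in J} N_j\Bigr) \cong \bigoplus_{(i,j) \in I \times J} (M_i \otimes N_j).$$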
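Proposition 24.14. Given any $A$-module $M$ and any ideal $\mathfrak{a}$ in $A$, there is an isomorphism
$$(A/\mathfrak{a}) \otimes_A M \cong M/\mathfrak{a}M$$
given by the map $(\overline{a} \otimes u) \mapsto au \pmod{\mathfrak{a}M}$, for all $\overline{a} \in A/\mathfrak{a}$ and all $u \in M$.
Sketch of proof. Consider the map $\varphi\colon (A/\mathfrak{a}) \times M \to M/\mathfrak{a}M$ given by
$$\varphi(\overline{a}, u) = au \pmod{\mathfrak{a}M}$$
for all $\overline{a} \in A/\mathfrak{a}$ and all $u \in M$. It is immediately checked that $\varphi$ is well-defined because $au \pmod{\mathfrak{a}M}$ does not depend on the representative $a \in A$ chosen in the equivalence class $\overline{a}$, and $\varphi$ is bilinear. Therefore, $\varphi$ induces a linear map $\varphi\colon (A/\mathfrak{a}) \otimes M \to M/\mathfrak{a}M$, such that $\varphi(\overline{a} \otimes u) = au \pmod{\mathfrak{a}M}$. We also define the map $\psi\colon M \to (A/\mathfrak{a}) \otimes M$ by
$$\psi(u) = \overline{1} \otimes u.$$
Since $\mathfrak{a}M$ is generated by vectors of the form $au$ with $a \in \mathfrak{a}$ and $u \in M$, and since
$$\psi(au) = \overline{1} \otimes au = \overline{a} \otimes u = 0 \otimes u = 0,$$
we see that $\mathfrak{a}M \subseteq \mathrm{Ker}(\psi)$, so $\psi$ induces a linear map $\psi\colon M/\mathfrak{a}M \to (A/\mathfrak{a}) \otimes M$. We have
$$\psi(\varphi(\overline{a} \otimes u)) = \psi(au) = \overline{1} \otimes au = \overline{a} \otimes u$$
and
$$\varphi(\psi(u)) = \varphi(\overline{1} \otimes u) = 1u = u,$$
which shows that $\varphi$ and $\psi$ are mutual inverses.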
24.4
The need to extend the ring of scalars arises, in particular when dealing with eigenvalues. First, we need to define how to restrict scalar multiplication to a subring. The situation is that we have two rings $A$ and $B$, a $B$-module $M$, and a ring homomorphism $\rho\colon A \to B$. The special case that arises often is that $A$ is a subring of $B$ ($B$ could be a field) and $\rho$ is the inclusion map. Then, we can make $M$ into an $A$-module by defining the scalar multiplication $\cdot\colon A \times M \to M$ as follows:
$$a \cdot x = \rho(a)x,$$
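The map $\mu_\beta$ is bilinear so it induces a linear map $\mu_\beta\colon \rho_*(B) \otimes_A M \to \rho_*(B) \otimes_A M$ such that
$$\mu_\beta(\beta' \otimes x) = (\beta\beta') \otimes x.$$
If we define the scalar multiplication by $\beta \cdot z = \mu_\beta(z)$, then it is easy to check that the axioms M1, M2, M3, M4 hold. Let us check M2 and M3. We have
$$\begin{aligned} \mu_{\beta_1 + \beta_2}(\beta' \otimes x) &= (\beta_1 + \beta_2)\beta' \otimes x \\ &= (\beta_1\beta' + \beta_2\beta') \otimes x \\ &= \beta_1\beta' \otimes x + \beta_2\beta' \otimes x \\ &= \mu_{\beta_1}(\beta' \otimes x) + \mu_{\beta_2}(\beta' \otimes x) \end{aligned}$$
and
$$\begin{aligned} \mu_{\beta_1\beta_2}(\beta' \otimes x) &= \beta_1\beta_2\beta' \otimes x \\ &= \mu_{\beta_1}(\beta_2\beta' \otimes x) \\ &= \mu_{\beta_1}(\mu_{\beta_2}(\beta' \otimes x)). \end{aligned}$$
With the scalar multiplication by elements of $B$ given by
$$\beta \cdot (\beta' \otimes x) = (\beta\beta') \otimes x,$$
for all $x \in M$.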
free, any basis $(u_1, \ldots, u_n)$ of $M$ becomes the basis $(\varphi(u_1), \ldots, \varphi(u_n))$ of $\rho^*(M)$; but $A/\mathfrak{m}$ is a field, so the dimension $n$ is uniquely determined. This argument also applies to an infinite basis $(u_i)_{i \in I}$. Observe that by Proposition 24.14, we have an isomorphism
$$\rho^*(M) = (A/\mathfrak{m}) \otimes_A M \cong M/\mathfrak{m}M,$$
so $M/\mathfrak{m}M$ is a vector space over the field $A/\mathfrak{m}$, which is the argument used in Theorem 24.1.
Proposition 24.18. Given a ring homomorphism $\rho\colon A \to B$, for any two $A$-modules $M$ and $N$, there is a unique isomorphism
$$\rho^*(M) \otimes_B \rho^*(N) \cong \rho^*(M \otimes_A N),$$
such that $(1 \otimes u) \otimes (1 \otimes v) \mapsto 1 \otimes (u \otimes v)$, for all $u \in M$ and all $v \in N$.
The proof uses identities from Proposition 23.7. It is not hard but it requires a little gymnastics; a good exercise for the reader.
24.5
We saw in Section 5.7 that given a linear map $f\colon E \to E$ from a $K$-vector space $E$ into itself, we can define a scalar multiplication $\cdot\colon K[X] \times E \to E$ that makes $E$ into a $K[X]$-module. If $E$ is finite-dimensional, this $K[X]$-module denoted by $E_f$ is a torsion module, and the main results of this chapter yield important direct sum decompositions of $E$ into subspaces invariant under $f$.
Recall that given any polynomial $p(X) = a_0 X^n + a_1 X^{n-1} + \cdots + a_n$ with coefficients in the field $K$, we define the linear map $p(f)\colon E \to E$ by
$$p(f) = a_0 f^n + a_1 f^{n-1} + \cdots + a_n\,\mathrm{id},$$
where $f^k = f \circ \cdots \circ f$, the $k$-fold composition of $f$ with itself. Note that
$$p(f)(u) = a_0 f^n(u) + a_1 f^{n-1}(u) + \cdots + a_n u,$$
for every vector $u \in E$. Then, we define the scalar multiplication $\cdot\colon K[X] \times E \to E$ by polynomials as follows: for every polynomial $p(X) \in K[X]$, for every $u \in E$,
$$p(X) \cdot u = p(f)(u).$$
It is easy to verify that this scalar multiplication satisfies the axioms M1, M2, M3, M4:
$$\begin{aligned} p \cdot (u + v) &= p \cdot u + p \cdot v \\ (p + q) \cdot u &= p \cdot u + q \cdot u \\ (pq) \cdot u &= p \cdot (q \cdot u) \\ 1 \cdot u &= u, \end{aligned}$$
for all $p, q \in K[X]$ and all $u, v \in E$. Thus, with this new scalar multiplication, $E$ is a $K[X]$-module denoted by $E_f$.
If $p = \lambda$ is just a scalar in $K$ (a polynomial of degree 0), then
$$\lambda \cdot u = (\lambda\,\mathrm{id})(u) = \lambda u,$$
which means that $K$ acts on $E$ by scalar multiplication as before. If $p(X) = X$ (the monomial $X$), then
$$X \cdot u = f(u).$$
Since $K$ is a field, the ring $K[X]$ is a PID.
If $E$ is finite-dimensional, say of dimension $n$, since $K$ is a subring of $K[X]$ and since $E$ is finitely generated over $K$, the $K[X]$-module $E_f$ is finitely generated over $K[X]$. Furthermore, $E_f$ is a torsion module. This follows from the Cayley-Hamilton Theorem (Theorem 5.16), but this can also be shown in an elementary fashion as follows. The space $\mathrm{Hom}(E, E)$ of linear maps of $E$ into itself is a vector space of dimension $n^2$, therefore the $n^2 + 1$ linear maps
$$\mathrm{id}, f, f^2, \ldots, f^{n^2}$$
are linearly dependent, which yields a nonzero polynomial $q$ such that $q(f) = 0$.
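To make the module structure concrete, here is a minimal sketch of the scalar multiplication $p(X) \cdot u = p(f)(u)$ for a map $f$ given by a matrix. The names `mat_vec` and `poly_act` are hypothetical, and coefficients are listed highest degree first to match the convention $p(X) = a_0 X^n + \cdots + a_n$ used above. The evaluation uses Horner's scheme, which is an implementation choice and not something taken from the text.

```python
def mat_vec(F, u):
    """Apply the linear map with matrix F (list of rows) to the vector u."""
    return [sum(F[i][j] * u[j] for j in range(len(u))) for i in range(len(F))]


def poly_act(coeffs, F, u):
    """Compute p(X) . u = p(f)(u), where f has matrix F and
    coeffs = [a0, a1, ..., an] encodes p(X) = a0 X^n + a1 X^(n-1) + ... + an."""
    result = [0] * len(u)
    for a in coeffs:
        # Horner step: result <- f(result) + a * u
        result = mat_vec(F, result)
        result = [r + a * x for r, x in zip(result, u)]
    return result


# f is the rotation by 90 degrees in the plane; it satisfies f^2 + id = 0.
F = [[0, -1], [1, 0]]
u = [3, 4]

x_times_u = poly_act([1, 0], F, u)        # X . u = f(u)
scalar_times_u = poly_act([7], F, u)      # 7 . u = 7u
annihilated = poly_act([1, 0, 1], F, u)   # (X^2 + 1) . u
```

For this choice of $f$ the polynomial $X^2 + 1$ sends every vector to zero, which illustrates in a small case why $E_f$ is a torsion module.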
We can now translate notions defined for modules into notions for endomorphisms of vector spaces.
1. To say that $U$ is a submodule of $E_f$ means that $U$ is a subspace of $E$ invariant under $f$; that is, $f(U) \subseteq U$.
2. To say that $V$ is a cyclic submodule of $E_f$ means that there is some vector $u \in V$, such that $V$ is spanned by $(u, f(u), \ldots, f^k(u), \ldots)$. If $E$ has finite dimension $n$, then $V$ is spanned by $(u, f(u), \ldots, f^k(u))$ for some $k \le n - 1$. We say that $V$ is a cyclic subspace for $f$ with generator $u$. Sometimes, $V$ is denoted by $Z(u; f)$.
3. To say that the ideal $\mathfrak{a} = (p(X))$ (with $p(X)$ a monic polynomial) is the annihilator of the submodule $V$ means that $p(f)(u) = 0$ for all $u \in V$, and we call $p$ the minimal polynomial of $V$.
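$$U = \begin{pmatrix} 0 & 0 & 0 & \cdots & 0 & -a_0 \\ 1 & 0 & 0 & \cdots & 0 & -a_1 \\ 0 & 1 & 0 & \cdots & 0 & -a_2 \\ \vdots & \ddots & \ddots & \ddots & \vdots & \vdots \\ 0 & 0 & 0 & \ddots & 0 & -a_{n-2} \\ 0 & 0 & 0 & \cdots & 1 & -a_{n-1} \end{pmatrix}.$$
It is an easy exercise to prove that the characteristic polynomial $\chi_U(X)$ of $U$ gives back $q(X)$:
$$\chi_U(X) = q(X).$$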
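We will need the following proposition to characterize when two linear maps are similar.
Proposition 24.19. Let $f\colon E \to E$ and $f'\colon E' \to E'$ be two linear maps over the vector spaces $E$ and $E'$. A linear map $g\colon E \to E'$ can be viewed as a linear map between the $K[X]$-modules $E_f$ and $E'_{f'}$ iff
$$g \circ f = f' \circ g.$$
Proof. First, suppose $g$ is $K[X]$-linear. Then, we have
$$g(p \cdot_f u) = p \cdot_{f'} g(u)$$
for all $p \in K[X]$ and all $u \in E$, so for $p = X$ we get
$$g(p \cdot_f u) = g(X \cdot_f u) = g(f(u))$$
and
$$p \cdot_{f'} g(u) = X \cdot_{f'} g(u) = f'(g(u)),$$
which means that $g \circ f = f' \circ g$. Conversely, if $g \circ f = f' \circ g$, we prove by induction that $g \circ f^n = f'^n \circ g$ for all $n \ge 1$.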
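Indeed, we have
$$\begin{aligned} g \circ f^{n+1} &= g \circ f^n \circ f \\ &= f'^n \circ g \circ f \\ &= f'^n \circ f' \circ g \\ &= f'^{n+1} \circ g, \end{aligned}$$
establishing the induction step. It follows that for any polynomial $p(X) = \sum_{k=0}^n a_k X^k$, we have
$$\begin{aligned} g(p(X) \cdot_f u) &= g\Bigl(\sum_{k=0}^n a_k f^k(u)\Bigr) \\ &= \sum_{k=0}^n a_k\, g \circ f^k(u) \\ &= \sum_{k=0}^n a_k\, f'^k \circ g(u) \\ &= \sum_{k=0}^n a_k\, f'^k(g(u)) \\ &= p(X) \cdot_{f'} g(u), \end{aligned}$$
so, $g$ is indeed $K[X]$-linear.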
Definition 24.7. We say that the linear maps $f\colon E \to E$ and $f'\colon E' \to E'$ are similar iff there is an isomorphism $g\colon E \to E'$ such that
$$f' = g \circ f \circ g^{-1},$$
or equivalently,
$$g \circ f = f' \circ g.$$
Then, Proposition 24.19 shows the following fact:
Proposition 24.20. With notation of Proposition 24.19, two linear maps $f$ and $f'$ are similar iff $g$ is an isomorphism between $E_f$ and $E'_{f'}$.
Later on, we will see that the isomorphism of finitely generated torsion modules can be characterized in terms of invariant factors, and this will be translated into a characterization of similarity of linear maps in terms of so-called similarity invariants. If $f$ and $f'$ are represented by matrices $A$ and $A'$ over bases of $E$ and $E'$, then $f$ and $f'$ are similar iff the matrices $A$ and $A'$ are similar (there is an invertible matrix $P$ such that $A' = PAP^{-1}$). Similar matrices (and endomorphisms) have the same characteristic polynomial.
It turns out that there is a useful relationship between $E_f$ and the module $K[X] \otimes_K E$. Observe that the map $\cdot\colon K[X] \times E \to E$ given by
$$p \cdot u = p(f)(u)$$
is $K$-bilinear, so it yields a $K$-linear map $\sigma\colon K[X] \otimes_K E \to E$ such that
$$\sigma(p \otimes u) = p \cdot u = p(f)(u).$$
We know from Section 24.4 that $K[X] \otimes_K E$ is a $K[X]$-module (obtained from the inclusion $K \subseteq K[X]$), which we will denote by $E[X]$. Since $E$ is a vector space, $E[X]$ is a free $K[X]$-module, and if $(u_1, \ldots, u_n)$ is a basis of $E$, then $(1 \otimes u_1, \ldots, 1 \otimes u_n)$ is a basis of $E[X]$.
The free $K[X]$-module $E[X]$ is not as complicated as it looks. Over the basis $(1 \otimes u_1, \ldots, 1 \otimes u_n)$, every element $z \in E[X]$ can be written uniquely as
$$z = p_1(1 \otimes u_1) + \cdots + p_n(1 \otimes u_n) = p_1 \otimes u_1 + \cdots + p_n \otimes u_n,$$
where $p_1, \ldots, p_n$ are polynomials in $K[X]$. For notational simplicity, we may write
$$z = p_1 u_1 + \cdots + p_n u_n,$$
where $p_1, \ldots, p_n$ are viewed as coefficients in $K[X]$. With this notation, we see that $E[X]$ is isomorphic to $(K[X])^n$, which is easy to understand.
Observe that $\sigma$ is $K[X]$-linear, because
$$\begin{aligned} \sigma(q(p \otimes u)) &= \sigma((qp) \otimes u) \\ &= (qp) \cdot u \\ &= q(f)(p(f)(u)) \\ &= q \cdot (p(f)(u)) \\ &= q \cdot \sigma(p \otimes u). \end{aligned}$$
Therefore, $\sigma$ is a linear map of $K[X]$-modules, $\sigma\colon E[X] \to E_f$. Using our simplified notation, if $z = p_1 u_1 + \cdots + p_n u_n \in E[X]$, then
$$\sigma(z) = p_1(f)(u_1) + \cdots + p_n(f)(u_n),$$
which amounts to plugging $f$ for $X$ and evaluating. Similarly, $f$ is a $K[X]$-linear map of $E_f$, because
$$\begin{aligned} f(p \cdot u) &= f(p(f)(u)) \\ &= (f \circ p(f))(u) \\ &= p(f)(f(u)) \\ &= p \cdot f(u), \end{aligned}$$
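where we used the fact that $f \circ p(f) = p(f) \circ f$ because $p(f)$ is a polynomial in $f$. By Proposition 24.16, the linear map $f\colon E \to E$ induces a $K[X]$-linear map $\overline{f}\colon E[X] \to E[X]$ such that
$$\overline{f}(p \otimes u) = p \otimes f(u).$$
Observe that we have
$$f(\sigma(p \otimes u)) = f(p(f)(u)) = p(f)(f(u))$$
and
$$\sigma(\overline{f}(p \otimes u)) = \sigma(p \otimes f(u)) = p(f)(f(u)),$$
so we get
$$\sigma \circ \overline{f} = f \circ \sigma. \qquad (*)$$
$$0 \longrightarrow E[X] \xrightarrow{\ \psi\ } E[X] \xrightarrow{\ \sigma\ } E_f \longrightarrow 0, \qquad \psi = X1 - \overline{f}.$$
This means that $\psi$ is injective, $\sigma$ is surjective, and that $\mathrm{Im}(\psi) = \mathrm{Ker}(\sigma)$. As a consequence, $E_f$ is isomorphic to the quotient of $E[X]$ by $\mathrm{Im}(X1 - \overline{f})$.
Proof. Because $\sigma(1 \otimes u) = u$ for all $u \in E$, the map $\sigma$ is surjective. We have
$$\sigma(X(p \otimes u)) = X \cdot \sigma(p \otimes u) = f(\sigma(p \otimes u)),$$
which shows that
$$\sigma \circ X1 = f \circ \sigma = \sigma \circ \overline{f},$$
using $(*)$. Consequently,
$$\sigma \circ \psi = \sigma \circ (X1 - \overline{f}) = f \circ \sigma - f \circ \sigma = 0,$$
and thus, $\mathrm{Im}(\psi) \subseteq \mathrm{Ker}(\sigma)$. It remains to prove that $\mathrm{Ker}(\sigma) \subseteq \mathrm{Im}(\psi)$.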
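Since the monomials $X^k$ form a basis of $A[X]$, by Proposition 24.13 (with the roles of $M$ and $N$ exchanged), every $z \in E[X] = A[X] \otimes_A E$ has a unique expression as
$$z = \sum_k X^k \otimes u_k,$$
with the $u_k$ in $E$ and all but finitely many of them zero. If $z \in \mathrm{Ker}(\sigma)$, then $\sigma(z) = \sum_k f^k(u_k) = 0$, so
$$\begin{aligned} z &= \sum_k X^k \otimes u_k - 1 \otimes 0 \\ &= \sum_k X^k \otimes u_k - 1 \otimes \sum_k f^k(u_k) \\ &= \sum_k \bigl(X^k \otimes u_k - 1 \otimes f^k(u_k)\bigr) \\ &= \sum_k \bigl(X^k(1 \otimes u_k) - \overline{f}^k(1 \otimes u_k)\bigr) \\ &= \sum_k (X^k 1 - \overline{f}^k)(1 \otimes u_k). \end{aligned}$$
Since $X1$ and $\overline{f}$ commute, we have
$$X^k 1 - \overline{f}^k = (X1 - \overline{f}) \Bigl(\sum_{j=0}^{k-1} (X1)^j\, \overline{f}^{k-j-1}\Bigr),$$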
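so
$$z = (X1 - \overline{f}) \Bigl(\sum_k \sum_{j=0}^{k-1} (X1)^j\, \overline{f}^{k-j-1}(1 \otimes u_k)\Bigr),$$
which shows that $z \in \mathrm{Im}(\psi)$. Finally, if $z = \sum_k X^k \otimes u_k$ and $\psi(z) = 0$, then
$$0 = (X1 - \overline{f}) \Bigl(\sum_k X^k \otimes u_k\Bigr) = \sum_k X^{k+1} \otimes \bigl(u_k - f(u_{k+1})\bigr) - 1 \otimes f(u_0),$$
so $u_k = f(u_{k+1})$ for all $k$.
Since $(u_k)$ has finite support, there is some $m$ such that $u_k = 0$ for all $k \ge m + 1$, and then from
$$u_k = f(u_{k+1}),$$
we deduce that $u_k = 0$ for all $k$. Therefore, $z = 0$, and $\psi$ is injective.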
Remark: The exact sequence of Theorem 24.21 yields a presentation of $M_f$.
Since $A[X]$ is a free $A$-module, $A[X] \otimes_A M$ is a free $A$-module, but $A[X] \otimes_A M$ is generally not a free $A[X]$-module. However, if $M$ is a free module, then $M[X]$ is a free $A[X]$-module, since if $(u_i)_{i \in I}$ is a basis for $M$, then $(1 \otimes u_i)_{i \in I}$ is a basis for $M[X]$. This allows us to define the characteristic polynomial $\chi_f(X)$ of an endomorphism of a free module $M$ as
$$\chi_f(X) = \det(X1 - \overline{f}).$$
Note that to have a correct definition, we need to define the determinant of a linear map allowing the indeterminate $X$ as a scalar, and this is what the definition of $M[X]$ achieves (among other things). Theorem 24.21 can be used to give a short proof of the Cayley-Hamilton Theorem, see Bourbaki [14] (Chapter III, Section 8, Proposition 20). Proposition 5.10 is still the crucial ingredient of the proof.
We now develop the theory necessary to understand the structure of finitely generated modules over a PID.
24.6
We begin by considering modules over a product ring obtained from a direct decomposition, as in Definition 21.3. In this section and the next, we closely follow Bourbaki [15] (Chapter VII). Let $A$ be a commutative ring and let $(\mathfrak{b}_1, \ldots, \mathfrak{b}_n)$ be ideals in $A$ such that there is an isomorphism $A \cong A/\mathfrak{b}_1 \times \cdots \times A/\mathfrak{b}_n$. From Theorem 21.16 part (b), there exist some elements $e_1, \ldots, e_n$ of $A$ such that
$$\begin{aligned} e_i^2 &= e_i \\ e_i e_j &= 0, \quad i \ne j \\ e_1 + \cdots + e_n &= 1_A, \end{aligned}$$
and $\mathfrak{b}_i = (1_A - e_i)A$, for $i, j = 1, \ldots, n$.
Proposition 24.22. Given a ring $A \cong A/\mathfrak{b}_1 \times \cdots \times A/\mathfrak{b}_n$ as above, the $A$-module $M$ is the direct sum
$$M = M_1 \oplus \cdots \oplus M_n,$$
where $M_i$ is the submodule of $M$ annihilated by $\mathfrak{b}_i$.
Proof. For $i = 1, \ldots, n$, let $p_i\colon M \to M$ be the map given by
$$p_i(x) = e_i x, \qquad x \in M.$$
The map $p_i$ is clearly linear, and because of the properties satisfied by the $e_i$'s, we have
$$\begin{aligned} p_i^2 &= p_i \\ p_i p_j &= 0, \quad i \ne j \\ p_1 + \cdots + p_n &= \mathrm{id}. \end{aligned}$$
This shows that the $p_i$ are projections, and by Proposition 4.6 (which also holds for modules), we have a direct sum
$$M = p_1(M) \oplus \cdots \oplus p_n(M) = e_1 M \oplus \cdots \oplus e_n M.$$
$$\sum_{p \in P} x_p, \qquad x_p \in M_p,$$
If
$$\sum_{p \in P} x_p = \sum_{p \in P} y_p$$
with $x_p, y_p \in M_p$ for all $p \in P$, with only finitely many $x_p$ and $y_p$ nonzero, then $x_p$ and $y_p$ are annihilated by some common nonzero element $a \in A$, so $x_p, y_p \in M(a)$. By Proposition 24.23, we must have $x_p = y_p$ for all $p$, which proves that we have a direct sum.
It is clear that if $p$ and $p'$ are two irreducible elements such that $p = up'$ for some unit $u$, then $M_p = M_{p'}$. Therefore, $M_p$ only depends on the ideal $(p)$.
Definition 24.9. Given a torsion-module $M$ over a PID, the modules $M_p$ associated with irreducible elements in $P$ are called the $p$-primary components of $M$.
The $p$-primary components of a torsion module uniquely determine the module, as shown by the next proposition.
Proposition 24.25. Two torsion modules $M$ and $N$ over a PID are isomorphic iff for every irreducible element $p \in P$, the $p$-primary components $M_p$ and $N_p$ of $M$ and $N$ are isomorphic.
Proof. Let $f\colon M \to N$ be an isomorphism. For any $p \in P$, we have $x \in M_p$ iff $p^k x = 0$ for some $k \ge 1$, so
$$0 = f(p^k x) = p^k f(x),$$
which shows that $f(x) \in N_p$. Therefore, $f$ restricts to a linear map $f \mid M_p$ from $M_p$ to $N_p$. Since $f$ is an isomorphism, we also have a linear map $f^{-1}\colon N \to M$, and our previous reasoning shows that $f^{-1}$ restricts to a linear map $f^{-1} \mid N_p$ from $N_p$ to $M_p$. But, $f \mid M_p$ and $f^{-1} \mid N_p$ are mutual inverses, so $M_p$ and $N_p$ are isomorphic.
Conversely, if $M_p \cong N_p$ for all $p \in P$, by Theorem 24.24, we get an isomorphism between $M = \bigoplus_{p \in P} M_p$ and $N = \bigoplus_{p \in P} N_p$.
In view of Proposition 24.25, the direct sum of Theorem 24.24 in terms of its $p$-primary components is called the canonical primary decomposition of $M$.
If $M$ is a finitely generated torsion-module, then Theorem 24.24 takes the following form.
Theorem 24.26. (Primary Decomposition Theorem for finitely generated torsion modules) Let $M$ be a finitely generated torsion-module over a PID $A$. If $\mathrm{Ann}(M) = (a)$ and if $a = u p_1^{n_1} \cdots p_r^{n_r}$ is a factorization of $a$ into prime factors, then $M$ is the finite direct sum
$$M = \bigoplus_{i=1}^r M(p_i^{n_i}).$$
for all $n \ge 1$.
24.7
There are several ways of obtaining the decomposition of a finitely generated module as a direct sum of cyclic modules. One way to proceed is to first use the Primary Decomposition Theorem and then to show how each primary module $M_p$ is the direct sum of cyclic modules of the form $A/(p^n)$. This is the approach followed by Lang [67] (Chapter III, section 7), among others. We prefer to use a proposition that produces a particular basis for a submodule of a finitely generated free module, because it yields more information. This is the approach followed in Dummit and Foote [32] (Chapter 12) and Bourbaki [15] (Chapter VII). The proof that we present is due to Pierre Samuel.
Proposition 24.30. Let $F$ be a finitely generated free module over a PID $A$, and let $M$ be any submodule of $F$. Then, $M$ is a free module and there is a basis $(e_1, \ldots, e_n)$ of $F$, some $q \le n$, and some nonzero elements $a_1, \ldots, a_q \in A$, such that $(a_1 e_1, \ldots, a_q e_q)$ is a basis of $M$ and $a_i$ divides $a_{i+1}$ for all $i$, with $1 \le i \le q - 1$.
Proof. The proposition is trivial when $M = \{0\}$, thus assume that $M$ is nontrivial. Pick some basis $(u_1, \ldots, u_n)$ for $F$. Let $L(F, A)$ be the set of linear forms on $F$. For any $f \in L(F, A)$, it is immediately verified that $f(M)$ is an ideal in $A$. Thus, $f(M) = a_h A$, for some $a_h \in A$, since every ideal in $A$ is a principal ideal. Since $A$ is a PID, any nonempty family of ideals in $A$ has a maximal element, so let $f$ be a linear form such that $a_h A = f(M)$ is maximal among the ideals $g(M)$ with $g \in L(F, A)$. Let $\pi_i\colon F \to A$ be the $i$-th projection, i.e., $\pi_i$ is defined such that $\pi_i(x_1 u_1 + \cdots + x_n u_n) = x_i$. It is clear that $\pi_i$ is a linear map, and since $M$ is nontrivial, one of the $\pi_i(M)$ is nontrivial, and $a_h \ne 0$. There is some $e' \in M$ such that $f(e') = a_h$.
We claim that, for every $g \in L(F, A)$, the element $a_h \in A$ divides $g(e')$.
Indeed, if $d$ is the gcd of $a_h$ and $g(e')$, by the Bézout identity, we can write
$$d = r a_h + s g(e'),$$
and
with $e' = a_h e$.
$$M = Ae' \oplus (M \cap f^{-1}(0)),$$
To prove that we have a direct sum, it is enough to prove that $Ae \cap f^{-1}(0) = \{0\}$. For any $x = re \in Ae$, if $f(x) = 0$, then $f(re) = r f(e) = r = 0$, since $f(e) = 1$ and, thus, $x = 0$. Therefore, the sums are direct sums.
We can now prove that $M$ is a free module by induction on the size, $q$, of a maximal linearly independent family for $M$.
If $q = 0$, the result is trivial. Otherwise, since
$$M = Ae' \oplus (M \cap f^{-1}(0)),$$
it is clear that $M \cap f^{-1}(0)$ is a submodule of $F$ and that every maximal linearly independent family in $M \cap f^{-1}(0)$ has at most $q - 1$ elements. By the induction hypothesis, $M \cap f^{-1}(0)$ is a free module, and by adding $e'$ to a basis of $M \cap f^{-1}(0)$, we obtain a basis for $M$, since the sum is direct.
The second part is shown by induction on the dimension $n$ of $F$.
The case $n = 0$ is trivial. Otherwise, since
$$F = Ae \oplus f^{-1}(0),$$
and since, by the previous argument, $f^{-1}(0)$ is also free, $f^{-1}(0)$ has dimension $n - 1$. By the induction hypothesis applied to its submodule $M \cap f^{-1}(0)$, there is a basis $(e_2, \ldots, e_n)$ of $f^{-1}(0)$, some $q \le n$, and some nonzero elements $a_2, \ldots, a_q \in A$, such that $(a_2 e_2, \ldots, a_q e_q)$ is a basis of $M \cap f^{-1}(0)$, and $a_i$ divides $a_{i+1}$ for all $i$, with $2 \le i \le q - 1$. Let $e_1 = e$, and $a_1 = a_h$, as above. It is clear that $(e_1, \ldots, e_n)$ is a basis of $F$, and that $(a_1 e_1, \ldots, a_q e_q)$ is a basis of $M$, since the sums are direct, and $e' = a_1 e_1 = a_h e$. It remains to show that $a_1$ divides $a_2$. Consider the linear map $g\colon F \to A$ such that $g(e_1) = g(e_2) = 1$, and $g(e_i) = 0$, for all $i$, with $3 \le i \le n$. We have $a_h = a_1 = g(a_1 e_1) = g(e') \in g(M)$, and thus $a_h A \subseteq g(M)$. Since $a_h A$ is maximal, we must have $g(M) = a_h A = a_1 A$. Since $a_2 = g(a_2 e_2) \in g(M)$, we have $a_2 \in a_1 A$, which shows that $a_1$ divides $a_2$.
We need the following basic proposition.
Proposition 24.31. For any commutative ring $A$, if $F$ is a free $A$-module and if $(e_1, \ldots, e_n)$ is a basis of $F$, for any elements $a_1, \ldots, a_n \in A$, there is an isomorphism
$$F/(Aa_1 e_1 \oplus \cdots \oplus Aa_n e_n) \cong (A/a_1 A) \oplus \cdots \oplus (A/a_n A).$$
Proof. Let $\sigma\colon F \to A/(a_1 A) \oplus \cdots \oplus A/(a_n A)$ be the linear map given by
$$\sigma(x_1 e_1 + \cdots + x_n e_n) = (\overline{x}_1, \ldots, \overline{x}_n),$$
where $\overline{x}_i$ is the equivalence class of $x_i$ in $A/a_i A$. The map $\sigma$ is clearly surjective, and its kernel consists of all vectors $x_1 e_1 + \cdots + x_n e_n$ such that $x_i \in a_i A$, for $i = 1, \ldots, n$, which means that
$$\mathrm{Ker}(\sigma) = Aa_1 e_1 \oplus \cdots \oplus Aa_n e_n.$$
Since $F/\mathrm{Ker}(\sigma)$ is isomorphic to $\mathrm{Im}(\sigma)$, we get the desired isomorphism.
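We can now prove the existence part of the structure theorem for finitely generated modules over a PID.
Theorem 24.32. Let $M$ be a finitely generated nontrivial $A$-module, where $A$ is a PID. Then, $M$ is isomorphic to a direct sum of cyclic modules
$$M \cong A/\mathfrak{a}_1 \oplus \cdots \oplus A/\mathfrak{a}_m,$$
where the $\mathfrak{a}_i$ are proper ideals of $A$ (possibly zero) such that
$$\mathfrak{a}_1 \subseteq \mathfrak{a}_2 \subseteq \cdots \subseteq \mathfrak{a}_m \ne A.$$
More precisely, if $\mathfrak{a}_1 = \cdots = \mathfrak{a}_r = (0)$ and $(0) \ne \mathfrak{a}_{r+1} \subseteq \cdots \subseteq \mathfrak{a}_m \ne A$, then
$$M \cong A^r \oplus (A/\mathfrak{a}_{r+1} \oplus \cdots \oplus A/\mathfrak{a}_m),$$
where $A/\mathfrak{a}_{r+1} \oplus \cdots \oplus A/\mathfrak{a}_m$ is the torsion submodule of $M$. The module $M$ is free iff $r = m$, and a torsion-module iff $r = 0$. In the latter case, the annihilator of $M$ is $\mathfrak{a}_1$.
Proof. Since $M$ is finitely generated and nontrivial, there is a surjective homomorphism $\varphi\colon A^n \to M$ for some $n \ge 1$, and $M$ is isomorphic to $A^n/\mathrm{Ker}(\varphi)$. Since $\mathrm{Ker}(\varphi)$ is a submodule of the free module $A^n$, by Proposition 24.30, $\mathrm{Ker}(\varphi)$ is a free module and there is a basis $(e_1, \ldots, e_n)$ of $A^n$ and some nonzero elements $a_1, \ldots, a_q$ ($q \le n$) such that $(a_1 e_1, \ldots, a_q e_q)$ is a basis of $\mathrm{Ker}(\varphi)$ and $a_1 \mid a_2 \mid \cdots \mid a_q$. Let $a_{q+1} = \cdots = a_n = 0$.
By Proposition 24.31, we have an isomorphism
$$M \cong A^n/\mathrm{Ker}(\varphi) \cong A/a_1 A \oplus \cdots \oplus A/a_n A.$$
Discarding the factors $A/a_i A = (0)$ for which $a_i$ is a unit, say $a_1, \ldots, a_s$, and letting $r = n - q$ and $m = n - s$, we are left with the sequence $a_{s+1}, \ldots, a_q, a_{q+1}, \ldots, a_n$, where $a_{s+1} \mid a_{s+2} \mid \cdots \mid a_q$ are nonzero and nonunits and $a_{q+1} = \cdots = a_n = 0$, so we define the $m$ ideals $\mathfrak{a}_i$ as follows:
$$\mathfrak{a}_i = \begin{cases} (0) & \text{if } 1 \le i \le r \\ a_{r+q+1-i} A & \text{if } r + 1 \le i \le m. \end{cases}$$
With these definitions, the ideals $\mathfrak{a}_i$ are proper ideals and we have
$$\mathfrak{a}_i \subseteq \mathfrak{a}_{i+1}, \qquad i = 1, \ldots, m - 1.$$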
The natural number $r$ is called the free rank or Betti number of the module $M$. The generators $\alpha_1, \ldots, \alpha_m$ of the ideals $\mathfrak{a}_1, \ldots, \mathfrak{a}_m$ (defined up to a unit) are often called the invariant factors of $M$ (in the notation of Theorem 24.32, the generators of the ideals $\mathfrak{a}_1, \ldots, \mathfrak{a}_m$ are denoted by $a_q, \ldots, a_{s+1}$, $s \le q$).
As corollaries of Theorem 24.32, we obtain again the following facts established in Section 24.1:
1. A finitely generated module over a PID is the direct sum of its torsion module and a free module.
2. A finitely generated torsion-free module over a PID is free.
It turns out that the ideals $\mathfrak{a}_1 \subseteq \mathfrak{a}_2 \subseteq \cdots \subseteq \mathfrak{a}_m \ne A$ are uniquely determined by the module $M$. Uniqueness proofs found in most books tend to be intricate and not very intuitive. The shortest proof that we are aware of is from Bourbaki [15] (Chapter VII, Section 4), and uses wedge products.
The following preliminary results are needed.
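Proposition 24.33. If $A$ is a commutative ring and if $\mathfrak{a}_1, \ldots, \mathfrak{a}_m$ are ideals of $A$, then there is an isomorphism
$$A/\mathfrak{a}_1 \otimes \cdots \otimes A/\mathfrak{a}_m \cong A/(\mathfrak{a}_1 + \cdots + \mathfrak{a}_m).$$
Sketch of proof. We proceed by induction on $m$. For $m = 2$, we define the map $\varphi\colon A/\mathfrak{a}_1 \times A/\mathfrak{a}_2 \to A/(\mathfrak{a}_1 + \mathfrak{a}_2)$ by
$$\varphi(\overline{a}, \overline{b}) = ab \pmod{\mathfrak{a}_1 + \mathfrak{a}_2}.$$
It is also clear that this map is bilinear, so it induces a linear map $\varphi\colon A/\mathfrak{a}_1 \otimes A/\mathfrak{a}_2 \to A/(\mathfrak{a}_1 + \mathfrak{a}_2)$ such that $\varphi(\overline{a} \otimes \overline{b}) = ab \pmod{\mathfrak{a}_1 + \mathfrak{a}_2}$.
Next, observe that any arbitrary tensor
$$\overline{a}_1 \otimes \overline{b}_1 + \cdots + \overline{a}_n \otimes \overline{b}_n$$
in $A/\mathfrak{a}_1 \otimes A/\mathfrak{a}_2$ can be rewritten as
$$\overline{1} \otimes (\overline{a_1 b_1} + \cdots + \overline{a_n b_n}),$$
which is of the form $\overline{1} \otimes \overline{s}$, with $s \in A$. We can use this fact to show that $\varphi$ is injective and surjective, and thus an isomorphism.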
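For $A = \mathbb{Z}$ the proposition becomes very concrete, since $m\mathbb{Z} + n\mathbb{Z} = \gcd(m, n)\mathbb{Z}$. The following is a minimal sketch under that assumption; the function names are hypothetical. It returns the generator of the sum of the ideals and checks by brute force that the map $(\overline{a}, \overline{b}) \mapsto ab$ does not depend on the chosen representatives.

```python
from math import gcd


def cyclic_tensor_modulus(*moduli):
    """Return g such that Z/m1 (x) ... (x) Z/mk is isomorphic to Z/gZ,
    using (m1) + ... + (mk) = (gcd(m1, ..., mk)) in the ring Z.
    A result of 1 means the tensor product is the zero module."""
    g = 0
    for m in moduli:
        g = gcd(g, m)
    return g


def product_map_is_well_defined(m, n):
    """Brute-force check that (a, b) -> a*b mod gcd(m, n) gives the same
    value when a is changed by a multiple of m and b by a multiple of n."""
    g = gcd(m, n)
    return all(
        ((a + m) * (b + n)) % g == (a * b) % g
        for a in range(m)
        for b in range(n)
    )


mixed = cyclic_tensor_modulus(4, 6)      # Z/4 (x) Z/6 is Z/2
coprime = cyclic_tensor_modulus(3, 5)    # Z/3 (x) Z/5 is (0)
```

The coprime case recovers the earlier observation that $\mathbb{Z}/p\mathbb{Z} \otimes_{\mathbb{Z}} \mathbb{Z}/q\mathbb{Z} = (0)$ when $p$ and $q$ are relatively prime.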
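$$\begin{aligned} &= \overline{1} \otimes (\overline{a + b}) \\ &= \overline{1} \otimes \overline{a} + \overline{1} \otimes \overline{b} \\ &= \overline{a} \otimes \overline{1} + \overline{1} \otimes \overline{b} \\ &= 0 + 0 = 0, \end{aligned}$$
$$\bigwedge M = \bigoplus_{k \ge 0} \bigwedge^k (M).$$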
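A proof can be found in Bourbaki [14] (Chapter III, Section 7, No 7, Proposition 10).
Proposition 24.35. Let $A$ be a commutative ring and let $\mathfrak{a}_1, \ldots, \mathfrak{a}_n$ be $n$ ideals of $A$. If the module $M$ is the direct sum of $n$ cyclic modules
$$M = A/\mathfrak{a}_1 \oplus \cdots \oplus A/\mathfrak{a}_n,$$
then for every $p > 0$, the exterior power $\bigwedge^p M$ is isomorphic to the direct sum of the modules $A/\mathfrak{a}_H$, where $H$ ranges over all subsets $H \subseteq \{1, \ldots, n\}$ with $p$ elements, and with
$$\mathfrak{a}_H = \sum_{h \in H} \mathfrak{a}_h.$$
Proof. If $u_i$ is the image of 1 in $A/\mathfrak{a}_i$, then $A/\mathfrak{a}_i$ is equal to $Au_i$. By Proposition 24.34, we have
$$\bigwedge M \cong \bigotimes_{i=1}^n \bigwedge (Au_i).$$
We also have
$$\bigwedge (Au_i) = \bigoplus_{k \ge 0} \bigwedge^k (Au_i) \cong A \oplus Au_i,$$
so
$$\bigwedge^p M \cong \bigoplus_{\substack{H \subseteq \{1, \ldots, n\} \\ H = \{k_1, \ldots, k_p\}}} (Au_{k_1}) \otimes \cdots \otimes (Au_{k_p}).$$
By Proposition 24.33, we obtain
$$\bigwedge^p M \cong \bigoplus_{\substack{H \subseteq \{1, \ldots, n\} \\ |H| = p}} A/\mathfrak{a}_H,$$
as claimed.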
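When the ideals $\mathfrak{a}_i$ form a chain of inclusions $\mathfrak{a}_1 \subseteq \cdots \subseteq \mathfrak{a}_n$, we get the following remarkable result.
Proposition 24.36. Let $A$ be a commutative ring and let $\mathfrak{a}_1, \ldots, \mathfrak{a}_n$ be $n$ ideals of $A$ such that $\mathfrak{a}_1 \subseteq \mathfrak{a}_2 \subseteq \cdots \subseteq \mathfrak{a}_n$. If the module $M$ is the direct sum of $n$ cyclic modules
$$M = A/\mathfrak{a}_1 \oplus \cdots \oplus A/\mathfrak{a}_n,$$
then for every $p$ with $1 \le p \le n$, the ideal $\mathfrak{a}_p$ is the annihilator of the exterior power $\bigwedge^p M$. If $\mathfrak{a}_n \ne A$, then $\bigwedge^p M \ne (0)$ for $p = 1, \ldots, n$, and $\bigwedge^p M = (0)$ for $p > n$.
Proof. With the notation of Proposition 24.35, we have $\mathfrak{a}_H = \mathfrak{a}_{\max(H)}$, where $\max(H)$ is the greatest element in the set $H$. Since $\max(H) \ge p$ for any subset with $p$ elements and since $\max(H) = p$ when $H = \{1, \ldots, p\}$, we see that
$$\mathfrak{a}_p = \bigcap_{\substack{H \subseteq \{1, \ldots, n\} \\ |H| = p}} \mathfrak{a}_H.$$
By Proposition 24.35,
$$\bigwedge^p M \cong \bigoplus_{\substack{H \subseteq \{1, \ldots, n\} \\ |H| = p}} A/\mathfrak{a}_H,$$
so $\mathfrak{a}_p$ is the annihilator of $\bigwedge^p M$.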
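Proposition 24.42. If $X$ is an $m \times n$ matrix of rank $r$ over a PID $A$, then there exist some invertible $n \times n$ matrix $P$, some invertible $m \times m$ matrix $Q$, and an $m \times n$ matrix $D$ of the form
$$D = \begin{pmatrix} \alpha_1 & 0 & \cdots & 0 & 0 & \cdots & 0 \\ 0 & \alpha_2 & \cdots & 0 & 0 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & \alpha_r & 0 & \cdots & 0 \\ 0 & 0 & \cdots & 0 & 0 & \cdots & 0 \\ \vdots & \vdots & & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 0 & 0 & \cdots & 0 \end{pmatrix}$$
for some nonzero $\alpha_i \in A$, such that
(1) $\alpha_1 \mid \alpha_2 \mid \cdots \mid \alpha_r$,
(2) $X = QDP^{-1}$, and
(3) The $\alpha_i$'s are uniquely determined up to a unit.
The ideals $\alpha_1 A, \ldots, \alpha_r A$ are called the invariant factors of the matrix $X$. Recall that two $m \times n$ matrices $X$ and $Y$ are equivalent iff
$$Y = QXP^{-1},$$
for some invertible matrices, $P$ and $Q$. Then, Proposition 24.42 implies the following fact.
Proposition 24.43. Two $m \times n$ matrices $X$ and $Y$ are equivalent iff they have the same invariant factors.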
If $X$ is the matrix of a linear map $f\colon F \to F'$ with respect to some basis $(u_1, \ldots, u_n)$ of $F$ and some basis $(u'_1, \ldots, u'_m)$ of $F'$, then the columns of $X$ are the coordinates of the $f(u_j)$ over the $u'_i$, where the $f(u_j)$ generate $f(F)$, so Proposition 24.40 applies and yields the following result:
Proposition 24.44. If $X$ is an $m \times n$ matrix of rank $r$ over a PID $A$, and if $\alpha_1 A, \ldots, \alpha_r A$ are its invariant factors, then $\alpha_1$ is a gcd of the entries in $X$, and for $k = 2, \ldots, r$, the product $\alpha_1 \cdots \alpha_k$ is a gcd of all $k \times k$ minors of $X$.
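Proposition 24.44 gives a direct, if inefficient, way to compute invariant factors. Here is a minimal sketch for $A = \mathbb{Z}$; the function names are hypothetical, and the Laplace-expansion determinant is only meant for small matrices. It computes the gcd $d_k$ of all $k \times k$ minors and recovers $\alpha_k = d_k / d_{k-1}$, with $d_0 = 1$.

```python
from itertools import combinations
from math import gcd


def det(M):
    """Determinant of a square integer matrix by Laplace expansion
    along the first row (adequate for small sizes only)."""
    if len(M) == 1:
        return M[0][0]
    total = 0
    for j, entry in enumerate(M[0]):
        minor = [row[:j] + row[j + 1:] for row in M[1:]]
        total += (-1) ** j * entry * det(minor)
    return total


def minors_gcd(X, k):
    """Nonnegative gcd of all k x k minors of the integer matrix X."""
    g = 0
    for rows in combinations(range(len(X)), k):
        for cols in combinations(range(len(X[0])), k):
            g = gcd(g, det([[X[i][j] for j in cols] for i in rows]))
    return g


def invariant_factors(X):
    """Generators alpha_1 | alpha_2 | ... | alpha_r of the invariant
    factors of X, obtained as quotients of consecutive minor gcds."""
    factors, previous = [], 1
    for k in range(1, min(len(X), len(X[0])) + 1):
        d = minors_gcd(X, k)
        if d == 0:          # all k x k minors vanish: rank is k - 1
            break
        factors.append(d // previous)
        previous = d
    return factors
```

For instance, `invariant_factors([[2, -1], [1, 2]])` gives `[1, 5]`, since the gcd of the entries is 1 and the determinant is 5. The number of minors grows quickly with the size of the matrix, so this is a check on small examples and not a practical algorithm.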
There are algorithms for converting a matrix $X$ over a PID to the form $X = QDP^{-1}$ as described in Proposition 24.42. For Euclidean domains, this can be achieved by using the elementary row and column operations $P(i, k)$, $E_{i,j;\beta}$, and $E_{i,\lambda}$ described in Chapter 6, where we require the scalar $\lambda$ used in $E_{i,\lambda}$ to be a unit. For an arbitrary PID, another kind of elementary matrix (containing some $2 \times 2$ submatrix in addition to diagonal entries) is needed. These procedures involve computing gcd's and use the Bézout identity to mimic division. Such methods are presented in Serre [96], Jacobson [59], and Van Der Waerden [112], and sketched in Artin [4]. We describe and justify several of these methods in Section 25.4.
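As an illustration of the Euclidean case, here is a minimal sketch of a diagonalization by elementary row and column operations for $A = \mathbb{Z}$. It is not the procedure of Section 25.4, only one common variant; the function name is hypothetical, and the sketch returns the diagonal form $D$ without recording the matrices $P$ and $Q$.

```python
def smith_diagonal(X):
    """Diagonalize an integer matrix by row/column swaps, adding integer
    multiples of one row (column) to another, and sign changes.
    Returns D with diagonal d1 | d2 | ... and zeros elsewhere."""
    A = [row[:] for row in X]
    m, n = len(A), len(A[0])
    t = 0
    while t < min(m, n):
        # Pick a nonzero entry of least absolute value in the remaining block.
        pivot = min(((abs(A[i][j]), i, j) for i in range(t, m)
                     for j in range(t, n) if A[i][j] != 0), default=None)
        if pivot is None:
            break                       # remaining block is zero
        _, pi, pj = pivot
        A[t], A[pi] = A[pi], A[t]       # row swap
        for row in A:                   # column swap
            row[t], row[pj] = row[pj], row[t]
        p = A[t][t]
        # Euclidean division: reduce column t and row t modulo the pivot.
        for i in range(t + 1, m):
            q = A[i][t] // p
            A[i] = [x - q * y for x, y in zip(A[i], A[t])]
        for j in range(t + 1, n):
            q = A[t][j] // p
            for row in A:
                row[j] -= q * row[t]
        if any(A[i][t] for i in range(t + 1, m)) or any(A[t][t + 1:]):
            continue                    # a smaller remainder appeared; retry
        # Enforce divisibility: the pivot must divide the remaining block.
        bad = next((i for i in range(t + 1, m)
                    for j in range(t + 1, n) if A[i][j] % p != 0), None)
        if bad is not None:
            A[t] = [x + y for x, y in zip(A[t], A[bad])]
            continue
        A[t][t] = abs(p)                # multiply the row by the unit -1 if needed
        t += 1
    return A
```

Each retry strictly decreases the absolute value of the pivot, which is why the loop terminates; this is the place where Euclidean division is used.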
From Section 24.2, we know that a submodule of a finitely generated module over a PID is finitely presented. Therefore, in Proposition 24.39, the submodule $M$ of the free module $F$ is finitely presented by some matrix $R$ with a number of rows equal to the dimension of $F$. Using Theorem 25.17, the matrix $R$ can be diagonalized as $R = QDP^{-1}$ where $D$ is a diagonal matrix. Then, the columns of $Q$ form a basis $(e_1, \ldots, e_n)$ of $F$, and since $RP = QD$, the nonzero columns of $RP$ form the basis $(a_1 e_1, \ldots, a_q e_q)$ of $M$. When the ring $A$ is a Euclidean domain, Theorem 25.14 shows that $P$ and $Q$ are products of elementary row and column operations. In particular, when $A = \mathbb{Z}$, in which case our $\mathbb{Z}$-modules are abelian groups, we can find $P$ and $Q$ using Euclidean division.
In this case, a finitely generated submodule $M$ of $\mathbb{Z}^n$ is called a lattice. It is given as the set of integral linear combinations of a finite set of integral vectors.
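Here is an example taken from Artin [4] (Chapter 12, Section 4). Let $F$ be the free $\mathbb{Z}$-module $\mathbb{Z}^2$, and let $M$ be the lattice generated by the columns of the matrix
$$R = \begin{pmatrix} 2 & -1 \\ 1 & 2 \end{pmatrix}.$$
The columns $(u_1, u_2)$ of $R$ are linearly independent, but they are not a basis of $\mathbb{Z}^2$. For example, in order to obtain $e_1$ as a linear combination of these columns, we would need to solve the linear system
$$\begin{aligned} 2x - y &= 1 \\ x + 2y &= 0. \end{aligned}$$
From the second equation, we get $x = -2y$, which yields
$$-5y = 1.$$
But, $y = -1/5$ is not an integer. We leave it as an exercise to check that
$$\begin{pmatrix} 1 & 0 \\ -3 & 1 \end{pmatrix} \begin{pmatrix} 2 & -1 \\ 1 & 2 \end{pmatrix} \begin{pmatrix} 1 & 1 \\ 1 & 2 \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 5 \end{pmatrix},$$
which means that
$$\begin{pmatrix} 2 & -1 \\ 1 & 2 \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 3 & 1 \end{pmatrix} \begin{pmatrix} 1 & 0 \\ 0 & 5 \end{pmatrix} \begin{pmatrix} 2 & -1 \\ -1 & 1 \end{pmatrix},$$
so $R = QDP^{-1}$ with
$$Q = \begin{pmatrix} 1 & 0 \\ 3 & 1 \end{pmatrix}, \qquad D = \begin{pmatrix} 1 & 0 \\ 0 & 5 \end{pmatrix}, \qquad P = \begin{pmatrix} 1 & 1 \\ 1 & 2 \end{pmatrix}.$$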
The new basis $(u'_1, u'_2)$ for $\mathbb{Z}^2$ consists of the columns of $Q$ and the new basis for $M$ consists of the columns $(u'_1, 5u'_2)$ of $QD$, where
$$QD = \begin{pmatrix} 1 & 0 \\ 3 & 5 \end{pmatrix}.$$
A picture of the lattice and its generators $(u_1, u_2)$ and of the same lattice with the new basis $(u'_1, 5u'_2)$ is shown in Figure 24.1, where the lattice points are displayed as stars.
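[Figure 24.1: the lattice $M$ with its generators $(u_1, u_2)$, and the same lattice with the new basis $(u'_1, 5u'_2)$; lattice points are displayed as stars.]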
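The exercise in this example can be checked numerically. The sketch below assumes the signs $R = \begin{pmatrix} 2 & -1 \\ 1 & 2 \end{pmatrix}$, $Q^{-1} = \begin{pmatrix} 1 & 0 \\ -3 & 1 \end{pmatrix}$ and $P^{-1} = \begin{pmatrix} 2 & -1 \\ -1 & 1 \end{pmatrix}$, which are the ones that make the stated products work out; the helper `mat_mul` is hypothetical.

```python
def mat_mul(A, B):
    """Product of two integer matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


R = [[2, -1], [1, 2]]
Q = [[1, 0], [3, 1]]
Q_inv = [[1, 0], [-3, 1]]
D = [[1, 0], [0, 5]]
P = [[1, 1], [1, 2]]
P_inv = [[2, -1], [-1, 1]]
identity = [[1, 0], [0, 1]]

# Q and P are invertible over Z (their inverses have integer entries).
inverses_ok = mat_mul(Q, Q_inv) == identity and mat_mul(P, P_inv) == identity

# Q^{-1} R P = D, equivalently R = Q D P^{-1}.
diagonalized = mat_mul(mat_mul(Q_inv, R), P)
reconstructed = mat_mul(mat_mul(Q, D), P_inv)

# R P = Q D: its columns (1, 3) and (0, 5) are the new basis (u1', 5 u2') of M.
new_basis_of_M = mat_mul(R, P)
```

Since $RP = QD$, the columns $(1, 3)$ and $(0, 5)$ of this product generate the same lattice as the columns of $R$, which is the change of basis described in the text.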
The invariant factor decomposition of a finitely generated module $M$ over a PID $A$ given by Theorem 24.38 says that
$$M_{\mathrm{tor}} \cong A/\mathfrak{a}_{r+1} \oplus \cdots \oplus A/\mathfrak{a}_m,$$
a direct sum of cyclic modules, with $(0) \ne \mathfrak{a}_{r+1} \subseteq \cdots \subseteq \mathfrak{a}_m \ne A$. Using the Chinese Remainder Theorem (Theorem 21.15), we can further decompose each module $A/\alpha_i A$ into a direct sum of modules of the form $A/p^n A$, where $p$ is a prime in $A$.
Theorem 24.45. (Elementary Divisors Decomposition) Let $M$ be a finitely generated nontrivial $A$-module, where $A$ is a PID. Then, $M$ is isomorphic to the direct sum $A^r \oplus M_{\mathrm{tor}}$, where $A^r$ is a free module and where the torsion module $M_{\mathrm{tor}}$ is a direct sum of cyclic modules of the form $A/p_i^{n_{i,j}}$, for some primes $p_1, \ldots, p_t \in A$ and some positive integers $n_{i,j}$, such that for each $i = 1, \ldots, t$, there is a sequence of integers
$$1 \le \underbrace{n_{i,1}, \ldots, n_{i,1}}_{m_{i,1}} < \underbrace{n_{i,2}, \ldots, n_{i,2}}_{m_{i,2}} < \cdots < \underbrace{n_{i,s_i}, \ldots, n_{i,s_i}}_{m_{i,s_i}},$$
with $s_i \ge 1$, and where $n_{i,j}$ occurs $m_{i,j} \ge 1$ times, for $j = 1, \ldots, s_i$. Furthermore, the irreducible elements $p_i$ and the integers $r, t, n_{i,j}, s_i, m_{i,j}$ are uniquely determined.
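Proof. By Theorem 24.38, we already know that $M \cong A^r \oplus M_{\mathrm{tor}}$, where $r$ is uniquely determined, and where
$$M_{\mathrm{tor}} \cong A/\mathfrak{a}_{r+1} \oplus \cdots \oplus A/\mathfrak{a}_m,$$
a direct sum of cyclic modules, with $(0) \ne \mathfrak{a}_{r+1} \subseteq \cdots \subseteq \mathfrak{a}_m \ne A$. Then, each $\mathfrak{a}_i$ is a principal ideal of the form $\alpha_i A$, where $\alpha_i \ne 0$ and $\alpha_i$ is not a unit. Using the Chinese Remainder Theorem (Theorem 21.15), if we factor $\alpha_i$ into prime factors as
$$\alpha_i = u p_1^{k_1} \cdots p_h^{k_h},$$
with $k_j \ge 1$, we get an isomorphism
$$A/\alpha_i A \cong A/p_1^{k_1} A \oplus \cdots \oplus A/p_h^{k_h} A.$$
This implies that $M_{\mathrm{tor}}$ is the direct sum of modules of the form $A/p_i^{n_{i,j}}$, for some primes $p_i \in A$.
To prove uniqueness, observe that the $p_i$-primary component of $M_{\mathrm{tor}}$ is the direct sum
$$(A/p_i^{n_{i,1}} A)^{m_{i,1}} \oplus \cdots \oplus (A/p_i^{n_{i,s_i}} A)^{m_{i,s_i}},$$
and these are uniquely determined. Since $n_{i,1} < \cdots < n_{i,s_i}$, we have
$$p_i^{n_{i,s_i}} A \subseteq \cdots \subseteq p_i^{n_{i,1}} A \ne A,$$
Proposition 24.37 implies that the irreducible elements $p_i$ and $n_{i,j}$, $s_i$, and $m_{i,j}$ are unique.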
In view of Theorem 24.45, we make the following definition.
Definition 24.11. Given a finitely generated module $M$ over a PID $A$ as in Theorem 24.45,
the ideals $p_i^{n_{i,j}} A$ are called the elementary divisors of $M$, and the $m_{i,j}$ are their multiplicities.
The ideal $(0)$ is also considered to be an elementary divisor and $r$ is its multiplicity.
Remark: Theorem 24.45 shows how the elementary divisors are obtained from the invariant
factors: the elementary divisors are the prime power factors of the invariant factors.
Conversely, we can get the invariant factors from the elementary divisors. We may assume
that M is a torsion module. Let
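\[
m = \max_{1 \le i \le t} \{ m_{i,1} + \cdots + m_{i,s_i} \},
\]
and construct the $t \times m$ matrix $C = (c_{ij})$ whose $i$th row is the sequence
\[
\underbrace{n_{i,s_i}, \ldots, n_{i,s_i}}_{m_{i,s_i}}, \ldots, \underbrace{n_{i,2}, \ldots, n_{i,2}}_{m_{i,2}}, \underbrace{n_{i,1}, \ldots, n_{i,1}}_{m_{i,1}}, 0, \ldots, 0,
\]
padded with 0s if necessary to make it of length $m$. Then, the $j$th invariant factor is
\[
\alpha_j A = p_1^{c_{1j}} p_2^{c_{2j}} \cdots p_t^{c_{tj}} A, \qquad j = 1, \ldots, m.
\]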
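To make the two directions of the remark concrete, here is a minimal sketch for the PID $A = \mathbb{Z}$, where primes can be found by trial division. The function names are illustrative assumptions, torsion modules only are handled, and ideals are represented by positive generators. The invariant factors are returned in the order of the columns of $C$, so each one is divisible by the next.

```python
from collections import defaultdict

def prime_power_factors(n):
    """Return {p: k} with n equal to the product of the p**k (trial division, n >= 2)."""
    factors, p = {}, 2
    while p * p <= n:
        while n % p == 0:
            factors[p] = factors.get(p, 0) + 1
            n //= p
        p += 1
    if n > 1:
        factors[n] = factors.get(n, 0) + 1
    return factors

def elementary_divisors(invariant_factors):
    """Prime power factors of the invariant factors, listed with repetition."""
    divisors = []
    for alpha in invariant_factors:
        divisors += [p ** k for p, k in prime_power_factors(alpha).items()]
    return sorted(divisors)

def invariant_factors(elem_divisors):
    """Rebuild the invariant factors from the exponent matrix C of the remark."""
    exponents = defaultdict(list)
    for q in elem_divisors:
        (p, k), = prime_power_factors(q).items()   # q is assumed to be a prime power
        exponents[p].append(k)
    m = max(len(ks) for ks in exponents.values())
    # One row of C per prime: exponents in decreasing order, padded with zeros.
    rows = {p: sorted(ks, reverse=True) + [0] * (m - len(ks))
            for p, ks in exponents.items()}
    factors = []
    for j in range(m):
        alpha = 1
        for p, row in rows.items():
            alpha *= p ** row[j]
        factors.append(alpha)
    return factors
```

For instance, the module $\mathbb{Z}/180\mathbb{Z} \oplus \mathbb{Z}/6\mathbb{Z}$ has elementary divisors $2, 3, 4, 5, 9$, and applying `invariant_factors` to that list gives back $180$ and $6$.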
From a computational point of view, finding the elementary divisors is usually practically
impossible, because it requires factoring. For example, if A = K[X] where K is a field, such
as K = R or K = C, factoring amounts to finding the roots of a polynomial, but by Galois
theory, in general, this is not algorithmically doable. On the other hand, the invariant factors
can be computed using elementary row and column operations.
It can also be shown that $A$ and the modules of the form $A/p^n A$ are indecomposable
(with n > 0). A module M is said to be indecomposable if M cannot be written as a direct
sum of two nonzero proper submodules of M . For a proof, see Bourbaki [15] (Chapter VII,
Section 4, No. 8, Proposition 8). Theorem 24.45 shows that a finitely generated module over
a PID is a direct sum of indecomposable modules.
We will now apply the structure theorems for finitely generated (torsion) modules to the
$K[X]$-module $E_f$ associated with an endomorphism $f$ on a vector space $E$.
Chapter 25
The Rational Canonical Form and
Other Normal Forms
25.1
Let $E$ be a finite-dimensional vector space over a field $K$, and let $f : E \to E$ be an endomorphism of $E$. We know from Section 24.5 that there is a $K[X]$-module $E_f$ associated with $f$,
and that $E_f$ is a finitely generated torsion module over the PID $K[X]$. In this chapter, we
show how Theorems from Sections 24.6 and 24.7 yield important results about the structure
of the linear map f .
Recall that the annihilator of a subspace V is an ideal (p) uniquely defined by a monic
polynomial p called the minimal polynomial of V .
Our first result is obtained by translating the primary decomposition theorem, Theorem
24.26. It is not too surprising that we obtain again Theorem 22.7!
Theorem 25.1. (Primary Decomposition Theorem) Let $f : E \to E$ be a linear map on the
finite-dimensional vector space $E$ over the field $K$. Write the minimal polynomial $m$ of $f$ as
\[
m = p_1^{r_1} \cdots p_k^{r_k},
\]
where the $p_i$ are distinct irreducible monic polynomials over $K$, and the $r_i$ are positive integers. Let
\[
W_i = \mathrm{Ker}\,(p_i(f)^{r_i}), \qquad i = 1, \ldots, k.
\]
Then
(a) $E = W_1 \oplus \cdots \oplus W_k$.
(b) Each $W_i$ is invariant under $f$ and the projection from $E$ onto $W_i$ is given by a polynomial in $f$.
(c) The minimal polynomial of the restriction $f \mid W_i$ of $f$ to $W_i$ is $p_i^{r_i}$.
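of the cyclic subspace $E_i = Z(u_i; f)$, with $n_i = \deg(q_i)$, the matrix of the restriction of $f$ to
$E_i$ is the companion matrix of $p_i(X)$, of the form
\[
\begin{pmatrix}
0 & 0 & 0 & \cdots & 0 & -a_0 \\
1 & 0 & 0 & \cdots & 0 & -a_1 \\
0 & 1 & 0 & \cdots & 0 & -a_2 \\
\vdots & \ddots & \ddots & \ddots & \vdots & \vdots \\
0 & 0 & 0 & \ddots & 0 & -a_{n_i - 2} \\
0 & 0 & 0 & \cdots & 1 & -a_{n_i - 1}
\end{pmatrix}.
\]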
If we put all these bases together, we obtain a block matrix whose blocks are of the above
form. Therefore, we proved the following result.
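Theorem 25.3. (Rational Canonical Form, First Version) Let $f : E \to E$ be an endomorphism on a $K$-vector space of dimension $n$. There exist $n$ monic polynomials $q_1, \ldots, q_n \in K[X]$ such that
\[
q_1 \mid q_2 \mid \cdots \mid q_n,
\]
with $q_1 = \cdots = q_{n-m} = 1$, and a basis of $E$ such that the matrix $X$ of $f$ is a block matrix of
the form
\[
X = \begin{pmatrix}
A_{n-m+1} & 0 & \cdots & 0 & 0 \\
0 & A_{n-m+2} & \cdots & 0 & 0 \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & \cdots & A_{n-1} & 0 \\
0 & 0 & \cdots & 0 & A_n
\end{pmatrix},
\]
where each $A_i$ is the companion matrix of $q_i$. The polynomials $q_i$ satisfying the above conditions are unique, and $q_n$ is the minimal polynomial of $f$.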
Definition 25.1. A matrix $X$ as in Theorem 25.3 is called a matrix in rational form. The
polynomials $q_1, \ldots, q_n$ arising in Theorems 25.2 and 25.3 are called the similarity invariants
(or invariant factors) of $f$.
Theorem 25.3 shows that every matrix is similar to a matrix in rational form. Such a
matrix is unique.
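As a small illustration of the blocks appearing in a matrix in rational form, the sketch below builds the companion matrix of a monic polynomial and checks that the polynomial annihilates it. It assumes the convention in which the last column holds the negated coefficients $-a_0, \ldots, -a_{n-1}$ of $X^n + a_{n-1}X^{n-1} + \cdots + a_0$; the function names are illustrative.

```python
def companion(coeffs):
    """Companion matrix of X^n + a_{n-1} X^{n-1} + ... + a_0, with coeffs = [a_0, ..., a_{n-1}]."""
    n = len(coeffs)
    C = [[0] * n for _ in range(n)]
    for i in range(1, n):
        C[i][i - 1] = 1            # ones on the subdiagonal
    for i in range(n):
        C[i][n - 1] = -coeffs[i]   # last column: -a_0, ..., -a_{n-1}
    return C

def matmul(X, Y):
    """Product of two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def evaluate(coeffs, C):
    """Evaluate the monic polynomial with the given coefficients at the matrix C (Horner's rule)."""
    n = len(C)
    identity = [[int(i == j) for j in range(n)] for i in range(n)]
    result = identity                       # leading coefficient 1
    for a in reversed(coeffs):              # a_{n-1}, ..., a_0
        result = matmul(result, C)
        result = [[result[i][j] + a * identity[i][j] for j in range(n)]
                  for i in range(n)]
    return result

block = companion([1, -2])                  # q = X^2 - 2X + 1 = (X - 1)^2
```

Here `block` is the $2 \times 2$ matrix with rows $(0, -1)$ and $(1, 2)$, and `evaluate([1, -2], block)` is the zero matrix.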
By Proposition 24.20, two linear maps $f$ and $f'$ are similar iff there is an isomorphism
between $E_f$ and $E'_{f'}$, and thus by the uniqueness part of Theorem 24.38, iff they have the
same similarity invariants $q_1, \ldots, q_n$.
Proposition 25.4. If $E$ and $E'$ are two finite-dimensional vector spaces and if $f : E \to E$
and $f' : E' \to E'$ are two linear maps, then $f$ and $f'$ are similar iff they have the same
similarity invariants.
The effect of extending the field $K$ to a field $L$ is the object of the next proposition.
Proposition 25.5. Let $f : E \to E$ be a linear map on a $K$-vector space $E$, and let $(q_1, \ldots, q_n)$
be the similarity invariants of $f$. If $L$ is a field extension of $K$ (which means that $K \subseteq L$),
and if $E_{(L)} = L \otimes_K E$ is the vector space obtained by extending the scalars, and $f_{(L)} = 1_L \otimes f$
the linear map of $E_{(L)}$ induced by $f$, then the similarity invariants of $f_{(L)}$ are $(q_1, \ldots, q_n)$
viewed as polynomials in $L[X]$.
Proof. We know that $E_f$ is isomorphic to the direct sum
\[
E_f \approx K[X]/(q_1 K[X]) \oplus \cdots \oplus K[X]/(q_n K[X]),
\]
so by tensoring with $L[X]$ and using Propositions 24.12 and 23.7, we get
\[
\begin{aligned}
L[X] \otimes_{K[X]} E_f &\approx L[X] \otimes_{K[X]} \bigl(K[X]/(q_1 K[X]) \oplus \cdots \oplus K[X]/(q_n K[X])\bigr) \\
&\approx L[X] \otimes_{K[X]} (K[X]/(q_1 K[X])) \oplus \cdots \oplus L[X] \otimes_{K[X]} (K[X]/(q_n K[X])) \\
&\approx (K[X]/(q_1 K[X])) \otimes_{K[X]} L[X] \oplus \cdots \oplus (K[X]/(q_n K[X])) \otimes_{K[X]} L[X].
\end{aligned}
\]
However, by Proposition 24.14, we have isomorphisms
\[
(K[X]/(q_i K[X])) \otimes_{K[X]} L[X] \approx L[X]/(q_i L[X]),
\]
so we get
\[
L[X] \otimes_{K[X]} E_f \approx L[X]/(q_1 L[X]) \oplus \cdots \oplus L[X]/(q_n L[X]).
\]
Since $E_f$ is a $K[X]$-module, the $L[X]$-module $L[X] \otimes_{K[X]} E_f$ is the module obtained from
$E_f$ by the ring extension $K[X] \subseteq L[X]$, and since $f$ is a $K[X]$-linear map of $E_f$, it becomes
$f_{(L[X])}$ on $L[X] \otimes_{K[X]} E_f$, which is the same as $f_{(L)}$ viewed as an $L$-linear map of the space
$E_{(L)} = L \otimes_K E$, so $L[X] \otimes_{K[X]} E_f$ is actually isomorphic to $E_{(L) f_{(L)}}$, and we have
\[
E_{(L) f_{(L)}} \approx L[X]/(q_1 L[X]) \oplus \cdots \oplus L[X]/(q_n L[X]),
\]
which shows that $(q_1, \ldots, q_n)$ are the similarity invariants of $f_{(L)}$.
Proposition 25.5 justifies the terminology "invariant" in similarity invariants. Indeed, under
a field extension $K \subseteq L$, the similarity invariants of $f_{(L)}$ remain the same. This is not true
of the elementary divisors, which depend on the field; indeed, an irreducible polynomial
$p \in K[X]$ may split over $L[X]$. Since $q_n$ is the minimal polynomial of $f$, the above reasoning
also shows that the minimal polynomial of $f_{(L)}$ remains the same under a field extension.
Proposition 25.5 has the following corollary.
Proposition 25.6. Let $K$ be a field and let $L \supseteq K$ be a field extension of $K$. For any
two square matrices $X$ and $Y$ over $K$, if there is an invertible matrix $Q$ over $L$ such that
$Y = QXQ^{-1}$, then there is an invertible matrix $P$ over $K$ such that $Y = PXP^{-1}$.
(2) The minimal polynomial $m(X) = q_n(X)$ of $f$ divides the characteristic polynomial
$\chi_f(X)$ of $f$.
(3) The characteristic polynomial $\chi_f(X)$ divides $m(X)^n$.
(4) $E$ is cyclic for $f$ iff $m(X) = \chi_f(X)$.
Proof. Property (1) follows from Proposition 25.8 for $k = n$. It also follows from Theorem
25.3 and the fact that for the companion matrix associated with $q_i$, the characteristic polynomial of this matrix is also $q_i$. Property (2) is obvious from (1). Since each $q_i$ divides $q_{i+1}$,
each $q_i$ divides $q_n$, so their product $\chi_f(X)$ divides $m(X)^n = q_n(X)^n$. The last condition says
that $q_1 = \cdots = q_{n-1} = 1$, which means that $E_f$ has a single summand.
Observe that Proposition 25.9 yields another proof of the Cayley-Hamilton Theorem. It
also implies that a linear map is nilpotent iff its characteristic polynomial is $X^n$.
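The nilpotency criterion can be tested numerically. The sketch below is one possible check, not a method taken from the text: it computes the characteristic polynomial with the Faddeev-LeVerrier recurrence (which divides by $1, \ldots, n$ and therefore assumes characteristic zero; exact rationals are used here) and compares it with a direct test of $A^n = 0$. The function names are illustrative.

```python
from fractions import Fraction

def matmul(X, Y):
    """Product of two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def char_poly(A):
    """Coefficients [1, c_{n-1}, ..., c_0] of det(XI - A), by Faddeev-LeVerrier."""
    n = len(A)
    M = [[Fraction(0)] * n for _ in range(n)]
    coeffs = [Fraction(1)]
    for k in range(1, n + 1):
        # M_k = A M_{k-1} + c_{n-k+1} I, then c_{n-k} = -trace(A M_k) / k.
        AM = matmul(A, M)
        M = [[AM[i][j] + (coeffs[-1] if i == j else 0) for j in range(n)]
             for i in range(n)]
        trace = sum(matmul(A, M)[i][i] for i in range(n))
        coeffs.append(-trace / k)
    return coeffs

def is_nilpotent(A):
    """Direct test: A is nilpotent iff A^n = 0, where n is the size of A."""
    power = A
    for _ in range(len(A) - 1):
        power = matmul(power, A)
    return all(x == 0 for row in power for x in row)

shift = [[0, 1, 0], [0, 0, 1], [0, 0, 0]]   # nilpotent
projection = [[1, 0], [0, 0]]               # not nilpotent
```

For `shift` the computed coefficients are those of $X^3$, while for `projection` they are those of $X^2 - X$.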
25.2
Let us now translate the Elementary Divisors Decomposition Theorem, Theorem 24.45, in
terms of $E_f$. We obtain the following result.
Theorem 25.10. (Cyclic Decomposition Theorem, Second Version) Let $f : E \to E$ be an
endomorphism on a $K$-vector space of dimension $n$. Then, $E$ is the direct sum of cyclic
subspaces $E_j = Z(u_j; f)$ for $f$, such that the minimal polynomial of $E_j$ is of the form $p_i^{n_{i,j}}$,
for some irreducible monic polynomials $p_1, \ldots, p_t \in K[X]$ and some positive integers $n_{i,j}$,
such that for each $i = 1, \ldots, t$, there is a sequence of integers
\[
1 \le \underbrace{n_{i,1}, \ldots, n_{i,1}}_{m_{i,1}} < \underbrace{n_{i,2}, \ldots, n_{i,2}}_{m_{i,2}} < \cdots < \underbrace{n_{i,s_i}, \ldots, n_{i,s_i}}_{m_{i,s_i}},
\]
with $s_i \ge 1$, and where $n_{i,j}$ occurs $m_{i,j} \ge 1$ times, for $j = 1, \ldots, s_i$. Furthermore, the monic
polynomials $p_i$ and the integers $r, t, n_{i,j}, s_i, m_{i,j}$ are uniquely determined.
Note that there are $\mu = \sum m_{i,j}$ cyclic subspaces $Z(u_j; f)$. Using bases for the cyclic
subspaces $Z(u_j; f)$ as in Theorem 25.3, we get the following theorem.
Theorem 25.11. (Rational Canonical Form, Second Version) Let $f : E \to E$ be an endomorphism on a $K$-vector space of dimension $n$. There exist $t$ distinct irreducible monic
polynomials $p_1, \ldots, p_t \in K[X]$ and some positive integers $n_{i,j}$, such that for each $i = 1, \ldots, t$,
there is a sequence of integers
\[
1 \le \underbrace{n_{i,1}, \ldots, n_{i,1}}_{m_{i,1}} < \underbrace{n_{i,2}, \ldots, n_{i,2}}_{m_{i,2}} < \cdots < \underbrace{n_{i,s_i}, \ldots, n_{i,s_i}}_{m_{i,s_i}},
\]
with $s_i \ge 1$, and where $n_{i,j}$ occurs $m_{i,j} \ge 1$ times, for $j = 1, \ldots, s_i$, and there is a basis of $E$
such that the matrix $X$ of $f$ is a block matrix of the form
\[
X = \begin{pmatrix}
A_1 & 0 & \cdots & 0 & 0 \\
0 & A_2 & \cdots & 0 & 0 \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & \cdots & A_{\mu - 1} & 0 \\
0 & 0 & \cdots & 0 & A_{\mu}
\end{pmatrix},
\]
where each $A_j$ is the companion matrix of some $p_i^{n_{i,j}}$, and $\mu = \sum m_{i,j}$. The monic polynomials $p_1, \ldots, p_t$ and the integers $r, t, n_{i,j}, s_i, m_{i,j}$ are uniquely determined.
The polynomials $p_i^{n_{i,j}}$ are called the elementary divisors of $f$ (and $X$). These polynomials
are factors of the minimal polynomial.
As we pointed out earlier, unlike the similarity invariants, the elementary divisors may change
when we pass to a field extension.
We will now consider the special case where all the irreducible polynomials $p_i$ are of the
form $X - \lambda_i$; that is, when all the eigenvalues of $f$ belong to $K$. In this case, we find again
the Jordan form.
25.3
In this section, we assume that all the roots of the minimal polynomial of $f$ belong to $K$.
This will be the case if $K$ is algebraically closed. The irreducible polynomials $p_i$ of Theorem
25.10 are the polynomials $X - \lambda_i$, for the distinct eigenvalues $\lambda_i$ of $f$. Then, each cyclic
subspace $Z(u_j; f)$ has a minimal polynomial of the form $(X - \lambda)^m$, for some eigenvalue $\lambda$ of
$f$ and some $m \ge 1$. It turns out that by choosing a suitable basis for the cyclic subspace
$Z(u_j; f)$, the matrix of the restriction of $f$ to $Z(u_j; f)$ is a Jordan block.
Proposition 25.12. Let $E$ be a finite-dimensional $K$-vector space and let $f : E \to E$ be a
linear map. If $E$ is a cyclic $K[X]$-module and if $(X - \lambda)^n$ is the minimal polynomial of $f$,
then there is a basis of $E$ of the form
\[
((f - \lambda\,\mathrm{id})^{n-1}(u), (f - \lambda\,\mathrm{id})^{n-2}(u), \ldots, (f - \lambda\,\mathrm{id})(u), u),
\]
for some $u \in E$. With respect to this basis, the matrix of $f$ is the Jordan block
\[
J_n(\lambda) = \begin{pmatrix}
\lambda & 1 & 0 & \cdots & 0 \\
0 & \lambda & 1 & \cdots & 0 \\
\vdots & \vdots & \ddots & \ddots & \vdots \\
0 & 0 & 0 & \ddots & 1 \\
0 & 0 & 0 & \cdots & \lambda
\end{pmatrix}.
\]
$0 \le k \le n - 2$.
But this means precisely that the matrix of $f$ in this basis is the Jordan block $J_n(\lambda)$.
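Proposition 25.12 can be checked on a small case. The sketch below takes $f$ to be the companion matrix of $(X - 2)^3 = X^3 - 6X^2 + 12X - 8$ (an illustrative choice, with $u = e_1$ as cyclic vector), builds the basis $((f - \lambda\,\mathrm{id})^{2}(u), (f - \lambda\,\mathrm{id})(u), u)$ as the columns of a matrix $P$, and verifies $AP = PJ_3(2)$ together with $\det(P) \neq 0$, which is equivalent to $P^{-1}AP = J_3(2)$. The function names are assumptions made for this example.

```python
def matmul(X, Y):
    """Product of two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def det(M):
    """Determinant by cofactor expansion along the first row (fine for small matrices)."""
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j] * det([row[:j] + row[j + 1:] for row in M[1:]])
               for j in range(len(M)))

def jordan_block(lam, n):
    """J_n(lam): lam on the diagonal and ones on the superdiagonal."""
    return [[lam if i == j else int(j == i + 1) for j in range(n)] for i in range(n)]

def jordan_basis(A, lam, u):
    """Matrix whose columns are (A - lam I)^{n-1} u, ..., (A - lam I) u, u."""
    n = len(A)
    N = [[A[i][j] - (lam if i == j else 0) for j in range(n)] for i in range(n)]
    vectors = [u]
    for _ in range(n - 1):
        v = vectors[0]
        vectors.insert(0, [sum(N[i][k] * v[k] for k in range(n)) for i in range(n)])
    return [[vectors[j][i] for j in range(n)] for i in range(n)]

A = [[0, 0, 8], [1, 0, -12], [0, 1, 6]]   # companion matrix of (X - 2)^3
P = jordan_basis(A, 2, [1, 0, 0])
```

Comparing `matmul(A, P)` with `matmul(P, jordan_block(2, 3))` avoids computing $P^{-1}$ explicitly.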
Combining Theorem 25.11 and Proposition 25.12, we obtain a strong version of the
Jordan form.
Theorem 25.13. (Jordan Canonical Form) Let $E$ be a finite-dimensional $K$-vector space and let $f : E \to E$ be a linear map.
The following properties are equivalent:
(1) The eigenvalues of $f$ all belong to $K$.
(2) There is a basis of $E$ in which the matrix of $f$ is upper (or lower) triangular.
(3) There exists a basis of $E$ in which the matrix $A$ of $f$ is a Jordan matrix. Furthermore, the
number of Jordan blocks $J_r(\lambda)$ appearing in $A$, for fixed $r$ and $\lambda$, is uniquely determined
by $f$.
Proof. The implication $(1) \Longrightarrow (3)$ follows from Theorem 25.11 and Proposition 25.12. The
implications $(3) \Longrightarrow (2)$ and $(2) \Longrightarrow (1)$ are trivial.
Compared to Theorem 22.14, the new ingredient is the uniqueness assertion in (3), which
is not so easy to prove.
Observe that the minimal polynomial of $f$ is the least common multiple of the polynomials
$(X - \lambda)^r$ associated with the Jordan blocks $J_r(\lambda)$ appearing in $A$, and the characteristic
polynomial of $A$ is the product of these polynomials.
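In terms of exponents, this observation says that for each eigenvalue $\lambda$ the minimal polynomial carries the size of the largest block $J_r(\lambda)$, while the characteristic polynomial carries the sum of the block sizes. A minimal sketch, where a Jordan matrix is described by a hypothetical list of pairs (eigenvalue, block size):

```python
from collections import defaultdict

def polynomial_exponents(blocks):
    """Given Jordan blocks as (eigenvalue, size) pairs, return two dicts mapping
    each eigenvalue lam to the exponent of (X - lam) in the minimal polynomial
    and in the characteristic polynomial."""
    minimal, characteristic = defaultdict(int), defaultdict(int)
    for lam, r in blocks:
        minimal[lam] = max(minimal[lam], r)   # lcm of the (X - lam)^r: largest block wins
        characteristic[lam] += r              # product of the (X - lam)^r: sizes add up
    return dict(minimal), dict(characteristic)

# Blocks J_3(2), J_1(2), J_2(5): a 6 x 6 Jordan matrix.
minimal, characteristic = polynomial_exponents([(2, 3), (2, 1), (5, 2)])
```

For this example the minimal polynomial is $(X - 2)^3 (X - 5)^2$ and the characteristic polynomial is $(X - 2)^4 (X - 5)^2$.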
We now return to the problem of computing effectively the similarity invariants of a
matrix $A$. By Proposition 25.7, this is equivalent to computing the invariant factors of
$XI - A$. In principle, this can be done using Proposition 24.42. A procedure to do this
effectively for the ring $A = K[X]$ is to convert $XI - A$ to its Smith normal form. This will
also yield the rational canonical form for $A$.
25.4
The Smith normal form is the special case of Proposition 24.42 applied to the PID K[X]
where K is a field, but it also says that the matrices P and Q are products of elementary
matrices. It turns out that such a result holds for any Euclidean ring, and the proof is
basically the same.
Recall from Definition 20.9 that a Euclidean ring is an integral domain $A$ such that there
exists a function $\sigma : A \to \mathbb{N}$ with the following property: For all $a, b \in A$ with $b \neq 0$, there
are some $q, r \in A$ such that
\[
a = bq + r \quad\text{and}\quad \sigma(r) < \sigma(b).
\]
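Theorem 25.14. If $M$ is an $m \times n$ matrix over a Euclidean ring $A$, then there exist some
invertible $n \times n$ matrix $P$ and some invertible $m \times m$ matrix $Q$, where $P$ and $Q$ are products
of elementary matrices, and an $m \times n$ matrix $D$ of the form
\[
D = \begin{pmatrix}
\alpha_1 & 0 & \cdots & 0 & 0 & \cdots & 0 \\
0 & \alpha_2 & \cdots & 0 & 0 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots & \vdots & & \vdots \\
0 & 0 & \cdots & \alpha_r & 0 & \cdots & 0 \\
0 & 0 & \cdots & 0 & 0 & \cdots & 0 \\
\vdots & \vdots & & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & 0 & 0 & \cdots & 0
\end{pmatrix}
\]
for some nonzero $\alpha_i \in A$, such that
(1) $\alpha_1 \mid \alpha_2 \mid \cdots \mid \alpha_r$, and
(2) $M = QDP^{-1}$.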
Proof. We follow Jacobson's proof [59] (Chapter 3, Theorem 3.8). We proceed by induction
on $m + n$.
If $m = n = 1$, let $P = (1)$ and $Q = (1)$.
For the induction step, if $M = 0$, let $P = I_n$ and $Q = I_m$. If $M \neq 0$, the strategy is to
apply a sequence of elementary transformations that converts $M$ to a matrix of the form
\[
M' = \begin{pmatrix}
\alpha_1 & 0 & \cdots & 0 \\
0 & & & \\
\vdots & & Y & \\
0 & & &
\end{pmatrix}
\]
where $Y$ is a $(m - 1) \times (n - 1)$-matrix such that $\alpha_1$ divides every entry in $Y$. Then, we
proceed by induction on $Y$. To find $M'$, we perform the following steps.
Step 1. Pick some nonzero entry $a_{ij}$ in $M$ such that $\sigma(a_{ij})$ is minimal. Then permute
column $j$ and column $1$, and permute row $i$ and row $1$, to bring this entry in position $(1, 1)$.
We denote this new matrix again by $M$.
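Step 2a.
If $m = 1$ go to Step 2b.
If $m > 1$, then there are two possibilities:
(i) $M$ is of the form
\[
\begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
0 & a_{22} & \cdots & a_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
0 & a_{m2} & \cdots & a_{mn}
\end{pmatrix}.
\]
(a) If there is some entry $a_{k1}$ in the first column such that $a_{11}$ does not divide $a_{k1}$, then
pick such an entry (say, with the smallest index $i$ such that $\sigma(a_{i1})$ is minimal), and divide
$a_{k1}$ by $a_{11}$; that is, find $b_k$ and $b_{k1}$ such that
\[
a_{k1} = a_{11} b_k + b_{k1},
\]
with $\sigma(b_{k1}) < \sigma(a_{11})$. Subtract $b_k$ times row $1$ from row $k$ and permute row $k$ and row $1$, to obtain a matrix of the form
\[
M = \begin{pmatrix}
b_{k1} & b_{k2} & \cdots & b_{kn} \\
a_{21} & a_{22} & \cdots & a_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
a_{m1} & a_{m2} & \cdots & a_{mn}
\end{pmatrix},
\]
and go back to Step 2a.
(b) If $a_{11}$ divides every (nonzero) entry $a_{i1}$ for $i \ge 2$, say $a_{i1} = a_{11} b_i$, then subtract $b_i$
times row $1$ from row $i$ for $i = 2, \ldots, m$; go to Step 2b.
Observe that whenever we return to the beginning of Step 2a, we have $\sigma(b_{k1}) < \sigma(a_{11})$.
Therefore, after a finite number of steps, we must exit Step 2a with a matrix in which all
entries in column 1 but the first are zero and go to Step 2b.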
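Step 2b.
This step is reached only if $n > 1$ and if the only nonzero entry in the first column is $a_{11}$.
(a) If $M$ is of the form
\[
\begin{pmatrix}
a_{11} & 0 & \cdots & 0 \\
0 & a_{22} & \cdots & a_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
0 & a_{m2} & \cdots & a_{mn}
\end{pmatrix}
\]
(b) If there is some entry $a_{1k}$ in the first row such that $a_{11}$ does not divide $a_{1k}$, then pick
such an entry (say, with the smallest index $j$ such that $\sigma(a_{1j})$ is minimal), and divide $a_{1k}$ by
$a_{11}$; that is, find $b_k$ and $b_{1k}$ such that
\[
a_{1k} = a_{11} b_k + b_{1k},
\]
with $\sigma(b_{1k}) < \sigma(a_{11})$. Subtract $b_k$ times column $1$ from column $k$ and permute column $k$ and column $1$, to obtain a matrix of the form
\[
M = \begin{pmatrix}
b_{1k} & a_{k2} & \cdots & a_{kn} \\
b_{2k} & a_{22} & \cdots & a_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
b_{mk} & a_{m2} & \cdots & a_{mn}
\end{pmatrix}.
\]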
\[
M = \begin{pmatrix}
a_{11} & 0 & \cdots & 0 \\
0 & & & \\
\vdots & & Y & \\
0 & & &
\end{pmatrix}
\]
go to Step 4.
(ii) If Step 2b ruined column 1 which now contains some nonzero entry below $a_{11}$, go
back to Step 2a.
We perform a sequence of alternating steps between Step 2a and Step 2b. Because the
$\sigma$-value of the $(1, 1)$-entry strictly decreases whenever we reenter Step 2a and Step 2b, such
a sequence must terminate with a matrix of the form
\[
M = \begin{pmatrix}
a_{11} & 0 & \cdots & 0 \\
0 & & & \\
\vdots & & Y & \\
0 & & &
\end{pmatrix}
\]
Step 4. If $a_{11}$ divides all entries in $Y$, stop.
Otherwise, there is some column, say $j$, such that $a_{11}$ does not divide some entry $a_{ij}$, so
add the $j$th column to the first column. This yields a matrix of the form
\[
M = \begin{pmatrix}
a_{11} & 0 & \cdots & 0 \\
b_{2j} & & & \\
\vdots & & Y & \\
b_{mj} & & &
\end{pmatrix}
\]
where the $i$th entry in column 1 is nonzero, so go back to Step 2a.
Again, since the $\sigma$-value of the $(1, 1)$-entry strictly decreases whenever we reenter Step
2a and Step 2b, such a sequence must terminate with a matrix of the form
\[
M' = \begin{pmatrix}
\alpha_1 & 0 & \cdots & 0 \\
0 & & & \\
\vdots & & Y & \\
0 & & &
\end{pmatrix}
\]
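The steps of the proof translate into a short procedure when the Euclidean ring is $\mathbb{Z}$ with $\sigma(a) = |a|$. The following is a minimal sketch under that assumption; it returns only the diagonal entries $\alpha_1, \ldots, \alpha_r$ (normalized to be positive) and does not keep track of $P$ and $Q$. The function name `smith_diagonal` is illustrative, and the row and column reductions of Steps 2a and 2b are performed in bulk rather than one entry at a time.

```python
def smith_diagonal(M):
    """Diagonal entries alpha_1 | alpha_2 | ... | alpha_r of the Smith normal form
    of an integer matrix M, given as a list of rows."""
    A = [row[:] for row in M]
    m, n = len(A), len(A[0])
    diagonal = []
    for t in range(min(m, n)):
        while True:
            # Step 1: move a nonzero entry of minimal absolute value to position (t, t).
            entries = [(abs(A[i][j]), i, j)
                       for i in range(t, m) for j in range(t, n) if A[i][j]]
            if not entries:
                return diagonal
            _, i, j = min(entries)
            A[t], A[i] = A[i], A[t]
            for row in A:
                row[t], row[j] = row[j], row[t]
            pivot = A[t][t]
            # Step 2a: Euclidean division of column t by the pivot (row operations).
            for i in range(t + 1, m):
                q = A[i][t] // pivot
                A[i] = [x - q * y for x, y in zip(A[i], A[t])]
            # Step 2b: Euclidean division of row t by the pivot (column operations).
            for j in range(t + 1, n):
                q = A[t][j] // pivot
                for row in A:
                    row[j] -= q * row[t]
            # A nonzero remainder is smaller than the pivot: start over with it.
            if any(A[i][t] for i in range(t + 1, m)) or any(A[t][j] for j in range(t + 1, n)):
                continue
            # Step 4: the pivot must divide every entry of the remaining block Y.
            bad = [j for i in range(t + 1, m) for j in range(t + 1, n) if A[i][j] % pivot]
            if not bad:
                diagonal.append(abs(pivot))
                break
            for row in A:
                row[t] += row[bad[0]]   # add the offending column to column t
    return diagonal
```

Applied to the matrix with rows $(2, -1)$ and $(1, 2)$, this returns the entries $1$ and $5$; applied to the diagonal matrix with entries $2$ and $3$, it returns $1$ and $6$.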
If the PID $A$ is the polynomial ring $K[X]$ where $K$ is a field, the $\alpha_i$ are nonzero polynomials, so we can apply row operations to normalize their leading coefficients to be 1. We
obtain the following theorem.
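Theorem 25.15. (Smith Normal Form) If $M$ is an $m \times n$ matrix over the polynomial ring
$K[X]$, where $K$ is a field, then there exist some invertible $n \times n$ matrix $P$ and some invertible
$m \times m$ matrix $Q$, where $P$ and $Q$ are products of elementary matrices with entries in $K[X]$,
and an $m \times n$ matrix $D$ of the form
\[
D = \begin{pmatrix}
q_1 & 0 & \cdots & 0 & 0 & \cdots & 0 \\
0 & q_2 & \cdots & 0 & 0 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots & \vdots & & \vdots \\
0 & 0 & \cdots & q_r & 0 & \cdots & 0 \\
0 & 0 & \cdots & 0 & 0 & \cdots & 0 \\
\vdots & \vdots & & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & 0 & 0 & \cdots & 0
\end{pmatrix}
\]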
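\[
D = \begin{pmatrix}
1 & \cdots & 0 & 0 & 0 & \cdots & 0 \\
\vdots & \ddots & \vdots & \vdots & \vdots & & \vdots \\
0 & \cdots & 1 & 0 & 0 & \cdots & 0 \\
0 & \cdots & 0 & q_1 & 0 & \cdots & 0 \\
0 & \cdots & 0 & 0 & q_2 & \cdots & 0 \\
\vdots & & \vdots & \vdots & \vdots & \ddots & \vdots \\
0 & \cdots & 0 & 0 & 0 & \cdots & q_m
\end{pmatrix}
\]
(1) $q_1 \mid q_2 \mid \cdots \mid q_m$,
(2) $q_1, \ldots, q_m$ are the similarity invariants of $A$, and
(3) $XI - A = QDP^{-1}$.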
The matrix $D$ in Theorem 25.16 is often called Smith normal form of $A$, even though
this is confusing terminology since $D$ is really the Smith normal form of $XI - A$.
Of course, we know from previous work that in Theorem 25.15, the $\alpha_1, \ldots, \alpha_r$ are unique,
and that in Theorem 25.16, the $q_1, \ldots, q_m$ are unique. This can also be proved using some
simple properties of minors, but we leave it as an exercise (for help, look at Jacobson [59],
Chapter 3, Theorem 3.9).
The rational canonical form of $A$ can also be obtained from $Q^{-1}$ and $D$, but first, let
us consider the generalization of Theorem 25.15 to PIDs that are not necessarily Euclidean
rings.
We need to find a norm that assigns a natural number $\sigma(a)$ to any nonzero element
of a PID $A$, in such a way that $\sigma(a)$ decreases whenever we return to Step 2a and Step 2b.
Since a PID is a UFD, we use the number
\[
\sigma(a) = k_1 + \cdots + k_r
\]
of prime factors in the factorization of a nonunit element
\[
a = u p_1^{k_1} \cdots p_r^{k_r},
\]
and we set
\[
\sigma(u) = 0
\]
if $u$ is a unit.
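We can't divide anymore, but we can find gcds and use Bezout to mimic division. The
key ingredient is this: for any two nonzero elements $a, b \in A$, if $a$ does not divide $b$ then let
$d \neq 0$ be a gcd of $a$ and $b$. By Bezout, there exist $x, y \in A$ such that
\[
ax + by = d.
\]
We can also write $a = td$ and $b = -sd$, for some $s, t \in A$, so that $tdx - sdy = d$, which
implies that
\[
tx - sy = 1,
\]
since $A$ is an integral domain. Observe that
\[
\begin{pmatrix} t & -s \\ -y & x \end{pmatrix}
\begin{pmatrix} x & s \\ y & t \end{pmatrix}
=
\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix},
\]
which shows that both matrices on the left of the equation are invertible, and so is the
transpose of the second one,
\[
\begin{pmatrix} x & y \\ s & t \end{pmatrix}
\]
(they all have determinant 1). We also have
\[
as + bt = tds - sdt = 0,
\]
so
\[
\begin{pmatrix} x & y \\ s & t \end{pmatrix}
\begin{pmatrix} a \\ b \end{pmatrix}
=
\begin{pmatrix} d \\ 0 \end{pmatrix}
\]
and
\[
\begin{pmatrix} a & b \end{pmatrix}
\begin{pmatrix} x & s \\ y & t \end{pmatrix}
=
\begin{pmatrix} d & 0 \end{pmatrix}.
\]
Because $a$ does not divide $b$, their gcd $d$ has strictly fewer prime factors than $a$, so
$\sigma(d) < \sigma(a)$.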
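The construction of $x, y, s, t$ can be tried out in the PID $\mathbb{Z}$. In the sketch below, the Bezout coefficients come from the extended Euclidean algorithm, which is only one convenient way of producing them in $\mathbb{Z}$; in a general PID their existence is all that is used. The function names are illustrative.

```python
def extended_gcd(a, b):
    """Return (d, x, y) with a*x + b*y == d, where d is a gcd of a and b."""
    x0, y0, x1, y1 = 1, 0, 0, 1
    while b:
        q = a // b
        a, b = b, a - q * b
        x0, x1 = x1, x0 - q * x1
        y0, y1 = y1, y0 - q * y1
    return a, x0, y0

def bezout_matrix(a, b):
    """Return (d, [[x, y], [s, t]]) with a = t*d, b = -s*d and x*t - y*s = 1,
    for nonzero integers a and b."""
    d, x, y = extended_gcd(a, b)
    t, s = a // d, (-b) // d        # exact divisions, since d divides a and b
    return d, [[x, y], [s, t]]

a, b = 6, 4
d, ((x, y), (s, t)) = bezout_matrix(a, b)
image = (x * a + y * b, s * a + t * b)   # the matrix applied to the column (a, b)
```

With $a = 6$ and $b = 4$, the matrix has determinant $1$ and sends the column $(6, 4)$ to $(2, 0)$.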
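Using matrices of the form
\[
\begin{pmatrix}
x & y & 0 & 0 & \cdots & 0 \\
s & t & 0 & 0 & \cdots & 0 \\
0 & 0 & 1 & 0 & \cdots & 0 \\
0 & 0 & 0 & 1 & \cdots & 0 \\
\vdots & \vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & 0 & 0 & \cdots & 1
\end{pmatrix}
\]
with $xt - ys = 1$, we can modify Step 2a and Step 2b to obtain the following theorem.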
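Theorem 25.17. If $M$ is an $m \times n$ matrix over a PID $A$, then there exist some invertible
$n \times n$ matrix $P$ and some invertible $m \times m$ matrix $Q$, where $P$ and $Q$ are products of
elementary matrices and matrices of the form
\[
\begin{pmatrix}
x & y & 0 & 0 & \cdots & 0 \\
s & t & 0 & 0 & \cdots & 0 \\
0 & 0 & 1 & 0 & \cdots & 0 \\
0 & 0 & 0 & 1 & \cdots & 0 \\
\vdots & \vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & 0 & 0 & \cdots & 1
\end{pmatrix},
\]
and an $m \times n$ matrix $D$ of the form
\[
D = \begin{pmatrix}
\alpha_1 & 0 & \cdots & 0 & 0 & \cdots & 0 \\
0 & \alpha_2 & \cdots & 0 & 0 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots & \vdots & & \vdots \\
0 & 0 & \cdots & \alpha_r & 0 & \cdots & 0 \\
0 & 0 & \cdots & 0 & 0 & \cdots & 0 \\
\vdots & \vdots & & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & 0 & 0 & \cdots & 0
\end{pmatrix}
\]
for some nonzero $\alpha_i \in A$, such that
(1) $\alpha_1 \mid \alpha_2 \mid \cdots \mid \alpha_r$, and
(2) $M = QDP^{-1}$.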
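Proof sketch. In Step 2a, if $a_{11}$ does not divide $a_{k1}$, then first permute row 2 and row $k$ (if
$k \neq 2$). Then, if we write $a = a_{11}$ and $b = a_{k1}$, if $d$ is a gcd of $a$ and $b$ and if $x, y, s, t$ are
determined as explained above, multiply on the left by the matrix
\[
\begin{pmatrix}
x & y & 0 & 0 & \cdots & 0 \\
s & t & 0 & 0 & \cdots & 0 \\
0 & 0 & 1 & 0 & \cdots & 0 \\
0 & 0 & 0 & 1 & \cdots & 0 \\
\vdots & \vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & 0 & 0 & \cdots & 1
\end{pmatrix}
\]
to obtain a matrix of the form
\[
\begin{pmatrix}
d & a_{12} & \cdots & a_{1n} \\
0 & a_{22} & \cdots & a_{2n} \\
a_{31} & a_{32} & \cdots & a_{3n} \\
\vdots & \vdots & \ddots & \vdots \\
a_{m1} & a_{m2} & \cdots & a_{mn}
\end{pmatrix}.
\]
Similarly, in Step 2b, multiplying on the right by the matrix
\[
\begin{pmatrix}
x & s & 0 & 0 & \cdots & 0 \\
y & t & 0 & 0 & \cdots & 0 \\
0 & 0 & 1 & 0 & \cdots & 0 \\
0 & 0 & 0 & 1 & \cdots & 0 \\
\vdots & \vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & 0 & 0 & \cdots & 1
\end{pmatrix}
\]
yields a matrix of the form
\[
\begin{pmatrix}
d & 0 & a_{13} & \cdots & a_{1n} \\
a_{21} & a_{22} & a_{23} & \cdots & a_{2n} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
a_{m1} & a_{m2} & a_{m3} & \cdots & a_{mn}
\end{pmatrix}
\]
with $\sigma(d) < \sigma(a_{11})$. Then, go back to Step 2b. The other steps remain the same. Whenever
we return to Step 2a or Step 2b, the $\sigma$-value of the $(1, 1)$-entry strictly decreases, so the
whole procedure terminates.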
We conclude this section by explaining how the rational canonical form of a matrix $A$
can be obtained from the canonical form $QDP^{-1}$ of $XI - A$.
$i = 1, \ldots, n - m$.
where $K[X]w_i \approx K[X]/(q_i)$ as a cyclic $K[X]$-module. Since $\mathrm{Im}(\psi) = \mathrm{Ker}\,(\sigma)$, we have
\[
0 = \sigma(\psi(u_{n-m+i})) = \sigma(q_i v_{n-m+i}) = q_i \sigma(v_{n-m+i}) = q_i w_i,
\]
so as a $K$-vector space, the cyclic subspace $Z(w_i; f) = K[X]w_i$ has $q_i$ as annihilator, and by
a remark from Section 24.5, it has the basis (over $K$)
\[
(w_i, f(w_i), \ldots, f^{n_i - 1}(w_i)), \qquad n_i = \deg(q_i).
\]
Furthermore, over this basis, the restriction of $f$ to $Z(w_i; f)$ is represented by the companion
matrix of $q_i$. By putting all these bases together, we obtain a block matrix which is the
canonical rational form of $f$ (and $A$).
Now, $XI - A = QDP^{-1}$ is the matrix of $\psi$ with respect to the canonical basis $(e_1, \ldots, e_n)$
(over $K[X]$), and $D$ is the matrix of $\psi$ with respect to the bases $(u_1, \ldots, u_n)$ and $(v_1, \ldots, v_n)$
(over $K[X]$), which tells us that the columns of $Q$ consist of the coordinates (in $K[X]$) of the
basis vectors $(v_1, \ldots, v_n)$ with respect to the basis $(e_1, \ldots, e_n)$. Therefore, the coordinates (in
$K$) of the vectors $(w_1, \ldots, w_m)$ spanning $E_f$ over $K[X]$, where $w_i = \sigma(v_{n-m+i})$, are obtained
by substituting the matrix $A$ for $X$ in the coordinates of the column vectors of $Q$, and
evaluating the resulting expressions.
Since
\[
D = Q^{-1}(XI - A)P,
\]
the matrix $D$ is obtained from $XI - A$ by a sequence of elementary row operations whose product
is $Q^{-1}$ and a sequence of elementary column operations whose product is $P$. Therefore, to
compute the vectors $w_1, \ldots, w_m$ from $A$, we simply have to figure out how to construct $Q$
from the sequence of elementary row operations that yield $Q^{-1}$. The trick is to use column
operations to gather a product of row operations in reverse order.
Indeed, if $Q^{-1}$ is the product of elementary row operations
\[
Q^{-1} = E_k \cdots E_2 E_1,
\]
then
\[
Q = E_1^{-1} E_2^{-1} \cdots E_k^{-1}.
\]
Now, row operations operate on the left and column operations operate on the right, so
the product $E_1^{-1} E_2^{-1} \cdots E_k^{-1}$ can be computed from left to right as a sequence of column
operations.
Let us review the meaning of the elementary row and column operations $P(i, k)$, $E_{i,j;\beta}$,
and $E_{i,\lambda}$.
1. As a row operation, $P(i, k)$ permutes row $i$ and row $k$.
2. As a column operation, $P(i, k)$ permutes column $i$ and column $k$.
3. The inverse of $P(i, k)$ is $P(i, k)$ itself.
4. As a row operation, $E_{i,j;\beta}$ adds $\beta$ times row $j$ to row $i$.
5. As a column operation, $E_{i,j;\beta}$ adds $\beta$ times column $i$ to column $j$ (note the switch in
the indices).
6. The inverse of $E_{i,j;\beta}$ is $E_{i,j;-\beta}$.
7. As a row operation, $E_{i,\lambda}$ multiplies row $i$ by $\lambda$.
8. As a column operation, $E_{i,\lambda}$ multiplies column $i$ by $\lambda$.
9. The inverse of $E_{i,\lambda}$ is $E_{i,\lambda^{-1}}$.
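The nine facts above can be checked by writing the operations as matrices acting on the left (row operations) or on the right (column operations). The sketch below uses scalar coefficients only, with 1-based indices as in the text; the function names `permutation`, `transvection` and `dilatation` are assumptions made for this illustration.

```python
from fractions import Fraction

def identity(n):
    return [[int(i == j) for j in range(n)] for i in range(n)]

def matmul(X, Y):
    """Product of two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def permutation(n, i, k):
    """P(i, k): the identity with rows i and k exchanged."""
    E = identity(n)
    E[i - 1], E[k - 1] = E[k - 1], E[i - 1]
    return E

def transvection(n, i, j, beta):
    """E_{i,j;beta}: the identity plus beta in position (i, j), with i != j."""
    E = identity(n)
    E[i - 1][j - 1] = beta
    return E

def dilatation(n, i, lam):
    """E_{i,lam}: the identity with lam in position (i, i)."""
    E = identity(n)
    E[i - 1][i - 1] = lam
    return E

M = [[1, 2], [3, 4]]
E = transvection(2, 1, 2, 5)
as_row_op = matmul(E, M)       # adds 5 times row 2 to row 1
as_column_op = matmul(M, E)    # adds 5 times column 1 to column 2
```

The same matrix `E` therefore acts on rows with the index pair $(i, j)$ and on columns with the roles of $i$ and $j$ exchanged, which is the switch of indices noted in item 5.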
Given a square matrix $A$ (over $K$), the row and column operations applied to $XI - A$ in
converting it to its Smith normal form may involve coefficients that are polynomials and it
is necessary to explain what is the action of an operation $E_{i,j;\beta}$ in this case. If the coefficient
$\beta$ in $E_{i,j;\beta}$ is a polynomial over $K$, as a row operation, the action of $E_{i,j;\beta}$ on a matrix $M$ is
to multiply the $j$th row of $M$ by the matrix $\beta(A)$ obtained by substituting the matrix $A$ for
$X$ and then to add the resulting vector to row $i$. Similarly, as a column operation, the action
of $E_{i,j;\beta}$ on a matrix $M$ is to multiply the $i$th column of $M$ by the matrix $\beta(A)$ obtained
by substituting the matrix $A$ for $X$ and then to add the resulting vector to column $j$. An
algorithm to compute the rational canonical form of a matrix can now be given. We apply
the elementary column operations $E_i^{-1}$ for $i = 1, \ldots, k$, starting with the identity matrix.
Algorithm for Converting an $n \times n$ matrix to Rational Canonical Form
While applying elementary row and column operations to compute the Smith normal
form $D$ of $XI - A$, keep track of the row operations and perform the following steps:
1. Let $P' = I_n$, and for every elementary row operation $E$ do the following:
(a) If $E = P(i, k)$, permute column $i$ and column $k$ of $P'$.
(b) If $E = E_{i,j;\beta}$, multiply the $i$th column of $P'$ by the matrix $\beta(A)$ obtained by
substituting the matrix $A$ for $X$, and then subtract the resulting vector from
column $j$.
(c) If $E = E_{i,\lambda}$ where $\lambda \in K$, then multiply the $i$th column of $P'$ by $\lambda^{-1}$.
2. When step (1) terminates, the first $n - m$ columns of $P'$ are zero and the last $m$ are
linearly independent. For $i = 1, \ldots, m$, multiply the $(n - m + i)$th column $w_i$ of $P'$
successively by $I, A^1, A^2, \ldots, A^{n_i - 1}$, where $n_i$ is the degree of the polynomial $q_i$ (appearing
in $D$), and form the $n \times n$ matrix $P$ consisting of the vectors
\[
w_1, Aw_1, \ldots, A^{n_1 - 1} w_1, \; w_2, Aw_2, \ldots, A^{n_2 - 1} w_2, \; \ldots, \; w_m, Aw_m, \ldots, A^{n_m - 1} w_m.
\]
Then, $P^{-1}AP$ is the canonical rational form of $A$.
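Here is an example taken from Dummit and Foote [32] (Chapter 12, Section 12.2). Let
$A$ be the matrix
\[
A = \begin{pmatrix}
1 & 2 & -4 & 4 \\
2 & -1 & 4 & -8 \\
1 & 0 & 1 & -2 \\
0 & 1 & -2 & 3
\end{pmatrix}.
\]
One should check that the following sequence of row and column operations produces the
Smith normal form $D$ of $XI - A$:
\[
\begin{gathered}
\text{row } P(1, 3), \quad \text{row } E_{1,-1}, \quad \text{row } E_{2,1;2}, \quad \text{row } E_{3,1;-(X-1)}, \quad \text{column } E_{1,3;X-1}, \quad \text{column } E_{1,4;2}, \\
\text{row } P(2, 4), \quad \text{row } E_{2,-1}, \quad \text{row } E_{3,2;2}, \quad \text{row } E_{4,2;-(X+1)}, \quad \text{column } E_{2,3;2}, \quad \text{column } E_{2,4;X-3},
\end{gathered}
\]
with
\[
D = \begin{pmatrix}
1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 \\
0 & 0 & (X - 1)^2 & 0 \\
0 & 0 & 0 & (X - 1)^2
\end{pmatrix}.
\]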
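Then, applying Step 1 of the above algorithm, we get the sequence of column operations:
\[
\begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}
\xrightarrow{P(1,3)}
\begin{pmatrix} 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}
\xrightarrow{E_{1,-1}}
\begin{pmatrix} 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \\ -1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}
\xrightarrow{E_{2,1,2}}
\]
\[
\begin{pmatrix} 0 & 0 & 1 & 0 \\ -2 & 1 & 0 & 0 \\ -1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}
\xrightarrow{E_{3,1,A-I}}
\begin{pmatrix} 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}
\xrightarrow{P(2,4)}
\begin{pmatrix} 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \end{pmatrix}
\xrightarrow{E_{2,-1}}
\]
\[
\begin{pmatrix} 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 \\ 0 & -1 & 0 & 0 \end{pmatrix}
\xrightarrow{E_{3,2,2}}
\begin{pmatrix} 0 & -2 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 \\ 0 & -1 & 0 & 0 \end{pmatrix}
\xrightarrow{E_{4,2;A+I}}
\begin{pmatrix} 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix}
= P'.
\]
Step 2 of the algorithm yields the vectors
\[
\begin{pmatrix} 1 \\ 0 \\ 0 \\ 0 \end{pmatrix}, \quad
A \begin{pmatrix} 1 \\ 0 \\ 0 \\ 0 \end{pmatrix} = \begin{pmatrix} 1 \\ 2 \\ 1 \\ 0 \end{pmatrix}, \quad
\begin{pmatrix} 0 \\ 1 \\ 0 \\ 0 \end{pmatrix}, \quad
A \begin{pmatrix} 0 \\ 1 \\ 0 \\ 0 \end{pmatrix} = \begin{pmatrix} 2 \\ -1 \\ 0 \\ 1 \end{pmatrix},
\]
so we get
\[
P = \begin{pmatrix}
1 & 1 & 0 & 2 \\
0 & 2 & 1 & -1 \\
0 & 1 & 0 & 0 \\
0 & 0 & 0 & 1
\end{pmatrix}.
\]
Then
\[
P^{-1} = \begin{pmatrix}
1 & 0 & -1 & -2 \\
0 & 0 & 1 & 0 \\
0 & 1 & -2 & 1 \\
0 & 0 & 0 & 1
\end{pmatrix},
\]
and
\[
P^{-1}AP = \begin{pmatrix}
0 & -1 & 0 & 0 \\
1 & 2 & 0 & 0 \\
0 & 0 & 0 & -1 \\
0 & 0 & 1 & 2
\end{pmatrix}.
\]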
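The outcome of this example can be verified directly. The sketch below carries out only Step 2 of the algorithm, starting from the vectors $w_1 = e_1$ and $w_2 = e_2$ found above, and checks $AP = PX$ (where $X$ is the rational form) and $P P^{-1} = I$, which together amount to the displayed identity for $P^{-1}AP$. The signs of the entries of $A$ are those of the example as reconstructed here, and the helper names are illustrative.

```python
def matmul(X, Y):
    """Product of two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def matvec(M, v):
    return [sum(M[i][k] * v[k] for k in range(len(v))) for i in range(len(M))]

def krylov(A, w, degree):
    """The vectors w, Aw, ..., A^{degree-1} w."""
    vectors = [w]
    for _ in range(degree - 1):
        vectors.append(matvec(A, vectors[-1]))
    return vectors

A = [[1, 2, -4, 4], [2, -1, 4, -8], [1, 0, 1, -2], [0, 1, -2, 3]]

# Both similarity invariants are (X - 1)^2, of degree 2.
columns = krylov(A, [1, 0, 0, 0], 2) + krylov(A, [0, 1, 0, 0], 2)
P = [[columns[j][i] for j in range(4)] for i in range(4)]   # vectors as columns

P_inverse = [[1, 0, -1, -2], [0, 0, 1, 0], [0, 1, -2, 1], [0, 0, 0, 1]]
rational_form = [[0, -1, 0, 0], [1, 2, 0, 0], [0, 0, 0, -1], [0, 0, 1, 2]]
identity = [[int(i == j) for j in range(4)] for i in range(4)]
```

Each diagonal block of `rational_form` is the companion matrix of $(X - 1)^2 = X^2 - 2X + 1$.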
Chapter 26
Topology
26.1
This chapter contains a review of basic topological concepts. First, metric spaces are defined.
Next, normed vector spaces are defined. Closed and open sets are defined, and their basic
properties are stated. The general concept of a topological space is defined. The closure and
the interior of a subset are defined. The subspace topology and the product topology are
defined. Continuous maps and homeomorphisms are defined. Limits of sequences are defined.
Continuous linear maps and multilinear maps are defined and studied briefly. The chapter
ends with the definition of a normed affine space.
Most spaces considered in this book have a topological structure given by a metric or a
norm, and we first review these notions. We begin with metric spaces. Recall that $\mathbb{R}_+ =
\{x \in \mathbb{R} \mid x \ge 0\}$.
Definition 26.1. A metric space is a set $E$ together with a function $d : E \times E \to \mathbb{R}_+$,
called a metric, or distance, assigning a nonnegative real number $d(x, y)$ to any two points
$x, y \in E$, and satisfying the following conditions for all $x, y, z \in E$:
(D1) $d(x, y) = d(y, x)$. (symmetry)
(D2) $d(x, y) \ge 0$, and $d(x, y) = 0$ iff $x = y$. (positivity)
(D3) $d(x, z) \le d(x, y) + d(y, z)$. (triangle inequality)
Geometrically, condition (D3) expresses the fact that in a triangle with vertices $x, y, z$,
the length of any side is bounded by the sum of the lengths of the other two sides. From
(D3), we immediately get
\[
|d(x, y) - d(y, z)| \le d(x, z).
\]
Let us give some examples of metric spaces. Recall that the absolute value $|x|$ of a real
number $x \in \mathbb{R}$ is defined such
that $|x| = x$ if $x \ge 0$, $|x| = -x$ if $x < 0$, and for a complex
number $x = a + ib$, by $|x| = \sqrt{a^2 + b^2}$.
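Example 26.1.
1. Let $E = \mathbb{R}$, and $d(x, y) = |x - y|$, the absolute value of $x - y$. This is the so-called
natural metric on $\mathbb{R}$.
2. Let $E = \mathbb{R}^n$ (or $E = \mathbb{C}^n$). We have the Euclidean metric
\[
d_2(x, y) = \bigl( |x_1 - y_1|^2 + \cdots + |x_n - y_n|^2 \bigr)^{\frac{1}{2}}.
\]
\[
[a, b] = \{x \in \mathbb{R} \mid a \le x \le b\}, \quad \text{(closed interval)}
\]
\[
]a, b[ \; = \{x \in \mathbb{R} \mid a < x < b\}, \quad \text{(open interval)}
\]
Let $E = [a, b]$, and $d(x, y) = |x - y|$. Then, $([a, b], d)$ is a metric space.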
We will need to define the notion of proximity in order to define convergence of limits
and continuity of functions. For this, we introduce some standard small neighborhoods.
Definition 26.2. Given a metric space $E$ with metric $d$, for every $a \in E$, for every $\rho \in \mathbb{R}$,
with $\rho > 0$, the set
\[
B(a, \rho) = \{x \in E \mid d(a, x) \le \rho\}
\]
is called the closed ball of center $a$ and radius $\rho$, the set
\[
B_0(a, \rho) = \{x \in E \mid d(a, x) < \rho\}
\]
is called the open ball of center $a$ and radius $\rho$, and the set
\[
S(a, \rho) = \{x \in E \mid d(a, x) = \rho\}
\]
is called the sphere of center $a$ and radius $\rho$. It should be noted that $\rho$ is finite (i.e., not
$+\infty$). A subset $X$ of a metric space $E$ is bounded if there is a closed ball $B(a, \rho)$ such that
$X \subseteq B(a, \rho)$.
Clearly, $B(a, \rho) = B_0(a, \rho) \cup S(a, \rho)$.
1. In $E = \mathbb{R}$ with the distance $|x - y|$, an open ball of center $a$ and radius $\rho$ is the open
interval $]a - \rho, a + \rho[$.
2. In $E = \mathbb{R}^2$ with the Euclidean metric, an open ball of center $a$ and radius $\rho$ is the set
of points inside the disk of center $a$ and radius $\rho$, excluding the boundary points on
the circle.
3. In $E = \mathbb{R}^3$ with the Euclidean metric, an open ball of center $a$ and radius $\rho$ is the set
of points inside the sphere of center $a$ and radius $\rho$, excluding the boundary points on
the sphere.
One should be aware that intuition can be misleading in forming a geometric image of a
closed (or open) ball. For example, if $d$ is the discrete metric, a closed ball of center $a$ and
radius $\rho < 1$ consists only of its center $a$, and a closed ball of center $a$ and radius $\rho \ge 1$
consists of the entire space!
If $E = [a, b]$, and $d(x, y) = |x - y|$, as in Example 26.1, an open ball $B_0(a, \rho)$, with
$\rho < b - a$, is in fact the interval $[a, a + \rho[$, which is closed on the left.
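These two remarks can be explored on finite samples of points. The sketch below assumes the usual definition of the discrete metric (distance $0$ between equal points and $1$ otherwise), checks the conditions (D1) to (D3) by brute force, and computes closed and open balls; the function names are illustrative.

```python
from itertools import product

def discrete_metric(x, y):
    """Assumed definition: distance 0 between equal points, 1 otherwise."""
    return 0 if x == y else 1

def natural_metric(x, y):
    return abs(x - y)

def is_metric(points, d):
    """Check (D1), (D2) and (D3) on a finite set of points."""
    for x, y, z in product(points, repeat=3):
        if d(x, y) != d(y, x):                           # (D1) symmetry
            return False
        if d(x, y) < 0 or (d(x, y) == 0) != (x == y):    # (D2) positivity
            return False
        if d(x, z) > d(x, y) + d(y, z):                  # (D3) triangle inequality
            return False
    return True

def closed_ball(points, d, a, rho):
    return {x for x in points if d(a, x) <= rho}

def open_ball(points, d, a, rho):
    return {x for x in points if d(a, x) < rho}

space = {0, 1, 2, 3}                       # any finite set, with the discrete metric
interval = [0, 0.25, 0.5, 0.75, 1]         # a finite sample of E = [0, 1]
```

With the discrete metric, `closed_ball(space, discrete_metric, 0, 0.5)` is `{0}` while radius `1` gives the whole space. In the sample of $[0, 1]$, the open ball of center $0$ and radius $0.5$ contains the left endpoint $0$, in line with the interval $[a, a + \rho[$.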
We now consider a very important special case of metric spaces, normed vector spaces.
Normed vector spaces have already been defined in Chapter 7 (Definition 7.1) but for the
reader's convenience we repeat the definition.
Definition 26.3. Let $E$ be a vector space over a field $K$, where $K$ is either the field $\mathbb{R}$ of
reals, or the field $\mathbb{C}$ of complex numbers. A norm on $E$ is a function $\| \; \| : E \to \mathbb{R}_+$, assigning
a nonnegative real number $\|u\|$ to any vector $u \in E$, and satisfying the following conditions
for all $x, y, z \in E$:
(N1) $\|x\| \ge 0$, and $\|x\| = 0$ iff $x = 0$. (positivity)
(N2) $\|\lambda x\| = |\lambda| \, \|x\|$. (scaling)
(N3) $\|x + y\| \le \|x\| + \|y\|$. (triangle inequality)
If we define $d$ such that $d(x, y) = \|x - y\|$,
it is easily seen that $d$ is a metric. Thus, every normed vector space is immediately a metric
space. Note that the metric associated with a norm is invariant under translation, that is,
\[
d(x + u, y + u) = d(x, y).
\]
For this reason, we can restrict ourselves to open or closed balls of center 0.
Examples of normed vector spaces were given in Example 7.1. We repeat the most
important examples.
Example 26.3. Let $E = \mathbb{R}^n$ (or $E = \mathbb{C}^n$). There are three standard norms. For every
$(x_1, \ldots, x_n) \in E$, we have the norm $\|x\|_1$, defined such that,
\[
\|x\|_1 = |x_1| + \cdots + |x_n|,
\]
we have the Euclidean norm $\|x\|_2$, defined such that,
\[
\|x\|_2 = \bigl( |x_1|^2 + \cdots + |x_n|^2 \bigr)^{\frac{1}{2}}
\]
is an open set. In fact, it is possible to find a metric for which such open $n$-cubes are open
balls! Similarly, we can define the closed $n$-cube
\[
\{(x_1, \ldots, x_n) \in E \mid a_i \le x_i \le b_i, \; 1 \le i \le n\},
\]
which is a closed set.
The open sets satisfy some important properties that lead to the definition of a topological
space.
Proposition 26.1. Given a metric space $E$ with metric $d$, the family $\mathcal{O}$ of all open sets
defined in Definition 26.4 satisfies the following properties:
(O1) For every finite family $(U_i)_{1 \leq i \leq n}$ of sets $U_i \in \mathcal{O}$, we have $U_1 \cap \cdots \cap U_n \in \mathcal{O}$, i.e., $\mathcal{O}$ is
closed under finite intersections.
(O2) For every arbitrary family $(U_i)_{i \in I}$ of sets $U_i \in \mathcal{O}$, we have $\bigcup_{i \in I} U_i \in \mathcal{O}$, i.e., $\mathcal{O}$ is closed
under arbitrary unions.
(O3) $\emptyset \in \mathcal{O}$, and $E \in \mathcal{O}$, i.e., $\emptyset$ and $E$ belong to $\mathcal{O}$.
Furthermore, for any two distinct points $a \neq b$ in $E$, there exist two open sets $U_a$ and $U_b$
such that, $a \in U_a$, $b \in U_b$, and $U_a \cap U_b = \emptyset$.
Proof. It is straightforward. For the last point, letting $\rho = d(a, b)/3$ (in fact $\rho = d(a, b)/2$
works too), we can pick $U_a = B_0(a, \rho)$ and $U_b = B_0(b, \rho)$. By the triangle inequality, we
must have $U_a \cap U_b = \emptyset$.
The above proposition leads to the very general concept of a topological space.
One should be careful that, in general, the family of open sets is not closed under infinite
intersections. For example, in $\mathbb{R}$ under the metric $|x - y|$, letting $U_n = \, ]-1/n, +1/n[$,
each $U_n$ is open, but $\bigcap_n U_n = \{0\}$, which is not open.
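The computation of this intersection can be illustrated with exact rational arithmetic. This is only a sketch: the function names are hypothetical, and a program can test finitely many values of $n$, so it illustrates the argument rather than proving it.

```python
from fractions import Fraction

def in_U(n, x):
    """Membership test for U_n = ]-1/n, 1/n[, with n >= 1."""
    return -Fraction(1, n) < x < Fraction(1, n)

def first_excluding_index(x):
    """For x != 0, return the least n >= 1 such that x is not in U_n."""
    if x == 0:
        raise ValueError("0 belongs to every U_n")
    n = 1
    while in_U(n, x):
        n += 1
    return n

# 0 lies in every U_n that we test ...
print(all(in_U(n, Fraction(0)) for n in range(1, 1000)))
# ... but any nonzero point is excluded from some U_n.
print(first_excluding_index(Fraction(1, 100)))
print(first_excluding_index(Fraction(-3, 7)))
```

Since every nonzero $x$ fails to belong to $U_n$ as soon as $1/n \leq |x|$, only $0$ survives in the intersection.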
26.2 Topological Spaces
It is possible that an open set is also a closed set. For example, $\emptyset$ and $E$ are both open
and closed. When a topological space contains a proper nonempty subset $U$ which is
both open and closed, the space $E$ is said to be disconnected.
A topological space $(E, \mathcal{O})$ is said to satisfy the Hausdorff separation axiom (or $T_2$-separation axiom) if for any two distinct points $a \neq b$ in $E$, there exist two open sets $U_a$ and
$U_b$ such that, $a \in U_a$, $b \in U_b$, and $U_a \cap U_b = \emptyset$. When the $T_2$-separation axiom is satisfied,
we also say that $(E, \mathcal{O})$ is a Hausdorff space.
As shown by Proposition 26.1, any metric space is a topological Hausdorff space, the
family of open sets being in fact the family of arbitrary unions of open balls. Similarly,
any normed vector space is a topological Hausdorff space, the family of open sets being the
family of arbitrary unions of open balls. The topology $\mathcal{O}$ consisting of all subsets of $E$ is
called the discrete topology.
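On a finite set, the properties (O1), (O2), (O3) and the Hausdorff axiom can be checked by brute force. The sketch below assumes a topology is given as a collection of Python sets; the function names are illustrative and not taken from the text.

```python
from itertools import combinations

def powerset(E):
    """All subsets of the finite set E, as frozensets."""
    items = list(E)
    return {frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)}

def is_topology(E, opens):
    """Check (O1)-(O3) for a family of subsets of a finite set E."""
    E = frozenset(E)
    opens = {frozenset(U) for U in opens}
    if frozenset() not in opens or E not in opens:          # (O3)
        return False
    for U in opens:
        for V in opens:
            # For a finite family, pairwise closure is enough for (O1), (O2).
            if (U & V) not in opens or (U | V) not in opens:
                return False
    return True

def is_hausdorff(E, opens):
    """Check that distinct points have disjoint open neighborhoods."""
    opens = {frozenset(U) for U in opens}
    return all(
        any(a in U and b in V and not (U & V) for U in opens for V in opens)
        for a, b in combinations(list(E), 2)
    )

E = {1, 2, 3}
discrete = powerset(E)
indiscrete = [set(), E]
print(is_topology(E, discrete), is_hausdorff(E, discrete))
print(is_topology(E, indiscrete), is_hausdorff(E, indiscrete))
print(is_topology(E, [set(), {1}, {2}, E]))   # {1} | {2} is missing
```

The discrete topology passes both checks, while the family $\{\emptyset, E\}$ is a topology that fails the Hausdorff axiom as soon as $E$ has two points.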
Remark: Most (if not all) spaces used in analysis are Hausdorff spaces. Intuitively, the
Hausdorff separation axiom says that there are enough small open sets. Without this
axiom, some counter-intuitive behaviors may arise. For example, a sequence may have more
than one limit point (or a compact set may not be closed). Nevertheless, non-Hausdorff
topological spaces arise naturally in algebraic geometry. But even there, some substitute for
separation is used.
One of the reasons why topological spaces are important is that the definition of a topology only involves a certain family $\mathcal{O}$ of sets, and not how such a family is generated from
a metric or a norm. For example, different metrics or different norms can define the same
family of open sets. Many topological properties only depend on the family O and not on
the specific metric or norm. But the fact that a topology is definable from a metric or a
norm is important, because it usually implies nice properties of a space. All our examples
will be spaces whose topology is defined by a metric or a norm.
By taking complements, we can state properties of the closed sets dual to those of Definition 26.5. Thus, $\emptyset$ and $E$ are closed sets, and the closed sets are closed under finite unions
and arbitrary intersections.
It is also worth noting that the Hausdorff separation axiom implies that for every $a \in E$,
the set $\{a\}$ is closed. Indeed, if $x \in E - \{a\}$, then $x \neq a$, and so there exist open sets $U_a$
and $U_x$ such that $a \in U_a$, $x \in U_x$, and $U_a \cap U_x = \emptyset$. Thus, for every $x \in E - \{a\}$, there is an
open set $U_x$ containing $x$ and contained in $E - \{a\}$, showing by (O2) that $E - \{a\}$, being the union of the sets $U_x$, is open,
and thus that the set $\{a\}$ is closed.
Given a topological space $(E, \mathcal{O})$, given any subset $A$ of $E$, since $E \in \mathcal{O}$ and $E$ is a closed
set, the family $\mathcal{C}_A = \{F \mid A \subseteq F, \ F \text{ a closed set}\}$ of closed sets containing $A$ is nonempty,
and since any arbitrary intersection of closed sets is a closed set, the intersection $\bigcap \mathcal{C}_A$ of
the sets in the family $\mathcal{C}_A$ is the smallest closed set containing $A$. By a similar reasoning, the
union of all the open subsets contained in $A$ is the largest open set contained in $A$.
Definition 26.6. Given a topological space $(E, \mathcal{O})$, given any subset $A$ of $E$, the smallest
closed set containing $A$ is denoted by $\overline{A}$, and is called the closure, or adherence of $A$. A
subset $A$ of $E$ is dense in $E$ if $\overline{A} = E$. The largest open set contained in $A$ is denoted by
$\overset{\circ}{A}$, and is called the interior of $A$. The set $\mathrm{Fr}\, A = \overline{A} \cap \overline{E - A}$ is called the boundary (or
frontier) of $A$. We also denote the boundary of $A$ by $\partial A$.
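For a finite topological space, the closure, interior, and boundary can be computed directly from their descriptions as a smallest closed superset and a largest open subset. The following sketch assumes the topology is given as a list of Python sets that includes the empty set and $E$; the function names and the sample topology are illustrative.

```python
def closure(E, opens, A):
    """Smallest closed set containing A: intersect all closed supersets."""
    E, A = frozenset(E), frozenset(A)
    result = E
    for U in opens:
        F = E - frozenset(U)      # closed sets are complements of open sets
        if A <= F:
            result &= F
    return result

def interior(E, opens, A):
    """Largest open set contained in A: union of all open subsets of A."""
    A = frozenset(A)
    result = frozenset()
    for U in opens:
        if frozenset(U) <= A:
            result |= frozenset(U)
    return result

def boundary(E, opens, A):
    """Fr A = closure(A) intersected with closure(E - A)."""
    E = frozenset(E)
    return closure(E, opens, A) & closure(E, opens, E - frozenset(A))

E = {1, 2, 3}
opens = [set(), {1}, {1, 2}, {1, 2, 3}]    # a (non-Hausdorff) topology on E

print(closure(E, opens, {2}), interior(E, opens, {2}), boundary(E, opens, {2}))
print(closure(E, opens, {1}))              # {1} is dense in E
```

In this small example the closed sets are $E$, $\{2, 3\}$, $\{3\}$, and $\emptyset$, so the only closed set containing $1$ is $E$ itself, which is why $\{1\}$ is dense.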
Remark: The notation $\overline{A}$ for the closure of a subset $A$ of $E$ is somewhat unfortunate,
since $\overline{A}$ is often used to denote the set complement of $A$ in $E$. Still, we prefer it to more
cumbersome notations such as clo($A$), and we denote the complement of $A$ in $E$ by $E - A$
(or sometimes, $A^c$).
By definition, it is clear that a subset $A$ of $E$ is closed iff $A = \overline{A}$. The set $\mathbb{Q}$ of rationals
One should realize that every open set $U \in \mathcal{O}$ which is entirely contained in $A$ is also in
the family $\mathcal{U}$, but $\mathcal{U}$ may contain open sets that are not in $\mathcal{O}$. For example, if $E = \mathbb{R}$
with $|x - y|$, and $A = [a, b]$, then sets of the form $[a, c[$, with $a < c < b$ belong to $\mathcal{U}$, but they
are not open sets for $\mathbb{R}$ under $|x - y|$. However, there is agreement in the following situation.
Proposition 26.4. Given a topological space $(E, \mathcal{O})$, given any subset $A$ of $E$, if $\mathcal{U}$ is the
subspace topology, then the following properties hold.
(1) If $A$ is an open set $A \in \mathcal{O}$, then every open set $U \in \mathcal{U}$ is an open set $U \in \mathcal{O}$.
(2) If $A$ is a closed set in $E$, then every closed set w.r.t. the subspace topology is a closed
set w.r.t. $\mathcal{O}$.
Proof. Left as an exercise.
The concept of product topology is also useful. We have the following proposition.
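Proposition 26.5. Given $n$ topological spaces $(E_i, \mathcal{O}_i)$, let $\mathcal{B}$ be the family of subsets of
$E_1 \times \cdots \times E_n$ defined as follows:
\[ \mathcal{B} = \{U_1 \times \cdots \times U_n \mid U_i \in \mathcal{O}_i, \ 1 \leq i \leq n\}, \]
and let $\mathcal{P}$ be the family consisting of arbitrary unions of sets in $\mathcal{B}$, including $\emptyset$. Then, $\mathcal{P}$ is
a topology on $E_1 \times \cdots \times E_n$.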
A subbasis for O is a family S of subsets of E, such that the family B of all finite
intersections of sets in S (including E itself, in case of the empty intersection) is a basis of
O.
The following proposition gives useful criteria for determining whether a family of open
subsets is a basis of a topological space.
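Proposition 26.6. Given a topological space $(E, \mathcal{O})$ and a family $\mathcal{B}$ of open subsets in $\mathcal{O}$
the following properties hold:
(1) The family $\mathcal{B}$ is a basis for the topology $\mathcal{O}$ iff for every open set $U \in \mathcal{O}$ and every
$x \in U$, there is some $B \in \mathcal{B}$ such that $x \in B$ and $B \subseteq U$.
(2) The family $\mathcal{B}$ is a basis for the topology $\mathcal{O}$ iff
(a) For every $x \in E$, there is some $B \in \mathcal{B}$ such that $x \in B$.
(b) For any two open subsets, $B_1, B_2 \in \mathcal{B}$, for every $x \in E$, if $x \in B_1 \cap B_2$, then there
is some $B_3 \in \mathcal{B}$ such that $x \in B_3$ and $B_3 \subseteq B_1 \cap B_2$.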
26.3
Definition 26.10. Let $(E, \mathcal{O}_E)$ and $(F, \mathcal{O}_F)$ be topological spaces, and let $f : E \to F$ be a
function. For every $a \in E$, we say that $f$ is continuous at $a$, if for every open set $V \in \mathcal{O}_F$
containing $f(a)$, there is some open set $U \in \mathcal{O}_E$ containing $a$, such that, $f(U) \subseteq V$. We say
that $f$ is continuous if it is continuous at every $a \in E$.
Define a neighborhood of $a \in E$ as any subset $N$ of $E$ containing some open set $O \in \mathcal{O}$
such that $a \in O$. Now, if $f$ is continuous at $a$ and $N$ is any neighborhood of $f(a)$, there is
some open set $V \subseteq N$ containing $f(a)$, and since $f$ is continuous at $a$, there is some open
set $U$ containing $a$, such that $f(U) \subseteq V$. Since $V \subseteq N$, the open set $U$ is a subset of $f^{-1}(N)$
containing $a$, and $f^{-1}(N)$ is a neighborhood of $a$. Conversely, if $f^{-1}(N)$ is a neighborhood
of $a$ whenever $N$ is any neighborhood of $f(a)$, it is immediate that $f$ is continuous at $a$. It
is easy to see that Definition 26.10 is equivalent to the following statements.
Proposition 26.7. Let $(E, \mathcal{O}_E)$ and $(F, \mathcal{O}_F)$ be topological spaces, and let $f : E \to F$ be a
function. For every $a \in E$, the function $f$ is continuous at $a \in E$ iff for every neighborhood
$N$ of $f(a) \in F$, the set $f^{-1}(N)$ is a neighborhood of $a$. The function $f$ is continuous on $E$ iff
$f^{-1}(V)$ is an open set in $\mathcal{O}_E$ for every open set $V \in \mathcal{O}_F$.
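The second criterion of Proposition 26.7 (preimages of open sets are open) is easy to test mechanically when the spaces are finite. In the sketch below, a function is represented as a Python dictionary and a topology as a list of sets; these representations and the names `preimage` and `is_continuous` are assumptions made for illustration.

```python
def preimage(f, domain, V):
    """f^{-1}(V) for a function f given as a dict on a finite domain."""
    return frozenset(x for x in domain if f[x] in V)

def is_continuous(f, E, opens_E, opens_F):
    """f is continuous iff f^{-1}(V) is open in E for every open V of F."""
    opens_E = {frozenset(U) for U in opens_E}
    return all(preimage(f, E, V) in opens_E for V in opens_F)

# A two-point space in which {1} is open but {0} is not.
E = {0, 1}
opens = [set(), {1}, {0, 1}]

identity = {0: 0, 1: 1}
swap = {0: 1, 1: 0}
constant = {0: 0, 1: 0}

print(is_continuous(identity, E, opens, opens))
print(is_continuous(swap, E, opens, opens))      # preimage of {1} is {0}
print(is_continuous(constant, E, opens, opens))
```

The swap fails because the preimage of the open set $\{1\}$ is $\{0\}$, which is not open in this topology.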
If $E$ and $F$ are metric spaces defined by metrics $d_1$ and $d_2$, we can show easily that $f$ is
continuous at $a$ iff
for every $\epsilon > 0$, there is some $\eta > 0$, such that, for every $x \in E$,
if $d_1(a, x) \leq \eta$, then $d_2(f(a), f(x)) \leq \epsilon$.
Similarly, if $E$ and $F$ are normed vector spaces defined by norms $\| \cdot \|_1$ and $\| \cdot \|_2$, we can
show easily that $f$ is continuous at $a$ iff
for every $\epsilon > 0$, there is some $\eta > 0$, such that, for every $x \in E$,
if $\|x - a\|_1 \leq \eta$, then $\|f(x) - f(a)\|_2 \leq \epsilon$.
It is worth noting that continuity is a topological notion, in the sense that equivalent
metrics (or equivalent norms) define exactly the same notion of continuity.
If $(E, \mathcal{O}_E)$ and $(F, \mathcal{O}_F)$ are topological spaces, and $f : E \to F$ is a function, for every
nonempty subset $A \subseteq E$ of $E$, we say that $f$ is continuous on $A$ if the restriction of $f$ to $A$
is continuous with respect to $(A, \mathcal{U})$ and $(F, \mathcal{O}_F)$, where $\mathcal{U}$ is the subspace topology induced
by $\mathcal{O}_E$ on $A$.
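\[ w = u + \frac{\rho}{\lambda + 1} v. \]
Since $v \neq 0$ and $\rho > 0$, we have $w \neq u$. Then, with $\lambda = \|v\|$,
\[ \|w - u\| = \left\| \frac{\rho}{\lambda + 1} v \right\| = \frac{\rho \lambda}{\lambda + 1} < \rho, \]
which shows that $\|w - u\| < \rho$, for $w \neq u$.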
The following proposition is easily shown.
Proposition 26.8. Given topological spaces $(E, \mathcal{O}_E)$, $(F, \mathcal{O}_F)$, and $(G, \mathcal{O}_G)$, and two functions $f : E \to F$ and $g : F \to G$, if $f$ is continuous at $a \in E$ and $g$ is continuous at $f(a) \in F$,
then $g \circ f : E \to G$ is continuous at $a \in E$. Given $n$ topological spaces $(F_i, \mathcal{O}_i)$, every
function $f : E \to F_1 \times \cdots \times F_n$ is continuous at $a \in E$ iff every $f_i : E \to F_i$ is
continuous at $a$, where $f_i = \pi_i \circ f$.
One can also show that in a metric space $(E, d)$, the distance $d : E \times E \to \mathbb{R}$ is continuous,
where $E \times E$ has the product topology, and that for a normed vector space $(E, \| \cdot \|)$, the
norm $\| \cdot \| : E \to \mathbb{R}$ is continuous.
\[ f(x, y) = \frac{xy}{x^2 + y^2} \]
The function $f$ is continuous on $\mathbb{R} \times \mathbb{R} - \{(0, 0)\}$, but on the line $y = mx$, with $m \neq 0$, we
have $f(x, y) = \frac{m}{1 + m^2} \neq 0$, and thus, on this line, $f(x, y)$ does not approach $0$ when $(x, y)$
approaches $(0, 0)$.
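The behavior along lines through the origin can be observed numerically. The sketch below takes $f(x, y) = xy/(x^2 + y^2)$ for $(x, y) \neq (0, 0)$ and evaluates it at points of the line $y = mx$ that get closer and closer to the origin; the helper name `along_line` is illustrative.

```python
import math

def f(x, y):
    """f(x, y) = xy / (x^2 + y^2), defined away from the origin."""
    return x * y / (x * x + y * y)

def along_line(m, x):
    """Value of f at the point (x, m x) of the line y = m x, for x != 0."""
    return f(x, m * x)

for m in (1.0, 2.0, -0.5):
    # Points (x, m x) with x = 10^-1, ..., 10^-7 approach the origin.
    values = [along_line(m, 10.0 ** -k) for k in range(1, 8)]
    # The values stay near m / (1 + m^2) instead of tending to 0.
    print(m, values[-1], m / (1 + m * m))
```

On each line the value is constant in exact arithmetic, since $x \cdot mx / (x^2 + m^2 x^2) = m/(1 + m^2)$, and this constant depends on the slope $m$.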
The following proposition is useful for showing that real-valued functions are continuous.
Proposition 26.9. If $E$ is a topological space, and $(\mathbb{R}, |x - y|)$ the reals under the standard
topology, for any two functions $f : E \to \mathbb{R}$ and $g : E \to \mathbb{R}$, for any $a \in E$, for any $\lambda \in \mathbb{R}$, if
$f$ and $g$ are continuous at $a$, then $f + g$, $\lambda f$, $f \cdot g$, are continuous at $a$, and $f/g$ is continuous
at $a$ if $g(a) \neq 0$.
Proof. Left as an exercise.
Using Proposition 26.9, we can show easily that every real polynomial function is continuous.
The notion of isomorphism of topological spaces is defined as follows.
Definition 26.11. Let $(E, \mathcal{O}_E)$ and $(F, \mathcal{O}_F)$ be topological spaces, and let $f : E \to F$ be a
function. We say that $f$ is a homeomorphism between $E$ and $F$ if $f$ is bijective, and both
$f : E \to F$ and $f^{-1} : F \to E$ are continuous.
L1 (t) =
If we think of $(x(t), y(t)) = (L_1(t), L_2(t))$ as a geometric point in $\mathbb{R}^2$, the set of points
$(x(t), y(t))$ obtained by letting $t$ vary in $\mathbb{R}$ from $-\infty$ to $+\infty$, defines a curve having the shape
of a figure eight, with self-intersection at the origin, called the lemniscate of Bernoulli.
The map $L$ is continuous, and in fact bijective, but its inverse $L^{-1}$ is not continuous. Indeed,
when we approach the origin on the branch of the curve in the upper left quadrant (i.e.,
points such that, $x \leq 0$, $y \geq 0$), then $t$ goes to $-\infty$, and when we approach the origin on the
branch of the curve in the lower right quadrant (i.e., points such that, $x \geq 0$, $y \leq 0$), then $t$
goes to $+\infty$.
We also review the concept of limit of a sequence. Given any set $E$, a sequence is any
function $x : \mathbb{N} \to E$, usually denoted by $(x_n)_{n \in \mathbb{N}}$, or $(x_n)_{n \geq 0}$, or even by $(x_n)$.
Definition 26.12. Given a topological space $(E, \mathcal{O})$, we say that a sequence $(x_n)_{n \in \mathbb{N}}$ converges to some $a \in E$ if for every open set $U$ containing $a$, there is some $n_0 \geq 0$, such that,
$x_n \in U$, for all $n \geq n_0$. We also say that $a$ is a limit of $(x_n)_{n \in \mathbb{N}}$.
When $E$ is a metric space with metric $d$, it is easy to show that this is equivalent to the
fact that,
for every $\epsilon > 0$, there is some $n_0 \geq 0$, such that, $d(x_n, a) \leq \epsilon$, for all $n \geq n_0$.
When $E$ is a normed vector space with norm $\| \cdot \|$, it is easy to show that this is equivalent
to the fact that,
for every $\epsilon > 0$, there is some $n_0 \geq 0$, such that, $\|x_n - a\| \leq \epsilon$, for all $n \geq n_0$.
The following proposition shows the importance of the Hausdorff separation axiom.
Proposition 26.10. Given a topological space (E, O), if the Hausdorff separation axiom
holds, then every sequence has at most one limit.
Proof. Left as an exercise.
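To see what can go wrong without the Hausdorff axiom, one can compute the limits of a sequence in a small finite space. The sketch below handles only sequences that repeat a fixed finite cycle forever (constant sequences being the case of a cycle of length one); this restriction and the function name are assumptions made so that convergence can be decided by a finite check.

```python
def limits_of_periodic_sequence(cycle, E, opens):
    """Set of limits of the sequence that repeats `cycle` forever.

    Every value of the cycle recurs for arbitrarily large n, so the
    sequence converges to a iff every open set containing a contains
    all the values of the cycle.
    """
    values = frozenset(cycle)
    return {
        a for a in E
        if all(values <= frozenset(U) for U in opens if a in U)
    }

E = {0, 1}
indiscrete = [set(), {0, 1}]             # not Hausdorff
discrete = [set(), {0}, {1}, {0, 1}]     # Hausdorff

print(limits_of_periodic_sequence([0], E, indiscrete))    # two limits
print(limits_of_periodic_sequence([0], E, discrete))      # exactly one limit
print(limits_of_periodic_sequence([0, 1], E, discrete))   # no limit
```

In the non-Hausdorff example, the constant sequence $0, 0, 0, \ldots$ converges to both points, while in the discrete (Hausdorff) topology each sequence has at most one limit, in agreement with Proposition 26.10.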
It is worth noting that the notion of limit is topological, in the sense that a sequence
converges to a limit $b$ iff it converges to the same limit $b$ in any equivalent metric (and similarly
for equivalent norms).
We still need one more concept of limit for functions.
Definition 26.13. Let $(E, \mathcal{O}_E)$ and $(F, \mathcal{O}_F)$ be topological spaces, let $A$ be some nonempty
subset of $E$, and let $f : A \to F$ be a function. For any $a \in \overline{A}$ and any $b \in F$, we say that $f(x)$
approaches $b$ as $x$ approaches $a$ with values in $A$ if for every open set $V \in \mathcal{O}_F$ containing $b$,
there is some open set $U \in \mathcal{O}_E$ containing $a$, such that, $f(U \cap A) \subseteq V$. This is denoted by
\[ \lim_{x \to a,\, x \in A} f(x) = b. \]
First, note that by Proposition 26.2, since $a \in \overline{A}$, for every open set $U$ containing $a$, we
have $U \cap A \neq \emptyset$, and the definition is nontrivial. Also, even if $a \in A$, the value $f(a)$ of $f$ at
$a$ plays no role in this definition. When $E$ and $F$ are metric spaces with metrics $d_1$ and $d_2$,
it can be shown easily that the definition can be stated as follows:
For every $\epsilon > 0$, there is some $\eta > 0$, such that, for every $x \in A$,
if $d_1(x, a) \leq \eta$, then $d_2(f(x), b) \leq \epsilon$.
When $E$ and $F$ are normed vector spaces with norms $\| \cdot \|_1$ and $\| \cdot \|_2$, it can be shown easily
that the definition can be stated as follows:
For every $\epsilon > 0$, there is some $\eta > 0$, such that, for every $x \in A$,
if $\|x - a\|_1 \leq \eta$, then $\|f(x) - b\|_2 \leq \epsilon$.
We have the following result relating continuity at a point and the previous notion.
Proposition 26.11. Let $(E, \mathcal{O}_E)$ and $(F, \mathcal{O}_F)$ be two topological spaces, and let $f : E \to F$
be a function. For any $a \in E$, the function $f$ is continuous at $a$ iff $f(x)$ approaches $f(a)$
when $x$ approaches $a$ (with values in $E$).
Proof. Left as a trivial exercise.
Another important proposition relating the notion of convergence of a sequence to continuity, is stated without proof.
Proposition 26.12. Let $(E, \mathcal{O}_E)$ and $(F, \mathcal{O}_F)$ be two topological spaces, and let $f : E \to F$
be a function.
(1) If $f$ is continuous, then for every sequence $(x_n)_{n \in \mathbb{N}}$ in $E$, if $(x_n)$ converges to $a$, then
$(f(x_n))$ converges to $f(a)$.
(2) If $E$ is a metric space, and $(f(x_n))$ converges to $f(a)$ whenever $(x_n)$ converges to $a$,
for every sequence $(x_n)_{n \in \mathbb{N}}$ in $E$, then $f$ is continuous.
A special case of Definition 26.13 will be used when $E$ and $F$ are (nontrivial) normed
vector spaces with norms $\| \cdot \|_1$ and $\| \cdot \|_2$. Let $U$ be any nonempty open subset of $E$. We
showed earlier that $E$ has no isolated points and that every set $\{v\}$ is closed, for every $v \in E$.
Since $E$ is nontrivial, for every $v \in U$, there is a nontrivial open ball contained in $U$ (an open
ball not reduced to its center). Then, for every $v \in U$, $A = U - \{v\}$ is open and nonempty,
and clearly, $v \in \overline{A}$. For any $v \in U$, if $f(x)$ approaches $b$ when $x$ approaches $v$ with values
in $A = U - \{v\}$, we say that $f(x)$ approaches $b$ when $x$ approaches $v$ with values $\neq v$ in $U$.
This is denoted by
\[ \lim_{x \to v,\, x \in U,\, x \neq v} f(x) = b. \]
Remark: Variations of the above case show up in the following case: $E = \mathbb{R}$, and $F$ is some
arbitrary topological space. Let $A$ be some nonempty subset of $\mathbb{R}$, and let $f : A \to F$ be
some function. For any $a \in A$, we say that $f$ is continuous on the right at $a$ if
\[ \lim_{x \to a,\, x \in A \cap [a, +\infty[} f(x) = f(a). \]
Let us consider another variation. Let $A$ be some nonempty subset of $\mathbb{R}$, and let $f : A \to F$
be some function. For any $a \in A$, we say that $f$ has a discontinuity of the first kind at $a$ if
\[ \lim_{x \to a,\, x \in A \cap \, ]-\infty, a[} f(x) = f(a_-) \]
and
\[ \lim_{x \to a,\, x \in A \cap \, ]a, +\infty[} f(x) = f(a_+) \]
26.4 Connected Sets
and open sets are closed under arbitrary unions. However, either $f^{-1}(y) = \emptyset$ if $y \in Y - f(X)$
or $f$ is constant on $U = f^{-1}(y)$ if $y \in f(X)$ (with value $y$), and since $f$ is locally constant,
for every $x \in U$, there is some open set, $W \subseteq X$, such that $x \in W$ and $f$ is constant on $W$,
which implies that $f(w) = y$ for all $w \in W$ and thus, that $W \subseteq U$, showing that $U$ is a union
of open sets and thus, is open. The following proposition shows that a space is connected iff
every locally constant function is constant:
Proposition 26.13. A topological space is connected iff every locally constant function is
constant.
Proof. First, assume that $X$ is connected. Let $f : X \to Y$ be a locally constant function
to some space $Y$ and assume that $f$ is not constant. Pick any $y \in f(X)$. Since $f$ is not
constant, $U_1 = f^{-1}(y) \neq X$, and of course, $U_1 \neq \emptyset$. We proved just before Proposition
26.13 that $f^{-1}(V)$ is open for every subset $V \subseteq Y$, and thus $U_1 = f^{-1}(y) = f^{-1}(\{y\})$ and
$U_2 = f^{-1}(Y - \{y\})$ are both open, nonempty, and clearly $X = U_1 \cup U_2$ and $U_1$ and $U_2$ are
disjoint. This contradicts the fact that $X$ is connected, so $f$ must be constant.
Assume that every locally constant function, $f : X \to Y$, to a Hausdorff space, $Y$, is
constant. If $X$ is not connected, we can write $X = U_1 \cup U_2$, where both $U_1$, $U_2$ are open,
disjoint, and nonempty. We can define the function, $f : X \to \mathbb{R}$, such that $f(x) = 1$ on $U_1$
and $f(x) = 0$ on $U_2$. Since $U_1$ and $U_2$ are open, the function $f$ is locally constant, and yet
not constant, a contradiction.
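For a finite topological space, connectedness can be decided by searching for a proper nonempty subset that is both open and closed, which is exactly the decomposition $X = U_1 \cup U_2$ used in the proof above. This is a minimal sketch; the function names and the representation of a topology as a list of Python sets are assumptions.

```python
def clopen_splittings(X, opens):
    """Proper nonempty subsets U of X such that U and X - U are both open."""
    X = frozenset(X)
    opens = {frozenset(U) for U in opens}
    return [U for U in opens if U and U != X and (X - U) in opens]

def is_connected(X, opens):
    """A space is connected iff it has no proper nonempty clopen subset."""
    return not clopen_splittings(X, opens)

X = {0, 1}
sierpinski = [set(), {1}, {0, 1}]          # {1} is open, {0} is not
discrete = [set(), {0}, {1}, {0, 1}]

print(is_connected(X, sierpinski))
print(is_connected(X, discrete))

# On a disconnected space, the function equal to 1 on U1 and 0 on X - U1
# is locally constant but not constant, as in the proof above.
U1 = clopen_splittings(X, discrete)[0]
f = {x: (1 if x in U1 else 0) for x in X}
print(sorted(set(f.values())))
```

The discrete two-point space splits into two open singletons, so the indicator function of either one is a non-constant locally constant function.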
The following standard proposition characterizing the connected subsets of R can be
found in most topology texts (for example, Munkres [84], Schwartz [93]). For the sake of
completeness, we give a proof.
Proposition 26.14. A subset of the real line, $\mathbb{R}$, is connected iff it is an interval, i.e., of
the form $[a, b]$, $]a, b]$, where $a = -\infty$ is possible, $[a, b[$, where $b = +\infty$ is possible, or $]a, b[$,
where $a = -\infty$ or $b = +\infty$ is possible.
Proof. Assume that $A$ is a connected nonempty subset of $\mathbb{R}$. The cases where $A = \emptyset$ or
$A$ consists of a single point are trivial. We show that whenever $a, b \in A$, $a < b$, then the
entire interval $[a, b]$ is a subset of $A$. Indeed, if this was not the case, there would be some
$c \in \, ]a, b[$ such that $c \notin A$, and then we could write $A = (\, ]-\infty, c[ \, \cap A) \cup (\, ]c, +\infty[ \, \cap A)$, where
$]-\infty, c[ \, \cap A$ and $]c, +\infty[ \, \cap A$ are nonempty and disjoint open subsets of $A$, contradicting the
fact that $A$ is connected. It follows easily that $A$ must be an interval.
Conversely, we show that an interval, $I$, must be connected. Let $A$ be any nonempty
subset of $I$ which is both open and closed in $I$. We show that $I = A$. Fix any $x \in A$
and consider the set, $R_x$, of all $y$ such that $[x, y] \subseteq A$. If the set $R_x$ is unbounded, then
$R_x = [x, +\infty[$. Otherwise, if this set is bounded, let $b$ be its least upper bound. We
claim that $b$ is the right boundary of the interval $I$. Because $A$ is closed in $I$, unless $I$
is open on the right and $b$ is its right boundary, we must have $b \in A$. In the first case,
$A \cap [x, b[ \, = I \cap [x, b[ \, = [x, b[$. In the second case, because $A$ is also open in $I$, unless $b$ is the
right boundary of the interval $I$ (closed on the right), there is some open set $]b - \eta, b + \eta[$
contained in $A$, which implies that $[x, b + \eta/2] \subseteq A$, contradicting the fact that $b$ is the least
upper bound of the set $R_x$. Thus, $b$ must be the right boundary of the interval $I$ (closed on
the right). A similar argument applies to the set, $L_y$, of all $x$ such that $[x, y] \subseteq A$ and either
$L_y$ is unbounded, or its greatest lower bound $a$ is the left boundary of $I$ (open or closed on
the left). In all cases, we showed that $A = I$, and the interval must be connected.
A characterization of the connected subsets of $\mathbb{R}^n$ is harder and requires the notion of
arcwise connectedness. One of the most important properties of connected sets is that they
are preserved by continuous maps.
Proposition 26.15. Given any continuous map, $f : E \to F$, if $A \subseteq E$ is connected, then
$f(A)$ is connected.
Proof. If $f(A)$ is not connected, then there exist some nonempty open sets, $U$, $V$, in $F$ such
that $f(A) \cap U$ and $f(A) \cap V$ are nonempty and disjoint, and
\[ f(A) = (f(A) \cap U) \cup (f(A) \cap V). \]
Then, $f^{-1}(U)$ and $f^{-1}(V)$ are nonempty and open since $f$ is continuous and
\[ A = (A \cap f^{-1}(U)) \cup (A \cap f^{-1}(V)), \]
with $A \cap f^{-1}(U)$ and $A \cap f^{-1}(V)$ nonempty, disjoint, and open in $A$, contradicting the fact
that $A$ is connected.
An important corollary of Proposition 26.15 is that for every continuous function, $f : E \to \mathbb{R}$,
where $E$ is a connected space, $f(E)$ is an interval. Indeed, this follows from Proposition
26.14. Thus, if $f$ takes the values $a$ and $b$ where $a < b$, then $f$ takes all values $c \in [a, b]$.
This is a very important property.
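When $E$ is a closed interval of $\mathbb{R}$, this intermediate value property is what makes root finding by bisection work. The sketch below is one standard way to locate a point where a continuous function takes a prescribed intermediate value; the function name, the tolerance parameter, and the sample function are illustrative assumptions, not part of the text.

```python
def solve_by_bisection(f, lo, hi, c, tol=1e-12):
    """Approximate some x in [lo, hi] with f(x) = c.

    Assumes f is continuous on [lo, hi] and c lies between f(lo) and f(hi),
    so that such an x exists by the intermediate value property.
    """
    flo, fhi = f(lo) - c, f(hi) - c
    if flo == 0:
        return lo
    if fhi == 0:
        return hi
    if flo * fhi > 0:
        raise ValueError("c is not between f(lo) and f(hi)")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        fmid = f(mid) - c
        if fmid == 0:
            return mid
        if flo * fmid < 0:
            hi = mid                 # sign change lies in [lo, mid]
        else:
            lo, flo = mid, fmid      # sign change lies in [mid, hi]
    return (lo + hi) / 2

def g(x):
    return x ** 3 + x

# g(0) = 0 and g(2) = 10, so the value 5 must be taken somewhere in [0, 2].
x = solve_by_bisection(g, 0.0, 2.0, 5.0)
print(x, g(x))
```

Each step keeps a subinterval on which $f - c$ changes sign, so the halving never discards every solution.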
Even if a topological space is not connected, it turns out that it is the disjoint union of
maximal connected subsets and these connected components are closed in E. In order to
obtain this result, we need a few lemmas.
Lemma 26.16. Given a topological space, $E$, for any family, $(A_i)_{i \in I}$, of (nonempty)
connected subsets of $E$, if $A_i \cap A_j \neq \emptyset$ for all $i, j \in I$, then the union, $A = \bigcup_{i \in I} A_i$, of the
family, $(A_i)_{i \in I}$, is also connected.
Proof. Assume that $\bigcup_{i \in I} A_i$ is not connected. Then, there exist two nonempty open subsets,
$U$ and $V$, of $E$ such that $A \cap U$ and $A \cap V$ are disjoint and nonempty and such that
\[ A = (A \cap U) \cup (A \cap V). \]
Now, for every $i \in I$, we can write
\[ A_i = (A_i \cap U) \cup (A_i \cap V), \]
where $A_i \cap U$ and $A_i \cap V$ are disjoint, since $A_i \subseteq A$ and $A \cap U$ and $A \cap V$ are disjoint. Since
$A_i$ is connected, either $A_i \cap U = \emptyset$ or $A_i \cap V = \emptyset$. This implies that either $A_i \subseteq A \cap U$ or
$A_i \subseteq A \cap V$. However, by assumption, $A_i \cap A_j \neq \emptyset$, for all $i, j \in I$, and thus, either both
$A_i \subseteq A \cap U$ and $A_j \subseteq A \cap U$, or both $A_i \subseteq A \cap V$ and $A_j \subseteq A \cap V$, since $A \cap U$ and $A \cap V$
are disjoint. Thus, we conclude that either $A_i \subseteq A \cap U$ for all $i \in I$, or $A_i \subseteq A \cap V$ for all
$i \in I$. But this proves that either
\[ A = \bigcup_{i \in I} A_i \subseteq A \cap U, \]
or
\[ A = \bigcup_{i \in I} A_i \subseteq A \cap V, \]
contradicting the fact that both $A \cap U$ and $A \cap V$ are disjoint and nonempty. Thus, $A$ must
be connected.
In particular, the above lemma applies when the connected sets in a family $(A_i)_{i \in I}$ have
a point in common.
Lemma 26.17. If $A$ is a connected subset of a topological space, $E$, then for every subset,
$B$, such that $A \subseteq B \subseteq \overline{A}$, where $\overline{A}$ is the closure of $A$ in $E$, the set $B$ is connected.
Proof. If $B$ is not connected, then there are two nonempty open subsets, $U$, $V$, of $E$ such
that $B \cap U$ and $B \cap V$ are disjoint and nonempty, and
\[ B = (B \cap U) \cup (B \cap V). \]
Since $A \subseteq B$, the above implies that
\[ A = (A \cap U) \cup (A \cap V), \]
and since $A$ is connected, either $A \cap U = \emptyset$, or $A \cap V = \emptyset$. Without loss of generality, assume
that $A \cap V = \emptyset$, which implies that $A \subseteq A \cap U \subseteq B \cap U$. However, $B \cap U$ is closed in
the subspace topology for $B$ and since $B \subseteq \overline{A}$ and $\overline{A}$ is closed in $E$, the closure of $A$ in $B$
w.r.t. the subspace topology of $B$ is clearly $B \cap \overline{A} = B$, which implies that $B \subseteq B \cap U$
(since the closure is the smallest closed set containing the given set). Thus, $B \cap V = \emptyset$, a
contradiction.
In particular, Lemma 26.17 shows that if $A$ is a connected subset, then its closure, $\overline{A}$, is
also connected. We are now ready to introduce the connected components of a space.
Definition 26.15. Given a topological space, $(E, \mathcal{O})$, we say that two points, $a, b \in E$, are
connected if there is some connected subset, $A$, of $E$ such that $a \in A$ and $b \in A$.
There are connected spaces that are not locally connected and there are locally connected
spaces that are not connected. The two properties are independent.
Proposition 26.19. A topological space, $E$, is locally connected iff for every open subset,
$A$, of $E$, the connected components of $A$ are open.
Proof. Assume that $E$ is locally connected. Let $A$ be any open subset of $E$ and let $C$ be one
of the connected components of $A$. For any $a \in C \subseteq A$, there is some connected neighborhood,
$U$, of $a$ such that $U \subseteq A$ and since $C$ is a connected component of $A$ containing $a$, we must
have $U \subseteq C$. This shows that for every $a \in C$, there is some open subset containing $a$
contained in $C$, so $C$ is open.
Conversely, assume that for every open subset, $A$, of $E$, the connected components of $A$
are open. Then, for every $a \in E$ and every neighborhood, $U$, of $a$, since $U$ contains some
open set $A$ containing $a$, the interior, $\overset{\circ}{U}$, of $U$ is an open set containing $a$ and its connected
components are open. In particular, the connected component $C$ containing $a$ is a connected
open set containing $a$ and contained in $U$.
Proposition 26.19 shows that in a locally connected space, the connected open sets form a
basis for the topology. It is easily seen that $\mathbb{R}^n$ is locally connected. Another very important
property of surfaces and more generally, manifolds, is to be arcwise connected. The intuition
is that any two points can be joined by a continuous arc of curve. This is formalized as
follows.
Definition 26.17. Given a topological space, $(E, \mathcal{O})$, an arc (or path) is a continuous map,
$\gamma : [a, b] \to E$, where $[a, b]$ is a closed interval of the real line, $\mathbb{R}$. The point $\gamma(a)$ is the initial
point of the arc and the point $\gamma(b)$ is the terminal point of the arc. We say that $\gamma$ is an arc
joining $\gamma(a)$ and $\gamma(b)$. An arc is a closed curve if $\gamma(a) = \gamma(b)$. The set $\gamma([a, b])$ is the trace
of the arc $\gamma$.
One should not confuse an arc, $\gamma : [a, b] \to E$, with its trace. For example, $\gamma$ could be
constant, and thus, its trace reduced to a single point.
An arc is a Jordan arc if $\gamma$ is a homeomorphism onto its trace. An arc, $\gamma : [a, b] \to E$,
is a Jordan curve if $\gamma(a) = \gamma(b)$ and $\gamma$ is injective on $[a, b[$. Since $[a, b]$ is connected, by
Proposition 26.15, the trace $\gamma([a, b])$ of an arc is a connected subset of $E$.
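Given two arcs $\gamma : [0, 1] \to E$ and $\delta : [0, 1] \to E$ such that $\gamma(1) = \delta(0)$, we can form a new
arc defined as follows:
Definition 26.18. Given two arcs, $\gamma : [0, 1] \to E$ and $\delta : [0, 1] \to E$, such that $\gamma(1) = \delta(0)$,
we can form their composition (or product), $\gamma\delta$, defined such that
\[ \gamma\delta(t) = \begin{cases} \gamma(2t) & \text{if } 0 \leq t \leq 1/2; \\ \delta(2t - 1) & \text{if } 1/2 \leq t \leq 1. \end{cases} \]
The inverse, $\gamma^{-1}$, of the arc, $\gamma$, is the arc defined such that $\gamma^{-1}(t) = \gamma(1 - t)$, for all $t \in [0, 1]$.
It is trivially verified that Definition 26.18 yields continuous arcs.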
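The composition and inverse of Definition 26.18 translate directly into code when arcs are represented as Python functions on $[0, 1]$. The sketch below uses two line segments in the plane as sample arcs; the names `compose` and `inverse` and the sample arcs are illustrative.

```python
def compose(gamma, delta):
    """Composition of two arcs on [0, 1], assuming gamma(1) == delta(0)."""
    def path(t):
        # First half of [0, 1] runs gamma at double speed, second half delta.
        return gamma(2 * t) if t <= 0.5 else delta(2 * t - 1)
    return path

def inverse(gamma):
    """The arc gamma traversed backwards: t -> gamma(1 - t)."""
    return lambda t: gamma(1 - t)

# Two segments in the plane: (0, 0) -> (1, 0), then (1, 0) -> (1, 1).
gamma = lambda t: (t, 0.0)
delta = lambda t: (1.0, t)

path = compose(gamma, delta)
print(path(0.0), path(0.5), path(0.75), path(1.0))
print(inverse(path)(0.0), inverse(path)(1.0))
```

At $t = 1/2$ both branches of the definition agree because $\gamma(1) = \delta(0)$, which is the reason the composed arc is continuous.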
Definition 26.19. A topological space, $E$, is arcwise connected if for any two points,
$a, b \in E$, there is an arc, $\gamma : [0, 1] \to E$, joining $a$ and $b$, i.e., such that $\gamma(0) = a$ and
$\gamma(1) = b$. A topological space, $E$, is locally arcwise connected if for every $a \in E$, for every
neighborhood, $V$, of $a$, there is an arcwise connected neighborhood, $U$, of $a$ such that $U \subseteq V$.
The space $\mathbb{R}^n$ is locally arcwise connected, since for any open ball, any two points in this
ball are joined by a line segment. Manifolds and surfaces are also locally arcwise connected.
Proposition 26.15 also applies to arcwise connectedness (this is a simple exercise). The
following theorem is crucial to the theory of manifolds and surfaces:
Theorem 26.20. If a topological space, E, is arcwise connected, then it is connected. If a
topological space, E, is connected and locally arcwise connected, then E is arcwise connected.
Proof. First, assume that $E$ is arcwise connected. Pick any point, $a$, in $E$. Since $E$ is arcwise
connected, for every $b \in E$, there is a path, $\gamma_b : [0, 1] \to E$, from $a$ to $b$ and so,
\[ E = \bigcup_{b \in E} \gamma_b([0, 1]) \]
a neighborhood of $b$). Thus, $b$ can be joined to every point $c \in U$ by an arc, and since by the
definition of $F_a$, there is an arc from $a$ to $b$, the composition of these two arcs yields an arc
from $a$ to $c$, which shows that $c \in F_a$. But then $U \subseteq F_a$ and thus, $F_a$ is open. Now assume
that $b$ is in the complement of $F_a$. As in the previous case, there is some arcwise connected
neighborhood $U$ containing $b$. Thus, every point $c \in U$ can be joined to $b$ by an arc. If
there was an arc joining $a$ to $c$, we would get an arc from $a$ to $b$, contradicting the fact that
$b$ is in the complement of $F_a$. Thus, every point $c \in U$ is in the complement of $F_a$, which
shows that $U$ is contained in the complement of $F_a$, and thus, that the complement of
$F_a$ is open. Consequently, we have shown that $F_a$ is both open and closed and since it is
nonempty, we must have $E = F_a$, which shows that $E$ is arcwise connected.
If E is locally arcwise connected, the above argument shows that the connected components of E are arcwise connected.
It is not true that a connected space is arcwise connected. For example, the space
consisting of the graph of the function
\[ f(x) = \sin(1/x), \]
where $x > 0$, together with the portion of the $y$-axis, for which $-1 \leq y \leq 1$, is connected,
but not arcwise connected.
A trivial modification of the proof of Theorem 26.20 shows that in a normed vector
space, E, a connected open set is arcwise connected by polygonal lines (i.e., arcs consisting
of line segments). This is because in every open ball, any two points are connected by a line
segment. Furthermore, if E is finite dimensional, these polygonal lines can be forced to be
parallel to basis vectors.
We now consider compactness.
26.5 Compact Sets
The property of compactness is very important in topology and analysis. We provide a quick
review geared towards the study of surfaces and for details, we refer the reader to Munkres
[84], Schwartz [93]. In this section, we will need to assume that the topological spaces are
Hausdorff spaces. This is not a luxury, as many of the results are false otherwise.
There are various equivalent ways of defining compactness. For our purposes, the most
convenient way involves the notion of open cover.
Definition 26.20. Given a topological space, $E$, for any subset, $A$, of $E$, an open cover,
$(U_i)_{i \in I}$, of $A$ is a family of open subsets of $E$ such that $A \subseteq \bigcup_{i \in I} U_i$. An open subcover of an
open cover, $(U_i)_{i \in I}$, of $A$ is any subfamily, $(U_j)_{j \in J}$, which is an open cover of $A$, with $J \subseteq I$.
An open cover, $(U_i)_{i \in I}$, of $A$ is finite if $I$ is finite. The topological space, $E$, is compact if it
is Hausdorff and for every open cover, $(U_i)_{i \in I}$, of $E$, there is a finite open subcover, $(U_j)_{j \in J}$,
of $E$. Given any subset, $A$, of $E$, we say that $A$ is compact if it is compact with respect to
the subspace topology. We say that $A$ is relatively compact if its closure $\overline{A}$ is compact.
It is immediately verified that a subset, $A$, of $E$ is compact in the subspace topology
relative to $A$ iff for every open cover, $(U_i)_{i \in I}$, of $A$ by open subsets of $E$, there is a finite
open subcover, $(U_j)_{j \in J}$, of $A$. The property that every open cover contains a finite open
subcover is often called the Heine-Borel-Lebesgue property. By considering complements, a
Hausdorff space is compact iff for every family, $(F_i)_{i \in I}$, of closed sets, if $\bigcap_{i \in I} F_i = \emptyset$, then
$\bigcap_{j \in J} F_j = \emptyset$ for some finite subset, $J$, of $I$.
Definition 26.20 requires that a compact space be Hausdorff. There are books in which a
compact space is not necessarily required to be Hausdorff. Following Schwartz, we prefer
calling such a space quasi-compact.
Another equivalent and useful characterization can be given in terms of families having
the finite intersection property. A family, $(F_i)_{i \in I}$, of sets has the finite intersection property
if $\bigcap_{j \in J} F_j \neq \emptyset$ for every finite subset, $J$, of $I$. We have the following proposition:
Proposition 26.21. A topological Hausdorff space, $E$, is compact iff for every family,
$(F_i)_{i \in I}$, of closed sets having the finite intersection property, we have $\bigcap_{i \in I} F_i \neq \emptyset$.
Proof. If $E$ is compact and $(F_i)_{i \in I}$ is a family of closed sets having the finite intersection
property, then $\bigcap_{i \in I} F_i$ cannot be empty, since otherwise we would have $\bigcap_{j \in J} F_j = \emptyset$ for some
finite subset, $J$, of $I$, a contradiction. The converse is equally obvious.
Another useful consequence of compactness is as follows. For any family, $(F_i)_{i \in I}$, of closed
sets such that $F_{i+1} \subseteq F_i$ for all $i \in I$, if $\bigcap_{i \in I} F_i = \emptyset$, then $F_i = \emptyset$ for some $i \in I$. Indeed,
there must be some finite subset, $J$, of $I$ such that $\bigcap_{j \in J} F_j = \emptyset$ and since $F_{i+1} \subseteq F_i$ for all
$i \in I$, we must have $F_j = \emptyset$ for the smallest $F_j$ in $(F_j)_{j \in J}$. Using this fact, we note that $\mathbb{R}$
is not compact. Indeed, the family of closed sets, $([n, +\infty[)_{n \geq 0}$, is decreasing and has an
empty intersection.
Given a metric space, if we define a bounded subset to be a subset that can be enclosed
in some closed ball (of finite radius), then any nonbounded subset of a metric space is not
compact. However, a closed interval [a, b] of the real line is compact.
Proposition 26.22. Every closed interval, [a, b], of the real line is compact.
Proof. We proceed by contradiction. Let $(U_i)_{i \in I}$ be any open cover of $[a, b]$ and assume that
there is no finite open subcover. Let $c = (a + b)/2$. If both $[a, c]$ and $[c, b]$ had some finite
open subcover, so would $[a, b]$, and thus, either $[a, c]$ does not have any finite open subcover, or
$[c, b]$ does not have any finite open subcover. Let $[a_1, b_1]$ be such a bad subinterval. The
same argument applies and we split $[a_1, b_1]$ into two equal subintervals, one of which must be
bad. Thus, having defined $[a_n, b_n]$ of length $(b - a)/2^n$ as an interval having no finite open
subcover, splitting $[a_n, b_n]$ into two equal intervals, we know that at least one of the two has
no finite open subcover and we denote such a bad interval by $[a_{n+1}, b_{n+1}]$. The sequence
$(a_n)$ is nondecreasing and bounded from above by $b$, and thus, by a fundamental property
of the real line, it converges to its least upper bound, $\alpha$. Similarly, the sequence $(b_n)$ is
nonincreasing and bounded from below by $a$ and thus, it converges to its greatest lower
bound, $\beta$. Since $[a_n, b_n]$ has length $(b - a)/2^n$, we must have $\alpha = \beta$. However, the common
limit $\alpha = \beta$ of the sequences $(a_n)$ and $(b_n)$ must belong to some open set, $U_i$, of the open
cover and since $U_i$ is open, it must contain some interval $[c, d]$ containing $\alpha$. Then, because $\alpha$
is the common limit of the sequences $(a_n)$ and $(b_n)$, there is some $N$ such that the intervals
$[a_n, b_n]$ are all contained in the interval $[c, d]$ for all $n \geq N$, which contradicts the fact that
none of the intervals $[a_n, b_n]$ has a finite open subcover. Thus, $[a, b]$ is indeed compact.
The argument of Proposition 26.22 can be adapted to show that in $\mathbb{R}^m$, every closed set, $[a_1, b_1] \times \cdots \times [a_m, b_m]$, is compact. At every stage, we need to divide into $2^m$ subpieces instead of 2.
The following two propositions give very important properties of the compact sets, and
they only hold for Hausdorff spaces:
Proposition 26.23. Given a topological Hausdorff space, $E$, for every compact subset, $A$, and every point, $b$, not in $A$, there exist disjoint open sets, $U$ and $V$, such that $A \subseteq U$ and $b \in V$. As a consequence, every compact subset is closed.
Proof. Since $E$ is Hausdorff, for every $a \in A$, there are some disjoint open sets, $U_a$ and $V_a$, containing $a$ and $b$ respectively. Thus, the family, $(U_a)_{a \in A}$, forms an open cover of $A$. Since $A$ is compact there is a finite open subcover, $(U_j)_{j \in J}$, of $A$, where $J \subseteq A$ is finite, and then $\bigcup_{j \in J} U_j$ is an open set containing $A$ disjoint from the open set $\bigcap_{j \in J} V_j$ containing $b$. This shows that every point, $b$, in the complement of $A$ belongs to some open set in this complement and thus, that the complement is open, i.e., that $A$ is closed.
Actually, the proof of Proposition 26.23 can be used to show the following useful property:
Proposition 26.24. Given a topological Hausdorff space, $E$, for every pair of compact disjoint subsets, $A$ and $B$, there exist disjoint open sets, $U$ and $V$, such that $A \subseteq U$ and $B \subseteq V$.
Proof. We repeat the argument of Proposition 26.23 with $B$ playing the role of $b$ and use Proposition 26.23 to find disjoint open sets, $U_a$, containing $a \in A$ and, $V_a$, containing $B$.
The following proposition shows that in a compact topological space, every closed set is
compact:
Proposition 26.25. Given a compact topological space, $E$, every closed set is compact.
Proof. Let $A$ be a closed subset of $E$. Since $A$ is closed, $E - A$ is open and from any open cover, $(U_i)_{i \in I}$, of $A$, we can form an open cover of $E$ by adding $E - A$ to $(U_i)_{i \in I}$ and, since $E$ is compact, a finite subcover, $(U_j)_{j \in J} \cup \{E - A\}$, of $E$ can be extracted such that $(U_j)_{j \in J}$ is a finite subcover of $A$.
Remark: Proposition 26.25 also holds for quasi-compact spaces, i.e., the Hausdorff separation property is not needed.
Putting Proposition 26.24 and Proposition 26.25 together, we note that if $X$ is compact, then for every pair of disjoint closed sets, $A$ and $B$, there exist disjoint open sets, $U$ and $V$, such that $A \subseteq U$ and $B \subseteq V$. We say that $X$ is a normal space.
Proposition 26.26. Given a compact topological space, $E$, for every $a \in E$, for every neighborhood, $V$, of $a$, there exists a compact neighborhood, $U$, of $a$ such that $U \subseteq V$.
Proof. Since $V$ is a neighborhood of $a$, there is some open subset, $O$, of $V$ containing $a$. Then the complement, $K = E - O$, of $O$ is closed and since $E$ is compact, by Proposition 26.25, $K$ is compact. Now, if we consider the family of all closed sets of the form, $K \cap F$, where $F$ is any closed neighborhood of $a$, since $a \notin K$, this family has an empty intersection and thus, there is a finite number of closed neighborhoods, $F_1, \ldots, F_n$, of $a$, such that $K \cap F_1 \cap \cdots \cap F_n = \emptyset$. Then, $U = F_1 \cap \cdots \cap F_n$ is a compact neighborhood of $a$ contained in $O \subseteq V$.
It can be shown that in a normed vector space of finite dimension, a subset is compact
iff it is closed and bounded. For Rn , the proof is simple.
In a normed vector space of infinite dimension, there are closed and bounded sets that
are not compact!
More could be said about compactness in metric spaces but we will only need the notion
of Lebesgue number, which will be discussed a little later. Another crucial property of
compactness is that it is preserved under continuity.
Proposition 26.27. Let $E$ be a topological space and let $F$ be a topological Hausdorff space. For every compact subset, $A$, of $E$, for every continuous map, $f: E \to F$, the subspace $f(A)$ is compact.
Proof. Let $(U_i)_{i \in I}$ be an open cover of $f(A)$. We claim that $(f^{-1}(U_i))_{i \in I}$ is an open cover of $A$, which is easily checked. Since $A$ is compact, there is a finite open subcover, $(f^{-1}(U_j))_{j \in J}$, of $A$, and thus, $(U_j)_{j \in J}$ is a finite open subcover of $f(A)$.
As a corollary of Proposition 26.27, if $E$ is compact, $F$ is Hausdorff, and $f: E \to F$ is continuous and bijective, then $f$ is a homeomorphism. Indeed, it is enough to show that $f^{-1}$ is continuous, which is equivalent to showing that $f$ maps closed sets to closed sets. However, closed sets are compact and Proposition 26.27 shows that compact sets are mapped to compact sets, which, by Proposition 26.23, are closed.
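Definition 26.22. Let $(E, \mathcal{O})$ be a locally compact space. Let $\omega$ be any point not in $E$, and let $E_\omega = E \cup \{\omega\}$. Define the family, $\mathcal{O}_\omega$, as follows:
$$\mathcal{O}_\omega = \mathcal{O} \cup \{(E - K) \cup \{\omega\} \mid K \text{ compact in } E\}.$$
The pair, $(E_\omega, \mathcal{O}_\omega)$, is called the Alexandroff compactification (or one point compactification) of $(E, \mathcal{O})$.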
The following theorem shows that $(E_\omega, \mathcal{O}_\omega)$ is indeed a topological space, and that it is compact.
Theorem 26.29. Let $E$ be a locally compact topological space. The Alexandroff compactification, $E_\omega$, of $E$ is a compact space such that $E$ is a subspace of $E_\omega$ and if $E$ is not compact, then $\overline{E} = E_\omega$.
Proof. The verification that $\mathcal{O}_\omega$ is a family of open sets is not difficult but a bit tedious. Details can be found in Munkres [84] or Schwartz [93]. Let us show that $E_\omega$ is compact. For every open cover, $(U_i)_{i \in I}$, of $E_\omega$, since $\omega$ must be covered, there is some $U_{i_0}$ of the form
$$U_{i_0} = (E - K_0) \cup \{\omega\}$$
where $K_0$ is compact in $E$. Consider the family, $(V_i)_{i \in I}$, defined as follows:
$V_i = U_i$ if $U_i \in \mathcal{O}$,
$V_i = E - K$ if $U_i = (E - K) \cup \{\omega\}$,
where $K$ is compact in $E$. Then, because each $K$ is compact and thus closed in $E$ (since $E$ is Hausdorff), $E - K$ is open, and every $V_i$ is an open subset of $E$. Furthermore, the family, $(V_i)_{i \in (I - \{i_0\})}$, is an open cover of $K_0$. Since $K_0$ is compact, there is a finite open subcover, $(V_j)_{j \in J}$, of $K_0$, and thus, $(U_j)_{j \in J \cup \{i_0\}}$ is a finite open cover of $E_\omega$.
Let us show that $E_\omega$ is Hausdorff. Given any two points, $a, b \in E_\omega$, if both $a, b \in E$, since $E$ is Hausdorff and every open set in $\mathcal{O}$ is an open set in $\mathcal{O}_\omega$, there exist disjoint open sets, $U, V$ (in $\mathcal{O}$), such that $a \in U$ and $b \in V$. If $b = \omega$, since $E$ is locally compact, there is some compact set, $K$, containing an open set, $U$, containing $a$ and then, $U$ and $V = (E - K) \cup \{\omega\}$ are disjoint open sets (in $\mathcal{O}_\omega$) such that $a \in U$ and $b \in V$.
The space $E$ is a subspace of $E_\omega$ because for every open set, $U$, in $\mathcal{O}_\omega$, either $U \in \mathcal{O}$ and $E \cap U = U$ is open in $E$, or $U = (E - K) \cup \{\omega\}$, where $K$ is compact in $E$, and thus, $U \cap E = E - K$, which is open in $E$, since $K$ is compact in $E$ and thus, closed (since $E$ is Hausdorff). Finally, if $E$ is not compact, for every compact subset, $K$, of $E$, $E - K$ is nonempty and thus, for every open set, $U = (E - K) \cup \{\omega\}$, containing $\omega$, we have $U \cap E \neq \emptyset$, which shows that $\omega \in \overline{E}$ and thus, that $\overline{E} = E_\omega$.
Conversely, assume that $E$ is compact, and let $(x_n)$ be any sequence. If $l \in E$ is not an accumulation point of the sequence, then there is some open set, $U_l$, such that $l \in U_l$ and $x_n \in U_l$ for only finitely many $n$. Thus, if $(x_n)$ does not have any accumulation point, the family, $(U_l)_{l \in E}$, is an open cover of $E$ and since $E$ is compact, it has some finite open subcover, $(U_l)_{l \in J}$, where $J$ is a finite subset of $E$. But every $U_l$ with $l \in J$ is such that $x_n \in U_l$ for only finitely many $n$, and since $J$ is finite, $x_n \in \bigcup_{l \in J} U_l$ for only finitely many $n$, which contradicts the fact that $(U_l)_{l \in J}$ is an open cover of $E$, and thus contains all the $x_n$. Thus, $(x_n)$ has some accumulation point.
Remark: It should be noted that the proof showing that if E is compact, then every sequence has some accumulation point, holds for any arbitrary compact space (the proof does
not use a countable basis for the topology). The converse also holds for metric spaces. We
will prove this converse since it is a major property of metric spaces.
Given a metric space in which every sequence has some accumulation point, we first prove the existence of a Lebesgue number.
Lemma 26.32. Given a metric space, $E$, if every sequence, $(x_n)$, has an accumulation point, for every open cover, $(U_i)_{i \in I}$, of $E$, there is some $\delta > 0$ (a Lebesgue number for $(U_i)_{i \in I}$) such that, for every open ball, $B_0(a, \delta)$, of radius $\delta$, there is some open subset, $U_i$, such that $B_0(a, \delta) \subseteq U_i$.
Proof. If there was no $\delta$ with the above property, then, for every natural number, $n \geq 1$, there would be some open ball, $B_0(a_n, 1/n)$, which is not contained in any open set, $U_i$, of the open cover, $(U_i)_{i \in I}$. However, the sequence, $(a_n)$, has some accumulation point, $a$, and since $(U_i)_{i \in I}$ is an open cover of $E$, there is some $U_i$ such that $a \in U_i$. Since $U_i$ is open, there is some open ball of center $a$ and radius $\epsilon$ contained in $U_i$. Now, since $a$ is an accumulation point of the sequence, $(a_n)$, every open set containing $a$ contains $a_n$ for infinitely many $n$ and thus, there is some $n$ large enough so that
$$1/n \leq \epsilon/2 \quad\text{and}\quad a_n \in B_0(a, \epsilon/2),$$
which implies that
$$B_0(a_n, 1/n) \subseteq B_0(a, \epsilon) \subseteq U_i,$$
a contradiction.
By a previous remark, since the proof of Proposition 26.31 implies that in a compact
topological space, every sequence has some accumulation point, by Lemma 26.32, in a compact metric space, every open cover has a Lebesgue number. This fact can be used to prove
another important property of compact metric spaces, the uniform continuity theorem.
Definition 26.25. Given two metric spaces, $(E, d_E)$ and $(F, d_F)$, a function, $f: E \to F$, is uniformly continuous if for every $\epsilon > 0$, there is some $\eta > 0$, such that, for all $a, b \in E$,
if $d_E(a, b) \leq \eta$ then $d_F(f(a), f(b)) \leq \epsilon$.
A metric space satisfying the condition of Lemma 26.34 is sometimes called precompact (or totally bounded). We now obtain the Weierstrass-Bolzano property.
Theorem 26.35. A metric space, $E$, is compact iff every sequence, $(x_n)$, has an accumulation point.
Proof. We already observed that the proof of Proposition 26.31 shows that for any compact space (not necessarily metric), every sequence, $(x_n)$, has an accumulation point. Conversely, let $E$ be a metric space, and assume that every sequence, $(x_n)$, has an accumulation point. Given any open cover, $(U_i)_{i \in I}$, for $E$, we must find a finite open subcover of $E$. By Lemma 26.32, there is some $\delta > 0$ (a Lebesgue number for $(U_i)_{i \in I}$) such that, for every open ball, $B_0(a, \delta)$, of radius $\delta$, there is some open subset, $U_j$, such that $B_0(a, \delta) \subseteq U_j$. By Lemma 26.34, for every $\delta > 0$, there is a finite open cover, $B_0(a_0, \delta) \cup \cdots \cup B_0(a_n, \delta)$, of $E$ by open balls of radius $\delta$. But from the previous statement, every open ball, $B_0(a_i, \delta)$, is contained in some open set, $U_{j_i}$, and thus, $\{U_{j_0}, \ldots, U_{j_n}\}$ is a finite open cover of $E$.
Another very useful characterization of compact metric spaces is obtained in terms of
Cauchy sequences. Such a characterization is quite useful in fractal geometry (and elsewhere). First, recall the definition of a Cauchy sequence and of a complete metric space.
Definition 26.26. Given a metric space, $(E, d)$, a sequence, $(x_n)_{n \in \mathbb{N}}$, in $E$ is a Cauchy sequence if the following condition holds: for every $\epsilon > 0$, there is some $p \geq 0$, such that, for all $m, n \geq p$, we have $d(x_m, x_n) \leq \epsilon$.
If every Cauchy sequence in $(E, d)$ converges we say that $(E, d)$ is a complete metric space.
First, let us show the following proposition:
Proposition 26.36. Given a metric space, $E$, if a Cauchy sequence, $(x_n)$, has some accumulation point, $a$, then $a$ is the limit of the sequence, $(x_n)$.
Proof. Since $(x_n)$ is a Cauchy sequence, for every $\epsilon > 0$, there is some $p \geq 0$, such that, for all $m, n \geq p$, we have $d(x_m, x_n) \leq \epsilon/2$. Since $a$ is an accumulation point for $(x_n)$, for infinitely many $n$, we have $d(x_n, a) \leq \epsilon/2$, and thus, for at least some $n \geq p$, we have $d(x_n, a) \leq \epsilon/2$. Then, for all $m \geq p$,
$$d(x_m, a) \leq d(x_m, x_n) + d(x_n, a) \leq \epsilon,$$
which shows that $a$ is the limit of the sequence $(x_n)$.
Recall that a metric space is precompact (or totally bounded) if for every $\epsilon > 0$, there is a finite open cover, $B_0(a_0, \epsilon) \cup \cdots \cup B_0(a_n, \epsilon)$, of $E$ by open balls of radius $\epsilon$. We can now prove the following theorem.
Theorem 26.37. A metric space, $E$, is compact iff it is precompact and complete.
Proof. Let $E$ be compact. For every $\epsilon > 0$, the family of all open balls of radius $\epsilon$ is an open cover for $E$ and since $E$ is compact, there is a finite subcover, $B_0(a_0, \epsilon) \cup \cdots \cup B_0(a_n, \epsilon)$, of $E$ by open balls of radius $\epsilon$. Thus, $E$ is precompact. Since $E$ is compact, by Theorem 26.35, every sequence, $(x_n)$, has some accumulation point. Thus, every Cauchy sequence, $(x_n)$, has some accumulation point, $a$, and, by Proposition 26.36, $a$ is the limit of $(x_n)$. Thus, $E$ is complete.
Now, assume that $E$ is precompact and complete. We prove that every sequence, $(x_n)$, has an accumulation point. By the other direction of Theorem 26.35, this shows that $E$ is compact. Given any sequence, $(x_n)$, we construct a Cauchy subsequence, $(y_n)$, of $(x_n)$ as follows: Since $E$ is precompact, letting $\epsilon = 1$, there exists a finite cover, $\mathcal{U}_1$, of $E$ by open balls of radius 1. Thus, some open ball, $B_{o_1}$, in the cover, $\mathcal{U}_1$, contains infinitely many elements from the sequence $(x_n)$. Let $y_0$ be any element of $(x_n)$ in $B_{o_1}$. By induction, assume that a sequence of open balls, $(B_{o_i})_{1 \leq i \leq m}$, has been defined, such that every ball, $B_{o_i}$, has radius $1/2^i$, contains infinitely many elements from the sequence $(x_n)$ and contains some $y_i$ from $(x_n)$ such that
$$d(y_i, y_{i+1}) \leq \frac{1}{2^i},$$
for all $i$, $0 \leq i \leq m - 1$. Then, letting $\epsilon = 1/2^{m+1}$, because $E$ is precompact, there is some finite cover, $\mathcal{U}_{m+1}$, of $E$ by open balls of radius $\epsilon$ and thus, of the open ball $B_{o_m}$. Thus, some open ball, $B_{o_{m+1}}$, in the cover, $\mathcal{U}_{m+1}$, contains infinitely many elements from the sequence, $(x_n)$, and we let $y_{m+1}$ be any element of $(x_n)$ in $B_{o_{m+1}}$. Thus, we have defined by induction a sequence, $(y_n)$, which is a subsequence of $(x_n)$, and such that
$$d(y_i, y_{i+1}) \leq \frac{1}{2^i},$$
for all $i$. However, for all $m, n$ with $1 \leq m < n$, we have
$$d(y_m, y_n) \leq d(y_m, y_{m+1}) + \cdots + d(y_{n-1}, y_n) \leq \sum_{i=m}^{n} \frac{1}{2^i} \leq \frac{1}{2^{m-1}},$$
and thus, $(y_n)$ is a Cauchy sequence. Since $E$ is complete, the sequence, $(y_n)$, has a limit, and since it is a subsequence of $(x_n)$, the sequence, $(x_n)$, has some accumulation point.
If $(E, d)$ is a nonempty complete metric space, every map, $f: E \to E$, for which there is some $k$ such that $0 \leq k < 1$ and
$$d(f(x), f(y)) \leq k\, d(x, y)$$
for all $x, y \in E$, has the very important property that it has a unique fixed point, that is, there is a unique, $a \in E$, such that $f(a) = a$. A map as above is called a contraction mapping. Furthermore, the fixed point of a contraction mapping can be computed as the limit of a fast converging sequence.
The fixed point property of contraction mappings is used to show some important theorems of analysis, such as the implicit function theorem and the existence of solutions to
certain differential equations. It can also be used to show the existence of fractal sets defined in terms of iterated function systems. Since the proof is quite simple, we prove the
fixed point property of contraction mappings. First, observe that a contraction mapping is
(uniformly) continuous.
Proposition 26.38. If $(E, d)$ is a nonempty complete metric space, every contraction mapping, $f: E \to E$, has a unique fixed point. Furthermore, for every $x_0 \in E$, defining the sequence, $(x_n)$, such that $x_{n+1} = f(x_n)$, the sequence, $(x_n)$, converges to the unique fixed point of $f$.
Proof. First, we prove that $f$ has at most one fixed point. Indeed, if $f(a) = a$ and $f(b) = b$, since
$$d(a, b) = d(f(a), f(b)) \leq k\, d(a, b)$$
and $0 \leq k < 1$, we must have $d(a, b) = 0$, that is, $a = b$.
Next, we prove that $(x_n)$ is a Cauchy sequence. Since $d(x_{n+1}, x_n) = d(f(x_n), f(x_{n-1})) \leq k\, d(x_n, x_{n-1})$, an easy induction shows that $d(x_{n+1}, x_n) \leq k^n d(x_1, x_0)$ for all $n \geq 0$.
Thus, we have
$$d(x_{n+p}, x_n) \leq d(x_{n+p}, x_{n+p-1}) + d(x_{n+p-1}, x_{n+p-2}) + \cdots + d(x_{n+1}, x_n)$$
$$\leq (k^{p-1} + k^{p-2} + \cdots + k + 1)\, k^n d(x_1, x_0)$$
$$\leq \frac{k^n}{1 - k}\, d(x_1, x_0).$$
We conclude that $d(x_{n+p}, x_n)$ converges to 0 when $n$ goes to infinity, which shows that $(x_n)$ is a Cauchy sequence. Since $E$ is complete, the sequence $(x_n)$ has a limit, $a$. Since $f$ is continuous, the sequence $(f(x_n))$ converges to $f(a)$. But $x_{n+1} = f(x_n)$ converges to $a$ and so $f(a) = a$, the unique fixed point of $f$.
Note that no matter how the starting point $x_0$ of the sequence $(x_n)$ is chosen, $(x_n)$ converges to the unique fixed point of $f$. Also, the convergence is fast, since
$$d(x_n, a) \leq \frac{k^n}{1 - k}\, d(x_1, x_0).$$
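To make the iteration and its error bound concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the function name `fixed_point`, the tolerance, and the example map are ours, not the text's. We take $E = [0, 1]$ with the usual distance and $f = \cos$, which maps $[0, 1]$ into itself and is a contraction there with constant $k = \sin(1) < 1$ (by the mean value theorem). The loop stops as soon as the a priori bound $k^n d(x_1, x_0)/(1 - k)$ drops below the tolerance.

```python
import math

def fixed_point(f, x0, k, tol=1e-12, max_iter=10_000):
    """Iterate x_{n+1} = f(x_n) for a contraction f on the real line.

    k is an assumed contraction constant (0 <= k < 1) supplied by the caller.
    Returns (x_n, n), where n is the first index for which the a priori bound
    k**n / (1 - k) * |x_1 - x_0| is at most tol (or max_iter is reached).
    """
    x = f(x0)                 # x_1
    d10 = abs(x - x0)         # d(x_1, x_0)
    n = 1
    while k**n / (1.0 - k) * d10 > tol and n < max_iter:
        x = f(x)              # x_{n+1} = f(x_n)
        n += 1
    return x, n

if __name__ == "__main__":
    # cos is a contraction on [0, 1] with constant sin(1), approximately 0.84.
    a, n = fixed_point(math.cos, 0.5, math.sin(1.0))
    print(a, n)
```

The stopping rule uses only $x_0$, $x_1$ and $k$, so the number of iterations needed for a given accuracy can be predicted before running the loop; the bound is usually pessimistic, and the actual error is often much smaller.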
The Hausdorff distance between compact subsets of a metric space provides a very nice
illustration of some of the theorems on complete and compact metric spaces just presented.
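Definition 26.27. Given a metric space, $(X, d)$, for any subset, $A \subseteq X$, for any, $\epsilon \geq 0$, define the $\epsilon$-hull of $A$ as the set
$$V_\epsilon(A) = \{x \in X \mid \exists a \in A,\ d(a, x) \leq \epsilon\}.$$
Given any two nonempty bounded subsets, $A, B$ of $X$, define $D(A, B)$, the Hausdorff distance between $A$ and $B$, by
$$D(A, B) = \inf\{\epsilon \geq 0 \mid A \subseteq V_\epsilon(B) \text{ and } B \subseteq V_\epsilon(A)\}.$$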
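For finite subsets of the plane the infimum in this definition can be computed directly: $A \subseteq V_\epsilon(B)$ holds exactly when every point of $A$ is within $\epsilon$ of some point of $B$, so $D(A, B)$ is the larger of the two "directed" distances. The following Python sketch illustrates this special case only; the function names are ours and the restriction to finite point sets in $\mathbb{R}^2$ with the Euclidean distance is an assumption made for the example.

```python
import math

def directed_distance(A, B):
    """Smallest eps such that A is contained in the eps-hull of B (finite sets)."""
    return max(min(math.dist(a, b) for b in B) for a in A)

def hausdorff_distance(A, B):
    """Hausdorff distance D(A, B) between two nonempty finite point sets."""
    return max(directed_distance(A, B), directed_distance(B, A))

if __name__ == "__main__":
    A = [(0.0, 0.0), (1.0, 0.0)]
    B = [(0.0, 0.0), (1.0, 0.0), (1.0, 2.0)]
    # A lies inside B, so the directed distance from A to B is 0,
    # but the point (1, 2) of B is at distance 2 from A.
    print(directed_distance(A, B), hausdorff_distance(A, B))
```

Note that the directed distance is not symmetric, which is why both inclusions appear in the definition.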
Note that since we are considering nonempty bounded subsets, D(A, B) is well defined
(i.e., not infinite). However, D is not necessarily a distance function. It is a distance function
if we restrict our attention to nonempty compact subsets of X (actually, it is also a metric on
closed and bounded subsets). We let K(X) denote the set of all nonempty compact subsets
of X. The remarkable fact is that D is a distance on K(X) and that if X is complete or
compact, then so is K(X). The following theorem is taken from Edgar [33].
Theorem 26.39. If (X, d) is a metric space, then the Hausdorff distance, D, on the set,
K(X), of nonempty compact subsets of X is a distance. If (X, d) is complete, then (K(X), D)
is complete and if (X, d) is compact, then (K(X), D) is compact.
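Proof. Since (nonempty) compact sets are bounded, $D(A, B)$ is well defined. Clearly, $D$ is symmetric. Assume that $D(A, B) = 0$. Then, for every $\epsilon > 0$, $A \subseteq V_\epsilon(B)$, which means that for every $a \in A$, there is some $b \in B$ such that $d(a, b) \leq \epsilon$, and thus, that $A \subseteq \overline{B}$. Since $B$ is closed, $\overline{B} = B$, and we have $A \subseteq B$. Similarly, $B \subseteq A$, and thus, $A = B$. Clearly, if $A = B$, we have $D(A, B) = 0$. It remains to prove the triangle inequality. If $B \subseteq V_{\epsilon_1}(A)$ and $C \subseteq V_{\epsilon_2}(B)$, then
$$V_{\epsilon_2}(B) \subseteq V_{\epsilon_2}(V_{\epsilon_1}(A)),$$
and since
$$V_{\epsilon_2}(V_{\epsilon_1}(A)) \subseteq V_{\epsilon_1 + \epsilon_2}(A),$$
we get
$$C \subseteq V_{\epsilon_2}(B) \subseteq V_{\epsilon_1 + \epsilon_2}(A).$$
Similarly, we can prove that
$$A \subseteq V_{\epsilon_1 + \epsilon_2}(C),$$
and thus, the triangle inequality follows.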
Next, we need to prove that if $(X, d)$ is complete, then $(K(X), D)$ is also complete. First, we show that if $(A_n)$ is a sequence of nonempty compact sets converging to a nonempty compact set $A$ in the Hausdorff metric, then
$$A = \{x \in X \mid \text{there is a sequence, } (x_n), \text{ with } x_n \in A_n \text{ converging to } x\}.$$
Indeed, if $(x_n)$ is a sequence with $x_n \in A_n$ converging to $x$ and $(A_n)$ converges to $A$ then, for every $\epsilon > 0$, there is some $x_n$ such that $d(x_n, x) \leq \epsilon/2$ and there is some $a_n \in A$ such that $d(a_n, x_n) \leq \epsilon/2$ and thus, $d(a_n, x) \leq \epsilon$, which shows that $x \in \overline{A}$. Since $A$ is compact, it is closed, and $x \in A$. Conversely, since $(A_n)$ converges to $A$, for every $x \in A$, for every $n \geq 1$, there is some $x_n \in A_n$ such that $d(x_n, x) \leq 1/n$ and the sequence $(x_n)$ converges to $x$.
Now, let $(A_n)$ be a Cauchy sequence in $K(X)$. It can be proven that $(A_n)$ converges to the set
$$A = \{x \in X \mid \text{there is a sequence, } (x_n), \text{ with } x_n \in A_n \text{ converging to } x\},$$
and that A is nonempty and compact. To prove that A is compact, one proves that it is
totally bounded and complete. Details are given in Edgar [33].
Finally, we need to prove that if (X, d) is compact, then (K(X), D) is compact. Since we
already know that (K(X), D) is complete if (X, d) is, it is enough to prove that (K(X), D)
is totally bounded if (X, d) is, which is not hard.
In view of Theorem 26.39 and Proposition 26.38, it is possible to define some nonempty
compact subsets of X in terms of fixed points of contraction maps. This can be done in
terms of iterated function systems, yielding a large class of fractals. However, we will omit
this topic and instead refer the reader to Edgar [33].
Finally, returning to second-countable spaces, we give another characterization of accumulation points.
Proposition 26.40. Given a second-countable topological Hausdorff space, $E$, a point, $l$, is an accumulation point of the sequence, $(x_n)$, iff $l$ is the limit of some subsequence, $(x_{n_k})$, of $(x_n)$.
Proof. Clearly, if $l$ is the limit of some subsequence $(x_{n_k})$ of $(x_n)$, it is an accumulation point of $(x_n)$.
Conversely, let $(U_k)_{k \geq 1}$ be the sequence of open sets containing $l$, where each $U_k$ belongs to a countable basis of $E$, and let $V_k = U_1 \cap \cdots \cap U_k$. For every $k \geq 1$, we can find some $n_k > n_{k-1}$ such that $x_{n_k} \in V_k$, since $l$ is an accumulation point of $(x_n)$. Now, since every open set containing $l$ contains some $U_{k_0}$ and since $x_{n_k} \in V_k \subseteq U_{k_0}$ for all $k \geq k_0$, the sequence $(x_{n_k})$ has limit $l$.
26.6
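If $E$ and $F$ are normed vector spaces, we first characterize when a linear map $f: E \to F$ is continuous.
Proposition 26.41. Given two normed vector spaces $E$ and $F$, for any linear map $f: E \to F$, the following conditions are equivalent:
(1) The function $f$ is continuous at 0.
(2) There is a constant $k \geq 0$ such that,
$\|f(u)\| \leq k$, for every $u \in E$ such that $\|u\| \leq 1$.
(3) There is a constant $k \geq 0$ such that,
$\|f(u)\| \leq k\|u\|$, for every $u \in E$.
(4) The function $f$ is continuous at every point of $E$.
Proof. Assume (1). Then, for every $\epsilon > 0$, there is some $\eta > 0$ such that, for every $u \in E$, if $\|u\| \leq \eta$, then $\|f(u)\| \leq \epsilon$. Pick $\epsilon = 1$, so that there is some $\eta > 0$ such that, if $\|u\| \leq \eta$, then $\|f(u)\| \leq 1$. If $\|u\| \leq 1$, then $\|\eta u\| \leq \eta\|u\| \leq \eta$, and so, $\|f(\eta u)\| \leq 1$, that is, $\eta\|f(u)\| \leq 1$, which implies $\|f(u)\| \leq \eta^{-1}$. Thus, (2) holds with $k = \eta^{-1}$.
Assume that (2) holds. If $u = 0$, then by linearity, $f(0) = 0$, and thus $\|f(0)\| \leq k\|0\|$ holds trivially for all $k \geq 0$. If $u \neq 0$, then $\|u\| > 0$, and since
$$\left\|\frac{u}{\|u\|}\right\| = 1,$$
we have
$$\left\|f\left(\frac{u}{\|u\|}\right)\right\| \leq k,$$
which implies that
$$\|f(u)\| \leq k\|u\|.$$
Thus, (3) holds.
Assume that (3) holds. Then, for all $u, v \in E$, by linearity, $\|f(v) - f(u)\| = \|f(v - u)\| \leq k\|v - u\|$. If $k = 0$, then $f$ is the zero function and continuity is obvious. Otherwise, for every $\epsilon > 0$, if $\|v - u\| \leq \epsilon/k$, then $\|f(v) - f(u)\| \leq \epsilon$, which shows continuity at every $u \in E$. Finally, it is obvious that (4) implies (1).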
Among other things, Proposition 26.41 shows that a linear map is continuous iff the image of the unit (closed) ball is bounded. If $E$ and $F$ are normed vector spaces, the set of all continuous linear maps $f: E \to F$ is denoted by $\mathcal{L}(E; F)$.
Using Proposition 26.41, we can define a norm on $\mathcal{L}(E; F)$ which makes it into a normed vector space. This definition has already been given in Chapter 7 (Definition 7.7) but for the reader's convenience, we repeat it here.
Definition 26.28. Given two normed vector spaces $E$ and $F$, for every continuous linear map $f: E \to F$, we define the norm $\|f\|$ of $f$ as
$$\|f\| = \min\{k \geq 0 \mid \|f(x)\| \leq k\|x\|, \text{ for all } x \in E\} = \sup\{\|f(x)\| \mid \|x\| \leq 1\}.$$
From Definition 26.28, for every continuous linear map $f \in \mathcal{L}(E; F)$, we have
$$\|f(x)\| \leq \|f\|\|x\|,$$
for every $x \in E$. It is easy to verify that $\mathcal{L}(E; F)$ is a normed vector space under the norm of Definition 26.28. Furthermore, if $E, F, G$ are normed vector spaces, and $f: E \to F$ and $g: F \to G$ are continuous linear maps, we have
$$\|g \circ f\| \leq \|g\|\|f\|.$$
We can now show that when $E = \mathbb{R}^n$ or $E = \mathbb{C}^n$, with any of the norms $\|\ \|_1$, $\|\ \|_2$, or $\|\ \|_\infty$, then every linear map $f: E \to F$ is continuous.
Proposition 26.42. If $E = \mathbb{R}^n$ or $E = \mathbb{C}^n$, with any of the norms $\|\ \|_1$, $\|\ \|_2$, or $\|\ \|_\infty$, and $F$ is any normed vector space, then every linear map $f: E \to F$ is continuous.
Proof. Let $(e_1, \ldots, e_n)$ be the standard basis of $\mathbb{R}^n$ (a similar proof applies to $\mathbb{C}^n$). In view of Proposition 7.2, it is enough to prove the proposition for the norm
$$\|x\|_\infty = \max\{|x_i| \mid 1 \leq i \leq n\}.$$
We have,
$$\|f(v) - f(u)\| = \|f(v - u)\| = \Big\|f\Big(\sum_{1 \leq i \leq n} (v_i - u_i) e_i\Big)\Big\| = \Big\|\sum_{1 \leq i \leq n} (v_i - u_i) f(e_i)\Big\|,$$
and so,
$$\|f(v) - f(u)\| \leq \Big(\sum_{1 \leq i \leq n} \|f(e_i)\|\Big) \max_{1 \leq i \leq n} |v_i - u_i| = \Big(\sum_{1 \leq i \leq n} \|f(e_i)\|\Big) \|v - u\|_\infty.$$
By the argument used in Proposition 26.41 to prove that (3) implies (4), $f$ is continuous.
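The inequality in this proof can be checked numerically. The Python sketch below is an illustration under assumptions of our own: the linear map is given by a small real matrix (so $f(e_i)$ is the $i$-th column), the target space carries the Euclidean norm, and the helper names are hypothetical. It computes the constant $\sum_i \|f(e_i)\|$ and compares both sides of the bound on random vectors.

```python
import math
import random

def apply_linear(M, v):
    """Apply the linear map with matrix M (list of rows) to the vector v."""
    return [sum(row[c] * v[c] for c in range(len(v))) for row in M]

def euclidean_norm(w):
    return math.sqrt(sum(t * t for t in w))

def sup_norm(v):
    return max(abs(t) for t in v)

def column_sum_constant(M):
    """The constant sum_i ||f(e_i)||, where f(e_i) is the i-th column of M."""
    n = len(M[0])
    return sum(euclidean_norm([row[i] for row in M]) for i in range(n))

def bound_holds(M, u, v, slack=1e-9):
    """Check ||f(v) - f(u)|| <= (sum_i ||f(e_i)||) * ||v - u||_inf."""
    fu, fv = apply_linear(M, u), apply_linear(M, v)
    lhs = euclidean_norm([a - b for a, b in zip(fv, fu)])
    rhs = column_sum_constant(M) * sup_norm([a - b for a, b in zip(v, u)])
    return lhs <= rhs + slack

if __name__ == "__main__":
    random.seed(0)
    M = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
    u = [random.uniform(-1, 1) for _ in range(3)]
    v = [random.uniform(-1, 1) for _ in range(3)]
    print(column_sum_constant(M), bound_holds(M, u, v))
```

The constant obtained this way is generally not the operator norm of $f$; it is only an upper bound that is easy to compute from the images of the basis vectors.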
Actually, we proved in Theorem 7.3 that if $E$ is a vector space of finite dimension, then any two norms are equivalent, so that they define the same topology. This fact together with Proposition 26.42 proves the following:
Theorem 26.43. If $E$ is a vector space of finite dimension (over $\mathbb{R}$ or $\mathbb{C}$), then all norms are equivalent (define the same topology). Furthermore, for any normed vector space $F$, every linear map $f: E \to F$ is continuous.
If $E$ is a normed vector space of infinite dimension, a linear map $f: E \to F$ may not be continuous. As an example, let $E$ be the infinite-dimensional vector space of all polynomials over $\mathbb{R}$. Let
$$\|P(X)\| = \max_{0 \leq x \leq 1} |P(x)|.$$
We leave as an exercise to show that this is indeed a norm. Let $F = \mathbb{R}$, and let $f: E \to F$ be the map defined such that, $f(P(X)) = P(3)$. It is clear that $f$ is linear. Consider the sequence of polynomials
$$P_n(X) = \left(\frac{X}{2}\right)^n.$$
It is clear that $\|P_n\| = \left(\frac{1}{2}\right)^n$, and thus, the sequence $P_n$ has the null polynomial as a limit. However, we have
$$f(P_n(X)) = P_n(3) = \left(\frac{3}{2}\right)^n,$$
and the sequence $f(P_n(X))$ diverges to $+\infty$. Consequently, in view of Proposition 26.12 (1), $f$ is not continuous.
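A few lines of Python make the two competing behaviors visible. This is only a numerical illustration; the helper names and the sampling grid used to approximate the maximum over $[0, 1]$ are assumptions of ours (for $P_n$ the maximum is attained at $x = 1$, which the grid contains).

```python
def P(n):
    """The polynomial P_n(X) = (X / 2)**n, returned as a Python function."""
    return lambda x: (x / 2.0) ** n

def sup_norm_on_unit_interval(poly, samples=1001):
    """Approximate max over 0 <= x <= 1 of |poly(x)| on a uniform grid."""
    return max(abs(poly(i / (samples - 1))) for i in range(samples))

def evaluate_at_three(poly):
    """The linear map f(P) = P(3)."""
    return poly(3.0)

if __name__ == "__main__":
    for n in (1, 5, 10, 20, 40):
        p = P(n)
        # The norm shrinks like (1/2)**n while f(P_n) grows like (3/2)**n.
        print(n, sup_norm_on_unit_interval(p), evaluate_at_three(p))
```

So $P_n$ tends to 0 in the norm while $f(P_n)$ is unbounded, which is exactly the failure of condition (2) of Proposition 26.41.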
We now consider the continuity of multilinear maps. We treat explicitly bilinear maps,
the general case being a straightforward extension.
Proposition 26.44. Given normed vector spaces $E$, $F$ and $G$, for any bilinear map $f: E \times F \to G$, the following conditions are equivalent:
(1) The function $f$ is continuous at $\langle 0, 0 \rangle$.
(2) There is a constant $k \geq 0$ such that,
$\|f(u, v)\| \leq k$, for all $u \in E$, $v \in F$ such that $\|u\|, \|v\| \leq 1$.
(3) There is a constant $k \geq 0$ such that,
$\|f(u, v)\| \leq k\|u\|\|v\|$, for all $u \in E$, $v \in F$.
(4) The function $f$ is continuous at every point of $E \times F$.
Proof. It is similar to that of Proposition 26.41, with a small subtlety in proving that (3) implies (4), namely that two different $\eta$'s that are not independent are needed.
If $E$, $F$, and $G$ are normed vector spaces, we denote the set of all continuous bilinear maps $f: E \times F \to G$ by $\mathcal{L}_2(E, F; G)$. Using Proposition 26.44, we can define a norm on $\mathcal{L}_2(E, F; G)$ which makes it into a normed vector space.
Definition 26.29. Given normed vector spaces $E$, $F$, and $G$, for every continuous bilinear map $f: E \times F \to G$, we define the norm $\|f\|$ of $f$ as
$$\|f\| = \min\{k \geq 0 \mid \|f(x, y)\| \leq k\|x\|\|y\|, \text{ for all } x \in E,\ y \in F\}$$
$$= \sup\{\|f(x, y)\| \mid \|x\|, \|y\| \leq 1\}.$$
From Definition 26.29, for every continuous bilinear map $f \in \mathcal{L}_2(E, F; G)$, we have
$$\|f(x, y)\| \leq \|f\|\|x\|\|y\|,$$
for all $x \in E$, $y \in F$. It is easy to verify that $\mathcal{L}_2(E, F; G)$ is a normed vector space under the norm of Definition 26.29.
Given a bilinear map $f: E \times F \to G$, for every $u \in E$, we obtain a linear map denoted $f_u: F \to G$, defined such that, $f_u(v) = f(u, v)$. Furthermore, since
$$\|f(x, y)\| \leq \|f\|\|x\|\|y\|,$$
it is clear that $f_u$ is continuous. We can then consider the map $\varphi: E \to \mathcal{L}(F; G)$, defined such that, $\varphi(u) = f_u$, for any $u \in E$, or equivalently, such that,
$$\varphi(u)(v) = f(u, v).$$
Actually, it is easy to show that $\varphi$ is linear and continuous, and that $\|\varphi\| = \|f\|$. Thus, $f \mapsto \varphi$ defines a map from $\mathcal{L}_2(E, F; G)$ to $\mathcal{L}(E; \mathcal{L}(F; G))$. We can also go back from $\mathcal{L}(E; \mathcal{L}(F; G))$ to $\mathcal{L}_2(E, F; G)$. We summarize all this in the following proposition.
Proposition 26.45. Let $E, F, G$ be three normed vector spaces. The map $f \mapsto \varphi$, from $\mathcal{L}_2(E, F; G)$ to $\mathcal{L}(E; \mathcal{L}(F; G))$, defined such that, for every $f \in \mathcal{L}_2(E, F; G)$,
$$\varphi(u)(v) = f(u, v),$$
is an isomorphism of vector spaces, and furthermore, $\|\varphi\| = \|f\|$.
As a corollary of Proposition 26.45, we get the following proposition which will be useful when we define second-order derivatives.
Proposition 26.46. Let $E, F$ be normed vector spaces. The map app from $\mathcal{L}(E; F) \times E$ to $F$, defined such that, for every $f \in \mathcal{L}(E; F)$, for every $u \in E$,
$$\mathrm{app}(f, u) = f(u),$$
is a continuous bilinear map.
Remark: If $E$ and $F$ are nontrivial, it can be shown that $\|\mathrm{app}\| = 1$. It can also be shown that composition
$$\circ : \mathcal{L}(E; F) \times \mathcal{L}(F; G) \to \mathcal{L}(E; G),$$
is bilinear and continuous.
The above propositions and definition generalize to arbitrary $n$-multilinear maps, with $n \geq 2$. Proposition 26.44 extends in the obvious way to any $n$-multilinear map $f: E_1 \times \cdots \times E_n \to F$, but condition (3) becomes:
There is a constant $k \geq 0$ such that,
$\|f(u_1, \ldots, u_n)\| \leq k\|u_1\| \cdots \|u_n\|$, for all $u_1 \in E_1, \ldots, u_n \in E_n$.
Definition 26.29 also extends easily to
$$\|f\| = \min\{k \geq 0 \mid \|f(x_1, \ldots, x_n)\| \leq k\|x_1\| \cdots \|x_n\|, \text{ for all } x_i \in E_i,\ 1 \leq i \leq n\}$$
$$= \sup\{\|f(x_1, \ldots, x_n)\| \mid \|x_1\|, \ldots, \|x_n\| \leq 1\}.$$
Proposition 26.45 is also easily extended, and we get an isomorphism between continuous $n$-multilinear maps in $\mathcal{L}_n(E_1, \ldots, E_n; F)$, and continuous linear maps in
$$\mathcal{L}(E_1; \mathcal{L}(E_2; \ldots ; \mathcal{L}(E_n; F))).$$
An obvious extension of Proposition 26.46 also holds.
Definition 26.30. A normed vector space $(E, \|\ \|)$ over $\mathbb{R}$ (or $\mathbb{C}$) which is a complete metric space for the distance $\|v - u\|$, is called a Banach space.
It can be shown that every normed vector space of finite dimension is a Banach space
(is complete). It can also be shown that if E and F are normed vector spaces, and F is a
Banach space, then L(E; F ) is a Banach space. If E, F and G are normed vector spaces,
and G is a Banach space, then L2 (E, F ; G) is a Banach space.
Finally, we consider normed affine spaces.
26.7
For geometric applications, we will need to consider affine spaces $(E, \overrightarrow{E})$ where the associated
Definition 26.31. Given an affine space $(E, \overrightarrow{E})$, where the space of translations $\overrightarrow{E}$ is a vector space over $\mathbb{R}$ or $\mathbb{C}$, we say that $(E, \overrightarrow{E})$ is a normed affine space if $\overrightarrow{E}$ is a normed vector space with norm $\|\ \|$.
Given a normed affine space, there is a natural metric on $E$ itself, defined such that
$$d(a, b) = \|\overrightarrow{ab}\|.$$
Observe that this metric is invariant under translation, that is,
$$d(a + u, b + u) = d(a, b).$$
Also, for every fixed $a \in E$ and $\lambda > 0$, if we consider the map $h: E \to E$, defined such that,
$$h(x) = a + \lambda\,\overrightarrow{ax},$$
Note that the map $(a, b) \mapsto \overrightarrow{ab}$ from $E \times E$ to $\overrightarrow{E}$ is continuous, and similarly for the map
26.8
Further Readings
A thorough treatment of general topology can be found in Munkres [84, 85], Dixmier [29],
Lang [70], Schwartz [93, 92], Bredon [18], and the classic, Seifert and Threlfall [95].
Chapter 27
A Detour On Fractals
27.1
A pleasant application of the Hausdorff distance and of the fixed point theorem for contracting mappings is a method for defining a class of self-similar fractals. For this, we can use
iterated function systems.
Definition 27.1. Given a metric space, $(X, d)$, an iterated function system, for short, an ifs, is a finite sequence of functions, $(f_1, \ldots, f_n)$, where each $f_i: X \to X$ is a contracting mapping. A nonempty compact subset, $K$, of $X$ is an invariant set (or attractor) for the ifs, $(f_1, \ldots, f_n)$, if
$$K = f_1(K) \cup \cdots \cup f_n(K).$$
The major result about ifs's is the following:
Theorem 27.1. If $(X, d)$ is a nonempty complete metric space, then every iterated function system, $(f_1, \ldots, f_n)$, has a unique invariant set, $A$, which is a nonempty compact subset of $X$. Furthermore, for every nonempty compact subset, $A_0$, of $X$, this invariant set, $A$, is the limit of the sequence, $(A_m)$, where $A_{m+1} = f_1(A_m) \cup \cdots \cup f_n(A_m)$.
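Proof. Since $X$ is complete, by Theorem 26.39, the space $(K(X), D)$ is a complete metric space. The theorem will follow from Proposition 26.38 if we can show that the map, $F: K(X) \to K(X)$, defined such that
$$F(K) = f_1(K) \cup \cdots \cup f_n(K),$$
for every nonempty compact set, $K$, is a contracting mapping. Let $A, B$ be any two nonempty compact subsets of $X$ and consider any $\eta \geq D(A, B)$. Since each $f_i: X \to X$ is a contracting mapping, there is some $\lambda_i$, with $0 \leq \lambda_i < 1$, such that
$$d(f_i(a), f_i(b)) \leq \lambda_i\, d(a, b),$$
for all $a, b \in X$. Let $\lambda = \max\{\lambda_1, \ldots, \lambda_n\}$. Since $A \subseteq V_\eta(B)$, for every $a \in A$ there is some $b \in B$ such that $d(a, b) \leq \eta$, and thus $d(f_i(a), f_i(b)) \leq \lambda_i \eta \leq \lambda\eta$ for every $i$. This shows that
$$F(A) \subseteq V_{\lambda\eta}(F(B)).$$
A similar argument shows that
$$F(B) \subseteq V_{\lambda\eta}(F(A)),$$
and since this holds for all $\eta \geq D(A, B)$, we proved that
$$D(F(A), F(B)) \leq \lambda D(A, B)$$
where $\lambda = \max\{\lambda_1, \ldots, \lambda_n\}$. Since $0 \leq \lambda_i < 1$, we have $0 \leq \lambda < 1$ and $F$ is indeed a contracting mapping.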
Theorem 27.1 justifies the existence of many familiar self-similar fractals. One of the
best known fractals is the Sierpinski gasket.
Example 27.1. Consider an equilateral triangle with vertices $a, b, c$, and let $f_1, f_2, f_3$ be the dilations of centers $a, b, c$ and ratio 1/2. The Sierpinski gasket is the invariant set of the ifs $(f_1, f_2, f_3)$. The dilations $f_1, f_2, f_3$ can be defined explicitly as follows, assuming that $a = (-1/2, 0)$, $b = (1/2, 0)$, and $c = (0, \sqrt{3}/2)$. The contractions $f_1, f_2, f_3$ are specified by
$$x' = \frac{1}{2}x - \frac{1}{4}, \qquad y' = \frac{1}{2}y,$$
$$x' = \frac{1}{2}x + \frac{1}{4}, \qquad y' = \frac{1}{2}y,$$
and
$$x' = \frac{1}{2}x, \qquad y' = \frac{1}{2}y + \frac{\sqrt{3}}{4}.$$
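The iteration $A_{m+1} = f_1(A_m) \cup f_2(A_m) \cup f_3(A_m)$ of Theorem 27.1 can be sketched in Python for the three maps of Example 27.1. This is a minimal illustration, not a rendering program: the starting set $A_0 = \{(0, 0)\}$, the function names, and the use of finite point sets are our own assumptions. The sketch also measures the Hausdorff distance between successive iterates, which by the proof of Theorem 27.1 should shrink by at least the factor $\lambda = 1/2$ at each step.

```python
import math

SQRT3 = math.sqrt(3.0)

# The three dilations of ratio 1/2 centered at a, b, c from Example 27.1.
def f1(p):
    return (0.5 * p[0] - 0.25, 0.5 * p[1])

def f2(p):
    return (0.5 * p[0] + 0.25, 0.5 * p[1])

def f3(p):
    return (0.5 * p[0], 0.5 * p[1] + SQRT3 / 4.0)

def ifs_step(points):
    """One application of F(K) = f1(K) U f2(K) U f3(K) to a finite set."""
    return {f(p) for p in points for f in (f1, f2, f3)}

def hausdorff_distance(A, B):
    """Hausdorff distance between two nonempty finite point sets."""
    def directed(P, Q):
        return max(min(math.dist(p, q) for q in Q) for p in P)
    return max(directed(A, B), directed(B, A))

def iterate(A0, m):
    """Return the list [A_0, A_1, ..., A_m]."""
    sets = [set(A0)]
    for _ in range(m):
        sets.append(ifs_step(sets[-1]))
    return sets

if __name__ == "__main__":
    sets = iterate({(0.0, 0.0)}, 6)
    for i in range(len(sets) - 1):
        print(i, len(sets[i + 1]), hausdorff_distance(sets[i], sets[i + 1]))
```

Plotting the points of $A_m$ for a moderately large $m$ (with any plotting tool) gives an approximation of the gasket; the choice of $A_0$ does not matter for the limit.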
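$$x' = -\frac{1}{4}x - \frac{\sqrt{3}}{4}y + \frac{3}{4}, \qquad y' = \frac{\sqrt{3}}{4}x - \frac{1}{4}y + \frac{\sqrt{3}}{4},$$
$$x' = -\frac{1}{4}x + \frac{\sqrt{3}}{4}y - \frac{3}{4}, \qquad y' = -\frac{\sqrt{3}}{4}x - \frac{1}{4}y + \frac{\sqrt{3}}{4},$$
$$x' = \frac{1}{2}x, \qquad y' = \frac{1}{2}y + \frac{\sqrt{3}}{2}.$$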
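$$x' = \frac{1}{2}x - \frac{1}{2}y, \qquad y' = \frac{1}{2}x + \frac{1}{2}y,$$
$$x' = -\frac{1}{2}x - \frac{1}{2}y, \qquad y' = \frac{1}{2}x - \frac{1}{2}y + 1.$$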
It can be shown that for any number of iterations, the polygon does not cross itself. This
means that no edge is traversed twice and that if a point is traversed twice, then this point
is the endpoint of some edge. The result of 13 iterations, starting with the line segment
((0, 0), (0, 1)), is shown in Figure 27.4.
The Heighway dragon turns out to fill a closed and bounded set. It can also be shown
that the plane can be tiled with copies of the Heighway dragon.
Another well known example is the Koch curve.
1
3
1
=
x
y ,
6
6
6
3
1
3
=
x+ y+
,
6
6
6
1
3
1
=
x+
y+ ,
6 6
6
3
1
3
=
x+ y+
,
6
6
6
x0 =
y0
x0
y0
x0
y0
1
2
x+ ,
3
3
1
=
y.
3
x0 =
y0
x0 =
y0
1
1
x+ ,
2
2
1
=
y + 1,
2
x0 =
y0
1
x0 = y + 1,
2
1
1
y0 =
x+ ,
2
2
1
y 1,
2
1
1
= x+ .
2
2
x0 =
y0
Chapter 28
Differential Calculus
28.1
This chapter contains a review of basic notions of differential calculus. First, we review
the definition of the derivative of a function $f \colon \mathbb{R} \to \mathbb{R}$. Next, we define directional derivatives and the total derivative of a function $f \colon E \to F$ between normed affine spaces. Basic
properties of derivatives are shown, including the chain rule. We show how derivatives are
represented by Jacobian matrices. The mean value theorem is stated, as well as the implicit
function theorem and the inverse function theorem. Diffeomorphisms and local diffeomorphisms are defined. Tangent spaces are defined. Higher-order derivatives are defined, as well
as the Hessian. Schwarz's lemma (about the commutativity of partials) is stated. Several
versions of Taylor's formula are stated, and a famous formula due to Faà di Bruno is given.
We first review the notion of the derivative of a real-valued function whose domain is an
open subset of R.
Let $f \colon A \to \mathbb{R}$, where A is a nonempty open subset of $\mathbb{R}$, and consider any $a \in A$.
The main idea behind the concept of the derivative of f at a, denoted by $f'(a)$, is that
locally around a (that is, in some small open set $U \subseteq A$ containing a), the function f is
approximated linearly by the map
$$x \mapsto f(a) + f'(a)(x - a).$$
Part of the difficulty in extending this idea to more complex spaces is to give an adequate
notion of linear approximation. Of course, we will use linear maps! Let us now review the
formal definition of the derivative of a real-valued function.
Definition 28.1. Let A be any nonempty open subset of $\mathbb{R}$, and let $a \in A$. For any function
$f \colon A \to \mathbb{R}$, the derivative of f at $a \in A$ is the limit (if it exists)
$$\lim_{h \to 0,\ h \in U} \frac{f(a + h) - f(a)}{h},$$
where $U = \{h \in \mathbb{R} \mid a + h \in A,\ h \neq 0\}$. This limit is denoted by $f'(a)$, or $Df(a)$, or $\frac{df}{dx}(a)$.
If $f'(a)$ exists for every $a \in A$, we say that f is differentiable on A. In this case, the map
$a \mapsto f'(a)$ is denoted by $f'$, or $Df$, or $\frac{df}{dx}$.
Note that since A is assumed to be open, $A - \{a\}$ is also open, and since the function
$h \mapsto a + h$ is continuous and U is the inverse image of $A - \{a\}$ under this function, U is
indeed open and the definition makes sense.
We can also define $f'(a)$ as follows: there is some function $\epsilon$, such that,
$$f(a + h) = f(a) + f'(a) \cdot h + \epsilon(h) h,$$
whenever $a + h \in A$, where $\epsilon(h)$ is defined for all h such that $a + h \in A$, and
$$\lim_{h \to 0,\ h \in U} \epsilon(h) = 0.$$
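The two equivalent formulations of $f'(a)$ can be checked numerically. The Python sketch below is an illustration only: the choice $f = \exp$, $a = 1$ and the helper names are assumptions. It evaluates the difference quotient and the error term $\epsilon(h)$ for shrinking h.

```python
import math

def difference_quotient(f, a, h):
    """Return (f(a + h) - f(a)) / h for h != 0."""
    return (f(a + h) - f(a)) / h

def epsilon(f, derivative_at_a, a, h):
    """Return eps(h) from f(a + h) = f(a) + f'(a) h + eps(h) h, for h != 0."""
    return difference_quotient(f, a, h) - derivative_at_a

a = 1.0
steps = [10.0 ** -k for k in range(1, 7)]
quotients = [difference_quotient(math.exp, a, h) for h in steps]
# For f = exp, the derivative at a is exp(a), so eps(h) should shrink with h.
errors = [abs(epsilon(math.exp, math.exp(a), a, h)) for h in steps]
```

The quotients approach $e$ and the values of $|\epsilon(h)|$ decrease roughly in proportion to h, as the definition requires.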
Remark: We can also define the notion of derivative of f at a on the left, and derivative
of f at a on the right. For example, we say that the derivative of f at a on the left is the
limit $f'(a_-)$ (if it exists)
$$\lim_{h \to 0,\ h \in U} \frac{f(a + h) - f(a)}{h},$$
where $U = \{h \in \mathbb{R} \mid a + h \in A,\ h < 0\}$.
the vector space associated with E is denoted by $\overrightarrow{E}$, and that the vector space associated
with F is denoted as $\overrightarrow{F}$.
Since F is a normed affine space, making sense of $f(a + h) - f(a)$ is easy: we can define this
as $\overrightarrow{f(a)f(a + h)}$, the unique vector translating $f(a)$ to $f(a + h)$. We should note however,
that this quantity is a vector and not a point. Nevertheless, in defining derivatives, it is
this chapter, the vector $\overrightarrow{ab}$ will be denoted by $b - a$. But now, how do we define the quotient
by a vector? Well, we don't!
A first possibility is to consider the directional derivative with respect to a vector $u \neq 0$
in $\overrightarrow{E}$. We can consider the vector $f(a + tu) - f(a)$, where $t \in \mathbb{R}$ (or $t \in \mathbb{C}$). Now,
$$\frac{f(a + tu) - f(a)}{t}$$
makes sense. The idea is that in E, the points of the form $a + tu$ for t in some small interval
$[-\epsilon, +\epsilon]$ in $\mathbb{R}$ (or $\mathbb{C}$) form a line segment $[r, s]$ in A containing a, and that the image of
this line segment defines a small curve segment on $f(A)$. This curve segment is defined by
the map $t \mapsto f(a + tu)$, from $[r, s]$ to F, and the directional derivative $D_u f(a)$ defines the
direction of the tangent line at a to this curve. This leads us to the following definition.
Definition 28.2. Let E and F be two normed affine spaces, let A be a nonempty open
subset of E, and let $f \colon A \to F$ be any function. For any $a \in A$, for any $u \neq 0$ in $\overrightarrow{E}$, the
directional derivative of f at a w.r.t. the vector u, denoted by $D_u f(a)$, is the limit (if it
exists)
$$\lim_{t \to 0,\ t \in U} \frac{f(a + tu) - f(a)}{t},$$
where $U = \{t \in \mathbb{R} \mid a + tu \in A,\ t \neq 0\}$ (or $U = \{t \in \mathbb{C} \mid a + tu \in A,\ t \neq 0\}$).
Since the map $t \mapsto a + tu$ is continuous, and since $A - \{a\}$ is open, the inverse image U
of $A - \{a\}$ under the above map is open, and the definition of the limit in Definition 28.2
makes sense.
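As a concrete illustration of Definition 28.2, the quotient $(f(a + tu) - f(a))/t$ can be evaluated for small t of either sign. The Python sketch below assumes $E = \mathbb{R}^2$, $F = \mathbb{R}$, and a hypothetical polynomial f; none of these choices come from the text.

```python
def directional_quotient(f, a, u, t):
    """Return (f(a + t u) - f(a)) / t for a point a, a vector u, and t != 0."""
    shifted = [ai + t * ui for ai, ui in zip(a, u)]
    return (f(shifted) - f(a)) / t

def f(p):
    # Hypothetical example function on R^2.
    x, y = p
    return x * x * y + y ** 3

a = [1.0, 2.0]
u = [3.0, -1.0]
# Quotients for t approaching 0 from both sides.
approx = [directional_quotient(f, a, u, t) for t in (1e-2, 1e-4, 1e-6, -1e-6)]
# Value computed by hand from the partial derivatives of this particular f.
exact = 2 * a[0] * a[1] * u[0] + (a[0] ** 2 + 3 * a[1] ** 2) * u[1]
```

For this f the quotients settle near the hand-computed value of $D_u f(a)$, whichever side t approaches 0 from.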
Remark: Since the notion of limit is purely topological, the existence and value of a directional derivative is independent of the choice of norms in E and F , as long as they are
equivalent norms.
The directional derivative is sometimes called the Gâteaux derivative.
In the special case where $E = \mathbb{R}$ and $F = \mathbb{R}$, and we let $u = 1$ (i.e., the real number 1,
viewed as a vector), it is immediately verified that $D_1 f(a) = f'(a)$, in the sense of Definition
28.1. When $E = \mathbb{R}$ (or $E = \mathbb{C}$) and F is any normed vector space, the derivative $D_1 f(a)$,
also denoted by $f'(a)$, provides a suitable generalization of the notion of derivative.
However, when E has dimension $\geq 2$, directional derivatives present a serious problem,
which is that their definition is not sufficiently uniform. Indeed, there is no reason to believe
that the directional derivatives w.r.t. all nonnull vectors u share something in common. As
a consequence, a function can have all directional derivatives at a, and yet not be continuous
at a. Two functions may have all directional derivatives in some open sets, and yet their
composition may not. Thus, we introduce a more uniform notion.
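Definition 28.3. Let E and F be two normed affine spaces, let A be a nonempty open subset
of E, and let $f \colon A \to F$ be any function. For any $a \in A$, we say that f is differentiable at
a if there is a continuous linear map $L \colon \overrightarrow{E} \to \overrightarrow{F}$ and a function $\epsilon$, such that
$$f(a + h) = f(a) + L(h) + \epsilon(h)\|h\|$$
for every $a + h \in A$, where $\epsilon(h)$ is defined for every h such that $a + h \in A$, and
$$\lim_{h \to 0,\ h \in U} \epsilon(h) = 0,$$
where $U = \{h \in \overrightarrow{E} \mid a + h \in A,\ h \neq 0\}$.
Since the map $h \mapsto a + h$ from $\overrightarrow{E}$ to E is continuous, and since $A - \{a\}$ is open in E, the
inverse image U of $A - \{a\}$ under the above map is open in $\overrightarrow{E}$, and it makes sense to say
that
$$\lim_{h \to 0,\ h \in U} \epsilon(h) = 0.$$
Note that for every $h \in U$, since $h \neq 0$, $\epsilon(h)$ is uniquely determined, since
$$\epsilon(h) = \frac{f(a + h) - f(a) - L(h)}{\|h\|},$$
and that the value $\epsilon(0)$ plays absolutely no role in this definition. The condition for f to be
differentiable at a amounts to the fact that
$$\lim_{h \to 0} \frac{\|f(a + h) - f(a) - L(h)\|}{\|h\|} = 0$$
as $h \neq 0$ approaches 0, when $a + h \in A$. However, it does no harm to assume that $\epsilon(0) = 0$,
and we will assume this from now on.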
$$\frac{f(a + tu) - f(a)}{t} = L(u) + \frac{|t|}{t}\,\epsilon(tu)\|u\|,$$
and the limit when $t \neq 0$ approaches 0 is indeed $D_u f(a)$.
The uniqueness of L follows from Proposition 28.1. Also, when E is of finite dimension, it
is easily shown that every linear map is continuous, and this assumption is then redundant.
It is important to note that the derivative $Df(a)$ of f at a is a continuous linear map
from the vector space $\overrightarrow{E}$ to the vector space $\overrightarrow{F}$, and not a function from the affine space E
to the affine space F.
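As an example, consider the map $f \colon M_n(\mathbb{R}) \to M_n(\mathbb{R})$, given by
$$f(A) = A^\top A - I,$$
where $M_n(\mathbb{R})$ is equipped with any matrix norm, since they are all equivalent; for example,
pick the Frobenius norm, $\|A\|_F = \sqrt{\mathrm{tr}(A^\top A)}$. We claim that
$$Df(A)(H) = A^\top H + H^\top A.$$
We have
$$f(A + H) - f(A) - (A^\top H + H^\top A) = (A + H)^\top (A + H) - I - (A^\top A - I) - A^\top H - H^\top A = H^\top H.$$
It follows that
$$\epsilon(H) = \frac{f(A + H) - f(A) - (A^\top H + H^\top A)}{\|H\|} = \frac{H^\top H}{\|H\|},$$
and since the Frobenius norm satisfies $\|H^\top H\| \leq \|H^\top\| \|H\| = \|H\|^2$, we get $\|\epsilon(H)\| \leq \|H\|$, so
$$\lim_{H \to 0} \epsilon(H) = 0.$$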
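The claim $Df(A)(H) = A^\top H + H^\top A$ for $f(A) = A^\top A - I$ can be sanity-checked numerically. The sketch below uses plain Python lists for matrices; the sample matrices `A` and `H0` and all helper names are assumptions made for this illustration. It measures $\|f(A + H) - f(A) - Df(A)(H)\|_F / \|H\|_F$ as H shrinks.

```python
import math

def transpose(M):
    return [list(row) for row in zip(*M)]

def matmul(P, Q):
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*Q)] for row in P]

def add(P, Q, scale=1.0):
    """Return P + scale * Q, entry by entry."""
    return [[p + scale * q for p, q in zip(rp, rq)] for rp, rq in zip(P, Q)]

def frobenius(M):
    return math.sqrt(sum(x * x for row in M for x in row))

def f(A):
    """f(A) = A^T A - I."""
    n = len(A)
    identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    return add(matmul(transpose(A), A), identity, -1.0)

def Df(A, H):
    """Candidate derivative: Df(A)(H) = A^T H + H^T A."""
    return add(matmul(transpose(A), H), matmul(transpose(H), A))

A = [[1.0, 2.0], [3.0, 4.0]]
H0 = [[0.5, -1.0], [2.0, 0.25]]

def remainder_ratio(s):
    """||f(A + H) - f(A) - Df(A)(H)|| / ||H|| for H = s * H0."""
    H = [[s * x for x in row] for row in H0]
    R = add(add(f(add(A, H)), f(A), -1.0), Df(A, H), -1.0)
    return frobenius(R) / frobenius(H)

ratios = [remainder_ratio(10.0 ** -k) for k in range(1, 5)]
```

Since the remainder is $H^\top H$, each ratio is bounded by $\|H\|_F$, so the ratios shrink linearly with the scale of H.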
$$Df \colon A \to \mathcal{L}(\overrightarrow{E}; \overrightarrow{F}),$$
called the derivative of f on A, and also denoted by df. Recall that $\mathcal{L}(\overrightarrow{E}; \overrightarrow{F})$ denotes the
is a basis of $\overrightarrow{E}$, we can define the directional derivatives with respect to the vectors in the
basis $(u_1, \ldots, u_n)$ (actually, we can also do it for an infinite frame). This way, we obtain the
definition of partial derivatives, as follows.
Definition 28.4. For any two normed affine spaces E and F, if E is of finite dimension
n, for every frame $(a_0, (u_1, \ldots, u_n))$ for E, for every $a \in E$, for every function $f \colon E \to F$,
the directional derivatives $D_{u_j} f(a)$ (if they exist) are called the partial derivatives of f with
respect to the frame $(a_0, (u_1, \ldots, u_n))$. The partial derivative $D_{u_j} f(a)$ is also denoted by
$\partial_j f(a)$, or $\frac{\partial f}{\partial x_j}(a)$.
The notation $\frac{\partial f}{\partial x_j}(a)$ for a partial derivative, although customary and going back to
Leibniz, is a logical obscenity. Indeed, the variable $x_j$ really has nothing to do with the
formal definition. This is just another of these situations where tradition is just too hard to
overthrow!
Proof. Straightforward.
We now state the very useful chain rule.
Theorem 28.5. Given three normed affine spaces E, F, and G, let A be an open set in
E, and let B be an open set in F. For any functions $f \colon A \to F$ and $g \colon B \to G$, such that
$f(A) \subseteq B$, for any $a \in A$, if $Df(a)$ exists and $Dg(f(a))$ exists, then $D(g \circ f)(a)$ exists, and
$$D(g \circ f)(a) = Dg(f(a)) \circ Df(a).$$
Proof. It is not difficult, but more involved than the previous two.
Theorem 28.5 has many interesting consequences. We mention two corollaries.
Proposition 28.6. Given three normed affine spaces E, F, and G, for any open subset A in
E, for any $a \in A$, let $f \colon A \to F$ such that $Df(a)$ exists, and let $g \colon F \to G$ be a continuous
affine map. Then, $D(g \circ f)(a)$ exists, and
$$D(g \circ f)(a) = \overrightarrow{g} \circ Df(a),$$
where $\overrightarrow{g}$ is the linear map associated with the affine map g.
Proposition 28.7. Given two normed affine spaces E and F, let A be some open subset in
E, let B be some open subset in F, let $f \colon A \to B$ be a bijection from A to B, and assume
that $Df$ exists on A and that $Df^{-1}$ exists on B. Then, for every $a \in A$,
$$Df^{-1}(f(a)) = (Df(a))^{-1}.$$
Proposition 28.7 has the remarkable consequence that the two vector spaces $\overrightarrow{E}$ and $\overrightarrow{F}$
have the same dimension. In other words, a local property, the existence of a bijection f
between an open set A of E and an open set B of F, such that f is differentiable on A and
$f^{-1}$ is differentiable on B, implies a global property, that the two vector spaces $\overrightarrow{E}$ and $\overrightarrow{F}$
have the same dimension.
We now consider the situation where the normed affine space F is a finite direct sum
$F = (F_1, b_1) \oplus \cdots \oplus (F_m, b_m)$.
Proposition 28.8. Given normed affine spaces E and $F = (F_1, b_1) \oplus \cdots \oplus (F_m, b_m)$, given
any open subset A of E, for any $a \in A$, for any function $f \colon A \to F$, letting $f = (f_1, \ldots, f_m)$,
$Df(a)$ exists iff every $Df_i(a)$ exists, and
$$Df(a) = in_1 \circ Df_1(a) + \cdots + in_m \circ Df_m(a).$$
Proof. Observe that $f(a + h) - f(a) = (f(a + h) - b) - (f(a) - b)$, where $b = (b_1, \ldots, b_m)$,
and thus, as far as dealing with derivatives, $Df(a)$ is equal to $Df_b(a)$, where $f_b \colon E \to \overrightarrow{F}$ is
defined such that $f_b(x) = f(x) - b$, for every $x \in E$. Thus, we can work with the vector space
$\overrightarrow{F}$ instead of the affine space F. The proposition is then a simple application of Theorem
28.5.
In the special case where F is a normed affine space of finite dimension
m, for any frame $(b_0, (v_1, \ldots, v_m))$ of F, where $(v_1, \ldots, v_m)$ is a basis of $\overrightarrow{F}$, for every $a \in E$,
a function $f \colon E \to F$ is differentiable at a iff each $f_i$ is differentiable at a, and
$$Df(a)(u) = Df_1(a)(u) v_1 + \cdots + Df_m(a)(u) v_m,$$
for every $u \in \overrightarrow{E}$.
We now consider the situation where E is a finite direct sum. Given a normed affine
space $E = (E_1, a_1) \oplus \cdots \oplus (E_n, a_n)$ and a normed affine space F, given any open subset A
of E, for any $c = (c_1, \ldots, c_n) \in A$, we define the continuous functions $i^c_j \colon E_j \to E$, such that
$$i^c_j(x) = (c_1, \ldots, c_{j-1}, x, c_{j+1}, \ldots, c_n).$$
For any function $f \colon A \to F$, we have functions $f \circ i^c_j \colon E_j \to F$, defined on $(i^c_j)^{-1}(A)$, which
contains $c_j$. If $D(f \circ i^c_j)(c_j)$ exists, we call it the partial derivative of f w.r.t. its jth argument,
at c, and we denote it by $D_j f(c)$.
This notion is a generalization of the notion defined in Definition 28.4. In fact, when
E is of dimension n, and a frame $(a_0, (u_1, \ldots, u_n))$ has been chosen, we can write $E =
(E_1, a_1) \oplus \cdots \oplus (E_n, a_n)$, for some obvious $(E_j, a_j)$ (as explained just after Proposition 28.8),
and then
$$D_j f(c)(\lambda u_j) = \lambda\, \partial_j f(c),$$
and the two notions are consistent.
The definition of $i^c_j$ and of $D_j f(c)$ also makes sense for a finite product $E_1 \times \cdots \times E_n$ of
affine spaces $E_i$. We will use freely the notation $\partial_j f(c)$ instead of $D_j f(c)$.
The notion $\partial_j f(c)$ introduced in Definition 28.4 is really that of the vector derivative,
whereas $D_j f(c)$ is the corresponding linear map. Although perhaps confusing, we identify
the two notions. The following proposition holds.
$f_a \colon \overrightarrow{E} \to F$ such that, $f_a(u) = f(a + u)$, for every $u \in \overrightarrow{E}$, clearly, $Df(c) = Df_a(c - a)$, and
thus, we can work with the function $f_a$ whose domain is the vector space $\overrightarrow{E}$. The proposition
is then a simple application of Theorem 28.5.
28.2 Jacobian Matrices
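If both E and F are of finite dimension, for any frame $(a_0, (u_1, \ldots, u_n))$ of E and any frame
$(b_0, (v_1, \ldots, v_m))$ of F, every function $f \colon E \to F$ is determined by m functions $f_i \colon E \to \mathbb{R}$
(or $f_i \colon E \to \mathbb{C}$), where
$$f(x) = b_0 + f_1(x) v_1 + \cdots + f_m(x) v_m,$$
for every $x \in E$. From Proposition 28.1, we have
$$Df(a)(u_j) = D_{u_j} f(a) = \partial_j f(a),$$
and from Proposition 28.9, we have
$$Df(a)(u_j) = Df_1(a)(u_j) v_1 + \cdots + Df_i(a)(u_j) v_i + \cdots + Df_m(a)(u_j) v_m,$$
that is,
$$Df(a)(u_j) = \partial_j f_1(a) v_1 + \cdots + \partial_j f_i(a) v_i + \cdots + \partial_j f_m(a) v_m.$$
Since the j-th column of the $m \times n$-matrix representing $Df(a)$ w.r.t. the bases $(u_1, \ldots, u_n)$
and $(v_1, \ldots, v_m)$ is equal to the components of the vector $Df(a)(u_j)$ over the basis $(v_1, \ldots, v_m)$,
the linear map $Df(a)$ is determined by the $m \times n$-matrix $J(f)(a) = (\partial_j f_i(a))$, (or $J(f)(a) =
(\frac{\partial f_i}{\partial x_j}(a))$):
$$J(f)(a) = \begin{pmatrix}
\partial_1 f_1(a) & \partial_2 f_1(a) & \ldots & \partial_n f_1(a) \\
\partial_1 f_2(a) & \partial_2 f_2(a) & \ldots & \partial_n f_2(a) \\
\vdots & \vdots & \ddots & \vdots \\
\partial_1 f_m(a) & \partial_2 f_m(a) & \ldots & \partial_n f_m(a)
\end{pmatrix}$$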
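or
$$J(f)(a) = \begin{pmatrix}
\frac{\partial f_1}{\partial x_1}(a) & \frac{\partial f_1}{\partial x_2}(a) & \ldots & \frac{\partial f_1}{\partial x_n}(a) \\
\frac{\partial f_2}{\partial x_1}(a) & \frac{\partial f_2}{\partial x_2}(a) & \ldots & \frac{\partial f_2}{\partial x_n}(a) \\
\vdots & \vdots & \ddots & \vdots \\
\frac{\partial f_m}{\partial x_1}(a) & \frac{\partial f_m}{\partial x_2}(a) & \ldots & \frac{\partial f_m}{\partial x_n}(a)
\end{pmatrix}$$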
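$$J(f)(a) = \begin{pmatrix}
\frac{\partial f_1}{\partial t}(a) \\
\frac{\partial f_2}{\partial t}(a) \\
\frac{\partial f_3}{\partial t}(a)
\end{pmatrix}$$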
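2. When $E = \mathbb{R}^2$, and $F = \mathbb{R}^3$, a function $\varphi \colon \mathbb{R}^2 \to \mathbb{R}^3$ defines a parametric surface.
Letting $\varphi = (f, g, h)$, its Jacobian matrix at $a \in \mathbb{R}^2$ is
$$J(\varphi)(a) = \begin{pmatrix}
\frac{\partial f}{\partial u}(a) & \frac{\partial f}{\partial v}(a) \\
\frac{\partial g}{\partial u}(a) & \frac{\partial g}{\partial v}(a) \\
\frac{\partial h}{\partial u}(a) & \frac{\partial h}{\partial v}(a)
\end{pmatrix}$$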
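3. When $E = \mathbb{R}^3$, and $F = \mathbb{R}$, for a function $f \colon \mathbb{R}^3 \to \mathbb{R}$, the Jacobian matrix at $a \in \mathbb{R}^3$
is
$$J(f)(a) = \begin{pmatrix} \frac{\partial f}{\partial x}(a) & \frac{\partial f}{\partial y}(a) & \frac{\partial f}{\partial z}(a) \end{pmatrix}.$$
More generally, when $f \colon \mathbb{R}^n \to \mathbb{R}$, the Jacobian matrix at $a \in \mathbb{R}^n$ is the row vector
$$J(f)(a) = \begin{pmatrix} \frac{\partial f}{\partial x_1}(a) & \cdots & \frac{\partial f}{\partial x_n}(a) \end{pmatrix}.$$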
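Its transpose is a column vector called the gradient of f at a, denoted by $\mathrm{grad} f(a)$ or $\nabla f(a)$.
Then, given any $v \in \mathbb{R}^n$, note that
$$Df(a)(v) = \frac{\partial f}{\partial x_1}(a)\, v_1 + \cdots + \frac{\partial f}{\partial x_n}(a)\, v_n = \mathrm{grad} f(a) \cdot v.$$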
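The identity $Df(a)(v) = \mathrm{grad} f(a) \cdot v$ can be illustrated numerically. In the Python sketch below, the function f, the point a, and the vector v are hypothetical choices, and the gradient is approximated by central differences rather than computed symbolically.

```python
import math

def gradient(f, a, step=1e-6):
    """Central-difference approximation of the gradient of f: R^n -> R at a."""
    grad = []
    for j in range(len(a)):
        plus = list(a)
        minus = list(a)
        plus[j] += step
        minus[j] -= step
        grad.append((f(plus) - f(minus)) / (2.0 * step))
    return grad

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def f(p):
    # Hypothetical example function on R^3.
    x, y, z = p
    return x * x * y + math.sin(z)

a = [1.0, 2.0, 0.0]
v = [1.0, -1.0, 2.0]
grad = gradient(f, a)                      # close to (4, 1, 1) for this f
via_gradient = dot(grad, v)                # grad f(a) . v
t = 1e-6
via_quotient = (f([ai + t * vi for ai, vi in zip(a, v)]) - f(a)) / t
```

Both `via_gradient` and `via_quotient` approximate the same number, the directional derivative of f at a along v.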
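$$J(h)(a) = \begin{pmatrix}
\partial_1 g_1(b) & \partial_2 g_1(b) & \ldots & \partial_n g_1(b) \\
\partial_1 g_2(b) & \partial_2 g_2(b) & \ldots & \partial_n g_2(b) \\
\vdots & \vdots & \ddots & \vdots \\
\partial_1 g_m(b) & \partial_2 g_m(b) & \ldots & \partial_n g_m(b)
\end{pmatrix}
\begin{pmatrix}
\partial_1 f_1(a) & \partial_2 f_1(a) & \ldots & \partial_p f_1(a) \\
\partial_1 f_2(a) & \partial_2 f_2(a) & \ldots & \partial_p f_2(a) \\
\vdots & \vdots & \ddots & \vdots \\
\partial_1 f_n(a) & \partial_2 f_n(a) & \ldots & \partial_p f_n(a)
\end{pmatrix}$$
or
$$J(h)(a) = \begin{pmatrix}
\frac{\partial g_1}{\partial y_1}(b) & \frac{\partial g_1}{\partial y_2}(b) & \ldots & \frac{\partial g_1}{\partial y_n}(b) \\
\frac{\partial g_2}{\partial y_1}(b) & \frac{\partial g_2}{\partial y_2}(b) & \ldots & \frac{\partial g_2}{\partial y_n}(b) \\
\vdots & \vdots & \ddots & \vdots \\
\frac{\partial g_m}{\partial y_1}(b) & \frac{\partial g_m}{\partial y_2}(b) & \ldots & \frac{\partial g_m}{\partial y_n}(b)
\end{pmatrix}
\begin{pmatrix}
\frac{\partial f_1}{\partial x_1}(a) & \frac{\partial f_1}{\partial x_2}(a) & \ldots & \frac{\partial f_1}{\partial x_p}(a) \\
\frac{\partial f_2}{\partial x_1}(a) & \frac{\partial f_2}{\partial x_2}(a) & \ldots & \frac{\partial f_2}{\partial x_p}(a) \\
\vdots & \vdots & \ddots & \vdots \\
\frac{\partial f_n}{\partial x_1}(a) & \frac{\partial f_n}{\partial x_2}(a) & \ldots & \frac{\partial f_n}{\partial x_p}(a)
\end{pmatrix},$$
where $h = g \circ f$ and $b = f(a)$. Thus,
$$\frac{\partial h_i}{\partial x_j}(a) = \sum_{k=1}^{n} \frac{\partial g_i}{\partial y_k}(b)\, \frac{\partial f_k}{\partial x_j}(a).$$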
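The matrix form of the chain rule, $J(g \circ f)(a) = J(g)(f(a))\, J(f)(a)$, lends itself to a quick numerical check. The sketch below is an illustration under assumptions: the maps f and g are hypothetical, and the Jacobians are approximated by central differences with a helper named `jacobian` that is not part of the text.

```python
import math

def jacobian(func, a, step=1e-6):
    """Central-difference Jacobian of func: R^n -> R^m at a, as a list of m rows."""
    columns = []
    for j in range(len(a)):
        plus = list(a)
        minus = list(a)
        plus[j] += step
        minus[j] -= step
        columns.append([(p - q) / (2.0 * step) for p, q in zip(func(plus), func(minus))])
    return [list(row) for row in zip(*columns)]

def matmul(P, Q):
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*Q)] for row in P]

def f(p):
    # Hypothetical map from R^2 to R^3.
    x, y = p
    return [x * y, x + y, math.sin(x)]

def g(q):
    # Hypothetical map from R^3 to R^2.
    u, v, w = q
    return [u * v, v + w * w]

def composite(p):
    return g(f(p))

a = [0.5, -1.2]
b = f(a)
lhs = jacobian(composite, a)                    # J(g o f)(a), a 2 x 2 matrix
rhs = matmul(jacobian(g, b), jacobian(f, a))    # J(g)(b) J(f)(a)
max_gap = max(abs(l - r) for row_l, row_r in zip(lhs, rhs) for l, r in zip(row_l, row_r))
```

The two matrices agree up to the finite-difference error, entry by entry, as the summation formula predicts.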
Given two normed affine spaces E and F of finite dimension, given an open subset A of
E, if a function $f \colon A \to F$ is differentiable at $a \in A$, then its Jacobian matrix is well defined.
One should be warned that the converse is false. There are functions such that all the
partial derivatives exist at some $a \in A$, but yet, the function is not differentiable at a,
and not even continuous at a. For example, consider the function $f \colon \mathbb{R}^2 \to \mathbb{R}$, defined such
that $f(0, 0) = 0$, and
$$f(x, y) = \frac{x^2 y}{x^4 + y^2} \quad \text{if } (x, y) \neq (0, 0).$$
For any $u \neq 0$, letting $u = \begin{pmatrix} h \\ k \end{pmatrix}$, we have
$$\frac{f(0 + tu) - f(0)}{t} = \frac{h^2 k}{t^2 h^4 + k^2},$$
so that
$$D_u f(0, 0) = \begin{cases} \dfrac{h^2}{k} & \text{if } k \neq 0 \\ 0 & \text{if } k = 0. \end{cases}$$
Thus, $D_u f(0, 0)$ exists for all $u \neq 0$. On the other hand, if $Df(0, 0)$ existed, it would be
a linear map $Df(0, 0) \colon \mathbb{R}^2 \to \mathbb{R}$ represented by a row matrix $(\alpha\ \beta)$, and we would have
$D_u f(0, 0) = Df(0, 0)(u) = \alpha h + \beta k$, but the explicit formula for $D_u f(0, 0)$ is not linear. As
a matter of fact, the function f is not continuous at $(0, 0)$. For example, on the parabola
$y = x^2$, $f(x, y) = \frac{1}{2}$, and when we approach the origin on this parabola, the limit is $\frac{1}{2}$, when
in fact, $f(0, 0) = 0$.
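The behavior of this counterexample is easy to observe numerically. The Python sketch below (the helper name `quotient` is an assumption) evaluates the difference quotients at the origin and the values of f along the parabola $y = x^2$.

```python
def f(x, y):
    """f(x, y) = x^2 y / (x^4 + y^2) away from the origin, and f(0, 0) = 0."""
    if x == 0.0 and y == 0.0:
        return 0.0
    return x * x * y / (x ** 4 + y * y)

def quotient(h, k, t):
    """(f(0 + t u) - f(0)) / t for the direction u = (h, k) and t != 0."""
    return f(t * h, t * k) / t

t = 1e-6
d_10 = quotient(1.0, 0.0, t)   # direction (1, 0): expected limit 0
d_01 = quotient(0.0, 1.0, t)   # direction (0, 1): expected limit 0
d_11 = quotient(1.0, 1.0, t)   # direction (1, 1): expected limit h^2 / k = 1

# Values of f along the parabola y = x^2, approaching the origin.
parabola_values = [f(x, x * x) for x in (1e-1, 1e-2, 1e-3)]
```

Since $D_{(1,1)} f(0,0) \neq D_{(1,0)} f(0,0) + D_{(0,1)} f(0,0)$, the map $u \mapsto D_u f(0,0)$ is not linear, and the parabola values stay near 1/2 instead of tending to $f(0,0) = 0$.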
However, there are sufficient conditions on the partial derivatives for Df (a) to exist,
namely, continuity of the partial derivatives.
$$\|Df(x)\| \leq M,$$
$$\|f(a + h) - f(a)\| \leq M \|h\|.$$
28.3
Given three normed affine spaces E, F, and G, given a function $f \colon E \times F \to G$, given any
$c \in G$, it may happen that the equation
$$f(x, y) = c$$
has the property that, for some open sets $A \subseteq E$, and $B \subseteq F$, there is a function $g \colon A \to B$,
such that
$$f(x, g(x)) = c,$$
for all $x \in A$. Such a situation is usually very rare, but if some solution $(a, b) \in E \times F$
such that $f(a, b) = c$ is known, under certain conditions, for some small open sets $A \subseteq E$
containing a and $B \subseteq F$ containing b, the existence of a unique $g \colon A \to B$, such that
$$f(x, g(x)) = c,$$
for all $x \in A$, can be shown. Under certain conditions, it can also be shown that g is
continuous, and differentiable. The theorem establishing this is known as the implicit function theorem.
We state a version of this result below, following Schwartz [94]. The proof (see
Schwartz [94]) is fairly involved, and uses a fixed-point theorem for contracting mappings in
complete metric spaces. Other proofs can be found in Lang [69] and Cartan [20].
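Theorem 28.13. Let E, F, and G be normed affine spaces, let $\Omega$ be an open subset of
$E \times F$, let $f \colon \Omega \to G$ be a function defined on $\Omega$, let $(a, b) \in \Omega$, let $c \in G$, and assume that
$f(a, b) = c$. If the following assumptions hold
(1) The function $f \colon \Omega \to G$ is continuous on $\Omega$;
(2) F is a complete normed affine space (and so is G);
(3) $\frac{\partial f}{\partial y}(x, y)$ exists for every $(x, y) \in \Omega$, and $\frac{\partial f}{\partial y} \colon \Omega \to \mathcal{L}(\overrightarrow{F}; \overrightarrow{G})$ is continuous;
(4) $\frac{\partial f}{\partial y}(a, b)$ is a bijection of $\mathcal{L}(\overrightarrow{F}; \overrightarrow{G})$, and $\left(\frac{\partial f}{\partial y}(a, b)\right)^{-1} \in \mathcal{L}(\overrightarrow{G}; \overrightarrow{F})$;
then there exist some open subset $A \subseteq E$ containing a and some open subset $B \subseteq F$ containing b,
such that there is a unique function $g \colon A \to B$ with $f(x, g(x)) = c$ for all $x \in A$, and g is continuous.
If we also assume that
(5) The derivative $Df(a, b)$ exists;
then the derivative $Dg(a)$ exists, and
$$Dg(a) = -\left(\frac{\partial f}{\partial y}(a, b)\right)^{-1} \circ \frac{\partial f}{\partial x}(a, b);$$
and if in addition
(6) $\frac{\partial f}{\partial x} \colon \Omega \to \mathcal{L}(\overrightarrow{E}; \overrightarrow{G})$ is also continuous (and thus, in view of (3), f is $C^1$ on $\Omega$);
then
$$Dg(x) = -\left(\frac{\partial f}{\partial y}(x, g(x))\right)^{-1} \circ \frac{\partial f}{\partial x}(x, g(x)),$$
for all $x \in A$.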
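A one-dimensional instance makes the formula for $Dg$ concrete. The Python sketch below is only an illustration under assumptions: it takes $f(x, y) = x^2 + y^2$, $c = 1$, and $(a, b) = (0.6, 0.8)$, and it solves for $g(x)$ with the fixed-point iteration $y \leftarrow y - \left(\frac{\partial f}{\partial y}(a, b)\right)^{-1}(f(x, y) - c)$, a contraction near $(a, b)$ in the spirit of the fixed-point argument mentioned above, not a transcription of the cited proofs.

```python
def f(x, y):
    return x * x + y * y

def df_dx(x, y):
    return 2.0 * x

def df_dy(x, y):
    return 2.0 * y

a, b, c = 0.6, 0.8, 1.0   # f(a, b) = c, up to rounding

def g(x, iterations=60):
    """Solve f(x, y) = c for y near b by a fixed-point iteration with frozen slope."""
    y = b
    for _ in range(iterations):
        y = y - (f(x, y) - c) / df_dy(a, b)
    return y

step = 1e-5
numeric_slope = (g(a + step) - g(a - step)) / (2.0 * step)
# Formula from the theorem: Dg(a) = -(df/dy(a, b))^{-1} df/dx(a, b).
formula_slope = -df_dx(a, b) / df_dy(a, b)
```

Here $g(x) = \sqrt{1 - x^2}$ near $x = 0.6$, and both slopes are close to $-3/4$.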
The implicit function theorem plays an important role in the calculus of variations. We
now consider another very important notion, that of a (local) diffeomorphism.
Definition 28.7. Given two topological spaces E and F, and an open subset A of E, we
say that a function $f \colon A \to F$ is a local homeomorphism from A to F if for every $a \in A$,
there is an open set $U \subseteq A$ containing a and an open set V containing $f(a)$ such that f is a
homeomorphism from U to $V = f(U)$. If B is an open subset of F, we say that $f \colon A \to F$
is a (global) homeomorphism from A to B if f is a homeomorphism from A to $B = f(A)$.
If E and F are normed affine spaces, we say that $f \colon A \to F$ is a local diffeomorphism from
A to F if for every $a \in A$, there is an open set $U \subseteq A$ containing a and an open set V
containing $f(a)$ such that f is a bijection from U to V, f is a $C^1$-function on U, and $f^{-1}$
is a $C^1$-function on $V = f(U)$. We say that $f \colon A \to F$ is a (global) diffeomorphism from A
to B if f is a homeomorphism from A to $B = f(A)$, f is a $C^1$-function on A, and $f^{-1}$ is a
$C^1$-function on B.
Note that a local diffeomorphism is a local homeomorphism. Also, as a consequence of
Proposition 28.7, if f is a diffeomorphism on A, then $Df(a)$ is a linear isomorphism for every
$a \in A$. The following theorem can be shown. In fact, there is a fairly simple proof using
Theorem 28.13; see Schwartz [94], Lang [69] and Cartan [20].
Theorem 28.14. Let E and F be complete normed affine spaces, let A be an open subset
of E, and let $f \colon A \to F$ be a $C^1$-function on A. The following properties hold:
(1) For every $a \in A$, if $Df(a)$ is a linear isomorphism (which means that both $Df(a)$
and $(Df(a))^{-1}$ are linear and continuous),¹ then there exist some open subset $U \subseteq A$
containing a, and some open subset V of F containing $f(a)$, such that f is a diffeomorphism from U to $V = f(U)$. Furthermore,
$$Df^{-1}(f(a)) = (Df(a))^{-1}.$$
For every neighborhood N of a, its image $f(N)$ is a neighborhood of $f(a)$, and for every
open ball $U \subseteq A$ of center a, its image $f(U)$ contains some open ball of center $f(a)$.
(2) If $Df(a)$ is invertible for every $a \in A$, then $B = f(A)$ is an open subset of F, and
f is a local diffeomorphism from A to B. Furthermore, if f is injective, then f is a
diffeomorphism from A to B.
¹ Actually, since E and F are Banach spaces, by the Open Mapping Theorem, it is sufficient to assume
that $Df(a)$ is continuous and bijective; see Lang [69].
Part (1) of Theorem 28.14 is often referred to as the (local) inverse function theorem.
It plays an important role in the study of manifolds and (ordinary) differential equations.
If E and F are both of finite dimension, and some frames have been chosen, the invertibility of Df (a) is equivalent to the fact that the Jacobian determinant det(J(f )(a))
is nonnull. The case where Df (a) is just injective or just surjective is also important for
defining manifolds, using implicit definitions.
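The polar-coordinate map gives a familiar instance of the local inverse function theorem. In the Python sketch below (an illustration only; the map, the point a, and the helper names are assumptions), the Jacobian determinant equals r, which is nonnull away from the origin, and the numerically computed Jacobian of the local inverse matches $(J(f)(a))^{-1}$.

```python
import math

def polar(p):
    """f(r, theta) = (r cos theta, r sin theta)."""
    r, theta = p
    return [r * math.cos(theta), r * math.sin(theta)]

def polar_inverse(q):
    """Local inverse of f, valid for r > 0 and theta in (-pi, pi)."""
    x, y = q
    return [math.hypot(x, y), math.atan2(y, x)]

def jacobian_polar(p):
    r, theta = p
    return [[math.cos(theta), -r * math.sin(theta)],
            [math.sin(theta), r * math.cos(theta)]]

def det2(M):
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]

def inverse2(M):
    d = det2(M)
    return [[M[1][1] / d, -M[0][1] / d], [-M[1][0] / d, M[0][0] / d]]

def numeric_jacobian(func, p, step=1e-6):
    """Central-difference Jacobian of func at p, as a list of rows."""
    columns = []
    for j in range(len(p)):
        plus = list(p)
        minus = list(p)
        plus[j] += step
        minus[j] -= step
        columns.append([(u - v) / (2.0 * step) for u, v in zip(func(plus), func(minus))])
    return [list(row) for row in zip(*columns)]

a = [2.0, 0.7]
J = jacobian_polar(a)
from_formula = inverse2(J)                               # (Df(a))^{-1}
from_inverse = numeric_jacobian(polar_inverse, polar(a)) # D(f^{-1})(f(a))
max_gap = max(abs(u - v) for ru, rv in zip(from_formula, from_inverse) for u, v in zip(ru, rv))
```

At $r = 0$ the determinant vanishes, which matches the fact that polar coordinates are not a local diffeomorphism at the origin.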
Definition 28.8. Let E and F be normed affine spaces, where E and F are of finite dimension (or both E and F are complete), and let A be an open subset of E. For any $a \in A$, a
$C^1$-function $f \colon A \to F$ is an immersion at a if $Df(a)$ is injective. A $C^1$-function $f \colon A \to F$
is a submersion at a if $Df(a)$ is surjective. A $C^1$-function $f \colon A \to F$ is an immersion on A
(resp. a submersion on A) if $Df(a)$ is injective (resp. surjective) for every $a \in A$.
The following results can be shown.
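Proposition 28.15. Let A be an open subset of $\mathbb{R}^n$, and let $f \colon A \to \mathbb{R}^m$ be a function.
For every $a \in A$, $f \colon A \to \mathbb{R}^m$ is a submersion at a iff there exists an open subset U of A
containing a, an open subset $W \subseteq \mathbb{R}^{n-m}$, and a diffeomorphism $\varphi \colon U \to f(U) \times W$, such
that,
$$f = \pi_1 \circ \varphi,$$
where $\pi_1 \colon f(U) \times W \to f(U)$ is the first projection. Equivalently,
$$(f \circ \varphi^{-1})(y_1, \ldots, y_m, \ldots, y_n) = (y_1, \ldots, y_m).$$
$$U \subseteq A \xrightarrow{\ \varphi\ } f(U) \times W \xrightarrow{\ \pi_1\ } f(U) \subseteq \mathbb{R}^m, \qquad \pi_1 \circ \varphi = f.$$
Furthermore, the image of every open subset of A under f is an open subset of $\mathbb{R}^m$. (The same
result holds for $\mathbb{C}^n$ and $\mathbb{C}^m$).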
Proposition 28.16. Let A be an open subset of $\mathbb{R}^n$, and let $f \colon A \to \mathbb{R}^m$ be a function.
For every $a \in A$, $f \colon A \to \mathbb{R}^m$ is an immersion at a iff there exists an open subset U of
A containing a, an open subset V containing $f(a)$ such that $f(U) \subseteq V$, an open subset W
containing 0 such that $W \subseteq \mathbb{R}^{m-n}$, and a diffeomorphism $\varphi \colon V \to U \times W$, such that,
$$\varphi \circ f = in_1,$$
where $in_1 \colon U \to U \times W$ is the injection map such that $in_1(u) = (u, 0)$, or equivalently,
$$(\varphi \circ f)(x_1, \ldots, x_n) = (x_1, \ldots, x_n, 0, \ldots, 0).$$
$$U \subseteq A \xrightarrow{\ f\ } f(U) \subseteq V \xrightarrow{\ \varphi\ } U \times W, \qquad \varphi \circ f = in_1.$$
28.4
$$v = Df(a)(u),$$
and the affine tangent family at $(a, f(a))$ to $\Gamma$ is an affine variety $T_a(\Gamma)$ of $E \times F$, defined
by the condition (equation)
$$(x, y) \in T_a(\Gamma) \quad \text{iff} \quad y = f(a) + Df(a)(x - a),$$
subspace of $\overrightarrow{E} \times \overrightarrow{F}$, the set $T_a(\Gamma)$ is an affine variety. Thus, the affine tangent space at a
point $(a, f(a))$ is a familiar object, a line, a plane, etc.
As an illustration, when $E = \mathbb{R}^2$ and $F = \mathbb{R}$, the affine tangent plane at the point $(a, b, c)$
to the surface of equation $z = f(x, y)$, is defined by the equation
$$z = c + \frac{\partial f}{\partial x}(a, b)(x - a) + \frac{\partial f}{\partial y}(a, b)(y - b).$$
If $E = \mathbb{R}$ and $F = \mathbb{R}^2$, the tangent line at $(a, b, c)$, to the curve of equations $y = g(x)$,
$z = h(x)$, is defined by the equations
$$y = b + Dg(a)(x - a), \qquad z = c + Dh(a)(x - a).$$
Thus, derivatives and partial derivatives have the desired intended geometric interpretation as tangent spaces. Of course, in order to deal with this topic properly, we really would
have to go deeper into the study of (differential) manifolds.
We now briefly consider second-order and higher-order derivatives.
28.5
Given two normed affine spaces E and F , and some open subset A of E, if Df (a) is defined
Recall from Proposition 26.46, that the map app from $\mathcal{L}(\overrightarrow{E}; \overrightarrow{F}) \times \overrightarrow{E}$ to $\overrightarrow{F}$, defined such
that $\mathrm{app}(L, v) = L(v)$,
is a continuous bilinear map. Thus, in particular, given a fixed $v \in \overrightarrow{E}$, the linear map
Then, the above discussion can be summarized by saying that when $D^2 f(a)$ is defined,
we have
$$D^2 f(a)(u, v) = D_u D_v f(a).$$
When E has finite dimension and $(a_0, (e_1, \ldots, e_n))$ is a frame for E, we denote $D_{e_j} D_{e_i} f(a)$
by $\frac{\partial^2 f}{\partial x_i \partial x_j}(a)$, when $i \neq j$, and we denote $D_{e_i} D_{e_i} f(a)$ by $\frac{\partial^2 f}{\partial x_i^2}(a)$.
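The following important lemma attributed to Schwarz can be shown, using Lemma 28.11.
Lemma 28.18. (Schwarz's lemma) Given two normed affine spaces E and F, given any
open subset A of E, given any $f \colon A \to F$, for every $a \in A$, if $D^2 f(a)$ exists, then $D^2 f(a)$
is a symmetric bilinear map; that is,
$$D^2 f(a)(u, v) = D^2 f(a)(v, u),$$
for all $u, v \in \overrightarrow{E}$.
In matrix form,
$$D^2 f(a)(u, v) = U^\top \begin{pmatrix}
\frac{\partial^2 f}{\partial x_1^2}(a) & \frac{\partial^2 f}{\partial x_1 \partial x_2}(a) & \ldots & \frac{\partial^2 f}{\partial x_1 \partial x_n}(a) \\
\frac{\partial^2 f}{\partial x_1 \partial x_2}(a) & \frac{\partial^2 f}{\partial x_2^2}(a) & \ldots & \frac{\partial^2 f}{\partial x_2 \partial x_n}(a) \\
\vdots & \vdots & \ddots & \vdots \\
\frac{\partial^2 f}{\partial x_1 \partial x_n}(a) & \frac{\partial^2 f}{\partial x_2 \partial x_n}(a) & \ldots & \frac{\partial^2 f}{\partial x_n^2}(a)
\end{pmatrix} V$$
where U is the column matrix representing u, and V is the column matrix representing v,
over the frame $(a_0, (e_1, \ldots, e_n))$.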
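The above symmetric matrix is called the Hessian of f at a. If F itself is of finite
dimension, and $(b_0, (v_1, \ldots, v_m))$ is a frame for F, then $f = (f_1, \ldots, f_m)$, and each component
$D^2 f(a)_i(u, v)$ of $D^2 f(a)(u, v)$ ($1 \leq i \leq m$), can be written as
$$D^2 f(a)_i(u, v) = U^\top \begin{pmatrix}
\frac{\partial^2 f_i}{\partial x_1^2}(a) & \frac{\partial^2 f_i}{\partial x_1 \partial x_2}(a) & \ldots & \frac{\partial^2 f_i}{\partial x_1 \partial x_n}(a) \\
\frac{\partial^2 f_i}{\partial x_1 \partial x_2}(a) & \frac{\partial^2 f_i}{\partial x_2^2}(a) & \ldots & \frac{\partial^2 f_i}{\partial x_2 \partial x_n}(a) \\
\vdots & \vdots & \ddots & \vdots \\
\frac{\partial^2 f_i}{\partial x_1 \partial x_n}(a) & \frac{\partial^2 f_i}{\partial x_2 \partial x_n}(a) & \ldots & \frac{\partial^2 f_i}{\partial x_n^2}(a)
\end{pmatrix} V$$
Thus, we could describe the vector $D^2 f(a)(u, v)$ in terms of an $mn \times mn$-matrix consisting
of m diagonal blocks, which are the above Hessians, and the row matrix $(U^\top, \ldots, U^\top)$ (m
times) and the column matrix consisting of m copies of V.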
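The relation $D^2 f(a)(u, v) = D_u D_v f(a) = U^\top H V$, with H the Hessian, can be illustrated for a real-valued function of two variables. In the Python sketch below, the polynomial f, its hand-computed Hessian, and the helper names are assumptions made for the example; the mixed second difference stands in for $D_u D_v f(a)$.

```python
def f(p):
    # Hypothetical example function on R^2.
    x, y = p
    return x ** 3 * y + y * y

def hessian(p):
    """Hessian of the example f, computed by hand."""
    x, y = p
    return [[6.0 * x * y, 3.0 * x * x],
            [3.0 * x * x, 2.0]]

def bilinear(u, H, v):
    """Return U^T H V."""
    return sum(u[i] * H[i][j] * v[j] for i in range(len(u)) for j in range(len(v)))

def second_difference(f, a, u, v, s=1e-4):
    """[f(a + s u + s v) - f(a + s u) - f(a + s v) + f(a)] / s^2."""
    def at(cu, cv):
        return f([ai + s * (cu * ui + cv * vi) for ai, ui, vi in zip(a, u, v)])
    return (at(1, 1) - at(1, 0) - at(0, 1) + at(0, 0)) / (s * s)

a = [1.0, 2.0]
u = [1.0, -1.0]
v = [2.0, 1.0]
H = hessian(a)
uv = bilinear(u, H, v)
vu = bilinear(v, H, u)          # equal to uv, by symmetry of the Hessian
numeric = second_difference(f, a, u, v)
```

The symmetry `uv == vu` is Schwarz's lemma in coordinates, and `numeric` approximates the same value up to the discretization error.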
We now indicate briefly how higher-order derivatives are defined. Let $m \geq 2$. Given
a function $f \colon A \to F$ as before, for any $a \in A$, if the derivatives $D^i f$ exist on A for all
i, $1 \leq i \leq m - 1$, by induction, $D^{m-1} f$ can be considered to be a continuous function
for all $u_1, \ldots, u_m \in \overrightarrow{E}$, and all permutations $\pi$ on $\{1, \ldots, m\}$. Then, the following generalization of Schwarz's lemma holds.
Lemma 28.19. Given two normed affine spaces E and F, given any open subset A of E,
given any $f \colon A \to F$, for every $a \in A$, for every $m \geq 1$, if $D^m f(a)$ exists, then $D^m f(a)$
is a symmetric m-multilinear map, and
$$D^m f(a)(u_1, \ldots, u_m) = \sum_j u_{1,j_1} \cdots u_{m,j_m} \frac{\partial^m f}{\partial x_{j_1} \ldots \partial x_{j_m}}(a),$$
where j ranges over all functions $j \colon \{1, \ldots, m\} \to \{1, \ldots, n\}$, for any m vectors
$$u_j = u_{j,1} e_1 + \cdots + u_{j,n} e_n.$$
The concept of $C^1$-function is generalized to the concept of $C^m$-function, and Theorem
28.12 can also be generalized.
Definition 28.11. Given two normed affine spaces E and F, and an open subset A of E,
for any $m \geq 1$, we say that a function $f \colon A \to F$ is of class $C^m$ on A or a $C^m$-function on
A if $D^k f$ exists and is continuous on A for every k, $1 \leq k \leq m$. We say that $f \colon A \to F$
is of class $C^\infty$ on A or a $C^\infty$-function on A if $D^k f$ exists and is continuous on A for every
$k \geq 1$. A $C^\infty$-function (on A) is also called a smooth function (on A). A $C^m$-diffeomorphism
$f \colon A \to B$ between A and B (where A is an open subset of E and B is an open subset
of F) is a bijection between A and $B = f(A)$, such that both $f \colon A \to B$ and its inverse
$f^{-1} \colon B \to A$ are $C^m$-functions.
Equivalently, f is a $C^m$-function on A if f is a $C^1$-function on A and $Df$ is a $C^{m-1}$-function on A.
We have the following theorem giving a necessary and sufficient condition for f to be a
$C^m$-function on A. A generalization to the case where $E = (E_1, a_1) \oplus \cdots \oplus (E_n, a_n)$ also
holds.
Theorem 28.20. Given two normed affine spaces E and F, where E is of finite dimension
n, and where $(a_0, (u_1, \ldots, u_n))$ is a frame of E, given any open subset A of E, given any
function $f \colon A \to F$, for any $m \geq 1$, the derivative $D^m f$ is a $C^m$-function on A iff every
partial derivative $D_{u_{j_k}} \ldots D_{u_{j_1}} f$ (or $\frac{\partial^k f}{\partial x_{j_1} \ldots \partial x_{j_k}}(a)$) is defined and continuous on A, for all
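$$\sum_j u_{1,j_1} \cdots u_{m,j_m} \frac{\partial^m f}{\partial x_{j_1} \ldots \partial x_{j_m}}(a),$$
where j ranges over all functions $j \colon \{1, \ldots, m\} \to \{1, \ldots, n\}$, for any m vectors
$$u_j = u_{j,1} e_1 + \cdots + u_{j,n} e_n.$$
We can then group the various occurrences of $\partial x_{j_k}$ corresponding to the same variable $x_{j_k}$,
and this leads to the notation
$$\left(\frac{\partial}{\partial x_1}\right)^{\alpha_1} \left(\frac{\partial}{\partial x_2}\right)^{\alpha_2} \cdots \left(\frac{\partial}{\partial x_n}\right)^{\alpha_n} f(a),$$
where $\alpha_1 + \alpha_2 + \cdots + \alpha_n = m$.
If we denote $(\alpha_1, \ldots, \alpha_n)$ simply by $\alpha$, then we denote
$$\left(\frac{\partial}{\partial x_1}\right)^{\alpha_1} \left(\frac{\partial}{\partial x_2}\right)^{\alpha_2} \cdots \left(\frac{\partial}{\partial x_n}\right)^{\alpha_n} f$$
by
$$\partial^\alpha f, \quad \text{or} \quad \left(\frac{\partial}{\partial x}\right)^{\alpha} f.$$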
28.6 Taylor's Formula, Faà di Bruno's Formula
We discuss, without proofs, several versions of Taylor's formula. The hypotheses required in
each version become increasingly stronger. The first version can be viewed as a generalization
by $f(h^k)$. The version of Taylor's formula given next is sometimes referred to as the formula
of Taylor-Young.
Theorem 28.21. (Taylor-Young) Given two normed affine spaces E and F, for any open
subset $A \subseteq E$, for any function $f \colon A \to F$, for any $a \in A$, if $D^k f$ exists in A for all k,
$1 \leq k \leq m - 1$, and if $D^m f(a)$ exists, then we have:
$$f(a + h) = f(a) + \frac{1}{1!} D^1 f(a)(h) + \cdots + \frac{1}{m!} D^m f(a)(h^m) + \|h\|^m \epsilon(h),$$
for any h such that $a + h \in A$, and where $\lim_{h \to 0,\ h \neq 0} \epsilon(h) = 0$.
+
D
f
(a)(h
)
+
,
1!
m!
(m + 1)!
(m + 1)!
where $M = \max_{x \in ]a, a+h[} \|D^{m+1} f(x) - L\|$.
The above theorem is sometimes stated under the slightly stronger assumption that f is
a $C^m$-function on A. If $f \colon A \to \mathbb{R}$ is a real-valued function, Theorem 28.22 can be refined a
little bit. This version is often called the formula of Taylor-MacLaurin.
Theorem 28.23. (Taylor-MacLaurin) Let E be a normed affine space, let A be an open
subset of E, and let $f \colon A \to \mathbb{R}$ be a real-valued function on A. Given any $a \in A$ and any
$$f(a + h) = f(a) + \frac{1}{1!} D^1 f(a)(h) + \cdots + \frac{1}{m!} D^m f(a)(h^m) + \frac{1}{(m+1)!} D^{m+1} f(a + \theta h)(h^{m+1}).$$
We also mention for mathematical culture, a version with integral remainder, in the
case of a real-valued function. This is usually called Taylor's formula with integral remainder.
Theorem 28.24. (Taylor's formula with integral remainder) Let E be a normed affine space,
let A be an open subset of E, and let $f \colon A \to \mathbb{R}$ be a real-valued function on A. Given any
$$f(a + h) = f(a) + \frac{1}{1!} D^1 f(a)(h) + \cdots + \frac{1}{m!} D^m f(a)(h^m) + \int_0^1 \frac{(1 - t)^m}{m!} \left[ D^{m+1} f(a + th)(h^{m+1}) \right] dt.$$
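The advantage of the above formula is that it gives an explicit remainder. We now
examine briefly the situation where E is of finite dimension n, and $(a_0, (e_1, \ldots, e_n))$ is a
frame for E. In this case, we get a more explicit expression for the expression
$$\sum_{k=0}^{k=m} \frac{1}{k!} D^k f(a)(h^k)$$
If $h = h_1 e_1 + \cdots + h_n e_n$, then
$$\frac{1}{k!} D^k f(a)(h^k) = \sum_{k_1 + \cdots + k_n = k} \frac{h_1^{k_1} \cdots h_n^{k_n}}{k_1! \cdots k_n!} \left(\frac{\partial}{\partial x_1}\right)^{k_1} \cdots \left(\frac{\partial}{\partial x_n}\right)^{k_n} f(a),$$
which, using the abbreviated notation introduced at the end of Section 28.5, can also be
written as
$$\sum_{k=0}^{k=m} \frac{1}{k!} D^k f(a)(h^k) = \sum_{|\alpha| \leq m} \frac{h^\alpha}{\alpha!} \partial^\alpha f(a).$$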
The advantage of the above notation is that it is the same as the notation used when
$n = 1$, i.e., when $E = \mathbb{R}$ (or $E = \mathbb{C}$). Indeed, in this case, the Taylor-MacLaurin formula
reads as:
$$f(a + h) = f(a) + \frac{h}{1!} D^1 f(a) + \cdots + \frac{h^m}{m!} D^m f(a) + \frac{h^{m+1}}{(m+1)!} D^{m+1} f(a + \theta h),$$
for some $\theta \in \mathbb{R}$, with $0 < \theta < 1$, where $D^k f(a)$ is the value of the k-th derivative of f at
a (and thus, as we have already said several times, this is the kth-order vector derivative,
which is just a scalar, since $F = \mathbb{R}$).
In the above formula, the assumptions are that $f \colon [a, a + h] \to \mathbb{R}$ is a $C^m$-function on
$[a, a + h]$, and that $D^{m+1} f(x)$ exists for every $x \in ]a, a + h[$.
Taylor's formula is useful to study the local properties of curves and surfaces. In the case
of a curve, we consider a function $f \colon [r, s] \to F$ from a closed interval $[r, s]$ of $\mathbb{R}$ to some
affine space F, the derivatives $D^k f(a)(h^k)$ correspond to vectors $h^k D^k f(a)$, where $D^k f(a)$ is
the kth vector derivative of f at a (which is really $D^k f(a)(1, \ldots, 1)$), and for any $a \in ]r, s[$,
Theorem 28.21 yields the following formula:
$$f(a + h) = f(a) + \frac{h}{1!} D^1 f(a) + \cdots + \frac{h^m}{m!} D^m f(a) + h^m \epsilon(h),$$
for any h such that $a + h \in ]r, s[$, and where $\lim_{h \to 0,\ h \neq 0} \epsilon(h) = 0$.
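In the case of functions $f \colon \mathbb{R}^n \to \mathbb{R}$, it is convenient to have formulae for the Taylor-Young
formula and the Taylor-MacLaurin formula in terms of the gradient and the Hessian.
Recall that the gradient $\nabla f(a)$ of f at $a \in \mathbb{R}^n$ is the column vector
$$\nabla f(a) = \begin{pmatrix}
\frac{\partial f}{\partial x_1}(a) \\
\frac{\partial f}{\partial x_2}(a) \\
\vdots \\
\frac{\partial f}{\partial x_n}(a)
\end{pmatrix},$$
and that
$$f'(a)(u) = Df(a)(u) = \nabla f(a) \cdot u,$$
for any $u \in \mathbb{R}^n$ (where $\cdot$ means inner product). The Hessian matrix $\nabla^2 f(a)$ of f at $a \in \mathbb{R}^n$
is the $n \times n$ symmetric matrix
$$\nabla^2 f(a) = \begin{pmatrix}
\frac{\partial^2 f}{\partial x_1^2}(a) & \frac{\partial^2 f}{\partial x_1 \partial x_2}(a) & \ldots & \frac{\partial^2 f}{\partial x_1 \partial x_n}(a) \\
\frac{\partial^2 f}{\partial x_1 \partial x_2}(a) & \frac{\partial^2 f}{\partial x_2^2}(a) & \ldots & \frac{\partial^2 f}{\partial x_2 \partial x_n}(a) \\
\vdots & \vdots & \ddots & \vdots \\
\frac{\partial^2 f}{\partial x_1 \partial x_n}(a) & \frac{\partial^2 f}{\partial x_2 \partial x_n}(a) & \ldots & \frac{\partial^2 f}{\partial x_n^2}(a)
\end{pmatrix},$$
and we have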
$$D^2 f(a)(u, v) = u^\top \nabla^2 f(a)\, v = u \cdot \nabla^2 f(a) v = \nabla^2 f(a) u \cdot v,$$
for all $u, v \in \mathbb{R}^n$. Then, we have the following three formulations of the formula of Taylor-Young
of order 2:
$$f(a + h) = f(a) + Df(a)(h) + \frac{1}{2} D^2 f(a)(h, h) + \|h\|^2 \epsilon(h)$$
$$f(a + h) = f(a) + \nabla f(a) \cdot h + \frac{1}{2} (h \cdot \nabla^2 f(a) h) + (h \cdot h) \epsilon(h)$$
$$f(a + h) = f(a) + (\nabla f(a))^\top h + \frac{1}{2} (h^\top \nabla^2 f(a)\, h) + (h^\top h) \epsilon(h),$$
with $\lim_{h \to 0} \epsilon(h) = 0$.
One should keep in mind that only the first formula is intrinsic (i.e., does not depend on
the choice of a basis), whereas the other two depend on the basis and the inner product chosen
on $\mathbb{R}^n$. As an exercise, the reader should write similar formulae for the Taylor-MacLaurin
formula of order 2.
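The order 2 Taylor-Young formula in gradient and Hessian form can be tested numerically: the remainder divided by $\|h\|^2$ should tend to 0 with h. The Python sketch below assumes a hypothetical $f(x, y) = e^x \sin y$, with gradient and Hessian written out by hand; the names `taylor2` and `remainder_ratio` are illustrative.

```python
import math

def f(p):
    x, y = p
    return math.exp(x) * math.sin(y)

def grad_f(p):
    x, y = p
    return [math.exp(x) * math.sin(y), math.exp(x) * math.cos(y)]

def hess_f(p):
    x, y = p
    e, s, c = math.exp(x), math.sin(y), math.cos(y)
    return [[e * s, e * c],
            [e * c, -e * s]]

def taylor2(a, h):
    """f(a) + grad f(a) . h + (1/2) h^T Hess f(a) h."""
    g = grad_f(a)
    H = hess_f(a)
    linear = sum(gi * hi for gi, hi in zip(g, h))
    quadratic = sum(h[i] * H[i][j] * h[j] for i in range(2) for j in range(2))
    return f(a) + linear + 0.5 * quadratic

a = [0.3, 0.5]
direction = [1.0, -2.0]

def remainder_ratio(s):
    """|f(a + h) - T2(h)| / ||h||^2 for h = s * direction; this is |eps(h)|."""
    h = [s * d for d in direction]
    norm_squared = sum(hi * hi for hi in h)
    shifted = [ai + hi for ai, hi in zip(a, h)]
    return abs(f(shifted) - taylor2(a, h)) / norm_squared

ratios = [remainder_ratio(10.0 ** -k) for k in (1, 2, 3)]
```

For a function with three continuous derivatives, such as this one, the ratios decrease roughly in proportion to the size of h.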
Another application of Taylor's formula is the derivation of a formula which gives the $m$th derivative of the composition of two functions, usually known as Faà di Bruno's formula. This formula is useful when dealing with geometric continuity of spline curves and surfaces.
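Proposition 28.25. Given any normed affine space $E$, for any function $f\colon \mathbb{R} \to \mathbb{R}$ and any function $g\colon \mathbb{R} \to E$, for any $a \in \mathbb{R}$, letting $b = f(a)$, $f^{(i)}(a) = D^i f(a)$, and $g^{(i)}(b) = D^i g(b)$, for any $m \ge 1$, if $f^{(i)}(a)$ and $g^{(i)}(b)$ exist for all $i$, $1 \le i \le m$, then $(g \circ f)^{(m)}(a) = D^m (g \circ f)(a)$ exists and is given by the following formula:
\[ (g \circ f)^{(m)}(a) = \sum_{0 \le j \le m} \; \sum_{\substack{i_1 + i_2 + \cdots + i_m = j \\ i_1 + 2 i_2 + \cdots + m i_m = m \\ i_1, i_2, \ldots, i_m \ge 0}} \frac{m!}{i_1! \cdots i_m!}\, g^{(j)}(b) \left( \frac{f^{(1)}(a)}{1!} \right)^{i_1} \cdots \left( \frac{f^{(m)}(a)}{m!} \right)^{i_m}. \]
For $m = 2$, the formula reads
\[ (g \circ f)^{(2)}(a) = g^{(2)}(b) (f^{(1)}(a))^2 + g^{(1)}(b) f^{(2)}(a). \]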
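To make the index set of Faà di Bruno's formula concrete, here is a small sketch (an illustration under stated assumptions, not the author's code) for real-valued $g$. The function names `exponent_tuples` and `faa_di_bruno` are hypothetical. The inputs are the lists of numbers $f^{(1)}(a), \ldots, f^{(m)}(a)$ and $g^{(1)}(b), \ldots, g^{(m)}(b)$.

```python
from math import factorial

def exponent_tuples(m):
    """Yield all (i_1, ..., i_m) of nonnegative integers
    with i_1 + 2 i_2 + ... + m i_m = m."""
    def rec(k, remaining):
        if k > m:
            if remaining == 0:
                yield ()
            return
        for i_k in range(remaining // k + 1):
            for rest in rec(k + 1, remaining - k * i_k):
                yield (i_k,) + rest
    yield from rec(1, m)

def faa_di_bruno(f_derivs, g_derivs):
    """Return (g o f)^(m)(a), where f_derivs[k-1] = f^(k)(a) and
    g_derivs[j-1] = g^(j)(b) for 1 <= k, j <= m = len(f_derivs)."""
    m = len(f_derivs)
    total = 0.0
    for idx in exponent_tuples(m):
        j = sum(idx)  # order of the derivative of g in this term
        term = float(factorial(m)) * g_derivs[j - 1]
        for k, i_k in enumerate(idx, start=1):
            term /= factorial(i_k)
            term *= (f_derivs[k - 1] / factorial(k)) ** i_k
        total += term
    return total

# m = 2 with made-up derivative values f' = 3, f'' = 5, g' = 7, g'' = 11:
# the formula gives g'' (f')^2 + g' f'' = 11 * 9 + 7 * 5.
second = faa_di_bruno([3.0, 5.0], [7.0, 11.0])
print(second)
```

For $m = 2$ only the tuples $(2, 0)$ and $(0, 1)$ contribute, which reproduces the two terms of the special case stated in the proposition.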
28.7
In this section, we briefly consider vector fields and covariant derivatives of vector fields. Such derivatives play an important role in continuum mechanics. Given a normed affine space $(E, \overrightarrow{E})$, a vector field over $(E, \overrightarrow{E})$ is a function $X\colon E \to \overrightarrow{E}$. Intuitively, a vector field assigns a vector to every point in $E$. Such vectors could be forces, velocities, accelerations, etc.
Given two vector fields $X, Y$ defined on some open subset $\Omega$ of $E$, for every point $a \in \Omega$, we would like to define the derivative of $X$ with respect to $Y$ at $a$. This is a type of directional derivative that gives the variation of $X$ as we move along $Y$, and we denote it by $D_Y X(a)$. The derivative $D_Y X(a)$ is defined as follows.
Definition 28.12. Let $(E, \overrightarrow{E})$ be a normed affine space. Given any open subset $\Omega$ of $E$, given any two vector fields $X$ and $Y$ defined over $\Omega$, for any $a \in \Omega$, the covariant derivative (or Lie derivative) of $X$ w.r.t. the vector field $Y$ at $a$, denoted by $D_Y X(a)$, is the limit (if it exists)
\[ \lim_{t \to 0,\, t \in U} \frac{X(a + tY(a)) - X(a)}{t}, \]
where $U = \{t \in \mathbb{R} \mid a + tY(a) \in \Omega,\ t \ne 0\}$.
If $Y$ is a constant vector field, it is immediately verified that the map
\[ X \mapsto D_Y X(a) \]
is a linear map called the derivative of the vector field $X$, and denoted by $DX(a)$. If $f\colon E \to \mathbb{R}$ is a function, we define $D_Y f(a)$ as the limit (if it exists)
\[ \lim_{t \to 0,\, t \in U} \frac{f(a + tY(a)) - f(a)}{t}, \]
with $U$ as in Definition 28.12.
Proposition 28.26. The covariant derivative $D_Y X(a)$ satisfies the following properties:
\begin{align*}
D_{(Y_1 + Y_2)} X(a) &= D_{Y_1} X(a) + D_{Y_2} X(a), \\
D_{fY} X(a) &= f(a) D_Y X(a), \\
D_Y (X_1 + X_2)(a) &= D_Y X_1(a) + D_Y X_2(a), \\
D_Y fX(a) &= D_Y f(a) X(a) + f(a) D_Y X(a),
\end{align*}
where $X, Y, X_1, X_2, Y_1, Y_2$ are smooth vector fields over $\Omega$, and $f\colon E \to \mathbb{R}$ is a smooth function.
In differential geometry, the above properties are taken as the axioms of affine connections, in order to define covariant derivatives of vector fields over manifolds. In many cases, the vector field $Y$ is the tangent field of some smooth curve $\gamma\colon ]-\eta, \eta[\; \to E$. If so, the following proposition holds.
Proposition 28.27. Given a smooth curve $\gamma\colon ]-\eta, \eta[\; \to E$, letting $Y$ be the vector field defined on $\gamma(]-\eta, \eta[)$ such that
\[ Y(\gamma(u)) = \frac{d\gamma}{dt}(u), \]
for any vector field $X$ defined on $\gamma(]-\eta, \eta[)$, we have
\[ D_Y X(a) = \left[ \frac{d}{dt} X(\gamma(t)) \right](0), \]
where $a = \gamma(0)$.
The derivative $D_Y X(a)$ is thus the derivative of the vector field $X$ along the curve $\gamma$, and it is called the covariant derivative of $X$ along $\gamma$.
Given an affine frame $(O, (e_1, \ldots, e_n))$ for $(E, \overrightarrow{E})$, it is easily seen that the covariant derivative $D_Y X(a)$ is expressed as follows:
\[ D_Y X(a) = \sum_{i=1}^n \sum_{j=1}^n \left( Y_j \frac{\partial X_i}{\partial x_j} \right)(a)\, e_i. \]
Generally, $D_Y X(a) \ne D_X Y(a)$. The quantity
\[ [X, Y] = D_X Y - D_Y X \]
is called the Lie bracket of the vector fields $X$ and $Y$. The Lie bracket plays an important role in differential geometry. In terms of coordinates,
\[ [X, Y] = \sum_{i=1}^n \sum_{j=1}^n \left( X_j \frac{\partial Y_i}{\partial x_j} - Y_j \frac{\partial X_i}{\partial x_j} \right) e_i. \]
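The limit in the definition of $D_Y X(a)$ can be approximated directly by a difference quotient. The sketch below is only an illustration: the helper names, the step size `t`, and the two example fields on $\mathbb{R}^2$ (a rotation field $X(x, y) = (-y, x)$ and $Y(x, y) = (x, 0)$) are assumptions, not taken from the text.

```python
def covariant_derivative(X, Y, a, t=1e-6):
    """Approximate D_Y X(a) by (X(a + t Y(a)) - X(a)) / t for a small t."""
    ya = Y(a)
    shifted = tuple(ai + t * yi for ai, yi in zip(a, ya))
    return tuple((xs - xa) / t for xs, xa in zip(X(shifted), X(a)))

def lie_bracket(X, Y, a, t=1e-6):
    """Approximate [X, Y](a) = D_X Y(a) - D_Y X(a)."""
    dxy = covariant_derivative(Y, X, a, t)  # D_X Y(a): vary Y along X
    dyx = covariant_derivative(X, Y, a, t)  # D_Y X(a): vary X along Y
    return tuple(p - q for p, q in zip(dxy, dyx))

# Hypothetical example fields on R^2.
def X(p):
    return (-p[1], p[0])

def Y(p):
    return (p[0], 0.0)

a = (1.0, 2.0)
bracket = lie_bracket(X, Y, a)
print(bracket)
```

For these fields the coordinate formula gives $D_X Y = (-y, 0)$ and $D_Y X = (0, x)$, so $[X, Y](a) = (-2, -1)$ at $a = (1, 2)$; the numerical result should agree up to rounding, and it also shows that $D_Y X(a) \ne D_X Y(a)$ in general.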
28.8
Further Readings
A thorough treatment of differential calculus can be found in Munkres [85], Lang [70],
Schwartz [94], Cartan [20], and Avez [6]. The techniques of differential calculus have many
applications, especially to the geometry of curves and surfaces and to differential geometry
in general. For this, we recommend do Carmo [30, 31] (two beautiful classics on the subject), Kreyszig [65], Stoker [102], Gray [50], Berger and Gostiaux [10], Milnor [81], Lang [68],
Warner [114] and Choquet-Bruhat [23].
Chapter 29
Extrema of Real-Valued Functions
29.1
By abuse of language, we often say that the point u itself is a local minimum or a
local maximum, even though, strictly speaking, this does not make sense.
We begin with a well-known necessary condition for a local extremum.
Proposition 29.1. Let $E$ be a normed vector space and let $J\colon \Omega \to \mathbb{R}$ be a function, with $\Omega$ some open subset of $E$. If the function $J$ has a local extremum at some point $u \in \Omega$ and if $J$ is differentiable at $u$, then
\[ dJ(u) = J'(u) = 0. \]
Proof. Pick any $v \in E$. Since $\Omega$ is open, for $t$ small enough we have $u + tv \in \Omega$, so there is an open interval $I \subseteq \mathbb{R}$ such that the function $\varphi$ given by
\[ \varphi(t) = J(u + tv) \]
for all $t \in I$ is well-defined. By applying the chain rule, we see that $\varphi$ is differentiable at $t = 0$, and we get
\[ \varphi'(0) = dJ_u(v). \]
Without loss of generality, assume that $u$ is a local minimum. Then we have
\[ \varphi'(0) = \lim_{t \to 0^-} \frac{\varphi(t) - \varphi(0)}{t} \le 0 \]
and
\[ \varphi'(0) = \lim_{t \to 0^+} \frac{\varphi(t) - \varphi(0)}{t} \ge 0, \]
which shows that $\varphi'(0) = dJ_u(v) = 0$. As $v \in E$ is arbitrary, we conclude that $dJ_u = 0$.
It is important to note that the fact that $\Omega$ is open is crucial. For example, if $J$ is the identity function on $[0, 1]$, then $dJ(x) = 1$ for all $x \in [0, 1]$, even though $J$ has a minimum at $x = 0$ and a maximum at $x = 1$. Also, if $E = \mathbb{R}^n$, then the condition $dJ(u) = 0$ is equivalent to the system
\begin{align*}
\frac{\partial J}{\partial x_1}(u_1, \ldots, u_n) &= 0 \\
&\;\;\vdots \\
\frac{\partial J}{\partial x_n}(u_1, \ldots, u_n) &= 0.
\end{align*}
In many practical situations, we need to look for local extrema of a function $J$ under additional constraints. This situation can be formalized conveniently as follows: We have a function $J\colon \Omega \to \mathbb{R}$ defined on some open subset $\Omega$ of a normed vector space, but we also have some subset $U$ of $\Omega$ and we are looking for the local extrema of $J$ with respect to the set $U$. Note that in most cases, $U$ is not open. In fact, $U$ is usually closed.
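\[ \frac{\partial \varphi}{\partial x_2}(u_1, u_2) \in \mathcal{L}(E_2; E_2) \quad \text{and} \quad \left( \frac{\partial \varphi}{\partial x_2}(u_1, u_2) \right)^{-1} \in \mathcal{L}(E_2; E_2), \]
\[ dg(u_1) = -\left( \frac{\partial \varphi}{\partial x_2}(u) \right)^{-1} \circ \frac{\partial \varphi}{\partial x_1}(u). \]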
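It follows that the restriction of $J$ to $(U_1 \times U_2) \cap U$ yields a function $G$ of a single variable, with
\[ G(v_1) = J(v_1, g(v_1)) \]
for all $v_1 \in U_1$. Now, the function $G$ is differentiable at $u_1$ and it has a local extremum at $u_1$ on $U_1$, so Proposition 29.1 implies that
\[ dG(u_1) = 0. \]
By the chain rule,
\begin{align*}
dG(u_1) &= \frac{\partial J}{\partial x_1}(u) + \frac{\partial J}{\partial x_2}(u) \circ dg(u_1) \\
&= \frac{\partial J}{\partial x_1}(u) - \frac{\partial J}{\partial x_2}(u) \circ \left( \frac{\partial \varphi}{\partial x_2}(u) \right)^{-1} \circ \frac{\partial \varphi}{\partial x_1}(u).
\end{align*}
Since $dG(u_1) = 0$, we deduce that
\[ \frac{\partial J}{\partial x_1}(u) = \frac{\partial J}{\partial x_2}(u) \circ \left( \frac{\partial \varphi}{\partial x_2}(u) \right)^{-1} \circ \frac{\partial \varphi}{\partial x_1}(u), \]
and since we also have
\[ \frac{\partial J}{\partial x_2}(u) = \frac{\partial J}{\partial x_2}(u) \circ \left( \frac{\partial \varphi}{\partial x_2}(u) \right)^{-1} \circ \frac{\partial \varphi}{\partial x_2}(u), \]
if we let
\[ \Lambda(u) = -\frac{\partial J}{\partial x_2}(u) \circ \left( \frac{\partial \varphi}{\partial x_2}(u) \right)^{-1}, \]
then we get
\begin{align*}
dJ(u) &= \frac{\partial J}{\partial x_1}(u) + \frac{\partial J}{\partial x_2}(u) \\
&= \frac{\partial J}{\partial x_2}(u) \circ \left( \frac{\partial \varphi}{\partial x_2}(u) \right)^{-1} \circ \left( \frac{\partial \varphi}{\partial x_1}(u) + \frac{\partial \varphi}{\partial x_2}(u) \right) \\
&= -\Lambda(u) \circ d\varphi(u),
\end{align*}
that is, $dJ(u) + \Lambda(u) \circ d\varphi(u) = 0$.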
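In most applications, we have $E_1 = \mathbb{R}^{n-m}$ and $E_2 = \mathbb{R}^m$ for some integers $m, n$ such that $1 \le m < n$, $\Omega$ is an open subset of $\mathbb{R}^n$, $J\colon \Omega \to \mathbb{R}$, and we have $m$ functions $\varphi_i\colon \Omega \to \mathbb{R}$ defining the subset
\[ U = \{v \in \Omega \mid \varphi_i(v) = 0,\ 1 \le i \le m\}. \]
Theorem 29.2 yields the following necessary condition: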
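Theorem 29.3. (Necessary condition for a constrained extremum in terms of Lagrange multipliers) Let $\Omega$ be an open subset of $\mathbb{R}^n$, consider $m$ $C^1$-functions $\varphi_i\colon \Omega \to \mathbb{R}$ (with $1 \le m < n$), let
\[ U = \{v \in \Omega \mid \varphi_i(v) = 0,\ 1 \le i \le m\}, \]
and let $u \in U$ be a point such that the derivatives $d\varphi_i(u) \in \mathcal{L}(\mathbb{R}^n; \mathbb{R})$ are linearly independent; equivalently, assume that the $m \times n$ matrix $\bigl( (\partial \varphi_i / \partial x_j)(u) \bigr)$ has rank $m$. If $J\colon \Omega \to \mathbb{R}$ is a function which is differentiable at $u \in U$ and if $J$ has a local constrained extremum at $u$, then there exist $m$ numbers $\lambda_i(u) \in \mathbb{R}$, uniquely defined, such that
\[ dJ(u) + \lambda_1(u) d\varphi_1(u) + \cdots + \lambda_m(u) d\varphi_m(u) = 0. \]
Proof. Theorem 29.2 yields a linear form $\Lambda(u) \in \mathcal{L}(\mathbb{R}^m; \mathbb{R})$ such that $dJ(u) + \Lambda(u) \circ d\varphi(u) = 0$. However, $\Lambda(u)$ is defined by some $m$-tuple $(\lambda_1(u), \ldots, \lambda_m(u)) \in \mathbb{R}^m$, and in view of the definition of $\varphi$, the above equation is equivalent to
\[ dJ(u) + \lambda_1(u) d\varphi_1(u) + \cdots + \lambda_m(u) d\varphi_m(u) = 0. \]
The uniqueness of the $\lambda_i(u)$ is a consequence of the linear independence of the $d\varphi_i(u)$.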
The numbers $\lambda_i(u)$ involved in Theorem 29.3 are called the Lagrange multipliers associated with the constrained extremum $u$ (again, with some minor abuse of language). The linear independence of the linear forms $d\varphi_i(u)$ is equivalent to the fact that the Jacobian matrix $\bigl( (\partial \varphi_i / \partial x_j)(u) \bigr)$ of $\varphi = (\varphi_1, \ldots, \varphi_m)$ at $u$ has rank $m$. If $m = 1$, the linear independence of the $d\varphi_i(u)$ reduces to the condition $\nabla \varphi_1(u) \ne 0$.
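A fruitful way to reformulate the use of Lagrange multipliers is to introduce the notion of the Lagrangian associated with our constrained extremum problem. This is the function $L\colon \Omega \times \mathbb{R}^m \to \mathbb{R}$ given by
\[ L(v, \lambda) = J(v) + \lambda_1 \varphi_1(v) + \cdots + \lambda_m \varphi_m(v), \]
with $\lambda = (\lambda_1, \ldots, \lambda_m)$. Then, observe that there exists some $\mu = (\mu_1, \ldots, \mu_m)$ and some $u \in U$ such that
\[ dJ(u) + \mu_1 d\varphi_1(u) + \cdots + \mu_m d\varphi_m(u) = 0 \]
if and only if
\[ dL(u, \mu) = 0, \]
or equivalently
\[ \nabla L(u, \mu) = 0; \]
that is, iff $(u, \mu)$ is a critical point of the Lagrangian $L$.
Indeed, $dL(u, \mu) = 0$ is equivalent to
\begin{align*}
\frac{\partial L}{\partial v}(u, \mu) &= 0 \\
\frac{\partial L}{\partial \lambda_1}(u, \mu) &= 0 \\
&\;\;\vdots \\
\frac{\partial L}{\partial \lambda_m}(u, \mu) &= 0,
\end{align*}
and since
\[ \frac{\partial L}{\partial v}(u, \mu) = dJ(u) + \mu_1 d\varphi_1(u) + \cdots + \mu_m d\varphi_m(u) \]
and
\[ \frac{\partial L}{\partial \lambda_i}(u, \mu) = \varphi_i(u), \]
we get
\[ dJ(u) + \mu_1 d\varphi_1(u) + \cdots + \mu_m d\varphi_m(u) = 0 \]
and
\[ \varphi_1(u) = \cdots = \varphi_m(u) = 0, \]
that is, $u \in U$.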
If we write out explicitly the condition
\[ dJ(u) + \mu_1 d\varphi_1(u) + \cdots + \mu_m d\varphi_m(u) = 0, \]

\[ \bigcap_{i=1}^m \operatorname{Ker} d\varphi_i(v) \]
is the tangent space to $U$ at $v$ (a vector space of dimension $n - m$). Then, the condition
\[ dJ(v) + \mu_1 d\varphi_1(v) + \cdots + \mu_m d\varphi_m(v) = 0 \]
implies that $dJ(v)$ vanishes on the tangent space $T_v U$. Conversely, if $dJ(v)(w) = 0$ for all $w \in T_v U$, this means that $dJ(v)$ is orthogonal (in the sense of Definition 4.7) to $T_v U$. Since (by Theorem 4.17 (b)) the orthogonal of $T_v U$ is the space of linear forms spanned by $d\varphi_1(v), \ldots, d\varphi_m(v)$, it follows that $dJ(v)$ must be a linear combination of the $d\varphi_i(v)$. Therefore, when $0$ is a regular value of $\varphi$, Theorem 29.3 asserts that if $u \in U$ is a local extremum of $J$, then $dJ(u)$ must vanish on the tangent space $T_u U$. We can say even more. The subset $Z(J)$ of $\Omega$ given by
\[ Z(J) = \{v \in \Omega \mid J(v) = J(u)\} \]
(the level set of level $J(u)$) is a hypersurface in $\Omega$, and if $dJ(u) \ne 0$, the zero locus of $dJ(u)$ is the tangent space $T_u Z(J)$ to $Z(J)$ at $u$ (a vector space of dimension $n - 1$), where
\[ T_u Z(J) = \{w \in \mathbb{R}^n \mid dJ(u)(w) = 0\}. \]
Consequently, Theorem 29.3 asserts that
\[ T_u U \subseteq T_u Z(J); \]
this is a geometric condition.
The beauty of the Lagrangian is that the constraints $\{\varphi_i(v) = 0\}$ have been incorporated into the function $L(v, \lambda)$, and that the necessary condition for the existence of a constrained local extremum of $J$ is reduced to the necessary condition for the existence of a local extremum of the unconstrained $L$.
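However, one should be careful to check that the assumptions of Theorem 29.3 are satisfied (in particular, the linear independence of the linear forms $d\varphi_i$). For example, let $J\colon \mathbb{R}^3 \to \mathbb{R}$ be given by
\[ J(x, y, z) = x + y + z^2 \]
and $g\colon \mathbb{R}^3 \to \mathbb{R}$ by
\[ g(x, y, z) = x^2 + y^2. \]
Since $g(x, y, z) = 0$ iff $x = y = 0$, the constraint set is $U = \{(0, 0, z) \mid z \in \mathbb{R}\}$, and the restriction of $J$ to $U$, namely $J(0, 0, z) = z^2$, has a minimum for $z = 0$. A blind use of Lagrange multipliers would require some $\lambda$ such that
\[ \frac{\partial J}{\partial x}(0, 0, z) = \lambda \frac{\partial g}{\partial x}(0, 0, z), \quad \frac{\partial J}{\partial y}(0, 0, z) = \lambda \frac{\partial g}{\partial y}(0, 0, z), \quad \frac{\partial J}{\partial z}(0, 0, z) = \lambda \frac{\partial g}{\partial z}(0, 0, z), \]
and since
\[ \frac{\partial g}{\partial x}(x, y, z) = 2x, \quad \frac{\partial g}{\partial y}(x, y, z) = 2y, \quad \frac{\partial g}{\partial z}(0, 0, z) = 0, \]
the partial derivatives above all vanish for $x = y = 0$, so at a local extremum we should also have
\[ \frac{\partial J}{\partial x}(0, 0, z) = 0, \quad \frac{\partial J}{\partial y}(0, 0, z) = 0, \quad \frac{\partial J}{\partial z}(0, 0, z) = 0, \]
but this is absurd since
\[ \frac{\partial J}{\partial x}(x, y, z) = 1, \quad \frac{\partial J}{\partial y}(x, y, z) = 1, \quad \frac{\partial J}{\partial z}(x, y, z) = 2z. \]
The reader should enjoy finding the reason for the flaw in the argument.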
One should also keep in mind that Theorem 29.3 gives only a necessary condition. The solutions $(u, \lambda)$ may not correspond to local extrema! Thus, it is always necessary to analyze the local behavior of $J$ near a critical point $u$. This is generally difficult, but in the case where $J$ is affine or quadratic and the constraints are affine or quadratic, this is possible (although not always easy).
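Let us apply the above method to the following example in which $E_1 = \mathbb{R}$, $E_2 = \mathbb{R}$, $\Omega = \mathbb{R}^2$, and
\begin{align*}
J(x_1, x_2) &= -x_2 \\
\varphi(x_1, x_2) &= x_1^2 + x_2^2 - 1.
\end{align*}
Observe that
\[ U = \{(x_1, x_2) \in \mathbb{R}^2 \mid \varphi(x_1, x_2) = 0\} \]
is the unit circle, and since
\[ \nabla \varphi(x_1, x_2) = \begin{pmatrix} 2x_1 \\ 2x_2 \end{pmatrix}, \]
it is clear that $\nabla \varphi(x_1, x_2) \ne 0$ for every point $(x_1, x_2)$ on the unit circle. If we form the Lagrangian
\[ L(x_1, x_2, \lambda) = -x_2 + \lambda (x_1^2 + x_2^2 - 1), \]
Theorem 29.3 says that a necessary condition for $J$ to have a constrained local extremum is that $\nabla L(x_1, x_2, \lambda) = 0$, so the following equations must hold:
\begin{align*}
2 \lambda x_1 &= 0 \\
-1 + 2 \lambda x_2 &= 0 \\
x_1^2 + x_2^2 &= 1.
\end{align*}
The second equation implies that $\lambda \ne 0$, and then the first yields $x_1 = 0$, so the third yields $x_2 = \pm 1$, and we get two solutions:
\begin{align*}
\lambda &= \frac{1}{2}, & (x_1, x_2) &= (0, 1) \\
\lambda &= -\frac{1}{2}, & (x_1', x_2') &= (0, -1).
\end{align*}
We can check immediately that the first solution is a minimum and the second is a maximum. The reader should look for a geometric interpretation of this problem.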
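The two solutions of the unit-circle example can be checked mechanically. The following sketch (the function names and the sampling resolution are assumptions made for illustration) evaluates the gradient of the Lagrangian at both candidates and compares $J$ at the candidates with sampled values of $J$ on the circle.

```python
import math

def J(x1, x2):
    # Objective of the example: J(x1, x2) = -x2.
    return -x2

def grad_lagrangian(x1, x2, lam):
    """Gradient of L(x1, x2, lam) = -x2 + lam * (x1**2 + x2**2 - 1)
    with respect to (x1, x2, lam)."""
    return (2 * lam * x1, -1 + 2 * lam * x2, x1 ** 2 + x2 ** 2 - 1)

# The two critical points (x1, x2, lambda) found in the text.
candidates = [(0.0, 1.0, 0.5), (0.0, -1.0, -0.5)]
residuals = [grad_lagrangian(*c) for c in candidates]

# Sample J on the unit circle to see which candidate is the minimum.
samples = [J(math.cos(2 * math.pi * k / 360), math.sin(2 * math.pi * k / 360))
           for k in range(360)]
print(residuals, min(samples), max(samples))
```

Both residual triples should vanish, the sampled minimum should match $J(0, 1) = -1$, and the sampled maximum should match $J(0, -1) = 1$.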
Let us now consider the case in which $J$ is a quadratic function of the form
\[ J(v) = \frac{1}{2} v^\top A v - v^\top b, \]
where $A$ is an $n \times n$ symmetric matrix, $b \in \mathbb{R}^n$, and the constraints are given by a linear system of the form
\[ Cv = d, \]
where $C$ is an $m \times n$ matrix with $m < n$ and $d \in \mathbb{R}^m$. We also assume that $C$ has rank $m$. In this case, the function $\varphi$ is given by
\[ \varphi(v) = (Cv - d)^\top, \]
because we view $\varphi(v)$ as a row vector (and $v$ as a column vector), and since
\[ d\varphi(v)(w) = (Cw)^\top = w^\top C^\top, \]
the condition that the Jacobian matrix of $\varphi$ at $u$ have rank $m$ is satisfied. The Lagrangian of this problem is
\[ L(v, \lambda) = \frac{1}{2} v^\top A v - v^\top b + (Cv - d)^\top \lambda = \frac{1}{2} v^\top A v - v^\top b + \lambda^\top (Cv - d), \]
where $\lambda$ is viewed as a column vector. Now, because $A$ is a symmetric matrix, it is easy to show that
\[ \nabla L(v, \lambda) = \begin{pmatrix} Av - b + C^\top \lambda \\ Cv - d \end{pmatrix}. \]
Therefore, the necessary condition for constrained local extrema is
\begin{align*}
Av + C^\top \lambda &= b \\
Cv &= d,
\end{align*}
which can be expressed in matrix form as
\[ \begin{pmatrix} A & C^\top \\ C & 0 \end{pmatrix} \begin{pmatrix} v \\ \lambda \end{pmatrix} = \begin{pmatrix} b \\ d \end{pmatrix}, \]
where the matrix of the system is a symmetric matrix. We should not be surprised to find the system of Section 18, except for some renaming of the matrices and vectors involved. As we know from Section 18.2, the function $J$ has a minimum iff $A$ is positive definite, so in general, if $A$ is only a symmetric matrix, the critical points of the Lagrangian do not correspond to extrema of $J$.
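The block system above can be assembled and solved directly for small problems. The sketch below is an illustration only: `solve_linear_system` is a plain Gaussian elimination written for this example (not a routine from the text), `kkt_system` is a hypothetical helper name, and the data $A = 2I$, $b = 0$, $C = (1 \;\; 1)$, $d = (1)$ are made up.

```python
def solve_linear_system(M, rhs):
    """Solve M x = rhs by Gaussian elimination with partial pivoting."""
    n = len(M)
    aug = [list(map(float, row)) + [float(r)] for row, r in zip(M, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular to working precision")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - s) / aug[r][r]
    return x

def kkt_system(A, C, b, d):
    """Assemble the block matrix [[A, C^T], [C, 0]] and right-hand side (b, d)."""
    n, m = len(A), len(C)
    top = [list(A[i]) + [C[k][i] for k in range(m)] for i in range(n)]
    bottom = [list(C[k]) + [0.0] * m for k in range(m)]
    return top + bottom, list(b) + list(d)

# Made-up data: minimize v1^2 + v2^2 subject to v1 + v2 = 1.
A = [[2.0, 0.0], [0.0, 2.0]]
b = [0.0, 0.0]
C = [[1.0, 1.0]]
d = [1.0]
M, rhs = kkt_system(A, C, b, d)
v1, v2, lam = solve_linear_system(M, rhs)
print(v1, v2, lam)
```

Here $A$ is positive definite, so the critical point $(v_1, v_2) = (1/2, 1/2)$ with $\lambda = -1$ is indeed the constrained minimum; for an indefinite $A$ the same linear solve would still return a critical point of the Lagrangian, but not necessarily an extremum.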
We now investigate conditions for the existence of extrema involving the second derivative
of J.
29.2
For the sake of brevity, we consider only the case of local minima; analogous results are obtained for local maxima (replace $J$ by $-J$, since $\max_u J(u) = -\min_u -J(u)$). We begin with a necessary condition for an unconstrained local minimum.
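Proposition 29.4. Let $E$ be a normed vector space and let $J\colon \Omega \to \mathbb{R}$ be a function, with $\Omega$ some open subset of $E$. If the function $J$ is differentiable in $\Omega$, if $J$ has a second derivative $D^2 J(u)$ at some point $u \in \Omega$, and if $J$ has a local minimum at $u$, then
\[ D^2 J(u)(w, w) \ge 0 \quad \text{for all } w \in E. \]
Proof. Pick any nonzero vector $w \in E$. Since $\Omega$ is open and $J$ has a local minimum at $u$, there is some open interval $I \subseteq \mathbb{R}$ containing $0$ such that
\[ u + tw \in \Omega \quad \text{and} \quad J(u + tw) \ge J(u) \]
for all $t \in I$. Using the Taylor-Young formula and the fact that we must have $dJ(u) = 0$ since $J$ has a local minimum at $u$, we get
\[ 0 \le J(u + tw) - J(u) = \frac{t^2}{2} D^2 J(u)(w, w) + t^2 \|w\|^2 \epsilon(tw), \]
with $\lim_{t \to 0} \epsilon(tw) = 0$. Dividing by $t^2$ and letting $t$ tend to $0$ yields $D^2 J(u)(w, w) \ge 0$.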
When $E = \mathbb{R}^n$, Proposition 29.4 says that a necessary condition for having a local minimum is that the Hessian $\nabla^2 J(u)$ be positive semidefinite (it is always symmetric).
We now give sufficient conditions for the existence of a local minimum.
Theorem 29.5. Let $E$ be a normed vector space, let $J\colon \Omega \to \mathbb{R}$ be a function with $\Omega$ some open subset of $E$, and assume that $J$ is differentiable in $\Omega$ and that $dJ(u) = 0$ at some point $u \in \Omega$. The following properties hold:
(1) If $D^2 J(u)$ exists and if there is some number $\alpha \in \mathbb{R}$ such that $\alpha > 0$ and
\[ D^2 J(u)(w, w) \ge \alpha \|w\|^2 \quad \text{for all } w \in E, \]
then $J$ has a strict local minimum at $u$.
(2) If $D^2 J(v)$ exists for all $v \in \Omega$ and if there is a ball $B \subseteq \Omega$ centered at $u$ such that
\[ D^2 J(v)(w, w) \ge 0 \quad \text{for all } v \in B \text{ and all } w \in E, \]
then $J$ has a local minimum at $u$.
Proof. (1) Using the formula of Taylor-Young, for every vector $w$ small enough, we can write
\begin{align*}
J(u + w) - J(u) &= \frac{1}{2} D^2 J(u)(w, w) + \|w\|^2 \epsilon(w) \\
&\ge \left( \frac{1}{2} \alpha + \epsilon(w) \right) \|w\|^2
\end{align*}
with $\lim_{w \to 0} \epsilon(w) = 0$. Consequently, if we pick $r > 0$ small enough that $|\epsilon(w)| < \alpha/2$ for all $w$ with $\|w\| < r$, then $J(u + w) > J(u)$ for all $u + w \in B$ with $w \ne 0$, where $B$ is the open ball of center $u$ and radius $r$. This proves that $J$ has a strict local minimum at $u$.
(2) The formula of Taylor-Maclaurin shows that for all $u + w \in B$, we have
\[ J(u + w) = J(u) + \frac{1}{2} D^2 J(v)(w, w) \ge J(u), \]
for some $v \in \,]u, u + w[$.
There are no converses of the two assertions of Theorem 29.5. However, there is a condition on $D^2 J(u)$ that implies the condition of part (1). Since this condition is easier to state when $E = \mathbb{R}^n$, we begin with this case.
Recall that an $n \times n$ symmetric matrix $A$ is positive definite if $x^\top A x > 0$ for all $x \in \mathbb{R}^n - \{0\}$. In particular, $A$ must be invertible.
Proposition 29.6. For any symmetric matrix $A$, if $A$ is positive definite, then there is some $\alpha > 0$ such that
\[ x^\top A x \ge \alpha \|x\|^2 \quad \text{for all } x \in \mathbb{R}^n. \]
Proof. Pick any norm in $\mathbb{R}^n$ (recall that all norms on $\mathbb{R}^n$ are equivalent). Since the unit sphere $S^{n-1} = \{x \in \mathbb{R}^n \mid \|x\| = 1\}$ is compact and since the function $f(x) = x^\top A x$ is never zero on $S^{n-1}$, the function $f$ has a minimum $\alpha > 0$ on $S^{n-1}$. Using the usual trick that $x = \|x\| (x / \|x\|)$ for every nonzero vector $x \in \mathbb{R}^n$ and the fact that the inequality of the proposition is trivial for $x = 0$, from
\[ x^\top A x \ge \alpha \quad \text{for all } x \text{ with } \|x\| = 1, \]
we get
\[ x^\top A x \ge \alpha \|x\|^2 \quad \text{for all } x \in \mathbb{R}^n, \]
as claimed.
We can combine Theorem 29.5 and Proposition 29.6 to obtain a useful sufficient condition for the existence of a strict local minimum. First let us introduce some terminology.
Given a function $J\colon \Omega \to \mathbb{R}$ as before, say that a point $u \in \Omega$ is a nondegenerate critical point if $dJ(u) = 0$ and if the Hessian matrix $\nabla^2 J(u)$ is invertible.
Proposition 29.7. Let $J\colon \Omega \to \mathbb{R}$ be a function defined on some open subset $\Omega \subseteq \mathbb{R}^n$. If $J$ is differentiable in $\Omega$ and if some point $u \in \Omega$ is a nondegenerate critical point such that $\nabla^2 J(u)$ is positive definite, then $J$ has a strict local minimum at $u$.
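For $n = 2$ the second-order test can be carried out by hand, since the eigenvalues of a symmetric $2 \times 2$ matrix have a closed form. The sketch below is an illustration, not the author's method: the function names are hypothetical, the test uses eigenvalues (a symmetric matrix is positive definite iff all its eigenvalues are positive), and the example Hessians are made up. The maximum case relies on the remark that results for maxima follow by replacing $J$ by $-J$.

```python
import math

def eigenvalues_sym2(h):
    """Eigenvalues (smallest first) of a symmetric 2 x 2 matrix ((a, b), (b, d))."""
    (a, b), (_, d) = h
    mean = (a + d) / 2.0
    radius = math.hypot((a - d) / 2.0, b)
    return (mean - radius, mean + radius)

def classify_critical_point(hessian):
    """Second-order classification of a critical point from its Hessian."""
    lo, hi = eigenvalues_sym2(hessian)
    if lo > 0:
        return "strict local minimum"    # positive definite Hessian
    if hi < 0:
        return "strict local maximum"    # apply the test to -J
    if lo < 0 < hi:
        return "not a local extremum"    # the necessary condition fails
    return "inconclusive"                # semidefinite: the test is silent

# Hessian of J(x, y) = x^2 + x y + y^2 at its critical point (0, 0).
print(classify_critical_point(((2.0, 1.0), (1.0, 2.0))))
# Hessian of J(x, y) = x^2 - y^2 at (0, 0).
print(classify_critical_point(((2.0, 0.0), (0.0, -2.0))))
```

With the Euclidean norm, the smallest eigenvalue returned by `eigenvalues_sym2` can serve as the constant $\alpha$ of Proposition 29.6.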
Remark: It is possible to generalize Proposition 29.7 to infinite-dimensional spaces by finding a suitable generalization of the notion of a nondegenerate critical point. Firstly, we assume that $E$ is a Banach space (a complete normed vector space). Then, we define the dual $E'$ of $E$ as the set of continuous linear forms on $E$, so that $E' = \mathcal{L}(E; \mathbb{R})$. Following Lang, we use the notation $E'$ for the space of continuous linear forms to avoid confusion with the space $E^* = \operatorname{Hom}(E, \mathbb{R})$ of all linear maps from $E$ to $\mathbb{R}$. A continuous bilinear map $\varphi\colon E \times E \to \mathbb{R}$ in $\mathcal{L}_2(E, E; \mathbb{R})$ yields a map $\Phi$ from $E$ to $E'$ given by
\[ \Phi(u) = \varphi_u, \]
where $\varphi_u \in E'$ is the linear form defined by
\[ \varphi_u(v) = \varphi(u, v). \]
It is easy to check that $\varphi_u$ is continuous and that the map $\Phi$ is continuous. Then, we say that $\varphi$ is nondegenerate iff $\Phi\colon E \to E'$ is an isomorphism of Banach spaces, which means that $\Phi$ is invertible and that both $\Phi$ and $\Phi^{-1}$ are continuous linear maps. Given a function $J\colon \Omega \to \mathbb{R}$ differentiable on $\Omega$ as before (where $\Omega$ is an open subset of $E$), if $D^2 J(u)$ exists for some $u \in \Omega$, we say that $u$ is a nondegenerate critical point if $dJ(u) = 0$ and if $D^2 J(u)$ is nondegenerate. Of course, $D^2 J(u)$ is positive definite if $D^2 J(u)(w, w) > 0$ for all $w \in E - \{0\}$.
Using the above definition, Proposition 29.6 can be generalized to a nondegenerate positive definite bilinear form (on a Banach space) and Proposition 29.7 can also be generalized to the situation where $J\colon \Omega \to \mathbb{R}$ is defined on an open subset of a Banach space. For details and proofs, see Cartan [20] (Part I Chapter 8) and Avez [6] (Chapter 8 and Chapter 10).
In the next section, we make use of convexity, both on the domain $\Omega$ and on the function $J$ itself.
29.3
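Definition 29.3. Given any real vector space $E$, we say that a subset $C$ of $E$ is convex if either $C = \emptyset$ or if for every pair of points $u, v \in C$,
\[ (1 - \lambda) u + \lambda v \in C \quad \text{for all } \lambda \in \mathbb{R} \text{ such that } 0 \le \lambda \le 1. \]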
H,c
= {u E | (u) c},
are convex. Any intersection of halfspaces is convex. More generally, any intersection of
convex sets is convex.
Linear forms are convex functions (but not strictly convex). Any norm $\|\ \|\colon E \to \mathbb{R}_+$ is a convex function. The max function,
\[ \max(x_1, \ldots, x_n) = \max\{x_1, \ldots, x_n\} \]
is convex on $\mathbb{R}^n$. The exponential $x \mapsto e^{cx}$ is strictly convex for any $c \ne 0$ ($c \in \mathbb{R}$). The logarithm function is concave on $\mathbb{R}_+ - \{0\}$, and the log-determinant function $\log \det$ is concave on the set of symmetric positive definite matrices. This function plays an important role in convex optimization. An excellent exposition of convexity and its applications to optimization can be found in Boyd [17].
Here is a necessary condition for a function to have a local minimum with respect to a convex subset $U$.
Theorem 29.8. (Necessary condition for a local minimum on a convex subset) Let $J\colon \Omega \to \mathbb{R}$ be a function defined on some open subset $\Omega$ of a normed vector space $E$ and let $U \subseteq \Omega$ be a nonempty convex subset. Given any $u \in U$, if $dJ(u)$ exists and if $J$ has a local minimum in $u$ with respect to $U$, then
\[ dJ(u)(v - u) \ge 0 \quad \text{for all } v \in U. \]
Proof. Let $v = u + w$ be an arbitrary point in $U$. Since $U$ is convex, we have $u + tw \in U$ for all $t$ such that $0 \le t \le 1$. Since $dJ(u)$ exists, we can write
\[ J(u + tw) - J(u) = dJ(u)(tw) + \|tw\| \epsilon(tw) \]
with $\lim_{t \to 0} \epsilon(tw) = 0$. However, because $0 \le t \le 1$,
\[ J(u + tw) - J(u) = t (dJ(u)(w) + \|w\| \epsilon(tw)) \]
and since $u$ is a local minimum with respect to $U$, we have $J(u + tw) - J(u) \ge 0$, so we get
\[ t (dJ(u)(w) + \|w\| \epsilon(tw)) \ge 0. \]
The above implies that $dJ(u)(w) \ge 0$, because otherwise we could pick $t > 0$ small enough so that
\[ dJ(u)(w) + \|w\| \epsilon(tw) < 0, \]
a contradiction. Since the argument holds for all $v = u + w \in U$, the theorem is proved.
Observe that the convexity of $U$ is a substitute for the use of Lagrange multipliers, but we now have to deal with an inequality instead of an equality.
Consider the special case where $U$ is a subspace of $E$. In this case, since $u \in U$ we have $2u \in U$, and for any $u + w \in U$, we must have $2u - (u + w) = u - w \in U$. The previous theorem implies that $dJ(u)(w) \ge 0$ and $dJ(u)(-w) \ge 0$, that is, $dJ(u)(w) \le 0$, so $dJ(u)(w) = 0$. Since the argument holds for $w \in U$ (because $U$ is a subspace, if $u, w \in U$, then $u + w \in U$), we conclude that
\[ dJ(u)(w) = 0 \quad \text{for all } w \in U. \]
We will now characterize convex functions when they have a first derivative or a second derivative.
Proposition 29.9. (Convexity and first derivative) Let $f\colon \Omega \to \mathbb{R}$ be a function differentiable on some open subset $\Omega$ of a normed vector space $E$ and let $U \subseteq \Omega$ be a nonempty convex subset.
(1) The function $f$ is convex on $U$ iff
\[ f(v) \ge f(u) + df(u)(v - u) \quad \text{for all } u, v \in U. \]
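It follows that
\[ df(u)(v - u) = \lim_{\lambda \to 0} \frac{f((1 - \lambda) u + \lambda v) - f(u)}{\lambda} \le f(v) - f(u). \]
If $f$ is strictly convex, the above reasoning does not work, because a strict inequality is not necessarily preserved by passing to the limit. We have recourse to the following trick: For any $\omega$ such that $0 < \omega < 1$, observe that
\[ (1 - \lambda) u + \lambda v = u + \lambda (v - u) = \frac{\omega - \lambda}{\omega} u + \frac{\lambda}{\omega} (u + \omega (v - u)). \]
If $0 < \lambda \le \omega$, the convexity of $f$ yields
\[ f(u + \lambda (v - u)) \le \frac{\omega - \lambda}{\omega} f(u) + \frac{\lambda}{\omega} f(u + \omega (v - u)). \]
Subtracting $f(u)$ from both sides, dividing by $\lambda$, and using the strict convexity of $f$ (with $u \ne v$), we get
\[ \frac{f(u + \lambda (v - u)) - f(u)}{\lambda} \le \frac{f(u + \omega (v - u)) - f(u)}{\omega} < f(v) - f(u), \]
and letting $\lambda$ tend to $0$ gives $df(u)(v - u) < f(v) - f(u)$.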
Proof. First, assume that the inequality in condition (1) is satisfied. For any two distinct points $u, v \in U$, the formula of Taylor-Maclaurin yields
\begin{align*}
f(v) - f(u) - df(u)(v - u) &= \frac{1}{2} D^2 f(w)(v - u, v - u) \\
&= \frac{\rho^2}{2} D^2 f(w)(v - w, v - w),
\end{align*}
for some $w = (1 - \lambda) u + \lambda v = u + \lambda (v - u)$ with $0 < \lambda < 1$, and with $\rho = 1/(1 - \lambda) > 0$, so that $v - u = \rho (v - w)$. Since $D^2 f(w)(v - w, v - w) \ge 0$ for all $v, w \in U$, we conclude by applying Proposition 29.9(1).
Similarly, if (2) holds, the above reasoning and Proposition 29.9(2) imply that $f$ is strictly convex.
To prove the necessary condition in (1), define $g\colon \Omega \to \mathbb{R}$ by
\[ g(v) = f(v) - df(u)(v), \]
where $u \in U$ is any point considered fixed. If $f$ is convex and if $f$ has a local minimum at $u$ with respect to $U$, since
\[ g(v) - g(u) = f(v) - f(u) - df(u)(v - u), \]
Proposition 29.9 implies that $f(v) - f(u) - df(u)(v - u) \ge 0$, which implies that $g$ has a local minimum at $u$ with respect to all $v \in U$. Therefore, we have $dg(u) = 0$. Observe that $g$ is twice differentiable in $\Omega$ and $D^2 g(u) = D^2 f(u)$, so the formula of Taylor-Young yields for every $v = u + w \in U$ and all $t$ with $0 \le t \le 1$,
\begin{align*}
0 \le g(u + tw) - g(u) &= \frac{1}{2} D^2 f(u)(tw, tw) + \|tw\|^2 \epsilon(tw) \\
&= \frac{t^2}{2} (D^2 f(u)(w, w) + 2 \|w\|^2 \epsilon(wt)),
\end{align*}
with $\lim_{t \to 0} \epsilon(wt) = 0$, and for $t$ small enough, we must have $D^2 f(u)(w, w) \ge 0$, as claimed.
The converse of Theorem 29.10 (2) is false as we see by considering the function $f$ given by $f(x) = x^4$. On the other hand, if $f$ is a quadratic function of the form
\[ f(u) = \frac{1}{2} u^\top A u - u^\top b \]
where $A$ is a symmetric matrix, we know that
\[ df(u)(v) = v^\top (Au - b), \]
so
\begin{align*}
f(v) - f(u) - df(u)(v - u) &= \frac{1}{2} v^\top A v - v^\top b - \frac{1}{2} u^\top A u + u^\top b - (v - u)^\top (Au - b) \\
&= \frac{1}{2} v^\top A v - \frac{1}{2} u^\top A u - (v - u)^\top A u \\
&= \frac{1}{2} v^\top A v + \frac{1}{2} u^\top A u - v^\top A u \\
&= \frac{1}{2} (v - u)^\top A (v - u).
\end{align*}
Therefore, Proposition 29.9 implies that if $A$ is positive semidefinite, then $f$ is convex and if $A$ is positive definite, then $f$ is strictly convex. The converse follows by Theorem 29.10.
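The identity $f(v) - f(u) - df(u)(v - u) = \frac{1}{2} (v - u)^\top A (v - u)$ is easy to confirm numerically. The sketch below uses made-up data (a symmetric positive definite $2 \times 2$ matrix $A$, a vector $b$, and two points $u, v$); the function names are hypothetical.

```python
def quad(A, b, u):
    """f(u) = (1/2) u^T A u - u^T b."""
    n = len(u)
    uAu = sum(u[i] * A[i][j] * u[j] for i in range(n) for j in range(n))
    return 0.5 * uAu - sum(u[i] * b[i] for i in range(n))

def d_quad(A, b, u, v):
    """df(u)(v) = v^T (A u - b), valid because A is symmetric."""
    n = len(u)
    Au_minus_b = [sum(A[i][j] * u[j] for j in range(n)) - b[i] for i in range(n)]
    return sum(v[i] * Au_minus_b[i] for i in range(n))

def first_order_gap(A, b, u, v):
    """f(v) - f(u) - df(u)(v - u); nonnegative for all u, v iff f is convex."""
    w = [vi - ui for vi, ui in zip(v, u)]
    return quad(A, b, v) - quad(A, b, u) - d_quad(A, b, u, w)

def half_quadratic_form(A, w):
    """(1/2) w^T A w."""
    n = len(w)
    return 0.5 * sum(w[i] * A[i][j] * w[j] for i in range(n) for j in range(n))

# Made-up data; A is symmetric positive definite (trace 5, determinant 5).
A = [[2.0, 1.0], [1.0, 3.0]]
b = [1.0, -1.0]
u = [0.5, -2.0]
v = [1.5, 0.25]
gap = first_order_gap(A, b, u, v)
form = half_quadratic_form(A, [v[0] - u[0], v[1] - u[1]])
print(gap, form)
```

The two printed numbers should coincide and be positive, in line with the strict convexity of $f$ when $A$ is positive definite.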
We conclude this section by applying our previous theorems to convex functions defined
on convex subsets. In this case, local minima (resp. local maxima) are global minima (resp.
global maxima).
Definition 29.4. Let $f\colon E \to \mathbb{R}$ be any function defined on some normed vector space (or more generally, any set). For any $u \in E$, we say that $f$ has a minimum in $u$ (resp. maximum in $u$) if
\[ f(u) \le f(v) \ (\text{resp. } f(u) \ge f(v)) \quad \text{for all } v \in E. \]
We say that $f$ has a strict minimum in $u$ (resp. strict maximum in $u$) if
\[ f(u) < f(v) \ (\text{resp. } f(u) > f(v)) \quad \text{for all } v \in E - \{u\}. \]
If $U \subseteq E$ is a subset of $E$ and $u \in U$, we say that $f$ has a minimum in $u$ (resp. strict minimum in $u$) with respect to $U$ if
\[ f(u) \le f(v) \quad \text{for all } v \in U \quad (\text{resp. } f(u) < f(v) \text{ for all } v \in U - \{u\}), \]
and similarly for a maximum in $u$ (resp. strict maximum in $u$) with respect to $U$ with $\le$ changed to $\ge$ and $<$ to $>$.
Sometimes, we say global maximum (or minimum) to stress that a maximum (or a minimum) is not simply a local maximum (or minimum).
Theorem 29.11. Given any normed vector space $E$, let $U$ be any nonempty convex subset of $E$.
(1) For any convex function $J\colon U \to \mathbb{R}$, for any $u \in U$, if $J$ has a local minimum at $u$ in $U$, then $J$ has a (global) minimum at $u$ in $U$.
(2) Any strictly convex function $J\colon U \to \mathbb{R}$ has at most one minimum (in $U$), and if it does, then it is a strict minimum (in $U$).
(3) Let $J\colon \Omega \to \mathbb{R}$ be any function defined on some open subset $\Omega$ of $E$ with $U \subseteq \Omega$ and assume that $J$ is convex on $U$. For any point $u \in U$, if $dJ(u)$ exists, then $J$ has a minimum in $u$ with respect to $U$ iff
\[ dJ(u)(v - u) \ge 0 \quad \text{for all } v \in U. \]
(4) If the convex subset $U$ in (3) is open, then the above condition is equivalent to
\[ dJ(u) = 0. \]
Proof. (1) Let $v = u + w$ be any arbitrary point in $U$. Since $J$ is convex, for all $t$ with $0 \le t \le 1$, we have
\[ J(u + tw) = J(u + t(v - u)) \le (1 - t) J(u) + t J(v), \]
which yields
\[ J(u + tw) - J(u) \le t (J(v) - J(u)). \]
Because $J$ has a local minimum in $u$, there is some $t_0$ with $0 < t_0 < 1$ such that
\[ 0 \le J(u + t_0 w) - J(u), \]
which implies that $J(v) - J(u) \ge 0$.
(2) If $J$ is strictly convex, the above reasoning with $w \ne 0$ shows that there is some $t_0$ with $0 < t_0 < 1$ such that
\[ 0 \le J(u + t_0 w) - J(u) < t_0 (J(v) - J(u)), \]
which shows that $u$ is a strict global minimum (in $U$), and thus that it is unique.
(3) We already know from Theorem 29.8 that the condition $dJ(u)(v - u) \ge 0$ for all $v \in U$ is necessary (even if $J$ is not convex). Conversely, because $J$ is convex, careful inspection of the proof of part (1) of Proposition 29.9 shows that only the fact that $dJ(u)$ exists is needed to prove that
\[ J(v) - J(u) \ge dJ(u)(v - u) \quad \text{for all } v \in U, \]
and if
\[ dJ(u)(v - u) \ge 0 \quad \text{for all } v \in U, \]
then
\[ J(v) - J(u) \ge 0 \quad \text{for all } v \in U, \]
as claimed.
(4) If $U$ is open, then for every $u \in U$ we can find an open ball $B$ centered at $u$ of radius $\epsilon$ small enough so that $B \subseteq U$. Then, for any $w \ne 0$ such that $\|w\| < \epsilon$, we have both $v = u + w \in B$ and $v' = u - w \in B$, so condition (3) implies that
\[ dJ(u)(w) \ge 0 \quad \text{and} \quad dJ(u)(-w) \ge 0, \]
which yields
\[ dJ(u)(w) = 0. \]
Since the above holds for all $w \ne 0$ such that $\|w\| < \epsilon$ and since $dJ(u)$ is linear, we leave it to the reader to fill in the details of the proof that $dJ(u) = 0$.
Theorem 29.11 can be used to rederive the fact that the least squares solutions of a linear system $Ax = b$ (where $A$ is an $m \times n$ matrix) are given by the normal equation
\[ A^\top A x = A^\top b. \]
For this, we consider the quadratic function
\[ J(v) = \frac{1}{2} \|Av - b\|_2^2 - \frac{1}{2} \|b\|_2^2, \]
and our least squares problem is equivalent to finding the minima of $J$ on $\mathbb{R}^n$. A computation reveals that
\[ J(v) = \frac{1}{2} v^\top A^\top A v - v^\top A^\top b, \]
and so
\[ dJ(u) = A^\top A u - A^\top b. \]
Since $A^\top A$ is positive semidefinite, the function $J$ is convex, and Theorem 29.11(4) implies that the minima of $J$ are the solutions of the equation
\[ A^\top A u - A^\top b = 0. \]
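As a small illustration of the normal equation, the sketch below fits a line $y = c_0 + c_1 t$ to three made-up data points. The helper names are hypothetical, and `solve_2x2` uses Cramer's rule, which assumes that $A$ has exactly two columns and that $A^\top A$ is invertible.

```python
def normal_equations(A, b):
    """Return (A^T A, A^T b) for an m x n matrix A given as a list of rows."""
    m, n = len(A), len(A[0])
    ata = [[sum(A[k][i] * A[k][j] for k in range(m)) for j in range(n)]
           for i in range(n)]
    atb = [sum(A[k][i] * b[k] for k in range(m)) for i in range(n)]
    return ata, atb

def solve_2x2(M, rhs):
    """Solve a 2 x 2 linear system by Cramer's rule (assumes det != 0)."""
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    if det == 0:
        raise ValueError("A^T A is singular; the minimizer is not unique")
    return [(rhs[0] * M[1][1] - M[0][1] * rhs[1]) / det,
            (M[0][0] * rhs[1] - M[1][0] * rhs[0]) / det]

# Made-up data: points (t, y) = (0, 1), (1, 2), (2, 2); columns of A are (1, t).
A = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]
b = [1.0, 2.0, 2.0]
ata, atb = normal_equations(A, b)
u = solve_2x2(ata, atb)

# dJ(u) = A^T A u - A^T b should vanish at the minimizer.
gradient = [sum(ata[i][j] * u[j] for j in range(2)) - atb[i] for i in range(2)]
print(u, gradient)
```

For this data $A^\top A = \begin{pmatrix} 3 & 3 \\ 3 & 5 \end{pmatrix}$ and $A^\top b = (5, 6)$, so the minimizer is $u = (7/6, 1/2)$.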
The considerations in this chapter reveal the need to find methods for finding the zeros of the derivative map
\[ dJ\colon \Omega \to E', \]
where $\Omega$ is some open subset of a normed vector space $E$ and $E'$ is the space of all continuous linear forms on $E$ (a subspace of $E^*$). Generalizations of Newton's method yield such methods and they are the object of the next chapter.
29.4
Summary
The main concepts and results of this chapter are listed below:
• Local minimum, local maximum, local extremum, strict local minimum, strict local maximum.
• Necessary condition for a local extremum involving the derivative; critical point.
• Local minimum with respect to a subset $U$, local maximum with respect to a subset $U$, local extremum with respect to a subset $U$.
• Constrained local extremum.
• Necessary condition for a constrained extremum.
• Necessary condition for a constrained extremum in terms of Lagrange multipliers.
• Lagrangian.
• Critical points of a Lagrangian.
• Necessary condition of an unconstrained local minimum involving the second-order derivative.
• Sufficient condition for a local minimum involving the second-order derivative.
Chapter 30
Newton's Method and its Generalizations
30.1
\[ x_{k+1} = x_k - \frac{f(x_k)}{f'(x_k)}, \]
for all $k \ge 0$, provided that $f'(x_k) \ne 0$. The idea is to define $x_{k+1}$ as the intersection of the $x$-axis with the tangent line to the graph of the function $x \mapsto f(x)$ at the point $(x_k, f(x_k))$. Indeed, the equation of this tangent line is
\[ y - f(x_k) = f'(x_k)(x - x_k), \]
and its intersection with the $x$-axis is obtained for $y = 0$, which yields
\[ x = x_k - \frac{f(x_k)}{f'(x_k)}, \]
as claimed.
For example, if $\alpha > 0$ and $f(x) = x^2 - \alpha$, Newton's method yields the sequence
\[ x_{k+1} = \frac{1}{2} \left( x_k + \frac{\alpha}{x_k} \right) \]
to compute the square root $\sqrt{\alpha}$ of $\alpha$. It can be shown that the method converges to $\sqrt{\alpha}$ for any $x_0 > 0$. Actually, the method also converges when $x_0 < 0$! Find out what is the limit.
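The square root iteration takes only a few lines of code. This is a minimal sketch with a fixed number of steps; the function name `newton_sqrt`, the default step count, and the choice to reject $x_0 = 0$ are assumptions made for the illustration.

```python
import math

def newton_sqrt(alpha, x0, steps=8):
    """Iterate x_{k+1} = (x_k + alpha / x_k) / 2, i.e. Newton's method
    applied to f(x) = x^2 - alpha, starting from x0."""
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    if x0 == 0:
        raise ValueError("f'(x0) = 2 * x0 must be nonzero")
    x = x0
    for _ in range(steps):
        x = 0.5 * (x + alpha / x)
    return x

approx = newton_sqrt(2.0, 1.0)
print(approx, math.sqrt(2.0))
```

Starting from $x_0 = 1$ with $\alpha = 2$, the iterates are $1.5$, $1.41666\ldots$, $1.41421\ldots$, and the number of correct digits roughly doubles at each step. Running the same function with a negative starting point is a quick way to experiment with the question asked above.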
The case of a real function suggests the following method for finding the zeros of a function $f\colon \Omega \to Y$, with $\Omega \subseteq X$: given a starting point $x_0 \in \Omega$, the sequence $(x_k)$ is defined by
\[ x_{k+1} = x_k - (f'(x_k))^{-1}(f(x_k)) \]
for all $k \ge 0$.
These are rather demanding conditions but there are sufficient conditions that guarantee that they are met. Another practical issue is that it may be very costly to compute $(f'(x_k))^{-1}$ at every iteration step. In the next section, we investigate generalizations of Newton's method which address the issues that we just discussed.
30.2
In general, it is very costly to compute $J(f)(x_k)$ at each iteration and then to solve the corresponding linear system. If the method converges, the consecutive vectors $x_k$ should differ only a little, as should the corresponding matrices $J(f)(x_k)$. Thus, we are led to a variant of Newton's method which consists in keeping the same matrix for $p$ consecutive steps (where $p$ is some fixed integer $\ge 2$):
\begin{align*}
x_{k+1} &= x_k - (f'(x_0))^{-1}(f(x_k)), && 0 \le k \le p - 1 \\
x_{k+1} &= x_k - (f'(x_p))^{-1}(f(x_k)), && p \le k \le 2p - 1 \\
&\;\;\vdots \\
x_{k+1} &= x_k - (f'(x_{rp}))^{-1}(f(x_k)), && rp \le k \le (r + 1)p - 1 \\
&\;\;\vdots
\end{align*}
It is also possible to set $p = \infty$, that is, to use the same matrix $f'(x_0)$ for all iterations, which leads to iterations of the form
\[ x_{k+1} = x_k - (f'(x_0))^{-1}(f(x_k)), \quad k \ge 0, \]
or even to replace $f'(x_0)$ by a fixed invertible matrix $A_0$, which leads to iterations of the form
\[ x_{k+1} = x_k - A_0^{-1}(f(x_k)), \quad k \ge 0. \]
In the last two cases, if possible, we use an LU factorization of $f'(x_0)$ or $A_0$ to speed up the method. In some cases, it may even be possible to set $A_0 = I$.
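The variant that keeps the same derivative for $p$ consecutive steps can be sketched in the scalar case, where the matrix $f'(x_{rp})$ is just a number and no linear system needs to be solved. The function name `newton_with_refresh`, the test function $f(x) = x^2 - 2$, and the step counts below are assumptions chosen for the illustration.

```python
import math

def newton_with_refresh(f, df, x0, p, steps):
    """Scalar sketch of Newton's method where the derivative is recomputed
    only every p steps: for r*p <= k <= (r+1)*p - 1 the slope df(x_{rp}) is used."""
    x = x0
    slope = df(x0)
    for k in range(steps):
        if k % p == 0:
            slope = df(x)  # refresh at k = 0, p, 2p, ...
        x = x - f(x) / slope
    return x

def f(x):
    return x * x - 2.0

def df(x):
    return 2.0 * x

root = math.sqrt(2.0)
classic = newton_with_refresh(f, df, 1.5, p=1, steps=10)    # ordinary Newton
every_3 = newton_with_refresh(f, df, 1.5, p=3, steps=12)    # refresh every 3 steps
frozen = newton_with_refresh(f, df, 1.5, p=10**9, steps=40) # p "infinite": f'(x0) only
print(classic - root, every_3 - root, frozen - root)
```

Taking $p$ larger than the number of steps mimics the case $p = \infty$: the slope $f'(x_0)$ is computed once, each step is cheaper, and the price is that convergence is only linear, so more iterations are needed.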
The above considerations lead us to the definition of a generalized Newton method, as in Ciarlet [24] (Chapter 7). Recall that a linear map $f \in \mathcal{L}(E; F)$ is called an isomorphism iff $f$ is continuous, bijective, and $f^{-1}$ is also continuous.
Definition 30.1. If $X$ and $Y$ are two normed vector spaces and if $f\colon \Omega \to Y$ is a function from some open subset $\Omega$ of $X$, a generalized Newton method for finding zeros of $f$ consists of
(1) A sequence of families $(A_k(x))$ of linear isomorphisms from $X$ to $Y$, for all $x \in \Omega$ and all integers $k \ge 0$;
(2) Some starting point $x_0 \in \Omega$;
$k \ge 0$,
$\ell = k$
$\ell = \min\{rp, k\}$, if $rp \le k \le (r + 1)p - 1$, $r \ge 0$
$\ell = 0$
(3)
\[ \|f(x_0)\| \le \frac{r}{M} (1 - \beta). \]
$0 \le \ell \le k$
is entirely contained within $B$ and converges to a zero $a$ of $f$, which is the only zero of $f$ in $B$. Furthermore, the convergence is geometric, which means that
\[ \|x_k - a\| \le \frac{\|x_1 - x_0\|}{1 - \beta} \beta^k. \]
A proof of Theorem 30.1 can be found in Ciarlet [24] (Section 7.5). It is not really difficult but quite technical.
If we assume that we already know that some element $a \in \Omega$ is a zero of $f$, the next theorem gives sufficient conditions for a special version of a generalized Newton method to converge. For this special method, the linear isomorphisms $A_k(x)$ are independent of $x \in \Omega$.
Theorem 30.2. Let $X$ be a Banach space, and let $f\colon \Omega \to Y$ be differentiable on the open subset $\Omega \subseteq X$. If $a \in \Omega$ is a point such that $f(a) = 0$, if $f'(a)$ is a linear isomorphism, and if there is some $\lambda$ with $0 < \lambda < 1/2$ such that
\[ \sup_{k \ge 0} \|A_k - f'(a)\|_{\mathcal{L}(X; Y)} \le \frac{\lambda}{\|(f'(a))^{-1}\|_{\mathcal{L}(Y; X)}}, \]
then there is a closed ball $B$ of center $a$ such that for every $x_0 \in B$, the sequence $(x_k)$ defined by
\[ x_{k+1} = x_k - A_k^{-1}(f(x_k)), \quad k \ge 0, \]
is entirely contained within $B$ and converges to $a$, which is the only zero of $f$ in $B$. Furthermore, the convergence is geometric, which means that
\[ \|x_k - a\| \le \beta^k \|x_0 - a\|, \]
for some $\beta < 1$.
A proof of Theorem 30.2 can also be found in Ciarlet [24] (Section 7.5).
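As a small numerical illustration of Theorem 30.2, the sketch below uses a hypothetical scalar example, $f(x) = x^2 - 2$ with zero $a = \sqrt{2}$, and the constant choice $A_k = 3$ for all $k$ (both are assumptions made for illustration). It computes the smallest $\lambda$ allowed by the hypothesis and the observed error ratios, which should stay below some $\beta < 1$.

```python
import math

def fixed_slope_iteration(f, slope, x0, steps):
    """Iterate x_{k+1} = x_k - f(x_k) / slope, that is, A_k = slope for every k."""
    xs = [x0]
    for _ in range(steps):
        xs.append(xs[-1] - f(xs[-1]) / slope)
    return xs

def f(x):
    return x * x - 2.0

a = math.sqrt(2.0)          # the known zero of f
fprime_a = 2.0 * a          # f'(a), an isomorphism of R since it is nonzero
A = 3.0                     # hypothetical constant approximation of f'(a)

# Smallest lambda with |A - f'(a)| <= lambda / |f'(a)^{-1}|.
lam = abs(A - fprime_a) * abs(1.0 / fprime_a)

xs = fixed_slope_iteration(f, A, 1.5, 8)
errors = [abs(x - a) for x in xs]
ratios = [errors[k + 1] / errors[k] for k in range(len(errors) - 1)]

print("lambda =", lam)
print("error ratios:", ratios)
```

For this example the ratios settle near $1 - f'(a)/A$, so the error shrinks by a roughly constant factor at each step, which is the geometric convergence stated in the theorem.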
For the sake of completeness, we state a version of the Newton-Kantorovich theorem,
which corresponds to the case where $A_k(x) = f'(x)$. In this instance, a stronger result can
be obtained, especially regarding upper bounds, and we state a version due to Gragg and
Tapia which appears in Problem 7.5-4 of Ciarlet [24].
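Theorem 30.3. (Newton-Kantorovich) Let $X$ be a Banach space, and let $f : \Omega \to Y$ be
differentiable on the open subset $\Omega \subseteq X$. Assume that there exist three positive constants
$\lambda, \mu, \nu$ and a point $x_0 \in \Omega$ such that
\[ 0 < \lambda\mu\nu \le \frac{1}{2}, \]
and if we let
\[ \rho^- = \frac{1 - \sqrt{1 - 2\lambda\mu\nu}}{\mu\nu}, \qquad \rho^+ = \frac{1 + \sqrt{1 - 2\lambda\mu\nu}}{\mu\nu}, \]
\[ B = \{x \in X \mid \|x - x_0\| < \rho^-\}, \qquad \Omega^+ = \{x \in \Omega \mid \|x - x_0\| < \rho^+\}, \]
then $B \subseteq \Omega$, $f'(x_0)$ is an isomorphism of $\mathcal{L}(X; Y)$, and
\[ \|(f'(x_0))^{-1}\| \le \mu, \qquad \|(f'(x_0))^{-1} f(x_0)\| \le \lambda, \]
\[ \|f'(x) - f'(y)\| \le \nu \|x - y\| \quad \text{for all } x, y \in \Omega^+. \]
Then, $f'(x)$ is an isomorphism of $\mathcal{L}(X; Y)$ for all $x \in B$, and the sequence defined by
\[ x_{k+1} = x_k - (f'(x_k))^{-1}(f(x_k)), \quad k \ge 0, \]
is entirely contained within the ball $B$ and converges to a zero $a$ of $f$ which is the only zero
of $f$ in $\Omega^+$. Finally, if we write $\theta = \rho^-/\rho^+$, then we have the following bounds:
\[ \|x_k - a\| \le \frac{2\sqrt{1 - 2\lambda\mu\nu}}{\lambda\mu\nu}\, \frac{\theta^{2^k}}{1 - \theta^{2^k}}\, \|x_1 - x_0\| \quad \text{if } \lambda\mu\nu < \frac{1}{2}, \]
\[ \|x_k - a\| \le \frac{\|x_1 - x_0\|}{2^{k-1}} \quad \text{if } \lambda\mu\nu = \frac{1}{2}, \]
and
\[ \frac{2\|x_{k+1} - x_k\|}{1 + \sqrt{1 + 4\theta^{2^k}(1 + \theta^{2^k})^{-2}}} \le \|x_k - a\| \le \theta^{2^{k-1}} \|x_k - x_{k-1}\|. \]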
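To make the constants of the Newton-Kantorovich theorem concrete, here is a hedged sketch for a hypothetical scalar example, $f(x) = x^2 - 2$ with $x_0 = 1.5$. The names `lam`, `mu`, `nu` stand for the three constants of the theorem, read as bounds on $\|(f'(x_0))^{-1} f(x_0)\|$, on $\|(f'(x_0))^{-1}\|$, and on the Lipschitz constant of $f'$; the value `nu = 3.0` is a deliberately loose assumption (the exact Lipschitz constant of $f'$ here is 2).

```python
import math

def f(x):
    return x * x - 2.0

def df(x):
    return 2.0 * x

x0 = 1.5
mu = 1.0 / abs(df(x0))         # bound on |f'(x0)^{-1}|
lam = abs(f(x0) / df(x0))      # bound on |f'(x0)^{-1} f(x0)|
nu = 3.0                       # assumed (loose) Lipschitz constant for f'
h = lam * mu * nu              # must satisfy 0 < h <= 1/2

root = math.sqrt(1.0 - 2.0 * h)
rho_minus = (1.0 - root) / (mu * nu)
rho_plus = (1.0 + root) / (mu * nu)
theta = rho_minus / rho_plus

# Newton's original method: x_{k+1} = x_k - f(x_k) / f'(x_k).
xs = [x0]
for _ in range(3):
    xs.append(xs[-1] - f(xs[-1]) / df(xs[-1]))

a = math.sqrt(2.0)             # the zero of f near x0, used only to measure the error

def a_priori_bound(k):
    """Upper bound on |x_k - a| in the case lam * mu * nu < 1/2."""
    t = theta ** (2 ** k)
    return 2.0 * root / h * t / (1.0 - t) * abs(xs[1] - xs[0])

for k in range(1, 4):
    print(k, abs(xs[k] - a), a_priori_bound(k))
```

Only the first three iterates are compared, because beyond that the true error drops below double-precision resolution while the bound keeps shrinking.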
We can now specialize Theorems 30.1 and 30.2 to the search for zeros of the derivative
$J' : \Omega \to E'$ of a function $J : \Omega \to \mathbb{R}$, with $\Omega \subseteq E$. The second derivative $J''$ of $J$ is a
continuous bilinear form $J'' : E \times E \to \mathbb{R}$, but it is convenient to view it as a linear map
in $\mathcal{L}(E, E')$; the continuous linear form $J''(u)$ is given by $J''(u)(v) = J''(u, v)$. In our next
theorem, we assume that the $A_k(x)$ are isomorphisms in $\mathcal{L}(E, E')$.
Theorem 30.4. Let $E$ be a Banach space, let $J : \Omega \to \mathbb{R}$ be twice differentiable on the open
subset $\Omega \subseteq E$, and assume that there are constants $r, M, \beta > 0$ such that if we let
\[ B = \{x \in E \mid \|x - x_0\| \le r\} \subseteq \Omega, \]
then
(3) $\|J'(x_0)\| \le \dfrac{r}{M}(1 - \beta)$.
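Then the sequence $(x_k)$ defined by
\[ x_{k+1} = x_k - A_k^{-1}(x_\ell)(J'(x_k)), \quad 0 \le \ell \le k, \]
is entirely contained within $B$ and converges to a zero $a$ of $J'$, which is the only zero of $J'$
in $B$. Furthermore, the convergence is geometric, which means that
\[ \|x_k - a\| \le \frac{\|x_1 - x_0\|}{1 - \beta}\, \beta^k. \]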
In the next theorem, we assume that the $A_k(x)$ are isomorphisms in $\mathcal{L}(E, E')$ that are
independent of $x \in \Omega$.
Theorem 30.5. Let $E$ be a Banach space, and let $J : \Omega \to \mathbb{R}$ be twice differentiable on the
open subset $\Omega \subseteq E$. If $a \in \Omega$ is a point such that $J'(a) = 0$, if $J''(a)$ is a linear isomorphism,
and if there is some $\lambda$ with $0 < \lambda < 1/2$ such that
\[ \sup_{k \ge 0} \|A_k - J''(a)\|_{\mathcal{L}(E;E')} \le \frac{\lambda}{\|(J''(a))^{-1}\|_{\mathcal{L}(E';E)}}, \]
then there is a closed ball $B$ of center $a$ such that for every $x_0 \in B$, the sequence $(x_k)$ defined
by
\[ x_{k+1} = x_k - A_k^{-1}(J'(x_k)), \quad k \ge 0, \]
is entirely contained within $B$ and converges to $a$, which is the only zero of $J'$ in $B$. Furthermore, the convergence is geometric, which means that
\[ \|x_k - a\| \le \beta^k \|x_0 - a\|, \]
for some $\beta < 1$.
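When $E = \mathbb{R}^n$, the Newton method given by Theorem 30.4 yields an iteration step of
the form
\[ x_{k+1} = x_k - A_k^{-1}(x_\ell)\, \nabla J(x_k), \quad 0 \le \ell \le k, \quad k \ge 0, \]
where $\nabla J(x_k)$ is the gradient of $J$ at $x_k$.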
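As an illustration of this iteration step in $\mathbb{R}^2$, the following sketch takes $A_k(x)$ to be the Hessian matrix of $J$ at the current point (that is, $\ell = k$). The objective $J(x, y) = e^{x+y} + x^2 + y^2$, the starting point, and the helper names are hypothetical choices made for the example.

```python
import math

def grad_J(x, y):
    """Gradient of the hypothetical objective J(x, y) = exp(x + y) + x^2 + y^2."""
    e = math.exp(x + y)
    return (e + 2.0 * x, e + 2.0 * y)

def hess_J(x, y):
    """Hessian matrix of J at (x, y); it is positive definite everywhere."""
    e = math.exp(x + y)
    return ((e + 2.0, e), (e, e + 2.0))

def solve_2x2(m, b):
    """Solve the 2x2 linear system m d = b by Cramer's rule."""
    (p, q), (r, s) = m
    det = p * s - q * r
    return ((s * b[0] - q * b[1]) / det, (p * b[1] - r * b[0]) / det)

def newton_minimize(x, y, steps=10):
    """Iterate x_{k+1} = x_k - (Hessian at x_k)^{-1} grad J(x_k)."""
    for _ in range(steps):
        dx, dy = solve_2x2(hess_J(x, y), grad_J(x, y))
        x, y = x - dx, y - dy
    return x, y

x_min, y_min = newton_minimize(1.0, -0.5)
print(x_min, y_min, grad_J(x_min, y_min))
```

The linear system is solved at each step instead of forming the inverse matrix, which is the usual practice even for small problems.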
As remarked in [24] (Section 7.5), generalized Newton methods have a very wide range
of applicability. For example, various versions of gradient descent methods can be viewed as
instances of Newton methods.
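One way to read this remark, sketched below under stated assumptions, is to take $A_k = (1/\alpha) I$ for a fixed step size $\alpha > 0$, so that the generalized Newton step becomes the gradient descent step $x_{k+1} = x_k - \alpha \nabla J(x_k)$. The objective $J(x, y) = e^{x+y} + x^2 + y^2$ and the value `alpha = 0.4` are hypothetical choices for the example.

```python
import math

def grad_J(x, y):
    """Gradient of the hypothetical objective J(x, y) = exp(x + y) + x^2 + y^2."""
    e = math.exp(x + y)
    return (e + 2.0 * x, e + 2.0 * y)

def generalized_newton(grad, apply_A_inverse, x, y, steps):
    """Iterate x_{k+1} = x_k - A^{-1} grad(x_k) for a fixed isomorphism A."""
    for _ in range(steps):
        dx, dy = apply_A_inverse(grad(x, y))
        x, y = x - dx, y - dy
    return x, y

alpha = 0.4  # assumed step size; A = (1/alpha) I, hence A^{-1} g = alpha * g

def scaled_identity_inverse(g):
    return (alpha * g[0], alpha * g[1])

x_gd, y_gd = generalized_newton(grad_J, scaled_identity_inverse, 0.0, 0.0, 60)
print(x_gd, y_gd, grad_J(x_gd, y_gd))
```

With this choice the convergence is only geometric, as in Theorem 30.5, whereas using the Hessian for $A_k$ usually gives much faster local convergence.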
Newton's method also plays an important role in convex optimization, in particular in
interior-point methods. A variant of Newton's method dealing with equality constraints has
been developed. We refer the reader to Boyd and Vandenberghe [17], Chapters 10 and 11,
for a comprehensive exposition of these topics.
30.3 Summary
The main concepts and results of this chapter are listed below:
Newton's method for functions $f : \mathbb{R} \to \mathbb{R}$.
Generalized Newton methods.
The Newton-Kantorovich theorem.
Chapter 31
Appendix: Zorn's Lemma; Some Applications
31.1
Zorn's lemma is a particularly useful form of the axiom of choice, especially for algebraic
applications. Readers who want to learn more about Zorn's lemma and its applications to
algebra should consult either Lang [67], Appendix 2, §2 (pp. 878-884) and Chapter III, §5
(pp. 139-140), or Artin [4], Appendix §1 (pp. 588-589). For the logical ramifications of
Zorn's lemma and its equivalence with the axiom of choice, one should consult Schwartz
[93], (Vol. 1), Chapter I, §6, or a text on set theory such as Enderton [34], Suppes [107], or
Kuratowski and Mostowski [66].
A pair $(S, \le)$, where $\le$ is a partial order on $S$, is called a partially ordered set or poset.
Given a poset, $(S, \le)$, a subset, $C$, of $S$ is totally ordered or a chain if for every pair of
elements $x, y \in C$, either $x \le y$ or $y \le x$. The empty set is trivially a chain. A subset, $P$,
(empty or not) of $S$ is bounded if there is some $b \in S$ so that $x \le b$ for all $x \in P$. Observe
that the empty subset of $S$ is bounded if and only if $S$ is nonempty. A maximal element of
$P$ is an element, $m \in P$, so that $m \le x$ implies that $m = x$, for all $x \in P$. Zorn's lemma
can be stated as follows:
Lemma 31.1. Given a partially ordered set, $(S, \le)$, if every chain is bounded, then $S$ has a
maximal element.
Proof. See any of Schwartz [93], Enderton [34], Suppes [107], or Kuratowski and Mostowski
[66].
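For a finite poset these notions can be checked by brute force. The sketch below is only an illustration: it uses a hypothetical example, the set $S = \{2, 3, 4, 6, 8, 12\}$ ordered by divisibility, enumerates its chains, and lists its maximal elements. Zorn's lemma is of course only needed for infinite posets, where no such enumeration is available.

```python
from itertools import combinations

def leq(x, y):
    """Partial order used for illustration: x <= y iff x divides y."""
    return y % x == 0

def is_chain(c):
    """A subset is a chain if any two of its elements are comparable."""
    return all(leq(x, y) or leq(y, x) for x, y in combinations(c, 2))

def is_bounded(p, s):
    """A subset p of s is bounded if some b in s satisfies x <= b for all x in p."""
    return any(all(leq(x, b) for x in p) for b in s)

def maximal_elements(s):
    """Elements m of s such that m <= x implies m = x, for all x in s."""
    return [m for m in s if all(not leq(m, x) or m == x for x in s)]

S = [2, 3, 4, 6, 8, 12]
chains = [c for r in range(len(S) + 1) for c in combinations(S, r) if is_chain(c)]

print(len(chains), "chains, all bounded:", all(is_bounded(c, S) for c in chains))
print("maximal elements:", maximal_elements(S))
```

Note that the empty chain is included and is bounded because $S$ is nonempty, in agreement with the observation made before the lemma.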
Remark: As we noted, the hypothesis of Zorn's lemma implies that $S$ is nonempty (since
the empty set must be bounded). A partially ordered set such that every chain is bounded
is sometimes called inductive.
We now give some applications of Zorn's lemma.
31.2
Using Zorn's lemma, we can prove that Theorem 2.9 holds for arbitrary vector spaces, and
not just for finitely generated vector spaces, as promised in Chapter 2.
Theorem 31.2. Given any family, $S = (u_i)_{i \in I}$, generating a vector space $E$ and any linearly
independent subfamily, $L = (u_j)_{j \in J}$, of $S$ (where $J \subseteq I$), there is a basis, $B$, of $E$ such that
$L \subseteq B \subseteq S$.
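Proof. Consider the set $\mathcal{L}$ of linearly independent families, $B$, such that $L \subseteq B \subseteq S$. Since
$L \in \mathcal{L}$, this set is nonempty. We claim that $\mathcal{L}$ is inductive.
Consider any chain, $(B_l)_{l \in \Lambda}$, of
linearly independent families $B_l$ in $\mathcal{L}$, and look at $B = \bigcup_{l \in \Lambda} B_l$. The family $B$ is of the form
$B = (v_h)_{h \in H}$, for some index set $H$, and it must be linearly independent. Indeed, if this was
not true, there would be some family $(\lambda_h)_{h \in H}$ of scalars, of finite support, so that
\[ \sum_{h \in H} \lambda_h v_h = 0, \]
where not all $\lambda_h$ are zero. Since $B = \bigcup_{l \in \Lambda} B_l$ and only finitely many $\lambda_h$ are nonzero, there
is a finite subset, $F$, of $\Lambda$, so that $v_h \in B_{f_h}$ iff $\lambda_h \ne 0$. But $(B_l)_{l \in \Lambda}$ is a chain, and if we let
$f = \max\{f_h \mid f_h \in F\}$, then $v_h \in B_f$, for all $v_h$ for which $\lambda_h \ne 0$. Thus,
\[ \sum_{h \in H} \lambda_h v_h = 0 \]
would be a nontrivial linear dependency among vectors from $B_f$, a contradiction. Therefore,
$B \in \mathcal{L}$, and since $B$ is an upper bound for the $B_l$'s, the set $\mathcal{L}$ is inductive. By Zorn's
lemma (Lemma 31.1), $\mathcal{L}$ has a maximal element, and a maximal linearly independent family
between $L$ and $S$ generates $E$ (otherwise some $u_i$ could be added to it while preserving linear
independence), so it is the desired basis.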
31.3
Let $A$ be a commutative ring with identity element. Recall that an ideal $\mathfrak{A}$ in $A$ is a proper
ideal if $\mathfrak{A} \ne A$. The following theorem holds:
Theorem 31.3. Given any proper ideal, $\mathfrak{A} \subseteq A$, there is a maximal ideal, $\mathfrak{B}$, containing $\mathfrak{A}$.
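Proof. Let $\mathcal{I}$ be the set of all proper ideals, $\mathfrak{B}$, in $A$ that contain $\mathfrak{A}$. The set $\mathcal{I}$ is nonempty,
since $\mathfrak{A} \in \mathcal{I}$. We claim that $\mathcal{I}$ is inductive. Consider any chain $(\mathfrak{A}_i)_{i \in I}$ of ideals $\mathfrak{A}_i$ in $\mathcal{I}$.
One can easily check that $\mathfrak{B} = \bigcup_{i \in I} \mathfrak{A}_i$ is an ideal. Furthermore, $\mathfrak{B}$ is a proper ideal, since
otherwise, the identity element $1$ would belong to $\mathfrak{B} = A$, and so, we would have $1 \in \mathfrak{A}_i$ for
some $i$, which would imply $\mathfrak{A}_i = A$, a contradiction. Also, $\mathfrak{B}$ is obviously an upper bound
for all the $\mathfrak{A}_i$'s. By Zorn's lemma (Lemma 31.1), the set $\mathcal{I}$ has a maximal element, say $\mathfrak{B}$,
and $\mathfrak{B}$ is a maximal ideal containing $\mathfrak{A}$.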
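In a finite ring the statement of Theorem 31.3 can be verified by direct enumeration, without Zorn's lemma. The sketch below is a hypothetical illustration for the ring $\mathbb{Z}/12\mathbb{Z}$; it relies on the standard fact that every ideal of $\mathbb{Z}/n\mathbb{Z}$ is generated by a divisor of $n$, and the function names are chosen only for this example.

```python
def ideal(d, n):
    """The ideal of Z/nZ generated by d, as a frozenset of residues."""
    return frozenset((d * k) % n for k in range(n))

def proper_ideals(n):
    """All proper ideals of Z/nZ; each is generated by a divisor d > 1 of n."""
    return {ideal(d, n) for d in range(2, n + 1) if n % d == 0}

def maximal_ideals_containing(a, n):
    """Proper ideals containing a that are not strictly contained in another one."""
    candidates = [b for b in proper_ideals(n) if a <= b]
    return [b for b in candidates if not any(b < c for c in candidates)]

n = 12
start = ideal(4, n)                       # the proper ideal {0, 4, 8}
above = maximal_ideals_containing(start, n)
print(sorted(sorted(b) for b in above))   # ideals generated by primes dividing 12
```

Here the only maximal ideal containing the ideal generated by 4 is the one generated by 2, while the zero ideal lies in both the ideal generated by 2 and the ideal generated by 3.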
Bibliography
[1] Lars V. Ahlfors and Leo Sario. Riemann Surfaces. Princeton Math. Series, No. 2.
Princeton University Press, 1960.
[2] George E. Andrews, Richard Askey, and Ranjan Roy. Special Functions. Cambridge
University Press, first edition, 2000.
[3] Emil Artin. Geometric Algebra. Wiley Interscience, first edition, 1957.
[4] Michael Artin. Algebra. Prentice Hall, first edition, 1991.
[5] M. F. Atiyah and I. G. Macdonald. Introduction to Commutative Algebra. Addison
Wesley, third edition, 1969.
[6] A. Avez. Calcul Différentiel. Masson, first edition, 1991.
[7] Sheldon Axler. Linear Algebra Done Right. Undergraduate Texts in Mathematics.
Springer Verlag, second edition, 2004.
[8] Marcel Berger. Géométrie 1. Nathan, 1990. English edition: Geometry 1, Universitext,
Springer Verlag.
[9] Marcel Berger. Géométrie 2. Nathan, 1990. English edition: Geometry 2, Universitext,
Springer Verlag.
[10] Marcel Berger and Bernard Gostiaux. Géométrie différentielle: variétés, courbes et
surfaces. Collection Mathématiques. Puf, second edition, 1992. English edition: Differential geometry, manifolds, curves, and surfaces, GTM No. 115, Springer Verlag.
[11] Rolf Berndt. An Introduction to Symplectic Geometry. Graduate Studies in Mathematics, Vol. 26. AMS, first edition, 2001.
[12] J.E. Bertin. Algèbre linéaire et géométrie classique. Masson, first edition, 1981.
[13] Nicolas Bourbaki. Algèbre, Chapitre 9. Éléments de Mathématiques. Hermann, 1968.
[14] Nicolas Bourbaki. Algèbre, Chapitres 1-3. Éléments de Mathématiques. Hermann,
1970.
[15] Nicolas Bourbaki. Algèbre, Chapitres 4-7. Éléments de Mathématiques. Masson, 1981.
[16] Nicolas Bourbaki. Espaces Vectoriels Topologiques. Éléments de Mathématiques. Masson, 1981.
[17] Stephen Boyd and Lieven Vandenberghe. Convex Optimization. Cambridge University
Press, first edition, 2004.
[18] Glen E Bredon. Topology and Geometry. GTM No. 139. Springer Verlag, first edition,
1993.
[19] G. Cagnac, E. Ramis, and J. Commeau. Mathématiques Spéciales, Vol. 3, Géométrie.
Masson, 1965.
[20] Henri Cartan. Cours de Calcul Différentiel. Collection Méthodes. Hermann, 1990.
[21] Henri Cartan. Differential Forms. Dover, first edition, 2006.
[22] Claude Chevalley. The Algebraic Theory of Spinors and Clifford Algebras. Collected
Works, Vol. 2. Springer, first edition, 1997.
[23] Yvonne Choquet-Bruhat, Cécile DeWitt-Morette, and Margaret Dillard-Bleick. Analysis, Manifolds, and Physics, Part I: Basics. North-Holland, first edition, 1982.
[24] P.G. Ciarlet. Introduction to Numerical Matrix Analysis and Optimization. Cambridge
University Press, first edition, 1989. French edition: Masson, 1994.
[25] Timothée Cour and Jianbo Shi. Solving Markov random fields with spectral relaxation.
In Marina Meila and Xiaotong Shen, editors, Artificial Intelligence and Statistics. Society for Artificial Intelligence and Statistics, 2007.
[26] H.S.M. Coxeter. Introduction to Geometry. Wiley, second edition, 1989.
[27] James W. Demmel. Applied Numerical Linear Algebra. SIAM Publications, first edition, 1997.
[28] Jean Dieudonné. Algèbre Linéaire et Géométrie Élémentaire. Hermann, second edition,
1965.
[29] Jacques Dixmier. General Topology. UTM. Springer Verlag, first edition, 1984.
[30] Manfredo P. do Carmo. Differential Geometry of Curves and Surfaces. Prentice Hall,
1976.
[31] Manfredo P. do Carmo. Riemannian Geometry. Birkhäuser, second edition, 1992.
[32] David S. Dummit and Richard M. Foote. Abstract Algebra. Wiley, second edition,
1999.
[33] Gerald A. Edgar. Measure, Topology, and Fractal Geometry. Undergraduate Texts in
Mathematics. Springer Verlag, first edition, 1992.
[34] Herbert B. Enderton. Elements of Set Theory. Academic Press, 1997.
[35] Charles L. Epstein. Introduction to the Mathematics of Medical Imaging. SIAM, second
edition, 2007.
[36] Gerald Farin. Curves and Surfaces for CAGD. Academic Press, fourth edition, 1998.
[37] Olivier Faugeras. Three-Dimensional Computer Vision, A Geometric Viewpoint. The
MIT Press, first edition, 1996.
[38] James Foley, Andries van Dam, Steven Feiner, and John Hughes. Computer Graphics.
Principles and Practice. Addison-Wesley, second edition, 1993.
[39] David A. Forsyth and Jean Ponce. Computer Vision: A Modern Approach. Prentice
Hall, first edition, 2002.
[40] Jean Fresnel. Méthodes Modernes en Géométrie. Hermann, first edition, 1998.
[41] William Fulton. Algebraic Topology, A first course. GTM No. 153. Springer Verlag,
first edition, 1995.
[42] William Fulton and Joe Harris. Representation Theory, A first course. GTM No. 129.
Springer Verlag, first edition, 1991.
[43] Jean H. Gallier. Curves and Surfaces In Geometric Modeling: Theory And Algorithms.
Morgan Kaufmann, 1999.
[44] Jean H. Gallier. Geometric Methods and Applications, For Computer Science and
Engineering. TAM, Vol. 38. Springer, second edition, 2011.
[45] Walter Gander, Gene H. Golub, and Urs von Matt. A constrained eigenvalue problem.
Linear Algebra and its Applications, 114/115:815-839, 1989.
[46] F.R. Gantmacher. The Theory of Matrices, Vol. I. AMS Chelsea, first edition, 1977.
[47] Roger Godement. Cours d'Algèbre. Hermann, first edition, 1963.
[48] Gene H. Golub. Some modified eigenvalue problems. SIAM Review, 15(2):318-334,
1973.
[49] Gene H. Golub and Charles F. Van Loan. Matrix Computations. The Johns Hopkins
University Press, third edition, 1996.
[50] A. Gray. Modern Differential Geometry of Curves and Surfaces. CRC Press, second
edition, 1997.
[51] Donald T. Greenwood. Principles of Dynamics. Prentice Hall, second edition, 1988.
[52] Larry C. Grove. Classical Groups and Geometric Algebra. Graduate Studies in Mathematics, Vol. 39. AMS, first edition, 2002.
[53] Jacques Hadamard. Leçons de Géométrie Élémentaire. I Géométrie Plane. Armand
Colin, thirteenth edition, 1947.
[54] Jacques Hadamard. Leçons de Géométrie Élémentaire. II Géométrie dans l'Espace.
Armand Colin, eighth edition, 1949.
[55] Trevor Hastie, Robert Tibshirani, and Jerome Friedman. The Elements of Statistical
Learning: Data Mining, Inference, and Prediction. Springer, second edition, 2009.
[56] D. Hilbert and S. Cohn-Vossen. Geometry and the Imagination. Chelsea Publishing
Co., 1952.
[57] Roger A. Horn and Charles R. Johnson. Matrix Analysis. Cambridge University Press,
first edition, 1990.
[58] Roger A. Horn and Charles R. Johnson. Topics in Matrix Analysis. Cambridge University Press, first edition, 1994.
[59] Nathan Jacobson. Basic Algebra I. Freeman, second edition, 1985.
[60] Ramesh Jain, Rangachar Kasturi, and Brian G. Schunck. Machine Vision. McGraw-Hill, first edition, 1995.
[61] Jürgen Jost. Riemannian Geometry and Geometric Analysis. Universitext. Springer
Verlag, fourth edition, 2005.
[62] Kenneth Hoffman and Ray Kunze. Linear Algebra. Prentice Hall, second edition, 1971.
[63] D. Kincaid and W. Cheney. Numerical Analysis. Brooks/Cole Publishing, second
edition, 1996.
[64] Anthony W. Knapp. Lie Groups Beyond an Introduction. Progress in Mathematics,
Vol. 140. Birkhäuser, second edition, 2002.
[65] Erwin Kreyszig. Differential Geometry. Dover, first edition, 1991.
[66] K. Kuratowski and A. Mostowski. Set Theory. Studies in Logic, Vol. 86. Elsevier,
1976.
[67] Serge Lang. Algebra. Addison Wesley, third edition, 1993.
[68] Serge Lang. Differential and Riemannian Manifolds. GTM No. 160. Springer Verlag,
third edition, 1995.
[69] Serge Lang. Real and Functional Analysis. GTM 142. Springer Verlag, third edition,
1996.
[70] Serge Lang. Undergraduate Analysis. UTM. Springer Verlag, second edition, 1997.
[71] Peter Lax. Linear Algebra and Its Applications. Wiley, second edition, 2007.
[72] N. N. Lebedev. Special Functions and Their Applications. Dover, first edition, 1972.
[73] Saunders Mac Lane and Garrett Birkhoff. Algebra. Macmillan, first edition, 1967.
[74] Ib Madsen and Jørgen Tornehave. From Calculus to Cohomology. De Rham Cohomology and Characteristic Classes. Cambridge University Press, first edition, 1998.
[75] M.-P. Malliavin. Algèbre Commutative. Applications en Géométrie et Théorie des
Nombres. Masson, first edition, 1985.
[76] Jerrold E. Marsden and Thomas J.R. Hughes. Mathematical Foundations of Elasticity.
Dover, first edition, 1994.
[77] William S. Massey. Algebraic Topology: An Introduction. GTM No. 56. Springer
Verlag, second edition, 1987.
[78] William S. Massey. A Basic Course in Algebraic Topology. GTM No. 127. Springer
Verlag, first edition, 1991.
[79] Dimitris N. Metaxas. Physics-Based Deformable Models. Kluwer Academic Publishers,
first edition, 1997.
[80] Carl D. Meyer. Matrix Analysis and Applied Linear Algebra. SIAM, first edition, 2000.
[81] John W. Milnor. Topology from the Differentiable Viewpoint. The University Press of
Virginia, second edition, 1969.
[82] John W. Milnor and James D. Stasheff. Characteristic Classes. Annals of Math. Series,
No. 76. Princeton University Press, first edition, 1974.
[83] Shigeyuki Morita. Geometry of Differential Forms. Translations of Mathematical
Monographs No 201. AMS, first edition, 2001.
[84] James R. Munkres. Topology, a First Course. Prentice Hall, first edition, 1975.
[85] James R. Munkres. Analysis on Manifolds. Addison Wesley, 1991.
[86] Ivan Niven, Herbert S. Zuckerman, and Hugh L. Montgomery. An Introduction to the
Theory of Numbers. Wiley, fifth edition, 1991.