Determinants
(u × v) • w = a ‖u × v‖,
and the figures provide a visual explanation for the fact that (u × v) • [w + λu + µv] =
(u × v) • w. Note that Cavalieri’s principle provides a visual proof for additivity!
The behaviour of this oriented volume with respect to permutation of the arguments is
well-known: its sign changes for transpositions, i.e. whenever we swap two vectors, e.g.
swapping u, w we get (u × v) • w = −(w × v) • u, etc.. In fact, should we wish to consider
u, w to form the base of the parallelopiped, we write
(u × v) • w = (w × u) • v = −(u × w) • v,
and proceed accordingly.
Inspired by this example, one may list a set of axioms that a function called ‘signed
volume’ or ‘oriented volume’ should satisfy, and then come up with the determinant.
1.2 Alternate n-linear forms on K^n
We have working knowledge of 2 × 2 and 3 × 3 determinants. We shall guide ourselves by
this geometrical knowledge to construct an n-dimensional generalisation, which shall be
the determinant of a square matrix of order n. First we shall list the essential properties
of the determinant.
Let V : R^n × · · · × R^n → R (n factors) be an ‘oriented volume function’. From our 2- and 3-dimensional experience, we distilled the following rules.
DET0) (normalization) V (e1 , e2 , . . . , en ) = 1;
DET1) (multilinearity) the function V is n-linear, i.e. linear on each variable if we fix the arguments in the others. To wit,
V (λu + µu′, u2 , . . . , un ) = λV (u, u2 , . . . , un ) + µV (u′, u2 , . . . , un ),
and in general
V (u1 , . . . , λu + µu′, . . . , un ) = λV (u1 , . . . , u, . . . , un ) + µV (u1 , . . . , u′, . . . , un );
DET2) (alternate) the function V is alternate, i.e. if we swap two arguments, the value of V changes precisely in a sign:
V (u1 , . . . , ui , . . . , uj , . . . , un ) = −V (u1 , . . . , uj , . . . , ui , . . . , un );
in particular, V vanishes whenever two of its arguments coincide:
V (u1 , . . . , v, . . . , v, . . . , un ) = 0.
Let us retrieve the determinant in the 2 × 2 and 3 × 3 cases. Note that manipulations effected
with the canonical basis work for any basis (though this comment shall be made clear in
the 2nd Semester of the course).
Example 1.2.1 If n = 2, then V (ae1 +be2 , ce1 +de2 ) = aV (e1 , ce1 +de2 )+bV (e2 , ce1 +de2 ).
After developing the expression, it equals (ad − bc)V (e1 , e2 ) = ad − bc.
For n = 3, writing A, B, C for the three columns, one expands
V (A, B, C) = V (a1 e1 + a2 e2 + a3 e3 , b1 e1 + b2 e2 + b3 e3 , c1 e1 + c2 e2 + c3 e3 ).
1.3 Binet’s formula
Theorem 1.3.1 det AB = det A det B.
IMPORTANT: What Theorem 1.3.1 proves is that the determinant of a square matrix A, det A, is a scaling factor between the volume of an n-parallelopiped generated by the vectors u1 , . . . , un and that of its image by A, i.e. the one determined by Au1 , . . . , Aun .
The sign of det A shows whether the orientation of Aui is the same as, or opposite to,
that of the ui .
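A quick numerical illustration of Binet's formula (a sketch, not part of the development above; the matrices are random and only serve to check det AB = det A det B up to rounding):

```python
# Numerical spot-check of Binet's formula det(AB) = det(A) det(B).
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))
B = rng.standard_normal((3, 3))

lhs = np.linalg.det(A @ B)
rhs = np.linalg.det(A) * np.linalg.det(B)
print(lhs, rhs)                      # agree up to floating-point rounding
assert np.isclose(lhs, rhs)
```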
Theorem 1.3.2 Let A be an n × n matrix. Then: det A = 0 if and only if its columns
A1 , · · · , An are linearly dependent.
Proof: If the columns are linearly dependent, then one of them is a linear combination of the others, and expanding that column gives det A = 0, by linearity and alternateness. Now, assume that the columns of A are linearly independent: since the Ai form a basis of K^n , each canonical basis vector can be written as ei = Σ_k b_{ki} Ak . This means that I = AB, where B = (b_{ki} ), and by Binet’s Formula det A det B = det I = 1, hence det A ≠ 0.
In order to have an explicit formula for the determinant, we use matrix notation: Aj = Σ_i a_{ij} ei . By DET1, we have:
V (A1 , . . . , An ) = V ( Σ_{k1=1}^{n} a_{k1 1} e_{k1} , · · · , Σ_{kn=1}^{n} a_{kn n} e_{kn} ) = Σ_{k1 ,...,kn} a_{k1 1} · · · a_{kn n} V (e_{k1} , . . . , e_{kn} ).
Again, by DET2, only the terms where k1 , . . . , kn are distinct shall survive, i.e. {k1 , . . . , kn } =
{1, . . . , n}. Here we use the notation for permutations: a permutation of n elements is
a bijection σ : {1, . . . , n} → {1, . . . , n}. The set (group) of permutations of {1, · · · , n} is
denoted by Sn , and is called the symmetric group of n elements.
Back to the expression, we have
V (A1 , . . . , An ) = Σ_{σ∈Sn} a_{σ(1)1} · · · a_{σ(n)n} V (e_{σ(1)} , . . . , e_{σ(n)} ) = (?)
Definition A permutation τ ∈ Sn is called a transposition (between i and j, with i ≠ j) if τ (i) = j, τ (j) = i and τ (k) = k for every k ≠ i, j. In other words, a transposition is a swap between two elements i, j. One usually writes τ = (i j). (Note that τ^{−1} = τ for every transposition τ .)
Example 1.3.4 Let us evaluate V (e2 , e3 , e4 , e5 , e1 ) by moving each vector to its rightful place through successive swaps. Swapping e1 , e5 gives
V (e2 , e3 , e4 , e5 , e1 ) = −V (e2 , e3 , e4 , e1 , e5 ).
Now e4 goes to its rightful place, and for that we swap e1 , e4 :
V (e2 , e3 , e4 , e1 , e5 ) = −V (e2 , e3 , e1 , e4 , e5 ).
We do the same with e3 and e2 , which entails two more sign changes, and thus we get
V (e2 , e3 , e4 , e5 , e1 ) = V (e1 , e2 , e3 , e4 , e5 ).
Note that, if σ = τ1 · · · τr , where the τi are transpositions (so τi = τi^{−1} ), then using (ab)^{−1} = b^{−1} a^{−1} yields
σ^{−1} = (τ1 · · · τr )^{−1} = τr^{−1} · · · τ1^{−1} = τr · · · τ1 .
(The process we showed in Example 1.3.4 in fact provides a decomposition of the inverse
σ −1 , but we shall not dwell on this, and instead refer to the first Chapter on groups.)
CLAIM: (proven in the Appendix at the end of this Chapter) The function sgn is well
defined, and multiplicative, i.e.: sgn(ση) = sgn(σ)sgn(η) for σ, η ∈ Sn .
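The claim can be checked experimentally. A sketch in Python, assuming the standard definition of sgn as the parity of the number of inversions (which agrees with the parity of any decomposition into transpositions); it verifies multiplicativity exhaustively over S4:

```python
# sgn(σ) from the parity of the number of inversions, plus a check that
# sgn(σ∘η) = sgn(σ) sgn(η) for all pairs in S_4.
from itertools import permutations, combinations

def sgn(sigma):
    # sigma is a tuple (σ(0), ..., σ(n-1)) on 0-based values
    inversions = sum(1 for i, j in combinations(range(len(sigma)), 2)
                     if sigma[i] > sigma[j])
    return -1 if inversions % 2 else 1

n = 4
for sigma in permutations(range(n)):
    for eta in permutations(range(n)):
        composed = tuple(sigma[eta[k]] for k in range(n))   # σ∘η
        assert sgn(composed) == sgn(sigma) * sgn(eta)
print("sgn is multiplicative on S_4")
```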
(1.1)   det A = Σ_{σ∈Sn} sgn(σ) a_{σ(1)1} · · · a_{σ(n)n} ,
where sgn is the sign made explicit in Corollary 1.3.6. Another important result follows:
Proposition 1.3.7 det A = det AT .
Proof: Clearly, a_{σ(k)k} = (A^T)_{k,σ(k)} . Also, when we have the graph of a bijection, the graph of its inverse is obtained by transposing the horizontal and vertical axes (i.e. reflection through the diagonal), in the case of real functions of a real variable. This also holds for permutations of {1, 2, · · · , n} (A PICTURE IS LACKING!).
Therefore, for every σ ∈ Sn , it follows from the former paragraph that
Π_{k=1}^{n} a_{σ(k)k} = Π_{ℓ=1}^{n} a_{ℓ σ^{−1}(ℓ)} ,
just replacing k by ℓ = σ(k). On the other hand, sgn(σ) = sgn(σ^{−1} ), and therefore det A = det A^T .
det(e1 , A2 , · · · , An ) =
| a22 · · · a2n |
| . . . . . . . |
| an2 · · · ann | ,
the determinant of the (n − 1) × (n − 1) submatrix obtained by erasing the first row and first column.
For instance, one may argue that σ(1) = 1 (hence aσ(1)1 = 1) for every permutation with
a nonzero product aσ(1)1 · · · aσ(n)n .
Likewise one may proceed with a general A1 :
det A = det(A1 , A2 , · · · , An ) = Σ_{i=1}^{n} a_{i1} det(ei , A2 , · · · , An ),
In all, this argument and its column analogue yield the following result.
Theorem 1.4.1 (Laplace’s rule, poor version) Let A be a square matrix of order n.
1. (Developing by a column) Fix j (the j-th column of A). Then:
det A = Σ_{k=1}^{n} (−1)^{k+j} a_{kj} Â_{kj} ;
2. (Developing by a row) Fix i (the i-th row of A). Then: det A = Σ_{k=1}^{n} (−1)^{i+k} a_{ik} Â_{ik} ,
where Â_{ab} is the determinant of the submatrix resulting from erasing the a-th row and the b-th column.
For index sets I = (i1 , . . . , ir ) and J = (j1 , . . . , jr ), write B_J^I for the r × r submatrix
B_J^I =
( b_{i1 j1} · · · b_{i1 jr} )
( . . . . . . . . . . . . )
( b_{ir j1} · · · b_{ir jr} ).
Laplace’s rule has a full-fledged version, which we shall refrain from stating or proving
here.
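A recursive sketch of the cofactor expansion of part 2, expanding along the first row (illustration only; the base case is a 1 × 1 determinant):

```python
# Laplace expansion along the first row: det A = Σ_k (-1)^(1+k) a_{1k} Â_{1k}.
def minor(A, i, j):
    # submatrix of A with the i-th row and j-th column erased (0-based indices)
    return [row[:j] + row[j+1:] for r, row in enumerate(A) if r != i]

def det(A):
    n = len(A)
    if n == 1:
        return A[0][0]
    return sum((-1) ** k * A[0][k] * det(minor(A, 0, k)) for k in range(n))

print(det([[1, 2], [3, 4]]))                     # -2
print(det([[2, 1, 3], [0, 2, 1], [0, 0, 2]]))    # 8 (upper triangular: product of diagonal)
```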
3. Fix i (the i-th row of A). Then: det A = Σ_{k=1}^{n} (−1)^{i+k} a_{ik} Â_{ik} ;
4. Fix i (the i-th row of A), and let j ≠ i. Then: Σ_{k=1}^{n} (−1)^{i+k} a_{jk} Â_{ik} = 0.
Proof: Parts 1 and 3 are in Theorem 1.4.1. Part 4 is Part 2 applied to AT . Part 2
follows from considering the determinant of A, deleting the j-th column and writing the
k-th column instead. Thus, clearly
Proof: Firstly, the condition that x ∈ F is equivalent to rk (A1 · · · Ar x) < r + 1, which
is to say that every minor of order r + 1 is zero.
Let us show that it suffices to test I′ = I ∪ {r + 1}. If (?) holds, then the bordered minor with rows I′ and columns [1, r] ∪ {j} vanishes, and hence xj is a linear combination of xi1 , · · · , xir . This follows from developing the determinant along its rightmost column. This determines all the variables xj , j > r.
Example 1.4.7 Consider the vector subspace F = h(1, 2, 1, 4)^T , (2, 1, 1, −1)^T i ⊂ R^4 . Let us write a complete set of cartesian equations for F .
Consider a generic vector of unknowns (x, y, z, t). First of all, we spot in the matrix
A =
( 1  2 )
( 2  1 )
( 1  1 )
( 4 −1 )
the 2 × 2 minor of the first two rows and columns, which is nonzero. The augmented matrix is now
A′ =
( 1  2  x )
( 2  1  y )
( 1  1  z )
( 4 −1  t ),
and fixing the first two rows yields two equations:
| 1  2  x |
| 2  1  y | = 0
| 1  1  z |
and
| 1  2  x |
| 2  1  y | = 0.
| 4 −1  t |
If the reader should choose another two rows, the results would be the same, that is, up to linear combinations of the two equations, of course.
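The two bordered minors of Example 1.4.7 can be expanded symbolically; a SymPy sketch (the expanded forms in the comments are what the computation should return, up to sign):

```python
# Symbolic check of Example 1.4.7: the two bordered 3x3 minors give
# Cartesian equations for F.
import sympy as sp

x, y, z, t = sp.symbols('x y z t')
eq1 = sp.Matrix([[1, 2, x], [2, 1, y], [1, 1, z]]).det()
eq2 = sp.Matrix([[1, 2, x], [2, 1, y], [4, -1, t]]).det()
print(sp.expand(eq1))   # expected: x + y - 3*z
print(sp.expand(eq2))   # expected: -6*x + 9*y - 3*t
# Both vanish on the generators (1, 2, 1, 4) and (2, 1, 1, -1).
```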
Example 1.4.8 (An oldie follows from Theorem 1.4.5) Recall the case of u, v ∈
Rn , of coordinates ui , vj respectively. The rank of the matrix (u v) is then the highest
order of a nonzero minor: if one of the vectors is nonzero, then it is at least 1 (a nonzero
coordinate is a nonzero 1 × 1 minor!). The rank is precisely 2 if and only if there is a
nonzero minor ui vj − uj vi . We knew this, but now Theorem 1.4.5 vastly generalises this
old result.
1.4.3 Cramer’s rule. Inverses via minors
Theorem 1.4.9 (Cramer’s rule) Let A be an invertible n × n matrix. Let b ∈ K n .
There is a unique solution to the linear system Ax = b, of unknowns x = (x1 x2 · · · xn )T .
The value of each xi is
xi = det(A1 , · · · , Ai−1 , b, Ai+1 , · · · , An ) / det A .
Proof: Clearly, Ax = b ⇔ x = A^{−1} b, so x = (1/ det A)(adj A) b, and reverse application of Laplace’s rule settles the Theorem.
Remark An alternative proof of 1.4.9 goes as follows. The solution exists, for the columns A1 , · · · , An of A form a basis of R^n and b = Σ_i Ai xi has a unique set of coordinates. Now, write the determinant
det(A1 , · · · , Ai−1 , b, Ai+1 , · · · , An ) = Σ_{k=1}^{n} det(A1 , · · · , Ai−1 , Ak xk , Ai+1 , · · · , An ) = (?).
Expanding on the i-th column yields (?) = det(A1 , · · · , Ai−1 , Ai xi , Ai+1 , · · · , An ) + 0, since the terms with Ak in the i-th position for k ≠ i vanish by DET2. It follows that (?) = xi det A, which is the claimed formula.
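A small NumPy sketch of Cramer's rule, replacing the i-th column by b and comparing with the direct solution (illustration only; the matrix is a hypothetical example, and for large systems one would never solve this way):

```python
# Cramer's rule: x_i = det(A_1,...,A_{i-1}, b, A_{i+1},...,A_n) / det(A).
import numpy as np

A = np.array([[2.0, 1.0, 3.0],
              [0.0, 2.0, 1.0],
              [1.0, 1.0, 1.0]])   # det A = -3, so A is invertible
b = np.array([1.0, 2.0, 3.0])

x = np.empty(3)
for i in range(3):
    Ai = A.copy()
    Ai[:, i] = b                  # replace the i-th column by b
    x[i] = np.linalg.det(Ai) / np.linalg.det(A)

assert np.allclose(A @ x, b)
print(x)
```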
Let f1 , · · · , fn ∈ C n−1 (I), where I ⊂ R is an interval. Assume that the fi are linearly
dependent. This means that one has α1 , · · · , αn real constants, not all zero such that
Σ_{i=1}^{n} αi fi (x) = 0, ∀x ∈ I.
Much more information is contained here than a mere equation. In fact, if we differentiate
up to n − 1 times the above identity, we get a homogeneous system of n equations and n
unknowns with a nontrivial solution for every x ∈ I:
( f1 (x)        f2 (x)        · · ·  fn (x)       ) ( α1 )   ( 0 )
( f1′(x)        f2′(x)        · · ·  fn′(x)       ) ( α2 )   ( 0 )
( . . . . . . . . . . . . . . . . . . . . . . . . ) ( ..  ) = ( .. )
( f1^(n−1)(x)   f2^(n−1)(x)   · · ·  fn^(n−1)(x)  ) ( αn )   ( 0 )
Definition The Wronskian, or Wronski’s determinant, of n functions fi ∈ C n−1 (I) is
the determinant function
W (f1 , · · · , fn )(x) =
| f1 (x)         f2 (x)         · · ·  fn (x)        |
| f1′(x)         f2′(x)         · · ·  fn′(x)        |
| . . . . . . . . . . . . . . . . . . . . . . . . .  |
| f1^(n−1)(x)    f2^(n−1)(x)    · · ·  fn^(n−1)(x)   | .
Theorem 1.5.1 Let f1 , · · · , fn ∈ C n−1 (I). If the fi are linearly dependent, then the
Wronskian W (f1 , · · · , fn ) is identically zero on I.
Example 1.5.2 The Theorem allows us to prove that the functions 1, sin x, cos 2x, sin 2x
are linearly independent. We leave the reader to check that their Wronskian is not iden-
tically zero.
Example 1.5.3 Let u1 (x) = x3 , u2 (x) = x2 |x|. The Wronskian W (u1 , u2 ) is identically
zero, although these functions are linearly independent. Thus, the converse holds under
only certain, albeit quite general, hypotheses.
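Example 1.5.3 can be checked symbolically by working branch by branch (on x ≥ 0 one has x²|x| = x³, on x ≤ 0 one has x²|x| = −x³); a SymPy sketch:

```python
# Wronskian of u1 = x^3 and u2 = x^2*|x|, computed on each branch separately.
import sympy as sp

x = sp.symbols('x', real=True)
u1 = x**3

for u2 in (x**3, -x**3):          # u2 on x >= 0 and on x <= 0, respectively
    W = u1 * sp.diff(u2, x) - u2 * sp.diff(u1, x)
    print(sp.simplify(W))         # 0 on both branches, so W(u1, u2) ≡ 0 on R
```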
Remark The following Theorem carries a principle of real analytic functions in its proof,
which we shall not state explicitly right now.
Proof: Assume that the functions u1 , · · · , un are linearly independent. We shall prove by induction on n ∈ N that the Wronskian is not identically zero, the case n = 1 being
trivial. Let a ∈ I, and let r = min{orda ui , i ≤ n}, where orda f denotes the order of
a as a zero of f (i.e. 0 if f (a) 6= 0, and r = orda f is the minimum number such that
f (r) (a) 6= 0).
If F ⊂ C ∞ (I) is a vector subspace of dimension n, the Wronskian is independent of the
basis of F chosen, up to a nonzero constant. Thus, we shall assume that
(if the ui were LD, at least one would come across as identically 0, i.e. one of the
orders would be ∞). This may be viewed as Gaussian elimination on the coefficients
on the expansion centred around a. Up to a nonzero constant, we may assume that
ui (x) = (x − a)ri gi (x), where gi (a) = 1 and gi ∈ C ω (I).
Reduce to an interval J such that a ∈ J ⊂ I, and g1 (x) is invertible in J. Applying
Proposition 1.5.4 to fi |J yields
Now, clearly, if f1 , · · · , fn are linearly independent, so are 1, f2 /f1 , · · · , fn /f1 (restricted to J), and this is in turn equivalent to the derivatives ϕ′2 , · · · , ϕ′n being linearly independent, where ϕi = fi /f1 (defined over J). By induction, the Wronskian of ϕ′2 , · · · , ϕ′n is not
identically zero (over J), and so neither is W (f1 , · · · , fn ), as was to be demonstrated.
1.6 Problems
The following was part of a test in 2018.
P (x) =
| a1^2 + x   a1 a2      a1 a3     . . .   a1 an     |
| a2 a1      a2^2 + x   a2 a3     . . .   a2 an     |
| . . . . . . . . . . . . . . . . . . . . . . . . . |
| an a1      an a2      an a3     . . .   an^2 + x  | .
The result below is the very foundation of L’Hôpital’s rule, for its proof stems from
this little gem. A determinantal guise is most recommended, both for the proof and for
memory purposes.
1.6.4 Given A ∈ Mn (Z) a square matrix with integer coefficients, give necessary and
sufficient conditions for A to have an inverse with integer coefficients.
1.6.5 Let t be an indeterminate. Consider the monomials tr1 , · · · , trn where ri ∈ N. Let
M be the matrix whose first row is tr1 , · · · , trn , with the (i+1)-th row being the derivative
of the i-th row. Prove that det M = CtN , where C is a real constant and N is natural,
and find C, N explicitly.
| 1   x0   x0^2   · · ·   x0^n   y0    |
| 1   x1   x1^2   · · ·   x1^n   y1    |
| . . . . . . . . . . . . . . . . . .  | = 0.
| 1   xn   xn^2   · · ·   xn^n   yn    |
| 1   x    x^2    · · ·   x^n    p(x)  |
Proof:
Suppose that the indices i, j, a, b are distinct. Then: σ(i) = i < σ(j) = j.
If a < i < b (and j = b), a, i become b, i and there are b − a − 1 inversions. On the
other hand, j, b become j, a, which counts for b − a − 1 inversions.
If i = a < b = j, the pair i, j becomes j, i
Theorem 1.6.8 Let σi be permutations. Then: (−1)N (σ1 σ2 ) = (−1)N (σ1 ) (−1)N (σ2 ) . There-
fore, sgn(σ) = (−1)N (σ) is well defined, i.e. independent of the factorization of σ into
transpositions.
Chapter 2
Endomorphisms
2.1 Conventions
Let E be a finite-dimensional vector space over a field K (mostly R or C), and let T : E →
E be an endomorphism (over K), i.e. a K-linear map where both domain and codomain
are the same (i.e. E). There are several considerations to make.
Firstly, if e = (ei ) is a basis of E and u ∈ E, we denote the vector of coordinates of u in
the basis e by ue , namely
ue = (α1 α2 · · · αn )^T , where u = Σ_i αi ei .
Choice of bases/basis: In order to understand the changes in the geometry effected by
T , one expects to use the same basis both at source and target. Thus, if e = (ei ) is a
basis of E, we consider the matrix Me (T ) = [T ]ee .
2.1.1 Let u = (a, b, c) ∈ R3 . Consider the endomorphism T (x) = u × x. Find the matrix
of T in the canonical basis.
Once we fix E, the vector space HomK (E, E) = EndK (E) of K-endomorphisms of E
appears. In this vector space, a product structure emerges in EndK (E), i.e. the com-
position of endomorphisms: if S, T ∈ EndK (E), then S ◦ T ∈ EndK (E). This product
satisfies both distributive laws, I = IE is its unit element, and it is K-bilinear, namely
λ(S ◦ T ) = (λS) ◦ T = S ◦ (λT ), ∀λ ∈ K.
Example 2.1.2 (Powers of an endomorphism) Let E, T be as above, and let e =
(e1 , . . . , en ) be a basis of E. If S, T ∈ EndK (E) have respective associated matrices A, B
in the basis e, then S ◦ T has associated matrix AB in the same basis.
In particular, the powers S n have associated matrix An .
The convention we adopt is T 0 = I.
Base change: Let f : E → E be an endomorphism of E, dimK E < ∞, and let e = (ei ), u = (ui ) be two bases of E. The associated matrices are then related by Mu (f ) = P^{−1} Me (f ) P , where P is the matrix whose columns are the coordinates of the vectors ui in the basis e.
2.2 Warm-up problems
These problems shall be relevant later, some are repeated in other sections.
2.2.1 (Projectors/Idempotents) Let E be a vector space, and let p : E → E be an
endomorphism. We say that p is a projector, or that p is idempotent, if p2 = p. Prove
the following claims:
(i) If p is a projector, then so is I − p;
(ii) ker p = Im(I − p), and ker(I − p) = Im p.
(iii) E = ker p ⊕ Im p.
2.2.2 Consider C as a real vector space. Take the basis {1, i} of C over R. Given
α = a+bi, write down the matrix of the endomorphism mα : C → C given by multiplication
by α, namely, mα (z) = αz viewed as a real endomorphism of C in the basis given {1, i}.
2.2.3 Let T : E → E be an endomorphism, with dim E < ∞. Prove that the following
are equivalent:
(i) ker T = ker T 2 ;
(ii) E = ker T ⊕ Im T.
2.2.4 Let A be a square matrix of order n ≥ 1. Assuming that A3 − A − 2I = 0, prove
that A is invertible.
2.2.5 (It will be essential afterwards) Let N be a nilpotent matrix of order n, N ∈ Mn (C) (this means that N^m = 0 for some m ∈ N). Prove that, if we denote Fj = Im N^j and Fk ≠ 0, then Fk+1 ⊊ Fk . Conclude that N^n = 0.
2.3 Introduction
Given an endomorphism f , say, of the plane R2 , a way to study it is to determine its
invariants. The fixed directions offer important clues as to its nature.
Take for instance the matrix A = ( 2 0 ; 0 3 ). The transformation of the plane caused by A leaves exactly two lines through the origin stable; to wit, the coordinate axes. No other line through the origin is stable by A.
Note that a line through the origin is a one-dimensional subspace, L = hvi, and that a
one-dimensional hvi ⊂ E is stable by an endomorphism f if and only if f (v) = λv for
some λ ∈ K. Such vectors v 6= 0 are called characteristic vectors, or eigenvectors,
and the factor λ is called characteristic value, or eigenvalue. The German eigen-
(English transl. own) means also ‘peculiar to’.
Example 2.3.1 Let A ∈ Mn (C) be a square matrix of order n. Suppose that the nonzero
vector 0 6= u ∈ Cn is an eigenvector of A, of eigenvalue λ. Is u an eigenvector of Am ,
where m ∈ N? If so, what is its eigenvalue with respect to Am ?
2.4 The characteristic polynomial. Eigenspaces
Back to the finite-dimensional case, let us find out the fixed directions (equivalently, the
eigenvectors) and eigenvalues of a given endomorphism T : E → E. We shall assume that
we have chosen a basis e = (ei ) of E, and denote A = Me (T ).
The equation T (u) = λu has a nonzero solution u ∈ E − {0} if and only if ker(T − λI) ≠ {0}, i.e. T − λI is not injective, which is tantamount to det(T − λI) = 0. Fixing the basis e, we need to solve the equation
det(A − λI) = 0.
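Numerically, one does not expand the determinant: eigenvalues and eigenvectors are computed directly. A NumPy sketch on a hypothetical 2 × 2 matrix:

```python
# Solving det(A - λI) = 0 numerically: np.linalg.eig returns the eigenvalues
# and a matrix whose columns are corresponding eigenvectors.
import numpy as np

A = np.array([[4.0, 1.0],
              [2.0, 3.0]])
eigvals, eigvecs = np.linalg.eig(A)
print(eigvals)                           # 5 and 2 for this example

for lam, v in zip(eigvals, eigvecs.T):
    assert np.allclose(A @ v, lam * v)   # A v = λ v
```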
Proof: Indeed, det(A − xI) = det(P BP −1 − xI) = det(P (B − xI)P −1 ). The rest follows
from multiplicativity of the determinant.
Remark 2.4.2 Let us point out some specific coefficients of pA (x), where A is a square
matrix.
pA (x) = det(A − xI) =
| a11 − x   a12       . . .   a1n      |
| a21       a22 − x   . . .   a2n      |
| . . . . . . . . . . . . . . . . . .  |
| an1       an2       . . .   ann − x  | .
Note that, of the terms of the determinant, we have the product of all diagonal terms, and the rest of the terms have degree ≤ n − 2 in x (there cannot be a term consisting of precisely n − 1 diagonal terms!). On the other hand, pA (0) = det A, so the characteristic polynomial looks like this:
pA (x) = (−x)^n + (a11 + · · · + ann )(−x)^{n−1} + cn−2 x^{n−2} + · · · + c1 x + det A.
By Remark 2.4, the trace of A equals the trace of P AP −1 , so we may define the trace of
an endomorphism T ∈ EndK (E), tr T , where dim E < ∞.
Thus, the characteristic polynomial of a square matrix A of order n has the form
pA (x) = (−x)^n + (tr A)(−x)^{n−1} + · · · + det A.
2.4.4 Find the characteristic polynomial, eigenvalues and eigenvectors (real and complex) of the following matrices:
A = ( 2 1 ; 0 3 ); B = ( 3 1 ; 1 3 ); C = ( a −b ; b a ) (b ≠ 0); D = ( 3 0 ; 0 3 ); E = ( 1 1 ; −1 3 ).
Definition Let f ∈ EndK (E) be an endomorphism, where dimK E < ∞. We say that
f diagonalizes in a basis u = (ui ) of E if the matrix Mu (f ) = D is diagonal, in other
words, if f (ui ) = λi ui for all 1 ≤ i ≤ n. That is to say, that the basis (ui ) consists
of eigenvectors. If such basis exists but is not explicitly given, we shall say that f is
diagonalizable.
2.5 Eigenspaces as subspaces
Proposition 2.5.1 Let T ∈ End(E) be an endomorphism. Eigenvectors u1 , . . . , ur of
pairwise distinct eigenvalues λi 6= λj are linearly independent.
Proof: Assume that they are linearly dependent. Let 2 ≤ s ≤ r be the minimal number
of nonzero coefficients occurring in a relation of linear dependence among the ui : for
simplicity we assume that these nonzero coefficients accompany u1 , · · · , us . We have:
ai 6= 0 for i ≤ s, and
(2.1) a1 u1 + · · · + as us = 0.
Applying T yields
(2.2) λ1 a1 u1 + . . . + λs as us = 0.
Multiplying (2.1) by λs and subtracting it from (2.2) gives (λ1 − λs )a1 u1 + · · · + (λs−1 − λs )as−1 us−1 = 0, a relation of linear dependence with fewer nonzero coefficients (not all zero, since a1 (λ1 − λs ) ≠ 0), contradicting the minimality of s.
Corollary 2.5.2 There are at most n different eigenvalues (n = dim E) for an endomorphism f ∈ End(E). If there are precisely n distinct eigenvalues, then the corresponding eigenvectors form a basis of E and f has a diagonal matrix in this basis.
Proof: The eigenvalues are the zeros of pf (x), which is of degree n, so there are no more
than n. If there are n distinct eigenvalues, there are n eigenvectors ui forming a basis of
E, and since f (ui ) = λi ui the matrix Mu (f ) is diagonal.
Let us rephrase Proposition 2.5.1.
Theorem 2.5.4 (First diagonalization criterion) Let f ∈ End(E), where dim E = n < ∞. f is diagonalizable (over K) if and only if the eigenspaces span E. In other words,
f is diagonalizable ⇔ ⊕_{i=1}^{r} ker(T − λi I) = E,
where λ1 , . . . , λr are the distinct eigenvalues. Another way to say this is: the sum of the geometric multiplicities of all eigenvalues of f (over K) equals n.
Proof: Firstly, one should note that there exists a basis of eigenvectors for E if and only if Σ_i ker(T − λi I) = E (we are using that F1 + . . . + Fr is the span of F1 ∪ . . . ∪ Fr ). The dimension of each eigenspace, dim Eλi = dim ker(T − λi I), is the geometric multiplicity. We have the equality ⊕_{i=1}^{r} ker(T − λi I) = E if and only if the dimension of this subspace equals that of E, namely
Σ_i dim ker(T − λi I) = dim E.
This means that the sum of the geometric multiplicities of all eigenvalues over K equals the dimension of E.
Corollary 2.5.5 Let f ∈ End(E), where dim E = n. Suppose that f has n pairwise
distinct eigenvalues λi , 1 ≤ i ≤ n, λi 6= λj for i 6= j. Then f diagonalizes.
Proof: Indeed, there are n eigenvectors (up to respective constant) of algebraic and
geometric multiplicity 1: this follows from Proposition 2.5.1:
Now, we have n eigenvectors ui associated with n distinct eigenvalues, which are linearly
independent, hence a basis since n = dim E. Thus, Mu (f ) is diagonal, which fulfils the
claim.
Remark 2.5.6 Clearly, when K = C and A ∈ Mn (C), one has cA (x) = Π_i (λi − x)^{ni} , and also dim ker(T − λi I) ≤ ni , whereas Σ_i ni = dim E from cA (x) (this exhausts all factors x − λi ). Thus, in this case the matrix diagonalises if and only if the algebraic multiplicity of each eigenvalue equals its geometric multiplicity (if not, we would have a strict inequality <, since Σ_i ni = n = dimC E already). Let us write it down:
Σ_i dim ker(f − λi I) ≤ Σ_i ni = n, and equality holds iff dim ker(f − λi I) = ni , ∀i.
The least degree trick: Of all polynomials 0 6= p(x) ∈ K[x] such that p(A) = 0, there
is one which has the least degree. Let us take it to be monic, and let us denote it by
p0 (x). Every polynomial such that p(A) = 0 is a multiple of p0 (x). Note that a nonzero
polynomial of degree ≤ n^2 which annihilates A already exists, by the above argument.
To prove this, consider the polynomial division of p(x) by p0 (x): p(x) = q(x)p0 (x) + r(x).
Since p(A) = 0 and p0 (A) = 0, we have r(A) = 0. Now, r(x) = 0, for otherwise we would
have r(A) = 0 with deg r(x) < deg p0 (x), thus contradicting minimality of deg p0 (x).
In our case, deg p0 (x) ≤ n^2 , as we argued earlier.
Definition Let f, E be as above. The monic polynomial mf (x) of minimal degree among
those which annihilate f is called the minimal polynomial of f . (Monic means: with leading coefficient equal to 1.)
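The least degree trick can be carried out by brute force: stack the vectorised powers I, A, A², . . . and stop at the first linear dependence. A NumPy sketch on a hypothetical 3 × 3 matrix (one Jordan block J2(2) plus the eigenvalue 3, so the minimal polynomial has degree 3):

```python
# Minimal polynomial by the least-degree trick: find the smallest d such that
# I, A, ..., A^d are linearly dependent, then solve for the monic relation.
import numpy as np

A = np.array([[2.0, 1.0, 0.0],
              [0.0, 2.0, 0.0],
              [0.0, 0.0, 3.0]])
n = A.shape[0]

powers = [np.eye(n).flatten()]
while True:
    d = len(powers)
    powers.append(np.linalg.matrix_power(A, d).flatten())
    M = np.column_stack(powers)
    if np.linalg.matrix_rank(M) < M.shape[1]:        # first linear dependence
        # Solve A^d = c_0 I + c_1 A + ... + c_{d-1} A^{d-1} by least squares
        coeffs, *_ = np.linalg.lstsq(M[:, :d], powers[d], rcond=None)
        break

print(d, coeffs)   # d = 3, coeffs ≈ [12, -16, 7]: m_A(x) = x^3 - 7x^2 + 16x - 12
```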
Proposition 2.6.1 E, f as above. Let λ ∈ K be an eigenvalue of f . The factor x − λ divides the minimal polynomial mf (x) of f . Vice versa, if x − a | mf (x), then a ∈ K is an eigenvalue of f .
Proof: Indeed, if u ≠ 0 is an eigenvector of λ, then mf (f ) = 0, so 0 = mf (f )u = mf (λ)u, which means mf (λ) = 0. Conversely, for every linear factor x − a of mf (x), we have det(f − aI) = 0: otherwise f − aI would be invertible, and since mf (x) = (x − a)q(x), from 0 = mf (f ) = (f − aI)q(f ) we would get q(f ) = 0, contradicting the minimality of deg mf (x).
Corollary 2.6.2 Given A ∈ Mn (C), all linear factors of mA (x) come from eigenvalues.
2.6.3 Compute the minimal polynomial of all matrices in Exercise 2.4.4, and of the matrix F = ( 2 1 ; 0 2 ).
Proposition 2.6.4 E, f as above. Let Eλi = ker(f − λi I) be the eigenspace associated
with the eigenvalue λi of f (each distinct eigenvalue is counted only once!). Assume that
f diagonalizes, i.e.
⊕_{i=1}^{r} ker(f − λi I) = E.
The minimal polynomial of f equals mf (x) = Π_{i=1}^{r} (x − λi ).
Proof: Indeed, note that a polynomial in f , p(f ), is zero if and only if ker p(f ) = E, and
since the sum of Eλi equals E in our case, this condition is equivalent to ker(f − λi I) ⊂
ker p(f ), ∀i. Let u ∈ Eλi − {0}. p(f )u = p(λi )u = 0 if and only if p(λi ) = 0, which means
x − λi |p(x). This condition for all i is equivalent to saying that
Π_{i=1}^{r} (x − λi ) | p(x),
so the monic polynomial of minimal degree is mf (x) = Π_{i=1}^{r} (x − λi ).
Theorem 2.6.5 (Diagonalizability criterion) E, f as above. f diagonalizes if and
only if mf (x) is the product of unrepeated linear factors.
Proof: The if part is proven in Proposition 2.6.4. The converse is proven as follows: if λ1 , . . . , λr are all the zeros of mf (x) = Π_{i=1}^{r} (x − λi ), then the polynomial Qi (x) = Π_{j≠i} (x − λj ) satisfies the following: (f − λi )Qi (f ) = 0, i.e. Im Qi (f ) ⊂ ker(f − λi ). By Proposition 2.6.1, we know that the λi form the whole list of eigenvalues of f .
Let u ∈ ker(f − λi I), u ≠ 0. One has Qi (f )u = Qi (λi )u = Π_{j≠i} (λi − λj ) u ≠ 0. Remember the Lagrange interpolation formula applied to λ1 , . . . , λr : let Li (x) = Π_{j≠i} (x − λj )/(λi − λj ). One has Li (λj ) = 0 for i ≠ j, and = 1 for i = j. Li (x) is a constant multiple of Qi (x). The formula
1 = Σ_{i=1}^{r} Li (x)
proves that E = Σ_i Im Li (f ) ⊂ Σ_i ker(f − λi I), which together with Proposition 2.5.3 yields f diagonalizable.
which is upper triangular, as desired.
Corollary 2.7.3 Let A be a square matrix of order n. The trace of A, tr A, is the sum
of all complex eigenvalues of A, counted with (algebraic) multiplicities. Likewise, det A is
the product of all eigenvalues, counted with multiplicities:
tr A = Σ_i λi ,   det A = Π_i λi .
The other coefficients of the characteristic polynomial have similar explanations in terms
of the eigenvalues, as signed elementary symmetric polynomials thereof.
Proof: It suffices to note that cA (x) = Π_i (λi − x) after triangularising A over the complex numbers.
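A quick numerical illustration of Corollary 2.7.3 (random real matrix, complex eigenvalues counted with multiplicity; a sketch only):

```python
# tr A = Σ λ_i and det A = Π λ_i over the complex eigenvalues of A.
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 4))
lams = np.linalg.eigvals(A)          # complex eigenvalues, with multiplicity

assert np.isclose(np.trace(A), lams.sum().real)
assert np.isclose(np.linalg.det(A), np.prod(lams).real)
print(np.trace(A), np.linalg.det(A))
```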
2.8.2 Let A, B ∈ M2 (C) be square matrices of order 2 with tr A ≠ 0. We say that two square matrices X, Y of order n commute if XY = Y X. Prove that B commutes with A if and only if B commutes with A^2 .
2.8.6 Let A be a nilpotent n × n matrix. Prove that An = 0 (all tools from this chapter
are available here).
2.9 Problems
2.9.1 Let A = (aij ) be the n × n real matrix defined by aij = 1 for all i, j = 1, . . . , n.
Prove that A diagonalizes, and find a diagonal matrix D and an invertible matrix P such
that A = P DP −1 .
2.9.2 Given A = ( 5/2 −1 ; 3 −1 ), find A^{1438} and lim_{n→∞} A^n .
2.9.3 Let A ∈ Mn (R). If n is odd, then A has a real eigenvalue. If n is even and det A < 0, then A has at least two real eigenvalues. Can you give an example of a real A with no (real) eigenvalues?
f (e1 ) = . . . = f (en ) = a1 e1 + . . . + an en .
2.9.5 Which of the following matrices are associated with the same endomorphism? (I.e.,
which of them are similar to each other?)
A = ( 1 −1 3 ; 0 2 1 ; 0 0 2 ), B = ( 1 2 3 ; 0 2 0 ; 0 0 2 ), C = ( 1 1 0 ; 0 2 3 ; 0 0 2 ).
For those who are, find the base change making them similar. (You might not be able to
work out each case, in which case you are welcome to try after studying the next chapter.)
2.9.6 Let q(x) ∈ C[x] be any polynomial. Let f ∈ EndC (E) be an endomorphism of a finite-dimensional vector space E over C. Let cf (x) = Π_{i=1}^{r} (ai − x)^{ni} be the characteristic polynomial of f .
Prove that the characteristic polynomial of q(f ) is
Π_{i=1}^{r} (q(ai ) − x)^{ni} .
2.9.7 Let f ∈ EndK (E), and let F ⊂ E be an invariant subspace (this means, f (F ) ⊂
F ). Define f |F to be the following endomorphism of F :
2.9.9 Let A ∈ Mn (C) be such that there exists P ∈ Mn (C) invertible such that A =
P A2 P −1 . Prove that every nonzero eigenvalue of A is a root of unity.
2.9.10 Find all matrices A ∈ M2 (R) such that A^2 = ( 1 1 ; 1 −1 ).
2.9.11 Find all matrices A ∈ M2 (R) such that A^2 = ( 2 1 ; −4 −2 ).
2.9.12 Let A be an n × n real matrix such that A3 = A + I. Prove that det A > 0.
Is A diagonalizable? Justify your answer. (For other purposes, it may be good to compute
at least several powers of A).
2.9.14 (Determinant of a circulant matrix) Let c0 , . . . , cn−1 ∈ C. Compute the following determinant:
| c0     cn−1   cn−2   · · ·   c1 |
| c1     c0     cn−1   · · ·   c2 |
| c2     c1     c0     · · ·   c3 |
| . . . . . . . . . . . . . . . . |
| cn−1   cn−2   cn−3   · · ·   c0 |
(It would be appropriate to use the tools of this chapter, and perhaps more difficult to
solve it otherwise.)
(b) Consider the Fibonacci sequence x0 = 1, x1 = 1, xn+2 = xn+1 + xn . Find the general
term. (Hint: Note the following equation:
( xn+2 ; xn+1 ) = ( 1 1 ; 1 0 ) ( xn+1 ; xn ).)
2.10.2 Determine all complex numbers λ for which there is a positive integer n ∈ N and
a real matrix A ∈ Mn (R) satisfying A2 = AT .
2.10.3 Determine all positive integers n for which there are invertible real matrices A, B
such that AB − BA = B 2 A.
2.10.7 For any integer n ≥ 2, and two invertible n × n matrices A, B with real entries
such that
A−1 + B −1 = (A + B)−1 ,
prove that det A = det B. Is this the case for matrices with complex entries?
(b) Prove that A is nilpotent, i.e. there is a natural number m ∈ N such that Am = 0.
(b) If A is nilpotent, so is TA .
2.10.10 Find all real matrices A ∈ M2 (R) such that A^2 = ( 1 −5 ; 1 −1 ).
The following problem lacks all reference to infinite series, so as not to enter discussions
on convergence and limits at this point.
2.10.11 Let N be a real or complex nilpotent n×n matrix of nilpotence index m, N m = 0.
Define the exponential of N , eN , to be the matrix defined by the Taylor series of the
exponential function:
e^N = Σ_{k≥0} N^k / k!   (the sum is finite!).
1. Let t, t′ ∈ C. Prove that e^{tN} e^{t′N} = e^{(t+t′)N} .
2. Let M, N be nilpotent matrices such that M N = N M . Prove that
eM eN = eM +N .
(I + X)^α = I + αX + (α(α − 1)/2) X^2 + · · · + (α choose k) X^k + · · · .
One may prove the Cayley-Hamilton Theorem in a purely computational fashion. This
version by Peter Lax shortens that proof.
2.10.13 (Simplified computational proof of the Cayley-Hamilton Theorem) Let
A ∈ Mn (K), where K is an infinite field [26, Ch. II, Th. 5].
1. Let Pi , Qj ∈ Mn (K) be matrices. Consider the following polynomials with matrix coefficients, P (x) = Σ_{i=0}^{d} Pi x^i , Q(x) = Σ_{j=0}^{e} Qj x^j . Their product R = P Q is R(x) = Σ_{k=0}^{d+e} Rk x^k , where Rk = Σ_{i+j=k} Pi Qj . If all coefficients Qj satisfy Qj A = AQj , prove that R(A) = P (A)Q(A).
2. Consider the identity
det(xI − A)I = (xI − A)· adj(xI − A),
and write adj(xI −A) = C0 xn−1 +. . .+Cn−1 as a polynomial with matrix coefficients.
Check that Ci A = ACi , and use part 1 to show that cA (A) = 0.
2.10.14 Let A, X ∈ Mn (K) be matrices such that AX = XA. Prove that there is a matrix M with AM = M A and XM = M X such that
cA (X) = M (X − A).
Chapter 3
Endomorphisms (II)
In this chapter we shall study the main aspects of endomorphisms more deeply, and shall
show some important applications. We shall work over arbitrary fields unless otherwise
stated.
3.1.1 Divisibility revisited
Example 3.1.2 a(x) is a multiple of b(x) (equivalently, b(x)|a(x)) if and only if (a(x)) ⊂ (b(x)).
Proposition 3.1.3 Let f (x), g(x) ∈ K[x]. The following subset is an ideal: (f (x)) ∩
(g(x)). Thus, there is a generator for this ideal, unique up to constant multiples, which
we call the least common multiple m(x) of f (x), g(x): (m(x)) = (f (x)) ∩ (g(x)).
Proposition 3.1.4 (Greatest common divisor) Every common divisor d(x) ∈ K[x] of f (x), g(x) satisfies f (x), g(x) ∈ (d(x)). In particular, the following subset of K[x] is contained in (d(x)):
(f (x), g(x)) = {a(x)f (x) + b(x)g(x) : a(x), b(x) ∈ K[x]}.
Its monic generator δ(x) is the greatest common divisor of f (x), g(x).
Proof: The subset (f (x), g(x)) is an ideal of K[x], as both conditions are satisfied. The
fact that δ(x) is a common factor of f (x), g(x) follows from the corresponding inclusions
on ideals (f (x)) ⊂ (f (x), g(x)) = (δ(x)) and (g(x)) ⊂ (δ(x)). Again by the inclusions on
ideals, every common factor d(x) of f (x) and g(x) satisfies that δ(x) ∈ (d(x)).
Corollary 3.1.5 (Bézout’s Identity) Let f (x), g(x) be two polynomials in K[x]. Let d(x) be their highest common factor. One has
d(x) = a(x)f (x) + b(x)g(x) for suitable a(x), b(x) ∈ K[x].
Example 3.1.6 A practical fashion of obtaining the G.C.D. is by way of Euclid’s algo-
rithm, and by proceeding backwards.
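SymPy implements the extended Euclidean algorithm for polynomials; a sketch (the example polynomials are hypothetical):

```python
# Bézout's identity via the extended Euclidean algorithm:
# gcdex returns (a, b, d) with a*f + b*g = d = gcd(f, g).
import sympy as sp

x = sp.symbols('x')
f = x**3 - 1
g = x**2 - 1
a, b, d = sp.gcdex(f, g, x)
print(d)                                 # x - 1
assert sp.simplify(a*f + b*g - d) == 0
```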
Remark 3.1.7 (Partial fraction decomposition) Let f (x), g(x) be coprime polynomials. By Bézout’s identity, we may decompose a rational function of the form p(x)/(f (x)g(x)) into simpler fractions:
p(x)/(f (x)g(x)) = p(x)(a(x)f (x) + b(x)g(x))/(f (x)g(x)) = A(x)/f (x) + B(x)/g(x).
If deg p < deg(f g), then one may pick the individual fractions on the right hand side
and effect polynomial division of each numerator by its corresponding denominator, and
repeat the operation until we get a sum of fractions, the denominators thereof have only
one irreducible factor each. This is the starting point for symbolic integration of a rational
function. To find out more, search e.g. ‘Partial fraction decomposition’ on Wikipedia.
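SymPy's apart performs this decomposition; a one-line sketch on a hypothetical rational function:

```python
# Partial fraction decomposition of a rational function, as in Remark 3.1.7.
import sympy as sp

x = sp.symbols('x')
expr = (3*x + 5) / ((x - 1) * (x + 2))
print(sp.apart(expr, x))     # 1/(3*(x + 2)) + 8/(3*(x - 1))
```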
3.2 Invariant subspaces. The subspaces Eu
Definition Let f ∈ EndK (E). A linear subspace F ⊂ E is invariant by f (f -invariant),
or stable by f , if f (F ) ⊂ F . We define f |F to be the following endomorphism of F : f |F (v) = f (v) for every v ∈ F .
3.2.2 Let f ∈ EndK (E), and let F ⊂ E be an invariant subspace (this means, f (F ) ⊂
F ). Define f |F to be the following endomorphism of F :
3.2.4 Let p(x) ∈ K[x] be a polynomial. The subspaces ker p(f ), Im p(f ) are invariant
by f .
Proposition 3.2.7 Let u ∈ E, u 6= 0. A basis of Eu is formed by u, f (u), · · · , f d−1 (u),
for suitable d ≥ 1.
Proof: Let d be the maximum degree such that u, f (u), · · · , f d−1 (u) are linearly in-
dependent. This means that there is a monic polynomial mu (x) of degree d such that
mu (f )u = f d (u) + cd−1 f d−1 (u) + · · · + c1 f (u) + c0 u = 0, and by Subsection 3.1 one sees
that any polynomial p(x) such that p(f )u = 0 is a multiple of mu (x).
We have proven that d = dim Eu , where d = deg mu (x). Indeed, any linear combination
of elements in S, i.e. elements f i (u), is of the form p(f )u, and p(x) = q(x)mu (x) + r(x)
where deg r < deg mu , so the linearly independent system u, f (u), . . . , f d−1 (u) spans Eu .
3.2.9 Write down the matrix of f |Eu in the special basis u, f (u), · · · , f d−1 (u).
(called companion matrix of p(x)) find the characteristic and minimal polynomials of
A.
Proof: We choose the basis u, f (u), · · · , f^{d−1} (u). Note that mu (f )u = 0, and that mu (x) annihilates the whole of Eu , since mu (f )f^i (u) = f^i mu (f )u = f^i (0) = 0, which shows that mEu (x) | mu (x); since also mu (x) | mEu (x), we get mEu (x) = mu (x).
Proposition 3.3.1 Let P (x), Q(x) ∈ K[x] be nonconstant polynomials. If P (x), Q(x) are coprime and f : E → E is an endomorphism, one has a decomposition
ker P (f )Q(f ) = ker P (f ) ⊕ ker Q(f )
of f -invariant subspaces. (Here, dim E need not be finite.) Also, ker P (f ) = Im Q(f ) and ker Q(f ) = Im P (f ).
Proof: All subspaces involved are f -invariant by construction (REF. LACKING), and
both ker P (f ) and ker Q(f ) are clearly contained within ker P (f )Q(f ) = ker Q(f )P (f ).
Now take E = ker P (f )Q(f ) for ease of notation: we may assume P (f )Q(f ) = 0. Bézout’s identity yields 1 = a(x)P (x) + b(x)Q(x), and in turn we have
I = a(f )P (f ) + b(f )Q(f ).
Clearly, for u ∈ E the vector Q(f )u ∈ ker P (f ), and vice versa, which together with the identity u = b(f )Q(f )u + a(f )P (f )u shows ker P (f ) + ker Q(f ) = E. The same identity shows that ker P (f ) ∩ ker Q(f ) = 0. Finally, if u ∈ ker P (f ), then applying the identity yields u = Q(f )b(f )u + 0 ∈ Im Q(f ), so ker P (f ) ⊂ Im Q(f ), which is the missing inclusion. Making the corresponding changes one proves that ker Q(f ) = Im P (f ).
3.3.2 Let a ∈ K, and let p(x) ∈ K[x] be such that p(a) 6= 0, and let q(x) = x − a. Let
f be such that mf (x) = (x − a)p(x). Write down Bézout’s identity and write down the
projectors for each summand.
Remark 3.3.3 One may apply Proposition 3.3.1 to the case of three factors P, Q, R that
are pairwise coprime. As it stands in the proposition, one may apply it to P Q and R, and
then restrict to ker P (f )Q(f ) and apply Proposition 3.3.1 to this invariant subspace with
P , Q. However, we might want to see the resulting decomposition more explicitly, that is,
this formula
ker P (f )Q(f )R(f ) = ker P (f ) ⊕ ker Q(f ) ⊕ ker R(f ).
Writing Bézout’s identity for P Q and R yields aP Q + bR = 1. Now, since Im R(f ) = ker P (f )Q(f ), it is with b(f )R(f ) that we need to work in order to decompose ker P (f )Q(f ) further. Consider Bézout’s identity for P, Q now: a′P + b′Q = 1; multiplying bR by 1 in the form a′P + b′Q, we obtain the following identity of endomorphisms:
3.3.5 Prove lemma 3.3.4.
Theorem 3.3.6 Let f be such that mf (x) factors as mf (x) = Π_{i=1}^{r} pi (x)^{ei} . The following splitting exists and is unique:
E = ⊕_{i=1}^{r} Ei , where mEi (x) = pi (x)^{ei} .
We have a complete orthogonal system of projectors πi (Σ_i πi = 1, πi πj = 0 ∀i ≠ j, πi^2 = πi ), where the πi are polynomials in f and πi E = Ei .
The subspaces Ei are called primary components of the pair (E, f ).
Proof: Write Pi (x) = pi (x)ei , for 1 ≤ i ≤ r, where pi (x) runs across all irreducible factors
of mf (x) over K. Applying Proposition 3.3.1 recursively to the product P1 (x)P2 (x) · · · Pr (x) =
mf (x) yields the following splitting of f -invariant subspaces:
E = ker P1 (f ) ⊕ · · · ⊕ ker Pr (f ).
Let Ei = ker Pi (f ). Note that mEi (x)|Pi (x) by construction. Now, since mf (x) is the
product of all mEi (x) by Lemma 3.3.4, we get mEi (x) = Pi (x) = pi (x)ei . Uniqueness
follows readily from the fact that such a decomposition must satisfy Ei ⊂ ker pi (f )^{ei} , and since Σ_i dim Ei = dim E, all inclusions are equalities.
Remark 3.3.7 Theorem 3.3.6 together with Lemma 3.3.4 provide another proof of Theo-
rem 2.6.5. In the case where all irreducible factors of mf (x) are linear, the decomposition
into eigenspaces generalises into a decomposition into generalised eigenspaces, where
the generalised eigenspace of the eigenvalue λi is Ei = ker(f − λi I)ei , i.e., the primary
component associated with the linear factor x − λi .
Example 3.3.8 Compute the characteristic and minimal polynomials of the matrix associated with the linear map T : R^3 → R^3 given by T (x) = u × x, where u = (a, b, c).
T (e1 ) = u × e1 = (0, c, −b)^T , and the matrix A of T in the canonical basis is
A = ( 0 −c b ; c 0 −a ; −b a 0 ).
cA (x) = −x(x^2 + ‖u‖^2 ). Thus, if u ≠ 0 there are three different complex eigenvalues, i.e. A diagonalises over C and mA (x) = −cA (x) = x^3 + ‖u‖^2 x.
Remark 3.3.9 Let E = F ⊕ G, where both are f -invariant and dim E < ∞. Note that,
if we choose a basis of E by way of respective bases of F and G, the associated matrix of
f has the shape
( M 0 ; 0 N ),
where M and N are the respective associated matrices of f |F and f |G. If, however, we
only have an invariant subspace F ⊂ E, completing a basis of F to a basis of E yields a
more generic form
( A B ; 0 C )
(here C has an interpretation in terms of the quotient vector space E/F which we shall
refrain from including here).
3.3.10 Find a decomposition of mT (x) into irreducible factors over R, and write down
the primary decomposition for T .
We recommend the reader to pause here and solve the problems, the solution of which is
found right below.
Solution for 3.3.10: Let r = |u| > 0. One has mT (x) = x(x^2 + r^2 ), and Bézout’s identity yields 1 = (1/r^2 )(x^2 + r^2 ) − (1/r^2 ) x · x, hence R^3 = ker T ⊕ ker(T^2 + r^2 I).
Here the image of T has rank 2, but also coincides with hui⊥ .
and since both x, x^2 + r^2 are irreducible, we have the following cases. If α ≠ 0 and w ≠ 0, then mv (x) = x(x^2 + r^2 ) and v, T v, T^2 v are linearly independent (this is precisely the contrary to the statement!). The remaining cases are when either of the components is zero: mv (x) = 1 if v = 0 (which is why we never consider it!), or mv (x) = x (when w = 0, i.e. v, u are parallel), and finally the case where α = 0 and w ≠ 0, i.e. v = w ≠ 0 is orthogonal to u. The solution is thus complete.
Corollary 3.3.13 (Yet another proof of the Cayley-Hamilton Theorem) Take Eu ,
where u is as in Theorem 3.3.12. The characteristic polynomial of f |Eu is precisely mf (x),
and so mf (x)|cf (x).
Proof: Indeed, one has mf (x) = mu (x), and cEu = ±mu (x), so mf (x) = ±cEu (x)|cf (x).
It suffices to take a basis of Eu (with associated matrix A of f |Eu ) and to extend it to one of E, so that the associated matrix M of f has the form
M = ( A B ; 0 C ),
so cf (x) = det(M − xI) = det(A − xI) det(C − xI), where det(A − xI) = cf |Eu (x) = ±mf (x).
3.3.14 Complete the proof of Corollary 3.3.13 so as to prove that both mf (x) and cf (x)
have exactly the same irreducible factors, by induction on dim E (thus avoiding recourse
to algebraically closed fields).
Remark 3.3.15 So far we have obtained a decomposition into invariant subspaces E = ⊕_i Ei , where every mEi (x) has precisely one irreducible factor. Forming a basis e of E by choosing a basis for each Ei yields a matrix
A = Me (f ) =
( A1   0    . . .   0  )
( 0    A2   . . .   0  )
( . . . . . . . . . .  )
( 0    0    . . .   Ar ),
Proposition 3.3.16 The decomposition into primary components works for an endomor-
phism T ∈ EndK (E), regardless of dim E, provided that there exists one nonzero polyno-
mial p(x) such that p(T ) = 0. Likewise, there exists u ∈ E such that mu (x) = mT (x).
The proofs given in this section apply to this situation.
Theorem 3.4.1 (Jordan canonical form) Under the above hypotheses, there are vectors ui such that
E = ⊕_{i=1}^{s} Eui .
Proof: Replacing f by f − λI, one may assume that λ = 0. Let u be such that mu (x) =
mf (x) = xe ; we shall find an invariant subspace F such that E = Eu ⊕ F . Now fix a
basis for E, and let A be its associated matrix. One may choose the first e vectors to be
a basis of Eu , say, u0 = u, ui = f i (u), for 0 ≤ i ≤ e − 1.
Note that the characteristic and minimal polynomials of A and A^T are the same, i.e. cA (x) = cA^T (x), mA (x) = mA^T (x). Let N denote the e × e companion matrix
N =
( 0 0 0 . . . 0 0 )
( 1 0 0 . . . 0 0 )
( 0 1 0 . . . 0 0 )
( . . . . . . . . )
( 0 0 0 . . . 1 0 ).
There is a vector y ∈ K^n such that (A^T)^{e−1} y ≠ 0. Actually, one may choose such a y so that y^T A^{e−1} e1 ≠ 0, where (ei ) is the canonical basis of K^n . In our case, one may simply choose y such that ye ≠ 0.
Lemma 3.4.2 The linearly independent vectors y, A^T y, . . . , (A^T)^i y, . . . , (A^T)^{e−1} y span an A^T-invariant subspace of K^n . Likewise, if we consider this list to be one of row vectors, namely,
H = h y^T , y^T A, . . . , y^T A^i , . . . , y^T A^{e−1} i ⊂ (K^n )^∗ ,
then the annihilator F = {x ∈ K^n : v^T x = 0 for every v ∈ H} is A-invariant.
Note that e − 1 − r + i − 1 ≥ e for i > r, so y^T A^{e−1−r} v = cr (y^T A^{e−1} u) ≠ 0, which means that v ∉ F ! Therefore, Eu ⊕ F = E, and induction on dim E completes the proof.
Jordan blocks: Define the Jordan block Je (λ) to be the following e × e matrix:
Je (λ) = λI + N =
( λ 0 0 . . . 0 0 )
( 1 λ 0 . . . 0 0 )
( 0 1 λ . . . 0 0 )
( . . . . . . . . )
( 0 0 0 . . . 1 λ ).
Back to the proof of Theorem 3.4.1, the nilpotent endomorphism f − λI is such that
mF (x)|mf (x), so one actually produces vectors ui with non-increasing dimensions dim Eui ,
e = e1 ≥ e2 ≥ . . . ≥ es .
After choosing suitable bases for Eui that produce respective companion matrices, the
endomorphism f is equivalent to the block matrix
( Je1 (λ)   0         . . .   0       )
( 0         Je2 (λ)   . . .   0       )
( . . . . . . . . . . . . . . . . . . )
( 0         0         . . .   Jes (λ) ).
Thus, on the Jordan block Eu1 , one has f (u1 ) = λu1 +u2 , f (u2 ) = λu2 +u3 , . . . f (ue ) = λue .
In other words, ui = (f − λI)i−1 u1 for 1 ≤ i ≤ e = e1 .
which means that there exists a Jordan basis (formed out of each piece Euij ) rendering its
associated matrix
The answer is to consider the dimensions dim ker(f − λI)k for all k ≥ 1. The following
lemma is straightforward:
Lemma 3.4.4 Given a nilpotent endomorphism f such that E = Eu , dim E = n, one has dim ker f^i = i for 1 ≤ i ≤ n. If i > n, one has dim ker f^i = n. The sequence di = dim ker f^i satisfies di − 2di−1 + di−2 ≤ 0 for every integer i ≥ 2.
Theorem 3.4.5 Let f be an endomorphism, the characteristic polynomial of which is cf (x) = (λ − x)^n . There are unique numbers e1 ≥ e2 ≥ · · · ≥ es ≥ 1 such that f is similar to a block-diagonal Jordan matrix of eigenvalue λ and exponents e1 , . . . , es . These exponents may be retrieved by calculating the dimensions di = dim ker(f − λI)^i for every i ≤ n.
Proof: Consider the sequence di = dim ker(f − λI)i for i ≥ 0. One has
Example 3.4.6 Let A ∈ M4 (K) be such that cA (x) = (λ − x)^4 . Assume that mA (x) = (x − λ)^2 . There are two possible outcomes. Firstly, e1 = 2. The value of d1 = dim ker(A − λI) is either 2 or 3: if it is 2, then there are two cyclic subspaces Eu , Ev of dimension 2. If d1 = 3, then d2 − d1 = 1, i.e. there is only one Eu of dimension 2, and there are two of dimension 1 in the decomposition, as given by the computation 2d1 − d2 − d0 = 2.
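SymPy can compute the Jordan form directly, which makes it easy to experiment with the counts di . A sketch for the first outcome of Example 3.4.6, here with λ = 2 and two blocks J2(2) (note that SymPy places the 1s above the diagonal, the transpose of the convention used in these notes):

```python
# Jordan structure read off from d_i = dim ker(A - λI)^i, compared with
# SymPy's jordan_form, for a matrix built from two Jordan blocks J_2(2).
import sympy as sp

A = sp.Matrix([[2, 0, 0, 0],
               [1, 2, 0, 0],
               [0, 0, 2, 0],
               [0, 0, 1, 2]])
lam = 2
N = A - lam * sp.eye(4)
print([len((N**i).nullspace()) for i in (1, 2)])   # [2, 4]: two stacks of height 2

P, J = A.jordan_form()
print(J)       # block diagonal with two 2x2 Jordan blocks for the eigenvalue 2
```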
Remark 3.5.1 Such a y(t) is defined on an interval I, which we suppose to be open. Given that y^(n) exists and one has y^(n) = − Σ_{i=0}^{n−1} ai y^(i) , then since the r.h.s. is differentiable so is y^(n) . If the function y(t) is n + r times differentiable, then y^(n) equals an (r + 1)-times differentiable function, hence y(t) is n + r + 1 times differentiable. This shows that y(t) ∈ C^∞ (I).
We shall consider I = R.
Notation: Let E = C^∞ (R) or C^∞_C (R) – in the second case, the functions take complex values. We denote the operator D = d/dt. This is an endomorphism of E.
Example 3.5.2 (Eigenfunctions for D) Consider D as above. One may wonder which are the eigenvalues of D. In the real case, Dy = λy, i.e. y′ = λy. One knows an obvious solution, y = e^{λt} . To check whether there are more, we look for the derivative of a suitable product.
Consider the function z(t) = e^{−λt} y(t). Differentiating yields
z′(t) = e^{−λt} (y′ − λy) = 0 ⇔ y = Ce^{λt} .
Thus, the eigenfunctions of D with eigenvalue λ are precisely the nonzero multiples of e^{λt} .
In the complex case, the same proof works, but it takes a bit more work. Note that e^{ibt} = cos bt + i sin bt, and that indeed (d/dt) e^{ibt} = b(− sin bt + i cos bt) = i b e^{ibt} . On the other hand, let α = a + bi. One has e^{αt} = e^{at} e^{ibt} , and one sees that
(d/dt) e^{αt} = α e^{αt} .
3.5.3 (Review) Let A(t), B(t) be matrices, all of whose coefficients are differentiable. One has (d/dt)(AB) = (dA/dt) B + A (dB/dt). This works for all kinds of bilinear maps, such as the product of two complex functions, or the cross product of two vector functions with values in R^3 , etc.:
(F × G)′ = F ′ × G + F × G′ ,   (F • G)′ = F ′ • G + F • G′ .
One may proceed termwise, or by using bilinearity and the definition of the derivative.
We encourage the reader to work out the derivative of det(F1 , . . . , Fn ), where Fi (t) are
vector functions in Rn of a parameter t.
First-order linear ODEs – formula: Consider a(t), b(t) continuous functions on an interval I ⊂ R. Take a primitive of a(t), say A(t) = ∫_{t0}^{t} a(s) ds where t0 ∈ I. We wish to solve
y′ + a(t)y = b(t).
Concentrating on the l.h.s. and looking for the derivative of a suitable product (U y)′, we find that (e^A y)′ = e^{A(t)} (y′ + a(t)y), so multiplying by e^{A(t)} yields the equivalent form
(e^{A(t)} y)′ = e^{A(t)} b(t),  whence  y(t) = e^{−A(t)} ( C + ∫_{t0}^{t} e^{A(s)} b(s) ds ).
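A SymPy sketch comparing the integrating-factor formula with dsolve on the hypothetical example y′ + 2y = t (the two answers differ only in how the arbitrary constant is normalised):

```python
# First-order linear ODE via the integrating factor, checked against dsolve.
import sympy as sp

t, s, C = sp.symbols('t s C')
y = sp.Function('y')

# a(t) = 2, b(t) = t, t0 = 0; A(t) = ∫_0^t a(s) ds = 2t
A = sp.integrate(2, (s, 0, t))
y_formula = sp.exp(-A) * (C + sp.integrate(sp.exp(2*s) * s, (s, 0, t)))
print(sp.expand(y_formula))     # C*exp(-2*t) + exp(-2*t)/4 + t/2 - 1/4

sol = sp.dsolve(sp.Eq(y(t).diff(t) + 2*y(t), t), y(t))
print(sol)                      # y(t) = C1*exp(-2*t) + t/2 - 1/4  (C1 = C + 1/4)
```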
Rephrasing our general linear ODE: Let p(x) ∈ C[x] be a monic nonconstant poly-
nomial. We wish to find an explicit form for the subspace K = ker p(D) ⊂ E. Clearly, this
subspace is also D-invariant, and D|K admits a minimal polynomial, since p(D|K) = 0.
This means that the primary components trick applies.
3.5.1 Homogeneous case, primary components
Let p(z) = Π_i (z − λi )^{ni} . By Theorem 3.3.6, one has
ker p(D) = ⊕_{i=1}^{r} ker(D − λi I)^{ni} .
Thus, our task reduces to finding ker(D − αI)^n explicitly. Note that e^{αt} is a solution. Actually, writing y = z e^{αt} yields
(D − αI)(z e^{αt} ) = z′ e^{αt} .
Theorem 3.5.5 (Homogeneous linear ODEs, constant coefficients) Let p(z) ∈ C[z] be a nonconstant monic polynomial, and let p(z) = Π_i (z − λi )^{ni} . The solution space is
ker p(D) = ker Π_i (D − λi I)^{ni} = ⊕_i C_{ni −1} [t] e^{λi t} .
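For instance, for p(z) = (z − 1)^2 (z + 2) = z^3 − 3z + 2 the theorem predicts ker p(D) = C1[t] e^t ⊕ C e^{−2t}, i.e. the solutions of y′′′ − 3y′ + 2y = 0 are (C1 + C2 t) e^t + C3 e^{−2t}; SymPy's dsolve agrees (a sketch):

```python
# ker p(D) for p(z) = (z-1)^2 (z+2): solve y''' - 3y' + 2y = 0.
import sympy as sp

t = sp.symbols('t')
y = sp.Function('y')
ode = sp.Eq(y(t).diff(t, 3) - 3*y(t).diff(t) + 2*y(t), 0)
print(sp.dsolve(ode, y(t)))   # y(t) = C3*exp(-2*t) + (C1 + C2*t)*exp(t)
```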
The following admits a pedestrian solution, but using the tools learnt may provide a more
elegant approach.
3.5.6 Let P (x) ∈ R[x] be a nonzero polynomial. Prove that the following equations have
at most finitely many common solutions:
∫_0^x P (t) cos t dt = 0,   ∫_0^x P (t) sin t dt = 0.
3.5.2 The equation x′ = Ax
Note that, if we have an ODE such as those solved above, we may reduce it to an ODE of order 1. For instance, given the equation y′′′ − 3y′ + 2y = 0, we may set up the variables x1 = y′′, x2 = y′, x3 = y and we obtain the ODE
x′ = Ax, where x(t) = (x1 x2 x3 )^T , A = ( 0 3 −2 ; 1 0 0 ; 0 1 0 ).
An ODE (resp. linear ODE) of order n turns into an ODE (resp. linear ODE) of order
1, where the new unknown is a vector function.
x′ = A(t)x + b(t),
3.6 Problems
3.6.1 Let f be an endomorphism of E, where dimK E < ∞, and let p(x) ∈ K[x]. Prove
that p(f ) is invertible if and only if (p(x), mf (x)) = 1.
3.6.2 Let A = ( 2 1 3 ; 0 2 1 ; 0 0 2 ). Find all real matrices X ∈ M3 (R) such that X^2 = A. Solve also X^m = A for any integer m ≥ 3. (Hint: Use Problem 2.10.11 and the result cited therein.)
3.6.3 (Fitting decomposition) Given an endomorphism f ∈ EndK (E), where dimK E <
∞, prove that there are invariant subspaces F , G of E such that f |F is nilpotent and f |G
is invertible.
3.6.4 Let A, B, C be as in Problem 2.9.5. Work out the cases you left out.
3.6.5 Let A be a complex square matrix. Prove that there is an invertible matrix P such
that AT = P AP −1 .
3.6.6 Let f : E → E be an endomorphism of a finite-dimensional vector space E. Let F
be an invariant subspace of E. Prove that, if Ei are the primary components of f , then
F = ⊕_{i=1}^{r} (F ∩ Ei ).
3.8 The rational normal form
The Jordan canonical form works for endomorphisms with only linear factors in the factor-
ization of its characteristic polynomial. In the general case, there is Frobenius’s rational
normal form. The gist is the same: after reducing to one primary component, one needs to write E as the direct sum of cyclic subspaces.
Bases for the cyclic subspaces: If p(x) is irreducible and mu (x) = p(x)^e , the basis to be chosen for Eu is reminiscent of the numeral system with base p(x), so to speak. That is, every polynomial admits a unique expression Q(x) = Σ_i ri (x) p(x)^i , where deg ri < deg p. This presents some advantage over the usual basis u, f (u), . . . , f^i (u), . . ..
Theorem 3.8.1 Let E be such that mf (x) = p(x)e , with p(x) irreducible. There is
a decomposition of f -invariant subspaces, where Eui are cyclic and mui (x) = p(x)ei .
Ordering the exponents so that e1 ≥ e2 ≥ · · · es ≥ 1, we have that the list of exponents is
an invariant of f .
Proof: The proof is a bit more complex than in the Jordan form case. Choose u ∈ E such that mu (x) = mf (x) = p(x)^e , and concretely a linear form ω such that ω(p(f )^{e−1} u) ≠ 0. By using the dual space E^∗ , one needs to obtain an f -invariant supplement F to Eu :
By considering f ∗ ∈ EndK (E ∗ ), which has the same characteristic and minimal polyno-
mials as f , consider ω ∈ E ∗ such that mω (x) = mf (x) = mf ∗ (x). The subspace Eω∗ is
f ∗ -invariant, and its orthogonal (or annihilator) F = (Eω∗ )⊥ is also f -invariant. In other
words, the subspace F is defined by the linearly independent equations
ω(f i (x)) = 0, ∀i ≥ 0,
and is therefore stable by f , of codimension deg mω = deg mu = deg mf = e deg p, which
is precisely the dimension of Eu . The following equations are linearly independent and
define F by themselves:
such that mui+1 (x) | mui (x) for all i ≤ s − 1. The polynomials mui (x) are called invariant factors, and determine f up to similarity. The minimal polynomial of f is mu1 (x), and its characteristic polynomial is cf (x) = ± Π_i mui (x).
3.9 Problems
3.9.1 Let f be an endomorphism of a finite-dimensional vector space. We say that f
is semisimple if every invariant subspace F ⊂ E admits an invariant supplement G.
Prove that f is semisimple if and only if mf (x) consists of simple irreducible factors.
Chapter 4
There are two problems, which are intimately related, in the study of endomorphisms of
a finite-dimensional vector space. Their description amounts to the following:
1. Canonical form.– given A ∈ Mn (K), find a simple, standard form for A. In other
words, find a basis in which A has a simple standard form: A = P JP −1 . In the
case where A diagonalises, this J is the diagonal form of A.
2. Classification.– Given A, B ∈ Mn (K), find a simple way to determine whether
A, B are similar; namely, whether there exists an invertible matrix P ∈ GLn (K)
such that A = P BP −1 .
Such standard form is called canonical form, and it essentially describes the similarity
class of a matrix. In the case where the ground field K is algebraically closed, this
is the Jordan canonical form, and is the next best thing to the diagonal form of a
diagonalisable matrix.
See e.g. [12, Ch. VIII] for this subject and suitable background. References [13], [24]
deal only with the Jordan canonical form. [13] does not spare details or computations,
and contains a pedestrian, firm and progressive grasp of the concepts. [24] has a concise
treatment of the Jordan canonical form.
Reading the matrix on the right, we see that we need to find a basis u1 , . . . , un such that ui = N^{i−1} u1 satisfies: N un = N^n u1 = 0, and u1 , N u1 , . . . , N^{n−1} u1 are linearly independent. It suffices to choose u1 ∉ ker N^{n−1} : the minimal polynomial of u1 then satisfies mu1 (x) | x^n but does not divide x^{n−1} , so mu1 (x) = x^n , which yields dim Eu1 = n = dim E.
In other words, E admits a basis for which f has a block diagonal matrix form, where the diagonal blocks are such as in (REF. MISSING), with mvi (x) = x^{mi} . If we assume that mi ≥ mi+1 for i < s, then the sequence m1 ≥ · · · ≥ ms ≥ 1 characterises f up to isomorphism.
Furthermore, mf (x) = x^{max{mi}} = x^{m1} and pf (x) = (−x)^n = (−x)^{Σ mj} .
Lemma 4.2.1 Let u1 = u, ui = f^{i−1} (u), where mu (x) = x^n and E = Eu . One may describe the subspaces ker f^i and Im f^i as follows.
Proof: Immediate.
Now let us assume that E, f admit a special basis such as in (4.1). We represent each Evi
as a stack, and place them adjacently as follows.
Example 4.2.2 Let A ∈ M5 (C) be as follows.
A =
( 0 0 0 0 0 )
( 1 0 0 0 0 )
( 0 1 0 0 0 )
( 0 0 0 0 0 )
( 0 0 0 1 0 ).
We have v11 = v1 = e1 , and v12 = Av1 = e2 , v13 = A^2 v1 = e3 . Analogously we have v2 = v21 = e4 , v22 = Av2 = e5 . Thus the juxtaposition of both stacks yields the following.
Note that there are two stacks, and that dim ker f^1 = 2 (apply Lemma 4.2.1 to each one). One also has dim ker f^2 = 2 + 2 (total boxes up to floor 2), and again dim ker f^3 = 3 + 2.
Example 4.2.3 Let n = 8. Consider A to be the following matrix:
A =
( 0 1 0 0 0 0 0 0 )
( 0 0 1 0 0 0 0 0 )
( 0 0 0 1 0 0 0 0 )
( 0 0 0 0 0 0 0 0 )
( 0 0 0 0 0 1 0 0 )
( 0 0 0 0 0 0 0 0 )
( 0 0 0 0 0 0 0 1 )
( 0 0 0 0 0 0 0 0 ).
Thus, the matrix A has the shape A = J4 (0)^T ⊕ J2 (0)^T ⊕ J2 (0)^T (block diagonal, with those as its diagonal blocks). A basis adapted to A would be the following:
E = Q^8 = Ee4 ⊕ Ee6 ⊕ Ee8 ,
and on each cyclic block we have e3 = Ae4 , e2 = Ae3 , e1 = Ae2 , then e5 = Ae6 , e7 = Ae8 , and ker A has e1 , e5 , e7 as a basis.
From the above computations, we extract a lesson.
Remark 4.2.4 E, f as above. The number dim ker f^i − dim ker f^{i−1} corresponds to the number of stacks of height ≥ i (see Lemma 4.2.1).
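For the matrix of Example 4.2.3 the dimension counts can be checked numerically; a NumPy sketch (the differences νi = dim ker A^i − dim ker A^{i−1} count the stacks of height ≥ i):

```python
# Kernel dimensions dim ker A^i for the 8x8 matrix of Example 4.2.3.
import numpy as np

A = np.zeros((8, 8))
for i, j in [(0, 1), (1, 2), (2, 3), (4, 5), (6, 7)]:   # positions of the 1s (0-based)
    A[i, j] = 1.0

dims = [8 - np.linalg.matrix_rank(np.linalg.matrix_power(A, i)) for i in range(5)]
print(dims)              # [0, 3, 6, 7, 8]
print(np.diff(dims))     # ν = [3, 3, 1, 1]: three stacks, of heights 4, 2, 2
```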
4.2.5 It would follow from Remark 4.2.4 that νi = dim ker f^i − dim ker f^{i−1} is nonincreasing. Prove unconditionally that, for every endomorphism f : E → E of a finite-dimensional vector space E, the sequence νi is nonincreasing.
One may easily retrieve the whole ensemble of stacks by taking the numbers νi .
4.2.6 Given the numbers νi , consider the greatest number i0 such that νi0 > 0 (hence
νi0 +1 = 0). Conclude that there are precisely νi0 stacks of height i0 . Use this fact to obtain
all numbers m1 ≥ · · · ≥ ms ≥ 1.
where µ = dim ker f − dim(ker f ∩ Im f ) and dim(F + hw1 , . . . , wµ i) = rk f + µ. This covers the µ stacks in E that would be invisible otherwise, had we previously assumed that they exist: Ewk = hwk i.
It now remains to consider the t nonzero stacks in Im f , e.g. Fvi , of dimension dim Fvi = mi . Since vi ∈ F = Im f , one has vi = f (ui ), which yields Fvi ⊂ Eui . Note that ui ∉ Im f = F , so Fvi ⊊ Eui and dim Eui = dim Fvi + 1 (one storey taller). It now remains to show that E = ⊕_i Eui ⊕ ⊕_k hwk i, and to extract our basis from this splitting.
Write u_i^j = f^{j−1} (ui ). We need to prove that the n vectors u_i^j , wk are linearly independent. Indeed, if
A = Σ_{1≤j≤mi+1, i≤s} αij u_i^j + Σ_k βk wk = 0,  then
0 = f (A) = Σ_{1≤j≤mi , i≤s} αij v_i^j , which yields αij = 0 for all j ≥ 1, j ≤ mi . Thus it remains to show that f^{m1} (u1 ), . . . , f^{mt} (ut ) and w1 , . . . , wµ are linearly independent. By hypothesis, the f^{mi} (ui ) = f^{mi −1} (vi ) form a basis of ker f ∩ Im f , and the wj , j ≤ µ, complete these to a basis of ker f , so the remaining coefficients αi,mi+1 and βj vanish for all i, j, and we have found a Jordan basis for E, f .
mf (x) = Π_{i=1}^{r} mEi (x) = Π_{i=1}^{r} (x − λi )^{mi} ,  pf (x) = Π_{i=1}^{r} (λi − x)^{ni} , where ni = dim Ei and 1 ≤ mi ≤ ni .
We now consider the pairs (Ei , f |Ei ) into which the pair (E, f ) splits. f |Ei is not nilpotent, but (f − λi I)|Ei is.
55
Theorem
Lρ 4.4.1L (Jordan canonical form) Let E, f be as above. One has a splitting
E = i=1 Euij , where uij ∈ Ei form the building blocks of f − λi I|Ei . If rij =
deg muij (x) = dim Euij , then f admits a basis wherein its matrix A is block diagonal of
the form
Jr11 (λ1 ) · · · 0 ··· 0
... ... ..
.
..
.
A=
0 · · · J rij (λi ) · · · 0 .
. .. .. ..
..
. . .
0 ··· 0 · · · Jrρ,s(λρ ) (λρ )
Proof: Given a block Euij within ker(f − λi I)ni , choose an adapted basis for f − λi I|Euij .
In this basis, f = f − λi I + λi I has an associated matrix equal to Jrij (λi ) = λi Irij + Nrij ,
and that is the corresponding block. Doing this for every summand in the splitting yields
a matrix as in the statement.
4.5 Problems
4.5.1 Let f be an endomorphism of a finite-dimensional vector space. We say that f
is semisimple if every invariant subspace F ⊂ E admits an invariant supplement G.
Prove that f is semisimple if and only if mf (x) consists of simple irreducible factors.
4.5.3 (Jordan decomposition) Prove Corollary 4.4.2. Use it to compute the expo-
nential of a matrix A ∈ Mn (C).
56
Chapter 5
Inner products
The structure we study here is that of a real or complex vector space, together with an
inner product. In the real case, we are talking about a general form of an inner product
indeed, which we shall call euclidean inner product. In the complex case, however,
we shall introduce another kind of inner product, called hermitian (or unitary) inner
product.
The notions of vector space and inner product became important at the end of the 19th
century, when applications to physics demanded a certain depth in the results. Thus, in
the beginning of the 20th century, a rise in abstraction of linear algebra took place, hand
in hand with its applications to mathematical physics. This grew exponentially when Max
Born, Nobel Laureate in Physics, spotted matricial structures in Heisenberg’s quantization
of the harmonic oscillator. Max Born had taken care of writing the linear algebra part
of Courant-Hilbert’s magnum opus [8], so he had the training to recognise it! Until
that moment, physicists did not have linear algebra in their syllabus, but linear algebra,
functional analysis and group theory became the bread and butter of quantum theorists.
The standard model of particle physics is a living witness to that, still unsurmounted by
any other physical theory. Even more so, superstring theory, is ever more steeped into
mathematics of all kinds, a sort of tutti-frutti of cutting-edge mathematics, and is called
by some physical mathematics (see Lee Smolin [38], Woit [42], Moore [28]).
5.1 Warm-ups
All problems may be solved using the techniques of last semester, and pave the way
for arguments improved or presented differently later in this chapter. The last problem
admits one or more solutions using the tools developed herein.
1. Find dim F ⊥ .
57
2. Prove that F ⊕ F ⊥ = Rn .
3. Prove that F ⊥⊥ = F .
5.1.4 Let E = C(R) be the vector space of continuous functions over R. Define the
subspaces E = {f ∈ E : f (x) = f (−x)}, O = {f ∈ E : f (−x) = −f (x)} of even and odd
functions, respectively. Prove that
E = C(R) = E ⊕ O.
u•A u•B
6= 0 ⇔ F ⊕ hA, Bi = Rn .
v•A v•B
(ii) g is bilinear, that is, linear on each argument: linear on the left, g(αu + βv, w) =
αg(u, w) + βg(v, w); and linear on the right, i.e. g(u, λv + µw) = λg(u, v) + µg(u, w).
Notation: The usual notation for an inner product (euclidean or hermitian) is hu, vi.
However, while this is widely used, it may be confused with our notation for the linear
span of a subset, and in these notes we shall use (u, v) whenever the inner product at
hand need not be specified.
The notation g, or G for matrices associated with an inner product, is chosen after Gram.
58
Pn
Example 5.2.1 The canonical euclidean inner product on Rn , (x, y) = x•y = i=1 xi yi ,
is a prime example.
where a > 0, ac − b2 > 0. Clearly, g is bilinear and symmetric, and if u = (x y)T , then
g(u, u) = ax2 + 2bxy + cy 2 > 0 if u 6= 0. Conversely, if a > 0 and ac − b2 > 0, then the
above formula defines an inner product on R2 .
5.2.1 (Bi)linearity
Note that a map is linear, essentially when it commutes with linear combinations. For
instance, a matrix P A of m Prows and n columns yields a function T = TA : Rn → Rm that
is linear, i.e. T ( λi ui ) = ri=1 λi T (ui ). This follows readily from the matrix product!
Proposition 5.2.4 Let E be a real vector space, dim PnE = n < ∞.P Let B : E × E → R be
a bilinear map, and let (ei ) be a basis of E. If x = i=1 xi ei , y = ni=1 yi ei , then B(x, y)
has the following expression:
Xn n
X n
X
B(x, y) = B( xi ei , yj ej ) = B(ei , ej )xi yj =
i=1 j=1 i,j=1
y1
B(e1 , e1 ) B(e1 , e2 ) . . . B(e1 , en )
B(e2 , e1 ) B(e2 , e2 ) . . . B(e1 , en ) y2
= x1 x2 . . . xn . .
..
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
B(en , e1 ) B(en , e2 ) . . . B(en , en ) yn
59
5.3.1 Prove that |z|2 + Re(az) + b = 0 (b real) has a solution only for |a|2 ≥ 4b (give an
explicit description of the set of solutions if you like).
(see Beck et al, A first course in complex analysis, Q 1.10.)
All concepts introduced (linear dependence and independence, dimension, etc.) using R
as field of scalars translate to the general structure of vector spaces over a field. Therefore,
when we say that two vectors u, v ∈ Cn are proportional, we say that u = λv for some
λ ∈ C, or v = µu for some µ ∈ C.
5.3.2 Let E be an n-dimensional vector space over C. Then: E is a vector space over
R, and
dimR E = 2 dimC E = 2n.
A basic example in Quantum Mechanics (which may be read in [29]) features the hermitian
inner product on C2 and the double slit experiment.
Since we have the identification C = R2 (i.e. the Argand plane), the first thing that comes
to mind is, why we should define a new inner product on Cn if we have the canonical
inner product on R2n , identifying
Cn = C × . n. . × C = R2 × . n. . × R2 = R2n ,
and using it on the real coordinates. More precisely, let u, v ∈ Cn and let ui , vi be their
complex coordinates. Write now uk = a2k−1 + ia2k , vk = b2k−1 + ib2k separating into
real and imaginary parts. Thus identified, u corresponds to A = (a1 , · · · , a2n ) and v to
B = (. . . , bi , . . .) in R2n . Their canonical inner product on R2n is therefore:
n
!
X
A • B = (a1 b1 + a2 b2 ) + (a3 b3 + a4 b4 ) + · · · (a2n−1 b2n−1 + a2n b2n ) = Re uk vk .
k=1
Example 5.3.3 In C2 , take the vectors u = (i, −1), v = (−1, −i) and u0 = (1, i + 1), v 0 =
(1 − i, 2). The inner product in R4 of u, v yields zero, although v = iu. We see that there
is a loss of information regarding proportionality! In the case of Rn we see proportionality
of two vectors through equality in the Cauchy-Schwarz inequality, but the field of scalars is
now C. Could we define a new inner product with a Cauchy-Schwarz inequality detecting
complex proportionality? .
We started this course showing product formulas for complex numbers, and reading the
geometry from them in Paragraph 1.0.2 of last semester’s notes (first lecture of the course).
60
Note that the case k = 1 (already shown in detail) uv encloses the signed (or oriented)
area of the parallelogram formed by u, v (its opposite in this case).
Note that Re u•v corresponds to the Euclidean inner product on R2n , following the above
identifications. The imaginary part added contains geometrical information, which will
appear below.
1. v • u = u • v;
u • (µ1 v 0 + µ2 v 00 ) = µ1 (u • v 0 ) + µ2 (u • v 00 );
Both second and third properties together are called sesquilinearity. The prefix sesqui
means ‘one and a half ’ (i.e. ‘half linear’ =semilinear on the right). A sesquilinear form
on Cn is hermitian if it satisfies the first property. Finally, properties 1 through 4 define
an hermitian inner product.
Example 5.3.5 Calculate the hermitian inner product for the pairs of vectors of Example
5.3.3. Note that, although u, v = iu appear to be orthogonal in R4 , their hermitian inner
product u • iu = (−i)u • u = −2i, and the objection raised in Example 5.3.3 vanishes.
Remark 5.3.6 Let u, v ∈ Cn , such that u•v 6= 0. There are r > 0 and a complex number
of the form eiθ such that u • v = reiθ , both unique, and so we write u • (eiθ v) = r ≥ 0.
61
Theorem 5.3.7 (Pythagoras’s Theorem in Cn ) If u, v ∈ Cn are orthogonal with re-
spect to the canonical hermitian inner product, then
(u + v) • (u + v) = u • u + v • v + (u • v + v • u) = u • u + v • v + 2Re(u • v).
The angle between u and v is defined to be the same as that resulting from the iden-
tification of Cn with R2n (check out the real part, and proceed accordingly). The new
definition now provides the desired theorem.
Example 5.3.8 Note that, if v = eiθ u where 0 6= u ∈ Cn and 0 ≤ θ < π, then the angle
between u and v is θ.
Proof: We shall provide one by George Pólya here, and hint at other proofs later.
Let u, v ∈ Cn be such that u • v 6= 0 (strict inequality is clear if u, v are orthogonal, as
is equality precisely when either u or v equals zero). We now use Remark 5.3.6, and we
have:
u • eiθ v = r > 0.
62
Applying now the real Cauchy-Schwarz inequality to u•v = Re(u•v), i.e. to the euclidean
inner product on R2n , yields:
and the r.h.s. clearly coincides with (u • u)(v • v). This proves inequality.
Assume that, in the above case, we have equality. This means that both vectors u and
eiθ v, as vectors in R2n , are linearly dependent over R, i.e. since u, v 6= 0, u = λeiθ v for
some λ ∈ R − {0}, which implies that u, v are linearly dependent over C.
If u, v are linearly dependent over C, then it is a simple matter to check that equality
holds in the hermitian case, which concludes the proof.
and derive the Cauchy-Schwarz inequality, specifying the cases where equality holds.
2
Solution for 5.3.1: |z|2 + Re(az) = |z + a2 |2 − |a|4 . The solution follows from this.
Alternatively, write x + iy and an equation for a circle appears.
Using the above, consider A, B ∈ Cn linearly independent vectors. kAz+Bk2 = kAk2 |z|2 +
2Re((A • B)z) + kBk2 > 0 for every z ∈ C, hence |A • B|2 < kAk2 kBk2 . The minimum
value of the expression occurs at z = − B•A
A•A
(see formula (5.1) or 5.3.1), which provides
yet another proof.
63
5.4 Hermitian spaces
Definition An hermitian inner product on a vector space E over C is an hermitian
sesquilinear form that is positive definite. that is:
(IP1) (Sesquilinear) If α, β ∈ C, u, v, w ∈ E, then h(αu + βv, w) = αh(u, w) + βh(v, w)
(C-linear on the 1st variable). For all λ, µ ∈ C, v, w, w0 ∈ C, h(v, λw + µw0 ) =
λh(v, w) + µh(v, w0 ) (semilinear/antilinear on the 2nd variable).
(IP2) (Hermitian) Aside from (IP2), one has h(u, v) = h(v, u).
(IP3) (Positive definite) h(u, u) ≥ 0, and if h(u, u) = 0 then u = 0.
A complex vector space E together with an hermitian inner product h, (E, h), is called
an hermitian vector space (h is omitted unless it is not clear from the context). It is
common to write h(u, v) = (u, v)
Example 5.4.3 Let ek (x) = eikx , where k ∈ Z. We have ek ∈ C([−π, π]). These
functions form an orthonormal system for the normalised L2 product, which is (f, g) =
1
R π
2π −π
f g. Indeed,
Z π
1
(ek , e` ) = ei(k−`)x dx = 1 if k = `, 0 if k 6= `.
2π −π
These functions are linearly independent, as we proved in the chapter of vector spaces,
but cannot quite be called a basis.
Example 5.4.4 (Hilbert-Schmidt inner product) On Mn (C), consider the inner prod-
uct given by (X, Y ) = tr X T Y . This is an hermitian inner product, and (X, X) =
2
P
1≤i,j≤n |xi | .
64
y1
(e1 , e1 ) (e1 , e2 ) . . . (e1 , en )
(e2 , e1 ) (e2 , e2 ) . . . (e1 , en ) y2
= x1 x2 . . . xn . .
..
. . . . . . . . . . . . . . . . . . . . . . . . . . . . .
(en , e1 ) (en , e2 ) . . . (en , en ) yn
Notation and silly differences: Dirac, in his book The Principles of Quantum Me-
chanics, demands semilinearity on the left and C-linearity on the right when introducing
the hermitian inner product. Thus, the physicist’s definition describes h(u, v) = h(v, u),
where h is a mathematician’s hermitian inner product.
Dirac’s notation is hf |gi, and involves further ingenuous notations when we consider
operators, which we shall omit.
More and more mathematicians are using the physicist’s notation, see e.g. [15]. We care
not in the least about these differences, and would have preferred to write this chapter
with the physicist’s convention.
Theorem 5.4.6 (Cauchy-Schwarz inequality) Let E, (, ) be an hermitian or euclidean
vector space. If u, v ∈ E, then
|(u, v)|2 ≤ (u, u)(v, v),
and equality holds if and only if u, v are linearly dependent (i.e. parallel).
Proof: If v = 0, the theorem is clear. If v 6= 0, then (u − (u,v)
(v,v)
v, u − (u,v)
(v,v)
v) > 0 if u, v are
linearly independent, and 0 otherwise. This settles the theorem.
Adapt Subsection 5.3.2 for other proofs of this Theorem.
Proposition 5.4.7 (Triangle inequality) Let E be an phermitian or euclidean vector
space. The triangle inequality holds for the norm kxk = (x, x). That is,
ku + vk ≤ kuk + kvk, with equality if and only if u = λv, λ ≥ 0, or v = µu, µ ≥ 0.
Proof: Indeed, squaring and applying Cauchy-Schwarz yields
ku + vk2 = kuk2 + kvk2 + 2Re(u, v) ≤ kuk2 + kvk2 + 2kuk· kvk,
with equality if and only if u||v and (u, v) is non-negative real.
5.4.8 (Polarization identities) Let E be an euclidean or hermitian vector space, and
let (, ) be its inner product.
1. If E is euclidean, show that
1 1
ku + vk2 − kuk2 − kvk2 = ku + vk2 − ku − vk2 .
(u, v) =
2 4
2. Consider the above expression in the case where E is hermitian. Show that it gives
rise to Re(u, v) instead.
3. Compute (iu, v) in the hermitian case, and show how the norm squared determines
the hermitian inner product, just as it happened in the euclidean case.
65
5.5 Gram-Schmidt orthogonalization process
Let E be an euclidean or hermitian vector space. Given a vector u, it is automatic to
1
normalise u and thus to obtain a unit vector, by choosing kuk u. Likewise, given linearly
independent vectors u, v, it is easy to form another linearly independent set of vectors
which spans the same vector subspace but is orthogonal. Indeed, the pair u, v − (y,u)
(v,v)
u
does the job nicely, and can be normalised to obtain an orthonormal system of vectors.
Proof: We prove by induction on r that there exists an orthogonal basis wi such that
hu1 , . . . , ui i = hw1 , . . . , wi i for every i ≤ r. If r = 1, the statement is obvious. Let
w1 , . . . , wr−1Pbe an orthogonal system such that hu1 , . . . , ui i = hw1 , . . . , wi i for i ≤ r − 1:
write wr = r−1 k=1 ak wk + ur . The condition that wr ⊥ wi , i ≤ r − 1 reads
and so therefore
r−1
X (ur , wk )
wr = ur − wk
k=1
(wk , wk )
1
satisfies the required condition. The choice of vk = w
kwk k k
settles the Theorem.
Proof: Extend the basis of F given to a basis of E, then apply the Gram-Schmidt process.
5.5.4 Let f1 , . . . , fr ∈ C[a, b] be continuous functions on [a, b] with real values. Suppose
that the following determinant is zero:
Rb 2 Rb Rb
a
f1 a
f1 f2 . . . a f1 fn
Rb Rb 2 Rb
a
f 1 f 2 a
f 2 . . . ff
a 2 n = 0.
.............................
Rb Rb Rb 2
f f
a n 1
f f ...
a n 2
f
a n
66
5.5.5 Let u1 , . . . , un be an orthogonal basis of an hermitian space E. Compute explicitly,
for every vector x ∈ E, the coordinates of x in the basis ui .
i−1
X i−1
X
(5.2) wi = ui + ξj wj = ui + ηj uj
j=1 j=1
by the above equality, though we use the l.h.s. (left hand side) expression for computation
purposes.
Thus, if we should write the coordinates of the vectors w1 , . . . , wr in the basis ui , i ≤ r as
columns of a basis, the result shall be w1 = a11 u1 +. . .+ar1 ur , . . ., w1 = a1k u1 +. . .+ark ur ,
and by formula (5.2) one has aii = 1, and aij = 0 for i > j. In other words, the resulting
matrix A is upper triangular unipotent (upper triangular, and its diagonal consists only
of 1’s).
67
If we now consider vi = kw1i k wi , then the procedure yields an r × r matrix B of coefficients,
where the column Bk corresponds to kw1k k Ak . Thus, the matrix B of coordinates of the
vectors vi with respect to the basis ui of F is triangular, with only positive elements in
its diagonal (to be precise, kw1i k ).
Now consider any vector x = rk=1 αk vk . Writing vk as a linear combination of the ui for
P
every k yields an expression
r r
! r r
!
X X X X
αk ajk uj = ajk αk uj ,
k=1 j=1 j=1 k=1
P
and if x = βk uk , then the above reads as follows in matrix form:
β1 a11 . . . a1r α1 β1 α1
.. .. . . . . .. ..
.= . . .. .. , i.e. . = A . .
βr ar1 . . . arr αr βr αr
Since both vk and uk form bases of F , the matrix A is invertible (in our case this is
explicitly shown by construction), and so
α1 β1
.. −1 ..
. = A . .
αr βr
We have thus shown:
The following example will illustrate Exercise 5.5.8 and this paragraph.
R1
Example 5.5.10 Let E = C([−1, 1]) with the corresponding L2 -product, (f, g) = −1 f g.
Consider the elements ui = xi−1 , for i = 0, 1, 2. We shall apply the Gram-Schmidt process
to them. w1 = u1 , w2 = λu1 +u2 so that (w1 , u1 ) = 0, but since (u1 , u2 ) = 0 one has λ = 0.
The element w3 = αu1 + βu2 + u3 = α + βx + x2 is orthogonal to both u1 , u2 , i.e. 1, x,
α = 0, β = − 31 . For any polynomial in F = R2 [x] = hu1 , u2 , u3 i = hw1 , w2 , w3 i,
and so P
p(x) = αi wi , one has
1 1
α1 · 1 + α2 x + α3 (x2 − ) = β1 · 1 + β2 x + β3 x3 , hence β1 = α1 − α3 , β2 = α2 , β3 = α3 .
3 3
68
Thus
1 0 − 31
β1 α1
β2 = A α2 , where A = 0 1 0 .
β3 α3 0 0 1
The Gram matrices of the ui ’s and wi ’s are:
2 0 23
2 0 0
Gu = 0 23 0 , Gw = 0 23 0 .
2
3
0 52 0 0 15 1
69
Pif ui is a set of r vectors and ei is an orthonormal basis of F = hu1 , . . . , ur i,
It follows that,
writing ui = sk=1 qki ek yields, for the Gram matrix of u1 , . . . , ur : G = QT G0 Q, where G0
is the Gram matrix of the ei ’s, hence G0 = Id and det G = | det Q|2 , and det G > 0 if the
ui form a linearly independent system.
Note that 5.5.8 is a stronger result than Lemma 5.6.1 below!
(u1 , ri=1 αi ui )
P
r r
∗
0 = α Gα = α1 . . . αr
.
.
X X
=( αi ui , αi ui ) = 0,
Pr. i=1 i=1
(ur , i=1 αi ui )
hence α1 u1 + . . . + αr ur = 0, so therefore αi = 0 for all i ≤ r.
(x − x∗ , ui ) = 0, for 1 ≤ i ≤ r.
70
Write x∗ = ri=1 ci ui . The condition that x − x∗ ∈ F ⊥ is equivalent to a system of r
P
equations and r unknowns (ci ), namely
r
X
∗
(x , ui ) = ck (uk , ui ) = (x, ui ), 1 ≤ i ≤ r.
k=1
The above system of equations (the normal equations is the usual term) is as follows:
c1 (x, u1 )
(u1 , u1 ) (u2 , u1 ) . . . (ur , u1 )
(u1 , u2 ) (u2 , u2 ) . . . (ur , u2 ) c2 (x, u2 )
=
. . .
.. ..
. . . . . . . . . . . . . . . . . . . . . . . . . . . . .
(u1 , ur ) (u2 , ur ) . . . (ur , ur ) c (x, ur )
| {z } r
=GT =G
c1
c2
If G is the Gram matrix of u1 , . . . , ur and c = .. , we get
.
cr
(x, u1 )
(x, u2 )
c = (G)−1 .. ,
.
(x, ur )
which proves both existence and uniqueness. It remains to prove that the element obtained
minimizes the distance. Let y ∈ F : by Pythagoras’s theorem, one has
so kx − yk > kx − x∗ k whenever x∗ 6= y ∈ F .
The fact that F ∩ F ⊥ = 0 is immediate (u ∈ F ⊥ ∩ F ⇒ u ⊥ u, i.e. (u, u) = 0), as is the
fact that F + F ⊥ = E (x = x∗ + (x − x∗ ) ∈ F + F ⊥ ).
Remark 5.6.3 (Streamlined proof ) Note that, if we use the Gram-Schmidt process
and produce an
P orthonormal basis e1 , . . . , er of F , the solution is straightforward: the
element x∗ = ri=1 (x, ei )ei is the desired orthogonal projection.
71
Let b0 be the orthogonal projection of b on Im A. The system Ax = b0 has a solution,
which is necessarily unique since ker A = 0 (so the homogeneous solution is {0}). The
unique x satisfying Ax0 = b0 is said to be obtained using the least squares method, for it
minimizes kAx − bk.
Note that, since ker A = 0, ker AT A = 0 by Lemma 5.6.1, and AT A is a square matrix of
order r, hence invertible. Thus, Ax0 = b0 if and only if AT Ax0 = AT b0 = AT b (note that
b − b0 ∈ F ⊥ and so is orthogonal to the columns of A, hence AT (b − b0 ) = 0), and so
x0 = (AT A)−1 AT b.
Remark 5.6.4 The above argument is nothing more than the normal equations applied
to the case ui = Ai , and so on.
so therefore
P 2 P −1 P
a T −1 T a x i x i x i y i
= (A A) A y, i.e., = P P .
b b xi n yi
Indeed, this whole paragraph is merely a particular case of Theorem 5.6.2 and the discus-
sion around it.
Ax − b = (Ax − b0 ) + (b0 − b), hence kAx − bk2 = k(Ax − b0 )k2 + k(b0 − b)k2 ≥ kb0 − bk2 ,
72
5.7 Fourier series and L2-projection
Consider the hermitian space E = CC (T), which corresponds to the space of contin-
uous 2π-periodic functions f : R → C (with complex values), and to the (averaged)
L2 -hermitian product Z π
1
(f, g) = f g.
2π −π
Let Fn = nk=−n Cek , where ek (t) = eikt (this is a known orthonormal system to us). The
L
elements of Fn are called trigonometric polynomials of degree ≤ n.
Definition Let f ∈ CC (T) (one may define this for arbitrary functions that are Riemann
integrable over [−π, π]). Let n ∈ Z. The n-th Fourier coefficient of f is defined to be
Z π
ˆ 1
f (n) = (f, en ) = f (t)e−int dt.
2π −π
Since the ek form an orthonormal system, Fn clearly has an orthonormal basis ek , and
the orthogonal projection of f to Fn coincides with
n
X
Sn [f ](x) = fˆ(k)eikx .
k=−n
5.8 Problems
5.8.1 Consider the euclidean vector space M2
(R) with
the Hilbert-Schmidt inner product,
3 2
(A, B) = tr AT B. Given the matrix U = , find the orthogonal projection of
2 1
1 1
V = on the subspace hU i⊥ .
1 2
73
5.8.5 Let E = C([−1, 1]) (real vector space). Let E ⊂ E (resp. O ⊂ E) be the subspace
of even functions in E (resp. odd functions). Prove that E ⊥ = O.
5.8.6 (QR factorization) Let A be a square matrix of order n, such that det A 6= 0.
Prove that there exist a matrix Q such that QT Q = I and an upper triangular matrix R
with positive diagonals rii > 0 such that A = QR, and that they are both unique.
5.8.8 Let w(x) > 0 be a continuous function, w : [0, 1] → R. Prove that the following
defines an inner product on E = Rn [x], where n ∈ N:
Z 1
(p, q)w = p(x)q(x)w(x) dx.
0
1. Prove that E has an orthonormal basis pk ∈ E, such that deg pk = k for each
0 ≤ k ≤ n.
5.8.10 (Apostol Vol. II, Ch. 1) Define on Rn [x] (vector space of polynomials of de-
gree ≤ n over R) the following function:
n
X k k
B(f, g) = f ( )g( ).
k=0
n n
74
5.8.11 Show that the following functions form an orthogonal system in E = C([−π, π]):
1 π π
Z Z
1
an = f (x) cos nx dx, bn = f (x) sin nx dx,
π −π π −π
where a0 , an and bn (n ∈ N) are the Fourier coefficients associated with the above trigono-
metric functions, i.e. for every N ∈ N, the function f has the following orthogonal
projection Fn = h1, {cos kx, sin kx}1≤k≤n i
n
a0 X
πFn f = + ak cos kx + bk sin kx.
2 k=1
where an , bn are computed in terms of products of the type f (x) cos nx or f (x) sin nx.
Many of Fourier’s arguments were infantile and error-ridden, but this powerful idea sur-
vived and enriched after polishing by towering geniuses such as Riemann and Dirichlet.
75
If we have not thus far unashamedly written that
∞
a0 X
“f (x) = + an cos nx + bn sin nx”,
2 n=1
it is due to lack of precision regarding the word ‘convergence’ and other issues.
76
Problems and famous results on Fourier series
Here we pose problems that use the above described techniques (taken from [40]), and
state some amazing results.
5.8.17 Find the Fourier series of f (x) = x on [−π, π]. (Find the answer below)
sin 2x sin 3x
x ∼ 2 sin x − + − ...
2 3
5.8.19 Find the Fourier series of f (x) = x3 on [−π, π]. (Answer: See the expansion
below, which converges pointwise to x3 if |x| < π)
∞
−π 2
3
X
n 6
x ∼ 2(−1) + 3 sin nx.
n=1
n n
5.8.20 Find the Fourier series of f (x) = |x| on [−π, π]. (The convergence is also point-
wise in the points specified below).
∞
π X 4
|x| = − cos nx, |x| < π
2 n=0 (2n + 1)2 π
5.8.21 Let a ∈
/ Z be a real number. Find the Fourier series of cos ax in [−π, π].
∞
1 X 1 1
cot z = + + .
z n=1 z + nπ z − nπ
77
P∞ 1
Theorem 5.8.23 (Euler) If p ∈ N, then the number ζ(2p) = n=1 n2p is a rational
number times π 2p . More precisely,
22p−1 π 2p B2p
ζ(2p) = (−1)p+1 ,
(2p)!
Euler was the one to introduce the so-called Riemann zeta function, and produced other
interesting results on ζ(s), including the famous formula
∞
X 1 Y 1
s
= 1 .
n=1
n p prime
1 − ps
ζ is called the Riemann zeta function due to a fascinating paper of Riemann’s, where
he proves its analytic continuation in two ways and connects it to the distribution of
prime numbers. The Prime Number Theorem (asymptotic distribution of prime numbers)
had been conjectured by Gauss himself, and was settled independently by Hadamard
and de la Vallée-Poussin using the Riemann zeta function. The Riemann Hypothesis,
which Riemann conjectured in that outstanding paper, is one of the Clay Math Institute’s
Millennium Problems on mathematics.
78
Chapter 6
Proposition 6.1.1 (Base change) Let (ei ), (ui ) be two bases of an n-dimensional vector
space E, and let B be a bilinear form on E. If Γ is the matrix with columns (ui )e , then
Bu = ΓT Be Γ.
79
Definition Let E be a real vector space. An inner product on E is a bilinear form (, )
on E satisfying the following properties:
A symmetric bilinear form such that S(x, x) ≥ 0 but S(x, x) may be 0 for some x 6= 0 is
called positive semidefinite.
Remark 6.1.2 Note that a bilinear form on a vector space E is determined by the values
B(ei , ej ), where (ei ) is a basis of E.
6.1.4 Let E = h1, cos x, cos 2x, · · ·R, cos nx, sin x, · · · , sin nxi ⊂ C ∞ (R). Find the ma-
2π
trix of the bilinear form B(f, g) = 0 f g in the basis listed. Using this, prove that the
functions are linearly independent.
Example 6.1.5 Let ABCD be the standard regular tetrahedron in Euclidean space R3 .
~ AC,
Find the matrix of the canonical inner product in the basis AB, ~ AD.
~
Note that the norm of each vector is 1, and the inner product of two distinct vectors is
1/2. The Gram matrix with respect to this basis is therefore
1 1
1 2 2
1 1 1 .
2 2
1 1
2 2
1
Proof: Indeed, fixing a basis (ei ) on E and its dual basis (e∗i ) on E ∗ , the matrices of both
maps correspond to Be and BeT .
80
Definition Let E be a complex vector space. A sesquilinear form is a map
B :E×E →C
that is bilinear over R, and such that B(λx, y) = λB(x, y), B(x, µy) = µB(x, y) for all
λ, µ ∈ C, x, y ∈ E. In other words, B is bilinear in the first variable and semilinear (i.e.
antilinear) on the second. A sesquilinear form B is called hermitian if B(y, x) = B(y, x)
for all x, y ∈ E.
Proposition
P P Let dim E = n ∈ N (finite-dimensional). If ei is a basis of E and
6.1.9
x = xi ei , y = yj ej , then
B(e1 , e1 ) . . . B(e1 , en ) y1
.
.. .
.. .
.. .
B(x, y) = (x1 · · · xn ) .. .
B(en , e1 ) . . . B(en , en ) yn
Proposition 6.1.10 (Base change for sesquilinear forms) Let (ei ), (ui ) be two bases
of an n-dimensional vector space E over C, and let B be a sesquilinear form on E. If Γ
is the matrix with columns (ui )e , then
Bu = ΓT Be Γ.
An hermitian form such that S(x, x) ≥ 0 but S(x, x) may be 0 for some x 6= 0 is called
positive semidefinite.
81
Example 6.2.1 Rn , resp Cn with the canonical euclidean (resp. hermitian) inner product
are prime examples of euclidean (resp. hermitian) vector spaces.
Example 6.2.2 Let [a, b] ⊂ R be a compact interval, and let V ⊂ CC ([a, b]) be a vector
subspace. Consider the inner product
Z b
hf, gi = f g.
a
Example 6.2.3 Prove that the system (eikx )k∈Z in CC ([−π, π])R 2πis an orthonormal sys-
1
tem, when choosing the normalised inner product hf, ghL2 = 2π 0 f g. Conclude that this
system is linearly independent, and compare with 6.1.4.
and is called Hilbert-Schmidt inner product. (The real version is T r(X T Y ) on Mm×n (R).)
This matrix is called Gram matrix of the basis ei . Given a system of vectors u1 , . . . , ur ,
the Gram matrix associated with the ui ’s is the r × r-matrix
(u1 , u1 ) . . . (u1 , un )
Ge = ... .. .. .
. .
(un , u1 ) . . . (un , un )
Example 6.2.5 In an euclidean vector space E, the Gram matrix associated with an
orthonormal basis ei is the identity, Ge = I. Thus, the inner product in this basis is
expressed as follows: X X X
( xi ei , yj ej ) = xi y i .
If the basis is orthogonal, then Ge is diagonal and non-degenerate (det Ge 6= 0).
82
6.2.6 The Hilbert-Schmidt inner product is the only inner product on Mm×n (C) having
the canonical basis of matrices as an orthonormal basis.
This in turn is equivalent to Re (u, v) ≤ |(u, v)| ≤ |u|· |v|, and for equality to hold the
Cauchy-Schwarz inequality must turn into an equality, i.e. u, v are linearly dependent.
Assuming that none is zero, we have u = αv, and for both inequalities to become equalities
it is necessary and sufficient to have a ∈ [0, ∞).
Note that the notion of angle between two nonzero vectors u, v 6= 0 can now be defined:
such angle θ is the only angle between 0 and π such that cos θ = Re|u|·|v|
(u,v)
.
83
T
Notation: Let A ∈ Mn (R) (resp. C). We denote by A∗ the matrix A (which in the real
case is of course AT ).
6.2.10 Let A be a real or complex m × n matrix. Prove that ker A∗ A = ker A . Analo-
gously, prove that Im AA∗ = Im A.
84
Proof: The ui form a basis of F = hu1 , · · · , ur i. Choose an orthonormal basis (ei ) of F ;
one has
Gu = ΓT Γ,
where Γi = (ui )e base change matrix, so indeed det Gu = | det Γ|2 > 0.
6.2.14 Prove that u1 , · · · , ur are linearly dependent if and only if their Gram matrix is
non-invertible, namely det Gu = 0.
Remark Note that Corollary 6.2.13 and 6.2.14 imply the Cauchy-Schwarz inequality
when r = 2, and form a ‘high-degree’ version thereof.
F ⊕ F ⊥ = E.
Proof: The Gram-Schmidt process makes the proof quick, though it is not strictly nec-
essary. It is clear that F ∩ F ⊥ = 0, for if v ∈ F ∩ F ⊥ , then (v, v) = 0, hence v = 0.
Given an orthonormal basis (ei )1≤i≤r of F , extend it to a basis (ei ) of E. By the Gram-
Schmidt process, one may assume that this basis is orthonormal. It is easy to see that
F ⊥ = her+1 , · · · , en i, which proves the theorem.
Theorem 6.3.1 (Sylvester’s criterion) The following are equivalent for an n × n real
symmetric or complex hermitian matrix B:
1. B is positive definite.
for all r = 1, · · · , n.
85
Proof: One direction follows from Corollary 6.2.13. Assuming 2, let us prove 1 by
induction on n. If n = 1, it is clear. If n > 1, assuming 2 for all r we have: 1 holds for
the subspace he1 , · · · , en−1 i. Thus, since the submatrix B 0 = (bij )1≤i,j≤n−1 satisfies 2, B 0
is positive definite, and so by Corollary 6.2.12 admits an orthonormal basis v1 , · · · , vn−1 .
Let the base change matrix be Γ0 (square (n − 1) × (n − 1)). After performing a base
change, B transforms as follows:
0 T
Γ 0 B w Γ 0
(?) = T .
0 1 w α 0 1
One sees that (?), which is the matrix of B in the basis v1 , · · · , vn−1 , en of Cn , corresponds
to
I a
(?) = Φ = .
aT β
Pn−1
Now, by virtue of 2 we still have det Φ > 0, i.e. det Φ = β − 1=1 |ai |2 > 0 (see e.g. ??).
The matrix Φ still satisfies 2, and the vectors e1 , . . . , en−1 of the canonical basis are
orthonormal. Let us complete the Gram-Schmidt process (if possible).
Indeed, it remains to modify en so that e0n shall be orthogonal to ei , i ≤ n − 1. Consider
n−1
X
e0n = en − (en , ei )ei :
i=1
clearly,
Pn−1 e0n ⊥ ei , i ≤ n − 1 by construction (do check it!) One also has that e0T 0
n Φen =
β − i=1 |ai |2 = µ > 0, and so e1 , . . . , en−1 , √1µ e0n form an orthonormal basis for Φ, which
means that Φ is positive definite and hence so is G.
Theorem 6.3.3 Let G be a real n × n symmetric matrix, with nonzero angular minors
µk 6= 0 of every order k. There is a triangular basis wherein G admits a diagonal form
D = diag(d1 , · · · , dn ), and the signs of the diagonal are as follows: sgn(d1 ) = sgn(g11 ),
and sgn(dk ) = sgn(µk /µk−1 ) for 2 ≤ k ≤ n. An analogous statement holds for a complex
hermitian matrix G: a decomposition LT DL = G exists, and the rest holds verbatim.
86
6.4 The adjoint operator
Definition Given two euclidean or hermitian spaces E, F (of finite dimension), the ad-
joint operator of a linear map f : E → F is the unique linear map f ∗ : F → E such
that, for every x ∈ E, y ∈ F
AT G0 = GB.
In other words,
−1 T T
(6.2) B = G A G = (GT )−1 A GT .
Remark Note that, if E, F are fixed, then the map f 7→ f ∗ from HomC (E, F ) to
HomC (F, E) is R-linear, and C-semilinear in the hermitian case.
Corollary 6.4.2 Choose orthonormal bases (ei ) of E and (uj ) of F . If the matrix of f
T
in these bases is A, then the matrix of f ∗ in the same bases is A .
Proof: We prove the first statement only. f (x) = 0 if and only if (f (x), v) = 0 for all
v ∈ E2 , i.e. if for all v ∈ E2 0 = (x, f ∗ (v)), but this is tantamount to x ∈ (Im f ∗ )⊥ .
Corollary 6.4.4 Given a matrix A ∈ Mm×n (C), one has ker A = (Im A∗ )⊥ , and ker A∗ =
(Im A)⊥ .
6.4.5 Isometries are always injective, and invertible in the case of endomorphisms of a
finite-dimensional vector space.
87
6.5 Spectral theorem for self-adjoint operators
Here E is finite-dimensional.
and so λ is real, since (v, v) 6= 0. The same proof holds for the euclidean case by extending
scalars from R to C on the matrix A, see Exercise ??.
Proof: Let w ∈ F ⊥ . For any u ∈ F , one has (u, f ∗ (w)) = (f (u), w) = 0, which shows
that f ∗ (w) ∈ F ⊥ .
where µ, M are the smallest (resp. greatest) eigenvalues of f . Each one of the equalities
turns into an equality for some value of x.
6.5.5 Prove that f ∗ = −f if and only if, (x, f (x)) = 0 for every x ∈ E.
6.5.6 Prove the spectral theorem for anti-self-adjoint endomorphisms (complex case): if
A∗ = −A, then A admits an orthonormal basis of eigenvectors.
88
6.6 Spectral theorem for unitary endomorphisms
There is a basic observation one can make about the eigenvalues of unitary endomor-
phisms.
Lemma 6.6.1 A unitary endomorphisms of an hermitian vector space has all its eigen-
values of modulus 1.
Theorem 6.6.2 Let E be a finite-dimensional hermitian vector space, and let f ∈ End(E).
There is an orthonormal basis of E in which f diagonalises, and all its eigenvalues are
complex numbers of modulus 1.
Theorem 6.6.3 (Orthogonal case) Let E be euclidean, and let f be orthogonal, i.e.
f f ∗ = IE . f admits an orthonormal basis in which the matrix of f has the form
Ir 0 0 ··· 0
0 −Is 0 · · · 0
0
0 N 1 · · · 0 ,
. . . . . . . . . . . . . . . . . . . . .
0 0 0 · · · Nt
cos θi − sin θi
where Nk = .
sin θi cos θi
Proof: Fix an orthonormal basis, and consider A to be the matrix of f in this basis. The
case λ = ±1 is plain. Suppose that λ ∈ / R is an eigenvalue of A, and that u is a unit vector
in its eigenspace. The vector u is an eigenvector of A, its eigenvalue is λ, and u, u form
an orthonormal system in Cn . Now, hu, ui ∩ Rn is 2-dimensional, and an orthonormal
basis thereof is given by √12 (u + u), i√1 2 (u − u). One may associate with each eigenvalue λ
with Im λ > 0 an orthonormal basis uij , and then pair each vector with its conjugate in
ker(A − λI) as above. Since ker(A − λI) ⊥ ker(A − λ0 I) for λ 6= λ0 , one thus obtains an
orthonormal basis for the whole of Rn .
89
6.7 Normal endomorphisms
Definition Let f : E → E be an endomorphism of a real Euclidean or complex hermitian
space. We say that f is normal if f f ∗ = f ∗ f .
T T T
6.7.1 Let A ∈ Mn (C). Suppose that A, A commute, i.e. AA = A A. Show that
T T
ker A = ker A and Im A = Im A .
6.9 Problems
6.9.1 Consider a set of vectors u1 , · · · , uk ∈ E, where E is an arbitrary vector space
with an inner product h•, •i. Prove that the ui are LD if and only if the Gram matrix
gij = hui , uj i is degenerate, i.e. has determinant 0. More precisely, prove that the rank of
the Gram matrix of the ui coincides with dimhu1 , · · · , uk i.
6.9.2 Given f1 , . . . , fr ∈ CC [a, b] complex continuous functions on [a, b], prove that
dimhf1 , · · · , fr i = rk G(fi ) .
6.9.3 (OIMU 2013, P2) Let V be an infinite-dimensional real vector space, and let
u ∈ V . Let S ⊂ V be an infinite-dimensional vector subspace. Calculate dim(S⊕hui)∩S ⊥ .
90
Solution for 6.9.3: Let F = (S ⊕ hui) ∩ S ⊥ . We shall prove that the desired dim F is 0
or 1.
If u ∈ S ⊥ , the answer is clear: F = S ⊥ ∩ (S ⊕ hui) = hui has dimension 1. Otherwise, let
w ∈ S be such that w • u = 1. We shall assume after rescaling u that |w| = 1. We have
s = s − (s, u)w + (s, u)w, which incarnates the direct sum decomposition S = ker ω ⊕ hwi,
where ω : S → R is the linear functional given by ω(s) = (u, s).
Note that there is no loss of generality in assuming that E = S ⊕ hui. Consider a nonzero
element σ + au ∈ S ⊥ . This means that, for every s ∈ S,
Taking s = w ∈ S, a = −(w, σ) is determined. Thus, our element is of the form σ−(w, σ)u
and satisfies
0 = (σ − (w, σ)u, s) = (σ, s) − (s, u)(w, σ), ∀s ∈ S.
Rearranging the above yields, for all s ∈ S:
(s − (s, u)w, σ) = 0, ∀s ∈ S.
Note that T (s) = s − (s, u)w has image Im T = u⊥ ∩ S, and that ker T = hwi. T is a
projection, albeit not of the orthogonal kind, for T 2 = T and w need not be orthogonal
to u⊥ . The condition on σ becomes: σ ∈ S is orthogonal to Su = u⊥ ∩ S.
Thus, we have the following. Let H = Im T ⊂ S = u⊥ ∩ S. If there is an orthogonal
⊥
decomposition H ⊕ hu0 i = S, then H may be described as u⊥ 0 for u0 ∈ S, and σ ∈ hu0 i
describes a 1-dimensional subspace. If such decomposition does not exist, then σ may
only be 0, and F = 0.
If E is a Hilbert space and S ⊂ E is a closed subspace, then such u0 always exists [25,
Th. V.1.6, Cor. V.1.8]. Otherwise, such decomposition may not exist.
6.9.5 Given an euclidean vector space, prove that given u, v, w vectors one has
6.9.6 Let f : E → F be a linear map between hermitian spaces (of finite dimension),
and let f ∗ be its adjoint. Show that kf k = kf ∗ k.
6.9.7 Let m > n, and let A be an m × n-matrix of rank n. Consider the linear variety
L, defined by AX = I, of Mn×m (R). Find its element of minimal norm.
6.9.8 Let A ∈ Mn (C) be a normal matrix, i.e. AA∗ = A∗ A. Prove the spectral theorem
for A by reducing that for self-adjoint matrices.
91
6.9.9 (80 WLP 2019, B-3) Let u ∈ Rn be a unit vector, |u| = 1, and let
P = I −2uuT . Let Q be an n×n real orthogonal matrix with no eigenvectors of eigenvalue
1. Show that the matrix P Q has 1 as an eigenvalue.
6.9.10 Let u1 , · · · , un ∈ Rn be vectors such that kui k2 < 1. Prove that the vectors
P
ei + ui , for i = 1, · · · , n form a basis of Rn .
Hint for 6.9.10: Write a matrix I + X, and prove that X is small enough so I + X is
invertible.
6.9.12 Let A be a real symmetric matrix. Prove that r = rk A is the largest number
for which there is a nonzero principal minor of order r, AII 6= 0, where I ⊂ {1, 2, . . . , n},
|I| = r.
6.9.13 Given a positive semidefinite hermitian matrix A ∈ Mn (C), prove that there
exists a unique hermitian square root B = B ∗ of A that is also positive semidefinite.
6.9.14 (IMC 2021, P5) Let A ∈ Mn (R) be such that, for all m ∈ N, there exists a
symmetric B ∈ Mn (R) such that
2021B = Am + B 2 .
6.9.15 Let A ∈ GLn (C). Show that there are unique matrices H positive definite her-
mitian and U unitary such that A = HU (this is called polar decomposition). If
A ∈ Mn (C) is not necessarily invertible, prove the existence of such decomposition. What
about uniqueness then?
6.9.16 [?, Th. 2.1.4, Cors. 2.1.5 and 2.1.6] Let u1 , . . . , un be linearly independent
vectors in an hermitian or euclidean vector space. Prove that the Gram-Schmidt process
may be obtained by considering the vectors
and by taking φk = √ 1
ϕk , where ∆k is the Gram determinant of u1 , . . . , uk , ∆k =
∆k ∆k−1
det((ui , uj ))1≤i,j≤k and ∆0 = 1.
6.9.17 Define
α −β
H={ } ⊂ M2 (C).
β α
92
(a) Prove that H is an R-algebra (with the matrix product).
P HP −1 ⊂ H ⇔ P ∈ SU (2).
6.9.19 Compute the dimensions of the vector subspaces of self-adjoint and anti-self-
adjoint endomorphisms of Mn (R) and Mn (C), respectively. Prove that there is a direct
sum decomposition in both cases.
6.9.20 Prove the spectral theorem for normal endomorphisms using the case for hermi-
tian endomorphisms.
6.9.21 Prove the spectral theorem for normal endomorphisms using polar decomposition.
A(A∗ A)k = B.
(6.3) A + A∗ = A2 A∗ .
6.9.25 Show that SU (2)/ ± I ∼ = SO(3). Prove first that every matrix A ∈ SO(3) may
be written as
2
a + b2 − c 2 − d 2 2(bc − ad) 2(ac + bd)
ρ(a, b, c, d) = 2(ad + bc) a2 − b 2 + c 2 − d 2 2(cd − ab) ,
2(bd − ac) 2(ab + cd) a2 − b2 − c2 + d2
where a2 + b2 + c2 + d2 = 1.
93
6.9.26 (CIIM 2022) Let A ∈ M2 (R). Let v be a unit vector, and assume the following
conditions:
(i) The vectors Av, A2 v, A3 v are also unit vectors;
(ii) one has A2 v 6= ±v, A2 v 6= ±Av.
Prove that AT A = I.
Solution for 6.9.26: Let B(x, y) = (Ax, Ay) − (x, y). The symmetric bilinear form B
is zero if and only if B(ui , uj ) = 0 for (ui ) a basis of R2 . If A = λI, one has λ = ±1 by
(i) (which violates (ii)), so mA (x) = pA (x). If τ = tr A, δ = det A, one has A2 = τ A − δA
(Cayley-Hamilton), and we shall assume for now that τ δ 6= 0. Consider the basis u1 =
v, u2 = Av.
In the case where τ δ 6= 0, note that
From the above and using A2 v = −δv + τ Av, the equality B(A2 v, A2 v) = 0 becomes
If Q(u) = (Au, Au) − (u, u), then we have Q(v) = 0, Q(Av) = 0, where v, Av are linearly
independent but this points out to the formula (6.4). Since Q(A2 v) = 0, this points to
more zeros than those listed in (6.4), if A2 v is not parallel to v or Av. This turns out to
be the case (statement plus short arguments), and forces Q to be identically zero.
Solution for 6.7.1: We know that ker A = ker A∗ A, Im A = Im AA∗ . Thus, AA∗ = A∗ A
means ker A∗ = ker AA∗ = ker A∗ A = ker A, and hence also that Im A = (ker A∗ )⊥ =
(ker A)⊥ (use Proposition 6.4.3).
94
Proof of the spectral theorem for normal endomorphisms 6.7.2: One has
⊥
ker A ⊕ Im A = Cn ,
and so Im A = Im A2 , i.e. ker A = ker Ak for every k ∈ N. The multiplicity of the factor
x in the minimal polynomial mA (x) is therefore 1. Since Im A = (ker A)⊥ (which in turn
equal Im A∗ ), one could proceed by induction, but we shall do the following.
Apply 6.7.1 to A − λI, where λ is an eigenvalue. Within Im A, if λ 6= 0 then (A − λI)∗ =
A∗ − λI satisfies:
⊥ ⊥
Cn = ker A ⊕ ker(A − λ1 I) ⊕ (Im A ∩ (Im (A − λ1 I)),
95
96
Chapter 7
Quadratic forms
In the whole chapter, we shall conform ourselves to the real case, i.e. K = R. Over C,
the results are simpler. The term form stands for an homogeneous polynomial: a linear
form is a homogeneous polynomial of degree 1, such as 2x + 4y − 5z. A quadratic form
shall be of degree two, for instance x2 − 2yz + 2z 2 − 2xy.
7.1 Introduction
A form is an homogeneous polynomial function. A linear form (in n variables) is a
polynomial of the form a1 x1 + . . . + an xn , where ai are constants.
P A quadratic form is
a homogeneous polynomial of degree 2, hence of the form 1≤i≤j≤n cij xi xj .
Note that there are n(n+1)
2
monomials xi xj (1 ≤ i ≤ j ≤ n) of degree 2 in n variables, so
the vector space of quadratic forms has dimension n(n + 1)/2. The connection with the
vector space of symmetric matrices of order n is more than a coincidence in dimension.
P
Proposition 7.1.1 Let x = (x1 , . . . , xn ). For every quadratic form Q(x) = 1≤i≤j≤n cij xi xj
there is a unique symmetric matrix A such that Q(x) = xT Ax.
97
Remark 7.1.2 In the above discussion, there is one case where base change works the
same in both cases, that of endomorphisms and of quadratic forms. That is when P is
orthogonal, i.e. P T = P −1 .
7.1.3 Prove that there is a linear (homogeneous) change of variables that takes the
quadratic form xy to x2 − y 2 . (Hint: Aren’t both products?)
We have two problems in the case of quadratic forms: finding invariants (under linear
change of coordinates), and methods to decide whether two quadratic forms are equivalent:
namely, given two symmetric n × n real matrices A, B, how may we decide whether
B = P T AP for some P ∈ GLn (R)? (Recall that GLn is the linear group, which consists
of all invertible n × n matrices).
Definition Let Q(x) = xT Ax be a quadratic form, and let A be its associated symmetric
matrix. We say that Q, or A, is non-degenerate if det A 6= 0. (By Remark 7.2.1, this
condition is invariant under linear change of coordinates).
In other words:
Theorem 7.2.2 The quadratic forms associated to symmetric matrices A, B ∈ Symn (R)
are metrically equivalent (i.e. there is an orthogonal matrix P such that P T AP = B) if
and only if
Example 7.2.3 The difference between metric equivalence and general equivalence of two
quadratic forms is illustrated in the following example. Let Q(x, y) = x2 + y 2 , and let
2 2
Q0 (x, y) = xa + yb , where 0 < a < b. The change of variables x = au, y = bv
transforms Q0 into Q. However, if we take the level curves, say Q(x, y) = 1 and Q0 (x, y) =
1, it is clear that one is a circle and the other one is not, although indeed Q0 (au, bv) =
u2 + v 2 .
98
Example 7.2.4 Clearly, if D, D0 are two diagonal matrices associated to two quadratic
forms, of the same rank r and with the same number of positive eigenvectors, we have the
following. Assume that in both cases the positive eigenvalues are in rows 1 to s:
r
X r
X
Q(x) = δ1 x21 + ... + δs x2s + δi x2i , Q0 (y) = δ10 y12 + ... + δs0 ys2 + δi0 yi2 ;
i=s+1 i=s+1
q
δi
If i = + δi0
, i = 1 for i > r, then yi = i xi (which we write as y = P x, where P is
diagonal of elements i ), where Q0 (y) = Q0 (P x) = Q(x).
Theorem 7.2.5 (Sylvester’s Law of Inertia) Rank and signature are the only invari-
ants of a quadratic form (here, we mean by general linear change of coordinates). In other
words, two quadratic forms may be transformed into one another by means of a linear
change of coordinates if and only if their ranks and signatures are equal.
Proof: We shall proceed by induction on n. Assume that aii 6= 0 for some i. For sim-
plicity, we shall assume that a11 6= 0 (else P one may effect a permutation of the variables).
The terms including x1 are: a11 x21 + 2 ni=2 a1i x1 xi , which we may write as
!2 !2
n n n
X a1i X a1i X a1i
a11 (x21 + 2 x1 xi ) = a11 x1 + xi − xi .
i=2
a 11 i=2
a 11 i=2
a 11
Thus,
n
!2
X a1i
Q(x1 , . . . , xn ) = a11 z12 − xi + q1 (x2 , . . . , xn ).
i=2
a11
In other words, our quadratic form in the variables z1 , x2 , . . . , xn is of the form a11 z12 +
q 00 (x2 , . . . , xn ), where
n
!2
X a1i
q2 (x2 , . . . , xn ) = −a11 xi + q1 (x2 , . . . , xn ).
i=2
a 11
99
Note that (z1 , x2 , x3 , . . . , xn ) is obtained from x by multiplying by an elementary matrix
– one may wish to define zi = xi , for i > 1. The proof by induction is concluded in this
case.
If all diagonal elements of the associated symmetric matrix are zero, either A = 0 (done),
or there is a term aij 6= 0 for some i 6= j. Let us suppose for simplicity that a12 6= 0.
The term a12 x1 x2 may be written as a difference of squares, x1 x2 = 14 [y12 − y22 ], where
y1 = x1 + x2 , y2 = x1 − x2 , and the resulting quadratic form has a011 , a022 6= 0, so the above
induction step works.
Corollary 7.3.2 A quadratic form Q of rank r is equivalent to one of the form ri=1 ±x2i .
P
as desired.
Proof
Pr of Theorem 7.2.5:PLet A, B be two invertible P matrices such P that Q(Ay) =
2 r 2 p 2 q 2
i=1 ±y i and Q(By) = i=1 ±y i . One has Q(x) = i=1 ω(x) − i=1 ξ(x) and
Pp 0 0
Q(x) = i=1 α(x)2 − qi=1 β(x)2 , where p + q = r = p0 + q 0 , where α, β, ω, ξ are lin-
P
ear forms. Assume that p 6= p0 , q 6= q 0 (we assume that p < p0 ). This means that q 0 < q.
One may write the above as follows:
p q p 0 q 0
X X X X
2 2 2
Q(x) = ωi (x) − ξi (x) = αi (x) − βi (x)2 .
i=1 i=1 i=1 i=1
in other words,
p q0 p0 q
X X X X
2 2 2
ωi (x) + βi (x) = αi (x) + ξi (x)2 = (?).
i=1 i=1 i=1 i=1
Remark We have finally shown that both rank and signature characterise a quadratic
form over R up to equivalence. Over fields such as Q, the issue is much more complicated,
see e.g. J.P. Serre, A course in arithmetic for an introductory text to the subject, which
is still active nowadays.
100
The following is a useful computational tool.
Theorem 7.3.3 Let A be the real symmetric matrix associated with a real quadratic form
Q(x) = xT Ax. Assume that the angular (or leading) minors of A,
a11 . . . a1k
∆k = ... . . . ... ,
ak1 . . . akk
∆2 ∆k ∆n
∆1 , ,..., ,...,
∆1 ∆k−1 ∆n−1
B(u1 , u1 ) . . . B(u1 , uk )
B(u2 , u1 ) . . . B(u2 , uk )
∆1 = B(u1 , u1 ), . . . , .. .. .. ,k ≤ r
. . .
B(uk , u1 ) . . . B(uk , uk )
101
since it has two equal columns. Likewise, we see that B(uk , vk ) = ∆k , and since vk =
∆k−1 uk + (a linear combination of u1 , . . . , uk−1 ) we have
which yields an orthogonal basis that diagonalises B. The signature of B reads plainly
from this presentation, and coincides with the description given.
Note that the Gram-Schmidt process, Sylvester’s criterion to characterise inner products
and Theorem 7.3.3 follow from the above argument. A stronger version of Theorem 7.3.3
is as follows.
Theorem 7.3.4 Let B(x, y) = xT Ay be a quadratic form on Rn , where A = AT is of
n
rank 1 and has lead minors ∆1 , . .. , ∆r 6= 0. There is a basis v1 , . . . , vn of R which
D 0
diagonalises B and such that Bv = , where D is r × r and has the shape described
0 0
in Theorem 7.3.3.
0
A A00
Proof: Indeed, since ∆r 6= 0, we write A = , where A0 is invertible. The
B 0 B 00
equations for ker A have the following shape:
y
x= , A0 y + A00 z = 0, i.e. y = −(A0 )−1 A00 z = Cz, where C = −(A0 )−1 A00 .
z
C
Thus, a basis vr+1 , . . . , vn of ker A = Im , together with v1 , . . . , vr shall form the
In−r
desired basis of Rn .
7.4 Quadrics
One may consider, given a quadratic form, the level hypersurfaces given by Q(x) = c,
c ∈ R. The study of their shape may be undertaken by using the tools provided here.
Consider the example x2 + y 2 + z 2 − r2 = 0. If we translate this surface by the vector
(a, b, c), the resulting equation is
102
The equation may be expressed in matrix form as follows: let A = (aij ) be the symmetric
matrix associated with the degree-2 part of the above (aii = αii , aij = αij /2 = aji if
i < j), and let b = (bk ) ∈ Rn be a column vector. In block matrix form, the quadric may
be written as
T A bT /2 x
(x 1) = 0.
b/2 c 1
We shall often consider two quadrics equal whenever their equations differ by a constant
factor. This holds whenever the underlying locus is empty or lower-dimensional, namely
x2 + 2y 2 = 0 and x2 + y 2 = 0 are different quadrics in R2 and R3 .
Example 7.4.1 (Conics) Quadrics on the plane are called conics, a name provided by
Apollonius. The ellipse (x/a)2 + (y/b)2 − 1 = 0 for a, b > 0 is but an example.
and if Q(x) is the quadratic polynomial whose zero locus defines our quadric, the change
of variables Q(P z + q) looks as follows in matrix form:
T
T A bT /2 x T P AP bT z
Q(P z + q) = (x 1) = (z 1) .
b/2 c 1 b c 1
We say that two quadrics are equivalent if Q1 (x) is a constant multiple of Q2 after
effecting a suitable affine change of coordinates x = P z + q.
The affine classification of quadrics consists of two aspects: Firstly, finding a particu-
larly simple form that is equivalent to a given quadric, and presenting a full list of those
types up to equivalence.
Metric classification is similar, except that we consider only those transformations
x = P z + q where the linear part P is orthogonal, P T P = I. Thus, the distances are
preserved: for x0 = P z 0 + q, x00 = P z 00 + q, we have kx0 − x00 k = kP (z 0 − z 00 )k = kz 0 − z 00 k.
103
Example 7.4.3 Consider the general real conic given by the equation
T A bT /2 x
Q(x) = (x 1) = 0.
b/2 c 1
Here A 6= 0 is real symmetric of order 2 (if A = 0 the equation would be linear). The
rank of A may be therefore 1 or 2. If rk A = 2, then b ∈ Im A and the equation is
x2 x2
metrically equivalent to one of the form a21 ± b22 − 1 = 0 if the real locus of zeros is
nonempty, the sign being + if A is (positive or negative) definite (which yields an ellipse
if definite, a hyperbola if indefinite). In the case where rk A = 1, choose the sign so
thatA
1 0
is positive semidefinite: A is metrically equivalent to a matrix of the form (xT 1) ,
0 0
and subtracting to b a suitable element of the image of A (see the proof of Proposition
7.4.2) yields after rescaling an equation of the form
x21 + αx2 + β = 0,
which after writing x2 = ±x02 + γ yields x21 + αx2 = 0.
x2 ± (y/b)2 ± (z/c)2 = 0,
where a, b, c > 0.
If Q(0) 6= 0 (assuming the linear part vanishes), then our Q with a diagonal A looks as
follows:
7.6 Problems
A b
7.6.1 Consider a complex conic Q given by the matrix M = , where A 6= 0.
bT 1
Prove that Q is a pair of lines if and only if det M = 0 (this includes the case of the
double line, i.e. Q(x) = (ax + by + c)2 = 0).
7.6.2 Consider two conics which do not share a line, Q1 = 0, Q2 = 0. Prove that
their intersection consists at most of four points. (Hint: Consider the pencil of conics
Q1 + λQ2 = 0, and find the degenerate conics in the pencil).
7.6.3 Let Q be a quadratic form on Rn . Let F be a vector subspace of Rn such that Q|F
is positive definite. Prove that F ⊥ ⊕ F = Rn .
104
Appendix: Signature and Gauss elimination
In this appendix we provide a proof of Theorem 7.3.3 using the LU decomposition. Note
that this result vastly generalises Sylvester’s characterisation of positive definite quadratic
forms.
The following result may be proven by induction, using block matrices.
Theorem 7.6.4 A square matrix A of order n, all of which angular minors are nonzero,
admits a unique decomposition A = LU , where L is lower triangular unipotent (tii = 1,
tij = 0 for i < j) and U is upper triangular. Moreover, if A is symmetric, then U may be
written as U = DLT , where D is diagonal.
The proof of the first assertion is carried out in the Chapter pertaining to linear sys-
tems (block matrices and induction on n), and the second assertion follows readily from
uniqueness of L, U .
L11 0
Back to Theorem 7.3.3, write L = (blocks of size k and n−k, respectively) and
L
21 L22
D1 0
do the same with D: D = , where D = diag(λ1 , . . . , λk ). Denote the submatrix
0 D2
Ak of A to be that formed by the first k rows and columns. Block multiplication yields
Ak = L11 D1 LT11 , hence ∆k = (det L11 )2 det D1 = det D1 , for L11 is triangular unipotent.
The Theorem follows.
Corollary 7.6.5 (Theorem 7.3.4 Let A be a real symmetric matrix of rank r, such that
its principal minors ∆i 6= 0 for P
1 ≤ i ≤ r. The quadratic form Q associated withA is
equivalent to the quadratic form ri=1 βi x2i , where β1 = ∆1 and βi = ∆∆i−1
i
, 2 ≤ i ≤ r and
βi = 0 for i > r.
The Corollary follows from applying Theorem 7.3.3 to the quadratic form Q|he1 , . . . , er i.
105
106
Bibliography
[3] M.A. Armstrong, Groups and Symmetry, Springer UTM, 1st Ed, 1998.
[9] D.B. Fraleigh, A First Course in Abstract Algebra, 7th Ed, Pearson 2002.
[10] F.Brochero, Gugu Moreira et al, Teoria dos números. Um passeio pelos primos, 3rd
Ed, SBM.
[20] K. Hardy, K.S. Williams, The Green Book of Mathematical Problems, Dover, 1985.
107
[21] K. Hardy, K.S. Williams, The Red Book of Mathematical Problems, reprinted Dover,
1996.
[22] K. Hoffman, R. Kunze, Linear Algebra, 2nd Ed., Prentice Hall, 1971.
[25] S. Lang, Real and Functional Analysis, 3rd Ed, Springer, 1993.
[31] G. Pólya, G. Szegö, Problems and Theorems in Analysis (2 Vols.), Springer, 1972.
[34] A. Reventos, Affine Maps, Euclidean Motions and Quadrics, UTM, Springer, 2011.
[41] I.M. Vinogradov, An introduction to the theory of numbers, Pergamon Press, 1955.
108