
Selected extra chapters on linear algebra

José J. Ramón Marí

July 10, 2023


Chapter 1

Determinants

Let us go back for a moment to the picture we showed in Chapter (vectors?). If u = (a, b), v = (c, d) ∈ R^2, then the oriented area of the parallelogram formed by u, v is given by the formula

| a  c |
| b  d | = ad − bc = |u|·|v| sin(β − α),

where α, β are the respective arguments of u, v.
If we swap u, v, the result changes sign. Also, given the sign of the sine function, (figure missing on positive and negative orientations) ...

1.1 Learning lessons from R^3


Given u, v, w ∈ R3 vectors, the oriented volume of the parallelopiped formed by u, v, w in
this order satisfies a series of properties, and each one is eloquently explained by a figure
(all figures missing here!).
det(u, v, w) = (u × v) • w views u, v as the (oriented) base and w is the “height provider”.
If w is parallel to the base, i.e. w ∈ ⟨u, v⟩, then det(u, v, w) = 0.
Let n = (u × v)/‖u × v‖. If w = a·n + αu + βv, then

(u × v) • w = a‖u × v‖,
and the figures provide a visual explanation for the fact that (u × v) • [w + λu + µv] =
(u × v) • w. Note that Cavalieri’s principle provides a visual proof for additivity!
The behaviour of this oriented volume with respect to permutation of the arguments is
well-known: its sign changes for transpositions, i.e. whenever we swap two vectors, e.g.
swapping u, w we get (u × v) • w = −(w × v) • u, etc.. In fact, should we wish to consider
u, w to form the base of the parallelopiped, we write
(u × v) • w = (w × u) • v = −(u × w) • v,
and proceed accordingly.
Inspired by this example, one may list a set of axioms that a function called ‘signed
volume’ or ‘oriented volume’ should satisfy, and then come up with the determinant.

1.2 Alternate n-linear forms on K^n
We have working knowledge of 2 × 2 and 3 × 3 determinants. We shall guide ourselves by
this geometrical knowledge to construct an n-dimensional generalisation, which shall be
the determinant of a square matrix of order n. First we shall list the essential properties
of the determinant.
Let V : R^n × · · · × R^n → R (n factors) be an ‘oriented volume function’. From our 2- and 3-dimensional experience, we distilled the following rules.
DET0) (normalization) V (e1 , e2 , . . . , en ) = 1;
DET1) (multilinearity) the function V is n-linear, i.e. linear on each variable if we fix
the arguments in the others. To wit,

V (λv + µw, u2 , . . . , un ) = λV (v, u2 , . . . , un ) + µV (w, u2 , . . . , un ),

and in general

V (u1 , . . . , ui−1 , λv + µw, ui+1 , . . . , un ) =


λV (u1 , . . . , ui−1 , v, ui+1 , . . . , un ) + µV (u1 , . . . , ui−1 , w, ui+1 , . . . , un );

DET2) (alternate) the function V is alternate, i.e. if we swap two arguments, the value
of V changes precisely in a sign:

V (u1 , . . . uj (place i), . . . , ui (place j), . . . , un ) = −V (u1 , . . . , ui , . . . , uj , . . . , un ).

An obvious consequence (and reformulation!) is that, if we enter the same vector v


in two different arguments, then

V (u1 , . . . , v, . . . , v, . . . , un ) = 0.

Explanation:
• DET1) has the following explanation behind it. If we consider the i-th component to be the height carrier, and the other n − 1 vectors to form the base of the n-dimensional parallelopiped, then linearity is what we expected based on our 2- and 3-dimensional experience.
• DET2) contains the following. If the height-carrying vector is parallel to the base (i.e. the span of the remaining n − 1 vectors), then the determinant must be zero! (For further details, see Theorem 1.3.2 below.)
• DET2) contains some information regarding orientation. If we permute the vectors u1, · · · , un, then the value changes by a sign.
• DET0) carries the normalization value, i.e. it fixes as 1 the volume of the canonical parallelopiped, with the natural order of the vectors (since it is a signed volume, the vectors e.g. e2, e1, e3, · · · , en will yield the value −1 after normalising the value of V(e1, · · · , en) = 1).

Let us retrieve the determinant in the 2 × 2 and 3 × 3 cases. Note that manipulations effected
with the canonical basis work for any basis (though this comment shall be made clear in
the 2nd Semester of the course).

Example 1.2.1 If n = 2, then V (ae1 +be2 , ce1 +de2 ) = aV (e1 , ce1 +de2 )+bV (e2 , ce1 +de2 ).
After developing the expression, it equals

acV (e1 , e1 ) + adV (e1 , e2 ) + bcV (e2 , e1 ) + bdV (e2 , e2 );

in turn, V(ei, ei) = 0 by DET2, and V(e2, e1) = −V(e1, e2), hence

V (ae1 + be2 , ce1 + de2 ) = (ad − bc)V (e1 , e2 ).

Example 1.2.2 For n = 3, let us compute

V (A, B, C) = V (a1 e1 + a2 e2 + a3 e3 , b1 e1 + b2 e2 + b3 e3 , c1 e1 + c2 e2 + c3 e3 ).

By multilinearity, V (A, B, C) = a1 b2 c3 V (e1 , e2 , e3 ) + a1 b2 c2 V (e1 , e2 , e2 ) + · · · Observe,


however, that each time ei appears twice in the expansion, we obtain 0 (alternate), so
only the terms where all the ei appear explicitly may survive:

V(A, B, C) = a1 b2 c3 V(e1, e2, e3) + a1 b3 c2 V(e1, e3, e2) + a2 b1 c3 V(e2, e1, e3) + a2 b3 c1 V(e2, e3, e1) + a3 b1 c2 V(e3, e1, e2) + a3 b2 c1 V(e3, e2, e1).

To complete the process, observe the following recipe.

RECIPE: When we have a permutation of n elements in the arguments of V , as is


the case, the advice is to swap elements in pairs, leaving en in the rightmost place,
then en−1 in its rightful place, and so on, all along applying alternateness. For instance,
V (e2 , e3 , e1 ) = −V (e2 , e1 , e3 ), and in turn −V (e2 , e1 , e3 ) = V (e1 , e2 , e3 ).

Thus we see that

V (A, B, C) = (a1 b2 c3 − a1 b3 c2 − a2 b1 c3 + a2 b3 c1 + a3 b1 c2 − a3 b2 c1 ) V (e1 , e2 , e3 ).

In other words, V (A, B, C) = det(A, B, C)V (e1 , e2 , e3 ).
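
Readers who wish to double-check such expansions by machine may use a computer algebra system; the following sketch (Python/SymPy, with symbol names of our own choosing) compares the six-term expression above with the 3 × 3 determinant.

# A sketch (SymPy): the six-term expansion equals the 3 x 3 determinant.
from sympy import symbols, Matrix

a1, a2, a3, b1, b2, b3, c1, c2, c3 = symbols('a1 a2 a3 b1 b2 b3 c1 c2 c3')

# matrix with columns A, B, C as in Example 1.2.2
M = Matrix([[a1, b1, c1],
            [a2, b2, c2],
            [a3, b3, c3]])

six_terms = (a1*b2*c3 - a1*b3*c2 - a2*b1*c3
             + a2*b3*c1 + a3*b1*c2 - a3*b2*c1)

assert (M.det() - six_terms).expand() == 0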


Properties DET1, DET2 determine V up to a constant, and condition V (e1 , . . . , en ) = 1
finally fixes the determinant function.

Definition The determinant of n vectors in K^n is given by properties DET1, DET2, DET0.


Let A be an n × n square matrix. We define the determinant of the matrix A as
det A = det(A1 , . . . , An ).

1.3 Binet’s formula
Theorem 1.3.1 det AB = det A det B.

Proof: Consider T(u1, · · · , un) = det(Au1, · · · , Aun). Since T is n-linear and alternate, we have T(u1, · · · , un) = T(e1, · · · , en) · det(u1, · · · , un), the determinant being taken in the basis (ei). In our case, det AB = det(AB1, · · · , ABn), where Bi is the i-th column of B (likewise ABi is the i-th column of AB). Note that T(e1, · · · , en) = det A, so indeed det AB = T(B1, · · · , Bn) = det A · det(B1, · · · , Bn) = det A det B. □
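
As a quick numerical illustration of Binet's formula (a sketch using NumPy on a random example; it is no substitute for the proof):

# A sketch (NumPy): det(AB) = det(A) det(B) on a random 4 x 4 example.
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
B = rng.standard_normal((4, 4))

assert np.isclose(np.linalg.det(A @ B),
                  np.linalg.det(A) * np.linalg.det(B))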

IMPORTANT: What Theorem 1.3.1 proves is that the determinant of a square matrix A, det A, is the scaling factor between the volume of an n-parallelopiped generated by the vectors u1, . . . , un and that of its image by A, i.e. the one determined by Au1, . . . , Aun. The sign of det A shows whether the orientation of the Aui is the same as, or opposite to, that of the ui.

Theorem 1.3.2 Let A be an n × n matrix. Then: det A = 0 if and only if its columns
A1 , · · · , An are linearly dependent.

Proof: Suppose that the columns are LD. For simplicity (or after applying DET2), suppose that A1 = Σ_{i=2}^n λi Ai. One has:

det(A1, . . . , An) = det(Σ_{i≥2} λi Ai, A2, . . . , An) = Σ_{i=2}^n λi det(Ai, A2, · · · , Ai, · · · , An) = 0,

by linearity and alternateness. Now, assume that the columns of A are linearly independent: since the Ai form a basis of K^n, the canonical basis is such that ei = Σ_k b_{ki} Ak. This means that I = AB, and by Binet's Formula det A det B = det I = 1, hence det A ≠ 0. □
In order to have an explicit formula for the determinant, we use matrix notation: Aj = Σ_i a_{ij} ei. By DET1, we have:

V(A1, . . . , An) = V(Σ_{k1=1}^n a_{k1 1} e_{k1}, · · · , Σ_{kn=1}^n a_{kn n} e_{kn}) = Σ_{k1,...,kn} a_{k1 1} · · · a_{kn n} V(e_{k1}, . . . , e_{kn}).

Again, by DET2, only the terms where k1 , . . . , kn distinct shall survive, i.e. {k1 , . . . , kn } =
{1, . . . , n}. Here we use the notation for permutations: a permutation of n elements is
a bijection σ : {1, . . . , n} → {1, . . . , n}. The set (group) of permutations of {1, · · · , n} is
denoted by Sn , and is called the symmetric group of n elements.
Back to the expression, we have

V(A1, . . . , An) = Σ_{σ∈Sn} a_{σ(1)1} · · · a_{σ(n)n} V(e_{σ(1)}, . . . , e_{σ(n)}) = (?);

in turn, V(e_{σ(1)}, . . . , e_{σ(n)}) = sgn(σ) V(e1, . . . , en), where sgn(σ) = ±1 (the sign of σ) is related to the sign changes via DET2.

Definition A permutation τ ∈ Sn is called a transposition (between i and j, with i ≠ j) if τ(i) = j, τ(j) = i and τ(k) = k for every k ≠ i, j. In other words, a transposition is a swap between two elements i, j. One usually writes τ = (i j). (Note that τ⁻¹ = τ for every transposition τ.)

Theorem 1.3.3 Every permutation σ ∈ Sn factors as a product of transpositions. If


σ = τ1 · · · τk , then σ −1 = τk · · · τ1 .

The following illustrates a general principle that provides a general proof.


Example 1.3.4 Let V : R^5 × · · · × R^5 → R (5 factors) satisfy DET1), DET2). Consider the canonical basis ei. We wish to find out what V(e2, e3, e4, e5, e1) is. Remember to put the last elements last, and so on until all are back in position.
First, we swap e5 and e1 , so that e5 is back at its 5th place. By DET2), one has

V (e2 , e3 , e4 , e5 , e1 ) = −V (e2 , e3 , e4 , e1 , e5 ).
Now e4 goes to its rightful place, and for that we swap e1 , e4 :
V (e2 , e3 , e1 , e4 , e5 ) = −V (e2 , e3 , e4 , e1 , e5 ).
We do the same with e3 and e2 , which entails two more sign changes, and thus we get
V (e2 , e3 , e4 , e5 , e1 ) = V (e1 , e2 , e3 , e4 , e5 ).
Note that, if σ = τ1 · · · τr, where the τi are transpositions (so τi = τi⁻¹), then using (ab)⁻¹ = b⁻¹a⁻¹ yields

σ⁻¹ = (τ1 · · · τr)⁻¹ = τr⁻¹ · · · τ1⁻¹ = τr · · · τ1.
(The process we showed in Example 1.3.4 in fact provides a decomposition of the inverse
σ −1 , but we shall not dwell on this, and instead refer to the first Chapter on groups.)

Example 1.3.5 (Sign of a cycle) A cycle of order r is a permutation of this kind: there are r distinct elements a1, · · · , ar (within {1, · · · , n}), i.e. ai ≠ aj for i ≠ j, such that σ(ai) = ai+1 for i < r, σ(ar) = a1, and all other elements are fixed, i.e. σ(j) = j for j ∉ {a1, · · · , ar}. We write such σ as σ = (a1 a2 . . . ar).
One has (a1 a2 . . . ar) = (a1 a2)(a2 a3) · · · (ar−1 ar), so its sign is (−1)^{r−1}.

Corollary 1.3.6 sgn(σ) = (−1)k , where σ = τ1 · · · τk .

CLAIM: (proven in the Appendix at the end of this Chapter) The function sgn is well
defined, and multiplicative, i.e.: sgn(ση) = sgn(σ)sgn(η) for σ, η ∈ Sn .

That is how we obtain the formula for the n × n determinant:

(1.1)    det A = Σ_{σ∈Sn} sgn(σ) a_{σ(1)1} · · · a_{σ(n)n},

where sgn is the sign made explicit in Corollary 1.3.6. Another important result follows:

Proposition 1.3.7 det A = det A^T.

Proof: Clearly, a_{σ(k)k} = (A^T)_{k σ(k)}. Also, when we have the graph of a bijection, the graph of its inverse is obtained by transposing the horizontal and vertical axes (i.e. reflection through the diagonal) in the case of real functions of a real variable. This also holds for permutations of {1, 2, · · · , n} (A PICTURE IS LACKING!).
Therefore, for every σ ∈ Sn, it follows from the former paragraph that

∏_{k=1}^n a_{σ(k)k} = ∏_{ℓ=1}^n a_{ℓ σ⁻¹(ℓ)},

just swapping k for ℓ = σ(k). On the other hand, sgn(σ) = sgn(σ⁻¹), so therefore det A = det A^T. □
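
Formula (1.1) and Proposition 1.3.7 can be tested directly from the definitions. The sketch below (plain Python; the function names are ours) computes the determinant by summing over all permutations, with the sign obtained by counting inversions as in the Appendix.

# A sketch: the determinant via formula (1.1), summing over all of S_n.
from itertools import permutations
from math import prod

def sgn(sigma):
    # sign computed from the number of inversions N(sigma), as in the Appendix
    inversions = sum(1 for i in range(len(sigma))
                       for j in range(i + 1, len(sigma))
                       if sigma[i] > sigma[j])
    return -1 if inversions % 2 else 1

def det(A):
    # sum over sigma of sgn(sigma) * a_{sigma(1)1} ... a_{sigma(n)n}
    n = len(A)
    return sum(sgn(s) * prod(A[s[col]][col] for col in range(n))
               for s in permutations(range(n)))

A = [[2, 1, 0],
     [0, 3, 1],
     [4, 1, 1]]
A_T = [list(row) for row in zip(*A)]
assert det(A) == det(A_T) == 8    # Proposition 1.3.7 on this example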

1.4 Minors. Laplace’s rule


Clearly, once we have a matrix A with the first column A1 = e1, the determinant reduces to order n − 1. Indeed,

                          | a22 · · · a2n |
det(e1, A2, · · · , An) = | . . . . . . . |
                          | an2 · · · ann |.

For instance, one may argue that σ(1) = 1 (hence aσ(1)1 = 1) for every permutation with
a nonzero product aσ(1)1 · · · aσ(n)n .
Likewise one may proceed with general A1 :
n
X
det A = det(A1 , A2 , · · · , An ) = ai1 det(ei , A2 , · · · , An ),
i=1

and so it suffices to process each det(ei, A2, · · · , An). In order to permute the rows 1, 2, · · · , i, · · · , n into i, 1, 2, · · · , i − 1, i + 1, · · · , n one need only perform i − 1 row swaps, i.e. transpositions (see Example 1.3.5), and then we reduce to the case i = 1: det(ei, A2, · · · , An) = (−1)^{i−1} det(e1, A'2, · · · , A'n), where the matrix A' is obtained from A by taking as first row the i-th row of A, and as 2nd to n-th rows the 1st to n-th rows of A after deleting the i-th row. If n = 4 and i = 3, then A' is the matrix whose rows are, in order, the 3rd, 1st, 2nd and 4th rows of A, and det(e3, A2, A3, A4) = (−1)^{3−1} det(e1, A'2, A'3, A'4).
In all, this argument and its column analogue yield the following result.

Theorem 1.4.1 (Laplace’s rule, poor version) Let A be a square matrix of order n.

1. (Developing by a column) Fix j (the j-th column of A). Then:

det A = Σ_{k=1}^n (−1)^{k+j} a_{kj} A^{ĵ}_{k̂};

2. (Developing by a row) Fix i (the i-th row of A). Then: det A = Σ_{k=1}^n (−1)^{i+k} a_{ik} A^{k̂}_{î},

where A^{b̂}_{â} denotes the determinant of the submatrix resulting from erasing the a-th row and the b-th column.

Proof: The argument is essentially given before the statement.
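
As an illustration of Theorem 1.4.1 (and not code from the text), the development by a row translates into a short recursive procedure:

# A sketch: Laplace expansion along the first row, applied recursively.
def laplace_det(A):
    n = len(A)
    if n == 1:
        return A[0][0]
    total = 0
    for j in range(n):
        minor = [row[:j] + row[j + 1:] for row in A[1:]]   # erase row 1 and column j+1
        total += (-1) ** j * A[0][j] * laplace_det(minor)
    return total

# triangular example: the determinant is the product of the diagonal entries
assert laplace_det([[2, 1, 3],
                    [0, 4, 5],
                    [0, 0, 6]]) == 48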

Definition Let B ∈ Mm×n(K) be a matrix. A minor, or minor determinant of B, of order r is the determinant of a square submatrix of B, which is obtained by choosing r rows i1 < · · · < ir and r columns j1 < · · · < jr of B. The minor with rows I = {i1, · · · , ir} and columns J = {j1, · · · , jr} is the determinant

        | b_{i1 j1} · · · b_{i1 jr} |
B^I_J = | . . . . . . . . . . . . . |
        | b_{ir j1} · · · b_{ir jr} |.

Laplace’s rule has a full-fledged version, which we shall refrain from stating or proving
here.

1.4.1 Cofactors. The adjugate matrix

Theorem 1.4.2 The following identities hold for A ∈ Mn(K), where the adjugate matrix adj A is the transpose of the matrix of cofactors, i.e. (adj A)_{ij} = (−1)^{i+j} A^{î}_{ĵ} in the notation above:

A · adj A = adj A · A = (det A) I.

Thus, if A is invertible then A⁻¹ = (1/det A) adj A.

Proof: Laplace’s rule yields the following lemma.

Lemma 1.4.3 Let A be a square matrix of order n.


1. Fix j (the j-th column of A). Then:

det A = Σ_{k=1}^n (−1)^{k+j} a_{kj} A^{ĵ}_{k̂};

2. Fix j (the j-th column of A), and let ℓ ≠ j. Then:

Σ_{k=1}^n (−1)^{k+j} a_{kℓ} A^{ĵ}_{k̂} = 0.

3. Fix i (the i-th row of A). Then: det A = Σ_{k=1}^n (−1)^{i+k} a_{ik} A^{k̂}_{î};

4. Fix i (the i-th row of A), and let j ≠ i. Then: Σ_{k=1}^n (−1)^{i+k} a_{jk} A^{k̂}_{î} = 0.
Proof: Parts 1 and 3 are in Theorem 1.4.1. Part 4 is Part 2 applied to A^T. Part 2 follows from considering the determinant of A, deleting the j-th column and writing the ℓ-th column in its place. Thus, clearly

| a11 · · · a1ℓ · · · a1ℓ · · · a1n |
| a21 · · · a2ℓ · · · a2ℓ · · · a2n |  = 0,
| ................................. |
| an1 · · · anℓ · · · anℓ · · · ann |

since this matrix has two equal columns; its Laplace expansion along the j-th column (the position where the copy of the ℓ-th column was written) is precisely the sum in Part 2. The products that appear in the Lemma say precisely that A · adj A = adj A · A = (det A) · I. □
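
Theorem 1.4.2 and Lemma 1.4.3 are easy to verify symbolically on a generic matrix; a small SymPy sketch (ours):

# A sketch (SymPy): A * adj(A) = adj(A) * A = det(A) * I on a generic 3 x 3 matrix.
from sympy import Matrix, symbols, eye

A = Matrix(3, 3, list(symbols('a0:9')))
zero = Matrix.zeros(3, 3)

assert (A * A.adjugate() - A.det() * eye(3)).expand() == zero
assert (A.adjugate() * A - A.det() * eye(3)).expand() == zero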

1.4.2 Characterizing the rank via minors


Theorem 1.4.4 Let A1, · · · , Ar be r column vectors in K^n, r ≤ n. If they are linearly independent, then there are r rows 1 ≤ i1 < · · · < ir ≤ n (write I = {i1, · · · , ir}) such that the minor

              | a_{i1 1} · · · a_{i1 r} |
A^I_{[1,r]} = | . . . . . . . . . . . . |
              | a_{ir 1} · · · a_{ir r} |

is nonzero.
Proof: Let us prove an equivalent statement. Given a set A1, · · · , Ar of linearly independent vectors in K^n, perform the mixing as in Steinitz's Theorem ?? so as to obtain a new basis A1, · · · , Ar, e_{k1}, · · · , e_{k_{n−r}}. Expanding along the columns e_{kj}, one has

0 ≠ det(A1, · · · , Ar, e_{k1}, · · · , e_{k_{n−r}}) = ± A^I_{[1,r]},

where I = {1, . . . , n} − {k1, · · · , k_{n−r}}, hence A^I_{[1,r]} ≠ 0. □
Theorem 1.4.5 Let M be an m×n matrix with coefficients in K. The rank of M , rk M ,
is the largest order r of a nonzero minor of M .
Proof: Let ρ = rk M . If r > ρ, then r columns of M are surely linearly dependent, so
any r × r minor of M is necessarily zero. If r ≤ ρ, by Theorem 1.4.4 there is some nonzero
minor of order r, so ρ is the biggest among such numbers. 
Corollary 1.4.6 Let A1, · · · , Ar ∈ K^n be linearly independent vectors. If r < n, then their span F = ⟨A1, · · · , Ar⟩ is given by the following cartesian equations. Fix I = {i1 < · · · < ir} so that A^I_{[1,r]} ≠ 0. F is defined by the following n − r equations: for every j ∈ I^c,

             | a_{i1 1} · · · a_{i1 r}  x_{i1} |
(?) x ∈ F ⇔ | . . . . . . . . . . . . . . . . | = 0.
             | a_{ir 1} · · · a_{ir r}  x_{ir} |
             | a_{j 1}  · · · a_{j r}   x_j   |

Proof: Firstly, the condition that x ∈ F is equivalent to rk(A1 · · · Ar x) < r + 1, which is to say that every minor of order r + 1 is zero.
Let us show that it suffices to test the minors of order r + 1 whose rows are I ∪ {j}, j ∈ I^c. If (?) holds then, developing the determinant along its rightmost column, A^I_{[1,r]} x_j is a linear combination of x_{i1}, · · · , x_{ir}, and hence so is x_j. Thus the solution set of the n − r equations has dimension r; since it contains F and dim F = r, it equals F. □
Example 1.4.7 Consider the vector subspace F = ⟨(1, 2, 1, 4)^T, (2, 1, 1, −1)^T⟩. Let us write a complete set of cartesian equations for F.
Consider a generic vector of unknowns x, y, z, t. First of all, we spot in the matrix

    | 1   2 |
A = | 2   1 |
    | 1   1 |
    | 4  −1 |

the 2 × 2 minor of the first two rows and columns, which is nonzero. The augmented matrix is now

     | 1   2  x |
A' = | 2   1  y |
     | 1   1  z |
     | 4  −1  t |,

and fixing the first two rows yields two equations:

| 1  2  x |
| 2  1  y | = 0
| 1  1  z |

and

| 1  2  x |
| 2  1  y | = 0.
| 4 −1  t |

If the reader should choose another two rows, the results would be the same, up to linear combinations of both equations, of course.
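
The computation of Example 1.4.7 can be reproduced symbolically. The sketch below (SymPy; variable names ours) forms the two 3 × 3 minors and checks that both generators of F satisfy the resulting equations.

# A sketch (SymPy): cartesian equations of F = <(1,2,1,4), (2,1,1,-1)> via 3 x 3 minors.
from sympy import Matrix, symbols

x, y, z, t = symbols('x y z t')
A1 = Matrix([1, 2, 1, 4])
A2 = Matrix([2, 1, 1, -1])
X = Matrix([x, y, z, t])
aug = Matrix.hstack(A1, A2, X)                  # the augmented matrix A'

eq1 = aug.extract([0, 1, 2], [0, 1, 2]).det()   # rows 1, 2, 3
eq2 = aug.extract([0, 1, 3], [0, 1, 2]).det()   # rows 1, 2, 4

# both generators of F satisfy the two equations
for v in (A1, A2):
    point = dict(zip((x, y, z, t), v))
    assert eq1.subs(point) == 0 and eq2.subs(point) == 0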

Example 1.4.8 (An oldie follows from Theorem 1.4.5) Recall the case of u, v ∈
Rn , of coordinates ui , vj respectively. The rank of the matrix (u v) is then the highest
order of a nonzero minor: if one of the vectors is nonzero, then it is at least 1 (a nonzero
coordinate is a nonzero 1 × 1 minor!). The rank is precisely 2 if and only if there is a
nonzero minor ui vj − uj vi . We knew this, but now Theorem 1.4.5 vastly generalises this
old result.

1.4.3 Cramer’s rule. Inverses via minors
Theorem 1.4.9 (Cramer's rule) Let A be an invertible n × n matrix. Let b ∈ K^n. There is a unique solution to the linear system Ax = b, of unknowns x = (x1 x2 · · · xn)^T. The value of each xi is

xi = det(A1, · · · , Ai−1, b, Ai+1, · · · , An) / det A.

Proof: Clearly, Ax = b ⇔ x = A⁻¹b, so x = (1/det A)(adj A) b, and reverse application of Laplace's rule settles the Theorem. □
Remark An alternative proof of 1.4.9 goes as follows. The solution exists, for the columns A1, · · · , An of A form a basis of K^n and b = Σ_i Ai xi has a unique set of coordinates. Now, write the determinant

det(A1, · · · , Ai−1, b, Ai+1, · · · , An) = Σ_{k=1}^n det(A1, · · · , Ai−1, Ak xk, Ai+1, · · · , An) = (?).

Expanding on the i-th column yields (?) = det(A1, · · · , Ai−1, Ai xi, Ai+1, · · · , An) + 0, since the terms with Ak in the i-th position for k ≠ i yield zero by DET2. It follows that

det(A1 , · · · , Ai−1 , b, Ai+1 , · · · , An ) = det(A1 , · · · , Ai−1 , Ai , Ai+1 , · · · , An )xi = xi det A,

which yields Cramer’s rule.
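
Cramer's rule is also easy to turn into a procedure; the following SymPy sketch (illustrative only) compares it with an ordinary solve.

# A sketch (SymPy): Cramer's rule versus a direct solve.
from sympy import Matrix

A = Matrix([[2, 1, 1],
            [1, 3, 2],
            [1, 0, 0]])
b = Matrix([4, 5, 6])
detA = A.det()                      # nonzero, so the system has a unique solution

x = []
for i in range(A.cols):
    Ai = A.copy()
    Ai[:, i] = b                    # replace the i-th column by b
    x.append(Ai.det() / detA)

assert Matrix(x) == A.solve(b)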

1.5 The Wronskian


Remark The reader should read only up to Exercise 1.5.2, and only proceed to read the
rest if further explorations on the subject are desired. Theorem 1.5.1 is all we shall need
to use.

Let f1, · · · , fn ∈ C^{n−1}(I), where I ⊂ R is an interval. Assume that the fi are linearly dependent. This means that one has real constants α1, · · · , αn, not all zero, such that

Σ_{i=1}^n αi fi(x) = 0,   ∀x ∈ I.

Much more information is contained here than a mere equation. In fact, if we differentiate the above identity up to n − 1 times, we get a homogeneous system of n equations and n unknowns with a nontrivial solution for every x ∈ I:

| f1(x)         f2(x)         · · ·  fn(x)        |   | α1 |   | 0 |
| f1'(x)        f2'(x)        · · ·  fn'(x)       | · | α2 | = | 0 |
| ...........................................     |   | .. |   | . |
| f1^(n−1)(x)   f2^(n−1)(x)   · · ·  fn^(n−1)(x)  |   | αn |   | 0 |

Definition The Wronskian, or Wronski's determinant, of n functions fi ∈ C^{n−1}(I) is the determinant function

                       | f1(x)         f2(x)         · · ·  fn(x)        |
W(f1, · · · , fn)(x) = | f1'(x)        f2'(x)        · · ·  fn'(x)       |
                       | ...........................................     |
                       | f1^(n−1)(x)   f2^(n−1)(x)   · · ·  fn^(n−1)(x)  |.

The following was proven in the above discussion.

Theorem 1.5.1 Let f1 , · · · , fn ∈ C n−1 (I). If the fi are linearly dependent, then the
Wronskian W (f1 , · · · , fn ) is identically zero on I. 

Example 1.5.2 The Theorem allows us to prove that the functions 1, sin x, cos 2x, sin 2x
are linearly independent. We leave the reader to check that their Wronskian is not iden-
tically zero.
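
The verification left to the reader in Example 1.5.2 can be done with a computer algebra system; a SymPy sketch (the Wronskian turns out to be 24 cos x):

# A sketch (SymPy): the Wronskian of 1, sin x, cos 2x, sin 2x.
from sympy import symbols, sin, cos, wronskian, simplify

x = symbols('x')
W = simplify(wronskian([1, sin(x), cos(2*x), sin(2*x)], x))
assert W != 0     # W simplifies to 24*cos(x), not identically zero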

Example 1.5.3 Let u1(x) = x^3, u2(x) = x^2|x|. The Wronskian W(u1, u2) is identically zero, although these functions are linearly independent. Thus, the converse holds only under certain, albeit quite general, hypotheses.

Proposition 1.5.4 Let h, f1, · · · , fn ∈ C^∞(I), where I ⊂ R is an interval. One has W(h f1, · · · , h fn) = h^n W(f1, · · · , fn). Also, W(1, f2, · · · , fn) = W(f2', · · · , fn').

1.5.5 Prove Proposition 1.5.4.

Remark The following Theorem carries a principle of real analytic functions in its proof,
which we shall not state explicitly right now.

Theorem 1.5.6 Let u1, · · · , un ∈ C^ω(I) be real analytic functions. The functions u1, · · · , un are linearly independent if and only if their Wronskian w(x) = W(u1, · · · , un) is not identically zero.

Proof: Assume that the functions u1, · · · , un are linearly independent. We shall prove by induction on n ∈ N that the Wronskian is not identically zero, the case n = 1 being trivial. Let a ∈ I, and let r = min{ord_a ui, i ≤ n}, where ord_a f denotes the order of a as a zero of f (i.e. 0 if f(a) ≠ 0, and r = ord_a f is the minimum number such that f^(r)(a) ≠ 0).
If F ⊂ C^∞(I) is a vector subspace of dimension n, the Wronskian is independent of the basis of F chosen, up to a nonzero constant. Thus, we shall assume that

r1 = ord_a u1 < · · · < rn = ord_a un

(if the ui were LD, at least one would come across as identically 0, i.e. one of the orders would be ∞). This may be viewed as Gaussian elimination on the coefficients of the expansions centred around a. Up to a nonzero constant, we may assume that ui(x) = (x − a)^{ri} gi(x), where gi(a) = 1 and gi ∈ C^ω(I).
Restrict to an interval J such that a ∈ J ⊂ I and g1(x) is invertible in J. Applying Proposition 1.5.4 to the fi|J yields

W(f1, · · · , fn)|J = f1^n · W(1, (x − a)^{r2−r1} g2/g1, · · · , (x − a)^{rn−r1} gn/g1).

Now, clearly, if f1, · · · , fn are linearly independent, so are 1, f2/f1, · · · , fn/f1 (restricted to J), and this is in turn equivalent to the derivatives ϕ2', · · · , ϕn' being linearly independent, where ϕi = fi/f1 (defined over J). By induction, the Wronskian of ϕ2', · · · , ϕn' is not identically zero (over J), and so neither is W(f1, · · · , fn), as was to be demonstrated. □

1.6 Problems
The following was part of a test in 2018.

1.6.1 Let a1 , · · · , an ∈ R. Factor the polynomial P (x) defined as follows.

       | a1^2 + x   a1 a2      a1 a3    . . .  a1 an    |
P(x) = | a2 a1      a2^2 + x   a2 a3    . . .  a2 an    |
       | .............................................. |
       | an a1      an a2      an a3    . . .  an^2 + x |

The result below is the very foundation of L’Hôpital’s rule, for its proof stems from
this little gem. A determinantal guise is most recommended, both for the proof and for
memory purposes.

1.6.2 (Cauchy's Mean Value Theorem) Let f, g : [a, b] → R be continuous functions, differentiable on (a, b), such that g(b) ≠ g(a). Prove that there exists ξ ∈ (a, b) such that

(f(b) − f(a)) g'(ξ) = (g(b) − g(a)) f'(ξ).

Hint: Let f, g, h : [a, b] → R be continuous on [a, b] and differentiable on (a, b), and consider the function

       | f(a)  g(a)  h(a) |
H(x) = | f(b)  g(b)  h(b) |.
       | f(x)  g(x)  h(x) |
You might be surprised, but the following is very doable and you already made its ac-
quaintance!

1.6.3 K = R or C. Let A ∈ Mn(K) and let u, v ∈ K^n. Prove that

det | 1   v^T | = det(A − u v^T).
    | u   A   |

1.6.4 Given A ∈ Mn (Z) a square matrix with integer coefficients, give necessary and
sufficient conditions for A to have an inverse with integer coefficients.

1.6.5 Let t be an indeterminate. Consider the monomials t^{r1}, · · · , t^{rn}, where ri ∈ N. Let M be the matrix whose first row is t^{r1}, · · · , t^{rn}, with the (i+1)-th row being the derivative of the i-th row. Prove that det M = C t^N, where C is a real constant and N is a natural number, and find C, N explicitly.

This was an exam question back in 2019 (VE2, methinks).


1.6.6 Let x0 , · · · , xn ∈ R be pairwise distinct numbers, and let y0 , · · · , yn ∈ R. Prove
that there is precisely one polynomial p(x) ∈ R[x] of degree deg p ≤ n such that p(xi ) = yi ,
and that it is characterised by the condition

| 1   x0   x0^2   · · ·   x0^n   y0   |
| 1   x1   x1^2   · · ·   x1^n   y1   |
| ...................................| = 0.
| 1   xn   xn^2   · · ·   xn^n   yn   |
| 1   x    x^2    · · ·   x^n    p(x) |

Appendix Det-A: The sign of a permutation


This optional section is independent from the rest of the chapter, and should be read for
further study only.
We shall prove here that the sign function, sgn : Sn → {±1} is well-defined, and shall
state and prove its relation to the number of inversions of a permutation.
Given a permutation σ, it is not just a bijection of {1, · · · , n} to itself, but it also induces a bijection of P2(n) to itself. Here P2(n) is the set of unordered pairs {i, j}, and the induced map is P2(σ), defined by {i, j} ↦ {σ(i), σ(j)}. P2(σ⁻¹) is the inverse of P2(σ).
It may occur that i < j but σ(i) > σ(j). The number of inversions of σ is the number of pairs where the order is reversed by σ: N(σ) = #{(i, j) : i < j, σ(i) > σ(j)}.

Proposition 1.6.7 Let τ = (a b), with a < b, be a transposition in Sn. Then N(τ) = 2(b − a) − 1, which is odd.

Proof:

• Suppose that the indices i < j are both distinct from a, b. Then τ(i) = i < j = τ(j), and the pair presents no inversion.

• If i < a < b (and j = a or j = b, analogously), then the pair i, a passes to i, b and there is no inversion; likewise for i, b.

• If a < b < j (and i = a or i = b), then the pair a, j becomes b, j and the pair b, j becomes a, j (no inversions).

• If a < i < b (and j = b), the pair a, i becomes b, i, which gives b − a − 1 inversions. On the other hand, the pairs i, b become i, a, which accounts for another b − a − 1 inversions.

• If i = a < b = j, the pair i, j becomes j, i (one inversion).

The total number of inversions is therefore N((a b)) = 2(b − a) − 2 + 1 = 2(b − a) − 1 (odd). □

Theorem 1.6.8 Let σ1, σ2 ∈ Sn be permutations. Then (−1)^{N(σ1σ2)} = (−1)^{N(σ1)} (−1)^{N(σ2)}. Therefore, sgn(σ) = (−1)^{N(σ)} is well defined, i.e. independent of the factorization of σ into transpositions.

Proof: Let P = ∏_{1≤k<h≤n} (h − k) and, for σ ∈ Sn, let σP = ∏_{1≤k<h≤n} (σ(h) − σ(k)). Note that σP/P = ±1, for the factors of σP are indexed by the unordered pairs of distinct elements (P2(n)), and in calculating σP/P only the sign may vary between numerator and denominator. The sign of (σ(j) − σ(i))/(j − i) is negative if and only if (i, j) presents an inversion. This means that σP/P = (−1)^{N(σ)}, which depends only on σ.
On the other hand,

σ1σ2P / P = (σ1σ2P / σ2P) · (σ2P / P),

and reindexing (take i' = the smaller of the pair σ2(i), σ2(j) and j' to be the bigger) one sees that

σ1σ2P / σ2P = σ1P / P.

The sign of a transposition τ is −1, which matches (−1)^{N(τ)} by Proposition 1.6.7. But then the sign defined as (−1)^k if σ = τ1 · · · τk coincides with our version via the number of inversions. □


Chapter 2

Endomorphisms

2.1 Conventions
Let E be a finite-dimensional vector space over a field K (mostly R or C), and let T : E →
E be an endomorphism (over K), i.e. a K-linear map where both domain and codomain
are the same (i.e. E). There are several considerations to make.
Firstly, if e = (ei) is a basis of E and u ∈ E, we denote the vector of coordinates of u in the basis e by u_e, namely

u_e = (α1, α2, . . . , αn)^T,   where u = Σ_i αi ei.
Choice of bases/basis: In order to understand the changes in the geometry effected by
T , one expects to use the same basis both at source and target. Thus, if e = (ei ) is a
basis of E, we consider the matrix Me (T ) = [T ]ee .
2.1.1 Let u = (a, b, c) ∈ R3 . Consider the endomorphism T (x) = u × x. Find the matrix
of T in the canonical basis.
Once we fix E, the vector space HomK (E, E) = EndK (E) of K-endomorphisms of E
appears. In this vector space, a product structure emerges in EndK (E), i.e. the com-
position of endomorphisms: if S, T ∈ EndK (E), then S ◦ T ∈ EndK (E). This product
satisfies both distributive laws, I = IE is its unit element, and it is K-bilinear, namely
λ(S ◦ T ) = (λS) ◦ T = S ◦ (λT ), ∀λ ∈ K.
Example 2.1.2 (Powers of an endomorphism) Let E, T be as above, and let e =
(e1 , . . . , en ) be a basis of E. If S, T ∈ EndK (E) have respective associated matrices A, B
in the basis e, then S ◦ T has associated matrix AB in the same basis.
In particular, the powers S n have associated matrix An .
The convention we adopt is T 0 = I.
Base change: Let f : E → E be an endomorphism of E, dimK E < ∞, and let e = (ei), u = (ui) be two bases of E. If P is the invertible matrix whose columns are the coordinate vectors of the ui in the basis e, then Mu(f) = P⁻¹ Me(f) P; in particular, the matrices of f in two different bases are similar.
2.2 Warm-up problems
These problems shall be relevant later, some are repeated in other sections.
2.2.1 (Projectors/Idempotents) Let E be a vector space, and let p : E → E be an
endomorphism. We say that p is a projector, or that p is idempotent, if p2 = p. Prove
the following claims:
(i) If p is a projector, then so is I − p;
(ii) ker p = Im(I − p), and ker(I − p) = Im p.
(iii) E = ker p ⊕ Im p.
2.2.2 Consider C as a real vector space. Take the basis {1, i} of C over R. Given
α = a+bi, write down the matrix of the endomorphism mα : C → C given by multiplication
by α, namely, mα (z) = αz viewed as a real endomorphism of C in the basis given {1, i}.
2.2.3 Let T : E → E be an endomorphism, with dim E < ∞. Prove that the following
are equivalent:
(i) ker T = ker T 2 ;
(ii) E = ker T ⊕ Im T.
2.2.4 Let A be a square matrix of order n ≥ 1. Assuming that A3 − A − 2I = 0, prove
that A is invertible.
2.2.5 (It will be essential afterwards) Let N be a nilpotent matrix of order n, N ∈ Mn(C) (this means that N^m = 0 for some m ∈ N). Prove that, if we denote Fj = Im N^j and Fk ≠ 0, then Fk+1 ⊊ Fk. Conclude that N^n = 0.

2.3 Introduction
Given an endomorphism f, say, of the plane R^2, a way to study it is to determine its invariants. The fixed directions offer important clues as to its nature.
Take for instance the matrix

A = | 2  0 |
    | 0  3 |.

The transformation of the plane caused by A leaves two lines through the origin stable by A; to wit, the coordinate axes. No other line through the origin is stable by A.
Note that a line through the origin is a one-dimensional subspace, L = ⟨v⟩, and that a one-dimensional subspace ⟨v⟩ ⊂ E is stable by an endomorphism f if and only if f(v) = λv for some λ ∈ K. Such vectors v ≠ 0 are called characteristic vectors, or eigenvectors, and the factor λ is called a characteristic value, or eigenvalue. The German eigen- (English transl. own) also means ‘peculiar to’.
Example 2.3.1 Let A ∈ Mn (C) be a square matrix of order n. Suppose that the nonzero
vector 0 6= u ∈ Cn is an eigenvector of A, of eigenvalue λ. Is u an eigenvector of Am ,
where m ∈ N? If so, what is its eigenvalue with respect to Am ?

2.4 The characteristic polynomial. Eigenspaces
Back to the finite-dimensional case, let us find out the fixed directions (equivalently, the
eigenvectors) and eigenvalues of a given endomorphism T : E → E. We shall assume that
we have chosen a basis e = (ei ) of E, and denote A = Me (T ).
The equation T (u) = λu has a nonzero solution u ∈ E − {0} if and only if

(T − λI)u = 0, for some u ≠ 0,

which is tantamount to det(T − λI) = 0. Fixing the basis e, we need to solve the equation
det(A − λI) = 0.

Definition The characteristic polynomial of an endomorphism T is


pT (x) = det(T − xI). After fixing a basis e of E, pT (x) = det(A − xI).
The equation det(A − xI) = 0 is called the characteristic equation for A.

Proposition 2.4.1 Let A = P BP −1 . The characteristic polynomials of A and B are the


same. In other words, the characteristic polynomial is invariant with respect to similarity
(which justifies our notation that pT depends only on T and not on the basis chosen).

Proof: Indeed, det(A − xI) = det(P BP −1 − xI) = det(P (B − xI)P −1 ). The rest follows
from multiplicativity of the determinant. 

Remark 2.4.2 Let us point out some specific coefficients of pA (x), where A is a square
matrix.
                        | a11 − x   a12       . . .  a1n     |
pA(x) = det(A − xI) =   | a21       a22 − x   . . .  a2n     |
                        | .................................. |
                        | an1       an2       . . .  ann − x |.
Note that, of the terms of the determinant, we have the product of all diagonal terms

(a11 − x) · · · (ann − x),

and the rest of the terms have degree ≤ n − 2 in x (there cannot be a term consisting of
precisely n − 1 diagonal terms!). On the other hand, pA (0) = det A, so the characteristic
polynomial looks like this:

det(A − xI) = (−1)^n x^n + (−1)^{n−1}(a11 + . . . + ann) x^{n−1} + . . . + det A.

Definition Let A be a square matrix of order n. The trace of A is defined to be

tr A = Σ_{i=1}^n aii.

By Remark 2.4.2, the trace of A equals the trace of P A P⁻¹, so we may define the trace of an endomorphism T ∈ EndK(E), tr T, where dim E < ∞.

Thus, the characteristic polynomial of a square matrix A of order n has the form

pA(x) = det(A − xI) = (−1)^n x^n + (−1)^{n−1} (tr A) x^{n−1} + . . . + det A.
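
One may also let a computer algebra system expand det(A − xI) and read off these coefficients; a brief SymPy sketch on a concrete 3 × 3 example (ours):

# A sketch (SymPy): leading coefficients of det(A - x I) for a concrete 3 x 3 matrix.
from sympy import Matrix, symbols, eye, Poly

x = symbols('x')
A = Matrix([[1, 2, 0],
            [3, 4, 5],
            [0, 1, 2]])
n = A.rows

coeffs = Poly((A - x * eye(n)).det(), x).all_coeffs()   # highest degree first
assert coeffs[0] == (-1) ** n                     # coefficient of x^n
assert coeffs[1] == (-1) ** (n - 1) * A.trace()   # coefficient of x^(n-1)
assert coeffs[-1] == A.det()                      # constant term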

Proposition 2.4.3 (Commutativity of the trace) Let A, B be an m × n matrix and


an n × m matrix, respectively. One has tr(AB) = tr(BA). In particular, the trace of a
square matrix is invariant by similarity.
We leave the proof to the reader as an exercise. We gave a different proof of the second
assertion in Remark 2.4.2.

2.4.4 Find the characteristic polynomial, eigenvalues and eigenvectors (real and complex) of the following matrices:

A = | 2  1 |,   B = | 3  1 |,   C = | a  −b | (b ≠ 0),   D = | 3  0 |,   E = | 1   1 |.
    | 0  3 |        | 1  3 |        | b   a |                | 0  3 |        | −1  3 |

Definition Let f ∈ EndK (E) be an endomorphism, where dimK E < ∞. We say that
f diagonalizes in a basis u = (ui ) of E if the matrix Mu (f ) = D is diagonal, in other
words, if f (ui ) = λi ui for all 1 ≤ i ≤ n. That is to say, that the basis (ui ) consists
of eigenvectors. If such basis exists but is not explicitly given, we shall say that f is
diagonalizable.

Definition Let T ∈ End(E), and let λ ∈ K be an eigenvalue of T . The eigenspace


associated with λ is the vector subspace Eλ = ker(T − λI).

Definition The algebraic multiplicity of an eigenvalue λ is the multiplicity of x − λ


in cf (x). The geometric multiplicity of an eigenvalue λ is dim ker(f − λI).

Proposition 2.4.5 The algebraic multiplicity of an eigenvalue is greater or equal than


its geometric multiplicity.

Proof: Let v1, · · · , vr be a basis of ker(f − λI). Complete it to a basis v of E. In this basis, f has the matrix

A = Mv(f) = | λ Ir   B |
            | 0      D |,

and det(A − xI) = (λ − x)^r det(D − xI), which settles the Proposition. □

2.4.6 Find all eigenvalues and eigenvectors of the matrix

M = | 1  2   1 |
    | 0  1  −1 |
    | 0  1  −1 |,
as well as the algebraic and geometric multiplicities of each eigenvalue.

2.4.7 Let A be an n × n matrix such that A^3 − 2A^2 − A + 2I = 0. Prove that A is


invertible. What eigenvalues may A have?

2.5 Eigenspaces as subspaces
Proposition 2.5.1 Let T ∈ End(E) be an endomorphism. Eigenvectors u1 , . . . , ur of
pairwise distinct eigenvalues λi 6= λj are linearly independent.

Proof: Assume that they are linearly dependent. Let 2 ≤ s ≤ r be the minimal number of nonzero coefficients occurring in a relation of linear dependence among the ui; for simplicity we assume that these nonzero coefficients accompany u1, · · · , us. We have ai ≠ 0 for i ≤ s, and

(2.1)  a1 u1 + · · · + as us = 0.

Applying T yields

(2.2)  λ1 a1 u1 + . . . + λs as us = 0,

and (2.2) − λ1·(2.1) yields

Σ_{i=2}^s (λi − λ1) ai ui = 0,

which contains precisely s − 1 nonzero coefficients, thus contradicting the minimality of s. □

Corollary 2.5.2 There are at most n different eigenvalues (n = dim E) for an endomorphism f ∈ End(E). If there are precisely n distinct eigenvalues, then the corresponding eigenvectors form a basis of E and f has a diagonal matrix in this basis.

Proof: The eigenvalues are the zeros of pf (x), which is of degree n, so there are no more
than n. If there are n distinct eigenvalues, there are n eigenvectors ui forming a basis of
E, and since f (ui ) = λi ui the matrix Mu (f ) is diagonal. 
Let us rephrase Proposition 2.5.1.

Proposition 2.5.3 If λ1, . . . , λr are the distinct eigenvalues of T ∈ End(E), then we have a direct sum

Σ_{i=1}^r ker(T − λi I) = ⊕_{i=1}^r ker(T − λi I) ⊂ E.

Here, subspaces Fi ⊂ E, i = 1, . . . , r, are in direct sum if the following linear map is an isomorphism:

F1 × · · · × Fr → F1 + · · · + Fr = Σ_{i=1}^r Fi ⊂ E,   (x1, . . . , xr) ↦ x1 + · · · + xr.

The preferred notation for the sum of the Fi's here is the more specific ⊕_{i=1}^r Fi, rather than Σ_i Fi.

No proof is required, since it is just a reformulation of Proposition 2.5.1.

Theorem 2.5.4 (First diagonalization criterion) Let f ∈ End(E), where dim E = n < ∞. f is diagonalizable (over K) if and only if the eigenspaces span E. In other words,

f is diagonalizable ⇔ ⊕_{i=1}^r ker(f − λi I) = E.

Another way to say this is: the sum of the geometric multiplicities of all eigenvalues of f (over K) equals n.

Proof: Firstly, one should note that there exists a basis of eigenvectors of E if and only if Σ_i ker(f − λi I) = E (we are using that F1 + . . . + Fr is the span of F1 ∪ . . . ∪ Fr). The dimension of each eigenspace, dim Eλi = dim ker(f − λi I), is the geometric multiplicity. We have the equality ⊕_{i=1}^r ker(f − λi I) = E if and only if the dimension of this subspace equals that of E, namely

Σ_i dim ker(f − λi I) = dim E.

This means that the sum of all geometric multiplicities of all eigenvalues over K equals the dimension of E. □

Corollary 2.5.5 Let f ∈ End(E), where dim E = n. Suppose that f has n pairwise
distinct eigenvalues λi , 1 ≤ i ≤ n, λi 6= λj for i 6= j. Then f diagonalizes.

Proof: Indeed, there are n eigenvectors (each unique up to a constant factor), and each eigenvalue has algebraic and geometric multiplicity 1; this follows from Proposition 2.4.5:

1 ≤ geom. mult. ≤ alg. mult.

Now, we have n eigenvectors ui associated with n distinct eigenvalues, which are linearly
independent, hence a basis since n = dim E. Thus, Mu (f ) is diagonal, which fulfils the
claim. 
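
A concrete instance of Corollary 2.5.5, checked with SymPy (an illustrative sketch, not part of the proof):

# A sketch (SymPy): three distinct eigenvalues, hence diagonalizable (Corollary 2.5.5).
from sympy import Matrix

A = Matrix([[2, 1, 0],
            [0, 3, 1],
            [0, 0, 5]])              # eigenvalues 2, 3, 5

P, D = A.diagonalize()               # columns of P are eigenvectors
assert P.det() != 0                  # the eigenvectors form a basis
assert A == P * D * P**-1
assert sorted(D[i, i] for i in range(3)) == [2, 3, 5]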
ni
Q
Remark 2.5.6 Clearly, when K = C, P A ∈ Mn (C), one has cA (x) = (λi − x) , and
also dim ker(T − λi I) ≤ ni , whereas ni = dim E from cA (x) (this exhausts all factors
x − λi ). Thus, in this case the matrix diagonalises if and only if the algebraic multiplicity
P
of each eigenvalue equals its geometric multiplicity (if not, we would have <, since ni =
n = dimC E already. Let us write it down:
X X
dim(f − λi I) ≤ ni = n, and equality holds iff dim ker(f − λi I) = ni , ∀i.

2.6 Least degree trick. The minimal polynomial


Let f ∈ End(E), where n = dim E ∈ N (finite dimension). Fixing a basis (ei), we get a matrix A. Since dim End(E) = dim Mn(K) = n^2, the successive powers of A cannot all be linearly independent. In fact, I, A, . . . , A^{n^2} (likewise I, f, f^2, . . . , f^{n^2}) are linearly dependent.

The least degree trick: Of all polynomials 0 6= p(x) ∈ K[x] such that p(A) = 0, there
is one which has the least degree. Let us take it to be monic, and let us denote it by
p0 (x). Every polynomial such that p(A) = 0 is a multiple of p0 (x). Note that a nonzero
polynomial of degree ≤ n2 which annihilates A already exists, by the above argument.
To prove this, consider the polynomial division of p(x) by p0 (x): p(x) = q(x)p0 (x) + r(x).
Since p(A) = 0 and p0 (A) = 0, we have r(A) = 0. Now, r(x) = 0, for otherwise we would
have r(A) = 0 with deg r(x) < deg p0 (x), thus contradicting minimality of deg p0 (x).
In our case, deg p0 (x) ≤ n2 , as we argued earlier.
Definition Let f, E be as above. The monic polynomial mf (x) of minimal degree among
those which annihilate f is called the minimal polynomial of f . (Monic means: of
lead term equal to 1).
Proposition 2.6.1 E, f as above. Let λ ∈ K be an eigenvalue of f. The factor x − λ divides the minimal polynomial mf(x) of f. Vice versa, if (x − a) | mf(x), then a ∈ K is an eigenvalue of f.

Proof: Indeed, if u ≠ 0 is an eigenvector of λ, then mf(f) = 0, so 0 = mf(f)u = mf(λ)u, which means mf(λ) = 0. Conversely, for every linear factor x − a of mf(x), we have det(f − aI) = 0: otherwise f − aI would be invertible and, since mf(x) = (x − a)q(x), from 0 = mf(f) = (f − aI)q(f) we would get q(f) = 0, thus contradicting the minimality of deg mf. □
Corollary 2.6.2 Given A ∈ Mn (C), all linear factors of mA (x) come from eigenvalues.

2.6.3 Compute the minimal polynomial of all matrices in Exercise 2.4.4, and of the matrix

F = | 2  1 |
    | 0  2 |.
Proposition 2.6.4 E, f as above. Let Eλi = ker(f − λi I) be the eigenspace associated with the eigenvalue λi of f (each distinct eigenvalue is counted only once!). Assume that f diagonalizes, i.e.

⊕_{i=1}^r ker(f − λi I) = E.

The minimal polynomial of f equals mf(x) = ∏_{i=1}^r (x − λi).

Proof: Indeed, note that a polynomial in f, p(f), is zero if and only if ker p(f) = E, and since the sum of the Eλi equals E in our case, this condition is equivalent to ker(f − λi I) ⊂ ker p(f), ∀i. Let u ∈ Eλi − {0}. p(f)u = p(λi)u = 0 if and only if p(λi) = 0, which means (x − λi) | p(x). This condition for all i is equivalent to saying that

∏_{i=1}^r (x − λi) | p(x),

so the monic polynomial of minimal degree is mf(x) = ∏_{i=1}^r (x − λi). □

Theorem 2.6.5 (Diagonalizability criterion) E, f as above. f diagonalizes if and only if mf(x) is a product of unrepeated linear factors.

Proof: The ‘only if’ part is proven in Proposition 2.6.4. The converse is proven as follows: if λ1, . . . , λr are all the zeros of mf(x) = ∏_{i=1}^r (x − λi), then the polynomial Qi(x) = ∏_{j≠i} (x − λj) satisfies the following: (f − λi I) Qi(f) = 0, i.e. Im Qi(f) ⊂ ker(f − λi I). By Proposition 2.6.1, we know that the λi form the whole list of eigenvalues of f.
Let u ∈ ker(f − λi I), u ≠ 0. One has Qi(f)u = Qi(λi)u = ∏_{j≠i} (λi − λj) u ≠ 0. Remember the Lagrange interpolation formula applied to λ1, . . . , λr: let Li(x) = ∏_{j≠i} (x − λj)/(λi − λj). One has Li(λj) = 0 for i ≠ j, and = 1 for i = j. Li(x) is a constant multiple of Qi(x). The formula

1 = Σ_{i=1}^r Li(x)

proves that E = Σ_i Im Li(f) ⊂ Σ_i ker(f − λi I), which together with Proposition 2.5.3 yields f diagonalizable. □
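
Theorem 2.6.5 is easy to experiment with. In the sketch below (SymPy; the helper function is ours), the product of the distinct linear factors annihilates a diagonalizable matrix but not the matrix F of Exercise 2.6.3.

# A sketch (SymPy): the product of the distinct linear factors detects diagonalizability.
from sympy import Matrix, eye

def product_of_distinct_factors(A):
    # evaluates the product over the distinct eigenvalues lam of (A - lam*I)
    P = eye(A.rows)
    for lam in A.eigenvals():        # keys are the distinct eigenvalues
        P = P * (A - lam * eye(A.rows))
    return P

A = Matrix([[2, 0, 0],
            [0, 3, 0],
            [1, 0, 3]])
F = Matrix([[2, 1],
            [0, 2]])                 # the matrix F of Exercise 2.6.3

assert product_of_distinct_factors(A) == Matrix.zeros(3, 3)   # m_A(x) = (x-2)(x-3): A diagonalizes
assert product_of_distinct_factors(F) != Matrix.zeros(2, 2)   # m_F(x) = (x-2)^2: F does not
assert A.is_diagonalizable() and not F.is_diagonalizable()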

2.7 Triangular form


Let T = (tij) be an upper triangular matrix (tij = 0 for i > j). The characteristic polynomial is pT(x) = ∏_{i=1}^n (tii − x). The same goes for a matrix of the form P T P⁻¹.

2.7.1 (It will be essential afterwards) Let N be a nilpotent matrix of order n, N ∈ Mn(C) (this means that N^m = 0 for some m ∈ N). Prove that, if we denote Fi = Im N^i and Fk ≠ 0, then Fk+1 ⊊ Fk. Conclude that N^n = 0.
Theorem 2.7.2 (Triangularizability criterion) Let f ∈ EndK(E), n = dimK E. f is triangularizable, i.e. admits a basis of E in which the associated matrix Me(f) is upper triangular, if and only if cf(x) splits into linear factors. In particular, if K = C every square matrix A is similar to an upper triangular matrix, A = P T P⁻¹.

Proof: The case n = 1 is clear. Let us proceed by induction on n. If cf(x) = ∏_i (λi − x)^{ni}, there is an eigenvalue λ1, with an eigenvector u1 ≠ 0. We choose a basis u = (ui) of E that completes u1. The matrix Mu(f) is a block matrix, where A0 is (n − 1) × (n − 1) and b ∈ K^{n−1}:

A = | λ1   b^T |
    | 0    A0  |.

The characteristic polynomial is cA0(x) = (λ1 − x)^{n1−1} ∏_{i=2}^r (λi − x)^{ni}, and by induction hypothesis there is an upper triangular matrix T0 of order n − 1 such that A0 = P T0 P⁻¹. Taking the block matrix

Q = | 1  0 |
    | 0  P |

yields

T = Q⁻¹ A Q = | λ1  ? |
              | 0   T0 |,

which is upper triangular, as desired. □
Corollary 2.7.3 Let A be a square matrix of order n. The trace of A, tr A, is the sum
of all complex eigenvalues of A, counted with (algebraic) multiplicities. Likewise, det A is
the product of all eigenvalues, counted with multiplicities:
tr A = Σ_i λi,      det A = ∏_i λi.

The other coefficients of the characteristic polynomial have similar explanations in terms of the eigenvalues, as signed elementary symmetric polynomials thereof.

Proof: It suffices to note that cA(x) = ∏_i (λi − x) after triangularising A over the complex numbers. □

2.8 The Cayley-Hamilton theorem


Solution for 2.7.1: If Fk = Fk+1 ≠ 0, this means that N : Im N^k → Im N^{k+1} is surjective (by definition), hence injective, since both source and target spaces have the same dimension. This would mean that N|Fk is an isomorphism, but then Im N^m = Fk ≠ 0 for all m ≥ k, which contradicts the nilpotence of N. It follows that Fk = Fk+1 necessarily means Fk = 0, and since Im N^k = Fk = 0, N^k = 0. The following sequence shows why the minimum such k, call it r0, is less than or equal to n:

0 = dim F_{r0} < dim F_{r0−1} < . . . < dim F1 = dim Im N < n,

and therefore r0 ≤ n.
(The above was to be part and parcel of the first proof of the Cayley-Hamilton Theorem,
but due to lack of time we shall only say that the above may be used to prove this theorem
in the case where there is only one eigenvalue. The exercise has value in itself.)
Theorem 2.8.1 (Cayley-Hamilton) Let A be a square matrix of order n. Both mA(x), cA(x) have the same irreducible factors, and mA(x) | cA(x) (i.e. cA(A) = 0).

Proof: The proof is by induction on n, and it suffices to treat a triangular matrix: write A = P T P⁻¹ with T upper triangular (over C, by Theorem 2.7.2), and note that cA(A) = P cT(T) P⁻¹. If n = 1, the Theorem is obvious. If n ≥ 2, we shall prove that cT(T) = 0. Write

T = | λ1  ? |
    | 0   T0 |,

where T0 is square of order n − 1 and λ1 = t11. We have cT(x) = (λ1 − x) cT0(x), and:

cT(T) = (λ1 I − T) cT0(T) = | 0   ?            | | cT0(λ1)  ?       |
                            | 0   λ1 In−1 − T0 | | 0        cT0(T0) |,

which by induction hypothesis equals

| 0   ?            | | cT0(λ1)  ? |
| 0   λ1 In−1 − T0 | | 0        0 |  = 0,

thus completing the proof. □
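
A direct symbolic check of the Cayley-Hamilton theorem on a generic 3 × 3 matrix (a SymPy sketch, of course not a proof):

# A sketch (SymPy): c_A(A) = 0 for a generic 3 x 3 matrix.
# (SymPy's charpoly is det(x*I - A); it differs from det(A - x*I) only by a sign,
#  and both vanish when evaluated at A.)
from sympy import Matrix, symbols, eye

A = Matrix(3, 3, list(symbols('a0:9')))
coeffs = A.charpoly().all_coeffs()     # monic: [1, c2, c1, c0]

C = Matrix.zeros(3, 3)
for c in coeffs:                       # Horner evaluation of the polynomial at A
    C = C * A + c * eye(3)

assert C.expand() == Matrix.zeros(3, 3)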

2.8.2 Let A, B ∈ M2(C) be square matrices of order 2 with tr A ≠ 0. We say that two square matrices X, Y of order n commute if XY = YX. Prove that B commutes with A if and only if B commutes with A^2.

2.8.3 Determine all matrices A ∈ M4 (R) up to similarity such that A2 − 3A + 2I = 0.

2.8.4 Find all matrices A such that A2 = A.

2.8.5 Find all matrices A ∈ M2 (R) such that A2 = 0.

2.8.6 Let A be a nilpotent n × n matrix. Prove that An = 0 (all tools from this chapter
are available here).

2.9 Problems
2.9.1 Let A = (aij ) be the n × n real matrix defined by aij = 1 for all i, j = 1, . . . , n.
Prove that A diagonalizes, and find a diagonal matrix D and an invertible matrix P such
that A = P D P⁻¹.

2.9.2 Given A = | 5/2  −1 |
                | 3    −1 |,
find A^1438 and lim_{n→∞} A^n.

2.9.3 Let A ∈ Mn (R). If n is odd, then A has an eigenvalue. If n is even and det A < 0,
A has at least two eigenvalues. Can you give an example of a real A with no (real)
eigenvalues?

2.9.4 Let e1 , . . . , en be a basis of a real vector space E, and let a1 , . . . , an ∈ R. Let


f ∈ End(E) be the endomorphism defined by

f (e1 ) = . . . = f (en ) = a1 e1 + . . . + an en .

Prove that f is diagonalizable if and only if a1 + · · · + an 6= 0.

2.9.5 Which of the following matrices are associated with the same endomorphism? (I.e., which of them are similar to each other?)

A = | 1 −1  3 |     B = | 1  2  3 |     C = | 1  1  0 |
    | 0  2  1 |,        | 0  2  0 |,        | 0  2  3 |.
    | 0  0  2 |         | 0  0  2 |         | 0  0  2 |

For those who are, find the base change making them similar. (You might not be able to
work out each case, in which case you are welcome to try after studying the next chapter.)

(After triangularization theorems only:)

2.9.6 Let q(x) ∈ C[x] be any polynomial. Let f ∈ EndC(E) be an endomorphism of a finite-dimensional vector space E over C. Let cf(x) = ∏_{i=1}^r (ai − x)^{ni} be the characteristic polynomial of f.
Prove that the characteristic polynomial of q(f) is

∏_{i=1}^r (q(ai) − x)^{ni}.

2.9.7 Let f ∈ EndK (E), and let F ⊂ E be an invariant subspace (this means, f (F ) ⊂
F ). Define f |F to be the following endomorphism of F :

f |F : F → F, defined by x ∈ F 7→ (f |F )(x) = f (x) ∈ F.

Prove that, if f is diagonalizable, then so is f |F .

2.9.8 (A result by Issai Schur) Schur decomposition: Let A ∈ Mn(C). Prove that there exists a matrix P, the columns of which form an orthonormal basis (in other words, P is unitary: P̄^T P = P^T P̄ = I), such that A = P T P⁻¹, where T is an upper triangular matrix.

2.9.9 Let A ∈ Mn(C) be such that there exists P ∈ Mn(C) invertible such that A = P A^2 P⁻¹. Prove that every nonzero eigenvalue of A is a root of unity.

2.9.10 Find all matrices A ∈ M2(R) such that A^2 = | 1   1 |
                                                   | 1  −1 |.

2.9.11 Find all matrices A ∈ M2(R) such that A^2 = | 2   1 |
                                                   | −4 −2 |.

2.9.12 Let A be an n × n real matrix such that A3 = A + I. Prove that det A > 0.

2.9.13 (Permutation matrices I) Consider the following matrix first.

    | 0 0 0 0 0 1 |
    | 1 0 0 0 0 0 |
A = | 0 1 0 0 0 0 |
    | 0 0 1 0 0 0 |
    | 0 0 0 1 0 0 |
    | 0 0 0 0 1 0 |

Is A diagonalizable? Justify your answer. (For other purposes, it may be good to compute at least several powers of A.)

2.9.14 (Determinant of a circulant matrix) Let c0, . . . , cn−1 ∈ C. Compute the following determinant:

| c0     cn−1   cn−2   · · ·  c1 |
| c1     c0     cn−1   · · ·  c2 |
| c2     c1     c0     · · ·  c3 |
| .............................. |
| cn−1   cn−2   cn−3   · · ·  c0 |

(It would be appropriate to use the tools of this chapter, and perhaps more difficult to
solve it otherwise.)

2.9.15 (Permutation matrices II) Consider the following matrices.

    | 0 0 0 0 0 1 0 |         | 0 0 0 0 0 0 1 |
    | 0 0 0 0 0 0 1 |         | 0 0 0 0 0 1 0 |
    | 1 0 0 0 0 0 0 |         | 0 0 0 0 1 0 0 |
A = | 0 1 0 0 0 0 0 |,    B = | 0 0 0 1 0 0 0 |.
    | 0 0 1 0 0 0 0 |         | 0 0 1 0 0 0 0 |
    | 0 0 0 0 1 0 0 |         | 0 1 0 0 0 0 0 |
    | 0 0 0 1 0 0 0 |         | 1 0 0 0 0 0 0 |

Do they diagonalize? We recommend splitting the corresponding permutations as products


of disjoint cycles. Now you are ready to tackle the general permutation matrix, and to
establish whether it diagonalizes.

2.9.16 (a) Consider the matrix

B = | 1  1 |
    | 1  0 |.

Find a formula for B^n, where n ∈ N (diagonalize the matrix first).

(b) Consider the Fibonacci sequence x0 = 1, x1 = 1, x_{n+2} = x_{n+1} + x_n. Find the general term. (Hint: Note the following equation:

| x_{n+2} |   | 1  1 | | x_{n+1} |
| x_{n+1} | = | 1  0 | | x_n     |.)

2.9.17 Let p : E → E be an endomorphism of a finite-dimensional vector space E.


Assume that p is a projector, i.e. p^2 = p. Compute tr p as a function of p.

2.10 More problems


2.10.1 Let X ∈ Mn (C), with det X 6= 0. Let Xi be the i-th column of X. Define the
matrix Y = (X2 X3 · · · Xn 0) ∈ Mn (C). Prove that the matrices X −1 Y and Y X −1 have
only 0 and no other eigenvalues.

2.10.2 Determine all complex numbers λ for which there is a positive integer n ∈ N and a real matrix A ∈ Mn(R) satisfying A^2 = A^T.

2.10.3 Determine all positive integers n for which there are invertible real matrices A, B such that AB − BA = B^2 A.

2.10.4 Let f, g ∈ EndK(E) be endomorphisms of a finite-dimensional vector space E over K that commute (fg = gf). Prove that the eigenspaces of f are invariant by g, and vice versa.

2.10.5 (Simultaneously diagonalizable pairs) Let f, g ∈ EndK (E) be endomorphisms,


with dimK E < ∞. We say that f, g diagonalise simultaneously if there exists a basis
u = (ui ) of E in which the matrices of f, g are both diagonal. Prove that f, g diago-
nalize simultaneously if and only if both f, g are diagonalizable and f g = gf (i.e., they
commute).

2.10.6 (Simultaneous diagonalization - Examples) Consider the following matrices:

    | 0 1 0 0 |         | 0 0 1 0 |         | 0 0 0 1 |
A = | 1 0 0 0 |,    B = | 0 0 0 1 |,    C = | 0 0 1 0 |.
    | 0 0 0 1 |         | 1 0 0 0 |         | 0 1 0 0 |
    | 0 0 1 0 |         | 0 1 0 0 |         | 1 0 0 0 |
Show that A, B, C are simultaneously diagonalizable (you might not need to prove that
they commute if you recognise their type! You may choose A, B first).

2.10.7 For any integer n ≥ 2, and two invertible n × n matrices A, B with real entries
such that
A⁻¹ + B⁻¹ = (A + B)⁻¹,
prove that det A = det B. Is this the case for matrices with complex entries?

2.10.8 Let α 6= 0 be a complex number. Let A, B ∈ Mn (C) be such that AB − BA = αA.

(a) Find a formula for Ak B − BAk as a function of k ∈ N.

(b) Prove that A is nilpotent, i.e. there is a natural number m ∈ N such that Am = 0.

2.10.9 Let A ∈ Mn (C), and let T = TA : Mn (C) → Mn (C) be the endomorphisms


defined as follows: TA (X) = T (X) = [A, X] = AX − XA. Prove the following assertions:

(a) If A diagonalizes, so does TA .

(b) If A is nilpotent, so is TA.

2.10.10 Find all real matrices A ∈ M2(R) such that A^2 = | 1  −5 |
                                                         | 1  −1 |.

The following problem lacks all reference to infinite series, so as not to enter discussions
on convergence and limits at this point.
2.10.11 Let N be a real or complex nilpotent n × n matrix of nilpotence index m, N^m = 0. Define the exponential of N, e^N, to be the matrix defined by the Taylor series of the exponential function:

e^N = Σ_{k≥0} N^k / k!   (the sum is finite!).

1. Let t, t' ∈ C. Prove that e^{tN} e^{t'N} = e^{(t+t')N}.

2. Let M, N be nilpotent matrices such that MN = NM. Prove that e^M e^N = e^{M+N}.

3. If t ∈ R and ϕ(t) = e^{tN}, show that N = ϕ'(0).



Now consider the problem of a 2015 exam,

A(x) = | 1  x  x^2 |
       | 0  1  2x  |
       | 0  0  1   |,

where the student was asked to show that A(x + y) = A(x)A(y). Show that A(x) is of the above specified form.

2.10.12 Let X ∈ Mn(C) be a nilpotent matrix. Let α ∈ R, and let the generalized binomial coefficient be (α choose k) = α(α − 1) · · · (α − k + 1)/k!. Define the following series:

• log(I + X) = X − (1/2)X^2 + (1/3)X^3 + . . . + (−1)^{k−1}(1/k)X^k + . . . ;

• (I + X)^α = I + αX + (α(α − 1)/2)X^2 + · · · + (α choose k)X^k + . . . .

Prove that if N is nilpotent, then so is M = (I + N)^β − I, and that (I + M)^α = (I + N)^{αβ}. Prove also that (I + N)^α = e^{α log(I+N)}.

One may prove the Cayley-Hamilton Theorem in a purely computational fashion. This
version by Peter Lax shortens that proof.
2.10.13 (Simplified computational proof of the Cayley-Hamilton Theorem) Let
A ∈ Mn (K), where K is an infinite field [26, Ch. II, Th. 5].
1. Let Pi, Qj ∈ Mn(K) be matrices. Consider the following polynomials with matrix coefficients, P(x) = Σ_{i=0}^d Pi x^i, Q(x) = Σ_{j=0}^e Qj x^j. Their product R = PQ is R(x) = Σ_{k=0}^{d+e} Rk x^k, where Rk = Σ_{i+j=k} Pi Qj. If all coefficients Qj satisfy Qj A = A Qj, prove that R(A) = P(A) Q(A).

2. Consider the identity

det(xI − A) I = (xI − A) · adj(xI − A),

and write adj(xI − A) = C0 x^{n−1} + . . . + Cn−1 as a polynomial with matrix coefficients. Check that Ci A = A Ci, and use part 1 to show that cA(A) = 0.

2.10.14 Let A, X ∈ Mn(K) be matrices such that AX = XA. Prove that there is a matrix M with AM = MA and XM = MX such that

cA(X) = M (X − A).

Chapter 3

Endomorphisms (II)

In this chapter we shall study the main aspects of endomorphisms more deeply, and shall
show some important applications. We shall work over arbitrary fields unless otherwise
stated.

3.1 Euclid’s algorithm. Ideals


The exposition is focussed on the ring K[x], but all arguments work for the ring of integers
Z, with minor modifications.
Definition An ideal I ⊂ K[x] is a subset such that
(i) for every f, g ∈ I, f + g ∈ I;
(ii) if f ∈ I, p ∈ K[x], one has pf ∈ I.
Trivial examples of ideals are K[x] and {0}.
Definition Given f (x) ∈ K[x], the subset of multiples of f (x), denoted by (f (x)), is
defined to be
(f (x)) = {p(x)f (x)|p(x) ∈ K[x]}.
One sees that (f (x)) is an ideal of K[x]. Ideals of this type are called principal ideals
(i.e. those admitting only one generator).
Proposition 3.1.1 (All ideals are principal) Let I ⊂ K[x] be an ideal. There is a
polynomial f (x) such that I = (f (x)). f (x) is called a generator of I, and is unique up
to constant multiples.
Proof: If I = {0}, take f(x) = 0. Otherwise, let f(x) be a polynomial of minimal degree in I − {0}. If p(x) ∈ I, polynomial division yields p(x) = q(x) f(x) + r(x), where either r(x) = 0 or 0 ≤ deg r < deg f, and r(x) = p(x) − q(x) f(x) ∈ I. By the minimality of deg f, r(x) = 0 necessarily.
Let g(x) ∈ I be another element of minimal degree in I − {0}. By the above, g(x) is
a multiple of f (x) and f (x) is a multiple of g(x), which means that they are constant
multiples of each other. 

3.1.1 Divisibility revisited
Example 3.1.2 a(x) is a multiple of b(x) (equivalently, b(x) | a(x)) if and only if (a(x)) ⊂ (b(x)).

Proposition 3.1.3 Let f (x), g(x) ∈ K[x]. The following subset is an ideal: (f (x)) ∩
(g(x)). Thus, there is a generator for this ideal, unique up to constant multiples, which
we call the least common multiple m(x) of f (x), g(x): (m(x)) = (f (x)) ∩ (g(x)).

Proposition 3.1.4 (Greatest common divisor) Every common divisor d(x) ∈ K[x]
of f(x), g(x) satisfies f(x), g(x) ∈ (d(x)). In particular, the following subset of K[x] is
contained in (d(x)):

(f(x), g(x)) = {a(x)f(x) + b(x)g(x) | a(x), b(x) ∈ K[x]} ⊂ (d(x)).


The subset (f (x), g(x)) is an ideal, and is therefore principal. Let δ(x) be one of its
generators; one sees that δ(x)|f (x), δ(x)|g(x). We call δ(x) the greatest common
divisor (GCD), or highest common factor (h.c.f.) of f (x), g(x). Every common
factor d(x) of f (x), g(x) is a factor of δ(x), and vice versa.

Proof: The subset (f (x), g(x)) is an ideal of K[x], as both conditions are satisfied. The
fact that δ(x) is a common factor of f (x), g(x) follows from the corresponding inclusions
on ideals (f (x)) ⊂ (f (x), g(x)) = (δ(x)) and (g(x)) ⊂ (δ(x)). Again by the inclusions on
ideals, every common factor d(x) of f (x) and g(x) satisfies that δ(x) ∈ (d(x)).
Corollary 3.1.5 (Bézout’s Identity) Let f (x), g(x) be two polynomials in K[x]. Let
d(x) be their highest common factor. One has

a(x)f (x) + b(x)g(x) = d(x),

for some a(x), b(x) ∈ K[x].

Example 3.1.6 A practical fashion of obtaining the G.C.D. is by way of Euclid’s algo-
rithm, and by proceeding backwards.

Remark 3.1.7 (Partial fraction decomposition) Let f(x), g(x) be coprime polyno-
mials. By Bézout's identity, we may decompose a rational function of the form
p(x)/(f(x)g(x)) into simpler fractions:

p(x)/(f(x)g(x)) = p(x)(a(x)f(x) + b(x)g(x))/(f(x)g(x)) = A(x)/f(x) + B(x)/g(x).
If deg p < deg(f g), then one may pick the individual fractions on the right hand side
and effect polynomial division of each numerator by its corresponding denominator, and
repeat the operation until we get a sum of fractions, the denominators thereof have only
one irreducible factor each. This is the starting point for symbolic integration of a rational
function. To find out more, search e.g. ‘Partial fraction decomposition’ on Wikipedia.
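For readers who wish to experiment, the extended Euclidean algorithm and the resulting partial fraction decomposition are available in SymPy; the snippet below is an illustration of ours with an ad-hoc choice of f, g and numerator p.

```python
import sympy as sp

x = sp.symbols('x')
f, g, p = x**2 + 1, x - 2, x + 3                # coprime f, g and a numerator p (our choice)

a, b, d = sp.gcdex(f, g, x)                     # Bezout: a*f + b*g = d = gcd(f, g)
print(d, sp.simplify(a*f + b*g))                # 1, 1

print(sp.apart(p / (f * g), x))                 # A(x)/(x - 2) + B(x)/(x**2 + 1), as in the remark
```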

3.2 Invariant subspaces. The subspaces Eu
Definition Let f ∈ EndK (E). A linear subspace F ⊂ E is invariant by f (f -invariant),
or stable by f , if f (F ) ⊂ F . We define f |F to be the following endomorphism of F :

f |F : F → F, determined by x ∈ F 7→ (f |F )(x) = f (x) ∈ F.

Proposition 3.2.1 Let F ⊂ E be an invariant subspace. Denote by mF (x) = mf |F (x).


One has mF (x)|mf (x).

Proof: Clearly, the fact that mf(f) = 0 implies that mf(f|F) = 0, and hence mF(x)|mf(x). 

3.2.2 Let f ∈ EndK (E), and let F ⊂ E be an invariant subspace (this means, f (F ) ⊂
F ). Define f |F to be the following endomorphism of F :

f |F : F → F, defined by x ∈ F 7→ (f |F )(x) = f (x) ∈ F.

Prove that, if f is diagonalizable, then so is f |F .

Notation: Let F ⊂ E be an f -invariant subspace. We use the shorthand mF (x) to


denote the minimal polynomial of f |F .

3.2.3 Let F, G ⊂ E be f-invariant subspaces, where f ∈ End_K(E). Show that F ∩ G and
F + G are also f-invariant. If dim E < ∞, prove that m_{F+G}(x) = l.c.m.(mF(x), mG(x)).
Prove also that m_{F∩G}(x) | (mF(x), mG(x)).

These are important prototypes of invariant subspaces.

3.2.4 Let p(x) ∈ K[x] be a polynomial. The subspaces ker p(f ), Im p(f ) are invariant
by f .

Proposition 3.2.5 Let u ∈ E. The minimal f-invariant subspace containing u is the
vector subspace Eu = ⟨{f^i(u)}_{i≥0}⟩.

3.2.1 The subspaces Eu and their minimal polynomial


Definition Let u ∈ E, u 6= 0. We define Eu to be the vector subspace

Eu = hu, f (u), f 2 (u), . . . , f k (u), . . .i = h{f i (u)}i≥0 i.

Proposition 3.2.6 The subspace Eu ⊂ E is f -invariant, and is actually the minimal


f -invariant subspace containing u.

Proof: The subset S = {f i (u)}i≥0 satisfies f (S) ⊂ S, so the span of S, Eu = hSi is f -


invariant. On the other hand, if f (F ) ⊂ F , F subspace, and u ∈ F , then clearly S ⊂ F ,
and so therefore Eu ⊂ F . 

Proposition 3.2.7 Let u ∈ E, u 6= 0. A basis of Eu is formed by u, f (u), · · · , f d−1 (u),
for suitable d ≥ 1.

Proof: Let d be the largest integer such that u, f(u), · · · , f^{d−1}(u) are linearly in-
dependent. This means that there is a monic polynomial mu(x) of degree d such that
mu(f)u = f^d(u) + c_{d−1}f^{d−1}(u) + · · · + c_1 f(u) + c_0 u = 0, and by Section 3.1 one sees
that any polynomial p(x) such that p(f)u = 0 is a multiple of mu(x).
We have proven that d = dim Eu , where d = deg mu (x). Indeed, any linear combination
of elements in S, i.e. elements f i (u), is of the form p(f )u, and p(x) = q(x)mu (x) + r(x)
where deg r < deg mu , so the linearly independent system u, f (u), . . . , f d−1 (u) spans Eu .


Corollary 3.2.8 One has mu (x)|mf (x).

Proof: See the beginning of the proof of Proposition 3.2.7. 

Definition The polynomial mu (x) is defined to be the minimal polynomial of u.

3.2.9 Write down the matrix of f |Eu in the special basis u, f (u), · · · , f d−1 (u).

3.2.10 Let p(x) = x^d + a_{d−1}x^{d−1} + · · · + a_0 ∈ K[x]. Given the matrix

A =
[ 0  0  · · ·  0  −a_0     ]
[ 1  0  · · ·  0  −a_1     ]
[ 0  1  · · ·  0  −a_2     ]
[ . . . . . . . . . . . .  ]
[ 0  0  · · ·  1  −a_{d−1} ]

(called companion matrix of p(x)) find the characteristic and minimal polynomials of
A.
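One may test the answer on a concrete polynomial before proving it in general; the following SymPy sketch (ours) uses the sample choice p(x) = x³ + 6x² + 11x + 6.

```python
import sympy as sp

x = sp.symbols('x')
a0, a1, a2 = 6, 11, 6                           # p(x) = x^3 + a2 x^2 + a1 x + a0 (our choice)
A = sp.Matrix([[0, 0, -a0],
               [1, 0, -a1],
               [0, 1, -a2]])                    # companion matrix of p(x)

print(A.charpoly(x).as_expr())                  # x**3 + 6*x**2 + 11*x + 6 (monic convention det(xI - A))

# e1, A e1, A^2 e1 are e1, e2, e3: no polynomial of degree < 3 annihilates A,
# so the minimal polynomial is p(x) as well.
e1 = sp.Matrix([1, 0, 0])
print((A*e1).T, (A*A*e1).T)
```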

Theorem 3.2.11 Let u, Eu be as above. Consider the endomorphism f|Eu of Eu. One
has mEu(x) = mu(x), and cEu(x) = (−1)^d mu(x).

Proof: We choose the basis u, f(u), · · · , f^{d−1}(u). Note that mu(f)u = 0, and that mu(x)
annihilates the whole of Eu, since mu(f)f^i(u) = f^i mu(f)u = f^i(0) = 0; hence mEu(x)|mu(x).
Since mEu(f)u = 0, also mu(x)|mEu(x), which shows mEu(x) = mu(x). The formula for
cEu(x) follows from the previous problem, since the matrix of f|Eu in this basis is the
companion matrix of mu(x). 

3.3 Primary decomposition


Just as one has the decomposition ker(∏_{i=1}^{r}(f − λ_i I)) = ⊕_{i=1}^{r} ker(f − λ_i I) (which is
basically what the proof of Theorem 2.6.5 shows!), we have a natural splitting of E as a
direct sum of invariant subspaces wherein f has a simpler behaviour. This shall eventually
lead us to simple matrices (which are the next best thing to the diagonal form, when f
does not diagonalise) in which the properties of f are laid bare.

Proposition 3.3.1 Let P (x), Q(x) ∈ K[x] be nonconstant polynomials. If P (x), Q(x)
are coprime and f : E → E is an endomorphism, one has a decomposition

ker P (f )Q(f ) = ker P (f ) ⊕ ker Q(f )

of f -invariant subspaces. (Here, dim E need not be finite). Also, ker P (f ) = Im Q(f )
and ker Q(f ) = Im P (f ).

Proof: All subspaces involved are f -invariant by construction (REF. LACKING), and
both ker P (f ) and ker Q(f ) are clearly contained within ker P (f )Q(f ) = ker Q(f )P (f ).
Now take E = ker P (f )Q(f ) for ease of notation: we may assume P (f )Q(f ) = 0. Bézout’s
identity yields 1 = a(x)P (x) + b(x)Q(x), and in turn we have

IE = a(f )P (f ) + b(f )Q(f ).

Clearly, for u ∈ E the vector Q(f)u ∈ ker P(f), and vice versa, which together with the
identity u = b(f)Q(f)u + a(f)P(f)u shows ker P(f) + ker Q(f) = E. The same identity
shows that ker P(f) ∩ ker Q(f) = 0. Finally, if u ∈ ker P(f), then applying the identity
yields u = 0 + Q(f)b(f)u ∈ Im Q(f), so ker P(f) ⊂ Im Q(f), which is the inclusion
missing. Making the corresponding changes one proves that ker Q(f) = Im P(f). 

3.3.2 Let a ∈ K, and let p(x) ∈ K[x] be such that p(a) 6= 0, and let q(x) = x − a. Let
f be such that mf (x) = (x − a)p(x). Write down Bézout’s identity and write down the
projectors for each summand.

Remark 3.3.3 One may apply Proposition 3.3.1 to the case of three factors P, Q, R that
are pairwise coprime. As it stands in the proposition, one may apply it to P Q and R, and
then restrict to ker P (f )Q(f ) and apply Proposition 3.3.1 to this invariant subspace with
P , Q. However, we might want to see the resulting decomposition more explicitly, that is,
this formula
ker P (f )Q(f )R(f ) = ker P (f ) ⊕ ker Q(f ) ⊕ ker R(f ).
Writing Bézout's identity for PQ and R yields aPQ + bR = 1. Now, since Im R(f) =
ker P(f)Q(f), it is with b(f)R(f) that we need to work in order to decompose ker P(f)Q(f)
further. Consider Bézout's identity for P, Q now: a'P + b'Q = 1; multiplying bR by 1 in
the form a'P + b'Q, we obtain the following identity of endomorphisms:

I = b'(f)b(f)Q(f)R(f) + a'(f)b(f)P(f)R(f) + a(f)P(f)Q(f).


Each term on the right hand side corresponds to a projection to one of the respective direct
summands ker P (f ), ker Q(f ), or ker R(f ), which renders the sum explicit (but it is easier
to show that the sum is direct by applying Proposition 3.3.1 twice).

Lemma 3.3.4 Let f : E → E be an endomorphism, where dim E < ∞. Let F, G be
invariant subspaces of E such that (mF(x), mG(x)) = 1. Then F + G is a direct sum
F ⊕ G, and mF⊕G(x) = mF(x)mG(x).

3.3.5 Prove lemma 3.3.4.
Theorem 3.3.6 Let f be such that mf(x) factors as mf(x) = ∏_{i=1}^{r} p_i(x)^{e_i}. The following
splitting exists and is unique:

E = ⊕_{i=1}^{r} E_i, where mE_i(x) = p_i(x)^{e_i}.

We have a complete orthogonal system of projectors π_i (Σ π_i = 1, π_i π_j = 0 for all i ≠ j,
π_i² = π_i), where the π_i are polynomials in f and π_i E = E_i.
The subspaces E_i are called primary components of the pair (E, f).

Proof: Write P_i(x) = p_i(x)^{e_i}, for 1 ≤ i ≤ r, where p_i(x) runs across all irreducible factors
of mf(x) over K. Applying Proposition 3.3.1 recursively to the product P_1(x)P_2(x) · · · P_r(x) =
mf(x) yields the following splitting of f-invariant subspaces:

E = ker P_1(f) ⊕ · · · ⊕ ker P_r(f).

Let E_i = ker P_i(f). Note that mE_i(x)|P_i(x) by construction. Now, since mf(x) is the
product of all mE_i(x) by Lemma 3.3.4, we get mE_i(x) = P_i(x) = p_i(x)^{e_i}. Uniqueness
follows readily from the fact that such a decomposition must satisfy E_i ⊂ ker p_i(f)^{e_i}, and
since Σ dim E_i = dim E, all inclusions are equalities. 
Remark 3.3.7 Theorem 3.3.6 together with Lemma 3.3.4 provide another proof of Theo-
rem 2.6.5. In the case where all irreducible factors of mf(x) are linear, the decomposition
into eigenspaces generalises into a decomposition into generalised eigenspaces, where
the generalised eigenspace of the eigenvalue λ_i is E_i = ker(f − λ_i I)^{e_i}, i.e., the primary
component associated with the linear factor x − λ_i.
Example 3.3.8 Compute the characteristic and minimal polynomials of the matrix as-
sociated with the linear map T : R³ → R³ given by T(x) = u × x, where u = (a, b, c).
T(e_1) = u × e_1 = (0, c, −b)^T, and the matrix A of T in the canonical basis is

A =
[  0  −c   b ]
[  c   0  −a ]
[ −b   a   0 ]

cA(x) = −x(x² + ‖u‖²). Thus, if u ≠ 0 there are three different complex eigenvalues, i.e.
A diagonalises over C and mA(x) = −cA(x) = x³ + ‖u‖²x.
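A short SymPy verification of this example (ours; note that SymPy's charpoly uses the monic convention det(xI − A), hence the opposite sign):

```python
import sympy as sp

a, b, c, x = sp.symbols('a b c x', real=True)
A = sp.Matrix([[ 0, -c,  b],
               [ c,  0, -a],
               [-b,  a,  0]])                   # matrix of T(x) = u x x  with u = (a, b, c)

print(sp.factor(A.charpoly(x).as_expr()))       # x*(x**2 + a**2 + b**2 + c**2)
print((A**3 + (a**2 + b**2 + c**2) * A).expand())   # zero matrix: m_A(x) = x^3 + |u|^2 x
```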
Remark 3.3.9 Let E = F ⊕ G, where both are f-invariant and dim E < ∞. Note that,
if we choose a basis of E by way of respective bases of F and G, the associated matrix of
f has the shape

[ M  0 ]
[ 0  N ]

where M and N are the respective associated matrices of f|F and f|G. If, however, we
only have an invariant subspace F ⊂ E, completing a basis of F to a basis of E yields a
more generic form

[ A  B ]
[ 0  C ]

(here C has an interpretation in terms of the quotient vector space E/F which we shall
refrain from including here).

3.3.10 Find a decomposition of mT (x) into irreducible factors over R, and write down
the primary decomposition for T .

3.3.11 Let u, v ∈ R³, where u ≠ 0. Show that, if the vectors v, u × v, u × (u × v) are
linearly dependent, then either u ∥ v or u ⊥ v. (It admits a more elementary proof, but we
shall use the tools provided).

We recommend the reader to pause here and solve the problems, the solution of which is
found right below.

Solution for 3.3.10: Let r = |u| > 0. One has mT(x) = x(x² + r²), and Bézout's
identity yields 1 = (1/r²)(x² + r²) − (1/r²)x · x, hence

R³ = ker T ⊕ ker(T² + r²I) = ker T ⊕ Im T.

Here the image of T has rank 2, and coincides with ⟨u⟩^⊥.

Solution for 3.3.11: Given v ∈ R³, consider the primary decomposition v = αu + w,
where w ⊥ u. Note that any p(x) ∈ R[x] satisfies the following (we write the direct sum
as a Cartesian product to emphasize the direct sum concept here):

p(T)v = p(T)(αu, w) = (αp(T)u, p(T)w),

and since both x, x² + r² are irreducible, we have the following cases. If α ≠ 0 and w ≠ 0,
then mv(x) = x(x² + r²) and v, Tv, T²v are linearly independent (this is precisely the
contrary to the statement!). The remaining cases are when either of the components is
zero: mv(x) = 1 if v = 0 (which is why we never consider it!), or mv(x) = x (when w = 0,
i.e. v, u are parallel), and finally the case where α = 0 and w ≠ 0, i.e. v = w ≠ 0 is
orthogonal to u. The solution is thus complete.

Finally, we provide a star result which will be very useful later.

Theorem 3.3.12 Notations and assumptions as in Theorem 3.3.6. There is a vector
u ∈ E such that mu(x) = mf(x).

Proof: We cannot but use primary decomposition here. Let v ∈ E = ⊕ E_i; we shall
study mv(x) in terms of its coordinates v_i ∈ E_i.
Since p(f)v = p(f)(v_1, . . . , v_r) = (p(f)v_1, . . . , p(f)v_r), mv(x) is the least common multiple
of all mv_i(x), and since the mv_i(x)|mE_i(x) are pairwise coprime, the following formula holds:

mv(x) = ∏ mv_i(x), where mv_i(x) = p_i(x)^{m_i}, with m_i ≤ e_i.

Thus, mf(x) = mv(x) if and only if, for every i ≤ r, mv_i(x) = mE_i(x). Since we
have ker p_i(f)^{e_i−1} ⊊ E_i = ker p_i(f)^{e_i} by the definition of mE_i, choose u_i ∈ E_i such that
u_i ∉ ker p_i(f)^{e_i−1}; then the vector u = u_1 + · · · + u_r satisfies mu(x) = mf(x). 

Corollary 3.3.13 (Yet another proof of the Cayley-Hamilton Theorem) Take Eu ,
where u is as in Theorem 3.3.12. The characteristic polynomial of f |Eu is precisely mf (x),
and so mf (x)|cf (x).

Proof: Indeed, one has mf(x) = mu(x), and cEu(x) = ±mu(x), so mf(x) = ±cEu(x)|cf(x).
It suffices to take a basis of Eu (with associated matrix A of f|Eu) and to extend it to
one of E, so that the associated matrix M of f has the form

M =
[ A  B ]
[ 0  C ]

so cf(x) = det(M − xI) = det(A − xI) det(C − xI), where det(A − xI) = c_{f|Eu}(x) =
±mf(x). 

3.3.14 Complete the proof of Corollary 3.3.13 so as to prove that both mf (x) and cf (x)
have exactly the same irreducible factors, by induction on dim E (thus avoiding recourse
to algebraically closed fields).

Remark 3.3.15 So far we have obtained a decomposition into invariant subspaces E =
⊕ E_i, where every mE_i(x) has precisely one irreducible factor. Forming a basis e of E by
choosing a basis for each E_i yields a matrix

A = M_e(f) =
[ A_1   0  . . .   0  ]
[  0   A_2 . . .   0  ]
[ . . . . . . . . . . ]
[  0    0  . . .  A_r ]

where A_i is associated with f|E_i.


In the case where f diagonalizes, each Ai is of the form λi Ini . Otherwise, if mf (x) splits
completely, one may triangularise each Ai . We shall refine our methods in order to achieve
further simplification.

Proposition 3.3.16 The decomposition into primary components works for an endomor-
phism T ∈ EndK (E), regardless of dim E, provided that there exists one nonzero polyno-
mial p(x) such that p(T ) = 0. Likewise, there exists u ∈ E such that mu (x) = mT (x).
The proofs given in this section apply to this situation. 

3.4 The Jordan canonical form


Let f be an endomorphism of an n-dimensional vector space E such that mf(x) splits
into linear factors. We shall find a suitable basis in which f has a particularly simple
matrix.
Throughout this section, we shall assume that mf (x) = (x − λ)e (i.e. restrict to a primary
component of f ). For a proof using quotients of vector spaces, see [24, Th. 9.7].

Theorem 3.4.1 (Jordan canonical form) Under the above hypotheses, there are vec-
tors ui such that
E = ⊕_{i=1}^{s} E_{u_i}.

Proof: Replacing f by f − λI, one may assume that λ = 0. Let u be such that mu (x) =
mf (x) = xe ; we shall find an invariant subspace F such that E = Eu ⊕ F . Now fix a
basis for E, and let A be its associated matrix. One may choose the first e vectors to be
a basis of Eu , say, u0 = u, ui = f i (u), for 0 ≤ i ≤ e − 1.
Note that the characteristic and minimal polynomials of A and AT are the same, i.e.
cA (x) = cAT (x), mA (x) = mAT (x). Let N denote the e × e companion matrix
 
0 0 0 ... 0 0
1 0 0 . . . 0 0 
 
N =  0 1 0 . . . 0 0 .

. . . . . . . . . . . . . . . . . .
0 0 0 ... 1 0

The matrix A associated with f in the basis u looks as follows:

[ N  B ]
[ 0  C ]

There is a vector y ∈ K n such that (AT )e−1 y 6= 0. Actually, one may choose such y so
that y T Ae−1 e1 6= 0, where (ei ) is the canonical basis of K n . In our case, one may simply
choose y such that ye 6= 0.

Lemma 3.4.2 The linearly independent vectors y, A^T y, . . . , (A^T)^i y, . . . , (A^T)^{e−1} y span
an A^T-invariant subspace of K^n. Likewise, if we consider this list to be one of row
vectors, namely,

H = ⟨y^T, y^T A, . . . , y^T A^i, . . . , y^T A^{e−1}⟩ ⊂ (K^n)^*,

then the annihilator F = {x ∈ K^n : v^T x = 0 for every v ∈ H} is A-invariant.

Proof of the Lemma: Indeed, if y T Ai x = 0 for every i ≥ 0, then in particular


y T Ai (Ax) = y T Ai+1 x = 0 for every i ≥ 0, which means that Ax is in the annihilator
F whenever x is. 
Back to our Theorem, we have two subspaces, one is Eu and the other is F , which satisfies
dim F = dim E − dim Eu by construction. We shall prove that Eu ∩ F = (0), so we have
a direct sum.
Let v ∈ Eu ∩ F. This means that v = Σ_{i=0}^{e−1} c_i f^i(u), and also that y^T A^i v = 0 for every
i = 0, . . . , e − 1. If v ≠ 0, let r be the smallest index such that c_r ≠ 0. It suffices to compute

y^T A^{e−1−r} v = c_r y^T A^{e−1} u + Σ_{i>r} c_i y^T A^{e−1−r+i} u.

Note that e − 1 − r + i ≥ e for i > r, so those terms vanish (f^e(u) = 0), and therefore
y^T A^{e−1−r} v = c_r (y^T A^{e−1} u) ≠ 0, which means that v ∉ F! Therefore, Eu ⊕ F = E, and
induction on dim E completes the proof. 

Jordan blocks: Define the Jordan block J_e(λ) to be the following e × e matrix:

J_e(λ) = λI + N =
[ λ  0  0  . . .  0  0 ]
[ 1  λ  0  . . .  0  0 ]
[ 0  1  λ  . . .  0  0 ]
[ . . . . . . . . . .  ]
[ 0  0  0  . . .  1  λ ]
Back to the proof of Theorem 3.4.1, the nilpotent endomorphism f − λI is such that
mF (x)|mf (x), so one actually produces vectors ui with non-increasing dimensions dim Eui ,
e = e1 ≥ e2 ≥ . . . ≥ es .
After choosing suitable bases for Eu_i that produce respective companion matrices, the
endomorphism f is equivalent to the block matrix

[ J_{e_1}(λ)     0       . . .      0      ]
[     0      J_{e_2}(λ)  . . .      0      ]
[ . . . . . . . . . . . . . . . . . . . .  ]
[     0          0       . . .  J_{e_s}(λ) ]
Thus, on the Jordan block Eu1 , one has f (u1 ) = λu1 +u2 , f (u2 ) = λu2 +u3 , . . . f (ue ) = λue .
In other words, ui = (f − λI)i−1 u1 for 1 ≤ i ≤ e = e1 .

Theorem 3.4.3 (Jordan canonical form, general case) Let f : E → E be an endo-
morphism, such that cf(x) splits completely. There is a decomposition on each primary
component

E_i = ⊕_{j=1}^{s_i} E_{u_{ij}},

which means that there exists a Jordan basis (formed out of each piece E_{u_{ij}}) rendering
the associated matrix of f block diagonal, with Jordan blocks along the diagonal.

State of the matter: Given an endomorphism with minimal polynomial mf(x) = (x − λ)^{e_1}
and characteristic polynomial (λ − x)^n, we have proven the existence of a Jordan form
for f, by showing the existence of vectors u_1, . . . , u_s such that mu_i(x) = (x − λ)^{e_i},
where e_1 = deg mf(x), and e_1 ≥ e_2 ≥ · · · ≥ e_s. These exponents determine the Jordan
form. It remains to prove uniqueness, and to show a procedure to compute the exponents e_i.

The answer is to consider the dimensions dim ker(f − λI)k for all k ≥ 1. The following
lemma is straightforward:

Lemma 3.4.4 Given a nilpotent endomorphism f such that E = Eu, dim E = n, one has
dim ker f^i = i for 1 ≤ i ≤ n. If i > n, one has dim ker f^i = n. The sequence d_i = dim ker f^i
satisfies d_i − 2d_{i−1} + d_{i−2} ≤ 0 for every i ≥ 2 integer.

Theorem 3.4.5 Let f be an endomorphism, the characteristic polynomial of which is
cf(x) = (λ − x)^n. There are unique numbers e_1 ≥ e_2 ≥ · · · ≥ e_s ≥ 1 such that f is
similar to a Jordan matrix of eigenvalue λ and exponents e_1, . . . , e_s. These exponents may
be retrieved by calculating the dimensions d_i = dim ker(f − λI)^i for every i ≤ n.

Proof: Consider the sequence di = dim ker(f − λI)i for i ≥ 0. One has

di − 2di−1 + di−2 ≤ 0 for every i ≥ 2,


which follows from Lemma 3.4.4 and the fact that E = ⊕ Eu_i. For instance, d_1 = s.
If i ≥ 1, then one likewise sees that di − di−1 is the number of subspaces Eui in the
decomposition of E whose dimension is dim Eui ≥ i. Thus, the number of cyclic subspaces
of dimension i in the above decomposition coincides with 2di − di−1 − di+1 . Knowing all
the numbers di therefore determines the exponents ei , which are therefore invariants of
f. 

Example 3.4.6 Let A ∈ M_4(K) be such that cA(x) = (λ − x)^4. Assume that mA(x) =
(x − λ)², so that e_1 = 2. There are two possible outcomes. The value of d_1 = dim ker(A −
λI) is either 2 or 3: if it is 2, then there are two cyclic subspaces Eu, Ev of dimension 2.
If d_1 = 3, then d_2 − d_1 = 1, i.e. there is only one Eu of dimension 2, and there are two
of dimension 1 in the decomposition, given by the computation 2d_1 − d_2 − d_0 = 2.

Example 3.4.7 The Jordan form of a matrix of order up to 3 is determined by its


characteristic and minimal polynomials. In the case where cA (x) = (λ − x)4 , the cases
mA (x) = (x − λ)e with e = 1, 3, 4 have only one Jordan form each. Likewise, the Jordan
form of a matrix of order up to 4 having two or more complex eigenvalues is determined
by its characteristic and minimal polynomials.
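The following SymPy sketch (ours) illustrates the discussion on a 4 × 4 matrix with cA(x) = (λ − x)⁴ and mA(x) = (x − λ)²: the numbers d_i detect the two blocks of size 2, and jordan_form confirms it (SymPy prints upper-triangular Jordan blocks).

```python
import sympy as sp

lam = 3                                         # our sample eigenvalue
A = sp.Matrix([[lam, 0, 0, 0],
               [1, lam, 0, 0],
               [0, 0, lam, 0],
               [0, 0, 1, lam]])                 # two lower-triangular Jordan blocks of size 2
N = A - lam * sp.eye(4)

d = [len((N**i).nullspace()) for i in (1, 2)]   # d_i = dim ker(A - lam I)^i
print(d)                                        # [2, 4]: d1 = 2 -> two cyclic subspaces of dim 2

P, J = A.jordan_form()
print(J)                                        # block diagonal with two J_2(3) blocks
```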

3.5 Application to linear differential equations


Here we first deal with linear ODEs (ordinary differential equations) with constant coef-
ficients, namely, differential equations of the form

y (n) + an−1 y (n−1) + . . . + a1 y 0 + a0 y = b(t),


where y(t) is the unknown and ai are constants, although we shall deal first with the
homogeneous case b(t) = 0.

Remark 3.5.1 Such y(t) is defined on an interval I, which we suppose to be open. Given
that y^{(n)} exists and one has y^{(n)} = −Σ_{i=0}^{n−1} a_i y^{(i)}, then since the r.h.s. is differentiable
so is y^{(n)}. If the function y(t) is n + r times differentiable, then y^{(n)} equals an (r + 1)-
times differentiable function, hence y(t) is n + r + 1 times differentiable. This shows that
y(t) ∈ C^∞(I).
We shall consider I = R.

Notation: Let E = C^∞(R) or C^∞_C(R) – in the second case, the functions take complex
values. We denote the operator D = d/dt. This is an endomorphism of E.
Example 3.5.2 (Eigenfunctions for D) Consider D as above. One may wonder which
are the eigenvalues of D. In the real case, Dy = λy, i.e. y' = λy. One knows an obvious
solution, y = e^{λt}. To check whether there are more, we should find analogies with the
derivative of a product.
Consider the function z(t) = e^{−λt} y(t). Differentiating yields

z'(t) = e^{−λt}(y' − λy) = 0 ⇔ y = Ce^{λt}.

Thus, we have obtained the only eigenfunctions of D with eigenvalue λ.
In the complex case, the same proof works, but it takes a bit more work. Note that
e^{ibt} = cos bt + i sin bt, and that indeed (d/dt)e^{ibt} = b(−sin bt + i cos bt) = ibe^{ibt}. On the other
hand, let α = a + bi. One has e^{αt} = e^{at}e^{ibt}, and one sees that

(d/dt)e^{αt} = αe^{αt}.
3.5.3 (Review) Let A(t), B(t) be matrices, all of whose coefficients are differentiable.
One has (d/dt)(AB) = (dA/dt)B + A(dB/dt). This works for all kinds of bilinear maps, such as the
product of two complex functions, or the cross product of two vector functions with values
in R³, etc.:

(F × G)' = F' × G + F × G',   (F • G)' = F' • G + F • G'.

One may proceed termwise, or by using bilinearity and the definition of the derivative.
We encourage the reader to work out the derivative of det(F_1, . . . , F_n), where the F_i(t) are
vector functions in R^n of a parameter t.
First-order linear ODEs - formula: Consider a(t), b(t) continuous functions on an in-
terval I ⊂ R. Take a primitive of a(t), say A(t) = ∫_{t_0}^{t} a(s) ds where t_0 ∈ I. We wish to
solve

y' + a(t)y = b(t).

Concentrating on the l.h.s. and looking for the derivative of a suitable product (Uy)', we
find that (e^A y)' = e^{A(t)}(y' + a(t)y), so multiplying by e^{A(t)} yields the equivalent form

(e^{A(t)} y)' = e^{A(t)} b(t).

Integrating yields e^{A(t)} y = ∫_{t_0}^{t} e^{A(s)} b(s) ds + C, where C is an arbitrary constant, hence the
solution

y(t) = e^{−A(t)} ( ∫_{t_0}^{t} e^{A(s)} b(s) ds + C )   (C is an arbitrary constant).
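A concrete check of the formula with SymPy (our example, not from the text): y' + 2y = e^t, with t₀ = 0, so a(t) = 2, A(t) = 2t, b(t) = e^t.

```python
import sympy as sp

t, s, C = sp.symbols('t s C')
y = sp.Function('y')

print(sp.dsolve(sp.Eq(y(t).diff(t) + 2*y(t), sp.exp(t)), y(t)))
# y(t) = C1*exp(-2*t) + exp(t)/3

# the integrating-factor formula above, with A(t) = 2t and t0 = 0:
formula = sp.exp(-2*t) * (sp.integrate(sp.exp(2*s) * sp.exp(s), (s, 0, t)) + C)
print(sp.expand(formula))   # C*exp(-2*t) + exp(t)/3 - exp(-2*t)/3: the same family of solutions
```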

Rephrasing our general linear ODE: Let p(x) ∈ C[x] be a monic nonconstant poly-
nomial. We wish to find an explicit form for the subspace K = ker p(D) ⊂ E. Clearly, this
subspace is also D-invariant, and D|K admits a minimal polynomial, since p(D|K) = 0.
This means that the primary components trick applies.

3.5.1 Homogeneous case, primary components
Let p(z) = ∏_{i=1}^{r} (z − λ_i)^{n_i}. By Theorem 3.3.6, one has

ker p(D) = ⊕_{i=1}^{r} ker(D − λ_i I)^{n_i}.

Thus, our task reduces to finding ker(D − αI)^n explicitly. Note that e^{αt} is a solution.
Actually, writing y = ze^{αt} yields

(D − αI)(ze^{αt}) = z'e^{αt}.

The same works when we apply this k times:

(3.1)   (D − αI)^k(ze^{αt}) = e^{αt}(D^k z).

The following result is a straightforward consequence of (3.1):

Theorem 3.5.4 The solution space ker(D − αI)n , for n ≥ 1, is described by

ker(D − αI)n = Cn−1 [t]eαt ,

where as always Cd [t] is the vector space of polynomials of degree ≤ d.

Theorem 3.5.5 (Homogeneous linear ODEs, constant coefficients) Let p(z) ∈ C[z]
be a nonconstant monic polynomial, and let p(z) = ∏(z − λ_i)^{n_i}. The solution space is

ker p(D) = ker ∏(D − λ_i I)^{n_i} = ⊕ C_{n_i−1}[t]e^{λ_i t}.

As a result, dim ker p(D) = deg p.
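As an illustration (ours, assuming SymPy), take p(z) = z³ − 3z + 2 = (z − 1)²(z + 2): the theorem predicts ker p(D) = C₁[t]e^t ⊕ C₀[t]e^{−2t}, of dimension 3.

```python
import sympy as sp

t = sp.symbols('t')
y = sp.Function('y')

ode = sp.Eq(y(t).diff(t, 3) - 3*y(t).diff(t) + 2*y(t), 0)   # p(D)y = 0 with p(z) = (z-1)^2 (z+2)
print(sp.dsolve(ode, y(t)))
# (up to naming of the constants)  y(t) = (C1 + C2*t)*exp(t) + C3*exp(-2*t)
```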

The following admits a pedestrian solution, but using the tools learnt may provide a more
elegant approach.

3.5.6 Let P(x) ∈ R[x] be a nonzero polynomial. Prove that the following equations have
at most finitely many common solutions:

∫_0^x P(t) cos t dt = 0,    ∫_0^x P(t) sin t dt = 0.

3.5.2 The equation x0 = Ax
Note that, if we have an ODE such as those solved above, we may reduce it to an ODE of
order 1. For instance, given the equation y''' − 3y' + 2y = 0, we may set up the variables
x_1 = y'', x_2 = y', x_3 = y and we obtain the ODE

x' = Ax,   where x(t) = (x_1  x_2  x_3)^T,   A =
[ 0  3  −2 ]
[ 1  0   0 ]
[ 0  1   0 ]

An ODE (resp. linear ODE) of order n turns into an ODE (resp. linear ODE) of order
1, where the new unknown is a vector function.

Definition A (general) linear ODE is an ODE of the form

x0 = A(t)x + b(t),

where b(t) is continuous on an open interval I ⊂ R. If b(t) = 0, this is linear homogeneous.


If A(t) = A has constant coefficients, this is a linear ODE with constant coefficients.

Remark 3.5.7 Let T be a triangular n × n matrix (upper or lower triangular). Show


that the ODE x0 = T x (likewise x0 = T x + b(t)) may be solved recursively as a sequence
of linear ODEs of order 1.

Theorem 3.5.8 Let x0 = Ax be an homogeneous linear ODE. Let A = P JP −1 , where J


is the Jordan form of A. Consider w(t) = P −1 x(t). Solving x0 = Ax reduces to solving
w0 = Jw, which may be solved by a series of linear ODEs of order 1 or by repeated use of
Theorem 3.5.5.
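Numerically, one may bypass the Jordan form altogether and use the matrix exponential: x(t) = e^{tA}x(0). The snippet below (ours, assuming SciPy is available) does this for the matrix of the example at the beginning of this subsection.

```python
import numpy as np
from scipy.linalg import expm

A = np.array([[0., 3., -2.],
              [1., 0.,  0.],
              [0., 1.,  0.]])                   # y''' - 3y' + 2y = 0 as a first-order system
x0 = np.array([1., 0., 0.])                     # initial condition x(0)

def x(t):
    return expm(t * A) @ x0                     # x(t) = e^{tA} x(0)

# finite-difference check that x'(t) = A x(t) at t = 0.5
h, t0 = 1e-6, 0.5
print((x(t0 + h) - x(t0 - h)) / (2*h))
print(A @ x(t0))
```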

3.6 Problems
3.6.1 Let f be an endomorphism of E, where dimK E < ∞, and let p(x) ∈ K[x]. Prove
that p(f ) is invertible if and only if (p(x), mf (x)) = 1.
 
3.6.2 Let A be the matrix

A =
[ 2  1  3 ]
[ 0  2  1 ]
[ 0  0  2 ]

Find all real matrices X ∈ M_3(R) such that X² = A. Solve
also X^m = A for any integer m ≥ 3. (Hint: Use Problem 2.10.11 and use the cited result
therein.)

3.6.3 (Fitting decomposition) Given an endomorphism f ∈ End_K(E), where dim_K E <
∞, prove that there are invariant subspaces F, G of E, with E = F ⊕ G, such that f|F is
nilpotent and f|G is invertible.

3.6.4 Let A, B, C be as in Problem 2.9.5. Work out the cases you left out.

3.6.5 Let A be a complex square matrix. Prove that there is an invertible matrix P such
that AT = P AP −1 .
3.6.6 Let f : E → E be an endomorphism of a finite-dimensional vector space E. Let F
be an invariant subspace of E. Prove that, if E_i are the primary components of f, then

F = ⊕_{i=1}^{r} F ∩ E_i.

3.6.7 Let dim_K E < ∞, where K is an algebraically closed field. Let f : E → E be an
endomorphism such that, for every F ⊂ E invariant by f, there exists a supplementary
subspace G ⊂ E (i.e. such that F ⊕ G = E) which is also f-invariant. Prove that f
diagonalizes. Prove also the converse: if f diagonalizes, then every invariant subspace
F ⊂ E admits a supplementary subspace G which is invariant.
3.6.8 Let K = R or C. Let A ∈ Mn (K). As usual we denote [X, Y ] = XY − Y X (Lie
bracket). Let ZA = {X ∈ Mn (K) : [X, A] = 0} (ZA is called the centraliser of A in
Mn (K)). Prove that dim ZA ≥ n.
3.6.9 (Jordan decomposition) Let f ∈ EndC (E), where dimC E < ∞. Prove that
there are endomorphisms g, h of E such that: g is diagonalizable, h is nilpotent, and both
commute with f and with each other, such that f = g + h. (Both may actually be produced
as polynomials in f ).

3.7 Primary decomposition (the projectors)


The notation δij represents the Kronecker delta function, that is δii = 1 and δij = 0 for
i 6= j.
Theorem 3.7.1 Let f ∈ End_K(E), where either dim_K E < ∞ or dim E is infinite and
there exists a nonzero polynomial which annihilates f. Let mf(x) = ∏ P_i(x), where the P_i(x)
are pairwise coprime. There is a decomposition E = ⊕ E_i, where E_i = ker P_i(f) and
the decomposition is given by a complete orthogonal set of projectors π_i ∈ End(E), i.e.
π_i π_j = δ_{ij} π_j (orthogonal projectors) and Σ π_i = I (complete). Each π_i is given by a
polynomial in f. If P_i(x) = p_i(x)^{ν_i} with p_i(x) irreducible, the decomposition given is the
primary decomposition of f.

Proof: Let Q_i(x) = ∏_{j≠i} P_j(x). The hypotheses yield 1 = Σ a_i(x)Q_i(x) for some
a_i(x) ∈ K[x]. As a result, let π_i = a_i(f)Q_i(f): since mf(x)|Q_i(x)Q_j(x) for i ≠ j, one has
π_i π_j = 0 for i ≠ j. On the other hand, by construction Σ π_i = 1. Let us prove that π_i is
a projector:

π_i = π_i I = π_i Σ_k π_k = π_i π_i + Σ_{j≠i} π_i π_j = π_i²,

which proves π_i to be a projector.
These are clearly the projectors that produce the primary decomposition from Theorem
3.3.6. 
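A tiny SymPy sketch (ours) of the construction on a 2 × 2 example with m_A(x) = (x − 1)(x − 2): here Q₁ = x − 2, Q₂ = x − 1, and Bézout gives a₁ = −1, a₂ = 1.

```python
import sympy as sp

A = sp.Matrix([[1, 1],
               [0, 2]])
I = sp.eye(2)

pi1 = -(A - 2*I)                      # a1(A) Q1(A): projector onto ker(A - I)
pi2 =  (A - 1*I)                      # a2(A) Q2(A): projector onto ker(A - 2I)

print(pi1 + pi2)                      # the identity (complete)
print(pi1 * pi2)                      # the zero matrix (orthogonal)
print(pi1*pi1 - pi1, pi2*pi2 - pi2)   # zero matrices (idempotent)
print(pi1*A - A*pi1)                  # zero matrix: polynomials in A commute with A
```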

3.8 The rational normal form
The Jordan canonical form works for endomorphisms with only linear factors in the factor-
ization of the characteristic polynomial. In the general case, there is Frobenius's rational
normal form. The gist is the same: after reducing to one primary component, one needs to
write E as the direct sum of cyclic subspaces.
Bases for the cyclic subspaces: If p(x) is irreducible and mu(x) = p(x)^e, the basis to
be chosen for Eu reminds one of the numerical system with base p(x), so to speak. That is,
every polynomial admits a unique expression Q(x) = Σ r_i(x)p(x)^i, where deg r_i < deg p.
This presents some advantage over the usual basis u, f(u), . . . , f^i(u), . . ..
Theorem 3.8.1 Let E be such that mf(x) = p(x)^e, with p(x) irreducible. There is
a decomposition E = ⊕_{i=1}^{s} Eu_i into f-invariant subspaces, where the Eu_i are cyclic and
mu_i(x) = p(x)^{e_i}. Ordering the exponents so that e_1 ≥ e_2 ≥ · · · ≥ e_s ≥ 1, we have that the
list of exponents is an invariant of f.
Proof: The proof is a bit more complex than in the Jordan form case. Choose u ∈ E
such that mu(x) = mf(x) = p(x)^e, and ω ∈ E^* such that ω(p(f)^{e−1}u) ≠ 0 (such an ω
exists because p(f)^{e−1}u ≠ 0). By using the dual space E^*, we shall obtain an f-invariant
supplementary F to Eu.
Consider f^* ∈ End_K(E^*), which has the same characteristic and minimal polynomials
as f; the choice of ω forces mω(x) = mf(x) = m_{f^*}(x). The subspace E^*_ω is
f^*-invariant, and its orthogonal (or annihilator) F = (E^*_ω)^⊥ is also f-invariant. In other
words, the subspace F is defined by the equations

ω(f^i(x)) = 0, ∀ i ≥ 0,
and is therefore stable by f , of codimension deg mω = deg mu = deg mf = e deg p, which
is precisely the dimension of Eu . The following equations are linearly independent and
define F by themselves:

ω(x) = 0, ωf (x) = 0, . . . , ωf e deg p−1 (x) = 0.


It remains to check that F ∩ Eu = 0. If v = Σ_{j=0}^{e−1} r_j(f)p(f)^j u ∈ (E^*_ω)^⊥, where deg r_j <
deg p, and v ≠ 0, choose the minimal j_0 such that r_{j_0}(x) ≠ 0, and by Bézout's identity obtain
a(x) such that a(x)r_{j_0}(x) ≡ 1 (mod p(x)). The vector a(f)p(f)^{e−1−j_0}v = p(f)^{e−1}u ≠ 0, and
ω ∘ a(f)p(f)^{e−1−j_0} does not annihilate v, for ω(a(f)p(f)^{e−1−j_0}v) = ω(p(f)^{e−1}u) ≠ 0, which
contradicts v ∈ F. This proves that F ∩ Eu = 0, and counting dimensions yields E =
Eu ⊕ F. Recursive application of this step proves the theorem. 
Theorem 3.8.2 An endomorphism of a finite dimensional vector space, f ∈ End_K(E),
admits a decomposition into cyclic subspaces

E = ⊕_{i=1}^{s} Eu_i,

such that mu_{i+1}(x)|mu_i(x) for all i ≤ s − 1. The polynomials mu_i(x) are called invariant
factors, and determine f up to similarity. The minimal polynomial of f is mu_1(x), and
its characteristic polynomial is cf(x) = ± ∏ mu_i(x).

3.9 Problems
3.9.1 Let f be an endomorphism of a finite-dimensional vector space. We say that f
is semisimple if every invariant subspace F ⊂ E admits an invariant supplement G.
Prove that f is semisimple if and only if mf (x) consists of simple irreducible factors.

3.9.2 Over an arbitrary field K, a matrix A ∈ M_n(K) is similar to its transpose.

Chapter 4

Canonical form of an endomorphism

There are two problems, which are intimately related, in the study of endomorphisms of
a finite-dimensional vector space. Their description amounts to the following:
1. Canonical form.– given A ∈ Mn (K), find a simple, standard form for A. In other
words, find a basis in which A has a simple standard form: A = P JP −1 . In the
case where A diagonalises, this J is the diagonal form of A.
2. Classification.– Given A, B ∈ Mn (K), find a simple way to determine whether
A, B are similar; namely, whether there exists an invertible matrix P ∈ GLn (K)
such that A = P BP −1 .
Such a standard form is called a canonical form, and it essentially describes the similarity
class of a matrix. In the case where the ground field K is algebraically closed, this
is the Jordan canonical form, and is the next best thing to the diagonal form of a
diagonalisable matrix.
See e.g. [12, Ch. VIII] for this subject and suitable background. References [13], [24]
deal only with the Jordan canonical form. [13] does not spare details or computations,
and contains a pedestrian, firm and progressive grasp of the concepts. [24] has a concise
treatment of the Jordan canonical form.

4.1 Jordan form of a nilpotent endomorphism


(THIS SHOULD BE REFERENCED TO THE PROBLEMS LIST OF THE PREVIOUS
CHAPTER)
Example 4.1.1 Let N ∈ Mn (K), where K is a field. Prove that if N is nilpotent and
N n−1 6= 0, then there exists a basis of K n wherein N takes the following form:
 
N ∼
[ 0  0  . . .  0  0 ]
[ 1  0  . . .  0  0 ]
[ 0  1  . . .  0  0 ]
[ . . . . . . . . . ]
[ 0  0  . . .  1  0 ]

Reading the matrix on the right, we see that we need to find a basis u_1, . . . , u_n such
that u_i = N^{i−1}u_1 satisfies: Nu_n = N^n u_1 = 0, and u_1, Nu_1, . . . , N^{n−1}u_1 are linearly
independent. It suffices to choose u_1 ∉ ker N^{n−1}: the minimal polynomial of u_1 then
divides x^n and does not divide x^{n−1}, so mu_1(x) = x^n, which yields dim Eu_1 = n =
deg mu_1, E = K^n = Eu_1 as desired.


We have found a standard form for a nilpotent matrix which is also the companion matrix
for xn (INSERT REF.). The rest of this section is devoted to finding a canonical form for
all nilpotent matrices.
Given f as in this section, the following result specifies its Jordan form.

Theorem 4.1.2 (Jordan form of a nilpotent endomorphism) Let f : E → E be a


nilpotent endomorphism, where dim_K E = n < ∞. There are vectors v_1, . . . , v_s ∈ E such
that

(4.1)   E = ⊕_{i=1}^{s} Ev_i.

In other words, E admits a basis for which f has a block diagonal matrix form, where
the diagonal blocks are such as in (REF. MISSING). Write mv_i(x) = x^{m_i}. If we assume
that m_i ≥ m_{i+1} for i < s, then the sequence m_1 ≥ · · · ≥ m_s ≥ 1 characterises f up to
isomorphism.
Furthermore, mf(x) = x^{max{m_i}} = x^{m_1} and pf(x) = (−x)^n = (−x)^{Σ m_j}.

4.2 Some useful arrangements


We shall now assume Theorem 4.1.2, and introduce some arrangements to our standard
basis for E which produces the desired matrix.
Firstly, consider the cyclic subspace Eu with the basis u, f (u), . . . , f d−1 (u), where d =
deg mu (x). The behaviour of f is made more explicit by displaying the vectors on a stack.

Lemma 4.2.1 Let u_1 = u, u_i = f^{i−1}(u), where mu(x) = x^n and E = Eu. One may
describe the subspaces ker f^i and Im f^i as follows.

(i) ker f^i = ⟨u_{n−i+1}, · · · , u_{n−1}, u_n⟩ for i ≤ n, ker f^i = E (n-dimensional) for i > n.
Thus, dim ker f^i = i for i ≤ n, and = n for i ≥ n.

(ii) Im f^i = ⟨u_{i+1}, . . . , u_n⟩ for i < n, Im f^i = 0 for i ≥ n. Therefore, rk f^i = n − i for
i ≤ n, and 0 for i ≥ n.

Proof: Immediate. 
Now let us assume that E, f admit a special basis such as in (4.1). We represent each Evi
as a stack, and place them adjacently as follows.

Example 4.2.2 Let A ∈ M_5(C) be as follows.

A =
[ 0  0  0  0  0 ]
[ 1  0  0  0  0 ]
[ 0  1  0  0  0 ]
[ 0  0  0  0  0 ]
[ 0  0  0  1  0 ]
We have v11 = v1 = e1 , and v12 = Av1 = e2 , v13 = A2 v1 = e3 . Analogously we have
v2 = v21 = e4 , v22 = Av2 = e5 . Thus the juxtaposition of both stacks yields the following.
Note that there are two stacks, and that dim ker f 1 = 2 (apply Lemma 4.2.1 to each one).
One also has dim ker f 2 = 2 + 2 (total boxes up to floor 2), and again dim ker f 3 = 3 + 2.
Example 4.2.3 Let n = 8. Consider A to be the following matrix:

A =
[ 0  1  0  0  0  0  0  0 ]
[ 0  0  1  0  0  0  0  0 ]
[ 0  0  0  1  0  0  0  0 ]
[ 0  0  0  0  0  0  0  0 ]
[ 0  0  0  0  0  1  0  0 ]
[ 0  0  0  0  0  0  0  0 ]
[ 0  0  0  0  0  0  0  1 ]
[ 0  0  0  0  0  0  0  0 ]
Thus, the matrix A has the shape A = J_4(0)^T ⊕ J_2(0)^T ⊕ J_2(0)^T (block diagonal, with those
as its diagonal blocks). A basis adapted to A would be the following:

E = Q^8 = E_{e_4} ⊕ E_{e_6} ⊕ E_{e_8},

and on each cyclic block we have e_3 = Ae_4, e_2 = Ae_3, e_1 = Ae_2, then e_5 = Ae_6, e_7 = Ae_8,
and ker A has e_1, e_5, e_7 as a basis.

From the above computations, we extract a lesson.
Remark 4.2.4 E, f as above. the number dim ker f i − dim ker f i−1 corresponds to the
number of stacks of height ≥ i (see Lemma 4.2.1).
4.2.5 It would follow from Remark 4.2.4 that ν_i = dim ker f^i − dim ker f^{i−1} is nonin-
creasing. Prove unconditionally that, for every endomorphism f : E → E of a finite-
dimensional vector space E, the sequence ν_i is nonincreasing.
One may easily retrieve the whole ensemble of stacks by taking the numbers νi .
4.2.6 Given the numbers νi , consider the greatest number i0 such that νi0 > 0 (hence
νi0 +1 = 0). Conclude that there are precisely νi0 stacks of height i0 . Use this fact to obtain
all numbers m1 ≥ · · · ≥ ms ≥ 1.
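On a computer, the numbers d_i = dim ker A^i can be read off from ranks of powers; the NumPy sketch below (ours) recovers the stack heights 4, 2, 2 of the 8 × 8 matrix of Example 4.2.3.

```python
import numpy as np

A = np.zeros((8, 8))
A[0, 1] = A[1, 2] = A[2, 3] = 1                 # the J4(0)^T block (0-indexed entries)
A[4, 5] = 1                                     # a J2(0)^T block
A[6, 7] = 1                                     # another J2(0)^T block

d = [0] + [8 - np.linalg.matrix_rank(np.linalg.matrix_power(A, i)) for i in range(1, 6)]
nu = [d[i] - d[i - 1] for i in range(1, 6)]     # nu_i = number of stacks of height >= i

print(d[1:])    # [3, 6, 7, 8, 8]
print(nu)       # [3, 3, 1, 1, 0]  -> one stack of height 4 and two of height 2
```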

4.3 Proof of Theorem 4.1.2


We shall prove it by induction on n, as in [13]. If n = 1, the result is obvious. If
the theorem holds for dimensions up to n − 1, consider F = Im f ⊊ E, endowed with
f|F ∈ End_K(F). By hypothesis, we have a decomposition F = ⊕_{i=1}^{t} F_{v_i} for some v_i ∈ F.
Note that t = dim ker f|F, i.e. t = dim(ker f ∩ Im f). This means that there are µ vectors
w_1, . . . , w_µ from ker f that complete any basis of ker f ∩ Im f to a basis of ker f (here
µ = dim ker f − dim(ker f ∩ Im f)):

(ker f ∩ Im f) ⊕ ⟨w_1, . . . , w_µ⟩ = ker f,

where µ = dim ker f − dim(ker f ∩ Im f) and dim(F + ⟨w_1, . . . , w_µ⟩) = rk f + µ. This
covers the µ stacks in E of height one that would be invisible otherwise: E_{w_k} = ⟨w_k⟩.
It now remains to consider the t nonzero stacks in Im f, e.g. F_{v_i}, of dimension dim F_{v_i} =
m_i. Since v_i ∈ F = Im f, one has v_i = f(u_i), which yields F_{v_i} ⊂ E_{u_i}. Note that
f^{m_i}(u_i) = f^{m_i−1}(v_i) ≠ 0, so F_{v_i} ⊊ E_{u_i} and dim E_{u_i} = dim F_{v_i} + 1 (one storey taller). It now
remains to show that E = (⊕ E_{u_i}) ⊕ (⊕_k ⟨w_k⟩), and to extract our basis from this splitting.
Write u_i^j = f^{j−1}(u_i). We need to prove that the n vectors u_i^j, w_k are linearly independent.
Indeed, if

A = Σ_{i≤t, 1≤j≤m_i+1} α_{ij} u_i^j + Σ_{k≤µ} β_k w_k = 0,   then

0 = f(A) = Σ_{i≤t, 1≤j≤m_i} α_{ij} v_i^j, which yields α_{ij} = 0 for all i and all j ≤ m_i, since the v_i^j
form a basis of F. Thus it remains to show that f^{m_1}(u_1), . . . , f^{m_t}(u_t) and w_1, . . . , w_µ are
linearly independent. By hypothesis, the f^{m_i}(u_i) = f^{m_i−1}(v_i) form a basis of ker f ∩ Im f,
and w_j, j ≤ µ, complete these to a basis of ker f, so α_{i,m_i+1} = 0 and β_j = 0 for all i, j. We
have thus found a Jordan basis for E, f. 

4.4 Jordan canonical form (general)


Let f : E → E be an endomorphism, where E is a K-vector space of finite dimension
dim_K E = n < ∞. Assume that all irreducible factors of pf(x) are linear (i.e., all of its
roots lie in K). (E, f) admits a primary decomposition

E = ⊕_{i=1}^{r} E_i,   where E_i = ker(f − λ_i I)^{m_i},  (x − λ_i)^{m_i} = mE_i(x),  and

mf(x) = ∏_{i=1}^{r} mE_i(x) = ∏_{i=1}^{r} (x − λ_i)^{m_i},   pf(x) = ∏_{i=1}^{r} (λ_i − x)^{n_i},  where n_i = dim E_i and
1 ≤ m_i ≤ n_i.
We now consider the pairs (Ei , f |Ei ) in which the pair (E, f ) splits. f |Ei is not nilpotent,
but f − λi I is.

Notation: Let N be the r × r standard nilpotent matrix of order r, with mN(x) = x^r,
and let λ ∈ K. Write

J_r(0) = N =
[ 0  0  . . .  0  0 ]
[ 1  0  . . .  0  0 ]
[ 0  1  . . .  0  0 ]
[ . . . . . . . . . ]
[ 0  0  . . .  1  0 ]

J_r(λ) = λI + N =
[ λ  0  . . .  0  0 ]
[ 1  λ  . . .  0  0 ]
[ 0  1  . . .  0  0 ]
[ . . . . . . . . . ]
[ 0  0  . . .  1  λ ]

We call Jr (λ) the r × r Jordan block associated with λ.

Theorem 4.4.1 (Jordan canonical form) Let E, f be as above. One has a splitting
E = ⊕_{i=1}^{ρ} ⊕_{j} E_{u_{ij}}, where the u_{ij} ∈ E_i form the building blocks of (f − λ_i I)|E_i. If r_{ij} =
deg m_{u_{ij}}(x) = dim E_{u_{ij}}, then f admits a basis wherein its matrix A is block diagonal of
the form

A =
[ J_{r_{11}}(λ_1)   · · ·        0          · · ·        0           ]
[      ...          ...         ...         ...        ...          ]
[       0           · · ·  J_{r_{ij}}(λ_i)  · · ·        0           ]
[      ...          ...         ...         ...        ...          ]
[       0           · · ·        0          · · ·  J_{r_{ρ s_ρ}}(λ_ρ) ]

Proof: Given a block Euij within ker(f − λi I)ni , choose an adapted basis for f − λi I|Euij .
In this basis, f = f − λi I + λi I has an associated matrix equal to Jrij (λi ) = λi Irij + Nrij ,
and that is the corresponding block. Doing this for every summand in the splitting yields
a matrix as in the statement. 

Corollary 4.4.2 (Jordan decomposition) Let f be as above. There is a decomposition


f = σ + η, such that σ is semisimple (diagonalizable), η is nilpotent, and ση = ησ.

4.5 Problems
4.5.1 Let f be an endomorphism of a finite-dimensional vector space. We say that f
is semisimple if every invariant subspace F ⊂ E admits an invariant supplement G.
Prove that f is semisimple if and only if mf (x) consists of simple irreducible factors.

4.5.2 Over an arbitrary field K, a matrix A ∈ M_n(K) is similar to its transpose.

4.5.3 (Jordan decomposition) Prove Corollary 4.4.2. Use it to compute the expo-
nential of a matrix A ∈ Mn (C).

Chapter 5

Inner products

The structure we study here is that of a real or complex vector space, together with an
inner product. In the real case, we are talking about a general form of an inner product
indeed, which we shall call euclidean inner product. In the complex case, however,
we shall introduce another kind of inner product, called hermitian (or unitary) inner
product.
The notions of vector space and inner product became important at the end of the 19th
century, when applications to physics demanded a certain depth in the results. Thus, in
the beginning of the 20th century, a rise in abstraction of linear algebra took place, hand
in hand with its applications to mathematical physics. This grew exponentially when Max
Born, Nobel Laureate in Physics, spotted matricial structures in Heisenberg’s quantization
of the harmonic oscillator. Max Born had taken care of writing the linear algebra part
of Courant-Hilbert’s magnum opus [8], so he had the training to recognise it! Until
that moment, physicists did not have linear algebra in their syllabus, but linear algebra,
functional analysis and group theory became the bread and butter of quantum theorists.
The standard model of particle physics is a living witness to that, still unsurmounted by
any other physical theory. Even more so, superstring theory is ever more steeped in
mathematics of all kinds, a sort of tutti-frutti of cutting-edge mathematics, and is called
by some physical mathematics (see Lee Smolin [38], Woit [42], Moore [28]).

5.1 Warm-ups
All problems may be solved using the techniques of last semester, and pave the way
for arguments improved or presented differently later in this chapter. The last problem
admits one or more solutions using the tools developed herein.

5.1.1 Given u1 , . . . , un ∈ Rn , define Gij = ui • uj . Prove that, if det G = 0, then


u1 , . . . , un are LD.

5.1.2 Let F ⊂ Rn be a vector subspace, and let r = dim F .

1. Find dim F ⊥ .

2. Prove that F ⊕ F ⊥ = Rn .

3. Prove that F ⊥⊥ = F .

5.1.3 Let F, G be vector subspaces of a vector space E. Assume that F ∩ G = 0. In that


case, we say that F + G is the direct sum of F and G, and write F ⊕ G = F + G.
(a) Prove that F + G satisfies the following property: every element w ∈ F + G may be
written uniquely as w = u + v, where u ∈ F and v ∈ G. (Thus, F ⊕ G is a sort of
cartesian product of F and G, hence the special notation ⊕.)

(b) Let E = Mn (R). Let F = Symn = {A ∈ Mn (R) : AT = A}, G = Skewn = {A ∈


Mn (R) : AT = −A}. Prove that

Symn ⊕ Skewn = Mn (R).

(c) Find bases for Symn and Skewn above.

5.1.4 Let E = C(R) be the vector space of continuous functions over R. Define the
subspaces E = {f ∈ E : f (x) = f (−x)}, O = {f ∈ E : f (−x) = −f (x)} of even and odd
functions, respectively. Prove that

E = C(R) = E ⊕ O.

5.1.5 (VF 2020) Define u = (1 1 1 · · · 1)^T, v = (1 2 3 · · · n)^T ∈ R^n. Define F = {x ∈
R^n : u • x = 0, v • x = 0}. Let A, B ∈ R^n. Prove that

| u•A  u•B |
| v•A  v•B |  ≠ 0   ⇔   F ⊕ ⟨A, B⟩ = R^n.

5.2 Inner products (euclidean case)


Definition Let E be a vector space over R. A map g : E × E → R is defined to be an
inner product over E if it satisfies the following conditions:
(i) g is symmetric, i.e., g(u, v) = g(v, u).

(ii) g is bilinear, that is, linear on each argument: linear on the left, g(αu + βv, w) =
αg(u, w) + βg(v, w); and linear on the right, i.e. g(u, λv + µw) = λg(u, v) + µg(u, w).

(iii) g is positive definite, i.e. g(u, u) ≥ 0, and g(u, u) = 0 ⇒ u = 0.

Notation: The usual notation for an inner product (euclidean or hermitian) is hu, vi.
However, while this is widely used, it may be confused with our notation for the linear
span of a subset, and in these notes we shall use (u, v) whenever the inner product at
hand need not be specified.
The notation g, or G for matrices associated with an inner product, is chosen after Gram.

Example 5.2.1 The canonical euclidean inner product on R^n, (x, y) = x • y = Σ_{i=1}^{n} x_i y_i,
is a prime example.

Example 5.2.2 Consider the following map g : R² × R² → R:

g((x, y), (x', y')) = (x  y) [ a  b ] [ x' ]
                             [ b  c ] [ y' ],

where a > 0, ac − b² > 0. Clearly, g is bilinear and symmetric, and if u = (x y)^T, then
g(u, u) = ax² + 2bxy + cy² > 0 if u ≠ 0. Conversely, if a > 0 and ac − b² > 0, then the
above formula defines an inner product on R².

5.2.3 Define the map B(x, y) = xT M y, where B : Rn × Rn → R and M is an n × n


matrix.

1. Prove that B is bilinear.

2. Prove that B is symmetric if and only if M is symmetric, i.e. M T = M .

3. If M = D is diagonal (D = diag(d1 , . . . , dn )), prove that B is an inner product if


and only if di > 0 for all i ≤ n.

5.2.1 (Bi)linearity
Note that a map is linear, essentially, when it commutes with linear combinations. For
instance, a matrix A of m rows and n columns yields a function T = T_A : R^n → R^m that
is linear, i.e. T(Σ_{i=1}^{r} λ_i u_i) = Σ_{i=1}^{r} λ_i T(u_i). This follows readily from the matrix product!

Proposition 5.2.4 Let E be a real vector space, dim E = n < ∞. Let B : E × E → R be
a bilinear map, and let (e_i) be a basis of E. If x = Σ_{i=1}^{n} x_i e_i, y = Σ_{i=1}^{n} y_i e_i, then B(x, y)
has the following expression:

B(x, y) = B(Σ_{i=1}^{n} x_i e_i, Σ_{j=1}^{n} y_j e_j) = Σ_{i,j=1}^{n} B(e_i, e_j) x_i y_j =

                      [ B(e_1, e_1)  B(e_1, e_2)  . . .  B(e_1, e_n) ] [ y_1 ]
= (x_1 x_2 . . . x_n) [ B(e_2, e_1)  B(e_2, e_2)  . . .  B(e_2, e_n) ] [ y_2 ]
                      [ . . . . . . . . . . . . . . . . . . . . . . ] [ ... ]
                      [ B(e_n, e_1)  B(e_n, e_2)  . . .  B(e_n, e_n) ] [ y_n ]
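In coordinates this is simply x^T G y with G_{ij} = B(e_i, e_j); the small NumPy sketch below (ours) evaluates the bilinear form of Example 5.2.2 this way and checks symmetry and positive definiteness.

```python
import numpy as np

a, b, c = 2.0, 1.0, 3.0                         # a > 0, ac - b^2 > 0: an inner product (our choice)
G = np.array([[a, b],
              [b, c]])                          # Gram matrix G_ij = B(e_i, e_j)

def B(x, y):
    return x @ G @ y                            # B(x, y) = x^T G y

u, v = np.array([1.0, -2.0]), np.array([0.5, 4.0])
print(B(u, v), B(v, u))                         # equal, since G is symmetric
print(B(u, u) > 0, np.all(np.linalg.eigvalsh(G) > 0))   # positive definiteness
```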

5.3 Canonical hermitian inner product on Cn


The following exercise comes in handy when we deal with hermitian spaces.

5.3.1 Prove that |z|2 + Re(az) + b = 0 (b real) has a solution only for |a|2 ≥ 4b (give an
explicit description of the set of solutions if you like).
(see Beck et al, A first course in complex analysis, Q 1.10.)

All concepts introduced (linear dependence and independence, dimension, etc.) using R
as field of scalars translate to the general structure of vector spaces over a field. Therefore,
when we say that two vectors u, v ∈ Cn are proportional, we say that u = λv for some
λ ∈ C, or v = µu for some µ ∈ C.

5.3.2 Let E be an n-dimensional vector space over C. Then: E is a vector space over
R, and
dimR E = 2 dimC E = 2n.

A basic example in Quantum Mechanics (which may be read in [29]) features the hermitian
inner product on C2 and the double slit experiment.
Since we have the identification C = R² (i.e. the Argand plane), the first thing that comes
to mind is, why we should define a new inner product on C^n if we have the canonical
inner product on R^{2n}, identifying

C^n = C × · · · × C = R² × · · · × R² = R^{2n},

and using it on the real coordinates. More precisely, let u, v ∈ C^n and let u_i, v_i be their
complex coordinates. Write now u_k = a_{2k−1} + ia_{2k}, v_k = b_{2k−1} + ib_{2k}, separating into
real and imaginary parts. Thus identified, u corresponds to A = (a_1, · · · , a_{2n}) and v to
B = (. . . , b_i, . . .) in R^{2n}. Their canonical inner product on R^{2n} is therefore:

A • B = (a_1 b_1 + a_2 b_2) + (a_3 b_3 + a_4 b_4) + · · · + (a_{2n−1} b_{2n−1} + a_{2n} b_{2n}) = Re (Σ_{k=1}^{n} u_k v̄_k).

Example 5.3.3 In C2 , take the vectors u = (i, −1), v = (−1, −i) and u0 = (1, i + 1), v 0 =
(1 − i, 2). The inner product in R4 of u, v yields zero, although v = iu. We see that there
is a loss of information regarding proportionality! In the case of Rn we see proportionality
of two vectors through equality in the Cauchy-Schwarz inequality, but the field of scalars is
now C. Could we define a new inner product with a Cauchy-Schwarz inequality detecting
complex proportionality? .

We started this course showing product formulas for complex numbers, and reading the
geometry from them in Paragraph 1.0.2 of last semester’s notes (first lecture of the course).

Definition Let u = (u_1, · · · , u_n), v = (v_1, · · · , v_n) ∈ C^n. The (canonical) hermitian
inner product of u, v is defined as:

u • v = Σ_{k=1}^{n} u_k v̄_k.

Note that the case n = 1 (already shown in detail), u v̄, encloses the signed (or oriented)
area of the parallelogram formed by u, v (its opposite in this case).
Note that Re u • v corresponds to the Euclidean inner product on R^{2n}, following the above
identifications. The imaginary part added contains geometrical information, which will
appear below.

Proposition 5.3.4 (Basic properties of the hermitian inner product on C^n) The
following properties hold:

1. v • u = \overline{u • v};

2. (C-linearity on the left) If λ', λ'' ∈ C, v, u', u'' ∈ C^n, then

(λ'u' + λ''u'') • v = λ'(u' • v) + λ''(u'' • v);

3. (antilinear / semilinear on the right) If µ_1, µ_2 ∈ C, v', v'', u ∈ C^n, then

u • (µ_1 v' + µ_2 v'') = µ̄_1 (u • v') + µ̄_2 (u • v'');

4. (positive definite) u • u ≥ 0, and u • u = 0 ⇒ u = 0. (The norm of u ∈ C^n is defined
as ‖u‖ = √(u • u), and agrees with that of R^{2n} through the usual identifications).

Both second and third properties together are called sesquilinearity. The prefix sesqui
means 'one and a half' (i.e. 'half linear' = semilinear on the right). A sesquilinear form
on C^n is hermitian if it satisfies the first property. Finally, properties 1 through 4 define
an hermitian inner product.

We leave the checking to the reader.

Example 5.3.5 Calculate the hermitian inner product for the pairs of vectors of Example
5.3.3. Note that, although u, v = iu appear to be orthogonal in R4 , their hermitian inner
product u • iu = (−i)u • u = −2i, and the objection raised in Example 5.3.3 vanishes.

We leave the following remark to the reader.

Remark 5.3.6 Let u, v ∈ Cn , such that u•v 6= 0. There are r > 0 and a complex number
of the form eiθ such that u • v = reiθ , both unique, and so we write u • (eiθ v) = r ≥ 0.

5.3.1 Orthogonality and projection in Cn


We say that u ∈ C^n is orthogonal to v ∈ C^n if u • v = 0. Since v • u = \overline{u • v}, saying
that u is orthogonal to v is tantamount to saying that v is orthogonal to u, so the order is
irrelevant.

Theorem 5.3.7 (Pythagoras’s Theorem in Cn ) If u, v ∈ Cn are orthogonal with re-
spect to the canonical hermitian inner product, then

ku + vk2 = kuk2 + kvk2 .

The proof is entirely analogous to its euclidean counterpart.

Given v ≠ 0, there exists a unique λ ∈ C such that u − λv is orthogonal to v (proceed
just as in the real case). One has

λ = (u • v)/(v • v).

Thus, the projection of u on the (complex) span of v is ((u • v)/(v • v)) v, and the normal
component of u with respect to v is

u − ((u • v)/(v • v)) v.

Just as we did in the real case, by Pythagoras's theorem the expression ‖u + λv‖² satisfies

(5.1)   ‖u + λv‖² = ‖u − ((u • v)/(v • v)) v‖² + |λ + (u • v)/(v • v)|² ‖v‖²
If we try to establish an analogue of the Law of Cosines we get:

(u + v) • (u + v) = u • u + v • v + (u • v + v • u) = u • u + v • v + 2Re(u • v).

The angle between u and v is defined to be the same as that resulting from the iden-
tification of Cn with R2n (check out the real part, and proceed accordingly). The new
definition now provides the desired theorem.

Example 5.3.8 Note that, if v = eiθ u where 0 6= u ∈ Cn and 0 ≤ θ < π, then the angle
between u and v is θ.

5.3.2 The hermitian Cauchy-Schwarz inequality


Theorem 5.3.9 (Hermitian Cauchy-Schwarz inequality in Cn ) Let u, v ∈ Cn . The
following inequality holds:
|u • v|2 ≤ (u • u)(v • v),
with equality if and only if u, v are proportional, i.e. v = λu for some λ ∈ C or u = µv
for some µ ∈ C.

Proof: We shall provide one by George Pólya here, and hint at other proofs later.
Let u, v ∈ Cn be such that u • v 6= 0 (strict inequality is clear if u, v are orthogonal, as
is equality precisely when either u or v equals zero). We now use Remark 5.3.6, and we
have:

u • eiθ v = r > 0.

Applying now the real Cauchy-Schwarz inequality to Re(u • e^{iθ}v), i.e. to the euclidean
inner product on R^{2n}, yields:

r2 = |(u • eiθ v)|2 ≤ (u • u)(eiθ v • eiθ v),

and the r.h.s. clearly coincides with (u • u)(v • v). This proves inequality.
Assume that, in the above case, we have equality. This means that both vectors u and
eiθ v, as vectors in R2n , are linearly dependent over R, i.e. since u, v 6= 0, u = λeiθ v for
some λ ∈ R − {0}, which implies that u, v are linearly dependent over C.
If u, v are linearly dependent over C, then it is a simple matter to check that equality
holds in the hermitian case, which concludes the proof. 
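A numerical illustration (ours) of the inequality, with the convention of these notes (conjugation on the second argument):

```python
import numpy as np

rng = np.random.default_rng(0)
u = rng.normal(size=4) + 1j * rng.normal(size=4)
v = rng.normal(size=4) + 1j * rng.normal(size=4)

def herm(x, y):
    return np.sum(x * np.conj(y))               # x . y = sum_k x_k conj(y_k)

print(abs(herm(u, v))**2 <= herm(u, u).real * herm(v, v).real)   # True
print(abs(herm(u, 1j*u))**2, herm(u, u).real**2)                 # equal: v = i*u is proportional to u
```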

On other proofs of hermitian Cauchy-Schwarz: The reader already has a
full set of tools to prove this inequality by other means, by following closely the proofs

full set of tools to prove this inequality by other means, by following closely the proofs
offered in the real case. The complex analogue of the Lagrange identity supplies a full
proof. The computation of the norm of the normal component of u with respect to v
(when v 6= 0) provides another, and the study of the quadratic function of the complex
variable z ∈ C, kA + zBk2 and its minimization provides yet another proof (elaborate on
Exercise 5.3.1). 

5.3.10 (Lagrange identity) Let u, v ∈ C^n, and let u_i, v_i ∈ C be their respective i-th
coordinates. Prove that

(Σ_{i=1}^{n} |u_i|²)(Σ_{i=1}^{n} |v_i|²) − |Σ_{i=1}^{n} u_i v̄_i|² = Σ_{1≤i<j≤n} |u_i v_j − u_j v_i|²,

and derive the Cauchy-Schwarz inequality, specifying the cases where equality holds.

Solution for 5.3.1: |z|² + Re(az) = |z + ā/2|² − |a|²/4. The solution follows from this.
Alternatively, write z = x + iy and an equation for a circle appears.

Using the above, consider A, B ∈ C^n linearly independent vectors. ‖Az + B‖² = ‖A‖²|z|² +
2Re((A • B)z) + ‖B‖² > 0 for every z ∈ C, hence |A • B|² < ‖A‖²‖B‖². The minimum
value of the expression occurs at z = −(B • A)/(A • A) (see formula (5.1) or 5.3.1), which
provides yet another proof.

Solution for 5.3.10: One may prove the identity

(Σ_{i=1}^{n} a_i A_i)(Σ_{j=1}^{n} b_j B_j) − (Σ_{i=1}^{n} a_i B_i)(Σ_{i=1}^{n} A_i b_i) = Σ_{1≤i<j≤n} (a_i b_j − a_j b_i)(A_i B_j − A_j B_i),

and then apply a_i = u_i, A_i = ū_i, b_i = v_i, B_i = v̄_i. 

5.4 Hermitian spaces
Definition An hermitian inner product on a vector space E over C is an hermitian
sesquilinear form that is positive definite, that is:

(IP1) (Sesquilinear) If α, β ∈ C, u, v, w ∈ E, then h(αu + βv, w) = αh(u, w) + βh(v, w)
(C-linear on the 1st variable). For all λ, µ ∈ C, v, w, w' ∈ E, h(v, λw + µw') =
λ̄h(v, w) + µ̄h(v, w') (semilinear/antilinear on the 2nd variable).

(IP2) (Hermitian) Aside from (IP1), one has h(u, v) = \overline{h(v, u)}.

(IP3) (Positive definite) h(u, u) ≥ 0, and if h(u, u) = 0 then u = 0.

A complex vector space E together with an hermitian inner product h, (E, h), is called
an hermitian vector space (h is omitted unless it is not clear from the context). It is
common to write h(u, v) = (u, v).

Example 5.4.1 Let E = C^n, h(x, y) = x^T ȳ = Σ x_i ȳ_i, the canonical hermitian inner product
on C^n. This is an hermitian vector space.

Example 5.4.2 Let E = C_C[a, b] = {f : [a, b] → C continuous}. Let (f, g)_{L²} = ∫_a^b f ḡ.
This is the L²-inner product on C_C([a, b]).

Definition An orthogonal basis of a finite-dimensional hermitian vector space E is a


basis u1 , . . . , un such that (ui , uj ) = 0 for i 6= j. An orthonormal basis further satisfies
that (ui , uj ) = δij , where δij = 1 if i = j, δij = 0 if i 6= j (δij are the coefficients of the
identity matrix).

Example 5.4.3 Let $e_k(x) = e^{ikx}$, where $k \in \mathbb{Z}$. We have $e_k \in C([-\pi, \pi])$. These functions form an orthonormal system for the normalised $L^2$ product, which is $(f, g) = \frac{1}{2\pi}\int_{-\pi}^{\pi} f\overline{g}$. Indeed,
\[
(e_k, e_\ell) = \frac{1}{2\pi}\int_{-\pi}^{\pi} e^{i(k-\ell)x}\,dx = 1 \ \text{if } k = \ell, \qquad 0 \ \text{if } k \neq \ell.
\]
These functions are linearly independent, as we proved in the chapter on vector spaces, but cannot quite be called a basis.

Example 5.4.4 (Hilbert-Schmidt inner product) On $M_n(\mathbb{C})$, consider the inner product given by $(X, Y) = \operatorname{tr} X^T\overline{Y}$. This is an hermitian inner product, and $(X, X) = \sum_{1\le i,j\le n} |x_{ij}|^2$.

Proposition 5.4.5 Let E be an hermitian vector space, $\dim E = n < \infty$. Let $(e_i)$ be a basis of E. If $x = \sum_{i=1}^n x_i e_i$, $y = \sum_{i=1}^n y_i e_i$, then $(x, y)$ has the following expression:
\[
(x, y) = \Bigl(\sum_{i=1}^n x_i e_i, \sum_{j=1}^n y_j e_j\Bigr) = \sum_{i,j=1}^n (e_i, e_j)\, x_i\, \overline{y_j} =
\]
\[
= \begin{pmatrix} x_1 & x_2 & \dots & x_n \end{pmatrix}
\begin{pmatrix}
(e_1, e_1) & (e_1, e_2) & \dots & (e_1, e_n) \\
(e_2, e_1) & (e_2, e_2) & \dots & (e_2, e_n) \\
\dots & \dots & \dots & \dots \\
(e_n, e_1) & (e_n, e_2) & \dots & (e_n, e_n)
\end{pmatrix}
\begin{pmatrix} \overline{y_1} \\ \overline{y_2} \\ \vdots \\ \overline{y_n} \end{pmatrix}.
\]
Notation and silly differences: Dirac, in his book The Principles of Quantum Me-
chanics, demands semilinearity on the left and C-linearity on the right when introducing
the hermitian inner product. Thus, the physicist’s definition describes h(u, v) = h(v, u),
where h is a mathematician’s hermitian inner product.
Dirac’s notation is hf |gi, and involves further ingenuous notations when we consider
operators, which we shall omit.
More and more mathematicians are using the physicist’s notation, see e.g. [15]. We care
not in the least about these differences, and would have preferred to write this chapter
with the physicist’s convention.
Theorem 5.4.6 (Cauchy-Schwarz inequality) Let E, (, ) be an hermitian or euclidean
vector space. If u, v ∈ E, then
|(u, v)|2 ≤ (u, u)(v, v),
and equality holds if and only if u, v are linearly dependent (i.e. parallel).
Proof: If v = 0, the theorem is clear. If $v \neq 0$, then $\bigl(u - \tfrac{(u,v)}{(v,v)}v,\; u - \tfrac{(u,v)}{(v,v)}v\bigr) > 0$ if u, v are linearly independent, and 0 otherwise; expanding this expression yields exactly $(u,u) - \frac{|(u,v)|^2}{(v,v)}$, whence the inequality and the equality case. This settles the theorem.
Adapt Subsection 5.3.2 for other proofs of this Theorem.
Proposition 5.4.7 (Triangle inequality) Let E be an hermitian or euclidean vector space. The triangle inequality holds for the norm $\|x\| = \sqrt{(x, x)}$. That is,
ku + vk ≤ kuk + kvk, with equality if and only if u = λv, λ ≥ 0, or v = µu, µ ≥ 0.
Proof: Indeed, squaring and applying Cauchy-Schwarz yields
ku + vk2 = kuk2 + kvk2 + 2Re(u, v) ≤ kuk2 + kvk2 + 2kuk· kvk,
with equality if and only if u||v and (u, v) is non-negative real. 
5.4.8 (Polarization identities) Let E be an euclidean or hermitian vector space, and
let (, ) be its inner product.
1. If E is euclidean, show that
\[
(u, v) = \frac{1}{2}\bigl(\|u + v\|^2 - \|u\|^2 - \|v\|^2\bigr) = \frac{1}{4}\bigl(\|u + v\|^2 - \|u - v\|^2\bigr).
\]
2. Consider the above expression in the case where E is hermitian. Show that it gives
rise to Re(u, v) instead.
3. Compute (iu, v) in the hermitian case, and show how the norm squared determines
the hermitian inner product, just as it happened in the euclidean case.

5.5 Gram-Schmidt orthogonalization process
Let E be an euclidean or hermitian vector space. Given a nonzero vector u, it is automatic to normalise u and thus to obtain a unit vector, by choosing $\frac{1}{\|u\|}u$. Likewise, given linearly independent vectors u, v, it is easy to form another linearly independent set of vectors which spans the same vector subspace but is orthogonal. Indeed, the pair $u,\; v - \frac{(v,u)}{(u,u)}u$ does the job nicely, and can be normalised to obtain an orthonormal system of vectors.

Theorem 5.5.1 (Gram-Schmidt orthogonalization) Let u1 , · · · , ur be a linearly in-


dependent system of vectors in an euclidean or hermitian space E. There is an orthonor-
mal system v1 , · · · , vr satisfying

hu1 , · · · , ui i = hv1 , · · · , vi i, for every i ≤ r.

Proof: We prove by induction on r that there exists an orthogonal basis wi such that
hu1 , . . . , ui i = hw1 , . . . , wi i for every i ≤ r. If r = 1, the statement is obvious. Let
$w_1, \dots, w_{r-1}$ be an orthogonal system such that $\langle u_1, \dots, u_i\rangle = \langle w_1, \dots, w_i\rangle$ for $i \le r - 1$: write $w_r = \sum_{k=1}^{r-1} a_k w_k + u_r$. The condition that $w_r \perp w_k$, $k \le r - 1$ reads

$0 = (w_r, w_k) = a_k (w_k, w_k) + (u_r, w_k)$,

and so therefore
\[
w_r = u_r - \sum_{k=1}^{r-1} \frac{(u_r, w_k)}{(w_k, w_k)}\, w_k
\]
satisfies the required condition. The choice of $v_k = \frac{1}{\|w_k\|} w_k$ settles the Theorem.
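The process translates directly into a few lines of code. The sketch below (an illustrative Python/numpy version of ours, not a prescribed implementation) orthonormalises the columns of a matrix and checks the result; it works for real or complex entries.

```python
import numpy as np

def gram_schmidt(U):
    """Orthonormalise the columns u_1, ..., u_r of U (assumed linearly independent).
    np.vdot conjugates its first argument, matching (x, y) = sum_i x_i * conj(y_i)."""
    V = []
    for u in U.T:                       # iterate over the columns u_1, ..., u_r
        w = u.astype(complex)
        for v in V:
            w = w - np.vdot(v, w) * v   # subtract the component (w, v) v along each previous unit vector
        V.append(w / np.linalg.norm(w))
    return np.column_stack(V)

U = np.array([[1.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])
V = gram_schmidt(U)
print(np.allclose(V.conj().T @ V, np.eye(3)))   # the resulting vectors are orthonormal
```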

Corollary 5.5.2 Let E be a finite-dimensional euclidean or hermitian vector space. There


exists an orthonormal basis for E.

Corollary 5.5.3 Let E be hermitian or euclidean, dim E = n < ∞, and let F ⊂ E be an


r-dimensional vector subspace. Choose an orthonormal basis u1 , . . . , ur of F , which exists
by Corollary 5.5.2. ui may be extended to an orthonormal basis of E.

Proof: Extend the basis of F given to a basis of E, then apply the Gram-Schmidt process.


5.5.4 Let $f_1, \dots, f_n \in C[a, b]$ be continuous functions on $[a, b]$ with real values. Suppose that the following determinant is zero:
\[
\begin{vmatrix}
\int_a^b f_1^2 & \int_a^b f_1 f_2 & \dots & \int_a^b f_1 f_n \\
\int_a^b f_1 f_2 & \int_a^b f_2^2 & \dots & \int_a^b f_2 f_n \\
\dots & \dots & \dots & \dots \\
\int_a^b f_n f_1 & \int_a^b f_n f_2 & \dots & \int_a^b f_n^2
\end{vmatrix} = 0.
\]
Prove that $f_1, \dots, f_n$ are linearly dependent.

5.5.5 Let u1 , . . . , un be an orthogonal basis of an hermitian space E. Compute explicitly,
for every vector x ∈ E, the coordinates of x in the basis ui .

Clearly, for every orthogonal system of vectors $u_1, \dots, u_r$ in E, the vector $x - \sum_{i=1}^{r} \frac{(x, u_i)}{(u_i, u_i)}\, u_i$ is orthogonal to $u_k$ for every k, and is zero if $u_1, \dots, u_r$ is a basis.
Proposition 5.5.6 Let F ⊂ E be a vector subspace of an euclidean or hermitian finite-
dimensional vector space E. The orthogonal subspace to F , F ⊥ is a vector subspace of
dimension dim E − dim F , and F ⊕ F ⊥ = E.
Proof: Indeed, if we choose an orthonormal basis e1 , . . . , er for F and extend it to an
orthonormal basis (ui )i≤n of E, we have that F ⊥ is the span of er+1 , . . . , en , and so
F ⊕ F ⊥ = E. 
Corollary 5.5.7 Notations and assumptions being as above, one has F ⊥⊥ = F .
Proof: Clearly, F ⊂ F ⊥⊥ , and applying Proposition 5.5.6 both to F and to F ⊥ yields
dim F ⊥⊥ = dim F , so F ⊥⊥ = F . 

5.5.8 Let E be an hermitian vector space, and let u1 , . . . , ur be linearly independent


vectors in E. Show that if G is the Gram matrix of u1 , . . . , ur , i.e. G = (gij ) with
gij = (ui , uj ), then det G > 0.

5.5.1 The Gram-Schmidt process and coordinates


Let u1 , . . . , ur be given linearly independent vectors in an hermitian or euclidean vector
space E. The orthogonal system w1 , . . . , wr obtained by the Gram-Schmidt process sat-
isfies the following property: if we call F = hu1 , . . . , ur i, one has the following expression
of $w_i$ in terms of the basis $(u_i)_{i\le r}$:

$w_i = a_{1i}\, u_1 + \dots + a_{i-1,i}\, u_{i-1} + u_i$.


This is the case, since hw1 , . . . , wk i = hu1 , . . . , uk i for every k ≤ r, in particular for
$k = i - 1$. The process yields
\[
(5.2)\qquad w_i = u_i + \sum_{j=1}^{i-1} \xi_j w_j = u_i + \sum_{j=1}^{i-1} \eta_j u_j
\]

by the above equality, though we use the l.h.s. (left hand side) expression for computation
purposes.
Thus, if we should write the coordinates of the vectors $w_1, \dots, w_r$ in the basis $u_i$, $i \le r$ as columns of a matrix, the result shall be $w_1 = a_{11} u_1 + \dots + a_{r1} u_r$, ..., $w_k = a_{1k} u_1 + \dots + a_{rk} u_r$,
and by formula (5.2) one has aii = 1, and aij = 0 for i > j. In other words, the resulting
matrix A is upper triangular unipotent (upper triangular, and its diagonal consists only
of 1’s).

If we now consider $v_i = \frac{1}{\|w_i\|} w_i$, then the procedure yields an $r \times r$ matrix B of coefficients, where the column $B_k$ corresponds to $\frac{1}{\|w_k\|} A_k$. Thus, the matrix B of coordinates of the vectors $v_i$ with respect to the basis $u_i$ of F is triangular, with only positive elements in its diagonal (to be precise, $\frac{1}{\|w_i\|}$).
Now consider any vector $x = \sum_{k=1}^{r} \alpha_k v_k$. Writing $v_k$ as a linear combination of the $u_i$ for every k yields an expression
\[
\sum_{k=1}^{r} \alpha_k \Bigl(\sum_{j=1}^{r} a_{jk} u_j\Bigr) = \sum_{j=1}^{r} \Bigl(\sum_{k=1}^{r} a_{jk}\, \alpha_k\Bigr) u_j,
\]
and if $x = \sum \beta_k u_k$, then the above reads as follows in matrix form:
\[
\begin{pmatrix} \beta_1 \\ \vdots \\ \beta_r \end{pmatrix} =
\begin{pmatrix} a_{11} & \dots & a_{1r} \\ \vdots & \ddots & \vdots \\ a_{r1} & \dots & a_{rr} \end{pmatrix}
\begin{pmatrix} \alpha_1 \\ \vdots \\ \alpha_r \end{pmatrix}, \qquad \text{i.e.}\qquad
\begin{pmatrix} \beta_1 \\ \vdots \\ \beta_r \end{pmatrix} = A \begin{pmatrix} \alpha_1 \\ \vdots \\ \alpha_r \end{pmatrix}.
\]
Since both $v_k$ and $u_k$ form bases of F, the matrix A is invertible (in our case this is explicitly shown by construction), and so
\[
\begin{pmatrix} \alpha_1 \\ \vdots \\ \alpha_r \end{pmatrix} = A^{-1} \begin{pmatrix} \beta_1 \\ \vdots \\ \beta_r \end{pmatrix}.
\]
We have thus shown:

Proposition 5.5.9 In the Gram-Schmidt process, let $F = \langle u_1, \dots, u_r\rangle$ and let $v_i$ be the resulting orthonormal basis. Let $x \in F$ be written as $x = \sum_{k=1}^r \beta_k u_k$, $x = \sum_{k=1}^r \alpha_k v_k$ in both bases. The following connects both coordinate vectors:
\[
\begin{pmatrix} \beta_1 \\ \vdots \\ \beta_r \end{pmatrix} = A \begin{pmatrix} \alpha_1 \\ \vdots \\ \alpha_r \end{pmatrix}, \qquad
\begin{pmatrix} \alpha_1 \\ \vdots \\ \alpha_r \end{pmatrix} = A^{-1} \begin{pmatrix} \beta_1 \\ \vdots \\ \beta_r \end{pmatrix}.
\]

The following example will illustrate Exercise 5.5.8 and this paragraph.
Example 5.5.10 Let $E = C([-1, 1])$ with the corresponding $L^2$-product, $(f, g) = \int_{-1}^{1} fg$. Consider the elements $u_i = x^{i-1}$, for $i = 1, 2, 3$. We shall apply the Gram-Schmidt process to them. $w_1 = u_1$; $w_2 = \lambda u_1 + u_2$ so that $(w_2, u_1) = 0$, but since $(u_1, u_2) = 0$ one has $\lambda = 0$. The element $w_3 = \alpha u_1 + \beta u_2 + u_3 = \alpha + \beta x + x^2$ is orthogonal to both $u_1, u_2$, i.e. to $1, x$, and so $\alpha = -\frac{1}{3}$, $\beta = 0$. For any polynomial in $F = \mathbb{R}_2[x] = \langle u_1, u_2, u_3\rangle = \langle w_1, w_2, w_3\rangle$, say $p(x) = \sum \alpha_i w_i$, one has
\[
\alpha_1 \cdot 1 + \alpha_2 x + \alpha_3\bigl(x^2 - \tfrac{1}{3}\bigr) = \beta_1 \cdot 1 + \beta_2 x + \beta_3 x^2, \quad \text{hence } \beta_1 = \alpha_1 - \tfrac{1}{3}\alpha_3,\ \beta_2 = \alpha_2,\ \beta_3 = \alpha_3.
\]
Thus
\[
\begin{pmatrix} \beta_1 \\ \beta_2 \\ \beta_3 \end{pmatrix} = A \begin{pmatrix} \alpha_1 \\ \alpha_2 \\ \alpha_3 \end{pmatrix}, \qquad \text{where } A = \begin{pmatrix} 1 & 0 & -\frac{1}{3} \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}.
\]
The Gram matrices of the $u_i$'s and $w_i$'s are:
\[
G_u = \begin{pmatrix} 2 & 0 & \frac{2}{3} \\ 0 & \frac{2}{3} & 0 \\ \frac{2}{3} & 0 & \frac{2}{5} \end{pmatrix}, \qquad
G_w = \begin{pmatrix} 2 & 0 & 0 \\ 0 & \frac{2}{3} & 0 \\ 0 & 0 & \frac{8}{45} \end{pmatrix}.
\]

Thus, if $p(x) = \sum \beta_i u_i = \beta_1 + \beta_2 x + \beta_3 x^2 = \alpha_1 \cdot 1 + \alpha_2 x + \alpha_3(x^2 - \tfrac{1}{3}) = \sum \alpha_i w_i$ and $q(x) = \sum \alpha_i' w_i = \sum \beta_i' u_i$, one has
\[
(p, q) = \begin{pmatrix} \beta_1 & \beta_2 & \beta_3 \end{pmatrix} G_u \begin{pmatrix} \beta_1' \\ \beta_2' \\ \beta_3' \end{pmatrix}
= \begin{pmatrix} \alpha_1 & \alpha_2 & \alpha_3 \end{pmatrix} A^T G_u A \begin{pmatrix} \alpha_1' \\ \alpha_2' \\ \alpha_3' \end{pmatrix}.
\]
Here the r.h.s. follows by mere substitution. On the other hand, we know that
\[
(p, q) = \begin{pmatrix} \alpha_1 & \alpha_2 & \alpha_3 \end{pmatrix} G_w \begin{pmatrix} \alpha_1' \\ \alpha_2' \\ \alpha_3' \end{pmatrix},
\]
and so therefore $G_w = A^T G_u A$ (one may take $\alpha = e_i$, $\alpha' = e_j$ and thus check that all entries of both matrices agree). This illustrates that $\det G_u = (\det A)^{-2}\det G_w$, so $\det G_u$ has the same sign as $\det G_w$, which is clearly positive.
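One may confirm these computations numerically; the following sketch (ours, using numpy's polynomial arithmetic) recomputes $G_u$, $G_w$ and checks that their determinants agree, as they must since $\det A = 1$.

```python
import numpy as np
from numpy.polynomial import Polynomial as P

def l2(p, q):
    """L^2(-1, 1) inner product of two real polynomials."""
    F = (p * q).integ()
    return F(1.0) - F(-1.0)

u = [P([1.0]), P([0.0, 1.0]), P([0.0, 0.0, 1.0])]       # 1, x, x^2
w = [P([1.0]), P([0.0, 1.0]), P([-1/3, 0.0, 1.0])]      # 1, x, x^2 - 1/3

Gu = np.array([[l2(a, b) for b in u] for a in u])
Gw = np.array([[l2(a, b) for b in w] for a in w])
print(Gu)                                                # [[2, 0, 2/3], [0, 2/3, 0], [2/3, 0, 2/5]]
print(Gw)                                                # diag(2, 2/3, 8/45)
print(np.isclose(np.linalg.det(Gu), np.linalg.det(Gw)))  # equal, since det A = 1
```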

5.5.2 Solutions to exercises within this section


Solution for 5.5.4: See Lemma 5.6.1 below, or use 5.5.8 below.
Solution for 5.5.8:
Lemma 5.5.11 Let E be an euclidean or hermitian vector space. Let $u_1, \dots, u_r \in E$ and $v_i, w_j \in \langle u_1, \dots, u_r\rangle$, where $1 \le i \le s$ and $1 \le j \le t$. Write $v_i = a_{1i} u_1 + \dots + a_{ri} u_r$, $w_j = b_{1j} u_1 + \dots + b_{rj} u_r$. If $P_{ij} = (v_i, w_j)$, then
\[
P = A^T G \overline{B},
\]
where G is the Gram matrix of $u_1, \dots, u_r$ (in the euclidean case $\overline{B} = B$).

Indeed, $e_i^T P e_j = (v_i, w_j)$, $Ae_i = (a_{1i}, \dots, a_{ri})^T$ and $Be_j = (b_{1j}, \dots, b_{rj})^T$. Thus,
\[
(v_i, w_j) = \sum_{1\le k,\ell\le r} a_{ki}\,\overline{b_{\ell j}}\,(u_k, u_\ell)
= \begin{pmatrix} a_{1i} & \cdots & a_{ri} \end{pmatrix} G \begin{pmatrix} \overline{b_{1j}} \\ \vdots \\ \overline{b_{rj}} \end{pmatrix}
= e_i^T A^T G \overline{B}\, e_j.
\]

It follows that, if $u_i$ is a set of r vectors and $e_i$ is an orthonormal basis of $F = \langle u_1, \dots, u_r\rangle$, writing $u_i = \sum_{k=1}^{s} q_{ki} e_k$ yields, for the Gram matrix of $u_1, \dots, u_r$: $G = Q^T G_0 \overline{Q}$, where $G_0$ is the Gram matrix of the $e_i$'s, hence $G_0 = \mathrm{Id}$ and $\det G = |\det Q|^2$, and $\det G > 0$ if the $u_i$ form a linearly independent system.
Note that 5.5.8 is a stronger result than Lemma 5.6.1 below!

5.6 Orthogonality and the approximation problem


Let E be an hermitian or euclidean space, and let F ⊂ E be a linear subspace of dimension
r = dim F . Choose a basis ϕi of F .
Given x ∈ E, we wonder what element x∗ ∈ F minimizes the distance to x, that is, what
x∗ ∈ F is such that
kx − x∗ k = min kx − yk.
y∈F

Lemma 5.6.1 Let E be an hermitian or euclidean space, and let u1 , . . . , ur ∈ E be lin-


early independent. The Gram matrix of u1 , . . . , ur is invertible.
 
Proof: Let $G = (g_{ij})$, where $g_{ij} = (u_i, u_j)$. Let $\alpha = (\alpha_1, \dots, \alpha_r)^T$ be in the kernel of G; we wish to extract from it a linear dependence relation on the $u_i$'s. Left multiplication of $G\alpha = 0$ by the row vector $\alpha^* = (\overline{\alpha_1}, \dots, \overline{\alpha_r})$ yields
\[
0 = \alpha^* G \alpha = \sum_{1\le i,j\le r} \overline{\alpha_i}\,(u_i, u_j)\,\alpha_j = \Bigl(\sum_{i=1}^{r} \overline{\alpha_i}\, u_i,\ \sum_{j=1}^{r} \overline{\alpha_j}\, u_j\Bigr) = \Bigl\|\sum_{i=1}^{r} \overline{\alpha_i}\, u_i\Bigr\|^2,
\]
hence $\overline{\alpha_1}\, u_1 + \dots + \overline{\alpha_r}\, u_r = 0$, so therefore $\alpha_i = 0$ for all $i \le r$ by linear independence (in the euclidean case the conjugations simply disappear).

Theorem 5.6.2 Let F ⊂ E be an r-dimensional subspace of an hermitian or euclidean


vector space E, where r ∈ N, and let x ∈ E. There exists precisely one element x∗ ∈ F
such that
kx − x∗ k = min kx − yk.
y∈F

Proof: There is precisely one x∗ ∈ F such that x − x∗ ∈ F ⊥ , and this x∗ ∈ F minimizes


the distance of x to F . The first claim follows from the system of equations

(x − x∗ , ui ) = 0, for 1 ≤ i ≤ r.

Write $x^* = \sum_{i=1}^{r} c_i u_i$. The condition that $x - x^* \in F^\perp$ is equivalent to a system of r equations and r unknowns $(c_i)$, namely
\[
(x^*, u_i) = \sum_{k=1}^{r} c_k (u_k, u_i) = (x, u_i), \qquad 1 \le i \le r.
\]

The above system of equations (the normal equations is the usual term) is as follows:
\[
\underbrace{\begin{pmatrix}
(u_1, u_1) & (u_2, u_1) & \dots & (u_r, u_1) \\
(u_1, u_2) & (u_2, u_2) & \dots & (u_r, u_2) \\
\dots & \dots & \dots & \dots \\
(u_1, u_r) & (u_2, u_r) & \dots & (u_r, u_r)
\end{pmatrix}}_{=G^T = \overline{G}}
\begin{pmatrix} c_1 \\ c_2 \\ \vdots \\ c_r \end{pmatrix}
= \begin{pmatrix} (x, u_1) \\ (x, u_2) \\ \vdots \\ (x, u_r) \end{pmatrix}.
\]
If G is the Gram matrix of $u_1, \dots, u_r$ and $c = (c_1, c_2, \dots, c_r)^T$, we get
\[
c = \overline{G}^{-1} \begin{pmatrix} (x, u_1) \\ (x, u_2) \\ \vdots \\ (x, u_r) \end{pmatrix},
\]

which proves both existence and uniqueness. It remains to prove that the element obtained
minimizes the distance. Let y ∈ F : by Pythagoras’s theorem, one has

\[
x - y = \underbrace{(x - x^*)}_{\in F^\perp} + \underbrace{(x^* - y)}_{\in F}, \qquad \text{hence} \qquad \|x - y\|^2 = \|x - x^*\|^2 + \|x^* - y\|^2,
\]

so kx − yk > kx − x∗ k whenever x∗ 6= y ∈ F .
The fact that F ∩ F ⊥ = 0 is immediate (u ∈ F ⊥ ∩ F ⇒ u ⊥ u, i.e. (u, u) = 0), as is the
fact that F + F ⊥ = E (x = x∗ + (x − x∗ ) ∈ F + F ⊥ ). 

Remark 5.6.3 (Streamlined proof ) Note that, if we use the Gram-Schmidt process
and produce an orthonormal basis $e_1, \dots, e_r$ of F, the solution is straightforward: the element $x^* = \sum_{i=1}^{r} (x, e_i)\, e_i$ is the desired orthogonal projection.
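Computationally, the normal equations give the projection in two lines. The following Python/numpy sketch (function name and test data are ours) solves them for a random subspace and verifies that $x - x^*$ is orthogonal to F.

```python
import numpy as np

def project(x, U):
    """Orthogonal projection of x onto the span of the columns of U
    (columns need not be orthonormal): solve the normal equations (U* U) c = U* x."""
    c = np.linalg.solve(U.conj().T @ U, U.conj().T @ x)
    return U @ c

rng = np.random.default_rng(1)
U = rng.standard_normal((5, 2))          # two (almost surely independent) vectors in R^5
x = rng.standard_normal(5)
p = project(x, U)
print(np.allclose(U.T @ (x - p), 0))     # x - x* is orthogonal to F
```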

5.6.1 The least squares method


Given a system of linear equations Ax = b, where b ∈ Rm and A is m × n, it may
happen that it is inconsistent. In other words, it may be that such x ∈ Rn does not exist;
equivalently, b ∈/ Im A. In that case, we consider an additional hypothesis, which is that
ker A = 0, i.e., the columns of A are linearly independent.

Let b0 be the orthogonal projection of b on Im A. The system Ax = b0 has a solution,
which is necessarily unique since ker A = 0 (so the homogeneous solution is {0}). The unique $x_0$ satisfying $Ax_0 = b_0$ is said to be obtained using the least squares method, for it minimizes $\|Ax - b\|$.
Note that, since $\ker A = 0$, $\ker A^T A = 0$ by Lemma 5.6.1 (applied to the columns of A), and $A^T A$ is a square matrix of order n, hence invertible. Thus, $Ax_0 = b_0$ if and only if $A^T A x_0 = A^T b_0 = A^T b$ (note that
b − b0 ∈ F ⊥ and so is orthogonal to the columns of A, hence AT (b − b0 ) = 0), and so

x0 = (AT A)−1 AT b.

Remark 5.6.4 The above argument is nothing more than the normal equations applied
to the case ui = Ai , and so on.

Example 5.6.5 (Linear regression) Let xi 6= xj ∈ R for 1 ≤ i, j ≤ n, i 6= j. Given a


‘cloud’ of points (xi , yi ) on the plane, we wish to find the line y = ax + b that best fits this
cloud of points. The problem to be solved, whether consistent or not, is the following:
\[
A = \begin{pmatrix} x_1 & 1 \\ x_2 & 1 \\ \vdots & \vdots \\ x_n & 1 \end{pmatrix}, \qquad
\beta = \begin{pmatrix} a \\ b \end{pmatrix}, \qquad
y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix}, \qquad A\beta = y.
\]
One readily sees that $\operatorname{rk} A = 2$. It remains to solve the system $A^T A \begin{pmatrix} a \\ b \end{pmatrix} = A^T y$. Since
\[
A^T A = \begin{pmatrix} \sum x_i^2 & \sum x_i \\ \sum x_i & n \end{pmatrix}, \qquad
A^T y = \begin{pmatrix} \sum x_i y_i \\ \sum y_i \end{pmatrix},
\]
so therefore
\[
\begin{pmatrix} a \\ b \end{pmatrix} = (A^T A)^{-1} A^T y, \quad \text{i.e.,} \quad
\begin{pmatrix} a \\ b \end{pmatrix} = \begin{pmatrix} \sum x_i^2 & \sum x_i \\ \sum x_i & n \end{pmatrix}^{-1} \begin{pmatrix} \sum x_i y_i \\ \sum y_i \end{pmatrix}.
\]

Proposition 5.6.6 The solution x0 provided minimizes kAx − bk2 .

Indeed, this whole paragraph is merely a particular case of Theorem 5.6.2 and the discus-
sion around it.

Ax − b = (Ax − b0 ) + (b0 − b), hence kAx − bk2 = k(Ax − b0 )k2 + k(b0 − b)k2 ≥ kb0 − bk2 ,

and this lower bound is attained precisely when x = x0 . 

5.7 Fourier series and L2-projection
Consider the hermitian space E = CC (T), which corresponds to the space of contin-
uous 2π-periodic functions f : R → C (with complex values), and to the (averaged)
$L^2$-hermitian product
\[
(f, g) = \frac{1}{2\pi}\int_{-\pi}^{\pi} f\,\overline{g}.
\]
Let $F_n = \bigoplus_{k=-n}^{n} \mathbb{C}e_k$, where $e_k(t) = e^{ikt}$ (this is a known orthonormal system to us). The elements of $F_n$ are called trigonometric polynomials of degree $\le n$.

Definition Let f ∈ CC (T) (one may define this for arbitrary functions that are Riemann
integrable over [−π, π]). Let n ∈ Z. The n-th Fourier coefficient of f is defined to be
\[
\hat{f}(n) = (f, e_n) = \frac{1}{2\pi}\int_{-\pi}^{\pi} f(t)\,e^{-int}\,dt.
\]
Since the $e_k$ form an orthonormal system, $F_n$ clearly has an orthonormal basis $e_k$, and the orthogonal projection of f to $F_n$ coincides with
\[
S_n[f](x) = \sum_{k=-n}^{n} \hat{f}(k)\,e^{ikx}.
\]

5.7.1 Prove that $S_n[f]$ is indeed the orthogonal projection of f on $F_n$.
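As a quick numerical companion (a sketch of ours; the sample function and the Riemann-sum approximation are illustrative choices), one can approximate Fourier coefficients and compare them with values computed by hand.

```python
import numpy as np

def fourier_coeff(f, n, m=4000):
    """Approximate the n-th Fourier coefficient (1/2pi) int_{-pi}^{pi} f(t) e^{-int} dt
    by a Riemann sum with m equally spaced sample points."""
    t = np.linspace(-np.pi, np.pi, m, endpoint=False)
    return np.mean(f(t) * np.exp(-1j * n * t))

f = np.abs                                   # f(t) = |t|, continuous on [-pi, pi]
print(fourier_coeff(f, 0).real, np.pi / 2)   # hat f(0) = pi/2
print(fourier_coeff(f, 1).real, -2 / np.pi)  # hat f(1) = -2/pi
```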

5.8 Problems
5.8.1 Consider the euclidean vector space $M_2(\mathbb{R})$ with the Hilbert-Schmidt inner product, $(A, B) = \operatorname{tr} A^T B$. Given the matrix $U = \begin{pmatrix} 3 & 2 \\ 2 & 1 \end{pmatrix}$, find the orthogonal projection of $V = \begin{pmatrix} 1 & 1 \\ 1 & 2 \end{pmatrix}$ on the subspace $\langle U\rangle^\perp$.

5.8.2 (VE1 2021) Let F, G ⊂ E be vector subspaces of a vector space E over R of


dimension n = dimR E < ∞, such that F ∩ G = {0}. Prove that there exists an inner
product on E such that F and G are orthogonal subspaces, i.e. (u, v) = 0 for every u ∈ F ,
v ∈ G.

5.8.3 (Bessel’s Inequality) Let f ∈ E. Then, for every n ∈ N,


\[
\frac{1}{2\pi}\int_{-\pi}^{\pi} |f|^2 \ \ge\ \sum_{k=-n}^{n} |\hat{f}(k)|^2.
\]

5.8.4 Given $f(x) = \pi^2 - x^2$, $f \in E$ and $n \in \mathbb{N}$, find $c_0, \dots, c_n$ such that $f^*(x) = \sum_{j=0}^{n} c_j \cos jx$ minimizes the norm $\|f - f^*\|_{L^2}$.

5.8.5 Let E = C([−1, 1]) (real vector space). Let E ⊂ E (resp. O ⊂ E) be the subspace
of even functions in E (resp. odd functions). Prove that E ⊥ = O.

5.8.6 (QR factorization) Let A be a square matrix of order n, such that det A 6= 0.
Prove that there exist a matrix Q such that QT Q = I and an upper triangular matrix R
with positive diagonals rii > 0 such that A = QR, and that they are both unique.
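A numerical sketch of the statement (ours, under the assumption that A is real and invertible): the factors Q and R can be produced column by column by the Gram-Schmidt process applied to the columns of A.

```python
import numpy as np

def qr_gram_schmidt(A):
    """QR factorization with r_ii > 0, built column by column via Gram-Schmidt."""
    n = A.shape[1]
    Q, R = np.zeros_like(A), np.zeros((n, n))
    for j in range(n):
        w = A[:, j].copy()
        for i in range(j):
            R[i, j] = Q[:, i] @ A[:, j]    # coefficient along the i-th orthonormal column
            w -= R[i, j] * Q[:, i]
        R[j, j] = np.linalg.norm(w)
        Q[:, j] = w / R[j, j]
    return Q, R

A = np.array([[2.0, 1.0], [1.0, 3.0]])
Q, R = qr_gram_schmidt(A)
print(np.allclose(Q @ R, A), np.allclose(Q.T @ Q, np.eye(2)), np.all(np.diag(R) > 0))
```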

5.8.7 Let n ≥ 2 be an integer. Is the following matrix invertible?


1 1 1

1 2
... n
1 1
. . . n+1 1 
2 3
. . . . . . . . . . . . . . . . . .

1 1 1
n n+1
. . . 2n−1

5.8.8 Let w(x) > 0 be a continuous function, w : [0, 1] → R. Prove that the following
defines an inner product on E = Rn [x], where n ∈ N:
\[
(p, q)_w = \int_0^1 p(x)\,q(x)\,w(x)\, dx.
\]

1. Prove that E has an orthonormal basis pk ∈ E, such that deg pk = k for each
0 ≤ k ≤ n.

2. Prove that (pk , p0k )w = 0 for each k.


5.8.9 Is the following matrix invertible?
\[
\begin{pmatrix}
\frac{1}{n+1} & \dots & \frac{1}{2n} \\
\frac{1}{n+2} & \dots & \frac{1}{2n+1} \\
\vdots & \ddots & \vdots \\
\frac{1}{2n} & \dots & \frac{1}{3n-1}
\end{pmatrix}.
\]

Justify your answer.

5.8.10 (Apostol Vol. II, Ch. 1) Define on Rn [x] (vector space of polynomials of de-
gree ≤ n over R) the following function:
\[
B(f, g) = \sum_{k=0}^{n} f\Bigl(\frac{k}{n}\Bigr)\,g\Bigl(\frac{k}{n}\Bigr).
\]

1. Show that B is an inner product.

2. Compute, for a, b ∈ R, the inner product B(x, ax + b).

3. Find out which linear polynomials are orthogonal to x under B.

5.8.11 Show that the following functions form an orthogonal system in E = C([−π, π]):

1, {cos nx, sin nx}n∈N ,

and derive the formulas for an , bn below:

\[
a_n = \frac{1}{\pi}\int_{-\pi}^{\pi} f(x)\cos nx\, dx, \qquad b_n = \frac{1}{\pi}\int_{-\pi}^{\pi} f(x)\sin nx\, dx,
\]
where $a_0$, $a_n$ and $b_n$ ($n \in \mathbb{N}$) are the Fourier coefficients associated with the above trigonometric functions, i.e. for every $n \in \mathbb{N}$, the function f has the following orthogonal projection on $F_n = \langle 1, \{\cos kx, \sin kx\}_{1\le k\le n}\rangle$:
\[
\pi_{F_n} f = \frac{a_0}{2} + \sum_{k=1}^{n} a_k \cos kx + b_k \sin kx.
\]

5.8.12 Consider En to be the vector space over C of sesquilinear forms on Cn . (Prove


first that En is a vector space over C).

1. Prove that to every element B of $E_n$ corresponds a unique matrix $A \in M_n(\mathbb{C})$ such that $B(x, y) = x^T A\,\overline{y}$.

2. Prove that $B \in E_n$ is hermitian if and only if $A^T = \overline{A}$, or in other words, $A^* = A$ (here $A^* = \overline{A}^T$).

3. Prove that $B \in E_n$ is anti-hermitian (i.e. $B(y, x) = -\overline{B(x, y)}$ for every x, y) if and only if iB is hermitian.

4. Every sesquilinear form may be written uniquely as B = η + ξ, where η is hermitian


and ξ is anti-hermitian.

5. Compute dim Hn , where Hn is the subspace of hermitian forms in En .

Appendix: Fourier series (optional)


While working on the heat equation, Joseph Fourier surmised that every 2π-periodic
bounded function f : R → R with finitely many discontinuities in every finite interval has
an associated (Fourier) series,

a0 X
f∼ + an cos nx + bn sin nx,
2 n=1

where an , bn are computed in terms of products of the type f (x) cos nx or f (x) sin nx.
Many of Fourier’s arguments were infantile and error-ridden, but this powerful idea sur-
vived and enriched after polishing by towering geniuses such as Riemann and Dirichlet.

If we have not thus far unashamedly written that

\[
\text{``}f(x) = \frac{a_0}{2} + \sum_{n=1}^{\infty} a_n \cos nx + b_n \sin nx\text{''},
\]

it is due to lack of precision regarding the word ‘convergence’ and other issues.

Meaning of L2 and convergence


There are two obstacles that naturally arise in our attempt to make sense of a possible
identity with an infinite series, which is precisely a limit of finite sums: we mean to say that f is the limit of finite sums.
The first issue is what one means by equality. A function with $f(x) = 0$ for every $x \neq p_i$, where $p_1, \dots, p_m$ are pairwise distinct points in $[-\pi, \pi]$, passes off as zero under every inner product considered here, since they are given by integrals. In particular $(f, f) = 0$, which should be reserved to $0 \in E$. Of course, such an f (if nonzero at the $p_i$) is not continuous, so $f \notin E$ to start with!
The other issue is, in what sense do we portray f as a limit? As an example, let 0 < α <
1/2. Define the sequence
\[
f_n(x) = n^{\alpha} \ \text{ for } 0 \le |x| \le \frac{1}{n}, \qquad f_n(x) = |x|^{-\alpha} \ \text{ for } |x| > \frac{1}{n}.
\]
The functions $f_n \in E$. On the other hand, if $f(x) = |x|^{-\alpha}$, then $\lim_{n\to\infty} (f - f_n, f - f_n) = 0$.
And yet, much as we may call f the limit of the sequence (fn ), f is clearly not continuous
in [−π, π], and is not even bounded.
It is for those reasons that these issues belong to another book, see e.g. [35].

Specific convergence results


Proposition 5.8.13 (Parseval’s identity) For f a complex function with finitely many discontinuities in $[-\pi, \pi]$ and such that $\int_{-\pi}^{\pi} |f|^2 < \infty$, the following equality holds:
\[
(f, f) = \frac{1}{2\pi}\int_{-\pi}^{\pi} |f|^2 = \sum_{n\in\mathbb{Z}} |\hat{f}(n)|^2.
\]

Proposition 5.8.14 If f : R → C is 2π-periodic and of class C 1 (the derivative exists


and is continuous), then Sn [f ] converges uniformly to f in R.
Theorem 5.8.15 (Dirichlet’s Test) If f is piecewise C 1 and has a finite number of
discontinuities (only jump or avoidable discontinuities allowed), then for every x0 the
numerical sequence SN [f ](x0 ) is convergent, and its limit is precisely

f (x+
0 ) + f (x0 )
lim SN [f ](x0 ) = .
N →∞ 2
5.8.16 Using the Fourier expansions for x and $x^2$ on $[-\pi, \pi]$, find the numbers $\sum_{n=1}^{\infty} \frac{1}{n^2}$ and $\sum_{n=1}^{\infty} \frac{1}{n^4}$. (Answers: $\sum_{n=1}^{\infty} \frac{1}{n^2} = \frac{\pi^2}{6}$, $\sum_{n=1}^{\infty} \frac{1}{n^4} = \frac{\pi^4}{90}$. See the next paragraph!)
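As a numerical illustration of 5.8.16 (a check of ours, not a proof), the partial sums converge to the stated values:

```python
import numpy as np

N = 100000
n = np.arange(1, N + 1, dtype=float)
print(np.sum(1 / n**2), np.pi**2 / 6)    # partial sum of zeta(2) vs. pi^2/6
print(np.sum(1 / n**4), np.pi**4 / 90)   # partial sum of zeta(4) vs. pi^4/90
```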

Problems and famous results on Fourier series
Here we pose problems that use the above described techniques (taken from [40]), and
state some amazing results.

5.8.17 Find the Fourier series of f (x) = x on [−π, π]. (Find the answer below)
 
\[
x \sim 2\Bigl(\sin x - \frac{\sin 2x}{2} + \frac{\sin 3x}{3} - \dots\Bigr)
\]

5.8.18 Find the Fourier series of f (x) = x2 on [−π, π].



\[
\text{Answer:}\qquad x^2 = \frac{\pi^2}{3} + \sum_{n=1}^{\infty} (-1)^n\,\frac{4}{n^2}\cos nx.
\]

5.8.19 Find the Fourier series of f (x) = x3 on [−π, π]. (Answer: See the expansion
below, which converges pointwise to x3 if |x| < π)

\[
x^3 \sim \sum_{n=1}^{\infty} 2(-1)^n\Bigl(\frac{-\pi^2}{n} + \frac{6}{n^3}\Bigr)\sin nx.
\]

5.8.20 Find the Fourier series of f (x) = |x| on [−π, π]. (The convergence is also point-
wise in the points specified below).

\[
|x| = \frac{\pi}{2} - \sum_{n=0}^{\infty} \frac{4}{(2n+1)^2\pi}\,\cos\bigl((2n+1)x\bigr), \qquad |x| < \pi.
\]

The following may be used to prove the Theorem at the end.

5.8.21 Let a ∈
/ Z be a real number. Find the Fourier series of cos ax in [−π, π].

5.8.22 Use 5.8.21 to show that, for $z \notin \pi\mathbb{Z}$,
\[
\frac{1}{\sin z} = \frac{1}{z} + \sum_{n=1}^{\infty} (-1)^n\Bigl(\frac{1}{z + n\pi} + \frac{1}{z - n\pi}\Bigr),
\]
\[
\cot z = \frac{1}{z} + \sum_{n=1}^{\infty} \Bigl(\frac{1}{z + n\pi} + \frac{1}{z - n\pi}\Bigr).
\]

Euler presented several proofs that $\sum_{n=1}^{\infty} \frac{1}{n^2} = \frac{\pi^2}{6}$. The first one was a bit unrigorous, but served as a stepping stone for the following general result. We use the notation $\zeta(s) = \sum_{n=1}^{\infty} \frac{1}{n^s}$, for $s \in \mathbb{C}$, $\operatorname{Re} s > 1$ (this was Euler’s notation!).

Theorem 5.8.23 (Euler) If $p \in \mathbb{N}$, then the number $\zeta(2p) = \sum_{n=1}^{\infty} \frac{1}{n^{2p}}$ is a rational number times $\pi^{2p}$. More precisely,
\[
\zeta(2p) = (-1)^{p+1}\,\frac{2^{2p-1}\pi^{2p} B_{2p}}{(2p)!},
\]

where Bk are the Bernoulli numbers.

Euler was the one to introduce the so-called Riemann zeta function, and produced other
interesting results on ζ(s), including the famous formula

\[
\sum_{n=1}^{\infty} \frac{1}{n^s} = \prod_{p \text{ prime}} \frac{1}{1 - p^{-s}}.
\]

ζ is called the Riemann zeta function due to a fascinating paper of Riemann’s, where
he proves its analytic continuation in two ways and connects it to the distribution of
prime numbers. The Prime Number Theorem (asymptotic distribution of prime numbers)
had been conjectured by Gauss himself, and was settled independently by Hadamard
and de la Vallée-Poussin using the Riemann zeta function. The Riemann Hypothesis,
which Riemann conjectured in that outstanding paper, is one of the Clay Math Institute’s
Millennium Problems on mathematics.

Chapter 6

The spectral theorem

6.1 Bilinear and sesquilinear forms


Definition Let E be a vector space of dimension n over a field K. A bilinear form on
E is a map
B :E×E →K
such that for every x, y ∈ E the maps B(•, y) and B(x, •) are both linear maps from E
to K.
If $e_1, \dots, e_n$ is a basis for E, then B(x, y) has the following expression for $x = \sum x_i e_i$, $y = \sum y_i e_i$:
\[
B(x, y) = B\Bigl(\sum_i x_i e_i, \sum_j y_j e_j\Bigr) = \sum_{1\le i,j\le n} x_i y_j\, B(e_i, e_j).
\]

This may be written in matrix form:
\[
B(x, y) = \begin{pmatrix} x_1 & \cdots & x_n \end{pmatrix}
\begin{pmatrix}
B(e_1, e_1) & \dots & B(e_1, e_n) \\
\vdots & \ddots & \vdots \\
B(e_n, e_1) & \dots & B(e_n, e_n)
\end{pmatrix}
\begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix}.
\]
If we define the matrix $B_e$ as $(B_e)_{ij} = B(e_i, e_j)$, then
\[
(6.1)\qquad B(x, y) = (x)_e^T\, B_e\, (y)_e.
\]

A bilinear form is symmetric if B(x, y) = B(y, x) for every x, y ∈ E. By formula (6.1),


B is symmetric if and only if, for one basis (or for every basis) (ei ) of E, Be = BeT is a
symmetric matrix.

Proposition 6.1.1 (Base change) Let (ei ), (ui ) be two bases of an n-dimensional vector
space E, and let B be a bilinear form on E. If Γ is the matrix with columns (ui )e , then

Bu = ΓT Be Γ.

Proof: It suffices to note that (x)e = Γ(x)u for any x ∈ E.

Definition Let E be a real vector space. An inner product on E is a bilinear form (, )
on E satisfying the following properties:

1. (x, y) = (y, x) (symmetric);

2. (x, x) ≥ 0, and (x, x) = 0 ⇒ x = 0 (positive definite).

A symmetric bilinear form such that S(x, x) ≥ 0 but S(x, x) may be 0 for some x 6= 0 is
called positive semidefinite.

Remark 6.1.2 Note that a bilinear form on a vector space E is determined by the values
B(ei , ej ), where (ei ) is a basis of E.

Example 6.1.3 Let B : R2 × R2 → R be the bilinear form determined by B(e1 , e1 ) =


1, B(e2 , e1 ) = 3 = B(e1 , e2 ), and B(e2 , e2 ) = 5. Prove that B is an euclidean inner product
on R2 . The symmetric bilinear form B 0 given by B 0 (e1 , e1 ) = 2, B 0 (e1 , e2 ) = 4, B 0 (e2 , e2 ) =
8 is instead positive semidefinite, but is not an inner product. We leave the details to the
reader.

6.1.4 Let $E = \langle 1, \cos x, \cos 2x, \dots, \cos nx, \sin x, \dots, \sin nx\rangle \subset C^\infty(\mathbb{R})$. Find the matrix of the bilinear form $B(f, g) = \int_0^{2\pi} fg$ in the basis listed. Using this, prove that the functions are linearly independent.

Example 6.1.5 Let ABCD be the standard regular tetrahedron in Euclidean space R3 .
~ AC,
Find the matrix of the canonical inner product in the basis AB, ~ AD.
~
Note that the norm of each vector is 1, and the inner product of two distinct vectors is
1/2. The Gram matrix with respect to this basis is therefore
 1 1
1 2 2
1 1 1 .
2 2
1 1
2 2
1

Proposition 6.1.6 E, B as above. The maps E → E ∗ defined by x 7→ B(x, •) and


x 7→ B(•, x), which need not be equal, have the same rank.

Proof: Indeed, fixing a basis (ei ) on E and its dual basis (e∗i ) on E ∗ , the matrices of both
maps correspond to Be and BeT .

Proposition 6.1.7 (Polarization identities) Let K be a field, such that char K 6= 2.


Let B be a symmetric bilinear form, and let Q(x) = B(x, x) be the associated quadratic
form. Q(x) determines B, and in fact
\[
B(x, y) = \frac{1}{2}\bigl[Q(x + y) - Q(x) - Q(y)\bigr] = \frac{1}{4}\bigl[Q(x + y) - Q(x - y)\bigr].
\]
6.1.8 Prove Proposition 6.1.7.

Definition Let E be a complex vector space. A sesquilinear form is a map

B :E×E →C

that is bilinear over $\mathbb{R}$, and such that $B(\lambda x, y) = \lambda B(x, y)$, $B(x, \mu y) = \overline{\mu}\, B(x, y)$ for all $\lambda, \mu \in \mathbb{C}$, $x, y \in E$. In other words, B is linear in the first variable and semilinear (i.e. antilinear) on the second. A sesquilinear form B is called hermitian if $B(y, x) = \overline{B(x, y)}$ for all $x, y \in E$.

The following proposition furnishes lots of examples.

Proposition
P P Let dim E = n ∈ N (finite-dimensional). If ei is a basis of E and
6.1.9
x = xi ei , y = yj ej , then
  
B(e1 , e1 ) . . . B(e1 , en ) y1
.
.. .
.. .
.. .
B(x, y) = (x1 · · · xn )    ..  .
  
B(en , e1 ) . . . B(en , en ) yn

Proposition 6.1.10 (Base change for sesquilinear forms) Let (ei ), (ui ) be two bases
of an n-dimensional vector space E over C, and let B be a sesquilinear form on E. If Γ
is the matrix with columns (ui )e , then

\[
B_u = \Gamma^T B_e \overline{\Gamma}.
\]

Definition Let E be a complex vector space. An (hermitian) inner product on E is


a sesquilinear form (, ) on E satisfying the following properties:

1. (x, y) = (y, x) (hermitian);

2. (x, x) ≥ 0, and (x, x) = 0 ⇒ x = 0 (positive definite).

An hermitian form such that S(x, x) ≥ 0 but S(x, x) may be 0 for some x 6= 0 is called
positive semidefinite.

Example 6.1.11 On Cn , take the standard hermitian inner product


\[
(x, y) = \sum x_i\, \overline{y_i}.
\]

(, ) is an hermitian inner product.

6.2 Euclidean and hermitian spaces


Definition An euclidean space (resp. hermitian) is a pair (E, h), where E is a vector
space over R (resp. C) endowed with an euclidean (resp. hermitian) inner product h.

Example 6.2.1 Rn , resp Cn with the canonical euclidean (resp. hermitian) inner product
are prime examples of euclidean (resp. hermitian) vector spaces.

Example 6.2.2 Let [a, b] ⊂ R be a compact interval, and let V ⊂ CC ([a, b]) be a vector
subspace. Consider the inner product
\[
\langle f, g\rangle = \int_a^b f\,\overline{g}.
\]

The pair V, h , i is an hermitian vector space.

Definition In an hermitian or Euclidean space V, h, i, an orthogonal system (ui ) is a


subset of V such that every two different elements are orthogonal, i.e. hui , uj i = 0, i 6= j.
An orthonormal system is an orthogonal system of unit elements, namely, such that
hui , uj i = δij . Orthogonal (resp. orthonormal) bases are bases which are orthogonal
(resp. orthonormal).

Example 6.2.3 Prove that the system $(e^{ikx})_{k\in\mathbb{Z}}$ in $C_{\mathbb{C}}([-\pi, \pi])$ is an orthonormal system, when choosing the normalised inner product $\langle f, g\rangle_{L^2} = \frac{1}{2\pi}\int_0^{2\pi} f\overline{g}$. Conclude that this system is linearly independent, and compare with 6.1.4.

Example 6.2.4 (Hilbert-Schmidt inner product) The following sesquilinear form on


Mm×n (C) is an hermitian inner product:

\[
\langle A, B\rangle = \operatorname{Tr}(A^T \overline{B}),
\]
and is called the Hilbert-Schmidt inner product. (The real version is $\operatorname{Tr}(X^T Y)$ on $M_{m\times n}(\mathbb{R})$.)

Remark Note that, if $e_i$ is a basis of E, then it has a matrix associated with it:
\[
\begin{pmatrix}
(e_1, e_1) & \dots & (e_1, e_n) \\
\vdots & \ddots & \vdots \\
(e_n, e_1) & \dots & (e_n, e_n)
\end{pmatrix}.
\]
This matrix is called the Gram matrix of the basis $e_i$. Given a system of vectors $u_1, \dots, u_r$, the Gram matrix associated with the $u_i$'s is the $r \times r$-matrix
\[
G_u = \begin{pmatrix}
(u_1, u_1) & \dots & (u_1, u_r) \\
\vdots & \ddots & \vdots \\
(u_r, u_1) & \dots & (u_r, u_r)
\end{pmatrix}.
\]

Example 6.2.5 In an euclidean vector space E, the Gram matrix associated with an
orthonormal basis ei is the identity, Ge = I. Thus, the inner product in this basis is
expressed as follows:
\[
\Bigl(\sum x_i e_i, \sum y_j e_j\Bigr) = \sum x_i y_i.
\]
If the basis is orthogonal, then Ge is diagonal and non-degenerate (det Ge 6= 0).

6.2.6 The Hilbert-Schmidt inner product is the only inner product on Mm×n (C) having
the canonical basis of matrices as an orthonormal basis.

Theorem 6.2.7 (Pythagoras’s Theorem) On an euclidean or hermitian space E, if


$u, v \in E$ are orthogonal then $\|u + v\|^2 = \|u\|^2 + \|v\|^2$.

Proof: Indeed, expanding (u + v, u + v) yields (u + v, u + v) = (u, u) + (v, v) as desired.



Theorem 6.2.8 (Cauchy-Schwarz inequality) Let u, v ∈ E, where E is an euclidean
or hermitian space. The following inequality holds:

|(u, v)|2 ≤ (u, u)(v, v),

with equality if and only if u, v are linearly dependent.


Proof: We give one possible proof. If u = 0, the theorem clearly holds, so we shall assume $u \neq 0$. If u, v are linearly dependent, then $v = \alpha u$ and equality is easily checked, so we assume further that u, v are linearly independent. The vector v decomposes as $v = \frac{(v,u)}{(u,u)}u + \bigl(v - \frac{(v,u)}{(u,u)}u\bigr)$, where the first summand is the orthogonal projection of v on the direction of u and the second is its normal component. By Pythagoras's Theorem, we have
\[
v + \lambda u = \Bigl(\lambda + \frac{(v, u)}{(u, u)}\Bigr)u + \Bigl(v - \frac{(v, u)}{(u, u)}u\Bigr) \ \Rightarrow\ |\lambda u + v|^2 = \Bigl|\lambda + \frac{(v, u)}{(u, u)}\Bigr|^2 |u|^2 + \Bigl|v - \frac{(v, u)}{(u, u)}u\Bigr|^2.
\]
One sees by inspection that the absolute minimum of this function of $\lambda$ is attained when $\lambda = -\frac{(v,u)}{(u,u)}$, and is $> 0$, since $\langle u, v + au\rangle = \langle u, v\rangle$ has dimension 2 for any $a \in \mathbb{C}$. Expanding the expression $|v - \frac{(v,u)}{(u,u)}u|^2$ yields:
\[
0 < \Bigl|v - \frac{(v, u)}{(u, u)}u\Bigr|^2 = (v, v) + \frac{(v, u)(u, v)}{(u, u)} - 2\,\frac{(v, u)(u, v)}{(u, u)} = \frac{(v, v)(u, u) - |(u, v)|^2}{(u, u)},
\]
which settles the theorem.
Corollary 6.2.9 (Triangle inequality) |u + v| ≤ |u| + |v|. Equality holds if and only if
u, v are proportional and the proportionality constant is non-negative.
Proof: Squaring the inequality yields

(u + v, u + v) = (u, u) + (v, v) + 2Re (u, v) ≤ (u, u) + 2|u|· |v| + (v, v).

This in turn is equivalent to Re (u, v) ≤ |(u, v)| ≤ |u|· |v|, and for equality to hold the
Cauchy-Schwarz inequality must turn into an equality, i.e. u, v are linearly dependent.
Assuming that none is zero, we have $u = \alpha v$, and for both inequalities to become equalities it is necessary and sufficient to have $\alpha \in [0, \infty)$.
Note that the notion of angle between two nonzero vectors u, v 6= 0 can now be defined:
such angle $\theta$ is the only angle between 0 and $\pi$ such that $\cos\theta = \frac{\operatorname{Re}(u, v)}{|u|\cdot|v|}$.

Notation: Let $A \in M_n(\mathbb{R})$ (resp. $\mathbb{C}$). We denote by $A^*$ the matrix $\overline{A}^T$ (which in the real case is of course $A^T$).

6.2.10 Let A be a real or complex m × n matrix. Prove that ker A∗ A = ker A . Analo-
gously, prove that Im AA∗ = Im A.

6.2.1 The Gram-Schmidt orthogonalization process


Theorem 6.2.11 (Gram-Schmidt orthogonalization process) Let u1 , . . . , ur be a lin-
early independent system of vectors in an euclidean or hermitian vector space E, h, i.
One may obtain vectors u0i , 1 ≤ i ≤ r such that hu1 , · · · , ui i = hu01 , · · · , u0i i for all i ≤ r,
such that u0i form an orthogonal system. Normalising u0i one may obtain an orthonormal
system v1 , . . . , vr such that hu1 , · · · , ui i = hv1 , · · · , vi i for all i ≤ r.

Proof: Take $u_1' = u_1$, and $v_1 = \frac{1}{|u_1'|}u_1'$. As for $u_2'$, choose $u_2 + a_1 u_1$ that is orthogonal to $u_1' = u_1$:
\[
(u_2 + a_1 u_1, u_1) = 0, \quad \text{i.e.} \quad (u_2, u_1) + a_1 (u_1, u_1) = 0.
\]
This determines $a_1 = -\frac{(u_2, u_1)}{(u_1, u_1)}$. Now assume that we have determined $u_1', \dots, u_{r-1}'$; we shall find $u_r' = u_r + b_1 u_1' + \dots + b_{r-1} u_{r-1}'$ (the vectors involved in this linear combination generate $\langle u_1, \dots, u_r\rangle$).
The vector $u_r'$ must be orthogonal to $u_1', \dots, u_{r-1}'$ or, equivalently, to $u_1, \dots, u_{r-1}$, so the equations are
\[
(u_r', u_1') = 0, \ \dots, \ (u_r', u_{r-1}') = 0,
\]
which become
\[
b_i = -\frac{(u_r, u_i')}{(u_i', u_i')}, \qquad \text{for all } i = 1, \dots, r - 1.
\]
By the operations performed, one clearly has an orthogonal system u01 , · · · , u0r such that
hu1 , · · · , ui i = hu01 , · · · , u0i i also for i = r.
Normalising the u0i yields an orthonormal system satisfying the same property: hu1 , · · · , ui i =
hv1 , · · · , vi i for i ≤ r. 

Corollary 6.2.12 Every finite-dimensional hermitian or euclidean space E has an or-


thonormal basis.

Proof: Applying the Gram-Schmidt process to a basis of E yields an orthonormal basis


of E. 

Corollary 6.2.13 Let u1 , . . . , ur be a linearly independent set of vectors in an hermitian


or euclidean space E. The Gram matrix of the ui has a positive determinant.

Proof: The ui form a basis of F = hu1 , · · · , ur i. Choose an orthonormal basis (ei ) of F ;
one has
\[
G_u = \Gamma^T\, \overline{\Gamma},
\]
where $\Gamma$ is the base change matrix whose i-th column is $(u_i)_e$, so indeed $\det G_u = |\det \Gamma|^2 > 0$.

6.2.14 Prove that u1 , · · · , ur are linearly dependent if and only if their Gram matrix is
non-invertible, namely det Gu = 0.

Remark Note that Corollary 6.2.13 and 6.2.14 imply the Cauchy-Schwarz inequality
when r = 2, and form a ‘high-degree’ version thereof.

Theorem 6.2.15 Let F be a vector subspace of a finite-dimensional hermitian or eu-


clidean vector space E. The following decomposition holds:

F ⊕ F ⊥ = E.

Proof: The Gram-Schmidt process makes the proof quick, though it is not strictly nec-
essary. It is clear that F ∩ F ⊥ = 0, for if v ∈ F ∩ F ⊥ , then (v, v) = 0, hence v = 0.
Given an orthonormal basis (ei )1≤i≤r of F , extend it to a basis (ei ) of E. By the Gram-
Schmidt process, one may assume that this basis is orthonormal. It is easy to see that
F ⊥ = her+1 , · · · , en i, which proves the theorem. 

6.3 Sylvester’s criterion for inner products


Let B be a real symmetric or complex hermitian matrix. We shall find a criterion to
determine whether B defines an euclidean or hermitian product, i.e. whether or not B is positive definite.
The starting point is Corollary 6.2.13.

Theorem 6.3.1 (Sylvester’s criterion) The following are equivalent for an n × n real
symmetric or complex hermitian matrix B:

1. B is positive definite.

2. The angular minors of B are all positive, namely

\[
\begin{vmatrix}
b_{11} & b_{12} & \dots & b_{1r} \\
b_{21} & b_{22} & \dots & b_{2r} \\
\vdots & \vdots & \ddots & \vdots \\
b_{r1} & b_{r2} & \dots & b_{rr}
\end{vmatrix} > 0,
\]

for all r = 1, · · · , n.

Proof: One direction follows from Corollary 6.2.13. Assuming 2, let us prove 1 by
induction on n. If n = 1, it is clear. If n > 1, assuming 2 for all r we have: 1 holds for
the subspace he1 , · · · , en−1 i. Thus, since the submatrix B 0 = (bij )1≤i,j≤n−1 satisfies 2, B 0
is positive definite, and so by Corollary 6.2.12 admits an orthonormal basis v1 , · · · , vn−1 .
Let the base change matrix be $\Gamma'$ (square $(n-1)\times(n-1)$), and write B in block form as $\begin{pmatrix} B' & w \\ w^* & \alpha \end{pmatrix}$. After performing the base change with $\begin{pmatrix} \Gamma' & 0 \\ 0 & 1 \end{pmatrix}$, one sees that the matrix of B in the basis $v_1, \dots, v_{n-1}, e_n$ of $\mathbb{C}^n$ corresponds to
\[
\Phi = \begin{pmatrix} I & a \\ a^* & \beta \end{pmatrix}.
\]
Now, by virtue of 2 we still have $\det \Phi > 0$, i.e. $\det \Phi = \beta - \sum_{i=1}^{n-1} |a_i|^2 > 0$ (see e.g. ??). The matrix $\Phi$ still satisfies 2, and the vectors $e_1, \dots, e_{n-1}$ of the canonical basis are orthonormal for $\Phi$. Let us complete the Gram-Schmidt process (if possible).
Indeed, it remains to modify $e_n$ so that $e_n'$ shall be orthogonal to $e_i$, $i \le n - 1$. Consider
\[
e_n' = e_n - \sum_{i=1}^{n-1} (e_n, e_i)\, e_i :
\]
clearly, $e_n' \perp e_i$, $i \le n - 1$ by construction (do check it!). One also has that $e_n'^{\,*} \Phi\, e_n' = \beta - \sum_{i=1}^{n-1} |a_i|^2 = \mu > 0$, and so $e_1, \dots, e_{n-1}, \frac{1}{\sqrt{\mu}} e_n'$ form an orthonormal basis for $\Phi$, which means that $\Phi$ is positive definite and hence so is B.
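A quick numerical sanity check of the criterion (our own sketch): compute the angular minors of a matrix and compare with the spectral characterisation of positive definiteness.

```python
import numpy as np

def sylvester_positive(B):
    """Positivity of all leading ('angular') minors of a real symmetric matrix."""
    return all(np.linalg.det(B[:k, :k]) > 0 for k in range(1, B.shape[0] + 1))

B = np.array([[2.0, -1.0, 0.0],
              [-1.0, 2.0, -1.0],
              [0.0, -1.0, 2.0]])
print(sylvester_positive(B))               # True: the minors are 2, 3, 4
print(np.all(np.linalg.eigvalsh(B) > 0))   # agrees with positivity of the eigenvalues
```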

6.3.1 Generalising Sylvester’s criterion


Proposition 6.3.2 Let A be a real symmetric matrix, with nonzero angular minors of every order, $\mu_k = \bigl|A^{[k]}_{[k]}\bigr| \neq 0$. There exist a lower triangular unipotent matrix L and a diagonal matrix D such that $A = LDL^T$.

Proof: Decompose A = LU , and write U = DLT1 . Such decomposition exists and is


unique, by Section ??. Now, A = AT = L1 DLT , and uniqueness yields L = L1 . 

Theorem 6.3.3 Let G be a real n × n symmetric matrix, with nonzero angular minors
µk 6= 0 of every order k. There is a triangular basis wherein G admits a diagonal form
D = diag(d1 , · · · , dn ), and the signs of the diagonal are as follows: sgn(d1 ) = sgn(g11 ),
and sgn(dk ) = sgn(µk /µk−1 ) for 2 ≤ k ≤ n. An analogous statement holds for a complex
hermitian matrix G: a decomposition LT DL = G exists, and the rest holds verbatim.

6.3.4 Prove Theorem 6.3.2.

6.3.5 Prove Theorem 6.3.3.

6.4 The adjoint operator
Definition Given two euclidean or hermitian spaces E, F (of finite dimension), the ad-
joint operator of a linear map f : E → F is the unique linear map f ∗ : F → E such
that, for every x ∈ E, y ∈ F

hf (x), yiF = hx, f ∗ (y)iE .

If we choose bases ei of E and ui of F , let A, B be the matrix of f, f ∗ in these bases and


let G, G0 be the Gram matrices of the respective inner products of E, F in the chosen
bases. The definition of f ∗ reads as follows.

\[
A^T G' = G\,\overline{B}.
\]
In other words,
\[
(6.2)\qquad B = \overline{G}^{-1}\,\overline{A}^{T}\,\overline{G'} = (G^T)^{-1} A^{*}\, (G')^{T}
\]
(in the euclidean case the conjugations disappear and $B = G^{-1}A^T G'$).
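The formula can be checked numerically. The sketch below (ours; euclidean, endomorphism case $E = F$, so $G' = G$ and the conjugations disappear) builds a random non-orthonormal basis, computes B from A by (6.2) and verifies the defining identity.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 3
P = rng.standard_normal((n, n)) + 3 * np.eye(n)   # columns of P: a generically non-orthonormal basis of R^3
G = P.T @ P                                       # its Gram matrix (e_i, e_j)
A = rng.standard_normal((n, n))                   # matrix of f in that basis
B = np.linalg.inv(G) @ A.T @ G                    # matrix of f*, formula (6.2) in the euclidean case

X, Y = rng.standard_normal(n), rng.standard_normal(n)
print(np.isclose((A @ X) @ G @ Y, X @ G @ (B @ Y)))   # <f(x), y> = <x, f*(y)>
```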

6.4.1 One has f ∗∗ = f .

Remark Note that, if E, F are fixed, then the map f 7→ f ∗ from HomC (E, F ) to
HomC (F, E) is R-linear, and C-semilinear in the hermitian case.

Corollary 6.4.2 Choose orthonormal bases (ei ) of E and (uj ) of F . If the matrix of f
in these bases is A, then the matrix of $f^*$ in the same bases is $\overline{A}^{T} = A^*$.

Proposition 6.4.3 Let f : E1 → E2 be a linear map, where Ei are both euclidean or


hermitian spaces. One has (ker f )⊥ = Im f ∗ , ker f ∗ = (Im f )⊥ .

Proof: We prove the first statement only. $f(x) = 0$ if and only if $(f(x), v) = 0$ for all $v \in E_2$, i.e. if $0 = (x, f^*(v))$ for all $v \in E_2$, but this is tantamount to $x \in (\operatorname{Im} f^*)^\perp$. Thus $\ker f = (\operatorname{Im} f^*)^\perp$, and taking orthogonal complements (Corollary 5.5.7) yields $(\ker f)^\perp = \operatorname{Im} f^*$.

Corollary 6.4.4 Given a matrix A ∈ Mm×n (C), one has ker A = (Im A∗ )⊥ , and ker A∗ =
(Im A)⊥ . 

Definition Let E be an euclidean or hermitian vector space, and let f be an endomor-


phism of E. We say that f is self-adjoint (symmetric in the real case, hermitian in the
complex case) if f = f ∗ . We say that f is anti-self-adjoint (antisymmetric in the real
case, antihermitian in the complex case) if f ∗ = −f .

Definition Let E1 , E2 be hermitian or euclidean vector spaces. A linear operator f :


E1 → E2 is an isometry (i.e. a linear isometry) if (x, y)1 = (f (x), f (y))2 for every
x, y ∈ E1 . If E1 = E2 = E, isometries are called orthogonal operators (euclidean
case), resp. unitary operators (hermitian case).

6.4.5 Isometries are always injective, and invertible in the case of endomorphisms of a
finite-dimensional vector space.

6.5 Spectral theorem for self-adjoint operators
Here E is finite-dimensional.

Proposition 6.5.1 Let f ∈ End(E) be an hermitian operator. Every eigenvalue of f is


real.

Proof: Indeed, if we fix an orthonormal basis ei of E, the matrix of f satisfies A = A∗ .


If v ∈ Cn is an eigenvector of A and λ is its eigenvalue, then for the canonical hermitian
inner product (, ) on Cn we have

(v, Av) = (Av, v), i.e. λ(v, v) = λ(v, v),

and so λ is real, since (v, v) 6= 0. The same proof holds for the euclidean case by extending
scalars from R to C on the matrix A, see Exercise ??. 

Lemma 6.5.2 Let f ∈ End(E), where E is a finite-dimensional hermitian or euclidean


space. Let F ⊂ E be an f -invariant subspace. Then F ⊥ is an f ∗ -invariant subspace of E.

Proof: Let w ∈ F ⊥ . For any u ∈ F , one has (u, f ∗ (w)) = (f (u), w) = 0, which shows
that f ∗ (w) ∈ F ⊥ .

Theorem 6.5.3 (Spectral theorem) Every self-adjoint operator admits an orthonor-


mal basis in which it diagonalises.

Proof: By induction on dim E. If dim E = 1, it is obvious. Let λ be an eigenvalue of f .


We have: ker(f − λI) is f -invariant, and so also is G = ker(f − λI)⊥ = Im(f − λI) by
Lemma 6.5.2. Clearly dim G < dim E, and f |G is self-adjoint, too, hence diagonalizable
in an orthonormal basis by induction. This basis can clearly be extended to one of E.
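Numerically, the statement is exactly what numpy.linalg.eigh computes for a real symmetric (or complex hermitian) matrix; the following sketch (ours) verifies it on a random example.

```python
import numpy as np

rng = np.random.default_rng(3)
M = rng.standard_normal((4, 4))
S = (M + M.T) / 2                          # a real symmetric (self-adjoint) matrix
w, Q = np.linalg.eigh(S)                   # real eigenvalues and an orthonormal eigenbasis
print(np.allclose(Q.T @ Q, np.eye(4)))     # the eigenbasis is orthonormal
print(np.allclose(Q @ np.diag(w) @ Q.T, S))
```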

Corollary 6.5.4 Let f ∈ End(E) be self-adjoint. One has

µ|x|2 ≤ (x, f (x)) ≤ M |x|2 ,

where $\mu$, M are the smallest (resp. greatest) eigenvalues of f. Each one of the inequalities turns into an equality for some value of x.

Proof: It suffices to take an orthonormal basis of eigenvectors. 

6.5.5 Prove that f ∗ = −f if and only if, (x, f (x)) = 0 for every x ∈ E.

6.5.6 Prove the spectral theorem for anti-self-adjoint endomorphisms (complex case): if
A∗ = −A, then A admits an orthonormal basis of eigenvectors.

6.6 Spectral theorem for unitary endomorphisms
There is a basic observation one can make about the eigenvalues of unitary endomor-
phisms.

Lemma 6.6.1 A unitary endomorphism of an hermitian vector space has all its eigenvalues of modulus 1.

Proof: Let 0 6= v ∈ E be an eigenvector, and let λ be its eigenvalue. The equality


(f (v), f (v)) = (v, v) yields λλ = 1.

Theorem 6.6.2 Let E be a finite-dimensional hermitian vector space, and let $f \in \operatorname{End}(E)$ be unitary.
There is an orthonormal basis of E in which f diagonalises, and all its eigenvalues are
complex numbers of modulus 1.

Proof: By induction on dim E. The case dim E = 1 is clear. Let λ be an eigenvalue of f ;


one has two natural invariant subspaces: $\ker(f - \lambda I)$ and $G = \ker(f - \lambda I)^\perp$. Indeed, this
follows from Lemma 6.5.2 and the remark that being f -invariant is equivalent to being
f −1 -invariant. Clearly, f |G is a unitary endomorphism of G, and the Theorem follows
just as in Theorem 6.5.3. 

Theorem 6.6.3 (Orthogonal case) Let E be euclidean, and let f be orthogonal, i.e.
$f f^* = I_E$. f admits an orthonormal basis in which the matrix of f has the form
\[
\begin{pmatrix}
I_r & 0 & 0 & \cdots & 0 \\
0 & -I_s & 0 & \cdots & 0 \\
0 & 0 & N_1 & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & 0 & \cdots & N_t
\end{pmatrix},
\qquad \text{where } N_k = \begin{pmatrix} \cos\theta_k & -\sin\theta_k \\ \sin\theta_k & \cos\theta_k \end{pmatrix}.
\]
Proof: Fix an orthonormal basis, and consider A to be the matrix of f in this basis. The
case $\lambda = \pm 1$ is plain. Suppose that $\lambda \notin \mathbb{R}$ is an eigenvalue of A, and that u is a unit vector in its eigenspace. The vector $\overline{u}$ is an eigenvector of A, its eigenvalue is $\overline{\lambda}$, and $u, \overline{u}$ form an orthonormal system in $\mathbb{C}^n$. Now, $\langle u, \overline{u}\rangle \cap \mathbb{R}^n$ is 2-dimensional, and an orthonormal basis thereof is given by $\frac{1}{\sqrt 2}(u + \overline{u}),\ \frac{1}{i\sqrt 2}(u - \overline{u})$. One may associate with each eigenvalue $\lambda$ with $\operatorname{Im}\lambda > 0$ an orthonormal basis $u_j$ of $\ker(A - \lambda I)$, and then pair each vector with its conjugate in $\ker(A - \overline{\lambda} I)$ as above. Since $\ker(A - \lambda I) \perp \ker(A - \lambda' I)$ for $\lambda \neq \lambda'$, one thus obtains an orthonormal basis for the whole of $\mathbb{R}^n$.

6.7 Normal endomorphisms
Definition Let f : E → E be an endomorphism of a real Euclidean or complex hermitian
space. We say that f is normal if f f ∗ = f ∗ f .
6.7.1 Let $A \in M_n(\mathbb{C})$. Suppose that $A, \overline{A}^T$ commute, i.e. $A\overline{A}^T = \overline{A}^T A$. Show that $\ker A = \ker \overline{A}^T$ and $\operatorname{Im} A = \operatorname{Im} \overline{A}^T$.

6.7.2 (Spectral theorem for normal endomorphisms) Let f : E → E be an endo-


morphism of an hermitian space. If f is normal, then there is an orthonormal basis in
which f diagonalises. (Hint: Use Exercise 6.7.1.)

6.8 Norm of an operator


We restrict ourselves to the euclidean and hermitian norms derived from inner products,
and to finite-dimensional vector spaces.

Definition Let T : E → F be a linear operator between two finite-dimensional euclidean


or hermitian vector spaces. The norm of T is defined to be
\[
\|T\| = \sup_{x\neq 0} \frac{|Tx|_F}{|x|_E}.
\]

Definition Let f ∈ End(E) be a positive semidefinite self-adjoint endomorphism. The


spectral radius of f , ρ(f ), is defined to be the highest eigenvalue of f .
Proposition 6.8.1 Let T be as above. One has $\|T\| = \sqrt{\rho(T^* T)}$. The norm is attained
by a nonzero vector, i.e. the supremum in the definition is a maximum.

Proof: |T x|2 = (T x, T x) = (x, T ∗ T x) ≤ M (x, x), where M is the highest eigenvalue of


T ∗ T . If x = u is a corresponding eigenvector, then equality holds. 
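A short numerical check (ours): for a random matrix, $\sqrt{\rho(T^*T)}$ coincides with the operator 2-norm, and the supremum is attained at an eigenvector of $T^*T$ for the largest eigenvalue.

```python
import numpy as np

rng = np.random.default_rng(4)
T = rng.standard_normal((4, 3))
w, V = np.linalg.eigh(T.T @ T)                         # eigenvalues of T*T, in ascending order
spec = np.sqrt(w[-1])                                  # sqrt of the spectral radius of T*T
print(np.isclose(spec, np.linalg.norm(T, 2)))          # equals the operator (spectral) norm of T
print(np.isclose(np.linalg.norm(T @ V[:, -1]), spec))  # the supremum is attained at an eigenvector
```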

6.9 Problems
6.9.1 Consider a set of vectors u1 , · · · , uk ∈ E, where E is an arbitrary vector space
with an inner product h•, •i. Prove that the ui are LD if and only if the Gram matrix
gij = hui , uj i is degenerate, i.e. has determinant 0. More precisely, prove that the rank of
the Gram matrix of the ui coincides with dimhu1 , · · · , uk i.

6.9.2 Given f1 , . . . , fr ∈ CC [a, b] complex continuous functions on [a, b], prove that
dimhf1 , · · · , fr i = rk G(fi ) .

6.9.3 (OIMU 2013, P2) Let V be an infinite-dimensional real vector space, and let
u ∈ V . Let S ⊂ V be an infinite-dimensional vector subspace. Calculate dim(S⊕hui)∩S ⊥ .

Solution for 6.9.3: Let F = (S ⊕ hui) ∩ S ⊥ . We shall prove that the desired dim F is 0
or 1.
If u ∈ S ⊥ , the answer is clear: F = S ⊥ ∩ (S ⊕ hui) = hui has dimension 1. Otherwise, let
w ∈ S be such that w • u = 1. We shall assume after rescaling u that |w| = 1. We have
s = s − (s, u)w + (s, u)w, which incarnates the direct sum decomposition S = ker ω ⊕ hwi,
where ω : S → R is the linear functional given by ω(s) = (u, s).
Note that there is no loss of generality in assuming that E = S ⊕ hui. Consider a nonzero
element σ + au ∈ S ⊥ . This means that, for every s ∈ S,

(s, σ + au) = 0, i.e. (s, σ) + a(s, u) = 0 ∀s ∈ S.

Taking s = w ∈ S, a = −(w, σ) is determined. Thus, our element is of the form σ−(w, σ)u
and satisfies
0 = (σ − (w, σ)u, s) = (σ, s) − (s, u)(w, σ), ∀s ∈ S.
Rearranging the above yields, for all s ∈ S:

(s − (s, u)w, σ) = 0, ∀s ∈ S.

Note that T (s) = s − (s, u)w has image Im T = u⊥ ∩ S, and that ker T = hwi. T is a
projection, albeit not of the orthogonal kind, for T 2 = T and w need not be orthogonal
to u⊥ . The condition on σ becomes: σ ∈ S is orthogonal to Su = u⊥ ∩ S.
Thus, we have the following. Let H = Im T ⊂ S = u⊥ ∩ S. If there is an orthogonal

decomposition H ⊕ hu0 i = S, then H may be described as u⊥ 0 for u0 ∈ S, and σ ∈ hu0 i
describes a 1-dimensional subspace. If such decomposition does not exist, then σ may
only be 0, and F = 0.
If E is a Hilbert space and S ⊂ E is a closed subspace, then such u0 always exists [25,
Th. V.1.6, Cor. V.1.8]. Otherwise, such decomposition may not exist.

6.9.4 Let E be an n-dimensional euclidean or hermitian space, and let v1 , · · · , vr , w1 , · · · , wr


be two sets of elements of E, where the vi are linearly independent. Prove that

det((vi , wj )) 6= 0 ⇔ hv1 , · · · , vr i⊥ ⊕ hw1 , · · · , wr i = E.

6.9.5 Given an euclidean vector space, prove that given u, v, w vectors one has

ku + vk + kv + wk + ku + wk ≤ kuk + kvk + kwk + ku + v + wk.

6.9.6 Let f : E → F be a linear map between hermitian spaces (of finite dimension),
and let f ∗ be its adjoint. Show that kf k = kf ∗ k.

6.9.7 Let m > n, and let A be an m × n-matrix of rank n. Consider the linear variety
L, defined by AX = I, of Mn×m (R). Find its element of minimal norm.

6.9.8 Let A ∈ Mn (C) be a normal matrix, i.e. AA∗ = A∗ A. Prove the spectral theorem
for A by reducing that for self-adjoint matrices.

6.9.9 (80 WLP 2019, B-3) Let u ∈ Rn be a unit vector, |u| = 1, and let
P = I −2uuT . Let Q be an n×n real orthogonal matrix with no eigenvectors of eigenvalue
1. Show that the matrix P Q has 1 as an eigenvalue.

6.9.10 Let $u_1, \dots, u_n \in \mathbb{R}^n$ be vectors such that $\sum \|u_i\|^2 < 1$. Prove that the vectors $e_i + u_i$, for $i = 1, \dots, n$ form a basis of $\mathbb{R}^n$.

Hint for 6.9.10: Write a matrix I + X, and prove that X is small enough so I + X is
invertible.

6.9.11 Let A ∈ Mn (C). Prove that A = A∗ hermitian if and only if AA∗ = A2 .

6.9.12 Let A be a real symmetric matrix. Prove that r = rk A is the largest number
for which there is a nonzero principal minor of order r, AII 6= 0, where I ⊂ {1, 2, . . . , n},
|I| = r.

6.9.13 Given a positive semidefinite hermitian matrix A ∈ Mn (C), prove that there
exists a unique hermitian square root B = B ∗ of A that is also positive semidefinite.

6.9.14 (IMC 2021, P5) Let A ∈ Mn (R) be such that, for all m ∈ N, there exists a
symmetric B ∈ Mn (R) such that

2021B = Am + B 2 .

Prove that | det A| ≤ 1.

6.9.15 Let A ∈ GLn (C). Show that there are unique matrices H positive definite her-
mitian and U unitary such that A = HU (this is called polar decomposition). If
A ∈ Mn (C) is not necessarily invertible, prove the existence of such decomposition. What
about uniqueness then?

6.9.16 [?, Th. 2.1.4, Cors. 2.1.5 and 2.1.6] Let u1 , . . . , un be linearly independent
vectors in an hermitian or euclidean vector space. Prove that the Gram-Schmidt process
may be obtained by considering the vectors
\[
\varphi_k = \begin{vmatrix}
(u_1, u_1) & (u_1, u_2) & \dots & (u_1, u_{k-1}) & u_1 \\
(u_2, u_1) & (u_2, u_2) & \dots & (u_2, u_{k-1}) & u_2 \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
(u_k, u_1) & (u_k, u_2) & \dots & (u_k, u_{k-1}) & u_k
\end{vmatrix},
\]
and by taking $\phi_k = \frac{1}{\sqrt{\Delta_k \Delta_{k-1}}}\,\varphi_k$, where $\Delta_k$ is the Gram determinant of $u_1, \dots, u_k$, $\Delta_k = \det((u_i, u_j))_{1\le i,j\le k}$ and $\Delta_0 = 1$.

6.9.17 Define
\[
\mathbb{H} = \Bigl\{ \begin{pmatrix} \alpha & -\overline{\beta} \\ \beta & \overline{\alpha} \end{pmatrix} : \alpha, \beta \in \mathbb{C} \Bigr\} \subset M_2(\mathbb{C}).
\]

(a) Prove that H is an R-algebra (with the matrix product).

(b) Prove that, if P ∈ SU (2), the map X 7→ P XP −1 induces an automorphism of H.

(c) Prove that if P ∈ SL2 (C),

P HP −1 ⊂ H ⇔ P ∈ SU (2).

6.9.18 (Berkeley problems in Maths, 7.5.32) Let A, B be self-adjoint operators in


Mn (C) such that the eigenvalues of A lie in [a, a0 ] and those of B lie in [b, b0 ]. Prove that
the eigenvalues of A + B lie in [a + b, a0 + b0 ].

6.9.19 Compute the dimensions of the vector subspaces of self-adjoint and anti-self-
adjoint endomorphisms of Mn (R) and Mn (C), respectively. Prove that there is a direct
sum decomposition in both cases.

6.9.20 Prove the spectral theorem for normal endomorphisms using the case for hermi-
tian endomorphisms.

6.9.21 Prove the spectral theorem for normal endomorphisms using polar decomposition.

6.9.22 (Schur factorization) Let A ∈ Mn (C). Prove that there is an orthonormal


basis in which A is upper triangular, namely, there are an upper triangular matrix T and
a unitary matrix P such that A = P T P −1 . Use this fact to prove the spectral theorem for
normal endomorphisms.

6.9.23 (AMM 97 no 1(1989), Bjorn Poonen) Let B ∈ Mn (C), and let k ∈ N.


Prove that there exists a unique matrix A ∈ Mn (C) such that

A(A∗ A)k = B.

6.9.24 (SEEMOUS 2023, P3) Let $A \in M_n(\mathbb{C})$ be such that

(6.3) A + A∗ = A2 A∗ .

Show that A is hermitian.

6.9.25 Show that SU (2)/ ± I ∼ = SO(3). Prove first that every matrix A ∈ SO(3) may
be written as
 2 
a + b2 − c 2 − d 2 2(bc − ad) 2(ac + bd)
ρ(a, b, c, d) =  2(ad + bc) a2 − b 2 + c 2 − d 2 2(cd − ab)  ,
2(bd − ac) 2(ab + cd) a2 − b2 − c2 + d2

where a2 + b2 + c2 + d2 = 1.

6.9.26 (CIIM 2022) Let A ∈ M2 (R). Let v be a unit vector, and assume the following
conditions:
(i) The vectors Av, A2 v, A3 v are also unit vectors;
(ii) one has A2 v 6= ±v, A2 v 6= ±Av.
Prove that AT A = I.

Solution for 6.9.26: Let B(x, y) = (Ax, Ay) − (x, y). The symmetric bilinear form B
is zero if and only if B(ui , uj ) = 0 for (ui ) a basis of R2 . If A = λI, one has λ = ±1 by
(i) (which violates (ii)), so $m_A(x) = p_A(x)$. If $\tau = \operatorname{tr} A$, $\delta = \det A$, one has $A^2 = \tau A - \delta I$
(Cayley-Hamilton), and we shall assume for now that τ δ 6= 0. Consider the basis u1 =
v, u2 = Av.
In the case where τ δ 6= 0, note that

B(v, v) = 0 = B(Av, Av) = B(A2 v, A2 v) by (i).

From the above and using A2 v = −δv + τ Av, the equality B(A2 v, A2 v) = 0 becomes

0 = τ 2 B(Av, Av) + δ 2 B(v, v) + 2δτ B(v, Av),

hence B(v, Av) = 0. In other words, B(ui , uj ) = 0 for all 1 ≤ i, j ≤ 2, so indeed B ≡ 0,


and therefore AT A = I.
Let us rule out the remaining cases. If τ = tr A = 0, then A2 = −δI, so A2 v = −δv, and
by (i) one has δ = ±1, which contradicts (ii). Analogously, δ = det A = 0 and (i) would
violate (ii). 

‘Second’ solution: A binary quadratic form is a quadratic form on a vector space of


dimension 2. Thus, one has Q(x, y) = αx2 + 2βxy + γy 2 .
If Q(x, y) 6= 0, then Q is irreducible over R (and its set of zeros reduces to (x, y) = (0, 0)),
or Q splits over R, namely

(6.4) Q(x, y) = (ax + by)(a0 x + b0 y).

If Q(u) = (Au, Au) − (u, u), then we have Q(v) = 0, Q(Av) = 0, where v, Av are linearly
independent, but this points to the formula (6.4). Since $Q(A^2v) = 0$, this points to more zeros than those listed in (6.4), if $A^2v$ is not parallel to v or Av. This turns out to be the case: $A^2v$ is a unit vector by (i), so being parallel to v or Av would force $A^2v = \pm v$ or $A^2v = \pm Av$, which (ii) forbids. This forces Q to be identically zero.

6.10 Solutions for exercises within the sections


Solution for 6.4.1: Conjugating the definition of adjoint we get (f ∗ )∗ = f.

Solution for 6.7.1: We know that ker A = ker A∗ A, Im A = Im AA∗ . Thus, AA∗ = A∗ A
means ker A∗ = ker AA∗ = ker A∗ A = ker A, and hence also that Im A = (ker A∗ )⊥ =
(ker A)⊥ (use Proposition 6.4.3).

Proof of the spectral theorem for normal endomorphisms 6.7.2: One has

ker A ⊕ Im A = Cn ,

and so Im A = Im A², i.e. ker A = ker Aᵏ for every k ∈ N. The multiplicity of the factor
x in the minimal polynomial mA(x) is therefore 1. Since Im A = (ker A)⊥ (which in turn
equals Im A∗), one could proceed by induction, but we shall do the following.
Apply 6.7.1 to A − λI, where λ ≠ 0 is an eigenvalue; its adjoint is (A − λI)∗ = A∗ − λ̄I, and
one obtains the orthogonal decomposition

    Cⁿ = ker A ⊕ ker(A − λI) ⊕ (Im A ∩ Im(A − λI)),

and iterating yields the desired orthogonal decomposition.

Chapter 7

Quadratic forms

In the whole chapter, we shall conform ourselves to the real case, i.e. K = R. Over C,
the results are simpler. The term form stands for an homogeneous polynomial: a linear
form is a homogeneous polynomial of degree 1, such as 2x + 4y − 5z. A quadratic form
shall be of degree two, for instance x2 − 2yz + 2z 2 − 2xy.

7.1 Introduction
A form is a homogeneous polynomial function. A linear form (in n variables) is a
polynomial of the form a1x1 + . . . + anxn, where the ai are constants. A quadratic form is
a homogeneous polynomial of degree 2, hence of the form Σ_{1≤i≤j≤n} cij xi xj.
Note that there are n(n + 1)/2 monomials xi xj (1 ≤ i ≤ j ≤ n) of degree 2 in n variables, so
the vector space of quadratic forms has dimension n(n + 1)/2. The connection with the
vector space of symmetric matrices of order n is more than a coincidence in dimension.

Proposition 7.1.1 Let x = (x1, . . . , xn). For every quadratic form Q(x) = Σ_{1≤i≤j≤n} cij xi xj
there is a unique symmetric matrix A such that Q(x) = xᵀAx.

Proof: Note that

    xᵀAx = (x1 . . . xn) (aij) (x1, . . . , xn)ᵀ = Σ_{1≤i,j≤n} aij xi xj,

so xᵀAx = Σ_i aii xi² + Σ_{i<j} (aij + aji) xi xj. Since aij = aji, we have
xᵀAx = Σ_{1≤i≤j≤n} cij xi xj if and only if cii = aii and cij = 2aij for i < j.

Changing coordinates: We shall henceforth write Q(x) = Q_A(x) = xᵀAx, where A
is symmetric. Note that, in this case, if a change of variables x = P y is effected, then
Q(x) = Q(P y) = (P y)ᵀA(P y) = yᵀ(PᵀAP)y = Q_{PᵀAP}(y). Note the difference with base
change in endomorphisms (which gives rise to conjugation, i.e. M ↦ P⁻¹MP).
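As a small computational aside (ours, not part of the text), the helper below is an illustrative sketch of Proposition 7.1.1 and of the transformation rule A ↦ PᵀAP; all names and the example data are made up.

```python
import numpy as np

def sym_matrix_from_coeffs(coeffs, n):
    """Symmetric matrix A of Q(x) = sum_{i<=j} c_ij x_i x_j (0-based indices),
    following Proposition 7.1.1: a_ii = c_ii and a_ij = a_ji = c_ij / 2."""
    A = np.zeros((n, n))
    for (i, j), cij in coeffs.items():
        if i == j:
            A[i, i] = cij
        else:
            A[i, j] = A[j, i] = cij / 2.0
    return A

# Q(x, y, z) = x^2 - 2xy - 2yz + 2z^2  (the example from the chapter introduction)
coeffs = {(0, 0): 1.0, (0, 1): -2.0, (1, 2): -2.0, (2, 2): 2.0}
A = sym_matrix_from_coeffs(coeffs, 3)
Q = lambda v: v @ A @ v

x = np.array([1.0, 2.0, -1.0])
print(Q(x))                                   # evaluate the form at a point

# change of variables x = P y: in the y-coordinates the matrix of Q is P^T A P
P = np.array([[1.0, 1.0, 0.0], [0.0, 1.0, 1.0], [0.0, 0.0, 1.0]])
y = np.linalg.solve(P, x)                     # the y with x = P y
print(np.isclose(Q(x), y @ (P.T @ A @ P) @ y))
```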

Remark 7.1.2 In the above discussion, there is one case in which base change acts in the
same way on endomorphisms and on quadratic forms: that is when P is orthogonal, i.e.
Pᵀ = P⁻¹.

7.1.3 Prove that there is a linear (homogeneous) change of variables that takes the
quadratic form xy to x² − y². (Hint: both are products of two linear forms.)

We have two problems in the case of quadratic forms: finding invariants (under linear
change of coordinates), and methods to decide whether two quadratic forms are equivalent:
namely, given two symmetric n × n real matrices A, B, how may we decide whether
B = PᵀAP for some P ∈ GLn(R)? (Recall that GLn is the general linear group, which
consists of all invertible n × n matrices.)

7.2 The simplest invariants. Rank


Remark 7.2.1 There are two invariants that are very clear. One is the rank, namely: if
P is invertible, then rk P T AP = rk A. On the other hand, if A is invertible, then

det P T AP = det A(det P )2 ,

so the sign of det A is an invariant, if A is invertible.

Definition Let Q(x) = xT Ax be a quadratic form, and let A be its associated symmetric
matrix. We say that Q, or A, is non-degenerate if det A ≠ 0. (By Remark 7.2.1, this
condition is invariant under linear change of coordinates).

Metric invariants Let A = Aᵀ be real. There is an orthogonal matrix P such that
P⁻¹AP = PᵀAP = D is diagonal. D is unique up to changing the order of its diagonal,
of course. Since P is an isometry, i.e. it preserves distances, the diagonal D is a
metric invariant, which means that it is preserved under isometries.

In other words:
Theorem 7.2.2 The quadratic forms associated to symmetric matrices A, B ∈ Symn (R)
are metrically equivalent (i.e. there is an orthogonal matrix P such that P T AP = B) if
and only if

det(A − xI) = det(B − xI) (their characteristic polynomials agree).

Example 7.2.3 The difference between metric equivalence and general equivalence of two
quadratic forms is illustrated in the following example. Let Q(x, y) = x² + y², and let
Q′(x, y) = (x/a)² + (y/b)², where 0 < a < b. The change of variables x = au, y = bv
transforms Q′ into Q. However, if we take the level curves, say Q(x, y) = 1 and Q′(x, y) =
1, it is clear that one is a circle and the other one is not, although indeed Q′(au, bv) =
u² + v².

Example 7.2.4 Clearly, if D, D′ are two diagonal matrices associated to two quadratic
forms, of the same rank r and with the same number of positive eigenvalues, we have the
following. Assume that in both cases the positive eigenvalues are in rows 1 to s:

    Q(x) = δ1 x1² + . . . + δs xs² + Σ_{i=s+1}^r δi xi²,      Q′(y) = δ1′ y1² + . . . + δs′ ys² + Σ_{i=s+1}^r δi′ yi².

If εi = +√(δi/δi′) for i ≤ r and εi = 1 for i > r, then setting yi = εi xi (which we write as
y = P x, where P is diagonal with entries εi) gives Q′(y) = Q′(P x) = Q(x).

Definition Let Q(x) = xᵀAx be a quadratic form, A = Aᵀ. The signature of Q is the
number of positive eigenvalues (λi > 0) of A.

Theorem 7.2.5 (Sylvester’s Law of Inertia) Rank and signature are the only invariants
of a quadratic form under general linear changes of coordinates. In other words, two
quadratic forms may be transformed into one another by means of a linear change of
coordinates if and only if their ranks and signatures are equal.

A proof of this theorem will be provided later.

7.3 Completing squares: Gauss’s method


Consider the quadratic form in two variables Q(x, y) = Ax² + 2Bxy + Cy². Completing
the square diagonalizes Q: when A ≠ 0, Q(x, y) = A(x + (B/A)y)² + (C − B²/A)y². Gauss’s
procedure does this for quadratic forms in n variables.

Theorem 7.3.1 A quadratic form Q(x) over K is diagonalizable by a linear change of
variables (K = Q, R, C).

Proof: We shall proceed by induction on n. Assume that aii ≠ 0 for some i. For simplicity,
we shall assume that a11 ≠ 0 (else one may effect a permutation of the variables).
The terms involving x1 are a11 x1² + 2 Σ_{i=2}^n a1i x1 xi, which we may write as

    a11 (x1² + 2 Σ_{i=2}^n (a1i/a11) x1 xi) = a11 [ (x1 + Σ_{i=2}^n (a1i/a11) xi)² − (Σ_{i=2}^n (a1i/a11) xi)² ].

Thus, setting z1 = x1 + Σ_{i=2}^n (a1i/a11) xi,

    Q(x1, . . . , xn) = a11 [ z1² − (Σ_{i=2}^n (a1i/a11) xi)² ] + q1(x2, . . . , xn),

where q1 collects the terms of Q not involving x1. In other words, our quadratic form in the
variables z1, x2, . . . , xn is of the form a11 z1² + q2(x2, . . . , xn), where

    q2(x2, . . . , xn) = −a11 (Σ_{i=2}^n (a1i/a11) xi)² + q1(x2, . . . , xn).

Note that (z1, x2, x3, . . . , xn) is obtained from x by multiplying by an elementary matrix
– one may wish to define zi = xi for i > 1. The proof by induction is concluded in this case.
If all diagonal elements of the associated symmetric matrix are zero, either A = 0 (done),
or there is a term aij ≠ 0 for some i ≠ j. Let us suppose for simplicity that a12 ≠ 0.
The term a12 x1 x2 may be written as a difference of squares, x1 x2 = ¼ [y1² − y2²], where
y1 = x1 + x2, y2 = x1 − x2, and the resulting quadratic form has a′11, a′22 ≠ 0, so the above
induction step works.
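The induction above translates directly into an algorithm on the symmetric matrix: clear the first row and column with the pivot a11 and recurse. The sketch below is our own illustration and only treats the generic case, where a nonzero diagonal pivot can always be found by permuting variables (it skips the x1x2 difference-of-squares step of the last paragraph).

```python
import numpy as np

def gauss_congruence(A, tol=1e-12):
    """Return (P, D) with P invertible and P^T A P = D diagonal, for symmetric A.
    Generic case only: we assume a nonzero diagonal pivot can always be found by
    a permutation of variables (the x1*x2 difference-of-squares step is omitted)."""
    A = np.array(A, dtype=float)
    n = A.shape[0]
    P = np.eye(n)
    for k in range(n):
        if abs(A[k, k]) < tol:
            for j in range(k + 1, n):            # look for a usable pivot further down
                if abs(A[j, j]) > tol:
                    E = np.eye(n)
                    E[:, [k, j]] = E[:, [j, k]]  # swap variables k and j
                    A, P = E.T @ A @ E, P @ E
                    break
            else:
                continue                         # non-generic case: skip this pivot
        E = np.eye(n)
        E[k, k + 1:] = -A[k, k + 1:] / A[k, k]   # clear row/column k past the pivot
        A, P = E.T @ A @ E, P @ E
    return P, A

A = np.array([[1., -1., 0.], [-1., 0., -1.], [0., -1., 2.]])  # x^2 - 2xy - 2yz + 2z^2
P, D = gauss_congruence(A)
print(np.round(D, 10))                    # diagonal matrix
print(np.allclose(P.T @ A @ P, D))        # congruence check
```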
Corollary 7.3.2 A quadratic form Q of rank r is equivalent to one of the form Σ_{i=1}^r ±xi².

Proof: Indeed, by Theorem 7.3.1, we may assume that Q = Σ_{i=1}^r di xi², where di ≠ 0. One
may write di = μi² εi, where εi = ±1, and writing zi = μi xi yields Σ_{i=1}^r εi zi² = Q(z1/μ1, . . .)
as desired.

Proof of Theorem 7.2.5: Let A, B be two invertible matrices such that Q(Ay) = Σ_{i=1}^r ±yi²
and Q(By) = Σ_{i=1}^r ±yi². One has Q(x) = Σ_{i=1}^p ωi(x)² − Σ_{i=1}^q ξi(x)² and
Q(x) = Σ_{i=1}^{p′} αi(x)² − Σ_{i=1}^{q′} βi(x)², where p + q = r = p′ + q′ and α, β, ω, ξ are
linear forms. Assume that p ≠ p′, q ≠ q′ (we assume that p < p′). This means that q′ < q.
One may write the above as follows:

    Σ_{i=1}^p ωi(x)² − Σ_{i=1}^q ξi(x)² = Σ_{i=1}^{p′} αi(x)² − Σ_{i=1}^{q′} βi(x)²;

in other words,

    Σ_{i=1}^p ωi(x)² + Σ_{i=1}^{q′} βi(x)² = Σ_{i=1}^{p′} αi(x)² + Σ_{i=1}^q ξi(x)² = (⋆).

Consider the locus L = ⋂_{i≤p} ker ωi ∩ ⋂_{j≤q′} ker βj, cut out by the p + q′ equations
ωi(x) = 0 (i ≤ p), βj(x) = 0 (j ≤ q′); its dimension satisfies dim L ≥ n − (p + q′) > n − r,
since p + q′ < p′ + q′ = r. On L the left-hand side of (⋆) vanishes, so (⋆) = 0 there; looking
at the right-hand side, this forces αk(x) = 0, ξℓ(x) = 0 for k ≤ p′, ℓ ≤ q and every x ∈ L.
Certainly, this shows that L is contained within the set M given by αk(x) = 0 (k ≤ p′),
βj(x) = 0 (j ≤ q′), which is known to be of dimension dim M = n − r (indeed, the αk, βj are
r linearly independent linear forms, coming from the invertible change of coordinates B).
This leads to a contradiction, since dim L > dim M.

Remark We have finally shown that both rank and signature characterise a quadratic
form over R up to equivalence. Over fields such as Q, the issue is much more complicated,
see e.g. J.P. Serre, A course in arithmetic for an introductory text to the subject, which
is still active nowadays.

The following is a useful computational tool.

Theorem 7.3.3 Let A be the real symmetric matrix associated with a real quadratic form
Q(x) = xᵀAx. Assume that the angular (or leading) minors of A,

    ∆k = det (aij)_{1≤i,j≤k},    k = 1, . . . , n,

are all nonzero. The signs of the numbers

    ∆1,  ∆2/∆1,  . . . ,  ∆k/∆_{k−1},  . . . ,  ∆n/∆_{n−1}

coincide, counted with multiplicity, with the signs of the eigenvalues of A.
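A quick numerical sanity check of Theorem 7.3.3 (our own illustration; the random matrix is generic, so its leading minors are nonzero):

```python
import numpy as np

def positive_minor_ratios(A, tol=1e-12):
    """Number of positive values among Delta_1, Delta_2/Delta_1, ..., Delta_n/Delta_{n-1}."""
    n = A.shape[0]
    minors = [np.linalg.det(A[:k, :k]) for k in range(1, n + 1)]
    assert all(abs(m) > tol for m in minors), "a leading minor vanishes"
    ratios = [minors[0]] + [minors[k] / minors[k - 1] for k in range(1, n)]
    return sum(r > 0 for r in ratios)

rng = np.random.default_rng(1)
M = rng.standard_normal((5, 5))
A = (M + M.T) / 2                                   # a generic symmetric matrix

print(positive_minor_ratios(A) == int(np.sum(np.linalg.eigvalsh(A) > 0)))
```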

7.3.1 A constructive proof of Theorem 7.3.3


The idea is based on an explicit presentation of the Gram-Schmidt process [?].

A basis of Gram-Schmidt type: Let (E, B) be a quadratic space, where B is a symmetric
bilinear form and E is a real vector space of dimension n. Let u1, . . . , ur be linearly
independent vectors. Assume that the Gram-type determinants

    ∆1 = B(u1, u1),    ∆k = det (B(ui, uj))_{1≤i,j≤k},    k ≤ r,

are all nonzero, ∆i ≠ 0 for i = 1, . . . , r.


Define the following vectors:

    vk = | B(u1, u1)   . . .   B(u1, u_{k−1})   u1 |
         | B(u2, u1)   . . .   B(u2, u_{k−1})   u2 |
         |    ...                  ...          .. |
         | B(uk, u1)   . . .   B(uk, u_{k−1})   uk | ,    where 1 ≤ k ≤ r.

Note that B(ui, vk) = 0 for i < k. Indeed,

    B(ui, vk) = | B(u1, u1)   . . .   B(u1, u_{k−1})   B(u1, ui) |
                | B(u2, u1)   . . .   B(u2, u_{k−1})   B(u2, ui) |
                |    ...                  ...             ...    |
                | B(uk, u1)   . . .   B(uk, u_{k−1})   B(uk, ui) |  =  0,

since it has two equal columns. Likewise, we see that B(uk , vk ) = ∆k , and since vk =
∆k−1 uk + (a linear combination of u1 , . . . , uk−1 ) we have

B(v1 , v1 ) = ∆1 , B(vk , vk ) = ∆k−1 ∆k for k ≥ 2,

which yields an orthogonal basis that diagonalises B. The signature of B reads plainly
from this presentation, and coincides with the description given. 
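The determinantal formula for the vk is easy to test numerically. The sketch below (ours; the helper names are made up) expands the formal determinant along its last column — whose entries are the vectors u1, . . . , uk — and checks the identities B(vi, vj) = 0 for i ≠ j and B(vk, vk) = ∆_{k−1}∆k.

```python
import numpy as np

def gram_type_basis(Bmat, U):
    """Vectors v_1, ..., v_r of the determinantal formula, for the bilinear form
    B(x, y) = x^T Bmat y and linearly independent vectors U = [u_1, ..., u_r]."""
    r = len(U)
    G = np.array([[u @ Bmat @ w for w in U] for u in U])   # G[i, j] = B(u_{i+1}, u_{j+1})
    V = []
    for k in range(1, r + 1):
        vk = np.zeros_like(U[0], dtype=float)
        for i in range(k):                                  # expand along the last column
            minor = np.delete(G[:k, :k - 1], i, axis=0)
            cof = (-1) ** (i + k + 1) * (np.linalg.det(minor) if k > 1 else 1.0)
            vk = vk + cof * U[i]
        V.append(vk)
    return V, G

rng = np.random.default_rng(2)
n = 4
M = rng.standard_normal((n, n))
Bmat = (M + M.T) / 2                        # a generic symmetric bilinear form
U = list(np.eye(n))                         # u_i = e_i, so the Delta_k are leading minors

V, G = gram_type_basis(Bmat, U)
Delta = [1.0] + [np.linalg.det(G[:k, :k]) for k in range(1, n + 1)]    # Delta_0 = 1
B = lambda x, y: x @ Bmat @ y

print(all(abs(B(V[i], V[j])) < 1e-9 for i in range(n) for j in range(n) if i != j))
print(all(np.isclose(B(V[k], V[k]), Delta[k] * Delta[k + 1]) for k in range(n)))
```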

Note that the Gram-Schmidt process, Sylvester’s criterion to characterise inner products
and Theorem 7.3.3 follow from the above argument. A stronger version of Theorem 7.3.3
is as follows.
Theorem 7.3.4 Let B(x, y) = xᵀAy be a symmetric bilinear form on Rⁿ, where A = Aᵀ is of
rank r and has leading minors ∆1, . . . , ∆r ≠ 0. There is a basis v1, . . . , vn of Rⁿ which
diagonalises B and in which the matrix of B is ( D 0 ; 0 0 ) in block form, where D is r × r
and has the shape described in Theorem 7.3.3.
Proof: Indeed, since ∆r ≠ 0, we write A = ( A′ A″ ; B′ B″ ) in block form, where A′ is the
invertible leading r × r block. The equations for ker A have the following shape:

    x = ( y ; z ),    A′y + A″z = 0,  i.e.  y = −(A′)⁻¹A″z = Cz,  where C = −(A′)⁻¹A″.

Thus, a basis v_{r+1}, . . . , vn of ker A = Im ( C ; I_{n−r} ) (a column of blocks), together with
v1, . . . , vr, shall form the desired basis of Rⁿ.

7.4 Quadrics
One may consider, given a quadratic form, the level hypersurfaces given by Q(x) = c,
c ∈ R. The study of their shape may be undertaken by using the tools provided here.
Consider the example x² + y² + z² − r² = 0. If we translate this surface by the vector
(a, b, c), the resulting equation is

    (x − a)² + (y − b)² + (z − c)² − r² = 0,  i.e.  x² + y² + z² − 2ax − 2by − 2cz + a² + b² + c² − r² = 0.

The linear part of the equation inevitably appears.

Definition A quadric hypersurface in Rⁿ is the locus given by a quadratic equation in
the coordinates x1, . . . , xn:

    Σ_{1≤i≤j≤n} αij xi xj + Σ_{k=1}^n bk xk + c = 0.

The equation may be expressed in matrix form as follows: let A = (aij) be the symmetric
matrix associated with the degree-2 part of the above (aii = αii, aij = αij/2 = aji if
i < j), and let b = (bk) ∈ Rⁿ be a column vector. In block matrix form, the quadric may
be written as

    (xᵀ 1) ⎛ A     b/2 ⎞ ⎛ x ⎞  =  0.
           ⎝ bᵀ/2  c   ⎠ ⎝ 1 ⎠

We shall often consider two quadrics equal whenever their equations differ by a constant
factor, rather than identifying them by their zero loci; the distinction matters whenever the
underlying locus is empty or lower-dimensional: for instance, x² + 2y² = 0 and x² + y² = 0
have the same zero locus but are different quadrics in R² and in R³.

Example 7.4.1 (Conics) Quadrics on the plane are called conics, a name provided by
Apollonius. The ellipse (x/a)2 + (y/b)2 − 1 = 0 for a, b > 0 is but an example.

7.4.1 Affine and metric classification of quadrics


For a full classification (both metric and affine) of quadrics in every dimension, see [34].
By accepting non-homogeneous quadratic equations, we also incorporated translations into
the admissible changes of variables. Thus, the more general affine change of coordinates
x = P z + q is allowed.
Note that this may still be written as

    ⎛ x ⎞   ⎛ P  q ⎞ ⎛ z ⎞
    ⎝ 1 ⎠ = ⎝ 0  1 ⎠ ⎝ 1 ⎠ ,

and if Q(x) is the quadratic polynomial whose zero locus defines our quadric, the change
of variables Q(P z + q) looks as follows in matrix form:

    Q(P z + q) = (zᵀ 1) ⎛ Pᵀ  0 ⎞ ⎛ A     b/2 ⎞ ⎛ P  q ⎞ ⎛ z ⎞
                        ⎝ qᵀ  1 ⎠ ⎝ bᵀ/2  c   ⎠ ⎝ 0  1 ⎠ ⎝ 1 ⎠ .

We say that two quadrics are equivalent if Q1 (x) is a constant multiple of Q2 after
effecting a suitable affine change of coordinates x = P z + q.
The affine classification of quadrics consists of two aspects: Firstly, finding a particu-
larly simple form that is equivalent to a given quadric, and presenting a full list of those
types up to equivalence.
Metric classification is similar, except that we consider only those transformations
x = P z + q where the linear part P is orthogonal, P T P = I. Thus, the distances are
preserved: for x0 = P z 0 + q, x00 = P z 00 + q, we have kx0 − x00 k = kP (z 0 − z 00 )k = kz 0 − z 00 k.
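The bordered-matrix bookkeeping is convenient in computations. Here is a small sketch (ours, with made-up data) that applies an affine change x = Pz + q to the bordered matrix of a quadric and checks the result against direct substitution.

```python
import numpy as np

def bordered(A, b, c):
    """(n+1) x (n+1) bordered matrix of the quadric x^T A x + b^T x + c = 0."""
    n = A.shape[0]
    M = np.zeros((n + 1, n + 1))
    M[:n, :n], M[:n, n], M[n, :n], M[n, n] = A, b / 2.0, b / 2.0, c
    return M

def affine_block(P, q):
    """Block matrix sending (z, 1) to (P z + q, 1)."""
    n = P.shape[0]
    S = np.zeros((n + 1, n + 1))
    S[:n, :n], S[:n, n], S[n, n] = P, q, 1.0
    return S

A = np.array([[2.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, -2.0])
c = -4.0
M = bordered(A, b, c)

P = np.array([[0.0, 1.0], [1.0, 1.0]])    # invertible linear part
q = np.array([3.0, -1.0])
S = affine_block(P, q)

z = np.array([0.7, -1.3])
x = P @ z + q
xe, ze = np.append(x, 1.0), np.append(z, 1.0)
print(np.isclose(xe @ M @ xe, ze @ (S.T @ M @ S) @ ze))   # congruence by the block matrix
print(np.isclose(xe @ M @ xe, x @ A @ x + b @ x + c))     # matches direct substitution
```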

Proposition 7.4.2 Let A, b be as above. Assume that b ∈ Im A. The quadric Q = 0 is
metrically equivalent to one Q′ = 0 without linear part.

Proof: One sees that (P y + q)ᵀA(P y + q) + bᵀ(P y + q) + c = yᵀPᵀAP y + (2qᵀA + bᵀ)P y +
qᵀAq + bᵀq + c. It suffices to impose 2Aq + b = 0; since b ∈ Im A, such a q exists
(q = −½A⁻¹b when A is invertible). Taking, say, P = I, we get the desired rigid motion.
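Concretely, under the same hypothesis b ∈ Im A, the translation can be computed as follows (a small sketch of ours; the numbers are arbitrary).

```python
import numpy as np

A = np.array([[2.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, -2.0])
c = -4.0

q = np.linalg.solve(A, -b / 2.0)    # any solution of A q = -b/2 works when b lies in Im A
c_new = q @ A @ q + b @ q + c       # constant term after the translation x = y + q

y = np.array([0.3, 0.9])            # check: Q(y + q) has no linear part
print(np.isclose((y + q) @ A @ (y + q) + b @ (y + q) + c, y @ A @ y + c_new))
```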

Example 7.4.3 Consider the general real conic given by the equation

    Q(x) = (xᵀ 1) ⎛ A     b/2 ⎞ ⎛ x ⎞  =  0.
                  ⎝ bᵀ/2  c   ⎠ ⎝ 1 ⎠

Here A ≠ 0 is real symmetric of order 2 (if A = 0 the equation would be linear). The rank
of A may therefore be 1 or 2. If rk A = 2, then b ∈ Im A and the equation is metrically
equivalent to one of the form x1²/a² ± x2²/b² − 1 = 0 if the real locus of zeros is nonempty,
the sign being + if A is (positive or negative) definite (which yields an ellipse if definite,
a hyperbola if indefinite). In the case where rk A = 1, choose the sign so that A is positive
semidefinite: A is metrically equivalent to the matrix ( 1 0 ; 0 0 ), and subtracting from b
a suitable element of the image of A (see the proof of Proposition 7.4.2) yields, after
rescaling, an equation of the form

    x1² + αx2 + β = 0,

which, when α ≠ 0, after writing x2 = ±x2′ + γ becomes x1² + αx2′ = 0, a parabola; when
α = 0, the equation x1² + β = 0 describes a pair of (possibly coincident or imaginary)
parallel lines.

7.5 Quadrics in R3 with non-degenerate A


We mean those Q(x) such that det A ≠ 0. After a suitable translation, the linear part
vanishes, and diagonalising A in an orthonormal basis provides a standard form for our
quadric. If Q(0) = 0, then the quadric in diagonal form (up to a constant multiple) looks
like

    (x/a)² ± (y/b)² ± (z/c)² = 0,

where a, b, c > 0.
If Q(0) ≠ 0 (assuming the linear part vanishes), then our Q with a diagonal A looks as
follows:

    ±(x/a)² ± (y/b)² ± (z/c)² + 1 = 0.

7.6 Problems
 
7.6.1 Consider a complex conic Q given by the matrix M = ( A b ; bᵀ 1 ) in block form,
where A ≠ 0. Prove that Q is a pair of lines if and only if det M = 0 (this includes the
case of the double line, i.e. Q(x) = (ax + by + c)² = 0).
7.6.2 Consider two conics which do not share a line, Q1 = 0, Q2 = 0. Prove that
their intersection consists at most of four points. (Hint: Consider the pencil of conics
Q1 + λQ2 = 0, and find the degenerate conics in the pencil).
7.6.3 Let Q be a quadratic form on Rn . Let F be a vector subspace of Rn such that Q|F
is positive definite. Prove that F ⊥ ⊕ F = Rn .

Appendix: Signature and Gauss elimination
In this appendix we provide a proof of Theorem 7.3.3 using the LU decomposition. Note
that this result vastly generalises Sylvester’s characterisation of positive definite quadratic
forms.
The following result may be proven by induction, using block matrices.

Theorem 7.6.4 A square matrix A of order n, all of whose angular (leading) minors are
nonzero, admits a unique decomposition A = LU, where L = (tij) is lower triangular
unipotent (tii = 1, tij = 0 for i < j) and U is upper triangular. Moreover, if A is symmetric,
then U may be written as U = DLᵀ, where D is diagonal.

The proof of the first assertion is carried out in the chapter pertaining to linear systems
(block matrices and induction on n), and the second assertion follows readily from the
uniqueness of L, U.
Back to Theorem 7.3.3, write L = ( L11 0 ; L21 L22 ) in block form (blocks of size k and
n − k, respectively) and do the same with D: D = ( D1 0 ; 0 D2 ), where D1 = diag(λ1, . . . , λk).
Denote by Ak the submatrix of A formed by the first k rows and columns. Block multiplication
yields Ak = L11 D1 L11ᵀ, hence ∆k = (det L11)² det D1 = det D1, for L11 is triangular
unipotent. The Theorem follows.
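The appendix's argument can be run directly as code: an LDLᵀ factorisation without pivoting (valid as long as the leading minors are nonzero) exposes ∆1, ∆2/∆1, . . . , ∆n/∆_{n−1} on the diagonal of D. The sketch below is ours.

```python
import numpy as np

def ldlt_no_pivot(A, tol=1e-12):
    """A = L D L^T for symmetric A with nonzero leading minors: L is lower
    triangular unipotent and the diagonal of D is Delta_1, Delta_2/Delta_1, ..."""
    A = np.array(A, dtype=float)
    n = A.shape[0]
    L = np.eye(n)
    d = np.zeros(n)
    for k in range(n):
        d[k] = A[k, k] - L[k, :k] @ (d[:k] * L[k, :k])
        assert abs(d[k]) > tol, "a leading minor vanishes"
        for i in range(k + 1, n):
            L[i, k] = (A[i, k] - L[i, :k] @ (d[:k] * L[k, :k])) / d[k]
    return L, d

rng = np.random.default_rng(3)
M = rng.standard_normal((5, 5))
A = (M + M.T) / 2

L, d = ldlt_no_pivot(A)
print(np.allclose(L @ np.diag(d) @ L.T, A))                          # A = L D L^T
print(int(np.sum(d > 0)) == int(np.sum(np.linalg.eigvalsh(A) > 0)))  # Theorem 7.3.3
```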

Corollary 7.6.5 (Theorem 7.3.4) Let A be a real symmetric matrix of rank r, such that
its principal minors satisfy ∆i ≠ 0 for 1 ≤ i ≤ r. The quadratic form Q associated with A
is equivalent to the quadratic form Σ_{i=1}^n βi xi², where β1 = ∆1, βi = ∆i/∆_{i−1} for
2 ≤ i ≤ r, and βi = 0 for i > r.

The Corollary follows from applying Theorem 7.3.3 to the restriction Q|⟨e1, . . . , er⟩, together with Theorem 7.3.4.

Bibliography

[1] T. Andreescu, Essential Linear Algebra, Birkhäuser, 2014.

[2] G. E. Andrews, Number Theory, 1971.

[3] M.A. Armstrong, Groups and Symmetry, Springer UTM, 1st Ed, 1998.

[4] M. Artin, Algebra, 1st Ed, Prentice Hall, 1991.

[5] M.F. Atiyah, I.G. Macdonald, Introduction to Commutative Algebra, 1969.

[6] P. Bayer, J. Montes, A. Travesa, Problemes d’Àlgebra, University of Barcelona,


1991.

[7] A. Clark, Elements of abstract algebra, Dover.

[8] R. Courant, D. Hilbert, Methods of mathematical physics, 2 Vols.

[9] J.B. Fraleigh, A First Course in Abstract Algebra, 7th Ed., Pearson, 2002.

[10] F.Brochero, Gugu Moreira et al, Teoria dos números. Um passeio pelos primos, 3rd
Ed, SBM.

[11] E. Casas, Analytic Projective Geometry, EMS, 2014.

[12] M. Castellet, I. Llerena, Álgebra Lineal y Geometrı́a, Ed. Reverte, 1996.

[13] Lucı́a Contreras Caballero, Curso de Álgebra Lineal.

[14] D. Cox, Primes of the Form x² + ny², Wiley.

[15] W. Fulton, J. Harris, Representation theory: a first course, Springer GTM.

[16] M. de Guzmán, Aventuras Matemáticas, Ed. Labor, 1987.

[17] P. Halmos, The Linear Algebra Problem Book,

[18] P. Halmos, Finite dimensional vector spaces

[19] H. Hameka, Quantum Mechanics.

[20] K. Hardy, K.S. Williams, The Green Book of Mathematical Problems, Dover, 1985.

[21] K. Hardy, K.S. Williams, The Red Book of Mathematical Problems, reprinted Dover,
1996.

[22] K. Hoffman, R. Kunze, Linear Algebra, 2nd Ed., Prentice Hall, 1971.

[23] M. Klamkin, USA Mathematical Olympiads 1972-1986, MAA, 1988.

[24] A. I. Kostrikin, Yu. I. Manin, Linear Algebra and Geometry,

[25] S. Lang, Real and Functional Analysis, 3rd Ed, Springer, 1993.

[26] P. Lax, Linear Algebra, 1st Ed, Wiley, 1997.

[27] S. Leonesi, C. Toffalori, Un invito all’algebra, Springer Italia, 2006.

[28] G. Moore, Physical mathematics,

[29] R. Penrose, The road to reality,

[30] G. Pólya, Mathematical Discovery, Wiley, 1981.

[31] G. Pólya, G. Szegö, Problems and Theorems in Analysis (2 Vols.), Springer, 1972.

[32] M. Queysanne, Algèbre, Armand Colin, 1964.

[33] H. Rademacher, Topics in Analytic Number Theory,

[34] A. Reventos, Affine Maps, Euclidean Motions and Quadrics, UTM, Springer, 2011.

[35] H. Royden, P. Fitzpatrick, Real Analysis.

[36] J. Segercrantz, Improving the Cayley-Hamilton theorem for low-rank transforma-


tions, AM Monthly 99 no. 1 (1992), 42-44.

[37] J. P. Serre, Local Fields, Springer GTM 67, 1979.

[38] L. Smolin, The trouble with physics,

[39] G. Szász et al, Contests in higher mathematics: Miklós Schweitzer Mathematical


Competitions, 1949-1961, Springer.

[40] G. Tolstov, Fourier series, Dover

[41] I.M. Vinogradov, An introduction to the theory of numbers, Pergamon Press, 1955.

[42] P. Woit, Not Even Wrong.
