DONALDSON-Linear Algebra
Neil Donaldson
Fall 2022
Linear Algebra, Stephen Friedberg, Arnold Insel & Lawrence Spence, 4th Ed 2003, Prentice Hall.
Vector Spaces Bold-face v denotes a vector in a vector space V over a field F. A vector space is closed
under vector addition and scalar multiplication
∀v1 , v2 ∈ V, ∀λ1 , λ2 ∈ F, λ1 v1 + λ2 v2 ∈ V
Examples. Here are four (families of) vector spaces over the field R.
Linear Combinations and Spans Let β ⊆ V be a subset of a vector space V over F. A linear
combination of vectors in β is any finite sum
λ1 v1 + · · · + λ n v n
where λ j ∈ F and v j ∈ β. The span of β comprises all linear combinations: this is a subspace of V.
λ1 v1 + · · · + λn vn = 0 =⇒ ∀ j, λ j = 0
Example. P2 (R) has standard basis β = {1, x, x2 }: every degree ≤ 2 polynomial can be written uniquely as
a linear combination p( x ) = a + bx + cx2 and so dim P2 (R) = 3. The real numbers a, b, c are the
co-ordinates of p with respect to β; the co-ordinate vector of p is written
[ p]_β = \begin{pmatrix} a \\ b \\ c \end{pmatrix}
Linearity and Linear Maps A function T : V → W between vector spaces V, W over the same field
F is (F-)linear if it respects the linearity properties of V, W
∀v1 , v2 ∈ V, ∀λ1 , λ2 ∈ F, T( λ1 v1 + λ2 v2 ) = λ1 T( v1 ) + λ2 T( v2 )
We write L(V, W ) for the set (indeed vector space!) of linear maps from V to W: this is shortened to
L(V ) if V = W. An isomorphism is an invertible/bijective linear map.
Theorem. If dimF V = n and β is a basis of V, then the co-ordinate map v 7→ [v] β is an isomorphism
of vector spaces V → Fn .
Matrices and Linear Maps If V, W are finite-dimensional, then any linear map T : V → W can be
described using matrix multiplication.
Example. If A = \begin{pmatrix} 2 & -1 \\ 0 & -1 \\ -4 & 3 \end{pmatrix}, then the linear map L_A : R² → R³ (left-multiplication by A) is

L_A\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} 2 & -1 \\ 0 & -1 \\ -4 & 3 \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} 2x - y \\ -y \\ 3y - 4x \end{pmatrix}
The linear map in fact defines the matrix A; we recover the columns of the matrix by feeding the
standard basis vectors to the linear map.
\begin{pmatrix} 2 \\ 0 \\ -4 \end{pmatrix} = L_A\begin{pmatrix} 1 \\ 0 \end{pmatrix}, \qquad \begin{pmatrix} -1 \\ -1 \\ 3 \end{pmatrix} = L_A\begin{pmatrix} 0 \\ 1 \end{pmatrix}
whose jth column is obtained by feeding the jth basis vector of β to T and taking its co-ordinate vector
with respect to γ. This fits naturally with the co-ordinate isomorphisms
T(v) = w ⟺ [T]^γ_β [v]_β = [w]_γ
There are two special cases when V = W:
• If β = γ, then we simply write [T]_β instead of [T]^β_β.
• If T = I is the identity map, then Q^γ_β := [I]^γ_β is the change of co-ordinate matrix from β to γ.
Being able to convert linear maps into matrix multiplication is a central skill in linear algebra. Test
your comfort by working through the following; if everything feels familiar, you should consider
yourself in a good place as far as pre-requisites are concerned!
The standard bases of P2 (R) and P1 (R) are, respectively, β = {1, x, x2 } and γ = {1, x }. Observe that
[T(1)]_γ = [0]_γ = \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \quad [T( x )]_γ = [1]_γ = \begin{pmatrix} 1 \\ 0 \end{pmatrix}, \quad [T( x^2 )]_γ = [2x ]_γ = \begin{pmatrix} 0 \\ 2 \end{pmatrix}

⟹ [T]^γ_β = \begin{pmatrix} [T(1)]_γ & [T(x)]_γ & [T(x^2)]_γ \end{pmatrix} = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 2 \end{pmatrix}

[T( a + bx + cx^2 )]_γ = [T]^γ_β [ a + bx + cx^2 ]_β = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 2 \end{pmatrix}\begin{pmatrix} a \\ b \\ c \end{pmatrix} = \begin{pmatrix} b \\ 2c \end{pmatrix} = [b + 2cx ]_γ \qquad (†)
[T]^γ_η = \begin{pmatrix} 1 & 1 & 0 \\ 0 & 2 & 2 \end{pmatrix}, \qquad \begin{pmatrix} 1 & 1 & 0 \\ 0 & 2 & 2 \end{pmatrix}\begin{pmatrix} a \\ b \\ c \end{pmatrix} = \begin{pmatrix} a + b \\ 2b + 2c \end{pmatrix}

corresponds to an equation [T( p)]_γ = [T]^γ_η [ p]_η for some polynomial p( x ); what is p( x ) in terms
of a, b, c?
β
3. Find the change of co-ordinate matrix Q^β_η and check that the matrices of T are related by

[T]^γ_η = [T]^γ_β Q^β_η
1 Diagonalizability & the Cayley–Hamilton Theorem
1.1 Eigenvalues, Eigenvectors & Diagonalization (Review)
Definition 1.1. Suppose V is a vector space over F and T ∈ L(V ). A non-zero v ∈ V is an eigenvector
of T with eigenvalue λ ∈ F (together an eigenpair) if
T(v) = λv
We start by recalling a couple of basic facts, the first of which is easily proved by induction.
Thankfully there is a systematic way to find eigenvalues and eigenvectors in finite dimensions:
1. Choose any basis ϵ of V and compute the matrix A = [T]ϵ ∈ Mn (F).
2. Observe that
λ ∈ F is an eigenvalue ⇐⇒ ∃[v]ϵ ∈ Fn \ {0} such that A[v]ϵ = λ[v]ϵ
⇐⇒ ∃[v]ϵ ∈ Fn \ {0} such that ( A − λI ) [v]ϵ = 0
⇐⇒ det ( A − λI ) = 0
This last is a degree n polynomial equation whose roots are the eigenvalues.
3. For each eigenvalue λ j , compute the eigenspace Eλ j = N (T − λ j I) to find the eigenvectors.
Remember that Eλ j is a subspace of the original vector space V, so translate back if necessary!
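This procedure is easy to sanity-check on a computer. Below is a minimal sketch using Python/numpy (not part of the original notes; the matrix A is just an illustrative choice, and happens to be the matrix appearing in an example below):

```python
import numpy as np

# Illustrative matrix; any square matrix over R or C works here
A = np.array([[1., -1.,  0.],
              [0.,  2., -2.],
              [0.,  0.,  3.]])

# Steps 2/3: eigenvalues are the roots of det(A - t I); numpy also returns
# (columns of) eigenvectors spanning each eigenspace
eigenvalues, eigenvectors = np.linalg.eig(A)
print(eigenvalues)            # eigenvalues 1, 2, 3 (possibly in another order)

# Check the eigenpair condition A v = lambda v for each column
for lam, v in zip(eigenvalues, eigenvectors.T):
    assert np.allclose(A @ v, lam * v)
```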
Definition 1.3. The characteristic polynomial of T ∈ L(V ) is the degree-n polynomial
The eigenvalues of T are precisely the solutions to the characteristic equation p(t) = 0.
( A − iI)v = \begin{pmatrix} -i & -1 \\ 1 & -i \end{pmatrix}v ⟹ E_i = Span\left\{\begin{pmatrix} i \\ 1 \end{pmatrix}\right\}

and similarly E_{−i} = Span\left\{\begin{pmatrix} -i \\ 1 \end{pmatrix}\right\}. We therefore have an eigenbasis β = \left\{\begin{pmatrix} i \\ 1 \end{pmatrix}, \begin{pmatrix} -i \\ 1 \end{pmatrix}\right\} (of C²), with respect to which

[L_A]_β = \begin{pmatrix} i & 0 \\ 0 & -i \end{pmatrix}
T( f )( x ) = f ( x ) + ( x − 1) f ′ ( x )
With respect to the standard basis ϵ = {1, x, x2 }, we have the non-diagonal matrix

A = [T]_ϵ = \begin{pmatrix} 1 & -1 & 0 \\ 0 & 2 & -2 \\ 0 & 0 & 3 \end{pmatrix}
With three distinct eigenvalues, T is diagonalizable. To find the eigenvectors, compute the
nullspaces:
λ_1 = 1: 0 = ( A − λ_1 I )[v_1]_ϵ = \begin{pmatrix} 0 & -1 & 0 \\ 0 & 1 & -2 \\ 0 & 0 & 2 \end{pmatrix}[v_1]_ϵ ⟹ [v_1]_ϵ ∈ Span{(1, 0, 0)ᵀ} ⟹ E_1 = Span{1}

λ_2 = 2: A − λ_2 I = \begin{pmatrix} -1 & -1 & 0 \\ 0 & 0 & -2 \\ 0 & 0 & 1 \end{pmatrix} ⟹ [v_2]_ϵ ∈ Span{(1, −1, 0)ᵀ} ⟹ E_2 = Span{1 − x}

λ_3 = 3: A − λ_3 I = \begin{pmatrix} -2 & -1 & 0 \\ 0 & -1 & -2 \\ 0 & 0 & 0 \end{pmatrix} ⟹ [v_3]_ϵ ∈ Span{(1, −2, 1)ᵀ} ⟹ E_3 = Span{1 − 2x + x²}

With respect to the eigenbasis β = {1, 1 − x, 1 − 2x + x²},

[T]_β = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 3 \end{pmatrix}
Conditions for diagonalizability of finite-dimensional operators
We now borrow a little terminology from the theory of polynomials.
1. Let λ ∈ F be a root; p(λ) = 0. The algebraic multiplicity mult(λ) is the largest power of λ − t to
divide p(t). Otherwise said, there exists¹ some polynomial q(t) such that

p(t) = (λ − t)^{mult(λ)} q(t), \qquad q(λ) ≠ 0
2. We say that p(t) splits over F if it factorizes completely into linear factors; equivalently
∃ a, λ1 , . . . , λk ∈ F such that
p ( t ) = a ( λ 1 − t ) m1 · · · ( λ k − t ) m k
When p(t) splits, the algebraic multiplicities sum to the degree n of the polynomial
n = m1 + · · · + m k
Of course, we are most interested when p(t) is the characteristic polynomial of a linear map T ∈ L(V ).
If such a polynomial splits, then a = 1 and λ1 , . . . , λk are necessarily the (distinct) eigenvalues of T.
Example 1.6. The field matters! For instance p(t) = t2 + 1 = (t − i )(t + i ) = −(i − t)(−i − t) splits
over C but not over R. Its roots are plainly ±i.
For the purposes of review, we state the main result; this will be proved in the next section.
Theorem 1.7. Let V be finite-dimensional. A linear map T ∈ L(V ) is diagonalizable if and only if,
1. The characteristic polynomial of T splits over F, and
2. The geometric and algebraic multiplicities of each eigenvalue are equal; dim Eλ j = mult(λ j ).
Example 1.8. The matrix A = \begin{pmatrix} 3 & 1 & 0 \\ 0 & 3 & 0 \\ 0 & 0 & 5 \end{pmatrix} is easily seen to have eigenvalues λ_1 = 3 and λ_2 = 5. Indeed

p(t) = (3 − t)²(5 − t), \qquad mult(3) = 2, \quad mult(5) = 1

E_3 = Span\left\{\begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}\right\}, \quad E_5 = Span\left\{\begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix}\right\}, \qquad dim E_3 = dim E_5 = 1

Since dim E_3 = 1 < 2 = mult(3), the matrix A is not diagonalizable.
Everything prior to this should be review. If it feels very unfamiliar, revisit your notes from 121A,
particularly sections 5.1 and 5.2 of the textbook.
¹The existence follows from Descartes' factor theorem and the division algorithm for polynomials.
Exercises 1.1 1. For each matrix over R; find its characteristic polynomial, its eigenvalues/spaces,
and its algebraic and geometric multiplicities; decide if it is diagonalizable.
(a) A = \begin{pmatrix} 2 & 0 & 0 & 0 \\ 0 & 3 & 1 & 0 \\ 0 & 0 & 3 & 1 \\ 0 & 0 & 0 & 3 \end{pmatrix} \qquad (b) B = \begin{pmatrix} -1 & 6 & 0 & 0 \\ -2 & 6 & 0 & 0 \\ 0 & 0 & 3 & 0 \\ 0 & 0 & 0 & 3 \end{pmatrix}
2. Suppose A is a real matrix with eigenpair (λ, v). If λ ∉ R, show that (λ̄, v̄) is also an eigenpair.
3. Show that the characteristic polynomial of A = \begin{pmatrix} 3 & -4 \\ 4 & 3 \end{pmatrix} does not split over R. Diagonalize A
over C.
4. Give an example of a 2 × 2 matrix whose entries are rational numbers and whose characteristic
polynomial splits over R, but not over Q.
7. Suppose T ∈ L(V ) is invertible with eigenvalue λ. Prove that λ−1 is an eigenvalue of T−1 with
the same eigenspace Eλ . If T is diagonalizable, prove that T−1 is also diagonalizable.
8. If V is finite-dimensional and T ∈ L(V ), we may define det T to equal det[T] β , where β is any
basis of V. Explain why the choice of basis does not matter; that is, if γ is any other basis of V,
we have det[T]γ = det[T] β .
1.2 Invariant Subspaces and the Cayley–Hamilton Theorem
The proof of Theorem 1.7 is facilitated by a new concept, of which eigenspaces are a special case.
TW : W → W : w 7→ T(w)
Examples 1.10. 1. The trivial subspace {0} and the entire vector space V are invariant for any
linear map T ∈ L(V ).
W is an example of a generalized eigenspace; we’ll study these properly at the end of term.
To prove our diagonalization criterion, we need to see how to factorize the characteristic polynomial.
It turns out that factors of p(t) correspond to T-invariant subspaces!
Example 1.11. W = Span{i, j} is an invariant subspace of A = \begin{pmatrix} 1 & 2 & 4 \\ 0 & 3 & 1 \\ 0 & 0 & 2 \end{pmatrix} ∈ M_3(R). With respect to
the standard basis, the restriction [L_A]_W has matrix \begin{pmatrix} 1 & 2 \\ 0 & 3 \end{pmatrix}. The characteristic polynomial p_W(t) = (1 − t)(3 − t) of the
restriction divides p(t) = (1 − t)(3 − t)(2 − t).
Theorem 1.12. Suppose T ∈ L(V ), that dim V is finite and that W is a T-invariant subspace of V.
Then the characteristic polynomial of the restriction TW divides that of T.
Proof. Extend a basis βW of W to a basis β of V. Since T(w) ∈ Span βW for each w ∈ W, we see that
the matrix [T]_β has block form

[T]_β = \begin{pmatrix} A & B \\ O & C \end{pmatrix} ⟹ p(t) = det( A − tI ) det(C − tI ) = p_W(t) det(C − tI )
where pW (t) is the characteristic polynomial, and A = [TW ] βW the matrix of the restriction TW .
Corollary 1.13. If λ is an eigenvalue of T, then T_{E_λ} = λI_{E_λ} is a multiple of the identity, whence p_{E_λ}(t) = (λ − t)^{dim E_λ} divides p(t) and dim E_λ ≤ mult(λ).
We are now in a position to state and prove an extended version of Theorem 1.7.
Theorem 1.14. Suppose dimF V = n and that T ∈ L(V ) has distinct eigenvalues λ1 , . . . , λk . The
following are equivalent:
1. T is diagonalizable.
2. The characteristic polynomial splits over F and dim Eλ j = mult(λ j ) for each j; indeed
p(t) = p_{λ_1}(t) ⋯ p_{λ_k}(t) = (λ_1 − t)^{dim E_{λ_1}} ⋯ (λ_k − t)^{dim E_{λ_k}}

3. \sum_{j=1}^{k} dim E_{λ_j} = n
4. V = Eλ1 ⊕ · · · ⊕ Eλk
Example 1.15. A = \begin{pmatrix} 7 & 0 & -12 \\ 0 & 1 & 0 \\ 2 & 0 & -3 \end{pmatrix} is diagonalizable. Indeed p(t) = (1 − t)²(3 − t) splits, and we have

λ = 1: mult(1) = 2, E_1 = Span\left\{\begin{pmatrix} 2 \\ 0 \\ 1 \end{pmatrix}, \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix}\right\}, dim E_1 = 2
λ = 3: mult(3) = 1, E_3 = Span\left\{\begin{pmatrix} 3 \\ 0 \\ 1 \end{pmatrix}\right\}, dim E_3 = 1

so that R³ = E_1 ⊕ E_3. With respect to the eigenbasis β = \left\{\begin{pmatrix} 2 \\ 0 \\ 1 \end{pmatrix}, \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix}, \begin{pmatrix} 3 \\ 0 \\ 1 \end{pmatrix}\right\}, the map is diagonal: [L_A]_β = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 3 \end{pmatrix}
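A quick numerical check of Example 1.15; a minimal sketch assuming numpy is available:

```python
import numpy as np

A = np.array([[7., 0., -12.],
              [0., 1.,   0.],
              [2., 0.,  -3.]])

# Columns of P are the eigenbasis beta from the example
P = np.array([[2., 0., 3.],
              [0., 1., 0.],
              [1., 0., 1.]])
D = np.diag([1., 1., 3.])

# [L_A]_beta = P^{-1} A P should be the diagonal matrix D
assert np.allclose(np.linalg.inv(P) @ A @ P, D)
```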
p(t) = (λ_1 − t)^{m_1} ⋯ (λ_k − t)^{m_k}

splits and ∑ mult(λ_i) = n. The cardinality n of an eigenbasis is at most ∑ dim E_{λ_i} since every
element is an (independent) eigenvector. By Corollary 1.13 (dim E_{λ_j} ≤ mult(λ_j)) we see that

n ≤ ∑ dim E_{λ_j} ≤ ∑ mult(λ_j) = n

whence the inequalities are equalities with each pair equal: dim E_{λ_j} = mult(λ_j).
(3 ⇒ 4) Assume Eλ1 ⊕ · · · ⊕ Eλ j exists.2 If (λ j+1 , v j+1 ) is an eigenpair, then v j+1 ̸∈ Eλ1 ⊕ · · · ⊕ Eλ j for
otherwise this would contradict Lemma 1.2.
By induction, Eλ1 ⊕ · · · ⊕ Eλk exists; by assumption it has dimension n = dim V and therefore
equals V.
T-cyclic Subspaces and the Cayley–Hamilton Theorem
We finish this chapter by introducing a general family of invariant subspaces and using them to prove
a startling result.
Definition 1.16. Let T ∈ L(V ) and let v ∈ V. The T-cyclic subspace generated by v is the span ⟨v⟩ := Span{v, T(v), T²(v), . . .}.
all of which lie in Span{i, k}. Plainly this is the L A -cyclic subspace ⟨i + k⟩.
We were lucky in the example that the general form Am v was so clear. It is helpful to develop a more
precise test for identifying the dimension and a basis of a T-cyclic subspace.
Suppose a T-cyclic subspace ⟨v⟩ = Span{v, T(v), T²(v), . . .} has finite dimension.³ Let k ≥ 1 be
maximal such that the set {v, T(v), . . . , T^{k−1}(v)} is linearly independent.
• If k doesn’t exist, the infinite linearly independent set {v, T(v), . . .} contradicts dim ⟨v⟩ < ∞.
• By the maximality of k, T^k(v) ∈ Span{v, T(v), . . . , T^{k−1}(v)}; by induction this extends to T^m(v) ∈ Span{v, T(v), . . . , T^{k−1}(v)} for every m ≥ k.
It follows that ⟨v⟩ = Span{v, T(v), . . . , Tk−1 (v)}, and we’ve proved a useful criterion.
3 Necessarily the situation if dim V < ∞, when we are thinking about characteristic polynomials.
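In co-ordinates, dim⟨v⟩ is just the rank of the matrix whose columns are v, T(v), T²(v), . . .. A minimal numpy sketch (the matrix and vector here are illustrative choices, not taken from the notes):

```python
import numpy as np

A = np.array([[1., 2., 4.],
              [0., 3., 1.],
              [0., 0., 2.]])
v = np.array([1., 0., 0.])

# Stack v, Av, A^2 v as columns; the rank is dim <v>
krylov = np.column_stack([np.linalg.matrix_power(A, m) @ v for m in range(3)])
print(np.linalg.matrix_rank(krylov))   # dimension of the T-cyclic subspace <v>
```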
Examples 1.20. 1. According to the Theorem, in Example 1.17 we need only have noticed
2. Let T( p( x )) = 3p( x ) − p′′ ( x ) viewed as a linear map T ∈ L( P2 (R)) and consider the T-cyclic
subspace generated by the polynomial p( x ) = x2
We finish by considering the interaction of a T-cyclic subspace with the characteristic polynomial.
Surprisingly, the coefficients of the characteristic polynomial and the linear combination coincide.
[T_W]_{β_W} = \begin{pmatrix} 0 & -9 \\ 1 & 6 \end{pmatrix} ⟹ p_W(t) = t² − 6t + 9
Theorem 1.21. Let T ∈ L(V ) and suppose W = ⟨w⟩ has dim W = k with basis
With a little sneakiness, we can drop the W’s in the second part of the Theorem and observe an
intimate relation between a linear map and its characteristic polynomial.
Proof. Let w ∈ V and consider the cyclic subspace W = ⟨w⟩ generated by w. By Theorem 1.12,
p ( t ) = qW ( t ) pW ( t )
for some polynomial qW . But the previous result says that pW (T)(w) = 0, whence
p(T)(w) = 0
Since we may apply this reasoning to any w ∈ V, we conclude that p(T) is the zero function.
A² − 6A = \begin{pmatrix} 7 & 6 \\ 18 & 19 \end{pmatrix} − 6\begin{pmatrix} 2 & 1 \\ 3 & 4 \end{pmatrix} = −5I
It may seem like a strange thing to do for this matrix, but the characteristic equation can be
used to calculate the inverse of A:
A² − 6A + 5I = 0 ⟹ A( A − 6I ) = −5I ⟹ A^{-1} = \frac{1}{5}(6I − A) = \frac{1}{5}\begin{pmatrix} 4 & -1 \\ -3 & 2 \end{pmatrix}
2. The matrix

A = \begin{pmatrix} 2 & -1 & 8 \\ 0 & 1 & -6 \\ 0 & 0 & 2 \end{pmatrix}

has characteristic polynomial p(t) = (2 − t)²(1 − t), so that A³ = 5A² − 8A + 4I. By Cayley–Hamilton,

A⁴ = AA³ = A(5A² − 8A + 4I)
   = 5A³ − 8A² + 4A = 5(5A² − 8A + 4I) − 8A² + 4A
   = 17A² − 36A + 20I = 17\begin{pmatrix} 4 & -3 & 38 \\ 0 & 1 & -18 \\ 0 & 0 & 4 \end{pmatrix} − 36\begin{pmatrix} 2 & -1 & 8 \\ 0 & 1 & -6 \\ 0 & 0 & 2 \end{pmatrix} + 20\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}
   = \begin{pmatrix} 16 & -15 & 358 \\ 0 & 1 & -90 \\ 0 & 0 & 16 \end{pmatrix}
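Both manipulations are easy to verify numerically; a sketch with numpy using the 2 × 2 matrix A = (2 1; 3 4) from the first example, whose characteristic polynomial is t² − 6t + 5:

```python
import numpy as np

A = np.array([[2., 1.],
              [3., 4.]])
I = np.eye(2)

# Cayley-Hamilton: A satisfies its own characteristic equation
assert np.allclose(A @ A - 6*A + 5*I, 0)

# Rearranging gives the inverse: A^{-1} = (6I - A)/5
A_inv = (6*I - A) / 5
assert np.allclose(A @ A_inv, I)
```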
3. Recall Example 1.4.2, where the linear map T( f ( x )) = f ( x ) + ( x − 1) f ′ ( x ) had characteristic polynomial p(t) = (1 − t)(2 − t)(3 − t).
By Cayley–Hamilton, T3 = 6T2 − 11T + 6I. You can check this explicitly, after first computing
T2 ( f ( x )) = f ( x ) + 3( x − 1) f ′ ( x ) + ( x − 1)2 f ′′ ( x ), etc.
Cayley–Hamilton can also be used to simplify higher powers of T and even to compute the
inverse!
I = \frac{1}{6}(T³ − 6T² + 11T) ⟹ T^{-1} = \frac{1}{6}(T² − 6T + 11I)

⟹ T^{-1}( f ( x )) = f ( x ) − \frac{1}{2}( x − 1) f ′ ( x ) + \frac{1}{6}( x − 1)² f ″ ( x )
Exercises 1.2 1. For the linear map T = L_A : R³ → R³ where A = \begin{pmatrix} 3 & 0 & 0 \\ 0 & 2 & 4 \\ 0 & 0 & 2 \end{pmatrix}, find the T-cyclic
subspace generated by the standard basis vector e_3 = (0, 0, 1)ᵀ.
0
2. Let T = L_A, where A = \begin{pmatrix} 1 & 2 & 4 \\ 0 & 3 & 1 \\ 0 & 0 & 2 \end{pmatrix}, and let v = \begin{pmatrix} 1 \\ 1 \\ -1 \end{pmatrix}. Compute T(v) and T²(v). Hence describe
the T-cyclic subspace ⟨v⟩ and its dimension.
3. Given A = \begin{pmatrix} 2 & 0 & 0 & 0 \\ 0 & 3 & 1 & 0 \\ 0 & 0 & 3 & 1 \\ 0 & 0 & 0 & 3 \end{pmatrix}, find two distinct L_A-invariant subspaces W ≤ R⁴ such that dim W = 3.
4. Suppose that W and X are T-invariant subspaces of V. Prove that the sum
W + X = {w + x : w ∈ W, x ∈ X }
is also T-invariant.
6. Give an example of an infinite-dimensional vector space V, a linear map T ∈ L(V ), and a vector
v such that ⟨v⟩ = V.
7. Let β = {sin x, cos x, 2x sin x, 3x cos x} and T = \frac{d}{dx} ∈ L(Span β). Plainly the subspace W :=
Span{sin x, cos x} is T-invariant. Compute the matrices [T]_β and [T_W]_{β_W} and observe that p(t) = p_W(t)².
9. Check the details of Example 1.23.3 and evaluate T4 as a linear combination of I, T and T2 . In
particular, check the evaluation of T−1 ( f ( x )).
10. Suppose a, b are constants with a ≠ 0 and define T ∈ L(P_2(R)) by T( f ( x )) = a f ( x ) + b f ′ ( x ).
(a) Find the characteristic polynomial of T and identify its eigenspaces. Is T diagonalizable?
(b) Find scalars x, y, z ∈ R such that T³ = xT² + yT + zI.
(c) What are dim L( P2 (R)) and dim Span{Tk : k ∈ N0 }? Explain.
12. If A = \begin{pmatrix} a & b \\ c & d \end{pmatrix} has non-zero determinant, use the Cayley–Hamilton Theorem to obtain the usual formula for the inverse A^{-1}.
14. (a) Consider Example 1.20.2 where T ∈ L( P2 (R)) is defined by T( p( x )) = 3p( x ) − p′′ ( x ).
Prove that all T-cyclic subspaces have dimension ≤ 2.
(b) What if we instead consider S ∈ L( P2 (R)) defined by S( p( x )) = 3p( x ) − p′ ( x )?
[T_W]_{β_W} = \begin{pmatrix} 0 & 0 & \cdots & 0 & -a_0 \\ 1 & 0 & \cdots & 0 & -a_1 \\ 0 & 1 & \cdots & 0 & -a_2 \\ \vdots & & \ddots & & \vdots \\ 0 & 0 & \cdots & 1 & -a_{k-1} \end{pmatrix} ∈ M_k(F)

(b) Compute the characteristic polynomial p_W(t) = det([T_W]_{β_W} − tI_k) by expanding the determinant.
2 Inner Product Spaces, part 1
You should be familiar with the scalar/dot product in R². For any vectors x = \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}, y = \begin{pmatrix} y_1 \\ y_2 \end{pmatrix}, define

x · y := x_1y_1 + x_2y_2

The norm or length of a vector is ||x|| = √(x · x).
The angle θ between vectors satisfies x · y = ||x|| ||y|| cos θ.
Vectors are orthogonal or perpendicular precisely when x · y = 0.
The dot product is what allows us to compute lengths of and angles between vectors in R2 . An inner
product is an algebraic structure that generalizes this idea to other vector spaces.
Definition 2.1. An inner product space (V, ⟨ , ⟩) is a vector space V over F together with an inner
product: a function ⟨ , ⟩ : V × V → F satisfying the following properties ∀x, y, z ∈ V, λ ∈ F,
The norm or length of x ∈ V is ||x|| := √⟨x, x⟩. A unit vector has ||x|| = 1.
Vectors x, y are perpendicular/orthogonal if ⟨x, y⟩ = 0 and orthonormal if they are additionally unit
vectors.
Definition 2.2. Euclidean space means Rn equipped with the standard inner (dot) product,
⟨x, y⟩ = x · y = yᵀx = \sum_{j=1}^{n} x_jy_j = x_1y_1 + ⋯ + x_ny_n, \qquad ||x|| = \sqrt{\sum_{j=1}^{n} x_j^2}
Unless the inner product is stated explicitly, if we refer to Rn as an inner product space, we mean
Euclidean space. However, there are many other ways to make Rn into an inner product space. . .
Example 2.3. Define an alternative inner product on R2 via
⟨x, y⟩ = x1 y1 + 3x2 y2
x · x = \frac{1}{2}, \qquad y · y = \frac{5}{6}, \qquad x · y = \frac{1}{2\sqrt{3}}
We have the same vector space R2 , but different inner product spaces: (R2 , ⟨ , ⟩) ̸= (R2 , ·).
The above is an example of a weighted inner product: choose weights a1 , . . . , an ∈ R+ and define
⟨x, y⟩ = \sum_{j=1}^{n} a_jx_jy_j = a_1x_1y_1 + ⋯ + a_nx_ny_n
It is a simple exercise to check that this defines an inner product on Rn . In particular, Rn may be
equipped with infinitely many distinct inner products!
More generally, a symmetric matrix A ∈ Mn (R) is positive-definite if x T Ax > 0 for all non-zero x ∈ Rn .
It is straightforward to check that
⟨x, y⟩ := yT Ax
defines an inner product on Rn . In fact all inner products on Rn arise in this fashion! The weighted
inner products correspond to A being diagonal (Euclidean space is A = I), but this is not required.
Example 2.4. The matrix A = \begin{pmatrix} 3 & 1 \\ 1 & 1 \end{pmatrix} is positive-definite and thus defines an inner product
⟨x, y⟩ = 3x1 y1 + x1 y2 + x2 y1 + x2 y2
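A short numpy sketch of this correspondence, using the matrix from Example 2.4 and two arbitrary test vectors:

```python
import numpy as np

A = np.array([[3., 1.],
              [1., 1.]])

# Positive-definite: symmetric with positive eigenvalues
assert np.allclose(A, A.T) and np.all(np.linalg.eigvalsh(A) > 0)

# <x, y> = y^T A x reproduces 3*x1*y1 + x1*y2 + x2*y1 + x2*y2
x, y = np.array([1., 2.]), np.array([-1., 3.])
inner = y @ A @ x
assert np.isclose(inner, 3*x[0]*y[0] + x[0]*y[1] + x[1]*y[0] + x[1]*y[1])
```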
Lemma 2.5 (Basic properties). Let V be an inner product space, let x, y, z ∈ V and λ ∈ F.
1. ⟨0, x⟩ = 0
2. ||x|| = 0 ⇐⇒ x = 0
3. ||λx|| = |λ| ||x||
4. ⟨x, z⟩ = ⟨y, z⟩ for all z ⟹ x = y
5. (Cauchy–Schwarz inequality) |⟨x, y⟩| ≤ ||x|| ||y||, with equality if and only if x, y are parallel
6. (Triangle inequality) ||x + y|| ≤ ||x|| + ||y||, with equality if and only if x, y are parallel and point in the same direction
Be careful with notation: |λ| means the absolute value/modulus (of a scalar), while ||x|| means the norm
(of a vector).
In the real case, Cauchy–Schwarz allows us to define angle via cos θ = \frac{⟨x, y⟩}{||x|| ||y||}, since the right-hand
side lies in the interval [−1, 1]. However, except in Euclidean R² and R³, this notion is of limited use;
orthogonality (⟨x, y⟩ = 0) and orthonormality are usually all we care about.
Proof. Parts 1–3 are exercises. For simplicity, we prove 5 and 6 only when F = R.
4. Let z = x − y, apply the linearity condition and part 2:
5. If y = 0, the result is trivial. WLOG (and by part 3) we may assume ||y|| = 1; if the inequality
holds for this, then it holds for all non-zero y by parts 1 and 4. Now expand:
≥0 (Cauchy–Schwarz)
Equality requires both equality in Cauchy–Schwarz (x, y parallel) and that ⟨x, y⟩ ≥ 0; since x, y
are already parallel, this means that one is a non-negative multiple of the other.
Complex Inner Product Spaces
Definition 2.1 is already set up nicely when C = F. One subtle difference comes from how we expand
linear combinations in the second slot.
The proof is very easy if you remember your complex conjugates; try it!
Warning! If you dabble in the dark arts of Physics, be aware that their convention5 is for an inner
product to be conjugate-linear in the first entry and linear in the second!
Definition 2.7. The standard (Hermitian) inner product and norm on Cn are
⟨x, y⟩ = y^*x = \sum_{j=1}^{n} x_j\bar{y}_j = x_1\bar{y}_1 + ⋯ + x_n\bar{y}_n, \qquad ||x|| = \sqrt{\sum_{j=1}^{n} |x_j|^2}
Note that the weights a j must still be positive real numbers. We may similarly define inner products in
terms of positive-definite matrices ⟨x, y⟩ = y∗ Ax.
⟨x, y⟩ = \begin{pmatrix} \bar{y}_1 & \bar{y}_2 \end{pmatrix}\begin{pmatrix} 3 & -i \\ i & 3 \end{pmatrix}\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = 3x_1\bar{y}_1 + ix_1\bar{y}_2 − ix_2\bar{y}_1 + 3x_2\bar{y}_2
Almost all results in this chapter will be written for general inner product spaces, thus covering the
real and complex cases simultaneously. If you don’t feel confident with complex numbers, simply
let F = R and delete all complex conjugates at first read! Very occasionally a different proof will be
required depending on the field. For simplicity, examples will more often use real inner products.
4 The prefix sesqui- means one-and-a-half ; for instance a sesquicentenary is a 150 year anniversary.
5 The common Physics notation relates to ours via ⟨x | y⟩ = ⟨y, x⟩.
Further Examples
As before, the field F must be either R or C.
⟨ A, B⟩ = tr( B∗ A)
where tr is the trace of an n × n matrix; this makes Mm×n (F) into an inner product space.
This isn’t really a new example: if we identify M_{m×n}(F) with F^{mn} by stacking the columns of each matrix, then the
Frobenius inner product is the standard inner product in disguise.
Definition 2.12 (L² inner product). Given a real interval [a, b], the formula

⟨f, g⟩ := \int_a^b f(t)\overline{g(t)}\,dt

defines an inner product on the space C[a, b] of continuous functions.
With careful restriction, this works even for infinite intervals and a larger class of functions.6 Veri-
fying the required properties is straightforward if you know a little analysis; for instance continuity
allows us to conclude
||f||² = \int_a^b |f(x)|^2\,dx = 0 ⟺ f(x) ≡ 0
Example 2.13. Let f ( x ) = x and g( x ) = x2 ; these lie in the inner product space C [−1, 1] with
respect to the L2 inner product.
⟨f, g⟩ = \int_{-1}^{1} x^3\,dx = 0, \qquad ||f||^2 = \int_{-1}^{1} x^2\,dx = \frac{2}{3}, \qquad ||g||^2 = \int_{-1}^{1} x^4\,dx = \frac{2}{5}

With some simple scaling, we see that \left\{\frac{f}{||f||}, \frac{g}{||g||}\right\} = \left\{\sqrt{\tfrac{3}{2}}\,x, \sqrt{\tfrac{5}{2}}\,x^2\right\} forms an orthonormal set.
6 For us, functions will always be continuous (often polynomials) on closed bounded intervals. The square-integrable
functions and L2 -spaces for which the inner product is named are a more complicated business and beyond this course.
Definition 2.14 (ℓ² inner product). A sequence (x_n) is square-summable if \sum_{n=1}^{\infty} |x_n|^2 < ∞. These sequences form a vector space on which we can define an inner product⁷

⟨(x_n), (y_n)⟩ = \sum_{n=1}^{\infty} x_n\bar{y}_n
In essence we’ve taken the standard inner product on Fn and let n → ∞! This example, and its L2
cousin, are the prototypical Hilbert spaces, which have great application to differential equations, sig-
nal processing, etc. Since a rigorous discussion requires a significant amount of analysis (convergence
of series, completeness, integrability), these objects are generally beyond the course.
Our final example of an inner product is a useful, and hopefully obvious, hack to which we shall
repeatedly appeal in examples.
Lemma 2.15. Let V be a vector space over R or C. If β is a basis of V, then there exists exactly one
inner product on V for which β is an orthonormal set.
which is orthogonal to x.
also square-summable.
7. Show that every eigenvalue of a positive definite matrix is positive.
10. Use basic algebra to prove the Cauchy–Schwarz inequality for vectors x = \begin{pmatrix} a \\ b \end{pmatrix} and y = \begin{pmatrix} c \\ d \end{pmatrix} in
R2 with the standard (dot) product.
11. Prove the Cauchy–Schwarz and triangle inequalities for a complex inner product space. What
has to change compared to the proof of Lemma 2.5?
If you know the length of every vector, then you know the inner product!
13. Prove that \int_0^2 \frac{\sqrt{x}}{x+1}\,dx ≤ \frac{2}{\sqrt{3}}.
14. Let m ∈ Z and consider the complex-valued function f m ( x ) = √12π eimx . If ⟨ , ⟩ is the L2 inner
product on C [−π, π ], prove that { f m : m ∈ Z} is an orthonormal set.
This example is central to the study of Fourier series.
(Hint: If complex functions are scary, use Euler’s formula eimx = cos mx + i sin mx and work with the
real-valued functions cos mx and sin mx. The difficulty is that you then need integration by parts. . . )
15. Let ⟨ , ⟩ be an inner product on Fⁿ (recall that F = R or C). Define the matrix A ∈ M_n(F) by
A_{jk} = ⟨e_k, e_j⟩, where {e_1, . . . , e_n} is the standard basis. Verify that A is the matrix of the inner
product:

∀x, y ∈ Fⁿ, ⟨x, y⟩ = y^*Ax
In particular,
More generally, if β = {v_1, . . . , v_n} is a basis then A_{jk} = ⟨v_k, v_j⟩ defines the matrix of the inner
product with respect to β:
2.2 Orthogonal Sets and the Gram–Schmidt Process
We start with a simple definition, relating subspaces to orthogonality.
Definition 2.16. Let U be a subspace of an inner product space V. The orthogonal complement U ⊥ is
the set
U ⊥ = {x ∈ V : ∀u ∈ U, ⟨x, u⟩ = 0}
It is easy to check that U ⊥ is itself a subspace of V and that U ∩ U ⊥ = {0}. It can moreover be seen
that U ⊆ (U ⊥ )⊥ , though equality need not hold in infinite dimensions (see Exercise 7).
Example 2.17. U = Span\left\{\begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 0 \\ -1 \\ 3 \end{pmatrix}\right\} ≤ R³ has orthogonal complement U^⊥ = Span\left\{\begin{pmatrix} 0 \\ 3 \\ 1 \end{pmatrix}\right\}.
Theorem 2.18. Let V be an inner product space and let U = Span β where β = {u1 , . . . , un } is an
orthogonal set of non-zero vectors;
⟨u_j, u_k⟩ = \begin{cases} 0 & \text{if } j ≠ k \\ ||u_j||^2 ≠ 0 & \text{if } j = k \end{cases}
Then:
Observe that (∗) essentially calculates the co-ordinate vector [x] β ∈ Fn . Recalling how unpleasant
such calculations have been in the past, often requiring large matrix inversions, we immediately see
the power of inner products and orthogonal bases.
Proof. 1. Since β spans U, a given x ∈ U may be written
x = \sum_{k=1}^{n} a_ku_k
for some scalars ak . The orthogonality of β recovers the required expression for a j :
⟨x, u_j⟩ = \sum_{k=1}^{n} a_k⟨u_k, u_j⟩ = a_j||u_j||^2
Examples 2.19. 1. Consider the standard orthonormal basis β = {e_1, e_2} of R². For any x = \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}, we easily check that

\sum_{j=1}^{2} ⟨x, e_j⟩e_j = x_1e_1 + x_2e_2 = x
2. In R³, β = {u_1, u_2, u_3} = \left\{\begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix}, \begin{pmatrix} 2 \\ -1 \\ 0 \end{pmatrix}, \begin{pmatrix} 3 \\ 6 \\ -5 \end{pmatrix}\right\} is an orthogonal set and thus a basis. We
compute the co-ordinates of x = \begin{pmatrix} 7 \\ 4 \\ 2 \end{pmatrix} with respect to β:

x = \sum_{j=1}^{3} \frac{⟨x, u_j⟩}{||u_j||^2}u_j = \frac{7 + 8 + 6}{1 + 4 + 9}\begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix} + \frac{14 − 4 + 0}{4 + 1 + 0}\begin{pmatrix} 2 \\ -1 \\ 0 \end{pmatrix} + \frac{21 + 24 − 10}{9 + 36 + 25}\begin{pmatrix} 3 \\ 6 \\ -5 \end{pmatrix}
  = \frac{3}{2}\begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix} + 2\begin{pmatrix} 2 \\ -1 \\ 0 \end{pmatrix} + \frac{1}{2}\begin{pmatrix} 3 \\ 6 \\ -5 \end{pmatrix} ⟹ [x]_β = \begin{pmatrix} 3/2 \\ 2 \\ 1/2 \end{pmatrix}
Compare this with the painfully slow augmented matrix method for finding co-ordinates!
3. Revisiting Example 2.17, let x = \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}. Since β = \left\{\begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 0 \\ -1 \\ 3 \end{pmatrix}\right\} is an orthogonal basis of U, we
observe that

π_U(x) = 1\begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix} + \frac{2}{10}\begin{pmatrix} 0 \\ -1 \\ 3 \end{pmatrix} = \frac{1}{5}\begin{pmatrix} 5 \\ -1 \\ 3 \end{pmatrix}, \qquad π_U^⊥(x) = x − π_U(x) = \frac{2}{5}\begin{pmatrix} 0 \\ 3 \\ 1 \end{pmatrix}
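The computation in part 3 is easy to reproduce numerically; a minimal sketch with numpy:

```python
import numpy as np

u1 = np.array([1., 0., 0.])
u2 = np.array([0., -1., 3.])     # orthogonal basis of U from Example 2.17
x  = np.array([1., 1., 1.])

# pi_U(x) = sum <x,u_j>/||u_j||^2 u_j  for an orthogonal basis
proj = sum((x @ u) / (u @ u) * u for u in (u1, u2))
print(proj)          # [1.  -0.2  0.6]  i.e. (5, -1, 3)/5
print(x - proj)      # [0.   1.2  0.4]  i.e. 2(0, 3, 1)/5
```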
The Gram–Schmidt Process
Theorem 2.18 tells us how to compute the orthogonal projections corresponding to V = U ⊕ U ⊥ ,
provided U has a finite, orthogonal basis. Given how useful such bases are, our next goal is to see that
such exist for any finite-dimensional subspace. Helpfully there exists a constructive algorithm.
• Choose u_1 = a_1s_1 where a_1 ≠ 0.
• For each k ≥ 2, choose u_k = a_k\left(s_k − \sum_{j=1}^{k-1} \frac{⟨s_k, u_j⟩}{||u_j||^2}u_j\right) where a_k ≠ 0. \qquad (†)
The purpose of the scalars ak is to give you some freedom; choose them to avoid unpleasant fractions!
If you want a set of orthonormal vectors, it is easier to scale everything after the algorithm is complete.
Indeed, by taking S to be a basis of V and normalizing the resulting β, we conclude:
Corollary 2.21. Every finite-dimensional inner product space has an orthonormal basis.
Example 2.22. S = {s_1, s_2, s_3} = \left\{\begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 2 \\ -1 \\ 3 \end{pmatrix}, \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}\right\} is a linearly independent subset of R³.

1. Choose u_1 = s_1 = \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix} ⟹ ||u_1||^2 = 1

2. s_2 − \frac{⟨s_2, u_1⟩}{||u_1||^2}u_1 = \begin{pmatrix} 2 \\ -1 \\ 3 \end{pmatrix} − 2\begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix} = \begin{pmatrix} 0 \\ -1 \\ 3 \end{pmatrix}: choose u_2 = \begin{pmatrix} 0 \\ -1 \\ 3 \end{pmatrix} ⟹ ||u_2||^2 = 10

3. s_3 − \frac{⟨s_3, u_1⟩}{||u_1||^2}u_1 − \frac{⟨s_3, u_2⟩}{||u_2||^2}u_2 = \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix} − \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix} − \frac{2}{10}\begin{pmatrix} 0 \\ -1 \\ 3 \end{pmatrix} = \frac{2}{5}\begin{pmatrix} 0 \\ 3 \\ 1 \end{pmatrix}: choose u_3 = \begin{pmatrix} 0 \\ 3 \\ 1 \end{pmatrix}

The orthogonality of β = {u_1, u_2, u_3} is clear. It is now trivial to observe that \left\{u_1, \frac{1}{\sqrt{10}}u_2, \frac{1}{\sqrt{10}}u_3\right\} is
an orthonormal basis of R³.
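A minimal numpy implementation of the algorithm (omitting the optional rescaling by the a_k), applied to the set S above:

```python
import numpy as np

def gram_schmidt(vectors):
    """Return an orthogonal (not yet normalised) basis for the span of the inputs."""
    basis = []
    for s in vectors:
        # subtract the projection of s onto each previously constructed u
        u = s - sum((s @ b) / (b @ b) * b for b in basis)
        basis.append(u)
    return basis

S = [np.array([1., 0., 0.]), np.array([2., -1., 3.]), np.array([1., 1., 1.])]
for u in gram_schmidt(S):
    print(u)   # (1,0,0), (0,-1,3), (0, 1.2, 0.4) -- the last is 2/5 of (0,3,1)
```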
Proof of Theorem 2.20. For each k ≤ n, define Sk = {s1 , . . . , sk } and β k = {u1 , . . . , uk }. We prove by
induction that each β k is an orthogonal set of non-zero vectors and that Span β k = Span Sk . The
Theorem is then the terminal case k = n.
(Base case k = 1) Certainly β 1 = { a1 s1 } is an orthogonal set and Span β 1 = Span S1 .
(Induction step) Fix k ≥ 2, assume β k−1 is an orthogonal non-zero set and that Span β k−1 =
Span Sk−1 . By Theorem 2.18, uk ∈ (Span β k−1 )⊥ . We also see that uk ̸= 0, for if not,
(†) =⇒ sk ∈ Span β k−1 = Span Sk−1
and S would be linearly dependent. It follows that β k is an orthogonal set of non-zero vectors.
Moreover, sk ∈ Span β k =⇒ Span Sk ≤ Span β k . Since these spaces have the same (finite) dimension
k, we conclude that Span β k = Span Sk .
By induction, β is an orthogonal, non-zero spanning set for Span S; by Theorem 2.18, it is a basis.
Example 2.23. This time we work in the space of polynomials P(R) equipped with the L2 inner
product ⟨f, g⟩ = \int_0^1 f(x)g(x)\,dx on the interval [0, 1]. Let S = {1, x, x²} and apply the algorithm:

1. Choose f_1(x) = 1 ⟹ ||f_1||^2 = \int_0^1 1\,dx = 1

2. x − \frac{⟨x, f_1⟩}{||f_1||^2}f_1 = x − \int_0^1 x\,dx = x − \frac{1}{2}
   We choose f_2(x) = 2x − 1, with ||f_2||^2 = \int_0^1 (2x − 1)^2\,dx = \frac{1}{3}

3. x^2 − \frac{⟨x^2, f_1⟩}{||f_1||^2}f_1 − \frac{⟨x^2, f_2⟩}{||f_2||^2}f_2 = x^2 − \int_0^1 x^2\,dx − \frac{\int_0^1 x^2(2x − 1)\,dx}{1/3}(2x − 1) = x^2 − x + \frac{1}{6}
   We choose f_3(x) = 6x^2 − 6x + 1, with ||f_3||^2 = \int_0^1 (6x^2 − 6x + 1)^2\,dx = \frac{1}{5}
This example can be extended to arbitrary degree since the countable set {1, x, x², . . .} is a basis of
P(R). Indeed this shows that ( P(R), ⟨ , ⟩) has an orthonormal basis.
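The same computation can be checked symbolically; a sketch using sympy with the L² inner product on [0, 1]:

```python
import sympy as sp

x = sp.symbols('x')
inner = lambda f, g: sp.integrate(f * g, (x, 0, 1))   # L^2 inner product on [0,1]

S = [sp.Integer(1), x, x**2]
basis = []
for s in S:
    u = s - sum(inner(s, b) / inner(b, b) * b for b in basis)
    basis.append(sp.expand(u))

print(basis)                          # [1, x - 1/2, x**2 - x + 1/6]
print([inner(b, b) for b in basis])   # [1, 1/12, 1/180]
```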
Gram–Schmidt also shows that our earlier discussion of orthogonal projections is generic.
\sum_{u ∈ β} ⟨x, u⟩\,u
Exercises 2.2 1. Apply Gram–Schmidt to obtain an orthogonal basis β for Span S. Then obtain the
co-ordinate representation (Fourier coefficients) of the given vector with respect to β.
(a) S = \left\{\begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}, \begin{pmatrix} 0 \\ 1 \\ 1 \end{pmatrix}, \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix}\right\} ⊆ R³, \quad x = \begin{pmatrix} 1 \\ 0 \\ 1 \end{pmatrix}

(b) S = \left\{\begin{pmatrix} 3 & 5 \\ -1 & 1 \end{pmatrix}, \begin{pmatrix} -1 & 9 \\ 5 & -1 \end{pmatrix}, \begin{pmatrix} 7 & -17 \\ 2 & -6 \end{pmatrix}\right\} ⊆ M_2(R), \quad X = \begin{pmatrix} 1 & 27 \\ -4 & 8 \end{pmatrix} (use the Frobenius product)

(c) S = \left\{\begin{pmatrix} 1 \\ i \\ 0 \end{pmatrix}, \begin{pmatrix} 0 \\ 1 \\ i \end{pmatrix}, \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix}\right\} ⊆ C³ with x = \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}

(d) S = {1, x, x²} with ⟨f, g⟩ = \int_{-1}^{1} f(x)g(x)\,dx and f(x) = x².
Important! You’ll likely need much more practice than this to get comfortable with Gram–
Schmidt; make up your own problems!
2. Let S = {s_1, s_2} = \left\{\begin{pmatrix} 1 \\ 0 \\ 3 \end{pmatrix}, \begin{pmatrix} 2 \\ -1 \\ 0 \end{pmatrix}\right\} and U = Span S ≤ R³. Find π_U(x) if x = \begin{pmatrix} 3 \\ -1 \\ -2 \end{pmatrix}.
(Hint: First apply Gram–Schmidt)
3. Find the orthogonal complement to U = Span{x²} ≤ P_2(R) with respect to the inner product
⟨f, g⟩ = \int_0^1 f(t)g(t)\,dt.
4. Let T ∈ L(V, W ) where V, W are inner product spaces with orthonormal bases β = {v_1, . . . , v_n}
and γ = {w_1, . . . , w_m} respectively. Prove that the matrix A = [T]^γ_β ∈ M_{m×n}(F) of T with
respect to these bases has jkth entry

A_{jk} = ⟨T(v_k), w_j⟩
5. Suppose that β is an orthonormal basis of an n-dimensional inner product space V. Prove that,
∀x, y ∈ V, ⟨x, y⟩ = [y]∗β [x] β
Otherwise said, the co-ordinate isomorphism ϕβ : V → Fn defined by ϕβ (x) = [x] β is an isomorphism
of inner product spaces where we use the standard inner product on Fn
6. Let U be a subspace of an inner product space V. Prove the following:
(a) U ⊥ is a subspace of V.
(b) U ∩ U ⊥ = {0}
(c) U ⊆ (U ⊥ ) ⊥
(d) If V = U ⊕ U ⊥ , then U = (U ⊥ )⊥ (this is always the case when dim U < ∞)
7. Let ℓ2 be the set of square-summable sequences of real numbers (Definition 2.14). Consider the
sequences u1 , u2 , u3 , . . ., where u j is the zero sequence except for a single 1 in the jth entry. For
instance,
u4 = (0, 0, 0, 1, 0, 0, 0, 0, . . .)
(a) Let U = Span{u j : j ∈ N}. Prove that U ⊥ contains only the zero sequence.
(b) Show that the sequence y = \left(\frac{1}{n}\right) lies in ℓ², but does not lie in U.
(b) Briefly explain why the Fourier series is not an element of Span β.
(c) Sketch a few of the Fourier approximations (sum up to m = 5 or 7. . . ) and observe, when
extended to R, how they approximate a discontinuous periodic function.
9. (Hard) Much of Theorem 2.18 remains true, with suitable modifications, even if β is an infinite
set. Restate and prove as much as you can, and identify the false part(s).
2.3 The Adjoint of a Linear Operator
Recall how the standard inner product on Fⁿ may be written in terms of the conjugate-transpose:

⟨x, y⟩ = y^*x = \bar{y}^Tx
We start by inserting a matrix into this expression and interpreting in two different ways. Suppose
A ∈ Mm×n (F), v ∈ Fn and w ∈ Fm , then
\underbrace{⟨A^*w, v⟩}_{\text{in } F^n} = v^*(A^*w) = (v^*A^*)w = (Av)^*w = \underbrace{⟨w, Av⟩}_{\text{in } F^m} \qquad (†)
Example 2.25. As a sanity check, let A = \begin{pmatrix} 1 & 2 \\ 0 & 3 \end{pmatrix} ∈ M_2(R), w = \begin{pmatrix} x \\ y \end{pmatrix} and v = \begin{pmatrix} p \\ q \end{pmatrix}. Then,

\left\langle A^T\begin{pmatrix} x \\ y \end{pmatrix}, \begin{pmatrix} p \\ q \end{pmatrix}\right\rangle = \left\langle \begin{pmatrix} 1 & 0 \\ 2 & 3 \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix}, \begin{pmatrix} p \\ q \end{pmatrix}\right\rangle = \left\langle \begin{pmatrix} x \\ 2x + 3y \end{pmatrix}, \begin{pmatrix} p \\ q \end{pmatrix}\right\rangle = xp + (2x + 3y)q

\left\langle \begin{pmatrix} x \\ y \end{pmatrix}, A\begin{pmatrix} p \\ q \end{pmatrix}\right\rangle = \left\langle \begin{pmatrix} x \\ y \end{pmatrix}, \begin{pmatrix} 1 & 2 \\ 0 & 3 \end{pmatrix}\begin{pmatrix} p \\ q \end{pmatrix}\right\rangle = \left\langle \begin{pmatrix} x \\ y \end{pmatrix}, \begin{pmatrix} p + 2q \\ 3q \end{pmatrix}\right\rangle = x(p + 2q) + 3yq
Note how the inner products are evaluated on different spaces. At the level of linear maps this is a
relationship between L A ∈ L(Fn , Fm ) and L A∗ ∈ L(Fm , Fn ), one that is easily generalizable.
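The identity (†) is easy to test numerically; a sketch with numpy using random complex matrices and vectors:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 2)) + 1j * rng.standard_normal((3, 2))  # A in M_{3x2}(C)
v = rng.standard_normal(2) + 1j * rng.standard_normal(2)            # v in C^2
w = rng.standard_normal(3) + 1j * rng.standard_normal(3)            # w in C^3

inner = lambda x, y: y.conj() @ x          # standard Hermitian product <x,y> = y* x

# <A* w, v> computed in C^2 equals <w, A v> computed in C^3
assert np.isclose(inner(A.conj().T @ w, v), inner(w, A @ v))
```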
Definition 2.26. Let T ∈ L(V, W ) where V, W are inner product spaces over the same field F. The
adjoint of T is a function T^* : W → V (read ‘T-star’) satisfying

∀v ∈ V, ∀w ∈ W, \quad ⟨T^*(w), v⟩ = ⟨w, T(v)⟩

Note that the first inner product is computed within V and the second within W.
The adjoint effectively extends the conjugate-transpose to linear maps. We now use the same notation
for three objects, so be careful!
• If A is a real or complex matrix, then A^* = \bar{A}^T is its conjugate-transpose.
• If T is a linear map, then T∗ is its adjoint.
• If V is a vector space, then V ∗ = L(V, F) is its dual space.
Thankfully the two notations line up nicely, as part 3 of our first result shows.
Theorem 2.27 (Basic Properties). 1. If an adjoint exists,8 then it is unique and linear.
3. Suppose V, W are finite-dimensional with orthonormal bases β, γ respectively. Then the matrix
of the adjoint of T ∈ L(V, W ) is the conjugate-transpose of the original: [T^*]^β_γ = ([T]^γ_β)^*.
8 Existence of adjoints is trickier, so we postpone this a little: see Corollary 2.34 and Exercise 12.
Proof. 1. (Uniqueness) Suppose T∗ and S∗ are adjoints of T. Then
Since this holds for all y, Lemma 2.5 part 4 says that ∀x, T∗ (x) = S∗ (x), whence T∗ = S∗ .
(Linearity) Simply translate across, use the linearity of T, and again appeal to Lemma 2.5:
∀z, ⟨T∗ (λx + y), z⟩ = ⟨λx + y, T(z)⟩ = λ ⟨x, T(z)⟩ + ⟨y, T(z)⟩ = λ ⟨T∗ (x), z⟩ + ⟨T∗ (y), z⟩
= ⟨λT∗ (x) + T∗ (y), z⟩
=⇒ T∗ (λx + y) = λT∗ (x) + T∗ (y)
[T^*]^β_γ = A^* = \begin{pmatrix} i & 2 \\ 1 & 1 + i \\ -3 & 4 − 2i \end{pmatrix}
As a sanity check, multiply out a few examples of ⟨ A∗ w, v⟩ = ⟨w, Av⟩; make sure you’re comfortable
with the fact that the left inner product is on C2 and the right on C3 !
The Theorem tells us that every linear map T ∈ L(V, W ) between finite-dimensional spaces has an
adjoint and moreover how to compute it:
1. Choose orthonormal bases (these exist by Corollary 2.21) and find the matrix [T]^γ_β.
2. Take the conjugate-transpose ([T]^γ_β)^* and translate back to find T^* ∈ L(W, V ).
The prospect of twice applying Gram–Schmidt and translating between linear maps and their ma-
trices is unappealing; calculating this way can quickly become an enormous mess! In practice, it is
often better to try a modified approach; see for instance part 2(b) of the next Example.
Examples 2.29. Let T = \frac{d}{dx} ∈ L( P_1 (R)) be the derivative operator; T( a + bx ) = b. We treat P_1 (R)
as an inner product space in two ways.
1. Equip P_1(R) with the inner product for which the standard basis ϵ = {1, x} is orthonormal. Then

[T]_ϵ = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix} ⟹ [T^*]_ϵ = \begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix} ⟹ T^*( a + bx ) = ax
2. Equip P_1(R) with the L² inner product ⟨f, g⟩ = \int_0^1 f(x)g(x)\,dx. As we saw in Example 2.23, the basis
β = {f_1, f_2} = {1, 2x − 1} is orthogonal with ||f_1|| = 1 and ||f_2|| = \frac{1}{\sqrt{3}}. We compute the
adjoint of T = \frac{d}{dx} in two different ways.
(a) The basis γ = {g_1, g_2} = \{f_1, \sqrt{3}f_2\} = \{1, \sqrt{3}(2x − 1)\} is orthonormal. Observe that

T(g_1) = 0, \quad T(g_2) = 2\sqrt{3} ⟹ [T]_γ = \begin{pmatrix} 0 & 2\sqrt{3} \\ 0 & 0 \end{pmatrix} ⟹ [T^*]_γ = \begin{pmatrix} 0 & 0 \\ 2\sqrt{3} & 0 \end{pmatrix}

⟹ T^*( a + bx ) = T^*\left(\left(a + \frac{b}{2}\right)g_1 + \frac{b}{2\sqrt{3}}g_2\right) = \left(a + \frac{b}{2}\right) \cdot 2\sqrt{3}\,g_2
  = 3(2a + b)(2x − 1)
(b) Use the orthogonal basis β and the projection formula (Theorem 2.18). With p(x) = a + bx,

T^*(p) = \frac{⟨T^*(p), f_1⟩}{||f_1||^2}f_1 + \frac{⟨T^*(p), f_2⟩}{||f_2||^2}f_2 = ⟨p, T(1)⟩ + ⟨p, T(2x − 1)⟩ \cdot 3(2x − 1)
  = ⟨p, 0⟩ + 3⟨p, 2⟩(2x − 1) = 3\left(\int_0^1 2(a + bx)\,dx\right)(2x − 1)
  = 3(2a + b)(2x − 1)
Note the advantages here: no square roots and no need to change basis at the end!
The calculations for the second example were much nastier, even though we were already in posses-
sion of an orthogonal basis. The crucial point is that the two examples produce different maps T∗ : the
adjoint depends on the inner product!
The Fundamental Subspaces Theorem
To every linear map are associated its range and nullspace. These interact nicely with the adjoint. . .
1. R(T∗ )⊥ = N (T)
The proof is left to Exercise 6. You’ve likely observed this with transposes of small matrices.
Example 2.33. g(p) := \int_0^1 p(x)\,dx is a linear map g : P_2(R) → R. Equip P_2(R) with the inner
product for which the standard basis {1, x, x²} is orthonormal. Then

g( a + bx + cx² ) = a + \frac{1}{2}b + \frac{1}{3}c = \left\langle a + bx + cx², 1 + \frac{1}{2}x + \frac{1}{3}x² \right\rangle
The idea of the proof is very simple: if g(x) = ⟨x, y⟩ then the nullspace of g must equal Span{y}⊥ . . .
Let u ∈ N ( g)⊥ be either of the two unit vectors and define, independently of u,
y := \overline{g(u)}\,u ∈ V
The uniqueness of y follows from the cancellation property (Lemma 2.5, part 4).
Due to the tight correspondence, the map is often decorated as gy . Riesz’s theorem indeed says that
y ↦ g_y is an isomorphism V ≅ V^*. While there are infinitely many isomorphisms between these
spaces, the inner product structure identifies a canonical or preferred choice.
Corollary 2.34. Every linear map on a finite-dimensional inner product space has an adjoint.
Note how only the domain is required to be finite-dimensional! Riesz’s Theorem and the Corollary
also apply to continuous linear operators on (infinite-dimensional) Hilbert spaces, though the proof
is a little trickier.
Proof. Let T ∈ L(V, W ) where dim V < ∞, and suppose z ∈ W is given. Simply define T∗ (z) := y
where y ∈ V is the unique vector in Riesz’s Theorem arising from the linear map
g : V → F, g ( x ) = ⟨T( x ), z ⟩
Exercises 2.3 1. For each inner product space V and linear operator T ∈ L(V ), evaluate T∗ on the
given vector.
(a) V = R² with the standard inner product, T\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} 2x + y \\ x − 3y \end{pmatrix} and x = \begin{pmatrix} 3 \\ 5 \end{pmatrix}

(c) V = P_1(R) with ⟨f, g⟩ = \int_0^1 f(t)g(t)\,dt, T(f) = f′ + 3f and f(t) = 4 − 2t
2. Suppose A = \begin{pmatrix} 1 & 1 \\ 4 & 3 \end{pmatrix} and consider the linear map T = L_A ∈ L(R²) where R² is equipped with

⟨x, y⟩ = 4x_1y_1 + x_2y_2

(a) Find the matrix of T with respect to the orthonormal basis β = {v_1, v_2} = \{\frac{1}{2}e_1, e_2\}.
(b) Find the adjoint T∗ and its matrix with respect to the standard basis ϵ = {e1 , e2 }.
(Hint: the answer isn’t A T !)
3. Extending Examples 2.29, find the adjoint of T = \frac{d}{dx} ∈ L( P_2 (R)) with respect to:
(a) The inner product where the standard basis ϵ = {1, x, x2 } is orthonormal.
(b) (Hard!) The L² inner product \int_0^1 f(x)g(x)\,dx.
4. Let T( f ) = f ′′ be a linear transformation of P2 (R) and let ϵ = {1, x, x2 } be the standard basis.
Find T∗ ( a + bx + cx2 ):
(a) With respect to the inner product where ϵ is orthonormal;
(b) With respect to the L² inner product ⟨f, g⟩ = \int_{-1}^{1} f(t)g(t)\,dt.
(Hint: {1, x, 3x2 − 1} is orthogonal)
5. Prove part 2 of Theorem 2.27.
7. For each inner product space V and linear transformation g : V → F, find a vector y ∈ V such
that g(x) = ⟨x, y⟩ for all x ∈ V.
(a) V = R³ with the standard inner product, and g\begin{pmatrix} x \\ y \\ z \end{pmatrix} = x − 2y + 4z

(b) V = C² with the standard inner product, and g\begin{pmatrix} z \\ w \end{pmatrix} = iz − 2w

(c) V = P_2(R) with the L² inner product ⟨f, h⟩ = \int_0^1 f(x)h(x)\,dx, and g(f) = f′(1)
8. (a) In the proof of Theorem 2.32, explain why y depends only on g (not u).
(b) In the proof of Corollary 2.34, check that g(x) := ⟨T(x), z⟩ is linear.
9. Let y, z ∈ V be fixed vectors and define T ∈ L(V ) by T(x) = ⟨x, y⟩ z. Show that T∗ exists and
find an explicit expression.
10. Suppose A ∈ Mm×n (F). Prove that A∗ A is diagonal if and only if the columns of A are orthog-
onal. What additionally would it mean if A∗ A = I?
(a) Prove that the eigenvalues of T∗ are the complex conjugates of those of T.
(Hint: relate the characteristic polynomial p∗ (t) = det(T∗ − tI) to that of T)
(b) Prove that T∗ is diagonalizable if and only if T is.
12. (Hard) We present two linear maps which do not have an adjoint!
(a) Since ϵ = {1, x, x2 , . . .} is a basis of P(R), we may define a linear map T ∈ L( P(R)) via
T( x n ) = 1 for all n; for instance
T(4 + 3x + 2x5 ) = 9
Let ⟨ , ⟩ be the inner product for which ϵ is orthonormal. If T∗ existed, show that
T^*(1) = \sum_{n=0}^{\infty} x^n
T\big((x_n)\big) = \left(\sum_{n=1}^{\infty} \frac{x_n}{n},\ 0, 0, 0, 0, 0, \ldots\right)
Find the adjoint T∗ . If V ≤ ℓ2 is the subspace whose elements have only finitely many
non-zero terms, show that the restriction TV does not have an adjoint.
2.4 Normal & Self-Adjoint Operators and the Spectral Theorem
We now come to the fundamental question of this chapter: for which linear operators T ∈ L(V ) can
we find an orthonormal eigenbasis? Many linear maps are, of course, not even diagonalizable, so in
general this is far too much to hope for! Let’s see what happens if such a basis exists. . .
If β is an orthonormal basis of eigenvectors of T, then [T]_β = diag(λ_1, . . . , λ_n) is diagonal, whence [T^*]_β = ([T]_β)^* = diag(\bar{λ}_1, . . . , \bar{λ}_n).
If V is a real inner product space, then these matrices are identical and so T^* = T. In the complex
case, we instead observe that the diagonal matrices [T]_β and [T^*]_β commute, whence TT^* = T^*T.
Definition 2.35. Suppose T is a linear operator on an inner product space V and assume T has an
adjoint. We say that T is,
Normal if TT∗ = T∗ T,
Self-adjoint if T∗ = T.
The definitions for square matrices over R and C are identical, where ∗ now denotes the conjugate-
transpose.
Example 2.36. The (non-symmetric) real matrix A = \begin{pmatrix} 2 & -1 \\ 1 & 2 \end{pmatrix} is normal but not self-adjoint:

AA^T = \begin{pmatrix} 2 & -1 \\ 1 & 2 \end{pmatrix}\begin{pmatrix} 2 & 1 \\ -1 & 2 \end{pmatrix} = \begin{pmatrix} 5 & 0 \\ 0 & 5 \end{pmatrix} = \begin{pmatrix} 2 & 1 \\ -1 & 2 \end{pmatrix}\begin{pmatrix} 2 & -1 \\ 1 & 2 \end{pmatrix} = A^TA
More generally, every non-zero skew-hermitian matrix A∗ = − A is normal but not self-adjoint:
A∗ = − A =⇒ AA∗ = − A2 = A∗ A
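These properties are easy to check numerically; a short numpy sketch using the matrix from Example 2.36:

```python
import numpy as np

A = np.array([[2., -1.],
              [1.,  2.]])

def is_self_adjoint(M):
    return np.allclose(M, M.conj().T)

def is_normal(M):
    return np.allclose(M @ M.conj().T, M.conj().T @ M)

print(is_normal(A), is_self_adjoint(A))      # True False
```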
We saw above that linear maps with an orthonormal eigenbasis are either self-adjoint or normal
depending whether the inner product space is real or complex. Amazingly, this provides a complete
characterisation of such maps!
Theorem 2.37 (Spectral Theorem, version 1). Let T be a linear operator on a finite-dimensional
inner product space V.
Examples 2.38. 1. We diagonalize the self-adjoint linear map T = L_A ∈ L(R²) where A = \begin{pmatrix} 6 & 3 \\ 3 & -2 \end{pmatrix}.
The basis β = {w_1, w_2} is orthonormal, with respect to which [T]_β = \begin{pmatrix} 7 & 0 \\ 0 & -3 \end{pmatrix} is diagonal.
2. The matrix A = \begin{pmatrix} 1 & 3 \\ 0 & -2 \end{pmatrix} is not normal:

AA^* = \begin{pmatrix} 1 & 3 \\ 0 & -2 \end{pmatrix}\begin{pmatrix} 1 & 0 \\ 3 & -2 \end{pmatrix} = \begin{pmatrix} 10 & -6 \\ -6 & 4 \end{pmatrix} ≠ \begin{pmatrix} 1 & 3 \\ 3 & 13 \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 3 & -2 \end{pmatrix}\begin{pmatrix} 1 & 3 \\ 0 & -2 \end{pmatrix} = A^*A

It is diagonalizable, indeed

γ = \left\{\begin{pmatrix} 1 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 \\ -1 \end{pmatrix}\right\} ⇝ [T]_γ = \begin{pmatrix} 1 & 0 \\ 0 & -2 \end{pmatrix}
3. Let A = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix} and consider T = L_A acting on both C² and R². Since T is normal but not
self-adjoint, we’ll see how the field really matters in the spectral theorem.
First the complex case: T ∈ L(C2 ) is normal and thus diagonalizable with respect to an or-
thonormal basis of eigenvectors. Here are the details.
Now for the real case: T ∈ L(R2 ) is not self-adjoint and thus should not be diagonalizable
with respect to an orthonormal basis of eigenvectors. Indeed this is trivial; the characteristic
polynomial has no roots in R and so there are no real eigenvalues! It is also clear geometrically:
T is rotation by 90° counter-clockwise around the origin, so it has no eigenvectors.
Proving the Spectral Theorem for Self-Adjoint Operators
It is irrelevant whether V is real or complex. The previous example demonstrates part 2; even when
V = C2 is a complex inner product space, the eigenvalues of a self-adjoint matrix are real.
3. This is trivial if V is complex since every characteristic polynomial splits over C. We therefore
assume V is real. Choose any orthonormal basis γ of V, let A = [T]γ ∈ Mn (R), and define
S := L A ∈ L(Cn ). Then;
We are now able to prove the spectral theorem for self-adjoint operators on a finite-dimensional inner
product space V. The argument applies regardless of whether V is real or complex.
Proving the Spectral Theorem for Normal Operators
What changes for normal operators on complex inner product spaces? Not much! Indeed the proof
is almost identical when T is merely normal.
• We don’t need parts 2 and 3 of Lemma 2.39: every linear operator on a finite-dimensional
complex inner product space has an eigenvalue and we no longer care whether eigenvalues are
real.
– W being T-invariant: This isn’t quite as simple as (∗), but thankfully part 3 of the next
result provides the needed correction.
– TW being normal: We need a replacement for part 1 of Lemma 2.39; this is a little more
involved.
Rather than write out all the details, we leave this to Exercises 6 and 7.
For completeness, and as an analogue/extension of Lemma 2.39, we summarize some of the basic
properties of normal operators. These also apply to self-adjoint operators as a special case.
3. T(x) = λx ⟺ T^*(x) = \bar{λ}x, so that T and T^* have the same eigenvectors and conjugate
eigenvalues. This recovers the previously established fact that λ ∈ R if T is self-adjoint.
Proof. 1. ||T(x)||2 = ⟨T(x), T(x)⟩ = ⟨T∗ T(x), x⟩ = ⟨TT∗ (x), x⟩ = ⟨T∗ (x), T∗ (x)⟩ = ||T∗ (x)||2 .
2. ⟨x, (T − tI)(y)⟩ = ⟨x, T(y)⟩ − \bar{t}⟨x, y⟩ = ⟨T^*(x), y⟩ − ⟨\bar{t}x, y⟩ = ⟨(T^* − \bar{t}I)(x), y⟩ shows that T −
tI has adjoint T^* − \bar{t}I. It is trivial to check that these commute.

3. T(x) = λx ⟺ ||(T − λI)(x)|| = 0 ⟺ ||(T^* − \bar{λ}I)(x)|| = 0 ⟺ T^*(x) = \bar{λ}x, where the middle equivalence uses parts 1 and 2.
4. In part this follows from the spectral theorem, but we can also prove more straightforwardly.
Suppose T(x) = λx and T(y) = µy where λ ̸= µ. By part 3,
Schur’s Lemma
It is reasonable to ask how useful an orthonormal basis can be in general. Here is one answer.
Lemma 2.41 (Schur). Suppose T is a linear operator on a finite-dimensional inner product space V.
If the characteristic polynomial of T splits, then there exists an orthonormal basis β of V such that
[T] β is upper-triangular.
The spectral theorem is a special case; since the proof is similar, we leave it to the exercises.
The conclusion of Schur’s lemma is weaker than the spectral theorem, though it applies to more
operators: indeed if V is complex, it applies to any T! Every example of the spectral theorem is also
an example of Schur’s lemma. Example 2.38.2 provides another, since the matrix A is already upper
triangular with respect to the standard orthonormal basis. Here is another example.
Example 2.42. Consider T(f) = 2f′(x) + xf(1) as a linear map T ∈ L(P_1(R)) with respect to the
L² inner product ⟨f, g⟩ = \int_0^1 f(t)g(t)\,dt. We have

T( a + bx ) = 2b + ( a + b) x

so f_1 := 1 + x is an eigenvector: T(f_1) = 2f_1. A vector orthogonal to f_1 is

1 − \frac{⟨1, 1 + x⟩}{||1 + x||^2}(1 + x) = 1 − \frac{1 + \frac{1}{2}}{1 + 1 + \frac{1}{3}}(1 + x) = \frac{1}{14}(5 − 9x) ⟹ f_2 = 5 − 9x

T(f_2) = −18 − 4x = −13(1 + x) − (5 − 9x) = −13f_1 − f_2 ⟹ [T]_{\{f_1, f_2\}} = \begin{pmatrix} 2 & -13 \\ 0 & -1 \end{pmatrix}
We can also work with the corresponding orthonormal basis as posited in the theorem, though the
matrix is messier:
β = {g_1, g_2} = \left\{\sqrt{\tfrac{3}{7}}(1 + x), \tfrac{1}{\sqrt{7}}(5 − 9x)\right\} ⟹ [T]_β = \begin{pmatrix} 2 & -\frac{13}{\sqrt{3}} \\ 0 & -1 \end{pmatrix}
Alternatively, we could have started with the other eigenvector h1 = 2 − x: an orthogonal vector to
this is h2 = 4 − 9x, with respect to which
[T]_{\{h_1, h_2\}} = \begin{pmatrix} -1 & -13 \\ 0 & 2 \end{pmatrix}
In both cases the eigenvalues are down the diagonal, as must be for an upper-triangular matrix.
In general, it is difficult to quickly find a suitable basis satisfying Schur’s lemma. After trying the
proof in the exercises, you should be able to describe a method, though it is impractically slow!
Exercises 2.4 1. For each linear operator T on an inner product space V, decide whether T is nor-
mal, self-adjoint, or neither. If the spectral theorem permits, find an orthonormal eigenbasis.
3. Suppose S, T are self-adjoint operators on an inner product space V. Prove that ST is self-adjoint
if and only if ST = TS.
(Hint: recall Theorem 2.27)
4. Let T be normal on a finite-dimensional inner product space V. Prove that N (T∗ ) = N (T) and
that R(T∗ ) = R(T).
(Hint: Use Lemma 2.40 and the Fundamental Subspaces Theorem 2.30)
6. Let W be a T-invariant subspace of an inner product space V and let TW ∈ L(W ) be the restric-
tion of T to W. Prove:
(a) W ⊥ is T∗ -invariant.
(b) If W is both T- and T∗ -invariant, then (TW )∗ = (T∗ )W .
(c) If W is both T- and T∗ -invariant and T is normal, then TW is normal.
7. Use the previous question to complete the proof of the spectral theorem for a normal operator
on a finite-dimensional complex inner product space.
8. (a) Suppose S is a normal operator on a finite-dimensional complex inner product space, all of
whose eigenvalues are real. Prove that S is self-adjoint.
(b) Let T be a normal operator on a finite-dimensional real inner product space V whose char-
acteristic polynomial splits. Prove that T is self-adjoint and that there exists an orthonor-
mal basis of V of eigenvectors of T.
(Hint: Mimic the proof of Lemma 2.39 part 3 and use part (a))
9. Prove Schur’s lemma by induction, similarly to the proof of the spectral theorem.
(Hint: T∗ has an eigenvector x; why? Now show that W = Span{x}⊥ is T-invariant. . . )
2.5 Unitary and Orthogonal Operators and their Matrices
In this section we focus on length-preserving transformations of an inner product space.
Definition 2.43. A linear9 isometry of an inner product space V is a linear map T satisfying
∀x ∈ V, ||T(x)|| = ||x||
\left|\left|T\begin{pmatrix} x \\ y \end{pmatrix}\right|\right|^2 = \left|\left|\frac{1}{5}\begin{pmatrix} 4x − 3y \\ 3x + 4y \end{pmatrix}\right|\right|^2 = \frac{1}{25}\left((4x − 3y)^2 + (3x + 4y)^2\right) = x^2 + y^2 = \left|\left|\begin{pmatrix} x \\ y \end{pmatrix}\right|\right|^2
This matrix is very special in that its inverse equals its transpose:
A^{-1} = \frac{1}{\frac{16}{25} + \frac{9}{25}} \cdot \frac{1}{5}\begin{pmatrix} 4 & 3 \\ -3 & 4 \end{pmatrix} = \frac{1}{5}\begin{pmatrix} 4 & 3 \\ -3 & 4 \end{pmatrix} = A^T
We call such matrices orthogonal. The simple version of what follows is that every linear isometry on
Rn is multiplication by an orthogonal matrix.
Definition 2.45. A unitary operator T on an inner product space V is an invertible linear map satis-
fying T∗ T = I = TT∗ . A unitary matrix is a (real or complex) matrix satisfying A∗ A = I.
If V is real, we usually call these orthogonal operators/matrices; this isn’t necessary, since unitary en-
compasses both real and complex spaces. An orthogonal matrix satisfies A T A = I.
Example 2.46. The matrix A = \frac{1}{3}\begin{pmatrix} i & 2 + 2i \\ 2 − 2i & i \end{pmatrix} is unitary:
In infinite dimensions, we need T∗ to be both the left- and right-inverse of T. This isn’t an empty
requirement (see Exercise 13).
9 There also exist non-linear isometries: for instance translations (T( x ) = x + a for any constant a) and complex conjugation
(T(x) = x) on Cn . Together with linear isometries, these essentially comprise all isometries in finite dimensions.
We now tackle the correspondence between unitary operators and isometries.
The finite-dimensional restriction is important in part 2: we use the existence of adjoints, the spectral
theorem, and that a left-inverse is also a right-inverse. See Exercise 13 for an example of a non-unitary
isometry in infinite dimensions.
The proof shows a little more:
Corollary 2.48. On a finite-dimensional space, being unitary is equivalent to each of the following:
While (a) is simply (†), claims (b) and (c) are also worth proving explicitly: see Exercise 9. If β is the
standard orthonormal basis of Fn and T = L A , then the columns of A form the orthonormal set T( β).
This makes identifying unitary/orthogonal matrices easy:
Corollary 2.49. A matrix A ∈ Mn (R) is orthogonal if and only if its columns form an orthonormal
basis of Rn with respect to the standard (dot) inner product.
A matrix A ∈ Mn (C) is unitary if and only if its columns form an orthonormal basis of Cn with
respect to the standard (Hermitian) inner product.
¹⁰In particular, in a real inner product space isometries also preserve the angle θ between vectors since cos θ = \frac{⟨x, y⟩}{||x|| ||y||}.
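In matrix terms, Corollary 2.49 simply says A*A = I; a numpy sketch using the unitary matrix from Example 2.46:

```python
import numpy as np

A = np.array([[1j,      2 + 2j],
              [2 - 2j,  1j    ]]) / 3

# Columns are orthonormal for the Hermitian product iff A* A = I
assert np.allclose(A.conj().T @ A, np.eye(2))
```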
Examples 2.50. 1. The matrix A_θ = \begin{pmatrix} \cos θ & -\sin θ \\ \sin θ & \cos θ \end{pmatrix} ∈ M_2(R) is orthogonal for any θ. Example 2.44 is
this with θ = \tan^{-1}\frac{3}{4} = \sin^{-1}\frac{3}{5} = \cos^{-1}\frac{4}{5}. More generally (Exercise 6), it can be seen that every
real orthogonal 2 × 2 matrix has the form A_θ or

B_θ = \begin{pmatrix} \cos θ & \sin θ \\ \sin θ & -\cos θ \end{pmatrix}

for some angle θ. The effect of L_{A_θ} is to rotate counter-clockwise by θ, while that of L_{B_θ} is to
reflect across the line making angle \frac{1}{2}θ with the positive x-axis.
2. A = \frac{1}{\sqrt{6}}\begin{pmatrix} \sqrt{2} & \sqrt{3} & 1 \\ \sqrt{2} & 0 & -2 \\ -\sqrt{2} & \sqrt{3} & -1 \end{pmatrix} ∈ M_3(R) is orthogonal: check the columns!
3. A = \frac{1}{\sqrt{2}}\begin{pmatrix} 1 & i \\ i & 1 \end{pmatrix} ∈ M_2(C) is unitary: indeed it maps the standard basis to the orthonormal basis

T(β) = \left\{\frac{1}{\sqrt{2}}\begin{pmatrix} 1 \\ i \end{pmatrix}, \frac{1}{\sqrt{2}}\begin{pmatrix} i \\ 1 \end{pmatrix}\right\}
It is also easy to check that the characteristic polynomial is

p(t) = \det\begin{pmatrix} \frac{1}{\sqrt{2}} − t & \frac{i}{\sqrt{2}} \\ \frac{i}{\sqrt{2}} & \frac{1}{\sqrt{2}} − t \end{pmatrix} = \left(t − \frac{1}{\sqrt{2}}\right)^2 + \frac{1}{2} ⟹ t = \frac{1}{\sqrt{2}}(1 ± i) = e^{±πi/4}
\left\langle e^{ix}f(x), g(x) \right\rangle = \frac{1}{2π}\int_{-π}^{π} e^{ix}f(x)\overline{g(x)}\,dx = \frac{1}{2π}\int_{-π}^{π} f(x)\overline{e^{-ix}g(x)}\,dx = \left\langle f(x), e^{-ix}g(x) \right\rangle
¹¹An infinite orthonormal set β = {f_k : k ∈ Z} can be found so that every function f ‘equals’ an infinite series in the
sense that ||f − ∑ a_kf_k|| = 0. Since these are not finite sums, β isn’t strictly a basis, though it isn’t uncommon for it to be
so described. Moreover, given that the norm is defined by an integral, this also isn’t a claim that f and ∑ ak f k are equal
as functions. Indeed the infinite series need not be continuous! For these reasons, when working with Fourier series, one
tends to consider a broader class than the continuous functions.
Unitary and Orthogonal Equivalence
Suppose A ∈ Mn (R) is symmetric (self-adjoint) A T = A. By the spectral theorem, A has an orthonor-
mal eigenbasis β = {w1 , . . . , wn }: Aw j = λ j w j . Arranging the eigenbasis as the columns of a matrix,
we see that the columns of U = (w1 · · · wn ) are orthonormal and so U is an orthogonal matrix. We
can therefore write
A = UDU^{-1} = U\begin{pmatrix} λ_1 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & λ_n \end{pmatrix}U^T
A similar approach works if A ∈ Mn (C) is normal: we now have A = UDU ∗ where U is unitary.
polynomial is p(t) = (2 − t)(2i − t), with orthonormal eigenvectors

w_2 = \frac{1}{\sqrt{2}}\begin{pmatrix} 1 \\ -i \end{pmatrix}, \qquad w_{2i} = \frac{1}{\sqrt{2}}\begin{pmatrix} 1 \\ i \end{pmatrix}

We conclude that

A = \frac{1}{\sqrt{2}}\begin{pmatrix} 1 & 1 \\ -i & i \end{pmatrix}\begin{pmatrix} 2 & 0 \\ 0 & 2i \end{pmatrix}\left[\frac{1}{\sqrt{2}}\begin{pmatrix} 1 & 1 \\ -i & i \end{pmatrix}\right]^{-1} = \begin{pmatrix} 1 & 1 \\ -i & i \end{pmatrix}\begin{pmatrix} 1 & 0 \\ 0 & i \end{pmatrix}\begin{pmatrix} 1 & i \\ 1 & -i \end{pmatrix}
Definition 2.52. Square matrices A, B are unitarily equivalent if there exists a unitary matrix U such
that B = U ∗ AU. Orthogonal equivalence is similar: B = U T AU.
Theorem 2.53. A ∈ Mn (C) is normal if and only if it is unitarily equivalent to a diagonal matrix
(the matrix of its eigenvalues).
A ∈ Mn (R) is self-adjoint (symmetric) if and only if it is orthogonally equivalent to a diagonal matrix.
A T = (U T DU )T = U T D T U = U T DU = A
Exercises 2.5 1. For each matrix A find an orthogonal or unitary U and a diagonal D = U^*AU.

(a) \begin{pmatrix} 1 & 2 \\ 2 & 1 \end{pmatrix} \quad (b) \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix} \quad (c) \begin{pmatrix} 2 & 3 − 3i \\ 3 + 3i & 5 \end{pmatrix} \quad (d) \begin{pmatrix} 2 & 1 & 1 \\ 1 & 2 & 1 \\ 1 & 1 & 2 \end{pmatrix}
2. Which of the following pairs are unitarily/orthogonally equivalent? Explain your answers.
(a) A = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} and B = \begin{pmatrix} 0 & 2 \\ 2 & 0 \end{pmatrix} \quad (b) A = \begin{pmatrix} 0 & -1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix} and B = \begin{pmatrix} 2 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & 0 \end{pmatrix}

(c) A = \begin{pmatrix} 0 & -1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix} and B = \begin{pmatrix} 1 & 0 & 0 \\ 0 & i & 0 \\ 0 & 0 & -i \end{pmatrix}
3. Let a, b ∈ C be such that |a|² + |b|² = 1. Prove that every 2 × 2 matrix of the form \begin{pmatrix} a & -e^{iθ}\bar{b} \\ b & e^{iθ}\bar{a} \end{pmatrix} is
unitary. Are these all the unitary 2 × 2 matrices? Prove or disprove.
unitary. Are these all the unitary 2 × 2 matrices? Prove or disprove.
4. If A, B are orthogonal/unitary, prove that AB and A−1 are also orthogonal/unitary.
(This proves that orthogonal/unitary matrices are groups under matrix multiplication)
5. Check that A = \frac{1}{3}\begin{pmatrix} 4i & -5 \\ 5 & 4i \end{pmatrix} ∈ M_2(C) satisfies A^TA = I (it is a complex orthogonal matrix).
(These don’t have the same nice relationship with inner products, and are thus less useful to us)
6. Supply the details of Example 2.50.1.
(Hints: β = {i, j} is orthonormal, whence { Ai, Aj} must be orthonormal. Now draw pictures to
compute the result of rotating and reflecting the vectors i and j.)
7. Show that the linear map in Example 2.50.4 has no eigenvectors.
8. Prove that A ∈ Mn (C) has an orthonormal basis of eigenvectors whose eigenvalues have mod-
ulus 1, if and only if A is unitary.
9. Prove parts (b) and (c) of Corollary 2.48 for a finite-dimensional inner product space:
(a) If β is an orthonormal basis such that T( β) is orthonormal, then T is unitary.
(b) If T is unitary, and η is an orthonormal basis, then T(η ) is an orthonormal basis.
10. Let T be a linear operator on a finite-dimensional inner product space V. If ||T(x)|| = ||x|| for
all x in some orthonormal basis of V, must T be unitary? Prove or disprove.
11. Let T be a unitary operator on an inner product space V and let W be a finite-dimensional
T-invariant subspace of V. Prove:
(a) T(W ) = W (Hint: show that TW is injective);
(b) W⊥ is T-invariant.
12. Let W be a subspace of an inner product space V such that V = W ⊕ W^⊥. Define T ∈ L(V ) by
T(u + w) = u − w where u ∈ W and w ∈ W ⊥ . Prove that T is unitary and self-adjoint.
13. In the inner product space ℓ2 of square-summable sequences, consider the linear operator
T( x1 , x2 , . . .) = (0, x1 , x2 , . . .). Prove that T is an isometry and compute its adjoint. Check that T
is non-invertible and non-unitary.
14. Prove Schur’s Lemma for matrices. Every A ∈ Mn (R) is orthogonally equivalent and every
A ∈ Mn (C) is unitarily equivalent to an upper triangular matrix.
2.6 Orthogonal Projections
Recall the discussion of the Gram-Schmidt process, where we saw that any finite-dimensional sub-
space W of an inner product space V has an orthonormal basis βW = {w1 , . . . , wn }. In such a situa-
tion, we can define the orthogonal projections onto W and W ⊥ via
π_W : V → W : x ↦ \sum_{j=1}^{n} ⟨x, w_j⟩w_j, \qquad π_W^⊥ : V → W^⊥ : x ↦ x − π_W(x)
Our previous goal was to use orthonormal bases to ease computation. In this section we develop
projections more generally. First recall the notion of a direct sum within a vector space V:
Example 2.55. A = (1/5)[6 −2; 3 −1] is a projection matrix with R(A) = Span{(2, 1)ᵀ} and N(A) = Span{(1, 3)ᵀ}.
Indeed, it is straightforward to describe all projection matrices in M2(R). There are three cases: the zero matrix, the identity matrix, and the rank-one projections
A = (1/(ad − bc)) (a; b)(d  −c) = (1/(ad − bc)) [ad −ac; bd −bc]
(the projection onto Span{(a, b)ᵀ} along Span{(c, d)ᵀ}, where ad − bc ≠ 0).
It should be clear that every projection T has (at most) two eigenspaces:
• R(T) is an eigenspace with eigenvalue 1
• N (T) is an eigenspace with eigenvalue 0
If V is finite-dimensional and ρ, η are bases of R( T ), N ( T ) respectively, then the matrix of T with
respect to ρ ∪ η has block form
[T]_{ρ∪η} = [I 0; 0 0]
where rank I = rank T. In particular, every finite-dimensional projection is diagonalizable.
44
Lemma 2.56. T ∈ L(V ) is a projection if and only if T2 = T.
(⇐) Suppose T² = T. Note first that if r ∈ R(T), then r = T(v) for some v ∈ V, whence T(r) = T²(v) = T(v) = r (†).
Thus T is the identity on R(T). Moreover, if x ∈ R(T) ∩ N(T), (†) says that x = T(x) = 0, whence R(T) ∩ N(T) = {0}
and so R(T) ⊕ N(T) is a well-defined subspace of V.12 To finish things off, let v ∈ V and observe that T(v − T(v)) = T(v) − T²(v) = 0,
so that v = T(v) + (v − T(v)) is a decomposition into R(T)- and N(T)-parts. We conclude that V = R(T) ⊕ N(T): T is the projection onto R(T) along N(T).
Thus far the discussion hasn’t had anything to do with inner products. . .
Definition 2.57. An orthogonal projection is a projection T ∈ L(V) on an inner product space for which we additionally have N(T) = R(T)^⊥ (equivalently R(T) = N(T)^⊥).
In particular, the projection πW defined above is orthogonal: R(πW) = W and N(πW) = W^⊥.
The complementary orthogonal projection πW^⊥ = I − πW has R(πW^⊥) = W^⊥ and N(πW^⊥) = W.
Example (2.55 continued). The identity and zero matrices are both 2 × 2 orthogonal projection
matrices, while those of type 3 are orthogonal if (a, b)ᵀ · (c, d)ᵀ = 0: we obtain
A = (1/(a² + b²)) (a; b)(a  b) = (1/(a² + b²)) [a² ab; ab b²]
More generally, if W ≤ Fn has orthonormal basis {w1 , . . . , wk }, then the matrix of πW is ∑kj=1 w j w∗j .
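For readers who like to check such formulas numerically, here is a minimal numpy sketch; the spanning vectors below are an arbitrary illustration, not taken from the text.

    import numpy as np

    # Minimal check of pi_W = sum_j w_j w_j^* for an (arbitrarily chosen) W <= R^3.
    V = np.array([[1., 1.],
                  [2., 0.],
                  [1., -1.]])                  # columns span W
    W_on, _ = np.linalg.qr(V)                   # orthonormal basis {w_1, w_2} of W
    P = sum(np.outer(w, w) for w in W_on.T)     # pi_W = w_1 w_1^T + w_2 w_2^T

    print(np.allclose(P @ P, P))                # True: P is a projection
    print(np.allclose(P, P.T))                  # True: P is self-adjoint (cf. Theorem 2.58)
    print(np.allclose(P @ V, V))                # True: P fixes W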
12 In
finite dimensions, the rank–nullity theorem and dimension counting finishes the proof here without having to
proceed further:
45
Theorem 2.58. A projection T ∈ L(V ) is orthogonal if and only if it is self-adjoint T = T∗ .
Proof. (⇒) By assumption, R(T) and N(T) are orthogonal subspaces. Letting x, y ∈ V and using subscripts to denote R(T)- and N(T)-parts, we see that
⟨T(x), y⟩ = ⟨xR, yR + yN⟩ = ⟨xR, yR⟩ = ⟨xR + xN, yR⟩ = ⟨x, T(y)⟩
whence T = T*.
(⇐) Suppose T = T*. If r = T(v) ∈ R(T) and n ∈ N(T), then ⟨r, n⟩ = ⟨T(v), n⟩ = ⟨v, T(n)⟩ = 0, so N(T) ≤ R(T)^⊥. Since T is a projection already, we have V = R(T) ⊕ N(T) = R(T) ⊕ R(T)^⊥, from which
N(T) = R(T)^⊥ and R(T) = N(T)^⊥
Theorem 2.59 (Spectral Theorem, mk. II). Let V be a finite-dimensional complex/real inner prod-
uct space and T ∈ L(V ) be normal/self-adjoint with spectrum {λ1 , . . . , λk } and corresponding
eigenspaces E1 , . . . , Ek . Let π j ∈ L(V ) be the orthogonal projection onto Ej . Then:
1. V = E1 ⊕ · · · ⊕ Ek, where the eigenspaces are mutually orthogonal: Ej^⊥ = ⊕_{i≠j} Ei.
2. πi πj = 0 if i ≠ j.
3. (Resolution of the identity) I = π1 + · · · + πk
4. (Spectral decomposition) T = λ1 π1 + · · · + λk πk
Proof. 1. T is diagonalizable and so V is the direct sum of the eigenspaces of T. Since T is normal,
the eigenvectors corresponding to distinct eigenvalues are orthogonal, whence the eigenspaces
are mutually orthogonal. In particular, this says that
Êj := ⊕_{i≠j} Ei ≤ Ej^⊥
46
Examples 2.60. We verify the resolution of the identity and the spectral decomposition; for clarity,
we index projections and eigenspaces by eigenvalue rather than the natural numbers.
1. The symmetric matrix A = [10 2; 2 7] has spectrum {6, 11} and orthonormal eigenvectors
w6 = (1/√5)(1, −2)ᵀ,   w11 = (1/√5)(2, 1)ᵀ
The corresponding projections therefore have matrices
π6 = w6 w6ᵀ = (1/5)(1; −2)(1  −2) = (1/5)[1 −2; −2 4],   π11 = w11 w11ᵀ = (1/5)[4 2; 2 1]
from which the resolution of the identity and the spectral decomposition are readily verified:
π6 + π11 = [1 0; 0 1]   and   6π6 + 11π11 = (1/5)[6+44  −12+22; −12+22  24+11] = A
2. The normal matrix B = (1+i)[1 1; −1 1] has spectrum {2, 2i} and orthonormal eigenvectors
w2 = (1/√2)(i, 1)ᵀ,   w2i = (1/√2)(1, i)ᵀ
The orthogonal projection matrices are therefore
π2 = w2 w2* = (1/2)(i; 1)(−i  1) = (1/2)[1 i; −i 1],   π2i = w2i w2i* = (1/2)[1 −i; i 1]
from which
π2 + π2i = [1 0; 0 1]   and   2π2 + 2iπ2i = [1 i; −i 1] + [i 1; −1 i] = B
3. The matrix C = [0 1 1; 1 0 1; 1 1 0] has spectrum {−1, 2}, an orthonormal eigenbasis
{u, v, w} = { (1/√3)(1, 1, 1)ᵀ, (1/√2)(1, −1, 0)ᵀ, (1/√6)(1, 1, −2)ᵀ }
and eigenspaces E2 = Span{u} and E−1 = Span{v, w}. The orthogonal projections have matrices
π2 = uuᵀ = (1/3)(1; 1; 1)(1  1  1) = (1/3)[1 1 1; 1 1 1; 1 1 1]
π−1 = vvᵀ + wwᵀ = (1/2)[1 −1 0; −1 1 0; 0 0 0] + (1/6)[1 1 −2; 1 1 −2; −2 −2 4] = (1/3)[2 −1 −1; −1 2 −1; −1 −1 2]
It is now easy to check the resolution of the identity and the spectral decomposition:
π 2 + π −1 = I and 2π2 − π−1 = C
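A quick numerical sanity check of these identities, as a minimal numpy sketch using the matrix C above (the rounding is only to group numerically equal eigenvalues):

    import numpy as np

    C = np.array([[0., 1., 1.],
                  [1., 0., 1.],
                  [1., 1., 0.]])
    evals, Q = np.linalg.eigh(C)                 # columns of Q: orthonormal eigenvectors
    projections = {}
    for lam, q in zip(np.round(evals, 8), Q.T):  # group eigenvectors by eigenvalue
        projections[lam] = projections.get(lam, 0) + np.outer(q, q)

    print(np.allclose(sum(projections.values()), np.eye(3)))            # resolution of identity
    print(np.allclose(sum(l * P for l, P in projections.items()), C))   # spectral decomposition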
47
Orthogonal Projections and Minimization Problems
We finish this section with an important observation that drives much of the application of inner
product spaces to other parts of mathematics and beyond. Throughout this discussion, X and Y
denote inner product spaces.
Theorem 2.61. Suppose Y = W ⊕ W^⊥. For any y ∈ Y, the orthogonal projection πW(y) is the unique element of W which minimizes the distance to y:
∀w ∈ W,   ||y − πW(y)|| ≤ ||y − w||,   with equality if and only if w = πW(y)
Proof. Apply Pythagoras' Theorem: since πW^⊥(y) = y − πW(y) ∈ W^⊥ is orthogonal to πW(y) − w ∈ W,
||y − w||² = ||πW^⊥(y)||² + ||πW(y) − w||² ≥ ||y − πW(y)||²
with equality if and only if w = πW(y).
Examples 2.62. 1. We find the closest quadratic polynomial p ∈ P2(R) to e^x with respect to the L² inner product ⟨f, g⟩ = ∫_{−1}^{1} f(x)g(x) dx. Applying Gram–Schmidt to {1, x, x²} gives the orthonormal basis
{ 1/√2,  √(3/2) x,  √(5/8)(3x² − 1) }
from which
p(x) = (1/2)⟨1, e^x⟩ + (3/2)⟨x, e^x⟩ x + (5/8)⟨3x² − 1, e^x⟩ (3x² − 1)
     = (1/2) ∫_{−1}^{1} e^x dx + (3/2) x ∫_{−1}^{1} x e^x dx + (5/8)(3x² − 1) ∫_{−1}^{1} (3x² − 1) e^x dx
     = (1/2)(e − e⁻¹) + 3e⁻¹ x + (5/4)(e − 7e⁻¹)(3x² − 1)
     ≈ 1.18 + 1.10x + 0.179(3x² − 1) ≈ 1 + 1.1x + 0.537x²
The linear and quadratic approximations to y = e^x are drawn. Compare this with the Maclaurin polynomial e^x ≈ 1 + x + ½x² from calculus.
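If desired, the projection can also be checked numerically; the following numpy sketch (quadrature grid chosen arbitrarily) recovers the coefficients of the quadratic approximation:

    import numpy as np

    # Project e^x onto P_2(R) in the L^2 inner product on [-1, 1] and read off
    # the monomial coefficients; they should be roughly 1, 1.1 and 0.537.
    x = np.linspace(-1, 1, 200001)
    w = np.gradient(x)                                   # quadrature weights ~ dx
    ip = lambda f, g: np.sum(f * g * w)                  # <f, g> ~ integral over [-1, 1]

    b0 = np.full_like(x, 1/np.sqrt(2))                   # orthonormal Legendre basis
    b1 = np.sqrt(3/2) * x
    b2 = np.sqrt(5/8) * (3*x**2 - 1)
    f = np.exp(x)
    p = sum(ip(f, b) * b for b in (b0, b1, b2))          # orthogonal projection of e^x
    print(np.polyfit(x, p, 2)[::-1])                     # ~ [1.00, 1.10, 0.537]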
48
2. The nth Fourier approximation of a function f(x) is its orthogonal projection onto the finite-dimensional subspace Wn = Span{1, cos x, sin x, . . . , cos nx, sin nx}. According to the Theorem, this is the unique function Fn(x) ∈ Wn minimizing the integral
||f(x) − Fn(x)||² = ∫_{−π}^{π} |f(x) − Fn(x)|² dx
For example, if f(x) = { 1 if 0 < x ≤ π;  −1 if −π < x ≤ 0 } is extended periodically, then
F_{2n−1}(x) = (4/π) ∑_{j=1}^{n} sin((2j−1)x)/(2j−1) = (4/π)( sin x + (sin 3x)/3 + (sin 5x)/5 + · · · + (sin(2n−1)x)/(2n−1) )
y = f ( x ) and its eleventh Fourier approximation y = F11 ( x )
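Numerically one can watch the L² error shrink as more terms are included; a minimal numpy sketch (grid size chosen arbitrarily):

    import numpy as np

    x = np.linspace(-np.pi, np.pi, 100001)
    f = np.where(x > 0, 1.0, -1.0)                        # the step function above

    def F(n):                                             # (4/pi) * odd-harmonic sine sum
        ks = np.arange(1, n + 1, 2)
        return (4/np.pi) * sum(np.sin(k*x)/k for k in ks)

    for n in (1, 11, 101):
        err2 = np.sum((f - F(n))**2) * (x[1] - x[0])      # ~ ||f - F_n||^2
        print(n, err2)                                    # decreases as n grows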
Exercises 2.6 1. Compute the matrices of the orthogonal projections onto W viewed as subspaces
of the standard inner product spaces Rn or Cn .
(a) W = Span{(4, −1)ᵀ}    (b) W = Span{(1, 2, 1)ᵀ, (1, 0, −1)ᵀ}    (c) W = Span{(i, 1, 0)ᵀ, (1, i, 1)ᵀ}
(d) W = Span{(1, 1, 0)ᵀ, (1, 2, 1)ᵀ} (watch out, these vectors aren't orthogonal!)
49
2. For each of the following matrices, compute the projections onto each eigenspace, verify the
resolution of the identity and the spectral decomposition.
(a) [1 2; 2 1]    (b) [0 −1; 1 0]    (c) [2 3−3i; 3+3i 5]    (d) [2 1 1; 1 2 1; 1 1 2]
6. Let T be a normal operator on a finite-dimensional complex inner product space V. Use the
spectral decomposition T = λ1 π1 + · · · + λk πk to prove:
(a) If Tn is the zero map for some n ∈ N, then T is the zero map.
(b) U ∈ L(V ) commutes with T if and only if U commutes with each π j .
(c) There exists a normal U ∈ L(V ) such that U2 = T.
(d) T is invertible if and only if λ j ̸= 0 for all j.
(e) T is a projection if and only if every λ j = 0 or 1.
(f) T = −T∗ if and only if every λ j is imaginary.
(a) Verify that the general complex (eikx ) and real (cos kx, sin kx) expressions for the Fourier
approximation are correct.
(Hint: use Euler’s formula eikx = cos kx + i sin kx)
(b) Verify the explicit expression for F2n−1 ( x ) when f ( x ) is the given step-function. What is
F2n ( x ) in this case?
50
2.7 The Singular Value Decomposition and the Pseudoinverse
Given T ∈ L(V, W ) between finite-dimensional inner product spaces, the overarching concern of this
chapter is the existence and computation of bases β, γ of V, W with two properties:
• That β, γ be orthonormal, thus facilitating easy calculation within V, W;
γ
• That the matrix [T] β be as simple as possible.
We have already addressed two special cases:
Spectral Theorem When V = W and T is normal/self-adjoint, ∃ β = γ such that [T] β is diagonal.
Schur’s Lemma When V = W and p(t) splits, ∃ β = γ such that [T] β is upper triangular.
In this section we allow V ̸= W and β ̸= γ, and obtain a result that applies to any linear map between
finite-dimensional inner product spaces.
Example 2.63. Let A = [3 1; 2 −2; 1 3] and consider orthonormal bases β = {v1, v2} of R² and γ = {w1, w2, w3} of R³ respectively:
β = { (1/√2)(1, 1)ᵀ, (1/√2)(−1, 1)ᵀ },    γ = { (1/√2)(1, 0, 1)ᵀ, (1/√6)(−1, −2, 1)ᵀ, (1/√3)(1, −1, −1)ᵀ }
Since Av1 = 4w1 and Av2 = 2√3 w2, the matrix [LA]^γ_β = [4 0; 0 2√3; 0 0] is almost diagonal.
Theorem 2.64 (Singular Value Decomposition). Suppose V, W are finite-dimensional inner product spaces and that T ∈ L(V, W) has rank r. Then:
1. There exist orthonormal bases β = {v1, . . . , vn} of V and γ = {w1, . . . , wm} of W, and positive scalars σ1 ≥ · · · ≥ σr > 0, such that T(vj) = σj wj when j ≤ r and T(vj) = 0 otherwise.
2. Any such β is an eigenbasis of T*T, whence the scalars σj are uniquely determined by T: indeed
T*T(vj) = { σj² vj if j ≤ r; 0 otherwise }   and   T*(wj) = { σj vj if j ≤ r; 0 otherwise }
3. (Matrix version) Every rank-r matrix A ∈ M_{m×n}(F) factorizes as A = PΣQ*, where P ∈ Mm(F), Q ∈ Mn(F) are unitary and
Σ = [LA]^γ_β = [diag(σ1, . . . , σr)  O; O  O],   P = (w1 · · · wm),   Q = (v1 · · · vn)
51
Definition 2.65. The numbers σ1 , . . . , σr are the singular values of T. If T is not maximum rank, we
have additional zero singular values σr+1 = · · · = σmin(m,n) = 0.
Freedom of Choice While the singular values are uniquely determined, there is often significant free-
dom regarding the bases β and γ, particularly if any eigenspace of T∗ T has dimension ≥ 2.
Special Case (Spectral Theorem) If V = W and T is normal/self-adjoint, we may choose β to be an
eigenbasis of T, then σj is the modulus of the corresponding eigenvalue (see Exercise 7).
Rank-one decomposition If we write gj : V → F for the linear map gj : v ↦ ⟨v, vj⟩ (recall Riesz's Theorem), then the singular value decomposition says
T = ∑_{j=1}^{r} σj wj gj    that is    T(v) = ∑_{j=1}^{r} σj ⟨v, vj⟩ wj
Example (2.63 cont). We apply the method in the Theorem to A = [3 1; 2 −2; 1 3].
The symmetric(!) matrix AᵀA = [14 2; 2 14] has eigenvalues σ1² = 16, σ2² = 12 and orthonormal eigenvectors v1 = (1/√2)(1, 1)ᵀ, v2 = (1/√2)(−1, 1)ᵀ. The singular values are therefore σ1 = 4, σ2 = 2√3. Now compute
w1 = (1/σ1)Av1 = (1/(4√2)) A(1, 1)ᵀ = (1/√2)(1, 0, 1)ᵀ,    w2 = (1/σ2)Av2 = (1/(2√6)) A(−1, 1)ᵀ = (1/√6)(−1, −2, 1)ᵀ
and observe that these are orthonormal. Finally choose w3 = (1/√3)(1, −1, −1)ᵀ to complete the orthonormal basis γ of R³. A singular value decomposition is therefore
A = [3 1; 2 −2; 1 3] = PΣQ* = [1/√2 −1/√6 1/√3; 0 −2/√6 −1/√3; 1/√2 1/√6 −1/√3] · [4 0; 0 2√3; 0 0] · [1/√2 1/√2; −1/√2 1/√2]
13 Since β is orthonormal, it is common to write v∗j for the map g j = ⟨ , v j ⟩ in general contexts. To those familiar with
the dual space V ∗ = L(V, F), the set { g1 , . . . , gn } = {v1∗ , . . . , v∗n } is the dual basis to β. In this course v∗j will only ever mean
the conjugate-transpose of a column vector in Fn . This discussion is part of why physicists write inner products differently!
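The same decomposition can be confirmed with a computer; here is a minimal numpy sketch (numpy's singular vectors may differ from ours by sign):

    import numpy as np

    A = np.array([[3., 1.],
                  [2., -2.],
                  [1., 3.]])
    P, s, Qstar = np.linalg.svd(A)                 # A = P Sigma Q*
    print(s)                                       # [4.0, 3.4641...] = [4, 2*sqrt(3)]
    Sigma = np.zeros((3, 2)); Sigma[:2, :2] = np.diag(s)
    print(np.allclose(P @ Sigma @ Qstar, A))       # True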
52
Proof. 1. Since T∗ T is self-adjoint, the spectral theorem says it has an orthonormal basis of eigen-
vectors β = {v1 , . . . , vn }. If T∗ T(v j ) = λ j v j , then
⟨T(vj), T(vk)⟩ = ⟨T*T(vj), vk⟩ = λj ⟨vj, vk⟩ = { λj if j = k; 0 if j ≠ k }    (∗)
whence every eigenvalue is a non-negative real number: λj = ||T(vj)||² ≥ 0.
Since rank T∗ T = rank T = r (Exercise 8), exactly r eigenvalues are non-zero; by reordering
basis vectors if necessary, we may assume
λ1 ≥ · · · ≥ λr > 0
If j ≤ r, define σj := √λj > 0 and wj := (1/σj) T(vj); then the set {w1, . . . , wr} is orthonormal by (∗). Extend it (Gram–Schmidt) to an orthonormal basis γ = {w1, . . . , wm} of W; by construction T(vj) = σj wj for j ≤ r and T(vj) = 0 otherwise.
3. This is merely part 1 in the context of T = L A ∈ L(Fn , Fm ). The orthonormal bases β, γ consist
of column vectors and so the (change of co-ordinate) matrices P, Q are unitary.
Examples 2.66. 1. The matrix A = [2 3; 0 2] has AᵀA = [4 6; 6 13] with eigenvalues σ1² = 16 and σ2² = 1 and orthonormal eigenbasis
β = { (1/√5)(1, 2)ᵀ, (1/√5)(−2, 1)ᵀ }
The singular values are therefore σ1 = 4 and σ2 = 1, from which we obtain
γ = { (1/σ1)Av1, (1/σ2)Av2 } = { (1/√5)(2, 1)ᵀ, (1/√5)(−1, 2)ᵀ }
and the decomposition
A = PΣQ* = (1/√5)[2 −1; 1 2] · [4 0; 0 1] · (1/√5)[1 2; −2 1]
The rank-one decomposition is
A = σ1 w1 v1ᵀ + σ2 w2 v2ᵀ = (4/5)(2; 1)(1  2) + (1/5)(−1; 2)(−2  1) = (4/5)[2 4; 1 2] + (1/5)[2 −1; −4 2]
53
2. The decomposition can be very messy to find in non-matrix situations. Here is a classic example
where we simply observe the structure directly.
The L² inner product ⟨f, g⟩ = ∫₀¹ f(x)g(x) dx on P2(R) and P1(R) admits orthonormal bases
β = { √5(6x² − 6x + 1), √3(2x − 1), 1 },    γ = { √3(2x − 1), 1 }
Let T = d/dx be the derivative operator. The matrix of T is already in the required form!
[T]^γ_β = [2√15 0 0; 0 2√3 0]
thus β, γ are suitable bases and the singular values of T are σ1 = 2√15 and σ2 = 2√3.
Since β, γ are orthonormal, we could have used the adjoint method to evaluate this directly:
[T*T]_β = ([T]^γ_β)ᵀ [T]^γ_β = [60 0 0; 0 12 0; 0 0 0]
Up to sign, {[v1]β, [v2]β, [v3]β} is therefore forced to be the standard ordered basis of R³, confirming that β was the correct basis of P2(R) all along!
The Pseudoinverse
The singular value decomposition of a map T ∈ L(V, W ) gives rise to a natural map from W back to
V. This map behaves somewhat like an inverse even when the operator is non-invertible!
Definition 2.67. Given the singular value decomposition of a rank r map T ∈ L(V, W ), the pseu-
doinverse of T is the linear map T† ∈ L(W, V ) defined by
T†(wj) = { (1/σj) vj if j ≤ r; 0 otherwise }
so that
T†T(vj) = { vj if j ≤ r; 0 otherwise }    and    TT†(wj) = { wj if j ≤ r; 0 otherwise }
Schematically: V = Span{v1, . . . , vr} ⊕ Span{vr+1, . . . , vn}, where Span{v1, . . . , vr} = N(T)^⊥ = R(T†) and Span{vr+1, . . . , vn} = N(T) = R(T†)^⊥; similarly W = Span{w1, . . . , wr} ⊕ Span{wr+1, . . . , wm}, where Span{w1, . . . , wr} = R(T) = N(T†)^⊥ and Span{wr+1, . . . , wm} = R(T)^⊥ = N(T†). T and T† restrict to mutually inverse bijections between N(T)^⊥ and R(T).
Otherwise said, the compositions are orthogonal projections: T†T = π_{N(T)^⊥} and TT† = π_{R(T)}
54
Given the singular value decomposition of a matrix A = PΣQ*, its pseudoinverse is the matrix of (LA)†, namely
A† = ∑_{j=1}^{r} (1/σj) vj wj* = QΣ†P*    where    Σ† = [diag(σ1⁻¹, . . . , σr⁻¹)  O; O  O]
Examples 2.68. 1. Again continuing Example 2.63, A = [3 1; 2 −2; 1 3] has pseudoinverse
A† = (1/σ1) v1 w1* + (1/σ2) v2 w2*
   = (1/(4√2)) (1; 1) · (1/√2)(1  0  1) + (1/(2√3·√2)) (−1; 1) · (1/√6)(−1  −2  1)
   = (1/8)[1 0 1; 1 0 1] + (1/12)[1 2 −1; −1 −2 1] = (1/24)[5 4 1; 1 −4 5]
which is exactly what we would have found by computing A† = QΣ†P*. Observe that
A†A = [1 0; 0 1]    and    AA† = (1/3)[2 1 1; 1 2 −1; 1 −1 2]
are the orthogonal projection matrices onto the spaces N ( A)⊥ = Span{v1 , v2 } = R2 and
R( A) = Span{w1 , w2 } ≤ R3 respectively. Both spaces have dimension 2, since rank A = 2.
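Again this is easy to confirm numerically; a minimal numpy sketch:

    import numpy as np

    A = np.array([[3., 1.],
                  [2., -2.],
                  [1., 3.]])
    Adag = np.linalg.pinv(A)
    print(np.round(24 * Adag))                 # [[5, 4, 1], [1, -4, 5]]
    print(np.allclose(Adag @ A, np.eye(2)))    # True: A†A = projection onto N(A)^perp = R^2
    print(np.round(3 * (A @ Adag)))            # [[2, 1, 1], [1, 2, -1], [1, -1, 2]]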
2. The pseudoinverse of T = d/dx : P2(R) → P1(R), as seen in Example 2.66.2, maps
T†(√3(2x − 1)) = (1/(2√15)) √5(6x² − 6x + 1) = (1/(2√3))(6x² − 6x + 1)
T†(1) = (1/(2√3)) √3(2x − 1) = x − 1/2
⟹ T†(a + bx) = T†( (a + b/2) + (b/(2√3)) √3(2x − 1) )
             = (a + b/2)(x − 1/2) + (b/12)(6x² − 6x + 1)
             = (b/2)x² + ax − a/2 − b/6
The pseudoinverse of 'differentiation' therefore returns a particular choice of anti-derivative, namely the unique anti-derivative of a + bx lying in Span{√5(6x² − 6x + 1), √3(2x − 1)}.
Exercises 2.7 1. Find the ingredients β, γ and the singular values for each of the following:
(a) T ∈ L(R², R³) where T(x, y)ᵀ = (x, x + y, x − y)ᵀ
(b) T : P2(R) → P1(R) and T(f) = f″ where ⟨f, g⟩ := ∫₀¹ f(x)g(x) dx
(c) V = W = Span{1, sin x, cos x} and ⟨f, g⟩ = ∫₀^{2π} f(x)g(x) dx, with T(f) = f′ + 2f
55
2. Find a singular value decomposition of each of the matrices:
1 1 1 1 1
1 0 1
(a) 1 1 (b) (c) 1 −1 0
1 0 −1
−1 −1 1 0 −1
rem. What is γ here? What is it about the eigenvalues of A that make this possible?
(Even when T is self-adjoint, the vectors in β need not also be eigenvectors of T!)
8. In the proof of the singular value theorem we claimed that rank T∗ T = rank T. Verify this by
checking explicitly that N (T∗ T) = N (T).
(This is circular logic if you use the decomposition, so you must do without!)
9. Let V, W be finite-dimensional inner product spaces and T ∈ L(V, W ). Prove:
(a) T∗ TT† = T† TT∗ = T∗ .
(Hint: evaluate on the basis γ = {w1 , . . . , wm } in the singular value theorem)
(b) If T is injective, then T∗ T is invertible and T† = (T∗ T)−1 T∗ .
(c) If T is surjective, then TT∗ is invertible and T† = T∗ (TT∗ )−1 .
10. Consider the equation T(x) = b, where T is a linear map between finite-dimensional inner
product spaces. A least-squares solution is a vector x which minimizes ||T(x) − b||.
(a) Prove that x0 = T† (b) is a least-squares solution and that any other has the form x0 + n
for some n ∈ N (T).
(Hint: Theorem 2.61 says that x0 is a least-squares solution if and only if T(x0 ) = πR(T) (b))
(b) Prove that x0 = T† (b) has smaller norm than any other least-squares solution.
(c) If T is injective, prove that x0 = T† (b) is the unique least-squares solution.
11. Find the minimal norm solution to the first system, and the least-squares solution to the second:
3x + 2y + z = 9                3x + y = 1
x − 2y + 3z = 3                2x − 2y = 0
                               x + 3y = 0
56
Linear Regression (non-examinable)
Given a data set {(t j , y j ) : 1 ≤ j ≤ m}, we may employ the least-squares method to find a best-fitting
line y = c0 + c1 t; often used to predict y given a value of t.
The trick is to minimize the sum of the squares of the vertical deviations of the line from the data set.
∑_{j=1}^{m} (yj − c0 − c1 tj)² = ||y − Ax||²    where    A = [t1 1; ⋮ ⋮; tm 1],   x = (c1; c0),   y = (y1; ⋮; ym)
With the indicated notation, we recognize this as a least-squares problem. Indeed if there are at least
two distinct t-values in the data set, then rank A = 2 is maximal and we have a unique best-fitting
line with coefficients given by
(c1; c0) = A†y = (AᵀA)⁻¹Aᵀy
Example 2.69. Given the data set {(0, 1), (1, 1), (2, 0), (3, 2), (4, 2)}, we compute
A = [0 1; 1 1; 2 1; 3 1; 4 1],   y = (1, 1, 0, 2, 2)ᵀ
⟹ x0 = (c1; c0) = (AᵀA)⁻¹Aᵀy = [30 10; 10 5]⁻¹ (15; 6) = (3/10; 3/5)
The regression line therefore has equation y = (3/10)(t + 2).
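The same coefficients drop out of any least-squares routine; a minimal numpy sketch:

    import numpy as np

    t = np.array([0., 1., 2., 3., 4.])
    y = np.array([1., 1., 0., 2., 2.])
    A = np.column_stack([t, np.ones_like(t)])       # columns (t_j, 1)
    x0, *_ = np.linalg.lstsq(A, y, rcond=None)      # minimizes ||y - Ax||
    print(x0)                                       # [0.3, 0.6] = (c1, c0) = (3/10, 3/5)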
The process can be applied more generally to approximate using other functions. To find the best-
fitting quadratic polynomial y = c0 + c1 t + c2 t2 , we’d instead work with
A = [t1² t1 1; ⋮ ⋮ ⋮; tm² tm 1],   x = (c2; c1; c0),   y = (y1; ⋮; ym)   ⟹   ∑_{j=1}^{m} (yj − c0 − c1 tj − c2 tj²)² = ||y − Ax||²
Provided we have at least three distinct values t1, t2, t3, the matrix A is guaranteed to have rank 3 and there will be a best-fitting least-squares quadratic: for the data set of Example 2.69 this is
y = (1/70)(15t² − 39t + 72)
This curve and the best-fitting straight line are shown below.
57
Optional Problems Use a computer to invert any 3 × 3 matrices!
1. Check the calculation for the best-fitting least-squares quadratic in Example 2.69.
2. Find the best-fitting least-squares linear and quadratic approximations to the data set
{(1, 2), (3, 4), (5, 7), (7, 9), (9, 12)}
(a) Show that the equations AᵀA x0 = Aᵀy can be written in matrix form
[∑ tj²  ∑ tj; ∑ tj  m] (c; d) = (∑ tj yj; ∑ yj)
(b) Hence show that
c = Cov(t, y)/σt²   and   d = ȳ − c t̄
where
• t̄ = (1/m) ∑_{j=1}^{m} tj and ȳ = (1/m) ∑_{j=1}^{m} yj are the means (averages),
• σt² = (1/m) ∑_{j=1}^{m} (tj − t̄)² is the variance,
• Cov(t, y) = (1/m) ∑_{j=1}^{m} (tj − t̄)(yj − ȳ) is the covariance.
58
2.8 Bilinear and Quadratic Forms
In this section we slightly generalize the idea of an inner product. Throughout, V is a vector space
over a field F; it need not be an inner product space and F can be any field (not just R or C).
Examples 2.71. 1. If V is a real inner product space, then the inner product ⟨ , ⟩ is a symmetric
bilinear form. Note that a complex inner product is not bilinear!
2. On R², B(x, y) = xᵀ[1 2; 2 0]y = x1y1 + 2x1y2 + 2x2y1 defines a symmetric bilinear form, though not an inner product since it isn't positive definite; for example B(j, j) = 0.
Definition 2.72. Let B be a bilinear form on a finite-dimensional space with basis ϵ = {v1 , . . . , vn }.
The matrix of B with respect to ϵ is the matrix [ B]ϵ = A ∈ Mn (F) with ijth entry
Aij = B(vi , v j )
Given x, y ∈ V, compute their co-ordinate vectors [x]ϵ , [y]ϵ with respect to ϵ, then
B(x, y) = [x]ϵT A[y]ϵ
The set of bilinear forms on V is therefore in bijective correspondence with Mn (F). Moreover,
B(y, x) = [y]ϵᵀ A [x]ϵ = ([y]ϵᵀ A [x]ϵ)ᵀ = [x]ϵᵀ Aᵀ [y]ϵ
Finally, if β is another basis of V, then an appeal to the change of co-ordinate matrix Q^ϵ_β yields
B(x, y) = [x]ϵᵀ A [y]ϵ = (Q^ϵ_β [x]β)ᵀ A (Q^ϵ_β [y]β) = [x]βᵀ (Q^ϵ_β)ᵀ A Q^ϵ_β [y]β   ⟹   [B]β = (Q^ϵ_β)ᵀ [B]ϵ Q^ϵ_β
To summarize:
1. If A is the matrix of B with respect to some basis, then every other matrix of B has the form
Q T AQ for some invertible Q.
2. B is symmetric if and only if its matrix with respect to any (and all) bases is symmetric.
59
Examples 2.74. 1. Example 2.71.2 can be written
B(x, y) = xᵀ[1 2; 2 0]y = x1y1 + 2x1y2 + 2x2y1 = (x1 + 2x2)(y1 + 2y2) − 4x2y2
        = (x1 + 2x2  x2) [1 0; 0 −4] (y1 + 2y2; y2) = xᵀ [1 2; 0 1]ᵀ [1 0; 0 −4] [1 2; 0 1] y
If ϵ is the standard basis, then [B]β = [1 0; 0 −4] where Q^β_ϵ = [1 2; 0 1]. It follows that Q^ϵ_β = [1 −2; 0 1],
from which β = { (1, 0)ᵀ, (−2, 1)ᵀ } is a diagonalizing basis.
2. In general, we may perform a sequence of simultaneous row and column operations to diago-
(λ)
nalize any symmetric B; we require only elementary matrices Eij of type III.14 For instance:
1 −2 3
Q^ϵ_β = E12^(2) E13^(−3) E32^(1) = [1 2 0; 0 1 0; 0 0 1][1 0 −3; 0 1 0; 0 0 1][1 0 0; 0 1 0; 0 1 1] = [1 −1 −3; 0 1 0; 0 1 1]
from which β = { (1, 0, 0)ᵀ, (−1, 1, 1)ᵀ, (−3, 0, 1)ᵀ }. If you're having trouble believing this, invert the
change of co-ordinate matrix and check that
Warning! If F = R then every symmetric B may be diagonalized by an orthonormal basis (for the
usual dot product on Rn ). It is very unlikely that our algorithm will produce such! The algorithm has
two main advantages over the spectral theorem: it is typically faster and it applies to vector spaces
over any field. As a disadvantage, it is highly non-unique.
14 Recall that E_{ij}^{(λ)} is the identity matrix with an additional λ in the ijth entry.
• As a column operation (right-multiplication), A ↦ A E_{ij}^{(λ)} adds λ times the ith column to the jth.
• As a row operation (left-multiplication), A ↦ E_{ji}^{(λ)} A = (E_{ij}^{(λ)})ᵀ A adds λ times the ith row to the jth.
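For concreteness, here is a minimal numpy sketch of the algorithm, under the simplifying assumption that every diagonal pivot encountered is non-zero (the general algorithm sometimes needs a preliminary step, as with E32 in Example 2.74.2); it reproduces the first diagonalization of the next example.

    import numpy as np

    def congruence_diagonalize(A):
        """Diagonalize a symmetric matrix by simultaneous row/column operations,
        assuming each pivot A[i, i] is non-zero when reached."""
        A = np.array(A, dtype=float)
        n = A.shape[0]
        Q = np.eye(n)                        # product of the elementary matrices
        for i in range(n):
            for j in range(i + 1, n):
                lam = -A[i, j] / A[i, i]     # clear the (i, j) and (j, i) entries
                E = np.eye(n); E[i, j] = lam # elementary matrix E_{ij}^{(lam)}
                A = E.T @ A @ E              # simultaneous row and column operation
                Q = Q @ E
        return A, Q                          # Q^T (original A) Q = A (diagonal)

    D, Q = congruence_diagonalize([[1., 6.], [6., 3.]])
    print(D)     # diag(1, -33)
    print(Q)     # [[1, -6], [0, 1]]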
60
Example 2.75. We diagonalize B(x, y) = xᵀ[1 6; 6 3]y = x1y1 + 6x1y2 + 6x2y1 + 3x2y2 in three ways.
• [1 0; −6 1][1 6; 6 3][1 −6; 0 1] = [1 0; 0 −33] = [B]β where β = { (1, 0)ᵀ, (−6, 1)ᵀ }. This corresponds to
2. If B is symmetric and F does not have characteristic two (see aside), then B is diagonalizable.
T : V → F : v 7→ B(x, v)
Aside: Characteristic two fields This means 1 + 1 = 0 in F; this holds, for instance, in the field
Z2 = {0, 1} of remainders modulo 2. We now see the importance of char F ̸= 2 to the above result.
• The proof uses the existence of x ∈ V such that B(x, x) ≠ 0. If B is non-zero, ∃u, v such that B(u, v) ≠ 0. If both B(u, u) = 0 = B(v, v), then x = u + v does the job whenever char F ≠ 2:
B(x, x) = B(u, u) + 2B(u, v) + B(v, v) = 2B(u, v) ≠ 0
• To see that the requirement isn't idle, consider B(x, y) = xᵀ[0 1; 1 0]y on the finite vector space V = Z2². Perhaps surprisingly, the matrix of B is identical with respect to any basis of Z2², whence B is symmetric but non-diagonalizable.
61
In Example 2.75 notice how the three diagonal matrix representations have something in common: in each, exactly one diagonal entry is positive and one is negative. This is a general phenomenon:
Theorem 2.77 (Sylvester’s Law of Inertia). Suppose B is a symmetric bilinear form on a real vector
space V with diagonal matrix representation diag(λ1 , . . . , λn ). Then the number of entries λ j which
are positive/negative/zero is independent of the diagonal representation.
Definition 2.78. The signature of a symmetric bilinear form B is the triple (n+ , n− , n0 ) representing
how many positive, negative and zero terms are in any diagonal representation. Sylvester’s Law says
that the signature is an invariant of a symmetric bilinear form.
Positive-definiteness says that a real inner product on an n-dimensional space has signature (n, 0, 0).
Practitioners of relativity often work in Minkowski spacetime: R4 equipped with a signature (1, 3, 0)
bilinear form, typically
B(x, y) = xᵀ [c² 0 0 0; 0 −1 0 0; 0 0 −1 0; 0 0 0 −1] y = c²x1y1 − x2y2 − x3y3 − x4y4
where c is the speed of light. Vectors are time-, space-, or light-like depending on whether B(x, x) is positive, negative or zero. For instance x = 3c⁻¹e1 + 2e2 + 2e3 + e4 is light-like.
62
Quadratic Forms & Diagonalizing Conics
Given a symmetric bilinear form B on V, the associated quadratic form is
K : V → F : x ↦ B(x, x)
A function K : V → F is termed a quadratic form when such a symmetric bilinear form exists.
Examples 2.80. 1. If B is a real inner product, then K (v) = ⟨v, v⟩ = ||v||2 is the square of the norm.
2. Let dim V = n and A be the matrix of B with respect to a basis β. By the symmetry of A,
n
(
Aij if i = j
K (x) = x Ax = ∑ xi Aij x j = ∑ ãij xi x j where ãij =
T
i,j=1 1≤ i ≤ j ≤ n 2Aij if i ̸= j
E.g., K(x) = 3x1² + 4x2² − 2x1x2 corresponds to the bilinear form B(x, y) = xᵀ[3 −1; −1 4]y
As a fun application, we consider the diagonalization of conics in R2 . The general non-zero conic has
equation
K(x) = ax² + 2bxy + cy²   ↭   B(v, w) = vᵀ[a b; b c]w
λ1 t21 + λ2 t22 + µ1 t1 + µ2 t2 = η, λ1 , λ2 , µ1 , µ2 , η ∈ R
If λj ≠ 0, we may complete the square via the linear transformation sj = tj + μj/(2λj). The canonical
forms are then recovered:
Since B is symmetric, we may take {v1 , v2 } to be an orthonormal basis of R2 , whence any conic may be
put in canonical form by applying only a rotation/reflection and translation (completing the square).
Alternatively, we could diagonalize K using our earlier algorithm; this additionally permits shear
transforms. By Sylvester’s Law, the diagonal entries will have the same number of (+, −, 0) terms
regardless of the method, so the canonical form will be unchanged.
63
Examples 2.81. 1. We describe and plot the conic with equation 7x² + 24xy = 144.
The matrix of the associated bilinear form is [7 12; 12 0], which has orthonormal eigenbasis
β = {v1, v2} = { (1/5)(4, 3)ᵀ, (1/5)(−3, 4)ᵀ }
with eigenvalues (λ1, λ2) = (16, −9). In the rotated basis, this is the canonical hyperbola
16t1² − 9t2² = 144   ⟺   t1²/3² − t2²/4² = 1
which is easily plotted. In case this is too fast, use the change of co-ordinate matrix to compute directly:
Q^ϵ_β = (1/5)[4 −3; 3 4]   ⟹   (t1; t2) = [x]β = Q^β_ϵ (x; y) = (1/5)(4x + 3y; −3x + 4y)
K(x) = (√37 + 2)s1² − (√37 − 2)s2² = −33
however the calculation to find η is time-consuming and the expressions for s1, s2 are extremely ugly.
A similar approach can be applied to higher degree quadratic equations/manifolds: e.g. ellipsoids,
paraboloids and hyperboloids in R3 .
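As a quick check of the eigenvalue computation in Example 2.81.1, a minimal numpy sketch:

    import numpy as np

    B = np.array([[7., 12.],
                  [12., 0.]])
    evals, Q = np.linalg.eigh(B)     # orthonormal eigenbasis in the columns of Q
    print(evals)                     # [-9., 16.]
    # In rotated coordinates (t1, t2) = Q^T (x, y) the conic 7x^2 + 24xy = 144
    # becomes 16 t1^2 - 9 t2^2 = 144 (up to the ordering of the eigenvalues).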
Exercises 2.8 1. Prove that the sum of any two bilinear forms is bilinear, and that any scalar multi-
ple of a bilinear form is bilinear: thus the set of bilinear forms on V is a vector space.
(You can’t use matrices here, since V could be infinite-dimensional!)
2. Compute the matrix of the bilinear form
B(x, y) = x1y1 − 2x1y2 + x2y1 − x3y3
on R³ with respect to the basis β = { (1, 0, 1)ᵀ, (1, 0, −1)ᵀ, (0, 1, 0)ᵀ }.
64
3. Check that the function B( f , g) = f ′ (0) g′′ (0) is a bilinear form on the vector space of twice-
differentiable functions. Find the matrix of B with respect to β = {cos t, sin t, cos 2t, sin 2t}
when restricted to the subspace Span β.
4. For each matrix A, find a diagonal matrix D and an invertible matrix Q such that QᵀAQ = D.
(a) [1 3; 3 2]    (b) [3 1 2; 1 4 0; 2 0 −1]
6. If F does not have characteristic 2, and K (x) = B(x, x) is a quadratic form, prove that we can
recover the bilinear form B via
B(x, y) = (1/2)(K(x + y) − K(x) − K(y))
(a) x2 + y2 + xy = 6
(b) 35x2 + 120xy = 4x + 3y
11. Suppose that a non-empty, non-degenerate15 conic C in R2 has the form ax2 + 2bxy + cy2 + dx +
ey + f = 0, where at least one of a, b, c ̸= 0, and define ∆ = b2 − ac. Prove that:
15 The conic contains at least two points and cannot be factorized as a product of two straight lines: for example, the
65
3 Canonical Forms
3.1 Jordan Forms & Generalized Eigenvectors
Throughout this course we’ve concerned ourselves with variations of a general question: for a given
map T ∈ L(V ), find a basis β such that the matrix [T] β is as close to diagonal as possible. In this
chapter we see what is possible when T is non-diagonalizable.
Example 3.1. The matrix A = [−8 4; −25 12] ∈ M2(R) has characteristic equation
p(t) = (−8 − t)(12 − t) + 100 = t² − 4t + 4 = (t − 2)² = 0
so λ = 2 is the only eigenvalue, with eigenspace
E2 = N([−10 4; −25 10]) = Span{(2, 5)ᵀ}
Take v1 = (2, 5)ᵀ: with respect to any basis β = {v1, v2}, the matrix of LA is then upper triangular, which is better than nothing! How simple can we make this matrix? Let v2 = (x, y)ᵀ, then
Av2 = (−8x + 4y, −25x + 12y)ᵀ = 2(x, y)ᵀ + (−10x + 4y, −25x + 10y)ᵀ = 2v2 + (−5x + 2y)v1
⟹ [LA]β = [2  −5x + 2y; 0  2]
Since v2 cannot be parallel to v1, the only thing we cannot have is a diagonal matrix. The next best thing is for the upper right corner to be 1; for instance we could choose
β = {v1, v2} = { (2, 5)ᵀ, (1, 3)ᵀ }   ⟹   [LA]β = [2 1; 0 2]
A Jordan block with eigenvalue λ is a square matrix of the form
J = ( λ  1          )
    (    λ  ⋱       )
    (       ⋱   1   )
    (           λ   )
where all non-indicated entries are zero. Any 1 × 1 matrix is also a Jordan block.
A Jordan canonical form is a block-diagonal matrix diag( J1 , . . . , Jm ) where each Jk is a Jordan block.
A Jordan canonical basis for T ∈ L(V ) is a basis β of V such that [T] β is a Jordan canonical form.
If a map is diagonalizable, then any eigenbasis is Jordan canonical and the corresponding Jordan
canonical form is diagonal. What about more generally? Does every non-diagonalizable map have a
Jordan canonical basis? If so, how can we find such?
66
Example 3.3. It can easily be checked that β = {v1, v2, v3} = { (1, 0, 1)ᵀ, (1, 2, 0)ᵀ, (1, 1, 1)ᵀ } is a Jordan canonical basis for
A = [−1 2 3; −4 5 4; −2 1 4]
Indeed Av1 = 2v1, Av2 = 3v2 and Av3 = (4, 5, 3)ᵀ = v2 + 3v3, whence
[LA]β = [2 0 0; 0 3 1; 0 0 3]
Generalized Eigenvectors
Example 3.3 was easy to check, but how would we go about finding a suitable β if we were merely
given A? We brute-forced this in Example 3.1, but such is not a reasonable approach in general.
Eigenvectors get us some of the way:
The practical question is how to fill out a Jordan canonical basis once we have a maximal independent
set of eigenvectors. We now define the necessary objects.
A non-zero vector v ∈ V is a generalized eigenvector of T with eigenvalue λ if (T − λI)^k(v) = 0 for some k ∈ N; the generalized eigenspace Kλ consists of all such vectors together with 0.
As with eigenspaces, the generalized eigenspaces of A ∈ Mn (F) are those of the map L A ∈ L(Fn ).
It is easy to check that our earlier Jordan canonical bases consist of generalized eigenvectors.
Example 3.1: We have one eigenvalue λ = 2. Since (A − 2I)² = [0 0; 0 0] is the zero matrix, every vector lies in K2 = N(A − 2I)² = R².
Example 3.3: Here K3 = Span{v2, v3} and K2 = E2 = Span{v1}.
67
In order to easily compute generalized eigenspaces, it is useful to invoke the main result of this
section. We postpone the proof for a while due to its meatiness.
Theorem 3.5. Suppose that the characteristic polynomial of T ∈ L(V ) splits over F:
p ( t ) = ( λ 1 − t ) m1 · · · ( λ k − t ) m k
where the λ j are the distinct eigenvalues of T with algebraic multiplicities m j . Then:
Compare this with the statement on diagonalizability from the start of the course.
With regard to part 2; we shall eventually be able to choose this to be a Jordan canonical basis. In
conclusion: a map has a Jordan canonical basis if and only if its characteristic polynomial splits.
R3 = K2 ⊕ K3
2. We find the generalized eigenspaces of the matrix A = [5 2 −1; 0 0 0; 9 6 −1].
The characteristic polynomial is
p(t) = det(A − tI) = −t · det[5−t  −1; 9  −1−t] = −t(t² − 4t + 4) = (0 − t)(2 − t)²
• λ = 0 has multiplicity 1; indeed K0 = N(A − 0I)¹ = N(A) = Span{(1, −1, 3)ᵀ} is just the eigenspace E0.
• λ = 2 has multiplicity 2; indeed
A − 2I = [3 2 −1; 0 −2 0; 9 6 −3],   (A − 2I)² = [0 −4 0; 0 4 0; 0 −12 0]
whence K2 = N(A − 2I)² = Span{(1, 0, 0)ᵀ, (0, 0, 1)ᵀ}.
68
Properties of Generalized Eigenspaces and the Proof of Theorem 3.5
A lot of work is required to justify our main result. Feel free to skip the proofs at first reading.
2. Kλ is T-invariant.
(T − µI)(x) = T(x) − µx ∈ Kλ
whence Kλ is (T − µI)-invariant.
Suppose, for a contradiction, that T − µI is not injective on Kλ . Then
Let k ∈ N be minimal such that (T − λI)k (y) = 0 and let z = (T − λI)k−1 (y). Plainly
z ̸= 0, for otherwise k is not minimal. Moreover,
69
Now to prove Theorem 3.5: remember that the characteristic polynomial of T is assumed to split.
(Parts 1(b) and 2) We prove simultaneously by induction on the number of distinct eigenvalues of T.
(Base case) If T has only one eigenvalue, then p(t) = (λ − t)m . Another appeal to Cayley–
Hamilton says (T − λI)m (x) = 0 for all x ∈ V. Thus V = Kλ and dim Kλ = m.
(Induction step) Fix k and suppose the results hold for maps with k distinct eigenvalues. Let T
have distinct eigenvalues λ1 , . . . , λk , µ, with multiplicities m1 , . . . , mk , m respectively. Define16
W = R(T − µI)m
The subspace W has the following properties, the first two of which we leave as exercises:
• W is T-invariant.
• W ∩ Kµ = {0} so that µ is not an eigenvalue of the restriction TW .
• Each Kλ j ≤ W: since (T − µI)Kλ is an isomorphism (Lemma part 3), we can invert,
j
m
1
x ∈ Kλ j =⇒ x = (T − µI)m (T − µI)− Kλ (x) ∈ R(T − µI)m = W
j
Since W ∩ Kµ = {0} it is enough finally to use the rank–nullity theorem and count dimensions:
k
dim V = rank(T − µI)m + null(T − µI)m = dim W + dim Kµ = ∑ dim Kλ j
+ dim Kµ
j =1
(∗)
≤ m1 + · · · + mk + m = deg( p(t)) = dim V
The inequality is thus an equality; each dim Kλ j = m j and dim Kµ = m. We conclude that
V = K λ1 ⊕ · · · ⊕ K λ k ⊕ K µ
which completes the induction step and thus the proof. Whew!
16 This is yet another argument where we consider a suitable subspace to which we can apply an induction hypothesis;
recall the spectral theorem, Schur’s lemma, bilinear form diagonalization, etc. Theorem 3.12 will provide one more!
70
Cycles of Generalized Eigenvectors
By Theorem 3.5, for every linear map whose characteristic polynomial splits there exists generalized
eigenbasis. This isn’t the same as a Jordan canonical basis, but we’re very close!
Example 3.8. The matrix A = [5 1 0; 0 5 1; 0 0 5] ∈ M3(R) is a single Jordan block, whence there is a single generalized eigenspace K5 = R³ and the standard basis ϵ = {e1, e2, e3} is Jordan canonical.
The crucial observation for what follows is that one of these vectors e3 generates the others via re-
peated applications of A − 5I:
e2 = ( A − 5I )e3 , e1 = ( A − 5I )e2 = ( A − 5I )2 e3
A cycle of generalized eigenvectors is a set of the form βx = {(T − λI)^{k−1}(x), . . . , (T − λI)(x), x}, where the generator x ∈ Kλ is non-zero and k is minimal such that (T − λI)^k(x) = 0.
2. Span βx is T-invariant. With respect to βx, the matrix of the restriction of T is the k × k Jordan block [T_{Span βx}]_{βx} with λ on the diagonal and 1 on the superdiagonal.
In what follows, it will be useful to consider the linear map U = T − λI. Note the following:
• The nullspace of U is the eigenspace: N (U) = Eλ ≤ Kλ .
• T commutes with U: that is TU = UT.
• β x = {Uk−1 (x), . . . , U(x), x}; that is, Span β x = ⟨x⟩ is the U-cyclic subspace generated by x.
a0 Uk−1 (x) = 0 =⇒ a0 = 0
Now feed the same combination to Uk−2 , etc., to see that all coefficients a j = 0.
71
The basic approach to finding a Jordan canonical basis is to find the generalized eigenspaces and play
with cycles until you find a basis for each Kλ . Many choices of canonical basis exist for a given map!
We’ll consider a more systematic method in the next section.
Examples 3.11. 1. The characteristic polynomial of A = [1 0 2; 0 1 6; 6 −2 1] ∈ M3(R) splits:
p(t) = (1 − t) det[1−t  6; −2  1−t] + 2 det[0  1−t; 6  −2] = (1 − t)((1 − t)² + 12 − 12) = (1 − t)³
With only one eigenvalue we see that K1 = R³. Simply choose any vector in R³ and see what U = A − I does to it! For instance, with x = e1,
βx = { U²(e1), U(e1), e1 } = { (12, 36, 0)ᵀ, (0, 0, 6)ᵀ, (1, 0, 0)ᵀ }
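The cycle is quickly confirmed by machine; a minimal numpy sketch:

    import numpy as np

    A = np.array([[1., 0., 2.],
                  [0., 1., 6.],
                  [6., -2., 1.]])
    U = A - np.eye(3)
    x = np.array([1., 0., 0.])
    print(U @ x)           # [0, 0, 6]
    print(U @ U @ x)       # [12, 36, 0]
    print(U @ U @ U @ x)   # [0, 0, 0], consistent with (A - I)^3 = O and K_1 = R^3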
d
3. Let T = on P3 (R). With respect to the standard basis ϵ = {1, x, x2 , x3 },
dx
A = [T]ϵ = [0 1 0 0; 0 0 2 0; 0 0 0 3; 0 0 0 0]
72
Our final results state that this process works generally.
Theorem 3.12. Let T ∈ L(V ) have an eigenvalue λ. If dim Kλ < ∞, then there exists a basis
β λ = β x1 ∪ · · · ∪ β xn of Kλ consisting of finitely many linearly independent cycles.
Intuition suggests that we create cycles β x j by starting with a basis of the eigenspace Eλ and extending
backwards: for each x, if x = (T − λI)(y), then x ∈ β y ; now repeat until you have a maximum length
cycle. This is essentially what we do, though a sneaky induction is required to make sure we keep
track of everything and guarantee that the result really is a basis of Kλ .
(i) For the induction hypothesis, suppose every generalized eigenspace with dimension < m (for
any linear map!) has a basis consisting of independent cycles of generalized eigenvectors.
(ii) Define W = R(U) ∩ Eλ : that is
w ∈ W ⟺ U(w) = 0 and w = U(v) for some v ∈ Kλ
Let k = dim W, choose a complementary subspace X such that Eλ = W ⊕ X and select a basis
{xk+1 , . . . , xn } of X. If k = 0, the induction step is finished (why?). Otherwise we continue. . .
(iii) The calculation in the proof of Lemma 3.10 (take j = 1) shows that R(U) is T-invariant; it is
therefore the single generalized eigenspace K̃λ of TR(U) .
(iv) By the rank–nullity theorem,
By the induction hypothesis, R(U) has a basis of independent cycles. Since the last non-zero
element in each cycle is an eigenvector, this basis consists of k distinct cycles β x̂1 ∪ · · · ∪ β x̂k
whose terminal vectors form a basis of W.
(v) Since each x̂ j ∈ R(U), there exist vectors x1 , . . . , xk such that x̂ j = U(x j ). Including the length-
one cycles generated by the basis of X, the cycles β x1 , . . . , β xn now contain
vectors. We leave as an exercise the verification that these vectors are linearly independent.
Corollary 3.13. Suppose that the characteristic polynomial of T ∈ L(V ) splits (necessarily dim V <
∞). Then there exists a Jordan canonical basis, namely the union of bases β λ from Theorem 3.12.
Proof. By Theorem 3.5, V is the direct sum of generalized eigenspaces. By the previous result, each
Kλ has a basis β λ consisting of finitely many cycles. By Lemma 3.10, the matrix of TKλ has Jordan
canonical form with respect to β λ . It follows that β = β λ is a Jordan canonical basis for T.
S
73
Exercises 3.1 1. For each matrix, find the generalized eigenspaces Kλ , find bases consisting of
unions of disjoint cycles of generalized eigenvectors, and thus find a Jordan canonical form J
and invertible Q so that the matrix may be expressed as QJQ−1 .
(a) A = [1 1; −1 3]    (b) B = [1 2; 3 2]    (c) C = [11 −4 −5; 21 −8 −11; 3 −1 0]
(d) D = [2 1 0 0; 0 2 1 0; 0 0 3 0; 0 1 −1 3]
2. If β = {v1, . . . , vn} is a Jordan canonical basis, what can you say about v1? Briefly explain why the linear map LA ∈ L(R²) where A = [0 −1; 1 0] has no Jordan canonical form.
(a) In step (ii), suppose dim W = k = 0. Explain why {x1 , . . . , xn } is in fact a basis of Kλ , so
that the rest of the proof is unnecessary.
(b) In step (v), prove that the m vectors in the cycles β x1 , . . . , β xn are linearly independent.
(Hint: model your argument on part 1 of Lemma 3.10)
74
3.2 Cycle Patterns and the Dot Diagram
In this section we obtain a useful result that helps us compute Jordan forms more efficiently and
systematically. To give us some clues how to proceed, here is a lengthy example.
Example 3.14. Precisely three Jordan canonical forms A, B, C ∈ M3 (R) correspond to the charac-
teristic polynomial p(t) = (5 − t)3 :
A = [5 0 0; 0 5 0; 0 0 5]      B = [5 1 0; 0 5 0; 0 0 5]      C = [5 1 0; 0 5 1; 0 0 5]
In all three cases the standard basis β = {e1 , e2 , e3 } is Jordan canonical, so how do we distinguish
things? By considering the number and lengths of the cycles of generalized eigenvectors.
For A, every non-zero vector is an eigenvector and every cycle has length one:
βe1 = {e1}, βe2 = {e2}, βe3 = {e3}   ⟹   β = βe1 ∪ βe2 ∪ βe3 = {e1, e2, e3}
For B, a vector v = (a, b, c)ᵀ satisfies (B − 5I)v = (b, 0, 0)ᵀ and (B − 5I)²v = 0, so cycles have length at most two.
For C, a vector v = (a, b, c)ᵀ satisfies (C − 5I)v = (b, c, 0)ᵀ, (C − 5I)²v = (c, 0, 0)ᵀ, (C − 5I)³v = 0, and so
generates a cycle with maximum length three provided c ≠ 0. Indeed this cycle is a Jordan basis, so one cycle is all we need:
β = βe3 = { (C − 5I)²e3, (C − 5I)e3, e3 } = {e1, e2, e3}
Why is the example relevant? Suppose that dimR V = 3 and that T ∈ L(V ) has characteristic polyno-
mial p(t) = (5 − t)3 . Theorem 3.12 tells us that T has a Jordan canonical form, and that is is moreover
one of the above matrices A, B, C. Our goal is to develop a method whereby the pattern of cycle-
lengths can be determined, thus allowing us to be able to discern which Jordan form is correct. As a
side-effect, this will also demonstrate that the pattern of cycle lengths for a given T is independent of
the Jordan basis so that, up to some reasonable restriction, the Jordan form of T is unique. To aid us
in this endeavor, we require some terminology. . .
75
Definition 3.15. Let V be finite dimensional and Kλ a generalized eigenspace of T ∈ L(V ). Follow-
ing the Theorem 3.12, assume that β λ = β x1 ∪ · · · ∪ β xn is a Jordan canonical basis of TKλ , where the
cycles are arranged in non-increasing length. That is:
2. k1 ≥ k2 ≥ · · · ≥ k n
The dot diagram of TKλ is a representation of the elements of β λ , one dot for each vector: the jth column
represents the elements of β x j arranged vertically with x j at the bottom.
Given a linear map, our eventual goal is to identify the dot diagram as an intermediate step in the
computation of a Jordan basis. First, however, we observe how the conversion of dot diagrams to a
Jordan form is essentially trivial.
Example 3.16. Suppose dim V = 14 and that T ∈ L(V ) has the following eigenvalues and dot
diagrams:
λ1 = −4      λ2 = 7      λ3 = 12
• • • •      • •         • • •
• •          • •
             •
T has a Jordan canonical basis β with respect to which its Jordan canonical form is
[T]β = diag( [−4 1; 0 −4], [−4 1; 0 −4], (−4), (−4), [7 1 0; 0 7 1; 0 0 7], [7 1; 0 7], (12), (12), (12) )
Note how the sizes of the Jordan blocks are non-increasing within each eigenvalue. For instance, for
λ1 = −4, the sequence of cycle lengths (k j ) is 2 ≥ 2 ≥ 1 ≥ 1.
76
Theorem 3.17. Suppose β λ is a Jordan canonical basis of TKλ as described in Definition 3.15, and
suppose the ith row of the dot diagram has ri entries. Then:
1. For each r ∈ N, the vectors associated to the dots in the first r rows form a basis of N(T − λI)^r.
2. r1 = null(T − λI) = dim V − rank(T − λI)
3. When i > 1, ri = null(T − λI)^i − null(T − λI)^{i−1} = rank(T − λI)^{i−1} − rank(T − λI)^i
Example (3.14 cont). We describe the dot diagrams of the three matrices A, B, C, along with the
corresponding vectors in the Jordan canonical basis β and the values ri .
A: • • •    with (x1, x2, x3) = (e1, e2, e3)
Since A − 5I is the zero matrix, r1 = 3 − rank(A − 5I) = 3. The dot diagram has one row, corresponding to three independent cycles of length one: β = βe1 ∪ βe2 ∪ βe3.
B: • •    with ((B − 5I)x1, x2) = (e1, e3)
   •           x1 = e2
Row 1: B − 5I = [0 1 0; 0 0 0; 0 0 0] ⟹ rank(B − 5I) = 1 and r1 = 3 − 1 = 2. The first row {e1, e3} is a basis of E5 = N(B − 5I).
Row 2: (B − 5I)² is the zero matrix, whence r2 = rank(B − 5I) − rank(B − 5I)² = 1 − 0 = 1.
The dot diagram corresponds to β = βe2 ∪ βe3 = {e1, e2} ∪ {e3}.
C: •    (C − 5I)²x1 = e1
   •    (C − 5I)x1 = e2
   •    x1 = e3
Row 1: C − 5I = [0 1 0; 0 0 1; 0 0 0] ⟹ r1 = 3 − rank(C − 5I) = 1. The first row {e1} is a basis of E5 = N(C − 5I).
Row 2: (C − 5I)² = [0 0 1; 0 0 0; 0 0 0] ⟹ r2 = rank(C − 5I) − rank(C − 5I)² = 2 − 1 = 1. The first two rows {e1, e2} form a basis of N(C − 5I)².
Row 3: (C − 5I)³ is the zero matrix, whence r3 = rank(C − 5I)² − rank(C − 5I)³ = 1 − 0 = 1.
Proof. As previously, let U = T − λI.
1. Since each dot represents a basis vector U p (v j ), any v ∈ Kλ may be written uniquely as a linear
combination of the dots. Applying U simply moves all the dots up a row and all dots in the top
row to 0. It follows that v ∈ N (Ur ) ⇐⇒ it lies in the span of the first r rows. Since the dots
are linearly independent, they form a basis.
2. By part 1, r1 = dim N (U) = null(T − λI) = dim V − rank(T − λI).
3. More generally,
ri = (r1 + · · · + ri ) − (r1 + · · · + ri−1 ) = dim N (Ui ) − dim N (Ui−1 )
= null(Ui ) − null(Ui−1 ) = rank(T − λI)i−1 − rank(T − λI)i
77
Since the ranks of maps (T − λI)i are independent of basis, so also is the dot diagram. . .
Corollary 3.18. For any eigenvalue λ, the dot diagram is uniquely determined by T and λ. If we
list Jordan blocks for each eigenspace in non-increasing order, then the Jordan form of a linear map
is unique up to the order of the eigenvalues.
We now have a slightly more systematic method for finding Jordan canonical bases.
6 2 −4 −6
Example 3.19. The matrix A = 00 30 03 00 has characteristic equation
2 1 −2 −1
6−t −6
p ( t ) = (3 − t )2 = (2 − t)(3 − t)3
2 −1 − t
Since we now have three dots (equalling dim K3 ), the algorithm terminates and the dot diagram
for K3 is • •
•
For the single dot in the second row, we choose something in N(A − 3I)² which isn't an eigenvector; perhaps the simplest choice is x1 = e2, which yields the two-cycle
βx1 = { (A − 3I)x1, x1 } = { (2, 0, 0, 1)ᵀ, (0, 1, 0, 0)ᵀ }
To complete the first row, choose any eigenvector to complete the span: for instance x2 = (0, 2, 1, 0)ᵀ.
0
Other choices are available! For instance, if we'd chosen the two-cycle generated by x1 = e3, we'd obtain a different Jordan basis but the same canonical form J: for example
β̃ = { (3, 0, 0, 2)ᵀ, (−4, 0, 0, −2)ᵀ, (0, 0, 1, 0)ᵀ, (0, 2, 1, 0)ᵀ },    J = [2 0 0 0; 0 3 1 0; 0 0 3 0; 0 0 0 3]
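The dot diagram can also be read off by computing ranks, exactly as Theorem 3.17 prescribes; a minimal numpy sketch for this example:

    import numpy as np

    A = np.array([[6., 2., -4., -6.],
                  [0., 3., 0., 0.],
                  [0., 0., 3., 0.],
                  [2., 1., -2., -1.]])
    U = A - 3*np.eye(4)
    r1 = 4 - np.linalg.matrix_rank(U)                               # dots in row 1
    r2 = np.linalg.matrix_rank(U) - np.linalg.matrix_rank(U @ U)    # dots in row 2
    print(r1, r2)    # 2 1 : rows of size 2 and 1, as found above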
78
We do one final example for a non-matrix map.
Example 3.20. Let ϵ = {1, x, y, x², y², xy} and define T f(x, y) = 2 ∂f/∂x − ∂f/∂y as a linear operator on V = Span ϵ.
There is only one eigenvalue λ = 0 and therefore one generalized eigenspace K0 = V. We could keep
working with matrices, but it is easy to translate the nullspaces of the matrices back to subspaces of
V, from which the necessary data can be read off:
We now have five dots; since dim K0 = 6, the last row has one dot, and the dot diagram is
• • •
• •
•
Since the first two rows span N(T²), we may choose any f1 ∉ N(T²) for the final dot: f1 = xy is suitable, from which the first column of the dot diagram becomes
T²(xy) = −4        •   •
T(xy) = 2y − x     •
xy
Now choose the second dot on the second row to be anything in N(T²) such that the first two rows span N(T²): this time f2 = x² − 4y² is suitable, and the diagram becomes:
T²(xy) = −4        T(x² − 4y²) = 4x + 8y    •
T(xy) = 2y − x     x² − 4y²
xy
The final dot is now chosen so that the first row spans N (T): this time f 3 = x2 + 4y2 + 4xy works.
The result is a Jordan canonical basis and form for T:
β = { −4, 2y − x, xy, 4x + 8y, x² − 4y², x² + 4y² + 4xy },    J = [T]β = diag( [0 1 0; 0 0 1; 0 0 0], [0 1; 0 0], (0) )
As previously, many other choices of cycle-generators f 1 , f 2 , f 3 are available; while these result in
different Jordan canonical bases, Corollary 3.18 assures us that we’ll always obtain the same canonical
form J.
79
Exercises 3.2 1. Let T be a linear operator whose characteristic polynomial splits. Suppose the
eigenvalues and the dot diagrams for the generalized eigenspaces Kλi are as follows:
λ1 = 2 λ2 = 4 λ3 = −3
• • • • • • •
• • •
• •
J = diag( [2 1 0; 0 2 1; 0 0 2], [2 1; 0 2], (3), (3) )
3. For each matrix A find a Jordan canonical form and an invertible Q such that A = QJQ−1 .
(a) A = [−3 3 −2; −7 6 −3; 1 −1 2]    (b) A = [0 1 −1; −4 4 −2; −2 1 1]    (c) A = [0 −3 1 2; −2 1 −1 2; −2 1 −1 2; −2 −3 1 4]
4. For each linear operator T, find a Jordan canonical form J and basis β:
5. (Generalized Eigenvector Method for ODEs) Let A ∈ Mn(R) have an eigenvalue λ and suppose βv0 = {vk−1, . . . , v1, v0} is a cycle of generalized eigenvectors for this eigenvalue. Show that
x(t) := e^{λt} ∑_{j=0}^{k−1} bj(t) vj   satisfies   x′(t) = Ax   ⟺   b0′(t) = 0, and bj′(t) = b_{j−1}(t) when j ≥ 1
Hence find the general solution to the system
x′ = [3 1 0 0; 0 3 1 0; 0 0 3 0; 0 0 0 2] x
80
3.3 The Rational Canonical Form (non-examinable)
We finish the course with a very quick discussion of what can be done when the characteristic poly-
nomial of a linear map does not split. In such a situation, we may assume that
p(t) = (−1)ⁿ ϕ1(t)^{m1} · · · ϕk(t)^{mk}    (∗)
where each ϕj (t) is an irreducible monic polynomial over the field.
Example 3.21. The following matrix has characteristic equation p(t) = (t2 + 1)2 (3 − t)
A = [0 −1 0 0 0; 1 0 0 0 0; 0 0 0 −1 0; 0 0 1 0 0; 0 0 0 0 3] ∈ M5(R)
This doesn’t split over R since t2 + 1 = 0 has no real roots. It is, however, diagonalizable over C.
Definition 3.22. The monic polynomial tk + ak−1 tk−1 + · · · + a0 has companion matrix
C = [0 0 0 ⋯ 0 −a0; 1 0 0 ⋯ 0 −a1; 0 1 0 ⋯ 0 −a2; ⋮ ⋱ ⋮; 0 0 0 ⋯ 0 −a_{k−2}; 0 0 0 ⋯ 1 −a_{k−1}]    (when k = 1, this is the 1 × 1 matrix (−a0))
If T ∈ L(V ) has characteristic polynomial (∗), then a rational canonical basis is a basis for which
[T]β = diag(C1, C2, . . . , Cr) = [C1 O ⋯ O; O C2 ⋯ O; ⋮ ⋱ ⋮; O O ⋯ Cr]
where each Cj is a companion matrix of some (ϕj (t))s j where s j ≤ m j . We call [T] β a rational canonical
form of T.
Theorem 3.23. A rational canonical basis exists for any linear operator T on a finite-dimensional
vector space V. The canonical form is unique up to ordering of companion matrices.
Example (3.21 cont). The matrix A is already in rational canonical form: the standard basis is rational
canonical with three companion blocks,
C1 = C2 = [0 −1; 1 0],    C3 = (3)
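It is easy to generate companion matrices and check their characteristic polynomials by machine; a minimal numpy sketch (the helper function below is ours, not a standard numpy routine):

    import numpy as np

    def companion(coeffs):
        """Companion matrix of t^k + a_{k-1}t^{k-1} + ... + a_0,
        where coeffs = [a_0, a_1, ..., a_{k-1}]."""
        k = len(coeffs)
        C = np.zeros((k, k))
        C[1:, :-1] = np.eye(k - 1)        # sub-diagonal of 1s
        C[:, -1] = -np.array(coeffs)      # last column: -a_0, ..., -a_{k-1}
        return C

    C = companion([1.0, 0.0])             # companion matrix of t^2 + 1
    print(C)                               # [[0, -1], [1, 0]]
    print(np.round(np.poly(C), 6))         # [1, 0, 1] : coefficients of t^2 + 1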
81
Example 3.24. Let A = [4 −3; 2 2] ∈ M2(R). Its characteristic polynomial
p(t) = t² − 6t + 14 = (t − 3)² + 5
doesn't split over R and so it has no eigenvalues. Instead simply pick a vector, x = (1, 0)ᵀ (say), define y = Ax = (4, 2)ᵀ, let β = {x, y} and observe that
[LA]β = [0 −14; 1 6]
is a rational canonical form. Indeed this works for any x ≠ 0: if β := {x, Ax}, then Cayley–Hamilton forces
A²x = (6A − 14I)x = −14x + 6Ax   ⟹   [LA]β = [0 −14; 1 6]
A systematic approach to finding rational canonical forms is similar to that for Jordan forms: for each
m
irreducible divisor of p(t), the subspace Kϕ = N ϕ(T) plays a role analogous to a generalized
eigenspace; indeed Kλ = Kϕ for the linear irreducible factor ϕ(t) = λ − t!
We finish with two examples; hopefully the approach is intuitive, even without theoretical justifica-
tion.
If the characteristic polynomial is p(t) = ϕ(t)² where ϕ(t) = t² − 2t + 3, then there are two possible rational canonical forms; here is an example of each.
1. If A = [0 −15 0 −9; 2 2 −3 0; 0 −9 0 −6; −3 0 5 2], then ϕ(A) = O is the zero matrix, whence N(ϕ(A)) = R⁴. Since ϕ(t) isn't the full characteristic polynomial, we expect there to be two independent cycles of length two in the canonical basis. Start with something simple as a guess:
x1 = (1, 0, 0, 0)ᵀ   ⟹   x2 = Ax1 = (0, 2, 0, −3)ᵀ   ⟹   Ax2 = (−3, 4, 0, −6)ᵀ = −3x1 + 2x2
Over C, this example is diagonalizable. Indeed each of the 2 × 2 companion matrices is diago-
nalizable over C.
82
2. Let B = [0 0 2 1; 1 1 −1 −1; 0 1 −2 −16; 0 0 1 5]. This time
ϕ(B) = B² − 2B + 3I = [3 2 −7 −29; −1 1 4 13; 1 −3 −6 −17; 0 1 1 2]   ⟹   N(ϕ(B)) = Span{ (3, −1, 1, 0)ᵀ, (11, −2, 0, 1)ᵀ }
Anything not in this span will suffice as a generator for a single cycle of length four: e.g.,
x1 = (1, 0, 0, 0)ᵀ,   x2 = Bx1 = (0, 1, 0, 0)ᵀ,   x3 = Bx2 = (0, 1, 1, 0)ᵀ,   x4 = Bx3 = (2, 0, −1, 1)ᵀ
Bx4 = (−1, 2, −14, 4)ᵀ = −9x1 + 12x2 − 10x3 + 4x4
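A final numerical check: the β-coordinates of Bx4 (and hence the last column of the companion matrix) can be recovered by solving a linear system; a minimal numpy sketch:

    import numpy as np

    B = np.array([[0., 0., 2., 1.],
                  [1., 1., -1., -1.],
                  [0., 1., -2., -16.],
                  [0., 0., 1., 5.]])
    x1 = np.array([1., 0., 0., 0.])
    beta = np.column_stack([x1, B @ x1, B @ B @ x1, B @ B @ B @ x1])   # {x1, Bx1, B^2 x1, B^3 x1}
    coords = np.linalg.solve(beta, B @ beta[:, -1])                    # B x4 in beta-coordinates
    print(coords)   # [-9, 12, -10, 4] : last column of the companion matrix of (t^2 - 2t + 3)^2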
83