
Eigenvalues, Eigenvectors, and Applications

We will look at things related to eigenvalues and eigenvectors in this set of notes, which should cover most
of the material in Chapter 7. The non-examinable sections are marked with an asterisk.

Contents

1 Introduction to eigenvalues

2 Matrix functions*

3 Eigenspace and multiplicities

4 Geometric point of view

5 Complex eigenvalues

6 Trace and determinant

7 Examples

8 Non-diagonalizable matrices*

9 Fast eigenvalue algorithms*

1 Introduction to eigenvalues

Example. Consider the Fibonacci numbers defined by


F_0 = 0, F_1 = 1, F_{n+1} = F_n + F_{n−1}

We will try to find a general formula for F_n.
(Step 1). This can be reformulated as computing A^n, where

A = [0 1; 1 1]

In fact, observe that A [x; y] = [y; x+y], so it's easy to check that

A^n [0; 1] = [F_n; F_{n+1}]
(Step 2). If A were equal to a diagonal matrix [d_1 0; 0 d_2], then A^n would simply be [d_1^n 0; 0 d_2^n]. But we are not so lucky. The next best thing is A = SDS^{-1}, where D is diagonal, since then

A^n = (SDS^{-1})(SDS^{-1}) · · · (SDS^{-1}) = SD(S^{-1}S)D(S^{-1}S) · · · DS^{-1} = SD^n S^{-1}
We observed this in the change of basis section. A physicist would say that A is decoupled
in the basis defined by S, because A acts on the two vectors independently of each other. A
mathematician is more straightforward and says A is diagonalized by S.
(Step 3). How do we find S and D? Let S = [~v_1 ~v_2] and D = [d_1 0; 0 d_2], then

AS = [A~v_1 A~v_2], SD = [d_1~v_1 d_2~v_2]

So we need A~v_i = d_i~v_i for i = 1, 2. Moreover, we need ~v_1 and ~v_2 to be linearly independent, so in particular, they cannot be zero. In general, if we have A~v = λ~v, where λ is a scalar and ~v ≠ 0, then λ is an eigenvalue of A, and ~v is an associated eigenvector.
(Step 4). Here comes the trick: A~v = λ~v is equivalent to (A − λI)~v = 0. If ~v ≠ 0, then ~v is a non-zero element of ker(A − λI), so A − λI is not invertible. We can tell when this happens by computing a determinant.

p(λ) := det(A − λI) = det [−λ 1; 1 1−λ] = λ^2 − λ − 1

This is the characteristic polynomial of A. Its roots are exactly the eigenvalues of A. In our case, they are λ_± = (1 ± √5)/2.
(Step 5). Now, finding eigenvectors just comes down to finding non-trivial elements in a kernel. For λ_+, the relevant matrix is

A − λ_+ I = [−(1 + √5)/2  1; 1  (1 − √5)/2]

Observe that the (column) vector ~v_+ = [1, (1 + √5)/2]^T satisfies the equation given by the first row, so it must be in the kernel, because the kernel is non-trivial by construction. You can easily verify this by computation as well. We have found our first eigenvector. Similarly, we can find that an eigenvector for λ_- is ~v_- = [1, (1 − √5)/2]^T.



(Step 6). Now let S = [~v_+ ~v_-], then det S = −√5 ≠ 0, so S is invertible. By construction, we have

AS = S [λ_+ 0; 0 λ_-]

so we finally get

A^n = S [λ_+^n 0; 0 λ_-^n] S^{-1}, where S = [1 1; (1 + √5)/2 (1 − √5)/2] and S^{-1} = −(1/√5) [(1 − √5)/2  −1; −(1 + √5)/2  1]

Therefore,

F_n = [1 0] A^n [0; 1] = (1/√5) ( ((1 + √5)/2)^n − ((1 − √5)/2)^n )

It may not be clear a priori that this is always an integer, but see remark (1).

We now make three tangential remarks:


1. Observe that we got from A~v_+ = λ_+~v_+ to A~v_- = λ_-~v_- by simply switching the signs of all occurrences of √5. This turns out to be an extremely profound observation: there is a symmetry between √5 and −√5, and an equation using only integers which holds for one also holds for the other.

   It also holds if √5 is replaced by √r for any rational number r which is not a square. If instead we have √2 + √3, we have four symmetries: we can switch the signs of √2 and √3 independently. But what happens with √2 + √3 + √6? What about the cube root of 2? This line of thought led to Galois theory, and one of the first achievements is the insolvability of the general quintic using radicals. I can go on. Please let me know if you want to learn more.
2. Since |λ_-| < 1, the second term is exponentially smaller than the first term, so F_n is the integer closest to (1/√5) ((1 + √5)/2)^n.

3. In practice, this is not the best way to compute the Fibonacci numbers, since it involves a lot of floating point arithmetic. It is much better to compute A^n directly by repeated squaring, as sketched below.
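To make remark 3 concrete, here is a minimal sketch in Python of computing F_n by repeated squaring of A, using exact integer arithmetic; the function names are ours, purely for illustration.

```python
def mat_mult(X, Y):
    # Multiply two 2x2 matrices of Python integers (exact arithmetic).
    return [[X[0][0]*Y[0][0] + X[0][1]*Y[1][0], X[0][0]*Y[0][1] + X[0][1]*Y[1][1]],
            [X[1][0]*Y[0][0] + X[1][1]*Y[1][0], X[1][0]*Y[0][1] + X[1][1]*Y[1][1]]]

def mat_pow(X, n):
    # Compute X^n by repeated squaring: O(log n) matrix multiplications.
    result = [[1, 0], [0, 1]]  # identity matrix
    while n > 0:
        if n & 1:
            result = mat_mult(result, X)
        X = mat_mult(X, X)
        n >>= 1
    return result

def fibonacci(n):
    # A^n = [[F_{n-1}, F_n], [F_n, F_{n+1}]], so read off the (1,2) entry.
    A = [[0, 1], [1, 1]]
    return mat_pow(A, n)[0][1]

print([fibonacci(k) for k in range(10)])  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
```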
In summary, we did the following to diagonalize a general n × n matrix A.
(Step 1). Compute the characteristic polynomial

p_A(λ) = det(A − λI_n)

This is a polynomial of degree n in λ with leading coefficient (−1)^n.


(Step 2). A generic polynomial of degree n has n distinct roots. Let’s pretend that is the case and let
λ1 , · · · , λn be the roots of pA . These are the eigenvalues of A.
(Step 3). For each λi , by construction, ker(A − λi In ) is non-trivial, so we can pick a non-zero vector ~vi in
it. This is an eigenvector associated to λi .
(Step 4). If S = [~v_1 · · · ~v_n] is invertible, then we are done: A = SDS^{-1}, where D is the diagonal matrix with diagonal entries λ_1, · · · , λ_n. We will see later that if the eigenvalues are distinct, then S is automatically invertible.
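As a numerical illustration of these four steps, here is a sketch using Python and numpy (not part of the notes); the library finds the eigenvalues and eigenvectors for us, and we only check the factorization.

```python
import numpy as np

A = np.array([[0.0, 1.0],
              [1.0, 1.0]])  # the Fibonacci matrix from Step 1

# Steps 1-3: numpy finds the roots of the characteristic polynomial (the
# eigenvalues) and a non-zero vector in each kernel (the eigenvectors).
eigenvalues, S = np.linalg.eig(A)  # columns of S are eigenvectors
D = np.diag(eigenvalues)

# Step 4: if S is invertible, then A = S D S^{-1}.
assert abs(np.linalg.det(S)) > 1e-12
print(np.allclose(A, S @ D @ np.linalg.inv(S)))  # True

# Powers are now cheap: A^10 = S D^10 S^{-1} = [[F_9, F_10], [F_10, F_11]].
A10 = S @ np.diag(eigenvalues**10) @ np.linalg.inv(S)
print(np.round(A10).astype(int))  # [[34 55], [55 89]]
```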
In the case when the characteristic polynomial has repeated roots, the situation is a lot more subtle, and
we will spend some time on that. For now, consider the difference between the matrices
   
A = [1 0; 0 1], B = [1 1; 0 1]
and perhaps try to follow the above steps.

2 Matrix functions*

Let A be an n × n matrix. Let p(λ) = c_0 + c_1 λ + · · · + c_n λ^n be a polynomial in λ, then we can define

p(A) := c_0 I_n + c_1 A + c_2 A^2 + · · · + c_n A^n

So In takes the role of 1 here. Observe that while two matrices need not commute in general, we always
have p(A)q(A) = q(A)p(A) for any polynomials p and q.
But we don't need to stop there. We can consider a power series instead. First, let's take

exp(λ) = 1 + λ + (1/2!)λ^2 + · · · + (1/n!)λ^n + · · ·

Recall from calculus that this converges for all λ, so one can ask if the following series

e^A = exp(A) := I_n + A + (1/2!)A^2 + · · · + (1/n!)A^n + · · ·
makes sense. One way to make this question precise is to ask if each entry in the truncated series tends to
a limit. It is actually possible to show that this is true by directly estimating the entries, but inspired by
the previous section, we can first try to diagonalize A. First, suppose D is diagonal with diagonal entries
λ_1, · · · , λ_n, then it is clear that e^D is the diagonal matrix with diagonal entries e^{λ_1}, · · · , e^{λ_n}. Now, if A = SDS^{-1}, then the same cancellation argument applied to each term gives e^A = S e^D S^{-1}, so we have convergence, and moreover a practical way of computing e^A.
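Here is a short numerical check of the formula e^A = S e^D S^{-1}, sketched in Python; scipy.linalg.expm is used only as an independent reference for e^A.

```python
import numpy as np
from scipy.linalg import expm

A = np.array([[0.0, 1.0],
              [1.0, 1.0]])

# Diagonalize, exponentiate the eigenvalues, and change basis back.
lam, S = np.linalg.eig(A)
expA_via_diagonalization = S @ np.diag(np.exp(lam)) @ np.linalg.inv(S)

print(np.allclose(expA_via_diagonalization, expm(A)))  # True
```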
Now consider the other important power series

log(1 + λ) = λ − λ^2/2 + · · · + (−1)^{n+1} λ^n/n + · · ·

The same reduction to the diagonal case works, but in the diagonal case, we have to contend with the radius of convergence. If A has an eigenvalue of modulus greater than 1, then the series does not converge at that eigenvalue, so log(1 + A) cannot a priori be defined. We see that an important measurement of the "size" of A is the maximum modulus of its eigenvalues,¹ which is sometimes called the spectral radius of A. The above discussion can be formulated as: a power series converges at a matrix A if the spectral radius of A is less than the radius of convergence of the power series.
¹ We will see other ways of measuring the size of a matrix; for example, the maximum of all entries is another good way, and later the maximum singular value is yet another. Each is useful in some circumstances, and it is interesting to study how they are related to each other.
Theorem. If the spectral radius of A is less than 1, then I − A is invertible.
Proof. The function 1/(1 − x) has a power series expansion

1/(1 − x) = 1 + x + x^2 + · · ·

which converges when |x| < 1. Therefore, if the spectral radius of A is less than 1, then I − A is invertible with inverse I + A + A^2 + · · · .
Applying functions to matrices has many applications. Suppose we have a system of differential equations

ẋ(t) = y(t), ẏ(t) = x(t) + y(t), x(0) = y(0) = 1

We can put it in matrix form by letting ~x(t) = [x(t); y(t)] and A = [0 1; 1 1], then

d~x(t)/dt = A~x(t), ~x(0) = ~x_0 := [1; 1]

The solution has the same form as if ~x were just a regular function.

~x(t) = e^{At} ~x_0

From this, one sees that the eigenvalues of A should control the long-term behaviour of the solution. There
are also far-reaching generalizations of this to infinite dimensional spaces, allowing us to evaluate functions
at general operators.
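For the system above, the solution formula can be evaluated directly; the following Python sketch (the time values are arbitrary) just applies e^{At} to ~x_0.

```python
import numpy as np
from scipy.linalg import expm

A = np.array([[0.0, 1.0],
              [1.0, 1.0]])
x0 = np.array([1.0, 1.0])

def x(t):
    # x(t) = e^{At} x0, the matrix-exponential form of the solution.
    return expm(A * t) @ x0

# The solution grows like e^{lambda_+ t}, where lambda_+ = (1 + sqrt(5))/2 is
# the dominant eigenvalue of A.
for t in (0.0, 1.0, 2.0):
    print(t, x(t))
```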

3 Eigenspace and multiplicities

Before the interlude, the following matrices were considered


   
A = [1 0; 0 1], B = [1 1; 0 1]

In both cases, the characteristic polynomial is p(λ) = (λ − 1)^2, so there is a unique eigenvalue. We can compute that dim ker(A − I_2) = 2 and dim ker(B − I_2) = 1. In the second case, we cannot find enough eigenvectors to form a basis of R^2, so B is not diagonalizable. On the other hand, A is clearly diagonalizable.
This section will introduce some further definitions, enough to state the main theorem on diagonalizability.
Definition. Let A be an n × n matrix. Let λ be a scalar. The λ-eigenspace of A is

E_λ^{(A)} = ker(A − λI_n)

Its dimension is the geometric multiplicity of λ, denoted by m_geom^{(A)}(λ).

Definition. Let A be an n × n matrix. Let λ be a scalar. The algebraic multiplicity of λ is the unique non-negative integer m_alg^{(A)}(λ) such that

p_A(x) = (x − λ)^{m_alg^{(A)}(λ)} g(x)

where p_A is the characteristic polynomial of A, and g(λ) ≠ 0.


When we say roots of a polynomial counted with multiplicity, we mean each root contributes its algebraic multiplicity. For example, p(λ) = (λ − 1)^2 (λ + 1)(λ − 2) has three positive roots, counted with multiplicity. It follows from the definition that the following statements are equivalent:

1. λ is an eigenvalue of A.

2. E_λ^{(A)} ≠ {0}.

3. m_geom^{(A)}(λ) > 0.

4. m_alg^{(A)}(λ) > 0.

Also observe that E_0^{(A)} = ker(A).
For our examples, we have m_geom^{(A)}(1) = 2 and m_geom^{(B)}(1) = 1. But m_alg^{(A)}(1) = m_alg^{(B)}(1) = 2. The fact that A is diagonalizable but B is not is explained by the following theorem, which is the main examinable theorem of this chapter.
Theorem. Let A be an n × n matrix.

1. For any λ, we have m_geom^{(A)}(λ) ≤ m_alg^{(A)}(λ).

2. The matrix A is diagonalizable over C if and only if equality holds for all λ ∈ C.

3. The matrix A is diagonalizable over R if and only if it is diagonalizable over C and p_A(λ) has n real roots, counted with multiplicity.
This is quite an elaborate theorem, and the proof will occupy much of the next few sections.
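For the matrices A and B above, the two multiplicities can be checked numerically; this is a numpy sketch in which the geometric multiplicity is computed as dim ker(M − I_2) via the rank.

```python
import numpy as np

A = np.array([[1.0, 0.0],
              [0.0, 1.0]])
B = np.array([[1.0, 1.0],
              [0.0, 1.0]])

for name, M in (("A", A), ("B", B)):
    # Algebraic multiplicity of 1: how many eigenvalues (roots of p_M) equal 1.
    alg = int(np.sum(np.isclose(np.linalg.eigvals(M), 1.0)))
    # Geometric multiplicity of 1: dim ker(M - I_2) = 2 - rank(M - I_2).
    geom = 2 - np.linalg.matrix_rank(M - np.eye(2))
    print(name, "algebraic:", alg, "geometric:", geom)
# A: algebraic 2, geometric 2 (diagonalizable)
# B: algebraic 2, geometric 1 (not diagonalizable)
```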

4 Geometric point of view

Recall that A is diagonalizable if there exists a diagonal matrix D and an invertible matrix S such that
A = SDS^{-1}. We now reinterpret everything using linear transformations.
Definition. Let T : Rn → Rn be a linear transformation. An eigenbasis for T is a basis of Rn consisting
of eigenvectors of T . We say T is diagonalizable if it has an eigenbasis.
It is clear that T is diagonalizable if and only if its matrix with respect to any basis is diagonalizable.
This is now a coordinate-independent definition. Here is the advantage: diagonalizing a matrix
corresponds exactly to finding an eigenbasis of the underlying linear transformation. By starting
with a linear transformation, we are fixing the space and varying the coordinate system on the space. The
problem becomes one of choosing a good basis.
We should check that most of the things we introduced are in fact geometric, in the sense that they do not
change when we change coordinates. For the characteristic polynomial, this is a property of the determinant

det(SAS^{-1} − λI) = det(S(A − λI)S^{-1}) = det(A − λI)

so we can speak of the characteristic polynomial of a linear transformation. The algebraic multiplicity is
therefore also a property of a linear transformation. The eigenspace, eigenvector, and geometric multiplicities
are already geometric: they are expressed in terms of kernels and dimensions.
Warning. Similar matrices have the same eigenvalues. They do not necessarily have the same eigenvectors.
Instead, the eigenvectors are related by a change of basis.
The main theorem is therefore entirely geometric. We use this reformulation to prove its first part.
Proof of Part (1) of Main Theorem. Let V = ker(T − λI). Fix a basis {~v_1, · · · , ~v_m} for V, then m = m_geom^{(T)}(λ). Extend it to a basis of R^n by adjoining vectors ~v_{m+1}, · · · , ~v_n. With respect to this basis, the matrix of T has the form

[λI_m  *; 0  *]

Therefore, p_T(x) = (x − λ)^m g(x) for some polynomial g(x). In particular, m ≤ m_alg^{(T)}(λ).
Before proving the other parts, we show that different eigenspaces are independent (they intersect only in 0). This also applies to generalized eigenspaces, to be introduced later, with essentially the same proof.
Theorem. Let λ_1, · · · , λ_m be distinct eigenvalues of T. Suppose we have ~v_1 + · · · + ~v_m = 0, where ~v_i ∈ E_{λ_i}^{(T)} for each i, then ~v_1 = · · · = ~v_m = 0.
Proof. Suppose the conclusion is false. Let ~v1 + · · · + ~vm = 0 be a non-trivial relation where the number
of non-zero ~vi is minimal. Without loss of generality, we may suppose ~v1 6= 0. Observe that T ~vi = λi~vi by
hypothesis, so applying T − λ1 gives

(λ1 − λ1 )~v1 + (λ2 − λ1 )~v2 + · · · + (λm − λ1 )~vm = 0


The term from E_{λ_1}^{(T)} is now zero, so by minimality, we must have (λ_i − λ_1)~v_i = 0 for all i > 1. But the eigenvalues are distinct, so ~v_i = 0 for i > 1, which also implies ~v_1 = 0, a contradiction.
Proof of Part (3) of Main Theorem. First suppose T is diagonalizable, then using an eigenbasis, we can check the equality of the two multiplicities on diagonal matrices, which makes the claim clear.

In general, let λ_1, · · · , λ_p be the eigenvalues of T, then by concatenating a choice of bases for E_{λ_1}^{(T)}, · · · , E_{λ_p}^{(T)}, we get a set of Σ_{i=1}^p m_geom^{(T)}(λ_i) linearly independent eigenvectors in R^n, by the previous theorem. If T has n real eigenvalues counted with multiplicities, and m_alg^{(T)}(λ) = m_geom^{(T)}(λ) for all λ, then this procedure gives us an independent set of size n, so it is an eigenbasis for T.
It follows from the proof that there are two obstructions to diagonalizability:

– m_alg^{(T)}(λ) > m_geom^{(T)}(λ): the eigenspaces are not big enough. In a non-examinable section, we will see how to fix this using generalized eigenspaces.

– Σ_{i=1}^p m_alg^{(T)}(λ_i) < n: there are not enough eigenvalues. This obstruction no longer exists if we work over C, the field of complex numbers.

We are in fact sufficiently careful with the wording that the above proof works word for word to give a proof of part (2), once we allow scalars to be complex numbers. We will review some facts about complex numbers in the next section and talk about complex eigenvalues.

5 Complex eigenvalues

The fundamental theorem of algebra states that a polynomial of degree n has exactly n complex roots,
counted with multiplicities, so the second obstruction no longer occurs. This is why the statement of part
(2) of the Main Theorem is much cleaner. We will assume the basics of complex numbers as covered in the
textbook and instead highlight some key features.
In this section, let i be a square root of −1. There is a choice involved, and you actually cannot distinguish between +√−1 and −√−1. This is very profound, so profound that we make the following definition.
Definition. If z = a + bi, where a, b ∈ R, then its (complex) conjugate is z̄ := a − bi.
The fact that i and −i are indistinguishable is the philosophical reason why conjugation preserves all
arithmetic operations, so for example

the conjugate of z + w is z̄ + w̄, the conjugate of zw is z̄ · w̄, and the conjugate of z^{-1} is z̄^{-1}.

This is related to the earlier remark that √5 and −√5 are indistinguishable if you only have access to
the rational numbers. Another consequence is the following trivial and yet extremely profound fact: if an
expression involving z and z̄ is supposed to have a real value, then it must be invariant under complex
conjugation, so for example
Re(z) = (z + z̄)/2, Im(z) = (z − z̄)/(2i)
both return real numbers, because they are invariant under conjugation. Once again, please talk to me if
you want to learn more.
Since we can perform arithmetic operations, and there is a good notion of size (|z| := (z z̄)^{1/2}), we can evaluate power series at complex numbers.

Definition. The exponential function is defined by the usual power series

e^z = exp(z) := Σ_{n=0}^∞ z^n / n!

This satisfies the usual properties, the key one being

exp(z + w) = exp(z) exp(w)

By substituting in z = iθ, we get

e^{iθ} = Σ_{n=0}^∞ (iθ)^n / n! = Σ_{k=0}^∞ (−1)^k θ^{2k} / (2k)! + i Σ_{k=0}^∞ (−1)^k θ^{2k+1} / (2k+1)! = cos θ + i sin θ

which you may know as Euler’s formula, even though it is a complete triviality from definition.
Now, by allowing complex numbers as scalars, the discussions in the previous sections go through, and
part (2) of the Main Theorem follows from the proof for part (3) and the fundamental theorem of algebra.
We will now look at an instructive example.
Example. Let R be the rotation matrix

R = [cos θ  −sin θ; sin θ  cos θ]

Its characteristic polynomial is p_R(λ) = λ^2 − (2 cos θ)λ + 1, so the eigenvalues are cos θ ± i sin θ = e^{±iθ}. Now,

R − (cos θ + i sin θ)I_2 = [−i sin θ  −sin θ; sin θ  −i sin θ]

so an eigenvector for e^{iθ} is [i; 1]. By direct computation or the symmetry heuristics, an eigenvector for e^{−iθ} is [−i; 1], so we have

R = [i  −i; 1  1] [e^{iθ}  0; 0  e^{−iθ}] [i  −i; 1  1]^{-1}
Note that the S-matrix does not depend on the angle, which fundamentally happens because all 2D
rotations commute with each other. We use this observation to find two rotation matrices whose sum is also
a rotation matrix: observe the complex number identity
1 + z = w, z = e^{2πi/3}, w = e^{πi/3}

See the following picture:

[Picture: the unit vectors z = e^{2πi/3} and w = e^{πi/3} in the complex plane, with w = 1 + z.]
It follows that

[1 0; 0 1] + [z 0; 0 z̄] = [w 0; 0 w̄]

so after a change of basis, we get I_2 + R_{2π/3} = R_{π/3}.
In general, by the principle of symmetry under complex conjugation, the complex roots of a real polynomial must appear in conjugate pairs: if z is a root, then z̄ is also a root. Let T be a real linear operator with eigenvalues z and z̄. If ~v is an eigenvector for z, then its entrywise complex conjugate is an eigenvector for z̄. Let z = a + bi where a, b ∈ R, then z̄ = a − bi. In the diagonal form, we have a block of the form [a+bi 0; 0 a−bi]. Write ~v = ~u + i~w, where ~u, ~w are real vectors, then we have

T(~u + i~w) = (a + bi)(~u + i~w)

Comparing real and imaginary parts gives

T~u = a~u − b~w, T~w = b~u + a~w

Therefore, if we replace the pair of conjugate eigenvectors by {~u, ~w}, then the matrix of T with respect to this new basis is [a b; −b a]. We summarize this discussion in the following theorem.
Theorem. Suppose A is a real n × n matrix which is diagonalizable over C, then A is similar to a block diagonal matrix, where each block is either a single number or a 2 × 2 matrix of the form [a b; −b a].

As a recap, to get to this form, first diagonalize as usual over C, then replace each pair of conjugate eigenvectors with their real and imaginary parts. In that plane, instead of a scaling, A acts as a scaled rotation, which is responsible for the spirals in the corresponding discrete dynamical systems.

Finally, we remark that this operation of going from [z 0; 0 z̄] to [a b; −b a] preserves all arithmetic operations, since it is just a change of basis, and the basis does not depend on z and z̄. So we can view complex numbers as 2 × 2 matrices. The unit circle is sent to the rotation matrices.
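The recap can be carried out numerically: take a complex eigenvector ~v = ~u + i~w and check that in the basis {~u, ~w} the matrix becomes [a b; −b a]. This is a Python sketch; the matrix A is an arbitrary scaled rotation chosen for illustration.

```python
import numpy as np

theta = 0.7
c, s = np.cos(theta), np.sin(theta)
A = 2.0 * np.array([[c, -s],
                    [s,  c]])  # a scaled rotation: eigenvalues 2 e^{+i theta}, 2 e^{-i theta}

lam, V = np.linalg.eig(A)
z, v = lam[0], V[:, 0]   # one eigenvalue z = a + bi and a corresponding eigenvector
u, w = v.real, v.imag    # real and imaginary parts of the eigenvector

S = np.column_stack([u, w])       # change of basis to {u, w}
block = np.linalg.inv(S) @ A @ S
print(np.round(block, 6))         # [[a, b], [-b, a]] with a = Re z, b = Im z
print(round(z.real, 6), round(z.imag, 6))
```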

6 Trace and determinant

Recall that the trace of an n × n matrix A is the sum of its diagonal entries. In the expansion of det(A − λI_n), the leading coefficient is (−1)^n. The term for λ^{n−1} also comes entirely from the diagonal, and it is not hard to see that it must be (−1)^{n−1} Tr(A). Finally, if λ = 0, then we recover on the one hand the constant term of p_A(λ), but on the other hand det(A). Therefore,

Theorem. Let A be an n × n matrix, then its characteristic polynomial has the form

p_A(λ) = (−1)^n λ^n + (−1)^{n−1} Tr(A) λ^{n−1} + · · · + det(A)

In particular, the characteristic polynomial of a 2 × 2 matrix is λ^2 − Tr(A)λ + det(A).


By the relation between roots of a polynomial and its coefficients, we get that

Theorem. The product of all eigenvalues, counted with algebraic multiplicity, is det(A). The sum of all eigenvalues is Tr(A).
We now compute the eigenvalues of a TST matrix. Recall that they are matrices of the form

D_{a,b}^{(n)} =
[ a  b  0  ···  0 ]
[ b  a  b  ···  0 ]
[ ⋮   ⋱  ⋱  ⋱   ⋮ ]
[ 0  ···  b  a  b ]
[ 0  ···  0  b  a ]
So we know that D_{a,b}^{(n)} − λI_n = D_{a−λ,b}^{(n)}. Since we also had a formula for its determinant, it would appear reasonable to compute its eigenvalues directly. This is doable but tricky. Instead, we guess an eigenbasis and compute the eigenvalues from there.

The guess is based on the identity

sin((k − 1)π/n) + sin((k + 1)π/n) = 2 cos(π/n) sin(kπ/n)

Let ~v_k = [sin(kπ/(n+1)), sin(2kπ/(n+1)), · · · , sin(nkπ/(n+1))], then for each i,

(D_{0,1}^{(n)} ~v_k)_i = sin((i − 1)kπ/(n+1)) + sin((i + 1)kπ/(n+1)) = 2 cos(kπ/(n+1)) sin(ikπ/(n+1)) = 2 cos(kπ/(n+1)) (~v_k)_i

Therefore, D_{0,1}^{(n)} ~v_k = 2 cos(kπ/(n+1)) ~v_k. For k = 1, 2, · · · , n, these give n eigenvectors with n different eigenvalues, so they form an eigenbasis. Therefore, the eigenvalues of D_{0,1}^{(n)} are 2 cos(kπ/(n+1)) for k = 1, · · · , n.

Finally, D_{a,b}^{(n)} = aI_n + bD_{0,1}^{(n)}, so the eigenvalues of D_{a,b}^{(n)} are a + 2b cos(kπ/(n+1)). In particular, take a = 2 and b = 1. The half-angle formula gives

2 + 2 cos(kπ/(n+1)) = 4 cos^2(kπ/(2(n+1)))

So det D_{2,1}^{(n)} = Π_{k=1}^{n} 4 cos^2(kπ/(2(n+1))). But we have computed that det D_{2,1}^{(n)} = n + 1. Their equality gives the formula promised in the last chapter, namely

cos(π/(2n)) cos(2π/(2n)) · · · cos((n − 1)π/(2n)) = √n / 2^{n−1}
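A quick numerical check of the eigenvalue formula for D_{a,b}^{(n)} (a numpy sketch; the values of n, a, b are arbitrary):

```python
import numpy as np

n, a, b = 6, 2.0, 1.0
D = a * np.eye(n) + b * (np.eye(n, k=1) + np.eye(n, k=-1))  # the TST matrix D^{(n)}_{a,b}

computed = np.sort(np.linalg.eigvalsh(D))        # D is symmetric, so eigvalsh applies
k = np.arange(1, n + 1)
predicted = np.sort(a + 2 * b * np.cos(k * np.pi / (n + 1)))

print(np.allclose(computed, predicted))          # True
print(np.prod(predicted), np.linalg.det(D))      # both equal n + 1 = 7
```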

7 Examples

Example 1 We compute the eigenvalues of the following n × n matrix

A =
[ 1  1  1  ···  1 ]
[ 1  0  0  ···  0 ]
[ 1  0  0  ···  0 ]
[ ⋮  ⋮  ⋮       ⋮ ]
[ 1  0  0  ···  0 ]

This requires computing its characteristic polynomial

p_A(λ) = det
[ 1−λ  1   1   ···  1  ]
[ 1    −λ  0   ···  0  ]
[ 1    0   −λ  ···  0  ]
[ ⋮    ⋮   ⋮        ⋮  ]
[ 1    0   0   ···  −λ ]

To compute the determinant, we use column operations. For now, suppose λ 6= 0, then we can add λ−1
times column i to column 1 for i = 2, · · · , n to get

1 − λ + (n − 1)λ−1 1
 
1 1 · · · 1

 0 −λ 0 0 · · · 0 


 0 0 −λ 0 · · · 0 

 0 0 0 −λ · · · 0 
 = (1 − λ + (n − 1)λ−1 )(−λ)n−1
pA (λ) = det 

 · · · · · · 


 · · · · · · 

 · · · · · · 
0 0 0 0 · · · −λ
We can simplify this expression to (−λ)^{n−2}(λ^2 − λ − (n−1)). Therefore, this equals p_A(λ) if λ ≠ 0. If λ = 0, then we are looking at det(A). If n > 2, then A has repeated columns, so det(A) = 0, and the formula still holds. If n = 2, we can check this directly. A more satisfying reason is that the polynomial

q(λ) = p_A(λ) − (−λ)^{n−2}(λ^2 − λ − (n−1))

satisfies q(λ) = 0 if λ ≠ 0. But a non-zero polynomial can have at most finitely many roots, so q = 0.

Therefore, 0 is an eigenvalue with algebraic multiplicity n − 2, and there are two other eigenvalues

λ_± = (1 ± √(4n − 3)) / 2
Suppose n > 1, then the rank of A is 2, since columns 2 to n are equal. By the rank-nullity theorem,
dim ker A = n − 2, so the geometric multiplicity of 0 is n − 2. It follows that A is diagonalizable, and all of
its eigenvalues are real. We will see in the next chapter that this property holds for all symmetric matrices.
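A numerical sanity check of Example 1 (a numpy sketch; the value of n is arbitrary):

```python
import numpy as np

n = 7
A = np.zeros((n, n))
A[0, :] = 1.0
A[:, 0] = 1.0   # first row and first column are all ones, zeros elsewhere

eigenvalues = np.sort(np.linalg.eigvalsh(A))   # A is symmetric, so the eigenvalues are real
lam_plus = (1 + np.sqrt(4 * n - 3)) / 2        # = 3 for n = 7
lam_minus = (1 - np.sqrt(4 * n - 3)) / 2       # = -2 for n = 7

print(eigenvalues)            # n - 2 zeros together with lam_minus and lam_plus
print(lam_minus, lam_plus)
```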

Example 2 Textbook section 7.6, #40. The question studies the following matrix

A = [ p  −q  −r  −s ]
    [ q   p   s  −r ]
    [ r  −s   p   q ]
    [ s   r  −q   p ]
and relates it to the representation of an integer as a sum of four squares. We will do it with the minimal
amount of computation, as is intended.
First observe by direct computation that

A^T A = (p^2 + q^2 + r^2 + s^2) I_4

It follows that A is invertible if and only if p, q, r, s are not all 0, in which case its inverse is

A^{-1} = (1/(p^2 + q^2 + r^2 + s^2)) A^T

Moreover, det(A)^2 = det(A^T A) = (p^2 + q^2 + r^2 + s^2)^4, so det(A) = ±(p^2 + q^2 + r^2 + s^2)^2.
We don't know the sign yet, which a priori depends on p, q, r, s, but we will figure it out from the characteristic polynomial. Observe that A − λI_4 can be obtained from A by replacing p with p − λ, so

det(A − λI_4) = ±((p − λ)^2 + q^2 + r^2 + s^2)^2

But the leading coefficient must be positive, so the sign is positive. In particular, setting λ = 0 gives

det(A) = (p^2 + q^2 + r^2 + s^2)^2

This can also be argued by continuity: if λ is large, then the diagonal entries dominate the rest, so the determinant is positive. But the sign cannot jump, so it must always be positive.

It follows easily from the formula for the characteristic polynomial that the eigenvalues of A are

λ_± = p ± i√(q^2 + r^2 + s^2)

each occurring with algebraic multiplicity 2. Since A^T is a multiple of A^{-1}, we have AA^T = A^T A, so the spectral theorem for normal operators (see the notes for Chapter 8) shows that A is diagonalizable, but we don't need to know this.
Now, observe that

||A~x||^2 = ~x^T A^T A ~x = (p^2 + q^2 + r^2 + s^2) ~x^T ~x = (p^2 + q^2 + r^2 + s^2) ||~x||^2

so ||A~x|| = √(p^2 + q^2 + r^2 + s^2) ||~x||. The squared version is more useful for later: if ~x = [x_1 x_2 x_3 x_4]^T and A~x = [y_1 y_2 y_3 y_4]^T, then it says

y_1^2 + y_2^2 + y_3^2 + y_4^2 = (p^2 + q^2 + r^2 + s^2)(x_1^2 + x_2^2 + x_3^2 + x_4^2)

Moreover, if p, q, r, s, x_1, x_2, x_3, x_4 are all integers, then y_1, y_2, y_3, y_4 are also integers, so we have shown that the product of two sums of four squares is also a sum of four squares. For example, set [p q r s] = [3 3 4 5] and ~x = [1 2 4 4]^T, then

A~x = [−39 13 18 13]^T

so we must have

(−39)^2 + 13^2 + 18^2 + 13^2 = (3^2 + 3^2 + 4^2 + 5^2)(1^2 + 2^2 + 4^2 + 4^2) = 59 × 37 = 2183
Given that all primes are representable as a sum of four squares, this fact shows that all positive integers
are representable as a sum of four squares.
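The numerical identity above can be verified directly; this is a numpy sketch using the same p, q, r, s and ~x as in the text.

```python
import numpy as np

p, q, r, s = 3, 3, 4, 5
A = np.array([[p, -q, -r, -s],
              [q,  p,  s, -r],
              [r, -s,  p,  q],
              [s,  r, -q,  p]])

x = np.array([1, 2, 4, 4])
y = A @ x
print(y)                                             # [-39  13  18  13]
print(int(np.sum(y**2)))                             # 2183
print((p*p + q*q + r*r + s*s) * int(np.sum(x**2)))   # 59 * 37 = 2183 as well
```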
We end with two remarks on this problem.
1. One can check that A_{p,q,r,s} A_{p',q',r',s'} has the same form, with parameters given by the coordinates of A_{p,q,r,s} [p' q' r' s']^T, up to some sign differences. These matrices form a new system of number-like objects, similar to how a complex number a + bi can be represented as a 2 × 2 matrix [a b; −b a]. They are more commonly known as the quaternions, and written as p + qi + rj + sk, where i, j, k satisfy the relations i^2 = j^2 = k^2 = ijk = −1. This is not commutative: ij = k = −ji.
One way to build complex numbers from real numbers is to define a complex number as a pair of
two real numbers (a, b) with multiplication (a, b)(c, d) := (ac − bd, ad + bc). Starting instead with
the complex numbers and performing this construction gives us the quaternions. Starting with the
quaternions gives the Cayley octonions, which are not even associative, but they are sometimes useful
in studying certain exotic objects. Continuing this way will lose even more properties, and the resulting
objects have not been particularly useful.
2. The number of ways of writing a positive integer n as a sum of four squares is

r_4(n) = 8 Σ_{d | n, 4 ∤ d} d

where the sum goes over all divisors of n which are not multiples of 4. If p is a prime, then the answer is 8(p + 1). This treats permutations of the terms and changes of sign as giving distinct representations, so if p = 3, the 32 representations are all of the form 3 = (±1)^2 + (±1)^2 + (±1)^2 + 0^2. If p = 37, there are 3 essentially distinct representations

37 = 1^2 + 2^2 + 4^2 + 4^2 = 0^2 + 0^2 + 1^2 + 6^2 = 2^2 + 2^2 + 2^2 + 5^2
To count those representations, the problem reduces to studying representations of a number as a sum of three or fewer squares. The two-square case is easily resolved with a similar formula. The three-square case is somewhat well understood, but it lies far deeper and is related to the generalized Riemann hypothesis.

8 Non-diagonalizable matrices*

Not all hope is lost when the matrix is not diagonalizable. We will talk about three fixes. But first, we remark that a generic matrix is diagonalizable, since a generic polynomial has no repeated roots. One way to make "generic" precise is to say that there is a polynomial ∆ in n^2 variables such that if an n × n matrix A is not diagonalizable, then ∆ evaluated at the entries of A is 0. In other words, if ∆ ≠ 0, then A is diagonalizable. For n = 2, we can take ∆ = (Tr A)^2 − 4 det A = (a + d)^2 − 4(ad − bc). Therefore, in many results, one can assume the matrix is diagonalizable and then do some sort of continuity argument.
Theorem. Every n × n matrix is similar to an upper triangular matrix (over C).
Proof. We prove more: if T is a linear transformation, then there exists an orthonormal basis such that T is
upper triangular in that basis (we need to change the definitions slightly to accommodate complex numbers).
The proof proceeds by induction.
By the fundamental theorem of algebra, T has a complex eigenvector ~u_1, which we may assume has unit length by scaling. Let V = span(~u_1)^⊥, and let P be the projection onto V. We have a linear operator PT on V, which has dimension n − 1, so by the induction hypothesis, there is an orthonormal basis of V with respect to which PT is upper triangular. Putting ~u_1 in front of this basis gives the required basis.
This is sufficient to prove the Cayley–Hamilton theorem for general matrices.
Theorem (Cayley–Hamilton). Let A be an n×n matrix with characteristic polynomial pA , then pA (A) = 0.
Proof. By the previous theorem, we may assume A is upper triangular, in which case we have

p_A(λ) = Π_{i=1}^n (a_{ii} − λ)

so, up to sign, p_A(A) = Π_{i=1}^n (A − a_{ii} I_n). It is now easy to prove by induction that the first k columns of Π_{i=1}^k (A − a_{ii} I_n) are zero, so p_A(A) = 0.
This is usually not good enough for the computations we want to do, so there is a more powerful theorem.
Theorem (Jordan normal form). For each λ ∈ C and positive integer n, define a Jordan block

J_{λ,n} =
[ λ  1  0  ···  0 ]
[ 0  λ  1  ···  0 ]
[ ⋮     ⋱  ⋱   ⋮ ]
[ 0  0  ···  λ  1 ]
[ 0  0  ···  0  λ ]

Then every matrix A is similar to a block diagonal matrix, where each block is a Jordan block. Moreover, this is unique up to re-arranging the blocks.

Therefore, the similarity classes of n × n complex matrices are indexed by a set of distinct eigenvalues together with a partition of the multiplicity of each.
We now indicate some of the inputs to the proof. The first is a generalized eigenspace

Ẽ_λ^{(T)} := ∪_{k≥1} ker((T − λ)^k)

So if A = [0 1; 0 0], then its 0-eigenspace is span(~e_1), but its generalized 0-eigenspace is R^2. In general, the dimension of the generalized eigenspace for λ is equal to its algebraic multiplicity. The proof consists of first reducing to the case where λ = 0 is the unique eigenvalue, occurring with algebraic multiplicity n, and then applying the previous theorem on upper triangular form to conclude that T^n = 0.
Similarly to the proof of the Main Theorem, we get a decomposition of the entire space into generalized
eigenspaces. To obtain the Jordan normal form, we need to study the structure of a generalized eigenspace.
This requires looking at the following sequence of subspaces
E_λ^{(T)} = ker(T − λ) ⊆ ker((T − λ)^2) ⊆ ker((T − λ)^3) ⊆ · · ·
 

The amount by which the dimension jumps in each step determines the sizes of the Jordan blocks. It can
be helpful to think about this sequence when T is already a Jordan block.
As an example, we prove a simple case of the theorem. It follows very easily by reducing to the upper
triangular case, but we give a much longer proof to illustrate the general idea.
Theorem. If A is a 2 × 2 matrix with an eigenvalue λ of algebraic multiplicity 2, then either A is equal to λI_2 or A is similar to J = [λ 1; 0 λ].

Proof. If λ has geometric multiplicity 2, then A is diagonalizable, so A is similar to λI_2, which implies A = λI_2. We now suppose that m_geom^{(A)}(λ) = 1, so dim ker(A − λI) = 1.

Consider the usual equation

A[~v_1, ~v_2] = [~v_1, ~v_2] J = [λ~v_1, ~v_1 + λ~v_2]
So we need to solve two equations

(A − λI)~v1 = 0, (A − λI)~v2 = ~v1


Since A has an eigenvalue equal to λ, ~v1 can be taken to be an eigenvector. Now the question is if ~v1 ∈
(λ)
Im(A − λI). By assumption, mgeom (A) = 1, so dim Im(A − λI) = 1. The characteristic polynomial of A
is pA (X) = (X − λ) , so by the Cayley–Hamilton theorem (A − λI)2 = 0, so Im(A − λI) is contained in
2

ker(A − λI). But both spaces have dimension 1, so they are equal. By construction, ~v1 ∈ ker(A − λI), so it
is also in Im(A − λI). It follows that ~v2 exists.
Finally, ~v_2 and ~v_1 are linearly independent: suppose not, then ~v_2 ∈ ker(A − λI). But by definition, (A − λI)~v_2 = ~v_1 ≠ 0, which is a contradiction.
Finally, if the entries of the starting matrix are all rational numbers, we might not want to introduce
irrational numbers into the picture at all. One such possibility is the rational canonical form, given in the
next theorem. Its proof is actually not hard, but it involves a long digression into rational polynomials.
Theorem. Given a polynomial p(λ) = a_0 + a_1 λ + · · · + a_n λ^n, where a_n = 1, define

B_p =
[ 0  0  ···  0  −a_0     ]
[ 1  0  ···  0  −a_1     ]
[ 0  1  ···  0  −a_2     ]
[ ⋮     ⋱      ⋮        ]
[ 0  0  ···  1  −a_{n−1} ]
Then every matrix A is similar to a block diagonal matrix, where each block is isomorphic to Bp , where p is
a factor of the characteristic polynomial of A. This representation is unique up to permuting the blocks.
Among those three, the Jordan normal form is probably the hardest, but also the most useful for theory
and applications. We give one application to ODE.
Example. Consider the second order linear ODE
ẍ(t) + 2ẋ(t) + x(t) = 0
In the usual procedure for finding the general solution, one solves the characteristic equation r^2 + 2r + 1 = 0, which has a repeated root r = −1, so the general solution is x(t) = e^{−t}(A + Bt).
We now do this using matrix exponentiation. Let y(t) = ẋ(t), then we have a system of equations

ẋ(t) = y(t), ẏ(t) = −x(t) − 2y(t)

which we have seen can be re-written in matrix form

d~x(t)/dt = [0 1; −1 −2] ~x(t)

Let A be the matrix, then det(A − λI_2) = λ^2 + 2λ + 1, which has a repeated root λ = −1 (think about how this is related to the rational canonical form). The geometric multiplicity of −1 is only 1, so A has Jordan normal form J = [−1 1; 0 −1]. In fact,

A = SJS^{-1}, S = [1 1; −1 0]
The general solution is therefore

~x(t) = e^{At} ~x_0 = S e^{Jt} S^{-1} ~x_0

It remains to compute e^{Jt}. Write J = −I + E, so E = [0 1; 0 0], then E^2 = 0, so by the binomial expansion

J^n = (−I + E)^n = (−1)^n I + (−1)^{n−1} n E

Therefore,

e^{Jt} = Σ_{n=0}^∞ (t^n/n!) J^n = Σ_{n=0}^∞ (t^n/n!) ((−1)^n I + (−1)^{n−1} n E) = [e^{−t}  t e^{−t}; 0  e^{−t}]
Substituting this into the solution gives the required general solution.
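The closed form for e^{Jt} and the factorization A = SJS^{-1} can both be checked numerically; here is a Python/scipy sketch with an arbitrary time t and initial condition.

```python
import numpy as np
from scipy.linalg import expm

t = 1.3
J = np.array([[-1.0, 1.0],
              [0.0, -1.0]])

closed_form = np.exp(-t) * np.array([[1.0, t],
                                     [0.0, 1.0]])  # [[e^{-t}, t e^{-t}], [0, e^{-t}]]
print(np.allclose(expm(J * t), closed_form))        # True

# The solution x(t) = S e^{Jt} S^{-1} x0 agrees with e^{At} x0.
A = np.array([[0.0, 1.0],
              [-1.0, -2.0]])
S = np.array([[1.0, 1.0],
              [-1.0, 0.0]])
x0 = np.array([1.0, 0.0])  # e.g. x(0) = 1, x'(0) = 0
print(np.allclose(expm(A * t) @ x0, S @ expm(J * t) @ np.linalg.inv(S) @ x0))  # True
```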
9 Fast eigenvalue algorithms*

In practice, people do not compute eigenvalues by finding the roots of the characteristic polynomial. Recall that Gauss–Jordan elimination is O(n^3), which is fine when all entries are numbers. But det(A − λI) is a polynomial of degree n, so addition is actually O(n) and multiplication is O(n^2) (or O(n log n) with FFT). Division is even worse, so the end result is something like O(n^5). On top of that, there are serious numerical stability issues. We also need a root-finding algorithm, but those are well developed already.

We will briefly discuss two families of eigenvalue-finding strategies; this comes close to scratching the surface of the vast amount of work in this area.

The power method The idea is very simple: if A has a unique eigenvalue λ_1 of maximum modulus, then A^k~v / ||A^k~v|| should converge to an eigenvector associated with it for a generic initial vector ~v. This is quite clear if A is diagonal, and in general we can either use the Jordan normal form (there is a unique Jordan block, of size 1, for λ_1) or we can wiggle the matrix elements a bit to reduce to the diagonal case. A minimal sketch follows.
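Here is that sketch in Python (the starting vector and iteration count are arbitrary choices, and the Rayleigh quotient is used as the eigenvalue estimate):

```python
import numpy as np

def power_method(A, num_iters=50, seed=0):
    # Return an approximate dominant eigenvalue/eigenvector pair of A.
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(A.shape[0])  # a generic starting vector
    for _ in range(num_iters):
        v = A @ v
        v /= np.linalg.norm(v)           # normalize to avoid overflow/underflow
    return v @ A @ v, v                  # Rayleigh quotient (v is a unit vector), eigenvector

A = np.array([[0.0, 1.0],
              [1.0, 1.0]])
lam, v = power_method(A)
print(lam)  # approximately (1 + sqrt(5))/2 = 1.618..., the dominant eigenvalue
```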
Example. For the Fibonacci example, the above discussion shows that

F_n / F_{n−1} → (1 + √5)/2

as n → ∞, which is clear from the formula we wrote down. In fact, the power method comes down to computing the Fibonacci numbers and taking the above ratio every time. The error in the approximation is around O(0.38^n), so we have a linear algorithm (we need to take the log because the correct measure of size is the number of correct digits).
Example. The power method does not converge exponentially if λ_1 is a multiple root. It may not even converge if there is more than one eigenvalue with modulus equal to the spectral radius. Here are two easy examples:

A = [1 1; 0 1], ~v = [0; 1]  ⟹  A^n~v / ||A^n~v|| = (1/√(n^2 + 1)) [n; 1] → [1; 0] with error O(n^{−2})

A = [0 1; −1 0], ~v = [0; 1]  ⟹  A^n~v / ||A^n~v|| is periodic.
In general, the eigenvalue contributing to the spectral radius may not be real, in which case we have at
least two eigenvalues on the spectral circle (this does not happen if A is symmetric, as we will see in the
next chapter). Moreover, for large matrices, the eigenvalues may be clumped together, so we have a bad
convergence rate. This is fixed using shifts.
There are two ideas involved:
1. If we perform the power method on A^{-1}, we can isolate the eigenvalue of A of smallest modulus.

2. If we have a good estimate µ of an eigenvalue of A, then the smallest eigenvalue of A − µI is much smaller in modulus than its other eigenvalues.
We use them in conjunction to get the inverse power method with shifts.
1. Find a good estimate µ for an eigenvalue of A.
2. Solve the equation (A − µI)~v_{k+1} = ~v_k.

3. Normalize ~v_{k+1} and decide if we are close enough.
4. If not, update the estimate µ and repeat from (2).
There are various schemes for the initial guess and for updating the estimate. A good implementation will usually get a cubic convergence rate. To speed up the second step, we can pre-process A to introduce zeros, typically by iteratively replacing A by QAQ^{-1} for some orthogonal matrix Q. This can be done using the same strategy we used for the QR factorization.
QR-based algorithms This is based on the following simple observation: if A = QR is its QR factorization, then RQ is similar to A, since

RQ = Q^{-1}(QR)Q = Q^{-1}AQ

The basic QR eigenvalue algorithm consists of repeatedly performing this. At first glance, there is no reason that this should lead anywhere, but in fact the entries below the diagonal tend to converge to 0, so eventually we end up with an upper triangular matrix (if all eigenvalues are real) or a block upper triangular matrix with small blocks. The reason is that we are still secretly performing a bunch of power iterations together.

The same shifting idea can still be applied to dramatically speed up convergence. If we pre-process A so that it has a good placement of zeros, then the QR factorization steps are usually O(n^2), so they are not as computationally expensive as they may appear. A bare-bones sketch of the unshifted iteration is below.
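The sketch (Python; no Hessenberg pre-processing and no shifts, so convergence is slow but visible; the test matrix is the TST matrix D^{(3)}_{2,1} from Section 6):

```python
import numpy as np

def qr_iteration(A, num_iters=200):
    # Repeatedly replace A by RQ; each step produces a matrix similar to A.
    for _ in range(num_iters):
        Q, R = np.linalg.qr(A)
        A = R @ Q
    return A

A = np.array([[2.0, 1.0, 0.0],
              [1.0, 2.0, 1.0],
              [0.0, 1.0, 2.0]])          # the TST matrix D^{(3)}_{2,1}
T = qr_iteration(A)
print(np.round(np.diag(T), 6))           # approaches the eigenvalues 2 + 2cos(k*pi/4)
print(np.round(np.sort(np.linalg.eigvalsh(A))[::-1], 6))   # reference values
```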
