
Notes on Eigenvalues, eigenvectors, and diagonalization

Dr. Abhay Kumar Singh

These notes give an introduction to eigenvalues, eigenvectors, and diagonalization, with an emphasis on the
application to solving systems of differential equations. The key point is that, when you solve such systems, the
linear algebra really does the “heavy lifting”, and calculus is only necessary for one easy step of the process.
The first two sections focus on the big picture of why you would care about eigenvalues, eigenvectors, and
diagonalization, and Section 3 explains how to actually find eigenvalues and eigenvectors. Depending on how
you like to learn math, you might prefer to read Section 3 first, and then come back to the first two sections
afterward.

1 Big picture: Systems of linear differential equations

1.1 Describing systems of linear differential equations in vector form

The main motivation for eigenvalues and eigenvectors is their application in solving systems of linear differen-
tial equations. An example of a system of linear differential equations is

x1' = 2x1 + 3x2,
x2' = x1 + 4x2.

In this type of equation, x1 and x2 are both supposed to be functions of t, and you’re trying to find what x1 and
x2 can be, given that they need to satisfy the above equations.
What makes this kind of system hard to solve is the fact that the variables x1 and x2 interact — the derivative
of x1 depends on x2 , and the derivative of x2 depends on x1 — so there’s a feedback mechanism that’s difficult
to resolve. But we will see that linear algebra can be used to reduce the system to a simpler one, where the
feedback mechanism doesn’t exist.
First, observe that we can express the above system in vector form. If we let

x = [x1; x2]   and   A = [2 3; 1 4],

then the system of equations can be written as

x' = Ax.

More generally, any system of linear differential equations (at least those that are homogeneous, i.e. that don’t
contain any constant terms) can be written in the form x' = Ax for some square matrix A.
1.2 Similar matrices

Our approach to solving systems of linear differential equations will utilize the concept of matrix similarity:

Definition. Two square matrices A and B are called similar if there is an invertible matrix P such that P−1 AP =
B (or, equivalently, A = PBP−1 ).

There is a sense in which you could say that two similar matrices are “essentially the same.” Specifically,
since P is invertible, you could view it as just being a “change of perspective,” or more formally, a change of
basis. If P−1 AP = B, then that means that doing the transformation B is the same thing as doing P (changing
perspective), then doing A, and then doing P−1 (changing back to the original perspective).
Note that, if you know that P is invertible, then it’s good enough to check that AP = PB. It is helpful to think of
the following diagram:
             B
    Rn <----------- Rn
     |               |
   P |               | P
     v               v
    Rn <----------- Rn
             A
The equation P−1 AP = B means that you could start in the upper right-hand corner and use B to go to the upper
left-hand corner, or you could use P to go down, A to go left, and then P−1 to go up, and the result should be
the same either way. Alternatively, the equation AP = PB means that both ways of going from the upper right
corner to the lower left corner should produce the same result.
I’ll emphasize that similarity is a very strong relationship, in contrast to row-equivalence, which is a very weak
relationship. If you pick two matrices at random they will most likely not be similar, whereas they most likely
will be row-equivalent. If two matrices are similar, they will have many properties in common. For example,
they will have the same determinant, since

det A = det P det B det P−1 = det B

(remember det P−1 = 1/ det P, so you can cancel.) We will later discover more common properties.
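If you like, you can watch this invariant numerically. The following short Python sketch is my addition (not part of the original notes) and assumes the numpy library: it builds a matrix similar to A out of a randomly chosen P and compares determinants.

# Sketch: det(A) = det(B) whenever B = P^(-1) A P.  Assumes numpy is installed.
import numpy as np

rng = np.random.default_rng(0)
A = np.array([[2.0, 3.0], [1.0, 4.0]])      # the matrix from Section 1.1
P = rng.standard_normal((2, 2))             # a random P is almost surely invertible
B = np.linalg.inv(P) @ A @ P                # a matrix similar to A

print(np.linalg.det(A), np.linalg.det(B))   # both print 5.0 (up to rounding)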

1.3 Similarity and systems of linear differential equations

Let's suppose you want to solve the system x' = Ax for some matrix A. And suppose that A is similar to another
matrix B = P−1 AP which is somehow “nicer” (we’ll soon clarify what “nicer” might mean).
You can view the invertible matrix P as giving a change of variables

x = Py, or equivalently y = P−1 x.

This is actually just a fancy case of a u-substitution (like in calculus), which we can use to rewrite the equation
x' = Ax in terms of y.
Since P doesn’t depend on t, we can treat it as a constant and “pull it out” when we take the derivative of x:

x' = Py'.

Now we can substitute x = Py and x' = Py' into the equation x' = Ax, to get

Py' = APy.

If we multiply both sides of the equation by P−1 , we get

y' = P−1 APy = By.

The conclusion is that, if A is similar to B, then the change of variables x = Py turns the system x' = Ax
into the system y' = By. This means we can (potentially) use the following strategy to solve the equation
x' = Ax:
1. First solve the equation y' = By.
2. Then multiply those solutions by P to solve for x.
Of course, this strategy is only useful if it’s somehow easier to solve the equation involving B than it is to solve
the equation involving A. So we arrive at the following question:
Given a square matrix A, how can we find a similar matrix B for which the corresponding system of
equations is easier to solve?
Everything that we will do from here on will be motivated by trying to answer this question.

2 Diagonalization

2.1 Diagonal matrices and decoupled systems

The easiest systems to solve are the ones where the matrix is diagonal, meaning that the only nonzero entries
are along the diagonal. In the 2 × 2 case, such a matrix is of the form

D = [λ1 0; 0 λ2],

and the corresponding system of differential equations y' = Dy is

y1' = λ1 y1,
y2' = λ2 y2.

What’s great about this system is that it is decoupled — there’s an equation for y1 and an equation for y2 , and the
two variables don’t interact with each other. In fact, these equations are simple exponential growth equations,
and the solutions are

y1 = c1 e^{λ1 t},
y2 = c2 e^{λ2 t}.

(Note: This is the only step in the whole process that requires calculus.)

2.2 Diagonalization

Suppose that you are given an arbitrary n × n matrix A. The discussions in Sections 1.3 and 2.1 motivate the
following question: Can we find a diagonal matrix D that is similar to A?

Let’s try to figure out what this would involve. We’ll use the diagram

             D
    Rn <----------- Rn
     |               |
   P |               | P
     v               v
    Rn <----------- Rn
             A
If we start with e1 in the upper right-hand corner, then we could multiply by D and get
De1 = λ1 e1 .
Alternatively, we could multiply by P. We don’t know what P is yet, so let’s just say that the columns of P are
some yet-undetermined vectors v1 , . . . , vn . Then, when we multiply e1 by P, we’ll get the first column:
Pe1 = v1 .
Also, since λ1 is a scalar, we can pull it out when computing PDe1 :
PDe1 = P(λ1 e1 ) = λ1 P(e1 ) = λ1 v1 .
This is enough information to fill in all four corners of the diagram:

                       D
    λ1 e1 <--------------- e1
      |                     |
    P |                     | P
      v                     v
    λ1 v1 = Av1 <---------- v1
                       A
The key point here is that the only way to make the diagram “work” is if Av1 = λ1 v1 . This equation is called
the eigenvalue equation:

Definition. An eigenvector of a square matrix A is a nonzero vector v such that

Av = λv

for some scalar λ. We then say that λ is the eigenvalue associated to v.

A more thorough discussion of eigenvalues and eigenvectors is in the next section. But for now, let’s just use
the concept to complete the discussion of diagonalizability.
Using the “eigen” terminology, we would say that, to make the above diagram “work”, v1 needs to be an
eigenvector, and λ1 needs to be the corresponding eigenvalue. And if we started with e2 in the upper right-hand
corner, we could conclude that v2 needs to be an eigenvector, etc.
We need to have n eigenvectors v1 , . . . , vn to form the matrix P, and since P needs to be invertible, these
eigenvectors need to form a basis. But that’s all we need. This solves the problem (in theory, at least):

Theorem 2.1. If {v1, . . . , vn} is a basis of eigenvectors for an n × n matrix A, then you can form the matrix
P = [v1 · · · vn], and it will automatically be true that P−1 AP = D, where D is the diagonal matrix whose
entries are the corresponding eigenvalues:

D = diag(λ1, . . . , λn), i.e. the matrix with λ1, . . . , λn down the diagonal and zeros everywhere else.

I’ll emphasize that this theorem doesn’t necessarily give a positive answer to the question at the beginning of the
section. The issue is that, depending on what A is, it may or may not be possible to find a basis of eigenvectors.
In later sections, we’ll go through all the different possibilities for what can happen.
Terminology: The process of finding the P and the D such that P−1 AP = D is called diagonalization. If it is
possible to diagonalize A (in other words, if there exists a basis of eigenvectors), then you would say that A is
diagonalizable.
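Theorem 2.1 is easy to test numerically. Here is a short Python sketch (my addition; it assumes numpy) that asks numpy for a basis of eigenvectors of the matrix A from Section 1.1, builds P from them, and checks that P−1 AP really comes out diagonal. (Section 3 explains how to find eigenvalues and eigenvectors by hand.)

# Sketch: build P from eigenvectors and check that P^(-1) A P is diagonal.
import numpy as np

A = np.array([[2.0, 3.0], [1.0, 4.0]])
eigenvalues, P = np.linalg.eig(A)    # columns of P are eigenvectors of A
D = np.linalg.inv(P) @ A @ P

print(eigenvalues)                   # 5 and 1 (in some order)
print(np.round(D, 10))               # diagonal, with the eigenvalues on the diagonal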

3 Eigenvalues and eigenvectors

3.1 Geometric meaning of the eigenvalue equation

Recall that, if A is a square matrix, then an eigenvector of A is a nonzero vector v such that

Av = λv

for some scalar λ. In this case, λ is called an eigenvalue. Note that λ is allowed to be 0, but v isn’t.
Geometrically, the eigenvalue equation says that, if v is an eigenvector, then when you multiply it by A, the
result lies on the line spanned by v. It might be that Av is longer than v or shorter than v. It might be that Av
is pointing in exactly the opposite direction to v (in this case λ is negative), and it might even be that Av is the
zero vector (in this case λ = 0).

3.2 How to solve for eigenvalues and eigenvectors

To solve the eigenvalue equation, it is helpful to rewrite it in the following way:

(A − λI)v = 0.

(To get here, we moved the λv term over to the left and used the fact that λI is the matrix that “implements”
scalar multiplication by λ.) This equation simply says that v is in the nullspace of A − λI.
One way to know whether or not there exists a nonzero vector in the nullspace of a square matrix is to take
its determinant; if the determinant is 0, then there does exist a nonzero vector in the nullspace, and if the
determinant is nonzero, then the nullspace is trivial. Applying this principle to the equation (A − λI)v = 0, we
conclude that a nonzero solution v exists exactly when det(A − λI) = 0.
This fact gives us a relatively straightforward (but sometimes tedious!) procedure to find all the eigenvalues
and eigenvectors of A:
1. Compute det(A − λI), with λ being treated as undetermined. The result will be a polynomial in the
variable λ. This polynomial is called the characteristic polynomial of A.
2. Factor the characteristic polynomial. The roots are the eigenvalues. (They are precisely the values of λ
for which the eigenvalue equation has a nonzero solution v.)
3. For each eigenvalue λ, compute the nullspace of A − λI. The nonzero vectors in the nullspace are the
eigenvectors for eigenvalue λ, and the entire nullspace is referred to as the eigenspace for λ. Usually, you
will want to find a basis for every eigenspace.
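If you want to see these three steps carried out by a computer algebra system, here is a sketch (my addition, assuming the sympy library) applied to the matrix A = [2 3; 1 4] that is worked by hand in Example 3.1 below.

# Sketch: the three-step procedure of Section 3.2, done symbolically with sympy.
from sympy import Matrix, symbols, factor, eye

lam = symbols('lambda')
A = Matrix([[2, 3], [1, 4]])

# Step 1: the characteristic polynomial det(A - lambda*I).
p = (A - lam * eye(2)).det()
print(factor(p))                       # (lambda - 1)*(lambda - 5), up to ordering

# Step 2: the roots of the characteristic polynomial are the eigenvalues.
# (Here we can read them off from the factorization: 5 and 1.)

# Step 3: each eigenspace is the nullspace of A - lambda*I.
for ev in [5, 1]:
    print(ev, (A - ev * eye(2)).nullspace())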

3.3 Let’s go crazy with examples

There are a lot of examples below, because there are a lot of subtle issues that can arise in the details. I highly
recommend that you go over each example carefully, because each one illustrates a different issue. The best
thing would be for you to follow along by doing the computations yourself.
Example 3.1. Problem: Find the eigenvalues and corresponding eigenspaces of the matrix

A = [2 3; 1 4].

Solution: The characteristic polynomial is

det(A − λI) = det([2−λ 3; 1 4−λ]) = (2 − λ)(4 − λ) − 3.
At this point, you would usually expand the parentheses and refactor. (There are some cases where other tricks
can be employed; I’ll demonstrate these in later examples.) In this case, you get

8 − 6λ + λ² − 3 = λ² − 6λ + 5 = (λ − 5)(λ − 1),

so the eigenvalues are λ = 5 and λ = 1.


The next step is to solve for the eigenvectors. For λ = 5, we have

A − 5I = [−3 3; 1 −1].

Note that both rows are multiples of each other — this is to be expected in the 2 × 2 case, because we picked λ
to make the matrix “degenerate”.
The eigenspace is just the space of solutions to the homogeneous equation for this matrix. I’m going to assume
you're good at solving homogeneous equations by now, so I'll leave out the details for this step. The solution
space (hence the eigenspace for λ = 5) is the span of [1; 1].
Similarly, for λ = 1, we have

A − 1I = [1 3; 1 3],

and the eigenspace is the span of [−3; 1]. ♦
Example 3.2. Problem: Find the eigenvalues and corresponding eigenspaces for the matrix

A = [2 3; 3 2].

Solution: The characteristic polynomial is

det(A − λI) = det([2−λ 3; 3 2−λ]) = (2 − λ)² − 9.
It’s still fine to expand the parentheses and refactor if you want. But here is a slightly more fancy approach you
can use in this case:
(2 − λ)² − 9 = 0,
(2 − λ)² = 9,
2 − λ = ±3,
so the eigenvalues are λ = 2 ± 3 (in other words, λ = 5 and λ = −1).
For λ = 5, we have

A − 5I = [−3 3; 3 −3],

and the eigenspace is the span of [1; 1].

For λ = −1, we have

A − (−1)I = [3 3; 3 3],

and the eigenspace is the span of [−1; 1]. ♦
Example 3.3. Problem: Find the eigenvalues of the matrix

A = [1 3; 3 4].

Solution: The characteristic polynomial is

det(A − λI) = det([1−λ 3; 3 4−λ]) = (1 − λ)(4 − λ) − 9.

If you expand everything, you get


λ² − 5λ − 5.
Remember that if you’re having trouble factoring, you can always rely on the quadratic formula, which tells
you that the roots are

λ = (5 ± √(25 + 20))/2 = (5 ± √45)/2 = (5 ± 3√5)/2,
which unfortunately can’t be simplified any further.
The process for finding the corresponding eigenspaces is the same as the other examples, although the square
roots will make the numbers a bit ugly. (I wouldn’t ask you to do it by hand, but it’s good for you to know that
it can be done, in principle.) ♦
Example 3.4. Problem: Find the eigenvalues and corresponding eigenspaces for the matrix

A = [2 −1; 1 0].

Solution: The characteristic polynomial is

det(A − λI) = det([2−λ −1; 1 −λ]) = −λ(2 − λ) + 1 = λ² − 2λ + 1 = (λ − 1)²,

so there’s only one eigenvalue λ = 1.

Then

A − 1I = [1 −1; 1 −1],

and the eigenspace is the span of [1; 1]. ♦
Example 3.5. Problem: Find the eigenvalues and corresponding eigenspaces of the matrix

A = [3 −5; 1 −1].

Solution: The characteristic polynomial is

det(A − λI) = det([3−λ −5; 1 −1−λ]) = (3 − λ)(−1 − λ) + 5 = λ² − 2λ + 2.
This is another tricky one, so we can use the quadratic formula:

λ = (2 ± √(4 − 8))/2.
Uh oh! There’s a negative number under the square root, so this quadratic doesn’t have any real roots! That
means that A doesn’t have any real eigenvalues, and therefore it doesn’t have any real eigenvectors. ♦
And now for some 3 × 3 examples. In general, the 3 × 3 case is hard to do by hand (mostly because it can
be difficult to factor cubic polynomials by hand), but the following examples illustrate some cases where it’s
doable.
Example 3.6. Problem: Find the eigenvalues and corresponding eigenspaces for the matrix

A = [2 0 0; 1 4 0; 1 2 3].

Solution: This matrix is lower-triangular, and so is A − λI:

A − λI = [2−λ 0 0; 1 4−λ 0; 1 2 3−λ].

The determinant of a triangular matrix is just the product of the diagonal entries, which in this case is (2 −
λ)(4 − λ)(3 − λ). So the eigenvalues are λ = 2, 4, 3. Note that the eigenvalues are just the entries along the
diagonal — this is always the case for triangular matrices, and it’s fine to simply write “A is triangular, so the
eigenvalues are the diagonal entries 2, 4, 3.”
For λ = 2, we have

A − 2I = [0 0 0; 1 2 0; 1 2 1].
Typically, you will need to row-reduce in the 3 × 3 case. The reduced form is

[1 2 0; 0 0 1; 0 0 0],

so you end up with the equations

x1 = −2x2,
x2 free,
x3 = 0,

and the eigenspace is

[−2x2; x2; 0] = x2 [−2; 1; 0] = Span{[−2; 1; 0]}.
For λ = 4, we have

A − 4I = [−2 0 0; 1 0 0; 1 2 −1].

The reduced form is

[1 0 0; 0 2 −1; 0 0 0],
so you end up with the equations

x1 = 0,
x2 = (1/2)x3,
x3 free,

and the eigenspace is

[0; (1/2)x3; x3] = x3 [0; 1/2; 1] = Span{[0; 1/2; 1]} = Span{[0; 1; 2]}.
(Note that in the last step I just multiplied the vector by 2 so there wouldn’t be any fractions. You’re allowed to
do this because it doesn’t change the span.)
For λ = 3, we have

A − 3I = [−1 0 0; 1 1 0; 1 2 0].

The reduced form is

[1 0 0; 0 1 0; 0 0 0],
so you end up with the equations

x1 = 0,
x2 = 0,
x3 free,

and the eigenspace is

[0; 0; x3] = x3 [0; 0; 1] = Span{[0; 0; 1]}.


Example 3.7. Problem: Find the eigenvalues and corresponding eigenspaces for the matrix

A = [2 3 0; 1 4 0; 0 0 5].

Solution: The characteristic polynomial is

det(A − λI) = det([2−λ 3 0; 1 4−λ 0; 0 0 5−λ]).

Expand over the last row or last column to get

(5 − λ) · det([2−λ 3; 1 4−λ]) = (5 − λ)[(2 − λ)(4 − λ) − 3].

This polynomial is already partially factored, so just leave the (5 − λ) out there and factor the other part. That
part is just like in Example 3.1, and you'll end up with (5 − λ)²(1 − λ), so the eigenvalues are λ = 5 and λ = 1.
It’s worth remembering that 5 appears as a “double root”; this will turn out to have some significance.
For λ = 5, we have

A − 5I = [−3 3 0; 1 −1 0; 0 0 0].

The reduced form is

[1 −1 0; 0 0 0; 0 0 0],
so you end up with the equations

x1 = x2 ,
x2 , x3 free,

and the eigenspace is

[x2; x2; x3] = x2 [1; 1; 0] + x3 [0; 0; 1] = Span{[1; 1; 0], [0; 0; 1]}.
Note that the eigenspace is 2-dimensional here. This isn’t something that always happens, but it’s what you
hope for when you’ve got a double root.
For λ = 1, we have

A − 1I = [1 3 0; 1 3 0; 0 0 4].

The reduced form is

[1 3 0; 0 0 1; 0 0 0],
so you end up with the equations

x1 = −3x2 ,
x2 free,
x3 = 0,

and the eigenspace is

[−3x2; x2; 0] = x2 [−3; 1; 0] = Span{[−3; 1; 0]}.

3.4 Similar matrices and the characteristic polynomial

If you are given two square matrices A and B, one question you could ask is whether A and B are similar or
not. (Remember that similar matrices essentially describe the same operation up to a change of basis, so two
different people attempting to describe the same system might need to answer this question when they compare
their work.)
To prove that A and B are similar, you could just find a P such that P−1 AP = B, and that would do it. But what
if you thought A and B actually aren’t similar? The issue is that it isn’t good enough to say “I tried to find a P
and couldn’t do it.” You need to somehow prove that it isn’t possible to find P. One way to do this is to look
for things called “invariants” — quantities that similar matrices have in common. Here are some invariants that
you already know about:
• Rank: If A and B are similar, then rank(A) = rank(B). So if A and B have different ranks, then you would
know that they are not similar. Warning: Similar matrices can have different row-reduced forms, but they
should have the same number of leading variables.
• Nullity: If A and B are similar, then nullity(A) = nullity(B). (Although, because of the Rank-Nullity
Theorem, this essentially provides the same information as the rank.)
• Determinant: If A and B are similar, then det A = det B (The explanation was in Section 1.2). So, if A and
B have different determinants, then you would know that they are not similar.
But the above invariants aren’t all that sensitive, in the sense that there are a lot of examples of matrices that
have the same rank, nullity, and determinant, but still aren't similar. For example,

[1 1; 0 1],   [2 3; 1 2],   [3 1; 5 2]

all have determinant 1, rank 2, and nullity 0, but we will soon see that none of them is similar to any other.
The following theorem provides a much more powerful invariant.

Theorem 3.8. If A and B are similar, then A and B have the same characteristic polynomial.

Proof. I claim that, if A and B are similar, then A − λI and B − λI are similar as well. Specifically, if P−1 AP = B,
then
P−1 (A − λI)P = P−1 AP − λP−1 P = B − λI.
We know that similar matrices have the same determinant, so det(A − λI) = det(B − λI). In other words, the
characteristic polynomials of A and B are equal.

Corollary 3.9. If A and B are similar, then A and B have the same eigenvalues.

Warning: Similar matrices don’t necessarily have the same eigenvectors!


For the three matrices above, the characteristic polynomials are, respectively,

λ² − 2λ + 1,   λ² − 4λ + 1,   λ² − 5λ + 1.

The fact that they all have different characteristic polynomials qualifies as proof that they are not similar to each
other.
 
Warning: The characteristic polynomial still isn't a perfect invariant. For example, [1 1; 0 1] has the same
characteristic polynomial as the identity matrix, even though it isn't similar to the identity matrix.
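As a quick illustration of how this invariant gets used, here is a Python sketch (my addition, assuming sympy) that computes the characteristic polynomials of the three matrices above, and also checks the warning about [1 1; 0 1] and the identity.

# Sketch: characteristic polynomials of the matrices discussed in Section 3.4.
from sympy import Matrix, symbols, eye

lam = symbols('lambda')
for M in [Matrix([[1, 1], [0, 1]]),
          Matrix([[2, 3], [1, 2]]),
          Matrix([[3, 1], [5, 2]])]:
    print((M - lam * eye(2)).det().expand())
# lambda**2 - 2*lambda + 1, lambda**2 - 4*lambda + 1, lambda**2 - 5*lambda + 1:
# all different, so no two of these matrices are similar.

# The identity has the same characteristic polynomial as [1 1; 0 1] ...
print((eye(2) - lam * eye(2)).det().expand())    # lambda**2 - 2*lambda + 1
# ... even though the two are not similar (P^(-1) I P = I for every invertible P).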

4 More about diagonalization

4.1 When is a matrix diagonalizable?

Given a square matrix A, you could imagine two questions related to its diagonalizability. One would simply
ask, “Is A diagonalizable?” The other would actually ask you to diagonalize A (that is, find the P and D),
assuming that A is diagonalizable. In many cases, it’s possible to determine whether A is diagonalizable without
doing all the work required to actually diagonalize A. One of the facts that makes this possible is the following
theorem:

Theorem 4.1. Any set of eigenvectors that all correspond to different eigenvalues will automatically be lin-
early independent.

A simple conclusion you can draw from this theorem is the following:

Corollary 4.2. If an n × n matrix has n different eigenvalues, then you know for sure that it is diagonalizable.

Proof. Each eigenvalue has at least one associated eigenvector (otherwise it wouldn’t be an eigenvalue!). So, if
you had n different eigenvalues, then you could find one eigenvector for each eigenvalue. They will automati-
cally be linearly independent because of Theorem 4.1, so they form a basis of eigenvectors.

The point here is that you might not need to find the eigenvectors, if you are only asked to determine diagonaliz-
ability. If your characteristic polynomial fully factors without any double roots, then the matrix is diagonalizable
for sure.

Example 4.3. Problem: Is the matrix A = [2 3; 1 4] diagonalizable?
Solution: This is the same matrix in Example 3.1, so I won’t repeat the work here. But the point is that this
question doesn’t require you to do all the work we did in Example 3.1. As soon as you get to the point where
you know that the eigenvalues are λ = 5 and 1, then you can say “Yes, it is diagonalizable, and we know this
because there are two different eigenvalues”. ♦
Similarly, you can say that the matrices in Examples 3.2 and 3.3 are diagonalizable because they both have two
different eigenvalues. (Note that in Example 3.3 we didn’t bother finding any eigenvectors, but we didn’t need
to do so to determine diagonalizability.) Also, because the matrix in Example 3.6 is lower-triangular, you can
quickly see that the eigenvalues are λ = 2, 4, 3, which are all different, and conclude that it’s diagonalizable
without doing any work at all. Here’s a 4-dimensional example:
Example 4.4. Problem: Is the matrix

A = [1 4 −2 6; 0 4 6 2; 0 0 0 6; 0 0 0 2]

diagonalizable?
Solution: It is upper-triangular, so its eigenvalues are the numbers along the diagonal: 1, 4, 0, 2. Since all four
eigenvalues are different, you can conclude that it is diagonalizable. ♦
What about the case where you don’t have n different eigenvalues? This can actually happen in two different
ways:
• It’s possible that the characteristic polynomial doesn’t fully factor, like in Example 3.5. In this case,
the matrix won’t be diagonalizable, at least not in terms of real numbers. However, it may still be possible
to diagonalize it using complex numbers; we’ll see how this works (at least in the 2 × 2 case) in Section
5. But for now, let’s say that the answer is “No-with-an-asterisk”.
• It’s possible that the characteristic polynomial has a repeated root, like in Examples 3.4 and 3.7.
These cases are more tricky. In order for such a matrix to be diagonalizable, any eigenvalue that’s a
double root should have a 2-dimensional eigenspace, any eigenvalue that’s a triple root should have a
3-dimensional eigenspace, etc. So, in general, you will still have to do some work toward computing
eigenspaces, but you can restrict your attention to the eigenspaces for repeated-root eigenvalues. Some
examples are below.
Example 4.5. Problem: Is the matrix [2 −1; 1 0] diagonalizable?
Solution: This is the matrix from Example 3.4. As we found there, it has a double root λ = 1, and the corre-
sponding eigenspace is only 1-dimensional. So this matrix is not diagonalizable. In Section 6, we’ll learn about
the “next-best thing” to diagonalization for this one. ♦

Example 4.6. Problem: Is the matrix [2 3 0; 1 4 0; 0 0 5] diagonalizable?
Solution: This is the matrix from Example 3.7. As we saw there, λ = 5 is a double root, but luckily the
corresponding eigenspace is 2-dimensional. So we can conclude that this matrix is diagonalizable. Note that
you don’t need to compute the λ = 1 eigenspace in order to answer this question. ♦
Example 4.7. Problem: Is the matrix

A = [4 0 −1; 0 4 −1; 1 0 2]

diagonalizable?
Solution: The characteristic polynomial factors as (λ − 3)²(λ − 4) (I'll leave the details of that calculation
to you). Because λ = 3 is a double root, we need to check whether or not the corresponding eigenspace is
2-dimensional.
We have

A − 3I = [1 0 −1; 0 1 −1; 1 0 −1],

which row-reduces to

[1 0 −1; 0 1 −1; 0 0 0].
There is only one free variable, so the eigenspace is only 1-dimensional. We can conclude that the matrix is not
diagonalizable. ♦
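If you want to double-check that dimension count, here is a small sketch (my addition, assuming numpy): the dimension of the eigenspace is the nullity of A − 3I, which is 3 minus its rank.

# Sketch: dimension of the lambda = 3 eigenspace of the matrix in Example 4.7.
import numpy as np

A = np.array([[4.0, 0.0, -1.0],
              [0.0, 4.0, -1.0],
              [1.0, 0.0,  2.0]])
nullity = 3 - np.linalg.matrix_rank(A - 3 * np.eye(3))
print(nullity)    # 1: the eigenspace is 1-dimensional, so A is not diagonalizable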

4.2 Examples of diagonalization

Let’s look again at the examples from Section 3.3. For each of the matrices, we’ll answer the following question:
Problem: If A is diagonalizable, find an invertible matrix P and a diagonal matrix D such that P−1 AP = D.
We already did the necessary work to answer this question in Section 3.3, so it’s just a matter of presenting it
in the appropriate way. The idea is that P is made up out of a basis of eigenvectors, and D is made up out of the
corresponding eigenvalues.
• Example 3.1: P = [1 −3; 1 1],   D = [5 0; 0 1].

• Example 3.2: P = [1 −1; 1 1],   D = [5 0; 0 −1].

• (Example 3.3 is skipped because who wants to compute the eigenvectors for that)

• Example 3.4: Not diagonalizable
• Example 3.5: Not diagonalizable (at least in terms of real numbers)
• Example 3.6: P = [−2 0 0; 1 1 0; 0 2 1],   D = [2 0 0; 0 4 0; 0 0 3].

• Example 3.7: P = [1 0 −3; 1 0 1; 0 1 0],   D = [5 0 0; 0 5 0; 0 0 1].

4.3 Using diagonalization to solve systems of differential equations

Finally, we’re ready to put the strategy outlined in Section 1.3 into action. If a matrix A is diagonalizable, then
you can solve the system x' = Ax with the following steps:
1. Diagonalize A. That is, find an invertible matrix P and a diagonal matrix D such that P−1 AP = D. Doing
this involves the following substeps:
(a) Find the eigenvalues of A.
(b) For each eigenvalue, find the eigenspace, and express it as the span of a set of linearly independent
vectors. (The number of vectors in the set should be equal to the multiplicity of the eigenvalue,
which more often than not is just one.)
(c) The matrix P is made up out of all of the spanning vectors you found for the eigenspaces, and D is
made up out of the corresponding eigenvalues.
2. Solve the system y' = Dy. This is easy to do because the system is decoupled.
3. The solutions to the original system are given by x = Py.
Example 4.8. Problem: Find all solutions to the system of differential equations
x1' = 2x1 + 3x2,
x2' = x1 + 4x2.

Solution: In this case, A is the matrix in Example 3.1, so we've already done step 1. We got

P = [1 −3; 1 1],   D = [5 0; 0 1].

For step 2, we consider the system y' = Dy, or

y1' = 5y1,
y2' = y2.

The solutions are simply y1 = c1 e^{5t} and y2 = c2 e^t (remember that in general, each variable is allowed to have a
different constant), which can be written in vector form as

y = [c1 e^{5t}; c2 e^t].

For step 3, we just need to multiply by P to get x:

x = Py = [1 −3; 1 1][c1 e^{5t}; c2 e^t] = [c1 e^{5t} − 3c2 e^t; c1 e^{5t} + c2 e^t].

It's common to split the solution into the parts involving c1 and c2, and to pull out the exponential scalars, like this:

x = c1 e^{5t} [1; 1] + c2 e^t [−3; 1].
And that’s the final answer. ♦
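As a sanity check on that answer (my addition, assuming numpy), you can plug in arbitrary values of c1, c2 and t, compute x and its derivative from the formula, and confirm that x' = Ax holds.

# Sketch: numerically verify that x = c1 e^{5t} [1; 1] + c2 e^t [-3; 1] solves x' = Ax.
import numpy as np

A = np.array([[2.0, 3.0], [1.0, 4.0]])
c1, c2, t = 0.7, -1.3, 0.5                  # arbitrary test values

v1, v2 = np.array([1.0, 1.0]), np.array([-3.0, 1.0])
x  = c1 * np.exp(5 * t) * v1 + c2 * np.exp(t) * v2
dx = 5 * c1 * np.exp(5 * t) * v1 + c2 * np.exp(t) * v2   # derivative of the formula

print(np.allclose(dx, A @ x))               # True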
Observe that the vectors in the final answer of Example 4.8 are the eigenvectors. This is not a coincidence.
Here’s the general explanation. Suppose that A is a 2 × 2 matrix, and that v1 and v2 are linearly independent
eigenvectors with eigenvalues λ1 and λ2, so that

P = [v1 v2],   D = [λ1 0; 0 λ2].

The solutions to y' = Dy can be written as

y = [c1 e^{λ1 t}; c2 e^{λ2 t}] = c1 e^{λ1 t} e1 + c2 e^{λ2 t} e2.

Then, because P sends e1 to v1 and e2 to v2, the solutions to x' = Ax are

x = Py = c1 e^{λ1 t} v1 + c2 e^{λ2 t} v2.

If you’ve solved linear systems of equations in any of your other classes, you were probably just taught to use
this last formula as the “rule” for solving a system. Now you know where that rule comes from.

5 Complex eigenvalues

5.1 Diagonalization over the complex numbers

As we observed in Section 4.1, one way that a matrix might fail to be diagonalizable is that its characteristic
polynomial might not fully factor. In the 2 × 2 case, this happens when the quadratic formula gives you a
negative quantity under the square root. We saw that this happened in Example 3.5.
One way to get around this problem is to expand your universe to include complex numbers. This is a more
dramatic shift than you might realize — it means that everything should be allowed to be complex. The
scalars can be complex numbers instead of real numbers, the vectors can have complex entries (so the space is
Cn instead of Rn ), and the matrices can have complex entries also. Conceptually, this is a big deal.
But from a practical point of view, it’s actually not that much of a change. Everything works exactly the same
way as before, except that all the arithmetic involves complex numbers.
Example 5.1. Problem: Consider the matrix

A = [1 −2; 1 3].

Using complex numbers, find the eigenvalues and corresponding eigenspaces of A.

Solution: The characteristic polynomial is
(1 − λ)(3 − λ) + 2 = λ² − 4λ + 5.

Since √(−4) = 2i, the quadratic formula tells us that the roots are

λ = (4 ± √(16 − 20))/2 = (4 ± 2i)/2 = 2 ± i.
Again, the eigenspaces are the nullspaces of A − λI. For λ = 2 + i, we get

A − (2 + i)I = [−1−i −2; 1 1−i].
Now, (just like in the case of real eigenvalues) there should be a free variable here, which means the two rows
should be multiples of each other. And they are, but it’s less obvious than it is with real numbers (in this case,
the first row is −1 − i times the second row). But as long as you’re sure you found your eigenvalues correctly,
it’s safe to assume that the rows are equivalent, and you can take either one of them and use the corresponding
equation to determine the eigenspace. In this case, the second row is a little bit nicer, because it has the 1 in it.
Its corresponding equation is
x1 + (1 − i)x2 = 0,
so
x1 = (−1 + i)x2
and x2 is free, and the eigenspace is

[(−1 + i)x2; x2] = x2 [−1 + i; 1] = Span{[−1 + i; 1]}.

In principle, the next step would be to find the eigenspace for the other eigenvalue λ = 2 − i. But actually,
we can get this without doing any more work (see Theorem 5.2 below); it is simply the conjugate of the other
eigenspace:

Span{[−1 − i; 1]}.

The following theorem explains the last step in Example 5.1.

Theorem 5.2. If a matrix has all real entries, then its eigenvalues and eigenvectors come in conjugate pairs.
In other words, if x is an eigenvector with eigenvalue λ, then its conjugate x̄ is an eigenvector with the
conjugate eigenvalue λ̄.

Proof. If x is an eigenvector with eigenvalue λ, then it satisfies the eigenvalue equation


Ax = λx.
If we take the conjugate of both sides of this equation, then everything gets conjugated:

Āx̄ = λ̄x̄.

Since all of the entries of A are real, it is actually equal to its own conjugate: A = Ā. Therefore we see that

Ax̄ = λ̄x̄,

which says that x̄ is an eigenvector with eigenvalue λ̄.

This theorem makes diagonalization not too difficult in the case of complex roots. Here are a couple of examples
that carry it out.
Example 5.3. Problem: Consider the matrix

A = [3 −5; 1 −1].

Using complex numbers, find an invertible matrix P and a diagonal matrix D such that P−1 AP = D.
Solution: This is the matrix from Example 3.5. We found there that the characteristic polynomial is λ² − 2λ + 2,
so the quadratic formula gives

λ = (2 ± √(4 − 8))/2 = (2 ± 2i)/2 = 1 ± i.

For λ = 1 + i, we have

A − (1 + i)I = [2−i −5; 1 −2−i].
Both rows should be equivalent, so let’s use the equation for the second row because it has a 1:

x1 + (−2 − i)x2 = 0,

x1 = (2 + i)x2 ,
so the eigenspace is Span{[2 + i; 1]}.
Because of Theorem 5.2, we don't have to do any further work to know that the eigenspace for the conjugate
λ̄ = 1 − i is Span{[2 − i; 1]}.
As usual, P is made up out of a basis of eigenvectors, and D is made up out of the corresponding eigenvalues:

P = [2+i 2−i; 1 1],   D = [1+i 0; 0 1−i].
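Complex matrices are no obstacle for a computer, so this is easy to verify numerically. The sketch below is my addition (assuming numpy, which handles complex entries natively).

# Sketch: check P^(-1) A P = D for the complex diagonalization of Example 5.3.
import numpy as np

A = np.array([[3.0, -5.0], [1.0, -1.0]])
P = np.array([[2 + 1j, 2 - 1j], [1, 1]])
D = np.linalg.inv(P) @ A @ P

print(np.round(D, 10))    # diag(1+i, 1-i), with off-diagonal entries numerically zero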


Example 5.4. Problem: Diagonalize the matrix

A = [3 −2; 4 −1]

over the complex numbers.


Solution: The characteristic polynomial is

(3 − λ)(−1 − λ) + 8 = λ² − 2λ + 5.

The roots are λ = (2 ± √(4 − 20))/2 = (2 ± 4i)/2 = 1 ± 2i. For λ = 1 + 2i, we get

A − (1 + 2i)I = [2−2i −2; 4 −2−2i].

Both rows should be equivalent, so let’s use the equation in the first row:
(2 − 2i)x1 = 2x2 ,
so we can take x1 = 2 and x2 = (2 − 2i) (note the "cross-multiplication trick"). So the eigenspace is

Span{[2; 2 − 2i]}.
For the conjugate eigenvalue λ̄ = 1 − 2i, the eigenspace is the conjugate

Span{[2; 2 + 2i]},
so we can set P = [2 2; 2−2i 2+2i] and D = [1+2i 0; 0 1−2i]
and know that P−1 AP = D. ♦

5.2 Solving systems with complex eigenvalues

Suppose you wanted to solve the system x' = Ax, where A is a 2 × 2 matrix with complex eigenvalues. In this
case, the method described in Section 4.3 still works, but the problem is that you’ll get complex solutions, even
though you are most likely only looking for real solutions.
To illustrate what happens, let’s use the matrix A from Example 5.3. In that case, the solutions to the equation
y' = Dy are

y = [a1 e^{(1+i)t}; a2 e^{(1−i)t}] = a1 e^{(1+i)t} e1 + a2 e^{(1−i)t} e2,
so the solutions to the original equation are
x = Py
  = [2+i 2−i; 1 1] (a1 e^{(1+i)t} e1 + a2 e^{(1−i)t} e2)
  = a1 e^{(1+i)t} [2+i; 1] + a2 e^{(1−i)t} [2−i; 1].
The above is a linear combination of conjugate solutions, where the constants a1 and a2 are complex numbers.
I’ll skip the details, but with a little bit of complex arithmetic, you can conclude that the real solutions are linear
combinations of the real and imaginary parts of these solutions. So we need to find the real and imaginary parts.
Using Euler’s identity, we can rewrite the exponentials in terms of sine and cosine; in particular,
e^{(1+i)t} = e^t e^{it} = e^t (cos t + i sin t).
Then, if we focus on the part of the above solution that's multiplied by a1, we get

e^{(1+i)t} [2+i; 1] = e^t (cos t + i sin t) [2+i; 1] = e^t [2cos t + 2i sin t + i cos t − sin t; cos t + i sin t].

The real part of this is

e^t [2cos t − sin t; cos t],

and the imaginary part is

e^t [2sin t + cos t; sin t],

so the general solution consists of all linear combinations of the two:

x = c1 e^t [2cos t − sin t; cos t] + c2 e^t [2sin t + cos t; sin t].
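If you want to confirm that these really do solve the original system, here is a symbolic check (my addition, assuming sympy).

# Sketch: verify that both real solutions above satisfy x' = Ax for A = [3 -5; 1 -1].
from sympy import Matrix, symbols, exp, cos, sin, simplify

t = symbols('t')
A = Matrix([[3, -5], [1, -1]])

x_re = exp(t) * Matrix([2 * cos(t) - sin(t), cos(t)])    # the "real part" solution
x_im = exp(t) * Matrix([2 * sin(t) + cos(t), sin(t)])    # the "imaginary part" solution

for x in (x_re, x_im):
    print(simplify(x.diff(t) - A * x))                   # the zero vector, both times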

6 Jordan form for 2 × 2 matrices

Even if a matrix has real eigenvalues, it might fail to be diagonalizable if the characteristic polynomial has
repeated roots, and if the corresponding eigenspaces aren’t big enough. In this case, you have to look for
something else.
Recall that our motivation for diagonalization was the fact that the system y' = Dy is decoupled when D is
diagonal, so it was easy to solve. At this point we need to search for other examples of “nice” matrices, for
which the corresponding system is solvable. Jordan matrices provide an example of such a matrix.
In general, the theory of Jordan matrices is somewhat complicated, but we’ll focus on the 2 × 2 case, which is
more reasonable for us to do in an introductory course.

6.1 Jordan matrices

Definition: a 2 × 2 Jordan matrix is a matrix that looks like

J = [λ 1; 0 λ]

for some number λ.

The corresponding system of differential equations y' = Jy is

y1' = λy1 + y2,
y2' = λy2.

This system isn’t completely decoupled like it is for diagonal matrices, but there’s still a clear path to solving
it. First, solve the equation for y2 :
y2 = c2 e^{λt},
and then plug that into the equation for y1 :

y1' = λy1 + c2 e^{λt}.

This equation is a little more tricky, but it can be solved using a technique called “integrating factors”. I’ll skip
the details and tell you the answer:
y1 = c1 e^{λt} + c2 t e^{λt}.

So, in vector form, the solutions are

y = e^{λt} [c1 + c2 t; c2] = e^{λt} (c1 e1 + c2 (te1 + e2)).

6.2 Jordanization

If A is a 2 × 2 matrix with only one eigenvalue λ, you could ask whether A is similar to the Jordan matrix

J = [λ 1; 0 λ].

Spoiler alert: it’s going to turn out that the answer is always yes (except for the boring case where A = λI). This
is good news — it means that we can always solve the corresponding systems.
The process for finding the P in this case is surprisingly easy. Here is a step-by-step process:
1. Pick a nonzero vector v that isn’t an eigenvector. There are two ways to handle this. One is to figure out
what the eigenvectors are, and then pick something different. The other way (which I would recommend)
is to just pick any vector v since, probabilistically, it's unlikely that a randomly chosen vector will be an
eigenvector. And, in the unlucky case that you do pick an eigenvector, you would be able to figure it out
in Step 2.
2. Let u = Av − λv. Important note: If you get u = 0 here, then this means that v is an eigenvector, and you
should go back and pick a different v.
3. Form P out of the vectors u and v: P = [u v].

And that’s it! It will automatically be true that P−1 AP = J. This process doesn’t have a good standard name
(people might say it “puts a matrix in Jordan form”), but we’ll call it Jordanization. The set of vectors {u, v} is
normally called a Jordan basis.
Of course, if you’re surprised, skeptical, and/or mystified by the process, that’s a good thing, since I just sprung
these instructions on you without any explanation for why they work. I’ll give the explanation below, but first
let’s demonstrate with an example:
Example 6.1. Problem: Consider the matrix

A = [5 2; −2 1].

Find an invertible matrix P such that P−1 AP is a Jordan matrix.


Solution: If you compute the eigenvalues of A, you'll find that it only has one eigenvalue λ = 3. (This is
important, because if there were two different eigenvalues, you would diagonalize instead of Jordanize.) We'll
go through the three steps described in Section 6.2.

Step 1 says to basically pick any vector at random. I'll pick v = [1; 0] because it's nice and simple.
Step 2 says to compute

u = Av − 3v = [5; −2] − [3; 0] = [2; −2].

(At this point, if you got u = 0, you would go back and pick a different v.)
Step 3 says to form P out of u and v (I'll stress that the order in which you put the vectors matters here):

P = [u v] = [2 1; −2 0],
 
and that's it. It will automatically be true that P−1 AP = [3 1; 0 3]. To check that it works, type this into WolframAlpha:
{{2,1},{-2,0}}^(-1) * {{5,2},{-2,1}} * {{2,1},{-2,0}}
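Or, if you would rather check it in Python (my addition, assuming numpy), the same computation looks like this:

# Sketch: the same check as the WolframAlpha line above, done with numpy.
import numpy as np

A = np.array([[5.0, 2.0], [-2.0, 1.0]])
P = np.array([[2.0, 1.0], [-2.0, 0.0]])

print(np.linalg.inv(P) @ A @ P)    # [[3. 1.] [0. 3.]], the expected Jordan matrix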
I'll point out that, because we were able to pick v almost completely randomly, the matrix P in this case is very
much non-unique. For example, if we had picked v = [1; 1], then we would have gotten

    
u = Av − 3v = [7; −1] − [3; 3] = [4; −4]

and

P = [u v] = [4 1; −4 1],
 
but it would still be true that P−1 AP = [3 1; 0 3], and that's the part that matters. ♦

6.3 Why does this strange process work?

To explain why the Jordanization process works, we need to figure out two things: First, how do we know that
P is invertible? Second, how do we know that P−1 AP = J?
Invertibility of P is equivalent to linear independence of the vectors u and v. So, why should they be linearly
independent? Well, if they weren’t, then that would mean that one of them is a scalar multiple of the other.
Then we would be able to write
u = µv
for some scalar µ. But, by definition, u = Av − λv, so this would mean

Av − λv = µv,
Av = (λ + µ)v,

which would mean that v is an eigenvector with eigenvalue λ + µ. This simply can't happen for µ ≠ 0 (since λ
is the only eigenvalue). It could happen for µ = 0, but that’s exactly the case where u = 0, which the process
tells you to avoid. So we can conclude that, if you follow the process, the vectors u and v that you’ll end up
with will be linearly independent.
To explain why P−1 AP = J, let’s return to the diagram that describes how similar matrices relate:

             J
    Rn <----------- Rn
     |               |
   P |               | P
     v               v
    Rn <----------- Rn
             A

Let’s start with e2 in the upper right corner. If you go to the upper left corner, you get

Je2 = e1 + λe2 .

Alternatively, if you go to the lower right corner, you get Pe2 , which is v (because v is the second column of
P). We can also compute
PJe2 = P(e1 + λe2 ) = u + λv.
This is enough information to fill in all four corners of the diagram:


                        J
    e1 + λe2 <-------------- e2
        |                     |
      P |                     | P
        v                     v
    u + λv = Av <------------ v
                        A

And you can see that, to make the diagram “work”, it’s necessary that u + λv = Av. This is why the instructions
tell you to set u = Av − λv.
But we also need to worry about what happens when we start with e1 in the upper right. In this case, the diagram
is
                  J
    λe1 <------------- e1
     |                  |
   P |                  | P
     v                  v
    λu = Au <---------- u
                  A
which tells us that u needs to be an eigenvector. It turns out that this happens automatically. There’s a theorem
called the Cayley-Hamilton Theorem that says that (in this particular situation) (A − λI)² = 0. This means that,
for any vector v,

(A − λI)² v = 0,
(A − λI)(Av − λv) = 0,
(A − λI)u = 0,

so u is an eigenvector. So that’s good, everything works the way it was supposed to, and we can conclude that
P−1 AP = J.
A quick note on Cayley-Hamilton: The general statement of the Cayley-Hamilton Theorem is that a matrix
“satisfies” its characteristic equation. In other words, if the characteristic polynomial of A is λ^n + b_{n−1} λ^{n−1} + · · · + b1 λ + b0, then

A^n + b_{n−1} A^{n−1} + · · · + b1 A + b0 I = 0.
A general proof of Cayley-Hamilton would be better left for a more advanced course, but it is straightforward
(albeit not very enlightening) to verify it in the 2 × 2 case. If

A = [a b; c d],

then the characteristic polynomial of A is λ² − (a + d)λ + (ad − bc). By direct calculation, you can check that

A² − (a + d)A + (ad − bc)I = 0.
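If you would rather not grind through that calculation by hand, here is a symbolic check (my addition, assuming sympy).

# Sketch: verify the 2x2 Cayley-Hamilton identity with symbolic entries.
from sympy import Matrix, symbols, simplify, eye

a, b, c, d = symbols('a b c d')
A = Matrix([[a, b], [c, d]])

print(simplify(A**2 - (a + d) * A + (a * d - b * c) * eye(2)))    # the zero matrix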

6.4 Using Jordanization to solve systems of differential equations

Example 6.2. Problem: Describe the solutions to the system of differential equations

x1' = 5x1 + 2x2,
x2' = −2x1 + x2.

Solution: The matrix for this system is the matrix A from Example 6.1. As we saw there, P−1 AP = J, where

P = [2 1; −2 0]   and   J = [3 1; 0 3].

Using the method described in Section 1.3, we can first solve the system y' = Jy and then set x = Py to obtain
solutions to the original system.
As we saw in Section 6.1, the solutions to y' = Jy are

y = e^{3t} [c1 + c2 t; c2] = e^{3t} (c1 e1 + c2 (te1 + e2)),

so the solutions to the original system are


        
x = Py = e^{3t} [2 1; −2 0][c1 + c2 t; c2] = e^{3t} [2c1 + 2c2 t + c2; −2c1 − 2c2 t] = c1 e^{3t} [2; −2] + c2 e^{3t} [2t + 1; −2t].


In the general situation, the solutions to y' = Jy are

y = e^{λt} [c1 + c2 t; c2] = e^{λt} (c1 e1 + c2 (te1 + e2)),

and the solutions to the equation x' = Ax are

x = Py = e^{λt} (c1 u + c2 (tu + v)).

If you’ve taken a course on differential equations, you might have learned a formula like this, but you probably
didn’t get a good explanation of where the formula came from. But now we’ve derived it from scratch.
