
Chapter 6

Linear Algebra

6.1 Linear Functions and Dynamical Systems


In this chapter, we will be studying linear functions in $n$ dimensions:
$$f : \mathbb{R}^n \longrightarrow \mathbb{R}^n$$
As we develop this subject, called linear algebra, we are always going to keep two applications in mind:

(1) discrete-time dynamical systems, where $f : \mathbb{R}^n \to \mathbb{R}^n$ is the function giving the next state as a function of the previous state:
$$(X_1, X_2, \dots, X_n)_{N+1} = f\big((X_1, X_2, \dots, X_n)_N\big)$$

(2) continuous-time differential equations, where $f$ is the vector field giving the change vector as a function of the state vector:
$$(X_1', X_2', \dots, X_n') = f\big((X_1, X_2, \dots, X_n)\big)$$
Notation
When we want to refer to a point in $\mathbb{R}^n$, that is, a vector, we will denote it by a single boldface letter, such as $\mathbf{X}$ and $\mathbf{Y}$:
$$\mathbf{X} = \begin{pmatrix} X_1 \\ X_2 \\ \vdots \\ X_n \end{pmatrix} \qquad \mathbf{Y} = \begin{pmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{pmatrix}$$

Note that we have started to write the vector $(X_1, X_2, \dots, X_n)$ vertically, using round parentheses. The vertical expression means exactly the same thing as the horizontal expression; the horizontal one is common in dynamical systems theory, and the vertical one is common in linear algebra.

6.2 Linear Functions and Matrices


Points and Vectors
We know that the state space of a dynamical system is $\mathbb{R}^n$, the space of all $n$-tuples $(X_1, X_2, \dots, X_n)$, with each $X_i$ belonging to $\mathbb{R}$. This is the view of state space we developed in Chapter 1: state space is the space of all possible values of the state vector. This is true for both state space and tangent space, both of which are $\mathbb{R}^n$. For example, in the Romeo–Juliet models, the state space $\mathbb{R}^2$ consists of all possible pairs $(R, J)$, where both $R$ and $J$ belong to $\mathbb{R}$, and the tangent space is also $\mathbb{R}^2$, the space of all possible pairs $(R', J')$, where both $R'$ and $J'$ belong to $\mathbb{R}$.
We also learned in Chapter 1 some elementary rules for manipulating vectors. We needed these rules, for example, in Euler's method, where we needed to multiply the change vector $\mathbf{X}'$ by the scalar $\Delta t$ to get a small change vector, and then we needed to add the small change vector to the current state vector to get the next state vector. These rules for scalar multiplication and vector addition are the rules we will need for operating in $\mathbb{R}^n$.
The space of all $n$-vectors $\mathbb{R}^n$, together with the rules for scalar multiplication and vector addition, is called an $n$-dimensional vector space. Note that the sum of two $n$-vectors is also an $n$-vector, and a scalar multiple of an $n$-vector is also an $n$-vector. So the operations of scalar multiplication and vector addition keep us in the same space.
In this chapter, we will learn about the properties of vector spaces and the linear functions that take $\mathbb{R}^n \to \mathbb{R}^k$, that is, take vectors in $n$-dimensional space (the domain) and assign to each of them a vector in $k$-dimensional space (the codomain). Most of the time, we will focus on the case $n = k$. To begin, let's recall the rules for operating with vectors from Chapter 1.
(1) If $\mathbf{X}$ and $\mathbf{Y}$ are two vectors in $\mathbb{R}^n$, then their sum is defined by
$$\mathbf{X} + \mathbf{Y} = \begin{pmatrix} X_1 \\ X_2 \\ \vdots \\ X_n \end{pmatrix} + \begin{pmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{pmatrix} = \begin{pmatrix} X_1 + Y_1 \\ X_2 + Y_2 \\ \vdots \\ X_n + Y_n \end{pmatrix}$$
(2) If $\mathbf{X}$ is a vector in $\mathbb{R}^n$ and $a$ is a scalar in $\mathbb{R}$, we define the multiplication of a vector by a scalar as
$$a\mathbf{X} = a\begin{pmatrix} X_1 \\ X_2 \\ \vdots \\ X_n \end{pmatrix} = \begin{pmatrix} aX_1 \\ aX_2 \\ \vdots \\ aX_n \end{pmatrix}$$
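These componentwise rules are exactly what numerical array libraries implement. Here is a minimal sketch in Python with NumPy (our illustration, not part of the text):

```python
import numpy as np

X = np.array([2.0, 0.0, 1.0])    # a vector in R^3
Y = np.array([1.0, 3.0, -1.0])

print(X + Y)     # vector addition, componentwise: [ 3.  3.  0.]
print(4 * X)     # scalar multiplication:          [ 8.  0.  4.]
```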

Exercise 6.2.1 Carry out the following operations, or say why they're impossible.
a) $\begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix} + \begin{pmatrix} -2 \\ 0 \\ 5 \end{pmatrix}$  b) $-3\begin{pmatrix} 4 \\ 6 \\ -9 \end{pmatrix}$  c) $\begin{pmatrix} 2 \\ 4 \end{pmatrix} + \begin{pmatrix} 1 \\ 3 \\ 5 \end{pmatrix}$
d) $5\left(\begin{pmatrix} 0 \\ 1 \end{pmatrix} + \begin{pmatrix} 7 \\ 3 \end{pmatrix}\right)$  e) $-4\begin{pmatrix} 1 \\ 0 \end{pmatrix} + 2\begin{pmatrix} 0 \\ 1 \end{pmatrix}$  f) $5\begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix} - 3\begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix} + 8\begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix}$

Bases and Linear Combinations


In $\mathbb{R}^n$ there is a certain set of vectors that play a special role. It is the set
$$\mathbf{e}_1 = \begin{pmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{pmatrix} \quad \mathbf{e}_2 = \begin{pmatrix} 0 \\ 1 \\ \vdots \\ 0 \end{pmatrix} \quad \cdots \quad \mathbf{e}_n = \begin{pmatrix} 0 \\ 0 \\ \vdots \\ 1 \end{pmatrix}$$
These $n$ vectors are a basis for $\mathbb{R}^n$, by which we mean that every vector $\mathbf{X}$ can be written uniquely as
$$\mathbf{X} = \begin{pmatrix} X_1 \\ X_2 \\ \vdots \\ X_n \end{pmatrix} = X_1\mathbf{e}_1 + X_2\mathbf{e}_2 + \cdots + X_n\mathbf{e}_n$$

To see why an arbitrary vector $\mathbf{X}$ can be represented uniquely in the $\{\mathbf{e}_1, \mathbf{e}_2, \dots, \mathbf{e}_n\}$ basis, recall that
$$\mathbf{X} = \begin{pmatrix} X_1 \\ X_2 \\ \vdots \\ X_n \end{pmatrix} = \begin{pmatrix} X_1 \\ 0 \\ \vdots \\ 0 \end{pmatrix} + \begin{pmatrix} 0 \\ X_2 \\ \vdots \\ 0 \end{pmatrix} + \cdots + \begin{pmatrix} 0 \\ 0 \\ \vdots \\ X_n \end{pmatrix}$$
by the rule of vector addition. This, in turn, means that
$$\mathbf{X} = X_1\begin{pmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{pmatrix} + X_2\begin{pmatrix} 0 \\ 1 \\ \vdots \\ 0 \end{pmatrix} + \cdots + X_n\begin{pmatrix} 0 \\ 0 \\ \vdots \\ 1 \end{pmatrix}$$
by the rule of multiplication of a vector by a scalar.
There are many such sets of vectors, giving us many bases for $\mathbb{R}^n$. This particular basis $\{\mathbf{e}_1, \mathbf{e}_2, \dots, \mathbf{e}_n\}$ is called the standard basis, but later in this chapter we will see other bases for $\mathbb{R}^n$.
For example, let's consider the 2D vector space $\mathbb{R}^2$ representing the juvenile ($J$) and adult ($A$) populations of some animal species. Then a point in $(J, A)$ space represents a certain number of juveniles and a certain number of adults. So the point $\begin{pmatrix} 5 \\ 10 \end{pmatrix}$ represents the state in which there are 5 juveniles and 10 adults. The standard basis for $\mathbb{R}^2$ is
$$\mathbf{e}_J = \begin{pmatrix} 1 \\ 0 \end{pmatrix} \qquad \mathbf{e}_A = \begin{pmatrix} 0 \\ 1 \end{pmatrix}$$
So we can write
$$\begin{pmatrix} 5 \\ 10 \end{pmatrix} = \begin{pmatrix} 5 \\ 0 \end{pmatrix} + \begin{pmatrix} 0 \\ 10 \end{pmatrix} = 5\begin{pmatrix} 1 \\ 0 \end{pmatrix} + 10\begin{pmatrix} 0 \\ 1 \end{pmatrix} = 5\mathbf{e}_J + 10\mathbf{e}_A$$
When we say that every vector $\mathbf{X}$ in $\mathbb{R}^n$ can be written uniquely as $X_1\mathbf{e}_1 + X_2\mathbf{e}_2 + \cdots + X_n\mathbf{e}_n$, note that the only operations we have used are scalar multiplication and vector addition. When we use only scalar multiplication and vector addition to combine a set of vectors, the result is called a linear combination of those vectors.

Exercise 6.2.2 What are the standard basis vectors for R4 ?

Exercise 6.2.3 In e notation, what is the standard basis vector of R6 that has a 1 in position 5?

Exercise 6.2.4 Write the following vectors as the sum of scalar multiples of the standard basis vectors in $\mathbb{R}^2$.
a) $\begin{pmatrix} 45 \\ 12 \end{pmatrix}$  b) $\begin{pmatrix} 387 \\ 509 \end{pmatrix}$  c) $\begin{pmatrix} a \\ b \end{pmatrix}$
Exercise 6.2.5 Are the following expressions linear combinations? If so, of what variables?
a) $2a + 5b$  b) $e^X + 3Y$  c) $7Z + 6H - 3t^2$  d) $-6X + 4W + 5$

Exercise 6.2.6 Why does it make sense to describe a smoothie as a linear combination of
ingredients?

Linear Functions: Definitions and Examples


In Chapter 2, we learned that a function $f$ is called linear if and only if two conditions are met: 1) $f(X + Y) = f(X) + f(Y)$ and 2) $f(cX) = cf(X)$ for every scalar $c$. The same definition applies to functions that act on vectors.
A function $f : \mathbb{R}^n \to \mathbb{R}^m$ is linear if it has the properties
$$f(\mathbf{X} + \mathbf{Y}) = f(\mathbf{X}) + f(\mathbf{Y}) \quad \text{for all } \mathbf{X}, \mathbf{Y} \text{ in } \mathbb{R}^n$$
$$f(c\mathbf{X}) = cf(\mathbf{X}) \quad \text{for all } c \text{ in } \mathbb{R}$$
Note that $n$ and $m$ don't have to be equal. In other words, the domain and codomain of $f$ can have different dimensions, although in our applications, they usually won't.

Exercise 6.2.7 According to the definition of linearity, are the following functions linear?
a) $f\left(\begin{pmatrix} X \\ Y \end{pmatrix}\right) = \begin{pmatrix} X^2 \\ 2Y \end{pmatrix}$  b) $f(X) = \sqrt{X}$
c) $f\left(\begin{pmatrix} X \\ Y \\ Z \end{pmatrix}\right) = \begin{pmatrix} 2X \\ XY \\ 3Z \end{pmatrix}$  d) $f\left(\begin{pmatrix} X \\ Y \\ Z \end{pmatrix}\right) = \begin{pmatrix} 2X \\ 4Y \\ 3Z \end{pmatrix}$

What Do Linear Functions Look Like?


The definition of linearity tells us what it means for a function to be linear but doesn’t give us
an easy way to tell whether a particular function is linear without doing some work. We will now
develop a way to do that. This will lead to a very useful notation for linear functions, one that
we will use extensively for the next two chapters.

Linear functions $\mathbb{R}^1 \to \mathbb{R}^1$. We'll start with the simplest example, $f : \mathbb{R}^1 \to \mathbb{R}^1$. In this context, we think of numbers as one-dimensional vectors and write $\mathbb{R}^1$ instead of $\mathbb{R}$. Thinking of $\mathbb{R}^1$ as a one-dimensional vector space, we see that it has the standard basis $\{\mathbf{e}\} = \{(1)\}$.
If $f$ is a linear function and $\mathbf{X}$ is any vector in $\mathbb{R}^1$, what is $f(\mathbf{X})$?
To start answering this question, we'll take the odd-seeming but useful step of writing $\mathbf{X}$ as $X \cdot \mathbf{e}$. Then, according to the definition of linearity, we have
$$f(\mathbf{X}) = f(X \cdot \mathbf{e}) = X f(\mathbf{e})$$

Exercise 6.2.8 Which property of linear functions gives us this result?

But what is $f(\mathbf{e})$? We don't know what it is, but we do know that it belongs to $\mathbb{R}^1$. Let's just call it $\mathbf{k}$. Then
$$X f(\mathbf{e}) = X\mathbf{k}$$
As before, we can rewrite $\mathbf{k}$ as $k\mathbf{e}$. Then, multiplying, we get
$$X\mathbf{k} = Xk\mathbf{e} = k\mathbf{X}$$
Putting it all together yields
$$f(\mathbf{X}) = X f(\mathbf{e}) = X\mathbf{k} = Xk\mathbf{e} = k\mathbf{X}$$
Since $\mathbf{X}$ is in $\mathbb{R}^1$, it is the same as the scalar $X$, and we can drop the boldface notation and write $f(X) = kX$.
To summarize, if $f : \mathbb{R}^1 \to \mathbb{R}^1$ is linear, it must have the form $f(X) = kX$ for some scalar $k$ in $\mathbb{R}$.

Linear functions $\mathbb{R}^2 \to \mathbb{R}^1$. Suppose $f : \mathbb{R}^2 \to \mathbb{R}^1$ is a linear function. In $\mathbb{R}^2$, the standard basis is
$$\{\mathbf{e}_1, \mathbf{e}_2\} = \left\{\begin{pmatrix} 1 \\ 0 \end{pmatrix}, \begin{pmatrix} 0 \\ 1 \end{pmatrix}\right\}$$
A vector in $\mathbb{R}^2$ has the form $\begin{pmatrix} X \\ Y \end{pmatrix}$ and can be written as
$$\begin{pmatrix} X \\ Y \end{pmatrix} = X\begin{pmatrix} 1 \\ 0 \end{pmatrix} + Y\begin{pmatrix} 0 \\ 1 \end{pmatrix} = X\mathbf{e}_1 + Y\mathbf{e}_2$$
Then from the definition of linear function,
$$f\left(\begin{pmatrix} X \\ Y \end{pmatrix}\right) = f(X\mathbf{e}_1 + Y\mathbf{e}_2) = f(X\mathbf{e}_1) + f(Y\mathbf{e}_2) = X f(\mathbf{e}_1) + Y f(\mathbf{e}_2)$$

Exercise 6.2.9 Which property of linear functions gives us this result?

Now $f(\mathbf{e}_1)$ is some vector in $\mathbb{R}^1$; call it $\mathbf{a}$. Similarly, $f(\mathbf{e}_2)$ is some vector in $\mathbb{R}^1$; call it $\mathbf{b}$:
$$f\left(\begin{pmatrix} X \\ Y \end{pmatrix}\right) = X f(\mathbf{e}_1) + Y f(\mathbf{e}_2) = X\mathbf{a} + Y\mathbf{b} = Xa\,\mathbf{e} + Yb\,\mathbf{e} = aX + bY$$
To summarize, if $f : \mathbb{R}^2 \to \mathbb{R}^1$ is linear, it must have the form $f\left(\begin{pmatrix} X \\ Y \end{pmatrix}\right) = aX + bY$ for two scalars $a$ and $b$.
Exercise 6.2.10 Work through this procedure to find the form that a linear function $f : \mathbb{R}^3 \to \mathbb{R}^1$ must have.

Linear functions $\mathbb{R}^n \to \mathbb{R}^1$. In general, if $f$ is a linear function from $\mathbb{R}^n$ to $\mathbb{R}^1$, then
$$f(\mathbf{X}) = \mathbf{Y} \quad \text{where} \quad \mathbf{X} = \begin{pmatrix} X_1 \\ X_2 \\ \vdots \\ X_n \end{pmatrix}, \quad \mathbf{Y} = Y$$
In $\mathbb{R}^n$, the standard basis is $\{\mathbf{e}_1, \mathbf{e}_2, \dots, \mathbf{e}_n\}$. In $\mathbb{R}^1$, the standard basis is $\{\mathbf{e}\}$. Then there is a unique set of scalars $c_1, c_2, \dots, c_n$ such that
$$\begin{aligned}
f(\mathbf{X}) &= f(X_1\mathbf{e}_1 + X_2\mathbf{e}_2 + \cdots + X_n\mathbf{e}_n) \\
&= X_1 f(\mathbf{e}_1) + X_2 f(\mathbf{e}_2) + \cdots + X_n f(\mathbf{e}_n) \\
&= X_1 c_1\mathbf{e} + X_2 c_2\mathbf{e} + \cdots + X_n c_n\mathbf{e} \\
&= c_1 X_1\mathbf{e} + c_2 X_2\mathbf{e} + \cdots + c_n X_n\mathbf{e} \\
&= c_1 X_1 + c_2 X_2 + \cdots + c_n X_n \qquad (\mathbf{e} \text{ is the same as the scalar } 1)
\end{aligned}$$
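In other words, a linear function $\mathbb{R}^n \to \mathbb{R}^1$ is a dot product with a fixed vector of coefficients. A small sketch (ours, with coefficients chosen arbitrarily):

```python
import numpy as np

c = np.array([2.0, -1.0, 0.5])    # the scalars c1, c2, c3 (arbitrary choices)
X = np.array([4.0, 3.0, 10.0])

# f(X) = c1*X1 + c2*X2 + c3*X3
print(np.dot(c, X))               # 2*4 + (-1)*3 + 0.5*10 = 10.0
```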

Exercise 6.2.11 Explain what we are doing in each step in the series of equations above, paying
special attention to places where we use vector operations and the properties of linear functions.

The representation of $f$ as $f(\mathbf{X}) = f(X_1\mathbf{e}_1 + X_2\mathbf{e}_2 + \cdots + X_n\mathbf{e}_n)$ is useful, because it explicitly shows the dependence on the basis vectors $\mathbf{e}_1, \mathbf{e}_2, \dots, \mathbf{e}_n$. If we change the basis to a nonstandard one $\{\mathbf{v}_1, \mathbf{v}_2, \dots, \mathbf{v}_n\}$, then there will be a different unique set of scalars $a_1, a_2, \dots, a_n$ and another unique set of scalars $b_1, b_2, \dots, b_n$ such that
$$\begin{aligned}
f(\mathbf{X}) &= f(a_1 X_1\mathbf{v}_1 + a_2 X_2\mathbf{v}_2 + \cdots + a_n X_n\mathbf{v}_n) \\
&= a_1 X_1 f(\mathbf{v}_1) + a_2 X_2 f(\mathbf{v}_2) + \cdots + a_n X_n f(\mathbf{v}_n) \\
&= a_1 X_1 b_1\mathbf{e} + a_2 X_2 b_2\mathbf{e} + \cdots + a_n X_n b_n\mathbf{e} \\
&= a_1 b_1 X_1\mathbf{e} + a_2 b_2 X_2\mathbf{e} + \cdots + a_n b_n X_n\mathbf{e} \\
&= a_1 b_1 X_1 + a_2 b_2 X_2 + \cdots + a_n b_n X_n \qquad (\mathbf{e} \text{ is the same as the scalar } 1)
\end{aligned}$$
In summary, every linear function from $\mathbb{R}^n$ into $\mathbb{R}^1$ can be written as a linear combination of $X_1, X_2, \dots, X_n$. The coefficients of the linear combination depend on the choice of basis, so we will absolutely have to keep track of the basis vectors that we are using.

The Matrix Representation of a Linear Function


Now that we understand linear functions from $\mathbb{R}^n$ to $\mathbb{R}^1$, we can extend this to a complete representation of all linear functions $\mathbb{R}^n \to \mathbb{R}^n$ (or even $\mathbb{R}^n \to \mathbb{R}^m$, although we will not often need that).
The case $f : \mathbb{R}^2 \to \mathbb{R}^2$. Suppose $f$ is a linear function $\mathbb{R}^2 \to \mathbb{R}^2$. In the standard basis $\{\mathbf{e}_1, \mathbf{e}_2\}$ of $\mathbb{R}^2$, we use the properties of linearity to get
$$f\left(\begin{pmatrix} X \\ Y \end{pmatrix}\right) = f(X\mathbf{e}_1 + Y\mathbf{e}_2) = X f(\mathbf{e}_1) + Y f(\mathbf{e}_2)$$
Since both $f(\mathbf{e}_1)$ and $f(\mathbf{e}_2)$ are vectors in $\mathbb{R}^2$, there are scalars $a$, $b$, $c$, and $d$ such that
$$f(\mathbf{e}_1) = \begin{pmatrix} a \\ c \end{pmatrix} \quad \text{and} \quad f(\mathbf{e}_2) = \begin{pmatrix} b \\ d \end{pmatrix}$$
We can then say that
$$f\left(\begin{pmatrix} X \\ Y \end{pmatrix}\right) = X\begin{pmatrix} a \\ c \end{pmatrix} + Y\begin{pmatrix} b \\ d \end{pmatrix}$$

Applying scalar multiplication and vector addition, we get
$$f\left(\begin{pmatrix} X \\ Y \end{pmatrix}\right) = \begin{pmatrix} aX \\ cX \end{pmatrix} + \begin{pmatrix} bY \\ dY \end{pmatrix} = \begin{pmatrix} aX + bY \\ cX + dY \end{pmatrix}$$
Thus, the four numbers $a$, $b$, $c$, and $d$ characterize $f$ relative to the basis $\{\mathbf{e}_1, \mathbf{e}_2\}$. Since $X$ and $Y$ are placeholders, in order to characterize the function $f$, we really need only the four numbers $a$, $b$, $c$, and $d$. We will write the four numbers as a $2 \times 2$ array in square brackets:
$$\begin{bmatrix} a & b \\ c & d \end{bmatrix}$$
When an array of numbers is used to characterize a linear function, the array is called a matrix. We say that the $2 \times 2$ matrix $\begin{bmatrix} a & b \\ c & d \end{bmatrix}$ is the matrix representation of $f$ relative to the basis $\{\mathbf{e}_1, \mathbf{e}_2\}$.
The operation of a linear function $f$ on a vector is then calculated by applying the matrix representing $f$ (relative to a given basis) to the representation of the vector. We can write
$$f\left(\begin{pmatrix} X \\ Y \end{pmatrix}\right) = \begin{bmatrix} a & b \\ c & d \end{bmatrix}\begin{pmatrix} X \\ Y \end{pmatrix} = \begin{pmatrix} aX + bY \\ cX + dY \end{pmatrix}$$

Exercise 6.2.12 Work through the reasoning of this section using numerical vectors of your choosing for $f\left(\begin{pmatrix} 1 \\ 0 \end{pmatrix}\right)$ and $f\left(\begin{pmatrix} 0 \\ 1 \end{pmatrix}\right)$.

When we want to talk about applying a matrix to a vector, we just write them next to each other, putting the matrix in square brackets on the left and the vector in round brackets on the right: $\begin{bmatrix} a & b \\ c & d \end{bmatrix}\begin{pmatrix} X \\ Y \end{pmatrix}$. The action of $f$ on a vector in the domain is found by applying the matrix representation of $f$ to the vector, according to the rule shown in Figure 6.1.

$$\begin{bmatrix} a & b \\ c & d \end{bmatrix}\begin{pmatrix} X \\ Y \end{pmatrix} = \begin{pmatrix} aX + bY \\ cX + dY \end{pmatrix}$$

Figure 6.1: Applying a matrix to a vector in $\mathbb{R}^2$.
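This row-times-column rule is what the `@` operator computes in Python; a minimal check (ours, with an arbitrary matrix):

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])    # plays the role of [[a, b], [c, d]]
v = np.array([5.0, 6.0])      # the vector (X, Y)

# (aX + bY, cX + dY)
print(A @ v)                  # [1*5 + 2*6, 3*5 + 4*6] = [17. 39.]
```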


Notice that the first column of the matrix is $f(\mathbf{e}_1)$, and the second column is $f(\mathbf{e}_2)$. This is a general principle of how matrices work.

 
Exercise 6.2.13 If $f(\mathbf{e}_1) = \begin{pmatrix} 3 \\ 6 \end{pmatrix}$ and $f(\mathbf{e}_2) = \begin{pmatrix} -2 \\ 5 \end{pmatrix}$, what is the matrix representation of $f$?

Exercise 6.2.14 If the matrix representing $f$ is $\begin{bmatrix} 6 & 8 \\ 5 & 1 \end{bmatrix}$, what are $f(\mathbf{e}_1)$ and $f(\mathbf{e}_2)$?

The case $f : \mathbb{R}^3 \to \mathbb{R}^3$. Suppose $f$ is a linear function that takes vectors in $\mathbb{R}^3$ (the domain) to $\mathbb{R}^3$ (the codomain). And suppose $\mathbf{X}$ is a vector in $\mathbb{R}^3$. In the standard basis $\{\mathbf{e}_1, \mathbf{e}_2, \mathbf{e}_3\}$, $\mathbf{X}$ can be written as
$$\mathbf{X} = \begin{pmatrix} X_1 \\ X_2 \\ X_3 \end{pmatrix} = X_1\begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix} + X_2\begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix} + X_3\begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix} = X_1\mathbf{e}_1 + X_2\mathbf{e}_2 + X_3\mathbf{e}_3$$
To evaluate the action of $f$ on $\mathbf{X}$, we know that
$$f(\mathbf{X}) = f(X_1\mathbf{e}_1 + X_2\mathbf{e}_2 + X_3\mathbf{e}_3)$$
By the rules of linearity, we can decompose $f(\mathbf{X})$ as
$$\begin{aligned}
f(\mathbf{X}) &= f(X_1\mathbf{e}_1 + X_2\mathbf{e}_2 + X_3\mathbf{e}_3) \\
&= f(X_1\mathbf{e}_1) + f(X_2\mathbf{e}_2) + f(X_3\mathbf{e}_3) \\
&= X_1 f(\mathbf{e}_1) + X_2 f(\mathbf{e}_2) + X_3 f(\mathbf{e}_3)
\end{aligned}$$
We can say that $f(\mathbf{e}_1)$ is some vector in $\mathbb{R}^3$. Therefore, there are scalars $a_{11}$, $a_{21}$, and $a_{31}$ such that
$$f(\mathbf{e}_1) = \begin{pmatrix} a_{11} \\ a_{21} \\ a_{31} \end{pmatrix}$$
The vector $f(\mathbf{e}_2)$ is also some vector in $\mathbb{R}^3$. So there are scalars $a_{12}$, $a_{22}$, and $a_{32}$ such that
$$f(\mathbf{e}_2) = \begin{pmatrix} a_{12} \\ a_{22} \\ a_{32} \end{pmatrix}$$
Similarly, for $f(\mathbf{e}_3)$, there are scalars $a_{13}$, $a_{23}$, and $a_{33}$ such that
$$f(\mathbf{e}_3) = \begin{pmatrix} a_{13} \\ a_{23} \\ a_{33} \end{pmatrix}$$
Consequently, plugging the expressions for $f(\mathbf{e}_1)$, $f(\mathbf{e}_2)$, and $f(\mathbf{e}_3)$ into $f(\mathbf{X})$, we get
$$\begin{aligned}
f(\mathbf{X}) &= X_1 f(\mathbf{e}_1) + X_2 f(\mathbf{e}_2) + X_3 f(\mathbf{e}_3) \\
&= X_1\begin{pmatrix} a_{11} \\ a_{21} \\ a_{31} \end{pmatrix} + X_2\begin{pmatrix} a_{12} \\ a_{22} \\ a_{32} \end{pmatrix} + X_3\begin{pmatrix} a_{13} \\ a_{23} \\ a_{33} \end{pmatrix} \\
&= \begin{pmatrix} a_{11}X_1 \\ a_{21}X_1 \\ a_{31}X_1 \end{pmatrix} + \begin{pmatrix} a_{12}X_2 \\ a_{22}X_2 \\ a_{32}X_2 \end{pmatrix} + \begin{pmatrix} a_{13}X_3 \\ a_{23}X_3 \\ a_{33}X_3 \end{pmatrix} \\
&= \begin{pmatrix} a_{11}X_1 + a_{12}X_2 + a_{13}X_3 \\ a_{21}X_1 + a_{22}X_2 + a_{23}X_3 \\ a_{31}X_1 + a_{32}X_2 + a_{33}X_3 \end{pmatrix} \\
&= \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix}\begin{pmatrix} X_1 \\ X_2 \\ X_3 \end{pmatrix}
\end{aligned}$$
Therefore, the $3 \times 3$ matrix $[a_{ij}]$ is the matrix¹ representation of $f : \mathbb{R}^3 \to \mathbb{R}^3$ relative to the standard basis $\{\mathbf{e}_1, \mathbf{e}_2, \mathbf{e}_3\}$.
¹ We will often write the matrix whose components are $a_{ij}$ as the matrix $[a_{ij}]$.

Exercise 6.2.15 For a function f : R3 → R2 , choose vectors for f (e1 ), f (e2 ), f (e3 ) and work
through the reasoning above to find the matrix representation of f . What are the dimensions
of this matrix?

Exercise 6.2.16 Similarly, for another function g : R3 → R2 , choose vectors for g(e1 ), g(e2 )
and work through the reasoning above to find the matrix representation of g. What are the
dimensions of this matrix?

Generalizing to $f : \mathbb{R}^n \to \mathbb{R}^n$. We can generalize these ideas to $f : \mathbb{R}^n \to \mathbb{R}^n$. Suppose $f$ is a linear function $\mathbb{R}^n \to \mathbb{R}^n$. If $\mathbf{X}$ is any vector in $\mathbb{R}^n$, then it can be written in the standard basis $\{\mathbf{e}_1, \mathbf{e}_2, \dots, \mathbf{e}_n\}$ as
$$\mathbf{X} = \begin{pmatrix} X_1 \\ X_2 \\ \vdots \\ X_n \end{pmatrix} = X_1\mathbf{e}_1 + X_2\mathbf{e}_2 + \cdots + X_n\mathbf{e}_n$$
To find $f(\mathbf{X})$, we use the fact that there are always scalars $a_{ij}$ ($i, j = 1, 2, \dots, n$) such that
$$f(\mathbf{e}_1) = \begin{pmatrix} a_{11} \\ a_{21} \\ \vdots \\ a_{n1} \end{pmatrix} \quad f(\mathbf{e}_2) = \begin{pmatrix} a_{12} \\ a_{22} \\ \vdots \\ a_{n2} \end{pmatrix} \quad \cdots \quad f(\mathbf{e}_n) = \begin{pmatrix} a_{1n} \\ a_{2n} \\ \vdots \\ a_{nn} \end{pmatrix}$$

Then
$$\begin{aligned}
f\left(\begin{pmatrix} X_1 \\ X_2 \\ \vdots \\ X_n \end{pmatrix}\right) &= f(X_1\mathbf{e}_1 + X_2\mathbf{e}_2 + \cdots + X_n\mathbf{e}_n) && \text{linear combination} \\
&= X_1 f(\mathbf{e}_1) + X_2 f(\mathbf{e}_2) + \cdots + X_n f(\mathbf{e}_n) && \text{properties of linearity} \\
&= X_1\begin{pmatrix} a_{11} \\ a_{21} \\ \vdots \\ a_{n1} \end{pmatrix} + X_2\begin{pmatrix} a_{12} \\ a_{22} \\ \vdots \\ a_{n2} \end{pmatrix} + \cdots + X_n\begin{pmatrix} a_{1n} \\ a_{2n} \\ \vdots \\ a_{nn} \end{pmatrix} && \text{representation of } f(\mathbf{e}_1), \dots, f(\mathbf{e}_n) \\
&= \begin{pmatrix} a_{11}X_1 + a_{12}X_2 + \cdots + a_{1n}X_n \\ a_{21}X_1 + a_{22}X_2 + \cdots + a_{2n}X_n \\ \vdots \\ a_{n1}X_1 + a_{n2}X_2 + \cdots + a_{nn}X_n \end{pmatrix} && \text{scalar multiplication, vector addition} \\
&= \begin{bmatrix} a_{11} & a_{12} & \dots & a_{1n} \\ a_{21} & a_{22} & \dots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \dots & a_{nn} \end{bmatrix}\begin{pmatrix} X_1 \\ X_2 \\ \vdots \\ X_n \end{pmatrix}
\end{aligned}$$
We say that the $n \times n$ matrix $[a_{ij}]$ is the matrix representation of $f : \mathbb{R}^n \to \mathbb{R}^n$ relative to the basis $\{\mathbf{e}_1, \mathbf{e}_2, \dots, \mathbf{e}_n\}$.
Similar to the $\mathbb{R}^2 \to \mathbb{R}^2$ and the $\mathbb{R}^3 \to \mathbb{R}^3$ cases, the action of $f : \mathbb{R}^n \to \mathbb{R}^n$ on a vector in $\mathbb{R}^n$ is found by applying the matrix representation of $f$ to the vector, according to the rule shown in Figure 6.2.

$$\begin{bmatrix} a_{11} & a_{12} & \dots & a_{1n} \\ a_{21} & a_{22} & \dots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \dots & a_{nn} \end{bmatrix}\begin{pmatrix} X_1 \\ X_2 \\ \vdots \\ X_n \end{pmatrix} = \begin{pmatrix} a_{11}X_1 + a_{12}X_2 + \cdots + a_{1n}X_n \\ a_{21}X_1 + a_{22}X_2 + \cdots + a_{2n}X_n \\ \vdots \\ a_{n1}X_1 + a_{n2}X_2 + \cdots + a_{nn}X_n \end{pmatrix}$$

Figure 6.2: Applying a matrix to a vector in $\mathbb{R}^n$: each row of the matrix is lined up with the vector, multiplied componentwise, and the results are summed.

If f is a linear function from Rn to Rn , the columns of the matrix representing f are f (e1 ),
f (e2 ), . . . , f (en ).
What all this abstract work buys us is the ability to say what a function does to any vector by knowing what it does to the standard basis vectors. For example, in the $f : \mathbb{R}^2 \to \mathbb{R}^2$ case, it means that we can say what the function does to an infinity of possible vectors by knowing what it does to just two vectors, $\begin{pmatrix} 1 \\ 0 \end{pmatrix}$ and $\begin{pmatrix} 0 \\ 1 \end{pmatrix}$. This is powerful, and it will enable us to understand techniques for working with matrices instead of just memorizing them.
We will now develop an example of the use of matrices in biology that we will refer to
throughout this chapter.

A Matrix Population Model: Black Bears


As an example of a linear function R2 → R2 , we will consider a state space (J, A), where J is
the number of juveniles, and A is the number of adults of a species of black bear.
Black bears are a common and highly adaptable species found throughout North America,
from the Appalachian Mountains to suburban Los Angeles. Females become sexually mature at
three or four years of age and live 15 to 20 years in the wild. Approximately every two years,
a female will give birth, most commonly to two cubs. We are interested in developing a simple
mathematical model of a black bear population.
To model a black bear population, we divide it up into juveniles $J$ (cubs and subadults who are not yet sexually mature) and adults $A$. Then the state of the system is given by a point in juvenile–adult $(J, A)$ space, that is, as a vector $\begin{pmatrix} J \\ A \end{pmatrix}$.
Suppose that on average, a female black bear gives birth to two cubs every two years. This averages out to one cub per year. It simplifies our work to focus only on females, as many population models do; since about half of the cubs are female, we will say that a female bear gives birth to 0.5 female cubs each year, on average. Each year, about 10% of juveniles die and 25% mature into adults, leaving 65% as juveniles.
Representing the juvenile population in the $N$th year as $J_N$ and that of adults as $A_N$, we have the juvenile population in the $(N+1)$st year as
$$J_{N+1} = 0.65 J_N + 0.5 A_N$$
If an adult bear's life expectancy is around 14 years and bears become adults at age 4, they average 10 years as adults. This makes the per capita death rate $1/10 = 0.1$ adults per year, so each year, $1 - 0.1 = 90\%$ of adults remain adults. In addition, as we mentioned before, 25% of juveniles mature into adults each year. This gives the adult population in the $(N+1)$st year as
$$A_{N+1} = 0.25 J_N + 0.9 A_N$$
Therefore, the black bear population model is given by a linear function $f$:
$$\begin{pmatrix} J_{N+1} \\ A_{N+1} \end{pmatrix} = f\left(\begin{pmatrix} J_N \\ A_N \end{pmatrix}\right) = \begin{pmatrix} 0.65 J_N + 0.5 A_N \\ 0.25 J_N + 0.9 A_N \end{pmatrix}$$
which can be written in matrix form:
$$\begin{pmatrix} J_{N+1} \\ A_{N+1} \end{pmatrix} = \begin{bmatrix} 0.65 & 0.5 \\ 0.25 & 0.9 \end{bmatrix}\begin{pmatrix} J_N \\ A_N \end{pmatrix} = M\begin{pmatrix} J_N \\ A_N \end{pmatrix}$$
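A quick check of the model in Python with NumPy (our sketch; the initial population of 100 juveniles and 50 adults anticipates the worked example below):

```python
import numpy as np

M = np.array([[0.65, 0.5],
              [0.25, 0.9]])        # good-year black bear matrix

state = np.array([100.0, 50.0])    # (J_N, A_N)
print(M @ state)                   # next year's (J, A): [90. 70.]
```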
Exercise 6.2.17 What are the matrices representing the following systems of equations?
a) $X_{N+1} = 2X_N + 6Y_N$ and $Y_{N+1} = 3X_N + 8Y_N$
b) $X_{N+1} = -1.5X_N$ and $Y_{N+1} = 6X_N + Y_N$
c) $Z_{N+1} = 18Z_N + 5W_N$ and $W_{N+1} = -7Z_N + 2.2W_N$
d) $a_{N+1} = -3a_N$ and $b_{N+1} = b_N$
e) $a_{N+1} = -2b_N$ and $b_{N+1} = 4a_N$

Exercise 6.2.18 What systems of equations are represented by the following matrices? (You can use $X$ and $Y$ as your variables.)
a) $\begin{bmatrix} 3 & 5 \\ 7 & 9 \end{bmatrix}$  b) $\begin{bmatrix} -2 & 3 \\ 1 & 2 \end{bmatrix}$  c) $\begin{bmatrix} 0 & 4 \\ -5 & 0 \end{bmatrix}$
d) $\begin{bmatrix} -1 & 0 \\ 0 & 2.5 \end{bmatrix}$  e) $\begin{bmatrix} 0 & 4 & 0 \\ -7 & 0 & 2 \\ 1 & 0 & 3 \end{bmatrix}$

Applying Matrices to Vectors


Suppose during one year, we have a population of 100 juvenile bears and 50 adult bears and want to know what the population will be next year. The current state of the population can be written in the standard basis $\{\mathbf{e}_1, \mathbf{e}_2\}$ as
$$\begin{pmatrix} J_0 \\ A_0 \end{pmatrix} = \begin{pmatrix} 100 \\ 50 \end{pmatrix} = 100\begin{pmatrix} 1 \\ 0 \end{pmatrix} + 50\begin{pmatrix} 0 \\ 1 \end{pmatrix} = 100\mathbf{e}_1 + 50\mathbf{e}_2$$
We now need to apply the function $f$ to this vector to find the next year's population. From the matrix representation of this function, we can immediately say that $f(\mathbf{e}_1)$ and $f(\mathbf{e}_2)$ are the first and second columns of $M$, respectively:
$$M = \begin{bmatrix} 0.65 & 0.5 \\ 0.25 & 0.9 \end{bmatrix} \qquad f(\mathbf{e}_1) = \begin{pmatrix} 0.65 \\ 0.25 \end{pmatrix} \qquad f(\mathbf{e}_2) = \begin{pmatrix} 0.5 \\ 0.9 \end{pmatrix}$$
Then the next year's population is
$$\begin{aligned}
f\left(\begin{pmatrix} J_0 \\ A_0 \end{pmatrix}\right) &= f(100\mathbf{e}_1 + 50\mathbf{e}_2) \\
&= 100 f(\mathbf{e}_1) + 50 f(\mathbf{e}_2) \\
&= 100\begin{pmatrix} 0.65 \\ 0.25 \end{pmatrix} + 50\begin{pmatrix} 0.5 \\ 0.9 \end{pmatrix} \\
&= \begin{pmatrix} 100 \times 0.65 + 50 \times 0.5 \\ 100 \times 0.25 + 50 \times 0.9 \end{pmatrix} \\
&= \begin{pmatrix} 90 \\ 70 \end{pmatrix}
\end{aligned}$$
Therefore, next year's population will be 90 juveniles and 70 adults.

Exercise 6.2.19 Use the method we used here to find the next year’s population if this year’s
population consists of 15 juveniles and 8 adults.

We can also use the rule for applying a matrix to a vector (Figure 6.1) to calculate the populations of the two age groups in the following year:
$$\begin{pmatrix} J_1 \\ A_1 \end{pmatrix} = \begin{bmatrix} 0.65 & 0.5 \\ 0.25 & 0.9 \end{bmatrix}\begin{pmatrix} 100 \\ 50 \end{pmatrix} = \begin{pmatrix} 0.65 \times 100 + 0.5 \times 50 \\ 0.25 \times 100 + 0.9 \times 50 \end{pmatrix} = \begin{pmatrix} 90 \\ 70 \end{pmatrix}$$

Exercise 6.2.20 Evaluate:
a) $\begin{bmatrix} 3 & 2 \\ 4 & 1 \end{bmatrix}\begin{pmatrix} 10 \\ 10 \end{pmatrix}$  b) $\begin{bmatrix} 2 & 6 \\ 1 & 4 \end{bmatrix}\begin{pmatrix} 5 \\ 3 \end{pmatrix}$  c) $\begin{bmatrix} 4 & 0 & 1 \\ 3 & 2 & 1 \\ 1 & 4 & 2 \end{bmatrix}\begin{pmatrix} X \\ Y \\ Z \end{pmatrix}$

Composition of Linear Functions, Multiplication of Matrices


It is a crucial property of functions that we can “chain” them; that is, we can apply functions
repeatedly. In Chapters 1 and 2, we saw that if f and g are functions R → R, then we can define
f (g(X)), the result of applying f to g(X), which is written as “f ◦ g” and called “f composed
with g.”
In higher dimensions, the idea of chaining functions and applying them successively also makes
perfect sense. If f takes Rn to Rk and g takes Rk to Rp , we can define f ◦ g(X) = f (g(X)).

$$\mathbb{R}^n \xrightarrow{\ g\ } \mathbb{R}^k \xrightarrow{\ f\ } \mathbb{R}^p$$
This is the general case, but in this text, we are mostly interested in the case Rn → Rn → Rn .
If f and g are linear functions, represented (in the standard basis {e1 , e2 , . . . , en }) by matrices
A and B, then their composition f ◦ g is also a linear function, which is therefore represented
by a matrix we will call C. As always, the columns of this matrix show what the function does
to the standard basis vectors. The first column is (f ◦ g)(e1 ), the second is (f ◦ g)(e2 ), and the
nth column is (f ◦ g)(en ).
How do we find the matrix of f ◦ g? We already know g(e1 ); it’s just the first column of B.
Now all we need to do is apply f to this vector, which we can do using the shortcut of applying
the matrix A to g(e1 ). Similarly, to find the second column of the matrix of f ◦ g, we apply the
matrix A to g(e2 ), which is the second column of B. Repeating this process, we generate the n
columns of the matrix that represents f ◦ g.
We can also develop this idea algebraically to calculate the matrix C = [cij ] from A and B.
Suppose A = [aij ] and B = [bij ]. If we take an arbitrary vector X in Rn , apply B to it, and then
apply A to the result, we get
$$AB\mathbf{X} = \begin{bmatrix} a_{11} & a_{12} & \dots & a_{1n} \\ a_{21} & a_{22} & \dots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \dots & a_{nn} \end{bmatrix}\begin{bmatrix} b_{11} & b_{12} & \dots & b_{1n} \\ b_{21} & b_{22} & \dots & b_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ b_{n1} & b_{n2} & \dots & b_{nn} \end{bmatrix}\begin{pmatrix} X_1 \\ X_2 \\ \vdots \\ X_n \end{pmatrix}$$
$$= \begin{bmatrix} a_{11} & a_{12} & \dots & a_{1n} \\ a_{21} & a_{22} & \dots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \dots & a_{nn} \end{bmatrix}\begin{pmatrix} b_{11}X_1 + b_{12}X_2 + \cdots + b_{1n}X_n \\ b_{21}X_1 + b_{22}X_2 + \cdots + b_{2n}X_n \\ \vdots \\ b_{n1}X_1 + b_{n2}X_2 + \cdots + b_{nn}X_n \end{pmatrix} \qquad \text{apply } B \text{ to } \mathbf{X}$$
$$= \begin{pmatrix} c_{11}X_1 + c_{12}X_2 + \cdots + c_{1n}X_n \\ c_{21}X_1 + c_{22}X_2 + \cdots + c_{2n}X_n \\ \vdots \\ c_{n1}X_1 + c_{n2}X_2 + \cdots + c_{nn}X_n \end{pmatrix} \qquad \text{apply } A \text{ to } B\mathbf{X}$$
$$= \begin{bmatrix} c_{11} & c_{12} & \dots & c_{1n} \\ c_{21} & c_{22} & \dots & c_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ c_{n1} & c_{n2} & \dots & c_{nn} \end{bmatrix}\begin{pmatrix} X_1 \\ X_2 \\ \vdots \\ X_n \end{pmatrix} = C\mathbf{X}$$
where
$$c_{ij} = a_{i1}b_{1j} + a_{i2}b_{2j} + \cdots + a_{in}b_{nj} = \sum_{k=1}^{n} a_{ik}b_{kj}$$
We can think of this matrix multiplication graphically (Figure 6.3). To find cij , take row i of
matrix A and column j of matrix B, line the two up, and then multiply them componentwise,
adding up the results.

Figure 6.3: Multiplication of two $n \times n$ matrices: entry $c_{ij}$ of the product comes from row $i$ of $A$ and column $j$ of $B$.

Matrix Multiplication
If a linear function $f$ is represented by the matrix $A$ and another linear function $g$ is represented by the matrix $B$, then the composition $f \circ g$ is represented by the matrix product $AB$, so $(f \circ g)(\mathbf{X}) = AB\mathbf{X}$.
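A short numerical check (ours, with arbitrary matrices) that applying $AB$ agrees with applying $B$ and then $A$:

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [4.0, 3.0]])      # matrix of f
B = np.array([[0.0, 1.0],
              [1.0, 0.0]])      # matrix of g (swaps the two coordinates)

C = A @ B                       # matrix of the composition f ∘ g
X = np.array([2.0, 5.0])

print(A @ (B @ X))              # apply g, then f: [ 9. 26.]
print(C @ X)                    # same answer in one step
```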

Exercise 6.2.21 For the following functions, can f (g(x)) exist?


a) f : R2 → R5 and g : R3 → R2
b) f : R4 → R3 and g : R2 → R3
c) f : R7 → R138 and g : R26 → R7

Exercise 6.2.22 If the matrices A and B have the following dimensions, does AB exist?
a) A is a 5 × 2 matrix and B is a 2 × 3 matrix.
b) A is a 3 × 4 matrix and B is a 3 × 2 matrix.
c) A is a 138 × 7 matrix and B is a 7 × 26 matrix.

Exercise 6.2.23 Multiply:
a) $\begin{bmatrix} 1 & 5 \\ 3 & 2 \end{bmatrix}\begin{bmatrix} 2 & -1 \\ 4 & 5 \end{bmatrix}$  b) $\begin{bmatrix} 2 & 3 \\ 3 & -1 \end{bmatrix}\begin{bmatrix} -2 & 4 \\ 1 & -3 \end{bmatrix}$  c) $\begin{bmatrix} 1 & 2 & 3 \\ 3 & 2 & -1 \end{bmatrix}\begin{bmatrix} 2 & 0 \\ -2 & 5 \\ 1 & -3 \end{bmatrix}$

An Application of Matrix Multiplication

We can illustrate the principle of multiplication of matrices by considering an alternative scenario for the black bear, in a bad year. We will model a "bad year" by lowering the birth rate from 0.5 to 0.4 and increasing the death rate for juveniles to 40%, with 50% of them remaining juvenile and only 10% maturing into adults. The juvenile population model is
$$J_{N+1} = 0.5 J_N + 0.4 A_N$$
We also increase the adult death rate to 20%, so the survival rate will be $100\% - 20\% = 80\%$. The adult population model is therefore
$$A_{N+1} = 0.1 J_N + 0.8 A_N$$
Putting these together, we get
$$\begin{pmatrix} J_{N+1} \\ A_{N+1} \end{pmatrix} = \begin{pmatrix} 0.5 J_N + 0.4 A_N \\ 0.1 J_N + 0.8 A_N \end{pmatrix}$$
The matrix that describes the "bad year" dynamics is therefore
$$M_{\text{bad}} = \begin{bmatrix} 0.5 & 0.4 \\ 0.1 & 0.8 \end{bmatrix}$$
We can then calculate the populations after a good year followed by a bad year. The two-year forecast for an initial population of 100 juveniles and 50 adults is
$$M_{\text{bad}} M \begin{pmatrix} J_0 \\ A_0 \end{pmatrix} = \begin{bmatrix} 0.5 & 0.4 \\ 0.1 & 0.8 \end{bmatrix}\begin{bmatrix} 0.65 & 0.5 \\ 0.25 & 0.9 \end{bmatrix}\begin{pmatrix} 100 \\ 50 \end{pmatrix} = \begin{bmatrix} 0.425 & 0.61 \\ 0.265 & 0.77 \end{bmatrix}\begin{pmatrix} 100 \\ 50 \end{pmatrix} = \begin{pmatrix} 73 \\ 65 \end{pmatrix}$$
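The same forecast in code (our sketch): whether we apply the two matrices one year at a time or combine them first into a single product matrix, the answer is the same.

```python
import numpy as np

M     = np.array([[0.65, 0.5], [0.25, 0.9]])   # good year
M_bad = np.array([[0.5,  0.4], [0.1,  0.8]])   # bad year
v0    = np.array([100.0, 50.0])                # (J0, A0)

print(M_bad @ (M @ v0))       # good year first, then bad year: [73. 65.]
print((M_bad @ M) @ v0)       # same result via the product matrix
```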

Exercise 6.2.24 Verify that this calculation is correct by applying the good-year matrix $M$ to the initial condition, and then applying the bad-year matrix $M_{\text{bad}}$ to the result. How does your result compare to the above calculation?

Exercise 6.2.25 What does the matrix $M M_{\text{bad}}$ represent?

Exercise 6.2.26 What matrix product represents a sequence of two good years, followed by
two bad years, followed by a good year? Be careful about the order of multiplication.

Notation

matrix symbol: $M$; matrix: $\begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}$; vector symbol: $\mathbf{X}$; vector: $\begin{pmatrix} X_1 \\ X_2 \end{pmatrix}$; matrix operating on vector: $M\mathbf{X} = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}\begin{pmatrix} X_1 \\ X_2 \end{pmatrix}$

Once we have the matrix representation of a function, we can then talk about what would
happen if we applied the function repeatedly to get the long-term behavior of the system. This
is our next topic.

Further Exercises 6.2

1. If $f$ is linear, what is $f\left(\begin{pmatrix} 0 \\ 0 \\ \vdots \\ 0 \end{pmatrix}\right)$?
2. Give two everyday or scientific examples of linear combinations not mentioned in the
text and briefly explain why each is a linear combination.

3. You are making smoothies. (Be sure to justify your answers to the questions that follow.)
a) A smoothie recipe can be seen as a linear combination of ingredients. Explain why
this is true.
b) Is the cost to make a smoothie a linear function of the costs of the ingredients?
c) Is the caloric content of the smoothie a linear function of the caloric content of the
ingredients?
d) Iron is absorbed better in the presence of vitamin C. Is the amount of available iron
in your smoothie a linear function of the amount of available iron in the ingredients?
e) You get your friends to taste your creations. Is the number of friends who like a
smoothie likely to be a linear function of the number who like each ingredient?

f) Your smoothies are a hit and you decide to go into business. If you want to keep
prices simple, so that all smoothies of a given size cost the same, will your prices
be a linear function of the prices of the ingredients?

4. While going to a teaching assistant’s office hours, you get lost in the bowels of the School
of Engineering. You are walking through the Materials Science Department when you find
a strip of a material that looks like nothing you have ever seen before. You pocket it for
later examination. Back in your room, you decide to study how the material responds to
stretching and compression. Design an experiment to see whether its response to these
forces is linear.

5. You are studying how temperature affects the growth of your state flower in order to
predict the species’s response to climate change. You have a greenhouse and can grow
the plants at any temperature you want.
a) Suppose you call the average temperature at which the plants grow 0, so below-
average temperatures are negative and above-average ones are positive. Similarly,
below-average growth rates are negative and above-average ones are positive.
Design an experiment to test whether the response of change in growth rate to
change in temperature is linear.
b) What result do you expect this experiment to produce? Justify your answer.

6. The function $g : \mathbb{R}^2 \to \mathbb{R}^3$ is linear.
$$g\left(\begin{pmatrix} -1 \\ 4 \end{pmatrix}\right) = \begin{pmatrix} 5 \\ -2 \\ 3 \end{pmatrix} \quad \text{and} \quad g\left(\begin{pmatrix} 3 \\ 2 \end{pmatrix}\right) = \begin{pmatrix} 3 \\ -3 \\ 0 \end{pmatrix}$$
Since $\begin{pmatrix} -2 \\ 22 \end{pmatrix} = 5\begin{pmatrix} -1 \\ 4 \end{pmatrix} + \begin{pmatrix} 3 \\ 2 \end{pmatrix}$, what is $g\left(\begin{pmatrix} -2 \\ 22 \end{pmatrix}\right)$?
7. Assume that $f$ is a linear function. Without using matrices, do the following:
a) If $f\left(\begin{pmatrix} 1 \\ 0 \end{pmatrix}\right) = \begin{pmatrix} 2 \\ 3 \end{pmatrix}$ and $f\left(\begin{pmatrix} 0 \\ 1 \end{pmatrix}\right) = \begin{pmatrix} -4 \\ 7 \end{pmatrix}$, find $f\left(\begin{pmatrix} 5 \\ 6 \end{pmatrix}\right)$.
b) If $f\left(\begin{pmatrix} 1 \\ 0 \end{pmatrix}\right) = \begin{pmatrix} 7 \\ 5 \\ 9 \end{pmatrix}$ and $f\left(\begin{pmatrix} 0 \\ 1 \end{pmatrix}\right) = \begin{pmatrix} 2 \\ 4 \\ 6 \end{pmatrix}$, find $f\left(\begin{pmatrix} 3 \\ 4 \end{pmatrix}\right)$.
c) If $f\left(\begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}\right) = \begin{pmatrix} 1 \\ 2 \end{pmatrix}$, $f\left(\begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix}\right) = \begin{pmatrix} 3 \\ 5 \end{pmatrix}$, and $f\left(\begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix}\right) = \begin{pmatrix} -9 \\ -2 \end{pmatrix}$, find $f\left(\begin{pmatrix} 8 \\ -5 \\ 7 \end{pmatrix}\right)$.
8. Could the functions described below be linear? Justify your answers.
a) $f\left(\begin{pmatrix} 12 \\ 3 \end{pmatrix}\right) = \begin{pmatrix} 6 \\ -5 \end{pmatrix}$ and $f\left(\begin{pmatrix} -4 \\ -1 \end{pmatrix}\right) = \begin{pmatrix} -2 \\ 3 \end{pmatrix}$
b) $f\left(\begin{pmatrix} 2 \\ -5 \\ 3 \end{pmatrix}\right) = \begin{pmatrix} -1 \\ 2 \end{pmatrix}$, $f\left(\begin{pmatrix} 4 \\ 1 \\ 3 \end{pmatrix}\right) = \begin{pmatrix} 5 \\ 2 \end{pmatrix}$, and $f\left(\begin{pmatrix} 6 \\ -4 \\ 0 \end{pmatrix}\right) = \begin{pmatrix} 3 \\ 4 \end{pmatrix}$
9. Multiply:
a) $\begin{bmatrix} 2 & 3 \\ 1 & 2 \end{bmatrix}\begin{bmatrix} 3 & 5 \\ 2 & 0 \end{bmatrix}$  b) $\begin{bmatrix} 3 & 5 \\ 2 & 0 \end{bmatrix}\begin{bmatrix} 8 & 1 \\ 4 & 5 \end{bmatrix}$
c) $\begin{bmatrix} 6 & -2 & 7 \\ 1 & 0 & 2 \end{bmatrix}\begin{pmatrix} 1 \\ 3 \\ 4 \end{pmatrix}$  d) $\begin{bmatrix} 0 & 1 & 3 \\ -4 & 2 & 1 \\ 3 & 6 & -2 \end{bmatrix}\begin{pmatrix} 2 \\ -4 \\ 3 \end{pmatrix}$

10. Carry out the following matrix multiplications. For each problem, say what the function represented by each matrix does to the standard basis vectors and what the product of the two matrices would do to these vectors.
a) $\begin{bmatrix} 7 & 9 \\ 3 & 1 \end{bmatrix}\begin{bmatrix} 0 & 2 \\ 4 & 6 \end{bmatrix}$  b) $\begin{bmatrix} 5 & -4 \\ 2 & 0.5 \end{bmatrix}\begin{bmatrix} 3 & 4 \\ 2 & -1 \end{bmatrix}$  c) $\begin{bmatrix} -1 & -2 \\ 5 & 9 \end{bmatrix}\begin{bmatrix} 3 & 0 \\ 0 & 1 \end{bmatrix}$

11. Multiply:
a) $\begin{bmatrix} 7 & 8 \\ 4 & 5 \end{bmatrix}\begin{bmatrix} 3 & 2 \\ -2 & -3 \end{bmatrix}$  b) $\begin{bmatrix} 3 & 2 \\ 1 & 5 \end{bmatrix}\begin{bmatrix} 5 & 2 & -1 \\ 4 & 2 & 1 \end{bmatrix}$
c) $\begin{bmatrix} -2 & 1 \\ 0 & 3 \\ 4 & 6 \end{bmatrix}\begin{bmatrix} -6 & 3 & 7 \\ 9 & -4 & -5 \end{bmatrix}$  d) $\begin{bmatrix} 1 & 2 & 0 \\ 3 & 5 & 0 \\ 0 & 1 & -2 \end{bmatrix}\begin{bmatrix} 4 & 6 & -7 \\ -2 & 0 & 1 \\ -4 & 4 & 3 \end{bmatrix}$

12. What is the difference between multiplying a matrix times a vector and multiplying two
matrices?

13. We have two linear functions, $f : \mathbb{R}^2 \to \mathbb{R}^4$ and $g : \mathbb{R}^3 \to \mathbb{R}^2$. The matrix representing $f$ is
$$\begin{bmatrix} -2 & 3 \\ 5 & 4 \\ 2 & 1 \\ 0 & 3 \end{bmatrix}$$
a) Suppose
$$g\left(\begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}\right) = \begin{pmatrix} 5 \\ 7 \end{pmatrix}, \quad g\left(\begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix}\right) = \begin{pmatrix} 2 \\ 1 \end{pmatrix}, \quad \text{and} \quad g\left(\begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}\right) = \begin{pmatrix} 3 \\ 4 \end{pmatrix}$$
Find the matrix of $g$.
b) Find the matrix of $f \circ g$ or explain in terms of functions why it does not exist.
c) Find the matrix of $g \circ f$ or explain in terms of functions why it does not exist.

14. The function $f : \mathbb{R}^2 \to \mathbb{R}^2$ is linear.
a) If $f\left(\begin{pmatrix} 2 \\ 0 \end{pmatrix}\right) = \begin{pmatrix} 4 \\ 2 \end{pmatrix}$ and $f\left(\begin{pmatrix} 0 \\ 5 \end{pmatrix}\right) = \begin{pmatrix} -15 \\ 5 \end{pmatrix}$, find the matrix representing $f$.
b) What is $f\left(\begin{pmatrix} 3 \\ 4 \end{pmatrix}\right)$?
c) $g : \mathbb{R}^2 \to \mathbb{R}^2$ is also a linear function. If $g\left(\begin{pmatrix} 1 \\ 0 \end{pmatrix}\right) = \begin{pmatrix} 2 \\ 3 \end{pmatrix}$ and $g\left(\begin{pmatrix} 0 \\ 1 \end{pmatrix}\right) = \begin{pmatrix} 7 \\ -1 \end{pmatrix}$, what is the matrix of $g \circ f$?

6.3 Long-Term Behaviors of Matrix Models


With an understanding of vectors and matrices, we can now use them to model biological
processes and explore the long-term dynamics of these systems.
The long-term behavior of a matrix model is revealed by applying the matrix many times over. This is called an iterated matrix or iterated function. If we begin with an initial condition $\mathbf{X}$, then the long-term behavior is
$$\underbrace{M \cdots M}_{N}\,\mathbf{X} = M^N\mathbf{X}$$
for large values of $N$.
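In SageMath or plain Python this iteration is a one-liner; here is a minimal sketch with NumPy (our own illustration, using the good-year bear matrix from Section 6.2):

```python
import numpy as np
from numpy.linalg import matrix_power

M = np.array([[0.65, 0.5],
              [0.25, 0.9]])       # good-year bear matrix from Section 6.2
X = np.array([10.0, 50.0])        # an initial condition

# M applied N times is the matrix power M^N
print(matrix_power(M, 15) @ X)    # state after 15 iterations
```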


Matrix models can exhibit three basic types of long-term dynamics: stable and unstable equilibrium behavior, neutral equilibria, and neutral oscillations. We will study examples of each of these in turn.

Stable and Unstable Equilibria

The black bear population model developed in the previous section is an example of a Leslie
matrix. A Leslie matrix model of a population gives the rates at which individuals go from one
life stage to another. In this case, we have two life stages, juvenile and adult. The diagonal entries
give the fraction of the population that stays within the same life stage, while the off-diagonal
entry in the top row gives the birth rate of juveniles. The off-diagonal entry in the bottom row
is the transition rate from the juvenile stage to the adult stage. Therefore, in the model

0.65 0.5
M=
0.25 0.9
65% of juveniles remain juveniles and 90% of adults remain adults in any given year. Furthermore,
25% of juveniles in a given year mature into adults, and the average adult has 0.5 (female)
offspring.

Exercise 6.3.1 Come up with a Leslie matrix model for a fictional species with two life stages
and describe the meaning of its entries, as above.

Let's look at the long-term behavior of this model. If we iterate $M$ from an initial condition of 10 juveniles and 50 adults 15 times, we see that both juvenile and adult populations grow with time (Figure 6.4, left). Notice that the trajectory consists of isolated points. This is because a Leslie matrix is a discrete-time model. If we plot these points in $J$-$A$ state space, we see that after the first few values, all the points fall on a straight line passing through the origin, implying that the ratio of juveniles to adults remains constant as the population grows (Figure 6.4, right).
Moreover, the distance between successive state points increases with time, meaning that the population growth rate increases with population size.
Figure 6.4: Time series (left) and corresponding trajectory (right) produced by iterating the matrix $M$, modeling the black bear population in a good year. Notice that both consist of discrete points.
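A loop version of the computation behind Figure 6.4 might look like this (our sketch; the book uses SageMath for such simulations):

```python
import numpy as np

M = np.array([[0.65, 0.5],
              [0.25, 0.9]])
state = np.array([10.0, 50.0])      # (J0, A0) = (10, 50)

trajectory = [state]
for _ in range(15):                  # iterate the matrix 15 times
    state = M @ state
    trajectory.append(state)

J, A = trajectory[-1]
print(J, A, J / A)                   # after the transient, the ratio J/A settles down
```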

Now let's consider a bad year, which, as we saw, is modeled by the matrix
$$M_{\text{bad}} = \begin{bmatrix} 0.5 & 0.4 \\ 0.1 & 0.8 \end{bmatrix}$$
Iterating this matrix, we see that both juvenile and adult populations go to zero with time
(Figure 6.5, left). However, this decline doesn’t initially affect both age groups in the same way.
The juvenile population grows for a time, while the adult population just shrinks. Of course,
this can’t go on forever, so after a few years, both populations enter long-term decline. (The
system’s behavior before it enters this long-term pattern is called a transient.)

Figure 6.5: Time series (left) and corresponding trajectory (right) produced by iterating the matrix $M_{\text{bad}}$, modeling the black bear population in a bad year.

Let's consider another Leslie matrix for a two-stage population. Here we will consider a situation in which 10% of juveniles remain juvenile, 40% become adults, and the rest die. The birth rate is 1.4 offspring per adult, and only 20% of adults survive each year. This gives us the matrix
$$M_{\text{osc}} = \begin{bmatrix} 0.1 & 1.4 \\ 0.4 & 0.2 \end{bmatrix}$$

If we iterate $M_{\text{osc}}$, we see that both juvenile and adult populations approach the stable equilibrium at $(0, 0)$ in an oscillatory manner (Figure 6.6).

Figure 6.6: Time series (left) and corresponding trajectory (right) produced by iterating the matrix $M_{\text{osc}}$.

Neutral Equilibria
We will now consider an important class of models whose equilibria are not the isolated equilibrium points we have been seeing all along. In these models, called Markov processes, the final equilibrium value depends on the initial condition, so there is an infinity of equilibrium points.
All of the models we have seen so far can be thought of as compartmental models. In a compartmental model, a large number of objects are transferred from one compartment to another, according to rules. In the discrete-time version of compartmental modeling, these transfers take place at discrete time points, $1, 2, 3, \dots, N$.
In epidemiology, the study of infectious diseases, many models use compartments called susceptibles (those who can become infected) and infecteds. We will represent these two populations by $S$ and $I$.
In epidemiology, linear models of disease transmission are used to predict whether a disease will initially spread. Epidemiologists will make an estimate of the rate of "new cases per old case," the quantity called $R_0$ (read "R-zero" or "R-nought"), and then model the epidemic as
$$I_{N+1} = R_0 I_N$$
where $I_N$ is the number of infected people at the $N$th time point. If $R_0 > 1$, the epidemic spreads, while if $R_0 < 1$, the epidemic will tend to die out.
In the more general case, we can write a simple compartmental model representing the transfers from the susceptibles compartment $S$ to the infecteds compartment $I$ and vice versa.

[compartment diagram: boxes $S$ and $I$, with arrows for $S$ becoming $I$, $I$ becoming $S$, $S$ remaining $S$, and $I$ remaining $I$]

We will make the extremely strong assumption that at each time point, a constant fraction $a$ of the susceptibles become infected and a constant fraction $b$ of the infecteds recover to become susceptibles again. If $a$ is the fraction of $S$ that become $I$, then the fraction of $S$ that remain $S$ must be $1 - a$. If $b$ is the fraction of $I$ that become $S$, then the fraction of $I$ that remain $I$ must be $1 - b$. This gives us the following figure.

[compartment diagram: $S \to I$ at rate $a$, $I \to S$ at rate $b$, $S$ remaining $S$ at rate $1-a$, $I$ remaining $I$ at rate $1-b$]

The discrete-time dynamics for this S-I compartmental model are
$$\begin{aligned}
S_{N+1} &= (1 - a)S_N + bI_N \\
I_{N+1} &= aS_N + (1 - b)I_N
\end{aligned}$$
This can be written in matrix form:
$$\begin{pmatrix} S_{N+1} \\ I_{N+1} \end{pmatrix} = \begin{bmatrix} 1 - a & b \\ a & 1 - b \end{bmatrix}\begin{pmatrix} S_N \\ I_N \end{pmatrix}$$
Let's choose $a = 0.1$ and $b = 0.2$, which means that at each time point, 10% of susceptible individuals become infected, and 90% remain susceptible. Similarly, 20% of infected individuals recover, with 80% remaining infected. Notice that the disease is nonlethal, because there are no death terms in this model. And there is no immunity, since infecteds return to the susceptible compartment.
This gives us the matrix
$$M_{SI} = \begin{bmatrix} 0.9 & 0.2 \\ 0.1 & 0.8 \end{bmatrix} \tag{6.1}$$

If we iterate $M_{SI}$, we see a new kind of behavior. If we begin with an initial condition of 10 susceptibles and 50 infecteds, the system stabilizes at an equilibrium point. And if we begin with a different initial condition, at 30 susceptibles and 80 infecteds, the system also stabilizes at an equilibrium point, but a different one.
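The two runs can be reproduced in a few lines (our sketch; a large power of $M_{SI}$ stands in for "iterating many times"):

```python
import numpy as np
from numpy.linalg import matrix_power

M_SI = np.array([[0.9, 0.2],
                 [0.1, 0.8]])

# After many iterations the state stops changing, but where it stops
# depends on the initial condition: (10, 50) settles near (40, 20),
# while (30, 80) settles near (73.3, 36.7).
for start in ([10.0, 50.0], [30.0, 80.0]):
    print(matrix_power(M_SI, 200) @ np.array(start))
```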

Exercise 6.3.2 Explain why the entries in each column of a transition matrix such as equation (6.1) must add up to one. (Hint: Label the rows and columns, writing "from" and "to" where appropriate.)

Exercise 6.3.3 Starting with 20 susceptible and 40 infected individuals, iterate $M_{SI}$ 15 times in SageMath. What steady state does the system reach? Do the same for 50 susceptible and 60 infected individuals. How do your results compare to the simulations in Figure 6.7?

Figure 6.7: Time series from two simulations of the susceptible-infected model. Starting from different initial conditions, the system converges to different equilibrium points.

Exercise 6.3.4 What is the behavior of the total population (S + I) over time?

Why does this susceptible–infected system behave so differently from the black bear Leslie
matrices we studied at the beginning of this section? One key difference is that Leslie matrices
involve births and deaths. A population modeled by a Leslie matrix model must grow or decline
unless the birth and death rates exactly balance. In this particular disease model, on the other
hand, individuals are just shuffled from one compartment to another, without any overall increase
or decrease in population size.

Neutral Oscillations
Our final example of a matrix model is one that gives neutral oscillations (Bodine et al. 2014). By
“neutral,” we mean that here, as in the previous example of neutral equilibria, the final outcome
depends on the initial condition, only here the final outcome is an oscillation. These “neutral
oscillations” are therefore a discrete-time analogue to the neutral oscillations we saw in the
frictionless spring and the shark–tuna models.
Locusts, which are important agricultural pests, have three stages in their life cycle: eggs (E),
hoppers (juveniles) (H), and adults (A). In a certain locust species, the egg and hopper stages
each last one year, with 2% of eggs surviving to become hoppers and 5% of hoppers surviving
to become adults. Adults lay 1000 eggs (as before, we are modeling only females) and then die.
From these principles, we can write a 3-variable linear system
$$\begin{aligned}
E_{N+1} &= 0 \cdot E_N + 0 \cdot H_N + 1000 A_N \\
H_{N+1} &= 0.02 E_N + 0 \cdot H_N + 0 \cdot A_N \\
A_{N+1} &= 0 \cdot E_N + 0.05 H_N + 0 \cdot A_N
\end{aligned}$$
which gives rise to a $3 \times 3$ Leslie matrix:
$$L = \begin{bmatrix} 0 & 0 & 1000 \\ 0.02 & 0 & 0 \\ 0 & 0.05 & 0 \end{bmatrix}$$
Simulating the model by iterating $L$ with an initial population of 50 eggs, 100 hoppers, and 50 adults results in oscillatory dynamics of the populations over time. Consider, for example, the adult population (Figure 6.8, black dots). As you can see, the adult population oscillates with no overall growth or decline.
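This is easy to verify with a short loop (our sketch, not part of the text):

```python
import numpy as np

L = np.array([[0.0,  0.0,  1000.0],
              [0.02, 0.0,  0.0   ],
              [0.0,  0.05, 0.0   ]])

state = np.array([50.0, 100.0, 50.0])   # (E, H, A)
for year in range(7):
    print(year, state)
    state = L @ state
# Because 0.02 * 0.05 * 1000 = 1, the state returns to (50, 100, 50)
# every three years: a neutral oscillation.
```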

Figure 6.8: Time series of adult populations from two simulations (black and green) of the locust population model from two different initial conditions.

If we try a different initial condition, say 50 eggs, 20 hoppers, and 30 adults, we get a
different oscillation, also with no overall growth or decay, but with different values (Figure 6.8,
green dots).

Exercise 6.3.5 Simulate the discrete-time dynamical system described by the matrix L, and
plot all three populations.

Exercise 6.3.6 Calculate the total population E + H + A at each time point. How does it
change?

We have now seen the repertoire of long-term behaviors that linear models can exhibit: stable
and unstable equilibria, neutral equilibria, and neutral oscillations.

Matrix Models in Ecology and Conservation Biology


One interesting example of the use of matrix models in real scientific research involves the
extinction of moas, giant birds that inhabited New Zealand until shortly after it was colonized by
humans in the late 1200s AD. Archaeological data suggested that moas went extinct less than
200 years after human colonization. But could a small population really hunt moas to extinction
so rapidly?
Researchers used data from present-day moa relatives and analysis of fossil remains to build
a Leslie matrix model of moa population dynamics (Holdaway and Jacomb 2000). The goal of
the model was to study the relative importance of two different factors in the extinction of the
moa, namely, human hunting and habitat loss. This is a type of question that is ideally suited to
modeling: we can try different combinations of the two factors and see what happens.
The study used model parameters that changed over time to represent the effects of a growing
human population on moa survivorship. The results indicated that even low hunting pressure by
a population of a few hundred people was enough to drive moas to extinction in 160 years or
less (Figure 6.9).

[Figure 6.9 shows simulated moa population trajectories (years AD 1280–1480) for three scenarios: 100 people with no habitat loss, 100 people with habitat loss, and 200 people with habitat loss, against a stable baseline.]

Figure 6.9: Simulated effects of different human colonization scenarios on moa populations. Redrawn from "Rapid extinction of the moas (Aves: Dinornithiformes): model, test, and implications," by R.N. Holdaway and C. Jacomb, 2000, Science 287(5461):2250–2254. Reprinted with permission from AAAS.
Note from their simulations that the hunting pressure of even 100 humans, growing at 2.2% per year, with no habitat loss, was enough to drive the moa to extinction, albeit in a slightly longer time. Habitat loss made it worse, and if they considered an initial population of 200 humans and included habitat loss, the decline was even more catastrophic. The authors conclude that "Long-lived birds are very vulnerable to human predation of adults."

Exercise 6.3.7 If a species is going extinct, what equilibrium is the population size approaching?
Is this equilibrium stable or unstable?

Matrix models are also helping to prevent sea turtles from going the way of the moa. Loggerhead sea turtles are an endangered species. Adult females build nests on beaches, lay eggs, and leave. Hatchlings then go out to sea, where they grow into juveniles and then adults.
In the 1980s, sea turtle conservation efforts focused on protecting nests and hatchlings. Then
a group of ecologists decided to test whether such efforts, even if extremely successful, could
actually save the species from extinction (Crouse et al. 1987). They used field data to build a
matrix model consisting of seven life stages (eggs and hatchlings, small juveniles, large juveniles,
subadults, novice breeders, first-year remigrants, and mature breeders), and for each stage in
turn, they reduced mortality to zero. This is obviously impossible, but it’s the most basic test a
conservation strategy must pass. If eliminating all mortality in a life stage can’t save the species,
neither can merely reducing the mortality.
Simulations showed that if nothing was done, the population would decline. However, eliminating all mortality in the eggs and hatchlings stage didn't reverse the decline. To do so, it was necessary to protect large juveniles and subadults. Since most preventable mortality at this stage came from turtles getting caught in fishing and shrimping nets, mandating the installation of turtle excluder devices that allow sea turtles to escape from nets is a much better strategy for protecting the species. The United States currently requires the use of these devices, but some countries in loggerhead habitat do not.

Further Exercises 6.3

1. Giant pandas are a vulnerable species famous for their consumption of large amounts
of bamboo. Write a discrete-time matrix model of a giant panda population using the
following assumptions. We are modeling only the female population.
– Pandas have three life stages: cubs, subadults, and reproductively mature adults.
– Cubs remain cubs for only one year. They have a mortality rate of 17%.
– Pandas remain subadults for three years. Thus, about 33% of subadults mature into
adults each year.
– 28% of subadults die each year.
– On average, adults give birth to 0.5 female cubs each year.
– 97.7% of adults survive from one year to the next.

2. Nitrogen is a key element in all organisms. Use the following assumptions to set up a
matrix model of nitrogen flow in an ecosystem consisting of producers (P ), consumers
(C) and decomposers (D).
– 25% of the nitrogen in plants goes to consumers and 50% goes to decomposers.
– 75% of the nitrogen in consumers goes to decomposers.
– 5% of the nitrogen in decomposers goes to consumers, and 15% is lost from the
ecosystem. The rest goes to plants.

3. In epidemiology, a common way to model the spread of an infectious disease is to track


the number of susceptible individuals (S), the number of currently infected individuals
(I), and the number of individuals who have recovered from the disease with immunity
(R). Assume the following:
– Each day, 2% of susceptible individuals get infected.
– On average, a person remains infected for five days, so each day roughly 20% of
infected individuals recover. Most (say 18%) will have developed immunity to the
disease, but a few (2%) will not be immune, and thus will immediately be susceptible
again.
– A person’s immunity does not last forever. Each day 1% of recovered individuals
become susceptible again.
a) Draw a compartment diagram for this model and label each of the arrows appro-
priately.
b) What is the matrix of this model?

4. Black-lip oysters (Pinctada margaritifera) are born male, but may become female later
in life (a phenomenon known as protandrous hermaphroditism). We can therefore divide
their population into three life stages: juveniles (which are all male), adult males, and
adult females. Assume the following:
– Each year, about 9% of juveniles remain juveniles, 0.9% grow to become adult males,
and 0.1% grow into adult females. The rest die.
– Each year, about 4% of adult males become female, and about 10% of them die.
– About 10% of adult females die each year. Females never change back into males.
– Each female lays enough eggs to yield about 200 juveniles per year.
Write a discrete-time matrix model based on these assumptions.

6.4 Eigenvalues and Eigenvectors


We have now seen a variety of matrix models, with a variety of long-term behaviors, such as
equilibrium point behaviors and oscillatory behaviors. We simulated these long-term behaviors
by simply iterating the matrix over and over again from an initial condition. Our goal now is to
understand these long-term behaviors and to be able to predict them, by studying the structure
of the model itself. In order to do this, we need to develop one more critical piece of linear
algebra: the concepts of eigenvalues and eigenvectors.
Linear Functions in One Dimension

Recall from Chapter 2 that the linear functions in one dimension are exactly the functions $f(X) = rX$, where $r$ is in $\mathbb{R}$. Those are the only functions that can pass the stringent test for linearity:
$$f(X + Y) = f(X) + f(Y) \quad \text{for all } X, Y$$
$$f(kX) = kf(X) \quad \text{for all } k \text{ in } \mathbb{R}$$

Linear Functions in Two Dimensions

Let's consider an arbitrary linear function $f : \mathbb{R}^2 \to \mathbb{R}^2$:
$$\begin{pmatrix} U \\ V \end{pmatrix} = f\left(\begin{pmatrix} X \\ Y \end{pmatrix}\right) = \begin{pmatrix} aX + bY \\ cX + dY \end{pmatrix}$$
As we saw, this function can also be represented in matrix form:
$$\begin{pmatrix} U \\ V \end{pmatrix} = M\begin{pmatrix} X \\ Y \end{pmatrix} \quad \text{where} \quad M = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$$

The easiest way to make a 2D function is to take two 1D functions and join them together. So if $U = aX$ and $V = dY$, then we can make the function
$$\begin{pmatrix} U \\ V \end{pmatrix} = \begin{pmatrix} aX \\ dY \end{pmatrix}$$
This represents a very special case in which $U$ depends only on $X$, and $V$ depends only on $Y$. In this special case, the function is represented by a diagonal matrix, which is a matrix whose entries are all 0 except those on the descending diagonal:
$$\begin{pmatrix} U \\ V \end{pmatrix} = \begin{pmatrix} aX \\ dY \end{pmatrix} = \begin{bmatrix} a & 0 \\ 0 & d \end{bmatrix}\begin{pmatrix} X \\ Y \end{pmatrix}$$
In this case, it is easy to determine the action of the function $f$: it acts like multiplication by $a$ along the $X$ axis and like multiplication by $d$ along the $Y$ axis.
[figure: the point $(X, Y)$ is carried by $f$ to $(aX, dY)$, a stretch by $a$ along the $X$ axis and by $d$ along the $Y$ axis]

For example, consider a linear discrete-time dynamical system consisting of two species that
don’t interact with each other, such as sharks and rabbits. Let SN be the number of sharks
in the Nth year, and let RN be the number of rabbits in the Nth year. Because there is no
interaction, SN+1 is purely a function of SN , and RN+1 is purely a function of RN . If the shark
population grows at a rate a and the rabbit population grows at a rate d, then SN+1 = aSN and
RN+1 = dRN .
The matrix representation of this system of two noninteracting species is then
$$\begin{pmatrix} S_{N+1} \\ R_{N+1} \end{pmatrix} = \begin{pmatrix} aS_N \\ dR_N \end{pmatrix} = \begin{bmatrix} a & 0 \\ 0 & d \end{bmatrix}\begin{pmatrix} S_N \\ R_N \end{pmatrix}$$

A diagonal matrix represents a function that can be decomposed into two 1-dimensional
functions along the axes X and Y. Diagonal matrices represent systems in which the variables
are noninteracting.


Exercise 6.4.1 Consider the matrix $M = \begin{bmatrix} 2 & 0 \\ 0 & 3 \end{bmatrix}$.
a) Compute $M\mathbf{e}_1$, $M\mathbf{e}_2$, and $M\begin{pmatrix} 1 \\ 1 \end{pmatrix}$.
b) Draw $\mathbf{e}_1$, $\mathbf{e}_2$, $\begin{pmatrix} 1 \\ 1 \end{pmatrix}$, and the vectors you obtained in the first part of this problem.
c) Describe what $M$ does to $\mathbf{e}_1$, $\mathbf{e}_2$, and $\begin{pmatrix} 1 \\ 1 \end{pmatrix}$.
d) What will $M$ do to other vectors that lie along the $X$ axis? The $Y$ axis?
e) What will $M$ do to vectors that do not lie along the axes?

Exercise 6.4.2 Repeat the previous exercise for $M = \begin{bmatrix} 0.5 & 0 \\ 0 & -2 \end{bmatrix}$.

Eigenvalues

Understanding the action of a diagonal matrix is easy. But what about the general case? The
typical matrix is not a diagonal matrix, so it is hard to guess what the action of the matrix looks
like. Since U is a function of both X and Y , and so is V , we cannot simply decompose f into
two 1D systems acting along the X and Y axes. We can’t just look at the X and Y axes and
stretch or compress the standard basis vectors.
But what if we could find two new axes? Specifically, what if we could find two vectors U
and V such that f is decomposable into two 1D systems acting along the U and V axes?
If two such axes did exist, then by definition, they would have to have the property that
MU = λ1 U and MV = λ2 V
for some real numbers λ1 and λ2 , which means that M would be acting along the vector U as
multiplication by λ1 , and acting along the vector V as multiplication by λ2 .
When this can be done, we call U and V the eigenvectors of M, and λ1 and λ2 are the
corresponding eigenvalues.

Exercise 6.4.3 One of the eigenvalues of the matrix $M$ is 3, and a corresponding eigenvector is $\mathbf{V} = \begin{pmatrix} 2 \\ 1 \end{pmatrix}$. Find $M\mathbf{V}$.
In other words, we are looking for solutions to the linear equation
$$M\mathbf{E} = \lambda\mathbf{E} \tag{6.2}$$
where $\mathbf{E}$ is the axis we are looking for (Figure 6.10). We will solve this equation for $\lambda$ and $\mathbf{E}$. Let
$$M = \begin{bmatrix} a & b \\ c & d \end{bmatrix} \quad \text{and} \quad \mathbf{E} = \begin{pmatrix} X \\ Y \end{pmatrix}$$
We can write
$$M\mathbf{E} = \begin{bmatrix} a & b \\ c & d \end{bmatrix}\begin{pmatrix} X \\ Y \end{pmatrix} = \begin{pmatrix} aX + bY \\ cX + dY \end{pmatrix}$$

Figure 6.10: The effect of applying the matrix $M$ to the vector $\mathbf{E}$ (black arrow) is a new vector that is $\mathbf{E}$ multiplied by a scalar $\lambda$.

and
$$\lambda\mathbf{E} = \lambda\begin{pmatrix} X \\ Y \end{pmatrix} = \begin{pmatrix} \lambda X \\ \lambda Y \end{pmatrix}$$
Since $M\mathbf{E} = \lambda\mathbf{E}$,
$$\begin{pmatrix} aX + bY \\ cX + dY \end{pmatrix} = \begin{pmatrix} \lambda X \\ \lambda Y \end{pmatrix}$$
From this vector equation, we get the following two equations:
$$\begin{aligned}
aX + bY &= \lambda X \\
cX + dY &= \lambda Y
\end{aligned}$$
We want to manipulate these equations to give us an expression in terms of $\lambda$. The first expression is
$$aX + bY = \lambda X \;\Longrightarrow\; \lambda X - aX = bY \;\Longrightarrow\; (\lambda - a)X = bY \;\Longrightarrow\; X = \frac{bY}{\lambda - a}$$
which gives us $X$ in terms of $Y$. We will now use that to substitute for $X$ in the second expression,
$$cX + dY = \lambda Y$$
which gives us
$$\begin{aligned}
c\,\frac{bY}{\lambda - a} + dY = \lambda Y
&\;\Longrightarrow\; \frac{cbY}{\lambda - a} = (\lambda - d)Y \\
&\;\Longrightarrow\; \frac{cb}{\lambda - a} = \lambda - d \\
&\;\Longrightarrow\; cb = (\lambda - a)(\lambda - d) \\
&\;\Longrightarrow\; cb = \lambda^2 - a\lambda - d\lambda + ad
\end{aligned}$$
which finally gives us
$$\lambda^2 - (a + d)\lambda + (ad - cb) = 0$$
This is a quadratic equation in $\lambda$, called the characteristic equation, which must be satisfied if $\lambda$ is a solution to equation (6.2).
We know how to solve quadratic equations. Using the quadratic formula, we get
$$\lambda = \frac{(a + d) \pm \sqrt{(a + d)^2 - 4(1)(ad - cb)}}{2(1)}$$
which can be simplified to
$$\lambda = \frac{(a + d) \pm \sqrt{(a + d)^2 - 4(ad - cb)}}{2} = (\lambda_1, \lambda_2) \tag{6.3}$$

We have found a very fundamental relationship. For every matrix $M = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$ there is a set of axes² $\mathbf{U}$, $\mathbf{V}$ such that $M\mathbf{U} = \lambda_1\mathbf{U}$ and $M\mathbf{V} = \lambda_2\mathbf{V}$, and we have found $\lambda_1$ and $\lambda_2$ in terms of the coefficients $a$, $b$, $c$, and $d$. The quadratic formula gives us two values of $\lambda$ (note the $\pm$ sign in the expression). These two values, which we call $\lambda_1$ and $\lambda_2$, are called the two eigenvalues of the matrix $M$.


For the matrix $M = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$, the characteristic equation (or characteristic polynomial) for an eigenvalue λ in 2D is
$$\lambda^2 - (a+d)\lambda + (ad - cb) = 0$$
Eigenvalues are solutions to this equation.
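Since the 2D characteristic equation is just a quadratic, its solutions are easy to compute numerically. Here is a minimal sketch in Python (the function name char_eq_eigenvalues is ours, not from any library); it uses complex square roots so that it also works when the discriminant is negative:

```python
import cmath  # complex square roots, in case the discriminant is negative

def char_eq_eigenvalues(a, b, c, d):
    """Eigenvalues of [[a, b], [c, d]] via the characteristic equation
    lambda^2 - (a + d)*lambda + (a*d - c*b) = 0."""
    trace = a + d
    det = a * d - c * b
    disc = cmath.sqrt(trace**2 - 4 * det)
    return (trace + disc) / 2, (trace - disc) / 2

# The diagonal matrix of Exercise 6.4.1: the eigenvalues are just 3 and 2
print(char_eq_eigenvalues(2, 0, 0, 3))   # ((3+0j), (2+0j))
```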

Let's try an example. Consider the matrix
$$M = \begin{pmatrix} 1 & 2 \\ 4 & 3 \end{pmatrix}$$
This is obviously an undecomposable function of X1 and X2. Can we find two new axes along which it is decomposable? Plugging the coefficient values into equation (6.3), we get
$$\lambda = \frac{4 \pm \sqrt{4^2 - 4(3 - 8)}}{2} = \frac{4 \pm 6}{2} = (\lambda_1, \lambda_2) = (5, -1)$$

² We will later see that these may not be axes in the usual sense, since they could involve complex numbers, but we can still write them down symbolically.



We have now found that there are two axes U, V such that the matrix acts like multiplication
by λ1 = 5 along U, and acts like multiplication by λ2 = −1 along V.
But we do not know what U and V are yet.


Exercise 6.4.4 Compute the eigenvalues of the following matrices: $\begin{pmatrix} 3 & 5 \\ 2 & 4 \end{pmatrix}$ and $\begin{pmatrix} 4 & 1 \\ 3 & 2 \end{pmatrix}$.

Eigenvectors

We now need to find U and V. Let's say $U = \begin{pmatrix} X \\ Y \end{pmatrix}$. Since we said that M acts like multiplication by 5 along U, this means that
$$MU = \begin{pmatrix} 1 & 2 \\ 4 & 3 \end{pmatrix}\begin{pmatrix} X \\ Y \end{pmatrix} = \begin{pmatrix} X + 2Y \\ 4X + 3Y \end{pmatrix} = \lambda_1 U = 5\begin{pmatrix} X \\ Y \end{pmatrix} = \begin{pmatrix} 5X \\ 5Y \end{pmatrix}$$
So
X + 2Y = 5X =⇒ Y = 2X
4X + 3Y = 5Y =⇒ Y = 2X

Now Y = 2X is the equation for the line in (X, Y) space that has slope 2 and passes through the origin. This line is the axis U. We can choose any nonzero vector on the U axis to represent it, for example the vector $\begin{pmatrix} 1 \\ 2 \end{pmatrix}$. This vector is then called an eigenvector of the matrix M
corresponding to the eigenvalue λ1 = 5.
An eigenvector corresponding to the second eigenvalue λ2 = −1 can be found in a similar manner. Let's assume $V = \begin{pmatrix} X \\ Y \end{pmatrix}$. Then
$$MV = \begin{pmatrix} 1 & 2 \\ 4 & 3 \end{pmatrix}\begin{pmatrix} X \\ Y \end{pmatrix} = \begin{pmatrix} X + 2Y \\ 4X + 3Y \end{pmatrix} = \lambda_2 V = -1\begin{pmatrix} X \\ Y \end{pmatrix} = \begin{pmatrix} -X \\ -Y \end{pmatrix}$$
So
X + 2Y = −X =⇒ Y = −X
4X + 3Y = −Y =⇒ Y = −X

Y = −X is the equation for the line in (X, Y) space that has slope −1 and passes through the origin. This line is the axis V. As before, we can choose any nonzero vector on the V axis to represent it, for example the vector $\begin{pmatrix} 1 \\ -1 \end{pmatrix}$, which is then called an eigenvector of the matrix M corresponding to the eigenvalue λ2 = −1.
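If you'd rather check such computations numerically, numpy's eig routine returns eigenvalues together with unit-length eigenvectors. A minimal sketch; note that numpy normalizes its eigenvectors, so the columns it returns are scalar multiples of the (1, 2) and (1, −1) found above:

```python
import numpy as np

M = np.array([[1.0, 2.0],
              [4.0, 3.0]])

eigenvalues, eigenvectors = np.linalg.eig(M)
print(eigenvalues)          # contains 5.0 and -1.0 (order is not guaranteed)

# Each column of `eigenvectors` pairs with the eigenvalue in the same position;
# check the defining property M v = lambda v for every pair.
for lam, v in zip(eigenvalues, eigenvectors.T):
    assert np.allclose(M @ v, lam * v)
```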
[Figure: the two eigen-axes of M in the (X, Y) plane: U, the line Y = 2X (λ1 = 5), and V, the line Y = −X (λ2 = −1).]

We have now accomplished a basic task: given an indecomposable nondiagonal matrix, we have found two new axes, U and V, along which the matrix is diagonal. Let's call this diagonal matrix D. This new set of axes can be seen as a new coordinate system for R²; call it {U, V}. In the {U, V} coordinate system, the matrix D is diagonal:
$$D = \begin{pmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{pmatrix}$$

matrix in {X, Y}: $M = \begin{pmatrix} 1 & 2 \\ 4 & 3 \end{pmatrix}$
eigenvalues: λ1 = 5, λ2 = −1
eigenvectors: $U = \begin{pmatrix} 1 \\ 2 \end{pmatrix}$, $V = \begin{pmatrix} 1 \\ -1 \end{pmatrix}$
diagonalized matrix in {U, V}: $D = \begin{pmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{pmatrix} = \begin{pmatrix} 5 & 0 \\ 0 & -1 \end{pmatrix}$

If a matrix M has two real eigenvalues λ1 and λ2 , this implies that M can be decomposed
using two new axes, U and V, such that M acts like multiplication by λ1 along U and like
multiplication by λ2 along V.

New coordinate systems. We can navigate in R2 using these two new axes. The standard basis
{e1 , e2 } is the most familiar coordinate system for R2 : to get to any point, go a certain distance
horizontally (parallel to e1 ) and a certain distance vertically (parallel to e2 ). The eigenvectors U
and V also form a coordinate system, and we can get to any point in R2 by moving a certain
distance in the U-direction and a certain distance in the V-direction.
We will now illustrate the process of navigating in R2 using two different coordinate sys-
tems. As our {X, Y} coordinate system, we will use the standard basis {e1 , e2 }. For the {U, V}
coordinate system, we will use the eigenvectors we just calculated:
   
$$\{X, Y\} = \{e_1, e_2\} = \left\{\begin{pmatrix} 1 \\ 0 \end{pmatrix}, \begin{pmatrix} 0 \\ 1 \end{pmatrix}\right\} \qquad \{U, V\} = \left\{\begin{pmatrix} 1 \\ 2 \end{pmatrix}, \begin{pmatrix} 1 \\ -1 \end{pmatrix}\right\}$$

Consider the point p represented in the standard {X, Y} coordinate system as $p_{\{X,Y\}} = \begin{pmatrix} 3 \\ 0 \end{pmatrix}$. To navigate from the origin to p, go three units in the X direction, and zero units in the Y direction (Figure 6.11, left).
Figure 6.11: Finding the coordinates of the point p in a new coordinate system {U, V}.

In order to navigate to p in the {U, V} coordinate system, suppose that the coordinates of p are c1 and c2. We have
$$p_{\{U,V\}} = c_1 U + c_2 V = c_1\begin{pmatrix} 1 \\ 2 \end{pmatrix} + c_2\begin{pmatrix} 1 \\ -1 \end{pmatrix} = \begin{pmatrix} c_1 \times 1 + c_2 \times 1 \\ c_1 \times 2 + c_2 \times (-1) \end{pmatrix} = \begin{pmatrix} 3 \\ 0 \end{pmatrix}$$
Solving this algebraically, we get
$$\begin{aligned} c_1 \times 1 + c_2 \times 1 &= 3 \\ c_1 \times 2 + c_2 \times (-1) &= 0 \end{aligned} \;\Longrightarrow\; c_1 = 1,\; c_2 = 2$$
Therefore, to navigate from the origin to p in the {U, V} coordinate system, go one unit in the U direction and two units in the V direction (Figure 6.11, right).
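Finding the {U, V} coordinates of a point is just solving a 2 × 2 linear system whose matrix has the basis vectors as its columns. A minimal sketch (variable names are ours):

```python
import numpy as np

U = np.array([1.0, 2.0])
V = np.array([1.0, -1.0])
p = np.array([3.0, 0.0])        # p in standard {X, Y} coordinates

B = np.column_stack([U, V])     # basis vectors as the columns of B
c = np.linalg.solve(B, p)       # solve B @ c = p for the new coordinates
print(c)                        # [1. 2.]  ->  p = 1*U + 2*V
```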

Exercise 6.4.5 Find the eigenvectors of the matrices whose eigenvalues you found in Exercise
6.4.4 on page 303.

We will now use the ability to change coordinate systems to map the action of M.

Using eigenvalues and eigenvectors to calculate the action of a matrix


We will now show how to use the eigenvectors and corresponding eigenvalues of a matrix to
calculate the action of the matrix on a test point.
The following discussion is somewhat technical; the details can be skimmed over, and the
reader can skip to “Are All Matrices Diagonalizable?” on page 312. However, the high-level
summary of what we will do here is important. What we are going to do, for an arbitrary matrix
M and a test point p, is find q = Mp. We will do this by the following procedure:
(1) Pick a test point p. Let {X, Y} be an arbitrary coordinate system (it could be the standard basis {e1, e2} or any other). Suppose we have the coordinates of p in the {X, Y} coordinate system as $p_{\{X,Y\}} = \begin{pmatrix} p_X \\ p_Y \end{pmatrix}$.
(2) Calculate the eigenvectors U, V of the matrix M and their corresponding eigenvalues λ1 and λ2.
(3) Find the representation of the test point p in the {U, V} coordinate system to obtain $p_{\{U,V\}} = \begin{pmatrix} p_U \\ p_V \end{pmatrix}$.
(4) Evaluate the action of M by multiplying the U-component pU by λ1, and the V-component pV by λ2. This gives us the location of the point q in the {U, V} coordinate system, $q_{\{U,V\}} = \begin{pmatrix} \lambda_1 p_U \\ \lambda_2 p_V \end{pmatrix}$.
(5) Transform the {U, V} coordinate representation of q, q{U,V}, back into the {X, Y} coordinate system to obtain q{X,Y}.

An example. Let's compute what the matrix $M = \begin{pmatrix} 1 & 2 \\ 4 & 3 \end{pmatrix}$ does to the test point p. For the {X, Y} coordinate system, we will use {e1, e2}. In this standard coordinate system, we pick the test point $p_{\{X,Y\}} = \begin{pmatrix} p_X \\ p_Y \end{pmatrix} = \begin{pmatrix} 1 \\ 0.5 \end{pmatrix}$.

In order to calculate the action of M, we need to locate this point on the U and V axes (Figure 6.12). To do this, we need a way of transforming from the {X, Y} coordinate system to the new {U, V} coordinate system to get $p_{\{U,V\}} = \begin{pmatrix} p_U \\ p_V \end{pmatrix}$.

Figure 6.12: The coordinates of the point p in the {X, Y} and {U, V} coordinate systems.

Once we have the test point p represented in the {U, V} coordinate system, we then just multiply its components by the corresponding eigenvalues λ1 and λ2 (Figure 6.13). Here, the U-component is multiplied by λ1 = 5, and the V-component is multiplied by λ2 = −1. Thus, the image under M of the test point $p_{\{U,V\}} = \begin{pmatrix} p_U \\ p_V \end{pmatrix}$ is the point
$$q_{\{U,V\}} = \begin{pmatrix} q_U \\ q_V \end{pmatrix} = \begin{pmatrix} \lambda_1\,p_U \\ \lambda_2\,p_V \end{pmatrix} = \begin{pmatrix} 5 p_U \\ -p_V \end{pmatrix}$$

Figure 6.13: Using eigenvalues and corresponding eigenvectors to find the action of M on the point p in the {U, V} coordinate system.

We now have the point q represented in the {U, V} coordinate system, that is, q{U,V}. The final step is to transform the point q back into the original {X, Y} coordinate system to get $q_{\{X,Y\}} = \begin{pmatrix} q_X \\ q_Y \end{pmatrix}$ (Figure 6.14).

Figure 6.14: Transforming the point q back into the original {X, Y} coordinate system.

These figures graphically illustrate the process of finding the new point using the {U, V}
coordinate system. Now, in order to actually calculate that point, we have to do it algebraically,
using the linear algebra of coordinate transforms.

Changing bases: coordinate transforms. In R², we have been using as our basis vectors the standard basis
$$\{X, Y\} = \left\{\begin{pmatrix} 1 \\ 0 \end{pmatrix}, \begin{pmatrix} 0 \\ 1 \end{pmatrix}\right\}$$
The key to calling this set of vectors a basis is that every vector p can be written in the {X, Y} coordinate system as
$$p_{\{X,Y\}} = \begin{pmatrix} p_X \\ p_Y \end{pmatrix} = p_X\begin{pmatrix} 1 \\ 0 \end{pmatrix} + p_Y\begin{pmatrix} 0 \\ 1 \end{pmatrix} = p_X X + p_Y Y$$
But the standard basis isn't the only possible one. In fact, any two vectors that aren't multiples of each other can serve as a basis for R². If we pick U and V as two such vectors, then every vector p that had coordinates $\begin{pmatrix} p_X \\ p_Y \end{pmatrix}$ in the {X, Y} basis now has a new set of coordinates $\begin{pmatrix} p_U \\ p_V \end{pmatrix}$ in the {U, V} basis. We want to find those new coordinates.
In general, there is always a matrix transform that will take the representation of a point
expressed in any basis in Rn to any other basis. Here we will illustrate this for the case in R2 in
which the two coordinate systems are {Z, W} and {U, V}.
Suppose we have a vector p and we know its coordinates in {Z, W} space as p{Z,W } . We
would like to know the vector p expressed in the {U, V} coordinate system, that is, p{U,V } . In
other words, we want to find the transformation matrix T such that p{U,V } = T p{Z,W } .

In order to find the transformation matrix T, the key is to express the "old" coordinates {Z, W} in terms of the "new" {U, V} coordinates. Assuming that there are a, b, c, d such that
$$Z = aU + bV$$
$$W = cU + dV$$
we can substitute for Z and W the corresponding expressions in U and V to get an expression for p in the {U, V} coordinates as
$$p_{\{Z,W\}} = p_Z Z + p_W W = p_Z(aU + bV) + p_W(cU + dV) = (a\,p_Z + c\,p_W)U + (b\,p_Z + d\,p_W)V = p_U U + p_V V$$
So
$$p_{\{U,V\}} = \begin{pmatrix} p_U \\ p_V \end{pmatrix} = \begin{pmatrix} a\,p_Z + c\,p_W \\ b\,p_Z + d\,p_W \end{pmatrix} = \begin{pmatrix} a & c \\ b & d \end{pmatrix}\begin{pmatrix} p_Z \\ p_W \end{pmatrix}$$

Therefore, the transformation matrix T that gives us p{U,V} in terms of p{Z,W} is
$$T = \begin{pmatrix} a & c \\ b & d \end{pmatrix}$$
and the required transformation is
$$p_{\{U,V\}} = T\,p_{\{Z,W\}} = \begin{pmatrix} a & c \\ b & d \end{pmatrix} p_{\{Z,W\}}$$
Now we need to find a, b, c, and d. First, let's recall the definition of each of the coordinates in terms of their components:
$$Z = \begin{pmatrix} Z_X \\ Z_Y \end{pmatrix}, \quad W = \begin{pmatrix} W_X \\ W_Y \end{pmatrix}, \quad U = \begin{pmatrix} U_X \\ U_Y \end{pmatrix}, \quad V = \begin{pmatrix} V_X \\ V_Y \end{pmatrix}$$
Notice that in each case, we are expressing the coordinate vector in terms of its representation
in the standard {X, Y} basis. So, while we are transforming from one arbitrary {Z, W} basis
to another arbitrary {U, V} basis, we are keeping track of both of them in terms of their
representation in the standard {X, Y} basis.
Substituting the component definitions of each coordinate into the definition of a, b, c, and d, we get
$$Z = aU + bV \iff \begin{pmatrix} Z_X \\ Z_Y \end{pmatrix} = a\begin{pmatrix} U_X \\ U_Y \end{pmatrix} + b\begin{pmatrix} V_X \\ V_Y \end{pmatrix}$$
$$W = cU + dV \iff \begin{pmatrix} W_X \\ W_Y \end{pmatrix} = c\begin{pmatrix} U_X \\ U_Y \end{pmatrix} + d\begin{pmatrix} V_X \\ V_Y \end{pmatrix}$$

If we multiply this out, we get


ZX = aUX + bVX
ZY = aUY + bVY
WX = cUX + dVX
WY = cUY + dVY

These are four linear equations in four unknowns. We can solve this problem by hand, or we
can use the computer algebra function of SageMath to do all the messy work. The result of this

algebra is that we now have a, b, c, and d in terms of the components of U, V, Z and W:


$$a = \frac{V_Y Z_X - V_X Z_Y}{U_X V_Y - U_Y V_X}, \qquad b = \frac{U_X Z_Y - U_Y Z_X}{U_X V_Y - U_Y V_X}, \qquad c = \frac{V_Y W_X - V_X W_Y}{U_X V_Y - U_Y V_X}, \qquad d = \frac{U_X W_Y - U_Y W_X}{U_X V_Y - U_Y V_X}$$
If we assemble these into the transformation matrix T, we get
$$T = \begin{pmatrix} a & c \\ b & d \end{pmatrix} = \frac{1}{U_X V_Y - U_Y V_X}\begin{pmatrix} V_Y Z_X - V_X Z_Y & V_Y W_X - V_X W_Y \\ U_X Z_Y - U_Y Z_X & U_X W_Y - U_Y W_X \end{pmatrix}$$
This is a complete expression for the transformation matrix. It cannot fail to give us the trans-
formation matrix, unless, of course, the expression in the denominator UX VY − UY VX equals 0.
What does it mean for UX VY − UY VX to be equal to zero?
UX VY − UY VX = 0 ⇐⇒ UX VY = UY VX
If we assume that neither U nor V is the Y axis, which would otherwise make UX = 0 or
VX = 0, then we can divide by each of them and get
UY VY
=
UX VX
Notice that UY/UX is the slope of the U vector, and VY/VX is the slope of the V vector.

[Figure: the vectors U and V drawn with their rise-over-run slopes UY/UX and VY/VX.]

If the slope of U is equal to the slope of V, then U and V are multiples of each other, and
therefore they are not a basis for R2 .
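In code, the whole construction collapses to one line: put U and V as the columns of one matrix, Z and W as the columns of another, and invert. A sketch under that interpretation (the function name is ours):

```python
import numpy as np

def change_of_basis_matrix(U, V, Z, W):
    """T such that p_{U,V} = T @ p_{Z,W}; all four basis vectors are
    given by their components in the standard {X, Y} basis."""
    new_basis = np.column_stack([U, V])
    old_basis = np.column_stack([Z, W])
    # np.linalg.inv raises an error exactly when U and V are multiples
    # of each other, i.e., when U_X*V_Y - U_Y*V_X = 0
    return np.linalg.inv(new_basis) @ old_basis

# The example below: old basis = standard basis, new basis = eigenvectors of M
T = change_of_basis_matrix(U=np.array([1.0, 2.0]), V=np.array([1.0, -1.0]),
                           Z=np.array([1.0, 0.0]), W=np.array([0.0, 1.0]))
print(T)   # [[ 0.333  0.333]
           #  [ 0.667 -0.333]]
```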

Exercise 6.4.6 Show that under the condition UX VY − UY VX = 0, if U is the Y axis (UX = 0), then V has to be the Y axis as well (VX = 0), and vice versa, which contradicts our assumption that U and V serve as a basis in R².

The action of M. We can now return to our problem of evaluating the action of M on the test point p = (1, 0.5) in the {X, Y} coordinate system, that is,
$$p_{\{X,Y\}} = \begin{pmatrix} p_X \\ p_Y \end{pmatrix} = \begin{pmatrix} 1 \\ 0.5 \end{pmatrix}$$
using the eigenvalues and eigenvectors of M. Our first task is to find the test point p expressed
in the coordinate system of the eigenvectors U and V of the matrix M. This is a straightforward
application of the transformation matrix T we just developed.
Here the "old" coordinate system {Z, W} is
$$\{Z, W\} = \{X, Y\} = \left\{\begin{pmatrix} 1 \\ 0 \end{pmatrix}, \begin{pmatrix} 0 \\ 1 \end{pmatrix}\right\}$$

and the "new" coordinate system is the system of eigenvectors U and V of the matrix M:
$$\{U, V\} = \left\{\begin{pmatrix} 1 \\ 2 \end{pmatrix}, \begin{pmatrix} 1 \\ -1 \end{pmatrix}\right\}$$
The coordinate components we need to calculate T are
$$Z = \begin{pmatrix} Z_X \\ Z_Y \end{pmatrix} = \begin{pmatrix} 1 \\ 0 \end{pmatrix}, \quad W = \begin{pmatrix} W_X \\ W_Y \end{pmatrix} = \begin{pmatrix} 0 \\ 1 \end{pmatrix}, \quad U = \begin{pmatrix} U_X \\ U_Y \end{pmatrix} = \begin{pmatrix} 1 \\ 2 \end{pmatrix}, \quad V = \begin{pmatrix} V_X \\ V_Y \end{pmatrix} = \begin{pmatrix} 1 \\ -1 \end{pmatrix}$$
that is, ZX = 1, ZY = 0, WX = 0, WY = 1, UX = 1, UY = 2, VX = 1, VY = −1.
So the transformation matrix T from the "old" {X, Y} coordinate system to the "new" {U, V} coordinate system is
$$T = \begin{pmatrix} a & c \\ b & d \end{pmatrix} = \begin{pmatrix} \tfrac{1}{3} & \tfrac{1}{3} \\ \tfrac{2}{3} & -\tfrac{1}{3} \end{pmatrix}$$

Then we can use this transformation matrix T to give us the test point p expressed in the {U, V} coordinate system, p{U,V}, in terms of p{X,Y}:
$$p_{\{U,V\}} = T\,p_{\{X,Y\}} = \begin{pmatrix} \tfrac{1}{3} & \tfrac{1}{3} \\ \tfrac{2}{3} & -\tfrac{1}{3} \end{pmatrix}\begin{pmatrix} 1 \\ 0.5 \end{pmatrix} = \begin{pmatrix} 0.5 \\ 0.5 \end{pmatrix}$$
Therefore, our test point is
$$p_{\{U,V\}} = \begin{pmatrix} p_U \\ p_V \end{pmatrix} = \begin{pmatrix} 0.5 \\ 0.5 \end{pmatrix}$$
that is, p = 0.5 U + 0.5 V.
Now that we have the point expressed in the eigenvector {U, V} coordinate system, we can
use the eigenvalues to calculate the action of the matrix. We said that the action of that
matrix M is that it acts like multiplication by λ1 along its corresponding U eigenvector, and
multiplication by λ2 along its corresponding V eigenvector.
Therefore, in order to find the point, which we will call q, that results from the action of
the matrix M on the test point p, we simply multiply the U-component of p by λ1 and the
V-component of p by λ2 to find q{U,V } :
    
$$q_{\{U,V\}} = \begin{pmatrix} q_U \\ q_V \end{pmatrix} = D\,p_{\{U,V\}} = \begin{pmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{pmatrix}\begin{pmatrix} p_U \\ p_V \end{pmatrix} = \begin{pmatrix} \lambda_1\,p_U \\ \lambda_2\,p_V \end{pmatrix} = \begin{pmatrix} 5 \times 0.5 \\ -1 \times 0.5 \end{pmatrix} = \begin{pmatrix} 2.5 \\ -0.5 \end{pmatrix}$$
To confirm this and check our work, let’s calculate the action of M in the {X, Y} coordinate
system and then transform the result into the {U, V} coordinate system and see whether the
two calculations agree.

First, we find q{X,Y} by applying M to the test point p{X,Y}:
$$q_{\{X,Y\}} = M\,p_{\{X,Y\}} = \begin{pmatrix} 1 & 2 \\ 4 & 3 \end{pmatrix}\begin{pmatrix} 1 \\ 0.5 \end{pmatrix} = \begin{pmatrix} 2 \\ 5.5 \end{pmatrix}$$
Then we use the transformation matrix T to transform q{X,Y} into q{U,V}:
$$q_{\{U,V\}} = T\,q_{\{X,Y\}} = \begin{pmatrix} \tfrac{1}{3} & \tfrac{1}{3} \\ \tfrac{2}{3} & -\tfrac{1}{3} \end{pmatrix}\begin{pmatrix} 2 \\ 5.5 \end{pmatrix} = \begin{pmatrix} 2.5 \\ -0.5 \end{pmatrix}$$
which agrees exactly with our calculation of q{U,V} using the eigenvalues. The two methods of calculating q{U,V} are equivalent:
$$\begin{array}{ccc} q_{\{X,Y\}} & \xrightarrow{\;T\;} & q_{\{U,V\}} \\ \big\uparrow M & & \big\uparrow D = \begin{pmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{pmatrix} \\ p_{\{X,Y\}} & \xrightarrow{\;T\;} & p_{\{U,V\}} \end{array}$$
However, q{U,V} is not what we originally wanted; we wanted q{X,Y}. We need to take one step further and somehow get back to the {X, Y} coordinate system from q{U,V}. To do this, we need the inverse of the matrix T, that is, the matrix that "undoes" the action of T. To find this matrix, called T⁻¹, realize that
$$T^{-1}\,T = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$$
If we let
$$T^{-1} = \begin{pmatrix} c_1 & c_2 \\ c_3 & c_4 \end{pmatrix}$$

then we have
$$\begin{pmatrix} c_1 & c_2 \\ c_3 & c_4 \end{pmatrix}\begin{pmatrix} \tfrac{1}{3} & \tfrac{1}{3} \\ \tfrac{2}{3} & -\tfrac{1}{3} \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$$
which implies
$$\begin{aligned} \tfrac{1}{3}c_1 + \tfrac{2}{3}c_2 &= 1 \\ \tfrac{1}{3}c_1 - \tfrac{1}{3}c_2 &= 0 \\ \tfrac{1}{3}c_3 + \tfrac{2}{3}c_4 &= 0 \\ \tfrac{1}{3}c_3 - \tfrac{1}{3}c_4 &= 1 \end{aligned} \;\Longrightarrow\; c_1 = 1,\; c_2 = 1,\; c_3 = 2,\; c_4 = -1$$
So
$$T^{-1} = \begin{pmatrix} 1 & 1 \\ 2 & -1 \end{pmatrix}$$
Notice that the columns of T⁻¹ are exactly the eigenvectors U and V.

Consequently, we can go from p{X,Y} to q{X,Y} by transforming into the {U, V} system by T ,
applying D, and then transforming back into the {X, Y} coordinate system using T −1 :
q{X,Y } = Mp{X,Y } = T −1 DT p{X,Y }
In summary, we can evaluate the action of the matrix M on a point by applying the diagonal matrix D:
$$\begin{array}{ccc} q_{\{X,Y\}} & \xleftarrow{\;T^{-1}\;} & q_{\{U,V\}} \\ \big\uparrow M & & \big\uparrow D = \begin{pmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{pmatrix} \\ p_{\{X,Y\}} & \xrightarrow{\;T\;} & p_{\{U,V\}} \end{array}$$

This may seem as though we are not saving much effort, because we also have to figure
out T and T −1 . However, if M is a matrix representing a dynamical system, then we need to
iterate M many times to simulate the dynamics. In this case, the advantage is clear: we need to
calculate and apply T and T −1 only once, and the rest of the iteration process is simply applying
the diagonal matrix D many times, which is easy:
$$\underbrace{M \cdots M}_{N}\;p_{\{X,Y\}} \;=\; T^{-1}\,\underbrace{D \cdots D}_{N}\,T\;p_{\{X,Y\}} \;=\; T^{-1} D^{N} T\,p_{\{X,Y\}}$$
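This bookkeeping is easy to verify numerically. A sketch that checks $M^N = T^{-1}D^N T$ for the running example (numpy's matrix_power does the repeated multiplication):

```python
import numpy as np
from numpy.linalg import matrix_power, inv

M = np.array([[1.0, 2.0], [4.0, 3.0]])
T_inv = np.array([[1.0, 1.0],        # columns are the eigenvectors U and V
                  [2.0, -1.0]])
T = inv(T_inv)

N = 10
lhs = matrix_power(M, N)                            # apply M ten times
rhs = T_inv @ np.diag([5.0**N, (-1.0)**N]) @ T      # D^N just powers the diagonal
print(np.allclose(lhs, rhs))                        # True
```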

Are all matrices diagonalizable?

We have successfully diagonalized the matrix $\begin{pmatrix} 1 & 2 \\ 4 & 3 \end{pmatrix}$, and it makes sense to ask, are all matrices diagonalizable in this way?
The answer is no. Consider the matrix
$$M = \begin{pmatrix} 0 & 1 \\ -1 & -1 \end{pmatrix}$$
Let's calculate its eigenvalues. Plugging the matrix coefficients into the eigenvalue formula (equation (6.3) on page 302),
$$\lambda = \frac{(a+d) \pm \sqrt{(a+d)^2 - 4(ad - cb)}}{2}$$
we get
$$\lambda = \frac{(-1) \pm \sqrt{(-1)^2 - 4(0 - (-1))}}{2} = \frac{-1 \pm \sqrt{-3}}{2}$$

and here we have a problem. Notice the "√−3" term. As you know, there is no such real number. There is a concept of imaginary numbers, with i = √−1, and in that notation, we can write our eigenvalue as
$$\lambda = \frac{-1 \pm \sqrt{3}\,\sqrt{-1}}{2} = -\frac{1}{2} \pm \frac{\sqrt{3}}{2}\,i$$

But what can this mean? It certainly does not look good for our goal of decomposing the
matrix into two 1D multiplications.
In fact, the appearance of imaginary numbers is an infallible sign that we are dealing with a
type of motion that is indecomposable, namely, rotation.

The reason why complex numbers are associated with rotations can be made intuitive. Think
of a function f that has an eigenvalue λ = −1 along the eigenvector X. The action of f is to
flip the direction of any vector along this axis, for example, it would flip (1, 0) to (−1, 0); see
Figure 6.15.

Figure 6.15: The function f, whose eigenvalue is −1 along its eigenvector (which is the X axis), flips a positive vector (left) to a negative one (right).

Now think about this function not as a flip, but as a rotation through 180◦ , say counter-
clockwise. And now let’s consider a rotation of 90◦ , say counterclockwise. What would be the
eigenvalue of this 90◦ rotation? It has the property that applying it twice has the effect of a flip,
that is, λ = −1. But as we saw earlier, if f (X) = λX, then the effect of applying f twice is
$$f\big(f(X)\big) = \lambda\,f(X) = \lambda(\lambda X) = \lambda^2 X$$

The 90-degree rotation applied twice is the 180◦ rotation. So if λ90◦ were the eigenvalue of
the 90◦ rotation, it would have to have the property that
(λ90◦ )2 = −1
That, of course, implies that λ90◦ is imaginary. The equation has two solutions,
λ90◦ = ±i
The two solutions +i and −i correspond to the counterclockwise and clockwise rotations
(Figure 6.16).

Figure 6.16: Left: the imaginary eigenvalues λ = ±i represent a 90 degree rotation, either clockwise (λ = −i) or counterclockwise (λ = +i). Right: applying either rotation twice has the effect of flipping the horizontal vector, that is, multiplying it by −1.

It makes sense that rotation could not have real eigenvalues, because two real eigenvalues
would mean that the function could be split into two 1D expansions and contractions. But
rotation is an action that is essentially two-dimensional, and therefore indecomposable.
Think about the rotation matrices that we discussed earlier. For example, the matrix
$$M = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}$$
represents counterclockwise rotation through the angle θ (Figure 6.17).

Figure 6.17: The effect of the rotation matrix M is to rotate the black vector counterclockwise
by θ, producing the red vector.

What are its eigenvalues? Plugging the matrix coefficients into the eigenvalue formula (equation (6.3) on page 302),
$$\lambda = \frac{(a+d) \pm \sqrt{(a+d)^2 - 4(ad - cb)}}{2}$$
we get
$$\lambda = \frac{2\cos\theta \pm \sqrt{(2\cos\theta)^2 - 4\big((\cos\theta)^2 - (-(\sin\theta)^2)\big)}}{2}$$

But recall that
$$(\cos\theta)^2 + (\sin\theta)^2 = 1$$
so
$$\lambda = \frac{2\cos\theta \pm \sqrt{4(\cos\theta)^2 - 4}}{2} = \cos\theta \pm \sqrt{(\cos\theta)^2 - 1} = \cos\theta \pm \sqrt{-(\sin\theta)^2} = \cos\theta \pm \sin\theta\,\sqrt{-1}$$
Therefore, the eigenvalues for this rotation matrix consist of a pair of complex conjugate values:
$$\lambda = \cos\theta \pm (\sin\theta)\,i$$

And when the rotation angle is θ = 90°, the eigenvalues are
$$\lambda_{90°} = \cos 90° \pm (\sin 90°)\,i = \pm i$$
This confirms our earlier remark that the λ for a 90° rotation would have to be λ = ±i.
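Numerical routines have no trouble with this: they simply return complex eigenvalues. A sketch for θ = 90°:

```python
import numpy as np

theta = np.pi / 2                   # 90 degrees
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

print(np.linalg.eigvals(R))         # approximately +1j and -1j
                                    # (real parts ~1e-16, i.e., numerical zero)
```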

We can now return to our original example:
$$M = \begin{pmatrix} 0 & 1 \\ -1 & -1 \end{pmatrix}$$
We calculated its eigenvalues as
$$\lambda = -\frac{1}{2} \pm \frac{\sqrt{3}}{2}\,i$$
which implies that the action of M must be a rotation. We can confirm this by applying M to
some random test points.
Note that successive applications of the matrix M bring the point back to its original position
after three iterations (Figure 6.18).

Figure 6.18: Applying the matrix M to the point p three times brings it back to p.


Exercise 6.4.7 Show that $M^3 = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$.

Exercise 6.4.8 Using the point $p = \begin{pmatrix} 5 \\ 0 \end{pmatrix}$ as the test point, apply M three times to calculate Mp, M²p, and M³p.

Thus, we confirm that complex eigenvalues imply the existence of rotation. To put it another
way, what is an eigenvector? It’s a vector whose direction is unchanged by the action of M,
which merely stretches, contracts, and/or flips it. But obviously, in the action of a rotation, no
direction stays the same! So a rotation cannot have real eigenvalues or real eigenvectors.
So we can now give a definite answer to our question, are all matrices diagonalizable? The
answer is no. Instead there is a weaker condition that is true: every 2D matrix is either
(1) diagonalizable, which means that it has two real eigenvalues, or
(2) a rotation (possibly together with expansion and/or contraction), which means that it has
a pair of complex conjugate eigenvalues.

Eigenvalues in n Dimensions
We have focused so far on 2D linear functions f : R2 → R2 and used the variables X and Y to
describe the domain and U and V to describe the codomain.
Now we want to study the n-dimensional case, and we will need a new terminology for the
variables. We want to consider an n-dimensional linear function
f : Rn −→ Rn
We will call the domain variables X = (X1 , X2 , . . . , Xn ) and the codomain variables Y =
(Y1 , Y2 , . . . , Yn ), so
f (X) = Y
f (X1 , X2 , . . . , Xn ) = (Y1 , Y2 , . . . , Yn )
From the definition of linear function, we know that there are constants a11, a12, ..., a1n, a21, a22, ..., a2n, ..., an1, an2, ..., ann such that
$$\begin{aligned} Y_1 &= a_{11}X_1 + a_{12}X_2 + \cdots + a_{1n}X_n \\ Y_2 &= a_{21}X_1 + a_{22}X_2 + \cdots + a_{2n}X_n \\ &\;\vdots \\ Y_n &= a_{n1}X_1 + a_{n2}X_2 + \cdots + a_{nn}X_n \end{aligned}$$
so that f is represented by the matrix
$$\begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{pmatrix}$$
The application of f to the vector X is then represented by
$$\begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{pmatrix}\begin{pmatrix} X_1 \\ X_2 \\ \vdots \\ X_n \end{pmatrix} = \begin{pmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{pmatrix}$$
Do n-dimensional linear functions have eigenvalues and eigenvectors? The answer is that the
n-dimensional case is remarkably like the 2-dimensional case. We will need some theorems and
principles from a linear algebra course or text. We will state them here as we need them; the
interested reader is encouraged to look them up for fuller treatment.
The first question is, can we find eigenvalues? Recall that in 2D, we wrote down the equation
ME = λE
where M is the matrix in question and λ and E are the desired eigenvalue and corresponding
eigenvector. In 2D, we wrote this matrix as

$$M = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$$

We then brute-force solved the linear equations and got the characteristic polynomial
$$\lambda^2 - (a + d)\lambda + (ad - cb) = 0$$
In order to generalize this process to n dimensions, we have to go back and restate our
argument in more general language. We were looking for eigenvectors and eigenvalues by trying
to solve
ME = λE

This is equivalent to saying
$$ME = (\lambda I)E$$
where I is the identity matrix
$$I = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$$

but that implies
$$ME - (\lambda I)E = 0 \;\Longrightarrow\; (M - \lambda I)E = 0$$
For every matrix, linear algebra defines a quantity, called the determinant. The determinant of
a matrix is a number that provides certain information about the matrix. Linear algebra defines
this number, called det(M) or |M|, for an arbitrary n-dimensional matrix M.
The details of the definition need not concern us here. What is important is two facts about
the determinant:
(1) The equation (M − λI)E = 0 has a nontrivial solution if and only if
det(M − λI) = 0

(2) The determinant of a 2D matrix $M = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix}$ is
$$\det(M) = a_{11}a_{22} - a_{21}a_{12}$$
We can now redescribe our brute-force derivation of the characteristic polynomial in 2D by
realizing that we are looking for solutions to
(M − λI)E = 0

a b
Since M is the matrix , the requirement
c d
 $ $
a−λ b $a − λ b $
det = $$ $=0
c d −λ c d − λ$
implies
(a − λ)(d − λ) − cb = 0
2
=⇒ λ + (a + d)λ + (ad − cb) = 0
which is exactly the characteristic polynomial!

The format det(M − λI) = 0 generalizes to n dimensions: the eigenvalues of the n-dimensional matrix
$$M = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{pmatrix}$$
are exactly the solutions to this equation.


The actual calculation of the determinant in higher dimensions is messy and is best left
to computer algebra programs, such as SageMath. This is especially true because just as the
2D characteristic polynomial contains a λ2 term, the n-dimensional characteristic polynomial
contains a λn term. Solving higher-order polynomial equations is extremely tedious and difficult
by hand.
We do know one very important fact, so important that it is sometimes called the fundamental theorem of algebra: an nth-order polynomial equation
$$a_0 X^n + a_1 X^{n-1} + \cdots + a_n = 0 \qquad (\text{where } a_0, a_1, \ldots, a_n \text{ are real numbers and } a_0 \neq 0)$$
has exactly n solutions, counting multiplicity. Moreover, these solutions are either real or come in pairs of complex conjugates. Applied to the characteristic polynomial det(M − λI) = 0, these n solutions are exactly the eigenvalues λ1, λ2, ..., λn of the n × n matrix M.
Therefore, an n-dimensional matrix has exactly n eigenvalues, and each of them is either a real number or half of a pair of complex conjugate numbers.
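In practice, we hand the root-finding to software. A sketch using numpy on a 3 × 3 example (the same matrix that appears in problem 4 of the exercises below):

```python
import numpy as np

A = np.array([[2.0, -5.0, -4.0],
              [0.0,  3.0,  2.0],
              [0.0, -4.0, -3.0]])

coeffs = np.poly(A)           # coefficients of det(A - lambda*I), highest power first
print(np.roots(coeffs))       # the three eigenvalues: 2, 1, and -1
print(np.linalg.eigvals(A))   # the same answer, computed directly
```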

Further Exercises 6.4


1. If M is a 3 × 3 matrix and $\begin{pmatrix} 3 \\ -2 \\ 3 \end{pmatrix}$ is an eigenvector of M with corresponding eigenvalue 5, what is $M\begin{pmatrix} 4 \\ -2 \\ 3 \end{pmatrix}$?

2. If f : R⁴ → R⁴ is a linear function and −2 is an eigenvalue of f with corresponding eigenvector $v = \begin{pmatrix} 3 \\ 1 \\ -3 \\ -7 \end{pmatrix}$, what is f(v)?

3. The matrix $A = \begin{pmatrix} -7 & 3 \\ -18 & 8 \end{pmatrix}$ has an eigenvector $\begin{pmatrix} 1 \\ 3 \end{pmatrix}$. What is its corresponding eigenvalue?
4. The matrix $A = \begin{pmatrix} 2 & -5 & -4 \\ 0 & 3 & 2 \\ 0 & -4 & -3 \end{pmatrix}$ has an eigenvector $\begin{pmatrix} 2 \\ -2 \\ 4 \end{pmatrix}$. What is its corresponding eigenvalue?

5. Which of the following are eigenvectors of $\begin{pmatrix} 7 & -5 \\ 10 & -8 \end{pmatrix}$? What are their corresponding eigenvalues?
a) $\begin{pmatrix} 2 \\ 3 \end{pmatrix}$  b) $\begin{pmatrix} 2 \\ 4 \end{pmatrix}$  c) $\begin{pmatrix} -1 \\ 2 \end{pmatrix}$  d) $\begin{pmatrix} -2 \\ -2 \end{pmatrix}$

6. Compute the eigenvalues and, if they exist, eigenvectors of the following matrices:
a) $\begin{pmatrix} 7 & 9 \\ 3 & 1 \end{pmatrix}$  b) $\begin{pmatrix} 0 & 2 \\ 4 & 6 \end{pmatrix}$  c) $\begin{pmatrix} 5 & -4 \\ 2 & 0.5 \end{pmatrix}$  d) $\begin{pmatrix} 3 & 4 \\ 2 & -1 \end{pmatrix}$  e) $\begin{pmatrix} -1 & -2 \\ 5 & 9 \end{pmatrix}$  f) $\begin{pmatrix} 3 & 0 \\ 0 & 1 \end{pmatrix}$

7. Compute the eigenvalues of the linear function
$$f\begin{pmatrix} X \\ Y \end{pmatrix} = \begin{pmatrix} 4X - 5Y \\ 2X - 2Y \end{pmatrix}$$

8. One of the eigenvalues of the matrix $\begin{pmatrix} -9 & -8 \\ 12 & 11 \end{pmatrix}$ is 3. What is a corresponding eigenvector for it?
9. One of the eigenvalues of the matrix $\begin{pmatrix} 4 & 5 & -3 \\ 4 & 6 & -4 \\ 8 & 11 & -7 \end{pmatrix}$ is 2. What is a corresponding eigenvector for it?

10. a) Solve for a and b in the equation
$$a\begin{pmatrix} 2 \\ 5 \end{pmatrix} + b\begin{pmatrix} -3 \\ 1 \end{pmatrix} = \begin{pmatrix} 9 \\ 14 \end{pmatrix}$$
b) Use your answer to part (a) to give the coordinates of $\begin{pmatrix} 9 \\ 14 \end{pmatrix}$ with respect to the basis $\left\{\begin{pmatrix} 2 \\ 5 \end{pmatrix}, \begin{pmatrix} -3 \\ 1 \end{pmatrix}\right\}$.
 
11. Give the coordinates of $\begin{pmatrix} -7 \\ 5 \end{pmatrix}$ with respect to the basis $\left\{\begin{pmatrix} 1 \\ 1 \end{pmatrix}, \begin{pmatrix} -1 \\ 2 \end{pmatrix}\right\}$.
12. The point of this problem is to demonstrate that if you know all the eigenvalues and
eigenvectors of a linear function f (or a matrix M), you can compute f (W) (which
is MW) for every vector W. In short, knowing all the eigenvalues and eigenvectors is
equivalent to knowing the function.
a) Solve for u and v in the equation
$$u\begin{pmatrix} 2 \\ 5 \end{pmatrix} + v\begin{pmatrix} -3 \\ 1 \end{pmatrix} = \begin{pmatrix} 9 \\ 14 \end{pmatrix}$$

(Hint: You will probably want to rewrite this as a system of equations and "solve simultaneously.")
b) Explain what your answer to part (a) means about the coordinates of $\begin{pmatrix} 9 \\ 14 \end{pmatrix}$ in some nonstandard coordinate system. (Hint: Which one?)
c) Suppose that f : R² → R² is a linear function and its eigenvectors are as follows: $\begin{pmatrix} 2 \\ 5 \end{pmatrix}$ with eigenvalue 2, and $\begin{pmatrix} -3 \\ 1 \end{pmatrix}$ with eigenvalue −3. What is $f\begin{pmatrix} 2 \\ 5 \end{pmatrix}$? What is $f\begin{pmatrix} -3 \\ 1 \end{pmatrix}$?
d) Continuing from part (c), what is $f\begin{pmatrix} 9 \\ 14 \end{pmatrix}$? (Hint: Use your answers to parts (a) and (c) and the two defining properties of a linear function.)

13. Diagonalize the following matrices:
a) $\begin{pmatrix} 8 & -3 \\ 10 & -3 \end{pmatrix}$  b) $\begin{pmatrix} 2 & -2 \\ 0 & -1 \end{pmatrix}$  c) $\begin{pmatrix} 2 & 3 \\ 4 & 1 \end{pmatrix}$

6.5 Linear Discrete-Time Dynamics


We will now develop an application of linear algebra to linear discrete-time dynamical systems.
Here f : Rn → Rn is the function that tells us that
XN+1 = f (XN )
In 1D, we saw that the only functions that can pass the stringent test for linearity are the functions f(X) = kX, where k is some constant in R. If k ≠ 0, these functions can equal 0 only once, and that is when X = 0. The definition of an equilibrium point for a discrete-time dynamical system is
$$X_{N+1} = X_N$$
But if XN+1 = kXN, then this would imply kXN = XN. If XN ≠ 0, then k must equal 1. And k = 1 is a very special value that is atypical and to be avoided; note that if k = 1, every point is an equilibrium point. As we saw in our discussion of discrete-time dynamical systems ("Discrete-Time Dynamical Systems" in Chapter 5 on page 225), the fact that f(X) = kX can be zero only when X = 0 means that the discrete-time system XN+1 = kXN has exactly one equilibrium point, at X = 0. As we saw, this equilibrium point is stable if |k| < 1, and unstable if |k| > 1.

Linear Uncoupled Two-Dimensional Systems


Let’s consider the two-dimensional case. To create our first example, we will take two 1D
discrete-time systems and join them together into an uncoupled (or decoupled) 2D system.
“Uncoupled” means that the growth of X depends only on X, and the growth of Y depends only

on Y:
$$\begin{aligned} X_{N+1} &= \alpha X_N \\ Y_{N+1} &= \beta Y_N \end{aligned} \;\Longrightarrow\; \begin{pmatrix} X_{N+1} \\ Y_{N+1} \end{pmatrix} = \begin{pmatrix} \alpha X_N \\ \beta Y_N \end{pmatrix}$$

This can also be written in matrix form:
$$\begin{pmatrix} X_{N+1} \\ Y_{N+1} \end{pmatrix} = \begin{pmatrix} \alpha & 0 \\ 0 & \beta \end{pmatrix}\begin{pmatrix} X_N \\ Y_N \end{pmatrix}$$
Notice that all the nonzero entries of this matrix are located on the diagonal going from the
top left corner to the bottom right (the main diagonal). It’s a diagonal matrix. If the matrix
representing a system of equations is diagonal, the variables in the equations are uncoupled.
So for example, if there are two noninteracting populations, one of which is growing at 40%
a year and the other at 20% a year, the system is described by the 2 × 2 matrix

$$\begin{pmatrix} \alpha & 0 \\ 0 & \beta \end{pmatrix} = \begin{pmatrix} 1.4 & 0 \\ 0 & 1.2 \end{pmatrix}$$
If we begin with an initial condition
$$\begin{pmatrix} X_0 \\ Y_0 \end{pmatrix} = \begin{pmatrix} 50 \\ 50 \end{pmatrix}$$
then the population of the two species in the following year is
$$\begin{pmatrix} X_1 \\ Y_1 \end{pmatrix} = \begin{pmatrix} 1.4 & 0 \\ 0 & 1.2 \end{pmatrix}\begin{pmatrix} 50 \\ 50 \end{pmatrix} = \begin{pmatrix} 1.4 \times 50 + 0 \times 50 \\ 0 \times 50 + 1.2 \times 50 \end{pmatrix} = \begin{pmatrix} 70 \\ 60 \end{pmatrix}$$
If we iterate this matrix repeatedly, we see that if we start at an initial condition of (X0 , Y0 ) =
(50, 50), the trajectory quickly flattens out, and the growth becomes mostly in the X direction
(Figure 6.19). The lesson here is that if a diagonal matrix has unequal growth rates, then the
dynamics will be eventually dominated by the larger growth rate. Here the growth rate along the
X axis is 40% and the growth rate along the Y axis is 20%, so the dynamics will eventually be
dominated by the growth in X.

Figure 6.19: Repeated applications of a matrix will result in a trajectory that lies along the direction of the dominant eigenvector. Here both populations are growing: starting from (X0, Y0) = (50, 50), thirteen iterations reach (X13, Y13) ≈ (3968, 535).
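Iterating such a system is a short loop. A sketch that reproduces the trajectory behind Figure 6.19:

```python
import numpy as np

M = np.array([[1.4, 0.0],
              [0.0, 1.2]])
state = np.array([50.0, 50.0])

trajectory = [state]
for _ in range(13):
    state = M @ state          # next year's populations
    trajectory.append(state)

print(trajectory[-1])          # approximately [3968.6, 535.0]
```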

We can also have declining populations. If one population is growing at 40% a year and the
other is declining at 20% a year, the matrix describing the system is

$$\begin{pmatrix} \alpha & 0 \\ 0 & \beta \end{pmatrix} = \begin{pmatrix} 1.4 & 0 \\ 0 & 0.8 \end{pmatrix}$$
If we iterate this matrix repeatedly, we see that there is growth in the X direction and
shrinking in the Y direction, and once again, the growth dynamics are eventually dominated by
the dimension with the larger growth (Figure 6.20).

Figure 6.20: When one population is declining, the long-term trajectory still lies along the direction of the dominant eigenvector. Starting from (X0, Y0) = (50, 50), eight iterations reach (X8, Y8) ≈ (738, 8).

Uncoupled systems are therefore easy to analyze, because the behavior of each variable can
be studied separately and the system then reassembled. Each variable is growing or shrinking
exponentially, and the overall system behavior is just a combination of the behaviors of the
variables making it up. (For simplicity, we use the word “grow” from now on to mean either
positive or negative growth.)

Exercise 6.5.1 If there are two noninteracting populations, one of which is growing at 20% a
year and the other at 25% a year, derive the matrix that describes the dynamics of the system
and simulate a trajectory of this system.

Exercise 6.5.2 If one population is growing at 20% a year and the other is declining at 10% a year, what would be the matrix that describes this system? Draw a trajectory of this system.

Exercise 6.5.3 For the exercise above, plot time series graphs for each population separately
to show that it is undergoing exponential growth or decline.

To understand this long-term behavior better, we can examine geometrically how a system’s
state vector is transformed by a matrix. Let’s use the matrix

$$\begin{pmatrix} 1.4 & 0 \\ 0 & 0.8 \end{pmatrix}$$
  
and apply it to three test vectors $\begin{pmatrix} 50 \\ 0 \end{pmatrix}$, $\begin{pmatrix} 50 \\ 50 \end{pmatrix}$, and $\begin{pmatrix} 0 \\ 50 \end{pmatrix}$. We get
$$\begin{pmatrix} 1.4 & 0 \\ 0 & 0.8 \end{pmatrix}\begin{pmatrix} 50 \\ 0 \end{pmatrix} = \begin{pmatrix} 70 \\ 0 \end{pmatrix} \qquad \begin{pmatrix} 1.4 & 0 \\ 0 & 0.8 \end{pmatrix}\begin{pmatrix} 50 \\ 50 \end{pmatrix} = \begin{pmatrix} 70 \\ 40 \end{pmatrix} \qquad \begin{pmatrix} 1.4 & 0 \\ 0 & 0.8 \end{pmatrix}\begin{pmatrix} 0 \\ 50 \end{pmatrix} = \begin{pmatrix} 0 \\ 40 \end{pmatrix}$$
If we plot these (Figure 6.21), we see that if a vector is along the X or Y axis, it just grows
or shrinks when multiplied by the matrix. However, a vector in general position is rotated in
addition to growing.
There is one more case we have to deal with. So far, all the entries in our matrices have
been positive real numbers. We have been thinking of examples in population dynamics, and the
only multipliers that make sense in population dynamics are positive real numbers. Suppose, for
example, that one of the matrix entries was negative. Then when we applied the matrix to a
vector of populations, one of the populations would become negative, which makes no sense in
the real world. But in general, state variables can take on any values, positive or negative, and
in these cases, negative multipliers make sense.
Figure 6.21: Black colors denote the three test vectors. Red colors denote the vectors that result after applying the matrix to these test vectors.

Consider, for example, the matrix
$$\begin{pmatrix} -1.4 & 0 \\ 0 & 0.8 \end{pmatrix}$$
If we begin with an initial condition
$$\begin{pmatrix} X_0 \\ Y_0 \end{pmatrix} = \begin{pmatrix} 50 \\ 50 \end{pmatrix}$$
then the next value is
$$\begin{pmatrix} X_1 \\ Y_1 \end{pmatrix} = \begin{pmatrix} -1.4 & 0 \\ 0 & 0.8 \end{pmatrix}\begin{pmatrix} 50 \\ 50 \end{pmatrix} = \begin{pmatrix} -1.4 \times 50 + 0 \times 50 \\ 0 \times 50 + 0.8 \times 50 \end{pmatrix} = \begin{pmatrix} -70 \\ 40 \end{pmatrix}$$
If we apply the matrix repeatedly, we get a trajectory that flips back and forth between
positive and negative X values, since multiplying twice by a negative number gives a positive
number. This results in an oscillation. This particular oscillation has a growing amplitude, since
| − 1.4| > 1. At the same time, the dynamics along the Y axis are shrinking, since 0.8 < 1
(Figure 6.22).

Figure 6.22: When the dominant eigenvalue is negative, repeated applications of the matrix still result in a trajectory that lies along the dominant eigenvector (here the X axis), while flipping back and forth between positive and negative X values.

Note that as the number of iterations grows, the trajectory grows flatter and flatter, and it
clings more and more to the X axis. Thus the long-term behavior of this system will be dominated
by the changes in X, because | − 1.4| > |0.8| and −1.4 is the eigenvalue in the X direction.
For every matrix, let’s define its principal eigenvector as the eigenvector whose eigenvalue
has the largest absolute value. (Since these matrices are diagonal, their eigenvalues are simply
the matrix entries on the main diagonal, and the corresponding eigenvectors are the X and Y
axes.)
We can now make a general statement, which is illustrated by all three examples: the long-
term behavior of an iterated matrix dynamical system is dominated by the principal eigenvalue,
and the state point will evolve until its motion lies along the principal eigenvector.
We can now summarize the behavior of 2D decoupled linear discrete-time systems. These are
the systems represented by the matrix

$$\begin{pmatrix} \alpha & 0 \\ 0 & \beta \end{pmatrix}$$
They have a unique equilibrium point at (0, 0), and the stability of that equilibrium point is
determined by the absolute value of α and β:
• If |α| > 1 and |β| > 1, then the equilibrium point is purely unstable.
• If |α| < 1 and |β| < 1, then the equilibrium point is purely stable.
• If |α| < 1 and |β| > 1 (or the reverse, |α| > 1 and |β| < 1), then the equilibrium point is
an unstable saddle point.
Moreover, the signs of α and β determine whether the state point oscillates on its way toward
or away from the equilibrium point.
• If α < 0, there is oscillation along the X axis.
• If β < 0, there is oscillation along the Y axis.
• If α > 0, there is no oscillation along the X axis.
• If β > 0, there is no oscillation along the Y axis.
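This classification is mechanical enough to put in code. A sketch (the function name is ours; it assumes |α| ≠ 1 and |β| ≠ 1, as in the text):

```python
def classify_decoupled(alpha, beta):
    """Stability of (0, 0) for X_{N+1} = alpha*X_N, Y_{N+1} = beta*Y_N."""
    if abs(alpha) > 1 and abs(beta) > 1:
        stability = "purely unstable"
    elif abs(alpha) < 1 and abs(beta) < 1:
        stability = "purely stable"
    else:
        stability = "unstable saddle point"
    oscillating = [axis for axis, k in (("X", alpha), ("Y", beta)) if k < 0]
    return stability, oscillating

# The two population examples discussed above:
print(classify_decoupled(1.4, 1.2))     # ('purely unstable', [])
print(classify_decoupled(-1.4, 0.8))    # ('unstable saddle point', ['X'])
```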

Exercise 6.5.4 By determining the absolute value and the signs of α and β, predict the long-
term behavior of the four discrete dynamical systems described by the following matrices:

$$\text{a)}\ \begin{pmatrix} -2 & 0 \\ 0 & 0.5 \end{pmatrix} \qquad \text{b)}\ \begin{pmatrix} 1.3 & 0 \\ 0 & 0.6 \end{pmatrix} \qquad \text{c)}\ \begin{pmatrix} -0.2 & 0 \\ 0 & 0.8 \end{pmatrix} \qquad \text{d)}\ \begin{pmatrix} 0.5 & 0 \\ 0 & 0.8 \end{pmatrix}$$
and then verify this prediction by iterating the matrix to simulate the dynamical systems.

Linear Coupled Two-Dimensional Systems

In the more general case, of course, X and Y are coupled: the next X value depends on both the previous X value and the previous Y value, and so does the next Y value. This gives us
$$\begin{aligned} X_{N+1} &= aX_N + bY_N \\ Y_{N+1} &= cX_N + dY_N \end{aligned} \;\Longrightarrow\; \begin{pmatrix} X_{N+1} \\ Y_{N+1} \end{pmatrix} = \begin{pmatrix} aX_N + bY_N \\ cX_N + dY_N \end{pmatrix}$$

which can then be written in the matrix form
$$\begin{pmatrix} X_{N+1} \\ Y_{N+1} \end{pmatrix} = \begin{pmatrix} a & b \\ c & d \end{pmatrix}\begin{pmatrix} X_N \\ Y_N \end{pmatrix}$$
where the off-diagonal entries are not zero.
We have already seen that such a matrix has eigenvalues λ1 and λ2 . Generically, these com-
pletely determine the action of the matrix.
• If λ1 and λ2 are real numbers, then there exist eigenvectors U and V corresponding to
those eigenvalues.
• If an eigenvalue has absolute value less than 1, the matrix shrinks vectors lying along that
eigenvector.
• If an eigenvalue has absolute value greater than 1, the matrix expands vectors lying along
that eigenvector.
• If an eigenvalue is negative, the action of the matrix is to flip back and forth between
negative and positive values along that eigenvector.
• The other case was that λ1 and λ2 are a pair of complex conjugate eigenvalues, and then
the action of the matrix was a rotation.
We will now use exactly these insights to draw conclusions about matrices as discrete-time dynamical systems: to determine the stability of the equilibrium point at (0, 0), find the eigenvalues of the matrix and infer stability.
Let’s look at some examples.

A Saddle Point: The Black Bear Model

We previously saw a model of black bear populations in which the juvenile and adult populations
in the (N + 1)st year were given as a linear function of the populations in the Nth year:
 
$$\begin{aligned} J_{N+1} &= 0.65 J_N + 0.5 A_N \\ A_{N+1} &= 0.25 J_N + 0.9 A_N \end{aligned} \;\Longrightarrow\; \begin{pmatrix} J_{N+1} \\ A_{N+1} \end{pmatrix} = \begin{pmatrix} 0.65 J_N + 0.5 A_N \\ 0.25 J_N + 0.9 A_N \end{pmatrix}$$
where 0.65 is the fraction of juveniles who remain alive as juveniles in the next year, and 0.25
is the fraction of juveniles who mature into adults that year. Furthermore, 0.5 is the birth rate
with which adults give birth to juveniles, and 0.9 is the fraction of adults who survive into the
next year.
The matrix form is
$$\begin{pmatrix} J_{N+1} \\ A_{N+1} \end{pmatrix} = \begin{pmatrix} 0.65 & 0.5 \\ 0.25 & 0.9 \end{pmatrix}\begin{pmatrix} J_N \\ A_N \end{pmatrix}$$

We saw that if we iterated M repeatedly, the juvenile and adult populations went to infinity
(Figure 6.4 on page 292). We will now explain why that is the case by looking at the eigenvalues
and corresponding eigenvectors of M.
First, let's find the eigenvalues for the matrix
$$M = \begin{pmatrix} 0.65 & 0.5 \\ 0.25 & 0.9 \end{pmatrix}$$
by plugging the matrix coefficients into the eigenvalue formula (equation (6.3) on page 302):
$$\lambda = \frac{(0.65 + 0.9) \pm \sqrt{(0.65 + 0.9)^2 - 4(0.65 \times 0.9 - 0.25 \times 0.5)}}{2}$$

$$= \frac{1.55 \pm \sqrt{(0.75)^2}}{2} = \frac{1.55 \pm 0.75}{2} = (1.15, 0.4)$$
Therefore, the two eigenvalues are
λ1 = 1.15 and λ2 = 0.4
Note that these are real numbers and that |λ1 | > 1 and |λ2 | < 1. Therefore, the behavior
must have one stable direction and one unstable direction. In other words, it must be a saddle
point.
To find the axes of the saddle point, we calculate the eigenvectors U and V corresponding to each eigenvalue. Let's say that $U = \begin{pmatrix} J \\ A \end{pmatrix}$. The matrix M acts like multiplication by λ1 along U, which means that
$$MU = \lambda_1 U$$
So we can say
$$MU = \begin{pmatrix} 0.65 & 0.5 \\ 0.25 & 0.9 \end{pmatrix}\begin{pmatrix} J \\ A \end{pmatrix} = \begin{pmatrix} 0.65J + 0.5A \\ 0.25J + 0.9A \end{pmatrix} = \lambda_1 U = 1.15\begin{pmatrix} J \\ A \end{pmatrix} = \begin{pmatrix} 1.15J \\ 1.15A \end{pmatrix}$$
So
0.65J + 0.5A = 1.15J =⇒ A=J
0.25J + 0.9A = 1.15A =⇒ A=J

Now A = J is the equation for a line in (J, A) space that has slope +1. This line is the axis U. We can choose any vector on the U axis to represent it, for example the vector $\begin{pmatrix} 1 \\ 1 \end{pmatrix}$, which is then an eigenvector of the matrix M corresponding to the eigenvalue λ1 = 1.15.
The eigenvector corresponding to the second eigenvalue λ2 = 0.4 can be found in a similar manner. It satisfies
$$MV = \lambda_2 V$$
Let's assume $V = \begin{pmatrix} J \\ A \end{pmatrix}$. Then
   
$$MV = \begin{pmatrix} 0.65 & 0.5 \\ 0.25 & 0.9 \end{pmatrix}\begin{pmatrix} J \\ A \end{pmatrix} = \begin{pmatrix} 0.65J + 0.5A \\ 0.25J + 0.9A \end{pmatrix} = \lambda_2 V = 0.4\begin{pmatrix} J \\ A \end{pmatrix} = \begin{pmatrix} 0.4J \\ 0.4A \end{pmatrix}$$
So
0.65J + 0.5A = 0.4J =⇒ A = −0.5J
0.25J + 0.9A = 0.4A =⇒ A = −0.5J

The equation A = −0.5J is the equation for a line in (J, A) space that has slope −0.5. This line is the axis V. We can choose any vector on the V axis to represent it, for example the vector $\begin{pmatrix} -2 \\ 1 \end{pmatrix}$, which is then an eigenvector of the matrix M corresponding to the eigenvalue λ2 = 0.4.
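As a quick check, a sketch that lets numpy confirm both eigenpairs of the black bear matrix:

```python
import numpy as np

M = np.array([[0.65, 0.5],
              [0.25, 0.9]])

eigenvalues, eigenvectors = np.linalg.eig(M)
print(eigenvalues)                        # contains 1.15 and 0.4

for lam, v in zip(eigenvalues, eigenvectors.T):
    print(round(lam, 2), v[1] / v[0])     # slope A/J: +1.0 for 1.15, -0.5 for 0.4
```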
If we plot these eigenvectors and choose a point, let's say
$$\begin{pmatrix} J_0 \\ A_0 \end{pmatrix} = \begin{pmatrix} 10 \\ 50 \end{pmatrix}$$

as our initial condition and apply the matrix to this vector once, we get
$$\begin{pmatrix} J_1 \\ A_1 \end{pmatrix} = \begin{pmatrix} 0.65 & 0.5 \\ 0.25 & 0.9 \end{pmatrix}\begin{pmatrix} 10 \\ 50 \end{pmatrix} = \begin{pmatrix} 0.65 \times 10 + 0.5 \times 50 \\ 0.25 \times 10 + 0.9 \times 50 \end{pmatrix} = \begin{pmatrix} 31.5 \\ 47.5 \end{pmatrix}$$
We see that the action of this matrix is to push the state point closer to the U axis while
moving away from the V axis. Thus, for this initial condition, the action of the matrix is to
increase the number of juveniles and decrease the number of adults in the first year (Figure 6.23).

Figure 6.23: One application of the matrix M to the point (J0, A0) takes it to (J1, A1), which is closer to the dominant eigenvector axis U and farther from the V axis.

If we iterate the matrix many times from two different initial conditions, we see that successive
points march toward the U axis and out along it. Since the U axis is the line A = J, we can say
that the populations of the two age groups approach a 1 : 1 ratio, while the whole population
grows larger and larger (Figure 6.24).

Figure 6.24: After repeated iterations of the matrix M, the long-term trajectory lies along the direction of the dominant eigenvector axis U, regardless of the initial conditions. Eventually, the ratio of adults to juveniles approaches a constant value.

Finally, our theoretical prediction of “saddle point” can be confirmed by applying the matrix
repeatedly to a set of initial conditions lying on a circle. In this way, we can construct a graphical
picture of the action of M. We see that the action is to squeeze along the V axis and expand
along the U axis (Figure 6.25).
Notice an interesting feature of Figure 6.25. We started with a circle of initial conditions, but
by the fifth iteration, the original circle had flattened nearly into a line, and that line was lying
along the principal eigenvector.

Figure 6.25: One application of the J-A matrix to a circle of initial conditions (black dots) transforms them into an oval (dark gray dots). A second application flattens the oval even further and expands it along the U axis (light gray dots). By the fifth iteration (red dots), the initial circle has been transformed into a line lying along the principal eigenvector and expanding in that direction.

A Stable Equilibrium Point: Black Bear in a Bad Year

Let’s consider the alternative scenario for the black bear, in a bad year.
We modeled “bad year” by lowering the birth rate from 0.5 to 0.4, and increasing the death
rate for juveniles to 40%, with 50% of them remaining juvenile and only 10% maturing to adults.
The juvenile population dynamics are
JN+1 = 0.5JN + 0.4AN
We also increased the adult death rate to 20%, and therefore, the survival rate will be
1 − 20% = 80%. The adult population dynamics are therefore
AN+1 = 0.1JN + 0.8AN
Putting these together, we get
$$\begin{pmatrix} J_{N+1} \\ A_{N+1} \end{pmatrix} = \begin{pmatrix} 0.5 J_N + 0.4 A_N \\ 0.1 J_N + 0.8 A_N \end{pmatrix}$$
The matrix that describes the "bad year" dynamics is
$$M_{bad} = \begin{pmatrix} 0.5 & 0.4 \\ 0.1 & 0.8 \end{pmatrix}$$
Recall that when we iterated M bad repeatedly, both juvenile and adult populations appeared
to go to extinction (Figure 6.5 on page 292). We can explain this long-term behavior by studying
the eigenvalues and corresponding eigenvectors of M bad .

What are the dynamics of this system? First, let's find the eigenvalues for the matrix by plugging the matrix coefficients into the eigenvalue formula
$$\lambda = \frac{(a+d) \pm \sqrt{(a+d)^2 - 4(ad - cb)}}{2}$$
We get
$$\lambda = \frac{(0.5 + 0.8) \pm \sqrt{(0.5 + 0.8)^2 - 4(0.5 \times 0.8 - 0.1 \times 0.4)}}{2} = \frac{1.3 \pm \sqrt{0.25}}{2} = \frac{1.3 \pm 0.5}{2} = (0.9, 0.4)$$
Therefore, the two eigenvalues are
λ1 = 0.9 and λ2 = 0.4
Note that these are real numbers and that both |λ1 | < 1 and |λ2 | < 1. Therefore, the
behavior must have two stable directions. In other words, it must be a purely stable node.
To find the axes of the node, we will calculate the eigenvectors U and V corresponding to each eigenvalue. Let's say $U = \begin{pmatrix} J \\ A \end{pmatrix}$. The matrix M_bad acts like multiplication by λ1 along U, which means that
$$M_{bad}\,U = \lambda_1 U$$
So we can say
$$M_{bad}\,U = \begin{pmatrix} 0.5 & 0.4 \\ 0.1 & 0.8 \end{pmatrix}\begin{pmatrix} J \\ A \end{pmatrix} = \begin{pmatrix} 0.5J + 0.4A \\ 0.1J + 0.8A \end{pmatrix} = \lambda_1 U = 0.9\begin{pmatrix} J \\ A \end{pmatrix} = \begin{pmatrix} 0.9J \\ 0.9A \end{pmatrix}$$
So
0.5J + 0.4A = 0.9J =⇒ A=J
0.1J + 0.8A = 0.9A =⇒ A=J

Now A = J is the equation for the line in (J, A) space that has slope +1. This line is the axis U. We can choose any vector on the U axis to represent it, for example the vector $\begin{pmatrix} 1 \\ 1 \end{pmatrix}$, which is then an eigenvector of the matrix M_bad corresponding to the eigenvalue λ1 = 0.9.
The eigenvector corresponding to the second eigenvalue λ2 = 0.4 can be found in a similar manner. It satisfies
$$M_{bad}\,V = \lambda_2 V$$
Let's assume $V = \begin{pmatrix} J \\ A \end{pmatrix}$. Then
   
$$M_{bad}\,V = \begin{pmatrix} 0.5 & 0.4 \\ 0.1 & 0.8 \end{pmatrix}\begin{pmatrix} J \\ A \end{pmatrix} = \begin{pmatrix} 0.5J + 0.4A \\ 0.1J + 0.8A \end{pmatrix} = \lambda_2 V = 0.4\begin{pmatrix} J \\ A \end{pmatrix} = \begin{pmatrix} 0.4J \\ 0.4A \end{pmatrix}$$
So
0.5J + 0.4A = 0.4J =⇒ A = −0.25J
0.1J + 0.8A = 0.4A =⇒ A = −0.25J

The equation A = −0.25J is the equation for the line in (J, A) space that has slope −0.25. This line is the axis V. We can choose any vector on the V axis to represent it, for example the vector $\begin{pmatrix} -4 \\ 1 \end{pmatrix}$, which is then an eigenvector of the matrix M_bad corresponding to the eigenvalue λ2 = 0.4.
If we plot these eigenvectors and choose a point
$$\begin{pmatrix} J_0 \\ A_0 \end{pmatrix} = \begin{pmatrix} 10 \\ 50 \end{pmatrix}$$
as our initial condition and apply the matrix to this vector once, we get the population of the two age groups in the next year:
$$\begin{pmatrix} J_1 \\ A_1 \end{pmatrix} = \begin{pmatrix} 0.5 & 0.4 \\ 0.1 & 0.8 \end{pmatrix}\begin{pmatrix} 10 \\ 50 \end{pmatrix} = \begin{pmatrix} 0.5 \times 10 + 0.4 \times 50 \\ 0.1 \times 10 + 0.8 \times 50 \end{pmatrix} = \begin{pmatrix} 25 \\ 41 \end{pmatrix}$$
We see that the action of this matrix is to push the state point closer to the U axis. But in contrast to the good year case, M_bad also moves the state point toward the V axis, since the U-component shrinks as well (Figure 6.26).

Figure 6.26: One application of the matrix M_bad to the point (J0, A0) takes it to (J1, A1), which is closer to both the U and V axes.

If we iterate M bad repeatedly, the state point always walks toward the U axis while approaching
(0, 0), which means extinction (Figure 6.27).

Figure 6.27: After repeated iterations of the matrix M_bad, the ratio of adults to juveniles approaches a constant value, regardless of the initial conditions. Notice that both populations are decreasing.

Finally, we confirm our theoretical prediction of “stable node” by applying the M bad matrix
repeatedly to a set of initial conditions that lie on a circle. The effect is to collapse the circle
onto the U axis along the direction of the V axis while shrinking along the U axis. The overall
effect is to shrink the circle to the point (0, 0) (Figure 6.28).

Figure 6.28: One application of the M_bad matrix to a circle of initial conditions (black dots) transforms them into an oval (dark gray dots). A second application flattens the oval even further and shrinks it along the U axis (light gray dots). By the fifth iteration (red dots), the initial circle has been transformed into a line lying along the principal eigenvector and continually shrinking along that direction.

Stable Equilibrium Point with Oscillatory Approach

We also simulated another Leslie matrix for a two-stage population. In this case, 10% of juveniles
remain juvenile, 40% become adults, and the rest die. The birth rate is 1.4 offspring per adult,
and only 20% of adults survive each year. This gives us the matrix

$$M_{osc} = \begin{pmatrix} 0.1 & 1.4 \\ 0.4 & 0.2 \end{pmatrix}$$
Repeated iteration of M osc resulted in an oscillatory approach to a stable equilibrium point at
(0, 0) (Figure 6.6 on page 293). We can understand this behavior by considering the eigenvalues
and corresponding eigenvectors of M osc .
First, let's find the eigenvalues for the matrix by plugging the matrix coefficients into the eigenvalue formula
$$\lambda = \frac{(a+d) \pm \sqrt{(a+d)^2 - 4(ad - cb)}}{2}$$
We get
$$\lambda = \frac{(0.1 + 0.2) \pm \sqrt{(0.1 + 0.2)^2 - 4(0.1 \times 0.2 - 0.4 \times 1.4)}}{2} = \frac{0.3 \pm \sqrt{2.25}}{2} = \frac{0.3 \pm 1.5}{2} = (0.9, -0.6)$$
= (0.9, −0.6)

Therefore, the two eigenvalues are


λ1 = 0.9 and λ2 = −0.6
These two eigenvalues are both real, and both have absolute value less than 1, so we know
that the equilibrium point is stable. To find the axes of the equilibrium point, we need to find
the corresponding eigenvectors.
First,
$$M_{osc}\,U = \lambda_1 U$$
We can say that
$$M_{osc}\,U = \begin{pmatrix} 0.1 & 1.4 \\ 0.4 & 0.2 \end{pmatrix}\begin{pmatrix} J \\ A \end{pmatrix} = \begin{pmatrix} 0.1J + 1.4A \\ 0.4J + 0.2A \end{pmatrix} = \lambda_1 U = 0.9\begin{pmatrix} J \\ A \end{pmatrix} = \begin{pmatrix} 0.9J \\ 0.9A \end{pmatrix}$$
This gives us
$$\begin{aligned} 0.1J + 1.4A = 0.9J \;&\Longrightarrow\; A = \tfrac{4}{7}J \\ 0.4J + 0.2A = 0.9A \;&\Longrightarrow\; A = \tfrac{4}{7}J \end{aligned}$$
which implies that the eigenvector U lies on the line A = (4/7)J, which has slope 4/7 ≈ 0.57. The vector (J, A) = (7, 4) will serve nicely as an eigenvector on this line.
For the second eigenvector, we solve
$$M_{osc}\,V = \lambda_2 V$$
We can say that
$$M_{osc}\,V = \begin{pmatrix} 0.1 & 1.4 \\ 0.4 & 0.2 \end{pmatrix}\begin{pmatrix} J \\ A \end{pmatrix} = \begin{pmatrix} 0.1J + 1.4A \\ 0.4J + 0.2A \end{pmatrix} = \lambda_2 V = -0.6\begin{pmatrix} J \\ A \end{pmatrix} = \begin{pmatrix} -0.6J \\ -0.6A \end{pmatrix}$$
yielding
0.1J + 1.4A = −0.6J =⇒ A = −0.5J
0.4J + 0.2A = −0.6A =⇒ A = −0.5J

The second eigenvector is therefore any vector on the line A = −0.5J, which is the line of
slope −0.5. For example, we can take (J, A) = (2, −1) as our eigenvector V.
Having calculated the eigenvalues and the eigenvectors, we can now make the theoretical
prediction that this matrix will shrink slowly along U and collapse more quickly toward the U
axis in an oscillating manner. The presence of a negative eigenvalue means that the matrix will
flip the state point back and forth on either side of the U axis. This flipping will occur with
ever-decreasing amplitudes, since |λ2 | < 1.
Let’s verify these predictions by applying the matrix to a test point (Figure 6.29). We see,
exactly as predicted, that the state point oscillates around the U axis with diminishing amplitude
as it approaches the origin.
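A sketch that iterates M_osc from an arbitrary test point; printing the point's V-coordinate shows the sign flipping and shrinking by a factor of −0.6 at each step:

```python
import numpy as np

M = np.array([[0.1, 1.4],
              [0.4, 0.2]])
B = np.column_stack([[7.0, 4.0], [2.0, -1.0]])   # columns: eigenvectors U, V

p = np.array([100.0, 100.0])                     # an arbitrary test point
for n in range(6):
    pU, pV = np.linalg.solve(B, p)               # coordinates in the {U, V} system
    print(n, p.round(1), round(pV, 2))           # pV alternates in sign, shrinking
    p = M @ p
```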
Figure 6.29: Iteration of the matrix M_osc causes the state point to diminish continually along the U axis, while also diminishing along the V axis, but in an oscillatory manner.

Finally, if we apply the matrix repeatedly to a circle of initial conditions, we see that the first iteration flattens the circle into an oval, which points below the U axis. The second iteration flattens and shrinks the oval further and tilts it upward, so that it points above the U axis, while the third iteration further shrinks and flattens the oval and tilts it back to point below the U axis. The oscillatory tilt above and below the U axis is caused by the negative eigenvalue along the V direction (Figure 6.30).

Figure 6.30: Starting with a circle of initial conditions (0), repeated action of the matrix M_osc flattens the circle into an ellipse (1), and flips the ever-flattening ellipse back and forth across the U axis, in a diminishing manner (2, 3).

Thus the overall behavior is an oscillatory approach to the stable equilibrium point at (0, 0),
so both populations shrink to zero.

Unstable Equilibrium Point with Oscillatory Departure

In the previous example of M osc , the black bear population collapse is due partly to the low birth
rate of 1.4 offspring per adult. If we raise this birth rate to 2 offspring per adult, we get the
matrix
$$M_{osc2} = \begin{pmatrix} 0.1 & 2 \\ 0.4 & 0.2 \end{pmatrix}$$

and this new system has a distinctly different behavior. Now we have an unstable oscillatory
equilibrium point (Figure 6.31).

Figure 6.31: Iteration of the matrix M_osc2 results in a trajectory that is oscillatory/stable in one direction, and expanding (unstable) in the other.

Exercise 6.5.5 Calculate the eigenvalues and eigenvectors of this matrix with increased birth
rate, and use them to explain the behavior in Figure 6.31.

Neutral Equilibria: Markov Processes, and an Example in Epidemiology

We modeled the susceptible and infected populations in an epidemic, using a Markov process
model (“Neutral Equilibria” on page 293). We saw that when we iterated the matrix M SI repeat-
edly, we observed that the system would go to a stable equilibrium, but the equilibrium depended
on the initial condition (Figure 6.7 on page 294). We can explain why this occurs by studying
the eigenvalues and corresponding eigenvectors of M SI . We will see that in Markov process
models, there is always an eigenvalue λ = 1 that gives us a line of equilibrium points along its
corresponding eigenvector.
As before, the discrete-time dynamics for this S-I compartmental model is written in matrix form as
$$\begin{pmatrix} S_{N+1} \\ I_{N+1} \end{pmatrix} = \begin{pmatrix} 1-a & b \\ a & 1-b \end{pmatrix}\begin{pmatrix} S_N \\ I_N \end{pmatrix}$$
We made the assumption that at each time point (such as day, week, or month), a constant
fraction a of the susceptibles become infected and a constant fraction b of the infecteds recover
to become susceptible again. If a is the fraction of S that become I, then the fraction of S that
remain S must be 1 − a. If b is the fraction of I that become S, then the fraction of I that
remain I must be 1 − b.
We chose a = 0.1 and b = 0.2, giving us the matrix
$$M_{SI} = \begin{pmatrix} 0.9 & 0.2 \\ 0.1 & 0.8 \end{pmatrix}$$
What are the dynamics of this system? Let's find the eigenvalues for this matrix by plugging the matrix coefficients into the eigenvalue formula (equation (6.3) on page 302); we get
$$\lambda = \frac{(0.9 + 0.8) \pm \sqrt{(0.9 + 0.8)^2 - 4(0.9 \times 0.8 - 0.1 \times 0.2)}}{2} = \frac{1.7 \pm \sqrt{0.09}}{2} = \frac{1.7 \pm 0.3}{2} = (1, 0.7)$$

Therefore, the two eigenvalues are
$$\lambda_1 = 1 \quad\text{and}\quad \lambda_2 = 0.7$$
To find their corresponding eigenvectors U and V, let's say $U = \begin{pmatrix} S \\ I \end{pmatrix}$. The matrix M_SI acts like multiplication by λ1 along U. This means that
$$M_{SI}\,U = \lambda_1 U$$
$$M_{SI}\,U = \begin{pmatrix} 0.9 & 0.2 \\ 0.1 & 0.8 \end{pmatrix}\begin{pmatrix} S \\ I \end{pmatrix} = \begin{pmatrix} 0.9S + 0.2I \\ 0.1S + 0.8I \end{pmatrix} = \lambda_1 U = 1\begin{pmatrix} S \\ I \end{pmatrix} = \begin{pmatrix} S \\ I \end{pmatrix}$$

So
0.9S + 0.2I = S =⇒ I = 0.5S
0.1S + 0.8I = I =⇒ I = 0.5S

Now I = 0.5S is the equation of a line in (S, I) space that has slope 0.5. This line is the axis U. We can choose any vector on the U axis to represent it, for example the vector $\begin{pmatrix} 2 \\ 1 \end{pmatrix}$, which is then called an eigenvector of the matrix M_SI corresponding to the eigenvalue λ1 = 1.
The eigenvector corresponding to the second eigenvalue λ2 = 0.7 can be found in a similar manner. It satisfies
$$M_{SI}\,V = \lambda_2 V$$
Let's assume $V = \begin{pmatrix} S \\ I \end{pmatrix}$. Then
   
$$M_{SI}\,V = \begin{pmatrix} 0.9 & 0.2 \\ 0.1 & 0.8 \end{pmatrix}\begin{pmatrix} S \\ I \end{pmatrix} = \begin{pmatrix} 0.9S + 0.2I \\ 0.1S + 0.8I \end{pmatrix} = \lambda_2 V = 0.7\begin{pmatrix} S \\ I \end{pmatrix} = \begin{pmatrix} 0.7S \\ 0.7I \end{pmatrix}$$
So
0.9S + 0.2I = 0.7S =⇒ I = −S
0.1S + 0.8I = 0.7I =⇒ I = −S

Since I = −S is the equation of a line in (S, I) space that has slope −1, this line is the axis V. We can choose any vector on the V axis to represent it, for example the vector $\begin{pmatrix} 1 \\ -1 \end{pmatrix}$, which is then called an eigenvector of the matrix M_SI corresponding to the eigenvalue λ2 = 0.7.

The action of M_SI. The matrix acts as multiplication by λ1 = 1 along U, and it acts as multiplication by λ2 = 0.7 along V. The problem comes when we try to say whether the point (0, 0) is stable or unstable. Along the V eigenvector, it has |λ2| = 0.7 < 1, so it is clearly stable in the V direction. But in the U direction, it is neither expanding nor shrinking! The eigenvalue λ1 = 1 along the U direction means that every point on U is an equilibrium point.
According to this analysis, the action of the matrix M_SI on a point should be to compress it along the V axis and leave it unchanged (that is, multiplied by λ1 = 1) along the U axis.
This prediction is confirmed by some experiments with the matrix M SI .
If we start with an initial condition

$$\begin{pmatrix} S_0 \\ I_0 \end{pmatrix} = \begin{pmatrix} 50 \\ 50 \end{pmatrix}$$

and apply the matrix to this vector once, we get

$$\begin{pmatrix} S_1 \\ I_1 \end{pmatrix} = \begin{pmatrix} 0.9 & 0.2 \\ 0.1 & 0.8 \end{pmatrix}\begin{pmatrix} 50 \\ 50 \end{pmatrix} = \begin{pmatrix} 0.9 \times 50 + 0.2 \times 50 \\ 0.1 \times 50 + 0.8 \times 50 \end{pmatrix} = \begin{pmatrix} 55 \\ 45 \end{pmatrix}$$
If we decompose this initial condition along the directions of the two eigenvectors, we get the U-component and the V-component. The action of the matrix has no effect on the U-component, and it shrinks the V-component to 70% of its previous value (Figure 6.32, left). If we now apply M_SI repeatedly, we see that the overall effect is to walk the point down along the V direction toward the U axis (Figure 6.32, right).
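This walk toward the U axis is easy to reproduce numerically. The short sketch below (numpy, as an assumption-level stand-in for SageMath) iterates M_SI from (50, 50):

import numpy as np

M_SI = np.array([[0.9, 0.2],
                 [0.1, 0.8]])
state = np.array([50.0, 50.0])

for n in range(10):
    state = M_SI @ state
    print(n + 1, state)

# The first step gives (55, 45); later steps creep toward the
# equilibrium on the line I = 0.5 S, here about (66.7, 33.3).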


Figure 6.32: Left: Application of the S-I matrix M_SI to the initial condition (S_0, I_0) results in the state point (S_1, I_1), closer to the U axis but at a constant distance from the V axis. Right: Repeated applications of M_SI approach the U axis while remaining a constant distance from V.

Indeed, if we start with any initial condition on the line parallel to the V axis passing through
(50, 50), the dynamical system will converge to the same equilibrium point. For example, if we
take an initial condition on the other side of the U axis, say (90, 10), we see that the action of
the matrix is to walk the point up along the V direction toward the U axis (Figure 6.33).


Figure 6.33: The U axis is a line of stable equilibrium points for the matrix M_SI. Any initial condition on a given line parallel to the V axis will approach the same equilibrium point on the U axis.

Thus it is clear from both theoretical prediction and experiments that it is only the U com-
ponent of the initial condition that determines the final equilibrium point.

Therefore, if we start from an initial condition along a different line, say (10, 60), we see that the action of M_SI is to walk the state point toward a different equilibrium point on the U axis (Figure 6.34).


Figure 6.34: Two trajectories (red and black) starting from different initial conditions that do not lie on the same line parallel to the V axis will both approach the U axis, but toward different equilibrium points.

An effective way to visualize the action of any matrix M is to take a large number of initial conditions in a circle and look at what repeated iterations of M do to the circle. When we make this plot for the S-I matrix, we see that the action of M_SI is to flatten the circle into an oval. If we apply M_SI repeatedly, the oval gets thinner and thinner and shifts its axis slightly until it begins to resemble a thick flat line lying exactly along the U axis (Figure 6.35).


Figure 6.35: Left: one application of the S-I matrix to a circle of initial conditions (dark gray) transforms them into an oval. Right: repeated applications flatten and rotate the oval. By the tenth iteration (red dots), the initial circle has been transformed into a line lying along the principal eigenvector.

Thus, we see here again the fact that repeated iteration of a matrix from any set of initial
conditions results in a thin oval whose principal axis moves closer and closer to the principal
eigenvector. Finally, for a large number of iterations, the resulting structure resembles a line,
a thin finger pointing along the principal eigenvector. And so once again, when you iterate a
matrix many times, you are looking at its principal eigenvector.

Markov processes Note that in this case, there are no births or deaths; the number of people
remains constant. Therefore, the sum of the entries in each column of the matrix must be equal
to 1, because each person in the compartment must go somewhere.
In the matrix, 1 − a is the fraction of S who remain S, b is the fraction of I who become S, a is the fraction of S who become I, and 1 − b is the fraction of I who remain I. Each column sums to 1:

$$\begin{pmatrix} 1-a & b \\ a & 1-b \end{pmatrix}, \qquad (1-a) + a = 1, \qquad b + (1-b) = 1$$
A matrix whose columns all add up to 1 is called a stochastic matrix. It’s called “stochastic”
(which means involving chance or probability) because we can interpret the matrix entries as
transition probabilities from one compartment to another.
We can imagine a large number of particles, in this case people, hopping from one compart-
ment to another, with hopping probabilities given by the elements of the matrix. Every matrix of
transition probabilities like this one will have the property that the columns all add to 1, because
probabilities must add to 1. When we interpret the matrix as a matrix of transition probabilities,
the process is called a Markov process.
In all such processes, λ = 1 will always be an eigenvalue, and hence all equilibria are neutral
equilibria. In a neutral equilibrium system, the behavior will always be to go to a stable final
state, but the stable final state depends on the initial condition.
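A small numerical sketch makes the λ = 1 claim concrete. Assuming only that the columns of M sum to 1, the row vector (1, ..., 1) is a left eigenvector with eigenvalue 1; here this is checked with numpy on M_SI:

import numpy as np

M = np.array([[0.9, 0.2],
              [0.1, 0.8]])

print(M.sum(axis=0))         # each column sums to 1: a stochastic matrix
print(np.ones(2) @ M)        # (1, 1) again: a left eigenvector with lambda = 1
print(np.linalg.eigvals(M))  # 1.0 appears among the eigenvalues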

Neutral Oscillations from the Locust Model

We saw that the three-variable locust model consists of three stages: eggs (E), hoppers (juve-
niles) (H), and adults (A) (Bodine et al. 2014). The egg and hopper stages each last one year,
with 2% of eggs surviving to become hoppers and 5% of hoppers surviving to become adults.
Adults lay 1000 eggs (as before, we are modeling only females) and then die. The model was
$$\left.\begin{aligned}
E_{N+1} &= 0 \cdot E_N + 0 \cdot H_N + 1000\,A_N \\
H_{N+1} &= 0.02\,E_N + 0 \cdot H_N + 0 \cdot A_N \\
A_{N+1} &= 0 \cdot E_N + 0.05\,H_N + 0 \cdot A_N
\end{aligned}\right\}
\implies
L = \begin{bmatrix} 0 & 0 & 1000 \\ 0.02 & 0 & 0 \\ 0 & 0.05 & 0 \end{bmatrix}$$

We saw that the model gave us neutral oscillations, which depended on the initial conditions
(Figure 6.8 on page 295). We can confirm this by plotting the trajectory of repeated applications
of L to two different initial conditions in 3-dimensional (E, H, A) state space (Figure 6.36).


Figure 6.36: Two trajectories resulting from simulations of the locust population model with two
different initial conditions.

To explain this neutral oscillatory behavior, we need to study the eigenvalues of the matrix
L; see Exercise 6.5.6 below.

Exercise 6.5.6 Use SageMath to calculate the eigenvalues of L. Verify that they are
$$\lambda_1 = 1, \qquad \lambda_2 = -\frac{1}{2} + \frac{\sqrt{3}}{2}\,i, \qquad \lambda_3 = -\frac{1}{2} - \frac{\sqrt{3}}{2}\,i$$
What do the eigenvalues tell you about the behavior you have just seen? Relate each of the
phenomena you saw above to specific properties of the eigenvalues.
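One way to carry out the computation, shown here with numpy rather than the SageMath the exercise names, is:

import numpy as np

L = np.array([[0.0,  0.0,  1000.0],
              [0.02, 0.0,  0.0],
              [0.0,  0.05, 0.0]])

eigenvalues = np.linalg.eigvals(L)
print(eigenvalues)          # approximately 1, -0.5 + 0.866j, -0.5 - 0.866j
print(np.abs(eigenvalues))  # all three have magnitude 1: neutral oscillations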

Lessons

We have seen that the equilibrium point behavior of a linear discrete-time dynamical system is
entirely determined by the eigenvalue and eigenvector decomposition of its matrix representation.
There is also an important lesson about the long-term behavior of linear (or matrix) discrete-
time systems that we remarked on in each of our examples: if you take a blob of points and
apply a matrix M to them many times, you will be looking at the principal eigenvector of M.
Put another way, the long-term behavior of a linear discrete-time system is dominated by its
largest eigenvalue and the corresponding eigenvector.
There is a nice algebraic way to see why this is true. Suppose our n-dimensional dynamical system is

$$X_{N+1} = f(X_N) = MX_N$$

If we start with an initial condition X_0, then

$$X_N = M^N X_0$$

Now suppose that the eigenvalues of M, in descending order of magnitude (absolute value), are λ_1, λ_2, ..., λ_n, and the corresponding eigenvectors are E_1, E_2, ..., E_n. In the basis {E_1, E_2, ..., E_n} formed by the n eigenvectors, there are constants c_1, c_2, ..., c_n such that we can decompose the initial condition X_0 into

$$X_0 = c_1 E_1 + c_2 E_2 + \cdots + c_n E_n$$

Then applying M to X_0 once, we get

$$\begin{aligned}
X_1 = M(X_0) &= M\big(c_1 E_1 + c_2 E_2 + \cdots + c_n E_n\big) \\
&= M\big(c_1 E_1\big) + M\big(c_2 E_2\big) + \cdots + M\big(c_n E_n\big) \\
&= c_1 ME_1 + c_2 ME_2 + \cdots + c_n ME_n \\
&= c_1 \lambda_1 E_1 + c_2 \lambda_2 E_2 + \cdots + c_n \lambda_n E_n
\end{aligned}$$

And similarly,

$$\begin{aligned}
X_2 = M(X_1) &= M\big(c_1 \lambda_1 E_1 + c_2 \lambda_2 E_2 + \cdots + c_n \lambda_n E_n\big) \\
&= c_1 \lambda_1^2 E_1 + c_2 \lambda_2^2 E_2 + \cdots + c_n \lambda_n^2 E_n
\end{aligned}$$
If we iterate M 100 times, we get

$$\begin{aligned}
X_{100} = M(X_{99}) &= M\big(c_1 \lambda_1^{99} E_1 + c_2 \lambda_2^{99} E_2 + \cdots + c_n \lambda_n^{99} E_n\big) \\
&= c_1 \lambda_1^{100} E_1 + c_2 \lambda_2^{100} E_2 + \cdots + c_n \lambda_n^{100} E_n
\end{aligned}$$

If λ_1 is even slightly larger than λ_2, then λ_1^100 will be much larger than λ_2^100. Therefore, the dynamics along the principal eigenvector will dominate the long-term behavior of the matrix.
This principle is beautifully illustrated in the following example.
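Before turning to it, here is a minimal sketch of the principle itself, the classical power iteration, using a generic random matrix chosen purely as an assumption for illustration:

import numpy as np

rng = np.random.default_rng(0)
M = rng.random((3, 3))            # an arbitrary positive matrix
x = rng.random(3)                 # an arbitrary initial condition

for _ in range(100):
    x = M @ x
    x = x / np.linalg.norm(x)     # renormalize so the entries stay finite

# x now points (up to sign) along the principal eigenvector of M
eigenvalues, eigenvectors = np.linalg.eig(M)
principal = eigenvectors[:, np.argmax(np.abs(eigenvalues))].real
print(x)
print(principal / np.linalg.norm(principal))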

Further Exercises 6.5

1. A swan population can be subdivided into young swans (Y ) and mature swans (M). We
can then set up a discrete-time model of these populations as follows:
 
$$\begin{pmatrix} Y_{N+1} \\ M_{N+1} \end{pmatrix} = \begin{pmatrix} 0.57 & 1.5 \\ 0.25 & 0.88 \end{pmatrix} \begin{pmatrix} Y_N \\ M_N \end{pmatrix}$$
a) Explain the biological meaning of each of the four numbers in the matrix of this
model.
b) It turns out that the eigenvectors of this matrix are approximately as follows (you can check this using SageMath if you wish): (1.9, −0.6) with eigenvalue 0.09, and (1.9, 1.0) with eigenvalue 1.36. What will happen to the swan population in the long run?
c) Many years in the future, if there are 2000 mature swans, approximately how many
young swans would you expect there to be?

2. A blobfish population consists of juveniles and adults. Each year, 50% of juveniles become
adults and 10% die. Adults have a 75% chance of surviving from one year to the next
and have, on average, four offspring a year.
a) Write a discrete-time matrix model describing this population.
b) If the population this year consists of 50 juveniles and 35 adults, what will next year’s
population be?
c) What will happen to the population in the long run?

6.6 Google PageRank


Shortly after the invention of the World Wide Web, programs began to appear that would enable
you to search over the web to find websites, or “pages,” that mentioned a specified key word or
phrase.
The early versions of these “web browsers” or “search engines” were not very good. If you
typed in “Paris, France” you were as likely to be directed to the personal web page of a couple
from Seattle who had recently visited Paris and posted photos as to, say, the French government
website or the official site of the city of Paris.
Something had to be done to enable the search engine to rank websites according to how
“important” they are. But what does “important” mean? One answer to this was provided by two
graduate students in computer science, Sergey Brin and Larry Page, who published an article in
the journal Computer Networks in 1998, called “The Anatomy of a Large-Scale Hypertextual Web
Search Engine” (Brin and Page 1998). They began their paper thus: “In this paper, we present
Google, a prototype of a large-scale search engine which makes heavy use of the structure present
in hypertext. Google is designed to crawl and index the Web efficiently and produce much more
satisfying search results than existing systems.”
Their key idea is that we want not just websites, but important websites, that is, websites that are themselves pointed to or “voted for” by other important websites, which then “pass on” their importance to the sites that they point to. This regress suggests a dynamical system or iterated matrix system, iterating the “points to” function over and over.
As we saw in the discussion of discrete-time dynamical systems, the result of iterating a
matrix M over and over is the principal eigenvector of M. Indeed, Page and Brin describe their
new concept, called PageRank, which assigns an importance PR(A) to every page A, as follows:
“PageRank or PR(A) can be calculated using a simple iterative algorithm, and corresponds to
the principal eigenvector of the normalized link matrix of the web.”
The key idea is that we can represent networks with matrices. So let’s consider a net that is
composed of pages p1 , p2 , . . . , pn . First, we will create the “points to” matrix, which is
$$\begin{bmatrix} a_{11} & a_{12} & \ldots & a_{1n} \\ a_{21} & a_{22} & \ldots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \ldots & a_{nn} \end{bmatrix}$$
where aij = 1 if page pj points to page pi , and aij = 0 if not.

Note that the sum of the elements in each row i is the total number of pages that point to
page i , and the sum of the elements in each column j is the total number of pages that page j
points to:
$$a_{i1} + \cdots + a_{ij} + \cdots + a_{in} = \text{sum of the } i\text{th row} = \text{total number of pages that point to page } i$$

$$a_{1j} + \cdots + a_{ij} + \cdots + a_{nj} = \text{sum of the } j\text{th column} = \text{total number of pages that page } j \text{ points to}$$

Then we have to account for the fact that a webpage might point to many other pages. A
“vote” from a selective page counts more than a “vote” from a page that points to lots of other
pages, so if one page points to many others, the importance score that it passes on to the other
pages must be diluted by the total number of outbound links. For example, if page j points to
page i , then aij = 1. But this will need to be diluted by the total number of pages that page j
points to, which is a1j + a2j + · · · + anj .
So we define L_ij as the normalized weight of page j's vote on page i:

$$L_{ij} = \frac{\text{page } j\text{'s vote on page } i \ (0 \text{ or } 1)}{\text{total number of pages that page } j \text{ points to}} = \frac{a_{ij}}{a_{1j} + a_{2j} + \cdots + a_{nj}}$$
We now define the “links to” matrix
$$L = [L_{ij}] = \begin{bmatrix} L_{11} & L_{12} & \ldots & L_{1n} \\ L_{21} & L_{22} & \ldots & L_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ L_{n1} & L_{n2} & \ldots & L_{nn} \end{bmatrix}$$
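Building L from the “points to” matrix is a one-line normalization. The sketch below (numpy; the function name is my own, not from the text) divides each column by its column sum, assuming every page points to at least one other page:

import numpy as np

def links_to_matrix(A):
    """Normalize each column of a 'points to' matrix by its column sum."""
    column_sums = A.sum(axis=0)
    return A / column_sums        # broadcasting divides column j by its sum

A = np.array([[0, 1],
              [1, 1]])
L = links_to_matrix(A)
print(L.sum(axis=0))              # each column of L sums to 1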
Then let's define the PageRank vector as the vector made up of the “importance” of each page p_1, p_2, ..., p_n. This is the vector that the search engine needs to calculate to assign an importance to each page in the network. Its components PR_1, PR_2, ..., PR_n are the importance scores of each page. The higher the score, the more important the page. The more important the page, the higher it appears in the search engine results:

$$PR = \begin{pmatrix} \text{importance of } p_1 \\ \text{importance of } p_2 \\ \vdots \\ \text{importance of } p_n \end{pmatrix} = \begin{pmatrix} PR_1 \\ PR_2 \\ \vdots \\ PR_n \end{pmatrix}$$

To start with, we will assume an initial condition, which we will call “old PR,” in which all n
pages have equal importance. We will normalize the total importance to 1, so
$$\text{old } PR = \begin{pmatrix} \text{old } PR_1 \\ \text{old } PR_2 \\ \vdots \\ \text{old } PR_n \end{pmatrix} = \begin{pmatrix} 1/n \\ 1/n \\ \vdots \\ 1/n \end{pmatrix}$$
Then we update the old PR vector. The new value of PR_i is the sum of the normalized incoming links to page i. In this way, each page that points to page i “passes on” a fraction of its own importance to page i.

That is, we update the page rank PR_i by assigning the new value

$$\text{new } PR_i = L_{i1}\cdot\big(\text{old } PR_1\big) + L_{i2}\cdot\big(\text{old } PR_2\big) + \cdots + L_{in}\cdot\big(\text{old } PR_n\big)$$

which is the sum, over all pages j, of the normalized weight of page j's vote on page i times page j's page rank.
If we do this update for each of the old page ranks, we get a “new” page rank vector

$$\text{new } PR = \begin{pmatrix} \text{new } PR_1 \\ \text{new } PR_2 \\ \vdots \\ \text{new } PR_n \end{pmatrix} = \begin{pmatrix} L_{11}\cdot(\text{old } PR_1) + L_{12}\cdot(\text{old } PR_2) + \cdots + L_{1n}\cdot(\text{old } PR_n) \\ L_{21}\cdot(\text{old } PR_1) + L_{22}\cdot(\text{old } PR_2) + \cdots + L_{2n}\cdot(\text{old } PR_n) \\ \vdots \\ L_{n1}\cdot(\text{old } PR_1) + L_{n2}\cdot(\text{old } PR_2) + \cdots + L_{nn}\cdot(\text{old } PR_n) \end{pmatrix}$$

This can be rewritten as

$$\text{new } PR = \begin{bmatrix} L_{11} & L_{12} & \ldots & L_{1n} \\ L_{21} & L_{22} & \ldots & L_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ L_{n1} & L_{n2} & \ldots & L_{nn} \end{bmatrix} \begin{pmatrix} \text{old } PR_1 \\ \text{old } PR_2 \\ \vdots \\ \text{old } PR_n \end{pmatrix}$$

or in vector form

$$\text{new } PR = L\,\big(\text{old } PR\big)$$

But as Page and Brin saw, this is only a first estimate. The next question is, how important
are the sites that pointed to the sites that pointed to site i ? To take that factor into account,
we replace the “new” page rank vector by a “new new” page rank vector

$$\text{new new } PR = L\,\big(\text{new } PR\big) = L\,L\,\big(\text{old } PR\big) = L^2\,\big(\text{old } PR\big)$$

In other words, the infinite regress that is contained in the idea of “sites that are linked to by
sites that are linked to by . . . ” is actually a model for a discrete-time dynamical system that is
an iteration of the “link to” matrix.
What happens when we iterate this link matrix L many times? Suppose the eigenvectors of
L are E1 , E2 , . . . , En , in descending order of their corresponding eigenvalues, λ1 , λ2 , . . . , λn . So
λ1 is the largest eigenvalue.
The action of applying L to the initial condition “old PR” many times is then dominated by
the principal eigenvector of L, which is E1 . Indeed, there are constants c1 , c2 , . . . , cn that enable
us to express the initial condition

$$\text{old } PR = c_1 E_1 + c_2 E_2 + \cdots + c_n E_n$$

in the eigenvector basis {E_1, E_2, ..., E_n}. After iterating the matrix many times, say 100 times, we get

$$L^{100}\big(\text{old } PR\big) = L^{100}\big(c_1 E_1\big) + L^{100}\big(c_2 E_2\big) + \cdots + L^{100}\big(c_n E_n\big) = c_1 \lambda_1^{100} E_1 + c_2 \lambda_2^{100} E_2 + \cdots + c_n \lambda_n^{100} E_n$$

So the long-term behavior of repeatedly iterating L is dominated by the principal eigenvector E_1.

We call the principal eigenvector of the matrix L the page rank vector. Thus the vector E_1,

$$E_1 = \begin{pmatrix} PR_1 \\ PR_2 \\ \vdots \\ PR_n \end{pmatrix}$$

and its components PR_1, PR_2, ..., PR_n are the page ranks, the final importance scores assigned to each page. When you search for a term, Google presents pages to you in the order of their page ranks.

Surfer Model

In our discussion of Markov processes, we saw that a Markov process can be represented by a matrix M, each element m_ij of which is the probability of a person “hopping” from compartment j to compartment i in the next time interval.
The long-term behavior of the system is given by the iteration of the matrix, which will tend
to some outcome. As we saw, the results of that iteration are determined by the eigenvector
and eigenvalue decomposition of the matrix.
Brin and Page realized that their “links to” matrix could also be seen as a model of a Markov
process, in which a random web surfer “hops” from one page j to another page i with a probability
equal to the normalized weight of page j’s vote on page i , which is Lij .
Notice that the “links to” matrix satisfies the key condition that defines a “stochastic” matrix:
each column adds up to 1. For example, the elements of the jth column of the “links to” matrix
are L_1j, ..., L_ij, ..., L_nj. Recall that the definition of L_ij is

$$L_{ij} = \frac{\text{page } j\text{'s vote on page } i \ (0 \text{ or } 1)}{\text{total number of pages that page } j \text{ points to}} = \frac{a_{ij}}{a_{1j} + \cdots + a_{nj}}$$

So the sum of the jth column of the “links to” matrix is

$$L_{1j} + \cdots + L_{ij} + \cdots + L_{nj} = \frac{a_{1j}}{a_{1j} + \cdots + a_{nj}} + \cdots + \frac{a_{ij}}{a_{1j} + \cdots + a_{nj}} + \cdots + \frac{a_{nj}}{a_{1j} + \cdots + a_{nj}} = \frac{a_{1j} + \cdots + a_{nj}}{a_{1j} + \cdots + a_{nj}} = 1$$


So the page rank vector can be interpreted in this surfer model as the probability that the
surfer, clicking randomly on each page, will end up on a given page.

An Example of the PageRank Algorithm


Suppose we have a network of four web pages, A, B, C, and D. Each page contains links to some of the others: page A links to pages B, C, and D; page B links to page A; page C links to pages B and D; and page D links to pages A and B.

In this network, the “points to” relationship is summarized as

$$A \to B,\ C,\ D \qquad B \to A \qquad C \to B,\ D \qquad D \to A,\ B$$

where the arrow means “points to.” We can then derive the “points to” matrix, more commonly
called a directed adjacency matrix because it shows which pages are linked and the direction of
the link. For example, from the diagram, we know that page A points to page C; therefore, in
the “points to” matrix, we have aC←A = 1:
$$\text{“points to” matrix} = \begin{bmatrix} a_{A\leftarrow A} & a_{A\leftarrow B} & a_{A\leftarrow C} & a_{A\leftarrow D} \\ a_{B\leftarrow A} & a_{B\leftarrow B} & a_{B\leftarrow C} & a_{B\leftarrow D} \\ a_{C\leftarrow A} & a_{C\leftarrow B} & a_{C\leftarrow C} & a_{C\leftarrow D} \\ a_{D\leftarrow A} & a_{D\leftarrow B} & a_{D\leftarrow C} & a_{D\leftarrow D} \end{bmatrix} = \begin{bmatrix} 0 & 1 & 0 & 1 \\ 1 & 0 & 1 & 1 \\ 1 & 0 & 0 & 0 \\ 1 & 0 & 1 & 0 \end{bmatrix}$$
From the “points to” matrix, we can derive the “links to” matrix L by normalizing each “vote”
from page j to page i by the total number of “votes” cast by page j. So for example, the sum

of the first column of the “points to” matrix is the total number of pages that page A points to,
which is 3. So each vote that A casts has to be divided by 3.
$$\begin{bmatrix} 0 & 1 & 0 & 1 \\ 1 & 0 & 1 & 1 \\ 1 & 0 & 0 & 0 \\ 1 & 0 & 1 & 0 \end{bmatrix} \qquad \text{column sums: } 3,\ 1,\ 2,\ 2$$

In this manner, we derive the normalized weights: each of A's three votes has weight 1/3, B's single vote has weight 1, and each of C's and D's two votes has weight 1/2. These weights give rise to the “links to” matrix

$$L = \begin{bmatrix} L_{A\leftarrow A} & L_{A\leftarrow B} & L_{A\leftarrow C} & L_{A\leftarrow D} \\ L_{B\leftarrow A} & L_{B\leftarrow B} & L_{B\leftarrow C} & L_{B\leftarrow D} \\ L_{C\leftarrow A} & L_{C\leftarrow B} & L_{C\leftarrow C} & L_{C\leftarrow D} \\ L_{D\leftarrow A} & L_{D\leftarrow B} & L_{D\leftarrow C} & L_{D\leftarrow D} \end{bmatrix} = \begin{bmatrix} 0 & 1 & 0 & \frac{1}{2} \\ \frac{1}{3} & 0 & \frac{1}{2} & \frac{1}{2} \\ \frac{1}{3} & 0 & 0 & 0 \\ \frac{1}{3} & 0 & \frac{1}{2} & 0 \end{bmatrix}$$
If we begin with an initial condition that is the vector of equal weights to each page (0.25),
then the results of repeatedly iterating the matrix L are
$$PR = \begin{pmatrix} 0.25 \\ 0.25 \\ 0.25 \\ 0.25 \end{pmatrix} \quad L\,PR = \begin{pmatrix} 0.38 \\ 0.33 \\ 0.08 \\ 0.21 \end{pmatrix} \quad L^2 PR = \begin{pmatrix} 0.44 \\ 0.27 \\ 0.13 \\ 0.17 \end{pmatrix} \quad L^3 PR = \begin{pmatrix} 0.35 \\ 0.29 \\ 0.15 \\ 0.21 \end{pmatrix}$$

$$L^4 PR = \begin{pmatrix} 0.40 \\ 0.30 \\ 0.12 \\ 0.19 \end{pmatrix} \quad L^5 PR = \begin{pmatrix} 0.39 \\ 0.29 \\ 0.13 \\ 0.19 \end{pmatrix} \quad L^6 PR = \begin{pmatrix} 0.38 \\ 0.29 \\ 0.13 \\ 0.20 \end{pmatrix} \quad L^7 PR = \begin{pmatrix} 0.39 \\ 0.29 \\ 0.13 \\ 0.19 \end{pmatrix}$$

$$L^8 PR = \begin{pmatrix} 0.39 \\ 0.29 \\ 0.13 \\ 0.19 \end{pmatrix} \quad L^9 PR = \begin{pmatrix} 0.39 \\ 0.29 \\ 0.13 \\ 0.19 \end{pmatrix} \quad L^{10} PR = \begin{pmatrix} 0.39 \\ 0.29 \\ 0.13 \\ 0.19 \end{pmatrix}$$

Note that the iteration process stabilizes after only a few iterations and reaches a “stationary
distribution” that is the principal eigenvector, which gives us the final page ranks.
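The whole computation fits in a few lines. Here is a minimal numpy sketch (an assumption-level stand-in for the SageMath used elsewhere in the text) that reproduces the stationary distribution:

import numpy as np

L = np.array([[0,   1, 0,   1/2],
              [1/3, 0, 1/2, 1/2],
              [1/3, 0, 0,   0  ],
              [1/3, 0, 1/2, 0  ]])

PR = np.full(4, 0.25)        # equal initial importance for all four pages
for _ in range(10):
    PR = L @ PR

print(np.round(PR, 2))       # approximately (0.39, 0.29, 0.13, 0.19)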

Food Webs
Another example of a Google-style eigenvector-based ranking system can be found in the analysis
of food webs in ecology.
In a food web, nutrients move from one species to another. In an application of the Google
eigenvector concept, Allesina and Pascual wanted to find out whether a given species was
“important for co-extinctions” (Allesina and Pascual 2009). That is, they wanted to know which
species had the biggest impact on the food web and whose loss would therefore be the most
catastrophic.
If the food web has species 1, 2, . . . , k that interact with each other, we will let [aij ] be the
k × k matrix that represents the “preys on” hierarchy, in other words, the i th row and jth column
entry of the “preys on” matrix is given by aij = 1 if species j preys on species i .

Just as Google wants the web pages that are pointed to by web pages that are pointed to
. . . , so in food webs we are interested in species that are preyed on by species that are preyed
on . . . .
We find these “important” species by the same method: start with the “preys on” matrix of 0’s
and 1’s, normalize it to a stochastic matrix (all columns add to 1), and then find the principal
eigenvector. Each species’ importance in this food web is then its corresponding component in
this principal eigenvector.
The ranking that is produced by the principal eigenvector is then interpretable as “the sequence
of the losses that results in the fastest collapse of the network.” Allesina and Pascual argue that
this dominant eigenvector analysis is superior to other approaches to food webs, for example,
those that focus on “hub” or “keystone” species, which are defined as those species that have
the largest number of links to other species.

Input/Output Matrices and Complex Networks

Economics

The history of matrix analysis of networks begins in economics. The economist Wassily Leontief produced an input/output matrix analysis of the United States economy in 1941. In a matrix representation of an economy, we have a list of “sectors” s_1, s_2, ..., s_k, such as steel, water, rubber, and oil. We then form the k × k matrix [a_ij] in which each entry a_ij represents the quantity of resources that sector j orders from sector i.

The first practical application came two years later, during World War II. The US government asked Leontief to create an input/output matrix representing the Nazi war economy in order to identify which sectors were the most critical. This was done, and the eigenvector calculation of this large-dimensional matrix was one of the very early uses of automated computing.

Leontief used “the first commercial electro-mechanical computer, the IBM Automatic Sequence Controlled Calculator (called the Mark I), originally designed under the direction of Harvard mathematician Howard Aiken in 1939, built and operated by IBM engineers in Endicott, New York for the US Navy” (Miller and Blair 2009).
The results of his eigenvector analysis would not have been immediately obvious: the critical
sectors were oil and ball bearings. Ball bearings were critical components of machinery and
vehicles, and no substitutes for them existed. In accord with this analysis, the US Army Air
Forces designated ball bearing factories and oil refineries as the major targets for their bombing
campaign in Europe.

Ecological Networks

In the 1970s, ecologists studying the flow of energy and nutrients (substances like carbon,
nitrogen, and phosphorus) in ecosystems discovered Leontief’s work and began using it to study
ecosystems as input/output systems (Hannon 1973), creating the field of ecological network
analysis. (See Fath and Patten (1999) for a readable introduction.) The first step in doing so is
to decide what substance to study (this substance is called the currency of the model), and if we
are studying a whole ecosystem, decide how to partition it into compartments. Compartments
can be species, collections of species, or nonliving ecosystem components such as dissolved
nitrate in water.
We then measure or estimate how much of our currency flows between each pair of compart-
ments. This gives what is called the flow matrix F . Entry fij of this matrix tells us how much
currency flows from compartment j to compartment i . For example, the ecological interactions
that make up an oyster–mussel community in a reef have been modeled as consisting of six
compartments. The currency in this case is energy, and the flows from one compartment to
another are shown in Figure 6.37.


Figure 6.37: Six compartment model of reef community (redrawn from Patten (1985)).

Based on the graph of the network, we can make an input–output matrix for the compartments
in the system. We can then iterate this matrix to find the long-term behavior predicted by the
model.
Suppose we iterate the matrix many times and the system stabilizes at some equilibrium point.
When the system is at equilibrium, the sum of all the outflows (or inflows) from a compartment
is called the compartment’s throughflow .
We can make a vector, T, of these throughflows. Dividing each entry in the F matrix by the throughflow of the donor compartment gives a matrix called the G matrix, where G_ij = f_ij / T_j. This matrix gives us the probability that a unit of currency leaving compartment j enters compartment i, or the fraction of the currency that does so.

The G matrix tells us about the currency going from compartment j to compartment i in
one direct step. However, ecologists are interested in more than just the question of how much
flows from j to i . They also want to know about second-order flows, in which currency transfer
happens in two steps: j → k → i ; the currency first has to get from j to k and then from k to i .
The probability of going from j to k is Gkj , and the probability of going from k to i is Gik . And
the probability of going from j → i through k is the product of Gkj and Gik . Adding up these
products for all the compartments that could play the role of k gives the fraction of currency
leaving j that gets to i in two steps. We can do this for every compartment in the model simply
by multiplying the G matrix by itself. The resulting matrix is written as G^2. More generally, the amount of currency going from j to i in n steps is entry i, j of the matrix G^n.

Why is this interesting? Well, all powers of G tell us about indirect flows between j and i. We may sum all these matrices to obtain the sum of all indirect flows as G^2 + G^3 + ⋯. Because real ecosystems leak energy and nutrients, the entries in G^(n+1) are generally smaller than those in G^n, and the sum G^2 + G^3 + ⋯ converges to some limiting matrix. Comparing the entries of this matrix to those of G itself lets us compare the relative importance of direct and indirect flows. It turns out that in many ecosystem models, indirect flows are significant and can even carry more energy or nutrients than direct flows!
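The comparison is easy to sketch numerically. In the toy computation below (numpy, with a made-up 2-compartment G matrix, not data from a real ecosystem), the indirect-flow sum G^2 + G^3 + ⋯ is accumulated term by term; it converges because the entries of G^n shrink as n grows:

import numpy as np

G = np.array([[0.0, 0.3],
              [0.5, 0.0]])       # hypothetical one-step flow fractions

indirect = np.zeros_like(G)
power = G @ G                    # start the sum at G^2
for _ in range(50):
    indirect += power
    power = power @ G

print(G)                         # direct flows
print(indirect)                  # indirect flows: G^2 + G^3 + ...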
Why does this happen, despite the fact that currency is lost at every step? It's true that a longer path will typically carry less currency than a shorter one. But how many long paths are there? We can find out by taking powers of the adjacency matrix A. The i, jth entry of A^n tells us the number of paths of length n between j and i. For most ecosystem and food web models, these numbers rapidly become astronomical. For example, in the 29-species food web in Figure 6.38, there are at least 28 million paths between seals and the fish hake (Yodzis 1998).

Figure 6.38: A food web for an ecosystem off the coast of southern Africa. Reprinted from
“Local trophodynamics and the interaction of marine mammals and fisheries in the Benguela
ecosystem,” by P. Yodzis, 1998, Journal of Animal Ecology 67(4):635–658. Copyright 1998
John Wiley & Sons. Reprinted with permission from John Wiley & Sons.

This proliferation of paths allows indirect paths taken together to carry a large amount of
energy or nutrients, even though no individual path may be very significant. This is one of the
reasons why predicting how an ecosystem or other complex system will respond to an intervention
is difficult.
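Path counting itself is just matrix powering. A toy illustration (numpy; the 3-node web is hypothetical) shows how quickly the counts in A^n grow:

import numpy as np

A = np.array([[0, 1, 1],
              [1, 0, 1],
              [1, 1, 0]])        # a hypothetical 3-node adjacency matrix

for n in (2, 5, 10):
    # entry (i, j) of A^n counts directed paths of length n from j to i
    print(n)
    print(np.linalg.matrix_power(A, n))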

6.7 Linear Differential Equations


Our second major application of linear algebra is the subject of linear differential equations. Here,
the function
f : Rn −→ Rn

is the vector field that assigns the n-dimensional change vector

$$X' = (X_1', X_2', \ldots, X_n')$$

to the n-dimensional state vector

$$X = (X_1, X_2, \ldots, X_n)$$

Since both X and X′ are vectors in R^n, the vector field f truly is a function from R^n to R^n. We can decompose the function f : R^n → R^n into n component functions f_1, f_2, ..., f_n, each of which is a function R^n → R. This amounts to writing the vector differential equation

$$X' = f(X)$$

as the n component differential equations

$$\begin{aligned}
X_1' &= f_1(X_1, X_2, \ldots, X_n) \\
X_2' &= f_2(X_1, X_2, \ldots, X_n) \\
&\ \ \vdots \\
X_n' &= f_n(X_1, X_2, \ldots, X_n)
\end{aligned}$$
Linear dynamical systems have particularly simple behaviors and can be completely classified.

Equilibrium Points
First of all, let's discuss equilibrium points. If we think about one-dimensional linear vector fields, then we are talking about either

$$X' = rX \qquad \text{or} \qquad X' = -rX \qquad (\text{assuming } r > 0)$$

It is clear that the only equilibrium point these systems can have is X = 0.

But what about two-dimensional or even n-dimensional cases? In the n-dimensional case, if we are looking for equilibrium points, we are looking for solutions to

$$X' = 0 = f(X)$$

which implies

$$\begin{aligned}
X_1' &= 0 = f_1(X_1, X_2, \ldots, X_n) \\
X_2' &= 0 = f_2(X_1, X_2, \ldots, X_n) \\
&\ \ \vdots \\
X_n' &= 0 = f_n(X_1, X_2, \ldots, X_n)
\end{aligned}$$

where f_1, f_2, ..., f_n are all linear functions R^n → R.

How many solutions can this set of equations have? Here, a theorem from elementary algebra comes to the rescue:³ setting n linear functions of n unknowns equal to zero can have only one solution, and that is

$$X_1 = X_2 = \cdots = X_n = 0$$

(We can find this by using the first equation to eliminate X_1 in terms of the other variables, then using the second equation to eliminate X_2, and finally we get an equation of the form aX_n = 0, which can have only the solution X_1 = X_2 = ⋯ = X_n = 0.)

A linear system of differential equations has a unique equilibrium point, at

$$X_1 = X_2 = \cdots = X_n = 0$$

Stability
Having found the equilibrium point, we now need to determine its stability.
In the one-dimensional case, we have already seen that X′ = rX has a stable equilibrium point at X = 0 if and only if r < 0.

If we now pass to the decoupled 2D case,

$$X' = aX \qquad Y' = dY$$

we can say that since the system decouples into two 1D subsystems along the X and Y axes, the behavior of the equilibrium point is given by the behaviors along the two axes. The two 1D subsystems are X′ = aX and Y′ = dY. And if we join them, we get

$$\left.\begin{aligned} X' &= aX \\ Y' &= dY \end{aligned}\right\} \implies \begin{pmatrix} X' \\ Y' \end{pmatrix} = \begin{pmatrix} a & 0 \\ 0 & d \end{pmatrix} \begin{pmatrix} X \\ Y \end{pmatrix}$$

As we saw in Chapter 3, these equilibrium points can be purely stable nodes (a < 0 and
d < 0), purely unstable nodes (a > 0 and d > 0), and saddle points (a < 0 and d > 0 or a > 0
and d < 0).

Exercise 6.7.1 Why does it make sense that these signs of a and d give rise to the equilibrium types listed above? (Hint: Draw some phase portraits.)

Exercise 6.7.2 Classify the equilibria of the following systems:


$$\text{a)}\ \begin{cases} X' = 2X \\ Y' = -3Y \end{cases} \qquad \text{b)}\ \begin{cases} X' = 0.5X \\ Y' = 1.8Y \end{cases} \qquad \text{c)}\ \begin{cases} X' = -1.2X \\ Y' = -0.3Y \end{cases}$$

3 Almost all the time. The exceptions are cases in which two equations are multiples of each other, such as

0 = X + Y and 0 = 2X + 2Y . Try solving these for X and Y ; you don’t get a definite answer.

The Flow Associated with a Linear Differential Equation

1D Recall a very important fact about the differential equation

$$X' = rX$$

As we saw in Chapter 2, this differential equation has an explicit solution. In other words, it's possible to actually write out a function X(t) such that

$$X'(t) = rX(t)$$

In this case, the explicit solution to the differential equation is the function

$$X(t) = X(0)e^{rt}$$

where X(0) is the initial condition.

We call X(0)e^{rt} the flow corresponding to the differential equation X′ = rX.

Exercise 6.7.3 Find the flow of the differential equation X′ = 0.25X.

2D Let's go on to discuss the two-dimensional case. The simplest case is two uncoupled systems

$$X' = aX \qquad Y' = dY$$

This can be represented as the matrix differential equation

$$\begin{pmatrix} X' \\ Y' \end{pmatrix} = \begin{pmatrix} a & 0 \\ 0 & d \end{pmatrix} \begin{pmatrix} X \\ Y \end{pmatrix}$$

The flow corresponding to the diagonal matrix differential equation is then just the combination of the flows in the two components:

$$\begin{pmatrix} X(t) \\ Y(t) \end{pmatrix} = \begin{pmatrix} X(0)e^{at} \\ Y(0)e^{dt} \end{pmatrix}$$

where X(0) and Y(0) are the initial conditions.
where X(0), Y (0) are the initial conditions.

Exercise 6.7.4 Find the flow of the differential equation

$$X' = 0.3X \qquad Y' = -0.5Y$$

Exercise 6.7.5 What differential equation has the flow

$$X(t) = X(0)e^{2t} \qquad Y(t) = Y(0)e^{-0.7t}\ ?$$

Eigenbehavior

We can look at the equation

$$X' = rX$$

represented by the linear function

$$f(X) = rX$$

and ask something that may seem redundant and pointless. We will ask whether this 1D linear function has an eigenvalue and an eigenvector. The answer is that of course it does. An eigenvector spans a subspace along which f acts like multiplication by λ, and X obviously satisfies this, with λ = r.

Therefore, for the differential equation X′ = rX, we can rewrite the equation for the flow as

$$X(t) = X(0)e^{\lambda t} \qquad (\text{where } \lambda = r)$$
Similarly, in the 2D uncoupled case, for the matrix differential equation

$$\begin{pmatrix} X' \\ Y' \end{pmatrix} = \begin{pmatrix} a & 0 \\ 0 & d \end{pmatrix} \begin{pmatrix} X \\ Y \end{pmatrix}$$

we can ask whether the matrix has eigenvalues and eigenvectors. And again, the answer is that of course it does: the vectors

$$\{X, Y\} = \left\{ \begin{pmatrix} 1 \\ 0 \end{pmatrix}, \begin{pmatrix} 0 \\ 1 \end{pmatrix} \right\}$$

are eigenvectors, and the corresponding eigenvalues are

$$\lambda_X = a \qquad \lambda_Y = d$$

Then we can rewrite the equation for the flow for this uncoupled 2D system as

$$\begin{pmatrix} X(t) \\ Y(t) \end{pmatrix} = \begin{pmatrix} X(0)e^{\lambda_X t} \\ Y(0)e^{\lambda_Y t} \end{pmatrix}$$

Exercise 6.7.6 Construct the flow for the matrix differential equation

$$\begin{pmatrix} X' \\ Y' \end{pmatrix} = \begin{pmatrix} -2 & 0 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} X \\ Y \end{pmatrix}$$

This form is the key to understanding the general 2D case. By mixing and matching var-
ious values of λX and λY , we get a gallery of equilibrium points in diagonal linear systems
(Figure 6.39).

[Figure 6.39 shows four phase portraits of 2D uncoupled systems, each with its flow: an unstable node (λ_X = 1, λ_Y = 1: X′ = X, Y′ = Y, flow X(t) = X(0)e^t, Y(t) = Y(0)e^t); a stable node (λ_X = −1, λ_Y = −1: X′ = −X, Y′ = −Y, flow X(t) = X(0)e^(−t), Y(t) = Y(0)e^(−t)); a stable node (λ_X = −1, λ_Y = −2: X′ = −X, Y′ = −2Y, flow X(t) = X(0)e^(−t), Y(t) = Y(0)e^(−2t)); and a saddle point (λ_X = 1, λ_Y = −1: X′ = X, Y′ = −Y, flow X(t) = X(0)e^t, Y(t) = Y(0)e^(−t)).]

Figure 6.39: Equilibrium points and flows in 2D uncoupled systems.

We can now go on to the general case:

$$\left.\begin{aligned} X' &= aX + bY \\ Y' &= cX + dY \end{aligned}\right\} \implies \begin{pmatrix} X' \\ Y' \end{pmatrix} = \begin{pmatrix} a & b \\ c & d \end{pmatrix} \begin{pmatrix} X \\ Y \end{pmatrix}$$

The key to understanding behavior in this general case is to decompose the system into its eigenvalues and eigenvectors, and then infer the flow from the “eigenbehavior” just as we have been doing. So, for example, if λ1 and λ2 are both real numbers, we find their corresponding eigenvectors U and V, and conclude that the flow is U(0)e^(λ1 t) on the U axis and V(0)e^(λ2 t) on the V axis. This completely determines the behavior in the 2D state space.

An example in two dimensions. Consider the linear differential equation

$$X' = \frac{9}{7}X - \frac{4}{7}Y \qquad Y' = \frac{8}{7}X - \frac{9}{7}Y$$

represented by the matrix differential equation

$$\begin{pmatrix} X' \\ Y' \end{pmatrix} = \begin{bmatrix} \dfrac{9}{7} & -\dfrac{4}{7} \\ \dfrac{8}{7} & -\dfrac{9}{7} \end{bmatrix} \begin{pmatrix} X \\ Y \end{pmatrix}$$
How will this system behave? We need to study the eigenvalues and corresponding eigenvectors of the matrix

$$M = \begin{bmatrix} \dfrac{9}{7} & -\dfrac{4}{7} \\ \dfrac{8}{7} & -\dfrac{9}{7} \end{bmatrix}$$

The eigenvalues of this matrix are obtained by plugging the matrix entries into the characteristic equation (equation (6.2) on page 299). We get

$$\lambda_1 = 1 \quad \text{and} \quad \lambda_2 = -1$$

Exercise 6.7.7 Confirm this.

Next, we calculate the eigenvectors. The eigenvector U corresponding to λ1 satisfies

$$MU = \lambda_1 U$$

We can say that

$$MU = \begin{bmatrix} \dfrac{9}{7} & -\dfrac{4}{7} \\ \dfrac{8}{7} & -\dfrac{9}{7} \end{bmatrix}\begin{pmatrix} X \\ Y \end{pmatrix} = \begin{pmatrix} \dfrac{9}{7}X - \dfrac{4}{7}Y \\ \dfrac{8}{7}X - \dfrac{9}{7}Y \end{pmatrix} = \lambda_1 U = 1\begin{pmatrix} X \\ Y \end{pmatrix} = \begin{pmatrix} X \\ Y \end{pmatrix}$$

This gives us

$$\frac{9}{7}X - \frac{4}{7}Y = X \implies Y = 0.5X$$
$$\frac{8}{7}X - \frac{9}{7}Y = Y \implies Y = 0.5X$$

which implies that the eigenvector U lies on the line Y = 0.5X, which has slope 0.5. The vector (X, Y) = (2, 1) will serve nicely as an eigenvector on this line.
The eigenvector V corresponding to λ2 must satisfy

$$MV = \lambda_2 V$$

We can say that

$$MV = \begin{bmatrix} \dfrac{9}{7} & -\dfrac{4}{7} \\ \dfrac{8}{7} & -\dfrac{9}{7} \end{bmatrix}\begin{pmatrix} X \\ Y \end{pmatrix} = \begin{pmatrix} \dfrac{9}{7}X - \dfrac{4}{7}Y \\ \dfrac{8}{7}X - \dfrac{9}{7}Y \end{pmatrix} = \lambda_2 V = -1\begin{pmatrix} X \\ Y \end{pmatrix} = \begin{pmatrix} -X \\ -Y \end{pmatrix}$$

This gives us

$$\frac{9}{7}X - \frac{4}{7}Y = -X \implies Y = 4X$$
$$\frac{8}{7}X - \frac{9}{7}Y = -Y \implies Y = 4X$$

which implies that the eigenvector V lies on the line Y = 4X, which has slope 4. The vector (X, Y) = (1, 4) will serve nicely as an eigenvector on this line.
The resulting equilibrium point structure therefore has a stable direction along the V axis
(λV = λ2 = −1) and an unstable direction along the U axis (λU = λ1 = 1). Therefore, the
equilibrium point is a saddle point whose axes are U and V.
The flow corresponding to this saddle point is then exactly as in the uncoupled 2D system

$$\begin{pmatrix} U(t) \\ V(t) \end{pmatrix} = \begin{pmatrix} U(0)e^{\lambda_U t} \\ V(0)e^{\lambda_V t} \end{pmatrix}$$

where U(0) and V(0) are initial conditions expressed in the {U, V} coordinate system (Figure 6.40).


Figure 6.40: The flow around a saddle point. U and V are the unstable and stable eigenvectors.

Suppose we are given a matrix differential equation

$$\begin{pmatrix} X' \\ Y' \end{pmatrix} = M \begin{pmatrix} X \\ Y \end{pmatrix}$$

and we want to know the behavior from an initial condition (X(0), Y(0)). In order to find it:

(1) Use the coordinate transformation matrix T (see Changing bases: coordinate transforms in Section 6.4) to transform the initial conditions from the {X, Y} coordinate system to the {U, V} coordinate system:

$$\begin{pmatrix} X(0) \\ Y(0) \end{pmatrix} \xrightarrow{\ T\ } \begin{pmatrix} U(0) \\ V(0) \end{pmatrix}$$

(2) Evolve the differential equation along the U, V axes by the exponential flows

$$\begin{pmatrix} U(t) \\ V(t) \end{pmatrix} = \begin{pmatrix} U(0)e^{\lambda_U t} \\ V(0)e^{\lambda_V t} \end{pmatrix}$$

(3) Use the inverse coordinate transformation matrix T⁻¹ to transform the result from the {U, V} coordinate system back into the {X, Y} coordinate system:

$$\begin{pmatrix} U(0)e^{\lambda_U t} \\ V(0)e^{\lambda_V t} \end{pmatrix} \xrightarrow{\ T^{-1}\ } \begin{pmatrix} X(t) \\ Y(t) \end{pmatrix}$$
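These three steps translate directly into a few lines of code. The sketch below (numpy; the function name is my own, and it assumes M has real eigenvalues) uses the eigenvector matrix, which plays the role of T⁻¹ in the notation above:

import numpy as np

def solve_linear_ode(M, X0, t):
    eigenvalues, E = np.linalg.eig(M)     # columns of E are the eigenvectors
    UV0 = np.linalg.solve(E, X0)          # step 1: transform initial condition
    UVt = UV0 * np.exp(eigenvalues * t)   # step 2: exponential flow on each axis
    return E @ UVt                        # step 3: transform back to (X, Y)

M = np.array([[9/7, -4/7],
              [8/7, -9/7]])               # the saddle-point example above
print(solve_linear_ode(M, np.array([1.0, 0.0]), 2.0))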

Exercise 6.7.8 Classify the equilibria of the following linear differential equations:

$$\text{a)}\ \begin{cases} X' = Y \\ Y' = -2X - 3Y \end{cases} \qquad \text{b)}\ \begin{cases} X' = 4X + 3Y \\ Y' = X - 2Y \end{cases}$$

Complex eigenvalues Finally, let's consider the nondiagonalizable cases. Consider, for example, the spring with friction:

$$\left.\begin{aligned} X' &= V \\ V' &= -X - V \end{aligned}\right\} \implies \begin{pmatrix} X' \\ V' \end{pmatrix} = \begin{pmatrix} 0 & 1 \\ -1 & -1 \end{pmatrix} \begin{pmatrix} X \\ V \end{pmatrix}$$

$$M = \begin{pmatrix} 0 & 1 \\ -1 & -1 \end{pmatrix}$$

The eigenvalues of M are

$$\lambda = -\frac{1}{2} \pm \frac{\sqrt{3}}{2}\,i \approx -0.5 \pm 0.866\,i$$

So the eigenvalues are a pair of complex conjugate numbers with negative real parts.
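A two-line numerical check (numpy, offered as an assumption-level stand-in for the hand computation) confirms the complex pair:

import numpy as np

M = np.array([[0.0,  1.0],
              [-1.0, -1.0]])

print(np.linalg.eigvals(M))   # approximately -0.5 + 0.866j and -0.5 - 0.866j
# negative real part: shrinking; nonzero imaginary part: rotation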
How are we to understand the flow in the case of complex conjugate eigenvalues? The key is that it is really the same as in the case of real eigenvalues. There, we saw that the flow has the general form

$$e^{\lambda t}$$

along the corresponding eigenvectors. The same is true for complex eigenvalues: if λ = a + bi, then the flow is

$$e^{\lambda t} = e^{(a + bi)t} = e^{at}e^{bit}$$

The key to the dynamics is in the expression e^(at) e^(bit). Notice that it is the product of two terms.

The first term, e^(at), is an exponential in time, and its exponent is the real part of the eigenvalue. Therefore, if the real part of the eigenvalue is positive, the solution has a term that is exponentially growing with time, whereas if the real part of the eigenvalue is negative, the term becomes a negative exponential, decaying in time. So the sign of a, the real part of the eigenvalue, determines whether the dynamics are growing or shrinking.

The second term, e^(bit), which contains the imaginary part of the eigenvalue, bi, contributes rotation to the flow. We can see this by recalling Euler's formula e^(ix) = cos(x) + i sin(x). So

$$e^{bit} = \cos(bt) + i\,\sin(bt)$$

The presence of cosine and sine functions of time guarantees that the solution is a periodic function of time, which gives the solution its oscillatory component.
So, to return to our example of the spring with friction, we can say that the equilibrium point
at (0, 0) is
(1) oscillatory, because the eigenvalues are complex conjugates;
(2) shrinking, because the real part of the eigenvalues is less than 0.
Therefore, the equilibrium point is a stable spiral, which we confirm with simulation (Figure 6.41).


Figure 6.41: Simulation of the spring with friction verifies the prediction of a stable spiral equi-
librium point.

As another example, in the spring with negative friction,

$$\left.\begin{aligned} X' &= V \\ V' &= -X + V \end{aligned}\right\} \implies \begin{pmatrix} X' \\ V' \end{pmatrix} = \begin{pmatrix} 0 & 1 \\ -1 & 1 \end{pmatrix} \begin{pmatrix} X \\ V \end{pmatrix}$$

the dynamics are given by the eigenvalues of the matrix

$$M = \begin{pmatrix} 0 & 1 \\ -1 & 1 \end{pmatrix}$$

which are

$$\lambda = \frac{1}{2} \pm \frac{\sqrt{3}}{2}\,i \approx 0.5 \pm 0.866\,i$$

We conclude that the equilibrium point at (0, 0) is

(1) oscillatory, because the eigenvalues are complex conjugates;
(2) expanding, because the real part of the eigenvalues is greater than 0.

Therefore, the equilibrium point is an unstable spiral (Figure 6.42).


Figure 6.42: Simulation of the spring with negative friction verifies the prediction of an unstable
spiral equilibrium point.

Finally, for the frictionless spring,

$$\left.\begin{aligned} X' &= V \\ V' &= -X \end{aligned}\right\} \implies \begin{pmatrix} X' \\ V' \end{pmatrix} = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix} \begin{pmatrix} X \\ V \end{pmatrix}$$

the dynamics are given by the eigenvalues of the matrix

$$M = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}$$

which are

$$\lambda = \pm i$$

We conclude that the equilibrium point at (0, 0) is

(1) oscillatory, because the eigenvalues are complex conjugates;
(2) neither expanding nor shrinking, because the real part of the eigenvalues is equal to 0.

Therefore, the equilibrium point is a center (Figure 6.43).

Figure 6.43: Simulation of the frictionless spring verifies the prediction of a neutral equilibrium
point.

Exercise 6.7.9 Classify the equilibria of the linear differential equations whose eigenvalues are given below:

a) 2 ± 3i  b) 0.5 ± 2.6i  c) −3 ± 0.75i  d) −0.25 ± 0.1i



A Compartmental Model in Pharmacokinetics


A simple test for liver function is to inject a dye into the bloodstream and see how fast the liver
clears it from the blood and excretes it into the bile. If it clears the dye quickly, liver function is
normal. In the case of the liver, this test is possible because there is a dye (bromsulphthalein,
BSP) that is absorbed only by the liver (Watt and Young 1962).
In order to understand the dynamics of this process, we make a simple linear model. The
model is compartmental, with a blood compartment X and a liver compartment Y . (We don’t
need a bile compartment, since nothing depends on it; we can view it as excretion.)
We’ve seen compartmental models before, in the discrete-time setting. In the epidemiology
model, for example, we had an S (susceptible) compartment and an I (infected) compartment,
and we imagined particles (that is, people) “hopping” from one compartment to another at differ-
ent rates. Here we imagine not particles but a continuous fluid, “flowing” from one compartment
to another at different rates.
The compartmental model is shown in Figure 6.44, where a is the transfer rate of the dye
from the blood (X) to the liver (Y ), b is the transfer rate from the liver (Y ) to the blood (X),
and h is the clearance rate from the liver into the bile. To measure liver function, h is the quantity
we really want to know.


Figure 6.44: Compartmental model of the movement of a tracer dye between the liver and the
bloodstream.

The problem is that we can’t observe h. All we can observe is X(t), the concentration of
the dye in the blood. We can estimate X(t) by making a number of blood draws over time,
measuring the dye level at each time point and then using curve-fitting software to estimate the
smooth curve that best fits the data points.
In order to get from an observation of X(t) to an estimation of h, we need to solve this model. The differential equations are

$$X' = \underbrace{-aX}_{\text{blood}\to\text{liver}} + \underbrace{bY}_{\text{liver}\to\text{blood}}$$

$$Y' = \underbrace{aX}_{\text{blood}\to\text{liver}} - \underbrace{bY}_{\text{liver}\to\text{blood}} - \underbrace{hY}_{\text{liver}\to\text{bile}}$$

which we can write as a matrix differential equation

$$\begin{pmatrix} X' \\ Y' \end{pmatrix} = \begin{pmatrix} -a & b \\ a & -(b+h) \end{pmatrix} \begin{pmatrix} X \\ Y \end{pmatrix}$$
To model a single injection of the dye (BSP), we set the initial condition of the dye concen-
tration in the blood compartment to a nonzero value X(0) = c, and the initial condition of the
dye concentration in the liver compartment Y (0) = 0.

We will solve for the long-term dynamics by finding the eigenvalues of the matrix

$$M = \begin{pmatrix} -a & b \\ a & -(b+h) \end{pmatrix} \qquad (a > 0,\ b > 0,\ h > 0)$$

Plugging the four entries of M into the characteristic polynomial (equation (6.3) on page 302), we get the two eigenvalues

$$\lambda_1, \lambda_2 = \frac{1}{2}\left( -(a + b + h) \pm \sqrt{(a + b + h)^2 - 4ah} \right)$$
First of all, let's note that both eigenvalues are real. In order for this to be true, the expression under the square root sign has to be nonnegative. This is easily checked:

$$\begin{aligned}
(a+b+h)^2 - 4ah &= a^2 + b^2 + h^2 + 2ab + 2ah + 2bh - 4ah \\
&= a^2 - 2ah + h^2 + b^2 + 2ab + 2bh \\
&= (a-h)^2 + 2ab + 2bh + b^2 \\
&> 0
\end{aligned}$$
The next question is whether the eigenvalues are negative or positive. That depends on whether √((a+b+h)² − 4ah) is less than a + b + h. It is certainly true that

$$(a+b+h)^2 - 4ah < (a+b+h)^2$$

since 4ah is a positive number. This implies

$$\sqrt{(a+b+h)^2 - 4ah} < a + b + h$$

which implies

$$-(a+b+h) \pm \sqrt{(a+b+h)^2 - 4ah} < 0$$

So both eigenvalues λ1 , λ2 are negative real numbers, which means that (0, 0), the state in
which all dye is cleared, is a stable equilibrium point. Therefore, the behavior in approach to the
stable equilibrium point is the sum of two exponentially decaying terms. The question is how
fast the state point goes to the stable equilibrium point, for which we need the explicit solution.
Suppose that the eigenvectors corresponding to λ1 and λ2 are U and V. Then we can write the explicit solution to the differential equation as

$$\begin{pmatrix} U(t) \\ V(t) \end{pmatrix} = \begin{pmatrix} U(0)e^{\lambda_1 t} \\ V(0)e^{\lambda_2 t} \end{pmatrix}$$
But what we need, to compare it to the experimentally measured data, is X(t). So we need
X(t) and Y(t), not U(t) and V(t).
We go from one coordinate system to the other just as we did before, by means of the coordinate transformation matrix T that takes the {X, Y} basis into the {U, V} basis:

$$\begin{pmatrix} X(0) \\ Y(0) \end{pmatrix} \xrightarrow{\ T\ } \begin{pmatrix} U(0) \\ V(0) \end{pmatrix} \xrightarrow{\ e^{\lambda_1 t},\ e^{\lambda_2 t}\ } \begin{pmatrix} U(t) \\ V(t) \end{pmatrix} \xrightarrow{\ T^{-1}\ } \begin{pmatrix} X(t) \\ Y(t) \end{pmatrix}$$

When we carry this out, we get the explicit solutions

$$X(t) = Ae^{\lambda_1 t} + Be^{\lambda_2 t}$$

$$Y(t) = \frac{1}{b}\Big( A(a - \lambda_1)e^{\lambda_1 t} + B(a - \lambda_2)e^{\lambda_2 t} \Big)$$

where

$$A = \frac{(a - \lambda_2)X(0) - bY(0)}{\lambda_1 - \lambda_2} \qquad B = \frac{(a - \lambda_1)X(0) - bY(0)}{\lambda_2 - \lambda_1}$$
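Putting the pieces together, a short sketch (numpy, with made-up rates a, b, h chosen only so that |λ1| is much larger than |λ2|; they are not measured values) evaluates the eigenvalues, the coefficients A and B, and the resulting X(t):

import numpy as np

a, b, h = 1.0, 0.5, 0.2                   # hypothetical transfer/clearance rates
X0, Y0 = 100.0, 0.0                       # dye injected into the blood only

s = a + b + h
disc = np.sqrt(s**2 - 4*a*h)
lam1, lam2 = (-s - disc) / 2, (-s + disc) / 2   # lam1 = fast, lam2 = slow

A = ((a - lam2) * X0 - b * Y0) / (lam1 - lam2)
B = ((a - lam1) * X0 - b * Y0) / (lam2 - lam1)

t = np.linspace(0, 20, 5)
print(A * np.exp(lam1 * t) + B * np.exp(lam2 * t))   # X(t): two decaying terms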

In order to compare X(t) to the experimental data, we face a problem. There are four unknown
parameters in the X(t) equation, and it is very difficult to infer four unknown parameters from
a single curve.
The key step in doing this is to think about the graph of a process that is represented by the
sum of two negative exponentials.
Choosing typical numbers for the parameters, and assuming that |λ1 | is significantly greater
than |λ2 |, so that λ1 is a rapidly decaying process and λ2 is a slowly decaying process (which is
the case in the liver), we obtain the following graph:
[Graph: X(t) = Ae^(λ1 t) + Be^(λ2 t) (μg/L versus t in minutes), plotted together with its fast component Ae^(λ1 t) and its slow component Be^(λ2 t).]

The trick is to notice that in the early part of the curve, say the first five minutes, the curve
X(t) is very close to the fast negative exponential, while for t > 10 minutes, the curve X(t) is
very close to the slowly decaying process.
We then use the first segment of the X(t) curve to estimate Ae λ1 t , and the second segment
of the X(t) curve to estimate Be λ2 t . A simple calculation then gives us h, which is the liver’s
clearance rate.

Linear Differential Equations in n Dimensions

The extension to n-dimensional linear differential equations is straightforward: the situation in n dimensions is very similar to that in two dimensions, and no really new phenomena occur.

We already saw that if f : R^n → R^n is linear, then the matrix M that represents f has eigenvalues λ_1, λ_2, ..., λ_n. We saw that each eigenvalue is either a real number or one of a pair of complex conjugate eigenvalues.

We can then say that the equilibrium point at (0, 0, ..., 0) can be decomposed into
(1) stable 1D directions (eigenvectors whose eigenvalues λ < 0);
(2) unstable 1D directions (eigenvectors whose eigenvalues λ > 0);
(3) 2D spiraling behaviors corresponding to pairs of complex conjugate eigenvalues, which are
stable (spiraling in) if the real part of the eigenvalues is negative, and unstable (spiraling
out) if the real part of the eigenvalues is positive.
In this way, we can completely classify every equilibrium point of a linear differential equation.

Further Exercises 6.7

1. Suppose Romeo and Juliet's love obeys the differential equation

$$\begin{pmatrix} R' \\ J' \end{pmatrix} = A \begin{pmatrix} R \\ J \end{pmatrix}$$

where A is a 2 × 2 matrix with the following eigenvectors:

$$\begin{pmatrix} -2 \\ 3 \end{pmatrix} \text{ with eigenvalue } -1, \qquad \begin{pmatrix} 3 \\ 1 \end{pmatrix} \text{ with eigenvalue } -4$$
a) Give a rough sketch of the vector field for this differential equation.
b) What will happen in the long run?

2. Romeo and Juliet's relationship is modeled by the equations

$$R' = 0.5R + J \qquad J' = 2R - 0.1J$$

a) Find and classify all the equilibria for this system.
b) Use the system's eigenvectors to sketch its vector field.

3. Suppose Romeo and Juliet's love obeys the following differential equations:

$$R' = -R + 3J \qquad J' = 3R - J$$

The matrix of this system is

$$\begin{pmatrix} -1 & 3 \\ 3 & -1 \end{pmatrix}$$

which has the following eigenvectors:

$$\begin{pmatrix} 1 \\ 1 \end{pmatrix} \text{ with eigenvalue } 2, \qquad \begin{pmatrix} -1 \\ 1 \end{pmatrix} \text{ with eigenvalue } -4$$

We will use these two eigenvectors to define a new coordinate system, and we will use
u and v to represent these coordinates. However, in this problem, we will treat u and v
as new variables. Your goal is to rewrite this system of differential equations in terms of
these new variables.
a) Starting with the definition of the coordinates u and v,

$$\begin{pmatrix} R \\ J \end{pmatrix} = u\begin{pmatrix} 1 \\ 1 \end{pmatrix} + v\begin{pmatrix} -1 \\ 1 \end{pmatrix}$$

solve for u and v in terms of R and J to get

$$u = \frac{1}{2}R + \frac{1}{2}J \qquad v = -\frac{1}{2}R + \frac{1}{2}J$$
b) Since R and J are just functions of time, u and v are as well, and taking the derivative of both sides of the two equations above gives u′ = (1/2)R′ + (1/2)J′ and v′ = −(1/2)R′ + (1/2)J′. Substitute the original differential equations into this to get u′ and v′ in terms of R and J.
c) Now substitute the expressions for R and J (in terms of u and v) from part (a) into your answer from part (b) and simplify. This should give you u′ and v′ in terms of u and v.
d) What is the matrix of the new system of differential equations that you ended up
with in part (c)? What do you notice about its form? What do you notice about the
specific numbers that appear in it?
