Linear Algebra
Note that we have started to write the vector (X1, X2, ..., Xn) vertically, using round parentheses. The vertical expression means exactly the same thing as the horizontal expression; the horizontal one is common in dynamical systems theory, and the vertical one is common in linear algebra.
© Springer International Publishing AG 2017
A. Garfinkel et al., Modeling Life, DOI 10.1007/978-3-319-59731-7_6
Recall from Chapter 1 that state space is the space of all possible values of the state vector. Both state space and tangent space are copies of Rn. For example, in the Romeo–Juliet models, the state space R2 consists of all possible pairs (R, J), where both R and J belong to R, and the tangent space is also R2, the space of all possible pairs (R′, J′), where both R′ and J′ belong to R.
We also learned in Chapter 1 some elementary rules for manipulating vectors. We needed
these rules, for example, in Euler’s method, where we needed to multiply the change vector X
by the scalar Δt to get a small change vector, and then we needed to add the small change vector
to the current state vector to get the next state vector. These rules for scalar multiplication and
vector addition are the rules we will need for operating in Rn .
The space of all n-vectors Rn , together with the rules for scalar multiplication and vector
addition, is called n-dimensional vector space. Note that the sum of n-vectors is also an n-
vector, and the scalar multiple of an n-vector is also an n-vector. So the operations of scalar
multiplication and vector addition keep us in the same space.
In this chapter, we will learn about the properties of vector spaces and the linear functions that
take Rn → Rk , that is, take vectors in n-dimensional space (the domain) and assign to each of
them a vector in k-dimensional space (the codomain). Most of the time, we will focus on the
case n = k. To begin, let’s recall the rules for operating with vectors from Chapter 1.
(1) If X and Y are two vectors in Rn , then their sum is defined by
$$X + Y = \begin{pmatrix} X_1 \\ X_2 \\ \vdots \\ X_n \end{pmatrix} + \begin{pmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{pmatrix} = \begin{pmatrix} X_1 + Y_1 \\ X_2 + Y_2 \\ \vdots \\ X_n + Y_n \end{pmatrix}$$
(2) If X is a vector in Rn and a is a scalar in R, we define the multiplication of a vector by
a scalar as
$$aX = a\begin{pmatrix} X_1 \\ X_2 \\ \vdots \\ X_n \end{pmatrix} = \begin{pmatrix} aX_1 \\ aX_2 \\ \vdots \\ aX_n \end{pmatrix}$$
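The two operations above can be sketched numerically. This is a minimal illustration with arbitrary example vectors (not taken from the text), using NumPy arrays as n-vectors:

```python
import numpy as np

# Arbitrary example vectors in R^3.
X = np.array([1.0, 2.0, 3.0])
Y = np.array([4.0, 5.0, 6.0])

# (1) Vector addition: componentwise sum, still a 3-vector.
print(X + Y)          # [5. 7. 9.]

# (2) Scalar multiplication: each component scaled by a, still a 3-vector.
a = 0.1               # e.g., a time step Δt in Euler's method
print(a * X)          # [0.1 0.2 0.3]

# Euler's method uses both: new state = old state + Δt * (change vector).
print(X + a * Y)      # [1.4 2.5 3.6]
```

Note that both results are again 3-vectors, illustrating that the operations keep us in the same space.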
Exercise 6.2.1 Carry out the following operations, or say why they’re impossible.
a) $\begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix} + \begin{pmatrix} -2 \\ 0 \\ 5 \end{pmatrix}$  b) $-3\begin{pmatrix} 4 \\ 6 \\ -9 \end{pmatrix}$  c) $\begin{pmatrix} 2 \\ 4 \end{pmatrix} + \begin{pmatrix} 1 \\ 3 \\ 5 \end{pmatrix}$

d) $5\left(\begin{pmatrix} 0 \\ 1 \end{pmatrix} + \begin{pmatrix} 7 \\ 3 \end{pmatrix}\right)$  e) $-4\begin{pmatrix} 1 \\ 0 \end{pmatrix} + 2\begin{pmatrix} 0 \\ 1 \end{pmatrix}$  f) $5\begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix} - 3\begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix} + 8\begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix}$
These n vectors are a basis for Rn , by which we mean that every vector X can be written
uniquely as
$$X = \begin{pmatrix} X_1 \\ X_2 \\ \vdots \\ X_n \end{pmatrix} = X_1 e_1 + X_2 e_2 + \cdots + X_n e_n$$
To see why an arbitrary vector X can be represented uniquely in the {e1 , e2 , . . . , en } basis,
recall that
$$X = \begin{pmatrix} X_1 \\ X_2 \\ \vdots \\ X_n \end{pmatrix} = \begin{pmatrix} X_1 \\ 0 \\ \vdots \\ 0 \end{pmatrix} + \begin{pmatrix} 0 \\ X_2 \\ \vdots \\ 0 \end{pmatrix} + \cdots + \begin{pmatrix} 0 \\ 0 \\ \vdots \\ X_n \end{pmatrix}$$
Exercise 6.2.3 In e notation, what is the standard basis vector of R6 that has a 1 in position 5?
Exercise 6.2.4 Write the following vectors as the sum of scalar multiples of the standard basis
vectors in R2 .
a) $\begin{pmatrix} 45 \\ 12 \end{pmatrix}$  b) $\begin{pmatrix} 387 \\ 509 \end{pmatrix}$  c) $\begin{pmatrix} a \\ b \end{pmatrix}$
Exercise 6.2.5 Are the following expressions linear combinations? If so, of what variables?
a) $2a + 5b$  b) $e^X + 3Y$

c) $7Z + 6H - 3t^2$  d) $-6X + 4W + 5$
Exercise 6.2.6 Why does it make sense to describe a smoothie as a linear combination of
ingredients?
Exercise 6.2.7 According to the definition of linearity, are the following functions linear?
a) $f\left(\begin{pmatrix} X \\ Y \end{pmatrix}\right) = \begin{pmatrix} X^2 \\ 2Y \end{pmatrix}$  b) $f(X) = \sqrt{X}$

c) $f\left(\begin{pmatrix} X \\ Y \\ Z \end{pmatrix}\right) = \begin{pmatrix} 2X \\ XY \\ 3Z \end{pmatrix}$  d) $f\left(\begin{pmatrix} X \\ Y \\ Z \end{pmatrix}\right) = \begin{pmatrix} 2X \\ 4Y \\ 3Z \end{pmatrix}$
Linear functions R1 → R1. We'll start with the simplest example, f : R1 → R1. In this context, we think of numbers as one-dimensional vectors and write R1 instead of R. Thinking of R1 as a one-dimensional vector space, we see that it has the standard basis {e} = {(1)}.
If f is a linear function and X is any vector in R1 , what is f (X)?
To start answering this question, we’ll take the odd-seeming but useful step of writing X as
X · e. Then, according to the definition of linearity, we have
f (X) = f (X · e) = Xf (e)
But what is f(e)? We don't know what it is, but we do know that it belongs to R1. Let's just call it k. Then
$$Xf(e) = Xk$$
So f(X) = kX: every linear function from R1 to R1 is simply multiplication by some fixed scalar k.
Next, consider a linear function f : R2 → R1. Now f(e1) is some vector in R1; call it a. Similarly, f(e2) is some vector in R1; call it b:
$$f\left(\begin{pmatrix} X \\ Y \end{pmatrix}\right) = Xf(e_1) + Yf(e_2) = Xa + Yb = aX + bY$$
To summarize, if f : R2 → R1 is linear, it must have the form
$$f\left(\begin{pmatrix} X \\ Y \end{pmatrix}\right) = aX + bY$$
for two scalars a and b.
Exercise 6.2.10 Work through this procedure to find the form that a linear function f : R3 →
R1 must have.
Exercise 6.2.11 Explain what we are doing in each step in the series of equations above, paying
special attention to places where we use vector operations and the properties of linear functions.
When we want to talk about applying a matrix to a vector, we just write them next to each other, putting the matrix in square brackets on the left and the vector in round brackets on the right: $\begin{bmatrix} a & b \\ c & d \end{bmatrix}\begin{pmatrix} X \\ Y \end{pmatrix}$. The action of f on a vector in the domain is found by applying the matrix representation of f to the vector, according to the rule shown in Figure 6.1.
$$\begin{bmatrix} a & b \\ c & d \end{bmatrix}\begin{pmatrix} X \\ Y \end{pmatrix} = \begin{pmatrix} aX + bY \\ cX + dY \end{pmatrix}$$
Notice that the first column of the matrix is f (e1 ), and the second column is f (e2 ). This is
a general principle of how matrices work.
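The column principle just stated can be checked numerically. This is a sketch with an arbitrary example matrix (not one from the text): applying a matrix to e1 picks out its first column, and applying it to e2 picks out its second.

```python
import numpy as np

# Arbitrary example matrix representing some linear f: R^2 -> R^2.
A = np.array([[2.0, 6.0],
              [3.0, 8.0]])

e1 = np.array([1.0, 0.0])
e2 = np.array([0.0, 1.0])

# A applied to a standard basis vector returns the corresponding column.
print(A @ e1)   # [2. 3.]  -- the first column of A, i.e., f(e1)
print(A @ e2)   # [6. 8.]  -- the second column of A, i.e., f(e2)
```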
Exercise 6.2.13 If $f(e_1) = \begin{pmatrix} 3 \\ 6 \end{pmatrix}$ and $f(e_2) = \begin{pmatrix} -2 \\ 5 \end{pmatrix}$, what is the matrix representation of f?
Exercise 6.2.14 If the matrix representing f is $\begin{bmatrix} 6 & 8 \\ 5 & 1 \end{bmatrix}$, what are f(e1) and f(e2)?
The case f : R3 → R3 . Suppose f is a linear function that takes vectors in R3 (the domain) to
R3 (the codomain). And suppose X is a vector in R3 . In the standard basis {e1 , e2 , e3 }, X can
be written as
$$X = \begin{pmatrix} X_1 \\ X_2 \\ X_3 \end{pmatrix} = X_1\begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix} + X_2\begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix} + X_3\begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix} = X_1 e_1 + X_2 e_2 + X_3 e_3$$
To evaluate the action of f on X, we know that
f (X) = f (X1 e1 + X2 e2 + X3 e3 )
By the rules of linearity, we can decompose f (X) as
f (X) = f (X1 e1 + X2 e2 + X3 e3 )
= f (X1 e1 ) + f (X2 e2 ) + f (X3 e3 )
= X1 f (e1 ) + X2 f (e2 ) + X3 f (e3 )
We can say that f (e1 ) is some vector in R3 . Therefore, there are scalars a11 , a21 , and a31
such that
$$f(e_1) = \begin{pmatrix} a_{11} \\ a_{21} \\ a_{31} \end{pmatrix}$$
The vector f (e2 ) is also some vector in R3 . So there are scalars a12 , a22 , and a32 such that
$$f(e_2) = \begin{pmatrix} a_{12} \\ a_{22} \\ a_{32} \end{pmatrix}$$
Similarly, for f (e3 ), there are scalars a13 , a23 , and a33 such that
$$f(e_3) = \begin{pmatrix} a_{13} \\ a_{23} \\ a_{33} \end{pmatrix}$$
Consequently, plugging the expressions for f (e1 ), f (e2 ), and f (e3 ) into f (X), we get
$$\begin{aligned}
f(X) &= X_1 f(e_1) + X_2 f(e_2) + X_3 f(e_3) \\
&= X_1\begin{pmatrix} a_{11} \\ a_{21} \\ a_{31} \end{pmatrix} + X_2\begin{pmatrix} a_{12} \\ a_{22} \\ a_{32} \end{pmatrix} + X_3\begin{pmatrix} a_{13} \\ a_{23} \\ a_{33} \end{pmatrix} \\
&= \begin{pmatrix} a_{11}X_1 \\ a_{21}X_1 \\ a_{31}X_1 \end{pmatrix} + \begin{pmatrix} a_{12}X_2 \\ a_{22}X_2 \\ a_{32}X_2 \end{pmatrix} + \begin{pmatrix} a_{13}X_3 \\ a_{23}X_3 \\ a_{33}X_3 \end{pmatrix} \\
&= \begin{pmatrix} a_{11}X_1 + a_{12}X_2 + a_{13}X_3 \\ a_{21}X_1 + a_{22}X_2 + a_{23}X_3 \\ a_{31}X_1 + a_{32}X_2 + a_{33}X_3 \end{pmatrix} \\
&= \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix}\begin{pmatrix} X_1 \\ X_2 \\ X_3 \end{pmatrix}
\end{aligned}$$
Therefore, the 3 × 3 matrix [aij] is the matrix representation of f : R3 → R3 relative to the
standard basis {e1 , e2 , e3 }.
Exercise 6.2.15 For a function f : R3 → R2 , choose vectors for f (e1 ), f (e2 ), f (e3 ) and work
through the reasoning above to find the matrix representation of f . What are the dimensions
of this matrix?
Exercise 6.2.16 Similarly, for another function g : R2 → R3, choose vectors for g(e1), g(e2)
and work through the reasoning above to find the matrix representation of g. What are the
dimensions of this matrix?
To find f(X), we use the fact that there are always scalars $a_{ij}$ (i, j = 1, 2, ..., n) such that
$$f(e_1) = \begin{pmatrix} a_{11} \\ a_{21} \\ \vdots \\ a_{n1} \end{pmatrix} \quad f(e_2) = \begin{pmatrix} a_{12} \\ a_{22} \\ \vdots \\ a_{n2} \end{pmatrix} \quad \cdots \quad f(e_n) = \begin{pmatrix} a_{1n} \\ a_{2n} \\ \vdots \\ a_{nn} \end{pmatrix}$$
Then
$$\begin{aligned}
f\left(\begin{pmatrix} X_1 \\ X_2 \\ \vdots \\ X_n \end{pmatrix}\right) &= f(X_1 e_1 + X_2 e_2 + \cdots + X_n e_n) && \text{linear combination} \\
&= X_1 f(e_1) + X_2 f(e_2) + \cdots + X_n f(e_n) && \text{properties of linearity} \\
&= X_1\begin{pmatrix} a_{11} \\ a_{21} \\ \vdots \\ a_{n1} \end{pmatrix} + X_2\begin{pmatrix} a_{12} \\ a_{22} \\ \vdots \\ a_{n2} \end{pmatrix} + \cdots + X_n\begin{pmatrix} a_{1n} \\ a_{2n} \\ \vdots \\ a_{nn} \end{pmatrix} && \text{representation of } f(e_1), f(e_2), \ldots, f(e_n) \\
&= \begin{pmatrix} a_{11}X_1 + a_{12}X_2 + \cdots + a_{1n}X_n \\ a_{21}X_1 + a_{22}X_2 + \cdots + a_{2n}X_n \\ \vdots \\ a_{n1}X_1 + a_{n2}X_2 + \cdots + a_{nn}X_n \end{pmatrix} && \text{scalar multiplication and vector addition} \\
&= \begin{bmatrix} a_{11} & a_{12} & \ldots & a_{1n} \\ a_{21} & a_{22} & \ldots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \ldots & a_{nn} \end{bmatrix}\begin{pmatrix} X_1 \\ X_2 \\ \vdots \\ X_n \end{pmatrix}
\end{aligned}$$
(We will often write the matrix whose components are $a_{ij}$ as the matrix $[a_{ij}]$.)
We say that the n × n matrix [aij ] is the matrix representation of f : Rn → Rn relative
to the basis {e1 , e2 , . . . , en }.
Similar to the R2 → R2 and the R3 → R3 cases, the action of f : Rn → Rn on a vector in Rn
is found by applying the matrix representation of f to the vector, according to the rule shown
in Figure 6.2.
If f is a linear function from Rn to Rn , the columns of the matrix representing f are f (e1 ),
f (e2 ), . . . , f (en ).
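This recipe can be turned directly into code: given any linear function, we can recover its matrix by applying the function to each standard basis vector and using the results as columns. The function below is a hypothetical example (not one from the text), written deliberately without any matrix.

```python
import numpy as np

# A linear function R^3 -> R^3, written without a matrix.
# (This particular f is an arbitrary example.)
def f(v):
    x, y, z = v
    return np.array([2*x + z, -y, 3*x + 4*z])

n = 3
I = np.eye(n)  # the columns of the identity are e1, ..., en

# Column j of the matrix of f is f(e_j).
M = np.column_stack([f(I[:, j]) for j in range(n)])
print(M)

# M now reproduces f on any vector.
v = np.array([1.0, 2.0, 3.0])
print(np.allclose(M @ v, f(v)))   # True
```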
What all this abstract work buys us is the ability to say what a function does to any vector by knowing what it does to the standard basis vectors. For example, in the f : R2 → R2 case, it means that we can say what the function does to an infinity of possible vectors by knowing what it does to just two vectors, $\begin{pmatrix} 1 \\ 0 \end{pmatrix}$ and $\begin{pmatrix} 0 \\ 1 \end{pmatrix}$. This is powerful, and it will enable us to understand techniques for working with matrices instead of just memorizing them.
We will now develop an example of the use of matrices in biology that we will refer to
throughout this chapter.
Exercise 6.2.17 What are the matrices representing the following systems of equations?
a) XN+1 = 2XN + 6YN and YN+1 = 3XN + 8YN
b) XN+1 = −1.5XN and YN+1 = 6XN + YN
c) ZN+1 = 18ZN + 5WN and WN+1 = −7ZN + 2.2WN
d) aN+1 = −3aN and bN+1 = bN
e) aN+1 = −2bN and bN+1 = 4aN
Exercise 6.2.18 What systems of equations are represented by the following matrices? (You
can use X and Y as your variables.)
a) $\begin{bmatrix} 3 & 5 \\ 7 & 9 \end{bmatrix}$  b) $\begin{bmatrix} -2 & 3 \\ 1 & 2 \end{bmatrix}$  c) $\begin{bmatrix} 0 & 4 \\ -5 & 0 \end{bmatrix}$

d) $\begin{bmatrix} -1 & 0 \\ 0 & 2.5 \end{bmatrix}$  e) $\begin{bmatrix} 0 & 4 & 0 \\ -7 & 0 & 2 \\ 1 & 0 & 3 \end{bmatrix}$
Exercise 6.2.19 Use the method we used here to find the next year’s population if this year’s
population consists of 15 juveniles and 8 adults.
We can also use the rule for applying a matrix to a vector (Figure 6.1) to calculate the
populations of the two age groups in the following year:
$$\begin{pmatrix} J_1 \\ A_1 \end{pmatrix} = \begin{bmatrix} 0.65 & 0.5 \\ 0.25 & 0.9 \end{bmatrix}\begin{pmatrix} 100 \\ 50 \end{pmatrix} = \begin{pmatrix} 0.65 \times 100 + 0.5 \times 50 \\ 0.25 \times 100 + 0.9 \times 50 \end{pmatrix} = \begin{pmatrix} 90 \\ 70 \end{pmatrix}$$
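The same calculation can be done as a one-line matrix-vector product, using the population example from the text (100 juveniles, 50 adults):

```python
import numpy as np

# Matrix from the text: rows map (J, A) this year to (J, A) next year.
M = np.array([[0.65, 0.5],
              [0.25, 0.9]])

pop0 = np.array([100.0, 50.0])   # (juveniles, adults) this year
pop1 = M @ pop0                  # apply the matrix to the state vector

print(pop1)   # [90. 70.]
```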
a) $\begin{bmatrix} 3 & 2 \\ 4 & 1 \end{bmatrix}\begin{pmatrix} 10 \\ 10 \end{pmatrix}$  b) $\begin{bmatrix} 2 & 6 \\ 1 & 4 \end{bmatrix}\begin{pmatrix} 5 \\ 3 \end{pmatrix}$  c) $\begin{bmatrix} 4 & 0 & 1 \\ 3 & 2 & 1 \\ 1 & 4 & 2 \end{bmatrix}\begin{pmatrix} X \\ Y \\ Z \end{pmatrix}$
$$R^n \xrightarrow{\ g\ } R^k \xrightarrow{\ f\ } R^p$$
This is the general case, but in this text, we are mostly interested in the case $R^n \xrightarrow{\ g\ } R^n \xrightarrow{\ f\ } R^n$.
If f and g are linear functions, represented (in the standard basis {e1 , e2 , . . . , en }) by matrices
A and B, then their composition f ◦ g is also a linear function, which is therefore represented
by a matrix we will call C. As always, the columns of this matrix show what the function does
to the standard basis vectors. The first column is (f ◦ g)(e1 ), the second is (f ◦ g)(e2 ), and the
nth column is (f ◦ g)(en ).
How do we find the matrix of f ◦ g? We already know g(e1 ); it’s just the first column of B.
Now all we need to do is apply f to this vector, which we can do using the shortcut of applying
the matrix A to g(e1 ). Similarly, to find the second column of the matrix of f ◦ g, we apply the
matrix A to g(e2 ), which is the second column of B. Repeating this process, we generate the n
columns of the matrix that represents f ◦ g.
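The column-by-column procedure just described can be sketched as follows, using arbitrary example matrices for A and B: column j of the matrix of f ∘ g is A applied to column j of B, and stacking these columns reproduces the matrix product.

```python
import numpy as np

# Arbitrary example matrices: A represents f, B represents g.
A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
B = np.array([[0.0, 1.0],
              [5.0, -2.0]])

# Column j of the matrix of f∘g is A applied to g(e_j), i.e., to column j of B.
C = np.column_stack([A @ B[:, j] for j in range(B.shape[1])])
print(C)

# This is exactly the matrix product AB.
print(np.allclose(C, A @ B))   # True
```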
We can also develop this idea algebraically to calculate the matrix C = [cij ] from A and B.
Suppose A = [aij ] and B = [bij ]. If we take an arbitrary vector X in Rn , apply B to it, and then
apply A to the result, we get
$$\begin{aligned}
ABX &= \begin{bmatrix} a_{11} & a_{12} & \ldots & a_{1n} \\ a_{21} & a_{22} & \ldots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \ldots & a_{nn} \end{bmatrix}\begin{bmatrix} b_{11} & b_{12} & \ldots & b_{1n} \\ b_{21} & b_{22} & \ldots & b_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ b_{n1} & b_{n2} & \ldots & b_{nn} \end{bmatrix}\begin{pmatrix} X_1 \\ X_2 \\ \vdots \\ X_n \end{pmatrix} \\
&= \begin{bmatrix} a_{11} & a_{12} & \ldots & a_{1n} \\ a_{21} & a_{22} & \ldots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \ldots & a_{nn} \end{bmatrix}\begin{pmatrix} b_{11}X_1 + b_{12}X_2 + \cdots + b_{1n}X_n \\ b_{21}X_1 + b_{22}X_2 + \cdots + b_{2n}X_n \\ \vdots \\ b_{n1}X_1 + b_{n2}X_2 + \cdots + b_{nn}X_n \end{pmatrix} && \text{apply } B \text{ to } X \\
&= \begin{pmatrix} c_{11}X_1 + c_{12}X_2 + \cdots + c_{1n}X_n \\ c_{21}X_1 + c_{22}X_2 + \cdots + c_{2n}X_n \\ \vdots \\ c_{n1}X_1 + c_{n2}X_2 + \cdots + c_{nn}X_n \end{pmatrix} && \text{apply } A \text{ to } BX \\
&= \begin{bmatrix} c_{11} & c_{12} & \ldots & c_{1n} \\ c_{21} & c_{22} & \ldots & c_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ c_{n1} & c_{n2} & \ldots & c_{nn} \end{bmatrix}\begin{pmatrix} X_1 \\ X_2 \\ \vdots \\ X_n \end{pmatrix} = CX
\end{aligned}$$
where
$$c_{ij} = a_{i1}b_{1j} + a_{i2}b_{2j} + \cdots + a_{in}b_{nj} = \sum_{k=1}^{n} a_{ik}b_{kj}$$
We can think of this matrix multiplication graphically (Figure 6.3). To find cij , take row i of
matrix A and column j of matrix B, line the two up, and then multiply them componentwise,
adding up the results.
Matrix Multiplication
If a linear function f is represented by the matrix A and another linear function g is represented by the matrix B, then the composition f ◦ g is represented by the matrix product AB, so that (f ◦ g)(X) = ABX.
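The entry formula $c_{ij} = \sum_k a_{ik} b_{kj}$ can be implemented directly and checked against NumPy's built-in product. The matrices here are arbitrary examples:

```python
import numpy as np

# Direct implementation of c_ij = sum over k of a_ik * b_kj,
# for square n x n matrices as in the text.
def matmul(A, B):
    n = A.shape[0]
    C = np.zeros((n, n))
    for i in range(n):          # row i of A ...
        for j in range(n):      # ... lined up with column j of B
            C[i, j] = sum(A[i, k] * B[k, j] for k in range(n))
    return C

A = np.array([[2.0, 0.0], [1.0, 3.0]])
B = np.array([[4.0, 1.0], [2.0, 5.0]])
print(matmul(A, B))
print(np.allclose(matmul(A, B), A @ B))   # True
```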
Exercise 6.2.22 If the matrices A and B have the following dimensions, does AB exist?
a) A is a 5 × 2 matrix and B is a 2 × 3 matrix.
b) A is a 3 × 4 matrix and B is a 3 × 2 matrix.
c) A is a 138 × 7 matrix and B is a 7 × 26 matrix.
a) $\begin{bmatrix} 1 & 5 \\ 3 & 2 \end{bmatrix}\begin{bmatrix} 2 & -1 \\ 4 & 5 \end{bmatrix}$  b) $\begin{bmatrix} 2 & 3 \\ 3 & -1 \end{bmatrix}\begin{bmatrix} -2 & 4 \\ 1 & -3 \end{bmatrix}$  c) $\begin{bmatrix} 1 & 2 & 3 \\ 3 & 2 & -1 \end{bmatrix}\begin{bmatrix} 2 & 0 \\ -2 & 5 \\ 1 & -3 \end{bmatrix}$
Exercise 6.2.24 Verify that this calculation is correct by applying the good-year matrix M to
the initial condition, and then applying the bad-year matrix M bad to the result. How does your
result compare to the above calculation?
Exercise 6.2.26 What matrix product represents a sequence of two good years, followed by
two bad years, followed by a good year? Be careful about the order of multiplication.
Once we have the matrix representation of a function, we can then talk about what would
happen if we applied the function repeatedly to get the long-term behavior of the system. This
is our next topic.
3. You are making smoothies. (Be sure to justify your answers to the questions that follow.)
a) A smoothie recipe can be seen as a linear combination of ingredients. Explain why
this is true.
b) Is the cost to make a smoothie a linear function of the costs of the ingredients?
c) Is the caloric content of the smoothie a linear function of the caloric content of the
ingredients?
d) Iron is absorbed better in the presence of vitamin C. Is the amount of available iron
in your smoothie a linear function of the amount of available iron in the ingredients?
e) You get your friends to taste your creations. Is the number of friends who like a
smoothie likely to be a linear function of the number who like each ingredient?
f) Your smoothies are a hit and you decide to go into business. If you want to keep
prices simple, so that all smoothies of a given size cost the same, will your prices
be a linear function of the prices of the ingredients?
4. While going to a teaching assistant’s office hours, you get lost in the bowels of the School
of Engineering. You are walking through the Materials Science Department when you find
a strip of a material that looks like nothing you have ever seen before. You pocket it for
later examination. Back in your room, you decide to study how the material responds to
stretching and compression. Design an experiment to see whether its response to these
forces is linear.
5. You are studying how temperature affects the growth of your state flower in order to
predict the species’s response to climate change. You have a greenhouse and can grow
the plants at any temperature you want.
a) Suppose you call the average temperature at which the plants grow 0, so below-
average temperatures are negative and above-average ones are positive. Similarly,
below-average growth rates are negative and above-average ones are positive.
Design an experiment to test whether the response of change in growth rate to
change in temperature is linear.
b) What result do you expect this experiment to produce? Justify your answer.
9. Multiply:
a) $\begin{bmatrix} 2 & 3 \\ 1 & 2 \end{bmatrix}\begin{bmatrix} 3 & 5 \\ 2 & 0 \end{bmatrix}$  b) $\begin{bmatrix} 8 & 1 \end{bmatrix}\begin{pmatrix} 4 \\ 5 \end{pmatrix}$
c) $\begin{bmatrix} 6 & -2 & 7 \\ 1 & 0 & 2 \end{bmatrix}\begin{pmatrix} 1 \\ 3 \\ 4 \end{pmatrix}$  d) $\begin{bmatrix} 0 & 1 & 3 \\ -4 & 2 & 1 \\ 3 & 6 & -2 \end{bmatrix}\begin{pmatrix} 2 \\ -4 \\ 3 \end{pmatrix}$
10. Carry out the following matrix multiplications. For each problem, say what the function
represented by each matrix does to the standard basis vectors and what the product of
the two matrices would do to these vectors.
a) $\begin{bmatrix} 7 & 9 \\ 3 & 1 \end{bmatrix}\begin{bmatrix} 0 & 2 \\ 4 & 6 \end{bmatrix}$  b) $\begin{bmatrix} 5 & -4 \\ 2 & 0.5 \end{bmatrix}\begin{bmatrix} 3 & 4 \\ 2 & -1 \end{bmatrix}$  c) $\begin{bmatrix} -1 & -2 \\ 5 & 9 \end{bmatrix}\begin{bmatrix} 3 & 0 \\ 0 & 1 \end{bmatrix}$
11. Multiply:
a) $\begin{bmatrix} 7 & 8 \\ 4 & 5 \end{bmatrix}\begin{bmatrix} 3 & 2 \\ -2 & -3 \end{bmatrix}$  b) $\begin{bmatrix} 3 & 2 \\ 1 & 5 \end{bmatrix}\begin{bmatrix} 5 & 2 & -1 \\ 4 & 2 & 1 \end{bmatrix}$

c) $\begin{bmatrix} -2 & 1 \\ 0 & 3 \\ 4 & 6 \end{bmatrix}\begin{bmatrix} -6 & 3 & 7 \\ 9 & -4 & -5 \end{bmatrix}$  d) $\begin{bmatrix} 1 & 2 & 0 \\ 3 & 5 & 0 \\ 0 & 1 & -2 \end{bmatrix}\begin{bmatrix} 4 & 6 & -7 \\ -2 & 0 & 1 \\ -4 & 4 & 3 \end{bmatrix}$
12. What is the difference between multiplying a matrix times a vector and multiplying two
matrices?
a) Suppose
$$g\left(\begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}\right) = \begin{pmatrix} 5 \\ 7 \end{pmatrix}, \quad g\left(\begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix}\right) = \begin{pmatrix} 2 \\ 1 \end{pmatrix} \quad \text{and} \quad g\left(\begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}\right) = \begin{pmatrix} 3 \\ 4 \end{pmatrix}$$
Find the matrix of g.
b) Find the matrix of f ◦ g or explain in terms of functions why it does not exist.
c) Find the matrix of g ◦ f or explain in terms of functions why it does not exist.
The black bear population model developed in the previous section is an example of a Leslie
matrix. A Leslie matrix model of a population gives the rates at which individuals go from one
life stage to another. In this case, we have two life stages, juvenile and adult. The diagonal entries
give the fraction of the population that stays within the same life stage, while the off-diagonal
entry in the top row gives the birth rate of juveniles. The off-diagonal entry in the bottom row
is the transition rate from the juvenile stage to the adult stage. Therefore, in the model
$$M = \begin{bmatrix} 0.65 & 0.5 \\ 0.25 & 0.9 \end{bmatrix}$$
65% of juveniles remain juveniles and 90% of adults remain adults in any given year. Furthermore,
25% of juveniles in a given year mature into adults, and the average adult has 0.5 (female)
offspring.
Exercise 6.3.1 Come up with a Leslie matrix model for a fictional species with two life stages
and describe the meaning of its entries, as above.
Let’s look at the long-term behavior of this model. If we iterate M from an initial condition
of 10 juveniles and 50 adults for 15 times, we see that both juvenile and adult populations grow
with time (Figure 6.4, left). Notice that the trajectory consists of isolated points. This is because
a Leslie matrix is a discrete-time model. If we plot these points in J-A state space, we see that
after the first few values, all the points fall on a straight line passing through the origin, implying
that the ratio of juveniles to adults remains constant as the population grows (Figure 6.4, right).
Moreover, the distance between successive state points increases with time, meaning that the
population growth rate increases with population size.
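The iteration described above is easy to reproduce. This sketch starts from the same initial condition (10 juveniles, 50 adults) and prints the juvenile-to-adult ratio at each step; the ratio quickly settles to a constant even though both populations keep growing.

```python
import numpy as np

# Good-year black bear matrix from the text.
M = np.array([[0.65, 0.5],
              [0.25, 0.9]])

pop = np.array([10.0, 50.0])   # (J, A) initial condition

for n in range(15):
    pop = M @ pop
    # juveniles, adults, and the J/A ratio at year n+1
    print(n + 1, pop, pop[0] / pop[1])
```

After the first few years the printed ratio stops changing: the state points line up along a fixed direction through the origin, exactly as in Figure 6.4.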
Figure 6.4: Time series (left) and corresponding trajectory (right) produced by iterating the
matrix M, modeling the black bear population in a good year. Notice that both consist of
discrete points.
Now let’s consider a bad year, which, as we saw, is modeled by the matrix
$$M_{bad} = \begin{bmatrix} 0.5 & 0.4 \\ 0.1 & 0.8 \end{bmatrix}$$
Iterating this matrix, we see that both juvenile and adult populations go to zero with time
(Figure 6.5, left). However, this decline doesn’t initially affect both age groups in the same way.
The juvenile population grows for a time, while the adult population just shrinks. Of course,
this can’t go on forever, so after a few years, both populations enter long-term decline. (The
system’s behavior before it enters this long-term pattern is called a transient.)
Figure 6.5: Time series (left) and corresponding trajectory (right) produced by iterating the
matrix M bad , modeling the black bear population in a bad year.
Let’s consider another Leslie matrix for a two-stage population. Here we will consider a
situation in which 10% of juveniles remain juvenile, 40% become adults, and the rest die. The
birth rate is 1.4 offspring per adult, and only 20% of adults survive each year. This gives us a
matrix
$$M_{osc} = \begin{bmatrix} 0.1 & 1.4 \\ 0.4 & 0.2 \end{bmatrix}$$
If we iterate M osc , we see that both juvenile and adult populations approach the stable
equilibrium at (0, 0) in an oscillatory manner (Figure 6.6).
Figure 6.6: Time series (left) and corresponding trajectory (right) produced by iterating the
matrix M osc .
Neutral Equilibria
We will now consider an important class of models whose equilibria are not the isolated equilib-
rium points we have been seeing all along. In these models, called Markov processes, the final
equilibrium value depends on the initial condition, so there is an infinity of equilibrium points.
All of the models we have seen so far can be thought of as compartmental models. In a com-
partmental model, a large number of objects are transferred from one compartment to another,
according to rules. In the discrete-time version of compartmental modeling, these transfers take
place at discrete time points, 1, 2, 3, . . . , N.
In epidemiology, the study of infectious diseases, many models use compartments called
susceptibles (those who can become infected), and infecteds. We will represent these two pop-
ulations by S and I.
In epidemiology, linear models of disease transmission are used to predict whether a disease
will initially spread. Epidemiologists will make an estimate of the rate of “new cases per old case,”
the quantity called R0 (read “R-zero” or “R-nought”) and then model the epidemic as
IN+1 = R0 IN
where IN is the number of infected people at the Nth time point. If R0 > 1, the epidemic
spreads, while if R0 < 1, the epidemic will tend to die out.
In the more general case, we can write a simple compartmental model representing the trans-
fers from the susceptibles compartment S to the infecteds compartment I and vice versa.
(Diagram: compartments S and I, with arrows for S remaining S, S becoming I, I becoming S, and I remaining I.)
We will make the extremely strong assumption that at each time point, a constant fraction a
of the susceptibles become infected and a constant fraction b of the infecteds recover to become
susceptibles again. If a is the fraction of S that become I, then the fraction of S that remain
S must be 1 − a. If b is the fraction of I that become S, then the fraction of I that remain I
must be 1 − b. This gives us the following figure.
(Diagram: compartments S and I; a fraction a of S flows to I and a fraction b of I flows to S, while fractions 1 − a and 1 − b remain in S and I, respectively.)
Exercise 6.3.2 Explain why the entries in each column of a transition matrix such as equa-
tion (6.1) must add up to one. (Hint: Label the rows and columns, writing “from” and “to” where
appropriate.)
Exercise 6.3.3 Starting with 20 susceptible and 40 infected individuals, iterate M SI 15 times
in SageMath. What steady state does the system reach? Do the same for 50 susceptible and
60 infected individuals. How do your results compare to the simulations in Figure 6.7?
Figure 6.7: Time series from two simulations of the susceptible-infected model. Starting from
different initial conditions, the system converges to different equilibrium points.
Exercise 6.3.4 What is the behavior of the total population (S + I) over time?
Why does this susceptible–infected system behave so differently from the black bear Leslie
matrices we studied at the beginning of this section? One key difference is that Leslie matrices
involve births and deaths. A population modeled by a Leslie matrix model must grow or decline
unless the birth and death rates exactly balance. In this particular disease model, on the other
hand, individuals are just shuffled from one compartment to another, without any overall increase
or decrease in population size.
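This shuffling behavior is easy to see in a simulation. The transition rates a and b below are illustrative assumptions (the text's specific values for M_SI are not given here); what matters is that each column of the matrix sums to 1, so the total S + I never changes, while the equilibrium reached depends on the starting point.

```python
import numpy as np

# Illustrative (assumed) rates: fraction a of S becomes I, fraction b of I becomes S.
a, b = 0.3, 0.2
M_SI = np.array([[1 - a, b],
                 [a, 1 - b]])   # each column sums to 1

for start in ([20.0, 40.0], [50.0, 60.0]):
    x = np.array(start)         # (S, I)
    for _ in range(50):
        x = M_SI @ x
    print(start, "->", x, "total:", x.sum())
```

Both runs conserve their totals (60 and 110), but they converge to different equilibria; for a two-compartment model like this, the equilibrium splits the total in proportions b/(a + b) and a/(a + b).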
Neutral Oscillations
Our final example of a matrix model is one that gives neutral oscillations (Bodine et al. 2014). By
“neutral,” we mean that here, as in the previous example of neutral equilibria, the final outcome
depends on the initial condition, only here the final outcome is an oscillation. These “neutral
oscillations” are therefore a discrete-time analogue to the neutral oscillations we saw in the
frictionless spring and the shark–tuna models.
Locusts, which are important agricultural pests, have three stages in their life cycle: eggs (E),
hoppers (juveniles) (H), and adults (A). In a certain locust species, the egg and hopper stages
each last one year, with 2% of eggs surviving to become hoppers and 5% of hoppers surviving
to become adults. Adults lay 1000 eggs (as before, we are modeling only females) and then die.
From these principles, we can write a 3-variable linear equation
EN+1 = 0 · EN + 0 · HN + 1000AN
HN+1 = 0.02EN + 0 · HN + 0 · AN
AN+1 = 0 · EN + 0.05HN + 0 · AN
which gives rise to a 3 × 3 Leslie matrix:
$$L = \begin{bmatrix} 0 & 0 & 1000 \\ 0.02 & 0 & 0 \\ 0 & 0.05 & 0 \end{bmatrix}$$
Simulating the model by iterating L from an initial population of 50 eggs, 100 hoppers, and 50 adults results in oscillatory dynamics of the populations over time. Consider, for example, the
adult population (Figure 6.8, black dots). As you can see, the adult population oscillates with
no overall growth or decline.
Figure 6.8: Time series of adult populations from two simulations (black and green) of the locust
population model from two different initial conditions.
If we try a different initial condition, say 50 eggs, 20 hoppers, and 30 adults, we get a
different oscillation, also with no overall growth or decay, but with different values (Figure 6.8,
green dots).
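There is a neat reason for these neutral period-3 oscillations: the product of the transition rates around the E → H → A → E loop is 1000 × 0.02 × 0.05 = 1, so applying L three times returns every population to where it started. A quick check:

```python
import numpy as np

# Locust Leslie matrix from the text.
L = np.array([[0.0, 0.0, 1000.0],
              [0.02, 0.0, 0.0],
              [0.0, 0.05, 0.0]])

# L applied three times is (up to rounding) the 3 x 3 identity matrix.
print(np.linalg.matrix_power(L, 3))

# So any initial condition returns to itself after three years.
pop = np.array([50.0, 100.0, 50.0])   # (E, H, A)
print(L @ (L @ (L @ pop)))            # back to [50. 100. 50.] (up to rounding)
```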
Exercise 6.3.5 Simulate the discrete-time dynamical system described by the matrix L, and
plot all three populations.
Exercise 6.3.6 Calculate the total population E + H + A at each time point. How does it
change?
We have now seen the repertoire of long-term behaviors that linear models can exhibit: stable
and unstable equilibria, neutral equilibria, and neutral oscillations.
Figure 6.9: Simulated effects of different human colonization scenarios on moa populations.
Redrawn from “Rapid extinction of the moas (Aves: Dinornithiformes): model, test, and implica-
tions,” by R.N. Holdaway and C. Jacomb, 2000, Science 287(5461):2250–2254. Reprinted with
permission from AAAS.
Note from their simulations that even without habitat loss, the hunting pressure of just 100 humans, growing at 2.2% per year, was enough to drive the moa to extinction, albeit over a slightly longer time. Habitat loss made things worse, and with an initial population of 200 humans plus habitat loss, the decline was even more catastrophic.
The authors conclude that “Long-lived birds are very vulnerable to human predation of adults.”
Exercise 6.3.7 If a species is going extinct, what equilibrium is the population size approaching?
Is this equilibrium stable or unstable?
Matrix models are also helping to prevent sea turtles from going the way of the moa. Log-
gerhead sea turtles are an endangered species. Adult females build nests on beaches, lay eggs,
and leave. Hatchlings then go out to sea, where they grow into juveniles and then adults.
In the 1980s, sea turtle conservation efforts focused on protecting nests and hatchlings. Then
a group of ecologists decided to test whether such efforts, even if extremely successful, could
actually save the species from extinction (Crouse et al. 1987). They used field data to build a
matrix model consisting of seven life stages (eggs and hatchlings, small juveniles, large juveniles,
subadults, novice breeders, first-year remigrants, and mature breeders), and for each stage in
turn, they reduced mortality to zero. This is obviously impossible, but it’s the most basic test a
conservation strategy must pass. If eliminating all mortality in a life stage can’t save the species,
neither can merely reducing the mortality.
Simulations showed that if nothing was done, the population would decline. However, elimi-
nating all mortality in the eggs and hatchlings stage didn’t reverse the decline. To do so, it was
necessary to protect large juveniles and subadults. Since most preventable mortality at this stage
came from turtles getting caught in fishing and shrimping nets, mandating the installation of
turtle excluder devices that allow sea turtles to escape from nets is a much better strategy for
protecting the species. The United States currently requires the use of these devices, but some
countries in loggerhead habitat do not.
1. Giant pandas are a vulnerable species famous for their consumption of large amounts
of bamboo. Write a discrete-time matrix model of a giant panda population using the
following assumptions. We are modeling only the female population.
– Pandas have three life stages: cubs, subadults, and reproductively mature adults.
– Cubs remain cubs for only one year. They have a mortality rate of 17%.
– Pandas remain subadults for three years. Thus, about 33% of subadults mature into
adults each year.
– 28% of subadults die each year.
– On average, adults give birth to 0.5 female cubs each year.
– 97.7% of adults survive from one year to the next.
2. Nitrogen is a key element in all organisms. Use the following assumptions to set up a
matrix model of nitrogen flow in an ecosystem consisting of producers (P ), consumers
(C) and decomposers (D).
– 25% of the nitrogen in plants goes to consumers and 50% goes to decomposers.
– 75% of the nitrogen in consumers goes to decomposers.
– 5% of the nitrogen in decomposers goes to consumers, and 15% is lost from the
ecosystem. The rest goes to plants.
4. Black-lip oysters (Pinctada margaritifera) are born male, but may become female later
in life (a phenomenon known as protandrous hermaphroditism). We can therefore divide
their population into three life stages: juveniles (which are all male), adult males, and
adult females. Assume the following:
– Each year, about 9% of juveniles remain juveniles, 0.9% grow to become adult males,
and 0.1% grow into adult females. The rest die.
– Each year, about 4% of adult males become female, and about 10% of them die.
– About 10% of adult females die each year. Females never change back into males.
– Each female lays enough eggs to yield about 200 juveniles per year.
Write a discrete-time matrix model based on these assumptions.
The easiest way to make a 2D function is to take two 1D functions and join them together.
So if U = aX and V = dY, then we can make the function
$$\begin{pmatrix} U \\ V \end{pmatrix} = \begin{pmatrix} aX \\ dY \end{pmatrix}$$
This represents a very special case in which U depends only on X, and V depends only on Y .
In this special case, the function is represented by a diagonal matrix, which is a matrix whose
entries are all 0 except those on the descending diagonal:
$$\begin{pmatrix} U \\ V \end{pmatrix} = \begin{pmatrix} aX \\ dY \end{pmatrix} = \begin{bmatrix} a & 0 \\ 0 & d \end{bmatrix}\begin{pmatrix} X \\ Y \end{pmatrix}$$
In this case, it is easy to determine the action of function f : it acts like multiplication by a
along the X axis and like multiplication by d along the Y axis.
For example, consider a linear discrete-time dynamical system consisting of two species that
don’t interact with each other, such as sharks and rabbits. Let SN be the number of sharks
in the Nth year, and let RN be the number of rabbits in the Nth year. Because there is no
interaction, SN+1 is purely a function of SN , and RN+1 is purely a function of RN . If the shark
population grows at a rate a and the rabbit population grows at a rate d, then SN+1 = aSN and
RN+1 = dRN .
A diagonal matrix represents a function that can be decomposed into two 1-dimensional
functions along the axes X and Y. Diagonal matrices represent systems in which the variables
are noninteracting.
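A quick numerical sketch of this axis-by-axis action, with arbitrary example values a = 3 and d = 0.5:

```python
import numpy as np

# A diagonal matrix acts independently along each axis.
D = np.array([[3.0, 0.0],
              [0.0, 0.5]])

print(D @ np.array([1.0, 0.0]))   # [3. 0.]  -- the X axis is stretched by 3
print(D @ np.array([0.0, 1.0]))   # [0. 0.5] -- the Y axis is compressed by 0.5
print(D @ np.array([2.0, 4.0]))   # [6. 2.]  -- each coordinate is scaled separately
```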
Exercise 6.4.1 Consider the matrix $M = \begin{bmatrix} 2 & 0 \\ 0 & 3 \end{bmatrix}$.

a) Compute $Me_1$, $Me_2$, and $M\begin{pmatrix} 1 \\ 1 \end{pmatrix}$.

b) Draw $e_1$, $e_2$, $\begin{pmatrix} 1 \\ 1 \end{pmatrix}$, and the vectors you obtained in the first part of this problem.

c) Describe what M does to $e_1$, $e_2$, and $\begin{pmatrix} 1 \\ 1 \end{pmatrix}$.

d) What will M do to other vectors that lie along the X axis? The Y axis?

e) What will M do to vectors that do not lie along the axes?

Exercise 6.4.2 Repeat the previous exercise for $M = \begin{bmatrix} 0.5 & 0 \\ 0 & -2 \end{bmatrix}$.
Eigenvalues
Understanding the action of a diagonal matrix is easy. But what about the general case? The
typical matrix is not a diagonal matrix, so it is hard to guess what the action of the matrix looks
like. Since U is a function of both X and Y , and so is V , we cannot simply decompose f into
two 1D systems acting along the X and Y axes. We can’t just look at the X and Y axes and
stretch or compress the standard basis vectors.
But what if we could find two new axes? Specifically, what if we could find two vectors U
and V such that f is decomposable into two 1D systems acting along the U and V axes?
If two such axes did exist, then by definition, they would have to have the property that
MU = λ1 U and MV = λ2 V
for some real numbers λ1 and λ2 , which means that M would be acting along the vector U as
multiplication by λ1 , and acting along the vector V as multiplication by λ2 .
When this can be done, we call U and V the eigenvectors of M, and λ1 and λ2 are the
corresponding eigenvalues.
Exercise 6.4.3 One of the eigenvalues of the matrix M is 3, and a corresponding eigenvector is V = \begin{pmatrix} 2 \\ 1 \end{pmatrix}. Find MV.
6.4. Eigenvalues and Eigenvectors 301
Figure 6.10: The effect of applying the matrix M to the vector E (black arrow) is a new vector
that is E multiplied by a scalar λ.
and
\lambda E = \lambda \begin{pmatrix} X \\ Y \end{pmatrix} = \begin{pmatrix} \lambda X \\ \lambda Y \end{pmatrix}
Since ME = \lambda E,
\begin{pmatrix} aX + bY \\ cX + dY \end{pmatrix} = \begin{pmatrix} \lambda X \\ \lambda Y \end{pmatrix}
The first equation, aX + bY = \lambda X, gives us X in terms of Y:
X = \frac{bY}{\lambda - a}
We will now use that to substitute for X in the second equation,
cX + dY = \lambda Y
which gives us
c\,\frac{bY}{\lambda - a} + dY = \lambda Y
\implies \frac{cbY}{\lambda - a} = (\lambda - d)Y
\implies \frac{cb}{\lambda - a} = \lambda - d
\implies cb = (\lambda - a)(\lambda - d)
\implies cb = \lambda^2 - a\lambda - d\lambda + ad
This is a quadratic equation in λ, called the characteristic equation, which must be satisfied if
λ is a solution to equation (6.2).
We know how to solve quadratic equations. Using the quadratic formula, we get
\lambda = \frac{(a + d) \pm \sqrt{(a + d)^2 - 4(1)(ad - cb)}}{2(1)}
which can be simplified to
\lambda = \frac{(a + d) \pm \sqrt{(a + d)^2 - 4(ad - cb)}}{2} = (\lambda_1, \lambda_2)   (6.3)
We have found a very fundamental relationship. For every matrix M = \begin{pmatrix} a & b \\ c & d \end{pmatrix} there is a set of axes2 U, V such that MU = λ1 U and MV = λ2 V, and we have found λ1 and λ2 in terms of the coefficients a, b, c, and d. The quadratic formula gives us two values of λ (note the ± sign in the expression). These two values, which we call λ1 and λ2, are called the two eigenvalues of the matrix M.
For the matrix M = \begin{pmatrix} a & b \\ c & d \end{pmatrix}, the characteristic equation (or characteristic polynomial) for an eigenvalue λ in 2D is
\lambda^2 - (a + d)\lambda + (ad - cb) = 0
2 We will later see that these may not be axes in the usual sense, since they could involve complex numbers,
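Equation (6.3) is straightforward to put on a computer. The sketch below (a NumPy illustration, not from the text) computes λ1 and λ2 from the quadratic formula for the matrix with rows (1, 2) and (4, 3) that appears in the worked example below, and cross-checks the result against a general-purpose eigenvalue solver:

```python
import numpy as np

a, b, c, d = 1.0, 2.0, 4.0, 3.0          # entries of M = [[a, b], [c, d]]

# Characteristic equation: lambda^2 - (a + d)*lambda + (a*d - c*b) = 0
disc = (a + d) ** 2 - 4 * (a * d - c * b)
lam1 = ((a + d) + disc ** 0.5) / 2
lam2 = ((a + d) - disc ** 0.5) / 2

# Cross-check against NumPy's general eigenvalue routine
lams = np.linalg.eigvals(np.array([[a, b], [c, d]]))
```

For this matrix the formula gives λ1 = 5 and λ2 = −1, matching the solver.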
We have now found that there are two axes U, V such that the matrix acts like multiplication
by λ1 = 5 along U, and acts like multiplication by λ2 = −1 along V.
But we do not know what U and V are yet.
Exercise 6.4.4 Compute the eigenvalues of the following matrices: \begin{pmatrix} 3 & 5 \\ 2 & 4 \end{pmatrix} and \begin{pmatrix} 4 & 1 \\ 3 & 2 \end{pmatrix}.
Eigenvectors
We now need to find U and V. Let's say U = \begin{pmatrix} X \\ Y \end{pmatrix}. Since we said that M acts like multiplication by 5 along U, this means that
MU = \begin{pmatrix} 1 & 2 \\ 4 & 3 \end{pmatrix}\begin{pmatrix} X \\ Y \end{pmatrix} = \begin{pmatrix} X + 2Y \\ 4X + 3Y \end{pmatrix} = \lambda_1 U = 5\begin{pmatrix} X \\ Y \end{pmatrix} = \begin{pmatrix} 5X \\ 5Y \end{pmatrix}
So
X + 2Y = 5X \implies Y = 2X
4X + 3Y = 5Y \implies Y = 2X
Now Y = 2X is the equation for the line in (X, Y) space that has slope 2 and passes through the origin. This line is the axis U. We can choose any nonzero vector on the U axis to represent it, for example, the vector \begin{pmatrix} 1 \\ 2 \end{pmatrix}. This vector is then called an eigenvector of the matrix M corresponding to the eigenvalue λ1 = 5.
An eigenvector corresponding to the second eigenvalue λ2 = −1 can be found in a similar manner. Let's assume V = \begin{pmatrix} X \\ Y \end{pmatrix}. Then
MV = \begin{pmatrix} 1 & 2 \\ 4 & 3 \end{pmatrix}\begin{pmatrix} X \\ Y \end{pmatrix} = \begin{pmatrix} X + 2Y \\ 4X + 3Y \end{pmatrix} = \lambda_2 V = -1\begin{pmatrix} X \\ Y \end{pmatrix} = \begin{pmatrix} -X \\ -Y \end{pmatrix}
So
X + 2Y = -X \implies Y = -X
4X + 3Y = -Y \implies Y = -X
Y = −X is the equation for the line in (X, Y) space that has slope −1 and passes through the origin. This line is the axis V. As before, we can choose any nonzero vector on the V axis to represent it, for example, the vector \begin{pmatrix} 1 \\ -1 \end{pmatrix}, which is then called an eigenvector of the matrix M corresponding to the eigenvalue λ2 = −1.
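As a quick numerical check (a NumPy sketch, not part of the text), we can verify that M stretches U by a factor of 5 and flips V:

```python
import numpy as np

M = np.array([[1.0, 2.0],
              [4.0, 3.0]])
U = np.array([1.0, 2.0])    # eigenvector for lambda_1 = 5 (the line Y = 2X)
V = np.array([1.0, -1.0])   # eigenvector for lambda_2 = -1 (the line Y = -X)

MU = M @ U                  # should equal 5*U
MV = M @ V                  # should equal -1*V
```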
[Figure: the eigenvector axes U (λ1 = 5, the line Y = 2X) and V (λ2 = −1, the line Y = −X) in the (X, Y) plane.]
If a matrix M has two real eigenvalues λ1 and λ2 , this implies that M can be decomposed
using two new axes, U and V, such that M acts like multiplication by λ1 along U and like
multiplication by λ2 along V.
New coordinate systems. We can navigate in R2 using these two new axes. The standard basis
{e1 , e2 } is the most familiar coordinate system for R2 : to get to any point, go a certain distance
horizontally (parallel to e1 ) and a certain distance vertically (parallel to e2 ). The eigenvectors U
and V also form a coordinate system, and we can get to any point in R2 by moving a certain
distance in the U-direction and a certain distance in the V-direction.
We will now illustrate the process of navigating in R2 using two different coordinate sys-
tems. As our {X, Y} coordinate system, we will use the standard basis {e1 , e2 }. For the {U, V}
coordinate system, we will use the eigenvectors we just calculated:
{X, Y} = {e1, e2} = \left\{ \begin{pmatrix} 1 \\ 0 \end{pmatrix}, \begin{pmatrix} 0 \\ 1 \end{pmatrix} \right\} \qquad {U, V} = \left\{ \begin{pmatrix} 1 \\ 2 \end{pmatrix}, \begin{pmatrix} 1 \\ -1 \end{pmatrix} \right\}
Consider the point p represented in the standard {X, Y} coordinate system as p_{X,Y} = \begin{pmatrix} 3 \\ 0 \end{pmatrix}.
To navigate from the origin to p, go three units in the X direction, and zero units in the Y
direction (Figure 6.11, left).
Figure 6.11: Finding the coordinates of the point p in a new coordinate system {U, V}.
In order to navigate to p in the {U, V} coordinate system, suppose that the coordinates of p are c1 and c2. We have
p_{U,V} = c_1 U + c_2 V = c_1 \begin{pmatrix} 1 \\ 2 \end{pmatrix} + c_2 \begin{pmatrix} 1 \\ -1 \end{pmatrix} = \begin{pmatrix} c_1 \times 1 + c_2 \times 1 \\ c_1 \times 2 + c_2 \times (-1) \end{pmatrix} = \begin{pmatrix} 3 \\ 0 \end{pmatrix}
Solving this algebraically, we get
c_1 \times 1 + c_2 \times 1 = 3
c_1 \times 2 + c_2 \times (-1) = 0
\implies c_1 = 1, \; c_2 = 2
Therefore, to navigate from the origin to p in the {U, V} coordinate system, go one unit in
the U direction and two units in the V direction (Figure 6.11, right).
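Finding the {U, V} coordinates of a point is just solving a 2 × 2 linear system, which a computer does directly; here is a NumPy sketch of the calculation above:

```python
import numpy as np

U = np.array([1.0, 2.0])
V = np.array([1.0, -1.0])
p = np.array([3.0, 0.0])            # p in standard {X, Y} coordinates

# Solve c1*U + c2*V = p: put U and V in the columns of a matrix
B = np.column_stack([U, V])
c1, c2 = np.linalg.solve(B, p)      # coordinates of p in the {U, V} system
```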
Exercise 6.4.5 Find the eigenvectors of the matrices whose eigenvalues you found in Exercise
6.4.4 on page 303.
We will now use the ability to change coordinate systems to map the action of M.
In order to calculate the action of M, we need to locate the test point relative to the U and V axes (Figure 6.12). To do this, we need a way of transforming from the {X, Y} coordinate system to the new {U, V} coordinate system to get p_{U,V} = \begin{pmatrix} p_U \\ p_V \end{pmatrix}.
Figure 6.12: The coordinates of the point p in the {X, Y} and {U, V} coordinate systems.
Once we have the test point p represented in the {U, V} coordinate system, we then just multiply its components by the corresponding eigenvalues λ1 and λ2 (Figure 6.13). Here, the U-component is multiplied by λ1 = 5, and the V-component is multiplied by λ2 = −1. Thus, the image under M of the test point p_{U,V} = \begin{pmatrix} p_U \\ p_V \end{pmatrix} is the point
q_{U,V} = \begin{pmatrix} q_U \\ q_V \end{pmatrix} = \begin{pmatrix} \lambda_1 \cdot p_U \\ \lambda_2 \cdot p_V \end{pmatrix} = \begin{pmatrix} 5p_U \\ -p_V \end{pmatrix}
Figure 6.13: Using eigenvalues and corresponding eigenvectors to find the action of M on the
point p in the {U, V} coordinate system.
We now have the point q represented in the {U, V} coordinate system, that is, q_{U,V}. The final step is to transform the point q back into the original {X, Y} coordinate system to get q_{X,Y} = \begin{pmatrix} q_X \\ q_Y \end{pmatrix} (Figure 6.14).
Figure 6.14: Transforming the point q back into the original {X, Y} coordinate system.
These figures graphically illustrate the process of finding the new point using the {U, V}
coordinate system. Now, in order to actually calculate that point, we have to do it algebraically,
using the linear algebra of coordinate transforms.
Changing bases: coordinate transforms. In R2 , we have been using as our basis vectors the
standard basis
{X, Y} = \left\{ \begin{pmatrix} 1 \\ 0 \end{pmatrix}, \begin{pmatrix} 0 \\ 1 \end{pmatrix} \right\}
The key to calling this set of vectors a basis is that every vector p can be written in the {X, Y} coordinate system as
p_{X,Y} = \begin{pmatrix} p_X \\ p_Y \end{pmatrix} = p_X \begin{pmatrix} 1 \\ 0 \end{pmatrix} + p_Y \begin{pmatrix} 0 \\ 1 \end{pmatrix} = p_X\,X + p_Y\,Y
But the standard basis isn’t the only possible one. In fact, any two vectors that aren’t multiples
of each other can serve as a basis for R2 .
If we pick U and V as two such vectors, then every vector p that had coordinates \begin{pmatrix} p_X \\ p_Y \end{pmatrix} in the {X, Y} basis now has a new set of coordinates \begin{pmatrix} p_U \\ p_V \end{pmatrix} in the {U, V} basis. We want to find those new coordinates.
In general, there is always a matrix transform that will take the representation of a point
expressed in any basis in Rn to any other basis. Here we will illustrate this for the case in R2 in
which the two coordinate systems are {Z, W} and {U, V}.
Suppose we have a vector p and we know its coordinates in {Z, W} space as p{Z,W } . We
would like to know the vector p expressed in the {U, V} coordinate system, that is, p{U,V } . In
other words, we want to find the transformation matrix T such that p{U,V } = T p{Z,W } .
In order to find the transformation matrix T, the key is to express the "old" basis vectors {Z, W} in terms of the "new" basis vectors {U, V}. Assuming that there are a, b, c, d such that
Z = aU + bV
W = cU + dV
we can substitute for Z and W the corresponding expressions in U and V to get an expression for p in the {U, V} coordinates as
p_{Z,W} = p_Z\,Z + p_W\,W = p_Z(aU + bV) + p_W(cU + dV) = (a \cdot p_Z + c \cdot p_W)\,U + (b \cdot p_Z + d \cdot p_W)\,V = p_U\,U + p_V\,V
So
p_{U,V} = \begin{pmatrix} p_U \\ p_V \end{pmatrix} = \begin{pmatrix} a \cdot p_Z + c \cdot p_W \\ b \cdot p_Z + d \cdot p_W \end{pmatrix} = \begin{pmatrix} a & c \\ b & d \end{pmatrix} \begin{pmatrix} p_Z \\ p_W \end{pmatrix}
These are four linear equations in four unknowns. We can solve this problem by hand, or we
can use the computer algebra function of SageMath to do all the messy work. The result of this
[Figure: the components UX, UY and VX, VY of the basis vectors U and V in the (X, Y) plane.]
If the slope of U is equal to the slope of V, then U and V are multiples of each other, and
therefore they are not a basis for R2 .
Exercise 6.4.6 Show that under the condition UX VY − UY VX = 0, if U lies along the Y axis (UX = 0), then V has to lie along the Y axis as well (VX = 0), and vice versa, which contradicts our assumption that U and V serve as a basis for R2.
The action of M. We can now return to our problem of evaluating the action of M on the test
point p = (1, 0.5) in the {X, Y} coordinate system, that is,
p_{X,Y} = \begin{pmatrix} p_X \\ p_Y \end{pmatrix} = \begin{pmatrix} 1 \\ 0.5 \end{pmatrix}
using the eigenvalues and eigenvectors of M. Our first task is to find the test point p expressed
in the coordinate system of the eigenvectors U and V of the matrix M. This is a straightforward
application of the transformation matrix T we just developed.
Here the “old” coordinate system {Z, W} is
{Z, W} = {X, Y} = \left\{ \begin{pmatrix} 1 \\ 0 \end{pmatrix}, \begin{pmatrix} 0 \\ 1 \end{pmatrix} \right\}
and the "new" coordinate system is the system of eigenvectors U and V of the matrix M:
{U, V} = \left\{ \begin{pmatrix} 1 \\ 2 \end{pmatrix}, \begin{pmatrix} 1 \\ -1 \end{pmatrix} \right\}
The coordinate components we need to calculate T are
Z = \begin{pmatrix} Z_X \\ Z_Y \end{pmatrix} = \begin{pmatrix} 1 \\ 0 \end{pmatrix}, \quad W = \begin{pmatrix} W_X \\ W_Y \end{pmatrix} = \begin{pmatrix} 0 \\ 1 \end{pmatrix}, \quad U = \begin{pmatrix} U_X \\ U_Y \end{pmatrix} = \begin{pmatrix} 1 \\ 2 \end{pmatrix}, \quad V = \begin{pmatrix} V_X \\ V_Y \end{pmatrix} = \begin{pmatrix} 1 \\ -1 \end{pmatrix}
that is, ZX = 1, ZY = 0, WX = 0, WY = 1, UX = 1, UY = 2, VX = 1, VY = −1.
So the transformation matrix T from the "old" {X, Y} coordinate system to the "new" {U, V} coordinate system is
T = \begin{pmatrix} a & c \\ b & d \end{pmatrix} = \begin{pmatrix} 1/3 & 1/3 \\ 2/3 & -1/3 \end{pmatrix}
Then we can use this transformation matrix T to give us the test point p expressed in the {U, V} coordinate system, p_{U,V}, in terms of p_{X,Y}:
p_{U,V} = T\,p_{X,Y} = \begin{pmatrix} 1/3 & 1/3 \\ 2/3 & -1/3 \end{pmatrix} \begin{pmatrix} 1 \\ 0.5 \end{pmatrix} = \begin{pmatrix} 0.5 \\ 0.5 \end{pmatrix}
Therefore, our test point is
p_{U,V} = \begin{pmatrix} p_U \\ p_V \end{pmatrix} = \begin{pmatrix} 0.5 \\ 0.5 \end{pmatrix}, so p = p_U U + p_V V = 0.5U + 0.5V.
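In practice, T can be computed as the inverse of the matrix whose columns are U and V, since that matrix converts {U, V} coordinates into {X, Y} coordinates. A NumPy sketch matching the numbers above:

```python
import numpy as np

U = np.array([1.0, 2.0])
V = np.array([1.0, -1.0])

# The matrix with columns U, V maps {U,V}-coordinates to {X,Y}-coordinates,
# so its inverse is the transformation matrix T of the text
T = np.linalg.inv(np.column_stack([U, V]))

p_xy = np.array([1.0, 0.5])
p_uv = T @ p_xy                     # the text's result: (0.5, 0.5)
```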
Now that we have the point expressed in the eigenvector {U, V} coordinate system, we can
use the eigenvalues to calculate the action of the matrix. We said that the matrix M acts like multiplication by λ1 along its corresponding eigenvector U, and like multiplication by λ2 along its corresponding eigenvector V.
Therefore, in order to find the point, which we will call q, that results from the action of
the matrix M on the test point p, we simply multiply the U-component of p by λ1 and the
V-component of p by λ2 to find q{U,V } :
q_{U,V} = \begin{pmatrix} q_U \\ q_V \end{pmatrix} = D\,p_{U,V} = \begin{pmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{pmatrix} \begin{pmatrix} p_U \\ p_V \end{pmatrix} = \begin{pmatrix} \lambda_1 \cdot p_U \\ \lambda_2 \cdot p_V \end{pmatrix} = \begin{pmatrix} 5 \times 0.5 \\ -1 \times 0.5 \end{pmatrix} = \begin{pmatrix} 2.5 \\ -0.5 \end{pmatrix}
To confirm this and check our work, let’s calculate the action of M in the {X, Y} coordinate
system and then transform the result into the {U, V} coordinate system and see whether the
two calculations agree.
To transform q_{U,V} back, we also need the inverse transformation T^{-1}, which satisfies T^{-1} T = I. If we write
T^{-1} = \begin{pmatrix} c_1 & c_2 \\ c_3 & c_4 \end{pmatrix}
then we have
\begin{pmatrix} c_1 & c_2 \\ c_3 & c_4 \end{pmatrix} \begin{pmatrix} 1/3 & 1/3 \\ 2/3 & -1/3 \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}
which implies
c_1 \tfrac{1}{3} + c_2 \tfrac{2}{3} = 1
c_1 \tfrac{1}{3} - c_2 \tfrac{1}{3} = 0
c_3 \tfrac{1}{3} + c_4 \tfrac{2}{3} = 0
c_3 \tfrac{1}{3} - c_4 \tfrac{1}{3} = 1
\implies c_1 = 1, \; c_2 = 1, \; c_3 = 2, \; c_4 = -1
So
T^{-1} = \begin{pmatrix} 1 & 1 \\ 2 & -1 \end{pmatrix}
Consequently, we can go from p_{X,Y} to q_{X,Y} by transforming into the {U, V} system by T, applying D, and then transforming back into the {X, Y} coordinate system using T^{-1}:
q_{X,Y} = M\,p_{X,Y} = T^{-1} D\,T\,p_{X,Y}
In summary, we can evaluate the action of the matrix M on a point by applying the diagonal matrix D = \begin{pmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{pmatrix} in the {U, V} coordinate system:

              T^{-1}
  q_{X,Y}  <--------  q_{U,V}
     ^                   ^
     | M                 | D
     |                   |
  p_{X,Y}  -------->  p_{U,V}
               T

This may seem as though we are not saving much effort, because we also have to figure out T and T^{-1}. However, if M is a matrix representing a dynamical system, then we need to iterate M many times to simulate the dynamics. In this case, the advantage is clear: we need to calculate and apply T and T^{-1} only once, and the rest of the iteration process is simply applying the diagonal matrix D many times, which is easy:
\underbrace{M \cdots M}_{N}\, p_{X,Y} = T^{-1} \underbrace{D \cdots D}_{N}\, T\, p_{X,Y} = T^{-1} D^N T\, p_{X,Y}
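The saving is easy to demonstrate numerically. The sketch below (NumPy; N = 6 is an arbitrary choice) compares T^{-1} D^N T with direct repeated multiplication by M:

```python
import numpy as np

M = np.array([[1.0, 2.0],
              [4.0, 3.0]])
Tinv = np.array([[1.0, 1.0],        # columns of T^{-1} are the eigenvectors U, V
                 [2.0, -1.0]])
T = np.linalg.inv(Tinv)
D = np.diag([5.0, -1.0])            # eigenvalues of M on the diagonal

N = 6                               # arbitrary number of iterations
DN = np.diag(np.diag(D) ** N)       # powers of a diagonal matrix are cheap
MN_via_eigen = Tinv @ DN @ T

MN_direct = np.linalg.matrix_power(M, N)   # repeated multiplication, for comparison
```

The two results agree, but the diagonal route replaces N matrix multiplications with N scalar powers.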
But what can this mean? It certainly does not look good for our goal of decomposing the
matrix into two 1D multiplications.
In fact, the appearance of imaginary numbers is an infallible sign that we are dealing with a
type of motion that is indecomposable, namely, rotation.
The reason why complex numbers are associated with rotations can be made intuitive. Think
of a function f that has an eigenvalue λ = −1 along the eigenvector X. The action of f is to
flip the direction of any vector along this axis, for example, it would flip (1, 0) to (−1, 0); see
Figure 6.15.
Figure 6.15: The function f , whose eigenvalue is −1 along its eigenvector (which is the X axis)
flips a positive vector (left) to a negative one (right).
Now think about this function not as a flip, but as a rotation through 180◦ , say counter-
clockwise. And now let’s consider a rotation of 90◦ , say counterclockwise. What would be the
eigenvalue of this 90◦ rotation? It has the property that applying it twice has the effect of a flip,
that is, λ = −1. But as we saw earlier, if f(X) = λX, then the effect of applying f twice is
f(f(X)) = \lambda f(X) = \lambda(\lambda X) = \lambda^2 X
The 90-degree rotation applied twice is the 180◦ rotation. So if λ90◦ were the eigenvalue of
the 90◦ rotation, it would have to have the property that
(λ90◦ )2 = −1
That, of course, implies that λ90◦ is imaginary. The equation has two solutions,
λ90◦ = ±i
The two solutions +i and −i correspond to the counterclockwise and clockwise rotations
(Figure 6.16).
Figure 6.16: Left: the imaginary eigenvalues λ = ±i represent a 90 degree rotation, either
clockwise (λ = −i) or counterclockwise (λ = +i). Right: applying either rotation twice has the
effect of flipping the horizontal vector, that is, multiplying it by −1.
It makes sense that rotation could not have real eigenvalues, because two real eigenvalues
would mean that the function could be split into two 1D expansions and contractions. But
rotation is an action that is essentially two-dimensional, and therefore indecomposable.
Think about the rotation matrices that we discussed earlier. For example, the matrix
M = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}
represents counterclockwise rotation through the angle θ (Figure 6.17).
Figure 6.17: The effect of the rotation matrix M is to rotate the black vector counterclockwise
by θ, producing the red vector.
What are its eigenvalues? Plugging the matrix coefficients into the characteristic equation (equation (6.3) on page 302),
\lambda = \frac{(a + d) \pm \sqrt{(a + d)^2 - 4(ad - cb)}}{2}
we get
\lambda = \frac{2\cos\theta \pm \sqrt{(2\cos\theta)^2 - 4\big((\cos\theta)^2 - (-(\sin\theta)^2)\big)}}{2}
so
\lambda = \frac{2\cos\theta \pm \sqrt{4(\cos\theta)^2 - 4}}{2} = \cos\theta \pm \sqrt{(\cos\theta)^2 - 1}
= \cos\theta \pm \sqrt{-(\sin\theta)^2}
= \cos\theta \pm \sin\theta\,\sqrt{-1}
Therefore, the eigenvalues for this rotation matrix consist of a pair of complex conjugate values:
\lambda = \cos\theta \pm i\,\sin\theta
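A numerical eigenvalue solver confirms this conjugate pair for any angle; here is a NumPy sketch with θ = π/3 (an arbitrary choice):

```python
import numpy as np

theta = np.pi / 3                   # arbitrary rotation angle (60 degrees)
M = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

# Eigenvalues of a rotation: the complex conjugate pair cos(theta) +/- i*sin(theta)
lams = np.linalg.eigvals(M)
```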
Figure 6.18: Applying the matrix M to the point p three times brings it back to p.
Exercise 6.4.7 Show that M^3 = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}.

Exercise 6.4.8 Using the point p = \begin{pmatrix} 5 \\ 0 \end{pmatrix} as the test point, apply M three times to calculate Mp, M^2 p, and M^3 p.
Thus, we confirm that complex eigenvalues imply the existence of rotation. To put it another
way, what is an eigenvector? It’s a vector whose direction is unchanged by the action of M,
which merely stretches, contracts, and/or flips it. But obviously, in the action of a rotation, no
direction stays the same! So a rotation cannot have real eigenvalues or real eigenvectors.
So we can now give a definite answer to our question, are all matrices diagonalizable? The
answer is no. Instead there is a weaker condition that is true: every 2D matrix is either
(1) diagonalizable, which means that it has two real eigenvalues, or
(2) a rotation (possibly together with expansion and/or contraction), which means that it has
a pair of complex conjugate eigenvalues.
Eigenvalues in n Dimensions
We have focused so far on 2D linear functions f : R2 → R2 and used the variables X and Y to
describe the domain and U and V to describe the codomain.
Now we want to study the n-dimensional case, and we will need a new terminology for the
variables. We want to consider an n-dimensional linear function
f : Rn −→ Rn
We will call the domain variables X = (X1 , X2 , . . . , Xn ) and the codomain variables Y =
(Y1 , Y2 , . . . , Yn ), so
f (X) = Y
f (X1 , X2 , . . . , Xn ) = (Y1 , Y2 , . . . , Yn )
From the definition of linear function, we know that there are constants
a11, a12, . . . , a1n, a21, a22, . . . , a2n, . . . , an1, an2, . . . , ann
such that
Y1 = a11 X1 + a12 X2 + · · · + a1n Xn
Y2 = a21 X1 + a22 X2 + · · · + a2n Xn
⋮
Yn = an1 X1 + an2 X2 + · · · + ann Xn
so that f is represented by the matrix
\begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{pmatrix}
The application of f to the vector X is then represented by
\begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{pmatrix} \begin{pmatrix} X_1 \\ X_2 \\ \vdots \\ X_n \end{pmatrix} = \begin{pmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{pmatrix}
Do n-dimensional linear functions have eigenvalues and eigenvectors? The answer is that the
n-dimensional case is remarkably like the 2-dimensional case. We will need some theorems and
principles from a linear algebra course or text. We will state them here as we need them; the
interested reader is encouraged to look them up for fuller treatment.
The first question is, can we find eigenvalues? Recall that in 2D, we wrote down the equation
ME = λE
where M is the matrix in question and λ and E are the desired eigenvalue and corresponding
eigenvector. In 2D, we wrote this matrix as
M = \begin{pmatrix} a & b \\ c & d \end{pmatrix}
We then brute-force solved the linear equations and got the characteristic polynomial
\lambda^2 - (a + d)\lambda + (ad - cb) = 0
In order to generalize this process to n dimensions, we have to go back and restate our
argument in more general language. We were looking for eigenvectors and eigenvalues by trying
to solve
ME = λE
2. If f : R4 → R4 is a linear function and −2 is an eigenvalue of f with corresponding eigenvector v = \begin{pmatrix} 3 \\ 1 \\ -3 \\ -7 \end{pmatrix}, what is f(v)?

3. The matrix A = \begin{pmatrix} -7 & 3 \\ -18 & 8 \end{pmatrix} has an eigenvector \begin{pmatrix} 1 \\ 3 \end{pmatrix}. What is its corresponding eigenvalue?

4. The matrix A = \begin{pmatrix} 2 & -5 & -4 \\ 0 & 3 & 2 \\ 0 & -4 & -3 \end{pmatrix} has an eigenvector \begin{pmatrix} 2 \\ -2 \\ 4 \end{pmatrix}. What is its corresponding eigenvalue?
5. Which of the following are eigenvectors of \begin{pmatrix} 7 & -5 \\ 10 & -8 \end{pmatrix}? What are their corresponding eigenvalues?
a) \begin{pmatrix} 2 \\ 3 \end{pmatrix}  b) \begin{pmatrix} 2 \\ 4 \end{pmatrix}  c) \begin{pmatrix} -1 \\ 2 \end{pmatrix}  d) \begin{pmatrix} -2 \\ -2 \end{pmatrix}

6. Compute the eigenvalues and, if they exist, eigenvectors of the following matrices:
a) \begin{pmatrix} 7 & 9 \\ 3 & 1 \end{pmatrix}  b) \begin{pmatrix} 0 & 2 \\ 4 & 6 \end{pmatrix}  c) \begin{pmatrix} 5 & -4 \\ 2 & 0.5 \end{pmatrix}  d) \begin{pmatrix} 3 & 4 \\ 2 & -1 \end{pmatrix}  e) \begin{pmatrix} -1 & -2 \\ 5 & 9 \end{pmatrix}  f) \begin{pmatrix} 3 & 0 \\ 0 & 1 \end{pmatrix}
(Hint: You will probably want to rewrite this as a system of equations and “solve
simultaneously.”)
b) Explain what your answer to part (a) means about the coordinates of \begin{pmatrix} 9 \\ 14 \end{pmatrix} in some nonstandard coordinate system. (Hint: Which one?)

c) Suppose that f : R2 → R2 is a linear function and its eigenvectors are as follows: \begin{pmatrix} 2 \\ 5 \end{pmatrix} with eigenvalue 2, and \begin{pmatrix} -3 \\ 1 \end{pmatrix} with eigenvalue −3. What is f\begin{pmatrix} 2 \\ 5 \end{pmatrix}? What is f\begin{pmatrix} -3 \\ 1 \end{pmatrix}?

d) Continuing from part (c), what is f\begin{pmatrix} 9 \\ 14 \end{pmatrix}? (Hint: Use your answers to parts (a) and (c) and the two defining properties of a linear function.)
But if XN+1 = kXN, then this would imply kXN = XN. If XN ≠ 0, then k must equal 1.
And k = 1 is a very special value that is atypical and to be avoided; note that if k = 1, every
point is an equilibrium point. As we saw in our discussion of discrete-time dynamical systems
(“Discrete-Time Dynamical Systems” in Chapter 5 on page 225), the fact that f (X) = kX can
be zero only when X = 0 means that the discrete-time system XN+1 = kXN has exactly one
equilibrium point, at X = 0. As we saw, this equilibrium point is stable if |k| < 1, and unstable
if |k| > 1.
In an uncoupled system, the next value of X depends only on the previous value of X, and the next value of Y depends only on the previous value of Y:
X_{N+1} = \alpha X_N, \quad Y_{N+1} = \beta Y_N \implies \begin{pmatrix} X_{N+1} \\ Y_{N+1} \end{pmatrix} = \begin{pmatrix} \alpha X_N \\ \beta Y_N \end{pmatrix}
Figure 6.19: Repeated applications of a matrix will result in a trajectory that lies along the
direction of the dominant eigenvector. Here both populations are growing.
We can also have declining populations. If one population is growing at 40% a year and the
other is declining at 20% a year, the matrix describing the system is
\begin{pmatrix} \alpha & 0 \\ 0 & \beta \end{pmatrix} = \begin{pmatrix} 1.4 & 0 \\ 0 & 0.8 \end{pmatrix}
If we iterate this matrix repeatedly, we see that there is growth in the X direction and
shrinking in the Y direction, and once again, the growth dynamics are eventually dominated by
the dimension with the larger growth (Figure 6.20).
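The figure's numbers can be reproduced by direct iteration; a NumPy sketch:

```python
import numpy as np

M = np.array([[1.4, 0.0],
              [0.0, 0.8]])
state = np.array([50.0, 50.0])      # (X_0, Y_0)

for _ in range(8):                  # iterate the system for 8 years
    state = M @ state

# X grows like 1.4^N while Y shrinks like 0.8^N, so the trajectory
# flattens onto the X axis, the direction of the dominant eigenvector
```

After eight iterations the state is approximately (738, 8), matching Figure 6.20.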
[Figure 6.20 plots the trajectory from (X0, Y0) = (50, 50), which reaches (X8, Y8) ≈ (738, 8) after eight iterations.]
Figure 6.20: When one population is declining, the long-term trajectory still lies along the direc-
tion of the dominant eigenvector.
Uncoupled systems are therefore easy to analyze, because the behavior of each variable can
be studied separately and the system then reassembled. Each variable is growing or shrinking
exponentially, and the overall system behavior is just a combination of the behaviors of the
variables making it up. (For simplicity, we use the word “grow” from now on to mean either
positive or negative growth.)
Exercise 6.5.1 If there are two noninteracting populations, one of which is growing at 20% a
year and the other at 25% a year, derive the matrix that describes the dynamics of the system
and simulate a trajectory of this system.
Exercise 6.5.2 If one population is growing at 20% a year and the other is declining at 10% a year, what would be the matrix that describes this system? Draw a trajectory of this system.
Exercise 6.5.3 For the exercise above, plot time series graphs for each population separately
to show that it is undergoing exponential growth or decline.
To understand this long-term behavior better, we can examine geometrically how a system’s
state vector is transformed by a matrix. Let’s use the matrix
\begin{pmatrix} 1.4 & 0 \\ 0 & 0.8 \end{pmatrix}
and apply it to three test vectors \begin{pmatrix} 50 \\ 0 \end{pmatrix}, \begin{pmatrix} 50 \\ 50 \end{pmatrix}, and \begin{pmatrix} 0 \\ 50 \end{pmatrix}. We get
\begin{pmatrix} 1.4 & 0 \\ 0 & 0.8 \end{pmatrix}\begin{pmatrix} 50 \\ 0 \end{pmatrix} = \begin{pmatrix} 70 \\ 0 \end{pmatrix} \qquad \begin{pmatrix} 1.4 & 0 \\ 0 & 0.8 \end{pmatrix}\begin{pmatrix} 50 \\ 50 \end{pmatrix} = \begin{pmatrix} 70 \\ 40 \end{pmatrix} \qquad \begin{pmatrix} 1.4 & 0 \\ 0 & 0.8 \end{pmatrix}\begin{pmatrix} 0 \\ 50 \end{pmatrix} = \begin{pmatrix} 0 \\ 40 \end{pmatrix}
If we plot these (Figure 6.21), we see that if a vector is along the X or Y axis, it just grows
or shrinks when multiplied by the matrix. However, a vector in general position is rotated in
addition to growing.
There is one more case we have to deal with. So far, all the entries in our matrices have
been positive real numbers. We have been thinking of examples in population dynamics, and the
only multipliers that make sense in population dynamics are positive real numbers. Suppose, for
example, that one of the matrix entries was negative. Then when we applied the matrix to a
vector of populations, one of the populations would become negative, which makes no sense in
the real world. But in general, state variables can take on any values, positive or negative, and
in these cases, negative multipliers make sense.
6.5. Linear Discrete-Time Dynamics 323
Figure 6.21: Black colors denote the three test vectors. Red colors denote the vectors that result
after applying the matrix to these test vectors.
[Figure 6.22 plots the trajectory from (X0, Y0) = (50, 50).]
Figure 6.22: When the dominant eigenvalue is negative, repeated applications of the matrix still
result in a trajectory that lies along the dominant eigenvector (here the X axis), while flipping
back and forth between positive and negative X values.
Note that as the number of iterations grows, the trajectory grows flatter and flatter, and it
clings more and more to the X axis. Thus the long-term behavior of this system will be dominated
by the changes in X, because | − 1.4| > |0.8| and −1.4 is the eigenvalue in the X direction.
For every matrix, let’s define its principal eigenvector as the eigenvector whose eigenvalue
has the largest absolute value. (Since these matrices are diagonal, their eigenvalues are simply
the matrix entries on the main diagonal, and the corresponding eigenvectors are the X and Y
axes.)
We can now make a general statement, which is illustrated by all three examples: the long-
term behavior of an iterated matrix dynamical system is dominated by the principal eigenvalue,
and the state point will evolve until its motion lies along the principal eigenvector.
We can now summarize the behavior of 2D decoupled linear discrete-time systems. These are
the systems represented by the matrix
\begin{pmatrix} \alpha & 0 \\ 0 & \beta \end{pmatrix}
They have a unique equilibrium point at (0, 0), and the stability of that equilibrium point is
determined by the absolute value of α and β:
• If |α| > 1 and |β| > 1, then the equilibrium point is purely unstable.
• If |α| < 1 and |β| < 1, then the equilibrium point is purely stable.
• If |α| < 1 and |β| > 1 (or the reverse, |α| > 1 and |β| < 1), then the equilibrium point is
an unstable saddle point.
Moreover, the signs of α and β determine whether the state point oscillates on its way toward
or away from the equilibrium point.
• If α < 0, there is oscillation along the X axis.
• If β < 0, there is oscillation along the Y axis.
• If α > 0, there is no oscillation along the X axis.
• If β > 0, there is no oscillation along the Y axis.
Exercise 6.5.4 By determining the absolute value and the signs of α and β, predict the long-
term behavior of the four discrete dynamical systems described by the following matrices:
a) \begin{pmatrix} -2 & 0 \\ 0 & 0.5 \end{pmatrix}  b) \begin{pmatrix} 1.3 & 0 \\ 0 & 0.6 \end{pmatrix}  c) \begin{pmatrix} -0.2 & 0 \\ 0 & 0.8 \end{pmatrix}  d) \begin{pmatrix} 0.5 & 0 \\ 0 & 0.8 \end{pmatrix}
and then verify this prediction by iterating the matrix to simulate the dynamical systems.
In the more general case, of course, X and Y are coupled: the next X value depends on both
the previous X value and the previous Y value, and so does the next Y value. This gives us a
matrix equation:
X_{N+1} = aX_N + bY_N, \quad Y_{N+1} = cX_N + dY_N \implies \begin{pmatrix} X_{N+1} \\ Y_{N+1} \end{pmatrix} = \begin{pmatrix} aX_N + bY_N \\ cX_N + dY_N \end{pmatrix}
We previously saw a model of black bear populations in which the juvenile and adult populations
in the (N + 1)st year were given as a linear function of the populations in the Nth year:
J_{N+1} = 0.65J_N + 0.5A_N, \quad A_{N+1} = 0.25J_N + 0.9A_N \implies \begin{pmatrix} J_{N+1} \\ A_{N+1} \end{pmatrix} = \begin{pmatrix} 0.65J_N + 0.5A_N \\ 0.25J_N + 0.9A_N \end{pmatrix}
where 0.65 is the fraction of juveniles who remain alive as juveniles in the next year, and 0.25
is the fraction of juveniles who mature into adults that year. Furthermore, 0.5 is the birth rate
with which adults give birth to juveniles, and 0.9 is the fraction of adults who survive into the
next year.
The matrix form is
\begin{pmatrix} J_{N+1} \\ A_{N+1} \end{pmatrix} = \begin{pmatrix} 0.65 & 0.5 \\ 0.25 & 0.9 \end{pmatrix} \begin{pmatrix} J_N \\ A_N \end{pmatrix}
We saw that if we iterated M repeatedly, the juvenile and adult populations went to infinity
(Figure 6.4 on page 292). We will now explain why that is the case by looking at the eigenvalues
and corresponding eigenvectors of M.
First, let’s find the eigenvalues for the matrix
M = \begin{pmatrix} 0.65 & 0.5 \\ 0.25 & 0.9 \end{pmatrix}
by plugging the matrix coefficients into the characteristic equation (equation (6.3) on page 302):
\lambda = \frac{(0.65 + 0.9) \pm \sqrt{(0.65 + 0.9)^2 - 4(0.65 \times 0.9 - 0.25 \times 0.5)}}{2}
= \frac{1.55 \pm \sqrt{0.5625}}{2} = \frac{1.55 \pm 0.75}{2} = (1.15, 0.4)
Therefore, the two eigenvalues are
λ1 = 1.15 and λ2 = 0.4
Note that these are real numbers and that |λ1 | > 1 and |λ2 | < 1. Therefore, the behavior
must have one stable direction and one unstable direction. In other words, it must be a saddle
point.
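We can confirm these eigenvalues numerically; a NumPy sketch:

```python
import numpy as np

M = np.array([[0.65, 0.5],
              [0.25, 0.9]])
lams = np.linalg.eigvals(M)
lam1, lam2 = sorted(lams, reverse=True)   # 1.15 and 0.4

# |lam1| > 1: unstable direction; |lam2| < 1: stable direction -> a saddle
```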
To find the axes of the saddle point, we will calculate the eigenvectors U and V corresponding to each eigenvalue. Let's say that U = \begin{pmatrix} J \\ A \end{pmatrix}. The matrix M acts like multiplication by λ1 along U, which means that
MU = \lambda_1 U
So we can say
MU = \begin{pmatrix} 0.65 & 0.5 \\ 0.25 & 0.9 \end{pmatrix}\begin{pmatrix} J \\ A \end{pmatrix} = \begin{pmatrix} 0.65J + 0.5A \\ 0.25J + 0.9A \end{pmatrix} = \lambda_1 U = 1.15\begin{pmatrix} J \\ A \end{pmatrix} = \begin{pmatrix} 1.15J \\ 1.15A \end{pmatrix}
So
0.65J + 0.5A = 1.15J \implies A = J
0.25J + 0.9A = 1.15A \implies A = J
Now A = J is the equation for a line in (J, A) space that has slope +1. This line is the axis U. We can choose any vector on the U axis to represent it, for example the vector \begin{pmatrix} 1 \\ 1 \end{pmatrix}, which is then an eigenvector of the matrix M corresponding to the eigenvalue λ1 = 1.15.
The eigenvector corresponding to the second eigenvalue λ2 = 0.4 can be found in a similar
manner. It satisfies
MV = λ2 V
Let's assume V = \begin{pmatrix} J \\ A \end{pmatrix}. Then
MV = \begin{pmatrix} 0.65 & 0.5 \\ 0.25 & 0.9 \end{pmatrix}\begin{pmatrix} J \\ A \end{pmatrix} = \begin{pmatrix} 0.65J + 0.5A \\ 0.25J + 0.9A \end{pmatrix} = \lambda_2 V = 0.4\begin{pmatrix} J \\ A \end{pmatrix} = \begin{pmatrix} 0.4J \\ 0.4A \end{pmatrix}
So
0.65J + 0.5A = 0.4J =⇒ A = −0.5J
0.25J + 0.9A = 0.4A =⇒ A = −0.5J
The equation A = −0.5J is the equation for a line in (J, A) space that has slope −0.5. This line is the axis V. We can choose any vector on the V axis to represent it, for example the vector \begin{pmatrix} -2 \\ 1 \end{pmatrix}, which is then an eigenvector of the matrix M corresponding to the eigenvalue λ2 = 0.4.
If we plot these eigenvectors and choose a point, let’s say
\begin{pmatrix} J_0 \\ A_0 \end{pmatrix} = \begin{pmatrix} 10 \\ 50 \end{pmatrix}
as our initial condition and apply the matrix on this vector once, we get
\begin{pmatrix} J_1 \\ A_1 \end{pmatrix} = \begin{pmatrix} 0.65 & 0.5 \\ 0.25 & 0.9 \end{pmatrix}\begin{pmatrix} 10 \\ 50 \end{pmatrix} = \begin{pmatrix} 0.65 \times 10 + 0.5 \times 50 \\ 0.25 \times 10 + 0.9 \times 50 \end{pmatrix} = \begin{pmatrix} 31.5 \\ 47.5 \end{pmatrix}
We see that the action of this matrix is to push the state point closer to the U axis while
moving away from the V axis. Thus, for this initial condition, the action of the matrix is to
increase the number of juveniles and decrease the number of adults in the first year (Figure 6.23).
Figure 6.23: One application of the matrix M to the point (J0, A0) takes it to (J1, A1), which is closer to the dominant eigenvector U axis and farther from the V axis.
If we iterate the matrix many times from two different initial conditions, we see that successive
points march toward the U axis and out along it. Since the U axis is the line A = J, we can say
that the populations of the two age groups approach a 1 : 1 ratio, while the whole population
grows larger and larger (Figure 6.24).
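This convergence to the 1 : 1 ratio is easy to reproduce by iteration; a NumPy sketch (the 40 iterations are an arbitrary choice, ample for convergence):

```python
import numpy as np

M = np.array([[0.65, 0.5],
              [0.25, 0.9]])
state = np.array([10.0, 50.0])      # (J_0, A_0)

for _ in range(40):                 # enough iterations for the ratio to settle
    state = M @ state

ratio = state[1] / state[0]         # adults per juvenile; approaches 1
```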
[Figure 6.24 shows two trajectories, from (J0, A0) = (10, 50) and from (J0, A0) = (100, 20).]
Figure 6.24: After repeated iterations of the matrix M, the long-term trajectory lies along the
direction of the dominant eigenvector U axis, regardless of the initial conditions. Eventually, the
ratio of adults to juveniles approaches a constant value.
Finally, our theoretical prediction of “saddle point” can be confirmed by applying the matrix
repeatedly to a set of initial conditions lying on a circle. In this way, we can construct a graphical
picture of the action of M. We see that the action is to squeeze along the V axis and expand
along the U axis (Figure 6.25).
Notice an interesting feature of Figure 6.25. We started with a circle of initial conditions, but
by the fifth iteration, the original circle had flattened nearly into a line, and that line was lying
along the principal eigenvector.
Figure 6.25: One application of the J-A matrix to a circle of initial conditions (black dots)
transforms them into an oval (dark gray dots). Applying the matrix for the second time, it
flattens the oval even further and expands it along the U axis (light gray dots). By the fifth
iteration (red dots), the initial circle has been transformed into a line lying along the principal
eigenvector and expanding in that direction.
Let’s consider the alternative scenario for the black bear, in a bad year.
We modeled “bad year” by lowering the birth rate from 0.5 to 0.4, and increasing the death
rate for juveniles to 40%, with 50% of them remaining juvenile and only 10% maturing to adults.
The juvenile population dynamics are
$$J_{N+1} = 0.5 J_N + 0.4 A_N$$
We also increased the adult death rate to 20%, so the survival rate is $1 - 20\% = 80\%$. The adult population dynamics are therefore
$$A_{N+1} = 0.1 J_N + 0.8 A_N$$
Putting these together, we get
$$\begin{pmatrix} J_{N+1} \\ A_{N+1} \end{pmatrix} = \begin{pmatrix} 0.5 J_N + 0.4 A_N \\ 0.1 J_N + 0.8 A_N \end{pmatrix}$$
The matrix that describes the "bad year" dynamics is
$$M_{\text{bad}} = \begin{pmatrix} 0.5 & 0.4 \\ 0.1 & 0.8 \end{pmatrix}$$
Recall that when we iterated M bad repeatedly, both juvenile and adult populations appeared
to go to extinction (Figure 6.5 on page 292). We can explain this long-term behavior by studying
the eigenvalues and corresponding eigenvectors of M bad .
What are the dynamics of this system? First, let's find the eigenvalues for the matrix by plugging the matrix coefficients into the characteristic equation
$$\lambda = \frac{(a + d) \pm \sqrt{(a + d)^2 - 4(ad - cb)}}{2}$$
We get
$$\lambda = \frac{(0.5 + 0.8) \pm \sqrt{(0.5 + 0.8)^2 - 4(0.5 \times 0.8 - 0.1 \times 0.4)}}{2} = \frac{1.3 \pm \sqrt{0.25}}{2} = \frac{1.3 \pm 0.5}{2} = (0.9, 0.4)$$
Therefore, the two eigenvalues are
$$\lambda_1 = 0.9 \quad \text{and} \quad \lambda_2 = 0.4$$
Note that these are real numbers and that both $|\lambda_1| < 1$ and $|\lambda_2| < 1$. Therefore, the behavior must have two stable directions; in other words, the equilibrium must be a purely stable node.
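The characteristic-equation computation can be replicated in a few lines. This sketch (plain Python; the helper name is ours) applies the quadratic formula to the trace and determinant of a 2 × 2 matrix and recovers the eigenvalues of M bad:

```python
import math

def eigenvalues_2x2(a, b, c, d):
    """Real eigenvalues of [[a, b], [c, d]] via the characteristic equation
    lambda = ((a + d) +/- sqrt((a + d)**2 - 4*(a*d - c*b))) / 2."""
    tr = a + d               # trace
    det = a * d - c * b      # determinant
    disc = math.sqrt(tr * tr - 4.0 * det)
    return (tr + disc) / 2.0, (tr - disc) / 2.0

print(eigenvalues_2x2(0.5, 0.4, 0.1, 0.8))   # M_bad: approximately (0.9, 0.4)
```

The same helper works for any 2 × 2 matrix with real eigenvalues; for complex eigenvalues the discriminant goes negative and `math.sqrt` would fail.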
To find the axes of the node, we will calculate the eigenvectors $U$ and $V$ corresponding to each eigenvalue. Let's say $U = \begin{pmatrix} J \\ A \end{pmatrix}$. The matrix $M_{\text{bad}}$ acts like multiplication by $\lambda_1$ along $U$, which means that
$$M_{\text{bad}}\, U = \lambda_1 U$$
So we can say
$$M_{\text{bad}}\, U = \begin{pmatrix} 0.5 & 0.4 \\ 0.1 & 0.8 \end{pmatrix} \begin{pmatrix} J \\ A \end{pmatrix} = \begin{pmatrix} 0.5J + 0.4A \\ 0.1J + 0.8A \end{pmatrix} = \lambda_1 U = 0.9 \begin{pmatrix} J \\ A \end{pmatrix} = \begin{pmatrix} 0.9J \\ 0.9A \end{pmatrix}$$
So
$$0.5J + 0.4A = 0.9J \implies A = J$$
$$0.1J + 0.8A = 0.9A \implies A = J$$
Now $A = J$ is the equation of the line in $(J, A)$ space that has slope $+1$. This line is the axis $U$. We can choose any vector on the $U$ axis to represent it, for example the vector $\begin{pmatrix} 1 \\ 1 \end{pmatrix}$, which is then an eigenvector of the matrix $M_{\text{bad}}$ corresponding to the eigenvalue $\lambda_1 = 0.9$.
The eigenvector corresponding to the second eigenvalue $\lambda_2 = 0.4$ can be found in a similar manner. It satisfies
$$M_{\text{bad}}\, V = \lambda_2 V$$
Let's assume $V = \begin{pmatrix} J \\ A \end{pmatrix}$. Then
$$M_{\text{bad}}\, V = \begin{pmatrix} 0.5 & 0.4 \\ 0.1 & 0.8 \end{pmatrix} \begin{pmatrix} J \\ A \end{pmatrix} = \begin{pmatrix} 0.5J + 0.4A \\ 0.1J + 0.8A \end{pmatrix} = \lambda_2 V = 0.4 \begin{pmatrix} J \\ A \end{pmatrix} = \begin{pmatrix} 0.4J \\ 0.4A \end{pmatrix}$$
So
$$0.5J + 0.4A = 0.4J \implies A = -0.25J$$
$$0.1J + 0.8A = 0.4A \implies A = -0.25J$$
The equation $A = -0.25J$ is the equation of the line in $(J, A)$ space that has slope $-0.25$. This line is the axis $V$. We can choose any vector on the $V$ axis to represent it, for example the vector $\begin{pmatrix} -4 \\ 1 \end{pmatrix}$, which is then an eigenvector of the matrix $M_{\text{bad}}$ corresponding to the eigenvalue $\lambda_2 = 0.4$.
If we plot these eigenvectors and choose a point
$$\begin{pmatrix} J_0 \\ A_0 \end{pmatrix} = \begin{pmatrix} 10 \\ 50 \end{pmatrix}$$
as our initial condition and apply the matrix to this vector once, we get the population of the two age groups in the next year:
$$\begin{pmatrix} J_1 \\ A_1 \end{pmatrix} = \begin{pmatrix} 0.5 & 0.4 \\ 0.1 & 0.8 \end{pmatrix} \begin{pmatrix} 10 \\ 50 \end{pmatrix} = \begin{pmatrix} 0.5 \times 10 + 0.4 \times 50 \\ 0.1 \times 10 + 0.8 \times 50 \end{pmatrix} = \begin{pmatrix} 25 \\ 41 \end{pmatrix}$$
We see that the action of this matrix is again to push the state point closer to the U axis. But in contrast to the good-year case, in which the state point moved outward along the U axis, here $|\lambda_1| = 0.9 < 1$, so $M_{\text{bad}}$ also pulls the state point closer to the V axis (Figure 6.26).
Figure 6.26: One application of the matrix M bad to the point (J0 , A0 ) takes it to (J1 , A1 ) which
is closer to both the U and V axes.
If we iterate M bad repeatedly, the state point always walks toward the U axis while approaching
(0, 0), which means extinction (Figure 6.27).
Figure 6.27: After repeated iterations of the matrix M bad, the ratio of adults to juveniles approaches a constant value, regardless of the initial conditions. Notice that both populations are decreasing.
Finally, we confirm our theoretical prediction of “stable node” by applying the M bad matrix
repeatedly to a set of initial conditions that lie on a circle. The effect is to collapse the circle
onto the U axis along the direction of the V axis while shrinking along the U axis. The overall
effect is to shrink the circle to the point (0, 0) (Figure 6.28).
Figure 6.28: One application of the M bad matrix to a circle of initial conditions (black dots)
transforms them into an oval (dark gray dots). Applying the matrix for the second time, it
flattens the oval even further and shrinks it along the U axis (light gray dots). By the fifth
iteration (red dots), the initial circle has been transformed into a line lying along the principal
eigenvector and continually shrinking along that direction.
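We can confirm the stable-node prediction numerically. The sketch below (plain Python; names are ours) starts twelve points on a circle of radius 50 and applies M bad one hundred times; every point collapses toward the origin:

```python
import math

def step_bad(J, A):
    # Bad-year matrix M_bad from the text
    return 0.5 * J + 0.4 * A, 0.1 * J + 0.8 * A

# Start twelve points on a circle of radius 50 and iterate 100 times.
final_radii = []
for k in range(12):
    theta = 2.0 * math.pi * k / 12.0
    J, A = 50.0 * math.cos(theta), 50.0 * math.sin(theta)
    for _ in range(100):
        J, A = step_bad(J, A)
    final_radii.append(math.hypot(J, A))

# Both eigenvalues have absolute value less than 1, so every point
# has been driven essentially to the origin.
print(max(final_radii))      # a very small number
```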
We also simulated another Leslie matrix for a two-stage population. In this case, 10% of juveniles
remain juvenile, 40% become adults, and the rest die. The birth rate is 1.4 offspring per adult,
and only 20% of adults survive each year. This gives us the matrix
$$M_{\text{osc}} = \begin{pmatrix} 0.1 & 1.4 \\ 0.4 & 0.2 \end{pmatrix}$$
Repeated iteration of M osc resulted in an oscillatory approach to a stable equilibrium point at
(0, 0) (Figure 6.6 on page 293). We can understand this behavior by considering the eigenvalues
and corresponding eigenvectors of M osc .
First, let’s find the eigenvalues for the matrix by plugging the matrix coefficients into the
characteristic equation
(a + d) ± (a + d)2 − 4(ad − cb)
λ=
2
We get
(0.1 + 0.2) ± (0.1 + 0.2)2 − 4(0.1 × 0.2 − 0.4 × 1.4)
λ=
2
0.3 ± (2.25) 0.3 ± 1.5
= =
2 2
= (0.9, −0.6)
332 Linear Algebra
Therefore, the two eigenvalues are $\lambda_1 = 0.9$ and $\lambda_2 = -0.6$. Solving $M_{\text{osc}}\, U = \lambda_1 U$ as before gives
$$0.1J + 1.4A = 0.9J \implies A = \tfrac{4}{7}J$$
$$0.4J + 0.2A = 0.9A \implies A = \tfrac{4}{7}J$$
which implies that the eigenvector $U$ lies on the line $A = \frac{4}{7}J$, which has slope $4/7 \approx 0.57$. The vector $(J, A) = (7, 4)$ will serve nicely as an eigenvector on this line.
For the second eigenvector, we solve
$$M_{\text{osc}}\, V = \lambda_2 V$$
We can say that
$$M_{\text{osc}}\, V = \begin{pmatrix} 0.1 & 1.4 \\ 0.4 & 0.2 \end{pmatrix} \begin{pmatrix} J \\ A \end{pmatrix} = \begin{pmatrix} 0.1J + 1.4A \\ 0.4J + 0.2A \end{pmatrix} = \lambda_2 V = -0.6 \begin{pmatrix} J \\ A \end{pmatrix} = \begin{pmatrix} -0.6J \\ -0.6A \end{pmatrix}$$
yielding
$$0.1J + 1.4A = -0.6J \implies A = -0.5J$$
$$0.4J + 0.2A = -0.6A \implies A = -0.5J$$
The second eigenvector is therefore any vector on the line A = −0.5J, which is the line of
slope −0.5. For example, we can take (J, A) = (2, −1) as our eigenvector V.
Having calculated the eigenvalues and the eigenvectors, we can now make the theoretical
prediction that this matrix will shrink slowly along U and collapse more quickly toward the U
axis in an oscillating manner. The presence of a negative eigenvalue means that the matrix will
flip the state point back and forth on either side of the U axis. This flipping will occur with
ever-decreasing amplitudes, since |λ2 | < 1.
Let’s verify these predictions by applying the matrix to a test point (Figure 6.29). We see,
exactly as predicted, that the state point oscillates around the U axis with diminishing amplitude
as it approaches the origin.
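The sign-flipping can be watched directly. Solving $0.4J + 0.2A = 0.9A$ gives $A = \frac{4}{7}J$ for the line along U, so the quantity $A - \frac{4}{7}J$ measures the V-component of the state. In the sketch below (plain Python), that quantity changes sign at every step while shrinking, and the ratio A/J converges to 4/7:

```python
# Oscillatory matrix M_osc from the text: J' = 0.1*J + 1.4*A, A' = 0.4*J + 0.2*A
def step_osc(J, A):
    return 0.1 * J + 1.4 * A, 0.4 * J + 0.2 * A

J, A = 100.0, 100.0
above_U = []                       # is the state above the line A = (4/7)*J?
for n in range(60):
    J, A = step_osc(J, A)
    above_U.append(A - (4.0 / 7.0) * J > 0)

# The V-component alternates in sign (eigenvalue -0.6) and shrinks,
# so the state flips across the U axis while collapsing onto it.
print(A / J)                       # approaches 4/7 = 0.571...
```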
Finally, if we apply the matrix repeatedly to a circle of initial conditions, we see that the first
iteration has flattened the circle into an oval, which is pointing below the U axis. The second
iteration flattens and shrinks the oval further and tilts it upward, so that it is pointing above
Figure 6.29: Iteration of the matrix M osc causes the state point to diminish continually along
the U axis, while also diminishing along the V axis, but in an oscillatory manner.
the U axis, while the third iteration further shrinks and flattens the oval and tilts it back to
point below the U axis. The oscillatory tilt above and below the U axis is caused by the negative
eigenvalue along the V direction (Figure 6.30).
Figure 6.30: Starting with a circle of initial conditions (0), repeated action of the matrix M osc
flattens the circle into an ellipse (1), and flips the ever-flattening ellipse back and forth across
the U axis, in a diminishing manner (2, 3).
Thus the overall behavior is an oscillatory approach to the stable equilibrium point at (0, 0),
so both populations shrink to zero.
In the previous example of M osc , the black bear population collapse is due partly to the low birth
rate of 1.4 offspring per adult. If we raise this birth rate to 2 offspring per adult, we get the
matrix
$$M_{\text{osc2}} = \begin{pmatrix} 0.1 & 2 \\ 0.4 & 0.2 \end{pmatrix}$$
and this new system has a distinctly different behavior. Now we have an unstable oscillatory
equilibrium point (Figure 6.31).
Figure 6.31: Iteration of the matrix M osc2 results in a trajectory that is oscillatory/stable in one
direction, and expanding (unstable) in the other.
Exercise 6.5.5 Calculate the eigenvalues and eigenvectors of this matrix with increased birth
rate, and use them to explain the behavior in Figure 6.31.
We modeled the susceptible and infected populations in an epidemic, using a Markov process model ("Neutral Equilibria" on page 293). We saw that when we iterated the matrix M SI repeatedly, the system would go to a stable equilibrium, but the equilibrium depended on the initial condition (Figure 6.7 on page 294). We can explain why this occurs by studying
the eigenvalues and corresponding eigenvectors of M SI . We will see that in Markov process
models, there is always an eigenvalue λ = 1 that gives us a line of equilibrium points along its
corresponding eigenvector.
As before, the discrete-time dynamics for this S-I compartmental model are written in matrix form as
$$\begin{pmatrix} S_{N+1} \\ I_{N+1} \end{pmatrix} = \begin{pmatrix} 1-a & b \\ a & 1-b \end{pmatrix} \begin{pmatrix} S_N \\ I_N \end{pmatrix}$$
We made the assumption that at each time point (such as day, week, or month), a constant
fraction a of the susceptibles become infected and a constant fraction b of the infecteds recover
to become susceptible again. If a is the fraction of S that become I, then the fraction of S that
remain S must be 1 − a. If b is the fraction of I that become S, then the fraction of I that
remain I must be 1 − b.
We chose a = 0.1 and b = 0.2, giving us the matrix
$$M_{SI} = \begin{pmatrix} 0.9 & 0.2 \\ 0.1 & 0.8 \end{pmatrix}$$
What are the dynamics of this system? Let’s find the eigenvalues for this matrix by plugging
the matrix coefficients into the characteristic equation (equation (6.3) on page 302); we get
(0.9 + 0.8) ± (0.9 + 0.8)2 − 4(0.9 × 0.8 − 0.1 × 0.2)
λ=
√ 2
1.7 ± 0.09 1.7 ± 0.3
= =
2 2
= (1, 0.7)
6.5. Linear Discrete-Time Dynamics 335
So
0.9S + 0.2I = S =⇒ I = 0.5S
0.1S + 0.8I = I =⇒ I = 0.5S
Now $I = 0.5S$ is the equation of a line in $(S, I)$ space that has slope 0.5. This line is the axis $U$. We can choose any vector on the $U$ axis to represent it, for example the vector $\begin{pmatrix} 2 \\ 1 \end{pmatrix}$, which is then called an eigenvector of the matrix $M_{SI}$ corresponding to the eigenvalue $\lambda_1 = 1$.
The eigenvector corresponding to the second eigenvalue $\lambda_2 = 0.7$ can be found in a similar manner. It satisfies
$$M_{SI}\, V = \lambda_2 V$$
Let's assume $V = \begin{pmatrix} S \\ I \end{pmatrix}$. Then
$$M_{SI}\, V = \begin{pmatrix} 0.9 & 0.2 \\ 0.1 & 0.8 \end{pmatrix} \begin{pmatrix} S \\ I \end{pmatrix} = \begin{pmatrix} 0.9S + 0.2I \\ 0.1S + 0.8I \end{pmatrix} = \lambda_2 V = 0.7 \begin{pmatrix} S \\ I \end{pmatrix} = \begin{pmatrix} 0.7S \\ 0.7I \end{pmatrix}$$
So
$$0.9S + 0.2I = 0.7S \implies I = -S$$
$$0.1S + 0.8I = 0.7I \implies I = -S$$
Since $I = -S$ is the equation of a line in $(S, I)$ space that has slope $-1$, this line is the axis $V$. We can choose any vector on the $V$ axis to represent it, for example the vector $\begin{pmatrix} 1 \\ -1 \end{pmatrix}$, which is then called an eigenvector of the matrix $M_{SI}$ corresponding to the eigenvalue $\lambda_2 = 0.7$.
Figure 6.32: Left: Application of the S-I matrix M SI to the initial condition (S0 , I0 ) results in
the state point (S1 , I1 ), closer to the U axis but at a constant distance from the V axis. Right:
Repeated applications of M SI approach the U axis while remaining a constant distance from V.
Indeed, if we start with any initial condition on the line parallel to the V axis passing through
(50, 50), the dynamical system will converge to the same equilibrium point. For example, if we
take an initial condition on the other side of the U axis, say (90, 10), we see that the action of
the matrix is to walk the point up along the V direction toward the U axis (Figure 6.33).
Figure 6.33: The U axis is a line of stable equilibrium points for the matrix M SI . Any initial
condition on a given line parallel to the V axis will approach the same equilibrium point on the
U axis.
Thus it is clear from both theoretical prediction and experiments that it is only the U com-
ponent of the initial condition that determines the final equilibrium point.
Therefore, if we start from an initial condition along a different line, say (10, 60), we see that
the action of M SI is to walk the state point toward a different equilibrium point on the U axis
(Figure 6.34).
Figure 6.34: Two trajectories (red and black) starting from different initial conditions that do not lie on the same line parallel to the V axis will both approach the U axis, but toward different equilibrium points.
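Because M SI conserves the total S + I (its columns sum to 1), the equilibrium reached depends only on that total. A quick check (plain Python; helper names are ours): (50, 50) and (90, 10) share the total 100 and land on the same equilibrium, while (10, 60), with total 70, lands on a different one.

```python
def step_si(S, I):
    # S-I Markov matrix M_SI from the text (a = 0.1, b = 0.2)
    return 0.9 * S + 0.2 * I, 0.1 * S + 0.8 * I

def equilibrium(S, I, n=200):
    for _ in range(n):
        S, I = step_si(S, I)
    return S, I

# (50, 50) and (90, 10) lie on the same line S + I = 100 (parallel to V),
# so they reach the same equilibrium; (10, 60) has total 70 and does not.
print(equilibrium(50, 50))   # about (66.7, 33.3)
print(equilibrium(90, 10))   # the same point
print(equilibrium(10, 60))   # about (46.7, 23.3)
```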
An effective way to visualize the action of any matrix M is to take a large number of initial
conditions in a circle and look at what repeated iterations of M do to the circle.
When we make this plot for the S-I matrix, we see that the action of M SI is to flatten the
circle into an oval. If we apply M SI repeatedly, the oval gets thinner and thinner and shifts its axis
slightly until it begins to resemble a thick flat line lying exactly along the U axis (Figure 6.35).
Figure 6.35: Left: one application of the S-I matrix to a circle of initial conditions (dark gray) transforms them into an oval. Right: repeated applications flatten and rotate the oval. By the tenth iteration (red dots), the initial circle has been transformed into a line lying along the principal eigenvector.
Thus, we see here again the fact that repeated iteration of a matrix from any set of initial
conditions results in a thin oval whose principal axis moves closer and closer to the principal
eigenvector. Finally, for a large number of iterations, the resulting structure resembles a line,
a thin finger pointing along the principal eigenvector. And so once again, when you iterate a
matrix many times, you are looking at its principal eigenvector.
Markov processes Note that in this case, there are no births or deaths; the number of people
remains constant. Therefore, the sum of the entries in each column of the matrix must be equal
to 1, because each person in the compartment must go somewhere.
$$\begin{pmatrix} 1-a & b \\ a & 1-b \end{pmatrix} \qquad
\begin{aligned}
1-a &: \text{fraction of } S \text{ who remain } S & \qquad b &: \text{fraction of } I \text{ who become } S \\
a &: \text{fraction of } S \text{ who become } I & \qquad 1-b &: \text{fraction of } I \text{ who remain } I
\end{aligned}$$
Each column sums to 1: $(1-a) + a = 1$ and $b + (1-b) = 1$.
A matrix whose columns all add up to 1 is called a stochastic matrix. It’s called “stochastic”
(which means involving chance or probability) because we can interpret the matrix entries as
transition probabilities from one compartment to another.
We can imagine a large number of particles, in this case people, hopping from one compart-
ment to another, with hopping probabilities given by the elements of the matrix. Every matrix of
transition probabilities like this one will have the property that the columns all add to 1, because
probabilities must add to 1. When we interpret the matrix as a matrix of transition probabilities,
the process is called a Markov process.
In all such processes, λ = 1 will always be an eigenvalue, and hence all equilibria are neutral
equilibria. In a neutral equilibrium system, the behavior will always be to go to a stable final
state, but the stable final state depends on the initial condition.
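A one-line way to see why $\lambda = 1$ is always an eigenvalue: since each column of a stochastic matrix sums to 1, the row vector $(1, 1, \ldots, 1)$ is a left eigenvector with eigenvalue 1, and a matrix has the same eigenvalues as its transpose. Equivalently, every step conserves the total population, as this minimal sketch (plain Python) confirms for M SI:

```python
def column_sums(M):
    # M is given as a list of rows; sum down each column
    return [sum(row[j] for row in M) for j in range(len(M[0]))]

M_SI = [[0.9, 0.2],
        [0.1, 0.8]]

print(column_sums(M_SI))     # each column sums to 1

# Consequence: each step conserves the total population.
S, I = 37.0, 63.0
S2, I2 = 0.9 * S + 0.2 * I, 0.1 * S + 0.8 * I
print(S2 + I2)               # equal (up to rounding) to S + I = 100
```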
We saw that the three-variable locust model consists of three stages: eggs (E), hoppers (juve-
niles) (H), and adults (A) (Bodine et al. 2014). The egg and hopper stages each last one year,
with 2% of eggs surviving to become hoppers and 5% of hoppers surviving to become adults.
Adults lay 1000 eggs (as before, we are modeling only females) and then die. The model was
$$\begin{aligned} E_{N+1} &= 0 \cdot E_N + 0 \cdot H_N + 1000 A_N \\ H_{N+1} &= 0.02 E_N + 0 \cdot H_N + 0 \cdot A_N \\ A_{N+1} &= 0 \cdot E_N + 0.05 H_N + 0 \cdot A_N \end{aligned} \implies L = \begin{pmatrix} 0 & 0 & 1000 \\ 0.02 & 0 & 0 \\ 0 & 0.05 & 0 \end{pmatrix}$$
We saw that the model gave us neutral oscillations, which depended on the initial conditions
(Figure 6.8 on page 295). We can confirm this by plotting the trajectory of repeated applications
of L to two different initial conditions in 3-dimensional (E, H, A) state space (Figure 6.36).
Figure 6.36: Two trajectories resulting from simulations of the locust population model with two
different initial conditions.
To explain this neutral oscillatory behavior, we need to study the eigenvalues of the matrix
L; see Exercise 6.5.6 below.
Exercise 6.5.6 Use SageMath to calculate the eigenvalues of L. Verify that they are
$$\lambda_1 = 1, \quad \lambda_2 = -\frac{1}{2} + \frac{\sqrt{3}}{2}i, \quad \lambda_3 = -\frac{1}{2} - \frac{\sqrt{3}}{2}i$$
What do the eigenvalues tell you about the behavior you have just seen? Relate each of the
phenomena you saw above to specific properties of the eigenvalues.
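As a numerical hint: the three eigenvalues listed above are the cube roots of unity, all of absolute value 1, so applying L three times should return any state exactly to where it started. The sketch below (plain Python; the starting state is arbitrary) confirms the period-3 neutral cycle:

```python
def step_locust(E, H, A):
    # Leslie matrix L from the text: E' = 1000*A, H' = 0.02*E, A' = 0.05*H
    return 1000.0 * A, 0.02 * E, 0.05 * H

state = (500.0, 10.0, 2.0)   # an arbitrary starting population (E, H, A)
s = state
for _ in range(3):
    s = step_locust(*s)

print(s)                     # back (up to rounding) at (500.0, 10.0, 2.0)
```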
Lessons
We have seen that the equilibrium point behavior of a linear discrete-time dynamical system is
entirely determined by the eigenvalue and eigenvector decomposition of its matrix representation.
There is also an important lesson about the long-term behavior of linear (or matrix) discrete-
time systems that we remarked on in each of our examples: if you take a blob of points and
apply a matrix M to them many times, you will be looking at the principal eigenvector of M.
Put another way, the long-term behavior of a linear discrete-time system is dominated by its
largest eigenvalue and the corresponding eigenvector.
There is a nice algebraic way to see why this is true. Suppose our n-dimensional dynamical system is
$$X_{N+1} = f(X_N) = M X_N$$
If we write the initial condition in the eigenvector basis as $X_0 = c_1 E_1 + \cdots + c_n E_n$, then applying $M$ a hundred times gives $X_{100} = c_1 \lambda_1^{100} E_1 + \cdots + c_n \lambda_n^{100} E_n$. If $\lambda_1$ is even slightly larger than $\lambda_2$, then $\lambda_1^{100}$ will be much larger than $\lambda_2^{100}$. Therefore, the dynamics along the principal eigenvector will dominate the long-term behavior of the matrix.
This principle is beautifully illustrated in the following example.
1. A swan population can be subdivided into young swans (Y ) and mature swans (M). We
can then set up a discrete-time model of these populations as follows:
$$\begin{pmatrix} Y_{N+1} \\ M_{N+1} \end{pmatrix} = \begin{pmatrix} 0.57 & 1.5 \\ 0.25 & 0.88 \end{pmatrix} \begin{pmatrix} Y_N \\ M_N \end{pmatrix}$$
a) Explain the biological meaning of each of the four numbers in the matrix of this
model.
b) It turns out that the eigenvectors of this matrix are approximately as follows (you can check this using SageMath if you wish): $\begin{pmatrix} 1.9 \\ -0.6 \end{pmatrix}$ with eigenvalue 0.09 and $\begin{pmatrix} 1.9 \\ 1.0 \end{pmatrix}$
c) Many years in the future, if there are 2000 mature swans, approximately how many
young swans would you expect there to be?
2. A blobfish population consists of juveniles and adults. Each year, 50% of juveniles become
adults and 10% die. Adults have a 75% chance of surviving from one year to the next
and have, on average, four offspring a year.
a) Write a discrete-time matrix model describing this population.
b) If the population this year consists of 50 juveniles and 35 adults, what will next year’s
population be?
c) What will happen to the population in the long run?
Note that the sum of the elements in each row i is the total number of pages that point to
page i , and the sum of the elements in each column j is the total number of pages that page j
points to:
$$a_{i1} + \cdots + a_{ij} + \cdots + a_{in} \;=\; \text{sum of the } i\text{th row} \;=\; \text{total number of pages that point to page } i$$
$$a_{1j} + \cdots + a_{ij} + \cdots + a_{nj} \;=\; \text{sum of the } j\text{th column} \;=\; \text{total number of pages that page } j \text{ points to}$$
Then we have to account for the fact that a webpage might point to many other pages. A
“vote” from a selective page counts more than a “vote” from a page that points to lots of other
pages, so if one page points to many others, the importance score that it passes on to the other
pages must be diluted by the total number of outbound links. For example, if page j points to
page i , then aij = 1. But this will need to be diluted by the total number of pages that page j
points to, which is a1j + a2j + · · · + anj .
So we define $L_{ij}$ as the normalized weight of page $j$'s vote on page $i$:
$$L_{ij} = \frac{\text{page } j\text{'s vote on page } i \text{ (0 or 1)}}{\text{total number of pages that page } j \text{ points to}} = \frac{a_{ij}}{a_{1j} + a_{2j} + \cdots + a_{nj}}$$
We now define the "links to" matrix
$$L = [L_{ij}] = \begin{pmatrix} L_{11} & L_{12} & \cdots & L_{1n} \\ L_{21} & L_{22} & \cdots & L_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ L_{n1} & L_{n2} & \cdots & L_{nn} \end{pmatrix}$$
Then let’s define the PageRank vector as the vector made up of the “importance” of each
page p1 , p2 , . . . , pn . This is the vector that the search engine needs to calculate to assign an
importance to each page in the network. Its components P R1 , P R2 , . . . , P Rn are the importance
scores of each page. The higher the score, the more important the page. The more important
the page, the higher it appears in the search engine results:
⎛ ⎞ ⎛ ⎞
importance of p1 P R1
⎜importance of p2 ⎟ ⎜P R2 ⎟
⎜ ⎟ ⎜ ⎟
PR = ⎜ .. ⎟ = ⎜ .. ⎟
⎝ . ⎠ ⎝ . ⎠
importance of pn P Rn
6.6. Google PageRank 343
To start with, we will assume an initial condition, which we will call "old PR," in which all $n$ pages have equal importance. We will normalize the total importance to 1, so
$$\text{old } \mathbf{PR} = \begin{pmatrix} \text{old } PR_1 \\ \text{old } PR_2 \\ \vdots \\ \text{old } PR_n \end{pmatrix} = \begin{pmatrix} 1/n \\ 1/n \\ \vdots \\ 1/n \end{pmatrix}$$
Then we update the old PR vector. The new value of $PR_i$ is the sum of the normalized incoming links to page $i$. In this way, each page that points to page $i$ "passes on" a fraction of its own importance to page $i$. That is, we update the page rank $PR_i$ by assigning the new value
$$\text{new } PR_i = L_{i1} \cdot (\text{old } PR_1) + L_{i2} \cdot (\text{old } PR_2) + \cdots + L_{in} \cdot (\text{old } PR_n)$$
which is the sum, over all pages $j$, of the normalized weight of page $j$'s vote on page $i$ times page $j$'s page rank.
If we do this update for each of the old page ranks, we get a "new" page rank vector
$$\text{new } \mathbf{PR} = \begin{pmatrix} \text{new } PR_1 \\ \text{new } PR_2 \\ \vdots \\ \text{new } PR_n \end{pmatrix} = \begin{pmatrix} L_{11} \cdot (\text{old } PR_1) + L_{12} \cdot (\text{old } PR_2) + \cdots + L_{1n} \cdot (\text{old } PR_n) \\ L_{21} \cdot (\text{old } PR_1) + L_{22} \cdot (\text{old } PR_2) + \cdots + L_{2n} \cdot (\text{old } PR_n) \\ \vdots \\ L_{n1} \cdot (\text{old } PR_1) + L_{n2} \cdot (\text{old } PR_2) + \cdots + L_{nn} \cdot (\text{old } PR_n) \end{pmatrix}$$
This can be rewritten as
$$\text{new } \mathbf{PR} = \begin{pmatrix} \text{new } PR_1 \\ \text{new } PR_2 \\ \vdots \\ \text{new } PR_n \end{pmatrix} = \begin{pmatrix} L_{11} & L_{12} & \cdots & L_{1n} \\ L_{21} & L_{22} & \cdots & L_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ L_{n1} & L_{n2} & \cdots & L_{nn} \end{pmatrix} \begin{pmatrix} \text{old } PR_1 \\ \text{old } PR_2 \\ \vdots \\ \text{old } PR_n \end{pmatrix}$$
or in vector form
$$\text{new } \mathbf{PR} = L\,(\text{old } \mathbf{PR})$$
But as Page and Brin saw, this is only a first estimate. The next question is, how important are the sites that pointed to the sites that pointed to site $i$? To take that factor into account, we replace the "new" page rank vector by a "new new" page rank vector, obtained by applying the matrix $L$ again:
$$\text{new new } \mathbf{PR} = L\,(\text{new } \mathbf{PR}) = L\,L\,(\text{old } \mathbf{PR}) = L^2\,(\text{old } \mathbf{PR})$$
In other words, the infinite regress that is contained in the idea of "sites that are linked to by sites that are linked to by . . . " is actually a model for a discrete-time dynamical system that is an iteration of the "links to" matrix.
What happens when we iterate this link matrix L many times? Suppose the eigenvectors of
L are E1 , E2 , . . . , En , in descending order of their corresponding eigenvalues, λ1 , λ2 , . . . , λn . So
λ1 is the largest eigenvalue.
The action of applying L to the initial condition “old PR” many times is then dominated by
the principal eigenvector of L, which is E1 . Indeed, there are constants c1 , c2 , . . . , cn that enable
us to express the initial condition
old PR = c1 E1 + c2 E2 + · · · + cn En
in the eigenvector basis $\{E_1, E_2, \ldots, E_n\}$. After iterating the matrix many times, say 100, we get
$$L^{100}(\text{old } \mathbf{PR}) = L^{100}(c_1 E_1) + L^{100}(c_2 E_2) + \cdots + L^{100}(c_n E_n) = c_1 \lambda_1^{100} E_1 + c_2 \lambda_2^{100} E_2 + \cdots + c_n \lambda_n^{100} E_n$$
and its components P R1 , P R2 , . . . , P Rn are the page ranks, the final importance scores assigned
to each page. When you search a term, Google presents pages to you in the order of their page
rank eigenvector.
Surfer Model
In our discussion of Markov processes, we saw that a Markov process can be represented by a matrix M, each element $m_{ij}$ of which is the probability of a person "hopping" from compartment j to compartment i in the next time interval.
The long-term behavior of the system is given by the iteration of the matrix, which will tend
to some outcome. As we saw, the results of that iteration are determined by the eigenvector
and eigenvalue decomposition of the matrix.
Brin and Page realized that their “links to” matrix could also be seen as a model of a Markov
process, in which a random web surfer “hops” from one page j to another page i with a probability
equal to the normalized weight of page j’s vote on page i , which is Lij .
Notice that the "links to" matrix satisfies the key condition that defines a "stochastic" matrix: each column adds up to 1. For example, the elements of the $j$th column of the "links to" matrix are $L_{1j}, \ldots, L_{ij}, \ldots, L_{nj}$. Recall that the definition of $L_{ij}$ is
$$L_{ij} = \frac{\text{page } j\text{'s vote on page } i \text{ (0 or 1)}}{\text{total number of pages that page } j \text{ points to}} = \frac{a_{ij}}{a_{1j} + \cdots + a_{nj}}$$
so summing the $j$th column gives
$$L_{1j} + \cdots + L_{ij} + \cdots + L_{nj} = \frac{a_{1j} + \cdots + a_{nj}}{a_{1j} + \cdots + a_{nj}} = 1$$
So the page rank vector can be interpreted in this surfer model as the probability that the
surfer, clicking randomly on each page, will end up on a given page.
[Diagram: a network of four web pages A, B, C, and D, with arrows showing which pages point to which.]
where the arrow means “points to.” We can then derive the “points to” matrix, more commonly
called a directed adjacency matrix because it shows which pages are linked and the direction of
the link. For example, from the diagram, we know that page A points to page C; therefore, in
the “points to” matrix, we have aC←A = 1:
⎡ ⎤ ⎡ ⎤
aA←A aA←B aA←C aA←D 0 1 0 1
⎢aB←A aB←D ⎥ ⎢ ⎥
“points to” matrix = ⎢
aB←B aB←C ⎥ = ⎢1 0 1 1⎥
⎣ aC←A aC←B aC←C aC←D ⎦ ⎣ 1 0 0 0⎦
aD←A aD←B aD←C aD←D 1 0 1 0
From the “points to” matrix, we can derive the “links to” matrix L by normalizing each “vote”
from page j to page i by the total number of “votes” cast by page j. So for example, the sum
of the first column of the “points to” matrix is the total number of pages that page A points to,
which is 3. So each vote that A casts has to be divided by 3.
$$\begin{pmatrix} 0 & 1 & 0 & 1 \\ 1 & 0 & 1 & 1 \\ 1 & 0 & 0 & 0 \\ 1 & 0 & 1 & 0 \end{pmatrix} \qquad \text{column sums: } \Sigma = 3,\ 1,\ 2,\ 2$$
Dividing each column by its sum gives the "links to" matrix
$$L = \begin{pmatrix} 0 & 1 & 0 & \tfrac{1}{2} \\ \tfrac{1}{3} & 0 & \tfrac{1}{2} & \tfrac{1}{2} \\ \tfrac{1}{3} & 0 & 0 & 0 \\ \tfrac{1}{3} & 0 & \tfrac{1}{2} & 0 \end{pmatrix}$$
[Diagram: the same four-page network, with each arrow labeled by its normalized weight: each of A's three links carries weight 1/3, B's single link carries weight 1, and each of C's and D's two links carries weight 1/2.]
Note that the iteration process stabilizes after only a few iterations and reaches a “stationary
distribution” that is the principal eigenvector, which gives us the final page ranks.
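The whole procedure — build the "points to" matrix, normalize its columns to get L, and iterate from the uniform vector — fits in a few lines. A sketch in plain Python (variable names are ours), using the four-page network above:

```python
# "points to" matrix: a[i][j] = 1 if page j points to page i (order A, B, C, D)
a = [[0, 1, 0, 1],
     [1, 0, 1, 1],
     [1, 0, 0, 0],
     [1, 0, 1, 0]]
n = len(a)

# Divide each column by its sum to get the stochastic "links to" matrix L
col = [sum(a[i][j] for i in range(n)) for j in range(n)]
L = [[a[i][j] / col[j] for j in range(n)] for i in range(n)]

# Power iteration: start from the uniform vector and apply L repeatedly
pr = [1.0 / n] * n
for _ in range(200):
    pr = [sum(L[i][j] * pr[j] for j in range(n)) for i in range(n)]

print(pr)    # the stationary distribution; page A has the largest share
```

Because L is stochastic, the total importance stays equal to 1 at every step, and the iteration settles onto the principal eigenvector.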
Food Webs
Another example of a Google-style eigenvector-based ranking system can be found in the analysis
of food webs in ecology.
In a food web, nutrients move from one species to another. In an application of the Google
eigenvector concept, Allesina and Pascual wanted to find out whether a given species was
“important for co-extinctions” (Allesina and Pascual 2009). That is, they wanted to know which
species had the biggest impact on the food web and whose loss would therefore be the most
catastrophic.
If the food web has species 1, 2, . . . , k that interact with each other, we will let [aij ] be the
k × k matrix that represents the “preys on” hierarchy, in other words, the i th row and jth column
entry of the “preys on” matrix is given by aij = 1 if species j preys on species i .
Just as Google wants the web pages that are pointed to by web pages that are pointed to
. . . , so in food webs we are interested in species that are preyed on by species that are preyed
on . . . .
We find these “important” species by the same method: start with the “preys on” matrix of 0’s
and 1’s, normalize it to a stochastic matrix (all columns add to 1), and then find the principal
eigenvector. Each species’ importance in this food web is then its corresponding component in
this principal eigenvector.
The ranking that is produced by the principal eigenvector is then interpretable as “the sequence
of the losses that results in the fastest collapse of the network.” Allesina and Pascual argue that
this dominant eigenvector analysis is superior to other approaches to food webs, for example,
those that focus on “hub” or “keystone” species, which are defined as those species that have
the largest number of links to other species.
Economics
The history of matrix analysis of networks begins in economics. The economist Wassily Leontief produced an input/output matrix analysis of the United States economy in 1941. In a matrix
representation of an economy, we have a list of “sectors” s1 , s2 , . . . , sk , such as steel, water,
rubber, oil. Then we form the k × k matrix [aij ] in which each entry aij represents the quantity
of resources that sector j orders from sector i .
The first practical application came two years later, during World War II. The US government asked Leontief to create an input/output matrix representing the Nazi war economy in order to identify which sectors were the most critical. This was done, and the eigenvector calculation of this large-dimensional matrix was one of the very early uses of automated computing.
Leontief used "the first commercial electro-mechanical computer, the IBM Automatic Sequence Controlled Calculator (called the Mark I), originally designed under the direction of Harvard mathematician Howard Aiken in 1939, built and operated by IBM engineers in Endicott, New York for the US Navy" (Miller and Blair 2009).
The results of his eigenvector analysis would not have been immediately obvious: the critical
sectors were oil and ball bearings. Ball bearings were critical components of machinery and
vehicles, and no substitutes for them existed. In accord with this analysis, the US Army Air
Forces designated ball bearing factories and oil refineries as the major targets for their bombing
campaign in Europe.
Ecological Networks
In the 1970s, ecologists studying the flow of energy and nutrients (substances like carbon,
nitrogen, and phosphorus) in ecosystems discovered Leontief’s work and began using it to study
ecosystems as input/output systems (Hannon 1973), creating the field of ecological network
analysis. (See Fath and Patten (1999) for a readable introduction.) The first step in doing so is
to decide what substance to study (this substance is called the currency of the model), and if we
are studying a whole ecosystem, decide how to partition it into compartments. Compartments
can be species, collections of species, or nonliving ecosystem components such as dissolved
nitrate in water.
We then measure or estimate how much of our currency flows between each pair of compart-
ments. This gives what is called the flow matrix F . Entry fij of this matrix tells us how much
currency flows from compartment j to compartment i . For example, the ecological interactions
that make up an oyster–mussel community in a reef have been modeled as consisting of six
compartments. The currency in this case is energy, and the flows from one compartment to
another are shown in Figure 6.37.
[Diagram: six-compartment model of the reef community, including deposited detritus (2), microbiota (3), meiofauna (4), deposit feeders (5), and predators (6); arrows labeled $f_{ij}$ show energy flows between compartments, with inputs $Z$, outputs $Y_i$, and storages $X_i$.]
Figure 6.37: Six compartment model of reef community (redrawn from Patten (1985)).
Based on the graph of the network, we can make an input–output matrix for the compartments
in the system. We can then iterate this matrix to find the long-term behavior predicted by the
model.
Suppose we iterate the matrix many times and the system stabilizes at some equilibrium point.
When the system is at equilibrium, the sum of all the outflows (or inflows) from a compartment
is called the compartment's throughflow.

We can make a vector, T, of these throughflows. Dividing each entry in the F matrix by the
throughflow of the donor compartment gives a matrix called the G matrix, where

$$G_{ij} = \frac{f_{ij}}{T_j}$$

This matrix gives us the probability that a unit of currency leaving compartment j enters compartment
i, or the fraction of the currency that does so.
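As a concrete sketch (with made-up flow numbers, and ignoring export and respiration losses for simplicity), the G matrix can be computed from a flow matrix like this:

```python
import numpy as np

# Hypothetical flow matrix for a 3-compartment system:
# F[i, j] = amount of currency flowing from compartment j to compartment i.
F = np.array([[0.0, 1.0, 0.5],
              [2.0, 0.0, 0.0],
              [0.0, 1.5, 0.0]])

# At equilibrium, a compartment's throughflow is the sum of its outflows
# (here just the column sums of F; a real model would add exports and losses).
T = F.sum(axis=0)

# G[i, j] = F[i, j] / T[j]: the fraction of currency leaving j that reaches i.
G = F / T
print(G)
```

In this closed sketch each column of G sums to 1, since every unit leaving a compartment must go somewhere.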
The G matrix tells us about the currency going from compartment j to compartment i in
one direct step. However, ecologists are interested in more than just the question of how much
flows from j to i. They also want to know about second-order flows, in which currency transfer
happens in two steps: j → k → i; the currency first has to get from j to k and then from k to i.
The probability of going from j to k is G_{kj}, and the probability of going from k to i is G_{ik}. And
the probability of going from j → i through k is the product of G_{kj} and G_{ik}. Adding up these
products for all the compartments that could play the role of k gives the fraction of currency
leaving j that gets to i in two steps. We can do this for every compartment in the model simply
by multiplying the G matrix by itself. The resulting matrix is written as G^2. More generally, the
amount of currency going from j to i in n steps is entry (i, j) of the matrix G^n.

Why is this interesting? Well, all powers of G tell us about indirect flows between j and i. We
may sum all these matrices to obtain the sum of all indirect flows as G^2 + G^3 + ···. Because
real ecosystems leak energy and nutrients, the entries in G^{n+1} are generally smaller than those
in G^n, and the sum G^2 + G^3 + ··· converges to some limiting matrix. Comparing the entries
of this matrix to those of G itself lets us compare the relative importance of direct and indirect
flows. It turns out that in many ecosystem models, indirect flows are significant and can even
carry more energy or nutrients than direct flows!
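A minimal sketch of this computation, using a made-up G whose columns sum to less than 1 (reflecting losses at each step):

```python
import numpy as np

# Hypothetical dissipative transfer fractions: each column sums to less
# than 1 because some currency is lost (respiration, export) at each step.
G = np.array([[0.0, 0.4, 0.1],
              [0.5, 0.0, 0.3],
              [0.2, 0.3, 0.0]])

# Sum the indirect flows G^2 + G^3 + ... until the terms become negligible.
indirect = np.zeros_like(G)
term = G @ G                     # start with the two-step flows
while np.abs(term).max() > 1e-12:
    indirect += term
    term = term @ G

print("direct flows:\n", G)
print("total indirect flows:\n", indirect)
```

Because the entries of G^n shrink geometrically, the loop terminates; the same limit can also be written in closed form as G^2(I − G)^{-1}.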
Why does this happen, despite the fact that currency is lost at every step? It’s true that
a longer path will typically carry less currency than a shorter one. But how many long paths
are there? We can find out by taking powers of the adjacency matrix A. The (i, j)th entry of A^n
tells us the number of paths of length n between j and i. For most ecosystem and food web
models, these numbers rapidly become astronomical. For example, in the 29-species food web in
Figure 6.38, there are at least 28 million paths between seals and the fish hake (Yodzis 1998).
Figure 6.38: A food web for an ecosystem off the coast of southern Africa. Reprinted from
“Local trophodynamics and the interaction of marine mammals and fisheries in the Benguela
ecosystem,” by P. Yodzis, 1998, Journal of Animal Ecology 67(4):635–658. Copyright 1998
John Wiley & Sons. Reprinted with permission from John Wiley & Sons.
This proliferation of paths allows indirect paths taken together to carry a large amount of
energy or nutrients, even though no individual path may be very significant. This is one of the
reasons why predicting how an ecosystem or other complex system will respond to an intervention
is difficult.
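The path-count explosion is easy to reproduce. Here is a sketch with a tiny hypothetical three-compartment web containing feedback loops:

```python
import numpy as np

# Adjacency matrix for a tiny hypothetical web with feedback loops:
# A[i, j] = 1 if currency can flow directly from compartment j to i.
A = np.array([[0, 1, 1],
              [1, 0, 1],
              [1, 1, 0]])

# Entry (i, j) of A^n counts the directed paths of length n from j to i.
for n in (1, 2, 5, 10, 20):
    print(n, np.linalg.matrix_power(A, n).max())
```

The counts grow roughly like λ^n, where λ is the largest eigenvalue of A (here λ = 2), which is why real food webs with dozens of species can have millions of paths between a given pair of species.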
Equilibrium Points
First of all, let’s discuss equilibrium points. If we think about one-dimensional linear vector fields,
then we are talking about either
X = r X or X = −r X (assuming r > 0)
It is clear that the only equilibrium points these systems can have are X = 0.
But what about two-dimensional or even n-dimensional cases? In the n-dimensional case, if
we are looking for equilibrium points, we are looking for solutions to

$$\mathbf{X}' = 0 = f(\mathbf{X})$$

which implies

$$\begin{aligned}
X_1' &= 0 = f_1(X_1, X_2, \dots, X_n) \\
X_2' &= 0 = f_2(X_1, X_2, \dots, X_n) \\
&\;\;\vdots \\
X_n' &= 0 = f_n(X_1, X_2, \dots, X_n)
\end{aligned}$$
(We can find this by using the first equation to eliminate X_1 in terms of the other variables, then
using the second equation to eliminate X_2, and so on; finally we get an equation of the form
aX_n = 0, which forces X_n = 0, and back-substitution then gives X_1 = X_2 = ··· = X_n = 0.)
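In a numerical sketch (with a made-up coefficient matrix), this amounts to checking that the matrix is invertible, in which case the origin is the only equilibrium:

```python
import numpy as np

# Hypothetical linear system X' = M X.
M = np.array([[-2.0, 1.0],
              [1.0, -3.0]])

# Equilibria satisfy M X = 0. When det(M) != 0, X = 0 is the only solution.
print(np.linalg.det(M))                 # nonzero (here 5), so the origin is unique
X_eq = np.linalg.solve(M, np.zeros(2))
print(X_eq)
```

If the determinant were zero (the footnoted case where one equation is a multiple of another), there would instead be a whole line of equilibria.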
Stability
Having found the equilibrium point, we now need to determine its stability.
In the one-dimensional case, we have already seen that X' = rX has a stable equilibrium
point at X = 0 if and only if r < 0.
If we now pass to the decoupled 2D case,

$$X' = aX \qquad Y' = dY$$

we can say that since the system decouples into two 1D subsystems along the X and Y axes,
the behavior of the equilibrium point is given by the behaviors along the two axes. The two 1D
subsystems are X' = aX and Y' = dY. And if we join them, we get

$$\begin{cases} X' = aX \\ Y' = dY \end{cases} \implies \begin{pmatrix} X' \\ Y' \end{pmatrix} = \begin{pmatrix} a & 0 \\ 0 & d \end{pmatrix} \begin{pmatrix} X \\ Y \end{pmatrix}$$
As we saw in Chapter 3, these equilibrium points can be purely stable nodes (a < 0 and
d < 0), purely unstable nodes (a > 0 and d > 0), and saddle points (a < 0 and d > 0 or a > 0
and d < 0).
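The sign rules can be summarized in a small helper (a sketch; it lumps the degenerate cases where a or d is zero into one bucket):

```python
# Classify the equilibrium of the decoupled system X' = aX, Y' = dY
# from the signs of a and d.
def classify(a: float, d: float) -> str:
    if a < 0 and d < 0:
        return "stable node"      # both axes attract
    if a > 0 and d > 0:
        return "unstable node"    # both axes repel
    if a * d < 0:
        return "saddle point"     # one axis attracts, the other repels
    return "degenerate"           # a zero eigenvalue: not covered here

print(classify(-1, -2))   # → stable node
```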
Exercise 6.7.1 Why does it make sense that these signs of a and d give rise to the equilibrium
types listed above? (Hint: Draw some phase portraits.)
3 Almost all the time. The exceptions are cases in which two equations are multiples of each other, such as
0 = X + Y and 0 = 2X + 2Y . Try solving these for X and Y ; you don’t get a definite answer.
2D Let’s go on to discuss the two-dimensional case. The simplest case is two uncoupled systems
X = aX
Y = dY
This can be represented as the matrix differential equation
X a 0 X
=
Y 0 d Y
The flow corresponding to the diagonal matrix differential equation is then just the combina-
tion of the flows in the two components:
X(t) X(0)e at
=
Y (t) Y (0)e dt
where X(0), Y (0) are the initial conditions.
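As a quick numerical check (with made-up coefficients), the closed-form flow agrees with a crude Euler integration of the same system:

```python
import numpy as np

# Decoupled system X' = aX, Y' = dY with hypothetical coefficients.
a, d = 0.3, -0.5
X0, Y0 = 1.0, 2.0

def flow(t):
    # Each component evolves independently as a 1D exponential.
    return np.array([X0 * np.exp(a * t), Y0 * np.exp(d * t)])

# Crude Euler integration of the same system up to t = 1.
state = np.array([X0, Y0])
dt = 1e-4
for _ in range(int(1.0 / dt)):
    state = state + dt * np.array([a * state[0], d * state[1]])

print(flow(1.0))   # closed-form flow at t = 1
print(state)       # Euler approximation, close to the closed form
```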
Exercise 6.7.4 Find the flow of the differential equation

$$X' = 0.3X \qquad Y' = -0.5Y$$

Exercise 6.7.5 What differential equation has the flow

$$X(t) = X(0)e^{2t} \qquad Y(t) = Y(0)e^{-0.7t}$$
Eigenbehavior
We can look at the equation

$$X' = rX$$

and ask something that may seem redundant and pointless. We will ask whether this 1D linear
function has an eigenvalue and an eigenvector. The answer is that of course it does. An eigenvector
spans a subspace along which f acts like multiplication by λ, and the X axis obviously satisfies this,
with λ = r.

Therefore, for the differential equation X' = rX, we can rewrite the equation for the flow as

$$X(t) = X(0)e^{\lambda t} \qquad (\text{where } \lambda = r)$$
Similarly, in the 2D uncoupled case, for the matrix differential equation

$$\begin{pmatrix} X' \\ Y' \end{pmatrix} = \begin{pmatrix} a & 0 \\ 0 & d \end{pmatrix} \begin{pmatrix} X \\ Y \end{pmatrix}$$

we can ask whether the matrix has eigenvalues and eigenvectors. And again, the answer is that
of course it does: the vectors

$$\{\mathbf{X}, \mathbf{Y}\} = \left\{ \begin{pmatrix} 1 \\ 0 \end{pmatrix}, \begin{pmatrix} 0 \\ 1 \end{pmatrix} \right\}$$

with eigenvalues λ_X = a and λ_Y = d.
Exercise 6.7.6 Construct the flow for the matrix differential equation

$$\begin{pmatrix} X' \\ Y' \end{pmatrix} = \begin{pmatrix} -2 & 0 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} X \\ Y \end{pmatrix}$$
This form is the key to understanding the general 2D case. By mixing and matching various
values of λ_X and λ_Y, we get a gallery of equilibrium points in diagonal linear systems
(Figure 6.39).
[Figure 6.39: A gallery of equilibrium points in diagonal linear systems. Four phase portraits: an unstable node (X' = X, Y' = Y; λ_X = 1, λ_Y = 1; flow X(t) = X(0)e^t, Y(t) = Y(0)e^t), a stable node (X' = −X, Y' = −Y; λ_X = −1, λ_Y = −1; flow X(t) = X(0)e^{−t}, Y(t) = Y(0)e^{−t}), a stable node (X' = −X, Y' = −2Y; λ_X = −1, λ_Y = −2; flow X(t) = X(0)e^{−t}, Y(t) = Y(0)e^{−2t}), and a saddle point (X' = X, Y' = −Y; λ_X = 1, λ_Y = −1; flow X(t) = X(0)e^t, Y(t) = Y(0)e^{−t}).]
The key to understanding behavior in this general case is to decompose the system into its
eigenvalues and eigenvectors, and then infer the flow from the “eigenbehavior” just as we have
been doing. So, for example, if λ1 and λ2 are both real numbers, we find their corresponding
eigenvectors U and V, and conclude that the flow is U(0)e^{λ1 t} on the U axis and V(0)e^{λ2 t} on
the V axis. This completely determines the behavior in the 2D state space.
The eigenvalues of this matrix are obtained by plugging the matrix entries into the characteristic
equation (equation (6.2) on page 299). We get

$$\lambda_1 = 1 \quad \text{and} \quad \lambda_2 = -1$$

which implies that the eigenvector U lies on the line Y = 0.5X, which has slope 0.5. The vector
(X, Y) = (2, 1) will serve nicely as an eigenvector on this line.
The eigenvector V corresponding to λ2 must satisfy

$$M\mathbf{V} = \lambda_2 \mathbf{V}$$
which implies that the eigenvector V lies on the line Y = 4X, which has slope 4. The vector
(X, Y ) = (1, 4) will serve nicely as an eigenvector on this line.
The resulting equilibrium point structure therefore has a stable direction along the V axis
(λV = λ2 = −1) and an unstable direction along the U axis (λU = λ1 = 1). Therefore, the
equilibrium point is a saddle point whose axes are U and V.
The flow corresponding to this saddle point is then exactly as in the uncoupled 2D system

$$\begin{pmatrix} U(t) \\ V(t) \end{pmatrix} = \begin{pmatrix} U(0)e^{\lambda_U t} \\ V(0)e^{\lambda_V t} \end{pmatrix}$$

where U(0) and V(0) are initial conditions expressed in the {U, V} coordinate system (Figure 6.40).
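The same recipe can be carried out numerically. A sketch with a made-up coupled matrix (not the matrix from the example above):

```python
import numpy as np

# Hypothetical coupled 2D system X' = M X.
M = np.array([[1.0, 1.0],
              [4.0, 1.0]])

# Decompose into eigenvalues and eigenvectors.
eigvals, eigvecs = np.linalg.eig(M)

# One positive and one negative eigenvalue: a saddle point.
for lam, v in zip(eigvals, eigvecs.T):
    kind = "unstable" if lam > 0 else "stable"
    print(f"eigenvalue {lam:+.1f}: {kind} direction along {v}")
```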
Figure 6.40: The flow around a saddle point. U and V are the unstable and stable eigenvectors.
Exercise 6.7.8 Classify the equilibria of the following linear differential equations:

a) X' = Y,  Y' = −2X − 3Y

b) X' = 4X + 3Y,  Y' = X − 2Y
Complex eigenvalues Finally, let's consider the cases that cannot be diagonalized over the real
numbers. Consider, for example, the spring with friction:

$$\begin{cases} X' = V \\ V' = -X - V \end{cases} \implies \begin{pmatrix} X' \\ V' \end{pmatrix} = \begin{pmatrix} 0 & 1 \\ -1 & -1 \end{pmatrix} \begin{pmatrix} X \\ V \end{pmatrix} \qquad M = \begin{pmatrix} 0 & 1 \\ -1 & -1 \end{pmatrix}$$
along the corresponding eigenvectors. The same is true for complex eigenvalues: if λ = a + bi,
then the flow is

$$e^{\lambda t} = e^{(a+bi)t} = e^{at} e^{bit}$$
The key to the dynamics is in the expression e^{at}e^{bit}. Notice that it is the product of two
terms.

The first term, e^{at}, is an exponential in time, and its exponent is the real part of the eigenvalue.
Therefore, if the real part of the eigenvalue is positive, the solution has a term that
is exponentially growing with time, whereas if the real part of the eigenvalue is negative, the
term becomes a negative exponential, decaying in time. So the sign of a, the real part of the
eigenvalue, determines whether the dynamics are growing or shrinking.
The second term, e^{bit}, which contains the imaginary part of the eigenvalue, bi, contributes
rotation to the flow. We can see this by recalling Euler's formula e^{ix} = cos(x) + i sin(x). So

$$e^{bit} = \cos(bt) + i\sin(bt)$$
The presence of cosine and sine functions of time guarantees that the solution is a periodic
function of time, which gives the solution its oscillatory component.
So, to return to our example of the spring with friction, we can say that the equilibrium point
at (0, 0) is
(1) oscillatory, because the eigenvalues are complex conjugates;
(2) shrinking, because the real part of the eigenvalues is less than 0.
Therefore, the equilibrium point is a stable spiral, which we confirm with simulation (Figure 6.41).
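We can also check this prediction by computing the eigenvalues directly (a sketch using numpy):

```python
import numpy as np

# Spring with friction: X' = V, V' = -X - V.
M = np.array([[0.0, 1.0],
              [-1.0, -1.0]])

eigvals = np.linalg.eigvals(M)
print(eigvals)   # a complex-conjugate pair: -1/2 ± (sqrt(3)/2) i

a = eigvals[0].real   # negative: trajectories shrink
b = eigvals[0].imag   # nonzero: trajectories rotate
# a < 0 and b != 0 together give a stable spiral.
```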
Figure 6.41: Simulation of the spring with friction verifies the prediction of a stable spiral
equilibrium point. The eigenvalues are λ = −1/2 ± (√3/2)i.
Figure 6.42: Simulation of the spring with negative friction verifies the prediction of an unstable
spiral equilibrium point. The eigenvalues are λ = 1/2 ± (√3/2)i.
Figure 6.43: Simulation of the frictionless spring verifies the prediction of a neutral equilibrium
point.
Exercise 6.7.9 Classify the equilibria of the linear differential equations whose eigenvalues are
given below:
[Figure: two boxes, blood (X) and liver (Y), with flow a from blood to liver, flow b from liver
back to blood, and flow h out of the liver.]
Figure 6.44: Compartmental model of the movement of a tracer dye between the liver and the
bloodstream.
The problem is that we can’t observe h. All we can observe is X(t), the concentration of
the dye in the blood. We can estimate X(t) by making a number of blood draws over time,
measuring the dye level at each time point and then using curve-fitting software to estimate the
smooth curve that best fits the data points.
In order to get from an observation of X(t) to an estimate of h, we need to solve this
model. The differential equations are

$$X' = \underbrace{-aX}_{\text{blood}\to\text{liver}} + \underbrace{bY}_{\text{liver}\to\text{blood}}$$

$$Y' = \underbrace{aX}_{\text{blood}\to\text{liver}} - \underbrace{bY}_{\text{liver}\to\text{blood}} - \underbrace{hY}_{\text{liver}\to\text{bile}}$$
We will solve for the long-term dynamics by finding the eigenvalues of the matrix

$$M = \begin{pmatrix} -a & b \\ a & -(b+h) \end{pmatrix} \qquad (a > 0,\ b > 0,\ h > 0)$$

Plugging the four entries of M into the characteristic polynomial (equation (6.3) on page
302), we get the two eigenvalues as

$$\lambda_1, \lambda_2 = \frac{1}{2}\left( -(a+b+h) \pm \sqrt{(a+b+h)^2 - 4ah} \right)$$
First of all, let's note that both eigenvalues are real. In order for this to be true, the expression
under the $\sqrt{\ }$ sign has to be nonnegative. This is easily checked:

$$\begin{aligned}
(a+b+h)^2 - 4ah &= a^2 + b^2 + h^2 + 2ab + 2ah + 2bh - 4ah \\
&= a^2 + b^2 + h^2 + 2ab - 2ah + 2bh \\
&= a^2 - 2ah + h^2 + b^2 + 2ab + 2bh \\
&= (a-h)^2 + 2ab + 2bh + b^2 \\
&> 0
\end{aligned}$$
The next question is whether the eigenvalues are negative or positive. That depends upon
whether $\sqrt{(a+b+h)^2 - 4ah}$ is less than a + b + h. It is certainly true that

$$(a+b+h)^2 - 4ah < (a+b+h)^2$$

since 4ah is a positive number. This implies

$$\sqrt{(a+b+h)^2 - 4ah} < a + b + h$$

which implies

$$-(a+b+h) \pm \sqrt{(a+b+h)^2 - 4ah} < 0$$
So both eigenvalues λ1 , λ2 are negative real numbers, which means that (0, 0), the state in
which all dye is cleared, is a stable equilibrium point. Therefore, the behavior in approach to the
stable equilibrium point is the sum of two exponentially decaying terms. The question is how
fast the state point goes to the stable equilibrium point, for which we need the explicit solution.
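A quick numerical check with made-up rate constants confirms that both eigenvalues come out negative:

```python
import numpy as np

# Hypothetical rate constants for the blood-liver tracer model.
a, b, h = 0.8, 0.3, 0.5

M = np.array([[-a, b],
              [a, -(b + h)]])

# Eigenvalues from the formula derived above...
disc = np.sqrt((a + b + h) ** 2 - 4 * a * h)
lam1 = 0.5 * (-(a + b + h) + disc)
lam2 = 0.5 * (-(a + b + h) - disc)
print(lam1, lam2)               # both negative

# ...agree with a direct numerical computation.
print(np.linalg.eigvals(M))
```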
Suppose that the eigenvectors corresponding to λ1 and λ2 are U and V. Then we can write
the explicit solution to the differential equation as

$$\begin{pmatrix} U(t) \\ V(t) \end{pmatrix} = \begin{pmatrix} U(0)e^{\lambda_1 t} \\ V(0)e^{\lambda_2 t} \end{pmatrix}$$
But what we need, to compare it to the experimentally measured data, is X(t). So we need
X(t) and Y(t), not U(t) and V(t).

We go from one coordinate system to the other just as we did before, by means of the
coordinate transformation matrix T that takes the {X, Y} basis into the {U, V} basis:

$$\begin{pmatrix} X(0) \\ Y(0) \end{pmatrix} \xrightarrow{\;T\;} \begin{pmatrix} U(0) \\ V(0) \end{pmatrix} \xrightarrow{\;\lambda_1,\,\lambda_2\;} \begin{pmatrix} U(t) \\ V(t) \end{pmatrix} \xrightarrow{\;T^{-1}\;} \begin{pmatrix} X(t) \\ Y(t) \end{pmatrix}$$
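The whole transform, flow, transform-back recipe can be sketched numerically (with made-up rate constants; here the eigenvector matrix plays the role of T⁻¹):

```python
import numpy as np

# Blood-liver model X' = M X with hypothetical rate constants.
a, b, h = 0.8, 0.3, 0.5
M = np.array([[-a, b],
              [a, -(b + h)]])

eigvals, P = np.linalg.eig(M)     # columns of P are the eigenvectors U, V
XY0 = np.array([100.0, 0.0])      # all dye starts in the blood

def solution(t):
    UV0 = np.linalg.solve(P, XY0)      # transform to {U, V} coordinates (apply T)
    UVt = UV0 * np.exp(eigvals * t)    # the flow is diagonal in eigencoordinates
    return P @ UVt                     # transform back to {X, Y} (apply T^-1)

print(solution(1.0))   # (X(1), Y(1))
```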
In order to compare X(t) to the experimental data, we face a problem. There are four unknown
parameters in the X(t) equation, and it is very difficult to infer four unknown parameters from
a single curve.
The key step in doing this is to think about the graph of a process that is represented by the
sum of two negative exponentials.
Choosing typical numbers for the parameters, and assuming that |λ1 | is significantly greater
than |λ2 |, so that λ1 is a rapidly decaying process and λ2 is a slowly decaying process (which is
the case in the liver), we obtain the following graph:
[Graph: X(t) = Ae^{λ1 t} + Be^{λ2 t} (μg/L, vertical axis 0 to 100) plotted against t (minutes, 0 to 20), together with its fast component Ae^{λ1 t} and its slow component Be^{λ2 t}.]
The trick is to notice that in the early part of the curve, say the first five minutes, the curve
X(t) is very close to the fast negative exponential, while for t > 10 minutes, the curve X(t) is
very close to the slowly decaying process.
We then use the first segment of the X(t) curve to estimate Ae^{λ1 t}, and the second segment
of the X(t) curve to estimate Be^{λ2 t}. A simple calculation then gives us h, which is the liver's
clearance rate.
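This "peeling" procedure is easy to sketch with synthetic data (made-up values of A, B, λ1, λ2):

```python
import numpy as np

# Synthetic two-exponential decay with made-up parameters,
# where |lam1| >> |lam2| (a fast and a slow process).
A, lam1, B, lam2 = 80.0, -0.9, 20.0, -0.05
t = np.linspace(0, 20, 201)
X = A * np.exp(lam1 * t) + B * np.exp(lam2 * t)

# Late segment (t > 10): the fast term has died out, so log X is nearly linear.
late = t > 10
lam2_est, logB_est = np.polyfit(t[late], np.log(X[late]), 1)

# Subtract the estimated slow term, then fit the early segment the same way.
fast = X - np.exp(logB_est + lam2_est * t)
early = (t < 4) & (fast > 0)
lam1_est, logA_est = np.polyfit(t[early], np.log(fast[early]), 1)

print(lam1_est, lam2_est)   # close to the true rates -0.9 and -0.05
```

The fitted λ1 and λ2 then feed the "simple calculation" for h mentioned above.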
3. Suppose Romeo and Juliet's love obeys the following differential equations:

$$R' = -R + 3J \qquad J' = 3R - J$$

The matrix of this system is $\begin{pmatrix} -1 & 3 \\ 3 & -1 \end{pmatrix}$, which has the following eigenvectors:

$$\begin{pmatrix} 1 \\ 1 \end{pmatrix} \text{ with eigenvalue } 2, \quad \text{and} \quad \begin{pmatrix} -1 \\ 1 \end{pmatrix} \text{ with eigenvalue } -4$$
We will use these two eigenvectors to define a new coordinate system, and we will use
u and v to represent these coordinates. However, in this problem, we will treat u and v
as new variables. Your goal is to rewrite this system of differential equations in terms of
these new variables.
a) Starting with the definition of the coordinates u and v,

$$\begin{pmatrix} R \\ J \end{pmatrix} = u \begin{pmatrix} 1 \\ 1 \end{pmatrix} + v \begin{pmatrix} -1 \\ 1 \end{pmatrix}$$

solve for u and v in terms of R and J to get

$$u = \frac{1}{2}R + \frac{1}{2}J \qquad v = -\frac{1}{2}R + \frac{1}{2}J$$
b) Since R and J are just functions of time, u and v are as well, and taking the derivative
of both sides of the two equations above gives u' = (1/2)R' + (1/2)J' and v' = −(1/2)R' + (1/2)J'.
Substitute the original differential equations into this to get u' and v' in terms of R
and J.

c) Now substitute the expressions for R and J (in terms of u and v) from part (a) into
your answer from part (b) and simplify. This should give you u' and v' in terms of u
and v.
d) What is the matrix of the new system of differential equations that you ended up
with in part (c)? What do you notice about its form? What do you notice about the
specific numbers that appear in it?