CST, Calculus For Cognitive Scientists Higher Order Models and Their Analysis
CST, Calculus For Cognitive Scientists Higher Order Models and Their Analysis
James K. Peterson
Calculus for
Cognitive
Scientists
Higher Order Models and Their Analysis
Cognitive Science and Technology
Series editor
David M.W. Powers, Adelaide, Australia
123
James K. Peterson
Department of Mathematical Sciences
Clemson University
Clemson, SC
USA
We would like to thank all the students who have used the various iterations
of these notes as they have evolved from handwritten to the fourth fully typed
version here. We particularly appreciate your interest as this course is required and
uses mathematics; a combination that causes fear in many biological science
majors. We have been pleased by the enthusiasm you have brought to this inter-
esting combination of ideas from many disciplines. Finally, we gratefully
acknowledge the support of Hap Wheeler in the Department of Biological Sciences
during the years 2006 to 2014 for believing that this material would be useful to
biology students.
For this new text on a follow-up course to the first course on calculus for
cognitive scientists, we would like to thank all of the students from Spring 2006 to
Fall 2014 for their comments and patience with the inevitable typographical errors,
mistakes in the way we explained topics, and organizational flaws as we have
taught second semester of calculus ideas to them. This new text starts assuming you
know something the equivalent of a first semester course in calculus and particu-
larly know about exponential and logarithm functions, first-order models and the
MATLAB tools needed to solve the models numerically. In addition, you need to
know a fair bit of a start into calculus for functions of more than one variable and
the ideas of approximation to functions of one and two variables. These are not
really standard topics in just one course in calculus, which is why our first volume
was written to provide coverage of all those things. In addition, all of the mathe-
matics subserve ideas from biological models so that everything is wrapped toge-
ther in a pleasing package!
With that background given, in this text, we add new material on linear and
nonlinear systems models and more biological models. We also cover a useful way
of solving what are called linear partial differential equations using the technique
vii
viii Acknowledgments
named Separation of Variables. To make sense of all this, we naturally have to dip
into mathematical waters at appropriate points and we are not shy about that! But
rest assured, everything we do is carefully planned because it is of great use to you
in your attempts to forge an alliance between cognitive science, mathematics, and
computation.
Contents
Part I Introduction
1 Introductory Remarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.1 A Roadmap to the Text. . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.2 Code. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.3 Final Thoughts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
Part II Review
2 Linear Algebra . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
2.1 The Inner Product of Two Column Vectors . . . . . . . . . . . . . . 11
2.1.1 Homework . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
2.2 Interpreting the Inner Product. . . . . . . . . . . . . . . . . . . . . . . . 13
2.2.1 Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
2.2.2 Homework . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
2.3 Determinants of 2 2 Matrices . . . . . . . . . . . . . . . . . . . . . . 19
2.3.1 Worked Out Problems . . . . . . . . . . . . . . . . . . . . . . 20
2.3.2 Homework . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
2.4 Systems of Two Linear Equations. . . . . . . . . . . . . . . . . . . . . 21
2.4.1 Worked Out Examples. . . . . . . . . . . . . . . . . . . . . . 23
2.4.2 Homework . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
2.5 Solving Two Linear Equations in Two Unknowns . . . . . . . . . 25
2.5.1 Worked Out Examples. . . . . . . . . . . . . . . . . . . . . . 28
2.5.2 Homework . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
2.6 Consistent and Inconsistent Systems . . . . . . . . . . . . . . . . . . . 31
2.6.1 Worked Out Examples. . . . . . . . . . . . . . . . . . . . . . 33
2.6.2 Homework . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
2.7 Specializing to Zero Data . . . . . . . . . . . . . . . . . . . . . . . . . . 35
2.7.1 Worked Out Examples. . . . . . . . . . . . . . . . . . . . . . 37
2.7.2 Homework . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
ix
x Contents
Glossary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 531
Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 547
List of Figures
xvii
xviii List of Figures
xxiii
List of Code Examples
xxv
xxvi List of Code Examples
This book tries to show how mathematics, computer science, and biology can be
usefully and pleasurably intertwined. The first volume (J. Peterson, Calculus for
Cognitive Scientists: Derivatives, Integration and Modeling (Springer, Singapore,
2015 in press)) discussed the necessary one- and two-variable calculus tools as well
as first-order ODE models. In this volume, we explicitly focus on two-variable
ODE models both linear and nonlinear and learn both theoretical and computational
tools using MATLAB to help us understand their solutions. We also go over
carefully on how to solve cable models using separation of variables and Fourier
series. And we must always caution you to be careful to make sure the use of
mathematics gives you insight. These cautionary words about the modeling of the
physics of stars from 1938 should be taken to heart:
Technical journals are filled with elaborate papers on conditions in the interiors of model
gaseous spheres, but these discussions have, for the most part, the character of exercises in
mathematical physics rather than astronomical investigations, and it is difficult to judge the
degree of resemblance between the models and actual stars. Differential equations are like
servants in livery: it is honourable to be able to command them, but they are “yes” men,
loyally giving support and amplification to the ideas entrusted to them by their master—
Paul W. Merrill, The Nature of Variable Stars, 1938, quoted in Arthur I. Miller Empire
of the Stars: Obsession, Friendship, and Betrayal in the Quest for Black Holes, 2005.
where we have taken the liberty to replace physics with our domain here of biology.
We should never forget the last line
xxxi
xxxii Abstract
We must always take our modeling results and go back to the scientists to make
sure they retain relevance.
History
Based On:
Notes On MTHSC 108 for Biologists developed during the
Spring 2006 and Fall 2006,
Spring 2007 and Fall 2007 courses
The first edition of this text was used in
Spring 2008 and Fall 2008,
The course was then relabeled at MTHSC 111
and the text was used in
Spring 2009 and Fall 2009 courses
The second edition of this text was used in
Spring 2010 and Fall 2010 courses
The third edition was used in the
Spring 2011 and Fall 2011,
Spring 2012 and Summer Session I courses
The fourth edition was used in
Fall 2012, Spring 2013, and Fall 2013 courses
The fifth edition was used in the Spring 2014 course
Also, we have used material from notes on
Partial Differential Equation Models
which has been taught to small numbers of students since 2008.
xxxiii
Part I
Introduction
Chapter 1
Introductory Remarks
In this course, we will try to introduce beginning cognitive scientists to more of the
kinds of mathematics and mathematical reasoning that will be useful to them if they
continue to live, grow and learn within this area. In our twenty first century world,
there is a tight integration between the areas of mathematics, computer science and
science. Now a traditional Calculus course for the engineering and physical sciences
consists of a four semester sequence as shown in Table 1.1.
Unfortunately, this sequence of courses is heavily slanted towards the needs of the
physical sciences. For example, many of our examples come from physics, chemistry
and so forth. As long as everyone in the class has the common language to under-
stand the examples, the examples are a wonderful tool for adding insight. However,
the typical students who are interested in cognitive science often find the language
of physics and engineering to be outside their comfort zone. Hence, the examples
lose their explanatory power. Our first course starts a different way of teaching this
material and this text will continue that journey. Our philosophy, as usual, is that all
parts of the course must be integrated, so we don’t want to use mathematics, science
or computer approaches for their own intrinsic value. Experts in these separate fields
must work hard to avoid this. This is the time to be generalists and always look for
connective approaches. Also, models are carefully chosen to illustrate the basic idea
that we know far too much detail about virtually any biologically based system we
can think of. Hence, we must learn to throw away information in the search of the
appropriate abstraction. The resulting ideas can then be phrased in terms of mathe-
matics and simulated or solved with computer based tools. However, the results are
not useful, and must be discarded and the model changed, if the predictions and illu-
minating insights we gain from the model are incorrect. We must always remember
that throwing away information allows for the possibility of mistakes. This is a hard
lesson to learn, but important. Note that models from population biology, genetics,
cognitive dysfunction, regulatory gene circuits and many others are good examples
to work with. All require massive amounts of abstraction and data pruning to get
anywhere, but the illumination payoffs are potentially quite large.
In this course, we introduce enough relevant mathematics and the beginnings of useful
computational tools so that you can begin to understand a fair bit about Biological
Modeling. We present a selection of nonlinear biological models and slowly build you
to the point where you can begin to have a feel for the model building process. We start
our model discussion with the classical Predator–Prey model in Chap. 10. We try to
talk about it as completely as possible and we use it as a vehicle to show how graphical
analysis coupled with careful mathematical reasoning can give us great insight. We
discuss completely the theory of the original Predator–Prey model in Sect. 10.1 and
its qualitative analysis in Sect. 10.5. We then introduce the use of computational tools
to solve the Predator–Prey model using MatLab in Sect. 10.11. While this model is
very successful at modeling biology, the addition of self-interaction terms is not.
The self-interaction models are analyzed in Chap. 11 and computational tools are
discussed in Sect. 11.8.
In Chap. 12, we show you a simple infectious disease model. The nullclines for this
model are developed in Sect. 12.1 and our reasoning why only trajectories that start
with positive initial conditions are biologically relevant are explained in Sect. 12.2.
1.1 A Roadmap to the Text 5
The infectious versus susceptible curve is then derived in Sect. 12.3. We finish this
Chapter with a long discussion of how we use a bit of mathematical wizardry to
develop a way to estimate the value of ρ in these disease models by using data
gathered on the value of R . This analysis in Sect. 12.6, while complicated, is well
worth your effort to peruse!
In Chap. 13, we show you a simple model of colon cancer which while linear
is made more complicated by its higher dimensionality—there are 6 variables of
interest now and graphical analysis is of less help. We try hard to show you how
we can use this model to get insight as to when point mutations or chromosomal
instability are the dominant pathway to cancer.
In Chap. 15, we go over a simple model of insulin detection using second order
models which have complex roots. We use the phase shifted form to try to detect
insulin from certain types of data.
1.2 Code
The code for much of this text is in the directory ODE in our code folder which you
can download from Biological Information Processing (http://www.ces.clemson.
edu/~petersj/CognitiveModels.html). These code samples can then be downloaded
as the zipped tar ball CognitiveCode.tar.gz and unpacked where you wish. If you
have access to MatLab, just add this folder with its sub folders to your MatLab path.
If you don’t have such access, download and install Octave on your laptop. Now
Octave is more of a command line tool, so the process of adding paths is a bit more
tedious. When we start up an Octave session, we use the following trick. We write
up our paths in a file we call MyPath.m. For us, this code looks like this
The paths we want to add are setup as strings, here called s1 etc., and to use this,
we start up Octave like so. We copy MyPath.m into our working directory and then
do this
We agree it is not as nice as working in MatLab, but it is free! You still have
to think a bit about how to do the paths. For example, in Peterson (2015c), we
develop two different ways to handle graphs in MatLab. The first is in the direc-
tory GraphsGlobal and the second is in the directory Graphs. They are not
to be used together. So if we wanted to use the setup of Graphs and noth-
ing else, we would edit the MyPath.m file to set s = [s11]; only. If we
wanted to use the GraphsGlobal code, we would edit MyPath.m so that
s11 = ’/home/petersj/MatLabFiles/BioInfo/GraphsGlobal:’;
and then set s = [s11];. Note the directories in the MyPath.m are ours: the main
directory is ’/home/petersj/MatLabFiles/BioInfo/ and of course,
you will have to edit this file to put your directory information in there instead
of ours.
All the code will work fine with Octave. So pull up a chair, grab a cup of coffee
or tea and let’s get started.
References
J. Peterson, Calculus for Cognitive Scientists: Derivatives, Integration and Modeling, Springer
Series on Cognitive Science and Technology (Springer Science+Business Media Singapore Pte
Ltd, Singapore, 2015a in press)
J. Peterson, Calculus for Cognitive Scientists: Partial Differential Equation Models, Springer Series
on Cognitive Science and Technology (Springer Science+Business Media Singapore Pte Ltd,
Singapore, 2015b in press)
J. Peterson, BioInformation Processing: A Primer On Computational Cognitive Science, Springer
Series on Cognitive Science and Technology (Springer Science+Business Media Singapore Pte
Ltd, Singapore, 2015c in press)
Part II
Review
Chapter 2
Linear Algebra
We need to use both vector and matrix ideas in this course. This was covered already
in the first text (Peterson 2015), so we will assume you can review that material
before you start into this chapter. Here we will introduce some new ideas as well
as tools in MatLab we can use to solve what are called linear algebra problems; i.e.
systems of equations. Let’s begin by looking at inner products more closely.
We can also define the inner product of two vectors. If V and W are two column
vectors of size n × 1, then the product V T W is a matrix of size 1 × 1 which we
identify with a real number. We see if
⎡ ⎤ ⎡ ⎤
V1 W1
⎢ V2 ⎥ ⎢ W2 ⎥
⎢ ⎥ ⎢ ⎥
⎢ ⎥ ⎢ ⎥
V = ⎢ V3 ⎥ and W = ⎢ W3 ⎥
⎢ .. ⎥ ⎢ .. ⎥
⎣.⎦ ⎣ . ⎦
Vn Wn
V T W = W T V = V , W [V1 W1 + V2 W2 + V3 W3 + · · · + Vn Wn ]
and we identify this one by one matrix with the real number
V1 W1 + V2 W2 + V3 W3 + · · · + Vn Wn
V1 W1 + V2 W2 + V3 W3 + · · · + Vn Wn
2.1.1 Homework
Exercise 2.1.1 Find the dot product of the vectors V and W given by
6 7
V = and W = .
1 2
Exercise 2.1.2 Find the dot product of the vectors V and W given by
−6 2
V = and W = .
−8 6
Exercise 2.1.3 Find the dot product of the vectors V and W given by
10 2
V = and W = .
−4 80
We add, subtract and scalar multiply vectors and matrices as usual. We also suggest
you review how to do matrix–vector multiplications. Multiplication of matrices is
more complex as we discussed in the volume (Peterson 2015). Let’s go through it
again a bit more abstractly. Recall the dot product of two vectors V and V W is
defined to be
n
V , W = Vi Wi
i=1
where n is the number of components in the vectors. Using this we can define the
multiplication of the matrix A of size n × p with the matrix B of size p × m as
follows.
2.1 The Inner Product of Two Column Vectors 13
⎡ ⎤
Row 1 of A
⎢ Row 2 of A⎥
⎢ ⎥
⎢ .. ⎥ Column 1 of B | · · · | Column n of B
⎣ . ⎦
Row n of A
⎡ ⎤
< Row 1 of A, Column 1 of B > . . . < Row 1 of A, Column n of B >
⎢< Row 2 of A, Column 1 of B > . . . < Row 2 of A, Column n of B >⎥
⎢ ⎥
=⎢ .. .. .. ⎥
⎣ . . . ⎦
< Row n of A, Column 1 of B > . . . < Row n of A, Column n of B >
We can write this more succinctly with we let Ai denote the rows of A and B i be the
columns of B. Note the use of subscripts for the rows and superscripts for the columns.
Then, we can rewrite the matrix multiplication algorithm more compactly as
⎡ ⎤ ⎡ ⎤
A1 A1 , B 1 . . . A1 , B n
⎢ A2 ⎥ ⎢ A2 , B 1 . . . A2 , B n ⎥
⎢ ⎥ ⎢ ⎥
⎢ .. ⎥ B1 | · · · | Bn =⎢ .. .. .. ⎥
⎣ . ⎦ ⎣ . . . ⎦
An An , B . . . An , B
1 1
AB i j = Ai , B j .
Comment 2.1.1 If A is a matrix of any size and 0 is the appropriate zero matrix
of the same size, then both 0 + A and A + 0 are nicely defined operations and the
result is just A.
Comment 2.1.2 Matrix multiplication is not communicative: i.e. for square matri-
ces A and B, the matrix product A B is not necessarily the same as the product B A.
What could this number < V, W > possibly mean? To figure this out, we have to
do some algebra. Let’s specialize to nonzero column vectors with only 2 compo-
nents. Let
a b
V = and W =
c d
Since these vectors are not zero, only one of the terms in (a, c) and in (b, d) can be
zero because otherwise both components would be zero and we are assuming these
vectors are not the zero vector. We will use this fact in a bit. Now here < V, W > =
ab + cd. So
14 2 Linear Algebra
|| V ||2 || W ||2 = a 2 + c2 b2 + d 2
= a 2 b2 + a 2 d 2 + c2 b2 + c2 d 2
Thus,
Now, this does look complicated, doesn’t it? But this last term is something squared
and so it must be non-negative! Hence, taking square roots, we have shown that
|< V, W >| ≤ || V || || W ||
Note, since a real number is always less than or equal to it absolute value, we can
also say
< V, W > ≤ || V || || W ||
And we can say more. If it turned out that the term (ad − bc)2 was zero, then
ad − bc = 0. There are then a few cases to look at.
1. If all the terms a, b, c and d are not zero, then we can write ad = bc implies a/c =
b/d. We know the vector V can be interpreted as the line segment starting at (0, 0)
on the line with equation y = (a/c)x. Similarly, the vector W can be interpreted
as the line segment connecting (0, 0) and (b, d) on the line y = (b/d)x. Since
a/c = b/d, these lines are the same. So both points (a, c) and (b, d) line on the
same line. Thus, we see these vectors lay on top of each other or point directly
opposite each other in the x − y plane; i.e. the angle between these vectors is 0
or π radians (that is 0◦ or 180◦ ).
2. If a = 0, then bc must be 0 also. Since we know the vector V is not the zero
vector, we can’t have c = 0 also. Thus, b must be zero. This tells us V has
components (0, c) for some non zero c and V has components (0, d) for some
non zero d. These components also determine two lines like in the case above
which either point in the same direction or opposite one another. Hence, again,
the angle between the lines determined by these vectors is either 0 or π radians.
3. We can argue just like the case above if d = 0. We would find the angle between
the lines determined by the vectors is either 0 or π radians.
Moreover,
|< V, W >| = || V || || W ||
if and only the quantity ad − bc = 0. Further, this quantity is equal to 0 if and only
if the angle between the line segments determined by the vectors V and W is 0◦
or 180◦ .
Here is yet another way to look at this: assume there is a non zero value of t so
that the equation below is true.
a b 0
V +t W = +t =
c d 0
This implies
a b
= −t
c d
Since these two vectors are equal, their components must match. Thus, we must have
a = −t b
c = −t d
Thus,
c
a d = (−t b) =bc
−t
and we are back to ad − bc = 0! Hence, another way of saying that the vectors V
and W are either 0◦ or 180◦ apart is to say that as vectors they are multiples of one
another! Such vectors are called collinear vectors to save writing. In general, we
say two n dimensional vectors are collinear if there is a nonzero constant t so that
V = t W although, of course, we can’t really figure out a way to visualize these
vectors!
Now, the scaled vectors E = ||VV || and F = ||W W
||
have magnitudes of 1. Their
components are (a/ || V ||, c/ || V || and (b/ || W ||, d/ || W ||. These points
lie on a circle of radius 1 centered at the origin. Let θ1 be the angle E makes with
the positive x-axis. Then, since the hypotenuse distance that defines the cos(θ1 ) and
sin(θ1 ) is 1, we must have
16 2 Linear Algebra
a
cos(θ1 ) =
|| V ||
c
sin(θ1 ) =
|| V ||
We can do the same thing for the angle θ2 that F makes with the positive x axis to see
b
cos(θ2 ) =
|| W ||
d
sin(θ2 ) =
|| W ||
The angle between vectors V and W is the same as between vectors E and F. Call
this angle θ. Then θ = θ1 − θ2 and using the formula for the cos of the difference
of angles
cos(θ) = cos(θ1 − θ2 )
= cos(θ1 ) cos(θ2 ) + sin(θ1 ) sin(θ2 )
a b c d
= +
|| V || || W || || V || || W ||
ab + cd
=
|| V || || W ||
< V, W >
=
|| V || || W ||
Hence, the ratio < V, W > /(|| V || || W ||) is the same as cos(θ)! So we can use
this simple calculation to find the angle between a pair two dimensional vectors.
The more general proof of the Cauchy Schwartz Theorem for n dimensional vec-
tors is a journey you can take in another mathematics class! We will state it though
so we can use it later if we need it.
Moreover,
|< V, W >| = || V || || W ||
Theorem 2.2.2 then tells us that if the vectors V and W are not zero, then
< V, W >
−1 ≤ ≤1
|| V || || W ||
and by analogy to what works for two dimensional vectors, we can use this ratio to
define the cos of the angle between two n dimensional vectors even though we can’t
see them at all! We do this in Definition 2.2.1.
Definition 2.2.1 (The Angle Between n Dimensional Vectors)
If V and W are two non zero n dimensional column vectors with components
(V1 , . . . , Vn ) and (W1 , . . . , Wn ) respectively, the angle θ between these vectors is
defined by
< V, W >
cos(θ) =
|| V || || W ||
Moreover, the angle between the vectors is 0◦ if < V, W > = 1 and is 180◦ if
< V, W > = −1.
2.2.1 Examples
Example 2.2.1 Find the angle between the vectors V and W given by
−6 −8
V = and W = .
13 1
< V, W > 61
cos(θ) = =√ √
|| V || || W || 205 65
= 0.5284
tors, we know
Example 2.2.3 Find the angle between the vectors V and W given by
6 8
V = and W = .
−13 1
Solution Compute the inner product < V , W >=(6)(8) + (−13)(1) =√35. Next,
√ vectors: || V || = (6) + (−13) = 205 and
find the magnitudes of these 2 2
|| W || = (8) + (1) = 65. Then, if θ is the angle between the vectors, we know
2 2
< V, W > 35
cos(θ) = =√ √
|| V || || W || 205 65
= 0.3032
2.2.2 Homework
Exercise 2.2.1 Find the angle between the vectors V and W given by
5 7
V = and W = .
4 2
Exercise 2.2.2 Find the angle between the vectors V and W given by
−6 9
V = and W = .
−8 8
2.2 Interpreting the Inner Product 19
Exercise 2.2.3 Find the angle between the vectors V and W given by
10 2
V = and W = .
−4 8
Exercise 2.2.4 Find the angle between the vectors V and W given by
6 7
V = and W = .
1 2
Exercise 2.2.5 Find the angle between the vectors V and W given by
3 2
V = and W = .
−5 −3
Exercise 2.2.6 Find the angle between the vectors V and W given by
1 −2
V = and W = .
−4 −3
Since the number ad − bc is so important is all of our discussions about the rela-
tionship between the two dimensional vectors V and W with components (a, c) and
(b, d) respectively, we will define this number to be the determinant of the matrix
A formed by using V for column 1 and W for column 2 of A. That is
ab a b
A= = V W =
cd c d
Notice that det AT is (a)(d) − (b)(c) also. Hence, if det AT is zero, it means
that Y and Z are collinear. Hence, it the det ( A) is zero, both the vectors determined
by the rows of A and the columns of A are collinear. Let’s summarize what we know
about this new thing called the determinant of A.
1. If | A | is 0, then the vectors determined by the columns of A are collinear.
This also means that the vectors determined by the columns are multiples of one
another. Also, the vectors determined by the columns of AT are also collinear.
2. If | A | is not 0, then the vectors determined by the columns of A are not collinear
which means these vectors point in different directions. Another way of saying
this is that these vectors are not multiples of one another. The same is true for the
columns of the transpose of A.
are collinear.
Solution Form the matrix A using these vectors as the columns. This gives
4 −2
A=
5 3
2.3 Determinants of 2 × 2 Matrices 21
The calculate | A | = (4)(3) − (−2)(5). Since this value is 22 which is not zero,
these vectors are not collinear.
are collinear.
Solution Form the matrix A using these vectors as the columns. This gives
−6 3
A=
4 −2
The calculate | A | = (−6)(−2) − (3)(4). Since this value is 0, these vectors are
collinear. You should graph them in the x−y plane to see this visually.
2.3.2 Homework
We can use all of this material to understand simple two linear equations in two
unknowns x and y. Consider the problem
2x +4 y = 7 (2.1)
3 x + 4 y = −8 (2.2)
22 2 Linear Algebra
Using the standard ways of multiplying vectors by scalars and adding vectors, we
see the above can be rewritten as
2x 4y 7
+ =
3x 4y −8
or
2x + 4y 7
=
3x + 4y −8
This last vector equation is clearly the same as the original Eqs. 2.1 and 2.2:
2x + 4y = 7
3x + 4y = −8
we see Eqs. 2.1 and 2.2 are equivalent to the vector equation
x V + y W = D.
We can also write the system Eqs. 2.1 and 2.2 in an equivalent matrix–vector form.
Recall the original system which is written below:
2x +4 y = 7
3 x + 4 y = −8
xV+yW = D
where
2 4 7
V = , W= and D =
3 4 −8
2.4 Systems of Two Linear Equations 23
Now use V and W as column one and column two of the matrix A
24
A= V W =
34
We call the matrix A the coefficient matrix of the system given by Eqs. 2.1 and 2.2.
Now we can introduce a new type of notation. Think of the column vector
x
y
as being a vector variable. We will use a bold font and a capital letter for this and set
x
X=
y
A X = D.
We typically refer to the vector D as the data vector associated with the system given
by Eqs. 2.1 and 2.2.
1x + 2y =9
−5 x + 12 y = −1
A X = D.
7x + 5y =2
−3 x + −4 y = 1
A X = D.
2.4.2 Homework
1α + 2β = 3
4α + 5β = 6
2.4 Systems of Two Linear Equations 25
We now know how to write the system of two linear equations in two unknowns
given by Eqs. 2.3 and 2.4
a x + b y = D1 (2.3)
c x + d y = D2 (2.4)
xV+yW = D
where
a b D1
V = , W= and D =
c d D2
Finally, using V and W as column one and column two of the matrix A
ab
A= V W =
cd
Then, the original system was written in vector and matrix–vector form as
ab x D1
xV+yW = =
cd y D2
26 2 Linear Algebra
Now, we can solve this system very easily as follows. We have already discussed the
inner product of two vectors. So we could compute the inner product of both sides
of x V + y W = D with any vector U we want and get
< U, x V + y W > = < U, D >
Since this is true for any vector U, let’s try to find useful ones! Any vector U that
satisfies < U, W > = 0 would be great as then the y would drop out and we could
solve for x. The angle between such vector U and W would then be 90◦ or 270◦ .
We will call such vectors orthogonal as the lines associated with the vectors are
perpendicular.
We can easily find such a vector. Since W defines a line through the origin with
slope d/b, from our usual algebra and pre-calculus courses, we know the line through
the origin which is perpendicular to it has negative reciprocal slope: i.e. −b/d. A
line with the slope −b/d corresponds with a vector with components (d, −b). The
usual symbol for perpendicularity is ⊥ so we will label our vector orthogonal to W
as W ⊥ . We see that
⊥ d
W =
−b
and as expected
Thus, we have
This looks complicated, but it can be written in terms of things we understand. Let’s
actually calculate the inner products. We find
and
D1 b
< W ⊥ , D > = (d)(D1 ) + (−b)(D2 ) = det .
D2 d
2.5 Solving Two Linear Equations in Two Unknowns 27
Hence, by taking the inner product of both sides with W ⊥ , we find the y term drops
out and we have
ab D1 b
x det = det
cd D2 d
We can do a similar thing to find out what the variable y is by taking the inner
product of both sides of of x V + y W = D with the vector V ⊥ and get
where
⊥ c
V =
−a
and as expected
Going through the same steps as before, we would find that if det ( A) is non zero,
we could solve for y to get
a D1
det
c D2 det V D
y= =
ab det V W
det
cd
Let’s summarize:
1. Given any system of two linear equations in two unknowns, there is a coefficient
matrix A with first column V and second column W that is associated with it.
Further, the right hand side of the system defines a data vector D.
2. If det ( A) is not zero, we can solve for the unknowns x and y as follows:
28 2 Linear Algebra
det DW
x=
det V W
det V D
y=
det V W
−2 x + 4 y = 6
8 x + −1 y = 2
Solution Solve
−2 x + 4 y = 6
8 x + −1 y = 2
−5 x + 1 y = 8
9 x + −10 y = 2
Solution We have
−5 1 9
V = , W= and D =
9 −10 −2
30 2 Linear Algebra
det DW
x=
det V W
8 1
det
2 −10
=
−5 1
det
9 −10
−82 −82
= =− .
41 41
det V D
y=
det V W
−5 8
det
9 2
=
−5 1
det
9 −10
−82
=
41
2.5.2 Homework
−3 x + 4 y = 6
8 x + 9 y = −1
2x + 3y =6
−4 x + 0 y = 8
18 x + 1 y = 1
−9 x + 3 y = 17
2.5 Solving Two Linear Equations in Two Unknowns 31
−7 x + 6 y = −4
8x + 1y =1
−90 x + 1 y = 1
80 x + −1 y = 1
So what happens if det ( A) = 0? By the remark above, we know that the vectors V
and W are collinear. We also know from our discussions in Sect. 2.3 that the columns
of AT are collinear. Hence, there is a non zero constant r so that
a c
=r
b d
r c x + r d y = D1
c x + d y = D2
or
r (c x + d y) = D1
c x + d y = D2
You can see we do not really have two equations in two unknowns since the top
equation on the left hand side is just a multiple of the left hand side of the bottom
32 2 Linear Algebra
c x + d y = D1 /r
c x + d y = D2
Now subtract the top equation from the bottom equation. You find
0 x + 0 y = 0 = D2 − D1 /r
This equation only makes sense if when you subtract the top from the bottom equa-
tion, the new right hand side is 0! We call such systems consistent if the right hand
side becomes 0 and inconsistent if not zero. So we have a great test for inconsistency.
We scale the top or bottom equation just right to make them identical and subtract
the two equations. If we get 0 = α for a nonzero α, the system is inconsistent.
Here is an example. Consider the system
2x +3y = 8
4x +6y = 9
2 x + 3 y = 8 = D1
2 (2 x + 3 y) = 9 = D2
This system would be consistent if the bottom equation was exactly two times the top
equation. For this to happen, we need D2 = 2 D1 ; i.e., we need 9 = 2 × 8 which is
impossible. So these equations are inconsistent. As mentioned earlier, an even better
way to see these equations are inconsistent is to subtract the top equation from the
bottom equation to get
0x + 0 y = 0 = 1
2.6 Consistent and Inconsistent Systems 33
D=xV+yW
= x sW + y W
This says that the data vector D = (xs + y) W . Hence, if there is a solution x and y,
it will only happen in D is a multiple of W . This says D is collinear with W which
in turn is collinear with V . Going back to our sample
2x +3y = 8
4 x + 6 y = 9.
4 x + 5 y = 11
−8 x − 10 y = −22
This system would be consistent if the bottom equation was exactly minus two times
the top equation. For this to happen, we need D2 = −2 D1 ; i.e., we need −22 =
−2 × 11 which is true. So these equations are consistent.
6 x + 8 y = 14
18 x + 24 y = 48
Solution We see immediately that the determinant of the coefficient matrix A is zero.
So again, the question of consistency is reasonable to ask. Here, the column vectors
of AT are
6 18
Y = and Z =
8 24
6 x + 8 y = 14 = D1
3 (6 x + 8 y) = 48 = D2
This system would be consistent if the bottom equation was exactly three times the
top equation. For this to happen, we need D2 = 3 D1 ; i.e., we need −48 = 3 × 14
which is not true. So these equations are inconsistent.
2.6.2 Homework
2x +5y = 1
8 x + 20 y = 4
60 x + 80 y = 120
6 x + 8 y = 13
−2 x + 7 y = 10
20 x − 70 y = 4
x+y=1
2x +2 y = 3
−11 x − 3 y = −2
33 x + 9 y = 6
If the system we want to solve has zero data, then we must solve a system of equa-
tions like
a x +b y = 0
c x + d y = 0.
36 2 Linear Algebra
Define the vectors V and W as usual. Note D is now the zero vector
a b 0
V = , W= , and D =
c d 0
are collinear and so there is a non zero constant r so that a = r c and b = r d. This
gives the system
r (c x + b y) = 0
c x + d y = 0.
Now if you multiply the bottom equation by r and subtract from the top equation,
you get 0. This tells us the system is consistent. The original system of two equations
is thus only one equation. We can choose to use either the original top or bottom
equation to solve. Say we choose the original top equation. Then we need to find x
and y choices so that
a x +b y = 0
There are in finitely many solutions here! It is easiest to see how to solve this kind
of problem using some examples.
2.7 Specializing to Zero Data 37
−2 x + 7 y = 0
20 x − 70 y = 0
Solution First, note the determinant of the coefficient matrix of the system is zero.
Also, since the bottom equation is −10 times the top equation, we see the system is
also consistent. We solve using the top equation:
−2 x + 7 y = 0
Thus,
7y =2x
y = (2/7) x
will always work. There is a lot of ambiguity here as the multiplier x is completely
arbitrary. For example,if we let x = 7 c for an arbitrary c, then solving for y, we
find y = 2 c. We can then rewrite the solution vector as
2
c
7
in terms of the arbitrary multiplier c. It does not really matter what form we pick,
however we often try to pick a form which has integers as components.
4x +5y = 0
8 x + 10 y = 0
Solution First, note the determinant of the coefficient matrix of the system is zero.
Also, since the bottom equation is 2 times the top equation, we see the system is also
consistent. We solve using the bottom equation this time:
8 x + 10 y = 0
38 2 Linear Algebra
Thus,
10 y = −8 x
y = −(4/5) x
will always work. Again, there is a lot of ambiguity here as the multiplier x is
completely arbitrary. For example,if we let x = 10 d for an arbitrary d, then solving
for y, we find y = −8 d. We can then rewrite the solution vector as
10
d
−8
in terms of the arbitrary multiplier d. Again, it is important to note that it does not
really matter what form we pick, however we often try to pick a form which has
integers as components.
2.7.2 Homework
x +3y = 0
6 x + 18 y = 0
−3 x + 4 y = 0
9 x − 12 y = 0
2x +7y = 0
1 x + (3/2) y = 0
−10 x + 5 y = 0
20 x − 10 y = 0
2.7 Specializing to Zero Data 39
−12 x + 5 y = 0
4 x − (5/3) y = 0
BA= AB=I
which is the solution to our system! The matrix B is, of course, special and clearly
plays the role of an inverse for the matrix A. When such a matrix B exists, it is called
the inverse of A and is denoted by A−1 .
Definition 2.8.1 (The Inverse of the matrix A)
If there is a matrix B of the same size as the square matrix A, B is said to be inverse
of A is
BA= AB=I
find A−1 .
and hence,
x 8
= A−1
y 7
−12
−1 40 − 28
= = 22
22 −24 − 14 38
22
2.8.2 Homework
Let’s look at how we can use MatLab/Octave to solve the general linear system of
equations
Ax =b
⎡ ⎤ ⎡ ⎤ ⎡ ⎤
1 −2 3 x 9
⎣0 4 1⎦ ⎣ y ⎦ = ⎣8⎦
0 0 6 z 2
this is easily solve by starting at the last equation and working backwards. This is
called backsolving. Here, we have
2
z=
6
1 23
4y = 8 − z = 8 − =
3 3
23
y=
12
23 71
x = 9 + 2y − 3z = 9 + −1=
6 6
It is easy to write code to do this in MatLab as we do below.
2.9 Computational Linear Algebra 43
To use this function, we would enter the following commands at the Matlab prompt.
For now, we are assuming that you are running Matlab in a local directory which
contains your Matlab code LTriSol.m. So we fire up Matlab and enter these com-
mands:
Here is a simple function to solve a similar system where this time A is upper
triangular. The code is essentially the same although the solution process starts at
the top and sweeps down.
44 2 Linear Algebra
As usual, to use this function, we would enter the following commands at the Matlab
prompt. We are still assuming that you are running Matlab in a local directory and
that your Matlab code UTriSol.m is also in this directory.
So we enter these commands in Matlab.
Start in the row 1 and column 1 position in A. The entry there is the pivoting element.
Divide the entries below it by the 8 and store them in the rest of column 1. This gives
the new matrix A∗
⎡ ⎤
8 23
A ∗ = ⎣− 8 3 2 ⎦
4
7
8
89
If we took the original row 1 and multiplied it by the − 48 and subtracted it from row
2, we would have the new second row
0 4 3.5
If we took the 78 , multiplied the original row 1 by it and subtracted it from the original
row 3, we would have
50 51
0 8 8
With these operations done, we have the matrix A∗ taking the form
⎡ ⎤
8 2 3
A∗ = ⎣− 8 4 3.5⎦
4
7 25 51
8 4 8
The multipliers in the lower part of column 1 are important to what we are doing, so
we are saving them in the parts of column 1 we have made zero. In MatLab, what
we have just done could be written like this
The code above does what we just did by hand. Now do the same thing again, but
start in the column 2 and row 2 position in the new matrix A∗ . The new pivoting
element is 4, so below it in column 2, we divide the rest of the elements of column
2 by 4 and store the results. This gives
⎡ ⎤
8 2 3
A = ⎣− 8 4 3.5⎦
∗ 4
7 25 51
8 16 8
7 25 29
8 16 32
7 25 29
8 16 32
Let this final matrix be called B. We can extract the lower triangular part of B using
the MatLab command tril(B,-1) and the lower triangular matrix L formed from
A is then made by adding a main diagonal of 1’s to this. The upper triangular part of
A is then U which we find by using triu(B). In code this is
7 25 29
8 16
1 0 0 32
[ n , n ] = s i z e (A) ;
f o r k =1: n−1
13 % find multiplier
A( k +1 : n , k ) = A( k +1: n , k ) /A( k , k ) ;
% z e r o o u t column
A( k +1: n , k +1: n ) = A( k +1: n , k +1: n ) − A( k +1 : n , k ) ∗A( k , k +1: n ) ;
end
18 L = e y e ( n , n ) + t r i l (A, −1) ;
U = t r i u (A) ;
end
35 1.6471
4.7859
−0.4170
0.9249
x = UTriSol (U, y )
40 x =
0.0103
0.0103
0.3436
45 0.0103
0.0103
c = A∗x
c =
1.0000
50 3.0000
5.0000
7.0000
9.0000
2 1 5 3 4
y = LTriSol (L, b ( piv ) ) ;
y
37 y =
3.0000
−1.2174
8.5011
0.3962
42 −0.2279
x = UTriSol (U, y ) ;
x
x =
0.0103
47 0.0103
0.3436
0.0103
0.0103
c = A∗x
52 c =
1.0000
3.0000
5.0000
7.0000
57 9.0000
A v = r v? (2.5)
There are many ways to interpret what such a number and vector pair means, but for
the moment, we will concentrate on finding such a pair (r, v). Now, if this was true,
we could rewrite the equation as
r v− Av = 0 (2.6)
and it acts like multiplying by one with numbers; i.e. I v = v for any vector v. Thus,
instead of saying r v, we could say r I v. We can therefore write Eq. 2.6 as
r I v− Av = 0 (2.7)
We know that we can factor the vector v out of the left hand side and rewrite again
as Eq. 2.8.
r I−A v=0 (2.8)
Now recall that we want the vector v to be non zero. Note, in solving this system,
there are two possibilities:
2.10 Eigenvalues and Eigenvectors 51
(i): the determinant of B is non zero which implies the only solution is v = 0.
(ii): the determinant of B is zero which implies the there are infinitely many solutions
for v all of the form a constant c times some non zero vector E.
Here the matrix B = r I − A. Hence, if we want a non zero solution v, we must
look for the values of r that force det (r I − A) = 0. Thus, we want
0 = det (r I − A)
r − a −b
= det
−c r − d
= (r − a) (r − d) − b c
= r 2 − (a + d) r + ad − bc.
det (r I − A) = 0.
Av =r v
for the eigenvalue r is then called an eigenvector associated with the eigenvalue r
for the matrix A.
Comment 2.10.1 Since this is a quadratic equation, there are always two roots
which take the forms below:
(i): the roots r1 and r2 are real and distinct,
(ii): the roots are repeated r1 = r2 = c for some real number c,
(iii): the roots are complex conjugate pairs; i.e. there are real numbers α and β so
that r1 = α + β i and r2 = α − β i.
52 2 Linear Algebra
or
r + 3 −4
0 = det
1r −2
= (r + 3)(r − 2) + 4
= r2 + r − 2
= (r + 2)(r − 1)
This gives
1 −4
1 −4
The two rows of this matrix should be multiples of one another. If not, we made
a mistake and we have to go back and find it. Our rows are indeed multiples, so
pick one row to solve for the eigenvector. We need to solve
1 −4 v1 0
=
1 −4 v2 0
v1 − 4 v2 = 0
1
v2 = v1
4
2.10 Eigenvalues and Eigenvectors 53
The vector
1
1/4
This gives
4 −4
1 −1
Again, the two rows of this matrix should be multiples of one another. If not, we
made a mistake and we have to go back and find it. Our rows are indeed multiples,
so pick one row to solve for the eigenvector. We need to solve
4 −4 v1 0
=
1 −1 v2 0
v1 − v2 = 0
v2 = v1
The vector
1
1
or
r − 4 −9
0 = det
1r +6
= (r − 4)(r + 6) + 9
= r 2 + 2 r − 15
= (r + 5)(r − 3)
This gives
−9 −9
1 1
The two rows of this matrix should be multiples of one another. If not, we made
a mistake and we have to go back and find it. Our rows are indeed multiples, so
pick one row to solve for the eigenvector. We need to solve
−9 −9 v1 0
=
1 1 v2 0
v1 + v2 = 0
v2 = − v1
2.10 Eigenvalues and Eigenvectors 55
The vector
1
−1
This gives
−1 −9
1 9
Again, the two rows of this matrix should be multiples of one another. If not, we
made a mistake and we have to go back and find it. Our rows are indeed multiples,
so pick one row to solve for the eigenvector. We need to solve
−1 −9 v1 0
=
1 9 v2 0
v1 + 9 v2 = 0
−1
v2 = v1
9
Letting v1 = B, we find the solutions have the form
v1 1
=B −1
v2 9
The vector
1
−1
9
2.10.1 Homework
det (r I − A) = 0.
Av =r v
for the eigenvalue r is then called an eigenvector associated with the eigenvalue r
for the matrix A.
Comment 2.10.2 Since this is a polynomial equation, there are always n roots some
of which are real numbers which are distinct, some might be repeated and some might
be complex conjugate pairs (and they can be repeated also!). An example will help.
Suppose we started with a 5 × 5 matrix. Then, the roots could be
1. All the roots are real and distinct; for example, 1, 2, 3, 4 and 5.
2. Two roots are the same and three roots are distinct; for examples, 1, 1, 3, 4 and
5.
3. Three roots are the same and two roots are distinct; for examples, 1, 1, 1, 4 and
5.
4. Four roots are the same and one roots is distinct from that; for examples, 1, 1,
1, 1 and 5.
5. Five roots are the same; for examples, 1, 1, 1, 1 and 1.
6. Two pairs of roots are the same and one roots is different from them; for examples,
1, 1, 3, 3 and 5.
7. One triple root and one pair of real roots; for examples, 1, 1, 1, 3 and 3.
8. One triple root and one complex conjugate pair of roots; for examples, 1, 1, 1,
3 + 4i and 3 − 4i.
9. One double root and one complex conjugate pair of roots and one different real
root; for examples, 1, 1, 2, 3 + 4i and 3 − 4i.
10. Two complex conjugate pair of roots and one different real root; for examples,
−2, 1 + 6i, 1 − 6i, 3 + 4i and 3 − 4i.
We will now discuss certain ways to compute eigenvalues and eigenvectors for a
square matrix in MatLab. For a given A, we can compute its eigenvalues as follows:
58 2 Linear Algebra
So we have found the eigenvalues of this small 3 × 3 matrix. To get the eigenvectors,
we do this:
Note the eigenvalues are not returned in ranked order. The eigenvalue/eigenvector
pairs are thus
λ1 = −0.3954
⎡ ⎤
0.7530
V1 = ⎣ −0.6525 ⎦
0.0847
λ2 = 11.8161
⎡ ⎤
−0.3054
V2 = ⎣ −0.7238 ⎦
−0.6187
λ3 = −6.4206
⎡ ⎤
−0.2580
V3 = ⎣ −0.3770 ⎦
0.8896
It is possible to show that the eigenvalues of a symmetric matrix will be real and
eigenvectors corresponding to distinct eigenvalues will be 90◦ apart. Such vectors
are called orthogonal and recall this means their inner product is 0. Let’s check it
out. The eigenvectors of our matrix are the columns of W above. So their dot product
should be 0!
Well, the dot product is not actually 0 because we are dealing with floating point
numbers here, but as you can see it is close to machine zero (the smallest number
our computer chip can detect). Welcome to the world of computing!
Reference
J. Peterson, Calculus for Cognitive Scientists: Derivatives, Integration and Modeling, Springer
Series on Cognitive Science and Technology (Springer Science + Business Media Singapore Pte
Ltd., Singapore, 2015 in press)
Chapter 3
Numerical Methods Order One ODEs
Now that you are taking this course on More Calculus for Cognitive Scientists,
we note that in the previous course, you were introduced to continuity, derivatives,
integrals and models using derivatives. You were also taught about functions of two
variables and partial derivatives along with more interesting models. You were also
introduced to how to solve models using Euler’s method. We use these ideas a lot,
so there is a much value in reviewing this material. So let’s dive into it again. When
we try to solve systems like
dy
= f (t, y) (3.1)
dt
y(t0 ) = y0 (3.2)
where f is continuous in the variables t and y, and y0 is some value the solution is
to have at the time point t0 , we will quickly find that it is very hard in general to do
this by hand. So it is time to begin looking at how the MatLab environment can help
us. We will use MatLab to solve these differential equations with what are called
numerical methods. First, let’s discuss how to approximate functions in general.
f (x) = f ( p) + E 0 (x, p)
where E 0 (x, p) is the error. On the other hand, we could try to find the best straight
line that does the job. We would find
where E 1 (x, p) is the error now. This straight line is the first order Taylor polyno-
mial but we know it also as the tangent line. We can continue finding polynomials
of higher and higher degree and their respective errors. In this class, our interests
stop with the quadratic case. We would find
where E 2 (x, p) is the error. This is called the second order Taylor polynomial or
quadratic approximation. Now let’s dig into the theory behind this so that we can
better understand the error terms.
Let’s consider a function which is defined locally at the point p. This means there
is at least an interval (a, b) containing p where f is defined. Of course, this interval
could be the whole x axis!. Let’s also assume f exists locally at p in this same
interval. Now pick any x is the interval [ p, b) (we can also pick a point in the left
hand interval (a, p] but we will leave that discussion to you!). From Calculus I,
recall Rolle’s Theorem and the Mean Value Theorem. These are usually discussed in
Calculus I, but we really prove them carefully in course called Mathematical Analysis
(but that is another story).
and
f (b) − f (a)
= f (c).
b−a
3.1 Taylor Polynomials 63
Our function f on the interval [ p, x] satisfies all the requirements of the Mean Value
Theorem. So we know there is a point cx with p < cx < x so that
f (x) − f ( p)
= f (cx ).
x−p
P0 ( p, x) = f ( p).
We’ll call this the 0th order Taylor Polynomial for f at the point p. Next, let the 0th
order error term be defined by
The error or remainder term is clearly the difference or discrepancy between the
actual function value at x and the 0th order Taylor Polynomial. Since f (x) − f ( p) =
f (cx )(x − p), we can write all we have above as
We can interpret what we have done by saying f ( p) is the best choice of 0th order
polynomial or constant to approximate f (x) near p. Of course, for most functions,
this is a horrible approximation! So the next step is to find the best straight line that
approximates f near p. Let’s try our usual tangent line to f at p. We summarize this
result as a theorem.
3.1.2.1 Examples
Example 3.1.1 If f (t) = t 3 , by the theorem above, we know on the interval [1, 3]
that at 1 f (t) = f (1) + f (c)(t − 1) where c is some point between 1 and t. Thus,
t 3 = 1 + (3c2 )(t − 1) for some 1 < c < t. So here the zeroth order Taylor
Polynomial is P0 (t, 1) = 1 and the error is E 0 (t, 1) = (3c2 )(t − 1).
Example 3.1.2 If f (t) = e−1.2t , by the theorem above, we know that at 0 f (t) =
f (0) + f (c)(t − 0) where c is some point between 0 and t. Thus, e−1.2t = 1 +
(−1.2)e−1.2c (t −0) for some 0 < c < t or e−1.2t = 1−1.2e−1.2c t. So here the zeroth
order Taylor Polynomial is P0 (t, 1) = 1 and the error is E 0 (t, 0) = −1.2e−1.2c t.
T (x) = f ( p) + f ( p) (x − p)
and the term E 1 (x, p) represents the error between the true function value f (x) and
the tangent line value T (x). That is
Another way to look at this is that the tangent line is the best straight line or linear
approximation to f at the point p. We all know from our first calculus course how
these pictures look. If the function f is curved near p, then the tangent line is not
a very good approximation to f at p unless x is very close to p. Now, let’s assume
f is actually two times differentiable on the local interval (a, b) also. Define the
constant M by
f (x) − f ( p) − f ( p)(x − p)
M= .
(x − p)2
3.1 Taylor Polynomials 65
In this discussion, this really is a constant value because we have fixed our value of
x and p already. We can rewrite this equation as
Then,
Then,
and
g( p) = f ( p) − f ( p) − f ( p) ( p − p) − M( p − p)2 = 0.
We thus know g(x) = g( p) = 0. Also, from the Mean Value Theorem, there is a
point cx0 between p and x so that
g(x) − g( p)
= g (cx0 ).
p−x
Since the numerator is g(x) − g( p), we now know g (cx0 ) = 0. But we also have
g ( p) = f ( p) − f ( p) − 2M( p − p) = 0.
Next, we can apply Rolle’s Theorem to the function g . This tells us there is a point
cx1 between p and cx0 so that g (cx1 ) = 0. Thus,
1 1
M= f (cx ).
2
66 3 Numerical Methods Order One ODEs
f (x) − f ( p) − f ( p)(x − p) 1
= f (cx1 ), some cx1 with p < cx1 < cx0 .
(x − p)2 2
P1 (x, p) = f ( p) + f ( p)(x − p)
E 1 (x, p) = f (x) − P1 (x, p) = f (x) − f ( p) − f ( p)(x − p)
1
= f (cx1 )(x − p)2 .
2
Thus, we have shown, E 1 (x, p) satisfies
(x − p)2
E 1 (x, p) = f (cx1 ) (3.4)
2
where cx1 is some point between x and p. Note the usual Tangent line is the same as
the first order Taylor Polynomial, P1 ( f, p, x) and we have a nice representation of
our error. We can state this as our next theorem:
3.1.3.1 Example
Example 3.1.4 For f (t) = e−1.2t on the interval [0, 5] find the tangent line approx-
imation, the error and maximum the error can be on the interval.
3.1 Taylor Polynomials 67
where c is some point between 0 and t. Hence, c is between 0 and 5 also. The first
order Taylor Polynomial is P1 (t, 0) = 1 − 1.2t which is also the tangent line to
e−1.2t at 0. The error is (1/2)(−1.2)2 e−1.2c t 2 .
Now let AE(t) denote absolute value of the actual error at t and ME be maximum
absolute error on the interval. The largest the error can be on [0, 5] is when f (c)
is the biggest it can be on the interval. Here,
Problem Two: Let’s find the tangent line approximations for a simple exponential
decay function again but let’s do it a bit more generally.
Example 3.1.5 If f (t) = e−βt , for β = 1.2 × 10−5 , find the tangent line approxi-
mation, the error and the maximum error on [0, 5].
Solution At any t
1
f (t) = f (0) + f (0)(t − 0) + f (c)(t − 0)2
2
1
= 1 + (−β)(t − 0) + (−β)2 e−βc (t − 0)2
2
1
= 1 − βt + β 2 e−βc t 2 .
2
where c is some point between 0 and t which means c is between 0 and 5. The first
order Taylor Polynomial is P1 (t, 0) = 1 − βt which is also the tangent line to e−βt
at 0. The error is 21 β 2 e−βc t 2 . The largest the error can be on [0, 5] is when f (c) is
the biggest it can be on the interval. Here,
−5
AE(t) = |(1/2)(1.2 × 10−5 )2 e−1.2×10 c t 2 | ≤ (1/2)(1.2 × 10−5 )2 (1)(5)2
= (1/2)1.44 × 10−10 (25) = ME
We could also ask what quadratic function Q fits f best near p. Let the quadratic
function Q be defined by
68 3 Numerical Methods Order One ODEs
(x − p)2
Q(x) = f ( p) + f ( p) (x − p) + f ( p) . (3.5)
2
The new error is called E Q (x, p) and is given by
If f is three times differentiable, we can argue like we did in the tangent line approx-
imation (using the Mean Value Theorem and Rolle’s theorem on an appropriately
defined function g) to show there is a new point cx2 between p and cx1 with
(x − p)3
E Q (x, p) = f (cx2 ) (3.6)
6
So if f looks like a quadratic locally near p, then Q and f match nicely and the
error is pretty small. On the other hand, if f is not quadratic at all near p, the error
will be large. We then define the second order Taylor polynomial, P2 ( f, p, x) and
second order error, E 2 (x, p) = E Q (x, p) by
1
P2 (x, p) = f ( p) + f ( p)(x − p) + f ( p)(x − p)2
2
1
E 2 (x, p) = f (x) − P2 (x, p) = f (x) − f ( p) − f ( p)(x − p) − f ( p)(x − p)2
2
1 2
= f (cx ) (x − p)3 .
6
Theorem 3.1.5 (Second Order Taylor Polynomial)
Let f : [a, b] → be continuous on [a, b] and be at least three times differentiable
on (a, b). Given p in [a, b], for each x, there is at least one point c, between p and x, so
that f (x) = f ( p) + f ( p)(x − p) + (1/2) f ( p)(x − p)2 + (1/6) f (c)(x − p)3 .
The quadratic f ( p) + f ( p)(x − p) + (1/2) f ( p)(x − p)2 is called the second
order Taylor Polynomial for f at p and we denote it by P2 (x, p). The point p
is again called the base point. Note we are approximating f (x) by the quadratic
f ( p) + f ( p)(x − p) + (1/2) f ( p)(x − p)2 and the error we make is E 2 (x, p) =
(1/6) f (c)(x − p).
3.1.4.1 Examples
Example 3.1.6 If f (t) = e−βt , for β = 1.2 × 10−5 , find the second order approxi-
mation, the error and the maximum error on [0, 5].
3.1 Taylor Polynomials 69
Solution For each t in the interval [0, 5], then there is some 0 < c << t < 5 so that
We can find higher order Taylor polynomials and remainders using these arguments
as long as f has higher order derivatives. But, for our purposes, we can stop here.
Let’s try to approximate the solution to the model x = f (x) with x(0) = x0 . Note
the dynamics does not depend on time t. The solution x(t) can be written
where ch is between 0 and h. We can rewrite this more. Note x = f (x) tells
us we can replace x (0) by f (x(0)) = f (x0 ). Also, since x = f (x), the chain
rule tells us x = f (x) x = (d f /d x) f where we let f (x) = (d f /d x)(x). So
x (ch ) = f (x(ch )) f (x(ch )) and we have
Let x1 be the true solution x(h) and let x̂0 be the starting or zeroth Euler approximate
which is defined by x̂0 = x0 . Hence, we make no error at first. Further, let the first
Euler approximate x̂1 be defined by x̂1 = x0 + f (x0 ) h = x̂0 + f (x̂0 ) h. Which is
the tangent line approximation to x at the point t = 0! Then we have
We are almost there! Next, we can apply the Mean Value Theorem to the difference
f (x1 ) − f (x̂1 ) and find f (x1 ) − f (x̂1 ) = f (xd )(x1 − x̂1 ) with xd between x1 and
x̂1 . Plugging this in, we have
Thus
eut = 1 + ut + u 2 euc t 2 /2
for some c between 0 and u. But the error term is positive so we know eut ≥ 1 + ut.
Letting u = Ch and using t = 1, we have
72 3 Numerical Methods Order One ODEs
1 + (1 + Ch) ≤ 1 + eCh .
and so
E 2 ≤ (C h 2 /2) (1 + eCh )
Now let’s do the approximation for x(3h). We will let x3 = x(3h) and we will
define the third Euler approximate by x̂3 = x̂2 + f (x̂2 ) h. The tangent line approxi-
mation to x at 2h gives
But we can apply the Mean Value Theorem to the difference f (x2 ) − f (x̂2 ). We
find f (x2 ) − f (x̂2 ) = f (xu )(x2 − x̂2 ) with xu between x2 and x̂2 . Plugging this
in, we find
Thus
We have already shown that 1 + u ≤ eu for any u. It follows (1 + u)2 ≤ (eu )2 = e2u .
So we have
E 3 ≤ C 1 + (1 + Ch) + (1 + Ch)2 h 2 /2
≤ C 1 + eCh + e2Ch h 2 /2.
e3Ch − 1 2
E3 ≤ C h /2.
eCh − 1
e N Ch − 1 2
EN ≤ C h /2.
eCh − 1
eC T − 1 2
EN ≤ C h /2.
eCh − 1
The total or global error thus has the form of a constant times h and hence, the
solution on the entire interval is of order h. Of course, the local error at each step is
on the order of h 2 .
If the dynamics function depends on time, we will still be able to look carefully at the
Euler approximates. Let’s look carefully at the errors we make when we use Euler’s
method in this case. We start with a preliminary result called a Lemma.
Proof We know if the base point is 0, then since e^x is twice differentiable, there is
a number ξ between 0 and x so that e^x = 1 + x + e^ξ x^2/2. However, the last term is
always nonnegative and so dropping that term we obtain e^x ≥ 1 + x, from which the
final result follows.
We need another basic idea about functions: the Lipschitz condition. Here is the
definition: we say f satisfies a Lipschitz condition on [a, b] with Lipschitz constant K if

| f(x) − f(y)| ≤ K |x − y|

for all x and y in [a, b]. Many functions satisfy a Lipschitz condition. For example, if f has a derivative that
is bounded by the constant K on [a, b], this means | f'(x)| ≤ K for all x in [a, b].
We can then apply the Mean Value Theorem to f on any interval [x, y] in [a, b] to
see there is a number ξ between x and y so that f(x) − f(y) = f'(ξ)(x − y), and hence
| f(x) − f(y)| ≤ K |x − y|.
on a rectangle D which is the set of (t, x) with t ∈ [a, b] and x ∈ [c, d] with the point
(t0 , x0 ) in D for some finite intervals [a, b] and [c, d]. The theory of the solutions to
our models allows us to make sure the value of d is large enough to hold the image
of the solution x; i.e., x(t) ∈ [c, d] for all t ∈ [t0 , b]. Also, for convenience, we
are assuming x is a scalar variable although the arguments we present can easily be
modified to handle x being a vector. Using the Taylor Remainder theorem, this gives
for a given time point t:
x(t) = x(t_0) + (dx/dt)(t_0)(t − t_0) + (1/2)(d^2x/dt^2)(t_0)(t − t_0)^2 + (1/6)(d^3x/dt^3)(ξ)(t − t_0)^3
where ξ is some point in the interval [t_0, t]. We also know by the chain rule that, since
x' is f,

d^2x/dt^2 = ∂f/∂t + (∂f/∂x)(dx/dt)
          = ∂f/∂t + (∂f/∂x) f
which implies, switching to a standard subscript notation for partial derivatives (yes,
these calculations are indeed yucky!)

d^3x/dt^3 = ( f_tt + f_tx f ) + ( f_xt + f_xx f ) f + f_x ( f_t + f_x f ).
The important thing to note is that this third order derivative is made up of algebraic
combinations of f and its various partial derivatives. We typically assume that all of
these functions are continuous and bounded. Thus, letting ||g||_∞ represent the maximum
value of the continuous function g(s, u) on the rectangle [t_0, b] × [c, d], we know there
is a constant C so that

|x'''(ξ)| ≤ (|| f_tt ||_∞ + || f_tx ||_∞ || f ||_∞) + (|| f_xt ||_∞ + || f_xx ||_∞ || f ||_∞) || f ||_∞
          + || f_x ||_∞ (|| f_t ||_∞ + || f_x ||_∞ || f ||_∞)
          = C.
That is, ||x'''||_∞ ≤ C on [t_0, b]. Of course, if f has sufficiently smooth higher order
partial derivatives, we can find bounds on higher derivatives of x as well. Now, using
the standard abbreviations f_0 = f(t_0, x_0), f_t^0 = (∂f/∂t)(t_0, x_0) and f_x^0 = (∂f/∂x)(t_0, x_0),
with similar notations for the second order partials, we see our solution can be written
as
x(t) = x_0 + f_0 (t − t_0) + (1/2)( f_t^0 + f_x^0 f_0 )(t − t_0)^2 + (1/6)(d^3x/dt^3)(ξ)(t − t_0)^3
     = x_0 + f_0 (t − t_0) + (1/2)( f_t^0 + f_x^0 f_0 )(t − t_0)^2
       + (1/6)[ ( f_tt + f_tx f ) + ( f_xt + f_xx f ) f + f_x ( f_t + f_x f ) ]|_ξ (t − t_0)^3.
We can now state a result which tells us how much error we make with Euler's
method. From the remarks above, the assumption that ||x''||_∞ is bounded is not an
unreasonable one for many models we wish to solve. Our discussions above allow us to
be fairly quantitative about how much local error and global error we make using
Euler's approximations. We state this as a theorem.
where x_n is the value of the true solution x(t_n) and e_0 = x_0 − x̂_0. If we also know
e_0 = 0 (the usual state of affairs), then, letting the constant B be defined by

B = ((e^{(b−a)K} − 1)/(2K)) ||x''||_∞,

we can say |x(b) − x̂_{N(h)}| ≤ B h where N(h) is the index at which t_{N(h)} = b.
Proof From our remarks earlier, our assumptions on f and its first order partials
tell us f satisfies a Lipschitz condition with Lipschitz constant K in D and that the
solution x has a bounded second derivative on the interval [t_0, b]. Let e_n = x_n − x̂_n
and τ_n = (h/2) x''(ξ_n). Then, the usual Taylor series expansion gives

x(t_{n+1}) = x(t_n) + h f(t_n, x(t_n)) + (h^2/2) x''(ξ_n)
          = x(t_n) + h f(t_n, x(t_n)) + h τ_n.
Subtracting the Euler update x̂_{n+1} = x̂_n + h f(t_n, x̂_n) from this, we find

x_{n+1} − x̂_{n+1} = x_n − x̂_n + h ( f(t_n, x(t_n)) − f(t_n, x̂_n) ) + h τ_n.

Thus,

e_{n+1} = e_n + h ( f(t_n, x(t_n)) − f(t_n, x̂_n) ) + h τ_n,

where

|τ_n| = (h/2) |x''(ξ_n)| ≤ (h/2) ||x''||_∞.
This is a recursion relation. We can easily see what is happening by working out a
few terms.
Using the Lipschitz condition, |e_{n+1}| ≤ (1 + hK)|e_n| + h|τ(h)|, where |τ(h)| denotes
the largest of the |τ_n|. Iterating this inequality, we find

|e_n| ≤ |e_0|(1 + hK)^n + h|τ(h)| Σ_{i=0}^{n−1} (1 + hK)^i.

The geometric sum satisfies

1 + r + r^2 + r^3 + · · · + r^{n−1} = (1 − r^n)/(1 − r),

so with r = 1 + hK,

|e_n| ≤ |e_0|(1 + hK)^n + |τ(h)| ((1 + hK)^n − 1)/K.
Then, using Lemma 3.3.1 in the form 1 + hK ≤ e^{hK}, we have

|e_n| ≤ |e_0| e^{nhK} + h|τ(h)| ((1 + hK)^n − 1)/(hK).

But since t_n = t_0 + nh, we know nh ≤ b − t_0, leading to

|e_n| ≤ |e_0| e^{(b−t_0)K} + |τ(h)| ((1 + hK)^n − 1)/K.

Applying the Lemma again and using nh ≤ b − t_0 ≤ b − a, we obtain

|e_n| ≤ |e_0| e^{(b−t_0)K} + |τ(h)| (e^{(b−a)K} − 1)/K.
Now if e_0 = 0 (as it normally would be), then since |τ(h)| ≤ (h/2) ||x''||_∞ we have

|e_n| ≤ ((e^{(b−a)K} − 1)/(2K)) ||x''||_∞ h

and letting

B = ((e^{(b−a)K} − 1)/(2K)) ||x''||_∞,

we have |e_{N(h)}| ≤ B h as required.
Comment 3.3.1 Note the local error we make at each step is proportional to h 2 but
the global error after we reach t = b is proportional to h. Hence, Euler’s method is
an order 1 method.
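To see this order one behavior concretely, here is a small sketch (ours, not part of the text's code base) that applies Euler's method to x' = −2x with x(0) = 3 on [0, 1] for a sequence of halved step sizes and prints the global error at t = 1; you should see the error drop by roughly a factor of two each time h is halved.

% Sketch: the global error of Euler's method is proportional to h.
% Model: x' = -2x, x(0) = 3, true solution x(t) = 3 exp(-2t).
f     = @(t,x) -2*x;
xtrue = @(t) 3*exp(-2*t);
T = 1.0;
for h = [0.1, 0.05, 0.025, 0.0125]
  N = round(T/h);
  x = 3;                       % x(0)
  t = 0;
  for n = 1:N
    x = x + h*f(t,x);          % Euler update
    t = t + h;
  end
  fprintf('h = %7.4f   error at t = 1 is %10.6f\n', h, abs(x - xtrue(T)));
end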
3.4 Euler’s Algorithm 79
3.4.1 Examples
Example 3.4.1 Find the first three Euler approximates for x' = −2x, x(0) = 3 using
h = 0.3. Find the true solution values and errors also.
Solution Here f(x) = −2x and the true solution is x(t) = 3 e^{−2t}.
Step 0: x̂0 = x0 = 3 so E 0 = 0.
Step 1:
Step 2:
Step 3:
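Since this arithmetic is easy to mangle by hand, here is a small sketch (ours) that carries out the three Euler updates of Example 3.4.1 and compares them to the true solution; the numbers it prints are the x̂_n, x(t_n) and error values asked for above.

% Sketch: first three Euler approximates for x' = -2x, x(0) = 3, h = 0.3.
f     = @(x) -2*x;
xtrue = @(t) 3*exp(-2*t);
h    = 0.3;
xhat = 3;                                  % xhat_0 = x_0, so the error at step 0 is 0
for n = 1:3
  xhat = xhat + h*f(xhat);                 % xhat_n = xhat_{n-1} + f(xhat_{n-1}) h
  tn   = n*h;
  fprintf('Step %d: xhat = %8.5f, true x(%.1f) = %8.5f, error = %8.5f\n', ...
          n, xhat, tn, xtrue(tn), abs(xtrue(tn) - xhat));
end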
Example 3.4.2 Find the first three Euler approximates for x' = 2x, x(0) = 4 using
h = 0.2. Find the true solution values and errors also.
Step 2:
Step 3:
These methods are based on more sophisticated ways of approximating the solution
y. These methods use multiple function evaluations at different time points around a
given t* to approximate y(t*). In more advanced classes, we can show this technique
generates a sequence {y_n} starting at y_0 using the following recursion equation:
y_{n+1} = y_n + h × F^o(t_n, y_n, h, f),   y_0 given,

where h is the step size we use for our underlying partition of the time space, giving

t_i = t_0 + i × h
for appropriate indices, and F^o is a fairly complicated function of the previous approximate
solution, the step size and the right hand side function f. The Runge–Kutta
methods are available for various choices of the superscript o, which is called the
order of the method. We will not discuss much about F^o in this course, as it is best
served up in a more advanced class. What we can say is this: for order o, the local
error is like h^{o+1}. So
Order One: Local error is h^2 and this method is the same as the Euler Method.
The global error then goes down linearly with h.
Order Two: Local error is h^3 and this method is better than the Euler Method. If
the global error for a given stepsize h is E, then halving the stepsize to h/2 gives
a new global error of E/4. Thus, the global error goes down quadratically. This
means halving the stepsize has a dramatic effect on the global error.
Order Three: Local error is h^4 and this method is better than the Euler Method. If
the global error for a given stepsize h is E, then halving the stepsize to h/2 gives a
new global error of E/8. Thus, the global error goes down as a cubic power. This
means halving the stepsize has an even more dramatic effect on the global error.
Order Four: Local error is h^5 and this method is better than the order three
method. If the global error for a given stepsize h is E, then halving the stepsize
to h/2 gives a new global error of E/16. Thus, the global error goes down as
a fourth power! This means halving the stepsize has a huge effect on the global error.
We will now look at MatLab code that allows us to solve our differential equation
problems using the Runge–Kutta method instead of the Euler method of Sect. 3.3.
The basic code to implement the Runge–Kutta methods is broken into two pieces.
The first one, RKstep.m, implements the evaluation of the next approximate solution
at the point (t_n, y_n) given the old approximation at (t_{n−1}, y_{n−1}). We then loop
through all the steps to get to the chosen final time using the code in FixedRK.m.
The details of these algorithms are beyond the scope of this text and so we will not go
into them here. In this code, we are allowing for the dynamic functions to depend on
time also. Previously, we have used dynamics like f = @(x) 3*x and we expect
the dynamics functions to have that form in DoEuler. However, we want to have
more complicated dynamics now—at least the possibility of it!—so we will adapt
what we have done before. We will now define our dynamics as if they depend on
time. So from now on, we would write f=@(t,x) 3*x even though there is no
time dependence. We then rewrite our DoEuler to DoEulerTwo so that we can
use these more general dynamics. This code is in DoEulerTwo.m and we have
discussed it in the first text on starting your calculus journey. You can review this
function in that text. The Runge–Kutta code uses the new dynamics functions. We
have gone over this code in the previous text, but we will show it to you again for
completeness. In this code, you see the lines like feval(fname,t,x) which
means take the function fname passed in as an argument and evaluate it at the pair
(t,x). Hence, fname(t,x) is the same as f(t,x).
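We do not reproduce the actual RKstep.m here. As a rough guide only, the sketch below (with the hypothetical name RK4step) shows what a single step of the classic order four Runge–Kutta method looks like; the real RKstep.m handles orders one through four and has its own argument list.

function [tnew,ynew,fnew] = RK4step(fname,tc,yc,fc,h)
% Sketch of one classic fourth order Runge-Kutta step.
% fname : dynamics function handle, called as fname(t,y)
% tc,yc : current time and current approximate solution
% fc    : fname(tc,yc), passed in so it is not recomputed
% h     : step size
  k1 = fc;
  k2 = feval(fname, tc + h/2, yc + (h/2)*k1);
  k3 = feval(fname, tc + h/2, yc + (h/2)*k2);
  k4 = feval(fname, tc + h,   yc + h*k3);
  ynew = yc + (h/6)*(k1 + 2*k2 + 2*k3 + k4);
  tnew = tc + h;
  fnew = feval(fname, tnew, ynew);
end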
The code above does all the work. It manages all of the multiple tangent line
calculations that Runge–Kutta needs at each step. We loop through all the steps to
get to the chosen final time using the code in FixedRK.m which is shown below.
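Again as a rough guide only, a fixed step size driver has the shape sketched below (hypothetical name FixedRK4, built on the RK4step sketch above); the text's FixedRK.m plays this role for all four orders.

function [tvals,yvals] = FixedRK4(fname,t0,y0,h,N)
% Sketch of a fixed step size driver: take N RK4 steps starting at (t0,y0).
  tvals = zeros(1,N+1);  yvals = zeros(1,N+1);
  tvals(1) = t0;         yvals(1) = y0;
  fc = feval(fname,t0,y0);
  for n = 1:N
    [tvals(n+1), yvals(n+1), fc] = RK4step(fname, tvals(n), yvals(n), fc, h);
  end
end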
Here is an example where we solve a specific model using all four Runge–Kutta
choices and plot them all together. Note when we use RKstep, we only return the
first two outputs; that is, for our returned variables, we write [htime1,xhat1] =
FixedRK(f,0,20,0.06,1,N1); instead of returning the full list of outputs
which includes function evaluations [htime1,xhat1,fhat1] = FixedRK
(f,0,20,0.06,1,N1);. We can do this as it is all right to not return the third
output. However, the outputs are still returned in the order stated when the
function is defined. For example, if we used the command [htime1,fhat1] =
FixedRK(f,0,20,0.06,1,N1);, this would return the approximate solution values
and place them in the variable fhat1, which is not what we would want to do.
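A session along these lines might look like the sketch below. We are assuming, based on the calls shown above, that the arguments of FixedRK are the dynamics function, the initial time, the initial value, the step size, the order and the number of steps; the dynamics function used here is just a made-up example.

% Sketch of a comparison session; the dynamics below is a made-up logistic model.
f  = @(t,x) 0.1*x.*(1 - x/50);
N1 = 300;
[htime1,xhat1] = FixedRK(f,0,20,0.06,1,N1);   % order 1 (Euler)
[htime2,xhat2] = FixedRK(f,0,20,0.06,2,N1);   % order 2
[htime3,xhat3] = FixedRK(f,0,20,0.06,3,N1);   % order 3
[htime4,xhat4] = FixedRK(f,0,20,0.06,4,N1);   % order 4
plot(htime1,xhat1,htime2,xhat2,htime3,xhat3,htime4,xhat4);
xlabel('t'); ylabel('x(t)');
legend('order 1','order 2','order 3','order 4');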
Now let’s review functions of more than one variable since it may have been some
time since you looked at these ideas. This is gone over carefully in the first text on
starting calculus ideas, but it is a good idea to talk about them again.
Let’s start by looking at the x−y plane as a collection of two dimensional vectors.
Each vector is rooted at the origin and the head of the vector corresponds to our
usual coordinate pair (x, y). The set of all such x and y determines the x−y plane
which we will also call ℝ^2. The superscript two is used because we are now explicitly
acknowledging that we can think of these ordered pairs as vectors also with just a
slight identification on our part. Since we know about vectors, note if we have a
vector we can rewrite it, using our standard rules for vector arithmetic and scaling
of vectors as
[6, 7]^T = 6 [1, 0]^T + 7 [0, 1]^T
A little thought will let you see we can do this for any vector and so we define special
vectors i = e1 and j = e2 as follows:
i = e_1 = [1, 0]^T   and   j = e_2 = [0, 1]^T
[x, y]^T = x e_1 + y e_2 = x i + y j
Now let’s start looking at functions that map each ordered pair (x, y) into a number.
Let’s begin with an example. Consider the function f (x, y) = x 2 + y 2 defined for
all x and y. Hence, for each x and y we pick, we calculate a number we can denote by
z whose value is f (x, y) = x 2 + y 2 . Using the same ideas we just used for the x−y
plane, we see the set of all such triples (x, y, z) = (x, y, x 2 + y 2 ) defines a surface
in ℝ^3 which is the collection of all ordered triples (x, y, z). Each of these triples can
be identified with a three dimensional vector whose tail is the origin and whose head
is the triple (x, y, z). We note any three dimensional vector can be written as
[x, y, z]^T = x e_1 + y e_2 + z e_3 = x i + y j + z k

where the standard basis vectors are
i = e_1 = [1, 0, 0]^T,   j = e_2 = [0, 1, 0]^T   and   k = e_3 = [0, 0, 1]^T.
We can plot this surface in MatLab with fairly simple code. As discussed in the first
volume, the utility function DrawSimpleSurface manages how to draw
such a surface using boolean variables like DoGrid to turn a given piece of the graph
on or off. The surface is drawn using a grid, a mesh, traces, patches and columns and
a base, all of which contribute to a somewhat cluttered figure. Hence, the boolean
variables allow us to select how much clutter we want to see! So if the boolean variable
DoGrid is set to one, the grid is drawn. The code is self-explanatory so we just lay it out here.
We haven’t shown all the code for the individual drawing functions, but we think
you’ll find it interesting to see how we manage the pieces in this one piece of code.
So check this out.
Hence, to draw everything for this surface, we would use the session:
This surface has circular cross sections for different positive values of z and it is
called a circular paraboloid. If you used f(x, y) = 4x^2 + 3y^2, the cross sections
for positive z would be ellipses and we would call the surface an elliptical paraboloid.
Now this code is not perfect. However, as an exploratory tool it is not bad! Now it is
time for you to play with it a bit in the exercises below.
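If you do not have DrawSimpleSurface handy, a bare-bones sketch using only the built-in meshgrid and surf commands produces a perfectly usable picture of this paraboloid:

% Sketch: plot the circular paraboloid z = x^2 + y^2 with built-in commands.
[X,Y] = meshgrid(linspace(-2,2,41), linspace(-2,2,41));
Z = X.^2 + Y.^2;
surf(X,Y,Z);
xlabel('x'); ylabel('y'); zlabel('z = x^2 + y^2');
rotate3d on;    % spin the surface around with the mouse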
4.1.1 Homework
Exercise 4.1.1 Explore the surface graph of the circular paraboloid f(x, y) = x^2 +
y^2 for different values of (x_0, y_0), Δx and Δy. Experiment with the 3D rotated
view to make sure you see everything of interest.
Exercise 4.1.2 Explore the surface graph of the elliptical paraboloid f(x, y) =
2x^2 + y^2 for different values of (x_0, y_0), Δx and Δy. Experiment with the 3D
rotated view to make sure you see everything of interest.
Exercise 4.1.3 Explore the surface graph of the elliptical paraboloid f(x, y) =
2x^2 + 3y^2 for different values of (x_0, y_0), Δx and Δy. Experiment with the 3D
rotated view to make sure you see everything of interest.
4.2 Continuity
Let’s recall the ideas of continuity for a function of one variable. Consider these three
versions of a function f defined on [0, 2].
f(x) = { x^2,             if 0 ≤ x < 1
       { 10,              if x = 1
       { 1 + (x − 1)^2,   if 1 < x ≤ 2.
The first version is not continuous at x = 1 because although the lim x→1 f (x) exists
and equals 1 (lim x→1− f (x) = 1 and lim x→1+ f (x) = 1), the value of f (1) is 10
which does not match the limit. Hence, we know f here has a removable discontinuity
at x = 1. Note continuity failed because the limit existed but the value of the function
did not match it. The second version of f is given below.
f(x) = { x^2,         if 0 ≤ x ≤ 1
       { (x − 1)^2,   if 1 < x ≤ 2.
In this case, lim_{x→1^−} f(x) = 1 and f(1) = 1, so f is continuous from the left. However,
lim_{x→1^+} f(x) = 0, which does not match f(1), and so f is not continuous from the
right. Also, since the right and left hand limits do not match at x = 1, we know
lim_{x→1} f(x) does not exist. Here, the function fails to be continuous because the limit
does not exist. The final example is below:
f(x) = { x^2,              if 0 ≤ x ≤ 1
       { x + (x − 1)^2,    if 1 < x ≤ 2.
Here, the limit and the function value at 1 both match and so f is continuous at
x = 1. To extend these ideas to two dimensions, the first thing we need to do is
to look at the meaning of the limiting process. What does lim(x,y)→(x0 ,y0 ) mean?
Clearly, in one dimension we can approach a point x_0 from x in essentially two ways: from
the left or from the right (or by jumping back and forth between left and right). Now, it is apparent
that we can approach a given point (x0 , y0 ) in an infinite number of ways. Draw
a point on a piece of paper and convince yourself that there are many ways you
can draw a curve from another point (x, y) so that the curve ends up at (x0 , y0 )!
We still want to define continuity in the same way; i.e. f is continuous at the point
(x0 , y0 ) if lim(x,y)→(x0 ,y0 ) f (x, y) = f (x0 , y0 ). If you look at the graphs of the surface
z = x 2 + y 2 we have done previously, we clearly see that we have this kind of
behavior. There are no jumps, tears or gaps in the surface we have drawn. Let’s make
this formal.
Here is an example of a function which is not continuous at the point (0, 0). Let
If we show the limit as we approach (0, 0) does not exist, then we will know f is not
continuous at (0, 0). If this limit exists, we should get the same value for the limit
no matter what path we take to reach (0, 0). Let the first path be given by x(t) = t
and y(t) = 2t. Then, as t → 0, (x(t), y(t)) → (0, 0) as desired. Plugging in to f,
we find for t ≠ 0, f(t, 2t) = 2t/√(t^2 + 4t^2) = 2/√5 and hence the limit along this
path is this constant value 2/√5. On the other hand, along the path x(t) = t and
y(t) = −3t, for t ≠ 0, we have f(t, −3t) = 2/3 which is not the same. Since the
limiting value differs on two paths, the limit can't exist. Hence, f is not continuous
at (0, 0).
Let’s go back to our simple surface example and look at the traces again. In Fig. 4.1, we
show the traces for the base point x0 = 0.5 and y0 = 0.5. We have also drawn vertical
lines down from the traces to the x−y plane to further emphasize the placement of
Fig. 4.1 The traces f (x0 , y) and f (x, y0 ) for the surface z = x 2 + y 2 for x0 = 0.5 and y0 = 0.5
the traces on the surface. The surface itself is not shown as it is somewhat distracting
and makes the illustration too busy.
You can generate this type of graph yourself with the function DrawFullTraces
as follows:
Note, that each trace has a well-defined tangent line and derivative at the points x0
and y0 . We have
(d/dx) f(x, y_0) = (d/dx)(x^2 + y_0^2) = 2x
as the value y_0 in this expression is a constant and hence its derivative with respect to
x is zero. We denote this new derivative as ∂f/∂x, which we read as the partial derivative
of f with respect to x. Its value at the point (x_0, y_0) is 2x_0 here. For any value of
(x, y), we would have ∂f/∂x = 2x. We also have
(d/dy) f(x_0, y) = (d/dy)(x_0^2 + y^2) = 2y.
We then denote this new derivative as ∂f/∂y, which we read as the partial derivative of
f with respect to y. Its value at the point (x_0, y_0) is then 2y_0 here. For any value of
(x, y), we would have ∂f/∂y = 2y.
The tangent lines for these two traces are then
T(x, y_0) = f(x_0, y_0) + (d/dx) f(x, y_0)|_{x_0} (x − x_0) = (x_0^2 + y_0^2) + 2x_0 (x − x_0)
T(x_0, y) = f(x_0, y_0) + (d/dy) f(x_0, y)|_{y_0} (y − y_0).
We can also write these tangent line equations like this using our new notation for
partial derivatives.
T(x, y_0) = f(x_0, y_0) + (∂f/∂x)(x_0, y_0)(x − x_0) = (x_0^2 + y_0^2) + 2x_0 (x − x_0)
T(x_0, y) = f(x_0, y_0) + (∂f/∂y)(x_0, y_0)(y − y_0) = (x_0^2 + y_0^2) + 2y_0 (y − y_0).
We can draw these tangent lines in 3D. To draw T (x, y0 ), we fix the y value to be y0
and then we draw the usual tangent line in the x−z plane. This is a copy of the x−z
plane translated over to the value y0 ; i.e. it is parallel to the x−z plane we see at the
value y = 0. We can do the same thing for the tangent line T(x_0, y); we fix the x
value to be x_0 and then draw the tangent line in the copy of the y−z plane translated
to the value x_0. We show this in Fig. 4.3. Note the T(x, y_0) and the T(x_0, y) lines
are determined by vectors as shown below.
A = [1, 0, (d/dx) f(x, y_0)|_{x_0}]^T = [1, 0, 2x_0]^T   and   B = [0, 1, (d/dy) f(x_0, y)|_{y_0}]^T = [0, 1, 2y_0]^T.
Note that if we connect the lines determined by the vectors A and B, we determine
a flat sheet which you can interpret as a piece of paper laid on top of these two lines.
Fig. 4.2 The traces f (x0 , y) and f (x, y0 ) for the surface z = x 2 + y 2 for x0 = 0.5 and y0 = 0.5
with added tangent lines. We have added the tangent plane determined by the tangent lines
Of course, we can only envision a small finite subset of this sheet of paper as you
can see in Fig. 4.2. Imagine that the sheet extends infinitely in all directions! The
sheet of paper we are plotting is called the tangent plane to our surface at the point
(x0 , y0 ). We will talk about this more formally later.
To draw this picture with the tangent lines, the traces and the tangent plane, we use
DrawTangentLines which has arguments (f,fx,fy,delx,nx,dely,ny,
r,x0,y0). There are three new arguments: fx which is ∂ f /∂x, fy which is ∂ f /∂ y
and r which is the size of the tangent plane that is plotted. For the picture shown in
Fig. 4.3, we’ve removed the tangent plane because the plot was getting pretty busy.
We did this by commenting out the line that plots the tangent plane. It is easy for
you to go into the code and add it back in if you want to play around. The MatLab
command line is
If you want to see the tangent plane as well as the tangent lines, all you have to
do is look at the following lines in DrawTangentLines.m.
Fig. 4.3 The traces f (x0 , y) and f (x, y0 ) for the surface z = x 2 + y 2 for x0 = 0.5 and y0 = 0.5
with added tangent lines
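We do not list all of DrawTangentLines.m, but the tangent plane portion looks roughly like the sketch below; the names U, V, W and the surf(U,V,W,'EdgeColor','blue') call are the ones referred to in the next paragraph, while the surrounding lines are our guess at reasonable code.

% Sketch of the tangent plane lines in a DrawTangentLines style function.
f  = @(x,y) x.^2 + y.^2;   fx = @(x,y) 2*x;   fy = @(x,y) 2*y;
x0 = 0.5;  y0 = 0.5;  r = 0.4;            % base point and size of the plotted plane
[U,V] = meshgrid(linspace(x0-r, x0+r, 21), linspace(y0-r, y0+r, 21));
W = f(x0,y0) + fx(x0,y0)*(U - x0) + fy(x0,y0)*(V - y0);   % tangent plane heights
surf(U,V,W,'EdgeColor','blue');           % put a % in front of this line to turn the plane off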
These lines set up the tangent plane, and the tangent plane is turned off if there is
a % in front of surf(U,V,W,'EdgeColor','blue');. We edited the file to
take the % out so we can see the tangent plane. We then see the plane in Fig. 4.4 as
we saw before.
The ideas we have been discussing can be made more general. When we take the
derivative with respect to one variable while holding the other variable constant (as
we do when we find the normal derivative along a trace), we say we are taking a
partial derivative of f. Here there are two flavors: the partial derivative with respect
to x and the partial derivative with respect to y. We can now state some formal
definitions and introduce the notations and symbols we use for these things. We
define the process of partial differentiation carefully below.
Fig. 4.4 The traces f (x0 , y) and f (x, y0 ) for the surface z = x 2 + y 2 for x0 = 0.5 and y0 = 0.5
with added tangent lines
lim_{x→x_0, y=y_0} ( f(x, y) − f(x_0, y_0) ) / (x − x_0)

lim_{x=x_0, y→y_0} ( f(x, y) − f(x_0, y_0) ) / (y − y_0)
If these limits exist, they are called the partial derivatives of f with respect to x and
y at (x_0, y_0), respectively.
Comment 4.3.2 We often use another notation for partial derivatives. The function
f of two variables x and y can be thought of as having two arguments or slots into
which we place values. So another useful notation is to let the symbol D1 f be f x and
D2 f be f y . We will be using this notation later when we talk about the chain rule.
Comment 4.3.3 It is easy to take partial derivatives. Just imagine the one variable
held constant and take the derivative of the resulting function just like you did in
your earlier calculus courses.
Example 4.3.1 Let z = f(x, y) = x^2 + 4y^2 be a function of two variables. Find ∂z/∂x
and ∂z/∂y.
Solution Thinking of y as a constant, we take the derivative in the usual way with
respect to x. This gives
∂z/∂x = 2x.

Thinking of x as a constant instead, we find

∂z/∂y = 8y.
∂z/∂x = 8x y^3,   ∂z/∂y = 12x^2 y^2.
∂f/∂x = 4x^3/(y^3 + 2),   ∂f/∂y = −(x^4 + 1)(3y^2)/(y^3 + 2)^2.
Example 4.3.4 f(x, y) = (x^4 y^2 + 2)/(y^3 x^5 + 20).
Solution
Solution
∂f/∂x = e^{−(x^2 + y^4)} (−2x),   ∂f/∂y = e^{−(x^2 + y^4)} (−4y^3).
∂f/∂y = (1/2) · 4y/(x^2 + 2y^2).
4.3.1 Homework
These are for you: for each of these functions, find f x and f y .
First, functions with no cross terms.
Exercise 4.3.1 f (x, y) = x 2 + 3y 2 .
Exercise 4.3.2 f (x, y) = 4x 2 + 5y 4 .
Exercise 4.3.3 f (x, y) = −3x + 2y 8 .
Exercise 4.3.9 f(x, y) = (x^2 + 2y)/(5x + y^3).

Exercise 4.3.11 f(x, y) = (x^2 + 2)/(5 + y).
Before we discuss tangent planes to a function f again, let’s digress to the ideas of
planes in general in 3D. We define a plane as follows.
Definition 4.4.1 (Planes)
A plane in 3D through the point (x_0, y_0, z_0) is defined as the set of all points (x, y, z)
so that the vector D is perpendicular to the vector N, where D is the vector we get
by connecting the point (x_0, y_0, z_0) to the point (x, y, z). Hence, for

D = [x − x_0, y − y_0, z − z_0]^T   and   N = [N_1, N_2, N_3]^T,

the plane is the set of points (x, y, z) so that < D, N > = 0. The vector N is called
the normal vector to the plane.
Comment 4.4.1 A little thought shows that any plane crossing through the origin
is a two dimensional subspace of ℝ^3.
Example 4.4.1 The equation 2x + 3y − 5z = 0 defines the plane whose normal vec-
tor is N = [2, 3, −5]T which passes through the origin (0, 0, 0).
Example 4.4.2 The equation 2(x − 2) + 3(y − 1) − 5(z + 3) = 0 defines the plane
whose normal vector is N = [2, 3, −5]T which passes through the point (2, 1, −3).
Note this can be rewritten as 2x + 3y − 5z = 4 + 3 + 15 = 22 after a simple manip-
ulation.
Example 4.4.3 The equation 2x + 3y − 5z = 11 corresponds to a plane with normal
vector N = [2, 3, −5]T which passes through some point (x0 , y0 , z 0 ). There are an
infinite number of choices for this base point: any triple which solves 2x0 + 3y0 −
5z 0 = 11 will do the job. An easy way to pick one is to pick two and solve for the
third. So for example, if z 0 = 0 and y0 = 4, we find 2x0 + 12 = 11 which gives
x0 = −1/2. Thus, this plane could be rewritten as 2(x + 1/2) + 3(y − 4) − 5z = 0.
There is another very useful way to define a plane which we did not discuss in the first
volume. As long as the vectors A and B point in different directions, they determine
a new vector A × B which is perpendicular to both of them and can serve as the
normal to a plane. Note, saying the vectors A and B point in different directions is
the same as saying they are linearly independent.
Definition 4.4.2 (Planes Again)
The plane in 3D determined by the vectors A and B containing the point (x0 , y0 , z 0 )
is defined as the plane whose normal vector is N = A × B.
To find a formula for such a normal, suppose the vector C = [C_1, C_2, C_3]^T is perpendicular to both A and B. Then

A_1 C_1 + A_2 C_2 + A_3 C_3 = 0
B_1 C_1 + B_2 C_2 + B_3 C_3 = 0.

Solving each equation for C_1, we find

C_1 = −(A_2/A_1) C_2 − (A_3/A_1) C_3 = −(B_2/B_1) C_2 − (B_3/B_1) C_3.

Setting these two expressions for C_1 equal and solving for C_2 in terms of C_3, we find

C_2 = ( (A_3/A_1) − (B_3/B_1) ) / ( (B_2/B_1) − (A_2/A_1) ) C_3
    = (B_1 A_3 − A_1 B_3)/(A_1 B_2 − B_1 A_2) C_3.

Substituting this back into the expression for C_1 gives

C_1 = −(A_2/A_1) (B_1 A_3 − A_1 B_3)/(A_1 B_2 − B_1 A_2) C_3 − (A_3/A_1) C_3
    = −( A_2 (B_1 A_3 − A_1 B_3) + A_3 (A_1 B_2 − B_1 A_2) ) / ( A_1 (A_1 B_2 − B_1 A_2) ) C_3
    = −( A_2 B_1 A_3 − A_1 A_2 B_3 + A_3 A_1 B_2 − A_3 B_1 A_2 ) / ( A_1 (A_1 B_2 − B_1 A_2) ) C_3
    = −( −A_1 A_2 B_3 + A_3 A_1 B_2 ) / ( A_1 (A_1 B_2 − B_1 A_2) ) C_3
    = (A_2 B_3 − A_3 B_2)/(A_1 B_2 − B_1 A_2) C_3.
To summarize,

C_1 = (A_2 B_3 − A_3 B_2)/(A_1 B_2 − B_1 A_2) C_3
C_2 = (B_1 A_3 − A_1 B_3)/(A_1 B_2 − B_1 A_2) C_3.

Choosing C_3 = A_1 B_2 − B_1 A_2, we find that the vector we are looking for has the
components

C_1 = A_2 B_3 − A_3 B_2
C_2 = B_1 A_3 − A_1 B_3
C_3 = A_1 B_2 − B_1 A_2
which can be written compactly using 2 × 2 determinants:

C_1 = det [ A_2  A_3 ; B_2  B_3 ],   C_2 = −det [ A_1  A_3 ; B_1  B_3 ],   C_3 = det [ A_1  A_2 ; B_1  B_2 ],

where [ a  b ; c  d ] denotes the 2 × 2 matrix with rows (a, b) and (c, d).
Then using the standard basis vectors for ℝ^3, i, j and k, we see the vector C = A × B
can be written as

A × B = i det [ A_2  A_3 ; B_2  B_3 ] − j det [ A_1  A_3 ; B_1  B_3 ] + k det [ A_1  A_2 ; B_1  B_2 ].
It is convenient to organize this calculation with the 3 × 3 symbolic matrix C whose first row is (i, j, k), second row is (A_1, A_2, A_3) and third row is (B_1, B_2, B_3), and we define the determinant of this matrix to coincide with the definition of A × B.
Comment 4.4.2 This is easy to remember. Start with the i in row one. Cross out the
first row and first column of C. The first term in the cross product is then i times the
determinant of the 2 × 2 submatrix that is left over. This is the matrix

[ A_2  A_3 ; B_2  B_3 ].

The second term in row one is j. Associate this term with a minus sign, and since it is
in row one, column two, cross that row and column out of C to obtain the submatrix

[ A_1  A_3 ; B_1  B_3 ].

The last term is associated with the row one, column three entry k. Cross out that
row and column in C to obtain the submatrix

[ A_1  A_2 ; B_1  B_2 ].

The cross product is then the sum of these three terms, with the signs +, −, + as above. This is also called expanding
a 3 × 3 determinant by the first row, but that is another story.
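A quick numerical check of all this is easy in MatLab/Octave with the built-in cross and dot commands; the sketch below builds A × B for two example vectors and verifies the result is perpendicular to both.

% Sketch: cross product of two example vectors and a perpendicularity check.
A = [1; 2; 3];
B = [4; 0; -1];
C = cross(A,B);               % C = A x B = [-2; 13; -8]
disp(C');
disp([dot(A,C), dot(B,C)]);   % both inner products should be zero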
4.4.1.1 Homework
For each of these problems, graph the two vectors as well as the cross product.
Recall the tangent plane to a surface z = f (x, y) at the point (x0 , y0 ) was the plane
determined by the tangent lines T (x, y0 ) and T (x0 , y). The T (x, y0 ) line was deter-
mined by the vector
A = [1, 0, (d/dx) f(x, y_0)|_{x_0}]^T = [1, 0, 2x_0]^T

and the T(x_0, y) line was determined by the vector

B = [0, 1, (d/dy) f(x_0, y)|_{y_0}]^T = [0, 1, 2y_0]^T.

In general, for a surface z = f(x, y), these direction vectors are

A = [1, 0, (∂f/∂x)(x_0, y_0)]^T   and   B = [0, 1, (∂f/∂y)(x_0, y_0)]^T.
The normal to the tangent plane is therefore

A × B = det [ i  j  k ; 1  0  f_x(x_0, y_0) ; 0  1  f_y(x_0, y_0) ]

or

A × B = i det [ 0  f_x(x_0, y_0) ; 1  f_y(x_0, y_0) ] − j det [ 1  f_x(x_0, y_0) ; 0  f_y(x_0, y_0) ] + k det [ 1  0 ; 0  1 ]
      = −f_x(x_0, y_0) i − f_y(x_0, y_0) j + k
      = [ −f_x(x_0, y_0), −f_y(x_0, y_0), 1 ]^T.
The tangent plane to the surface z = f(x, y) at the point (x_0, y_0) is then given by

−f_x(x_0, y_0)(x − x_0) − f_y(x_0, y_0)(y − y_0) + (z − f(x_0, y_0)) = 0,

or, solving for z,

z = f(x_0, y_0) + f_x(x_0, y_0)(x − x_0) + f_y(x_0, y_0)(y − y_0).
We can use another compact definition at this point. We can define the gradient of
the function f to be the vector ∇ f . The gradient is defined as follows.
∇f(x_0, y_0) = [ f_x(x_0, y_0), f_y(x_0, y_0) ]^T.
Note the gradient takes a scalar function argument and returns a vector answer.
Solution
∇f(x, y) = [ 2x + 4y, 4x + 18y ]^T.
Note this can also be written as 10x + 40y + z = 55 which is also a standard form.
However, in this form, the attachment point (1, 2, 45) is hidden from view.
4.4.3 Homework
Exercise 4.4.7 Find the gradient of f (x, y) = sin(x y) and the equation of the tan-
gent plane to this surface at the point (π/4, −π/4).
We can use MatLab/Octave to draw tangent planes and tangent lines to a surface.
Consider the function DrawTangentPlanePackage. The source code is similar
to what we have done in previous functions. This time, we send in the function f and
the two partial derivatives fx and fy. First, we plot the traces and draw vertical lines
from the traces to the x−y plane. Note this code will not do very well on surfaces
where the z values become negative! But then, this code is just for exploration and
it is easy enough to alter it for other jobs. And it is a good exercise! After the traces
and their shadow lines are drawn, we draw the tangent lines. Finally, we draw the
tangent plane. The tangent plane calculation uses the partial derivatives we sent into
this function as arguments.
The illustrations this code produces have already been used in Fig. 4.2. Practice with
this code and draw other pictures! A typical session to generate this figure would
look like
4.4.5 Homework
Exercise 4.4.8 Draw tangent lines and planes for the surface f (x, y) = x 2 + 3y 2
for various points (x0 , y0 ).
Exercise 4.4.9 Draw tangent lines and planes for the surface f (x, y) = −x 2 − 3y 2
for various points (x0 , y0 ). You will need to modify the code to make this work!
Exercise 4.4.10 Draw tangent lines and planes for the surface f (x, y) = x 2 − 3y 2
for various points (x0 , y0 ). You will need to modify the code to make this work! Make
sure you try the point (0, 0).
Let’s look at the partial derivatives of f (x, y). As long as f (x, y) is defined locally
at (x0 , y0 ), we can say f x (x0 , y0 ) and f y (x0 , y0 ) exist if and only if there are error
functions E 1 (x, y, x0 , y0 ) and E 2 (x, y, x0 , y0 ) so that
From Definition 4.5.1, we can show if f is differentiable at the point (x0 , y0 ), then
L 1 = f x (x0 , y0 ) and L 2 = f y (x0 , y0 ). The argument goes like this: since f is differ-
entiable at (x0 , y0 ), we can say
lim_{(x,y)→(x_0,y_0)} ( f(x, y) − f(x_0, y_0) − L_1 (x − x_0) − L_2 (y − y_0) ) / √( (x − x_0)^2 + (y − y_0)^2 ) = 0.
Thus, the right hand partial derivative f_x(x_0, y_0)^+ exists and equals L_1. On the other
hand, if Δx < 0, then √( (Δx)^2 ) = −Δx and we find, with a little manipulation, that
we still have
So the left hand partial derivative f x (x0 , y0 )− exists and equals L 1 also. Combining,
we see f x (x0 , y0 ) = L 1 . A similar argument shows that f y (x0 , y0 ) = L 2 . Hence, we
can say if f is differentiable at (x0 , y0 ) then f x and f y exist at this point and we have
Now that we know a bit about two dimensional derivatives, let’s go for gold and
figure out the new version of the chain rule. The argument we make here is very
similar in spirit to the one dimensional one. You should go back and check it out! We
will do this argument carefully but without tedious rigor. At least that is our hope.
You’ll have to let us know how we did!
We assume there are two functions u(x, y) and v(x, y) defined locally about
(x0 , y0 ) and that there is a third function f (u, v) which is defined locally around
(u 0 = u(x0 , y0 ), v0 = v(x0 , y0 )). Now assume f (u, v) is differentiable at (u 0 , v0 )
and u(x, y) and v(x, y) are differentiable at (x0 , y0 ). Then we can say
where all the error terms behave as usual as (x, y) → (x0 , y0 ) and (u, v) →
(u 0 , v0 ). Note that as (x, y) → (x0 , y0 ), u(x, y) → u 0 = u(x0 , y0 ) and v(x, y) →
v0 = v(x0 , y0 ) as u and v are continuous at the (u 0 , v0 ) since they are differentiable
there. Let’s consider the partial of f with respect to x. Let u = u(x0 + x, y0 ) −
u(x0 , y0 ) and v = v(x0 + x, y0 ) − v(x0 , y0 ). Thus, u 0 + u = u(x0 + x, y0 )
and v0 + v = v(x0 + x, y0 ). Hence
( f(u_0 + Δu, v_0 + Δv) − f(u_0, v_0) ) / Δx
  = ( f_u(u_0, v_0) Δu + f_v(u_0, v_0) Δv + E_f(u, v, u_0, v_0) ) / Δx
  = f_u(u_0, v_0) (Δu/Δx) + f_v(u_0, v_0) (Δv/Δx) + E_f(u, v, u_0, v_0)/Δx
  = f_u(u_0, v_0) ( u_x(x_0, y_0) Δx + E_u(Δx, x_0, y_0) ) / Δx
    + f_v(u_0, v_0) ( v_x(x_0, y_0) Δx + E_v(Δx, x_0, y_0) ) / Δx + E_f(u, v, u_0, v_0)/Δx
  = f_u(u_0, v_0) u_x(x_0, y_0) + f_v(u_0, v_0) v_x(x_0, y_0)
    + f_u(u_0, v_0) E_u(Δx, x_0, y_0)/Δx + f_v(u_0, v_0) E_v(Δx, x_0, y_0)/Δx + E_f(u, v, u_0, v_0)/Δx.

Letting Δx → 0, the error terms vanish and we conclude
∂f/∂x = (∂f/∂u)(∂u/∂x) + (∂f/∂v)(∂v/∂x).

A similar argument shows

∂f/∂y = (∂f/∂u)(∂u/∂y) + (∂f/∂v)(∂v/∂y).

In summary, the chain rule for f(u(x, y), v(x, y)) is

∂f/∂x = (∂f/∂u)(∂u/∂x) + (∂f/∂v)(∂v/∂x)
∂f/∂y = (∂f/∂u)(∂u/∂y) + (∂f/∂v)(∂v/∂y).
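If you have the Symbolic Math Toolbox available, you can let MatLab check a chain rule computation for you. The sketch below uses a made-up example, u = x^2 + y^2, v = x y and f(u, v) = u v^2, and verifies that differentiating the composition directly agrees with the chain rule formula above.

% Sketch: verify the two variable chain rule symbolically (Symbolic Math Toolbox assumed).
syms x y U V
u = x^2 + y^2;          % u(x,y)
v = x*y;                % v(x,y)
F = U*V^2;              % f written in terms of its two slots U and V
lhs = diff(subs(F,[U V],[u v]), x);                 % differentiate the composition directly
rhs = subs(diff(F,U),[U V],[u v])*diff(u,x) + ...
      subs(diff(F,V),[U V],[u v])*diff(v,x);        % chain rule formula
simplify(lhs - rhs)     % should print 0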
4.6.1 Examples
∂f/∂r = (∂f/∂x)(∂x/∂r) + (∂f/∂y)(∂y/∂r)
∂f/∂θ = (∂f/∂x)(∂x/∂θ) + (∂f/∂y)(∂y/∂θ)
This becomes
∂f/∂r = (2x + 2) cos(θ) + 20y^3 sin(θ)
∂f/∂θ = (2x + 2)(−r sin(θ)) + 20y^3 r cos(θ)
You can then substitute in for x and y to get the final answer in terms of r and θ (kind
of ugly though!).
∂f/∂x = (∂f/∂u)(∂u/∂x) + (∂f/∂v)(∂v/∂x)
∂f/∂y = (∂f/∂u)(∂u/∂y) + (∂f/∂v)(∂v/∂y)
This becomes
∂f/∂x = 20uv^4 (2x) + 40u^2 v^3 (8x)
∂f/∂y = 20uv^4 (4y) + 40u^2 v^3 (−10y)
You can then substitute in for u and v to get the final answer in terms of x and y
(even more ugly though!).
Example 4.6.3 In the discussion of Hamilton’s Rule from the first course on calculus
for biologists, we discuss a fitness function w for a model of altruism which depends
on P which is the probability of giving aid and Q which is the probability of receiving
aid. The model is
w = w0 + b Q − c P
∂w/∂P = (∂w/∂P)(∂P/∂P) + (∂w/∂Q)(∂Q/∂P)
      = −c + b (∂Q/∂P)
Let ∂Q/∂P be denoted by r, the coefficient of relatedness. The parameter r is very
hard to understand even though it was introduced in 1964 to study altruism. Altruism
occurs if fitness increases, i.e. ∂w/∂P > 0. So altruism occurs if −c + br > 0 or rb > c.
This inequality is Hamilton's Rule, but what counts is that we understand what these
terms mean biologically.
4.6.1.1 Homework
We are now ready to give you a whirlwind tour of what you can call second order
ideas in calculus for two variables. Or as some would say, let’s drink from the fountain
of knowledge with a fire hose! Well, maybe not that intense....
We will use these ideas for some practical things. Recall from the first class on
calculus for biologists, we used these ideas to find the minimum and maximum
of functions of two variables and we applied those ideas to the problem of finding
the best straight line that fits a collection of data points. This regression line is of
great importance to you in your career as biologists! We also introduced the ideas of
average or mean, covariance and variance when we worked out how to find the
regression line. The slope of the regression line has many important applications and
we showed you some of them in our Hamilton's rule model.
Once we have the chain rule, we can quickly develop other results such as how
much error we make when we approximate our surface f (x, y) using a tangent plane
at a point (x0 , y0 ). To finish our arguments, we need an analog of the Mean Value
Theorem from Calculus. The first thing we need is to know when a function of two
variables is differentiable. Just because its partials exist at a point is not enough to
guarantee that! But we can prove that if the partials are continuous around that point,
then the derivative does exist. And that means we can write the function in terms of
its tangent plane plus an error. The arguments to do this are not terribly hard, so let’s
go through them. We will need a version of the Mean Value Theorem for functions
of two variables. Here it is:
Proof The argument that shows this is pretty straightforward. We apply the chain
rule using the simpler functions u(t) = x_0 + t(x − x_0) and v(t) = y_0 + t(y − y_0).
Then u and v are differentiable with u'(t) = x − x_0 and v'(t) = y − y_0. Hence, if
h(t) = f(u(t), v(t)), we have h'(t) = f_x(u(t), v(t))(x − x_0) + f_y(u(t), v(t))(y − y_0).
Now it is not true that just because a function f has partial derivatives at a point
(x0 , y0 ) that f is differentiable. There are many examples where partials can exist
at a point and the function itself does not satisfy the definition of differentiability.
However, if we know the partials are themselves continuous locally at the point
(x0 , y0 ) then it is true that f is differentiable there. Once we know f is differentiable
there we can apply chain rule type ideas. Let’s assume f is defined locally around
(x0 , y0 ) and consider the difference
E(Δx, Δy, x_0, y_0) = ( f_x(x_0 + t_2 Δx, y_0) − f_x(x_0, y_0) ) Δx
                    + ( f_y(x_0 + Δx, y_0 + t_1 Δy) − f_y(x_0, y_0) ) Δy.

We know as (Δx, Δy) → (0, 0), the numbers (t_1, t_2) we found using the Mean
Value Theorem will also go to (0, 0), and so (x_0 + t_2 Δx, y_0) → (x_0, y_0) and (x_0 +
Δx, y_0 + t_1 Δy) → (x_0, y_0). Then the continuity of f_x and f_y at (x_0, y_0) tells us

( f_x(x_0 + t_2 Δx, y_0) − f_x(x_0, y_0) ) Δx + ( f_y(x_0 + Δx, y_0 + t_1 Δy) − f_y(x_0, y_0) ) Δy → 0.

But the terms |Δx| / √( (Δx)^2 + (Δy)^2 ) ≤ 1 and |Δy| / √( (Δx)^2 + (Δy)^2 ) ≤ 1, and so as
(Δx, Δy) → (0, 0), we must have E(Δx, Δy, x_0, y_0) / √( (Δx)^2 + (Δy)^2 ) → 0 as well.
These two limits show that f is differentiable at (x0 , y0 ). We can state this as a
theorem. We use this idea a lot in two dimensional calculus.
Now let’s go back to the old idea of a tangent plane to a surface. For the surface
z = f (x, y) if its partials are continuous functions (they usually are for our work!)
then f is differentiable and hence we know that
We can characterize the error much better if we have access to what are called the
second order partial derivatives of f. Roughly speaking, we take the partials of f_x
and f_y to obtain the second order terms. We can make this discussion brief. Assuming
f is defined locally as usual near (x_0, y_0), we can ask about the partial derivatives
of the functions f_x and f_y with respect to x and y also. We define the second order
partials of f as follows.
lim_{x→x_0, y=y_0} ( f_x(x, y) − f_x(x_0, y_0) ) / (x − x_0) = ∂_x( f_x )

lim_{x=x_0, y→y_0} ( f_x(x, y) − f_x(x_0, y_0) ) / (y − y_0) = ∂_y( f_x )
lim_{x→x_0, y=y_0} ( f_y(x, y) − f_y(x_0, y_0) ) / (x − x_0) = ∂_x( f_y )

lim_{x=x_0, y→y_0} ( f_y(x, y) − f_y(x_0, y_0) ) / (y − y_0) = ∂_y( f_y )
Comment 4.8.1 When these second order partials exist at (x_0, y_0), we use the following
notations interchangeably: f_xx = ∂_x( f_x ), f_xy = ∂_y( f_x ), f_yx = ∂_x( f_y ) and
f_yy = ∂_y( f_y ).
The second order partials are often organized into a matrix called the Hessian.
H(x_0, y_0) = [ f_xx(x_0, y_0)  f_xy(x_0, y_0) ; f_yx(x_0, y_0)  f_yy(x_0, y_0) ].
Comment 4.8.2 It is possible to prove that if the second order partials are continuous
locally near (x_0, y_0), then the mixed partials f_xy and f_yx must match at the
point (x_0, y_0). Most of our surfaces have this property. Hence, for these smooth
surfaces, the Hessian is a symmetric matrix!
Example 4.8.1 Let f (x, y) = 2x − 8x y. Find the first and second order partials of
f and its Hessian.
f x (x, y) = 2 − 8y
f y (x, y) = −8x
f x x (x, y) = 0
f x y (x, y) = −8
f yx (x, y) = −8
f yy (x, y) = 0.
H(x, y) = [ f_xx(x, y)  f_xy(x, y) ; f_yx(x, y)  f_yy(x, y) ] = [ 0  −8 ; −8  0 ].
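If the Symbolic Math Toolbox is available, the gradient and hessian commands provide a quick check on hand computations like this one; the sketch below does Example 4.8.1.

% Sketch: first and second order partials of f(x,y) = 2x - 8xy (Symbolic Math Toolbox assumed).
syms x y
f = 2*x - 8*x*y;
gradf = gradient(f, [x y])    % should be [2 - 8y; -8x]
H     = hessian(f, [x y])     % should be [0 -8; -8 0]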
4.8.1 Homework
Exercise 4.8.1 Let f (x, y) = 5x − 2x y. Find the first and second order partials of
f and its Hessian.
Exercise 4.8.2 Let f (x, y) = −8y + 9x y − 2y 2 . Find the first and second order
partials of f and its Hessian.
Exercise 4.8.3 Let f (x, y) = 4x − 6x y − x 2 . Find the first and second order par-
tials of f and its Hessian.
Exercise 4.8.4 Let f (x, y) = 4x 2 − 6x y − x 2 . Find the first and second order par-
tials of f and its Hessian.
We can now explain the most common approximation result for tangent planes. Let
h(t) = f(x_0 + tΔx, y_0 + tΔy). Then the one variable Taylor expansion gives

h(t) = h(0) + h'(0) t + h''(c) t^2/2.
Using the chain rule, we find
and
h''(t) = ∂_x( f_x(x_0 + tΔx, y_0 + tΔy) Δx + f_y(x_0 + tΔx, y_0 + tΔy) Δy ) Δx
       + ∂_y( f_x(x_0 + tΔx, y_0 + tΔy) Δx + f_y(x_0 + tΔx, y_0 + tΔy) Δy ) Δy

which we can organize as

h''(t) = [Δx  Δy] [ f_xx  f_yx ; f_xy  f_yy ](x_0 + tΔx, y_0 + tΔy) [Δx ; Δy]
       = [Δx ; Δy]^T H(x_0 + tΔx, y_0 + tΔy) [Δx ; Δy].
h(1) = h(0) + h'(0)(1 − 0) + h''(c)/2
for some c between 0 and 1. Substituting for the h terms, we find
Clearly, we have shown how to express the error in terms of second order partials.
There is a point c between 0 and 1 so that
E(x_0, y_0, Δx, Δy) = (1/2) [Δx ; Δy]^T H(x_0 + cΔx, y_0 + cΔy) [Δx ; Δy].
To understand how to think about finding places where the minimum and maximum
of a function of two variables might occur, all you have to do is realize it is a common
sense thing. We already know that the tangent plane attached to the surface which
represents our function of two variables is a way to approximate the function near
the point of attachment. We have seen in our pictures what happens when the tangent
plane is flat. This flatness occurs at the minimum and maximum of the function. It
also occurs in other situations, but we will leave that more complicated event for other
courses. The functions we want to deal with are quite nice and have great minima
and maxima. However, we do want you to know there are more things in the world
and we will touch on them only briefly.
To see what to do, just recall the equation of the tangent plane error to our function
of two variables f (x, y).
where c is some number between 0 and 1 that is different for each (x, y). We also know
that the equation of the tangent plane to f(x, y) at the point (x_0, y_0) is
Now let’s assume the tangent plane is flat at (x0 , y0 ). Then the gradient ∇ f is the
zero vector and we have ∂∂xf (x0 , y0 ) = 0 and ∂∂ yf (x0 , y0 ) = 0. So the tangent plane
error equation simplifies to
Now let’s simplify this. The Hessian is just a 2 × 2 matrix whose components are
the second order partials of f . Let
A(c) = (∂^2 f/∂x^2)(x_0 + c(x − x_0), y_0 + c(y − y_0))
B(c) = (∂^2 f/∂x∂y)(x_0 + c(x − x_0), y_0 + c(y − y_0))
     = (∂^2 f/∂y∂x)(x_0 + c(x − x_0), y_0 + c(y − y_0))
D(c) = (∂^2 f/∂y^2)(x_0 + c(x − x_0), y_0 + c(y − y_0))
Then, we have
f(x, y) = f(x_0, y_0) + (1/2) [x − x_0, y − y_0] [ A(c)  B(c) ; B(c)  D(c) ] [x − x_0 ; y − y_0].
We can multiply this out (a nice simple pencil and paper exercise!) to find
f(x, y) = f(x_0, y_0) + (1/2)( A(c)(x − x_0)^2 + 2B(c)(x − x_0)(y − y_0) + D(c)(y − y_0)^2 ).

To deal with the cross term, we group terms to form a perfect square and combine what
is left over into one term. For example, consider u^2 + 3uv + 36v^2. Adding and subtracting
(3/2)^2 v^2, we have

u^2 + 3uv + 36v^2 = u^2 + 3uv + (3/2)^2 v^2 + ( 36 − (3/2)^2 ) v^2.

The first three terms are a perfect square, (u + (3/2)v)^2. Simplifying, we find

u^2 + 3uv + 36v^2 = ( u + (3/2)v )^2 + (135/4) v^2.
This is called completing the square! Now let's do this with the Hessian quadratic
we have. First, factor out the A(c). We will assume it is not zero so the divisions are
fine to do. Also, for convenience, we will replace x − x_0 by Δx and y − y_0 by Δy.
This gives
f(x, y) = f(x_0, y_0) + (A(c)/2) ( (Δx)^2 + 2 (B(c)/A(c)) Δx Δy + (D(c)/A(c)) (Δy)^2 ).
Adding and subtracting (B(c)/A(c))^2 (Δy)^2 inside the parentheses, this becomes

f(x, y) = f(x_0, y_0) + (A(c)/2) ( (Δx)^2 + 2 (B(c)/A(c)) Δx Δy + (B(c)/A(c))^2 (Δy)^2
          − (B(c)/A(c))^2 (Δy)^2 + (D(c)/A(c)) (Δy)^2 ).
Now group the first three terms together—the perfect square and combine the last
two terms into one. We have
f(x, y) = f(x_0, y_0) + (A(c)/2) ( ( Δx + (B(c)/A(c)) Δy )^2 + ( ( A(c) D(c) − (B(c))^2 ) / (A(c))^2 ) (Δy)^2 ).
Now we need a common sense result which says that if a function g is continuous
at a point (x_0, y_0) and positive (or negative) there, then it is positive (or negative) in a circle
of radius r centered at (x_0, y_0). Here is the formal statement.
Now getting back to our problem. We have at this point where the partials are zero,
the following expansion
f(x, y) = f(x_0, y_0) + (A(c)/2) ( Δx + (B(c)/A(c)) Δy )^2
         + (A(c)/2) ( ( A(c) D(c) − (B(c))^2 ) / (A(c))^2 ) (Δy)^2.
The algebraic sign of the terms after the function value f(x_0, y_0) is completely
determined by the coefficients A(c) and A(c) D(c) − (B(c))^2, since the remaining factors are squares. We have two simple cases:
• A(c) > 0 and A(c) D(c) − (B(c))2 > 0 which implies the term after f (x0 , y0 ) is
positive.
• A(c) < 0 and A(c) D(c) − (B(c))2 > 0 which implies the term after f (x0 , y0 ) is
negative.
Now let’s assume all the second order partials are continuous at (x0 , y0 ). We know
A(c) = ∂∂x 2f (x0 + c(x − x0 ), y0 + c(y − y0 )) and from Theorem 4.9.1, if ∂∂x 2f (x0 , y0 )
2 2
> 0, then so is A(c) in a circle around (x0 , y0 ). The other term A(c) D(c) −
(B(c))2 > 0 will also be positive is a circle around (x0 , y0 ) as long as ∂∂x 2f (x0 , y0 )
2
∂2 f ∂2 f
∂ y2
(x0 , y0 ) − (x , y0 ) > 0. We can say similar things about the negative case.
∂x∂ y 0
Now to save typing let ∂∂x 2f (x0 , y0 ) = f x0x , ∂∂ y 2f (x0 , y0 ) = f yy ∂2 f
2 2
0
and ∂x∂ (x , y0 ) = f x0y .
y 0
So we can restate our two cases as

• f_xx^0 > 0 and f_xx^0 f_yy^0 − ( f_xy^0 )^2 > 0, which corresponds to a minimum at (x_0, y_0);
• f_xx^0 < 0 and f_xx^0 f_yy^0 − ( f_xy^0 )^2 > 0, which corresponds to a maximum at (x_0, y_0);

where, for convenience, we use a superscript 0 to denote we are evaluating the partials
at (x_0, y_0). So we have come up with a great condition to verify if a place where
the partials are zero is a minimum or a maximum. If you think about it a bit, you'll
notice we left out the case where f_xx^0 f_yy^0 − ( f_xy^0 )^2 < 0, which is important but which we
will not pursue in depth in this class. That is for later courses to pick up; however, it is the
test for the analog of the behavior we see in the cubic y = x^3. The derivative is 0 but
there is neither a minimum nor a maximum at x = 0. In two dimensions, the situation is
more interesting of course. This kind of behavior is called a saddle. We have another
Theorem!
Now the second order test fails if det(H(x0 , y0 )) = 0 at the critical point as a
few examples show. First, the function f (x, y) = x 4 + y 4 has a global minimum at
(0, 0) but at that point
H(x, y) = [ 12x^2  0 ; 0  12y^2 ]   which means   det(H(x, y)) = 144 x^2 y^2,

and this is 0 at (0, 0), so the test gives no information there.
To understand the saddle case, go back to the expansion

f(x, y) = f(x_0, y_0) + (A(c)/2) ( ( Δx + (B(c)/A(c)) Δy )^2 + ( ( A(c) D(c) − (B(c))^2 ) / (A(c))^2 ) (Δy)^2 ).
Now suppose we knew A(c) D(c) − (B(c))^2 < 0 when c = 0; this is the same as saying
det(H(x_0, y_0)) < 0. Then, using the usual continuity argument, we know that there is a
circle around the critical point (x_0, y_0) on which A(c) D(c) − (B(c))^2 < 0. But notice that
on the line going through the critical point having Δy = 0, the expansion gives

f(x, y) = f(x_0, y_0) + (A(c)/2) (Δx)^2.
Now, if A(c) > 0, this gives f(x, y) = f(x_0, y_0) + a positive number,
showing f has a minimum on that trace. However, along the direction where Δx + (B(c)/A(c))Δy = 0,
only the (Δy)^2 term survives and its coefficient is negative, so f(x, y) =
f(x_0, y_0) − a positive number, which shows f has a maximum on that trace. The
fact that f is minimized in one direction and maximized in another direction gives
rise to the expression that we consider f to behave like a saddle at this critical point.
The analysis is virtually the same if A(c) < 0, except the first trace has the maximum
and the second trace has the minimum. Hence, the test for a saddle point is to see if
det(H(x0 , y0 )) < 0 as we stated in Theorem 4.9.2.
4.9.1 Examples
Example 4.9.1 Use our tests to show f (x, y) = x 2 + 3y 2 has a minimum at (0, 0).
Solution The partials here are f x = 2x and f y = 6y. These are zero at x = 0 and
y = 0. The Hessian at this critical point is
H(x, y) = [ 2  0 ; 0  6 ] = H(0, 0)
as H is constant here. Our second order test says the point (0, 0) corresponds
to a minimum because f x x (0, 0) = 2 > 0 and f x x (0, 0) f yy (0, 0) − ( f x y (0, 0))2 =
12 > 0.
Solution The partials here are f x = 2x + 6y and f y = 6x + 6y. These are zero at
when
2x + 6y = 0
6x + 6y = 0
as H is again constant here. Our second order test says the point (0, 0) corresponds to
a saddle because f x x (0, 0) = 2 > 0 and f x x (0, 0) f yy (0, 0) − ( f x y (0, 0))2 = 12 −
36 < 0.
Example 4.9.3 Show our tests fail on f (x, y) = 2x 4 + 4y 6 even though we know
there is a minimum value at (0, 0).
Solution For f (x, y) = 2x 4 + 4y 6 , you find that the critical point is (0, 0) and all
the second order partials are 0 there. So all the tests fail. Of course, a little common
sense tells you (0, 0) is indeed the place where this function has a minimum value.
Just think about how it’s surface looks. But the tests just fail. This is much like the
curve f (x) = x 4 which has a minimum at x = 0 but all the tests fail on it also.
Example 4.9.4 Show our tests fail on f (x, y) = 2x 2 + 4y 3 and the surface does not
have a minimum or maximum at the critical point (0, 0).
Solution For f (x, y) = 2x 2 + 4y 3 , the critical point is again (0, 0) and f x x (0, 0) =
4, f yy (0, 0) = 0 and f x y (0, 0) = f yx (0, 0) = 0. So f x x (0, 0) f yy (0, 0) − ( f x y (0, 0))2
= 0 so the test fails. Note the x = 0 trace is 4y 3 which is a cubic and so is nega-
tive below y = 0 and positive above y = 0. Not much like a minimum or maximum
behavior on this trace! But the trace for y = 0 is 2x 2 which is a nice parabola which
does reach its minimum at x = 0. So the behavior of the surface around (0, 0) is not
a maximum or a minimum. The surface acts a lot like a cubic. Do this in MatLab;
a sketch of the commands you might use is given after this example. This will give you
a surface. In the plot that is shown, go to the tool menu and click
on the rotate 3D option and you can spin it around. Clearly like a cubic! You can see
the plot in Fig. 4.5.
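The sketch below shows the sort of commands we have in mind (built-ins only, no utility functions); spin the resulting surface with the rotate tool and the cubic behavior in the y direction is clear.

% Sketch: the surface z = 2x^2 + 4y^3 near the critical point (0,0).
[X,Y] = meshgrid(linspace(-1,1,41), linspace(-1,1,41));
Z = 2*X.^2 + 4*Y.^3;
surf(X,Y,Z);
xlabel('x'); ylabel('y'); zlabel('z');
rotate3d on;    % spin it: a parabola in the x direction, a cubic in the y direction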
4.9.2 Homework
Exercise 4.9.1 Use our tests to show f (x, y) = 4x 2 + 2y 2 has a minimum at (0, 0).
Feel free to draw a surface plot to help you see what is going on.
4.9 Extrema Ideas 125
Exercise 4.9.5 Show our tests fail on f (x, y) = 6x 4 + 8y 8 even though we know
there is a minimum value at (0, 0). Feel free to draw a surface plot to help you see
what is going on.
Exercise 4.9.6 Show our tests fail on f (x, y) = 10x 2 + 5y 5 and the surface does
not have a minimum or maximum at the critical point (0, 0). Feel free to draw a
surface plot to help you see what is going on.
Part III
The Main Event
Chapter 5
Integration
To help us with our modeling tasks, we need to explore two new ways to compute
antiderivatives and definite integrals. These methods are called Integration By Parts
and Partial Fraction Decompositions. You should also recall our discussions about
antiderivatives or primitives and Riemann integration from Peterson (2015) where
we go over topics such as how we define the Riemann Integral, the Fundamental
Theorem of Calculus and the use of the Cauchy Fundamental Theorem of Calculus. You
should also review the basic ideas of continuity and differentiability.
This technique is based on the product rule for differentiation. Let’s assume that
the functions f and g are both differentiable on the finite interval [a, b]. Then the
product f g is also differentiable on [a, b] and ( f g )' = f' g + f g'. Integrating both sides
from a to b and rearranging gives

∫_a^b f(t) g'(t) dt = f(t) g(t) |_a^b − ∫_a^b f'(t) g(t) dt.   (5.1)

We usually write this in an even more abbreviated form. If we let u(t) = f(t),
then du = f'(t) dt. Also, if v(t) = g(t), then dv = g'(t) dt. Then, we can rephrase
Eq. 5.1 as Eq. 5.2.
∫_a^b u dv = uv |_a^b − ∫_a^b v du   (5.2)

We can also develop the integration by parts formula as an indefinite integral. When
we do that we obtain the version in Eq. 5.3

∫ u dv = uv − ∫ v du + C   (5.3)
Equation 5.2 gives what is commonly called the Integration By Parts formula.
Let’s work through some problems carefully. As usual, we will give many details at
first and gradually do the problems faster with less written down. You need to work
hard at understanding this technique.
Example 5.1.1 Evaluate ∫ ln(t) dt.
Solution Let u(t) = ln(t) and dv = dt. Then du = (1/t) dt and v = ∫ dt = t. When
we find the antiderivative v, at this stage we don't need to carry around an arbitrary
constant C as we will add one at the end. Applying Integration by Parts, we have
∫ ln(t) dt = ∫ u dv = uv − ∫ v du
           = ln(t) · t − ∫ t · (1/t) dt
           = t ln(t) − ∫ dt
           = t ln(t) − t + C.
Example 5.1.2 Evaluate ∫ t ln(t) dt.
Solution Let u(t) = ln(t) and dv = t dt. Then du = (1/t) dt and v = ∫ t dt = t^2/2.
Applying Integration by Parts, we have

∫ t ln(t) dt = ∫ u dv = uv − ∫ v du
            = ln(t) · (t^2/2) − ∫ (t^2/2)(1/t) dt
            = (t^2/2) ln(t) − ∫ (t/2) dt
            = (t^2/2) ln(t) − t^2/4 + C.
Example 5.1.3 Evaluate ∫ t^3 ln(t) dt.

Solution Let u(t) = ln(t) and dv = t^3 dt. Then du = (1/t) dt and v = ∫ t^3 dt = t^4/4.
Applying Integration by Parts, we have

∫ t^3 ln(t) dt = ∫ u dv = uv − ∫ v du
              = ln(t) · (t^4/4) − ∫ (t^4/4)(1/t) dt
              = (t^4/4) ln(t) − ∫ (t^3/4) dt
              = (t^4/4) ln(t) − t^4/16 + C.
Example 5.1.4 Evaluate ∫ t e^t dt.

Solution Let u(t) = t and dv = e^t dt. Then du = dt and v = ∫ e^t dt = e^t. Applying
Integration by Parts, we have

∫ t e^t dt = ∫ u dv = uv − ∫ v du
          = e^t · t − ∫ e^t dt
          = t e^t − e^t + C.
Example 5.1.5 Evaluate ∫ t^2 e^t dt.

Solution Let u(t) = t^2 and dv = e^t dt. Then du = 2t dt and v = ∫ e^t dt = e^t. Applying
Integration by Parts, we have

∫ t^2 e^t dt = ∫ u dv = uv − ∫ v du
            = e^t · t^2 − ∫ e^t · 2t dt
            = t^2 e^t − ∫ 2t e^t dt.

Now the integral ∫ 2t e^t dt also requires the use of integration by parts. So we
integrate again using this technique. Let u(t) = 2t and dv = e^t dt. Then du = 2 dt
and v = ∫ e^t dt = e^t. Applying Integration by Parts again, we have

∫ 2t e^t dt = ∫ u dv = uv − ∫ v du
           = e^t · 2t − ∫ e^t · 2 dt
           = 2t e^t − 2 ∫ e^t dt
           = 2t e^t − 2 e^t + C.
It is very awkward to do these multiple integration by parts in two separate steps like
we just did. It is much more convenient to repackage the computation like this:

∫ t^2 e^t dt = uv − ∫ v du    [ u = t^2, dv = e^t dt; du = 2t dt, v = e^t ]
            = e^t · t^2 − ∫ e^t · 2t dt
            = t^2 e^t − ∫ 2t e^t dt    [ u = 2t, dv = e^t dt; du = 2 dt, v = e^t ]
            = t^2 e^t − ( e^t · 2t − ∫ e^t · 2 dt )
            = t^2 e^t − 2t e^t + 2 e^t + C.
The framed boxes are convenient for our explanation, but this is still a bit awkward
(and long!) to write out for our problem solution. So let's try this:

∫ t^2 e^t dt = e^t t^2 − ∫ e^t 2t dt    ( u = t^2; dv = e^t dt; du = 2t dt; v = e^t )
            = t^2 e^t − ∫ 2t e^t dt
Solution We will do this one the short way: first do the indefinite integral just like
the last problem.
∫ t^2 sin(t) dt = −t^2 cos(t) − ∫ (−cos(t)) 2t dt = −t^2 cos(t) + 2t sin(t) + 2 cos(t) + C.
Then, we see

∫_1^3 t^2 sin(t) dt = { −t^2 cos(t) + 2t sin(t) + 2 cos(t) }(3)
                    − { −t^2 cos(t) + 2t sin(t) + 2 cos(t) }(1)
                   = { −9 cos(3) + 6 sin(3) + 2 cos(3) }
                    − { −cos(1) + 2 sin(1) + 2 cos(1) }.
And it is not clear we can do much to simplify this expression except possibly just
use our calculator to actually compute a value!
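Since the closed form is not very illuminating, it is worth checking it numerically; the sketch below evaluates both the antiderivative expression found above and MatLab's built-in integral command, and the two numbers should agree.

% Sketch: numerical check of the definite integral of t^2 sin(t) from 1 to 3.
F = @(t) -t.^2.*cos(t) + 2*t.*sin(t) + 2*cos(t);   % antiderivative found above
byParts  = F(3) - F(1)
numeric  = integral(@(t) t.^2.*sin(t), 1, 3)       % should match byParts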
5.1.2 Homework
Exercise 5.1.1 Evaluate ∫ ln(5t) dt.
Exercise 5.1.2 Evaluate ∫ 2t ln(t^2) dt.
Exercise 5.1.3 Evaluate ∫ (t + 1)^2 ln(t + 1) dt.
Exercise 5.1.4 Evaluate ∫ t^2 e^{2t} dt.
Exercise 5.1.5 Evaluate ∫_0^2 t^2 e^{−3t} dt.
Exercise 5.1.6 Evaluate ∫ 10t sin(4t) dt.
Exercise 5.1.7 Evaluate ∫ 6t cos(8t) dt.
Exercise 5.1.8 Evaluate ∫ (6t + 4) cos(8t) dt.
Exercise 5.1.9 Evaluate ∫_2^5 (t^2 + 5t + 3) ln(t) dt.
Exercise 5.1.10 Evaluate ∫ (t^2 + 5t + 3) e^{2t} dt.
Suppose we wanted to integrate a function like 1/( (x + 2)(x − 3) ). This does not fit into a
simple substitution method at all. The way we do this kind of problem is to split the
fraction 1/( (x + 2)(x − 3) ) into the sum of the two simpler fractions 1/(x + 2) and 1/(x − 3). This is
called the Partial Fractions Decomposition approach. Hence, we want to find numbers
A and B so that

1/( (x + 2)(x − 3) ) = A/(x + 2) + B/(x − 3).
If we multiply both sides of this equation by the term (x + 2) (x − 3), we get the
new equation
1 = A (x − 3) + B (x + 2).

Since this equation holds for all x, including x = 3 and x = −2, we can evaluate the equation twice
to get

1 = { A (x − 3) + B (x + 2) } |_{x=3} = 5 B
1 = { A (x − 3) + B (x + 2) } |_{x=−2} = −5 A.

Thus, A = −1/5 and B = 1/5, and so

1/( (x + 2)(x − 3) ) = (−1/5)/(x + 2) + (1/5)/(x − 3),
where it is hard to say which of these equivalent forms is the most useful. In general, in
later chapters, as we work out various modeling problems, we will choose whichever
of the forms above is best for our purposes.
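MatLab can confirm a partial fraction decomposition with the built-in residue command, which works from the coefficients of the numerator and denominator polynomials; the sketch below recovers the 1/5 and −1/5 coefficients found above.

% Sketch: partial fractions for 1/((x+2)(x-3)) = 1/(x^2 - x - 6) using residue.
num = 1;                 % numerator coefficients
den = [1 -1 -6];         % x^2 - x - 6 = (x+2)(x-3)
[r, p, k] = residue(num, den)
% r holds the coefficients (1/5 and -1/5) and p the corresponding poles (3 and -2).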
5/( (t + 3)(t − 4) ) = A/(t + 3) + B/(t − 4)

5 = A (t − 4) + B (t + 3)

5 = ( A (t − 4) + B (t + 3) ) |_{t=4} = 7 B
5 = ( A (t − 4) + B (t + 3) ) |_{t=−3} = −7 A.

Thus, we know A = −5/7 and B = 5/7. We can now evaluate the integral:

∫ 5/( (t + 3)(t − 4) ) dt = ∫ ( (−5/7)/(t + 3) + (5/7)/(t − 4) ) dt
  = ∫ (−5/7)/(t + 3) dt + ∫ (5/7)/(t − 4) dt
  = −(5/7) ∫ 1/(t + 3) dt + (5/7) ∫ 1/(t − 4) dt
  = −(5/7) ln(| t + 3 |) + (5/7) ln(| t − 4 |) + C
  = (5/7) ln( | t − 4 | / | t + 3 | ) + C
  = ln( ( | t − 4 | / | t + 3 | )^{5/7} ) + C
10/( (2t − 3)(8t + 5) ) = A/(2t − 3) + B/(8t + 5)

10 = A (8t + 5) + B (2t − 3).

Evaluating at t = 3/2 gives 10 = 17 A, and evaluating at t = −5/8 gives 10 = −(17/4) B.
Thus, we know B = −80/34 = −40/17 and A = 10/17. We can now evaluate the
integral:

∫ 10/( (2t − 3)(8t + 5) ) dt = ∫ ( (10/17)/(2t − 3) + (−40/17)/(8t + 5) ) dt
  = ∫ (10/17)/(2t − 3) dt + ∫ (−40/17)/(8t + 5) dt
  = (10/17) ∫ 1/(2t − 3) dt − (40/17) ∫ 1/(8t + 5) dt
  = (10/17)(1/2) ln(| 2t − 3 |) − (40/17)(1/8) ln(| 8t + 5 |) + C
  = (5/17) ln( | 2t − 3 | / | 8t + 5 | ) + C
  = ln( ( | 2t − 3 | / | 8t + 5 | )^{5/17} ) + C
Next, consider ∫ 6/((4 − t)(9 + t)) dt. Write
6/((4 − t)(9 + t)) = A/(4 − t) + B/(9 + t)
so that
6 = A (9 + t) + B (4 − t).
Evaluating twice, we get
6 = ( A (9 + t) + B (4 − t) ) |_{t=4} = 13 A
6 = ( A (9 + t) + B (4 − t) ) |_{t=−9} = 13 B.
Thus, we know A = 6/13 and B = 6/13. We can now evaluate the integral:
∫ 6/((4 − t)(9 + t)) dt = ∫ ( (6/13)/(4 − t) + (6/13)/(9 + t) ) dt
= ∫ ( (−6/13)/(t − 4) + (6/13)/(t + 9) ) dt
= −(6/13) ∫ 1/(t − 4) dt + (6/13) ∫ 1/(t + 9) dt
= −(6/13) ln(| t − 4 |) + (6/13) ln(| t + 9 |) + C
= (6/13) ln( | t + 9 | / | t − 4 | ) + C
= ln( ( | t + 9 | / | t − 4 | )^{6/13} ) + C.
Finally, consider ∫ −6/((t − 2)(2t + 8)) dt. Write
−6/((t − 2)(2t + 8)) = A/(t − 2) + B/(2t + 8)
so that
−6 = A (2t + 8) + B (t − 2).
Evaluating at t = 2 gives −6 = 12 A and evaluating at t = −4 gives −6 = −6 B. Thus, we know A = −1/2 and B = 1. We can now evaluate the indefinite integral:
∫ −6/((t − 2)(2t + 8)) dt = ∫ ( (−1/2)/(t − 2) + 1/(2t + 8) ) dt
= ∫ (−1/2)/(t − 2) dt + ∫ 1/(2t + 8) dt
= −(1/2) ∫ 1/(t − 2) dt + ∫ 1/(2t + 8) dt
= −(1/2) ln(| t − 2 |) + (1/2) ln(| 2t + 8 |) + C
= (1/2) ln( | 2t + 8 | / | t − 2 | ) + C.
Note that an evaluation of this integral on an interval [a, b] would not make sense if either of these two natural logarithm functions were undefined at some point in [a, b]. Here, both natural logarithm functions are nicely defined on [4, 7].
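As a quick Octave/MATLAB sanity check of this evaluation on [4, 7] (not part of the original development):
f = @(t) -6./((t-2).*(2*t+8));
F = @(t) 0.5*log(abs(2*t+8)./abs(t-2));   % the antiderivative found above
F(7) - F(4)                               % evaluation of the antiderivative
quad(f, 4, 7)                             % numerical quadrature; should agree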
5.2.2 Homework
Chapter 6
Complex Numbers
In the chapters to come, we will need to use the idea of a complex number. When we
use the quadratic equation to find the roots of a polynomial like f (t) = t 2 + t + 1,
we find
t = ( −1 ± √(1 − 4) ) / 2 = −1/2 ± (1/2) √(−3).
Since it is well known that there are no numbers in our world whose squares can be negative, it was easy to think that the term √(−3) represented some sort of imaginary quantity. But it seemed reasonable that the usual properties of the square root function should hold. Thus, we can write
√(−3) = √(−1) √3
and the term √(−1) had the amazing property that when you squared it, you got back −1! Thus, the square root of any negative number √(−c) for a positive c could be rewritten as √(−1) × √c. It became clear to the people studying the roots of polynomials such as our simple one above, that if the set of real numbers was augmented to include numbers of the form √(−1) × √c, there would be a nice way to represent any root of a polynomial. Since a number like √(−1) × 4 or 2 × √(−1) is also possible, it seemed like two copies of the real numbers were needed: one that was the usual real numbers and another copy which was any real number times this strange quantity √(−1). It became very convenient to label the set of all traditional real numbers as the x axis and the set of numbers prefixed by √(−1) as the y axis.
Since the prefix √(−1) was the only real difference between the usual real numbers and the new numbers with the prefix √(−1), this prefix √(−1) seemed the quintessential representative of this difference. Historically, since these new prefixed numbers were already thought of as imaginary, it was decided to start labeling √(−1) as the simple letter i where i is short for imaginary! Thus, a number of this
sort could be represented as a + b i where a and b are any ordinary real numbers.
In particular, the roots of our polynomial could be written as
t = −1/2 ± (√3/2) i.
The angle measured from the positive x axis to this vector is called the angle asso-
ciated with the complex number z and is commonly denoted by the symbol θ or
Arg(z). Hence, there are two equivalent ways we can represent a complex number
z. We can use coordinate information and write z = a + b i or we can use magnitude
and angle information. In this case, if you look at Fig. 6.1, you can clearly see that
a = r cos(θ) and b = r sin(θ). Thus,
z = | z | (cos(θ) + i sin(θ))
= r (cos(θ) + i sin(θ))
(Figure: A complex number a + b i has real part a and imaginary part b. Its complex conjugate is z̄ = a − b i. The coordinate (a, −b) is graphed in the usual Cartesian manner as an ordered pair in the complex plane. The magnitude of z̄ is √(a² + b²), which is shown on the graph as r. The angle associated with z̄ is drawn as an arc of angle −θ.)
We can interpret the number cos(θ) + i sin(θ) in a different way. Given a complex
number z = a + b i, we define the complex conjugate of z to be z̄ = a − b i. It is
easy to see that
z z̄ = | z |²
and
z⁻¹ = z̄ / | z |².
Now look at Fig. 6.1 again. In this figure, z is graphed in Quadrant 1 of the complex
plane. Now imagine that we replace z by z̄. Then the imaginary component changes
to −5 which is a reflection across the positive x axis. The magnitude of z̄ and z will
then be the same but Arg(z̄) is −θ. We see this illustrated in Fig. 6.2.
(Figure: A complex number z = −2 + 8 i has real part −2 and imaginary part 8. The coordinate (−2, 8) is graphed in the usual Cartesian manner as an ordered pair in the complex plane, with axes Re(z) and Im(z). The magnitude of z is √((−2)² + (8)²), which is shown on the graph as r. The angle associated with z is drawn as an arc of angle θ.)
6.1.2 Homework
Let z = r (cos(θ) + i sin(θ)) and let's think of θ as a variable now. We have seen the function f(θ) = r (cos(θ) + i sin(θ)) arises when we interpret a complex number in terms of the triangle formed by its angle and its magnitude. Let's find f′(θ). We have
f′(θ) = r ( −sin(θ) + i cos(θ) ) = i r ( cos(θ) + i sin(θ) ) = i f(θ).
So f′(θ) = i f(θ), or f′(θ)/f(θ) = i. Taking the antiderivative of both sides, this suggests ln( f(θ) ) = i θ + C; since f(0) = r, we get f(θ) = r e^{iθ}.
Thus, we can rephrase the polar coordinate form of the complex number z = a + b i
as z = r e i θ .
6.2 Complex Functions
For a complex exponential we write e^{(c + d i)t} = e^{ct} ( cos(dt) + i sin(dt) ); for example, e^{(−2 + 8 i)t} = e^{−2t} ( cos(8t) + i sin(8t) ).
Solution We have e^{(−1 + 2 i)t} = e^{−t} ( cos(2t) + i sin(2t) ), so | e^{(−1 + 2 i)t} | = e^{−t} | cos(2t) + i sin(2t) | = e^{−t}.
Solution We have e^{2 i t} = cos(2t) + i sin(2t), and hence | e^{2 i t} | = 1.
6.2.2 Homework
For each of the following complex numbers, find its magnitude, write it in the form
r eiθ using radians, and graph it in the complex plane showing angle in degrees.
Exercise 6.2.1 z = −3 + 6 i.
Exercise 6.2.2 z = −3 − 6 i.
Exercise 6.2.3 z = 3 − 6 i.
Exercise 6.2.4 z = 2 + 8 i.
Exercise 6.2.5 z = 5 + 1 i.
For each of the following complex functions, find its magnitude and write it in its
fully expanded form.
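A hint for checking your answers: Octave/MATLAB's abs and angle functions return the magnitude and angle of a complex number directly. For instance, using the number from Exercise 6.2.1:
z = -3 + 6i;
r = abs(z)                 % magnitude of z
theta = angle(z)           % angle in radians
theta_deg = theta*180/pi   % the same angle in degrees for the graph
% z should then equal r*exp(i*theta) up to roundoff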
We now turn our attention to Linear Second Order Differential Equations. The way
we solve these is built from our understanding of exponential growth and the models
built out of that idea. The idea of a half life is also important though it is not used
as much in these second order models. A great example also comes from Protein
Modeling and its version of half life called Response Time.
These have the general form
a u″(t) + b u′(t) + c u(t) = 0,
where we assume a is not zero. Here are some examples just so you can get the feel of these models.
We have already seen that a first order problem like u′(t) = r u(t) with u(0) = u₀ has the solution u(t) = u₀ e^{rt}. It turns out that the solutions to Eqs. 7.1 and 7.2 will also have this kind of form. To see how this happens, let's look at a new concept called an operator. For us, an operator is something that takes as input a function and then transforms the function into another one. A great example is the indefinite integral operator we might call I which takes a nice continuous function f and outputs the new function ∫ f. Another good one is the differentiation operator D which takes a differentiable function f and creates the new function f′. Hence, if we let D denote differentiation with respect to the independent variable and c be the multiply by a constant operator defined by c( f ) = c f, we could rewrite u′(t) = r u(t) as
Du = r u (7.3)
where we suppress the (t) notation for simplicity of exposition. In fact, we could
rewrite the model again as
(D − r) u = 0 (7.4)
where we let D − r act on u to create u′ − r u. We can apply this idea to Eq. 7.1. Next, let D² be the second derivative operator: this means D² u = u″. Then we can rewrite Eq. 7.1 as
a D² u + b D u + c u = 0 (7.5)
where we again suppress the time variable t. For example, if we had the problem u″ + 5 u′ + 6 u = 0, it could be rewritten as
D² u + 5 D u + 6 u = 0.
A little more practice is good at this point. These models convert to the operator form indicated. We ignore the initial conditions for the moment. In fact, we should begin to think of the models and their operator forms as interchangeable. The model most useful to us now is the first order linear model. We now can write
u′ − r u = 0 ⟺ (D − r)(u) = 0
and
y″ + 5 y′ + 6 y = 0 ⟺ (D² + 5 D + 6)(y) = 0.
Let's try factoring: consider a function f and do the computations with the factors in both orders.
(D + 2)(D + 3)( f ) = (D + 2)( f′ + 3 f )
= D( f′ + 3 f ) + 2 ( f′ + 3 f )
= f″ + 3 f′ + 2 f′ + 6 f = f″ + 5 f′ + 6 f
and
(D + 3)(D + 2)( f ) = (D + 3)( f′ + 2 f )
= D( f′ + 2 f ) + 3 ( f′ + 2 f )
= f″ + 2 f′ + 3 f′ + 6 f = f″ + 5 f′ + 6 f.
We see that (D + 2)(D + 3) and (D + 3)(D + 2) applied to f give the same result, namely ( D² + 5 D + 6 )( f ).
Now we can figure out how to find the most general solution to this model.
• The general solution to ( D + 3)(y) = 0 is y(t) = Ae−3t .
• The general solution to ( D + 2)(y) = 0 is y(t) = Be−2t .
• Let our most general solution be y(t) = Ae−3t + Be−2t .
We know from our study of first order equations that a problem of the form
( D + r)u = 0 has a solution of the form er t . This suggests that there are two
possible solutions to the problem above. One satisfies ( D + 3)u = 0 and the other
( D + 2)u = 0. Hence, it seems that any combination of the functions e−3t and
e−2t should work. Thus, a general solution to our problem would have the form
A e−3t + B e−2t for arbitrary constants A and B. With this intuition established, let’s
try to solve this more formally.
For the problem Eq. 7.1, let's assume that e^{rt} is a solution and try to find what values of r might work. For u(t) = e^{rt} we have u′ = r e^{rt} and u″ = r² e^{rt}, so plugging in and cancelling the common nonzero factor e^{rt}, we find
0 = a r² + b r + c.
The roots of the quadratic equation above are the only values of r that will work
as the solution er t . We call this quadratic equation the Characteristic Equation of a
linear second order differential equation Eq. 7.1. To find these values of r , we can
either factor the quadratic or use the quadratic formula. If you remember from your
earlier algebra course, there are three types of roots:
(i): the roots are both real and distinct. We let r1 be the smallest root and r2 the
bigger one.
(ii): the roots are both the same. We let r1 = r2 = r in this case.
(iii): the roots are a complex pair of the form a ± b i.
(D² + 7 D + 10)(x) = 0.
Let’s derive the characteristic equation. We assume the model has a solution of the
form er t for some value of r . Then, plugging u(t) = er t into the model, we find
(r² + 7 r + 10)( e^{rt} ) = 0.
Since er t is never 0 no matter what r ’s value is, we see this implies we need values
of r that satisfy
r 2 + 7r + 10 = 0.
This factors as
(r + 2)(r + 5) = 0.
Thus, the roots of the model’s characteristic equation are r1 = −5 and r2 = −2.
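If you want to confirm such roots numerically, Octave/MATLAB's roots command takes the coefficient vector of the characteristic polynomial; for the quadratic above:
roots([1 7 10])   % returns -5 and -2, the roots of r^2 + 7r + 10 = 0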
7.1 Homework
Exercise 7.1.1
Exercise 7.1.2
Exercise 7.1.3
Exercise 7.1.4
Exercise 7.1.5
0 = a r² + b r + c,
factors as
0 = (r − r₁)(r − r₂).
We are told that initially u(0) = u₀ and u′(0) = u₁. Taking the derivative of the general solution u(t) = A e^{r₁ t} + B e^{r₂ t}, we find
u′(t) = r₁ A e^{r₁ t} + r₂ B e^{r₂ t}.
Hence, to satisfy the initial conditions, we must find A and B to satisfy the system of two equations in two unknowns:
A + B = u₀,
r₁ A + r₂ B = u₁.
Solution (i): To derive the characteristic equation, we assume the solution has the form e^{rt} and plug that into the problem. We find
( r² − 3 r − 10 ) e^{rt} = 0.
(ii): Since e^{rt} is never zero, we need
r² − 3 r − 10 = 0,
which factors as
(r + 2)(r − 5) = 0.
(iii): The roots are r₁ = −2 and r₂ = 5, so the general solution is u(t) = A e^{−2t} + B e^{5t}.
(iv): Next, we find the values of A and B which will let the solution satisfy the initial conditions. We have
u(0) = A e⁰ + B e⁰ = A + B = −10
u′(0) = −2 A e⁰ + 5 B e⁰ = −2 A + 5 B = 10.
This gives the system of two equations in the two unknowns A and B
A + B = −10
−2 A + 5 B = 10.
Multiplying the first equation by 2 and adding, we get 7 B = −10, so B = −10/7. It then follows that A = −60/7. Thus, the solution to this initial value problem is
u(t) = −(60/7) e^{−2t} − (10/7) e^{5t}.
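As a sketch of how you might double check such a solution numerically (this assumes the IVP was u″ − 3u′ − 10u = 0 with u(0) = −10 and u′(0) = 10, values consistent with the A and B found above since the problem statement itself is not repeated here):
f = @(t,w) [w(2); 3*w(2) + 10*w(1)];           % w = [u; u'], so u'' = 3u' + 10u
[t, w] = ode45(f, [0 1], [-10; 10]);
uexact = -(60/7)*exp(-2*t) - (10/7)*exp(5*t);  % the closed form found above
max(abs(w(:,1) - uexact))                      % should be small compared to max(abs(uexact))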
7.2.1 Homework
0 = a r² + b r + c,
factors as
0 = (r − r₁)².
We have one solution u₁(t) = e^{r₁ t}, but we don't know if there are others. Let's assume that another solution is a product f(t) e^{r₁ t}. We know we want
0 = (D − r₁)(D − r₁)( f(t) e^{r₁ t} )
= (D − r₁)( ( f′(t) + r₁ f(t) − r₁ f(t) ) e^{r₁ t} )
= (D − r₁)( f′(t) e^{r₁ t} )
= ( f″(t) + r₁ f′(t) − r₁ f′(t) ) e^{r₁ t}
= f″(t) e^{r₁ t}.
Since e^{r₁ t} is never zero, we must have f″ = 0. This tells us that f(t) = α t + β for any α and β. Thus, a second solution has the form
v(t) = (α t + β) e^{r₁ t} = α t e^{r₁ t} + β e^{r₁ t}.
The only new function in this second solution is u₂(t) = t e^{r₁ t}. Hence, our general solution in the case of repeated roots will be
u(t) = A e^{r₁ t} + B t e^{r₁ t}.
We are told that initially u(0) = u₀ and u′(0) = u₁. Taking the derivative of u, we find
u′(t) = B e^{r₁ t} + r₁ ( A + B t ) e^{r₁ t}.
Hence, to satisfy the initial conditions, we must find A and B to satisfy the two equations in two unknowns below:
u(0) = ( A + B (0) ) e^{r₁ (0)} = A = u₀
u′(0) = B e^{r₁ (0)} + r₁ ( A + B (0) ) e^{r₁ (0)} = B + r₁ A = u₁.
Thus to find the appropriate A and B, we must solve the system:
A = u₀,
B + r₁ A = u₁.
Example 7.3.1 Now let's look at a problem with repeated roots. We want to solve u″ + 16 u′ + 64 u = 0 with u(0) = 1 and u′(0) = 8.
Solution (i): To find the characteristic equation, we assume the solution has the form e^{rt} and plug that into the problem. We find
( r² + 16 r + 64 ) e^{rt} = 0.
(ii): Since e^{rt} is never zero, we need
r² + 16 r + 64 = 0,
which factors as
(r + 8)(r + 8) = 0.
Hence, the roots of this characteristic equation are repeated: r₁ = −8 and r₂ = −8.
(iii): The general solution is thus
u(t) = A e^{−8t} + B t e^{−8t}.
(iv): Next, we find the values of A and B which will let the solution satisfy the initial conditions. We have
u(0) = A e⁰ + B (0) e⁰ = A = 1
u′(0) = ( −8 A e^{−8t} + B e^{−8t} − 8 B t e^{−8t} ) |_{t=0} = −8 A + B = 8.
This gives the system of two equations in the two unknowns A and B
A = 1
−8 A + B = 8.
This tells us that B = 16. Thus, the solution to this initial value problem is
u(t) = e^{−8t} + 16 t e^{−8t}.
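A quick Octave/MATLAB plot of this solution shows the fast decay typical of a repeated negative root:
t = linspace(0, 1.5, 200);
u = exp(-8*t) + 16*t.*exp(-8*t);   % the solution found above
plot(t, u); xlabel('t'); ylabel('u(t)');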
7.3.1 Homework
Exercise 7.3.1
Exercise 7.3.2
Exercise 7.3.3
Exercise 7.3.4
Exercise 7.3.5
0 = a r² + b r + c,
factors as
0 = a ( r − (c + d i) ) ( r − (c − d i) )
because the roots are complex. Now we suspect the solutions are e^{(c + d i)t} and e^{(c − d i)t}.
We have already seen how to interpret the complex functions e(c+di)t and e(c−di)t
(see Chap. 6, Definition 6.2.2). Let’s try to find out what the derivative of such a
function might be. First, it seems reasonable that if f (t) has a derivative f (t) at t,
then multiplying by i to get i f(t) only changes the derivative to i f′(t). In fact, the derivative of (c + i d) f(t) should be (c + i d) f′(t). Thus,
( e^{(c + i d)t} )′ = ( e^{ct} ( cos(dt) + i sin(dt) ) )′
= e^{ct} cos(dt) (c + i d) + i e^{ct} sin(dt) (c + i d)
= (c + i d) e^{ct} ( cos(dt) + i sin(dt) )
= (c + i d) e^{(c + i d)t}.
We conclude that we can now take the derivative of e^{(c ± i d)t} to get (c ± i d) e^{(c ± i d)t}. We can now test to see if A e^{(c + i d)t} + B e^{(c − i d)t} solves our problem. We see
( e^{(c ± i d)t} )″ = (c ± i d)² e^{(c ± i d)t}.
Thus,
a ( A e^{(c+id)t} + B e^{(c−id)t} )″ + b ( A e^{(c+id)t} + B e^{(c−id)t} )′ + c ( A e^{(c+id)t} + B e^{(c−id)t} )
= A e^{(c+id)t} ( a (c+id)² + b (c+id) + c ) + B e^{(c−id)t} ( a (c−id)² + b (c−id) + c ).
But c + i d and c − i d are the roots of the characteristic equation, so
a (c+id)² + b (c+id) + c = 0
a (c−id)² + b (c−id) + c = 0.
Thus,
a ( A e^{(c+id)t} + B e^{(c−id)t} )″ + b ( A e^{(c+id)t} + B e^{(c−id)t} )′ + c ( A e^{(c+id)t} + B e^{(c−id)t} )
= A e^{(c+id)t} (0) + B e^{(c−id)t} (0) = 0.
In fact, you can see that all of the calculations above would work even if the constants
A and B were complex numbers. So we have shown that any combination of the two
complex solutions e(c+id)t and e(c−id)t is a solution of our problem. Of course, a
solution that actually has complex numbers in it doesn’t seem that useful for our
world. After all, we can’t even graph it! So we have to find a way to construct
solutions which are always real valued and use them as our solution. Note
u₁(t) = (1/2) ( e^{(c+id)t} + e^{(c−id)t} )
= (1/2) e^{ct} ( cos(dt) + i sin(dt) + cos(dt) − i sin(dt) )
= (1/2) ( 2 cos(dt) ) e^{ct}
= e^{ct} cos(dt)
and
u₂(t) = (1/(2i)) ( e^{(c+id)t} − e^{(c−id)t} )
= (1/(2i)) e^{ct} ( cos(dt) + i sin(dt) − cos(dt) + i sin(dt) )
= (1/(2i)) ( 2 i sin(dt) ) e^{ct}
= e^{ct} sin(dt)
are both real valued solutions! So we will use as the general solution to this problem
u(t) = A e^{ct} cos(dt) + B e^{ct} sin(dt),
where the constants A and B are now restricted to be real numbers. To solve the initial value problem, note that u(0) = A and u′(0) = c A + d B. Hence, to solve the initial value problem, we find A and B by solving the two equations in two unknowns below:
A = u₀
c A + d B = u₁.
Solution (i): To find the characteristic equation, we assume the solution has the form e^{rt} and plug that into the problem. We find
( r² + 8 r + 25 ) e^{rt} = 0.
(ii): Since e^{rt} is never zero, we need
r² + 8 r + 25 = 0.
(iii): By the quadratic formula, the roots are r = ( −8 ± √(64 − 100) )/2 = −4 ± 3 i.
(iv): The real solutions we want are then e^{−4t} cos(3t) and e^{−4t} sin(3t). The general real solution is thus
u(t) = A e^{−4t} cos(3t) + B e^{−4t} sin(3t)
for arbitrary real numbers A and B. (v): Next, we find the values of A and B which will let the solution satisfy the initial conditions. We have
u(0) = ( A e^{−4t} cos(3t) + B e^{−4t} sin(3t) ) |_{t=0} = A = 3,
u′(0) = ( −4 A e^{−4t} cos(3t) − 3 A e^{−4t} sin(3t) − 4 B e^{−4t} sin(3t) + 3 B e^{−4t} cos(3t) ) |_{t=0} = −4 A + 3 B = 4.
This gives the system of two equations in the two unknowns A and B
A = 3
−4 A + 3 B = 4.
It then follows that B = 16/3. Thus, the solution to this initial value problem is
u(t) = 3 e^{−4t} cos(3t) + (16/3) e^{−4t} sin(3t).
(iv): Next, we find the values of A and B which will let the solution satisfy the initial conditions. We have u(0) = 3 and u′(0) = 4 A + 2 B = 4. This gives the system of two equations in the two unknowns A and B
A = 3
4 A + 2 B = 4 ⇒ 2 B = 4 − 4 A = −8.
It then follows that B = −4. Thus, the solution to this initial value problem is u(t) = 3 e^{4t} cos(2t) − 4 e^{4t} sin(2t).
Of course, we have to judge the time interval to choose for the linspace command. We see the plot in Fig. 7.4. Note as t gets large, u(t) oscillates out of control.
7.4.1 Homework
Exercise 7.4.1
Exercise 7.4.2
Exercise 7.4.3
Exercise 7.4.4
We can also write these solutions in another form. Our solutions here look like u(t) = A e^{ct} cos(dt) + B e^{ct} sin(dt). Let R = √(A² + B²). Rewrite the solution as
u(t) = R e^{ct} ( (A/R) cos(dt) + (B/R) sin(dt) ).
Define the angle δ by tan(δ) = B/A. Then the angle's value will depend on the signs of A and B, just like when we find angles for complex numbers and vectors. So cos(δ) = A/R and sin(δ) = B/R. Now, there is a trigonometric identity cos(E − F) = cos(E) cos(F) + sin(E) sin(F). Here we have
u(t) = R e^{ct} ( cos(δ) cos(dt) + sin(δ) sin(dt) ) = R e^{ct} cos(dt − δ).
The angle δ is called the phase shift. When written in this form, the solution is said to be in phase shifted cosine form.
Example 7.4.3 Consider the solution u(t) = 3 e^{−4t} cos(3t) + (16/3) e^{−4t} sin(3t), which gives u(0) = 3 and u′(0) = 4. Find the phase shifted cosine solution.
Solution Let R = √( (3)² + (16/3)² ) = √337 / 3. A and B are positive so they are in Quadrant 1. So δ = tan⁻¹(16/9). We have u(t) = (√337 / 3) e^{−4t} cos(3t − δ).
To draw this solution by hand, do the following:
• On your graph, draw the curve (√337/3) e^{−4t}. This is the top curve that bounds our solution, called the top envelope.
• On your graph, draw the curve −(√337/3) e^{−4t}. This is the bottom curve that bounds our solution, called the bottom envelope.
• Draw the point (0, 3) and from it draw an arrow pointing up as the initial slope is
positive.
• The solution starts at t = 0 and points up. It hits the top curve when the cos term
hits its maximum of 1. It then flips and moves towards its minimum value of −1
where it hits the bottom curve.
• Keep drawing the curve as it hits top and bottom in a cycle. This graph is expo-
nential decay that is oscillating towards zero.
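A short Octave/MATLAB sketch of this picture (the solution from Example 7.4.3 together with its envelope curves) is below; the value of R is computed rather than typed in:
A = 3; B = 16/3; R = sqrt(A^2 + B^2);               % amplitude of the phase shifted form
t = linspace(0, 2, 300);
u = A*exp(-4*t).*cos(3*t) + B*exp(-4*t).*sin(3*t);  % the solution
plot(t, u, t, R*exp(-4*t), 'r--', t, -R*exp(-4*t), 'r--');   % solution plus envelopes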
Example 7.4.4 Consider the solution u(t) = 3 e^{2t} cos(4t) − 5 e^{2t} sin(4t). This gives u(0) = 3 and u′(0) = 6 − 20 = −14. Convert to phase shifted form.
Solution Let R = √( (3)² + (−5)² ) = √34. Since A = 3 and B = −5, this is Quadrant 4. So we use δ = 2π − tan⁻¹(5/3). We have u(t) = √34 e^{2t} cos(4t − δ).
To draw the solution by hand, do this:
• On your graph, draw the curve √34 e^{2t}. This is the top curve that bounds our solution.
• On your graph, draw the curve −√34 e^{2t}. This is the bottom curve that bounds our solution.
• Draw the point (0, 3) and from it draw an arrow pointing down as the initial slope
is negative.
• The solution starts at t = 0 and points down. It hits the bottom curve when the cos
term hits its minimum of −1. It then flips and moves towards its maximum value
of 1 where it hits the top curve.
• Keep drawing the curve as it hits bottom and top in a cycle. This graph is expo-
nential growth that is oscillating out of control.
Example 7.4.5 Consider the solution u(t) = −8 e−2t cos(2t) − 6 e−2t sin(2t). Here
u(0) = −8 and u (0) = 4. Find the phase shifted form.
7.4.2.1 Homework
Exercise 7.4.5
Exercise 7.4.6
Exercise 7.4.7
We are now ready to solve what are called Linear Systems of differential equations.
These have the form
x′(t) = a x(t) + b y(t) (8.1)
y′(t) = c x(t) + d y(t) (8.2)
x(0) = x₀ (8.3)
y(0) = y₀ (8.4)
for any numbers a, b, c and d and initial conditions x0 and y0 . The full problem is
called, as usual, an Initial Value Problem or IVP for short. The two initial conditions
are just called the IC’s for the problem to save writing. For example, we might be
interested in the system
Here the IC’s are x(0) = 5 and y(0) = −3. Another sample problem might be the
one below.
For linear first order problems like u′ = 3u and so forth, we have found the solution has the form u(t) = α e^{3t} for some number α. We would then determine the value of α to use by looking at the initial condition. To see what to do with Eqs. 8.1 and 8.2, first let's rewrite the problem in terms of matrices and vectors. In this form, Eqs. 8.1 and 8.2 can be written as
[x′(t); y′(t)] = [a b; c d] [x(t); y(t)].
The initial conditions Eqs. 8.3 and 8.4 can then be redone in vector form as
[x(0); y(0)] = [x₀; y₀].
Here are some examples of the conversion of a system of two linear differential
equations into matrix–vector form.
Example 8.1.1 Convert
x′(t) = 6 x(t) + 9 y(t)
y′(t) = −10 x(t) + 15 y(t)
x(0) = 8
y(0) = 9
into matrix–vector form. The result is
[x′(t); y′(t)] = [6 9; −10 15] [x(t); y(t)], [x(0); y(0)] = [8; 9].
Now that we know how to do this conversion, it seems reasonable to believe that since a constant times e^{rt} solves a first order linear problem like u′ = r u, perhaps a vector times e^{rt} will work here. Let's make this formal. We'll work with a specific system first because numbers are always easier to make sense of in the initial exposure to a technique. So let's look at the problem below:
x′(t) = 3 x(t) + 2 y(t)
y′(t) = −4 x(t) + 5 y(t)
x(0) = 2
y(0) = −3.
Let’s assume the solution has the form V er t because by our remarks above since
this is a vector system it seems reasonable to move to using a vector rather than a
constant. Let's denote the components of V as follows:
V = [V₁; V₂].
Let's plug in our possible solution into the original problem. That is, we assume the solution is
[x(t); y(t)] = V e^{rt}.
Hence,
[x′(t); y′(t)] = r V e^{rt}.
When we plug these terms into the matrix–vector form of the problem, we find
r V e^{rt} = [3 2; −4 5] V e^{rt}.
Since one of these terms is a matrix and one is a vector, we need to write all the terms
in terms of matrices if possible. Recall, the two by two identity matrix is
I = [1 0; 0 1].
Even though we don’t know yet what values of r will work for this problem, we do
know that the term er t is never zero no matter what value r has. Hence, we can say
that we are looking for a value of r and a vector V so that
( r [1 0; 0 1] − [3 2; −4 5] ) V = [0; 0].
For convenience, let the matrix of coefficients determined by our system of differential equations be denoted by A, i.e.
A = [3 2; −4 5].
Finally, noting the vector V is common, we factor again to get our last equation
( r I − A ) V = [0; 0].
We can then plug in the value of I and A to get the system of equations that r and V must satisfy in order for V e^{rt} to be a solution:
[r − 3  −2; 4  r − 5] [V₁; V₂] = [0; 0].
To finish this discussion, note that for any value of r , this is a system of two linear
equations in the two unknowns V1 and V2 . If we choose a value of r for which
det (r I − A) was non zero, the theory we have so carefully gone over in Sect. 2.4 tells
us the two lines determined by row 1 and row 2 of this system have different slopes.
This means this system of equations has only one solution. Since both equations
cross through the origin, this unique solution must be V1 = 0 and V2 = 0. But, of
course, this tells us the solution is x(t) = 0 and y(t) = 0! We will not be able to
solve for the initial conditions x(0) = 2 and y(0) = −3 with this solution. So we
must reject any choice of r for which det (r I − A) is nonzero.
This leaves only one choice: the values of r where det (r I − A) = 0. Now, go
back to Sect. 2.10 where we discussed the eigenvalues and eigenvectors of a matrix
A. The values of r where det (r I − A) = 0 are what we called the eigenvalues of
our matrix A and for these values of r , we must find non zero vectors V (non zero
because otherwise, we can’t solve the IC’s!) so that
[r − 3  −2; 4  r − 5] [V₁; V₂] = [0; 0].
Then, for each eigenvalue we find, we should have a solution of the form
[x(t); y(t)] = [V₁; V₂] e^{rt}.
In general, for a system of two linear models like this, there are three choices for the
eigenvalues.
• Two real and distinct eigenvalues r₁ and r₂ with the eigenvectors E₁ and E₂. This has been discussed thoroughly. We can now say more about this type of solution. The two eigenvectors E₁ and E₂ are linearly independent vectors in ℝ² and the two solutions e^{r₁ t} and e^{r₂ t} are linearly independent functions. Hence, the set of all possible solutions, which is described by the general solution
[x(t); y(t)] = a E₁ e^{r₁ t} + b E₂ e^{r₂ t},
represents the span of these two linearly independent functions. In fact, these
two linearly independent solutions to the model are the basis vectors of the two
dimensional vector space that consists of the solutions to this model. Note, we are
not saying anything new here, but we are saying it with new terminology and a
higher level of abstraction. Thus, the general solution will be
[x(t); y(t)] = a E₁ e^{r₁ t} + b E₂ e^{r₂ t},
where E 1 is the eigenvector for eigenvalue r1 and E 2 is the eigenvector for eigen-
value r2 and a and b are arbitrary real numbers chosen to satisfy the IC’s.
• The eigenvalues are repeated so that r1 = r2 = α for some real number. We are
not yet sure what to do in this case. There are two possibilities:
1. The eigenvalue of value α, when plugged into the eigenvalue–eigenvector equation
[α − a  −b; −c  α − d] [V₁; V₂] = [0; 0],
turns out to behave as usual: the two rows of this matrix are multiples of one another. Hence, we use either the top or bottom row to find our choice of nonzero eigenvector E₁. For example, if
A = [3 1; −1 1],
the characteristic equation is (r − 2)² = 0 and, at the repeated eigenvalue r = 2, the top row and the bottom row of 2 I − A are multiples; we find E₁ = [1, −1]ᵀ. Note in this case, the set of all V₁ and V₂ we can use are all multiples of E₁. Hence, this set of numbers forms a line through the origin in ℝ². Another way of saying this is that the set of all possible V₁ and V₂ here is a one dimensional subspace of ℝ². We know one solution to our model is E₁ e^{2t} but what is the other one?
2. The other possibility is that A is a multiple of the identity, say A = 2 I. Then, the characteristic equation is the same as the first case: (r − 2)² = 0. However, the eigenvalue–eigenvector equation is very different. We find
[2 − 2  0; 0  2 − 2] [V₁; V₂] = [0; 0],
which is a very strange system as both the top and bottom equation give 0 V₁ + 0 V₂ = 0. This says there are no restrictions on the values of V₁ and V₂: they can be picked independently. So pick V₁ = 1 and V₂ = 0 to give one choice of eigenvector: E₁ = [1, 0]ᵀ. Then pick V₁ = 0 and V₂ = 1 to give a second choice of eigenvector: E₂ = [0, 1]ᵀ. Another way of looking at this is that the set of all possible V₁ and V₂ is just ℝ², and so we are free to pick any basis of ℝ² for our eigenvectors we want. Hence, we might as well pick the simplest one: E₁ = i and E₂ = j. We actually have two linearly independent solutions to our model in this case. They are E₁ e^{2t} and E₂ e^{2t}.
• In the last case, the eigenvalues are complex numbers. If we let the eigenvalue be r = α + β i, note the corresponding eigenvector could be a complex vector. So let's write it as V = E + i F where E and F have only real valued components. Then we know
[a b; c d] (E + i F) = r (E + i F).
But all the entries of the matrix are real, so complex conjugation does not change them. The other conjugations then give
[a b; c d] (E − i F) = (α − β i) (E − i F).
Hence the general complex solution has the form
[x(t); y(t)] = c₁ (E + i F) e^{(α + β i)t} + c₂ (E − i F) e^{(α − β i)t},
where c₁ and c₂ are complex numbers. Since we are interested in real solutions, from this general complex solution, we will extract two linearly independent real solutions which will form the basis for our two dimensional subspace of solutions. We will return to this case later.
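In Octave/MATLAB, the eig command carries out the eigenvalue–eigenvector computation for all three cases at once; a small check using the matrix from Example 8.1.4:
A = [-10 -7; 8 5];
[V, D] = eig(A)
% the diagonal of D holds the eigenvalues (-2 and -3 here) and the
% columns of V are corresponding eigenvectors (scaled to unit length)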
We are now ready for a definition of the Characteristic Equation of the linear system.
Definition 8.1.1 (The Characteristic Equation of a Linear System of ODEs)
For the system [x′(t); y′(t)] = A [x(t); y(t)], the characteristic equation is det (r I − A) = 0.
For example, consider the system x′(t) = 8 x(t) + 9 y(t), y′(t) = 3 x(t) − 2 y(t). We assume the solution has the form V e^{rt} and plug this into the system. This gives
r V e^{rt} − [8 9; 3 −2] V e^{rt} = [0; 0].
Then, since er t can never be zero no matter what value r is, we find the values of r
and the vectors V we seek satisfy
( r I − [8 9; 3 −2] ) V = [0; 0].
Now, if r is chosen so that det (r I − A) = 0, the only solution to this system of two
linear equations in the two unknowns V1 and V2 is V1 = 0 and V2 = 0. This leads
to the solution x(t) = 0 and y(t) = 0 always and this solution does not satisfy the
initial conditions. Hence, we must find values r which give det (r I − A) = 0. The
resulting polynomial is
det ( r I − [8 9; 3 −2] ) = det [r − 8  −9; −3  r + 2] = (r − 8)(r + 2) − 27 = r² − 6r − 43.
The next one is very similar. We will expect you to be able to do this kind of
derivation also.
Example 8.1.4 Derive the characteristic equation for the system below:
x′(t) = −10 x(t) − 7 y(t)
y′(t) = 8 x(t) + 5 y(t).
Assume the solution has the form V e^{rt} and plug this into the system giving
r V e^{rt} − [−10 −7; 8 5] V e^{rt} = [0; 0].
Then, since e^{rt} can never be zero no matter what value r is, we find the values of r and the vectors V we seek satisfy
( r I − [−10 −7; 8 5] ) V = [0; 0].
Again, if r is chosen so that det (r I − A) = 0, the only solution to this system of two
linear equations in the two unknowns V1 and V2 is V1 = 0 and V2 = 0. This gives
us the solution x(t) = 0 and y(t) = 0 always and this solution does not satisfy the
initial conditions. Hence, we must find values r which give det (r I − A) = 0. The
resulting polynomial is
det ( r I − [−10 −7; 8 5] ) = det [r + 10  7; −8  r − 5] = (r + 10)(r − 5) + 56 = r² + 5r + 6.
8.1.4 Homework
Exercise 8.1.1
x′ = 2 x + 3 y
y′ = 8 x − 2 y
x(0) = 3
y(0) = 5.
Exercise 8.1.2
x′ = −4 x + 6 y
y′ = 9 x + 2 y
x(0) = 4
y(0) = −6.
Next, let’s do the simple case of two distinct real eigenvalues. Before you look at these
calculations, you should review how we found eigenvectors in Sect. 2.10. We worked
out several examples there, however, now let’s put them in this system context. Each
eigenvalue r has a corresponding eigenvector E. Since in this course, we want to
concentrate on the situation where the two roots of the characteristic equation are
distinct real numbers, we will want to find the eigenvector, E 1 , corresponding to
eigenvalue r1 and the eigenvector, E 2 , corresponding to eigenvalue r2 . The general
solution will then be of the form
[x(t); y(t)] = a E₁ e^{r₁ t} + b E₂ e^{r₂ t},
where we will use the IC's to choose the correct values of a and b. Let's do a complete example now. We start with the system
x′(t) = −3 x(t) + 4 y(t)
y′(t) = −x(t) + 2 y(t)
x(0) = 2
y(0) = −4.
Here A = [−3 4; −1 2], so det (r I − A) = (r + 3)(r − 2) + 4 = r² + r − 2 = (r + 2)(r − 1). Thus, the eigenvalues of the coefficient matrix A are r₁ = −2 and r₂ = 1. The general solution will then be of the form
[x(t); y(t)] = a E₁ e^{−2t} + b E₂ eᵗ.
For r₁ = −2, this gives
(−2) I − A = [1 −4; 1 −4].
The two rows of this matrix should be multiples of one another. If not, we made a mistake and we have to go back and find it. Our rows are indeed multiples, so pick one row to solve for the eigenvector. We need to solve
[1 −4; 1 −4] [v₁; v₂] = [0; 0]
or
v₁ − 4 v₂ = 0, i.e. v₂ = (1/4) v₁.
Letting v₁ = a, we find the solutions have the form
[v₁; v₂] = a [1; 1/4].
The vector
E₁ = [1; 1/4]
is our choice of eigenvector for r₁ = −2.
For r₂ = 1, this gives
(1) I − A = [4 −4; 1 −1].
Again, the two rows of this matrix should be multiples of one another. If not, we made a mistake and we have to go back and find it. Our rows are indeed multiples, so pick one row to solve for the eigenvector. We need to solve
[4 −4; 1 −1] [v₁; v₂] = [0; 0]
or
v₁ − v₂ = 0, i.e. v₂ = v₁.
The vector
E₂ = [1; 1]
is our choice of eigenvector for r₂ = 1.
Finally, we solve the IVP. Given the IC's, we find two equations in two unknowns for a and b:
[x(0); y(0)] = [2; −4] = a [1; 1/4] e⁰ + b [1; 1] e⁰ = [a + b; (1/4) a + b].
Thus
a + b = 2
(1/4) a + b = −4.
This easily solves to give a = 8 and b = −6. Hence, the solution to this IVP is
[x(t); y(t)] = a [x₁(t); y₁(t)] + b [x₂(t); y₂(t)] = 8 [1; 1/4] e^{−2t} − 6 [1; 1] eᵗ.
Note when t is very large, the only terms that matter are the ones which grow fastest. Hence, we could say
x(t) ≈ −6 eᵗ
y(t) ≈ −6 eᵗ,
or in vector form
[x(t); y(t)] ≈ −6 [1; 1] eᵗ.
This is just a multiple of the eigenvector E 2 ! Note the graph of x(t) and y(t) on an
x–y plane will get closer and closer to the straight line determined by this eigenvector.
So we will call E 2 the dominant eigenvector direction for this system.
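Here is a hedged Octave/MATLAB sketch that reproduces this example numerically; note that eig scales its eigenvectors to unit length, so the coefficients it produces differ from a and b above by those scalings, even though the trajectory is the same:
A  = [-3 4; -1 2];
[V, D] = eig(A);                         % eigenvalues -2 and 1
c  = V \ [2; -4];                        % expansion of the IC in the eigenvector basis
t  = linspace(0, 2, 100);
Z  = V * (diag(c) * [exp(D(1,1)*t); exp(D(2,2)*t)]);   % [x(t); y(t)] at each time
plot(Z(1,:), Z(2,:));                    % the trajectory in the x-y plane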
You need some additional practice. Let's work out a few more. Here is the first one:
x′(t) = −20 x(t) + 12 y(t)
y′(t) = −13 x(t) + 5 y(t)
x(0) = −1
y(0) = 2.
The characteristic equation is
0 = det [r + 20  −12; 13  r − 5]
= (r + 20)(r − 5) + 156
= r² + 15r + 56
= (r + 8)(r + 7),
so the eigenvalues are r₁ = −8 and r₂ = −7.
For r₁ = −8, this gives
(−8) I − A = [12 −12; 13 −13].
Again, the two rows of this matrix should be multiples of one another. If not, we made a mistake and we have to go back and find it. Our rows are indeed multiples, so pick one row to solve for the eigenvector. We need to solve
[12 −12; 13 −13] [v₁; v₂] = [0; 0]
or
12 v₁ − 12 v₂ = 0, i.e. v₂ = v₁.
The vector
E₁ = [1; 1]
is our choice of eigenvector for r₁ = −8.
For r₂ = −7, this gives
(−7) I − A = [13 −12; 13 −12].
Picking the top row, we need
13 v₁ − 12 v₂ = 0, i.e. v₂ = (13/12) v₁.
The vector
E₂ = [1; 13/12]
is our choice of eigenvector for r₂ = −7.
We solve the IVP by finding the a and b that will give the desired initial conditions.
This gives
[−1; 2] = a [1; 1] + b [1; 13/12]
or
−1 = a + b
2 = a + (13/12) b.
This is easily solved using elimination to give a = −37 and b = 36. The solution
to the IVP is therefore
[x(t); y(t)] = −37 [1; 1] e^{−8t} + 36 [1; 13/12] e^{−7t}
= [−37 e^{−8t} + 36 e^{−7t}; −37 e^{−8t} + 36 (13/12) e^{−7t}].
Note when t is very large, the only terms that matter are the ones which grow fastest
or, in this case, the ones which decay the slowest. Hence, we could say
[x(t); y(t)] ≈ 36 [1; 13/12] e^{−7t}.
This is just a multiple of the eigenvector E 2 ! Note the graph of x(t) and y(t) on an
x–y plane will get closer and closer to the straight line determined by this eigenvector.
So we will call E 2 the dominant eigenvector direction for this system.
We now have all the information needed to analyze the solutions to this system
graphically.
Here is another example in great detail. Again, remember you will have to know
how to do these steps yourselves.
Consider the system
x′(t) = 4 x(t) + 9 y(t)
y′(t) = −x(t) − 6 y(t)
x(0) = 4
y(0) = −2.
The characteristic equation is
0 = det [r − 4  −9; 1  r + 6]
= (r − 4)(r + 6) + 9
= r² + 2 r − 15
= (r + 5)(r − 3),
so the eigenvalues are r₁ = −5 and r₂ = 3.
For r₁ = −5, this gives
(−5) I − A = [−9 −9; 1 1].
We need to solve
[−9 −9; 1 1] [v₁; v₂] = [0; 0]
or
v₁ + v₂ = 0, i.e. v₂ = −v₁.
The vector
E₁ = [1; −1]
is our choice of eigenvector for r₁ = −5.
For r₂ = 3, this gives
(3) I − A = [−1 −9; 1 9].
Picking the bottom row, we need
v₁ + 9 v₂ = 0, i.e. v₂ = −(1/9) v₁.
Letting v₁ = b, we find the solutions have the form
[v₁; v₂] = b [1; −1/9].
The vector
E₂ = [1; −1/9]
is our choice of eigenvector for r₂ = 3.
We solve the IVP by finding the a and b that will give the desired initial conditions.
This gives
[4; −2] = a [1; −1] + b [1; −1/9]
or
4 = a + b
−2 = −a − (1/9) b.
This is easily solved using elimination to give a = 7/4 and b = 9/4. The solution to the IVP is therefore
[x(t); y(t)] = (7/4) [1; −1] e^{−5t} + (9/4) [1; −1/9] e^{3t}
= [(7/4) e^{−5t} + (9/4) e^{3t}; −(7/4) e^{−5t} − (1/4) e^{3t}].
8.2.2 Homework
Exercise 8.2.1
x′ = 3 x + y
y′ = 5 x − y
x(0) = 4
y(0) = −6.
Exercise 8.2.2
x′ = x + 4 y
y′ = 5 x + 2 y
x(0) = 4
y(0) = −5.
Exercise 8.2.3
x′ = −3 x + y
y′ = −4 x + 2 y
x(0) = 1
y(0) = 6.
Let's try to analyze these systems graphically. We are interested in what these solutions look like for many different initial conditions. So let's look at the problem
x′(t) = −3 x(t) + 4 y(t)
y′(t) = −x(t) + 2 y(t).
The set of (x, y) pairs where x′ = 0 is called the nullcline for x; similarly, the points where y′ = 0 form the nullcline for y. The x′ equation can be set equal to zero to get −3x + 4y = 0. This is the same as the straight line y = (3/4) x. This straight line divides the x–y plane into three pieces: the part where x′ > 0; the part where x′ = 0; and the part where x′ < 0. In Fig. 8.1, we show the part of the x–y plane where x′ > 0 with one shading and the part where it is negative with another. Similarly, the y′ equation can be set to 0 to give the equation of the line −x + 2y = 0. This gives the straight line y = (1/2) x. In Fig. 8.2, we show how this line also divides the x–y plane into three pieces.
The shaded areas shown in Figs. 8.1 and 8.2 can be combined into Fig. 8.3. In this figure, we divide the x–y plane into four regions marked with a I, II, III or IV. In each region, x′ and y′ are either positive or negative. Hence, each region can be marked with an ordered pair of signs, (x′ ±, y′ ±).
8.2.3.2 Homework
Exercise 8.2.4
x′ = 2 x + 3 y
y′ = 8 x − 2 y
x(0) = 3
y(0) = 5.
Exercise 8.2.5
x′ = −4 x + 6 y
y′ = 9 x + 2 y
x(0) = 4
y(0) = −6.
Now we add the eigenvector lines. In Sect. 8.2, we found that this system has eigen-
values r1 = −2 and r2 = 1 with associated eigenvectors
E₁ = [1; 1/4], E₂ = [1; 1].
A vector with components [a, b]ᵀ determines a straight line through the origin with slope b/a. Hence, these eigenvectors each determine a straight line. The E₁ line has slope 1/4 and the E₂ line has slope 1. We can graph these two lines overlaid on the graph shown in Fig. 8.3.
8.2.3.4 Homework
Fig. 8.4 Drawing the nullclines and the eigenvector lines on the same graph
Exercise 8.2.6
x = 2 x + 3 y
y = 8 x − 2 y
x(0) = 3
y(0) = 5.
Exercise 8.2.7
x = −4 x + 6 y
y = 9 x + 2 y
x(0) = 4
y(0) = −6.
In each of the four regions, we know the algebraic signs of the derivatives x′ and y′. If we are given an initial condition (x₀, y₀) which is in one of these regions, we can use this information to draw the set of points (x(t), y(t)) corresponding to the solution to our system
x′(t) = −3 x(t) + 4 y(t)
y′(t) = −x(t) + 2 y(t)
x(0) = x₀
y(0) = y₀.
This set of points is called the trajectory corresponding to this solution. The first
point on the trajectory is the initial point (x0 , y0 ) and the rest of the points follow
from the solution
[x(t); y(t)] = a [1; 1/4] e^{−2t} + b [1; 1] eᵗ,
where
x₀ = a + b
y₀ = (1/4) a + b
and
x(t) = a e^{−2t} + b eᵗ
y(t) = (1/4) a e^{−2t} + b eᵗ.
Hence,
dy/dx = y′(t) / x′(t) = ( (−2/4) a e^{−2t} + b eᵗ ) / ( −2 a e^{−2t} + b eᵗ ).
When t is large, as long as b is not zero, the terms involving e^{−2t} are negligible and so we have
dy/dx = y′(t) / x′(t) ≈ ( b eᵗ ) / ( b eᵗ ) = 1,
the slope of the E₂ line. Hence, when t is large, the slopes of the trajectory approach 1, the slope of E₂. So, we can conclude that for large t, as long as b is not zero, the trajectory either parallels the line determined by E₂ or approaches it asymptotically.
Of course, if an initial condition is chosen that lies on the line determined by E₁, then a little thought will tell you that b is zero in this case and we have
dy/dx = y′(t) / x′(t) = ( (−2/4) a e^{−2t} ) / ( −2 a e^{−2t} ) = 1/4,
the slope of the E₁ line. In this case,
x(t) = a e^{−2t}
y(t) = (1/4) a e^{−2t},
and so the coordinates (x(t), y(t)) go to (0, 0) along the line determined by E₁.
We conclude that unless an initial condition is chosen exactly on the line determined by E₁, all trajectories eventually begin to either parallel the line determined by E₂ or approach it asymptotically. If the initial condition is chosen on the line determined by E₁, then the trajectories stay on this line and approach the origin, where they stop as that is a place where both x′ and y′ become 0. In Fig. 8.5, we show three trajectories which begin in Region I. They all have a (+, +) sign pattern for x′ and y′, so the x and y components should both increase. We draw the trajectories with the concavity as shown because that is the only way they can smoothly approach the eigenvector line E₂. We show this in Fig. 8.5.
Is it possible for two trajectories to cross? Consider the trajectories shown in Fig. 8.6.
These two trajectories cross at some point. The two trajectories correspond to dif-
ferent initial conditions which means that the a and b associated with them will be
different. Further, these initial conditions don’t start on eigenvector E 1 or eigenvec-
tor E 2 , so the a and b values for both trajectories will be non zero. If we label these
trajectories by (x1 , y1 ) and (x2 , y2 ), we see
x₁(t) = a₁ e^{−2t} + b₁ eᵗ
y₁(t) = (1/4) a₁ e^{−2t} + b₁ eᵗ
and
x₂(t) = a₂ e^{−2t} + b₂ eᵗ
y₂(t) = (1/4) a₂ e^{−2t} + b₂ eᵗ.
(Figure: two trajectories drawn against the nullclines x′ = 0 and y′ = 0, the eigenvector lines E₁ and E₂, and the sign regions I (+, +) and II (−, +).)
Since we assume they cross, there has to be a time point, t*, so that (x₁(t*), y₁(t*)) and (x₂(t*), y₂(t*)) match. This means, using vector notation, that
[x₁(t); y₁(t)] = a₁ [1; 1/4] e^{−2t} + b₁ [1; 1] eᵗ,
[x₂(t); y₂(t)] = a₂ [1; 1/4] e^{−2t} + b₂ [1; 1] eᵗ
agree at t*. Setting these equal at t* forces (a₁ − a₂) [1; 1/4] e^{−2t*} = (b₂ − b₁) [1; 1] e^{t*}, which would make E₁ a multiple of E₂. This is clearly not possible, so we have to conclude that trajectories can't cross. We can do this sort of analysis for trajectories that start in any region, whether it is I, II, III or IV. Further, a similar argument shows that a trajectory can't cross an eigenvector line, as if it did, the argument above would again lead us to the conclusion that E₁ is a multiple of E₂, which it is not.
We can state the results here as formal rules for drawing trajectories. For the linear system
[x′(t); y′(t)] = A [x(t); y(t)],
assume the eigenvalues r₁ and r₂ are different with either both negative or one negative and one positive. Let E₁ and E₂ be the associated eigenvectors. Then, the trajectories of this system corresponding to different initial conditions cannot cross each other. In particular, trajectories cannot cross eigenvector lines.
In region II, trajectories start where x′ < 0 and y′ > 0. Hence, the x values must decrease and the y values increase in this region. We draw the trajectory in this way, making sure it curves in such a way that it has no corners or kinks, until it hits the nullcline x′ = 0. At that point, the trajectory moves into region I. Now x′ > 0 and y′ > 0, so the trajectory moves upward along the eigenvector E₂ line like we showed in the Region I trajectories. We show this in Fig. 8.7. Note although the trajectories seem to overlap near the E₂ line, they actually do not because trajectories cannot cross, as was explained in Sect. 8.2.3.6.
Next, we examine trajectories that begin in Region III. Here x′ and y′ are negative, so the x and y values will decrease and the trajectories will approach the dominant eigenvector E₂ line from the right side as is shown in Fig. 8.8. The initial condition that starts in Region III above the eigenvector E₂ line will move towards the y′ = 0 line following x′ < 0 and y′ < 0, until it hits the line x′ = 0 using x′ < 0 and y′ > 0. Then it moves upward towards the eigenvector E₂ line as shown. It is easier to see this in a magnified view as shown in Fig. 8.9.
Finally, we examine trajectories that begin in region IV. Here x′ is positive and y′ is negative, so the x values will grow and the y values will decrease. The trajectories will behave in this manner until they intersect the x′ = 0 nullcline. Then, they will cross into Region III and approach the dominant eigenvector E₂ line from the left side as is shown in Fig. 8.10.
In Fig. 8.11, we show all the region trajectories on one plot. We can draw more, but
these should be enough to give you an idea of how to draw them. In addition, there
is a type of trajectory we haven’t drawn yet. Recall, the general solution is
[x₁(t); y₁(t)] = a [1; 1/4] e^{−2t} + b [1; 1] eᵗ.
These are the trajectories with b = 0. Thus, these trajectories start somewhere on the eigenvector E₁ line and then, as t increases, x(t) and y(t) go to (0, 0) along this eigenvector. You can easily imagine these trajectories by placing a dot on the E₁ line with an arrow pointing towards the origin.
We can do this sort of qualitative analysis for the three cases:
• One eigenvalue negative and one eigenvalue positive: example r1 = −2 and r2 = 1
which we have just completed.
• Both eigenvalues negative: example r1 = −2 and r2 = −1 which we have not
done.
• Both eigenvalues positive: example r1 = 1 and r2 = 2 which we have not done.
• In each case, we have two eigenvectors E1 and E2 . The way we label our eigenval-
ues will always make most trajectories approach the E2 line as t increases because
r2 is always the largest eigenvalue.
Here we have one negative eigenvalue and one positive eigenvalue. The positive one
is the dominant one: example, r1 = −2 and r2 = 1 so the dominant eigenvalue is
r2 = 1.
• Trajectories that start on the E1 line go towards (0, 0) along that line.
• Trajectories that start on the E2 line move outward along that line.
• All other ICs give trajectories that move outward from (0, 0) and approach the dominant eigenvector line, the E2 line, as t increases.
• The (+, +), (+, −), (−, +) and (−, −) regions tell us the details of how this is done. We find these regions using the nullcline analysis.
Here we have two negative eigenvalues. The least negative one is the dominant one:
example, r1 = −2 and r2 = −1 so the dominant eigenvalue is r2 = −1.
• Trajectories that start on the E1 line go towards (0, 0) along that line.
• Trajectories that start on the E2 line go towards (0, 0) along that line.
• All other ICs give trajectories that move towards (0, 0) and approach the dominant eigenvector line, the E2 line, as t increases.
• The (+, +), (+, −), (−, +) and (−, −) regions tell us the details of how this is done. We find these regions using the nullcline analysis.
Now we have two positive eigenvalues. The larger one is the dominant one: for example, r1 = 2 and r2 = 3, so the dominant eigenvalue is r2 = 3.
• Trajectories that start on the E1 line move outward along that line.
• Trajectories that start on the E2 line move outward along that line.
• All other ICs give trajectories that move outward and approach the dominant eigenvector line, the E2 line, as t increases. This case where both eigenvalues are positive is the hardest one to draw and many times these trajectories become parallel to the dominant line rather than approaching it.
• The (+, +), (+, −), (−, +) and (−, −) regions tell us the details of how this is done. We find these regions using the nullcline analysis.
8.2.3.14 Examples
Finally, here is how we would work out a problem by hand in Figs. 8.12, 8.13 and
8.14; note the wonderful handwriting displayed on these pages.
8.2.3.15 Homework
You are now ready to do some problems on your own. For the problems below
Exercise 8.2.8
[x′(t); y′(t)] = [1 3; 3 1] [x(t); y(t)], [x(0); y(0)] = [−3; 1].
Exercise 8.2.9
[x′(t); y′(t)] = [3 12; 2 1] [x(t); y(t)], [x(0); y(0)] = [6; 1].
Exercise 8.2.10
[x′(t); y′(t)] = [−1 1; −2 −4] [x(t); y(t)], [x(0); y(0)] = [3; 8].
Exercise 8.2.11
[x′(t); y′(t)] = [3 4; −7 −8] [x(t); y(t)], [x(0); y(0)] = [−2; 4].
Exercise 8.2.12
[x′(t); y′(t)] = [−1 1; −3 −5] [x(t); y(t)], [x(0); y(0)] = [2; −4].
Exercise 8.2.13
[x′(t); y′(t)] = [−5 2; −4 1] [x(t); y(t)], [x(0); y(0)] = [21; 5].
We call the vector whose components are u and v the inputs to this model. In general, if the external inputs are both zero, we call the model a homogeneous model and otherwise, it is a nonhomogeneous model. Now we don't know how to solve this, but let's try a guess. Let's assume there is a solution x_inputs(t) = x* and y_inputs(t) = y*, where x* and y* are constants. Then,
[x′_inputs(t); y′_inputs(t)] = [0; 0]
since the derivative of a constant is zero. Now plugging this assumed solution into the original model, we find
[0; 0] = [−2 3; 4 5] [x*; y*] + [u; v].
Since det(A) = −22, which is not zero, we know A⁻¹ exists. Manipulating a bit, our original equation becomes
[−2 3; 4 5] [x*; y*] = −[u; v], i.e. [x*; y*] = −A⁻¹ [u; v],
for any inputs u and v. For this sample problem, the characteristic equation is
det (r I − A) = r 2 − 3r − 22
for arbitrary a and b. A little thought then shows that adding [xno input (t), yno input (t)]T
to the solution [xinputs (t), yinputs (t)]T will always solve the model with inputs. So
the most general solution to the model with constant inputs must be of the form
[x(t); y(t)] = [x_no input(t); y_no input(t)] + [x_inputs(t); y_inputs(t)]
= a E₁ e^{−3.42t} + b E₂ e^{6.42t} − A⁻¹ [u; v],
where A is the coefficient matrix of our model. The solution [xno input (t), yno input (t)]T
occurs so often it is called the homogeneous solution to the model (who knows why!)
and the solution [xinputs (t), yinputs (t)]T because it actually works for the model
with these particular inputs, is called the particular solution. To save subscript-
ing, we label the homogeneous solution [x h (t), yh (t)]T and the particular solution
[x_p(t), y_p(t)]ᵀ. Finally, any model with inputs that are not zero is called a nonhomogeneous model. We are thus ready for a definition.
A solution to the model with inputs,
x′(t) = a x(t) + b y(t) + f(t)
y′(t) = c x(t) + d y(t) + g(t),
for input functions f(t) and g(t) is called the particular solution and is labeled [x_p(t), y_p(t)]ᵀ. The model with no inputs is called the homogeneous model and its solutions are called homogeneous solutions and labeled [x_h(t), y_h(t)]ᵀ. The general solution to the model with nonzero inputs is then
[x(t); y(t)] = [x_h(t); y_h(t)] + [x_p(t); y_p(t)].
For a linear model, from our discussions above, we know how to find a complete
solution.
Example 8.2.6
[x′(t); y′(t)] = [2 −5; 4 −7] [x(t); y(t)] + [2; 1], [x(0); y(0)] = [1; 5].
Here det(A) = 6, so the constant particular solution is [x_p; y_p] = −A⁻¹ [2; 1] = [3/2; 1].
Note, this particular solution is biologically reasonable if x and y are proteins and
the dynamics represent their interaction since x p and y p are positive.
The homogeneous solution is
[x_h(t); y_h(t)] = a E₁ e^{−3t} + b E₂ e^{−2t}
with E₁ = [1; 1] and E₂ = [1; 0.8]. Requiring x(0) = 1 and y(0) = 5 gives two equations in two unknowns which we solve in the usual way. We find
1 = a + b + 3/2
5 = a + 0.8 b + 1
or
−1/2 = a + b
4 = a + 0.8 b.
Subtracting, 0.2 b = −9/2, so b = −45/2 and a = 22, and the full solution is
[x(t); y(t)] = 22 [1; 1] e^{−3t} − (45/2) [1; 0.8] e^{−2t} + [3/2; 1].
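A short Octave/MATLAB check of the particular solution and the IC equations for this example (the eigenvector columns below use the same scalings as in the text):
A  = [2 -5; 4 -7];
zp = -A \ [2; 1]               % constant particular solution, [3/2; 1]
E  = [1 1; 1 0.8];             % columns E1 (for r = -3) and E2 (for r = -2)
ab = E \ ([1; 5] - zp)         % the constants a and b from the initial conditions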
8.2.3.18 Homework
Exercise 8.2.14
[x′(t); y′(t)] = [1 3; 3 1] [x(t); y(t)] + [3; 5], [x(0); y(0)] = [−3; 1].
Exercise 8.2.15
[x′(t); y′(t)] = [3 12; 2 1] [x(t); y(t)] + [6; 8], [x(0); y(0)] = [6; 1].
Exercise 8.2.16
[x′(t); y′(t)] = [−1 1; −2 −4] [x(t); y(t)] + [1; 10], [x(0); y(0)] = [3; 8].
Exercise 8.2.17
[x′(t); y′(t)] = [3 4; −7 −8] [x(t); y(t)] + [0.3; 6], [x(0); y(0)] = [−2; 4].
Exercise 8.2.18
[x′(t); y′(t)] = [−1 1; −3 −5] [x(t); y(t)] + [7; 0.5], [x(0); y(0)] = [2; −4].
Exercise 8.2.19
[x′(t); y′(t)] = [−5 2; −4 1] [x(t); y(t)] + [23; 11], [x(0); y(0)] = [21; 5].
In this case, the two roots of the characteristic equation are both the same real value.
For us, there are two cases where this can happen.
Let the matrix A be a multiple of the identity. For example, we have the linear system
[x′(t); y′(t)] = [−2 0; 0 −2] [x(t); y(t)].
The characteristic equation is then (r + 2)² = 0 with repeated roots r = −2. When we solve the eigenvalue–eigenvector equation, we find when we substitute in the eigenvalue r = −2 that we must solve
[0 0; 0 0] [V₁; V₂] = [0; 0],
which is a strange looking system of equations which we have not seen before. The
way to interpret this is to go back to looking at it as two equations in the unknowns
V1 and V2 . We have
0 V1 + 0 V2 = 0
0 V1 + 0 V2 = 0
As usual both rows of the eigenvalue–eigenvector equation are the same and so
choosing the top row to work with, the equation says there are no constraints on the
values of V1 and V2 . Hence, any nonzero vector [a, b]T will work. This gives us the
two parameter family
V1 a 1 0
= a +b .
V2 b 0 1
We choose as the first eigenvector E 1 = [1, 0]T and as the second eigenvector
E 2 = [0, 1]T . Hence, the two linearly independent solutions to this system are
E 1 e−2t and E 2 e−2t with the general solution
[x(t); y(t)] = A [1; 0] e^{−2t} + B [0; 1] e^{−2t} = [A e^{−2t}; B e^{−2t}].
Clearly, these two rows are equivalent and hence we only need to choose one to solve
for V2 in terms of V1 . The first row gives V1 + V2 = 0. Letting V1 = a, we find
V2 = −a. Hence, the eigenvectors have the form
V = a [1; −1].
We now look for a second linearly independent solution of the form [x₂(t); y₂(t)] = (F + G t) e^{3t} and find conditions on the vectors F and G that make this true. If this is a solution, then we must have
[x₂′(t); y₂′(t)] = A [x₂(t); y₂(t)].
Rewriting, we obtain
A F e^{3t} + A G t e^{3t} = (3 F + G) e^{3t} + 3 t G e^{3t},
and matching the e^{3t} and t e^{3t} terms gives
A G = 3 G
A F = 3 F + G.
The first equation says G is an eigenvector, so we take G = E; the second can then easily be solved for F. The next linearly independent solution therefore has the form (F + E t) e^{3t} where F solves the system
[1 1; −1 −1] [F₁; F₂] = −E = [−1; 1].
The contribution to the general solution from b E can be lumped in with the first solution E e^{3t} as we have discussed, so we only need choose
F = [0; −1].
Note for large t, the (x, y) trajectory is essentially parallel to the eigenvector line for
E which has slope −1. Hence, if we scale by e3t to obtain
[x(t)/e^{3t}; y(t)/e^{3t}] = [2 − 7t; 5 + 7t],
the phase plane trajectories will look like a series of parallel lines. If we keep in the
e3t factor, the phase plane trajectories will exhibit some curvature. Without scaling, a
typical phase plane plot for multiple trajectories looks like what we see in Fig. 8.15.
If we do the scaling, we see the parallel lines as we mentioned. This is shown in
Fig. 8.16.
From our discussions for this case of repeated eigenvalues, we can see the general
rule that if r = α is a repeated eigenvalue with only one eigenvector E, the two
linearly independent solutions are
[x₁(t); y₁(t)] = E e^{αt}
[x₂(t); y₂(t)] = (F + E t) e^{αt}.
8.3.2.1 Homework
We can now do some problems. Find the solution to the following models.
Exercise 8.3.1
Exercise 8.3.2
Exercise 8.3.3
Exercise 8.3.4
Let's begin with a theoretical analysis for a change of pace. If the real valued matrix A has a complex eigenvalue r = α + iβ, then there is a nonzero vector G so that
A G = (α + iβ) G.
However, since A has real entries, its complex conjugate is simply A back. Thus, after taking complex conjugates, we find
A Ḡ = (α − iβ) Ḡ.
Writing G = E + i F with E and F real, the general complex solution is
[φ(t); ψ(t)] = c₁ (E + i F) e^{(α + iβ)t} + c₂ (E − i F) e^{(α − iβ)t}
for arbitrary complex numbers c₁ and c₂. We can reorganize this solution into a more convenient form as follows:
[φ(t); ψ(t)] = e^{αt} ( c₁ (E + i F) e^{(iβ)t} + c₂ (E − i F) e^{(−iβ)t} )
= e^{αt} ( ( c₁ e^{(iβ)t} + c₂ e^{(−iβ)t} ) E + i ( c₁ e^{(iβ)t} − c₂ e^{(−iβ)t} ) F ).
The first real solution is found by choosing c₁ = 1/2 and c₂ = 1/2. This gives
[x₁(t); y₁(t)] = e^{αt} ( (1/2)( e^{(iβ)t} + e^{(−iβ)t} ) E + i (1/2)( e^{(iβ)t} − e^{(−iβ)t} ) F ).
However, we know that (1/2)( e^{(iβ)t} + e^{(−iβ)t} ) = cos(βt) and (1/2)( e^{(iβ)t} − e^{(−iβ)t} ) = i sin(βt). Thus, we have
[x₁(t); y₁(t)] = e^{αt} ( E cos(βt) − F sin(βt) ).
The second real solution is found by setting c₁ = 1/(2i) and c₂ = −1/(2i), which gives
[x₂(t); y₂(t)] = e^{αt} ( (1/(2i))( e^{(iβ)t} − e^{(−iβ)t} ) E + i (1/(2i))( e^{(iβ)t} + e^{(−iβ)t} ) F )
= e^{αt} ( E sin(βt) + F cos(βt) ).
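A small Octave/MATLAB sketch of this recipe, using the matrix [3 2; −4 5] from Sect. 8.1 (which happens to have complex eigenvalues), shows how E, F, α and β come out of eig and how the two real solutions are then assembled:
A = [3 2; -4 5];
[V, D] = eig(A);
r  = D(1,1);                          % one complex eigenvalue, alpha + i*beta
al = real(r);  be = imag(r);
E  = real(V(:,1));  F = imag(V(:,1)); % the eigenvector is G = E + i*F
t  = linspace(0, 2, 200);
x1 = (E*cos(be*t) - F*sin(be*t)) .* [exp(al*t); exp(al*t)];   % first real solution
x2 = (E*sin(be*t) + F*cos(be*t)) .* [exp(al*t); exp(al*t)];   % second real solution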
Although it is not immediately apparent, the second row is a multiple of row one (multiply row one by a suitable complex constant to see this). So even though it is harder to see, these two rows are equivalent and hence we only need to choose one to solve for V₂ in terms of V₁. The first row gives (1 + 2i) V₁ + 5 V₂ = 0. Letting V₁ = a, we find V₂ = a (−1 − 2i)/5. Hence, the eigenvectors have the form
G = a [1; −(1 + 2i)/5] = a ( [1; −1/5] + i [0; −2/5] ).
Hence,
E = [1; −1/5] and F = [0; −2/5].
8.4.1.1 Homework
We can now do some problems. Find the solution to the following models.
Exercise 8.4.1
Exercise 8.4.2
Exercise 8.4.3
Exercise 8.4.4
We know
[x′(t); y′(t)] = A [x(t); y(t)].
Let [ab] denote the 2 × 2 matrix whose columns are a E + b F and b E − a F, i.e. [ab] = [ a E + b F, b E − a F ].
Whew! But wait, we can do more! With a bit more factoring, we have
[x′(t); y′(t)] = e^{αt} [ab] [α cos(βt) − β sin(βt); β cos(βt) + α sin(βt)]
= e^{αt} [ab] [α −β; β α] [cos(βt); sin(βt)].
Hence, since
[x(t); y(t)] = e^{αt} [ab] [cos(βt); sin(βt)],
we see, letting [u(t); v(t)] = [ab]⁻¹ [x(t); y(t)], that
[u′(t); v′(t)] = [ab]⁻¹ [x′(t); y′(t)] = [ab]⁻¹ A [x(t); y(t)].
But, we know
[ab]⁻¹ A = [α −β; β α] [ab]⁻¹.
Thus, we have
[u′(t); v′(t)] = [α −β; β α] [ab]⁻¹ [x(t); y(t)] = [α −β; β α] [u(t); v(t)].
This transformed system in the variables u and v also has the eigenvalues α ± iβ but it is simpler to solve. For the eigenvalue α + iβ, the eigenvector equation gives iβ V₁ + β V₂ = 0. Letting V₁ = 1, we have V₂ = −i. Thus,
[V₁; V₂] = [1; −i] = [1; 0] + i [0; −1].
Thus,
E = [1; 0] and F = [0; −1].
Let the angle δ be defined by tan(δ) = b/a. This tells us cos(δ) = a/R and sin(δ) = b/R, where R = √(a² + b²). Then, cos(π/2 − δ) = b/R and sin(π/2 − δ) = a/R. Plugging these values into our expression, we find
[u(t); v(t)] = R e^{αt} [cos(δ) cos(βt) + sin(δ) sin(βt); −cos(π/2 − δ) cos(βt) + sin(π/2 − δ) sin(βt)]
= R e^{αt} [cos(δ) cos(βt) + sin(δ) sin(βt); −sin(δ) cos(βt) + cos(δ) sin(βt)]
= R e^{αt} [cos(βt − δ); sin(βt − δ)].
8.4.5.1 Homework
Exercise 8.4.5
Exercise 8.4.6
Exercise 8.4.7
Exercise 8.4.8
Then we have
[x(t); y(t)] = e^{αt} [λ₁₁ cos(βt) + λ₁₂ sin(βt); λ₂₁ cos(βt) + λ₂₂ sin(βt)].
Let R₁ = √(λ₁₁² + λ₁₂²) and R₂ = √(λ₂₁² + λ₂₂²). Then
[x(t); y(t)] = e^{αt} [ R₁ ( (λ₁₁/R₁) cos(βt) + (λ₁₂/R₁) sin(βt) ); R₂ ( (λ₂₁/R₂) cos(βt) + (λ₂₂/R₂) sin(βt) ) ].
Let the angles δ1 and δ2 be defined by tan(δ1 ) = λ12 /λ11 and tan(δ2 ) = λ22 /λ21
where in the case where we divide by zero, these angles are assigned the value of
±π/2 as needed. For convenience of exposition, we will assume here that all the
entries of ab are nonzero although, of course, what we really know is that this
matrix is invertible and so it is possible for some entries to be zero. But that is just
a messy complication and easy enough to fix with a little thought. So once we have
our angles δ1 and δ2 , we can rewrite the solution as
⎡ ⎤
R cos(δ ) cos(βt) + sin(δ ) sin(βt)
⎢ 1 1 1 ⎥
x(t)
= eαt ⎢ ⎥ = eαt R1 cos(βt − δ1 )
y(t) ⎣ ⎦ R2 cos(βt − δ2 )
R2 cos(δ2 ) cos(βt) + sin(δ2 ) sin(βt)
This is a little confusing as is, so let’s do an example with numbers. Suppose our
solution was
x(t) αt 2 cos(βt − π/6)
=e
y(t) 3 cos(βt − π/3)
Then this solution is clearly periodic since the cosine functions are periodic. Note that x hits its maximum and minimum values of 2 and −2 at the times t_1 where βt_1 − π/6 = 0 and t_2 with βt_2 − π/6 = π. This gives t_1 = π/(6β) and t_2 = 7π/(6β). At these values of time, the y values are 3 cos(βt_1 − π/3) = 3 cos(−π/6) = 3√3/2 and 3 cos(βt_2 − π/3) = 3 cos(5π/6) = −3√3/2. Thus, the points (2, 3√3/2) and (−2, −3√3/2) are on this trajectory. The trajectory reaches the point (2, 3√3/2) at time π/(6β) and then hits the point (−2, −3√3/2) at time π/β + π/(6β).
The extremal y values are ±3. The maximum y of 3 is obtained at βt_3 − π/3 = 0, or t_3 = π/(3β). The corresponding x value is then 2 cos(βt_3 − π/6) = 2 cos(π/6) = √3. So at time t_3, the trajectory passes through (√3, 3). Finally, the minimum y value of −3 is achieved at βt_4 − π/3 = π, or βt_4 = 4π/3. At this time, the corresponding x value is 2 cos(4π/3 − π/6) = 2 cos(7π/6) = −√3.
This is probably not very helpful; however, what we have shown here is that this trajectory is a rotated ellipse. Take a sheet of paper and mark off the x and y axes. Choose a small angle to represent π/(6β). Draw a line through the origin at angle π/(6β) and on this line mark the two points (2, 3√3/2) (angle π/(6β)) and (−2, −3√3/2) (angle 7π/(6β)). This line is the horizontal axis of the ellipse. Now draw another line through the origin at angle π/(3β), which is double the first angle. This line is the vertical axis of the ellipse. On this line plot the two points (√3, 3) (angle π/(3β)) and (−√3, −3) (angle 4π/(3β)). This is a phase shifted ellipse. At time π/(6β), we start at the farthest positive x value of the ellipse
on the horizontal axis. Then in π/(6β) additional time, we hit the largest y value
of the vertical axis. Next, in an additional 5π/(6β) we reach the most negative x
value of the ellipse on the horizontal axis. After another π/(6β) we arrive at the most
negative y value on the vertical axis. Finally, we arrive back at the start point after
another 5π/(6β). Try drawing it!
Maybe a little Matlab/Octave will help. Consider the following quick plot. We
won’t bother to label axes and so forth as we just want to double check all of the
complicated arithmetic above.
Listing 8.1: Checking our arithmetic!
beta = 3;
x = @(t) 2*cos(beta*t - pi/6);
y = @(t) 3*cos(beta*t - pi/3);
t = linspace(0,1,21);
u = x(t);
v = y(t);
plot(u,v);
This generates part of this ellipse as we can see in Fig. 8.17. We can close the plot
by plotting for a longer time.
Listing 8.2: Plotting for more time
beta = 3;
x = @(t) 2*cos(beta*t - pi/6);
y = @(t) 3*cos(beta*t - pi/3);
t = linspace(0,3,42);
u = x(t);
v = y(t);
plot(u,v);
This fills in the rest of the ellipse as we can see in Fig. 8.18.
We can plot the axes of this ellipse easily as well using the MatLab/Octave session
below.
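That session is not reproduced above; here is a minimal sketch of what it might contain. The line slopes come from the axis points computed earlier and are the only assumptions made here.

% hedged sketch: plot the ellipse and its two (non-perpendicular) axis lines
beta = 3;
x = @(t) 2*cos(beta*t - pi/6);
y = @(t) 3*cos(beta*t - pi/3);
t = linspace(0, 3, 101);
axis1 = @(s) (3*sqrt(3)/4)*s;   % line through the origin and (2, 3*sqrt(3)/2)
axis2 = @(s) sqrt(3)*s;         % line through the origin and (sqrt(3), 3)
s = linspace(-2.5, 2.5, 101);
plot(x(t), y(t), '-k', s, axis1(s), '-r', s, axis2(s), '-b');
legend('ellipse', 'horizontal axis', 'vertical axis', 'Location', 'Best');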
We can now see the axes lines we were talking about earlier in Fig. 8.19. Note these
axes are not perpendicular like we would usually see in an ellipse! Now, this picture
does not include the exponential decay or growth we would get by multiplying by
the scaling factor eαt for various α’s. A typical spiral out would look like Fig. 8.20.
To determine the direction of motion in these trajectories, we do the usual nullcline analysis to get the algebraic sign pairs for (x', y') as usual.
8.4.6.1 Homework
Write MatLab/Octave code to graph the following solutions and their axes. We are
using the same models we have solved before but with different initial conditions.
Exercise 8.4.9
Exercise 8.4.10
Exercise 8.4.11
Exercise 8.4.12
We now want to learn how to solve systems of differential equations. A typical system
is the following
where x and y are our variables of interest which might represent populations of two
competing species or other quantities of biological interest. The system starts at time
t0 (which for us is usually 0) and we specify the values the variables x and y start at
as x0 and y0 respectively. The functions f and g are “nice” meaning that as functions
of three arguments (t, x, y) they do not have jumps and corners. The exact nature of
“nice” here is a bit beyond our ability to discuss in this introductory course, so we
will leave it at that. For example, we could be asked to solve the system given by Eqs. 9.4 and 9.5,

\[
f(t, x, y) = 3x + 2xy - 3y^2, \qquad g(t, x, y) = -2x + 2x^2 y + 5y,
\]

or the system given by Eqs. 9.7 and 9.8.
The functions f and g in these examples are not linear in the variables x and y; hence the system of Eqs. 9.4 and 9.5 and the system of Eqs. 9.7 and 9.8 are what are called nonlinear systems. In general, we can also add arbitrary functions of time, φ and ψ, to the model giving
The functions φ and ψ are what we could call data functions. For example, if φ(t) = sin(t) and ψ(t) = t e^{−t}, the system of Eqs. 9.10 and 9.11 would become

\[
f(t, x, y) = 3x + 2xy - 3y^2 + \sin(t), \qquad
g(t, x, y) = -2x + 2x^2 y + 5y + t\, e^{-t}.
\]
How do we solve such a system of differential equations? There are some things
we can do if the functions f and g are linear in x and y, but many times we will be
forced to look at the solutions using numerical techniques. We explored how to solve
first order equations in Chap. 3 and we will now adapt the tools developed in that
chapter to systems of differential equations. First, we will show you how to write
a system in terms of matrices and vectors and then we will solve some particular
systems.
\[
x(t) = \begin{bmatrix} x_1(t) \\ x_2(t) \end{bmatrix} = \begin{bmatrix} u(t) \\ u'(t) \end{bmatrix}
\]

Then,
Let the matrix above be called A. Then we have converted the original system into the matrix–vector equation x'(t) = A x(t), x(0) = x_0.
Example 9.1.3 Convert to a matrix–vector system

\[
u(t) = \begin{bmatrix} u_1(t) \\ u_2(t) \end{bmatrix} = \begin{bmatrix} x(t) \\ y(t) \end{bmatrix}.
\]
9.1.1 Homework
Now let's consider how to adapt our previous code to handle these systems of differential equations. We will begin with the linear second order problems because we know how to do those already. First, let's consider a general second order linear problem

\[
a\, u''(t) + b\, u'(t) + c\, u(t) = g(t),
\]

where we assume a is not zero, so that we really do have a second order problem! As usual, we let the vector x be given by

\[
x(t) = \begin{bmatrix} x_1(t) \\ x_2(t) \end{bmatrix} = \begin{bmatrix} u(t) \\ u'(t) \end{bmatrix}
\]
Then,

\[
\begin{aligned}
x_1'(t) &= u'(t) = x_2(t),\\
x_2'(t) &= u''(t) = -(c/a)\, u(t) - (b/a)\, u'(t) + (1/a)\, g(t).
\end{aligned}
\]

Let the matrix above be called A. Then we have converted the original system into the matrix–vector equation x'(t) = A x(t), x(0) = x_0. For our purposes of using MatLab, we need to write this in terms of vectors. We have

\[
\begin{bmatrix} x_1'(t) \\ x_2'(t) \end{bmatrix}
= \begin{bmatrix} x_2 \\ -(c/a)\, x_1 - (b/a)\, x_2 + (1/a)\, g(t) \end{bmatrix}
\]
\[
x'' + 4x' - 5x = t\, e^{-0.03t}, \qquad x(0) = -1.0, \qquad x'(0) = 1.0
\]

\[
A + B = -1, \qquad -5A + B = 1
\]

It is straightforward to see that A = −1/3 and B = −2/3. Then we solve the system
using MatLab with this session:
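The session itself is not reproduced here. The following is only a minimal sketch consistent with the description below, assuming for the purposes of the sketch that this first example is the unforced problem x'' + 4x' − 5x = 0 (the author's session may have also handled the forcing term shown above); the constants A and B computed above are exactly the ones that satisfy the initial conditions for the unforced problem.

% hedged sketch, not the author's exact session
a = 1; b = 4; c = -5;
f = @(t,y) [y(2); -(c/a)*y(1) - (b/a)*y(2)];
y0 = [-1; 1];
h = 0.2; T = 3; N = ceil(T/h);
[time, rkapprox] = FixedRK(f, 0, y0, h, 4, N);
% true is vector valued: row 1 is the solution, row 2 its derivative
A = -1/3; B = -2/3;
true = @(t) [A*exp(-5*t) + B*exp(t); -5*A*exp(-5*t) + B*exp(t)];
ytrue = true(time);
plot(time, rkapprox(1,:), 'o', time, ytrue(1,:), '-');
xlabel('Time'); ylabel('x'); legend('RK4', 'True', 'Location', 'Best');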
One comment about this code. The function true has vector values; the first com-
ponent is the true solution and the second component is the true solution’s derivative.
Since we want to plot only the true solution, we need to extract it from true. We
do this with the command ytrue=true(time) which saves the vector of true
values into the new variable ytrue. Then in the plot command, we plot only the
solution by using ytrue(1,:). This generates a plot as shown in Fig. 9.1.
Now let's add an external input, g(t) = 10 sin(5t) e^{−0.03t}. In a more advanced class, we could find the true solution for this external input, but if we changed to g(t) = 10 sin(5t) e^{−0.03 t²} we would not be able to do that. So in general, there are many models we can not find the true solution to. However, the Runge–Kutta methods work quite well. Still, we always have the question in the back of our minds: is this plot accurate?
% define the dynamics for x'' + 4x' - 5x = 10 sin(5t) e^{-.03t}
a = 1; b = 4; c = -5;
B = -(b/a); A = -(c/a); C = 1/a;
g = @(t) 10*sin(5*t).*exp(-.03*t);
f = @(t,y) [y(2); A*y(1) + B*y(2) + C*g(t)];
y0 = [-1; 1];
h = .2;
T = 3;
N = ceil(T/h);
[htime1, rkapprox1] = FixedRK(f, 0, y0, h, 1, N);
yhat1 = rkapprox1(1,:);
[htime2, rkapprox2] = FixedRK(f, 0, y0, h, 2, N);
yhat2 = rkapprox2(1,:);
[htime3, rkapprox3] = FixedRK(f, 0, y0, h, 3, N);
yhat3 = rkapprox3(1,:);
[htime4, rkapprox4] = FixedRK(f, 0, y0, h, 4, N);
yhat4 = rkapprox4(1,:);
plot(htime1, yhat1, 'o', htime2, yhat2, '*', ...
     htime3, yhat3, '+', htime4, yhat4, '-');
xlabel('Time');
ylabel('Approx y');
title('Solution to x'''' + 4x'' - 5x = 10 sin(5t) e^{-.03t}, x(0) = -1, x''(0) = 1 on [0,3]');
legend('RK1', 'RK2', 'RK3', 'RK4', 'Location', 'Best');
On all of these problems, choose an appropriate stepsize h, time interval [0, T ] for
some positive T and
• find the true solution and write this as MatLab code.
• find the Runge–Kutta order 1 through 4 solutions.
• Write this up with attached plots.
Exercise 9.2.1
For these models, find the Runge–Kutta 1 through 4 solutions and do the write up
with plot as usual.
Exercise 9.2.4
We now turn our attention to solving systems of linear ODEs numerically. We will
show you how to do it in two worked out problems. We then generate the plot of y
versus x, the two lines representing the eigenvectors of the problem, and the x' = 0 and y' = 0 nullclines on the same plot. A typical MatLab session would look like this:
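The session itself is not shown above; here is a hedged sketch of such a session, using for illustration the system x' = 4x − y, y' = 8x − 5y that reappears in Sect. 9.5 below (the first worked example here may be a different system).

% hedged sketch of a typical session for x' = 4x - y, y' = 8x - 5y, x(0) = -3, y(0) = 1
f = @(t,x) [4*x(1) - x(2); 8*x(1) - 5*x(2)];
x0 = [-3; 1];
h = 0.01; T = 2; N = ceil(T/h);
[time, rk] = FixedRK(f, 0, x0, h, 4, N);
X = rk(1,:); Y = rk(2,:);
E1 = @(x) 8*x;          % eigenvector line for eigenvalue -4: y = 8x
E2 = @(x) x;            % eigenvector line for eigenvalue  3: y = x
xp = @(x) 4*x;          % x' = 0 nullcline: y = 4x
yp = @(x) (8/5)*x;      % y' = 0 nullcline: y = 8x/5
D = max(max(abs(X)), max(abs(Y)));
x = linspace(-D, D, 201);
plot(x, E1(x), '-r', x, E2(x), '-m', x, xp(x), '-b', x, yp(x), '-c', X, Y, '-k');
xlabel('x'); ylabel('y');
legend('E1', 'E2', 'x''=0', 'y''=0', 'y vs x', 'Location', 'Best');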
This generates Fig. 9.3. This matches the kind of qualitative analysis we have done
by hand, although when we do the plots by hand we get a more complete picture.
Here, we see only one plot instead of the many trajectories we would normally sketch.
We solve the IVP by finding the A and B that will give the desired initial conditions. This gives

\[
\begin{bmatrix} 4 \\ -2 \end{bmatrix}
= A \begin{bmatrix} 1 \\ -1 \end{bmatrix} + B \begin{bmatrix} 1 \\ -\tfrac{1}{9} \end{bmatrix}
\]

or

\[
4 = A + B, \qquad -2 = -A - \tfrac{1}{9}\,B.
\]

This is easily solved using elimination to give A = 7/4 and B = 9/4. The solution to the IVP is therefore

\[
\begin{bmatrix} x(t) \\ y(t) \end{bmatrix}
= \frac{7}{4} \begin{bmatrix} 1 \\ -1 \end{bmatrix} e^{-5t}
+ \frac{9}{4} \begin{bmatrix} 1 \\ -\tfrac{1}{9} \end{bmatrix} e^{3t}
= \begin{bmatrix} \tfrac{7}{4}\, e^{-5t} + \tfrac{9}{4}\, e^{3t} \\[4pt] -\tfrac{7}{4}\, e^{-5t} - \tfrac{1}{4}\, e^{3t} \end{bmatrix}
\]
We now have all the information needed to solve this numerically. The MatLab session
for this problem is then
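The session itself is not reproduced above. A minimal sketch follows; the coefficient matrix used in it is reconstructed from the eigenvalues −5, 3 and eigenvectors [1; −1], [1; −1/9] used above (which work out to A = [4 9; −1 −6]), so it is an inference, not a quote of the author's code.

% hedged sketch; A reconstructed from the eigen-data above
f = @(t,x) [4*x(1) + 9*x(2); -x(1) - 6*x(2)];
x0 = [4; -2];
h = 0.01; T = 1.2; N = ceil(T/h);   % T and h chosen by trial and error
[time, rk] = FixedRK(f, 0, x0, h, 4, N);
X = rk(1,:); Y = rk(2,:);
E1 = @(x) -x;                        % eigenvector line for eigenvalue -5
E2 = @(x) -x/9;                      % eigenvector line for eigenvalue  3
xp = @(x) -(4/9)*x;                  % x' = 0 nullcline
yp = @(x) -x/6;                      % y' = 0 nullcline
D = max(max(abs(X)), max(abs(Y)));
x = linspace(-D, D, 201);
plot(x, E1(x), '-r', x, E2(x), '-m', x, xp(x), '-b', x, yp(x), '-c', X, Y, '-k');
xlabel('x'); ylabel('y');
legend('E1', 'E2', 'x''=0', 'y''=0', 'y vs x', 'Location', 'Best');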
This generates Fig. 9.4. Again this matches the kind of qualitative analysis we
have done by hand, but for only one plot instead of the many trajectories we would
normally sketch. We had to choose the final time T and the step size h by trial and
error to generate the plot you see. If T is too large, the growth term in the solution
generates x and y values that are too big and the trajectory just looks like it lies on
top of the dominant eigenvector line.
9.3.1 Homework
Exercise 9.3.1
\[
\begin{bmatrix} x'(t) \\ y'(t) \end{bmatrix} = \begin{bmatrix} 1 & 1 \\ -1 & -3/2 \end{bmatrix} \begin{bmatrix} x(t) \\ y(t) \end{bmatrix},
\qquad
\begin{bmatrix} x(0) \\ y(0) \end{bmatrix} = \begin{bmatrix} -5 \\ -2 \end{bmatrix}
\]

Exercise 9.3.2

\[
\begin{bmatrix} x'(t) \\ y'(t) \end{bmatrix} = \begin{bmatrix} 4 & 2 \\ -9 & -5 \end{bmatrix} \begin{bmatrix} x(t) \\ y(t) \end{bmatrix},
\qquad
\begin{bmatrix} x(0) \\ y(0) \end{bmatrix} = \begin{bmatrix} -1 \\ 2 \end{bmatrix}
\]

Exercise 9.3.3

\[
\begin{bmatrix} x'(t) \\ y'(t) \end{bmatrix} = \begin{bmatrix} 3 & 7 \\ 4 & 6 \end{bmatrix} \begin{bmatrix} x(t) \\ y(t) \end{bmatrix},
\qquad
\begin{bmatrix} x(0) \\ y(0) \end{bmatrix} = \begin{bmatrix} -2 \\ -3 \end{bmatrix}
\]
Now let’s try to generate a real phase plane portrait by automating the phase plane
plots for a selection of initial conditions. Consider the code below which is saved in
the file AutoPhasePlanePlot.m.
There are some new elements here. We set up vectors u and v to construct our
initial conditions from. Each initial condition is of the form (u_i, v_j) and we use that
to set the initial condition x0 we pass into FixedRK() as usual. We start by telling
MatLab the plot we are going to build is a new one; so the previous plot should be
erased. The command hold all then tells MatLab to keep all the plots we generate
as well as the line colors and so forth until a hold off is encountered. So here we
generate a bunch of plots and we then see them on the same plot at the end! A typical
session usually requires a lot of trial and error. In fact, you should find the analysis by
hand is actually more informative! As discussed, the AutoPhasePlanePlot.m
script is used by filling in values for the inputs it needs. Again the script has these
inputs
So let’s start using some more MatLab tools. As always, with power comes an
increased need for responsible behavior!
We will now discuss certain ways to compute eigenvalues and eigenvectors for a
square matrix in MatLab. For a given A, we can compute its eigenvalues as follows:
A =
   1   2   3
   4   5   6
   7   8  -1

E = eig(A)
E =
  -0.3954
  11.8161
  -6.4206
So we have found the eigenvalues of this small 3 × 3 matrix. Note, in general they are
not returned in any sorted order like small to large. Bummer! To get the eigenvectors,
we do this:
[V, D] = eig(A)
V =
   0.7530  -0.3054  -0.2580
  -0.6525  -0.7238  -0.3770
   0.0847  -0.6187   0.8896

D =
  -0.3954        0        0
        0  11.8161        0
        0        0  -6.4206

This tells us the eigenvalue–eigenvector pairs are

\[
\lambda_1 = -0.3954, \quad V_1 = \begin{bmatrix} 0.7530 \\ -0.6525 \\ 0.0847 \end{bmatrix}; \qquad
\lambda_2 = 11.8161, \quad V_2 = \begin{bmatrix} -0.3054 \\ -0.7238 \\ -0.6187 \end{bmatrix}; \qquad
\lambda_3 = -6.4206, \quad V_3 = \begin{bmatrix} -0.2580 \\ -0.3770 \\ 0.8896 \end{bmatrix}.
\]
Here is a symmetric example:

B =
   1   2   3   4   5
   2   5   6   7   9
   3   6   1   2   3
   4   7   2   8   9
   5   9   3   9   6

[W, Z] = eig(B)
W =
   (the columns of W are the eigenvectors of B; the printed values are not reproduced here)
Z =
   0.1454        0        0        0        0
        0   2.4465        0        0        0
        0        0  -2.2795        0        0
        0        0        0  -5.9321        0
        0        0        0        0  26.6197
It is possible to show that the eigenvalues of a symmetric matrix will be real and
eigenvectors corresponding to distinct eigenvalues will be 90◦ apart. Such vectors
are called orthogonal and recall this means their inner product is 0. Let’s check it
out. The eigenvectors of our matrix are the columns of W above. So their dot product
should be 0!
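The command that produced the output below is not shown; presumably it was something like the following line (a guess, not the author's exact code):

% hedged guess at the check: inner product of the first two eigenvectors of B
C = W(:,1)' * W(:,2)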
C =
   1.3336e-16
Well, the dot product is not actually 0 because we are dealing with floating point
numbers here, but as you can see it is close to machine zero (the smallest number
our computer chip can detect). Welcome to the world of computing!
We have already solved linear models using MatLab tools. Now we will learn to do a bit more. We begin with a sample problem. Note that to analyze a linear systems model, we can do everything by hand, and we can then try to emulate the hand work using computational tools. A sketch of the process is thus:
1. For the system below, first do the work by hand.
\[
\begin{bmatrix} x'(t) \\ y'(t) \end{bmatrix} = \begin{bmatrix} 4 & -1 \\ 8 & -5 \end{bmatrix} \begin{bmatrix} x(t) \\ y(t) \end{bmatrix},
\qquad
\begin{bmatrix} x(0) \\ y(0) \end{bmatrix} = \begin{bmatrix} -3 \\ 1 \end{bmatrix}
\]
D = max(xtop, ytop);
x = linspace(-D, D, 201);
plot(x, E1(x), '-r', x, E2(x), '-m', x, xp(x), '-b', x, yp(x), '-c', X, Y, '-k');
xlabel('x');
ylabel('y');
title('Phase Plane for Linear System x'' = 4x-y, y''=8x-5y, x(0) = -3, y(0) = 1');
legend('E1', 'E2', 'x''=0', 'y''=0', 'y vs x', 'Location', 'Best');
A =
   4  -1
   8  -5

[V, D] = eigs(A)
V =
   0.1240   0.7071
   0.9923   0.7071

D =
  -4   0
   0   3
you should be able to see that the eigenvectors we got before by hand are the
same as these except they are written as vectors of length one.
4. Now plot many trajectories at the same time. We discussed this a bit earlier. It is
very important to note that the hand analysis is in many ways easier. A typical
MatLab session usually requires a lot of trial and error! So try not to get too
frustrated. The AutoPhasePlanePlot.m script is used by filling in values
for the inputs it needs.
Now some attempts.
9.5.3 Project
For this project, follow the outline just discussed above to solve any one of these
models. Then
Solve the Model By Hand: Do this and attach to your project report.
Plot One Trajectory Using MatLab: Follow the outline above. This part of the
report is done in a word processor with appropriate comments, discussion etc.
Make sure you document all of your MatLab work thoroughly! Show your MatLab
code and sessions as well as plots.
Find the Eigenvalues and Eigenvectors in MatLab: Explain how the MatLab
work connects to the calculations we do by hand.
Plot Many Trajectories Simultaneously Using MatLab: This part of the report
is also done in a word processor with appropriate comments, discussion etc. Show
your MatLab code and sessions as well as plots with appropriate documentation.
Exercise 9.5.1

\[
\begin{bmatrix} x'(t) \\ y'(t) \end{bmatrix} = \begin{bmatrix} 1 & 5 \\ 5 & 1 \end{bmatrix} \begin{bmatrix} x(t) \\ y(t) \end{bmatrix},
\qquad
\begin{bmatrix} x(0) \\ y(0) \end{bmatrix} = \begin{bmatrix} 4 \\ 10 \end{bmatrix}
\]

Exercise 9.5.2

\[
\begin{bmatrix} x'(t) \\ y'(t) \end{bmatrix} = \begin{bmatrix} 1 & 1 \\ -1 & -3/2 \end{bmatrix} \begin{bmatrix} x(t) \\ y(t) \end{bmatrix},
\qquad
\begin{bmatrix} x(0) \\ y(0) \end{bmatrix} = \begin{bmatrix} -4 \\ -6 \end{bmatrix}
\]

Exercise 9.5.3

\[
\begin{bmatrix} x'(t) \\ y'(t) \end{bmatrix} = \begin{bmatrix} 4 & 2 \\ -9 & -5 \end{bmatrix} \begin{bmatrix} x(t) \\ y(t) \end{bmatrix},
\qquad
\begin{bmatrix} x(0) \\ y(0) \end{bmatrix} = \begin{bmatrix} 2 \\ -5 \end{bmatrix}
\]

Exercise 9.5.4

\[
\begin{bmatrix} x'(t) \\ y'(t) \end{bmatrix} = \begin{bmatrix} 3 & 7 \\ 4 & 6 \end{bmatrix} \begin{bmatrix} x(t) \\ y(t) \end{bmatrix},
\qquad
\begin{bmatrix} x(0) \\ y(0) \end{bmatrix} = \begin{bmatrix} -2 \\ 5 \end{bmatrix}
\]

\[
\begin{bmatrix} x'(t) \\ y'(t) \end{bmatrix} = \begin{bmatrix} 1 & 5 \\ 5 & 1 \end{bmatrix} \begin{bmatrix} x(t) \\ y(t) \end{bmatrix},
\qquad
\begin{bmatrix} x(0) \\ y(0) \end{bmatrix} = \begin{bmatrix} 2 \\ 3 \end{bmatrix}
\]
Here is an enhanced version of the automatic phase plane plot tool. It would be nice
to automate the plotting of the eigenvector lines, the nullclines and the trajectories so
that we didn’t have to do so much work by hand. Consider the new function in Listing
9.17. In this function, we pass in vecfunc into the argument fname. We evaluate
this function using the command feval(fname,0,[a time; an x]);. The
linear model has a coefficient matrix A of the form
\[
A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}
\]
xp = @(x) -(a/b)*x;
yp = @(x) -(c/d)*x;
% clear out any old pictures
clf
% set up x and y initial condition box
xic = linspace(xmin, xmax, xboxsize);
yic = linspace(ymin, ymax, yboxsize);
% find all the trajectories and store them
Approx = {};
for i = 1:xboxsize
  for j = 1:yboxsize
    x0 = [xic(i); yic(j)];
    [ht, rk] = FixedRK(fname, 0, x0, stepsize, 4, n);
    Approx{i,j} = rk;
    U = Approx{i,j};
    X = U(1,:);
    Y = U(2,:);
    % get the plotting square for each trajectory
    umin = min(X);
    umax = max(X);
    utop = max(abs(umin), abs(umax));
    vmin = min(Y);
    vmax = max(Y);
    vtop = max(abs(vmin), abs(vmax));
    D(i,j) = max(utop, vtop);
  end
end
% get the largest square to put all the plots into
E = max(max(D))
% setup the x linspace for the plot
x = linspace(-E, E, 201);
% start the hold
hold on
% plot the eigenvector lines and then the nullclines
plot(x, E1line(x), '-r', x, E2line(x), '-m', x, xp(x), '-b', x, yp(x), '-c');
% loop over all the ICs and get all trajectories
for i = 1:xboxsize
  for j = 1:yboxsize
    U = Approx{i,j};
    X = U(1,:);
    Y = U(2,:);
    plot(X, Y, '-k');
  end
end
% set labels and so forth
xlabel('x');
ylabel('y');
title('Phase Plane');
legend('E1', 'E2', 'x''=0', 'y''=0', 'y vs x', 'Location', 'BestOutside');
% set zoom for plot using mag
axis([-E*mag E*mag -E*mag E*mag]);
% finish the hold
hold off;
end
Recall we pass the dynamics function in the argument fname. To extract A, we use the following lines of code. Note, for a linear model, f = @(t,x) [a*x(1) + b*x(2); c*x(1)+d*x(2)];. Hence, we can do evaluations to find the coefficients: f(0,[1;0]) = [a;c] and f(0,[0;1]) = [b;d].
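In code, this extraction might look like the following sketch; the variable names here are illustrative, not the script's actual names.

% hedged sketch: recover the coefficient matrix A from the linear dynamics in fname
col1 = feval(fname, 0, [1; 0]);   % equals [a; c] for a linear model
col2 = feval(fname, 0, [0; 1]);   % equals [b; d] for a linear model
A = [col1, col2];                 % so A = [a b; c d]
[V, DD] = eig(A);                 % eigenvectors are the columns of V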
Next, we set the first column of V to be the eigenvector E1 and the second column of V to be the eigenvector E2. Then we set up the lines we need to plot the eigenvectors and the nullclines. The x' = 0 nullcline is ax + by = 0, or y = −(a/b)x, and the y' = 0 nullcline is cx + dy = 0, or y = −(c/d)x.
Then clear any previous figures with clf before we get started with the plot.
The initial conditions are chosen by dividing the interval [xmin, xmax] into
xboxsize points. Similarly, we divide [ymin, ymax] into yboxsize points.
We use these xboxsi ze × yboxsi ze possible pairs as our initial conditions.
Listing 9.21: Set up Initial conditions, find trajectories and the bounding boxes
% set up possible x coordinates of the ICs
xic = linspace(xmin, xmax, xboxsize);
% set up possible y coordinates of the ICs
yic = linspace(ymin, ymax, yboxsize);
% set up a data structure called a cell to store
% each trajectory
Approx = {};
% loop over all possible initial conditions
for i = 1:xboxsize
  for j = 1:yboxsize
    % set the IC
    x0 = [xic(i); yic(j)];
    % solve the model using RK4;
    % return the approximate values in rk
    [ht, rk] = FixedRK(fname, 0, x0, stepsize, 4, n);
    % store rk as the i,j th entry in the cell Approx
    Approx{i,j} = rk;
    % set U to be the current Approx cell entry
    % which is the same as the returned rk
    U = Approx{i,j};
    % set the first row of U to be X
    % set the second row of U to be Y
    X = U(1,:);
    Y = U(2,:);
    % find the square the trajectory fits inside
    umin = min(X);
    umax = max(X);
    utop = max(abs(umin), abs(umax));
    vmin = min(Y);
    vmax = max(Y);
    vtop = max(abs(vmin), abs(vmax));
    % store the size of the square that fits the trajectory
    % for this IC
    D(i,j) = max(utop, vtop);
  end
end
Now that we have all these squares for the possible trajectories, we find the biggest
one possible and set up the linspace command for this box. All of our trajectories
will be drawn inside the square [−E, E] × [−E, E].
Listing 9.22: Set up the bounding box for all the trajectories
E = max(max(D))
x = linspace(-E, E, 201);
Next, plot all the trajectories and set labels and so forth. The last thing we do is to set the axis as axis([-E*mag E*mag -E*mag E*mag]); which zooms in on the square [−E·mag, E·mag] × [−E·mag, E·mag].
9.6.1 Project
Exercise 9.6.1
\[
\begin{bmatrix} x'(t) \\ y'(t) \end{bmatrix} = \begin{bmatrix} 1 & 1 \\ -1 & -3/2 \end{bmatrix} \begin{bmatrix} x(t) \\ y(t) \end{bmatrix}
\]

Exercise 9.6.2

\[
\begin{bmatrix} x'(t) \\ y'(t) \end{bmatrix} = \begin{bmatrix} 4 & 2 \\ -9 & -5 \end{bmatrix} \begin{bmatrix} x(t) \\ y(t) \end{bmatrix}
\]

Exercise 9.6.3

\[
\begin{bmatrix} x'(t) \\ y'(t) \end{bmatrix} = \begin{bmatrix} 3 & 7 \\ 4 & 6 \end{bmatrix} \begin{bmatrix} x(t) \\ y(t) \end{bmatrix}
\]

Exercise 9.6.4

\[
\begin{bmatrix} x'(t) \\ y'(t) \end{bmatrix} = \begin{bmatrix} 1 & 5 \\ 5 & 1 \end{bmatrix} \begin{bmatrix} x(t) \\ y(t) \end{bmatrix}
\]
In general, although it is easy to write down a two protein system, it is a bit harder to
make sure you get positive values for the long term protein concentration levels. So as
we were making up problems, we used a MatLab script to help get a good coefficient
matrix. This is an interesting use of Matlab and for those of you who would like to dig
more deeply into this aspect of modeling, we encourage you to study closely what
we do. The terms that inhibit or enhance the other protein’s production in general
are a c1 y in the x dynamics and a c2 x in the y dynamics. Note, we can always think
of a two dimensional vector like this
\[
\begin{bmatrix} c_1 \\ c_2 \end{bmatrix} = c_1 \begin{bmatrix} 1 \\ c_2/c_1 \end{bmatrix}
\]
We can do a similar thing for the constant production terms d_1 for x and d_2 for y to write

\[
\begin{bmatrix} d_1 \\ d_2 \end{bmatrix} = d_1 \begin{bmatrix} 1 \\ d_2/d_1 \end{bmatrix} = d_1 \begin{bmatrix} 1 \\ s \end{bmatrix} = \begin{bmatrix} \beta \\ s\beta \end{bmatrix}
\]

for d_1 = β and s = d_2/d_1. Finally, we can handle the two decay rates for x and y similarly. If these two rates are −α_1 for x and −α_2 for y, we can model that as

\[
\begin{bmatrix} \alpha_1 \\ \alpha_2 \end{bmatrix} = \alpha_1 \begin{bmatrix} 1 \\ \alpha_2/\alpha_1 \end{bmatrix} = \alpha_1 \begin{bmatrix} 1 \\ u \end{bmatrix} = \begin{bmatrix} \alpha \\ u\alpha \end{bmatrix}
\]

for α_1 = α and u = α_2/α_1. So, if u is a lot more than 1, y will decay fast and we
would expect x to take a lot longer to reach equilibrium. Our general two protein model will then have the form

\[
\begin{aligned}
x' &= -\alpha\, x - \gamma\, y + \beta\\
y' &= -u\alpha\, y + s\beta - t\gamma\, x
\end{aligned}
\]

and we want to choose these parameters so we get positive protein levels. The coefficient matrix A here is

\[
A = \begin{bmatrix} -\alpha & -\gamma \\ -t\gamma & -u\alpha \end{bmatrix}
\]
For A to have negative eigenvalues, so the homogeneous solutions decay, we need det(A) = uα² − tγ² > 0 (the trace −α − uα is already negative). This implies

\[
t < \frac{u\alpha^2}{\gamma^2}.
\]
Next, we set up the growth levels. We want the particular solution to have positive components. If we let the particular solution be X_p, we know X_p = −A^{-1} F where F is the vector of constant external inputs. Hence, we want −A^{-1} F to have positive components. Thus,

\[
-\frac{1}{\det(A)} \begin{bmatrix} -u\alpha & \gamma \\ t\gamma & -\alpha \end{bmatrix} \begin{bmatrix} \beta \\ s\beta \end{bmatrix}
> \begin{bmatrix} 0 \\ 0 \end{bmatrix}.
\]

Since det(A) > 0, this requires

\[
-u\alpha\beta + s\beta\gamma < 0, \qquad t\beta\gamma - s\alpha\beta < 0.
\]

Cancelling β, we find

\[
-u\alpha + s\gamma < 0 \;\Rightarrow\; s\gamma < u\alpha, \qquad
t\gamma - s\alpha < 0 \;\Rightarrow\; t\gamma < s\alpha.
\]

Plugging in t, we want

\[
s\gamma < u\alpha \qquad\text{and}\qquad 0.5\, u\, \frac{\alpha^2}{\gamma} < s\alpha.
\]

Hence, any s strictly between 0.5 uα/γ and uα/γ will work. Parameterize this range as s = u(α/γ)(1 − 0.5z), which at z = 0 gives uα/γ and at z = 1 gives 0.5 uα/γ. We will choose z = 1/3. The commands in Matlab are then
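Those commands are not shown above; a minimal sketch of the parameter choices just described, using the values from the example later in this section, might be:

% hedged sketch of the parameter choices described above
alpha = 0.001; gamma = 0.001; u = 8; beta = 10;
t = 0.5*alpha^2*u/(gamma^2);      % guarantees t < u*alpha^2/gamma^2
z = 1/3;
s = (1 - 0.5*z)*alpha*u/gamma;    % s between 0.5*u*alpha/gamma and u*alpha/gamma
A = [-alpha, -gamma; -t*gamma, -u*alpha];
F = [beta; s*beta];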
We can easily write MatLab code to do all of this in a function we'll call twoprotsGetAF. It will return a good A and F for our choices of α, γ, u and β.
Listing 9.26: Finding the A and F for a two protein model: uncommented
function [A, F] = twoprotsGetAF(alpha, gamma, u, beta)
%
% -alpha is decay rate for protein x
% -gamma is decay rate for protein y influencing protein x
% -t*gamma is decay rate for protein x influencing protein y
% -u*alpha is decay rate for protein y
% beta is the growth rate for protein x
% s*beta is the growth rate for protein y
% A is the dynamics
%
t = 0.5*alpha^2*u/(gamma^2)
A = [-alpha, -gamma; -t*gamma, -u*alpha];
% set up growth levels
z = 1/3;
s = (1 - 0.5*z)*alpha*u/gamma
F = [beta; s*beta];
The section above shows us how to design the two protein system so the protein levels are always positive, so let's try it out. Let α = γ = a for any positive a and u = 8. Then for our nice choice of t, we could do the work by hand and find

\[
A = \begin{bmatrix} -a & -a \\ -4a & -8a \end{bmatrix}
\]

But let's be smarter and let MatLab do this for us! We can do this for any positive a and generate a reasonable two protein model to play with. In MatLab, we begin by setting up the matrix A with twoprotsGetAF. We will do this for β = 10.
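The call that produced the output below is not shown; consistent with that output, it would have been something like:

% hedged reconstruction of the call, consistent with the printed A and F below
[A, F] = twoprotsGetAF(0.001, 0.001, 8, 10)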
A =
  -0.0010000  -0.0010000
  -0.0040000  -0.0080000

F
F =
   10.000
   66.667
Then we find the inverse of A using the formulae we developed in class for the inverse of a 2 × 2 matrix of numbers:

\[
A^{-1} = \begin{bmatrix} -2000 & 250 \\ 1000 & -250 \end{bmatrix}.
\]

The particular solution is then

\[
X_P = -A^{-1} F = \begin{bmatrix} 3333.67 \\ 6666.67 \end{bmatrix}.
\]
Now we find the eigenvalues and eigenvectors of the matrix A using the eig command as we have done before. Recall, this command returns the eigenvectors as columns of a matrix V and the eigenvalues as the diagonal entries of the matrix D.

V =
   0.88316   0.13163
  -0.46907   0.99130

D =
Diagonal Matrix
  -4.6887e-04            0
            0  -8.5311e-03
Notice that since eigenvector one and eigenvector two are the columns of the matrix V, we can rewrite the general solution in matrix–vector form as

\[
\begin{bmatrix} x(t) \\ y(t) \end{bmatrix}
= V \begin{bmatrix} C_1 e^{\lambda_1 t} \\ C_2 e^{\lambda_2 t} \end{bmatrix} + X_P.
\]

Our protein models start with zero levels of proteins, so to satisfy the initial conditions, we have to solve

\[
\begin{bmatrix} 0 \\ 0 \end{bmatrix}
= C_1 \begin{bmatrix} 0.88316 \\ -0.46907 \end{bmatrix}
+ C_2 \begin{bmatrix} 0.13163 \\ 0.99130 \end{bmatrix}
+ \begin{bmatrix} 3333.67 \\ 6666.67 \end{bmatrix},
\]

which gives

\[
\begin{bmatrix} C_1 \\ C_2 \end{bmatrix} = \begin{bmatrix} -2589.4 \\ -7950.4 \end{bmatrix}.
\]
Then we can construct the solutions x(t) and y(t) so we can plot the protein concen-
trations as a function of time. First, store the eigenvalues in a more convenient form
by grabbing the diagonal entries of D using Ev = diag(D). Then, we construct
the solutions like this:
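The construction itself is not reproduced above; a minimal sketch, assuming V, D, C and the particular solution (stored here as Xp) are as computed above, is:

% hedged sketch of constructing the solutions from V, D, C and Xp above
Ev = diag(D);                     % eigenvalues as a column vector
x = @(t) V(1,1)*C(1)*exp(Ev(1)*t) + V(1,2)*C(2)*exp(Ev(2)*t) + Xp(1);
y = @(t) V(2,1)*C(1)*exp(Ev(1)*t) + V(2,2)*C(2)*exp(Ev(2)*t) + Xp(2);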
In summary, to solve the two protein problem, you would just type a few lines in
MatLab as follows:
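Those few lines are not reproduced; a compact hedged sketch of the whole pipeline, under the same assumptions as above, might be:

% hedged summary sketch of the two protein calculation
[A, F] = twoprotsGetAF(0.001, 0.001, 8, 10);
Xp = -A\F;              % long term protein levels
[V, D] = eig(A);
C = V \ (-Xp);          % constants giving zero initial protein levels
Ev = diag(D);
% x(t) and y(t) are then built from V, C, Ev and Xp as in the sketch above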
The last thing is to think about response times. Here x grows a lot slower by design, as we set u = 8, and so we could temporarily think of x as ≈ 0, giving the y dynamics y' ≈ −uα y + sβ. Thus, the response time for y is about t_r^y = ln(2)/(uα), which is about 86.6 here.
Note the large difference in time scales! The protein levels we converge to are x = 3333.67 and y = 6666.67, but the response time for the x protein is much longer than the response time for the y protein. So if we plot over different time scales, we should see primarily y protein until we reach a suitable fraction of the x protein response time. Figure 9.11 shows a plot over a short time scale, 10 t_r^y. Over this short time scale, protein y is essentially saturated but protein x is only about 1/3 of its equilibrium value.
Figure 9.12 shows a plot over a long time scale, 10 t_r^x, which is about 80 t_r^y. This allows x to get close to its asymptotic value. Note the long term value of y drops some from its earlier peak. Our response times here are just approximations and the
Fig. 9.11 Phase plane for two interacting proteins on a short time scale
Fig. 9.12 Phase plane for two interacting proteins on a long time scale
protein interactions of the model eventually take effect and the equilibrium value of y drops to its final particular solution value.
9.7.3 Project
Now we are ready for the project. For the model choice of α = 0.003, γ = 0.005
and u = 8 and β = 0.01, we expect x to grow a lot slower. Your job is to follow the discussion above and generate a nice report. The report has this format.
Introduction (5 Points) Explain what we are trying to do here with this model.
Contrast this model with two proteins to our earlier model for one protein.
Description of Model (10 Points) Describe carefully what each term in the model
represents in terms of protein transcription. Since these models are abstractions
of reality, explain how these models are a simplified version of the protein tran-
scription process.
Annotated Solution Discussion (27 Points) In this part, you solve the model using
MatLab as I did in the example above. As usual, explain your steps nicely. Cal-
culate the approximate response times for protein x and protein y and use them
to generate appropriate plots. For each plot you generate, provide useful labels,
legends and titles and include them in your document.
Conclusion (5 Points) Discuss what you have done here. Do you think you could
do something similar for more than two proteins using MatLab?
References (3 Points) Put any reference material you use in here.
Now do it all again, but this time set β = 2 with u = 0.1. This should switch the
short term and long term protein behavior.
This is a simple model of how the ocean responds to the various inputs that give rise
to greenhouse warming. The ocean is modeled as two layers: the top layer is shallow
and is approximately 100 m in depth while the second layer is substantially deeper
as it is on average 4000 m in depth. The top layer is called the mixed layer as it
interacts with the atmosphere and so there is a transfer of energy back and forth from
the atmosphere to the top layer. The bottom layer is the deep layer and it exchanges
energy only with the mixed layer. The model we use is as follows:
\[
\begin{aligned}
C_m\, T_m' &= F - \lambda_1 T_m - \lambda_2 (T_m - T_d)\\
C_d\, T_d' &= \lambda_2 (T_m - T_d)
\end{aligned}
\]
There are a lot of parameters here and they all have a physical meaning.
t: This is time.
Tm : This is the temperature of the mixed layer. We assume it is measured as the
deviation from the mixed layer’s equilibrium temperature. Hence, Tm = 1 would
mean the temperature has risen 1◦ from its equilibrium temperature. We therefore
set Tm = 0 initially and when we solve our model, we will know how much Tm
has gone up.
Td : This is the temperature of the deep layer. Again, this is measured as the
difference from the equilibrium temperature of the deep layer. We set Td = 0
initially also.
F: This is the external input which represents all the outside things that contribute
to global warming such as CO2 release and so forth. It does not have to be a constant
but in our project we will use a constant value for F.
Cm : This is the heat capacity of the mixed layer.
Cd : This is the heat capacity of the deep layer.
λ1 : This is the exchange coefficient which determines the rate at which heat is
transferred from the mixed layer to the atmosphere.
λ2 : This is the exchange coefficient which determines the rate at which heat is
transferred from the mixed layer to the deep layer.
Looking at the mixed layer dynamics, we see there are two loss terms for the mixed
layer temperature. The first is based on the usual exponential decay model using the
exchange coefficient λ1 which specifies how fast heat is transferred from the mixed
layer to the atmosphere. The second loss term models how heat is transferred from the
mixed layer to the deep layer. We assume this rate is proportional to the temperature
difference between the layers which is why this loss is in terms of Tm − Td . The
deep layer temperature dynamics is all growth as the deep layer is picking up energy
from the mixed layer above it. Note this type of modeling—a loss in Tm and the
loss written as a gain for Td —is exactly what we will do in the SIR disease model
that comes later. Our reference for this model is the nice book on climate modeling
(Vallis 2012). It is a book all of you can read with profit as it uses mathematics you
now know very well. We can rewrite the dynamics into the standard form with a little
manipulation.
\[
\begin{bmatrix} T_m' \\ T_d' \end{bmatrix}
= \begin{bmatrix} -(\lambda_1 + \lambda_2)/C_m & \lambda_2/C_m \\ \lambda_2/C_d & -\lambda_2/C_d \end{bmatrix}
\begin{bmatrix} T_m \\ T_d \end{bmatrix}
+ \begin{bmatrix} F/C_m \\ 0 \end{bmatrix}
\]
Hence, this model is another of our standard linear models with an external input such as we solved in the two protein model. The A matrix here is

\[
A = \begin{bmatrix} -(\lambda_1 + \lambda_2)/C_m & \lambda_2/C_m \\ \lambda_2/C_d & -\lambda_2/C_d \end{bmatrix}
\]

and letting F denote the external input vector and T denote the vector of layer temperatures, we see the two box climate model is represented by our usual dynamics T' = A T + F which we know how to solve.
Note the particular solution here is T_P = −A^{-1} F:

\[
T_P = -\frac{1}{\det(A)}
\begin{bmatrix} -\lambda_2/C_d & -\lambda_2/C_m \\ -\lambda_2/C_d & -(\lambda_1 + \lambda_2)/C_m \end{bmatrix}
\begin{bmatrix} F/C_m \\ 0 \end{bmatrix}
\]
So, if both eigenvalues of this model are negative, the long term equilibrium of both the mixed and deep layers is the same: T_d^∞ = T_m^∞ = F/λ_1. Next, we show the eigenvalues are indeed negative here.
For convenience of exposition, let α = λ1 /Cm , β = λ2 /Cm and γ = λ2 /Cd .
Also, it is helpful to express Cd as a multiple of Cm , so we write Cd = ρCm where
ρ is our multiplier. With these changes, the model can be rewritten. We find

\[
\begin{bmatrix} T_m' \\ T_d' \end{bmatrix}
= \begin{bmatrix} -(\alpha + \beta) & \beta \\ \gamma & -\gamma \end{bmatrix}
\begin{bmatrix} T_m \\ T_d \end{bmatrix}
+ \begin{bmatrix} F/C_m \\ 0 \end{bmatrix}
\]

But since γ = λ_2/C_d and C_d = ρ C_m, we have γ = β/ρ. This gives the new form

\[
\begin{bmatrix} T_m' \\ T_d' \end{bmatrix}
= \begin{bmatrix} -(\alpha + \beta) & \beta \\ \beta/\rho & -\beta/\rho \end{bmatrix}
\begin{bmatrix} T_m \\ T_d \end{bmatrix}
+ \begin{bmatrix} F/C_m \\ 0 \end{bmatrix}
\]

Hence, any two box climate model can be represented by the dynamics T' = A T + F where

\[
A = \begin{bmatrix} -\alpha - \beta & \beta \\ \beta/\rho & -\beta/\rho \end{bmatrix}
\]
The characteristic equation of A is

\[
\lambda^2 + \Bigl(\alpha + \beta + \frac{\beta}{\rho}\Bigr)\lambda + \frac{\alpha\beta}{\rho} = 0,
\qquad\text{so}\qquad
\lambda = \frac{-\bigl(\alpha + \beta + \beta/\rho\bigr) \pm \sqrt{D}}{2}.
\]

The eigenvalue corresponding to the minus sign is clearly negative. Next, look at the other root. The discriminant D is the term inside the square root. Let's show it is positive and that will help us show the other root is negative also. We have

\[
\begin{aligned}
D &= \bigl(\alpha + \beta + \beta/\rho\bigr)^2 - 4\alpha\beta/\rho\\
&= (\alpha + \beta)^2 + 2(\alpha + \beta)(\beta/\rho) + \beta^2/\rho^2 - 4\alpha\beta/\rho\\
&= (\alpha + \beta)^2 - 2\alpha\beta/\rho + 2\beta^2/\rho + \beta^2/\rho^2\\
&= \alpha^2 + 2\alpha\beta + \beta^2 + 2\beta^2/\rho + \beta^2/\rho^2 - 2\alpha\beta/\rho.
\end{aligned}
\]

So we see D > 0: the terms 2αβ − 2αβ/ρ = 2αβ(1 − 1/ρ) are nonnegative when ρ ≥ 1 (the deep layer has the larger heat capacity), and every other term is positive. Moreover, D = (α + β + β/ρ)² − 4αβ/ρ < (α + β + β/ρ)², so √D < α + β + β/ρ and the root with the plus sign is negative as well. Hence, the eigenvalues r_1 and r_2 are both negative and the general solution is
\[
\begin{bmatrix} T_m \\ T_d \end{bmatrix}
= a\, E_1\, e^{r_1 t} + b\, E_2\, e^{r_2 t} + \begin{bmatrix} F/\lambda_1 \\ F/\lambda_1 \end{bmatrix}
\]
This is our familiar protein synthesis model with steady state value T̂_m^∞ = F/(λ_1 + λ_2) and response time t_R^m = C_m ln(2)/(λ_1 + λ_2). After sufficient response times have passed for the mixed layer temperature to reach quasi equilibrium, we will have T_m' ≈ 0 as T_m is no longer changing by much. Hence, setting T_m' = 0, we find a relationship between T_m and T_d once this new equilibrium is achieved. We have

\[
-(\lambda_1 + \lambda_2) T_m + F + \lambda_2 T_d = 0,
\]

which tells us that T_m = (λ_2 T_d + F)/(λ_1 + λ_2). Substitute this value of T_m into the T_d dynamics and we will get the final equilibrium value of T_d for time much larger than t_R^m. The deep layer temperature dynamics now become

\[
C_d\, T_d' = \lambda_2 \Bigl( \frac{\lambda_2 T_d + F}{\lambda_1 + \lambda_2} - T_d \Bigr)
= \frac{\lambda_2}{\lambda_1 + \lambda_2}\bigl( F - \lambda_1 T_d \bigr).
\]
This is also a standard protein synthesis problem with a steady state value of

\[
\hat{T}_d^\infty = F\, \frac{\lambda_2}{C_d(\lambda_1 + \lambda_2)} \Bigm/ \frac{\lambda_1 \lambda_2}{C_d(\lambda_1 + \lambda_2)}
= \frac{F}{\lambda_1}
\]

with response time

\[
t_R^d = \frac{\ln(2)\, C_d\, (\lambda_1 + \lambda_2)}{\lambda_1 \lambda_2},
\]
and as t gets large, both temperatures approach the common steady state value of
F/λ1 .
So with a little reasoning and a lot of algebra we find the two response times of
the climate model are t Rm and t Rd as given above. To see these results in a specific
problem, we wrote a MatLab script to solve a typical climate problem. For the values
λ1 = 0.1, λ2 = 0.15000, Cm = 1.0820, Cd = 4.328 = 4Cm so that ρ = 4 and an
external input of F = 0.5, we can compute the solutions to the climate model as
follows
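The script itself is not included above; here is a minimal sketch consistent with those parameter values (assumptions: variable names and the plotting window are illustrative).

% hedged sketch of a two box climate session for the stated parameter values
lambda1 = 0.1; lambda2 = 0.15; Cm = 1.0820; Cd = 4.328; F = 0.5;
A = [-(lambda1+lambda2)/Cm, lambda2/Cm; lambda2/Cd, -lambda2/Cd];
Fvec = [F/Cm; 0];
Tp = -A\Fvec;                 % long term temperatures, both equal to F/lambda1
[V, D] = eig(A);
C = V \ (-Tp);                % zero initial temperature deviations
Ev = diag(D);
Tm = @(t) V(1,1)*C(1)*exp(Ev(1)*t) + V(1,2)*C(2)*exp(Ev(2)*t) + Tp(1);
Td = @(t) V(2,1)*C(1)*exp(Ev(1)*t) + V(2,2)*C(2)*exp(Ev(2)*t) + Tp(2);
tRm = Cm*log(2)/(lambda1+lambda2)                       % about 3 years
tRd = log(2)*Cd*(lambda1+lambda2)/(lambda1*lambda2)     % about 50 years
t = linspace(0, 200, 401);
plot(t, Tm(t), '-b', t, Td(t), '-r');
xlabel('years'); ylabel('temperature rise');
legend('T_m', 'T_d', 'Location', 'Best');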
We see the approximate response time of the mixed layer is only 3 years but the
approximate response time for the deep layer is about 50 years. This MatLab session
generates three plots as shown in Figs. 9.13, 9.14 and 9.15.
9.8.1 Project
Now we are ready for the project. Solve the two box climate model with λ1 = 0.01,
λ2 = 0.02000, Cm = 0.34625, ρ = 2.2222 (so that Cd = ρCm ) and an external
input of F = 0.06. For this model, follow what I did in the example and generate a
report in word as follows:
Introduction (8 Points) Explain what we are trying to do here with this model.
The project description here tells you how this model works. Make sure you
understand all of the steps so that you can explain what this model really means
for policy decisions on global warming.
Description of Model (12 Points) Find additional references and use them to build
a two page description of this kind of simple climate model. Pay particular atten-
tion to a discussion of what the parameters in the model mean and what the
There is talk currently about strategies to reduce carbon loading via a genetically
engineered bacteria or a nano machine. Such a strategy is called carbon sequestering
and it is not clear if this is possible. If you have read or watched much science fiction,
you’ll recognize carbon sequestering as a type of terraforming. This is certainly
both ambitious and no doubt a plan having a lot of risk that needs to be thought
about carefully. However, if we assume such a control strategy for carbon loading
is implemented say 25 years in the future, what does our simple model say? The analysis is straightforward. At 25 years, our model reaches values T_m^{25} and T_d^{25} which we know are far from the true equilibrium values of ≈ 5° of global warming in our example. If the sequestering strategy were instantly successful, this would correspond to setting F = 0 in our external data. This gives a straight exponential decay system

\[
\begin{bmatrix} T_m' \\ T_d' \end{bmatrix}
= \begin{bmatrix} -(\lambda_1 + \lambda_2)/C_m & \lambda_2/C_m \\ \lambda_2/C_d & -\lambda_2/C_d \end{bmatrix}
\begin{bmatrix} T_m \\ T_d \end{bmatrix},
\qquad
\begin{bmatrix} T_m(25) \\ T_d(25) \end{bmatrix}
= \begin{bmatrix} T_m^{25} \\ T_d^{25} \end{bmatrix}.
\]
The eigenvalues and eigenvectors for this system are the same as before, and we know the slow response time is a half life of about 50 years:

\[
t_R^d \approx \frac{\ln(2)\, C_d\, (\lambda_1 + \lambda_2)}{\lambda_1 \lambda_2} \approx 50.
\]
It doesn’t really matter at what point carbon loading stops. So we can do this analysis
whether carbon loading stops 25 years in the future or 50 years. The deep layer still
takes a long time to return to a temperature of 0 which for us represents current
temperature and thus no global warming. Note the half life t Rd ≈ 50 is what drives
this. The constants λ1 and λ2 represent exchange rates between the mixed layer and
the atmosphere and the deep layer and the mixed layer. The constant Cd is the heat
capacity of the deep layer. If we also assumed carbon sequestering continued even
after carbon loading was shut off, perhaps this would correspond to increasing the
two critical parameters λ1 and λ2 and decreasing Cd . Since
\[
t_R^d \approx \ln(2)\, C_d \Bigl( \frac{1}{\lambda_1} + \frac{1}{\lambda_2} \Bigr),
\]
such changes would decrease t Rd and bring the deep layer temperature to 0 faster. It
is conceivable administering nano machines and gene engineered bacteria might do
this but it is a strategy that would alter fundamental balances that we have had in
place for millions of years. Hence, it is hard to say if this is wise; certainly further
study is needed. But the bottom line is the carbon sequestering strategies will not
easily return the deep layer to a state of no global increase in temperature quickly.
Note we could also assume the carbon sequestering control strategy alters carbon loading as a simple exponential decay F e^{−εt} for some positive ε. Then our model becomes

\[
\begin{bmatrix} T_m' \\ T_d' \end{bmatrix}
= \begin{bmatrix} -(\lambda_1 + \lambda_2)/C_m & \lambda_2/C_m \\ \lambda_2/C_d & -\lambda_2/C_d \end{bmatrix}
\begin{bmatrix} T_m \\ T_d \end{bmatrix}
+ \begin{bmatrix} F e^{-\epsilon t} \\ 0 \end{bmatrix},
\qquad
\begin{bmatrix} T_m(25) \\ T_d(25) \end{bmatrix}
= \begin{bmatrix} T_m^{25} \\ T_d^{25} \end{bmatrix},
\]
which we could solve numerically. Here, the carbon loading would not instanta-
neously drop to zero at some time like 25; instead it decays gracefully. However, this
still does not change the fact that the response time is determined by the coefficient
matrix A and so the return to a no global warming state will be slow. Finally, it is
sobering to think about how all of this analysis plays out in the backdrop of political
structures that remain in authority for about 4 years or so. We can see how hard it is
to elicit change when the results won’t show up for 10–40 administrations!
Reference
G. Vallis (ed.), Climate and the Oceans, Princeton Primers on Climate (Princeton University Press,
Princeton, 2012)
Part IV
Interesting Models
Chapter 10
Predator–Prey Models
In the 1920s, the Italian biologist Umberto D’Ancona studied population variations
of various species of fish that interact with one another. He came across the data
shown in Table 10.1.
Here, we interpret the percentage we see in column two of Table 10.1 as predator
fish, such as sharks, skates and so forth. Also, the catches used to calculate these
percentages were reported from all over the Mediterranean. The tonnage from all
the different catches for the entire year were then added and used to calculate the
percentages in the table. Thus, we can also calculate the percentage of catch that was
food by subtracting the predator percentages from 100%. This leads to
what we see in Table 10.2.
D’Ancona noted the time period coinciding with World War One, when fishing
was drastically cut back due to military actions, had puzzling data. Let’s highlight
this in Table 10.3. D’Ancona expected both food fish and predator fish to increase
when the rate of fishing was cut back. But in these war years, there is a substantial
increase in the percentage of predators caught at the same time the percentage of
food fish went down. Note, we are looking at percentages here. Of course, the raw
tonnage of fish caught went down during the war years, but the expectation was
that since there is reduced fishing, there should be a higher percentage of food fish
because they have not been harvested. D’Ancona could not understand this, so he
asked the mathematician Vito Volterra for help.
10.1 Theory
Volterra approached the modeling this way. He let the variable x(t) denote the pop-
ulation of food fish and y(t), the population of predator fish at time t. He was
constructing what you might call a coarse model. The food fish are not divided into categories like halibut and mackerel with a separate variable for each, and the predators are also not divided into different classes like sharks, squids and so forth. Hence,
instead of dozens of variables for both the food and predator population, everything
was lumped together. Following Volterra, we make the following assumptions:
1. The food population grows exponentially. Letting x_g' denote the growth rate of the food fish, we must have

\[
x_g' = a\, x.
\]

2. The number of contacts per unit time between predators and prey is proportional to the product of their populations. We assume the food fish are eaten by the predators at a rate proportional to this contact rate. Letting the decay rate of the food be denoted by x_d', we see

\[
x_d' = -b\, x\, y.
\]

Combining, the food fish dynamics are

\[
x' = a\, x - b\, x\, y
\]
for some positive constants a and b. He made assumptions about the predators as
well.
1. Predators naturally die following an exponential decay; letting this decay rate be given by y_d', we have

\[
y_d' = -c\, y.
\]

2. Predator growth is proportional to the number of contacts per unit time with the food fish; letting this growth rate be y_g', we have

\[
y_g' = d\, x\, y.
\]

Combining, the predator dynamics are

\[
y' = -c\, y + d\, x\, y
\]

for some positive constants c and d. The full Volterra model is thus

\[
\begin{aligned}
x' &= a\, x - b\, x\, y \qquad &(10.1)\\
y' &= -c\, y + d\, x\, y \qquad &(10.2)\\
x(0) &= x_0 \qquad &(10.3)\\
y(0) &= y_0 \qquad &(10.4)
\end{aligned}
\]
Equations 10.1 and 10.2 give the dynamics of this system. Note these are nonlinear
dynamics, the first we have seen since the logistics model. Equations 10.3 and 10.4
are the initial conditions for the system. Together, these four equations are called a
Predator–Prey system. Since Volterra’s work, this model has been applied in many
other places. A famous example is the wolf–moose predator–prey system which has
been extensively modeled for Isle Royale in Lake Superior. We are now going to analyze this model. We have been inspired by the analysis given in Braun (1978), but
Braun can use a bit more mathematics in his explanations and we will try to use only
calculus ideas.
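Before the qualitative analysis, here is a hedged numerical sketch (not from the text) of how such a system can be simulated with the FixedRK tool from the earlier chapters; the parameter values match the worked example used later in this chapter and the initial populations are purely illustrative.

% hedged sketch: simulate a Predator-Prey model numerically
a = 2; b = 5; c = 6; d = 3;                  % values from the worked example below
f = @(t,p) [a*p(1) - b*p(1)*p(2); -c*p(2) + d*p(1)*p(2)];
p0 = [1; 1];                                 % hypothetical initial populations
h = 0.01; T = 10; N = ceil(T/h);
[time, rk] = FixedRK(f, 0, p0, h, 4, N);
plot(rk(1,:), rk(2,:));                      % trajectory in the x-y (phase) plane
xlabel('food fish x'); ylabel('predators y');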
Once we obtain a solution (x, y) to the Predator–Prey problem, we have two nice
curves x(t) and y(t) defined for all non negative time t. As we did in Chap. 8, if we
graph in the x–y plane the ordered pairs (x(t), y(t)), we will draw a curve C where
any point on C corresponds to an ordered pair (x(t), y(t)) for some time t. At t = 0,
we are at the point (x0 , y0 ) on C . Hence, the initial conditions for the Predator–Prey
problem determine the starting point on C . As time increases, the pairs (x(t), y(t))
move in the direction of the tangent line to C . If we knew the algebraic sign of the
derivatives x and y at any point on C , we could decide the direction in which we
are moving along the curve C . So we begin our analysis by looking at the curves in
the x–y plane where x and y become 0. From these curves, we will be able to find
out the different regions in the plane where each is positive or negative. From that,
we will be able to decide in which direction a point moves along the curve.
out the different regions in the plane where each derivative is positive or negative.

The pairs (x(t^*), y(t^*)) satisfying 0 = a x(t^*) − b x(t^*) y(t^*), or

\[
0 = x(t^*)\bigl( a - b\, y(t^*) \bigr),
\]

are the ones where the rate of change of the food fish will be zero. Now these pairs can correspond to many different time values t^*, so what we really need to do is to find all the (x, y) pairs where this happens. Since this is a product, there are two possibilities: x = 0, the y axis, and y = a/b, a horizontal line.
In a similar way, the pairs (x, y) where y' becomes zero satisfy the equation

\[
0 = y\bigl( -c + d\, x \bigr).
\]

Again, there are two possibilities: y = 0, the x axis, and x = c/d, a vertical line.
10.2.2.1 An Example
Just like we did in Chap. 8, we find the parts of the x–y plane where the algebraic signs of x' and y' are (+, +), (+, −), (−, +) and (−, −). As usual, the set of (x, y) pairs where x' = 0 is called the nullcline for x; similarly, the points where y' = 0 form the nullcline for y. The x' = 0 equation gives us the y axis and the horizontal line y = a/b, while the y' = 0 equation gives the x axis and the vertical line x = c/d. The x dynamics thus divide the plane into three pieces: the part where x' > 0; the part where x' = 0; and the part where x' < 0.
We have already determined the nullclines for this model as shown in Figs. 10.1 and 10.2. We combine the x' and y' nullcline information to create a map of how x' and y' change sign in the x–y plane.
• Regions I, II, III and IV divide Quadrant 1.
• We will show there are trajectories moving down the positive y axis and out along
the positive x axis.
• Thus, a trajectory that starts in Quadrant 1 with positive initial conditions can’t
cross the trajectory on the positive x axis or the trajectory on the positive y axis.
• Thus, the Predator–Prey trajectories that start in Quadrant 1 with positive initial
conditions will stay in Quadrant 1.
This is shown in Fig. 10.3.
Fig. 10.4 Finding where x' < 0 and x' > 0 for the predator–prey model
Fig. 10.5 Finding where y' < 0 and y' > 0 for the predator–prey model
• The factor −c + d x is positive when −c + d x > 0 or when x > c/d. Hence, the
factor is negative when x < c/d.
• The factor y is positive when y > 0 and negative when y < 0.
• So the combination y(−c + d x) has a sign that can also be determined easily.
In Fig. 10.5, we show how the y nullcline divides the x–y plane into three pieces as
well.
The shaded areas shown in Figs. 10.4 and 10.5 can be combined into Fig. 10.6. In each region, x' and y' are either positive or negative. Hence, each region can be marked with an ordered pair, (x' ±, y' ±).
We drew trajectories for the linear system models already without a lot of background
discussion. Now we’ll go over it again in more detail. We use the algebraic signs
of x' and y' to determine this. For example, if we are in Region I, the sign of x' is negative and the sign of y' is positive. Thus, the variable x decreases and the variable y increases in this region. So if we graphed the ordered pairs (x(t), y(t)) in the x–y plane for all t > 0, we would plot a y versus x curve. That is, we would have y = f(x) for some function of x. Note that, by the chain rule,

\[
\frac{dy}{dt} = f'(x)\, \frac{dx}{dt}.
\]

Hence, as long as x' is not zero (and this is true in Region I!), we have at each time t that the slope of the curve y = f(x) is given by

\[
\frac{df}{dx}(t) = \frac{y'(t)}{x'(t)}.
\]
Since our pair (x, y) is the solution to a differential equation, we expect that x and y
both are continuously differentiable with respect to t. So if we draw the curve for y
versus x in the x–y plane, we do not expect to see a corner in it (as a corner means
the derivative fails to exist). So we can see three possibilities:
• a straight line as x equals y at each t meaning the slope is always the same,
To analyze this nonlinear model, we need a fact from more advanced courses. For
these kinds of nonlinear models, trajectories that start at different initial conditions
can not cross.
\[
\begin{aligned}
x' &= a\, x - b\, x\, y \qquad &(10.5)\\
y' &= -c\, y + d\, x\, y \qquad &(10.6)\\
x(0) &= x_0 \qquad &(10.7)\\
y(0) &= y_0 \qquad &(10.8)
\end{aligned}
\]

Let's begin by looking at a trajectory that starts on the positive y axis. We therefore need to solve the system

\[
\begin{aligned}
x' &= a\, x - b\, x\, y \qquad &(10.9)\\
y' &= -c\, y + d\, x\, y \qquad &(10.10)\\
x(0) &= 0 \qquad &(10.11)\\
y(0) &= y_0 > 0 \qquad &(10.12)
\end{aligned}
\]
It is easy to guess the solution is the pair (x(t), y(t)) with x(t) = 0 always and y(t) satisfying y' = −c y(t). Hence,

\[
y(t) = y_0\, e^{-ct}
\]

and the trajectory moves down the positive y axis, always decreasing as t increases. Next, consider a trajectory that starts on the positive x axis. We need to solve the system
\[
\begin{aligned}
x' &= a\, x - b\, x\, y \qquad &(10.13)\\
y' &= -c\, y + d\, x\, y \qquad &(10.14)\\
x(0) &= x_0 > 0 \qquad &(10.15)\\
y(0) &= 0 \qquad &(10.16)
\end{aligned}
\]

Again, it is easy to guess the solution is the pair (x(t), y(t)) with y(t) = 0 always and x(t) satisfying x' = a x(t). Hence,

\[
x(t) = x_0\, e^{at}
\]
and the trajectory moves along the positive x axis always increasing as t increases.
Since trajectories can't cross other trajectories, this tells us a trajectory that begins in Quadrant 1 with a positive (x_0, y_0) can't hit the x axis or the y axis in a finite amount of time, because if it did, we would have two trajectories crossing.
This is good news for our biological model. Since we are trying to model food
and predator interactions in a real biological system, we always start with initial
conditions (x0 , y0 ) that are in Quadrant One. It is very comforting to know that these
solutions will always remain positive and, therefore, biologically realistic. In fact, it
doesn’t seem biologically possible for the food or predators to become negative, so
if our model permitted that, it would tell us our model is seriously flawed! Hence, for
our modeling purposes, we need not consider initial conditions that start in Regions
V–IX. Indeed, if you look at Fig. 10.7, you can see that a solution trajectory could
only hit the y axis from Region II. But that can’t happen as if it did, two trajectories
would cross! Also, a trajectory could only hit the x axis from a start in Region III.
Again, since trajectories can’t cross, this is not possible either. So, a trajectory that
starts in Quadrant 1, stays in Quadrant 1—kind of has a Las Vegas feel, doesn't it?
10.3.4 Homework
For the following problems, find the x and y nullclines and sketch using multiple
colors, the algebraic sign pairs (x', y') the nullclines determine in the x–y plane.
Exercise 10.3.1
Exercise 10.3.2
Exercise 10.3.3
Exercise 10.3.4
Exercise 10.3.5
So we can assume that for a start in Quadrant 1, the solution pair is always positive.
Let’s see how far we can get with a preliminary mathematical analysis. We can
analyze these trajectories like this. For convenience, assume we start in Region II
and the resulting trajectory hits the y = a/b line at some time t^*. At that time, we will have x'(t^*) = 0 and y'(t^*) < 0. We show this situation in Fig. 10.8.
Look at the Predator–Prey model dynamics for 0 ≤ t < t^*. Since all variables are positive and their derivatives are not zero for these times, we can look at the fraction y'(t)/x'(t).
10.4.1 Example
Consider the model x' = 2x − 5xy, y' = −6y + 3xy, so a = 2, b = 5, c = 6 and d = 3.

• Rewrite as y'/x':

\[
\frac{y'(t)}{x'(t)} = \frac{y(t)\bigl(-6 + 3x(t)\bigr)}{x(t)\bigl(2 - 5y(t)\bigr)}.
\]

• Put all the y stuff on the left and all the x stuff on the right:

\[
\frac{2 - 5y(t)}{y(t)}\; y'(t) = \frac{-6 + 3x(t)}{x(t)}\; x'(t).
\]

• Integrate both sides from 0 to t and evaluate:

\[
2 \ln\Bigl( \frac{y(t)}{y_0} \Bigr) - 5\bigl(y(t) - y_0\bigr)
= -6 \ln\Bigl( \frac{x(t)}{x_0} \Bigr) + 3\bigl(x(t) - x_0\bigr).
\]

• Combine ln terms:

\[
\ln\Bigl( \frac{x(t)}{x_0} \Bigr)^{6} + \ln\Bigl( \frac{y(t)}{y_0} \Bigr)^{2}
= 3\bigl(x(t) - x_0\bigr) + 5\bigl(y(t) - y_0\bigr).
\]

• Put all function terms on the left and all constant terms on the right, then exponentiate:

\[
\frac{x(t)^6\, y(t)^2}{e^{3x(t)}\, e^{5y(t)}} = \frac{x_0^6\, y_0^2}{e^{3x_0}\, e^{5y_0}}.
\]

• We did this analysis for Region II, but it works in all the regions. So this relation holds along the entire trajectory.
• The equation

\[
f(x)\, g(y) = f(x_0)\, g(y_0),
\]

for f(x) = x^6/e^{3x} and g(y) = y^2/e^{5y}, is called the Nonlinear Conservation Law or NLCL for the Predator–Prey model.
Recall, we were looking at the Predator–Prey model dynamics for 0 ≤ t < t ∗ . Since
all variables are positive and their derivatives are not zero for these times, we can
look at the fraction y (t)/x (t).
The equation above will not hold at t^*, however, because at that point x'(t^*) = 0. But for t below that critical value, it is fine to look at this fraction.
• Switching to the variable s for 0 ≤ s < t, for any value t strictly less than our special value t^*, we have

\[
\frac{a - b\, y(s)}{y(s)}\; y'(s) = \frac{-c + d\, x(s)}{x(s)}\; x'(s).
\]

• Now we simplify a lot (remember x_0 and y_0 are positive so absolute values are not needed around them). First, integrate from 0 to t and use a standard logarithm property:

\[
a \ln\Bigl( \frac{y(t)}{y_0} \Bigr) - b\bigl( y(t) - y_0 \bigr)
= -c \ln\Bigl( \frac{x(t)}{x_0} \Bigr) + d\bigl( x(t) - x_0 \bigr).
\]

Then, put all the logarithm terms on the left side and pull the powers a and c inside the logarithms:

\[
\ln\Bigl( \frac{y(t)}{y_0} \Bigr)^{a} + \ln\Bigl( \frac{x(t)}{x_0} \Bigr)^{c}
= b\bigl( y(t) - y_0 \bigr) + d\bigl( x(t) - x_0 \bigr).
\]

• Now exponentiate both sides and use the properties of the exponential function to simplify. We find

\[
\Bigl( \frac{y(t)}{y_0} \Bigr)^{a} \Bigl( \frac{x(t)}{x_0} \Bigr)^{c}
= \frac{e^{b\, y(t)}\, e^{d\, x(t)}}{e^{b\, y_0}\, e^{d\, x_0}}.
\]

The equation

\[
f(x)\, g(y) = f(x_0)\, g(y_0),
\]

for f(x) = x^c/e^{d x} and g(y) = y^a/e^{b y}, is called the Nonlinear Conservation Law or NLCL for the general Predator–Prey model.
10.4.2.1 Approaching t ∗
Now the right hand side is a positive number which for convenience we will call α.
Hence, we have the equation
\[
\frac{(y(t))^a\, (x(t))^c}{e^{b\, y(t)}\, e^{d\, x(t)}} = \alpha
\]

holds for all time t strictly less than t^*. Thus, as we allow t to approach t^* from below, the continuity of our solutions x(t) and y(t) allows us to say

\[
\lim_{t \to t^{*-}} \frac{(y(t))^a\, (x(t))^c}{e^{b\, y(t)}\, e^{d\, x(t)}}
= \frac{(y(t^*))^a\, (x(t^*))^c}{e^{b\, y(t^*)}\, e^{d\, x(t^*)}} = \alpha.
\]
We can do a similar analysis for a trajectory that starts in Region IV and moves up until it hits the y = a/b line where x' = 0. This one will start at an initial point (x_0, y_0) in Region IV and terminate on the y = a/b line at the point (x(t^*), a/b) for some time t^*. In this case, we continue the analysis as before. For any time t < t^*, the variables x(t) and y(t) are positive and their derivatives nonzero. Hence, we can manipulate the Predator–Prey equations just like before to end up with
\[
a \int_0^t \frac{y'(s)}{y(s)}\, ds - b \int_0^t y'(s)\, ds
= -c \int_0^t \frac{x'(s)}{x(s)}\, ds + d \int_0^t x'(s)\, ds.
\]

We integrate in the same way and apply the initial conditions to obtain Eq. 10.17 again:

\[
\frac{(y(t))^a\, (x(t))^c}{e^{b\, y(t)}\, e^{d\, x(t)}} = \frac{y_0^a\, x_0^c}{e^{b\, y_0}\, e^{d\, x_0}}.
\]
Then, taking the limit as t goes to t^*, we see this equation holds at t^* also. Again, label the right hand side as the positive constant α. We then have

\[
\frac{(y(t))^a\, (x(t))^c}{e^{b\, y(t)}\, e^{d\, x(t)}} = \alpha.
\]

We conclude Eq. 10.17 holds for trajectories that start in regions that terminate on the x' = 0 line y = a/b. Since trajectories that start in Regions I and III never have x' become 0, all of the analysis we did above works perfectly. Hence, we can conclude that Eq. 10.17 holds for all trajectories starting at a positive initial point (x_0, y_0) in Quadrant 1.
We know the pairs (x(t), y(t)) are on the trajectory that corresponds to the initial start of (x_0, y_0). Hence, we can drop the time dependence (t) above and write Eq. 10.18, which holds for any (x, y) pair that is on the trajectory:

\[
\frac{y^a\, x^c}{e^{b y}\, e^{d x}} = \frac{y_0^a\, x_0^c}{e^{b y_0}\, e^{d x_0}}. \tag{10.18}
\]

Equation 10.18 is called the Nonlinear Conservation Law associated with the Predator–Prey model.
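As a quick sanity check (not part of the original discussion), one can verify numerically that this conserved quantity stays essentially constant along a computed trajectory; here is a hedged Matlab/Octave sketch using the example values from above.

% hedged sketch: check the NLCL numerically along an RK4 trajectory
a = 2; b = 5; c = 6; d = 3;
f = @(t,p) [a*p(1) - b*p(1)*p(2); -c*p(2) + d*p(1)*p(2)];
p0 = [1; 1];
h = 0.001; T = 5; N = ceil(T/h);
[time, rk] = FixedRK(f, 0, p0, h, 4, N);
X = rk(1,:); Y = rk(2,:);
NLCL = (Y.^a .* X.^c) ./ (exp(b*Y) .* exp(d*X));
max(abs(NLCL - NLCL(1)))      % should be small if the conservation law holds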
Although we have assumed trajectories can't cross, and therefore a trajectory starting in Region II can't hit the y axis for that reason, we can also see this using the nonlinear conservation law. We can do the same derivation for a trajectory starting in Region II with a positive x_0 and y_0, and this time assume the trajectory hits the y axis at a time t^* at the point (0, y_1) with y_1 > 0. We can repeat all of the integration steps to obtain

\[
\frac{(y(t))^a\, (x(t))^c}{e^{b\, y(t)}\, e^{d\, x(t)}} = \frac{y_0^a\, x_0^c}{e^{b\, y_0}\, e^{d\, x_0}}.
\]

This equation holds for all t before t^*. Taking the limit as t goes to t^*, we obtain

\[
\frac{(y(t^*))^a\, (x(t^*))^c}{e^{b\, y(t^*)}\, e^{d\, x(t^*)}}
= \frac{y_1^a \cdot 0^c}{e^{b\, y_1}\, e^{0}} = 0
= \frac{y_0^a\, x_0^c}{e^{b\, y_0}\, e^{d\, x_0}}.
\]
This is not possible, since the right hand side is positive, so we have another way of seeing that a trajectory can't hit the y axis. A similar argument shows a trajectory in Region III can't hit the x axis. We will leave the details of that argument to you!
10.4.4 Homework
For the following Predator–Prey models, derive the nonlinear conservation law. Since our discussions have shown us that the times when x' = 0 in the fraction y'/x' do not give us any trouble, you can derive this law by integrating in the way we have described in this section for the particular values of a, b, c and d in the given model. So you should derive the equation

\frac{y^{a}\, x^{c}}{e^{b y}\, e^{d x}} = \frac{(y(0))^{a}\,(x(0))^{c}}{e^{b\, y(0)}\, e^{d\, x(0)}}.
Exercise 10.4.1
Exercise 10.4.2
Exercise 10.4.3
Exercise 10.4.4
Exercise 10.4.5
From the discussions above, we now know that given an initial start (x_0, y_0) in Quadrant 1 of the x–y plane, the solution to the Predator–Prey system will not leave Quadrant 1. If we piece the various trajectories together for Regions I, II, III and IV, the solution trajectories will either be periodic, spiral in to some center or spiral out to give unbounded motion. These possibilities are shown in Fig. 10.9 (periodic), Fig. 10.10 (spiraling out) and Fig. 10.11 (spiraling in). We want to find out which of these possibilities actually occurs.
We have already defined the functions f and g for all non-negative real numbers by
f(x) = \frac{x^{c}}{e^{d x}}, \qquad g(y) = \frac{y^{a}}{e^{b y}}.
These functions have a very specific look. We can figure this out using a bit of
common sense and some first semester calculus.
10.5.1.1 An Example
We know for any (x_0 > 0, y_0 > 0), the trajectory (x(t), y(t)) satisfies the NLCL where

f(x) = \frac{x^{7}}{e^{3x}}, \qquad g(y) = \frac{y^{8}}{e^{6y}}.
What do f and g look like? Let’s look at f first. Recall L’Hôpital’s rule.
\lim_{x \to \infty} \frac{x^{7}}{e^{3x}} = \frac{\infty}{\infty}

and so

\lim_{x \to \infty} \frac{x^{7}}{e^{3x}} = \lim_{x \to \infty} \frac{(x^{7})'}{(e^{3x})'} = \lim_{x \to \infty} \frac{7x^{6}}{3e^{3x}}.
But this limit is also ∞/∞ and so we can apply L'Hôpital's rule again:

\lim_{x \to \infty} \frac{7x^{6}}{3e^{3x}} = \lim_{x \to \infty} \frac{42x^{5}}{9e^{3x}}.

Repeating this a total of seven times removes the power of x entirely and we find

\lim_{x \to \infty} \frac{x^{7}}{e^{3x}} = \lim_{x \to \infty} \frac{7!}{3^{7} e^{3x}} = 0.
Differentiating, f'(x) = x^{6}(7 - 3x)/e^{3x}, and since e^{3x} is never zero, f'(x) = 0 when x = 0 or when x = 7/3. Note this is c/d for our Predator–Prey model. A similar analysis holds for g(y) = y^{8}/e^{6y}. We find that as y → ∞, g(y) → 0 from above since g(y) is always positive, and since g(0) = 0, we know g has a maximum. We use calculus to show the maximum occurs at y = 8/6, which is a/b for our Predator–Prey model.
We can do the same sort of analysis for a general Predator–Prey model. From our
specific example, it is easy to infer that f and g have the same generic form which
are shown in Figs. 10.12 and 10.13.
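To see these shapes concretely, a short MATLAB sketch (ours, not from the text) can plot the two growth functions of the example above and mark the maxima at x = 7/3 and y = 8/6:

% Sketch: plot f(x) = x^7/e^(3x) and g(y) = y^8/e^(6y) and mark their peaks.
x = linspace(0, 6, 400);
y = linspace(0, 4, 400);
f = x.^7 ./ exp(3*x);
g = y.^8 ./ exp(6*y);
subplot(1,2,1); plot(x, f); hold on
plot(7/3, (7/3)^7/exp(7), 'ro');     % maximum of f at x = c/d = 7/3
xlabel('x'); ylabel('f(x)'); hold off
subplot(1,2,2); plot(y, g); hold on
plot(8/6, (8/6)^8/exp(8), 'ro');     % maximum of g at y = a/b = 8/6
xlabel('y'); ylabel('g(y)'); hold off

Both curves rise from 0, peak once, and decay back toward 0, exactly the generic shape used in the arguments below.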
10.5.2.1 Homework
For the following Predator–Prey models, state what the f and g growth functions are, use calculus to derive where their maximum occurs (you can do either f or g as the derivation is the same for both) and sketch their graphs nicely.
Exercise 10.5.1
Exercise 10.5.2
Exercise 10.5.3
Exercise 10.5.4
Exercise 10.5.5
We can write the nonlinear conservation law using the growth functions f and g in the form of Eq. 10.19:

f(x(t))\, g(y(t)) = f(x_0)\, g(y_0). \qquad (10.19)
The trajectories formed by the solutions of the Predator–Prey model that start in Quadrant 1 are powerfully shaped by these growth functions. It is easy to see that if we choose (x_0 = c/d, y_0 = a/b), i.e. we start at the places where f and g have their maximums, the resulting trajectory is very simple. It is the single point (x(t) = c/d, y(t) = a/b) for all time t. The solution to this Predator–Prey model with this initial condition is thus to simply stay at the point where we start. If x_0 = c/d and y_0 = a/b, the NLCL says

f(x(t))\, g(y(t)) = f_{max}\, g_{max}.

But f_{max} g_{max} is just a number so this trajectory is the constant x(t) = c/d and y(t) = a/b for all time. Otherwise, there are two cases: if x_0 ≠ c/d, then f(x_0) < f_{max}. So we can write f(x_0) = r_1 f_{max} where r_1 < 1. We don't know where y_0 is, but we do know g(y_0) ≤ g_{max}. So we can write g(y_0) = r_2 g_{max} with r_2 ≤ 1. So in this case, the NLCL gives

f(x(t))\, g(y(t)) = r_1 r_2\, f_{max}\, g_{max} = \mu\, f_{max}\, g_{max}

where μ = r_1 r_2 < 1. Finally, if y_0 ≠ a/b, then g(y_0) < g_{max}. We can thus write g(y_0) = r_2 g_{max} where r_2 < 1. Although we don't know where x_0 is, we do know f(x_0) ≤ f_{max}. So we can write f(x_0) = r_1 f_{max} with r_1 ≤ 1. So in this case, the NLCL again gives

f(x(t))\, g(y(t)) = r_1 r_2\, f_{max}\, g_{max} = \mu\, f_{max}\, g_{max}

where μ = r_1 r_2 < 1. We conclude all trajectories with x_0 > 0 and y_0 > 0 have an associated μ ≤ 1 so that the NLCL can be written

f(x(t))\, g(y(t)) = \mu\, f_{max}\, g_{max},

and μ < 1 for any trajectory with ICs different from the pair (c/d, a/b).
The next step is to examine what happens if we choose a value of μ < 1.
Let’s assume an IC corresponding to μ = 0.7. The arguments work for any μ but it
is nice to be able to pick a number and work off of it.
• Step 1: draw the f curve and the horizontal line of value 0.7 f max . The horizontal
line will cross the f curve twice giving two corresponding x values. Label these
x1 and x2 as shown. Also label the point c/d = 7/3 and draw the vertical lines
from these x values to the f curve itself.
• Also pick a point x1∗ < x1 and a point x2∗ > x2 and draw them in along with their
vertical lines that go up to the f curve.
• We show all this in Fig. 10.14.
• Step 2: Is it possible for the trajectory to contain the point x1∗ ? If so, there is a
corresponding y value so the NLCL holds: f (x1∗ )g(y) = 0.7 f max gmax . But this
implies that
g(y) = \frac{0.7\, f_{max}}{f(x_1^*)}\, g_{max} > g_{max}

as the bottom of the fraction 0.7 f_{max}/f(x_1^*) is smaller than the top, making the fraction larger than 1. But no y value can give g(y) larger than g_{max}. Hence, the point x_1^* is not on the trajectory.
• Step 3: Is it possible for x1 to be on the trajectory? If so, there is a y value
so the NLCL holds giving f (x1 )g(y) = 0.7 f max gmax . But f (x1 ) = 0.7 f max , so
cancelling, we find g(y) = gmax . Thus y = a/b = 8/6 and (x1 , a/b = 8/6) is on
the trajectory.
• Step 4: Is it possible for the trajectory to contain the point x2∗ ? If so, there is a
corresponding y value so the NLCL holds: f (x2∗ )g(y) = 0.7 f max gmax . But this
implies that
Fig. 10.14 The conservation law f (x) g(y) = 0.7 f max gmax implies there are two critical points
x1 and x2 of interest
g(y) = \frac{0.7\, f_{max}}{f(x_2^*)}\, g_{max} > g_{max}

as the bottom of the fraction 0.7 f_{max}/f(x_2^*) is smaller than the top, making the fraction larger than 1. But no y value can give g(y) larger than g_{max}. Hence, the point x_2^* is not on the trajectory.
• Step 5: Is it possible for x2 to be on the trajectory? If so, there is a y value
so the NLCL holds giving f (x2 )g(y) = 0.7 f max gmax . But f (x2 ) = 0.7 f max , so
cancelling, we find g(y) = gmax . Thus y = a/b = 8/6 and (x2 , a/b = 8/6) is on
the trajectory. We conclude if (x(t), y(t)) is on the trajectory, then x1 ≤ x(t) ≤ x2 .
We show this in Fig. 10.15.
Fig. 10.15 Predator–prey trajectories with initial conditions from Quadrant 1 are bounded in x
We can bound y in the same way using the g curve: draw the horizontal line of value 0.7 g_{max}, which crosses the g curve at two values y_1 and y_2, and also pick points y_1^* < y_1 and y_2^* > y_2. If the trajectory contained the point y_1^*, there would be a corresponding x value so the NLCL holds: f(x) g(y_1^*) = 0.7 f_{max} g_{max}. But this implies that

f(x) = \frac{0.7\, g_{max}}{g(y_1^*)}\, f_{max} > f_{max}

as the bottom of the fraction 0.7 g_{max}/g(y_1^*) is smaller than the top, making the fraction larger than 1. But no x value can give f(x) larger than f_{max}. Hence, the point y_1^* is not on the trajectory.
• Step 3: Is it possible for y_1 to be on the trajectory? If so, there is an x value so the NLCL holds giving f(x) g(y_1) = 0.7 f_{max} g_{max}. But g(y_1) = 0.7 g_{max}, so cancelling, we find f(x) = f_{max}. Thus x = c/d = 7/3 and (c/d = 7/3, y_1) is on the trajectory.
Fig. 10.16 The conservation law f(x) g(y) = 0.7 f_{max} g_{max} implies there are two critical points y_1 and y_2 of interest
Fig. 10.17 Predator–prey trajectories with initial conditions from Quadrant 1 are bounded in y
• Step 4: Is it possible for the trajectory to contain the point y_2^*? If so, there is a corresponding x value so the NLCL holds: f(x) g(y_2^*) = 0.7 f_{max} g_{max}. But this implies that
f(x) = \frac{0.7\, g_{max}}{g(y_2^*)}\, f_{max} > f_{max}

as the bottom of the fraction 0.7 g_{max}/g(y_2^*) is smaller than the top, making the fraction larger than 1. But no x value can give f(x) larger than f_{max}. Hence, the point y_2^* is not on the trajectory.
• Step 5: Is it possible for y2 to be on the trajectory? If so, there is a x value
so the NLCL holds giving f (x)g(y2 ) = 0.7 f max gmax . But g(y2 ) = 0.7gmax , so
cancelling, we find f (x) = f max . Thus x = c/d = 7/3 and (c/d = 7/3, y2 ) is on
the trajectory. We conclude if (x(t), y(t)) is on the trajectory, then y1 ≤ y(t) ≤ y2 .
We show these bounds in Fig. 10.17.
Combining, we see trajectories are bounded in Quadrant 1. We show this in Fig. 10.18.
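The bounds x_1, x_2, y_1, y_2 can also be located numerically. The sketch below is an illustration only (not the text's code), using MATLAB's fzero on the example f(x) = x^7/e^{3x}, g(y) = y^8/e^{6y} with μ = 0.7:

% Sketch: find the bounding box for mu = 0.7 in the worked example.
mu   = 0.7;
f    = @(x) x.^7 ./ exp(3*x);
g    = @(y) y.^8 ./ exp(6*y);
fmax = f(7/3);  gmax = g(8/6);            % peaks at x = c/d and y = a/b
% x1 and x2 solve f(x) = mu*fmax, one root on each side of 7/3
x1 = fzero(@(x) f(x) - mu*fmax, [0.1, 7/3]);
x2 = fzero(@(x) f(x) - mu*fmax, [7/3, 10]);
% y1 and y2 solve g(y) = mu*gmax, one root on each side of 4/3
y1 = fzero(@(y) g(y) - mu*gmax, [0.1, 8/6]);
y2 = fzero(@(y) g(y) - mu*gmax, [8/6, 10]);
fprintf('x in [%.4f, %.4f], y in [%.4f, %.4f]\n', x1, x2, y1, y2);

Any trajectory with this μ must stay inside the rectangle [x_1, x_2] × [y_1, y_2] found this way.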
We can do the analysis we did for the specific Predator–Prey model for the general
one. Some people like to see these arguments with the parameters a, b, c and d and
Fig. 10.18 Predator–prey trajectories with initial conditions from Quadrant 1 are bounded in x
and y
others like to see the argument with numbers. However, learning how to see things
abstractly is a skill that is honed by thinking with general terms and not specific
numbers. So as we redo the arguments from the specific example in this general way, reflect on how similar they are even though you don't see numbers! We now look at the general model

x' = x\,(a - b\,y), \qquad y' = y\,(-c + d\,x).

Let's again assume an IC corresponding to μ = 0.7. Again, the arguments work for any μ but it is nice to be able to pick a number and work off of it. So our argument is a bit of a hybrid: general parameter values and a specific μ value.
• Step 1: draw the f curve and the horizontal line of value 0.7 f max . The horizontal
line will cross the f curve twice giving two corresponding x values. Label these
x1 and x2 as shown. Also label the point c/d and draw the vertical lines from these
x values to the f curve itself.
• Also pick a point x1∗ < x1 and a point x2∗ > x2 and draw them in along with their
vertical lines that go up to the f curve.
• We show all this in Fig. 10.19.
• Step 2: Is it possible for the trajectory to contain the point x1∗ ? If so, there is a
corresponding y value so the NLCL holds: f (x1∗ )g(y) = 0.7 f max gmax . But this
implies that
g(y) = \frac{0.7\, f_{max}}{f(x_1^*)}\, g_{max} > g_{max}

as the bottom of the fraction 0.7 f_{max}/f(x_1^*) is smaller than the top, making the fraction larger than 1. But no y value can give g(y) larger than g_{max}. Hence, the point x_1^* is not on the trajectory.
Fig. 10.19 The conservation law f (x) g(y) = 0.7 f max gmax implies there are two critical points
x1 and x2 of interest
• Step 3: Is it possible for x_1 to be on the trajectory? If so, there is a y value so the NLCL holds giving f(x_1) g(y) = 0.7 f_{max} g_{max}. But f(x_1) = 0.7 f_{max}, so cancelling, we find g(y) = g_{max}. Thus y = a/b and (x_1, a/b) is on the trajectory.
• Step 4: Is it possible for the trajectory to contain the point x_2^*? If so, there is a corresponding y value so the NLCL holds: f(x_2^*) g(y) = 0.7 f_{max} g_{max}. But this implies that

g(y) = \frac{0.7\, f_{max}}{f(x_2^*)}\, g_{max} > g_{max}

as the bottom of the fraction 0.7 f_{max}/f(x_2^*) is smaller than the top, making the fraction larger than 1. But no y value can give g(y) larger than g_{max}. Hence, the point x_2^* is not on the trajectory.
• Step 5: Is it possible for x2 to be on the trajectory? If so, there is a y value
so the NLCL holds giving f (x2 )g(y) = 0.7 f max gmax . But f (x2 ) = 0.7 f max , so
cancelling, we find g(y) = gmax . Thus y = a/b and (x2 , a/b) is on the trajectory.
We conclude if (x(t), y(t)) is on the trajectory, then x1 ≤ x(t) ≤ x2 . We show this
in Fig. 10.20.
Fig. 10.20 Predator–Prey trajectories with initial conditions from Quadrant 1 are bounded in x
• Step 1: draw the g curve and the horizontal line of value 0.7gmax . The horizontal
line will cross the g curve twice giving two corresponding y values. Label these
y1 and y2 as shown. Also label the point a/b and draw the vertical lines from these
y values to the g curve itself.
• Also pick a point y1∗ < y1 and a point y2∗ > y2 and draw them in along with their
vertical lines that go up to the g curve.
• We show all this in Fig. 10.21.
• Step 2: Is it possible for the trajectory to contain the point y1∗ ? If so, there is a
corresponding x value so the NLCL holds: f (x)g(y1∗ ) = 0.7 f max gmax . But this
implies that
f(x) = \frac{0.7\, g_{max}}{g(y_1^*)}\, f_{max} > f_{max}
Fig. 10.21 The conservation law f (x) g(y) = 0.7 f max gmax implies there are two critical points
y1 and y2 of interest
as the bottom of the fraction 0.7 g_{max}/g(y_1^*) is smaller than the top, making the fraction larger than 1. But no x value can give f(x) larger than f_{max}. Hence, the point y_1^* is not on the trajectory.
• Step 3: Is it possible for y1 to be on the trajectory? If so, there is a x value
so the NLCL holds giving f (x)g(y1 ) = 0.7 f max gmax . But g(y1 ) = 0.7gmax , so
cancelling, we find f (x) = f max . Thus x = c/d and (c/d, y1 ) is on the trajectory.
• Step 4: Is it possible for the trajectory to contain the point y_2^*? If so, there is a corresponding x value so the NLCL holds: f(x) g(y_2^*) = 0.7 f_{max} g_{max}. But this implies that
f(x) = \frac{0.7\, g_{max}}{g(y_2^*)}\, f_{max} > f_{max}

as the bottom of the fraction 0.7 g_{max}/g(y_2^*) is smaller than the top, making the fraction larger than 1. But no x value can give f(x) larger than f_{max}. Hence, the point y_2^* is not on the trajectory.
• Step 5: Is it possible for y2 to be on the trajectory? If so, there is a x value
so the NLCL holds giving f (x)g(y2 ) = 0.7 f max gmax . But g(y2 ) = 0.7gmax , so
cancelling, we find f (x) = f max . Thus x = c/d and (c/d, y2 ) is on the trajectory.
We conclude if (x(t), y(t)) is on the trajectory, then y1 ≤ y(t) ≤ y2 . We show
these bounds in Fig. 10.22.
Combining, we see trajectories are bounded in Quadrant 1. We show this in Fig. 10.23.
Now that we have discussed these two cases, note that we could have just done
the x variable case and said that a similar thing happens for the y variable. In many
texts, it is very common to do this. Since you are beginners at this kind of reasoning,
we have presented both cases in detail. But you should start training your mind to see that presenting one case is actually enough!
Fig. 10.22 Predator–prey trajectories with initial conditions from Quadrant 1 are bounded in y
Fig. 10.23 Predator–prey trajectories with initial conditions from Quadrant 1 are bounded in x and y
10.5.10 Homework
For these Predator–Prey models, follow the analysis of the section above to show that the trajectories must be bounded.
Exercise 10.5.6
Exercise 10.5.7
Exercise 10.5.8
Exercise 10.5.9
Exercise 10.5.10
If the trajectory was not periodic, then there would be horizontal and vertical lines
that would intersect the trajectory in more than two places. We will show that we can
have at most two intersections, which tells us the trajectory must be periodic. We go back to our specific example

x' = x\,(8 - 6\,y), \qquad y' = y\,(-7 + 3\,x),

whose growth functions are f(x) = x^7/e^{3x} and g(y) = y^8/e^{6y}.
This time we will show the argument for this specific case only and not do a general
example. But it is easy to infer from this specific example how to handle this argument
for any other Predator–Prey model!
• We draw the same figure as before for f but we don’t need the points x1∗ and x2∗ .
This time we add a point x ∗ between x1 and x2 . We’ll draw it so that it is between
c/d = 7/3 and x2 but just remember it could have been chosen on the other side.
We show this in Fig. 10.24.
Fig. 10.24 The f curve with the point x1 < c/d = 7/3 < x ∗ < x2 added
• At the point x ∗ , the NLCL says the corresponding y values satisfy f (x ∗ )g(y) =
0.7 f max gmax . This tells us
g(y) = \frac{0.7\, f_{max}}{f(x^*)}\, g_{max}.
• The biggest the ratio 0.7 f max / f (x ∗ ) can be is when the bottom f (x ∗ ) is the
smallest. This occurs when x ∗ is chosen to be x1 or x2 . Then the ratio is
0.7 f max /(0.7 f max ) = 1.
• The smallest the ratio 0.7 f max / f (x ∗ ) can be is when the bottom f (x ∗ ) is the largest.
This occurs when x^* is chosen to be c/d = 7/3. Then the ratio is 0.7 f_{max}/f_{max} = 0.7.
• So the ratio 0.7 f max / f (x ∗ ) is between 0.7 and 1.
• Draw the g curve now, adding in a horizontal line for r g_{max} where r = 0.7 f_{max}/f(x^*).
The lowest this line can be is on the line 0.7gmax and the highest it can be is the
line of value gmax . This is shown in Fig. 10.25.
• The figure above shows that there are at most two intersections with the g curve.
• The case of spiral in or spiral out trajectories implies there are points x ∗ with more
than two corresponding y values. Hence, spiral in and spiral out trajectories are
not possible and the only possibility is that the trajectory is periodic.
• So there is a smallest positive number T called the period of the trajectory, which means

x(t + T) = x(t) \quad \text{and} \quad y(t + T) = y(t) \quad \text{for all } t.
Fig. 10.25 The g curve with the r gmax line added showing the y values for the chosen x ∗
10.6.1 Homework
For the following problems, show the details of the periodic nature of the Predator–
Prey trajectories by mimicking the analysis in the section above.
Exercise 10.6.1
Exercise 10.6.2
Exercise 10.6.3
Exercise 10.6.4
Exercise 10.6.5
Now that we know the trajectory is periodic, let’s look at the plot more carefully.
We know the trajectories must lie within the rectangle [x_1, x_2] × [y_1, y_2]. Mathematically, this means there is a smallest positive number T so that x(0) = x(T) and y(0) = y(T). This number T is called the period of the Predator–Prey model. We can see the periodicity of the trajectory by doing a more careful analysis of the trajectories. We know the trajectory hits the points (x_1, a/b), (x_2, a/b), (c/d, y_1) and (c/d, y_2). What happens when we look at x points u with x_1 < u < x_2? For convenience, let's look at the case x_1 < u < c/d and the case u = c/d separately.
10.7.1 Case 1: u = c/d

At u = c/d, the NLCL requires f(c/d) g(v) = μ f_{max} g_{max} for any corresponding trajectory value v. Since f(c/d) = f_{max}, this is

f_{max}\, g(v) = \mu\, f_{max}\, g_{max},

or

g(v) = \mu\, g_{max}.
Since μ is less than 1, we draw the μ gmax horizontal line on the g graph as usual to
obtain the figure we previously drew as Fig. 10.21. Hence, there are two values of v
that give the value μ gmax ; namely, v = y1 and v = y2 . We conclude there are two
possible points on the trajectory, (c/d, v = y_1) and (c/d, v = y_2). This gives the usual points shown as the vertical points in Fig. 10.23.
The analysis is very similar to the one we just did for u = c/d. First, for this choice of
u, we can draw a new graph as shown in Fig. 10.26.
Here, the conservation law gives

g(v) = \frac{f_{max}}{f(u)}\, \mu\, g_{max}.

Fig. 10.26 The predator–prey f growth graph trajectory analysis for x_1 < u < c/d
Fig. 10.27 The predator–prey g growth analysis for one point x_1 < u < c/d
Here the ratio f_{max}/f(u) is larger than 1 (just look at Fig. 10.26 to see this). Call this ratio r. Hence, μ < μ (f_{max}/f(u)) and so μ g_{max} < μ (f_{max}/f(u)) g_{max}. Also from Fig. 10.26, we see μ f_{max} < f(u), which tells us (μ f_{max}/f(u)) g_{max} < g_{max}. Now look at Fig. 10.27. The inequalities above show us we must draw the horizontal line μ r g_{max} above the line μ g_{max} and below the line g_{max}. So we seek v
values that satisfy
\mu\, g_{max} < g(v) = \frac{f_{max}}{f(u)}\, \mu\, g_{max} = \mu\, r\, g_{max} < g_{max}.
We already know the values of v that satisfy g(v) = μ g_{max}: they are labeled in Fig. 10.27 as v = y_1 and v = y_2. Since the number μ r is larger than μ, we see from Fig. 10.27 there are two values of v, v = z_1 and v = z_2, for which g(v) = μ r g_{max} and y_1 < z_1 < a/b < z_2 < y_2 as shown.
From the above, we see that in the case x_1 < u < c/d, there are always 2 and only 2 possible v values on the trajectory. These points are (u, z_1) and (u, z_2).
What happens if we pick two points, x_1 < u_1 < u_2 < c/d? The f curve analysis is
essentially the same but now there are two vertical lines that we draw as shown in
Fig. 10.28.
Fig. 10.28 The predator–prey f growth graph trajectory analysis for the points x_1 < u_1 < u_2 < c/d
This implies we are searching for v values in the following two cases:

g(v) = \frac{f_{max}}{f(u_1)}\, \mu\, g_{max} \qquad \text{and} \qquad g(v) = \frac{f_{max}}{f(u_2)}\, \mu\, g_{max}.

Since f(u_1) < f(u_2) < f_{max}, we have

\mu < \mu\, \frac{f_{max}}{f(u_2)} = \mu\, r_2 < \mu\, \frac{f_{max}}{f(u_1)} = \mu\, r_1 < 1.
Fig. 10.29 The spread of the trajectory through fixed lines on the x axis gets smaller as we move away from the center point c/d
Now look at Fig. 10.29. The inequalities above show us we must draw the horizontal
line μ r1 gmax above the line μ r2 gmax which is above the line μ gmax . We already
know the values of v that satisfy g(v) = μ gmax which are labeled in Fig. 10.27 as
v = y_1 and v = y_2. Since the number μ r_2 is larger than μ, we see from Fig. 10.29 there are two values of v, v = z_{21} and v = z_{22}, for which g(v) = μ r_2 g_{max} and y_1 < z_{21} < a/b < z_{22} < y_2 as shown. But we can also do this for the line μ r_1 g_{max} to find two more points z_{11} and z_{12} satisfying

y_1 < z_{21} < z_{11} < \frac{a}{b} < z_{12} < z_{22} < y_2
as seen in Fig. 10.29 also.
We also see that the largest spread in the y direction is at x = c/d, giving the two points (c/d, y_1) and (c/d, y_2), which corresponds to the line segment [y_1, y_2] drawn at the x = c/d location. If we pick the point x_1 < u_2 < c/d, the two points on the trajectory give a line segment [z_{21}, z_{22}] drawn at the x = u_2 location. Note this line segment is smaller and contained in the largest one [y_1, y_2]. The corresponding line segment for the point u_1 is [z_{11}, z_{12}] which is smaller yet.
If you think about it a bit, if we picked three points x_1 < u_1 < u_2 < u_3 < c/d and three more points c/d < u_4 < u_5 < u_6 < x_2, we would find line segments as follows:
Point   Spread
x_1     One point (x_1, a/b)
u_1     [z_{11}, z_{12}]
u_2     [z_{21}, z_{22}] contains [z_{11}, z_{12}]
u_3     [z_{31}, z_{32}] contains [z_{21}, z_{22}]
c/d     [y_1, y_2] contains [z_{31}, z_{32}]
u_4     [z_{41}, z_{42}] inside [y_1, y_2]
u_5     [z_{51}, z_{52}] inside [z_{41}, z_{42}]
u_6     [z_{61}, z_{62}] inside [z_{51}, z_{52}]
x_2     One point (x_2, a/b) inside [z_{61}, z_{62}]
We draw these line segments in Fig. 10.30. We know the Predator–Prey trajectory
must go through these points. Every time the trajectory hits the x value c/d, the corresponding y spread is [y_1, y_2]. If the trajectory was spiraling inwards, then the first time we hit c/d, the spread would be [y_1, y_2] and the next time, the spread would have to be less so that the trajectory moved inwards. This can't happen as the second time we hit c/d, the spread is exactly the same. The points shown in Fig. 10.30 are always the same. Again, note that since the trajectory is periodic there is a smallest positive number T so that

x(t + T) = x(t) \quad \text{and} \quad y(t + T) = y(t)
for all values of t. This is the behavior we are seeing in Fig. 10.30. Note the value
of this period is really determined by the initial values (x0 , y0 ) as they determine the
bounding box [x1 , x2 ] × [y1 , y2 ] since the initial condition determines μ.
If we had a positive function h defined on an interval [a, b], we can define the average
value of h over [a, b] by the integral
\bar{h} = \frac{1}{b-a} \int_a^b h(t)\, dt. \qquad (10.20)
To motivate this definition, let’s look at the Riemann sums of some nice function
f on the interval [1, 3]. Take a uniform partition [1, 3] with 5 points. P4 = {1, 1 +
h, 1 + 2h, 1 + 3h, 3} where h = (3 − 1)/4 = 0.5. The evaluation set is the left hand
endpoints: E_4 = {1, 1 + h, 1 + 2h, 1 + 3h}. Note 4h = 3 − 1 = 2 which is the
length of [1, 3]. The Riemann sum is
RS = \bigl( f(1) + f(1+h) + f(1+2h) + f(1+3h) \bigr)\, h
   = \frac{f(1) + f(1+h) + f(1+2h) + f(1+3h)}{4}\,(4h)
   = \frac{f(1) + f(1+h) + f(1+2h) + f(1+3h)}{4}\,(2).

Note that \bigl( \sum_{j=0}^{3} f(1 + jh) \bigr)/4 is an estimate of the average value of f on [1, 3] using
4 values of the function.
Now cut h in half. The new partition is {1, 1 + h/2, 1 + 2(h/2), 1 + 3(h/2), ..., 3} which has 8 subintervals with 9 points. The evaluation set is the left hand endpoints again. Note 8(h/2) = 3 − 1 = 2 which is the length of [1, 3]. Dividing the new Riemann sum by 8 again estimates the average value of f, and as we keep halving h these estimates converge to the integral average in Eq. 10.20.
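A tiny numerical illustration of this idea (our own sketch; the function f(t) = t^2 is an arbitrary choice of a "nice function") shows the left-endpoint estimates approaching the integral average:

% Sketch: left-endpoint Riemann estimates of the average value of f on [1,3].
f = @(t) t.^2;                           % arbitrary illustrative function
exact = (1/(3-1)) * integral(f, 1, 3);   % exact average value, 13/3
for n = [4 8 16 32]
    h = (3-1)/n;
    left = 1 + (0:n-1)*h;                % left-hand endpoints of the partition
    approx = sum(f(left))/n;             % (Riemann sum)*h/(b-a) = sum(f(left))/n
    fprintf('n = %2d: average estimate = %.6f (exact %.6f)\n', n, approx, exact);
end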
Here are the details. Now, recall the Predator–Prey model is given by

x' = x\,(a - b\,y), \qquad y' = y\,(-c + d\,x), \qquad x(0) = x_0, \quad y(0) = y_0.

Hence,

\frac{x'(s)}{x(s)} = a - b\, y(s)
for all 0 ≤ s ≤ T where T is the period for this trajectory. Now integrate from s = 0
to s = T to get
\int_0^T \frac{x'(s)}{x(s)}\, ds = \int_0^T \bigl(a - b\, y(s)\bigr)\, ds.
Hence, we have
\ln x(s)\Big|_0^T = a\,T - b \int_0^T y(s)\, ds.
Simplifying, we find
\ln\frac{x(T)}{x_0} = a\,T - b \int_0^T y(s)\, ds.
However, since T is the period for this trajectory, we know x(T ) must equal x(0).
Hence, ln(x(T )/x0 ) = ln(1) = 0. Rearranging, we conclude
0 = a\,T - b \int_0^T y(s)\, ds,
\qquad b \int_0^T y(s)\, ds = a\,T,
\qquad \frac{1}{T} \int_0^T y(s)\, ds = \frac{a}{b}.
The term on the left hand side is the average value of the solution y over the one
period of time, [0, T ]. Using the usual average notation, we will call this ȳ. Thus,
we have
\bar{y} = \frac{1}{T} \int_0^T y(s)\, ds = \frac{a}{b}. \qquad (10.21)
We can do a similar analysis for the average value of the x component of the solution.
We find
\frac{y'(s)}{y(s)} = -c + d\,x(s), \qquad 0 \le s \le T,
\qquad \int_0^T \frac{y'(s)}{y(s)}\, ds = \int_0^T \bigl(-c + d\,x(s)\bigr)\, ds,
\qquad \ln y(s)\Big|_0^T = -c\,T + d \int_0^T x(s)\, ds,
\qquad \ln\frac{y(T)}{y_0} = -c\,T + d \int_0^T x(s)\, ds.
However, since T is the period for this trajectory, we know y(T ) must equal y(0).
Hence, ln(y(T )/y0 ) = ln(1) = 0. Rearranging, we conclude
0 = -c\,T + d \int_0^T x(s)\, ds,
\qquad d \int_0^T x(s)\, ds = c\,T,
\qquad \frac{1}{T} \int_0^T x(s)\, ds = \frac{c}{d}.
The term on the left hand side is the average value of the solution x over the one
period of time, [0, T ]. Using the usual average notation, we will call this x̄. Thus,
we have
\bar{x} = \frac{1}{T} \int_0^T x(s)\, ds = \frac{c}{d}. \qquad (10.22)
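These average formulas are easy to check against a numerical solution. The sketch below is ours (not the text's code); it uses MATLAB's ode45 with the illustrative parameter values a = 2, b = 10, c = 3, d = 18, the same model that reappears later in this chapter, and averages over many periods:

% Sketch: compare long-run averages of a numerical solution to c/d and a/b.
a = 2; b = 10; c = 3; d = 18;
rhs = @(t,u) [ u(1)*(a - b*u(2)); u(2)*(-c + d*u(1)) ];
[t,u] = ode45(rhs, [0 100], [0.1; 0.1], odeset('RelTol',1e-8));
xbar = trapz(t, u(:,1)) / (t(end) - t(1));   % approximate average of x over [0,100]
ybar = trapz(t, u(:,2)) / (t(end) - t(1));   % approximate average of y over [0,100]
fprintf('xbar = %.4f (c/d = %.4f), ybar = %.4f (a/b = %.4f)\n', xbar, c/d, ybar, a/b);

Because 100 time units is not an exact whole number of periods, the computed averages only approximate c/d and a/b, but the agreement improves as the averaging window grows.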
10.8.1 Homework
For the following Predator–Prey models, derive the average x and y equations.
Exercise 10.8.1
Exercise 10.8.2
Exercise 10.8.3
Exercise 10.8.4
Exercise 10.8.5
Solution For any choice of initial conditions (x_0, y_0), we can solve this as discussed in the previous sections. We find a = 2, b = 10 so that a/b = 0.2, and c = 3, d = 18 so that c/d = 1/6 ≈ 0.1667. We know a lot about these solutions now.
1. The solution (x(t), y(t)) has an average x value \bar{x} = c/d ≈ 0.1667 and an average y value \bar{y} = a/b = 0.2.
2. The initial condition (x0 , y0 ) is some point on the curve.
3. For each choice of initial condition (x0 , y0 ), there is a corresponding period T
so that (x(t), y(t)) = (x(t + T ), y(t + T )) for all time t.
4. Looking at Fig. 10.30, we can connect the dots so to speak to generate the trajec-
tory shown in Fig. 10.31.
Now that you know how to analyze the predator–prey models, you can look in
the literature and see how they are used. We will leave it up to you to find the many
references on how this model is used to study the wolf–moose population on Isle Royale in Lake Superior and instead point you to a different one. In Axelsen et al.
(2001), the predators are Atlantic Puffins and the prey are juvenile herring and the
research is trying to understand the shapes that schools of herring take in the wild
under predation. This study is primarily descriptive with no mathematics at all but
the references point to other papers where simulations are carried out. You have
enough training now to follow this paper trail and see how the simulation papers and
the descriptive papers work hand in hand. But we leave the details and hard work
to do this to you. Happy hunting! If you look at the references in this paper, you’ll
note that one of the papers there is by Hamilton, the same biologist whose work on
altruism we studied in Peterson (2015).
10.9.1 Homework
Do these analyses for some specific value of μ; for example, μ = 0.8 or something similar. Of course, the specific value doesn't matter that much, but the graphical analysis is easier to see if μ f_{max} is not too close to the peak of f. This time also draw the bounding boxes that we get for different values of μ. You will see that when μ is close to 1, the bounding box is very small, and as μ gets close to 0, the bounding boxes get very large. Also, you will see they are nested inside each other.
Fig. 10.31 The theoretical trajectory for x' = 2x − 10xy, y' = −3y + 18xy. We do not know the actual trajectory as we cannot solve for x and y explicitly as functions of time. However, our analysis tells us the trajectory has the qualitative features shown
The Predator–Prey model we have looked at so far did not help Volterra explain the
food and predator fish data seen in the Mediterranean sea during World War I. The
model must also handle changes in fishing rates. War activities had decreased the
rate of fishing from 1915 to 1919 or so as shown in Table 10.2. To understand this
data, Volterra added a new decay rate to the model. He let the positive constant r
represent the rate of fishing and assumed that −r x would be removed from food fish
due to fishing and also assumed that the same rate would apply to predator removal.
Hence, −r y would be removed from the predators. This led to the Predator–Prey
model with fishing given by

x' = a\,x - b\,x\,y - r\,x, \qquad y' = -c\,y + d\,x\,y - r\,y.

We don't have to work too hard to understand what adding the fishing does to our model results. We can rewrite the model as

x' = (a - r)\,x - b\,x\,y, \qquad y' = -(c + r)\,y + d\,x\,y.

We see immediately that it doesn't make sense for the fishing rate to exceed a as we want a − r to be positive. We also know the new averages are
\bar{x}_r = \frac{c + r}{d}, \qquad \bar{y}_r = \frac{a - r}{b},
where we label the new averages with a subscript r to denote their dependence on
the fishing rate r. What happens if we halve the fishing rate r? Replacing r by r/2, the new averages are

\bar{x}_{r/2} = \frac{c + r/2}{d}, \qquad \bar{y}_{r/2} = \frac{a - r/2}{b}.
Note that as long as we use a feasible r value (i.e. r < a), we have the following inequality relationships:

\bar{x}_{r/2} = \frac{c + r/2}{d} < \bar{x}_r = \frac{c + r}{d}, \qquad \bar{y}_{r/2} = \frac{a - r/2}{b} > \bar{y}_r = \frac{a - r}{b}.
Hence, if we decrease the fishing rate r , the predator percentage goes up and the food
percentage goes down. Now look at Table 10.2 rewritten with the percentages listed
as fractions and interpreted as x̄ and ȳ. We show this in Table 10.4.
Note that Volterra’s Predator–Prey model with fishing rates added has now
explained this data. During the war years, predator amounts went up and food fish
amounts went down. A wonderful use of modeling, don’t you think? Insight was
gained from the modeling that could not be achieved using other types of analysis.
Let’s do an example to set this in place. Consider the following Predator–Prey
model with fishing added.
Example 10.10.1
Table 10.4 The average food and predator fish caught in the Mediterranean Sea
Year   x̄   ȳ   Fishing rate change   Direction of change in (x̄, ȳ)
1914 0.881 0.119 Starting value No change yet
1915 0.786 0.214 Down relative to 1914 (−, +)
1916 0.779 0.221 Down relative to 1914 (−, +)
1917 0.788 0.212 Down relative to 1914 (−, +)
1918 0.636 0.364 Down relative to 1914 (−, +)
1919 0.727 0.273 Increased relative to 1918 (+, −)
1920 0.840 0.160 Increased relative to 1918 (+, −)
1921 0.841 0.159 Increased relative to 1918 (+, −)
1922 0.852 0.148 Increased relative to 1918 (+, −)
1923 0.893 0.107 Back to normal 1914 rate Back to normal
We see that halving the fishing rate decreases the food fish amounts (0.2381 down to 0.1905) and increases the predator amounts (0.1111 up to 0.1667). We could also show this graphically by drawing all three average pairs on the same x–y plane
but we will leave that to you in the exercises.
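A short sketch of that comparison is below. The parameter values here are assumed for illustration only (they are not the example's actual values); any choice with r < a shows the same pattern.

% Sketch with assumed, hypothetical parameters: average pairs with no fishing,
% fishing at rate r, and fishing at rate r/2.
a = 2; b = 10; c = 3; d = 18; r = 0.5;     % assumed values; any r < a works
avg = @(rr) [ (c + rr)/d, (a - rr)/b ];    % [xbar, ybar] for fishing rate rr
A0 = avg(0); Ar = avg(r); Ah = avg(r/2);
fprintf('no fishing: (%.4f, %.4f)\n', A0);
fprintf('rate r    : (%.4f, %.4f)\n', Ar);
fprintf('rate r/2  : (%.4f, %.4f)\n', Ah);
plot([A0(1) Ar(1) Ah(1)], [A0(2) Ar(2) Ah(2)], 'o');
xlabel('xbar'); ylabel('ybar');            % halving r moves the point left and up

Running this shows the average pair moving left (less food fish) and up (more predators) as the fishing rate is reduced, exactly the pattern in Table 10.4.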
10.10.1 Homework
For the following problems, add fishing to the model at some rate r which is given. Find the new average solutions (\bar{x}, \bar{y}) and explain what happens if we halve the fishing rate and how this relates to the way Volterra explained the Mediterranean Sea fishing data from World War I. Draw a simple picture showing these three averages on the same x–y graph: the original (\bar{x}, \bar{y}), the (\bar{x}, \bar{y}) when the fishing is added and the (\bar{x}, \bar{y}) when the fishing is halved. You should clearly see that halving the fishing rate leads to the average predator value going up and the average food fish value going down.
Exercise 10.10.1
Exercise 10.10.2
Exercise 10.10.3
Exercise 10.10.4
Exercise 10.10.5
Let’s try to solve a typical predator–prey system such as the one given below numer-
ically.
x (t) = a x(t) − b x(t) y(t)
y (t) = −c y(t) + d x(t) y(t)
and we can no longer find the true solution, although our theoretical investigations
have told us a lot about the behavior that the true solution must have.
Let’s solve a Predator–Prey Model with Runge–Kutta Order 4.
This gives the plot of Fig. 10.32. Let’s annotate this code.
% set x and y labels
xlabel('x');
ylabel('y');
% set title
title('Phase Plane for Predator-Prey model x'' = 12x - 5xy, y'' = -6y + 3xy, x(0) = 0.2, y(0) = 8.6');
% set legend
legend('x1','x2','y1','y2','y vs x','Location','Best');
% cancel hold
hold off
We’ll estimate the period for our sample problem. We start with a small final time T
and move it up until the trajectory is almost closed.
This gives us Fig. 10.33 and we can see the period T > 1.01.
This gives us Fig. 10.34 and we can see the trajectory is now closed. So the period
T ≤ 1.02. Hence, we know 1.01 < T ≤ 1.02.
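The bracketing can also be automated. The sketch below is our own illustration, not the text's scripts: it integrates the model x' = 12x − 5xy, y' = −6y + 3xy named in the plot title above with MATLAB's built-in ode45 and reports the first time the trajectory returns close to its starting point. The closeness tolerance 0.05 and the output time grid are ad hoc choices.

% Sketch: estimate the period by detecting the first return near the start.
rhs = @(t,u) [ u(1)*(12 - 5*u(2)); u(2)*(-6 + 3*u(1)) ];
u0  = [0.2; 8.6];
[t,u] = ode45(rhs, 0:0.0005:2, u0, odeset('RelTol',1e-9,'AbsTol',1e-11));
dist = sqrt((u(:,1) - u0(1)).^2 + (u(:,2) - u0(2)).^2);   % distance back to start
k = find(dist(2:end) < 0.05 & t(2:end) > 0.5, 1) + 1;     % first return after leaving
fprintf('estimated period T is about %.3f\n', t(k));

Since the orbit passes exactly through the starting point once per period, the reported time is a slight underestimate of T, and tightening the tolerance sharpens the estimate.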
10.11 Numerical Solutions 345
We can also write code to generate x versus t plots and y versus t plots. From these, we can also estimate the period T. The x versus t code is shown below and right after it is the one line modification needed to generate the plot of y versus t. The code below shows a little bit more than one period for x and y.
This is much more compact! We can use it to generate our graphs much faster.
We can use this to show you how the choice of step size is crucial to generating a
decent plot. We show what happens with too large a step size in Fig. 10.37 and what
we see with a better step size choice in Fig. 10.38.
Next, we can generate a real phase plane portrait by automating the phase plane plots
for a selection of initial conditions. This uses the code AutoPhasePlanePlot.m
which we discussed in Sect. 9.4.
We generate a very nice phase plane plot as shown in Fig. 10.39 for the model
for initial conditions from the box [0.1, 4.5] × [0.1, 4.5] using a fairly small step
size of 0.2.
10.11.5 Homework
Exercise 10.11.1
1. Use our Runge–Kutta codes for h sufficiently small to generate a periodic orbit
using initial conditions:
(a) (x_0, y_0) = (2, 1)
(b) (x_0, y_0) = (5, 2)
Exercise 10.11.2
1. Use our Runge–Kutta codes for h sufficiently small to generate a periodic orbit
using initial conditions:
(a) (x_0, y_0) = (4, 12)
(b) (x_0, y_0) = (5, 20)
Exercise 10.11.3
1. Use our Runge–Kutta codes for h sufficiently small to generate a periodic orbit
using initial conditions:
(a) (x_0, y_0) = (40, 2)
(b) (x_0, y_0) = (5, 25)
Exercise 10.11.4
1. Use our Runge–Kutta codes for h sufficiently small to generate a periodic orbit
using initial conditions:
(a) (x_0, y_0) = (7, 12)
(b) (x_0, y_0) = (0.2, 2)
Exercise 10.11.5
1. Use our Runge–Kutta codes for h sufficiently small to generate a periodic orbit using initial conditions:
(a) (x_0, y_0) = (0.1, 18)
(b) (x_0, y_0) = (6, 0.1)
We now have quite a few tools for analyzing Predator–Prey models. Let’s look at
a sample problem. We can analyze by hand or with computational tools. Here is a
sketch of the process on a sample problem.
1. For the system below, first do the work by hand. For the model
4. Now plot many trajectories at the same time. A typical session usually requires
a lot of trial and error. The AutoPhasePlanePlot.m script is used by filling
in values for the inputs it needs. We generate the plot seen in Fig. 10.41.
10.12.1 Project
Solve the Model By Hand: Do this and attach to your project report.
Plot One Trajectory Using MatLab: Follow the outline above. This part of the
report is done in a word processor with appropriate comments, discussion etc.
Show your MatLab code and sessions as well as plots.
Estimate The Period T: Estimate the period T using the x versus time plot and
then fine tune your estimate using the phase plane plot—keep increasing the
final time until the trajectories touch for the first time. Pick an interesting initial
condition, of course!
Plot Many Trajectories Simultaneously Using MatLab: Follow the outline
above. This part of the report is also done in a word processor with appropri-
ate comments, discussion etc. Show your MatLab code and sessions as well as
plots.
References
Many biologists of Volterra’s time criticized his Predator–Prey model because it did
not include self-interaction terms. These are terms that model how food fish interactions with other food fish and shark interactions with other predators affect their populations. We can model these effects by assuming their magnitude is proportional
to the interaction. Mathematically, we assume these are both decay terms giving us
the Predator–Prey Self Interaction model
x'_{self} = -e\, x \cdot x, \qquad y'_{self} = -f\, y \cdot y,
for positive constants e and f . We are thus led to the new self-interaction model
given below:
x'(t) = a\, x(t) - b\, x(t)\, y(t) - e\, x(t)^2
\qquad
y'(t) = -c\, y(t) + d\, x(t)\, y(t) - f\, y(t)^2
The nullclines for the self-interaction model are a bit more complicated, but still
straightforward to work with. First, we can factor the dynamics to obtain
x' = x\,(a - b\,y - e\,x), \qquad y' = y\,(-c + d\,x - f\,y).

Looking at the predator–prey self interaction dynamics equations, we see the (x, y)
pairs in the x–y plane where
0 = x\,(a - b\,y - e\,x)
are the ones where the rate of change of the food fish will be zero. Now these pairs
can correspond to many different time values so what we really need to do is to
find all the (x, y) pairs where this happens. Since this is a product, there are two
possibilities:
In a similar way, the pairs (x, y) where y' becomes zero satisfy the equation

0 = y\,(-c + d\,x - f\,y).
Just like we did in Chap. 8, we find the parts of the x–y plane where the algebraic signs of x' and y' are (+, +), (+, −), (−, +) and (−, −). As usual, the set of (x, y) pairs where x' = 0 is called the nullcline for x; similarly, the set of points where y' = 0 is the nullcline for y. The x' = 0 equation gives us the y axis and the line y = a/b − (e/b)x, while the y' = 0 equation gives the x axis and the line y = −c/f + (d/f)x. The x and y nullclines thus divide the plane into the usual three pieces: the part where the derivative is positive, zero or negative. In Fig. 11.1, we show the part of the x–y plane where x' > 0 with one shading and the part where it is negative with another. In Fig. 11.2, we show how the y nullcline divides the x–y plane into three pieces as well. For x', in each region of interest, we know the term x' has the two factors x and a − by − ex.
The second factor is positive when
a - b\,y - e\,x > 0 \quad \Longleftrightarrow \quad y < \frac{a}{b} - \frac{e}{b}\,x.
Fig. 11.1 Finding where x' < 0 and x' > 0 for the Predator–Prey self interaction model. Here x' = x(a − by − ex); setting this to 0 gives x = 0 and y = a/b − (e/b)x, whose graphs are shown, and the algebraic signs of x' and of each of its factors are indicated region by region.
Fig. 11.2 Finding where y' < 0 and y' > 0 for the Predator–Prey self interaction model. Here y' = y(−c + dx − fy); setting this to 0 gives y = 0 and y = −c/f + (d/f)x, whose graphs are shown, and the algebraic signs of y' and of each of its factors are indicated region by region.
So below the line, the factor is positive. In a similar way, the term y' has the two factors y and −c + dx − fy. Here the second factor is positive when

-c + d\,x - f\,y > 0 \quad \Longleftrightarrow \quad y < -\frac{c}{f} + \frac{d}{f}\,x.
So below the line, the factor is positive. We then use this information to determine
the algebraic signs in each region. In Fig. 11.1, we show these four regions (think of
them as Upper Left (UL), Upper Right (UR), Lower Left (LL) and Lower Right (LR)
for convenience) with the x' equation shown in each region along with the algebraic signs for each of the two factors. The y' signs are shown in Fig. 11.2.
The areas shown in Figs. 11.1 and 11.2 can be combined into one drawing. To do
this, we divide the x–y plane into as many regions as needed and in each region, label
x' and y' as either positive or negative. Hence, each region can be marked with an ordered pair, (x' ±, y' ±). In this self-interaction case, there are three separate cases: the one where c/d < a/e which gives an intersection in Quadrant 1, the one where c/d = a/e which gives an intersection on the x axis and the one where c/d > a/e which gives an
intersection in Quadrant 4. We are interested in biologically reasonable solutions so
if the initial conditions start in Quadrant 1, we would like to know the trajectories
stay in Quadrant 1 away from the x and y axes.
Solution • For x', in each region of interest, we know the term x' has the two factors x and 4 − 5y − ex. The second factor is positive when

4 - 5\,y - e\,x > 0 \quad \Longrightarrow \quad y < \frac{4}{5} - \frac{e}{5}\,x.
So below the line, the factor is positive.
• the term y' has the two factors y and −6 + 2x − fy. Here the second factor is positive when

-6 + 2\,x - f\,y > 0 \quad \Longrightarrow \quad y < -\frac{6}{f} + \frac{2}{f}\,x.
11.1.4 Homework
For these models, do the complete nullcline analysis for x' = 0 and y' = 0 separately with all details.
Exercise 11.1.1
(Sign diagram for x' = x(4 − 5y − ex): x' = 0 on x = 0 and on y = 4/5 − (e/5)x; the algebraic signs of x' and of each factor are shown region by region in the figure.)
(Sign diagram for y' = y(−6 + 2x − fy): y' = 0 on y = 0 and on y = −6/f + (2/f)x; the algebraic signs of y' and of each factor are shown region by region in the figure.)
Exercise 11.1.2
Exercise 11.1.3
Exercise 11.1.4
To prepare for our Quadrant 1 analysis, let's combine the nullclines, but only in Quadrant 1. First, let's redraw the derivative sign analysis just in Quadrant 1. In Fig. 11.5 we show the x' + and x' − regions in Quadrant 1 only. We will show that we only need to look at the model in Quadrant 1. To do this, we will show that trajectories starting on the positive y axis move down towards the origin. Further, we will show trajectories starting on the positive x axis move towards the point (a/e, 0). Then, since trajectories cannot cross, a trajectory that starts in Quadrant 1 with positive ICs cannot cross the y axis and cannot cross the positive x axis, though it can end up at the point (a/e, 0).
In Fig. 11.6 we then show the Quadrant 1 analysis for the y' + and y' − regions.
We know different trajectories can not cross, so if we can show there are trajecto-
ries that stay on the x and y axes, we will know that trajectories starting in Quadrant
1 stay in Quadrant 1.
Fig. 11.5 The x' < 0 and x' > 0 signs in Quadrant 1 for the Predator–Prey self interaction model. In Quadrant 1, x' = x(a − by − ex); setting this to 0 gives x = 0 and y = a/b − (e/b)x, whose graphs are shown, and the algebraic signs of x' are indicated in the picture.
Fig. 11.6 The y' < 0 and y' > 0 signs in Quadrant 1 for the Predator–Prey self interaction model. In Quadrant 1, y' = y(−c + dx − fy); setting this to 0 gives y = 0 and y = −c/f + (d/f)x, whose graphs are shown, and the algebraic signs of y' are indicated in the picture.
Let’s look at the trajectories that start on the positive y axis for this model.
• On the positive y axis, x = 0, so y' = y(−6 − f y) = −y(6 + f y). Rewriting,

\frac{y'}{y\,(6 + f\,y)} = -1.

• To integrate the left hand side, use the partial fraction decomposition

\frac{1}{u\,(6 + f\,u)} = \frac{\alpha}{u} + \frac{\beta}{6 + f\,u}.

• We want

1 = \alpha\,(6 + f\,u) + \beta\,u,

which gives α = 1/6 and β = −f/6.
• Integrate from 0 to t, apply the initial condition and exponentiate to obtain

\frac{y(t)}{6 + f\,y(t)} = e^{-6t}\, \frac{y_0}{6 + f\,y_0}.

Since the right hand side goes to 0 as t → ∞, it follows that y(t) → 0: trajectories that start on the positive y axis slide down toward the origin.
Let’s look at the trajectories that start on the positive x axis for the same model.
• Hence, the x equation is a logistics model with L = 4/e and α = e. So x(t) → 4/e
as t → ∞. If x0 > 4/e, x(t) goes down toward 4/e and if x0 < 4/e, x(t) goes
up to 4/e.
• This argument works for any e and any other model. So trajectories that start on
the positive x axis move towards a/e.
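A quick numerical check of this claim (our own sketch; the value e = 1 is an arbitrary choice) integrates the x-axis dynamics from initial points on either side of 4/e:

% Sketch: on the positive x axis the dynamics reduce to the logistic x' = x(4 - e*x).
e = 1;                               % arbitrary illustrative value
rhs = @(t,x) x.*(4 - e*x);
[t1,x1] = ode45(rhs, [0 5], 6);      % start above 4/e = 4
[t2,x2] = ode45(rhs, [0 5], 0.5);    % start below 4/e
plot(t1, x1, t2, x2, [0 5], [4/e 4/e], '--');
xlabel('t'); ylabel('x(t)');         % both curves approach 4/e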
11.2.3 Homework
Analyze the x and y positive axis trajectories as we have done in the above discus-
sions.
Exercise 11.2.1
Exercise 11.2.2
Exercise 11.2.3
Exercise 11.2.4
We now know that if a trajectory starts at x_0 > 0 and y_0 > 0, it cannot cross the positive x or y axis. There are three cases to consider. If c/d < a/e, the nullclines will cross somewhere in Quadrant 1. This intersection point will play the same role as the average x and average y in the Predator–Prey model without self interaction. If c/d = a/e, the two nullclines intersect on the x axis at a/e. Finally, if c/d > a/e, the intersection occurs in Quadrant 4, which is not biologically reasonable and is not even accessible as the trajectory cannot cross the x axis. Let's work out the details of these possibilities.
We now combine Figs. 11.5 and 11.6 to create the combined graph for the case of
the intersection in Quadrant 1. We show this in Fig. 11.7. The two lines cross when
e\,x + b\,y = a, \qquad d\,x - f\,y = c.
Fig. 11.7 The Quadrant 1 nullcline regions for the Predator–Prey self interaction model when c/d < a/e. The combined (x', y') algebraic sign graph in Quadrant 1 shows four regions of interest, bounded by the x' = 0 nullcline y = a/b − (e/b)x and the y' = 0 nullcline y = −c/f + (d/f)x.
By Cramer's rule, the intersection point is

x^* = \frac{\det\begin{pmatrix} a & b \\ c & -f \end{pmatrix}}{\det\begin{pmatrix} e & b \\ d & -f \end{pmatrix}} = \frac{a f + b c}{e f + b d},
\qquad
y^* = \frac{\det\begin{pmatrix} e & a \\ d & c \end{pmatrix}}{\det\begin{pmatrix} e & b \\ d & -f \end{pmatrix}} = \frac{a d - e c}{e f + b d}.
The second case is the one where the nullclines touch on the x axis. We show this situation in Fig. 11.8. This occurs when c/d = a/e.
The third case is the one where the nullclines do not cross in Quadrant 1. We show this situation in Fig. 11.9. This occurs when c/d > a/e. The two lines now cross at a negative y value, but since in this model also, trajectories that start in Quadrant 1
Fig. 11.8 The qualitative nullcline regions for the Predator–Prey self interaction model when c/d = a/e. The combined (x', y') algebraic sign graph in Quadrant 1: the nullclines y = a/b − (e/b)x and y = −c/f + (d/f)x do not cross in the open quadrant; they touch on the x axis at x = a/e = c/d.
Fig. 11.9 The qualitative nullcline regions for the Predator–Prey self interaction model when c/d > a/e. The combined (x', y') algebraic sign graph in Quadrant 1: the nullclines y = a/b − (e/b)x and y = −c/f + (d/f)x do not cross in Quadrant 1.
can’t cross the x or y axis, we only draw the situation in Quadrant 1. By Cramer’s
rule, the solution to

e\,x + b\,y = a, \qquad d\,x - f\,y = c

is

x^* = \frac{a f + b c}{e f + b d}, \qquad y^* = \frac{a d - e c}{e f + b d}.
In this case, we have a/e < c/d or a d − e c < 0 and so y ∗ is negative and not
biologically interesting.
11.3.4 Example
Fig. 11.10 The Quadrant 1 nullcline regions for the Predator–Prey self interaction model when c/d < a/e. The combined (x', y') algebraic sign graph in Quadrant 1 for the case c/d = 3 < a/e = 4: the x' = 0 nullcline is y = 4/5 − (1/5)x and the y' = 0 nullcline is y = −6/f + (2/f)x.
Fig. 11.11 The qualitative nullcline regions for the Predator–Prey self interaction model when c/d = a/e. The combined (x', y') algebraic sign graph in Quadrant 1 for the case c/d = 3 = a/e = 4/(4/3): the x' = 0 nullcline y = 4/5 − ((4/3)/5)x and the y' = 0 nullcline y = −6/f + (2/f)x touch on the x axis at x = a/e = 3.
Fig. 11.12 The qualitative nullcline regions for the Predator–Prey self interaction model when c/d > a/e. The combined (x', y') algebraic sign graph in Quadrant 1 for the case a/e = 4/2 = 2 < c/d = 6/2 = 3: the x' = 0 nullcline y = 4/5 − (2/5)x and the y' = 0 nullcline y = −6/f + (2/f)x do not cross in Quadrant 1.
Fig. 11.14 Sample Predator–Prey model with self interaction: the nullclines cross on the x axis
Finally, for e = 2, the nullclines do not cross in Quadrant I and the trajectories
move toward the point (4/2, 0). We show the trajectories in Fig. 11.15.
Fig. 11.15 Sample Predator–Prey model with self interaction: the nullclines cross in Quadrant 4
have to work as hard as we did before to establish this! However, it is also clear there
is not a true average x and y value here as the trajectory is not periodic. However,
there is a notion of an asymptotic average value which we now discuss.
Now the discussions below will be complicated, but all of you can wade through it as
it does not really use any more mathematics than we have seen before. It is, however,
very messy and looks quite intimidating! Still, mastering these kind of things brings
rewards: your ability to think through complicated logical problems is enhanced! So
grab a cup of tea or coffee and let’s go for a ride. We are going to introduce the idea
of limiting average x and y values.
We know at any time t, the solutions x(t) and y(t) must be positive. Rewrite the
model as follows:
\frac{x'}{x} + e\,x = a - b\,y, \qquad \frac{y'}{y} + f\,y = -c + d\,x.
We obtain
\ln\frac{x(t)}{x_0} + e \int_0^t x(s)\, ds = a\,t - b \int_0^t y(s)\, ds,
\qquad
\ln\frac{y(t)}{y_0} + f \int_0^t y(s)\, ds = -c\,t + d \int_0^t x(s)\, ds.
The functions X(t) = \int_0^t x(s)\, ds and Y(t) = \int_0^t y(s)\, ds are also continuous by the Fundamental Theorem of Calculus. Using the new variables X and Y, we can rewrite these integrations as

\ln\frac{x(t)}{x_0} = a\,t - e\,X(t) - b\,Y(t),
\qquad
\ln\frac{y(t)}{y_0} = -c\,t + d\,X(t) - f\,Y(t).
Hence,

d \ln\frac{x(t)}{x_0} = a\,d\,t - e\,d\,X(t) - b\,d\,Y(t),
\qquad
e \ln\frac{y(t)}{y_0} = -c\,e\,t + d\,e\,X(t) - e\,f\,Y(t).

Adding these two equations, the X(t) terms cancel and we find

\ln\left[\left(\frac{x(t)}{x_0}\right)^{d} \left(\frac{y(t)}{y_0}\right)^{e}\right] = (a\,d - c\,e)\,t - (e\,f + b\,d)\,Y(t).
From Fig. 11.7, it is easy to see that no matter what (x0 , y0 ) we choose in Quadrant
1, the trajectories are bounded and so there is a positive constant we will call B so
that
\left| \ln\left[\left(\frac{x(t)}{x_0}\right)^{d} \left(\frac{y(t)}{y_0}\right)^{e}\right] \right| \le B.
Hence, if we let t grow larger and larger, B/t gets smaller and smaller, and in fact

\lim_{t \to \infty} \frac{1}{t} \left| \ln\left[\left(\frac{x(t)}{x_0}\right)^{d} \left(\frac{y(t)}{y_0}\right)^{e}\right] \right| \le \lim_{t \to \infty} \frac{B}{t} = 0.
But the left hand side is always non-negative also, so we have
0 \le \lim_{t \to \infty} \frac{1}{t} \left| \ln\left[\left(\frac{x(t)}{x_0}\right)^{d} \left(\frac{y(t)}{y_0}\right)^{e}\right] \right| \le 0,

so the limit is 0. The term Y(t)/t is actually (1/t)\int_0^t y(s)\, ds, which is the average of the solution y on the interval [0, t]. Dividing the identity above by t and letting t → ∞, it therefore follows that

\lim_{t \to \infty} \frac{1}{t} \int_0^t y(s)\, ds = \frac{a\,d - c\,e}{e\,f + b\,d}.
But the term on the right hand side is exactly the y coordinate of the intersection
of the nullclines, y ∗ . We conclude the limiting average value of the solution y is
given by
\lim_{t \to \infty} \frac{1}{t} \int_0^t y(s)\, ds = y^*. \qquad (11.2)
These two results are similar to what we saw in the Predator–Prey model without
self-interaction. Of course, we only had to consider the averages over the period
before, whereas in the self-interaction case, we must integrate over all time. It is
instructive to compare these results:
Look back at the signs we see in Fig. 11.9. It is clear that trajectories that start to the left of c/d go up and to the left until they enter the (−, −) region. The analysis we did for trajectories starting on the x axis or y axis in the crossing nullclines case is still appropriate. So we know that once a trajectory is in the (−, −) region, it can't hit the x axis except at a/e. Similarly, a trajectory that starts in (+, −) moves right and down
towards the x axis, but can’t hit the x axis except at a/e. We can look at the details of
the (+, −) trajectories by reusing the material we figured out in the limiting averages
discussion. Since this trajectory is bounded, as t grows arbitrarily large, the x(t) and
y(t) values must approach fixed values. We will call these asymptotic x and y values
x ∞ and y ∞ for convenience. The trajectory must satisfy
\frac{1}{t} \ln\left[\left(\frac{x(t)}{x_0}\right)^{d} \left(\frac{y(t)}{y_0}\right)^{e}\right] = (a\,d - c\,e) - (e\,f + b\,d)\,\frac{1}{t}\,Y(t), \qquad (11.4)
with the big difference that the term a d − c e is now negative. Exponentiate to obtain
\left(\frac{x(t)}{x_0}\right)^{d} \left(\frac{y(t)}{y_0}\right)^{e} = e^{(a d - c e)\, t}\, e^{-(e f + b d)\, Y(t)}. \qquad (11.5)
Now note
• The term e−(e f +b d)Y (t) is bounded by 1.
• The term e(a d−c e) t goes to zero as t gets large because a d − c e is negative.
Hence, as t increases to infinity, we find
\lim_{t \to \infty} \left(\frac{x(t)}{x_0}\right)^{d} \left(\frac{y(t)}{y_0}\right)^{e} = \left(\frac{x^{\infty}}{x_0}\right)^{d} \left(\frac{y^{\infty}}{y_0}\right)^{e} = 0.
11.6.1 Homework
Draw suitable trajectories for the following Predator–Prey models with self interac-
tion in great detail.
Exercise 11.6.1
Exercise 11.6.2
Exercise 11.6.3
Exercise 11.6.4
Exercise 11.6.5
We can now summarize how you would completely solve a typical Predator–Prey
self-interaction model problem from first principles. These are the steps you need to
do:
1. Draw the nullclines reasonably carefully in multiple colors to make your teacher happy.
2. Determine if the nullclines cross as this makes a big difference in the kind of
trajectories we will see. Find the place where the nullclines cross if they do.
3. Once you know what the nullclines do, you can solve these problems completely.
Draw a few trajectories in each of the regions determined by the nullclines.
4. From our work in Sect. 11.5, we know the solutions to the Predator–Prey model
with self-interaction having initial conditions in Quadrant 1 are always positive.
You can then use this fact to derive the amazingly true statement that a solution
pair, x(t) and y(t), satisfies
\ln\left[\left(\frac{x(t)}{x_0}\right)^{d} \left(\frac{y(t)}{y_0}\right)^{e}\right] = (a\,d - c\,e)\,t - (e\,f + b\,d) \int_0^t y(s)\, ds.
From our theoretical investigations, we know that if the ratio c/d exceeds the ratio a/e, the solutions should approach the point (a/e, 0) as time gets large. Let's see if we get that result numerically. Let's try this problem,
Let’s try this problem,
We generate the plot as shown in Fig. 11.16. Note that here c/d = 4/5 and a/e = 2/3
so c/d > a/e which tells us the nullcline intersection is in Quadrant 4. Hence, all
trajectories should go toward a/e = 2/3 on the x-axis.
Now let’s look at what happens when the nullclines cross. We now use the model
Since a/e = 2/1.5 and c/d = 4/5, the nullclines cross in Quadrant 1 and the
trajectories should converge to
x^* = \frac{a f + b c}{e f + b d} = \frac{2(1.5) + 3(4)}{1.5(1.5) + 3(5)} = \frac{15}{17.25} \approx 0.87,
\qquad
y^* = \frac{a d - e c}{e f + b d} = \frac{2(5) - 1.5(4)}{17.25} = \frac{4}{17.25} \approx 0.23.
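We can check this numerically. The sketch below is ours, not the text's, and it assumes the model implied by the parameter values quoted above, x' = 2x − 3xy − 1.5x², y' = −4y + 5xy − 1.5y²; under that assumption the trajectory should settle near (x*, y*):

% Sketch (assumed parameters a=2, b=3, c=4, d=5, e=1.5, f=1.5 as above).
a = 2; b = 3; c = 4; d = 5; e = 1.5; f = 1.5;
xs = (a*f + b*c)/(e*f + b*d);          % x* from Cramer's rule
ys = (a*d - e*c)/(e*f + b*d);          % y* from Cramer's rule
rhs = @(t,u) [ u(1)*(a - b*u(2) - e*u(1)); u(2)*(-c + d*u(1) - f*u(2)) ];
[t,u] = ode45(rhs, [0 200], [0.5; 0.5]);
fprintf('(x*, y*) = (%.4f, %.4f); final state = (%.4f, %.4f)\n', xs, ys, u(end,1), u(end,2));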
11.8.1 Homework
Exercise 11.8.1
Exercise 11.8.2
Exercise 11.8.3
Exercise 11.8.4
Exercise 11.8.5
x' = a\,x - b\,x\,y - e\,x^2, \qquad y' = -c\,y + d\,x\,y - f\,y^2.
The term f y 2 models how much is lost to self interaction between the predators.
It seems reasonable that this loss should be less than the amount of food fish that
are being eaten by the predators. Hence, we will assume in this model that f < b
always so that we get a biologically reasonable model. Then note adding fishing can
be handled in the same way as before. The role of the average values will now be
played by the value of the intersection of the nullclines. We have
∗ ∗ a f + bc ad − ec
xno , yno = ,
e f + bd e f + bd
(a − r ) f + b(c + r ) (a − r )d − e(c + r )
xr∗ , yr∗ = ,
e f + bd e f + bd
a f + bc ad − ec −r f + br −r d − er )
= , + ,
e f + bd e f + bd e f + bd e f + bd
∗ ∗ b − f d + e
= xno , yno +r , −
e f + bd e f + bd
∗ ∗ ∗ ∗ r b − f d +e
xr/2 , yr/2 = xno , yno + , −
2 e f + bd e f + bd
Now compare:

x^*_{r/2} = x^*_{no} + \frac{r}{2}\, \frac{b - f}{e f + b d}
= x^*_{no} + r\, \frac{b - f}{e f + b d} - \frac{r}{2}\, \frac{b - f}{e f + b d}
= x^*_{r} - \frac{r}{2}\, \frac{b - f}{e f + b d},

which shows us x^*_{r/2} goes down with the reduction in fishing as b > f. Similarly,
y^*_{r/2} = y^*_{no} - \frac{r}{2}\, \frac{d + e}{e f + b d}
= y^*_{no} - r\, \frac{d + e}{e f + b d} + \frac{r}{2}\, \frac{d + e}{e f + b d}
= y^*_{r} + \frac{r}{2}\, \frac{d + e}{e f + b d},

which shows us y^*_{r/2} goes up with the reduction in fishing since d + e > 0. This is the same
behavior we saw in the original Predator–Prey model without self-interaction.
In our study of the Predator–Prey model, we have seen the model without self-
interaction was very successful at giving us insight into the fishing catch data during
World War I in the Mediterranean sea. This was despite the gross nature of the model.
No attempt was made to separate the food fish category into multiple classes of food
fish; no attempt made to break down the predatory category into various types of
predators. Yet, the modeling was ultimately successful as it provided illumination
into a biological puzzle. However, the original model lacked the capacity for self-
interaction and so it seemed plausible to add this feature. The self-interaction terms
we use in this chapter seemed quite reasonable, but our analysis has shown it leads
to completely wrong biological consequences. This tells us the way we model self-
interaction is wrong. The self-interaction model, in general, would be
x' = a\,x - b\,x\,y - e\,u(x, y), \qquad y' = -c\,y + d\,x\,y - f\,v(x, y)
where u(x, y) and v(x, y) are functions of both x and y that determine the self-
interaction. To analyze this new model, we would proceed as before. We determine
the nullclines
0 = x' = a\,x - b\,x\,y - e\,u(x, y), \qquad 0 = y' = -c\,y + d\,x\,y - f\,v(x, y)
and begin our investigations. We will have to decide if the model generates trajectories
that remain in Quadrant 1 if they start in Quadrant 1 so that they are biologically
reasonable. This will require a lot of hard work!
Note, we can simply compute solutions using MatLab or some other tool. We will
not know what the true solution is or even any ideas as to what general appearance it
might have. You should be able to see that a balanced blend of mathematical analysis,
computational study using a tool and intuition from the underlying science must be
used together to solve the problems.
Chapter 12
Disease Models
We will now build a simple model of an infectious disease called the SIR model.
Assume the total population we are studying is fixed at N individuals. This population
is then divided into three separate pieces: we have individuals
• that are susceptible to becoming infected are called Susceptible and are labeled
by the variable S. Hence, S(t) is the number that are capable of becoming infected
at time t.
• that can infect others. They are called Infectious and the number that are infectious
at time t is given by I (t).
• that have been removed from the general population. These are called Removed
and their number at time t is labeled by R(t).
We make a number of key assumptions about how these population pools interact.
• Individuals stop being infectious at a positive rate γ which is proportional to the
number of individuals that are in the infectious pool. If an individual stops being
infectious, this means this individual has been removed from the population. This
could mean they have died, the infection has progressed to the point where they
can no longer pass the infection on to others or they have been put into quarantine
in a hospital so that further interactions with the general population is not possible.
In all of these cases, these individuals are not infectious or can’t cause infections
and so they have been removed from the part of the population N which can be
infected or is susceptible. Mathematically, this means we assume
Iloss = −γ I.
Igain = r S I.
We can then figure out the net rates of change of the three populations. The infectious
populations gains at the rate r S I and loses at the rate γ I . Hence, the net gain is
Igain + Iloss or
I = r S I − γ I.
The net change of Susceptible’s is that of simple decay. Susceptibles are lost at the
rate −r S I . Thus, we have
S = − r S I.
Finally, the removed population increases at the same rate the infectious population
decreases. We have
R = γ I.
We also know that R(t) + S(t) + I (t) = N for all time t because our population is
constant. So only two of the three variables here are independent. We will focus on
the variables I and S from now on. Our complete Infectious Disease Model is then
I = r S I − γ I (12.1)
S = −r S I (12.2)
I (0) = I0 (12.3)
S(0) = S0 . (12.4)
I = 0 = I (r S − γ)
S = 0 = − r S I.
Fig. 12.1 Finding where I < 0 and I > 0 for the disease model
Fig. 12.2 Finding where S < 0 and S > 0 regions for the disease model
The nullcline information for I = 0 and S = 0 can be combined into one picture
which we show in Fig. 12.3.
384 12 Disease Models
Fig. 12.3 Finding the (I , S ) algebraic sign regions for the disease model
12.1.1 Homework
For the following disease models, do the I and S nullcline analysis separately and
then assemble.
Exercise 12.1.1
Exercise 12.1.2
Exercise 12.1.3
Exercise 12.1.4
Consider a trajectory that starts at a point on the positive I axis. Hence, I0 > 0 and
S0 = 0. It is easy to see that if we choose S(t) = 0 for all time t and I satisfying
I = −γ I
I (0) = I0
is trajectory. Since trajectories can not cross, we now know that a trajectory starting
in Quadrant 1 with biologically reasonable values of I0 > 0 and S0 > 0 must remain
on the right side of the I –S plane. Next, if we look at a trajectory which starts on
the positive S axis at the point S0 > 0 and I0 = 0, we see immediately that the
pair S(t) = S0 and I (t) = 0 for all time t satisfies the disease model by direct
calculation:
(S0 ) = 0 = −r S0 0.
In fact, any trajectory with starting point on the positive S axis just stays there. This
makes biological sense as since I0 is 0, there is no infection and hence no disease
dynamics at all. On the other hand, for the point I0 > 0 and S0 > γ/r , the algebraic
signs we see in Fig. 12.3 tell us the trajectory goes to the left and upwards until it
hits the line S = γ/r and then it decays downward toward the S axis. The trajectory
386 12 Disease Models
can’t hit the I axis as that would cross a trajectory, so it must head downward until
it hits the positive S axis. This intersection will be labeled as (S ∞ , 0) and it is easy
to see S ∞ < γ/r . At the point (S ∞ , 0), the trajectory will stop as both the I and S
derivatives become 0 there. Hence, we conclude we only need to look at trajectories
starting in Quadrant 1 with I0 > 0 as shown in Fig. 12.4.
12.2.1 Homework
For the following disease models, analyze the trajectories on the positive I and S
axis and show why this means disease trajectories that start in Q1+ stay there and
end on the positive S axis.
Exercise 12.2.1
Exercise 12.2.2
Exercise 12.2.3
Exercise 12.2.4
We know that biologically reasonable solutions occur with initial conditions starting
in Quadrant 1 and we know that our solutions satisfy S < 0 always with both S and
I positive until we hit the S axis. Let the time where we hit the S axis be given by t ∗ .
Then, we can manipulate the disease model as follows. For any t < t ∗ , we can divide
to obtain
Thus,
dI γ 1
= −1 +
dS r S
or integrating, we find
γ S(t)
I (t) − I0 = − S(t) − S0 + ln .
r S0
Dropping the dependence on time t for convenience of notation, we see in Eq. 12.5,
the functional dependence of I on S.
γ S
I = I0 + S0 − S ln . (12.5)
r S0
It is clear that this curve has a maximum at the critical value γ/r . This value is very
important in infectious disease modeling and we call it the infectious to susceptible
rate ρ. We can use ρ to introduce the idea of an epidemic.
For this model, we say the infection becomes an epidemic if the initial value of
susceptibles, S0 exceeds the critical infectious to susceptible ratio ρ = γr because
the number of infections increases to its maximum before it begins to drop. This
12.3 The I Versus S Curve 389
behavior is easy to interpret as an infection going out of control; i.e. it has entered
an epidemic phase.
12.4 Homework
We are now ready to do some exercises. For the following disease models
12.5.1 Homework
For the following disease models, do the single plot corresponding to an initial
condition that gives an epidemic and also draw a nice phase plane plot using
AutoPhasePlanePlot.
Exercise 12.5.1
Exercise 12.5.2
Exercise 12.5.3
R = γ I = γ (N − R − S).
12.6 Estimating Parameters 393
Further, we know
dS
dS −r S I
= dt
=
dR dR
dt
−γ I
S
=− .
ρ
Hence,
R = γ (N − R − S0 e−R/ρ ). (12.7)
This differential equation is not solvable directly, so we will try some estimates.
To estimate the solution to Eq. 12.7, we would like to replace the term e which makes
our integration untenable with a quadratic approximation like that of Eq. 3.5 from
Sect. 3.1.3. We would approximate around the point R = 0, giving
R R 1 R 2
Q = 1− +
ρ ρ 2 ρ
We need to see if this error is not too large. Recall the I and S solution satisfies
S
I + S = I0 + S0 + ρ ln .
S0
An epidemic would start with the number of removed individuals R0 being 0. Hence,
we know initially N = I0 + S0 and so since R = N − I − S, we have
S
N − R = N + ρ ln
S0
394 12 Disease Models
or
S S0
R = −ρ ln = ρ ln .
S0 S
We know S always decreases from its initial value of S0 , so the fraction S0 /S is larger
than one; hence, the logarithm is positive. We conclude
R S
− = ln < 1.
ρ S0
Now, let’s use this approximation in Eq. 3.5 to derive an approximation to R(t). The
approximate differential equation to solve is
R
R = γ N − R − S0 Q (12.8)
ρ
R 1 R 2
= γ N − R − S0 1 − + . (12.9)
ρ 2 ρ
This can be rewritten as follows (we will go through all the steps because it is
intense!):
S0 2 ρ2 S0 2 ρ2
R = γ N − S0 + −1 R − R2
2 ρ2 S0 ρ S0
S0 S0 2ρ 2
2 ρ2
= −γ R −
2
−1 R − N − S0
2 ρ2 ρ S0 S0
S0 S0 − ρ N − S0
= −γ R 2
− 2 ρ R − 2 ρ 2
.
2 ρ2 S0 S0
Now the next step is truly complicated. We complete the square on the quadratic.
This gives
2 2
S0 S0 − ρ S0 − ρ S0 − ρ
R = −γ R −
2
2ρ R+ ρ −
2
ρ2
2 ρ2 S0 S0 S0
N − S0
− 2 ρ2
S0
12.6 Estimating Parameters 395
2 2
S0 S0 − ρ S0 − ρ N − S0
= −γ R− ρ − ρ2 − 2 ρ2 .
2 ρ2 S0 S0 S0
and
S0 − ρ
β= ρ.
S0
Now we can go about the business of solving the differential equation. We will use a
new approach (rather than the integration by partial fraction decomposition we have
already used). After separating variables, we have
dR S0
= −γ dt.
(R − β)2 − α2 2 ρ2
Substitute u = R − β to obtain
du S0
= −γ dt.
u2 −α 2 2 ρ2
We will do these integrations using what are called hyperbolic trigonometric func-
tions. We make the following definitions:
e x − e−x
sinh(x) = , the hyperbolic sine,
2
e x + e−x
cosh(x) = , the hyperbolic cosine,
2
sinh(x)
tanh(x) = ,
cosh(x)
e x − e−x
= x , the hyperbolic tangent,
e + e−x
396 12 Disease Models
1
sech(x) = ,
cosh(x)
2
= x , the hyperbolic secant,
e + e−x
Note that these definitions are similar, but different, from the ones you are used to
with the standard trigonometric functions sin(x), cos(x) and so forth.
And then there are the derivatives:
Definition 12.6.2 (Hyperbolic Function Derivatives)
The hyperbolic functions are continuous and differentiable for all real x. We have
sinh(x) = cosh(x)
cosh(x) = sinh(x)
tanh(x) = sech 2 (x)
Now let’s go back to the differential equation we need to solve. Make the substitution
u = α tanh(z). Then, we have du = α sech 2 (z) dz. Making the substitution, we
find
α sech 2 (z) dz S0
= −γ dt.
α (tanh (z) − 1)
2 2 2 ρ2
α sech 2 (z) dz −1 S0
= dz = −γ dt.
−α sech (z)
2 2 α 2 ρ2
S0
dz = α γ dt.
2 ρ2
Integrating, we obtain
S0
z(t) − z(0) = α γ t.
2 ρ2
12.6 Estimating Parameters 397
Just like there is an inverse tangent for trigonometric functions, there is an inverse
for the hyperbolic tangent also.
Definition 12.6.3 (Inverse Hyperbolic Function)
It is straightforward to see that tanh(x) is always increasing and hence it has a nicely
defined inverse function. We call this inverse the inverse hyperbolic tangent function
and denote it by the symbol tanh−1 (x).
We can show using rather messy calculations the following sum and difference
formulae for tanh. We will be using these in a bit.
tanh(u) + tanh(v)
tanh(u + v) =
1 + tanh(u) tanh(v)
tanh(u) − tanh(v)
tanh(u − v) = .
1 − tanh(u) tanh(v)
We have
−1 R(t) − β −1 R0 − β S0
tanh − tanh = αγ t.
α α 2 ρ2
Now we want R as a function of t, so we have some algebra to suffer our way through.
Grab another cup of coffee as this is going to be a rocky ride!
R(t) − β β S0 R(t) − β β
+ = tanh α γ t 1+ .
α α 2 ρ2 α α
398 12 Disease Models
using the addition formula for tanh from Definition 12.6.3. We can then find the long
sought formula for R . It is
α2 γ S0 2 α γ S0
R (t) = sech t − φ
2ρ2 2ρ2
Assume we have collected data for the rate of change of R with respect to time during
an infectious incident. The general R model is of the form
R (t) = A sech 2 (a t − b)
for some choice of positive constants a, b and A. We fit our R data by choosing a,
b and A carefully using some technique (these sorts of tools would be discussed in
another class). We know
N − S0
α2 = β 2 + 2 ρ2 . (12.10)
S0
12.6 Estimating Parameters 399
The right hand side is known from our data fit and we can assume we have an estimate
of the total population N also. In addition, if we can estimate the initial susceptible
value S0 , we will have an estimate for the critical value ρ from our data:
2
1 S0 A
ρ =
2
1 − tanh (b) .
2
2 N − S0 a
It is a lot of work to generate the approximate value of R but the payoff is that
we obtain an estimate of the γ/r ratio which determines whether we have an
epidemic or not.
Chapter 13
A Cancer Model
1890 David von Hansermann noted cancer cells have abnormal cell division events.
1914 Theodore Boveri sees that something is wrong in the chromosomes of can-
cer cells: they are aneuploid. That is, they do not have the normal number of
chromosomes.
1916 Ernst Tyzzer first applied the term somatic mutation to cancer.
1927 Herman Muller discovered ionizing radiation which was known to cause can-
cer (i.e. was carcinogenic) was also able to cause genetic mutations (i.e. was
mutagenic).
1951 Herman Muller proposed cancer requires a single cell to receive multiple
mutations.
1950–1959 Mathematical modeling of cancer begins. It is based on statistics.
1971 Alfred Knudson proposes the concept of a Tumor Suppressor Gene or TSG.
The idea is that it takes a two point mutation to inactivate a TSG. TSG’s play a
central role in regulatory networks that determine the rate of cell cycling. Their
inactivation modifies regulatory networks and can lead to increased cell prolifer-
ation.
1986 A Retinoblastoma TSG is identified which is a gene involved in a childhood
eye cancer.
Since 1986, about 30 more TSG’s have been found. An important TSG is p53.
This is mutated in more than 50 % of all human cancers. This gene is at the center of a
control network that monitors genetic damage such as double stranded breaks (DSB)
of DNA. In a single stranded break (SSB), at some point the double stranded helix
of DNA breaks apart on one strand only. In a DSB, the DNA actually separates into
different pieces giving a complete gap. DSB’s are often due to ionizing radiation. If
a certain amount of damage is achieved, cell division is paused and the cell is given
time for repair. If there is too much damage, the cell will undergo apoptosis. In many
cancer cells, p53 is inactivated. This allows these cells to divide in the presence of
substantial genetic damage. In 1976, Michael Bishop and Harold Varmus introduced
the idea of oncogenes. These are another class of genes involved in cancer. These
genes increase cell proliferation if they are mutated or inappropriately expressed.
Now a given gene that occupies a certain position on a chromosome (this position is
called the locus of the gene) can have a number of alternate forms. These alternate
forms are called alleles. The number of alleles a gene has for an individual is called
that individual’s genotype for that gene. Note, the number of alleles a gene has is
therefore the number of viable DNA codings for that gene. We see then that mutations
of a TSG and an oncogene increase the net reproductive rate or somatic fitness of a
cell. Further, mutations in genetic instability genes also increase the mutation rate.
For example, mutations in mismatch repair genes lead to 50–100 fold increases in
point mutation rates. These usually occur in repetitive stretches of short sequences of
DNA. Such regions are called micro satellite regions of the genome. These regions are
used as genetic markers to track inheritance in families. They are short sequences of
nucleotides (i.e. ATCG) which are repeated over and over. Changes can occur such
as increasing or decreasing the number of repeats. This type of instability is thus
called a micro satellite or MIN instability. It is known that 15 % of colon cancer cells
have MIN.
13 A Cancer Model 403
Let’s look now at colon cancer itself. Look at Fig. 13.1. In this figure, you see a
typical colon crypt. Stem cells at the bottom of the crypt differentiate and move up
the walls of the crypt to the colon lining where they undergo apoptosis.
At the bottom of the crypt, a small number of stem cells slowly divide to produce
differentiated cells. These differentiated cells divide a few times while migrating
to the top of the crypt where they undergo apoptosis. This architecture means only
a small subset of cells are at risk of acquiring mutations that become fixed in the
permanent cell lineage. Many mutations that arise in the differentiated cells will be
removed by apoptosis.
Colon rectal cancer is thought to arise as follows. A mutation inactivates the
Adenomatous Polyposis Coli or APC TSG pathway. Ninety- five percent of col-
orectal cancer cells have this mutation with other mutations accounting for the other
5 %. The crypt in which the APC mutant cell arises becomes dyplastic; i.e. has
abnormal growth and produces a polyp. Large polyps seem to require additional
oncogene activation. Then 10–20 % of these large polyps progress to cancer.
Assumption 13.1.2 (Mutation rates û2 and û3 give selective advantage)
The events governed by û2 and û3 give what is called selective advantage. This
means that the size of the population size does matter.
Using these assumptions, we will model û2 and û3 like this:
û2 = N u2
u1 û2
Without CIN A+/+ A+/− A−/−
uc uc uc
u1 û3
With CIN A+/+ CIN A+/− CIN A−/− CIN
and
û3 = N u3 .
where u2 and u3 are neutral rates. We can thus redraw our figure as Fig. 13.3.
The mathematical model is then setup as follows. Let
X0 (t) is the probability a cell in cell type A+/+ at time t.
X1 (t) is the probability a cell in cell type A+/− at time t.
X2 (t) is the probability a cell in cell type A−/− at time t.
Y0 (t) is the probability a cell in cell type A+/+ CIN at time t.
Y1 (t) is the probability a cell in cell type A+/− CIN at time t.
Y2 (t) is the probability a cell in cell type A−/− CIN at time t.
Looking at Fig. 13.3, we can generate rate equations. First, let’s rewrite Fig. 13.3
using our variables as Fig. 13.4.
To generate the equations we need, note each box has arrows coming into it and
arrows coming out of it. The arrows in are growth terms for the net change of the
variable in the box and the arrows out are the decay or loss terms. We model growth
as exponential growth and loss as exponential decay. So X0 only has arrows going
out which tells us it only has loss terms. So we would say X0 loss = −u1 X0 − uc X0
which implies X0 = −(u1 + uc )X0 . Further, X1 has arrows going in and out which
tells us it has growth and loss terms. So we would say X1 loss = −Nu2 X1 − uc X1
and X1 growth = u1 X0 which implies X1 = u1 X0 − (Nu2 + uc )X1 . We can continue
u1 N u2
Without CIN A+/+ A+/− A−/−
uc uc uc
u1 N u3
A+/+ CIN A+/− CIN A−/− CIN
With CIN
Fig. 13.3 The pathways for the TSG allele losses rewritten using selective advantage
u1 N u2
Without CIN X0 X1 X2
uc uc uc
u1 N u3
Y0 Y1 Y2
With CIN
Fig. 13.4 The pathways for the TSG allele losses rewritten using mathematical variables
406 13 A Cancer Model
in this way to find all the model equations. We can then see the Cancer Model rate
equations are
X0 = −(u1 + uc ) X0 (13.1)
X1 = u1 X0 − (uc + N u2 ) X1 (13.2)
X2 = N u2 X1 − uc X2 (13.3)
Y0 = uc X0 − u1 Y0 (13.4)
Y1 = uc X1 + u1 Y0 − N u3 Y1 (13.5)
Y2 = N u3 Y1 + uc X2 (13.6)
Since our interest in these variables is over the typical lifetime of a human being, we
need to pick a maximum typical lifetime.
Assumption 13.2.1 (Average human lifetime)
The average human life span is 100 years. We also assume that cells divide once per
day and so a good choice of time unit is days. The final time for our model will be
denoted by T and hence
13.2 Model Assumptions 407
Next, recall our colonic crypt, N is from 1000 to 4000 cells. For estimation purposes,
we often think of N as the upper value, N = 4 × 103 .
u1 ≈ 10−7
u2 ≈ 10−7 .
We will assume the rate N u3 is quite rapid and so it is close to 1. We will set u3 as
follows:
Assumption 13.2.3 (Losing the second allele due to CIN is close to probability one)
We assume
N u3 ≈ 1 − r.
Hence, once a cell reaches the Y1 state, it will rapidly transition to the end state Y2 if
r is sufficiently small.
We are not yet sure how to set the magnitude of uc , but it certainly is at least u1 .
For convenience, we will assume
uc = R u1 .
where R is a number at least 1. For example, if uc = 10−5 , this would mean R = 100.
The mathematical model can be written in matrix vector form as usual. This gives
⎡ ⎤ ⎡ ⎤ ⎡ ⎤
X0 −(u1 + uc ) 0 0 0 0 0 X0
⎢X ⎥ ⎢ u −(u + Nu2 ) 0 0 0 0⎥ ⎢X1 ⎥
⎢ 1 ⎥ ⎢ 1 c ⎥ ⎢ ⎥
⎢X ⎥ ⎢ 0 Nu −uc 0 0 0⎥ ⎢X2 ⎥
⎢ 2 ⎥ = ⎢ 2 ⎥ ⎢ ⎥
⎢Y ⎥ ⎢ uc 0 0 −u1 0 0⎥ ⎢Y0 ⎥
⎢ 0 ⎥ ⎢ ⎥ ⎢ ⎥
⎣Y1 ⎦ ⎣ 0 uc 0 u1 −Nu3 0⎦ ⎣Y1 ⎦
Y2 0 0 uc 0 Nu3 0 Y2
408 13 A Cancer Model
⎡ ⎤ ⎡ ⎤
X0 (0) 1
⎢X1 (0)⎥ ⎢0⎥
⎢ ⎥ ⎢ ⎥
⎢X2 (0)⎥ ⎢0⎥
⎢ ⎥ ⎢ ⎥
⎢Y0 (0)⎥ = ⎢0⎥
⎢ ⎥ ⎢ ⎥
⎣Y1 (0)⎦ ⎣0⎦
Y2 (0) 0
We can find eigenvalues and associated eigenvectors for this system as we have done
for 2 × 2 models in the past. However, it is clear, we can solve the X0 , X1 system
directly and then use that solution to find X2 . Here are the details.
We will now solve the top pathway model exactly using the tools we have developed
in this course.
X0 −(u1 + uc ) 0 X0
=
X1 u1 −(uc + Nu2 ) X1
X0 (0) 1
=
X1 (0) 0
Hence, the two eigenvalues are r1 = −(u1 +uc ) and r2 = −(uc +Nu2 ). The associated
eigenvectors are straightforward to find:
1 0
E1 = u1 and E2 =
Nu2 −u1 1
Note, we don’t have to use all the machinery of eigenvalues and eigenvectors here.
It is clear that solving X0 is a simple integration.
The next solutions all use the integrating factor method. The next four variables all
satisfy models of the form. u (t) = −au(t) + f (t); u(t) = 0 where f is the external
data. Using the Integrating factor approach, we find
Hence,
t
u(t) = e−at eas f (s)ds.
0
u1
= e−(uc +u1 )t − e(uc +Nu2 )t
Nu2 − u1
Nu1 u2 1 −uc t
X2 (t) = e − e−(u1 +uc )t
Nu2 − u1 u1
1
− e−uc t − e−(uc +Nu2 )t
Nu2
The same integrating factor technique with Y0 = −u1 Y0 + uc X0 with Y (0) = 0 using
a = u1 and f (t) = uc X0 (t) leads to
t
Y0 (t) = e−u1 t eu1 s uc X0 (s) ds.
0
However, the solutions for the models Y1 and Y2 are very messy and so we will try
to see what is happening approximately.
To see how to approximate these solution, let’s recall ideas from Sect. 3.1 and apply
them to the approximation of the difference of two exponentials. Let’s look at the
function f (t) = e−rt − e−(r+a)t for positive r and a. To approximate this difference,
we expand each exponential function into the second order approximation plus the
error as usual.
t2 t3
e−rt = 1 − rt + r 2 − r 3 e−rc1
2 6
2
t t3
e−(r+a)t = 1 − (r + a)t + (r + a)2 − (r + a)3 e−(r+a)c2
2 6
for some c1 and c2 between 0 and t. Subtracting, we have
t2 t3
e−rt − e−(r+a)t = 1 − rt + r 2 − r 3 e−rc1
2 6
t2 t3
− 1 − (r + a)t + (r + a)2 (r + a)3 e−(r+a)c2
2 6
t2 t3
= at − (a2 + 2ar) + −r 3 e−rc1 + (r + a)3 e−(r+a)c2
2 6
13.5 Approximation Ideas 411
We conclude
t2 t3
e−rt − e−(r+a)t = at − (a2 + 2ar) + −r 3 e−rc1 + (r + a)3 e−(r+a)c2 (13.11)
2 6
We can also approximate the function g(t) = e−(r+a)t − e−(r+b)t for positive r, a and
b. Using a first order tangent line type approximation, we have
t2
e−(r+a)t = 1 − (r + a)t + (r + a)2 e−(r+a)c1
2
2
2 −(r+a)c2 t
e−(r+b)t = 1 − (r + b)t + (r + b) e
2
for some c1 and c2 between 0 and t. Subtracting, we find
t2
e−(r+a)t − e−(r+b)t = 1 − (r + a)t + (r + a)2 e−(r+a)c1
2
t2
− 1 − (r + b)t + (r + b)2 e−(r+b)c2
2
t2
= (−a + b)t + (r + a)2 e−(r+a)c1 − (r + b)2 e−(r+b)c2
2
We conclude
t2
e−(r+a)t − e−(r+b)t = (−a + b)t + (r + a)2 e−(r+a)c1 − (r + b)2 e−(r+b)c2 (13.12)
2
13.5.1 Example
Solution We know
t2 t3
e−rt − e−(r+a)t = at − (a2 + 2ar) + −r 3 e−rc1 + (r + a)3 e−(r+a)c2
2 6
t2
e−1.0t − e−1.1t ≈ 0.1t − (0.01 + 0.2) = 0.1t − 0.105t 2
2
and the error is on the order of t 3 which we write as O(t 3 ) where O stands for order.
t2 t3
e−rt − e−(r+a)t = at − (a2 + 2ar) + −r 3 e−rc1 + (r + a)3 e−(r+a)c2
2 6
t2
e−2.0t − e−2.1t ≈ 0.1t − (0.01 + 2(0.1)(2)) = 0.1t − 0.21t 2
2
t2
e−(r+a)t − e−(r+b)t = (−a + b)t + (r + a)2 e−(r+a)c1 − (r + b)2 e−(r+b)c2
2
Example 13.5.4 Approximate e−2.1t − e−2.2t − e−1.1t − e−1.2t using Eq. 13.12.
Solution We have
plus O(t 2 ) which is not very useful. Of course, if the numbers had been a little
different, we would have not gotten 0. If we instead approximate using Eq. 13.11 we
find
plus O(t 3 ) which is better. Note if the numbers are just right lots of stuff cancels!
13.5 Approximation Ideas 413
13.5.2 Homework
Exercise 13.5.6 Approximate e−1.1t − e−1.3t − e−0.7t − e−0.8t using Eqs. 13.11
and 13.12.
Exercise 13.5.7 Approximate e−2.2t − e−2.4t − e−1.8t − e−1.9t using Eqs. 13.11
and 13.12.
We will want to solve the Y0 –Y2 models in addition to solving the equations for the
top pathway. To do this, we can take advantage of some key approximations.
13.6.1 Approximating X0
We have
t2
X0 (t) = 1 − (u1 + uc )t + +(u1 + uc )2 e−(uc +u1 )c1
2
for some c1 between 0 and t. Hence, X0 (t) ≈ 1 − (u1 + uc )t with error E0 (t)
t2
E0 (t) = (u1 + uc )2 e−(uc +u1 )c1
2
T2
≤ (u1 + uc )2 .
2
414 13 A Cancer Model
We want this estimate for X0 be reasonable; i.e. first, give a positive number over the
human life time range and second, the discrepancy between the true X0 (t) and this
approximation is small. Hence, we will assume we want the maximum error in X0
to be 0.05, we which implies
or
R < 38.7
Since uc = R u1 , we se
13.6.2 Approximating X1
Recall
u1
X1 (t) = e−(uc +u1 )t − e−(uc +Nu2 )t
Nu2 − u1
Since X1 (t) is written as the difference of two exponentials, we can use a first order
approximation as discussed in Eq. 13.12, to find
u1
X1 (t) = (Nu2 − u1 )t + (uc + u1 )2 e−(uc +u1 )c1
Nu2 − u1
t2
− (uc + Nu2 )2 e−(uc +Nu2 )c2
2
u1 t2
= u1 t + (uc + u1 )2 e−(uc +u1 )c1 − (uc + Nu2 )2 e−(uc +Nu2 )c2 .
Nu2 − u1 2
13.6 Approximation of the Top Pathway 415
u1 t2
E1 = max (uc + u1 )2 e−(uc +u1 )c1 − (uc + Nu2 )2 e−(uc +Nu2 )c2
0≤t≤T Nu2 − u1 2
u1 T2
≤ (uc + u1 )2 + (uc + Nu2 )2 .
Nu2 − u1 2
We have already found the R < 39 and for N at most 4000 with u1 = u2 = 10−7 ,
we see Nu2 − u1 ≈ Nu2 . Further, uc = Ru1 ≤ 10−5 and Nu2 ≤ 4 × 10−4 so that in
the second term, Nu2 is dominant. We therefore have
u1 T2
E1 ≤ (1 + R)2 (u1 )2 + (Nu2 )2 .
Nu2 2
u1 T2
E1 ≈ (Nu2 )2
Nu2 2
2
T
= Nu1 u2 = 4000 × 10−14 × 6.67 × 108 = 0.027.
2
13.6.3 Approximating X2
Now here comes the messy part. Apply the second order difference of exponen-
tials approximation from Eq. 13.11 above to our X2 solution. To make our notation
somewhat more manageable, we will define the error term E(r, a, t) by
t3 t3
E(r, a, t) = −r 3 e−rc1 + (r + a)3 e−(r+a)c2 ≤ 2(r + a)3
6 6
Note the maximum error over human life time is thus E(r, a) which is
T3
E(r, a) = 2(r + a)3 . (13.13)
6
Now let’s try to find an approximation for X2 (t). We have
Nu1 u2 1 −uc t 1
X2 (t) = e − e−(u1 +uc )t − e−uc t − e−(uc +Nu2 )t
Nu2 − u1 u1 Nu2
Nu1 u2 1 t2
= u1 t − (u12 + 2u1 uc ) + E(uc , u1 , t)
Nu2 − u1 u1 2
416 13 A Cancer Model
Nu1 u2 1 t2
− Nu2 t − ((Nu2 )2 + 2Nu2 uc ) + E(uc , Nu2 , t)
Nu2 − u1 Nu2 2
Nu1 u2 t2 E(uc , u1 , t)
X2 (t) = t − (u1 + 2uc ) +
Nu2 − u1 2 u1
Nu1 u2 t2 E(uc , Nu2 , t)
− t − (Nu2 + 2uc ) +
Nu2 − u1 2 Nu2
Nu1 u2 t2 E(uc , u1 , t)
X2 (t) = t − (u1 + 2uc ) + −t
Nu2 − u1 2 u1
t2 E(uc , Nu2 , t)
+ (Nu2 + 2uc ) −
2 Nu2
Nu1 u2 t2 E(uc , u1 , t) E(uc , Nu2 , t)
= (Nu2 − u1 ) + −
Nu2 − u1 2 u1 Nu2
t2 Nu2 u1
= Nu1 u2 + E(uc , u1 , t) − E(uc , Nu2 , t).
2 Nu2 − u1 Nu2 − u1
2
Hence, we see X2 (t) ≈ Nu1 u2 t2 with maximum error E2 over human life time T
given by
Nu2 u1
E2 = max E(uc , u1 , t) − E(uc , Nu2 , t).
0≤t≤T Nu2 − u1 Nu2 − u1
Nu2 u1
≤ E(uc , u1 ) + E(uc , Nu2 )
Nu2 − u1 Nu2 − u1
Nu2 u1 T3
= 2(u1 + uc )3 + 2(uc + Nu2 )3
Nu2 − u1 Nu2 − u1 6
u1 T3
E2 ≈ 2uc3 + (Nu2 )3
Nu2 − u1 6
3
u 1 T
≈ 2uc3 + (Nu2 )3
Nu2 6
3
T
≈ 2uc3 + u1 (Nu2 )2
6
13.6 Approximation of the Top Pathway 417
Table 13.1 The Non CIN Pathway Approximations with error estimates
Approximation Maximum error
T2
X0 (t) ≈ 1 − (u1 + uc ) t (u1 + uc )2 2
u1 T2
X1 (t) ≈ u1 t Nu2 −u1 (uc + u1 )2 + (uc + Nu2 )2 2
t2 3
X2 (t) ≈ N u1 u2 2 2uc3 T6
But the term u1 (Nu2 )2 is very small (≈1.6e − 14) and can also be neglected. So, we
have
T3
E2 ≈ 2uc3
6
We summarize our approximation results for the top pathway in Table 13.1.
Y0 = uc X0 − u1 Y0 (13.17)
Y1 = uc X1 + u1 Y0 − N u3 Y1 (13.18)
Y2 = N u3 Y1 + uc X2 . (13.19)
Now, replace X0 , X1 and X2 in the CIN cell population dynamics equations to give
us the approximate equations to solve.
418 13 A Cancer Model
is our model of the difference. We know E(t, 0) = (u1 + uc )2 e−(u1 +uc )β t2 here, so
2
t2
(t) = ± uc (u1 + uc )2 , (0) = 0
2
The maximum error we can have due to this difference over human life time is then
found by integrating. We have
T3
(T ) = uc (u1 + uc )2 ,
6
which is about 0.0005. Hence, this contribution to the error is a bit small. Next, let’s
think about the other approximation error. After rearranging, the Y0 approximate
dynamics are
Y0 + u1 Y0 = uc − uc (u1 + uc ) t.
We solve this equation using the integrating factor method, with factor eu1 t . This
yields
Y0 (t) eu1 t = uc eu1 t − uc (u1 + uc ) t eu1 t .
To see what is going on here, we split this into two pieces as follows:
uc u1 uc + uc2
Y0 (t) = 1 − e−u1 t + 1 − ut t − e−u1 t . (13.23)
u1 u12
e−u1 t ≈ 1 − u1 t
t2
E(t, 0) = u12 e−u1 β
2
where β is some number between 0 and t. Then, as before, the largest possible error
over a human lifetime uses maximum time T = 3.65 × 104 giving
T2
|E(t, 0)| ≤ u12 .
2
Hence, we can rewrite Eq. 13.23 as
uc u1 uc + uc2
Y0 (t) = u1 t − E(t, 0) + − E(t, 0) (13.24)
u1 u12
2 u1 uc + uc2
= uc t − E(t, 0). (13.25)
u12
420 13 A Cancer Model
2 u1 uc + uc2 2 T 2 2 T
2
|Y0 (t) − uc t| ≤ u1 = (2 u1 uc + uc ) . (13.26)
u12 2 2
For our chosen value of R ≤ 39, we see the magnitude of the Y0 error is given by
E = (2 + R) R (6.67 × 10−6 ).
If we add the error due to replacing the true dynamics by the approximation, we find
the total error is about 0.01067 + 0.0005 = 0.0105.
We can do the error due to the replacement of the true solutions by their approxima-
tions here too, but the story is similar; the error is small. We will focus on how to
approximate Y1 using the approximate dynamics. The Y1 dynamics are
Y1 = uc X1 + u1 Y0 − N u3 Y1 .
Over the effective lifetime of a human being, we can use our approximations for X1
and Y0 to obtain the dynamics that are relevant. This yields
2 u1 uc
Y1 (t) = (N u3 t − 1) + e−N u3 t .
(N u3 )2
13.7 Approximating the CIN Pathway Solutions 421
2 u1 uc
Y1 (t) = N u3 t + (e−N u3 t − 1)
(N u3 )2
2 u1 uc 2 u1 uc −N u3 t
= t+ (e − 1).
N u3 (N u3 )2
E(t, 0) = −N u3 e−N u3 β t
for some β in [0, t]. As usual, this error is largest when β = 0 and t is the lifetime
of our model. Thus
|E(t, 0)| ≤ N u3 T .
Thus,
Y1 (t) − 2 u1 uc t ≤
2 u1 uc
(N u3 ) T
N u3 (N u3 )2
2 u1 uc
= T
N u3
2 u1 uc
|Y1 (t) − t| ≤ 2u1 uc T
N u3
≤ 2 (10−7 ) R (10−7 ) T ,
using our value for u1 and our model for uc . We already know to have reasonable
error in X0 we must have R < 39. Since our average human lifetime is T = 36,500
days, we see
2 u1 uc
|Y1 (t) − t| ≤ 2 (10−7 ) 39 (10−7 ) 3.65 (104 )
N u3
≤ 2.85 × 10−8 .
t2
Y2 = N u3 Y1 + uc N u1 u2 .
2
2 u1 uc t2
Y2 = N u3 t + uc N u1 u2
N u3 2
N u1 u2 uc 2
= 2 u1 uc t + t .
2
The integration (for a change!) is easy. We find
N u1 u2 uc 3
Y2 (t) = u1 uc t 2 + t .
6
Note that
Y2 (t) − u1 uc t 2 = N u1 u2 uc t 3 .
6
The error term is largest at the effective lifetime of a human being. Thus, we can say
(using our assumptions on the sizes of u1 , u2 and uc )
3
Y2 (t) − u1 uc t 2 ≤ (N u1 u2 uc ) T .
6
The magnitude estimate for Y2 can now be calculated. For R < 39, we have
We summarize our approximation results for the top pathway in Table 13.2.
13.8 Error Estimates 423
In Sect. 13.4, we solved for X0 , X1 and X2 exactly and then we developed their
approximations in Sect. 13.6. The approximations to X0 , X1 and X2 then let us develop
approximations to the CIN solutions. We have found that for reasonable errors in
X0 (t), we need R < 38.7. We were then able to calculate the error bounds for all the
variables. For convenience, since u1 and u2 are equal, let’s set them to be the common
value u. Then, by assumption uc = R u for some R ≥ 1. Our error estimates can then
be rewritten as seen in Table 13.3. This uses our assumptions that u1 = u2 = 10−7
and uc = Ru1 ≈ 4 × 10−6 .
The error magnitude estimates are summarized in Table 13.4.
Table 13.3 The Non CIN and CIN Model Approximations with error estimates using u1 = u2 = u
and uc = R u
Approximation Maximum error
X0 (t) ≈ 1 − (u1 + uc ) t 0.01
X1 (t) ≈ u1 t 0.027
t2
X2 (t) ≈ N u1 u2 2 0.0009
Y0 (t) ≈ uc t 0.0016
Y1 (t) ≈ 2Nu1uu3 c t 2.9 × 10−8
Y2 (t) ≈ u1 uc t 2 0.0013
Table 13.4 The Non CIN and CIN Model Approximations Dependence on population size N and
the CIN rate for R ≈ 39 with u1 = 10−7 and uc = R u1
Approximation Maximum error
X0 (t) ≈ 1 − (u1 + uc ) t (1 + R)2 6.67 × 10−6 < 0.01
u12 T2
X1 (t) ≈ u1 t N (1 + R)2 + (R + N)2 2 < 0.027
t2
X2 (t) ≈ N u1 u2 2 (3 N R + N−1
N
(N 2 + 1) e−RuT ) 8.11 × 10−9 < 0.0009
Y0 (t) ≈ uc t (2 + R) R 6.67 × 10−6 < 0.0016
Y1 (t) ≈ 2Nu1uu3 c t 2Ru12 T < 2.9 × 10−8
T3
Y2 (t) ≈ u1 uc t 2 RN u13 6 = RN 8.11 × 10−9 < 0.0013
424 13 A Cancer Model
We think of N as about 4000 for a colon cancer model, but the loss of two allele
model is equally valid for different population sizes. Hence, we can think of N as
a variable also. In the estimates we generated in Sects. 13.6 and 13.7, we used the
value 4000 which generated even larger error magnitudes than what we would have
gotten for smaller N. To see if the CIN pathway dominates, we can look at the ratio
of the Y2 output to the X2 output. The ratio of Y2 to X2 tells us how likely the loss of
both alleles is due to CIN or without CIN. We have, for R < 39, that
Y2 (T ) u1 uc T 2 + E(T )
=
X2 (T ) (1/2)Nu1 u2 T 2 + F(T )
where E(T ) and F(T ) are the errors associated with our approximations for X2 and
Y2 . We assume u1 = u2 and so we can rewrite this as
For N = 4000 and u + 1 = u2 = 10−7 and T = 36, 500 days, we find 2/Nu1 u2 T 2 =
37.5 and hence we have
Y2 (T ) (2R/N) + 37.5 E(T )
=
X2 (T ) 1 + 37.5 F(T )
Y2 (T ) (2R/N) + 0.0388
≈
X2 (T ) 1.0488
We can estimate how close this is to the ratio 2R/N and find Now E(T ) ≈ 0.00009
and F(T ) ≈ 0.0013 and so
(2R/N) + 0.04 2R 0.04 − 0.05(2R/N)
− ≈
1.05 N 1.05
Here R < 39 and N is at least 1000, so (2R/N) ≈ 0.08. Hence, the numerator is
about |0.04 − 0.0004| ≈ 0.04. Thus, we see the error we make in using (2R/N) as an
estimate for Y2 (T )/X2 (T ) is about 0.04 which is fairly small. Hence, we can be rea-
sonably confident that the critical ratio (2R)/N is the same as the ratio Y2 (T )/X2 (T )
as the error over human life time is small. Our analysis only works for R < 39 though
so we should be careful in applying it. Hence, we can say
Y2 (T ) 2R
≈ .
X2 (T N
13.9 When Is the CIN Pathway Dominant? 425
Table 13.5 The CIN decay rates, uc required for CIN dominance. with u1 = u2 = 10−7 and
uc = R u1 for R ≥ 1
N Permissible uc For CIN dominance R value
100 >5.0 × 10−6 50
170 >8.5 × 10−6 85
200 >1.0 × 10−5 100
500 >2.5 × 10−5 250
800 >4.0 × 10−5 400
1000 >5.0 × 10−5 500
2000 >10.0 × 10−5 1000
4000 >20.0 × 10−5 2000
The third column shows the R value needed for a good CIN dominance
Hence, the pathway to Y2 is the most important if 2 R > N. This implies the CIN
pathway is dominant if
N
R> . (13.27)
2
For the fixed value of u1 = 10−7 , we calculate in Table 13.5 possible uc values for
various choices of N. We have CIN dominance if R > N2 and the approximations are
valid if R < 39.
Our estimates have been based on a cell population size N of 1000–4000. We
also know for a good X0 (t) estimate we want R < 39. Hence, as long as uc <
3.9 × 10−6 , we have a good X0 (t) approximation and our other estimates are valid.
From Table 13.5, it is therefore clear we will not have the CIN pathway dominant.
We would have to drop the population size to 70 or so to find CIN dominance. Now
our model development was based on the loss of alleles in a TSG with two possible
alleles, but the mathematical model is equally valid in another setting other than the
colon cancer one. If the population size is smaller than what we see in our colon
cancer model, it is therefore possible for the CIN pathway to dominate! However,
in our cancer model, it is clear that the non CIN pathway dominates.
Note we can’t even do this ratio analysis unless we are confident that our approx-
imations to Y2 and X2 are reasonable. The only way we know if they are is to do a
careful analysis using the approximation tools developed in Sect. 3.1. The moral is
that we need to be very careful about how we use estimated solutions when we
try to do science! However, with caveats, it is clear that our simple model gives an
interesting inequality, 2uc > N u2 , which helps us understand when the CIN pathway
dominates the formation of cancer.
13.9.1 Homework
life time is T = 36,500 days. and finally, we have N, the total number of cells in the
population which is 1000–4000. Now let’s do some calculations using different T ,
N, u1 and R values. For the choices below,
• Find the approximate values of all variables at the given T .
• Interpret the number at T as the number of cells in the that state after T days. This
means take your values for say X2 (T ) and multiple by N to get the number of cells.
• Determine if the top or bottom pathway to cancer is dominant Recall, if our approx-
imations are good, the equation 2R/N > 1 implies the top pathway is dominant.
As you answer these questions, note that we can easily use the equation above for
R and N values for which our approximations are probably not good. Hence, it is
always important to know the guideline we use to answer our question can be used!
So for example, most of the work in the book suggests that R ≤ 39 or so, so when
we use these equations for R = 110 etc. we can’t be sure our approximations are
accurate to let us do that!
Exercise 13.9.1 R = 32, N = 1500, u1 = u2 = 6.3 × 10−8 and T = 100 years but
express in days.
Exercise 13.9.2 R = 70, N = 4000, u1 = u2 = 8.3 × 10−8 and T = 100 years but
express in days.
Let’s look at how we might solve a cancer model using Matlab. Our first attempt
might be something like this. First, we set up the dynamics as
But to make this work, we also must initialize all of our parameters before we try to
use the function. We have annotated the code lightly as most of it is pretty common
13.10 A Little Matlab 427
place to us now. We set the final time, as usual, to be human lifetime of 100 years
or 36,500 days. Our mutation rate u1 = 1.0 × 10−7 is in units of base pairs/day and
this requires many steps. If we set the step size to 0.5 as we do here, 73,000 steps
are required and even on a good laptop it will take a long time. Let’s do the model
for a value of R = 80 which is much higher than we can handle with our estimates!
The plot of the number of cells in the A−− and A−−CIN state is seen in Fig. 13.5.
428 13 A Cancer Model
Fig. 13.5 The number of cells in A−− and A−−CIN state versus time
If we could use our theory, it would tell us that since 2R/N = 160/1000 = 0.16, the
top pathway to cancer is dominant. Our numerical results give us
so the numerical results, while giving a Y 2(T )/X2(T ) different from 2R/N still
predict the top pathway to cancer is dominant. So it seems our general rule that
2R/N < 1 which we derived using complicated approximation machinery is working
well. We can change our time units to years to cut down on the number of steps we
need to take. To do this, we convert base pairs/day to base pairs/year by multiplying
u1 by 365. The new code is then
13.10 A Little Matlab 429
Listing 13.4: Switching to time units to years: step size is one half year
u1 = 1 . 0 e −7∗365;
2 u2 = u1 ;
N = 1000;
R = 80;
uc = R∗u1 ;
r = 1 . 0 e −4;
7 u3 = (1− r ) /N;
f = @( t , x ) [ −( u1+uc ) ∗x ( 1 ) ; . . .
u1∗x ( 1 ) −(uc+N∗u2 ) ∗x ( 2 ) ; . . .
N∗u2∗x ( 2 )−uc ∗x ( 3 ) ; . . .
uc ∗x ( 1 )−u1∗x ( 4 ) ; . . .
12 uc ∗x ( 2 ) +u1∗x ( 4 )−N∗u3∗x ( 5 ) ; . . .
N∗u3∗x ( 5 ) +uc ∗x ( 3 ) ] ;
T = 100;
h = .5;
M = c e i l (T / h ) ;
17 [ htime , rk , f r k ] = FixedRK ( f , 0 , [ 1 ; 0 ; 0 ; 0 ; 0 ; 0 ] , h , 4 ,M) ;
X0 = rk ( 1 , : ) ;
X1 = rk ( 2 , : ) ;
X2 = rk ( 3 , : ) ;
Y0 = rk ( 4 , : ) ;
22 Y1 = rk ( 5 , : ) ;
Y2 = rk ( 6 , : ) ;
[ rows , c o l s ] = s i z e ( rk )
N∗Y2 ( c o l s )
N∗X2 ( c o l s )
27 % Find Y2 ( T ) / X2 ( T )
Y2 ( c o l s ) / X2 ( c o l s )
2∗R /N
Y2 ( c o l s )
9 . 0 2 8 3 e −04
32 X2 ( c o l s )
0.0020
Y2 ( c o l s ) / X2 ( c o l s )
0.4548
We get essentially the same results using a time unit of years rather than days! The
example above uses a time step of h = 0.5 which is a half year step. We can do
equally well using a step size of h = 1 which is a year step. We can do it again using
h = 1 (i.e. the step is one year now)
Listing 13.5: Switching the step size to one year
h = 1 ; M = c e i l (T / h ) ; %ans = 100
[ htime , rk , f r k ] = FixedRK ( f , 0 , [ 1 ; 0 ; 0 ; 0 ; 0 ; 0 ] , h , 4 ,M) ;
X0 = rk ( 1 , : ) ; X1 = rk ( 2 , : ) ; X2 = rk ( 3 , : ) ;
Y0 = rk ( 4 , : ) ; Y1 = rk ( 5 , : ) ; Y2 = rk ( 6 , : ) ;
5 [ rows , c o l s ] = s i z e ( rk ) ;
Y2 ( c o l s ) / X2 ( c o l s )
ans = 0 . 4 5 2 9 1
2∗R /N
ans = 0 . 1 6 0 0 0
10 Y2 ( c o l s )
ans = 8 . 9 4 3 5 e −04
N∗Y2 ( c o l s )
ans = 0 . 8 9 4 3 5
X2 ( c o l s )
15 ans = 0 . 0 0 1 9 7 4 7
N∗X2 ( c o l s )
ans = 1 . 9 7 4 7
13.10.1 Homework
We have had to work hard to develop some insight into the relative dominance of
the CIN pathway in this two allele cancer model. It has been important to solve our
model for arbitrary parameters u1 , u2 , u3 , uc and N. This need to do everything in
terms of parameters treated as variables complicated much of our analysis. Could
we do it another way? Well, we could perform a parametric study. We could solve
the model for say 10 choices of each of the 5 parameters using MatLab for the
values X2 (T ) and Y2 (T ) which, of course, depend on the values of u1 , u2 , u3 , uc
and N used in the computation. Hence, we should label this final time values as
X2 (T , u1 , u2 , u3 , uc , N) and Y2 (T , u1 , u2 , u3 , uc , N) to denote this dependence. After
105 separate MatLab computations, we would then have a data set consisting of 105
values of the variables X2 (T , u1 , u2 , u3 , uc , N) and Y2 (T , u1 , u2 , u3 , uc , N). We could
then try to do statistical modeling to see if we could tease out the CIN dominance
relation 2 uc > N u2 . But, choosing only 10 values for each parameter might be too
coarse for our purposes. If we choose 100 values for each parameter, we would have
to do 1010 computations to develop an immense table of 1010 entries of X2 (T ) and
Y2 (T ) values! You should be able to see that the theoretical approach we have taken
here, while hard to work through, has some benefits!
13.11 Insight Is Difficult to Achieve 431
With all the we have said so far, look at the following exposition of our cancer
model; you might read the following in a research report. Over the typical lifetime
of a human being, the variables in our model have functional dependencies on time
(denoted by the symbol ∼) given as follows:
X0 (t) ∼ 1 − (u1 + uc ) t
X1 (t) ∼ u1 t
t2
X2 (t) ∼ N u1 u2
2
Y0 (t) ∼ uc t
2 u1 uc
Y1 (t) ∼ t
N u3
Y2 (t) ∼ u1 uc t 2 .
It is then noted that the ratio Y2 to X2 gives interesting information about the domi-
nance of the CIN pathway. CIN dominance requires the ratio exceeds one giving us
the fundamental inequality
2 uc
> 1.
N u2
Note there is no mention in this derivation about the approximation error magnitudes
that must be maintained for the ratio tool to be valid! So because no details are
presented, perhaps we should be wary of accepting this model! if we used it without
being sure it was valid to make decisions, we could be quite wrong.
References
M. Novak, Evolutionary Dynamics: Exploring the Equations of Life (Belknap Press, Cambridge,
2006)
J. Peterson, Calculus for Cognitive Scientists: Derivatives, Integration and Modeling, Springer
Series on Cognitive Science and Technology (Springer Science+Business Media Singapore Pte
Ltd, Singapore, 2015 In press)
B. Ribba, T. Colin, S. Schnell, A multiscale mathematical model of cancer, and its use in analyzing
irradiation therapies. Theor. Biol. Med. Model. 3, 1–19 (2006)
Y. Tatabe, S. Tavare, D. Shibata, Investigating stem cells in human colon using methylation patterns.
Proc. Natl. Acad. Sci. 98, 10839–10844 (2001)
Part V
Nonlinear Systems Again
Chapter 14
Nonlinear Differential Equations
We are now ready to solve nonlinear systems of nonlinear differential equations using
our new tools. Our new tools will include
1. The use of linearization of nonlinear ordinary differential equation systems to gain
insight into their long term behavior. This requires the use of partial derivatives.
2. More extended qualitative graphical methods.
Recall, this means we are using the ideas of approximating functions of two variables
by tangent planes which results in tangent plane error. This error is written in terms
of Hessian like terms. The difference now is that x has its dynamics f and y has
its dynamics g and so we have to combine tangent plane approximations to both f
and g into one Hessian like term to estimate the error. So the discussions below are
similar to the ones in the past, but a bit different.
x = f (x, y)
y = g(x, y)
x(0) = x0
y(0) = y0
where both f and g possess continuous partial derivatives up to the second order.
For any fixed x and y, define the function h by
f (x0 + tx, y0 + ty)
h(t) =
g(x0 + tx, y0 + ty)
t2
h(t) = h(0) + h (0)t + h (c) .
2
Using the chain rule, we find
f (x + tx, y0 + ty)tx + f y (x0 + tx, y0 + ty)ty
h (t) = x 0
gx (x0 + tx, y0 + ty)tx + g y (x0 + tx, y0 + ty)ty
or
⎡ ⎤
T tx
⎢ tx ty H f (x0 + tx, y0 + ty) ty ⎥
⎢ ⎥
h (t) = ⎢
⎢
⎥
⎥
⎣ T tx ⎦
tx ty Hg (x0 + tx, y0 + ty)
ty
1
h(1) = h(0) + h (0)(1 − 0) + h (c)
2
for some c between 0 and 1, we have
⎡ ⎤
f (x0 + x, y0 + y)
⎣ ⎦
g(x0 + x, y0 + y)
⎡ T ⎤
1 x ∗ ∗ x
⎢ f (x0 , y0 ) + f x (x0 , y0 )x + f y (x0 , y0 )y + 2 y H f (x , y )
y ⎥
⎢ ⎥
=⎢⎢
⎥
⎥
⎣ T ⎦
1 x ∗ ∗ x
g(x0 , y0 ) + gx (x0 , y0 )x + g y (x0 , y0 )y + 2 Hg (x , y )
y y
where the error terms E f (x ∗ , y ∗ ) and Eg (x ∗ , y ∗ ) are the usual Hessian based terms
∗ ∗ 1 x T ∗ ∗ x
E f (x , y ) = H f (x , y )
2 y y
T
1 x x
Eg (x ∗ , y ∗ ) = Hg (x ∗ , y ∗ )
2 y y
With sufficient knowledge of the Hessian terms, we can have some understanding
of how much error we make but, of course, it is difficult in interesting problems to
make this very exact. Roughly speaking though, for our functions with continuous
second order partial derivatives, in any closed circle of radius R around the base
point (x0 , y0 ), there is a constant B R so that
1 1
E f (x ∗ , y ∗ ) ≤ B R (|x| + |y|)2 and E f (x ∗ , y ∗ ) ≤ B R (|x| + |y|)2 .
2 2
It is clear that the simplest approximation arises at those special points (x0 , y0 ) where
both f (x0 , y0 ) = 0 and g(x0 , y0 ) = 0. The points are called equilibrium points of
the model. At such points, we can a linear approximation to the true dynamics of
this form
x = f x (x0 , y0 )x + f y (x0 , y0 )y
y = gx (x0 , y0 )x + g y (x0 , y0 )y
and this linearization of the model is going to give us trajectories that are close to the
true trajectories when are deviations x and y are small enough! We can rewrite
our linearizations again using x − x0 = x and y − y0 = y as follows
x f x (x0 , y0 ) f y (x0 , y0 ) x − x0
= .
y gx (x0 , y0 ) g y (x0 , y0 ) y − y0
The special matrix of first order partials of f and g is called the Jacobian of our
model and is denoted by J(x, y). Hence, in general
f (x, y) f y (x, y)
J(x, y) = x
gx (x, y) g y (x, y)
438 14 Nonlinear Differential Equations
so that the linearization of the model at the equilibrium point (x0 , y0 ) has the form
x x − x0
= J(x0 , y0 )
y y − y0
as J(x0 , y0 ) is simply as 2 × 2 real matrix which will have real distinct, repeated
or complex conjugate pair eigenvalues which we know how to deal with after our
discussions in Chap. 8. We are now ready to study interesting nonlinear models using
the tools of linearization.
which has multiple equilibrium points. Clearly, we need to stay away from the line
x = −1 for initial conditions as there the dynamics themselves are not defined!
However, we can analyze at other places. We define the nonlinear function f and g
then by
2x y
f (x, y) = (1 − x)x −
1+x
y
g(x, y) = 1 − y
1+x
We need to discard x = −1 as the dynamics are not defined there. So the last equi-
librium point is at (1, 0). We can then use MatLab to find the Jacobians and the
associated eigenvalues and eigenvectors are each equilibrium point. We encode the
Jacobian
1 − 2x − 2 (1+x)
y
2 −2 (1+x)
x
J (x, y) = y2
(1+x)2 )
1 − 2 1+x
y
We will use MatLab to help with our analysis of these equilibrium points. Here are
the MatLab sessions for all the equilibrium points. The MatLab command eig is used
to find the eigenvalues and eigenvectors of a matrix.
Equilibrium Point (0,0) For the first equilibrium point (0, 0), we find the Jacobian
at (0, 0) and the associated eigenvalues and eigenvectors with the following MatLab
commands.
Hence, there is a repeated eigenvalue, r = 1 but there are two different eigenvectors:
1 0
E1 = , E2 =
0 1
440 14 Nonlinear Differential Equations
where a and b are arbitrary. Hence, trajectories move away from the origin locally.
Recall, the local linear system is
x (t) 10 x (t)
=
y (t) 01 y (t)
which is the same as the local variable system using the change of variables u = x
and v = y.
u (t) 1 0 u (t)
=
v (t) 0 1 v (t)
Equilibrium Point (1,0) For the second equilibrium point (1, 0), we find the Jacobian
at (1, 0) and the associated eigenvalues and eigenvectors in a similar way.
where a and b are arbitrary. Hence, trajectories move away from (1, 0) locally for
all trajectories except those that start on E 1 . Recall, the local linear system is
x (t) −1 −1 x (t) − 1
=
y (t) 0 1 y (t)
Equilibrium Point (0,1) For the third equilibrium point (0, 1), we again find the
Jacobian at (0, 1) and the associated eigenvalues and eigenvectors in a similar way.
Now there is again a repeated eigenvalue, r1 = −1. If you look at the D2 matrix,
you see both the columns are the same. In this case, MatLab does not give us useful
information. We can use the first column as our eigenvector E 1 , but we still must
find the other vector F.
Recall, the local linear system is
x (t) −1 0 x (t)
=
y (t) 1 −1 y (t) − 1
Recall, the general solution to a model with a repeated eigenvalue with only one
eigenvector is given by
442 14 Nonlinear Differential Equations
x (t)
= a E 1 e−t + b F e−t + E 1 t e−t
y (t)
where a and b are arbitrary. Hence, trajectories move toward from (0, 1) locally.
Piecing together the global behavior from the local trajectories is difficult, so it is
helpful to write scripts in MatLab to help us. We can use the AutoPhasePlanePlot()
function from before, but this time we use different dynamics. The dynamics are
stored in the file autonomousfunc.m which encodes the right hand side of the model
2x y
x = (1 − x)x −
1+x
y
y = 1 − y
1+x
Then, to generate a nice phase plane portrait, we try a variety of [xmin, xmax] ×
[ymin, ymax] initial condition boxes until it looks right! Here, we avoid any initial
conditions that have negative values as for those the trajectories go off to infinity and
the plots are not manageable. We show the plot in Fig. 14.1.
You should play with this function. You’ll see it involves a lot of trial and error. Any
box [xmin, xmax] × [ymin, ymax] which includes trajectories whose x or y values
increase exponentially causes the overall plot’s x and y ranges to be skewed toward
those large numbers. This causes a huge loss in resolution on all other trajectories!
So it takes time and a bit of skill to generate a nice collection of phase plane plots!
We can find the equilibrium points using the root finding methods called bisection
and Newton’s method.
We need a simple function to find the root of a nice function f of the real variable
x using what is called bisection. The method is actually quite simple. We know that
if f is a continuous function on the finite interval [a, b] then f must have a zero
inside the interval [a, b] if f has a different algebraic sign at the endpoints a and b.
This means the product f (a) f (b) is not zero. So we assume we can find an interval
[a, b] on which this change in sign satisfies f (a) f (b) ≤ 0 (which we can do by
switching to − f if we have to!) and then if we divide the interval [a, b] into two
equal pieces [a, m] and [m, b], f (m) can’t have the same sign as both f (a) and f (b)
because of the assumed sign difference. So at least one of the two halves has a sign
change.
Note that if f (a) and f (b) was zero then we still have f (a) f (b) ≤ 0 and either a
or b could be our chosen root and either half interval works fine. If only one of the
444 14 Nonlinear Differential Equations
endpoint function values is zero, then the bisection of [a, b] into the two halves still
finds the one half interval that has the root.
So our prototyping Matlab code should use tests like f (x) f (y) ≤ 0 rather than
f (x) f (y) < 0 to make sure we catch the root.
Simple Bisection MatLab Code Here is a simple Matlab function to perform the
Bisection routine.
We should look at some of these lines more closely. First, to use this routine, we need
to write a function definition for the function we want to apply bisection to. We will
do this in a file called func.m (Inspired Name, eh?) An example would be the one
we wrote for the function
x
f (x) = tan − 1;
4
which is coded in Matlab by
14.1 Linear Approximations to Nonlinear Models 445
So to apply bisection to this function on the interval [2, 4] with a stopping tolerance
of say 10−4 , in Matlab, we would type the command
Note that the name of our supplied function, the uninspired choice func, is passed in
as the first argument in single quotes as it is a string. Also, in the Bisection routine,
we have added the code to print out what is happening at each iteration of the while
loop. Matlab handles prints to the screen a little funny, so do set up a table of printed
values we use this syntax:
Running the Code As mentioned above, we will test this code on the function
x
f (x) = tan − 1;
4
on the interval [2, 4] with a stopping tolerance of δ = 10−6 . Our function has been
written as the Matlab function func supplied in the file func.m. The Matlab run
time looks like this:
446 14 Nonlinear Differential Equations
Homework Well, you have to practice this stuff to see what is going on. So here are
two problems to sink your teeth into!
Exercise 14.1.1 Use bisection to find the first five positive solutions of the equation
x = tan(x). You can see where this is roughly by graphing tan(x) and x simultane-
ously. Do this for tolerances {10−1 , 10−2 , 10−3 , 10−4 , 10−5 , 10−6 , 10−7 }. For each
root, choose a reasonable bracketing interval [a, b], explain why you chose it, pro-
vide a table of the number of iterations to achieve the accuracy and a graph of this
number versus accuracy.
Exercise 14.1.2 Use the Bisection Method to find the largest real root of the function
f (x) = x 6 − x − 1. Do this for tolerances {10−1 , 10−2 , 10−3 , 10−4 , 10−5 , 10−6 ,
10−7 }. Choose a reasonable bracketing interval [a, b], explain why you chose it,
provide a table of the number of iterations to achieve the accuracy and a graph of
this number versus accuracy.
Newton’s method is based on the tangent line and rapidly converges to a zero of the
function f if the original guess is reasonable. Of course, that is the problem. A bad
initial guess is a great way to generate random numbers! So usually, we find a good
14.1 Linear Approximations to Nonlinear Models 447
interval where the root might reside by first using bisection. The following code uses
a simple test to see which we should do in our zero finding routine.
Then, once the bisection steps have given us an interval where the root might be,
we switch to Newton’s method. This takes the current guess, say x1 , and finds the
tangent line, T (x), to the function at that point. This gives
If we find the value of x where the tangent line crosses the x axis, this becomes our
next guess x1 . We find
f (x0 )
x1 = x0 − .
f (x0 )
In essence this is Newton’s Method which can be rephrased for the scalar function
case as
f (xn )
xn+1 = x N − .
f (xn )
and it is clear the method fails if f (xn ) = 0 or is close to 0 at any iteration. MatLab
code to implement this method is given next.
A Global Newton Method The code for a global Newton method is pretty straight-
forward. Here is the listing.
448 14 Nonlinear Differential Equations
function [x, fx, nEvals, aF, bF] = GlobalNewton(fName, fpName, a, b, tolx, tolf, nEvalsMax)
% initialize the bracket endpoints and the first iterate
fa = feval(fName, a);
fb = feval(fName, b);
x  = a;
fx = feval(fName, x);
fpx = feval(fpName, x);
nEvals = 1;
k = 1;
disp(' ')
disp('Step      |   k    |    a(k)      |    x(k)      |    b(k)')
disp(sprintf('Start     | %6d | %12.7f | %12.7f | %12.7f', k, a, x, b));
while ( (abs(a-b) > tolx) && (abs(fx) > tolf) && (nEvals < nEvalsMax) ) || (nEvals == 1)
  % [a,b] brackets a root and x = a or x = b
  check = StepIsIn(x, fx, fpx, a, b);
  if check
    % Take Newton Step
    x = x - fx/fpx;
  else
    % Take a Bisection Step
    x = (a+b)/2;
  end
  fx  = feval(fName, x);
  fpx = feval(fpName, x);
  nEvals = nEvals + 1;
  if fa*fx <= 0
    % there is a root in [a,x]. Use right endpoint.
    b = x;
    fb = fx;
  else
    % there is a root in [x,b]. Bring in left endpoint.
    a = x;
    fa = fx;
  end
  k = k + 1;
  if (check)
    disp(sprintf('Newton    | %6d | %12.7f | %12.7f | %12.7f', k, a, x, b));
  else
    disp(sprintf('Bisection | %6d | %12.7f | %12.7f | %12.7f', k, a, x, b));
  end
end
aF = a;
bF = b;
end
A Run Time Example We will apply our global Newton method root finding code
to a simple example: find a root for f(x) = sin(x) in the interval [−7π/2, 15π + 0.1].
We code the function and its derivative in two simple Matlab files; f1.m and f1p.m.
These are
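A minimal f1.m consistent with the derivative file f1p.m shown next would be

function y = f1(x)
  y = sin(x);
end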
and
Listing 14.15: Global Newton Function Derivative
function y = f1p(x)
  y = cos(x);
end
To run this code on this example, we would then type a phrase like the one below:
Listing 14.16: Global Newton Sample
[x, fx, nEvals, aLast, bLast] = GlobalNewton('f1', 'f1p', -7*pi/2, ...
      15*pi+.1, 10^-6, 10^-8, 200)
Homework
Exercise 14.1.3 Use the Global Newton Method to find the first five positive solu-
tions of the equation x = tan(x). You can see where this is roughly by graphing
tan(x) and x simultaneously. Do this for tolerances {10^−1, 10^−2, 10^−3, 10^−4, 10^−5,
10^−6, 10^−7}. For each root, choose a reasonable bracketing interval [a, b], explain
why you chose it, provide a table of the number of iterations to achieve the accuracy
and a graph of this number versus accuracy.
Exercise 14.1.4 Use the Global Newton Method to find the largest real root of the
function f(x) = x⁶ − x − 1. Do this for tolerances {10^−1, 10^−2, 10^−3, 10^−4, 10^−5,
10^−6, 10^−7}. Choose a reasonable bracketing interval [a, b], explain why you chose
it, provide a table of the number of iterations to achieve the accuracy and a graph of
this number versus accuracy.
We can also choose to replace the derivative function for f with a finite difference
approximation. We will use either
f′(x) ≈ ( f(x_c + δ_c) − f(x_c) ) / δ_c
or
f′(x) ≈ ( f(x_c) − f(x_−) ) / (x_c − x_−)
where x_− is the previous iterate from our routine. The Matlab fragment we need is
then:
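A fragment along these lines, mirroring the update used inside the finite difference listing below (the variables fName and x live inside the solver), is

delta = sqrt(eps)*abs(x);           % finite difference step size, discussed below
fx    = feval(fName, x);
fpval = feval(fName, x + delta);
fpx   = (fpval - fx)/delta;         % forward difference approximation to f'(x)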
We add the finite difference routines into our Global Newton’s Method as follows:
function [x, fx, nEvals, aF, bF] = GlobalNewtonFD(fName, a, b, tolx, tolf, nEvalsMax)
% initialize the bracket endpoints, the first iterate and the finite difference derivative
fa = feval(fName, a);
fb = feval(fName, b);
x  = a;
fx = feval(fName, x);
delta = sqrt(eps)*abs(x);      % finite difference step size
fpval = feval(fName, x + delta);
fpx   = (fpval - fx)/delta;
nEvals = 1;
k = 1;
%disp(' ')
%disp('Step      |   k    |    a(k)      |    x(k)      |    b(k)')
%disp(sprintf('Start     | %6d | %12.7f | %12.7f | %12.7f', k, a, x, b));
while ( (abs(a-b) > tolx) && (abs(fx) > tolf) && (nEvals < nEvalsMax) ) || (nEvals == 1)
  % [a,b] brackets a root and x = a or x = b
  check = StepIsIn(x, fx, fpx, a, b);
  if check
    % Take Newton Step
    x = x - fx/fpx;
  else
    % Take a Bisection Step
    x = (a+b)/2;
  end
  fx    = feval(fName, x);
  fpval = feval(fName, x + delta);
  fpx   = (fpval - fx)/delta;
  nEvals = nEvals + 1;
  if fa*fx <= 0
    % there is a root in [a,x]. Use right endpoint.
    b = x;
    fb = fx;
  else
    % there is a root in [x,b]. Bring in left endpoint.
    a = x;
    fa = fx;
  end
  k = k + 1;
  %if (check)
  %  disp(sprintf('Newton    | %6d | %12.7f | %12.7f | %12.7f', k, a, x, b));
  %else
  %  disp(sprintf('Bisection | %6d | %12.7f | %12.7f | %12.7f', k, a, x, b));
  %end
end
aF = a;
bF = b;
end
Note, for our finite difference step size we use δ = sqrt(ε_machine) |x|, where ε_machine is machine precision (eps in Matlab/Octave).
A Run Time Example We will apply our finite difference global Newton method
root finding code to the same simple example: find a root for f(x) = sin(x) in the
interval [−7π/2, 15π + 0.1]. We only need the code for the function now which is as
usual in the file f1.m.
To run this code on this example, we would then type a phrase like the one below:
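The original call is not listed; assuming GlobalNewtonFD takes the same arguments as GlobalNewton with the derivative file name dropped, a session might look like

[x, fx, nEvals, aLast, bLast] = GlobalNewtonFD('f1', -7*pi/2, ...
      15*pi+.1, 10^-6, 10^-8, 200)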
Homework
Exercise 14.1.5 Use the Finite Difference Global Newton Method to find the second
positive solution of the equation x = tan(x). Do this for the tolerance 10^−8. This time
alter the GlobalNewtonFD code to allow the finite difference step size delta to be
a parameter and do a parametric study on the effects of delta. Note that the code
now uses the reasonable choice of δ = sqrt(ε_machine) |x|, but you need to use the additional δ
choices {10^−4, 10^−6, 10^−8, 10^−10}. This will give you five δ choices. Provide a table
and a graph of δ versus accuracy of the root approximation.
Exercise 14.1.6 Use the Finite Difference Global Newton Method to find the largest
real root of the function f(x) = x⁶ − x − 1. Do this for the tolerance 10^−8. Again
use the altered GlobalNewtonFD code with the finite difference step size delta as a
parameter and do a parametric study on the effects of delta. Note that the code
now uses the reasonable choice of δ = sqrt(ε_machine) |x|, but you need to use the additional δ
choices {10^−4, 10^−6, 10^−8, 10^−10}. This will give you five δ choices. Provide a table
and a graph of δ versus accuracy of the root approximation.
Exercise 14.1.7 Do the same thing for the problems above, but replace the Finite
Difference Global Newton Code with a Secant Global Newton Code. This will only
require a few lines of code to change really, so don’t freak out!
Consider the nonlinear system
x′ = 0.5 (−h(x) + y)
y′ = 0.2 (−x − 1.5y + 1.2)
for a given nonlinear function h(x). This is a model, called the trigger model, of how
an electrical component called a diode behaves; the details are not really important
as we are just investigating how to use our code and theoretical ideas. The equilibrium
points are the solutions to the simultaneous equations
0.5 (−h(x) + y) = 0 and 0.2 (−x − 1.5y + 1.2) = 0,
that is, y = h(x) and x + 1.5y = 1.2.
We can see these solutions graphically by plotting the two curves simultaneously
and using the cursor to locate the roots and read off the (x, y) values from the plot.
This is not quite accurate so a better way is to find the roots numerically. The plot
which shows the equilibrium points graphically is shown in Fig. 14.2.
We do this with the following MatLab/Octave session.
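The session is not shown here, but a minimal sketch of the idea, assuming the diode nonlinearity h(x) is available as a function handle (its formula is not repeated), uses the two nullclines y = h(x) and x + 1.5y = 1.2 together with the built-in root finder fzero:

function [xs, ys] = TriggerEquilibria(h)
  % equilibria satisfy y = h(x) and x + 1.5*y = 1.2, so find the zeros of g below
  g = @(x) h(x) - (1.2 - x)/1.5;
  guesses = [0.05, 0.30, 0.90];                 % rough locations read from Fig. 14.2
  xs = arrayfun(@(x0) fzero(g, x0), guesses);
  ys = arrayfun(h, xs);
end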
Q 1 = (0.062695, 0.758672)
Q 2 = (0.28537, 0.60975)
Q 3 = (0.88443, 0.21038)
The Jacobian is
J(x, y) = [ −0.5 h′(x)   0.5;   −0.2   −0.3 ].
Evaluating it at Q1 and computing the eigenvalues gives the diagonal matrix
D1 =
Diagonal Matrix
  −3.58798         0
        0    −0.33041
Hence, there are two real distinct eigenvalues, r1 = −3.58798 and r2 = −0.33041.
The eigenvectors are
E1 = [ −0.998155; −0.060715 ]   and   E2 = [ −0.150341; −0.988634 ].
We set up the plot as follows. We define the coefficient matrix A1 of our linearization
and set up the parameter p1 by listing all the entries of A1 followed by the coordinates
of Q1. We then generate the automatic phase plane plot.
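The session itself is not listed; a sketch consistent with the data above and with the plotting call used later for the predator–prey model is given next. The (1,1) entry of A1 is inferred from the eigenvalues just quoted (it equals −0.5 h′(x) at Q1), and the window settings simply copy the later call.

A1 = [-3.6184, 0.5; -0.2, -0.3];                         % Jacobian at Q1
p1 = [A1(1,1); A1(1,2); A1(2,1); A1(2,2); 0.062695; 0.758672];
AutoPhasePlanePlotLinearSystemRKF5('x axis', 'y axis', 'Q1 Plot', ...
    1.0e-6, 1.0e-6, .01, .2, ...
    'linearsystemep', p1, .01, 0, .4, 12, 12, -1, 1, -1, 1);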
For equilibrium point Q2, we find the linearization just like we did for Q1. First, we
find the Jacobian at this point.
Listing 14.27: Jacobian at Q2
J(x2, y2)
ans =
   1.82012   0.50000
  -0.20000  -0.30000
[V2, D2] = eig(J(x2, y2))
V2 =
   0.995373  -0.234595
  -0.096085   0.972093
D2 =
   1.77185         0
         0  -0.25173

Hence, there are two real distinct eigenvalues, r1 = 1.77185 and r2 = −0.25173.
The eigenvectors are
E1 = [ 0.995373; −0.096085 ]   and   E2 = [ −0.234595; 0.972093 ].
Finally, we analyze the model near the point Q3. The Jacobian is now
Hence, there are two real distinct eigenvalues, r1 = −1.34096 and r2 = −0.39607.
The eigenvectors are
E1 = [ −0.98204; −0.18868 ]   and   E2 = [ −0.43297; −0.90141 ].
The dominant eigenvector is now E2 . We set the coefficient matrix A3 of our lin-
earization and the parameter p3 of our model and generate the automatic phase plane
plot.
We can also generate the full plot of the original system using the function
triggermodel
The next nonlinear system is the familiar Predator–Prey model. Consider the following example:
x′ = x (3 − 4y)
y′ = y (−5 + 7x).
The equilibrium points solve
x (3 − 4y) = 0
y (−5 + 7x) = 0
which has the familiar solutions Q1 = (0, 0) and Q2 = (5/7, 3/4). The Jacobian here is
J(x, y) = [ 3 − 4y   −4x;   7y   −5 + 7x ].
At Q1 = (0, 0) this reduces to J(0, 0) = [ 3   0;   0   −5 ].
Hence, there are two real distinct eigenvalues, r1 = −5 and r2 = 3. The eigenvectors
are
E1 = [ 0; 1 ]   and   E2 = [ 1; 0 ].
The dominant eigenvector is thus E2 and it is easy to plot the resulting trajectories.
Using the function linearsystemep we set up the plot. Using the coefficient
matrix A1 of our linearization and the coordinates of Q 1 , we set the parameter p1
by listing all the entries of A1 followed by the coordinates of Q1. We then generate
the automatic phase plane plot.
p1 = [3; 0; 0; -5; 0; 0];
AutoPhasePlanePlotLinearSystemRKF5('x axis', ...
   'y axis', 'Q1 Plot', ...
   1.0e-6, 1.0e-6, .01, .2, ...
   'linearsystemep', p1, .01, 0, .4, 12, 12, -1, 1, -1, 1);
At Q2 = (5/7, 3/4), the Jacobian is J(5/7, 3/4) = [ 0   −20/7;   21/4   0 ]. Hence, there is now a complex conjugate eigenvalue pair which has zero real part.
Thus, these trajectories will be circles about Q 2 . The eigenvalues are r1 = 3.8730i
and r2 = −3.8730i. The eigenvectors are not really needed for our phase plane. We
set the coefficient matrix A2 of our linearization and the parameter p2 of our model
and generate the automatic phase plane plot.
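A sketch of that setup, following the same calling pattern as the Q1 plot and using the Jacobian evaluated at Q2 as computed above, is

A2 = [0, -20/7; 21/4, 0];                                % Jacobian at Q2 = (5/7, 3/4)
p2 = [A2(1,1); A2(1,2); A2(2,1); A2(2,2); 5/7; 3/4];
AutoPhasePlanePlotLinearSystemRKF5('x axis', 'y axis', 'Q2 Plot', ...
    1.0e-6, 1.0e-6, .01, .2, ...
    'linearsystemep', p2, .01, 0, .4, 12, 12, -1, 1, -1, 1);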
The full plot of the original system uses the standard function PredPrey modified
for the model at hand.
The next nonlinear system is the Predator–Prey model with self interaction. Consider
the following example:
x′ = x (3 − 4y − e x)
y′ = y (−5 + 7x − f y)
where e and f are positive numbers. From our earlier discussions, we know that there
are essentially two interesting cases here. The x and y nullclines cross in the first
quadrant leading to spiral in trajectories towards that common point or they cross
in the fourth quadrant which leads to all trajectories converging to a point on the
positive x axis. We can now look at this model using our new tools. The equilibrium
points (x, y) solve the equations
x (3 − 4y − ex) = 0
y (−5 + 7x − f y) = 0
Three of the equilibrium points are then Q 1 = (0, 0), Q 2 = (0, −5/ f ) and Q 3 =
(3/e, 0). These equilibrium points coincide with trajectories that we do not see if we
choose positive initial conditions in Quadrant I.
Assume first that the nullclines cross in the first quadrant. This situation occurs when 3/e > 5/7, so as an example, let's choose the value e = 1.
The value of f is not very important, so let’s choose f = 1 also just to make it easy.
Then, the intersection occurs when
x + 4y = 3
7x − y = 5
We can find this point of intersection many ways. The old fashioned way is by elimination. We find x = 23/29 and y = 16/29. Let Q4 = (23/29, 16/29). The Jacobian
in general is
J(x, y) = [ 3 − 4y − 2ex   −4x;   7y   −5 + 7x − 2fy ].
We will only look at the local linearizations for equilibrium points Q 4 and then Q 3 .
The other two are similar to what we have done before.
Equilibrium Point Q4 For the equilibrium point Q4, we can find the Jacobian at Q 4
and the corresponding eigenvalues and eigenvectors. As expected, the eigenvalues
are complex with negative real part implying the local linearization gives spiral in
trajectories.
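A minimal Octave/Matlab sketch of that computation, with our own variable names, is

e = 1; f = 1;
J = @(x, y) [3 - 4*y - 2*e*x, -4*x; 7*y, -5 + 7*x - 2*f*y];
[V4, D4] = eig(J(23/29, 16/29))     % complex eigenvalues with negative real part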
Equilibrium Point Q3 For the equilibrium point Q3 = (3, 0), the eigenvalues are both real, r1 = −3 and r2 = 16. Hence, the dominant eigenvector
is E2 where
E2 = [ −0.53399; 0.84549 ].
So we will see all the trajectories moving parallel or towards the E2 line. We set the
coefficient matrix A3 of our linearization and the parameter p3 of our model and
generate the automatic phase plane plot. It took a bit of experimentation to generate
this plot so that it looked reasonably good. The eigenvalue of 16 causes very fast
growth!
To generate the full phase plane in all quadrants is difficult due to the growth rates
in all areas of the plane other than quadrant I.
Now assume instead that the nullclines cross in the fourth quadrant. This situation occurs when 3/e < 5/7, so as an example, let's choose the value e = 5.
The value of f is not very important, so let’s choose f = 5 also just to make it easy.
Then, the intersection occurs when
5x + 4y = 3
7x − 5y = 5
We find x = 35/53 and y = −4/53. Let Q 4 = (35/53, −4/53). The Jacobian for
e = 5 and f = 5 is
J(x, y) = [ 3 − 4y − 10x   −4x;   7y   −5 + 7x − 10y ].
The other equilibrium points are Q 1 = (0, 0), Q 2 = (0, −5/ f ) and Q 3 = (3/e, 0).
Equilibrium points coincide with trajectories that we do not see if we choose positive
initial conditions in Quadrant I. However, equilibrium point Q 3 is different now. All
the trajectories that start in the positive first quadrant will converge to Q 3 . Again, we
will only look at the local linearizations for equilibrium points Q 3 and Q 4 .
Equilibrium Point Q4 For the equilibrium point Q4, we can find the Jacobian at
Q4 and the corresponding eigenvalues and eigenvectors.
Hence, there are two real roots that are distinct and the dominant eigenvector is E2 .
Using the coefficient matrix A4 of our linearization and the coordinates of Q 4 , we
set the parameter p4 and generate the automatic phase plane plot.
Equilibrium Point Q3 For the equilibrium point Q3 = (3/5, 0), the eigenvalues are now both real and negative, r1 = −3 and r2 = −0.8. The dominant eigenvector is again E2. So we will see all the trajectories moving parallel or
towards the E2 line.
To generate the full phase plane in all quadrants is as always difficult and we
have to play with the settings in the function AutoPhasePlanePlotRKF5NoPMultiple.
14.1.6 Problems
1. Graph the nullclines x′ = 0 and y′ = 0 and show on the x–y plane the regions
where x′ and y′ take on their various algebraic signs.
2. Find the equilibrium points.
3. At each equilibrium point, find the Jacobian of the system and analyze the lin-
earized system we have discussed in class. This means:
• find eigenvalues and eigenvectors if the system has real eigenvalues. You don’t
need the eigenvectors if the eigenvalues are complex.
• sketch a graph of the linearized solutions near the equilibrium point.
4. Use 1 and 3 to combine all this information into a full graph of the system.
Exercise 14.1.8
x′ = y
y′ = −x + x³/6 − y
Exercise 14.1.9
x′ = −x + y
y′ = 0.1x − 2y − x² − 0.1x³
Exercise 14.1.10
x′ = y
y′ = −x + y (1 − 3x² − 2y²)
Chapter 15
An Insulin Model
We are now going to discuss a very nice model of diabetes detection or equivalently,
a model of insulin regulation, which was presented in the classic text on applied
mathematical modeling by Braun, Differential Equations and Their Applications
(Braun 1978).
In diabetes there is too much sugar in the blood and the urine. This is a metabolic
disease and if a person has it, they are not able to use up all the sugars, starches
and various carbohydrates because they don’t have enough insulin. Diabetes can be
diagnosed by a glucose tolerance test (GTT). If you are given this test, you do an
overnight fast and then you are given a large dose of sugar in a form that appears in
the bloodstream. This sugar is called glucose. Measurements are made over about
five hours or so of the concentration of glucose in the blood. These measurements
are then used in the diagnosis of diabetes. It has always been difficult to interpret
these results as a means of diagnosing whether a person has diabetes or not. Hence,
different physicians interpreting the same data can come up with a different diagnosis,
which is a pretty unacceptable state of affairs!
In this chapter, we are going to discuss a criterion developed in the 1960s by
doctors at the Mayo Clinic and the University of Minnesota that was fairly reliable.
It showcases a lot of our modeling in this course and will give you another example
of how we use our tools. We start with a simple model of the blood glucose regulatory
system.
Glucose plays an important role in vertebrate metabolism because it is a source of
energy. For each person, there is an optimal blood glucose concentration and large
deviations from this leads to severe problems including death. Blood glucose levels
are autoregulated via standard forward and backward interactions like we see in many
biological systems. An example is the signal that is used to activate the creation of
a protein which we discussed earlier. The signaling molecules are typically either
bound to another molecule in the cell or are free. The equilibrium concentration of
free signal is due to the fact that the rate at which signaling molecules bind equals the
rate at which they split apart from their binding substrate. When an external message
comes into the cell called a trigger, it induces a change in this careful balance which
temporarily upgrades or degrades the equilibrium signal concentration. This then
influences the protein concentration rate. Blood glucose concentrations work like this
too, although the details differ. The blood glucose concentration is influenced by a
variety of signaling molecules just like the protein creation rates can be. Here are some
of them. The hormone that decreases blood glucose concentration is insulin. Insulin
is a hormone secreted by the β cells of the pancreas. After we eat carbohydrates, our
gastrointestinal tract sends a signal to the pancreas to secrete insulin. Also, the glucose
in our blood directly stimulates the β cells to secrete insulin. We think insulin helps
cells pull in the glucose needed for metabolic activity by attaching itself to membrane
walls that are normally impenetrable. This attachment increases the ability of glucose
to pass through to the inside of the cell where it can be used as fuel. So, if there is not
enough insulin, cells don’t have enough energy for their needs. The other hormones
we will focus on all tend to change blood glucose concentrations also; roughly speaking, they act by interfering with insulin's
ability to help those cells pull glucose out of the blood stream. These actions can
therefore increase blood glucose levels.
Now net hormone concentration is the sum of insulin plus the others. Let H denote
this net hormone concentration. At normal conditions, call this concentration H0 .
There have been studies performed that show that under close to normal conditions,
the interaction of the one hormone insulin with blood glucose completely dominates
the net hormonal activity. That is, normal blood sugar levels primarily depend on
insulin-glucose interactions.
So if insulin increases from normal levels, it increases net hormonal concentration
to H0 + H and decreases glucose blood concentration. On the other hand, if other
hormones such as cortisol increased from base levels, this will make blood glucose
levels go up. Since insulin dominates all activity at normal conditions, we can think
of this increase in cortisol as a decrease in insulin with a resulting drop in blood
glucose levels. A decrease in insulin from normal levels corresponds to a drop in net
hormone concentration to H0 − H. Now let G denote blood glucose level. Hence,
in our model an increase in H means a drop in G and a decrease in H means an
increase in G! Note our lumping of all the hormone activity into a single net activity
is very much like how we modeled food fish and predator fish in the predator–prey
model.
The idea of our model for diagnosing diabetes from the GTT is to find a simple
dynamical model of this complicated blood glucose regulatory system in which
the values of two parameters would give a nice criterion for distinguishing normal
individuals from those with mild diabetes or those who are pre diabetic. Here is what
we will do. We describe the model as
G′(t) = F1(G, H) + J(t)
H′(t) = F2(G, H)
where the function J is the external rate at which blood glucose concentration is
being increased. There are two nonlinear interaction functions F1 and F2 because
we know G and H have complicated interactions.
Let’s assume G and H have achieved optimal values G 0 and H0 by the time
the fasting patient has arrived at the hospital. Hence, we don’t expect to have any
contribution to G′(0) and H′(0); i.e. F1(G0, H0) = 0 and F2(G0, H0) = 0.
We are interested in the deviation of G and H from their optimal values G 0 and
H0 , so let g = G − G 0 and h = H − H0 . We can then write G = G 0 + g and
H = H0 + h. The model can then be rewritten as
(G0 + g)′(t) = F1(G0 + g, H0 + h) + J(t)
(H0 + h)′(t) = F2(G0 + g, H0 + h)
or, since G0 and H0 are constants,
g′(t) = F1(G0 + g, H0 + h) + J(t)
h′(t) = F2(G0 + g, H0 + h)
We know the tangent plane to a function F(x, y) at the point (x0, y0) is given by
F(x0 + Δx, y0 + Δy) = F(x0, y0) + ∂F/∂x (x0, y0) Δx + ∂F/∂y (x0, y0) Δy + E_F,
where the error is E_F. We use this idea on our functions F1 and F2 at the optimal
values G0 and H0. We have
F1(G0 + g, H0 + h) = F1(G0, H0) + ∂F1/∂g (G0, H0) g + ∂F1/∂h (G0, H0) h + E_F1
F2(G0 + g, H0 + h) = F2(G0, H0) + ∂F2/∂g (G0, H0) g + ∂F2/∂h (G0, H0) h + E_F2
Since F1(G0, H0) = 0 and F2(G0, H0) = 0, this reduces to
F1(G0 + g, H0 + h) = ∂F1/∂g (G0, H0) g + ∂F1/∂h (G0, H0) h + E_F1
F2(G0 + g, H0 + h) = ∂F2/∂g (G0, H0) g + ∂F2/∂h (G0, H0) h + E_F2
It seems reasonable to assume that since we are so close to ordinary operating conditions, the errors E_F1 and E_F2 will be negligible. Thus our model approximation is
g′(t) = ∂F1/∂g (G0, H0) g + ∂F1/∂h (G0, H0) h + J(t)
h′(t) = ∂F2/∂g (G0, H0) g + ∂F2/∂h (G0, H0) h
We can reason out the algebraic signs of the four partial derivatives to be
∂F1/∂g (G0, H0) = −
∂F1/∂h (G0, H0) = −
∂F2/∂g (G0, H0) = +
∂F2/∂h (G0, H0) = −
The arguments for these algebraic signs come from our understanding of the physiological processes that are going on here. Let's look at a small positive deviation g
from the optimal value G0 while letting the net hormone concentration be fixed at
H0. At this point, we are not adding an external input, so here J(t) = 0. Then our
model approximation is
g′(t) = ∂F1/∂g (G0, H0) g.
At a state where we have an increase in blood sugar levels over optimal, i.e. g > 0,
the other hormones such as cortisol and glucagon will try to regulate the blood sugar
level down by increasing their concentrations and for example storing more sugar
into glycogen. Hence, the term ∂F1/∂g (G0, H0) should be negative, as here g′ is negative
since g should be decreasing. So we model this as ∂F1/∂g (G0, H0) = −m1 for some
positive number m1. Now consider a positive change in h from the optimal level
while keeping G at the optimal level G0. Then the model is
g′(t) = ∂F1/∂h (G0, H0) h
and since h > 0, this means the net hormone concentration is up which we interpret
as insulin above normal. This means blood sugar levels go down which implies g′
is negative again. Thus, ∂F1/∂h (G0, H0) must be negative which means we model it as
∂F1/∂h (G0, H0) = −m2 for some positive m2.
Now look at the h model in these two cases. If we have a small positive deviation
g from the optimal value G0 while letting the net hormone concentration be fixed at
H0, we have
h′(t) = ∂F2/∂g (G0, H0) g.
Again, since g is positive, this means we are above normal blood sugar levels which
implies mechanisms are activated to bring the level down. Hence h′ > 0 as we have
increasing net hormone levels. Thus, we must have ∂F2/∂g (G0, H0) = m3 for some
positive m3. Finally, if we have a positive deviation h from optimal while blood
sugar levels are optimal, the model is
h′(t) = ∂F2/∂h (G0, H0) h.
Since h is positive, we have the concentrations of the hormones that pull glucose
out of the blood stream are above optimal. This means that too much sugar is being
removed and so the regulatory mechanisms will act to stop this action implying h′ < 0.
This tells us ∂F2/∂h (G0, H0) = −m4 for some positive constant m4. Hence, the four
partial derivatives at the optimal points can be defined by four positive numbers m1,
m2, m3 and m4 as follows:
∂F1/∂g (G0, H0) = −m1
∂F1/∂h (G0, H0) = −m2
∂F2/∂g (G0, H0) = +m3
∂F2/∂h (G0, H0) = −m4
Our model dynamics are thus approximated by
g′(t) = −m1 g − m2 h + J(t)
h′(t) = m3 g − m4 h
This implies
g″(t) = −m1 g′ − m2 h′ + J′(t)
      = −m1 g′ − m2 (m3 g − m4 h) + J′(t)
      = −m1 g′ − m2 m3 g + m2 m4 h + J′(t).
From the first equation we also have
m2 h = −g′(t) − m1 g + J(t)
which leads to
g″(t) = −m1 g′ − m2 m3 g + m4 ( −g′(t) − m1 g + J(t) ) + J′(t)
      = −(m1 + m4) g′ − (m1 m4 + m2 m3) g + m4 J(t) + J′(t).
Setting 2α = m1 + m4 and ω² = m1 m4 + m2 m3, we can rewrite this as
g″(t) + 2α g′ + ω² g = S(t),
where S(t) = m4 J(t) + J′(t). Now the right hand side here is zero except for the
very short time interval when the glucose load is being ingested. Hence, we can
simply search for the solution to the homogeneous model
g″(t) + 2α g′ + ω² g = 0.
The most interesting case is if we have complex roots. In that case, α² − ω² < 0.
Let Ω² = |α² − ω²|. Then, the general phase shifted solution has the form g(t) =
R e^(−αt) cos(Ωt − δ), which implies
G(t) = G0 + R e^(−αt) cos(Ωt − δ).
Hence, our model has five unknowns to find: G0, R, α, Ω and δ. The easiest way to
do this is to measure G0, the patient's initial blood glucose concentration, when the
patient arrives. Then measure the blood glucose concentration N more times giving
the data pairs (t1, G1), (t2, G2) and so on out to (tN, GN). Then form the least squares
error function
E = Σ_{i=1}^N ( Gi − G0 − R e^(−α ti) cos(Ω ti − δ) )²
and find the five parameter values that make this error a minimum. This can be
done in MatLab using some tools that are outside the scope of our text. Numerous
experiments have been done with this model and, if we let T0 = 2π/Ω, it has been
found that if T0 < 4 h, the patient is normal and if T0 is much larger than that, the
patient has mild diabetes.
We will now try to find the parameter values which minimize the nonlinear least
squares problem we have here. This appears to be a simple problem, but you will see
all numerical optimization problems are actually fairly difficult. Our problem is to
find the free parameters G0, R, α, Ω and δ which minimize
E(G0, R, α, Ω, δ) = Σ_{i=1}^N ( Gi − G0 − R e^(−α ti) cos(Ω ti − δ) )².
Let X = (G0, R, α, Ω, δ) and write fi(X) = Gi − G0 − R e^(−α ti) cos(Ω ti − δ), so that
E(X) = Σ_{i=1}^N fi(X)². Then the chain rule gives
∂E/∂G0 = 2 Σ_{i=1}^N fi(X) ∂fi/∂G0
∂E/∂R  = 2 Σ_{i=1}^N fi(X) ∂fi/∂R
∂E/∂α  = 2 Σ_{i=1}^N fi(X) ∂fi/∂α
∂E/∂Ω  = 2 Σ_{i=1}^N fi(X) ∂fi/∂Ω
∂E/∂δ  = 2 Σ_{i=1}^N fi(X) ∂fi/∂δ
and the individual partials are
∂fi/∂G0 = −1
∂fi/∂R  = −e^(−α ti) cos(Ω ti − δ)
∂fi/∂α  = ti R e^(−α ti) cos(Ω ti − δ)
∂fi/∂Ω  = ti R e^(−α ti) sin(Ω ti − δ)
∂fi/∂δ  = −R e^(−α ti) sin(Ω ti − δ)
and so
∂E/∂G0 = −2 Σ_{i=1}^N fi(X)
∂E/∂R  = −2 Σ_{i=1}^N fi(X) e^(−α ti) cos(Ω ti − δ)
∂E/∂α  = 2 Σ_{i=1}^N fi(X) ti R e^(−α ti) cos(Ω ti − δ)
∂E/∂Ω  = 2 Σ_{i=1}^N fi(X) ti R e^(−α ti) sin(Ω ti − δ)
∂E/∂δ  = −2 Σ_{i=1}^N fi(X) R e^(−α ti) sin(Ω ti − δ)
Now suppose we are at the point X 0 and we want to know how much of the descent
vector D to use. Note, if we use the amount ξ of the descent vector at X 0 , we
compute the new error value E(X 0 −ξ D(X 0 )). Let g(ξ) = E(X 0 −ξ D(X 0 )). We see
g(0) = E(X 0 ) and given a first choice of ξ = λ, we have g(λ) = E(X 0 − λD(X 0 )).
Next, let Y = X0 − ξ D(X0). Then, using the chain rule, we can calculate the
derivative of g. First, we have
g′(ξ) = − < ∇E(Y), D(X0) >.
Now let's approximate g using a quadratic model. Since we are trying for a minimum,
in general we try to take a step in the direction of the negative gradient which makes
the error function go down. Then, we have g(0) = E(X0) is less than g(λ) =
E(X0 − λD(X0)) and the directional derivative gives g′(0) = −||∇(E)|| < 0.
Hence, if we approximate g by a simple quadratic model, g(ξ) = A + Bξ + Cξ²,
this model will have a unique minimizer and we can use the value of ξ where the
minimum occurs as our next choice of descent step. This technique is called a Line
Search Method and it is quite useful. To summarize, we fit our g model and find
g(0) = E(X0) = A,
g′(0) = −||∇(E)|| = B,
g(λ) = A + Bλ + Cλ²  ⟹  C = ( E(X0 − λD(X0)) − E(X0) + ||∇(E)|| λ ) / λ².
Let’s get started on how to find the optimal parameters numerically. Along the way,
we will show you how hard this is. We start with a minimal implementation. We have
already discussed some root finding codes in Chap. 14 so we have seen code kind of
similar. But this will be a little different and it is good to have you see a bit about it.
What is complicated here is that we have lots of functions that depend on the data
we are trying to fit. So the number of functions depends on the size of the data set
which makes it harder to set up.
Note inside this function, we call another function to calculate the gradient of
the norm. This is given below and implements the formulae we presented earlier for
these partial derivatives.
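That helper listing is not reproduced; a sketch that implements the five partial derivative formulas above, with our own function name and the e^(−αt) form of the model used in the derivation, is

function gradE = DiabetesGradient(X, Time, G)
  % X = [G0; R; alpha; Omega; delta];  Time, G = the GTT data
  G0 = X(1); R = X(2); a = X(3); o = X(4); d = X(5);
  gradE = zeros(5,1);
  for i = 1:length(G)
    t  = Time(i);
    fi = G(i) - G0 - R*exp(-a*t)*cos(o*t - d);
    % partial derivatives of f_i with respect to G0, R, alpha, Omega, delta
    dfi = [ -1;
            -exp(-a*t)*cos(o*t - d);
             t*R*exp(-a*t)*cos(o*t - d);
             t*R*exp(-a*t)*sin(o*t - d);
            -R*exp(-a*t)*sin(o*t - d) ];
    gradE = gradE + 2*fi*dfi;
  end
end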
We also need code for the error calculations which is given here.
Listing 15.3: Diabetes Error Calculation
function E = DiabetesError(f, g, r, a, o, d, Time, G)
%
% Time = measurement times
% G = glucose values
% f = nonlinear insulin model
% g, r, a, o, d = parameters in diabetes nonlinear model
% N = size of data
N = length(G);
E = 0.0;
% calculate error function
for i = 1:N
  E = E + (G(i) - f(Time(i), g, r, a, o, d))^2;
end
Listing 15.4: Run time results for gradient descent on the original data
Data = [ 0 , 9 5 ; 1 , 1 8 0 ; 2 , 1 5 5 ; 3 , 1 4 0 ] ;
2 Time = Data ( : , 1 ) ;
G = Data ( : , 2 ) ;
f = @( t , equilG , r , a , o , d ) e qu i l G + r ∗ exp(−a ˆ 2 ∗ t ) . ∗ ( c o s ( o ∗ t−d ) ) ;
time = l i n s p a c e ( 0 , 3 , 4 1 ) ;
RInitial = 53.64;
7 G O I n i t i a l = 95 + R I n i t i a l ;
AInitial = sqrt ( log (17/5) ) ;
OInitial = pi ;
d I n i t i a l = −p i ;
I n i t i a l = [ GOInitial ; R I n i t i a l ; A I n i t i a l ; OInitial ; d I n i t i a l ] ;
12 [ Error , G0 , R, alpha , Omega , d e l t a , normgrad , update ] = DiabetesGrad ( I n i t i a l , 5 . 0 e
−4 ,20000 , Data , 0 ) ;
I n i t i a l E = 463.94
E = 376.40
o c t a v e :16> [ Error , G0 , R, alpha , Omega , d e l t a , normgrad , update ] = DiabetesGrad (
I n i t i a l , 5 . 0 e −4 ,40000 , Data , 0 ) ;
I n i t i a l E = 463.94
17 E = 377.77
o c t a v e :145 > [ Error , G0 , R, alpha , Omega , d e l t a , normgrad , update ] = DiabetesGrad (
I n i t i a l , 5 . 0 e −4 ,100000 , Data , 0 ) ;
I n i t i a l E = 463.94
E = 377.77
After 100,000 iterations we still do not have a good fit. Note we start with a
small constant λ = 5.0e − 4 here. Try it yourself. If you let this value be larger,
the optimization spins out of control. Also, we have not said how we chose our
initial values. We actually looked at the data on a sheet of paper and did some rough
calculations to try for some decent values. We will leave that to you to figure out. If
the initial values are poorly chosen, gradient descent optimization is a great way to
generate really bad values! So be warned. You will have to exercise due diligence to
find a sweet starting spot.
We can see how we did by looking at the resulting curve fit in Fig. 15.1.
Now let’s add line search and see if it gets better. We will also try scaling the data
so all the variables in question are roughly the same size. For us, a good choice is
to scale the G 0 and the R value by 50, although we could try other choices. We
have already discussed line search for our problem, but here it is again in a quick
nutshell. If we are minimizing a function of M variables, say f(X), then if we are
at the point X0, we can look at the slice of this function we get if we move out from the
base point X0 in the direction of the negative gradient, −∇f(X0) = −∇f0. Define
a function of the single variable ξ as g(ξ) = f(X0 − ξ ∇f0). Then, we can try
to approximate g as a quadratic, g(ξ) ≈ A + Bξ + Cξ². Of course, the actual
function might not be approximated nicely by such a quadratic, but it is worth a shot!
Once we fit the parameters A, B and C, we see this quadratic model is minimized
at λ* = −B/(2C). The code now adds the line search code which is contained in the block below.
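The block is not reproduced here; a self-contained sketch of the quadratic fit step, with our own names (ErrFun is a stand-in for whatever routine evaluates the least squares error), is

function [Xnew, lambdastar] = QuadraticLineSearch(ErrFun, X0, D, normgrad, lambda)
  % fit g(xi) = A + B*xi + C*xi^2 to the slice g(xi) = ErrFun(X0 - xi*D)
  A = ErrFun(X0);                              % g(0)
  B = -normgrad;                               % slope used in the text for g'(0)
  Elambda = ErrFun(X0 - lambda*D);             % one trial evaluation at xi = lambda
  C = (Elambda - A + normgrad*lambda)/lambda^2;
  if C > 0
    lambdastar = -B/(2*C);                     % minimizer of the quadratic model
  else
    lambdastar = lambda;                       % no interior minimum; keep the trial step
  end
  Xnew = X0 - lambdastar*D;
end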
We can see how this is working by letting some of the temporary calculations
print. Here are two iterations of line search printing out the A, B and C and the relevant energy values. Our initial values don't matter much here as we are just checking
out the line search algorithm.
Now let’s remove those prints and let it run for awhile. We are using the original
data here to try to find the fit.
We have success! The line search got the job done in 30,000 iterations while the
attempt using just gradient descent without line search failed. But remember, we do
additional processing at each step. We show the resulting curve fit in Fig. 15.2.
The qualitative look of this fit is a bit different. We leave it to you to think about how
we are supposed to choose which fit is better; i.e. which fit is a better one to use for
the biological reality we are trying to model? This is a really hard question. Finally,
the optimal values of the parameters are
The critical value of 2π/Ω = 1.6663 here, which is less than 4, so this patient is
normal!
Also, note these sorts of optimizations are very frustrating. If we use the
scaled version of the first Initial = [GOInitial; RInitial; AInitial;
OInitial; dInitial]; we make no progress even though we run for 60,000
iterations, and these iterations are a bit more expensive because we use line search. So
let's perturb the starting point a bit and see what happens.
Reference
M. Braun, Differential Equations and Their Applications (Springer, New York, 1978)
Chapter 16
Series Solutions
Let's look at another tool we can use to solve models. Sometimes our models involve
partial derivatives instead of normal derivatives; we call such models partial differ-
ential equation models or PDE models. They are pretty important and you should
have a beginning understanding of them. Let’s get started with a common tool called
the Separation of Variables Method.
A common PDE model is the general cable model which is given below in fairly
abstract form; we write the unknown function as Φ(x, t):
β² ∂²Φ/∂x² − Φ − α ∂Φ/∂t = 0,   for 0 ≤ x ≤ L, t ≥ 0,
∂Φ/∂x (0, t) = 0,
∂Φ/∂x (L, t) = 0,
Φ(x, 0) = f(x),
for positive constants α and β. The domain is the usual half infinite [0, L] × [0, ∞)
where the spatial part of the domain corresponds to the length of the dendritic cable
in an excitable nerve cell. We won’t worry too much about the details of where
this model comes from as we will discuss that in another volume. The boundary
conditions u x (0, t) = 0 and u x (L , t) = 0 are called Neumann Boundary conditions.
The conditions u(0, t) = 0 and u(L , t) = 0 are known as Dirichlet Boundary
conditions. The solution to a model such as this is a function Φ(x, t) which is
sufficiently smooth to have partial derivatives with respect to the needed variables
continuous for all the orders required. For these problems, the highest order we need
is the second order partials. One way to find the solution is to assume we can separate
the variables so that we can write Φ(x, t) = u(x)w(t). If we make this separation
assumption, we will find solutions that must be written as what are called infinite
series and to solve the boundary conditions, we will have to be able to express
boundary functions as series expansions. Hence, we will have to introduce some
new ideas in order to understand these things. Let’s motivate what we need to do by
applying the separation of variables technique to the cable equation. This will show
the ideas we need to use in a specific example. Then we will step back and go over
the new mathematical ideas of series and then return to the cable model and finish
finding the solution.
We assume a solution of the form Φ(x, t) = u(x) w(t) and compute the needed
partials. This leads to the new equation
β² (d²u/dx²) w(t) − u(x) w(t) − α u(x) (dw/dt) = 0.
Rewriting, we find for all x and t, we must have
w(t) ( β² d²u/dx² − u(x) ) = α u(x) dw/dt.
This tells us
( β² d²u/dx² − u(x) ) / u(x) = α (dw/dt) / w(t),   0 ≤ x ≤ L, t > 0.
The only way this can be true is if both the left and right hand side are equal to a
constant that is usually called the separation constant, which we denote by Λ. This leads to the decoupled
Eqs. 16.1 and 16.2.
α dw/dt = Λ w(t),   t > 0,   (16.1)
β² d²u/dx² = (1 + Λ) u(x),   0 ≤ x ≤ L.   (16.2)
We also have boundary conditions. Our assumption leads to the following boundary
conditions in x:
(du/dx)(0) w(t) = 0,   t > 0,
(du/dx)(L) w(t) = 0,   t > 0.
Since w(t) is not identically zero, this forces
(du/dx)(0) = 0,   (16.3)
(du/dx)(L) = 0.   (16.4)
Equations 16.1–16.4 give us the boundary value problem in u(x) we need to solve.
Then, we can find w.
16.1.1.1 Case I: 1 + Λ = ω², ω ≠ 0
Here the u equation is
u″ − (ω²/β²) u = 0,
u′(0) = 0,
u′(L) = 0,
with characteristic equation r² − ω²/β² = 0 with the real roots ±ω/β. The general
solution is u(x) = A cosh(ωx/β) + B sinh(ωx/β), which tells us
u′(x) = A (ω/β) sinh(ωx/β) + B (ω/β) cosh(ωx/β),
u′(0) = 0 = B (ω/β),
u′(L) = 0 = A (ω/β) sinh(ωL/β).
Hence, B = 0 and A sinh(ωL/β) = 0. Since sinh is never zero when ω is not zero,
we see A = 0 also. Hence, the only u solution is the trivial one and we can reject
this case.
Case II: 1 + Λ = 0
The u equation is now
u″ = 0,
u′(0) = 0,
u′(L) = 0,
with characteristic equation r² = 0 with the double root r = 0. Hence, the general
solution is now
u(x) = A + B x.
Applying the boundary conditions u′(0) = 0 and u′(L) = 0, since u′(x) = B,
we have
u′(0) = 0 = B,
u′(L) = 0 = B.
Hence, B = 0 but the value of A can't be determined. Hence, any arbitrary constant
which is not zero is a valid non zero solution. Choosing A = 1, let u0(x) = 1 be
our chosen nonzero solution for this case.
We now need to solve for w in this case. Since Λ = −1, the model to solve is
dw/dt = −(1/α) w(t),   t > 0.
The general solution is w(t) = C e^(−t/α) for any value of C. Choose C = 1 and we
set
w0(t) = e^(−t/α).
Hence, the product φ0(x, t) = u0(x) w0(t) solves the boundary conditions. That is
φ0(x, t) = e^(−t/α)
is a solution.
Case III: 1 + Λ = −ω², ω ≠ 0
The u equation is now
u″ + (ω²/β²) u = 0,
u′(0) = 0,
u′(L) = 0.
The general solution is u(x) = A cos(ωx/β) + B sin(ωx/β) and hence
u′(x) = −A (ω/β) sin(ωx/β) + B (ω/β) cos(ωx/β),
u′(0) = 0 = B (ω/β),
u′(L) = 0 = −A (ω/β) sin(ωL/β).
Hence, B = 0 and A sin(ωL/β) = 0. If sin(ωL/β) ≠ 0, we are forced to take A = 0
as well, so the only solutions are the trivial or zero solutions unless
ωL = nπβ for some integer n. Letting ωn = nπβ/L, we find a non zero solution for each nonzero value
of A of the form
un(x) = A cos(ωn x/β) = A cos(nπ x/L).
The corresponding w equation is
dw/dt = −((1 + ωn²)/α) w(t),   t ≥ 0.
The general solution is
w(t) = Bn exp( −(1 + ωn²) t/α ) = Bn exp( −(1 + n²π²β²/L²) t/α ).
Hence, each product φn(x, t) = un(x) wn(t), with
wn(t) = exp( −(1 + n²π²β²/L²) t/α ),
will solve the model with the x boundary conditions, and so will any finite sum of the form,
for arbitrary constants An,
ΦN(x, t) = Σ_{n=1}^N An φn(x, t) = Σ_{n=1}^N An un(x) wn(t)
         = Σ_{n=1}^N An cos(nπx/L) exp( −(1 + n²π²β²/L²) t/α ).
Adding in the 1 + Λ = 0 case, we find the most general finite term solution has the
form
ΦN(x, t) = A0 φ0(x, t) + Σ_{n=1}^N An φn(x, t) = A0 u0(x) w0(t) + Σ_{n=1}^N An un(x) wn(t)
         = A0 exp(−t/α) + Σ_{n=1}^N An cos(nπx/L) exp( −(1 + n²π²β²/L²) t/α ).
Letting N grow, we are led to the infinite series
Φ(x, t) = A0 φ0(x, t) + Σ_{n=1}^∞ An φn(x, t) = A0 u0(x) w0(t) + Σ_{n=1}^∞ An un(x) wn(t)
        = A0 exp(−t/α) + Σ_{n=1}^∞ An cos(nπx/L) exp( −(1 + n²π²β²/L²) t/α ).
This is the form that will let us solve the remaining boundary condition. We need to
step back now and talk more about this idea of a series solution to our model.
Let's look at sequences of functions made up of building blocks of the form un(x) =
cos(nπx/L) or vn(x) = sin(nπx/L) for various values of the integer n. The number L is
a fixed value here. We can combine these functions into finite sums: let UN(x) and
VN(x) be defined as follows:
UN(x) = Σ_{n=1}^N an sin(nπx/L)
and
VN(x) = b0 + Σ_{n=1}^N bn cos(nπx/L).
These are the partial sums formed from the sequences of cosine and sine numbers. However,
the underlying sequences can be negative, so these are not sequences of non negative
terms like we previously discussed. These sequences of partial sums may or may
not have a finite supremum value. Nevertheless, we still represent the supremum
using the same notation: i.e. the supremum of ( Ui(x0) )_{i=1}^∞ and the supremum of
( Vi(x0) )_{i=0}^∞ can be written as Σ_{n=1}^∞ an sin(nπx0/L) and b0 + Σ_{n=1}^∞ bn cos(nπx0/L).
This sequence of real numbers converges to a possibly different number for each x0;
hence, let's call this possible limit S(x0). Now the limit may not exist, of course. We
will write lim_{n→∞} Un(x0) = S(x0) when the limit exists. If the limit does not exist for
some value of x0, we will understand that the value S(x0) is not defined in some way.
Note, from our discussion above, this could mean the limiting value flips between
a finite set of possibilities, the limit approaches ∞ or the limit approaches −∞. In
any case, the value S(x0) is not defined as a finite value. We would say this precisely
as follows: given any positive tolerance ε, there is a positive integer N so that
n > N  ⟹  | Σ_{i=1}^n ai sin(iπx0/L) − S(x0) | < ε.
When this happens, we write
lim_{n→∞} Σ_{i=1}^n ai sin(iπx0/L) = S(x0).
As before, this symbol is called an infinite series and we see we get a potentially
different series at each point x0. The error term S(x0) − Un(x0) is then written as
S(x0) − Σ_{i=1}^n ai sin(iπx0/L) = Σ_{i=n+1}^∞ ai sin(iπx0/L),
which you must remember is just a short hand for this error.
Now that we have an infinite series notation defined, we note the term Un(x0),
which is the sum of n terms, is also called the nth partial sum of the series
Σ_{i=1}^∞ ai sin(iπx0/L). Note we can define the convergence at a point x0 for the partial
sums VN(x0) in exactly the same way.
Let's go back and think about vectors in ℝ². As you know, we think of these as
arrows with a tail fixed at the origin of the two dimensional coordinate system we
call the x–y plane. They also have a length or magnitude and this arrow makes an
angle with the positive x axis. Suppose we look at two such vectors, E and F. Each
vector has an x and a y component so that we can write
E = [a; b]   and   F = [c; d].
The cosine of the angle between them is proportional to the inner product
< E, F >= ac + bd. If this angle is 0 or π, the two vectors lie along the same
line. In any case, the angle associated with E is tan⁻¹(b/a) and for F, tan⁻¹(d/c).
Hence, if the two vectors lie on the same line, E must be a multiple of F. This means
there is a number β so that
E = β F.
Now let the number 1 in front of E be called −α. Then the fact that E and F lie on
the same line implies there are 2 constants α and β, both not zero, so that
α E + β F = 0.
Note we could argue this way for vectors in ℝ³ and even in ℝⁿ. Of course, our ability
to think of these things in terms of lying on the same line and so forth needs to be
extended to situations we can no longer draw, but the idea is essentially the same.
Instead of thinking of our two vectors as lying on the same line or not, we can rethink
what is happening here and try to identify what is happening in a more abstract
way. If our two vectors lie on the same line, they are not independent things in the
sense one is a multiple of the other. As we saw above, this implies there was a linear
equation connecting the two vectors which had to add up to 0. Hence, we might say
the vectors were not linearly independent or simply, they are linearly dependent.
Phrased this way, we are on to a way of stating this idea which can be used in many
more situations. We state this as a definition.
We say E and F are linearly dependent if there are constants α and β, not both zero, so that
α E + β F = 0;
otherwise, they are linearly independent. More generally, the objects E1, . . . , EN are linearly dependent if there are constants α1, . . . , αN, not all zero, so that
α1 E1 + · · · + αN EN = 0.
Note we have changed the way we define the constants a bit. When there are more
than two objects involved, we can’t say, in general, that all of the constants must be
non zero.
Now let’s apply these ideas to functions f and g defined on some interval I . By this
we mean either
• I is all of ℝ, i.e. a = −∞ and b = ∞,
• I is half-infinite. This means a = −∞ and b is finite with I of the form (−∞, b)
or (−∞, b]. Similarly, I could have the form (a, ∞) or [a, ∞),
• I is an interval of the form (a, b), [a, b), (a, b] or [a, b] for finite a < b.
We would say f and g are linearly independent on the interval I if the equation
α1 f(t) + α2 g(t) = 0,   for all t in I,
implies α1 and α2 must both be zero. Here is an example. The functions sin(t) and
cos(t) are linearly independent on ℝ because
α1 sin(t) + α2 cos(t) = 0,   for all t,
also implies the above equation holds for the derivative of both sides giving
α1 cos(t) − α2 sin(t) = 0,   for all t.
In matrix form, these two equations are
[ cos(t)   −sin(t);   sin(t)   cos(t) ] [α1; α2] = [0; 0]
for all t. The determinant of the matrix here is cos²(t) + sin²(t) = 1 and so picking
any t we like, we find the unique solution is α1 = α2 = 0. Hence, these two
functions are linearly independent on ℝ. In fact, they are linearly independent on any
interval I.
This leads to another important idea. Suppose f and g are linearly independent
differentiable functions on an interval I. Then, we know the system
[ f(t)   g(t);   f′(t)   g′(t) ] [α1; α2] = [0; 0]
has only the zero solution α1 = α2 = 0 for all t in I, and this happens when the determinant of the
coefficient matrix is not zero. This determinant comes up a lot and it is called the Wronskian of
the two functions f and g and it is denoted by the symbol W(f, g). Hence, we have
the implication: if f and g are linearly independent differentiable functions, then
W(f, g) ≠ 0 for all t in I. What about the converse? If the Wronskian is never zero
on I, then the system
[ f(t)   g(t);   f′(t)   g′(t) ] [α1; α2] = [0; 0]
forces α1 = α2 = 0, and so f and g are linearly independent on I. We can state this formally.
Theorem 16.3.1 (Two Functions are Linearly Independent if and only if their
Wronskian is not zero) If f and g are differentiable functions on I , the Wronskian of
f and g is defined to be
W(f, g) = det [ f(t)   g(t);   f′(t)   g′(t) ],
where W ( f, g) is the symbol for the Wronskian of f and g. Sometimes, this is just
written as W , if the context is clear. Then f and g are linearly independent on I if
and only if W ( f, g) is non zero on I .
If f , g and h are twice differentiable on I , the Wronskian uses a third row of second
derivatives and the statement that these three functions are linearly independent on
I if and only if their Wronskian is non zero on I is proved essentially the same way.
The appropriate theorem is
Theorem 16.3.2 (Three Functions are Linearly Independent if and only if their
Wronskian is not zero) If f , g and h are twice differentiable functions on I , the
Wronskian of f , g and h is defined to be
⎛⎡ ⎤⎞
f (t) g(t) h(t)
W ( f, g, h) = det ⎝⎣ f (t) g (t) h (t) ⎦⎠ .
f (t) g (t) h (t)
where W(f, g, h) is the symbol for the Wronskian of f, g and h. Then f, g and h are
linearly independent on I if and only if W ( f, g, h) is non zero on I .
For example, to show the three functions f(t) = t, g(t) = sin(t) and h(t) = e^(2t)
are linearly independent on ℝ, we could form their Wronskian
W(f, g, h) = det [ t   sin(t)   e^(2t);   1   cos(t)   2e^(2t);   0   −sin(t)   4e^(2t) ]
           = t det [ cos(t)   2e^(2t);   −sin(t)   4e^(2t) ] − det [ sin(t)   e^(2t);   −sin(t)   4e^(2t) ]
           = t e^(2t) (4 cos(t) + 2 sin(t)) − e^(2t) (4 sin(t) + sin(t))
           = e^(2t) ( 4t cos(t) + 2t sin(t) − 5 sin(t) ).
Is the expression 4t cos(t) + 2t sin(t) − 5 sin(t) zero for all t? If so, that would mean the functions t sin(t), t cos(t) and sin(t) are
linearly dependent. We could then form another Wronskian for these functions which
would be rather messy. To see these three new functions are linearly independent, it is
easier to just pick three points t from ℝ and solve the resulting linear dependence
equations. Since t = 0 does not give any information, let's try t = −π, t = π/4 and
t = π/2. This gives the system
[ −4π   0   0;   π√2/2   π√2/4   −5√2/2;   0   2π   −5 ] [α1; α2; α3] = [0; 0; 0].
The first equation forces α1 = 0, and the last two equations then force α2 and α3 to be
0 too. This shows t sin(t), t cos(t) and sin(t) are linearly independent and shows
the expression 4t cos(t) + 2t sin(t) − 5 sin(t) is not zero for all t. Hence, the functions
f(t) = t, g(t) = sin(t) and h(t) = e^(2t) are linearly independent. As you can see,
these calculations become messy quickly. Usually, the Wronskian approach for more
than two functions is too hard and we use the "pick three suitable points ti from I"
approach and solve the resulting linear system. If we can show the solution is always
0, then the functions are linearly independent.
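As a quick numerical illustration of this "pick three points" idea, the sketch below evaluates t sin(t), t cos(t) and sin(t) directly at the three chosen points (so the entries differ from the matrix above by the constant factors 4, 2 and −5) and checks that the resulting coefficient matrix is nonsingular:

ts = [-pi; pi/4; pi/2];                        % the three test points
M  = [ts.*sin(ts), ts.*cos(ts), sin(ts)];      % each row is one linear dependence equation
d  = det(M)                                    % nonzero, so only alpha1 = alpha2 = alpha3 = 0 works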
16.3.2 Homework
We can make the ideas we have been talking about more formal. If we have a set
of objects u with a way to add them to create new objects in the set and a way to
scale them to make new objects, this is formally called a Vector Space with the set
denoted by V . For our purposes, we scale such objects with either real or complex
numbers. If the scalars are real numbers, we say V is a vector space over the reals;
otherwise, it is a vector space over the complex field.
Definition 16.4.1 (Vector Space) Let V be a set of objects u with an additive
operation ⊕ and a scaling method. Formally, this means
1. Given any u and v, the operation of adding them together is written u ⊕ v and
results in the creation of a new object w in the vector space. This operation is
commutative which means the order of the operation is not important; so u ⊕ v
and v ⊕ u give the same result. Also, this operation is associative as we can group
any two objects together first, perform this addition ⊕ and then do the others and
the order of the grouping does not matter.
2. Given any u and any number c (either real or complex, depending on the type
of vector space we have), the operation c u creates a new object. We call such
numbers scalars.
3. The scaling and additive operations are nicely compatible in the sense that order
and grouping is not important. These are called the distributive laws for scaling
and addition. They are
c (u ⊕ v) = (c u) ⊕ (c v)
(c + d) u = (c u) ⊕ (d u).
From these rules it follows, for example, that
(0 + 0) u = (0 u) ⊕ (0 u),
so 0 u must be the zero object of the space, and
0 = (1 − 1) u = (1 u) ⊕ (−1 u),
so (−1) u acts as the additive inverse of u. Some standard examples of vector spaces of functions are:
1. C[a, b] is the set of all functions whose domain is [a, b] that are continuous on the domain.
2. C¹[a, b] is the set of all functions whose domain is [a, b] that are continuously
differentiable on the domain.
3. R[a, b] is the set of all functions whose domain is [a, b] that are Riemann integrable on the domain.
There are many more, of course.
Vector spaces have two other important ideas associated with them. We have already
talked about linearly independent objects. Clearly, the kinds of objects we were
focusing on were from some vector space V . The first idea is that of the span of a set.
Definition 16.4.2 (The Span Of A Set Of Vectors) Given a finite set of vectors in
a vector space V, W = {u1, . . . , uN} for some positive integer N, the span of W
is the collection of all new vectors of the form Σ_{i=1}^N ci ui for any choices of scalars
c1, . . . , cN. It is easy to see the span of W is a vector space itself and since it is a subset of V,
we call it a vector subspace. The span of the set W is denoted by Sp(W). If the set
of vectors W is not finite, the definition is similar but we say the span of W is the
set of all vectors which can be written as Σ_{i=1}^N ci ui for some finite set of vectors
u1, . . . , uN from W.
Then there is the notion of a basis for a vector space. First, we need to extend the
idea of linear independence to sets that are not necessarily finite.
Definition 16.4.3 (Linear Independence For Non Finite Sets) Given a set of vectors
in a vector space V , W , we say W is a linearly independent subset if every finite set
of vectors from W is linearly independent in the usual manner.
Definition 16.4.4 (A Basis For A Vector Space) Given a set of vectors in a vector
space V , W , we say W is a basis for V if the span of W is all of V and if the vectors
in W are linearly independent. Hence, a basis is a linearly independent spanning set
for V . The number of vectors in W is called the dimension of V . If W is not finite
in size, then we say V is an infinite dimensional vector space.
Comment 16.4.5 In a vector space like ℝⁿ, the maximum size of a set of linearly
independent vectors is n, the dimension of the vector space.
Comment 16.4.6 Let’s look at the vector space C[0, 1], the set of all continuous
functions on [0, 1]. Let W be the set of all powers of t, {1, t, t 2 , t 3 , . . .}. We can
use the derivative technique to show this set is linearly independent even though
it is infinite in size. Take any finite subset from W . Label the resulting powers as
{n 1 , n 2 , . . . , n p }. Write down the linear dependence equation
c1 t^(n1) + c2 t^(n2) + · · · + cp t^(np) = 0.
Take n p derivatives to find c p = 0 and then backtrack to find the other constants
are zero also. Hence C[0, 1] is an infinite dimensional vector space. It is also clear
that W does not span C[0, 1] as if this was true, every continuous function on [0, 1]
would be a polynomial of some finite degree. This is not true as sin(t), e−2t and many
others are not finite degree polynomials.
Now there is an important result that we use a lot in applied work. If we have an object
u in a Vector Space V, we often want to approximate u using an element from
a given subspace W of the vector space. To do this, we need to add another property
to the vector space. This is the notion of an inner product. We already know what
an inner product is in a simple vector space like ℝⁿ. Many vector spaces can have
an inner product structure added easily. For example, in C[a, b], since each object is
continuous, each object is Riemann integrable. Hence, given two functions f and g
from C[a, b], the real number given by ∫_a^b f(s) g(s) ds is well-defined. It satisfies all
the usual properties that the inner product for finite dimensional vectors in ℝⁿ does
also. These properties are so common we will codify them into a definition for what
an inner product for a vector space V should behave like.
an inner product for a vector space V should behave like.
Definition 16.4.5 (Real Inner Product) Let V be a vector space with the reals as
the scalar field. Then a mapping ω which assigns a pair of objects to a real number
is called an inner product on V if
1. ω(u, v) = ω(v, u); that is, the order is not important for any two objects.
2. ω(c u, v) = cω(u, v); that is, scalars in the first slot can be pulled out.
3. ω(u ⊕ w, v) = ω(u, v) + ω(w, v), for any three objects.
4. ω(u, u) ≥ 0 and ω(u, u) = 0 if and only if u = 0.
These properties imply that ω(u, c v) = cω(u, v) as well. A vector space V with
an inner product is called an inner product space.
Comment 16.4.7 The inner product is usually denoted with the symbol <, > instead
of ω( , ). We will use this notation from now on.
Comment 16.4.8 When we have an inner product, we can measure the size or
magnitude of an object, as follows. We define the analogue of the euclidean norm of
an object u using the usual || · || symbol as
||u|| = √( < u, u > ).
This is called the norm induced by the inner product of the object. In C[a, b], with
the inner product < f, g > = ∫_a^b f(s) g(s) ds, the norm of a function f is thus
||f|| = √( ∫_a^b f²(s) ds ).
This is called the L² norm of f.
It is possible to prove the Cauchy–Schwartz inequality in this more general setting
also.
Theorem 16.4.1 (General Cauchy–Schwartz Inequality)
If V is an inner product space with inner product < , > and induced norm || · ||, then
| < u, v > | ≤ ||u|| ||v||   for all u and v in V.
Proof The proof is different than the one you would see in a Calculus text for ℝ²,
of course, and is covered in a typical course on beginning linear analysis.
Comment 16.4.9 We can use the Cauchy–Schwartz inequality to define a notion of
angle between objects exactly like we would do in ℝ². We define the angle θ between
u and v via its cosine as usual:
cos(θ) = < u, v > / ( ||u|| ||v|| ).
Hence, objects can be perpendicular or orthogonal even if we can not interpret them
as vectors in ℝ². We see two objects are orthogonal if their inner product is 0.
Comment 16.4.10 If W is a finite dimensional subspace, a basis for W is said to be
an orthonormal basis if each object in the basis has L 2 norm 1 and all of the objects
are mutually orthogonal. This means < ui , u j > is 1 if i = j and 0 otherwise. We
typically let the Kronecker delta symbol δi j be defined by δi j = 1 if i = j and 0
otherwise so that we can say this more succinctly as < ui , u j >= δi j .
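To see this orthogonality concretely for the cosine building blocks used below, here is a small Octave/Matlab check; the interval length L and the grid size are arbitrary illustrative choices.

L  = 2;
x  = linspace(0, L, 2001);
u  = @(n) cos(n*pi*x/L);
ip = @(n, m) trapz(x, u(n).*u(m));             % approximates the C[0,L] inner product
[ip(2,3), ip(2,2), ip(3,3)]                    % approximately 0, L/2, L/2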
A trigonometric series is a sum of the form
S(x) = b0 + Σ_{i=1}^∞ [ ai sin(iπx/L) + bi cos(iπx/L) ]
for any numbers an and bn. Of course, there is no guarantee that this series will
converge at any x! If we start with a function f which is continuous on the interval
[0, L], we can define the trigonometric series associated with f as follows:
S(x) = (1/L) < f, 1 >
     + Σ_{i=1}^∞ [ (2/L) < f(x), sin(iπx/L) > sin(iπx/L) + (2/L) < f(x), cos(iπx/L) > cos(iπx/L) ],
where the symbol < , > is the inner product in the set of functions C[0, L] defined
by < u, v > = ∫_0^L u(s) v(s) ds. The coefficients in the Fourier series for f are called
the Fourier coefficients of f. Since these coefficients are based on inner products
with scaled sin and cos functions, we call these the normalized Fourier coefficients.
Let's be clear about this and a bit more specific. The nth Fourier sin coefficient,
n ≥ 1, of f is as follows:
an(f) = (2/L) ∫_0^L f(x) sin(nπx/L) dx.
The finite sums from the cable problem,
ΦN(x, t) = A0 exp(−t/α) + Σ_{n=1}^N An cos(nπx/L) exp( −(1 + n²π²β²/L²) t/α ),
are like Fourier series although in terms of two variables. We can show these series
converge pointwise for x in [0, L] and all t. We can also show that we can take the
partial derivative of this series solution term by term (see the discussions in Peterson
(2015) for details) to obtain
∂ΦN/∂x (x, t) = − Σ_{n=1}^N An (nπ/L) sin(nπx/L) exp( −(1 + n²π²β²/L²) t/α ).
This series evaluated at x = 0 and x = L gives 0 and hence the Neumann conditions
are satisfied. Hence, the solution Φ(x, t) given by
Φ(x, t) = A0 exp(−t/α) + Σ_{n=1}^∞ An cos(nπx/L) exp( −(1 + n²π²β²/L²) t/α )
for the arbitrary sequence of constants (An) is a well-behaved solution on our domain.
The remaining boundary condition is Φ(x, 0) = f(x), and from the series form above
Φ(x, 0) = A0 + Σ_{n=1}^∞ An cos(nπx/L).
So we need to choose the constants An so that
A0 + Σ_{n=1}^∞ An cos(nπx/L) = f(x).
We can expand f in its Fourier cosine series on [0, L],
f(x) = B0 + Σ_{n=1}^∞ Bn cos(nπx/L),
with
B0 = (1/L) ∫_0^L f(x) dx,
Bn = (2/L) ∫_0^L f(x) cos(nπx/L) dx.
Then, setting these series equal, we find that the solution is given by An = Bn for
all n ≥ 0. The full details of all of this are outside the scope of our work here, but this
will give you a taste of how these powerful tools can help us solve PDE models.
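As a final sketch, here is how the recipe An = Bn could be carried out numerically; the initial data f, the constants and the truncation level are arbitrary illustrative choices, and trapz is used for the coefficient integrals.

L = 5; alpha = 1; beta = 1; N = 40;
f  = @(x) x.*(L - x);                          % hypothetical initial condition f(x)
xq = linspace(0, L, 4001);
B0 = trapz(xq, f(xq))/L;
B  = zeros(1, N);
for n = 1:N
  B(n) = 2*trapz(xq, f(xq).*cos(n*pi*xq/L))/L;
end
% truncated series solution Phi_N at a single point (x, t)
Phi = @(x, t) B0*exp(-t/alpha) + ...
      sum( B .* cos((1:N)*pi*x/L) .* exp(-(1 + ((1:N)*pi*beta/L).^2)*t/alpha) );
Phi(1.7, 0.5)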
Reference
J. Peterson. Calculus for Cognitive Scientists: Partial Differential Equation Models, Springer Series
on Cognitive Science and Technology (Springer Science+Business Media Singapore Pte Ltd.,
Singapore, 2015 in press)
Part VII
Summing It All Up
Chapter 17
Final Thoughts
How can we train people to think in an interdisciplinary manner using these sorts
of tools? It is our belief we must foster a new mindset within practicing scientists.
We can think of this as the development of a new integrative discipline we could
call How It All Fits Together Science. Of course, this name is not formal enough
to serve as a new disciplinary title, but keeping to its spirit, we could use the name
Integrative Science. Now the name Integrative Biology has occurred here and there
in the literature over the last few decades and it is not clear to us it has the right
tone. So, we will begin calling our new integrative point of view Building Scientific
Bridges or BSB.
We would like the typical scientist to have a deeper appreciation of the use of the
triad of biology, mathematics and computer science (BMC) in this attempt to build
bridges between the many disparate areas of biology, cognitive sciences and the
other sciences. The sharp disciplinary walls that have been built in academia hurt
everyone’s chances at developing an active and questing mind that is able to both
be suspicious of the status quo and also have the tools to challenge it effectively.
Indeed, we have long believed that all research requires a rebellious mind. If you
revere the expert opinion of others too much, you will always be afraid to forge a
new path for yourself. So respect and disrespect are both part of the toolkit of our
budding scientists.
There is currently an attempt to create a new Astrobiology program at the Uni-
versity of Washington which is very relevant to our discussion. Astrobiology is an
excellent example of how science and a little mathematics can give insight into issues
such as the creation of life and its abundance in the universe. Read its primer edited
by Linda Billings and others, (Billings 2006) for an introduction to this field. Careful
arguments using chemistry, planetary science and many other fields inform what is
probable. It is a useful introduction not only to artificial life issues, but also to the
process by which we marshal ideas from disparate fields to form interesting models.
Indeed, if you look at the new textbook, Planets and Life: The Emerging Science
of Astrobiology, (Sullivan and Baross 2007) you will be inspired at the wealth of
knowledge such a field must integrate. This integration is held back by students who
are not trained in both BMC and BSB. We paraphrase some of the important points
made by the graduate students enrolled in this new program about interdisciplinary
training and research. Consider what the graduate students in this new program have
said Sullivan and Baross (2007, p. 548):
…some of the ignorance exposed by astrobiological questions reveals not the boundaries
of scientific knowledge, but instead the boundaries of individual disciplines. Furthermore,
collaboration by itself does not address this ignorance, but instead compounds it by encour-
aging scientists to rely on each other’s authority. Thus, anachronistic disciplinary borders
are reinforced rather than overrun. In contrast astrobiology can motivate challenges to dis-
ciplinary isolation and the appeals to authority that such isolation fosters.
Indeed, studying problems that require points of view from many places is of great
importance to our society and from Sullivan and Baross (2007, p. 548) we hear that
many different disciplines should now be applied to a class of questions perceived as
broadly unified and that such an amalgamation justifies a new discipline (or even meta
discipline) such as astrobiology.
We believe, as the astrobiology students (Sullivan and Baross 2007, p. 550) do that
[BSB enriched by BMC] can change this by challenging the ignorance fostered by disci-
plinary structure while pursuing the creative ignorance underlying genuine inquiry. Because
of its integrative questions, interdisciplinary nature,…, [BSB enriched by BMC] emerges
as an ideal vehicle for scientific education at the graduate, undergraduate and even high
school levels. [It] permits treatment of traditionally disciplinary subjects as well as areas
where those subjects converge (and, sometimes, fall apart!) At the same time, [it] is well
suited to reveal the creative ignorance at scientific frontiers that drives discovery.
17.1 Fostering Interdisciplinary Appreciation
• All mathematical concepts are tied to real biological and scientific need. Hence,
after preliminary calculus concepts have been discussed, there is a careful devel-
opment of the exponential and natural logarithm functions. This is done in such
a way that all properties are derived so there are no mysteries and no references to
“this is how it is and we don’t go through the details because the mathematical
derivation is beyond you.” We insist that learning to think with the language of
mathematics is an important skill. Once the exponential function is understood,
it is immediately tied to the simplest type of real biological model: exponential
growth and decay.
• Mathematics is subordinate to the science in the sense that the texts build the math-
ematical knowledge needed to study interesting nonlinear biological and cognitive
models. We emphasize that to add more interesting science always requires more
difficult mathematics and concomitant intellectual resources.
• Standard mathematical ideas from linear algebra such as eigenvalues and eigen-
vectors can be used to study systems of linear differential equations using both
numerical tools (MatLab based currently) and graphical tools. We always stress
how the mathematics and the biology interact.
• Nonlinear models are important for studying real scientific questions. We begin with
the logistic equation and progress to the Predator–Prey model and a standard SIR disease
model. We emphasize how we must abstract out of biological and scientific com-
plexity the variables necessary to build the model and how we can be wrong. We
show how the original Predator–Prey model, despite being a gross approximation
of the biological reality of fish populations in the sea, gives great explanatory
insight. Then, we equally stress that adding self-interaction to the model leads to
erroneous biological predictions. The SIR disease model is also clearly explained
as a gross simplification of the immunological reality which nevertheless illumi-
nates the data we see.
• Difficult questions can be hard to formulate quantitatively but if we persevere, we
can get great insights. We illustrate this point of view with a model of cancer which
requires 6 variables but which is linear. We show how much knowledge is both used
and thrown away when we make the abstractions necessary to create the model.
At this point, the students also know that models can easily have more variables
in them than we can graph comfortably. Also, numerical answers delivered via
MATLAB alone are not very useful. We need to derive information about the functional
relationships between the parameters of the model and that requires a nice blend
of mathematical theory and biology.
The use of BMC is therefore fostered in this text and its earlier companion by
introducing the students to concepts from the traditional Math-Two-Engineering course (just
a little as needed), Linear Algebra (a junior level mathematics course) and Differential
Equations.
Indeed, there is more that can be said (Sullivan and Baross 2007, p. 552)
What [BSB enriched by BMC] can mean as a science and discipline is yet to be decided,
for it must face the two-fold challenge of cross-disciplinary ignorance that disciplinary
education itself enforces. First, ignorance cannot be skirted by deferral to experts, or by
other implicit invocations of the disciplinary mold that [BSB enriched by BMC] should
instead critique. Second, ignorance must actually be recognized. This is not trivial: how do
you know what you do not know? Is it possible to understand a general principle without
also understanding the assumptions and caveats underlying it? Knowledge superficially
“understood” is self-affirming. For example, the meaning of the molecular tree of life may
appear unproblematic to an astronomer who has learned that the branch lengths represent
evolutionary distance, but will the astronomer even know to consider the hidden assumptions
about rate constancy by which the tree is derived? Similarly, images from the surface of Mars
showing evidence of running water are prevalent in the media, yet how often will a biologist
be exposed to alternative explanations for these geologic forms, or to the significant evidence
to the contrary? [There is a need] for a way to discriminate between science and … uncritically
accepted results of science.
A first attempt at developing a first year curriculum for the graduate program in
astrobiology led to an integrative course in which specialists from various disciplines
germane to the study of astrobiology gave lectures in their own areas of expertise
and then left as another expert took over. This was disheartening to the students in
the program. They said that (Sullivan and Baross 2007, p. 553)
As a group, we realized that we could not speak the language of the many disciplines
in astrobiology and that we lacked the basic information to consider their claims critically.
Instead, this attempt at an integrative approach provided only a superficial introduction to
the major contributions of each discipline to astrobiology. How can critical science be built
on a superficial foundation? Major gaps in our backgrounds still needed to be addressed. In
addition, we realized it was necessary to direct ourselves toward a more specific goal. What
types of scientific information did we most need? What levels of mastery should we aspire
to? At the same time, catalyzed by our regular interactions in the class, we students realized
that we learned the most (and enjoyed ourselves the most) in each other’s interdisciplinary
company. While each of us had major gaps in our basic knowledge, as a group we could
begin to fill many of them.
Given the above, it is clear we must introduce and tie disparate threads of material
together with care. In many things, we favor theoretical approaches that attempt to
put an overarching theory of everything together into a discipline. Since biology and
cognitive science are so complicated, we will always have to do this. You can see
examples of this way of thinking explored in many areas of current research. For
example, it is very hard to understand how organs form in the development process. A
very good book on these development issues is the one by Davies (2005). Davies
talks about the complexity that emerges from local interactions. This is, of course,
difficult to define precisely, yet, it is a useful principle that helps us understand the
formation of a kidney and other large scale organs. Read this book and then read the
one by Schmidt-Rhaesa (2007) and you will see that the second book makes more
sense given the ideas you have mastered from Davies.
Finally, it is good to always keep in mind (Sullivan and Baross 2007, p. 552)
Too often, the scientific community and academia facilely redress ignorance by appealing
to the testimony of experts. This does not resolve ignorance. It fosters it.
This closing comment sums it up nicely. We want to give the students and ourselves
an atmosphere that fosters solutions and our challenge is to find a way to do this.
So how do we become a good user of the three fold way of mathematics, science
and computer tools? We believe it is primarily a question of deep respect for the
balance between these disciplines. The basic idea is that once we abstract from
biology or some other science how certain quantities interact, we begin to phrase
these interactions in terms of mathematics. It is very important to never forget that
once the mathematical choices have been made, the analysis of the mathematics
alone will lead you to conclusions which may or may not be biologically relevant.
You must always be willing to give up a mathematical model or a computer science
model if it does not lead to useful insights into the original science.
We can quote from Randall (2005, pp. 70–71). She works in Particle Physics
which is the experimental arm that gives us data to see if string theory or loop gravity
is indeed a useful model of physical reality. It is not necessary to know physics here
to get the point. Notice what she says:
The term “model” might evoke a small scale battleship or castle you built in your child-
hood. Or you might think of simulations on a computer that are meant to reproduce known
dynamics—how a population grows, for example, or how water moves in the ocean. Model-
ing in particle physics is not the same as either of these definitions. Particle physics models
are guesses at alternate physical theories that might underlie the standard model…Different
assumptions and physical concepts distinguish theories, as do the distance or energy scales
at which a theory’s principles apply. Models are a way at getting at the heart of such dis-
tinguishing features. They let you explore a theory’s potential implications. If you think of
a theory as general instructions for making a cake, a model would be a precise recipe. The
theory would say to add sugar, a model would specify whether to add half a cup or two cups.
Now substitute Biological for Particle Physics and so forth and you can get a feel for
what a model is trying to do. Of course, biological models are much more complicated
than physics ones!
The primary message of this course is thus to teach you to think deeply and
carefully. The willingness to attack hard problems with multiple tools is what we
need in young scientists. We hope this course teaches you a bit more about that.
We believe that we learn all the myriad things we need to build reasonable models
over a lifetime of effort. Each model we design which pulls in material from disparate
areas of learning enhances our ability to develop the kinds of models that give insight.
As Pierrehumbert (2010, p. xi) says about climate modeling
When it comes to understanding the whys and wherefores of climate, there is an infinite
amount one needs to know, but life affords only a finite time in which to learn it…It is a
lifelong process. [We] attempt to provide the student with a sturdy scaffolding upon which
a deeper understanding may be built later.
The climate system [and other biological systems we may study] is made up of building
blocks which in themselves are based on elementary…principles, but which have surprising
and profound collective behavior when allowed to interact on the [large] scale. In this sense,
the “climate game” [the biological modeling game] is rather like the game of Go, where
interesting structure emerges from the interaction of simple rules on a big playing field,
rather than complexity in the rules themselves.
Chapter 18
Background Reading
To learn more about how the combination of mathematics, computational tools and
science has been used with profit, we have some favorite books that have tried to
do this. Reading these books has helped us design biologically inspired algorithms
and models and has inspired how we wrote this text! A selection of these books
would include treatment of regulatory systems in genomics such as Davidson (2001,
2006). An interesting book on how the architecture of genomes may have developed
which gives lots of food for thought for model abstraction is found in Lynch (2007).
Since the study of how to build cognitive systems is our big goal, the encyclopedic
treatment of how nervous systems evolved given in J. Kaas’s four volumes (Kaas and
Bullock 2007a, b; Kaas and Krubitzer 2007; Kaas and Preuss 2007) is very useful.
To build a cognitive system that would be deployable as an autonomous device in a
robotic system requires us to think carefully about how to build a computer system
using computer architectural choices and hardware that has some plasticity. Clues as
to how to do this can be found by looking at how simple nervous systems came to
be. After all, the complexity of the system we can build will of necessity be far less
than that of a real neural system that controls a body; hence, any advice we can get
from existing biological systems is welcome! With that said, as humans we process
a lot of environmental inputs and the thalamus plays a vital role in this. If you read
Murray Sherman’s discussion of the thalamus in Sherman and Guillery (2006), you
will see right away how you could begin to model some of the discussed modules in
the thalamus which will be useful in later work in the next two volumes.
Let’s step back now and talk about our ultimate goal which is to study cognitive
systems. You now have two self study texts under your belt (ahem, Peterson 2015a, b),
and you have been exposed to a fair amount of coding in MATLAB. You have seen that a
programming language is just another tool we can use to gain insight into answering
very difficult questions. The purpose of this series of four books (the next ones are
Peterson 2015c, d) is to prime you to always think about attacking complicated and
difficult questions using a diverse toolkit. You have been trained enough now that
you can look at all that you read and have a part of you begin the model building
process for the area you are learning about in your reading.
We believe firmly that the more you are exposed to ideas from multiple disciplines,
the better you will be able to find new ways to solve our most challenging problems.
Using a multidisciplinary toolkit does not always go over big with your peers, so
you must not let that hold you back. A great glimpse into how hard it is to get ideas
published (even though they are great) is to look at how this process went for others.
We studied Hamilton’s models of altruism in the previous book, so it might surprise
you to know he had a hard time getting them noticed. His biography (Segerstrale
2013) is fascinating reading as is the collection of stories about biologists who thought
differently (Harman and Dietrich 2013). Also, although everyone knows who Francis
Crick was, reading his biography (Olby 2009) gives a lot of insight into the process
of thinking creatively outside of the established paradigms. Crick believed, as we do,
that theoretical models are essential to developing a true understanding of a problem.
The process of piecing together a story or model of a biological process indirectly
from many disparate types of data and experiments is very hard. You can learn much
about this type of work by reading carefully the stories of people who are doing
this. Figuring out what dinosaurs were actually like even though they are not alive
now requires that we use a lot of indirect information. The book on trace fossils (i.e.
footprints and coprolites—fossilized poop, etc.) Martin (2014) shows you how much
can be learned from those sources of data. In another reconstruction process, we know
enough now about how hominids construct their bodies; i.e. how thick the fat is on top
of muscle at various places on the face and so forth, to make a good educated guess at
building a good model of extinct hominids such as Australopithecus. This process
is detailed in Gurchie (2013). Gurchie is an artist and it is fascinating to read how he
uses the knowledge scientists can give him and his own creative insight to create the
stunning models you can see at the Smithsonian now in the Hall of Humanity. We
also know much more about how to analyze DNA samples found in very old fossils,
and the science behind how we can build a reasonable genome for a Neanderthal, as
detailed in Pääbo (2014), is now well understood. Read Martin’s, Gurchie’s and Pääbo’s
discussions and apply what you have been learning about mathematics, computation
and science to what they are saying. You should be seeing it all come together much
better now.
Since our ultimate aim is to build cognitive models with our tools, it is time for
you to read some about the brains that have been studied over the years. Kaas’s books
mentioned above are nice, but almost too intense. Study some of these others and apply
the new intellectual lens we have been working to develop.
• Stiles (2008) has written a great book on brain development which can be read in
conjunction with another evolutionary approach by Streidter (2005). Read them
both and ask yourself how you could design and implement even a minimal model
that can handle such modular systems interacting in complex ways. Learning the
right language to do that is the province of what we do in Peterson (2015c, d),
so think of this reading as whetting your appetite! Another text, lower level and a bit
more layman in tone, is the nice one by Allman (2000) which is good to read as
an overview of what the others are saying. The last one you should tackle is Allen’s
(2009) book about the evolution of the brain. Compare and contrast what all these
books are saying: learn to think critically!
• On a slightly different note, Taylor (2012) tells you about the various indirect ways
we use to measure what is going on in a functioning neural system. At the bottom
of all these tools is a lot of mathematics, computation and science which you are
much better prepared to read about now.
• It is very hard to understand how to model altruism as we have found out in
this book. It is also very hard to model sexual differences quantitatively. As you
are reading about the brain in the books we have been mentioning, you can dip
into Jordan-Young (2010) to see just how complicated the science behind sexual
difference is and how difficult it is to get good data. Further, there are many
mysteries concerning various aspects of human nature which are detailed in Barash
(2012). Ask yourself, how would you build models of such things? Another really
good read here on sex is by Ridley (1993) which is on sex and how human nature
evolved. It is a bit old now, but still very interesting.
• Once you have read a bit about the brain in general and from an evolutionary
viewpoint, you are ready to learn more about how our behavior is modulated by
neurotransmitters and drugs. S. Snyder has a nice layman discussion of those ideas
in Snyder (1996).
To understand cognition is in some ways to try to understand how biological
processes evolved to handle information flow. Hence, reading about evolution in
general—going beyond what we touched on in our discussions of altruism—is a
good thing. An important part of our developmental chain is controlled by stem cells
and a nice relatively non technical introduction to those ideas is in Fox (2007) which
is very interesting also because it includes a lot of discussion on how stem cell ma-
nipulation is a technology we have to come to grips with. A general treatment of
human development can be found in Davies (2014) which you should read as well.
Also, the ways in which organ development is orchestrated by regulatory genes helps
you again see the larger picture in which many modules of computation interact. You
should check out A. Schmidt-Rhaesa’s book on how organ systems evolved as well as
J. Davies’s treatment of how an organism determines its shape in Davies (2005).
The intricate ways that genes work together are hard to model and we must
make many abstractions and approximations to make progress. G. Wagner has done
a terrific job of explaining some of the key ideas in evolutionary innovation in
Wagner (2014) and J. Archibald shows how symbiosis is a key element in evolu-
tion in Archibald (2014).
Processing books like these will help hone your skills and acquaint you with lots
of material outside of your usual domain. This is a good thing! So keep reading and
we hope you join us in the next volumes!
All of these readings have helped us to see the big picture!
References
J. Allen, The Lives of the Brain: Human Evolution and the Organ of Mind (The Belknap Press of
Harvard University Press, Cambridge, 2009)
J. Allman, Evolving Brains (Scientific American Library, New York, 2000)
J. Archibald, One Plus One Equals One (Oxford University Press, Oxford, 2014)
D. Barash, Homo Mysterious: Evolutionary Puzzles of Human Nature (Oxford University Press,
Oxford, 2012)
E. Davidson, Genomic Regulatory Systems: Development and Evolution (Academic Press, San
Diego, 2001)
E. Davidson, The Regulatory Genome: Gene Regulatory Networks in Development and Evolution
(Academic Press Elsevier, Burlington, 2006)
J. Davies, Mechanisms of Morphogenesis (Academic Press Elsevier, Boston, 2005)
J. Davies, Life Unfolding: How the Human Body Creates Itself (Oxford University Press, Oxford,
2014)
C. Fox, Cell of Cells: The Global Race To Capture and Control the Stem Cell (W. H. Norton and
Company, New York, 2007)
J. Gurchie, Shaping Humanity: How Science, Art, and Imagination Help Us Understand Our Origins
(Yale University Press, New Haven, 2013)
O. Harman, M. Dietrich, Outsider Scientists: Routes to Innovation in Biology (University of Chicago
Press, Chicago, 2013)
R. Jordan-Young, Brainstorm: The Flaws in the Science of Sex Differences (Harvard University
Press, Cambridge, 2010)
J. Kaas, T. Bullock, (eds). Evolution of Nervous Systems: A Comprehensive Reference Editor J.
Kaas (Volume 1: Theories, Development, Invertebrates). (Academic Press Elsevier, Amsterdam,
2007a)
J. Kaas, T. Bullock, (eds). Evolution of Nervous Systems: A Comprehensive Reference Editor J.
Kaas (Volume 2: Non-Mammalian Vertebrates). (Academic Press Elsevier, Amsterdam, 2007b)
J. Kaas, L. Krubitzer, (eds). Evolution of Nervous Systems: A Comprehensive Reference Editor J.
Kaas (Volume 3: Mammals). (Academic Press Elsevier, Amsterdam, 2007)
J. Kaas, T. Preuss, (eds). Evolution of Nervous Systems: A Comprehensive Reference Editor J. Kaas
(Volume 4: Primates). (Academic Press Elsevier, Amsterdam, 2007)
M. Lynch, The Origins of Genome Architecture (Sinauer Associates, Inc., Sunderland, 2007)
A. Martin, Dinosaurs Without Bones: Dinosaur Lives Revealed By Their Trace Fossils (Pegasus
Books, New York, 2014)
S. Murray Sherman, R. Guillery, Exploring The Thalamus and Its Role in Cortical Function (The
MIT Press, Cambridge, 2006)
R. Olby, Francis Crick: Hunter of Life’s Secrets (Cold Spring Harbor Laboratory Press, New York,
2009)
S. Pääbo, Neanderthal Man: In Search of Lost Genomes (Basic Books, New York, 2014)
J. Peterson, Calculus for Cognitive Scientists: Derivatives, Integration and Modeling, Springer
Series on Cognitive Science and Technology (Springer Science+Business Media Singapore Pte
Ltd, Singapore, 2015a in press)
J. Peterson, Calculus for Cognitive Scientists: Higher Order Models and Their Analysis, Springer
Series on Cognitive Science and Technology (Springer Science+Business Media Singapore Pte
Ltd., Singapore, 2015b in press)
J. Peterson, Calculus for Cognitive Scientists: Partial Differential Equation Models, Springer Series
on Cognitive Science and Technology (Springer Science+Business Media Singapore Pte Ltd.,
Singapore, 2015c in press)
J. Peterson, BioInformation Processing: A Primer On Computational Cognitive Science, Springer
Series on Cognitive Science and Technology (Springer Science+Business Media Singapore Pte
Ltd., Singapore, 2015d in press)
M. Ridley, The Red Queen: Sex and the Evolution of Human Nature (Harper Perennial, New York,
1993)
U. Segerstrale, Nature’s Oracle: The Life and Work of W.D. Hamilton (Oxford University Press,
Oxford, 2013)
S. Snyder, Drugs and the Brain (Scientific American Library, New York, 1996)
J. Stiles, The Fundamentals of Brain Development (Harvard University Press, Cambridge, 2008)
G. Streidter, Principles of Brain Evolution (Sinauer Associates, Inc., Sunderland, 2005)
K. Taylor, The Brain Supremacy: Notes from the frontiers of neuroscience (Oxford University Press,
Oxford, 2012)
G. Wagner, Homology, Genes and Evolutionary Innovation (Princeton University Press, Oxford,
2014)
Glossary
what is called selective advantage. This means that the size of the population
does matter. Using these assumptions, we therefore model û 2 and û 3 as
û 2 = N u 2
and
û 3 = N u 3 .
where u_2 and u_3 are neutral rates. The mathematical model is then set up as follows. Let
X_0(t) be the probability a cell is in cell type A+/+ at time t,
X_1(t) be the probability a cell is in cell type A+/− at time t,
X_2(t) be the probability a cell is in cell type A−/− at time t,
Y_0(t) be the probability a cell is in cell type A+/+ CIN at time t,
Y_1(t) be the probability a cell is in cell type A+/− CIN at time t, and
Y_2(t) be the probability a cell is in cell type A−/− CIN at time t.
We can then derive rate equations to be
X_0' = −(u_1 + u_c) X_0
X_1' = u_1 X_0 − (u_c + N u_2) X_1
X_2' = N u_2 X_1 − u_c X_2
Y_0' = u_c X_0 − u_1 Y_0
Y_1' = u_c X_1 + u_1 Y_0 − N u_3 Y_1
Y_2' = N u_3 Y_1 + u_c X_2
We are interested in analyzing this model over a typical human life span of 100
years, p. 406.
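Because this system is linear in the six probabilities, it is straightforward to integrate numerically. Here is a minimal MATLAB sketch; the rate constants and the effective population size N below are illustrative placeholders rather than the values used in the text's discussion of this model.

% Linear CIN model with state z = [X0 X1 X2 Y0 Y1 Y2]'
u1 = 1e-4; uc = 1e-4; u2 = 1e-4; u3 = 1e-2; N = 1000;   % illustrative yearly rates
A = [ -(u1+uc)        0           0     0       0    0;
        u1      -(uc+N*u2)        0     0       0    0;
        0            N*u2       -uc     0       0    0;
        uc            0           0   -u1       0    0;
        0            uc           0    u1   -N*u3    0;
        0             0          uc     0    N*u3    0 ];
rhs = @(t, z) A*z;
z0 = [1 0 0 0 0 0]';                    % start as a normal A+/+ cell
[t, z] = ode45(rhs, [0 100], z0);       % 100 year horizon
plot(t, z(:,3), t, z(:,6)), legend('X_2', 'Y_2'), xlabel('years')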
Characteristic Equation of a second order ODE For the linear second order differential
equation a u''(t) + b u'(t) + c u(t) = 0, assume that e^{rt} is a solution and try to find
what values of r might work. We see for u(t) = e^{rt}, we find
0 = a r² + b r + c.
The roots of the quadratic equation above are the only values of r that will work
as the solution e^{rt}. We call this quadratic equation the characteristic equation for
this differential equation, p. 152.
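As a quick illustration, MATLAB's roots command returns the two values of r directly; the coefficients below are illustrative.

% Characteristic equation a*r^2 + b*r + c = 0 for a*u'' + b*u' + c*u = 0
a = 1; b = 4; c = 13;          % illustrative coefficients
r = roots([a b c])             % gives -2 + 3i and -2 - 3i here
% so u(t) = exp(-2t)*(c1*cos(3t) + c2*sin(3t)) solves u'' + 4u' + 13u = 0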
Characteristic Equation of the linear system For a linear system
\[ \begin{pmatrix} x'(t) \\ y'(t) \end{pmatrix} = A \begin{pmatrix} x(t) \\ y(t) \end{pmatrix}, \qquad \begin{pmatrix} x(0) \\ y(0) \end{pmatrix} = \begin{pmatrix} x_0 \\ y_0 \end{pmatrix}, \]
assume the solution is V e^{rt} for some nonzero vector V. This implies that r and
V must satisfy
\[ \bigl(r I - A\bigr) V e^{rt} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}. \]
The only way we can get nonzero vectors V as solutions is to choose the values
of r so that
\[ \det\bigl(r I - A\bigr) = 0. \]
The second order polynomial we obtain is called the characteristic equation as-
sociated with this linear system. Its roots are called the eigenvalues of the system
and any nonzero vector V associated with an eigenvalue r is an eigenvector for
r, p. 178.
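A short MATLAB check of this recipe: build A, form the characteristic polynomial det(rI − A), and compare its roots with the output of eig. The matrix below is an illustrative example, not one taken from the text.

% Eigenvalues and eigenvectors of a 2x2 linear system x' = A x
A = [ 1  2;
     -3 -4 ];                    % illustrative coefficient matrix
p = poly(A);                     % coefficients of det(r*I - A)
r = roots(p)                     % eigenvalues from the characteristic equation: -1 and -2
[V, D] = eig(A);                 % columns of V are eigenvectors, diag(D) the eigenvalues
% each pair gives a solution V(:,k)*exp(D(k,k)*t) of the linear system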
Complex number This is a number which has the form a + b i where a and
b are arbitrary real numbers and the letter i represents a very abstract concept: a
number whose square is i² = −1! We usually draw a complex number in a standard
x–y Cartesian plane with the y axis labeled as i y instead of the usual y. Then
the number 5 + 4 i would be graphed just like the two dimensional coordinate
(5, 4), p. 141.
Continuity A function f is continuous at a point p if for all positive tolerances
ε, there is a positive δ so that |f(t) − f(p)| < ε if t is in the domain of f and
|t − p| < δ. You should note continuity is something that is only defined at a
point and so functions in general can have very few points of continuity. Another
way of defining the continuity of f at the point p is to say the limit lim_{t→p} f(t) exists
and equals f(p), p. 129.
Differentiability A function f is differentiable at a point p if there is a number L
so that for all positive tolerances ε, there is a positive δ so that
\[ \left| \frac{f(t) - f(p)}{t - p} - L \right| < \epsilon \quad \text{if } t \text{ is in the domain of } f \text{ and } |t - p| < \delta. \]
You should note differentiability is something that is only defined at a point and
so functions in general can have very few points of differentiability. Another way
of defining the differentiability of f at the point p is to say the limit lim_{t→p} (f(t) − f(p))/(t − p)
exists. At each point p where this limit exists, we can define a new function called
the derivative of f at p. This is usually denoted by f'(p) or df/dt(p), p. 129.
Exponential growth Some biological systems can be modeled using the idea of
exponential growth. This means the rate of change of the variable of interest, x, is
proportional to its current value. Mathematically, this means x' = r x for some
proportionality constant r, p. 149.
Half life The amount of time it takes a substance x to lose half its original value
under exponential decay. It is denoted by t_{1/2} and can also be expressed as t_{1/2} =
ln(2)/r where r is the decay rate in the differential equation x'(t) = −r x(t),
p. 149.
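A tiny MATLAB check of this formula with an illustrative decay rate:

% Exponential decay x'(t) = -r*x(t), so x(t) = x0*exp(-r*t)
r = 0.3; x0 = 10;              % illustrative decay rate and initial amount
thalf = log(2)/r;              % half life t_{1/2} = ln(2)/r
x = @(t) x0*exp(-r*t);
x(thalf)                       % returns 5, half the original value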
Infectious Disease Model Assume the total population we are studying is fixed
at N individuals. This population is then divided into three separate pieces: we
have individuals
• that are susceptible to becoming infected are called Susceptible and are labeled
by the variable S. Hence, S(t) is the number that are capable of becoming
infected at time t.
• that can infect others. They are called Infectious and the number that are in-
fectious at time t is given by I (t).
• that have been removed from the general population. These are called Removed
and their number at time t is labeled by R(t).
We make a number of key assumptions about how these population pools interact.
• Individuals stop being infectious at a positive rate γ which is proportional to the
number of individuals that are in the infectious pool. If an individual stops being
infectious, this means this individual has been removed from the population.
This could mean they have died, the infection has progressed to the point where
they can no longer pass the infection on to others or they have been put into
quarantine in a hospital so that further interactions with the general population
are not possible. In all of these cases, these individuals are not infectious or can’t
cause infections and so they have been removed from the part of the population N
which can be infected or is susceptible. Mathematically, this means we assume
I' = −γ I.
• Susceptible individuals become infected through contact with infectious individuals
at a rate proportional to the product of the two pools; with proportionality constant r,
this produces new infections at the rate
r S I.
We can then figure out the net rates of change of the three populations. The
infectious population gains at the rate r S I and loses at the rate γ I. Hence
I' = r S I − γ I.
The net change of the Susceptibles is that of simple decay. Susceptibles are lost at
the rate r S I. Thus, we have
S' = −r S I.
Finally, the removed population increases at the same rate the infectious population
decreases. We have
R' = γ I.
We also know that R(t) + S(t) + I(t) = N for all time t because our population
is constant. So only two of the three variables here are independent. We typically
focus on the variables I and S for that reason. Our complete Infectious Disease
Model is then
I' = r S I − γ I
S' = −r S I
I(0) = I_0
S(0) = S_0.
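A minimal MATLAB sketch of this model using ode45; the contact rate r, the removal rate gamma, and the initial pools are illustrative numbers only.

% SIR model: I' = r*S*I - gamma*I, S' = -r*S*I, with R = N - S - I
r = 0.0005; gamma = 0.1;                      % illustrative rates
N = 1000; I0 = 1; S0 = N - I0;                % illustrative population split
rhs = @(t, z) [ r*z(2)*z(1) - gamma*z(1);     % z(1) = I
               -r*z(2)*z(1) ];                % z(2) = S
[t, z] = ode45(rhs, [0 100], [I0; S0]);
R = N - z(:,1) - z(:,2);                      % removed pool from the conservation law
plot(t, z(:,1), t, z(:,2), t, R), legend('I', 'S', 'R'), xlabel('t')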
of insulin plus the others and we let H denote the net hormone concentration. At
normal conditions, call this concentration H0 . Under close to normal conditions,
the interaction of the one hormone insulin with blood glucose completely dom-
inates the net hormonal activity; so normal blood sugar levels primarily depend
on insulin–glucose interactions. Hence, if insulin increases from normal levels, it
increases net hormonal concentration to H0 + ΔH and decreases glucose blood
concentration. On the other hand, if other hormones such as cortisol increased
from base levels, this will make blood glucose levels go up. Since insulin domi-
nates all activity at normal conditions, we can think of this increase in cortisol as
a decrease in insulin with a resulting rise in blood glucose levels. A decrease in
insulin from normal levels corresponds to a drop in net hormone concentration to
H0 − ΔH. Now let G denote blood glucose level. Hence, in our model an increase
in H means a drop in G and a decrease in H means an increase in G! Note our
lumping of all the hormone activity into a single net activity is very much like how
we modeled food fish and predator fish in the predator–prey model. We describe
the model as
G'(t) = F1(G, H) + J(t)
H'(t) = F2(G, H)
where the function J is the external rate at which blood glucose concentration is
being increased in a glucose tolerance test. There are two nonlinear interaction
functions F1 and F2 because we know G and H have complicated interactions.
Let’s assume G and H have achieved optimal values G0 and H0 by the time the
fasting patient has arrived at the hospital. Hence, we don’t expect to have any
contribution to G'(0) and H'(0); i.e. F1(G0, H0) = 0 and F2(G0, H0) = 0. We
are interested in the deviation of G and H from their optimal values G0 and H0,
so let g = G − G0 and h = H − H0. We can then write G = G0 + g and
H = H0 + h. The model can then be rewritten as
g'(t) = F1(G0 + g, H0 + h) + J(t)
h'(t) = F2(G0 + g, H0 + h)
and we can then approximate these dynamics using tangent plane approximations
to F1 and F2 giving
\[ g'(t) \approx \frac{\partial F_1}{\partial g}(G_0, H_0)\, g + \frac{\partial F_1}{\partial h}(G_0, H_0)\, h + J(t) \]
\[ h'(t) \approx \frac{\partial F_2}{\partial g}(G_0, H_0)\, g + \frac{\partial F_2}{\partial h}(G_0, H_0)\, h \]
It is this linearized system of equations we can analyze to give some insight into
how to interpret the results of a glucose tolerance test, p. 475.
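Once numerical values for the four partial derivatives are in hand, the linearized model is just a two dimensional linear system whose eigenvalues we can inspect. In the MATLAB sketch below the partial derivative values are made up for illustration; they stand in for values a glucose tolerance test would let us estimate.

% Linearized glucose-hormone model: [g'; h'] is approximately J*[g; h] plus the input
a = -2; b = -4;                 % illustrative values of dF1/dg and dF1/dh at (G0, H0)
c =  1; d = -1;                 % illustrative values of dF2/dg and dF2/dh at (G0, H0)
J = [a b; c d];
lambda = eig(J)                 % complex eigenvalues here, so g and h oscillate and decay
omega0 = sqrt(det(J));          % undamped natural frequency, omega0^2 = det(J)
% the period 2*pi/omega0 of the glucose deviation g(t) is often used as a
% diagnostic quantity when interpreting a glucose tolerance test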
Linear Second Order Differential Equations These have the general form
a x''(t) + b x'(t) + c x(t) = 0
for constants a, b and c. Such an equation can be converted into a matrix–vector
system; we typically call the coefficient matrix A so that the system is, p. 171,
\[ \begin{pmatrix} x'(t) \\ y'(t) \end{pmatrix} = A \begin{pmatrix} x(t) \\ y(t) \end{pmatrix}, \qquad \begin{pmatrix} x(0) \\ y(0) \end{pmatrix} = \begin{pmatrix} x_0 \\ y_0 \end{pmatrix}. \]
Predator–Prey Model We make assumptions about how the food population x and
the predator population y change.
1. The food population grows exponentially. Letting x_g denote the growth rate of
the food, we have
x_g' = a x.
2. The food population declines due to interactions with the predators; letting x_d
denote this decay rate, we have
x_d' = −b x y.
Combining, the net rate of change of the food is
x' = a x − b x y
for some positive constants a and b. We then make assumptions about the predators
as well.
1. Predators naturally die following an exponential decay; letting this decay rate
be given by y_d, we have
y_d' = −c y.
2. Predators grow due to interactions with the food; letting this growth rate be y_g,
we have
y_g' = d x y.
Combining, the net rate of change of the predators is
y' = −c y + d x y
for some positive constants c and d. The full model is thus, p. 292.
x' = a x − b x y
y' = −c y + d x y
x(0) = x_0
y(0) = y_0.
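A minimal MATLAB sketch of this model; the constants a, b, c, d and the starting populations are illustrative values, not ones from the worked examples in the text.

% Predator-Prey model: x' = a*x - b*x*y (food), y' = -c*y + d*x*y (predators)
a = 1.1; b = 0.4; c = 0.4; d = 0.1;        % illustrative positive constants
rhs = @(t, z) [ a*z(1) - b*z(1)*z(2);
               -c*z(2) + d*z(1)*z(2) ];
[t, z] = ode45(rhs, [0 50], [10; 5]);      % x(0) = 10, y(0) = 5
plot(t, z(:,1), t, z(:,2)), legend('food x', 'predators y'), xlabel('t')
% the populations cycle: a peak in the food is followed by a peak in the predators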
Predator–Prey Self Interaction The original Predator–Prey model does not in-
clude self-interaction terms. These are terms that model how the food and predator
populations interact with themselves. We can model these effects by assuming
their magnitude is proportional to the interaction. Mathematically, we assume
these are both decay terms giving us
x_self' = −e x²
y_self' = −f y²
for positive constants e and f. We are thus led to the new self-interaction model
given below, p. 355:
x' = a x − b x y − e x²
y' = −c y + d x y − f y².
• proteins are destroyed by other proteins in the cell. Call this rate of destruction
α_des.
• the concentration of protein in the cell goes down because the cell grows and
therefore its volume increases. Protein is usually measured as a concentration
and the concentration goes down as the volume goes up. Call this rate α_dil; the
dil is for dilution.
The net or total loss of protein is called α and hence
α = α_des + α_dil.
The net rate of change of the protein concentration is then our familiar model
dY*/dt = β − α Y*
where β is the constant growth term and α Y* is the loss term.
We usually do not make a distinction between the gene Y and its transcribed
protein Y*. We usually treat the letters Y and Y* as the same even though it is not
completely correct. Hence, we just write as our model
Y' = β − α Y
Y(0) = Y_0
and then solve it using the integrating factor method even though, strictly speaking,
Y is the gene!, p. 149.
Response time For the model
y'(t) = −α y(t) + β
y(0) = y_0
the time it takes the solution to go from its initial concentration y_0 to a value
halfway between the initial amount and the steady state value is called the re-
sponse time. It is denoted by t_r and t_r = ln(2)/α so it is functionally the same as
the half life in an exponential decay model, p. 149.
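We can confirm the response time claim directly from the exact solution; the rates below are illustrative.

% y' = -alpha*y + beta, y(0) = y0; the steady state is beta/alpha
alpha = 2; beta = 6; y0 = 0;                              % illustrative values
y  = @(t) beta/alpha + (y0 - beta/alpha)*exp(-alpha*t);   % exact solution
tr = log(2)/alpha;                                        % claimed response time
y(tr)                          % returns 1.5, halfway between y0 = 0 and beta/alpha = 3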
The collection of points from the interval [a, b] is called a Partition of [a, b] and is
denoted by some letter—here we will use the letter P. So if we say P is a partition
of [a, b], we know it will have n + 1 points in it, they will be labeled from t0 to
tn and they will be ordered left to right with strict inequalities. But, we will not
know what value the positive integer n actually is. The simplest Partition P is the
two point partition {a, b}. Note these things also: given a partition P = {t_0, t_1, . . . , t_n}
and an evaluation set E = {s_0, s_1, . . . , s_{n−1}} with t_i ≤ s_i ≤ t_{i+1}, we can form the
Riemann sum
\[ S(f, P, E) = \sum_{i=0}^{n-1} f(s_i)\,(t_{i+1} - t_i) \]
and we just remember that the choice of P will determine the size of n. Each
partition P has a maximum subinterval length—let’s use the symbol || P || to
denote this length. We read the symbol || P || as the norm of P. Each par-
tition P and evaluation set E determines the number S( f, P, E) by a simple
calculation. So if we took a collection of partitions P1 , P2 and so on with associ-
ated evaluation sets E 1 , E 2 etc., we would construct a sequence of real numbers
{S( f, P1 , E 1 ), S( f, P2 , E 2 ), . . . , S( f, Pn , E n ), . . .}. Let’s assume the norm of the
partition P_n gets smaller all the time; i.e. lim_{n→∞} ||P_n|| = 0. We could then
ask if this sequence of numbers converges to something. What if the sequence
of Riemann sums we construct above converged to the same number I no matter
what sequence of partitions whose norm goes to zero and associated evaluation
sets we chose? Then, we would have that the value of this limit is independent of
the choices above. This is what we mean by the Riemann Integral of f on the
interval [a, b]. If there is a number I so that
\[ \lim_{n \to \infty} S(f, P_n, E_n) = I \]
for every such choice of partitions and evaluation sets, then I is the Riemann Integral
of f on [a, b], which we denote by ∫_a^b f(t) dt.
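A short MATLAB sketch of this limiting process: build uniform partitions with more and more points, use the midpoints as the evaluation sets, and watch the Riemann sums settle down. The integrand is an illustrative choice.

% Riemann sums S(f,P,E) for f on [a,b] with uniform partitions and midpoint evaluation sets
f = @(t) t.^2;  a = 0;  b = 2;           % illustrative integrand; the exact integral is 8/3
for n = [10 100 1000 10000]
  t = linspace(a, b, n+1);               % partition P with n subintervals
  s = (t(1:end-1) + t(2:end))/2;         % evaluation set E: the midpoints
  S = sum(f(s) .* diff(t));              % the Riemann sum
  fprintf('n = %6d   S = %.6f\n', n, S);
end
% the sums approach 2.666667 as ||P|| goes to zero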
Separation of Variables Consider the cable equation model
\[ \beta^2 \frac{\partial^2 \Phi}{\partial x^2} - \Phi - \alpha \frac{\partial \Phi}{\partial t} = 0, \quad 0 \le x \le L, \; t \ge 0, \]
\[ \frac{\partial \Phi}{\partial x}(0, t) = 0, \qquad \frac{\partial \Phi}{\partial x}(L, t) = 0, \qquad \Phi(x, 0) = f(x). \]
The domain is the usual half infinite [0, L] × [0, ∞) where the spatial part of
the domain corresponds to the length of the dendritic cable in an excitable nerve
cell. We won’t worry too much about the details of where this model comes from
as we will discuss that in another volume. The boundary conditions u x (0, t) =
0 and u x (L , t) = 0 are called Neumann Boundary conditions. The conditions
u(0, t) = 0 and u(L , t) = 0 are known as Dirichlet Boundary conditions. One
way to find the solution is to assume we can separate the variables so that we can
write (x, t) = u(x)w(t). We assume a solution of the form (x, t) = u(x)w(t)
and compute the needed partials. This leads to a the new equation
d 2u dw
β2 w(t) − u(x)w(t) − αu(x) = 0.
dx2 dt
Rewriting, we find for all x and t, we must have
2
2d u dw
w(t) β − u(x) = αu(x) .
dx2 dt
This tells us
2
β 2 dd xu2 − u(x) α dw
= dt , 0 ≤ x ≤ L , t > 0.
u(x) w(t)
The only way this can be true is if both the left and right hand sides are equal
to a constant, usually called the separation constant Λ. This leads to the
decoupled equations
\[ \alpha \frac{dw}{dt} = \Lambda\, w(t), \quad t > 0, \]
\[ \beta^2 \frac{d^2 u}{dx^2} = (1 + \Lambda)\, u(x), \quad 0 \le x \le L, \]
\[ \frac{du}{dx}(0) = 0, \qquad \frac{du}{dx}(L) = 0. \]
This gives us a second order ODE to solve in x and a first order ODE to solve in
t. We have a lot of discussion about this in the text which you should study. In
general, we find there is an infinite family of solutions that solve these coupled
ODE models which we can label u_n(x) and w_n(t). Thus, any finite combination
\[ \Phi_N(x, t) = \sum_{n=0}^{N} a_n u_n(x) w_n(t) \]
will solve these ODE models, but we are still left with satisfying the last condition
that Φ(x, 0) = f(x). We do this by finding a series solution. We can show that the
data function f can be written as a series f(x) = \sum_{n=0}^{\infty} b_n u_n(x) for a set of
constants {b_0, b_1, . . .} and we can also show that the series
Φ(x, t) = \sum_{n=0}^{\infty} a_n u_n(x) w_n(t) solves the last boundary condition
\[ \Phi(x, 0) = \sum_{n=0}^{\infty} a_n u_n(x) w_n(0) = f(x) \]
as long as we choose a_n = b_n for all n. The idea of a series and the mathematical
machinery associated with that takes a while to explain, so Chap. 16 is devoted to
that, p. 495.
Tangent plane error We can characterize the error made when a function of
two variables is replaced by its tangent plane at a point better if we have ac-
cess to the second order partial derivatives of f. The value of f at the point
(x_0 + Δx, y_0 + Δy) can be expressed as follows:
\[ f(x_0 + \Delta x, y_0 + \Delta y) = f(x_0, y_0) + \frac{\partial f}{\partial x}(x_0, y_0)\,\Delta x + \frac{\partial f}{\partial y}(x_0, y_0)\,\Delta y + E(x_0, y_0, \Delta x, \Delta y), \]
where there is a c between 0 and 1 so that the tangent plane error is given by, p. 435,
\[ E(x_0, y_0, \Delta x, \Delta y) = \frac{1}{2} \begin{pmatrix} \Delta x \\ \Delta y \end{pmatrix}^{T} H(x_0 + c\,\Delta x,\; y_0 + c\,\Delta y) \begin{pmatrix} \Delta x \\ \Delta y \end{pmatrix}, \]
where H denotes the Hessian matrix of second order partial derivatives of f.
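Here is a small MATLAB check of this error characterization for an illustrative function of two variables: compare the true value of f near (x0, y0) with the tangent plane value and note that the error shrinks like the square of the step size, as the Hessian term predicts.

% Tangent plane approximation and its error for the illustrative f(x,y) = exp(x)*sin(y)
f  = @(x,y) exp(x).*sin(y);
fx = @(x,y) exp(x).*sin(y);   fy = @(x,y) exp(x).*cos(y);   % first order partials
x0 = 0.5; y0 = 1.0;
for h = [0.1 0.05 0.025]
  dx = h; dy = h;
  plane = f(x0,y0) + fx(x0,y0)*dx + fy(x0,y0)*dy;           % tangent plane value
  E = f(x0+dx, y0+dy) - plane;                              % tangent plane error
  fprintf('h = %.3f   error = %.2e   error/h^2 = %.3f\n', h, E, E/h^2);
end
% error/h^2 stays roughly constant, consistent with the second order error formula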