Contents

Front Matter
  Resources
  Preface
  Creative Commons
  Acknowledgements
  To The Student
  To the Instructor
1 Preliminary Topics
  1.1 What Is Numerical Analysis?
  1.2 Arithmetic in Base 2
  1.3 Floating Point Arithmetic
  1.4 Approximating Functions
  1.5 Approximation Error with Taylor Series
  1.6 Exercises
2 Algebra
  2.1 Intro to Numerical Root Finding
  2.2 The Bisection Method
  2.3 The Regula Falsi Method
  2.4 Newton's Method
  2.5 The Secant Method
  2.6 Exercises
  2.7 Projects
3 Calculus
  3.1 Intro to Numerical Calculus
  3.2 Differentiation
  3.3 Integration
  3.4 Optimization
  3.5 Calculus with numpy and scipy
  3.6 Least Squares Curve Fitting
  3.7 Exercises
Resources
• HTML Version of this book: https://NumericalMethodsSullivan.github.io
• PDF Version of this book: https://github.com/NumericalMethodsSullivan/NumericalMethodsSullivan.github.io/blob/master/_main.pdf
• Print On Demand Version: Available on Amazon (ISBN 9798687369954)
• Complete Instructor's Solutions: available to verified instructors
• Google Colab:
  – Welcome notebook: https://colab.research.google.com/
  – Introduction video: https://www.youtube.com/watch?v=inN8seMm7UI
• Jupyter Notebooks: https://jupyter.org/
• YouTube Playlist for Python How To: https://www.youtube.com/playlist?list=PLftKiHShKwSO4Lr8BwrlKU_fUeRniS821
Preface
This book grew out of lecture notes, classroom activities, code, examples, exer-
cises, projects, and challenge problems for my introductory course on numerical
methods. The prerequisites for this material include a firm understanding of
single variable calculus (though multivariable calculus doesn’t hurt), a good
understanding of the basics of linear algebra, a good understanding of the basics
of differential equations, and some exposure to scientific computing (as seen
in other math classes or perhaps from a computer science class). The primary
audience is any undergraduate STEM major with an interest in using computing
to solve problems.
A note on the book’s title: I do not call these materials “numerical analysis” even
though that is often what this course is called. In these materials I emphasize
“methods” and implementation over rigorous mathematical “analysis.” While
this may just be semantics, I feel that it is important to point out. If you are
looking for a book that contains all of the derivations and rigorous proofs of the
primary results in elementary numerical analysis, then this is not the book for you.
I have intentionally written this material with an inquiry-based emphasis which
means that this is not a traditional text on numerical analysis – there are plenty
of those on the market.
Creative Commons
©Eric Sullivan. Some Rights Reserved.
Acknowledgements
I would first like to thank Dr. Kelly Cline and Dr. Corban Harwood for being
brave enough to teach a course that they love out of a rough draft of my book.
Your time, suggested edits, and thoughts for future directions of the book were,
and are, greatly appreciated. Second, I would like to thank Johnanna for simply
being awesome and giving your full support along the way. Next I would like to
thank my students and colleagues, past, present, and future, for giving feedback
and support for this project.
To The Student
The Inquiry-Based Approach
Let’s start the book off right away with a problem designed for groups, discussion,
disagreement, and deep critical thinking. This problem is inspired by Dana
Ernst’s first day IBL activity titled: Setting the Stage.
Exercise 0.1.
• Get in groups of size 3-4.
• Group members should introduce themselves.
• For each of the questions that follow I will ask you to:
a. Think about a possible answer on your own
b. Discuss your answers with the rest of the group
c. Share a summary of each group's discussion
Questions:
Question #1: What are the goals of a university education?
Question #2: How does a person learn something new?
Question #3: What do you reasonably expect to remember from your courses
in 20 years?
Question #4: What is the value of making mistakes in the learning process?
Question #5: How do we create a safe environment where risk taking is encour-
aged and productive failure is valued?
You, as the student, will be expected to do the following throughout this course:
1. Fight! You will have to fight hard to work through this material. The fight
is exactly what we're after since it is ultimately what leads to innovative
thinking.
2. Screw Up! More accurately, don’t be afraid to screw up. You should write
code, work problems, and prove theorems then be completely unafraid to
scrap what you’ve done and redo it from scratch.
3. Collaborate! You should collaborate with your peers with the following
caveats:
a. When you are done collaborating you should go your separate ways.
When you write your solution you should have no written (or digital)
record of your collaboration.
b. Use of the internet to help solve these problems robs you of the most
important part of this class: the chance for original thought.
4. Enjoy! Part of the fun of IBL is that you get to experience what it is
like to think like a true mathematician / scientist. It takes hard work but
ultimately this should be fun!
To the Instructor
If you are an instructor wishing to use these materials then I only ask that you
adhere to the Creative Commons license. You are welcome to use, distribute,
and remix these materials for your own purposes. Thanks for considering my
materials for your course! Let me know if you have questions, edits, or suggestions:
esullivan at carroll dot edu. Furthermore, if you are interested in a full
collection of solutions to this book please contact me. I only ask that you don’t
share these solutions.
I have authored this version of the book using R-Bookdown [1] as the primary
authoring tool. This particular tool mixes the LaTeX typesetting language along
with the powerful Markdown language. It also allows for the Python code to be
embedded directly into the book so I can run the code, build the figures, and
generate output all in one place.
In a typical class period my students work in small groups, either at the boards in the classroom or in some other way where they can share their
work. Much of my class time is spent with students actively building algorithms
or group coding. The beauty, as I see it, of IBL is that you can run your course
in any way that is comfortable for you. You can lecture through some of the
material in a more traditional way, you can let the students completely discover
some of the methods, or you can do a mix of both.
You will find that I do not give rigorous (in the mathematical sense) proofs or
derivations of many of the algorithms in this book. I tend to lean on numerical
experiments to allow students to discover algorithms, error estimates, and other
results without the rigor. The makeup of my classes tends to be math majors
along with engineering, computer science, physics, and data science students.
The Projects
I have taught this class with anywhere from two to four projects during the
semester. Each of the projects is designed to give the students an open-ended
task where they can show off their coding skills and, more importantly, build
their mathematical communication skills. Projects can be done in groups or
individually depending on the background and group dynamics of your class.
Appendix B contains several tips for how to tackle the writing in the projects.
Coding
I expect that my students come with some coding experience from other math-
ematics or computer science classes. With that, I leave the coding help as an
appendix (see Appendix A) and only point the students there for refreshers. If
your students need a more thorough ramp up to the coding then you might want
to start the course with Appendix A to get the students up to speed. I expect
the students to do most of the coding in the class, but occasionally we will
code algorithms together (especially earlier in the semester when the students
are still getting their feet underneath them).
I encourage students to learn Python. It is a general purpose language that
does extremely well with numerical computing when paired with numpy and
matplotlib. Appendix A has several helpful sections for getting students up to
speed with Python.
I encourage you to consider having your students code in Jupyter Notebooks
or Google Colab. The advantage is that students can mix their writing and
their code in a seamless way. This allows for an iterative approach to coding
and writing and gives the students the tools to explain what they’re doing as
they code.
Pacing
The following is a typical 15-week semester with these materials.
Other Considerations:
Projects: I typically assign a project after Chapter 2 or 3, a second project
after Chapter 4, and a third project after Chapter 5. The fourth project,
if time allows, typically comes from Chapter 6. I typically dedicate two
class days to the first project and then one class day to each subsequent
project. For the final project I typically have students present their work
so this takes a day or two out of our class time.
Exercises: I typically assign one collection of exercises per week. Students are
to work on these outside of class, but in some cases it is worth taking
class time to let students work in teams. Of particular note are the coding
exercises in Chapter 1. If your students need practice with coding then it
might be worthwhile to mix these exercises in through several assignments
and perhaps during a few class periods. I have also taken extra class time
with the exercises in Chapter 5 to let the students work in pairs on the
modeling aspects of some of the problems.
Exams: This is a non-traditional book and as such you might want to consider
some non-traditional exam settings. Some ideas that my colleagues and I
have used are:
• Use code and functions that you’ve written to solve several new
problems during a class period.
• Give the mathematical details and the derivations of key algorithms.
• Take several problems home (under strict rules about collaboration)
and return with working code and a formal write up.
• No exams, but put heavier weight on the projects.
Chapter 1
Preliminary Topics
1.1 What Is Numerical Analysis?
The field of Numerical Analysis is really the study of how to take mathematical
problems and solve them efficiently and accurately on a computer. While
the field of numerical analysis is quite powerful and wide-reaching, there are
some mathematical problems where numerical analysis doesn’t make much sense
(e.g. finding an algebraic derivative of a function, proving a theorem, uncovering
a pattern in a sequence). However, for many problems a numerical method that
gives an approximate answer is both more efficient and more versatile than any
analytic technique. Let’s look at several examples. You can also watch a short
introduction video here: https://youtu.be/yH0zhca0hbs
Example from Algebra: Solve the equation ln(x) = sin(x) for x in the in-
terval x ∈ (0, π). Stop and think about all of the algebra that you ever
learned. You’ll quickly realize that there are no by-hand techniques that
can solve this problem! A numerical approximation, however, is not so
hard to come by.
Example from Calculus: Evaluate a definite integral of the function sin(x²).
Trying to use any of the techniques tied to the Fundamental Theorem of
Calculus, and hence finding an antiderivative of sin(x²), is completely
hopeless. Substitution, integration by parts, and all of the other techniques
that you know will all fail. Again, a numerical approximation is not so
difficult and is very fast! By the way, this integral (called the Fresnel Sine
Integral) actually shows up naturally in the fields of optics and
electromagnetism, so it is not just some arbitrary integral that I cooked up
just for fun.
Example from Differential Equations: Say we needed to solve the differential
equation dy/dt = sin(y²) + t. The nonlinear nature of the problem
precludes us from using most of the typical techniques (e.g. separation of
variables, undetermined coefficients, Laplace Transforms, etc). However,
computational methods that result in a plot of an approximate solution
can be made very quickly and likely give enough of a solution to be usable.
Example from Linear Algebra: You have probably never row reduced a
matrix larger than 3 × 3 or perhaps 4 × 4 by hand. Instead, you often turn
to technology to do the row reduction for you. You would be surprised to
find that the standard row reduction algorithm (RREF) that you do by
hand is not what a computer uses. Instead, there are efficient algorithms
to do the basic operations of linear algebra (e.g. Gaussian elimination,
matrix factorization, or eigenvalue decomposition).
In this chapter we will discuss some of the basic underlying ideas in Numerical
Analysis, and the essence of the above quote from Nick Trefethen will be part of
the focus of this chapter. Particularly, we need to know how a computer stores
numbers and when that storage can get us into trouble. On a more mathematical
side, we offer a brief review of the Taylor Series from Calculus at the end of this
chapter. The Taylor Series underpins many of our approximation methods in
this class. Finally, at the end of this chapter we provide several coding exercises
that will help you to develop your programming skills. It is expected that you
know some of the basics of programming before beginning this class. If you need
to review the basics then see Appendix A.
You’ll have more than just the basics by the end.
Let’s begin.
1.2 Arithmetic in Base 2
Exercise 1.1. Consider the sequence defined by x₀ = 1/10 and the rule
xₙ₊₁ = 2xₙ,        xₙ ∈ [0, 1/2]
xₙ₊₁ = 2xₙ − 1,    xₙ ∈ (1/2, 1]
By hand, compute the first several terms x₁, x₂, x₃, . . . of this sequence. What pattern do you see?
Exercise 1.2. Now use a spreadsheet to do the computations. Do you get
the same answers?
Exercise 1.3. Finally, solve this problem with Python. Some starter code is
given to you below.
x = 1.0/10
for n in range(50):
    if x <= 0.5:
        pass  # put the correct assignment here
    else:
        pass  # put the correct assignment here
    print(x)
Exercise 1.4. It seems like the computer has failed you! What do you think
happened on the computer and why did it give you a different answer? What, do
you suppose, is the cautionary tale hiding behind the scenes with this problem?
Exercise 1.5. Now what happens with this problem when you start with
x0 = 1/8? Why does this new initial condition work better?
A computer circuit knows two states: on and off. As such, anything saved in
computer memory is stored using base-2 numbers. This is called a binary number
system. To fully understand a binary number system it is worthwhile to pause
and reflect on our base-10 number system for a few moments.
What do the digits in the number “735” really mean? The position of each digit
tells us something particular about the magnitude of the overall number. The
number 735 can be represented as a sum of powers of 10 as
735 = 7 × 10² + 3 × 10¹ + 5 × 10⁰,
and we can read this number as 7 hundreds, 3 tens, and 5 ones. As you can see,
in a “positional number system” such as our base-10 system, the position of the
number indicates the power of the base, and the value of the digit itself tells you
the multiplier of that power. This is contrary to number systems like Roman
Numerals where the symbols themselves give us the number, and meaning of the
position is somewhat flexible. The number "48,329" can therefore be interpreted as
48,329 = 40,000 + 8,000 + 300 + 20 + 9 = 4 × 10⁴ + 8 × 10³ + 3 × 10² + 2 × 10¹ + 9 × 10⁰,
that is, four ten thousands, eight thousands, three hundreds, two tens, and nine ones.
Now let’s switch to the number system used by computers: the binary number
system. In a binary number system the base is 2 so the only allowable digits are
0 and 1 (just like in base-10 the allowable digits were 0 through 9). In binary
(base-2), the number "101,101" can be interpreted as
101,101₂ = 1 × 2⁵ + 0 × 2⁴ + 1 × 2³ + 1 × 2² + 0 × 2¹ + 1 × 2⁰
(where the subscript "2" indicates the base to the reader). If we put this back
into base 10, so that we can read it more comfortably, we get
101,101₂ = 32 + 8 + 4 + 1 = 45₁₀.
The reader should take note that the commas in the numbers are only to allow
for greater readability – we can easily see groups of three digits and mentally
keep track of what we’re reading.
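If you want to check a conversion without redoing the arithmetic, Python's built-in int and bin functions move between the two bases. This is just a quick sanity-check tool, not a replacement for the by-hand work in the exercises below.

# interpret the string '101101' as a base-2 number
print(int('101101', 2))   # prints 45

# go the other direction and write 45 in binary
print(bin(45))            # prints '0b101101'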
Exercise 1.7. Explain the joke: There are 10 types of people. Those who
understand binary and those who don’t.
Exercise 1.8. Discussion: With your group discuss how you would convert a
base-10 number into its binary representation. Once you have a proposed method
put it into action on the number 237₁₀ to show that the base-2 expression is
11,101,101₂.
Exercise 1.9. Convert the following numbers from base 10 to base 2 or vice
versa.
• Write 12₁₀ in binary
• Write 11₁₀ in binary
Exercise 1.10. Now that you have converted several base-10 numbers to base-2,
summarize an efficient technique to do the conversion.
137₁₀ = 128 + 8 + 1
      = 2⁷ + 2³ + 2⁰
      = 1 × 2⁷ + 0 × 2⁶ + 0 × 2⁵ + 0 × 2⁴ + 1 × 2³ + 0 × 2² + 0 × 2¹ + 1 × 2⁰
      = 10001001₂
Next we'll work with fractions and decimals. For example, let's take the base 10
number 5.341₁₀ and expand it out to get
5.341₁₀ = 5 + 3/10 + 4/100 + 1/1000 = 5 × 10⁰ + 3 × 10⁻¹ + 4 × 10⁻² + 1 × 10⁻³.
The position to the right of the decimal point is the negative power of 10 for the
given position. We can do a similar thing with binary decimals.
Exercise 1.11. Fill in the question marks in the expansion of the binary decimal below:
1,101.01₂ = ? × 2³ + 1 × 2² + 0 × 2¹ + ? × 2⁰ + 0 × 2^? + 1 × 2⁻².
Exercise 1.12. Repeating digits in binary numbers are rather intriguing. The
number 0.0111 = 0.01110111011101110111 . . . surely also has a decimal representation.
We want to know what this repeating binary expansion converges to in base 10. Work with your
partners to approximate the base-10 number.
Exercise 1.14. Convert the base 10 decimal 0.635 to binary using the following
steps.
a. Multiply 0.635 by 2. The whole number part of the result is the first binary
digit to the right of the decimal point.
b. Take the result of the previous multiplication and ignore the digit to the
left of the decimal point. Multiply the remaining decimal by 2. The whole
number part is the second binary decimal digit.
c. Repeat the previous step until you have nothing left, until a repeating
pattern has revealed itself, or until your precision is close enough.
Explain why each step gives the binary digit that it does.
Exercise 1.15. Based on your previous problem write an algorithm that will
convert base-10 decimals (less than 1) to base-2 decimal expansions.
Exercise 1.16. Convert the base 10 fraction 1/10 into binary. Use your solution
to fully describe what went wrong in Exercise 1.1.
1.3 Floating Point Arithmetic
Exercise 1.17. Let’s start the discussion with a very concrete example. Consider
the number x = −123.15625 (in base 10). As we’ve seen this number can be
converted into binary. Indeed
x = −123.15625₁₀ = −1111011.00101₂
a. Fill in the blanks to rewrite x in the normalized form
x = −1.____________ × 2^——
b. Based on the fact that every binary number, other than 0, can be written
in this way, what three things do you suppose a computer needs to store
for any given number?
c. Using your answer to part (b), what would a computer need to store for
the binary number x = 10001001.1100110011₂?
Every binary number, other than 0, can be written in the normalized form
x = (−1)^s × (1 + m) × 2^E
where s ∈ {0, 1} is called the sign bit and m is a binary number such that
0 ≤ m < 1.
For a number x = (−1)^s × (1 + m) × 2^E stored in a computer, the number m is
called the mantissa or the significand, s is known as the sign bit, and E is
known as the exponent.
Example 1.3. What are the mantissa, sign bit, and exponent for the numbers
7₁₀, −7₁₀, and (0.1)₁₀?
Solution:
• For the number 7₁₀ = 111₂ = 1.11 × 2² we have s = 0, m = 0.11, and
E = 2.
• For the number −7₁₀ = −1.11 × 2² we have s = 1, m = 0.11, and E = 2.
• For the number (0.1)₁₀ = 0.000110011001100...₂ = 1.10011001100... × 2⁻⁴ we
have s = 0, E = −4, and a mantissa m = 0.10011001100... that repeats forever.
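Python can show you the stored exponent and mantissa of any floating point number through the built-in float.hex() method, which prints the number as a hexadecimal mantissa times a power of 2. The snippet below is only a quick way to peek at the storage; notice that the exponents match the values of E found in the example above.

print((7.0).hex())    # '0x1.c000000000000p+2'  (1.11 in binary, times 2^2)
print((-7.0).hex())   # '-0x1.c000000000000p+2' (same mantissa, negative sign)
print((0.1).hex())    # '0x1.999999999999ap-4'  (rounded repeating mantissa, times 2^-4)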
In the last part of the previous example we saw that the number (0.1)₁₀ is
actually a repeating decimal in base-2. This means that in order to completely
represent the number (0.1)₁₀ in base-2 we need infinitely many decimal places.
Obviously that can’t happen since we are dealing with computers with finite
memory. Over the course of the past several decades there have been many
systems developed to properly store numbers. The IEEE standard that we now
use is the accumulated effort of many computer scientists, much trial and error,
and deep scientific research. We now have three standard precisions for storing
numbers on a computer: single, double, and extended precision. The double
precision standard is what most of our modern computers use.
Definition 1.1. There are three standard precisions for storing numbers in a
computer.
• A single-precision number consists of 32 bits, with 1 bit for the sign, 8
for the exponent, and 23 for the significand.
• A double-precision number consists of 64 bits with 1 bit for the sign, 11
for the exponent, and 52 for the significand.
• An extended-precision number consists of 80 bits, with 1 bit for the
sign, 15 for the exponent, and 64 for the significand.
Definition 1.2. Machine precision, ε, is the gap between the number 1 and the
next larger number that can be stored in a given floating point system.
For all practical purposes the computer cannot tell the difference between two
numbers if the difference is smaller than machine precision. This is of the utmost
importance when you want to check that something is "zero" since a computer
just cannot know the difference between 0 and ε.
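If you have numpy available, the np.finfo function reports the parameters of each floating point format directly. The following snippet is only meant as a way to explore (and later to check your answers to the exercises below); the exact numbers it prints come from your machine's implementation of the IEEE standard.

import numpy as np

for dtype in (np.float32, np.float64):
    info = np.finfo(dtype)              # floating point parameters for this type
    print(dtype.__name__)
    print("  total bits:", info.bits)
    print("  machine precision (eps):", info.eps)
    print("  largest number:", info.max)
    print("  smallest normal number:", info.tiny)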
Exercise 1.18. To make all of these ideas concrete let’s play with a small
computer system where each number is stored in the following format:
s | E | b1 b2 b3
The first entry is a bit for the sign (0 = + and 1 = −). The second entry, E, is
for the exponent, and we'll assume in this example that the exponent can be 0,
1, or −1. The three bits on the right represent the significand of the number.
Hence, every number in this number system takes the form
(−1)^s × (1 + 0.b₁b₂b₃) × 2^E
• What is the smallest positive number that can be represented in this form?
• What is the largest positive number that can be represented in this form?
Exercise 1.19. What are the largest and smallest numbers that can be stored
in single and double precision?
Exercise 1.20. What is machine precision for the single and double precision
standard?
Exercise 1.21. Explain the behavior of the sequence from the first problem in
these notes using what you know about how computers store numbers in double
precision.
xₙ₊₁ = 2xₙ,        xₙ ∈ [0, 1/2]
xₙ₊₁ = 2xₙ − 1,    xₙ ∈ (1/2, 1]
with x₀ = 1/10.
In particular, now that you know about how numbers are stored in a computer,
how long do you expect it to take until the truncation error creeps into the
computation?
Much more can be said about floating point numbers such as how we store
infinity, how we store NaN, and how we store 0. The Wikipedia page for floating
point arithmetic might be of interest for the curious reader. It is beyond the
scope of this class and this book to go into all of those details here. Instead, the
biggest takeaway points from this section and the previous are:
• All numbers in a computer are stored with finite precision.
• Nice numbers like 0.1 are sometimes not machine representable in binary.
• Machine precision is the gap between 1 and the next largest number that
can be stored.
• Computers cannot tell the difference between two numbers if the difference
is less than machine precision.
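These takeaways are easy to see for yourself in a couple of lines of Python. The snippet below is just a demonstration: it prints the digits that are actually stored for 0.1, and shows that adding something smaller than machine precision to 1 is invisible to the computer.

import numpy as np

# 0.1 is not exactly representable in binary, so the stored value
# differs from 0.1 after roughly 17 significant digits
print(format(0.1, '.20f'))

# adding half of machine precision to 1 is lost entirely
eps = np.finfo(np.float64).eps
print(1.0 + eps/2 == 1.0)   # True
print(1.0 + eps == 1.0)     # False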
1.4 Approximating Functions
Exercise 1.22. In this problem we’re going to make a bit of a wish list for all
of the things that a computer will do when approximating a function. We’re
going to complete the following sentence:
If we are going to approximate f (x) near the point x = x0 with a simpler function
g(x) then . . .
(I'll get us started with the first two things that seem natural to wish for. The
rest of the wish list is for you to complete.)
• the functions f (x) and g(x) should agree at x = x0 . In other words,
f (x0 ) = g(x0 )
• the function g(x) should only involve addition, subtraction, multiplication,
division, and integer exponents since computers are very good at those sorts
of operations.
• if f (x) is increasing / decreasing to the right of x = x0 then g(x) . . .
• if f (x) is increasing / decreasing to the left of x = x0 then g(x) . . .
• if f (x) is concave up / down to the right of x = x0 then g(x). . .
• if f (x) is concave up / down to the left of x = x0 then g(x) . . .
• if we zoom into plots of the functions f (x) and g(x) near x = x0 then . . .
• . . . is there anything else that you would add?
Exercise 1.24. Let's put some parts of the wish list into action. If f(x) is a
differentiable function at x = x₀ and if g(x) = A + B(x − x₀) + C(x − x₀)² +
D(x − x₀)³, then what should the constants A, B, C, and D be so that g(x)
matches the value of f(x) and its first three derivatives at x = x₀?
Exercise 1.25. Let f(x) = eˣ. Put the answers to the previous question into
action and build a cubic polynomial that approximates f(x) = eˣ near x₀ = 0.
In the previous 4 exercises you have built up some basic intuition for what we
would want out of a mathematical operation that might build an approximation
of a complicated function. What we’ve built is actually a way to get better and
better approximations for functions out to pretty much any arbitrary accuracy
that we like so long as we are near some anchor point (which we called x0 in the
previous exercises).
In the next several problems you'll unpack the approximations of f(x) = eˣ a
bit more carefully and we’ll wrap the whole discussion with a little bit of formal
mathematical language. Then we’ll examine other functions like sine, cosine,
logarithms, etc. One of the points of this whole discussion is to give you a little
glimpse as to what is happening behind the scenes in scientific programming
languages when you do computations with these functions. A bigger point is to
start getting a feel for how we might go in reverse and approximate an unknown
function out of much simpler parts. This last goal is one of the big takeaways from
numerical analysis: we can mathematically model highly complicated functions
out of fairly simple pieces.
Exercise 1.26. What is Euler’s number e? You likely remember using this
number often in Calculus and Differential Equations. Do you know the decimal
approximation for this number? Moreover, is there a way that we could
approximate something like √e = e^0.5 or e^(−1) without actually having access
to the full decimal expansion?
For all of the questions below let's work with the function f(x) = eˣ.
a. The function g(x) = 1 matches f(x) = eˣ exactly at the point x = 0 since
f(0) = e⁰ = 1. Furthermore if x is very very close to 0 then the functions
f(x) and g(x) are really close to each other. Hence we could say that
g(x) = 1 is an approximation of the function f(x) = eˣ for values of x very
very close to x = 0. Admittedly, though, it is probably pretty clear that
this is a horrible approximation for any x just a little bit away from x = 0.
b. Let's get a better approximation. What if we insist that our approximation
g(x) matches f(x) = eˣ exactly at x = 0 and ALSO has exactly the same
first derivative as f(x) at x = 0? Build the linear function g(x) that does this.
Figure 1.1: The first few polynomial approximations of the exponential function.
Exercise 1.27. Let's extend the idea from the previous problem to much better
approximations of the function f(x) = eˣ.
a. Let’s build a function g(x) that matches f (x) exactly at x = 0, has exactly
the same first derivative as f (x) at x = 0, AND has exactly the same
second derivative as f (x) at x = 0. To do this we’ll use a quadratic function.
For a quadratic approximation of a function we just take a slight extension
to the point-slope form of a line and use the equation
y = f(x₀) + f′(x₀)(x − x₀) + (f″(x₀)/2)(x − x₀)².
In this case we are using x0 = 0 so the quadratic approximation function
looks like
y = f(0) + f′(0)x + (f″(0)/2)x².
For reference, e ≈ 2.718281828459045.
Exercise 1.29. Use the functions that you've built to approximate √e = e^0.5.
Check the accuracy of your answer using np.exp(0.5) in Python.
What we’ve been exploring so far in this section is the Taylor Series of a
function.
Definition 1.3. (Taylor Series) If f(x) is an infinitely differentiable function
at the point x₀ then
f(x) = ∑_{k=0}^{∞} f⁽ᵏ⁾(x₀)/k! · (x − x₀)^k
     = f(x₀) + f′(x₀)(x − x₀) + f″(x₀)/2! (x − x₀)² + f‴(x₀)/3! (x − x₀)³ + ··· .
Don’t let the notation scare you. In a Taylor Series you are just saying: give me
a function that
• matches f (x) at x = x0 exactly,
• matches f 0 (x) at x = x0 exactly,
• matches f 00 (x) at x = x0 exactly,
• matches f 000 (x) at x = x0 exactly,
• etc.
(Take a moment and make sure that the summation notation makes sense to
you.)
Moreover, Taylor Series are built out of the easiest types of functions: polynomials.
Computers are rather good at doing computations with addition, subtraction,
multiplication, division, and integer exponents, so Taylor Series are a natural
way to express functions in a computer. The down side is that we can only
get true equality in the Taylor Series if we have infinitely many terms in the
series. A computer cannot do infinitely many computations. So, in practice,
we truncate Taylor Series after many terms and think of the new polynomial
function as being close enough to the actual function so far as we don’t stray
too far from the anchor x0 .
just a special case of a Taylor Series, so throughout this book we will refer to
both Taylor Series and Maclaurin Series simply as Taylor Series.
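To see the effect of truncation numerically, here is a small sketch (not one of the exercises) that sums the Taylor Series of eˣ centered at x₀ = 0 through the x^N term and compares the result to numpy's np.exp. The choice x = 0.5 and the values of N are arbitrary and only for illustration.

import numpy as np
from math import factorial

def truncated_exp(x, N):
    # sum the Taylor Series of e^x centered at 0 through the x^N term
    return sum(x**k / factorial(k) for k in range(N + 1))

x = 0.5
for N in [1, 2, 4, 8]:
    approx = truncated_exp(x, N)
    print(N, approx, abs(approx - np.exp(x)))   # the error shrinks as N grows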
Exercise 1.31. Verify from your previous work that the Taylor Series centered
at x₀ = 0 (i.e. the Maclaurin Series) for f(x) = eˣ is indeed
eˣ = 1 + x + x²/2 + x³/3! + x⁴/4! + x⁵/5! + ··· .
Exercise 1.32. Do all of the calculations to show that the Taylor Series centered
at x₀ = 0 for the function f(x) = sin(x) is indeed
sin(x) = x − x³/3! + x⁵/5! − x⁷/7! + ···
Exercise 1.33. Do all of the calculations to show that the Taylor Series centered
at x₀ = 0 for the function f(x) = cos(x) is indeed
cos(x) = 1 − x²/2! + x⁴/4! − x⁶/6! + ···
Exercise 1.34. Let’s compute a few Taylor Series that are not centered at
x0 = 0 (that is, Taylor Series that are not Maclaurin Series). For example, let’s
approximate the function f(x) = sin(x) near x₀ = π/2. Near the point x₀ = π/2,
the Taylor Series approximation will take the form
f(x) = f(π/2) + f′(π/2)(x − π/2) + f″(π/2)/2! (x − π/2)² + f‴(π/2)/3! (x − π/2)³ + ···
Write the first several terms of the Taylor Series for f (x) = sin(x) centered at
x0 = π2 . Then write Python code to build the plot below showing successive
approximations for f (x) = sin(x) centered at π/2.
Not every Taylor Series behaves well away from its center. For example, the
function f(x) = 1/(1 − x) has the Taylor Series centered at x₀ = 0 given by
f(x) = 1 + x + x² + x³ + x⁴ + x⁵ + ···
(you should stop now and verify this!). However, if we plot the function f (x)
along with several successive approximations for f (x) we find that beyond
x = 1 we don’t get the correct behavior of the function (see Figure 1.3). More
specifically, we cannot get the Taylor Series to change behavior across the
vertical asymptote of the function at x = 1. This example is meant to point out
the fact that a Taylor Series will only ever make sense near the point at which
you center the expansion. For the function f(x) = 1/(1 − x) centered at x₀ = 0 we
can only get good approximations within the interval x ∈ (−1, 1) and no further.
Figure 1.3: Several Taylor Series approximations of the function f (x) = 1/(1−x).
Every Taylor Series has a domain of convergence where the Taylor Series actually makes sense and gives
good approximations. While it is beyond the scope of this section to give all of the
details for finding the domain of convergence for a Taylor Series, a good heuristic
is to observe that a Taylor Series will only give reasonable approximations of a
function from the center of the series to the nearest asymptote. The domain of
convergence is typically symmetric about the center as well. For example, the
function f(x) = 1/(1 − x) has a vertical asymptote at x = 1, so a Taylor Series
centered at x₀ = 0 can only be expected to give good approximations on the
interval (−1, 1). A Taylor Series will give good approximations to the function
within the domain of convergence, but will give garbage outside of it. For more
details about the
domain of convergence of a Taylor Series you can refer to the Taylor Series
section of the online Active Calculus Textbook [2].
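The behavior shown in Figure 1.3 is easy to reproduce numerically. The sketch below evaluates a degree-10 partial sum of the series for f(x) = 1/(1 − x) at one point inside the domain of convergence and one point outside of it; the particular points 0.5 and 1.5 are chosen only for illustration.

def partial_sum(x, N):
    # partial sum 1 + x + x^2 + ... + x^N of the series for 1/(1-x)
    return sum(x**k for k in range(N + 1))

for x in [0.5, 1.5]:
    exact = 1.0 / (1.0 - x)
    print(x, exact, partial_sum(x, 10))   # close at x = 0.5, garbage at x = 1.5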
1.5 Approximation Error with Taylor Series
0th Order Approximation of f(x) = eˣ: Consider the Taylor Series
eˣ = 1 + x + x²/2! + x³/3! + x⁴/4! + ··· ,
where the leading 1 is the 0th order approximation and everything after it is
the remainder. For values of x very close to x₀ = 0 the largest term in the
remainder is x, so we can write
0th order approximation: eˣ ≈ 1 + O(x),
where O(x) (read "Big-O of x") tells us that the expected error for approx-
imations close to x₀ = 0 is about the same size as x.
1st Order Approximation of f(x) = eˣ: If we want to use a first-order (lin-
ear) approximation of f(x) = eˣ then we gather the 0th order and 1st order
terms and write
1st order approximation: eˣ ≈ 1 + x + O(x²).
2nd Order Approximation of f(x) = eˣ: Gathering the terms up through the
quadratic term, we would approximate eˣ as eˣ ≈ 1 + x + x²/2 for values of x that
are close to x₀ = 0. Furthermore, for values of x very close to x₀ = 0 the
largest term in the remainder is the x³ term. Using Big-O notation we can
write the approximation as
2nd order approximation: eˣ ≈ 1 + x + x²/2 + O(x³).
Again notice that we don't explicitly say what the coefficient is for the
x³ term. Instead we are just saying that using the quadratic function
y = 1 + x + x²/2 to approximate eˣ for values of x near x₀ = 0 will result in
errors that are proportional to x³.
For the function f(x) = eˣ the idea of estimating the amount of approximation
error made by truncating the Taylor Series is relatively straightforward: if we
want an nth order polynomial approximation of eˣ near x₀ = 0 then
eˣ = 1 + x + x²/2! + x³/3! + x⁴/4! + ··· + xⁿ/n! + O(x^(n+1)),
meaning that we expect the error to be proportional to x^(n+1).
Keep in mind that this sort of analysis is only good for values of x that are very
close to the center of the Taylor Series. If you are making approximations that
are too far away then all bets are off.
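The Big-O statement can be checked with a quick numerical experiment. In the sketch below (just an illustration) we use the first order approximation eˣ ≈ 1 + x and watch the error shrink by a factor of about 100 every time x shrinks by a factor of 10 – exactly the behavior an O(x²) error should have.

import numpy as np

for x in [0.1, 0.01, 0.001]:
    error = abs(np.exp(x) - (1 + x))   # error in the 1st order approximation
    print(x, error)                    # roughly x**2 / 2 each time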
Exercise 1.38. Let's make the previous discussion a bit more concrete. We
know the Taylor Series for f(x) = eˣ quite well at this point so let's use it to
approximate the value of f(0.1) = e^0.1 with different order polynomials. Notice
that x = 0.1 is pretty close to the center of the Taylor Series x₀ = 0 so this sort
of approximation is reasonable.
Using np.exp(0.1) we have Python's approximation e^0.1 ≈ np.exp(0.1) =
1.1051709181.
Fill in the blanks in the table.

Taylor Series   Approximation   Absolute Error                   Expected Error
0th Order       1               |e^0.1 − 1| = 0.1051709181       O(x) = 0.1
1st Order       1.1             |e^0.1 − 1.1| = 0.0051709181     O(x²) = 0.1² = 0.01
2nd Order       1.105           ______                           ______
3rd Order       ______          ______                           ______
4th Order       ______          ______                           ______
5th Order       ______          ______                           ______
Observe in the previous exercise that the actual absolute error is always less
than the expected error. Using the first term in the remainder to estimate the
approximation error of truncating a Taylor Series is crude but very easy to
implement.
Theorem 1.1. The approximation error when using a truncated Taylor Series
is roughly proportional to the size of the next term in the Taylor Series.
Exercise 1.39. Next we will examine the approximation error for the sine
function near x0 = 0. We know that the sine function has the Taylor Series
centered at x0 = 0 as
sin(x) = x − x³/3! + x⁵/5! − x⁷/7! + ··· .
a. A linear approximation of sin(x) near x₀ = 0 is sin(x) ≈ x + O(x³). Use it
to approximate sin(0.2) and state the expected approximation error. (We
write the expected error as O(x³) rather than −x³/3! since we are really only
interested in absolute error (i.e. we don't care if we overshoot or undershoot).)
b. Notice that there are no quadratic terms in the Taylor Series so there is
no quadratic approximation for sin(x) near x0 = 0.
c. A cubic approximation of sin(x) near x₀ = 0 is sin(x) = ?? − ?? + O(??).
i. Fill in the question marks in the cubic approximation formula.
ii. Use the cubic approximation formula to approximate sin(0.2).
iii. What is the approximation error for your approximation?
d. What is the next approximation formula for sin(x) near x0 = 0? Use it to
approximate sin(0.2), and give the expected approximation error.
e. Now let’s check all of our answers against what Python says we should
get for sin(0.2). If you use np.sin(0.2) you should get sin(0.2) ≈
np.sin(0.2) = 0.1986693308. Fill in the blanks in the table below and
then discuss the quality of our error approximations.
b. Based on what you did in part (a), complete the Taylor Series for ln(x)
centered at x0 = 1.
ln(x) = 0 + 1(x − 1) − (1/2)(x − 1)² + (1/3)(x − 1)³ − (??/??)(x − 1)⁴ + (??/??)(x − 1)⁵ + ··· .
c. The nth order Taylor approximation of ln(x) near x0 = 1 is given below.
What is the order of the estimated approximation error?
Exercise 1.41. In the previous problem you found an approximation for ln(1.1)
to 5 decimal places. In doing so you had to build a Taylor Series at a well-known
point nearby 1.1 and then use our approximation of the error to determine the
number of terms to keep in the approximation. In this exercise we want an
approximation of cos(π/2 + 0.05). To do so you should build a Taylor Series for
the cosine function centered at an appropriate point, determine an estimate for
the approximation error, and then use that estimate to determine the number of
terms to keep in the approximation.
1.6 Exercises
1.6.1 Coding Exercises
The first several exercises here are meant for you to practice and improve your
coding skills. If you are stuck on any of the coding then I recommend that you
have a look at Appendix A. Please refrain from Googling anything on these
problems. The point is to struggle through the code, get it wrong many times,
debug, and then to eventually have working code.
Exercise 1.44. Write computer code that will draw random numbers from the
unit interval [0, 1], distributed uniformly (using Python’s np.random.rand()),
until the sum of the numbers that you draw is greater than 1. Keep track of
how many numbers you draw. Then write a loop that does this process many
many times. On average, how many numbers do you have to draw until your
sum is larger than 1?
Hint #1: Use the np.random.rand() command to draw a single number from
a uniform distribution with bounds (0, 1).
Hint #2: You should do this more than 1,000,000 times to get a good average
. . . and the number that you get should be familiar!
Write a function that takes a positive integer n as input and determines whether
or not n is prime. Have your function return:
• 0 = not prime,
• 1 = prime.
Next write a script to find the sum of all of the prime numbers less than 1000.
Hint: Remember that a prime number has exactly two divisors: 1 and itself.
You only need to check divisors as large as the square root of n. Your
script should probably be smart enough to avoid all of the non-prime even
numbers.
The sum of the squares of the first ten natural numbers is
1² + 2² + ··· + 10² = 385.
The square of the sum of the first ten natural numbers is
(1 + 2 + ··· + 10)² = 55² = 3025.
Hence the difference between the square of the sum of the first ten natural
numbers and the sum of the squares is 3025 − 385 = 2640.
Write code to find the difference between the square of the sum of the first one
hundred natural numbers and the sum of the squares. Your code needs to run
error free and output only the difference.
Hint: You will likely want to use modular division for this problem.
Exercise 1.50. The following iterative sequence is defined for the set of positive
integers:
n → n/2          (n is even)
n → 3n + 1       (n is odd)
Using the rule above and starting with 13, we generate the following sequence:
13 → 40 → 20 → 10 → 5 → 16 → 8 → 4 → 2 → 1
• Assume that the account balances are uniformly distributed between $100
and $100,000.
• Assume that the annual interest rate on the accounts is 5% and the interest
is compounded daily and added to the accounts, except that fractions of
cents are truncated.
• Assume that your illegal account initially has a $0 balance.
Your Tasks:
a. Explain what the code below does.
import numpy as np
accounts = 100 + (100000-100) * np.random.rand(50000,1);
accounts = np.floor(100*accounts)/100;
Exercise 1.54. Find the Taylor Series for f(x) = 1/ln(x) centered at the point
x₀ = e. Then use the Taylor Series to approximate the number 1/ln(3) to 4
decimal places.
Exercise 1.55. In this problem we will use Taylor Series to build approximations
for the irrational number π.
a. Write the Taylor Series centered at x₀ = 0 for the function
f(x) = 1/(1 + x).
b. Now we want to get the Taylor Series for the function g(x) = 1/(1 + x²). It
would be quite tedious to compute all of the necessary derivatives of g(x) directly.
i. Instead, substitute x² for x in your series from part (a) to propose a
Taylor Series for g(x).
ii. Make a few plots to verify that we indeed now have a Taylor Series
for the function g(x) = 1/(1 + x²).
Exercise 1.56. In this problem we will prove the famous (and the author's
favorite) formula
e^(iθ) = cos(θ) + i sin(θ).
This is known as Euler’s formula after the famous mathematician Leonard Euler.
Show all of your work for the following tasks.
a. Write the Taylor Series for the functions e^x, sin(x), and cos(x).
(Footnote: There are many reasons why integrating an infinite series term by term should give you a
moment of pause. For the sake of this problem we are doing this operation a little blindly, but
in reality we should have verified that the infinite series actually converges uniformly.)
b. Replace x with iθ in the Taylor expansion of e^x. Recall that i = √−1 so
i² = −1, i³ = −i, and i⁴ = 1. Simplify all of the powers of iθ that arise in
the Taylor expansion. I'll get you started:
e^x = 1 + x + x²/2 + x³/3! + x⁴/4! + x⁵/5! + ···
e^(iθ) = 1 + (iθ) + (iθ)²/2! + (iθ)³/3! + (iθ)⁴/4! + (iθ)⁵/5! + ···
       = 1 + iθ + i² θ²/2! + i³ θ³/3! + i⁴ θ⁴/4! + i⁵ θ⁵/5! + ···
       = . . . keep simplifying . . .
c. Gather all of the real terms and all of the imaginary terms together. Factor
the i out of the imaginary terms. What do you notice?
d. Use your result from part (c) to prove that eiπ + 1 = 0.
Exercise 1.57. In physics, the total relativistic energy of an object with mass m
moving at speed v is
E_rel = γmc²
where c is the speed of light and
γ = 1/√(1 − v²/c²).
The Taylor Series of the relativistic energy function centered at v = 0 is
E_rel = mc² + (1/2)mv² + (3/8)(mv⁴/c²) + (5/16)(mv⁶/c⁴) + ··· .
a. What do we recover if we consider an object with zero velocity?
b. Why might it be completely reasonable to only use the quadratic approximation
E_rel = mc² + (1/2)mv²
for the relativistic energy equation?
c. (some physics knowledge required) What do you notice about the second
term in the Taylor Series approximation of the relativistic energy function?
d. Show all of the work to derive the Taylor Series centered at v = 0 given
above.
(Footnote: This is something that people in physics and engineering do all the time – there is some
complicated nonlinear relationship that they wish to use, but the first few terms of the Taylor
Series capture almost all of the behavior since the higher-order terms are very very small.)
Exercise 1.58. (The Python Caret Operator)Now that you’re used to using
Python to do some basic computations you are probably comfortable with the
fact that the caret, ˆ, does NOT do exponentiation like it does in many other
programming languages. But what does the caret operator do? That’s what we
explore here.
a. Consider the numbers 9 and 5. Write these numbers in binary representa-
tion. We are going to use four bits to represent each number (it is ok if
the first bit happens to be zero).
9=
5=
Chapter 2
Algebra
2.1 Intro to Numerical Root Finding
Every equation that you will ever need to solve can be written as ℓ(x) = r(x),
where ℓ(x) is the left-hand side of the equation and r(x) is the right-hand side.
Subtracting the right-hand side from both sides rewrites the equation as
ℓ(x) − r(x) = 0.
Hence, we can define a function f(x) as f(x) = ℓ(x) − r(x) and observe that
every equation can be written as:
If f (x) = 0, find x.
This gives us a common language for which to frame all of our numerical
algorithms.
For example, if we want to solve the equation 3 sin(x) + 9 = x2 − cos(x) then
this is the same as solving (3 sin(x) + 9) − (x2 − cos(x)) = 0. We illustrate this
idea in Figure 2.1. You should pause and notice that there is no way that you
are going to apply by-hand techniques from algebra to solve this equation . . .
an approximate answer is pretty much our only hope.
On the left-hand side of Figure 2.1 we see the solutions to the equation 3 sin(x) +
9 = x² − cos(x), and on the right-hand side we see the solutions to the equation
(3 sin(x) + 9) − (x² − cos(x)) = 0.
From the plots it is apparent that the two equations have the same solutions:
x₁ ≈ −2.55 and x₂ ≈ 2.94. Figure 2.1 should demonstrate what we mean when
we say that solving equations of the form ℓ(x) = r(x) will give the same answer
as solving f (x) = 0. Pause for a moment and closely examine the plots to verify
this for yourself.
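In code, rewriting ℓ(x) = r(x) as f(x) = 0 is just a matter of defining the difference of the two sides. The snippet below is only an illustration of that reformulation for the example above: it defines f and checks its sign at a few points, which is exactly the kind of information that the root finding methods in this chapter will exploit.

import numpy as np

# f(x) = l(x) - r(x) for the equation 3 sin(x) + 9 = x^2 - cos(x)
f = lambda x: (3*np.sin(x) + 9) - (x**2 - np.cos(x))

# a sign change between two points signals a root between them
for a, b in [(-3, -2), (2, 3)]:
    print(a, b, f(a), f(b))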
We now have one way to view every equation-solving problem. As we’ll see in
this chapter, if f (x) has certain properties then different numerical techniques
for solving the equation will apply – and some will be much faster and more
accurate than others. The following sections give several different techniques
for solving equations of the form f (x) = 0. We will start with the simplest
techniques to implement and then move to the more powerful techniques that
require some ideas from Calculus to understand and analyze. Throughout this
chapter we will also work to quantify the amount of error that we make while
using these techniques.
2.2 The Bisection Method
Exercise 2.2. Now let’s say that Sally has a continuous function that has a
root somewhere between x = 2 and x = 10. Modify your strategy from the
number guessing game in the previous problem to narrow down where the root is.
Exercise 2.3. Was it necessary to say that Sally's function was continuous?
Could your technique work if the function were not continuous?
Now let’s get to the math. We’ll start the mathematical discussion with a
theorem from Calculus.
Theorem 2.1. (The Intermediate Value Theorem (IVT)) If f (x) is a
continuous function on the closed interval [a, b] and y∗ lies between f (a) and
f (b), then there exists some point x∗ ∈ [a, b] such that f (x∗ ) = y∗ .
Exercise 2.4. Draw a picture of what the intermediate value theorem says
graphically.
Corollary 2.1. If f (x) is a continuous function on the closed interval [a, b] and
if f (a) and f (b) have opposite signs then from the Intermediate Value Theorem
we know that there exists some point x∗ ∈ [a, b] such that ____.
Exercise 2.6. Fill in the blank in the previous corollary and then draw several
pictures that indicate why this might be true for continuous functions.
The Intermediate Value Theorem (IVT) and its corollary are existence theorems
in the sense that they tell us that some point exists. The annoying thing about
mathematical existence theorems is that they typically don’t tell us how to find
the point that is guaranteed to exist – annoying. The method that you developed
in Exercises 2.1 and 2.2 gives one possible way to find the root.
In Exercises 2.1 and 2.2 you likely came up with an algorithm such as this:
• Say we know that the root of a continuous function lies between x = a and
x = b.
• Guess that the root is at the midpoint m = (a + b)/2.
• By using the signs of the function narrow the interval which contains the
root to either [a, m] or [m, b].
• Repeat
Now we will turn this optimal strategy into computer code that will simply
play the game for us. But first we need to pay careful attention to some of the
mathematical details.
Exercise 2.7. Where is the Intermediate Value Theorem used in the root-
guessing strategy?
Exercise 2.8. Why was it important that the function f (x) is continuous when
playing this root-guessing game? Provide a few sketches to demonstrate your
answer.
Exercise 2.9. (The Bisection Method) Goal: We want to solve the equation
f (x) = 0 for x assuming that the solution x∗ is in the interval [a, b].
The Algorithm: Assume that f (x) is continuous on the closed interval [a, b].
To make approximations of the solutions to the equation f (x) = 0, do the
following:
1. Check to see if f(a) and f(b) have opposite signs. You can do this by taking
the product of f(a) and f(b).
• If f (a) and f (b) have different signs then what does the IVT tell you?
• If f (a) and f (b) have the same sign then what does the IVT not tell
you? What should you do in this case?
• Why does the product of f (a) and f (b) tell us something about the
signs of the two numbers?
2. Compute the midpoint of the closed interval, m = (a + b)/2, and evaluate f(m).
• Will m always be a better guess of the root than a or b? Why?
• What should you do here if f (m) is really close to zero?
3. Compare the signs of f (a) vs f (m) and f (b) vs f (m).
• What do you do if f (a) and f (m) have opposite signs?
• What do you do if f (m) and f (b) have opposite signs?
4. Repeat steps 2 and 3 and stop when f (m) is close enough to zero.
Exercise 2.10. Draw a picture illustrating what the Bisection Method does to
approximate solutions to the equation f (x) = 0.
Exercise 2.11. We want to write a Python function for the Bisection Method.
Instead of jumping straight into the code we should ALWAYS write pseudo-code
first. It is often helpful to write pseudo-code as comments in your file. Use the
template below to complete your pseudo-code.
def Bisection(f, a, b, tol):
# The input parameters are
# f is a Python function or a lambda function
# a is the lower guess
# b is the upper guess
# tol is an optional tolerance for the accuracy of the root
Exercise 2.12. Now use the pseudo-code as structure to complete a function for
the Bisection Method. Also write test code that verifies that your function works
properly. Be sure that it can take a Lambda Function as an input along with
an initial lower bound, an initial upper bound, and an optional error tolerance.
The output should be only 1 single number: the root.
Exercise 2.13. Test your Bisection Method code on the following equations.
a. x² − 2 = 0 on x ∈ [0, 2]
b. sin(x) + x² = 2 ln(x) + 5 on x ∈ [0, 5] (be careful! make a plot first)
c. (5 − x)eˣ = 5 on x ∈ [0, 5]
2.2.2 Analysis
After we build any root finding algorithm we need to stop and think about how
it will perform on new problems. The questions that we typically have for a
root-finding algorithm are:
• Will the algorithm always converge to a solution?
• How fast will the algorithm converge to a solution?
• Are there any pitfalls that we should be aware of when using the algorithm?
Exercise 2.14. Discussion: What must be true in order to use the bisection
method?
Exercise 2.15. Discussion: Does the bisection method work if the Intermediate
Value Theorem does not apply? (Hint: what does it mean for the IVT to “not
apply?”)
Next we’ll focus on a deeper mathematical analysis that will allow us to determine
exactly how fast the bisection method actually converges to within a pre-set
tolerance. Work through the next problem to develop a formula that tells you
exactly how many steps the bisection method needs to take in order to stop.
Exercise 2.17. Let f (x) be a continuous function on the interval [a, b] and
assume that f(a) · f(b) < 0. A recurring theme in Numerical Analysis is to
approximate some mathematical thing to within some tolerance. For example, if
we want to approximate the solution to the equation f (x) = 0 to within ε with
the bisection method, we should be able to figure out how many steps it will
take to achieve that goal.
a. Let’s say that a = 3 and b = 8 and f (a) · f (b) < 0 for some continuous
function f (x). The width of this interval is 5, so if we guess that the root
is m = (3 + 8)/2 = 5.5 then our error is less than 5/2. In the more general
setting, if there is a root of a continuous function in the interval [a, b] then
how far off could the midpoint approximation of the root be? In other
words, what is the error in using m = (a + b)/2 as the approximation of
the root?
b. The bisection method cuts the width of the interval down to a smaller
size at every step. As such, the approximation error gets smaller at every
step. Fill in the blanks in the following table to see the pattern in how the
approximation error changes with each iteration.
Exercise 2.18. Is it possible for a given function and a given interval that the
Bisection Method converges to the root in fewer steps than what you just found
in the previous problem? Explain.
Exercise 2.19. Create a second version of your Python Bisection Method func-
tion that uses a for loop that takes the optimal number of steps to approximate
the root to within some tolerance. This should be in contrast to your first version
which likely used a while loop to decide when to stop. Is there an advantage to
using one of these versions of the Bisection Method over the other?
The final type of analysis that we should do on the bisection method is to make
plots of the error between the approximate solution that the bisection method
gives you and the exact solution to the equation. This is a bit of a funny thing!
Stop and think about this for a second: if you know the exact solution to the
equation then why are you solving it numerically in the first place!?!? However,
whenever you build an algorithm you need to test it on problems where you
actually do know the answer so that you can be somewhat sure that it
isn’t giving you nonsense. Furthermore, analysis like this tells us how fast the
algorithm is expected to perform.
From Theorem 2.2 you know that the bisection method cuts the interval in
half at every iteration. You proved in Exercise 2.17 that the error given by the
bisection method is therefore cut in half at every iteration as well. The following
example demonstrates this theorem graphically.
Example 2.1. Let's solve the very simple equation x² − 2 = 0 for x to get the
solution x = √2 with the bisection method. Since we know the exact answer
we can compare the exact answer to the value of the midpoint given at each
iteration and calculate an absolute error:
a. If we plot the absolute error on the vertical axis and the iteration number
on the horizontal axis we get Figure 2.2. As expected, the absolute error
follows an exponentially decreasing trend. Notice that it isn’t a completely
smooth curve since we will have some jumps in the accuracy just due to
the fact that sometimes the root will be near the midpoint of the interval
and sometimes it won’t be.
Figure 2.2: The evolution of the absolute error when solving the equation
x² − 2 = 0 with the bisection method.
b. Without Theorem 2.2 it would be rather hard to tell what the exact
behavior is in the exponential plot above. We know from Theorem 2.2 that
the error will divide by 2 at every step, so if we instead plot the base-2
logarithm of the absolute error against the iteration number we should see
a linear trend as shown in Figure 2.3. There will be times later in this
course where we won’t have a nice theorem like Theorem 2.2 and instead
we will need to deduce the relationship from plots like these.
i. The trend is linear since logarithms and exponential functions are
inverses. Hence, applying a logarithm to an exponential will give a
linear function.
ii. The slope of the resulting linear function should be −1 in this case
since we are dividing by 1 power of 2 each iteration. Visually verify
that the slope in the plot below follows this trend (the red dashed
line in the plot is shown to help you see the slope).
Figure 2.3: Iteration number vs the base-2 logarithm of the absolute error.
Notice the slope of −1 indicating that the error is divided by 1 factor of 2 at
each step of the algorithm.
c. Another plot that numerical analysts use quite frequently for determining
how an algorithm is behaving as it progresses is described by the following
bullets:
• The horizontal axis is the absolute error at iteration k.
• The vertical axis is the absolute error at iteration k + 1.
See Figure 2.4 below, but this type of plot takes a bit of explaining the first time
you see it. Start on the right-hand side of the plot where the error is the largest
(this will be where the algorithm starts). The coordinates of the first point are
interpreted as (the absolute error at iteration 1, the absolute error at iteration 2),
the coordinates of the second point are (the absolute error at iteration 2, the
absolute error at iteration 3), and so on.
Examining the slope of the trend line in this plot shows how we expect the
error to progress from step to step. The slope appears to be about 1 in the plot
below and the intercept appears to be about −1. In this case we used a base-2
logarithm for each axis so we have just empirically shown that
log₂(absolute error at step k + 1) ≈ 1 · log₂(absolute error at step k) − 1.
Rearranging the algebra a bit we see that this linear relationship turns into
(absolute error at step k + 1) ≈ 2^(log₂(absolute error at step k) − 1).
(You should stop now and do this algebra.) Rearranging a bit more we get
(absolute error at step k + 1) = (1/2) × (absolute error at step k),
exactly as expected!! Pause and ponder this result for a second – we just empiri-
cally verified the convergence rate for the bisection method just by examining
the plot below!! That’s what makes these types of plots so powerful!
Figure 2.4: The base-2 logarithm of the absolute error at iteration k vs the
base-2 logarithm of the absolute error at iteration k + 1.
d. The final plot that we will make in analyzing the bisection method is
the same as the plot that we just made but with the base-10 logarithm
instead. See Figure 2.5. In future algorithms we will not know that the
error decreases by a factor of 2 so instead we will just try the base-10
logarithm. We will be able to extract the exact same information from
this plot. The primary advantage of this last plot is that we can see how
the order of magnitude (the power of 10) for the error progresses as the
algorithm steps forward. Notice that for every order of magnitude iteration
k decreases, iteration k + 1 decreases by one order of magnitude. That is,
the slope of the best fit line in Figure 2.5 is approximately 1. Discuss what
this means about how the error in the bisection method behaves as the
iterations progress.
Figure 2.5: The base-10 logarithm of the absolute error at iteration k vs the
base-10 logarithm of the absolute error at iteration k + 1.
Exercise 2.20. Carefully read and discuss all of the details of the previous
example and plots. Then create plots similar to this example while solving an
equation for which you know the exact solution. You should see the same basic
behavior predicted by the theorem that you proved in Exercise 2.17. If you don't
see the same basic behavior then something has gone wrong.
Hints: You will need to create a modified bisection method function which
returns all of the iterations instead of just the final root.
If the logarithms of your absolute errors are in a Python list called error
then a command like plt.plot(error[:-1],error[1:],'b*') will plot
the (k + 1)st absolute error against the k th absolute error.
If you want the actual slope and intercept of the trend line then you can
use m, b = np.polyfit(error[:-1], error[1:], deg=1).
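Putting these hints together, a minimal sketch of the experiment (assuming a small helper, here called bisection_all, that runs the bisection method and returns every midpoint -- your own modified function may look different) could be:
import numpy as np
import matplotlib.pyplot as plt

def bisection_all(f, a, b, n_steps=30):
    # Run the bisection method and keep every midpoint (a hypothetical helper).
    mids = []
    for _ in range(n_steps):
        c = (a + b) / 2
        mids.append(c)
        if f(a) * f(c) < 0:
            b = c
        else:
            a = c
    return np.array(mids)

f = lambda x: x**2 - 2
mids = bisection_all(f, 0, 2)
error = np.log2(np.abs(mids - np.sqrt(2)))   # base-2 logs of the absolute errors

plt.plot(error[:-1], error[1:], 'b*')        # error at step k vs error at step k+1
m, b = np.polyfit(error[:-1], error[1:], deg=1)
print("slope =", m, "intercept =", b)        # expect roughly 1 and -1
plt.grid()
plt.show()
The printed slope and intercept should land near 1 and −1, matching Example 2.1.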
Exercise 2.21. In the Bisection Method, we always used the midpoint of the
interval as the next approximation of the root of the function f (x) on the interval
[a, b]. The three pictures in Figure 2.6 show the same function with three different
choices for a and b. Which one will take fewer Bisection-steps to find the root?
Which one will take more steps? Explain your reasoning.
(Note: The root in question is marked with the green star and the initial interval
is marked with the red circles.)
Figure 2.6: In the bisection method you get to choose the starting interval
however you like. That choice will make an impact on how fast the algorithm
converges to the approximate root.
Exercise 2.22. Now let’s modify the Bisection Method approach. Instead of
always using the midpoint (which as you saw in the previous problem could take
a little while to converge) let’s draw a line between the endpoints and use the
x-intercept as the updated guess. If we use this method can we improve the
speed of convergence on any of the choices of a and b for this function? Which
one will now likely take the fewest steps to converge? Figure 2.7 shows three
different starting intervals marked in red with the new guess marked as a black
X.
2.3 The Regula Falsi Method
The algorithm that you played with graphically in the previous problem is known
as the Regula Falsi (false position) algorithm. It is really just a minor tweak
on the Bisection method. After all, the algorithm is still designed to use the
Intermediate Value Theorem and to iteratively zero in on the root of the function
on the given interval. This time, instead of picking the midpoint of the interval
that contains the root we draw a line between the function values at either end
of the interval and then use the intersection of that line with the x axis as the
new approximation of the root. As you can see in Figure 2.7 you might actually
converge to the approximate root much faster this way (like with the far right
plot) or you might gain very little performance (like the far left plot).
Exercise 2.23. (The Regula Falsi Method) Assume that f (x) is continuous
on the interval [a, b]. To make iterative approximations of the solutions to the
equation f (x) = 0, do the following:
1. Check to see if f (a) and f (b) have opposite signs so that the intermediate
value theorem guarantees a root on the interval.
2. We want to write the equation of the line connecting the points (a, f (a))
and (b, f (b)).
• What is the slope of this line?
m=
3. Find the x intercept of the linear function that you wrote in the previous
step by setting the y to zero and solving for x. Call this point x = c.
c=
Hint: The x intercept occurs with y = 0.
4. Just as we did with the bisection method, compare the signs of f (a) vs
f (c) and f (b) vs f (c). Replace one of the endpoints with c. Which one do
you replace and why?
5. Repeat steps 2 - 4, and stop when f (c) is close enough to zero.
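One possible direct translation of steps 1 - 5 into code is sketched below; the function name regulafalsi_sketch is just a placeholder, and the expression used for c is the one you should recover from steps 2 and 3.
def regulafalsi_sketch(f, a, b, tol=1e-8, max_iter=100):
    # A bare-bones false position iteration following steps 1 - 5 above.
    if f(a) * f(b) >= 0:
        raise ValueError("f(a) and f(b) must have opposite signs")
    for _ in range(max_iter):
        c = a - f(a) * (b - a) / (f(b) - f(a))   # x-intercept of the connecting line
        if abs(f(c)) < tol:
            return c
        if f(a) * f(c) < 0:   # the root is trapped between a and c
            b = c
        else:                 # otherwise the root is trapped between c and b
            a = c
    return c

# quick check on x**2 - 2 = 0 with the starting interval [0, 2]
print(regulafalsi_sketch(lambda x: x**2 - 2, 0, 2))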
Exercise 2.24. Draw a picture of what the Regula Falsi method does to
approximate a root.
Exercise 2.25. Give sketches of functions where the Regula Falsi method will
perform faster than the Bisection method and vice versa. Justify your thinking
with several pictures and be prepared to defend your answers.
Exercise 2.26. Create a new Python function called regulafalsi and write
comments giving pseudo-code for the Regula-Falsi method. Remember that
starting with pseudo-code is always the best way to start your coding. Write
comments that give direction to the code that you’re about to write. It is a trap
to try and write actual code without any pseudo-code to give you a backbone
for the function.
Exercise 2.27. Use your pseudo-code to create a Python function that im-
plements the Regula Falsi method. Write a test script that verifies that your
function works properly. Your function should accept a Python function or a
Lambda function as input along with an initial lower bound, an initial upper
bound, and an optional error tolerance. The output should be only 1 single
number: the approximate root.
2.3.2 Analysis
In this subsection we will lean on the fact that we developed a bunch of analysis
tools in the Analysis section of the Bisection Method. You may want to go back
to that section first and take another look at the plots and tools that we built.
Exercise 2.28. In this problem we are going to solve the equation x² − 2 = 0
since we know that the exact answer is x = √2. You will need to start by
modifying your regulafalsi function from Exercise 2.26 so that it returns all
of the iterations instead of just the root.
a. Start with the interval [0, 2] and solve the equation x2 − 2 = 0 with the
Regula-Falsi method.
i. Find the absolute error between each iteration and the exact answer
x = √2.
ii. Make a plot of the base-10 logarithm of the absolute error at step
k against the base-10 logarithm of the absolute error at step k + 1.
This plot will be very similar to Figure 2.5.
iii. Approximate the slope and intercept of the linear trend in the plot.
iv. Based on the work that we did in Example 2.1 estimate the rate of
convergence of the Regula-Falsi method.
b. Repeat part (a) with the initial interval [1, 2].
c. Repeat part (a) with the initial interval [0, 1.5].
Exercise 2.30. Is the Regula-Falsi always better than the bisection method at
finding an approximate root for a continuous function that has a known root in
a closed interval? Why / why not? Discuss.
2.4 Newton's Method
Exercise 2.31. We will start this section with a reminder from Differential
Calculus.
The x-intercept of a function is where the function is 0. Root finding is really the
process of finding the x-intercept of the function. If the function is complicated
(e.g. highly nonlinear or doesn’t lend itself to traditional by-hand techniques) then
we can approximate the x-intercept by creating a Taylor Series approximation of
the function at a nearby point and then finding the x-intercept of that simpler
Taylor Series. The simplest non-trivial Taylor Series is a linear function – a
tangent line!
Exercise 2.33. Now let’s use the computations you did in the previous exercises
to look at an algorithm for approximating the root of a function. In the following
sequence of plots we do the following algorithm:
• Given a value of x that is a decent approximation of the root, draw a
tangent line to f (x) at that point.
• Find where the tangent line intersects the x axis.
• Use this intersection as the new x value and repeat.
The first step has been shown for you. Take a couple more steps graphically.
Does the algorithm appear to converge to the root? Do you think that this will
generally take more or fewer steps than the Bisection Method?
Figure 2.8: Using successive tangent line approximations to find the root of a
function
Exercise 2.35. Make a complete list of what you must know about the function
f(x) for the previous algorithm to work.
The algorithm that we just played with is known as Newton’s Method. The
method was originally proposed by Isaac Newton, and later modified by Joseph
Raphson, for approximating roots of the equation f (x) = 0. It should be clear
that Newton’s method requires the existence of the first derivative so we are
asking a bit more of our functions than we were before. In Bisection and Regula
Falsi we only asked that the functions be continuous, now we’re asking that they
be differentiable. Stop and think for a moment . . . why is this a more restrictive
thing to ask of the function f(x)?
y − ______ = ______ · (x − ______)
x₁ = ______
5. Now iterate the process by replacing the labels "x₁" and "x₀" in the
previous step with x_{n+1} and x_n respectively.
x_{n+1} = ______
Exercise 2.38. Create a new Python function called newton() and write
comments giving pseudo-code for Newton’s method. Your function needs to
accept a Python function for f(x), a Python function for f'(x), an initial guess,
and an optional error tolerance. You don’t need to set aside any code for
calculating the derivative.
Exercise 2.39. Using your pseudocode from the previous problem, write the
full newton() function. The only output should be the solution to the equation
that you are solving. Write a test script to verify that your Newton’s method
code indeed works.
2.4.2 Analysis
There are several ways in which Newton’s Method will behave unexpectedly – or
downright fail. Some of these issues can be foreseen by examining the Newton
iteration formula
x_{n+1} = x_n − f(x_n) / f'(x_n).
Some of the failures that we’ll see are a little more surprising. Also in this section
we will look at the convergence rate of Newton’s Method and we will show that
we can greatly outperform the Bisection and Regula-Falsi methods.
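For reference, a bare-bones sketch of the iteration itself (essentially what Exercise 2.39 asks you to build, with an arbitrary stopping rule) might look like:
def newton_sketch(f, df, x0, tol=1e-8, max_iter=50):
    # A bare-bones Newton iteration: x_{n+1} = x_n - f(x_n)/f'(x_n).
    x = x0
    for _ in range(max_iter):
        x = x - f(x) / df(x)
        if abs(f(x)) < tol:
            break
    return x

# quick check: solve x**2 - 2 = 0 starting from x0 = 1
print(newton_sketch(lambda x: x**2 - 2, lambda x: 2*x, 1.0))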
Exercise 2.40. There are several reasons why Newton’s method could fail.
Work with your partners to come up with a list of reasons. Support each of your
reasons with a sketch or an example.
Exercise 2.42. An interesting failure can occur with Newton’s Method that
you might not initially expect. Consider the function f (x) = x3 − 2x + 2. This
function has a root near x = −1.77. Fill in the table below and draw the tangent
lines on the figure for approximating the solution to f (x) = 0 with a starting
point of x = 0.
n     x_n                                     f(x_n)
0     x_0 = 0                                 f(x_0) = 2
1     x_1 = 0 − f(x_0)/f'(x_0) = 1            f(x_1) = 1
2     x_2 = 1 − f(x_1)/f'(x_1) = ____         f(x_2) = ____
3     x_3 = ____                              f(x_3) = ____
4     x_4 = ____                              f(x_4) = ____
⋮     ⋮                                       ⋮
Exercise 2.43. Now let's consider the function f(x) = ∛x = x^{1/3}. This function has
a root x = 0. Furthermore, it is differentiable everywhere except at x = 0 since
f'(x) = (1/3) x^{−2/3} = 1 / (3x^{2/3}).
The point of this problem is to show what can happen when the point of
non-differentiability is precisely the point that you’re looking for.
a. Fill in the table of iterations starting at x = −1, draw the tangent lines on
the plot, and make a general observation of what is happening with the
Newton iterations.
n     x_n                                     f(x_n)
0     x_0 = −1                                f(x_0) = −1
1     x_1 = −1 − f(−1)/f'(−1) = ____          f(x_1) = ____
2     ____                                    ____
3     ____                                    ____
4     ____                                    ____
⋮     ⋮                                       ⋮
b. Now let's look at the Newton iteration in a bit more detail. Since f(x) =
x^{1/3} and f'(x) = (1/3) x^{−2/3} the Newton iteration can be simplified as
x_{n+1} = x_n − x_n^{1/3} / ( (1/3) x_n^{−2/3} ) = x_n − 3 x_n^{1/3} x_n^{2/3} = x_n − 3x_n = −2x_n.
What does this tell us about the Newton iterations?
Hint: You should have found the exact same thing in the numerical
experiment in part (a).
c. Was there anything special about the starting point x0 = −1? Will this
problem exist for every starting point?
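To see the simplification from part (b) in action, the iteration x_{n+1} = −2x_n can be run directly; this minimal sketch assumes the starting point x_0 = −1 from part (a).
# Newton's method applied to f(x) = x**(1/3): the iterates satisfy x_{n+1} = -2*x_n,
# so they alternate in sign and double in size instead of converging to the root at 0.
x = -1.0
for n in range(6):
    print(n, x)
    x = -2 * x   # equivalent to x - f(x)/f'(x) for this particular f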
Exercise 2.44. Repeat the previous exercise with the function f (x) = x3 − 5x
with the starting point x0 = −1.
b. Create the error plot with |xk − x∗ | on the horizontal axis and |xk+1 − x∗ |
on the vertical axis
c. Demonstrate that this plot has a slope of 2.
d. Give a thorough explanation for how to interpret the plot that you just
made.
e. When solving an equation with Newton’s method Joe found that the
absolute error at iteration 1 of the process was 0.15. Based on the fact
that Newton’s method is a second order method this means that the
absolute error at step 2 will be less than or equal to some constant times
0.15² = 0.0225. Similarly, the error at step 3 will be less than or equal to
some scalar multiple of 0.0225² = 0.00050625. What would Joe's expected
error be bounded by for the fourth iteration, fifth iteration, etc?
2.5 The Secant Method
Exercise 2.46. (The Secant Method) Assume that f (x) is continuous and
we wish to solve f (x) = 0 for x.
1. Determine if there is a root near an arbitrary starting point x0 . How might
you do that?
2. Pick a second starting point near x0 . Call this second starting point x1 .
Note well that the points x0 and x1 should be close to each other. Why?
(The choice here is different than for the Bisection and Regula Falsi methods.
We are not choosing the left- and right- sides of an interval surrounding
the root.)
3. Use the backward difference
f'(x_n) ≈ ( f(x_n) − f(x_{n−1}) ) / ( x_n − x_{n−1} )
to approximate the derivative of f at x_n. Discuss why this approximates
the derivative.
4. Perform the Newton-type iteration
x_{n+1} = x_n − f(x_n) / [ ( f(x_n) − f(x_{n−1}) ) / ( x_n − x_{n−1} ) ]
until f(x_n) is close enough to zero. Notice that the new iteration simplifies
to
x_{n+1} = x_n − f(x_n) (x_n − x_{n−1}) / ( f(x_n) − f(x_{n−1}) ).
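A bare-bones sketch of this iteration (essentially what Exercise 2.49 asks you to build; the name and stopping rule here are arbitrary choices) is:
def secant_sketch(f, x0, x1, tol=1e-8, max_iter=100):
    # A bare-bones secant iteration following steps 1 - 4 above.
    for _ in range(max_iter):
        x2 = x1 - f(x1) * (x1 - x0) / (f(x1) - f(x0))
        x0, x1 = x1, x2
        if abs(f(x1)) < tol:
            break
    return x1

# quick check on x**2 - 2 = 0 with two nearby starting points
print(secant_sketch(lambda x: x**2 - 2, 1.0, 1.1))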
Exercise 2.47. Draw several pictures showing what the Secant method does
pictorially.
Exercise 2.48. Write pseudo-code to outline how you will implement the Secant
Method.
Exercise 2.49. Write Python code for solving equations of the form f (x) = 0
with the Secant method. Your function should accept a Python function, two
starting points, and an optional error tolerance. Also write a test script that
clearly shows that your code is working.
2.5.2 Analysis
Up to this point we have done analysis work on the Bisection Method, the
Regula-Falsi Method, and Newton’s Method. We have found that the methods
are first order, first order, and second order respectively. We end this chapter by
doing the same for the Secant Method.
Exercise 2.50. Choose a non-trivial equation for which you know the solution
and write a script to empirically determine the convergence rate of the Secant
method.
2.6 Exercises
2.6.1 Algorithm Summaries
The following four problems are meant to have you re-build each of the algo-
rithms that we developed in this chapter. Write all of the mathematical details
completely and clearly. Don’t just write “how” the method works, but give all
of the mathematical details for “why” it works.
Exercise 2.51. Let f (x) be a continuous function on the interval [a, b] where
f (a) · f (b) < 0. Clearly give all of the mathematical details for how the Bisection
Method approximates the root of the function f (x) in the interval [a, b].
Exercise 2.52. Let f (x) be a continuous function on the interval [a, b] where
f (a) · f (b) < 0. Clearly give all of the mathematical details for how the Regula
Falsi Method approximates the root of the function f (x) in the interval [a, b].
Exercise 2.55. How many iterations of the bisection method are necessary
to approximate √3 to within 10⁻³, 10⁻⁴, . . . , 10⁻¹⁵ using the initial interval
[a, b] = [0, 2]? See Theorem 2.2.
Exercise 2.56. Refer back to Example 2.1 and demonstrate that you get the
same results by solving the problem x3 − 3 = 0. Generate versions of all of the
plots from the Example and give thorough descriptions of what you learn from
each plot.
Exercise 2.57. In this problem you will demonstrate that all of your root
finding codes work. At the beginning of this chapter we proposed the equation
solving problem
3 sin(x) + 9 = x² − cos(x).
Write a script that calls upon your Bisection, Regula Falsi, Newton, and Secant
methods one at a time to find the positive solution to this equation. Your script
needs to output the solutions in a clear and readable way so you can tell which
answer came from which root finding algorithm.
|x_{k+1} − x*| ≤ C |x_k − x*|^M .
Here, x* is the exact root, x_k is the kth iteration of the root finding technique,
and x_{k+1} is the (k + 1)st iteration of the root finding technique.
a. If we consider the equation
|x_{k+1} − x*| ≤ C |x_k − x*|^M
and take the logarithm (base 10) of both sides then we get
log (|x_{k+1} − x*|) ≤ ________ + ________
b. In part (a) you should have found that the log of new error is a linear
function of the log of the old error. What is the slope of this linear function
on a log-log plot?
c. In the plots below you will see six different log-log plots of the new error
to the old error for different root finding techniques. What is the order of
the approximate convergence rate for each of these methods?
d. In your own words, what does it mean for a root finding method to have a
“first order convergence rate?” “Second order convergence rate?” etc.
Exercise 2.60. There are MANY other root finding techniques beyond the four
that we have studied thus far. We can build these methods using Taylor Series
as follows:
Near x = x0 the function f (x) is approximated by the Taylor Series
f(x) ≈ y = f(x₀) + Σ_{n=1}^{N} ( f^(n)(x₀) / n! ) (x − x₀)^n.
To build a root finding method we set y = 0 and solve the resulting equation
0 = f(x₀) + Σ_{n=1}^{N} ( f^(n)(x₀) / n! ) (x − x₀)^n
for x. For example, if N = 1 then we need to solve 0 = f(x₀) + f'(x₀)(x − x₀) for
x. In doing so we get x = x₀ − f(x₀)/f'(x₀). This is exactly Newton's method.
If N = 2 then we need to solve
0 = f(x₀) + f'(x₀)(x − x₀) + ( f''(x₀) / 2! ) (x − x₀)²
for x.
a. Solve for x in the case that N = 2. Then write a Python function that
implements this root-finding method.
b. Demonstrate that your code from part (a) is indeed working by solving
several problems where you know the exact solution.
c. Show several plots that estimates the order of the method from part (a).
That is, create a log-log plot of the successive errors for several different
equation-solving problems.
d. What are the pro’s and con’s to using this new method?
Exercise 2.61. (modified from [5]) An object falling vertically through the
air is subject to friction due to air resistance as well as gravity. The function
describing the position of such an object is
s(t) = s₀ − (mg/k) t + (m²g/k²) ( 1 − e^{−kt/m} ),
where m is the mass measured in kg, g is gravity measured in meters per second
per second, s0 is the initial position measured in meters, and k is the coefficient
of air resistance.
a. What are the units of the parameter k?
b. If m = 1 kg, g = 9.8 m/s², k = 0.1, and s₀ = 100 m how long will it take for
the object to hit the ground? Find your answer to within 0.01 seconds.
c. The value of k depends on the aerodynamics of the object and might be
challenging to measure. We want to perform a sensitivity analysis on your
answer to part (b) subject to small measurement errors in k. If the value
of k is only known to within 10% then what are your estimates of when
the object will hit the ground?
Exercise 2.62. Can the Bisection Method, Regula Falsi Method, or Newton’s
Method be used to find the roots of the function f (x) = cos(x) + 1? Explain
why or why not for each technique.
Exercise 2.63. In Single Variable Calculus you studied methods for finding
local and global extrema of functions. You likely recall that part of the process
is to set the first derivative to zero and to solve for the independent variable
(remind yourself why you’re doing this). The trouble with this process is that
it may be very very challenging to solve by hand. This is a perfect place for
Newton's method or any other root finding technique!
Find the local extrema for the function f(x) = x³(x − 3)(x − 6)⁴ using numerical
techniques where appropriate.
Exercise 2.64. A fixed point of a function f (x) is a point that solves the
equation f (x) = x. Fixed points are interesting in iterative processes since fixed
points don’t change under repeated application of the function f .
For example, consider the function f (x) = x2 − 6. The fixed points of f (x) can
be found by solving the equation x2 − 6 = x which, when simplified algebraically,
is x2 − x − 6 = 0. Factoring the left-hand side gives (x − 3)(x + 2) = 0 which
implies that x = 3 and x = −2 are fixed points for this function. That is,
f (3) = 3 and f (−2) = −2. Notice, however, that finding fixed points is identical
to a root finding problem.
x² − xy² = 2
xy = 2
import numpy as np
from scipy.optimize import fsolve

def F(x):
    Output = [ x[0]*np.cos(x[1]) - 4 ]
    Output.append( x[0]*x[1] - x[1] - 5 )
    return Output

fsolve(F, [6, 1], full_output=1)
# Note: full_output gives the solver diagnostics
2.7 Projects
At the end of every chapter we propose a few projects related to the content
in the preceding chapter(s). In this section we propose a couple of ideas for projects
related to numerical algebra. The projects in this book are meant to be open
ended, to encourage creative mathematics, to push your coding skills, and to
require you to write and communicate your mathematics. Take the time to read
Appendix B before you write your final paper.
5. Now for the fun part! Consider the function f(z) = z³ − 1 where z is
a complex variable. That is, z = x + iy where i = √−1. From the
Fundamental Theorem of Algebra we know that there are three roots to
this polynomial in the complex plane. In fact, we know that the roots are
z₀ = 1, z₁ = (1/2)(−1 + √3 i), and z₂ = (1/2)(−1 − √3 i) (you should stop now
and check that these three numbers are indeed roots of the polynomial
f (z)). Your job is to build a picture of the basins of attraction for the three
roots in the complex plane. This picture will naturally be two-dimensional
since numbers in the complex plane are two dimensional (each has a real
and an imaginary part). When you have your picture give a thorough
write up of what you found.
6. Now pick your favorite complex-valued function and build a picture of the
basins of attraction. Consider this an art project! See if you can come up
with the prettiest basin of attraction picture.
2.7.2 Artillery
An artillery officer wishes to fire his cannon on an enemy brigade. He wants to
know the angle to aim the cannon in order to strike the target. If we have control
over the initial velocity of the cannon ball, v0 , and the angle of the cannon above
horizontal, θ, then the initial vertical component of the velocity of the ball is
vy (0) = v0 sin(θ) and the initial horizontal component of the velocity of the ball
is vx (0) = v0 cos(θ). In this problem we will assume the following:
• We will neglect air resistance¹ so, for all time, the differential equations
v_y'(t) = −g and v_x'(t) = 0 must both hold.
1 Strictly speaking, neglecting air resistance is a poor assumption since a cannon ball moves
fast enough that friction with the air plays a non-negligible role. However, the assumption
of no air resistance greatly simplifies the math and makes this version of the problem more
tractable. The second version of the artillery problem in Chapter 5 will look at the effects of
air resistance on the cannon ball.
• We will assume that the position of the cannon is the origin of a coordinate
system so sx (0) = 0 and sy (0) = 0.
• We will assume that the target is at position (x∗ , y∗ ) which you can measure
accurately relative to the cannon’s position. The landscape is relatively
flat but y∗ could be a bit higher or a bit lower than the cannon’s position.
Use the given information to write a nonlinear equation² that relates x∗, y∗,
v₀, g, and θ. We know that g = 9.8 m/s² is constant and we will assume that
the initial velocity can be adjusted between v₀ = 100 m/s and v₀ = 150 m/s in
increments of 10m/s. If we then are given a fixed value of x∗ and y∗ the only
variable left to find in your equation is θ. A numerical root-finding technique
can then be applied to your equation to approximate the angle. Create several
look up tables for the artillery officer so they can be given v0 , x∗ , and y∗ and
then use your tables to look up the angle at which to set the cannon. Be sure to
indicate when a target is out of range.
Write a brief technical report detailing your methods. Support your work with
appropriate mathematics and plots. Include your tables at the end of your
report.
2 Hint: Symbolically work out the amount of time that it takes until the vertical position of
the cannon ball reaches y∗ . Then substitute that time into the horizontal position, and set the
horizontal position equation to x∗ .
Chapter 3
Calculus
Recall the typical techniques from differential calculus: the power rule, the chain
rule, the product rule, the quotient rule, the differentiation rules for exponentials,
inverses, and trig functions, implicit differentiation, etc. With these rules, and
enough time and patience, we can find a derivative of any algebraically defined
function. The truth of the matter is that not all functions are given to us
algebraically, and even the ones that are given algebraically are sometimes really
cumbersome.
Exercise 3.1. A water quality engineering team wants to find the rate at which
the volume of waste water is changing in their containment pond throughout
the year. They presently only have data on the specific geometric shape of the
containment pond as well as the depth of the waste water each day for the
past year. Propose several methods for approximating the first derivative of the
volume of the waste water pond.
Exercise 3.2. When a police officer fires a radar gun at a moving car it uses a
laser to measure the distance from the officer to the car:
• The speed of light is constant.
• The time between when the laser is fired and when the light reflected off
of the car is received can be measured very accurately.
• Using the formula distance = rate · time, the time for the laser pulse to be
sent and received can then be converted to a distance.
How does the radar gun then use that information to calculate the speed of the
moving car?
Integration, on the other hand, is a more difficult situation. You may recall
some of the techniques of integral calculus such as the power rule, u-substitution,
and integration by parts. However, these tools are not enough to find an
antiderivative for any given function. Furthermore, not every function can be
written algebraically.
Exercise 3.3. In statistics the function known as the normal distribution (the
bell curve) is defined as
N(x) = ( 1 / √(2π) ) e^{−x²/2}.
One of the primary computations of introductory statistics is to find the area
under a portion of this curve since this area gives the probability of some event
P(a < x < b) = ∫_a^b ( 1 / √(2π) ) e^{−x²/2} dx.
The trouble is that there is no known antiderivative of this function. Propose a
method for approximating this area.
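One simple proposal, out of many, is to add up the areas of lots of thin rectangles under the curve. The sketch below approximates P(0 < x < 1); the interval and the number of rectangles are arbitrary choices.
import numpy as np

# Approximate P(0 < x < 1) for the standard normal curve with many thin rectangles.
N = lambda x: np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)
a, b, n = 0.0, 1.0, 10**5
x = np.linspace(a, b, n + 1)[:-1]   # left endpoints of the sub intervals
dx = (b - a) / n
print(np.sum(N(x)) * dx)            # the known value is about 0.3413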
Exercise 3.4. Give a list of five functions for which an exact algebraic derivative
is relatively easy but an exact antiderivative is either very hard or maybe
impossible. Be prepared to compare with your peers.
Exercise 3.5. A dam operator has control of the rate at which water is flowing
out of a hydroelectric dam. He has records for the approximate flow rate through
the dam over the course of a day. Propose a way for the operator to use his data
to determine the total amount of water that has passed through the dam during
that day.
What you’ve seen here are just a few examples of why you might need to use
numerical calculus instead of the classical routines that you learned earlier in
your mathematical career. Another typical need for numerical derivatives and
integrals arises when we approximate the solutions to differential equations in
the later chapters of this book.
Throughout this chapter we will make heavy use of Taylor’s Theorem to build
approximations of derivatives and integrals. If you find yourself still a bit shaky
on Taylor’s Theorem it would probably be wise to go back to Section 1.4 and do
a quick review.
At the end of the chapter we’ll examine a numerical technique for solving
optimization problems without explicitly finding derivatives. Then we’ll look at
a common use of numerical calculus for fitting curves to data.
3.2 Differentiation
3.2.1 The First Derivative
Exercise 3.6. Recall from your first-semester Calculus class that the derivative
of a function f (x) is defined as
f'(x) = lim_{Δx→0} ( f(x + Δx) − f(x) ) / Δx .
A Calculus student proposes that it would just be much easier if we dropped the
limit and instead just always choose ∆x to be some small number, like 0.001 or
10−6 . Discuss the following questions:
a. When might the Calculus student’s proposal actually work pretty well in
place of calculating an actual derivative?
b. When might the Calculus student’s proposal fail in terms of approximating
the derivative?
In this section we'll build several approximations of first and second derivatives.
The primary idea for each of these approximations is:
• Partition the interval [a, b] into N sub intervals
• Define the distance between two points in the partition as h.
Figure 3.1 shows a depiction of the partition as well as making clear that h is
the separation between each of the points in the partition. Note that in general
the points in the partition do not need to be equally spaced, but that is the
simplest place to start.
Exercise 3.7. Let’s take a close look at partitions before moving on to more
details about numerical differentiation.
a. If we partition the interval [0, 1] into 3 equal sub intervals each with length
h then:
i. h =
ii. [0, 1] = [0, ]∪[ , ]∪[ , 1]
iii. There are four total points that define the partition. They are
0, ??, ??, 1.
b. If we partition the interval [3, 7] into 5 equal sub intervals each with length
h then:
i. h =
ii. [3, 7] = [3, ]∪[ , ]∪[ , ]∪[ , ]∪[ , 7]
iii. There are 6 total points that define the partition. They are
3, ??, ??, ??, ??, 7.
c. More generally, if a closed interval [a, b] contains N equal sub intervals
each with length h, then
h = ( ?? − ?? ) / ?? .
df/dx ≈ ( f(x + h) − f(x) ) / h .
In this approximation of the derivative we have simply removed the limit and
instead approximated the derivative as the slope. It should be clear that this
approximation is only good if h is small. In Figure 3.2 we see a graphical
depiction of what we’re doing to approximate the derivative. The slope of the
tangent line (∆y/∆x) is what we’re after, and a way to approximate it is to
calculate the slope of the secant line formed by looking h units forward from the
point x.
Figure 3.2: The forward difference differentiation scheme for the first derivative.
While this is the simplest and most obvious approximation for the first derivative
there is a much more elegant technique, using Taylor series, for arriving at this
approximation. Furthermore, the Taylor series technique suggests an infinite
family of other techniques.
Exercise 3.10. From Taylor’s Theorem we know that for an infinitely differen-
tiable function f(x),
f(x) = f(x₀) + ( f'(x₀) / 1! ) (x − x₀) + ( f''(x₀) / 2! ) (x − x₀)² + ( f^(3)(x₀) / 3! ) (x − x₀)³ + · · ·
What do we get if we replace every “x” in the Taylor Series with “x + h” and
replace every “x0 ” in the Taylor Series with “x?” In other words, in Figure 3.1
we want to center the Taylor series at x and evaluate the resulting series at the
point x + h.
f (x + h) =
Exercise 3.11. Solve the result from the previous problem for f 0 (x) to create
an approximation for f 0 (x) using f (x + h), f (x), and some higher order terms.
(fill in the blanks and the question marks)
f'(x) = ( f(x + h) − ??? ) / ?? + ________
Exercise 3.12. In the formula that you developed in Exercise 3.11, if we were
to drop everything after the fraction (called the remainder) we know that we
would be introducing error into our derivative computation. If h is taken to be
very small then the first term in the remainder is the largest and everything else
in the remainder can be ignored (since all subsequent terms should be extremely
small . . . pause and ponder this fact). Therefore, the amount of error we make
in the derivative computation by dropping the remainder depends on the power
of h in that first term in the remainder.
What is the power of h in the first term of the remainder from Exercise 3.11?
If the error in a differentiation scheme is proportional to the length of the
subinterval in the partition of the interval (see Figure 3.1) then we call that
scheme "first order" and say that the error is O(h).
More generally, we say that the error in a differentiation scheme is O(h^k) (read:
"big O of h^k") if and only if there is a positive constant M such that
|Error| ≤ M · h^k.
This is equivalent to saying that a differentiation scheme is "kth order." This
means that the error in using the scheme is proportional to h^k.
d. There was nothing really special in part (c) about powers of 2. Use your
spreadsheet to build similar tables for the following sequences of h:
Exercise 3.14. The following incomplete block of Python code will help to
streamline the previous problem so that you don’t need to do the computation
with a spreadsheet.
a. Comment every existing line with a thorough description.
b. Fill in the blanks in the code to perform the spreadsheet computations
from the previous problem.
c. Run the code for several forms of h
d. Do you still observe the same result that you observed in part (e) of the
previous problem?
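A minimal sketch of the kind of computation this exercise has in mind, assuming the function f(x) = sin(x)(1 − x) and the point x = 1 from Exercise 3.13, is below; the original block of code may be organized differently.
import numpy as np

f = lambda x: np.sin(x) * (1 - x)
exact = np.cos(1.0) * (1 - 1.0) - np.sin(1.0)   # f'(1) computed by hand

h = 2.0 ** (-np.arange(1, 11))                  # h = 1/2, 1/4, 1/8, ...
approx = (f(1.0 + h) - f(1.0)) / h              # forward difference at x = 1
abs_error = np.abs(approx - exact)

for hj, ej in zip(h, abs_error):
    print("h =", hj, "   absolute error =", ej)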
Exercise 3.15. Assume that f (x) is some differentiable function and that we
have calculated the value of f'(c) using the forward difference formula
f'(c) ≈ ( f(c + h) − f(c) ) / h .
Use what you learned from the previous problem to fill in the following table.
Exercise 3.16. Explain the phrase: The first derivative approximation
f'(x) ≈ ( f(x + h) − f(x) ) / h is first order.
c. Now we want to call upon this function to build the first order approx-
imation of the first derivative for some function. We’ll use the function
f (x) = sin(x) on the interval [0, 2π] with 100 sub intervals (since we know
what the answer should be). Complete the code below to call upon your
FirstDeriv() function and to plot f(x), f'(x), and the approximation of
f'(x).
f = lambda x: np.sin(x)
exact_df = lambda x: np.cos(x)
a = 0          # left end of the interval [0, 2*pi] from the problem statement
b = 2*np.pi    # right end of the interval
N = 100 # What is this?
x = np.linspace(a,b,N+1)
# What does the previous line do?
# What's up with the N+1?
Exercise 3.18. Now let's build the first derivative function in a much smarter
way – using numpy arrays in Python. Instead of looping over all of the elements
we can take advantage of the fact that everything is stored in arrays. Hence we
can just do array operations and do all of the subtractions and divisions at once
without a loop.
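For example, assuming the function values are stored in a numpy array, the whole first order computation collapses to one line of array arithmetic; the names below are just placeholders.
import numpy as np

f = lambda x: np.sin(x)
x = np.linspace(0, 2*np.pi, 101)            # the partition of [0, 2*pi]
h = x[1] - x[0]
y = f(x)
df = (y[1:] - y[:-1]) / h                   # every forward difference at once
print(np.max(np.abs(df - np.cos(x[:-1]))))  # compare to the exact derivative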
Exercise 3.19. Write code that finds a first order approximation for the first
derivative of f (x) = sin(x) − x sin(x) on the interval x ∈ (0, 15). Your script
should output two plots (side-by-side).
a. The left-hand plot should show the function in blue and the approximate
first derivative as a red dashed curve. Sample code for this problem is
given below.
import matplotlib.pyplot as plt
import numpy as np
ax[0].grid()
b. The right-hand plot should show the absolute error between the exact
derivative and the numerical derivative. You should use a logarithmic y
axis for this plot.
exact = lambda x: # write a function for the exact derivative
# There is a lot going on the next line of code ... explain it.
ax[1].semilogy(x[0:-1],abs(exact(x[0:-1]) - df))
ax[1].grid()
c. Play with the number of sub intervals, N , and demonstrate the fact that
we are using a first order method to approximate the first derivative.
Exercise 3.20. Consider again the Taylor series for an infinitely differentiable
function f (x):
f(x) = f(x₀) + ( f'(x₀) / 1! ) (x − x₀)¹ + ( f''(x₀) / 2! ) (x − x₀)² + ( f^(3)(x₀) / 3! ) (x − x₀)³ + · · ·
a. Replace the “x” in the Taylor Series with “x + h” and replace the “x0 ” in
the Taylor Series with “x” and simplify.
f (x + h) =
b. Now replace the “x” in the Taylor Series with “x − h” and replace the “x0 ”
in the Taylor Series with “x” and simplify.
f (x − h) =
d. Solve for f 0 (x) in your result from part (c). Fill in the question marks and
blanks below once you have finished simplifying.
f'(x) = ( ??? − ??? ) / (2h) + ________ .
f'(x) = ________ + O(h²).
f. Draw a picture similar to Figure 3.2 showing what this scheme is doing
graphically.
Exercise 3.21. Let’s return to the function f (x) = sin(x)(1 − x) but this time
we will approximate the first derivative at x = 1 using the formula
f'(1) ≈ ( f(1 + h) − f(1 − h) ) / (2h) .
You should already have the first derivative and the exact answer from Exercise
3.13 (if not, then go get them by hand again).
a. Fill in the table below with the derivative approximation and the absolute
error associated with each given h. You may want to use a spreadsheet to
organize your data (be sure that you’re working in radians!).
b. There was nothing really special in part (c) about powers of 2. Use your
spreadsheet to build similar tables for the following sequences of h:
helpful to include a column in your table that tracks the error reduction
factor as we decrease h.
d. What does your answer to part (e) have to do with the approximation
order of the numerical derivative method that you used?
Exercise 3.22. Assume that f (x) is some differentiable function and that we
have calculated the value of f'(c) using the centered difference formula
f'(c) ≈ ( f(c + h) − f(c − h) ) / (2h) .
Use what you learned from the previous problem to fill in the following table.
Exercise 3.24. Test the code you wrote in the previous exercise on functions
where you know the first derivative.
Exercise 3.25. The plot shown in Figure 3.3 shows the maximum absolute
error between the exact first derivative of a function f (x) and a numerical first
derivative approximation scheme. At this point we know two schemes:
f'(x) = ( f(x + h) − f(x) ) / h + O(h)
and
f'(x) = ( f(x + h) − f(x − h) ) / (2h) + O(h²).
a. Which curve in the plot matches with which method? How do you know?
b. Recreate the plot with a function of your choosing.
Figure 3.3: Maximum absolute error between the first derivative and two different
approximations of the first derivative.
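For part (b), a sketch of how such a plot can be produced (using f(x) = sin(x) on [0, 2π] as an arbitrary test function) is:
import numpy as np
import matplotlib.pyplot as plt

f = lambda x: np.sin(x)
dfexact = lambda x: np.cos(x)
hs, err_fwd, err_ctr = [], [], []
for N in [10, 20, 40, 80, 160, 320, 640]:
    x = np.linspace(0, 2*np.pi, N+1)
    h = x[1] - x[0]
    fwd = (f(x[1:]) - f(x[:-1])) / h          # first order scheme
    ctr = (f(x[2:]) - f(x[:-2])) / (2*h)      # second order scheme
    hs.append(h)
    err_fwd.append(np.max(np.abs(fwd - dfexact(x[:-1]))))
    err_ctr.append(np.max(np.abs(ctr - dfexact(x[1:-1]))))

plt.loglog(hs, err_fwd, 'b-o', hs, err_ctr, 'r-o')
plt.legend(["forward difference", "centered difference"])
plt.xlabel("h")
plt.ylabel("maximum absolute error")
plt.grid()
plt.show()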
Exercise 3.26. The goal of this problem is to use the Taylor series for f (x + h)
and f (x − h) to arrive at an approximation scheme for the second derivative
f 00 (x).
a. Add the Taylor series for f (x + h) and f (x − h) and combine all like terms.
You should notice that several terms cancel.
f(x + h) + f(x − h) = ________ .
c. If we were to drop all of the terms after the fraction on the right-hand
side of the previous result we would be introducing some error into the
derivative computation. What does this tell us about the order of the error
for the second derivative approximation scheme we just built?
Exercise 3.29. Test your second derivative code on the function f (x) =
sin(x) − x sin(x) by doing the following.
a. Find the analytic second derivative by hand.
b. Find the numerical second derivative with the code that you just wrote.
c. Find the absolute difference between your numerical second derivative and
the actual second derivative. This is point-by-point subtraction so you
should end up with a vector of errors.
d. Find the maximum of your errors.
e. Now we want to see how the code works if you change the number of points
used. Build a plot showing the value of h on the horizontal axis and the
maximum error on the vertical axis. You will need to write a loop that
gets the error for many different values of h. Finally, it is probably best to
build this plot on a log-log scale.
f. Discuss what you see. How do you see the fact that the numerical second
derivative is second order accurate?
The following summarizes the formulas that we have for derivatives thus far:
• Forward difference: f'(x) ≈ ( f(x + h) − f(x) ) / h, with error O(h).
• Centered difference: f'(x) ≈ ( f(x + h) − f(x − h) ) / (2h), with error O(h²).
• Centered second difference: f''(x) ≈ ( f(x + h) − 2f(x) + f(x − h) ) / h², with error O(h²).
The exercises at the end of this chapter contain several more derivative
approximations. We will return to this idea when we study numerical differential
equations in Chapter 5.
3.3 Integration
Now we begin our work on the second principle computation of Calculus: evalu-
ating a definite integral. Remember that a single-variable definite integral can be
interpreted as the signed area between the curve and the x axis. In this section
we will study three different techniques for approximating the value of a definite
integral.
Exercise 3.31. Consider the shaded area of the region under the function
plotted in Figure 3.4 between x = 0 and x = 2.
a. What rectangle with area 6 gives an upper bound for the area under the
curve? Can you give a better upper bound?
b. Why must the area under the curve be greater than 3?
c. Is the area greater than 4? Why/Why not?
d. Work with your partner to give an estimate of the area and provide an
estimate for the amount of error that you’re making.
Recall that the definite integral is defined via Riemann sums as
∫_a^b f(x) dx = lim_{N→∞} Σ_{j=1}^{N} f(x_j) Δx,
where N is the number of sub intervals on the interval [a, b] and Δx is the width
of each sub interval. As with differentiation, we can remove the limit and have a
decent approximation of the integral so long as N is large (or equivalently, if Δx
is small):
∫_a^b f(x) dx ≈ Σ_{j=1}^{N} f(x_j) Δx.
You are likely familiar with this approximation of the integral from Calculus. The
value of xj can be chosen anywhere within the sub interval and three common
choices are to use the left-aligned, the midpoint-aligned, and the right-aligned.
We see a depiction of this in Figure 3.5.
Clearly, the more rectangles we choose the closer the sum of the areas of the
rectangles will get to the integral.
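A minimal left-justified Riemann sum function, along the lines of the code the next exercise assumes you have written (your version may be organized differently), is:
import numpy as np

def riemann_left(f, a, b, N):
    # Left-justified Riemann sum with N equal sub intervals on [a, b].
    x = np.linspace(a, b, N+1)
    dx = (b - a) / N
    return np.sum(f(x[:-1])) * dx   # use only the left endpoints

# quick check: the integral of sin(x) on [0, 1] is 1 - cos(1)
print(riemann_left(np.sin, 0, 1, 1000), 1 - np.cos(1))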
Exercise 3.33. Consider the function f (x) = sin(x). We know the antiderivative
for this function, F (x) = − cos(x) + C, but in this question we are going to get
a sense of the order of the error when doing Riemann Sum integration.
a. Find the exact value of ∫_0^1 f(x) dx.
b. Now build a Riemann Sum approximation (using your code) with various
values of ∆x. For all of your approximation use left-justified rectangles.
Fill in the table with your results.
c. There was nothing really special about powers of 2 in part (b) of this
problem. Examine other sequences of ∆x with a goal toward answering
the question:
If we find an approximation of the integral with a fixed ∆x and find an
absolute percent error, then what would happen to the absolute percent error
if we divide ∆x by some positive constant M ?
d. What is the apparent approximation error of the Riemann Sum method
using left-justified rectangles.
Theorem 3.2. In approximating the integral ∫_a^b f(x) dx with a fixed interval
width Δx we find an absolute percent error P.
• If we use left rectangles and an interval width of Δx/M then the absolute
percent error will be approximately ________.
• If we use right rectangles and an interval width of Δx/M then the absolute
percent error will be approximately ________.
Exercise 3.36. Create a plot with the width of the subintervals on the horizontal
axis and the absolute error between your Riemann sum calculations (left, right,
and midpoint) and the exact integral for a known definite integral. Your plot
should be on a log-log scale. Based on your plot, what is the approximate order
of the error in the Riemann sum approximation?
Now use the same idea with h = Δx = 1 from Figure 3.6 to approximate the
area under the function f(x) = (1/5) x² (5 − x) between x = 1 and x = 4 using three
trapezoids.
c. From the table that you built in part (b), what do you conjecture is the
order of the approximation error for the trapezoid method?
Definition 3.3. (The Trapezoidal Rule) We want to approximate ∫_a^b f(x) dx.
One of the simplest ways is to approximate the area under the function with a
trapezoid. Recall from basic geometry that the area of a trapezoid is A = (1/2)(b₁ + b₂)h.
In terms of the integration problem we can do the following:
a. First partition [a, b] into the set {x₀ = a, x₁, x₂, . . . , x_{n−1}, x_n = b}.
b. On each part of the partition approximate the area with a trapezoid:
A_j = (1/2) [ f(x_j) + f(x_{j−1}) ] (x_j − x_{j−1})
c. Approximate the integral as
∫_a^b f(x) dx ≈ Σ_{j=1}^{n} A_j
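A minimal sketch of Definition 3.3 in code (essentially what Exercise 3.39 asks you to write; the name trap_rule is just a placeholder) is:
import numpy as np

def trap_rule(f, a, b, N):
    # Trapezoidal rule with N equal sub intervals on [a, b] (Definition 3.3).
    x = np.linspace(a, b, N+1)
    y = f(x)
    dx = (b - a) / N
    return np.sum((y[:-1] + y[1:]) / 2) * dx

# quick check: the integral of sin(x) on [0, 1] is 1 - cos(1)
print(trap_rule(np.sin, 0, 1, 1000), 1 - np.cos(1))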
Exercise 3.39. Write code to give the trapezoidal rule approximation for the
definite integral ∫_a^b f(x) dx. Test your code on functions where you know the
definite area. Then test your code on functions where you have approximated
the area by examining a plot (i.e. you have a visual estimate of the area).
Exercise 3.40. Use the code that you wrote in the previous problem to test
your conjecture about the order of the approximation error for the trapezoid
rule. Integrate the function f (x) = sin(x) from x = 0 to x = 1 with more and
more trapezoids. In each case compare to the exact answer and find the absolute
percent error. The goal is to answer the question:
If we calculate the definite integral with a fixed ∆x and get an absolute percent
error, P , then what absolute percent error will we get if we use a width of ∆x/M
for some positive number M ?
We want to find constants A₁, A₂, and A₃ such that the integral ∫_a^b f(x) dx can
be written as a linear combination of f(a), f(m), and f(b). Specifically, we want
to find constants A₁, A₂, and A₃ in terms of a, b, f(a), f(b), and f(m) such
that
∫_a^b f(x) dx = A₁ f(a) + A₂ f(m) + A₃ f(b)
is exact for all constant, linear, and quadratic functions. This would guarantee
that we have an exact integration method for all polynomials of order 2 or less
but should serve as a decent approximation if the function is not quadratic.
Exercise 3.41. Draw a picture showing what the previous two paragraphs
discussed.
b. Prove that
∫_a^b x dx = ( b² − a² ) / 2 = A₁ a + A₂ ( (a + b)/2 ) + A₃ b.
c. Prove that
∫_a^b x² dx = ( b³ − a³ ) / 3 = A₁ a² + A₂ ( (a + b)/2 )² + A₃ b².
Exercise 3.43. At this point we can see that an integral can be approximated as
∫_a^b f(x) dx ≈ ( (b − a)/6 ) [ f(a) + 4 f( (a + b)/2 ) + f(b) ]
and the technique will give an exact answer for any polynomial of order 2 or
below.
Verify the previous sentence by integrating f(x) = 1, f(x) = x and f(x) = x²
by hand on the interval [0, 1] and using the approximation formula
∫_a^b f(x) dx ≈ ( (b − a)/6 ) [ f(a) + 4 f( (a + b)/2 ) + f(b) ].
a. Use the method described above to approximate the area under the curve
f(x) = (1/5) x² (5 − x) on the interval [1, 4]. To be clear, you will be using
the points a = 1, m = 2.5, and b = 4 in the above derivation.
b. Next find the exact area under the curve g(x) = (−1/2) x² + 3.3x − 2 on
the interval [1, 4].
c. What do you notice about the two areas? What does this sample problem
tell you about the formula that we derived above?
To make the punchline of the previous exercises a bit more clear, using the
formula
∫_a^b f(x) dx ≈ ( (b − a)/6 ) ( f(a) + 4 f(m) + f(b) )
is the same as fitting a parabola to the three points (a, f (a)), (m, f (m)), and
(b, f (b)) and finding the area under the parabola exactly. That is exactly the
step up from the trapezoid rule and Riemann sums that we were after:
• Riemann sums approximate the function with constant functions,
• the trapezoid rule uses linear functions, and
• now we have a method for approximating with parabolas.
To improve upon this idea we now examine the problem of partitioning the
interval [a, b] into small pieces and running this process on each piece. This is
called Simpson’s Rule for integration.
Definition 3.4. (Simpson’s Rule) Now we put the process explained above
into a form that can be coded to approximate integrals. We call this method
Simpson’s Rule after Thomas Simpson (1710-1761) who, by the way, was a basket
weaver in his day job so he could pay the bills and keep doing math.
a. First partition [a, b] into the set {x₀ = a, x₁, x₂, . . . , x_{n−1}, x_n = b}.
b. On each part of the partition approximate the area with a parabola:
A_j = (1/6) [ f(x_j) + 4 f( (x_j + x_{j−1})/2 ) + f(x_{j−1}) ] (x_j − x_{j−1})
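One possible loop-free translation of Definition 3.4 (essentially what Exercise 3.45 asks you to build; the name simpson_rule is a placeholder) is:
import numpy as np

def simpson_rule(f, a, b, N):
    # Simpson's rule on N equal sub intervals of [a, b], following Definition 3.4.
    x = np.linspace(a, b, N+1)
    left, right = x[:-1], x[1:]
    mid = (left + right) / 2
    A = (f(left) + 4*f(mid) + f(right)) * (right - left) / 6   # one parabola per piece
    return np.sum(A)

# quick check: the integral of sin(x) on [0, 1] is 1 - cos(1)
print(simpson_rule(np.sin, 0, 1, 100), 1 - np.cos(1))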
Exercise 3.44. We have spent a lot of time over the past many pages building
approximations of the order of the error for numerical integration and differenti-
ation schemes. It is now up to you.
Build a numerical experiment that allows you to conjecture the order of the
approximation error for Simpson’s rule. Remember that the goal is to answer
the question:
If I approximate the integral with a fixed ∆x and find an absolute percent error
of P , then what will the absolute percent error be using a width of ∆x/M ?
Exercise 3.45. Write a Python function that implements Simpson’s Rule. You
should ALWAYS start by writing pseudo-code as comments in your file. You
shouldn’t need a loop in your function.
Exercise 3.46. Test your function on known integrals and approximate the
order of the error based on the mesh size.
Thus far we have three numerical approximations for definite integrals: Riemann
sums (with rectangles), the trapezoidal rule, and Simpson's rule. There are
MANY other approximations for integrals and we leave the further research to
the curious reader.
Exercise 3.47. Theorem 3.3 simply states the error rates for our three primary
integration schemes. For this problem you need to empirically verify these error
rates. Use the integration problem and exact answer
∫_0^{π/4} e^{3x} sin(2x) dx = (3/13) e^{3π/4} + 2/13
and write code that produces a log-log error plot with ∆x on the horizontal axis
and the absolute error on the vertical axis. Fully explain how the error rates
show themselves in your plot.
3.4 Optimization
3.4.1 Single Variable Optimization
You likely recall that one of the major applications of Calculus was to solve
optimization problems – find the value of x which makes some function as big
or as small as possible. The process itself can sometimes be rather challenging
due to either the modeling aspect of the problems and/or the fact that the
differentiation might be quite cumbersome. In this section we will revisit those
problems from Calculus, but our goal will be to build a numerical method for
the Calculus step in hopes to avoid the messy algebra and differentiation.
Figure: a 20 cm × 20 cm sheet of material with a square of side length x removed
from each corner.
The hard part of the single variable optimization process is often solving the
equation f'(x) = 0. We could use numerical root finding schemes to solve this
equation, but we could also potentially do better without actually finding the
derivative. In the following we propose a few numerical techniques that can
approximate the solution to these types of problems. The basic ideas are simple!
Exercise 3.49. If you were blindfolded and standing on a hill could you find
the top of the hill? (assume no trees and no cliffs . . . this isn’t supposed to be
dangerous) How would you do it? Explain your technique clearly.
Exercise 3.50. If you were blindfolded and standing in a crater on the moon
could you find the lowest point? How would you do it? Remember that you can
hop as far as you like . . . because gravity . . . but sometimes that’s not a great
thing because you could hop too far.
Exercise 3.51. Let’s turn your intuitions into algorithms. If f (x) is the function
that you are trying to maximize then turn your ideas from the previous problems
into step-by-step algorithms which could be coded. Then try out your codes on
the function
2
f (x) = e−x + sin(x2 )
to see if your algorithms can find the local maximum near x ≈ 1.14. Try to
generate several different algorithms.
Some of the most common algorithms are listed below. Read through them and
see which one(s) you ended up recreating? The intuition for these algorithms is
pretty darn simple – travel uphill if you want to maximize – travel downhill if
you want to minimize.
Exercise 3.54. Write code to implement the 1D monte carlo search algorithm
and use it to solve Exercise 3.48. Compare your answer to the analytic solution.
• Set the derivative to zero and use a numerical root finding method (such
as bisection or Newton) to find the critical point.
• Use the extreme value theorem to determine if the critical point or one of
the endpoints is the maximum (or minimum).
Exercise 3.55. Write code to implement the 1D numerical root finding opti-
mization algorithm and use it to solve Exercise 3.48. Compare your answer to
the analytic solution.
Exercise 3.56. In this problem we will compare and contrast the four methods
proposed in the previous problem.
a. What are the advantages to each of the methods proposed?
b. What are the disadvantages to each of the methods proposed?
c. Which method, do you suppose, will be faster in general? Why?
d. Which method, do you suppose, will be slower in general? Why?
should notice that the algebra and calculus for solving this problem is no longer
really a desirable way to go. Use an appropriate numerical technique to solve
this problem.
Let’s see if you can extend your intuition from single variable to multivariable.
This particular subsection is intentionally quite brief. If you want more details on
multivariable optimization it would be wise to take a full course in optimization.
Exercise 3.61. The derivative free optimization method discussed in the single
variable optimization section just said that you should pick two points and pick
the one that takes you furthest uphill.
a. Why is it insufficient to choose just two points if we are dealing with a
function of two variables? Hint: think about contour lines.
b. For a function of two variables, how many points should you use to compare
and determine the direction of “uphill?”
c. Extend your answer from part (b) to n dimensions. How many points
should we compare if we are in n dimensions and need to determine which
direction is “uphill?”
d. Back in the case of a two-variable function, you should have decided that
three points was best. Explain an algorithm for moving one point at a time
so that your three points eventually converge to a nearby local maximum.
It may be helpful to make a surface plot or a contour plot of a well-known
function just as a visual.
The code below will demonstrate how to make a contour plot.
import numpy as np
import matplotlib.pyplot as plt
xdomain = np.linspace(-4,4,100)
ydomain = np.linspace(-4,4,100)
X, Y = np.meshgrid(xdomain,ydomain)
f = lambda x, y: np.sin(x)*np.exp(-np.sqrt(x**2+y**2))
plt.contour(X,Y,f(X,Y))
plt.grid()
plt.show()
Exercise 3.62. Now let’s tackle the gradient ascent/descent algorithm. You
should recall that the gradient vector points in the direction of maximum change.
How can you use this fact to modify the gradient ascent/descent algorithm given
previously? Clearly write your algorithm so that a classmate could turn it into
code.
Exercise 3.63. How does the Monte Carlo algorithm extend to a two-variable
optimization problem? Clearly write your algorithm.
Exercise 3.64. Try out the gradient descent/ascent and Monte Carlo algorithms
on the function f(x, y) = sin(x) cos(y) + 0.1x² which has many local extrema and
no global maximum. We are not going to code the multidimensional derivative
free optimization routine in this section.
The derivative free, gradient ascent/descent, and monte carlo techniques still
have good analogues in higher dimensions. We just need to be a bit careful
since in higher dimensions there is much more room to move. Below we’ll give
the full description of the gradient ascent/descent algorithm. We don’t give the
full description of the derivative free or Monte Carlo algorithms since there are
many ways to implement them. The interested reader should see a course in
mathematical optimization or machine learning.
Take Note: If you are looking to maximize your objective function then in the
Monte-Carlo search you should examine if z is greater than your current largest
value. For gradient descent you should actually do a gradient ascent instead and
follow the positive gradient instead of the negative gradient.
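A minimal gradient descent sketch for a function of two variables, assuming a fixed step size and a centered difference approximation of the gradient (both choices you may well want to improve on), is below; it is applied here to f(x, y) = sin(x) cos(y) from Exercise 3.65.
import numpy as np

def grad_descent(f, x0, y0, step=0.1, h=1e-5, max_iter=5000):
    # Follow the negative of a centered-difference gradient with a fixed step size.
    x, y = x0, y0
    for _ in range(max_iter):
        gx = (f(x + h, y) - f(x - h, y)) / (2*h)
        gy = (f(x, y + h) - f(x, y - h)) / (2*h)
        x, y = x - step*gx, y - step*gy
        if np.sqrt(gx**2 + gy**2) < 1e-8:   # stop once the gradient is nearly zero
            break
    return x, y

f = lambda x, y: np.sin(x) * np.cos(y)
print(grad_descent(f, -1.5, 0.3))   # started near (-pi/2, 0), it should converge there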
Exercise 3.65. Functions like f(x, y) = sin(x) cos(y) have many local
extreme values which makes optimization challenging. Implement your Gradient
Descent code on this function to find the local minimum (−π/2, 0). Start
somewhere near (−π/2, 0) and show by way of example that your gradient
descent code may not converge to this particular local minimum. Why is this
important?
3.5 Calculus with numpy and scipy
3.5.1 Differentiation
There are two main tools built into the numpy and scipy libraries that do
numerical differentiation. In numpy there is the np.diff() command. In scipy
there is the scipy.misc.derivative() command.
Exercise 3.66. In the following blocks of Python code we demonstrate what the
np.diff() command does. Use these examples to give a thorough description
for what np.diff() does to a Python list.
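As a small stand-in illustration, here is a sketch of np.diff() acting on a short Python list:

import numpy as np
y = [1, 4, 9, 16, 25]
print( np.diff(y) )      # consecutive differences: [3 5 7 9]
print( np.diff(y,2) )    # differences of the differences: [2 2 2]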
Exercise 3.67. Why does the np.diff() command produce a list that is one
element shorter than the original list?
Exercise 3.68. If we have a list of x values and a list of y values for a function
y = f (x) then how do we use np.diff() to approximate the first derivative of
f (x)? What is the order of the error in the approximation?
Exercise 3.69. What does the following block of Python code do?
import numpy as np
x = np.linspace(0,1,6)
dx = x[1]-x[0]
y = x**2
print( np.diff(y,2) / dx**2 )
Exercise 3.70. Use the np.diff() command to approximate the first and
second derivatives of the function f (x) = x sin(x) − ln(x) on the domain [1, 5].
Then create a plot that shows f (x) and the approximations of f 0 (x) and f 00 (x).
import numpy as np
x = np.linspace(0,1,6)
dx = x[1]-x[0]
y = x**2
dy = 2*x
print("function values: \n",y)
Exercise 3.72. In the following code we find the first and second derivatives
of f (x) = x sin(x) − ln(x) using scipy.misc.derivative(). Notice that we’ve
chosen to take dx=1e-6 for each of the derivative computations. That may seem
like an odd choice, but there is more going on here. Try successively smaller and
smaller values for the dx parameter. What do you find? Why does it happen?
import numpy as np
import scipy.misc
import matplotlib.pyplot as plt
f = lambda x: np.sin(x)*x-np.log(x)
x = np.linspace(1,5,100) # x domain: 100 points between 1 and 5
df = scipy.misc.derivative(f,x,dx=1e-6)
df2 = scipy.misc.derivative(f,x,dx=1e-6,n=2)
plt.plot(x,f(x),'b',x,df,'r--',x,df2,'k--')
plt.legend(["f(x)","f'(x)","f''(x)"])
plt.grid()
plt.show()
3.5.2 Integration
In numpy there is a nice tool called np.trapz() that implements the trapezoidal
rule. In the following problem you will find several examples of the np.trapz()
command. Use these examples to determine how the command works to integrate
functions.
Exercise 3.73. First we'll approximate the integral ∫_{−2}^{2} x² dx. The exact answer is

∫_{−2}^{2} x² dx = [ x³/3 ]_{−2}^{2} = 16/3 = 5.3333 . . .
import numpy as np
x = np.linspace(-2,2,100)
dx = x[1]-x[0]
y = x**2
print("Approximate integral is ",np.trapz(y)*dx)
Next we'll approximate ∫_{0}^{2π} sin(x) dx. We know that the exact value is 0.
import numpy as np
x = np.linspace(0,2*np.pi,100)
dx = x[1]-x[0]
y = np.sin(x)
print("Approximate integral is ",np.trapz(y)*dx)
Pick a function and an interval for which you know the exact definite integral.
Demonstrate how to use np.trapz() on your definite integral.
Exercise 3.74. Notice in the last examples that we multiplied the result of the
np.trapz() command by dx. Why did we do this? What is the np.trapz()
command doing without the dx?
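As a hint for the question above, np.trapz() can also be handed the x values directly; comparing the two calls in the short sketch below may help.

import numpy as np
x = np.linspace(-2,2,100)
y = x**2
dx = x[1]-x[0]
print( np.trapz(y)*dx )   # the call used in the examples above
print( np.trapz(y,x) )    # an alternative: hand np.trapz() the x values directly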
Methods like Simpson's rule are called quadrature rules for integration. The function
scipy.integrate.quad() accepts a Python function (or a lambda function)
and the bounds of the definite integral. It outputs an approximation of the
integral along with an approximation of the error in the integral calculation. See
the Python code below.
import numpy as np
import scipy.integrate
f = lambda x: x**2
I = scipy.integrate.quad(f,-2,2)
print(I)
Exercise 3.75. What are the advantages and disadvantages to using the
scipy.integrate.quad() command as compared to the np.trapz() command.
Exercise 3.76. If you have data for the hourly rate at which water is being
drained from a dam and you want to find the total amount of water drained
over the course of the time in the dataset, then which of the tools that we know
would you use? Why?
3.5.3 Optimization
As you’ve seen in this section there are many tools built into numpy and scipy
that will do some of our basic numerical computations. The same is true for
numerical optimization problems. Keep in mind throughout the remainder of
this section that the whole topic of numerical optimization is still an active
area of research and there is much more to the story that what we’ll see here.
However, the Python tools that we will use are highly optimized and tend to
work quite well.
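The example code referred to in part (a) below is not reproduced here; the following is only a minimal sketch of a call to scipy.optimize.minimize() on a single-variable function, where the objective and the starting point are illustrative assumptions.

import numpy as np
from scipy.optimize import minimize
f = lambda x: (x-2)**2 + np.sin(5*x)   # an illustrative objective function
result = minimize(f, x0=1.0)           # start the search at x = 1 (an assumed guess)
print(result.x)                        # location of the minimizer that was found
print(result)                          # the full diagnostic output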
a. Implement the code above then spend some time playing around with the
minimize command to minimize more challenging functions.
b. Explain what all of the output information is from the .minimize()
command.
Exercise 3.80. Consider the function f (x) that goes exactly through the points
(0, 1), (1, 4), and (2, 13).
a. Find a function that goes through these points exactly. Be able to defend
your work.
b. Is your function unique? That is to say, is there another function out there
that also goes exactly through these points?
Exercise 3.81. Now let’s make a minor tweak to the previous problem. Let’s
say that we have the data points (0, 1.07), (1, 3.9), (2, 14.8), and (3, 26.8). Notice
that these points are close to the points we had in the previous problem, but
all of the y values have a little noise in them and we have added a fourth point.
If we suspect that a function f (x) that best fits this data is quadratic then
f (x) = ax2 + bx + c for some constants a, b, and c.
a. Plot the four points along with the function f (x) for arbitrarily chosen
values of a, b, and c.
b. Work with your partner(s) to systematically change a, b, and c so that you
get a good visual match to the data. The Python code below will help you
get started.
import numpy as np
import matplotlib.pyplot as plt
xdata = np.array([0, 1, 2, 3])
ydata = np.array([1.07, 3.9, 14.8, 26.8])
a = # conjecture a value of a
b = # conjecture a value of b
c = # conjecture a value of c
x = # build an x domain starting at 0 and going through 4
guess = a*x**2 + b*x + c
plt.plot(xdata, ydata, 'bo')    # plot the data points
plt.plot(x, guess, 'r--')       # plot your conjectured quadratic
plt.show()
Exercise 3.82. Now let’s be a bit more systematic about things from the
previous problem. Let’s say that you have a pretty good guess that b ≈ 2 and
c ≈ 0.7. We need to get a good estimate for a.
a. Pick an arbitrary starting value for a then for each of the four points find
the error between the predicted y value and the actual y value. These
errors are called the residuals.
b. Square all four of your errors and add them up. (Pause, ponder, and
discuss: why are we squaring the errors before we sum them?)
c. Now change your value of a to several different values and record the sum
of the square errors for each of your values of a. It may be worth while to
use a spreadsheet to keep track of your work here.
d. Make a plot with the value of a on the horizontal axis and the value of the
sum of the square errors on the vertical axis. Use your plot to defend the
optimal choice for a.
Exercise 3.83. We’re going to revisit part (c) of the previous problem. Write a
loop that tries many values of a in very small increments and calculates the sum
of the squared errors. The following partial Python code should help you get
started. In the resulting plot you should see a clear local minimum. What does
that minimum tell you about solving this problem?
import numpy as np
import matplotlib.pyplot as plt
xdata = np.array([0, 1, 2, 3])
ydata = np.array([1.07, 3.9, 14.8, 26.8])
b = 2
c = 0.75
A = # give a numpy array of values for a
SumSqRes = [] # this is storage for the sum of the sq. residuals
for a in A:
    guess = a*xdata**2 + b*xdata + c
    residuals = # write code to calculate the residuals
    SumSqRes.append( ??? ) # calculate the sum of the squ. residuals
plt.plot(A,SumSqRes,'r*')
plt.grid()
plt.xlabel('Value of a')
plt.ylabel('Sum of squared residuals')
plt.show()
Now let’s formalize the process that we’ve described in the previous problems.
b. Calculate the squared error between the data point and the prediction from
   the function f(x):

   error for the point x_i:  e_i = (y_i − f(x_i))².

   Note that squaring the error has the advantages of removing the sign,
   accentuating errors larger than 1, and decreasing errors that are less than
   1.
c. As a measure of the total error between the function and the data, sum
   the squared errors:

   sum of square errors = Σ_{i=1}^{n} (y_i − f(x_i))².
Exercise 3.84. In 3.10 the last step is a bit vague. That was purposeful since
there are many techniques that could be used to minimize the sum of the square
errors. However, if we just think about the sum of the squared residuals as a
function then we can apply scipy.optimize.minimize() to that function in
order to return the values of the parameters that best minimize the sum of the
squared residuals. The following blocks of Python code implement the idea in a
very streamlined way. Go through the code and comment each line to describe
exactly what it does.
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import minimize
xdata = np.array([0, 1, 2, 3])
ydata = np.array([1.07, 3.9, 14.8, 26.8])
def SSRes(parameters):
    # In the next line of code we want to build our
    # quadratic approximation y = ax^2 + bx + c
    # We are sending in a list of parameters so
    # a = parameters[0], b = parameters[1], and c = parameters[2]
    yapprox = parameters[0]*xdata**2 + \
              parameters[1]*xdata + \
              parameters[2]
    residuals = np.abs(ydata-yapprox)
    return np.sum(residuals**2)
BestParameters = minimize(SSRes,[2,2,0.75])
print("The best values of a, b, and c are: \n",BestParameters.x)
# If you want to print the diagnostics then use the line below:
# print("The minimization diagnostics are: \n",BestParameters)
plt.plot(xdata,ydata,'bo',markersize=5)
x = np.linspace(0,4,100)
y = BestParameters.x[0]*x**2 + \
BestParameters.x[1]*x + \
BestParameters.x[2]
plt.plot(x,y,'r--')
plt.grid()
plt.xlabel('x')
plt.ylabel('y')
plt.title('Best Fit Quadratic')
plt.show()
Exercise 3.85. With a partner choose a function and then choose 10 points on
that function. Add a small bit of error into the y-values of your points. Give
your 10 points to another group. Upon receiving your new points:
• Plot your points.
• Make a guess about the basic form of the function that might best fit the
data. Your general form will likely have several parameters (just like the
quadratic had the parameters a, b, and c).
• Modify the code from above to find the best collection of parameters
to minimize the sum of the squares of the residuals between your function
and the data.
• Plot the data along with your best fit function. If you are not satisfied
with how it fit then make another guess on the type of function and repeat
the process.
• Finally, go back to the group who gave you your points and check your
work.
Exercise 3.86. For each dataset associated with this exercise give a functional
form that might be a good model for the data. Be sure to choose the most general
form of your guess. For example, if you choose “quadratic” then your functional
guess is f (x) = ax2 + bx + c, if you choose “exponential” then your functional
guess should be something like f (x) = aeb(x−c) + d, or if you choose “sinusoidal”
then your guess should be something like f (x) = a sin(bx) + c cos(dx) + e. Once
you have a guess of the function type create a plot showing your data along
with your guess for a reasonable set of parameters. Then write a function that
leverages scipy.optimize.minimize() to find the best set of parameters so that
your function best fits the data. Note that if scipy.optimize.minimize() does
not converge then try the alternative scipy function scipy.optimize.fmin().
Also note that you likely need to be very close to the optimal parameters to get
the optimizer to work properly.
You can load the data with the following script.
import numpy as np
import pandas as pd
URL1 = 'https://raw.githubusercontent.com/NumericalMethodsSullivan'
URL2 = '/NumericalMethodsSullivan.github.io/master/data/'
URL = URL1+URL2
datasetA = np.array( pd.read_csv(URL+'Exercise3_datafit2.csv') )
datasetB = np.array( pd.read_csv(URL+'Exercise3_datafit3.csv') )
datasetC = np.array( pd.read_csv(URL+'Exercise3_datafit4.csv') )
# Exercise3_datafit1.csv,
# Exercise3_datafit2.csv,
# Exercise3_datafit3.csv
As a nudge in the right direction, in the left-hand pane of Figure 3.13 the
function appears to be exponential. Hence we should choose a function of the
form f (x) = aeb(x−c) + d. Moreover, we need to pick good approximations of the
parameters to start the optimization process. In the left-hand pane of Figure
3.13 the data appears to start near x = 1970 so our initial guess for c might
be c ≈ 1970. To get initial guesses for a, b, and d we can observe that the
expected best fit curve will approximately go through the points (1970, 15000),
(1990, 40000), and (2000, 75000). With this information we get the equations
a+d ≈ 15000, ae20b +d ≈ 40000 and ae30b +d ≈ 75000 and work to get reasonable
approximations for a, b, and d to feed into the scipy.optimize.minimize()
command.
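One way (a sketch, not the only route) to turn those three approximate equations into starting values for a, b, and d is to hand them to a root finder such as scipy.optimize.fsolve; the initial guesses below are rough assumptions.

import numpy as np
from scipy.optimize import fsolve
def equations(p):
    a, b, d = p
    return [a + d - 15000,
            a*np.exp(20*b) + d - 40000,
            a*np.exp(30*b) + d - 75000]
a, b, d = fsolve(equations, [10000, 0.05, 5000])   # rough starting guesses (assumed)
print(a, b, d)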
Figure 3.13: Raw data for least squares function matching problems.
3.7 Exercises
3.7.1 Algorithm Summaries
Exercise 3.87. Starting from Taylor series prove that
f'(x) ≈ ( f(x + h) − f(x) ) / h
is a first-order approximation of the first derivative of f (x). Clearly describe
what “first-order approximation” means in this context.
f'(x) ≈ ( f(x + h) − f(x − h) ) / (2h)
is a second-order approximation of the first derivative of f (x). Clearly describe
what “second-order approximation” means in this context.
f''(x) ≈ ( f(x + h) − 2f(x) + f(x − h) ) / h²
is a second-order approximation of the second derivative of f (x). Clearly describe
what “second-order approximation” means in this context.
Exercise 3.93. Explain in clear language how the derivative free optimization
method works on a single-variable function.
Exercise 3.95. Explain in clear language how the Monte Carlo search optimiza-
tion method works on a single-variable function.
Exercise 3.96. Explain in clear language how you find the optimal set of
parameters given a set of data and a proposed general function type.
Exercise 3.98. Write a function that accepts a list of (x, y) ordered pairs
from a spreadsheet and returns a list of (x, y) ordered pairs for a first order
approximation of the first derivative of the underlying function. Create a test
spreadsheet file and a test script that have graphical output showing that your
function is finding the correct derivative.
Exercise 3.99. Write a function that accepts a list of (x, y) ordered pairs from a
spreadsheet or a *.csv file and returns a list of (x, y) ordered pairs for a second
order approximation of the second derivative of the underlying function. Create
a test spreadsheet file and a test script that have graphical output showing that
your function is finding the correct derivative.
time (min)             0     7    19    25    38    47    55
flow rate (gal/sec)   316   309   296   298   305   314   322
You can download the data directly from the textbook’s github page with the
code below.
import numpy as np
import pandas as pd
URL1 = 'https://raw.githubusercontent.com/NumericalMethodsSullivan'
URL2 = '/NumericalMethodsSullivan.github.io/master/data/'
URL = URL1+URL2
data = np.array( pd.read_csv(URL+'Exercise3_waterflow.csv') )
# Exercise3_waterflow.csv
b. The department of transportation finds that the rate at which cars cross a
bridge can be approximated by the function
f(t) = 22.8 / ( 3.5 + 7(t − 1.25)⁴ ),
where t = 0 at 4pm, and is measured in hours, and f (t) is measured in
cars per minute. Estimate the total number of cars that cross the bridge
between 4 and 6pm. Make sure that your estimate has an error less than
5% and provide sufficient mathematical evidence of your error estimate.
Exercise 3.104. Numerically integrate each of the functions over the interval
[−1, 2] with an appropriate technique and verify mathematically that your
numerical integral is correct to 10 decimal places. Then provide a plot of the
function along with its numerical first derivative.
a. f(x) = x / (1 + x⁴)
Time (sec)       0    10   20   30   40   50   60   70   80   90
Speed (ft/sec)   34   32   29   33   37   40   41   36   38   39
Exercise 3.106. For each of the following functions write code to numerically
approximate the local maximum or minimum that is closest to x = 0. You may
want to start with a plot of the function just to get a feel for where the local
extreme value(s) might be.
a. f(x) = x / (1 + x⁴) + sin(x)

b. g(x) = (x − 1)³ · (x − 2)² + e^(−0.5·x)
Exercise 3.107. Go back to your old Calculus textbook or homework and find
your favorite optimization problem. State the problem, create the mathematical
model, and use any of the numerical optimization techniques in this chapter to
get an approximate solution to the problem.
Exercise 3.108. In the code below you can download several sets of noisy data
from measurements of elementary single variable functions.
a. Make a hypothesis about which type of function would best model the
data. Be sure to choose the most general (parameterized) form of your
function.
b. Use appropriate tools to find the parameters for the function that best fits
the data. Report your sum of squared residuals for each function.
The functions that you propose must be continuous functions.
import numpy as np
import pandas as pd
URL1 = 'https://raw.githubusercontent.com/NumericalMethodsSullivan'
URL2 = '/NumericalMethodsSullivan.github.io/master/data/'
URL = URL1+URL2
datasetA = np.array( pd.read_csv(URL+'Exercise3_datafit5.csv') )
datasetB = np.array( pd.read_csv(URL+'Exercise3_datafit6.csv') )
datasetC = np.array( pd.read_csv(URL+'Exercise3_datafit7.csv') )
datasetD = np.array( pd.read_csv(URL+'Exercise3_datafit8.csv') )
3.8 Projects
In this section we propose several ideas for projects related to numerical Calculus.
These projects are meant to be open ended, to encourage creative mathematics,
to push your coding skills, and to require you to write and communicate your
mathematics. Take the time to read Appendix B before you write your final
solution.
For this project we will be analyzing the galaxy “ngc 1275.” The black hole at
the center of this galaxy is often referred to as the “Galactic Spaghetti Monster”
since the magnetic field “sustains a mammoth network of spaghetti-like gas
filaments around it.” You can download the data file associated with this project
with the following Python code.
import numpy as np
import pandas as pd
URL1='https://raw.githubusercontent.com/NumericalMethodsSullivan'
URL2='/NumericalMethodsSullivan.github.io/master/data/'
URL = URL1+URL2
ngc1275 = np.array( pd.read_csv(URL+'ngc1275.csv') )
# ngc1275.csv
In the data you will see the spectral data measuring the light intensity from ngc
1275 at several different wavelengths (measured in Angstroms). You will notice
in this data set that there are several emission lines at various wavelengths. Of
particular interest are the peaks near 3800 Angstroms, 5100 Angstroms, 6400
Angstroms, and the two peaks around 6700 Angstroms. The data set contains
1,727 data points at different wavelengths. Your first job will be to transform
the wavelength data into frequency data using the relationship λ = c/f (so f = c/λ),
where λ is the wavelength, c is the speed of light, and f is the frequency (measured
in Hertz). Be sure to double check the units. Given the inverse relationship
between frequency and wavelength you should see the emission lines flip to the
other side of the plot (right-to-left or left-to-right).
The strength of each emission line (in W/m2 ) is defined as the relative intensity
of each peak across the associated frequencies. Note that you are not interested
in the intensity of the continuous spectrum – just the peaks. That is to say
that you are only interested in the area above the background curve and the
background noise.
Your primary task is to develop a process for analyzing data sets like this so as
to determine the strength of each emission line. You must demonstrate your
process on this particular data set, but your process must be generalizable to any
similar data set. Your process must clearly determine the strength of peaks in
data sets like this and you must apply your procedure to determine the strength
of each of these four lines with an associated margin of error. Keep in mind that
you will want to first develop a method for removing the background noise.
Finally, the double peak near 6700 Angstroms needs to be handled with care:
the strength of each emission line is only the integral over one peak, not two, so
you’ll need to determine a way to separate these peaks.
Figure 3.14 depicts what this looks like when we zoom in to a pixel and
its immediate neighbors. The pixel labeled G[i,j] is the pixel at which
we want to evaluate the gradient, and the surrounding pixels are labeled
by their indices relative to [i,j].
well as give a list of pros and cons for using the new numerical gradient for
edge detection based on what you see in your images. As an example, you
could use a centered difference scheme that looks two pixels away instead
of at the immediate neighboring pixels
f'(x) ≈ ( ??? · f(x − 2) + ??? · f(x + 2) ) / ??? .
Of course you would need to determine the coefficients in this approximation
scheme.
Another idea could use a centered difference scheme that uses pixels that
are immediate neighbors AND pixels that are two units away
f'(x) ≈ ( ??? · f(x − 2) + ??? · f(x − 1) + ??? · f(x + 1) + ??? · f(x + 2) ) / ??? .
In any case, you will need to use Taylor Series to derive coefficients in
the formulas for the derivatives as well as the order of the error. There
are many ways to approximate the first derivatives so be creative. In
your exploration you are not restricted to using just the first derivative.
There could be some argument for using the second derivatives and/or the
Hessian matrix of the gray scale image function G(x, y) and using some
function of the concavity as a means of edge detection. Explore and have
fun!
The following code will allow you to read an image into Python as an np.array().
import numpy as np
import matplotlib.pyplot as plt
from matplotlib import image
I = np.array(image.imread('ImageName.jpg'))
plt.imshow(I)
plt.axis("off")
plt.show()
You should notice that the image, I, is a three dimensional array. The three
layers are the red, green, and blue channels of the image. To flatten the image
to gray scale you can apply the rule
grayscale value = 0.3Red + 0.59Green + 0.11Blue.
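A minimal sketch of applying that rule to the image array I loaded above (assuming the usual red, green, blue channel ordering returned by matplotlib) is:

import numpy as np
from matplotlib import image
I = np.array(image.imread('ImageName.jpg'))
# weighted sum of the red, green, and blue channels
G = 0.3*I[:,:,0] + 0.59*I[:,:,1] + 0.11*I[:,:,2]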
The output should be a 2 dimensional numpy array which you can show with the
following Python code.
plt.imshow(G, cmap='gray') # "cmap" stands for "color map"
plt.axis("off")
plt.show()
Figure 3.15 shows the result of different threshold values applied to the simplest
numerical gradient computations. The image was taken by the author.
Figure 3.15: Edge detection using different thresholds for the value of the gradient
on the grayscale image
Chapter 4
Linear Algebra
• The scalar field of real numbers is given as R and the scalar field of
complex numbers is given as C.
Example 4.1. (numpy Arrays) In Python you can build a list using square
brackets such as [1,2,3]. This is called a “Python list” and is NOT a vector in
the way that we think about it mathematically. It is simply an ordered collection
of objects. To build mathematical vectors in Python we need to use numpy arrays
with np.array(). For example, the vector
u = [1, 2, 3]ᵀ can be built with np.array([1,2,3]).
Notice that Python defines the vector u as a matrix without a second dimension.
You can see that in the following code.
import numpy as np
u= np.array([1,2,3])
print("The length of the u vector is \n",len(u))
print("The shape of the u vector is \n",u.shape)
Moreover, we can extract the shape, the number of rows, and the number of
columns of A using the A.shape command. To be a bit more clear, consider the code below.
import numpy as np
A = np.matrix([[1,2,3],[4,5,6]])
print("The shape of the A matrix is \n",A.shape)
print("Number of rows in A is \n",A.shape[0])
print("Number of columns in A is \n",A.shape[1])
Example 4.3. (Row and Column Vectors in Python) You can more
specifically build row or column vectors in Python using the np.matrix()
command and then only specifying one row or column. For example, if you want
the vectors
u = [1, 2, 3]ᵀ    and    v = [ 4  5  6 ]
Alternatively, if you want to define a column vector you can define a row vector
(since there are far fewer brackets to keep track of) and then transpose the matrix
to turn it into a column.
import numpy as np
u = np.matrix([[1,2,3]])
u = u.transpose()
print("The column vector u is \n",u)
Example 4.4. (Matrix Indexing) Python indexes all arrays, vectors, lists,
and matrices starting from index 0. Let’s get used to this fact.
Consider the matrix A defined in the previous problem. Mathematically we
know that the entry in row 1 column 1 is a 1, the entry in row 1 column 2 is a 2,
and so on. However, with Python we need to shift the way that we enumerate
the rows and columns of a matrix. Hence we would say that the entry in row 0
column 0 is a 1, the entry in row 0 column 1 is a 2, and so on.
Mathematically we can view all Python matrices as follows. If A is an n × n
matrix then

        [ A0,0      A0,1      A0,2      ···   A0,n−1     ]
    A = [ A1,0      A1,1      A1,2      ···   A1,n−1     ]
        [   ⋮          ⋮         ⋮        ⋱       ⋮       ]
        [ An−1,0    An−1,1    An−1,2    ···   An−1,n−1   ]
Exercise 4.1. Build your own matrix in Python and practice choosing individual
entries from the matrix.
Example 4.5. (Matrix Slicing) The last thing that we need to be familiar with
is slicing a matrix. The term “slicing” generally refers to pulling out individual
rows, columns, entries, or blocks from a list, array, or matrix in Python. Examine
the code below to see how to slice parts out of a numpy matrix.
import numpy as np
A = np.matrix([[1,2,3],[4,5,6],[7,8,9]])
print(A)
print("The first column of A is \n",A[:,0])
print("The second row of A is \n",A[1,:])
print("The top left 2x2 sub matrix of A is \n",A[:-1,:-1])
print("The bottom right 2x2 sub matrix of A is \n",A[1:,1:])
u = np.array([1,2,3,4,5,6])
print("The first 3 entries of the vector u are \n",u[:3])
print("The last entry of the vector u is \n",u[-1])
print("The last two entries of the vector u are \n",u[-2:])
Exercise 4.2. Define the matrix A and the vector u in Python. Then perform
all of the tasks below.
        [  1    3    5    7 ]                 [ 10 ]
    A = [  2    4    6    8 ]     and     u = [ 20 ]
        [ −3   −2   −1    0 ]                 [ 30 ]
a. Print the matrix A, the vector u, the shape of A, and the shape of u.
b. Print the first column of A.
c. Print the first two rows of A.
d. Print the first two entries of u.
e. Print the last two entries of u.
f. Print the bottom left 2 × 2 submatrix of A.
g. Print the middle two elements of the middle row of A.
Now let’s get the formal definitions of the dot product on the table.
Definition 4.1. (The Dot Product) The dot product of two vectors u, v ∈ Rⁿ is

u · v = Σ_{j=1}^{n} u_j v_j = u_1 v_1 + u_2 v_2 + · · · + u_n v_n.
Alternatively, you may also recall that the dot product of two vectors is given
geometrically as

u · v = ‖u‖ ‖v‖ cos θ,

where ‖u‖ and ‖v‖ are the magnitudes (or lengths) of u and v respectively, and
θ is the angle between the two vectors. In physical applications the dot product
is often used to find the angle between two vectors (e.g. between two forces).
Hence, the last form of the dot product is often rewritten as

θ = cos⁻¹( (u · v) / (‖u‖ ‖v‖) ).
Exercise 4.4. Verify that √(u · u) indeed gives the Pythagorean Theorem for u ∈ R².
Exercise 4.5. Our task now is to write a Python function that accepts two
vectors (defined as numpy arrays) and returns the dot product. Write this code
without the use of any loops.
import numpy as np
def myDotProduct(u,v):
    return # the dot product formula uses a product inside a sum.
Some test code is given below. Test your code on other vectors. Then implement an error catch
into your code to catch the case where the two input vectors are not the same
size. You will want to use the len() command to find the length of the vectors.
¹ You should also note that ‖u‖ = √(u · u) is not the only definition of distance. More
generally, if you let ⟨u, v⟩ be an inner product for u and v in some vector space V then
‖u‖ = √⟨u, u⟩. In most cases in this text we will be using the dot product as our preferred
inner product so we won't have to worry much about this particular natural extension of the
definition of the length of a vector.
u = np.array([1,2,3])
v = np.array([4,5,6])
myDotProduct(u,v)
Exercise 4.7. Try sending Python lists instead of numpy arrays into your
myDotProduct function. What happens? Why does it happen? What is the
cautionary tale here? Modify your myDotProduct() function one more time so
that it starts by converting the input vectors into numpy arrays.
u = [1,2,3]
v = [4,5,6]
myDotProduct(u,v)
Exercise 4.8. The numpy library in Python has a built-in command for doing
the dot product: np.dot(). Test the np.dot() command and be sure that it
does the same thing as your myDotProduct() function.
Now that you’ve practiced the algorithm for matrix multiplication we can
formalize the definition and then turn the algorithm into a Python function.
A moment’s reflection reveals that each entry in the matrix product is actually
a dot product,
(Entry in row i column j of AB) = (Row i of matrix A)·(Column j of matrix B) .
Exercise 4.10. The definition of matrix multiplication above contains the cryptic
phrase a moment’s reflection reveals that each entry in the matrix product is
actually a dot product. Let’s go back to the matrices A and B defined above and
re-evaluate the matrix multiplication algorithm to make sure that you see each
entry as the end result of a dot product.
We want to find the product of matrices A and B using dot products.
        [ 1   2 ]
    A = [ 3   4 ]          B = [  7    8    9 ]
        [ 5   6 ]              [ 10   11   12 ]
a. Why will the product AB clearly be a 3 × 3 matrix?
b. When we do matrix multiplication we take the product of a row from the
first matrix times a column from the second matrix . . . at least that’s how
many people think of it when they perform the operation by hand.
i. The rows of A can be written as the vectors
   a0 = [ 1   2 ]
   a1 =
   a2 =
ii. The columns of B can be written as the vectors
   b0 = [  7 ]
        [ 10 ]
   b1 =
   b2 =
Partial code is given below. Fill in all of the details and give ample comments
showing what each line does.
import numpy as np
def myMatrixMult(A,B):
    # Get the shapes of the matrices A and B.
    # Then write an if statement that catches size mismatches
    # in the matrices. Next build a zeros matrix that is the
    # correct size for the product of A and B.
    AB = ???
    # AB is a zeros matrix that will be filled with the values
    # from the product
    #
    # Next we do a double for-loop that loops through all of
    # the indices of the product
    for i in range(n): # loop over the rows of AB
        for j in range(m): # loop over the columns of AB
            # use the np.dot() command to take the dot product
            AB[i,j] = ???
    return AB
Use the following test code to determine if you actually get the correct matrix
product out of your code.
A = np.matrix([[1,2],[3,4],[5,6]])
B = np.matrix([[7,8,9],[10,11,12]])
AB = myMatrixMult(A,B)
print(AB)
Now that you’ve been through the exercise of building a matrix multiplication
function we will admit that using it inside larger coding problems would be a
bit cumbersome (and perhaps annoying). It would be nice to just type * and
have Python just know that you mean to do matrix multiplication. The trouble
is that there are many different versions of multiplication and any programming
language needs to be told explicitly which type they’re dealing with. This is
where numpy and np.matrix() come in quite handy.
(Note that the product AB does not make sense under the mathematical definition
of matrix multiplication, but it does make sense in terms of element-by-element
(“naive”) multiplication.)
import numpy as np
A = [[1,2],[3,4],[5,6]]
2 You might have thought that naive multiplication was a much more natural way to do
matrix multiplication when you first saw it. Hopefully now you see the power in the definition
of matrix multiplication that we actually use. If not, then I give you this moment to ponder
that (a) matrix multiplication is just a bunch of dot products, and (b) dot products can be
seen as projections. Hence, matrix multiplication is really just a projection of the rows of A
onto the columns of B. This has much more rich geometric flavor than naive multiplication.
B = [[7,8],[9,10],[11,12]]
np.multiply(A,B)
The key takeaways for doing matrix multiplication in Python are as follows:
• If you are doing linear algebra in Python then you should define vectors
with np.array() and matrices with np.matrix().
• If your matrices are defined with np.matrix() then * does regular matrix
multiplication and np.multiply() does element-by-element multiplication.
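As a small sketch contrasting the two behaviors (the matrices below are sized so that the true matrix product is defined):

import numpy as np
A = np.matrix([[1,2],[3,4],[5,6]])      # a 3x2 matrix
B = np.matrix([[7,8,9],[10,11,12]])     # a 2x3 matrix
print( A*B )                # * on np.matrix objects is true matrix multiplication (a 3x3 result)
print( np.multiply(A,A) )   # element-by-element multiplication (squares each entry of A)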
If you need a nudge to get started then jump ahead to the next problem.
ii. Now scale row 1 by something and add it to row 0 so that the entry
in row 0 column 1 becomes a 0.
iii. Next scale row 1 by something and add it to row 2 so that the entry
in row 2 column 1 becomes a 0.
[ 1    0   −1   −5/3 ]
[ 0    1    2    4/3 ]
[ 0    0   −9    3   ]
Exercise 4.17. Summarize the process for doing Gaussian Elimination to solve
a square system of linear equations.
What if we just row reduce until the system is simple enough to solve
by hand?
That’s what the next several exercises are going to lead you to. Our goal here is to
develop an algorithm that is fast to implement on a computer and simultaneously
performs the same basic operations as row reduction for solving systems of linear
equations.
L1 A =
L2 (L1 A) =
Pure insanity!!
c. Now let’s say that you want to make the entry in row 2 column 1 into a 0
by scaling row 1 by something and then adding to row 2. Determine what
the scalar would be and then determine which matrix, call it L3 , would do
the trick so that L3 (L2 L1 A) would be the next row reduced step.
     [ 1           ]
L3 = [       1     ]
     [            1 ]

L3 (L2 L1 A) =
Exercise 4.19. Apply the same idea from the previous problem to do the first
three steps of row reduction to the matrix
    [  2    6    9 ]
A = [ −6    8    1 ]
    [  2    2   10 ]
Exercise 4.20. Now let’s make a few observations about the two previous
problems.
a. What will multiplying A by a matrix of the form

   [ 1   0   0 ]
   [ c   1   0 ]
   [ 0   0   1 ]

   do?
b. What will multiplying A by a matrix of the form

   [ 1   0   0 ]
   [ 0   1   0 ]
   [ c   0   1 ]

   do?
c. What will multiplying A by a matrix of the form

   [ 1   0   0 ]
   [ 0   1   0 ]
   [ 0   c   1 ]

   do?
d. More generally: If you wanted to multiply row j of an n × n matrix by c
and add it to row k, that is the same as multiplying by what matrix?
Exercise 4.21. After doing all of the matrix products, L3 L2 L1 A, the resulting
matrix will have zeros in the entire lower triangle. That is, all of the nonzero
entries of the resulting matrix will be on the main diagonal or above. We call
this matrix U , for upper triangular. Hence, we have formed a matrix
L3 L2 L1 A = U
Exercise 4.22. It would be nice, now, if the inverses of the L matrices were
easy to find. Use np.linalg.inv() to directly compute the inverse of L1 , L2 ,
and L3 for each of the example matrices. Then complete the statement: If Lk is
an identity matrix with some nonzero c in row i and column j then Lk⁻¹ is what matrix?
and we defined

     [  1   0   0 ]          [  1   0   0 ]              [ 1    0   0 ]
L1 = [ −4   1   0 ] ,   L2 = [  0   1   0 ] ,   and L3 = [ 0    1   0 ] .
     [  0   0   1 ]          [ −7   0   1 ]              [ 0   −2   1 ]

A = L1⁻¹ L2⁻¹ L3⁻¹ U.
Throughout all of the preceding exercises, our final result is that we have
factored the matrix A into the product of a lower triangular matrix and an upper
triangular matrix. Stop and think about that for a minute . . . we just factored
a matrix!
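The myLU code referred to in the surrounding exercises is not written out here; the following is only a minimal sketch of an LU factorization without row swaps, in which L starts as an identity matrix and U starts as a copy of A (the name myLU_sketch is a placeholder):

import numpy as np
def myLU_sketch(A):
    # factor A = L U with no row swaps
    n = A.shape[0]
    L = np.matrix( np.eye(n) )
    U = np.matrix( A, dtype=float )
    for j in range(n-1):                  # loop over the columns
        for i in range(j+1, n):           # loop over the rows below the diagonal
            L[i,j] = U[i,j] / U[j,j]      # the multiplier that zeros out U[i,j]
            U[i,:] = U[i,:] - L[i,j]*U[j,:]
    return L, U

A = np.matrix([[1.,2,3],[4,5,6],[7,8,0]])   # the matrix from the surrounding row reduction exercises
L, U = myLU_sketch(A)
print( np.linalg.norm(A - L*U) )            # should be numerically zero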
Give details of what happens at every step of the algorithm. I’ll get you started.
• n=3, L starts as an identity matrix of the correct size, and U starts as a
copy of A.
• Start the outer loop: j=0: (j is the counter for the column)
– Start the inner loop: i=1: (i is the counter for the row)
3 Take careful note here. We have actually just built a special case of the LU decomposition.
Remember that in row reduction you are allowed to swap the order of the rows, but in our LU
algorithm we don’t have any row swaps. The version of LU with row swaps is called LU with
partial pivoting. We won't build the full partial pivoting algorithm in this text but feel free to
look it up. The wikipedia page is a decent place to start. What you’ll find is that there are
indeed many different versions of the LU decomposition.
Exercise 4.25. Apply your new myLU code to other square matrices and verify
that indeed A is the product of the resulting L and U matrices. You can produce
a random matrix with np.random.randn(n,n) where n is the number of rows
and columns of the matrix. For example, np.random.randn(10,10) will produce
a random 10 × 10 matrix with entries chosen from the normal distribution with
center 0 and standard deviation 1. Random matrices are just as good as any
other when testing your algorithm.
Solving Ax = b with A = LU amounts to solving two triangular systems: an upper
triangular system, U x = y, and a lower triangular system, L y = b.
In the following exercises we will devise algorithms for solving triangular systems.
After we know how to work with triangular systems we’ll put all of the pieces
together and show how to leverage the LU decomposition and the solution
techniques for triangular systems to quickly and efficiently solve linear systems.
Exercise 4.26. Outline a fast algorithm (without formal row reduction) for
Exercise 4.28. Work with your partner(s) to apply the lsolve() code to the
lower triangular system
[ 1   0   0 ] [ y0 ]   [ 1 ]
[ 4   1   0 ] [ y1 ] = [ 0 ]
[ 7   2   1 ] [ y2 ]   [ 2 ]
by hand. It is incredibly important to implement numerical linear algebra
routines by hand a few times so that you truly understand how everything is
being tracked and calculated.
I’ll get you started.
• Start: i=0:
– y[0]=1 since b[0]=1.
– The next for loop does not start since range(0) has no elements
(stop and think about why this is).
• Next step in the loop: i=1:
– y[1] is initialized as 0 since b[1]=0.
– Now we enter the inner loop at j=0:
∗ What does y[1] become when j=0?
– Does j increment to anything larger?
• Finally we increment i to i=2:
– What does y[2] get initialized to?
– Enter the inner loop at j=0:
∗ What does y[2] become when j=0?
– Increment the inner loop to j=1:
∗ What does y[2] become when j=1?
• Stop
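The lsolve() code itself (Definition 4.5) is not written out here, but a minimal forward-substitution sketch consistent with the trace above might look like the following (the name lsolve_sketch is a placeholder, and the diagonal of L is assumed to be all ones, as in our LU factorization):

import numpy as np
def lsolve_sketch(L, b):
    # forward substitution for a lower triangular system L y = b
    n = len(b)
    y = np.zeros(n)
    for i in range(n):              # work from the top row down
        y[i] = b[i]                 # start with the right-hand side entry
        for j in range(i):          # subtract off the terms that are already known
            y[i] = y[i] - L[i,j]*y[j]
    return y

L = np.matrix([[1.,0,0],[4,1,0],[7,2,1]])
b = np.array([1., 0, 2])
print( lsolve_sketch(L,b) )         # gives y = [1, -4, 3] for the system above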
Exercise 4.29. Copy the code from Definition 4.5 into a Python function but
in your code write a comment on every line stating what it is doing. Write a test
script that creates a lower triangular matrix of the correct form and a right-hand
side b and solve for y. Test your code by giving it a large lower triangular system.
Now that we have a method for solving lower triangular systems, let’s build
a similar method for solving upper triangular systems. The merging of lower
and upper triangular systems will play an important role in solving systems of
equations.
Exercise 4.30. Outline a fast algorithm (without formal row reduction) for
solving the upper triangular system
[ 1    2    3 ] [ x0 ]   [  1 ]
[ 0   −3   −6 ] [ x1 ] = [ −4 ]
[ 0    0   −9 ] [ x2 ]   [  3 ]
The most natural algorithm that most people devise here is called backward
substitution. Notice that in our upper triangular matrix we do not have a
diagonal containing all ones.
Exercise 4.32. Now we will work through the backward substitution algorithm
to help fill in the blanks in the code. Consider the upper triangular system
[ 1    2    3 ] [ x0 ]   [  1 ]
[ 0   −3   −6 ] [ x1 ] = [ −4 ]
[ 0    0   −9 ] [ x2 ]   [  3 ]
Work the code from Definition 4.6 to solve the system. Keep track of all of the
indices as you work through the code. You may want to work this problem in
conjunction with the previous two problems to unpack all of the parts of the
backward substitution algorithm.
I’ll get you started.
• In your backward substitution algorithm you should have started with the
last row, therefore the outer loop starts at n-1 and reads backward to 0.
(Why are we starting at n-1 and not n?)
• Outer loop: i=2:
– We want to solve the equation −9x2 = 3 so the clear solution is to
divide by −9. In code this means that x[2]=y[2]/U[2,2].
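Similarly, Definition 4.6 is not written out here; a minimal backward-substitution sketch (the name usolve_sketch is a placeholder, and this version divides by the diagonal entry, which need not be 1) is:

import numpy as np
def usolve_sketch(U, y):
    # backward substitution for an upper triangular system U x = y
    n = len(y)
    x = np.zeros(n)
    for i in range(n-1, -1, -1):    # work from the bottom row up
        x[i] = y[i]
        for j in range(i+1, n):     # subtract off the terms that are already known
            x[i] = x[i] - U[i,j]*x[j]
        x[i] = x[i] / U[i,i]        # divide by the diagonal entry
    return x

U = np.matrix([[1.,2,3],[0,-3,-6],[0,0,-9]])
y = np.array([1., -4, 3])
print( usolve_sketch(U,y) )         # solves the upper triangular system from this exercise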
Exercise 4.33. Copy the code from Definition 4.6 into a Python function but
in your code write a comment on every line stating what it is doing. Write
a test script that creates an upper triangular matrix of the correct form and
a right-hand side y and solve for x. Your code needs to work on systems of
arbitrarily large size.
Exercise 4.36. Test your lsolve, usolve, and myLU functions on a linear
system for which you know the answer. Then test your problem on a system
that you don’t know the solution to. As a way to compare your solutions you
should:
t0 = time.time()
Abrref = # row reduce the symbolic augmented matrix
t1 = time.time()
RREFTime = t1-t0
t0=time.time()
exact = # use np.linalg.solve() to solve the linear system
t1=time.time()
exactTime = t1-t0
t0 = time.time()
L, U = # get L and U from your myLU
y = # use forward substitution to get y
x = # use backward substitution to get x
t1 = time.time()
LUTime = t1-t0
Exercise 4.38. What happens when you try to solve the system of equations
[ 0   0   1 ] [ x0 ]   [  7 ]
[ 0   1   0 ] [ x1 ] = [  9 ]
[ 1   0   0 ] [ x2 ]   [ −3 ]
Theorem 4.1 begs an obvious question: Is there a way to turn any matrix A into
an orthogonal matrix so that we can solve Ax = b in this same very efficient
and fast way?
The answer: Yes. Kind of.
In essence, if we can factor our coefficient matrix into an orthonormal matrix and
some other nicely formatted matrix (like a triangular matrix, perhaps) then the
job of solving the linear system of equations comes down to matrix multiplication
and a quick triangular solve – both of which are extremely extremely fast!
What we will study in this section is a new matrix factorization called the QR
factorization whose goal is to convert the matrix A into a product of two matrices:
an orthonormal matrix Q and an upper triangular matrix R.
Exercise 4.40. Let’s say that we have a matrix A and we know that it can
be factored into A = QR where Q is an orthonormal matrix and R is an upper
triangular matrix. How would we then leverage this factorization to solve the
system of equations Ax = b for x?
Before proceeding to the algorithm for the QR factorization let’s pause for a
moment and review scalar and vector projections from Linear Algebra. In Figure
4.1 we see a graphical depiction of the vector u projected onto vector v. Notice
that the projection is indeed the perpendicular projection as this is what seems
natural geometrically.
The vector projection of u onto v is the vector cv. That is, the vector
projection of u onto v is a scalar multiple of the vector v. The value of the
scalar c is called the scalar projection of u onto v.
We can arrive at a formula for the scalar projection rather easily if we consider
that the vector w in Figure 4.1 must be perpendicular to cv. Hence

w · (cv) = 0.

Since w = u − cv, this becomes

(u − cv) · (cv) = 0,

which expands to

c u · v − c² v · v = 0.

Therefore, c = (u · v)/(v · v).
hypotenuse = a1

leg 1 = ( (a1 · q0) / (q0 · q0) ) q0 =

leg 2 = ______ − ______ .
d. Compute the vector for leg 2 and then normalize it to turn it into a unit
vector. Call this vector q 1 and put it in the second column of Q.
e. Verify that the columns of Q are now orthogonal and are both unit vectors.
f. The matrix R is supposed to complete the matrix factorization A = QR.
We have built Q as an orthonormal matrix. How can we use this fact to
solve for the matrix R?
g. You should now have an orthonormal matrix Q and an upper triangular
matrix R. Verify that A = QR.
h. An alternate way to build the R matrix is to observe that
R = [ a0 · q0    a1 · q0 ]
    [    0       a1 · q1 ] .
Show that this is indeed true for the matrix A from this problem.
Exercise 4.43. You should notice that the code in the previous exercise does
not depend on the specific matrix A that we used. Put in a different 2 × 2 matrix
and verify that the process still works. That is, verify that Q is orthonormal, R
is upper triangular, and A = QR. Be sure, however, that your matrix A is full
rank.
Exercise 4.44. Draw two generic vectors in R2 and demonstrate the process
outlined in the previous problem to build the vectors for the Q matrix starting
from your generic vectors.
Exercise 4.45. Now we’ll extend the process from the previous exercises to
three dimensions. This time we will seek a matrix Q that has three orthonormal
vectors starting from the three original columns of a 3 × 3 matrix A. Perform
each of the following steps by hand on the matrix
    [ 1   1   0 ]
A = [ 1   0   1 ] .
    [ 0   1   1 ]
In the end you should end up with an orthonormal matrix Q and an upper
triangular matrix R.
• Step 1: Pick column a0 from the matrix A and normalize it. Call this
new vector q 0 and make that the first column of the matrix Q.
• Step 2: Project column a1 of A onto q 0 . This forms a right triangle with
a1 as the hypotenuse, the projection of a1 onto q 0 as one of the legs, and
the vector difference between these two as the second leg. Notice that the
second leg of the newly formed right triangle is perpendicular to q 0 by
design. If we normalize this vector then we have the second column of Q,
q1 .
• Step 3: Now we need a vector that is perpendicular to both q 0 AND q 1 .
To achieve this we are going to project column a2 from A onto the plane
formed by q 0 and q 1 . We’ll do this in two steps:
– Step 3a: We first project a2 down onto both q 0 and q 1 .
Exercise 4.46. Repeat the previous exercise but write code for each step so
that Python can handle all of the computations. Again use the matrix
    [ 1   1   0 ]
A = [ 1   0   1 ] .
    [ 0   1   1 ]
Example 4.7. (QR for n = 3) For the sake of clarity let’s now write down the
full QR factorization for a 3 × 3 matrix.
If the columns of A are a0 , a1 , and a2 then
q0 = a0 / ‖a0‖

q1 = ( a1 − (a1 · q0) q0 ) / ‖ a1 − (a1 · q0) q0 ‖

q2 = ( a2 − (a2 · q0) q0 − (a2 · q1) q1 ) / ‖ a2 − (a2 · q0) q0 − (a2 · q1) q1 ‖

and

R = [ a0 · q0    a1 · q0    a2 · q0 ]
    [    0       a1 · q1    a2 · q1 ]
    [    0          0       a2 · q2 ]
The code below is partially complete. Fill in the missing pieces of code and then test your code on
square matrices of many different sizes. The easiest way to check if you have an
error is to find the normed difference between A and QR with np.linalg.norm(A
- Q*R).
import numpy as np
def myQR(A):
    n = A.shape[0]
    Q = np.matrix( np.zeros( (n,n) ) )
    for j in range( ??? ): # The outer loop goes over the columns
        q = A[:,j]
        # The next loop is meant to do all of the projections.
        # When do you start the inner loop and how far do you go?
        # Hint: You don't need to enter this loop the first time
        for i in range( ??? ):
            length_of_leg = np.sum(A[:,j].T * Q[:,i])
            q = q - ??? * ??? # This is where we do projections
        Q[:,j] = q / np.linalg.norm(q)
    R = # finally build the R matrix
    return Q, R
# Test Code
A = np.matrix( ... )
# or you can build A with use np.random.randn()
# Often time random matrices are good test cases
Q, R = myQR(A)
error = np.linalg.norm(A - Q*R)
print(error)
Exercise 4.49. Write code that builds a random n × n matrix and a random
n × 1 vector. Solve the equation Ax = b using the QR factorization and compare
the answer to what we find from np.linalg.solve(). Do this many times for
various values of n and create a plot with n on the horizontal axis and the
normed error between Python’s answer and your answer from the QR algorithm
on the vertical axis. It would be wise to use a plt.semilogy() plot. To find
the normed difference you should use np.linalg.norm(). What do you notice?
Back in Exercise 3.81 and the subsequent problems we approached this problem
using an optimization tool in Python. You might be surprised to learn that
there is a way to do this same optimization with linear algebra!!
We don’t know the values of a, b, or c but we do have four different (x, y) ordered
pairs. Hence, we have four equations:
There are four equations and only three unknowns. This is what is called an
over determined system – when there are more equations than unknowns.
Let’s play with this problem.
a. First turn the system of equations into a matrix equation.
   [ 0   0   1 ]             [  1.07 ]
   [ 1   1   1 ]   [ a ]     [  3.9  ]
   [ 4   2   1 ]   [ b ]  =  [ 14.8  ]
   [ 9   3   1 ]   [ c ]     [ 26.8  ]
b. None of our techniques for solving systems will likely work here since it is
highly unlikely that the vector on the right-hand side of the equation is in
the column space of the coefficient matrix. Discuss this.
c. One solution to the unfortunate fact from part (b) is that we can project
the vector on the right-hand side into the subspace spanned by the columns
of the coefficient matrix. Think of this as casting the shadow of the right-
hand vector down onto the space spanned by the columns. If we do this
projection we will be able to solve the equation for the values of a, b, and
c that will create the projection exactly – and hence be as close as we can
get to the actual right-hand side. Draw a picture of what we’ve said here.
d. Now we need to project the right-hand side, call it b, onto the column
space of the coefficient matrix A. Recall the following facts:
• Projections are dot products
• Matrix multiplication is nothing but a bunch of dot products.
• The projections of b onto the columns of A are the dot products of b
with each of the columns of A.
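Theorem 4.3 is not restated here; assuming it refers to the normal equations AᵀA v = Aᵀb, which is exactly the projection idea described above, a minimal sketch for the quadratic data looks like this:

import numpy as np
xdata = np.array([0., 1, 2, 3])
ydata = np.array([1.07, 3.9, 14.8, 26.8])
A = np.column_stack([xdata**2, xdata, np.ones_like(xdata)])   # columns correspond to a, b, c
# normal equations: (A^T A) v = A^T y
v = np.linalg.solve(A.T @ A, A.T @ ydata)
print(v)    # the best fit values of a, b, and c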
Exercise 4.51. Fit a linear function to the following data. Solve for the slope
and intercept using the technique outlined in Theorem 4.3. Make a plot of the
points along with your best fit curve.
x y
0 4.6
1 11
2 12
3 19.1
4 18.8
5 39.5
6 31.1
7 43.4
8 40.3
9 41.5
10 41.6
# Exercise4_51.csv
Exercise 4.52. Fit a quadratic function to the following data using the technique
outlined in Theorem 4.3. Make a plot of the points along with your best fit
curve.
x y
0 -6.8
1 11.8
2 50.6
3 94
4 224.3
5 301.7
6 499.2
7 454.7
8 578.5
9 1102
10 1203.2
Exercise 4.53. The statistical technique of curve fitting is often called “linear
regression.” This even holds when we are fitting quadratic functions, cubic
functions, etc. to the data . . . we still call that linear regression! Why?
This section of the text on solving over determined systems is just a bit of a
teaser for a bit of higher-level statistics, data science, and machine learning. The
normal equations and solving systems via projections is the starting point of
many modern machine learning algorithms. For more information on this sort of
problem look into taking some statistics, data science, and/or machine learning
courses. You’ll love it!
Theorem 4.4. Recall that to solve the eigen-problem for a square matrix A we
complete the following steps:
a. First rearrange the definition of the eigenvalue-eigenvector pair to
(Ax − λx) = 0, and then factor to get
(A − λI)x = 0.
4 Numerical Linear Algebra is a huge field and there is way more to say . . . but alas, this is
OK. Now that you recall some of the basics let’s play with a little limit problem.
The following exercises are going to work us toward the power method for
finding certain eigen-structure of a matrix.
If we have

x = −2v1 + 1v2 − 3v3 = [3, −7, −1]ᵀ
then we want to do a bit of an experiment. What happens when we iteratively
multiply x by A but at the same time divide by the largest eigenvalue? Let's see:
• What is A¹x/3¹?
• What is A²x/3²?
• What is A³x/3³?
• What is A⁴x/3⁴?
• ...
It might be nice now to go to some Python code to do the computations (if you
haven’t already). Use your code to conjecture about the following limit.
lim_{k→∞}  A^k x / λ_max^k  = ???.
In this limit we are really interested in the direction of the resulting vector, not
the magnitude. Therefore, in the code below you will see that we normalize the
resulting vector so that it is a unit vector.
Note: be careful, computers don’t do infinity, so for powers that are too large
you won’t get any results.
import numpy as np
A = np.matrix([[8,5,-6],[-12,-9,12],[-3,-3,5]])
x = np.matrix([[3],[-7],[-1]])
eigval_max = 3
k = 4
result = A**k * x / eigval_max**k
print(result / np.linalg.norm(result) )
Exercise 4.58. Explain your result from the previous exercise geometrically.
Exercise 4.59. The algorithm that we’ve been toying with will find the dominant
eigenvector of a matrix fairly quickly. Why might you be only interested in the
dominant eigenvector of a matrix? Discuss.
Exercise 4.60. In this problem we will formally prove the conjecture that you
just made. This conjecture will lead us to the power method for finding the
dominant eigenvector and eigenvalue of a matrix.
a. Assume that A has n linearly independent eigenvectors v1, v2, . . . , vn and
choose x = Σ_{j=1}^{n} c_j v_j. You have proved in the past that
Theorem 4.5. (The Power Method) The following algorithm, called the
power method will quickly find the eigenvalue of largest absolute value for a
square matrix A ∈ Rn×n as well as the associated (normalized) eigenvector. We
are assuming that there are n linearly independent eigenvectors of A.
Step #1: Given a nonzero vector x, set v (1) = x/kxk. (Here the superscript
indicates the iteration number) Note that the initial vector x is pretty
irrelevant to the process so it can just be a random vector of the correct
size.
Step #2: For k = 2, 3, . . .
Step #2a: Compute ṽ (k) = Av (k−1) (this gives a non-normalized version
of the next estimate of the dominant eigenvector.)
Step #2b: Set λ(k) = ṽ (k) · v (k−1) . (this gives an approximation of the
eigenvalue since if v (k−1) was the actual eigenvector we would have
λ = Av (k−1) · v (k−1) . Stop now and explain this.)
Step #2c: Normalize ṽ (k) by computing v (k) = ṽ (k) /kṽ (k) k. (This guar-
antees that you will be sending a unit vector into the next iteration of
the loop)
Exercise 4.61. Go through Theorem 4.5 carefully and describe what we need
to do in each step and why we’re doing it. Then complete all of the missing
pieces of the following Python function.
import numpy as np
def myPower(A, tol = 1e-8):
    n = A.shape[0]
    x = np.matrix( np.random.randn(n,1) )
    x = # turn x into a unit vector
    # we don't actually need to keep track of the old iterates
    L = 1 # initialize the dominant eigenvalue
    counter = 0 # keep track of how many steps we've taken
    # You can build a stopping rule from the definition
    # Ax = lambda x ...
    while (???) > tol and counter < 10000:
        x = A*x # update the dominant eigenvector
        x = ??? # normalize
        L = ??? # approximate the eigenvalue
        counter += 1 # increment the counter
    return x, L
Exercise 4.62. Test your myPower() function on several matrices where you
know the eigenstructure. Then try the myPower() function on larger random
matrices. You can check that it is working using np.linalg.eig() (be sure to
normalize the vectors in the same way so you can compare them.)
Exercise 4.63. In the Power Method iteration you may end up getting a
different sign on your eigenvector as compared to np.linalg.eig(). Why might
this happen? Generate a few examples so you can see this. You can avoid this
issue if you use a while loop in your Power Method code and the logical check
takes advantage of the fact that we are trying to solve the equation Ax = λx.
Hint: Ax = λx is equivalent to Ax − λx = 0.
Exercise 4.65. (Convergence Rate of the Power Method) The proof that the
power method will work hinges on the fact that |λ1 | > |λ2 | ≥ |λ3 | ≥ · · · ≥ |λn |.
5 To build a matrix with specific eigenvalues it may be helpful to recall the matrix fac-
torization A = P DP −1 where the columns of P are the eigenvectors of A and the diagonal
entries of D are the eigenvalues. If you choose P and D then you can build A with your
specific eigen-structure. If you are looking for complex eigenvalues then remember that the
eigenvectors may well be complex too.
The limit

lim_{k→∞}  A^k x / λ1^k
converges to the dominant eigenvector, but how fast is the convergence? What
does the speed of the convergence depend on?
Take note that since we’re assuming that the eigenvalues are ordered, the ratio
λ2 /λ1 will be larger than λj /λ1 for all j > 2. Hence, the speed at which the
power method converges depends mostly on the ratio λ2 /λ1 . Let’s build a
numerical experiment to see how sensitive the power method is to this ratio.
Build a 4 × 4 matrix A with dominant eigenvalue λ1 = 1 and all other eigenvalues
less than 1 in absolute value. Then choose several values of λ2 and build an
experiment to determine the number of iterations that it takes for the power
method to converge to within a pre-determined tolerance to the dominant
eigenvector. In the end you should produce a plot with the ratio λ2 /λ1 on the
horizontal axis and the number of iterations to converge to a fixed tolerance on
the vertical axis. Discuss what you see in your plot.
Hint: To build a matrix with specific eigen-structure use the matrix factorization
A = P DP −1 where the columns of P contain the eigenvectors of A and the
diagonal of D contains the eigenvalues. In this case the P matrix can be random
but you need to control the D matrix. Moreover, remember that λ3 and λ4
should be smaller than λ2 .
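For example, a minimal sketch of building such a test matrix in Python (the function name and the particular eigenvalues below are placeholder choices for your experiment):
import numpy as np

def matrix_with_eigenvalues(eigs):
    n = len(eigs)
    D = np.diag(eigs)                  # prescribed eigenvalues on the diagonal
    P = np.random.randn(n, n)          # random (almost surely invertible) eigenvector matrix
    return P @ D @ np.linalg.inv(P)    # A = P D P^{-1}

A = matrix_with_eigenvalues([1.0, 0.9, 0.5, 0.1])   # lambda_1 = 1 and lambda_2 = 0.9
print(np.sort(np.abs(np.linalg.eig(A)[0]))[::-1])   # check: should report 1, 0.9, 0.5, 0.1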
4.8 Exercises
4.8.1 Algorithm Summaries
Exercise 4.66. Explain in clear language how to efficiently solve an upper
triangular system of linear equations.
Exercise 4.70. Explain in clear language the algorithm for finding the columns
of the Q matrix in the QR factorization. Give all of the mathematical details.
Exercise 4.71. Explain in clear language how to find the upper triangular
matrix R in the QR factorization. Give all of the mathematical details.
Exercise 4.73. Explain in clear language how the power method works to find
the dominant eigenvalue and eigenvector of a square matrix. Give all of the
mathematical details.
To time code in Python, first import the time library. Then use start =
time.time() at the start of your code and stop = time.time() at the end of
your code. The difference between stop and start is the elapsed computation
time.
Make observations about how the algorithms perform for different sized matrices.
You can use random matrices and vectors for A and b. The end result should be
a plot showing how the average computation time for each algorithm behaves as
a function of the size of the coefficient matrix.
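As a minimal illustration of the timing pattern (the size n = 500 and the use of np.linalg.solve below are placeholder choices; swap in your own functions for the actual experiment):
import time
import numpy as np

n = 500
A = np.random.randn(n, n)
b = np.random.randn(n, 1)

start = time.time()           # start the clock
x = np.linalg.solve(A, b)     # the code being timed
stop = time.time()            # stop the clock
print('elapsed time:', stop - start, 'seconds')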
The code below will compute the reduced row echelon form of a matrix (RREF).
Implement the code so that you know how it works.
import sympy as sp
import numpy as np
# in this problem it will be easiest to start with numpy matrices
A = np.matrix([[1, 0, 1], [2, 3, 5], [-1, -3, -3]])
b = np.matrix([[3],[7],[3]])
Augmented = np.c_[A,b] # augment b onto the right hand side of A
Msymbolic = sp.Matrix(Augmented)
MsymbolicRREF = Msymbolic.rref()
print(MsymbolicRREF)
Exercise 4.75. Imagine that we have a 1 meter long thin metal rod that has
been heated to 100◦ on the left-hand side and cooled to 0◦ on the right-hand
side. We want to know the temperature every 10 cm from left to right on the
rod.
a. First we break the rod into equal 10cm increments as shown. See Figure
4.2. How many unknowns are there in this picture?
b. The temperature at each point along the rod is the average of the temperatures
at the adjacent points. For example, if we let T1 be the temperature at point x1 then

T_1 = (T_0 + T_2) / 2.
Write a system of equations for each of the unknown temperatures.
c. Solve the system for the temperature at each unknown node using either
LU or QR decomposition.
Figure 4.2: The rod divided into 10 cm increments with nodes x0, x1, x2, . . . , x10.
Exercise 4.76. Write code to solve the following systems of equations via both
LU and QR decompositions. If the algorithm fails then be sure to explain exactly
why.
a.
x + 2y + 3z = 4
2x + 4y + 3z = 5
x + y = 4
b.
2y + 3z = 4
2x + 3z = 5
y = 4
c.
2y + 3z = 4
2x + 4y + 3z = 5
x + y = 4
Exercise 4.77. Give a specific example of a nonzero matrix which will NOT
have an LU decomposition. Give specific reasons why LU will fail on your
matrix.
Exercise 4.78. Give a specific example of a nonzero matrix which will NOT
have a QR decomposition. Give specific reasons why QR will fail on your
matrix.
Exercise 4.79. Have you ever wondered how scientific software computes a
determinant? The formula that you learned for calculating determinants by
hand is horribly cumbersome and computationally intractable for large matrices.
This problem is meant to give you a glimpse of what is actually going on under
the hood.6
If A has an LU decomposition then A = LU . Use properties that you know
about determinants to come up with a simple way to find the determinant for
matrices that have an LU decomposition. Show all of your work in developing
your formula.
Once you have your formula for calculating det(A), write a Python function that
accepts a matrix, produces the LU decomposition, and returns the determinant
of A. Check your work against Python’s np.linalg.det() function.
Exercise 4.80. For this problem we are going to run a numerical experiment to
see how the process of solving the equation Ax = b using the LU factorization
performs on random coefficient matrices A and random right-hand sides b. We
will compare against Python’s algorithm for solving linear systems.
We will do the following:
Create a loop that does the following:
a. Loop over the size of the matrix n.
b. Build a random matrix A of size n × n. You can do this with the code A
= np.matrix( np.random.randn(n,n) )
c. Build a random vector b in Rn . You can do this with the code b =
np.matrix( np.random.randn(n,1) )
d. Find Python's answer to the problem Ax = b using the command
exact = np.linalg.solve(A,b)
e. Write code that uses your three LU functions (myLU, lsolve, usolve) to
find a solution to the equation Ax = b.
f. Find the error between your answer and the exact answer using the code
np.linalg.norm(x - exact)
g. Make a plot (plt.semilogy()) that shows how the error behaves as the
size of the problem changes. You should run this for matrices of larger and
larger size but be warned that the loop will run for quite a long time if
you go above 300 × 300 matrices. Just be patient.
Conclusions: What do you notice in your final plot? What does this tell you
about the behavior of our LU decomposition code? (A sketch of the skeleton of
this experiment is given below.)
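The stand-in solver np.linalg.lstsq below is used only so that the sketch runs on its own; your myLU, lsolve, and usolve solution goes in its place, and the matrix sizes are arbitrary choices.
import numpy as np
import matplotlib.pyplot as plt

sizes = range(10, 310, 50)
errors = []
for n in sizes:
    A = np.random.randn(n, n)                 # random coefficient matrix
    b = np.random.randn(n, 1)                 # random right-hand side
    exact = np.linalg.solve(A, b)             # Python's answer to Ax = b
    # x = ...                                 # <-- your LU-based solution goes here
    x = np.linalg.lstsq(A, b, rcond=None)[0]  # stand-in solver so the sketch runs
    errors.append(np.linalg.norm(x - exact))

plt.semilogy(list(sizes), errors, 'b*')
plt.xlabel('size n of the coefficient matrix')
plt.ylabel('norm of the error')
plt.grid()
plt.show()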
Exercise 4.81. Repeat Exercise 4.80 for the QR decomposition. Your final plot
should show both the behavior of QR and of LU throughout the experiment.
What do you notice?
6 Actually, the determinant computation uses LU with partial pivoting which we did not
cover here in the text. What we are looking at in this exercise is a smaller subcase of what
happens when you have a matrix A that does not require any row swaps in the row reduction
process.
Notice that A has a tiny, but nonzero, value in the first entry.
a. Solve the linear system Ax = b by hand.
b. Use your myLU, lsolve, and usolve functions to solve this problem using
the LU decomposition method.
c. Compare your answers to parts (a) and (b). What went wrong?
This type of matrix is often used to test numerical linear algebra algorithms
since it is known to have some odd behaviors . . . which you’ll see in a moment.
a. Write code to build an n × n Hilbert matrix and call this matrix H. Test
your code for various values of n to be sure that it is building the correct
matrices. (A sketch of one way to do this appears after this exercise.)
b. Build a vector of ones called b with code b = np.ones( (n,1) ). We will
use b as the right hand side of the system of equations Hx = b.
c. Solve the system of equations Hx = b using any technique you like from
this chapter.
d. Now let's say that you change the first entry of b by just a little bit, say
10−15 . If we were to now solve the equation Hxnew = bnew what would
you expect as compared to solving Hx = b?
e. Now let’s actually make the change suggested in part (d). Use the code bnew
= np.ones( (n,1) ) and then bnew[0] = bnew[0] + 1e-15 to build a
new b vector with this small change. Solve Hx = b and Hxnew = bnew
and then compare the maximum absolute difference np.max( np.abs( x
- xnew ) ). What do you notice? Make a plot with n on the horizontal
axis and the maximum absolute difference on the vertical axis. What does
this plot tell you about the solution to the equation Hx = b?
f. We know that HH −1 should be the identity matrix. As we’ll see, however,
Hilbert matrices are particularly poorly behaved! Write a loop over n
that (i) builds a Hilbert matrix of size n, (ii) calculates HH −1 (using
np.linalg.inv() to compute the inverse directly), (iii) calculates the
norm of the difference between the identity matrix (np.identity(n)) and
your calculated identity matrix from part (ii). Finally, build a plot that
shows n on the horizontal axis and the normed difference on the vertical
axis. What do you see? What does this mean about the matrix inversion
of the Hilbert matrix?
g. There are cautionary tales hiding in this problem. Write a paragraph
explaining what you can learn by playing with pathological matrices like
the Hilbert Matrix.
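For part (a), here is a minimal sketch of one way to build the matrix, assuming the standard Hilbert matrix whose (i, j) entry is 1/(i + j − 1) when rows and columns are numbered starting at 1; scipy.linalg.hilbert is used only as an independent check.
import numpy as np
from scipy.linalg import hilbert

def my_hilbert(n):
    H = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            H[i, j] = 1.0 / (i + j + 1)   # zero-based version of 1/(i+j-1)
    return H

n = 5
print(np.allclose(my_hilbert(n), hilbert(n)))   # should print True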
Exercise 4.85. Now that you have QR and LU code we’re going to use both
of them! The problem is as follows:
We are going to find the polynomial of degree 4 that best fits the function

y = cos(4t) + 0.1ε(t).
Build the t vector and the y vector (these are your data). We need to set up
the least squares problems Ax = b by setting up the matrix A as we did in the
other least squares curve fitting problems and by setting up the b vector using
the y data you just built. Solve the problem of finding the coefficients of the
best degree 4 polynomial that fits this data. Report the sum of squared error
and show a plot of the data along with the best fit curve.
Exercise 4.86. Find the largest eigenvalue and the associated eigenvector of the
3 4 5 6
Exercise 4.88. Will the power method fail, slow down, or be unaffected if one
(or more) of the non-dominant eigenvalues is zero? Give sufficient mathematical
evidence or show several numerical experiments to support your answer.
Exercise 4.89. Find a cubic function that best fits the following data. You can
download the data directly with the code below.
x Data y Data
0 1.0220
0.0500 1.0174
0.1000 1.0428
0.1500 1.0690
0.2000 1.0505
0.2500 1.0631
0.3000 1.0458
0.3500 1.0513
0.4000 1.0199
0.4500 1.0180
0.5000 1.0156
0.5500 0.9817
0.6000 0.9652
0.6500 0.9429
0.7000 0.9393
0.7500 0.9266
0.8000 0.8959
0.8500 0.9014
0.9000 0.8990
0.9500 0.9038
1.0000 0.8989
import numpy as np
import pandas as pd
URL1 = 'https://raw.githubusercontent.com/NumericalMethodsSullivan'
URL2 = '/NumericalMethodsSullivan.github.io/master/data/'
URL = URL1+URL2
data = np.array( pd.read_csv(URL+'Exercise4_89.csv') )
# Exercise4_89.csv
Exercise 4.91. (This concept for this problem is modified from [6]. The data
is taken from NOAA and the National Weather Service with the specific values
associated with La Crosse, WI.)
Floods in the Mississippi River Valleys of the upper midwest have somewhat
predictable day-to-day behavior in that the flood stage today has high predictive
power for the flood stage tomorrow. Assume that the flood stages are:
• Stage 0 (Normal): Average daily flow is below 90,000 ft³/sec (cubic feet
per second = cfs). This is the normal river level.
• Stage 1 (Action Level): Average daily flow is between 90,000 cfs and 124,000
cfs.
• Stage 2 (Minor Flood): Average daily flow is between 124,000 cfs and
146,000 cfs.
• Stage 3 (Moderate Flood): Average daily flow is between 146,000 cfs and
170,000 cfs.
• Stage 4 (Extreme Flood): Average daily flow is above 170,000 cfs.
The following table shows the probability of one stage transitioning into another
stage from one day to the next.
Mathematically, if sk is the state at day k and A is the matrix given in the table
above then the difference equation sk+1 = Ask shows how a state will transition
from day to day. For example, if we are currently in Stage 0 then
s_0 = [1, 0, 0, 0, 0]^T .
d. Interpret your answer to part (c) in the context of the problem. Be sure
that your interpretation could be well understood by someone that does
not know the mathematics that you just did.
4.9 Projects
In this section we propose several ideas for projects related to numerical linear
algebra. These projects are meant to be open ended, to encourage creative math-
ematics, to push your coding skills, and to require you to write and communicate
your mathematics. Take the time to read Appendix B before you write your
final solution.
Figure 4.3: A small web of linked pages.
4. When a steady state is reached we sort the resulting vector xk to give the
page rank. The node (web page) with the highest rank will be the top
search result, the second highest rank will be the second search result, and
so on.
It doesn’t take much to see that this process can be very time consuming. Think
about your typical web search with hundreds of thousands of hits; that makes a
square matrix H that has a size of hundreds of thousands of entries by hundreds
of thousands of entries! The matrix multiplications alone would take many
minutes (or possibly many hours) for every search! . . . but Brin and Page were
pretty smart dudes!!
We now state a few theorems and definitions that will help us simplify the
iterative Page Rank process.
A probability vector is a vector with entries on the interval [0, 1] that add up
to 1.
A stochastic matrix is a square matrix whose columns are probability vectors.
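A small numerical sketch of these two definitions (the 3 × 3 matrix here is just a made-up example):
import numpy as np

A = np.array([[0.5, 0.2, 0.0],
              [0.3, 0.8, 1.0],
              [0.2, 0.0, 0.0]])

# every entry lies in [0,1] and every column sums to 1, so A is a stochastic matrix
print(np.all((A >= 0) & (A <= 1)) and np.allclose(A.sum(axis=0), 1.0))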
Exercise 4.95. Finish writing the hyperlink matrix H from Figure 4.3.
Exercise 4.96. Write code to implement the iterative process defined previously.
Make a plot that shows how the rank evolves over the iterations.
Exercise 4.97. What must be true about a collection of n pages such that an
n × n hyperlink matrix H is a stochastic matrix?
Exercise 4.98. The statement of the next theorem is incomplete, but the proof
is given to you. Fill in the blank in the statement of the theorem and provide a
few sentences supporting your answer.
x_equilib = lim_{k→∞} A^k x_0 = __________ .
Proof:
First note that A is an n × n stochastic matrix so from Theorem 4.8 we know
that there are n linearly independent eigenvectors. We can then substitute the
eigenvalues from Theorem 4.8 in Theorem 4.7. Noting that if 0 < λj < 1 we
have limk→∞ λkj = 0 the result follows immediately.
Exercise 4.99. Discuss how Theorem 4.9 greatly simplifies the PageRank
iterative process described previously. In other words: there is no reason to
iterate at all. Instead, just find . . . what?
Exercise 4.100. Now use the previous two problems to find the resulting
PageRank vector from the web in Figure 4.3. Be sure to rank the pages in order
of importance. Compare your answer to the one that you got in problem 2.
Figure 4.4: A web of eight linked pages (pages 1 through 8).
Exercise 4.102. One thing that we didn’t consider in this version of the Google
Page Rank algorithm is the random behavior of humans. One, admittedly
slightly naive, modification that we can make to the present algorithm is to
assume that the person surfing the web will randomly jump to any other page
in the web at any time. For example, if someone is on page 1 in Figure 4.4 then
they could randomly jump to any of pages 2 through 8. They also have links to pages 2, 3,
and 7. That is a total of 10 possible next steps for the web surfer. There is a
2/10 chance of heading to page 2: one of those is following the link from page 1
to page 2 and the other is a random jump to page 2 without following the link.
Similarly, there is a 2/10 chance of heading to page 3, 2/10 chance of heading to
page 7, and a 1/10 chance of randomly heading to any other page.
Implement this new algorithm, called the random surfer algorithm, on the web
in Figure 4.4. Compare your ranking to the non-random surfer results from the
previous problem.
Ax = b
As an example,

[ 2 3 4 ]   [ 2 0 0 ]   [ 0 3 4 ]
[ 5 6 7 ] = [ 5 6 0 ] + [ 0 0 7 ] .
[ 8 9 1 ]   [ 8 9 1 ]   [ 0 0 0 ]

Ax = b  =⇒  (L + U)x = b  =⇒  Lx + Ux = b.
7 Technically speaking we should not call this a “factorization” since we have not split the
matrix A into a product of two matrices. Instead we should call it a “partition” since in
number theory the process of breaking an integer into the sum of two integers is called
a “partition.” Even so, we will still use the word factorization here for simplicity.
Your Tasks:
1. Pick a small (larger than 3 × 3) matrix and an appropriate right-hand side
b and work each of the algorithms by hand. You do not need to write this
step up in the final product, but this exercise will help you locate where
things may go wrong in the algorithms and what conditions we might need
on A in order to get convergent sequences of approximate solutions.
2. Build Python functions that accept a square matrix A and complete the
factorizations A = L + U and A = L + D + U .
3. Build functions to implement the two methods and then demonstrate that
the methods work on a handful of carefully chosen test examples. As part of
these functions you need to build a way to deal with the matrix inversions
as well as build a stopping rule for the iterative schemes. Hint: You should
use a while loop with a proper logical condition. Think carefully about
what we’re finding at each iteration and what we can use to check our
accuracy at each iteration. It would also be wise to write your code in such
a way that it checks to see if the sequence of approximations is diverging.
4. Discuss where each method might fail and then demonstrate the possible
failures with several carefully chosen examples. Stick to small examples
and work these out by hand to clearly show the failure.
5. Iterative methods such as these will produce a sequence of approximations,
but there is no guarantee that either method will actually produce a
convergent sequence. Experiment with several examples and propose a
condition on the matrix A which will likely result in a convergent sequence.
Demonstrate that the methods fail if your condition is violated and that
the methods converge if your condition is met. Take care: it is tempting
to think that your code is broken if it doesn’t converge. The more likely
scenario is that the problem that you have chosen to solve will result
in a non-convergent sequence of iterations, and you need to think and
experiment carefully when choosing the example problems to solve. One
such convergence criterion has something to do with the diagonal entries of
A relative to the other entries, but that doesn’t mean that you shouldn’t
explore other features of the matrices as well (I genuinely can’t give you any
more hints than that). This task is not asking for a proof; just a conjecture
and convincing numerical evidence that the conjecture holds. The actual
proofs are beyond the scope of this project and this course.
6. Devise a way to demonstrate how the time to solve a large linear system
Ax = b compares between our two new methods, the LU algorithm, and
the QR algorithm that we built earlier in the chapter. Conclude this
demonstration with appropriate plots and ample discussion.
You need to do this project without the help of your old buddy Google. All code
must be originally yours or be modified from code that we built in class. You
can ask Google how Python works with matrices and the like, but searching
directly for the algorithms (which are actually well-known, well-studied, and
named algorithms) is not allowed.
Ordinary Differential Equations
our galaxy)!
Other examples of ODEs that are impossible to solve analytically are
• The motion of a pendulum where the angle from equilibrium is allowed
to be large or the pendulum is allowed to swing over the top (e.g. the
nonlinear pendulum).
• Systems of differential equations that model nonlinear predator-prey inter-
actions (e.g. the Lotka-Volterra equations).
• Some types of damped oscillations in electric circuits (e.g. the Van der Pol
oscillator).
• . . . and many others.
The impossibility of solving a differential equation stems partly from the impossi-
bility of integrating most functions. If we were to just randomly choose functions
to integrate we would find that the vast majority do not have antiderivatives. The
story in ODEs is the same: pick any combination of a function, its derivatives,
and other forcing functions and you will find that there is no way to arrive at an
analytic solution involving the regular operations and functions of mathematics:
linear combinations, powers, roots, trigonometric functions, logarithms, etc.
There are theorems from differential equations that will guarantee the existence
and uniqueness of solutions to many differential equations, but just knowing
that the solution exists isn’t enough to actually go and find it. Numerical
techniques give us an avenue to at least approximate these solutions. For a video
introduction to numerical ODEs go to https://youtu.be/I2_vabu_VlU.
So what is a numerical solution to a differential equation?
When solving a differential equation with analytic techniques the goal is to
come up with a function. In a numerical solution the goal is typically to divide
the domain (typically the domain is time) for the solution function into a fine
partition, just like we did with numerical differentiation and integration, and
then to approximate the solution to the differential equation at each point in
that partition. Hence, the end result will be a list of approximate solution values
associated with each time. In the strictest sense a list of approximate solutions
on a partition is actually a function (a relation between input and output), but
this isn’t a function in terms of sines, powers, roots, logarithms, etc. The best
way to deliver a numerical solution is just to make a plot. Your intuition of what
the plot should look like based on the context of the problem is one of the best
tools for you to check your work.
Exercise 5.1. Sketch a plot of the function that would model each of the
following scenarios.
a. A population of an endangered species is slowly dying off. The rate at
which the population decreases is proportional to the amount of population
that is presently there. What does the population as a function of time
look like for this species?
Now let’s formalize the conversation about differential equations, analytic solu-
tions, and numerical solutions.
Definition 5.1. (Differential Equation) A differential equation is an
equation that relates the derivative (or derivatives) of an unknown function to
itself.
Furthermore, x(0) = 3e^(−0.25·0) = 3e^0 = 3. Hence, the function x(t) = 3e^(−0.25t)
is indeed a solution to the differential equation x' = −0.25x with x(0) = 3.
In this chapter we will examine some of the more common ways to create
approximations of solutions to differential equations. Moreover, we will lean
heavily on Taylor Series to give us ways to accurately measure the order of the
errors that we make in the process.
Exercise 5.2. Identify which of the following problems are differential equations
and which are algebraic equations. (Do not try to solve any of these equations)
a. x^2 + 5x = 7x^3 − 2
b. x'' + 5x = 7x''' − 2
c. x' + 5 = −3x
d. x'' · x' · x = 8
e. x^2 · x = 8
c. x(t) = C_0 t^3 + C_1 t + C_2
d. x(t) = C_3 t^3 + C_2 t^2 + C_1 t + C_0
e. x(t) = e^(3t) + C_1 t + C_2
f. x(t) = sin(3t) + C_1 t + C_2
Exercise 5.5. Prove that the function x(t) = −(1/2) cos(2t) + 7/2 solves the differen-
tial equation x' = sin(2t) with the initial condition x(0) = 3.
Next we can recall one of the easiest techniques of solving ODEs by hand:
separation of variables. We review separation here since we will often choose
very easy (i.e. separable) differential equations to check our numerical work.
Theorem 5.1. (Separation of Variables) To solve a differential equation of
the form

dx/dt = f(x) g(t)

we can separate the variables and rewrite the problem as

∫ dx/f(x) = ∫ g(t) dt.

Integrating both sides and solving for x(t) gives the solution.
Proof:
If dx/dt = f(x) g(t) then we can first divide both sides by f(x) (assuming that it is
nonzero) and integrate both sides of the equation with respect to t to get

∫ (1/f(x)) (dx/dt) dt = ∫ g(t) dt.

The expression (dx/dt) dt in the left-hand integral is the definition of the differential
dx, so the integral equation can be rewritten as

∫ dx/f(x) = ∫ g(t) dt.
Note that it may be quite challenging to actually integrate the functions resulting
from separation of variables.
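As a short worked illustration of the theorem, take the earlier example x' = −0.25x with x(0) = 3, so that f(x) = x and g(t) = −0.25. Separating and integrating gives

∫ dx/x = ∫ −0.25 dt  ⟹  ln|x| = −0.25t + C  ⟹  x(t) = A e^(−0.25t),

and the initial condition x(0) = 3 forces A = 3, recovering the solution x(t) = 3e^(−0.25t).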
Exercise 5.7. Solve the differential equation x' = −2x + 12 with x(0) = 2 using
separation of variables.
There are MANY other techniques for solving differential equations, but a full
discussion of all of those techniques is beyond the scope of this book. For the
remainder of this chapter we will focus on finding approximate solutions to
differential equations. It will be handy, however, to be able to check our work on
problems where an analytic solution is available. Techniques you should remind
yourself of are:
• The method of undetermined coefficients for first- and second-order linear
differential equations.
• The method of integrating factors.
• The eigenvalue-eigenvector method for solving linear systems of differential
equations.
x(1) ≈
b. Use your answer from part (a) for time t = 1 to approximate the x
value at time t = 2. Then use that value to approximate the value at
time t = 3. Repeat the process to approximate the value of x at times
t = 2, 3, 4, 5, . . . , 10. Record your answers in the table below. Then find
the analytic solution to this differential equation and record the x values
at the appropriate times.
t 0 1 2 3 4 5 6 7 8 9 10
Approximation of x(t) 6
Exact value of x(t) 6
iii. Why is it helpful to have the slope field in the background on this
plot?
Exercise 5.10. In Figure 5.2 you see the analytic solution at x(0) = 5 and a
slope field for an unknown differential equation.
Figure 5.1: Plot your approximate solution on top of the slope field and the
exact solution.
a. Use the slope field and a step size of ∆t = 1 to plot approximate solution
values at t = 1, t = 2, . . ., t = 10. Connect your points with straight lines.
The collection of line segments that you just drew is an approximation to
the solution of the unknown differential equation.
b. Use the slope field and a step size of ∆t = 0.5 to plot approximate solution
values at t = 0.5, t = 1, t = 1.5, . . ., t = 10. Again, connect your points
with straight lines to get an approximation of the solution to the unknown
differential equation.
c. If you could take ∆t to be very very small, what difference would you see
graphically between the exact solution and your collection of line segments?
Why?
Figure 5.2: Plot your approximate solution on top of the slope field and the
exact solution.
( x(t + h) − x(t) ) / h ≈ f(t, x(t)).

Rewriting as a difference equation, letting x_{n+1} = x(t_n + h) and x_n = x(t_n), we
get

x_{n+1} = x_n + h f(t_n, x_n).
A way to think about Euler’s method is that at a given point, the slope is
approximated by the value of the right-hand side of the differential equation
and then we step forward h units in time following that slope. Figure 5.3 shows
a depiction of the idea. Notice in the figure that in regions of high curvature
Euler’s method will overshoot the exact solution to the differential equation.
However, taking the limit as h tends to 0 theoretically gives the exact solution
at the trade off of needing infinite computational resources.
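As a quick sanity check of the difference equation, here is a sketch of two steps by hand on the familiar equation x' = −0.5x with x(0) = 6 and h = 1:

x_1 = x_0 + h f(t_0, x_0) = 6 + (1)(−0.5 · 6) = 3,
x_2 = x_1 + h f(t_1, x_1) = 3 + (1)(−0.5 · 3) = 1.5.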
Exercise 5.11. Why would Euler’s method overshoot the exact solution in
regions where the solution exhibits high curvature?
Figure 5.3: Euler approximations with h = 1 and h = 0.5 plotted against the
exact solution.
Exercise 5.12. Write code to implement Euler’s method for initial value
problems. Your function should accept as input a Python function f (t, x), an
initial condition, a start time, an end time, and the value of h = ∆t. The output
should be vectors for t and x that you can easily plot to show the numerical
solution. The code below will get you started.
def euler1d(f,x0,t0,tmax,dt):
    t = # set up the domain based on t0, tmax, and dt
    # next set up an array for x that is the same size as t
    x = np.zeros_like(t)
    x[0] = # fill in the initial condition
    for n in range( ??? ): # think about how far we should loop
        x[n+1] = # advance the solution forward in time with Euler
    return t, x
Exercise 5.13. Test your code from the previous exercise on a first order
differential equation where you know the answer. Then test your code on the
differential equation

x' = −(1/3) x + sin(t) where x(0) = 1.
The partial code below should get you started.
import numpy as np
import matplotlib.pyplot as plt
# put the f(t,x) function on the next line
# (be sure to specify t even if it doesn't show up in your ODE)
f = lambda t, x: # your function goes here
x0 = # initial condition
t0 = # initial time
tmax = # final time (your choice)
dt = # Delta t (your choice, but make it small)
t, x = euler1d(f,x0,t0,tmax,dt)
plt.plot(t,x,'b-')
plt.grid()
plt.show()
x(t) = (1/10) ( 19 e^(−t/3) + 3 sin(t) − 9 cos(t) ).
The goal of this problem will be to compare the maximum error on the interval
t ∈ [0, 5] for various values of ∆t in your Euler solver.
a. Write code that gives the maximum point-wise error between your numeri-
cal solution and the analytic solution given a value of ∆t.
b. Using your code from part (a), build a plot with the value of ∆t on the
horizontal axis and the value of the associated error on the vertical axis.
You should use a log-log plot. Obviously you will need to run your code
many times at many different values of ∆t to build your data set.
c. In general, if you were to cut your value of ∆t in half, what would that do
to the value of the error? What about dividing ∆t by 10? 100? 1000?
Exercise 5.15. Shelby solved a first order ODE x' = f(t, x) using Euler's
method with a step size of dt = 0.1 on a domain t ∈ [0, 3]. To test her code
she used a differential equation where she knew the exact analytic solution and
she found the maximum absolute error on the interval to be 0.15. Jackson then
solves the exact same differential equation, on the same interval, with the same
initial condition using Euler’s method and a step size of dt = 0.01. What is
Jackson’s expected maximum absolute error?
Theorem 5.2. Euler’s method is a first order method for approximating the
solution to the differential equation x' = f(t, x). Hence, if the step size h of the
partition of the domain were to be divided by some positive constant M then the
maximum absolute error between the numerical solution and the exact solution
would ???
(Complete the last sentence.)
Exercise 5.17. If a mass is hanging from a spring then Newton's second law,
ΣF = ma, gives us the differential equation

m x'' = F_restoring + F_damping
where x is the displacement of the mass from equilibrium, m is the mass of the
object hanging from the spring, Frestoring is the force pulling the mass back to
equilibrium, and Fdamping is the force due to friction or air resistance that slows
the mass down.
a. Which of the following is a good candidate for a restoring force in a spring?
Defend your answer.
i. F_restoring = kx: The restoring force is proportional to the displace-
ment away from equilibrium.
ii. F_restoring = kx': The restoring force is proportional to the velocity
of the mass.
iii. F_restoring = kx'': The restoring force is proportional to the accelera-
tion of the mass.
b. Which of the following is a good candidate for a damping force in a spring?
Defend your answer.
i. F_damping = bx: The damping force is proportional to the displacement
away from equilibrium.
ii. F_damping = bx': The damping force is proportional to the velocity of
the mass.
iii. F_damping = bx'': The damping force is proportional to the acceleration
of the mass.
c. Put your answers to parts (a) and (b) together and simplify to form a
second-order differential equation for position:

x'' + ____ x' + ____ x = 0

x_0' = x_1
x_1' = ____
e. The code and Euler's method algorithm that we've created thus far in this
chapter are only designed to work with a single differential equation instead
of a system. We can instead write a system in the vector form

x' = F(t, x)

where F is a function that accepts a vector of inputs, plus time, and returns
a vector of outputs. In the context of this particular problem,

F(t, x) = [ x_0', x_1' ]^T = [ x_1, ____ ]^T .
x_{n+1} = ____ + h F( ____ , ____ ).
g. We now have a choice about how we’re going to code this new 2D version
of Euler’s method. We could just include one more input function and one
more input initial condition into the euler() function so that the Python
function call is euler(f0,f1,x0,x1,t0,tmax,dt) where f0 and f1 are
the two right-hand sides of the system, and x0 and x1 are the two initial
conditions. Alternatively, we could rethink our euler() function so that
it accepts an array of functions and an array of initial conditions so that
the Python function call is euler(F,X,t0,tmax,dt) where F is a Python
array of functions and X is a Python array of initial conditions. Discuss
the pros and cons of each approach.
h. The following Python function and associated script will implement the
vector version of Euler’s method. Complete the code and then use it to
solve the system of equations from part (d). Use a mass of m = 2kg,
a damping force of b = 40kg/s, and a spring constant of k = 128N/m.
Consider an initial position of x = 0m (equilibrium) and an initial velocity
of x1 = 0.6m/s. Show two plots: a plot that shows both position and
velocity vs time and a second plot, called a phase plot, that shows position
vs velocity.
def euler(F,x0,t0,tmax,dt):
    t = # same code as before to set up a vector for time
    # Next we set up x so that it is an array where the columns
    # are the different dimensions of the problem. For example,
    # in this problem there will be 2 columns and len(t) rows
    x = np.zeros( (len(t), len(x0)) )
    x[0,:] = x0 # store the initial condition in the first row
    for n in range(len(t)-1):
        x[n+1,:] = # fill in the vectorized Euler step here
    return t, x
To use the euler() function defined above we can use the following code. Fill
in the code for this system of differential equations with this problem.
F = lambda t, x: np.array([ x[1] , ??? ])
x0 = [ ??? , ??? ] # initial conditions
t0 = 0
tmax = 5 # pick something reasonable here
dt = 0.01 # your choice. pick something small
t, x = euler(F,x0,t0,tmax,dt)
# Next we plot the solutions against time
plt.plot(t,x[ ??? , ???],'b-',t,x[ ??? , ???],'r--')
plt.grid()
plt.title('Time Evolution of Position and Velocity')
plt.legend(['which legend entry here','which legend entry here'])
plt.xlabel('time')
plt.ylabel('position and velocity')
plt.show()
# Then we plot one solution against the other for a phase plot
# In a phase plot time is implicit (not one of the axes)
plt.plot(x[ ??? , ???], x[ ??? , ???], 'k--')
plt.grid()
plt.title('Phase Plot')
plt.xlabel('???')
plt.ylabel('???')
plt.show()
c. Using similar logic from part (b), write a second order differential equation
m x_1'' = ____
Exercise 5.19. Extend the previous exercise so that there are three masses
hanging in a chain.
Exercise 5.20. If the speed of the mass in the mass-spring oscillator is fast
enough then the damping force will no longer just be proportional to the velocity.
Instead, at higher speeds the drag force is proportional to the square of the veloc-
ity. You can think of this as a bungee jumper jumping off of a bridge. Modify the
single mass-spring oscillator equation to allow for nonlinear quadratic damping.
Solve the system numerically under several different physical conditions (stiff
spring, non-stiff spring, high damping, low damping, different initial conditions,
etc).
Exercise 5.21. (A Lotka-Volterra Model) Test your code from the previous
problems on the following system of differential equations by showing a time
evolution plot (time on the horizontal axis and the populations x_0 and x_1 on the
vertical axis) as well as a phase plot (x_0 on the horizontal axis and x_1 on the
vertical axis with time understood implicitly):
The Lotka-Volterra Predator-Prey Model:
Let x0 (t) denote the number of rabbits (prey) and x1 (t) denote the number of
foxes (predator) at time t. The relationship between the species can be modeled
by the classic 1920’s Lotka-Volterra Model:
x_0' = α x_0 − β x_0 x_1
x_1' = δ x_0 x_1 − γ x_1
where α, β, γ, and δ are positive constants. For this problem take α ≈ 1.1,
β ≈ 0.4, γ ≈ 0.1, and δ ≈ 0.4.
a. First rewrite the system of ODEs in the form x' = F(t, x) so you can use
your euler() code.
b. Modify your code from the previous problem so that it works for this
problem. Use tmax = 200 and an appropriately small time step. Start
with initial conditions x0 (0) = 20 rabbits and x1 (0) = 1 fox.
c. Create the time evolution plot. What does this plot tell you in context?
d. Create a phase plot. What does this plot tell you in context?
e. If you cut your time step in half, what do you see in the two plots? Why?
What is Euler’s method doing here?
Exercise 5.22. (The SIR Model) A classic model for predicting the spread of
a virus or a disease is the SIR Model. In these models, S stands for the proportion
of the population which is susceptible to the virus, I is the proportion of the
population that is currently infected with the virus, and R is the proportion of
the population that has recovered from the virus. The idea behind the model is
that
• Susceptible people become infected by having interactions with the
infected people. Hence, the rate of change of the susceptible people is
proportional to the number of interactions that can occur between the S
and the I populations.

S' = −αSI
• The infected population gains people from the interactions with the suscep-
tible people, but at the same time, infected people recover at a predictable
rate.
I' = αSI − βI
• The people in the recovered class are then immune to the virus, so the
recovered class R only gains people from the recoveries from the I class.
R' = βI
a. Explain the minus sign in the S' equation in the context of the spread of a
virus.
b. Explain the product SI in the S' equation in the context of the spread of
a virus.
c. Find a numerical solution to the system of equations using your euler()
function. Use the parameters α = 0.4 and β = 0.04 with initial conditions
S(0) = 0.99, I(0) = 0.01, and R(0) = 0. Explain all three curves in
context. (A minimal sketch of the setup appears after this exercise.)
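A minimal sketch of the setup for part (c) is below. The end time of 100 and the step size dt = 0.1 are arbitrary choices, and the bare-bones forward Euler loop stands in for the vector-valued euler() function built earlier so that the sketch runs on its own.
import numpy as np
import matplotlib.pyplot as plt

alpha, beta = 0.4, 0.04                                    # parameters from part (c)
F = lambda t, x: np.array([-alpha*x[0]*x[1],               # S' = -alpha*S*I
                            alpha*x[0]*x[1] - beta*x[1],   # I' = alpha*S*I - beta*I
                            beta*x[1]])                    # R' = beta*I

dt = 0.1                                    # time step (your choice)
t = np.arange(0, 100 + dt, dt)              # the end time 100 is just a guess
x = np.zeros((len(t), 3))
x[0, :] = [0.99, 0.01, 0.0]                 # initial S, I, and R
for n in range(len(t) - 1):                 # forward Euler steps
    x[n+1, :] = x[n, :] + dt*F(t[n], x[n, :])

plt.plot(t, x)                              # S, I, and R versus time
plt.legend(['S', 'I', 'R'])
plt.grid()
plt.show()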
Exercise 5.24. Let's return to the simple differential equation x' = −0.5x with
x(0) = 6 that we saw in Exercise 5.9. Now we'll propose a slightly different
method for approximating the solution.
a. At t = 0 we know that x(0) = 6. If we use the slope at time t = 0 to step
forward in time then we will get the Euler approximation of the solution.
Consider this alternative approach:
• Use the slope at time t = 0 and move half a step forward.
• Find the slope at the half-way point
• Then use the slope from the half way point to go a full step forward from
time t = 0.
Perhaps a bit confusing . . . let’s build this idea together:
• What is the slope at time t = 0? x'(0) =
• Use this slope to step a half step forward and find the x value: x(0.5) ≈
• Now use the differential equation to find the slope at time t = 0.5. x'(0.5) =
• Now take your answer from the previous step, and go one full step forward
from time t = 0. What x value do you end up with?
• Your answers to the previous bullets should be: x'(0) = −3, x(0.5) ≈ 4.5,
x'(0.5) = −2.25, so if we take a full step forward with slope m = −2.25
starting from t = 0 we get x(1) ≈ 3.75.
b. Repeat the process outlined in part (a) to approximate the solution to the
t 0 1 2 3 4 5 6 7 8 9 10
Euler approx of x(t) 6
New approx of x(t) 6
Exact value of x(t) 6
m_n = f(t_n, x_n)
x_temp = x_n + (h/2) m_n
x_{n+1} = x_n + h f(t_n + ∆t/2, x_temp)
Exercise 5.25. Complete the code below to implement the midpoint method
in one dimension.
def midpoint1d(f,x0,t0,tmax,dt):
    t = # build the times
    x = # build an array for the x values
    x[0] = # build the initial condition
    # On the next line: be careful about how far you're looping
    for n in range( ??? ):
        # The interesting part of the code goes here.
    return t, x
Test your code on several differential equations where you know the solution
(just to be sure that it is working).
f = lambda t, x: # your ODE right hand side goes here
x0 = # initial condition
t0 = 0
tmax = # ending time (up to you)
dt = # pick something small
t, x = midpoint1d( ??? , ??? , ??? , ??? , ??? )
plt.plot( ??? , ??? , ??? )
plt.grid()
plt.show()
Exercise 5.26. The goal in building the midpoint method was to hopefully
capture some of the upcoming curvature in the solution before we overshoot
it. Consider the differential equation x' = −(1/3) x + sin(t) with initial condition
x(0) = 1 on the domain t ∈ [0, 4]. First get a numerical solution with Euler's
method using ∆t = 0.1. Then get a numerical solution with the midpoint method
using the same value for ∆t. Plot the two solutions on top of each other along
with the exact solution
x(t) = (1/10) ( 19 e^(−t/3) + 3 sin(t) − 9 cos(t) ).
What do you observe? What do you observe if you make ∆t a bit larger (like
0.2 or 0.3)? What do you observe if you make ∆t very very small (like 0.001 or
0.0001)?
There are several key takeaways from this problem. Discuss.
Exercise 5.27. Repeat Exercise 5.14 with the midpoint method. Compare your
results to what you found with Euler’s method.
Exercise 5.28. We have studied two methods thus far: Euler’s method and
the Midpoint method. In Figure 5.4 we see a graphical depiction of how each
method works on the differential equation y' = y with ∆t = 1 and y(0) = 1. The
exact solution at t = 1 is y(1) = e^1 ≈ 2.718 and is shown in red in each figure.
The methods can be summarized in the table below.
Discuss what you observe as the pros and cons of each method based on the
table and on the Figure.
Euler: one step with slope m = 1 lands at (1, 2). Midpoint: one step with slope
m = 1.5 lands at (1, 2.5). The exact value is (1, 2.71).
Figure 5.4: Graphical depictions of two numerical methods: Euler (left) and
Midpoint (right). The exact solution is shown in red.
Exercise 5.29. When might you want to use Euler’s method instead of the
midpoint method? When might you want to use the midpoint method instead
of Euler’s method?
• You take a half a step forward using the slope where you're standing. The
new point, denoted x_{n+1/2}, is given by

location a half step forward is: x_{n+1/2} = x_n + (∆t/2) m_n.

• Now you're standing at (t_n + ∆t/2, x_{n+1/2}) so there is a new slope here given
by

slope after a half of an Euler step is: m_{n+1/2} = f(t_n + ∆t/2, x_{n+1/2}).

• Go back to the point (t_n, x_n) and step a full step forward using slope
m_{n+1/2}. Hence the new approximation is

x_{n+1} = x_n + ∆t · m_{n+1/2}
Exercise 5.32. One of the troubles with the midpoint method is that it doesn’t
actually use the information at the point (tn , xn ). Moreover, it doesn’t leverage a
slope at the next time step tn+1 . Let’s see what happens when we try a solution
technique that combines the ideas of Euler and Midpoint as follows:
• The slope at the point (tn , xn ) can be called mn and we find it by evaluating
f (tn , xn ).
• The slope at the point (tn+1/2 , xn+1/2 ) can be called mn+1/2 and we find
it by evaluating f (tn+1/2 , xn+1/2 ).
• We can now take a full step using slope mn+1/2 to get the point xn+1 and
the slope there is mn+1 = f (tn+1 , xn+1 ).
• Now we have three estimates of the slope that we can use to actually
propagate forward from (tn , xn ):
– We could just use mn . This is Euler’s method.
– We could just use mn+1/2 . This is the midpoint method.
– We could use mn+1 . Would this approach be any good?
– We could use the average of the three slopes.
# *********
# You should copy your euler and midpoint functions here.
# We will be comparing to these two existing methods.
# *********
def ode_test(f,x0,t0,tmax,dt):
    t = np.arange(t0,tmax+dt,dt) # set up the times
    x = np.zeros(len(t)) # set up the x
    x[0] = x0 # initial condition
    for n in range(len(t)-1):
        m_n = f(t[n],x[n])
        x_n_plus_half = x[n] + (dt/2)*m_n
        m_n_plus_half = f( t[n]+dt/2 , x_n_plus_half )
        x_n_plus_1 = x[n] + dt * m_n_plus_half
        m_n_plus_1 = f(t[n]+dt, x_n_plus_1 )
        estimate_of_slope = # This is where you get to play
        x[n+1] = x[n] + dt * estimate_of_slope
    return t, x
x0 = 1 # initial condition
t0 = 0 # initial time
tmax = 3 # max time
# set up blank arrays to keep track of the maximum absolute errors
err_euler = []
err_midpoint = []
err_ode_test = []
# Next give a list of Delta t values (what list did we give here)
H = 10.0**(-np.arange(1,7,1))
for dt in H:
    # Build an euler approximation
    t, xeuler = euler(f,x0,t0,tmax,dt)
    # Measure the max abs error
    err_euler.append( np.max( np.abs( xeuler - exact(t) ) ) )
    # Build a midpoint approximation
    t, xmidpoint = midpoint(f,x0,t0,tmax,dt)
    # Measure the max abs error
    err_midpoint.append( np.max( np.abs( xmidpoint - exact(t) ) ) )
    # Build your new approximation
    t, xtest = ode_test(f,x0,t0,tmax,dt)
    # Measure the max abs error
    err_ode_test.append( np.max( np.abs( xtest - exact(t) ) ) )
Exercise 5.33. In the previous exercise you should have found that an average
of the three slopes did just a little bit better than the midpoint method but the
order of the error (the slope in the loglog plot) stayed about the same. You
should have also found that the weighted average
estimate of slope = ( m_n + 2 m_{n+1/2} + m_{n+1} ) / 4
did just a little bit better than just a plain average. Why might this be? (If you
haven’t tried this weighted average then go back and try it.) Do other weighted
averages of this sort work better or worse? Does it appear that we can improve
upon the order of the error (the slope in the loglog plot) using any of these
methods?
Exercise 5.34. OK. Let’s make one more modification. What if we built a
fourth slope that resulted from stepping a half step forward using mn+1/2 ? We’ll
define

x*_{n+1/2} = x_n + (∆t/2) m_{n+1/2}
m*_{n+1/2} = f(t_n + ∆t/2, x*_{n+1/2})
Then calculate mn+1 using this new slope instead of what we did in the previous
problem.
a. Draw a picture showing where this slope was calculated.
b. Modify the code from above to include this fourth slope.
c. Experiment with several ideas about how to best combine the four slopes:
mn , mn+1/2 , m∗n+1/2 , and mn+1 .
• Should we just take an average of the four slopes?
• Should we give one or more of the slopes preferential treatment and
do some sort of weighted average?
• Should we do something else entirely?
Remember that we are looking to improve the slope in the loglog plot since that
indicates an improvement in the order of the error (the accuracy) of the method.
Exercise 5.35. In the previous exercise you no doubt experimented with many
different linear combinations of m_n, m_{n+1/2}, m*_{n+1/2}, and m_{n+1}. Many of the
resulting numerical ODE methods likely had the same order of accuracy (again,
the order of the method is the slope in the error plot), but some may have
been much better or much worse. Work with your team to fill in the following
summary table of all of the methods that you devised. If you generated linear
combinations that are not listed below then just add them to the list (we’ve only
listed the most common ones here).
Exercise 5.36. In the previous exercise you should have found at least one of the
many methods to be far superior to the others. State which linear combination
of slopes seems to have done the trick, draw a picture of what this method does
to numerically approximate the next slope for a numerical solution to an ODE,
and clearly state what the order of the error means about this method.
k_1 = f(t_n, x_n)
k_2 = f(t_n + h/2, x_n + (h/2) k_1)
k_3 = f(t_n + h/2, x_n + (h/2) k_2)
k_4 = f(t_n + h, x_n + h k_3)
x_{n+1} = x_n + (h/6) (k_1 + 2k_2 + 2k_3 + k_4)
a. Show that indeed we have derived the same exact algorithm.
Exercise 5.40. Let’s step back for a second and just see what the RK4 method
does from a nuts-and-bolts point of view. Consider the differential equation
x' = x with initial condition x(0) = 1. The solution to this differential equation
is clearly x(t) = e^t. For the sake of simplicity, take ∆t = 1 and perform 1 step
of the RK4 method BY HAND to approximate the value x(1).
def rk41d(f,x0,t0,tmax,dt):
    t = np.arange(t0,tmax+dt,dt)
    x = np.zeros_like(t)
    x[0] = x0
    for n in range(len(t)-1):
        # the interesting bits of the code go here
    return t, x
(RK4 in Several Dimensions) Modify your Runge-Kutta 4 code to work for any number of dimensions.
5.6.1 ipywidgets.interactive
Consider the differential equation x' = f(t, x) with x(0) = x0. We would like to
build an animation of the numerical solution to this differential equation over
time t but also over the parameters x0 and ∆t. The following blocks of Python code
walk through this animation.
import numpy as np
import matplotlib.pyplot as plt
In the next block of code we define our euler() solver. This particular step is
only included because we are using Euler's method to solve this specific problem.
In general, include any functions or code that are going to be used to produce
the data that you will be plotting. We will also introduce the function f and
the parameter t0 since we will not be animating over these parameters.
def euler(f,x0,t0,tmax,dt):
    N = int(np.floor((tmax-t0)/dt)+1)
    t = np.linspace(t0,tmax,N+1)
    x = np.zeros_like(t)
    x[0] = x0
    for n in range(len(t)-1):
        x[n+1] = x[n] + dt*f(t[n],x[n])
    return t, x
Next we build a function that accepts only the parameters that we want to
animate over and produces only a plot. This function will be called later by the
ipywidgets.interactive function every time we change one of the parameters
so be sure that this is a clean and fast function to evaluate (keep the code
simple).
def eulerAnimator(x0,tmax,dt):
    # call on the euler function to build the solution
    t, x = euler(f,x0,t0,tmax,dt)
    plt.plot(t, x, 'b-') # plot the solution
    plt.xlim(0,30)
    plt.ylim( np.min(x)-1, np.max(x)+1)
    plt.grid()
    plt.show()
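The actual call that attaches slider widgets to this plotting function might look something like the sketch below (the particular slider ranges are arbitrary choices; each keyword tuple is (min, max, step)):
from ipywidgets import interactive

interactive_plot = interactive(eulerAnimator,
                               x0=(-5.0, 5.0, 0.5),
                               tmax=(1.0, 30.0, 1.0),
                               dt=(0.01, 1.0, 0.01))
interactive_plot   # displaying the widget shows the sliders and the plot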
A static snapshot of the animation applet is shown in Figure 5.5. When you
build this animation you will have control over all three parameters. Like we
mentioned before, this sort of animation can be a great playground for building
insight into the interplay between parameters.
Exercise 5.42. Modify the previous exercise to use a different numerical solver
(e.g. the midpoint method) instead of Euler’s method.
5.6.2 matplotlib.animation
The next animation package that we discuss is the matplotlib.animation
package. This particular package is very similar to ipywidgets.interactive,
but results only in a playable movie that is embedded within the Google Colab
environment.
Next we write all of the code necessary to build an Euler solution for the
differential equation. Take note, of course, that much of this code is specific only
to this problem and what we really need here is code that produces data for the
animation.
def euler(f,x0,t0,tmax,dt): # this is the Euler function
    N = int(np.floor((tmax-t0)/dt)+1)
    t = np.linspace(t0,tmax,N+1)
    x = np.zeros_like(t)
    x[0] = x0
    for n in range(len(t)-1):
        x[n+1] = x[n] + dt*f(t[n],x[n])
    return t, x
Next we have to set up the figure that we are going to animate. This involves:
fig, ax = plt.subplots()
plt.close()
# Below we set up many of the global parameters for the plot.
# Much of what we do here depends on what we are trying to animate.
ax.grid()
ax.set_xlabel('Time')
ax.set_ylabel('Approximate Solution')
ax.set_xlim(( t0, tmax))
ax.set_ylim((np.min(x)-0.5, np.max(x)+0.5))
frame, = ax.plot([], [], linewidth=2, linestyle='--')
# notice we also set line and marker parameters here
Now we build a function that accepts only the animation frame number, N, and
adds appropriate elements to the plot defined by frame.
def animator(N): # N is the animation frame number
    T = t[:N] # get t data up to the frame number
    X = x[:N] # get x data up to the frame number
    # display the current simulation time in the title
    ax.set_title('Time='+str(t[N]))
    # put the data for the current frame into the variable "frame"
    frame.set_data(T,X)
    return (frame,)
In the next block of code we define which frames we want to use in the anima-
tion and then we call upon the matplotlib.animation function to build the
animation.
# The Euler solution takes many very small time steps.
# To speed up the animation we view every 10th iteration.
PlotFrames = range(0,len(t),10)
anim = animation.FuncAnimation(fig, # call on the figure
# next call the function that builds the animation frame
animator,
# next tell which frames to pass to animator
frames=PlotFrames,
# lastly give the delay between frames
interval=100
)
Finally, we embed the animation into the Google Colab environment. Take note
that if you are using a different Python IDE then you may need to experiment
with how to show the resulting animation.
rc('animation', html='jshtml') # embed in the HTML for Google Colab
anim # show the animation
A static snapshot of the resulting animation can be seen in Figure 5.6. The
controls for the animation should be familiar from other media players.
Figure 5.6: Snapshot of the ODE animation applet with matplotlib animation.
Exercise 5.45. Modify the code from the previous exercise to show faster and
slower animations.
• We will always know the value of tn+1 and we will always know the value
of xn , but we don’t know the value of xn+1 . In fact, that is exactly what
we want. The major trouble is that xn+1 shows up on both sides of the
equation. Can you think of a way to solve for it? . . . you have code that
does this step!!!
• This method is called the Backward Euler method and is known as an
implicit method since you do not explicitly calculate xn+1 but instead
there is some intermediate calculation that needs to happen to solve for
xn+1 . The (usual) advantage to an implicit method such as Backward
Euler is that you can take far fewer steps with reasonably little loss of
accuracy. We’ll see that in the coming problems.
Exercise 5.48. Let’s take a few steps through the backward Euler method on
a problem that we know well: x' = −0.5x with x(0) = 6.
Let’s take h = 1 for simplicity, so the backward Euler iteration scheme for this
particular differential equation is
x_{n+1} = x_n − (1/2) x_{n+1}.

Notice that x_{n+1} shows up on both sides of the equation. A little bit of
rearranging gives

(3/2) x_{n+1} = x_n  =⇒  x_{n+1} = (2/3) x_n.
a. Complete the following table.
t                          0   1      2      3      4  5  6  7  8  9  10
Euler Approx. of x         6   3      1.5    0.75
Back. Euler Approx. of x   6   4      2.667  1.778
Exact value of x           6   3.64   2.207  1.339
b. Compare now to what we found for the midpoint method on this problem
as well.
Exercise 5.49. The previous problem could potentially lead you to believe
that the backward Euler method will always result in some other nice difference
equation after some algebraic rearranging. That isn’t true! Let’s consider a
slightly more complicated differential equation and see what happens:

x' = −(1/2) x^2 with x(0) = 6.
a. Recall that the backward Euler approximation is x_{n+1} = x_n + h f(t_{n+1}, x_{n+1}).
Let's take h = 1 for simplicity (we'll make it smaller later). What is the
backward Euler formula for this particular differential equation?
b. You should notice that your backward Euler formula is now a quadratic
function in xn+1 . That is to say, if you are given a value of xn then you
need to solve a quadratic polynomial equation to get xn+1 . Let’s be more
explicit:
We know that x(0) = 6 so in our numerical solutions, x_1 = 6. In order to
get x_2 we consider the equation x_2 = x_1 − (1/2) x_2^2. Rearranging we see that
we need to solve (1/2) x_2^2 + x_2 − 6 = 0 in order to get x_2. Doing so gives us
x_2 = √13 − 1 ≈ 2.606.
c. Go two steps further with the backward Euler method on this problem.
Then take the same number of steps with regular (forward) Euler’s method.
d. Work out the analytic solution for this differential equation (using sepa-
ration of variables perhaps). Then compare the values that you found in
parts (b) and (c) of this problem to values of the analytic solution and
values that you would find from the regular (forward) Euler approximation.
What do you notice?
The complication with the backward Euler method is that you have a
nonlinear equation to solve at every time step:

x_{n+1} − h f(t_{n+1}, x_{n+1}) − x_n = 0.

You know the values of h = ∆t, t_{n+1}, and x_n, and you know the function f,
so, in a practical sense, you should use some sort of Newton’s method iteration
to solve that equation – at each time step. More simply, we could call upon
scipy.optimize.fsolve() to quickly implement a built in Python numerical
root finding technique for us.
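A minimal sketch of that idea is below. The function name backward_euler1d and its structure are one possible organization, not the only one.
import numpy as np
from scipy.optimize import fsolve

def backward_euler1d(f, x0, t0, tmax, dt):
    t = np.arange(t0, tmax + dt, dt)
    x = np.zeros_like(t)
    x[0] = x0
    for n in range(len(t) - 1):
        # solve x_new - x[n] - dt*f(t[n+1], x_new) = 0 for x_new
        g = lambda x_new: x_new - x[n] - dt*f(t[n+1], x_new)
        x[n+1] = fsolve(g, x[n])[0]   # use the current value as the initial guess
    return t, x

# quick test on x' = -0.5x with x(0) = 6 and dt = 1
f = lambda t, x: -0.5*x
t, x = backward_euler1d(f, 6, 0, 10, 1.0)
print(x[:4])   # compare with the hand computation 6, 4, 2.667, 1.778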
Exercise 5.51. Test the Backward Euler method from the previous problem on
several differential equations where you know the solution.
Exercise 5.52. Write a script that outputs a log-log plot with the step size
on the horizontal axis and the error in the numerical method on the vertical
axis. Plot the errors for Euler, Midpoint, Runge Kutta, and Backward Euler
measured against a differential equation with a known analytic solution. Use
this plot to conjecture the convergence rates of the four methods. You can use
the differential equation x' = −(1/3) x + sin(t) with x(0) = 1 like we have for many
of our past algorithms since we know that the solution is
1 −t/3
x(t) = 19e + 3 sin(t) − 9 cos(t)
10
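One possible skeleton for this experiment is sketched below. It only measures Euler's method; the Midpoint, Runge Kutta, and Backward Euler curves would be added in exactly the same way, and the particular step sizes and the use of the maximum absolute error are one reasonable choice among several.
import numpy as np
import matplotlib.pyplot as plt

f = lambda t, x: -(1/3)*x + np.sin(t)
exact = lambda t: (19*np.exp(-t/3) + 3*np.sin(t) - 9*np.cos(t))/10

def euler(f, x0, t0, tmax, dt):
    t = np.arange(t0, tmax+dt, dt)
    x = np.zeros_like(t)
    x[0] = x0
    for n in range(len(t)-1):
        x[n+1] = x[n] + dt*f(t[n], x[n])
    return t, x

dts = [0.2/2**k for k in range(8)]
errors = []
for dt in dts:
    t, x = euler(f, 1, 0, 5, dt)
    errors.append(np.max(np.abs(x - exact(t)))) # worst-case error on [0,5]

# on a log-log plot the slope of this curve approximates the order of the method
plt.loglog(dts, errors, 'b*-', label='Euler')
plt.xlabel('dt')
plt.ylabel('max absolute error')
plt.legend()
plt.grid()
plt.show()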
Exercise 5.53. What is the order of the error on the Backward Euler method?
Given this answer, what are the pros and cons of the Backward Euler method
over the regular Euler method? What about compared to the Midpoint or Runge
Kutta methods?
Exercise 5.54. It may not be obvious at the outset, but the Backward Euler
method will actually behave better than our regular Euler’s method in some
sense. Let’s take a look. Consider, for example, the really simply differential
equation x0 = −x with x(0) = 1 on the interval t ∈ [0, 2]. The analytic solution
is x(t) = e−t . Write Python code that plots the analytic solution, the Euler
approximation, and the Backward Euler approximation on top of each other.
Use a time step that is larger than you normally would (such as ∆t = 0.25 or
∆t = 0.5 or larger). Try the same experiment on another differential equation
where we know the exact solution and the solution has some regions of high
curvature. What do you notice? What does Backward Euler do that is an
improvement on regular Euler?
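A minimal sketch of this comparison for x' = -x is below; for this particular equation the backward Euler step can be solved by hand, x_{n+1} = x_n/(1 + ∆t), so no root finder is needed.
import numpy as np
import matplotlib.pyplot as plt

dt = 0.5
t = np.arange(0, 2+dt, dt)
euler = np.zeros_like(t)
beuler = np.zeros_like(t)
euler[0] = beuler[0] = 1

for n in range(len(t)-1):
    euler[n+1] = euler[n] + dt*(-euler[n]) # forward Euler: x_{n+1} = (1 - dt)*x_n
    beuler[n+1] = beuler[n]/(1 + dt) # backward Euler solved exactly for x_{n+1}

tfine = np.linspace(0, 2, 200)
plt.plot(tfine, np.exp(-tfine), 'k', label='exact')
plt.plot(t, euler, 'b*--', label='Euler')
plt.plot(t, beuler, 'r*--', label='Backward Euler')
plt.legend()
plt.grid()
plt.show()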
Exercise 5.55. (Newton’s Law of Cooling) From Calculus you may recall
Newton’s Law of Cooling:
$$\frac{dT}{dt} = -k(T - T_{ambient})$$
where T is the temperature of some object (like a cup of coffee), Tambient is the
temperature of the ambient environment, and k is the proportionality constant
that governs the rate of cooling. This is a classic differential equation with a well
known solution (if you don't know the solution to Newton's Law of Cooling then take a moment and work it out). In the present situation we don't want the analytic solution, but instead we will work with a numerical solution since we are thinking ahead
to where the differential equation may be very hard to solve in future problems.
We also don’t want to just look at the data and guess an algebraic form for the
function that best fits the data. That would be a trap! (why?) Instead, we
rely on our knowledge of the physics of the situation to give us the differential
equation.
The following data table gives the temperature (degrees F ) at several times while
a cup of tea cools on a table [7]. The ambient temperature of the room is 65◦ F .
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
URL1 = 'https://raw.githubusercontent.com/NumericalMethodsSullivan'
URL2 = '/NumericalMethodsSullivan.github.io/master/data/'
URL = URL1+URL2
data = np.array( pd.read_csv(URL+'Exercise5_newtoncooling.csv') )
# Exercise5_newtoncooling.csv
#
# or you can load the data directly with
# data = np.array([[0,160],[60,155],[180,145],[210,142],[600,120]])
plt.plot(data[??? , ???] , data[??? , ???], 'b*')
plt.grid()
plt.show()
Now we will build several Python functions as well as several additional lines of
code that are created specifically for this problem. Note that every parameter
estimate problem of this type will take similar form, but there may be subtle
differences depending on the data that you need to account for in each problem.
You will need to tailor parts of each parameter estimation script to each new problem.
• First we set the stage by defining ∆t, a collection of times that contains
the data, the function f (t, x; k) which depends on the parameter k, and
any other necessary parameters of our specific problem.
import numpy as np
Tambient = ???
# Next choose an appropriate value of dt.
# Choosing dt so that values of time in the data fall within
# the times for the numerical solution is typically a good
# practice (but is not always possible).
dt = ???
t0 = 0 # time where the data starts
tmax = ??? # just beyond where the data ends
t = np.arange(t0,tmax+dt,dt) # set up the times
# next we define our specific differential equation
f = lambda t, x, k: -k*(x - Tambient)
x0 = ??? # initial condition pulled from the data
• Now we build a Python function that will accept a value of the parameter
k as the only input and will return a high quality numerical solution to
the proposed differential equation.
def numericalSolution(k):
x = np.zeros_like(t)
x[0] = x0
for n in range(len(t)-1):
# put the code necessary to build a good
# numerical solver here be sure to account
# for the parameter k in each of your function calls.
return t, x
• Spend a little time now playing with different parameters and plotting
numerical solutions along with the data to determine the proper ballpark
value of the parameter.
• Now we need to write a short Python script that will find all of the indices where the values of time in the data closely match values of time in the numerical solution. There are many ways to do this, but the most readable is a pair of nested for loops. Outline what the following code does. Why are we using dt/2 in the code below? You should work to find more efficient ways to code this for bigger problems since the nested for loops are potentially quite time consuming.
indices = []
for j in range(len(t)):
for k in range(len(data)):
if # write a check to find where t[j] is closest to the data time data[k,0]
indices.append(j)
• Next we build a Python function that returns the sum of the squared residuals between the numerical solution associated with k and our data. Carefully dissect the following code.
def dataMatcher(k):
t, x = numericalSolution(k)
err = []
counter = 0
for n in indices:
err.append( (data[counter,1] - x[int(n)])**2 )
counter += 1
print("For k=",k[0],", SSRes=",np.sum(err)) # optional
return np.sum(err)
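• With dataMatcher() built, one way to do the optimization step is a single call to scipy.optimize.minimize(); this is a sketch, and the initial guess k0 is a placeholder that you should replace with the ballpark value you found above.
import scipy.optimize
k0 = 0.001 # placeholder initial guess for k; use your ballpark value here
result = scipy.optimize.minimize(dataMatcher, k0)
kBest = result.x[0] # the parameter value that minimizes the sum of squared residuals
print(result)
# plot the best-fit numerical solution against the data
t, x = numericalSolution(kBest)
plt.plot(t, x, 'r', data[:,0], data[:,1], 'b*')
plt.grid()
plt.show()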
• Note: If your optimization does not terminate successfully then you’ll need
to go back to the point where you guess a few values for the parameter so
that your initial guess for scipy.optimize.minimize() is close to what
it should be. It is always helpful to think about the physical context of
the problem to help guide your understanding of which value(s) to choose
for your parameter.
To recap:
• We have data and a proposed differential equation with an unknown
parameter.
• We matched numerical solutions to the differential equation to the data
for various values of the parameter.
• We used an optimization routine to find the value of the parameter that
minimized the sum of the squared residuals between the data and the
numerical solution.
At this point you can now use the best numerical solution to answer questions
about the scientific setup (e.g. extrapolation).
Data was gathered on the outbreak and is shown in the table below.
Use the least squares fitting technique discussed in this section to find the
parameters α and β that minimize the sum of the squared residuals between
a numerical solution of the SIR model and the data. You can load the data
directly with the code below.
Note: The total population is fixed.
import numpy as np
import pandas as pd
URL1 = 'https://raw.githubusercontent.com/NumericalMethodsSullivan'
URL2 = '/NumericalMethodsSullivan.github.io/master/data/'
URL = URL1+URL2
data = np.array( pd.read_csv(URL+'Exercise5_village.csv') )
# Exercise5_village.csv
5.9 Exercises
5.9.1 Algorithm Summaries
Exercise 5.59. Consider the first-order differential equation $x' = f(t, x)$. What
is Euler’s method for approximating the solution to this differential equation?
What is the order of accuracy of Euler’s method? Explain the meaning of the
order of the method in the context of solving a differential equation.
Exercise 5.60. Explain in clear language what Euler’s method does geometri-
cally.
Exercise 5.61. Consider the first-order differential equation $x' = f(t, x)$. What
is the Midpoint method for approximating the solution to this differential
equation? What is the order of accuracy of the Midpoint method? Explain
the meaning of the order of the method in the context of solving a differential
equation.
Exercise 5.62. Explain in clear language what the Midpoint method does
geometrically.
Exercise 5.63. Consider the first-order differential equation $x' = f(t, x)$. What
is the Runge Kutta 4 method for approximating the solution to this differential
equation? What is the order of accuracy of the Runge Kutta 4 method? Explain
the meaning of the order of the method in the context of solving a differential
equation.
Exercise 5.64. Explain in clear language what the Runge Kutta 4 method does
geometrically.
Exercise 5.65. Consider the first-order differential equation $x' = f(t, x)$. What
is the Backward Euler method for approximating the solution to this differential
equation? What is the order of accuracy of the Backward Euler method? Explain
the meaning of the order of the method in the context of solving a differential
equation.
Exercise 5.66. Explain in clear language what the Backward Euler method
does geometrically.
Exercise 5.67. Explain in clear language how to fit a numerical solution of an ODE model to a dataset.
Exercise 5.69. Test the Euler, Midpoint, and Runge Kutta methods on the
differential equation
Find the exact solution by hand using the method of undetermined coefficients
and note that your exact solution will involve the parameter λ. Produce log-log
plots for the error between your numerical solution and the exact solution for
$\lambda = -1, \lambda = -10, \lambda = -10^2, \ldots, \lambda = -10^6$. In other words, create 7 plots (one
for each λ) showing how each of the 3 methods performs for that value of λ at
different values for ∆t.
Exercise 5.70. Two versions of Python code for one dimensional Euler’s method
are given below. Compare and contrast the two implementations. What are the
advantages / disadvantages to one over the other? Once you have made your
pro/con list, devise an experiment to see which of the methods will actually
perform faster when solving a differential equation with a very small ∆t. (You
may want to look up how to time the execution of code in Python.)
def euler(f,x0,t0,tmax,dt):
t = [t0]
x = [x0]
steps = int(np.floor((tmax-t0)/dt))
for n in range(steps):
t.append(t[n] + dt)
x.append(x[n] + dt*f(t[n],x[n]))
return t, x
def euler(f,x0,t0,tmax,dt):
t = np.arange(t0,tmax+dt,dt)
x = np.zeros_like(t)
x[0] = x0
for n in range(len(t)-1):
x[n+1] = x[n] + dt*f(t[n],x[n])
return t, x
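One simple way to run the timing experiment is sketched below with time.perf_counter(); the timeit module is another reasonable option. The sketch assumes that one of the euler() implementations above is currently defined, and the test equation and step size are arbitrary choices.
import time
import numpy as np

f = lambda t, x: -0.5*x # any test ODE will do

start = time.perf_counter()
t, x = euler(f, 6, 0, 10, 1e-5) # whichever version of euler() is currently defined
print("elapsed seconds:", time.perf_counter() - start)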
Exercise 5.71. We wish to solve the boundary value problem $x'' + 4x = \sin(t)$ with initial condition x(0) = 1 and boundary condition x(1) = 2 on the domain t ∈ (0, 1). Notice that you do not have the initial position and initial velocity as
you normally would with a second order differential equation. Devise a method
for finding a numerical solution to this problem.
Exercise 5.72. Write code to numerically solve the boundary value differential equation
$$x'' = \cos(t)\,x' + \sin(t)\,x \quad\text{with}\quad x(0) = 0 \text{ and } x(1) = 1.$$
Exercise 5.73. In this model there are two characters, Romeo and Juliet, whose
affection is quantified on the scale from −5 to 5 described below:
• −5: Hysterical Hatred
• −2.5: Disgust
• 0: Indifference
• 2.5: Sweet Affection
• 5: Ecstatic Love
The characters struggle with frustrated love due to the lack of reciprocity of
their feelings. Mathematically,
• Romeo: “My feelings for Juliet decrease in proportion to her love for me.”
• Juliet: “My love for Romeo grows in proportion to his love for me.”
• Juliet’s emotional swings lead to many sleepless nights, which consequently
dampens her emotions.
This gives rise to the system
$$\frac{dx}{dt} = -\alpha y, \qquad \frac{dy}{dt} = \beta x - \gamma y^2$$
where x(t) is Romeo’s love for Juliet and y(t) is Juliet’s love for Romeo at time
t.
Your tasks:
a. First implement this 2D system with x(0) = 2, y(0) = 0, α = 0.2, β = 0.8,
and γ = 0.1 for t ∈ [0, 60]. What is the fate of this pair’s love under these
assumptions?
b. Write code that approximates the parameter γ that will result in Juliet
having a feeling of indifference at t = 30. Your code should not need
human supervision: you should be able to tell it that you’re looking for
indifference at t = 30 and turn it loose to find an approximation for γ.
Assume throughout this problem that α = 0.2, β = 0.8, x(0) = 2, and
y(0) = 0. Write a description for how your code works in your homework
document.
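A minimal sketch of part (a) is below, using a basic Euler loop on the 2D system; any of the more accurate solvers from this chapter could be dropped in instead, and the step size is an illustrative choice.
import numpy as np
import matplotlib.pyplot as plt

alpha, beta, gamma = 0.2, 0.8, 0.1
dxdt = lambda t, x, y: -alpha*y # Romeo's equation
dydt = lambda t, x, y: beta*x - gamma*y**2 # Juliet's equation

dt = 0.01
t = np.arange(0, 60 + dt, dt)
x = np.zeros_like(t) # Romeo's love for Juliet
y = np.zeros_like(t) # Juliet's love for Romeo
x[0], y[0] = 2, 0

for n in range(len(t) - 1):
    x[n+1] = x[n] + dt*dxdt(t[n], x[n], y[n])
    y[n+1] = y[n] + dt*dydt(t[n], x[n], y[n])

plt.plot(t, x, label='Romeo')
plt.plot(t, y, label='Juliet')
plt.legend()
plt.grid()
plt.show()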
Exercise 5.74. In this problem we’ll look at the orbit of a celestial body around
the sun. The body could be a satellite, comet, planet, or any other object whose
mass is negligible compared to the mass of the sun. We assume that the motion
takes place in a two dimensional plane so we can describe the path of the orbit
with two coordinates, x and y with the point (0, 0) being used as the reference
point for the sun. According to Newton’s law of universal gravitation the system
of differential equations that describes the motion is
$$x''(t) = \frac{-x}{\left(\sqrt{x^2 + y^2}\,\right)^3} \qquad\text{and}\qquad y''(t) = \frac{-y}{\left(\sqrt{x^2 + y^2}\,\right)^3}.$$
a. Define the two velocity functions $v_x(t) = x'(t)$ and $v_y(t) = y'(t)$. Using these functions we can now write the system of two second-order differential equations as a system of four first-order equations:
x' = ________
v_x' = ________
y' = ________
v_y' = ________
b. Solve the system of equations from part (a) using an appropriate solver.
Start with x(0) = 4, y(0) = 0, the initial x velocity as 0, and the initial
y velocity as 0.5. Create several plots showing how the dynamics of the
system change for various values of the initial y velocity in the interval
t ∈ (0, 100).
c. Give an animated plot showing x(t) versus y(t).
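As one sketch of how the resulting four dimensional first order system might be stepped, the code below uses the Midpoint method; the state ordering, step size, and plotting choices are illustrative, not required.
import numpy as np
import matplotlib.pyplot as plt

def rhs(t, s):
    # s = [x, vx, y, vy]; return the right-hand side of the first order system
    x, vx, y, vy = s
    r3 = np.sqrt(x**2 + y**2)**3
    return np.array([vx, -x/r3, vy, -y/r3])

dt = 0.001
t = np.arange(0, 100 + dt, dt)
S = np.zeros((len(t), 4))
S[0] = [4, 0, 0, 0.5] # x(0)=4, x'(0)=0, y(0)=0, y'(0)=0.5

for n in range(len(t) - 1):
    k = rhs(t[n], S[n]) # slope at the left endpoint
    S[n+1] = S[n] + dt*rhs(t[n] + dt/2, S[n] + dt/2*k) # midpoint step

plt.plot(S[:,0], S[:,2]) # x(t) versus y(t)
plt.axis('equal')
plt.grid()
plt.show()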
Exercise 5.75. In this problem we consider the pursuit and evasion problem
where E(t) is the vector for an evader (e.g. a rabbit or a bank robber) and P (t)
is the vector for a pursuer (e.g. a fox chasing the rabbit or the police chasing the
bank robber)
$$E(t) = \begin{pmatrix} x_e(t) \\ y_e(t) \end{pmatrix} \qquad\text{and}\qquad P(t) = \begin{pmatrix} x_p(t) \\ y_p(t) \end{pmatrix}.$$
Let’s presume the following:
Assumption 1: the evader has a predetermined path (known only to him/her),
Assumption 2: the pursuer heads directly toward the evader at all times, and
Assumption 3: the pursuer’s speed is directly proportional to the evader’s
speed.
From the third assumption we have
$$\|P'(t)\| = k\,\|E'(t)\|.$$
Solving for $P'(t)$, the differential equation that we need to solve becomes
$$P'(t) = k\,\|E'(t)\|\,\frac{E(t) - P(t)}{\|E(t) - P(t)\|}.$$
Your Tasks:
a. Explain assumption #2 mathematically.
b. Explain assumption #3 physically. Why is this assumption necessary
mathematically?
c. Write code to find the path of the pursuer if the evader has the parameter-
ized path
$$E(t) = \begin{pmatrix} 0 \\ 5t \end{pmatrix} \quad\text{for } t \ge 0$$
and the pursuer initially starts at the point $P(0) = \begin{pmatrix} 2 \\ 3 \end{pmatrix}$. Write your code
so that it stops when the pursuer is within 0.1 units of the evader. Run
your code for several values of k. The resulting plot should be animated.
d. Modify your code from part (c) to find the path of the pursuer if the evader
has the parameterized path
$$E(t) = \begin{pmatrix} 5 + \cos(2\pi t) + 2\sin(4\pi t) \\ 4 + 3\cos(3\pi t) \end{pmatrix} \quad\text{for } t \ge 0$$
and the pursuer initially starts at the point $P(0) = \begin{pmatrix} 0 \\ 50 \end{pmatrix}$. Write your code
so that it stops when the pursuer is within 0.1 units of the evader. Run
your code for several values of k. The resulting plot should be animated.
e. Create your own smooth path for the evader that is challenging for the
pursuer to catch. Write your code so that it stops when the pursuer is
within 0.1 units of the evader. Run your code for several values of k.
f. (Challenge) If you extend this problem to three spatial dimensions you
can have the pursuer and the evader moving on a multivariable surface
(i.e. hilly terrain). Implement a path along an appropriate surface but be
sure that the velocities of both parties are appropriately related to the
gradient of the surface.
Note: It may be easiest to build this code from scratch instead of using one of
our pre-written codes.
a. What are the units of $\frac{dB}{dt}$ and $\frac{dK}{dt}$?
b. Explain what each of the four terms on the right-hand sides of the differ-
ential equations mean in the context of the problem. Include a reason for
why each term is positive or negative.
c. Find a numerical solution to the differential equation model using B(0) =
75, 000 whales and K(0) = 150 tons per acre.
Exercise 5.78. A trebuchet catapult throws a cow vertically into the air. The
differential equation describing its acceleration is
$$\frac{d^2 x}{dt^2} = -g - c\,\frac{dx}{dt}\left|\frac{dx}{dt}\right|$$
where g ≈ 9.8 m/s² and c ≈ 0.02 m⁻¹ for a typical cow. If the cow is launched
at an initial upward velocity of 30 m/s, how high will it go, and when will it
crash back into the ground? Hint: Change this second order differential equation
into a system of first order differential equations.
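A sketch of one way to set this up is below: rewrite the second order equation as the system x' = v, v' = -g - c*v*|v| (matching the drag term above) and step with Euler until the cow returns to the ground. The step size is an illustrative choice.
g, c = 9.8, 0.02
dt = 0.001
x, v, t = 0.0, 30.0, 0.0 # launched from the ground with upward velocity 30 m/s
peak = 0.0

while x >= 0:
    # Euler step on the system x' = v, v' = -g - c*v*|v|
    x, v, t = x + dt*v, v + dt*(-g - c*v*abs(v)), t + dt
    peak = max(peak, x)

print("approximate peak height (m):", peak)
print("approximate time of impact (s):", t)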
In the code below notice that we plot x[:,0] instead of just x. This is
overkill in the case of a scalar ODE, but in a system of ODEs this will be
important.
• You have to specify the array of time for the scipy.integrate.odeint()
function. It is typically easiest to use np.linspace() to build the array
of times.
import numpy as np
import matplotlib.pyplot as plt
import scipy.integrate
f = lambda x, t: -(1/3.0)*x + np.sin(t)
x0 = 1
t = np.linspace(0,5,1000)
x = scipy.integrate.odeint(f,x0,t)
plt.plot(t,x[:,0],'b--')
plt.grid()
plt.show()
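For a system of ODEs, scipy.integrate.odeint() works in the same way with a vector-valued state. The sketch below uses a damped pendulum as an example system; the particular pendulum equation and its parameters are illustrative assumptions, not a required model.
import numpy as np
import matplotlib.pyplot as plt
import scipy.integrate

def pendulum(s, t):
    # s = [theta, omega]; odeint passes the state first and the time second
    theta, omega = s
    return [omega, -np.sin(theta) - 0.1*omega]

s0 = [np.pi/3, 0] # initial angle and angular velocity
t = np.linspace(0, 20, 1000)
s = scipy.integrate.odeint(pendulum, s0, t)

plt.plot(t, s[:,0], label='angle')
plt.plot(t, s[:,1], label='angular velocity')
plt.legend()
plt.grid()
plt.show()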
Your Tasks:
a. First implement the two blocks of Python code given above. Be sure to
understand what each line of code is doing. Fully comment your code, and
then try the code with several different initial conditions.
b. For the pendulum system be sure to describe what your initial conditions
mean in the physical setup.
c. Use scipy.integrate.odeint() to solve a nontrivial scalar ODE of your
choosing. Clearly show your ODE and give plots of your solutions with
several different initial conditions.
d. Build a numerical experiment to determine the relationship between your
choice of ∆t and the absolute maximum error between the solution from
.odeint() and a known analytic solution to a scalar ODE. Support your
work with appropriate plots and discussion.
e. Solve the system of differential equations from Exercise 5.74 using
scipy.integrate.odeint(). Show appropriate plots of your solution.
5.10 Projects
In this section we propose several ideas for projects related to numerical ordinary
differential equations. These projects are meant to be open ended, to encourage
creative mathematics, to push your coding skills, and to require you to write
and communicate your mathematics. Take the time to read Appendix B before
you write your final solution.
where
• P is a patient’s subjective pain level on a 0 to 10 scale,
• Di is the amount of the ith drug in the patient’s bloodstream,
– D1 is a long-acting opioid
– D2 is a short-acting opioid
– D3 is a non-opioid
• k0 is the relaxation rate to baseline pain without drugs,
• ki is the impact of the ith drug on the relaxation rate,
• u is the patient’s baseline (unmitigated) pain,
• $k_{D_i}$ is the elimination rate of the ith drug from the bloodstream,
• $N_i$ is the total number of doses of the ith drug taken,
• $\tau_{i,j}$ are the times at which the patient takes the ith drug, and
• δ(·) is the Dirac delta function.
Implement this model with parameters u = 8.01, k0 = ln(2)/2, k1 = 0.319,
k2 = 0.184, k3 = 0.201, kD1 = ln(0.5)/(−10), kD2 = ln(0.5)/(−4), and kD3 =
ln(0.5)/(−4). Take the initial pain level to be P (0) = 3 with no drugs on board.
Assume that the patient begins dosing the long-acting opioid at hour 2 and
takes 1 dose periodically every 24 hours. Assume that the patient begins dosing
the short-acting opioid at hour 0 and takes 1 dose periodically every 12 hours.
Finally, assume that the patient takes 1 dose of the non-opioid drug every 48 hours starting at hour 24. Of particular interest are how the pain level evolves
over the first week out of surgery and how the drug concentrations evolve over
this time.
Other questions:
• What does this medication schedule do to the patient’s pain level?
• What happens to the patient’s pain level if he/she forgets the non-opioid
drug?
• What happens to the patient’s pain level if he/she has a bad reaction to
opioids and only takes the non-opioid drug?
• What happens to the dynamics of the system if the patient’s pain starts
at 9/10?
• In reality, the unmitigated pain u will decrease in time. Propose a dif-
ferential equation model for the unmitigated pain that will have a stable
equilibrium at 3 and has a value of 5 on day 5. Add this fifth differential
equation to the pain model and examine what happens to the patient’s
pain over the first week. In this model, what happens after the first week
if the narcotics are ceased?
Now, imagine that you are living 200 years ago, acting as a consultant to an
artillery officer who will be going into battle (perhaps against Napoleon – he was
known for hiring mathematicians to help his war efforts). Although computers
have not yet been invented, given a few hours or a few days to work, a person
living in this time could project trajectories using numerical methods (yes,
numerical solutions to differential equations were well known back then too).
Using this, you can try various initial speeds v0 and angles θ0 until you find
a pair that reach any target. However, the artillery officer needs a faster and
simpler method. He can do math, but performing hundreds or thousands of
numerical calculations on the battlefield is simply not practical. Suppose that
our artillery piece will be firing at a target that is a distance ∆x away, and that ∆x is approximately half a mile – not exactly half a mile, but in that general neighborhood.
a. Develop a method for estimating v0 and θ0 with reasonable accuracy given
the exact range to the target, ∆x. Your method needs to be simple enough
to use in real time on a historic (Napoleon-era) battle field without the
aid of a computer. (Be sure to persuade me that your numerical solution
is accurate enough.)
Partial Differential Equations
• heat transport
– The heat equation models heat energy (temperature) diffusing through
a metal rod or a solid body
• diffusion of a concentrated substance
– The diffusion equation is a PDE model for the diffusion of smells,
contaminants, or the motion of a solute
• wave propagation
– The wave equation is a PDE that can be used to model the standing waves on a guitar string, the waves on a lake, or sound waves traveling through the air.
Exercise 6.1. With your partner answer each of the following questions. The
main ideas in this problem should be review from multivariable calculus. If you
and your partner are stuck then ask another group.
a. What is a partial derivative (explain geometrically)?
b. What is the gradient of a function? What does it tell us physically or geometrically? If $u(x, y) = x^2 + \sin(xy)$ then what is ∇u?
c. What is the divergence of a vector-valued function? What does it tell us physically or geometrically? If $F(x, y) = \langle \sin(xy),\, x^2 + y^2 \rangle$ then what is ∇ · F?
d. If u is a function of x, y, and z then what is ∇ · ∇u?
Exercise 6.2. Consider the PDE ut = Duxx where u(t, x) is the temperature of
a long thin metal rod at time t (in seconds) and spatial location x (in meters).
Note: the symbol $u_t$ is quick shorthand for the partial derivative $\frac{\partial u}{\partial t}$ and $u_{xx}$ is quick shorthand for the second partial derivative $\frac{\partial^2 u}{\partial x^2}$.
a. What are the units of the constant D?
b. For each of the following functions, test whether it is an analytical solution
to this PDE by taking the first derivative with respect to time, the second
derivative with respect to position, and substituting them into this equation
to see if we get an identity (a true statement). If D = 3, which of these
functions is a solution? Be able to defend your answer.
i. $u(t, x) = 4x^3 + 6t^2$
ii. $u(t, x) = 7x + 5$
iii. $u(t, x) = 8x^2 t$
iv. $u(t, x) = e^{3t + x}$
v. $u(t, x) = 6e^{3t + x} + 5x - 2$
vi. $u(t, x) = e^{-3t} + \sin(x)$
vii. $u(t, x) = e^{3t}\sin(x)$
viii. $u(t, x) = e^{-3t}\sin(x)$
ix. $u(t, x) = 5e^{-3t}\sin(x) + 6x + 7$
x. $u(t, x) = -4e^{-3t}\sin(x) + 3t + 2$
xi. $u(t, x) = e^{-2t}\sin(3x)$
xii. $u(t, x) = e^{-12t}\cos(3x)$
xiii. $u(t, x) = e^{-12t}\cos(3x) + 4x^2 + 8$
xiv. $u(t, x) = e^{-75t}\cos(5x)$
xv. $u(t, x) = 9e^{-75t}\cos(5x) + 2x + 7$
Exercise 6.3. Consider the PDE ut = Duxx , and suppose that D = 4. For
each of the following functions, find the value of the parameter a that will make
the function solve the PDE, by taking derivatives and substituting them into
the equation.
a. $u(t, x) = 6e^{-8t}\sin(ax)$
b. $u(t, x) = -5e^{38t}\cos(ax)$
c. $u(t, x) = 3e^{at}\sin(5x)$
d. $u(t, x) = 7e^{at}\cos(2x)$
e. $u(t, x) = ae^{-36t}\cos(3x)$
f. $u(t, x) = ae^{-4t}\cos(6x)$
Exercise 6.4. Consider again the PDE $u_t = Du_{xx}$. Is the function $u(t, x) = x^2 - t^3$ a valid solution for this differential equation? If so, what is the value of the constant D?
a. Calculate $u_t$ and $u_{xx}$:
$u_t = $ ________ and $u_{xx} = $ ________
Exercise 6.5. The PDE $u_t = Du_{xx}$ can be seen as making two statements: (1) the time derivative of the function u(t, x) is related to the function u itself, and (2) the second spatial derivative of the function u(t, x) is related to the function u itself.
a. What sort of function has the property that when you take one derivative you get a scaled version of the function back?
b. What sort of function has the property that when you take two derivatives you get a scaled version of the function back?
c. Based on your answers to parts (a) and (b), propose a function that might
be a solution to the PDE.
Exercise 6.6. Is the function $u(t, x) = e^{-0.2t}\sin(\pi x)$ a solution to the PDE $u_t = Du_{xx}$? If this function is a solution to the PDE then what is the associated value of D?
Exercise 6.7. Is a scalar multiple of the function in the previous exercise also a solution to the PDE $u_t = Du_{xx}$ with the exact same value for D? Will this always be true? That is, if we have one solution u(t, x) to the PDE $u_t = Du_{xx}$ then will cu(t, x) be another solution for any real number c?
Exercise 6.8. When we studied ODEs we always had a starting point for a
solution – the initial condition. In the case of a PDE we also need to have an
initial condition, but the initial condition is associated with every point in the
spatial domain. Hence, the initial condition is actually a function of x. In the previous exercise you found that $u(t, x) = e^{-0.2t}\sin(\pi x)$ is a solution to the PDE $u_t = Du_{xx}$. What is the initial condition that this solution satisfies? In other words, what is the function u(t, x) at time t = 0?
Exercise 6.9. Since we have both temporal and spatial variables in PDEs
it stands to reason that we need conditions on both variables in order to get
a unique solution to the PDE. For the PDE ut = Duxx we already saw that
u(t, x) = e−0.2t sin(πx) is a solution to the PDE and the initial condition for
that solution is u(0, x) = sin(πx). If we are solving the PDE on the domain x ∈ [0, 1] then what are the conditions that hold for all time at the points x = 0 and x = 1? These conditions are called boundary conditions.
a. The first idea is to show several discrete snapshots of time and to arrange
the plots in an array so we can read from left to right to see the evolution
in time.
import numpy as np
import matplotlib.pyplot as plt
u = lambda t, x: np.exp(-0.2*t) * np.sin(np.pi*x)
x = # code that gives 100 equally spaced points from 0 to 1
t = # code that gives 16 equally spaced points from 0 to 10
fig, ax = plt.subplots(nrows=4,ncols=4)
counter = 0 # this counter will count through the times
for n in range(4):
    for m in range(4):
        ax[n,m].plot(??? , ???, 'b') # plot x vs u(t[counter],x)
        ax[n,m].grid()
        counter += 1 # move on to the next stored time
plt.show()
def plotter(T):
plt.plot(x , u(T,x), 'b')
plt.grid()
plt.ylim(0,1)
plt.show()
Figure 6.3: 3D plot showing the time evolution of the solution to the PDE.
from matplotlib import animation, rc # imports needed for the animation below
u = lambda t, x: np.exp(-0.2*t)*np.sin(np.pi*x)
x = np.linspace(0,1,25)
t = np.linspace(0,10,101)
fig, ax = plt.subplots()
plt.close()
ax.grid()
ax.set_xlabel('x')
ax.set_xlim(( 0, 1))
ax.set_ylim(( 0, 1))
frame, = ax.plot([], [], linewidth=2, linestyle='--')
def animator(N):
U = u(t[N],x)
ax.set_title('Time='+str(t[N]))
frame.set_data(x,U)
return (frame,)
PlotFrames = range(0,len(t),1)
anim = animation.FuncAnimation(fig,
                               animator,
                               frames=PlotFrames,
                               interval=100,
                               )
rc('animation', html='jshtml') # embed in the HTML for Google Colab
anim
Figure 6.4: Snapshot of an animation applet showing the time evolution of the
solution to the PDE.
Exercise 6.11. In the previous problem you built several plots of the function $u(t, x) = e^{-0.2t}\sin(\pi x)$ as a solution to the heat equation $u_t = Du_{xx}$.
a. Based on the plots, why do you think the equation $u_t = Du_{xx}$ is called the "heat equation?" That is, why do the solutions look like dissipating heat? Explain why your answer makes sense if we are solving an equation, called the "heat equation," that models the diffusion of heat through an object. Hint: think of this object as a long thin metal rod and take note that the boundary conditions are both 0.
Exercise 6.12. Propose another solution to the PDE $u_t = Du_{xx}$ that exactly matches the boundary conditions u(t, 0) = 0 and u(t, 1) = 0 for all time t AND has exactly the same value for D as the function $u(t, x) = e^{-0.2t}\sin(\pi x)$. What is the new initial condition associated with your new solution?
Hint: You may want to start with a function of the form $u(t, x) = e^{at}\sin(bx)$ and then determine values of a and b that will satisfy all of the required conditions.
Exercise 6.13. We now have two solutions to the PDE ut = Duxx that satisfy
both the PDE and the boundary conditions u(t, 0) = 0 and u(t, 1) = 0.
a. Prove that the sum of the two solutions also satisfies the PDE and the same boundary conditions. The sum is therefore another valid solution to the PDE.
b. What is the initial condition associated with the new solution you found
in part (a)?
c. Use the code that you built above to show the time evolution of your new
solution.
Exercise 6.14. Prove the previous theorem. Then extend the theorem to show
that if there are many functions that satisfy ut = Duxx and the boundary
conditions u(t, 0) = u(t, 1) = 0 then the sum of all of the functions is also a
solution and also satisfies the boundary conditions.
Exercise 6.15. Propose several solutions to the PDE ut = Duxx with the
boundary conditions u(t, 0) = 0 and ux (t, 1) = 0. That is to say that the
function u(t, x) is 0 at x = 0 and the derivative of u with respect to x at x = 1
is 0 (there is a horizontal tangent line to the function u at x = 1 for all times t).
Then use your plotting code to verify that your solution satisfies the boundary
conditions and visually shows the diffusion of heat as time evolves.
At this point we have a good notion of what the solutions to the PDE ut = Duxx
look and behave like. Now let’s ramp this up to two spatial dimensions.
Exercise 6.16. Leverage what you learned in the previous exercises to propose
a function u(t, x, y) that solves the equation
ut = D (uxx + uyy )
def plotter(T):
fig = plt.figure(figsize=(15,12))
ax = fig.gca(projection='3d')
z = u(T,x,y)
ax.plot_surface(x,y,z)
ax.set_xlabel('x')
ax.set_ylabel('y')
ax.set_zlabel('u(t,x,y)')
ax.set_zlim(0,1)
plt.show()
Exercise 6.17. Prove that the function $u(t, x, y) = e^{-0.2t}\sin(\pi x)\sin(\pi y)$ is a solution to the two dimensional heat equation $u_t = D(u_{xx} + u_{yy})$. Determine the value of D for this particular solution. What are the boundary conditions and the initial condition?
$$\frac{\partial^2 u}{\partial t^2} = c\,\frac{\partial^2 u}{\partial x^2}$$
where u is the height of the wave at time t.
a. What are the units of c?
b. Reading from left-to-right, the partial differential equation says that the
second derivative of some function of t is related to that same function. If
you had to guess the type of function, what would you guess and why?
c. Reading from right-to-left, the partial differential equation says that the
second derivative of some function of x is related to that same function. If
you had to guess the type of function, what would you guess and why?
d. Based on your guesses from parts (b) and (c), what type of function would you think is a reasonable solution for the differential equation? Why?
e. If u(0, x) = sin(2πx) is the initial condition for the PDE and the boundary
conditions are u(t, 0) = u(t, 1) = 0 then propose a solution that matches
these conditions and make plots showing how the solution behaves over
time.
Theorem 6.2. If the function u(t, x) solves the 1D wave equation utt = cuxx
then u(t, x) likely has the functional form
u(t, x) = .
Exercise 6.20. Make several plots of your solution showing the time evolution
of the function. Examples of the plots are shown in Figures 6.6 and 6.7. Your
plots may look different given the oscillation period and the initial condition.
Exercise 6.21. If $u_0(t, x)$ and $u_1(t, x)$ are both solutions to the wave equation $u_{tt} = cu_{xx}$ matching boundary conditions u(t, 0) = u(t, 1) = 0, then is a linear
combination of u0 and u1 also a solution that matches the particular boundary
conditions?
Exercise 6.22. Consider the wave equation $u_{tt} = c(u_{xx} + u_{yy})$ where u(t, x, y) is the height (in centimeters) of a wave at time t in seconds and spatial location (x, y) (each in centimeters).
a. What are the units of the constant c?
b. For each of the following functions, test whether it is an analytical solution
to this PDE by substituting the derivatives into the equation. If c = 2,
which of these functions is a solution?
i. u(t, x, y) = 3x + 2y + 5t − 6
ii. $u(t, x, y) = 3x^2 + 2y^2 + 5t^2 - 6$
iii. u(t, x, y) = sin(2x) + cos(3y) + sin(4t)
iv. u(t, x, y) = sin(2x) cos(3y) sin(4t)
v. u(t, x, y) = sin(3x) cos(4y) sin(10t)
vi. u(t, x, y) = −6 sin(3x) cos(4y) sin(10t) + 2x − 3y + 9 − 12
vii. u(t, x, y) = cos(7x) cos(3y) cos(12t)
viii. $u(t, x, y) = \cos(5x)\sin(12y)\cos(13\sqrt{2}\,t)$
c. Make plots of the time evolution of the solutions from part (b). What phenomena do you observe in the plots?
Figure 6.7: 3D plot of the time evolution of a solution to the wave equation.
Theorem 6.3. If the function u(t, x, y) solves the 2D wave equation utt =
c (uxx + uyy ) then u(t, x, y) likely has the functional form
u(t, x, y) = .
Exercise 6.24. Propose a solution to the wave equation utt = cuxx where
u(t, 0) = 0 and ux (t, 1) = 0.
At this point we have only examined two PDEs, the heat and wave equations,
and have proposed possible forms of the analytic solutions. These two particular
PDEs have nice analytic solutions in terms of exponential and trigonometric
functions so it isn’t terribly challenging to guess at the proper functional forms
of the solutions. However, if we were to change the initial conditions, the boundary conditions, or the PDE itself, then guessing the functional form of a solution quickly becomes impractical and we need numerical techniques instead.
Definition 6.2. Let's say that we want to solve a PDE with variables t and x on the domain x ∈ [0, 1].
• The initial condition is a function f (x) where u(0, x) = f (x). In other
words, we are dictating the value of u at every point x at time t = 0.
• The boundary conditions are restrictions for how the solution behaves at
x = 0 and x = 1 (for this problem).
– If the value of the solution u at the boundary is either a fixed value or a
fixed function of time then we call the boundary condition a Dirichlet
boundary condition. For example, u(t, 0) = 1 and u(t, 1) = 5
are Dirichlet boundary conditions for this domain. Similarly, the
conditions u(t, 0) = 0 and u(t, 1) = sin(100πt) are also Dirichlet
boundary conditions. Dirichlet boundary conditions give the exact
value of u at the boundary points.
– If the value of the solution u depends on the flux of u at the boundary then we call the boundary condition a Neumann boundary condition. For example, $\frac{\partial u}{\partial x}(t, 0) = 0$ and $\frac{\partial u}{\partial x}(t, 1) = 0$ are Neumann boundary conditions. They state that the flux of u is fixed at the boundaries.
Let’s play with a couple problems that should help to build your intuition about
boundary conditions in PDEs. Again, we will do this graphically instead of
numerically.
Exercise 6.25. Consider solving the heat equation ut = Duxx in 1 spatial
dimension.
a. If a long thin metal rod is initially heated in the middle and the temperature
at the ends of the rod is held fixed at 0 then the heat diffusion is described
by the heat equation. What type of boundary conditions do we have in this
setup? How can you tell? Draw a picture showing the expected evolution
of the heat equation with these boundary conditions.
b. What if we take the initial condition for the 1D heat equation to be u(0, x) = cos(2πx) and enforce the conditions $\frac{\partial u}{\partial x}\Big|_{x=0} = 0$ and u(t, 1) = 1?
What types of boundary conditions are these? Draw a collection of pictures
showing the expected evolution of the heat equation with these boundary
conditions.
Exercise 6.26. Consider solving the wave equation utt = cuxx in 1 spatial
dimension.
a. If a guitar string is pulled up in the center and held fixed at the frets then
the resulting vibrations of the string are described by the wave equation.
What type of boundary conditions do we have in this setup? How can you
tell? Draw a picture showing the expected evolution of the wave equation
with these boundary conditions.
b. What if we take the initial condition for the 1D wave equation to be u(0, x) = cos(2πx) and enforce the conditions $\frac{\partial u}{\partial x}\Big|_{x=0} = 0$ and u(t, 1) = 1?
What types of boundary conditions are these? Draw a collection of pictures
showing the expected evolution of the wave equation with these boundary
conditions.
The next two problems should help you to understand some of the basic scenarios
that we might wish to solve with the heat and wave equation.
Exercise 6.27. For each of the following situations propose meaningful boundary
conditions for the 1D or 2D heat equation.
a. A thin metal rod 1 meter long is heated to 100◦ C on the left end and is
cooled to 0◦ C on the right end. We model the heat transport with the 1D
heat equation ut = Duxx . What are the appropriate boundary and initial
conditions?
b. A thin metal rod 1 meter long is insulated on the left end so that the heat
flux through that end is 0. The rod is held at a constant temperature of
50◦ C on the right end. We model the heat transport with the 1D heat
equation ut = Duxx . What are the appropriate boundary conditions?
c. In a soil-science lab a column of packed soil is insulated on the sides and
cooled to 20◦ C at the bottom. The top of the column is exposed to a heat
lamp that cycles periodically between 15◦ C and 25◦ C and is supposed to
mimic the heating and cooling that occurs during a day due to the sun.
We model the heat transport within the column with the 1D heat equation
ut = Duxx . What are the appropriate boundary conditions?
d. A thin rectangular slab of concrete is being designed for a sidewalk. Imagine
the slab as viewed from above. We expect the right-hand side to be heated
to 50◦ C due to radiant heating from the road and the left-hand side to be
cooled to approximately 20◦ C due to proximity to a grassy hillside. The
top and bottom of the slab are insulated with a felt mat so that the flux
of heat through both ends is zero. We model the heat transport with the
2D heat equation ut = D(uxx + uyy ). What are the appropriate boundary
conditions?
Exercise 6.28. For each of the following situations propose meaningful boundary
conditions for the 1D and 2D wave equation.
a. A guitar string is held tight at both ends and plucked in the middle.
We model the vibration of the guitar string with the 1D wave equation
utt = cuxx . What are the appropriate boundary conditions?
b. A rope is stretched between two people. The person on the left holds the
rope tight and doesn’t move. The person on the right wiggles the rope in
a periodic fashion completing one full oscillation per second. We model
the waves in the rope with the 1D wave equation utt = cuxx . What are
the appropriate boundary conditions?
c. A rubber membrane is stretched taut on a rectangular frame. The frame
is held completely rigid while the membrane is stretched from equilibrium
and then released. We model the vibrations in the membrane with the
2D wave equation utt = c(uxx + uyy ). What are the appropriate boundary
conditions?
Finally, if we take the limits as the time step, ∆t, and the length of the spatial intervals, ∆x, get arbitrarily small we get
$$\frac{\partial u}{\partial t} = D\,\frac{\partial^2 u}{\partial x^2}$$
where we have combined the coefficients on the right-hand side into the diffusion coefficient D.
If there were a mechanism forcing density into each of the small intervals then we would end up with the forced heat equation
$$\frac{\partial u}{\partial t} = D\,\frac{\partial^2 u}{\partial x^2} + f(x)$$
where the function f(x) would model exactly how the density is being forced
into each spatial point x. We’ll let f (x) = 0 for the majority of this section for
simplicity, but you can modify any of the code that you write in this section to
include a forcing term.
Derivations of the 2D and 3D diffusion equations are very similar. You should
stop now and at least work out the details of the 2D heat equation.
In the remainder of this section we’ll use a technique called the finite difference
method to build numerical approximations to solutions of the heat equation in
1D, 2D, and 3D.
For the sake of simplicity we will start by considering the time dependent heat
equation in 1 spatial dimension with no external forcing function
$$\frac{\partial u}{\partial t} = D\,\frac{\partial^2 u}{\partial x^2}.$$
The constant D is called the diffusivity (the rate of diffusion) so in terms of
physical problems, if D is small then the diffusion occurs slowly and if D is large
then the diffusion occurs quickly. Just as we did in Chapter 3 to approximate
derivatives and integrals numerically, and also in Chapter 5 to approximate
solutions to ODEs, we will start by partitioning the domain into finitely many
pieces and we will partition time into finitely many pieces.
$$u_t = Du_{xx}.$$
$$U_i^{n+1} = ??? + \frac{D\Delta t}{\Delta x^2}\left(??? - ??? + ???\right).$$
The iterative scheme which you just derived is called a finite difference scheme for the heat equation. Notice that the term on the left is the only term at the next time step n + 1. So, for every spatial point $x_i$ we can build $U_i^{n+1}$ by evaluating the right-hand side of the finite difference scheme.
e. What is the expected order of the error for the approximation of the time
derivative in the finite difference scheme from part (d)?
f. What is the expected order of the error for the approximation of the spatial
second derivative in the finite difference scheme from part (d)?
g. The numerical errors made by using the finite difference scheme we just
built come from two sources: from the approximation of the time derivative
and from the approximation of the second spatial derivative. The total
error is the sum of the two errors. Fill in the question marks in the powers
of the following expression:
h. Explain what the result from part (g) means in plain English.
There are many different finite difference schemes due to the fact that there are
many different ways to approximate derivatives (See Chapter 3). One convenient
way to keep track of which information you are using and what you are calculating
in a finite difference scheme is to use a finite difference stencil image. Figure
6.8 shows the finite difference stencil for the approximation to the heat equation
that you built in the previous exercise.
Figure 6.8: The finite difference stencil for the 1D heat equation.
In this figure we are showing that the function values $U_{i-1}^n$, $U_i^n$, and $U_{i+1}^n$ at the points $x_{i-1}$, $x_i$, and $x_{i+1}$ are being used at time step $t_n$ to calculate $U_i^{n+1}$. We will build similar stencil diagrams
for other finite difference schemes throughout this chapter.
Exercise 6.30. Now we want to implement your answer to part (d) of the
previous exercise to approximate the solution to the following problem:
Solve: ut = 0.5uxx
with
x ∈ (0, 1), u(0, x) = sin(2πx), u(t, 0) = 0, and u(t, 1) = 0.
Some partial code is given below to get you started.
• First we import the proper libraries, set up the time domain, and set up
the spatial domain.
import numpy as np
import matplotlib.pyplot as plt
from ipywidgets import interactive
x = # set up a linearly spaced spatial domain on [0,1]
t = # set up a linearly spaced temporal domain
dx = x[1]-x[0]
dt = t[1]-t[0]
# The next two lines build two parameters that are of interest
# for the finite difference scheme.
D = 0.5 # The diffusion coefficient for the heat equation given.
# The coefficient "a" appears in the finite difference scheme.
a = D*dt / dx**2
print("dt=",dt,", dx=",dx," and D dt/dx^2=",a)
• Next we build the array U so we can store all of the approximations at all
times and at all spatial points. The array will have the dimensions len(t)
versus len(x). We then need to enforce the boundary conditions so for all
times we fill the proper portions of the array with the proper boundary
conditions. Lastly, we will build the initial condition for all spatial steps
in the first time step.
U = np.zeros( (len(t),len(x)) )
U[:,0] = # left boundary condition
U[:,-1] = # right boundary condition
U[0,:] = # the function for the init. condition (should depend on x)
• Now we step through a loop that fills the U array one row at a time. Keep
in mind that we want to leave the boundary conditions fixed so we will
only fill indices 1 through -2 (stop and explain this). Be careful to get the indexing correct. For example, if we want $U_i^n$ we use U[n,1:-1], if we want $U_{i+1}^n$ we use U[n,2:], and if we want $U_i^{n+1}$ we use U[n+1,1:-1], etc.
for n in range(len(t)-1):
U[n+1,1:-1] = U[n,?:?] + a*( U[n,?:] - 2*U[n,?:?] + U[n,:?])
Note: If you don’t want to do an interactive plot then you can produce several
snapshots of the solutions with the following code.
for Frame in range(0,len(t),20): # ex: build every 20th frame
plotter(Frame)
Exercise 6.31. You may have found that you didn't get a sensible solution out for the previous problem. The point of this exercise is to show that the value of $a = D\frac{\Delta t}{\Delta x^2}$ controls the stability of the finite difference solution to the heat equation, and furthermore that there is a cutoff for a below which the finite difference scheme will be stable. Experiment with values of ∆t and ∆x and conjecture the values of $a = D\frac{\Delta t}{\Delta x^2}$ that give a stable result. Your conjecture should take the form of a bound on a.
Exercise 6.32. Consider the one dimensional heat equation with diffusion
coefficient D = 1:
ut = uxx .
We want to solve this equation on the domain x ∈ (0, 1) and t ∈ (0, 0.5)
subject to the initial condition u(0, x) = sin(πx) and the boundary conditions
u(t, 0) = u(t, 1) = 0.
a. Prove that the function $u(t, x) = e^{-\pi^2 t}\sin(\pi x)$ is a solution to this heat equation, satisfies the initial condition, and satisfies the boundary conditions.
b. Pick values of ∆t and ∆x so that you can get a stable finite difference
solution to this heat equation. Plot your results on top of the analytic
solution from part (a).
c. Now let’s change the initial condition to u(0, x) = sin(πx) + 0.1 sin(100πx).
2 4 2
Prove that the function u(t, x) = e−π t sin(πx) + 0.1e−10 π t sin(100πx) is
a solution to this heat equation, matches this new initial condition, and
matches the boundary conditions.
d. Pick values of ∆t and ∆x so that you can get a stable finite difference
solution to this heat equation. Plot your results on top of the analytic
solution from part (c).
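One possible shape for this comparison is sketched below: it runs the explicit finite difference scheme for u_t = u_xx with the initial condition from part (a) and overlays the analytic solution at the final time. The grid sizes are just one stable choice.
import numpy as np
import matplotlib.pyplot as plt

D = 1
x = np.linspace(0, 1, 21)
t = np.linspace(0, 0.5, 501)
dx = x[1] - x[0]
dt = t[1] - t[0]
a = D*dt/dx**2 # needs to be small enough for stability
print("a =", a)

U = np.zeros((len(t), len(x)))
U[0,:] = np.sin(np.pi*x) # initial condition
U[:,0] = 0 # boundary condition at x=0
U[:,-1] = 0 # boundary condition at x=1

for n in range(len(t)-1):
    U[n+1,1:-1] = U[n,1:-1] + a*(U[n,2:] - 2*U[n,1:-1] + U[n,:-2])

exact = lambda t, x: np.exp(-np.pi**2 * t)*np.sin(np.pi*x)
plt.plot(x, U[-1,:], 'b*', label='finite difference at t=0.5')
plt.plot(x, exact(0.5, x), 'k', label='analytic solution at t=0.5')
plt.legend()
plt.grid()
plt.show()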
Exercise 6.33. In any initial and boundary value problem such as the heat
equation, the boundary values can either be classified as Dirichlet or Neumann
type. In Dirichlet boundary conditions the values of the solution at the boundary
are dictated specifically. So far we have only solved the heat equation with
Dirichlet boundary conditions. In contrast, Neumann boundary conditions
dictate the flux at the boundary instead of the value of the solution. Consider
the 1D heat equation ut = uxx with boundary conditions ux (t, 0) = 0 and
u(t, 1) = 0 with initial condition u(0, x) = cos(πx/2). Notice that the initial condition satisfies both boundary conditions: $\frac{d}{dx}\big(\cos(\pi x/2)\big)\Big|_{x=0} = 0$ and $\cos(\pi \cdot 1/2) = 0$. As the heat profile evolves in time the Neumann boundary
condition ux (t, 0) = 0 says that the slope of the solution needs to be fixed at 0
at the left-hand boundary.
a. Draw several images of what the solution to the PDE should look like as
time evolves. Be sure that all boundary conditions are satisfied and that
your solution appears to solve the heat equation.
b. The Neumann boundary condition $u_x(t, 0) = 0$ can be approximated with the first order approximation
$$u_x(t_n, 0) \approx \frac{U_1^n - U_0^n}{\Delta x}.$$
If we set this approximation to 0 (since $u_x(t, 0) = 0$) and solve for $U_0^n$ we get an additional constraint at every time step of the numerical solution to the heat equation. What is this new equation?
c. Modify your 1D heat equation code to implement this Neumann boundary
condition. Give plots that demonstrate that the Neumann boundary is
indeed satisfied.
Exercise 6.34. Modify your 1D heat equation code to solve the following
problems. For each be sure to classify the type of boundary conditions given.
Notice that we are now using initial and boundary conditions where it would be
quite challenging to build the analytic solution so we will only show numerical
solutions. Be sure that you choose ∆t and ∆x so that your solution is stable.
a. Solve ut = 0.5uxx with x ∈ (0, 1), u(0, x) = x2 , u(t, 0) = 0 and u(t, 1) = 1.
b. Solve ut = 0.5uxx with x ∈ (0, 1), u(0, x) = 1 − cos(πx/2), ux (t, 0) = 0 and
u(t, 1) = 1.
c. Solve ut = 0.5uxx with x ∈ (0, 1), u(0, x) = sin(2πx), u(t, 0) = 0 and
u(t, 1) = sin(5πt).
d. Solve ut = 0.5uxx + x2 /100 with x ∈ (0, 1), u(0, x) = sin(2πx), u(t, 0) = 0
and u(t, 1) = 0.
ut = D (uxx + uyy ) .
The finite difference stencil for the 2D heat equation is a bit more complicated
since we now have three indices to track. Hence, the stencil is naturally three
dimensional. Figure 6.9 shows the stencil for the finite difference scheme that
we built in the previous exercise. The left-hand subplot in the figure shows the
five points used in time step tn , and the right-hand subplot shows the one point
that is calculated at time step tn+1 .
Figure 6.9: The finite difference stencil for the 2D heat equation.
Exercise 6.36. Now we need to implement the finite difference scheme that you developed in the previous problem. As a model problem, consider the 2D heat equation $u_t = D(u_{xx} + u_{yy})$ on the domain (x, y) ∈ [0, 1] × [0, 1] with the initial condition u(0, x, y) = sin(πx) sin(πy), homogeneous Dirichlet boundary conditions (on this domain these mean that u(t, x, 0) = u(t, x, 1) = u(t, 0, y) = u(t, 1, y) = 0), and D = 1. Fill in the holes in the following code chunks.
• First we import the proper libraries and set up the domains for x, y, and t.
import numpy as np
import matplotlib.pyplot as plt
from matplotlib import cm # this allows for color maps
from ipywidgets import interactive
• Next we have to set up the boundary and initial conditions for the given
problem.
U[0,:,:] = # initial condition depending on X and Y
U[:,0,:] = # boundary condition for x=0
U[:,-1,:] = # boundary condition for x=1
U[:,:,0] = # boundary condition for y=0
U[:,:,-1] = # boundary condition for y=1
• We know that the value of $D\Delta t/\Delta x^2$ controls the stability of the finite difference method. Therefore, the next step in our code is to calculate this value and print it.
D = 1
a = D*dt/dx**2
print(a)
• Next comes the part of the code that actually calculates all of the time steps.
Be sure to keep the indexing straight. Also be sure that we are calculating
all of the spatial indices inside the domain since the boundary conditions
dictate what happens on the boundary.
for n in range(len(t)-1):
U[n+1,1:-1,1:-1] = U[n,1:-1,1:-1] + \
a*(U[n, ?:? , ?:?] + \
U[n, ?:?, ?:?] - \
4*U[n, ?:?, ?:?] + \
U[n, ?:?, ?:?] + \
U[n, ?:?, ?:?])
plt.show()
Fill in all of the holes in the code and verify that your solution appears to solve
a heat dissipation problem.
Theorem 6.4. In order for the finite difference solution to the 2D heat equation on a square domain to be stable we need $D\Delta t/\Delta x^2 <$ ________.
Experiment with several parameters to empirically determine the bound.
Exercise 6.38. Now solve the 2D heat equation on a rectangular domain. You will need to make some modifications to your code since assuming that ∆x = ∆y is unlikely to be reasonable any longer. Again, be prepared to present your solutions.
Exercise 6.40. Solve the 2D heat equation on the unit square with the following
parameters:
• A partition of 21 points in both the x and y direction.
• 301 points between 0 and 0.25 for time
• An initial condition of u(0, x, y) = sin(πx) sin(πy)
What happens near time step number 70?
Exercise 6.41. What you saw in the previous two exercises is an example of a
sawtooth error that occurs when a numerical solution technique for a PDE is
unstable. Propose a conjecture for why this type of error occurs.
Theorem 6.5. Let’s summarize the stability criteria for the finite difference
solutions to the heat equation.
• In the 1D heat equation the finite difference solution is stable if $D\Delta t/\Delta x^2 <$ ________.
• In the 2D heat equation the finite difference solution is stable if $D\Delta t/\Delta x^2 <$ ________ (assuming a square domain where ∆x = ∆y).
• Propose a stability criterion for the 3D heat equation.
Exercise 6.42. Rewrite your finite difference code so that it produces an error
message when the parameters will result in an unstable finite difference solution.
Do the same for your 2D heat equation code.
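A sketch of the kind of guard this exercise is asking for is below; the threshold a_max is left as an argument so that you can plug in whatever bound you conjectured.
def check_stability(D, dt, dx, a_max):
    # Raise an error if the explicit scheme is expected to be unstable.
    a = D*dt/dx**2
    if a >= a_max:
        raise ValueError("D*dt/dx^2 = " + str(a) + " exceeds the stability bound " + str(a_max))
    return a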
It is actually possible to beat the stability criteria given in the previous exercises!
What follows are two implicit methods that use a forward-looking scheme to help get around this stability restriction.
b. Now we’re going to build a very small example with only 6 spatial points
so that you can clearly see the structure of the resulting linear system.
i. If we have 6 total points in the spatial grid (x0 , x1 , . . . , x5 ) then we
have the following equations (fill in the blanks):
ii. Notice that we already know $U_0^{n+1}$ and $U_5^{n+1}$ since these are dictated by the boundary conditions (assuming Dirichlet boundary conditions).
iii. Now we can leverage linear algebra and write this as a matrix equation.
$$\begin{pmatrix} ??? & ??? & 0 & 0 \\ ??? & ??? & ??? & 0 \\ 0 & ??? & ??? & ??? \\ 0 & 0 & ??? & ??? \end{pmatrix} \begin{pmatrix} U_1^{n+1} \\ U_2^{n+1} \\ U_3^{n+1} \\ U_4^{n+1} \end{pmatrix} = \begin{pmatrix} U_1^{n} \\ U_2^{n} \\ U_3^{n} \\ U_4^{n} \end{pmatrix} + \begin{pmatrix} ???\,U_0^{n+1} \\ 0 \\ 0 \\ ???\,U_5^{n+1} \end{pmatrix}$$
c. At this point the structure of the coefficient matrix on the left and the
vector sum on the right should be clear (even for more spatial points). It
is time for us to start writing some code. We’ll start with the basic setup
of the problem.
import numpy as np
import matplotlib.pyplot as plt
D = 1
x = # set up a linearly spaced spatial domain
t = # set up a linearly spaced temporal domain
dx = x[1]-x[0]
dt = t[1]-t[0]
a = D*dt/dx**2
IC = lambda x: # write a function for the initial condition
BCleft = lambda t: 0*t # left boundary condition
# (we've used 0*t here for a homog. bc)
BCright = lambda t: 0*t # right boundary condition
# (we've used 0*t here for a homog. bc)
d. Next we write a function that takes in the number of spatial points and
returns the coefficient matrix for the linear system. Take note that the
first and last rows take a little more care than the rest.
def coeffMatrix(M,a): # we are using M=len(x) as the first input
A = np.matrix( np.zeros( (M-2,M-2) ) )
# why are we using M-2 X M-2 for the size?
A = coeffMatrix(len(x),a)
print(A)
plt.spy(A)
# spy is a handy plotting tool that shows the structure
# of a matrix (optional)
plt.show()
e. Next we write a loop that iteratively solves the system of equations for
each new time step.
for n in range(len(t)-1):
b1 = U[n,???]
# b1 is a vector of U at step n for the inner spatial nodes
b2 = np.zeros_like(b1) # set up the second right-hand vector
b2[0] = ???*BCleft(t[n+1]) # fill in the correct first entry
b2[-1] = ???*BCright(t[n+1]) # fill in the correct last entry
b = b1 + b2 # The vector "b" is the right side of the equation
#
# finally use a linear algebra solver to fill in the
# inner spatial nodes at step n+1
U[n+1,???] = ???
f. All of the hard work is now done. It remains to plot the solution. Try this
method on several sets of initial and boundary conditions for the 1D heat
equation. Be sure to demonstrate that the method is stable no matter the
values of ∆t and ∆x.
g. What are the primary advantages and disadvantages to the implicit method described in this problem?
at the current time step and the central difference at the new time step. That is:
$$\frac{U_i^{n+1} - U_i^n}{\Delta t} = \frac{1}{2}\left[ D\,\frac{U_{i+1}^{n} - 2U_i^{n} + U_{i-1}^{n}}{\Delta x^2} + D\,\frac{U_{i+1}^{n+1} - 2U_i^{n+1} + U_{i-1}^{n+1}}{\Delta x^2} \right].$$
d. Verify that the right-hand side of the equations that we built in parts (a)
and (b) can be written as
$$\begin{pmatrix}
(1-2r) & r & 0 & 0 & \cdots & 0 \\
r & (1-2r) & r & 0 & \cdots & 0 \\
0 & r & (1-2r) & r & & \vdots \\
\vdots & & \ddots & \ddots & \ddots & 0 \\
0 & \cdots & 0 & r & (1-2r) & r \\
0 & \cdots & & 0 & r & (1-2r)
\end{pmatrix}
\begin{pmatrix} U_1^n \\ U_2^n \\ U_3^n \\ \vdots \\ U_{M-3}^n \\ U_{M-2}^n \end{pmatrix}
+
\begin{pmatrix} rU_0^{n+1} \\ 0 \\ 0 \\ \vdots \\ 0 \\ rU_{M-1}^{n+1} \end{pmatrix}$$
e. Now for the wonderful part! The entire system of equations from part (a)
can be written as
$$AU^{n+1} = BU^n + D.$$
What are the matrices A and B and what are the vectors $U^{n+1}$, $U^n$, and D?
f. To solve for $U^{n+1}$ at each time step we simply need to do a linear solve: formally, $U^{n+1} = A^{-1}\left(BU^n + D\right)$, though in practice we hand the system to a linear solver rather than forming $A^{-1}$.
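In code, one Crank Nicolson time step could look like the sketch below; the matrices A and B and the boundary vector (called Dvec here so it does not clash with the diffusion constant D) are assumed to have been built already.

import numpy as np

for n in range(len(t)-1):
    rhs = B @ U[n, 1:-1] + Dvec              # right-hand side B*U^n + D
    U[n+1, 1:-1] = np.linalg.solve(A, rhs)   # solve A*U^{n+1} = B*U^n + D
    # if the boundary values change in time, rebuild Dvec inside the loop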
Exercise 6.45. To graphically show the Crank Nicolson method we can again use a finite difference stencil to show where the information is coming from and where it is going to. In Figure 6.10 notice that there are three points at the new time step that are involved in calculating the value of $U_i^{n+1}$. Sketch a similar image for the original implicit scheme from Exercise 6.43.
Figure 6.10: The finite difference stencil for the Crank Nicolson method.
$$T_{\text{right}}\sin\theta_{\text{right}} - T_{\text{left}}\sin\theta_{\text{left}} = \rho\,\Delta x\,\frac{\partial^2 u}{\partial t^2}$$
Dividing both sides by the constant tension T (and using the fact that for small angles $\sin\theta \approx \tan\theta$) we get
$$\tan\theta_{\text{right}} - \tan\theta_{\text{left}} = \frac{\rho\,\Delta x}{T}\,\frac{\partial^2 u}{\partial t^2}.$$
Recognizing that the tangent of the angle will just be the slopes at the right and left points we now have
$$u_x(t, x+\Delta x) - u_x(t, x) = \frac{\rho\,\Delta x}{T}\,\frac{\partial^2 u}{\partial t^2}$$
which can be rearranged to
$$\frac{u_x(t, x+\Delta x) - u_x(t, x)}{\Delta x} = \frac{\rho}{T}\,\frac{\partial^2 u}{\partial t^2}.$$
Allowing $\Delta x$ to get arbitrarily small, the difference quotient on the left-hand side becomes the second spatial derivative and we arrive at the 1D wave equation
$$\frac{\partial^2 u}{\partial t^2} = c\,\frac{\partial^2 u}{\partial x^2}$$
where we have defined $c = T/\rho$ as a parameter describing the stiffness of the string. The 2D and 3D derivations are similar but a bit trickier with the trig and geometry.
For the remainder of this section we will focus on approximating solutions to
the wave equation in 1D, 2D, and 3D numerically.
Exercise 6.46. Let's write code to numerically solve the 1D wave equation. As before, we use the notation $U_i^n$ to represent the approximate solution $u(t, x)$ at the point $t = t_n$ and $x = x_i$.
a. Give a reasonable discretization of the second derivative in time:
$$u_{tt}(t_n, x_i) \approx \text{______}.$$
b. Give a reasonable discretization of the second derivative in space:
$$u_{xx}(t_n, x_i) \approx \text{______}.$$
c. Put your answers to parts (a) and (b) together with the wave equation to
get
$$\frac{??? - ??? + ???}{\Delta t^2} = c\,\frac{??? - ??? + ???}{\Delta x^2}.$$
d. Solve the equation from part (c) for $U_i^{n+1}$. The resulting difference equation is the finite difference scheme for the 1D wave equation.
e. You should notice that the finite difference scheme for the wave equation references two different prior times: $U_i^n$ and $U_i^{n-1}$. Based on this observation, what information do we need in order to actually start our numerical solution?
f. Consider the wave equation utt = 2uxx in x ∈ (0, 1) with u(0, x) = 4x(1−x),
ut (0, x) = 0, and u(t, 0) = u(t, 1) = 0. Use the finite difference scheme that
you built in this problem to approximate the solution to this PDE.
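A sketch of one possible implementation for part (f) is below. It assumes the update $U_i^{n+1} = 2U_i^n - U_i^{n-1} + a\left(U_{i+1}^n - 2U_i^n + U_{i-1}^n\right)$ with $a = c\Delta t^2/\Delta x^2$ (compare this against your own answer to part (d)), and it handles the zero initial velocity by simply setting the first two time levels equal, which is a first-order choice.

import numpy as np
import matplotlib.pyplot as plt

c  = 2                                   # from the PDE u_tt = 2*u_xx in part (f)
x  = np.linspace(0, 1, 101)
t  = np.linspace(0, 2, 801)              # arbitrary end time and step counts
dx = x[1]-x[0]
dt = t[1]-t[0]
a  = c*dt**2/dx**2

U = np.zeros((len(t), len(x)))
U[0, :] = 4*x*(1-x)                      # initial displacement
U[1, :] = U[0, :]                        # zero initial velocity (simple first-order start)

for n in range(1, len(t)-1):
    U[n+1, 1:-1] = (2*U[n, 1:-1] - U[n-1, 1:-1]
                    + a*(U[n, 2:] - 2*U[n, 1:-1] + U[n, :-2]))
    U[n+1, 0]  = 0                       # Dirichlet boundary conditions
    U[n+1, -1] = 0

plt.plot(x, U[0, :], label='t = 0')
plt.plot(x, U[-1, :], label='t = '+str(t[-1]))
plt.legend(); plt.xlabel('x'); plt.ylabel('u')
plt.show()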
Figure 6.11 shows the finite difference stencil for the 1D wave equation. Notice
that we need two prior time steps in order to advance to the new time step. This
means that in order to start the finite difference scheme for the wave equation we
need to have information about time t0 and also time t1 . We get this information
by using the two initial conditions u(0, x) and ut (0, x).
Figure 6.11: The finite difference stencil for the 1D wave equation.
Exercise 6.47. The ratio $c\Delta t^2/\Delta x^2$ shows up explicitly in the finite difference scheme for the 1D wave equation. Just like in the heat equation, this parameter controls when the finite difference solution will be stable. Experiment with your finite difference solution and conjecture a value of $a = c\Delta t^2/\Delta x^2$ which divides the regions of stability versus instability. Your answer should be in the form: If $a = c\Delta t^2/\Delta x^2 <$ ______ then the finite difference scheme for the 1D wave equation will be stable. Otherwise it will be unstable.
Exercise 6.48. Show several plots demonstrating what occurs to the finite
difference solution of the wave equation when the parameters are in the unstable
region and right on the edge of the unstable region.
Exercise 6.49. What is the expected error in the finite difference scheme for
the 1D wave equation? What does this mean in plain English?
Exercise 6.50. Use your finite difference code to solve the 1D wave equation
utt = cuxx
with boundary conditions u(t, 0) = u(t, 1) = 0, initial condition u(0, x) =
4x(1 − x), and zero initial velocity. Experiment with different values of c. What does the parameter c do to the wave? Give a physical interpretation of c.
b. Substitute your discretizations into the 2D wave equation, make the simplifying assumption that $\Delta x = \Delta y$, and solve for $U_{i,j}^{n+1}$. This is the finite difference scheme for the 2D wave equation.
c. Write code to implement the finite difference scheme from part (b) on
the domain (x, y) ∈ (0, 1) × (0, 1) with homogeneous Dirichlet boundary
conditions, initial condition u(0, x, y) = sin(2π(x − 0.5)) sin(2π(y − 0.5)),
and zero initial velocity.
d. Draw the finite difference stencil for the 2D wave equation.
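For reference, a sketch of one possible implementation of the 2D scheme from part (b) is below; compare the update line against your own derivation before relying on it.

import numpy as np

c = 1
x = np.linspace(0, 1, 51); y = np.linspace(0, 1, 51)
t = np.linspace(0, 1, 401)
dx = x[1]-x[0]
dt = t[1]-t[0]
a = c*dt**2/dx**2                        # assumes dx = dy

X, Y = np.meshgrid(x, y, indexing='ij')
U = np.zeros((len(t), len(x), len(y)))
U[0] = np.sin(2*np.pi*(X-0.5))*np.sin(2*np.pi*(Y-0.5))
U[1] = U[0]                              # zero initial velocity (first-order start)

for n in range(1, len(t)-1):
    U[n+1, 1:-1, 1:-1] = (2*U[n, 1:-1, 1:-1] - U[n-1, 1:-1, 1:-1]
        + a*(U[n, 2:, 1:-1] + U[n, :-2, 1:-1]
             + U[n, 1:-1, 2:] + U[n, 1:-1, :-2]
             - 4*U[n, 1:-1, 1:-1]))
    # the edges stay at zero: homogeneous Dirichlet boundary conditions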
Exercise 6.53. What is the region of stability for the finite difference scheme
on the 2D wave equation? Produce several plots showing what happens when
we are in the unstable region as well as when we are right on the edge of the
stable region.
Exercise 6.54. Solve the 2D wave equation on the unit square with u starting
at rest and being driven by a wave coming in from one boundary.
$$u_t + v\,u_x = 0.$$
In this equation $u(t, x)$ is the height of a wave at time t and spatial location x. The parameter v is the velocity of the wave. Imagine this as sending a single solitary wave pulsing down a taut rope or as sending a single pulse of light down a fiber optic cable.
Exercise 6.55. Consider the PDE $u_t + vu_x = 0$. There is a very easy way to get an analytic solution to this traveling wave equation. If we have the initial condition $u(0, x) = f(x) = e^{-(x-4)^2}$ then we claim that $u(t, x) = f(x - vt)$ is an analytic solution to the PDE. More explicitly, we are claiming that
$$u(t, x) = e^{-(x - vt - 4)^2}$$
solves the PDE.
Exercise 6.56. Now we would like to visualize the solution to the PDE from
the previous exercise. The Python code below gives an interactive visual of the
solution. Experiment with different values of v and different initial conditions.
import numpy as np
import matplotlib.pyplot as plt
from matplotlib import animation, rc
from IPython.display import HTML
v = 1
f = lambda x: np.exp(-(x-4)**2)
u = lambda t, x: f(x - v*t)
x = np.linspace(0,10,100)
t = np.linspace(0,10,100)
fig, ax = plt.subplots()
plt.close()
ax.grid()
ax.set_xlabel('x')
ax.set_xlim(( 0, 10))
ax.set_ylim(( -0.1, 1))
def animator(N):
ax.set_title('Time='+str(t[N]))
frame.set_data(x,???) # plot the correct time step for u(t,x)
return (frame,)
PlotFrames = range(0,len(t),1)
anim = animation.FuncAnimation(fig,
animator,
frames=PlotFrames,
interval=100,
)
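To make the scaffold above run you still need a line object for the animator to update and a way to display the animation. One possible completion is sketched below; the names frame and anim refer to the scaffold above.

frame, = ax.plot([], [], 'b')            # create this line object before defining animator
# the ??? inside animator can then be filled with the solution at time t[N]:
#     frame.set_data(x, u(t[N], x))
HTML(anim.to_jshtml())                   # render the animation inline in a notebook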
Theorem 6.6. If ut + vux = 0 with initial condition u(0, x) = f (x) then the
function u(t, x) = f (x − vt) is an analytic solution to the PDE.
Exercise 6.57. Use the chain rule to prove the previous theorem.
The traveling wave equation ut + vux = 0 has a very nice analytic solution which
we can always find. Therefore there is no need to ever find a numerical solution –
we can just write down the analytic solution if we are given the initial condition.
As it turns out, though, the numerical solutions exhibit some very interesting behavior.
Exercise 6.58. Consider the traveling wave equation ut + vux = 0 with initial
condition u(0, x) = f (x) for some given function f and boundary condition
u(t, 0) = 0. To build a numerical solution we will again adopt the notation Uin
for the approximation to u(t, x) at the point t = tn and x = xi .
a. Write an approximation of $u_t$ using $U_i^{n+1}$ and $U_i^n$.
b. Write an approximation of $u_x$ using $U_{i+1}^n$ and $U_i^n$.
c. Substitute your answers from parts (a) and (b) into the traveling wave
equation and solve for Uin+1 . This is our first finite difference scheme for
the traveling wave equation.
d. Write Python code to get the finite difference approximation of the solution to the PDE. Plot your finite difference solution on top of the analytic solution for $f(x) = e^{-(x-4)^2}$. What do you notice? Can you stabilize this method by changing the values of $\Delta t$ and $\Delta x$ like we did with the heat and wave equations?
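A sketch of one possible implementation of the scheme from this exercise follows; the grid sizes and end time are arbitrary choices, and the point is simply to observe how the numerical solution behaves.

import numpy as np
import matplotlib.pyplot as plt

v = 1
f = lambda x: np.exp(-(x-4)**2)
x = np.linspace(0, 10, 201)
t = np.linspace(0, 3, 601)
dx = x[1]-x[0]
dt = t[1]-t[0]
r = v*dt/dx

U = np.zeros((len(t), len(x)))
U[0, :] = f(x)
for n in range(len(t)-1):
    # update from parts (a)-(c): U_i^{n+1} = U_i^n - r*(U_{i+1}^n - U_i^n)
    U[n+1, :-1] = U[n, :-1] - r*(U[n, 1:] - U[n, :-1])
    U[n+1, 0]  = 0                       # boundary condition u(t,0) = 0
    U[n+1, -1] = U[n, -1]                # crude treatment of the right edge

plt.plot(x, U[-1, :], label='finite difference')
plt.plot(x, f(x - v*t[-1]), '--', label='analytic')
plt.legend(); plt.show()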
The finite difference scheme that you built in the previous exercise is called the
downwind scheme for the traveling wave equation. Figure 6.12 shows the finite
difference stencil for this scheme. We call this scheme “downwind” since we
expect the wave to travel from left to right and we can think of a fictitious wind
blowing the solution from left to right. Notice that we are using information
from “downwind” of the point at the new time step.
Figure 6.12: The finite difference stencil for the 1D downwind scheme on the
traveling wave equation.
Exercise 6.59. You should have noticed in the previous exercise that you cannot reasonably stabilize the finite difference scheme. Propose several reasons why this method appears to be unstable no matter what you use for the ratio $v\Delta t/\Delta x$.
Exercise 6.60. One of the troubles with the finite difference scheme that we have built for the traveling wave equation is that we are using the information at our present spatial location and the next spatial location to the right to propagate the solution forward in time. The trouble here is that the wave is moving from left to right, so the interesting information about the next time step's solution is actually coming from the left, not the right. We call this "looking upwind" since you can think of a fictitious wind blowing from left to right, and we need to look "upwind" to see what is coming at us. If we write the spatial derivative as
$$u_x \approx \frac{U_i^n - U_{i-1}^n}{\Delta x}$$
we still have a first-order approximation of the derivative but we are now looking
left instead of right for our spatial derivative. Make this modification in your finite
difference code for the traveling wave equation (call it the “upwind method”).
Approximate the solution to the same PDE as we worked with in the previous
exercises. What do you notice now?
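The only change from the downwind sketch above is the direction of the spatial difference; with the same setup (arrays U, grid, and ratio r) the time loop for the upwind method becomes:

for n in range(len(t)-1):
    # upwind update: U_i^{n+1} = U_i^n - r*(U_i^n - U_{i-1}^n)
    U[n+1, 1:] = U[n, 1:] - r*(U[n, 1:] - U[n, :-1])
    U[n+1, 0] = 0                        # boundary condition u(t,0) = 0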
Figure 6.13 shows the finite difference stencil for the upwind scheme. We call this scheme "upwind" since we expect the wave to travel from left to right and we can think of a fictitious wind blowing the solution from left to right. Notice that we are using information from "upwind" of the point at the new time step.
Figure 6.13: The finite difference stencil for the 1D upwind scheme on the traveling wave equation.
b. In the upwind finite difference scheme for the traveling wave equation, the
approximate solution moves at the correct speed, but . . .
Exercise 6.62. Neither the downwind nor the upwind solutions for the traveling
wave equation are satisfactory. They completely miss the interesting dynamics of
the analytic solution to the PDE. Some ideas for stabilizing the finite difference
solution for the traveling wave equation are as follows. Implement each of these
ideas and discuss pros and cons of each. Also draw a finite difference stencil for
each of these methods.
a. Perhaps one of the issues is that we are using first-order methods to
approximate ut and ux . What if we used a second-order approximation
b. What if we replaced $U_i^n$ in the time derivative with the average of its two spatial neighbors? That is, approximate
$$u_t \approx \frac{U_i^{n+1} - \frac{1}{2}\left(U_{i+1}^n + U_{i-1}^n\right)}{\Delta t}.$$
Solve this modified finite difference equation for $U_i^{n+1}$ and implement this method. This is called the Lax-Friedrichs method.
c. Finally we’ll do something very clever (and very counter intuitive). What
if we inserted some artificial diffusion into the problem? You know from
your work with the heat equation that diffusion spreads a solution out.
The downwind scheme seemed to have the issue that it was bunching up
at the beginning and end of the wave, so artificial diffusion might smooth
this out. The Lax-Wendroff method does exactly that: take a regular
Euler-type step in time
$$u_t \approx \frac{U_i^{n+1} - U_i^n}{\Delta t},$$
use a second-order centered difference scheme in space to approximate $u_x$
$$u_x \approx \frac{U_{i+1}^n - U_{i-1}^n}{2\Delta x},$$
but add on the term
$$\frac{v^2\Delta t^2}{2\Delta x^2}\left(U_{i-1}^n - 2U_i^n + U_{i+1}^n\right)$$
to the right-hand side of the equation. Notice that this new term is a scalar multiple of the second-order approximation of the second derivative $u_{xx}$. Solve this equation for $U_i^{n+1}$ and implement the Lax-Wendroff method.
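For concreteness, one possible implementation of the Lax-Wendroff method is sketched below; the update line is the standard Lax-Wendroff form, so check that it agrees with what you derive from the description above.

import numpy as np
import matplotlib.pyplot as plt

v = 1
f = lambda x: np.exp(-(x-4)**2)
x = np.linspace(0, 10, 201)
t = np.linspace(0, 3, 601)
dx = x[1]-x[0]
dt = t[1]-t[0]
r = v*dt/dx

U = np.zeros((len(t), len(x)))
U[0, :] = f(x)
for n in range(len(t)-1):
    # Lax-Wendroff update for the interior nodes
    U[n+1, 1:-1] = (U[n, 1:-1]
                    - 0.5*r*(U[n, 2:] - U[n, :-2])
                    + 0.5*r**2*(U[n, 2:] - 2*U[n, 1:-1] + U[n, :-2]))
    U[n+1, 0]  = 0                       # left boundary condition
    U[n+1, -1] = U[n, -1]                # crude treatment of the right edge

plt.plot(x, U[-1, :], label='Lax-Wendroff')
plt.plot(x, f(x - v*t[-1]), '--', label='analytic')
plt.legend(); plt.show()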
b. Recall that the solution to a differential equation reaches a steady state (or
equilibrium) when the time rate of change is zero. Based on the physical
system, what is the steady state heat profile for this PDE?
c. Use your 1D heat equation code to show the full time evolution of this
PDE. Run the simulation long enough so that you see the steady state
heat profile.
Exercise 6.64. Now consider the forced 1D heat equation $u_t = u_{xx} + e^{-(x-0.5)^2}$ with the same boundary and initial conditions as the previous exercise. The exponential forcing function introduced in this equation is an external source of heat (like a flame held to the middle of the metal rod).
a. Conjecture what the steady state heat profile will look like for this particular
setup. Be able to defend your answer.
b. Modify your 1D heat equation code to show the full time evolution of this
PDE. Run the simulation long enough so that you see the steady state
heat profile.
Exercise 6.65. Next we’ll examine 2D steady state heat profiles. Consider the
PDE ut = uxx +uyy with boundary conditions u(t, 0, y) = u(t, x, 0) = u(t, x, 1) =
0 and u(t, 1, y) = 1 with initial condition u(0, x, y) = 0.
a. Describe the physical setup for this problem.
b. Based on the physical system, describe the steady state heat profile for this
PDE. Be sure that your steady state solution still satisfies the boundary
conditions.
c. Use your 2D heat equation code to show the full time evolution of this
PDE. Run the simulation long enough so that you see the steady state
heat profile.
Exercise 6.66. Now consider the forced 2D heat equation $u_t = u_{xx} + u_{yy} + 10e^{-(x-0.5)^2 - (y-0.5)^2}$ with the same boundary and initial conditions as the previous exercise. The exponential forcing function introduced in this equation is an external source of heat (like a flame held to the middle of the metal sheet).
a. Conjecture what the steady state heat profile will look like for this particular
setup. Be able to defend your answer.
b. Modify your 2D heat equation code to show the full time evolution of this
PDE. Run the simulation long enough so that you see the steady state
heat profile.
Up to this point we have studied PDEs that all depend on time. In many
applications, however, we are not interested in the transient (time dependent)
behavior of a system. Instead we are often interested in the steady state solution
when the forces in question are in static equilibrium. Two very famous time-independent PDEs are the Laplace equation, $\nabla^2 u = 0$, and the Poisson equation, $\nabla^2 u = f$.
Notice that both the Laplace and Poisson equations are the equations that we
get when we consider the limit ut → 0. In the limit when the time rate of
change goes to zero we are actually just looking at the eventual steady state heat
profile resulting from the initial and boundary conditions of the heat equation.
In the previous exercises you already wrote code that will show the steady state
profiles in a few setups. The trouble with the approach of letting the time-
dependent simulation run for a long time is that the finite difference solution
for the heat equation is known to have stability issues. Moreover, it may take
a lot of computational time for the solution to reach the eventual steady state.
In the remainder of this section we look at methods of solving for the steady
state directly – without examining any of the transient behavior. We will first
examine a 1D version of the Laplace and Poisson equations.
Exercise 6.67. Consider a 1-dimensional rod that is infinitely thin and has
unit length. For the sake of simplicity assume the following:
• the specific heat of the rod is exactly 1 for the entire length of the rod,
• the temperature of the left end is held fixed at u(0) = 0,
• the temperature of the right end is held fixed at u(1) = 1, and
• the temperature has reached a steady state.
You can assume that the temperatures are reference temperatures instead of ab-
solute temperatures, so a temperature of “0” might represent room temperature.
Since there are no external sources of heat, to model the steady-state heat profile we must have $u_t = 0$ in the heat equation. Thus the heat equation collapses to $u_{xx} = 0$. This is exactly the one-dimensional Laplace equation.
a. To get an exact solution of the Laplace equation in this situation we simply
need to integrate twice. Do the integration and write the analytic solution
(there should be no surprises here).
Notice that there are really only four unknowns since the boundary con-
ditions dictate two of the temperature values. Rearrange this system of
equations into a matrix equation and solve for the unknowns U1 , U2 , U3 ,
and U4 . Your coefficient matrix should be 4 × 4.
c. Compare your answers from parts (a) and (b).
d. Write code to build the numerical solution with an arbitrary value for ∆x
(i.e. an arbitrary number of sub intervals). You should build the linear
system automatically in your code.
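A sketch of one way to do part (d) is below; the number of sub-intervals is an arbitrary choice, and a comment notes how the same code extends to the Poisson equation in the next exercises.

import numpy as np
import matplotlib.pyplot as plt

N = 20                                   # number of sub-intervals (arbitrary choice)
x = np.linspace(0, 1, N+1)
dx = x[1]-x[0]
uleft, uright = 0, 1                     # boundary temperatures from this exercise

# tridiagonal system for the interior unknowns U_1, ..., U_{N-1}
A = (np.diag(-2*np.ones(N-1))
     + np.diag(np.ones(N-2), 1)
     + np.diag(np.ones(N-2), -1))
b = np.zeros(N-1)                        # Laplace equation: zero right-hand side
# for the Poisson equation u_xx = f(x) you would use b = dx**2 * f(x[1:-1]) instead
b[0]  -= uleft                           # move the known boundary values to the right side
b[-1] -= uright

U = np.zeros(N+1)
U[0], U[-1] = uleft, uright
U[1:-1] = np.linalg.solve(A, b)

plt.plot(x, U, 'o-')
plt.xlabel('x'); plt.ylabel('steady state temperature')
plt.show()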
Exercise 6.68. Solving the 1D Laplace equation with Dirichlet boundary conditions is rather uninteresting since the answer will always be a linear function connecting the two boundary conditions. Prove this.
The Poisson equation, uxx = f (x), is more interesting than the Laplace equation
in 1D. The function f (x) is called a forcing function. You can think of it this
way: if u is the amount of force on a linear bridge, then f might be a function
that gives the distribution of the forces on the bridge due to the cars sitting on
the bridge. In terms of heat we can think of this as an external source of heat
energy warming up the one-dimensional rod somewhere in the middle (like a
flame being held to one place on the rod).
Exercise 6.69. How would we analytically solve the Poisson equation uxx = f (x)
in one spatial dimension? As a sample problem consider x ∈ [0, 1], the forcing
function f (x) = 5 sin(2πx) and boundary conditions u(0) = 2 and u(1) = 0.5.
Of course you need to check your answer by taking two derivatives and making
sure that the second derivative exactly matches f (x). Also be sure that your
solution matches the boundary conditions exactly.
Exercise 6.70. Now we can solve the Poisson equation from the previous problem numerically. Let's again build this with a partition that contains only 6 points just like we did with the Laplace equation a few exercises ago. We know the approximation for $u_{xx}$ so we have the linear system
Exercise 6.71. The previous exercises only account for Dirichlet boundary
conditions (fixed boundary conditions). We would now like to modify our Poisson
solution to allow for a Neumann condition: where we know the derivative of u
at one of the boundaries. The statement of the problem is as follows:
and we simply need to add this equation to the system that we were solving in
the previous exercise. If we go back to our example of a partition with 6 points
the system becomes
$$\begin{aligned}
\frac{U_1 - U_0}{\Delta x} &= \alpha &&\text{(left boundary condition)}\\
\frac{U_2 - 2U_1 + U_0}{\Delta x^2} &= f(x_1) \\
\frac{U_3 - 2U_2 + U_1}{\Delta x^2} &= f(x_2) \\
\frac{U_4 - 2U_3 + U_2}{\Delta x^2} &= f(x_3) \\
\frac{U_5 - 2U_4 + U_3}{\Delta x^2} &= f(x_4) \\
U_5 &= \beta &&\text{(right boundary condition).}
\end{aligned}$$
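A sketch of how this 6-point system could be assembled and solved with numpy is below; the forcing function is the sample one from Exercise 6.69, while the values of α and β are hypothetical placeholders.

import numpy as np

f = lambda x: 5*np.sin(2*np.pi*x)        # sample forcing function from Exercise 6.69
alpha, beta = 1.0, 0.5                   # hypothetical boundary data

x = np.linspace(0, 1, 6)
dx = x[1]-x[0]

A = np.zeros((6, 6))
b = np.zeros(6)
A[0, 0], A[0, 1] = -1/dx, 1/dx           # (U1 - U0)/dx = alpha
b[0] = alpha
for i in range(1, 5):                    # interior: (U_{i+1} - 2U_i + U_{i-1})/dx^2 = f(x_i)
    A[i, i-1], A[i, i], A[i, i+1] = 1/dx**2, -2/dx**2, 1/dx**2
    b[i] = f(x[i])
A[5, 5] = 1                              # U5 = beta
b[5] = beta

U = np.linalg.solve(A, b)
print(U)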
b. In Figure 6.14 we see that there are 16 total equations resulting from the
discretization of the Poisson equation. Your first task is to write all 16 of
these equations. We’ll get you started:
$$A = \begin{pmatrix}
-4 & 1 & 0 & 0 & 1 & 0 & 0 & 0 & \cdots & 0 \\
1 & -4 & 1 & 0 & 0 & 1 & 0 & 0 & \cdots & 0 \\
0 & 1 & -4 & 1 & 0 & 0 & 1 & 0 & \cdots & 0 \\
0 & 0 & 1 & -4 & 0 & 0 & 0 & 1 & \ddots & \vdots \\
1 & 0 & 0 & 0 & -4 & 1 & 0 & 0 & \ddots & \\
0 & 1 & 0 & 0 & 1 & -4 & 1 & 0 & \ddots & \\
\vdots & & \ddots & & & \ddots & \ddots & \ddots & & 1 \\
0 & \cdots & & & & & & 0 & 1 & -4
\end{pmatrix}$$
d. In the coefficient matrix from part (c) notice that the small matrix
$$\begin{pmatrix} -4 & 1 & 0 & 0 \\ 1 & -4 & 1 & 0 \\ 0 & 1 & -4 & 1 \\ 0 & 0 & 1 & -4 \end{pmatrix}$$
shows up in blocks along the main diagonal. If you have a hard copy of
the matrix go back and draw a box around these blocks in the coefficient
matrix. Also notice that there are diagonal bands of 1s. Discuss the
following:
i. Why are the blocks 4 × 4?
ii. How could you have predicted the location of the diagonal bands of 1s?
iii. What would the structure of the matrix look like if we partitioned the
domain into a 10 × 10 grid of points instead of a 6 × 6 grid (including
the boundary points)?
iv. Why is it helpful to notice this structure?
e. The right-hand side of the matrix equation resulting from your system of equations from part (b) is
$$b = \Delta x^2\begin{pmatrix} f(x_1, y_1) \\ f(x_1, y_2) \\ f(x_1, y_3) \\ f(x_1, y_4) \\ f(x_2, y_1) \\ f(x_2, y_2) \\ \vdots \\ f(x_4, y_4) \end{pmatrix}.$$
Notice the structure of this vector. Why is it structured this way? Why is
it useful to notice this?
f. Write Python code to solve the problem at hand. Recall that $f(x, y) = -20\exp\left(-\frac{(x-0.5)^2 + (y-0.5)^2}{0.05}\right)$. Show a contour plot of your solution. This will take a little work changing the indices back from k to i and j. Think carefully about how you want to code this before you put fingers to keyboard. You might want to use the np.block() command to build the coefficient matrix efficiently or you can use loops with carefully chosen indices. (One possible approach is sketched just after part (h) below.)
g. (Challenge) Generalize your code to solve the Poisson equation with a
much smaller value of ∆x = ∆y.
h. One more significant observation should be made about the 2D Poisson
equation on this square domain. Notice that the corner points of the
domain (e.g. i = 0, j = 0 or i = 5, j = 0) are never included in the system
of equations. What does this mean about trying to enforce boundary
conditions that only apply at the corners?
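As promised in part (f), one possible approach is sketched below. It assembles the block matrix with Kronecker products (np.kron) instead of np.block, and it assumes the homogeneous Dirichlet boundary conditions of this exercise; the −4 diagonal and the bands of 1s discussed in part (d) appear automatically.

import numpy as np
import matplotlib.pyplot as plt

m = 4                                    # interior points per direction (6x6 grid with boundary)
x = np.linspace(0, 1, m+2)
y = np.linspace(0, 1, m+2)
dx = x[1]-x[0]
f = lambda x, y: -20*np.exp(-((x-0.5)**2 + (y-0.5)**2)/0.05)

# 1D second-difference matrix and the 2D block matrix built from it
D2 = np.diag(-2*np.ones(m)) + np.diag(np.ones(m-1), 1) + np.diag(np.ones(m-1), -1)
A = np.kron(np.eye(m), D2) + np.kron(D2, np.eye(m))

# right-hand side: dx^2 * f at the interior points, ordered as in the vector b above
X, Y = np.meshgrid(x[1:-1], y[1:-1], indexing='ij')
b = dx**2 * f(X, Y).flatten()

Uinner = np.linalg.solve(A, b)

# put the solution back on the full grid (the boundary values are zero)
U = np.zeros((m+2, m+2))
U[1:-1, 1:-1] = Uinner.reshape(m, m)

plt.contourf(x, y, U.T, 20)
plt.colorbar(); plt.xlabel('x'); plt.ylabel('y')
plt.show()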
Figure 6.14: A finite difference grid for the Poisson equation with 6 grid points
in each direction.
6.9 Exercises
6.9.1 Algorithm Summaries
Exercise 6.73. Explain in clear language what it means to check an analytic
solution to a differential equation.
Exercise 6.76. Show the full mathematical details for building a first-order in time and second-order in space approximation method for the one-dimensional heat equation. Explain what the order of the error means in this context.
Exercise 6.77. Show the full mathematical details for building a second-order in time and second-order in space approximation method for the one-dimensional wave equation. Explain what the order of the error means in this context.
Exercise 6.78. Show the full mathematical details for building a first-order in time and second-order in space approximation method for the two-dimensional heat equation. Explain what the order of the error means in this context.
Exercise 6.79. Show the full mathematical details for building a second-order in time and second-order in space approximation method for the two-dimensional wave equation. Explain what the order of the error means in this context.
Exercise 6.80. Explain in clear language what it means for a finite difference
method to be stable vs unstable.
Exercise 6.81. Show the full mathematical details for solving the 1D heat
equation using the implicit and Crank-Nicolson methods.
Exercise 6.82. Show the full mathematical details for building a downwind
finite difference scheme for the traveling wave equation. Discuss the primary
disadvantages of the downwind scheme.
Exercise 6.83. Show the full mathematical details for building an upwind
finite difference scheme for the traveling wave equation. Discuss the primary
disadvantages of the upwind scheme.
Exercise 6.84. Show the full mathematical details for numerically solving the
1D Laplace and Poisson equations.
Exercise 6.86. In a square domain create a function u(0, x, y) that looks like
your college logo. The simplest way to do this might be to take a photo of the
logo, crop it to a square, and use the scipy.ndimage.imread command to read
in the image. Use this function as the initial condition for the heat equation on
a square domain with homogeneous Dirichlet boundary conditions. Numerically
solve the heat equation and show an animation for what happens to the logo as
time evolves.
Exercise 6.87. Repeat the previous exercise but this time solve the wave
equation with the logo as the initial condition.
Exercise 6.88. The explicit finite difference scheme that we built for the 1D
heat equation in this chapter has error on the order of O(∆t) + O(∆x2 ). Explain
clearly what this means. Then devise a numerical experiment to empirically test
this fact. Clearly explain your thought process and show sufficient plots and
mathematics to support your work.
Exercise 6.89. Suppose that we have a concrete slab that is 10 meters in length,
with the left boundary held at a temperature of 75◦ and the right boundary held
at a temperature of 90◦ . Assume that the thermal diffusivity of concrete is about
k = 10−5 m2 /s. Assume that the initial temperature of the slab is given by the
function T (x) = 75 + 1.5x − 20 sin(πx/10). In this case, the temperature can be
analytically solved by the function T (t, x) = 75 + 1.5x − 20 sin(πx/10)e−ct for
some value of c.
a. Working by hand (no computers!) test the proposed analytic solution by
substituting it into the 1D heat equation and verifying that it is indeed a
solution. In doing so you will be able to find the correct value of c.
b. Write numerical code to solve this 1D heat equation. The output of your
code should be an animation showing how the error between the numerical
solution and the analytic solution evolve in time.
Exercise 6.90. (This problem is modified from [11]. The data given below is
real experimental data provided courtesy of the authors.)
Harry and Sally set up an experiment to gather data specifically for the heat
diffusion through a long thin metal rod. Their experimental setup was as follows.
• The ends of the rod are submerged in water baths at different temperatures
and the heat from the hot water bath (on the right hand side) travels
through the metal to the cooler end (on the left hand side).
• The temperature of the rod is measured at four locations; those measure-
ments are sent to a Raspberry Pi, which processes the raw data and sends
the collated data to be displayed on the computer screen.
• They used a metal rod of length L = 300mm and square cross-sectional
width 3.2mm.
a. At time t = 960 seconds the temperatures of the rod are essentially at a steady state. Use this data to make a prediction of the temperature of the hot water bath located at x = 300mm.
Exercise 6.91. You may recall from your differential equations class that population growth under limited resources is governed by the logistic equation $x' = k_1 x(1 - x/k_2)$ where $x = x(t)$ is the population, $k_1$ is the intrinsic growth rate of the population, and $k_2$ is the carrying capacity of the population. The
carrying capacity is the maximum population that can be supported by the
environment. The trouble with this model is that the species is presumed to
be fixed to a spatial location. Let’s make a modification to this model that
allows the species to spread out over time while they reproduce. We have seen
throughout this chapter that the heat equation ut = D(uxx + uyy ) models the dif-
fusion of a substance (like heat or concentration). We therefore propose the model
$$\frac{\partial u}{\partial t} = k_1 u\left(1 - \frac{u}{k_2}\right) + D\left(\frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2}\right)$$
where u(t, x, y) is the population density of the species at time t and spatial
point (x, y), (x, y) is a point in some square spatial domain, k1 is the growth
rate of the population, k2 is the carrying capacity of the population, and D
is the rate of diffusion. Develop a finite difference scheme to solve this PDE.
Experiment with this model showing the interplay between the parameters D,
k1 , and k2 . Take an initial condition of
$$u(0, x, y) = e^{-\left((x-0.5)^2 + (y-0.5)^2\right)/0.05}.$$
Exercise 6.92. In Exercise 6.72 you solved the Poisson equation, $u_{xx} + u_{yy} = f(x, y)$, on the unit square with homogeneous Dirichlet boundary conditions and a forcing function $f(x, y) = -20\exp\left(-\frac{(x-0.5)^2 + (y-0.5)^2}{0.05}\right)$. Use a 10 × 10 grid of points to solve the Poisson equation on the same domain with the same forcing function but with boundary conditions
6.10 Projects
In this section we propose several ideas for projects related to numerical partial
differential equations. These projects are meant to be open ended, to encourage
creative mathematics, to push your coding skills, and to require you to write
and communicate your mathematics. Take the time to read Appendix B before
you write your final solution.
Since the population is mobile let’s make a few assumptions about the environ-
ment that they’re in and how the individuals move.
• Food is abundant in the entire environment.
• Individuals in the population like to spread out so that they don’t interfere
with each others’ hunt for food.
• It is equally easy for the individuals to travel in any direction in the
environment.
Clearly some of these assumptions are unreasonable for real populations and real
environments, but let’s go with it for now. Given the nature of these assumptions
we assume that a diffusion term models the spread of the individuals in the
population. Hence, the PDE model is
$$\frac{\partial u}{\partial t} = ru\left(1 - \frac{u}{K}\right) - hu + D\left(u_{xx} + u_{yy}\right).$$
1. Use any of your ODE codes to solve the ordinary differential equation with
harvesting. Give a complete description of the parameter space.
2. Write code to solve the spatial+temporal PDE equation on the 2D domain
(x, y) ∈ [0, 1] × [0, 1]. Choose an appropriate initial condition and choose
appropriate boundary conditions.
3. The third assumption isn't necessarily true for rough terrain. The true form of the spatial component of the differential equation is $\nabla \cdot (D(x, y)\nabla u)$ where D(x, y) is a multivariable function dictating the ease of diffusion in different spatial locations. Propose a (non-negative) function D(x, y) and repeat part (b) with this new diffusion term.
where cp is the specific heat of the adobe, ρ is the mass density of the adobe,
and k is the thermal conductivity of the adobe, can be used to model the heat
transfer through the adobe from the outside of the house to the inside. Clearly,
the thicker the adobe walls the better, but there is a trade off to be considered:
• it would be prohibitively expensive to build walls so thick that the inside temperature was (nearly) constant, and
• if the walls are too thin then the cost is low but the temperature inside
has a large amount of variability.
Your Tasks:
1. Pick a desert location in the southwestern US (New Mexico, Arizona,
Nevada, or Southern California) and find some basic temperature data to
model the outside temperature during typical summer and winter months.
2. Do some research on the cost of building adobe walls and find approxima-
tions for the parameters in the heat equation.
3. Use a numerical model to find the optimal thickness of an adobe wall. Be
sure to fully describe your criteria for optimality, the initial and boundary
conditions used, and any other simplifying assumptions needed for your
model.
Appendix A
Introduction to Python
In this optional chapter we will walk through some of the basics of using Python 3, the powerful general-purpose programming language that we'll use throughout this class. I'm assuming throughout this chapter that you're familiar with other programming languages such as R, Java, C, or MATLAB. Hence, I'm assuming that you know the basics about what a programming language "is" and "does." There are a lot of similarities between several of these languages, and in fact they borrow heavily from each other in syntax, ideas, and implementation.
While you work through this chapter it is expected that you do every one of the examples and exercises on your own. The material in this chapter is also supported by a collection of YouTube videos which you can find here: https://www.youtube.com/playlist?list=PLftKiHShKwSO4Lr8BwrlKU_fUeRniS821.
for our computing needs. You’ll learn more as the course progresses so use this
chapter as a reference just to get going with the language.
There is an overwhelming abundance of information available about Python and
the suite of tools that we will frequently use.
• Python https://www.python.org/,
• numpy (numerical Python) https://www.numpy.org/,
• matplotlib (a suite of plotting tools) https://matplotlib.org/,
• scipy (scientific Python) https://www.scipy.org/, and
• sympy (symbolic Python) https://www.sympy.org/en/index.html.
These tools together provide all of the computational power that we will need. And they're free!
In a Jupyter Notebook you will write your code in a code block, and when you’re
ready to run it you can press Shift+Enter (or Control+Enter) and you’ll see
your output. Shift+Enter will evaluate your code and advance to the next block
of code. Control+Enter will evaluate without advancing the cursor to the next
block.
Exercise A.3. You should now spend a bit of time poking around in Jupyter
Notebooks. Figure out how to
• save a file,
• load a new iPython Notebook (Jupyter Notebook) file from your computer
or your Google Drive,
• write text, including LaTeX formatted mathematics, in a Jupyter Notebook,
• share and download a Google Colab document, and
• use the keyboard to switch between writing text and writing code.
A.4.1 Variables
Variables in Python can contain letters (lower case or capital), numbers 0-9, and
some special characters such as the underscore. Variable names should start
with a letter. Of course there are a bunch of reserved words (just like in any
other language). You should look up what the reserved words are in Python so
you don’t accidentally use them.
You can do the typical things with variables. Assignment is with an equal sign
(be careful R users, we will not be using the left-pointing arrow here!).
Warning: When defining numerical variables you don’t always get floating point
numbers. In some programming languages, if you write x=1 then automatically
x is saved as 1.0; a floating point decimal number, not an integer. However, in
Python if you assign x=1 it is defined as an integer (with no decimal digits) but
if you assign x=1.0 it is assigned as a floating point number.
# assign some variables
x = 7 # integer assignment of the integer 7
y = 7.0 # floating point assignment of the decimal number 7.0
print("The variable x is",x," and has type", type(x),". \n")
print("The variable y is",y," and has type", type(y),". \n")
Exercise A.4. What happens if you type 7^2 into Python? What does it give you? Can you figure out what it is doing?
Exercise A.5. Write code to define positive integers a, b, and c of your own
choosing. Then calculate a2 , b2 , and c2 . When you have all three values
computed, check to see if your three values form a Pythagorean Triple so that
a2 + b2 = c2 . Have Python simply say True or False to verify that you do, or
do not, have a Pythagorean Triple defined. Hint: You will need to use the ==
Boolean check just like in other programming languages.
Example A.1. (Lists and Indexing) Let’s look at a few examples of indexing
from lists. In this example we will use the list of numbers 0 through 8. This list
contains 9 numbers indexed from 0 to 8.
• Create the list of numbers 0 through 8 and then print only the element
with index 0.
MyList = [0,1,2,3,4,5,6,7,8]
print(MyList[0])
• Print all elements up to, but not including, the third element of MyList.
MyList = [0,1,2,3,4,5,6,7,8]
print(MyList[:2])
• Print the elements indexed 1 through 4. Beware! This is not the first
through fifth element.
MyList = [0,1,2,3,4,5,6,7,8]
print(MyList[1:5])
• Print every other element in the list starting with the first.
MyList = [0,1,2,3,4,5,6,7,8]
print(MyList[0::2])
Example A.2. (Range and Lists) Let’s look at another example of indexing
in lists. In this one we’ll use the range command to build the initial list of
numbers. Read the code carefully so you know what each line does, and then
execute the code on your own to verify your thinking.
# range is a handy command for creating a sequence of integers
MySecondList = range(4,20)
print(MySecondList) # this is a "range object" in Python.
# When using range() we won't actually store all of the
# values in memory.
print(list(MySecondList))
# notice that we didn't create the last element!
In Python, elements in a list do not need to be the same type. You can mix
integers, floats, strings, lists, etc.
Example A.3. In this example we see a list of several items that have different
data types: float, integer, string, and complex. Note that the imaginary number
i is represented by 1j in Python. This is common in many scientific disciplines
and is just another thing that we’ll need to get used to in Python.
(For example, j is commonly used as the symbol for the imaginary unit $\sqrt{-1}$ in electrical engineering since i is the symbol commonly used for electric current, and using i for both would be problematic.)
MixedList = [1.0, 7, 'Bob', 1-1j]
print(MixedList)
print(type(MixedList[0]))
print(type(MixedList[1]))
print(type(MixedList[2]))
print(type(MixedList[3]))
# Notice that we use 1j for the imaginary number "i".
Exercise A.6. In this exercise you will put your new list skills into practice.
a. Create the list of the first several Fibonacci numbers:
1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89.
b. Print the first four elements of the list.
c. Print every third element of the list starting from the first.
d. Print the last element of the list.
e. Print the list in reverse order.
f. Print the list starting at the last element and counting backward by every
other element.
Example A.5. The .remove command can be used to remove an element from
a list.
MyList = [0,1,2,3]
MyList.append('a') # append the string 'a' to the end of the list
MyList.append('a') # do it again ... just for fun
MyList.append(15) # append the number 15 to the end of the list
MyList.remove('a') # remove the first instance of `a` from the list
print(MyList)
MyList.remove(3) # now let's remove the 3
print(MyList)
print(MyList)
MyList.insert(0,'A') # insert the letter `A` at the 0-indexed spot
# insert the letter `B` at the spot with index 3
MyList.insert(3,'B')
# remember that index 3 means the fourth spot in the list
print(MyList)
Exercise A.7. In this exercise you will go a bit further with your list operation
skills.
a. Create the list of the first several Lucas Numbers: 1, 3, 4, 7, 11, 18, 29, 47.
b. Add the next three Lucas Numbers to the end of the list.
c. Remove the number 3 from the list.
d. Insert the 3 back into the list in the correct spot.
e. Print the list in reverse order.
f. Do a few other list operations to this list and report your findings.
A.4.4 Tuples
In Python, a "tuple" is like an ordered pair (or ordered triple, or ordered quadruple, ...) in mathematics. We will occasionally see tuples in our work in numerical
analysis so for now let’s just give a couple of code snippets showing how to store
and read them.
We can define the tuple of numbers (10, 20) in Python as follows.
point = 10, 20 # notice that I don't need the parenthesis
print(point, type(point))
We can also define a tuple with parentheses if we like. Python doesn't care.
point = (10, 20) # now we define the tuple with parentheses
print(point, type(point))
We can do loops and conditional statements in very easy ways in Python. The thing to keep in mind is that the Python language is very white-space-dependent. This means that your indentation needs to be correct in order for a loop to work. You could get away with sloppy indentation in other languages but not so in Python. Also, in some languages (like R and Java) you need to wrap your loops in curly braces. Again, not so in Python.
Caution: Be really careful of the white space in your code when you write
loops.
Every loop in Python has the same basic anatomy:
• a control statement,
• a colon and a new line,
• an indent of four spaces,
• some programming statements.
When you are done with the loop just back out of the indentation. There is no need for an end command or a curly brace. All of the control statements in Python are white-space-dependent.
In Python you can use a more compact notation for for loops sometimes. This
takes a bit of getting used to, but is super slick!
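For example, a list comprehension builds a new list in a single line of code:

# a list comprehension: the squares of the integers 0 through 9 in one line
squares = [n**2 for n in range(10)]
print(squares)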
for loops can also be used to build recursive sequences as can be seen in the
next couple of examples.
Example A.10. In the following code we write a for loop that outputs a list of
the first 7 iterations of the sequence xn+1 = −0.5xn + 1 starting with x0 = 3.
Notice that we’re using the command x.append instead of x[n + 1] to append the
new term to the list. This allows us to grow the length of the list dynamically
as the loop progresses.
x=[3.0]
for n in range(0,7):
x.append(-0.5*x[n] + 1)
print(x) # print the whole list x at each step of the loop
Exercise A.8. We want to sum the first 100 perfect cubes. Let’s do this in two
ways.
a. Start off a variable called Total at 0 and write a for loop that adds the
next perfect cube to the running total.
b. Write a for loop that builds the sequence of the first 100 perfect cubes.
After the list has been built find the sum with the sum command.
The answer is: 25,502,500 so check your work.
Exercise A.9. Write a for loop that builds the first 20 terms of the sequence
xn+1 = 1 − x2n with x0 = 0.1. Pre-allocate enough memory in your list and
then fill it with the terms of the sequence. Only print the list after all of the
computations have been completed.
Example A.12. Print the numbers 0 through 4 and then the word “done.”
We’ll do this by starting a counter variable, i, at 0 and increment it every time
we pass through the loop.
i = 0
while i < 5:
print(i)
i += 1 # increment the counter
print("done")
Example A.13. Now let’s use a while loop to build the sequence of Fibonacci
numbers and stop when the newest number in the sequence is greater than 1000.
Notice that we want to keep looping until the last term is greater than 1000 – this is the perfect task for a while loop, instead of a for loop, since we don't know before we start how many steps the task will take.
Fib = [1,1]
while Fib[-1] <= 1000:
Fib.append(Fib[-1] + Fib[-2])
print("The last few terms in the list are:\n",Fib[-3:])
Exercise A.10. Write a while loop that sums the terms in the Fibonacci sequence until the sum is larger than 1000.
A.4.5.3 If Statements
Conditional (if) statements allow you to run a piece of code only under certain
conditions. This is handy when you have different tasks to perform under
different conditions.
Name = "Billy"
if Name == "Alice":
print("Hello, Alice. Isn't it a lovely day to learn Python?")
else:
print("You're not Alice. Where is Alice?")
(Take note that the output will change every time you run it)
import numpy as np
x = np.random.rand()   # assumed setup: any random number in [0,1) works here
if x < 0.33:
    print(x," < 1/3")
elif x < 0.67:
    print("1/3 <= ",x,"< 2/3")
else:
    print(x, ">= 2/3")
(Take note that the output will change every time you run it)
starting with a positive integer of your choosing. The sequence will converge¹ to 1, so your code should stop when the sequence reaches 1.
A.4.6 Functions
Mathematicians and programmers talk about functions in very similar ways, but
they aren’t exactly the same. When we say “function” in a programming sense
we are talking about a chunk of code that you can pass parameters and expect
an output of some sort. This is not unlike the mathematician’s version, but
unlike a mathematical function we can have multiple outputs for a programmatic
function. For example, in the mathematical function f (x) = x2 + 3 we pass a real
number in as an input and get a real number out as output. In a programming
language, on the other hand, you might send in a function and a few real
numbers and output a plot of the function along with the definite integral of the
function between the real numbers. Notice that there can be multiple inputs
and multiple outputs, and none of them have to be the same type of object. In this sense, a programmer's definition of a function is a bit more flexible than a mathematician's.
In Python, to define a function we start with def, followed by the function’s
name, any input variables in parentheses, and a colon. The indented code after
the colon is what defines the actions of the function.
Example A.17. The following code defines the polynomial $f(x) = x^3 + 3x^2 + 3x + 1$ and then evaluates the function at the point $x = 2.3$.
1 Actually, it is still an open mathematical question whether every integer seed will converge to 1. The Collatz sequence has been checked for many millions of initial seeds and they all converge to 1, but there is no mathematical proof that it will always happen.
def f(x):
return(x**3 + 3*x**2 + 3*x + 1)
f(2.3)
• Once we have the function defined we can call upon it just like we would
on paper.
• We cannot pass symbols into this type of function. See the section on
sympy in this chapter if you want to do symbolic computation.
Example A.18. One cool thing that you can do with Python functions is call
them recursively. That is, you can call the same function from within the function
itself. This turns out to be really handy in several mathematical situations.
Now let’s define a function for the factorial. This function is naturally going to
be recursive in the sense that it calls on itself!
def Fact(n):
if n==0:
return(1)
else:
return( n*Fact(n-1) )
# Note: we are calling the function recursively.
When you run this code there will be no output. You have just defined the
function so you can use it later. So let’s use it to make a list of the first several
factorials. Note the use of a for loop in the following code.
FactList = [Fact(n) for n in range(0,10)]
FactList
Example A.19. For this next example let’s define the sequence
$$x_{n+1} = \begin{cases} 2x_n, & x_n \in [0, 0.5] \\ 2x_n - 1, & x_n \in (0.5, 1] \end{cases}$$
as a function and then build a loop to find the first several iterates of the sequence
starting at any real number between 0 and 1.
# Define the function
def MySeq(xn):
if xn <= 0.5:
return(2*xn)
else:
return(2*xn-1)
# Now build a sequence with this function
x = [0.125] # arbitrary starting point
for n in range(0,5): # Let's only build the first 5 terms
x.append(MySeq(x[-1]))
print(x)
Example A.20. A fun way to approximate the square root of two is to start
with any positive real number and iterate over the sequence
$$x_{n+1} = \frac{1}{2}x_n + \frac{1}{x_n}$$
until we are within any tolerance we like of the square root of 2. Write code that
defines the sequence as a function and then iterates in a while loop until we are
within 10−8 of the square root of 2.
Hint: Import the math package so that you get the square root. More about
packages in the next section.
from math import sqrt
def f(x):
return(0.5*x + 1/x)
x = 1.1 # arbitrary starting point
print("approximation \t\t exact \t\t abs error")
while abs(x-sqrt(2)) > 10**(-8):
x = f(x)
print(x, sqrt(2), abs(x - sqrt(2)))
This problem will require that you build a function, write a ‘for’ loop (for the
integers 2-20), and write a ‘while’ loop inside your ‘for’ loop to do the iterations.
• As a lambda function:
f = lambda x: x**2+3
You can see that in the Lambda Function we are explicitly stating the name of
the variable immediately after the word lambda, then we put a colon, and then
the function definition.
Now if we want to evaluate the function at a point, say x = 1.5, then we can
write code just like we would mathematically: f (1.5)
f = lambda x: x**2+3
f(1.5) # evaluate the function at x=1.5
where the result is exactly the floating point number we were interested in.
The distinct mathematical advantage of using lambda functions is that the code for setting up a lambda function is about as close as we're going to get to a mathematically defined function as we would write it on paper, and the code for evaluating a lambda function is exactly what we would write on paper. Additionally, there is less coding overhead than for defining a function with the def command.
We can also define Lambda Functions of several variables. For example, if we
want to define the mathematical function f (x, y) = x2 + xy + y 3 we could write
the code
f = lambda x, y: x**2 + x*y + y**3
If we wanted the value f (2, 4) we could now write the code f(2,4).
Example A.21. You may recall Euler’s Method from your differential equations
training. Euler’s Method will give a list of approximate values of the solution to
a first order differential equation at given times.
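A minimal sketch of Euler's method written with a lambda function is below; the differential equation y' = y with y(0) = 1 and the step size are arbitrary illustrative choices.

import numpy as np
import matplotlib.pyplot as plt

f = lambda t, y: y                       # right-hand side of y' = f(t,y) (illustrative choice)
t = np.linspace(0, 2, 21)
dt = t[1]-t[0]

y = np.zeros_like(t)
y[0] = 1                                 # initial condition (illustrative choice)
for n in range(len(t)-1):
    y[n+1] = y[n] + dt*f(t[n], y[n])     # the Euler step

plt.plot(t, y, 'o-', label='Euler approximation')
plt.plot(t, np.exp(t), label='exact solution')
plt.legend(); plt.show()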
Exercise A.14. Go back to Exercise A.12 and repeat this exercise using a
lambda Function instead of a Python function.
Exercise A.15. Go back to Exercise A.13 and repeat this exercise using a
lambda function instead of a Python function.
A.4.8 Packages
Unlike mathematical programming languages like MATLAB, Maple, or Mathe-
matica, where every package is already installed and ready to use, Python allows
you to only load the packages that you might need for a given task. There are
several advantages to this along with a few disadvantages.
Advantages:
1. You can have the same function doing different things in different scenarios.
For example, there could be a symbolic differentiation command and a
numerical differentiation command coming from different packages that
are used in different ways.
Example A.22. The code below imports the math package into your instance
of Python and calculates the cosine of π/4.
import math
x = math.cos(math.pi / 4)
print(x)
The answer, unsurprisingly, is the decimal form of $\sqrt{2}/2$.
Example A.23. Here we import the entire math package so we can use every
one of the functions therein without having to use the math prefix.
from math import * # read this as: from math import everything
x = cos(pi / 4)
print(x)
The end result is exactly the same: the decimal form of $\sqrt{2}/2$, but now we had less typing to do.
Now you can freely use the functions that were imported from the math package.
There is a disadvantage to this, however. What if we have two packages that import functions with the same name? For example, in the math package and
in the numpy package there is a cos() function. In the next block of code we’ll
import both math and numpy, but instead we will import them with shortened
names so we can type things a bit faster.
Example A.24. Here we import math and numpy under aliases so we can use
the shortened aliases and not mix up which functions belong to which packages.
import math as ma
import numpy as np
# use the math version of the cosine function
x = ma.cos( ma.pi / 4)
# use the numpy version of the cosine function
y = np.cos( np.pi / 4)
print(x, y)
Both x and y in the code give the decimal approximation of $\sqrt{2}/2$. This is clearly
pretty redundant in this really simple case, but you should be able to see where
you might want to use this and where you might run into troubles.
Of course, there will be times when you need help with a function. You can
use the help command to view the help documentation for any function. For
example, you can run the code help(math.acos) to get help on the arc cosine
function from the math package.
Exercise A.16. Import the math package, figure out how the log function
works, and write code to calculate the logarithm of the number 8.3 in base 10,
base 2, base 16, and base e (the natural logarithm).
Example A.26. To start with let's look at a really simple example. Say you have a list of real numbers and you want to take the sine of every element in the list. If you just try to take the sine of the list you will get an error. Try it yourself.
from math import pi, sin
MyList = [0,pi/6, pi/4, pi/3, pi/2, 2*pi/3, 3*pi/4, 5*pi/6, pi]
sin(MyList)
You could get around this error using some of the tools from base Python, but
none of them are very elegant from a programming perspective.
from math import pi, sin
MyList = [0,pi/6, pi/4, pi/3, pi/2, 2*pi/3, 3*pi/4, 5*pi/6, pi]
SineList = [sin(n) for n in MyList]
print(SineList)
Perhaps more simply, say we wanted to square every number in a list. Just
appending the code **2 to the end of the list will fail!
MyList = [1,2,3,4]
MyList**2 # This will produce an error
If, instead, we define the list as a numpy array instead of a Python list then
everything will work mathematically exactly the way that we intend.
import numpy as np
MyList = np.array([1,2,3,4])
MyList**2 # This will work as expected!
You should stop now and try to take the sine of a numpy array as well.
Example A.27. (numpy Matrices) The first thing to note is that a matrix is
a list of lists (each row is a list).
import numpy as np
A = np.array([[1,2],[3,4]])
print("The matrix A is:\n",A)
v = np.array([[5],[6]]) # this creates a column vector
print("The vector v is:\n",v)
w = np.array([5,6]) # this creates a row vector
print("The vector w is:\n",w)
Example A.31. Let’s read the first row from that matrix A.
import numpy as np
A = np.array([[1,2],[3,4]])
print(A[0,:])
Example A.32. Let’s read the second column from the matrix A.
import numpy as np
A = np.array([[1,2],[3,4]])
print(A[:,1])
If, however, we recast these arrays as matrices we can get them to behave as we
would expect from Linear Algebra. It is up to you to check that these products
are indeed correct from the definitions of matrix multiplication from Linear
Algebra.
Example A.35. Recasting the numpy arrays as matrices allows you to use
multiplication as we would expect from linear algebra.
import numpy as np
A = np.matrix([[1,2],[3,4]])
v = np.matrix([[5],[6]])
w = np.matrix([5,6])
print("The product A*A is:\n",A*A)
print("The product A*v is:\n",A*v)
print("The product w*A is:\n",w*A)
It remains to show some of the other basic linear algebra operations: inverses,
determinants, the trace, and the transpose.
Example A.37. (Matrix Inverse) The inverse of a square matrix is done with
A.I.
import numpy as np
A = np.matrix([[1,2],[3,4]])
Ainv = A.I # Taking the inverse is also pretty simple
print(Ainv)
print(A * Ainv) # check that we get the identity matrix back
Oddly enough, the trace returns a matrix, not a scalar. Therefore you'll have to read the first entry (index [0,0]) from the answer to just get the trace.
Exercise A.17. Now that we can do some basic linear algebra with numpy it is
your turn. Define the matrix B and the vector u as
$$B = \begin{pmatrix} 1 & 4 & 8 \\ 2 & 3 & -1 \\ 0 & 9 & -3 \end{pmatrix} \quad\text{and}\quad u = \begin{pmatrix} 6 \\ 3 \\ -7 \end{pmatrix}.$$
Then find
a. $Bu$
b. $B^2$ (in the traditional linear algebra sense)
c. The size and shape of B
d. $B^T u$
e. The element-by-element product of B with itself
f. The dot product of u with the first row of B
• meshgrid builds two arrays that when paired make up the ordered pairs for
a 2D (or higher D) mesh grid of points. This is the same as the meshgrid
command in MATLAB.
Example A.41. The np.linspace command builds a list with equal (linear)
spacing between the starting and ending values.
import numpy as np
y = np.linspace(0,5,11)
print(y)
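A small example of the np.meshgrid command:

import numpy as np
x = np.linspace(0, 2, 3)
y = np.linspace(0, 1, 2)
X, Y = np.meshgrid(x, y)
print(X)
print(Y)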
The thing to notice with the np.meshgrid() command is that when you lay the
two matrices on top of each other, the matching entries give every ordered pair
in the domain.
Exercise A.18. Now time to practice with some of these numpy commands.
a. Create a numpy array of the numbers 1 through 10 and square every entry
in the list without using a loop.
b. Create a 10 × 10 identity matrix and change the top right corner to a 5.
Hint: np.identity()
c. Find the matrix-vector product of the answer to part (a) (as a column)
and the answer to part (b).
d. Change the bottom row of your matrix from part (b) to all 3’s, then change
the third column to all 7’s, and then find the 5th power of this matrix.
For our first plot we will graph the sine function, put a grid on it, and give a meaningful legend and axis labels. To do so we first need to take care of a couple of housekeeping items.
• Import numpy so we can take advantage of some good numerical routines.
• Import matplotlib’s pyplot module. The standard way to pull it is in is
with the nickname plt (just like with numpy when we import it as np).
import numpy as np
import matplotlib.pyplot as plt
In Jupyter Notebooks the plots will not show up unless you tell the notebook to
put them “inline.” Usually we will use the following command to get the plots
to show up. You do not need to do this in Google Colab. A command that begins
with a percent sign is called a magic command in Jupyter Notebooks. This is not
a Python command, but a command for controlling the Jupyter IDE specifically.
%matplotlib inline
Now we’ll build a numpy array of x values (using the np.linspace command)
and a numpy array of y values for the sine function.
# 100 equally spaced points from 0 to 2pi
x = np.linspace(0,2*np.pi, 100)
y = np.sin(x)
• Finally, build the plot with plt.plot(). The syntax is: plt.plot(x, y,
’color’, ...) where you have several options that you can pass (more
on that later).
• Notice that we send the plot legend label directly to the plot command. This
is optional; we could set the legend up separately if we like.
• Then we’ll add the grid with plt.grid()
• Then we’ll add the legend to the plot
• Finally we’ll add the axis labels
• We end the plotting code with plt.show() to tell Python to finally show
the plot. This line of code tells Python that you’re done building that plot.
plt.plot(x,y, 'green', label='The Sine Function')
plt.grid()
plt.legend()
plt.xlabel("x axis")
plt.ylabel("y axis")
plt.show()
Example A.46. Now let’s do a second example, but this time we want to show
four different plots on top of each other. When you start a figure, matplotlib
expects all of those plots to be layered on top of each other. (Note: for
MATLAB users, this means that you do not need the hold on command since
it is automatically “on.”)
We will plot sin(2πx) and cos(2πx), along with their sum and their difference,
on the domain x ∈ [0, 1] with 100 equally spaced points. We’ll give each of the
plots a different line style, build a legend, put a grid on the plot, and give axis
labels.
import numpy as np
import matplotlib.pyplot as plt
# %matplotlib inline # you may need this in Jupyter Notebooks
x = np.linspace(0,1,100)
y0, y1 = np.sin(2*np.pi*x), np.cos(2*np.pi*x)
plt.plot(x, y0, 'b-.', label='sine')
plt.plot(x, y1, 'r--', label='cosine')
plt.plot(x, y0+y1, 'g:', label='sum')
plt.plot(x, y0-y1, 'k-', label='difference')
plt.grid(); plt.legend()
plt.title("Awesome Title")
plt.xlabel('x axis label')
plt.ylabel('y axis label')
plt.show()
Figure A.2: Plots of the sine, cosine, and sums and differences.
Notice that the legend was placed automatically. There are ways to control the
placement of the legend if you wish, but for now just let Python and matplotlib
have control over the placement.
Example A.47. Now let’s create the same plot with slightly different code.
The plot command can take several (x, y) pairs in the same line of code. This
can really shrink the amount of coding that you have to do when plotting several
functions on top of each other.
# The next line of code does all of the plotting of all
# of the functions. Notice the order: x, y, color and
# line style, repeat
import numpy as np
import matplotlib.pyplot as plt
x = np.linspace(0,1,100)
y0 = np.sin(2*np.pi*x)
y1 = np.cos(2*np.pi*x)
y2 = y0 + y1
y3 = y0 - y1
plt.plot(x, y0, 'b-.', x, y1, 'r--', x, y2, 'g:', x, y3, 'k-')
plt.grid()
Figure A.3: A second plot of the sine, cosine, and sums and differences.
A.6.2 Subplots
It is often very handy to place plots side-by-side or as some array of plots. The
subplots command allows us that control. The main idea is that we are setting
up a matrix of blank plots and then populating the axes with the plots that we
want.
Example A.48. Let’s repeat the previous exercise, but this time we will put
each of the plots in its own subplot. There are a few extra coding quirks that
come along with building subplots so we’ll highlight each block of code separately.
• First we set up the plot area with plt.subplots(). The first two inputs
to the subplots command are the number of rows and the number of
columns in your plot array. For the first example we will do 2 rows of
plots with 2 columns – so there are four plots total. The last input for the
subplots command is the size of the figure (this is really just so that it
shows up well in Jupyter Notebooks – spend some time playing with the
figure size to get it to look right).
• Then we build each plot individually telling matplotlib which axes to use
for each of the things in the plots.
• Notice the small differences in how we set the titles and labels.
• In this example we are setting the y-axis to the interval [−2, 2] for consis-
tency across all of the plots.
# set up the blank matrix of plots
import numpy as np
import matplotlib.pyplot as plt
x = np.linspace(0,1,100)
y0 = np.sin(2*np.pi*x)
y1 = np.cos(2*np.pi*x)
y2 = y0 + y1
y3 = y0 - y1
fig, axis = plt.subplots(2, 2, figsize=(10, 8))  # 2 rows, 2 columns; play with the figure size
axis[0,0].plot(x, y0)
axis[0,1].plot(x, y1)
axis[1,0].plot(x, y2)
axis[1,1].plot(x, y3)
for ax, name in zip(axis.flatten(), ['sine', 'cosine', 'sum', 'difference']):
    ax.set_title(name)   # titles are set on the individual axes
    ax.set_ylim(-2, 2)   # a common y axis for consistency across the plots
    ax.grid()
fig.tight_layout()
plt.show()
The fig.tight_layout() command makes the plot labels a bit more readable
in this instance (again, something you can play with).
A logarithmically scaled y axis shows how small the function is getting in orders
of magnitude instead of as a raw real number. We’ll use this often in numerical methods.
import numpy as np
import matplotlib.pyplot as plt
x = np.linspace(0,500,1000)
y = 10**(-0.01*x)
fig, axis = plt.subplots(1,2, figsize = (10,5))
axis[0].plot(x,y, 'r')
axis[0].grid()
axis[0].set_title("Linearly scaled y axis")
axis[0].set_xlabel("x")
axis[0].set_ylabel("y")
axis[1].semilogy(x,y, 'k--')
axis[1].grid()
axis[1].set_title("Logarithmically scaled y axis")
axis[1].set_xlabel("x")
axis[1].set_ylabel("Log(y)")
plt.show()
It should be noted that the same result can be achieved using the yscale
command along with the plot command instead of using the semilogy command.
Pay careful attention to the subtle changes in the following code.
import numpy as np
import matplotlib.pyplot as plt
x = np.linspace(0,500,1000)
y = 10**(-0.01*x)
fig, axis = plt.subplots(1,2, figsize = (10,5))
axis[0].plot(x,y, 'r')
axis[0].grid()
axis[0].set_title("Linearly scaled y axis")
axis[0].set_xlabel("x")
axis[0].set_ylabel("y")
axis[1].plot(x, y, 'k--')
axis[1].set_yscale('log')   # the subtle change: yscale instead of semilogy
axis[1].grid()
axis[1].set_title("Logarithmically scaled y axis")
axis[1].set_xlabel("x")
axis[1].set_ylabel("Log(y)")
plt.show()
Exercise A.21. Plot the function f(x) = x³ for x ∈ [0, 1] four ways: on linearly scaled axes,
with a logarithmic axis in the y direction, with a logarithmic axis in the x direction,
and on a log-log plot with logarithmic scaling on both axes. Use subplots to put
your plots side-by-side. Give appropriate labels, titles, etc.
Exercise A.22. Load sympy and use the dir() command to see what functions
are inside the sympy library.
Example A.51. Let’s define the variable x as a symbolic variable. Then we’ll
define a few symbolic expressions that use x as a variable.
import sympy as sp
x = sp.Symbol('x') # note the capitalization
Now we’ll define the function f(x) = (x + 2)³ and spend the next few examples
playing with it.
f = (x+2)**3 # A symbolic function
print(f)
Notice that the output of these lines of code is not necessarily very nicely
formatted as a symbolic expression. What we would really want to see is (x + 2)³.
If you include the code sp.init_printing() after you import the sympy library
then you should get nice LaTeX style formatting in your answers.
Example A.52. Be careful that you are using symbolically defined function
along with your symbols. For example, see the code below:
# this line gives an error since it doesn't know
# which "sine" to use.
g = sin(x)
import sympy as sp
x = sp.Symbol('x')
g = sp.sin(x) # this one works
print(g)
import sympy as sp
x = sp.Symbol('x')
sp.simplify( sp.sin(x) / sp.cos(x)) # simplify a trig expression.
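As a brief sketch of root finding with sympy's solve command (assuming h(x) = x² + 2x − 3, which matches the factorization noted next):
import sympy as sp
x = sp.Symbol('x')
h = x**2 + 2*x - 3        # an assumed form of h(x), consistent with the factorization below
print(sp.solve(h, x))     # [-3, 1]
print(sp.factor(h))       # (x - 1)*(x + 3)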
As expected, the roots of the function h(x) are x = −3 and x = 1 since h(x)
factors into h(x) = (x + 3)(x − 1).
f = (x+2)**3
f.subs(x,5) # This actually substitutes 5 for x in f
A.7.4.1 Derivatives
The diff command in sympy does differentiation: sp.diff(function,
variable, [order]).
Take careful note that diff is defined both in sympy and in numpy. That means
that there are symbolic and numerical routines for taking derivatives in Python
. . . and we need to tell our instance of Python which one we’re working with
every time we use it.
Example A.61. Now let’s get the first, second, third, and fourth derivatives of
the function f.
import sympy as sp
x = sp.Symbol('x') # Define the symbol x
f = (x+2)**3 # Define a symbolic function f(x) = (x+2)^3
df = sp.diff(f,x,1) # first derivative
ddf = sp.diff(f,x,2) # second derivative
dddf = sp.diff(f,x,3) # third derivative
ddddf = sp.diff(f,x,4) # fourth derivative
print("f'(x) = ",df)
print("f''(x) = ",sp.simplify(ddf))
print("f'''(x) = ",sp.simplify(dddf))
print("f''''(x) = ",sp.simplify(ddddf))
Example A.62. Now let’s do some partial derivatives. The diff command is
still the right tool. You just have to tell it which variable you’re working with.
import sympy as sp
x, y = sp.symbols('x y') # Define the symbols
f = sp.sin(x*y) + sp.cos(x**2) # Define the function
fx = sp.diff(f,x)
fy = sp.diff(f,y)
print("f(x,y) = ", f)
print("f_x(x,y) = ", fx)
print("f_y(x,y) = ", fy)
Example A.63. It is worth noting that when you have a symbolically defined
function you can ask sympy to give you the LaTeX code for the symbolic function
so you can use it when you write about it.
import sympy as sp
x, y = sp.symbols('x y') # Define the symbols
f = sp.sin(x*y) + sp.cos(x**2) # Define the function
sp.latex(f)
A.7.4.2 Integrals
For integration, the sp.integrate tool is the command for the job:
sp.integrate(function, variable) will find an antiderivative and
sp.integrate(function, (variable, lower, upper)) will evaluate a
definite integral.
The integrate command in sympy accepts a symbolically defined function along
with the variable of integration and optional bounds. If the bounds aren’t
given then the command finds the antiderivative. Otherwise it finds the definite
integral.
The sympy package deals with the second variable just as it should.
import sympy as sp
x, y = sp.symbols('x y')
g = sp.sin(x*y) + sp.cos(x)
G = sp.integrate(g,x)
print(G)
It is apparent that sympy was sensitive to the fact that there was some trouble
at y = 0 and took care of it with a piecewise function.
Notice that the variable and the bounds are sent to the integrate command
as a tuple. Furthermore, notice that we had to send the symbolic version of π
instead of any other version (e.g. numpy).
import sympy as sp
x = sp.Symbol('x')
sp.integrate( sp.sin(x), (x,0,sp.pi))
We have to use the “infinity” symbol from sympy. It is two lower-case O’s next
to each other: oo. It kind of looks like an infinity symbol, I suppose.
import sympy as sp
x = sp.Symbol('x')
sp.integrate( sp.exp(-x**2) , (x, -sp.oo, sp.oo))
A.7.4.3 Limits
The limit command in sympy takes symbolic limits: sp.limit(function,
variable, value, [direction])
The direction (left or right) is optional and if you leave it off then the limit is
considered from both directions.
For example, consider the limit
\[
\lim_{x \to 0} \frac{\sin(x)}{x}.
\]
import sympy as sp
x = sp.Symbol('x')
sp.limit( sp.sin(x)/x, x, 0)
Now let's take the limit
\[
\lim_{h \to 0} \frac{f(x+h) - f(x)}{h}
\]
for the function f(x) = (x + 2)³. Taking the limit should give the derivative, so
we'll check that the diff command gives us the same thing using == (warning: watch what happens).
import sympy as sp
x = sp.Symbol('x')
f = (x+2)**3
print(sp.diff(f,x))
h = sp.Symbol('h')
df = sp.limit( (f.subs(x,x+h) - f) / h , h , 0 )
print(df)
print(df == sp.diff(f,x))
# notice that these are not "symbolically" equal
print(df == sp.expand(sp.diff(f,x))) # but these are
Notice that when we check to see if two symbolic functions are equal they must
be in the same exact symbolic form. Otherwise sympy won’t recognize them as
actually being equal even though they are mathematically equivalent.
Exercise A.23. Define the function f(x) = 3x² + x sin(x²) symbolically and
then do the following:
a. Evaluate the function at x = 2 and get symbolic and numerical answers.
b. Take the first and second derivative
c. Take the antiderivative
d. Find the definite integral from 0 to 1
e. Find the limit as x goes to 3
Example A.70. Find the Taylor series for f (x) = ex centered at x = 0 and
centered at x = 1.
import sympy as sp
x = sp.Symbol('x')
sp.series( sp.exp(x),x)
import sympy as sp
x = sp.Symbol('x')
sp.series( sp.exp(x), x, 1, 3) # expand at x=1 (3 terms)
Finally, we can control the number of terms in the expansion by sending the
desired truncation order to the series command.
import sympy as sp
x = sp.Symbol('x')
sp.series( sp.exp(x), x, 0, 3) # expand at x=0 and give 3 terms
Example A.71. Let's solve the equation x² − 2 = 0 for x. We know that the
roots are ±√2 so this should be pretty trivial for a symbolic solver.
import sympy as sp
x = sp.Symbol('x')
sp.solve( x**2 - 2, x)
Example A.72. Now let's solve the equation x⁴ − x² − 1 = 0 for x. You might
recognize this as a quadratic equation in disguise so you can definitely do it by
hand ... if you want to. (You could also recognize that this equation is related
to the golden ratio!)
import sympy as sp
x = sp.Symbol('x')
sp.solve( x**4 - x**2 - 1, x)
Run the code yourself to see the output. In nicer LaTeX style formatting, the
answer is
\[
-i\sqrt{-\frac{1}{2}+\frac{\sqrt{5}}{2}}, \quad
i\sqrt{-\frac{1}{2}+\frac{\sqrt{5}}{2}}, \quad
-\sqrt{\frac{1}{2}+\frac{\sqrt{5}}{2}}, \quad
\sqrt{\frac{1}{2}+\frac{\sqrt{5}}{2}}
\]
Notice that sympy has no problem dealing with the complex roots.
In the previous example the answers may be a bit hard to read due to their
symbolic form. This is particularly true for far more complicated equation solving
problems. The next example shows how you can loop through the solutions and
then print them in decimal form so they are a bit more readable.
Exercise A.24. Give the exact and floating point solutions to the equation
x⁴ − x² − x + 5 = 0.
When you want to solve a symbolic equation numerically you can use the nsolve
command. This will do something like Newton's method in the background. You
need to give it a starting point near which it can look for the solution to your
equation: sp.nsolve( equation, variable, initial guess )
For example, consider the cubic equation x³ − x² − 2 = 0. Run the code yourself
to see the exact solution. In nicer LaTeX style formatting the answer is:
\[
\frac{1}{3} + \left(-\frac{1}{2}-\frac{\sqrt{3}i}{2}\right)\sqrt[3]{\frac{\sqrt{87}}{9}+\frac{28}{27}} + \frac{1}{9\left(-\frac{1}{2}-\frac{\sqrt{3}i}{2}\right)\sqrt[3]{\frac{\sqrt{87}}{9}+\frac{28}{27}}},
\]
\[
\frac{1}{3} + \frac{1}{9\left(-\frac{1}{2}+\frac{\sqrt{3}i}{2}\right)\sqrt[3]{\frac{\sqrt{87}}{9}+\frac{28}{27}}} + \left(-\frac{1}{2}+\frac{\sqrt{3}i}{2}\right)\sqrt[3]{\frac{\sqrt{87}}{9}+\frac{28}{27}},
\]
\[
\frac{1}{3} + \frac{1}{9\sqrt[3]{\frac{\sqrt{87}}{9}+\frac{28}{27}}} + \sqrt[3]{\frac{\sqrt{87}}{9}+\frac{28}{27}}
\]
which is rather challenging to read. We can give all of the floating point
approximations with the following code.
import sympy as sp
x = sp.Symbol('x')
ExactSoln = sp.solve(x**3 - x**2 - 2, x) # symbolic solution
print("First Solution: ",sp.N(ExactSoln[0]))
print("Second Solution: ",sp.N(ExactSoln[1]))
print("Third Solution: ",sp.N(ExactSoln[2]))
If we were only looking for the floating point real solution near x = 1 then we
could just use nsolve.
import sympy as sp
x = sp.Symbol('x')
NumericalSoln = sp.nsolve(x**3 - x**2 - 2, x, 1) # solution near x=1
print(NumericalSoln)
x³ ln(x) = 7
Example A.75. Let's get a quick plot of the function f(x) = (x + 2)³ on the
domain x ∈ [−5, 2].
import sympy as sp
x = sp.Symbol('x')
f = (x+2)**3
sp.plot(f,(x,-5,2))
Example A.76. Multiple plots can be done at the same time with the
sympy.plot command.
Plot f(x) = (x + 2)³ and g(x) = 20 cos(x) on the same axes on the domain
x ∈ [−5, 2].
import sympy as sp
x = sp.Symbol('x')
f = (x+2)**3
g = 20*sp.cos(x)
sp.plot(f, g, (x,-5,2))
Appendix B
Mathematical Writing
This appendix is designed to give you helpful hints for the writing required of the
homework and the projects. You will find that mathematical writing is different
than writing for literature, for general consumption, or perhaps other scientific
disciplines. Pay careful attention to the conventions mentioned in this chapter
when you write math.
• Do all of the math first without worrying too much about the writing.
• When you have your mathematical results you can start writing.
• Write the introduction last since at that point you know what you’ve
written.
• You will spend more time creating well-crafted figures than any other part
of a mathematical writing project. Expect the figures to take at least as
long as the math, the writing, and the editing.
• Writing up your solutions forces you to iterate over this process, but remember that
the process isn’t done until you’ve proofread what you typed.
When you read the equal sign as part of the sentence you realize that there
is no reason to write “is true.”
= is NOT a conjunction: The mathematical symbol = is an assertion that
the expression on its left and the expression on its right are equal. Do
not use it as a connection between steps in a series of calculations. Use
words for this purpose. Here is an example that misuses the = symbol
when solving the equation 3x = 6:
\[
\text{Incorrect!} \qquad 3x = \underbrace{6 = \frac{3x}{3}}_{\text{false!}} = \frac{6}{3} = x = 2
\]
Avoid ambiguity: When in doubt, repeat a noun rather using unspecific words
like “it” or “the.” For example, in the sentences
Let G be a simple graph with n ≥ 2 vertices that is not complete
and let G be its complement. Then it must contain at least one
edge.
there is some ambiguity about whether “it” refers to G or to the complement
of G. The second sentence is better written as “Then G must contain at
least one edge.”
Use Proper Notation: There are many notational conventions in mathematics.
You need to follow the accepted conventions when using notation. For example,
write a summation with its index and limits clearly indicated, such as $\sum_{i=1}^{n} a_i$.
Parenthesis are important: Parenthesis show the grouping of terms, and the
omission of parenthesis can lead to much unneeded confusion. For example,
Label and reference equations: When you need to refer to an equation later
it is common practice to label the equation with a number and then to
refer to this equation by that number. This avoids ambiguity and gives the
reader a better chance at understanding what you’re writing. Furthermore,
avoid using words like “below” and “above” since the reader doesn’t really
know where to look. One implication to this style of referencing is that
you should never reference an equation before you define it.
Incorrect:
In the equation below we consider the domain x ∈ (−1, 1)
\[
f(x) = \sum_{n=1}^{\infty} \frac{x^n}{n!}.
\]
Correct:
Consider the summation
\[
f(x) = \sum_{n=1}^{\infty} \frac{x^n}{n!}.
\]
Definition/Formula: For example, the definition of the derivative of
a function f at a point a is
\[
f'(a) = \lim_{h\to 0} \frac{f(a+h) - f(a)}{h}.
\]
Now this does give us a formula to use to compute the derivative, but we
prefer to call this particular formula a definition to highlight the fact that
this is what we have chosen the word derivative to mean.
Expression: The word expression is used when there isn’t an equal sign. You
probably won’t need this word very often, but it is used like this: “The
factorization of the expression x2 − x − 6 is (x − 3)(x + 2).”
Solve/Evaluate: Equations are solved, whereas functions are evaluated. So
you would say, “We solved the equation for x,” but you would say “We
evaluated the function at x = 5 and found the function value to be 26.”
Avoid the words “plugged in,” such as “we plug 5 in for x,” when you
actually mean that you are doing substitution.
Add Subtract vs Plus Minus: The word subtract is used when discussing
what needs to be done: “Subtract two from five to get three.” Add is used
similarly: “Add two and five to get seven.” Minus is used when reading a
mathematical equation or expression. For example, the equation x − y = 5
would be read as “x minus y is equal to five.” Plus is used similarly. So
the equation x + y = 5 would be read as “x plus y is equal to five.” Some
things we don’t say are “We plus 2 and 5 to get 7” or “We minus x from
both sides of the equation.”
Number/Amount: The word number is used when referring to discrete items,
such as “there were a large number of cougars,” or “there are a large
number of books on my shelf.” The word amount is used when referring to
something that might come in a pile, such as “that is a huge amount of
sand!” or, “I only use a small amount of salt when I cook.”
Many/Much: These words are used in much the same way as number and
amount, with many in place of number and much in place of amount. For
example, we might say, “There aren’t as many cougars here as before,” or
“I don’t use as much salt as you do.”
Fewer/Less: These are the diminutive analogues of many and much. So, “There
are fewer cougars here than before,” or “You use less salt than I do.”
If you absolutely must include code then provide a permanent link to a shared document with
the proper viewing privileges (e.g. a Google Colab document that is set to view-only
permissions).
The big takeaway: you will almost never include your code in your paper since
the reader won’t read it and the code likely only makes your central mathematical
points less clear to the reader. This is sometimes rather emotionally difficult
since the bulk of your time will be spent writing your code, but the bulk of your
writing will not be about the code – it should be about solving the problem at
hand. Remember that well-crafted plots along with an explanation can capture
all of the details of your code while simultaneously being very clear to the reader.
In the case that you are writing about implementing an algorithm in a
particular language, be sure that the code is written in a different font. In
LaTeX consider using the verbatim environment to set your code apart from
the paragraphs and to give a typewriter-style font that reminds the reader that
they are reading code.
Appendix C
Optional Material
This Appendix contains a few sections that are considered optional for the course
as we teach it. Instructors may be interested in expanding upon what is here for
their classes.
C.1 Interpolation
The least squares problem that we studied in Chapter 4 seeks to find a best fitting
function that is closest (in the Euclidean distance sense) to a set of data. What
if, instead, we want to match the data points exactly with a function? This is the realm
of interpolation. Take note that there are many, many forms of interpolation
that are tailored to specific problems. In this brief section we cover only a few
of the simplest forms of interpolation involving only polynomial functions. The
problem that we’ll focus on can be phrased as:
Given a set of n + 1 data points (x0 , y0 ), (x1 , y1 ), . . . , (xn , yn ), find a polynomial
of degree at most n that exactly matches these points.
If we want to fit a polynomial to this data then we can use a cubic function
(which has 4 parameters) to match the data perfectly. Why is a cubic polynomial
the best choice?
Exercise C.2. Using the data from the previous problem, if we choose p(x) =
β₀ + β₁x + β₂x² + β₃x³, then matching each data point exactly gives a linear system
for the coefficients. For the data set S = {(0, 1), (1, 2), (2, 5), (3, 10)} this system is
\[
\begin{pmatrix} 1 & 0 & 0 & 0 \\ 1 & 1 & 1 & 1 \\ 1 & 2 & 4 & 8 \\ 1 & 3 & 9 & 27 \end{pmatrix}
\begin{pmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \\ \beta_3 \end{pmatrix}
=
\begin{pmatrix} 1 \\ 2 \\ 5 \\ 10 \end{pmatrix}.
\]
Set up and solve this system for the coefficients β₀, β₁, β₂, β₃.
Let (x₀, y₀), (x₁, y₁), . . . , (xₙ, yₙ) be a set of ordered pairs where the x values are
all unique. The goal of interpolation is to find a function f(x) that matches the data
exactly. Vandermonde interpolation uses a polynomial of degree n since such a
polynomial has n + 1 unknown coefficients, one for each data point, so the resulting
square linear system can be solved exactly. Doing so, we arrive at the system of equations
\[
\begin{pmatrix}
1 & x_0 & x_0^2 & \cdots & x_0^n \\
1 & x_1 & x_1^2 & \cdots & x_1^n \\
1 & x_2 & x_2^2 & \cdots & x_2^n \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
1 & x_n & x_n^2 & \cdots & x_n^n
\end{pmatrix}
\begin{pmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \\ \vdots \\ \beta_n \end{pmatrix}
=
\begin{pmatrix} y_0 \\ y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix}.
\]
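As a minimal sketch of the linear-algebra step (using numpy's built-in vander helper and, as an assumed example, the small data set that appears in the surrounding exercises):
import numpy as np
x = np.array([0., 1., 2., 3.])
y = np.array([1., 2., 5., 10.])
V = np.vander(x, increasing=True)   # columns are 1, x, x^2, x^3
beta = np.linalg.solve(V, y)        # interpolation coefficients beta_0, ..., beta_3
print(beta)                         # approximately [1, 0, 1, 0], i.e. p(x) = 1 + x^2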
Exercise C.3. Write a python function that accepts an array of ordered pairs
(where each x value is unique) and builds a Vandermonde interpolation polyno-
mial. Test your function on the simple example given above and then on several
larger problems. It may be simplest to initially test on data generated from
functions that we know.
Exercise C.6. Consider the data set S = {(0, 1), (1, 2), (2, 5), (3, 10)}.
a. Based on the descriptions of the p(x) and φⱼ(x) functions, why would p(x)
be defined as
\[
p(x) = 1\,\varphi_0(x) + 2\,\varphi_1(x) + 5\,\varphi_2(x) + 10\,\varphi_3(x)?
\]
Exercise C.7. Is the Lagrange interpolation polynomial built from the previous
problem the same as the Vandermonde interpolation polynomial for the same
data?
Example C.1. Build a Lagrange interpolation polynomial for a set of three data
points whose x values are 1, 2, and 3. The Lagrange basis functions for these x values are
\[
\varphi_0(x) = \frac{(x-2)(x-3)}{(1-2)(1-3)}, \qquad
\varphi_1(x) = \frac{(x-1)(x-3)}{(2-1)(2-3)}, \qquad
\varphi_2(x) = \frac{(x-1)(x-2)}{(3-1)(3-2)}.
\]
Take careful note that the φ functions are built in a very particular way. Indeed,
φ₀(1) = 1, φ₀(2) = 0, and φ₀(3) = 0. Also, φ₁(1) = 0, φ₁(2) = 1, and φ₁(3) = 0.
Finally, note that φ₂(1) = 0, φ₂(2) = 0, and φ₂(3) = 1. Thus, the polynomial
p(x) can be built as the linear combination p(x) = y₀φ₀(x) + y₁φ₁(x) + y₂φ₂(x),
where y₀, y₁, and y₂ are the y values of the data.
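A quick sympy sketch of this construction (leaving the y values symbolic, since the specific data values are not repeated here):
import sympy as sp
x, y0, y1, y2 = sp.symbols('x y0 y1 y2')      # the y values are left symbolic for illustration
phi0 = (x - 2)*(x - 3) / ((1 - 2)*(1 - 3))
phi1 = (x - 1)*(x - 3) / ((2 - 1)*(2 - 3))
phi2 = (x - 1)*(x - 2) / ((3 - 1)*(3 - 2))
p = y0*phi0 + y1*phi1 + y2*phi2               # the interpolating polynomial
print(sp.expand(p))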
Exercise C.8. Write a python function that accepts a list of ordered
pairs (where each x value is unique) and builds a Lagrange interpolation
polynomial. Test your function on several examples.
Exercise C.10. As you should have noticed the quality of the interpolation
gets rather terrible near the endpoints when you use linearly spaced points for
the interpolation. A fix to this was first proposed by the Russian mathematician
Pafnuty Chebyshev (1821-1894). The idea is as follows:
• Draw a semicircle above the closed interval on which you are interpolating.
• Pick n equally spaced points along the semicircle (i.e. same arc length
between each point).
• Project the points on the semicircle down to the interval. Use these
projected points for the interpolation.
a. Draw a picture of what we just described.
b. What do you notice about the x-values of these projected points? Why
might it be desirable to use a collection of points like this for interpolation?
To transform the Chebyshev points from the interval [−1, 1] to the interval [a, b]
we can apply a linear transformation which maps −1 to a and 1 to b:
\[
x_j \leftarrow \frac{b-a}{2}\,(x_j + 1) + a
\]
where the “xj ” on the left is on the interval [a, b] and the “xj ” on the right is on
the interval [−1, 1].
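Here is a minimal sketch of generating such points (the interval [a, b] = [0, 2] is only an assumed example):
import numpy as np
n = 10
theta = np.linspace(0, np.pi, n + 1)   # equally spaced angles along the semicircle
x = np.cos(theta)                      # projected down to the interval [-1, 1]
a, b = 0.0, 2.0                        # an assumed target interval
x_ab = (b - a)/2 * (x + 1) + a         # mapped onto [a, b]
print(np.sort(x_ab))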
Exercise C.11. Consider the function f(x) = 1/(1 + x²) just as we did for the
first problem in this subsection. Write code that overlays an interpolation with
linearly spaced points and interpolation with Chebyshev points. Give plots for
polynomials of order n = 2, 3, 4, . . . Be sure to show the original function on
your plots as well. What do you notice?
\[
\begin{aligned}
f(x, y) &= 0 \\
g(x, y) &= 0.
\end{aligned}
\]
In the present problem this amounts to solving the nonlinear system of equations
\[
\begin{aligned}
x \sin(y) &= 0 \\
\cos(x) + \sin(y^2) &= 0.
\end{aligned}
\]
In this case it should be clear that we are implicitly defining f(x, y) = x sin(y)
and g(x, y) = cos(x) + sin(y²). A moment's reflection (or perhaps some deep
meditation) should reveal that (±π/2, 0) are two solutions to the system, and
given the trig functions it stands to reason that (π/2 + πk, πj) will be a solution
for all integer values of k and j.
b. Now let’s do some Calculus and algebra. Your job in this part of this
problem is to follow all of the algebraic steps.
i. In one-dimensional Newton's Method we then write the equation of a
tangent line at a point (x₀, f(x₀)) as
\[
y = f(x_0) + f'(x_0)(x - x_0).
\]
Take very careful note here that we didn’t divide by the Jacobian.
Why not?
iii. The final step in one-dimensional Newton’s Method was to turn the
approximation of x into an iterative process by replacing x with xn+1
and replacing x0 with xn resulting in the iterative form of Newton’s
Method
\[
x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)}.
\]
We can do the exact same thing in the two-dimensional version of
Newton’s Method to arrive at
\[
\begin{pmatrix} x_{n+1} \\ y_{n+1} \end{pmatrix}
=
\begin{pmatrix} x_n \\ y_n \end{pmatrix}
- J^{-1}(x_n, y_n)\, F(x_n, y_n).
\]
\[
\begin{aligned}
x \sin(y) &= 0 \\
\cos(x) + \sin(y^2) &= 0.
\end{aligned}
\]
Exercise C.14. Write code to solve the present nonlinear system of equations.
Implement some sort of linear solver within your code and be able to defend your
technique. Try to pick a starting point so that you find the solution (π/2, π) on
your first attempt at solving this problem. Then play with the starting point to
verify that you can get the other solutions.
Exercise C.15. Test your code from the previous problem on the system of
nonlinear equations
\[
\begin{aligned}
1 + x^2 - y^2 + e^x \cos(y) &= 0 \\
2xy + e^x \sin(y) &= 0.
\end{aligned}
\]
Note here that f(x, y) = 1 + x² − y² + e^x cos(y) and g(x, y) = 2xy + e^x sin(y).
In general, for a vector-valued system F(x) = 0 the Newton iteration takes the form
\[
x_{n+1} = x_n - J^{-1}(x_n)\,F(x_n),
\]
where J⁻¹(xₙ) is the inverse of the Jacobian matrix evaluated at the point xₙ.
Take note that you should not be calculating the inverse directly; instead
you should be using a linear solve to get the vector b where J(xₙ)b = F(xₙ), and
then updating xₙ₊₁ = xₙ − b.
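As a minimal sketch of how such an iteration might be organized (the toy system here is hypothetical and not one of the systems from the text):
import numpy as np

# A toy system: x^2 + y^2 = 1 and y = x.
def F(v):
    x, y = v
    return np.array([x**2 + y**2 - 1, x - y])

def J(v):
    x, y = v
    return np.array([[2*x, 2*y], [1.0, -1.0]])

v = np.array([1.0, 0.5])             # an initial guess
for _ in range(10):
    b = np.linalg.solve(J(v), F(v))  # linear solve; we never form the inverse
    v = v - b
print(v)                             # approximately [0.7071, 0.7071]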
Exercise C.16. Write code that accepts any number of functions and an initial
vector guess and returns an approximation to the root for the problem F (x) = 0.
Exercise C.19. One place that solving nonlinear systems arises naturally is
when we need to find equilibrium points for systems of differential equations.
Remember that to find the equilibrium points for a first order differential equation
we set the derivative term to zero and solve the resulting equation.
Find the equilibrium point(s) for the system of differential equations
\[
\begin{aligned}
x' &= \alpha x - \beta x y \\
y' &= \delta y + \gamma x y
\end{aligned}
\]
Exercise C.20. Find the equilibrium point(s) for the system of differential
equations
\[
\begin{aligned}
x' &= -0.1xy - x \\
y' &= -x + 0.9y \\
z' &= \cos(y) - xz
\end{aligned}
\]
if they exist.
Exercise C.21. Suppose a company produces two models of chairs. If it produces x units per day of
the wood-frame model and y units per day of the aluminum-frame model, the
selling price cannot exceed
\[
10 + \frac{31}{\sqrt{x}} + \frac{1.3}{y^{0.2}} \quad \text{dollars per unit}
\]
for the wood-frame chairs and
\[
5 + \frac{15}{y^{0.4}} + \frac{0.8}{x^{0.08}} \quad \text{dollars per unit}
\]
for the aluminum chairs. We want to find the optimal production levels. Write
this situation as a multi-variable mathematical model, use a computer algebra
system (or by-hand computation) to find the gradient vector, and then use the
multi-variable Newton’s method to find the critical points. Classify the critical
points as either local maximums or local minimums.
Consider the heat equation
\[
u_t = D\,u_{xx}.
\]
If we discretize the spatial second derivative with a centered finite difference and, for the time
being, leave the time derivatives alone, we get the system of approximations
\[
\begin{aligned}
u(t, x_0) &= 0 \quad \text{(left boundary condition)} &\text{(C.1)} \\
\frac{\partial u(t, x_1)}{\partial t} &\approx D\,\frac{u(t,x_0) - 2u(t,x_1) + u(t,x_2)}{\Delta x^2} &\text{(C.2)} \\
\frac{\partial u(t, x_2)}{\partial t} &\approx D\,\frac{u(t,x_1) - 2u(t,x_2) + u(t,x_3)}{\Delta x^2} &\text{(C.3)} \\
&\;\;\vdots &\text{(C.4)} \\
\frac{\partial u(t, x_9)}{\partial t} &\approx D\,\frac{u(t,x_8) - 2u(t,x_9) + u(t,x_{10})}{\Delta x^2} &\text{(C.5)} \\
u(t, x_{10}) &= 0 \quad \text{(right boundary condition)} &\text{(C.6)}
\end{aligned}
\]
where ∆x = 0.1 in this specific case. The value of x in each of the equations is
fixed, so we can view u(t, x₁) as a different function from u(t, x₂), which is different
from u(t, x₃), and so on. In other words, if we let u₁(t) = u(t, x₁), u₂(t) = u(t, x₂),
. . . , u₉(t) = u(t, x₉), we get the coupled system of ordinary differential equations
\[
\begin{aligned}
\frac{\partial u_1}{\partial t} &= D\,\frac{0 - 2u_1(t) + u_2(t)}{\Delta x^2} &\text{(C.7)} \\
\frac{\partial u_2}{\partial t} &= D\,\frac{u_1(t) - 2u_2(t) + u_3(t)}{\Delta x^2} &\text{(C.8)} \\
&\;\;\vdots &\text{(C.9)} \\
\frac{\partial u_9}{\partial t} &= D\,\frac{u_8(t) - 2u_9(t) + 0}{\Delta x^2} &\text{(C.10)}
\end{aligned}
\]
in the functions u1 , u2 , . . . , u9 .
The initial conditions for these ODEs are given by the initial condition function
for the PDE shown as the black points in Figure C.1. One way to think of our
new system is that the coupled ODEs track the lengths of the black dashed lines
in Figure C.1 as they evolve in time. This technique is called the method of
lines.
Now we have re-framed the problem of approximating the solution to the PDE
as a problem of numerically solving a (potentially very large) system of ODEs.
Thankfully we already know several tools for solving systems of ODEs. We just
need to choose a method for stepping through time. Our choices, from Chapter
5, are Euler’s method, the Midpoint method, and the RK4 method. However,
practitioners of numerical analysis typically lean on pre-built tools to do the
job when using the method of lines. In the case of Python, there is a nice tool
in the scipy library to do the time stepping which leverages a very powerful
(RK4-like) method for doing the time stepping. You should stop now and check
out Exercise 5.79, if you haven’t already, since it gives several of the details
about how to use scipy.integrate.odeint().
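As a quick reminder of the call pattern (a toy sketch with the ODE y' = -y, not the heat equation):
import numpy as np
from scipy.integrate import odeint

def f(y, t):
    return -y                  # the toy ODE y' = -y

t = np.linspace(0, 5, 50)
y = odeint(f, 1.0, t)          # rows of y hold the solution at the times in t
print(y[-1])                   # approximately exp(-5)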
Let’s put this into practice.
Exercise C.22. The code below gives an outline for implementing the method
of lines on the heat equation as described above. Complete and implement the
code. Once you have a full implementation, test different ratios D∆t/∆x² to
demonstrate that this method does not suffer from the stability issues that we
have seen throughout the PDE chapter. (Recall that the ratio D∆t/∆x² must be
less than a particular value for our typical finite difference discretization to be
stable. Show that you can beat it here!)
# import the proper libraries
import numpy as np
import matplotlib.pyplot as plt
from scipy.integrate import odeint # this one will do the time stepping
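# (before the lines below will run you will also need to define the spatial grid x,
#  the time array t, the time step dt, the diffusion coefficient D, the initial
#  condition function u0, and the right-hand side function F for the system of ODEs)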
# Now build an array to store the time steps of the numerical solution
U = np.zeros( (len(t), len(x)) )
U[0,:] = u0(x) # put the initial condition in the correct row
The next small block of code will do all of the hard work of time stepping for
us. Your first task is to explain completely what this small block of code does.
You may want to refer to Exercise 5.79 and/or the help documentation for
scipy.integrate.odeint.
for n in range(len(t)-1):
U[n+1,:] = odeint(F, U[n,:], [0,dt])[1,:]
To complete this Exercise create several plots showing the time evolution of
the solution. As an example, Figure C.2 shows several snapshots of the time
evolution of the heat equation with the initial condition given in Figure C.1.
In this simulation we use D = 0.2 and ∆t = 0.02. Figure C.3 shows the
same solution but where we use more spatial points to arrive at a smoother
approximation. Experiment with the values of D, ∆x, and ∆t (and hence the
ratio D∆t/∆x²) to see if you can force the solution to become unstable.
Exercise C.23. Modify your heat equation method of lines code from the
previous exercise to demonstrate how the method works with several different
types of boundary conditions and initial conditions. Show several snapshots of
the time evolution of the solution.
Exercise C.24. We can use the method of lines approach to solve PDEs beyond
just the heat equation. Recall the traveling wave equation
\[
u_t + v\,u_x = 0
\]
where the parameter v is the speed of propagation of the traveling wave. Recall
further that we had all sorts of trouble getting a stable numerical solution to
this equation. Now would be a good time to refer back to the section and your
work on the traveling wave equation.
Choose an appropriate initial condition and build a method of lines numerical
solution to the traveling wave equation. Experiment with your solution and see
if you get the same stability issues that we had with the traveling wave equation
before. (Remember to choose your spatial derivative method wisely.)
We haven’t yet mentioned using the method of lines for the wave equation, and
that’s for a good reason. Recall that the 1D wave equation is $u_{tt} = c\,u_{xx}$, and
the fact that the time derivative is second order requires us to pay a bit closer
attention – we can’t just naturally apply an ODE time stepper to a second order
time derivative. In your ODE training you likely ran into second order ordinary
differential equations in the context of harmonic oscillators. One technique for
solving these types of ODEs was to introduce a new variable for the velocity of
the oscillator and then to solve the resulting system of equations. We can do
the same thing with PDEs.
Define the velocity function $v = u_t$ and observe that the wave equation $u_{tt} = c\,u_{xx}$
can now be written as $v_t = c\,u_{xx}$. Hence we have the system of PDEs
\[
\begin{aligned}
u_t &= v &\text{(C.11)} \\
v_t &= c\,u_{xx}. &\text{(C.12)}
\end{aligned}
\]
If we discretize the domain then at each point in the domain we have a value of
the position, u, and the velocity, v. That is to say that we have twice as many
differential equations to keep track of at each point in the spatial discretization,
and this potentially causes some housekeeping headaches in your code. One
way to manage this doubling of data is to take the even indexed entries of our
solution vector to be u and to take the odd indexed entries to be v. Thus, for
each time step the numerical solution vector will be of the form
[u0 , v0 , u1 , v1 , u2 , v2 , ...]
Exercise C.25. The code below contains a partial implementation for the
method of lines for the 1D wave equation. Pick an appropriate initial position
and velocity as well as appropriate boundary conditions on the domain x ∈ [0, 1]
(hint: start simple!). Then complete the code and produce several plots showing
the time evolution of the solution to the wave equation.
# start by importing the proper libraries
import numpy as np
import matplotlib.pyplot as plt
from scipy.integrate import odeint
dt = ???
Plotting the solution is up to you. Just keep in mind that the position function u
is in the even indexed columns of the array UV. If you wanted to plot the velocity
of the string you now have the information too!
Exercise C.26. Using the method of lines and splitting the wave equation into
a system of PDEs actually allows for a simpler implementation of non-trivial
initial velocity functions. Pause and ponder here for a moment: we almost
always took our initial velocity to be zero in all prior implementations. Why?
Why are things easier now?
Experiment with numerically solving the 1D wave equation using several different
initial positions and velocities. Moreover, modify your code to allow for different
types of boundary conditions. Produce several snapshots of your more interesting
simulations.
Exercise C.27. Hopefully by now you agree that the method of lines is a very
powerful tool for numerically solving time dependent PDEs. But, it isn’t without
its faults. Discuss the pros and cons of using the method of lines to get numerical
solutions to time dependent PDEs.
Let’s return to the heat equation for a moment. In our implementations of the
method of lines for the heat equation we made a second-order discretization in
space of the form
\[
u_{xx} \approx \frac{u_{n+1} - 2u_n + u_{n-1}}{\Delta x^2}.
\]
In our implementation we coded this directly using carefully chosen indices.
However, there is another way to build this discretization efficiently. Observe that
at any time step we can produce the spatial discretization as a matrix-vector
product as follows:
\[
\frac{D}{\Delta x^2}
\begin{pmatrix}
-2 & 1 & 0 & 0 & 0 & \cdots \\
1 & -2 & 1 & 0 & 0 & \cdots \\
0 & 1 & -2 & 1 & 0 & \cdots \\
\vdots & & \ddots & \ddots & \ddots &
\end{pmatrix}
\begin{pmatrix} u_1 \\ u_2 \\ u_3 \\ \vdots \end{pmatrix}
\]
Stop now and verify that the matrix-vector product will indeed produce the
correct spatial discretization.
Using this new form of the spatial discretization we can now rewrite the PDE as
a system of ODEs in the form
\[
\frac{\partial u}{\partial t} = A u
\]
where $u = \begin{pmatrix} u_1 & u_2 & u_3 & \cdots & u_{n-1} \end{pmatrix}^T$.
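A minimal sketch of building this matrix with numpy (the values of n, dx, and D here are just for illustration):
import numpy as np

n = 9          # number of interior grid points (arbitrary for this sketch)
dx = 0.1       # assumed spatial step
D = 0.2        # assumed diffusion coefficient
A = (D/dx**2) * (np.diag(-2*np.ones(n))
                 + np.diag(np.ones(n-1), 1)    # superdiagonal of ones
                 + np.diag(np.ones(n-1), -1))  # subdiagonal of ones
print(A)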
OK. This is all well and good, but there was nothing really wrong with the way
that we implemented the spatial derivative in the past. Let's recall a theorem
from differential equations: if an n × n matrix A has n linearly independent
eigenvectors $v_1, \ldots, v_n$ with eigenvalues $\lambda_1, \ldots, \lambda_n$, then every solution of the
linear system $u' = Au$ has the form $u(t) = c_1 e^{\lambda_1 t} v_1 + \cdots + c_n e^{\lambda_n t} v_n$,
where the constants $c_k$ are determined by the initial condition.
Using this theorem we now have a way to solve the associated time-dependent
system of ODEs exactly! That’s right! We can avoid the use of any time stepping
routine altogether by just remembering some linear algebra (. . . ahhh linear
algebra).
Exercise C.28. Write code to solve the 1D heat equation with a second order
spatial discretization and an exact solution to the resulting system of ODEs.
There is no time stepping needed, but instead your code will need to leverage
some linear algebra.
Hint: Use np.linalg.eig to find the eigenvalues and eigenvectors for you.
Exercise C.31. What are the pros and cons for solving PDEs with an exact
solution to the coupled system of ODEs resulting from the method of lines
approach? When would you want to use this approach vs an ODE time stepper?
The Deliverable
In this project you will be turning in a well-formatted and well-written Google
Colab document. No extra code should be apparent in your document (move it
somewhere else). All code needs to run without errors. All of the blocks of code
should be preceded by thorough exposition and clear explanation. Any plots
should be well built, clear, and tell a complete story. Be sure to discuss all of
the tasks completely.
Allowed Resources: